E-book, English, 598 pages
Kaeslin, Top-Down Digital VLSI Design
1st edition, 2014
ISBN: 978-0-12-800772-3
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: ePub watermark
From Architectures to Gate-Level Circuits and FPGAs
Since 1989, Hubert Kaeslin has headed the Microelectronics Design Center of ETH Zurich, which has taped out more than 300 circuit designs under his supervision over the past 23 years, both for research and educational purposes. He has written more than 75 scientific papers and his professional interests extend to digital signal processing, IT security, graph theory, and visual formalisms. Dr. Kaeslin is a Senior Member of the IEEE and was awarded the title of professor by ETH in 2010.
Authors/Editors
Further Info & Material
1;Front Cover;1
2;Top-Down Digital VLSI Design: From Architectures to Gate-Level Circuits and FPGAs;4
3;Copyright;5
4;Contents;6
5;Preface;16
5.1;Why This Book?;16
5.2;Highlights;17
5.3;Notes to Instructors;17
6;Acknowledgments;20
7;Chapter 1: Introduction to Microelectronics;22
7.1;1.1 Economic Impact;22
7.2;1.2 Microelectronics Viewed From Different Perspectives;25
7.2.1;1.2.1 The Guinness Book of Records Point of View;25
7.2.2;1.2.2 The Marketing Point of View;26
7.2.2.1;General-purpose ICs;26
7.2.2.2;Application-specific integrated circuit;26
7.2.3;1.2.3 The Fabrication Point of View;28
7.2.3.1;Full-custom ICs;28
7.2.3.2;Semi-custom ICs;28
7.2.3.3;Field-programmable logic;32
7.2.3.4;Standard parts;32
7.2.4;1.2.4 The Design Engineer's Point of View;32
7.2.4.1;Hand layout;32
7.2.4.2;Cell libraries and schematic entry;33
7.2.4.3;Automatic circuit synthesis;36
7.2.4.4;Design with virtual components;38
7.2.4.5;Electronic system-level (ESL) design automation;39
7.2.5;1.2.5 The Business Point of View;39
7.3;1.3 The VLSI Design Flow;41
7.3.1;1.3.1 The Y-chart, a Map of Digital Electronic Systems;41
7.3.2;1.3.2 Major Stages in Digital VLSI Design;43
7.3.3;1.3.3 Cell Libraries;52
7.3.4;1.3.4 Electronic Design Automation Software;53
7.4;1.4 Problems;55
7.5;1.5 Appendix I: A Brief Glossary of Logic Families;56
7.6;1.6 Appendix II: An Illustrated Glossary of Circuit-Related Terms;58
8;Chapter 2: Field-Programmable Logic;62
8.1;2.1 General Idea;62
8.2;2.2 Configuration Technologies;64
8.2.1;2.2.1 Static Memory;64
8.2.2;2.2.2 Flash Memory;65
8.2.3;2.2.3 Antifuses;65
8.3;2.3 Organization of Hardware Resources;67
8.3.1;2.3.1 Simple Programmable Logic Devices (SPLD);67
8.3.2;2.3.2 Complex Programmable Logic Devices (CPLD);68
8.3.3;2.3.3 Field-Programmable Gate Arrays (FPGA);68
8.4;2.4 Commercial Aspects;74
8.4.1;2.4.1 An Overview on FPL Device Families;74
8.4.2;2.4.2 The Price and the Benefits of Electrical Configurability;74
8.5;2.5 Extensions of the Basic Idea;76
8.6;2.6 The FPL Design Flow;80
8.7;2.7 Conclusions;82
9;Chapter 3: From Algorithms to Architectures;84
9.1;3.1 The Goals of Architecture Design;84
9.1.1;3.1.1 Agenda;84
9.2;3.2 The Architectural Solution Space;86
9.2.1;3.2.1 The Antipodes;86
9.2.2;3.2.2 What Makes an Algorithm Suitable for a Dedicated VLSI Architecture?;92
9.2.3;3.2.3 There is Plenty of Land Between the Antipodes;95
9.2.4;3.2.4 Assemblies of General-Purpose and Dedicated Processing Units;96
9.2.5;3.2.5 Host Computer with Helper Engines;97
9.2.6;3.2.6 Application-Specific Instruction set Processors;98
9.2.7;3.2.7 Reconfigurable Computing;100
9.2.8;3.2.8 Extendable Instruction set Processors;102
9.2.9;3.2.9 Platform ICs (DSPP);103
9.2.10;3.2.10 Digest;104
9.3;3.3 Dedicated VLSI Architectures and How to Design Them;108
9.3.1;3.3.1 There is Room for Remodeling in the Algorithmic Domain ... ;108
9.3.2;3.3.2 ... and there is Room in the Architectural Domain;110
9.3.3;3.3.3 Systems Engineers and VLSI Designers Must Collaborate;111
9.3.4;3.3.4 A Graph-Based Formalism for Describing Processing Algorithms;113
9.3.5;3.3.5 The Isomorphic Architecture;115
9.3.6;3.3.6 Relative Merits of Architectural Alternatives;116
9.3.7;3.3.7 Computation Cycle Versus Clock Period;118
9.4;3.4 Equivalence Transforms for Combinational Computations;119
9.4.1;3.4.1 Common Assumptions;120
9.4.2;3.4.2 Iterative Decomposition;121
9.4.2.1;Performance and cost analysis;121
9.4.3;3.4.3 Pipelining;124
9.4.3.1;Performance and cost analysis;124
9.4.3.2;Pipelining in the presence of multiple feedforward paths;127
9.4.4;3.4.4 Replication;129
9.4.4.1;Performance and cost analysis;129
9.4.5;3.4.5 Time Sharing;131
9.4.5.1;Performance and cost analysis;132
9.4.6;3.4.6 Associativity Transform;137
9.4.7;3.4.7 Other Algebraic Transforms;138
9.4.8;3.4.8 Digest;139
9.5;3.5 Options for Temporary Storage of Data;141
9.5.1;3.5.1 Data Access Patterns;141
9.5.2;3.5.2 Available Memory Configurations and Area Occupation;141
9.5.3;3.5.3 Storage Capacities;142
9.5.4;3.5.4 Wiring and the Costs of Going Off-Chip;143
9.5.5;3.5.5 Latency and Timing;143
9.5.6;3.5.6 Digest;144
9.6;3.6 Equivalence Transforms for Non-Recursive Computations;147
9.6.1;3.6.1 Retiming;147
9.6.2;3.6.2 Pipelining Revisited;148
9.6.3;3.6.3 Systolic Conversion;151
9.6.4;3.6.4 Iterative Decomposition and Time Sharing Revisited;151
9.6.5;3.6.5 Replication Revisited;152
9.6.6;3.6.6 Digest;153
9.7;3.7 Equivalence Transforms for Recursive Computations;154
9.7.1;3.7.1 The Feedback Bottleneck;154
9.7.2;3.7.2 Unfolding of First-Order Loops;155
9.7.2.1;Performance and cost analysis;157
9.7.3;3.7.3 Higher-Order Loops;158
9.7.3.1;Performance and cost analysis;160
9.7.4;3.7.4 Time-Variant Loops;160
9.7.4.1;Performance and cost analysis;161
9.7.5;3.7.5 Nonlinear or General Loops;161
9.7.6;3.7.6 Pipeline Interleaving, not Quite an Equivalence Transform;165
9.7.7;3.7.7 Digest;167
9.8;3.8 Generalizations of the Transform Approach;169
9.8.1;3.8.1 Generalization to other Levels of Detail;169
9.8.1.1;Architecture level;169
9.8.1.2;Bit level;170
9.8.2;3.8.2 Bit-Serial Architectures;171
9.8.3;3.8.3 Distributed Arithmetic;173
9.8.4;3.8.4 Generalization to other Algebraic Structures;176
9.8.4.1;Finite fields;176
9.8.4.2;Semirings;176
9.8.5;3.8.5 Digest;181
9.9;3.9 Conclusions;181
9.9.1;3.9.1 Summary;181
9.9.2;3.9.2 The Grand Architectural Alternatives from an Energy Point of View;184
9.9.3;3.9.3 A Guide to Evaluating Architectural Alternatives;186
9.10;3.10 Problems;189
9.11;3.11 Appendix I: A Brief Glossary of Algebraic Structures;191
9.11.1;Examples with one operation;192
9.11.2;Examples with two operations;192
9.12;3.12 Appendix II: Area and Delay Figures of VLSI Subfunctions;195
10;Chapter 4: Circuit Modeling with Hardware Description Languages;200
10.1;4.1 Motivation and Background;200
10.1.1;4.1.1 Why Hardware Synthesis?;200
10.1.2;4.1.2 Agenda;200
10.1.3;4.1.3 Alternatives for Modeling Digital Hardware;201
10.1.4;4.1.4 The Genesis of VHDL and SystemVerilog;201
10.1.5;4.1.5 Why Bother Learning Hardware Description Languages?;203
10.1.6;4.1.6 A First Look at VHDL and SystemVerilog;205
10.2;4.2 Key Concepts and Constructs of VHDL;207
10.2.1;4.2.1 Circuit Hierarchy and Connectivity;207
10.2.1.1;How to compose a circuit from components;210
10.2.2;4.2.2 Interacting Concurrent Processes;211
10.2.2.1;How to describe combinational logic behaviorally;212
10.2.2.2;How to describe a register behaviorally;214
10.2.3;4.2.3 A Discrete Replacement for Electrical Signals;219
10.2.3.1;The need for multiple logic values to describe a circuit node;219
10.2.3.2;Some logic values get collapsed during synthesis;221
10.2.3.2.1;How to model three-state outputs and busses;222
10.2.3.2.2;Selecting adequate data types;223
10.2.3.3;Data types for modeling multi-bit signals;224
10.2.3.4;Orientation of binary vectors;225
10.2.3.5;Data types for modeling fractional and floating point numbers;225
10.2.4;4.2.4 An Event-Driven Scheme of Execution;227
10.2.4.1;The need for a mechanism that schedules process execution;227
10.2.4.2;Simulation time versus execution time;228
10.2.4.3;The benefits of a discretized model of time;228
10.2.4.4;Transaction versus event;229
10.2.4.5;Delay modeling;230
10.2.4.6;The d delay;230
10.2.4.7;Signal versus variable;231
10.2.4.8;Event-driven simulation revisited;232
10.2.4.9;Sensitivity list;232
10.2.4.10;Wait statement;233
10.2.4.11;What exactly is it that makes a process statement exhibit sequential behavior?;234
10.2.4.12;How to safely code sequential circuits for synthesis;235
10.2.4.13;Initial values cannot replace a reset mechanism;236
10.2.4.14;Detecting clock edges and other signal events;236
10.2.4.15;Signal attributes;237
10.2.4.16;How to check timing conditions;237
10.2.4.17;Concurrent assertion statements;237
10.2.5;4.2.5 Facilities for Model Parametrization;239
10.2.5.1;The need for supporting parametrized circuit models;239
10.2.5.2;Generics;240
10.2.5.3;The generate statement used to conditionally spawn concurrent processes;241
10.2.5.4;The generate statement used to conditionally instantiate components;242
10.2.5.5;The need to accommodate multiple models for one circuit block;244
10.2.5.6;Configuration specification and binding;244
10.2.5.7;Elaboration;245
10.2.6;4.2.6 Concepts Borrowed from Programming Languages;247
10.2.6.1;Structured flow control statements;247
10.2.6.2;Object;247
10.2.6.3;Constant;247
10.2.6.4;Variable;247
10.2.6.5;User-defined data types;247
10.2.6.6;Subtypes;248
10.2.6.7;Arrays and records;248
10.2.6.8;Type attributes and array attributes;248
10.2.6.9;Subprogram, function, and procedure;249
10.2.6.10;Package;249
10.2.6.11;Predefined package standard;251
10.2.6.12;Predefined package textio;251
10.2.6.13;Design unit and design file;252
10.2.6.14;Design library;252
10.2.6.15;Library and use clauses;253
10.2.6.16;Special libraries work and std;254
10.3;4.3 Key Concepts and Constructs of SystemVerilog;255
10.3.1;4.3.1 Circuit Hierarchy and Connectivity;255
10.3.1.1;The need for supporting modularity and hierarchical composition;255
10.3.1.2;Module, structural view;255
10.3.1.3;How to compose a circuit from components;256
10.3.1.4;Special constructs for modeling busses;257
10.3.2;4.3.2 Interacting Concurrent Processes;258
10.3.2.1;The need for modeling concurrent activities;258
10.3.2.2;How to describe combinational logic behaviorally;258
10.3.2.3;How to describe a register behaviorally;259
10.3.2.4;Local versus shared variables;260
10.3.2.5;Module, behavioral view;261
10.3.2.6;Hardware modeling styles compared;262
10.3.3;4.3.3 A Discrete Replacement for Electrical Signals;264
10.3.3.1;The need for multiple logic values to describe a circuit node;264
10.3.3.2;How to model three-state outputs and busses;266
10.3.3.3;Selecting adequate data types;267
10.3.3.4;Data types for modeling multi-bit signals;268
10.3.3.5;Orientation of binary vectors;269
10.3.4;4.3.4 An Event-Driven Scheme of Execution;270
10.3.4.1;The need for a mechanism that schedules process execution;270
10.3.4.2;Event-driven simulation;270
10.3.4.3;Sensitivity list, process suspension and reactivation;271
10.3.4.4;Delay modeling;271
10.3.4.5;Blocking versus nonblocking assignments;272
10.3.4.6;Always blocks;272
10.3.4.7;No binding order of execution for simultaneous events;273
10.3.4.8;Initial values cannot replace a reset mechanism;274
10.3.4.9;How to check timing conditions;274
10.3.5;4.3.5 Facilities for Model Parametrization;275
10.3.5.1;Parameters;275
10.3.5.2;The generate statement;276
10.3.5.3;The need to accommodate multiple models for one circuit block;277
10.3.5.4;Conditional compilation of source code;277
10.3.6;4.3.6 Concepts Borrowed from Programming Languages;278
10.3.6.1;User-defined data types;278
10.3.6.2;Subroutine, function, and task;278
10.3.6.3;Package;279
10.3.6.4;Classes, semaphores, mailboxes, etc;280
10.3.6.5;System tasks;281
10.4;4.4 Automatic Circuit Synthesis From HDL Models;283
10.4.1;4.4.1 Synthesis Overview;283
10.4.2;4.4.2 Data Types;284
10.4.3;4.4.3 Finite State Machines and Sequential Subcircuits in General;284
10.4.3.1;Hardware-compatible wake-up conditions for all processes;284
10.4.3.2;Explicit versus implicit state models;285
10.4.3.3;How to capture a finite state machine;286
10.4.3.4;Packing an entire FSM into a single process statement;286
10.4.3.5;Distributing an FSM over two (or more) concurrent processes;287
10.4.3.6;FSM optimization ignored in the language standards;293
10.4.4;4.4.4 RAM and ROM Macrocells;293
10.4.5;4.4.5 Timing Constraints;296
10.4.5.1;Synthesis constraints are not part of the HDL standards;297
10.4.5.2;How to formulate timing constraints;298
10.4.5.3;How to partition a circuit in view of synthesis and optimization;300
10.4.6;4.4.6 Limitations and Caveats;302
10.4.6.1;Some circuits essentially need to be defined as gate-level netlists;302
10.4.7;4.4.7 How to Establish a Register Transfer Level Model Step by Step;303
10.5;4.5 Conclusions;305
10.6;4.6 Problems;307
10.7;4.7 Appendix I: VHDL and SystemVerilog Side by Side;310
10.8;4.8 Appendix II: VHDL Extensions and Standards;314
10.8.1;4.8.1 Protected Shared Variables IEEE 1076a;314
10.8.2;4.8.2 The Analog and Mixed-Signal Extension IEEE 1076.1;315
10.8.3;4.8.3 Mathematical Packages for Real and Complex Numbers IEEE 1076.2;317
10.8.4;4.8.4 The Arithmetic Packages IEEE 1076.3;317
10.8.5;4.8.5 The Standard Delay Format (SDF) IEEE 1497;318
10.8.6;4.8.6 A Handy Compilation of Type Conversion Functions;319
10.8.7;4.8.7 Coding Guidelines;321
11;Chapter 5: Functional Verification;322
11.1;5.1 Goals of Design Verification;322
11.1.1;5.1.1 Agenda;323
11.2;5.2 How to Establish Valid Functional Specifications;324
11.2.1;5.2.1 Formal Specification;325
11.2.2;5.2.2 Rapid Prototyping;325
11.2.3;5.2.3 Hardware-Assisted Verification;327
11.3;5.3 Preparing Effective Simulation and Test Vectors;328
11.3.1;5.3.1 A First Glimpse at VLSI Testing;328
11.3.2;5.3.2 Fully Automated Response Checking is a Must;329
11.3.3;5.3.3 Assertion-Based Verification Checks from Within;331
11.3.4;5.3.4 Exhaustive Verification Remains an Elusive Goal;336
11.3.5;5.3.5 Directed Verification is Indispensable but has its Limitations;337
11.3.5.1;Testing distinct functional mechanisms separately;337
11.3.5.2;Monitoring toggle counts is of limited use;340
11.3.5.3;Automatic test pattern generation does not help either;340
11.3.5.4;Monitoring code coverage helps but does not suffice;340
11.3.5.5;Routine is the dark side of experience;342
11.3.5.6;Even real-world data sometimes prove too forgiving;342
11.3.6;5.3.6 Directed Random Verification Guards Against Human Omissions;343
11.3.7;5.3.7 Statistical Coverage Analysis Produces Meaningful Metrics;348
11.3.8;5.3.8 Collecting Test Cases from Multiple Sources Helps;349
11.3.9;5.3.9 Separating Test Development from Circuit Design Helps;350
11.4;5.4 Consistency and Efficiency Considerations;352
11.4.1;5.4.1 A Coherent Schedule for Simulation and Test;353
11.4.2;5.4.2 Protocol Adapters Help Reconcile Different Views on Data and Latency;356
11.4.3;5.4.3 Calculating High-Level Figures of Merit;359
11.4.4;5.4.4 Patterning Simulation Set-Ups after the Target System;360
11.4.5;5.4.5 Initialization;361
11.4.6;5.4.6 Trimming Run Times by Skipping Redundant Simulation Sequences;361
11.5;5.5 Testbench Coding and HDL Simulation;362
11.5.1;5.5.1 Modularity and Reuse are the Keys to Testbench Design;362
11.5.2;5.5.2 Anatomy of a File-Based Testbench;363
11.5.2.1;All disk files stored in ASCII format;364
11.5.2.2;Separate processes for stimulus application and for response acquisition;364
11.5.2.3;Stimuli and responses collected in records;364
11.5.2.4;Simulation to proceed even after expected responses have been exhausted;365
11.5.2.5;Stoppable clock generator;365
11.5.2.6;Reset treated as an ordinary stimulus bit;365
11.6;5.6 Conclusions;367
11.7;5.7 Problems;368
11.8;5.8 Appendix I: Formal Approaches to Functional Verification;371
11.8.1;Equivalence checking;371
11.8.2;Model checking;371
11.8.3;Deductive verification or model proving;372
11.9;5.9 Appendix II: Deriving a Coherent Schedule;373
11.9.1;External timing requirements imposed by a model under test (MUT);373
11.9.2;Precedence relations captured in a constraint graph;374
11.9.3;Solving the constraint graph;375
11.9.4;Anceau diagrams help visualize periodic events and timing;375
12;Chapter 6: The Case for Synchronous Design;378
12.1;6.1 Introduction;378
12.2;6.2 The Grand Alternatives for Regulating State Changes;380
12.2.1;6.2.1 Synchronous Clocking;381
12.2.2;6.2.2 Asynchronous Clocking;381
12.2.3;6.2.3 Self-Timed Clocking;382
12.3;6.3 Why a Rigorous Approach to Clocking is Essential in VLSI;385
12.3.1;6.3.1 The Perils of Hazards;385
12.3.2;6.3.2 The Pros and Cons of Synchronous Clocking;386
12.3.3;6.3.3 Clock-as-Clock-Can is not an Option in VLSI;388
12.3.4;6.3.4 Fully Self-Timed Clocking is not Normally an Option Either;389
12.3.5;6.3.5 Hybrid Approaches to System Clocking;389
12.4;6.4 The Dos and Don'ts of Synchronous Circuit Design;391
12.4.1;6.4.1 First Guiding Principle: Dissociate Signal Classes!;391
12.4.2;6.4.2 Second Guiding Principle: Allow for Circuits to Settle Before Clocking!;392
12.4.3;6.4.3 Synchronous Design Rules at a More Detailed Level;393
12.4.3.1;Unclocked bistables prohibited;393
12.4.3.2;Zero-latency loops prohibited;394
12.4.3.3;Monoflops, one-shots, edge detectors and clock chopping prohibited;395
12.4.3.4;Clock and reset signals to be distributed by fanout trees;396
12.4.3.5;Beware of unsafe clock gates;396
12.4.3.6;No gating of reset signals;396
12.4.3.7;Bistables with both asynchronous reset and preset inputs prohibited;398
12.4.3.8;Reset signals to be properly conditioned;398
12.4.3.9;Pay attention to portable design;400
12.5;6.5 Conclusions;401
12.6;6.6 Problems;402
12.7;6.7 Appendix: On Identifying Signals;403
12.7.1;6.7.1 Signal Class;403
12.7.1.1;Colors in schematic diagrams and HDL source code;403
12.7.1.2;Clock symbols and clock domains in schematic diagrams;404
12.7.2;6.7.2 Active Level;405
12.7.2.1;Naming of complementary signals;405
12.7.2.2;Inversion symbols in schematic diagrams;405
12.7.3;6.7.3 Signaling Waveforms;406
12.7.4;6.7.4 Three-State Capability;407
12.7.5;6.7.5 Inputs, Outputs and Bidirectional Ports;407
12.7.6;6.7.6 Present State vs. Next State;408
12.7.7;6.7.7 Signal Naming Convention Syntax;408
12.7.8;6.7.8 Usage of Upper and Lower Case Letters in HDL Source Code;409
12.7.9;6.7.9 A Note on the Portability of Names Across EDA Platforms;410
13;Chapter 7: Clocking of Synchronous Circuits;412
13.1;7.1 What is the Difficulty With Clock Distribution?;412
13.1.1;7.1.1 Agenda;413
13.1.2;7.1.2 Timing Quantities Related to Clock Distribution;414
13.2;7.2 How Much Skew and Jitter Does a Circuit Tolerate?;415
13.2.1;7.2.1 Basics;415
13.2.2;7.2.2 Single-Edge-Triggered One-Phase Clocking;417
13.2.2.1;Hardware resources and operation principle;417
13.2.2.2;Detailed analysis;418
13.2.2.3;Setup condition;418
13.2.2.4;Hold condition;420
13.2.2.5;Implications;420
13.2.3;7.2.3 Dual-Edge-Triggered One-Phase Clocking;423
13.2.3.1;Hardware resources and operation principle;423
13.2.3.2;Implications;424
13.2.3.3;Scan-type testing requires the presence of shift registers;421
13.2.3.4;Watch out when mixing cells from different libraries;422
13.2.3.5;Hold time fixing;423
13.2.4;7.2.4 Symmetric Level-Sensitive Two-Phase Clocking;425
13.2.4.1;Hardware resources and operation principle;425
13.2.4.2;Detailed analysis;426
13.2.4.3;Setup condition;426
13.2.4.4;Hold condition;427
13.2.4.5;Implications;427
13.2.5;7.2.5 Unsymmetric Level-Sensitive Two-Phase Clocking;428
13.2.5.1;Hardware resources and operation principle;428
13.2.5.2;Detailed analysis;429
13.2.5.3;Setup condition;429
13.2.5.4;Hold condition;429
13.2.5.5;Implications;430
13.2.6;7.2.6 Single-Wire Level-Sensitive Two-Phase Clocking;432
13.2.6.1;Hardware resources and operation principle;432
13.2.6.2;Detailed analysis;432
13.2.6.3;Setup condition;435
13.2.6.4;Hold condition;435
13.2.6.5;Implications;433
13.2.7;7.2.7 Level-Sensitive One-Phase Clocking and Wave Pipelining;433
13.2.7.1;Hardware resources and operation principle;433
13.2.7.2;Detailed analysis;435
13.2.7.3;Implications;435
13.3;7.3 How to Keep Clock Skew Within Tight Bounds;437
13.3.1;7.3.1 Clock Waveforms;437
13.3.2;7.3.2 Collective Clock Buffers;438
13.3.3;7.3.3 Distributed Clock Buffer Trees;440
13.3.4;7.3.4 Hybrid Clock Distribution Networks;442
13.3.5;7.3.5 Clock Skew Analysis;442
13.4;7.4 How to Achieve Friendly Input/Output Timing;444
13.4.1;7.4.1 Friendly as Opposed to Unfriendly I/O Timing;444
13.4.2;7.4.2 Impact of Clock Distribution Delay on I/O Timing;445
13.4.3;7.4.3 Impact of PTV Variations on I/O Timing;447
13.4.4;7.4.4 Registered Inputs and Outputs;448
13.4.5;7.4.5 Adding Artificial Contamination Delay to Data Inputs;448
13.4.6;7.4.6 Driving Input Registers From an Early Clock;449
13.4.7;7.4.7 Clock Tapped From Slowest Component in Clock Domain;449
13.4.8;7.4.8 "Zero-Delay'' Clock Distribution by Way of a DLL or PLL;450
13.5;7.5 How to Implement Clock Gating Properly;453
13.5.1;7.5.1 Traditional Feedback-Type Registers with Enable;453
13.5.2;7.5.2 A Crude and Unsafe Approach to Clock Gating;454
13.5.3;7.5.3 A Simple Clock Gating Scheme that May Work Under Certain Conditions;455
13.5.4;7.5.4 Safe Clock Gating Schemes;455
13.6;7.6 Summary;459
13.7;7.7 Problems;462
14;Chapter 8: Acquisition of Asynchronous Data;466
14.1;8.1 Motivation;466
14.2;8.2 Data Consistency in Vectored Acquisition;468
14.2.1;8.2.1 Plain Bit-Parallel Synchronization;468
14.2.2;8.2.2 Unit-Distance Coding;469
14.2.3;8.2.3 Suppression of Jumbled Data Patterns;470
14.2.4;8.2.4 Handshaking;471
14.2.4.1;NRZ or two-phase handshake protocol;472
14.2.4.2;RZ or four-phase handshake protocol;473
14.2.5;8.2.5 Partial Handshaking;474
14.2.6;8.2.6 FIFO Synchronizers;475
14.3;8.3 Data Consistency in Scalar Acquisition;478
14.3.1;8.3.1 No Synchronization Whatsoever;478
14.3.2;8.3.2 Synchronization at Multiple Places;478
14.3.3;8.3.3 Synchronization at a Single Place;479
14.3.4;8.3.4 Synchronization From a Slow Clock;479
14.4;8.4 Marginal Triggering and Metastability;481
14.4.1;8.4.1 Metastability and How it Becomes Manifest;481
14.4.2;8.4.2 Repercussions on Circuit Functioning;484
14.4.3;8.4.3 A Statistical Model for Estimating Synchronizer Reliability;485
14.4.4;8.4.4 Plesiochronous Interfaces;487
14.4.5;8.4.5 Containment of Metastable Behavior;487
14.4.5.1;Estimate reliability at the system level;488
14.4.5.2;Select flip-flops with good metastability resolution;488
14.4.5.3;Remove combinational delays from synchronizers;489
14.4.5.4;Drive synchronizers with fast-switching clock;489
14.4.5.5;Free synchronizers from unnecessary loads;489
14.4.5.6;Lower clock frequency at the consumer end;489
14.4.5.7;Use multi-stage synchronizers;490
14.4.5.8;Keep feedback path within synchronizers short;490
14.5;8.5 Summary;491
14.6;8.6 Problems;492
15;Appendix A: Elementary Digital Electronics;494
15.1;A.1 Introduction;494
15.1.1;A.1.1 Common Number Representation Schemes;494
15.1.2;A.1.2 Floating Point Number Formats;496
15.1.3;A.1.3 Notational Conventions for Two-Valued Logic;498
15.2;A.2 Theoretical Background of Combinational Logic;499
15.2.1;A.2.1 Truth Table;499
15.2.2;A.2.2 The n-Cube;500
15.2.3;A.2.3 Karnaugh Map;500
15.2.4;A.2.4 Program Code;500
15.2.5;A.2.5 Logic Equations;501
15.2.6;A.2.6 Two-Level Logic;503
15.2.6.1;Sum-of-products;503
15.2.6.2;Product-of-sums;503
15.2.6.3;Other two-level logic forms;503
15.2.7;A.2.7 Multi-Level Logic;504
15.2.8;A.2.8 Symmetric and Monotone Functions;505
15.2.9;A.2.9 Threshold Functions;506
15.2.10;A.2.10 Complete Gate Sets;506
15.2.11;A.2.11 Multi-Output Functions;507
15.2.12;A.2.12 Logic Minimization;508
15.2.12.1;Metrics for logic complexity and implementation costs;508
15.2.12.2;Minimal versus unredundant expressions;509
15.2.12.3;Multi-level versus two-level logic;510
15.2.12.4;Multi-output versus single-output minimization;510
15.2.12.5;Manual versus automated logic optimization;511
15.3;A.3 Circuit Alternatives for Implementing Combinational Logic;512
15.3.1;A.3.1 Random Logic;512
15.3.2;A.3.2 Programmable Logic Array (PLA);512
15.3.3;A.3.3 Read-Only Memory (ROM);514
15.3.4;A.3.4 Array Multiplier;514
15.3.5;A.3.5 Digest;515
15.4;A.4 Bistables and Other Memory Circuits;517
15.4.1;A.4.1 Flip-Flops or Edge-Triggered Bistables;518
15.4.1.1;The data or D-type flip-flop;518
15.4.1.2;Initialization facilities;518
15.4.1.3;Scan facility;519
15.4.1.4;Enable/disable facility;520
15.4.1.5;The toggle or T-type flip-flop;520
15.4.1.6;The nostalgia or JK-type flip-flop;521
15.4.2;A.4.2 Latches or Level-Sensitive Bistables;521
15.4.2.1;The data or D-type latch;521
15.4.3;A.4.3 Unclocked Bistables;522
15.4.3.1;The SR-seesaw;522
15.4.3.2;The edge-triggered SR-seesaw;523
15.4.3.3;The Muller-C element;524
15.4.3.4;The mutual exclusion element;525
15.4.4;A.4.4 Random Access Memories (RAM);527
15.5;A.5 Transient Behavior of Logic Circuits;529
15.5.1;A.5.1 Glitches, a Phenomenological Perspective;529
15.5.2;A.5.2 Function Hazards, a Circuit-Independent Mechanism;530
15.5.3;A.5.3 Logic Hazards, a Circuit-Dependent Mechanism;531
15.5.4;A.5.4 Digest;533
15.6;A.6 Timing Quantities;534
15.6.1;A.6.1 Delay Parameters Serve for Combinational and Sequential Circuits;534
15.6.2;A.6.2 Timing Conditions Get Imposed by Sequential Circuits Only;536
15.6.3;A.6.3 Secondary Timing Quantities are Derived From Primary Ones;538
15.6.4;A.6.4 Timing Constraints Address Synthesis Needs;539
15.7;A.7 Basic Microprocessor Input/Output Transfer Protocols;540
15.8;A.8 Summary;542
16;Appendix B: Finite State Machines;544
16.1;B.1 Abstract Automata;544
16.1.1;B.1.1 Mealy Machine;545
16.1.2;B.1.2 Moore Machine;546
16.1.3;B.1.3 Medvedev Machine;547
16.1.4;B.1.4 Relationships Between Finite State Machine Models;548
16.1.4.1;Equivalence of Mealy and Moore machines in the context of automata theory;548
16.1.4.2;Equivalence of Mealy and Moore machines in the context of hardware design;550
16.1.4.3;Equivalence of Moore and Medvedev machines;550
16.1.5;B.1.5 Taxonomy of Finite State Machines;552
16.1.6;B.1.6 State Reduction;553
16.2;B.2 Practical Aspects and Implementation Issues;555
16.2.1;B.2.1 Parasitic States and Symbols;555
16.2.2;B.2.2 Mealy-, Moore-, Medvedev-Type, and Combinational Output Bits;557
16.2.3;B.2.3 Through Paths and Logic Instability;558
16.2.4;B.2.4 Switching Hazards;559
16.2.5;B.2.5 Hardware Costs;560
16.2.5.1;Concurrency, hierarchy and modularity are key to efficiency;560
16.2.5.2;State reduction;561
16.2.5.3;State encoding;562
16.3;B.3 Summary;563
17;Appendix C: Symbols and Constants;564
17.1;C.1 Abbreviations;564
17.2;C.2 Mathematical Symbols;565
17.3;C.3 Physical and Material Constants;569
17.3.1;A note on carbon allotropes;569
18;Bibliography;574
19;Index;586
Field-Programmable Logic
Abstract
What makes Field-Programmable Logic (FPL) attractive in many applications are its short turnaround times and low up-front costs. The hundreds of variations may be confusing, however, and our text explains the key commonalities and differences between popular device families. The configuration technology (SRAM, flash, antifuse) defines whether a device is reconfigurable or one-time programmable, and whether the configuration persists after power down. How hardware resources are organized internally sets apart Field-Programmable Gate Arrays (FPGA) from Complex Programmable Logic Devices (CPLD). Extra circuits such as block RAMs, configurable datapath units, hardwired dedicated building blocks, and embedded microprocessor cores further differentiate products. Front-end design follows a flow that is similar to that of ASICs, but that requires taking into account the particularities and limitations of the platform targeted.
Keywords
Field-programmable Logic (FPL)
Field-Programmable Gate Array (FPGA)
Complex Programmable Logic Devices (CPLD)
FPL design flow
Xilinx
What makes field-programmable logic (FPL) attractive in many applications are its low up-front costs and short turnaround times. That it should be possible to turn a finished piece of silicon into an application-specific circuit by purely electrical means — i.e. with no bespoke photomasks or wafer processing steps — may seem quite surprising at first, and section 2.1 demonstrates the basic approach. Sections from 2.2 onwards then give technical details and explain the major differences that set the various product families apart from one another, before the particularities of the FPL design flow are discussed in section 2.6.
2.1 General idea
The term “programmable” in field-programmable logic is a misnomer as there is no program, no instruction sequence to execute. Instead, pre-manufactured subcircuits get configured into the target circuit via electrically programmable links that can be established, and in many cases also undone, as dictated by so-called configuration bits. This is nicely illustrated in figs.2.1 to 2.3 from [8] (copyright Wiley-VCH Verlag GmbH & Co. KG, reprinted with permission).
Key properties of any FPL device depend on decisions taken by its developers along two dimensions. A first choice refers to how the device is actually being configured and how its configuration is stored electrically, while a second choice is concerned with the overall organization of the hardware resources made available to customers.1
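What such configuration bits accomplish can be sketched in software. The model below (plain Python, not from the book) mimics the lookup tables found in SRAM-based devices: a tiny memory whose stored contents define the logic function, while the circuit inputs merely select one stored bit. Configuring the same pre-manufactured hardware as an AND gate or an XOR gate is just a matter of storing different bits.

```python
# A minimal sketch of how configuration bits turn a generic,
# pre-manufactured subcircuit into application-specific logic.
# A k-input lookup table (LUT) holds 2**k configuration bits;
# the inputs address the table, they do not execute anything.

def make_lut(config_bits):
    """Return the combinational function defined by a stored truth table."""
    def lut(*inputs):
        # Interpret the input pattern as an address into the table.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return config_bits[address]
    return lut

# Same hardware, different configuration bits, different circuit.
and2 = make_lut([0, 0, 0, 1])   # truth table of a 2-input AND
xor2 = make_lut([0, 1, 1, 0])   # truth table of a 2-input XOR

assert and2(1, 1) == 1 and and2(1, 0) == 0
assert xor2(1, 0) == 1 and xor2(1, 1) == 0
```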
2.2 Configuration technologies
Three configuration technologies coexist today; all of them have their roots in memory technology.
2.2.1 Static memory
The key element here is an electronic switch — such as a transmission gate, a pass transistor, or a three-state buffer — that gets turned “on” or “off” under control of a configuration bit. Unlimited reprogrammability is obtained from storing the configuration data in static memory (SRAM) cells or in similar on-chip subcircuits built from two cross-coupled inverters, see fig.2.4a.
Reconfigurability is very helpful for debugging. It permits one to probe inner nodes, to alternate between normal operation and various diagnostic modes, and to patch a design once a flaw has been located. Many RAM-based FPL devices further allow for reconfiguring their inner logic during operation, a capability known as in-system configuration (ISC) that opens a door towards reconfigurable computing.
As a major drawback of SRAM-based storage, an FPL device must (re-)obtain the entire configuration — the settings of all its programmable links — from outside whenever it is being powered up. The problem is solved in one of three possible ways, namely
(a) by reading from a dedicated bit-serial or bit-parallel off-chip ROM,
(b) by downloading a bit stream from a host computer, or
(c) by long-term battery backup.
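Option (a) can be sketched in a few lines. The model below is a deliberate simplification with assumed details (a single configuration chain, one bit shifted in per clock tick); real devices add framing, addressing, and integrity checks on top of this basic mechanism.

```python
# A simplified sketch of power-up configuration from a serial ROM:
# the device clocks the bit stream into a long chain of SRAM
# configuration cells, modeled here as a software shift register.

def load_configuration(bitstream, chain_length):
    """Shift a serial bit stream into a configuration chain."""
    chain = [0] * chain_length
    for bit in bitstream:
        # Each clock tick pushes one new bit in at the head and
        # shifts all previously loaded bits one position along.
        chain = [bit] + chain[:-1]
    return chain

# After shifting in as many bits as the chain holds, the first bit
# read from the ROM has travelled to the far end of the chain.
print(load_configuration([1, 0, 1, 1], 4))
```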
2.2.2 Flash memory
Flash memories rely on special MOSFETs where a second gate electrode is sandwiched between the transistor's bulk material underneath and a control gate above, see fig.2.4b. The name floating gate captures the fact that this gate is entirely surrounded by dielectric material. An electrical charge trapped there determines whether the MOSFET, and hence the programmable link too, is “on” or “off”.2
Charging occurs by way of hot electron injection from the channel. That is, a strong lateral field applied between source and drain accelerates electrons to the point where they get injected through the thin dielectric layer into the floating gate, see fig.2.5a. The necessary programming voltage on the order of 5 to 20 V is typically generated internally by an on-chip charge pump.
Erasure occurs by allowing the electrons trapped on the floating gate to tunnel through the oxide layer underneath the floating gate. The secret is a quantum-mechanical effect known as Fowler-Nordheim tunneling that comes into play when a strong vertical field (8 … 10 MV/cm or so) is applied across the gate oxide.
Flash FPL devices are non-volatile and immediately live at power-up, thereby doing away with the need for any kind of configuration-backup apparatus. The fact that erasure must occur in chunks, that is to say many bits at a time, is perfectly adequate in the context of FPL. Data retention times vary between 10 and 40 years. The endurance of flash FPL is typically specified at 100 to 1000 configure-erase cycles, which is much less than for flash memory chips.
2.2.3 Antifuses
Fuses, which were used in earlier bipolar PROMs and SPLDs, are narrow bridges of conducting material that blow in a controlled fashion when a programming current is forced through. Antifuses, such as those employed in today's FPGAs, are thin dielectrics separating two conducting layers that are made to rupture upon applying a programming voltage, thereby establishing a conductive path of low impedance.
In either case, programming is permanent. Whether this is desirable or not depends on the application. Full factory testing prior to programming of one-time programmable links is impossible for obvious reasons. Special circuitry is incorporated to test the logic devices and routing tracks at the manufacturer before the unprogrammed devices are shipped. On the other hand, antifuses are only about the size of a contact or via and, therefore, allow for higher densities than reprogrammable links, see fig.2.4c and d. Antifuse-based FPL is also less sensitive to radiation effects, offers superior protection against unauthorized cloning, and does not need to be configured following power-up.
Table 2.1 FPL configuration technologies compared
2.3 Organization of hardware resources
2.3.1 Simple programmable logic devices (SPLD)
Historically, FPL has evolved from purely combinational devices with just one or two programmable levels of logic such as ROMs, PALs and PLAs. Flip-flops and local feedback paths were added later to allow for the construction of finite state machines, see fig.2.6a and b. Products of this kind continue to be commercially available for glue logic applications. Classic SPLD examples include the 18P8 (combinational) and the 22V10 (sequential).
The rigid two-level-logic-plus-register architecture and the scanty resources (number of inputs, outputs, product terms, and flip-flops) naturally restrict SPLDs to small applications. More powerful architectures thus had to be sought, and the spectacular progress of VLSI technology has made their implementation economically feasible from the late 1980s onwards.
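The two-level structure just described is easy to model. The sketch below (an illustrative Python model, not taken from the book) evaluates one PAL-like output: a programmable AND plane forms product terms from true or complemented inputs, and an OR plane sums them.

```python
# A minimal model of SPLD two-level logic. Each product term is a
# dict mapping an input index to the required polarity: 1 selects
# the true literal, 0 the complemented literal. Inputs absent from
# the dict are "don't care" for that term, i.e. their fuses are blown.

def spld_output(product_terms, inputs):
    """Evaluate one sum-of-products output of a PAL-like device."""
    for term in product_terms:
        if all(inputs[i] == polarity for i, polarity in term.items()):
            return 1   # any satisfied product term drives the OR high
    return 0

# Example function f = a·b + a'·c over inputs (a, b, c).
terms = [{0: 1, 1: 1},   # product term a·b
         {0: 0, 2: 1}]   # product term a'·c

assert spld_output(terms, (1, 1, 0)) == 1   # a·b fires
assert spld_output(terms, (0, 0, 1)) == 1   # a'·c fires
assert spld_output(terms, (0, 1, 0)) == 0   # no term fires
```

The fixed, shallow depth of this structure is exactly why SPLDs are fast but run out of steam as functions grow: every output must fit into a small, fixed number of such product terms.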
2.3.2 Complex programmable logic devices (CPLD)
CPLDs simply followed the motto “more of the same”, see fig.2.6c. Many identical subcircuits, each of which conforms to a classic SPLD, are combined on a single chip together with a large programmable interconnect matrix or network. A difficulty...




