Introduction to asynchronous circuit design: specification and synthesis Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky,

Slides:



Advertisements
Similar presentations
Delay models (I) A B C Real (analog) behaviorAbstract behavior A B C Abstractions are necessary to define delay models manageable for design, synthesis.
Advertisements

1 Bridging the gap between asynchronous design and designers Hao Zheng.
Andrey Mokhov, Victor Khomenko Danil Sokolov, Alex Yakovlev Dual-Rail Control Logic for Enhanced Circuit Robustness.
Combinational Logic.
Reading1: An Introduction to Asynchronous Circuit Design Al Davis Steve Nowick University of Utah Columbia University.
1 BalsaOpt a tool for Balsa Synthesis Francisco Fernández-Nogueira, UPC (Spain) Josep Carmona, UPC (Spain)
1 Logic Design of Asynchronous Circuits Jordi Cortadella Jim Garside Alex Yakovlev Univ. Politècnica de Catalunya, Barcelona, Spain Manchester University,
1 Advanced Digital Design Synthesis of Control Circuits by A. Steininger and J. Lechner Vienna University of Technology.
Jordi Cortadella, Universitat Politecnica de Catalunya, Barcelona Mike Kishinevsky, Intel Corp., Strategic CAD Labs, Hillsboro.
ECE 551 Digital System Design & Synthesis Lecture 09 Synthesis of Common Verilog Constructs.
Synchronous Digital Design Methodology and Guidelines
Hazard-free logic synthesis and technology mapping I Jordi Cortadella Michael Kishinevsky Alex Kondratyev Luciano Lavagno Alex Yakovlev Univ. Politècnica.
Hardware and Petri nets Synthesis of asynchronous circuits from Signal Transition Graphs.
Direct synthesis of large-scale asynchronous controllers using a Petri-net-based approach Ivan BlunnoPolitecnico di Torino Alex BystrovUniv. Newcastle.
Logic Synthesis for Asynchronous Circuits Based on Petri Net Unfoldings and Incremental SAT Victor Khomenko, Maciej Koutny, and Alex Yakovlev University.
Hardware and Petri nets: application to asynchronous circuit design Jordi CortadellaUniversitat Politècnica de Catalunya, Spain Michael KishinevskyIntel.
Formal Verification of Safety Properties in Timed Circuits Marco A. Peña (Univ. Politècnica de Catalunya) Jordi Cortadella (Univ. Politècnica de Catalunya)
Introduction to asynchronous circuit design: specification and synthesis Part IV: Synthesis from HDL Other synthesis paradigms.
© Ran GinosarAsynchronous Design and Synchronization 1 VLSI Architectures Lecture 2: Theoretical Aspects (S&F 2.5) Data Flow Structures.
Introduction to asynchronous circuit design: specification and synthesis Part III: Advanced topics on synthesis of control circuits from STGs.
1 Logic design of asynchronous circuits Part II: Logic synthesis from concurrent specifications.
Asynchronous Sequential Logic
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
Handshake protocols for de-synchronization I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin and C. Sotiriou Politecnico di Torino, Italy Universitat.
Introduction to asynchronous circuit design: specification and synthesis Part II: Synthesis of control circuits from STGs.
COMP Clockless Logic and Silicon Compilers Lecture 3
Combining Decomposition and Unfolding for STG Synthesis (application paper) Victor Khomenko 1 and Mark Schaefer 2 1 School of Computing Science, Newcastle.
ECE Synthesis & Verification1 ECE 667 Spring 2011 Synthesis and Verification of Digital Systems Verification Introduction.
1 Logic synthesis from concurrent specifications Jordi Cortadella Universitat Politecnica de Catalunya Barcelona, Spain In collaboration with M. Kishinevsky,
1 Clockless Logic Prof. Montek Singh Feb. 3, 2004.
Asynchronous Interface Specification, Analysis and Synthesis M. Kishinevsky Intel Corporation J. Cortadella Technical University of Catalonia.
1 Logic design of asynchronous circuits Part III: Advanced topics on synthesis.
Jordi Cortadella, Universitat Politècnica de Catalunya, Spain
Visualisation and Resolution of Coding Conflicts in Asynchronous Circuit Design A. Madalinski, V. Khomenko, A. Bystrov and A. Yakovlev University of Newcastle.
Bridging the gap between asynchronous design and designers Part II: Logic synthesis from concurrent specifications.
1 Bridging the gap between asynchronous design and designers Thanks to Jordi Cortadella, Luciano Lavagno, Mike Kishinevsky and many others.
Resolution of Encoding Conflicts by Signal Insertion and Concurrency Reduction based on STG Unfoldings V. Khomenko, A. Madalinski and A. Yakovlev University.
STG-based synthesis and Petrify J. Cortadella (Univ. Politècnica Catalunya) Mike Kishinevsky (Intel Corporation) Alex Kondratyev (University of Aizu) Luciano.
Lab for Reliable Computing Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures Singh, M.; Theobald, M.; Design, Automation.
1 State Encoding of Large Asynchronous Controllers Josep Carmona and Jordi Cortadella Universitat Politècnica de Catalunya Barcelona, Spain.
Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions Jordi Cortadella, University Politècnica de Catalunya.
1 Petrify: Method and Tool for Synthesis of Asynchronous Controllers and Interfaces Jordi Cortadella (UPC, Barcelona, Spain), Mike Kishinevsky (Intel Strategic.
Automatic synthesis and verification of asynchronous interface controllers Jordi CortadellaUniversitat Politècnica de Catalunya, Spain Michael KishinevskyIntel.
Derivation of Monotonic Covers for Standard C Implementation Using STG Unfoldings Victor Khomenko.
1 Bridging the gap between asynchronous design and designers Peter A. BeerelFulcrum Microsystems, Calabasas Hills, CA, USA Jordi CortadellaUniversitat.
Asynchronous Circuit Verification and Synthesis with Petri Nets J. Cortadella Universitat Politècnica de Catalunya, Barcelona Thanks to: Michael Kishinevsky.
Lecture 11 MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines.
Automated synthesis of micro-pipelines from behavioral Verilog HDL Ivan BlunnoPolitecnico di Torino Luciano LavagnoUniversità di Udine.
1 Clockless Logic: Dynamic Logic Pipelines (contd.)  Drawbacks of Williams’ PS0 Pipelines  Lookahead Pipelines.
Sequential Circuits Chapter 4 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S.
Digital Computer Design Fundamental
A Usable Reachability Analyser Victor Khomenko Newcastle University.
MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University,
Reading1: An Introduction to Asynchronous Circuit Design Al Davis Steve Nowick University of Utah Columbia University.
Curtis A. Nelson 1 Technology Mapping of Timed Circuits Curtis A. Nelson University of Utah September 23, 2002.
1 Bridging the gap between asynchronous design and designers Peter A. BeerelFulcrum Microsystems, Calabasas Hills, CA, USA Jordi CortadellaUniversitat.
Digital Logic Design Basics Combinational Circuits Sequential Circuits Pu-Jen Cheng Adapted from the slides prepared by S. Dandamudi for the book, Fundamentals.
VADA Lab.SungKyunKwan Univ. 1 L5:Lower Power Architecture Design 성균관대학교 조 준 동 교수
ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU 99-1 Under-Graduate Project Design of Datapath Controllers Speaker: Shao-Wei Feng Adviser:
Specification mining for asynchronous controllers Javier de San Pedro† Thomas Bourgeat ‡ Jordi Cortadella† † Universitat Politecnica de Catalunya ‡ Massachusetts.
Synthesis from HDL Other synthesis paradigms
Asynchronous Interface Specification, Analysis and Synthesis
Synthesis of Speed Independent Circuits Based on Decomposition
Introduction Introduction to VHDL Entities Signals Data & Scalar Types
Part IV: Synthesis from HDL Other synthesis paradigms
Synthesis of asynchronous controllers from Signal Transition Graphs:
332:437 Lecture 8 Verilog and Finite State Machines
De-synchronization: from synchronous to asynchronous
Clockless Logic: Asynchronous Pipelines
332:437 Lecture 8 Verilog and Finite State Machines
Presentation transcript:

Introduction to asynchronous circuit design: specification and synthesis Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic, USA Luciano Lavagno, Università di Udine, Italy

Outline I: Introduction to basic concepts on asynchronous design II: Synthesis of control circuits from STGs III: Advanced topics on synthesis of control circuits from STGs IV: Synthesis from HDL and other synthesis paradigms Note: no references in the tutorial

Introduction to asynchronous circuit design: specification and synthesis Part I: Introduction to basic concepts on asynchronous circuit design

Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

Synchronous circuit RRRRCL CLK Implicit synchronization

Asynchronous circuit RRRRCL Explicit synchronization: Req/Ack handshakes Req Ack

Synchronous communication Clock edges determine the time instants where data must be sampled Data wires may glitch between clock edges (set-up/hold times must be satisfied) Data are transmitted at a fixed rate (clock frequency)

Dual rail Two wires per bit –“00” = spacer, “01” = 0, “10” = 1 n-bit data communication requires 2n wires Each bit is self-timed Other delay-insensitive codes exist

Bundled data Validity signal –Similar to an aperiodic local clock n-bit data communication requires n+1 wires Data wires may glitch when no valid Signaling protocols –level sensitive (latch) –transition sensitive (register): 2-phase / 4-phase

Example: memory read cycle Transition signaling, 4-phase Valid address Address Valid data Data AA DD

Example: memory read cycle Transition signaling, 2-phase Valid address Address Valid data Data AA DD

Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

Asynchronous modules Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible, e.g. by overlapping the return-to- zero phase of step i-1 with the evaluation phase of step i) Data INData OUT req inreq out ack inack out DATA PATH CONTROL startdone

Completion detection Cdone Completion detection tree

Asynchronous latches: C element C A B Z A B Z Z 1 0 Z Vdd Gnd A A A AB B B B Z Z Z

Dual-rail logic A.t A.f B.t B.f C.t C.f Dual-rail AND gate Valid behavior for monotonic environment

Differential cascode voltage switch logic start A.t B.t C.t A.fB.f C.f Z.tZ.f done 3-input AND/NAND gate

Bundled-data logic blocks delay startdone logic Conventional logic + matched delay

Micropipelines (Sutherland 89) LLLLlogic R in A out C C C C R out A in delay

Data-path / Control LLLLlogic R in R out CONTROL A in A out

Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

Control specification A+ B+ A- B- A B A input B output

Control specification A+ B+ A- B- A B

Control specification A+ B- A- B+ A B

Control specification A+ C- A- C+ A C B+ B- B C

Control specification A+ C- A- C+ A C B+ B- B C

Control specification C C Ri Ro Ai Ao Ri+ Ao+ Ri- Ao- Ro+ Ai+ Ro- Ai- Ri Ro Ao Ai FIFO cntrl

A simple filter: specification y := 0; loop x := READ (IN); WRITE (OUT, (x+y)/2); y := x; end loop R in A in A out R out IN OUT filter

A simple filter: block diagram xy + control R in A in R out A out RxRx AxAx RyRy AyAy RaRa AaAa IN OUT x and y are level-sensitive latches (transparent when R=1) + is a bundled-data adder (matched delay between R a and A a ) R in indicates the validity of IN After A in + the environment is allowed to change IN (R out,A out ) control a level-sensitive latch at the output

A simple filter: control spec. xy + control R in A in R out A out RxRx AxAx RyRy AyAy RaRa AaAa IN OUT R in + A in + R in - A in - Rx+Rx+ Ax+Ax+ Rx-Rx- Ax-Ax- Ry+Ry+ Ay+Ay+ Ry-Ry- Ay-Ay- Ra+Ra+ Aa+Aa+ Ra-Ra- Aa-Aa- R out + A out + R out - A out -

A simple filter: control impl. R in + A in + R in - A in - Rx+Rx+ Ax+Ax+ Rx-Rx- Ax-Ax- Ry+Ry+ Ay+Ay+ Ry-Ry- Ay-Ay- Ra+Ra+ Aa+Aa+ Ra-Ra- Aa-Aa- R out + A out + R out - A out - C R in A in RxRx AxAx RyRy AyAy AaAa RaRa A out R out

Control: observable behavior Rx+Rx+ R in + Ax+Ax+Ra+Ra+Aa+Aa+R out +A out +z+R out -A out -Ry+Ry+ Ry-Ry- Ay+Ay+ Rx-Rx-Ax-Ax- Ay-Ay- A in - A in + Ra-Ra- R in - Aa-Aa- z- C R in A in RxRx AxAx RyRy AyAy AaAa RaRa A out R out z

Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

Taking delays into account x+ x- y+ y- z+ z- x z y x’ z’ Delay assumptions: Environment: 3 times units Gates: 1 time unit events: x+  x’-  y+  z+  z’-  x-  x’+  z-  z’+  y-  time:

Taking delays into account x+ x- y+ y- z+ z- x z y x’ z’ Delay assumptions: unbounded delays events: x+  x’-  y+  z+  x-  x’+  y- time: very slow failure !

Gate vs wire delay models Gate delay model: delays in gates, no delays in wires Wire delay model: delays in gates and wires

Delay models for async. circuits Bounded delays (BD): realistic for gates and wires. –Technology mapping is easy, verification is difficult Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. –Technology mapping is more difficult, verification is easy Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires. –DI class (built out of basic gates) is almost empty Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks). –Formally, it is the same as speed independent –In practice, different synthesis strategies are used BD SI  QDI DI

Motivation (designer’s view) Modularity –Plug-and-play interconnectivity Reusability –IPs with abstract timing behaviors High-performance –Average-case performance (no worst-case delay synchronization) –No clock skew (local timing assumptions instead) Many interfaces are asynchronous –Buses, networks,...

Motivation (technology aspects) Low power –Automatic clock gating Electromagnetic compatibility –No peak currents around clock edges Robustness –High immunity to technology and environment variations (in-die variations, temperature, power supply,...)

Problems Concurrent models for specification –CSP, Petri nets,... Difficult to design –Hazards, synchronization Complex timing analysis –Difficult to estimate performance Difficult to test –No way to stop the clock

But we have some success stories... Philips AMULET microprocessors Sharp Intel (RAPPID) IBM (interlocked pipeline) Start-up companies: –Theseus Logic, Cogency, ADD...

Introduction to asynchronous circuit design: specification and synthesis Part II: Synthesis of control circuits from STGs

Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

x y z x+ x- y+ y- z+ z- Signal Transition Graph (STG) x y z

x y z x+ x- y+ y- z+ z-

x+ x- y+ y- z+ z- xyz 000 x+ 100 y+ z+ y x y+ z- 010 y-

xyz 000 x+ 100 y+ z+ y x y+ z- 010 y- Next-state functions

x z y

Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

VME bus Device LDS LDTACK D DSr DSw DTACK VME Bus Controller Data Transceiver Bus DSr LDS LDTACK D DTACK Read Cycle

STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller

Choice: Read and Write cycles DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK-DTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

Circuit synthesis Goal: –Derive a hazard-free circuit under a given delay model and mode of operation

Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller

Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+

Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS (DSr, DTACK, LDTACK, LDS, D)

QR (LDS+) QR (LDS-) Excitation / Quiescent Regions ER (LDS+) ER (LDS-) LDS- LDS+ LDS-

Next-state function 0  1 LDS- LDS+ LDS- 1  0 0  0 1 

Karnaugh map for LDS DTACK DSr D LDTACK DTACK DSr D LDTACK LDS = 0 LDS = /1?

Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

Concurrency reduction LDS- LDS+ LDS DSr+

Concurrency reduction LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+

State encoding conflicts LDS- LDTACK- LDTACK+ LDS

Signal Insertion LDS- LDTACK- D- DSr- LDTACK+ LDS+ CSC- CSC

Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

Complex-gate implementation Under what conditions does a hazard-free implementation exist?

Implementability conditions Consistency –Rising and falling transitions of each signal alternate in any trace Complete state coding (CSC) –Next-state functions correctly defined Persistency –No event can be disabled by another event (unless they are both inputs)

Implementability conditions Consistency + CSC + persistency There exists a speed-independent circuit that implements the behavior of the STG (under the assumption that any Boolean function can be implemented with one complex gate)

Persistency a- c+ b+b+ b+b+ a c b a c b is this a pulse ? Speed independence  glitch-free output behavior under any delay

Speed-independent implementations How can the implementability conditions –Consistency –Complete state coding –Persistency be satisfied? Standard circuit architectures: –Complex (hazard-free) gates –C elements with monotonic covers –“Standard” gates and latches

a+ b+ c+ d+ a- b- d- a+ c-a a+ b+ c+ a- b- c- a+ c- a- d- d+

a+ b+ c+ a- b- c- a+ c- a- d- d+ ab cd ER(d+) ER(d-)

ab cd a+ b+ c+ a- b- c- a+ c- a- d- d+ Complex gate

Implementation with C elements C R S z  S+  z+  S-  R+  z-  R-  S (set) and R (reset) must be mutually exclusive S must cover ER(z+) and must not intersect ER(z-)  QR(z-) R must cover ER(z-) and must not intersect ER(z+)  QR(z+)

ab cd a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d

a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d but...

a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d Assume that R=ac has an unbounded delay Starting from state 0000 (R=1 and S=0): a+ ; R- ; b+ ; a- ; c+ ; S+ ; d+ ; R+ disabled (potential glitch)

ab cd a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d Monotonic covers

C-based implementations C S R d C d a b c a b c d weak generalized C element (gC)

Synthesis exercise y- z-w- y+x+ z+ x- w y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ Derive circuits for signals x and z (complex gates and monotonic covers)

Synthesis exercise y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz Signal x

Synthesis exercise y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz Signal z

Introduction to asynchronous circuit design: specification and synthesis Part III: Advanced topics on synthesis of control circuits from STGs

Outline Logic decomposition –Hazard-free decomposition –Signal insertion –Technology mapping Optimization based on timing information –Relative timing –Timing assumptions and constraints –Automatic generation of timing assumptions

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

No Hazards a b c x 0 abcx b a c

Decomposition May Lead to Hazards abcx b a c+ a b z c x

Decomposition Acknowledgement Generating candidates Hazard-free signal insertion –Event insertion –Signal insertion

Global acknowledgement a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-

a b c z a b d y How about 2-input gates ? d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-

a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

a b c z a b d y 0 0 d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

c z d y a b d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

Strategy for logic decomposition Each decomposition defines a new internal signal Method: Insert new internal signals such that –After resynthesis, some large gates are decomposed –The new specification is hazard-free Generate candidates for decomposition using standard logic factorization techniques: –Algebraic factorization –Boolean factorization (boolean relations)

y- z-w- y+x+ z+ x- w+ Decomposition example y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wxyz

yz=1 yz= y- y+ x- x+ w+ w- z+ z- w- z- y+ x y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ C C x y x y w z x y z y z w z w z y

y- z-w- y+x+ z+ x- w+ s- s+ s- s+ s- s=1 s= y+ x- w+ z+ z x+ w- z- y+ x y+ z y-

s- s+ s- s=1 s= y+ x- w+ z+ z x+ w- z- y+ x y+ z C C x y x y w z x y z w z w z y s y-

C C x y x y w z x y z y z w z w z y yz=1yz= y- y+ x- x+ w+ w- z+ z- w- z- y+ x y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 1001

s- s+ s=1 s= x- w+ z x+ w- z- y+ x y+ z y- z-w- y+x+ z+ x- w+ s- s+ z- is delayed by the new transition s- !

C C x y x y w z x y z w z w z yyyyyyy s- s+ s=1 s= x- w+ z x+ w- z- y+ x y+ z y-

F C Sr D Decomposition (Algebraic, Boolean relations) Hazard-free ? (Event insertion) NO YES C C C C Sr D D

F C D Hazard-free ? (Event insertion) NO YES C C Sr D until no more progress Decomposition (Algebraic, Boolean relations)

Signal insertion for function F State Graph F=0F=1 Insertion by input borders F- F+

Event insertion a b ER(x) c

Event insertion a b ER(x) c x x x x b SR(x) a

Properties to preserve a a b b a a b b a a b b x a a b b a a b b b a a b b x x a is persistent a is disabled by b = hazards

Boolean decomposition F x1x1 xnxn f HG x1x1 xnxn h1h1 hmhm f f = F (x 1,…,x n )f = G(H(x 1,…,x n )) Our problem: Given F and G, find H

C h1h1 h2h2 f state f next(f) (h 1,h 2 ) s (0,-) (-,0) s (1,1) s (0,0) s (-,1) (1,-) dc - - (-,-) This is a Boolean Relation

y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d F Rs y R S

y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs y a c d c d

y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya

y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya D d c

Technology mapping Merging small gates into larger gates introduces no new hazards Standard synchronous technique can be applied, e.g. BDD-based boolean matching Handles sequential gates and combinational feedbacks Due to hazards there is no guarantee to find correct mapping (some gates cannot be decomposed) Timing-aware decomposition can be applied in these rare cases

Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Designflow

Timing assumptions in design flow Speed-independent: wire delays after a fork smaller than fan-out gate delays Burst-mode: circuit stabilizes between two changes at the inputs Timed circuits: Absolute bounds on gate / environment delays are known a priori (before physical design)

Relative Timing Circuits Assumptions: “a before b” –for concurrent events: reduces reachable state space –for ordered events: permits early enabling –both increase don’t care space for logic synthesis => simplify logic (better area and timing) “Assume - if useful - guarantee” approach: assumptions are used by the tool to derive a circuit and required timing constraints that must be met in physical design flow Applied to design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K.Stevens, S.Rotem et al. Intel Corporation)

Speed-independent C-element Relative Timing Asynchronous Circuits a- before b- Timing assumption (on environment): a b c RT C-element: faster,smaller; correct only under timing constraint: a- before b- a b c

State Graph (Read cycle) DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+

Lazy Transition Systems ER (LDS+) ER (LDS-) LDS- LDS+ LDS- DTACK- FR (LDS-) Event LDS- is lazy: firing = subset of enabling

Timing assumptions (a before b) for concurrent events: concurrency reduction for firing and enabling (a before b) f or ordered events: early enabling (a simultaneous to b wrt c) for triples of events: combination of the above

Speed-independent Netlist LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

Adding timing assumptions (I) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map LDTACK- before DSr+ FAST SLOW

Adding timing assumptions (I) DTACK D DSr LDS LDTACK csc map LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+

State space domain LDTACK- before DSr+ LDTACK- DSr+

State space domain LDTACK- before DSr+ LDTACK- DSr+

State space domain LDTACK- before DSr+ LDTACK- DSr+ Two more unreachable states

Boolean domain DTACK DSr D LDTACK DTACK DSr D LDTACK LDS = 0 LDS = /1?

Boolean domain DTACK DSr D LDTACK DTACK DSr D LDTACK LDS = 0 LDS = One more DC vector for all signalsOne state conflict is removed

Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK LDTACK- before DSr+ TIMING CONSTRAINT

Timing assumptions (a before b) for concurrent events: concurrency reduction for firing and enabling (a before b) f or ordered events: early enabling (a simultaneous to b wrt c) for triples of events: combination of the above

Ordered events: early enabling a c b a a c b a b b c c F G Logic for gate c may change

Adding timing assumptions (II) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK D- before LDS-

State space domain LDS- D- Reachable space is unchanged For LDS- enabling can be changed in one state D- before LDS- Potential enabling for LDS- DSr-

Boolean domain DTACK DSr D LDTACK DTACK DSr D LDTACK LDS = 0 LDS =

Boolean domain DTACK DSr D LDTACK DTACK DSr D LDTACK LDS = 0 LDS = One more DC vector for one signal: LDS If used: LDS = DSr, otherwise: LDS = DSr + D

Before early enabling LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK

Netlist with two constraints LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+ and D- before LDS- TIMING CONSTRAINTS DTACK D DSr LDS LDTACK Both timing assumptions are used for optimization and become constraints

Rule I (out of 6): a,b - non-input events –Untimed ordering: a||b and a enabled before b, but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates: del(a) < del(c)+del(b) Deriving automatic timing assumptions aaa b b b c c

Rule I (out of 6): a,b - non-input events –Untimed ordering: (a||b) and (a enabled before b), but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates Deriving automatic timing assumptions aaa b b b c c –Effect I: a state becomes DC for all signals

Rule I (out of 6): a,b - non-input events –Untimed ordering: (a||b) and (a enabled before b), but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates Deriving automatic timing assumptions aaa b b b c c –Effect II: another state becomes local DC for signal of event b

Backannotation of Timing Constraints Timed circuits require post-verification Can synthesis tools help ? –Report the least stringent set of timing constraints required for the correctness of the circuit –Not all initial timing assumptions may be required Petrify reports a set of constraints for order of firing that guarantee the circuit correctness

Timing constraints generation a b c d e d d e e b b c c d a Assumptions: d before b and c before e and a before d

Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c d a

Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c Correct behavior d a

Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 2 Incorrect behavior d a

Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c {1, 3} d before b {1} d before c d a 5 {2, 4} c before e Other possible constraints remove states from assumption domain => invalid

Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c {1} d before c d a 5 {2, 4} c before e Constraints for the minimal cost solution: d before c and c before e

Timing aware state encoding Solve only state conflicts reachable in the RT assumptions domain Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic State variables inserted concurrently with I/O events => latency and cycle time reduction

Value of Relative Timing RT circuits provides up to 2-3x (1.3-2x) delay&area reduction with respect to SI circuits synthesized without (with) concurrency reduction Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable/better than manual Back-annotation of timing constraints => minimal required timing information for the back-end tools Timing-aware state encoding allows significant area/performance optimization

Specification (STG + user assumptions) Lazy State Graph Lazy SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis Timing-aware state encoding Boolean minimization Logic decomposition Technology mapping Design Flow with Timing Required Timing Constraints Automatic Timing Assumptions

FIFO example FIFO li lo ro ri li- li+ lo+ lo- ro+ ro- ri+ ri-

Speed-Independent Implementation without concurrency reduction 3 state signals are required

SI implementation with concurrency reduction li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- + gC + -

RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x-

RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- To satisfy the constraint: Delay(x- ) < Delay (ri+ ) and Delay(lo+) + Delay(x- ) < Delay(ro+ ) + Delay (ri+ ) All constraints are either satisfied by default or easy to satisfy by sizing

Introduction to asynchronous circuit design: specification and synthesis Part IV: Synthesis from HDL Other synthesis paradigms

Outline Synthesis from standard HDL (Verilog) [L. Lavagno et al Async00] –Subset for asynchronous specification –Data-path/control partitioning –Circuit architecture. Control generation Synthesis from asynchronous HDL (CSP, Tangram) –CSP for control generation [A. Martin et al, Caltech] –Tangram for silicon compilation [K. van Berkel et al, Philips] Control synthesis using FSMs [K. Yun, S. Nowick] –Burst-mode machines –Comparison with STGs Disclaimer: this is NOT a comprehensive review

Motivation Language-based design key enabler to synchronous logic success Use HDL as single language for specification logic simulation and debugging synthesis post-layout simulation HDL must support multiple levels of abstraction

Control-data partitioning Splitting of asynchronous control and synchronous data path Automated insertion of bundling delays CONTROL UNIT DATA PATH delay request acknowledge

Design flow Control/data splitting STG (control) HDL specification Synthesizable HDL (data) Synthesis (petrify) Timing analysis (Synopsys) HDL implementation Synthesis (Synopsys) Logic implementation Delay insertion Logic delays

Asynchronous Verilog subset by example always begin wait(start); R = SMP * 3; RES = SMP * 4 + R; if(RES[7] == 1) RES = 0; else begin if(RES[6] == 1) RES = 1; end; done = 1; wait(!start); done = 0; end R RESRES SMP donestart RES C.U. begin-end for sequencing, fork-join for concurrency, if- else for input choice Only structured mix of sequencing, concurrency and choice can be specified

Controller design flow Trace Expressions Circuit Petri Net Transformations Reductions Synthesis HDL Syntax-directed translation

Trace expressions: example ( a || ( b ; c) ) || (d e) || ;  a bc de

Reduction Example a fb c dg h e d;  a; ( b || f ) c g; h;  e

Transformation: concurrency reduction a fb c d ; || a bcdf ; ; Concurrency in TE: b and f have a common parallel father

a fb c d f and b are ordered ; || a bcdf ; ; ; Transformation: concurrency reduction

Synthesis Place-based encoding ( based on a David-cell approach) Transformations to improve area and performance Structural methods to derive a circuit [Pastor et al.] Transactions on CAD, Nov’98

Place-based encoding p1 p2 p3 p4 t1 t2 p3+ p1- p2- p4+ p3- t1 t2 p1+ p2+ p ER(t1) = 111- ER(t2) = --11

Synthesis example: VME bus p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- LDTACK+ D+ DTACK+ DSr- D- DTACK-LDS- LDTACK-DSr+ LDS+ Place encoding

VME bus spec after transforms p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- ldtack+ lds+d+ dtack+ dsr-p9+ d- lds-dtack- p9-ldtack- dsr+ Reductions Transforms

Deriving Next state function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ?

Deriving Next State function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? y = x + z

Conclusion Initial prototype of automated flow without state explosion for ASIC design –From HDLs (control / data splitting) –Existing tools for data-path synthesis –Direct synthesis guarantees implementation (HDL  Petri net, Petri-net-based encoding) –Synthesis of large controllers by efficient spec models (Free-choice Petri nets + trace expressions) –Exploration of the design space (optimization) by property-preserving transformations –Logic synthesis by structural methods Quality of design often acceptable Timing post-optimization can be applied

Synthesis from asynchronous HDL CSP based languages CSP = communicating sequential processes [T. Hoare] Two synthesis techniques –based on program transformations [Caltech] –based on direct compilation [Philips] Tools are more mature than for asynchronous synthesis from standard HDL Complete shift in design methodology is required

Using CSP for control generation After li goes high do full handshake at the right, then complete handshake at the left and iterate. li+ro+ri+ro-ri-lo+li-lo- ro ri li lo Q element *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] “;” = sequencing operator ro+ = ro goes high; ro- = ro goes low [li] = wait until li is high; [not li] = wait until li is low CSP: STG:

Using CSP for control generation *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] Conflict: ro+ and ro- are not mutually exclusive (since ri+ and li+ are not) Eliminate conflict by state signal insertion (= CSC) CSP: Production rules: li -> ro+; ri -> ro- not ri -> lo+; not li -> lo- ri li ro weak

Conflict elimination *[[li];ro+;[ri];x+;[x];ro-;[not ri];lo+;[not li];x-;[not x];lo-] CSP: Production rules: not x and li -> ro+; x or not li -> ro- x and not ri -> lo+; not x or ri -> lo- ri -> x+; not li -> x- FF x not x li lo ri ro

Conclusions Generating circuits from CSP control program is similar to STG synthesis One can be reduced to the other Particular technique may vary. Direct CSP program transformations can be (and were) used instead of methods based on state space generation See reference list for more details

Buffer example in Tangram (a?byte & b!byte) begin x0: var byte | forever do a?x0 ; b!x0 od end Buffer * x a b T ; T a b passive port active port Each circle mapped to a netlist Data path Q element

Summary Tangram program is partitioned into data path and control Data path is implemented as dual or single rail Control is mapped to composition of standard elements (“;” “||” etc) Each standard element is mapped to a circuit Post-optimization is done Composing islands of control elements and re-synthesis with STG can give more aggressive optimization Philips made a few chips using Tangram, including a product: 8051 micro-controller in low-power pager Muna (25 wks battery life from one AAA battery) Similar approach used in Balsa (Manchester Univ., public domain)

Burst mode FSM s1 s2 s3 s4 b-/x- a+b+/y+ a-/x+y- c+/y- c-/y+ Close to synchronous FSMs with binary encoded I/O Work in bursts: –Input transitions fire –Output transitions fire –State signals change Mostly limited to fundamental mode: next input burst cannot arrive before stabilization at the outputs

Extended Burst mode s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ Directed don’t cares (b*): some concurrency is allowed for input transitions that do not influence an output burst Conditional guards = “if b=1 then …”

Synthesis of XBM Next state and output functions free of functional and logic hazards Sequential feedbacks should not introduce new hazards State assignment –one state of the BM spec to one layer of Karnaugh map –compatible layers are merged –layers are compatible if merging does not introduce CSC violations or hazards –Layers are encoded using race free encoding

XBM and STG s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ x- a+ y+ b+ eps c- a- c+ y- y+ x+ y- b-

Summary Specification: XBM is subclass of STGs Synthesis: techniques are extensions of synchronous state assignment and logic minimization Timing: –environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems) –internals are delay insensitive See reference list for details

Summary Specification: Signal Transition Graph (formalized timing diagram) Synthesis: –state encoding –Boolean function derivation –algebraic and Boolean sequential decomposition –technology mapping Timing: –delay model implies timing constraints –exploiting timing assumptions leads to minimization and generates further assumptions Future work: –integrated flow –testing