Published by Curtis Francis. Modified over 9 years ago.
1
Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines
Arash Saifhashemi, Peter A. Beerel
University of Southern California, USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne
2
The Asynchronous Advantage
Async logic removes wasteful margins and can achieve robust, fast circuits with low power consumption.
[Figure: cycle time of clocked logic compared with logic time. Margin labels: manufacturing margin; clock jitter and skew margin; worst case versus average case; flip-flop alignment; logic gates.]
3
Asynchronous Channels
Bundled-Data Channel (Sender, Receiver; Req, Ack, Data): small area and lower power. Data rails don't switch when the same value is communicated multiple times.
Dual-Rail (1-of-N) Channel (Sender, Receiver; Data, Ack): fewer timing assumptions, higher power/area. Data rails switch even when the same value is communicated over and over.
[Figure: handshake timing diagrams with phases 1 to 4. Signal labels: Req, Ack, Data (data stable), 1-of-N Data.]
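The switching claim on this slide can be illustrated with a rough count of data-wire transitions for a single bit. This is only a sketch under stated assumptions: the dual-rail channel is taken to use a four-phase return-to-zero protocol (one rail rises and falls per token), the bundled-data wire is assumed to start at 0, and the Req/Ack wires are ignored. The function names are hypothetical.

```python
def bundled_data_transitions(bits, initial=0):
    """Data-wire transitions on one single-rail bundled-data wire.

    The wire only toggles when the new value differs from the old one.
    """
    transitions = 0
    wire = initial
    for b in bits:
        if b != wire:
            transitions += 1
            wire = b
    return transitions


def dual_rail_transitions(bits):
    """Data-rail transitions for one dual-rail bit.

    Assumes four-phase return-to-zero signalling: every token raises
    exactly one rail and then lowers it again, whatever the value is.
    """
    return 2 * len(bits)


same_value = [1, 1, 1, 1]
print(bundled_data_transitions(same_value))  # 1
print(dual_rail_transitions(same_value))     # 8
```

Sending the same value four times toggles the bundled-data wire once but toggles a dual-rail pair eight times, which is the power difference the slide points at.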
4
Asynchronous Circuit Design - Today
Applications:
3D networks on chip (STMicroelectronics)
Ethernet switches (Intel SRD)
Ultra-high-speed FPGAs (Achronix)
Process variation
Low-power chip design (encryption: Tiempo, …)
Basic challenge: automation
Proteus design flow (USC): uses commercial synchronous CAD tools, starting at a high-level specification written in SVC (SystemVerilogCSP)
Examples:
Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G): 1.2 B transistors, 90% asynchronous, 13% Proteus
Tiempo TAM16: clockless 16-bit microcontroller
STMicroelectronics WIOMING 3D-IC (July 2012)
Achronix FPGA: 1.7 M LUTs, 2.1 Gbps IO
5
Recent Developments in Asynchronous Circuits
May 2010: Wave semiconductors funded to acquire low-power circuits based
Aug 2010: Timeless Design Automation acquired by Fulcrum Microsystems
July 2011: Intel acquired Fulcrum Microsystems
April 2012: Achronix announces 22nm asynchronous FPGA
June 2012: Tiempo, working with STMicroelectronics, unveils first 32nm clockless test chip
6
The Proteus Flow
[Figure: flow diagram. Asynchronous flow labels: SystemVerilog, SVC2RTL, Synth. RTL, Synthesis, Proteus/Sync Library, Design Goals, Constraints, Image Netlist, ClockFree, Async Netlist, Physical Design, Final Layout. Synchronous flow labels: Sync Library, Constraints, Verilog Netlist, Clock Gating, Clock Tree Synthesis.]
Key Features:
Re-uses synchronous EDA tools
Seamless integration into existing flows
Up to 2X higher performance
Tool Status:
Started at USC Async CAD/VLSI
Commercialized by TimeLess (2008)
Acquired by Fulcrum (2010)
Intel acquired Fulcrum (2011)
Used in Intel Ethernet Alta FM6000 chip
The Problem:
Limited and manual power optimization
7
Flow Demo
SystemVerilog specification in CSP (Communicating Sequential Processes) style; uses abstract Receive and Send
SVC2RTL: synthesizable RTL, e.g. module acc_RTL (...,CLK,_RESET);
Synthesis: synthesized image netlist
Clockfree: asynchronous gate-level netlist
Physical design: final layout
[Figure: acc_RTL wrapped by RECV and SEND cells. Signal labels: L.0, L.1, L.e, R.0, R.1, R.e, E, In.d[i], In.e[i], Sum.d[i], Sum.e[i].]
8
Conditional Communication in Proteus
[Figure: conditional RECEIVE and SEND cells with enable values 0 and 1. Labels: "Not received", "Dummy value", "Not sent".]
9
Asynchronous Gates in Proteus
[Figure: gate types. Labels: Unconditional Gates; SEND; RECEIVE; Conditional; Unconditional.]
10
Example: ALU
SVC description
No conditionality in the high-level description
11
Reconverging fanouts + Unnecessary calculation
12
Adding Isolation Cells
All inputs/outputs are unconditional
Operand isolation: AND-based isolation cells, generated by the synchronous RTL synthesizer
Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits
13
Three-valued logic
Formal justification of conditioning
Three-valued logic image model:
Each iteration is modeled by a clock cycle
Each variable can be 0, 1, or N (no token)
[Figure: status of each channel over one iteration]
14
3VL Operators
Inversion: A=0: ¬A = 1; A=1: ¬A = 0; A=N: ¬A = N
Binary operators: behave like binary gates when both inputs are non-N
3V literals. Example: x^{0,N} is 1 when x=0 or x=N
Conditional communication: RECEIVE and SEND are modeled as Ⓡ and Ⓢ operators, which behave like buffers when E=1
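A minimal sketch of these operators in Python may help. Two points are assumptions rather than statements from the slide: a binary gate is taken to output N whenever either input is N, and a literal is taken to evaluate to 0 when the value is outside its set. The names `v_not`, `v_and`, `v_or` and `literal` are hypothetical.

```python
N = "N"  # marker for "no token"


def v_not(a):
    """3VL inversion: 0 -> 1, 1 -> 0, N -> N."""
    return N if a == N else 1 - a


def v_and(a, b):
    """Binary AND on non-N inputs; N if any input is N (assumed)."""
    return N if N in (a, b) else a & b


def v_or(a, b):
    """Binary OR on non-N inputs; N if any input is N (assumed)."""
    return N if N in (a, b) else a | b


def literal(x, allowed):
    """3V literal x^allowed: 1 when x is in the set, else 0 (assumed)."""
    return 1 if x in allowed else 0


print([v_not(a) for a in (0, 1, N)])            # [1, 0, 'N']
print([literal(x, {0, N}) for x in (0, 1, N)])  # [1, 0, 1]
```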
15
3VL Unconditional Functions
Unconditional functions can be represented using only the ¬, ∧, ∨ operators
Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
Lemma 1: the output of an unconditional function is N iff at least one of its inputs is N.
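Lemma 1 can be checked exhaustively for small gates. The sketch below builds NAND, XOR and AOI21 from inversion, AND and OR, under the assumption that each basic gate outputs N whenever any of its inputs is N, and then tests the lemma over every input combination in {0, 1, N}. All function names are hypothetical.

```python
from itertools import product

N = "N"  # marker for "no token"


def v_not(a):
    return N if a == N else 1 - a


def v_and(a, b):
    return N if N in (a, b) else a & b  # N propagation is an assumption


def v_or(a, b):
    return N if N in (a, b) else a | b  # N propagation is an assumption


def nand(a, b):
    return v_not(v_and(a, b))


def xor(a, b):
    return v_or(v_and(a, v_not(b)), v_and(v_not(a), b))


def aoi21(a, b, c):
    return v_not(v_or(v_and(a, b), c))


def lemma1_holds(f, arity):
    """True if f outputs N exactly when at least one input is N."""
    return all((f(*v) == N) == (N in v)
               for v in product((0, 1, N), repeat=arity))


print(lemma1_holds(nand, 2), lemma1_holds(xor, 2), lemma1_holds(aoi21, 3))
```

Under these assumptions the check passes for all three gates, since N propagates through every composition of the basic operators.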
16
SEND/RECEIVE Operators
Conditional communication: RECEIVE and SEND are modeled as Ⓡ and Ⓢ operators
Both behave like buffers when E=1
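One possible reading of the two operators is sketched below. Only the E=1 buffer behaviour is stated on this slide. The E=0 behaviour is an assumption based on the "Not sent", "Not received" and "Dummy value" labels of the earlier conditional-communication slide: SEND emits no token, and RECEIVE does not wait for its input and emits a dummy value, taken here to be 0. The names `send`, `receive` and `DUMMY` are hypothetical.

```python
N = "N"    # marker for "no token"
DUMMY = 0  # assumed dummy value produced by a disabled RECEIVE


def send(x, e):
    """SEND: buffer when e == 1; otherwise no token is sent (assumed)."""
    return x if e == 1 else N


def receive(x, e):
    """RECEIVE: buffer when e == 1; dummy value when e == 0 (assumed)."""
    if e == N:
        return N      # no enable token, so no output token
    return x if e == 1 else DUMMY


for e in (0, 1):
    for x in (0, 1):
        print(f"e={e} x={x} send={send(x, e)} receive={receive(x, e)}")
```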
17
Synchronized Variables
Variables that are simultaneously N or simultaneously non-N (0 or 1)
Denoted by: x1 Ⓒ x2
Example: in the ALU, i1, i2, and op are synchronized: i1 Ⓒ i2 Ⓒ op
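The definition can be checked on per-iteration traces. In this sketch a trace is a list holding one value per iteration, which is an assumption about how the image model would be simulated, and `synchronized` is a hypothetical helper name.

```python
N = "N"  # marker for "no token"


def synchronized(*traces):
    """True if, in every iteration, all variables are N or all are non-N."""
    return all(len({v == N for v in step}) == 1 for step in zip(*traces))


# Illustrative traces over three iterations (values are made up).
i1 = [0, N, 1]
i2 = [1, N, 0]
op = [1, N, 1]
x = [0, 1, 1]  # carries a token in iteration 2 while i1 does not

print(synchronized(i1, i2, op))  # True
print(synchronized(i1, x))       # False
```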
18
SEND Reconditioning
Assuming y=f(x) is unconditional and e ∉ TFO(y), the transitive fanout of y
Lemma 2: [equation not preserved in the transcript]
Application: SEND cells can be moved through logic
Similar to retiming in synchronous circuits
Less switching when e=0
Fewer SENDs
19
Observability in 3V Networks
Local Observability Partial Care (LOPC): LOPC(f, C, xj) of input xj of a node representing a function f is the condition under which f's output is not affected as xj changes in C ⊆ {0,1,N}
Global Observability Partial Care (GOPC): GOPC(C, x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}
Example: changes of i1 in {0,1} are not observable when…
[Figure labels: i2=0 or i2=1; s=1]
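For a single node, the local condition can be found by brute-force enumeration, as in the sketch below. It lists every assignment of the other inputs for which the output stays the same while xj ranges over C. The helper `lopc`, the 3VL `v_and`, and the `mux` (with s=0 selecting a) are all hypothetical, and both gates are assumed to output N whenever any input is N.

```python
from itertools import product

N = "N"  # marker for "no token"


def lopc(f, arity, j, C, domain=(0, 1, N)):
    """Assignments of the other inputs under which input j is unobservable."""
    care = []
    for others in product(domain, repeat=arity - 1):
        outs = {f(*(others[:j] + (v,) + others[j:])) for v in C}
        if len(outs) == 1:  # output does not depend on x_j within C
            care.append(others)
    return care


def v_and(a, b):
    return N if N in (a, b) else a & b  # N propagation is an assumption


def mux(s, a, b):
    """Unconditional 2:1 mux; s == 0 selects a (assumed convention)."""
    if N in (s, a, b):
        return N
    return a if s == 0 else b


print(lopc(v_and, 2, 0, (0, 1)))         # [(0,), ('N',)]
print((1, 0) in lopc(mux, 3, 1, (0, 1)))  # True: a is hidden when s == 1
```

For the AND gate, changes of the first input between 0 and 1 are hidden when the other input is 0 or carries no token. For the mux, input a is hidden whenever s=1.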
20
GOPC Conditioning
When xj is not observable…
Add a SEND followed by a RECEIVE
Move the SENDs using SEND reconditioning
Lemma 3: [statement not preserved in the transcript]
[Figure: SEND reconditioning example with signal values 0, 1, and N]
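The SEND-then-RECEIVE pattern can be sketched on a multiplier whose result is only observable when an enable e is 1. This is an illustration under assumptions, not the authors' implementation: a disabled SEND emits no token, a disabled RECEIVE emits a dummy 0, operands are plain integers rather than bit-level signals, and all names are hypothetical.

```python
N = "N"    # marker for "no token"
DUMMY = 0  # assumed dummy value from a disabled RECEIVE


def send(x, e):
    return x if e == 1 else N  # assumed: nothing is sent when e != 1


def receive(x, e):
    if e == N:
        return N
    return x if e == 1 else DUMMY  # assumed: dummy value when e == 0


def isolated_mul(a, b, e):
    """Multiplier with SENDs on its operands and a RECEIVE on its result.

    Returns (output, active), where active says whether the multiplier
    received tokens and therefore switched in this iteration.
    """
    a_s, b_s = send(a, e), send(b, e)
    product_token = N if N in (a_s, b_s) else a_s * b_s
    return receive(product_token, e), product_token != N


# Enable pattern in which MUL is needed in 1 of 4 iterations.
enables = [1, 0, 0, 0]
results = [isolated_mul(3, 4, e) for e in enables]
print(results)  # [(12, True), (0, False), (0, False), (0, False)]
print(sum(active for _, active in results))  # 1
```

In this model the multiplier sees tokens only in the iterations where its result is observable, which corresponds to the "No Activity" case on the following slide.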
21
Conditioning
[Figure: conditioned ALU datapath. Labels: &, +, 0, "No Activity".]
22
Inserting Isolating Nodes and Recognizing Enable Domains
Synchronous synthesis tools can insert isolating nodes
Constrained to insert isolating nodes only on non-critical paths
Node u is in e's enable domain OIED(e) if all paths starting from a primary input and ending at u include an isolating node controlled by e
Detected using a depth-first search (DFS)
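One way to detect an enable domain with a DFS is sketched below. It searches forward from the primary inputs without passing through isolating nodes controlled by e. Any node reached this way has a path that avoids isolation, so the enable domain is everything that was not reached. The netlist representation (a fanout dictionary) and all node names are hypothetical, and the slide does not say whether the authors' search works this way.

```python
def enable_domain(fanout, primary_inputs, isolating):
    """Nodes whose every path from a primary input crosses an isolating node.

    fanout: dict mapping each node to the nodes it drives.
    isolating: set of isolating nodes controlled by the enable e.
    """
    outside = set()              # nodes reachable without isolation
    stack = list(primary_inputs)
    while stack:                 # iterative depth-first search
        u = stack.pop()
        if u in outside or u in isolating:
            continue             # do not search past an isolating node
        outside.add(u)
        stack.extend(fanout.get(u, ()))
    nodes = set(fanout) | {v for vs in fanout.values() for v in vs}
    return nodes - outside


# Hypothetical ALU fragment: MUL sits behind isolating nodes, ADD does not.
fanout = {
    "a": ["iso_a", "add"],
    "b": ["iso_b", "add"],
    "iso_a": ["mul"],
    "iso_b": ["mul"],
    "mul": ["mux"],
    "add": ["mux"],
    "mux": [],
}
print(sorted(enable_domain(fanout, ["a", "b"], {"iso_a", "iso_b"})))
# ['iso_a', 'iso_b', 'mul']
```

The output mux is excluded because it is also reachable through the unisolated adder.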
23
Pre-layout Analysis
W_u: power of receiving data on all inputs and sending the output (unconditional nodes)
K: power of conditional nodes
r_f: activity factor
[Equations not preserved in the transcript: total power; power of each domain; domain power after isolation (n inputs); benefit of isolating each domain]
24
Post-layout Experimental Results
Case study: 32-bit ALU, placed and routed
Switching activity back-annotated using a VCD file
Results:
Isolating ADD and SUB is detrimental for r_ADD and r_SUB > 0.2
53% power reduction when only isolating MUL (r_f = 0.25)
Area cost of isolating MUL is about 4%, with no performance penalty
25
Conclusions and Future Work
Conditional communication in async. circuits is not free:
Creates area and performance overheads
Requires manual or automatic optimization
Asynchronous circuits can/should leverage sync. tools
This paper is the first to use 3-valued logic and observability don't cares for power optimization of asynchronous circuits
Our future work:
Evaluate the proposed method on bigger designs
Adopt other sync. power optimization techniques such as clock gating
Optimize the location of SEND/RECEIVE nodes (reconditioning)