A Low-latency GALS Interface Implementation
Yuan-Teng Chang; Wei-Che Chen; Hung-Yue Tsai; Wei-Min Cheng; Chang-Jiu Chen; Fu-Chiung Cheng
Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
2010 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)
Presenter: Ching-Hua Huang
Date: 2012/4/16
National Sun Yat-sen University, Embedded System Laboratory
With VLSI technology improving rapidly, SoC has become the most important VLSI application. However, clock distribution and low power have become the two most important issues in SoC design. Integrating IPs so that they operate correctly with different clocks is also an important issue. Asynchronous circuits may resolve these problems by removing the "clock" signal, but implementing an entire circuit asynchronously is too hard. The GALS (Globally-Asynchronous Locally-Synchronous) design methodology balances these concerns by separating the synchronous modules with asynchronous interfaces. Each part of the circuit can then operate with its own clock, and different parts communicate via asynchronous channels. GALS provides reliable communication between modules. However, the latency of the GALS interface may seriously degrade performance, so reducing that latency is significant. In this paper, we implemented a small and simple stretchable-clock based GALS wrapper with low latency in Verilog HDL and synthesized the design with the TSMC 0.13μm cell library. We also showed that the wrapper operates correctly with modules whose clock frequencies differ greatly. In addition, we recommend adding a FIFO storage element on the transmission path.
What's the problem
Integrating IPs that perform operations correctly with different clocks.
◦ Synchronous circuits work by a "clock" signal
  Some drawbacks
◦ Asynchronous circuits work by handshake protocols
  High implementation costs and difficulties
◦ GALS (Globally-Asynchronous Locally-Synchronous) design methodology
  Integrates the advantages of both synchronous and asynchronous circuits
The latency of GALS seriously degrades performance.
◦ A stretchable-clock based GALS wrapper with low latency.
Related work
Some drawbacks of synchronous circuits
  1. Clock skew
  2. Difficulty in clock distribution
  3. Worst-case performance
  4. Not modular
  5. Sensitive to variations in physical parameters
  6. Synchronization failure
  7. Noise (EMI)
How to deal with these drawbacks
◦ Asynchronous circuits: handshake protocols, but high implementation costs and difficulties
◦ The GALS methodology was proposed; GALS first appeared in 1984
  Integrates the advantages of both synchronous and asynchronous circuits
GALS systems [1,2,3,4]
  1. Pausible clock generator
  2. Stretchable clock generator
  The major difference between them is the way to stop the clock
GALS has large latency
◦ Reducing the latency of the asynchronous interface [This paper]
  1. Input controller
  2. Output controller
Proposed method
The new STG (Signal Transition Graph)
◦ Composed of REQ, ACK, stretch, and WR (or RD)
The proposed new wrapper
◦ Input controller
◦ Output controller
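The signals named above can be illustrated with a small Python sketch of one four-phase (return-to-zero) handshake transfer. This is only a sketch: the position of the stretch edges relative to REQ and ACK is an assumption made for illustration, the WR/RD signal is left out, and the function name `four_phase_transfer` is hypothetical. The STG in the paper may order these edges differently.

```python
# Sketch of one four-phase (return-to-zero) handshake transfer.
# ASSUMPTION: stretch is raised before REQ+ and released after ACK-.
# The paper's actual STG may order these edges differently.

def four_phase_transfer():
    """Return (ordered list of signal edges, final signal levels)."""
    signals = {"stretch": 0, "REQ": 0, "ACK": 0}
    trace = []

    def edge(name, value):
        # An edge must actually change the signal level.
        assert signals[name] != value, f"{name} is already {value}"
        signals[name] = value
        trace.append(f"{name}{'+' if value else '-'}")

    edge("stretch", 1)  # assumed: hold the local clock during the transfer
    edge("REQ", 1)      # sender: data is valid
    edge("ACK", 1)      # receiver: data has been captured
    edge("REQ", 0)      # sender: return to zero
    edge("ACK", 0)      # receiver: return to zero
    edge("stretch", 0)  # assumed: release the local clock
    return trace, signals


trace, final_levels = four_phase_transfer()
print(" -> ".join(trace))
```

All signals return to 0 at the end of the transfer, which is what makes the protocol "return-to-zero": the next transfer starts from the same idle state.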
1. Stoppable clock generator
2. The most commonly used approach so far
3. Uses an odd number of inverters to generate the local clock signal of the locally synchronous module
[Figure: stoppable clock generator circuit with signals Ri, Ai, lclk, rclk, and the truth table (inputs A, B; output Y) of its two-input gate]
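Why the inverter count must be odd can be seen in a unit-delay Python sketch of an inverter ring. The model is deliberately simplified and is not the circuit from the paper: every inverter has the same delay of one time step, there is no stop/pause gate, and the function name `simulate_ring` is hypothetical.

```python
# Unit-delay model of a ring of inverters (simplified; no stop gate).
# ASSUMPTION: every inverter has the same delay of one time step.

def simulate_ring(num_inverters, steps):
    """Return the waveform seen at the last inverter's output."""
    # Start from alternating levels so at most one stage is inconsistent.
    state = [i % 2 for i in range(num_inverters)]
    waveform = []
    for _ in range(steps):
        waveform.append(state[-1])  # tap the last output as the clock
        # Each inverter outputs the complement of the previous stage;
        # state[i - 1] wraps around to the last stage when i == 0.
        state = [1 - state[i - 1] for i in range(num_inverters)]
    return waveform


print(simulate_ring(3, 12))  # odd ring: the output keeps toggling
print(simulate_ring(4, 12))  # even ring: the output settles
```

With an odd number of stages the loop as a whole inverts, so no stable state exists and the output toggles with a period of twice the total loop delay. With an even number of stages the ring latches into a stable state and produces no clock.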
1. The basic idea is similar to the above approach: stop the clock when data transfer occurs
2. The major difference from the above approach is the way to stop the clock
The symbol "C" represents a C-element, a self-timed latch
C-element truth table (inputs A, B; output Y): 00 gives 0, 01 and 10 give Hold, 11 gives 1
[Figure: truth table of a second two-input gate (inputs A, B; output Y)]
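The "Hold" behaviour of a C-element can be sketched as a small behavioural model in Python. This is an illustration of the standard Muller C-element only, not the gate-level design from the paper; the class name `CElement` and its initial output value are assumptions.

```python
# Behavioural model of a two-input Muller C-element.
# ASSUMPTION: the output starts at 0 unless told otherwise.

class CElement:
    def __init__(self, initial=0):
        self.y = initial

    def update(self, a, b):
        """Apply inputs a and b (each 0 or 1) and return the output."""
        if a == b:
            # Both inputs agree: the output follows them.
            self.y = a
        # Inputs disagree: the output holds its previous value.
        return self.y


c = CElement()
outputs = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(outputs)  # [0, 0, 1, 1, 0]
```

The output changes only after both inputs have changed, which is why the element behaves as a self-timed latch.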
[Figure: circuit annotated with signal values (=0, =1)] Case: the receiver needs to receive data
If a First-In-First-Out (FIFO) buffer is placed on the transmission path, the sender can put data into the FIFO and get the acknowledge earlier. The sender can thus continue computation instead of waiting for the receiver. The latch is controlled by ACK; data has to be stored correctly in the latch during the time from ACK+ to ACK-.
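The effect of the FIFO on the sender can be estimated with a rough timing model in Python. Everything here is an assumption for illustration: the function name `sender_done_time`, the abstract time units, the fixed per-item times, and the rule that a FIFO slot becomes free when the receiver starts consuming an item. It is not a model of the paper's circuit.

```python
# Rough timing model: when does the sender get its last acknowledge?
# ASSUMPTIONS: fixed per-item times in abstract units; a FIFO slot is
# freed at the moment the receiver starts consuming the stored item.

def sender_done_time(n_items, t_send, t_recv, fifo_depth):
    """Return the time at which the sender receives its last ACK."""
    starts = []      # time the receiver begins consuming each item
    recv_free = 0    # time the receiver finishes its current item
    now = 0          # sender's local time
    for i in range(n_items):
        offer = now + t_send  # sender finishes computing item i
        if fifo_depth == 0:
            # Direct handshake: ACK only when the receiver takes the item.
            ack = max(offer, recv_free)
            start = ack
        else:
            # A slot is free once item i - fifo_depth has left the FIFO.
            slot_free = starts[i - fifo_depth] if i >= fifo_depth else 0
            ack = max(offer, slot_free)
            start = max(ack, recv_free)
        starts.append(start)
        recv_free = start + t_recv
        now = ack
    return now


# Fast sender (1 unit per item), slow receiver (3 units per item).
print(sender_done_time(4, 1, 3, fifo_depth=0))  # 10
print(sender_done_time(4, 1, 3, fifo_depth=4))  # 4
```

In this example the sender finishes handing off four items at time 10 with a direct handshake and at time 4 with a four-entry FIFO, because it no longer waits for the slow receiver on each item.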
Implementation of the proposed design
◦ Gate-level in Verilog HDL
◦ Synopsys Design Compiler
  Used to synthesize our gate-level design
  With the TSMC 0.13μm cell library
Compare area and latency with two other proposed GALS models
This paper proposes a new GALS wrapper
◦ Based on the four-phase handshake protocol.
◦ Consists of an input controller and an output controller
The area and latency are improved.
◦ Compared to the C-element based design
  The area of the new wrapper is only 30.8%
  The latency of the new wrapper is only 39.7%
◦ Compared to the standard cell based design
  The area of the new wrapper is only 63.5%
  The latency of the new wrapper is only 55%
This paper lists the history and design principles of GALS
◦ Such as the GALS concept: Synchronous, Asynchronous, GALS
◦ To ensure correct operation, the synchronous modules must be stopped when data transfer occurs
Improved my understanding of GALS
◦ The control of the asynchronous wrapper
◦ STG