-1- Soft Core Viterbi Decoder EECS 290A Project Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang.

1 -1- Soft Core Viterbi Decoder EECS 290A Project Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang

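2 -2- High Level Architecture
[Figure: block diagram of the decoder, each block annotated with % Gates, % Area, and % Power. Block labels were lost in extraction; the annotated values, in extraction order, are: 23%, 36%, 29%, 0%, 48%, 18%, 38%, 8%, 21%, 18%, 4%, 15%, 9%, 2%, 8%, 4%, 1%, 4%, 2%, 1%, 4%.]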

3 -3- Branch & Path Metric Generation
• Branch Metrics Computation apparently implemented with a CORDIC block (contains 840 MUXes, 58 adders & flip-flops, 32 15-bit buses)
• Branch Metrics hard-wired to each ACS unit
• Path Metrics stored in ACS units
• Each ACS unit handles 16 states
[Figure: ACS units with upper (U) and lower (L) inputs; label: Hard-wired Path Metric Interconnect]

4 -4- ACS Architecture
• Each ACS unit stores 32 path metrics
• Only two SRAMs are active at a time
• Across all four ACS units, each path metric is stored twice
• SRAM accounts for 88% of the area and 27% of the power for each ACS unit
[Figure labels: 8x9 SRAM; PM U, PM L; BM U, BM L; Add, Compare, Select; Pipeline Register; MUX]
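The add-compare-select step and the storage counts on this slide can be sketched as follows. This is a minimal illustration, not the project's RTL: the function name `acs`, the smaller-is-better metric convention, and the assumption that each 9-bit SRAM word holds one path metric are all assumptions made for the example.

```python
def acs(pm_u, bm_u, pm_l, bm_l):
    """Add-compare-select for one state.

    pm_u/pm_l: path metrics of the upper and lower predecessor states.
    bm_u/bm_l: branch metrics of the two incoming branches.
    Returns (new_path_metric, decision_bit); decision 0 = upper, 1 = lower.
    Assumes the smaller metric wins (the slides do not state the convention).
    """
    cand_u = pm_u + bm_u          # add
    cand_l = pm_l + bm_l
    if cand_u <= cand_l:          # compare
        return cand_u, 0          # select upper
    return cand_l, 1              # select lower


# Storage bookkeeping using the figures quoted on the slides.
ACS_UNITS = 4
STATES_PER_UNIT = 16
COPIES_PER_METRIC = 2     # "each path metric is stored twice"
SRAM_WORDS = 8            # 8x9 SRAM; assumed: one path metric per word

total_states = ACS_UNITS * STATES_PER_UNIT                          # 64
metrics_per_unit = total_states * COPIES_PER_METRIC // ACS_UNITS    # 32
srams_per_unit = metrics_per_unit // SRAM_WORDS                     # 4
total_acs_srams = srams_per_unit * ACS_UNITS                        # 16
```

Under these assumptions the counts agree with the 32 path metrics per ACS unit stated here and with the 16 small SRAM macro cells reported on the Place and Route slide.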

5 -5- Traceback Architecture
• State-Machine blocks are just large sum-of-products combinational networks (351 gates each)
• Each memory unit contains a 16x64 SRAM and logic (192 MUXes, 128 flip-flops)
[Figure labels: Decision Bits; Traceback; Next_ramin; Pipeline Register; MUX; SRAM; 192; Out]
[Figure annotations: Traceback Memory Unit: 22% Area, 20% Power. Finite State Machine: 11% Area, 13% Power. Traceback Unit.]
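As a rough illustration of what a traceback unit does with stored decision bits, here is a minimal software sketch. The state encoding (the last `m` input bits, newest in the least significant bit) and the name `traceback` are assumptions chosen for the example; the slides do not specify the encoding or the traceback depth used in the hardware.

```python
def traceback(decisions, end_state, m):
    """Recover input bits by walking stored decision bits backwards.

    decisions[t][s] is the ACS decision bit for state s at step t.
    Assumed convention: a state holds the last m input bits with the
    newest bit in the LSB, so next = ((s << 1) | u) & (2**m - 1), and
    the decision bit is the bit shifted out of the surviving predecessor.
    """
    state = end_state
    bits = []
    for row in reversed(decisions):
        bits.append(state & 1)                           # newest input bit
        state = (state >> 1) | (row[state] << (m - 1))   # step to predecessor
    bits.reverse()
    return bits
```

With `m = 6` each step stores 64 decision bits, which would line up with the 64-bit word width of the traceback SRAM, though that correspondence is inferred rather than stated on the slide.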

6 -6- Design Flow
• Design Compiler Synthesis script (from Mentor/Inventra)
• SRAM Generator (from Norman Walker)
• VHDL gate-level sims (timing verification, switching activity annotation)
• PowerMill Simulations (SRAM, core)
• Design Compiler, Power Compiler (Static timing, power analysis)
• Floor Planning (Preview)
• Place & Route (Silicon Ensemble)
• Interconnect Parasitic Extraction (“report simcap”)
• PowerMill simulations, PathMill static analysis
• Design Compiler, Power Compiler (Static timing, power analysis with back-annotated interconnect parasitics)
[Figure labels, flow stages: Synthesis & Module Generation; Pre-Layout Verification & Analysis; Floor Planning; Place & Route; Post-Layout Verification & Analysis]

7 -7- Synthesis and SRAM Generation
• Synthesis with Synopsys Design Compiler
  » Constraint: 66 kHz clock (effectively infinite)
  » Bottom-up synthesis of 62 VHDL entities
• Low-Power SRAM generator (from Pleiades)
  » Very large sense-amps, control logic
  » Optimized for power, speed at low supply voltages
  » Word-length limited to a power of 2

8 -8- Simulation Models
• Behavioral C
  » Parameterized, bit-true, and fast
  » Used for system level design and BER simulations
• Behavioral VHDL
  » Parameterized, bit-true, and cycle-true
  » Used for structural simulations and test bench reference
• RTL VHDL
  » Synthesizable, crafted for specific parameters and implementation structure
  » Used for synthesis quality

9 -9- BER Simulation Results

10 -10- SRAM
• Simulation Tools: TimeMill & PowerMill
• Parameters
  » 66 MHz clock
  » Voltage 2.5V
  » Random Generated Test Vectors
• Results
  » Power Analysis
  » Timing Analysis

11 -11- SRAM: Power Numbers
• SRAM used for ACS Unit
  » 8 words by 9 data bits

Operations        Avg. (µA)   Avg. (mW)   Avg. (pJ)
Read Activity     663.73      1.659       24.885
Write Activity    563.21      1.408       21.120
Read/Write        612.29      1.530       22.950

Parasitic Extraction

Operations        Avg. (µA)   Avg. (mW)   Avg. (pJ)
Read Activity     949.89      2.3747      35.6205
Write Activity    772.830     1.9320      28.980
Read/Write        851.42      2.1285      31.9275

12 -12- SRAM: Power Numbers
• SRAM used for Traceback Unit
  » 16 words by 64 data bits

Operations        Avg. (µA)   Avg. (mW)   Avg. (pJ)
Read Activity     2170.7      5.4267      81.4005
Write Activity    1893.4      4.7335      71.0025
Read/Write        2086.9      5.2172      78.2580
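The three columns in the SRAM power tables appear to be related by P = I × V at the stated 2.5 V supply and E = P × T for one clock period. The sketch below checks that relationship on two rows. The 15 ns period (about 66.7 MHz) is an assumption inferred from the tabulated ratios, not a value given on the slides, and the helper names are hypothetical.

```python
VDD_V = 2.5          # supply voltage stated on the SRAM slide
T_CLK_NS = 15.0      # assumed clock period, inferred from the tables


def power_mw(current_ua):
    """Average power in mW from average current in uA at VDD_V."""
    return current_ua * VDD_V / 1000.0


def energy_pj(power_in_mw):
    """Energy per clock period in pJ (mW * ns = pJ)."""
    return power_in_mw * T_CLK_NS


# (current in uA, tabulated power in mW, tabulated energy in pJ)
rows = {
    "ACS read": (663.73, 1.659, 24.885),
    "Traceback read": (2170.7, 5.4267, 81.4005),
}

for name, (i_ua, p_mw, e_pj) in rows.items():
    print(name, round(power_mw(i_ua), 4), round(energy_pj(p_mw), 4))
```

If the inferred period is right, the energy column is energy per clock cycle rather than energy per complete read or write transaction.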

13 -13- SRAM: Timing Numbers
• Delays
  » Setup time; hold time: time needed for data address to become stable

                  Setup (ns)   Hold (ns)   Data Resolution (ns)
ACS SRAM          ~1           ~2          ~1.8
Traceback SRAM    ~1           ~2          ~5

14 -14- Place and Route
• Floor planning of the Viterbi SRAM macro cells and standard cells was done in Preview, and Silicon Ensemble was used for routing.
• Total SRAM macro cell area was 1.58 mm² (1.08 mm² with 9x8 SRAMs)
  » Area of the 16 9x8 bit SRAM macro cells: 0.052 mm² each, 62% larger than required, as 16x8 bit SRAMs were used (SRAM generator output had been verified for powers of 2)
  » Area of the 3 16x64 bit SRAM macro cells: 0.25 mm² each
• Area of the standard cells 1.02 mm² (0.35 mm² from DEF file)
• Final chip area was 4.0 mm² (original estimate 2.5 mm²)
• Parasitics for timing simulation were extracted from the final routed nets in Silicon Ensemble.
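The area figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only recombines numbers quoted above; the variable names and the "utilization" ratio (macro plus standard-cell area over final chip area) are introduced here for illustration and are not reported by the authors.

```python
# Areas in mm^2, taken from the slide.
acs_sram_area = 16 * 0.052          # sixteen small SRAM macro cells
traceback_sram_area = 3 * 0.25      # three 16x64 SRAM macro cells
total_sram_area = acs_sram_area + traceback_sram_area

std_cell_area = 1.02
chip_area = 4.0

# Fraction of the die occupied by macros and standard cells
# (an illustrative ratio, not a figure from the slides).
utilization = (total_sram_area + std_cell_area) / chip_area

print(round(total_sram_area, 2))    # 1.58, matching the quoted total
print(round(utilization, 2))        # 0.65
```

The remaining third or so of the die would then be routing and power-grid overhead, which is in line with the congestion described on the later placement and routing slide.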

15 -15- Wiring Statistics
• Six metal layers, layers 5 and 6 used for power and ground respectively
• Ground and power spaced alternately 100 µm apart horizontally and vertically.
• There were about 6200 nets and 46,114 vias.
• Total wire lengths:
  » metal layer 1: 3,293 µm
  » metal layer 2: 458,440 µm
  » metal layer 3: 510,517 µm
  » metal layer 4: 218,023 µm
  » metal layer 5: 96,882 µm signal, and 38,400 µm power
  » metal layer 6: 8,660 µm signal, and 37,500 µm ground
  » wire length: 685 mm horizontal, 611 mm vertical, total 1296 mm
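A quick consistency check on these statistics: the per-layer signal lengths sum to the quoted total only if the power and ground wiring on layers 5 and 6 is excluded. The sketch below makes that explicit; the dictionary and variable names are illustrative.

```python
# Signal wire length per metal layer, in micrometres (from the slide).
signal_um = {
    1: 3_293,
    2: 458_440,
    3: 510_517,
    4: 218_023,
    5: 96_882,
    6: 8_660,
}
power_ground_um = 38_400 + 37_500   # layer 5 power + layer 6 ground

signal_total_um = sum(signal_um.values())
signal_total_mm = round(signal_total_um / 1000)

print(signal_total_um)     # 1295815
print(signal_total_mm)     # 1296, equal to 685 mm + 611 mm
print(signal_total_um + power_ground_um)   # 1371715 including power/ground
```

So the 1296 mm total on the slide appears to count signal routing only.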

16 -16- Final Placement and Routing
• Significant routing congestion at 16 by 64 bit SRAM outputs, due to Silicon Ensemble grid size of 1 µm (observe white and light blue wires).
• Minimum of 6 unroutable nets observed, even at 12 mm² chip area.
• Final size was 1.25 mm x 3.2 mm, 4 mm², with 9 unroutable nets.
• Violation reports in Silicon Ensemble did not identify which nets were unroutable, other than problems with ground and power connections.

17 -17- Static Timing Checks
• All timing checks performed with Design Compiler’s report_timing command
• Parasitic capacitances back-annotated with the set_load command
• No RC parasitics annotated
• No SRAM model was used for timing checks
• Critical path was from ACS control logic, through a PM output MUX select signal (in an ACS unit), through the following ACS unit.
• Checks performed at 2.5V

18 -18- Static Power Checks
• All power checks performed with Design Compiler’s report_power command
• Switching activity was measured for every output port (transition counts over 16,000-cycle simulation)
• Back-annotation performed with SAIF files
• No SRAM model was used for power checks (added in manually)
• Checks performed at 2.5V w/ 60 MHz clock

19 -19- Delay and Energy Scaling

20 -20- Performance Results For fixed throughput requirement 100ksps:

21 -21- Summary
• Performance in intended operation (100 ksps)
  » Clock Speed: 1.6 MHz
  » Power Dissipation: 0.14 mW
  » Power Density: 34.9 µW per mm²
• Cost
  » Die Size: 4 mm²
  » Design effort: 30 work days
• Predictability and portability
  » Mentor/Inventra predictions vs. measured results
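The summary figures can be related to one another with simple arithmetic. The sketch below derives clock cycles per decoded sample, energy per sample, and power density from the numbers on the slide. The derived quantities and their names are illustrative and were not reported by the authors, and the inputs are the rounded slide values.

```python
THROUGHPUT_SPS = 100e3     # 100 ksps
CLOCK_HZ = 1.6e6           # 1.6 MHz
POWER_W = 0.14e-3          # 0.14 mW (rounded on the slide)
DIE_AREA_MM2 = 4.0

cycles_per_sample = CLOCK_HZ / THROUGHPUT_SPS            # 16.0
energy_per_sample_nj = POWER_W / THROUGHPUT_SPS * 1e9    # about 1.4 nJ
power_density_uw_mm2 = POWER_W / DIE_AREA_MM2 * 1e6      # about 35 uW/mm^2

print(cycles_per_sample, energy_per_sample_nj, power_density_uw_mm2)
```

The 16 cycles per sample would be consistent with each ACS unit stepping through its 16 states once per sample, though that interpretation is inferred rather than stated. The small gap between about 35 µW/mm² here and the quoted 34.9 µW/mm² is presumably due to rounding of the power figure.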

