1 Design and Implementation of a Sub-threshold BFSK Transmitter. By: Suganth Paul (#), Rajesh Garg ($), Sunil P. Khatri ($), Sheila Vaidya (%). # Intel Corporation, Austin, TX; $ Department of ECE, Texas A&M University, College Station, TX; % Lawrence Livermore National Lab., Livermore, CA
2 Outline: Sub-threshold circuits – the opportunity; Challenges; Process/temperature/voltage variations; Solution – dynamic body bias; Validation via test chip; Design methodology; Silicon results; Conclusions
3 The Opportunity. Compared a traditional circuit with a sub-threshold one (obtained by simply setting VDD < VT). Performed simulations for 2 different processes on a 21-stage ring oscillator. Impressive power reduction (100X – 500X). Power-Delay-Product (P-D-P) improves by as much as 20X. P-D-P is an important metric to compare circuit design styles. Power consumption has become a major issue for recent ICs. There is a large and growing class of applications where power reduction is paramount – not speed. Such applications are ideal candidates for sub-threshold circuit design.
4 Sub-threshold Logic. Ids has an exponential dependence on process, voltage and temperature (PVT). Need to stabilize the circuit performance by compensating for PVT variations. No existing approach compensates sub-threshold delay; existing approaches compensate sub-threshold currents. To compensate delay, a representative circuit is needed. It is not easy to come up with a representative circuit for standard cells.
5 Our Solution. We propose a technique that uses self-adjusting body-bias to phase-lock the circuit delay to a beat clock. Use a network of PLAs to implement circuits. Several PLAs in a cluster share a common nbulk node. A representative PLA in each cluster is chosen to phase lock the delay of the PLAs to the beat clock. If the delay is too high, a forward body bias is applied to speed up the representative PLA. If the delay is low, body bias is brought back down to zero to slow down the representative PLA. All other PLAs exhibit the same delay as the representative PLA, since they all share a common nbulk terminal.
6 Objective. Validate and verify the flow by designing a sub-threshold circuit for the application. Choose a test application: low power, low speed. Develop a sub-threshold circuit design flow. Implement our delay compensation scheme to negate PVT variations. Implement the same application using a standard cell based flow on the same die. Fabricate and test the chip (TSMC 0.25 um process). Compare the sub-threshold circuit with the standard cell circuit in terms of power consumption.
7 Test Application: Binary Frequency Shift Keying (BFSK) Transmitter. Block diagram: Binary Input Data → Digital BFSK Modulator → DAC → Amplifier → Antenna. The modulator produces two tones: f1 if the input is LOW, f2 if the input is HIGH. The digital block is implemented using sub-threshold circuits. Specifications: input bit rate RB = 32 kbps; broadcast distance D = 1000 m; FSK tones f1 = 150 kHz, f2 = 450 kHz; channel bandwidth B = 300 kHz.
8 Sub-threshold Design Approach. Digital part of the circuit implemented as NPLA (Network of Programmable Logic Arrays). NPLAs have low delay. Critical path delay is easy to find. PLAs have a common nbulk node. Circuit-level PVT compensation: an external Beat Clock (BCLK) signal is phase locked with the critical path delay. Delay is controlled by a charge pump that modulates the bulk voltage of transistors in the circuit. Compensates for both inter- and intra-die variations.
9 Dynamic NOR-NOR PLA. We use precharged NOR-NOR PLAs as the structure of choice. Wordlines run horizontally. Inputs / their complements and outputs run vertically. Each PLA has a “completion” signal that switches low after all the outputs switch. Several PLAs in a cluster share a common nbulk node. Diagram labels: Inputs, Outputs, completion, clk, Precharge, Evaluate.
11 The Charge Pump. If the PLA “completion” signal lags the beat clock, the nbulk node gets forward biased. If the PLA “completion” signal leads the beat clock, nbulk goes back to zero bias. Diagram labels: pullup, pulldown.
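The lag/lead rule on the charge pump slide can be sketched as a simple bang-bang control loop. This is only an illustration under stated assumptions, not the circuit on the chip: the exponential delay model `pla_delay`, the `sensitivity` value, the bias step and the 0.45 V ceiling are all hypothetical choices made for the sketch.

```python
import math


def pla_delay(base_delay, v_bulk, sensitivity):
    """Hypothetical model: forward body bias shortens the PLA delay exponentially."""
    return base_delay * math.exp(-sensitivity * v_bulk)


def run_bias_loop(base_delay, beat_period, sensitivity=2.0,
                  step=0.005, v_max=0.45, cycles=400):
    """Bang-bang loop: raise nbulk when the PLA lags the beat clock, lower it otherwise.

    All numeric defaults are assumptions for illustration only.
    Returns a list of (v_bulk, delay) pairs, one per beat-clock cycle.
    """
    v_bulk = 0.0
    trace = []
    for _ in range(cycles):
        delay = pla_delay(base_delay, v_bulk, sensitivity)
        trace.append((v_bulk, delay))
        if delay > beat_period:
            # "completion" lags the beat clock: pump nbulk up (forward bias).
            v_bulk = min(v_max, v_bulk + step)
        else:
            # "completion" leads the beat clock: let nbulk fall back toward zero bias.
            v_bulk = max(0.0, v_bulk - step)
    return trace


if __name__ == "__main__":
    final_bias, final_delay = run_bias_loop(base_delay=1.0, beat_period=0.9)[-1]
    print(f"nbulk settles near {final_bias:.3f} V, delay near {final_delay:.3f}")
```

In this toy model the delay ends up dithering around the beat period by about one bias step, which is the expected behaviour of a lead/lag detector driving a charge pump.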
12 Effectiveness of the Approach. We simulated a single PLA from 0 °C to 100 °C. Also applied VT variations (10%) and VDD variations (10%). The light region shows the variations in delay over all the corners without delay compensation. The red region shows the delays with the self-adjusting body-bias circuit.
13 Design Flow: BFSK Design; HDL Synthesis; Map to NPLA; Logic Verification; Integrated Spice Netlist; Layout; LVS, RC Extraction; Full Chip Spice Verification. Other boxes: Spice Verification (functional, timing, charge pump); Design of Analog Components.
14 BFSK Design. Diagram labels: Binary Input, Mux, Phase Increment, Phase Accumulator, DFF, Clk, Sine Lookup Table (depth 2^9 = 512); bus-width labels 9 and 8. f_out = (phase increment × f_clk) / 512. f_out < f_clk/2 (Nyquist criterion) implies phase increment < 256. Phase increments are chosen based on f_clk, or left programmable in real time to get Software Defined Radio (SDR) operation. We fix the phase increments to avoid the extra input pins required for SDR.
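The phase accumulator and sine lookup table on this slide can be sketched in a few lines. This is a behavioural model under assumptions, not the fabricated design: the 8-bit table contents, the function names and the example increments 59 and 177 (obtained by rounding 0.115 × 512 and 0.345 × 512) are illustrative and do not come from the slides.

```python
import math

TABLE_DEPTH = 512  # 2**9 entries, as stated on the slide

# Assumed 8-bit unsigned sine table; the real table contents are not given.
SINE_TABLE = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / TABLE_DEPTH))
              for i in range(TABLE_DEPTH)]


def tone_frequency(phase_increment, f_clk):
    """f_out = phase_increment * f_clk / 512, valid only below Nyquist."""
    if not 0 < phase_increment < TABLE_DEPTH // 2:
        raise ValueError("phase increment must be in 1..255 (f_out < f_clk/2)")
    return f_clk * phase_increment / TABLE_DEPTH


def nco_samples(bits, inc_low, inc_high, clocks_per_bit):
    """Step the phase accumulator once per clock; the input bit selects the increment."""
    phase = 0
    samples = []
    for bit in bits:
        increment = inc_high if bit else inc_low
        for _ in range(clocks_per_bit):
            samples.append(SINE_TABLE[phase])
            phase = (phase + increment) % TABLE_DEPTH  # 9-bit wrap-around
    return samples


if __name__ == "__main__":
    f_clk = 1.2e6  # hypothetical clock for the example
    print(tone_frequency(59, f_clk), tone_frequency(177, f_clk))
    print(nco_samples([0, 1], 59, 177, clocks_per_bit=8))
```

Because the phase is carried across bit boundaries, switching increments changes the tone without a phase jump, which is the usual reason for building an FSK modulator around a single accumulator.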
16 Basic BFSK Transmitter Block Diagram: Binary Input Data → Digital BFSK Modulator → DAC → Amplifier → Antenna. The modulator produces two tones: f1 if the input is LOW, f2 if the input is HIGH. The digital block is implemented using NPLA-based sub-threshold circuits.
17 System Architecture. Diagram labels: Input, Phase Accum, NCO, Binary to Thermometer Encoder, DFF, DAC, Amplifier, Antenna (Digital BFSK Modulator); CLK, BEAT CLK, Phase Detector, Charge Pump, Ref. PLA completion, Common Bulkn; bus-width labels 9, 8, 19. Digital BFSK implemented using NPLA. DAC input: 4 LSBs binary, 15 MSB lines thermometer-coded. This avoids glitches in the DAC output.
18 Delay Compensated Sub-threshold Design: block diagram. Diagram labels: NPLA (L1 PLA, L2 PLA, L3 PLA, L4 PLA), DFFs, Clk, Beat Clk, Phase Detector, Charge Pump, completion of reference PLA, common nbulk node of a cluster of PLAs (modulated by the charge pump).
19 HDL to Schematic of Digital BFSK. Digital BFSK transmitter described using VHDL. VHDL synthesized using an FPGA synthesis tool to get a gate-level netlist. This is imported into SIS in “blif” format. The “blif” file is logically optimized and mapped into NPLA. Technology-independent optimization is done on the circuit. The circuit is converted to a multi-level network of nodes with 5 or fewer inputs per node. The circuit is traversed from inputs to outputs, and nodes are implemented using PLAs of size (8/6/12). Using the NPLA throughput equation, f_clk is estimated as 1.2 MHz. We choose f1 ≈ 0.115 × f_clk and f2 = 0.345 × f_clk.
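22 Thermometer Coded 8-BIT DAC. Digital BFSK output: 4 bits go through binary to thermometer code conversion (15 lines) to the DAC; the 4 LSBs go to the DAC as binary. Example (binary → thermometer): 00 → 000, 01 → 001, 10 → 011, 11 → 111. Adjacent values differ by 1 bit.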
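The binary to thermometer conversion on this slide can be illustrated with a short sketch. The function name and the string representation are assumptions for the example; on the chip the encoder is a logic block, not software.

```python
def binary_to_thermometer(value, n_bits):
    """Return the thermometer code of `value` as a string of 2**n_bits - 1 bits."""
    width = 2 ** n_bits - 1
    if not 0 <= value <= width:
        raise ValueError("value does not fit in n_bits")
    # `value` ones on the right, zeros on the left, e.g. 2 -> "011" for n_bits = 2.
    return "0" * (width - value) + "1" * value


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Reproduces the 2-bit example table from the slide.
    for v in range(4):
        print(format(v, "02b"), "->", binary_to_thermometer(v, 2))
    # The 4-bit case drives 15 lines, as in the DAC on the slide.
    codes = [binary_to_thermometer(v, 4) for v in range(16)]
    print(all(hamming_distance(a, b) == 1 for a, b in zip(codes, codes[1:])))
```

Since only one line toggles between adjacent codes, no intermediate code can appear during a transition, which is why thermometer coding helps against output glitches.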
23 8-BIT DAC Schematic. CM legs and device sizes: T4 – T18: 16W1; B3: 8W1; B2: 4W1; B1: 2W1; B0: W1. Currents flow through mirror legs based on input value. Output current / voltage is modulated by the sum of weighted currents through Rout. Thermometer codes prevent glitches at the output. DAC supply is 0.7 V to handle 0.6 V digital signals. Rout, Rcm are off-chip resistances.
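The device sizes listed above suggest a segmented DAC in which the weighted leg currents add up to the 8-bit input code. The sketch below checks that arithmetic under the assumption that each of the 15 legs T4 to T18 is sized 16W1 and is switched by one thermometer line; the function name and the unit-current abstraction are illustrative.

```python
THERMOMETER_LEG_WEIGHT = 16          # assumed size of each leg T4..T18, in units of W1
BINARY_LEG_WEIGHTS = (8, 4, 2, 1)    # legs B3, B2, B1, B0, in units of W1


def dac_output_units(code):
    """Total mirror current for an 8-bit code, in multiples of the W1 leg current."""
    if not 0 <= code <= 255:
        raise ValueError("code must be an 8-bit value")
    upper = code >> 4     # upper 4 bits: number of thermometer legs switched on
    lower = code & 0xF    # lower 4 bits: drive the binary-weighted legs directly
    total = upper * THERMOMETER_LEG_WEIGHT
    for weight in BINARY_LEG_WEIGHTS:
        if lower & weight:
            total += weight
    return total


if __name__ == "__main__":
    print(dac_output_units(0), dac_output_units(0x80), dac_output_units(255))
```

With these weights the summed current is exactly proportional to the code over the full 0 to 255 range, so the voltage across Rout follows the sine table values.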
24 Amplifier Schematic. Common source amplifier. Supply of 0.7 V. Rd, Rs are off-chip resistances. M1 is biased by the DAC Rout resistor. CL: on-chip antenna load, 80 pF.
26 Layout. Manual PLA layout for every PLA in the design. NPLA routed using SEDSM. I/O pad cells and ESD diodes laid out manually. DAC and amplifier laid out manually. Antenna coil laid out manually.
27 PLA Layout. Figure labels: Word Lines; Input Bit Line; Output Lines; Transistors, modified based on logic to be implemented.
28 I/O PAD CELL Layout. Figure labels: I/O PAD, Primary ESD Diodes, Secondary ESD Diodes, I/O Drivers. Fully compliant with TSMC design rules. ESD diodes have guard rings to prevent latchup.
29 Die Photo. Digital BFSK output domain, 2 V; Digital BFSK inputs domain, 0.7 V; Digital BFSK domain, 0.6 V; Std Cell domain, 2.5 V.
30 Experimental Results from Silicon. Output of the BFSK transmitter is shown. As the input changes from 0 to 1, the output frequency changes, showing the modulation. Fclk = 1 MHz, F1 = 117 kHz, F2 = 347 kHz. The adjacent peaks are around 10 dB below the fundamental peaks. We found from MATLAB simulations that signals from the extracted Spice netlist could be demodulated at the receiver side.
31 Results from Silicon: Operating Range. Nbulk kept at 0 V and 0.45 V. Maximum frequency shows a quadratic dependence on supply voltage.
32 Power Comparison. Sub-threshold: operating voltage 0.6 V, frequency of operation 1.05 MHz, power dissipated 26.8 µW. Std Cell: operating voltage 2.5 V, frequency of operation 1.05 MHz, average current 208 µA, power dissipated 520 µW. Sub-threshold power is calculated only for the Phase Accumulator and NCO blocks on the 0.6 V power supply; the Std Cell design implements only this portion of the BFSK circuit. Sub-threshold gives 19.4X lower power.
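The figures on this slide can be cross-checked with a short calculation. The sketch assumes the unit prefixes lost in the transcript are micro (208 µA, 520 µW, 26.8 µW); that reading is consistent with P = V × I for the Std Cell row, but it is an assumption. The sub-threshold average current is not given in the transcript, so it is not used.

```python
def power_uw(voltage_v, current_ua):
    """Power in microwatts from a supply voltage in volts and a current in microamps."""
    return voltage_v * current_ua


# Values as read from the slide, with the micro prefix assumed.
STD_CELL_POWER_UW = power_uw(2.5, 208.0)
SUB_THRESHOLD_POWER_UW = 26.8

POWER_RATIO = STD_CELL_POWER_UW / SUB_THRESHOLD_POWER_UW

if __name__ == "__main__":
    print(f"Std Cell power: {STD_CELL_POWER_UW:.0f} uW")
    print(f"Std Cell / sub-threshold power ratio: {POWER_RATIO:.1f}X")
```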
33 Bulkn Node Modulation. The bulk node modulates when the beat clock demands speedup or slow-down. The bulk node modulates as the supply voltage is changed, so that circuit delay is maintained constant.
34 Conclusion. Validated a sub-threshold circuit design methodology based on dynamic body bias (first-of-kind). Validated design tools and techniques. The first-of-kind design automation flow will help bring sub-threshold design to the mainstream. We implemented an ultra low power, low data rate wireless BFSK transmitter. The fabricated chip works as expected, validating our design flow. We compared the sub-threshold design with a Std Cell based design and showed a 19.4X reduction in power.
37 Introduction. Power consumption has become a significant hurdle for recent ICs. Higher power consumption leads to shorter battery life and higher on-chip temperatures (reduced operating life of the chip). There is a large and growing class of applications where power reduction is paramount – not speed. Such applications are ideal candidates for sub-threshold circuit design. For sub-threshold circuits, VDD ≤ VT.
38 TX/RX System Testing. Photo labels: TX PCB with sub-threshold IC; TX antennas; RX board; RX setup.
39 Solving the Problem of Delay Sensitivity to Process, Voltage and Temperature Variations. "A Variation-tolerant Sub-threshold Design Approach", Jayakumar, Khatri. Design Automation Conference (DAC) 2005, Anaheim, CA, June 13-17.
40 An Example Showing Phase Locking. This figure shows how the body bias (and hence the delay of the PLA) changes with changes in VDD. The adjustment is very quick (within a few clock cycles). Figure annotations: VDD change 0.2 V to 0.22 V; VDD change 0.22 V to 0.18 V.
41 Energy and Speed. We may be interested in the minimum energy operating point for the design. Minimizing VDD reduces power, but minimum VDD does not mean minimum energy. The optimum VDD value increases with increased logical depth, and with temperature. "Minimum Energy Near-threshold Network of PLA based Design", Jayakumar, Khatri. International Conference on Computer Design (ICCD) 2005, Oct 2-5, San Jose, CA. Reclaiming the speed penalty: this can be done for datapath circuits, using asynchronous micropipelining. Showed that a speedup of 7X is possible, with an area overhead of 44%. "A PLA based Asynchronous Micropipelining Approach for Subthreshold Circuit Design", Jayakumar, Garg, Gamache, Khatri. IEEE/ACM Design Automation Conference (DAC) 2006, July 24-28, San Francisco, CA.
42 On-chip Antenna. Antenna size needs to be at least a tenth of the transmit wavelength to radiate effectively. Transmit wavelength is around 600 m. Due to on-chip space constraints, the antenna coil length is only 0.2 m. We have the option of using an external antenna. We also had a 60 dB safety margin in the link budget analysis, which could compensate for a lossy antenna.
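The size argument on this slide follows from wavelength = c / f. The sketch below evaluates it for the 450 kHz tone from the specification slide; the choice of that tone and the function names are assumptions, and the result (roughly 670 m) is only meant to be compared in order of magnitude with the "around 600 m" quoted above.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_m(frequency_hz):
    """Free-space wavelength in metres for a given frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def min_effective_antenna_m(frequency_hz):
    """Rule of thumb from the slide: at least a tenth of the transmit wavelength."""
    return wavelength_m(frequency_hz) / 10.0


if __name__ == "__main__":
    f2_hz = 450e3            # higher FSK tone from the specification slide
    coil_length_m = 0.2      # on-chip coil length from the slide
    print(f"wavelength: {wavelength_m(f2_hz):.0f} m")
    print(f"tenth of a wavelength: {min_effective_antenna_m(f2_hz):.1f} m")
    print(f"coil is shorter by a factor of {min_effective_antenna_m(f2_hz) / coil_length_m:.0f}")
```

The gap of more than two orders of magnitude between the coil length and a tenth of a wavelength is why the slide mentions the external antenna option and the link budget margin.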
43 Spectrum of Amplifier Tones. Fclk = 1 MHz, F1 = 117 kHz, F2 = 347 kHz. The adjacent peaks are around 10 dB below the fundamental peaks. We found from MATLAB simulations that signals from the extracted Spice netlist could be demodulated at the receiver side.