Low Power Design for Portable Devices

An investigation of techniques for the advancement of portable electronics
By: Patrick Gonzalez
Project Advisor: Dr. Baback Izadi
Project Co-Advisor: Dr. Damu Radhakrishnan

Outline Introduction Proposed PDA VLSI Level Low-Power
System Level Low-Power DC Power Systems Review Proposed PDA Accomplishments Current-Future Work Conclusion

Introduction Portable Devices are: Everywhere Here to stay
Advancing in capabilities rapidly In serious need of more power

Why Low Power?
Power is expensive, non-renewable, and negatively impacts the environment
Extending life of battery powered systems
Increased desire for complex devices equals greater need for power efficiency
Reliability is cut in half for every 10 °C rise in temperature

Project Background
Low power design group
Research several areas of low-power design
Proposed theoretical low-power PDA
Tested validity of new design
Researched three main areas
Investigated new design possibilities for each
My interest came from participating in the recently formed low-power design group. The first step was to research several areas of low-power design. Once a solid background was gained, I proposed a theoretical low-power PDA design. Next, the validity of the new design was tested by researching three main areas of the system: VLSI level, system level, and DC power. The possibility of new designs for each of these areas was then investigated. NEXT: DYNAMIC POWER EQUATION

Dynamic Power Consumption
$P_{Dynamic} = V_{DD} V_{Swing} C_{load} \alpha f + V_{CC} I_{sc} + V_{DD} I_{l}$
(terms, in order: switching power, short-circuit dissipation, leakage)
First, the total power consumption of a CMOS system should be defined. The power can be categorized into three areas, as shown here: the switching power, the short-circuit dissipation, and the leakage. The leakage is due to drain-junction and sub-threshold leakage. It accounts for only a very small portion of the overall power and, since it depends more on the design properties of the CMOS transistor, it was not a focus of my research. NEXT: SWITCHING POWER
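To make the three terms of the power equation concrete, here is a minimal Python sketch that evaluates each one separately. The function name and every numeric value below are assumptions chosen only for illustration; they do not come from the project.

```python
def cmos_power(v_dd, v_swing, c_load, activity, freq, v_cc, i_sc, i_leak):
    """Evaluate the three power terms (in watts) and their total.

    activity is the switching activity factor (alpha) and freq the clock
    frequency in Hz; the remaining symbols follow the slide's equation.
    """
    switching = v_dd * v_swing * c_load * activity * freq
    short_circuit = v_cc * i_sc
    leakage = v_dd * i_leak
    total = switching + short_circuit + leakage
    return switching, short_circuit, leakage, total


# Hypothetical operating point, for illustration only.
sw, sc, lk, total = cmos_power(
    v_dd=3.3, v_swing=3.3, c_load=1e-9, activity=0.2, freq=50e6,
    v_cc=3.3, i_sc=5e-3, i_leak=1e-6,
)
print(f"switching share of total: {sw / total:.0%}")
```

When the output swings rail to rail (v_swing equal to v_dd), the switching term grows with the square of the supply voltage, which is why lowering the voltage is such an effective lever.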

Switching Power
Gate capacitances of gate inputs connected to output
Interconnect capacitances
Diffusion capacitance of drains
The switching power, which constitutes about 85% of the total, is due to the charging and discharging of circuit capacitances. Using this inverter as a model, these include: the gate capacitances of the transistors connected to the output of the inverter; the interconnect capacitances, which can be thought of as the capacitances of the wiring; and the diffusion capacitance of the drains of the inverter itself. NEXT: SHORT CIRCUIT DISSIPATION

Short-Circuit Dissipation
Current flows from Vdd to ground
0 if Vdd < VtN + |VtP|
The short-circuit dissipation is due to short circuits that can occur during switching. Using the inverter as an example again: when one of the transistors is ‘on’ and the input voltage changes, the threshold voltage of the other transistor is reached before the conducting transistor can be turned off, so for a moment current flows directly from Vdd to ground. If the supply voltage is less than the sum of the two transistors' threshold magnitudes, it is impossible for both to conduct simultaneously, which reduces the short-circuit dissipation to 0. NEXT: THEORETICAL PDA

Proposed PDA
Dual-processor: specially designed ultra-low-power, high-performance
Dual battery design
SDRAM & Flash memories
Solar cell
After researching many different areas of low-power design, I was able to propose a theoretical PDA design. This design would incorporate two processors: one specially designed for ultra-low power, and the other a high-performance processor. Two batteries would also be used for the system. The memory system would include both SDRAM and Flash memory. The Flash memory would be used for saving data when the batteries are at the end of their lives, in order to avoid any loss of data. A simple solar cell was also included to allow some level of life extension for the batteries. NEXT: VLSI TECHNIQUES

VLSI Level Techniques
Adders: recent interest, common unit
Multipliers: power hungry, created using adders
For the VLSI level, I researched two areas: adders and multipliers. There has recently been a resurgence of interest in new low-power adder designs, because adders are among the most common and most often used units in processors. Multipliers were investigated because they are among the more power-hungry units, especially in DSP. They are generally created using adders and are also a very common unit in processor design. NEXT: ADDERS

CMOS Adders
Equations: $H = A \oplus B$, $Sum = H \oplus C_{in}$, $C_{out} = \bar{H} A + H C_{in}$
Conventional Design: 28 transistors, static logic, slow, power hungry
Recent Design: 14 transistors, pass logic, fast, low-power
Here are the general equations for a full adder design. The conventional CMOS design for a FA requires 28 transistors in a static logic configuration. This design is slow and power hungry. Recent designs have gotten down to 14 transistors using a pass logic configuration. I will not discuss the differences between static and pass logic; suffice it to say that pass logic is generally faster and consumes less power. It can be seen in the equations above that the XOR operation is one of the more important in the design of an adder. NEXT: XOR/XNOR
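The full adder equations can be checked exhaustively with a short Python sketch. It reads the carry equation as Cout = (NOT H)·A + H·Cin; the complement on the first H is an assumption, since an overbar is easily lost when slide text is extracted, but it is the reading that matches binary addition.

```python
from itertools import product


def full_adder(a, b, cin):
    """Full adder written directly from the XOR-based equations."""
    h = a ^ b                          # H = A xor B
    s = h ^ cin                        # Sum = H xor Cin
    cout = ((1 - h) & a) | (h & cin)   # Cout = (not H)*A + H*Cin
    return s, cout


# Compare against ordinary arithmetic for all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout
```

The carry term is effectively a 2-to-1 selection: when H is 0 the inputs A and B agree, so the carry is A; when H is 1 the carry follows Cin.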

CMOS Adders Cont. XOR / XNOR 6 Transistors Full voltage swing
Voltage scalable Fast I used this module as a core for my designs. I won’t go through the exact operation of the circuit. It is a recently proposed XOR/XNOR circuit, created by Dr. Radhakrishnan, which uses only 6 transistors, the lowest to date. It yields a full voltage swing at all nodes, is voltage scalable, and has a short delay time. NEXT: MY ADDER

CMOS Adders Cont. Personal Investigation 2 XOR / XNOR modules (12T)
Simple 2-1 MUX for Co output (4T) Requires only 2 standard cells Using this XOR/XNOR cell, I proposed the following FA design. For the sake of simplicity, I will discuss it in a block diagram format. The FA is created using… As can be seen, the layout of this design would require only 2 cells to create. It also has the bonus of an extra signal…a complementary sum. NEXT: FEATURES
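The block-level structure described here (two XOR/XNOR modules plus a 2-1 MUX for the carry) can be modeled behaviorally in Python. This is only a logic-level sketch: the exact wiring is an assumption inferred from the full adder equations, the function names are hypothetical, and nothing here models the transistor-level circuit.

```python
from itertools import product


def xor_xnor(x, y):
    """Behavioral stand-in for one XOR/XNOR cell: returns (xor, xnor)."""
    x_xor_y = x ^ y
    return x_xor_y, 1 - x_xor_y


def mux2(sel, in0, in1):
    """2-to-1 multiplexer: passes in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def proposed_full_adder(a, b, cin):
    """Assumed wiring: two XOR/XNOR modules and one MUX."""
    h, _h_bar = xor_xnor(a, b)       # first module: H and its complement
    s, s_bar = xor_xnor(h, cin)      # second module: Sum and complementary Sum
    cout = mux2(h, a, cin)           # carry: A when H = 0, Cin when H = 1
    return s, s_bar, cout


# Exhaustive check against ordinary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, s_bar, cout = proposed_full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout
    assert s_bar == 1 - s
```

The complementary sum falls out of the second XOR/XNOR module at no extra cost, which is the extra signal mentioned in the notes.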

CMOS Adders Cont. New Full Adder Circuit Features 16 Transistor design
Full voltage swing at all nodes Simple layout with fewer standard cells required (2) Extra complementary sum output (discussed later) Completely functional Size, power, and delay currently being investigated This full adder has the following features: NEXT: MULTIPLIERS

CMOS Multipliers
Conventional Design
Booth encoded for fewer partial products
Wallace tree structure for reducing partial products
Fast CSA or CLA for final addition
Multipliers were the other VLSI area that was investigated. A standard multiplier is generally created using the three stages listed here. The reduction of the partial products accounts for a majority of the total power dissipation, so this is the area that was concentrated on. NEXT: 4-2 COMPRESSOR

CMOS Multipliers Cont.
Partial Product Reductions
4-2 Compressor
Cout & C have same weight
Often using 2 FAs
40T, 32T, 30T, 28T
A main component involved in partial product (PP) reduction is a compressor, often a 4-2 compressor. For those who have not seen a 4-2 compressor, I will brush over it as quickly as possible. It can be viewed as a type of adder with a total of 5 bits as inputs (4 bits and a carry in). There are three outputs: a sum, a first carry out, and a second carry out. Unlike an adder, both of the carries have the same weight of 2, as can be seen in the truth table. The final entry shows that the sum of the inputs is 5; the outputs show this as each carry being set, for a total of 4, and the sum raising the total to 5. The most common approach for creating a 4-2 compressor is using 2 FAs. A conventional design uses 40 transistors, but recent investigations have generated the same operations using 32, 30, and 28 transistors. NEXT: 3-2 COMPRESSOR
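The conventional two-FA construction mentioned above can be sketched in Python to confirm the weighting of the outputs. The input names x1 to x4 and the order in which they feed the adders are assumptions; this models the common two-FA approach, not the new compressor proposed later.

```python
from itertools import product


def full_adder(a, b, c):
    """Reference full adder: returns (sum, carry)."""
    total = a + b + c
    return total & 1, total >> 1


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor built from two cascaded full adders."""
    s1, cout = full_adder(x1, x2, x3)    # first FA; Cout does not depend on Cin
    s, carry = full_adder(s1, x4, cin)   # second FA produces Sum and Carry
    return s, carry, cout


# Both carries have weight 2, so inputs must equal Sum + 2 * (Carry + Cout).
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

With all five inputs set, the function returns (1, 1, 1): each carry contributes 2 and the sum bit raises the total to 5, matching the final truth-table entry described in the notes.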

CMOS Multipliers Cont.
Personal Investigation
Consider a 3-2 compressor
1st half of sum same as in standard full adder
2nd half of sum equal to complement of standard FA sum
Cout is same, for each half, as that of standard FA
Carry is either 0 or ‘S’ depending on the value of ‘W’
Good candidate for new FA design
For my investigation, I considered a 3-2 compressor first. This would include a total of 4 inputs with 3 outputs, as shown here. It was noticed that, if the table is broken in half, the first half of the sum output is the same as that of a standard FA, and the second half is the complement of the FA sum. Since the table is broken into 2 parts, the MSb could be used to control which output is used. For Cout, the values were forced to resemble, in each half, the output of a FA carry. By doing this, it was found that the values of Carry would be either 0 or that of the FA sum. From this analysis, it was realized that the new FA design previously discussed would be a very good candidate for use in this design. But, since the 4-2 compressor is a more usable design for multipliers, this investigation was expanded to see what could be done for such a design. NEXT: MY 4-2 COMPRESSOR

CMOS Multipliers Cont. Personal Investigation Cont.
Expand to 4-2 compressor Same general concept XOR/XNOR of 2 MSbs control ‘S’ Cout is repeated for every quarter block Carry is either 1, 0, or S depending on XOR/XNOR value of the 2 MSbs Can be simplified with reference to proposed FA Here, when the table was sectioned by quarters, it was seen that the same general concept could be followed. The first and last quarters of the sum used the value of the sum of a FA which could be controlled by the XNOR of the 2 MSbs. Again, each section of Cout was forced to resemble that of the FAs carry out. And, when this was done, the final Carry output, also controlled by the XOR/XNOR of the 2 MSbs, would either take the value of the Carry in, or that of the standard FA sum. From this, the entire table was easily reduced to the one shown here: NEXT: SMALLER TABLE

CMOS Multipliers Cont. Personal Investigation Cont.
Expand to 4-2 compressor Same general concept XOR/XNOR of 2 MSbs control ‘S’ Cout is repeated for every quarter block Carry is either 1, 0, or S depending on XOR/XNOR value of the 2 MSbs Can be simplified with reference to proposed FA Where the outputs are all taken with respect to the outputs of the new FA, except for the final Carry when the XOR of the 2 MSbs is 0. From this, I was able to create the circuit for the compressor. NEXT: CIRCUIT DESIGN

CMOS Multipliers Cont.
Circuit Design
Proposed FA
XOR/XNOR
2 MUX
Total 30 Transistors
[Figure: block diagram of the new 4-2 compressor, built from the proposed FA, an XOR/XNOR module, a Sum MUX, and a Carry MUX. Signals labeled in the figure: X, Y, Z, W, Cin, S, Cout, Sum, Carry.]
The entire design was created using the modules listed here. Once each of these modules was wired together, a new 4-2 compressor was created with a total of 30 transistors. NEXT: NEW COMPRESSOR DETAILS

CMOS Multiplier Cont. New 4-2 compressor details
Full voltage swing at all nodes Strong noise margin 30 Transistors (Equal to smallest design w/ full swing) Usable for voltage range from 2|VT| and up Requires only 2 standard cells to complete the layout 2|Vt|… Assuming the thresholds of the pMOS and nMOS transistors are the same Only 2 standard cells… an XOR/XNOR cell and a 2-1 MUX cell NEXT: SCHEDULING

System Level Power Management
Scheduling Static Scheduling Dynamic Scheduling Multiprocessor Investigation The next area that was investigated was system level low power including Scheduling…both static and dynamic and a recent Multiprocessor design that was proposed NEXT: STATIC AND DYNAMIC

Voltage Scaling Static Scheduling Dynamic Scheduling Compile-Time
Task Ordering Dynamic Scheduling Run-Time Deadline Critical First shows what happens when this hypothetical application is run completely using low voltage. It can be seen that it over-runs its deadline. Static scheduling is performed when the application is compiled. The worst case times for each task are known, so the algorithm will sort through each task in order to find the task or tasks that need to be run at a higher voltage/frequency in order to meet the deadline time. Dynamic scheduling is performed during run time. For this approach, which is much more difficult to do, the worst case execution time for each different type of task is summed up. For this length of time, the processor is run at its low voltage level. Since some tasks will be completed earlier than the WC, an increase in the slack time can be obtained. But, once all of this time is used, the processor will switch to its high voltage level until, either the application is done, or more slack time is gained. NEXT: MULTIPROCESSOR
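The compile-time idea, picking which tasks must run at the higher voltage so that the worst-case total meets the deadline, can be sketched in Python as follows. The greedy selection rule (upgrade the task with the largest time saving first) is an assumption made for illustration and is not necessarily the algorithm behind the slide; the task names and times are hypothetical.

```python
def static_schedule(tasks, deadline):
    """Choose tasks to run at high voltage so worst-case time fits the deadline.

    tasks: list of (name, wcet_low, wcet_high), where wcet_low is the
    worst-case execution time at low voltage and wcet_high at high voltage.
    Returns the set of task names to run at high voltage, or None if the
    deadline cannot be met even with every task upgraded.
    """
    total = sum(wcet_low for _, wcet_low, _ in tasks)
    high_voltage = set()
    # Greedy assumption: consider tasks with the largest time saving first.
    by_saving = sorted(tasks, key=lambda t: t[1] - t[2], reverse=True)
    for name, wcet_low, wcet_high in by_saving:
        if total <= deadline:
            break
        total -= wcet_low - wcet_high
        high_voltage.add(name)
    return high_voltage if total <= deadline else None


# Hypothetical application: 22 time units at low voltage, deadline of 18.
tasks = [("A", 10, 6), ("B", 8, 5), ("C", 4, 3)]
print(static_schedule(tasks, deadline=18))
```

A run-time (dynamic) scheduler would instead start at low voltage and spend accumulated slack as tasks finish ahead of their worst-case times, switching to high voltage only when the slack runs out.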

Multiprocessor Investigation
Olsen & Morrow
2 Processors
Control logic
Problems: Design requires multiple operating systems; processors cannot share code or data
Recently, a dual-processor portable system has been proposed. It appears here to resemble the theoretical one mentioned earlier, and indeed much of the design is similar. But some troubling areas were found in this design. The design was assumed to use 2 off-the-shelf processors, and as such, 2 operating systems would be needed for the different instruction sets. This would also not allow any sharing of code or data between the 2 processors. These issues would cause a considerable amount of overhead for the design to function correctly. NEXT: HEURISTIC

Multiprocessor Personal Research
Heterogeneous Multiprocessor Scheduling Decision Heuristic for Low-Power
$DL(v_i, p_j) = SL(v_i) - \max\{t^A_{i,j}, t^M_j\} - \min\{t^A_{i,j}, t^M_j\} + \Delta(v_i, p_j) - E(v_i, p_j)$
$v_i$: task i
$p_j$: processor j
$DL(v_i, p_j)$: the dynamic level of a task-machine (processor) pair (the lower DL is, the higher the scheduling priority)
$SL(v_i)$: the static level of task i (larger value for long execution time tasks)
$t^A_{i,j}$: data ready time
$t^M_j$: processor ready time
$\max\{t^A_{i,j}, t^M_j\}$: a task-processor pair with an earlier starting time will have higher scheduling priority
$\min\{t^A_{i,j}, t^M_j\}$: a task-processor pair with an earlier deadline time will have higher scheduling priority
$\Delta(v_i, p_j)$: if processor j executes task $v_i$ faster than the other processors, its value will be positive, which increases the scheduling priority
$E(v_i, p_j)$: a measure of how much power will be used if task i is performed on processor j
For scheduling on the multiprocessors, I created this heuristic for dynamic scheduling for low power. It is a modified version of one that was used for a different purpose. I don’t have time to discuss every aspect of it, so I will just give the general idea. The dynamic level, or DL factor, is used to prioritize a task and processor pair. The lower the value, the more priority the pair has. The static level is a value derived from the task's worst-case execution time (WCET). The min and max factors control the priority with reference to the start time and deadline time of a task. The delta function compares a task on one processor against the same task run on the other processors that have been evaluated. And the E factor is a measure of how much power will be used if task i is performed on processor j. That about sums up a high-level approach to this heuristic, and to scheduling. Next, I will briefly discuss the DC power systems. NEXT: BATTERIES
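As a small illustration of how the terms of the heuristic combine, the following Python sketch evaluates DL for one task-processor pair. The function name, argument names, and sample numbers are hypothetical, all quantities are assumed to be expressed in the same arbitrary units, and how the pairs are then ranked is left to the description in the slide.

```python
def dynamic_level(static_level, data_ready, proc_ready, delta, energy):
    """Evaluate DL(v_i, p_j) for one task-processor pair.

    static_level: SL(v_i)
    data_ready:   data ready time for task i on processor j
    proc_ready:   ready time of processor j
    delta:        relative speed term for this processor (Delta)
    energy:       power/energy cost term E(v_i, p_j)
    """
    return (static_level
            - max(data_ready, proc_ready)
            - min(data_ready, proc_ready)
            + delta
            - energy)


# Hypothetical values for one task evaluated on two processors.
dl_low_power = dynamic_level(static_level=20, data_ready=4, proc_ready=6,
                             delta=-1, energy=1)
dl_high_perf = dynamic_level(static_level=20, data_ready=4, proc_ready=3,
                             delta=2, energy=5)
print(dl_low_power, dl_high_perf)
```

Because the max and the min of two numbers always add up to the sum of those two numbers, the two timing terms together subtract the data ready time plus the processor ready time.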

DC Power Systems
Battery Operation
Anode oxidizes (loses electrons)
Cathode reduces (gains electrons)
Electrolyte passes ions, allows equilibrium
* Or more simply, the batteries…
The basic operation of a battery is a redox reaction, in which the anode of the cell is oxidized, or loses electrons, and the cathode is reduced, or gains electrons. If these two components were connected outside of the electrolyte, an electron transfer would take place quickly, but only for a moment. This would occur because, as the electrons moved from the anode to the cathode, a positive charge would build up at the anode and a negative one at the cathode. After this, it would take a great deal of energy to transfer any more electrons. The electrolyte keeps this charge buildup from occurring by allowing charged ions to transfer, thus equalizing the charge states of the two electrodes. But this equilibrium is not perfect. NEXT: CONTINUOUS DISCHARGE

DC Power Systems Cont. Continuous Battery Discharge Rate
These are the general curves that would be seen for the continuous discharge of a battery. Of course, the greater the current delivered to the load, the faster the discharge. But, we have all probably experienced a time when we are using a flashlight or some battery operated device, and it goes dead. Then we turn it off for some time, and when we turn it back on, there is a small amount of charge available. Recently, this phenomenon has been investigated as a way to extend the lifetime of batteries in portable devices. NEXT: INTERMITTENT DISCHARGE

DC Power Systems Cont. Intermittent Battery Discharge Rate
Non-equal transfer between electrons and ions Charge recovery Level of recovery dependent on active and rest times This phenomenon occurs from an unequal transfer of electrons and ions. It has been found that the charge recovery is due to a stabilization of the charge between the two electrodes. The level of recovery has also been found to depend on the amount of time that the battery is actively delivering a charge vs. the time that it is given to rest. As shown here, the rightmost curve is for a battery that was continuously discharged at 300mA, and the leftmost is for the same continuous discharge, but at 600mA. The red, green, and blue curves represent the intermittent discharge of the battery with a 10 min period of active and rest states. The red curve is for an active time of 5.5 minutes with 4.5 minutes of rest. As we move left, the green plot is when the battery was only given a 4 minute rest time. And the blue is when only 2 minutes, of the 10, were given for rest. NEXT: INSTANTANEOUS VS. RAMPED

DC Power Systems Cont.
Research
Multi-cell Intermittent Discharge Techniques
Other investigations of pulse discharge involve immediate transitional battery switching
Proposed investigation into ramped voltage transitions for intermittent discharge
Now, in my research, I realized that all who have investigated this area have used only an immediate switching of the cells for pulse discharge. This approach can give rise to more noise, strong spikes, and possibly damage to the batteries themselves (resulting in a lower total lifetime for rechargeable cells). So, I have come up with a proposal for an investigation into … NEXT: RAMPED VOLTAGE GRAPHS

DC Power Systems Cont. Zero-delay ramped voltage switching
Extended rest ramped The general concept would be to simultaneously ramp two batteries between full charge delivery and 0 charge delivery. And, since it has been found that rest period lengths can have a big effect on the level of charge recovery, this can be done as seen in the lower graph. Here, one battery is allowed to recover, while the other one is used at full charge delivery for the same amount of time.
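The ramped hand-over between two batteries can be sketched in Python as a pair of complementary load shares. This is a simplified model under stated assumptions: linear ramps, shares expressed as fractions of the load rather than voltages, and hypothetical parameter names (period for one full cycle, ramp for the length of each transition). Setting ramp equal to half the period gives continuous ramping with no hold, while a shorter ramp leaves each battery a longer rest.

```python
def ramp_shares(t, period, ramp):
    """Return (share_a, share_b) of the load at time t; shares sum to 1.

    Battery A ramps up at the start of each period and ramps down at the
    half-period mark; battery B always carries the complementary share.
    """
    half = period / 2.0
    if not 0 < ramp <= half:
        raise ValueError("ramp must be positive and at most half the period")
    phase = t % period
    if phase < half:
        # A ramps from 0 to full delivery, then holds while B rests.
        share_a = min(phase / ramp, 1.0)
    else:
        # A ramps back down to 0, then rests while B carries the load.
        share_a = max(1.0 - (phase - half) / ramp, 0.0)
    return share_a, 1.0 - share_a


# Hypothetical 10-minute cycle with 2-minute ramps.
for minute in range(0, 11):
    print(minute, ramp_shares(minute, period=10, ramp=2))
```

Since the two shares always add up to 1, the load sees a constant supply while neither battery is switched in or out abruptly.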

Proposed PDA Design
Dual-Processors: ultra-low-power processor, same instruction set (close), shared pipelines (code & data)
Dynamic Scheduling: static scheduling (compile-time), dynamic scheduling (run-time)
Multi-Battery Supply: efficient voltage switching
Now, to revisit the entire PDA design idea and how these areas come into play. One of the two processors can be specially designed using ultra-low-power VLSI techniques. It can be created so as to use the same instruction set as the high-performance processor (or at least one very close to it). Using this approach, the possibility of sharing pipelines between the processors can be investigated. This would cut much overhead by allowing the 2 to share code and data. Static scheduling could be used to set each task to be executed on the most efficient processor. And dynamic, run-time scheduling could be used for times when the queue of one processor is too great and the other processor is not being used. Of course, the two batteries could be set up to use the most efficient switching technique, thereby extending their lives. Also, the rest state would be the perfect time for the solar cell to give some level of charge to the batteries. NEXT: ACCOMPLISHMENTS

Accomplishments Attained solid background in low-power design techniques as required for future investigations Created full reference document for members of the low-power group Created theoretical PDA design for group research involving several areas of engineering Validated several PDA design areas worthy of future research Sending out a conference paper on the new 4-2 compressor What was accomplished during this project… I attained a solid background in low-power design techniques which will be required for future research that I will be involved in here with the low-power group. NEXT: CURRENT & FUTURE WORK

Current & Future Work Currently involved in VLSI, Scheduling, & Multiprocessor techniques Research other areas (memory, solar charging,…) Develop mature design Expand research group (Faculty, Grad, Undergrad)

Conclusion Low-Power Design Absolute necessity Possible at all levels
Environmental – Toxic materials, CA energy crisis Economic – Demand for portable electronics Safety – Police, Fire, EMT, Forestry, … Manufacturability – Faster processors require lower power Possible at all levels Complete redesign of current device architectures

Acknowledgments Dr. Baback Izadi & Dr. Radhakrishnan Judy Depuy
John Aurrichio My fellow classmates and all others present

Any Questions?
