CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH

CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm - 2444 BH
Lecture 5 - ALU Design I: Adder and Simple Logic Operations Instructor: Prof. Jason Cong

Basis of Evaluation Cons Pros very specific non-portable
difficult to run, or measure hard to identify cause representative Actual Target Workload portable widely used improvements useful in reality less representative Full Application Benchmarks (e.g. SPEC’95) Small “Kernel” Benchmarks (e.g. Livermore loop) easy to “fool” easy to run, early in design cycle “peak” may be a long way from application performance identify peak capability and potential bottlenecks Microbenchmarks (e.g. MIPS, MFLOPS) CS151B Jason Cong

Aspects of CPU Performance
CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle instr count CPI clock rate Program X Compiler X X Instr. Set X X X Organization X X Technology X CS151B Jason Cong

Integrated Circuit Costs
Die cost = Wafer cost Dies per Wafer * Die yield Dies per wafer =  * ( Wafer_diam / 2)2 –  * Wafer_diam – Test dies  Wafer Area Die Area  2 * Die Area Die Area Die Yield = Wafer yield Die yield: assume defects are randomly, distributed,  { 1+ Defects_per_unit_area * Die_Area  } Die Cost is goes roughly with (die area)3 or (die area)4 CS151B Jason Cong

Good Dice Per Wafer (Before Testing!)
Die Yield Raw Dice Per Wafer wafer diameter die area (mm2) 6”/15cm 8”/20cm 10”/25cm die yield 23% 19% 16% 12% 11% 10% typical CMOS process:  =2, wafer yield=90%, defect density=2/cm2, 4 test sites/wafer Good Dice Per Wafer (Before Testing!) 6”/15cm 8”/20cm 10”/25cm typical cost of an 8”, 4 metal layers, 0.5um CMOS wafer: ~$2000 CS151B Jason Cong

Packaging Cost: depends on pins, heat dissipation
Other Costs IC cost = Die cost Testing cost Packaging cost Final test yield Packaging Cost: depends on pins, heat dissipation Chip Die Package Test & Total cost pins type cost Assembly 386DX $ QFP $1 $4 $9 486DX2 $ PGA $11 $12 $35 PowerPC 601 $ QFP $3 $21 $77 HP PA $ PGA $35 $16 $124 DEC Alpha $ PGA $30 $23 $202 SuperSPARC $ PGA $20 $34 $326 Pentium $ PGA $19 $37 $473 CS151B Jason Cong

The Design Process "To Design Is To Represent"
Design activity yields description/representation of an object -- Traditional craftsman does not distinguish between the conceptualization and the artifact -- Separation comes about because of complexity -- The concept is captured in one or more representation languages -- This process IS design Design Begins With Requirements One way to think about design is: To Design is to Represent. The result of your design activity is a representation of the object you are designing. Every design begins with a set of requirements. First is the functional requirement: a statement of what it will do. WHAT Then there is a list of performance requirements: what speed it needs to run, how much power it is allowed to consume, how big can your design be, how much will it cost ... etc. COST/PERFORMANCE +1 = 5 min. (X:45) -- Functional Capabilities: what it will do -- Performance Characteristics: Speed, Power, Area, Cost, . . . CS151B Jason Cong

Design is a "creative process," not a simple method
Design Process (cont.) Design Finishes As Assembly CPU -- Design understood in terms of components and how they have been interconnected/assembled -- Top Down decomposition of complex functions (behaviors) into more primitive functions -- Bottom-up composition of primitive building blocks into more complex assemblies Datapath Control ALU Regs Shifter Nand Gate One of the fun part about being a designer is that you got to be a little kid playing “lego” again, except this time, you will be designing the building blocks as well as putting the building blocks together. The two approaches you will use are: Top Down and Bottom Up. You use the Top Down approach to decompose complex function into primitive functions. After the primitive functions are implemented, you then need to integrate them back together to implement the original complex function. For example, when you design a CPU, you use the top down approach to break the CPU into these primitive blocks. Once you have these blocks implemented, you then put them together to form the CPU. This is pretty clean cut. In many other design problems, you cannot just apply the top-down and then bottom up once. You need to repeat the process several times because design is a creative process, NOT a simple method. Top-Down & Bottom-up together +2 = 7 min. (X:47) Design is a "creative process," not a simple method CS151B Jason Cong

Design Refinement Informal System Requirement Initial Specification
Intermediate Specification Final Architectural Description Intermediate Specification of Implementation Final Internal Specification Physical Implementation refinement increasing level of detail Unless you are a real genius like Mozart who could do everything right the first time, the “creative” process means a successive of refinement. So you would start out with an informal system requirement and keep on refining it until you have your final physical implementation. As you refine your design, you will keep on increasing the level of details in the specification. High level function (add) to gates +1 = 8 min. (X:48) CS151B Jason Cong

Design as Search Feasible (good) choices vs. Optimal choices Problem A
Strategy 1 Strategy 2 SubProb2 SubProb3 SubProb 1 BB1 BB2 BB3 BBn Design involves educated guesses and verification -- Given the goals, how should these be prioritized? -- Given alternative design pieces, which should be selected? -- Given design space of components & assemblies, which part will yield the best solution? Feasible (good) choices vs. Optimal choices One way to think about the design process is that it is a search for the proper solution through the design space (point to the diagram). How do you know where to find the proper solution? Well usually you don’t. What you need to do is make educated guesses and then verify whether your guesses are correct. If you are correct, you congratulate yourself. You you are wrong, try again. You will have a set of design goals: some are given to you by your supervisors and some may set some for you own. In any case, with a set of goals and some may be contradicting, you must learn how to prioritize them. The way to remember about design is that there are many ways to do the same thing. There is really no such thing as the absolute “right way” to do certain things. Ideally, we like to always pick the best solution but remember your goal should be best solution for the ORIGINAL problem (Problem A). For the Sub-problems down here, you may not need to have to make the optimal choice each time. Sometimes all you need is a reasonable good choice. It takes design time to do best choice at every level, and if run out of design time can jeopardize project in fast moving technology If you have a choice that is good enough for a sub-problem, you should be happy with it and move onto other sub-problems that require your attention. Remember, even the world’s fastest ALU will not do you any good unless you have an equally fast controller to controls it. +3 = 11 min. (X:51) CS151B Jason Cong

Problem: Design a “fast” ALU for the MIPS ISA
Requirements? Must support the Arithmetic / Logic operations Tradeoffs of cost and speed based on frequency of occurrence, hardware budget CS151B Jason Cong

Add, AddU, Sub, SubU, AddI, AddIU And, Or, AndI, OrI, Xor, Xori, Nor
MIPS ALU requirements Add, AddU, Sub, SubU, AddI, AddIU => 2’s complement adder/sub with overflow detection And, Or, AndI, OrI, Xor, Xori, Nor => Logical AND, logical OR, XOR, nor SLTI, SLTIU (set less than) => 2’s complement adder with inverter, check sign bit of result ALU from from CS 150 / P&H book chapter 4 supports these ops CS151B Jason Cong

MIPS arithmetic instruction format
31 25 20 15 5 R-type: op Rs Rt Rd funct I-Type: op Rs Rt Immed 16 Type op funct ADDI 10 xx ADDIU 11 xx SLTI 12 xx SLTIU 13 xx ANDI 14 xx ORI 15 xx XORI 16 xx LUI 17 xx Type op funct ADD 00 40 ADDU 00 41 SUB 00 42 SUBU 00 43 AND 00 44 OR 00 45 XOR 00 46 NOR 00 47 Type op funct 00 50 00 51 SLT 00 52 SLTU 00 53 Signed arith. generate overflow, no carry CS151B Jason Cong

Design Trick: divide & conquer
Break the problem into simpler problems, solve them and glue together the solution Example: assume the immediates have been taken care of before the ALU 10 operations (4 bits): M0 M1 M2 M3 00 add 01 addU 02 sub 03 subU 04 and 05 or 06 xor 07 nor 12 slt 13 sltU CS151B Jason Cong

Refined Requirements ALU (1) Functional Specification
inputs: 2 x 32-bit operands A, B, 4-bit mode outputs: 32-bit result S, 1-bit carry, 1 bit overflow operations: add, addu, sub, subu, and, or, xor, nor, slt, sltU (2) Block Diagram (powerview symbol, VHDL entity) 32 32 A B 4 ALU c m ovf S 32 CS151B Jason Cong

Bit slice with carry look-ahead . . .
Design Decisions ALU bit slice 7-to-2 C/L 7 3-to-2 C/L PLD Gates CL0 CL6 mux Simple bit-slice big combinational problem many little combinational problems partition into 2-step problem Bit slice with carry look-ahead . . . CS151B Jason Cong

Refined Diagram: bit-slice ALU
32 A B 32 ALU0 a0 b0 m cin co s0 ALU31 a31 b31 m cin co s31 4 M Ovflw 32 S CS151B Jason Cong

7-to-2 Combinational Logic
start turning the crank . . . Function Inputs Outputs K-Map M0 M1 M2 M3 A B Cin S Cout add Just fill in all combinations that you want 127 CS151B Jason Cong

Design trick 3: solve part of the problem and extend
Seven plus a MUX ? Design trick 2: take pieces you know (or can imagine) and try to put them together Design trick 3: solve part of the problem and extend A B 1-bit Full Adder CarryOut Mux CarryIn Result add and or S-select Now that I have shown you how to build a 1-bit full adder, we have all the major components needed for this 1-bit ALU. In order to build a 4-bit ALU, we simply connect four 1-bit ALUs in series to feed the CarryOut of one ALU to the CarryIn of the next ALU. Even though I called this an ALU, I actually lied a little. There is something missing about this ALU. This ALU can NOT perform the subtract operation. Let’s see how can we fix this problem. 2 min = 35 min. (Y:15) CS151B Jason Cong

Additional operations
A - B = A + (– B) = A + B + 1 form two complement by invert and add one S-select invert CarryIn and A or Result Mux add 1-bit Full Adder B CarryOut Set-less-than? – left as an exercise CS151B Jason Cong

LSB and MSB need to do a little extra
Revised Diagram LSB and MSB need to do a little extra 32 A B 32 a0 b0 a31 b31 4 ALU0 ALU0 M ? co cin co cin s0 s31 C/L to produce select, comp, c-in Ovflw 32 S CS151B Jason Cong

Overflow Examples: 7 + 3 = 10 but ... - 4 - 5 = - 9 but ... Decimal
Binary Decimal 2’s Complement 0000 0000 1 0001 -1 1111 2 0010 -2 1110 3 0011 -3 1101 4 0100 -4 1100 5 0101 -5 1011 6 0110 -6 1010 7 0111 -7 1001 -8 1000 Examples: = but ... = but ... Well so far so good but life is not always perfect. Let’s consider the case 7 plus 3, you will get 10. But if you perform the binary arithmetics on our 4-bit adder you will get 1010, which is negative 6. Similarly, if you try to add negative 4 and negative 5 together, you should get negative 9. But the binary arithmetics will give you 0111, which is 7. So what went wrong? The problem is overflow. The number you get are simply too big, in the positive 10 case, and too small in the negative 9 case, to be represented by four bits. +2 = 39 min. (Y:19) 1 1 1 1 1 1 1 7 1 1 – 4 3 – 5 + 1 1 + 1 1 1 1 1 – 6 1 1 1 7 CS151B Jason Cong

Overflow Detection Overflow: the result is too large (or too small) to represent properly Example: - 8  4-bit binary number  7 When adding operands with different signs, overflow cannot occur! Overflow occurs when adding: 2 positive numbers and the sum is negative 2 negative numbers and the sum is positive On your own: Prove you can detect overflow by: Carry into MSB  Carry out of MSB Recalled from some earlier slides that the biggest positive number you can represent using 4-bit is 7 and the smallest negative you can represent is negative 8. So any time your addition results in a number bigger than 7 or less than negative 8, you have an overflow. Keep in mind is that whenever you try to add two numbers together that have different signs, that is adding a negative number to a positive number, overflow can NOT occur. Overflow occurs when you to add two positive numbers together and the sum has a negative sign. Or, when you try to add negative numbers together and the sum has a positive sign. If you spend some time, you can convince yourself that If the Carry into the most significant bit is NOT the same as the Carry coming out of the MSB, you have a overflow. +2 = 41 min. (Y:21) 1 1 1 1 1 1 1 7 1 1 –4 3 – 5 + 1 1 + 1 1 1 1 1 – 6 1 1 1 7 CS151B Jason Cong

Overflow Detection Logic
Carry into MSB  Carry out of MSB For a N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1] CarryIn0 A0 1-bit ALU Result0 X Y X XOR Y B0 A1 B1 1-bit ALU Result1 CarryIn1 CarryOut1 CarryOut0 1 1 1 1 1 1 CarryIn2 A2 1-bit ALU Result2 Recall the XOR gate implements the not equal function: that is, its output is 1 only if the inputs have different values. Therefore all we need to do is connect the carry into the most significant bit and the carry out of the most significant bit to the XOR gate. Then the output of the XOR gate will give us the Overflow signal. +1 = 42 min. (Y:22) B2 CarryIn3 Overflow A3 1-bit ALU Result3 B3 CarryOut3 CS151B Jason Cong

LSB and MSB need to do a little extra
More Revised Diagram LSB and MSB need to do a little extra 32 A B 32 signed-arith and cin xor co a0 b0 a31 b31 4 ALU0 ALU0 M co cin co cin s0 s31 C/L to produce select, comp, c-in Ovflw 32 S CS151B Jason Cong

But What about Performance?
Critical Path of n-bit Rippled-carry adder is n*CP A0 B0 1-bit ALU Result0 CarryIn0 CarryOut0 A1 B1 Result1 CarryIn1 CarryOut1 A2 B2 Result2 CarryIn2 CarryOut2 A3 B3 Result3 CarryIn3 CarryOut3 Design Trick: Throw hardware at it CS151B Jason Cong

Carry Look Ahead (Design trick: peek)
C0 = Cin A B C-out 0 0 0 “kill” 0 1 C-in “propagate” 1 0 C-in “propagate” 1 1 1 “generate” A0 B0 A1 B1 A2 B2 A3 B3 S G P C1 = G0 + C0  P0 G = A and B P = A xor B C2 = G1 + G0 P1 + C0  P0  P1 Names: suppose G0 is 1 => carry no matter what else => generates a carry suppose G0 =0 and P0=1 => carry IFF C0 is a 1 => propagates a carry Like dominoes What about more than 4 bits? C3 = G2 + G1 P2 + G0  P1  P2 + C0  P0  P1  P2 G P C4 = . . . CS151B Jason Cong

Plumbing as Carry Lookahead Analogy
CS151B Jason Cong

Cascaded Carry Look-ahead (16-bit): Abstraction
G0 P0 C1 = G0 + C0  P0 4-bit Adder C2 = G1 + G0 P1 + C0  P0  P1 4-bit Adder C3 = G2 + G1 P2 + G0  P1  P2 + C0  P0  P1  P2 4-bit Adder G P CS151B Jason Cong C4 = . . .

2nd level Carry, Propagate as Plumbing
CS151B Jason Cong

Design Trick: Guess (or “Precompute”)
CP(2n) = 2*CP(n) n-bit adder n-bit adder CP(2n) = CP(n) + CP(mux) n-bit adder 1 n-bit adder n-bit adder Use multiplexor to save time: guess both ways and then select (assumes mux is faster than adder) Cout Carry-select adder CS151B Jason Cong

Carry Skip Adder: reduce worst case delay
B A4 B A0 4-bit Ripple Adder 4-bit Ripple Adder P3 S P3 S P2 P2 P1 P1 P0 P0 Just speed up the slowest case for each block Exercise: optimal design uses variable block sizes CS151B Jason Cong

Additional MIPS ALU requirements
Mult, MultU, Div, DivU (next lecture) => Need 32-bit multiply and divide, signed and unsigned Sll, Srl, Sra (next lecture) => Need left shift, right shift, right shift arithmetic by 0 to 31 bits Nor (leave as exercise to reader) => logical NOR or use 2 steps: (A OR B) XOR CS151B Jason Cong

Elements of the Design Process
Divide and Conquer (e.g., ALU) Formulate a solution in terms of simpler components. Design each of the components (subproblems) Generate and Test (e.g., ALU) Given a collection of building blocks, look for ways of putting them together that meets requirement Successive Refinement (e.g., carry lookahead) Solve "most" of the problem (i.e., ignore some constraints or special cases), examine and correct shortcomings. Formulate High-Level Alternatives (e.g., carry select) Articulate many strategies to "keep in mind" while pursuing any one approach. Work on the Things you Know How to Do The unknown will become “obvious” as you make progress. Here are some key elements of the design process. First is divide and conquer. (a) First you formulate a solution in terms of simpler components. (b) Then you concentrate on designing each components. Once you have the individual components built, you need to find a way to put them together to solve our original problem. Unless you are really good or really lucky, you probably won’t have a perfect solution the first time so you will need to apply successive refinement to your design. While you are pursuing any one approach, you need to keep alternate strategies in mind in case what you are pursuing does not work out. One of the most important advice I can give you is that work on the things you know how to do first. As you make forward progress, a lot of the unknowns will become clear. If you sit around and wait until you know everything before you start, you will never get anything done. +2 = 15 min. (X:55) CS151B Jason Cong

Summary of the Design Process
Hierarchical Design to manage complexity Top Down vs. Bottom Up vs. Successive Refinement Importance of Design Representations: Block Diagrams Decomposition into Bit Slices Truth Tables, K-Maps Circuit Diagrams Other Descriptions: state diagrams, timing diagrams, reg xfer, . . . Optimization Criteria: Gate Count [Package Count] top down bottom up mux design meets at TT This slide summaries some of the key points of the design process. Using a hierarchical design style is the best way to manage complexity because it allows you to ignore low level details while concentrating on the big picture. However, you cannot ignore the details forever. That’s why you need to use both the top-down and bottom-up strategies as you make successive refinements to your design. Some of the optimization criteria are: chip or broad area, pin count if you are designing a chip, delay, power, cost, and last but not least, the design time. As I pointed out at the last lecture, the computer market is so competitive that if your product is late by a year, you may fall way behind in the performance curve so design time can be one of the most important consideration. +2 = 57 min. (X:57) Area Logic Levels Fan-in/Fan-out Delay Power Pin Out Cost Design time CS151B Jason Cong

An Overview of the Design Process
Lecture Summary Cost and Price Die size determines chip cost: cost die size( +1) Cost v. Price: business model of company, pay for engineers R&D must return $8 to $14 for every $1 invester An Overview of the Design Process Design is an iterative process, multiple approaches to get started Do NOT wait until you know everything before you start Example: Instruction Set drives the ALU design On-line Design Notebook Open a window and keep an editor running while you work;cut&paste Refer to the handout as an example Former CS 152 students (and TAs) say they use on-line notebook for programming as well as hardware design; one of most valuable skills CS151B Jason Cong

Acknowledgements The majority of slides in this lecture are from UC Berkeley for their CS152 course (David Patterson, John Kubiatowicz, …) CS151B Jason Cong

CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH

Similar presentations

Presentation on theme: "CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH

Similar presentations

Presentation on theme: "CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH"— Presentation transcript:

Similar presentations

About project

Feedback