1 Introduction to Hardware/Architecture David A. Patterson EECS, University of California.

1 Introduction to Hardware/Architecture David A. Patterson http://cs.berkeley.edu/~patterson/talks patterson@cs.berkeley.edu EECS, University of California Berkeley, CA 94720-1776

2 Technology Trends: Microprocessor Capacity
- 2X transistors/chip every 1.5 years, called “Moore’s Law”
- Transistors per chip:
  - Alpha 21264: 15 million
  - Pentium Pro: 5.5 million
  - PowerPC 620: 6.9 million
  - Alpha 21164: 9.3 million
  - Sparc Ultra: 5.2 million

3 Technology Trends: Processor Performance
- Processor performance increase: 1.54X/yr
- This rate is often mistakenly referred to as Moore’s Law, which is about transistors/chip

4 Five components of any Computer
- Processor (active)
  - Control (“brain”)
  - Datapath (“brawn”)
- Memory (passive): where programs, data live when running
- Devices
  - Input: Keyboard, Mouse
  - Output: Display, Printer
  - Disk, Network

5 Computer Technology => Dramatic Change
- Processor
  - 2X in speed every 1.5 years; 1000X performance in last 15 years
- Memory
  - DRAM capacity: 2X / 1.5 years; 1000X size in last 15 years
  - Cost per bit: improves about 25% per year
- Disk
  - Capacity: > 2X in size every 1.5 years
  - Cost per bit: improves about 60% per year
  - 120X size in last decade
- State-of-the-art PC “when you graduate” (1997-2001)
  - Processor clock speed: 1500 MegaHertz (1.5 GigaHertz)
  - Memory capacity: 500 MegaBytes (0.5 GigaBytes)
  - Disk capacity: 100 GigaBytes (0.1 TeraBytes)
  - New units! Mega => Giga, Giga => Tera
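The "2X every 1.5 years" and "1000X in 15 years" figures on this slide are two views of the same compound growth. A minimal Python sketch of that arithmetic follows; the function names are illustrative and not from the talk.

```python
def growth_factor(doubling_period_years: float, years: float) -> float:
    """Total improvement after `years` if the quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def annual_rate(doubling_period_years: float) -> float:
    """Equivalent year-over-year improvement, as a fraction (0.5 means 50%)."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


# Doubling every 1.5 years for 15 years is ten doublings.
fifteen_year_gain = growth_factor(1.5, 15)
print(f"15-year gain: {fifteen_year_gain:.0f}X")

# The same trend expressed as a per-year rate.
print(f"per-year improvement: {annual_rate(1.5) * 100:.1f}%")
```

Ten doublings give 1024X, which the slide rounds to 1000X.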

6 Integrated Circuit Costs
Die cost = Wafer cost / (Dies per wafer * Die yield)
Die cost goes roughly with the cube of the die area: larger dies mean fewer dies per wafer, and yield gets worse with die area
[Figure: wafer showing dies and flaws]

7 Die Yield (1993 data)
Raw Dies Per Wafer
wafer diameter \ die area (mm^2)   100   144   196   256   324   400
6”/15cm                            139    90    62    44    32    23
8”/20cm                            265   177   124    90    68    52
10”/25cm                           431   290   206   153   116    90
die yield                          23%   19%   16%   12%   11%   10%
typical CMOS process: α=2, wafer yield=90%, defect density=2/cm^2, 4 test sites/wafer
Good Dies Per Wafer (Before Testing!)
wafer diameter \ die area (mm^2)   100   144   196   256   324   400
6”/15cm                             31    16     9     5     3     2
8”/20cm                             59    32    19    11     7     5
10”/25cm                            96    53    32    20    13     9
typical cost of an 8”, 4 metal layers, 0.5um CMOS wafer: ~$2000
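To see how quickly cost climbs with die area, the slide's good-die counts for an 8” wafer can be divided into its ~$2000 wafer cost. The Python sketch below assumes the $2000 figure applies to every column of the 8” row, which the slide does not state; the variable and function names are illustrative.

```python
# Figures read from the slide: good dies per 8-inch wafer, by die area.
WAFER_COST_USD = 2000  # the slide's "typical" 8-inch, 4-metal-layer wafer
DIE_AREAS_MM2 = [100, 144, 196, 256, 324, 400]
GOOD_DIES_8_INCH = [59, 32, 19, 11, 7, 5]


def cost_per_good_die(wafer_cost: float, good_dies: int) -> float:
    """Spread the whole wafer cost over the dies that are expected to work."""
    return wafer_cost / good_dies


costs = {
    area: cost_per_good_die(WAFER_COST_USD, good)
    for area, good in zip(DIE_AREAS_MM2, GOOD_DIES_8_INCH)
}

for area, cost in costs.items():
    print(f"{area:3d} mm^2 die: about ${cost:.0f} per good die")
```

These are pre-test figures, as the slide notes, so they leave out testing and packaging costs.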

8 1993 Real World Examples
Chip          Metal layers  Line width (um)  Wafer cost  Defects/cm^2  Area (mm^2)  Dies/wafer  Yield  Die cost
386DX         2             0.90             $900        1.0            43          360         71%    $4
486DX2        3             0.80             $1200       1.0            81          181         54%    $12
PowerPC 601   4             0.80             $1700       1.3           121          115         28%    $53
HP PA 7100    3             0.80             $1300       1.0           196           66         27%    $73
DEC Alpha     3             0.70             $1500       1.2           234           53         19%    $149
SuperSPARC    3             0.70             $1700       1.6           256           48         13%    $272
Pentium       3             0.80             $1500       1.5           296           40          9%    $417
From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
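The die costs in this table follow from the relation on the Integrated Circuit Costs slide: die cost = wafer cost / (dies per wafer * die yield). The Python sketch below recomputes each row from the table's own wafer cost, dies per wafer, and yield; the data layout and function name are illustrative.

```python
# (chip, wafer cost in USD, dies per wafer, die yield, die cost quoted on the slide)
CHIPS = [
    ("386DX", 900, 360, 0.71, 4),
    ("486DX2", 1200, 181, 0.54, 12),
    ("PowerPC 601", 1700, 115, 0.28, 53),
    ("HP PA 7100", 1300, 66, 0.27, 73),
    ("DEC Alpha", 1500, 53, 0.19, 149),
    ("SuperSPARC", 1700, 48, 0.13, 272),
    ("Pentium", 1500, 40, 0.09, 417),
]


def die_cost(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Wafer cost divided by the number of working dies on the wafer."""
    return wafer_cost / (dies_per_wafer * die_yield)


computed = {
    name: round(die_cost(wafer, dies, yld))
    for name, wafer, dies, yld, _quoted in CHIPS
}

for name, _wafer, _dies, _yld, quoted in CHIPS:
    print(f"{name:12s} computed ${computed[name]:>3d}  quoted ${quoted}")
```

Rounded to the nearest dollar, each computed value agrees with the quoted die cost.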

9 Processor Trends / History
- History of innovations to achieve 2X / 1.5 yr
  - Pipelining (helps seconds / clock, or clock rate)
  - Out-of-Order Execution (helps clocks / instruction)
  - Superscalar (helps clocks / instruction)
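The two ratios named on this slide are factors of the standard CPU performance equation: time = instructions * (clocks / instruction) * (seconds / clock). The slide does not spell the equation out, so the Python sketch below should be read as an illustration, with hypothetical workload numbers.

```python
def cpu_time_seconds(
    instructions: float, clocks_per_instruction: float, seconds_per_clock: float
) -> float:
    """Standard CPU performance equation."""
    return instructions * clocks_per_instruction * seconds_per_clock


# Hypothetical baseline: 1 billion instructions, CPI of 2.0, 10 ns clock.
baseline = cpu_time_seconds(1e9, 2.0, 10e-9)

# Pipelining attacks seconds/clock: assume the clock period halves.
pipelined = cpu_time_seconds(1e9, 2.0, 5e-9)

# Superscalar and out-of-order attack clocks/instruction: assume CPI halves too.
superscalar = cpu_time_seconds(1e9, 1.0, 5e-9)

print(f"baseline:    {baseline:.1f} s")
print(f"pipelined:   {pipelined:.1f} s")
print(f"superscalar: {superscalar:.1f} s")
```

Because the factors multiply, an improvement in clock rate and an improvement in clocks per instruction compound rather than add.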

10 Pipelining is Natural!
- Laundry Example
- Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
- Washer takes 30 minutes
- Dryer takes 30 minutes
- “Folder” takes 30 minutes
- “Stasher” takes 30 minutes to put clothes into drawers

11 Sequential Laundry
Sequential laundry takes 8 hours for 4 loads
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, 30 minutes per stage]

12 Pipelined Laundry: Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads!
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, 30 minutes per stage]
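The 8-hour and 3.5-hour figures follow from counting stage slots. The Python sketch below assumes every stage takes the same time and nothing stalls, as in the laundry example; the function names are illustrative.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load fills the pipeline; each later load finishes one stage later."""
    return (stages + loads - 1) * stage_minutes


LOADS, STAGES, STAGE_MINUTES = 4, 4, 30  # wash, dry, fold, stash at 30 minutes each

seq = sequential_minutes(LOADS, STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(LOADS, STAGES, STAGE_MINUTES)

print(f"sequential: {seq / 60} hours")
print(f"pipelined:  {pipe / 60} hours")
print(f"speedup:    {seq / pipe:.2f}X")
```

Pipelining does not shorten any single load, which still takes 2 hours. It raises throughput, and with more loads the speedup approaches the number of stages.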

13 Pipeline Hazard: Stall
A depends on D; stall since folder tied up
[Figure: timeline from 6 PM to 2 AM; loads A through F in task order, 30 minutes per stage, with a bubble where the pipeline stalls]

14 Out-of-Order Laundry: Don’t Wait
A depends on D; rest continue; need more resources to allow out-of-order
[Figure: timeline from 6 PM to 2 AM; loads A through F in task order, 30 minutes per stage, with a bubble]

15 Superscalar Laundry: Parallel per stage
More resources; does the HW match the mix of parallel tasks?
[Figure: timeline from 6 PM to 2 AM; loads A through F in task order, 30 minutes per stage, labeled (light clothing), (dark clothing), (very dirty clothing)]

16 Superscalar Laundry: Mismatch Mix
Task mix underutilizes extra resources
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, 30 minutes per stage, labeled (light clothing), (dark clothing), (light clothing)]

17 State of the Art: Alpha 21264
- 15M transistors
- Two 64KB caches on chip; 16MB L2 cache off chip
- Clock 600 MHz
- 90 watts
- Superscalar: fetches up to 6 instructions/clock cycle, retires up to 4 instructions/clock cycle
- Execution out-of-order

18 Other example: Sony Playstation 2
- Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)
  - Superscalar MIPS core + vector coprocessor + graphics/DRAM
  - Claim: “Toy Story” realism brought to games

19 The Goal: Illusion of large, fast, cheap memory
- Fact: Large memories are slow, fast memories are small
- How do we create a memory that is large, cheap and fast (most of the time)?
- Hierarchy of Levels
  - Similar to Principle of Abstraction: hide details of multiple levels

20 Hierarchy Analogy: Term Paper
- Working on paper in library at a desk
- Option 1: Every time you need a book
  - Leave desk to go to shelves (or stacks)
  - Find the book
  - Bring one book back to desk
  - Read the section you are interested in
  - When done with section, leave desk and go to shelves carrying book
  - Put the book back on shelf
  - Return to desk to work
  - Next time you need a book, go to first step

21 Hierarchy Analogy: Library
- Option 2: Every time you need a book
  - Leave some books on desk after fetching them
  - Only go to shelves when you need a new book
  - When you go to shelves, bring back related books in case you need them; sometimes you’ll need to return books not used recently to make space for new books on desk
  - Return to desk to work
  - When done, replace books on shelves, carrying as many as you can per trip
- Illusion: whole library on your desktop
- Buzzword “cache” from French for hidden treasure

22 Why Hierarchy works: Natural Locality
- The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time.
- What programming constructs lead to Principle of Locality?
[Figure: probability of reference plotted over the address space, from 0 to 2^n - 1]

23 Memory Hierarchy: How Does it Work?
- Temporal Locality (Locality in Time): keep most recently accessed data items closer to the processor
  - Library Analogy: Recently read books are kept on desk
  - Block is unit of transfer (like book)
- Spatial Locality (Locality in Space): move blocks consisting of contiguous words to the upper levels
  - Library Analogy: Bring back nearby books on shelves when you fetch a book; hope that you might need them later for your paper
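Both kinds of locality can be observed with a toy cache model. The Python sketch below simulates a small fully associative cache with least-recently-used replacement. That organization, the sizes, and the access traces are all assumptions chosen for illustration, not something the slides specify.

```python
from collections import OrderedDict


def hit_rate(addresses, num_blocks: int, block_words: int) -> float:
    """Fraction of accesses found in a fully associative LRU cache of blocks."""
    cache = OrderedDict()  # block number -> None, ordered from oldest to newest use
    hits = 0
    for addr in addresses:
        block = addr // block_words  # a block holds `block_words` contiguous words
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None  # fetch the whole block from the lower level
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(addresses)


# Spatial locality: walk an array word by word; each miss brings in 8 words.
spatial = hit_rate(list(range(1024)), num_blocks=4, block_words=8)

# Temporal locality: a loop touching the same 16 words 100 times.
temporal = hit_rate(list(range(16)) * 100, num_blocks=16, block_words=1)

# No locality: every access lands in a different block and is never reused.
none = hit_rate(list(range(0, 8192, 8)), num_blocks=4, block_words=8)

print(f"spatial:  {spatial:.3f}")
print(f"temporal: {temporal:.3f}")
print(f"none:     {none:.3f}")
```

Loops and sequential array walks, the kind of constructs the previous slide asks about, are what make the first two traces hit so often.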

24 Memory Hierarchy Pyramid
[Figure: pyramid of levels in the memory hierarchy, with the Central Processor Unit (CPU) at the top, then Level 1 (“Upper”), Level 2, Level 3, ..., Level n (“Lower”); labels: “Size of memory at each level” and “Increasing Distance from CPU, Decreasing cost / MB”]
(data cannot be in level i unless also in i+1)

25 Big Idea of Memory Hierarchy
- Temporal locality: keep recently accessed data items closer to processor
- Spatial locality: moving contiguous words in memory to upper levels of hierarchy
- Uses smaller and faster memory technologies close to the processor
  - Fast hit time in highest level of hierarchy
  - Cheap, slow memory furthest from processor
- If hit rate is high enough, hierarchy has access time close to the highest (and fastest) level and size equal to the lowest (and largest) level
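The "if hit rate is high enough" claim can be made concrete with the usual textbook estimate for a two-level hierarchy: average access time = hit time + miss rate * miss penalty. The slide does not give this formula or any timings, so the Python sketch below uses it as an assumption with hypothetical numbers.

```python
def average_access_time_ns(
    hit_time_ns: float, hit_rate: float, miss_penalty_ns: float
) -> float:
    """Two-level estimate: every access pays the hit time, misses pay the penalty too."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns


HIT_TIME_NS = 1.0        # hypothetical fast upper level
MISS_PENALTY_NS = 100.0  # hypothetical slow lower level

results = {
    rate: average_access_time_ns(HIT_TIME_NS, rate, MISS_PENALTY_NS)
    for rate in (0.90, 0.99, 0.999)
}

for rate, time_ns in results.items():
    print(f"hit rate {rate:.1%}: average access time {time_ns:.1f} ns")
```

With these numbers, raising the hit rate from 90% to 99.9% moves the average from about 11 ns to about 1.1 ns, close to the speed of the upper level alone.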

26 Disk Description / History
- 1973: 1.7 Mbit/sq. in., 140 MBytes
- 1979: 7.7 Mbit/sq. in., 2,300 MBytes
source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”
[Figure: disk drive anatomy, labeled Sector, Track, Cylinder, Head, Platter, Arm, Embed. Proc. (ECC, SCSI), Track Buffer]

27 Disk History
- 1989: 63 Mbit/sq. in., 60,000 MBytes
- 1997: 1450 Mbit/sq. in., 2300 MBytes (2.5” diameter)
source: N.Y. Times, 2/23/98, page C3
- 1997: 3090 Mbit/sq. in., 8100 MBytes (3.5” diameter)
- 2000: 10,100 Mbit/sq. in., 25,000 MBytes
- 2000: 11,000 Mbit/sq. in., 73,400 MBytes

28 State of the Art: Ultrastar 72ZX
- 73.4 GB, 3.5 inch disk
- 2¢/MB
- 16 MB track buffer
- 11 platters, 22 surfaces
- 15,110 cylinders
- 7 Gbit/sq. in. areal density
- 17 watts (idle)
- 0.1 ms controller time
- 5.3 ms avg. seek (seek 1 track => 0.6 ms)
- 3 ms = 1/2 rotation
- 37 to 22 MB/s to media
source: www.ibm.com; www.pricewatch.com; 2/14/00
Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth
(the terms before Size / Bandwidth are per-access costs; Size / Bandwidth is the per-byte cost)
[Figure: disk drive anatomy, labeled Sector, Track, Cylinder, Head, Platter, Arm, Embed. Proc., Track Buffer]
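The latency formula can be evaluated with the Ultrastar 72ZX figures from this slide. In the Python sketch below, the 64 KB request size, the zero queuing time, and the choice of 1 MB = 10^6 bytes are assumptions made for illustration; the other defaults come from the slide.

```python
def disk_latency_ms(
    size_bytes: int,
    bandwidth_mb_per_s: float,
    queuing_ms: float = 0.0,     # assumption: idle disk, no queue
    controller_ms: float = 0.1,  # slide: controller time
    seek_ms: float = 5.3,        # slide: average seek
    rotation_ms: float = 3.0,    # slide: half a rotation
) -> float:
    """Latency = queuing + controller + seek + rotation + size / bandwidth."""
    transfer_ms = size_bytes / (bandwidth_mb_per_s * 1e6) * 1e3  # 1 MB taken as 10^6 bytes
    return queuing_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


REQUEST_BYTES = 64 * 1024  # hypothetical 64 KB read

fastest = disk_latency_ms(REQUEST_BYTES, bandwidth_mb_per_s=37.0)  # slide's upper media rate
slowest = disk_latency_ms(REQUEST_BYTES, bandwidth_mb_per_s=22.0)  # slide's lower media rate

print(f"at 37 MB/s: {fastest:.2f} ms")
print(f"at 22 MB/s: {slowest:.2f} ms")
```

For a request this small, the per-access terms (8.4 ms of controller, seek, and rotation time) dominate the transfer time, so latency changes little between the fastest and slowest media rates.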

29 A glimpse into the future?
- IBM microdrive for digital cameras
  - 340 Mbytes
- Disk target in 5-7 years?

30 Questions? Contact us if you’re interested: email: patterson@cs.berkeley.edu http://iram.cs.berkeley.edu/
