1 Microprocessor: From the Humble Random-Logic Replacement to the Giant Killer
History, the Present, the International Scene, Own Research, and Some Closing Thoughts for CE Students
Ganesh Gopalakrishnan, School of Computing, University of Utah
NSF CSR-SMA: “Toward Reliable and Efficient Message Passing Software Through Formal Analysis” (the “Gauss” project)
2005-TJ-1318 (SRC/Intel Customization): “Scaling Formal Methods Towards Hierarchical Protocols in Shared Memory Processors” (the “MPV” project)
Microsoft HPC Innovation Center: “Formal Analysis and Code Generation Support for MPI”
2 The Microprocessor Rules!
“Computers are incredibly fast, accurate, and stupid; humans are incredibly slow, inaccurate, and brilliant; together they are powerful beyond imagination.” (often attributed to Albert Einstein)
Virtually all computers are based on the humble “micro”.
3 This Talk
-- How did the “micro” come about?
-- What is the latest in the world of the micro?
-- What about the international scene?
-- What research am I doing?
-- How about some time-tested advice for CE students?
4 Birth of the micro, Mu-P, …
Intel’s 4004 and TI’s TMS-1000 were the first microprocessors. [Photo: the 4004 with its cover removed (left) and on (right)]
The patent was awarded to TI!
Intel made a single-chip computer for Datapoint and marketed it as the 8008 when Datapoint did not use the design.
5 Revolution of the 70s and 80s
-- Intel: 4004, 4040, 8008, 8080, 8085, 8086, 80186, 80286, 80386, 80486, Pentium, Pentium Pro, … now “x86” (also Itanium)
-- Motorola: 6800 (with support chips such as the 6810 and 6820), 68000, 68010, 68020, … then PowerPC (in collaboration with IBM)
-- Other companies, too
-- Burst of activity: EVERY student wanted to build something out of a “mu-P” in the 70s and 80s.
6 … and it turned into a Giant Killer!
-- It became amply clear in the 80s that the micro was going to replace “mainframes”: casual experiments pitted a Sun-2 (68010) against Digital’s VAX-11/750 and 11/780.
-- The birth of the IBM PC in 1981 started things going the mu-P’s way!
7 … and a super Giant Killer!
John Hennessy’s prediction at SC’97 (http://news-service.stanford.edu/news/1997/november19/supercomp1119.html):
-- John Hennessy: “Today’s microprocessor chipping away at supercomputer market”
-- Traditionally designed supercomputers will vanish within a decade, and they have!
-- Bus-based multi-microprocessor machines rule! Clusters of them fill vast rooms now!
8 IBM ASCI White Machine
Released in 2000.
-- Peak performance: 12.3 teraflops.
-- Processors used: IBM RS/6000 SP POWER3, 375 MHz.
-- There are 8,192 of these processors.
-- The total amount of RAM is 6 TB.
-- Two hundred cabinets, covering the area of two basketball courts.
9 IBM Blue Gene/L
The first machine in the family, Blue Gene/L, is expected to operate at a peak performance of about 360 teraflops (360 trillion floating-point operations per second) and to occupy 64 racks, taking up only about the same space as half of a tennis court. Researchers at the Lawrence Livermore National Laboratory (LLNL) plan to use Blue Gene/L to simulate physical phenomena that require computational capability much greater than presently available, such as cosmology and the behavior of stellar binary pairs, laser-plasma interactions, and the behavior and aging of high explosives.
10 IBM POWER5-based supercomputer
8 dies × 2 CPUs × 2-way simultaneous multithreading (SMT) = 32-way shared-memory machine!
11 Sun Niagara processor
8 CPU cores with 4 hardware threads each, so it too is a 32-way machine (the threads are interleaved by fine-grained multithreading rather than SMT).
12 So what are the design issues?
-- Complex cache coherence protocols!
-- Silicon debugging is becoming a headache!
-- Programming apps is becoming hard!
13 What is cache coherence?
Thread and process interactions need to coordinate; otherwise, something analogous to this will happen!
Teller 1                               Teller 2
Read bank balance ($100)               Read bank balance ($100)
Add $10 on scratch paper ($110)        Subtract $10 on scratch paper ($90)
Enter $110 into account                Enter $90 into account
USER LEFT WITH $90, NOT WITH $100!!
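The teller story is a classic lost-update race. A minimal Python sketch, assuming a single shared balance and a hand-written schedule of read and write steps (the names `run`, `interleaved`, and `serial` are hypothetical), replays both orderings deterministically so the lost $10 is easy to see:

```python
def run(schedule, start=100):
    """Replay (teller, step) pairs against one shared bank balance."""
    balance = start
    scratch = {}                      # each teller's private "scratch paper"
    delta = {"T1": +10, "T2": -10}    # Teller 1 deposits, Teller 2 withdraws
    for teller, step in schedule:
        if step == "read":
            scratch[teller] = balance                  # copy the shared value
        elif step == "write":
            balance = scratch[teller] + delta[teller]  # overwrite the shared value
        else:
            raise ValueError(f"unknown step: {step!r}")
    return balance


# Both tellers read before either writes: Teller 2's write clobbers Teller 1's.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
# One teller finishes before the other starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

print(run(interleaved))  # 90: the $10 deposit is lost
print(run(serial))       # 100
```

Any mechanism that rules out the interleaved schedule, for example a lock held across each teller's read and write, restores the correct $100.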
14 Cache Coherence Protocol Verification
My “MPV” research project develops techniques to ensure that cache coherence protocols are correct …
[Figure: a protocol hierarchy with chip-level, intra-cluster, and inter-cluster protocols, a directory (dir), and memory (mem)]
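To give a flavor of what checking that a protocol is correct can mean, here is a toy explicit-state check in Python. It assumes a flat, atomic MSI protocol for a single cache line (each cache holds it Modified, Shared, or Invalid), which is far simpler than the hierarchical protocols MPV targets; `successors`, `coherent`, and `check` are hypothetical names, not the project's tools. It explores every reachable global state breadth-first and tests the single-writer invariant in each, and a deliberately buggy variant shows what a counterexample looks like.

```python
from collections import deque


def successors(state, invalidate_on_write=True):
    """Yield next global states; state[i] is cache i's copy: 'M', 'S', or 'I'."""
    n = len(state)
    for i in range(n):
        if state[i] == "I":
            # Read miss: cache i gets a Shared copy; a Modified owner downgrades to S.
            yield tuple("S" if (j == i or s == "M") else s for j, s in enumerate(state))
        if state[i] != "M":
            if invalidate_on_write:
                # Write: cache i becomes the only (Modified) copy.
                yield tuple("M" if j == i else "I" for j in range(n))
            else:
                # Deliberately buggy variant: other copies are NOT invalidated.
                yield tuple("M" if j == i else s for j, s in enumerate(state))
        if state[i] != "I":
            # Eviction: cache i drops its copy (data and write-back are not modeled).
            yield tuple("I" if j == i else s for j, s in enumerate(state))


def coherent(state):
    """Single-writer invariant: a Modified copy excludes every other valid copy."""
    m = state.count("M")
    return m == 0 or (m == 1 and state.count("S") == 0)


def check(n_caches=2, invalidate_on_write=True):
    """Breadth-first search over reachable states; return a violating state or None."""
    start = ("I",) * n_caches
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if not coherent(state):
            return state
        for nxt in successors(state, invalidate_on_write):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None


print(check(3))                             # None: the invariant holds
print(check(2, invalidate_on_write=False))  # ('S', 'M'): M alongside a Shared copy
```

Real protocols add message queues, transient states, and several levels of hierarchy, which is exactly what makes the state space explode and the verification problem hard.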
15 Programming these supercomputers!
My “Gauss” project (in collaboration with Robert M. Kirby) aims to ensure that supercomputer programs are free of bugs and also perform efficiently.
Virtually all supercomputers are programmed using the “MPI” communication library. Misusing this library can often result in bugs that show up only after porting to another machine or MPI implementation:
P1: MPI_SEND(to P2, Msg); MPI_RECV(from P2, Msg)
P2: MPI_SEND(to P1, Msg); MPI_RECV(from P1, Msg)
If the system does not provide sufficient buffering, both sends may block, causing a deadlock!
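Whether this exchange deadlocks depends on how much buffering the system provides, which is why the bug can surface only after porting. The toy scheduler below is not MPI and uses no MPI library; it is a hypothetical Python model in which `capacity` is the number of messages a channel can buffer (0 means a send completes only when the receiver is already waiting at the matching receive). It runs the exchange under both settings, plus a variant where P2 receives first.

```python
def simulate(programs, capacity):
    """Run per-process lists of ('send', peer) / ('recv', peer); return 'ok' or 'deadlock'."""
    pc = {p: 0 for p in programs}      # index of each process's next operation
    buffers = {}                       # (src, dst) -> number of buffered messages

    def current(p):
        ops = programs[p]
        return ops[pc[p]] if pc[p] < len(ops) else None

    while any(current(p) is not None for p in programs):
        progressed = False
        for p in programs:
            op = current(p)
            if op is None:
                continue
            kind, peer = op
            if kind == "send":
                if buffers.get((p, peer), 0) < capacity:
                    buffers[(p, peer)] = buffers.get((p, peer), 0) + 1  # buffered send
                    pc[p] += 1
                    progressed = True
                elif capacity == 0 and current(peer) == ("recv", p):
                    pc[p] += 1          # rendezvous: sender and receiver both complete
                    pc[peer] += 1
                    progressed = True
            elif kind == "recv" and buffers.get((peer, p), 0) > 0:
                buffers[(peer, p)] -= 1
                pc[p] += 1
                progressed = True
        if not progressed:
            return "deadlock"          # every unfinished process is blocked
    return "ok"


exchange = {"P1": [("send", "P2"), ("recv", "P2")],
            "P2": [("send", "P1"), ("recv", "P1")]}
reordered = {"P1": [("send", "P2"), ("recv", "P2")],
             "P2": [("recv", "P1"), ("send", "P1")]}

print(simulate(exchange, capacity=0))   # deadlock
print(simulate(exchange, capacity=1))   # ok
print(simulate(reordered, capacity=0))  # ok
```

Reordering one side removes the dependence on buffering; in real MPI, the combined `MPI_Sendrecv` or the nonblocking `MPI_Isend`/`MPI_Irecv` calls serve the same purpose.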
16 Ensuring Simulations are Correct and Efficient (Photo courtesy NHTSA)
17 Silicon Debugging: Can’t see “inside” CPUs without paying a huge price
-- CANNOT assume there is a “front-side bus”
-- CANNOT record all link traffic
-- CAN ONLY generate sets of possible cache states
-- HOW BEST can one match these against the designed behavior?
-- I did a preliminary study of a simple example during my sabbatical
-- Organizing workshop talks in November (FMCAD 2006)
[Figure: CPUs with invisible “miss” traffic and visible “miss” traffic]
18 The real rage these days
-- Multicores!
-- Put two simpler CPUs on a chip: each achieves 80% of the performance of the original CPU with only 50% of its power, so the chip as a whole gives 1.6x the performance for the same power, PROVIDED we can keep the cores busy
-- Simple ways to keep ’em busy: a virus-checker in the background while the user computes; Photoshop on one core and Windows on another
-- More complex ways to keep multiple cores busy are being investigated
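The 1.6x figure follows directly from the slide's numbers. Writing P and W for the performance and power of the original single CPU (symbols introduced here only for illustration), each simpler core delivers 0.8P at 0.5W:

```latex
\begin{aligned}
P_{\text{chip}} &= 2 \times 0.8\,P = 1.6\,P,\\
W_{\text{chip}} &= 2 \times 0.5\,W = 1.0\,W.
\end{aligned}
```

The same power budget buys 1.6x the throughput only while both cores have work; with one core idle, the chip delivers 0.8P, less than the original CPU.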
19 The real rage these days
-- Transactional memories!
-- Users cause too many bugs when programming with locks
-- Transactional memories allow shared-memory threads to “watch” each other’s read/write actions
-- Conflicting accesses can roll back and retry
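The roll-back-and-retry idea can be sketched in software. The Python below is a minimal optimistic-concurrency sketch, not hardware transactional memory and not any particular TM system: a hypothetical `VersionedCell` pairs the shared value with a version number, a transaction computes on a snapshot, and its commit succeeds only if no other commit happened in between; otherwise the result is discarded and the transaction retries. A single commit lock stands in for the hardware's conflict detection. Applied to the two tellers, no update is lost.

```python
import threading


class VersionedCell:
    """Shared value plus a version counter (hypothetical, for illustration)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()   # makes snapshot and validate+write atomic

    def snapshot(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self.version != seen_version:
                return False            # conflict: another transaction committed first
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run update() on a snapshot; discard the result and retry on conflict."""
    retries = 0
    while True:
        value, version = cell.snapshot()
        if cell.try_commit(version, update(value)):
            return retries
        retries += 1


account = VersionedCell(100)
tellers = [threading.Thread(target=atomically, args=(account, lambda b: b + 10)),
           threading.Thread(target=atomically, args=(account, lambda b: b - 10))]
for t in tellers:
    t.start()
for t in tellers:
    t.join()
print(account.value)  # 100, whatever the interleaving
```

The programmer writes only the `update` function; detecting the conflict and retrying is the system's job, which is the appeal over hand-placed locks.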
20 LOTS of hard problems remain open
-- How to provide memory bandwidth?
-- Put the multicore CPU chip on top of a highly dense DRAM chip (e.g., 8 GB)
-- Most users will buy just “one of those”
-- Others will buy SDRAM module add-ons
-- Slooow access: we may need to re-learn olden-day techniques of overlay programming!!
21 Learn from History – Learn Computer History
“If you want to understand today, you have to search yesterday.” (Pearl Buck)
-- Things are changing SO fast that basic principles are often being diluted
-- Get excited by studying computer history and seeing how much better off we are (and also be chagrined by all the lost opportunities!)
22 Where to learn computer history?
-- Computer History Museum, Mountain View
-- Intel Museum, Santa Clara
-- Boston Computer Museum
-- Many in the UK (Manchester, London, …)
-- Travel widely, and be inspired by what you see!
23 It is all International, baby!
-- Learn multiple cultures and how the world works
-- Anything that’s automated is outsourced
-- That said, the US has to try VERY hard to give up its amazing set of advantages: amazing work ethic, individuality, infrastructure!!!!!!!!
-- The talent pool is VERY deep here in the US
-- But, sadly, we as a nation are REALLY trying hard to foolishly give up a lot
-- Simple but sure-footed corrections now, and we will be well off
24 A glimpse of the International Scene
-- Lessons from MSR India: an amazing talent pool; relatively high availability of talent
-- Lessons from Intel India: the talent pool still lacks the depth and abilities of many of our CEs; we can stay competitive in hardware for a LONG time to come
-- Yet… it’s easy to ignore how SMART and numerous the talent outside the US is…
-- Seeing it first-hand is amazingly useful! Apply for international internships!
25 Gradual loss of manufacturing = death
-- How many goods are made in China these days? Any greenhouse-gas laws?
-- Airbus assembly factory moving to China
-- Autos… watch out
-- How much software and how many services are made in India?
-- THE REAL DANGER: loss of manufacturing kills pride and the incentive to learn, and we don’t want that in CE
26 Recipe for success
-- The best ideas don’t always work: wait for the world to be ready for the ideas; the devil is in the detail; there is too much established momentum
-- Decide your goal (short-term impact vs. long-term)
-- Quiet tenacity: tenacity without ruffling feathers needlessly
-- Work hard! Work smart! Learn theory! Be a champion algorithm/program designer! Learn advanced hardware design!
-- Learn to write extremely clearly and precisely!
-- Learn to give inspiring talks! (Be inspired first!)