COSC6385 Advanced Computer Architecture Lecture 0. Introduction Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston
Course Information Instructor: Larry Shi. Lab: PGH 348. Office Hours: TR after class or by appointment. Online resources: http://i2c.cs.uh.edu/class/spring2017-cosc6385/ (constantly updated, check it regularly). Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann. Other teaching materials: key papers, available later in class meetings and on the course website. Slides & Lectures Textbook
Readings
Goals of This Course: Understand fundamental challenges of achieving performance and efficiency in computer architecture (e.g., speed, throughput, power, size, cost); Become familiar with approaches and techniques used for achieving high performance and balancing conflicting requirements (speed, power, cost, size); Develop a knowledge base and the ability to critically evaluate and analyze the performance of processor-based systems
Introduction Rapidly changing field: vacuum tube -> transistor -> IC -> VLSI; doubling every 1.5 years: memory capacity, processor speed (due to advances in technology and organization). Things you’ll be learning: how computers work, a basic foundation; how to analyze their performance; issues affecting modern processors (caches, pipelines). Why learn this stuff? You want to call yourself a “computer expert”
Moore’s Law (a.k.a. Intel’s Roadmap) [Figure: exponential growth in transistor count. Labeled data points: 2,250 transistors (10 μm, 13.5 mm2); 42 million; 1.7 billion (Montecito, 90 nm, 596 mm2); 2 billion (Tukwila, 65 nm, 698 mm2); 3 billion (Nvidia Fermi, 40 nm).] Source: Intel Corp.
Historical Perspective ENIAC, built in World War II, was the first general-purpose computer. Used for computing artillery firing tables. 80 feet long by 8.5 feet high and several feet wide. Each of the twenty 10-digit registers was 2 feet long. Used 18,000 vacuum tubes. Performed 1900 additions per second. Since then, Moore’s Law: transistor density doubles every 18-24 months. Modern version: #cores double every 18-24 months
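To get a feel for what "doubles every 18-24 months" means over a longer span, the compounding can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name growth_factor and the 10-year horizon are assumptions chosen for the example, not figures from the slides.

```python
def growth_factor(years, doubling_period_years):
    """Return the multiplier for a quantity that doubles once per doubling period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon_years = 10  # illustrative horizon, not taken from the slides
    for period_months in (18, 24):
        factor = growth_factor(horizon_years, period_months / 12)
        print(f"doubling every {period_months} months -> "
              f"about {factor:.0f}x after {horizon_years} years")
```

The gap between the two ends of the 18-24 month range compounds quickly, which is why the exact doubling period matters when extrapolating.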
The Eckert–Mauchly Award The Eckert–Mauchly Award recognizes contributions to digital systems and computer architecture. It is known as the computer architecture community’s most prestigious award. It was named for John Presper Eckert and John William Mauchly, who between 1943 and 1946 collaborated on the design and construction of the first large scale electronic computing machine, known as ENIAC.
First Transistor
Feature Size Feature size shrinks to about 70% of its previous value (a 30% reduction) every 18 to 24 months
Transistor Cost
Why Learn This Stuff? You need to make a purchasing decision or offer “expert” advice
Why Learn This Stuff? You want to build software people use (need performance)
IBM Brain Simulation Project
High Speed Trading? Inline Assembly, GPU, FPGA
Bitcoin Mining
Course Scope To Learn Core concepts of modern microprocessor architecture: ISA, performance, pipelining; Instruction-level parallelism; Branch prediction and front-end fetch; Dynamic HW scheduling techniques; Memory hierarchy; Multiprocessors, SMT, multi-core, many-core; Cache coherence and memory consistency models; Case studies of commercial microprocessors; VLIW, EPIC, static scheduling; GPU; Virtualization and cloud; Physical design, emerging trends, technology integration
Course Scope To Learn (example: Intel Nehalem): Instruction-level parallelism; Pipelined architecture; Branch prediction; Cache design; Memory system optimizations; Coherent shared memory; Interconnection networks; Parallel architectures & clusters; Multithreading; Multicore
Course Scope
Requires time commitment
Grading Three homework assignments: 50%; Reading assignment: 10%; Exams: 40% (Midterm: 20%, Final exam: 20%)
Stack of a Computing Problem [Figure: layers from top to bottom: Problems; Algorithms; Programming Languages; Compilers; ISA; System Architecture; Implementation; MicroArchitecture; Logic and Circuits; Transistors; Manufacturing. Annotations: Apps Trend; Architects’ Territory; Technology Trend.]
Focus on Computer Architecture instruction set software hardware Technology Programming Languages Applications Computer Architecture Operating Systems History Virtualization
Software-Hardware Spiral Faster computing hardware drives ever increasing performance in software. In turn more advanced software drives innovation for increasing performance in hardware. And the spiral continues with hardware and software continuing to push the limits of capabilities with each new generation.
Computer Architecture How do we architect 1B+ transistors into efficient, cost-effective computing devices? The instruction set architecture: contract between the software and the implementation; necessary for Moore’s Law scaling! The microarchitecture: an implementation of the instruction set architecture. Modern instruction set architectures: 80x86 (aka IA-32), PowerPC (e.g. G4, G5), XScale, ARM, MIPS, Intel/HP EPIC (IA-64), AMD64, Intel’s EM64T, SPARC, HP PA-RISC, DEC/Compaq/HP Alpha
Instruction Sets at Present [Figure: market segments: High Performance Servers; Desktop, Notebook PC; Embedded Processors. Instruction sets shown: x86, PPC, Atom, ARM, MIPS, SH; x86, Itanium, SPARC, Alpha.]
Constantly Changing Definition 50s to 60s: Computer Architecture ~ Computer Arithmetic. 70s to mid 80s: Instruction Set Design, especially ISA appropriate for compilers. 90s: Speculation: predict this, predict that; memory system; I/O system; multiprocessors; networks. 2000s: Power efficiency, communication, on-die interconnection network, multi-this, multi-that (we are here). 2015 and beyond: Thousand-core processors, self-adapting systems? Self-organizing structures? DNA systems/quantum computing?
Constantly Changing Definition Pipelining ILP Multicore/Manycore
Technology Scaling 30% scaling down in dimensions -> doubles transistor density. Power per transistor: Vdd scaling -> lower power. Power components: dynamic power, leakage, short circuit
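The two scaling claims on this slide can be checked with a short Python sketch. Shrinking each linear dimension to 0.7x shrinks area to about 0.49x, which is why density roughly doubles. For power, the sketch assumes the standard textbook dynamic-power model P = a * C * Vdd^2 * f, which is not written on the slide; the function names and the sample numbers are likewise illustrative assumptions.

```python
def density_gain(linear_scale):
    """Density multiplier when every linear dimension is multiplied by linear_scale."""
    return 1.0 / (linear_scale ** 2)


def dynamic_power(activity, capacitance, vdd, frequency):
    """Assumed textbook model of dynamic (switching) power: a * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


if __name__ == "__main__":
    # A 30% shrink means each dimension becomes 0.7x its previous size.
    print(f"density gain at 0.7x dimensions: {density_gain(0.7):.2f}x")

    # Illustrative values only: same activity, capacitance, and frequency,
    # with the supply voltage lowered from 1.0 V to 0.7 V.
    before = dynamic_power(0.1, 1e-9, 1.0, 2e9)
    after = dynamic_power(0.1, 1e-9, 0.7, 2e9)
    print(f"dynamic power ratio after Vdd scaling: {after / before:.2f}")
```

Because Vdd enters the model squared, lowering the supply voltage is the most effective lever on dynamic power. The sketch covers only the dynamic component; leakage and short-circuit power are not modeled.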
Power Density Trend Source: Intel Corp.
Google Server Farms (Oregon)
SyNAPSE A circuit board with a 4×4 array of SyNAPSE-developed chips. Each chip has one million electronic “neurons” and 256 million electronic synapses between neurons. Built on 28nm process technology, the 5.4 billion transistor chip has one of the highest transistor counts of any chip ever produced as of 2014.
Google’s Tensor Processing Unit Google’s Tensor Processing Unit, a custom application-specific integrated circuit. The TPU was built specifically for machine learning applications and has apparently been running in Google’s own data centers for over a year.
Computer Architecture for Data Center
Accelerator Architecture
IVB + FPGA Software Development Platform
Quantum Computer Quest
Quantum Computer Quantum computer: a machine potentially far ahead of today’s best supercomputers. Makes use of such unusual properties of quantum physics as a particle’s ability to move in one direction and its opposite at the same time.
Job Description of a Computer Architect Used to be “Performance, performance, performance”. Make trade-offs among performance, complexity effectiveness, power, technology, cost, etc. New Fads: Availability (where do you store your photos, emails and shared docs today? Cloud computing); Reliability (Toyota blamed soft errors for the sudden acceleration problem); Security (Intel acquired McAfee); Power management (it is about money!)
Companies That Hire Computer Architects