COMPUTER ARCHITECTURE CS 6354
Fundamental Concepts: Computing Models
Samira Khan, University of Virginia, Jan 26, 2016
The content and concept of this course are adapted from CMU ECE 740

AGENDA
Review from last lecture
Why study computer architecture?
Fundamental concepts
– Computing models

LAST LECTURE RECAP
What it means/takes to be a good (computer) architect
– Roles of a computer architect (look everywhere!)
Levels of transformation
Abstraction layers, their benefits, and the benefits of comfortably crossing them
Two example problems and solution ideas
– Solving DRAM scaling with system-level detection and mitigation
– Merging memory and storage with non-volatile memories
Course logistics
– Assignments: HW (today), Review Set 1 (Saturday)

REVIEW: KEY TAKEAWAY
Breaking the abstraction layers (between components and transformation hierarchy levels) and knowing what is underneath enables you to solve problems and design better future systems
Cooperation between multiple components and layers can enable more effective solutions and systems

HOW TO DO THE PAPER REVIEWS
1: Brief summary
– What is the problem the paper is trying to solve?
– What are the key ideas of the paper? Key insights?
– What is the key contribution to the literature at the time it was written?
– What are the most important things you take away from it?
2: Strengths (most important ones)
– Does the paper solve the problem well?
3: Weaknesses (most important ones)
– This is where you should think critically. Every paper/idea has a weakness. This does not mean the paper is necessarily bad. It means there is room for improvement, and future research can accomplish this.
4: Can you do (much) better? Present your thoughts/ideas.
5: What have you learned/enjoyed/disliked in the paper? Why?
The review should be short and concise (~half a page to a page)

AGENDA
Review from last lecture
Why study computer architecture?
Fundamental concepts
– Computing models

AN ENABLER: MOORE’S LAW
Moore, “Cramming more components onto integrated circuits,” Electronics Magazine, 1965.
Component counts double every other year
Image source: Intel

Number of transistors on an integrated circuit doubles ~every two years
Image source: Wikipedia

RECOMMENDED READING
Moore, “Cramming more components onto integrated circuits,” Electronics Magazine, 1965.
Only 3 pages
A quote: “With unit cost falling as the number of components per circuit rises, by 1975 economics may dictate squeezing as many as 65,000 components on a single silicon chip.”
Another quote: “Will it be possible to remove the heat generated by tens of thousands of components in a single silicon chip?”

WHAT DO WE USE THESE TRANSISTORS FOR?
Your readings for this week should give you an idea…
Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proceedings of the IEEE, 2001.
Mutlu and Subramanian, “Research Problems and Opportunities in Memory Systems,” SUPERFRI, 2014.

WHY STUDY COMPUTER ARCHITECTURE?
Enable better systems: make computers faster, cheaper, smaller, more reliable, …
– By exploiting advances and changes in underlying technology/circuits
Enable new applications
– Life-like 3D visualization 20 years ago?
– Virtual reality?
– Personalized genomics? Personalized medicine?
Enable better solutions to problems
– Software innovation is built on trends and changes in computer architecture
– > 50% performance improvement per year has enabled this innovation
Understand why computers work the way they do

COMPUTER ARCHITECTURE TODAY (I)
Today is a very exciting time to study computer architecture
Industry is in a large paradigm shift (to multi-core and beyond) – many different potential system designs possible
Many difficult problems motivating and caused by the shift
– Power/energy constraints → multi-core?
– Complexity of design → multi-core?
– Difficulties in technology scaling → new technologies?
– Memory wall/gap
– Reliability wall/issues
– Programmability wall/problem
– Huge hunger for data and new data-intensive applications
No clear, definitive answers to these problems

COMPUTER ARCHITECTURE TODAY (II)
These problems affect all parts of the computing stack – if we do not change the way we design systems
No clear, definitive answers to these problems
[Figure: the levels of transformation – Problem, Algorithm, Program/Language, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons – with the User on top. Many new demands from the top (Look Up); many new issues at the bottom (Look Down); fast-changing demands and personalities of users (Look Up).]

COMPUTER ARCHITECTURE TODAY (III)
Computing landscape is very different from 10-20 years ago
Both UP (software and humanity trends) and DOWN (technologies and their issues), FORWARD and BACKWARD, and the resulting requirements and constraints
Every component and its interfaces, as well as entire system designs, are being re-examined
[Figure: examples – Heterogeneous Processors and Accelerators, General Purpose GPUs, Hybrid Main Memory, Persistent Memory/Storage]

COMPUTER ARCHITECTURE TODAY (IV)
You can revolutionize the way computers are built, if you understand both the hardware and the software (and change each accordingly)
You can invent new paradigms for computation, communication, and storage
Recommended book: Thomas Kuhn, “The Structure of Scientific Revolutions” (1962)
– Pre-paradigm science: no clear consensus in the field
– Normal science: dominant theory used to explain/improve things (business as usual); exceptions considered anomalies
– Revolutionary science: underlying assumptions re-examined

… BUT, FIRST …
Let’s understand the fundamentals…
You can change the world only if you understand it well enough…
– Especially the past and present dominant paradigms
– And, their advantages and shortcomings – tradeoffs
– And, what remains fundamental across generations
– And, what techniques you can use and develop to solve problems

AGENDA
Review from last lecture
Why study computer architecture?
Fundamental concepts
– Computing models

WHAT IS A COMPUTER?
Three key components:
– Computation
– Communication
– Storage (memory)

WHAT IS A COMPUTER?
[Figure: Processing (control/sequencing + datapath), Memory (program and data), and I/O, connected to one another]

THE VON NEUMANN MODEL/ARCHITECTURE
Also called the stored program computer (instructions in memory). Two key properties:
Stored program
– Instructions stored in a linear memory array
– Memory is unified between instructions and data; the interpretation of a stored value depends on the control signals
Sequential instruction processing
– One instruction processed (fetched, executed, and completed) at a time
– Program counter (instruction pointer) identifies the current instruction
– Program counter is advanced sequentially except for control transfer instructions
When is a value interpreted as an instruction?
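To make the two properties concrete, here is a minimal C sketch of the sequential fetch-decode-execute loop, using a toy accumulator ISA invented purely for illustration (not any real machine). Note how one memory array holds both instructions and data, and how a stored value is treated as an instruction only when the program counter reaches it.

#include <stdio.h>

/* Toy von Neumann machine (illustrative, hypothetical ISA): one memory
   array holds both the program and its data; a program counter (pc)
   sequences execution one instruction at a time. */
enum { HALT, LOAD, ADD, STORE, JUMP };

int main(void) {
    int mem[16] = { LOAD, 10, ADD, 11, STORE, 12, HALT, 0, 0, 0,
                    5, 7, 0, 0, 0, 0 };  /* instructions, then data */
    int pc = 0, acc = 0, running = 1;

    while (running) {
        int opcode = mem[pc++];            /* fetch; pc advances sequentially */
        switch (opcode) {                  /* decode and execute              */
        case LOAD:  acc  = mem[mem[pc++]]; break;
        case ADD:   acc += mem[mem[pc++]]; break;
        case STORE: mem[mem[pc++]] = acc;  break;
        case JUMP:  pc = mem[pc];          break;  /* control transfer        */
        case HALT:  running = 0;           break;
        }
    }
    printf("mem[12] = %d\n", mem[12]);     /* prints 12, i.e., 5 + 7 */
    return 0;
}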

THE VON NEUMANN MODEL/ARCHITECTURE
Stored program
Sequential instruction processing
Recommended reading
– Burks, Goldstine, and von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument,” 1946.

THE VON NEUMANN MODEL (OF A COMPUTER)
[Figure: CONTROL UNIT (IP, Inst Register), PROCESSING UNIT (ALU, TEMP), MEMORY (Mem Addr Reg, Mem Data Reg), INPUT, OUTPUT]

THE VON NEUMANN MODEL (OF A COMPUTER)
Q: Is this the only way that a computer can operate?
A: No.
Qualified answer: But, it has been the dominant way – i.e., the dominant paradigm for computing – for N decades

THE DATA FLOW MODEL (OF A COMPUTER)
Von Neumann model: An instruction is fetched and executed in control flow order
– As specified by the instruction pointer
– Sequential unless explicit control flow instruction
Dataflow model: An instruction is fetched and executed in data flow order
– i.e., when its operands are ready
– i.e., there is no instruction pointer
– Instruction ordering specified by data flow dependence
– Each instruction specifies “who” should receive the result
– An instruction can “fire” whenever all operands are received
– Potentially many instructions can execute at the same time
– Inherently more parallel

VON NEUMANN VS DATAFLOW
Consider a von Neumann program
– What is the significance of the program order?
– What is the significance of the storage locations?
v <= a + b;
w <= b * 2;
x <= v - w;
y <= v + w;
z <= x * y;
[Figure: the same program as a dataflow graph – inputs a and b feed a + node and a *2 node, whose results feed a - node and a + node, whose results feed a final * node producing z. Labeled “Sequential” vs. “Dataflow”.]
Which model is more natural to you as a programmer?
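The same five statements in runnable C, with comments marking the data dependences. This is a sketch of the ordering argument only, not of any dataflow machine: under the dataflow model, v and w can fire as soon as a and b arrive, then x and y, and z fires last.

#include <stdio.h>

int main(void) {
    int a = 3, b = 4;              /* example inputs                      */

    int v = a + b;   /* needs a, b  -> may fire together with w          */
    int w = b * 2;   /* needs b     -> may fire together with v          */
    int x = v - w;   /* needs v, w  -> may fire together with y          */
    int y = v + w;   /* needs v, w  -> may fire together with x          */
    int z = x * y;   /* needs x, y  -> fires last                        */

    /* A von Neumann machine executes the five statements in program
       order; a dataflow machine respects only the dependences above. */
    printf("z = %d\n", z);         /* (7 - 8) * (7 + 8) = -15            */
    return 0;
}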

MORE ON DATA FLOW
In a data flow machine, a program consists of data flow nodes
– A data flow node fires (is fetched and executed) when all its inputs are ready, i.e., when all inputs have tokens
[Figure: a data flow node and its ISA representation]

DATA FLOW NODES

AN EXAMPLE DATA FLOW PROGRAM
[Figure: a data flow graph producing an output OUT]

CONTROL FLOW VS. DATA FLOW

DATAFLOW MACHINE: INSTRUCTION TEMPLATES
Each arc in the graph has an operand slot in the program
[Figure: instruction templates with fields Opcode, Operand 1, Operand 2, Destination 1, Destination 2, and presence bits; destinations name template slots such as 3R, 4L, 4R, 5L, 5R, and out. The templates encode a small dataflow graph over inputs a and b using +, *, -, +, and a final * node.]
Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974.
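A C sketch of one such instruction template, with fields modeled on the figure above (opcode, two operand slots with presence bits, two destination slots); the struct and field names are illustrative, not taken from the Dennis and Misunas paper.

#include <stdbool.h>
#include <stdio.h>

/* One dataflow instruction template: operands live in the program
   itself, and presence bits record which tokens have arrived. */
typedef struct {
    const char *opcode;   /* e.g., "+", "-", "*"                 */
    int  operand[2];      /* operand slots                       */
    bool present[2];      /* presence bits                       */
    int  dest[2];         /* destination template slots          */
} Template;

/* The firing rule: a node may fire only when all presence bits are set. */
static bool ready(const Template *t) {
    return t->present[0] && t->present[1];
}

int main(void) {
    Template add = { "+", {0, 0}, {false, false}, {3, 4} };
    add.operand[0] = 7; add.present[0] = true;   /* first token arrives  */
    printf("ready? %d\n", ready(&add));          /* 0: still waiting     */
    add.operand[1] = 4; add.present[1] = true;   /* second token arrives */
    printf("ready? %d\n", ready(&add));          /* 1: the node can fire */
    return 0;
}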

DATA FLOW SUMMARY
Availability of data determines order of execution
A data flow node fires when its sources are ready
Programs represented as data flow graphs (of nodes)
Data flow at the ISA level has not been (as) successful
Data flow implementations under the hood (while preserving sequential ISA semantics) have been successful
– Out-of-order execution
– Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986.

DATA FLOW CHARACTERISTICS
Data-driven execution of instruction-level graphical code
– Nodes are operators
– Arcs are data (I/O)
– As opposed to control-driven execution
Only real dependencies constrain processing
No sequential I-stream
– No program counter
Operations execute asynchronously
Execution triggered by the presence of data

DATA FLOW ADVANTAGES/DISADVANTAGES
Advantages
– Very good at exploiting irregular parallelism
– Only real dependencies constrain processing
Disadvantages
– Debugging difficult (no precise state); interrupt/exception handling is difficult (what is precise state semantics?)
– Implementing dynamic data structures difficult in pure data flow models
– Too much parallelism? (Parallelism control needed)
– High bookkeeping overhead (tag matching, data storage)
– Instruction cycle is inefficient (delay between dependent instructions), memory locality is not exploited

ANOTHER WAY OF EXPLOITING PARALLELISM
SIMD:
– Concurrency arises from performing the same operations on different pieces of data
MIMD:
– Concurrency arises from performing different operations on different pieces of data
– Control/thread parallelism: execute different threads of control in parallel → multithreading, multiprocessing
– Idea: Use multiple processors to solve a problem

FLYNN’S TAXONOMY OF COMPUTERS
Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966
SISD: Single instruction operates on single data element
SIMD: Single instruction operates on multiple data elements
– Array processor
– Vector processor
MISD: Multiple instructions operate on single data element
– Closest form: systolic array processor, streaming processor
MIMD: Multiple instructions operate on multiple data elements (multiple instruction streams)
– Multiprocessor
– Multithreaded processor

SIMD PROCESSING
Concurrency arises from performing the same operations on different pieces of data
– Single instruction multiple data (SIMD)
– E.g., dot product of two vectors
Contrast with thread (“control”) parallelism
– Concurrency arises from executing different threads of control in parallel
Contrast with data flow
– Concurrency arises from executing different operations in parallel (in a data-driven manner)
SIMD exploits instruction-level parallelism
– Multiple instructions concurrent: instructions happen to be the same
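The dot product mentioned above, written as a plain C loop: every iteration applies the identical multiply-accumulate to different elements, which is exactly the structure SIMD hardware exploits. A sketch only; real SIMD code would use vector instructions or intrinsics.

#include <stdio.h>

/* Same operation, different data: each iteration is a candidate for
   a separate SIMD lane. */
float dot(const float *a, const float *b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    printf("%f\n", dot(a, b, 4));   /* 5 + 12 + 21 + 32 = 70 */
    return 0;
}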

SIMD PROCESSING
Single instruction operates on multiple data elements
– In time or in space
Multiple processing elements
Time-space duality
– Array processor: Instruction operates on multiple data elements at the same time
– Vector processor: Instruction operates on multiple data elements in consecutive time steps

ARRAY VS. VECTOR PROCESSORS
Instruction stream:
LD VR ← A[3:0]
ADD VR ← VR, 1
MUL VR ← VR, 2
ST A[3:0] ← VR
[Figure: time-space diagram. The array processor executes each instruction on all four elements (LD0–LD3, AD0–AD3, MU0–MU3, ST0–ST3) in the same time step: same operation at the same time, across different space. The vector processor pipelines the elements, so each element occupies the same space while different operations execute at different times.]
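For reference, here is what the four-instruction vector sequence above computes, as an equivalent scalar C loop; one vector instruction stream replaces four scalar loop iterations.

#include <stdio.h>

/* Scalar equivalent of: LD VR <- A[3:0]; ADD VR <- VR, 1;
   MUL VR <- VR, 2; ST A[3:0] <- VR. An array processor applies each
   operation to all four elements in the same time step; a vector
   processor streams the elements through each unit in consecutive
   time steps. */
int main(void) {
    int A[4] = {10, 20, 30, 40};
    for (int i = 0; i < 4; i++)
        A[i] = (A[i] + 1) * 2;
    printf("%d %d %d %d\n", A[0], A[1], A[2], A[3]);  /* 22 42 62 82 */
    return 0;
}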

SCALAR PROCESSING
Conventional form of processing (von Neumann model)
add r1, r2, r3

SIMD ARRAY PROCESSING
Array processor

VLIW PROCESSING
Very Long Instruction Word
– We will get back to this later

VECTOR PROCESSORS
A vector is a one-dimensional array of numbers
Many scientific/commercial programs use vectors
for (i = 0; i <= 49; i++)
    C[i] = (A[i] + B[i]) / 2
A vector processor is one whose instructions operate on vectors rather than scalar (single data) values
Basic requirements
– Need to load/store vectors → vector registers (contain vectors)
– Need to operate on vectors of different lengths → vector length register (VLEN)
– Elements of a vector might be stored apart from each other in memory → vector stride register (VSTR)
– Stride: distance between two elements of a vector
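A C sketch of how the 50-element loop above would be strip-mined on a machine whose maximum vector length is smaller than 50. The maximum vector length of 16 and the stride of 1 are assumptions for illustration, and the variables merely model the VLEN and VSTR registers; they are not real hardware state.

#include <stdio.h>

#define N        50
#define MAX_VLEN 16   /* assumed maximum vector length of the machine */

int main(void) {
    int A[N], B[N], C[N], vstr = 1;   /* vstr models the stride register */
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; }

    /* Strip-mining: process the vector in chunks of at most MAX_VLEN;
       vlen models the vector length register for the current chunk. */
    for (int base = 0; base < N; base += MAX_VLEN) {
        int vlen = (N - base < MAX_VLEN) ? (N - base) : MAX_VLEN;
        for (int j = 0; j < vlen; j++)   /* one vector instruction's work */
            C[base + j * vstr] = (A[base + j * vstr] + B[base + j * vstr]) / 2;
    }
    printf("C[49] = %d\n", C[49]);       /* (49 + 98) / 2 = 73 */
    return 0;
}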

VECTOR PROCESSOR ADVANTAGES
+ No dependencies within a vector
– Pipelining, parallelization work well
– Can have very deep pipelines, no dependencies!
+ Each instruction generates a lot of work
– Reduces instruction fetch bandwidth
+ Highly regular memory access pattern
– Interleaving multiple banks for higher memory bandwidth
– Prefetching
+ No need to explicitly code loops
– Fewer branches in the instruction sequence
