1
Computer Organization & Design (COD)
Peng Liu (刘鹏) College of Information Science and Electronic Engineering 信息与电子工程学院 Zhejiang University 浙江大学
2
Course Outline Instructor Prerequisites Topics of Study
Course Expectations Grading Logic Review and MIPS ISA
3
Course Information
Instructor: LIU, Peng (刘鹏)
Office Hours: TBD, Room 306, ISEE Building, Yuquan Campus; or just stop by whenever
学在浙里 (Learn at ZJU)
TA: ZHANG, Zhendong (张振东)
Office Hours: Friday 14:00-16:00, Room 309, ISEE Building
4
Prerequisites Logic Design C / JAVA / Python Verilog / OpenCL / C++
A background in compilers, operating systems, and circuits/VLSI is a plus, but not required
5
Other Course Info
Textbook: Computer Organization and Design (The Hardware/Software Interface), D. Patterson & J. Hennessy, 5th edition, Morgan Kaufmann (China Machine Press, 机械工业出版社), 2014. The CD includes manuals, appendices, and in-depth sections; the "Green Card" summarizes the MIPS ISA.
Website: webpage for paper links (might be slow to update); check frequently
6
Grading
Homework Sets - 30%
Two Quizzes - 10%
Two Projects (2 programming assignments) - 10%
Final Exam - 50%
Course score = Lab score * (0.5/3.5) + COD * (3.0/3.5)
7
Major Topics Hardware-software interface
Machine language and assembly language programming
Compiler optimizations and performance
Processor design
Pipelined processor design
Memory hierarchy design
Caches
I/O devices and systems
Virtual memory & operating systems support
Multiprocessors and multithreading
8
Why take this class? Learn how modern computing hardware works
Understand where computing hardware is going in the future, and learn how to contribute to this future. How does this impact system software and applications? It is essential for understanding OS/compilers/PL (programming languages); for everyone else, it can help you write better code! How are future technologies going to impact computing systems?
9
Topics of Study Focus on what modern computer architects worry about (in both academia and industry): get through the basics of modern processor design; look at technology trends: multithreading, chip multiprocessors, power-, reliability-, and security-aware design; recent research ideas, and the future of computing hardware
10
COD 2017 Zhejiang University
Lecture 1 Introduction to Programmable Digital Systems and MIPS Instruction Set Architecture
11
Current State of the World
Electronic systems dominate almost everything, and most of these systems use processors and memory. Why? Break this question into three questions:
Why electronics?
Why use digital integrated circuits (ICs) to build electronics?
Why use processors in ICs?
Why use electronics? Electrons are easy to move and control, easier than the current alternatives. The result is that we move information, not real physical stuff. Think phone, fax, TV, WWW, etc.
12
Programmable Components aka Processors
An old approach to "solve" the complexity problem: build a generic device and customize it with memory, through a process called programming, and (re)use the device in a large number of systems. The best way to do this is with a general-purpose processor. Processor complexity grows with technology, but the software model stays roughly the same: C, C++, Java, ... run on Pentium 2, 3, 4, M, Core, Core 2, ... This is true for sequential programs, but it is getting much tougher to maintain: recent hardware developments (multi-core) require software model changes.
13
The Complexity Problem
Complexity is the limiting factor in modern chip design. Two problems:
1. How do you make use of all those IC resources? Appliances (cellphone, iPad, Apple TV, video camera): there are too many applications to cast all of them into hardware logic, and it takes too long to finish the design.
2. How do you make sure it works? The verification problem; how do you fix bugs?
The only way to survive complexity: hide complexity in "general-purpose" components, and reuse components.
14
Embedded Neural Networks
Local processing at 1-to-10 TOPS/W. CNN processing is crucial for always-on embedded operation. Source: ISSCC
15
What is Computer Architecture?
16
Challenges in the 21st Century
17
Modeling + Design First Component (Modeling/Measurement):
Come up with a way to: Diagnose where power is going in your system Quantify potential savings Second Component (Design) Try out lots of ideas Or characterize tradeoffs of ideas… This class will focus on both of these at many levels of the computing hierarchy
18
What is Computer Architecture?
From Application down to Physics, the gap is too large to bridge in one step (but there are exceptions, e.g. the magnetic compass). In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
19
Abstraction Layers in Modern Systems
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics
The ISA and microarchitecture layers were the original domain of the computer architect ('50s-'80s); the layers up through OS/virtual machines became the domain of computer architecture in the '90s; concerns such as reliability, power, parallel computing, and security drove a reinvigoration of computer architecture from the mid-2000s onward.
20
Architecture continually changing
Compatibility: the cost of software development makes compatibility a major force in the market. Applications suggest how to improve technology and provide the revenue to fund development; improved technologies make new applications possible.
21
Major Technology Generations [from Kurzweil]: electromechanical relays, vacuum tubes, bipolar, pMOS, nMOS, CMOS, and next: ?
22
Transistors could Stop Shrinking in 2021
GlobalFoundries, Intel, Samsung, and TSMC Source: IEEE Spectrum Sep. 2016
23
A gamut of potential future computing technologies, including new kinds of transistors and memory devices, neuromorphic computing, superconducting circuitry, and processors that use approximate instead of exact answers. On September 17, 2016, GlobalFoundries announced its 7nm FinFET process roadmap, targeting data centers, networking, advanced mobile processors, and deep learning. Going from 14nm FinFET to 7nm FinFET, circuit density can more than double and performance can improve by 30%. AMD's Starship processor: 48 cores, 96 threads, 180W TDP. Intel's 7nm was delayed to 2022.
24
Uniprocessor Performance
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006. What happened?
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present
25
The End of the Uniprocessor Era
Single biggest change in the history of computing systems
26
Measurement & Evaluation
Course Focus: understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century. Computer architecture (organization, hardware/software boundary, building blocks) sits among parallelism, technology, programming languages, applications, interface design (ISA), compilers, operating systems, measurement & evaluation, and history.
27
Computer Architecture: A Little History
Throughout the course we’ll use a historical narrative to help understand why certain ideas arose Why worry about old ideas? Helps to illustrate the design process, and explains why certain decisions were taken Because future technologies might be as constrained as older ones Those who ignore history are doomed to repeat it Every mistake made in mainframe design was also made in minicomputers, then microcomputers, where next?
28
Difference Engine
1823: Babbage's paper is published
1834: The paper is read by Scheutz & his son in Sweden
1842: Babbage gives up the idea of building it; he is on to the Analytic Engine!
1855: Scheutz displays his machine at the Paris World's Fair
Can compute any 6th degree polynomial
Speed: 33 to 44 digit numbers per minute!
Charles Babbage (1791-1871), English mathematician and a pioneer of computing. Born into wealth, he devoted his fortune to scientific research; a mathematical genius, he held the Lucasian Chair of Mathematics at Cambridge.
Now the machine is at the Smithsonian
29
Harvard Mark I
Built in 1944 in IBM Endicott laboratories
Howard Aiken, Professor of Physics at Harvard
Essentially mechanical, but had some electro-magnetically controlled relays and gears
Weighed 5 tons and had 750,000 components
A synchronizing clock beating at 66Hz
Performance: 0.3 seconds for addition, 6 seconds for multiplication, 1 minute for a sine calculation
Decimal arithmetic
No conditional branch!
Broke down once a week!
30
Linear Equation Solver John Atanasoff, Iowa State University
Atanasoff built the Linear Equation Solver; it had 300 tubes! A special-purpose binary digital calculator with dynamic RAM (stored values on refreshed capacitors). Application: linear and integral differential equations. Background: Vannevar Bush's Differential Analyzer, an analog computer. Technology: tubes and electromechanical relays. Atanasoff decided that the correct mode of computation was using electronic binary digits.
31
ENIAC - The first electronic computer (1946)
32
Computing Devices Then…
EDSAC, University of Cambridge, UK, 1949
33
And then there was IBM
IBM 701: 30 machines were sold in 1953-54
Used CRTs as main memory, 72 tubes of 32x32b each
IBM followed with a cheaper, drum-based machine; more than 120 were sold in 1954 and there were orders for 750 more!
Users stopped building their own machines. Why was IBM late getting into computer technology? IBM was making too much money! Even without computers, IBM revenues were doubling every 4 to 5 years in the '40s and '50s.
34
Intel 4004 Micro-Processor
1971. About 2,300 transistors. 740 kHz operation.
35
And in conclusion … Computer Architecture >> ISAs and RTL
Computer architecture is shaped by technology and applications; history provides lessons for the future. Computer science is at a crossroads, from sequential to parallel computing. Salvation requires innovation in many fields, including computer architecture.
36
Microprogramming A brief look at microprogrammed machines
To show how to build very small processors with complex ISAs; to help you understand where CISC machines came from; because it is still used in the most common machines (x86, PowerPC, IBM 360); as a gentle introduction to machine structures; and to help understand how technology drove the move to RISC.
37
ISA to Microarchitecture Mapping
An ISA is often designed with a particular microarchitectural style in mind, e.g.:
CISC: microcoded
RISC: hardwired, pipelined
VLIW: fixed-latency in-order pipelines
JVM: software interpretation
But an ISA can be implemented with any microarchitectural style:
Core 2 Duo: hardwired pipelined CISC (x86) machine (with some microcode support)
This lecture: a microcoded RISC (MIPS) machine
Intel could implement a dynamically scheduled out-of-order VLIW (IA-64) processor
ARM Jazelle: a hardware JVM processor
Simics: a software-interpreted SPARC RISC machine
An ISA is designed to meet a number of design constraints: ease of programming, ease of being a compiler target, ease of representing parallelism, ultimate performance, ease of implementation. The challenge is being implemented efficiently across many generations.
38
Microarchitecture: Implementation of an ISA
Controller and datapath, connected by control points and status lines.
Structure: how components are connected (static).
Behavior: how data moves between components (dynamic).
39
Microcontrol Unit Maurice Wilkes, 1954
Embed the control logic state table in a memory array: Matrix A and Matrix B with a decoder; the next-state address comes from the opcode and a conditional-code flip-flop; the outputs are control lines to the ALU, MUXes, and registers. Logic == vacuum tubes. Sir Maurice Wilkes was still attending architecture conferences.
40
Microcoded Microarchitecture
Memory (RAM) holds the user program, written in macrocode instructions (e.g., MIPS, x86, etc.); the microcontroller (ROM) holds the fixed microcode instructions. The datapath and microcontroller exchange signals such as Addr, Data, zero?, busy?, opcode, enMem, and MemWrt.
41
The MIPS32 ISA Processor State Data types
32 32-bit GPRs, R0 always contains a 0
16 double-precision/32 single-precision FPRs
FP status register, used for FP compares & exceptions
PC, the program counter
Some other special registers
Data types: 8-bit byte, 16-bit half word, 32-bit word for integers, 32-bit word for single-precision floating point, 64-bit word for double-precision floating point
Load/Store style instruction set: data addressing modes are immediate & indexed; branch addressing modes are PC-relative & register indirect; byte-addressable memory in big-endian mode; all instructions are 32 bits
See H&P Appendix A for a full description
42
MIPS ISA Textbook reading
Look at how instructions are defined and represented What is an instruction set architecture (ISA)? Interplay of C and MIPS ISA Components of MIPS ISA Register operands Memory operands Arithmetic operations Control flow operations
43
5 components of any Computer
The five classic components of a computer: datapath, control, memory, input, and output. Control (the "brain") and datapath (the "brawn") together form the processor; memory is where programs and data live when running; input devices include keyboard and mouse; output devices include display and printer; disk is where programs and data live when not running. These five components also serve as the framework for the rest of this class.
44
Computer (All Digital Systems) Are At Their Core Pretty Simple
Computers only work with binary signals: a signal on a wire is either 0 or 1, usually called a "bit". More complex things (numbers, characters, strings, pictures) must be built from multiple bits. Hardware is built out of simple logic gates that perform boolean logic (AND, OR, NOT, ...) and memory cells that preserve bits over time (flip-flops, registers, SRAM cells, DRAM cells, ...). To get hardware to do anything, you need to break it down to bits. Strings of bits that tell hardware what to do are called instructions; a sequence of instructions is called a machine language program (machine code).
45
Hardware/Software Interface
The Instruction Set Architecture (ISA) defines what instructions do. MIPS, Intel IA32 (x86), Sun SPARC, PowerPC, IBM 390, Intel IA64, ARMv7, ARMv8: these are all ISAs. Many different implementations can implement the same ISA (family): the 8086, 386, 486, Pentium, Pentium II, and Pentium 4 all implement IA32; of course they continue to extend it, while maintaining binary compatibility. ISAs last a long time: x86 has been in use since the '70s, and IBM 390 started as IBM 360 in the '60s.
46
Running An Application
47
MIPS ISA
MIPS: a semiconductor company that built one of the first commercial RISC architectures, co-founded by J. Hennessy. We will study the MIPS architecture in some detail in this class.
Why MIPS instead of Intel 80x86? MIPS is simple, elegant, and easy to understand; x86 is ugly and complicated to explain. x86 is dominant on the desktop, but MIPS is prevalent in embedded applications as the processor core of systems on chip (SoCs).
48
C vs MIPS Programmers Interface
MIPS I ISA
Registers (MIPS): 32 32b integer, R0 = 0; 32 32b single FP; 16 64b double FP; PC and special registers
Memory: C has local and global variables; MIPS has a 2^32 linear array of bytes
Data types: C has int, short, char, unsigned, float, double, aggregate data types, and pointers; MIPS has word (32b), byte (8b), half-word (16b), single FP (32b), and double FP (64b)
Arithmetic operators: C has +, -, *, %, ++, <, etc.; MIPS has add, sub, mult, slt, etc.
Memory access: C has a, *a, a[i], a[i][j]; MIPS has lw, sw, lh, sh, lb, sb
Control: C has if-else, while, do-while, for, switch, procedure call, and return; MIPS has branches, jumps, and jump and link
49
Why Have Registers? Memory-memory ISA Benefits of registers
Memory-memory ISA: ALL HLL variables are declared in memory. Why not operate directly on memory operands? E.g., the Digital Equipment Corp (DEC) VAX ISA.
Benefits of registers: smaller is faster; multiple concurrent accesses; shorter names.
Load-Store ISA: arithmetic operations only use register operands; data is loaded into registers, operated on, and stored back to memory. All RISC instruction sets are load-store.
50
Using Registers
Registers are a finite resource that needs to be managed, by the programmer or by the compiler (register allocation).
Goals: keep data in registers as much as possible; always use data still in registers if possible.
Issues: only a finite number of registers is available, so registers are spilled to memory when all are in use; arrays hold too much data to store in registers.
What's the impact of fewer or more registers?
51
Register Naming Registers are identified by a $<num>
By convention, we also give them names:
$zero contains the hardwired value 0
$v0, $v1 are for results and expression evaluation
$a0-$a3 are for arguments
$s0, $s1, ..., $s7 are for saved values
$t0, $t1, ..., $t9 are for temporary values
The others will be introduced as appropriate
Compilers use these conventions to simplify linking
52
Assembly Instructions
The basic type of instruction has four components:
Operation name
Destination operand
1st source operand
2nd source operand
add dst, src1, src2 # dst = src1 + src2
dst, src1, and src2 are register names ($)
What do these instructions do? - add $1, $1, $1 (it doubles the value in $1)
53
C Example Simple C procedure: sum_pow2 returns 2^(b+c)
1:  int sum_pow2 (int b, int c)
2:  {
3:    int pow2[8] = {1, 2, 4, 8, 16, 32, 64, 128};
4:    int a, ret;
5:    a = b + c;
6:    if (a < 8)
7:      ret = pow2[a];
8:    else
9:      ret = 0;
10:   return (ret);
11: }
54
Arithmetic Operators Consider line 5, the C operation for addition:
a = b + c;
Assume the variables a, b, and c are in registers $1-$3 respectively. The add operator using registers:
add $1, $2, $3 # a = b + c
Use the sub operator for a = b - c in MIPS:
sub $1, $2, $3 # a = b - c
But we know that variables a, b, and c really start in some memory location; we will add load & store instructions soon.
55
Complex Operations What about more complex statements?
a = b + c + d - e;
Break into multiple instructions (assume b, c, d, e are in $s1-$s4 and a is in $s0):
add $t0, $s1, $s2 # $t0 = b + c
add $t1, $t0, $s3 # $t1 = $t0 + d
sub $s0, $t1, $s4 # a = $t1 - e
56
Signed & Unsigned Number
If given b[n-1:0] in a register or in memory:
Unsigned value = b[n-1]*2^(n-1) + b[n-2]*2^(n-2) + ... + b[1]*2 + b[0]
Signed value (2's complement) = -b[n-1]*2^(n-1) + b[n-2]*2^(n-2) + ... + b[1]*2 + b[0]
57
Unsigned & Signed Numbers
Example values, 4 bits:
Unsigned range: [0, 2^4 - 1] = [0, 15]
Signed range: [-2^3, 2^3 - 1] = [-8, 7]
Equivalence: same encoding for non-negative values
Uniqueness: every bit pattern represents a unique integer value; not true with sign-magnitude
58
Arithmetic Overflow
59
Constants Often we want to be able to specify an operand in the instruction: an immediate or literal. Use the addi instruction:
addi dst, src1, immediate
The immediate is a 16-bit signed value between -2^15 and 2^15 - 1, sign-extended to 32 bits.
Consider the following C code:
a++;
The addi operator:
addi $s0, $s0, 1 # a = a + 1
60
Memory Data Transfer Data transfer instructions are used to move data to and from memory. A load operation moves data from a memory location to a register and a store operation moves data from a register to a memory location.
61
Data Transfer Instructions: Loads
Data transfer instructions have three parts:
Operator name (transfer size)
Destination register
Base register address and constant offset
lw dst, offset(base)
The offset value is a signed constant
62
Memory Access All memory access happens through loads and stores
Aligned words, half-words, and bytes More on this later today Floating Point loads and stores for accessing FP registers Displacement based addressing mode
63
Loading Data Example Consider the example
a = b + *c;
Use the lw instruction to load. Assume a ($s0), b ($s1), c ($s2):
lw $t0, 0($s2) # $t0 = Memory[c]
add $s0, $s1, $t0 # a = b + *c
64
Accessing Arrays Arrays are really pointers to the base address in memory, the address of element A[0]. Use the offset value to indicate which index. Remember that addresses are in bytes, so multiply by the size of the element. Consider the integer array where pow2 is the base address; with this compiler on this architecture, each int requires 4 bytes. If the data to be accessed is at index 5 (pow2[5]), then the address in memory is pow2 + 5*4. Unlike C, assembly does not handle pointer arithmetic for you!
65
Array Memory Diagram
66
Array Example Consider the example a = b + pow2[7]
Use the lw instruction with an offset; assume $s3 = 1000 (the base address of pow2):
lw $t0, 28($s3) # $t0 = Memory[pow2[7]]
add $s0, $s1, $t0 # a = b + pow2[7]
67
Complex Array Example Consider line 7 from sum_pow2(): ret = pow2[a];
First find the correct offset; again assume $s3 = 1000:
sll $t0, $s0, 2 # $t0 = 4 * a; shift left by 2
add $t1, $s3, $t0 # $t1 = pow2 + 4*a
lw $v0, 0($t1) # $v0 = Memory[pow2[a]]
68
Storing Data Storing data is just the reverse and the instruction is nearly identical. Use the sw instruction to copy a word from the source register to an address in memory. sw src, offset (base) Offset value is signed
69
Storing Data Example Consider the example *a = b + c;
Use the sw instruction to store:
add $t0, $s1, $s2 # $t0 = b + c
sw $t0, 0($s0) # Memory[s0] = b + c
70
Storing to an Array Consider the example
a[3] = b + c;
Use the sw instruction with an offset:
add $t0, $s1, $s2 # $t0 = b + c
sw $t0, 12($s0) # Memory[a[3]] = b + c
71
Complex Array Storage Consider the example
a[i] = b + c;
Use the sw instruction with an offset (assume i is in $s3):
add $t0, $s1, $s2 # $t0 = b + c
sll $t1, $s3, 2 # $t1 = 4 * i
add $t2, $s0, $t1 # $t2 = a + 4*i
sw $t0, 0($t2) # Memory[a[i]] = b + c
72
A “short” Array Example
ANSI C requires a short to be at least 16 bits and no longer than an int, but does not define the exact size. For our purposes, treat a short as 2 bytes. So, with a short array, c[7] is at c + 7*2; multiplying by 2 is a shift left by 1.
73
Homework Readings: Read Chapter 1 and Chapters 2.1-2.4.
D. Brooks, P. Bose, S. Schuster, H. Jacobson, P. Kudva, A. Buyuktosunoglu, J.D. Wellman, V. Zyuban, M. Gupta, and P. Cook, "Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors," IEEE Micro, Nov/Dec 2000.
T. Mudge, "Power: A First-Class Architectural Design Constraint," Computer, 2001.
Shekhar Borkar and Andrew A. Chien, "The Future of Microprocessors," CACM, 54(5): 67-77, 2011.
Daniel A. Reed and Jack Dongarra, "Exascale Computing and Big Data," CACM, 2015, 58(7).
Ken Shirriff, "The Surprising Story of the First Microprocessors," IEEE Spectrum, Sep.
M. Tehranipoor and F. Koushanfar, "A Survey of Hardware Trojan Taxonomy and Detection," IEEE Design and Test of Computers, 2010.
The next technology wave: artificial intelligence; the iPhone X A11 chip.
74
Acknowledgements These slides contain material from the courses UCB CS152 and Stanford EE108B.