ISA CSCE430/830
Instruction Set Architecture (ISA)
–Serves as an interface between software and hardware.
–Provides a mechanism by which the software tells the hardware what should be done.
Levels of code (software above the instruction set, hardware below it):
–High-level language code: C, C++, Java, Fortran
  »translated by the compiler
–Assembly language code: architecture-specific statements
  »translated by the assembler
–Machine language code: architecture-specific bit patterns

Instruction Set Design Issues
Instruction set design issues include:
–Where are operands stored?
  »registers, memory, stack, accumulator
–How many explicit operands are there?
  »0, 1, 2, or 3
–How is the operand location specified?
  »register, immediate, indirect, ...
–What type & size of operands are supported?
  »byte, int, float, double, string, vector, ...
–What operations are supported?
  »add, sub, mul, move, compare, ...

Classifying ISAs
Accumulator (before 1960, e.g. 68HC11):
  1-address    add A             acc <- acc + mem[A]
Stack (1960s to 1970s):
  0-address    add               tos <- tos + next
Memory-Memory (1970s to 1980s):
  2-address    add A, B          mem[A] <- mem[A] + mem[B]
  3-address    add A, B, C       mem[A] <- mem[B] + mem[C]
Register-Memory (1970s to present, e.g. 80x86):
  2-address    add R1, A         R1 <- R1 + mem[A]
               load R1, A        R1 <- mem[A]
Register-Register (Load/Store, RISC) (1960s to present, e.g. MIPS):
  3-address    add R1, R2, R3    R1 <- R2 + R3
               load R1, R2       R1 <- mem[R2]
               store R1, R2      mem[R1] <- R2

Operand Locations in Four ISA Classes
[Figure: operand locations in four ISA classes, including GPR (general-purpose register) machines]

Code Sequence C = A + B for Four Instruction Sets
Stack      Accumulator    Register (register-memory)    Register (load-store)
Push A     Load A         Load R1, A                    Load R1, A
Push B     Add B          Add R1, B                     Load R2, B
Add        Store C        Store C, R1                   Add R3, R1, R2
Pop C                                                   Store C, R3
Effect of the add step in each register-based class:
–Accumulator: acc = acc + mem[B]
–Register-memory: R1 = R1 + mem[B]
–Load-store: R3 = R1 + R2
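
The four code sequences for C = A + B can be traced with a small simulation. This is a minimal sketch, not a model of any real machine: the function names, the dictionary used as memory, and the sample values A = 3 and B = 4 are assumptions made for illustration.

```python
def run_stack(mem):
    """Stack machine: operands are implicit on the stack."""
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add: tos <- tos + next
    mem["C"] = stack.pop()                    # Pop C
    return mem["C"]


def run_accumulator(mem):
    """Accumulator machine: one implicit operand (acc)."""
    acc = mem["A"]                            # Load A
    acc = acc + mem["B"]                      # Add B
    mem["C"] = acc                            # Store C
    return mem["C"]


def run_register_memory(mem):
    """Register-memory machine: one operand may come from memory."""
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R1"] = regs["R1"] + mem["B"]        # Add R1, B
    mem["C"] = regs["R1"]                     # Store C, R1
    return mem["C"]


def run_load_store(mem):
    """Load-store machine: arithmetic uses registers only."""
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R2"] = mem["B"]                     # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add R3, R1, R2
    mem["C"] = regs["R3"]                     # Store C, R3
    return mem["C"]


for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    memory = {"A": 3, "B": 4, "C": 0}         # hypothetical sample values
    print(run.__name__, run(memory))
```

All four sequences leave the same value in C; they differ only in how many instructions and explicit operands they need.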

Types of Addressing Modes (VAX)
    Addressing Mode      Example                Action
 1. Register direct      Add R4, R3             R4 <- R4 + R3
 2. Immediate            Add R4, #3             R4 <- R4 + 3
 3. Displacement         Add R4, 100(R1)        R4 <- R4 + M[100 + R1]
 4. Register indirect    Add R4, (R1)           R4 <- R4 + M[R1]
 5. Indexed              Add R4, (R1 + R2)      R4 <- R4 + M[R1 + R2]
 6. Direct               Add R4, (1000)         R4 <- R4 + M[1000]
 7. Memory indirect      Add R4, @(R3)          R4 <- R4 + M[M[R3]]
 8. Autoincrement        Add R4, (R2)+          R4 <- R4 + M[R2]; R2 <- R2 + d
 9. Autodecrement        Add R4, (R2)-          R4 <- R4 + M[R2]; R2 <- R2 - d
10. Scaled               Add R4, 100(R2)[R3]    R4 <- R4 + M[100 + R2 + R3*d]
(d is the size of the operand in bytes.)
Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX.
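
The memory-referencing modes in the table differ only in how the effective address is formed. The sketch below computes that address for the displacement, register indirect, indexed, direct, memory indirect, and scaled modes. The function name, the mode strings, the register contents, and the choice d = 4 are all assumptions for illustration; register direct, immediate, and the auto-increment/decrement modes are left out because they either touch no memory or also update a register.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address an operand refers to under a given mode.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes (assumed to be 4 here).
    """
    if mode == "displacement":        # disp(base)        -> M[disp + base]
        return disp + regs[base]
    if mode == "register_indirect":   # (base)            -> M[base]
        return regs[base]
    if mode == "indexed":             # (base + index)    -> M[base + index]
        return regs[base] + regs[index]
    if mode == "direct":              # (disp)            -> M[disp]
        return disp
    if mode == "memory_indirect":     # @(base)           -> M[M[base]]
        return mem[regs[base]]
    if mode == "scaled":              # disp(base)[index] -> M[disp + base + index*d]
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical machine state, chosen only to make the arithmetic easy to follow.
regs = {"R1": 200, "R2": 8, "R3": 2}
mem = {200: 1000}

print(effective_address("displacement", regs, mem, base="R1", disp=100))
print(effective_address("memory_indirect", regs, mem, base="R1"))
print(effective_address("scaled", regs, mem, base="R2", index="R3", disp=100))
```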

Types of Operations
Arithmetic and Logic:    AND, ADD
Data Transfer:           MOVE, LOAD, STORE
Control:                 BRANCH, JUMP, CALL
System:                  OS CALL, VM
Floating Point:          ADDF, MULF, DIVF
Decimal:                 ADDD, CONVERT
String:                  MOVE, COMPARE
Graphics:                (DE)COMPRESS

ISA-2 CSCE430/830
MIPS Instructions
–All instructions are exactly 32 bits wide
–Different formats for different purposes
–Similarities in formats ease implementation
R-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format:  op (6 bits) | address (26 bits)

MIPS Instruction Types
Arithmetic & Logical - manipulate data in registers
  add $s1, $s2, $s3      $s1 = $s2 + $s3
  or  $s3, $s4, $s5      $s3 = $s4 OR $s5
Data Transfer - move register data to/from memory (load & store)
  lw $s1, 100($s2)       $s1 = Memory[$s2 + 100]
  sw $s1, 100($s2)       Memory[$s2 + 100] = $s1
Branch - alter program flow
  beq $s1, $s2, 25       if ($s1 == $s2) PC = PC + 4 + 4*25
                         else PC = PC + 4
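
The address and branch arithmetic for lw, sw, and beq can be written out as a short sketch. It assumes standard MIPS semantics, where the beq offset counts instructions (4 bytes each) relative to PC + 4; the function names and the sample PC value are hypothetical.

```python
def lw_sw_address(base_value, offset):
    """Address used by lw/sw: base register value plus the signed offset."""
    return base_value + offset


def beq_next_pc(pc, rs_value, rt_value, offset):
    """Next PC after beq; offset counts instructions relative to PC + 4."""
    if rs_value == rt_value:
        return pc + 4 + 4 * offset   # branch taken
    return pc + 4                    # fall through to the next instruction


# lw $s1, 100($s2) with $s2 = 0x2000 (hypothetical value) reads this address:
print(hex(lw_sw_address(0x2000, 100)))

# beq $s1, $s2, 25 at PC = 0x1000 (hypothetical), registers equal and unequal:
print(hex(beq_next_pc(0x1000, 5, 5, 25)))
print(hex(beq_next_pc(0x1000, 5, 6, 25)))
```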

MIPS Arithmetic & Logical Instructions
Instruction usage (assembly)
  add dest, src1, src2    dest = src1 + src2
  sub dest, src1, src2    dest = src1 - src2
  and dest, src1, src2    dest = src1 AND src2
Instruction characteristics
–Always 3 operands: destination + 2 sources
–Operand order is fixed
–Operands are always general-purpose registers
Design Principles:
–Design Principle 1: Simplicity favors regularity
–Design Principle 2: Smaller is faster

Arithmetic & Logical Instructions - Binary Representation
Bit 31 (left) to bit 0 (right):
  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
Used for arithmetic, logical, shift instructions
–op: basic operation of the instruction (opcode)
–rs: first register source operand
–rt: second register source operand
–rd: register destination operand
–shamt: shift amount (more about this later)
–funct: function, the specific type of operation
Also called “R-Format” or “R-Type” instructions

Arithmetic & Logical Instructions - Binary Representation Example
Machine language for add $8, $17, $18
Field      op        rs       rt       rd       shamt    funct
Width      6 bits    5 bits   5 bits   5 bits   5 bits   6 bits
Decimal    0         17       18       8        0        32
Binary     000000    10001    10010    01000    00000    100000
See reference card for op, funct values
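
The field packing for add $8, $17, $18 can be checked with a few lines of code. This is a minimal sketch: the function name is hypothetical, and the values op = 0 and funct = 32 for add are taken from the standard MIPS reference card rather than from this slide.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields into one 32-bit word, op in the top 6 bits."""
    fields = ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value   # append this field on the right
    return word


# add $8, $17, $18  ->  rd = 8, rs = 17, rt = 18
word = encode_r_format(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"0x{word:08x}")  # 0x02324020
```

Note that the assembly operand order (destination first) differs from the field order in the encoded word, where rd comes after rs and rt.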

MIPS Data Transfer Instructions
Transfer data between registers and memory
Instruction format (assembly)
  lw $dest, offset($addr)    load word
  sw $src, offset($addr)     store word
Uses:
–Accessing a variable in main memory
–Accessing an array element

Review: Chapter 1
–Classes of Computers and Classes of Parallelism
–Technology Trend
–Dependability
–Performance Measurements and Benchmarks
–Principles

5 Classes of Computers
Personal Mobile Devices
–Cost is the primary concern
–Energy, media performance, and responsiveness
Desktop Computing
–Price-performance is the primary concern
Servers
–Availability, scalability, and throughput
Clusters/warehouse-scale computers
–Price-performance, energy
Embedded Computers
–Price

Classes of Parallelism & Architectures
Data-Level Parallelism
–Data items can be operated on at the same time
Task-Level Parallelism
–Tasks can operate independently and largely in parallel
Instruction-Level Parallelism: exploits data-level parallelism
–Pipelining, speculative execution
Vector Architectures & GPUs: exploit data-level parallelism
–A single instruction operates on a collection of data in parallel
Thread-Level Parallelism: exploits either data-level or task-level parallelism
–Exploits parallelism via parallel threads
Request-Level Parallelism: exploits task-level parallelism
–Exploits parallelism via decoupled tasks

4 ways for hardware to support parallelism
Single Instruction stream, Single Data stream
–SISD
Single Instruction stream, Multiple Data streams
–SIMD, e.g., GPU, targets data-level parallelism
Multiple Instruction streams, Single Data stream
–MISD, no commercial multiprocessor of this type
Multiple Instruction streams, Multiple Data streams
–MIMD, e.g., multi-core processors, targets task-level parallelism

Trend in Technology
Integrated Circuit (IC) logic technology
–Moore’s Law: transistor count on a chip grows about 40%-55% per year, i.e. it doubles every 18 to 24 months.
Semiconductor DRAM
–As of 2011, capacity grows about 25%-40% per year.
Flash
–Capacity grows about 50%-60% per year.
Magnetic Disk
–Since 2004, capacity growth has dropped back to about 40% per year.

Trend in Performance
Bandwidth vs. Latency
–Bandwidth has improved much more than latency.

Growth in Processor Performance
[Figure: growth in processor performance, with these annotations]
–RISC
  »Parallelism: via pipelining
  »Locality: using caches
–Move to multi-processor
  »Hurdle: power wall
  »Lack: instruction-level parallelism

An example of an Intel 486 CPU: released in 1992, 66MHz, w/ L2 Cache, [value missing] W
[Image; source: A80486DX2-66.html]

A CPU fan for Intel 486 CPU
[Image; source: fan.aspx]

An example of an Intel Pentium 4 CPU: released in 2002, 2.8GHz, w/ 512KB Cache, 68.4W
[Image; source: s.php?item_id=146&category_id=61]

A typical CPU fan for Intel Pentium 4

A special CPU fan for gaming/multimedia users
[Image; source: Cooling/Asus-Star-Ice-CPU-Cooler-Review]
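
Trend in Power and Energy in IC
Dynamic energy and power
–$\text{Energy}_{\text{dynamic}} \propto \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
–$\text{Power}_{\text{dynamic}} \propto \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
Static power
–$\text{Power}_{\text{static}} \propto \text{Current}_{\text{static}} \times \text{Voltage}$
Example
–Intel 486, 66MHz, Voltage: 5V
–Intel Pentium 4, 2.8GHz, Voltage: 1.5V
–Intel Core 990x, 3.4GHz, Voltage: [value missing] V
Improving Energy Efficiency
–Do nothing well; Dynamic Voltage-Frequency Scaling (DVFS); Design for typical case; Overclocking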
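
Because dynamic energy scales with the square of the voltage, and dynamic power additionally with the switching frequency, the effect of voltage-frequency scaling is easy to estimate as a ratio. The sketch below does that; the function names and the 15% reduction used in the example are assumptions for illustration, not figures from the slide.

```python
def relative_dynamic_energy(voltage_ratio, capacitance_ratio=1.0):
    """New dynamic energy divided by old, from E ~ 1/2 * C * V^2."""
    return capacitance_ratio * voltage_ratio ** 2


def relative_dynamic_power(voltage_ratio, frequency_ratio, capacitance_ratio=1.0):
    """New dynamic power divided by old, from P ~ 1/2 * C * V^2 * f."""
    return capacitance_ratio * voltage_ratio ** 2 * frequency_ratio


# Hypothetical DVFS step: lower both voltage and frequency by 15%.
print(relative_dynamic_energy(0.85))
print(relative_dynamic_power(0.85, 0.85))
```

The constant factor 1/2 and an unchanged capacitive load cancel in the ratio, which is why only the voltage and frequency ratios appear.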

Dependability
Two states: Service Accomplishment & Service Interruption
Transitions between the 2 states: Failure & Restoration
Measurements
–Reliability: a measure of the continuous service accomplishment from a reference initial instant.
  »MTTF: Mean time to failure
  »FIT: failures per billion hours, $\frac{1}{\text{MTTF}} \times 10^9$
  »MTTR: Mean time to repair
  »MTBF: Mean time between failures = MTTF + MTTR
–Availability: a measure of the service accomplishment with respect to the alternation between the two states.
  »$\text{MTTF} / (\text{MTTF} + \text{MTTR})$
  »Upper bound: 100%
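
The dependability measures translate directly into code. This is a minimal sketch with hypothetical function names; the 1,000,000-hour MTTF and 24-hour MTTR in the example are assumed values chosen only to show the arithmetic.

```python
def fit(mttf_hours):
    """Failures per billion hours of operation: (1 / MTTF) * 10^9."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time in the service-accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: MTTF of 1,000,000 hours, MTTR of 24 hours.
print(fit(1_000_000))
print(mtbf(1_000_000, 24))
print(availability(1_000_000, 24))
```

Availability reaches its upper bound of 100% only when MTTR is zero.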

Performance Measurements and Benchmarks
Metrics
–Throughput: the total amount of work done in a given time
–Response time (execution time): the time between the start and the completion of an event
Speedup of X relative to Y
–$\text{Execution time}_Y / \text{Execution time}_X$
Execution time
–Wall clock time: the latency to complete a task
–CPU time: only the computation time
Benchmarks
–Kernels, toy programs, synthetic benchmarks
–Benchmark suites: SPEC [CPU] & TPC [Transaction Processing]
–$\text{SPECRatio} = \text{Execution time}_{\text{reference}} / \text{Execution time}_{\text{target}}$

Design Principles
–Take Advantage of Parallelism
–Principle of Locality
–Focus on the Common Case
  »Amdahl’s Law
  »Upper bound of the speedup: ?

Design Principles
–Take Advantage of Parallelism
–Principle of Locality
–Focus on the Common Case
  »Amdahl’s Law
  »Upper bound of the speedup: $1 / (1 - \text{Fraction}_{\text{enhanced}})$

Exercise 1
If the new processor is 10 times faster than the original processor, and we assume that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
–$\text{Fraction}_{\text{enhanced}} = 0.4$, $\text{Speedup}_{\text{enhanced}} = 10$
–$\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - 0.4) + 0.4/10} = \dfrac{1}{0.64} \approx 1.56$
What is the upper bound of the overall speedup?
–$\text{Upper bound} = \dfrac{1}{1 - 0.4} = \dfrac{1}{0.6} \approx 1.67$
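
The numbers in Exercise 1 can be reproduced with a direct implementation of Amdahl's Law. This is a minimal sketch; the function names are hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is sped up."""
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


def amdahl_upper_bound(fraction_enhanced):
    """Limit of the overall speedup as the enhancement becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_enhanced)


# Exercise 1: 40% of the time is computation, and computation gets 10x faster.
print(round(amdahl_speedup(0.4, 10), 2))   # 1.56
print(round(amdahl_upper_bound(0.4), 2))   # 1.67
```

Even an infinitely fast processor cannot push the speedup past the upper bound, because the 60% of time spent waiting for I/O is untouched.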

Exercise 2
In a disk subsystem:
–10 disks, each rated at 1,000,000-hour MTTF
–1 ATA controller, 500,000-hour MTTF
–1 power supply, 200,000-hour MTTF
–1 fan, 200,000-hour MTTF
–1 ATA cable, 1,000,000-hour MTTF
Assuming the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole.
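
Exercise 2
Because the overall failure rate of the collection is the sum of the failure rates of the modules, the failure rate of the system
–$= 10 \times \dfrac{1}{1{,}000{,}000} + \dfrac{1}{500{,}000} + \dfrac{1}{200{,}000} + \dfrac{1}{200{,}000} + \dfrac{1}{1{,}000{,}000}$
–$= \dfrac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000} = \dfrac{23}{1{,}000{,}000}$ failures per hour, or 23,000 FIT
Because MTTF is the inverse of the failure rate
–$\text{MTTF}_{\text{system}} = \dfrac{1}{23/1{,}000{,}000} \approx 43{,}500$ hours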
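
The same calculation can be written as a short sketch that sums the per-module failure rates and inverts the total. The function names and the (count, MTTF) list layout are assumptions for illustration; the component figures come from the exercise.

```python
def system_failure_rate(components):
    """Sum of failure rates (per hour) over (count, mttf_hours) pairs.

    Valid under the exercise's assumptions: exponentially distributed
    lifetimes and independent failures.
    """
    return sum(count / mttf_hours for count, mttf_hours in components)


def system_mttf(components):
    """MTTF of the whole system: the inverse of the total failure rate."""
    return 1.0 / system_failure_rate(components)


disk_subsystem = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]

print(round(system_failure_rate(disk_subsystem) * 1e9))  # failure rate in FIT
print(round(system_mttf(disk_subsystem)))                # MTTF in hours
```

The unrounded MTTF is 1,000,000/23 hours, which the slide rounds to 43,500 hours.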