Published by Christal Lynch. Modified over 8 years ago.
1
Computer Structure 2013 – Introduction
MAMAS – Computer Structure 234267
Lecturers: Lihu Rappoport, Adi Yoaz
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh
2
General Course Information
- Grade
  - 20%: 4 exercises (all mandatory, grade is binding), submitted in pairs
  - 80%: final exam
  - No midterm exam
- Course web site: http://webcourse.cs.technion.ac.il/234267
  - Foils will be on the web several days before the class
3
Class Focus
- CPU
  - Introduction: performance, instruction set (RISC vs. CISC)
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
- Memory Hierarchy
  - Cache
  - Main memory
  - Virtual memory
- System
  - Motherboard & chipset, DRAM, I/O, disk, peripherals
  - Power considerations
4
Computer System – Sandy Bridge
[Figure: block diagram of a Sandy Bridge based system. Labels recoverable from the diagram:
- Processor: Core, Cache, GFX, System Agent, Memory controller, Display port, PCI express ×16 (External Graphics Card)
- Memory: Mem BUS with DDRIII Channel 1 and DDRIII Channel 2, 2133-1066 MHz
- Links to the South Bridge (PCH): 4×DMI, Display link
- South Bridge (PCH) side: USB, SATA, Hard Disk, DVD Drive, PCI express ×1 (exp slots), LAN Adap / LAN, Audio Codec (Line in, Line out, S/PDIF in, S/PDIF out), BIOS, D-sub, HDMI, DVI, mouse
- LPC / Super I/O: Parallel Port, Serial Port, Floppy Drive, PS/2 keybrd/mouse]
5
Architecture & Microarchitecture
- Architecture
  - The processor features seen by the “user”
  - Instruction set, addressing modes, data width, …
- Micro-architecture (uArch)
  - The way a processor is implemented
  - Cache size and structure, number of execution units, …
  - Timing is considered uArch (though it is user visible)
- Processors with different uArch can support the same architecture
6
Compatibility
- Backward compatibility
  - New hardware can run existing software
  - Core2 Duo can run SW written for Pentium 4, Pentium M, Pentium III, Pentium II, Pentium, 486, 386, 286
- Forward compatibility
  - New software can run on existing hardware
  - Example: new software written with AVX™ runs on an older processor that does not support AVX™
  - Commonly supports one or two generations behind
- Architecture-independent SW
  - JIT (just-in-time) compiler: Java and .NET
  - Binary translation
7
Moore’s Law
- The number of transistors doubles every ~2 years
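Taken at face value, a doubling every ~2 years means a count of N0 grows to N0 × 2^(t/2) after t years. The sketch below only illustrates that arithmetic; the function name `transistors_after` and the starting count are assumptions chosen for the example, not data from the slides.

```python
def transistors_after(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count assuming one doubling per period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point: 1000 transistors, projected 10 years ahead.
# 10 years is 5 doubling periods, so the count grows by a factor of 2**5 = 32.
projected = transistors_after(1000, 10)
```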
8
CPI – Cycles Per Instruction
- CPUs work according to a clock signal
  - Clock cycle is measured in nsec ($10^{-9}$ of a second)
  - Clock frequency (= 1 / clock cycle) is measured in GHz ($10^{9}$ cycles/sec)
- Instruction Count (IC)
  - Total number of instructions executed in the program
- CPI – Cycles Per Instruction
  - Average #cycles per instruction (in a given program):
    $CPI = \frac{\#\text{cycles required to execute the program}}{IC}$
  - IPC (= 1/CPI): instructions per cycle
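The definitions on this slide reduce to two divisions. A minimal sketch, assuming the total cycle count and the instruction count of a run are already known; the function names and the sample numbers are illustrative only.

```python
def cpi(total_cycles, instruction_count):
    """Average cycles per instruction for one program run."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return total_cycles / instruction_count


def ipc(total_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return instruction_count / total_cycles


def clock_cycle_ns(frequency_ghz):
    """Clock cycle in nanoseconds for a clock frequency given in GHz."""
    return 1.0 / frequency_ghz


# Hypothetical run: 1.5 million cycles to execute 1 million instructions.
example_cpi = cpi(1_500_000, 1_000_000)   # 1.5 cycles per instruction
example_cycle = clock_cycle_ns(2.0)       # 0.5 ns per cycle at 2 GHz
```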
9
Calculating the CPI of a Program
- $IC_i$: #times an instruction of type $i$ is executed in the program
- $IC$: #instructions executed in the program: $IC = \sum_i IC_i$
  - $F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
- $CPI_i$: #cycles to execute an instruction of type $i$
  - e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
- #cycles required to execute the entire program: $\#cyc = \sum_i CPI_i \times IC_i$
- CPI: $CPI = \frac{\#cyc}{IC} = \frac{\sum_i CPI_i \times IC_i}{IC} = \sum_i F_i \times CPI_i$
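The program CPI is the average of the per-type CPIs weighted by frequency: total cycles are the sum of CPI_i × IC_i, and dividing by IC gives the sum of F_i × CPI_i. The sketch below computes it both ways. The per-type values CPI add = 1 and CPI mul = 3 come from the slide; the instruction counts (800 adds, 200 muls) and the function names are assumptions made up for the example.

```python
def program_cpi(mix):
    """mix maps instruction type -> (IC_i, CPI_i). Returns (IC, cycles, CPI)."""
    ic = sum(ic_i for ic_i, _ in mix.values())
    if ic == 0:
        raise ValueError("the program must execute at least one instruction")
    cycles = sum(ic_i * cpi_i for ic_i, cpi_i in mix.values())
    return ic, cycles, cycles / ic


def program_cpi_from_frequencies(mix):
    """Same CPI, computed as the sum of F_i * CPI_i with F_i = IC_i / IC."""
    ic = sum(ic_i for ic_i, _ in mix.values())
    return sum((ic_i / ic) * cpi_i for ic_i, cpi_i in mix.values())


# Hypothetical instruction mix: counts are invented, CPIs follow the slide.
example_mix = {"add": (800, 1), "mul": (200, 3)}
total_ic, total_cycles, overall_cpi = program_cpi(example_mix)
```

With this mix, 800 × 1 + 200 × 3 = 1400 cycles over 1000 instructions, so the CPI is 1.4 by either route.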
10
CPU Time
- CPU Time: time required to execute a program
  $\text{CPU Time} = IC \times CPI \times \text{clock cycle}$
- Our goal: minimize CPU Time
  - Minimize clock cycle: more GHz (process, circuit, uArch)
  - Minimize CPI: uArch (e.g.: more execution units)
  - Minimize IC: architecture (e.g.: AVX™)
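CPU Time is the product IC × CPI × clock cycle, so each of the three levers scales it linearly. A minimal sketch of that product; the function name and the workload numbers are assumptions for illustration.

```python
def cpu_time_seconds(instruction_count, cpi, frequency_ghz):
    """CPU Time = IC * CPI * clock cycle, with clock cycle = 1 / frequency."""
    if frequency_ghz <= 0:
        raise ValueError("frequency_ghz must be positive")
    clock_cycle_s = 1.0 / (frequency_ghz * 1e9)
    return instruction_count * cpi * clock_cycle_s


# Hypothetical workload: 2 billion instructions, CPI of 1.5, 3 GHz clock.
baseline = cpu_time_seconds(2_000_000_000, 1.5, 3.0)      # about 1 second
# Halving the IC (e.g. through vector instructions) halves the CPU time.
vectorized = cpu_time_seconds(1_000_000_000, 1.5, 3.0)    # about 0.5 second
```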
11
Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:
$t'_{exe} = t_{exe} \times \left[ (1 - F) + \frac{F}{S} \right]$
$Speedup_{overall} = \frac{t_{exe}}{t'_{exe}} = \frac{1}{(1 - F) + \frac{F}{S}}$
12
Amdahl’s Law: Example
- Floating point instructions improved to run at 2×, but only 10% of executed instructions are FP
  $t'_{exe} = t_{exe} \times (0.9 + 0.1 / 2) = 0.95 \times t_{exe}$
  $Speedup_{overall} = \frac{1}{0.95} = 1.053$
- Corollary: Make The Common Case Fast
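The example can be checked numerically with Amdahl's Law, Speedup = 1 / ((1 − F) + F / S). The sketch below is one way to write it; the function name `amdahl_speedup` is an assumption, while F = 0.1 and S = 2 are the values from the slide.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the task is accelerated by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Slide example: 10% of executed instructions are FP, and FP runs 2x faster.
fp_speedup = amdahl_speedup(0.1, 2.0)

# Upper bound for the same fraction: even an arbitrarily large factor
# cannot beat 1 / (1 - F), which is about 1.11 for F = 0.1.
fp_limit = 1.0 / (1.0 - 0.1)
```

The bound is what motivates the corollary: a large speedup on a rarely used feature barely moves the overall result.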
13
Comparing Performance
- Peak performance
  - MIPS, GFLOPS
  - Often not useful: unachievable / unsustainable in practice
- Benchmarks
  - Real applications, or representative parts of real apps
  - Synthetic benchmarks that imitate representative parts of real apps
  - Targeted at the specific system usages
- SPEC INT: integer applications
  - Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
- SPEC FP: floating point applications
  - Mostly important scientific applications
- TPC Benchmarks
  - Measure transaction-processing throughput
14
Evaluating Performance of Future CPUs
- Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the uArch in great detail
  - Run 100’s of representative applications
- Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0) is the processor without the new feature
[Figure: two S-curves. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers.]
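Producing the S-curve amounts to computing each application's IPC change relative to the baseline and sorting the results. A minimal sketch, assuming per-application IPC figures are already available from the simulator; the function name, application names and IPC values are all hypothetical.

```python
def s_curve(baseline_ipc, feature_ipc):
    """Return (application, percent IPC change) pairs, sorted ascending.

    Both arguments map application name -> IPC. A change of 0 means the
    application behaves like the baseline (the processor without the feature).
    """
    changes = {
        app: 100.0 * (feature_ipc[app] / baseline_ipc[app] - 1.0)
        for app in baseline_ipc
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical simulator output for three applications.
baseline = {"app_a": 1.0, "app_b": 2.0, "app_c": 1.0}
with_feature = {"app_a": 1.1, "app_b": 1.9, "app_c": 1.0}
curve = s_curve(baseline, with_feature)
```

Plotting the sorted values left to right gives the S shape: negative outliers on the left, positive outliers on the right.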
15
Instruction Set Design
- The ISA is what the user / compiler see
- The HW implements the ISA
[Figure: the instruction set drawn as the layer between software and hardware]
16
ISA Considerations
- Reduce the IC to reduce execution time
  - E.g., a single vector instruction performs the work of multiple scalar instructions
- Simple instructions → simpler HW implementation
  - Higher frequency, lower power, lower cost
- Code size
  - Long instructions take more time to fetch
  - Longer instructions require a larger memory
  - Important in small devices, e.g., cell phones
17
Architectural Consideration Example
- Immediate data size
  - 1% of data values > 16 bits
  - 12 – 16 bits are needed
[Figure: distribution of immediate data sizes. x-axis: immediate data bits (0 to 15); y-axis: 0% to 30%; series: Int. Avg., FP Avg.]
18
Software Specific Extensions
- Extend the arch to accelerate execution of specific apps
  - Reduce IC
- Example: SSE™ – Streaming SIMD Extensions
  - 128-bit packed (vector) / scalar single precision FP (4×32)
  - Introduced on Pentium® III in ’99
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
- Packed (128 bits): [x0, x1, x2, x3] + [y0, y1, y2, y3] = [x0+y0, x1+y1, x2+y2, x3+y3]
- Scalar (128 bits): [x0, x1, x2, x3] + [y0, y1, y2, y3] = [x0+y0, y1, y2, y3]
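The difference between the packed and scalar forms can be modeled with four-element lists standing in for 128-bit registers. This is a sketch of the lane behavior only, not of real SSE hardware: Python floats are double precision, and the function names are assumptions.

```python
LANES = 4  # four 32-bit single precision values per 128-bit register


def packed_add(x, y):
    """Packed form: all four lanes are added in one operation."""
    assert len(x) == LANES and len(y) == LANES
    return [a + b for a, b in zip(x, y)]


def scalar_add(x, y):
    """Scalar form: only lane 0 is added, lanes 1..3 keep the values of y."""
    assert len(x) == LANES and len(y) == LANES
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
packed = packed_add(x, y)   # [11.0, 22.0, 33.0, 44.0]
scalar = scalar_add(x, y)   # [11.0, 20.0, 30.0, 40.0]
```

One packed operation replaces four scalar additions, which is how the extension reduces IC.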
19
CISC Processors
- CISC – Complex Instruction Set Computer
  - The idea: a high level machine language
  - Example: x86
- Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    - Execute complex tasks
    - Require many cycles
  - ALU operations directly on memory
  - Only a few registers, in many cases not orthogonal
  - Variable length instructions
    - Common instructions get short codes, which saves code length
20
Top 10 x86 Instructions

Rank  Instruction              % of total executed
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                      8%
6     and                      6%
7     sub                      5%
8     move register-register   4%
9     call                     1%
10    return                   1%
      Total                    96%

Simple instructions dominate instruction frequency
21
CISC Drawbacks
- Complex instructions and complex addressing modes
  - Complicate the processor
  - Slow down the simple, common instructions
  - Contradict Make The Common Case Fast
- Not compiler friendly
  - Non-orthogonal registers
  - Unused complex addressing modes
- Variable length instructions are a pain
  - Difficult to decode a few instructions in parallel
    - Until an instruction is decoded, its length is unknown
    - So it is unknown where the instruction ends and where the next one starts
  - An instruction may cross a cache line or a page
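The decoding problem can be seen in a toy model. With variable length instructions, the start of instruction n+1 is known only after instruction n has been examined, so finding boundaries is a sequential walk. With a fixed length, every start address can be computed up front. The encoding below is entirely hypothetical (the low 2 bits of the first byte give the length minus one) and does not correspond to x86 or any real ISA.

```python
def variable_length_starts(code):
    """Find instruction start offsets by decoding one instruction at a time."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = (code[pc] & 0b11) + 1   # hypothetical: 1 to 4 bytes
        pc += length                     # next start depends on this decode
    return starts


def fixed_length_starts(code, width=4):
    """With fixed-width instructions every start is known without decoding."""
    return list(range(0, len(code), width))


# Ten bytes of toy machine code: instructions of length 1, 3, 2 and 4.
toy_code = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
variable_starts = variable_length_starts(toy_code)   # [0, 1, 4, 6]
fixed_starts = fixed_length_starts(bytes(12))        # [0, 4, 8]
```

The loop in `variable_length_starts` carries a dependency from one iteration to the next, which is what makes parallel decode hard.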
22
RISC Processors
- RISC – Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
- Characteristics
  - A small instruction set, with few instruction formats
  - Simple instructions that execute simple tasks
    - Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
    - Memory is accessed using Load and Store instructions only
  - Many orthogonal registers: all instructions and all addressing modes are available with any register
  - Three address machine: Add dst, src1, src2
  - Fixed length instructions
- Compiler friendly: easier to utilize the ISA
  - Orthogonal, simple
23
RISC Processors (Cont.)
- Simple architecture → simple micro-architecture
  - Simple, small and fast control logic
  - Simpler to design and validate
  - Leaves space for large on-die caches
  - Shortens time-to-market
- Existing RISC processors are not “pure” RISC
  - E.g., they support division, which takes many cycles
  - Examples: MIPS™, Sparc™, Alpha™, Power™
- Modern CISC processors use RISC arch ideas
  - Internally translate the CISC instructions into RISC-like operations
  - The inside core looks much like that of a RISC processor
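The internal translation idea can be illustrated with a toy model: a CISC-style instruction that adds a register to a memory location is split into load, add and store operations that touch memory only through load/store. The instruction formats, the `tmp` register and the function names are hypothetical and are not the actual internal operations of any real processor.

```python
def translate(instr):
    """Translate one toy CISC instruction into a list of RISC-like operations."""
    if instr[0] == "add_mem":               # add_mem addr, reg: mem[addr] += reg
        _, addr, reg = instr
        return [
            ("load", "tmp", addr),          # tmp <- mem[addr]
            ("add", "tmp", "tmp", reg),     # tmp <- tmp + reg
            ("store", addr, "tmp"),         # mem[addr] <- tmp
        ]
    return [instr]                          # already a simple register operation


def run(ops, regs, mem):
    """Execute RISC-like operations on register and memory dictionaries."""
    for op in ops:
        if op[0] == "load":
            regs[op[1]] = mem[op[2]]
        elif op[0] == "store":
            mem[op[1]] = regs[op[2]]
        elif op[0] == "add":                # three address form: dst, src1, src2
            regs[op[1]] = regs[op[2]] + regs[op[3]]
        else:
            raise ValueError(f"unknown operation: {op[0]}")


registers = {"r1": 5}
memory = {100: 7}
micro_ops = translate(("add_mem", 100, "r1"))
run(micro_ops, registers, memory)           # memory[100] becomes 12
```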