Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an.

1 Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an

2 Review of Computer Architecture
- Credit: Most of the slides were made by Prof. Wayne Wolf, the author of the textbook.
- I made some modifications to the notes for clarity.
  - Assumes some background from CSCE 430 or equivalent.

3 von Neumann architecture
- Memory holds data and instructions.
- Central processing unit (CPU) fetches instructions from memory.
  - Separate CPU and memory distinguish a programmable computer.
- CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

4 von Neumann Architecture
[Figure: memory unit, CPU (control + ALU), input unit, output unit.]

5 CPU + memory
[Figure: CPU connected to memory by address and data lines. The PC holds 200, memory location 200 holds ADD r5,r1,r3, and the IR holds ADD r5,r1,r3.]
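The fetch step pictured on this slide can be sketched in a few lines. This is a minimal illustration, not a model of any real CPU: the tuple encoding of instructions, the HALT opcode, and the register values are assumptions made for the example.

```python
# Minimal sketch of the fetch-execute cycle on a von Neumann machine.
# The tuple instruction encoding and the HALT opcode are hypothetical.

def run(memory, regs, pc):
    """Fetch, decode, and execute until a HALT instruction is reached."""
    while True:
        ir = memory[pc]      # fetch: the PC supplies the address, the IR receives the word
        pc += 1              # point the PC at the next instruction
        op = ir[0]           # decode: the first field selects the operation
        if op == "ADD":      # execute: rd = rn + rm
            _, rd, rn, rm = ir
            regs[rd] = regs[rn] + regs[rm]
        elif op == "HALT":
            return pc
        else:
            raise ValueError(f"unknown opcode {op!r}")

# The instruction from the slide sits at address 200; the values in r1 and r3 are made up.
memory = {200: ("ADD", "r5", "r1", "r3"), 201: ("HALT",)}
regs = {"r1": 7, "r3": 35, "r5": 0}
final_pc = run(memory, regs, pc=200)
print(regs["r5"], final_pc)
```

In a fuller model the same `memory` dictionary would also hold data, which is the defining property of the von Neumann organization.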

6 Recalling Pipelining

7 What is a potential Problem with von Neumann Architecture?

8 Harvard architecture
[Figure: CPU (with PC) connected to a data memory and to a separate program memory, each over its own address and data lines.]

9 von Neumann vs. Harvard
- Harvard can't use self-modifying code.
- Harvard allows two simultaneous memory fetches.
- Most DSPs (e.g., Blackfin from ADI) use Harvard architecture for streaming data:
  - greater memory bandwidth;
  - different memory bit depths between instruction and data;
  - more predictable bandwidth.
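The bandwidth point can be made concrete with a rough count of memory cycles. The sketch below assumes an idealized machine in which each memory serves one access per cycle and instruction fetches overlap perfectly with data accesses on the Harvard side; the function name and the four-instruction program are illustrative only.

```python
# Idealized memory-cycle count: one access per memory per cycle (an assumption).

def memory_cycles(program, harvard):
    """Count memory cycles needed to fetch and run a list of opcodes."""
    fetches = len(program)                                     # one instruction fetch each
    data = sum(1 for op in program if op in ("LDR", "STR"))   # loads and stores touch data memory
    if harvard:
        # Separate program and data memories: the two streams proceed in parallel.
        return max(fetches, data)
    # Single shared memory: every access takes its own cycle.
    return fetches + data

program = ["LDR", "LDR", "SUB", "STR"]
von_neumann = memory_cycles(program, harvard=False)
harvard = memory_cycles(program, harvard=True)
print(von_neumann, harvard)
```

Real processors blur this picture with caches and buffering, so the numbers only illustrate why two memory ports help a streaming workload.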

10 Today’s Processors Harvard or von Neumann?

11 RISC vs. CISC
- Complex instruction set computer (CISC):
  - many addressing modes;
  - many operations.
- Reduced instruction set computer (RISC):
  - load/store;
  - pipelinable instructions.

12 Instruction set characteristics
- Fixed vs. variable length.
- Addressing modes.
- Number of operands.
- Types of operands.

13 Tensilica Xtensa
- RISC-based, with variable-length instructions.
  - But not CISC.

14 Programming model
- Programming model: registers visible to the programmer.
- Some registers are not visible (IR).

15 Multiple implementations
- Successful architectures have several implementations:
  - varying clock speeds;
  - different bus widths;
  - different cache sizes, associativities, configurations;
  - local memory, etc.

16 Assembly language
- One-to-one with instructions (more or less).
- Basic features:
  - One instruction per line.
  - Labels provide names for addresses (usually in first column).
  - Instructions often start in later columns.
  - Comments run to end of line.

17 ARM assembly language example
    label1  ADR r4,c
            LDR r0,[r4]    ; a comment
            ADR r4,d
            LDR r1,[r4]
            SUB r0,r0,r1   ; comment (the first operand, r0, is the destination)

18 Pseudo-ops
- Some assembler directives don't correspond directly to instructions:
  - Define current address.
  - Reserve storage.
  - Constants.

19 Pipelining
- Execute several instructions simultaneously but at different stages.
- Simple three-stage pipe: fetch, decode, execute.
[Figure: fetch, decode, and execute stages connected to memory.]

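20 Pipeline complications
- May not always be able to predict the next instruction:
  - Conditional branch.
- Causes bubble in the pipeline:
[Figure: pipeline timing diagram. A JNZ passes through fetch, decode, and execute, followed by two more instructions that each pass through fetch, decode, and execute.]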
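The cost of a bubble can be estimated with a simple cycle count. The sketch below assumes a branch is resolved in the last stage, so each taken branch discards the instructions already fetched behind it (a penalty of stages minus one cycles). That penalty and the function name are assumptions, not figures from the slides.

```python
# Rough cycle count for a simple in-order pipeline.
# Assumption: a taken branch costs (stages - 1) bubble cycles unless stated otherwise.

def pipeline_cycles(n_instructions, taken_branches, stages=3, branch_penalty=None):
    """Return total cycles to run n_instructions, including branch bubbles."""
    if n_instructions == 0:
        return 0
    if branch_penalty is None:
        branch_penalty = stages - 1
    fill = stages                    # cycles until the first instruction completes
    steady = n_instructions - 1      # one more instruction completes per cycle after that
    return fill + steady + taken_branches * branch_penalty

no_branches = pipeline_cycles(10, taken_branches=0)
two_branches = pipeline_cycles(10, taken_branches=2)
print(no_branches, two_branches)
```

With ten instructions, two taken branches stretch the run from 12 to 16 cycles under these assumptions, which is the kind of timing variation an embedded designer has to budget for.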

21 Superscalar
- RISC pipeline executes one instruction per clock cycle (usually).
- Superscalar machines execute multiple instructions per clock cycle.
  - Faster execution.
  - More variability in execution times.
  - More expensive CPU.

22 Simple superscalar
- Execute floating-point and integer instructions at the same time.
  - Use different registers.
  - Floating-point operations use their own hardware unit.
- Must wait for completion when the floating-point and integer units communicate.

23 Costs
- Good news: can find parallelism at run time.
  - Bad news: causes variations in execution time.
- Requires a lot of hardware.
  - n² instruction unit hardware for n-instruction parallelism.

24 Finding parallelism
- Independent operations can be performed in parallel:
    ADD r0, r0, r1
    ADD r3, r2, r3
    ADD r6, r4, r0
[Figure: data-flow graph of the three additions over r0, r1, r2, r3, r4, and r6.]

25 Pipeline hazards
Two operations that have a data dependency cannot be executed in parallel:
    x = a + b;
    a = d + e;
    y = a - f;
[Figure: data-flow graph of the three statements over a, b, d, e, f, x, and y.]
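The dependency checks on the last two slides can be sketched as a comparison of destination and source names. The RAW/WAR/WAW labels are the standard hazard names rather than terms from the slides, and the (destination, sources) tuple form is an assumption made for the example.

```python
# Sketch of a pairwise dependency check. Each operation is (destination, sources).

def hazards(first, second):
    """Return the set of hazards that `second` has with respect to the earlier `first`."""
    d1, s1 = first
    d2, s2 = second
    found = set()
    if d1 in s2:
        found.add("RAW")   # second reads what first writes
    if d2 in s1:
        found.add("WAR")   # second overwrites what first reads
    if d1 == d2:
        found.add("WAW")   # both write the same location
    return found

# The three statements from the hazards slide.
x_op = ("x", ("a", "b"))   # x = a + b
a_op = ("a", ("d", "e"))   # a = d + e
y_op = ("y", ("a", "f"))   # y = a - f

# The first two ADDs from the parallelism slide.
add1 = ("r0", ("r0", "r1"))
add2 = ("r3", ("r2", "r3"))

print(hazards(x_op, a_op), hazards(a_op, y_op), hazards(add1, add2))
```

An empty set means the pair can be issued together; anything else forces an ordering between the two operations.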

26 Order of execution
- In-order:
  - Machine stops issuing instructions when the next instruction can't be dispatched.
- Out-of-order:
  - Machine will change order of instructions to keep dispatching.
  - Substantially faster but also more complex.

27 VLIW architectures
- Very long instruction word (VLIW) processing provides significant parallelism.
- Rely on compilers to identify parallelism.

28 What is VLIW?
- Parallel function units with shared register file:
[Figure: a register file shared by several function units, fed by instruction decode and memory.]

29 VLIW cluster
- Organized into clusters to accommodate available register bandwidth:
[Figure: a row of clusters.]

30 VLIW and compilers
- VLIW requires considerably more sophisticated compiler technology than traditional architectures: the compiler must extract enough parallelism to keep the instructions full.
- Many VLIWs have good compiler support.

31 Scheduling
[Figure: expressions a through g on the left are scheduled into instruction words on the right (a b e; f c; d g), with nop filling unused slots.]
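The compiler's job here can be sketched as a greedy list scheduler that packs ready operations into fixed-width instruction words and pads with nop. The dependence graph below is hypothetical (the slide's own graph did not survive extraction), and the three-slot width is an assumption.

```python
# Greedy list-scheduling sketch for a VLIW machine.
# `deps` maps each operation to the set of operations it must wait for.

def schedule(deps, width):
    """Pack operations into instruction words of `width` slots, padding with nop."""
    done = set()
    remaining = sorted(deps)          # fixed order keeps the result deterministic
    words = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("cyclic dependence")
        issued = ready[:width]        # take as many ready operations as fit in one word
        words.append(issued + ["nop"] * (width - len(issued)))
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return words

# Hypothetical dependences: c needs a and b, d needs c, f needs e, g needs d and f.
deps = {
    "a": set(), "b": set(), "e": set(),
    "c": {"a", "b"}, "d": {"c"}, "f": {"e"}, "g": {"d", "f"},
}
words = schedule(deps, width=3)
for word in words:
    print(word)
```

Every nop is a slot the hardware cannot use, which is why the quality of the compiler's scheduling matters so much on a VLIW.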

32 EPIC
- EPIC = Explicitly parallel instruction computing.
- Used in the Intel/HP Merced (IA-64) machine.
- Incorporates several features that allow the machine to find and exploit increased parallelism.

33 IA-64 instruction format
- Instructions are bundled with a tag that indicates which instructions can be executed in parallel:
[Figure: 128-bit bundle laid out as tag | instruction 1 | instruction 2 | instruction 3.]
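A bundle of this shape can be packed and unpacked with plain bit operations. The slide gives only the 128-bit total and the tag-plus-three-instructions layout. The field widths used below (a 5-bit tag in the low bits followed by three 41-bit slots) follow the IA-64 architecture as I understand it and should be treated as an assumption, as should the function names.

```python
# Sketch of packing a 128-bit bundle: 5-bit tag (template) + three 41-bit slots.
# Field widths and bit positions are assumptions, not taken from the slides.

TAG_BITS = 5
SLOT_BITS = 41

def pack_bundle(tag, slots):
    """Combine a tag and three instruction slots into one integer."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = tag
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= slot << (TAG_BITS + i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle):
    """Split a bundle back into its tag and three instruction slots."""
    tag = bundle & ((1 << TAG_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TAG_BITS + i * SLOT_BITS)) & mask for i in range(3)]
    return tag, slots

bundle = pack_bundle(0b10000, [0x123, 0x456, (1 << SLOT_BITS) - 1])
tag, slots = unpack_bundle(bundle)
print(hex(bundle), tag, slots)
```

The widths add up to the slide's total: 5 + 3 × 41 = 128 bits.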

34 Memory system
- CPU fetches data, instructions from a memory hierarchy:
[Figure: CPU, L1 cache, L2 cache, main memory, in order of increasing distance from the CPU.]

35 Memory hierarchy complications
- Program behavior is much more state-dependent.
  - Depends on how earlier execution left the cache.
- Execution time is less predictable.
  - Memory access times can vary by 100X.
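The state dependence is easy to see in a toy direct-mapped cache. In this sketch the same address trace is run twice, and the second run is faster only because of what the first run left behind. The cache geometry, the 1-cycle hit and 100-cycle miss times, and the trace are all illustrative assumptions.

```python
# Toy direct-mapped cache showing that run time depends on prior cache state.

class DirectMappedCache:
    def __init__(self, n_lines, line_size, hit_time=1, miss_time=100):
        self.n_lines = n_lines
        self.line_size = line_size
        self.hit_time = hit_time
        self.miss_time = miss_time
        self.tags = [None] * n_lines     # one stored tag per cache line

    def access(self, address):
        """Return the access time in cycles and update the cache contents."""
        block = address // self.line_size
        index = block % self.n_lines
        tag = block // self.n_lines
        if self.tags[index] == tag:
            return self.hit_time
        self.tags[index] = tag           # a miss replaces whatever was in the line
        return self.miss_time

def run_time(cache, addresses):
    return sum(cache.access(a) for a in addresses)

cache = DirectMappedCache(n_lines=4, line_size=16)
trace = [0, 4, 8, 64, 0]                 # 0 and 64 map to the same line and evict each other
cold = run_time(cache, trace)            # first run starts with an empty cache
warm = run_time(cache, trace)            # second run inherits the first run's contents
print(cold, warm)
```

The two runs execute identical accesses yet take different times, and the conflict between addresses 0 and 64 keeps even the warm run far from the all-hits best case.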

36 Memory Hierarchy Complication
                          | Pentium 3-M          | Pentium 4-M                 | Pentium M
Core                      | P6 (Tualatin 0.13 µm) | Netburst (Northwood 0.13 µm) | "P6+" (Banias 0.13 µm, Dothan 0.09 µm)
L1 cache (data + code)    | 16 KB + 16 KB        | 8 KB + 12K µops (TC)        | 32 KB + 32 KB
L2 cache                  | 512 KB               | 512 KB                      | 1024 KB
Instruction sets          | MMX, SSE             | MMX, SSE, SSE2              | MMX, SSE, SSE2
Max frequencies (CPU/FSB) | 1.2 GHz / 133 MHz    | 2.4 GHz / 400 MHz (QDR)     | 2 GHz / 400 MHz (QDR)
Number of transistors     | 44M                  | 55M                         | 77M, 140M
SpeedStep                 | 2nd generation       | 2nd generation              | 3rd generation

37 End of Overview
- Next class: Altera Nios II processors
