1 TK2123: COMPUTER ORGANISATION & ARCHITECTURE Prepared By: Associate Prof. Dr Masri Ayob Lecture 5: Computer Performance.


2 Contents
This lecture will discuss:
Speeding up computer operation.
Improvements in chip organisation and architecture.
Multilevel machines.

3 Speeding up computer operation
Pipelining
On-board cache
On-board L1 and L2 cache
Branch prediction
Data flow analysis
Speculative execution

4 Performance Balance
Processor speed increased.
Memory capacity increased.
Memory speed lags behind processor speed.

5 Logic and Memory Performance Gap

6 Solutions
Increase number of bits retrieved at one time: make DRAM "wider" rather than "deeper".
Change DRAM interface: cache.
Reduce frequency of memory access: more complex cache and cache on chip.
Increase interconnection bandwidth: high-speed buses; hierarchy of buses.

7 I/O Devices
Peripherals with intensive I/O demands.
Large data throughput demands.
Processors can handle this throughput; the problem is moving the data.
Solutions: caching; buffering; higher-speed interconnection buses; more elaborate bus structures; multiple-processor configurations.

8 Typical I/O Device Data Rates

9 Key is Balance
Processor components
Main memory
I/O devices
Interconnection structures

10 Improvements in Chip Organization and Architecture
Increase hardware speed of processor: fundamentally due to shrinking logic gate size; more gates, packed more tightly, increasing clock rate; propagation time for signals reduced.
Increase size and speed of caches: dedicating part of processor chip; cache access times drop significantly.
Change processor organization and architecture: increase effective speed of execution; parallelism.

11 Problems with Clock Speed and Logic Density
Power: power density increases with density of logic and clock speed; dissipating heat.
RC delay: speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them; delay increases as the RC product increases; wire interconnects are thinner, increasing resistance; wires are closer together, increasing capacitance.
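
The RC-delay point can be illustrated with a rough sketch. It is not taken from the slides: it assumes a single lumped RC stage (50% delay of about ln 2 × R × C), a rectangular wire with R = ρL/(W·T), and a parallel-plate estimate of the capacitance to a neighbouring wire. All function names and numeric values are hypothetical.

```python
import math

def wire_resistance(resistivity, length, width, thickness):
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return resistivity * length / (width * thickness)

def coupling_capacitance(permittivity, length, thickness, spacing):
    """Parallel-plate estimate of capacitance to a neighbouring wire."""
    return permittivity * length * thickness / spacing

def rc_delay(resistance, capacitance):
    """50% step-response delay of one lumped RC stage (assumed model)."""
    return math.log(2) * resistance * capacitance

# Hypothetical wire: illustrative values only, not data from the lecture.
RHO = 1.7e-8          # ohm * m
EPS = 3.5e-11         # F / m
LENGTH = 1e-3         # m
THICKNESS = 200e-9    # m

old = rc_delay(wire_resistance(RHO, LENGTH, 200e-9, THICKNESS),
               coupling_capacitance(EPS, LENGTH, THICKNESS, 200e-9))
# Halve the wire width (thinner wire) and the spacing (closer wires).
new = rc_delay(wire_resistance(RHO, LENGTH, 100e-9, THICKNESS),
               coupling_capacitance(EPS, LENGTH, THICKNESS, 100e-9))
delay_ratio = new / old
```

Under these assumptions, halving the width doubles R and halving the spacing doubles C, so the delay of this wire grows about fourfold even though nothing else changed.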

12 Problems with Clock Speed and Logic Density
Memory latency: memory speeds lag processor speeds.
Solution: more emphasis on organisational and architectural approaches.

13 Intel Microprocessor Performance

14 Increased Cache Capacity
Typically two or three levels of cache between processor and main memory.
Chip density increased: more cache memory on chip; faster cache access.
Pentium chip devoted about 10% of chip area to cache.
Pentium 4 devotes about 50%.
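
To see why several cache levels help, here is a minimal sketch using the standard average-memory-access-time relation (hit time + miss rate × cost of going one level further out). The formula and all numbers are assumptions added for illustration; the slides give no timings or miss rates.

```python
def average_access_time(levels, memory_latency):
    """Average access time in cycles.

    levels: list of (hit_time, miss_rate) pairs, ordered from the cache
            closest to the processor outward (L1, L2, ...).
    memory_latency: main-memory access time in cycles.
    """
    time = memory_latency
    # Work from main memory back toward the processor.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time

# Hypothetical figures, chosen only to make the arithmetic easy to follow.
no_cache = average_access_time([], 100)
one_level = average_access_time([(1, 0.1)], 100)
two_levels = average_access_time([(1, 0.1), (10, 0.5)], 100)
```

With these made-up figures, a second cache level lowers the average cost of an L1 miss from the full memory latency to 10 + 0.5 × 100 = 60 cycles.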

15 More Complex Execution Logic
Enable parallel execution of instructions.
Pipeline works like an assembly line: different stages of execution of different instructions proceed at the same time along the pipeline.
Superscalar allows multiple pipelines within a single processor: instructions that do not depend on one another can be executed in parallel.
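
The assembly-line idea can be put into numbers with an idealised cycle count. This sketch assumes every stage takes one clock cycle, there are no stalls, and all instructions are independent, which real programs rarely satisfy. The function names are illustrative.

```python
import math

def unpipelined_cycles(instructions, stages):
    """Each instruction passes through all stages before the next starts."""
    return instructions * stages

def pipelined_cycles(instructions, stages, width=1):
    """Ideal pipeline: fill the pipe once, then retire `width`
    instructions per cycle (width > 1 models a superscalar design)."""
    if instructions == 0:
        return 0
    return stages + math.ceil(instructions / width) - 1

n, k = 100, 5
serial = unpipelined_cycles(n, k)          # 100 * 5
pipelined = pipelined_cycles(n, k)         # 5 + 100 - 1
superscalar = pipelined_cycles(n, k, 2)    # 5 + 50 - 1
speedup = serial / pipelined
```

For long instruction streams the speedup of the single pipeline approaches the number of stages; dependencies and branches reduce it in practice.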

16 Diminishing Returns
Internal organisation of processors is complex: can get a great deal of parallelism; further significant increases likely to be relatively modest.
Benefits from cache are reaching a limit.
Increasing clock rate runs into the power dissipation problem: some fundamental physical limits are being reached.

17 New Approach: Multiple Cores
Multiple processors on a single chip: large shared cache.
Within a processor, the increase in performance is proportional to the square root of the increase in complexity.
If software can use multiple processors, doubling the number of processors almost doubles performance.
So, use two simpler processors on the chip rather than one more complex processor.
With two processors, larger caches are justified: power consumption of memory logic is less than that of processing logic.
Example: IBM POWER4, two cores based on PowerPC.
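
The trade-off on this slide can be sketched as a comparison between spending a doubled complexity budget on one core or on two. The square-root rule is from the slide. The `scaling` parameter, which models "almost doubles", and its value of 0.9 are assumptions, as are the function names.

```python
import math

def single_core_performance(complexity):
    """Square-root rule from the slide: performance ~ sqrt(complexity)."""
    return math.sqrt(complexity)

def multi_core_performance(cores, complexity_per_core, scaling=0.9):
    """Each extra core adds `scaling` of one core's performance.

    scaling=1.0 would be perfect doubling; 0.9 is a hypothetical value
    standing in for "almost doubles" when software can use all cores.
    """
    per_core = single_core_performance(complexity_per_core)
    return per_core * (1 + (cores - 1) * scaling)

# Same total complexity budget (2 units), spent two different ways.
one_big_core = single_core_performance(2.0)
two_simple_cores = multi_core_performance(2, 1.0)
```

Under these assumptions the two simpler cores come out ahead of the single complex core, provided the software can actually keep both busy.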

18 POWER4 Chip Organization

19 Pentium Evolution (1)
8080: first general-purpose microprocessor; 8-bit data path; used in first personal computer (Altair).
8086: much more powerful; 16 bit; instruction cache, prefetches a few instructions; 8088 (8-bit external bus) used in first IBM PC.
80286: 16 Mbyte of memory addressable, up from 1 Mbyte.
80386: 32 bit; support for multitasking.
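
The jump from 1 Mbyte to 16 Mbyte follows from the number of address lines. The sketch below assumes byte-addressable memory and uses the 20-bit (8086) and 24-bit (80286) physical address widths, which are not stated on the slide.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with this many bits."""
    return 2 ** address_bits

MBYTE = 1024 * 1024

# Address widths are added here as background, not taken from the slide.
i8086_space = addressable_bytes(20)    # 8086: 20 address lines
i80286_space = addressable_bytes(24)   # 80286: 24 address lines
growth = i80286_space // i8086_space
```

Four extra address lines multiply the addressable memory by 2^4 = 16, which matches the slide's figures.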

20 Pentium Evolution (2)
80486: sophisticated powerful cache and instruction pipelining; built-in maths co-processor.
Pentium: superscalar; multiple instructions executed in parallel.
Pentium Pro: increased superscalar organization; aggressive register renaming; branch prediction; data flow analysis; speculative execution.

21 Pentium Evolution (3)
Pentium II: MMX technology; graphics, video and audio processing.
Pentium III: additional floating-point instructions for 3D graphics.
Pentium 4: note Arabic rather than Roman numerals; further floating-point and multimedia enhancements.
Itanium: 64 bit.
Itanium 2: hardware enhancements to increase speed.

22 Intel Computer Family (3)
Moore's law for (Intel) CPU chips.

23 Intel Computer Family (1)
The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.
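
As a small worked example of the MHz definition, the helpers below convert a clock rate into a cycle time and a cycle count into seconds. The function names and sample values are illustrative only.

```python
def cycle_time_ns(clock_mhz):
    """Duration of one clock cycle in nanoseconds (1 MHz = 1e6 cycles/sec)."""
    return 1000.0 / clock_mhz

def execution_time_s(cycles, clock_mhz):
    """Seconds needed to run a given number of clock cycles."""
    return cycles / (clock_mhz * 1e6)

one_mhz_cycle = cycle_time_ns(1)               # 1 MHz clock
hundred_mhz_cycle = cycle_time_ns(100)         # 100 MHz clock
two_million_cycles = execution_time_s(2_000_000, 1)
```

A 1 MHz clock therefore ticks once every 1000 ns, and a 100 MHz clock once every 10 ns.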

24 PowerPC
1975: 801 minicomputer project (IBM), RISC.
Berkeley RISC I processor.
1986: IBM commercial RISC workstation product, RT PC: not a commercial success; many rivals with comparable or better performance.
1990: IBM RISC System/6000: RISC-like superscalar machine; POWER architecture.
IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh).
Result is PowerPC architecture: derived from the POWER architecture; superscalar RISC; Apple Macintosh; embedded chip applications.

25 PowerPC Family (1)
601: quickly to market; 32-bit machine.
603: low-end desktop and portable; 32-bit; comparable performance with 601; lower cost and more efficient implementation.
604: desktop and low-end servers; 32-bit machine; much more advanced superscalar design; greater performance.
620: high-end servers; 64-bit architecture.

26 PowerPC Family (2)
740/750: also known as G3; two levels of cache on chip.
G4: increases parallelism and internal speed.
G5: improvements in parallelism and internal speed; 64-bit organization.

27 Internet Resources
http://www.intel.com/ (search for the Intel Museum)
http://www.ibm.com
http://www.dec.com
Charles Babbage Institute
PowerPC
Intel Developer Home

28 Languages, Levels, Virtual Machines
A multilevel machine.

29 Contemporary Multilevel Machines

30 Evolution of Multilevel Machines
Invention of microprogramming
Invention of operating system
Migration of functionality to microcode
Elimination of microprogramming

31 The Computer Spectrum
The current spectrum of computers available.

32 Metric Units
The principal metric prefixes.

33 Thank you
Q & A

