Mile Stojčev, Teufik Tokić, Ivan Milentijević


1 The Limits of Semiconductor Technology & Coming Challenges in Microarchitecture and Architecture
Mile Stojčev, Teufik Tokić, Ivan Milentijević Faculty of Electronic Engineering, Niš

2 Outline
Technology Trends
Process Technology Challenges – Low Power Design
Microprocessors’ Generations
Challenges in Education

3 Outline – Technology Trends
Moore’s Law 1
Moore’s Law 2
Performance and New Technology Generation
Technology Trends – Example
Trends in Future Processor Technology
Memory Technology

4 Moore's Law 1 In 1965, Gordon Moore, director of research and development at Fairchild Semiconductor and later a founder of Intel Corp., wrote a paper for Electronics entitled “Cramming more components onto integrated circuits”. In the paper Moore observed that “The complexity for minimum component cost has increased at a rate of roughly a factor of two per year”. This observation became known as Moore's law. In fact, by 1975 the leading chips had maybe one-tenth as many components as Moore had predicted. The doubling period had stretched out to an average of 17 months in the decade ending in 1975, then slowed to 22 months through 1985 and to about 32 months after that. It has revived to a now relatively peppy 22 to 24 months in recent years.

5 Moore’s Law 1 continue Similar exponential growth rates have occurred for other aspects of computer technology – disk capacities, memory chip capacities, and processor performance. These remarkable growth rates have been the major driving forces of the computer revolution.

         Capacity        Speed (latency)
Logic    2x in 3 years   2x in 3 years
DRAM     4x in 3 years   2x in 10 years
Disk     4x in 3 years   2x in 10 years
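These rates compound quickly. A quick sketch of what they imply (the 10-year horizon here is illustrative, not a figure from the slide):

```python
# Compound-growth sketch for the capacity/latency trends on this slide.
# The per-period rates are from the table; the 10-year horizon is ours.

def growth(factor, period_years, years):
    """Total improvement after `years` at `factor` per `period_years`."""
    return factor ** (years / period_years)

# DRAM capacity: 4x every 3 years -> roughly 100x in a decade
dram_capacity_10y = growth(4, 3, 10)
# DRAM latency: only 2x every 10 years
dram_speed_10y = growth(2, 10, 10)

print(f"DRAM capacity gain over 10 years: {dram_capacity_10y:.0f}x")
print(f"DRAM speed gain over 10 years:    {dram_speed_10y:.0f}x")
```

The asymmetry between these two numbers is exactly the processor–memory gap discussed later in the talk.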

6 Moore’s Law 1 – number of transistors

7 Moore’s Law 1 - Linewidths
One of the key drivers behind the industry's ability to double transistor counts every 18 to 24 months is the continuous reduction in linewidths. Shrinking linewidths not only allows more components to fit onto an IC (typically 2x per linewidth generation) but also lowers costs (typically 30% per linewidth generation).

8 Moore’s Law 1 - Die size Shrinking linewidths have slowed the rate of growth in die size to 1.14x per year, versus 1.38 to 1.58x per year for transistor counts; since the mid-nineties, accelerating linewidth shrinks have halted and even reversed the growth in die sizes.

9 Moore's Law in Action The number of transistors on a chip doubles annually

10 Moore’s Law 1 – Microprocessor

11 Moore’s Law 1– Capacity Single Chip DRAM

12 Improving frequency via pipelining
Process technology and microarchitecture innovations enable the frequency to double every process generation. The figure presents the contribution of both: as the process improves, the frequency increases and the average amount of work done per pipeline stage decreases.

13 Process Complexity Shrinking linewidths isn’t free. Linewidth shrinks require process modifications to deal with a variety of issues that arise from shrinking the devices, leading to increasing complexity in the processes being used.

14 Moore’s Law 2 (Rock’s Law)
In 1996 Intel augmented Moore’s law (the number of transistors on a processor doubles approximately every 18 months) with Moore’s law 2. Law 2 says that as the sophistication of chips increases, the cost of fabrication rises exponentially. The cost of semiconductor tools doubles every four years. By this logic, chip fabrication plants, or fabs, were supposed to cost $5 billion each by the late 1990s and $10 billion by now

15 Moore’s Law 2 (Rock’s Law) - continue
For example: in 1986 Intel manufactured the 386 in fabs costing $200 million. In 1996, a $2 billion facility was needed to produce the Pentium processor, which contained 6 million transistors.

16 Moore’s Law 2 (Rock’s Law) The Cost of Semiconductor Tools Doubles Every Four Years

17 Machrone’s Law The PC you want to buy will always be $5000

18 Metcalfe’s Law A network’s value grows proportionally to the square of the number of its users

19 Wirth’s Law Software is getting slower faster than hardware is getting faster

20 Performance and new technology generation
According to Moore’s law, each new generation has approximately doubled logic circuit density and increased performance by about 40%, while quadrupling memory capacity. The increase in components per chip comes from the following key factors: The factor of two in component density comes from a 2^0.5 shrink in each lithography dimension (2^0.5 per x and 2^0.5 per y). An additional factor of 2^0.5 comes from an increase in chip area. A final factor of 2^0.5 comes from device and circuit cleverness.
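As a quick check that the slide's factors multiply out: the two lithography factors of 2^0.5 give the 2x density, and the two further 2^0.5 factors (area, cleverness) bring the total to 4x components per generation, consistent with the quadrupling of memory capacity mentioned above:

```python
# Multiplying out the per-generation scaling factors from this slide.
from math import sqrt

shrink_x = sqrt(2)    # linewidth shrink in the x dimension
shrink_y = sqrt(2)    # linewidth shrink in the y dimension
die_area = sqrt(2)    # increase in chip area
cleverness = sqrt(2)  # device and circuit cleverness

density_gain = shrink_x * shrink_y                 # 2x from lithography alone
total_gain = density_gain * die_area * cleverness  # 4x components per generation

print(f"density gain from shrink: {density_gain:.2f}x")
print(f"total components gain:    {total_gain:.2f}x")
```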

21 Development in ICs

22 Semiconductor Industry Association Roadmap Summary for high-end Processors

23 total transistors/chip

24 Clock Frequency Versus Year for Various Representative Machines

25 Limits in Clocking Traditional clocking techniques will reach their limit when the clock frequency reaches the 5-10 GHz range For higher-frequency clocking (>10 GHz), new ideas and new ways of designing digital systems are needed

26 Intel’s Microprocessors Clock Frequency

27 Technology Trends - Example
As an illustration of just how quickly computer technology is improving, let’s consider what would have happened if automobiles had improved equally quickly. Assume that an average car in 1977 had a top speed of 150 km/h and an average fuel economy of 10 km/l. If both top speed and efficiency improved at 35% per year from 1977 to 1987, and by 50% per year from 1987 to 2000, tracking computer performance, what would the average top speed and fuel economy of a car be in 1987? In 2000?

28 Solution In 1987: The span 1977 to 1987 is 10 years, so both traits would have improved by a factor of (1.35)^10 = 20.1, giving a top speed of 3015 km/h and fuel economy of 201 km/l. In 2000: Thirteen more years elapse, this time at a 50% per year improvement rate, for a total factor of (1.5)^13 ≈ 194.6 over the 1987 values. This gives a top speed of about 587,000 km/h and fuel economy of about 39,100 km/l. This is fast enough to cover the distance from the earth to the moon in about 39 min, and to make the round trip on about 20 liters of gasoline.
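The solution's arithmetic can be reproduced directly (the helper function name here is ours):

```python
# Reproducing the car-analogy compound-growth arithmetic from the solution.
def improve(base, rate, years):
    """Value after `years` of improvement at `rate` per year."""
    return base * (1 + rate) ** years

speed_1977, fuel_1977 = 150.0, 10.0        # km/h, km/l

# 1977-1987: 35% per year for 10 years
speed_1987 = improve(speed_1977, 0.35, 10)  # ~3015 km/h
fuel_1987 = improve(fuel_1977, 0.35, 10)    # ~201 km/l

# 1987-2000: 50% per year for 13 more years
factor = 1.5 ** 13                          # ~194.6
speed_2000 = speed_1987 * factor
fuel_2000 = fuel_1987 * factor

print(f"1987: {speed_1987:,.0f} km/h, {fuel_1987:.0f} km/l")
print(f"2000: {speed_2000:,.0f} km/h, {fuel_2000:,.0f} km/l")
```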

29 Feature size versus time in silicon ICs
The semiconductor industry itself has developed a “roadmap” based on the idea of Moore’s law. The National Technology Roadmap for Semiconductors (NTRS) and, most recently, the International Technology Roadmap for Semiconductors (ITRS) now extend the device scaling and increased functionality scenario to the year 2014, at which point minimum feature sizes are projected to be 35 nm and chips with > 10^11 components are expected to be available.

30 Trends in feature size over time

31 Processor technology today
The most advanced processor technology today (year 2003) is 0.10 µm = 100 nm. Ideally, process technology scales all physical dimensions of devices (transistors and wires) by a factor of ~0.7. With such scaling, typical improvement figures are the following: 1.4–1.5 times faster transistors; two times smaller transistors; 1.35 times lower operating voltages; three times lower switching power

32 Processor Technology and Microprocessors
Process technology is the most important technology that drives the microprocessor industry. It is characterized by growth of 1000 times in frequency (from 1 MHz to 1 GHz) and in integration (from ~10K to 1M devices) in 25 years. Microarchitecture attempts to increase both IPC and frequency

33 Process technology and microarchitecture
Microarchitecture techniques such as caches, branch prediction, and out-of-order execution can increase instructions per cycle (IPC). Pipelining, as a microarchitecture idea, helps to increase frequency. A modern architecture (ISA) and a good optimizing compiler can reduce the number of dynamic instructions executed for a given program

34 Frequency and performance improvements
While in-order microprocessors used four to five pipe stages, modern out-of-order microprocessors use over ten pipe stages. With frequencies higher than 1 GHz, more than 20 pipeline stages are used

35 Performance of memory and CPU
Memory in a computer system is hierarchically organized. In 1980 microprocessors were often designed without caches. Nowadays, microprocessors often come with two levels of caches.

36 Memory Hierarchy

37 Processor-DRAM Gap Microprocessor performance improved 35% per year until 1986, and 55% per year since 1987. Memory technology improvements aim primarily at increasing DRAM capacity, not DRAM speed
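A sketch of how this gap compounds over a decade. The 55% per year CPU figure is from the slide; the ~7% per year DRAM latency improvement is a common textbook assumption, not a number from this presentation:

```python
# Compounding the processor-DRAM speed gap over ten years.
# cpu_rate is from the slide; dram_rate is an assumed textbook figure.
cpu_rate = 1.55    # CPU performance: +55% per year (slide)
dram_rate = 1.07   # DRAM speed: ~+7% per year (assumption)

years = 10
gap = (cpu_rate / dram_rate) ** years   # relative divergence

print(f"relative CPU/DRAM speed gap after {years} years: {gap:.0f}x")
```

The result is roughly consistent with the factor-of-30-to-50 gap often cited for this period.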

38 Relative processor/memory speed

39 Type of Memories

40 Percentage of Usage

41 Typical Applications of DRAM

42 An anecdote In a recent database benchmark study using TPC-C, both 200 MHz Pentium Pro and Alpha systems were measured at 4.2–4.5 CPU cycles per instruction retired. In other words, three out of every four CPU cycles retired zero instructions: most were spent waiting for memory. Processor speed has seriously outstripped memory speed. Increasing the width of instruction issue and increasing the number of simultaneous instruction streams only makes the memory bottleneck worse

43 An anecdote - continue If a CPU chip today needs to move 2 GBytes/s (say, 16 bytes every 8 ns) across the pins to keep itself busy, imagine a chip in the foreseeable future with twice the clock rate, twice the issue width, and two instruction streams. All these factors multiply together to require about 16 GBytes/s of pin bandwidth to keep this chip busy. It is not clear whether pin bandwidth can keep up – 32 bytes every 2 ns?
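The pin-bandwidth arithmetic in the anecdote checks out:

```python
# Verifying the pin-bandwidth figures from the anecdote above.
bytes_per_transfer = 16
ns_per_transfer = 8
bw_today = bytes_per_transfer / (ns_per_transfer * 1e-9) / 1e9   # GB/s

# Doubling clock rate, issue width, and instruction streams: 2*2*2 = 8x
bw_future = bw_today * 2 * 2 * 2   # equivalently 32 bytes every 2 ns

print(f"today:  {bw_today:.0f} GB/s")
print(f"future: {bw_future:.0f} GB/s")
```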

44 Memory system In a 1 GHz microprocessor, accessing main memory can take about 100 cycles. Such an access may stall a pipelined microprocessor for many cycles and seriously impact the overall performance. To reduce memory stalls at a reasonable cost, modern microprocessors take advantage of the locality of references in the program and use a hierarchy of memory components

45 Expensive Memory Called a Cache
A small, fast, and expensive (in $/bit) memory called a cache is located on-die and holds frequently used data. A somewhat bigger, but slower and cheaper, cache may be located between the microprocessor and the system bus which connects the microprocessor to the main memory

46 Two Levels of Caches Most advanced microprocessors today employ two levels of caches on chip. The first level is ~32–128 kB; it takes two to three cycles to access and typically catches about 95% of all accesses. The second level is 256 kB to over 1 MB; it typically takes six to ten cycles to access and catches over 50% of the misses of the first level

47 Memory Hierarchy Impact on Performance
Off-chip memory access may take about 100 cycles. A cache miss that eventually has to go to the main memory can take about the same amount of time as executing 100 arithmetic and logic unit (ALU) instructions, so the structure of the memory hierarchy has a major impact on performance. Caches are made bigger, and heuristics are used to make sure the cache contains the portions of memory that are most likely to be used in the near future of program execution.
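Using the figures from the two-level-cache slide (L1 ~3 cycles catching 95% of accesses, L2 ~8 cycles catching about half of L1 misses, memory ~100 cycles), a simple average-memory-access-time sketch shows why the hierarchy matters; the exact latencies chosen within the slide's ranges are ours:

```python
# Average memory access time (AMAT) sketch with illustrative latencies
# picked from the ranges quoted on the two-level-cache slide.
l1_lat, l2_lat, mem_lat = 3, 8, 100   # cycles
l1_hit = 0.95                          # fraction of accesses caught by L1
l2_hit = 0.50                          # fraction of L1 misses caught by L2

amat = l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat)

print(f"AMAT with two cache levels: {amat:.2f} cycles")
print(f"without any cache:          {mem_lat} cycles")
```

Two cache levels turn a ~100-cycle average access into a ~6-cycle one under these assumptions.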

48 Conclusion concerning memory – problems
Today’s chips are largely able to execute code faster than we can feed them with instructions and data. There are no longer performance bottlenecks in the floating-point multiplier or in having only a single integer unit. The real design action is in memory subsystems – caches, buses, bandwidth and latency.

49 Conclusion concerning memory – problems, continued
If the memory research community would follow the microprocessor community’s lead by leaning more heavily on architecture- and system-level solutions, in addition to technology-level solutions, to achieve higher performance, the gap might begin to close. One expects that over the coming decade memory subsystem design will be the only important design issue for microprocessors.

50 Memory Hierarchy Solutions
Organization choices (CPU architecture, L1/L2 cache organizations, DRAM architecture, DRAM speed) can affect total execution time by a factor of two. The system-level parameters that most affect performance: The number of independent channels and banks connecting the CPU to the DRAMs can effect a 25% performance change. Burst width – the data access granularity – can effect a 15% performance change. Magnetic RAM (MRAM) is a new type of memory

51 Magnetic RAM – MRAM or NanotechRAM (NRAM)
Based on nanoscale semiconductor technology. A nanotechnology RAM device consists of tiny carbon nanotubes. Differing electrical charges swing the tubes into one of two positions, representing the ones and zeroes necessary for digital storage. Moreover, the tubes stay in position until a new signal resets them.

52 MRAM Capacity The 10 Gbit device consists of 10 billion carbon nanotubes that are 1 nm – just a few thousand atoms – in diameter, on a silicon wafer. MRAM is nonvolatile, and it has the fast read-and-write performance of static RAM (SRAM).

53 MRAM as Universal Memory
MRAM can replace many other types of memory, including SRAM, DRAM, ROM, EEPROM, Flash EEPROM, and ferroelectric RAM (FRAM). The prediction is crystalline structures that users grow on silicon.

54 Capacity of DRAM and FLASH - MRAM

55 Surpassing the Prediction from Moore's Law – DRAM vs MRAM
The famous Moore's law predicts that memory density will double every 1.5 years, while the new growth model clearly indicates a doubling of NAND Flash memory density every year.

56 Overall memory prediction roadmap
Even though the density growth of DRAM will slow down, DRAM will still keep on leading the overall memory technology and will be able to reach 8 Gb density in ten years. High-density memory growth will surpass the prediction from Moore's law

57 Overall memory prediction roadmap -cont

58 1988 Computer Food Chain

59 1997 Computer Food Chain

60 2003 Computer Food Chain

61 Outline
Technology Trends
Process Technology Challenges – Low Power Design
Microprocessors’ Generations
Challenges in Education

62 Outline - Low Power Design
Power Trends in VLSI
View Point on Power
Research Efforts in Low Power Design
Is there an Optimal Design Point

63 Power consumption During 1995, the energy consumption of all PC machines installed in the USA was 60 × 10^6 MWh. During 2000, the energy consumption of all PC machines installed in the USA was 10% of the total energy production. For 2015, one expects that the energy consumption of all PC machines will be 15% greater than in 1995, or 69 × 10^6 MWh.

64 Typical Low-Power Applications
battery-operated equipment, mobile communication equipment, wireless communication equipment, instrumentation, consumer electronics, biomedical technologies, industry, process controls ...

65 Power dissipation in time
“CMOS circuits dissipate little power by nature. So believed circuit designers” (Kuroda–Sakurai, ’95). [Figure: power per chip (W) versus year, 1980–1995, growing roughly x4 every 3 years, from ~0.01 W to ~100 W] “By the year 2000 power dissipation of high-end ICs will exceed the practical limits of ceramic packages, even if the supply voltage can be feasibly reduced.”

66 Gloom and Doom predictions

67 Power density will increase

68 VDD, Power and Current Trend
[Figure: supply voltage (V), power per chip (W), and VDD current (A) trends, 1998–2014]
International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with the European Electronic Component Association (EECA), the Electronic Industries Association of Japan (EIAJ), the Korea Semiconductor Industry Association (KSIA), and the Taiwan Semiconductor Industry Association (TSIA) (* taken from Sakurai’s ISSCC 2001 presentation)

69 Power Delivery Problem (not just California)
Your car starter !

70 Power Consumption New Dimension in Design

71 Sources of Power Consumption
The three major sources of power consumption in digital CMOS circuits are P = P1 + P2 + P3 (+ P4), where:
P1 – capacitive switching power (dynamic – dominant)
P2 – short-circuit power (dynamic)
P3 – leakage current power (static)
P4 – static power dissipation (minor)

72 Research Efforts in Low-Power Design
Reduce the active load: minimize the circuits; use more efficient design; charge recycling; more efficient layout.
Technology scaling: the highest win; thresholds should scale; leakage starts to bite; dynamic voltage scaling.
Reduce switching activity: conditional clock; conditional precharge; switching off inactive blocks; conditional execution.
Run it slower: use parallelism; fewer pipeline stages; use double-edge flip-flops.
Psw = pt · CL · Vdd² · fCLK

73 Reducing the Power Dissipation
The power dissipation can be minimized by reducing: supply voltage, load capacitance, and switching activity. Reducing the supply voltage brings a quadratic improvement. Reducing the load capacitance contributes to the improvement of both power dissipation and circuit speed.
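The quadratic effect of supply voltage follows directly from the switching-power formula Psw = pt · CL · Vdd² · fCLK quoted earlier in this section; the component values below are illustrative, not figures from the presentation:

```python
# Dynamic (capacitive switching) power versus supply voltage.
# Formula from the slides: Psw = pt * CL * Vdd^2 * fclk.
# The parameter values are illustrative assumptions.
def switching_power(pt, cl, vdd, fclk):
    """Capacitive switching power in watts."""
    return pt * cl * vdd ** 2 * fclk

pt = 0.2       # activity factor: fraction of capacitance switched per cycle
cl = 50e-12    # switched load capacitance, farads
fclk = 1e9     # clock frequency, Hz

p_25 = switching_power(pt, cl, 2.5, fclk)
p_18 = switching_power(pt, cl, 1.8, fclk)

print(f"Psw at 2.5 V: {p_25 * 1e3:.1f} mW")
print(f"Psw at 1.8 V: {p_18 * 1e3:.1f} mW ({p_18 / p_25:.0%} of the 2.5 V power)")
```

Dropping Vdd from 2.5 V to 1.8 V cuts switching power to (1.8/2.5)² ≈ 52% of its original value, which is why voltage scaling is the most effective lever.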

74 Amount of Reducing the Power Dissipation

75 Gate Delay and Power Dissipation in Term of Supply Voltage

76 Needs for Low-Power Efficient methodologies and technologies for the design of high-throughput and low-power digital systems are needed. The main interest of many researchers is now oriented towards lowering the energy dissipation of these systems while still maintaining high throughput in real-time processing.

77 Low-Power Design Techniques
The basic idea is: decreasing the activity of some parts within a VLSI IC. The term power management refers to such techniques in general. Applying power management to a design typically involves two steps: identifying idle or low-activity conditions for various parts of the circuit; and redesigning the circuits in order to eliminate or decrease switching activity in idle or low-activity components.

78 General Approaches to Reduce Power
Reduction in fCLK is an option acceptable when some components may be idle or low-active during operation. Reduction in Vdd is the most effective way to reduce power, since the power is proportional to the square of Vdd; the problem with reducing Vdd is that it leads to an increase in circuit delay. The product pt·CL is called the average switched capacitance per cycle, and the main directions for reducing this capacitance are at the system, architectural, RTL, circuit, or technology level.

79 Low Power and Low Energy System Design
The design of low power circuits can be tackled at different levels, from system to technology

80 Multiple Frequency on the Chip as Technique to Reduce Power
Using multiple clock frequencies on the chip is a less aggressive approach that attracts more attention. This technique is commonly used in VLSI ICs in order to reduce the power dissipation while maintaining the operating speed

81 Energy Minimization Using Multiple Frequency
PLL based DLL based

82 Clock Gating & Clock Distribution as Techniques to Reduce Power
- The use of a gated clock is the most common approach to reducing energy. Unused modules are turned off by suppressing the clock to the module

83 Energy Minimization Using Multiple Supply Voltages
Multiple supply voltages on the chip, as a less aggressive approach, are attracting attention. This has the advantage of allowing modules on the critical paths to use the highest voltage level (thus meeting the required timing constraints) while allowing modules on noncritical paths to use lower voltages (thus reducing the energy consumption). This scheme tends to result in smaller area overhead compared to parallel architectures

84 System Level Dynamic Power Management as another Techniques to Reduce Power
Dynamic power management is a design methodology that dynamically reconfigures an electronic system to provide the requested services and performance levels with a minimum number of active components, or a minimum load on such components. [Figure: power state machine and power manager]

85 Dynamic instruction statistics
[Figures: power breakdown in a high-performance CPU; dynamic instruction statistics]

86 Architecture Trade-offs – Reference Datapath

87 Parallel Datapath

88 The More Parallel the Better

89 Pipeline Datapath

90 Architecture Summary for a Simple

91 Outline
Technology Trends
Process Technology Challenges – Low Power Design
Microprocessors’ Generations
Challenges in Education

92 Outline Microprocessors’ Generations
First Generation: Behind the power curve
Second Generation: Becoming “real” computers
Third Generation: Challenging the “establishment”
Fourth Generation: 1990– Architectural and performance leadership

93 The microprocessor today
When we say “microprocessor” today, we generally mean the shaded area of the figure

94 The First Generation: 1971-78
Getting enough bits and transistors
Transistor counts < 50,000
Performance < 0.5 MIPS
Architecture: 8–16 bits
Narrow datapaths (= slow performance)
Awkward architectures
Assembly language + some BASIC
Processors: Intel 4004, 8008, 8080, 8086; Zilog Z-80; Motorola 6800, 6502

95 Intel 4004 First general-purpose, single-chip microprocessor
Shipped in 1971
8-bit architecture, 4-bit implementation
2,300 transistors
Performance < 0.1 MIPS
8008: 8-bit implementation in 1972
3,500 transistors
First microprocessor-based computer (Micral)
Targeted at laboratory instrumentation
Mostly sold in Europe

96 Intel 8080
Delivered in 1974; 8-bit architecture
4,800 transistors
Performance < 0.2 MIPS
Used in the Altair 8800 system
Kit form (advertised in Popular Electronics) in 1975 – $297, or $395 with case!
256 bytes of memory, expandable to 64K!
Keyboard and floppy
Its 100-line bus becomes the S-100, the first microcomputer bus
Gates & Allen write BASIC
Wozniak builds a computer for the Homebrew Computer Club

97 Intel 8086 Introduced in 1978
New 16-bit architecture
Performance < 0.5 MIPS
“Assembly language” compatible with 8080
29,000 transistors
Includes memory protection, support for FP coprocessor
In 1981, IBM introduces the PC, based on the 8-bit-bus version of the 8086 (the 8088)

98 Second Generation: 1979-85 Becoming “real” computers
First 32-bit architecture (68000)
First virtual memory support
Workstations, Macs, and PCs based on microprocessors
Transistors > 50,000
Performance <= 1 MIPS
Processors: Motorola 68000, 68020; Intel 80286, 80386

99 Motorola 68000 Major architectural step in microprocessors:
First 32-bit architecture (initial 16-bit implementation)
First flat 32-bit address
Support for paging
General-purpose register architecture
Loosely based on the PDP-11
First implementation in 1979: 68,000 transistors, < 1 MIPS
Used in the Apple Mac and in Sun, Silicon Graphics, & Apollo workstations

100 Third Generation: 1985-89 Challenging the “establishment”
Microprocessors surpass minicomputers in performance, rival mainframes
Implementation technology of choice: all new architectures are microprocessors
RISC architecture techniques take hold
Transistors < 500K
Performance > 5 MIPS
Processors: MIPS R2000, R3000; Sun SPARC; HP PA-RISC

101 MIPS R2000 Several firsts:
First RISC microprocessor
First microprocessor to provide integrated support for instruction & data caches
First pipelined microprocessor (sustains 1 instruction/clock)
Implemented in 1985: 125,000 transistors, 5–8 MIPS

102 Fourth Generation: 1990- Architectural and performance leadership
First 64-bit architecture
First multiple-issue machine
First multilevel caches
Transistors > 1M
Clock rates > 100 MHz
Performance > 50 MIPS
Processors: Intel i860, Pentium; MIPS R4000, R10000; DEC Alpha; Sun UltraSPARC; HP PA-RISC; PowerPC
Generation 4.5: same basic approach, but faster clock rates & wider issue – Alpha 21264, Pentium III & 4, Intel Itanium

103 Key Architectural Trends
Increase performance at 1.6x per year – true from 1985 to the present
Combination of technology and architectural enhancements
Technology provides faster transistors and more of them; faster transistors lead to high clock rates
More transistors: architectural ideas turn transistors into performance, responsible for about half the yearly performance growth
Two key architectural directions: sophisticated memory hierarchies, and exploiting instruction-level parallelism

104 Memory Hierarchies Caches: hide latency of DRAM and increase BW
CPU-DRAM access gap has grown by a factor of 30–50!
Trend 1: Increasingly large caches
On-chip: from 128 bytes (1984) to 100K+ bytes
Multilevel caches: add another level of caching; first multilevel cache: 1986
Secondary cache sizes today: 128 KB to 4–16 MB
Trend 2: Advances in caching techniques to reduce or hide cache miss latencies
early restart after cache miss (1992)
nonblocking caches: continue during a cache miss (1994)
prefetching: instructions to bring data into the cache early
Cache-aware combos: computers, compilers, code writers

105 Exploiting ILP ILP is the implicit parallelism among instructions
Exploited by overlapping execution in a pipeline, and by issuing multiple instructions per clock
superscalar: uses dynamic issue decisions (HW driven)
VLIW: uses static issue decisions (SW driven)
1985: simple microprocessor pipeline (1 instr/clock)
1990: first static multiple-issue microprocessors
1995: sophisticated dynamic schemes – determine parallelism dynamically, execute instructions out-of-order, speculative execution depending on branch prediction
“Off-the-shelf” ILP techniques yielded a 20-year path.

106 MIPS R4000 First 64-bit architecture
Integrated caches: on-chip, with support for an off-chip secondary cache
Integrated floating point
Implemented in 1991: deep pipeline, 1.4M transistors, initially 100 MHz, > 50 MIPS

107 Intel i860 First multiple-issue microprocessor:
2 instructions/clock
Dual-issue mode
Novel push pipeline
Novel cache bypass
Implemented in 1991: 1.3M transistors, 50 MIPS
Used primarily as an attached processor (e.g., graphics)

108 MIPS R10000 First speculative processor
Instructions scheduled and executed out-of-order
Up to 4 instructions can complete per clock
Window of 32 instructions (up to 32 in-flight)
Maintains precise state by completing instructions in order
Implemented in 1996: 6.8M transistors, 200 MHz

109 Intel IA-64 and Itanium EPIC architecture:
Uses a compiler-centric approach while avoiding its disadvantages
Parallelism demarcated by the compiler
Many special instructions & features for exploiting ILP in the compiler
Itanium: first implementation (2001) – 25M transistors, 800 MHz, 130 watts

110 Breakdown of tasks between compiler and runtime hardware

111 Today’s Uniprocessor ILP Menu
Wide variety of approaches, both hardware- and compiler-intensive – no clear-cut winners at present!
Hardware techniques: dynamic scheduling; dynamic issue (i.e., superscalar); dynamic branch prediction; dynamic disambiguation; dynamic speculation – more stable performance, but higher complexity and potential clock rate impact
Software techniques: static scheduling; static issue (i.e., VLIW); static branch prediction; alias/pointer analysis; static speculation – lower hardware complexity; more, longer-range analysis; more machine dependence

112 Big Picture--ILP and Memory Systems
My view: no performance wall, but steeper slopes ahead; the easier territory is behind us; the industry-research gap has vanished; energy efficiency may be the key limit.
[Figure: an “ILP mountain” (simple pipelining → scheduled pipelines → multiple issue → dynamic scheduling → speculation) and a “cache mountain” (simple caches → multilevel caches & buffers → critical word & early restart → compiler prefetching → multipath prefetching)]

113 Microprocessors today – where they are, and what they can do

114 Microprocessors – where they go

115 Intel – More Transistors

116 Intel – Faster Devices

117 Number of Transistors in Intel’s processors

118 Higher level parallelism
Several approaches have been proposed to go beyond optimizing single-thread performance (latency) and to exploit higher performance (throughput) at better energy efficiency. The most pronounced are: the simultaneous multithreaded (SMT) processor, and chip multiprocessors (CMP)

119 Multithreading A microprocessor can execute multiple operations at a time – 4 or 6 operations per cycle. It is hard to achieve this level of parallelism from a single program. Can we run multiple programs (threads) on a (single) processor without much effort? Simultaneous multithreading (SMT), or Hyperthreading, is a solution

120 Parallel Thread Sequencing Model

121 Principles of SMT

122 Multithreading in today’s processors
Today many high-end microprocessors are multithreaded (e.g., Intel Pentium 4). They support 2–4 threads, but expect to get only a 1.3x improvement in throughput

123 Chip Multiprocessor Several processor cores in one die
Shared L2 caches
Chip communication to build multichip modules with many CMPs + memory

124 Chip multiprocessor (CMP) platform model
CMP is a simple but very powerful technique to obtain more performance in a power-efficient manner. The idea is to put several microprocessors on a single die. This type of architecture is also referred to as a Multiprocessor System-on-Chip (MPSoC). The performance of a small-scale CMP scales close to linearly with the number of microprocessors and is likely to exceed the performance of an equivalent multiprocessor system

125 Chip multiprocessor (CMP) platform model - continue
CMP is an attractive option to use when moving to a new process technology, such as SoC. Typical MPSoC applications are found in network processors, multimedia hubs, signal processors, etc. MPSoCs are usually implemented as heterogeneous systems. CMP and SMT can coexist – a CMP die can integrate several SMT microprocessors

126 Generic circa 2010 Microprocessor
4–8 general-purpose processing engines on chip, used to execute independent programs
Explicitly parallel programs (when possible)
Speculatively parallel threads
Special-purpose processing units (e.g., DSP functionality)
Elaborate memory hierarchy
Elaborate inter-chip communication facilities

127 Characteristics of superscalar, simultaneous multithreading, and chip multiprocessor architectures

128 The microprocessor tomorrow
When we say “microprocessor” tomorrow, we generally mean the shaded area of the figure

129 Outline
Technology Trends
Process Technology Challenges – Low Power Design
Microprocessors’ Generations
Challenges in Education

130 Outline – Challenges in Education
Changes in curricula
Fundamentals
A sort of challenge we should accept

131 Challenges in Education
It has often been said that: Where you stand depends on where you sit. In this context, starting from our positions and experiences, this is our view concerning the theme: How shall we satisfy the long-term educational needs of engineers?

132 How to organize the training of new engineers?
The engineers we are training today will still be practicing 40 years from now. Are we preparing them for what they will be doing then? Is the whole system of engineering education – not just the undergraduate curriculum – organized to support today’s graduates for the next 40 years? We think not, on both counts

133 Our view & our experience
Our view is that the practice of engineering is rapidly changing, and that engineering education is not keeping up. Our experiences are primarily in information technology (both in academia and industry), which, admittedly, has changed more rapidly than some other fields.

134 Changes in curricula It is almost a cliché to talk about change – so much so that a passing reference to it becomes a substitute for serious thought about its implications. But the fact is that the practice of engineering is changing at about the same pace as the technology it creates

135 What are fundamentals The undergraduate curriculum should teach (only) fundamentals. Everyone agrees with that. But what are the fundamentals? Since the adoption of the engineering science model, the fundamentals have been largely continuous mathematics and physics. But, as we said earlier, engineering is changing

136 What kinds of fundamentals we need now – some examples
Information technology (IT) will be embedded in virtually every engineered product and process in the future – i.e., the design space for all engineers will include IT. Discrete mathematics, not continuous math, is the underpinning of IT; it is a new fundamental. Biological materials and processes are a bit behind IT in their impact on engineering, but they are closing fast. Thus the chemical and biological sciences are also becoming fundamental to engineering

137 Kinds of Fundamentals Engineering systems are increasingly complex, and increasingly contain components from across the spectrum of traditional engineering fields. More knowledge of the full spectrum will be fundamental. Engineering is global, and is performed in a holistic business context. The engineer must design under constraints that include global cultural and business contexts, and so must understand them. These two are new fundamentals.

138 How to add these new fundamentals
The challenge is that we cannot just add these new fundamentals to a curriculum that is already too full. We have to look critically at the currently cherished fundamentals and either displace them or find ways to cover them much more rapidly.

139 What will the character and essence of electrical and computer engineering education look like in the future ? It is difficult to predict the future with any accuracy, but it is safe to say that: Web-based teaching, distance learning, electronic books, and interactive learning environments will play increasingly significant roles in shaping what we teach, how we teach, and how students learn.

140 A sort of challenge we should accept
During one visit to our faculty, Prof. Krishna Shenai from the University of Illinois at Chicago, director of the Micro Systems Research Center, said to us that he has never seen a process that cannot be sped up by a factor of two and improved in quality at the same time. That is the sort of challenge we should accept for improving engineering education.

