
Slide 1: CIS 629 Parallel Computer Architecture
Slides blended from those of David Patterson (CS 252) and David Culler (CS 258), UC Berkeley

Slide 2: Definition: Parallel Computer
“A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.” (Almasi and Gottlieb, Highly Parallel Computing, 1989)
Role of a computer architect: to design and engineer the various levels of a computer system so as to maximize performance and programmability within the limits of technology and cost.

Slide 3: Parallel Architecture Design Issues
– How large a collection of processors?
– How powerful are the processing elements?
– How do they cooperate and communicate?
– How are data transmitted between processors?
– Where to put the memory and I/O?
– What type of interconnection?
– What are the HW and SW primitives for the programmer?
– Does it translate into performance?

Slide 4: Is Parallel Computing Inevitable?
Application demands: our insatiable need for computing cycles
Technology trends
Architecture trends
Economics
Current trends:
– Today’s microprocessors have multiprocessor support
– Servers and workstations are becoming MPs: Sun, SGI, DEC, Compaq, ...
– Tomorrow’s microprocessors are multiprocessors

Slide 5: Whither Parallel Machines?
1997, the 500 fastest machines in the world: 319 MPPs, 73 bus-based shared memory (SMP), 106 parallel vector processors (PVP)
2000, 381 of the 500 fastest: 144 IBM SP (~cluster), 121 Sun (bus SMP), 62 SGI (NUMA SMP), 54 Cray (NUMA SMP)

Slide 6: Commercial Computing
Relies on parallelism at the high end
– Computational power determines the scale of business that can be handled
Databases, online transaction processing, decision support, data mining, data warehousing

Slide 7: Scientific Computing Demand

Slide 8: Engineering Computing Demand
Large parallel machines are a mainstay in many industries:
– Petroleum (reservoir analysis)
– Automotive (crash simulation, drag analysis, combustion efficiency)
– Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
– Computer-aided design
– Pharmaceuticals (molecular modeling)
– Visualization
» in all of the above
» entertainment (films like Toy Story)
» architecture (walk-throughs and rendering)
– Financial modeling (yield and derivative analysis)
– etc.

Slide 9: Applications: Speech and Image Processing
Also CAD, databases, …

Slide 10: Summary of Application Trends
The transition to parallel computing has occurred for scientific and engineering computing
Rapid progress is under way in commercial computing
– Databases and transaction processing, as well as financial applications
– Usually smaller scale, but large-scale systems are also used
Desktops also run multithreaded programs, which are a lot like parallel programs
Demand for improving throughput on sequential workloads
Solid application demand exists and will increase

Slide 11: Technology Trends
What does this picture tell us?

Slide 12: How far will ILP go?
Infinite resources and fetch bandwidth, perfect branch prediction and renaming
– but with real caches and non-zero miss latencies

Slide 13: What about Multiprocessor Trends?

Slide 14: Economics
Commodity microprocessors are not only fast but CHEAP
– Development costs run to tens of millions of dollars
– BUT many more are sold than supercomputers
– Crucial to take advantage of that investment and use the commodity building block
Multiprocessors are being pushed by software vendors (e.g., database) as well as hardware vendors
Standardization makes small, bus-based SMPs a commodity
Desktop: a few smaller processors versus one larger one?
A multiprocessor on a chip?

Slide 15: Scientific Supercomputing
Proving ground and driver for innovative architecture and techniques
– Market smaller relative to commercial as MPs become mainstream
– Dominated by vector machines starting in the 70s
– Microprocessors have made huge gains in floating-point performance:
» high clock rates
» pipelined floating-point units (e.g., multiply-add every cycle)
» instruction-level parallelism
» effective use of caches (e.g., automatic blocking; see the sketch below)
– Plus economics
Large-scale multiprocessors are replacing vector supercomputers
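
The “automatic blocking” bullet refers to tiling loops so that data is reused while it is still in cache, whether done by the compiler or by hand. Below is a minimal hand-written sketch of the idea in C; the tile size B and the function name are illustrative choices, not anything specified on the slide.

```c
/* Cache blocking (tiling) sketch: C += A * Bm on n-by-n row-major matrices.
 * Each BxB tile pair is fully processed before moving on, so operands are
 * reused from cache instead of being re-fetched from memory. */
#include <stddef.h>

#define B 64  /* tile edge; a tuning parameter, chosen so tiles fit in cache */

void matmul_blocked(size_t n, const double *A, const double *Bm, double *C)
{
    for (size_t ii = 0; ii < n; ii += B)
        for (size_t kk = 0; kk < n; kk += B)
            for (size_t jj = 0; jj < n; jj += B)
                /* multiply one tile pair; && i < n handles n not divisible by B */
                for (size_t i = ii; i < ii + B && i < n; i++)
                    for (size_t k = kk; k < kk + B && k < n; k++) {
                        double a = A[i*n + k];
                        for (size_t j = jj; j < jj + B && j < n; j++)
                            C[i*n + j] += a * Bm[k*n + j];
                    }
}
```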

Slide 16: Raw Uniprocessor Performance: LINPACK

Slide 17: Raw Parallel Performance: LINPACK
[chart comparing SIMD and MIMD machines]

Slide 18: Flynn Taxonomy of Parallel Architectures
SISD (Single Instruction, Single Data)
– Uniprocessors
MISD (Multiple Instruction, Single Data)
– none
SIMD (Single Instruction, Multiple Data)
– Vector processors, data-parallel machines (see the sketch below)
– Examples: Illiac-IV, CM-2
MIMD (Multiple Instruction, Multiple Data)
– Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
» Flexible
» Use off-the-shelf micros
MIMD is the current winner: <= 128 processors
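
As a small illustration of the SIMD category on current hardware, here is a hedged sketch using x86 AVX intrinsics (assumptions: an AVX-capable CPU and a compiler flag such as -mavx; none of this is from the slide). The classic machines named above spread one instruction across many processing elements, but the single-instruction-multiple-data principle is the same.

```c
/* SIMD in miniature: one vector instruction adds 8 floats at a time. */
#include <immintrin.h>

void add_arrays(const float *a, const float *b, float *c, int n)
{
    int i = 0;
    for (; i + 8 <= n; i += 8) {               /* 8 elements per instruction */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                         /* scalar tail for leftovers */
        c[i] = a[i] + b[i];
}
```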

Slide 19: Major MIMD Styles
1. Centralized shared memory (“Uniform Memory Access” time, or “Shared Memory Processor”); a minimal sketch of this style follows below
2. Decentralized memory (a memory module paired with each CPU)
– gets more memory bandwidth and lower memory latency
– Drawback: longer communication latency
– Drawback: a more complex software model
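
A minimal sketch of the centralized shared-memory style using POSIX threads: the processors (threads) communicate through ordinary loads and stores to a shared variable, guarded by a lock. The workload and names here are illustrative, not from the slide.

```c
/* Shared-address-space style: threads cooperate via shared memory. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double sum = 0.0;                          /* shared accumulator */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    double local = 0.0;
    for (long i = id; i < N; i += NTHREADS)       /* each thread takes a slice */
        local += 1.0 / (double)(i + 1);
    pthread_mutex_lock(&lock);                    /* communicate via shared memory */
    sum += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %f\n", sum);
    return 0;
}
```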

Slide 20: Decentralized Memory Versions
1. Shared memory with “Non-Uniform Memory Access” time (NUMA)
2. Message-passing “multicomputer” with a separate address space per processor
– Can invoke software with Remote Procedure Call (RPC)
– Often via a library, such as MPI (Message Passing Interface); see the sketch below
– Also called “synchronous communication,” since the communication causes synchronization between the two processes
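
A minimal sketch of the message-passing style using MPI, the library named on the slide. Each process has its own private address space; rank 1 explicitly sends a value that rank 0 explicitly receives, and the blocking send/receive pair synchronizes the two processes. Compile with mpicc and launch with mpirun.

```c
/* Message passing: no shared memory, communication by explicit messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 1) {
            double x = 3.14;
            MPI_Send(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);  /* explicit send */
        } else if (rank == 0) {
            double x;
            MPI_Recv(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                         /* blocking receive */
            printf("rank 0 received %f from rank 1\n", x);
        }
    }
    MPI_Finalize();
    return 0;
}
```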

Slide 21: Speedup
Speedup(p processors) = Performance(p processors) / Performance(1 processor)
For a fixed problem size (input data set), performance = 1/time, so:
Speedup_fixed(p processors) = Time(1 processor) / Time(p processors)
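
A small worked example of the fixed-problem-size definition above; the timings are hypothetical numbers chosen for illustration, not measurements from the course.

```c
/* Speedup and efficiency from measured wall-clock times. */
#include <stdio.h>

int main(void)
{
    double t1 = 100.0;   /* seconds on 1 processor (hypothetical) */
    double tp = 16.0;    /* seconds on p processors (hypothetical) */
    int    p  = 8;
    double speedup    = t1 / tp;       /* Time(1) / Time(p) = 6.25 */
    double efficiency = speedup / p;   /* 0.78, i.e. 78% of linear speedup */
    printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
    return 0;
}
```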

Slide 22: Speedup: What’s Happening?
Ideally, linear speedup
In reality, communication overhead reduces speedup
Surprisingly, super-linear speedup is achievable (e.g., when each processor’s share of the data fits in its cache)

Slide 23: Amdahl’s Law
The most fundamental limitation on parallel speedup: if a fraction s of sequential execution is inherently serial, speedup <= 1/s
Example: a 2-phase calculation
– sweep over an n-by-n grid and do some independent computation
– sweep again and add each value into a global sum
Time for the first phase = n²/p
The second phase is serialized at the global variable, so its time = n²
Speedup <= 2n² / (n²/p + n²), or at most 2
Trick: divide the second phase into two steps
– accumulate into a private sum during the sweep
– add each per-process private sum into the global sum
Parallel time is n²/p + n²/p + p, and speedup is at best 2n² / (2n²/p + p)
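
A minimal sketch of the slide’s trick in C. OpenMP is my choice of vehicle here, not something the slide specifies: the reduction clause implements exactly the private-sum idea, accumulating into per-thread copies during the sweep and combining the p partial sums once at the end.

```c
/* Two-phase grid calculation with the private-sum trick via OpenMP. */
#include <omp.h>

double two_phase(int n, double grid[n][n])
{
    double sum = 0.0;

    /* Phase 1: independent computation over the grid; time ~ n^2/p */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            grid[i][j] = grid[i][j] * grid[i][j];

    /* Phase 2 with the trick: each thread sums into a private copy of sum;
     * the p private sums are combined serially at the end (cost ~ p, not n^2) */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += grid[i][j];

    return sum;
}
```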

Slide 24: Amdahl’s Law (illustration)
[Figure: work done concurrently vs. time for (a) the serial execution, (b) the naive parallel version (n²/p followed by a serial n² phase), and (c) the version with private sums (n²/p + n²/p + p)]

Slide 25: Concurrency Profiles
– Area under the curve is the total work done, or the time with 1 processor
– Horizontal extent is a lower bound on time (infinite processors)
– Speedup is the ratio: Speedup(p) = (Σ_{k=1}^∞ f_k · k) / (Σ_{k=1}^∞ f_k · ⌈k/p⌉); base case: 1 / (s + (1−s)/p)
– Amdahl’s law applies to any overhead, not just limited concurrency
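
A small evaluator for the formula above, as a sketch. Here f[k] is read as the time spent with degree of concurrency k when processors are unbounded (consistent with the area and extent bullets); the function and the example numbers are illustrative, not from the slides.

```c
/* Speedup bound on p processors from a concurrency profile f[1..kmax]. */
#include <math.h>
#include <stdio.h>

double profile_speedup(int kmax, const double f[], int p)
{
    double work = 0.0, time_p = 0.0;
    for (int k = 1; k <= kmax; k++) {
        work   += f[k] * k;                    /* Time(1): all work serialized  */
        time_p += f[k] * ceil((double)k / p);  /* Time(p): k-wide step in ceil(k/p) */
    }
    return work / time_p;
}

int main(void)
{
    /* Amdahl base case as a profile: 20% of the work serial (k = 1), the
     * rest at concurrency 100. Hypothetical numbers for illustration. */
    double f[101] = {0};
    f[1] = 0.2;             /* time at k = 1: work = 0.2       */
    f[100] = 0.8 / 100;     /* time at k = 100: work = 0.8     */
    /* Prints 3.57, matching 1/(s + (1-s)/p) with s = 0.2, p = 10 */
    printf("bound on 10 procs: %.2f\n", profile_speedup(100, f, 10));
    return 0;
}
```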

Slide 26: Communication Performance Metrics: Latency and Bandwidth
1. Bandwidth
– Need high bandwidth in communication
– Must match limits in the network, memory, and processor
– The challenge is the link speed of the network interface vs. the bisection bandwidth of the network
2. Latency
– Affects performance, since the processor may have to wait
– Affects ease of programming, since it requires more thought to overlap communication and computation
– Overhead to communicate is a problem in many machines
3. Latency hiding
– How can a mechanism help hide latency?
– Increases the programming system burden
– Examples: overlap a message send with computation (see the sketch below), prefetch data, switch to other tasks
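
A minimal sketch of the first latency-hiding example on the slide: overlapping a message send with computation, here using MPI’s nonblocking send. The compute_something() function is a hypothetical stand-in for whatever useful work is available while the message is in flight.

```c
/* Latency hiding: start the send, do useful work, then wait for completion. */
#include <mpi.h>

extern void compute_something(void);   /* hypothetical overlapped work */

void send_overlapped(const double *buf, int count, int dest)
{
    MPI_Request req;
    MPI_Isend(buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req); /* start send */
    compute_something();               /* computation overlaps the communication */
    MPI_Wait(&req, MPI_STATUS_IGNORE); /* buf may be reused only after this */
}
```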

