
1 Computer Architecture Lecture 24: Parallel Processing
Ralph Grishman, November 2015, NYU

2 Faster: Final Chapter
Strategies for faster processors:
– Instruction-level parallelism: general applicability but limited gain
– SIMD: Single Instruction, Multiple Data
– MIMD: Multiple Instruction, Multiple Data
11/30/15, Computer Architecture lecture 24

3 SIMD
– Multimedia extensions for the x86 architecture: 4- or 8-way parallel arithmetic
– Vector arithmetic: multiple fast pipelines; specialized processors
– GPUs (Graphics Processing Units): co-processors for the CPU with hundreds to thousands of arithmetic units

4 GPU
High-quality rendering is very compute intensive:
– an image may be realized from 1M triangles
– computing the pixels may require 4 × 10^9 cycles
– this led to specialized graphics 'cards' for rendering, with a hardwired sequence of stages
Transition in the early 2000s to a more general design:
– large arrays of processors
– fostered experiments in wider use of GPUs

5 GPU
Several levels of parallelism:
– the basic unit is a streaming processor (SP), also called a CUDA core: scalar integer and floating-point arithmetic; large register file
– 128 streaming processors form a streaming multiprocessor (SM): acts as SIMD through hardware multithreading (SIMT = single instruction, multiple thread)
– 16 SMs form a GPU: acts as an MIMD processor composed of SIMD processors

6 GPU structure
[figure: diagram of the GPU structure, not included in the transcript]

7 CUDA
NVIDIA developed software to execute GPU programs written in C (CUDA = Compute Unified Device Architecture)
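A minimal sketch of what such a CUDA C program looks like (a hypothetical vector-add kernel, not an example from the lecture): each thread computes one array element, locating its work from its block and thread indices, which is the SIMT model described on the previous slide.

```cuda
// Hypothetical CUDA C vector-add sketch (illustrative; names and sizes
// are arbitrary choices, not from the lecture). Compile with nvcc.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];     // each thread handles one element
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory: visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;                        // threads per block
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    vadd<<<blocks, threads>>>(a, b, c, n);    // launch kernel on the GPU
    cudaDeviceSynchronize();                  // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the CUDA extension to C: the runtime schedules the blocks across the streaming multiprocessors, and each SM runs its threads in SIMT fashion.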

8 GPU
Aimed at high-throughput, latency-tolerant tasks:
– multithreading hides the latency of main memory, reducing the need for a large, multilevel cache
Very high throughput for suitable tasks: multiple teraflops are possible

9 MIMD
Provided through (any combination of):
– multithreading
– multicore chips
– clusters
Processes communicate through:
– message passing
– shared memory: UMA (uniform memory access) or NUMA (non-uniform memory access)

