Parallel Programming (并行程序设计). Pingpeng Yuan.

1 PARALLEL PROGRAMMING (并行程序设计) Pingpeng Yuan

2 PARALLEL PROGRAMMING
- What
- Why
- How
- Goal
- Exam

3 What is Parallel Programming? Coordinating multiple processing elements to solve a problem.

4 PARALLELISM - A SIMPLISTIC UNDERSTANDING
- Multiple tasks at once.
- Distribute work into multiple execution units.
- Two approaches:
  - Data parallelism: partition the data into blocks; each compute unit then processes its own block.
  - Functional (control) parallelism: divide the problem into different tasks; each processing unit then handles its own task.
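
The two approaches on this slide can be illustrated with a minimal Python sketch. It is only an illustration, not part of the original slides: the helper names (`chunked`, `data_parallel_sum_of_squares`, `task_parallel_stats`) are hypothetical, and a thread pool is assumed purely for simplicity. In CPython, threads illustrate the structure of the two approaches rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into n_chunks contiguous blocks of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    blocks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def data_parallel_sum_of_squares(data, n_workers=4):
    """Data parallelism: the SAME operation runs on DIFFERENT blocks of data."""
    def block_sum(block):
        return sum(x * x for x in block)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = pool.map(block_sum, chunked(data, n_workers))
        return sum(partial_sums)  # combine the per-block results


def task_parallel_stats(data):
    """Functional parallelism: DIFFERENT tasks run concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_min = pool.submit(min, data)
        f_max = pool.submit(max, data)
        f_sum = pool.submit(sum, data)
        return f_min.result(), f_max.result(), f_sum.result()


if __name__ == "__main__":
    numbers = list(range(10))
    print(data_parallel_sum_of_squares(numbers))
    print(task_parallel_stats(numbers))
```

In the data-parallel function every worker executes the same code on its own block; in the task-parallel function each worker executes a different function. Real programs often mix both styles.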

5 WHY
- Technology trend
- Application needs

6 HUMAN ARCHITECTURE! GROWTH PERFORMANCE
[Figure: growth versus age; labels: Vertical, Horizontal]

7 COMPUTATIONAL POWER IMPROVEMENT
[Figure: computational power improvement (C.P.I) versus number of processors; curves: Multiprocessor, Uniprocessor]

8 GENERAL TECHNOLOGY TRENDS
- Microprocessor performance increases 50% - 100% per year
- Clock frequency doubles every 3 years
- Transistor count quadruples every 3 years

9 CLOCK FREQUENCY GROWTH RATE (INTEL FAMILY): 30% per year

10 INTEL MANY INTEGRATED CORE (MIC) 32 core version of MIC:

11 TILERA’S 100 CORES (JUNE 2011)
- Tilera has introduced a range of processors (64-bit Gx family: 36 cores, 64 cores and 100 cores), aiming to take on Intel in servers that handle high-throughput web applications
- 64-bit cores running up to 1.5GHz
- Manufactured in 40nm technology

12 TOP500 …. Paradigm Change in HPC

13 GPU ARCHITECTURE NVIDIA Fermi, 512 Processing Elements (PEs)

14 THE GAP BETWEEN CPU AND GPU ref: Tesla GPU Computing Brochure

15 GPU WILL TOP THE LIST IN NOV 2010

16 TRANSISTOR COUNT GROWTH RATE (INTEL FAMILY)
Transistor count grows much faster than clock rate: 40% per year, order of magnitude more contribution in 2 decades

17 HOW TO USE MORE TRANSISTORS
- Improve single threaded performance via architecture:
  - Not keeping up with potential given by technology
- Use transistors for memory structures to improve data locality
- Use parallelism
  - Instruction-level
  - Thread level

18 SIMILAR STORY FOR STORAGE (TRANSISTOR COUNT)

19 TRENDS IN DRAM CAPABILITIES
- DRAM densities to double every 3 years
  - Projections for DRAM densities revised downwards over time
  - Current densities at 4Gb/die
- DRAM data rates to double every 4-5 years
  - Projections for DRAM data rates revised upwards over time
  - Current data rates at 2.2 Gb/s

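20 SIMILAR STORY FOR STORAGE
- The gap between memory capacity and memory access speed is even more pronounced
  - Since [year missing], memory capacity has grown 1000x, about 50% per year
  - Latency has dropped by only 3% per year (only 2x from [years missing])
  - Memory bandwidth has increased 2x
- Processors get faster and memories get larger, so memory becomes relatively slower
  - More data must be transferred in parallel
  - More levels of cache are needed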

21 MEMORY HIERARCHY
Each level can be viewed as a cache for the level below it.

Level               Capacity    Access time
CPU registers       100 bytes   < 1 ns
L1 cache            32 KB       1 ns
L2 cache            256 KB      4 ns
Primary memory      1 GB        60 ns
Secondary storage   1 TB        10 ms
Tertiary storage    1 PB        1 s - 1 hr
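
To see why each level acting as a cache for the next one matters, here is a small sketch that estimates the average access time of a two-level cache in front of primary memory. The latencies (1 ns, 4 ns, 60 ns) come from the slide; the hit rates are hypothetical values chosen for illustration, and the simple model used here (every access pays the latency of the level it reaches, and a miss falls through to the next level) is one common textbook approximation, not something stated in the slides.

```python
def average_access_time_ns(levels, backing_latency_ns):
    """Estimate average access time for a stack of caches.

    levels: list of (latency_ns, hit_rate) pairs, fastest cache first.
    backing_latency_ns: latency of the level that always hits (e.g. memory).
    """
    time_ns = backing_latency_ns
    # Work from the slowest cache upward: a miss at one level costs
    # the average access time of everything below it.
    for latency_ns, hit_rate in reversed(levels):
        time_ns = latency_ns + (1.0 - hit_rate) * time_ns
    return time_ns


# Latencies taken from the slide.
L1_NS, L2_NS, MEMORY_NS = 1.0, 4.0, 60.0
# Hit rates are ASSUMPTIONS for illustration only.
ASSUMED_L1_HIT, ASSUMED_L2_HIT = 0.95, 0.90

amat = average_access_time_ns(
    [(L1_NS, ASSUMED_L1_HIT), (L2_NS, ASSUMED_L2_HIT)], MEMORY_NS
)

if __name__ == "__main__":
    print(f"average access time: {amat:.2f} ns (memory alone: {MEMORY_NS} ns)")
```

Under these assumed hit rates the average comes out at about 1.5 ns, far closer to the L1 latency than to the 60 ns of primary memory, which is the point of building a hierarchy.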

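22 SIMILAR STORY FOR STORAGE
- Parallelism improves the efficiency of each level without increasing access time
- Parallelism and locality apply within the memory system as well
  - A memory chip fetches many bits at once, then pipelines them over a narrow channel
  - Buffers hold the most recently accessed data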

23 DISK TRENDS
- Disks too: parallel disks plus caching
- Disk capacity, [years missing]:
  - doubled every 3+ years
  - 25% improvement each year
  - factor of 10 every decade
  - Still exponential, but far less rapid than processor performance
- Disk capacity, 1990-recently:
  - doubling every 12 months
  - 100% improvement each year
  - factor of 1000 every decade
  - Capacity growth 10x as fast as processor performance!
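
The growth figures on this slide are three views of the same compound rate. The short sketch below (function names are hypothetical, introduced only for this example) converts between an annual improvement rate, a doubling time, and a per-decade factor, so the slide's round numbers can be checked.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed to double at a given annual growth rate (0.25 means 25%)."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate, years):
    """Total growth factor after compounding annual_rate for the given years."""
    return (1 + annual_rate) ** years


def annual_rate_from_doubling(years_to_double):
    """Annual growth rate implied by a given doubling time."""
    return 2 ** (1 / years_to_double) - 1


if __name__ == "__main__":
    for label, rate in [("25% per year", 0.25), ("100% per year", 1.00)]:
        print(
            f"{label}: doubles every {doubling_time_years(rate):.2f} years, "
            f"x{growth_factor(rate, 10):.0f} per decade"
        )
```

At 25% per year the doubling time is a little over 3 years and the decade factor is a bit under 10, matching "doubled every 3+ years" and "factor of 10 every decade". At 100% per year capacity doubles every 12 months and grows by 2^10 = 1024 per decade, which the slide rounds to 1000.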

24 DISK TRENDS
- Only a few years ago, we purchased disks by the megabyte
- Today, 1 GB (a billion bytes) costs $0.05 from Dell (down from $1, then $0.50)
  - => 1 TB costs $50 (down from $1K, then $500); 1 PB costs $50K (down from $1M, then $500K)
- Technology is amazing
  - Flying a 747 6” above the ground
  - Reading/writing a strip of postage stamps

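25 SUMMARY
- Rapid growth in:
  - Processor speed
  - Storage capacity
  - The gap between bandwidth on one side and latency and clock frequency on the other
- Parallelism is the inevitable trend in the development of computer architecture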

26 COMMODITY COMPUTER SYSTEMS
- 1946 -> 2003: General-purpose computing: serial. 5KHz -> 4GHz.
- General-purpose computing goes parallel. Clock frequency growth flat.
- #Transistors/chip, 1980 -> 2011: 29K -> 30B!
- #"cores": ~d^(y-2003)

27 If you want your program to run significantly faster … you’re going to have to parallelize it

28 DRIVERS OF PARALLEL COMPUTING – APPLICATION NEEDS ref:

29 APPLICATIONS OF PARALLEL PROCESSING

31 WHY DO WE NEED PARALLEL PROCESSING?
Reasonable running time = fraction of an hour to several hours ([value missing] s). In this time, a TIPS/TFLOPS machine can perform [value missing] operations.
- Example 1: Southern oceans heat modeling (10-minute iterations; 4096 E-W regions, 1024 N-S regions, 12 layers in depth): 300 GFLOP per iteration × [value missing] iterations per 6 yrs = [value missing] FLOP
- Example 2: Fluid dynamics calculations (1000 × 1000 × 1000 lattice): 10^9 lattice points × 1000 FLOP/point × [value missing] time steps = [value missing] FLOP
- Example 3: Monte Carlo simulation of nuclear reactor: [value missing] particles to track (for 1000 escapes) × 10^4 FLOP/particle = [value missing] FLOP
- Decentralized supercomputing (from Mathworld News, 2006/4/7): a grid of tens of thousands of networked computers discovers [value missing] − 1, the 43rd Mersenne prime, as the largest known prime ([value missing] digits)
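
The estimates on this slide all follow the same pattern: total work in FLOP divided by machine rate in FLOP/s. The sketch below works through the fluid dynamics example. The lattice size and the 1000 FLOP/point figure come from the slide; the number of time steps did not survive in this transcript, so `ASSUMED_TIME_STEPS` is a hypothetical value chosen only to make the arithmetic concrete, and the function names are likewise illustrative.

```python
TFLOPS = 1e12  # floating-point operations per second for a 1 TFLOPS machine


def total_flop(lattice_points, flop_per_point, time_steps):
    """Total floating-point operations for a stencil-style simulation."""
    return lattice_points * flop_per_point * time_steps


def runtime_seconds(flop, flop_per_second):
    """Ideal running time, ignoring memory and communication costs."""
    return flop / flop_per_second


LATTICE_POINTS = 1000 ** 3   # 1000 x 1000 x 1000 lattice = 10^9 points (slide)
FLOP_PER_POINT = 1000        # from the slide
ASSUMED_TIME_STEPS = 10_000  # ASSUMPTION: not recoverable from the transcript

work = total_flop(LATTICE_POINTS, FLOP_PER_POINT, ASSUMED_TIME_STEPS)
seconds = runtime_seconds(work, TFLOPS)

if __name__ == "__main__":
    print(f"work: {work:.1e} FLOP")
    print(f"time on 1 TFLOPS: {seconds:.0f} s ({seconds / 3600:.1f} h)")
    print(f"time on 1 GFLOPS: {runtime_seconds(work, 1e9) / 86400:.0f} days")
```

Under the assumed step count the job takes a few hours at 1 TFLOPS but months at 1 GFLOPS, which illustrates why workloads of this size need parallel machines to finish in a "reasonable running time".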

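35 THE BIG DATA ERA
- Categories of big data:
  - Internet data
  - Scientific data
  - Multimedia data
  - Industry application data, such as financial data
- According to an IDC report, the global data volume was 2.7 ZB in 2012 and is projected to reach 35 ZB by 2020.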

36 WHAT MAKES IT BIG DATA?
VOLUME, VELOCITY, VARIETY, VALUE
[Figure labels: Social, Blog, Smart meter]

37 NUMBERS
- How much data is there in the world?
  - 800 Terabytes, 2000
  - 160 Exabytes, 2006
  - 500 Exabytes (Internet), 2009
  - 2.7 Zettabytes, 2012
  - 35 Zettabytes by 2020
- How much data is generated in ONE day?
  - 7 TB, Twitter
  - 10 TB, Facebook
Source: Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011

38 BIG DATA USE CASES

Sector                  | Today’s Challenge           | New Data                  | What’s Possible
Healthcare              | Expensive office visits     | Remote patient monitoring | Preventive care, reduced hospitalization
Manufacturing           | In-person support           | Product sensors           | Automated diagnosis, support
Location-Based Services | Based on home zip code      | Real time location data   | Geo-advertising, traffic, local search
Public Sector           | Standardized services       | Citizen surveys           | Tailored services, cost reductions
Retail                  | One size fits all marketing | Social media              | Sentiment analysis segmentation

39 HOW
- Practice is the sole criterion for testing truth.

40 PARALLEL PROGRAMMING
- Course content structure:
  - Parallel Architectures
  - Parallel Algorithms
  - Parallel Programming

41 GOAL
Most people in the research community agree that there are at least two kinds of parallel programmers that will be important to the future of computing:
- Programmers who understand how to write software, but are naïve about parallelization and mapping to architecture
- Programmers who are knowledgeable about parallelization and mapping to architecture, and so can achieve high performance

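42 TEACHING PLAN
- 32 class hours in total
  - 4 hours: course introduction + architecture of parallel computing systems
  - 4 hours: fundamentals of parallel algorithms
  - 24 hours: parallel programming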

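43 ASSESSMENT REQUIREMENTS
- Grading: coursework (attendance + 1 doc) + exam, weighted 20 : 80
- 1 doc:
  - Pick a technical problem in parallel computing, review the existing techniques that address it, and propose improvements
  - The review should focus on the novel contributions and the open problems, as well as possible next steps for research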

