1 Introduction to Parallel Processing with Multi-core Part I Jie Liu, Ph.D. Professor Department of Computer Science Western Oregon University USA


1 Introduction to Parallel Processing with Multi-core, Part I
Jie Liu, Ph.D.
Professor, Department of Computer Science
Western Oregon University, USA
liuj@wou.edu

2 Now the question – Why parallel?
- Three things are for sure: taxes, death, and parallelism.
- How long does it take a single person to build I-5?
- What we want is to solve very computation-intensive problems, such as modeling a protein interacting with the water surrounding it. Such a problem can take a very, very long time: in 1990, the protein simulation problem would have taken a Cray X/MP 31,688 years to simulate 1 second of interaction. Even if today's supercomputers were 100 times faster than the Cray X/MP, we would still need more than 300 years!
- The only solution: parallel processing.

3 Why parallel (2)
- Moore's Law: the logic density of silicon-based ICs (integrated circuits) closely followed a curve of doubling every year (until 1970; every 18 months after that).
- Why is density related to a processor's speed? Because, during the process of "computing," electrons need to carry signals from one end of a circuit to the other.
- For a 2 GHz computer, a signal can travel at most about 15 cm per clock cycle (0.5 nanoseconds).
- That is, the speed of light places a physical limitation on how fast a single-processor computer can run.
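The speed-of-light figure can be checked with one line of arithmetic. This small sketch (mine, not from the slides) computes how far light travels during one clock cycle of a 2 GHz processor:

```python
# How far does light travel in one clock cycle of a 2 GHz processor?
c = 3.0e8          # speed of light, m/s (approximate)
cycle = 0.5e-9     # one clock cycle at 2 GHz, in seconds
distance = c * cycle   # about 0.15 m, i.e. roughly 15 cm per clock cycle
```

Electrical signals in real circuits travel somewhat slower than light, so 15 cm is an upper bound on how far a signal can get in one cycle.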

4 Why parallel (3)
- Some problems require much more computing power than today's fastest single-CPU computers can provide.
- The speed of light limits how fast a single-CPU computer can run.
- If we want to solve such computation-intensive problems in a reasonable amount of time, we have to resort to parallel computers!

5 Some Definitions
- Parallel processing: information processing that emphasizes the concurrent manipulation of data belonging to many processes solving a single problem.
  - Example: 100 processors sorting an array of 1,400,000,000 elements is parallel processing.
  - Example: printing homework while reading email is concurrent, but not parallel processing, because the processes are not solving the same problem.
- A parallel computer is a multi-processor computer capable of parallel processing.
  - Computers with just co-processors for math and image processing are not considered parallel computers (some people disagree with this notion).

6 Two Forms of Parallelism
- Control parallelism: concurrency is achieved by applying different operations to different data elements of a single problem.
  - A pipeline is a special form of control parallelism; an assembly line is an example of a pipeline.
- Data parallelism: concurrency is achieved by applying the same operation to different data elements of a single problem.
  - Taking a class is an example of data parallelism (assuming you are all learning at the same speed).
  - An army brigade marching can be considered data parallelism.
- Note the granularity of the above examples.

7 Control vs. Data Parallelism
- Consider the following statement:

    if a[i] > b[i]
        a[i] = a[i]*b[i]
    else
        b[i] = a[i]-b[i]

- In a control-parallel fashion, some processors execute the statement a[i] = a[i]*b[i] while others execute b[i] = a[i]-b[i] during the same clock cycle.
- In a data-parallel fashion, especially on a SIMD machine, this if statement is executed in two clock cycles: during the first cycle, all processors satisfying the condition a[i] > b[i] execute a[i] = a[i]*b[i]; during the second cycle, the processors not satisfying the condition execute b[i] = a[i]-b[i].
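The two-cycle SIMD execution described above can be sketched in plain Python (my illustration, not code from the slides): a mask records which processors satisfy the condition, and the two loops stand in for the two machine cycles.

```python
# Sketch of SIMD-style execution of the branch: every "processor" i holds
# a[i] and b[i]; the branch is serialized into two phases (machine cycles).
def simd_branch(a, b):
    mask = [x > y for x, y in zip(a, b)]   # evaluate the condition everywhere
    # Cycle 1: only processors where a[i] > b[i] are active
    for i in range(len(a)):
        if mask[i]:
            a[i] = a[i] * b[i]
    # Cycle 2: only the remaining processors are active
    for i in range(len(a)):
        if not mask[i]:
            b[i] = a[i] - b[i]
    return a, b

# For a = [5, 1], b = [2, 4]: processor 0 takes the "then" arm (a[0] = 10),
# processor 1 takes the "else" arm (b[1] = 1 - 4 = -3).
simd_branch([5, 1], [2, 4])   # -> ([10, 1], [2, -3])
```

Note the cost: on a SIMD machine both arms of a branch take time, because the processors on the other arm sit idle during each phase.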

8 Speedup – Take I
- Speedup is a measurement of how effective a parallel algorithm is.
- It is defined as the ratio between the time needed for the most efficient sequential algorithm to perform a computation and the time needed to perform the same computation on a parallel computer with a parallel algorithm. That is,

    speedup = T(most efficient sequential algorithm) / T(parallel algorithm)

- Example: suppose we developed a parallel bubble sort that sorts n elements in O(log n) time using n processors. The speedup is O(n log n / log n) = O(n), because the most efficient sequential sorting algorithms have a complexity of O(n log n).
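The definition can be illustrated with the bubble-sort example above; the numbers below are hypothetical (my own plug-in of Θ(n log n) and Θ(log n) step counts into the ratio), not measurements:

```python
# Hypothetical illustration of the speedup definition:
# speedup = T(best sequential algorithm) / T(parallel algorithm)
import math

def speedup(t_sequential, t_parallel):
    return t_sequential / t_parallel

n = 1_000_000
t_seq = n * math.log2(n)   # best sequential sort: ~ n log n steps
t_par = math.log2(n)       # the O(log n) parallel sort using n processors
ratio = speedup(t_seq, t_par)   # approximately n: the speedup grows as Theta(n)
```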

9 Brain Exercise
- Six equally skilled students need to make 210 special cookies; making each cookie consists of the following tasks (time units in parentheses):
  1. Break the dough into small pieces of equal size (1)
  2. Hand-roll the small dough pieces into balls (1)
  3. Press the balls flat for rolling (1)
  4. Roll the flat dough into wrappers (1)
  5. Place a suitable amount of filling onto each wrapper (1)
  6. Fold the wrapper to enclose the filling completely, finishing the cookie (1)
- How would you do this in a pipeline fashion?
- How would you do this in a control-parallel fashion, other than a pipeline?
- How would you do this in a data-parallel fashion?

10 Approach #1: each student makes complete cookies, one at a time

        T1 T2 T3 T4 T5 T6 | T7 T8 T9 T10 T11 T12
    S1   1  2  3  4  5  6 |  1  2  3  4   5   6
    S2   1  2  3  4  5  6 |  1  2  3  4   5   6
    S3   1  2  3  4  5  6 |  1  2  3  4   5   6
    S4   1  2  3  4  5  6 |  1  2  3  4   5   6
    S5   1  2  3  4  5  6 |  1  2  3  4   5   6
    S6   1  2  3  4  5  6 |  1  2  3  4   5   6
         (cookies D1 ~ D6)  (cookies D7 ~ D12)

Each entry is the step student Si performs in that time unit: every six time units, the six students finish six cookies.

11 Approach #2: student Si always performs step i (a pipeline)

        T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12
    S1   1  1  1  1  1  1  1  1  1  1   1   1
    S2      2  2  2  2  2  2  2  2  2   2   2
    S3         3  3  3  3  3  3  3  3   3   3
    S4            4  4  4  4  4  4  4   4   4
    S5               5  5  5  5  5  5   5   5
    S6                  6  6  6  6  6   6   6
                       D1 D2 D3 D4 D5  D6  D7

The first cookie (D1) is finished at T6; after that, one more cookie is finished in every time unit.

12 Analysis
- Sequential cost: (1+1+1+1+1+1) * 210 = 1260 time units.
- Maximum speedup for Approach #1?
- Maximum speedup for Approach #2?
- Other questions to consider:
  - If I have 1260 students, can I get the task done in 1 time unit?
  - What if step 3 takes 3 time units and step 6 takes 2 time units?
  - What is the effect of adding more skilled students to each approach?
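One way to work out the first two questions (my own arithmetic using the slides' numbers; the slides themselves leave the answers open):

```python
# Cookie exercise: sequential cost vs. the two parallel approaches.
STEPS, COOKIES, STUDENTS = 6, 210, 6

t_seq = STEPS * COOKIES                  # 1260: one student doing everything
t_data = STEPS * (COOKIES // STUDENTS)   # Approach #1: each student makes 35 complete cookies
t_pipe = STEPS + (COOKIES - 1)           # Approach #2: first cookie after 6 units, then 1 per unit

print(t_seq / t_data)   # 6.0 -> perfect speedup
print(t_seq / t_pipe)   # about 5.86 -> the pipeline never quite reaches 6
```

The gap comes from the pipeline's fill time: its first 5 time units are spent with some students idle while the pipe fills.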

13 Grand Challenges
- A list of problems that are very computation-intensive but can benefit humanity greatly; heavily funded by the US government.
- The list (slide image omitted) gives just the categories of problems.

14 Parallel Computers & Companies (slide image omitted)

15 One of the Fastest Computers
- Per http://abcnews.go.com/Technology/WireStory?id=5028546&page=2
- By: IBM and Los Alamos National Laboratory
- Name: Roadrunner (named after New Mexico's state bird)
- Twice as fast as IBM's Blue Gene, which is three times faster than the next fastest computer in the world
- Cost: $100,000,000 – very cheap
- Speed: 1,000,000,000,000,000 floating-point operations per second (one petaflop)
- Usage: primarily nuclear weapons work, including simulating nuclear explosions
- Related to gaming: in some ways, it's "a very souped-up Sony PlayStation 3"
- Some facts:
  - The interconnecting system occupies 6,000 square feet with 57 miles of fiber optics and weighs 500,000 pounds. Although made from commercial parts, the computer consists of 6,948 dual-core chips and 12,960 Cell engines, and it has 80 terabytes of memory housed in 288 connected refrigerator-sized racks.
  - Two years earlier, the fastest computer in the world could perform 100,000,000,000,000 floating-point operations per second (100 teraflops).

16 Parallel Computers and Programming – the Trend
- Hardware
  - Supercomputers – multiprocessor/multicomputer machines, the fastest computers of their time
  - Beowulf – a cluster of off-the-shelf computers linked by a switch
  - Other distributed systems, such as NOW (networks of workstations)
  - Multi-core – many cores (each a CPU in itself) within one CPU; soon 60+ cores per CPU
- Programming
  - MPI for message-passing architectures
  - Vendor-specific add-ons to well-known programming languages
  - New languages such as Microsoft's F#
  - Multi-core programming (add-ons to well-known programming languages):
    - Intel's Threading Building Blocks (TBB)
    - Microsoft's Task Parallel Library – supports Parallel.For, PLINQ, etc.; one to keep an eye on
    - Third parties such as Jibu – may merge with MS

17 Multi-Core Programming
- Sequential (code screenshot omitted)
- Parallel (code screenshot omitted)
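Since the slide's code screenshots are missing, here is a minimal stand-in (mine, not the original slide code) contrasting a sequential loop with a multi-core version; the task and data are made up for illustration:

```python
# Sequential vs. multi-core execution of the same per-element task.
# ProcessPoolExecutor distributes the calls across worker processes,
# which the OS can schedule onto separate cores.
from concurrent.futures import ProcessPoolExecutor

def square(x):                 # a made-up, CPU-bound stand-in task
    return x * x

def sequential(data):
    return [square(x) for x in data]

def parallel(data, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, data))

if __name__ == "__main__":
    data = list(range(8))
    assert sequential(data) == parallel(data)   # same answer either way
```

For a task this tiny the parallel version is actually slower, since creating processes costs more than the work itself; the pattern pays off only when each call does substantial computation.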

18 Why Study Parallel Processing/Programming?
- To make your code run more efficiently
- To utilize existing resources (the other cores)
- ...
- A good coding class for CS students:
  - To learn something new
  - To improve your skill set
  - To improve your problem-solving skills
  - To exercise your brain
  - To review many Computer Science subject areas
  - To relax a constraint our professors embedded in our thinking process in our early years of studying (what is the PC in a CPU?)

19 PRAM (Parallel Random Access Machine)
- A theoretical parallel computer
- Consists of a control unit, global memory, and an unbounded set of processors, each with its own private memory
- In addition:
  - Each processor has a unique id
  - At each step, an active processor can read/write memory (global or private), perform the same instruction as all other active processors, idle, or activate another processor
- How many steps does it take to activate n processors?
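The activation question is usually answered with a doubling argument: in each step every active processor activates one more, so the number of active processors doubles, and activating n processors takes ⌈log2 n⌉ steps. A small sketch (mine, not from the slides) of the count:

```python
# Doubling activation on a PRAM: each active processor wakes one peer
# per step, so the active count goes 1, 2, 4, 8, ... -> ceil(log2 n) steps.
def activation_steps(n):
    steps, active = 0, 1      # processor P0 starts active
    while active < n:
        active *= 2           # every active processor activates one more
        steps += 1
    return steps

[activation_steps(n) for n in (1, 2, 8, 1000)]   # -> [0, 1, 3, 10]
```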

20 PRAM (slide diagram omitted)

21 Important Terms
- computation-intensive problem
- Moore's Law
- parallel processing
- parallel computer
- control parallelism
- data parallelism
- speedup
- grand challenges
- massively parallel computer
- Roadrunner
- petaflop
- supercomputers
- Beowulf
- NOW
- MPI
- multi-core
- PRAM

