Presentation transcript:

5
void ordered_fill(float* array, int array_length)
{
    int index;
    for (index = 0; index < array_length; index++) {
        array[index] = index;   /* writes march through memory in order */
    }
}

6
void random_fill(float* array, int* random_permutation_index, int array_length)
{
    int index;
    for (index = 0; index < array_length; index++) {
        array[random_permutation_index[index]] = index;   /* writes jump around memory */
    }
}
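Here random_permutation_index is presumably the values 0 through array_length - 1 in shuffled order, so that every element still gets written exactly once. A minimal sketch of building such a permutation with a Fisher-Yates shuffle (this helper is not from the slides; the function name and the use of rand() are illustrative assumptions):

#include <stdlib.h>

/* Fill permutation[0..length-1] with 0..length-1 in random order. */
void build_random_permutation(int* permutation, int length)
{
    int index;
    for (index = 0; index < length; index++) {
        permutation[index] = index;
    }
    /* Fisher-Yates shuffle: swap each slot with a randomly chosen earlier-or-equal slot. */
    for (index = length - 1; index > 0; index--) {
        int swap_with = rand() % (index + 1);   /* seed with srand() before use */
        int temp = permutation[index];
        permutation[index] = permutation[swap_with];
        permutation[swap_with] = temp;
    }
}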

7 In a simple array fill, locality provides a factor of 8 to 20 speedup over a randomly ordered fill on a Pentium 4.
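A rough sketch of how the two fills might be timed to see the effect yourself (the array size, the clock() harness, and main() are illustrative assumptions, not part of the slides; exact speedups depend on the CPU, cache sizes, and compiler flags):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void ordered_fill(float* array, int array_length);                               /* slide 5 */
void random_fill(float* array, int* random_permutation_index, int array_length); /* slide 6 */
void build_random_permutation(int* permutation, int length);                     /* sketch above */

int main(void)
{
    int n = 10 * 1024 * 1024;               /* illustrative array length */
    float* data = malloc(n * sizeof(float));
    int* perm = malloc(n * sizeof(int));
    clock_t start;

    srand(12345);
    build_random_permutation(perm, n);

    start = clock();
    ordered_fill(data, n);
    printf("ordered fill: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    start = clock();
    random_fill(data, perm, n);
    printf("random fill:  %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    free(data);
    free(perm);
    return 0;
}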

9 CS 491 – Parallel and Distributed Computing

10 The definition of A = B · C is $A_{r,c} = \sum_{q=1}^{nq} B_{r,q} \, C_{q,c}$ for $r \in \{1, \dots, nr\}$, $c \in \{1, \dots, nc\}$.
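In C with row-major storage, that definition maps directly onto a triple loop. A sketch, assuming the matrices are stored as flat arrays with the dimension names taken from the formula (the function name and flat-array layout are assumptions, not from the slides; indices are 0-based here, 1-based in the formula):

/* A = B * C, where A is nr x nc, B is nr x nq, and C is nq x nc.
   Row-major flat arrays: element (r, c) of A lives at A[r * nc + c]. */
void matrix_multiply(float* A, const float* B, const float* C,
                     int nr, int nc, int nq)
{
    int r, c, q;
    for (r = 0; r < nr; r++) {
        for (c = 0; c < nc; c++) {
            float sum = 0.0f;
            for (q = 0; q < nq; q++) {
                sum += B[r * nq + q] * C[q * nc + c];
            }
            A[r * nc + c] = sum;
        }
    }
}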

12 If the matrix is big, then each sweep of a row will clobber nearby values in cache.
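One standard remedy is to tile (block) the loops so that the pieces of B and C currently in use are reused while they are still in cache. The transcript does not say exactly what the original graphs show, so treat this as a generic sketch, with the tile size an assumption to tune per machine:

#define TILE 64   /* illustrative tile size; tune to the cache of the target CPU */

/* Same A = B * C as the sketch above, computed tile by tile so the working
   set of the inner loops stays cache-sized. A is cleared first because the
   partial sums for each element accumulate across q-tiles. */
void matrix_multiply_tiled(float* A, const float* B, const float* C,
                           int nr, int nc, int nq)
{
    int i, r0, q0, c0, r, q, c;
    for (i = 0; i < nr * nc; i++) {
        A[i] = 0.0f;
    }
    for (r0 = 0; r0 < nr; r0 += TILE) {
        for (q0 = 0; q0 < nq; q0 += TILE) {
            for (c0 = 0; c0 < nc; c0 += TILE) {
                for (r = r0; r < nr && r < r0 + TILE; r++) {
                    for (q = q0; q < nq && q < q0 + TILE; q++) {
                        float b_rq = B[r * nq + q];
                        for (c = c0; c < nc && c < c0 + TILE; c++) {
                            A[r * nc + c] += b_rq * C[q * nc + c];
                        }
                    }
                }
            }
        }
    }
}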

22 Less fish … More fish! Parallelism means doing multiple things at the same time: you can get more work done in the same time.
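On a shared-memory machine, one common way to express this in code is an OpenMP parallel loop, where each thread handles part of the iteration space. A minimal sketch applied to the fill from slide 5 (OpenMP here is an illustrative choice, not the only way to get parallelism):

#include <omp.h>

/* Compile with OpenMP enabled (e.g. -fopenmp with gcc). The runtime splits
   the iterations among threads; each thread fills its own chunk of the array. */
void ordered_fill_parallel(float* array, int array_length)
{
    int index;
    #pragma omp parallel for
    for (index = 0; index < array_length; index++) {
        array[index] = index;
    }
}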

24 Suppose you want to do a jigsaw puzzle that has, say, a thousand pieces. We can imagine that it’ll take you a certain amount of time. Let’s say that you can put the puzzle together in an hour.

25 If Scott sits across the table from you, then he can work on his half of the puzzle and you can work on yours. Once in a while, you’ll both reach into the pile of pieces at the same time (you’ll contend for the same resource), which will cause a little bit of slowdown. And from time to time you’ll have to work together (communicate) at the interface between his half and yours. The speedup will be nearly 2-to-1: y’all might take 35 minutes instead of the ideal 30.

26 Now let’s put Paul and Charlie on the other two sides of the table. Each of you can work on a part of the puzzle, but there’ll be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So y’all will get noticeably less than a 4-to-1 speedup, but you’ll still have an improvement, maybe something like 3-to-1: the four of you can get it done in 20 minutes instead of an hour.

27 If we now put Dave and Tom and Horst and Brandon on the corners of the table, there’s going to be a whole lot of contention for the shared resource, and a lot of communication at the many interfaces. So the speedup y’all get will be much less than we’d like; you’ll be lucky to get 5-to-1. So we can see that adding more and more workers onto a shared resource is eventually going to have a diminishing return.

28 Now let’s try something a little different. Let’s set up two tables, and let’s put you at one of them and Scott at the other. Let’s put half of the puzzle pieces on your table and the other half of the pieces on Scott’s. Now y’all can work completely independently, without any contention for a shared resource. BUT, the cost per communication is MUCH higher (you have to scootch your tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly, which may be tricky to do for some puzzles.

29 It’s a lot easier to add more processors in distributed parallelism. But, you always have to be aware of the need to decompose the problem and to communicate among the processors. Also, as you add more processors, it may be harder to load balance the amount of work that each processor gets.

30 Load balancing means ensuring that everyone completes their workload at roughly the same time. For example, if the jigsaw puzzle is half grass and half sky, then you can do the grass and Scott can do the sky, and then y’all only have to communicate at the horizon – and the amount of work that each of you does on your own is roughly equal. So you’ll get pretty good speedup.

31 Load balancing can be easy, if the problem splits up into chunks of roughly equal size, with one chunk per processor. Or load balancing can be very hard.
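In OpenMP terms, the easy case maps naturally onto the default static schedule (equal chunks handed out up front), while the hard case is often handled with a dynamic schedule that deals out work as threads finish. A hedged sketch, where process_item stands in for whatever uneven per-item work the real problem does (the chunk size 16 is an illustrative guess):

#include <omp.h>

/* process_item is a stand-in for per-item work whose cost varies a lot.
   schedule(dynamic, 16): threads grab 16 items at a time, so a few expensive
   items do not leave some threads idle while others are still overloaded. */
void process_all(double* items, int n, void (*process_item)(double*))
{
    int i;
    #pragma omp parallel for schedule(dynamic, 16)
    for (i = 0; i < n; i++) {
        process_item(&items[i]);
    }
}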

32 CS 491 – Parallel and Distributed Computing
