Presentation is loading. Please wait.

Presentation is loading. Please wait.

If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)? Joel Adams Department of Computer Science Calvin College.

Similar presentations

Presentation on theme: "If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)? Joel Adams Department of Computer Science Calvin College."— Presentation transcript:

1 If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)? Joel Adams Department of Computer Science Calvin College

2 An Anecdote about CCSC:MW This story has nothing to do with parallel computing, but it may be of interest… Did you know that if it were not for CCSC:MW, CS Education Week would likely not exist? CCSC:MW 2014 - 2

3 How CCSC:MW  CS Ed Week At CCSC:MW in 2008: The ACM-CSTA’s Chris Stevenson gave the keynote, describing the decline of CS in high schools CCSC:MW 2014 - 3 –No Child Left Behind was killing HS CS! –I’m pretty apolitical, but...

4 How CCSC:MW  CS Ed Week I decided to visit my Congressman, Rep. Vernon Ehlers, ranking member of the House Committee on Science & Technology CCSC:MW 2014 - 4 (a Physics PhD and former Calvin prof). He was surprised to hear of the problems (esp. enrollment declines) CS was facing.

5 How CCSC:MW  CS Ed Week Rep. Ehlers contacted the ACM, specifically Cameron Wilson. They worked together on CS Education Week, which the House passed 405-0 in 2009. CCSC:MW 2014 - 5 CCSC:MW catalyzed CS Education Week!

6 What’s Happening Now? There is a bill currently in Congress: –H.R. 2536: The CS Education Act of 2013 –It seeks to strengthen K-12 CS education, and make CS a core subject. –It currently has 116 co-sponsors (62R, 54D); is supported by ACM, NCWIT, Google, MS,... –It has been referred to the Committee on Early Childhood, Elementary, and Secondary Ed., chaired by Rep. Todd Rokita (R, IN). CCSC:MW 2014 - 6

7 CCSC:MW 2014 - 7 Most Representatives Are Unaware

8 What Can You Do? There is strength in numbers: Contact your Congressional reprentative and ask them to co-sponsor HR 2536. –If you are in Rep. Rokita’s district… (!) –More co-sponsors improve its chances. Tweet to Rep. Rokita (@ToddRokita) –Tell him you support HR 2536 – the CS Education Act of 2013 – and want it to pass. CCSC:MW 2014 - 8

9 And Now, Back To Today’s Topic Overview The past –How our computing foundation has shifted The present –Today’s hardware & software landscapes The future? –Preparing ourselves & our students CCSC:MW 2014 - 9

10 Temperature 2020 hot plate projected actual the sun CCSC:MW 2014 - 10

11 The Heat Problem… … was not caused by Moore’s Law It was caused by manufacturers doubling the clock speeds every 18-24 months This was the “era of the free lunch” for software developers: –If your software was sluggish, faster hardware would fix your problem within two years! CCSC:MW 2014 - 11

12 Solving the Heat Problem… In 2005, manufacturers stopped doubling the clock speeds because of the heat, power consumption, electron bleeding, … This ended the “era of the free lunch” –Software will no longer speed up on its own. CCSC:MW 2014 - 12

13 Clock Speed (frequency) trend CCSC:MW 2014 - 13

14 But Moore’s Law Continued Every 2 years, manufacturers could still double the transistors in a given area: –2006: Dual-core CPUs –2008: Quad-core CPUs –2010: 8-core CPUs –2012: 16-core CPUs –… Each of these cores has the full functionality of a traditional CPU. CCSC:MW 2014 - 14

15 12 Years of Moore’s Law CCSC:MW 2014 - 15 2013: Adapteva Parallella -A Dual-core 1-GHz ARM A7 -16 core Epiphany Coprocessor -1 GB RAM -Gigabit Ethernet, USB, HDMI, … -Ubuntu Linux -~$99 (but free via university program!) 2001: 18 nodes, each with: -One 1-GHz Athlon CPU -1 GB RAM / node -Gigabit Ethernet, USB, HDMI, … -Ubuntu Linux -~$60,000 (funded by NSF).

16 Multiprocessors are Inexpensive 2014: Nvidia Jetson TK1 -Quad-core ARM A15 -Kepler GPU w/ 192 CUDA cores -2 GB RAM -Gigabit Ethernet, HDMI, USB, … -Ubuntu Linux -~$200 CCSC:MW 2014 - 16

17 Multiprocessors are Everywhere CCSC:MW 2014 - 17

18 Some Implications Traditional sequential programs will not run faster on today’s hardware. –They may well run slower because the manufacturers are decreasing clock speeds. The only software that will run faster is parallel software designed to scale with the number of cores. CCSC:MW 2014 - 18

19 Categorizing Parallel Hardware Parallel Systems Shared Memory Distributed Memory MulticoreAccelerators GPUsCoprocessors Heterogeneous Systems Older Clusters Modern Super Computers Newer Clusters CCSC:MW 2014 - 19

20 Hardware: A Diverse Landscape Shared-memory systems Distributed-memory systems Heterogeneous systems Memory Core 1 Core 2 Core 3 Core 4 Mem 1 CPU 1 Mem 2 CPU 2 Mem 3 CPU 3 Mem N CPU N Network CCSC:MW 2014 - 20

21 CS Curriculum 2013 Because of this hardware revolution, the advent of cloud computing, and so on, CS2013 has added a new knowledge area: Parallel and Distributed Computing (PDC) CCSC:MW 2014 - 21

22 What is PDC? It goes beyond traditional concurrency: –Parallel emphasizes: o Throughput / performance (and timing) o Scalability (performance improves with # of cores) o New topics like speedup, Amdahl’s Law, … –Distributed emphasizes: o Multiprocessing (no shared memory) –MPI, MapReduce/Hadoop, BOINC, … o Cloud computing o Mobile apps accessing scalable web services CCSC:MW 2014 - 22

23 Software: Communication Options In shared-memory systems, programs may: Communicate via the shared-memory –Languages: Java, C++11, … –Libraries: POSIX threads, OpenMP Communicate via message passing –Message-passing languages: Erlang, Scala, … –Libraries: the Message Passing Interface (MPI) CCSC:MW 2014 - 23

24 CS Curriculum 2013 (CS2013) The CS2013 core includes 15 hours of parallel & distr. computing (PDC) topics +5 hours in core Tier 1 +10 hours in core Tier 2 + related topics in System Fundamentals (SF) How/where do we cover these topics in the CS curriculum? CCSC:MW 2014 - 24

25 Model 1: Create a New Course Add a new course to the CS curriculum that covers the core PDC topics: +If someone else has to teach this new course, dealing with PDC is their problem, not mine! –The CS curriculum is already full! –What do we drop to make room? CCSC:MW 2014 - 25

26 Model 2: Across the Curriculum Sprinkle 15+ hours (3 weeks) of PDC across our core CS courses, not counting SF: +Students see relationship of PDC to data structures, algorithms, prog. lang., … +Easier to make room for 1 week in 1 course than jettison an entire course. +Spreads the effort across multiple faculty –All those faculty have to be “on board” CCSC:MW 2014 - 26

27 Calvin CS Curriculum YearFall SemesterSpring Semester 1 Intro to Computing Calculus I Data Structures Calculus II 2 Algorithms & DS Intro. Comp. Arch. Discrete Math I Programming Lang. Discrete Math II 3 Software Engr Adv. Elective OS & Networking Adv. Elective Statistics 4 Adv. Elective Sr. Practicum I Adv. Elective Sr. Practicum II Perspectives on Comp. Data Structures Calculus II Programming Lang. Discrete Math II OS & Networking Algorithms & DS Intro. Comp. Arch. Adv. Elective: HPC Software Engr. CCSC:MW 2014 - 27

28 Why Introduce Parallelism in CS2? For students to be facile with parallelism, they need to see it early and often. Performance (Big-Oh) is a topic that’s first addressed in CS2. Data structures let us store large data sets –Slow sequential processing of these sets provides a natural motivation for parallelism. CCSC:MW 2014 - 28

29 Parallel Topics in CS2 Lecture topics: –Single threading vs. multithreading –The single-program-multiple-data (SPMD), fork-join, parallel loop, and reduction patterns –Speedup, asymptotic performance analysis –Parallel algorithms: searching, sorting –Race conditions: non-thread-safe structures Lab exercise: Compare sequential vs. parallel matrix operations using OpenMP CCSC:MW 2014 - 29

30 Lab Exercise: Matrix Operations Given a Matrix class, the students: Measure the time to perform sequential addition and transpose methods For each of three different approaches: –Use the approach to parallelize those methods –Record execution times in a spreadsheet –Create a chart showing time vs # of threads Students directly experience the speedup… CCSC:MW 2014 - 30

31 Addition: m3 = m1 + m2 + = Single-threaded: += Multi-threaded (4 threads): CCSC:MW 2014 - 31 ~36 steps ~9 steps

32 Tranpose: m2 = m1.transpose().tranpose() = Single-threaded: = Multi-threaded (4 threads):.tranpose() CCSC:MW 2014 - 32 ~24 steps ~6 steps

33 SIGCSE 2014 - 33

34 Programming Project Parallelize other Matrix operations –Multiplication –Assignment –Constructors –Equality Some operations (file I/O) are inherently sequential, providing a useful lesson… CCSC:MW 2014 - 34

35 Alternative Exercise/Project Parallelize image-processing operations: –Color-to-grayscale –Invert (negative) –Blur, Sharpen –Sepia-tinting Many students will find photo-processing to be more engaging than matrix ops. CCSC:MW 2014 - 35

36 Assessment All students complete end-of-course evaluations with open-ended feedback: They really like the week on parallelism –Covering material that is not in the textbook makes CS2 seem fresh and cutting edge –Students really like learning how they can use all their cores instead of just one –Having students experience speedup is key (and even better if they can see it) CCSC:MW 2014 - 36

37 More Implications Software developers who cannot build parallel apps will be unable to leverage the full power of today’s hardware. –At a competitive disadvantage? Designing / writing parallel apps is very different from designing / writing sequential apps. –Pros think in terms of parallel design patterns CCSC:MW 2014 - 37

38 Parallel Design Patterns … are industry-standard strategies that parallel professionals have found useful over 30+ years of practice. … often have direct support built into popular platforms like MPI and OpenMP. … are likely to remain useful, regardless of future PDC developments. … provide a framework for PDC concepts. CCSC:MW 2014 - 38

39 Algorithm Strategy Patterns Example 1: Most parallel programs use one of just three parallel algorithm strategy patterns: Data decomposition: divide up the data and process it in parallel. Task decomposition: divide the algorithm into functional tasks that we perform in parallel (to the extent possible). Pipeline: divide the algorithm into linear stages, through which we “pump” the data. Of these, only data decomposition scales well… CCSC:MW 2014 - 39

40 Data Decomposition (1 thread) Thread 0 CCSC:MW 2014 - 40

41 Data Decomposition (2 threads) Thread 0 Thread 1 CCSC:MW 2014 - 41

42 Data Decomposition (4 threads) Thread 0 Thread 1 Thread 2 Thread 3 CCSC:MW 2014 - 42

43 Task Decomposition Independent functions in a sequential computation can be “parallelized”: int main() { x = f(); y = g(); z = h(); w = x + y + z; } f() g() h() Thread 1Thread 2Thread 3 main() Thread 0 CCSC:MW 2014 - 43

44 Pipeline Programs with non-independent functions… int main() {... while (fin) { fin >> a; b = f(a); c = g(b); d = h(c); fout << d; }... } a0a0 Time- Step: 0 a1a1 b0b0 1 a2a2 b1b1 c0c0 2 a3a3 b2b2 c1c1 d0d0 3 a4a4 b3b3 c2c2 d1d1 4 a5a5 b4b4 c3c3 d2d2 5 a6a6 b5b5 c4c4 d3d3 6 f(a) main() g(b) h(c) Thread 1 Thread 2 Thread 3 Thread 0 … can still be pipelined: CCSC:MW 2014 - 44

45 Scalability If a program gets faster as more threads /cores are used, its performance scales. For the three algorithm strategy patterns: –Only data decomposition scales well. Algorithm Strategy PatternScalability Limited By Task DecompositionNumber of functions/tasks PipelineNumber of pipeline stages Data DecompositionAmount of data to be processed CCSC:MW 2014 - 45

46 The Reduction Pattern Programs often need to combine the local results of N parallel tasks: When N is large, O(N) time is too slow The reduction pattern does it in O(lg(N)) time: 8 8 9 9 1 1 5 5 7 7 6 6 2 2 4 4 To sum these 8 numbers: 14 10 12 6 6 Step 1 24 18 Step 2 42 Step 3 CCSC:MW 2014 - 46

47 A Parallel Pattern Taxonomy

48 Faculty Development Resources National Computational Science Institute (NCSI) offers workshops each summer: – The XSEDE Education Program offers workshops, bootcamps, and facilities: – The LittleFe Project offers “buildouts” at which participants can build (and take home) a free portable Beowulf cluster: – CCSC:MW 2014 - 48

49 LittleFe SIGCSE 2014 - 49 Little Fe (v4): 6 nodes -Dual-core Atom CPU -Nvidia ION2 w/ 16 CUDA cores -2 GB RAM -GigabitEthernet, USB, … -Custom Linux distro (BCCD) -Pelican case -~$2500 (but free at “buildouts”!)

50 Faculty Development Resources CSinParallel is an NSF-funded project to help CS educators integrate PDC topics. –1-3 hour hands-on PDC “modules” in: o Different level courses o Different languages o Different parallel design patterns (patternlets) –Workshops (today, here; summer 2015 in Chicago) –Community of supportive people to help work through problems and issues. – CCSC:MW 2014 - 50

51 Patternlets Demo CCSC:MW 2014 - 51

52 Summary Every CS major should learn about PDC –CS2013 adds PDC to the CS core curriculum –CS2 is a natural place to introduce parallelism, using ‘embarrassingly parallel’ problems –Address synchronization in later courses Parallel design patterns provide a stable intellectual framework for PDC. There are a variety of resources available to help us all make the transition. CCSC:MW 2014 - 52

53 “The pessimist complains about the wind; the optimist expects it to change; the realist adjusts the sails.” - William Arthur Ward Thank you! Time for questions… CCSC:MW 2014 - 53

54 Links to Resources CSinParallel: LittleFe: XSEDE: NCSI: CS Education Act of 2013: – –Rep. Todd Rokita (@ToddRokita) CCSC:MW 2014 - 54

55 SIGCSE 2014 - 55

Download ppt "If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)? Joel Adams Department of Computer Science Calvin College."

Similar presentations

Ads by Google