Joint UIUC/UMD Parallel Algorithms/Programming Course David Padua, University of Illinois at Urbana-Champaign Uzi Vishkin, University of Maryland, speaker.

Presentation transcript:

Joint UIUC/UMD Parallel Algorithms/Programming Course
David Padua, University of Illinois at Urbana-Champaign
Uzi Vishkin, University of Maryland (speaker)
Jeffrey C. Carver, University of Alabama

Motivation 1/4
Programmers of today's parallel machines must overcome three productivity busters, beyond just identifying operations that can be executed in parallel:
(i) follow the often difficult 4-step programming-for-locality recipe: decomposition, assignment, orchestration, and mapping [CS99];
(ii) reason about concurrency in threads, e.g., race conditions (a hedged sketch follows below);
(iii) for machines such as GPUs, which fall behind on serial (or low-parallelism) code, make whole programs highly parallel.
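
To make productivity buster (ii) concrete, here is a minimal sketch in C with OpenMP (the setting used in the UIUC sessions); the toy summation and variable names are illustrative, not taken from the course assignments. The first loop contains a textbook data race on a shared accumulator; the reduction clause is one standard fix.

    /* Hedged sketch (not from the course materials): a classic data race
       in OpenMP C code, followed by one standard fix.                    */
    #include <stdio.h>

    int main(void) {
        int n = 1000000;
        long sum = 0;

        /* Racy version: all threads update the shared 'sum' without
           synchronization, so the result is nondeterministic.           */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            sum += i;                         /* data race */
        printf("racy sum    = %ld\n", sum);

        /* Fixed version: reduction gives each thread a private copy of
           'sum' and combines the copies after the loop.                  */
        sum = 0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += i;
        printf("reduced sum = %ld\n", sum);   /* n*(n-1)/2 */
        return 0;
    }

Compiled with, e.g., gcc -fopenmp, the racy loop typically prints a wrong total that varies from run to run, while the reduction version is deterministic.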

Motivation 2/4: Commodity computer systems
If you want your program to run significantly faster … you're going to have to parallelize it. Parallelism is the only game in town. But where are the players?
"The Trouble with Multicore: Chipmakers are busy designing microprocessors that most programmers can't handle" (D. Patterson, IEEE Spectrum, July 2010).
"Only heroic programmers can exploit the vast parallelism in current machines" (report by the CSTB, U.S. National Academies, 2011).
An education agenda must: (i) recognize this reality, (ii) adapt to it, and (iii) identify broad-impact opportunities for education.

Motivation 3/4: Technical objectives
Parallel computing exists to provide speedups over serial computing. Its emerging democratization means that the general body of CS students and graduates must be capable of achieving good speedups.
What is at stake? A general-purpose computer that can be programmed effectively by too few programmers, or that requires excessive learning, makes application software development more expensive, weakening the market potential of more than just that computer: economists traditionally look to the manufacturing sector to better the recovery prospects of the economy, and software production is the quintessential 21st-century mode of manufacturing. Those prospects are at peril if most programmers are unable to design effective software for mainstream computers.

Motivation 4/4: Possible roles for education
Facilitator: prepare and train students and the workforce for a future dominated by parallelism.
Testbed: experiment with vertical approaches and refine them to identify the most cost-effective ways of achieving speedups.
Benchmark: given a vertical approach, identify the developmental stage at which it can be taught. Rationale: ease of learning/teaching is a necessary (though not sufficient) condition for ease of programming.

The joint inter-university course
UIUC: Parallel Programming for Science and Engineering, instructor: DP. UMD: Parallel Algorithms, instructor: UV. Student population: upper-division undergraduates and graduate students with diverse majors and backgrounds. About half of the fall 2010 sessions were taught jointly by videoconferencing.
Objectives and outcomes
1. Demonstrate the logistical and educational feasibility of a real-time, jointly co-taught course. Outcome: overall success, with minimal glitches; the format also helped alert students that success on the material taught by the other professor is just as important.
2. Compare OpenMP on an 8-processor SMP against PRAM/XMTC on a 64-processor XMT (which uses less than 1/4 of the silicon area of 2 SMP processors).

Joint sessions
DP taught OpenMP programming and provided parallel-architecture background. UV taught parallel (PRAM) algorithms, plus about 20 minutes of XMTC programming. There were 3 joint programming assignments.
Non-shared sessions
UIUC: mostly MPI; students submitted more OpenMP programming assignments. UMD: more parallel algorithms, written homework on the design and analysis of parallel algorithms, and a more demanding XMTC programming assignment (a flavor of XMTC is sketched below).
JC: anonymous questionnaire filled out by the students, accessed by DP and UV only after all grades were posted, per IRB guidelines.
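
For readers unfamiliar with XMTC, the following is a rough sketch, reconstructed from the style of the XMT tutorial rather than taken from the course assignments, of array compaction using the spawn block and the prefix-sum primitive ps. The identifiers and the exact form of declarations such as psBaseReg are assumptions and should be checked against the freely downloadable XMT toolchain documentation.

    /* XMTC-style sketch: copy the nonzero elements of A into B.
       One virtual thread is spawned per index; $ is the thread id.      */
    #define N 1024
    int A[N], B[N];
    psBaseReg base;              /* shared prefix-sum base register      */

    int main(void) {
        /* ... assume A has been initialized ... */
        base = 0;
        spawn(0, N - 1) {        /* launch N lightweight threads         */
            int inc = 1;
            if (A[$] != 0) {
                ps(inc, base);   /* atomically: inc <- base; base += inc */
                B[inc] = A[$];   /* each nonzero gets a unique slot in B */
            }
        }
        /* After the implicit join, base holds the number of elements
           copied into B.                                                */
        return 0;
    }

The contrast with the 4-step locality recipe of Motivation 1/4 is the point: the programmer only states which operations can be performed in parallel.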

Ranking approaches for achieving (hard) speedups
Breadth-first search (BFS) example (sketched below), 42 students in the fall 2010 joint UIUC/UMD course:
- under 1x speedups using OpenMP on the 8-processor SMP
- 7x-25x speedups on the 64-processor XMT FPGA prototype
Questionnaire: all students but one ranked XMTC ahead of OpenMP.
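
For context, here is a minimal sketch of a level-synchronous parallel BFS of the kind such an assignment targets, written in C with OpenMP; the CSR layout (rowptr/col), function name, and scheduling choices are illustrative assumptions, not the course's reference code. The irregular, fine-grained parallelism of the frontier expansion is what makes hard speedups difficult to obtain on a small SMP.

    /* Hedged sketch: level-synchronous BFS over a graph in CSR form.    */
    #include <stdlib.h>

    /* Returns an array of BFS levels (-1 = unreached) for a graph with
       n vertices, edges given by rowptr[0..n] and col[0..rowptr[n]-1].  */
    int *bfs_levels(int n, const int *rowptr, const int *col, int src) {
        int *level    = malloc(n * sizeof(int));
        int *frontier = malloc(n * sizeof(int));
        int *next     = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) level[i] = -1;

        int fsize = 1, depth = 0;
        frontier[0] = src;
        level[src] = 0;

        while (fsize > 0) {
            int nsize = 0;
            /* Expand the current frontier in parallel; the atomic
               compare-and-swap lets exactly one thread claim a vertex.  */
            #pragma omp parallel for schedule(dynamic, 64)
            for (int f = 0; f < fsize; f++) {
                int u = frontier[f];
                for (int e = rowptr[u]; e < rowptr[u + 1]; e++) {
                    int v = col[e];
                    if (__sync_bool_compare_and_swap(&level[v], -1, depth + 1)) {
                        int pos;
                        #pragma omp atomic capture
                        pos = nsize++;
                        next[pos] = v;
                    }
                }
            }
            int *tmp = frontier; frontier = next; next = tmp;  /* swap */
            fsize = nsize;
            depth++;
        }
        free(frontier);
        free(next);
        return level;   /* caller frees */
    }

On a small SMP the per-edge atomics and scattered memory accesses tend to dominate, which is one plausible reading of the sub-1x OpenMP results above.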

Has the study of PRAM algorithms helped XMT programming?
A majority of UIUC students: no. UMD students: a strong yes, reinforced by written explanations.
Discussion: the exposure of UIUC students to PRAM algorithms and XMT programming was much more limited, and their understanding of this material was not challenged by analytic homework or exams. Yet for the same programming challenges, the performance of UIUC and UMD students was similar. Must students be exposed to a minimal amount of parallel algorithms and their programming, and be properly challenged on analytic understanding, in order to internalize their merit? If so, there is tension with the pressure on parallel computing courses to cover a hodge-podge of programming paradigms and architecture backgrounds.

More issues/lessons
Recall the titles of the courses at UIUC and UMD: should class time be used only for algorithms, or also for programming? Algorithms operate at a high level of abstraction, which allows more advanced problems to be covered. Note that understanding was tested only for the UMD students.
We made do with courses that were already on the schedule. Next time: a more homogeneous population, e.g., a CS graduate class. If you are interested in taking part, please let us know.
General lesson: the IRB requires pre-submission of all questionnaires, so planning must be completed by then.

Conclusion
For parallelism to succeed serial computing in the mainstream, students' first experience has to:
- demonstrate solid hard speedups
- be trauma-free
Beyond education: if done broadly, objective ranking of approaches for achieving hard speedups, through education and by other means, provides a clue for curing the ills of the field.

Course homepages: agora.cs.illinois.edu/display/cs420fa10/Home (UIUC) and the corresponding UMD page.
For a summary of the PRAM/XMT education approach: it includes teaching experience extending from middle school to graduate courses; course material (class notes, programming assignments, video presentations of a full-day tutorial and a full-semester graduate course); a software toolchain (compiler and cycle-accurate simulator [HIPS 5/20]) available for free download; and the XMT hardware.