Oklahoma Supercomputing Symposium 2012: Building Courses Around MPI & CUDA with a LittleFe (Karl Frinkle, Mike Morris)

Presentation transcript:

Oklahoma Supercomputing Symposium 2012 Building Courses Around MPI & CUDA with a LittleFe Karl Frinkle Mike Morris


Oklahoma Supercomputing Symposium 2012 What is a LittleFe? BigFe vs. LittleFe (remember, Fe = iron).

Our first cluster – made from junk – but it worked! This was a “SluggishFe”

Our present LittleFe It’s called “The Kraken”



Oklahoma Supercomputing Symposium 2012 What is MPI? Message Passing Interface. MPI was designed for distributed-memory systems, such as clusters. As a side note, OpenMP is better suited to shared-memory systems, such as multiple threads running on a single computer.
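A minimal MPI "Hello World" of the kind the course opens with might look like the sketch below (it assumes a working MPI installation such as MPICH or Open MPI; compile with mpicc and launch with mpirun):

```c
/* hello_mpi.c -- compile: mpicc hello_mpi.c -o hello_mpi
   run:     mpirun -np 4 ./hello_mpi */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Each of the four launched processes prints its own line, which is usually the students' first concrete look at distributed execution.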

Oklahoma Supercomputing Symposium 2012 What is CUDA? Compute Unified Device Architecture. CUDA is the computing engine developed by Nvidia for high-speed graphics rendering and physics calculations. CUDA allows languages such as C/C++ to take advantage of high-speed GPUs in computationally intensive programs.
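A hedged sketch of what "C taking advantage of the GPU" looks like in practice: the classic CUDA vector-add, where the host copies data to the device, launches a kernel with one thread per element, and copies the result back (requires an Nvidia GPU and the nvcc compiler):

```cuda
// vadd.cu -- compile: nvcc vadd.cu -o vadd
#include <stdio.h>

__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float a[n], b[n], c[n];
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    float *da, *db, *dc;                      // device (GPU) copies
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // blocks of 256 threads
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[10] = %g\n", c[10]);            // expect 10 + 20 = 30
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```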

Oklahoma Supercomputing Symposium 2012 Building Courses Around MPI & CUDA with a LittleFe

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others

Henry Neeman, Director, OSCER

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others a. You’re here! b. Attend a summer workshop

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster a. Try your school’s discard stash b. Apply for a FREE LittleFe c. Purchase a LittleFe (< $3,000)

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster III. Get your chair/admin to OK a course a. It doesn’t have to be permanent b. Use a “Special Seminar” c. Be sure to make it upper level d. Be prepared to teach it for free e. Advertise & promote the course

Oklahoma Supercomputing Symposium 2012 (Advertise & Promote the Course) i. Exhaust the CS crowd ii. Court the CIS/MIS/MATH students iii. Contact other disciplines (even artsy ones) iv. Waive most CS pre-reqs – you’ll be pleasantly surprised how quickly students adapt v. Use your (and others’) imaginations!

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster III. Get your chair/admin to OK a course IV. Ask your network techs to help you... a. Connect your cluster to the internet b. Get your cluster to a status where you can SSH into it from off-site c. Get MPI & CUDA running properly on your machine

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster III. Get your chair/admin to OK a course IV. Ask your network techs to help you... V. Work “Hello World” to death a. Start with the MPI version b. Make students modify it (Karl will show us) c. Consider waiting on CUDA until you get some MPI stuff running well
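One illustrative "modify Hello World" assignment (a sketch, not the specific exercise from the course): instead of every rank printing independently, each worker sends a greeting to rank 0, which prints them in rank order, forcing students to meet MPI_Send and MPI_Recv early.

```c
/* hello_ordered.c -- compile with mpicc, run with mpirun -np 4 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    char msg[64];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        snprintf(msg, sizeof msg, "greetings from rank %d", rank);
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        printf("rank 0 says hello\n");
        for (int src = 1; src < size; src++) {   /* receive in rank order */
            MPI_Recv(msg, sizeof msg, MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }
    MPI_Finalize();
    return 0;
}
```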

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster III. Get your chair/admin to OK a course IV. Ask your network techs to help you... V. Work “Hello World” to death VI. Come up with a single project a. We used the ol’ reliable matrix multiplication b. Scorn us if you wish – there’s enough material there for a whole semester

Oklahoma Supercomputing Symposium 2012 Recipe for Course Development I. Get involved with Henry Neeman, OSCER & others II. Buy or build a cluster III. Get your chair/admin to OK a course IV. Ask your network techs to help you... V. Work “Hello World” to death VI. Come up with a single project VII. Turn the students loose! (but watch over them)

... and now, we’ll let Karl take over and...

... and now, we’ll let Karl take over and... Release The Kraken!

Oklahoma Supercomputing Symposium 2012 Our first course: CS4973 Special Studies – Parallel Programming... from Hello World to memory management, matrix multiplication, etc.

Oklahoma Supercomputing Symposium 2012 The Matrix: Evolution Teach necessities as you go Phase 1: You gotta start somewhere! (1) Matrix stored in ASCII text file with rows/columns (2) Program is on one node, and in C only (3) You learn file I/O here

Oklahoma Supercomputing Symposium 2012 The Matrix: Evolution Teach necessities as you go Phase 2: A little better? (1) Matrix stored in ASCII text file with no rows/columns (2) Program is on one node, and in C only (3) You learn a little mathematical indexing here

Oklahoma Supercomputing Symposium 2012 The Matrix: Evolution Teach necessities as you go Phase 3: Even distribution over MPI (1) Matrix stored in binary file (2) Program is simply distributed (3) Learn about (a) MPI file I/O structure (b) binary files (read, write) (c) a validation process to ensure working code (d) computer architecture breakdown
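The MPI file-I/O piece might be sketched like this (the filename matrix.bin, the total size, and the assumption that the process count divides the data evenly are all illustrative): every rank opens the same binary file of doubles and reads only its own contiguous slice.

```c
/* compile with mpicc, run with mpirun */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1000 * 1000;      /* total doubles in matrix.bin */
    int chunk = n / size;           /* assumes size divides n evenly */
    double *buf = malloc(chunk * sizeof(double));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "matrix.bin",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    /* each rank reads from its own byte offset */
    MPI_File_read_at(fh, (MPI_Offset)rank * chunk * sizeof(double),
                     buf, chunk, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    printf("rank %d read %d doubles\n", rank, chunk);
    free(buf);
    MPI_Finalize();
    return 0;
}
```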

Oklahoma Supercomputing Symposium 2012 The Matrix: Evolution Teach necessities as you go Phase 4: Uneven workload (1) Design programs that deal with the architecture (2) Program requires more sophisticated MPI code (3) Learn about (a) coding to the architecture (b) performing tasks in the classroom with people! (c) program analysis

Oklahoma Supercomputing Symposium 2012 The Matrix: Evolution Teach necessities as you go Phase 5: Tweaking time (1) Learn new MPI commands to cut down on runtime (2) Streamline code: how fast can your code run a problem? (3) Continue streamlining; a good program is never complete.
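Tweaking only works if every change gets a number attached to it; one common habit (a sketch, not the course's exact harness) is to bracket the section being tuned with MPI_Wtime, with barriers so all ranks start together and the slowest rank sets the elapsed time:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);     /* start everyone together */
    double t0 = MPI_Wtime();

    /* ... the computation being streamlined goes here ... */

    MPI_Barrier(MPI_COMM_WORLD);     /* wait for the slowest rank */
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("elapsed: %f seconds\n", t1 - t0);

    MPI_Finalize();
    return 0;
}
```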

Oklahoma Supercomputing Symposium 2012 Runtimes

Program                       # Nodes   Dimension   Real Runtime   User Runtime   System Runtime
Even Distribution             12        100X100     12m48.744s     0m56.396s      4m31.217s
Even Distribution             12        1000X1000   >24 hours      --             --
Even Distribution             1         1000X1000   77m33.990s     23m3.086s      53m26.540s
Head Node Distribute          12        100X100     0m3.604s       0m0.504s       0m1.744s
Head Node Distribute          12        1000X1000   3m32.956s      0m0.504s       0m1.744s
Head Node Double Distribute   12        100X100     0m4.134s       0m0.436s       0m1.244s
Head Node Double Distribute   12        1000X1000   2m7.069s       0m26.554s      3m25.397s

Oklahoma Supercomputing Symposium 2012 Our second course: CS4973 Special Studies – CUDA Parallel Programming... from Hello World in CUDA to new team projects: C + MPI + CUDA...

Oklahoma Supercomputing Symposium 2012 Our second course: CS4973 Special Studies – CUDA Parallel Programming (1) Large number multiplication (2) Prime number checking (3) Password cracking

Oklahoma Supercomputing Symposium 2012 Our third course: CS4973 Special Studies – whatever we want... pondering content at this time...

Oklahoma Supercomputing Symposium 2012 Thank You! We especially thank Henry Neeman, Charlie Peck, Tom Murphy, Bob Panoff and everyone else associated with OU IT, the LittleFe project and all of our colleagues and friends in the educational community involved with HPC, for all the help we have received. Karl Frinkle – Mike Morris