CS 240A Applied Parallel Computing John R. Gilbert Thanks to Kathy Yelick and Jim Demmel at UCB for.

Presentation transcript:

CS 240A Applied Parallel Computing John R. Gilbert Thanks to Kathy Yelick and Jim Demmel at UCB for some of their slides.

Course bureaucracy. Read the course home page. Join the Google discussion group (see course home page). Accounts on Triton, San Diego Supercomputer Center: run “ssh-keygen -t rsa” and then send your “id_rsa.pub” file to Stefan Boeriu. If you weren’t signed up for the course as of last week, send me your registration info right away. Triton logon demo and tool intro coming soon; watch the Google group for details.

Homework 1. See course home page for details. Find an application of parallel computing and build a web page describing it. Choose something from your research area, or from the web or elsewhere. On the web page: describe the application and provide a reference (or link); describe the platform where this application was run; find peak and LINPACK performance for the platform and its rank on the TOP500 list; find the performance of your selected application. What ratio of sustained to peak performance is reported? Evaluate the project: How did the application scale, i.e., was speed roughly proportional to the number of processors? What were the major difficulties in obtaining good performance? What tools and algorithms were used? Send us (John and Matt) the link; we will post them. Due next Monday, April 4.
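The two quantities the homework asks about, the sustained-to-peak ratio and how well speed scales with processor count, can be computed as in the minimal sketch below. The function names and all numbers are hypothetical and chosen only for illustration; they do not describe any real machine or application.

```python
def sustained_to_peak(sustained_flops, peak_flops):
    """Fraction of the platform's theoretical peak that the application achieves."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return sustained_flops / peak_flops


def speedup(time_one_proc, time_p_procs):
    """How many times faster the run on p processors is than the run on one."""
    return time_one_proc / time_p_procs


def efficiency(time_one_proc, time_p_procs, p):
    """Speedup divided by processor count; 1.0 means speed is proportional to p."""
    return speedup(time_one_proc, time_p_procs) / p


# Hypothetical example: 1.2 TFLOPS sustained on a 2.0 TFLOPS (peak) platform,
# and a job that takes 100 s on 1 processor and 2 s on 64 processors.
ratio = sustained_to_peak(1.2e12, 2.0e12)
eff = efficiency(100.0, 2.0, 64)
print(f"sustained/peak = {ratio:.2f}, parallel efficiency = {eff:.3f}")
```

An efficiency near 1.0 is what "speed roughly proportional to the number of processors" means in practice; values well below 1.0 point to communication or load-balance costs.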

Why are we here? Computational science: The world’s largest computers have always been used for simulation and data analysis in science and engineering. Performance: Getting the most computation for the least cost (in time, hardware, or energy). Architectures: All big computers (and most little ones) are parallel. Algorithms: The building blocks of computation.

Parallel Computers Today. Oak Ridge / Cray Jaguar: > 1.75 PFLOPS. Two Nvidia 8800 GPUs: > 1 TFLOPS. Intel 80-core chip: > 1 TFLOPS. TFLOPS = $10^{12}$ floating point ops/sec. PFLOPS = 1,000,000,000,000,000 floating point ops/sec ($10^{15}$).

Supercomputers 1976: Cray-1, 133 MFLOPS ($10^{6}$)

Trends in processor clock speed

AMD Opteron 12-core chip

Generic Parallel Machine Architecture. Key architecture question: Where is the interconnect, and how fast? Key algorithm question: Where is the data? [Figure: storage hierarchy for several processors, each with Proc, Cache, L2 Cache, L3 Cache, and Memory, with potential interconnects between the hierarchies.]

4-core Intel Nehalem chip (2 per Triton node):

Triton memory hierarchy. [Figure: one node with shared node memory and two chips; each chip has four processors, each with its own cache and L2 cache, sharing one L3 cache per chip.]

One kind of big parallel application. Example: Bone density modeling. Physical simulation. Lots of numerical computing. Spatially local. See Mark Adams’s slides…

“The unreasonable effectiveness of mathematics.” As the “middleware” of scientific computing, linear algebra has supplied or enabled: mathematical tools; an “impedance match” to computer operations; high-level primitives; high-quality software libraries; ways to extract performance from computer architecture; interactive environments. [Figure: linear algebra sits between continuous physical modeling and computers.]

Top 500 List (November 2010). Top500 Benchmark: Solve a large system of linear equations by Gaussian elimination, $PA = LU$.
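To make the benchmark's core operation concrete, here is a minimal serial sketch of Gaussian elimination with partial pivoting, which factors a row-permuted matrix into a unit lower triangular L and an upper triangular U and then solves by substitution. It is plain Python for illustration only; the actual benchmark runs a blocked, distributed-memory implementation, and the function names `lu_partial_pivot` and `solve` are hypothetical.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that L times U equals A with rows reordered by perm.

    Returns (perm, L, U): row i of the permuted matrix is row perm[i] of A.
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            u[k], u[pivot] = u[pivot], u[k]
            l[k], l[pivot] = l[pivot], l[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        l[k][k] = 1.0
        # Eliminate column k below the diagonal, recording the multipliers in L.
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return perm, l, u


def solve(a, b):
    """Solve A x = b using the factorization above."""
    perm, l, u = lu_partial_pivot(a)
    n = len(a)
    # Forward substitution: L y = P b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(l[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


# Hypothetical 3x3 system whose exact solution is x = (1, 2, 3).
A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
b = [7, 19, 49]
print(solve(A, b))
```

The triple loop does about 2n³/3 floating-point operations on a dense n-by-n matrix, with regular, spatially local memory access, which is why this kind of computation can run close to a machine's peak rate.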

Large graphs are everywhere… Internet structure. Social interactions. Scientific datasets: biological, chemical, cosmological, ecological, … [Figures: WWW snapshot, courtesy Y. Hyun; yeast protein interaction network, courtesy H. Jeong.]

Another kind of big parallel application. Example: Vertex betweenness centrality. Exploring an unstructured graph. Lots of pointer-chasing. Little numerical computing. No spatial locality. See Eric Robinson’s slides…

Social network analysis. Betweenness Centrality (BC), $C_B(v)$: Among all the shortest paths, what fraction of them pass through the node of interest? Computed with Brandes’ algorithm. [Figure: a typical software stack for an application enabled with the Combinatorial BLAS.]
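As a rough illustration of the betweenness computation named on this slide, the sketch below follows the general shape of Brandes' algorithm for an unweighted, undirected graph: one breadth-first search per source that counts shortest paths, then a backward pass that accumulates dependencies. It is a serial, dictionary-based sketch, not the Combinatorial BLAS implementation; the function name and the small test graph are hypothetical, and scores are left unnormalized (the sum over vertex pairs of the fraction of shortest paths through v).

```python
from collections import deque


def betweenness_centrality(adj):
    """adj maps each vertex to an iterable of its neighbors (undirected graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # vertices in order of discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies from the farthest vertices in.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical 4-cycle a-b-c-d-a: every vertex carries half of one pair's paths.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(betweenness_centrality(cycle))
```

Nearly all of the work is following adjacency lists and updating counters, which matches the "lots of pointer-chasing, little numerical computing" character described for this class of application.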

An analogy? Linear algebra sits between continuous physical modeling and computers. Does graph theory sit between discrete structure analysis and computers in the same way?

Node-to-node searches in graphs… Who are my friends’ friends? How many hops from A to B? (Six degrees of Kevin Bacon.) What’s the shortest route to Las Vegas? Am I related to Abraham Lincoln? Who likes the same movies I do, and what other movies do they like? See breadth-first search example slides.
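Two of these questions, "how many hops from A to B?" and "who are my friends' friends?", can be sketched with a breadth-first search over an adjacency-list graph. This is a minimal serial sketch; the function names and the tiny friendship graph are hypothetical and only illustrate the idea.

```python
from collections import deque


def hops(adj, source, target):
    """Fewest edges on any path from source to target, or None if unreachable."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:             # first visit is along a shortest path
                dist[w] = dist[v] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return None


def friends_of_friends(adj, person):
    """Vertices exactly two hops away: friends of friends who are not friends."""
    direct = set(adj.get(person, ()))
    second = set()
    for friend in direct:
        second.update(adj.get(friend, ()))
    return second - direct - {person}


# Hypothetical friendship graph.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "zed": [],
}
print(hops(friends, "ann", "eve"))
print(friends_of_friends(friends, "ann"))
```

Breadth-first search visits vertices in order of distance from the source, so the first time the target is reached its distance is already the minimum hop count.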

Graph 500 List (November 2010). Graph500 Benchmark: Breadth-first search in a large power-law graph.

Floating-Point vs. Graphs. Dense linear algebra ($PA = LU$): 2.5 Petaflops. Graph search: 6.6 Gigateps. 2.5 Peta / 6.6 Giga is about 380,000!
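The ratio quoted on this slide can be checked with a few lines of arithmetic. The variable names below are hypothetical; the two rates are the figures given on the slide, where TEPS counts traversed edges per second and FLOPS counts floating-point operations per second.

```python
PETA = 10**15
GIGA = 10**9

linpack_flops = 2.5 * PETA    # floating-point operations per second
graph500_teps = 6.6 * GIGA    # traversed edges per second

ratio = linpack_flops / graph500_teps
print(f"{ratio:,.0f}")        # a little under 380,000
```

The two benchmarks count different units of work, so the ratio is a rough indicator of the gap rather than a like-for-like comparison.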

An analogy? Well, we’re not there yet… For discrete structure analysis, graph theory, and computers: ✓ Mathematical tools. ? “Impedance match” to computer operations. ? High-level primitives. ? High-quality software libs. ? Ways to extract performance from computer architecture. ? Interactive environments.