Seminar on parallel computing
Goal: provide an environment for the exploration of parallel computing
Driven by participants
Weekly hour for discussion, show & tell
Focus primarily on distributed-memory computing on Linux PC clusters
Target audience:
–Experience with Linux computing & Fortran/C
–Requires parallel computing for own studies
1 credit possible for completion of a 'proportional' project

Main idea
Distribute a job over multiple processing units
Do bigger jobs than is possible on a single machine
Solve bigger problems faster
Resources: e.g., www-jics.cs.utk.edu

Sequential limits
Moore's law
Clock speed is physically limited:
–Speed of light
–Miniaturization; heat dissipation; quantum effects
Memory addressing:
–32-bit words in PCs: 2^32 bytes = 4 Gbyte RAM max.

Machine architecture: serial
–Single processor
–Hierarchical memory:
   Small number of registers on the CPU
   Cache (L1/L2)
   RAM
   Disk (swap space)
–Operations require multiple steps:
   Fetch two floating-point numbers from main memory
   Add and store
   Put the result back into main memory
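
The hierarchy matters in practice: the same arithmetic runs much slower when operands come from RAM instead of cache. A minimal C sketch (not from the slides; the array size is arbitrary): both loops below sum the same matrix, but the second walks it column by column, so nearly every access misses the cache.

    #include <stdio.h>

    #define N 2000

    static double a[N][N];

    int main(void) {
        double sum = 0.0;

        /* Row-major traversal: consecutive addresses, cache-friendly */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Column-major traversal: large strides, mostly cache misses */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("sum = %f\n", sum);
        return 0;
    }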

Vector processing
Speed up single instructions on vectors:
–E.g., while adding two floating-point numbers, fetch two new ones from main memory
–Push vectors through the pipeline
Useful in particular for long vectors
Requires good memory control:
–A bigger cache is better
Common on most modern CPUs:
–Implemented in both hardware and software
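
The kind of loop this targets looks like the sketch below: an element-wise kernel whose iterations are independent, so a vectorizing compiler can keep the pipeline full without source changes. (A hedged C illustration; the function name is ours, not from the slides.)

    /* y(i) = a*x(i) + y(i) -- the classic "saxpy" kernel. Iterations are
     * independent, so while one pair of elements is being multiplied and
     * added, the next pair can already be fetched from memory. */
    void saxpy(int n, float a, const float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }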

SIMD
The same instruction works simultaneously on different data sets
Extension of vector computing
Example:

    DO IN PARALLEL for i = 1, n
        x(i) = a(i) * b(i)
    DONE PARALLEL
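
Modern x86 CPUs expose the same idea through SIMD extensions such as SSE. A minimal C sketch of the slide's loop using SSE intrinsics (assuming, for brevity, that n is a multiple of 4):

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* x(i) = a(i) * b(i), four floats per instruction */
    void multiply(int n, const float *a, const float *b, float *x) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);            /* load a(i..i+3) */
            __m128 vb = _mm_loadu_ps(&b[i]);            /* load b(i..i+3) */
            _mm_storeu_ps(&x[i], _mm_mul_ps(va, vb));   /* multiply, store */
        }
    }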

MIMD
Multiple instruction, multiple data
Most flexible; encompasses SIMD and serial
Often best for 'coarse-grained' parallelism
Message passing
Example: domain decomposition (see the sketch below)
–Divide the computational grid into equal chunks
–Work on each domain with one CPU
–Communicate boundary values when necessary
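
A hedged sketch of the message-passing side of domain decomposition, using MPI (introduced later in these slides). Each process owns a chunk of a 1D grid plus a 'ghost' cell at either end, and swaps boundary values with its neighbors; names and sizes are illustrative only.

    #include <mpi.h>

    #define NLOCAL 1000  /* interior points owned by this process */

    int main(int argc, char **argv) {
        double u[NLOCAL + 2];  /* u[0] and u[NLOCAL+1] are ghost cells */
        int rank, size, left, right;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Neighbors in a 1D decomposition; MPI_PROC_NULL makes the
         * exchange a no-op at the physical boundaries. */
        left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int i = 0; i < NLOCAL + 2; i++)
            u[i] = rank;  /* placeholder for real computation */

        /* Send first interior value left, receive right ghost cell. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send last interior value right, receive left ghost cell. */
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }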

Historical machines
1976: Cray-1 at Los Alamos (vector)
1980s: Control Data Cyber 205 (vector)
1980s: Cray X-MP
–4 coupled Cray-1s
1985: Thinking Machines Connection Machine
–SIMD, up to 64k processors
NEC/Fujitsu/Hitachi
–Automatic vectorization

Sun and SGI (1990s)
Scaling between desktops and compute servers
–Use of both vectorization and large-scale parallelization
–RISC processors
–SPARC for Sun
–MIPS for SGI: PowerChallenge/Origin

Happy developments
High Performance Fortran / Fortran 90
Standard definitions for message passing:
–PVM
–MPI
Linux
Performance increase of commodity CPUs
The combination leads to affordable cluster computing

Who's the biggest?
Linpack (dense linear algebra) benchmark, June 2003:
–Earth Simulator, Yokohama, NEC: 36 Tflops
–ASCI Q, Los Alamos, HP: 14 Tflops
–Linux cluster, Livermore: 8 Tflops

Parallel approaches
Embarrassingly parallel (see the sketch below):
–'Monte Carlo' searches
–@home projects: analyze lots of small time series
Parallelize DO-loops in dominantly serial code
Domain decomposition:
–Fully parallel
–Requires complete rewrite/rethinking
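
A hedged sketch of the embarrassingly parallel case: each MPI process draws its own Monte Carlo samples (here estimating pi, as a stand-in for any independent search) and the only communication is a single reduction at the end. The sample count and seeding are illustrative.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int rank, size;
        long hits = 0, total = 0;
        const long samples = 1000000;  /* per-process sample count */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand(rank + 1);  /* crude per-process seeding */
        for (long i = 0; i < samples; i++) {
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }

        /* The only communication: sum per-process counts on rank 0. */
        MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi ~ %f\n", 4.0 * total / ((double)samples * size));

        MPI_Finalize();
        return 0;
    }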

Example: seismic wave propagation
3D spherical wave propagation modeled with a high-order finite element technique (Komatitsch and Tromp, GJI, 2002)
Massively parallel computation on Linux PC clusters
Approx. 34 Gbyte RAM needed for 10 km average resolution

Resolution
Spectral elements: 10 km average resolution
4th-order interpolation functions
Reasonable graphics resolution: 10 km or better
Memory grows with the cube of the inverse element size, so halving the spacing costs a factor of 8:
–12 km: 1 GB
–6 km: 8 GB

Simulated EQ (d = 15 km) after 17 minutes
[Figure: particle velocity on a log10 scale, positive values only, truncated maximum, 512x colors; labeled phases: P, PP, PPP, PKP, PKPab, PKIKP, SK]

[Figure: same rendering (particle velocity, log10 scale, positive only, truncated maximum, 512x colors), showing some S component; labeled phases: S, SS, PcS, PcSS, PKS, R]

Resources at UM
Various Linux clusters in Geology:
–Agassiz (Ehlers): 8 Pentiums, 2 Gbyte each
–Panoramix (van Keken): ... Gbyte
–Trans (van Keken, Ehlers): 24, 2 Gbyte each
SGIs:
–Origin 2000 (Stixrude, Lithgow, van Keken)
Center for Advanced Computing, UM:
–Athlon clusters (384, 1 Gbyte each)
–Opteron cluster (to be installed)
NPACI

Software resources
GNU and Intel compilers
–Fortran/Fortran 90/C/C++
MPICH (www-fp.mcs.anl.gov)
–Primary implementation of MPI
–"Using MPI", 2nd edition, Gropp et al., 1999
Sun Grid Engine
PETSc (www-fp.mcs.anl.gov)
–Toolbox for parallel scientific computing
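
As a first test of such a toolchain, a minimal MPI program of the kind MPICH builds and runs (a hedged sketch; the compiler wrapper and launcher below are the usual MPICH commands):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Typically compiled and launched with something like: mpicc hello.c -o hello, then mpirun -np 8 ./hello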