Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters.


1 Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters Programming methods, languages, and environments Message passing (SR, MPI, Java) Higher-level language: HPF Applications N-body problems, search algorithms Grid computing Multimedia content analysis on Grids (guest lecture Frank Seinstra) Many-core (GPU, Cell) programming (guest lectures Ana Varbanescu and Rob van Nieuwpoort)

2 Parallel Programming on Computational Grids

3 Outline Grids Parallel programming on grids Case study: Ibis Paper: –Real-World Distributed Computing with Ibis,

4 IEEE Computer Aug. 2010

5

6 Grids Seamless integration of geographically distributed computers, databases, instruments –The name is an analogy with power grids Highly active research area –Open Grid Forum –Globus middleware –Many European projects, e.g.: Gridlab: Grid Application Toolkit and Testbed DEISA: Distributed European Infrastructure for Supercomputing Applications XtreemOS: Linux-based OS for grids Contrail (1 Oct 2010): Cloud computing –VL-e (Virtual laboratory for e-Science) project –….

7 Why Grids? New distributed applications that use data or instruments across multiple administrative domains and that need much CPU power –Computer-enhanced instruments –Collaborative engineering –Browsing of remote datasets –Use of remote software –Data-intensive computing –Very large-scale simulation –Large-scale parameter studies

8 Web, Grids, Clouds and e-Science Web is about exchanging information Grid is about sharing resources –Computers, databases, instruments Clouds are about hiring resources –Pay-as-you-go, virtualization e-Science supports experimental science by providing a virtual laboratory on top of Grids & Clouds –Support for visualization, workflows, data management, security, authentication, high-performance computing

9 The big picture [Figure: three applications, each with "Management of comm. & computing" and a "Potential generic part"; layers labeled Application, Virtual Laboratory (application-oriented services), and Grids (harness distributed resources)]

10 The data explosion e-Science experiments generate much data that is often distributed and that needs much (parallel) processing –High-resolution imaging: ~ 1 GByte per measurement –Bio-informatics queries: 500 GByte per database –Satellite world imagery: ~ 5 TByte/year –Current particle physics: 1 PByte per year –LHC physics: 10-30 PByte per year

11 Distributed supercomputing Parallel processing on geographically distributed computing systems (grids) Examples: –SETI@home, RSA-155, Entropia, Cactus Mostly limited to trivially parallel applications Questions: –Can we generalize this to more HPC applications? –What high-level programming support is needed?

12 Grids usually are hierarchical –Collections of clusters, supercomputers –Fast local links, slow wide-area links Can optimize algorithms to exploit this hierarchy –Minimize wide-area communication Wide-area bandwidth is increasing –DAS-3/DAS-4 have 10 Gb/s dedicated optical links between the sites –Wide-area latency remains high (limited by speed-of-light) Speedups on a grid?

13 Example: N-body simulation Much wide-area communication –Each node needs info about remote bodies [Figure: CPU 1 and CPU 2 in Amsterdam, CPU 1 and CPU 2 in Delft]

14 Trivial optimization [Figure: CPU 1 and CPU 2 in Amsterdam, CPU 1 and CPU 2 in Delft]
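
The figure for this slide did not survive in the transcript, so the following is only a sketch under an assumption: that the optimization sends each CPU's body data across the wide-area link once per remote cluster, where a local node forwards it over the fast local network. The class and method names are illustrative, not from the course material.

```java
// Counts wide-area messages per N-body time step for two clusters.
// ASSUMPTION (not stated on the slide): in the optimized version each CPU
// sends its bodies across the wide-area link once, and a node in the
// remote cluster forwards them locally.
class WideAreaMessages {
    // Naive: every CPU sends its bodies to every CPU in the other cluster,
    // in both directions.
    static int naive(int cpusA, int cpusB) {
        return 2 * cpusA * cpusB;
    }

    // Optimized: every CPU sends its bodies to the other cluster only once.
    static int optimized(int cpusA, int cpusB) {
        return cpusA + cpusB;
    }

    public static void main(String[] args) {
        System.out.println("naive:     " + naive(2, 2));
        System.out.println("optimized: " + optimized(2, 2));
    }
}
```

Under this assumption the wide-area traffic grows with the sum of the cluster sizes rather than their product, which matters more as clusters get larger.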

15 Wide-area optimizations Message combining on wide-area links Latency hiding on wide-area links Collective operations for wide-area systems –Broadcast, reduction, all-to-all exchange Load balancing Conclusions: –Many applications can be optimized to run efficiently on a hierarchical wide-area system –Need better programming support
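
To make the benefit of message combining concrete, here is a minimal cost-model sketch. It assumes blocking sends with no pipelining, so each message pays the full one-way latency. The 10 ms latency in the usage note is an illustrative value, not a measurement from the slides, and the names are hypothetical.

```java
// Simple cost model for message combining on a high-latency link.
// ASSUMPTION: sends are blocking and not pipelined, so each message
// pays the full latency before the next one starts.
class MessageCombining {
    // Time to send n messages of `bytes` each, one after another.
    static double separateMs(int n, double bytes, double latencyMs, double bytesPerMs) {
        return n * (latencyMs + bytes / bytesPerMs);
    }

    // Time to send the same data as a single combined message.
    static double combinedMs(int n, double bytes, double latencyMs, double bytesPerMs) {
        return latencyMs + (n * bytes) / bytesPerMs;
    }

    public static void main(String[] args) {
        double bandwidth = 1.25e6; // 10 Gb/s expressed in bytes per millisecond
        System.out.println(separateMs(100, 1000, 10, bandwidth));
        System.out.println(combinedMs(100, 1000, 10, bandwidth));
    }
}
```

With 100 messages of 1000 bytes, a 10 Gb/s link, and an assumed 10 ms latency, the separate sends take about 1000 ms and the combined send about 10 ms: the latency term dominates, so higher bandwidth alone does not help.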

16 Outline Grids Case study: Ibis Paper: –Real-World Distributed Computing with Ibis, IEEE Computer, Aug. 2010

17 The Ibis system High-level & efficient programming support for distributed supercomputing Use Java-centric approach + JVM technology –Inherently more portable than native compilation Goal: drastically simplify programming and deployment of high performance distributed applications Target: –Large-scale distributed systems, including clusters, grids, desktop grids, clouds, mobile devices …. –Possibly all at the same time for 1 application

18 Real-world distributed systems

19 World wide testbed

20 Problem How to write (high-performance) applications for real-world distributed systems? How to deal with: –Performance: efficiency on wide-area system –Heterogeneity: different systems & APIs –Malleability: resources come and go –Fault tolerance: crashes –Connectivity: firewalls, NAT, etc.

21 Our approach Study fundamental underlying problems … hand-in-hand with realistic applications … integrate solutions in one system: Ibis [Figure labels: Distributed Systems, User]

22 Applications Scientific applications –Imaging (VU Medical Center, AMOLF) –Bioinformatics (sequence analysis) –Astronomy (data analysis challenge) Multimedia content analysis Games and model checking Semantic web (distributed reasoning)

23 Multimedia content analysis Automatically extract information from images & video –E.g., video archive, surveillance cameras Extract feature vectors from images –Describe properties (color, shape) –Data-parallel task on a cluster Compute on consecutive images –Task-parallelism on a grid
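
As a small illustration of the data-parallel step, the sketch below computes a very simple feature vector (mean red, green, and blue) over an image. It is not the feature extraction used in the course: the packed 0xRRGGBB pixel format and the choice of mean color are assumptions, and a parallel stream inside one JVM stands in for data parallelism on a cluster.

```java
import java.util.stream.IntStream;

// Illustrative feature extraction: mean color of an image.
// ASSUMPTIONS: pixels are packed as 0xRRGGBB ints; mean color is only a
// toy feature; a JVM parallel stream stands in for cluster parallelism.
class FeatureVector {
    static double[] meanColor(int[] pixels) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("empty image");
        }
        double n = pixels.length;
        // Each channel sum is an independent, data-parallel reduction.
        long r = IntStream.of(pixels).parallel().mapToLong(p -> (p >> 16) & 0xFF).sum();
        long g = IntStream.of(pixels).parallel().mapToLong(p -> (p >> 8) & 0xFF).sum();
        long b = IntStream.of(pixels).parallel().mapToLong(p -> p & 0xFF).sum();
        return new double[] { r / n, g / n, b / n };
    }
}
```

Consecutive frames could each be handed to a call like this independently, which corresponds to the task-parallel level mentioned on the slide.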

24 Example: object recognition ● Analyze video stream from camera to learn and recognize everyday objects ● Representative of more serious applications ● Same algorithms used for surveillance cameras ● London Underground: >120,000 years of processing for tens of thousands of CCTV cameras

25 Games and Model Checking Can solve entire Awari game on wide-area DAS-3 (889 billion positions) –Needs 10G private optical network Distributed model checking has very similar communication pattern –Search huge state spaces, random work distribution, bulk asynchronous transfers Can efficiently run DiVinE model checker on wide-area DAS-3, use up to 1 TB memory

26 Distributed reasoning MaRVIN (Frank van Harmelen et al, VU): –A distributed platform for massive RDF inferencing (deductive closure) –"a brain the size of a planet" Uses Ibis to run on heterogeneous systems (clusters, desktop grids) Used for Billion Triple track of Semantic Web Challenge 2008 –Inputs 800M RDF triples, derives 29B triples

27 WebPIE Web-scale Parallel Inference Engine MapReduce-based distributed RDFS/OWL inference engine using Hadoop (not yet Ibis) Used up to 100 Billion triples Won CCGrid SCALE2010 Award

28 Awards SCALE 2008 (CCGrid’08) DACH 2008 – BS DACH 2008 - FT AAAI-VC 2007 ISWC 2008 / SCALE 2010 Multimedia Computing Astronomy Semantic Web (van Harmelen et al.) (Cluster/Grid’08)

29 Ibis Philosophy Real-world distributed applications should be developed and compiled on a local workstation, and simply be launched from there

30 Ibis Approach Virtual Machines (Java) deal with heterogeneity Provide range of programming abstractions Designed for dynamic/faulty environments Easy deployment through middleware-independent programming interfaces Modular and flexible: can replace Ibis components by external ones

31 Ibis Design Applications need functionality for –Programming (as in programming languages) –Deployment (as in operating systems) Programming: logical, likes math Deployment: practical, visual (GUI)

32 Ibis System

33 Ibis brains

34 Programming system

35 Programming models Message passing (IPL, RMI, MPJ) Satin: Fault-tolerant, malleable divide-and-conquer system Jorus: Transparent library with multimedia operations Maestro: Self-optimizing fault-tolerant dataflow framework

36 Satin: a parallel divide-and-conquer system on top of Ibis Divide-and-conquer is inherently hierarchical More general than master/worker Satin: Cilk-like primitives (spawn/sync) in Java

37 Example: single-threaded Java
interface FibInter {
    public int fib(int n);
}
class Fib implements FibInter {
    // Plain recursive Fibonacci; runs on one thread.
    public int fib(int n) {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }
}

38 Example: Java + divide&conquer
// Methods declared in an interface that extends Spawnable may run in parallel.
interface FibInter extends ibis.satin.Spawnable {
    public int fib(int n);
}
class Fib extends ibis.satin.SatinObject implements FibInter {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);  // spawned
        int y = fib(n - 2);  // spawned
        sync();              // wait until x and y are available
        return x + y;
    }
}
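
The Satin example needs the Ibis libraries to run. As a stand-in that runs on a stock JDK, the same spawn/sync structure can be sketched with the standard fork/join framework. This is an analogy only, not Satin: it parallelizes within one JVM, whereas Satin distributes spawned work across machines. The class name `FibTask` is illustrative.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// JDK fork/join analogue of the spawn/sync pattern (NOT the Satin API).
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n < 2) return n;
        FibTask left = new FibTask(n - 1);
        left.fork();                           // comparable to a spawned call
        int y = new FibTask(n - 2).compute();  // do the other half ourselves
        int x = left.join();                   // comparable to sync()
        return x + y;
    }

    static int fib(int n) {
        return ForkJoinPool.commonPool().invoke(new FibTask(n));
    }
}
```

Calling `FibTask.fib(20)` returns the same value as the single-threaded version; in practice a sequential cutoff for small `n` would be needed to keep task overhead reasonable.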

39 IPL (Ibis Portability Layer) Java-centric “run-anywhere” library Point-to-point, multicast, streaming Simple model for tracking resources –Join-Elect-Leave –Supports malleability & fault-tolerance
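
The Join-Elect-Leave model can be pictured with a small in-memory sketch. This is a hypothetical toy, not the IPL API: the class `Membership` and its methods are invented for illustration, and a real registry would be distributed and would detect crashes itself.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of join/elect/leave. NOT the IPL API.
class Membership {
    private final Set<String> members = new LinkedHashSet<>();
    private final Map<String, String> elected = new HashMap<>();

    // A new resource joins the running computation.
    synchronized void join(String id) {
        members.add(id);
    }

    // The first member to ask for a role wins; later callers learn the winner.
    synchronized String elect(String role, String candidate) {
        if (!members.contains(candidate)) {
            throw new IllegalStateException("not a member: " + candidate);
        }
        return elected.computeIfAbsent(role, r -> candidate);
    }

    // Leaving (or being declared dead) removes the member and frees its roles,
    // so a new election can succeed afterwards.
    synchronized void leave(String id) {
        members.remove(id);
        elected.values().removeIf(id::equals);
    }

    synchronized int size() {
        return members.size();
    }
}
```

Because roles are released when a member leaves, the same three operations cover both malleability (resources come and go) and recovery from a crashed coordinator.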

40 SmartSockets library Detects connectivity problems Tries to solve them automatically With as little help from the user as possible Integrates existing and several new solutions Reverse connection setup, STUN, TCP splicing, SSH tunneling, smart addressing, etc. Uses network of hubs as a side channel

41 SmartSockets

42 Ibis Deployment system

43 IbisDeploy GUI

44 JavaGAT GAT: Grid Application Toolkit –Makes grid applications independent of the underlying grid infrastructure Used by applications to access grid services –File copying, resource discovery, job submission & monitoring, user authentication Successor API is currently being standardized

45 Grid Applications with GAT [Figure: a Grid Application calls File.copy(...) and submitJob(...) on the GAT; the GAT Engine (Remote Files, Monitoring, Info service, Resource Management) uses intelligent dispatching over GridLab, Globus, Unicore, SSH, P2P, and Local; other labels: gridftp, globus, Koala]
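
The idea of a middleware-independent call that is dispatched to whichever backend works can be sketched as follows. This is a hypothetical illustration, not the JavaGAT API: `FileCopyAdaptor` and `CopyEngine` are invented names, and the real engine's selection policy may differ from the simple try-in-order loop assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical adaptor interface; NOT part of JavaGAT.
interface FileCopyAdaptor {
    String name();
    void copy(String source, String destination) throws Exception;
}

// Hypothetical engine: the application calls copy() once and the engine
// tries each available adaptor until one succeeds.
class CopyEngine {
    private final List<FileCopyAdaptor> adaptors;

    CopyEngine(List<FileCopyAdaptor> adaptors) {
        this.adaptors = adaptors;
    }

    // Returns the name of the adaptor that completed the copy.
    String copy(String source, String destination) {
        List<String> failures = new ArrayList<>();
        for (FileCopyAdaptor adaptor : adaptors) {
            try {
                adaptor.copy(source, destination);
                return adaptor.name();
            } catch (Exception e) {
                // Remember why this adaptor failed and fall through to the next.
                failures.add(adaptor.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("all adaptors failed: " + failures);
    }
}
```

The application code stays the same whether the copy ends up going through a grid file-transfer service or plain SSH; only the list of adaptors changes per site.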

46 JavaGat example

47 Zorilla: Java P2P supercomputing middleware

48 Ibis Architecture

49 Ibis demo (movie)

50 Object recognition Client Broker Servers Ibis (Java) Runs simultaneously on clusters (DAS-3, Japan, Australia), Desktop Grid, Amazon EC2 Cloud Connectivity problems solved automatically by Ibis SmartSockets

51 Ibis movie (part 1)

52 Performance on 1 DAS-3 cluster Relative speedups of Java/Ibis and C++/MPI –Using TCP or Myricom’s MX protocol Sequential performance Java: 88% of C++

53 [Figure: DAS-3]

54 Speedup (wide-area) Homogeneous wide-area systems (DAS-3): –Frame rate increases linearly with #clusters World-wide experiment : –24 frames per second (@ 640 x 480 resolution) –Speed limited by camera, not computing infrastructure

55 Smart Phones GSM + PC + GPS + camera + networks + …. Will become ubiquitous (like GSMs) Our goal: study distributed applications running on (multiple) smart phones & other resources

56 Distributed smart phone applications Current model: client/server –Client runs on the phone –Server runs in a cloud provided by developer Disadvantages –User depends on service provider –Developer must deal with scalability, cost etc

57 Cyber Foraging "Dynamically augment the computing resources of a wireless mobile computer by exploiting wired hardware infrastructure (surrogates)" –"Living off the land" [Satyanarayanan, IEEE Pers. Comm. 2001] Surrogates –Any PC, cluster, grid, cloud … –No pre-installed application code –Can be used for different applications Requires deployment and communication systems → Ibis

58 Cyber Foraging with Ibis Implemented Ibis on Android –Google’s open-source Java-based platform Ibis deployment system –JavaGAT (SSH adaptor) –IbisDeploy library + GUI Ibis programming system –SmartSockets library –IPL + Jorus multimedia library

59 Application: eyeDentify Object recognition on a G1 smartphone Smartphone is too limited for the application –Can reduce accuracy parameters of the algorithm –Can run only up to 128 x 96 pixels (memory bound)

60 eyeDentify with cyber foraging Ibis cyber foraging version [ISM’09] –Deploys computation server (with high accuracy and large images) on a surrogate (DAS-3 cluster) –Launched from IbisDeploy/eyeDentify client on phone

61 Comparison Response time for 64 x 48 pixels –Standalone version: 32 sec –Foraging: 0.54 sec (0.12 sec computation) Response time for 2048 x 1536 pixels –Standalone: would take ~ 20 minutes with enough memory –Foraging: 6.5 sec (4.9 sec computation) Foraging version is 40x more energy-efficient
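
The response times above can be turned into speedup and overhead figures with a few lines of arithmetic. The sketch uses only the numbers on the slide; treating "response time minus computation time" as communication and deployment overhead is an interpretation, and the names are illustrative.

```java
// Arithmetic on the response times quoted on the slide.
class ForagingSpeedup {
    // How many times faster the foraging version responds.
    static double speedup(double standaloneSec, double foragingSec) {
        return standaloneSec / foragingSec;
    }

    // Part of the response time not spent computing (interpreted here as
    // communication and other overhead; the slide does not break it down).
    static double overheadSec(double responseSec, double computeSec) {
        return responseSec - computeSec;
    }

    public static void main(String[] args) {
        System.out.println("64 x 48 speedup:      " + speedup(32, 0.54));
        System.out.println("2048 x 1536 speedup:  " + speedup(20 * 60, 6.5));
        System.out.println("64 x 48 overhead:     " + overheadSec(0.54, 0.12));
        System.out.println("2048 x 1536 overhead: " + overheadSec(6.5, 4.9));
    }
}
```

This gives roughly a 59x faster response for the small image and roughly 185x for the large one, with about 0.42 s and 1.6 s respectively spent outside computation.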

62 Other distributed applications Disaster management (Katrina) –Use ad-hoc Wifi network when GSM network fails –Finding nearby people with certain skills Bus drivers, CPR –Distributed decision support Moving people to shelters (logistics) Social networks –Similar issues Find nearby friends, decide on restaurant

63 Another serious app Track position → automatic diary of your life Cross-comparisons between diaries Haven’t we met before? Yes, on 23 Oct 2010, 3.48 pm at N 52°22.688´ E 004°53.990´

64 Interdroid Distributed Communication Data Management Novel Mobile Distributed Applications Context Sensitive Programming Models

65 Summary Parallel computing on Grids (distributed supercomputing) is a challenging and promising research area Ibis: a Java-centric Grid programming environment Extends to the mobile world

