Denis Caromel 1 A Strong Programming Model Bridging Distributed and Multi-Core Computing 1.Background: INRIA, Univ. Nice, OASIS.




2 Denis Caromel 1 Denis.Caromel@inria.fr A Strong Programming Model Bridging Distributed and Multi-Core Computing
1. Background: INRIA, Univ. Nice, OASIS Team
2. Programming: Parallel Programming Models:
   a) Asynchronous Active Objects, Futures, Typed Groups
   b) High-Level Abstractions (OO SPMD, Comp., Skeleton)
3. Optimizing
4. Deploying

3 Denis Caromel 2 1. Background and Team at INRIA

4 Denis Caromel 3 OASIS Team & INRIA
Computer Science and Control
- 8 Centers all over France
- Workforce: 3 800
- Strong in standardization committees: IETF, W3C, ETSI, …
- Strong Industrial Partnerships
- Foster company foundation: 90 startups so far (Ilog (Nasdaq, Euronext), …, ActiveEon)
- A joint team between: INRIA, Nice Univ., CNRS
- Participation in EU projects: CoreGrid, EchoGrid, Bionets, SOA4ALL, GridCOMP (Scientific Coordinator)
- ProActive 4.0.1: Distributed and Parallel: From Multi-cores to Enterprise+Science GRIDs

5 Denis Caromel 4 Startup Company Born of INRIA. Co-developing and providing support for the Open Source ProActive Parallel Suite. Worldwide Customers (EU, Boston USA, etc.)

6 Denis Caromel 5 OASIS Team Composition (35)
Researchers (5): D. Caromel (UNSA, Det. INRIA), E. Madelaine (INRIA), F. Baude (UNSA), F. Huet (UNSA), L. Henrio (CNRS)
PhDs (11): Antonio Cansado (INRIA, Conicyt), Brian Amedro (SCS-Agos), Cristian Ruz (INRIA, Conicyt), Elton Mathias (INRIA-Cordi), Imen Filali (SCS-Agos / FP7 SOA4All), Marcela Rivera (INRIA, Conicyt), Muhammad Khan (STIC-Asia), Paul Naoumenko (INRIA/Région PACA), Viet Dung Doan (FP6 Bionets), Virginie Contes (SOA4ALL), Guilherme Pezzi (AGOS, CIFRE SCP)
PostDoc (1): Regis Gascon (INRIA)
Engineers (10): Elaine Isnard (AGOS), Fabien Viale (ANR OMD2, Renault), Franca Perrina (AGOS), Germain Sigety (INRIA), Yu Feng (ETSI, FP6 EchoGrid), Bastien Sauvan (ADT Galaxy), Florin-Alexandru Bratu (INRIA CPER), Igor Smirnov (Microsoft), Fabrice Fontenoy (AGOS), Open position (Thales)
Trainees (2): Etienne Vallette d’Osia (Master 2 ISI), Laurent Vanni (Master 2 ISI)
Assistants (2): Patricia Maleyran (INRIA), Sandra Devauchelle (I3S)
+ Visitors + Interns

7 Denis Caromel 6 ProActive Contributors

8 Denis Caromel 7 ProActive Parallel Suite: Architecture

9 Denis Caromel 8

10 9 ProActive Parallel Suite 9 Physical Infrastructure

11 Denis Caromel 10 ProActive Parallel Suite 10

12 Denis Caromel 11 2. Programming Models for Parallel & Distributed

13 Denis Caromel 12 ProActive Parallel Suite

14 Denis Caromel 13 ProActive Parallel Suite

15 Denis Caromel 14 Distributed and Parallel Active Objects

16 Denis Caromel 15 ProActive: Active Objects
A ag = newActive("A", […], VirtualNode)
V v1 = ag.foo(param);
V v2 = ag.bar(param);
...
v1.bar(); // Wait-By-Necessity
Wait-By-Necessity is a Dataflow Synchronization
[Diagram labels: JVM, Java Object, Proxy, Active Object, Future Object, Request, Req. Queue, Thread; objects A, V, ag, v1, v2; WBN!]
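The asynchronous call and wait-by-necessity behaviour on this slide can be approximated in plain Java. The following is only a sketch under stated assumptions: it uses `java.util.concurrent` rather than the ProActive API, and the names `ActiveCounter`, `add`, and `WaitByNecessityDemo` are hypothetical. A single-threaded executor stands in for the active object's request queue and thread, and `CompletableFuture.join()` stands in for the implicit wait that ProActive performs when a future's value is first used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an active object: one private thread serves
// queued requests one at a time, so its state is never shared.
class ActiveCounter {
    private final ExecutorService body = Executors.newSingleThreadExecutor();
    private int total = 0; // only ever touched by the body thread

    // Asynchronous method call: enqueue a request and return a future at once.
    CompletableFuture<Integer> add(int n) {
        return CompletableFuture.supplyAsync(() -> {
            total += n;
            return total;
        }, body);
    }

    void terminate() {
        body.shutdown();
    }
}

class WaitByNecessityDemo {
    // Issues two asynchronous calls, then blocks only where a value is needed.
    static int run() {
        ActiveCounter ag = new ActiveCounter();
        CompletableFuture<Integer> v1 = ag.add(2); // returns immediately
        CompletableFuture<Integer> v2 = ag.add(3); // queued behind the first request
        int result = v1.join() + v2.join();        // explicit wait, emulating wait-by-necessity
        ag.terminate();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Unlike this sketch, ProActive futures are transparent: the caller holds an object of type `V`, not a wrapper, and the wait happens without an explicit `join()`.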

17 Denis Caromel 16 First-Class Futures Update

18 Denis Caromel 17 Wait-By-Necessity: First Class Futures. Futures are Global Single-Assignment Variables. V = b.bar(); c.gee(V); [Diagram: active objects a, b, c and future v]

19 Denis Caromel 18 Wait-By-Necessity: Eager Forward Based. An AO forwarding a future will have to forward its value. V = b.bar(); c.gee(V); [Diagram: active objects a, b, c and future v]

20 Denis Caromel 19 Wait-By-Necessity: Eager Message Based. An AO receiving a future sends a message. V = b.bar(); c.gee(V); [Diagram: active objects a, b, c and future v]
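To illustrate what "first-class" means here, the sketch below passes an unresolved future from one activity to another without waiting on it. It is an approximation in plain `java.util.concurrent`, not ProActive code: the class name is hypothetical, the two executors merely play the roles of active objects b and c, and the update strategies named on the slides (forward based, message based) are not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FirstClassFutureDemo {
    // a calls b.bar(), obtains the future v, and hands v to c.gee(v)
    // without ever blocking on it.
    static String run() {
        ExecutorService b = Executors.newSingleThreadExecutor();
        ExecutorService c = Executors.newSingleThreadExecutor();
        try {
            // V = b.bar(): the value is produced later, on b's thread.
            CompletableFuture<String> v =
                    CompletableFuture.supplyAsync(() -> "value-from-b", b);
            // c.gee(V): c's work is chained on v and runs on c's thread
            // once v has been assigned (single assignment).
            CompletableFuture<String> r =
                    v.thenApplyAsync(x -> "c.gee(" + x + ")", c);
            return r.join(); // only the final consumer waits
        } finally {
            b.shutdown();
            c.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```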

21 Denis Caromel 20 Standard system at Runtime: No Sharing NoC: Network On Chip Proofs of Determinism

22 Denis Caromel 21 (2) ASP: Asynchronous Sequential Processes
ASP → Confluence and Determinacy
- Future updates can occur at any time
- Execution characterized by the order of request senders
- Determinacy of programs communicating over trees, …
A strong guide for implementation, Fault-Tolerance and checkpointing, Model-Checking, …

23 Denis Caromel 22 No Sharing even for Multi-Cores Related Talks at PDP 2009 SS6 Session, today at 16:00 Impact of the Memory Hierarchy on Shared Memory Architectures in Multicore Programming Models Rosa M. Badia, Josep M. Perez, Eduard Ayguade, and Jesus Labarta Realities of Multi-Core CPU Chips and Memory Contention David P. Barker

24 Denis Caromel 23 TYPED ASYNCHRONOUS GROUPS

25 Denis Caromel 24 Creating AO and Groups
A ag = newActiveGroup("A", […], VirtualNode)
V v = ag.foo(param);
...
v.bar(); // Wait-by-necessity
Group, Type, and Asynchrony are crucial for Composition
[Diagram labels: JVM, Typed Group, Java or Active Object; objects A, V]

26 Denis Caromel 25 Broadcast and Scatter
ag.bar(cg); // broadcast cg
ProActive.setScatterGroup(cg);
ag.bar(cg); // scatter cg
Broadcast is the default behavior. Use a group as parameter; scattering depends on rankings.
[Diagram: group ag receiving parameter group cg = (c1, c2, c3), once broadcast in full to every member and once scattered one element per member]
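The difference between the two modes can be sketched with ordinary Java collections. This is an illustration only, not the ProActive group API: `Member`, `broadcast`, and `scatter` are hypothetical names, the calls are synchronous for simplicity, and wrapping around when there are more parameters than members is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GroupDispatchDemo {
    // Hypothetical group member: records every parameter it is handed.
    static final class Member {
        final List<String> received = new ArrayList<>();

        void bar(List<String> params) {
            received.addAll(params);
        }
    }

    // Broadcast (the default): every member receives the whole parameter group.
    static void broadcast(List<Member> group, List<String> params) {
        for (Member m : group) {
            m.bar(params);
        }
    }

    // Scatter: the parameter at rank i goes to the member at rank i
    // (wrapping around is an assumption made for this sketch).
    static void scatter(List<Member> group, List<String> params) {
        for (int i = 0; i < params.size(); i++) {
            group.get(i % group.size()).bar(Collections.singletonList(params.get(i)));
        }
    }
}
```

With three members and `cg = (c1, c2, c3)`, `broadcast` leaves each member holding all three elements, while `scatter` leaves member i holding only its own element.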

27 Denis Caromel 26 Static Dispatch Group. ag.bar(cg); [Diagram: parameters c0 to c9 pre-assigned to the members of ag, ordered from slowest to fastest; one member ends with an empty queue]

28 Denis Caromel 27 Dynamic Dispatch Group. ag.bar(cg); [Diagram: parameters c0 to c9 dispatched to the members of ag, ordered from slowest to fastest]
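Why dynamic dispatch helps when members run at different speeds can be shown with a small deterministic simulation. This is a sketch under assumptions, not ProActive's scheduler: the class and method names are hypothetical, "static" is taken to mean that parameters are pre-assigned round-robin, and "dynamic" is taken to mean that each parameter goes to whichever member becomes idle first.

```java
class DispatchSimulation {
    // costPerTask[w] is the (assumed constant) time member w needs per task.

    // Static dispatch: tasks are split round-robin up front, so the
    // slowest member determines the overall completion time.
    static int staticMakespan(int tasks, int[] costPerTask) {
        int n = costPerTask.length;
        int worst = 0;
        for (int w = 0; w < n; w++) {
            int count = tasks / n + (w < tasks % n ? 1 : 0);
            worst = Math.max(worst, count * costPerTask[w]);
        }
        return worst;
    }

    // Dynamic dispatch: each task goes to the member that is free earliest
    // (ties go to the lowest rank), so faster members take more tasks.
    static int dynamicMakespan(int tasks, int[] costPerTask) {
        int[] busyUntil = new int[costPerTask.length];
        for (int t = 0; t < tasks; t++) {
            int best = 0;
            for (int w = 1; w < busyUntil.length; w++) {
                if (busyUntil[w] < busyUntil[best]) {
                    best = w;
                }
            }
            busyUntil[best] += costPerTask[best];
        }
        int worst = 0;
        for (int b : busyUntil) {
            worst = Math.max(worst, b);
        }
        return worst;
    }

    public static void main(String[] args) {
        int[] costs = {1, 2, 4}; // fastest to slowest member
        System.out.println("static:  " + staticMakespan(10, costs));
        System.out.println("dynamic: " + dynamicMakespan(10, costs));
    }
}
```

For ten tasks and per-task costs of 1, 2, and 4 time units, the static split finishes at time 12 while the dynamic one finishes at time 8.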

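29 Denis Caromel 28 Handling Group Failures (2)
V vg = ag.foo(param);
Group groupV = PAG.getGroup(vg);
el = groupV.getExceptionList();
...
vg.gee();
[Diagram: group call from ag producing result group vg inside a JVM; one member fails and its failure is recorded in the exception list]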
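The idea of collecting member failures into a list, instead of aborting the whole group call, can be sketched as follows. This is plain Java with hypothetical names (`GroupResult`, `callAll`); it does not reproduce the `PAG.getGroup` or `getExceptionList` signatures from the slide, and it runs the calls sequentially for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class GroupFailureDemo {
    // Hypothetical result group: successful values plus recorded failures.
    static final class GroupResult<V> {
        final List<V> values = new ArrayList<>();
        final List<Exception> exceptionList = new ArrayList<>();
    }

    // Invokes the same method on every member; a failing member adds an
    // entry to the exception list rather than stopping the group call.
    static <A, V> GroupResult<V> callAll(List<A> members, Function<A, V> method) {
        GroupResult<V> result = new GroupResult<>();
        for (A member : members) {
            try {
                result.values.add(method.apply(member));
            } catch (RuntimeException e) {
                result.exceptionList.add(e);
            }
        }
        return result;
    }
}
```

A caller can then inspect `exceptionList`, decide whether to drop or replace the failed members, and continue working with the surviving results.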

30 Denis Caromel 29 Abstractions for Parallelism The right Tool to execute the Task

31 Denis Caromel 30 Object-Oriented SPMD

32 Denis Caromel 31 OO SPMD
A ag = newSPMDGroup("A", […], VirtualNode)
// In each member
myGroup.barrier("2D"); // Global Barrier
myGroup.barrier("vertical"); // Any Barrier
myGroup.barrier("north", "south", "east", "west");
Still, not based on raw messages, but Typed Method Calls ==> Components
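A global barrier of the kind shown above can be illustrated with the standard `CyclicBarrier`. This is a sketch with threads standing in for SPMD group members, not the ProActive SPMD API; the class name is hypothetical, and the named and neighbour barriers from the slide are not modelled.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class SpmdBarrierDemo {
    // Every member runs the same program: finish phase one, wait at the
    // global barrier, then check that all members completed phase one.
    static int run(int members) {
        CyclicBarrier barrier = new CyclicBarrier(members);
        AtomicInteger phaseOneDone = new AtomicInteger();
        AtomicInteger sawEveryone = new AtomicInteger();
        Thread[] threads = new Thread[members];
        for (int rank = 0; rank < members; rank++) {
            threads[rank] = new Thread(() -> {
                phaseOneDone.incrementAndGet();       // phase one
                try {
                    barrier.await();                  // global barrier
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                if (phaseOneDone.get() == members) {  // phase two
                    sawEveryone.incrementAndGet();
                }
            });
            threads[rank].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sawEveryone.get(); // equals members when the barrier held
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}
```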

33 Denis Caromel 32 Object-Oriented SPMD: Single Program Multiple Data
Motivation: use Enterprise technology (Java, Eclipse, etc.) for Parallel Computing
Able to express MPI's Collective Communications in Java: broadcast, reduce, scatter, allscatter, gather, allgather
Together with Barriers, Topologies.

34 Denis Caromel 33 MPI Communication primitives
For some (historical) reasons, MPI has many communication primitives:
Send: MPI_Send (standard), MPI_Ssend (synchronous), MPI_Bsend (buffered), MPI_Rsend (ready), MPI_Isend (immediate, async/future), MPI_Ibsend, …
Receive: MPI_Recv (receive), MPI_Irecv (immediate), … with (any) source, (any) tag
I'd rather put the burden on the implementation, not the programmers!
How to do adaptive implementation in that context?
Not talking about:
- the combinatorics that occur between send and receive
- the semantic problems that occur in distributed implementations

35 Denis Caromel 34 Application Semantics rather than Low-Level Architecture-Based Optimization
MPI: MPI_Send, MPI_Recv, MPI_Ssend, MPI_Irecv, MPI_Bsend, MPI_Rsend, MPI_Isend, MPI_Ibsend
What we propose: High-level Information from the Application Programmer (experimented on 3D Electromagnetism and NASA Benchmarks)
Examples:
ro.foo(ForgetOnSend(params));
ActiveObject.exchange(…params);
Optimizations for both Distributed and Multi-Core

36 35 NAS Parallel Benchmarks Designed by NASA to evaluate benefits of high performance systems Strongly based on CFD 5 benchmarks (kernels) to test different aspects of a system 2 categories or focus variations: communication intensive and computation intensive

37 Denis Caromel 36 Communication Intensive: CG Kernel (Conjugate Gradient). Floating point operations; eigenvalue computation; high number of unstructured communications. 12000 calls/node; 570 MB sent/node; 1 min 32; 65 % comms/WT. [Figures: message density distribution, data density distribution]

38 Denis Caromel 37 Communication Intensive: CG Kernel (Conjugate Gradient). Comparable performance.

39 Denis Caromel 38 Communication Intensive: MG Kernel (Multi Grid). Floating point operations; solving a Poisson problem; structured communications. 600 calls/node; 45 MB sent; 1 min 32; 80 % comms. [Figures: message density distribution, data density distribution]

40 Denis Caromel 39 Communication Intensive: MG Kernel (Multi Grid). Problem with high-rate communications; 2D → 3D matrix access.

41 Denis Caromel 40 Computation Intensive: EP Kernel (Embarrassingly Parallel). Random number generation; almost no communications. This is Java!!!

42 Denis Caromel 41 Related Talk at PDP 2009 T4 Session, today at 14:00 NPB-MPJ: NAS Parallel Benchmarks Implementation for Message Passing in Java Damián A. Mallón, Guillermo L. Taboada, Juan Touriño, Ramón Doallo Univ. Coruña, Spain

43 Denis Caromel 42 Parallel Components

44 Denis Caromel 43 GridCOMP Partners

45 Denis Caromel 44 Objects to Distributed Components Typed Group Java or Active Object V A Example of component instance JVM Truly Distributed Components IoC: Inversion Of Control (set in XML)

46 GCM Scopes and Objectives: Grid Codes that Compose and Deploy No programming, No Scripting, … No Pain Innovation: Abstract Deployment Composite Components Multicast and GatherCast MultiCast GatherCast

47 Denis Caromel 46 Optimizing MxN Operations. 2+ composites can be involved in the Gather-multicast

48 Denis Caromel 47 Related Talk at PDP 2009 T1 Session, Yesterday at 11:30 Towards Hierarchical Management of Autonomic Components: a Case Study Marco Aldinucci, Marco Danelutto, and Peter Kilpatrick

49 Denis Caromel 48 Skeleton

50 49 Algorithmic Skeletons for Parallelism
High Level Programming Model [Cole89]
Hides the complexity of parallel/distributed programming
Exploits nestable parallelism patterns:
- Task: farm, pipe, if, while, for
- Data: map, fork, divide & conquer
[Diagram: BLAST Skeleton Program, built from pipe, fork, seq(f1), seq(f2), seq(f3), and d&c(fb, fd, fc)]

51 50 Algorithmic Skeletons for Parallelism
High Level Programming Model [Cole89]
Hides the complexity of parallel/distributed programming
Exploits nestable parallelism patterns:
- Task: farm, pipe, if, while, for
- Data: map, fork, divide & conquer
[Diagram: BLAST Skeleton Program, built from pipe, fork, seq(f1), seq(f2), seq(f3), and d&c(fb, fd, fc)]
public boolean condition(BlastParams param) {
    File file = param.dbFile;
    return file.length() > param.maxDBSize;
}
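The role that a `condition` function plays in a divide & conquer skeleton can be sketched generically. The code below is an illustration in plain Java, not the ProActive skeleton framework: the class names are hypothetical, the reading of fb, fd, and fc as condition, divide, and conquer is an assumption, and the sub-problems are solved sequentially where a real skeleton would run them in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical divide & conquer skeleton: the programmer supplies four
// sequential functions and the skeleton owns the control structure.
class DivideAndConquer<P, R> {
    private final Predicate<P> condition;       // should the problem be split further?
    private final Function<P, List<P>> divide;  // split into sub-problems
    private final Function<P, R> solve;         // solve a small enough problem
    private final Function<List<R>, R> conquer; // merge partial results

    DivideAndConquer(Predicate<P> condition, Function<P, List<P>> divide,
                     Function<P, R> solve, Function<List<R>, R> conquer) {
        this.condition = condition;
        this.divide = divide;
        this.solve = solve;
        this.conquer = conquer;
    }

    R apply(P problem) {
        if (!condition.test(problem)) {
            return solve.apply(problem);
        }
        List<R> partial = new ArrayList<>();
        for (P sub : divide.apply(problem)) {
            partial.add(apply(sub)); // sequential here; parallel in a real skeleton
        }
        return conquer.apply(partial);
    }
}

class SkeletonDemo {
    private static int total(List<Integer> xs) {
        int s = 0;
        for (int x : xs) {
            s += x;
        }
        return s;
    }

    // Sums a list by halving it while it holds more than two elements.
    static int sum(List<Integer> numbers) {
        DivideAndConquer<List<Integer>, Integer> skeleton = new DivideAndConquer<>(
                xs -> xs.size() > 2,
                xs -> Arrays.asList(xs.subList(0, xs.size() / 2),
                                    xs.subList(xs.size() / 2, xs.size())),
                SkeletonDemo::total,
                SkeletonDemo::total);
        return skeleton.apply(numbers);
    }
}
```

In the BLAST example the condition tests whether the database file exceeds `maxDBSize`; here the analogous test is whether the list still holds more than two elements.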

52 Denis Caromel 51 3. Optimizing

53 Denis Caromel 52 Programming & Optimizing Monitoring, Debugging, Optimizing

54 Denis Caromel 53

55 Denis Caromel 54 Optimizing User Interface

56 Denis Caromel 55 IC2D

57 Denis Caromel 56 ChartIt

58 Denis Caromel 57 Pies for Analysis and Optimization

59 Denis Caromel 58 Video 1: IC2D Optimizing Monitoring, Debugging, Optimizing

60 Denis Caromel 59

61 Denis Caromel 60 4. Deploying & Scheduling

62 Denis Caromel 61

63 Denis Caromel 62 Deploying

64 Denis Caromel 63 Deploy on Various Kinds of Infrastructures: Internet, Clusters, Parallel Machine, Large Equipment; Job management for embarrassingly parallel application (e.g. SETI); Servlets, EJBs, Databases

65 Denis Caromel 64 GCM Standardization Fractal Based Grid Component Model 4 Standards: 1. GCM Interoperability Deployment 2. GCM Application Description 3. GCM Fractal ADL 4. GCM Management API Overall, the standardization is supported by industrials: BT, FT-Orange, Nokia-Siemens, Telefonica, NEC, Alcatel-Lucent, Huawei …

66 Denis Caromel 65 Protocols and Scheduler in GCM Deployment Standard
Protocols: rsh, ssh, oarsh, gsissh
Scheduler and Grids: GroupSSH, GroupRSH, GroupOARSH; ARC (NorduGrid); CGSP China Grid; EGEE gLite; Fura/InnerGrid (GridSystem Inc.); GLOBUS; GridBus; IBM Load Leveler; LSF; Microsoft CCS (Windows HPC Server 2008); Sun Grid Engine; OAR; PBS / Torque; PRUN
Soon available in stable release: Java EE, Amazon EC2

67 Denis Caromel 66 Abstract Deployment Model
Problem: difficulties and lack of flexibility in deployment. Avoid scripting for configuration, getting nodes, connecting …
A key principle: Virtual Node (VN)
Abstract away from source code:
- Machine names
- Creation/Connection Protocols
- Lookup and Registry Protocols
Interface with various protocols and infrastructures:
- Cluster: LSF, PBS, SGE, OAR and PRUN (custom protocols)
- Intranet P2P, LAN: intranet protocols: rsh, rlogin, ssh
- Grid: Globus, Web services, ssh, gsissh

68 Denis Caromel 67 Resource Virtualization. Application; Deployment Descriptor: Mapping, Connections, Nodes, Acquisition, Creation; Infrastructure. VN runtime structured entities: 1 VN --> n Nodes in m JVMs on k Hosts

69 Denis Caromel 68 Resource Virtualization. [Diagram: an Application's virtual nodes VN1 and VN2 mapped through a GCM XML Deployment Descriptor onto nodes running in JVMs on Hosts]

70 Denis Caromel 69 Resource Virtualization. [Diagram: an Application's virtual nodes VN1 and VN2 mapped onto nodes running in JVMs on Hosts]

71 Denis Caromel 70 Multiple Deployments: One Host, Local Grid, Distributed Grids, Internet
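The relation "1 VN --> n Nodes in m JVMs on k Hosts" can be made concrete with a small data model. This is a hypothetical sketch, not the GCM deployment API or its XML descriptor format: `VirtualNodeModel`, `map`, and the counting methods are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a virtual node: application code refers only to
// the VN name, while the mapping to hosts and JVMs is kept separately.
class VirtualNodeModel {
    static final class Node {
        final String host;
        final String jvm;
        final String name;

        Node(String host, String jvm, String name) {
            this.host = host;
            this.jvm = jvm;
            this.name = name;
        }
    }

    private final String name;
    private final List<Node> nodes = new ArrayList<>();

    VirtualNodeModel(String name) {
        this.name = name;
    }

    // Records that a node of this VN lives in the given JVM on the given host.
    VirtualNodeModel map(String host, String jvm, String node) {
        nodes.add(new Node(host, jvm, node));
        return this;
    }

    String name() {
        return name;
    }

    int nodeCount() { // n
        return nodes.size();
    }

    int jvmCount() {  // m: JVMs are identified by host plus JVM name
        Set<String> jvms = new HashSet<>();
        for (Node n : nodes) {
            jvms.add(n.host + "/" + n.jvm);
        }
        return jvms.size();
    }

    int hostCount() { // k
        Set<String> hosts = new HashSet<>();
        for (Node n : nodes) {
            hosts.add(n.host);
        }
        return hosts.size();
    }
}
```

Changing the mapping (for example, moving every node onto one host) changes only the data fed to `map`, which mirrors the point that machine names and protocols stay out of the application source.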

72 Denis Caromel 71 Scheduling Mode

73 Denis Caromel 72 Programming with flows of tasks
Program an application as an ordered set of tasks
- Logical flow: task executions are orchestrated
- Data flow: results are forwarded from ancestor tasks to their children as parameters
The task is the smallest execution unit
Two types of tasks:
- Standard Java
- Native, i.e. any third party application (binary, scripts, etc.)
[Diagram: Task 1(input 1) produces res1, Task 2(input 2) produces res2, both feed Task 3(res1, res2)]
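The three-task flow in the diagram can be sketched with `CompletableFuture`. This is an illustration in standard Java rather than the ProActive Scheduler API, and the task bodies are placeholders chosen only so that the data flow is visible.

```java
import java.util.concurrent.CompletableFuture;

class TaskFlowDemo {
    // Placeholder task bodies; real tasks would be Java or native jobs.
    static int task1(int input1) {
        return input1 * 2;
    }

    static int task2(int input2) {
        return input2 + 1;
    }

    static int task3(int res1, int res2) {
        return res1 + res2;
    }

    // Task 1 and Task 2 are independent and may run concurrently;
    // Task 3 starts only once both results are available.
    static int run(int input1, int input2) {
        CompletableFuture<Integer> res1 = CompletableFuture.supplyAsync(() -> task1(input1));
        CompletableFuture<Integer> res2 = CompletableFuture.supplyAsync(() -> task2(input2));
        return res1.thenCombine(res2, TaskFlowDemo::task3).join();
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4));
    }
}
```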

74 Denis Caromel 73 Defining and running jobs with ProActive
A workflow application is a job: a set of tasks which can be executed according to a dependency tree
Rely on the ProActive Scheduler
Java or XML interface:
- Dynamic job creation in Java
- Static description in XML
Task failures are handled by the ProActive Scheduler:
- A task can be automatically restarted or not (with a user-defined bound)
- Dependent tasks can be aborted or not
- The finished job contains the cause exceptions as results, if any
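Restarting a failed task up to a user-defined bound can be sketched as below. This is a generic illustration with hypothetical names (`BoundedRetry`, `runWithRetries`, `demo`), not the ProActive Scheduler's implementation or configuration syntax.

```java
import java.util.function.Supplier;

class BoundedRetry {
    // Runs the task up to maxExecutions times; the last failure is kept
    // as the cause if every execution fails.
    static <T> T runWithRetries(Supplier<T> task, int maxExecutions) {
        if (maxExecutions < 1) {
            throw new IllegalArgumentException("maxExecutions must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxExecutions; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the cause and try again
            }
        }
        throw new IllegalStateException(
                "task failed after " + maxExecutions + " executions", last);
    }

    // Demo task that fails a fixed number of times before succeeding and
    // returns the number of executions it took.
    static int demo(int failuresBeforeSuccess, int maxExecutions) {
        int[] calls = {0};
        return runWithRetries(() -> {
            calls[0]++;
            if (calls[0] <= failuresBeforeSuccess) {
                throw new IllegalStateException("transient failure");
            }
            return calls[0];
        }, maxExecutions);
    }
}
```

With a bound of three executions, a task that fails twice still succeeds on its third run, while a task that fails three times surfaces its last exception as the cause.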

75 Denis Caromel 74 Scheduler / Resource Manager Overview
- Multi-platform Graphical Client (RCP)
- File-based or LDAP authentication
- Static Workflow Job Scheduling, Native and Java tasks, Retry on Error, Priority Policy, Configuration Scripts, …
- Dynamic and Static node sources, Resource Selection by script, Monitoring and Control GUI, …
- ProActive Deployment capabilities: Desktops, Clusters, ProActive P2P, …

76 Denis Caromel 75 Scheduler: User Interface

77 Denis Caromel 76 Scheduler: User Interface

78 Denis Caromel 77 Video 2: Scheduling. Scheduler and Resource Manager: see the video at http://proactive.inria.fr/userfiles/media/videos/Scheduler_RM_Short.mpg

79 Denis Caromel 78

80 Denis Caromel 79 Summary

81 Denis Caromel 80 Current Open Source Tools: Acceleration Toolkit : Concurrency +Parallelism +Distribution

82 Denis Caromel 81 Conclusion
Applications:
- 3D Electromagnetism SPMD on 300 machines at once
- Groups of over 1000!
- Record: 4 000 Nodes
Summary:
- Programming: OO, Asynchrony, First-Class Futures, No Sharing
- Higher-level Abstractions (SPMD, Skel., …)
- Composing: Hierarchical Components
- Optimizing: IC2D Eclipse GUI
- Deploying: ssh, Globus, LSF, PBS, …, WS
Our next target: 10 000 Core Applications

83 Denis Caromel 82 Conclusion: Why does it scale? Thanks to a few key features: Connection-less, RMI+JMS unified Messages rather than long-living interactions

84 Denis Caromel 83 Conclusion: Why does it Compose? Thanks to a few key features: Because it Scales: asynchrony ! Because it is Typed: RMI with interfaces ! First-Class Futures: No unstructured Call Backs and Ports

85 Denis Caromel 84 Perspectives for Parallelism & Distribution
A need for several, coherent, Programming Models for different applications:
- Actors (Functional Parallelism) + Active Objects + Futures
- OO SPMD: optimizations away from low-level optimizations
- Parallel Component: Codes and Synchronizations: MultiCast, GatherCast: Capturing // Behavior at Interfaces!
- Adaptive Parallel Skeletons
- Event Processing (Reactive Systems)
Efficient Implementations are needed to prove Ideas!
Proofs of Programming Model Properties Needed for Scalability!
Our Community never had a greater Future!
Thank You for your attention!

86 Denis Caromel 85

87 Denis Caromel 86 Extra Material Grid + SOA J2EE + Grid Amazon EC2 ProActive Image and Deployment

88 Denis Caromel 87 Perspective: SOA + Grid

89 Denis Caromel 88 SOA Integration: Web Services, BPEL Workflow

90 Denis Caromel 89 Active Objects as Web Services Why ? Access Active Objects from any language How ? HTTP Server SOAP Engine (Axis) Usage: ProActive.exposeAsWebService(); ProActive.unExposeAsWebService(); JVM Web Service Client Web Services

91 Denis Caromel 90 ProActive + Services + Workflows Principles: 3 kinds of Parallel Services 3. Domain Specific Parallel Services (e.g. Monte Carlo Pricing) 2. Typical Parallel Computing Services (Parameter Sweeping, D&C, …) 1.Basic Job Scheduling Services (parallel execution on the Grid)

92 Denis Caromel 91 3 kinds of Parallel Services 3. Domain Specific Parallel Services: providing business functionalities executed in parallel 2. Parallelization services: typical parallel computing patterns (Parameter Sweeping, D&C, …) 1. Job Scheduling service: Schedule and Run jobs in parallel on the Grid. Parallel Services Operational Services … 1. Job Scheduling Service 2. Parameter Sweeping Service 2. Divide & Conquer Service Other Operational Service … Domain Specific Parallel Service … Other Basic Service High level Business ProcessGrid 3. Domain Specific Parallel Service

93 Denis Caromel 92 A sample pattern: Parameter Sweeping I1 I2 … In O1 O2 … On Exec Logic parameter sweeping I1 I2 … In O1 O2 … On Parameter Sweeping Service, customized with an Exec logic X Param Sweeping Service Process using parameter sweeping service … All the running instances of the Exec logic X are executed on the grid as a whole PA Scheduler & PA Resource Manager

94 Denis Caromel 93 Video SOA Integration: Web Services, BPEL Workflow

95 Denis Caromel 94 6. J2EE Integration Florin Alexandru Bratu OASIS Team - INRIA

96 Denis Caromel 95 J2EE Integration with Parallelism + Grids/Clouds. Performing Grid & Cloud Computing From & In Application Servers: 1. Delegating heavy computations outside J2EE Applications 2. Using Deployed J2EE Nodes as Computational Resources

97 Denis Caromel 96 ProActive – J2EE Integration (1) 1. Delegating heavy computations outside J2EE Applications

98 Denis Caromel 97 ProActive – J2EE Integration (2) 2. Using Deployed J2EE Nodes as Computational Resources
Objective: being able to deploy active objects inside the JVMs of application servers
Implementation:
- Based on a Sun standard: Java Connector Architecture (JSR 112)
- Deployment module: resource adapter (RAR)
- Works with all J2EE-compliant Application Servers

99 Integration (2)

100 Denis Caromel 99 Grids & Clouds: Amazon EC2 Deployment

101 Denis Caromel 100 Big Picture + Clouds

102 Denis Caromel 101 Clouds: ProActive Amazon EC2 Deployment
Principles & Achievements:
- ProActive Amazon Images (AMI) on EC2
- So far up to 128 EC2 Instances (indeed the maximum on the EC2 platform, … ready to try 4 000 AMI)
- Seamless Deployment: no application change, no scripting, no pain
Opens the road to: In-house Enterprise Cluster and Grid + Scale out on EC2

103 Denis Caromel 102 ProActive Deployment on Amazon EC2 Video

104 Denis Caromel 103

105 Denis Caromel 104 P2P: Programming Models on Overlay Networks

106 Denis Caromel 105

107 Denis Caromel 106

