1
Open-MPI: an open source project case study (Adam Lev & Alex Margolin)
2
Contents
–What is MPI
–What is Open-MPI
–Open-MPI as an open source project
–How Open-MPI works
3
What is MPI MPI – Message Passing Interface: a methodology for distributed applications (as opposed to shared memory) and an interface (the de-facto standard). Much like the STL – a good implementation of common operations for distributed apps. Portable, scalable, widespread, includes optimized algorithms... In short – probably better than what you'd write by yourself!
4
Example

#include "mpi.h"
#include <iostream>

int main(int argc, char** argv) {
    int index, rank, np, comm;
    MPI::Init(argc, argv);
    np = MPI::COMM_WORLD.Get_size();
    rank = MPI::COMM_WORLD.Get_rank();
    for (index = 0; index < rank; index++) {
        MPI::COMM_WORLD.Recv(&comm, 1, MPI::INT, index, rank);
        std::cout << "#" << rank - 1 << " Got " << comm << " from " << index << "\n";
    }
    comm = rank;
    for (index = rank + 1; index < np; index++) {
        MPI::COMM_WORLD.Send(&comm, 1, MPI::INT, index, index);
    }
    MPI::Finalize();
}
5
Example

$ ~/huji/mpi/bin/mpirun -np 5 simple
#3 Got 0 from 0
#2 Got 0 from 0
#0 Got 0 from 0
#2 Got 1 from 1
#3 Got 1 from 1
#1 Got 0 from 0
#1 Got 1 from 1
#3 Got 2 from 2
#2 Got 2 from 2
#3 Got 3 from 3
6
What is Open-MPI An MPI-2 implementation (much like gcc and icc are implementations of a C compiler). Main components (from the user's perspective): –mpicc – compiles code against the Open-MPI libraries –mpirun – runs parallel instances of executables Other bindings also available: –mpicxx, mpif77, mpif90... An open source project!
7
Project Background “Born” in 2003, at HPC conferences A “proof-of-concept” by 2004 Released by 2005 Joint efforts: –FT-MPI (U. of Tennessee) –LA-MPI (Los Alamos) –LAM/MPI (Indiana U.) –PACX-MPI (HLRS, U. Stuttgart)
8
Written from scratch Each project had different strong points –Hard to integrate into one code base New concepts didn't fit into the old code Easier to start over –A fresh start (no legacy burden...) –Decades of combined MPI implementation experience
9
Prior work, “Legacy code” All kept in “maintenance mode” –Can't abandon existing user bases –New releases (if any) for critical bug fixes –(Vast) Majority of time spent on Open-MPI All major features being (slowly) rolled into Open-MPI
10
Open-MPI developers Founders –HPC Center, Stuttgart –Indiana University –Los Alamos National Lab –The University of Tennessee Recent additions –Cisco Systems –Mellanox Technologies –Sun Microsystems –University of Houston –Voltaire
11
Project coordination Each organization: –Shares some common goals –Has its own, different goals –... but that's not necessarily a problem! Open-MPI represents the priorities of its current members –... and new members keep joining.
12
Project goals Next-generation MPI-2 implementation Prevent (reduce) forking –Community / 3rd-party involvement –Production-quality research platform –Rapid deployment on new platforms Open source (communities, BSD license) An MPI that “just works” (user-friendliness) Portable performance
13
Design goals Extend/Enhance previous ideas –MCA (Modular Component Architecture) –Support of heterogeneous environments –Error detection, fault tolerance Design for dynamic environment –Robustness, dynamic resources and demands Portable efficiency on any parallel resource
14
Implementation goals All of MPI-2 standard Optimized performance –Low latency, high bandwidth Production quality Thread safety and concurrency Based on a component framework (flexible) Natively support commodity networks: –TCP, SM, Infiniband, Myrinet, etc.
15
Operating systems Current: Linux, OS X (BSD) Not frequently tested: Solaris (sun), AIX (IBM) Development: MS Windows (surprised?) Future (maybe?): HP/UX, IRIX Majority of Open-MPI is POSIX C (portable) Segregate specific OS functionality –Plug-ins
16
Legal stuff (just a little, promise) Commit access requires legal paperwork Must allow redistribution under BSD license Add your name to each file you edit “External code” must be carefully reviewed These are all “Rules of thumb” –When in doubt – Ask! –When in trouble – consult a real lawyer...
17
Open source project Open-MPI is open source –Forks are possible, but discouraged Doesn't ban closed source –May be distributed separately, as plug-ins Strong relationship with the open source community –Open repository, mailing lists, development... Working with and for the HPC community
18
Documentation Initially abandoned and outdated... Now “rediscovered” using Doxygen User and installation guides – skeletons Man pages: mostly “mpirun.1”... Best sources: the web (FAQ) and the mailing lists In one word: insufficient (our own experience)
19
(Human) Communication Public/Private mailing-lists Weekly teleconference Phones, Instant messaging Quarterly face-to-face meetings
20
Standards and conventions Style: 4-space indentation, upper-case macros... Correctness: write (NULL == x), declare foo(void) { bar(); } Use #ifndef/#define include guards, #include ompi/... Avoid gcc warnings at all costs Prefixes: ompi_btl_tcp_foo = 5 MPI API invocation disallowed internally (use internal APIs)
21
Version control /trunk is free-for-all (don't break it!) Releases are forked from the trunk /tmp is for developing/deleting/breaking code Nightly tests compile and run snapshots Version numbers: –Major.Minor.Release[Qualifier], e.g. 4.5.6rc2r9849 –Qualifiers: a/b/rc/rX – alpha/beta/release candidate/Subversion revision number
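The numbering scheme above can be sketched as a tiny parser (illustrative only: the struct and function names are invented and this is not Open-MPI code):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical holder for "Major.Minor.Release[Qualifier]", e.g. "4.5.6rc2".
struct Version {
    int major = 0, minor = 0, release = 0;
    std::string qualifier;  // "a", "b", "rc2", ... or empty for a final release
};

// Split a version string into its numeric fields and trailing qualifier.
Version parse_version(const std::string& s) {
    Version v;
    char qual[32] = "";
    // %d stops at the first non-digit, so "rc2" lands in the qualifier.
    std::sscanf(s.c_str(), "%d.%d.%d%31s", &v.major, &v.minor, &v.release, qual);
    v.qualifier = qual;
    return v;
}
```

For example, "4.5.6rc2" parses to major 4, minor 5, release 6, qualifier "rc2", while a plain "1.2.3" leaves the qualifier empty.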
23
Top-level architecture Three main code sections: –Open MPI layer (OMPI): Top-level MPI-2 API and supporting logic –Open run-time environment (ORTE): Interface to the back-end runtime system –Open portability access layer (OPAL): Utility code (lists, reference counting, etc.) Dependencies, not layers –OMPI => ORTE => OPAL –Strict abstraction barriers
24
OMPI All MPI semantics –Groups, communicators, datatypes, etc. Heavily optimized –Most of the research results are here...
26
MCA “The Modular Component Architecture (MCA) is the foundation upon which the entire Open MPI project is built.” open-mpi.org FAQ Top-level architecture for component services Find, load, unload components
27
Frameworks Frameworks are sets of components with the same functionality and uniform interface: Targeted set of functionality Defined interfaces E.g., MPI point-to-point, high resolution timers
28
Components & Modules Components –Think “plugins”! –Code that exports a specific interface. –Loaded / unloaded at run-time Modules –A component paired with resources –Modules have private state; components do not.
29
MCA top level view [diagram]
30
Byte Transfer Layer (BTL) An easy example framework to discuss is the MPI framework named "btl", the Byte Transfer Layer. BTL is used to communicate over different kinds of networks; hence, Open MPI has btl components for shared memory, TCP, Infiniband, Myrinet, etc. If a node running an Open MPI application has multiple Ethernet NICs, the application will contain one TCP btl component, but two TCP btl modules.
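The one-component/two-modules idea can be sketched in plain C++ (the names TcpBtlComponent and BtlModule are invented for illustration; the real BTL interfaces are C and look quite different):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// One module per resource (e.g. per NIC); modules hold private state.
struct BtlModule {
    std::string nic_name;  // the resource this module is bound to
    explicit BtlModule(std::string nic) : nic_name(std::move(nic)) {}
};

// One component per kind of transport; the component itself is stateless
// and acts as a factory pairing itself with each discovered resource.
struct TcpBtlComponent {
    std::vector<std::unique_ptr<BtlModule>> init(
            const std::vector<std::string>& nics) {
        std::vector<std::unique_ptr<BtlModule>> modules;
        for (const auto& nic : nics)
            modules.push_back(std::make_unique<BtlModule>(nic));
        return modules;
    }
};
```

A node with two Ethernet NICs thus yields a single TCP component but two TCP modules, one per NIC.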
31
Organized by directory <section>/mca/<framework>/<component> –Section = opal, orte, or ompi –Framework = framework name, or “base” –Component = component name, or “base” Example –ompi/mca/btl/tcp
32
Why components? 3rd parties can develop/distribute –OMPI development for the community –As source or as binary (open vs. closed source) Small, discrete chunks of code –Good for learning / new developers –Easier to maintain and extend Run-time decisions (vs. compile time)
33
Component / module lifecycle Component: –Open: per-process initialization –Selection: per-scope, determine whether it wants to be used –Close: per-process finalization Module: –Initialization: if the component was selected –Normal usage –Finalization: per-scope cleanup
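The lifecycle above can be sketched as a driver calling hypothetical hooks in order (all names here are invented; the real MCA base functions differ):

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records lifecycle steps for illustration

struct Component {
    void open()   { events.push_back("open"); }    // per-process initialization
    bool select() { events.push_back("select"); return true; }  // per-scope choice
    void close()  { events.push_back("close"); }   // per-process finalization
};

struct Module {
    void init()     { events.push_back("init"); }      // if component selected
    void finalize() { events.push_back("finalize"); }  // per-scope cleanup
};

// Drive one component through the lifecycle described on the slide.
void run_lifecycle(Component& c) {
    c.open();
    if (c.select()) {   // only a selected component gets a module instantiated
        Module m;
        m.init();
        // ... normal usage ...
        m.finalize();
    }
    c.close();
}
```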
34
Creating a framework Choose a name Create <section>/mca/<framework>/ Create <section>/mca/<framework>/base/ Define the interface in <section>/mca/<framework>/<framework>.h Create functions for the framework to open/initialize/close components Re-run autogen.sh
35
Create some directories Create the basic directory structure under <section>/mca –<framework>/ –<framework>/base/ Add a Makefile.am file in <framework>/, probably copying from another framework. The library name must be specified as libmca_<framework>.la
36
Framework header Framework header: –<section>/mca/<framework>/<framework>.h Need to define two structures –mca_<framework>_base_module_t –mca_<framework>_base_component_t Quick example...
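A minimal sketch of the two structures, assuming a hypothetical framework named "foo" (the function-pointer members shown are illustrative; the real MCA base types carry version info and more fields):

```cpp
#include <cassert>

// Component structure: identity plus per-process lifecycle entry points.
struct mca_foo_base_component_t {
    const char* name;
    int (*open_fn)(void);
    int (*close_fn)(void);
};

// Module structure: the per-resource interface the framework actually calls.
struct mca_foo_base_module_t {
    int (*do_work_fn)(int x);
};

// An example "tcp-like" component implementing the hypothetical interface.
static int foo_open(void)  { return 0; }
static int foo_close(void) { return 0; }
static int foo_work(int x) { return x * 2; }

static mca_foo_base_component_t foo_component = {"example", foo_open, foo_close};
static mca_foo_base_module_t foo_module = {foo_work};
```

The framework's base code would find structures like these by their known names and drive them through open/select/close.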
37
Creating a component Choose a name Create the directory <section>/mca/<framework>/<component>/ Provide build system information Provide a component structure of known name Run autogen.sh
38
Component build system Three choices for how to configure a component –no-configure: component always built –configure.m4: component provides a macro to run in the main configure script –configure.stub: component has its own configure script, run from the main configure The first two are preferred – it depends on what you are building. Two more files that the system needs are config.params and Makefile.am.
39
Sounds pretty complicated Well, it is. However, copying from another component is the recommended path –Just make sure to change all the variable names... –And delete unwanted files like .ignore When in doubt, the mailing lists are always there to help you out
40
OPAL Utilities for making your life easier Utilities for portably interacting with the OS C-Based object management system Rich set of container classes –Lists –Hash Tables
41
Utility code Actual, real documentation! opal/util/*.[hc] Lots of compatibility code Useful “add-on” code –Network device listings (if.h) –Manipulating argv arrays (argv.h) –printf-style debugging code (output.h) –Error reporting (show_help.h)
42
Nice debugging and error messages Functions to emit debugging/error messages to stderr, stdout, file, syslog, etc. –opal_output(0, "hello, world"); –opal_output_verbose(0, 10, "debugging.."); Automatic printing of rich, detailed error messages for common user errors. Error messages live in text files rather than in source code.
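The verbosity-gating idea can be sketched like this (purely illustrative: the real opal_output API takes a stream id and a printf-style format, not the simplified arguments shown here):

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> emitted;   // stand-in for stderr/syslog/file sinks
int verbosity_threshold = 5;        // messages above this level are dropped

// Emit `msg` only if its level passes the configured threshold.
void output_verbose(int level, const std::string& msg) {
    if (level <= verbosity_threshold)
        emitted.push_back(msg);
}
```

With a threshold of 5, a level-0 message is emitted while a level-10 debug message is silently dropped, which is the point of having per-message levels.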
43
Object system C-style reference-counting object system Single inheritance Statically or dynamically allocated objects Constructors/destructors associated with each object instance
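The C-style reference-counting scheme can be sketched as follows (the obj_t layout and function names are loose stand-ins; the real opal object system also handles class hierarchies and constructors):

```cpp
#include <cassert>

// Every "object" embeds a reference count plus a destructor hook.
struct obj_t {
    int refcount;
    void (*destructor)(obj_t*);
};

bool destroyed = false;
void my_destructor(obj_t*) { destroyed = true; }

void obj_retain(obj_t* o) { ++o->refcount; }

// Run the destructor exactly once, when the last reference is dropped.
void obj_release(obj_t* o) {
    if (--o->refcount == 0 && o->destructor)
        o->destructor(o);
}
```

Retain/release pairs let unrelated parts of the code share an object safely: whoever drops the last reference triggers cleanup.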
44
Threads Generic interface over pthreads, Solaris threads, and native Windows threads. Support for: –Thread manipulation –Mutexes –Condition variables Mutexes support either OS locks or atomic locks (choose your weapons carefully)
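The generic-wrapper idea can be sketched with standard C++ primitives standing in for the per-OS back ends (the real opal thread API is C and predates std::thread; this only illustrates the mutex/condition-variable pattern the slide lists):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Shared state guarded by a mutex, with a condition variable for signaling,
// much as opal wraps pthreads / Solaris / Windows equivalents.
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
int result = 0;

// Worker blocks on the condition variable until the main thread signals it.
void worker() {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return ready; });
    result = 42;
}
```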
45
OPAL Lowest layer in the Open-MPI structure Most OS/system-specific code goes here: –Assembly code (Hardware dependent) –Processor/Memory affinity –High resolution timers “Glue code” –OBJ macros, utility classes
46
ORTE Run-time environment support –Hook to back-end resource managers, etc. –Process discovery/allocation/launch –I/O Forwarding –Other functionalities that OPAL doesn't provide General purpose registry Messaging (not high-performance)
47
ORTE [architecture diagram]
48
ORTE objectives
49
ORTE architecture