Open MPI Project: State of the Union - April 2007
Jeff Squyres, Cisco, Inc.
www.openfabrics.org


1 Open MPI Project: State of the Union - April 2007
Jeff Squyres, Cisco, Inc.

2 Overview
- Project purpose
- Sub projects
- Current status
- Continuing / future directions

3 Why does Open MPI exist?
- Maximize all MPI expertise
  - Research / academia
  - Industry
  - …elsewhere
- Capitalize on [literally] years of MPI research and implementation experience
- The sum is greater than the parts
[Figure labels: Research / academia, Industry]

4 Why separate from M[VA]PICH?
- Open, inclusive community
- Not limited to just Open Fabrics
  - Common: TCP, shared memory, OFED* (MVAPICH only)
  - OMPI-specific: Myrinet, Portals, InfiniPath
- M[VA]PICH have different project goals
- They both chose to remain separate

5 Current membership
- 14 members, 6 contributors
  - 4 US DOE labs
  - 8 universities
  - 7 vendors
  - 1 individual

6 Not-so-subtle hint
- …would love to see an iWARP vendor in the list!
- (please come talk to me!)

7 Current projects
- “Open MPI Project” is an umbrella organization for multiple projects
  - OMPI: Open MPI
  - ORTE: Open Run-Time Environment
  - PLPA: Portable Linux Processor Affinity
  - MTT: MPI (Middleware) Testing Tool

8 Project: Open MPI / ORTE
- Recently released new 1.2 series
- OF-related changes compared to v1.1 series
  - Better overall performance, lots of bug fixes
  - Improvements for run-time/launch scalability
  - Relocate installed MPI (good for ISVs)
  - Support for fork() with OFED 1.2
  - Support fixed limits for registered memory
  - Fixes for heterogeneous network environments
  - Native InfiniPath support

9 Version history

10 Success stories
- OFED + Open MPI
  - Thunderbird: Sandia cluster, #6 in Top 500
  - Road Runner: Los Alamos cluster, 16k Opteron cores + 16k Cell Broadband Engines
  - Coyote: Los Alamos cluster, 2580 Opteron cores
- Sun ClusterTools v7

11 OFED involvement
- Initially planned on “v1.2ofed”
  - Included some OF-specific updates
- But community released v1.2.1 before OFED 1.2
- Therefore, included community OMPI v1.2.1 release in OFED v1.2
[Figure: timeline of the OMPI SVN development trunk and the v1.2 series branch, marking v1.2, v1.2ofed, v1.2.1, and Today]

12 OFED involvement
- “MPI Selector”
  - Menu-based and CLI commands
  - Trivially set system-wide and per-user default MPI selection
  - No editing of “dot” files necessary
  - Displays / selects between all installed MPIs
- Works with all MPIs
  - Including HP MPI and Intel MPI

13 Ongoing OFA-related work
- More flexible OF wireup schemes
  - Heterogeneous networking scenarios
  - Multiple QPs per connection
- More flexible resource affinity schemes
  - Processor / core, HCA / port
- Automatic path migration
- RDMA CM functionality
- Better LMC / multi-LID routing

14 Ongoing OFA-related work
- Message coalescing
- Asynchronous progress
- Exploit new Mellanox HCA capabilities
- Better utilization of network resources
- Heterogeneity
- Multicast, UD

15 Roadmap
- 1.2 series is current stable
  - v1.2.1 latest release
- 1.3 series tentatively targeted at end of year
  - Checkpoint / restart (and other FT)
  - Integration with debuggers
  - Windows support (*)
  - MPI collectives performance improvements
  - LSF integration

16 Project: Processor affinity (PLPA)
- Linux API for affinity has changed 3 times
  - Changed number and type of arguments
  - Used same function name (!)
  - Both kernel and glibc functions
  - Installed glibc may not match kernel!
- Affinity is critical for performance
  - Especially with increasing core count per host
  - Already critical on NUMA machines (locality!)
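For readers unfamiliar with what "affinity" means in practice, the following is a minimal sketch of pinning a process to a single CPU. It uses Python's standard `os.sched_getaffinity` / `os.sched_setaffinity` wrappers rather than the C-level kernel and glibc calls the slide discusses; the helper name `pin_to_one_cpu` is an assumption made for illustration and is not part of PLPA or Open MPI.

```python
import os


def pin_to_one_cpu(pid: int = 0):
    """Pin `pid` (0 means the calling process) to one CPU it may already use.

    Returns the resulting affinity set, or None where the platform does not
    expose the Linux affinity calls through the `os` module.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)   # CPUs the process may currently run on
    target = {min(allowed)}               # lowest-numbered allowed CPU
    os.sched_setaffinity(pid, target)     # restrict the process to that CPU
    return os.sched_getaffinity(pid)
```

Calling `pin_to_one_cpu()` restricts the current process; the original set returned by `os.sched_getaffinity(0)` beforehand can be passed back to `os.sched_setaffinity` to undo it.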

17 Which API to use?
- Compile-time solution not sufficient
  - Need complex “configure” script to figure it out
  - Only determines glibc API, not kernel API, so it may not even be sufficient
  - Does not help for shipping static binaries (ISVs)
- Need a run-time solution
  - Paul Hargrove (LBNL) devised safe kernel probe
  - PLPA library born

18 PLPA library
- Constant API suitable for ISVs
- BSD license
- Automatically performs the run-time probe
- Dispatches to correct back-end kernel function
- Bypasses glibc
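The probe-then-dispatch idea can be sketched in a few lines. This is an illustration of the general pattern only, not PLPA's actual probe, whose details the slides do not give. Everything here is hypothetical: the `FakeKernel` stand-in, the three convention names, and the choice to probe with a deliberately invalid (empty) request so that a successful probe changes no state.

```python
from typing import Dict, Set

# Hypothetical names for three incompatible calling conventions of one call.
CONVENTIONS = ("variant_a", "variant_b", "variant_c")


class UnsupportedConvention(Exception):
    """Raised by the fake kernel when called with a convention it lacks."""


class FakeKernel:
    """Stand-in for a running kernel that implements exactly one convention."""

    def __init__(self, convention: str) -> None:
        self.convention = convention
        self.masks: Dict[int, Set[int]] = {}

    def set_affinity(self, convention: str, pid: int, cpus: Set[int]) -> None:
        if convention != self.convention:
            raise UnsupportedConvention(convention)
        if not cpus:
            # Arguments are validated only after the convention is accepted.
            raise ValueError("empty CPU set")
        self.masks[pid] = set(cpus)


class Affinity:
    """Constant front-end: callers never see which back-end was selected."""

    def __init__(self, kernel: FakeKernel) -> None:
        self._kernel = kernel
        self.convention = self._probe()

    def _probe(self) -> str:
        # Send an invalid (empty) request using each convention in turn.
        # ValueError: the convention was understood and nothing was changed.
        # UnsupportedConvention: wrong variant, try the next one.
        for convention in CONVENTIONS:
            try:
                self._kernel.set_affinity(convention, 0, set())
            except ValueError:
                return convention
            except UnsupportedConvention:
                continue
        raise RuntimeError("no known convention is supported")

    def set_affinity(self, pid: int, cpus: Set[int]) -> None:
        # Dispatch through whichever convention the probe selected.
        self._kernel.set_affinity(self.convention, pid, cpus)
```

With `kernel = FakeKernel("variant_b")`, constructing `Affinity(kernel)` selects `variant_b` once at start-up, and every later `set_affinity(pid, cpus)` call goes through the same unchanging front-end signature.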

19 Current status
- Releases:
  - Stable series: 1.0.x
  - Upcoming series: 1.1
- New 1.1 features
  - Topology information: mapping between (socket,core) tuple, hardware threads, CPU node, and Linux processor IDs
  - plpa-taskset(1) command: same as taskset(1), but groks topology information
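As a rough illustration of the (socket,core) to Linux processor ID mapping, the sketch below groups processor IDs by their socket and core. It is not PLPA's implementation. It assumes the Linux sysfs files `physical_package_id` and `core_id` under `/sys/devices/system/cpu/cpuN/topology/` as the data source, and the function names are invented for this example.

```python
import glob
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[int, int, int]  # (processor_id, socket_id, core_id)


def read_sysfs_topology(root: str = "/sys/devices/system/cpu") -> List[Entry]:
    """Collect (processor_id, socket_id, core_id) for each CPU found in sysfs."""
    entries = []
    for cpu_dir in glob.glob(os.path.join(root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(cpu_dir))
        topo = os.path.join(cpu_dir, "topology")
        if match is None or not os.path.isdir(topo):
            continue  # skip non-CPU entries and CPUs without topology data
        with open(os.path.join(topo, "physical_package_id")) as f:
            socket_id = int(f.read())
        with open(os.path.join(topo, "core_id")) as f:
            core_id = int(f.read())
        entries.append((int(match.group(1)), socket_id, core_id))
    return entries


def build_topology_map(entries: Iterable[Entry]) -> Dict[Tuple[int, int], List[int]]:
    """Map each (socket, core) tuple to the sorted processor IDs on that core.

    More than one ID under a key corresponds to hardware threads of one core.
    """
    mapping = defaultdict(list)
    for processor_id, socket_id, core_id in entries:
        mapping[(socket_id, core_id)].append(processor_id)
    return {key: sorted(ids) for key, ids in mapping.items()}
```

On a Linux host, `build_topology_map(read_sysfs_topology())` yields the mapping for the local machine; the grouping function can also be fed hand-written tuples, which keeps it usable on systems without sysfs.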

20 Project: MPI Testing Tool (MTT)
- Could be named “Middleware Testing Tool”
  - Very little (no?) MPI-specific
- Not specific to Open MPI
  - Has been used with LAM, MPICH2, MVAPICH2
- Used as primary test mechanism for OMPI
  - Distributed testing by member organizations

21 Open MPI MTT Usage
- Distributed regression testing
  - Nightly and weekend runs
  - Results e-mailed every weekday morning
- Supports various resource managers
- Supports correctness and performance tests
- Cornerstone of Open MPI release process
  - Each member tests the platforms they care about

22 Nightly Regression Testing
- Nightly tarballs created at Indiana U.
- Member sites
  - Download tarball and tests
  - Compile and run tests
- Members upload results to central DB
- E-mail sent at 12 and 24 hour intervals
- Real-time web querying
[Figure: Indiana U. server, member site, tarball, DB]


25 Usage in Open MPI
- Currently available to all OMPI members
  - Strongly “encouraged”
- E-mail results examined every day
  - 12 and 24 hour windows
  - Weekend windows
- MTT software to be released publicly later this year

26 The Open MPI Project
- More than just MPI
- Concerned with real-world HPC
- Open community
  - Come join us!
- Solid OpenFabrics support is critical
- Many unanswered questions
  - Plenty of room for academic and industry ongoing work

27 Thank You
http://www.open-mpi.org/

