SDM Center Coupling Parallel IO with Remote Data Access Ekow Otoo, Arie Shoshani, Doron Rotem, and Alex Sim Lawrence Berkeley National Lab.



SDM Center, SciDAC All Hands Meeting, October ’05

Outline
- Project objectives
- Status and accomplishments
- Usage in an application
- Extensions
- Other future work

Project Objectives
- Development of the MpiioSrm library
  - mpiiosrm.h, libmpiiosrm.a, libmpiiosrm.so
- Allows near-online access to files on a mass storage system (e.g., HPSS) from MPI applications on a Linux cluster
- Access files from local and remote MSS with MPI applications
- Applications run on a Linux cluster having
  - a local parallel file system (PVFS2) and
  - HPSS as the remote mass storage system

Status – 1: Libmpiiosrm Module Dependencies
[Diagram. Components: MPI Applications; pNetCDF, HDF5; mpiiosrm; MPI-IO, SRM; ADIO; PVFS2, GPFS, UFS, XFS, Other; BDB, Other; HPSS, SAM, Jasmin, Castor. Layers: High Level Access and Control; Record Structured File Access; Low Level File System Access.]

Status – 2: Main Functions
Functions in libmpiiosrm.a:
1. MPI_File_srm_proxy_init();
2. MPI_File_srm_open(); [in place of MPI_File_open()]
3. MPI_File_srm_close(); [in place of MPI_File_close()]
4. MPI_File_srm_delete(); [in place of MPI_File_delete()]
5. MPI_File_srm_proxy_destroy();
Functions (2) and (4) each take a file name as one of their parameters.
Note: the function names have changed since the last meeting.

Status – 3: Major Changes Since Last AHM
- Function names revised
- MPI_File_srm_proxy_init() starts an SRM client as a detached thread.
  - Only the process with the proxy_rank spawns this thread.
- Use of PVFS2
- srm_put() implemented for MPI_File_writes, i.e.,
  - Files can now migrate from PVFS2 to HPSS.
  - Still being tested.

Usage in an Application
Steps for reading remote files:
- Prepare an input file for the program
  - The file lists the names of the files to be read from HPSS if they are not found in the local parallel file system
- Initiate grid-proxy-init (password, etc.)
  - The user requires a grid certificate
- Start a nameserver, a drmServer and a trmServer
- Compile the program to be executed
- Run “mpiexec –n XX ” to access the files given in the input file

Usage in an Application: Input File Layout
Implicit layout of parallel files
- Uses the default PVFS configuration
- Alternatively, use keys of MPI-IO file hints
- Specify only pairs of source and destination URLs
Explicit layout specifies for each file:
- Pair of source and destination URLs
- Start_IO_Node
- Striping factor
- Striping unit
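
A minimal sketch, in C, of how one entry of such an input file might be parsed. The slides do not give the file syntax, so the whitespace-separated line format, the `layout_entry` struct and `parse_layout_line` are all assumptions made for illustration. A two-field line is treated as implicit layout and a five-field line as explicit layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define URL_MAX 256

/* One input-file entry. Field names mirror the slide; the line
   format parsed below is hypothetical. */
typedef struct {
    char src_url[URL_MAX];   /* source URL (e.g., on the remote MSS) */
    char dst_url[URL_MAX];   /* destination URL (e.g., on PVFS2) */
    int  explicit_layout;    /* 1 if the three layout fields were given */
    int  start_io_node;
    int  striping_factor;
    long striping_unit;      /* bytes */
} layout_entry;

/* Parse "src dst" (implicit layout) or
   "src dst start_io_node striping_factor striping_unit" (explicit).
   Returns 0 on success, -1 on a malformed line. */
int parse_layout_line(const char *line, layout_entry *e)
{
    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);
    int n = sscanf(line, "%255s %255s %d %d %ld",
                   e->src_url, e->dst_url,
                   &e->start_io_node, &e->striping_factor,
                   &e->striping_unit);
    if (n == 2) { e->explicit_layout = 0; return 0; }
    if (n == 5) { e->explicit_layout = 1; return 0; }
    return -1;  /* empty line, one field, or a partial layout */
}
```

For an explicit entry, the parsed striping values could then be handed to MPI-IO as hints: `striping_factor` and `striping_unit` are reserved MPI-IO hint keys that an application sets with MPI_Info_set() before opening the file.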

Usage in an Application: Program Skeleton
    …
    MPI_Init();
    …
    MPI_Info_create();
    MPI_Info_set();
    …
    MPI_File_srm_proxy_init();
    …
    MPI_File_srm_open();
    …
    …
    MPI_File_srm_close();
    MPI_File_srm_proxy_destroy();
    …
    MPI_Finalize();

Extensions
[Diagram. Components: MPI Applications; MPI-IO; Srm-Client; Srm-Server; DRM, TRM; HPSS, SAM, Jasmin, Castor. Extension areas: File Control; Data vs File Access; Multi-site Access; Fault tolerance & Failsafe.]

Extensions - 2
Control of Prefetching of File Bundles
- Process files, one at a time, by availability
- Process files, one at a time, by sequence
- Process files by bundles
Data Access instead of File Access only
- Allow for file filtering at the source SRM
- Use of select criteria and indexes to generate only relevant data
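
The three prefetching modes can be illustrated with a small C sketch. It is not taken from the library: `file_state`, `next_file` and `bundle_ready` are hypothetical names, and "by availability" is simplified to "the lowest-numbered file that has already been staged" because the slides do not say how arrival order is tracked.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BY_AVAILABILITY, BY_SEQUENCE } prefetch_policy;

typedef struct {
    int staged;     /* 1 once the file has arrived in the local file system */
    int processed;  /* 1 once the application has consumed it */
} file_state;

/* Index of the next file to process, or -1 if the caller must wait
   (or every file is already processed). */
int next_file(prefetch_policy p, const file_state *f, size_t n)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++) {
        if (f[i].processed)
            continue;
        if (f[i].staged)
            return (int)i;
        if (p == BY_SEQUENCE)
            return -1;   /* strict order: wait for file i before any later one */
    }
    return -1;
}

/* A bundle f[first .. first+count-1] is ready only when all members are staged. */
int bundle_ready(const file_state *f, size_t first, size_t count)
{
    assert(f != NULL);
    for (size_t i = first; i < first + count; i++)
        if (!f[i].staged)
            return 0;
    return 1;
}
```

With files 1 and 2 staged but file 0 still in transit, `next_file` returns 1 under BY_AVAILABILITY and -1 under BY_SEQUENCE, which is the trade-off between the first two modes: processing by availability hides staging latency, processing by sequence preserves order.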

Extensions - 3
Multi-Site Access
- Extend access to other MSS implementing SRM specs
- Access files from multiple sites in a session
- Extensions to Xrootd servers
Fault Tolerance and Failsafe Operations
- Easier now with multiple srm_client proxies being spawned as threads
Access from C++ and Fortran

Other Future Work
Parallel Multidimensional Index Schemes
- Repertoire of high and low dimensional indexing methods for scientific applications
High Dimensions:
- Bitmaps (John, Kurt, etc.)
- Others
Low Dimensions (1 ~ 8):
- R-Tree, Order Preserving Extendible Hashing
- Multi-level Grid File
- String Searching Methods: Suffix trees, PATRICIA, etc.

Other Future Work - cont.
Extendible Multidimensional Array Files
- With extendibility in all dimensions, not just one
- For both dense arrays and sparse arrays
- Efficiently accessible in MPI with the irregular distributed array method using map arrays
- Multi-resolution array files
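
To see why a conventional array file is extendible in only one dimension, consider the standard row-major mapping, sketched here in C. This illustrates the general technique and is not code from the project; `row_major_offset` is a name chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major linear offset of element idx[0..k-1] in a k-dimensional
   array with extents dim[0..k-1], computed by Horner's rule:
   ((idx[0] * dim[1] + idx[1]) * dim[2] + idx[2]) ... */
size_t row_major_offset(const size_t *idx, const size_t *dim, size_t k)
{
    size_t off = 0;
    for (size_t j = 0; j < k; j++) {
        assert(idx[j] < dim[j]);
        /* For j == 0, off is still 0, so dim[0] never affects the result. */
        off = off * dim[j] + idx[j];
    }
    return off;
}
```

The offset depends on every extent except `dim[0]`. Growing the first dimension therefore only appends elements and leaves existing offsets unchanged, whereas growing any other dimension changes the offsets of elements already stored and forces the file to be reorganized. Extendibility in all dimensions needs a different mapping.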

Other Proposed Activities cont.

Array mapping method for k dims, A[N]…[N] | Element access ops with E extensions and constant k | Storage size (element size s, integer size w)
Conventional method, extendible in 1 dimension only | O(1) | w*k + s*N^k
Index array, extendible in any dimension | O(1) | w*N*k*(k+1) + s*N^k
Index array tree, extendible in any dimension | O(ln E) | w*((k+6)*E - 3) + s*N^k
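
The storage-size column can be evaluated directly. The C sketch below transcribes the three expressions, reading the flattened "N k" on the slide as N^k (an assumption); the function names are chosen for this example.

```c
#include <assert.h>

/* Integer power base^exp. */
static unsigned long long ipow(unsigned long long base, unsigned exp)
{
    unsigned long long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

/* k: dimensions, N: extent per dimension, s: element size,
   w: integer size, E: number of extensions (tree method only). */

/* Conventional method: w*k + s*N^k */
unsigned long long storage_conventional(unsigned k, unsigned long long N,
                                        unsigned long long s, unsigned long long w)
{
    return w * k + s * ipow(N, k);
}

/* Index array: w*N*k*(k+1) + s*N^k */
unsigned long long storage_index_array(unsigned k, unsigned long long N,
                                       unsigned long long s, unsigned long long w)
{
    return w * N * k * (k + 1) + s * ipow(N, k);
}

/* Index array tree: w*((k+6)*E - 3) + s*N^k */
unsigned long long storage_index_tree(unsigned k, unsigned long long N,
                                      unsigned long long s, unsigned long long w,
                                      unsigned long long E)
{
    assert(E >= 1);  /* keeps (k+6)*E - 3 positive */
    return w * ((k + 6) * E - 3) + s * ipow(N, k);
}
```

For k = 2, N = 10, s = 8, w = 4 and E = 5, the data term s*N^k is 800 bytes in all three cases, and the totals are 808, 1040 and 948 bytes respectively. The index overhead is small relative to the data, and for the tree it grows with the number of extensions E rather than with N.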

Example of Mapping Function
[Figure: mapping function over indices i0 and i1, drawn as a red-black-like binary tree whose nodes compare index values against range boundaries (<, >=).]

The End