Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer
P. Balaji, W. Feng, H. Lin, J. Archuleta, S. Matsuoka, A. Warren, J. Setubal, E. Lusk, R. Thakur, I. Foster, D. S. Katz, S. Jha, K. Shinpaugh, S. Coghlan, D. Reed

Presentation transcript:

Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer
P. Balaji, W. Feng, H. Lin, J. Archuleta, S. Matsuoka, A. Warren, J. Setubal, E. Lusk, R. Thakur, I. Foster, D. S. Katz, S. Jha, K. Shinpaugh, S. Coghlan, D. Reed
Math. and Computer Science, Argonne National Laboratory
Computer Science and Engg., Virginia Tech
Dept. of Computer Sci., North Carolina State University
Dept. of Math. and Computing Sci., Tokyo Inst. of Technology
Virginia Bioinformatics Institute, Virginia Tech
Center for Computation and Tech., Louisiana State University
Scalable Computing and Multicore Division, Microsoft Research

Distributed Computation and I/O
Growth of combined compute and I/O requirements
– E.g., genomic sequence search, large-scale data mining, visual data analytics, and communication profiling
– Commonality: they require a lot of compute power, and they use and generate a lot of data; the data has to be managed for later processing or archival
Managing large data volumes: distributed I/O
– Non-local access to large compute systems: data is generated remotely and transferred to local systems
– Resource locality: applications need both compute and storage; data is generated at one site and moved to another

Distributed I/O: The Necessary Evil
A lot of prior research tries to improve distributed I/O, yet it continues to be the elusive holy grail
– Not everyone has a lambda grid; scientists run jobs on large centers from their local systems
– There is just too much data!
Very difficult to achieve high performance for "real data" [1]
Bandwidth is not everything
– Real software requires synchronization (milliseconds)
– High-speed TCP eats up memory and slows down applications
– Data encryption or endianness conversion is required in some cases
– "Solution": FedEx!

[1] "Wide Area Filesystem Performance Using Lustre on the TeraGrid", S. Simms, G. Pike, D. Balog. TeraGrid Conference, 2007

Presentation Outline
– Distributed I/O on the WAN
– Genomic Sequence Search on the Grid
– ParaMEDIC: Framework to Decouple Compute and I/O
– ParaMEDIC on a Worldwide Supercomputer
– Experimental Results
– Concluding Remarks

Why is Sequence Search So Important?

Challenges in Sequence Search
Genome database size doubles every 12 months
– Compute power doubles only every 18-24 months
Consequence:
– The compute time to search the database increases
– The amount of data generated increases
Parallel sequence search helps with the computational requirements
– E.g., mpiBLAST, ScalaBLAST
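To make that widening gap concrete, here is a small illustrative projection (not from the slides; it simply assumes a 12-month database doubling time and an 18-month compute doubling time):

```python
# Illustrative only: database doubles every 12 months, compute every 18 months.
# Relative search time = (database growth) / (compute growth).
def relative_search_time(years, db_doubling=1.0, compute_doubling=1.5):
    db_growth = 2 ** (years / db_doubling)
    compute_growth = 2 ** (years / compute_doubling)
    return db_growth / compute_growth

for y in range(0, 11, 2):
    print(f"year {y:2d}: full-database search takes x{relative_search_time(y):.1f} longer")
# Under these assumptions, after 10 years a full-database search takes ~10x longer
# on contemporary hardware, even though both database and compute grow exponentially.
```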

Large-scale Sequence Search: Reason 1
The Case of the Missing Genes
– Problem: most current genes have been detected by a gene-finder program, which can miss real genes
– Approach: every possible location along a genome should be checked for the presence of genes
– Solution: an all-to-all sequence search of all 567 microbial genomes that have been completed to date (2.63 x 10^14 sequence searches!)
... but this requires more resources than can traditionally be found at a single supercomputer center

Large-scale Sequence Search: Reason 2
The Search for a Genome Similarity Tree
– Problem: genome databases are stored as an unstructured collection of sequences in a flat ASCII file
– Approach: correlate sequences by matching each sequence with every other
– Solution: use the results from the all-to-all sequence search to create a genome similarity tree
... but this requires more resources than can traditionally be found at a single supercomputer center
– Level 1: 250 matches; Level 2: 250^2 = 62,500 matches; Level 3: 250^3 = 15,625,000 matches ...
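The level counts above follow from simple geometric growth; a tiny sketch, assuming roughly 250 significant matches are retained per sequence at each level of the tree:

```python
# Assumed branching factor: ~250 matches retained per sequence at each level.
BRANCHING = 250

def matches_at_level(level):
    return BRANCHING ** level

for level in (1, 2, 3):
    print(f"Level {level}: {matches_at_level(level):,} matches")
# Level 1: 250; Level 2: 62,500; Level 3: 15,625,000
```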

Genomic Sequence Search on the Grid
All-to-all sequence search for microbial genomes
– Potential to solve many unsolved problems
– Resource requirements shoot through the roof
Compute: 263 trillion sequence searches
Storage: can generate more than a petabyte of data
Plan:
– Use a distributed supercomputer, drawing compute resources from multiple supercomputing centers
– Store the output data at a storage center for later processing
Using distributed compute resources is (relatively) easy
But storing a petabyte of data remotely?


ParaMEDIC Overview
ParaMEDIC: Parallel Meta-data Environment for Distributed I/O and Computing [2]
Transforms output into application-specific "metadata"
– The application generates its output data
– ParaMEDIC then takes over:
Transforms the output into (orders-of-magnitude smaller) application-specific metadata at the compute site
Transports the metadata over the WAN to the storage site
Transforms the metadata back into the original data at the storage site (the host site for the global file system)
– Similar to compression, yet different: it deals with data as abstract objects, not as a byte stream

[2] "Semantics-based Distributed I/O with the ParaMEDIC Framework", P. Balaji, W. Feng and H. Lin. IEEE International Conference on High Performance Distributed Computing (HPDC), 2008
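Conceptually, the data path reduces to the following sketch (hypothetical Python pseudocode, not the actual ParaMEDIC API; names such as encode_metadata and regenerate_output stand in for the application-specific plugin hooks):

```python
# Hypothetical sketch of the ParaMEDIC data path; all names are illustrative.

def run_at_compute_site(application, wan_link):
    output = application.run()                      # full output produced at the compute site
    metadata = application.encode_metadata(output)  # orders-of-magnitude smaller metadata
    wan_link.send(metadata)                         # only the metadata crosses the WAN

def run_at_storage_site(application, wan_link, filesystem):
    metadata = wan_link.receive()
    output = application.regenerate_output(metadata)  # recompute the full output locally
    filesystem.write(output)                           # store on the global file system
```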

The ParaMEDIC Framework
[Architecture diagram]
– Applications: mpiBLAST, Communication Profiling, Remote Visualization
– ParaMEDIC API (PMAPI)
– Application Plugins: mpiBLAST Plugin, Communication Profiling Plugin, Basic Compression
– ParaMEDIC Data Tools: Data Encryption, Data Integrity
– Other Utilities: Column Parsing, Data Sorting
– Communication Services: Direct Network, Global Filesystem
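The layering above suggests a plugin interface along these lines (a minimal sketch under the assumption that plugins expose encode/decode hooks to the PMAPI layer; the real PMAPI signatures are not shown in the slides, though the "Basic Compression" fallback plugin is):

```python
import zlib
from abc import ABC, abstractmethod

class ParaMEDICPlugin(ABC):
    """Hypothetical application plugin: converts output to metadata and back."""

    @abstractmethod
    def to_metadata(self, output: bytes) -> bytes:
        """Reduce application output to compact, application-specific metadata."""

    @abstractmethod
    def from_metadata(self, metadata: bytes) -> bytes:
        """Regenerate the original output from the metadata at the storage site."""

class BasicCompressionPlugin(ParaMEDICPlugin):
    """Fallback plugin: plain byte-stream compression when no semantic plugin exists."""

    def to_metadata(self, output: bytes) -> bytes:
        return zlib.compress(output)

    def from_metadata(self, metadata: bytes) -> bytes:
        return zlib.decompress(metadata)
```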

Tradeoffs in the ParaMEDIC Framework
Trading computation and I/O
– More computation: converting the output to metadata and back requires extra work
– Less I/O: only metadata is transferred over the WAN, so less WAN bandwidth is used
– But, well, computation is free; I/O is not!
Trading portability and performance
– Utility functions help develop application plugins, but some non-zero effort is always needed
– Data is dealt with as high-level objects: better chance of improved performance

Sequence Search with mpiBLAST
[Figure: query sequences are searched against the database sequences, sequentially vs. in parallel, to produce the output]

mpiBLAST Meta-Data
[Figure: query sequences are searched against the database sequences; the output contains alignment information for the matched sequences]
– The alignment of two sequences is independent of the remaining sequences
– Meta-data: the IDs of the matched database sequences
– Only the meta-data is communicated over the WAN
– At the storage site, the queries are searched against a temporary database of the matched sequences to regenerate the full output
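In code terms, the mpiBLAST transformation reduces to something like the following sketch (hypothetical helper names; the point from the slide is that the metadata is just the IDs of matched database sequences, and the alignments are recomputed at the storage site against a small temporary database):

```python
# Hypothetical sketch of the mpiBLAST metadata transformation.

def encode_metadata(blast_output):
    # Keep only the IDs of database sequences that matched any query.
    return sorted({hit.subject_id for hit in blast_output.hits})

def regenerate_output(matched_ids, queries, full_database):
    # Build a small temporary database containing only the matched sequences...
    temp_db = full_database.extract(matched_ids)
    # ...and rerun the search against it; since the alignment of two sequences is
    # independent of the remaining sequences, the alignments are reproduced.
    return run_blast(queries, temp_db)
```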

ParaMEDIC-powered mpiBLAST
[Figure: the ParaMEDIC framework connecting the compute sites to the storage site over the WAN]


Our Worldwide Supercomputer

Name              | Location   | Cores | Arch.          | Memory (GB) | Network      | Storage (TB) | Distance from Storage
SystemX           | VT         | 2200  | PPC 970FX      | 4           | IB           | NFS (30)     | 11,000
Breadboard        | Argonne    | 128   | Opteron        | 4           | 10GE         | NFS (5)      | 10,000
Blue Gene/L       | Argonne    | 2048  | PPC 440        | 1           | Proprietary  | PVFS (14)    | 10,000
SiCortex          | Argonne    | 5832  | MIPS           | 3           | Proprietary  | NFS (4)      | 10,000
Jazz              | Argonne    | 700   | Xeon           | 1-2         | GE           | G/PVFS (20)  | 10,000
TeraGrid (UC)     | U. Chicago | 320   | Itanium2       | 4           | Myrinet 2000 | NFS (4)      | 10,000
TeraGrid (SDSC)   | San Diego  | 60    | Itanium2       | 4           | Myrinet 2000 | GPFS (50)    | 9,000
Oliver            | LSU        | 512   | Xeon           | 4           | IB           | Lustre (12)  | 11,000
Open Science Grid | U.S.       | 200   | Opteron + Xeon | 1-2         | GE           | -            | 11,000
TSUBAME           | TiTech     | 72    | Opteron        | 16          | GE           | Lustre (350) | 0

Dynamic Availability of Compute Clients
Two possible extremes:
– Complete parallelism across all nodes: a single failure loses all existing output
– Sequential computation of the tasks (using different processors for each task): out-of-core computation!
Hierarchical computation with small-scale parallelism
– Clients maintain very little state
– Each client set (a few processors) runs a separate instance of mpiBLAST
– Each client set gets a task, computes on it, and sends the output to the storage system
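A rough sketch of the client-set loop described above (illustrative only; the task-server protocol and helper names are assumptions, not the actual implementation):

```python
# Each client set (a few processors) runs its own mpiBLAST instance and keeps
# almost no state: losing one client set loses at most its current task.

def client_set_loop(task_server, storage_site):
    while True:
        task = task_server.get_next_task()     # a query/database fragment to search
        if task is None:
            break                              # no work left
        output = run_mpiblast(task)            # small-scale parallelism within the set
        metadata = encode_metadata(output)     # ParaMEDIC transformation
        storage_site.send(task.id, metadata)   # result shipped; no local state retained
```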

Performance Optimizations
Architectural heterogeneity
– Data has to be converted to an architecture-independent format
– Trouble for vanilla mpiBLAST; not so much for ParaMEDIC
Utilizing parallelism on the processing nodes
– ParaMEDIC I/O has three parts: compute clients, post-processing servers, and I/O servers
– Post-processing: each server handles a different stream; simple, but only effective when there are enough streams
Disconnected or cached I/O
– Clients cache the output from multiple tasks locally
– Allows data aggregation for better bandwidth and easier merging
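The disconnected/cached I/O idea can be sketched as follows (illustrative; the batch size and helper names are assumptions):

```python
# Hypothetical sketch of cached ("disconnected") I/O: buffer metadata from several
# tasks locally, then ship one aggregated batch for better WAN bandwidth and
# easier merging at the storage site.

BATCH_SIZE = 16  # assumed number of task outputs aggregated per transfer

class CachedWriter:
    def __init__(self, storage_site):
        self.storage_site = storage_site
        self.cache = []

    def write(self, task_id, metadata):
        self.cache.append((task_id, metadata))
        if len(self.cache) >= BATCH_SIZE:
            self.flush()

    def flush(self):
        if self.cache:
            # One large transfer instead of many small ones.
            self.storage_site.send_batch(self.cache)
            self.cache = []
```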


I/O Time Measurements
[Results figure]

Storage Bandwidth Utilization (Lustre)
[Results figure]

Storage Bandwidth Utilization (ext3fs)
[Results figure]

Microbial Genome Database Search: Semantic Compression
Semantics-aware metadata gives scientists 2.5 x 10^14 searches at their fingertips
– All the metadata results from all the searches can fit on an iPod Nano
– "Semantically compressed" 1 petabyte into 4 gigabytes (10^6 X)
– Usual byte-level compression only reduces 1 PB to 300 TB (3X)

Preliminary Analysis of the Output
Analysis of the similarity tree
– We expect that replicons (i.e., chromosomes) will match other replicons reasonably well
– But many replicons do not match many other replicons: 25% of all replicon-replicon searches do not match at all!


Concluding Remarks
Distributed I/O is a necessary evil
– It is difficult to get high performance for "real data"
– Traditional approaches deal with data as a stream of bytes (which allows portability across any type of data)
We proposed ParaMEDIC
– Semantics-based metadata transformation of the data
– Trades portability for performance
Evaluated on a worldwide supercomputer
– Sequence-searched all completed microbial genomes against themselves
– Generated a petabyte of data that was stored halfway around the world

Thank You!
Web:
Acknowledgments:
– U. Chicago: R. Kettimuthu, M. Papka and J. Insley
– Argonne National Lab: N. Desai and R. Bradshaw
– Virginia Tech: G. Zelenka, J. Lockhart, N. Ramakrishnan, L. Zhang, L. Heath, and C. Ribbens
– Renaissance Computing Institute: M. Rynge and J. McGee
– Tokyo Institute of Technology: R. Fukushima, T. Nishikawa, T. Kujiraoka, and S. Ihara
– Sun Microsystems: S. Vail, S. Cochrane, C. Kingwood, B. Cauthen, S. See, J. Fragalla, J. Bates, R. Cagle, R. Gaines, and C. Bohm
– Louisiana State University: H. Liu