Global Terascale Data Management Scott Atchley, Micah Beck, Terry Moore Vanderbilt ACCRE Workshop 7 April 2005

Talk Outline
» What is Terascale Data Management?
» Accessing Globally Distributed Data
» What is Logistical Networking?
» Performing Globally Distributed Computation

Terascale Data Management
» Scientific data sets are growing rapidly
  - Growing from 100s of GBs to 10s of TBs
  - Terascale Supernova Initiative (ORNL)
  - Princeton Plasma Physics Lab fusion simulation
» Distributed resources
  - Simulate on a supercomputer at ORNL or NERSC
  - Analyze/visualize at ORNL, NCSU, PPPL, etc.
» Distributed collaborators
  - TSI has members at ORNL, NCSU, SUNYSB, UCSD, FSU, FAU, UC-Davis, and others

Terascale Data Management
» Scientists use NetCDF or HDF to manage datasets
  - The scientist has a logical view of the data as variables, dimensions, metadata, etc.
  - NetCDF or HDF manages serialization of the data to a local file
  - The scientist can query for specific variables, metadata, etc.
  - Requires local file access (local disk, NFS, NAS, SAN, etc.)
  - Datasets can be too large for the scientist to store locally; if so, the scientist cannot browse them

NetCDF (Network Common Data Form)
[Figure: a NetCDF file stored on disk as a header followed by Variable1, Variable2, and Variable3. The scientist generates data and uses NetCDF to organize it.]
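To make the logical view concrete, here is a minimal NetCDF C sketch that defines a dimension and a variable and lets the library handle the serialization shown in the figure (the file, dimension, and variable names are illustrative; error checking is omitted for brevity):

    #include <netcdf.h>

    int main(void)
    {
        int ncid, dimid, varid;
        double temp[4] = {270.0, 271.5, 273.0, 274.5};

        /* NetCDF writes the header and serializes the variable for us. */
        nc_create("run.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "x", 4, &dimid);
        nc_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
        nc_enddef(ncid);                      /* leave define mode */
        nc_put_var_double(ncid, varid, temp); /* write the data */
        nc_close(ncid);
        return 0;
    }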

Accessing Globally Distributed Data
» Collaboration requires a shared file system
  - Nearly impossible over the wide-area network
» Other methods include HTTP, FTP, GridFTP, scp
  - Can be painfully slow for transferring large data
  - Can be cumbersome to set up and manage accounts
» How to manage replication?
  - More copies can improve performance, but…
  - If more than one copy exists, how to let others know?
  - How to choose which replica to use?

What is Logistical Networking?
» An architecture for globally distributed, sharable resources
  - Storage
  - Computation
» A globally deployed infrastructure
  - Over 400 storage servers in 30 countries
  - Serving 30 TB of storage
» Open-source client tools and libraries
  - Linux, Solaris, Mac OS X, Windows, AIX, others
  - Some Java tools

Logistical Networking
» Modeled on the Internet Protocol
» IBP provides generic storage and generic computation
  - Weak semantics
  - Highly scalable
» The L-Bone provides resource discovery
» The exNode provides data aggregation (length and replication) and annotation
» LoRS provides fault tolerance, high performance, and security
» Multiple applications available
[Figure: layered stack, top to bottom — Applications; Logistical Runtime System (LoRS); exNode and L-Bone; IBP; Physical Layer]

Current Infrastructure Deployment
The public deployment includes over 400 IBP depots in 30 countries serving 30 TB of storage (leveraging PlanetLab). Private deployments exist on the DOE, Brazilian, and Czech backbones.

Available LoRS Client Tools
Binaries are available for Windows and Mac OS X; source for Linux, Solaris, AIX, and others.

LoDN - Web-based File System
Store files into the Logistical Network using Java upload/download tools. LoDN manages exNode "warming" (lease renewal and migration) and provides account (single user or group) as well as "world" permissions.

NetCDF/L
» Modified NetCDF that stores data in the logistical network (lors://)
» Uses libxio (a Unix I/O wrapper)
» Ported NetCDF with 13 lines of code
» NetCDF 3.6 provides for >2 GB files (64-bit offset)
» LoRS parameters available via environment variables (see the sketch below)
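As a hedged sketch of what this looks like from application code, assuming, per the slides, that the ported library accepts lors:// paths through the standard NetCDF calls and takes its LoRS tuning from environment variables (the URL, variable name, and any environment variable names are illustrative, not from the slides):

    #include <netcdf.h>
    #include <stdio.h>

    int main(void)
    {
        int ncid, varid, status;
        size_t index[1] = {0};
        double value;

        /* Open an exNode in the logistical network instead of a local file.
           (lors:// support is per the NetCDF/L slide; the exact URL form
           and LoRS environment variables are assumptions.) */
        status = nc_open("lors://example.org/dataset.xnd", NC_NOWRITE, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "nc_open: %s\n", nc_strerror(status));
            return 1;
        }

        /* From here on, ordinary NetCDF calls work unchanged. */
        nc_inq_varid(ncid, "temperature", &varid);
        nc_get_var1_double(ncid, varid, index, &value);
        printf("temperature[0] = %g\n", value);

        nc_close(ncid);
        return 0;
    }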

ncview using NetCDF/L

Libxio
» Unix I/O wrapper (open(), read(), write(), etc.)
» Developed in the Czech Republic for the Distributed Data Storage (DiDaS) project
» Port any Unix I/O app using 12 lines (expanded in the sketch below):

    #ifdef HAVE_LIBXIO
    #define open  xio_open
    #define close xio_close
    ...
    #endif

» DiDaS ported the transcode (video transcoding) and mplayer (video playback) apps using libxio
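Filled out, the port might look like the following sketch. Only xio_open and xio_close appear on the slide; the other xio_* names and the <xio.h> header are assumptions by analogy. The point is that the application code itself needs no changes:

    /* port.h -- macro layer redirecting Unix I/O to libxio when enabled.
       (xio_read/xio_write/xio_lseek and <xio.h> are assumed by analogy;
       only xio_open and xio_close are shown on the slide.) */
    #ifdef HAVE_LIBXIO
    #include <xio.h>
    #define open  xio_open
    #define close xio_close
    #define read  xio_read
    #define write xio_write
    #define lseek xio_lseek
    #endif

    /* cat.c -- unmodified Unix I/O code; compiled with -DHAVE_LIBXIO it
       can read an exNode from the logistical network instead of a local
       file.  System headers are included before port.h so the macros do
       not mangle their declarations. */
    #include <fcntl.h>
    #include <unistd.h>
    #include "port.h"

    int main(int argc, char **argv)
    {
        char buf[4096];
        ssize_t n;
        int fd;

        if (argc < 2) return 1;
        fd = open(argv[1], O_RDONLY);   /* path may name a lors:// exNode */
        if (fd < 0) return 1;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, n);
        close(fd);
        return 0;
    }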

High Performance Streaming for Large Scale Simulations
» Viraj Bhat, Scott Klasky, Scott Atchley, Micah Beck, Doug McCune, and Manish Parashar, "High Performance Threaded Data Streaming for Large Scale Simulations," in Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, PA, Nov. 2004
» Streamed data from NERSC to PPPL using LN
» Sent data from the previous timestep while computation proceeds on the next timestep (see the sketch below)
» Automatically adjusts to network latencies
  - Adds threads when needed
» Imposed <3% overhead (compared to no I/O)
» Writing to GPFS at NERSC imposed 3-10% overhead (depending on block size)
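The core overlap of computation and transfer can be sketched with a single sender thread and a hand-off buffer. This is a minimal sketch assuming pthreads; the system described in the paper goes further and adds sender threads adaptively as network latency demands:

    #include <pthread.h>
    #include <string.h>

    #define STEP_BYTES (1 << 16)        /* illustrative timestep size */

    static char inflight[STEP_BYTES];   /* timestep handed to the sender */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int have_data = 0, done = 0;

    static void send_over_network(const char *buf, size_t len)
    {
        (void)buf; (void)len;           /* placeholder for the LN transfer */
    }

    static void *sender(void *arg)
    {
        static char out[STEP_BYTES];
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!have_data && !done)
                pthread_cond_wait(&cond, &lock);
            if (!have_data) { pthread_mutex_unlock(&lock); break; }
            memcpy(out, inflight, STEP_BYTES);  /* free the hand-off slot */
            have_data = 0;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
            send_over_network(out, STEP_BYTES); /* overlaps the next step */
        }
        return NULL;
    }

    int main(void)
    {
        static char current[STEP_BYTES];
        pthread_t t;
        pthread_create(&t, NULL, sender, NULL);

        for (int step = 0; step < 10; step++) {
            /* ... compute this timestep into `current` ... */
            pthread_mutex_lock(&lock);
            while (have_data)           /* wait for the previous hand-off */
                pthread_cond_wait(&cond, &lock);
            memcpy(inflight, current, STEP_BYTES);
            have_data = 1;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);
        return 0;
    }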

Failsafe Mechanisms for Data Streaming in Energy Fusion Simulation
[Diagram: data flows from supercomputer nodes via depots on the simulation end to PPPL depots, an exnodercv receiver, and post-processing routines. Failsafe paths: on buffer overflow, write to depots close by; on network failure, write to GPFS or a simulation-side depot; failed transfers are re-fetched from GPFS/depots; replication throughout.]

LN in the Center for Plasma Edge Simulation
» Implementing workflow automation
  - Reliable communication between stages
  - Exposed mapping of parallel files
» Collaborative use of detailed simulation traces
  - Terabytes per timestep
  - Accessed globally
  - Distributed postprocessing and redistribution
» Visualization of partial datasets
  - Hierarchy: workstation / cluster / disk cache / tape
  - Access prediction and prestaging are vital

Performing Globally Distributed Computation
» Adding generic, restricted computing within the depot
» Side-effect-free programming only (no system calls; no external libraries except malloc) — illustrated in the sketch below
» Uses IBP capabilities to pass arguments
» Mobile code "oplets" run in a restricted environment
  - C/compiler based
  - Java byte code based
» Test applications include
  - Text mining: parallel grep
  - Medical visualization: brain fiber tracing
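What "side-effect-free" means in practice, as a hedged sketch: the entry-point signature below is hypothetical (the slides do not give the oplet API), but the constraint it illustrates is the stated one — the code touches only the buffers it is handed, with no system calls and no library calls except malloc:

    /* Hypothetical oplet: count occurrences of `pat` in `buf` and write
       the count into `out`.  No I/O, no globals, no libc beyond malloc
       (unused here) -- the depot passes all arguments via IBP
       capabilities. */
    #include <stddef.h>

    int oplet_count(const char *buf, size_t buf_len,
                    const char *pat, size_t pat_len,
                    char *out, size_t out_len)
    {
        size_t count = 0;

        if (pat_len == 0 || out_len < sizeof count)
            return -1;

        for (size_t i = 0; i + pat_len <= buf_len; i++) {
            size_t j = 0;
            while (j < pat_len && buf[i + j] == pat[j])
                j++;
            if (j == pat_len)
                count++;
        }

        /* Write the result byte-by-byte into the output buffer (no
           memcpy, keeping to the "malloc only" restriction). */
        for (size_t k = 0; k < sizeof count; k++)
            out[k] = ((const char *)&count)[k];
        return 0;
    }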

Grid Computing for Distributed Viz: DT-MRI Brain Scans (Jian Huang)
[Figure: 1. send data, 2. request processing, 3. return results]

Transgrep
» Stores a 1.3 GB (web server) log file
» Data is striped and replicated across dozens of depots
» Transgrep breaks the job into uniform blocks (10 MB)
» Has depots search within blocks
» Depots return matches as well as partial first and last lines
» Client sorts results and searches lines that overlap block boundaries (sketched below)
» 15-20x speedup versus local search
» Automatically handles slow or failed depots
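The boundary handling in the second-to-last step can be sketched as follows. This is a minimal sketch with hypothetical struct and function names; the slides say only that depots return partial first/last lines, which the client joins and searches locally:

    /* Rebuild each line that straddles a block boundary from the partial
       tail of one block and the partial head of the next, then search
       the join on the client. */
    #include <stdio.h>
    #include <string.h>

    struct block_result {
        const char *partial_first;  /* unterminated head of the block */
        const char *partial_last;   /* unterminated tail of the block */
    };

    static void search_boundaries(const struct block_result *r, int nblocks,
                                  const char *pattern)
    {
        char line[8192];
        for (int i = 0; i + 1 < nblocks; i++) {
            snprintf(line, sizeof line, "%s%s",
                     r[i].partial_last, r[i + 1].partial_first);
            if (strstr(line, pattern) != NULL)
                printf("%s\n", line);   /* match spans a block boundary */
        }
    }

    int main(void)
    {
        /* Two adjacent blocks whose boundary splits "index.html". */
        struct block_result r[2] = {
            { "", "GET /index.ht" },
            { "ml HTTP/1.1 200", "" }
        };
        search_boundaries(r, 2, "index.html");
        return 0;
    }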

Czech Republic LN Infrastructure

Brazil RNP LN Infrastructure

Proposed OneTenn Infrastructure

Logistical Networking First Steps
» Try LoDN by browsing as a guest (Java)
» Download the LoRS tools and try them out
» Download and use NetCDF/L
» Use libxio to port your Unix I/O apps to use LN
» Run your own IBP depots (publicly or privately)

More information