Presentation transcript:

Scott Koranda, UWM & NCSA
20 November 2016
www.griphyn.org

Lightweight Replication of Heavyweight Data
Scott Koranda
University of Wisconsin-Milwaukee & National Center for Supercomputing Applications

Heavyweight Data from LIGO
Sites at Livingston, LA (LLO) and Hanford, WA (LHO)
–2 interferometers at LHO, 1 at LLO
–1000s of channels recorded at rates of 16 kHz, 16 Hz, 1 Hz, …
Output is binary "frame" files, each holding 16 seconds of data with a GPS timestamp
–~100 MB per file from LHO
–~50 MB per file from LLO
–~1 TB/day in total
S1 run ~2 weeks, S2 run ~8 weeks
[Photo: 4 km LIGO interferometer at Livingston, LA]
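As a quick sanity check (a Python sketch using only the approximate per-frame sizes quoted above), the quoted rates do add up to roughly a terabyte per day:

    # Back-of-the-envelope check of "~1 TB/day" from the approximate
    # frame sizes above: ~100 MB (LHO) and ~50 MB (LLO) per 16-second file.
    SECONDS_PER_DAY = 86400
    frames_per_day = SECONDS_PER_DAY / 16        # 5400 frame files per day per site

    lho_gb = frames_per_day * 100 / 1024         # ~527 GB/day from LHO
    llo_gb = frames_per_day * 50 / 1024          # ~264 GB/day from LLO

    print((lho_gb + llo_gb) / 1024)              # ~0.77 TB/day, i.e. roughly 1 TB/day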

Networking to IFOs Limited
LIGO IFOs are remote, making bandwidth expensive
–A couple of T1 lines, for administration only
Raw data shipped on tape to Caltech (SAM-QFS), the "GridFedEx protocol"
Reduced data sets (RDS) generated and stored on disk
–~20% of the size of the raw data
–~200 GB/day

Replication to University Sites
Sites: CIT, UWM, PSU, MIT, UTB, Cardiff, AEI

Why Bulk Replication to University Sites?
Each site has compute resources (Linux clusters)
–Early plan was to provide one or two analysis centers
–Now everyone has a cluster
Storage is cheap
–$1/GB for drives
–1 TB of RAID-5 for < $10K
–Throw more drives into your cluster
Analysis applications read a lot of data
–There are different ways to slice some problems, but most want access to large data sets for a particular instance of search parameters

LIGO Data Replication Challenge
Replicate 200 GB/day of data to multiple sites securely, efficiently, and robustly (no babysitting…)
Support a number of storage models at the sites
–CIT → SAM-QFS (tape) and large IDE farms
–UWM → 600 partitions on 300 cluster nodes
–PSU → multiple 1 TB RAID-5 servers
–AEI → 150 partitions on 150 nodes, with redundancy
Provide a coherent mechanism for data discovery by users and their codes
Know what data we have and where it is, and replicate it fast and easily

Prototyping "Realizations"
Need to keep the "pipe" full to achieve desired transfer rates
–Be mindful of the overhead of setting up connections
–Set up a GridFTP connection with multiple channels, tuned TCP windows, and tuned I/O buffers, and leave it open (see the sketch after this list)
–Sustained 10 MB/s between Caltech and UWM, with peaks up to 21 MB/s
Need cataloging that scales and performs
–Globus Replica Catalog (LDAP) handles fewer than 10^5 entries and is not acceptable
–Need a solution with a relational database backend that scales to 10^7 entries with fast updates and reads
No need for "reliable file transfer" (RFT)
–Problem with any single transfer? Forget it, come back later…
Need a robust mechanism for selecting collections of files
–Users and sites demand flexibility in choosing what data to replicate
Need to get the network people interested
–Do your homework, then challenge them to make your data flow faster
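As an illustration of the tuning described above (a minimal sketch only, not LDR's transfer code), the fragment below shells out to globus-url-copy with parallel data channels and an enlarged TCP buffer. The stream count, buffer size, host names, and paths are placeholder assumptions to be tuned for a given wide-area link.

    # Sketch: a tuned GridFTP transfer driven through globus-url-copy.
    # The -p (parallel streams) and -tcp-bs (TCP buffer size, in bytes)
    # values are illustrative; the endpoints below are placeholders.
    import subprocess

    def gridftp_copy(src_url, dst_url, streams=8, tcp_buffer_bytes=2097152):
        cmd = [
            "globus-url-copy",
            "-p", str(streams),                # parallel data channels
            "-tcp-bs", str(tcp_buffer_bytes),  # TCP buffer per channel
            src_url,
            dst_url,
        ]
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        gridftp_copy(
            "gsiftp://source.example.edu/frames/H-R-700000000-16.gwf",
            "gsiftp://dest.example.edu/data/frames/H-R-700000000-16.gwf",
        )

Keeping one tuned connection open and streaming many files over it is what amortizes the connection-setup overhead noted in the first bullet.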

LIGO, err… Lightweight Data Replicator (LDR)
What data we have…
–Globus Metadata Catalog Service (MCS)
Where data is…
–Globus Replica Location Service (RLS)
Replicate it fast…
–Globus GridFTP protocol
–What client to use? Right now we use our own
Replicate it easy…
–Logic we added
–Is there a better solution?
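To make the division of labor above concrete, here is a purely conceptual, hedged sketch of one subscriber-side pass. The helper callables (metadata_query, have_locally, rls_lookup, local_path_for, gridftp_fetch) are hypothetical stand-ins for the MCS query, the local inventory, the RLS lookup, and the GridFTP transfer; they are not actual LDR or Globus APIs.

    # Conceptual sketch of an LDR-style replication pass.  Every helper
    # passed in here is a hypothetical placeholder, not a real LDR or
    # Globus call; the real system splits this work across daemons.
    def replicate_pass(collection_query, metadata_query, have_locally,
                       rls_lookup, local_path_for, gridftp_fetch):
        # "What data we have": ask the metadata catalog (MCS) for the
        # logical file names (LFNs) matching the collection of interest.
        for lfn in metadata_query(collection_query):
            if have_locally(lfn):
                continue

            # "Where data is": ask the replica location service (RLS)
            # for physical file names (PFNs) that hold this LFN.
            pfns = rls_lookup(lfn)
            if not pfns:
                continue  # no replica published yet; retry on a later pass

            # "Replicate it fast": pull one replica over GridFTP.
            gridftp_fetch(pfns[0], local_path_for(lfn))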

Lightweight Data Replicator
Replicated 20 TB to UWM thus far
Just deployed at MIT, PSU, AEI
Deployment in progress at Cardiff
LDRdataFindServer running at UWM

Lightweight Data Replicator
"Lightweight" because we think it is the minimal collection of code needed to get the job done
Logic coded in Python
–Use SWIG to wrap Globus RLS
–Use pyGlobus from LBL elsewhere
Each site is any combination of publisher, provider, and subscriber
–Publisher populates the metadata catalog
–Provider populates the location catalog (RLS)
–Subscriber replicates data using information provided by publishers and providers
Take the "Condor" approach, with small, independent daemons that each do one thing
–LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer, …
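As a hedged illustration of that daemon decomposition (not the actual LDR implementation), the sketch below has two independent loops that communicate only through a shared work queue: one decides what is missing, the other moves the bytes. The sqlite queue file, the table layout, and the find_missing_lfns/fetch callables are assumptions made purely for illustration.

    # Illustrative sketch of the "small independent daemons" idea: a
    # scheduler process fills a shared queue with LFNs to fetch, and a
    # transfer process drains it.  The queue file, table layout, and the
    # find_missing_lfns()/fetch() callables are hypothetical, not LDR code.
    import sqlite3
    import time

    QUEUE_DB = "ldr_queue.db"   # assumed shared state between the two loops

    def open_queue():
        db = sqlite3.connect(QUEUE_DB)
        db.execute("CREATE TABLE IF NOT EXISTS queue (lfn TEXT PRIMARY KEY)")
        return db

    def scheduler_loop(find_missing_lfns, poll=60):
        """LDRSchedule-like role: decide what needs replicating."""
        db = open_queue()
        while True:
            for lfn in find_missing_lfns():
                db.execute("INSERT OR IGNORE INTO queue VALUES (?)", (lfn,))
            db.commit()
            time.sleep(poll)

    def transfer_loop(fetch, poll=10):
        """LDRTransfer-like role: move the bytes, one file at a time."""
        db = open_queue()
        while True:
            row = db.execute("SELECT lfn FROM queue LIMIT 1").fetchone()
            if row is None:
                time.sleep(poll)
                continue
            lfn = row[0]
            fetch(lfn)                 # e.g. a GridFTP pull
            db.execute("DELETE FROM queue WHERE lfn = ?", (lfn,))
            db.commit()

The appeal of this kind of split is that each piece can fail, be debugged, or be restarted independently of the others.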

Future?
LDR is a tool that works now for LIGO
Still, we recognize that a number of projects need bulk data replication
–There has to be common ground
What middleware can be developed and shared?
–We are looking for "opportunities" (code for "solve our problems for us…")
–We want to investigate Stork, DiskRouter, ?
–Do contact me if you do bulk data replication…