Data Pipelines: Real Life Fully Automated Fault-tolerant Data Movement and Processing

Outline
- What do users want?
- Data pipeline overview
- Real life data pipelines
- NCSA and WCER pipelines
- Conclusions

What do users want?
- Make data available at different sites
- Process data and make results available at different sites
- Use distributed computing resources for processing
- Full automation and fault tolerance

What do users want?
- Can we press a button and expect it to complete?
- Can we stop worrying about failures?
- Can we get acceptable throughput?
Yes… a data pipeline is the solution!

Data Pipeline Overview
- Fully automated framework for data movement and processing
- Fault tolerant & resilient to failures: understands failures and handles them
- Self-tuning
- Rich statistics
- Dynamic visualization of system state

Data Pipelines Design
- Data placement and computation are viewed as full-fledged jobs
- Data placement is handled by Stork
- Computation is handled by Condor/Condor-G
- Dependencies between jobs are handled by DAGMan (see the DAG sketch below)
- Tunable statistics generation/collection tool
- Visualization is handled by DEVise
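
As a minimal sketch of how these pieces fit together (assuming the DAGMan DATA keyword that routes a node to Stork rather than Condor, and with made-up file names), one file moving through the pipeline might be described like this:

    # pipeline.dag -- one file through the pipeline (illustrative names)
    # DATA nodes are data placement jobs executed by Stork;
    # JOB nodes are computations executed by Condor/Condor-G.
    DATA  stage_in   stage_in.stork
    JOB   process    process.condor
    DATA  stage_out  stage_out.stork

    PARENT stage_in CHILD process
    PARENT process  CHILD stage_out

Each DATA node points at a Stork submit description, which uses ClassAd-style syntax roughly like the following; the URLs are placeholders, not the actual pipeline endpoints:

    [
      dap_type = "transfer";
      src_url  = "gsiftp://staging.ncsa.uiuc.edu/dposs/plate0042.fits";
      dest_url = "file:///staging/uw/plate0042.fits";
    ]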

Fault Tolerance
- Failure makes automation difficult, and a variety of failures happen in real life: network, software, hardware
- The system is designed with failure in mind
- Hierarchical fault tolerance: Stork/Condor and DAGMan (see the retry sketch below)
- Understands failures: Stork switches protocols
- Persistent logging: recovers from machine crashes
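
On the DAGMan side, node-level retries are a one-line addition to the DAG, and DAGMan's rescue DAG plus its persistent logs let an interrupted run resume where it left off. Continuing the node names from the sketch above:

    # appended to pipeline.dag: retry a failed node up to 5 times
    RETRY stage_in  5
    RETRY process   5
    RETRY stage_out 5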

Self Tuning
- Users are domain experts, not necessarily computer experts
- Data movement is tuned using storage system characteristics and dynamic network characteristics (see the sketch below)
- Computation is scheduled on data availability
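
One concrete example of this kind of network-aware tuning is sizing TCP buffers from the bandwidth-delay product of the wide-area link; a minimal sketch in Python, with purely illustrative numbers:

    # Estimate the TCP buffer size needed to keep a wide-area link full.
    def tcp_buffer_bytes(bottleneck_mbps: float, rtt_ms: float) -> int:
        # bandwidth-delay product: bytes that must be "in flight" at once
        return int((bottleneck_mbps * 1e6 / 8) * (rtt_ms / 1e3))

    # e.g. a 622 Mb/s path with a 60 ms round-trip time needs roughly a 4.7 MB buffer
    print(tcp_buffer_bytes(622, 60))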

Statistics/Visualization
- Network statistics; job run times and data transfer times
- Tunable statistics collection
- Statistics are entered into a Postgres database, and interesting facts can be derived from the data (see the sketch below)
- Dynamic system visualization using DEVise
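
A hedged sketch of what that collection and querying could look like; the table, columns, and numbers here are illustrative, not the pipeline's actual schema:

    # Log one completed transfer and derive per-protocol throughput from Postgres.
    import psycopg2

    conn = psycopg2.connect("dbname=pipeline_stats user=pipeline")
    cur = conn.cursor()

    # record a finished transfer: file, protocol used, bytes moved, wall-clock seconds
    cur.execute(
        "INSERT INTO transfers (file_name, protocol, bytes, seconds) VALUES (%s, %s, %s, %s)",
        ("plate0042.fits", "diskrouter", 1_100_000_000, 740),
    )
    conn.commit()

    # an "interesting fact": average throughput (MB/s) achieved by each protocol
    cur.execute(
        "SELECT protocol, AVG(bytes::float / seconds) / 1e6 FROM transfers GROUP BY protocol"
    )
    for protocol, mb_per_s in cur.fetchall():
        print(f"{protocol}: {mb_per_s:.1f} MB/s")

    cur.close()
    conn.close()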

Real Life Data Pipelines
- Astronomy data processing pipeline: ~3 TB (2611 x 1.1 GB files); joint work with Robert Brunner, Michael Remijan et al. at NCSA
- WCER educational video pipeline: ~6 TB (13 GB files); joint work with Chris Thorn et al. at WCER

DPOSS Data: The Palomar Digital Sky Survey (DPOSS)
- Palomar-Oschin photographic plates used to map one half of the celestial sphere
- Each photographic plate is digitized into a single image
- Calibration is done by a software pipeline at Caltech
- We want to run SExtractor on the images (a sample job description follows below)
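
For a single plate, the processing node could be an ordinary Condor job that wraps SExtractor; a sketch with placeholder names (the wrapper script and its input files are assumptions, not part of the original pipeline):

    # process.condor -- run SExtractor on one digitized plate (illustrative)
    universe                = vanilla
    executable              = run_sextractor.sh
    arguments               = plate0042.fits
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = plate0042.fits, default.sex, default.param
    log                     = process.log
    output                  = process.out
    error                   = process.err
    queue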

[Figure: NCSA pipeline diagram showing input data flow, output data flow, and processing across UniTree @NCSA, Staging Site @NCSA, Staging Site @UW, Condor Pool @UW, and Condor Pool @Starlight.]

NCSA Pipeline
- Moved & processed 3 TB of DPOSS image data in under 6 days
- Most powerful astronomy data processing facility!
- Adapt for other (petabyte-scale) datasets: Quest2, CARMA, NOAO, NRAO, LSST
- Key component in future astronomy cyberinfrastructure

WCER Pipeline
- Need to convert DV videos to MPEG-1, MPEG-2 and MPEG-4
- Each 1-hour video is 13 GB
- Videos are accessible through the Transana software
- Need to stage the original and processed videos to SDSC (a DAG sketch follows below)
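
Expressed as a DAG in the same style as before (node and file names are made up for illustration), one hour of video fans out into three encodings and the results are then archived at SDSC:

    # one video through the WCER pipeline (illustrative names)
    # fetch pulls the 13 GB DV file from WCER; archive stages results to the SRB server at SDSC
    DATA  fetch    fetch_dv.stork
    JOB   mpeg1    encode_mpeg1.condor
    JOB   mpeg2    encode_mpeg2.condor
    JOB   mpeg4    encode_mpeg4.condor
    DATA  archive  put_srb.stork

    PARENT fetch CHILD mpeg1 mpeg2 mpeg4
    PARENT mpeg1 mpeg2 mpeg4 CHILD archive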

WCER Pipeline
- First attempt at such large-scale distributed video processing
- Decoder problems with large 13 GB files
- Uses bleeding-edge technology

    Encoding   Resolution          Average Time
    MPEG-1     Half (320 x 240)    2 hours
    MPEG-2     Full (720 x 480)    8 hours
    MPEG-4     Half (320 x 480)    4 hours

[Figure: WCER pipeline diagram showing input data flow, output data flow, and processing across WCER, the Staging Site @UW, the Condor Pool @UW, and the SRB Server @SDSC.]

[Figure: detailed NCSA pipeline diagram. The user submits a DAG at the Management Site @UW. Components shown: UniTree Server @NCSA (accessed via MSScmd), Staging Site @NCSA, Staging Site @UW, Condor Pool @UW, and Condor Pool @StarLight, connected by DiskRouter, globus-url-copy, and the Condor file transfer mechanism. Numbered arrows mark control flow, input data flow, output data flow, and processing steps.]

Conclusion
- Data pipelines are useful for diverse fields
- We have shown two working case studies, in astronomy and educational research
- Successfully processed terabytes of data
- We are working with our collaborators to make this production quality

Questions? Thanks for listening.

Contact Information
- George Kola: kola@cs.wisc.edu
- Tevfik Kosar: kosart@cs.wisc.edu

Collaborators
- NCSA: Robert Brunner (rb@astro.uiuc.edu)
- NCSA: Michael Remijan (remijan@ncsa.uiuc.edu)
- WCER: Chris Thorn (cathorn@wisc.edu)