Distributed Storage Wednesday morning, 9:00am Derek Weitzel OSG Graduate Research Assistant University of Nebraska – Lincoln

2013 OSG User School Who Am I? PhD Graduate Student from Nebraska Student of International Summer School for Grid Computing 2

2013 OSG User School Cloud Storage? Has anyone used cloud storage before? 3

2013 OSG User School Why did you use cloud storage? Why did you use Dropbox?  Sharing files with others?  Access to the files from anywhere (web)?  Not necessarily for storage: you still have to keep a copy on your own computer 4

2013 OSG User School What would Dropbox look like in Science? Would Dropbox look the same in Science? Do you want a copy of all your files on your computer and another somewhere in the ‘Cloud’ (2 copies)? Do you want it accessible from everywhere? 5

2013 OSG User School What would Dropbox look like in Science? I think Dropbox would look different. Data is getting too big to keep a ‘local’ copy on your desktop plus another copy somewhere else. You want the total dataset somewhere big and well maintained (the Cloud?), and only a working set on your desktop. 6

2013 OSG User School Outline Common data patterns in HTC applications Storage architecture for HTC sites Strategies for managing data Reliability and the cost of complexity Prestaging data on the OSG Advanced data management architectures 7

Experiment data volumes: CMS 6 PB raw/run, Phobos 50 TB/run, E917 5 TB/run 8

Total needed data volume on tape: 84 PB 9

2013 OSG User School Common Storage Patterns There are many variations in how storage is handled, not only across different workflows but even within a single workflow. Important: Pick a proper storage strategy for your application! 10

2013 OSG User School Common Storage Patterns I’m going to highlight a few of the storage patterns seen at HTC sites It’s important to note that moving files around is relatively easy. Error cases are the hard part of storage management. 11

2013 OSG User School Classifying Usage Ask yourself, per job:  Events in a job’s life:  INPUT: What should be available when the job is starting?  RUNTIME: What is needed while the job runs?  OUTPUT: What output is produced?  Important quantities:  FILES: How many files?  DATA: How big is the “working set” of data? How big is the sum of all the files?  RATES: How much data is consumed on average? At peak?

2013 OSG User School Why do we care about these quantities? FILES: How many files?  Many file systems cannot handle lots of small files DATA:  The working set is how much you need on a worker node for 1 ‘unit’ of work  How much space do you need on the file server to store the input? The output? 13

2013 OSG User School Why do we care about these quantities? RATES: How much data is consumed on average? At peak?  Even professionals have a hard time figuring this out  Rates determine how you should stage your data (we will talk about staging shortly) 14
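For illustration, with made-up numbers: a job that reads a 2 GB input file over a 4-hour run averages roughly 2,000 MB / 14,400 s ≈ 0.14 MB/s, but if it reads the whole file during its first minute the peak rate is 2,000 MB / 60 s ≈ 33 MB/s. Average and peak can differ by orders of magnitude, and it is usually the peak that stresses the storage system.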

2013 OSG User School Examples of usage Simulation  Small input, big output Searches  Big input, small output Data processing  ~same input and output Analysis  Big input, small output 15

2013 OSG User School Simulation Based on different input configurations, generate the physical response of a system.  Input: The HTC application must manage many input files; one per job.  Runtime: An executable reads the input and later produces output. Sometimes, temporary scratch space is necessary for intermediate results.  Output: These outputs can be small [KB] (total energy and configuration of a single molecule) or large [GB] (electronic readout of a detector for hundreds of simulated particle collisions).  “Huge” outputs [TB] are currently not common in HTC.

2013 OSG User School Searches Given a database, calculate a solution to a given problem.  Input: A database (possibly several GB) shared for all jobs, and an input file per job.  Runtime: Job reads the configuration file at startup, and accesses the database throughout the job’s runtime.  Output: Typically small (order of a few MB); the results of a query/search.

2013 OSG User School Data Processing Transform dataset(s) into new dataset(s). The input dataset might be re-partitioned into new logical groupings, or changed into a different data format.  Input: Configuration file and input dataset.  Runtime: Configuration is read at startup; input dataset is read through, one file at a time. Scratch space used for staging output.  Output: Output dataset; similar in size to the input dataset.

2013 OSG User School Analysis Given some dataset, analyze and summarize its contents.  Input: Configuration file and data set.  Runtime: Configuration file is read, then the process reads through the files in the dataset, one at a time (approximately constant rate). Intermediate output written to scratch area.  Output: Summary of dataset; smaller than input dataset.

2013 OSG User School OSG Anti-Patterns Just as we want to identify common successful patterns, we want to identify common patterns that are unsuccessful on the OSG.  Files larger than 5 GB.  Requiring a (POSIX) shared file system.  Lots of small files (more than 1 file per minute of computation time).  Jobs consuming more than 10 GB per hour, or needing more than 5 GB of scratch space.  Locking, appending, or updating files. When using OSG resources opportunistically, the effective limits may be even more restrictive than the above!

2013 OSG User School Exercise to help identify problems Remember, we want to identify:  Input data – How much data is needed for the entire workflow?  Working set – How much data is needed to run 1 unit of work, including possible temporary files?  Output data – How much data is produced by the workflow? 21

2013 OSG User School Questions? Questions? Comments?  Feel free to ask me questions: Derek Weitzel Upcoming sessions  9:30-10:15  Hands-on exercises  10:15 – 10:30  Break  10:30 – 12:00  More! 22

2013 OSG User School Exercise ucation/OSGSS2013StorageEx1 23

Lecture 2: Using Remote Storage Systems Derek Weitzel

2013 OSG User School Last Exercise What were some of the numbers from the last exercise? 25

2013 OSG User School Last Exercise Yeast:  Data: 22 MB (executable = 15 MB, why?)  Rate: 22 MB / 0.3 s ≈ 73 MB/s  Files: 11 / 0.3 s ≈ 36 files/s Compare:  Spinning disk ≈ 60 MB/s 26

2013 OSG User School Storage Architectures on the OSG Let's look at how storage is structured on the OSG. This will give you an idea of where the bottlenecks can be, and it can also help when forming data solutions of your own.

2013 OSG User School Storage at OSG CEs All OSG sites have some kind of shared, POSIX-mounted storage (typically NFS).* This is almost never a distributed or high-performance file system This is mounted and writable on the CE.* This is readable (sometimes read-only) from the OSG worker nodes. 28 *Exceptions apply! Sites ultimately decide

2013 OSG User School Why Not? This setup is called the “classic SE” setup, because this is how the grid worked in its early days.  Why didn’t this work? High-performance filesystems were not reliable or cheap enough. Scalability issues. Difficult to manage space. 29

2013 OSG User School Storage Elements In order to make storage and transfers scalable, sites set up a separate system for storage (the Storage Element). Most sites have an attached SE, but there’s a wide range of scalability. These are separated from the compute cluster; normally, you interact with it via a get or put of the file.  Not POSIX! 30

2013 OSG User School Storage Elements on the OSG (diagram: the user’s point of view) 31

2013 OSG User School SE Internals (diagram) 32

2013 OSG User School Complicated? 33

2013 OSG User School GridFTP in One Slide A set of extensions to the classic FTP protocol. The two most important extensions:  Security: Authentication, encryption, and integrity provided by GSI/X509. Uses proxies instead of username/password.  Improved scalability: Instead of transferring a file over one TCP connection, multiple TCP connections may be used.
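For illustration, a command-line GridFTP transfer might look like the following; this sketch assumes the Globus client tools are installed and a valid grid proxy already exists, and the host and paths are placeholders:

$ globus-url-copy -p 4 file:///home/user/input.dat gsiftp://se.example.edu:2811/store/user/input.dat

Here -p 4 asks for four parallel TCP streams, which is the scalability extension mentioned above.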

2013 OSG User School SRM SRM = Storage Resource Management SRM is a web-services-based protocol for doing:  Metadata Operations.  Load-balancing.  Space management. This allows us to access storage remotely, and to treat multiple storage implementations in a homogeneous manner.
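For illustration, a put through SRM with the dCache srmcp client might look like this; the endpoint and paths are placeholders, and the exact SURL format varies between sites and SRM implementations:

$ srmcp -2 file:////home/user/output.dat srm://se.example.edu:8443/data/user/output.dat

The SRM service typically hands back a transfer URL for one of the GridFTP servers behind it, which is where the load-balancing comes in.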

2013 OSG User School Data Access Methods To go with our archetypical problems, we’ll have a few common approaches to implementing solutions:  Job Sandbox  Prestaging  Caching  Remote I/O A realistic solution will combine multiple methods. Descriptions to follow discuss the common case; there are always exceptions or alternates.

2013 OSG User School Job Sandbox Data sent with jobs. The user generates the files on the submit machines. These files are copied from submit node to worker node.  With Condor, a job won’t even start if the sandbox can’t be copied first. You can always assume it is found locally. What are the drawbacks of the job sandbox?
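For example, in an HTCondor submit file the sandbox is described with the file-transfer settings; a minimal sketch, with placeholder file names:

universe = vanilla
executable = analyze.sh
transfer_input_files = config.txt, reference.db
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = job.out
error = job.err
log = job.log
queue

Everything listed in transfer_input_files (plus the executable) is copied to the worker node before the job starts, which is why the job can assume those files are present locally.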

2013 OSG User School Prestaging Data placed into some place “near” to where the jobs are run. By increasing the number of locations of the data and doing intelligent placement, we increase scalability.  In some cases, prestaging = making a copy at each site where the job is run.  Not always true; a single storage element may be able to support the load of many sites.

2013 OSG User School Caching A cache transparently stores data close to the user process. Future requests can be served from the cache.  User must place file/object at a source site responsible for keeping the file permanently. Cache must be able to access the file at its source.  Requesting a file will bring it into the cache.  Once the cache becomes full, an eviction policy is invoked to decide what files to remove.  The cache, not the user, decides what files will be kept in the cache.

2013 OSG User School Remote I/O Application data is streamed from a remote site upon demand. I/O calls are transformed into network transfers.  Typically, only the requested bytes are moved.  Often, data is streamed from a common source site.  Can be done transparently so the user application doesn’t know the I/O isn’t local. Can be effectively combined with caching to scale better.

2013 OSG User School Scalability For each of the four previous methods (sandboxes, prestaging, caching, remote I/O), how well do they scale?  What are the limitations on data size?  What resources are consumed per user process?  What are the scalability bottlenecks?

2013 OSG User School The Cost of Reliability For each of the four previous methods, what must be done to create a reliable system?  What’s the cost for tracking the location of files?  What recovery must be done on failures?  What must be done manually, or require extra infrastructure?  What are the most critical bottlenecks or points of failure?  How do these tie into the job submission system?

2013 OSG User School Comparing Compute and Storage The “shared resource” for computing is the gatekeeper.  One badly behaving user can overload the gatekeeper, preventing new jobs from starting.  However, once jobs are started, they are mostly independent.  There are some shared aspects: what are they? Why am I not concerned about them?

2013 OSG User School Comparing Compute and Storage Storage is different:  It is often used throughout the job’s lifetime, especially for remote I/O.  One badly behaved user can crash the storage resource, or at least severely degrade it.  Opportunistic usage of storage via prestaging does not automatically have a limited lifetime.  Most users assume that data, once written to an SE, will be retrievable.

2013 OSG User School Prestaging Data on the OSG Prestaging is currently the most popular data management method on the OSG, but requires the most work.  Requires a transfer system to move a list of files between two sites. Example systems: Globus Online, Stork, FTS.  Requires a bookkeeping system to record the location of datasets. Examples: LFC, DBS.  Requires a mechanism to verify the current validity of file locations in the bookkeeping systems. Most systems are ad-hoc or manual.

2013 OSG User School Previous example? In the first exercise, you formulated a data plan for BLAST. Now, you are going to implement the data movement for BLAST using prestaging. 46

2013 OSG User School Exercise Time! on/OSGSS2013StorageEx2 47

Lecture 3: Caching and Remote I/O Derek Weitzel

2013 OSG User School HTTP Caching is easy on the OSG Widely deployed caching servers (required by other experiments) Tools are easy to use  curl, wget Servers are easy to set up  You just need an HTTP server, somewhere 49

2013 OSG User School HTTP Caching In practice, you can cache almost any URL Site caches are typically configured to cache any URL, but only accept requests from worker nodes Sites control the size and policy of their caches; keep this in mind. 50

2013 OSG User School HTTP Caching - Pitfalls curl – You have to add a special argument to allow caching (it clears the Pragma: no-cache header that curl otherwise sends when using a proxy):  curl -H "Pragma:" Services like Dropbox and Google Docs explicitly disable caching  Dropbox: cache-control: max-age=0  Google Docs: Cache-Control: private, max-age=0  They also use uncacheable HTTPS 51
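Putting that together, a cached download with curl might look like this; the proxy and source hosts are the class machines used later in this lecture, and the file name is a placeholder:

$ curl -H "Pragma:" -x http://osg-ss-se.chtc.wisc.edu:3128 -O http://osg-ss-submit.chtc.wisc.edu/~user/input.dat

The -x option points curl at the squid proxy, and the empty Pragma header removes the no-cache hint so the proxy is allowed to answer from its cache.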

2013 OSG User School HTTP Caching The HTTP protocol is perhaps the most popular application-layer protocol on the planet. Built into HTTP 1.1 are elaborate mechanisms for using HTTP through a proxy and for caching. There are mature, widely-used command-line clients for the HTTP protocol on Linux.  curl and wget are the most popular; we will use wget for the exercises.  Oddly enough, curl disables caching by default.

2013 OSG User School HTTP Caching: What do you need? All you need is a public webserver… somewhere. Files will be pulled down to the worker node through the site's squid cache. 53

2013 OSG User School HTTP Cache Dataflow For this class, we will use: -Source site: osg-ss-submit.chtc.wisc.edu -HTTP cache: osg-ss-se.chtc.wisc.edu
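A minimal sketch of how a job script might use this setup; wget honors the http_proxy environment variable, and the file path here is a placeholder:

$ export http_proxy=http://osg-ss-se.chtc.wisc.edu:3128
$ wget http://osg-ss-submit.chtc.wisc.edu/~user/input.dat

The first request for a URL misses and is fetched from the source site; later requests for the same URL, from any worker node behind the same proxy, are served from the cache, as the X-Cache: HIT header on the next slides shows.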

2013 OSG User School Proxy Request Here are the HTTP headers for the request that goes to the proxy osg-ss-se.cs.wisc.edu:
GET HTTP/1.0
User-Agent: Wget/ Red Hat modified
Accept: */*
Host: osg-ss-submit.chtc.wisc.edu

2013 OSG User School Proxy Request Response: 56
HTTP/ OK
Date: Tue, 19 Jun :53:26 GMT
Server: Apache/2.2.3 (Scientific Linux)
Last-Modified: Tue, 19 Jun :26:56 GMT
ETag: "136d0f-59-4c2d9f120e800"
Accept-Ranges: bytes
Content-Length: 89
Content-Type: text/html; charset=UTF-8
X-Cache: HIT from osg-ss-se.chtc.wisc.edu
X-Cache-Lookup: HIT from osg-ss-se.chtc.wisc.edu:3128
Via: 1.0 osg-ss-se.chtc.wisc.edu:3128 (squid/2.6.STABLE21)
Proxy-Connection: close

2013 OSG User School HTTP Proxy An HTTP proxy is very popular for small VOs: easy to manage, easy to implement. Highly recommended for files smaller than 100 MB. 57

2013 OSG User School Now on to Remote I/O Do you only read parts of the data (skip around in the file)? Do you have data that is only available on the submit host? Do you have an application that requires a specific directory structure? 58

2013 OSG User School Remote I/O is for you 59

2013 OSG User School Remote I/O 60

2013 OSG User School Remote I/O We will perform Remote I/O with the help of Parrot, from the University of Notre Dame. It redirects I/O requests back to the submit host. It can ‘pretend’ that a remote directory is locally accessible. 61
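As a rough sketch of how this is invoked (the executable name is a placeholder; this assumes the CCTools parrot_run binary is available on the worker node):

$ parrot_run ./my_analysis input.dat

Roughly speaking, parrot_run intercepts the program's ordinary open/read/write calls and turns accesses to paths like /chirp/... into network requests back to the Chirp server, so the application never knows the data is remote.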

2013 OSG User School Examples of Remote I/O What does Remote I/O look like in practice? 62
On the submit host:
$ echo "hello from submit host" > newfile
In a script on the worker node (in Nebraska?):
$ cat /chirp/CONDOR/home/dweitzel/newfile
hello from submit host
$

2013 OSG User School Celebrate! 63

2013 OSG User School Downsides of Remote I/O What are some downsides to Remote I/O? All I/O goes through the submit host. Distributed remote I/O is hard (though it can be done; see XrootD). 64

2013 OSG User School Now let's do it! on/OSGSS2013StorageEx3 65