Integrating JASMine and Auger
Sandy Philpott
Thomas Jefferson National Accelerator Facility
12000 Jefferson Ave., Newport News, Virginia USA 23606

Spring 2006 HEPiX – CASPUR

JASMine
Jefferson Lab Mass Storage System – Tape and Disk
- 2 STK Powderhorn silos
- 1.5 PB used out of 2 PB offline storage capacity
- Components:
  - Data Movers – 9940B tape drives and local staging disks
    - Legacy 9940A drives – read only, used for tape copies
  - Disk Cache – 60 TB online storage
  - Library Manager

Auger
Jefferson Lab Batch Computing System
- LSF 6 with fair-share scheduling
- 175 dual-processor Intel nodes, PIIIs and Xeons
  - PIIIs run 3 jobs each
  - Xeons use hyperthreading and run 7 jobs each
- 1100 job slots
- Increase total job throughput – keep the CPUs busy!

Integration Overview
- The Auger batch farm is JASMine's largest client (other clients are the physics experiments, interactive users, HPC, SRM grid users…)
- Goal: keep the farm CPUs busy! (the two rules below are sketched in code after this slide)
  - Don't start a job until its data is online in the cache
  - Don't delete a job's data from the cache until the job is done with it (until the data has been copied locally)
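These two rules amount to a gate-and-pin protocol between the batch system and the cache. A minimal illustrative sketch in Python follows; the class and method names are hypothetical, not the actual JASMine or Auger APIs.

```python
# Illustrative gate-and-pin sketch; all names are hypothetical, not JASMine/Auger APIs.

class CacheView:
    """Toy view of the disk cache as seen by the batch system."""
    def __init__(self, online_files):
        self.online = set(online_files)   # files currently on cache disk
        self.pins = {}                    # file -> set of job ids still using it

    def is_online(self, f):
        return f in self.online

    def pin(self, f, job_id):
        self.pins.setdefault(f, set()).add(job_id)

    def unpin(self, f, job_id):
        self.pins.get(f, set()).discard(job_id)

    def may_delete(self, f):
        # Rule 2: never delete data a running job still needs.
        return not self.pins.get(f)


def ready_to_start(job_id, job_files, cache):
    """Rule 1: only let the job take a slot when all of its inputs are online."""
    if not all(cache.is_online(f) for f in job_files):
        return False                      # keep the slot free; prestage instead
    for f in job_files:
        cache.pin(f, job_id)
    return True
```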

Integration
In the early implementations, the mass storage system and the batch farm were independent entities with little coordination between them: a farm job would start, specify its data, and then wait for it to arrive online. Auger now prestages a job's data, so the job doesn't occupy a job slot and tie up a CPU sitting idle before it can really run.
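As a rough illustration of that flow, a submit-time prestage hook might look like the sketch below; the cache and batch-system interfaces here are assumed placeholders, not the real Auger code.

```python
# Hypothetical sketch of submit-time prestaging: the job is held (using no
# batch slot) until its input files are online, then released for dispatch.

def submit_with_prestage(job, cache, batch):
    missing = [f for f in job.input_files if not cache.is_online(f)]
    if missing:
        cache.request_stage(missing)     # start tape-to-cache copies now
        batch.submit(job, hold=True)     # queued, but not eligible for a slot
    else:
        batch.submit(job, hold=False)    # data already cached: run whenever scheduled

def on_stage_complete(job, cache, batch):
    # Called when the cache reports newly arrived files for this job.
    if all(cache.is_online(f) for f in job.input_files):
        batch.release(job)               # now the job may occupy a CPU
```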

Integration (cont.)
But make sure prestaging doesn't preload too much data…
- In the early implementation, a user who submitted many jobs would cause too much data to be prestaged FIFO, so data would overrun the cache and be gone before it was used!
- Auger now provides simple, smart staging: keep just enough data prestaged for each user to stay ahead of the LSF fair-share scheduler (at least 1 file)
  - Algorithm based on the ratio of free space to how much a user has already used (configurable; see the sketch below)
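The slide doesn't give the exact formula, so the following is only a hedged sketch of the idea: cap each user's prestaged data by a configurable fraction of the free cache space, while always allowing at least one file. The ratio_limit knob and function names are assumptions for illustration.

```python
# Hedged sketch of per-user smart staging: budget prestaged data by the
# free-space/usage ratio, but always allow at least one file so the user
# stays ahead of the LSF fair-share scheduler.

def files_to_prestage(user_cached_bytes, cache_free_bytes, pending_files,
                      ratio_limit=0.25):
    """pending_files: (name, size_bytes) pairs in the order jobs will need them."""
    budget = max(0.0, ratio_limit * cache_free_bytes - user_cached_bytes)
    selected, used = [], 0
    for name, size in pending_files:
        if not selected or used + size <= budget:   # "at least 1 file"
            selected.append(name)
            used += size
        else:
            break
    return selected

# Example: with 10 TB free, ratio_limit=0.25, and 1.5 TB already cached for
# this user, the remaining budget is 1.0 TB of additional prestaged data.
```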

Adaptive Cache
Simple round robin across multiple distributed cache disk servers wasn't enough. Use an adaptive cache that considers not only
- the total disk usage of the cache server
but also
- the current load on the disk (a toy selection sketch follows this slide).
Combine with:
- Strict or relaxed allocation
  - Strict – files may not exceed the allocation limit
  - Relaxed – files may exceed the limit if additional space is available
- Strict or automatic deletion
  - Automatic – FIFO deletion policy
  - Strict – delete only if the user requests deletion
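A toy version of the adaptive server choice could weight fullness and disk load together when deciding where to place a new file; the 50/50 weighting below is purely illustrative, not JASMine's actual policy.

```python
# Illustrative adaptive cache-server selection: instead of round robin, prefer
# the server that is both least full and least loaded. Weights are made up.

def pick_cache_server(servers):
    """servers: dicts with 'name', 'used_fraction' (0-1) and 'load' (0-1)."""
    def score(s):
        return 0.5 * s["used_fraction"] + 0.5 * s["load"]   # lower is better
    return min(servers, key=score)["name"]

servers = [
    {"name": "cache01", "used_fraction": 0.90, "load": 0.20},   # nearly full
    {"name": "cache02", "used_fraction": 0.60, "load": 0.70},   # busy disks
    {"name": "cache03", "used_fraction": 0.55, "load": 0.30},
]
print(pick_cache_server(servers))   # -> cache03
```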

Throughput
JASMine routinely moves 8–10 TB/day and has seen a maximum of 20 TB/day. These numbers are to/from tape only; they are higher if the online disk files are included…

Conclusions
JASMine is distributed, scalable, modular, and efficient, and allows for further growth with minimal changes. Adding simple Auger features for smart prestaging, together with adaptive cache policies, takes advantage of JASMine's capabilities.