Disk Farms at Jefferson Lab
Bryan Hess

Background
Data stored in a 6,000-tape StorageTek silo
Data throughput > 2 TB per day
Batch farm of ~250 CPUs for data reduction and analysis
Interactive analysis as well

User Needs
Fast access to frequently used data from the silo
Automatic staging of files for the batch farm (distinct disk pool for the farm)
Tracking of disk cache use
  – Disk use
  – File use
  – Access patterns

Cache Disk Management
Read-only area with a subset of silo data
Unified NFS view of cache disks in /cache
User interaction with cache
  – jcache (request files)
  – jcache -g halla (request for a specific group)
  – jcache -d (early deletion)

Cache disk policies
Disk pools are divided into groups
Management policy is set per group (sketched below):
  – Cache – least-recently-used (LRU) files removed as needed
  – Stage – reference counting
  – Explicit – manual addition and deletion
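
The slide only names the three policies; as a rough illustration (written in modern Java, with invented interface and class names, not the actual JASMine code) they could be modeled as pluggable victim-selection strategies applied per group:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal view of a cached file for the purposes of this sketch. */
interface CachedFile {
    long lastAccess();      // last-read time (epoch millis)
    long size();            // bytes
    int referenceCount();   // outstanding batch-farm references
}

/** A group's policy decides which files may be deleted to free space. */
interface CachePolicy {
    List<CachedFile> selectVictims(List<CachedFile> groupFiles, long bytesNeeded);
}

/** "cache" groups: least-recently-used files are removed as needed. */
class LruPolicy implements CachePolicy {
    public List<CachedFile> selectVictims(List<CachedFile> files, long bytesNeeded) {
        files.sort(Comparator.comparingLong(CachedFile::lastAccess)); // oldest first
        List<CachedFile> victims = new ArrayList<>();
        long freed = 0;
        for (CachedFile f : files) {
            if (freed >= bytesNeeded) break;
            victims.add(f);
            freed += f.size();
        }
        return victims;
    }
}

/** "stage" groups: a file is only removable once its reference count drops to zero. */
class StagePolicy implements CachePolicy {
    public List<CachedFile> selectVictims(List<CachedFile> files, long bytesNeeded) {
        List<CachedFile> victims = new ArrayList<>();
        long freed = 0;
        for (CachedFile f : files) {
            if (freed >= bytesNeeded) break;
            if (f.referenceCount() == 0) { victims.add(f); freed += f.size(); }
        }
        return victims;
    }
}

/** "explicit" groups: the system never deletes automatically. */
class ExplicitPolicy implements CachePolicy {
    public List<CachedFile> selectVictims(List<CachedFile> files, long bytesNeeded) {
        return new ArrayList<>();   // additions and deletions are manual only
    }
}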

Architecture: Hardware
Linux (now prexx?)
Dual 750 MHz Pentium III, Asus motherboards
Mylex RAID controllers
11 x 73 GB disks ≈ 800 GB RAID 0
Gigabit Ethernet to Foundry BigIron switch
…about 3¢/MB

Architecture: Software
Java 1.3
Cache manager on each node
MySQL database used by all servers
Protocol for file transfers (more shortly)
Writes to cache are never NFS; reads from cache may be NFS
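
The slides do not show how the cache managers talk to MySQL; purely as a hedged sketch, each node might register and look up files in a shared catalog table over JDBC. The JDBC URL, table, and column names below are invented:

import java.sql.*;

/** Illustrative only: record cached-file locations in a shared MySQL catalog. */
class CacheCatalog {
    private final Connection conn;

    CacheCatalog(String jdbcUrl, String user, String pass) throws SQLException {
        // e.g. "jdbc:mysql://dbhost/cache" -- host and schema are assumptions
        this.conn = DriverManager.getConnection(jdbcUrl, user, pass);
    }

    /** Called by a node's cache manager after it stores a file locally. */
    void register(String path, String group, String node, long size) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO cached_files (path, grp, node, size, last_access) " +
            "VALUES (?, ?, ?, ?, NOW())");
        ps.setString(1, path);
        ps.setString(2, group);
        ps.setString(3, node);
        ps.setLong(4, size);
        ps.executeUpdate();
        ps.close();
    }

    /** Which node holds this file, or null if it is not cached. */
    String locate(String path) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT node FROM cached_files WHERE path = ?");
        ps.setString(1, path);
        ResultSet rs = ps.executeQuery();
        String node = rs.next() ? rs.getString(1) : null;
        rs.close();
        ps.close();
        return node;
    }
}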

Protocol for file moving
Simple, extensible protocol for file copies
Messages are Java serialized objects
Protocol is synchronous – all calls block; asynchrony via threading
Falls back to raw data transfer for speed – faster and fairer than NFS
A session may make many connections
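
A minimal sketch of "Java serialized objects over a synchronous connection", assuming a simple request/reply pattern; the message class names and the ProtocolConnection wrapper are illustrative, not the real API:

import java.io.*;
import java.net.Socket;

/** Illustrative request/reply pair; the real protocol's message types are not shown on the slides. */
class LocateRequest implements Serializable {
    final String path;
    LocateRequest(String path) { this.path = path; }
}

class LocateReply implements Serializable {
    final String node;          // which cache server holds the file, or null
    LocateReply(String node) { this.node = node; }
}

class ProtocolConnection {
    private final ObjectOutputStream out;
    private final ObjectInputStream in;

    ProtocolConnection(Socket s) throws IOException {
        out = new ObjectOutputStream(s.getOutputStream());
        out.flush();                       // push the stream header before reading
        in = new ObjectInputStream(s.getInputStream());
    }

    /** Synchronous call: write one message, block until the reply arrives. */
    Object call(Object request) throws IOException, ClassNotFoundException {
        out.writeObject(request);
        out.flush();
        return in.readObject();            // blocks; callers wanting asynchrony use a thread
    }
}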

Protocol for file moving (continued)
Cache server extends the basic protocol
  – Adds database hooks for the cache
  – Adds hooks for cache policies
  – Additional message types were added
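
One way to picture the extension (all names invented): the cache server subclasses the base handler, intercepts its own message types, and falls through to the base protocol for everything else:

import java.io.Serializable;

class PinRequest implements Serializable {           // an example of an "additional message type"
    final String path;
    PinRequest(String path) { this.path = path; }
}

class BaseServer {
    Object handle(Object msg) {                       // base protocol: file-copy messages only
        throw new IllegalArgumentException("unknown message " + msg.getClass().getName());
    }
}

class CacheServer extends BaseServer {
    @Override
    Object handle(Object msg) {
        if (msg instanceof PinRequest) {
            // database and policy hooks would be consulted here (e.g. bump a reference
            // count for a "stage" pool) before answering
            return Boolean.TRUE;                      // placeholder for the real decision
        }
        return super.handle(msg);                     // everything else: base protocol behavior
    }
}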

Example: Get from cache using our Protocol (1)
cacheClient.getFile("/foo", "halla");
  – send locate request to any server
  – receive locate reply
[Diagram: the client (farm node) asks one of cache1–cache4 "Where is /foo?"; the server consults the database and replies "Cache3 has /foo"]

Example: Get from cache using our Protocol (2)
cacheClient.getFile("/foo", "halla");
  – contact appropriate server
  – initiate direct transfer
  – Returns true on success
[Diagram: the client (farm node) sends "Get /foo" to cache3, which answers "Sending /foo" and streams the file]
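
Putting the two steps together, a client-side getFile might look roughly like the sketch below. It reuses the LocateRequest/LocateReply classes from the protocol sketch; the port, local path, and GetRequest message are assumptions, not the real JASMine API:

import java.io.*;
import java.net.Socket;

class GetRequest implements Serializable {
    final String path, group;
    GetRequest(String path, String group) { this.path = path; this.group = group; }
}

class CacheClient {
    private static final String[] SERVERS = {"cache1", "cache2", "cache3", "cache4"};
    private static final int PORT = 9000;             // assumed port

    boolean getFile(String path, String group) {
        try {
            // (1) Ask any server where the file lives; it answers from the shared database.
            String holder;
            try (Socket s = new Socket(SERVERS[0], PORT)) {
                ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
                out.flush();
                ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                out.writeObject(new LocateRequest(path));
                out.flush();
                holder = ((LocateReply) in.readObject()).node;   // e.g. "cache3 has /foo"
            }
            if (holder == null) return false;          // not cached; the real system would stage it

            // (2) Contact that server and pull the bytes over a raw stream (not NFS).
            try (Socket s = new Socket(holder, PORT);
                 FileOutputStream local = new FileOutputStream("/scratch" + path)) { // local path assumed
                ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
                out.writeObject(new GetRequest(path, group));
                out.flush();
                s.getInputStream().transferTo(local);  // direct transfer of the file body
            }
            return true;                               // true on success, as on the slide
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }
}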

Example: simple put to cache using our Protocol
putFile("/quux", "halla", );
[Diagram: the client (data mover) asks "Where can I put /quux?"; the server consults the database and replies "Cache4 has room"]
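
The put side mirrors the get flow: first ask any server, via the database, where the file can go, then stream the bytes to that server. A hedged sketch of the first step (message and class names invented):

import java.io.*;
import java.net.Socket;

/** Illustrative: the put side of the protocol first asks where the file can go. */
class PutExample {
    /** Returns the cache node with room for the file, or null if none. */
    static String findRoom(String anyServer, int port, String path, String group, long size)
            throws IOException, ClassNotFoundException {
        try (Socket s = new Socket(anyServer, port)) {
            ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
            out.flush();
            ObjectInputStream in = new ObjectInputStream(s.getInputStream());
            out.writeObject(new SpaceRequest(path, group, size)); // "Where can I put /quux?"
            out.flush();
            return ((SpaceReply) in.readObject()).node;           // e.g. "cache4"
        }
        // The actual byte transfer then mirrors step (2) of the get example above.
    }
}

class SpaceRequest implements Serializable {
    final String path, group; final long size;
    SpaceRequest(String p, String g, long s) { path = p; group = g; size = s; }
}
class SpaceReply implements Serializable {
    final String node;
    SpaceReply(String n) { node = n; }
}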

Fault Tolerance
Dead machines do not stop the system
  – Only impact is on NFS clients
Exception handling for
  – Receive timeouts
  – Refused connections
  – Broken connections
  – Complete garbage on connections
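
In Java these failure modes map naturally onto socket options and exception types. A hedged sketch of a client-side call that survives all four (the timeouts and the null-on-failure convention are illustrative choices):

import java.io.*;
import java.net.*;

class RobustCall {
    static Object tryCall(String host, int port, Object request) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 5_000); // refused/unreachable -> exception
            s.setSoTimeout(30_000);                              // receive timeout on reads
            ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
            out.flush();
            ObjectInputStream in = new ObjectInputStream(s.getInputStream());
            out.writeObject(request);
            out.flush();
            return in.readObject();
        } catch (SocketTimeoutException e) {
            return null;                  // receive timeout: caller can try another server
        } catch (ConnectException e) {
            return null;                  // refused connection: server is down
        } catch (EOFException | SocketException e) {
            return null;                  // broken connection mid-reply
        } catch (StreamCorruptedException | ClassNotFoundException e) {
            return null;                  // complete garbage on the connection
        } catch (IOException e) {
            return null;                  // any other I/O failure
        }
    }
}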

Authorization and Authentication
Shared secret for each file transfer session
  – Session authorization by policy objects
  – Example: receive 5 files from
Plug-in authenticators
  – Establish shared secret between client and server
  – No cleartext passwords
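
The slides do not say how the shared secret is established or used; one plausible shape, shown purely as an assumption, is a plug-in authenticator interface plus an HMAC tag on each request so no cleartext password ever crosses the wire:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

/** Illustrative only: interface and HMAC construction are assumptions, not the JASMine scheme. */
interface Authenticator {
    /** Establish a shared secret with the server without sending a cleartext password. */
    byte[] establishSessionSecret(String user, String server) throws Exception;
}

class SessionAuth {
    private final byte[] secret;
    SessionAuth(byte[] secret) { this.secret = secret; }

    /** Tag each request with an HMAC so the server can check it belongs to this session. */
    byte[] sign(String requestDescription) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(requestDescription.getBytes(StandardCharsets.UTF_8));
    }
}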

Bulk Data Transfers
Model supports parallel transfers
  – Many files at once, but not bbftp-style
  – For bulk data transfer over WANs
Web-based class loader – zero-pain updates
Firewall issues
  – Client initiates all connections
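
"Many files at once" can be sketched as a fixed pool of worker threads, each running an ordinary getFile over its own client-initiated connection (so it also stays firewall-friendly). CacheClient here is the hypothetical client from the earlier get example:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BulkTransfer {
    static void fetchAll(CacheClient client, List<String> paths, String group, int streams)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(streams);
        for (String path : paths) {
            pool.submit(() -> client.getFile(path, group));   // each file on its own connection
        }
        pool.shutdown();                                      // the client initiated every connection,
        pool.awaitTermination(1, TimeUnit.HOURS);             // so no inbound firewall holes are needed
    }
}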

Additional Information