GLAST LAT Project
DOE/NASA Baseline-Preliminary Design Review, January 8, 2002
K. Young

Slide 1: LAT Data Processing Facility
- Automatically process Level 0 data through reconstruction (Level 1).
- Provide near real-time feedback to the IOC.
- Facilitate verification and generation of calibration constants.
- Produce bulk Monte Carlo simulations.
- Back up all data that passes through.

Slide 2: LAT Data Processing Facility
Some important numbers:
- Downlink rate: ~300 kb/s → ~3 GB/day → ~1 TB/year.
- Data plus generated products: ~3–5 TB/year, or ~15–25 TB over 5 years.
- Average event rate in telemetry: ~30 Hz (γ plus background).
- Current reconstruction algorithm: ~0.2 sec/event on a 400 MHz Pentium processor; assuming 4 GHz processors by launch, ~0.02 sec/event.
- ~5 processors are more than adequate to keep up with the incoming data while also turning around a day's downlink in ~4 hours.
- This represents only ~1% of the current capacity of the SLAC Computing Center.
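As a quick sanity check, the slide's rates and turnaround time can be reproduced with a few lines of Perl (the language of the existing prototype scripts); this is a minimal back-of-the-envelope sketch, with all constants taken from the bullets above.

```perl
#!/usr/bin/env perl
# Back-of-the-envelope check of the numbers quoted above.
use strict;
use warnings;

my $downlink_kbps = 300;     # downlink rate, kb/s
my $event_rate_hz = 30;      # average telemetry event rate (gamma + background)
my $sec_per_event = 0.02;    # assumed recon time on a ~4 GHz processor at launch
my $n_processors  = 5;

my $gb_per_day  = $downlink_kbps * 1e3 * 86400 / 8 / 1e9;  # ~3.2 GB/day
my $tb_per_year = $gb_per_day * 365 / 1e3;                 # ~1.2 TB/year

my $events_per_day = $event_rate_hz * 86400;                  # ~2.6M events/day
my $cpu_hours      = $events_per_day * $sec_per_event / 3600; # ~14.4 CPU-hours
my $wall_hours     = $cpu_hours / $n_processors;              # ~2.9 h wall time

printf "Downlink: %.1f GB/day, %.1f TB/year\n", $gb_per_day, $tb_per_year;
printf "One day's downlink: %.1f CPU-hours, %.1f h on %d processors\n",
       $cpu_hours, $wall_hours, $n_processors;
```

This yields ~2.9 hours of wall time on 5 processors; the slide's ~4 hours leaves headroom for staging and scheduling overhead.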

Slide 3: LAT Data Processing Facility
- Even inflating the estimates by assuming data is re-processed concurrently with prompt processing, a conservative estimate of the resource requirements over the life of the mission is:
  - a few tens of processors;
  - ~50 TB of disk.
- The SLAC Computing Center is officially committed to providing these resources at no explicit expense to GLAST.

Slide 4: LAT Data Processing Facility
[Chart: projected disk usage over time, with the balloon flight, Mock Data Challenge, and flight milestones marked relative to PDR; Raw + Recon + MC totals ~8.5 TB/year. For scale, BABAR temp space is 25 TB.]

Slide 5: LAT Data Processing Facility
[Chart: projected CPU usage over the same balloon, Mock Data Challenge, and flight milestones; 8 CPUs are dedicated. For scale, SLAC Computing Services will have ~2000 CPUs for BABAR.]

Slide 6: Processing Pipeline
[Diagram: Level 0 data flows from the IOC into the batch system, which produces Level 1 output and diagnostics; Level 0 data is also archived through HSM to an automated tape archive. Scale: a few tens of CPUs and ~50 TB of disk by 2010. See Section 7.8, SAS Overview.]

Slide 7: LAT Data Manager
[Diagram: raw data files (test beam, balloon, flight) and housekeeping files enter the system; ROOTWriter produces raw ROOT files (test beam, balloon, flight, MC); the MC generator (GismoGenerator, G4) produces MC IRF files; reconstruction (aoRecon, TkrRecon, CalRecon) produces reconstructed ROOT files; an Oracle database tracks the processing files; users reach the system through WWW, GUI, and CLI interfaces for event-based analysis (ROOT, ...).]

Slide 8: LAT Data Manager
Automated Server (Data Manager), initial specification:
- Dispatches files in various states to the appropriate processes.
- Tracks the state of processing for all datasets in the system (completed, pending, failed, etc.) and logs this information to the database.
- Provides near real-time feedback to the IOC by performing rapid, high-level diagnostic analyses that integrate data from all subsystems.
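The dispatch loop this specification implies might look like the following Perl sketch (Perl being the prototype's language). The table and column names, the state tokens beyond those quoted on the slide, and the handler scripts are all illustrative assumptions, not the actual schema or tools.

```perl
#!/usr/bin/env perl
# Dispatch-loop sketch: find datasets awaiting work, hand each one to the
# process appropriate to its state, and log the transition to the database.
use strict;
use warnings;
use DBI;

# Hypothetical Oracle connection; DSN and credentials are placeholders.
my $dbh = DBI->connect('dbi:Oracle:GLASTDB', 'pipeline', 'secret',
                       { RaiseError => 1, AutoCommit => 1 });

# Map dataset states to the action that moves them forward (illustrative).
my %handler = (
    LEVEL0_ARRIVED => \&submit_level1_recon,  # new downlink -> reconstruction
    RECON_DONE     => \&submit_diagnostics,   # Level 1 done -> IOC diagnostics
);

my $work = $dbh->selectall_arrayref(
    q{SELECT dataset_id, status FROM datasets WHERE status IN (?, ?)},
    { Slice => {} }, keys %handler);

for my $ds (@$work) {
    my $ok = eval { $handler{ $ds->{status} }->( $ds->{dataset_id} ); 1 };
    # Record the outcome: 'pending' while the job runs, 'failed' on error.
    $dbh->do(q{UPDATE datasets SET status = ? WHERE dataset_id = ?},
             undef, $ok ? 'pending' : 'failed', $ds->{dataset_id});
}

# Stub actions: submit batch jobs via hypothetical wrapper scripts.
sub submit_level1_recon { my ($id) = @_; system('bsub', 'recon.pl', $id) == 0 }
sub submit_diagnostics  { my ($id) = @_; system('bsub', 'diag.pl',  $id) == 0 }
```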

Slide 9: LAT Data Manager
Automated Server (Data Manager), continued:
- The design is simplified by having all datasets always on disk, at least virtually, using the HSM provided and supported by the SLAC Computing Center.
- Uses the load-balancing LSF batch system at SLAC to dispatch processing jobs in parallel.
- Provides a WWW interface for dispatching and tracking processes.
- The current prototype at SLAC, written in Perl, is used for processing MC runs.
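For the LSF step, submitting one reconstruction job from Perl could look like the sketch below. The -q, -J, and -o options (queue, job name, output file) are standard bsub flags; the queue name, wrapper script, and paths are placeholders rather than the actual GLAST configuration.

```perl
use strict;
use warnings;

# Submit one reconstruction job to LSF (sketch; names are placeholders).
sub submit_job {
    my ($dataset_id, $input_file) = @_;
    my @cmd = ('bsub',
               '-q', 'medium',                      # LSF queue (hypothetical)
               '-J', "recon_$dataset_id",           # job name, used for tracking
               '-o', "logs/recon_$dataset_id.log",  # stdout/stderr log file
               'runRecon.pl', $input_file);         # hypothetical recon wrapper
    system(@cmd) == 0 or die "bsub failed for dataset $dataset_id: $?";
}

submit_job(42, '/hsm/glast/level0/run00042.dat');   # illustrative call
```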

Slide 10: LAT Data Processing Database
- The heart of the data processing facility is a database that holds the state of processing, together with an automated server.
- The relational database tracks the state of file-based datasets throughout their lifetime in the system, from arrival at the IOC or MC generation through Level 1 output. (An entity diagram of the prototype appears on the next slide.)
- The automated server will poll IOC-generated database entries for new Level 0 datasets and take immediate action, generate MC data as needed, and log all actions to the database.

Slide 11: Prototype DB ERM
[Figure: entity-relationship model of the prototype data processing database.]

Slide 12: LAT Data Processing Database
- Three categories of relational tables:
  1. Tasks
  2. Processes
  3. Datasets
- The tables allow for grouping of similar datasets.
- "Values" entries in "info" tables allow customization of the tables by task, hopefully allowing different metadata for MC, flight, test beam, ... data.
- The current prototype is based on experience with a similar data pipeline used for the SLD experiment at SLAC.
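As a concrete illustration of the three categories plus a name/value "info" table, the DDL below is a toy version issued through Perl/DBI; every table and column name is a guess based on the slide text, not the prototype's actual ERM shown above.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:GLASTDB', 'pipeline', 'secret',
                       { RaiseError => 1 });

# Toy schema: tasks group datasets, processes act on them, datasets are files.
$dbh->do($_) for (
    q{CREATE TABLE tasks (
        task_id    NUMBER PRIMARY KEY,
        name       VARCHAR2(64),   -- e.g. 'flight', 'BFEM', '50M bkg pdrApp v7'
        task_type  VARCHAR2(16))   -- MC, flight, test beam, ...
    },
    q{CREATE TABLE processes (
        process_id NUMBER PRIMARY KEY,
        task_id    NUMBER REFERENCES tasks,
        executable VARCHAR2(128),  -- executable name
        version    VARCHAR2(32))   -- and its version number
    },
    q{CREATE TABLE datasets (
        dataset_id NUMBER PRIMARY KEY,
        process_id NUMBER REFERENCES processes,  -- producing process
        location   VARCHAR2(256),  -- file path on disk or HSM
        nbytes     NUMBER,
        status     VARCHAR2(16))   -- completed, pending, failed, ...
    },
    q{CREATE TABLE task_info (     -- per-task metadata as name/value pairs
        task_id    NUMBER REFERENCES tasks,
        name       VARCHAR2(64),
        value      VARCHAR2(256))
    },
);
```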

Slide 13: LAT Data Processing Database
Database Tables I: Tasks
- Major groupings of datasets, e.g. flight data, BFEM data, or particular MC simulations (e.g. 50M background events using pdrApp v7).
- Allows grouping of the datasets associated with these tasks.
- Tasks will have differing types of metadata describing them, particularly for MC simulations.

Slide 14: LAT Data Processing Database
Database Tables II: Processes
- A series of processes will be applied to each input dataset to produce the final output (note that for simulations the input 'dataset' may be an initial random number seed or seed sequence).
- The database will track the sequence of processes as well as all datasets generated by them.
- Different tasks may require different processes and, indeed, different sequences.
- The executable and its version number should be identified in the database.
- Properties of the processes (jobs) should also be tracked, for example memory used, CPU time, and node name.

Slide 15: LAT Data Processing Database
Database Tables III: Datasets
- Data will be handled as files (on disk or tape).
- Datasets will be the inputs and outputs of the processes in the sequences.
- Processes can generate multiple datasets.
- Processes in a sequence may depend on particular output datasets of precursor processes.
- Properties of datasets will be recorded, e.g. location, size, status.
- Datasets may also carry metadata (different for different tasks).
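Putting the two preceding slides together, logging a finished job and its output datasets might look like the sketch below, continuing the toy schema above; the job-property columns (cpu_sec, mem_mb, node) and the dataset_seq Oracle sequence are illustrative additions, not the real bookkeeping.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:GLASTDB', 'pipeline', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# Record a completed job and register every dataset it produced.
sub record_job {
    my ($process_id, $job, @outputs) = @_;

    # Job properties: CPU time, memory used, node name (e.g. from LSF accounting).
    $dbh->do(q{UPDATE processes SET cpu_sec = ?, mem_mb = ?, node = ?
               WHERE process_id = ?},
             undef, $job->{cpu_sec}, $job->{mem_mb}, $job->{node}, $process_id);

    # A single process can generate multiple output datasets.
    for my $file (@outputs) {
        $dbh->do(q{INSERT INTO datasets (dataset_id, process_id, location,
                                         nbytes, status)
                   VALUES (dataset_seq.NEXTVAL, ?, ?, ?, 'completed')},
                 undef, $process_id, $file, -s $file);
    }
    $dbh->commit;   # one transaction per job keeps the bookkeeping consistent
}

# Illustrative call: recon job 1001 produced two ROOT files.
record_job(1001, { cpu_sec => 512, mem_mb => 180, node => 'farm05' },
           '/hsm/glast/level1/run00042_recon.root',
           '/hsm/glast/level1/run00042_diag.root');
```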

Slide 16: Data Manager Prototype
The existing Data Manager prototype is a set of Perl scripts that:
- performs automated MC batch processing using LSF and the SLAC batch farm (e.g. it produced 50M background events and 10M gammas for the PDR studies);
- provides utilities for processing, filtering, and displaying the results of MC runs;
- provides very preliminary scripts for entering the results of MC runs into Oracle tables.
It will evolve into the Data Manager for the Data Processing Facility (DPF) by being split into a server and a set of utility packages, as described in the Data Manager spec.

Slide 17: Manpower & Schedule
- The budget cut for '02 delays the official start until FY '03:
  - starting anyway with prototypes;
  - use student help and steal time.
- Manpower estimates:
  - Server: 1 FTE for 1 year
  - Diagnostics: 1 FTE for 6 months
  - Web interface: 1 FTE for ~2 months
  - Support: 0.5 FTE
- Schedule:
  - We'll see how the student help works out; we want a system in place now to track MC processing.
  - Diagnostics: delay the start until nearer launch; may need to set up for the Qual unit in late '03.