Interactive Data Analysis with PROOF
Bleeding Edge Physics with Bleeding Edge Computing
Fons Rademakers, CERN

LHC Data Challenge
The LHC generates:
■ 40 million collisions per second
Combined, the 4 experiments record:
■ After filtering, 100 interesting collisions per second
■ From 1 to 12 MB per collision ⇒ from 0.1 to 1.2 GB/s
■ … collisions registered every year
■ ~10 PetaBytes (10^15 B) per year
■ LHC data correspond to 20 million DVDs per year!
■ Computing power equivalent to … of today's PCs
■ Space equivalent to … large PC disks
[Figure: a DVD stack with one year of LHC data would be ~20 km tall, compared with a balloon at 30 km, an airplane at 10 km and Mt. Blanc at 4.8 km]

HEP Data Analysis
Typical HEP analysis needs a continuous algorithm refinement cycle:
implement algorithm → run over data set → make improvements → (repeat)

HEP Data Analysis
Ranging from I/O bound to CPU bound:
■ Need many disks to get the needed I/O rate
■ Need many CPUs for processing
■ Need a lot of memory to cache as much as possible

Some ALICE Numbers
■ 1.5 PB of raw data per year
■ 360 TB of ESD+AOD per year (20% of raw)
■ One pass over the ESD+AOD using 400 disks at 15 MB/s will take about 16 hours (a quick numerical check follows below)
■ Using parallelism is the only way to analyze this amount of data in a reasonable amount of time
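
The 16-hour figure can be checked from the numbers on the slide alone; a minimal plain C++ sketch, using only the dataset size, disk count and per-disk rate quoted above:

// pass_time.cxx - back-of-the-envelope check of the single-pass time,
// using only the numbers quoted on the slide above.
#include <cstdio>

int main()
{
   const double dataBytes = 360e12;            // 360 TB of ESD+AOD
   const double nDisks    = 400;               // disks read in parallel
   const double diskRate  = 15e6;              // 15 MB/s per disk
   const double aggregate = nDisks * diskRate; // ~6 GB/s in total
   const double seconds   = dataBytes / aggregate;
   std::printf("aggregate rate: %.1f GB/s\n", aggregate / 1e9);
   std::printf("one pass:       %.1f hours\n", seconds / 3600.);
   return 0;
}

This gives roughly 16.7 hours, consistent with the slide.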

PROOF Design Goals
■ System for running ROOT queries in parallel on a large number of distributed computers or multi-core machines
■ Designed to be a transparent, scalable and adaptable extension of the local interactive ROOT analysis session
■ Extends the interactive model to long-running "interactive batch" queries

Where to Use PROOF
■ CERN Analysis Facility (CAF)
■ Departmental workgroups (Tier-2s)
■ Multi-core, multi-disk desktops (Tier-3/4s)

The Traditional Batch Approach
■ Split the analysis task into N stand-alone batch sub-jobs
■ Job submission is sequential
■ Very hard to get feedback during processing
■ The analysis is finished only when the last sub-job has finished
■ Sub-job outputs are collected and merged into a single output
[Diagram: query → job splitter → batch scheduler and queue → batch cluster (CPUs, storage, file catalog) → merger]

The PROOF Approach
■ A PROOF query consists of a data file list and a selector (mySelector.C); the user gets back real-time feedback and the merged final output
■ The cluster is perceived as an extension of the local PC
■ Same macro and syntax as in a local session (a minimal usage sketch follows below)
■ More dynamic use of resources
■ Real-time feedback
■ Automatic splitting and merging
[Diagram: client query → PROOF master/scheduler → PROOF cluster (CPUs, storage, file catalog); feedback and merged output flow back to the client]
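
To illustrate "same macro and syntax as in a local session", here is a minimal sketch of how such a query is submitted from a ROOT session; the master URL, tree name and file names are placeholders invented for the example, not values from the talk:

// runProof.C - minimal sketch of a PROOF query (placeholder names throughout)
void runProof()
{
   TProof::Open("proofmaster.cern.ch");       // connect to a PROOF master

   TChain *chain = new TChain("esdTree");     // same TChain as in a local session
   chain->Add("root://server.example.org//data/run1.root");
   chain->Add("root://server.example.org//data/run2.root");

   chain->SetProof();                         // route Process() through PROOF
   chain->Process("mySelector.C+");           // identical call to the local case
}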

Multi-Tier Architecture
■ Adapts to wide-area virtual clusters: geographically separated domains, heterogeneous machines
■ Network performance ranges from less important (wide-area, inter-domain links) to VERY important (links close to the data)
■ Optimize for data locality or for high-bandwidth data server access

The ROOT Data Model: Trees & Selectors
■ Begin(): create histograms, define the output list
■ Process(): called for every event of the chain; read only the needed parts (branches, leaves), apply the preselection, and if the event is OK analyze it and fill the output list
■ Terminate(): finalize the analysis (fitting, ...)
[Diagram: a chain of trees (events 1, 2, ..., n, ..., last) with branches and leaves; the selector loops over the events, reading only the needed parts]

TSelector - User Code

// Abbreviated version
class TSelector : public TObject {
protected:
   TList *fInput;    // objects sent from the client to the workers
   TList *fOutput;   // objects returned to the client and merged
public:
   void   Notify(TTree*);       // called when a new file of the chain is opened
   void   Begin(TTree*);        // called once on the client
   void   SlaveBegin(TTree*);   // called once on each worker
   Bool_t Process(int entry);   // called for each event
   void   SlaveTerminate();     // called on each worker after the last event
   void   Terminate();          // called on the client after all workers finish
};

TSelector::Process()

   // select event
   b_nlhk->GetEntry(entry);
   if (nlhk[ik] <= 0.1) return kFALSE;
   b_nlhpi->GetEntry(entry);
   if (nlhpi[ipi] <= 0.1) return kFALSE;
   b_ipis->GetEntry(entry);
   ipis--;
   if (nlhpi[ipis] <= 0.1) return kFALSE;
   b_njets->GetEntry(entry);
   if (njets < 1) return kFALSE;

   // selection made, now analyze event
   b_dm_d->GetEntry(entry);    // read branch holding dm_d
   b_rpd0_t->GetEntry(entry);  // read branch holding rpd0_t
   b_ptd0_d->GetEntry(entry);  // read branch holding ptd0_d

   // fill some histograms
   hdmd->Fill(dm_d);
   h2->Fill(dm_d, rpd0_t/0.29979*1.8646/ptd0_d);  // 0.29979 restored from the ROOT h1analysis example this code is based on
   ...
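
For completeness, a sketch of the other TSelector hooks as they are typically written for PROOF; it follows the ROOT h1analysis example, and the class name, histogram name and binning should be read as illustrative rather than as part of the talk:

// MySelector (sketch) - illustrative PROOF-style selector hooks
#include <TSelector.h>
#include <TTree.h>
#include <TList.h>
#include <TH1F.h>

class MySelector : public TSelector {
public:
   TH1F *hdmd = nullptr;

   void SlaveBegin(TTree * /*tree*/)
   {
      // runs on every worker: create histograms and put them into
      // fOutput so the master can merge them automatically
      hdmd = new TH1F("hdmd", "dm_d", 40, 0.13, 0.17);
      fOutput->Add(hdmd);
   }

   Bool_t Process(Long64_t /*entry*/)
   {
      // event selection and histogram filling, as on the slide above
      return kTRUE;
   }

   void Terminate()
   {
      // runs back on the client: fetch the merged histogram and finalize
      hdmd = dynamic_cast<TH1F*>(fOutput->FindObject("hdmd"));
      if (hdmd) hdmd->Draw();
   }
};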

The Packetizer
■ The packetizer is the heart of the system
■ It runs on the master and hands out work to the workers
■ Different packetizers allow for different data access policies:
  ■ All data on disk, network access allowed
  ■ All data on disk, no network access
  ■ Data on mass storage, go file-by-file
  ■ Data on the Grid, distribute per Storage Element
■ Makes sure all workers end at the same time
■ Pull architecture: workers ask for work, so no complex per-worker state is kept in the master (a sketch of the idea follows below)
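
A minimal sketch of the pull idea, not the actual PROOF packetizer: all bookkeeping lives in the packetizer, and an idle worker simply asks for the next packet of events. The file names, entry counts and packet size are invented for the illustration.

// packetizer_sketch.cxx - single-process illustration of pull-style work distribution
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct Packet { std::string file; long first; long n; };

class Packetizer {
public:
   Packetizer(const std::vector<std::pair<std::string,long>> &files, long packetSize)
      : fFiles(files), fPacketSize(packetSize) {}

   // Called by a worker when it is idle; returns false when no work is left.
   bool NextPacket(Packet &p) {
      if (fCurFile >= fFiles.size()) return false;
      const auto &f = fFiles[fCurFile];
      p.file  = f.first;
      p.first = fCurEntry;
      p.n     = std::min(fPacketSize, f.second - fCurEntry);
      fCurEntry += p.n;
      if (fCurEntry >= f.second) { ++fCurFile; fCurEntry = 0; }
      return true;
   }

private:
   std::vector<std::pair<std::string,long>> fFiles;  // (file name, number of entries)
   long   fPacketSize;
   size_t fCurFile  = 0;
   long   fCurEntry = 0;
};

int main() {
   // two hypothetical input files, packets of 1000 events
   Packetizer pktz({{"run1.root", 2500}, {"run2.root", 1000}}, 1000);
   Packet p;
   int worker = 0;
   while (pktz.NextPacket(p)) {          // each idle "worker" pulls the next packet
      std::printf("worker %d <- %s [%ld, %ld)\n",
                  worker, p.file.c_str(), p.first, p.first + p.n);
      worker = (worker + 1) % 4;         // pretend 4 workers take turns
   }
   return 0;
}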

PROOF Scalability on Multi-Core Machines
■ The current version of Mac OS X is fully 8-core capable
■ Have been running my MacPro with dual quad-core CPUs for 4 months

Production Usage in Phobos

Interactive Batch
■ Allow submission of long-running queries (a usage sketch follows below)
■ Allow client/master disconnect and reconnect
■ Allow interaction and feedback at any time during the processing
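
A sketch of how this looks from the user's side, assuming the asynchronous-processing option ("ASYN") and the query-browsing calls of the PROOF interface of that period; the master URL, tree and file names are placeholders, and the exact option and method names should be treated as assumptions:

// Monday, on the laptop: submit a long-running query without blocking
void submitLong()
{
   TProof::Open("proofmaster.cern.ch");            // placeholder master URL
   TChain *chain = new TChain("esdTree");          // placeholder tree/file names
   chain->Add("root://server.example.org//data/run*.root");
   chain->SetProof();
   chain->Process("mySelector.C+", "ASYN");        // submit and return immediately
   // ... disconnect, close the laptop ...
}

// Later, from another machine: reattach and look at the queries
void reconnectLater()
{
   TProof::Open("proofmaster.cern.ch");
   gProof->ShowQueries();                          // list queries known to the master
   gProof->GetQueryResults()->Print();             // results kept on the master
}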

Analysis Scenario
Monday at 10:15, ROOT session on my laptop:
■ AQ1: 1s query produces a local histogram
■ AQ2: a 10m query submitted to PROOF1
■ AQ3 - AQ7: short queries
■ AQ8: a 10h query submitted to PROOF2
Monday at 16:25, ROOT session on my desktop:
■ BQ1: browse results of AQ2
■ BQ2: browse intermediate results of AQ8
■ BQ3 - BQ6: submit four 10m queries to PROOF1
Wednesday at 8:40, browse from any web browser:
■ CQ1: browse results of AQ8 and of BQ3 - BQ6

Session Viewer GUI
■ Open/close sessions
■ Define a chain
■ Submit a query, execute a command
■ Query editor
■ Online monitoring of feedback histograms
■ Browse folders with query results
■ Retrieve, archive and delete query results
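
The viewer is started from the ROOT prompt; assuming the GUI class is TSessionViewer, as in contemporary ROOT releases, a single line opens it:

root [0] new TSessionViewer();   // opens the PROOF session viewer GUI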

Monitoring
MonALISA-based monitoring:
■ Each host reports to MonALISA
■ Each worker reports to MonALISA
Internal monitoring:
■ File access rate, packet generation time and latency, processing time, etc.
■ Produces a tree for further analysis

Query Monitoring
The same monitoring plots are available for: CPU usage, cluster usage, memory, event rate, local/remote MB/s and files/s

ALICE CAF Test Setup
Evaluation of the CAF test setup since May:
■ 33 machines, 2 CPUs each, 200 GB disk
Tests performed:
■ Usability tests
■ Simple speedup plot
■ Evaluation of different query types
■ Evaluation of the system when running a combination of query types

Query Type Cocktail
4 different query types:
■ 20% very short queries
■ 40% short queries
■ 20% medium queries
■ 20% long queries
User mix (33 nodes, 66 CPUs):
■ 10 users, 10 or 30 workers/user, max average speedup = 6.6 (66 CPUs / 10 users)
■ 5 users, 20 workers/user
■ 15 users, 7 workers/user

Relative Speedup [plot; the average expected speedup is indicated]

Relative Speedup [plot; the average expected speedup is indicated]

Cluster Efficiency [plot]

Multi-User Scheduling
■ A scheduler is needed to control the use of the available resources in multi-user environments
■ Decisions are taken per query, based on the following metrics:
  ■ Overall cluster load
  ■ Resources needed by the query
  ■ User quotas and priorities
■ Requires dynamic cluster reconfiguration
■ Generic interface to external schedulers planned (Condor, LSF, ...)

Conclusions
■ The LHC will generate data on a scale not seen anywhere before
■ The LHC experiments will critically depend on parallel solutions to analyze their enormous amounts of data
■ Grids will very likely not provide the stability and reliability needed for repeatable high-statistics analysis
■ Wish us good luck!