ATLAS Analysis Overview
Eric Torrence, University of Oregon/CERN
10 February 2010, ATLAS Offline Software Tutorial

Outline
This talk is intended to give a broad outline of how analyses within ATLAS should be done for first data. Technical details will be covered by other speakers.
- ATLAS data types and locations
- Derived samples (dAOD/dESD)
- Data sample selection
- Luminosity

AMFY Task Force
J. Boyd, A. Gibson, S. Hassani, R. Hawkings, B. Heinemann, S. Paganis, J. Stelzer, T. Wengler
"Analysis Model for the First Year", final report October 2009.
See the talk by Thorsten Wengler at the Barcelona ATLAS Week (Indico link truncated: ...ionId=11&materialId=slides&confId=47256), or the final AMFY report.

Analysis Data Formats
- RAW: specialized detector studies (e.g. alignment)
- ESD: detector performance studies
- AOD: starting point for physics analyses (~1/10 the size of ESD)
All events are processed from RAW through ESD to AOD. Processing is separated into trigger-based data streams: Egamma, Jet/Tau/MET, Muon, Minbias, Bphysics.
Streams are inclusive, e.g. all electron triggers are written to the Egamma stream.
[Diagram: RAW to ESD to AOD processing chain for the physics streams]

Derived Samples
Derived samples (dESD, dAOD) aim to reduce sample size further by removing events (skimming) and/or removing information (slimming/thinning).
- Many versions/variants are possible for specific uses
- Skimming is based on offline quantities and/or trigger
- dAOD (Physics DPD), e.g. diMuon (2 muons with pT > 15 GeV)
- dESD (Performance DPD), e.g. eGamma (CaloCells near e/gamma triggers)
- Currently popular: dESD_COLLCAND
[Diagram: RAW to ESD to AOD chain with AODFix; dESD and dAOD branches from central production, through group production, down to small group/user level; plus specialised RAW samples]
A minimal skimming sketch follows below.
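As an illustration of skimming on offline quantities, here is a minimal PyROOT-style sketch that keeps only events with two muons above 15 GeV. The file, tree, and branch names (muon_pt, in MeV) are hypothetical, not the actual ATLAS EDM; real dAOD production uses the central tools.

```python
# Minimal skimming sketch (hypothetical file/tree/branch names, not the ATLAS EDM).
# Keeps only events with >= 2 muons with pT > 15 GeV, copying them to a new file.
import ROOT

infile = ROOT.TFile.Open("input_ntuple.root")  # hypothetical input
tree = infile.Get("physics")                   # hypothetical tree name

outfile = ROOT.TFile("diMuon_skim.root", "RECREATE")
skim = tree.CloneTree(0)  # same branch structure, no events yet

for event in tree:
    # muon_pt assumed to be a vector<float> branch in MeV (hypothetical)
    good_muons = [pt for pt in event.muon_pt if pt > 15000.0]
    if len(good_muons) >= 2:
        skim.Fill()  # copy the current event into the skimmed tree

skim.AutoSave()
outfile.Close()
infile.Close()
```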

User Data
- ESD, AOD, dESD: stored on the Grid in ATLAS space; production managed centrally
- dAOD, ntuples: stored on group disk or in local user space; managed by groups/users
Each group has a grid space manager and a production manager. It is good to find out who these people are...
[Diagram: RAW to ESD to AOD (AODFix) to dESD/dAOD to ntuple, feeding performance analysis and physics analysis]

Databases and Metadata
The analysis model assumes that any Athena job (ESD/dESD, AOD, dAOD) needs access to conditions data (e.g. COOL):
- Detector geometry, conditions, alignments
- Data quality and trigger configuration metadata
- InSituPerformance information
- Dataset information (AMI)
- TAG database (TAG)
Even a purely ntuple-based analysis may need some of this.
- Potential bottleneck and hassle for end users
- Running on MC is typically easier than running on data
- Be aware of external files referenced by the DB...
[Diagram: Oracle server at the nearest Tier-1, accessed from Tier-2, Tier-3, desktop, and laptop]
A job-option sketch for conditions access follows below.
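For concreteness, a minimal Athena job-option sketch for conditions access through IOVDbSvc. The global tag and folder names below are illustrative only; the correct ones depend on your data and release, so take this as a shape, not a recipe.

```python
# Minimal Athena job-option sketch for COOL conditions access.
# Tag and folder names are illustrative; check the correct ones for your data.
from IOVDbSvc.CondDB import conddb

# Global conditions tag: must match your data's processing version (name illustrative)
conddb.setGlobalTag("COMCOND-REPPST-004-00")

# Adding an extra conditions folder (schema/folder names illustrative)
conddb.addFolder("TRIGGER", "/TRIGGER/LUMI/LBLESTONL")
```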

Database Access
- Primary information source: the Oracle server (Tier 1)
- We don't want all users hammering away on these machines
- Solution: FroNTier server / squid cache
- Being tested/deployed now across ATLAS; operational experience is needed
- Topic for a future tutorial!
[Diagram (D. Front): query/reply path from the client machine (ATHENA with IOVDbSvc, COOL/CORAL, FroNTier client, local squid; file access from ROOT via POOL; Oracle/SQLite/MySQL back-ends) through a server-site squid and Tomcat FroNTier servlet to the Oracle DB server]

Selecting Your Data
Time range / machine conditions:
- Collision energy: 7 TeV
- All 2010 data approved for Summer conferences
Detector/offline data quality (DQ flags):
- Data periods with good electrons in the barrel
- Data with pixels turned on
Trigger configuration:
- EF_e20_loose active
- Both e20 and mu10 active and unprescaled
All analyses must define their data sample. This is a key ingredient in defining the luminosity.

Luminosity Blocks
ATLAS runs are subdivided into luminosity blocks (~1 min each). The LB is the atomic unit for selecting a data sample.
Most conditions are mapped to specific luminosity blocks:
- Trigger prescales (can only change at an LB boundary)
- Data quality flags (mapped to LBs offline)
Luminosity can only be determined for a specific set of runs/luminosity blocks. Specifying your data sample functionally means specifying a list of runs/LBs to analyze, a.k.a. a GoodRunList (GRL); a sketch of the XML format follows below.
[Diagram: a run divided into luminosity blocks]
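A GoodRunList is instantiated as an XML file. The sketch below shows the general shape of such a file; the run and LB numbers are invented, and the exact DTD and metadata fields depend on the version of the GoodRunsLists tools you use.

```xml
<?xml version="1.0"?>
<LumiRangeCollection>
  <NamedLumiRange>
    <Name>MyGoodRunList</Name>
    <LumiBlockCollection>
      <Run>152166</Run>
      <LBRange Start="206" End="290"/>
    </LumiBlockCollection>
    <LumiBlockCollection>
      <Run>152214</Run>
      <LBRange Start="10" End="55"/>
      <LBRange Start="58" End="120"/>
    </LumiBlockCollection>
  </NamedLumiRange>
</LumiRangeCollection>
```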

Data Quality Flags
DQ flags are simple indicators of data quality, but they come from many, many sources:
- Detectors (divided by barrel and two endcaps)
- Trigger (by slice)
- Offline combined performance (e/mu/tau/jet/MET/...)
Physics analyses should not arbitrarily choose DQ flags:
- Combined performance groups will define a recommended set of DQ flag criteria for physics objects (e.g. barrel electrons or forward jets), soon with virtual flags
- Users in their working group will decide which set of (virtual) DQ flags is needed for each analysis, which can then be used to generate a GoodRunList
- Standard lists will be centrally produced by the DQ group

Reprocessing
Data processed at Tier 0 (right off the detector) have preliminary calibrations and first-pass DQ flags. Any physics result must start with reprocessed data:
- Updated calibrations
- Consistent release / bug fixes
DQ flags are re-evaluated, tagged, and locked after each reprocessing: the flags are then fixed, but you must use the correct tag!
It is important to always access DQ information from the COOL database using the tag appropriate to your data processing version, or else you won't get consistent results. Best to use officially generated (static) GoodRunLists appropriate for your reprocessed data.

Luminosity Calculation
σ = (N_sel - N_bgd) / (ε_sel · Lumi)
In ATLAS, the luminosity is calculated for a specific set of LBs and includes LB-dependent corrections for:
- Trigger prescales
- L1 trigger deadtime
The user-derived ε_sel must contain all other event-dependent efficiencies, including:
- Unprescaled trigger efficiency (vs. pT, for example)
- Skimming efficiency (in dAOD production)
- TAG selection efficiency (in TAG-based analyses)
- Event selection cuts
The final luminosity can only be calculated after the full GRL and trigger are specified. A toy worked example follows below.
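A toy numerical illustration of the formula above; every number here is invented purely for illustration.

```python
# Toy cross-section calculation: sigma = (N_sel - N_bgd) / (eps_sel * lumi).
# All numbers are invented for illustration only.

n_sel = 1250.0   # events passing all selection cuts
n_bgd = 150.0    # estimated background events
lumi_pb = 0.9    # integrated luminosity in pb^-1, after GRL + prescale corrections

# eps_sel folds in ALL event-dependent efficiencies:
trig_eff = 0.95  # unprescaled trigger efficiency
skim_eff = 0.99  # skimming efficiency (dAOD production)
cut_eff = 0.80   # event selection cut efficiency
eps_sel = trig_eff * skim_eff * cut_eff

sigma_pb = (n_sel - n_bgd) / (eps_sel * lumi_pb)
print(f"sigma = {sigma_pb:.1f} pb")  # -> sigma = 1624.4 pb
```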

Example Physics Analysis Models
Direct AOD/dAOD analysis:
- User-submitted Grid job run over all AODs or group-generated dAOD/dESD samples
- Final user ntuples produced directly
TAG-based analysis:
- Start with a TAG-selected sample (see the TAG tutorial)
- Saves having to run over large datasets
Ntuple-based analysis:
- Start with a general-purpose group ntuple where not all data selection criteria have been applied
- Probably the most complicated, but the tools support this also
We will concentrate on the direct AOD/dAOD case; the other cases are not so different.

AOD/dAOD Analysis: Pre-Run Query
A pre-run query based on the input parameters (time range, DQ, trigger, ...) already defines the GRL, instantiated as an XML file.
- atlas-runquery (or the TAG browser) is currently the best way to do this; eventually most users will start with pre-made lists
- The 'expected' luminosity can already be found without running a single job
- Needs COOL access, but can be run just once; the XML GRL files can then be transferred to your local area/laptop
[Diagram: data sample definition (time range, DQ query, trigger, ...) feeds RunQuery (COOL), which produces the GRL XML file; LumiCalc (COOL) turns that into the expected luminosity]
A parsing sketch for such a GRL file follows below.
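A minimal sketch of reading the run/LB ranges out of a GRL XML file with the Python standard library, assuming the file layout shown earlier. The file name is hypothetical, and in a real analysis the official GoodRunsLists tools should be preferred.

```python
# Minimal GRL XML reader (standard library only; assumes the layout shown earlier).
import xml.etree.ElementTree as ET

def read_grl(filename):
    """Return {run_number: [(lb_start, lb_end), ...]} from a GRL XML file."""
    grl = {}
    root = ET.parse(filename).getroot()
    for lbc in root.iter("LumiBlockCollection"):
        run = int(lbc.find("Run").text)
        ranges = [(int(r.get("Start")), int(r.get("End")))
                  for r in lbc.findall("LBRange")]
        grl.setdefault(run, []).extend(ranges)
    return grl

grl = read_grl("MyGoodRunList.xml")  # hypothetical file name
n_lbs = sum(end - start + 1 for ranges in grl.values() for start, end in ranges)
print(n_lbs, "luminosity blocks selected")
```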

AOD/dAOD Analysis: Running on the Grid
- There is no automatic tool (yet) to convert a GRL into an AOD dataset (although TAG does this for you)
- Using the standard tools, you will get an output GRL, which is very useful to compare against the input for cross-checks
- A copy of the GRL will also be saved to your output ntuple if it is made with the PAT Ntuple Dumper*
[Diagram: input GRL XML file plus dataset (AOD files, located via AMI) go to distributed analysis on the Grid (Ganga/pAthena); the merged job output carries an output GRL XML file and an ntuple-embedded GRL]
A sketch of applying a GRL in an event loop follows below.
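In an analysis event loop, applying the GRL amounts to checking each event's (run, LB) pair against the parsed ranges. A minimal sketch, reusing read_grl() from the earlier snippet; the file, tree, and branch names (RunNumber, lbn) are hypothetical.

```python
# Applying a GRL in an event loop (hypothetical file/tree/branch names).
import ROOT

def in_grl(grl, run, lb):
    """True if (run, lb) falls inside the good-run list."""
    return any(start <= lb <= end for start, end in grl.get(run, []))

grl = read_grl("MyGoodRunList.xml")      # from the earlier sketch
f = ROOT.TFile.Open("user_ntuple.root")  # hypothetical file
tree = f.Get("physics")                  # hypothetical tree name

n_good = 0
for event in tree:
    if in_grl(grl, event.RunNumber, event.lbn):
        n_good += 1  # ... the full event selection would go here ...
print(n_good, "events inside the GRL")
```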

AOD/dAOD Analysis: Bookkeeping
Comparing the input and output XML files is very useful to check for job failures and dataset consistency. The output shows exactly which LBs were actually analyzed. To get the observed luminosity, either:
- run LumiCalc directly on the ntuple (with COOL access), or
- extract the XML file, copy it to CERN, and run LumiCalc there
[Diagram: input GRL XML file compared against the output GRL / ntuple GRL; either GRL is fed to LumiCalc (COOL) to give the observed luminosity]
A comparison sketch follows below.
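A minimal input-vs-output cross-check, again reusing read_grl() from the earlier sketch; the file names are hypothetical. Any (run, LB) pair requested but absent from the output points at failed or missing jobs.

```python
# Comparing input and output GRLs (uses read_grl() from the earlier sketch).
def expand(grl):
    """Flatten a GRL dict into a set of (run, lb) pairs."""
    return {(run, lb) for run, ranges in grl.items()
            for start, end in ranges for lb in range(start, end + 1)}

missing = expand(read_grl("input_GRL.xml")) - expand(read_grl("output_GRL.xml"))
if missing:
    print("LBs requested but not analyzed (failed/missing jobs?):")
    for run, lb in sorted(missing):
        print(f"  run {run}, LB {lb}")
```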

Sparse Data Problem
- Even if an LB contributes no selected events, that LB still contributes to the luminosity
- The GRL selection tool outputs LB metadata (also in the ntuple) even when no events are selected
- Derived (dAOD) samples must also obey this requirement
- All job output must be merged to correctly include all LBs considered; work is ongoing to include this in the standard distributed analysis merger
[Diagram: a Grid job split into job 1 (LB 1-3, 1 event), job 2 (LB 4-5, 0 events), job 3 (LB 6-8, crashed); total LBs analyzed: 1-5]
A merging sketch follows below.
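To make the bookkeeping point concrete, a toy merge of per-job LB metadata matching the diagram above; the run number and job layout are invented. Jobs that selected zero events still contribute their LBs, while crashed jobs contribute nothing.

```python
# Toy merge of per-job LB metadata (numbers invented, matching the diagram).
job_outputs = {
    "job1": {"lbs": {(152166, lb) for lb in range(1, 4)}, "ok": True},   # 1 event
    "job2": {"lbs": {(152166, lb) for lb in range(4, 6)}, "ok": True},   # 0 events
    "job3": {"lbs": {(152166, lb) for lb in range(6, 9)}, "ok": False},  # crashed
}

analyzed = set()
for name, out in job_outputs.items():
    if out["ok"]:          # only successfully completed jobs count
        analyzed |= out["lbs"]

print("Total LBs analyzed:", sorted(lb for _, lb in analyzed))  # -> [1, 2, 3, 4, 5]
```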

Other Use Cases
It is easiest to make the entire selection (a complete GRL specification) before launching the AOD/dAOD job. For many reasons, however, people may start with samples that have partial (or no) GRL specifications applied:
- skimmed group dAODs
- group ntuples
- TAG-selected data*
The scheme still works, as long as the samples have been produced with the standard tools: the metadata about all LBs contributing to the sample is still present and consistent. Users must still apply the final GRL specification (including trigger) to create a sample with a well-defined luminosity. The tools also work at the ntuple level, but the LB number must be saved.

Conclusions
- The analysis model for the first year has undergone considerable discussion and development recently; there may still be some rough edges
- A general scheme for the simple cases now exists; work is ongoing to support more advanced use cases
- Tools are available to make this easier for the end user; many more technical details will be shown today
- There are many ways to screw this up: use the central tools and central GRLs as much as possible
- If the tools don't seem to do what you want, let a developer know!