fields of possible improvement


fields of possible improvement: detectors – offline

- attitude and communication
- testing/benchmarking aliroot
- production
- diagnostics

attitude and communication

- differences in perception of the roles of the offline and the detectors
- problems often lie at the border and are difficult to pin down ("crash only with the new software and only on grid")
- atmosphere of working against each other rather than collaborating; hiding one's own errors, pointing to the errors of the other side
- unfriendly ways of communicating: bullying, ridiculing, cutting discussions short, ignoring

attitude and communication: examples

- selection of events based on a logical expression involving trigger classes (sketched below):
  - requested by PWGPP when preparing for Pb-Pb in 2011; implemented but apparently not working https://savannah.cern.ch/task/?23160
  - re-requested in 2012 https://savannah.cern.ch/bugs/?91510
  - discussed yet again whether it is needed and how to do it https://savannah.cern.ch/task/?27425 (comments 6-15); finally OK
- problem reading TPC/Calib/Correction from the OCDB. Offline: "TPC, reduce or split this object". TPC: technical problems should be solved by the offline.
- merging calibration results for long runs does not work. Offline: "calibration experts, check your code" and "detectors, reduce your statistics requirements". Actual reason: memory consumption during TFile::Cp. Repeated again this week.
- attempt to hide a faulty OCDB selection in the shadow of a physics-selection bug https://savannah.cern.ch/task/?27425 (comments 222-231)
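To make the first example concrete, here is a minimal sketch of what evaluating a trigger-class expression against one event could look like. The C-style operator syntax (&&, ||, !) and the class names are assumptions for illustration; this is not the actual AliRoot physics-selection code.

```python
import re

def select_event(expression, fired_classes):
    """Return True if the fired trigger classes satisfy the expression.

    Hypothetical helper: the expression syntax and class names are
    illustrative assumptions, not the AliRoot API.
    """
    # Replace every trigger-class token by its truth value first, so
    # hyphens inside class names never reach eval() as minus signs.
    def lookup(match):
        return str(match.group(0) in fired_classes)
    expr = re.sub(r"[A-Za-z][A-Za-z0-9_-]*", lookup, expression)
    # Map C-style logical operators to their Python equivalents.
    expr = expr.replace("&&", " and ").replace("||", " or ").replace("!", " not ")
    # Only True/False, parentheses and boolean operators remain.
    return eval(expr, {"__builtins__": {}})

fired = {"CINT7-B-NOPF-ALLNOTRD", "CVHMV0M-B-SPD2-CENT"}
print(select_event("CINT7-B-NOPF-ALLNOTRD && !CMUS7-B", fired))  # True
print(select_event("CMUS7-B || CVHMV0M-B-SPD2-CENT", fired))     # True
```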

testing aliroot at present

- detectors are expected to test their software on the grid before it is put in production
- for a normal user this is tedious; testing the calibration software, for example, means:
  - modify the software
  - wait for aliroot to be tagged
  - wait for aliroot to be distributed on the grid
  - submit jobs
  - if things go well, the results arrive a few days later
  - if things go bad, the jobs crash and disappear without a trace
  - once the jobs have finished, ask the detectors to check the output

testing-facility proposal

- ALICE has O(10000) cores; let's dedicate O(100) of them to nightly tests of the trunk (a possible driver is sketched below):
  - cpass0/cpass1/full reco of a well defined recent run
  - full reco of special samples (high pt tracks, Z>1 tracks, ...)
- in the morning people can look at the results
- keep the results from the last 30 days; keep some older results with lower granularity
- could be run by two service-task students, 6 months each, interleaved
- this proposal was made in April 2012 but found little interest from the offline: "nightly test with 100 machines is useless", "completely redundant", "will just add entropy", "idea strongly discouraged"
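As a minimal sketch of what the proposed nightly driver could look like: the job list, the run_job.sh wrapper around aliroot and the directory layout are assumptions for illustration, not existing ALICE infrastructure.

```python
import datetime
import pathlib
import shutil
import subprocess

JOBS = ["cpass0", "cpass1", "full_reco_reference_run", "full_reco_highpt"]
RESULTS = pathlib.Path("/nightly/results")  # assumed layout
KEEP_DAYS = 30

def run_nightly():
    outdir = RESULTS / datetime.date.today().isoformat()
    outdir.mkdir(parents=True, exist_ok=True)
    for job in JOBS:
        with (outdir / f"{job}.log").open("w") as log:
            # run_job.sh is a hypothetical wrapper that sets up the
            # trunk build and runs the given pass with aliroot
            rc = subprocess.call(["./run_job.sh", job],
                                 stdout=log, stderr=subprocess.STDOUT)
        (outdir / f"{job}.status").write_text("OK" if rc == 0 else f"FAILED rc={rc}")
    # keep only the last KEEP_DAYS of results, as proposed above
    cutoff = datetime.date.today() - datetime.timedelta(days=KEEP_DAYS)
    for d in RESULTS.iterdir():
        try:
            if datetime.date.fromisoformat(d.name) < cutoff:
                shutil.rmtree(d)
        except ValueError:
            pass  # not a dated directory, leave it alone

if __name__ == "__main__":
    run_nightly()
```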

diagnostics

- understanding why grid jobs failed is often very difficult
- MonALISA is extremely useful, but some cases require statistical analysis: failure rate as a function of aliroot version, running place and time, CPU load during running, number of resubmissions, etc. (a sketch follows below)
- for full diagnostics we need to combine information from the logbook, MonALISA and the QA
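A minimal sketch of the kind of statistical analysis meant here: failure rate per aliroot version or per site, computed from a flat table of job records. The record layout and status values are assumptions; the real input would be pulled from MonALISA and the logbook.

```python
from collections import Counter

def failure_rates(jobs, key):
    """Fraction of failed jobs per value of `key` (e.g. aliroot version, site)."""
    total, failed = Counter(), Counter()
    for job in jobs:
        total[job[key]] += 1
        if job["status"] != "DONE":
            failed[job[key]] += 1
    return {k: failed[k] / total[k] for k in total}

# Toy records; the field names and status values are assumptions.
jobs = [
    {"aliroot": "v5-03-20", "site": "CERN", "status": "DONE"},
    {"aliroot": "v5-03-20", "site": "GSI",  "status": "ERROR_V"},
    {"aliroot": "v5-03-24", "site": "CERN", "status": "DONE"},
]
print(failure_rates(jobs, "aliroot"))  # {'v5-03-20': 0.5, 'v5-03-24': 0.0}
print(failure_rates(jobs, "site"))     # {'CERN': 0.0, 'GSI': 1.0}
```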

BACKUP