Data Analysis with CMSSW ● Running a simple analysis: within the framework (EDAnalyzer), interactive (FWLite + PyROOT) ● Finding the data with DBS/DLS.

Presentation transcript:

Data Analysis with CMSSW
● Running a simple analysis:
   - Within the framework: EDAnalyzer
   - Interactive: FWLite + PyROOT
● Finding the data with DBS/DLS
● Running CMSSW with CRAB
Most of the files used in the tutorial can be found in /afs/cern.ch/user/g/gpetrucc/public/Tutorial151206

Initialize the environment
First time only:
    scramv1 project CMSSW CMSSW_1_2_0_pre9
    cd CMSSW_1_2_0_pre9/src
    eval `scramv1 runtime -sh`    (use -csh instead of -sh with tcsh)
    cmscvsroot CMSSW
    cvs login    (use “98passwd” as password)
All the other times:
    cd CMSSW_1_2_0_pre9/src
    eval `scramv1 runtime -sh`    (use -csh instead of -sh with tcsh)
    cmscvsroot CMSSW

Create an EDAnalyzer skeleton
● Create your working directory under CMSSW_xxx/src
    mkdir Tutorial151206; cd Tutorial151206
● Create an EDAnalyzer named “Simple”
    mkedanlzr Simple
This will create the following structure:
    Simple/ (contains “BuildFile”)
    Simple/src (contains “Simple.cc”)
    Simple/interface, Simple/doc, Simple/test (all empty)

“Simple.cc” structure:
    #include ...
    class Simple : public EDAnalyzer {
      public: ...
      private: ...
    };
    void Simple::analyze(...) { ... }
    void Simple::beginJob(...) { ... }
    void Simple::endJob(...) { ... }

Simple analysis task
Count the number of tracks with pT > 5 GeV. We need to:
● At the beginning: create an empty histogram.
● For every event:
   - Get the tracks
   - Loop over the tracks, cut on pT and count
   - Fill the histogram
● At the end: write the histogram to a ROOT file
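The three phases of the task can be sketched outside the framework. This is not CMSSW code: events are mocked as plain lists of track pT values, and a `Counter` stands in for the histogram. Both choices, and the names `count_high_pt` and `run_analysis`, are assumptions made only for illustration.

```python
from collections import Counter

PT_CUT = 5.0  # GeV, the cut used in this tutorial


def count_high_pt(track_pts, cut=PT_CUT):
    """Return how many tracks in one event have pT strictly above the cut."""
    return sum(1 for pt in track_pts if pt > cut)


def run_analysis(events):
    """Mimic the beginJob / analyze / endJob phases with plain Python objects."""
    histogram = Counter()              # "beginJob": create an empty histogram
    for track_pts in events:           # "analyze": runs once per event
        histogram[count_high_pt(track_pts)] += 1
    return dict(histogram)             # "endJob": hand back the filled histogram


# Hypothetical events: each list holds the pT (in GeV) of the tracks in one event.
mock_events = [[1.2, 7.5, 0.4], [], [6.0, 9.1, 5.0], [3.3]]
result = run_analysis(mock_events)
print(result)
```

The histogram is filled once per event with the number of selected tracks, not once per track; the C++ and PyROOT versions in this tutorial follow the same pattern.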

How are tracks stored?
● Go to the documentation page for RECO data: /html/RecoData.html
We have found out that tracks are of type reco::Track, stored in a reco::TrackCollection with name “ctfWithMaterialTracks”

What's a “Track” for CMSSW?
Click on the reco::Track link and find out:
● Include file
● Package: DataFormats/TrackReco
Then click on “List all members” to get the info. You will find a member function “pt()”. Click on it.
Now we can start writing C++ code

Create the histogram
    class Simple : public EDAnalyzer {
        ...
      private:
        ...
        // member data
        TH1F *m_Tracks;
    };
    void Simple::beginJob(...) {
        m_Tracks = new TH1F("tracks", "Tracks (Pt > 5 GeV)", 10, 0, 10);
    }

Get track collection
    void Simple::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup) {
        using namespace edm;
        using namespace reco;
        Handle<TrackCollection> tracks;
        iEvent.getByLabel("ctfWithMaterialTracks", tracks);
        [...]
    }

Loop over the tracks
    Handle<TrackCollection> tracks;
    iEvent.getByLabel([...]);
    TrackCollection::const_iterator trk;
    for (trk = tracks->begin(); trk != tracks->end(); ++trk) {
        [...]
    }

Cut on track pT and count
    int count = 0;
    TrackCollection::const_iterator trk;
    for (trk = tracks->begin(); [...]) {
        if (trk->pt() > 5.0) {
            count++;
        }
    }
    m_Tracks->Fill(count);

Save the histogram
    void Simple::endJob(...) {
        TFile *f = new TFile("histo.root", "RECREATE");
        f->WriteTObject(m_Tracks);
        f->Close();
        delete m_Tracks;
        delete f;
    }

Now some technicalities:
● Adding the required include files (at the beginning of Simple.cc)
    #include ...
    #include "DataFormats/TrackReco/interface/Track.h"
    #include ...

Adding libraries in BuildFile......

Compile your EDAnalyzer
● Go into the main folder of your project (CMSSW_xxx/src/Tutorial151206/Simple)
● scramv1 build (and cross your fingers)
    Parsing BuildFiles
    Entering Package Tutorial151206/Simple
    [...]
    >> Compiling [...]/Simple/src/Simple.cc
    >> Building shared library [...]/libTutorial151206Simple.so
    [...]
    Checking shared library for missing symbols: [...]
    --- Registered SEAL plugin Tutorial151206Simple
    [...]
    >> Package Simple built

Create test/Simple.cfg
    process Demo = {
        source = PoolSource {
            untracked vstring fileNames = {
                "/afs/cern.ch/user/g/gpetrucc/public/Tutorial151206/PhysVal-DiElectron-Ene10.root"
            }
        }
        module demo = Simple { }
        path p = {demo}
    }

Run the EDAnalyzer
● Go to the Simple/test directory
    cmsRun Simple.cfg
    Using the site default catalog
    [...]
    %MSG-i FwkReport: [...] BeforeEvents
    Begin processing the 1th record. Run 1, Event 1
    %MSG-i FwkReport: [...] Run: 1 Event: 1
    Begin processing the 2th record. Run 1, Event 2
    [...]
    [...] 10
    %MSG-i FwkJob: PostSource [...] Run: 1 Event: 10
    [...]
● Open “histo.root” and enjoy the plot
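To check which events a job actually processed, the “Begin processing” lines of the cmsRun output can be scanned. The following is a minimal sketch that assumes the log keeps the wording shown above; `processed_events` is a hypothetical helper and the sample text is an abbreviated stand-in for a real log.

```python
import re

# Matches lines such as "Begin processing the 2th record. Run 1, Event 2"
RECORD_RE = re.compile(
    r"Begin processing the (\d+)\w* record\. Run (\d+), Event (\d+)"
)


def processed_events(log_text):
    """Return (record, run, event) tuples for every matching line in the log."""
    return [tuple(int(g) for g in m.groups()) for m in RECORD_RE.finditer(log_text)]


# Abbreviated sample, modelled on the output shown in this tutorial.
sample_log = (
    "Begin processing the 1th record. Run 1, Event 1\n"
    "%MSG-i FwkReport: Run: 1 Event: 1\n"
    "Begin processing the 2th record. Run 1, Event 2\n"
)
records = processed_events(sample_log)
print(records)
```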

Links to more details: Core CMSSW Documentation: (some days the link is broken) Setting up CMSSW Environment: Writing a framework module: Tutorials from last CMSWeek:

Same thing, interactive
Install the Python tools (only once):
    cd CMSSW_xxxx/src
    cmscvsroot CMSSW
    cvs co -r HEAD PhysicsTools/PythonAnalysis
Set up the Python environment (every time):
    (bash:) export PYTHONPATH=${PYTHONPATH}:$CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python
    (tcsh:) setenv PYTHONPATH ${PYTHONPATH}:$CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python
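The same search path can also be extended from inside a script, which avoids depending on the shell set-up. This is a sketch only: `python_analysis_dir` and `add_to_sys_path` are hypothetical helpers, and the fallback release path is a placeholder used when CMSSW_BASE is not defined.

```python
import os
import sys


def python_analysis_dir(cmssw_base):
    """Build the PhysicsTools/PythonAnalysis/python path under a release area."""
    return os.path.join(cmssw_base, "src", "PhysicsTools", "PythonAnalysis", "python")


def add_to_sys_path(cmssw_base, path_list=None):
    """Append the tools directory to the module search path if it is missing."""
    if path_list is None:
        path_list = sys.path
    tools_dir = python_analysis_dir(cmssw_base)
    if tools_dir not in path_list:
        path_list.append(tools_dir)
    return tools_dir


# CMSSW_BASE is set by "scramv1 runtime"; the fallback below is only a placeholder.
base = os.environ.get("CMSSW_BASE", "/tmp/CMSSW_1_2_0_pre9")
tools = add_to_sys_path(base)
print(tools)
```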

Interactive: startup
● Create a new file simple.py
● Start with the lines to initialize FWLite/PyROOT
    from ROOT import *
    from cmstools import *
    gSystem.Load("libFWCoreFWLite.so")
    AutoLibraryLoader.enable()

Interactive: read the data
    data = TFile("/afs/cern.ch/user/g/gpetrucc/public/Tutorial151206/PhysVal-DiElectron-Ene10.root")
    events = EventTree(data.Get("Events"))
    trackBranch = events.branch("ctfWithMaterialTracks")

Interactive: event loop
    for event in events:
        tracks = trackBranch()              # read tracks
        count = 0                           # init counter
        for trk in tracks:                  # loop over tracks
            if trk.pt() > 5.0:              # cut on pT
                count += 1                  # increment
        print "Found ",count," tracks"      # print

Interactive: running
    python simple.py
    Preparing CMS tab completer tool...
    Loading FWLite dictionary...
    Warning in [...]
    Found 0 tracks
    [...]
    Found 1 tracks

Histograms in Python
    [..]
    histo = TH1F("tracks", "Tracks (Pt > 5 GeV)", 10, 0, 10)
    for event in events:
        [...]
        print "Found ",count," tracks"    # print
        histo.Fill(count)
    f = TFile("histo.root", "RECREATE")
    f.WriteTObject(histo)
    f.Close()
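It helps to know where each count lands in a histogram booked with 10 bins from 0 to 10. The sketch below mimics the binning in pure Python, assuming the usual ROOT convention for fixed-width bins (bin 0 is underflow, bins 1 to 10 each cover a half-open interval, bin 11 is overflow). `fixed_bin_index` and `fill` are hypothetical helpers, not ROOT calls.

```python
def fixed_bin_index(x, nbins=10, low=0.0, high=10.0):
    """Return the bin number a value falls into, following the assumed convention.

    Bin 0 is underflow, bins 1..nbins cover [low, high), bin nbins + 1 is overflow.
    """
    if x < low:
        return 0
    if x >= high:
        return nbins + 1
    return 1 + int(nbins * (x - low) / (high - low))


def fill(counts, nbins=10, low=0.0, high=10.0):
    """Fill a list-based histogram (with under/overflow slots) from per-event counts."""
    bins = [0] * (nbins + 2)
    for c in counts:
        bins[fixed_bin_index(c, nbins, low, high)] += 1
    return bins


# Hypothetical per-event track counts.
bins = fill([0, 0, 1, 3, 10])
print(bins)
```

Under this convention an event with exactly 10 selected tracks ends up in the overflow bin, so a wider range may be needed if busy events are expected.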

Pros and cons of Python/FWLite
PRO
● No need to recompile
● No need to include headers, BuildFile, ...
● Shorter code
● Can be used interactively (check also ipython)
● Untyped functions allow greater code reuse
CON
● Can use only some CMSSW packages
● Currently there are problems with:
   - Refs (e.g. B-tagging)
   - AssociationMaps
   - TChains [there are workarounds]
● Can just read events...
● Can't run on CRAB

Finding data with DBS/DLS
● Go to the DBS/DLS page: (“expert” is needed to get 1_2_x samples)

Finding data
● DBS Instance: RelVal/Writer (for 1_2_0_pre9)
● Application: anything with 1_2_0_pre9 (those with FEVT or Merged should work fine)
● Primary dataset: RelVal120pre9

Search results (summary)
You can read from the summary view:
A) The collection name (for CRAB): /RelVal120pre9Higgs-ZZ-4Mu/FEVT/CMSSW_1_2_0_pre9-FEVT unmerged
B) The site at which it is stored (cern, fnal)
C) The number of events available (2k, 1.2k)

Search results (Block details)
Clicking on “Blocks” gives more information. To see the logical file names (LFNs) for the data, click on “plain” under “LFN list”. You should get a list of files like /store/unmerged/RelVal/...
The physical location on castor is (usually) /castor/cern.ch/cms/store/unmerged/...
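The LFN-to-castor relation quoted above amounts to adding a fixed prefix. A minimal sketch, assuming the “usual” prefix holds for the sample in question; `lfn_to_castor` is a hypothetical helper and the file name is made up.

```python
CASTOR_PREFIX = "/castor/cern.ch/cms"  # prefix quoted above; only "usually" valid


def lfn_to_castor(lfn):
    """Guess the castor path for a logical file name by prepending the prefix."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/, got %r" % lfn)
    return CASTOR_PREFIX + lfn


# Hypothetical file name, for illustration only.
example = lfn_to_castor("/store/unmerged/RelVal/example.root")
print(example)
```

This only builds a candidate path; it does not check that the file really exists on castor.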

Reading that data with CMSSW
● Write the LFNs in the .cfg file
    source = PoolSource {
        untracked int32 maxEvents = 3
        untracked vstring fileNames = { "/store/unmerged/...", [...] }
    }
  (write just the LFN, no “file:” and no “/castor”!)
● Remember to set maxEvents unless you want to read all the events in the file...
● Check if the sample is really in /castor before...
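When the LFN list is long, the source block can be generated rather than typed. The sketch below writes the block in the layout shown above and rejects names carrying a “file:” or “/castor” prefix. `check_lfn` and `pool_source` are hypothetical helpers, and the file names are invented.

```python
def check_lfn(name):
    """Return the name unchanged if it is a bare LFN, otherwise raise ValueError."""
    if name.startswith("file:") or name.startswith("/castor"):
        raise ValueError("use the bare LFN, not %r" % name)
    return name


def pool_source(lfns, max_events=3):
    """Render a PoolSource block listing the given LFNs."""
    files = ",\n".join('        "%s"' % check_lfn(n) for n in lfns)
    return (
        "source = PoolSource {\n"
        "    untracked int32 maxEvents = %d\n"
        "    untracked vstring fileNames = {\n%s\n    }\n"
        "}\n" % (max_events, files)
    )


# Hypothetical LFNs, for illustration only.
block = pool_source(["/store/unmerged/RelVal/a.root", "/store/unmerged/RelVal/b.root"])
print(block)
```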

Running on remote samples: CRAB
Before using CRAB you need:
● A working CMSSW
● A working EDAnalyzer (with its cfg file)
● Access to the Grid: certificate, VO membership
● The name of a data sample you want to access

Set up CRAB
Set up your environment (every time, on lxplus):
    source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
    source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
    (source xxx.csh if you use tcsh)
Additional tasks (first time only):
● Execute $CRABDIR/configureBoss
● Copy the default crab.cfg file from /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.cfg

Configure CRAB (crab.cfg)
● Read the comments in the cfg file!
● [CRAB] section: main configuration
   - jobtype = cmssw (always)
   - scheduler = glitecoll (also edg should work)
● [CMSSW]: your job configuration (important!)
   - datasetpath = (“None” if you use Pythia...)
   - pset =
   - total_number_of_events
   - events_per_job
   - output_file =
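How the two event parameters relate can be sketched as follows, assuming CRAB simply cuts the requested events into chunks of events_per_job (the real splitting may also depend on file boundaries, which this ignores). `number_of_jobs` is a hypothetical helper, not a CRAB function, and the numbers are examples.

```python
def number_of_jobs(total_number_of_events, events_per_job):
    """Estimate the job count as the ceiling of total / per-job events."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    # Integer ceiling division, avoiding floating point.
    return -(-total_number_of_events // events_per_job)


# Example values only: 1200 events in chunks of 500 need a third, shorter job.
jobs = number_of_jobs(1200, 500)
print(jobs)
```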

Configure CRAB (crab.cfg)
● [USER] section: common info
   - return_data = 1 (get your output back with crab)
   - copy_data = 0 (=1 to save the output on castor... more tricky)
● [EDG] section: GRID configuration (optional)
   - ce_white_list, se_white_list: use only the CE/SE with names in the list (you can try “cern”, “infn”)
   - ce_black_list, se_black_list: never use a CE/SE with the specified name (e.g. “tw”, “fnal”, “cern”)
   - rb = CERN (try CNAF if CERN does not work)

Configure CRAB for RelVal
● By default, CRAB looks for samples in the MCGlobal/Writer DBS
● To read the RelVal samples, some more tweaking of crab.cfg is needed: the following parameters must be added under the [CMSSW] section
    dbs_instance=RelVal/Writer
    dls_endpoint=prod-lfc-cms-central.cern.ch/grid/cms/DLS/RelVal
● This allows datasetpath to be set to RelVal samples
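The settings from the CRAB configuration slides can be collected in one place with Python's standard `configparser`. This is a sketch, not a replacement for the commented default crab.cfg: whether CRAB accepts exactly the layout that `configparser` writes is an assumption, `build_crab_cfg` is a hypothetical helper, and the dataset path and event numbers are placeholders.

```python
import configparser
import io


def build_crab_cfg(datasetpath, pset, total_events, events_per_job, output_file):
    """Return crab.cfg-style text with the sections discussed in this tutorial."""
    cfg = configparser.ConfigParser()
    cfg["CRAB"] = {"jobtype": "cmssw", "scheduler": "glitecoll"}
    cfg["CMSSW"] = {
        "datasetpath": datasetpath,
        "pset": pset,
        "total_number_of_events": str(total_events),
        "events_per_job": str(events_per_job),
        "output_file": output_file,
        # Extra parameters needed to read RelVal samples.
        "dbs_instance": "RelVal/Writer",
        "dls_endpoint": "prod-lfc-cms-central.cern.ch/grid/cms/DLS/RelVal",
    }
    cfg["USER"] = {"return_data": "1", "copy_data": "0"}
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


# Placeholder dataset path and event counts, for illustration only.
text = build_crab_cfg("/PrimaryDataset/Tier/ProcessedName", "Simple.cfg", 1000, 500, "histo.root")
print(text)
```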

Set up your EDAnalyzer.cfg
● The normal cfg file used for your job works fine.
● CRAB takes care of setting up the options of the PoolSource (maxEvents, fileNames)
● Check the name of the output files! CRAB takes care of adding “_ ” to each file name when retrieving the job output.

Running CRAB
● Create and submit the jobs:
    crab -create -submit
● See the status of your jobs:
    crab -status (hint: watch -n 120 "crab -status")
● Get the output of the completed jobs:
    crab -getoutput

Further information ial.pdf

Backup slides ● Python crash course (4 slides)

Python crash course (1)
● Python is a scripting language. Scripts are executed just by typing “python” followed by the script file name
● You can also open a Python interactive prompt:
    python
    [...]
    >>>
    >>>
● Writing is done with print
    print "Hello world. I = ",i
● There is no “;” at the end of a line

Python crash course (2)
● Comments start with “#” and finish at the end of the line:
    # this will be ignored
● Variable types are not declared.
    i = 37 (and not int i = 37 as in C++)
● Blocks are done with indentation, not “{”, “}”:
    if x > 3:
        print "x is large (x=", x, ")"
    else:
        print "x is negligible"
    for i in range(5):    # 0,1,2,3,4
        print i
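The snippets in this crash course use the Python 2 print statement. On a Python 3 interpreter, print is a function and needs parentheses; the rest of the syntax is unchanged. A short sketch of the same ideas in Python 3 form, where `describe` is a hypothetical helper:

```python
def describe(x):
    """Return a message using the same if/else logic as the example above."""
    if x > 3:
        return "x is large (x=%d)" % x
    else:
        return "x is negligible"


i = 37                        # no type declaration needed
for n in range(5):            # 0, 1, 2, 3, 4
    print(n, describe(n))     # print is a function in Python 3
```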

Python crash course (3)
● Python is object oriented
● There is no “new” keyword for creating objects:
    file = TFile("ciao")
● Members are accessed with “.” (dot):
    file.Close() (and not file->Close())
● Memory management is automatic: there is no need to call “delete” or “free()” as in C++
● No pointers (objects are always “references”)
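The point about references can be seen without ROOT. In this sketch `Track` is a hypothetical stand-in class, not the CMSSW reco::Track; assigning an object to a second name does not copy it.

```python
class Track:
    """Hypothetical stand-in for an object that carries some state."""

    def __init__(self, pt):
        self.pt = pt


first = Track(4.0)          # no "new" keyword
second = first              # no copy is made: both names refer to one object
second.pt = 6.5             # the change is visible through either name
same_object = first is second
print(first.pt, same_object)
```

An independent copy has to be requested explicitly, for example with `copy.copy` from the standard library.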

Further info on Python
● Tutorials and guides:
● PyROOT (use ROOT from Python): ftp://root.cern.ch/root/doc/chapter20.pdf
● Python within CMSSW (twiki):