Data Analysis with CMSSW


1 Data Analysis with CMSSW
● Running a simple analysis:
  - Within the framework: EDAnalyzer
  - Interactive: FWLite + PyROOT
● Finding the data with DBS/DLS
● Running CMSSW with CRAB
Most of the files used in the tutorial can be found in /afs/cern.ch/user/g/gpetrucc/public/Tutorial151206

2 Initialize the environment
First time only:
  scramv1 project CMSSW CMSSW_1_2_0_pre9
  cd CMSSW_1_2_0_pre9/src
  eval `scramv1 runtime -sh`     (bash; use -csh for tcsh)
  cmscvsroot CMSSW
  cvs login                      (use “98passwd” as password)
All the other times:
  cd CMSSW_1_2_0_pre9/src
  eval `scramv1 runtime -sh`     (bash; use -csh for tcsh)
  cmscvsroot CMSSW

3 Create an EDAnalyzer skeleton
● Create your working directory under CMSSW_xxx/src:
  mkdir Tutorial151206; cd Tutorial151206
● Create an EDAnalyzer named “Simple”:
  mkedanlzr Simple
This will create the following structure:
  Simple/                                     (contains “BuildFile”)
  Simple/src                                  (contains “Simple.cc”)
  Simple/interface, Simple/doc, Simple/test   (all empty)

4 “Simple.cc” structure:
  #include .......
  class Simple : public edm::EDAnalyzer {
    public: ...
    private: ...
  };
  void Simple::analyze(...) { ... }
  void Simple::beginJob(...) { ... }
  void Simple::endJob(...) { ... }

5 Simple analysis task
Count, in each event, the number of tracks with pT > 5 GeV. We need to:
● At the beginning: create an empty histogram
● For every event:
  - Get the tracks
  - Loop on tracks, cut on pT and count
  - Fill the histogram
● At the end: write the histogram to a ROOT file
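
The per-event logic of these steps can be checked in plain Python before touching the framework. This is a minimal sketch, not the EDAnalyzer itself: it assumes each event is represented as a list of track pT values in GeV, and the names `count_tracks_above`, `analyze` and the toy events are hypothetical.

```python
# Minimal sketch of the analysis task, independent of CMSSW and ROOT.
# Assumption: each event is a plain list of track pT values in GeV.

PT_CUT = 5.0  # GeV, the cut used in this tutorial


def count_tracks_above(track_pts, cut=PT_CUT):
    """Return the number of tracks with pT strictly above `cut`."""
    return sum(1 for pt in track_pts if pt > cut)


def analyze(events, cut=PT_CUT):
    """Return one selected-track count per event.

    Each count is the value that would be passed to the histogram's
    Fill() once per event.
    """
    counts = []                      # "beginJob": start empty
    for track_pts in events:         # "analyze": once per event
        counts.append(count_tracks_above(track_pts, cut))
    return counts                    # "endJob": hand the results on


if __name__ == "__main__":
    # Hypothetical toy events (pT in GeV), not real data.
    toy_events = [[1.2, 7.5, 3.3], [], [5.0, 12.1, 6.4, 0.8]]
    print(analyze(toy_events))
```

Note that a track with exactly 5.0 GeV is rejected, matching the strict `>` used in the C++ and Python loops of this tutorial.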

6 How are tracks stored?
● Go to the documentation page for RECO data:
  http://cmsdoc.cern.ch/Releases/CMSSW/latest_nightly/doc/html/RecoData.html
There we find that tracks are of type reco::Track, stored in a reco::TrackCollection with the label “ctfWithMaterialTracks”.

7 What's a “Track” for CMSSW?
Click on the reco::Track link and find out:
● Include file
● Package: DataFormats/TrackReco
Then click on “List all members” to get the full list. You will find a member function “pt()”; click on it.
Now we can start writing C++ code.
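
For reference, pt() returns the transverse momentum, the magnitude of the momentum component perpendicular to the beam (z) axis: pT = sqrt(px² + py²). The sketch below computes it from hypothetical Cartesian components and applies the tutorial's 5 GeV cut; it does not use the reco::Track class, and `transverse_momentum` and `passes_pt_cut` are illustrative names only.

```python
import math


def transverse_momentum(px, py):
    """pT = sqrt(px^2 + py^2), in the same units as px and py (GeV)."""
    return math.hypot(px, py)


def passes_pt_cut(px, py, cut=5.0):
    """True if the track's pT is strictly above `cut` (GeV)."""
    return transverse_momentum(px, py) > cut


if __name__ == "__main__":
    print(transverse_momentum(3.0, 4.0))  # 5.0
    print(passes_pt_cut(3.0, 4.0))        # False: 5.0 is not > 5.0
    print(passes_pt_cut(6.0, 0.0))        # True
```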

9 Create the histogram
  class Simple : public edm::EDAnalyzer {
    ...
    private:
    ...
    // --------- member data ----------------
    TH1F *m_Tracks;
  };
  void Simple::beginJob(...) {
    m_Tracks = new TH1F("tracks", "Tracks (Pt > 5 GeV)", 10, 0, 10);
  }
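
TH1F("tracks", ..., 10, 0, 10) defines 10 equal-width bins over [0, 10), so each integer track count gets its own bin. ROOT numbers the regular bins 1 to 10 and reserves bin 0 for underflow and bin 11 for overflow; a bin includes its lower edge and excludes its upper edge, so an event with 10 or more selected tracks only ends up in the overflow. The sketch below reproduces that index calculation in plain Python to make the edge behaviour explicit; `find_bin` is a hypothetical helper, not ROOT code (ROOT's own lookup is TH1::FindBin).

```python
import math


def find_bin(x, nbins=10, xmin=0.0, xmax=10.0):
    """Return the ROOT-style bin index of x for a fixed-width histogram.

    0           -> underflow (x < xmin)
    1 .. nbins  -> regular bins (lower edge included, upper edge excluded)
    nbins + 1   -> overflow (x >= xmax)
    """
    if x < xmin:
        return 0
    if x >= xmax:
        return nbins + 1
    width = (xmax - xmin) / nbins
    # Guard against floating-point rounding pushing a value just below
    # xmax into a non-existent regular bin.
    index = min(int(math.floor((x - xmin) / width)), nbins - 1)
    return index + 1


if __name__ == "__main__":
    # Track counts per event, as filled in Simple::analyze().
    for n in (0, 3, 9, 10, 15):
        print("count %d -> bin %d" % (n, find_bin(n)))
```

If events with 10 or more tracks matter for the plot, widening the range (for example 20 bins from 0 to 20) keeps them out of the overflow.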

10 Get track collection
  void Simple::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup)
  {
    using namespace edm;
    using namespace reco;
    Handle<TrackCollection> tracks;
    iEvent.getByLabel("ctfWithMaterialTracks", tracks);
    [...]
  }

11 Loop over the tracks
  Handle<TrackCollection> tracks;
  iEvent.getByLabel([...]);
  TrackCollection::const_iterator trk;
  for (trk = tracks->begin(); trk != tracks->end(); ++trk) {
    [...]
  }

12 Cut on track pT and count
  int count = 0;
  TrackCollection::const_iterator trk;
  for (trk = tracks->begin(); [...]) {
    if (trk->pt() > 5.0) {
      count++;
    }
  }
  // Fill once per event, after the loop over tracks
  m_Tracks->Fill(count);

13 Save the histogram
  void Simple::endJob(...) {
    TFile *f = new TFile("histo.root", "RECREATE");
    f->WriteTObject(m_Tracks);
    f->Close();
    delete m_Tracks;
    delete f;
  }

14 Now some technicalities:
● Add the required include files (at the beginning of Simple.cc):
  #include ...
  #include "DataFormats/TrackReco/interface/Track.h"
  #include

15 Adding libraries in BuildFile......

16 Compile your EDAnalyzer
● Go into the main folder of your project (CMSSW_xxx/src/Tutorial151206/Simple)
● Run scramv1 build (and cross your fingers):
  Parsing BuildFiles
  Entering Package Tutorial151206/Simple
  [...]
  >> Compiling [...]/Simple/src/Simple.cc
  >> Building shared library [...]/libTutorial151206Simple.so
  [...]
  @@@@ Checking shared library for missing symbols: [...]
  --- Registered SEAL plugin Tutorial151206Simple
  [...]
  >> Package Simple built

17 Create test/Simple.cfg
  process Demo = {
    source = PoolSource {
      untracked vstring fileNames = {
        "file:/afs/cern.ch/user/g/gpetrucc/public/Tutorial151206/PhysVal-DiElectron-Ene10.root"
      }
    }
    module demo = Simple { }
    path p = {demo}
  }

18 Run the EDAnalyzer
● Go to the Simple/test directory and run:
  cmsRun Simple.cfg
  Using the site default catalog
  [...]
  %MSG-i FwkReport: [...] BeforeEvents
  Begin processing the 1th record. Run 1, Event 1
  %MSG-i FwkReport: [...] Run: 1 Event: 1
  Begin processing the 2th record. Run 1, Event 2
  [...]
  [...] 10
  %MSG-i FwkJob: PostSource [...] Run: 1 Event: 10
  [...]
● Open “histo.root” and enjoy the plot

19 Links to more details:
Core CMSSW documentation:
  https://twiki.cern.ch/twiki/bin/view/CMS/WorkBook
  http://cmsdoc.cern.ch/Releases/CMSSW/latest_nightly/doc/html/ (some days the link is broken)
  http://cmslxr.fnal.gov/lxr/
  http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/
Setting up the CMSSW environment:
  https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookSetComputerNode
Writing a framework module:
  https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookWriteFrameworkModule
Tutorials from the last CMS Week:
  https://twiki.cern.ch/twiki/bin/view/CMS/December06CMSweekTutorials

20 Same thing, interactive
Install the Python tools (only once):
  cd CMSSW_xxxx/src
  cmscvsroot CMSSW
  cvs co -r HEAD PhysicsTools/PythonAnalysis
Set up the Python environment (every time):
  (bash:) export PYTHONPATH=${PYTHONPATH}:$CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python
  (tcsh:) setenv PYTHONPATH ${PYTHONPATH}:$CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python

21 Interactive: startup
● Create a new file simple.py
● Start with the lines that initialize FWLite/PyROOT:
  from ROOT import *
  from cmstools import *
  gSystem.Load("libFWCoreFWLite.so")
  AutoLibraryLoader.enable()

22 Interactive: read the data
  data = TFile("/afs/cern.ch/user/g/gpetrucc/public/Tutorial151206/PhysVal-DiElectron-Ene10.root")
  events = EventTree(data.Get("Events"))
  trackBranch = events.branch("ctfWithMaterialTracks")

23 Interactive: event loop
  for event in events:
      tracks = trackBranch()              # read tracks
      count = 0                           # init counter
      for trk in tracks:                  # loop over tracks
          if trk.pt() > 5.0:              # cut on pT
              count += 1                  # increment (Python has no "++")
      print("Found %d tracks" % count)    # print

24 Interactive: running
  python simple.py
  Preparing CMS tab completer tool...
  Loading FWLite dictionary...
  Warning in [...]
  Found 0 tracks
  [...]
  Found 1 tracks

25 Histograms in Python
  [...]
  histo = TH1F("tracks", "Tracks (Pt > 5 GeV)", 10, 0, 10)
  for event in events:
      [...]
      print("Found %d tracks" % count)    # print
      histo.Fill(count)
  f = TFile("histo.root", "RECREATE")
  f.WriteTObject(histo)
  f.Close()

26 Pros and cons of Python/FWLite
PRO
● No need to recompile
● No need to include headers, write a BuildFile, ...
● Shorter code
● Can be used interactively (check also ipython)
● Untyped functions allow greater code reuse
CON
● Only some CMSSW packages can be used
● Currently there are problems with:
  - Refs (e.g. B-tagging)
  - AssociationMaps
  - TChains (there are workarounds)
● Can only read events...
● Can't run on CRAB

27 Finding data with DBS/DLS
● Go to the DBS/DLS discovery page:
  http://cmsdbs.cern.ch/discovery/expert
  (the “expert” page is needed to get the 1_2_x samples)

28 Finding data
● DBS instance: RelVal/Writer (for 1_2_0_pre9)
● Application: anything with 1_2_0_pre9 (those with FEVT or Merged should work fine)
● Primary dataset: RelVal120pre9

29 Search results (summary)
From the summary view you can read:
A) The dataset name (for CRAB):
   /RelVal120pre9Higgs-ZZ-4Mu/FEVT/CMSSW_1_2_0_pre9-FEVT-1165234098-unmerged
B) The sites where it is stored (cern, fnal)
C) The number of events available (2k, 1.2k)

30 Search results (block details)
Clicking on “Blocks” gives more information. To see the logical file names (LFNs) for the data, click on “plain” under “LFN list”. You should get a list of files like:
  /store/unmerged/RelVal/...
The physical location on CASTOR is (usually):
  /castor/cern.ch/cms/store/unmerged/...
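
Since the CASTOR path is usually just the LFN with /castor/cern.ch/cms in front, checking whether a file is really there starts with building that path. A minimal sketch, assuming this simple prefix rule from the slide holds (the real mapping is defined by the site and can differ); `lfn_to_castor_pfn` and the example LFN are hypothetical.

```python
# Assumption (from the slide): CASTOR path = "/castor/cern.ch/cms" + LFN.
CASTOR_PREFIX = "/castor/cern.ch/cms"


def lfn_to_castor_pfn(lfn):
    """Map an LFN such as /store/unmerged/... to its usual CASTOR path."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/, got %r" % lfn)
    return CASTOR_PREFIX + lfn


if __name__ == "__main__":
    # Hypothetical LFN; real names come from the "LFN list" page.
    print(lfn_to_castor_pfn("/store/unmerged/RelVal/example.root"))
```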

31 Reading that data with CMSSW
● Write the LFNs in the .cfg file:
  source = PoolSource {
    untracked int32 maxEvents = 3
    untracked vstring fileNames = { "/store/unmerged/...", [...] }
  }
  (write just the LFN: no “file:” and no “/castor”!)
● Remember to set maxEvents unless you want to read all the events in the file...
● Check that the sample is really in /castor before...
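
With more than a couple of LFNs, writing the fileNames list by hand is error-prone. The sketch below assumes the old-style .cfg syntax shown on this slide and refuses names with a “file:” or “/castor” prefix, following the rule above; `pool_source_block` and the example LFNs are hypothetical.

```python
def pool_source_block(lfns, max_events=3):
    """Return an old-style .cfg PoolSource block reading the given LFNs.

    LFNs are written as-is: no "file:" prefix and no "/castor" path.
    """
    for lfn in lfns:
        if lfn.startswith("file:") or lfn.startswith("/castor"):
            raise ValueError("write the bare LFN, not %r" % lfn)
    # One quoted LFN per line, comma-separated as in a vstring.
    names = ",\n".join('      "%s"' % lfn for lfn in lfns)
    return (
        "source = PoolSource {\n"
        "  untracked int32 maxEvents = %d\n"
        "  untracked vstring fileNames = {\n"
        "%s\n"
        "  }\n"
        "}" % (max_events, names)
    )


if __name__ == "__main__":
    # Hypothetical LFNs for illustration only.
    print(pool_source_block(["/store/unmerged/RelVal/a.root",
                             "/store/unmerged/RelVal/b.root"]))
```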

32 Running on remote samples: CRAB
Before using CRAB you need:
● A working CMSSW setup
● A working EDAnalyzer (with its cfg file)
● Access to the Grid: certificate, VO membership
● The name of a data sample you want to access

33 Set up CRAB
Set up your environment (every time, on lxplus):
  source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
  source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
  (source the corresponding .csh files if you use tcsh)
Additional tasks (first time only):
● Execute $CRABDIR/configureBoss
● Copy the default crab.cfg file from /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.cfg

34 Configure CRAB (crab.cfg)
● Read the comments in the cfg file!
● [CRAB] section: main configuration
  - jobtype = cmssw (always)
  - scheduler = glitecoll (edg should also work)
● [CMSSW] section: your job configuration (important!)
  - datasetpath = (“None” if you use Pythia...)
  - pset =
  - total_number_of_events =
  - events_per_job =
  - output_file =
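
As a rough planning aid, the number of jobs implied by total_number_of_events and events_per_job is a ceiling division: a partially filled last job still counts. This sketch assumes a plain split by event count; CRAB's real splitting may also depend on how events are spread across files, so treat the result as an estimate. `number_of_jobs` and the 500 events per job are hypothetical.

```python
def number_of_jobs(total_events, events_per_job):
    """Jobs needed to cover total_events with at most events_per_job each."""
    if total_events < 0 or events_per_job <= 0:
        raise ValueError("need total_events >= 0 and events_per_job > 0")
    # Integer ceiling division: a partial last job still counts as a job.
    return (total_events + events_per_job - 1) // events_per_job


if __name__ == "__main__":
    # Event counts quoted for the RelVal samples in the DBS search results.
    for total in (2000, 1200):
        print("%d events -> %d jobs" % (total, number_of_jobs(total, 500)))
```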

35 Configure CRAB (crab.cfg)
● [USER] section: common info
  - return_data = 1 (get your output back with CRAB)
  - copy_data = 0 (set to 1 to save the output on CASTOR... more tricky)
● [EDG] section: Grid configuration (optional)
  - ce_white_list, se_white_list: use only the CEs/SEs whose names are in the list (you can try “cern”, “infn”)
  - ce_black_list, se_black_list: never use CEs/SEs with the specified names (e.g. “tw”, “fnal”, “cern”)
  - rb = CERN (try CNAF if CERN does not work)

36 Configure CRAB for RelVal
● By default, CRAB looks for samples in the MCGlobal/Writer DBS
● To read the RelVal samples, crab.cfg needs some more tweaking: add the following parameters under the [CMSSW] section:
  dbs_instance = RelVal/Writer
  dls_endpoint = prod-lfc-cms-central.cern.ch/grid/cms/DLS/RelVal
● This allows you to set datasetpath to a RelVal sample
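
crab.cfg is organized in [SECTION] headers with key = value lines, so the settings from these slides can be collected and written out with Python 3's standard configparser module. A sketch under stated assumptions: the keys and the dataset name come from this tutorial, while the event numbers are hypothetical placeholders; `SETTINGS` and `make_crab_cfg` are illustrative names, and the result is meant to be compared with the commented default crab.cfg, not to replace it.

```python
import configparser
import io

# Keys are the ones named in this tutorial; the event numbers are
# hypothetical placeholders to be replaced with your own values.
SETTINGS = {
    "CRAB": {"jobtype": "cmssw", "scheduler": "glitecoll"},
    "CMSSW": {
        "datasetpath": "/RelVal120pre9Higgs-ZZ-4Mu/FEVT/"
                       "CMSSW_1_2_0_pre9-FEVT-1165234098-unmerged",
        "pset": "Simple.cfg",
        "total_number_of_events": "1000",
        "events_per_job": "250",
        "output_file": "histo.root",
        # Needed to read RelVal samples instead of MCGlobal/Writer.
        "dbs_instance": "RelVal/Writer",
        "dls_endpoint": "prod-lfc-cms-central.cern.ch/grid/cms/DLS/RelVal",
    },
    "USER": {"return_data": "1", "copy_data": "0"},
    "EDG": {"rb": "CERN"},
}


def make_crab_cfg(settings):
    """Render a {section: {key: value}} dict as INI-style crab.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_dict(settings)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(make_crab_cfg(SETTINGS))
```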

37 Set up your EDAnalyzer.cfg
● The normal cfg file used for your job works fine.
● CRAB takes care of setting the PoolSource options (maxEvents, fileNames).
● Check the names of the output files! CRAB adds “_ ” to each file name when retrieving the job output.
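
To know which files to expect after `crab -getoutput`, here is a sketch assuming CRAB inserts an underscore and the job number (counting from 1) before the file extension, so that histo.root becomes histo_1.root, histo_2.root, and so on. That naming rule is an assumption; check it against your actual output. `retrieved_output_names` is a hypothetical helper.

```python
import os


def retrieved_output_names(output_file, n_jobs):
    """Expected names of the retrieved outputs, one per job.

    Assumption: CRAB inserts "_<job number>" before the extension,
    with jobs numbered from 1.
    """
    stem, ext = os.path.splitext(output_file)
    return ["%s_%d%s" % (stem, job, ext) for job in range(1, n_jobs + 1)]


if __name__ == "__main__":
    print(retrieved_output_names("histo.root", 3))
```

Knowing the names in advance makes it easy to merge the per-job histograms afterwards, for example with ROOT's hadd.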

38 Running CRAB
● Create and submit the jobs:
  crab -create -submit
● Check the status of your jobs:
  crab -status
  (hint: watch -n 120 "crab -status")
● Get the output of the completed jobs:
  crab -getoutput

39 Further information
  https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookRunningGrid
  http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/
  https://twiki.cern.ch/twiki/pub/CMS/December06CMSweekTutorials/CRAB_tutorial.pdf
  http://arda-dashboard.cern.ch/cms/

40 Backup slides ● Python crash course (4 slides)

41 Python crash course (1)
● Python is a scripting language. Scripts are executed just by typing “python ”
● You can also open an interactive Python prompt:
  gpetrucc@lxplus$ python
  [...]
  >>> 2+2
  4
  >>>
● Output is written with print:
  print "Hello world. I = ", i
● There is no “;” at the end of a line

42 Python crash course (2)
● Comments start with “#” and finish at the end of the line:
  # this will be ignored
● Variable types are not declared:
  i = 37    (and not int i = 37 as in C++)
● Blocks are made with indentation, not “{”, “}”:
  if x > 3:
      print "x is large (x =", x, ")"
  else:
      print "x is negligible"
  for i in range(5):   # 0,1,2,3,4
      print i

43 Python crash course (3)
● Python is object oriented
● There is no “new” keyword for creating objects:
  file = TFile("ciao")
● Members are accessed with “.” (dot):
  file.Close()   (and not file->Close())
● Memory management is automatic: there is no need to call “delete” or “free()” as in C++
● No pointers (objects are always “references”)
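
Because every Python name is a reference, assigning an object to a second name does not copy it: both names see the same object, and changes made through one are visible through the other. A minimal illustration with a plain list; `rebind_vs_copy` is a hypothetical name, and the code runs under Python 3 (the slides above use Python 2 print syntax).

```python
def rebind_vs_copy():
    a = [1, 2]
    b = a              # no copy: a and b refer to the same list object
    b.append(3)        # the change is visible through both names
    same_object = a is b
    c = list(a)        # an explicit copy creates a new, independent list
    c.append(4)        # does not affect a
    return a, c, same_object


if __name__ == "__main__":
    print(rebind_vs_copy())
```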

44 Further info on Python
Tutorials and guides:
  http://docs.python.org/tut/tut.html
  http://hetland.org/python/instant-python.php
  http://www.wag.caltech.edu/home/rpm/python_course/
  http://wiki.python.org/moin/BeginnersGuide
PyROOT (use ROOT from Python):
  ftp://root.cern.ch/root/doc/chapter20.pdf
Python within CMSSW (twiki):
  https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookMakeAnalysis
  https://twiki.cern.ch/twiki/bin/view/CMS/UserManualPythonAnalysis

