ROOT I/O TTree Queries CHEP 2004 René Brun / CERN Philippe Canal / Fermilab Fons Rademakers / CERN


1 ROOT I/O TTree Queries CHEP 2004 René Brun / CERN Philippe Canal / Fermilab Fons Rademakers / CERN http://root.cern.ch

2 Contents (September 29, 2004, Conference for Computing in High Energy and Nuclear Physics)
• Status
  - Overview
  - List of other presentations
• ROOT I/O
  - Large Files
  - Double32_t
  - Foreign objects
  - New interfaces
  - XML back-end
  - Historical recap
• Containers Support
  - Mainly for STL containers
  - Splitting
  - TTree Query
• TTree
  - Auto load of TRef'ed branches
  - UserInfo
  - CloneTree
• TTree Query
  - Calling free standing functions
  - Rebinning
  - Support for Indexed Friends
  - Arbitrary C++ in queries (TTree::MakeProxy)
• Support for SQL back-end
• Future Plans

3 Presentations and Posters
• [328] The Next Generation Root File Server, by Andrew ANUSHEVSKY (Theatersaal: Sept 27, 16:30 - 16:50)
• [412] XML I/O in ROOT, by Sergey LINEV (Brunig 1 + 2: Sept 29, 15:20 - 15:40)
• [430] Global Distributed Parallel Analysis using PROOF and AliEn, by Fons RADEMAKERS (Theatersaal: Sept 29, 15:20 - 15:40)
• [104] Authentication/Security services in the ROOT framework, by Gerardo GANIS (Brunig 3: Sept 29, 16:50 - 17:10)
• [169] Guidelines for Developing a Good GUI, by Ilka ANTCHEVA (Brunig 1+2: Sept 30, 14:00 - 14:20)
• [287] Super scaling PROOF to very large clusters, by Maarten BALLINTIJN (Ballsaal: Sept 30, 15:00 - 15:20)
• Poster on September 29
  - [128] XTNetFile, a fault tolerant extension of ROOT TNetFile client
• Poster on September 30
  - [298] The ROOT 3-D graphics and geometry classes
  - [170] The User Interface Design in ROOT
  - [303] The ROOT Linear Algebra Package
  - [98] RDBC: ROOT DataBase Connectivity
  - [99] Interactive Data Analysis with Carrot (ROOT Apache Module)

4 Status
• ROOT 4.01/02 just released
• Production release of 4.01 planned for December 2004
• Many improvements since CHEP2003
• This talk: I/O and TTree queries
• For other developments, see the other ROOT-related talks
• XROOTD: a new generation ROOT file server
• Authentication overhaul
• Object property editors, e.g. TH1Editor, TH2Editor, TGraphEditor
• New classes for GUI
• GUI builder
• Brand new GL viewer
• Math and Stats
  - New Matrix package implementation
  - New functions in TMath (now a namespace)
  - Quadratic programming

5 TFile and TDirectory
• Very Large Files
  - Support on all platforms for 64-bit integers via the portable typedefs Long64_t and ULong64_t
    - long long on Unix, __int64 with VC++
  - Support for files larger than 2 GB added in ROOT 4.00
    - Files smaller than 2 GB are still readable by older versions of ROOT
  - Support for TTrees with more than 2**31 entries
• Double32_t
  - Same as Double_t in memory
  - Same as Float_t on disk
  - Supports automatic schema evolution to and from float and double
  - Warning: too many read/write cycles could result in some loss of precision
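The Double32_t trade-off can be illustrated without ROOT. The following is a minimal sketch, assuming only what the slide states: the value is a double in memory and is narrowed to 32-bit float precision when written. The helper names `writeToDisk`, `readFromDisk`, `roundTrip` and `entriesBeyond32Bits` are hypothetical and are not ROOT functions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the write and read steps of a Double32_t member:
// the value is a double in memory, but only float precision reaches the disk.
inline float writeToDisk(double inMemory) {
    return static_cast<float>(inMemory);   // narrowing: extra digits are rounded away
}

inline double readFromDisk(float onDisk) {
    return static_cast<double>(onDisk);    // widening: exact, no further loss
}

// One full write/read cycle.
inline double roundTrip(double value) {
    return readFromDisk(writeToDisk(value));
}

// An entry count above 2**31 needs a 64-bit integer; in ROOT this role is
// played by the Long64_t typedef. std::int64_t is used here as a stand-in.
inline std::int64_t entriesBeyond32Bits() {
    return (std::int64_t{1} << 31) + 1;    // does not fit in a signed 32-bit integer
}
```

A value such as 0.5 is exactly representable as a float and survives the cycle unchanged, while 0.1 comes back slightly different, which is the kind of precision loss the warning refers to.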

6 XML output format
• Update to the I/O classes to allow customization of the back-end
• Implemented for XML
• Will be used for SQL support
• XML files allow the interchange of data with applications unable to read ROOT files directly
• Example:
    TCanvas c;
    h.Draw();
    c.SaveAs("c.xml");
    c.SaveAs("c.root");
• Extract from c.xml:
    <TObject fUniqueID="0" fBits="3000008"/>
• Refer to Sergey Linev's presentation for more details

7 ROOT I/O History
• Version 2.25 and older
  - Only hand-coded and generated streamer functions; schema evolution done by hand
  - I/O requires: ClassDef, ClassImp and CINT dictionary
• Version 2.26
  - Automatic schema evolution
  - Use TStreamerInfo (with info from dictionary) to drive a general I/O routine
• Version 3.03/05
  - Lift need for ClassDef and ClassImp for classes not inheriting from TObject
  - Any non-TObject class can be saved inside a TTree or as part of a TObject class
• Version 4.00/00
  - Automatic versioning of 'Foreign' classes
• Version 4.00/08
  - Non-TObject classes can be saved directly in TDirectory
• Version 4.01/02
  - Large TTrees, TRef autoload

8 Foreign Objects
• To save non-instrumented classes:
  - Only the data dictionary is needed
  - Default versioning is provided by a checksum based on the type and name of the persistent data members
  - The checksum is stored as an additional 4 bytes
• ClassDef advantages
  - The IsA function generated by ClassDef considerably speeds up access to the TClass for a given object
  - The version number (2 bytes maximum) consumes less space on disk than the "0 + checksum"
• New interface to store and retrieve objects with type safety
    ptrclass *ptr = …;
    directory->WriteObject(ptr,"name");
    ptrclass *ptr;
    directory->GetObject("name",ptr);   (ptr is 0 if the object is absent or of the wrong type)
[Figure: TBuffer layout: .... | Bytecount (4 bytes) | 0 (2 bytes) | checksum (4 bytes) | object N | Bytecount | 0 | checksum | object N+1 | ....]
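The idea of a default version derived from the persistent layout can be sketched as follows. This is an illustration only, assuming a simple multiply-and-add fold over the class name and over each member's type and name; it is not presented as ROOT's actual checksum algorithm, and `MemberInfo`, `foldString` and `layoutChecksum` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One persistent data member, described by its type and its name.
struct MemberInfo {
    std::string type;
    std::string name;
};

// Fold a string into a running 32-bit checksum (unsigned arithmetic wraps).
inline std::uint32_t foldString(std::uint32_t sum, const std::string& text) {
    for (unsigned char c : text) {
        sum = sum * 3u + c;
    }
    return sum;
}

// Illustrative layout checksum: depends on the class name and on the type and
// name of every persistent member, in declaration order.
inline std::uint32_t layoutChecksum(const std::string& className,
                                    const std::vector<MemberInfo>& members) {
    std::uint32_t sum = foldString(0u, className);
    for (const MemberInfo& m : members) {
        sum = foldString(sum, m.type);
        sum = foldString(sum, m.name);
    }
    return sum;
}
```

Two builds of a class with identical persistent members yield the same 4-byte value, while renaming or retyping a member will usually change it (collisions are possible with any 32-bit checksum). That is enough to tell layouts apart without an explicit ClassDef version number.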

9 TClonesArray
• Optimization of the number of calls to new and delete
• Ability to split the collection of objects in a TTree
  - Improves compression and run-time
• Ability to save objects member-wise
  - Store the same data member of all the elements of the collection consecutively
  - Improves compression (buffer data more homogeneous)
  - Improves run-time (avoids n-1 tests of the data type)
• Ability to use in TTree::Draw as a collection
• Ability to read back without the original compiled code
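The difference between object-wise and member-wise saving comes down to the order in which values reach the buffer. The sketch below shows both orders for a made-up element type; `Hit`, `streamObjectWise` and `streamMemberWise` are hypothetical names, and a plain `std::vector<double>` stands in for the I/O buffer.

```cpp
#include <vector>

// Hypothetical element type with two persistent data members.
struct Hit {
    double fX;
    double fEnergy;
};

// Object-wise: all members of element 0, then all members of element 1, ...
inline std::vector<double> streamObjectWise(const std::vector<Hit>& hits) {
    std::vector<double> buffer;
    for (const Hit& h : hits) {
        buffer.push_back(h.fX);
        buffer.push_back(h.fEnergy);
    }
    return buffer;
}

// Member-wise: fX of every element first, then fEnergy of every element.
inline std::vector<double> streamMemberWise(const std::vector<Hit>& hits) {
    std::vector<double> buffer;
    for (const Hit& h : hits) buffer.push_back(h.fX);
    for (const Hit& h : hits) buffer.push_back(h.fEnergy);
    return buffer;
}
```

For two hits (1, 10) and (2, 20), the object-wise buffer holds 1, 10, 2, 20 and the member-wise buffer holds 1, 2, 10, 20. Grouping values of the same member together is what makes the buffer more homogeneous.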

10 Old STL Container Support
For versions older than 4.00/00
• Collections always stored object-wise
• Nesting of STL collections was extremely limited
• No splitting was possible
• STL containers stored using a generated function
• One generated function per actual data member
• Compiled version of these functions required for writing and also for reading
    void R__User_fList1(TBuffer &R__b, void *R__p, int)
    {
       if (R__b.IsReading()) {
          vector<THit> &fList1 = *(vector<THit> *)R__p;
          int R__n;
          fList1.clear();
          R__b >> R__n;
          fList1.reserve(R__n);
          for (int R__i = 0; R__i < R__n; R__i++) {
             THit R__t;
             R__t.Streamer(R__b);
             fList1.push_back(R__t);
          }
       } else {
          … writing …
       }
    }

11 New Container Support
• New abstract interface:
  - TVirtualCollectionProxy
  - Can be implemented for almost any collection
• Allows
  - Splitting (for collections of homogeneous objects)
  - Use in Tree Query (with automatic looping)
• Will allow
  - Member-wise streaming (as opposed to object-wise streaming)
• Also
  - Arbitrary nesting of STL containers
  - Reading of STL containers without original code (emulated mode)
• Note: as of 4.00/08 only std::vector has proxies
• Early prototype and fundamental concepts by Victor Perevoztchikov

12 STL Support
• Each STL container instance now has an associated TClass object
• Several co-existing streaming implementations
  - Generated Streamer
    - For object-wise streaming
    - Fully respects custom allocators and comparators
    - Easier to implement, with a run-time cost similar to a templated solution
  - Templated Proxy (e.g. TVectorProxy)
    - For splitting and member-wise streaming
    - Fully respects custom allocators and comparators
  - Emulation Proxy (e.g. TEmulatedVectorProxy)
    - For reading without a compiled version
    - Allows easy sharing of ALL ROOT files that have no custom streamers
• Why not rely only on the Emulation Proxy?
  - Implementation difficulties
    - An emulation proxy acting on a "live STL object" requires a few tricks and assumptions
    - The memory footprint of the STL container object is (usually?) independent of the template parameter
    - A list proxy would need a series of lists of increasing fixed-size content (aka. list, list )
  - Does not respect allocators and comparators
  - A templated proxy can be faster and more memory efficient
  - The emulation layer might actually be implemented using alternative collections (if we assume it does not have to deal with real objects)

13 Container I/O Implementation
• Any container can be summarized by the sequence of its content's addresses
• Use TVirtualCollection::At via TVirtualCollection::operator[]
• Pros
  - I/O code completely independent of the collection
    - Reduced code duplication in TStreamerInfo
  - No run-time cost for TClonesArray
• Cons
  - Implementation for containers with no random access iterator needs to cache the iterator
• Member-wise implementation
  - Member-wise/object-wise choice will be encoded in the 'version number' of the STL collections
  - API will be provided to select member-wise or object-wise for data members that are STL collections
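The pros and cons listed above can be made concrete with a small sketch. It assumes an abstract interface that exposes only a size and an address per element; `CollectionProxy`, `VectorProxy`, `ListProxy` and `sumThroughProxy` are hypothetical names for illustration and do not reproduce the ROOT classes. The list variant shows why a container without random access needs a cached iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical abstract interface: the I/O code only sees element addresses.
class CollectionProxy {
public:
    virtual ~CollectionProxy() = default;
    virtual std::size_t Size() const = 0;
    virtual void* At(std::size_t index) = 0;
    void* operator[](std::size_t index) { return At(index); }
};

// Random-access container: At is a direct lookup.
class VectorProxy : public CollectionProxy {
public:
    explicit VectorProxy(std::vector<int>& v) : fVector(v) {}
    std::size_t Size() const override { return fVector.size(); }
    void* At(std::size_t index) override { return &fVector[index]; }
private:
    std::vector<int>& fVector;
};

// No random access: cache the last iterator so that sequential At(0), At(1), ...
// calls advance one step instead of restarting from begin() each time.
// The caller must pass an index smaller than Size(); no bound check is done.
class ListProxy : public CollectionProxy {
public:
    explicit ListProxy(std::list<int>& l)
        : fList(l), fCached(l.begin()), fCachedIndex(0) {}
    std::size_t Size() const override { return fList.size(); }
    void* At(std::size_t index) override {
        if (index < fCachedIndex) {            // moving backwards: restart
            fCached = fList.begin();
            fCachedIndex = 0;
        }
        std::advance(fCached, static_cast<std::ptrdiff_t>(index - fCachedIndex));
        fCachedIndex = index;
        return &*fCached;
    }
private:
    std::list<int>& fList;
    std::list<int>::iterator fCached;
    std::size_t fCachedIndex;
};

// Container-independent code: sums the elements through the proxy only.
inline int sumThroughProxy(CollectionProxy& proxy) {
    int total = 0;
    for (std::size_t i = 0; i < proxy.Size(); ++i) {
        total += *static_cast<int*>(proxy[i]);
    }
    return total;
}
```

`sumThroughProxy` never names the concrete container, which is the sense in which the I/O code becomes independent of the collection. The cached iterator keeps a sequential pass over a list linear rather than quadratic.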

14 TTree
• TRef autoload
  - Added (optional) support for the auto-loading of branches referenced by a TRef object
  - Generates one table of references to branches per entry
  - TRef::GetObject uses this table to find and load the branch containing the referenced object
  - To enable it call:
      tree->BranchRef();
  - Example:
      class Event {
         TClonesArray *fTracks;
         TRef fLastTrack;
      };
      branch=tree.GetBranch("fLastTrack");
      branch->GetEntry(7);
      tlast = event->GetLastTrack();
• TTree::GetUserInfo
  - Used to store with the TTree any user-defined object(s) that do not depend on the entry number
  - Examples: luminosity, calibrations, etc.
      tree.GetUserInfo()->Add(myobject);

15 Copying a TTree
• Very flexible, simple copying tools allowing cuts on:
  - Number of entries
  - Number of branches
  - Selection of entries based on a formula
  - Usable for both TTree and TChain
• Important simplification of the interface
  - Removed the requirement of explicitly setting the addresses for ALL the branches
    tree->SetBranchStatus("br",kFALSE);
    newtree=tree->CloneTree();
    tree->CopyTree("fTracks.fPx<=1.2");
[Figure: trees labelled "3 Branches", "2 Branches" and "3 Branches"]

16 TTree Queries
• Implemented Boolean expression optimization ( && and || )
• Rebinning now possible from the TTree data (via new histogram editor)
• Improved TTree::Scan output (customization and array display)
• Call to external functions:
  - Free standing function or class static member function
  - Compiled or interpreted, with numerical arguments and numerical return type
  - Example:
      tree->Draw("TMath::Prob(var,5)");

17 TTree Queries
• Support for Collections
  - TTreeFormula now treats any collection class which has a TVirtualCollection in the exact same way as a TClonesArray
  - Automatically loops over the elements
  - Can access a specific element
  - Synchronized with other collections and arrays in the formulas
• Connecting several TTrees
  - TChain adds more entries
  - TTree Friends add (virtually) more branches
  - Prior to ROOT 4.00/08, correlation between Friends was made only by entry number
    - This is a problem if the Trees have semantically a different sequence of entries
  - Can now connect the Friend using an Index
    - For example Run Number/Event Number
    - Use abstract interface TVirtualIndex
    - Concrete implementation: TTreeIndex
[Figure: two pairs of tables with run and event columns: "Main Tree" and "User Tree" matched by entry number, and "Indexed Main Tree" and "User Tree" matched by index]
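Connecting a Friend through an index rather than by entry number amounts to a lookup keyed on the identifying values. The following is a minimal sketch under that assumption, using run and event numbers as in the slide; `FriendIndex`, `RunEvent` and `EntryFor` are hypothetical names and do not reproduce the TVirtualIndex or TTreeIndex interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// (run, event) pair identifying an entry independently of its position.
using RunEvent = std::pair<int, int>;

// Hypothetical index: maps (run, event) to the entry number in the friend tree.
class FriendIndex {
public:
    explicit FriendIndex(const std::vector<RunEvent>& friendEntries) {
        for (std::size_t i = 0; i < friendEntries.size(); ++i) {
            fIndex[friendEntries[i]] = static_cast<std::int64_t>(i);
        }
    }

    // Returns the friend entry number, or -1 if no entry matches.
    std::int64_t EntryFor(int run, int event) const {
        auto it = fIndex.find(RunEvent(run, event));
        return it == fIndex.end() ? -1 : it->second;
    }

private:
    std::map<RunEvent, std::int64_t> fIndex;
};
```

If the friend tree stores its entries in the order (1,2), (2,1), (1,1), (2,2), then the main-tree entry with run 1 and event 1 maps to friend entry 2, regardless of where that entry sits in the main tree. Matching by raw entry number would have paired it with the wrong row.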

18 The MakeClass Revolution
Current Fast Analysis Frameworks
• TTree::Draw
  - Fast histogramming
  - Load branch on demand
  - Only simple expressions
• MakeCode
  - C-style
  - Obsolete
• MakeClass
  - Flat representation of the tree
  - Difficulties with variable size arrays
  - Branch loaded explicitly
• MakeSelector
  - Proof ready
  - Flat representation of the tree
  - Difficulties with variable size arrays
  - Branch loaded explicitly
Elegant Replacement for MakeClass/MakeSelector
• Currently named MakeProxy
• Creates a C++ context where branch names (including periods) can be used as variables
• On-demand loading of branches
• Respects/recreates the original class structure
• Array bound check
• Uses the user's shared libraries (when available)
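Two of the listed properties, on-demand loading and array bound checking, can be sketched with a small wrapper class. This is an illustration under stated assumptions, not the code MakeProxy generates: `LazyBranch` is a hypothetical name, and a callable returning a `std::vector<float>` stands in for reading a branch from file.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical proxy standing in for one branch holding an array of floats.
class LazyBranch {
public:
    explicit LazyBranch(std::function<std::vector<float>()> loader)
        : fLoader(std::move(loader)) {}

    // Element access: loads the branch on first use and checks the bound.
    float operator[](std::size_t i) {
        Load();
        if (i >= fData.size()) {
            throw std::out_of_range("index past end of branch array");
        }
        return fData[i];
    }

    // Number of times the loader actually ran (0 until first access).
    int LoadCount() const { return fLoadCount; }

private:
    void Load() {
        if (fLoaded) return;       // already in memory: nothing to do
        fData = fLoader();
        fLoaded = true;
        ++fLoadCount;
    }

    std::function<std::vector<float>()> fLoader;
    std::vector<float> fData;
    bool fLoaded = false;
    int fLoadCount = 0;
};
```

User code can then write `pt[i]` as if the data were a plain variable. The read happens only for branches that are actually touched, and an out-of-range index raises an exception instead of reading stray memory.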

19 MakeProxy Examples
• TTree::Draw of a script:
  - Implemented using MakeProxy
  - Enables complex looping
  - Allows calls to any C++ functions or member functions!
  - Still provides on-demand loading of the branches
  - And … allows any arbitrary C++
    Double_t hsimple() {
       int last = fTracks.GetLast();
       for (int i=1; i < last-1; ++i) {
          htemp->Fill(fTracks.fPt[i]-fTracks.fPt[i-1]);
       }
       return fTracks.fPt[last] - fTracks.fPt[last-1];
    }

    tree->Draw("hsimple.C");

20 File types & Access in 4.01/xx
[Figure: file types and access paths. Labels: Local File X.xml; RFIO, Chirp, Castor, Dcache; Local File X.root; http; rootd/xrootd; Oracle, SapDb, PgSQL, MySQL; TFile; TKey/TTree; TStreamerInfo; user; TSQLServer, TSQLRow, TSQLResult; TTreeSQL]

21 New RDBMS interface: Goals
• Access any RDBMS tables from TTree::Draw
• Create a Tree in split mode -> creating an RDBMS table and filling it
• The table can be processed by SQL directly
• The interface uses the normal I/O engine, including support for Automatic Schema Evolution

22 New RDBMS Interface
• Current prototype
  - Simple TTree (branch with leaf list)
  - Implemented via TSQLxxx for reading and writing
  - Implemented via RDBC for reading
    - See: http://carrot.cern.ch/~onuchin/RDBC/
  - Should be released in December 2004
• Should be expanded to support branches of objects
  - Need to implement a way to store and retrieve TStreamerInfo(s) and TProcessID(s) in the database
  - Will probably use SQL binary 'blob' to store non-split objects

23 RDBMS Examples
• Connect to an existing db
    TTreeSQL tree(const char *db,const char* uid,…)
    tree.Print(), Browse, Scan, etc
    tree.Draw("var1:var2","varx <0")
  - TTree query style converted to SQL
• Create the data base on server
    TTreeSQL tree("mysql://localhost/test","nobody","new");
    Event *event = new Event;
    tree.Branch("top","Event",&event);
    tree.Fill();
    tree.AutoSave();
  - A TSQLRow is filled and sent to the server
  - Columns created using the normal split algorithm; blobs created below split
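The remark that a TTree-style query is converted to SQL can be illustrated with a deliberately naive translation. The sketch below assumes that each variable in the Draw expression corresponds to one column and that the selection string is already valid SQL; `drawToSql` and the table name used in the usage note are hypothetical, and the real TTreeSQL translation is not shown in the slides.

```cpp
#include <string>

// Hypothetical translation of a TTree::Draw-style query into a SELECT statement.
// varexp uses ':' between variables, as in "var1:var2".
inline std::string drawToSql(const std::string& table,
                             const std::string& varexp,
                             const std::string& selection) {
    std::string columns;
    for (char c : varexp) {
        if (c == ':') {
            columns += ", ";       // ':' separates variables, ',' separates columns
        } else {
            columns += c;
        }
    }
    std::string sql = "SELECT " + columns + " FROM " + table;
    if (!selection.empty()) {
        sql += " WHERE " + selection;
    }
    return sql;
}
```

Calling `drawToSql("events", "var1:var2", "varx < 0")` yields `SELECT var1, var2 FROM events WHERE varx < 0`. The character-by-character replacement would mishandle expressions containing `::`, such as a call to `TMath::Prob`, so a real translator would need a proper expression parser.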

24 Future Plans for I/O and TTree
• Implement member-wise storing for std::vector (late 2004)
• Implement TVirtualCollectionProxy for each of the STL containers (late 2004, early 2005)
• Add support for auto loading of TRef branches across trees
• TChain, TTree Friends and Indexing
  - Add support for "befriending" TChain objects using an Indexed relation
• TTree Queries
  - Allow following (transparently) TRef and TRefArray

25 Summary
• TFile improvements
  - Large files and trees, Double32_t, XML output format
  - Support for non-instrumented classes
• Enhancements in I/O and Tree Query for collections
  - Split collections
  - Fast histogramming of (potentially) any collection
  - Lift restrictions on STL I/O
    - Nested containers
    - Reading without compiled code
• TTree
  - Remove stringent requirements on CloneTree
  - Add support for auto loading of referenced objects
  - Support for RDBMS database back-end coming soon
• TTree Queries
  - Can call any function taking numerical arguments
  - Can use arbitrary C++ and still use the branch names as variables
  - TTree Friend linked by Index

