ROOT for Data Analysis. Intel discussion meeting, CERN, 5 Oct 2003. René Brun, CERN. Distributed Data Analysis.


1 ROOT for Data Analysis. Intel discussion meeting, CERN, 5 Oct 2003. René Brun, CERN. Distributed Data Analysis.

2 René Brun, 5 Oct 03, Intel: Distributed Data Analysis

3 ROOT Trends: histogram and ntuple viewers; data presenters; efficient access to large and structured event collections; interaction with user and experiment classes; parallelism on the GRID; batch/interactive; access to catalogs; resource brokers; process migration; progress monitors; proxies/caches; virtual data sets; PAW or ROOT like PAW.

4 Memory Tree. Each node is a branch in the Tree. [Figure: Tree T with entries 0 to 18; labels: T.Fill(), T.GetEntry(6), Memory.]
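The relation between memory and the Tree on this slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ROOT's TTree: the names `ToyTree`, `ToyBranch` and `AddBranch` are hypothetical, every branch holds plain `double` values, and all branches are assumed to be attached before the first fill. `Fill()` copies the current value of each attached variable into its branch, and `GetEntry(i)` copies entry `i` of each branch back into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: this is NOT ROOT's TTree, just the idea behind the slide.
// Each branch stores the values of one variable for all entries.
struct ToyBranch {
    std::string name;
    double* address;             // where the user keeps the current value in memory
    std::vector<double> buffer;  // one stored value per entry
};

class ToyTree {
public:
    // Attach a user variable to a new branch (hypothetical helper).
    // Assumption: all branches are attached before the first Fill().
    void AddBranch(const std::string& name, double* address) {
        assert(address != nullptr);
        branches_.push_back(ToyBranch{name, address, {}});
    }

    // Memory -> tree: append the current value of every attached variable.
    void Fill() {
        for (ToyBranch& b : branches_) b.buffer.push_back(*b.address);
        ++entries_;
    }

    // Tree -> memory: copy entry i of every branch into the attached variables.
    void GetEntry(std::size_t i) {
        assert(i < entries_);
        for (ToyBranch& b : branches_) *b.address = b.buffer[i];
    }

    std::size_t GetEntries() const { return entries_; }

private:
    std::vector<ToyBranch> branches_;
    std::size_t entries_ = 0;
};
```

Filling 19 entries (0 to 18, as in the figure) and then calling `GetEntry(6)` restores the values that were in memory when entry 6 was filled.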

5 [Screenshot labels: 8 branches of T; 8 leaves of branch Electrons; a double-click histograms the leaf.]

6 The Tree Viewer & Analyzer. A very powerful class supporting complex cuts, event lists, 1-d, 2-d and 3-d views, and parallelism.
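To make "cuts" and "event lists" concrete, here is a minimal sketch that does not use ROOT. The `Event` layout and the helpers `MakeEventList` and `Histogram1D` are hypothetical: a cut is a predicate on an event, an event list is the set of entry numbers that pass it, and a 1-d view is a histogram filled only from those entries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical event layout, chosen only for this illustration.
struct Event { double x; double k; double c; };

// An event list: the entry numbers of the events that pass a cut.
std::vector<std::size_t> MakeEventList(const std::vector<Event>& events,
                                       const std::function<bool(const Event&)>& cut) {
    std::vector<std::size_t> list;
    for (std::size_t i = 0; i < events.size(); ++i)
        if (cut(events[i])) list.push_back(i);
    return list;
}

// Fill a fixed-range 1-d histogram of x using only the entries in the event list.
std::vector<int> Histogram1D(const std::vector<Event>& events,
                             const std::vector<std::size_t>& list,
                             int nbins, double xmin, double xmax) {
    assert(nbins > 0 && xmax > xmin);
    std::vector<int> bins(nbins, 0);
    for (std::size_t i : list) {
        const double x = events[i].x;
        if (x < xmin || x >= xmax) continue;  // under/overflow is ignored in this sketch
        int bin = static_cast<int>((x - xmin) / (xmax - xmin) * nbins);
        if (bin >= nbins) bin = nbins - 1;    // guard against floating-point rounding
        ++bins[bin];
    }
    return bins;
}
```

Because the event list stores entry numbers rather than copies of the events, it can be built once and reused for several views of the same selection.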

7 Tree Friends. [Figure: three trees, each with entries 0 to 18, labelled "Public read", "Public read" and "User Write"; Entry # 8 is marked.]

8 Tree Friends
Root > TFile f1("tree1.root");
Root > tree.AddFriend("tree2","tree2.root");
Root > tree.AddFriend("tree3","tree3.root");
Root > tree.Draw("x:a","k<c");
Root > tree.Draw("x:tree2.x","sqrt(p)<b");
Processing time is independent of the number of friends, unlike table joins in an RDBMS. Access levels: collaboration-wide (public read), analysis group (protected), user (private).
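One way to see why friends avoid a join is that friend trees share the entry number, so reading a friend's variable for entry i is a direct lookup at index i and no keys are matched across rows. The sketch below is a toy model and not ROOT's implementation. The struct `ToyTree` and its `Value` helper are hypothetical, every column holds `double` values, and, purely for illustration, the variable `a` is assumed to live in `tree2`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only (not ROOT's implementation): a tree is a set of named columns.
struct ToyTree {
    std::map<std::string, std::vector<double>> columns;
    std::vector<std::string> friendNames;
    std::vector<const ToyTree*> friends;  // assumed to have the same number of entries

    void AddFriend(const std::string& name, const ToyTree* t) {
        assert(t != nullptr);
        friendNames.push_back(name);
        friends.push_back(t);
    }

    // Value of "var" or "friend.var" for one entry. The entry number is used
    // directly as the index in the friend: there is no key matching.
    double Value(const std::string& var, std::size_t entry) const {
        const std::size_t dot = var.find('.');
        if (dot != std::string::npos) {  // qualified name such as "tree2.x"
            const std::string owner = var.substr(0, dot);
            for (std::size_t i = 0; i < friends.size(); ++i)
                if (friendNames[i] == owner)
                    return friends[i]->columns.at(var.substr(dot + 1)).at(entry);
            throw std::out_of_range("unknown friend: " + owner);
        }
        auto it = columns.find(var);
        if (it != columns.end()) return it->second.at(entry);  // own columns first
        for (const ToyTree* f : friends) {                     // then friends, in order
            auto fit = f->columns.find(var);
            if (fit != f->columns.end()) return fit->second.at(entry);
        }
        throw std::out_of_range("unknown variable: " + var);
    }
};
```

In this model, an expression such as `x:tree2.x` would be evaluated entry by entry with `Value("x", i)` and `Value("tree2.x", i)`. The friend trees are never modified, which fits the read-only access levels listed on the slide.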

9 Data Volume & Organisation. [Figure: scale of data volume (100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB) with the values 1, 1, 5, 50, 500, 5000, 50000, and regions labelled TTree and TChain.] A TChain is a collection of TTrees and/or TChains. A TFile typically contains 1 TTree. A TChain is typically the result of a query to the file catalogue.
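The idea that a chain presents many trees as one long list of entries can be sketched as follows. This is a toy model, not ROOT's TChain: `ToyChain`, `Add` and `Locate` are hypothetical names, and each file is assumed to contribute one tree with a known number of entries. A global entry number is mapped to a file and a local entry number through cumulative offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only: a chain is an ordered list of files, each holding one tree.
class ToyChain {
public:
    // Register a file and the number of entries of its tree.
    void Add(const std::string& fileName, std::size_t nEntries) {
        files_.push_back(fileName);
        offsets_.push_back(offsets_.back() + nEntries);
    }

    // Total number of entries over all files.
    std::size_t GetEntries() const { return offsets_.back(); }

    // Map a global entry number to (file index, entry number inside that file).
    std::pair<std::size_t, std::size_t> Locate(std::size_t globalEntry) const {
        assert(globalEntry < GetEntries());
        // First cumulative offset strictly greater than globalEntry.
        auto it = std::upper_bound(offsets_.begin(), offsets_.end(), globalEntry);
        const std::size_t file = static_cast<std::size_t>(it - offsets_.begin()) - 1;
        return {file, globalEntry - offsets_[file]};
    }

    const std::string& FileName(std::size_t i) const { return files_.at(i); }

private:
    std::vector<std::string> files_;
    std::vector<std::size_t> offsets_ = {0};  // offsets_[i] = first global entry of file i
};
```

With this layout, the list of files could come from any source, for example the result of a catalogue query, and the lookup cost grows only logarithmically with the number of files.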

10 Data Volume & Processing Time, using technology available in 2003. Time for one query using 10 per cent of the data. [Figure: chart of query time (scale from 1 second to 1 month, with interactive and batch regions) against data volume (100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB) for four configurations: ROOT on 1 processor (P IV 2.4GHz), PROOF on 10 processors, PROOF on 100 processors, PROOF/ALIEN on 1000 processors.]
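The scaling shown in such a chart can be approximated with a back-of-envelope formula: time = volume × fraction read / (per-processor rate × number of processors). The function below is a sketch of that arithmetic only. `QuerySeconds` is a hypothetical name, the per-processor rate is an input the reader must supply (the slide does not state one), and ideal linear scaling with no start-up, network or merging overhead is assumed, so it does not reproduce the slide's figures.

```cpp
#include <cassert>

// Back-of-envelope estimate of the time for one query.
// Assumptions: ideal linear scaling over processors and no fixed overhead.
// All parameters are supplied by the caller; none come from the slide.
double QuerySeconds(double volumeBytes, double fractionRead,
                    double bytesPerSecondPerProcessor, int nProcessors) {
    assert(volumeBytes >= 0.0 && fractionRead >= 0.0 && fractionRead <= 1.0);
    assert(bytesPerSecondPerProcessor > 0.0 && nProcessors > 0);
    return volumeBytes * fractionRead / (bytesPerSecondPerProcessor * nProcessors);
}
```

For example, with a purely hypothetical rate of 10 MB/s per processor, reading 10 per cent of 1 TB takes about 10000 seconds on 1 processor and about 1000 seconds on 10 processors under these assumptions.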

11 Data Volume & Processing Time, using technology available in 2010. Time for one query using 10 per cent of the data. [Figure: chart of query time (scale from 1 second to 10 days, with interactive and batch regions) against data volume (100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB) for four configurations: ROOT on 1 processor (XXXXX), PROOF on 10 processors, PROOF on 100 processors, PROOF/ALIEN on 1000 processors.]

12 Interactive Local Analysis. On a public cluster or the user's laptop. Tools like PAW or its successor are used for visualization and ntuple/tree analysis.

13 GRID: Interactive Analysis, Case 1. Data transfer to the user's laptop. Optional run/file catalog; optional GRID software. [Diagram labels: optional run/file catalog; remote file server (e.g. rootd); trees.] Analysis scripts are interpreted or compiled on the local machine.

14 GRID: Interactive Analysis, Case 2. Remote data processing. Optional run/file catalog; optional GRID software. [Diagram labels: optional run/file catalog; remote data analyzer (e.g. proofd); trees; commands, scripts; histograms.] Analysis scripts are interpreted or compiled on the remote machine.

15 GRID: Interactive Analysis, Case 3. Remote data processing. Run/file catalog; full GRID software. [Diagram labels: run/file catalog; remote data analyzer (e.g. proofd); trees; commands, scripts; histograms, trees; slave.] Analysis scripts are interpreted or compiled on the remote master(s).
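The master/slave pattern in this case can be sketched as: split the entries among workers, let each worker fill a partial histogram from its own share, and merge the partial histograms on the master. The code below is a toy illustration under stated assumptions and not how PROOF is implemented. All names are hypothetical, the split is static and contiguous, and the workers are run one after another in a loop to keep the example self-contained, whereas real slaves would run in parallel on remote machines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Histogram = std::vector<long>;

struct Range { std::size_t begin, end; };  // half-open range of entry numbers

// Split [0, nEntries) into nWorkers contiguous ranges of nearly equal size.
std::vector<Range> SplitEntries(std::size_t nEntries, std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<Range> ranges;
    const std::size_t base = nEntries / nWorkers, extra = nEntries % nWorkers;
    std::size_t start = 0;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}

// Work of one slave: histogram the values in its own range only.
Histogram ProcessRange(const std::vector<double>& x, Range r,
                       int nbins, double xmin, double xmax) {
    assert(nbins > 0 && xmax > xmin && r.end <= x.size());
    Histogram h(nbins, 0);
    for (std::size_t i = r.begin; i < r.end; ++i) {
        if (x[i] < xmin || x[i] >= xmax) continue;  // under/overflow ignored here
        int bin = static_cast<int>((x[i] - xmin) / (xmax - xmin) * nbins);
        if (bin >= nbins) bin = nbins - 1;          // guard against rounding
        ++h[bin];
    }
    return h;
}

// Work of the master: add the partial histograms bin by bin.
Histogram Merge(const std::vector<Histogram>& partials) {
    assert(!partials.empty());
    Histogram total(partials.front().size(), 0);
    for (const Histogram& p : partials)
        for (std::size_t b = 0; b < total.size(); ++b) total[b] += p[b];
    return total;
}

// Whole query: split, process each share, merge.
Histogram RunQuery(const std::vector<double>& x, std::size_t nWorkers,
                   int nbins, double xmin, double xmax) {
    std::vector<Histogram> partials;
    for (Range r : SplitEntries(x.size(), nWorkers))  // sequential stand-in for slaves
        partials.push_back(ProcessRange(x, r, nbins, xmin, xmax));
    return Merge(partials);
}
```

Because histogram filling is additive, the merged result does not depend on how the entries are split, which is why only the small partial histograms need to travel back to the user.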

