Data Logistics in Particle Physics: Ready or Not, Here it Comes… Prof. Paul Sheldon, Vanderbilt University


1 Data Logistics in Particle Physics: Ready or Not, Here it Comes…
Prof. Paul Sheldon, Vanderbilt University

2 Outline
How Strange is the Universe? 5 Modern Mysteries.
In trying to resolve these mysteries, particle physicists face a significant data logistics problem.
The solution should be flexible enough to encourage the creative approaches that will maximize productivity.
REDDnet breaks the “data-tethered” compute model and allows unfettered access without strong central control.

3 Is the Universe Even Stranger Than We Have Imagined?
One piece of evidence: rotational velocities of stars in galaxies.
Pick a star: how fast is it moving around the galactic center?
The mass of the galaxy is much, much larger than you get by counting the stars in the galaxy.
1st Year Physics!
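The "1st Year Physics" step on slide 3 can be written out as a short worked equation. This is a sketch under simplifying assumptions that the slide does not state: a circular orbit of radius $r$ and a roughly spherical mass distribution, so that only the mass $M(r)$ enclosed by the orbit pulls on a star of mass $m$ moving at speed $v$.

```latex
% Gravity supplies the centripetal force on the star:
\frac{G\,M(r)\,m}{r^{2}} = \frac{m\,v^{2}}{r}
\quad\Longrightarrow\quad
M(r) = \frac{v^{2}\,r}{G}
```

Measuring $v$ and $r$ for a star therefore gives the mass inside its orbit, which can be compared with the mass obtained by counting stars. If $v$ stays roughly constant out to large $r$ (an assumption here; the slide gives no numbers), $M(r)$ keeps growing in proportion to $r$ even where few stars are visible.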

4 We Don’t Know What The Majority of Matter in the Universe Is.
This “extra” matter is 90% of the Universe!
Conventional explanations have mostly been ruled out:
– Planets, dust, …
Most of the matter in the Universe is probably an exotic form of matter, heretofore unknown!
But there is a good chance particle physicists will make some soon at the LHC at CERN!
Chart labels: ~10% normal matter, 90% “other” matter

5 5 Mysteries for a New Millennium
– What is the majority of matter in the universe made of?
– Does space have more than three dimensions?
– Where is all the anti-matter created by the Big Bang?
– What is this bizarre thing called “Dark Energy”?
– Why do things have mass?

6 Answering These Questions Presents Many Challenges…
Experiments require significant infrastructure and large collaborations.
2500 Physicists!
CERN Large Hadron Collider: 2007 start; 27 km tunnel in Switzerland & France (100 m below ground)
CMS

7 Petascale Computing Required
2008: ~50,000 8 GHz P4s
CMS will generate Petabytes of data per year and require Petaflops of CPU…
But physics is done in small groups, geographically distributed.

8 Distributed Resources, People
Why Distributed Resources?
– Sociology
– Politics
– Funding
To maximize the quality and rate of scientific discovery, all physicists must have equal ability to access and analyze the experiment's data…
CMS Collaboration: >37 Countries, >163 Institutes

9 LHC Data Grid Hierarchy
[Diagram: Experiment → Online System → Tier 0+1 (CERN Center: PBs of disk; tape robot) → Tier 1 (FNAL, IN2P3, INFN, RAL) → Tier 2 centers (Caltech, UERJ) → Tier 3 (institutes, physics data cache; Vanderbilt) → Tier 4 (workstations/laptops). Link-rate labels: ~PByte/sec, ~150-1500 MB/s, 10-40+ Gbps, 10 Gbps, 1-10 Gbps, 1 to 10 Gbps.]
>10 Tier1 and ~100 Tier2 Centers
The small analysis groups doing the physics work at the Tier 3/4 level.

10 Data Logistics Yin and Yang
Uncertainty reigns at the most important level: where the physics will get done.
Physicists will evolve novel use cases that will not jibe with expectations or any plans/rules/edicts.

         High Level Control     Infrastructure Ready?   Tested   Use Cases
Tier 0   Strong, Centralized    Most                    Much     Understood
Tier 4   Anarchy                Little/None             None     ?????

11 Use Cases: What we Do Know
Physicists will:
– need access to 10-100 TB data sets for short-term periods.
– run over this data many times, refining and improving their analysis.
– use local computing resources where they may not have much storage available.
– make “opportunistic use” of compute resources at Tier 3 sites and Grid sites.
– perform “production runs” at Tier 2 sites.
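For a rough sense of scale, the sketch below estimates how long one full copy of a 10-100 TB data set would take over the 1-10 Gbps links quoted in the grid hierarchy. It assumes decimal terabytes and, by default, a perfectly utilized link; the function name `transfer_time_hours` and the `efficiency` parameter are illustrative and not part of any CMS or REDDnet tool.

```python
def transfer_time_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_tb terabytes over a link_gbps link.

    Assumes 1 TB = 1e12 bytes and 1 Gbps = 1e9 bits/s. `efficiency` is the
    fraction of the nominal link rate actually achieved (1.0 = ideal).
    """
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Data set sizes and link speeds taken from the ranges quoted on the slides.
for size_tb in (10, 100):
    for gbps in (1, 10):
        hours = transfer_time_hours(size_tb, gbps)
        print(f"{size_tb:>3} TB over {gbps:>2} Gbps: {hours:6.1f} h")
```

At full line rate, 100 TB takes about 222 hours (over nine days) at 1 Gbps and about 22 hours at 10 Gbps, which suggests why repeatedly re-copying working data sets to wherever a job happens to run is a logistics problem in its own right.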

12 REDDnet at Tier 3
Opportunistic computing vs. data-tethered computing
– CMS has no formal solution for Tier 3 storage
– Compute on resources, even those where the data is not hosted
On-demand working storage
– Improve data logistics
– Acts local: familiar user tools
Demonstrate at a Tier 3
– Performance
– Reliability
– … and convenience
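The contrast between data-tethered and opportunistic computing on slide 12 can be illustrated with a toy site-selection sketch. Everything here is hypothetical: the `Site` class, the site names, and the dataset name are invented for illustration and do not reflect actual CMS or REDDnet software. The only idea taken from the slide is that, with on-demand working storage, a job is no longer restricted to sites that host its data.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical compute site: free CPU slots plus locally hosted datasets."""
    name: str
    free_cpus: int
    datasets: set = field(default_factory=set)


def tethered_sites(sites, dataset):
    """Data-tethered model: a job may run only where its dataset is hosted."""
    return [s.name for s in sites if dataset in s.datasets and s.free_cpus > 0]


def opportunistic_sites(sites):
    """Opportunistic model: any site with free CPUs qualifies, because the
    data is read from working storage that is not tied to the compute site."""
    return [s.name for s in sites if s.free_cpus > 0]


# Invented example: the only site hosting the data is fully busy.
sites = [
    Site("tier2_a", free_cpus=0, datasets={"dataset_x"}),
    Site("tier3_b", free_cpus=40),
    Site("grid_c", free_cpus=200),
]

print(tethered_sites(sites, "dataset_x"))
print(opportunistic_sites(sites))
```

In this made-up situation the tethered model finds no eligible site, while the opportunistic model can use the idle CPUs at the other two.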

13 REDDnet SC06 Depots
Near Term Plan of Work
Provide T3 scratch space
Host/mirror popular datasets on REDDnet
Participate in Data and Service Challenges
– Summer 07 Challenge starting soon
– Network and data transfer load tests
Integrate with existing CMS tools
Develop a Tier 3 analysis environment
– Initial small test community
– Test with individual analyses
– Run on the Grid

