Presentation transcript: "High Energy Physics and Data Grids", Paul Avery, University of Florida

Slide 1: High Energy Physics and Data Grids
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  avery@phys.ufl.edu
US/UK Grid Workshop, San Francisco, August 4-5, 2001

Slide 2: Essentials of High Energy Physics
- Better name: "Elementary Particle Physics"
  - Science: elementary particles, fundamental forces
- Goal: unified theory of nature
  - Unification of forces (Higgs, superstrings, extra dimensions, ...)
  - Deep connections to large-scale structure of the universe
  - Large overlap with astrophysics, cosmology, nuclear physics
- Particles: quarks (u d, c s, t b) and leptons (e, μ, τ and their neutrinos)
- Forces: strong (gluon), electroweak (γ, W±, Z0), gravity (graviton)

Slide 3: HEP Short History + Frontiers
(timeline of distance scale, energy, time after the Big Bang, and milestones)
- 10^-10 m, ~10 eV, >300,000 yr: 1900s, Quantum Mechanics, atomic physics
- 10^-15 m, MeV-GeV, ~3 min: 1940-50 Quantum Electrodynamics; 1950-65 nuclei, hadrons, symmetries, field theories
- 10^-16 m, >>GeV, ~10^-6 sec: 1965-75 quarks, gauge theories
- 10^-18 m, ~100 GeV, ~10^-10 sec: 1970-83 SPS, electroweak unification, QCD; 1990 LEP, 3 families, precision electroweak; 1994 Tevatron, top quark
- 10^-19 m, ~10^2 GeV, ~10^-12 sec: 2007 LHC, Higgs? Supersymmetry? Origin of masses (the next step)
- 10^-32 m, ~10^16 GeV, ~10^-32 sec: proton decay? (underground experiments), Grand Unified Theories?
- 10^-35 m, ~10^19 GeV (Planck scale), ~10^-43 sec: quantum gravity? superstrings? the origin of the universe

Slide 4: HEP Research
- Experiments primarily accelerator based
  - Fixed target, colliding beams, special beams
- Detectors
  - Small, large, general purpose, special purpose
- ... but wide variety of other techniques
  - Cosmic rays, proton decay, g-2, neutrinos, space missions
- Increasing scale of experiments and laboratories
  - Forced on us by ever higher energies
  - Complexity, scale, costs -> large collaborations
  - International collaborations are the norm today
  - Global collaborations are the future (LHC)
(LHC discussed in the next few slides)

Slide 5: The CMS Collaboration
- 1809 physicists and engineers, 31 countries, 144 institutions
- Number of scientists: Member States 1010, Non-Member States 448, USA 351, total 1809
- Number of laboratories: Member States 58, Non-Member States 36, USA 50, total 144
- Associated institutes: 36 scientists, 5 laboratories
- Participants: Armenia, Austria, Belarus, Belgium, Bulgaria, China, China (Taiwan), Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy, Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK, Ukraine, USA, Uzbekistan, plus CERN

Slide 6: CERN LHC site
(figure: the LHC ring with the CMS, ATLAS, LHCb, and ALICE experiments)

Slide 7: High Energy Physics at the LHC
(figure: the "Compact" Muon Solenoid at the LHC (CERN), shown next to a Smithsonian standard man for scale)

Slide 8: Collisions at LHC (2007-?)
- Particles: proton-proton
- Bunches/beam: 2835
- Protons/bunch: 10^11
- Beam energy: 7 TeV (7x10^12 eV)
- Luminosity: 10^34 cm^-2 s^-1
- Crossing rate: 40 MHz (every 25 nsec)
- Collision rate: ~10^9 Hz (average ~20 collisions per crossing)
- New physics rate: ~10^-5 Hz
- Selection: 1 in 10^13
(figure: a parton (quark, gluon) interaction inside a proton-proton bunch crossing, producing leptons and jets)
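
These rates follow from simple arithmetic on the beam parameters above; the sketch below reproduces them in Python (variable names are ours, not from any LHC software, and the result is only an order-of-magnitude check).

```python
# Back-of-the-envelope check of the LHC rates quoted on this slide.
# All input values are taken from the slide; names are illustrative only.

crossing_rate_hz = 40e6            # 40 MHz: bunches cross every 25 ns
collisions_per_crossing = 20       # average pile-up per crossing

collision_rate_hz = crossing_rate_hz * collisions_per_crossing
print(f"collision rate ~ {collision_rate_hz:.0e} Hz")      # ~1e9 Hz, as on the slide

# With a selection of ~1 event in 10^13, interesting events are extremely rare.
selectivity = 1e-13
selected_rate_hz = collision_rate_hz * selectivity
print(f"selected rate  ~ {selected_rate_hz:.0e} Hz")
# Order 1e-4 to 1e-5 Hz; the slide quotes ~1e-5 Hz, and the exact figure
# depends on which processes count as "new physics".
```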

Slide 9: HEP Data
- Scattering is the principal technique for gathering data
  - Collisions of beam-beam or beam-target particles
  - Typically caused by a single elementary interaction
  - But also background collisions -> obscures physics
- Each collision generates many particles: an "event"
  - Particles traverse the detector, leaving an electronic signature
  - Information collected, put into mass storage (tape)
  - Each event is independent -> trivial computational parallelism
- Data-intensive science
  - Size of raw event record: 20 KB - 1 MB
  - 10^6 - 10^9 events per year
  - 0.3 PB per year (2001): BaBar (SLAC)
  - 1 PB per year (2005): CDF, D0 (Fermilab)
  - 5 PB per year (2007): ATLAS, CMS (LHC)
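
The petabyte-per-year figures are essentially event size times event count; a quick check with the slide's numbers (illustrative only, and counting raw data before any processed or simulated copies):

```python
# Annual raw-data volume = event size x events per year.

def annual_volume_pb(event_size_bytes, events_per_year):
    """Return raw data volume in petabytes per year (1 PB = 1e15 bytes)."""
    return event_size_bytes * events_per_year / 1e15

print(annual_volume_pb(1e6, 1e9))    # ~1 MB events, 1e9 events/year -> 1.0 PB/year
print(annual_volume_pb(20e3, 1e9))   # ~20 KB events, 1e9 events/year -> 0.02 PB/year
# The 5 PB/year quoted for ATLAS/CMS presumably also counts reconstructed
# and simulated data on top of the raw stream.
```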

Slide 10: Data Rates: From Detector to Storage
- Detector output: 40 MHz, ~1000 TB/sec
- Level 1 trigger (special hardware): 75 kHz, 75 GB/sec
- Level 2 trigger (commodity CPUs): 5 kHz, 5 GB/sec
- Level 3 trigger (commodity CPUs, physics filtering): 100 Hz, 100 MB/sec of raw data to storage
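
The trigger chain is a cascade of rate reductions; the short sketch below just tabulates the reduction factor at each stage from the numbers above (illustrative only).

```python
# Rate reduction through the trigger cascade (numbers from the slide).
stages = [
    ("Detector output",            40e6, "~1000 TB/s"),
    ("After Level 1 (hardware)",   75e3, "75 GB/s"),
    ("After Level 2 (CPUs)",        5e3, "5 GB/s"),
    ("After Level 3 (filtering)",   100, "100 MB/s to storage"),
]

for (name, rate, _), (next_name, next_rate, next_bw) in zip(stages, stages[1:]):
    print(f"{name} -> {next_name}: ~{rate / next_rate:,.0f}x rate reduction ({next_bw})")

print(f"Overall: ~{stages[0][1] / stages[-1][1]:,.0f}x, from 40 MHz down to 100 Hz")
```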

Slide 11: LHC Data Complexity
- "Events" resulting from beam-beam collisions:
  - The signal event is obscured by ~20 overlapping uninteresting collisions in the same crossing
  - CPU time does not scale from previous generations
(figure: simulated event displays comparing 2000 and 2007 conditions)

Slide 12: Example: Higgs Decay into 4 Muons
- 40M events/sec, selectivity: 1 in 10^13

Slide 13: LHC Computing Challenges
- Complexity of the LHC environment and resulting data
- Scale: petabytes of data per year (100 PB by ~2010); millions of SpecInt95s of CPU
- Geographical distribution of people and resources (1800 physicists, 150 institutes, 32 countries)

Slide 14: Transatlantic Net WG (HN, L. Price): Tier0 - Tier1 BW Requirements [*]
[*] Installed BW in Mbps; maximum link occupancy 50%; work in progress
(the year-by-year bandwidth table itself is not preserved in this transcript)

Slide 15: Hoffmann LHC Computing Report 2001: Tier0 - Tier1 Link Requirements
(1) Tier1 -> Tier0 data flow for analysis: 0.5 - 1.0 Gbps
(2) Tier2 -> Tier0 data flow for analysis: 0.2 - 0.5 Gbps
(3) Interactive collaborative sessions (30 peak): 0.1 - 0.3 Gbps
(4) Remote interactive sessions (30 flows peak): 0.1 - 0.2 Gbps
(5) Individual (Tier3 or Tier4) data transfers (limit to 10 flows of 5 MBytes/sec each): 0.8 Gbps
TOTAL per Tier0 - Tier1 link: 1.7 - 2.8 Gbps
- Corresponds to ~10 Gbps baseline BW installed on the US-CERN link
- Adopted by the LHC experiments (Steering Committee Report)
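
The per-link total is just the sum of the five contributions; a quick check of the table's arithmetic (values copied from the rows above):

```python
# Sum of the per-link bandwidth contributions, (low, high) in Gbps.
contributions = {
    "(1) Tier1 -> Tier0 analysis flow":       (0.5, 1.0),
    "(2) Tier2 -> Tier0 analysis flow":       (0.2, 0.5),
    "(3) interactive collaborative sessions": (0.1, 0.3),
    "(4) remote interactive sessions":        (0.1, 0.2),
    "(5) individual Tier3/Tier4 transfers":   (0.8, 0.8),
}

low = sum(lo for lo, hi in contributions.values())
high = sum(hi for lo, hi in contributions.values())
print(f"Total per Tier0-Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8 Gbps
```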

Slide 16: LHC Computing Challenges
- Major challenges associated with:
  - Scale of computing systems
  - Network-distribution of computing and data resources
  - Communication and collaboration at a distance
  - Remote software development and physics analysis
- Result of these considerations: Data Grids

Slide 17: Global LHC Data Grid Hierarchy
- Tier0: CERN
- Tier1: National Lab
- Tier2: Regional Center (university, etc.)
- Tier3: University workgroup
- Tier4: Workstation
Key ideas:
- Hierarchical structure
- Tier2 centers
- Operate as a unified Grid
(figure: tree diagram with Tier 0 (CERN) at the root, fanning out to Tier1, Tier2, Tier3, and Tier4 nodes)

Slide 18: Example: CMS Data Grid
- Experiment -> Online System at ~PBytes/sec (one bunch crossing per 25 nsec; 100 triggers per second; each event ~1 MByte in size)
- Tier 0 +1: Online System -> CERN Computer Center (> 20 TIPS) at ~100 MBytes/sec
- Tier 1: USA, France, Italy, and UK centers, linked at 2.5 Gbits/sec
- Tier 2: Tier2 centers, linked at ~622 Mbits/sec
- Tier 3: institutes (~0.25 TIPS) with physics data caches, linked at 100 - 1000 Mbits/sec
- Tier 4: workstations, other portals
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
- CERN/Outside resource ratio ~1:2; Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:1:1
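
One way to read the figure is as a nested structure of tiers and link bandwidths; the sketch below is purely illustrative and is not a real CMS configuration format.

```python
# Illustrative sketch of the CMS data grid tiers and link bandwidths from the
# figure, written as a nested Python structure. Not an actual CMS config.
cms_grid = {
    "Tier 0+1 (CERN Computer Center, > 20 TIPS)": {
        "fed by": "online system at ~100 MB/s (detector -> online at ~PB/s)",
        "Tier 1 (USA, France, Italy, UK centers)": {
            "link to Tier 0": "2.5 Gbps",
            "Tier 2 (regional centers)": {
                "link to Tier 1": "~622 Mbps",
                "Tier 3 (institutes, ~0.25 TIPS)": {
                    "link to Tier 2": "100 - 1000 Mbps",
                    "Tier 4": "workstations and other portals",
                },
            },
        },
    },
}

def show(node, depth=0):
    """Print one tier (or attribute) per line, indented by depth."""
    for key, value in node.items():
        if isinstance(value, dict):
            print("  " * depth + key)
            show(value, depth + 1)
        else:
            print("  " * depth + f"{key}: {value}")

show(cms_grid)
```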

Slide 19: Tier1 and Tier2 Centers
- Tier1 centers
  - National laboratory scale: large CPU, disk, tape resources
  - High speed networks
  - Many personnel with broad expertise
  - Central resource for a large region
- Tier2 centers
  - New concept in the LHC distributed computing hierarchy
  - Size ~ [national lab x university]^(1/2)
  - Based at a large university or small laboratory
  - Emphasis on small staff, simple configuration and operation
- Tier2 role
  - Simulations, analysis, data caching
  - Serve a small country, or a region within a large country

Slide 20: LHC Tier2 Center (2001)
(figure: Tier2 hardware schematic: WAN router, Fast Ethernet and Gigabit Ethernet switches, data server, >1 RAID, tape, hi-speed channel)

Slide 21: Hardware Cost Estimates
- Buy late, but not too late: phased implementation
  - R&D phase: 2001-2004
  - Implementation phase: 2004-2007
  - R&D to develop capabilities and the computing model itself
  - Prototyping at increasing scales of capability and complexity
(figure annotations: 1.1 years, 1.2 years, 1.4 years, 2.1 years)

Slide 22: HEP Related Data Grid Projects
- Funded projects
  - GriPhyN (USA): NSF, $11.9M + $1.6M
  - PPDG I (USA): DOE, $2M
  - PPDG II (USA): DOE, $9.5M
  - EU DataGrid (EU): $9.3M
- Proposed projects
  - iVDGL (USA): NSF, $15M + $1.8M + UK
  - DTF (USA): NSF, $45M + $4M/yr
  - DataTag (EU): EC, $2M?
  - GridPP (UK): PPARC, > $15M
- Other national projects
  - UK e-Science (> $100M for 2001-2004)
  - Italy, France, (Japan?)

Slide 23: (HEP Related) Data Grid Timeline
(timeline spanning Q2 2000 - Q3 2001, with milestones including:)
- Submit GriPhyN proposal, $12.5M; GriPhyN approved, $11.9M + $1.6M
- Outline of US-CMS Tier plan; Caltech-UCSD install proto-T2
- EU DataGrid approved, $9.3M
- Submit PPDG proposal, $12M; PPDG approved, $9.5M
- Submit DataTAG proposal, $2M; DataTAG approved
- Submit iVDGL preproposal; submit iVDGL proposal, $15M; iVDGL approved?
- Submit DTF proposal, $45M; DTF approved?
- 1st and 2nd Grid coordination meetings

Slide 24: Coordination Among Grid Projects
- Particle Physics Data Grid (US, DOE)
  - Data Grid applications for HENP
  - Funded 1999, 2000 ($2M); funded 2001-2004 ($9.4M)
  - http://www.ppdg.net/
- GriPhyN (US, NSF)
  - Petascale Virtual-Data Grids
  - Funded 9/2000 - 9/2005 ($11.9M + $1.6M)
  - http://www.griphyn.org/
- European Data Grid (EU)
  - Data Grid technologies, EU deployment
  - Funded 1/2001 - 1/2004 ($9.3M)
  - http://www.eu-datagrid.org/
- In common: HEP; focus on infrastructure development and deployment; international scope
- Now developing a joint coordination framework (GridPP, DTF, iVDGL to join very soon?)

Slide 25: Data Grid Management

Slide 26: PPDG
(figure: PPDG links the data management efforts of its experiments (BaBar, D0, CDF, CMS, Atlas, Nuclear Physics) with middleware teams and their user communities: Globus, Condor, SRB, and HENP GC)

Slide 27: EU DataGrid Project
(slide content not preserved in this transcript)

Slide 28: PPDG and GriPhyN Projects
- PPDG: focus on today's (evolving) problems in HENP
  - Current HEP: BaBar, CDF, D0
  - Current NP: RHIC, JLAB
  - Future HEP: ATLAS, CMS
- GriPhyN: focus on tomorrow's solutions
  - ATLAS, CMS, LIGO, SDSS
  - Virtual data, "petascale" problems (petaflops, petabytes)
  - Toolkit, export to other disciplines, outreach/education
- Both emphasize
  - Application science drivers
  - CS/application partnership (reflected in funding)
  - Performance
- Explicitly complementary

Slide 29: PPDG Multi-site Cached File Access System
- Primary site: data acquisition, tape, CPU, disk, robot
- Satellite sites: tape, CPU, disk, robot
- Universities: CPU, disk, users
- Services: resource discovery, matchmaking, co-scheduling/queueing, tracking/monitoring, problem trapping and resolution
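
These services were to come from middleware such as Globus, Condor, and SRB; the toy sketch below only illustrates the matchmaking idea of steering a request to a site that already caches the file, and every site and file name in it is made up.

```python
# Toy illustration of matchmaking in a multi-site cached file access system:
# prefer a site that already caches the requested file and has spare CPU.
# All site data and names here are hypothetical.

sites = [
    {"name": "primary",    "cached_files": {"run42.raw", "run43.raw"}, "free_cpus": 2},
    {"name": "satellite1", "cached_files": {"run42.raw"},              "free_cpus": 40},
    {"name": "university", "cached_files": set(),                      "free_cpus": 10},
]

def match_site(filename):
    """Pick the cache-hit site with the most free CPUs, if any."""
    candidates = [s for s in sites if filename in s["cached_files"]]
    if candidates:
        return max(candidates, key=lambda s: s["free_cpus"])
    return None   # no cache hit: a real system would stage the file or run remotely

print(match_site("run42.raw")["name"])   # -> "satellite1"
print(match_site("run99.raw"))           # -> None (would trigger a transfer)
```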

Slide 30: GriPhyN: PetaScale Virtual-Data Grids (~1 Petaflop, ~100 Petabytes)
(architecture figure)
- Users: production teams, individual investigators, workgroups, via interactive user tools
- Tool layer: virtual data tools; request planning and scheduling tools; request execution and management tools
- Grid services: resource management services; security and policy services; other Grid services
- Resource layer: transforms, raw data sources, distributed resources (code, storage, CPUs, networks)

Slide 31: Virtual Data in Action
- A data request may:
  - Compute locally or compute remotely
  - Access local data or access remote data
- Scheduling is based on:
  - Local policies
  - Global policies
  - Cost
(figure: an item request flowing across local facilities and caches, regional facilities and caches, and major facilities and archives)
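
A minimal sketch of such a cost-based decision, with an entirely made-up cost model: either move the data to the job or send the job to the data, whichever is cheaper.

```python
# Hypothetical cost comparison behind "compute locally vs. remotely,
# access local vs. remote data". Costs and numbers are illustrative only.

def transfer_time_s(data_gb, link_gbps):
    """Seconds needed to move data_gb gigabytes over a link of link_gbps."""
    return data_gb * 8 / link_gbps

def plan_request(data_gb, local_cpu_hours, remote_cpu_hours, link_gbps):
    """Return the cheaper of 'move data here' vs 'send job to the data'."""
    move_data_cost = transfer_time_s(data_gb, link_gbps) / 3600 + local_cpu_hours
    move_job_cost = remote_cpu_hours   # assume the job description itself is tiny
    return "compute locally" if move_data_cost < move_job_cost else "compute remotely"

# 500 GB of input over a 0.622 Gbps link, similar CPU cost at either end:
print(plan_request(500, local_cpu_hours=2.0, remote_cpu_hours=3.0, link_gbps=0.622))
# -> "compute remotely": moving the data would take ~1.8 hours on top of local CPU
```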

Slide 32: GriPhyN Goals for Virtual Data
Explore the concept of virtual data and its applicability to data-intensive science:
- Transparency with respect to location
  - Caching, catalogs, in a large-scale, high-performance Data Grid
- Transparency with respect to materialization
  - Exact specification of algorithm components
  - Traceability of any data product
  - Cost of storage vs. CPU vs. networks
- Automated management of computation
  - Issues of scale, complexity, transparency
  - Complications: calibrations, data versions, software versions, ...
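
The materialization idea can be illustrated with a toy catalog: if a derived product already exists it is returned directly, otherwise its recorded transformation is replayed. All names below are hypothetical and the structure is only a sketch of the concept.

```python
# Toy sketch of virtual-data materialization: fetch a derived product if it
# exists, otherwise re-run the recorded transformation that produces it.
# Catalog contents and names are hypothetical.

materialized = {"raw_events": "raw detector data"}   # stands in for storage
recipes = {                                           # product -> (transformation, input)
    "higgs_4mu_histogram": (lambda x: f"histogram({x})", "reconstructed_events"),
    "reconstructed_events": (lambda x: f"reco({x})", "raw_events"),
}

def materialize(product):
    """Return the product, deriving (and caching) it recursively if needed."""
    if product in materialized:                # transparency: already on disk somewhere
        return materialized[product]
    transform, source = recipes[product]       # otherwise, replay its recorded recipe
    result = transform(materialize(source))
    materialized[product] = result             # cache for later requests
    return result

print(materialize("higgs_4mu_histogram"))
# -> histogram(reco(raw detector data))
```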

Slide 33: Data Grid Reference Architecture

