Global Data Grids for 21st Century Science
Paul Avery, University of Florida
Outreach Workshop (Mar. 1, 2002)


1 Global Data Grids for 21st Century Science
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  avery@phys.ufl.edu
GriPhyN/iVDGL Outreach Workshop
University of Texas, Brownsville
March 1, 2002

2 What is a Grid?
- Grid: geographically distributed computing resources configured for coordinated use
- Physical resources & networks provide the raw capability
- "Middleware" software ties it together

3 What Are Grids Good For?
- Climate modeling: climate scientists visualize, annotate, & analyze terabytes of simulation data
- Biology: a biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- High energy physics: 3,000 physicists worldwide pool petaflops of CPU resources to analyze petabytes of data
- Engineering: civil engineers collaborate to design, execute, & analyze shake-table experiments; a multidisciplinary aerospace analysis couples code and data in four companies
(From Ian Foster)

4 What Are Grids Good For?
- Application service providers: a home user invokes architectural design functions at an application service provider, which purchases computing cycles from cycle providers
- Commercial: scientists at a multinational toy company design a new product
- Cities, communities: an emergency response team couples real-time data, a weather model, and population data; a community group pools members' PCs to analyze alternative designs for a local road
- Health: hospitals and international agencies collaborate on stemming a major disease outbreak
(From Ian Foster)

5 Proto-Grid: SETI@home
- Community: SETI researchers + enthusiasts
- Arecibo radio data sent to users (250 KB data chunks)
- Over 2M PCs used

6 More Advanced Proto-Grid: Evaluation of AIDS Drugs
- Community: research group (Scripps), 1000s of PC owners, vendor (Entropia)
- Common goal: drug design, advancing AIDS research

7 Why Grids?
- Resources for complex problems are distributed
  - Advanced scientific instruments (accelerators, telescopes, ...)
  - Storage and computing
  - Groups of people
- Communities require access to common services
  - Scientific collaborations (physics, astronomy, biology, engineering, ...)
  - Government agencies
  - Health care organizations, large corporations, ...
- Goal is to build "Virtual Organizations"
  - Make all community resources available to any VO member
  - Leverage strengths at different institutions
  - Add people & resources dynamically

8 Grids: Why Now?
- Moore's law improvements in computing: highly functional end systems
- Burgeoning wired and wireless Internet connections: universal connectivity
- Changing modes of working and problem solving: teamwork, computation
- Network exponentials (next slide)

9 Network Exponentials & Collaboration
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference: an order of magnitude per 5 years
- 1986 to 2000: computers x 500, networks x 340,000
- 2001 to 2010?: computers x 60, networks x 4,000
(Scientific American, Jan. 2001)
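The "order of magnitude per 5 years" gap follows directly from the two doubling times quoted on the slide; a minimal Python sketch makes the arithmetic explicit:

```python
# Multiplicative speedup after `months`, given a doubling time in months.
def growth(doubling_months: float, months: float) -> float:
    return 2.0 ** (months / doubling_months)

months = 5 * 12                     # a 5-year window
computers = growth(18, months)      # computer speed doubles every 18 months: ~10x
networks = growth(9, months)        # network speed doubles every 9 months: ~100x
gap = networks / computers          # networks pull ahead by roughly one order
                                    # of magnitude every 5 years
```

The same formula reproduces the slide's other figures only approximately, since the historical multipliers (x 500, x 340,000) reflect measured trends rather than clean exponentials.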

10 Grid Challenges
- Overall goal: coordinated sharing of resources
- Technical problems to overcome
  - Authentication, authorization, policy, auditing
  - Resource discovery, access, allocation, control
  - Failure detection & recovery
  - Resource brokering
- Additional issue: lack of central control & knowledge
  - Preservation of local site autonomy
  - Policy discovery and negotiation are important

11 Layered Grid Architecture (Analogy to Internet Architecture)
- Fabric: controlling things locally (accessing and controlling resources)
- Connectivity: talking to things (communications, security)
- Resource: sharing single resources (negotiating access, controlling use)
- Collective: managing multiple resources (ubiquitous infrastructure services)
- User: specialized, application-specific distributed services
(Compare the Internet protocol architecture: Link, Internet, Transport, Application. From Ian Foster)

12 Globus Project and Toolkit
- Globus Project™ (Argonne + USC/ISI)
  - O(40) researchers & developers
  - Identify and define core protocols and services
- Globus Toolkit™ 2.0
  - A major product of the Globus Project
  - Reference implementation of core protocols & services
  - Growing open source developer community
- Globus Toolkit used by all Data Grid projects today
  - US: GriPhyN, PPDG, TeraGrid, iVDGL
  - EU: EU DataGrid and national projects
- Recent announcement of applying "web services" to Grids (GT 3.0)
  - Keeps Grids in the commercial mainstream

13 Globus General Approach
- Define Grid protocols & APIs
  - Protocol-mediated access to remote resources
  - Integrate and extend existing standards
- Develop reference implementation
  - Open source Globus Toolkit
  - Client & server SDKs, services, tools, etc.
- Grid-enable a wide variety of tools
  - Globus Toolkit
  - FTP, SSH, Condor, SRB, MPI, ...
- Learn about real-world problems: deployment, testing, applications
(Figure: applications atop diverse global services, core services, and diverse resources)

14 Data Intensive Science: 2000-2015
- Scientific discovery increasingly driven by IT
  - Computationally intensive analyses
  - Massive data collections
  - Data distributed across networks of varying capability
  - Geographically distributed collaboration
- Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2005: ~10 Petabytes
  - 2010: ~100 Petabytes
  - 2015: ~1000 Petabytes?
How to collect, manage, access and interpret this quantity of data? It drives demand for "Data Grids" to handle the additional dimension of data access & movement.
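The projected growth from ~0.5 PB in 2000 to ~1000 PB in 2015 implies, if taken at face value, that the data volume roughly doubles every 16-17 months; a quick sketch of the implied rates:

```python
import math

# Slide's projections: ~0.5 PB in 2000 growing to ~1000 PB in 2015.
start_pb, end_pb, years = 0.5, 1000.0, 15

annual = (end_pb / start_pb) ** (1 / years)            # ~1.66x per year
doubling_months = 12 * math.log(2) / math.log(annual)  # ~16-17 months
```

That doubling time is close to the 18-month Moore's-law pace quoted earlier for computing, which is part of why the slide frames data growth as the dominant factor.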

15 Data Intensive Physical Sciences
- High energy & nuclear physics, including new experiments at CERN's Large Hadron Collider
- Gravity wave searches: LIGO, GEO, VIRGO
- Astronomy: digital sky surveys
  - Sloan Digital Sky Survey, VISTA, other gigapixel arrays
  - "Virtual" observatories (multi-wavelength astronomy)
- Time-dependent 3-D systems (simulation & data)
  - Earth observation, climate modeling
  - Geophysics, earthquake modeling
  - Fluids, aerodynamic design
  - Pollutant dispersal scenarios

16 Data Intensive Biology and Medicine
- Medical data: X-ray and mammography data, etc. (many petabytes); digitized patient records (ditto)
- X-ray crystallography: bright X-ray sources, e.g. the Argonne Advanced Photon Source
- Molecular genomics and related disciplines
  - Human Genome, other genome databases
  - Proteomics (protein structure, activities, ...)
  - Protein interactions, drug delivery
- Brain scans (3-D, time dependent)
- Virtual Population Laboratory (proposed)
  - Database of populations, geography, transportation corridors
  - Simulate likely spread of disease outbreaks
(Craig Venter keynote @ SC2001)

17 Example: High Energy Physics
(Figure: the "Compact" Muon Solenoid at the LHC (CERN), shown next to the Smithsonian standard man for scale)

18 LHC Computing Challenges
- Complexity of the LHC interaction environment & resulting data
- Scale: petabytes of data per year (100 PB by ~2010-12)
- Global distribution of people and resources: 1800 physicists, 150 institutes, 32 countries

19 Global LHC Data Grid
- Tier0: CERN
- Tier1: national lab
- Tier2: regional center (university, etc.)
- Tier3: university workgroup
- Tier4: workstation
Key ideas: hierarchical structure; Tier2 centers

20 Global LHC Data Grid
- Online system: a bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte, giving ~100 MBytes/sec into the Tier 0 center
- Tier 0 + 1 (CERN computer center, > 20 TIPS): physics data cache fed at ~PBytes/sec from the experiment; 2.5 Gbits/sec links outward
- Tier 1 national centers (USA, France, Italy, UK): connected to Tier 2 centers at ~622 Mbits/sec
- Tier 2 regional centers: connected to Tier 3 institutes at 100-1000 Mbits/sec
- Tier 3 institutes (~0.25 TIPS each): workstations and other portals (Tier 4)
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
- CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1
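The quoted trigger rate and event size fix the bandwidth out of the online system, and a quick consistency check shows it fits comfortably on the 2.5 Gbits/sec uplink (here taking MByte as 10^6 bytes):

```python
triggers_per_sec = 100            # events selected by the trigger per second
event_mbytes = 1.0                # ~1 MByte per event

# Sustained stream into Tier 0: 100 x 1 MB = ~100 MBytes/sec, matching the slide.
rate_mbytes = triggers_per_sec * event_mbytes

# The 2.5 Gbits/sec link carries ~312.5 MBytes/sec, so the stream fits with headroom.
uplink_mbytes = 2.5e9 / 8 / 1e6
```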

21 Sloan Digital Sky Survey Data Grid
(Figure)

22 LIGO (Gravity Wave) Data Grid
(Figure: the Hanford and Livingston observatories feeding Caltech and MIT Tier1 centers and LSC Tier2 centers over Internet2/Abilene, with OC3-OC48 links)

23 Data Grid Projects
- Particle Physics Data Grid (US, DOE): Data Grid applications for HENP experiments
- GriPhyN (US, NSF): petascale Virtual-Data Grids
- iVDGL (US, NSF): global Grid laboratory
- TeraGrid (US, NSF): distributed supercomputing resources (13 TFlops)
- European Data Grid (EU, EC): Data Grid technologies, EU deployment
- CrossGrid (EU, EC): Data Grid technologies, EU
- DataTAG (EU, EC): transatlantic network, Grid applications
- Japanese Grid Project (APGrid?) (Japan): Grid deployment throughout Japan
All are collaborations of application scientists & computer scientists, doing infrastructure development & deployment, Globus based.

24 Coordination of U.S. Grid Projects
- Three U.S. projects
  - PPDG: HENP experiments, short-term tools, deployment
  - GriPhyN: Data Grid research, Virtual Data, VDT deliverable
  - iVDGL: global Grid laboratory
- Coordination of PPDG, GriPhyN, iVDGL
  - Common experiments + personnel, management integration
  - iVDGL as "joint" PPDG + GriPhyN laboratory
  - Joint meetings (Jan. 2002, April 2002, Sept. 2002)
  - Joint architecture creation (GriPhyN, PPDG)
  - Adoption of VDT as common core Grid infrastructure
  - Common Outreach effort (GriPhyN + iVDGL)
- New TeraGrid project (Aug. 2001)
  - 13 TFlops across 4 sites, 40 Gb/s networking
  - Goal: integrate into iVDGL, adopt VDT, common Outreach

25 Worldwide Grid Coordination
- Two major clusters of projects
  - "US-based": GriPhyN Virtual Data Toolkit (VDT)
  - "EU-based": different packaging of similar components

26 GriPhyN = App. Science + CS + Grids
- GriPhyN = Grid Physics Network
  - US-CMS: high energy physics
  - US-ATLAS: high energy physics
  - LIGO/LSC: gravity wave research
  - SDSS: Sloan Digital Sky Survey
  - Strong partnership with computer scientists
- Design and implement production-scale grids
  - Develop common infrastructure, tools and services (Globus based)
  - Integration into the 4 experiments
  - Broad application to other sciences via the "Virtual Data Toolkit"
  - Strong outreach program
- Multi-year project
  - R&D for grid architecture (funded at $11.9M + $1.6M)
  - Integrate Grid infrastructure into experiments through the VDT

27 GriPhyN Institutions
- Universities: U Florida, U Chicago, Boston U, Caltech, U Wisconsin (Madison), USC/ISI, Harvard, Indiana, Johns Hopkins, Northwestern, Stanford, U Illinois at Chicago, U Penn, U Texas (Brownsville), U Wisconsin (Milwaukee), UC Berkeley, UC San Diego
- Laboratories: San Diego Supercomputer Center, Lawrence Berkeley Lab, Argonne, Fermilab, Brookhaven

28 GriPhyN: PetaScale Virtual-Data Grids
(Figure: interactive user tools serving individual investigators, workgroups, and production teams drive virtual data tools, request planning & scheduling tools, and request execution & management tools; these sit on resource management services, security and policy services, and other Grid services, over distributed resources (code, storage, CPUs, networks) and raw data sources; scale ~1 petaflop, ~100 petabytes)

29 GriPhyN Research Agenda
- Virtual Data technologies
  - Derived data, calculable via algorithm
  - Instantiated 0, 1, or many times (e.g., in caches)
  - "Fetch value" vs. "execute algorithm"
  - Very complex (versions, consistency, cost calculation, etc.)
- LIGO example: "Get the gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year"
- For each requested data value, the system must
  - Locate the item and its algorithm
  - Determine the costs of fetching vs. calculating
  - Plan the data movements & computations required to obtain results
  - Execute the plan
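The locate/cost/plan/execute steps above reduce, for a single data value, to a cost-based choice between fetching a cached replica and re-running the transformation. A hypothetical sketch of that decision (the function, catalog layout, costs, and URLs are illustrative stand-ins, not GriPhyN APIs):

```python
# Hypothetical fetch-vs-compute decision for one requested virtual-data item.
# `replicas` maps item -> {replica URL: estimated fetch cost}; `transform_cost`
# is the estimated cost of re-deriving the item from its algorithm.
def resolve(item, replicas, transform_cost):
    """Return a plan: fetch the cheapest replica, or recompute the item."""
    fetch_costs = [(cost, url) for url, cost in replicas.get(item, {}).items()]
    if fetch_costs and min(fetch_costs)[0] < transform_cost:
        cost, url = min(fetch_costs)
        return ("fetch", url)
    return ("compute", item)   # no replica exists, or recomputing is cheaper

replicas = {"strain.2min": {"gsiftp://siteA/strain": 5, "gsiftp://siteB/strain": 12}}
plan = resolve("strain.2min", replicas, transform_cost=20)
```

In the LIGO example, the same decision would run once per gamma-ray burst window, with the planner batching the resulting transfers and computations.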

30 Virtual Data in Action
- A data request may: compute locally, compute remotely, access local data, or access remote data
- Scheduling is based on local policies, global policies, and cost
(Figure: a "fetch item" request moving among major facilities and archives, regional facilities and caches, and local facilities and caches)

31 GriPhyN Research Agenda (cont.)
- Execution management
  - Co-allocation of resources (CPU, storage, network transfers)
  - Fault tolerance, error reporting
  - Interaction, feedback to planning
- Performance analysis (with PPDG)
  - Instrumentation and measurement of all grid components
  - Understand and optimize grid performance
- Virtual Data Toolkit (VDT)
  - VDT = virtual data services + virtual data tools
  - One of the primary deliverables of the R&D effort
  - Technology transfer mechanism to other scientific domains

32 GriPhyN/PPDG Data Grid Architecture
(Figure: an Application submits requests to a Planner, which emits a DAG executed by an Executor; supporting services include Catalog Services, Information Services, Policy/Security, Monitoring, Replica Management, and a Reliable Transfer Service, driving Compute and Storage Resources. Component technologies include DAGMan and Kangaroo (executor), GRAM (compute access), GridFTP and SRM (storage and transfer), GSI and CAS (security), MDS (information and monitoring), MCAT and GriPhyN catalogs (catalog services), and GDMP (replica management). Globus provides the initial solution, which is operational.)
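The planner's output is a DAG of jobs that the executor (e.g. DAGMan) runs in dependency order. A small illustrative sketch of such an ordering, using Python's standard-library topological sorter; the job names and pipeline are hypothetical, not from the architecture diagram:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical plan: input files must be staged (e.g. via GridFTP) and
# calibrated before analysis; each job maps to its prerequisite jobs,
# and the executor runs a job only after all of its parents finish.
dag = {
    "stage_in":  [],            # transfer input files to the compute site
    "calibrate": ["stage_in"],
    "analyze":   ["calibrate"],
    "stage_out": ["analyze"],   # move results back to storage
}

order = list(TopologicalSorter(dag).static_order())
```

DAGMan consumes an equivalent description as a submit file of JOB and PARENT/CHILD lines; the topological order is what guarantees, for instance, that no analysis starts before its inputs arrive.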

33 Catalog Architecture
- Metadata Catalog: maps application-specific attributes to logical object names (e.g., F.X -> logO3), giving transparency with respect to location
- Replica Catalog: maps logical container names to the URLs of physical file copies (e.g., logC3 -> URL4)
- Derived Data Catalog: records derived data items by id, transformation, parameters, and name (e.g., i1 = F.X, i2 = F.Y, i10 = G(P).Y)
- Transformation Catalog: maps transformation names to program storage URLs and costs (e.g., F at URL:f, cost 10; G at URL:g, cost 20)
- Derived Metadata Catalog: application-specific attributes of derived items (e.g., i2, i10), updated upon materialization, giving transparency with respect to materialization
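The location transparency above amounts to a chain of lookups: an application-level name resolves through the metadata catalog to a logical object, whose container the replica catalog maps to physical URLs. A sketch with made-up catalog contents (the entries echo the slide's F.X/logO3/logC3 example; the URLs are illustrative):

```python
# Illustrative catalog contents; real catalogs (MCAT, replica catalogs)
# are services, not in-memory dicts.
metadata_catalog = {"F.X": "logO3"}          # app-specific name -> logical object
logical_containers = {"logO3": "logC3"}      # logical object -> logical container
replica_catalog = {"logC3": ["gsiftp://siteA/f_x", "gsiftp://siteB/f_x"]}

def locate(app_name):
    """Resolve an application-level name to its physical replicas."""
    obj = metadata_catalog[app_name]         # transparency wrt location:
    container = logical_containers[obj]      # the app never sees URLs directly
    return replica_catalog[container]        # any replica is equivalent

urls = locate("F.X")
```

Materialization transparency works the same way one level up: if the derived metadata catalog has no entry, the transformation catalog supplies the program to create one.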

34 iVDGL: A Global Grid Laboratory
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, EU, South America, Asia, ...)
  - A place to conduct Data Grid tests "at scale"
  - A mechanism to create common Grid infrastructure
  - A facility to perform production exercises for LHC experiments
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- Funded for $13.65M by NSF
"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (From the NSF proposal, 2001)

35 iVDGL Components
- Computing resources: Tier1, Tier2, Tier3 sites
- Networks
  - USA (TeraGrid, Internet2, ESNET); Europe (Géant, ...)
  - Transatlantic (DataTAG), transpacific, AMPATH, ...
- Grid Operations Center (GOC)
  - Indiana (2 people)
  - Joint work with TeraGrid on GOC development
- Computer science support teams: support, test, upgrade the GriPhyN Virtual Data Toolkit
- Outreach effort, integrated with GriPhyN
- Coordination, interoperability

36 Current iVDGL Participants
- Initial experiments (funded by the NSF proposal): CMS, ATLAS, LIGO, SDSS, NVO
- U.S. universities and laboratories (next slide)
- Partners: TeraGrid; EU DataGrid + EU national projects; Japan (AIST, TITECH); Australia
- Complementary EU project: DataTAG (2.5 Gb/s transatlantic network)

37 U.S. iVDGL Proposal Participants
- U Florida: CMS
- Caltech: CMS, LIGO
- UC San Diego: CMS, CS
- Indiana U: ATLAS, GOC
- Boston U: ATLAS
- U Wisconsin, Milwaukee: LIGO
- Penn State: LIGO
- Johns Hopkins: SDSS, NVO
- U Chicago/Argonne: CS
- U Southern California: CS
- U Wisconsin, Madison: CS
- Salish Kootenai: Outreach, LIGO
- Hampton U: Outreach, ATLAS
- U Texas, Brownsville: Outreach, LIGO
- Fermilab: CMS, SDSS, NVO
- Brookhaven: ATLAS
- Argonne Lab: ATLAS, CS
(Site roles: T2/software, CS support, T3/outreach, T1/labs funded elsewhere)

38 Initial US-iVDGL Data Grid
(Map: Tier1 (FNAL), proto-Tier2, and Tier3 university sites, including UCSD, Florida, Wisconsin, Fermilab, BNL, Indiana, BU, Caltech, SKC, Brownsville, Hampton, PSU, and JHU; other sites to be added in 2002)

39 iVDGL Map (2002-2003)
(Map legend: Tier0/1, Tier2, and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links; DataTAG and SURFnet connections)
Later: Brazil, Pakistan, Russia, China

40 Summary
- Data Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- The iVDGL will provide vast experience for new collaborations
- Many challenges remain during the coming transition
  - New grid projects will provide rich experience and lessons
  - It is difficult to predict the situation even 3-5 years ahead

41 Grid References
- Grid Book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- TeraGrid: www.teragrid.org
- EU DataGrid: www.eu-datagrid.org
- PPDG: www.ppdg.net
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org

