Slide 1: Global Data Grids for 21st Century Science
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  avery@phys.ufl.edu
GriPhyN/iVDGL Outreach Workshop, University of Texas, Brownsville, March 1, 2002
Slide 2: What Is a Grid?
- Grid: geographically distributed computing resources configured for coordinated use
- Physical resources and networks provide the raw capability
- "Middleware" software ties it together
Slide 3: What Are Grids Good For?
- Climate modeling: climate scientists visualize, annotate, and analyze terabytes of simulation data
- Biology: a biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- High energy physics: 3,000 physicists worldwide pool petaflops of CPU resources to analyze petabytes of data
- Engineering: civil engineers collaborate to design, execute, and analyze shake-table experiments; a multidisciplinary aerospace analysis couples code and data in four companies
(From Ian Foster)
Slide 4: What Are Grids Good For? (cont.)
- Application service providers: a home user invokes architectural design functions at an application service provider, which purchases computing cycles from cycle providers
- Commercial: scientists at a multinational toy company design a new product
- Cities, communities: an emergency response team couples real-time data, a weather model, and population data; a community group pools members' PCs to analyze alternative designs for a local road
- Health: hospitals and international agencies collaborate on stemming a major disease outbreak
(From Ian Foster)
Slide 5: Proto-Grid: SETI@home
- Community: SETI researchers + enthusiasts
- Arecibo radio data sent to users in 250 KB chunks
- Over 2 million PCs used
Slide 6: More Advanced Proto-Grid: Evaluation of AIDS Drugs
- Community: a research group (Scripps), thousands of PC owners, and a vendor (Entropia)
- Common goal: drug design, advancing AIDS research
Slide 7: Why Grids?
- Resources for complex problems are distributed: advanced scientific instruments (accelerators, telescopes, ...), storage and computing, groups of people
- Communities require access to common services: scientific collaborations (physics, astronomy, biology, engineering, ...), government agencies, health care organizations, large corporations, ...
- Goal is to build "Virtual Organizations": make all community resources available to any VO member, leverage strengths at different institutions, add people and resources dynamically
Slide 8: Grids: Why Now?
- Moore's-law improvements in computing: highly functional end systems
- Burgeoning wired and wireless Internet connections: universal connectivity
- Changing modes of working and problem solving: teamwork, computation
- Network exponentials (next slide)
Slide 9: Network Exponentials and Collaboration
- Network vs. computer performance: computer speed doubles every 18 months, network speed doubles every 9 months; the difference is an order of magnitude every 5 years
- 1986 to 2000: computers x500, networks x340,000
- 2001 to 2010 (projected): computers x60, networks x4,000
(Scientific American, January 2001)
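As a back-of-the-envelope check of the arithmetic behind these ratios (not on the original slide), the doubling times imply:

```latex
% Growth factor over t years for a quantity that doubles every d years: 2^{t/d}.
% Computers (d = 1.5 yr) vs. networks (d = 0.75 yr) over a 5-year span:
\[
  f_{\mathrm{cpu}} = 2^{5/1.5} \approx 10, \qquad
  f_{\mathrm{net}} = 2^{5/0.75} \approx 100, \qquad
  \frac{f_{\mathrm{net}}}{f_{\mathrm{cpu}}} \approx 10 ,
\]
% i.e. networks gain roughly one order of magnitude on computers every 5 years.
```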
Slide 10: Grid Challenges
- Overall goal: coordinated sharing of resources
- Technical problems to overcome: authentication, authorization, policy, auditing; resource discovery, access, allocation, control; failure detection and recovery; resource brokering
- Additional issue: lack of central control and knowledge; preservation of local site autonomy; policy discovery and negotiation are important
Slide 11: Layered Grid Architecture (Analogy to Internet Architecture)
- Fabric: controlling things locally; accessing and controlling resources
- Connectivity: talking to things; communications and security
- Resource: sharing single resources; negotiating access, controlling use
- Collective: managing multiple resources; ubiquitous infrastructure services
- User: specialized services; application-specific distributed services
- Internet analogy: Link, Internet Protocol, Transport, and Application layers
(From Ian Foster)
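Purely as an illustrative restatement of the layer list above (the one-to-one pairing with Internet layers in the last column is my approximation, not something the slide states explicitly):

```python
# Grid architecture layers, their roles, and a rough Internet-protocol analogue.
# The pairings in the last column are approximate and for illustration only.
GRID_LAYERS = [
    # (grid layer,   role,                                                           rough analogue)
    ("Fabric",       "controlling things locally: accessing/controlling resources",  "Link"),
    ("Connectivity", "talking to things: communications, security",                  "Internet"),
    ("Resource",     "sharing single resources: negotiating access, controlling use","Transport"),
    ("Collective",   "managing multiple resources: infrastructure services",         "Application"),
    ("User",         "specialized, application-specific distributed services",       "Application"),
]

for layer, role, analogue in GRID_LAYERS:
    print(f"{layer:<12} (~ Internet {analogue:<11}): {role}")
```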
Slide 12: Globus Project and Toolkit
- Globus Project (Argonne + USC/ISI): O(40) researchers and developers; identify and define core protocols and services
- Globus Toolkit 2.0: a major product of the Globus Project; reference implementation of core protocols and services; growing open-source developer community
- Globus Toolkit used by all Data Grid projects today: US (GriPhyN, PPDG, TeraGrid, iVDGL), EU (EU DataGrid and national projects)
- Recent announcement of applying "web services" to Grids (GT 3.0) keeps Grids in the commercial mainstream
Slide 13: Globus General Approach
- Define Grid protocols and APIs: protocol-mediated access to remote resources; integrate and extend existing standards
- Develop a reference implementation: the open-source Globus Toolkit; client and server SDKs, services, tools, etc.
- Grid-enable a wide variety of tools: FTP, SSH, Condor, SRB, MPI, ...
- Learn about real-world problems: deployment, testing, applications
- Layering: applications sit on diverse global services, which sit on core services and diverse resources
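For a concrete flavor of Globus Toolkit 2 usage, the sketch below drives two of its standard command-line tools (grid-proxy-init and globus-url-copy) from Python; the hostnames and file paths are hypothetical placeholders, and this is an illustration rather than anything from the slides.

```python
# Minimal sketch of Grid-enabled data movement with Globus Toolkit 2 client tools.
# Assumes the GT2 commands are installed and a user certificate exists;
# the URLs below are hypothetical.
import subprocess

def create_proxy() -> None:
    """Create a short-lived GSI proxy credential (prompts for the key passphrase)."""
    subprocess.run(["grid-proxy-init"], check=True)

def gridftp_copy(src_url: str, dst_url: str) -> None:
    """Copy a file between GridFTP/local endpoints."""
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    create_proxy()
    gridftp_copy(
        "gsiftp://tier2.example.edu/data/run001.root",  # hypothetical source
        "file:///scratch/run001.root",                  # hypothetical destination
    )
```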
Slide 14: Data Intensive Science: 2000-2015
- Scientific discovery increasingly driven by IT: computationally intensive analyses, massive data collections, data distributed across networks of varying capability, geographically distributed collaboration
- Dominant factor: data growth (1 Petabyte = 1000 TB)
    2000: ~0.5 Petabyte
    2005: ~10 Petabytes
    2010: ~100 Petabytes
    2015: ~1000 Petabytes?
- How to collect, manage, access, and interpret this quantity of data?
- Drives demand for "Data Grids" to handle the additional dimension of data access and movement
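The implied growth rate behind those four data points (a rough estimate, not from the slide):

```latex
% From ~0.5 PB in 2000 to ~1000 PB in 2015: a factor of ~2000 over 15 years.
\[
  r = \left(\frac{1000}{0.5}\right)^{1/15} \approx 1.66\ \text{per year},
  \qquad
  t_{2\times} = \frac{\ln 2}{\ln r} \approx 1.4\ \text{years},
\]
% i.e. the projected data volume roughly doubles every 16-17 months.
```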
Slide 15: Data Intensive Physical Sciences
- High energy and nuclear physics, including new experiments at CERN's Large Hadron Collider
- Gravity wave searches: LIGO, GEO, VIRGO
- Astronomy: digital sky surveys (Sloan Digital Sky Survey, VISTA, other gigapixel arrays); "virtual" observatories (multi-wavelength astronomy)
- Time-dependent 3-D systems (simulation and data): Earth observation, climate modeling; geophysics, earthquake modeling; fluids, aerodynamic design; pollutant dispersal scenarios
Slide 16: Data Intensive Biology and Medicine
- Medical data: X-ray and mammography data, etc. (many petabytes); digitizing patient records (likewise)
- X-ray crystallography: bright X-ray sources, e.g. the Argonne Advanced Photon Source
- Molecular genomics and related disciplines: the Human Genome and other genome databases; proteomics (protein structure, activities, ...); protein interactions, drug delivery
- Brain scans (3-D, time dependent)
- Virtual Population Laboratory (proposed): database of populations, geography, and transportation corridors; simulate the likely spread of disease outbreaks
(Craig Venter keynote @ SC2001)
Slide 17: Example: High Energy Physics
- The "Compact" Muon Solenoid (CMS) detector at the LHC (CERN)
- (Figure: the CMS detector shown with the Smithsonian standard man for scale)
Slide 18: LHC Computing Challenges
- 1800 physicists, 150 institutes, 32 countries
- Complexity of the LHC interaction environment and the resulting data
- Scale: petabytes of data per year (100 PB by ~2010-2012)
- Global distribution of people and resources
Slide 19: Global LHC Data Grid
- Tier 0: CERN
- Tier 1: national laboratory
- Tier 2: regional center (university, etc.)
- Tier 3: university workgroup
- Tier 4: workstation
- Key ideas: hierarchical structure; Tier 2 centers
Slide 20: Global LHC Data Grid (data flow)
- Online system: bunch crossings every 25 ns; 100 triggers per second; each event is ~1 MB
- ~PB/s of raw data from the detector; the online system sends ~100 MB/s to the CERN computer center (Tier 0+1, >20 TIPS) and its physics data cache
- Tier 1 national centers (USA, France, Italy, UK) connected at 2.5 Gb/s
- Tier 2 regional centers connected at ~622 Mb/s
- Tier 3 institutes (~0.25 TIPS) connected at 100-1000 Mb/s; Tier 4 workstations and other portals
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
- CERN/outside resource ratio ~1:2; Tier 0 : (sum of Tier 1) : (sum of Tier 2) ~ 1:1:1
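The quoted rates hang together with a simple estimate (a sketch; the ~10^7 seconds of running per year is a conventional assumption, not stated on the slide):

```latex
% Post-trigger event rate times event size gives the rate into Tier 0:
\[
  100\ \tfrac{\text{events}}{\text{s}} \times 1\ \tfrac{\text{MB}}{\text{event}}
  = 100\ \text{MB/s},
\]
% and over ~10^7 seconds of running per year:
\[
  100\ \text{MB/s} \times 10^{7}\ \text{s} \approx 10^{15}\ \text{bytes}
  \approx 1\ \text{PB per experiment per year}.
\]
```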
Slide 21: Sloan Digital Sky Survey Data Grid
(Diagram of the SDSS data grid; no further text on this slide.)
Slide 22: LIGO (Gravity Wave) Data Grid
- Hanford and Livingston observatories feed data to Caltech and MIT
- A Tier 1 center and LSC Tier 2 centers are linked via Internet2/Abilene over OC3, OC12, and OC48 connections
Slide 23: Data Grid Projects
- Particle Physics Data Grid (US, DOE): Data Grid applications for HENP experiments
- GriPhyN (US, NSF): petascale virtual-data grids
- iVDGL (US, NSF): global Grid laboratory
- TeraGrid (US, NSF): distributed supercomputing resources (13 TFlops)
- European Data Grid (EU, EC): Data Grid technologies, EU deployment
- CrossGrid (EU, EC): Data Grid technologies, EU
- DataTAG (EU, EC): transatlantic network, Grid applications
- Japanese Grid project (APGrid?) (Japan): Grid deployment throughout Japan
- Common features: collaborations of application scientists and computer scientists; infrastructure development and deployment; Globus based
Slide 24: Coordination of U.S. Grid Projects
- Three U.S. projects:
    PPDG: HENP experiments, short-term tools, deployment
    GriPhyN: Data Grid research, Virtual Data, the VDT deliverable
    iVDGL: global Grid laboratory
- Coordination of PPDG, GriPhyN, and iVDGL: common experiments and personnel, management integration; iVDGL as a "joint" PPDG + GriPhyN laboratory; joint meetings (Jan. 2002, April 2002, Sept. 2002); joint architecture creation (GriPhyN, PPDG); adoption of the VDT as the common core Grid infrastructure; common Outreach effort (GriPhyN + iVDGL)
- New TeraGrid project (Aug. 2001): 13 TFlops across 4 sites, 40 Gb/s networking; goal is to integrate it into iVDGL, adopt the VDT, and share the common Outreach effort
Slide 25: Worldwide Grid Coordination
- Two major clusters of projects:
    "US based": GriPhyN Virtual Data Toolkit (VDT)
    "EU based": different packaging of similar components
Slide 26: GriPhyN = Application Science + CS + Grids
- GriPhyN = Grid Physics Network:
    US-CMS: high energy physics
    US-ATLAS: high energy physics
    LIGO/LSC: gravity wave research
    SDSS: Sloan Digital Sky Survey
    plus a strong partnership with computer scientists
- Design and implement production-scale grids: develop common infrastructure, tools, and services (Globus based); integration into the 4 experiments; broad application to other sciences via the "Virtual Data Toolkit"; strong outreach program
- Multi-year project: R&D for grid architecture (funded at $11.9M + $1.6M); integrate Grid infrastructure into the experiments through the VDT
Slide 27: GriPhyN Institutions
U Florida, U Chicago, Boston U, Caltech, U Wisconsin-Madison, USC/ISI, Harvard, Indiana, Johns Hopkins, Northwestern, Stanford, U Illinois at Chicago, U Penn, U Texas-Brownsville, U Wisconsin-Milwaukee, UC Berkeley, UC San Diego, San Diego Supercomputer Center, Lawrence Berkeley Lab, Argonne, Fermilab, Brookhaven
Slide 28: GriPhyN: PetaScale Virtual-Data Grids
- Users (production teams, individual investigators, workgroups) interact through interactive user tools
- These drive virtual data tools, request planning and scheduling tools, and request execution and management tools
- Supporting services: resource management services, security and policy services, other Grid services
- Underneath: transforms, raw data sources, and distributed resources (code, storage, CPUs, networks)
- Scale: ~1 Petaflop of computing, ~100 Petabytes of data
Slide 29: GriPhyN Research Agenda
- Virtual Data technologies: derived data, calculable via an algorithm; instantiated 0, 1, or many times (e.g., in caches); "fetch value" vs. "execute algorithm"; very complex (versions, consistency, cost calculation, etc.)
- LIGO example: "Get the gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year"
- For each requested data value: locate the item and its algorithm; determine the costs of fetching vs. calculating; plan the data movements and computations required to obtain the result; execute the plan (see the sketch below)
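A minimal sketch of the fetch-versus-compute decision described above (all class and function names are hypothetical, invented for illustration; this is not GriPhyN code):

```python
# Illustrative sketch of virtual-data request handling: locate the item and its
# algorithm, compare the cost of fetching an existing copy against the cost of
# recomputing it, then execute the cheaper plan. All names are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class VirtualDataItem:
    name: str
    replicas: list[str]              # physical locations of materialized copies
    recompute: Callable[[], bytes]   # transformation that can regenerate the data
    fetch_cost: float                # e.g. estimated transfer time in seconds
    compute_cost: float              # e.g. estimated CPU + staging time in seconds

def resolve(item: VirtualDataItem) -> bytes:
    """Return the data, either by fetching a replica or by re-executing the algorithm."""
    if item.replicas and item.fetch_cost <= item.compute_cost:
        return fetch(item.replicas[0])        # cheapest: copy an existing instance
    data = item.recompute()                   # otherwise materialize it now
    register_replica(item.name, data)         # cache the result for future requests
    return data

def fetch(url: str) -> bytes:                 # placeholder for a GridFTP transfer
    raise NotImplementedError

def register_replica(name: str, data: bytes) -> None:   # placeholder catalog update
    raise NotImplementedError
```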
Slide 30: Virtual Data in Action
- A data request may: compute locally, compute remotely, access local data, or access remote data
- Scheduling is based on local policies, global policies, and cost
- Items may be fetched from major facilities and archives, regional facilities and caches, or local facilities and caches
Slide 31: GriPhyN Research Agenda (cont.)
- Execution management: co-allocation of resources (CPU, storage, network transfers); fault tolerance, error reporting; interaction and feedback to planning
- Performance analysis (with PPDG): instrumentation and measurement of all grid components; understand and optimize grid performance
- Virtual Data Toolkit (VDT): VDT = virtual data services + virtual data tools; one of the primary deliverables of the R&D effort; technology transfer mechanism to other scientific domains
Slide 32: GriPhyN/PPDG Data Grid Architecture
- Application, Planner, Executor: the planner emits a DAG, executed by DAGMan and Kangaroo
- Catalog services (MCAT; GriPhyN catalogs), information services (MDS), policy/security (GSI, CAS), monitoring (MDS), replica management (GDMP), reliable transfer service (GridFTP)
- Compute resources accessed via GRAM; storage resources via GridFTP and SRM
- Globus components form the initial operational solution
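To give a flavor of the Planner-to-Executor handoff (where the plan is a DAG executed by DAGMan), here is a hedged sketch that writes a Condor DAGMan description file; the job and submit-file names are hypothetical, and real plans come from the GriPhyN planning tools rather than hand-written files.

```python
# Illustrative sketch only: emit a DAGMan input file describing a two-stage plan
# (stage in the input data, then run the transformation).
def write_dag(path: str) -> None:
    dag = "\n".join([
        "JOB stage_in  stage_in.sub",     # e.g. a GridFTP transfer job (hypothetical)
        "JOB analyze   analyze.sub",      # e.g. a GRAM-submitted compute job (hypothetical)
        "PARENT stage_in CHILD analyze",  # dependency: transfer before analysis
        "",
    ])
    with open(path, "w") as f:
        f.write(dag)

if __name__ == "__main__":
    write_dag("plan.dag")   # then: condor_submit_dag plan.dag
```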
Slide 33: Catalog Architecture
- Transparency with respect to materialization:
    Derived Data Catalog: id, transformation, parameter, name (e.g. i1 = F(X) -> F.X; i2 = F(Y) -> F.Y; i10 = G(Y, P) -> G(P).Y); updated upon materialization
    Transformation Catalog: transformation name -> program location URL and cost (e.g. F at URL:f, cost 10; G at URL:g, cost 20)
    Derived Metadata Catalog: application-specific attributes -> derived-data ids
- Transparency with respect to location:
    Metadata Catalog: object name -> logical object name (e.g. F.X -> logO3)
    Replica Catalog: logical container name -> physical file URLs (e.g. logC1 -> URL1; logC2 -> URL2, URL3)
    GCMS object names appear in both views
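A toy in-memory model of the two catalog mappings described above (logical names to physical replicas, and transformations to program locations and costs); the structure follows the slide, while the URLs themselves are invented for illustration.

```python
# Toy version of the catalog layers sketched on this slide.
# Replica Catalog: logical container name -> physical file URLs (location transparency).
replica_catalog = {
    "logC1": ["gsiftp://siteA.example.edu/store/f1"],                 # hypothetical URLs
    "logC2": ["gsiftp://siteA.example.edu/store/f2",
              "gsiftp://siteB.example.edu/cache/f2"],
}

# Transformation Catalog: transformation name -> (program location, relative cost),
# supporting materialization transparency (re-derive data instead of fetching it).
transformation_catalog = {
    "F": ("http://code.example.edu/transforms/F", 10),
    "G": ("http://code.example.edu/transforms/G", 20),
}

def locate_replicas(logical_name: str) -> list[str]:
    """All physical copies of a logical file, in catalog order."""
    return replica_catalog.get(logical_name, [])

def locate_transformation(name: str) -> tuple[str, int]:
    """Where the program lives and its nominal execution cost."""
    return transformation_catalog[name]

print(locate_replicas("logC2"))        # two replicas to choose between
print(locate_transformation("G"))      # ('http://code.example.edu/transforms/G', 20)
```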
Slide 34: iVDGL: A Global Grid Laboratory
- International Virtual-Data Grid Laboratory: a global Grid laboratory (US, EU, South America, Asia, ...); a place to conduct Data Grid tests "at scale"; a mechanism to create common Grid infrastructure; a facility for production exercises for the LHC experiments; a laboratory for other disciplines to perform Data Grid tests; a focus of outreach efforts to small institutions
- Funded at $13.65M by the NSF
- "We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (from the NSF proposal, 2001)
Slide 35: iVDGL Components
- Computing resources: Tier 1, Tier 2, and Tier 3 sites
- Networks: USA (TeraGrid, Internet2, ESnet), Europe (Géant, ...), transatlantic (DataTAG), transpacific, AMPATH, ...
- Grid Operations Center (GOC): Indiana (2 people); joint work with TeraGrid on GOC development
- Computer science support teams: support, test, and upgrade the GriPhyN Virtual Data Toolkit
- Outreach effort, integrated with GriPhyN
- Coordination and interoperability
Slide 36: Current iVDGL Participants
- Initial experiments (funded by the NSF proposal): CMS, ATLAS, LIGO, SDSS, NVO
- U.S. universities and laboratories (next slide)
- Partners: TeraGrid; EU DataGrid and EU national projects; Japan (AIST, TITECH); Australia
- Complementary EU project: DataTAG (2.5 Gb/s transatlantic network)
Slide 37: U.S. iVDGL Proposal Participants
- U Florida: CMS
- Caltech: CMS, LIGO
- UC San Diego: CMS, CS
- Indiana U: ATLAS, GOC
- Boston U: ATLAS
- U Wisconsin, Milwaukee: LIGO
- Penn State: LIGO
- Johns Hopkins: SDSS, NVO
- U Chicago/Argonne: CS
- U Southern California: CS
- U Wisconsin, Madison: CS
- Salish Kootenai: Outreach, LIGO
- Hampton U: Outreach, ATLAS
- U Texas, Brownsville: Outreach, LIGO
- Fermilab: CMS, SDSS, NVO
- Brookhaven: ATLAS
- Argonne Lab: ATLAS, CS
(Site categories: T2/software, CS support, T3/outreach, and T1/labs funded elsewhere)
Slide 38: Initial US-iVDGL Data Grid
- Site types: Tier 1 (FNAL), proto-Tier 2, Tier 3 university
- Sites shown: Fermilab, BNL, Caltech, UCSD, Florida, Wisconsin, Indiana, BU, SKC, Brownsville, Hampton, PSU, JHU
- Other sites to be added in 2002
Slide 39: iVDGL Map (2002-2003)
- Facility types: Tier 0/1, Tier 2, Tier 3
- Link types: 10 Gbps, 2.5 Gbps, 622 Mbps, other; DataTAG and SURFnet links shown
- Later additions: Brazil, Pakistan, Russia, China
Slide 40: Summary
- Data Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- The iVDGL will provide vast experience for new collaborations
- Many challenges lie ahead during the coming transition: new grid projects will provide rich experience and lessons; it is difficult to predict the situation even 3-5 years ahead
Slide 41: Grid References
- Grid Book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- TeraGrid: www.teragrid.org
- EU DataGrid: www.eu-datagrid.org
- PPDG: www.ppdg.net
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org