Slide 1: Cyberinfrastructure and the Role of Grid Computing, or "Science 2.0"
Ian Foster, Computation Institute, Argonne National Laboratory & University of Chicago

Slide 2: "Web 2.0"
- Software as services
  - Data- & computation-rich network services
- Services as platforms
  - Easy composition of services to create new capabilities ("mashups"), which may themselves be made accessible as new services
- Enabled by massive infrastructure buildout
  - Google projected to spend $1.5B on computers, networks, and real estate in 2006
  - Dozens of others are spending substantially
- Paid for by advertising
(Declan Butler, Nature)

Slide 3: Science 2.0 Example: Virtual Observatories
[Figure: the user reaches data archives, discovery tools, and analysis tools through a gateway. Figure credit: S. G. Djorgovski]

Slide 4: Science 2.0 Example: Cancer Bioinformatics Grid (caBIG)
[Figure: a BPEL engine takes a BPEL workflow document and workflow inputs, invokes a data service at uchicago.edu and analytic services at osu.edu and duke.edu, and returns the workflow results.]
caBIG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
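As a rough illustration of the composition pattern on this slide (not the actual caBIG services, which are strongly typed grid services driven by a BPEL engine), the Python sketch below chains a data service and two analytic services. All endpoint URLs and the JSON request/response shapes are hypothetical.

```python
# Sketch of the workflow composition pattern: fetch data from a data service,
# then chain two analytic services. Endpoints and payload shapes are made up;
# the real caBIG services are WSRF/SOAP services orchestrated by a BPEL engine.
import json
import urllib.request

def call_service(url: str, payload: dict) -> dict:
    """POST a JSON payload to a service and return its JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_workflow(workflow_inputs: dict) -> dict:
    # Step 1: pull the input data set from the data service.
    data = call_service("https://data.uchicago.example/query", workflow_inputs)
    # Step 2: first analysis (the osu.edu service in the slide's figure).
    stage1 = call_service("https://analytic.osu.example/analyze", data)
    # Step 3: second analysis (the duke.edu service), consuming stage-1 output.
    return call_service("https://analytic.duke.example/analyze", stage1)

if __name__ == "__main__":
    print(run_workflow({"query": "TP53"}))
```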

Slide 5: The Two Dimensions of Science 2.0
- Decompose across the network; clients integrate dynamically
  - Select & compose services
  - Select "best of breed" providers
  - Publish the result as new services
- Decouple resource & service providers (separate function from resource)
[Figure: users drawing on data archives, analysis tools, and discovery tools. Figure credit: S. G. Djorgovski]

Slide 6: Technology Requirements: Integration & Decomposition
- Service-oriented Grid infrastructure
  - Provision physical resources to support application workloads
- Service-oriented applications
  - Wrap applications & data as services (see the sketch below)
  - Compose services into workflows
[Figure: users compose and invoke application services as workflows; the services are provisioned onto underlying resources.]
"The Many Faces of IT as Service", ACM Queue, Foster & Tuecke, 2005
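The "wrap applications & data as services" point can be made concrete with a small stand-in: exposing an existing command-line tool behind an HTTP endpoint so that it can be invoked remotely and composed. This is only a sketch under assumed names (my_analysis_tool is a placeholder); a Globus-based service would instead be built on GT4's Web service framework with Grid security.

```python
# Minimal sketch: wrapping an existing command-line application as a network
# service so it can be invoked remotely and composed into workflows.
# "my_analysis_tool" is hypothetical; a production Grid service would use
# GT4's service framework and GSI security rather than plain HTTP.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class AnalysisHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body and feed it to the wrapped tool's stdin.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        result = subprocess.run(
            ["my_analysis_tool", "--stdin"],   # placeholder wrapped application
            input=payload, capture_output=True
        )
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(result.stdout)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AnalysisHandler).serve_forever()
```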

Slide 7: Globus Software Enables Grid Infrastructure
- Web service interfaces for behaviors relating to integration and decomposition
  - Primitives: resources, state, security
  - Services: execution, data movement, …
- Open source software that implements those interfaces
  - In particular, the Globus Toolkit (GT4)
- All standard Web services
  - "Grid is a use case for Web services, focused on resource management"

Slide 8: Open Source Grid Software: Globus Toolkit v4 (www.globus.org)
- Data Management: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration
- Security: Authentication & Authorization, Community Authorization, Delegation, Credential Mgmt
- Execution Management: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol
- Information Services: Index, Trigger, WebMDS
- Common Runtime: Java Runtime, C Runtime, Python Runtime
"Globus Toolkit Version 4: Software for Service-Oriented Systems", LNCS 3779, 2-13, 2005

Slide 9: dev.globus (http://dev.globus.org)
- Guidelines (Apache)
- Infrastructure: CVS, email, Bugzilla, wiki
- Projects include …
dev.globus: Community-Driven Improvement of Globus Software, NSF OCI

Slide 10: Hosted Science Services
1) Integrate services from external sources
   - Virtualize "services" from providers
2) Coordinate & compose
   - Create new services from existing ones
[Figure: a community services provider combines content services and capacity from capacity providers into hosted science services.]
"Service-Oriented Science", Science, 2005

Slide 11: The Globus-Based LIGO Data Grid
- Replicating >1 terabyte/day to 8 sites (including Birmingham, Cardiff, and AEI/Golm)
- >40 million replicas so far
- MTBF = 1 month
LIGO Gravitational Wave Observatory; www.globus.org/solutions

Slide 12: Data Replication Service
- Pull "missing" files to a storage system
[Figure: given a list of required files, the Data Replication Service consults the Replica Location Index and Local Replica Catalogs for data location, then uses the Reliable File Transfer Service and GridFTP for data movement.]
"Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System", Chervenak et al., 2005
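A schematic sketch of the "pull missing files" control flow described on this slide; the functions below are placeholders standing in for the real Globus components (Replica Location Index and Local Replica Catalog for data location, RFT/GridFTP for data movement), not their actual APIs.

```python
# Schematic sketch of the pull-based replication pattern on this slide.
# Everything except the control flow is a placeholder for the real components:
# the Replica Location Index / Local Replica Catalog answer "where does a copy
# live?", and RFT/GridFTP perform the actual transfer.
from typing import Iterable

def local_replicas(catalog_url: str) -> set:
    """Placeholder: query the Local Replica Catalog for files already present."""
    raise NotImplementedError

def locate_sources(index_url: str, filename: str) -> list:
    """Placeholder: ask the Replica Location Index for remote copies."""
    raise NotImplementedError

def transfer(source_url: str, dest_url: str) -> None:
    """Placeholder: hand the transfer to RFT, which drives GridFTP."""
    raise NotImplementedError

def replicate(required: Iterable, catalog_url: str,
              index_url: str, storage_url: str) -> None:
    present = local_replicas(catalog_url)
    for name in required:
        if name in present:
            continue                      # already replicated locally
        sources = locate_sources(index_url, name)
        if not sources:
            print(f"no replica found for {name}")
            continue
        # Pull from the first advertised source; a real service would choose
        # among sources and retry on failure.
        transfer(sources[0], f"{storage_url}/{name}")
```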

Slide 13: Decomposition Enables Separation of Concerns & Roles
- User to service provider: "Provide access to data D at S1, S2, S3 with performance P"
- Service provider to resource provider: "Provide storage with performance P1, network with P2, …"
[Figure: the service provider realizes the request with a replica catalog, user-level multicast, etc., placing data D at sites S1, S2, S3.]

Slide 14: Example: Biology
- Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
- Back-office analysis on the Grid: millions of BLAST, BLOCKS, etc. runs on OSG and TeraGrid
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

Slide 15: Example: Earth System Grid
- Climate simulation data
  - Per-collection control
  - Different user classes
  - Server-side processing
- Implementation (GT)
  - Portal-based User Registration Service (PURSE)
  - PKI, SAML assertions
  - GridFTP, GRAM, SRM
- >2,000 users
- >100 TB downloaded
www.earthsystemgrid.org; DOE OASCR

Slide 16: Under the Covers

Slide 17: Example: Astro Portal Stacking Service
- Purpose: on-demand "stacks" of random locations within a ~10 TB dataset (Sloan data)
- Challenge: rapid access to 10-10,000 "random" files under time-varying load
- Solution: dynamic acquisition of compute and storage, behind a web page or web service
[Figure: many image cutouts summed into a single stacked image.]
Joint work with Ioan Raicu & Alex Szalay
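The stack itself is just a pixel-wise co-addition of cutouts around each requested sky location; the hard part addressed by the service is marshalling the files and resources. A minimal numpy sketch of the co-addition step, with file access left as a placeholder:

```python
# Minimal sketch of the "stacking" operation: pixel-wise co-addition of
# equally sized image cutouts. The grid machinery on the slide (dynamic
# acquisition of compute/storage, locating the 10-10,000 files) is omitted;
# load_cutout() is a placeholder for reading one cutout from the SDSS data.
import numpy as np

def load_cutout(location) -> np.ndarray:
    """Placeholder: fetch the image cutout around one sky location."""
    raise NotImplementedError

def stack(locations: list) -> np.ndarray:
    """Mean-combine cutouts so faint sources add coherently above the noise."""
    total = None
    for loc in locations:
        cutout = load_cutout(loc).astype(np.float64)
        total = cutout if total is None else total + cutout
    if total is None:
        raise ValueError("no locations given")
    return total / len(locations)
```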

Slide 18: Preliminary Performance (TeraGrid, LAN GPFS)
Joint work with Ioan Raicu & Alex Szalay

Slide 19: Example: CyberShake
- Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast
[Figure: pipeline from rupture forecast and strain Green tensor through synthetic seismogram and spectral acceleration to hazard curve and hazard map.]
Tom Jordan et al., Southern California Earthquake Center
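The slide describes a chain of data transformations; the sketch below shows only that chaining (the part a workflow scheduler/engine, as on the next slide, orchestrates). Every stage is a placeholder, not the actual SCEC codes, and the exact data flow between stages is an assumption.

```python
# Schematic sketch of a CyberShake-style pipeline: each stage is a placeholder
# for the real SCEC physics codes; only the chaining of stages is shown.
def rupture_forecast(region):            # estimated earthquake ruptures
    raise NotImplementedError

def strain_green_tensor(site):           # wave propagation for the site
    raise NotImplementedError

def synthetic_seismogram(rupture, sgt):  # combine rupture + strain Green tensor
    raise NotImplementedError

def spectral_acceleration(seismogram):   # peak response at chosen periods
    raise NotImplementedError

def hazard_curve(accelerations):         # probability of exceedance vs. level
    raise NotImplementedError

def hazard_for_site(region, site):
    sgt = strain_green_tensor(site)
    accels = [
        spectral_acceleration(synthetic_seismogram(rupture, sgt))
        for rupture in rupture_forecast(region)
    ]
    return hazard_curve(accels)

# A hazard map is then hazard_for_site() evaluated over a grid of sites.
```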

Slide 20: Enlisting TeraGrid Resources (20 TB, 1.8 CPU-years)
[Figure: a workflow scheduler/engine and VO scheduler draw on VO service, provenance, and data catalogs, SCEC storage, and TeraGrid compute and storage.]
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute

Slide 21: Science 1.0 → Science 2.0
- Gigabytes → Terabytes
- Tarballs → Services
- Journals → Wikis
- Individuals → Communities
- Community codes → Science gateways
- Supercomputer centers → TeraGrid, OSG, campus
- Makefile → Workflow
- Computational science → Science as computation
- Physical sciences → All sciences (& humanities)
- Computational scientists → All scientists
- NSF-funded → NSF-funded

Slide 22: Science 2.0 Challenges
- A need for new technologies, skills, & roles
  - Creating, publishing, hosting, discovering, composing, archiving, explaining … services
- A need for substantial software development
  - "30-80% of modern astronomy projects is software" (S. G. Djorgovski)
- A need for more & different infrastructure
  - Computers & networks to host services
- Can we leverage commercial spending?
  - To some extent, but not straightforwardly

Slide 23: Acknowledgements
- Carl Kesselman, for many discussions
- Many colleagues, including those named on slides, for research collaborations and/or slides
- Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
- Globus Alliance members, for Globus software R&D
- DOE OASCR, NSF OCI, & NIH for support

Slide 24: For More Information
- Globus Alliance: www.globus.org
- dev.globus: dev.globus.org
- Open Science Grid: www.opensciencegrid.org
- TeraGrid: www.teragrid.org
- Background: www.mcs.anl.gov/~foster
- The Grid, 2nd edition: www.mkp.com/grid2
Thanks to DOE, NSF, and NIH for research support!

