Cyberinfrastructure and the Role of Grid Computing Or, “Science 2.0”
Ian Foster, Computation Institute, Argonne National Lab & University of Chicago

Presentation transcript:

1 Cyberinfrastructure and the Role of Grid Computing Or, “Science 2.0”
Ian Foster, Computation Institute, Argonne National Lab & University of Chicago

2 “Web 2.0”
- Software as services
  - Data- & computation-rich network services
- Services as platforms
  - Easy composition of services to create new capabilities (“mashups”), which may themselves be made accessible as new services
- Enabled by massive infrastructure buildout
  - Google projected to spend $1.5B on computers, networks, and real estate in 2006
  - Dozens of others are spending substantially
- Paid for by advertising
Declan Butler, Nature

3 Science 2.0: E.g., Virtual Observatories
[Figure: users reach data archives through a gateway, with analysis tools and discovery tools. Figure: S. G. Djorgovski]

4 Science 2.0: E.g., Cancer Bioinformatics Grid
[Figure: a BPEL engine at uchicago.edu accepts <Workflow Inputs> and a <BPEL Workflow Doc>, links analytic services at osu.edu and duke.edu with data services, and returns <Workflow Results>.]
caBIG; BPEL work: Ravi Madduri et al.
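The pattern in the figure is service composition: an engine feeds the output of one analytic service into the next. A BPEL engine expresses this declaratively; the following is only a minimal plain-Python sketch of the same dataflow, with hypothetical endpoints and payloads, not the caBIG implementation.

```python
# Sketch of composing two remote analytic services in sequence.
# The URLs and JSON payload shapes are illustrative placeholders.
import json
from urllib import request

def call_service(url: str, payload: dict) -> dict:
    """POST JSON to an analytic service and return its JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def run_workflow(inputs: dict) -> dict:
    # Step 1: first analytic service (hypothetical endpoint).
    step1 = call_service("https://analytic1.example.org/api", inputs)
    # Step 2: second service consumes the first service's output.
    step2 = call_service("https://analytic2.example.org/api", step1)
    return step2  # plays the role of <Workflow Results> in the figure
```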

5 The Two Dimensions of Science 2.0
- Decompose across network: clients integrate dynamically
  - Select & compose services
  - Select “best of breed” providers
  - Publish result as new services
- Decouple resource & service providers (function vs. resource)
[Figure: users, data archives, analysis tools, discovery tools. Fig: S. G. Djorgovski]

6 Technology Requirements: Integration & Decomposition
- Service-oriented Grid infrastructure
  - Provision physical resources to support application workloads
- Service-oriented applications
  - Wrap applications & data as services (see the sketch below)
  - Compose services into workflows
[Figure: users drive application services via composition, workflows, and invocation, on provisioned infrastructure.]
“The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005
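To make “wrap applications as services” concrete, here is a minimal sketch that exposes a command-line tool behind an HTTP endpoint using only the Python standard library. The wrapped command and the request format are hypothetical stand-ins; real Grid services add security, state, and resource management on top.

```python
# Sketch: wrap an application as a simple JSON-over-HTTP service.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppService(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        params = json.loads(body)
        # Run the wrapped application with the caller's input.
        # "echo" is a stand-in for a real scientific code.
        result = subprocess.run(
            ["echo", params.get("input", "")],
            capture_output=True, text=True,
        )
        reply = json.dumps({"output": result.stdout}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), AppService).serve_forever()
```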

7 Globus Software Enables Grid Infrastructure
- Web service interfaces for behaviors relating to integration and decomposition
  - Primitives: resources, state, security
  - Services: execution, data movement, …
- Open source software that implements those interfaces
  - In particular, Globus Toolkit (GT4)
- All standard Web services
  - “Grid is a use case for Web services, focused on resource management”

8 Open Source Grid Software: Globus Toolkit v4
- Data Mgmt: GridFTP, Reliable File Transfer, Data Access & Integration, Data Replication, Replica Location
- Security: Authentication & Authorization, Community Authorization, Delegation, Credential Mgmt
- Execution Mgmt: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol
- Info Services: Index, Trigger, WebMDS
- Common Runtime: Java Runtime, C Runtime, Python Runtime
“Globus Toolkit Version 4: Software for Service-Oriented Systems”, LNCS 3779, 2-13, 2005

9 dev.globus: Community Driven Improvement of Globus Software
- Guidelines (Apache)
- Infrastructure (CVS, bugzilla, Wiki)
- Projects include …
NSF OCI

10 Hosted Science Services
1) Integrate services from external sources
   - Virtualize “services” from providers
2) Coordinate & compose
   - Create new services from existing ones
[Figure: a community services provider combines content, services, and capacity drawn from capacity providers.]
“Service-Oriented Science”, Science, 2005

11 The Globus-Based LIGO Data Grid
- Replicating >1 Terabyte/day to 8 sites
- >40 million replicas so far
- MTBF = 1 month
[Figure: map of LIGO Gravitational Wave Observatory replication sites, including Birmingham, Cardiff, and AEI/Golm.]

12 Data Replication Service
- Pull “missing” files to a storage system (sketched below)
[Figure: given a list of required files, the Data Replication Service consults the Replica Location Index and Local Replica Catalogs (data location), then drives the Reliable File Transfer Service and GridFTP (data movement) to replicate the data.]
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
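The control flow the figure describes is simple: for each required file not already present, locate a replica, transfer it reliably, and register the new copy. A minimal sketch follows, in which `local_catalog`, `replica_index`, and `transfer` are hypothetical stand-ins for the Local Replica Catalog, Replica Location Index, and RFT/GridFTP, not the actual Globus APIs.

```python
# Sketch of the pull-"missing"-files pattern of the Data Replication Service.
def replicate(required_files, local_catalog, replica_index, transfer):
    for logical_name in required_files:
        if logical_name in local_catalog:
            continue                      # replica already present locally
        # Data location: find remote sites that hold a replica.
        sources = replica_index.lookup(logical_name)
        if not sources:
            raise LookupError(f"no replica found for {logical_name}")
        # Data movement: reliable transfer from the first available source.
        dest = f"/storage/{logical_name}"
        transfer(src=sources[0], dest=dest)
        # Register the new local replica so later runs skip it.
        local_catalog.add(logical_name, dest)
```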

13 Decomposition Enables Separation of Concerns & Roles
- User to service provider: “Provide access to data D at S1, S2, S3 with performance P”
- Service provider to resource provider: “Provide storage with performance P1, network with P2, …”
[Figure: data D replicated to sites S1, S2, S3 via a replica catalog, user-level multicast, etc.]

14 Example: Biology
- Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
- Back-office analysis on Grid: millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid
Natalia Maltsev et al.

15 Example: Earth System Grid
- Climate simulation data
  - Per-collection control
  - Different user classes
  - Server-side processing
- Implementation (GT)
  - Portal-based User Registration (PURSE)
  - PKI, SAML assertions
  - GridFTP, GRAM, SRM
- >2000 users
- >100 TB downloaded
DOE OASCR

16 Under the Covers

17 Example: Astro Portal Stacking Service
- Purpose
  - On-demand “stacks” of random locations within ~10 TB dataset
- Challenge
  - Rapid access to K “random” files
  - Time-varying load
- Solution
  - Dynamic acquisition of compute & storage
[Figure: cutouts from Sloan data are summed into a stacked image, requested via a Web page or Web service.]
Joint work with Ioan Raicu & Alex Szalay
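The stacking operation itself is just summing small cutouts taken at given sky locations so that a faint shared signal accumulates while noise averages down. A minimal sketch, with hypothetical file layout and cutout size; the real service's contribution is the dynamic resource acquisition around this core.

```python
# Sketch of image stacking over per-location cutouts.
import numpy as np

def stack(images, centers, half=10):
    """Sum (2*half)x(2*half) cutouts taken at `centers` from `images`."""
    acc = np.zeros((2 * half, 2 * half))
    for img, (x, y) in zip(images, centers):
        acc += img[y - half:y + half, x - half:x + half]
    return acc

# Toy example with synthetic arrays standing in for Sloan image files.
rng = np.random.default_rng(0)
imgs = [rng.normal(size=(100, 100)) for _ in range(5)]
locs = [(50, 50)] * 5
stacked = stack(imgs, locs)  # noise averages down; a shared source adds up
```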

18 Preliminary Performance (TeraGrid, LAN GPFS) Joint work with Ioan Raicu & Alex Szalay

19 Example: CyberShake
- Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast
[Figure: pipeline from Rupture Forecast → Strain Green Tensor → Synthetic Seismogram → Spectral Acceleration → Hazard Curve → Hazard Map.]
Tom Jordan et al., Southern California Earthquake Center
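The last step of that pipeline can be illustrated with a standard probabilistic-hazard computation: for each ground-motion threshold, sum the annual rates of the ruptures whose simulated spectral acceleration exceeds it. This is only a hedged sketch of that general technique, with toy inputs standing in for CyberShake's seismogram post-processing.

```python
# Sketch: hazard curve as annual exceedance rates over thresholds.
import numpy as np

def hazard_curve(spectral_accels, annual_rates, thresholds):
    """Annual exceedance rate at each threshold (all args: 1-D sequences)."""
    sa = np.asarray(spectral_accels)
    rates = np.asarray(annual_rates)
    return np.array([rates[sa > t].sum() for t in thresholds])

# Toy example: three ruptures, increasing ground-motion thresholds.
curve = hazard_curve([0.1, 0.4, 0.9], [1e-2, 1e-3, 1e-4],
                     thresholds=[0.05, 0.3, 0.6])
# curve[0] counts all three ruptures; curve[2] only the strongest one.
```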

20 Enlisting TeraGrid Resources
- 20 TB, 1.8 CPU-years
[Figure: a workflow scheduler/engine and VO scheduler coordinate the VO service catalog, provenance catalog, and data catalog with SCEC storage and TeraGrid compute & storage.]
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute

21 Science 1.0 → Science 2.0
- Gigabytes → Terabytes
- Tarballs → Services
- Journals → Wikis
- Individuals → Communities
- Community codes → Science gateways
- Supercomputer centers → TeraGrid, OSG, campus
- Makefile → Workflow
- Computational science → Science as computation
- Physical sciences → All sciences (& humanities)
- Computational scientists → All scientists
- NSF-funded → NSF-funded

22 Science 2.0 Challenges
- A need for new technologies, skills, & roles
  - Creating, publishing, hosting, discovering, composing, archiving, explaining … services
- A need for substantial software development
  - “30-80% of modern astronomy projects is software” (S. G. Djorgovski)
- A need for more & different infrastructure
  - Computers & networks to host services
- Can we leverage commercial spending?
  - To some extent, but not straightforward

23 Acknowledgements
- Carl Kesselman for many discussions
- Many colleagues, including those named on slides, for research collaborations and/or slides
- Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
- Globus Alliance members for Globus software R&D
- DOE OASCR, NSF OCI, & NIH for support

24 For More Information
- Globus Alliance
- Dev.Globus: dev.globus.org
- Open Science Grid
- TeraGrid
- Background: 2nd Edition
Thanks to DOE, NSF, and NIH for research support!!