Grid Computing: Concepts, Applications, and Technologies

Presentation transcript:

Grid Computing: Concepts, Applications, and Technologies Dheeraj Bhardwaj Department of Computer Science and Engineering Indian Institute of Technology, Delhi

Outline The technology landscape Grid computing The Globus Toolkit Applications and technologies Data-intensive; distributed computing; collaborative; remote access to facilities Grid infrastructure Open Grid Services Architecture Global Grid Forum Summary and conclusions

Living in an Exponential World: (1) Computing & Sensors Moore’s Law: transistor count doubles every 18 months [Figure: magnetohydrodynamics, star formation]

Living in an Exponential World: (2) Storage Storage density doubles every 12 months Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes) 2000 ~0.5 petabyte 2005 ~10 petabytes 2010 ~100 petabytes 2015 ~1000 petabytes? Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?

Data Intensive Physical Sciences High energy & nuclear physics Including new experiments at CERN Gravity wave searches LIGO, GEO, VIRGO Time-dependent 3-D systems (simulation, data) Earth Observation, climate modeling Geophysics, earthquake modeling Fluids, aerodynamic design Pollutant dispersal scenarios Astronomy: Digital sky surveys

Ongoing Astronomical Mega-Surveys Large number of new surveys Multi-TB in size, 100M objects or larger In databases Individual archives planned and under way Multi-wavelength view of the sky > 13 wavelength coverage within 5 years Impressive early discoveries Finding exotic objects by unusual colors L,T dwarfs, high redshift quasars Finding objects by time variability Gravitational micro-lensing MACHO 2MASS SDSS DPOSS GSC-II COBE MAP NVSS FIRST GALEX ROSAT OGLE ...

Coming Floods of Astronomy Data The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008! All-sky survey every few days, so will have fine-grain time series for the first time

Data Intensive Biology and Medicine Medical data X-Ray, mammography data, etc. (many petabytes) Digitizing patient records (ditto) X-ray crystallography Molecular genomics and related disciplines Human Genome, other genome databases Proteomics (protein structure, activities, …) Protein interactions, drug delivery Virtual Population Laboratory (proposed) Simulate likely spread of disease outbreaks Brain scans (3-D, time dependent)

A Brain is a Lot of Data! (Mark Ellisman, UCSD) And comparisons must be made among many We need to get to one micron to know location of every cell. We’re just now starting to get to 10 microns – Grids will help get us there and further

An Exponential World: (3) Networks (Or, Coefficients Matter …) Network vs. computer performance Computer speed doubles every 18 months Network speed doubles every 9 months Difference = order of magnitude per 5 years 1986 to 2000 Computers: x 500 Networks: x 340,000 2001 to 2010 Computers: x 60 Networks: x 4000 [Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.]
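The doubling periods quoted on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes idealized, constant doubling periods of 18 and 9 months.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth over `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


COMPUTER_DOUBLING_MONTHS = 18  # from the slide
NETWORK_DOUBLING_MONTHS = 9    # from the slide


def network_vs_computer_gap(years: float) -> float:
    """How much faster networks grow than computers over `years`."""
    months = years * 12
    return (growth_factor(months, NETWORK_DOUBLING_MONTHS)
            / growth_factor(months, COMPUTER_DOUBLING_MONTHS))


if __name__ == "__main__":
    for label, years in [("5 years", 5), ("1986 to 2000", 14), ("2001 to 2010", 9)]:
        months = years * 12
        computers = growth_factor(months, COMPUTER_DOUBLING_MONTHS)
        networks = growth_factor(months, NETWORK_DOUBLING_MONTHS)
        print(f"{label}: computers x{computers:,.0f}, networks x{networks:,.0f}, "
              f"gap x{network_vs_computer_gap(years):,.1f}")
```

Over 5 years the gap comes out at roughly 10, which is the "order of magnitude per 5 years" on the slide. Over 9 years the idealized factors are 64 and 4096, close to the quoted x 60 and x 4000. The 14-year figures land in the same order of magnitude as x 500 and x 340,000, but not exactly on them.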

Evolution of the Scientific Process Pre-electronic Theorize &/or experiment, alone or in small teams; publish paper Post-electronic Construct and mine very large databases of observational or simulation data Develop computer simulations & analyses Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams

Evolution of Business Pre-Internet Central corporate data processing facility Business processes not compute-oriented Post-Internet Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B) Outsourcing becomes feasible => service providers of various sorts Business processes increasingly computing- and data-rich

The Grid “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

A Comparison SERIAL Fetch/Store Compute PARALLEL Fetch/Store Compute/ communicate Cooperative game GRID Fetch/Store Discovery of Resources Interaction with remote application Authentication / Authorization Security Compute/Communicate Etc

Distributed Computing vs. GRID Grid is an evolution of distributed computing Dynamic Geographically independent Built around standards Internet backbone Distributed computing is an “older term” Typically built around proprietary software and network Tightly coupled systems/organization

Web vs. GRID Web - Uniform naming and access to documents Grid - Uniform, high performance access to computational resources [Figure labels: Software Catalogs, Sensor nets, Colleges/R&D Labs]

Is the World Wide Web a Grid ? Seamless naming? Yes Uniform security and Authentication? No Information Service? Yes or No Co-Scheduling? No Accounting & Authorization ? No User Services? No Event Services? No Is the Browser a Global Shell ? No

What does the World Wide Web bring to the Grid? Uniform Naming A seamless, scalable information service A powerful new meta-data language: XML XML will be standard language for describing information in the grid SOAP: Simple Object Access Protocol Uses XML for encoding, HTTP for protocol SOAP may become a standard RPC mechanism for Grid services Portal Ideas
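To make the "XML for encoding, HTTP for transport" point concrete, here is a minimal sketch that builds a SOAP 1.1 request with Python's standard library. The service namespace `urn:example:grid-service`, the `submitJob` operation, and its `executable` parameter are hypothetical and chosen only for illustration. They are not part of any Grid specification or of the Globus Toolkit. The request is built but not sent.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-service"                    # hypothetical service namespace


def build_soap_request(operation: str, params: Dict[str, str]) -> bytes:
    """Encode an RPC-style call as a SOAP envelope (the XML part)."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def http_headers(operation: str) -> Dict[str, str]:
    """Headers an HTTP POST carrying the envelope would use (the HTTP part)."""
    return {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{SERVICE_NS}#{operation}"',
    }


if __name__ == "__main__":
    payload = build_soap_request("submitJob", {"executable": "/bin/hostname"})
    print(http_headers("submitJob"))
    print(payload.decode("utf-8"))
```

The envelope is the payload of an ordinary HTTP POST, which is why SOAP could ride on existing Web infrastructure.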

The Ultimate Goal In future I will not know or care where my application will be executed as I will acquire and pay to use these resources as I need them

Why Grids? Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.

An Example Virtual Organization: CERN’s Large Hadron Collider 1800 Physicists, 150 Institutes, 32 Countries 100 PB of data by 2010; 50,000 CPUs?

Grid Communities & Applications: Data Grids for High Energy Physics
[Figure: tiered data grid for high energy physics. Labels recovered from the diagram follow.]
Sites: Online System; Offline Processor Farm ~20 TIPS; CERN Computer Centre; HPSS; FermiLab ~4 TIPS; France Regional Centre; Italy Regional Centre; Germany Regional Centre; Caltech ~1 TIPS; Tier2 Centre ~1 TIPS; Institute ~0.25 TIPS; Physics data cache; Physicist workstations; Pentium II 300 MHz
Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
Link rates: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec; ~622 Mbits/sec or Air Freight (deprecated); ~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs. There are 100 “triggers” per second. Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
www.griphyn.org www.ppdg.net www.eu-datagrid.org
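The rates quoted on this slide fit together with simple arithmetic. The sketch below uses only the slide's numbers (25 ns bunch spacing, 100 triggers per second, ~1 MByte per event, 622 Mbits/sec links). The function names are invented for this example, and decimal units with 8 bits per byte are assumed.

```python
BUNCH_CROSSING_NS = 25       # spacing between bunch crossings, from the slide
TRIGGERS_PER_SECOND = 100    # triggered (kept) events per second, from the slide
EVENT_SIZE_MBYTES = 1.0      # approximate size of one triggered event, from the slide
LINK_MBITS_PER_SEC = 622     # wide-area link rate, from the slide


def crossings_per_second(spacing_ns: float) -> float:
    """Number of bunch crossings per second for a given spacing in nanoseconds."""
    return 1e9 / spacing_ns


def recorded_mbytes_per_second(triggers_per_second: float, event_mbytes: float) -> float:
    """Data rate of the triggered event stream."""
    return triggers_per_second * event_mbytes


def link_mbytes_per_second(mbits_per_second: float) -> float:
    """Convert a link rate from Mbits/sec to MBytes/sec (8 bits per byte)."""
    return mbits_per_second / 8


if __name__ == "__main__":
    crossings = crossings_per_second(BUNCH_CROSSING_NS)
    print(f"bunch crossings per second: {crossings:,.0f}")
    print(f"crossings per kept event:   {crossings / TRIGGERS_PER_SECOND:,.0f}")
    print(f"recorded stream:            "
          f"{recorded_mbytes_per_second(TRIGGERS_PER_SECOND, EVENT_SIZE_MBYTES):.0f} MBytes/sec")
    print(f"one 622 Mbits/sec link:     "
          f"{link_mbytes_per_second(LINK_MBITS_PER_SEC):.2f} MBytes/sec")
```

With these inputs, about one crossing in 400,000 is kept, the recorded stream is 100 MBytes/sec (matching the ~100 MBytes/sec label in the figure), and a single 622 Mbits/sec link carries just under 78 MBytes/sec.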

Intelligent Infrastructure: Distributed Servers and Services

The Grid: A Brief History Early 90s Gigabit testbeds, metacomputing Mid to late 90s Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments 2002 Dozens of application communities & projects Major infrastructure deployments Significant technology base (esp. Globus Toolkit™) Growing industrial interest Global Grid Forum: ~500 people, 20+ countries

The Grid World: Current Status Dozens of major Grid projects in scientific & technical computing/research & education www.mcs.anl.gov/~foster/grid-projects Considerable consensus on key concepts and technologies Open source Globus Toolkit™ a de facto standard for major protocols & services Industrial interest emerging rapidly IBM, Platform, Microsoft, Sun, Compaq, … Opportunity: convergence of eScience and eBusiness requirements & technologies

Grid Technologies: Resource Sharing Mechanisms That … Address security and policy concerns of resource owners and users Are flexible enough to deal with many resource types and sharing modalities Scale to large number of resources, many participants, many program components Operate efficiently when dealing with large amounts of data & computation

Aspects of the Problem Need for interoperability when different groups want to share resources Diverse components, policies, mechanisms E.g., standard notions of identity, means of communication, resource descriptions Need for shared infrastructure services to avoid repeated development, installation E.g., one port/service/protocol for remote access to computing, not one per tool/application E.g., Certificate Authorities: expensive to run A common need for protocols & services

The Hourglass Model Focus on architecture issues Propose set of core services as basic infrastructure Use to construct high-level, domain-specific solutions Design principles Keep participation cost low Enable local control Support for adaptation “IP hourglass” model [Figure labels: Applications; Diverse global services; Core services; Local OS]

Layered Grid Architecture (By Analogy to Internet Architecture)
Grid layers, top to bottom:
Application
Collective “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource “Sharing single resources”: negotiating access, controlling use
Connectivity “Talking to things”: communication (Internet protocols) & security
Fabric “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture layers shown alongside: Application, Transport, Internet, Link
Notes: We define Grid architecture in terms of a layered collection of protocols. The Fabric layer includes the protocols and interfaces that provide access to the resources that are being shared, including computers, storage systems, datasets, programs, and networks. This layer is a logical view rather than a physical view. For example, the view of a cluster with a local resource manager is defined by the local resource manager, and not the cluster hardware. Likewise, the fabric provided by a storage system is defined by the file system that is available on that system, not the raw disk or tapes. The Connectivity layer defines core protocols required for Grid-specific network transactions. This layer includes the IP protocol stack (system-level application protocols [e.g. DNS, RSVP, routing], transport and internet layers), as well as core Grid security protocols for authentication and authorization. The Resource layer defines protocols to initiate and control sharing of (local) resources. Services defined at this level include the gatekeeper and GRIS, along with some user-oriented application protocols from the Internet protocol suite, such as file transfer. The Collective layer defines protocols that provide system-oriented capabilities that are expected to be wide scale in deployment and generic in function. This includes GIIS, bandwidth brokers, resource brokers, … The Application layer defines protocols and services that are parochial in nature, targeted towards a specific application domain or class of applications.
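The layer descriptions above can be written down as a small lookup table. This is only an illustrative sketch: the `Layer` class and `layer_of` helper are invented for this example, and the example entries are limited to the items named in the slide notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    role: str
    examples: Tuple[str, ...]


# Ordered top to bottom, as on the slide.
LAYERS = (
    Layer("Application", "domain-specific protocols and services", ()),
    Layer("Collective", "coordinating multiple resources",
          ("GIIS", "bandwidth broker", "resource broker")),
    Layer("Resource", "sharing single resources",
          ("gatekeeper", "GRIS", "file transfer")),
    Layer("Connectivity", "communication and security",
          ("IP protocol stack", "DNS", "RSVP", "authentication", "authorization")),
    Layer("Fabric", "local access to and control of resources",
          ("computers", "storage systems", "datasets", "programs", "networks")),
)


def layer_of(example: str) -> Optional[str]:
    """Return the name of the layer that lists `example`, or None if it is not listed."""
    wanted = example.lower()
    for layer in LAYERS:
        if any(wanted == item.lower() for item in layer.examples):
            return layer.name
    return None


if __name__ == "__main__":
    for item in ("GRIS", "GIIS", "DNS", "storage systems"):
        print(f"{item}: {layer_of(item)}")
```

In this table, GRIS (information about a single resource) sits in the Resource layer, while GIIS (an index across resources) sits in the Collective layer.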

Globus Toolkit™ A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications Offer a modular set of orthogonal services Enable incremental development of grid-enabled tools and applications Implement standard Grid protocols and APIs Available under liberal open source license Large community of developers & users Commercial support

General Approach Define Grid protocols & APIs Protocol-mediated access to remote resources Integrate and extend existing standards “On the Grid” = speak “Intergrid” protocols Develop a reference implementation Open source Globus Toolkit Client and server SDKs, services, tools, etc. Grid-enable wide variety of tools Globus Toolkit, FTP, SSH, Condor, SRB, MPI, … Learn through deployment and applications

Key Protocols The Globus Toolkit™ centers around four key protocols Connectivity layer: Security: Grid Security Infrastructure (GSI) Resource layer: Resource Management: Grid Resource Allocation Management (GRAM) Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP) Data Transfer: Grid File Transfer Protocol (GridFTP) Also key collective layer protocols Info Services, Replica Management, etc.
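The protocol list on this slide can be grouped by architecture layer. The sketch below is a plain data table built only from the names on the slide. The `Protocol` tuple and `by_layer` helper are invented for illustration and are not part of any Globus API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Protocol(NamedTuple):
    acronym: str
    full_name: str
    function: str
    layer: str


# Names, functions, and layers exactly as listed on the slide.
PROTOCOLS = [
    Protocol("GSI", "Grid Security Infrastructure", "Security", "Connectivity"),
    Protocol("GRAM", "Grid Resource Allocation Management", "Resource Management", "Resource"),
    Protocol("GRIP", "Grid Resource Information Protocol", "Information Services", "Resource"),
    Protocol("GIIP", "Index Information Protocol", "Information Services", "Resource"),
    Protocol("GridFTP", "Grid File Transfer Protocol", "Data Transfer", "Resource"),
]


def by_layer(protocols: Iterable[Protocol]) -> Dict[str, List[str]]:
    """Group protocol acronyms by the layer they belong to, preserving slide order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for protocol in protocols:
        grouped[protocol.layer].append(protocol.acronym)
    return dict(grouped)


if __name__ == "__main__":
    for layer, acronyms in by_layer(PROTOCOLS).items():
        print(f"{layer}: {', '.join(acronyms)}")
```

Grouped this way, security is the only Connectivity-layer entry; resource management, information services, and data transfer all sit in the Resource layer.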