Introduction to Grid Computing. CI-Days workshop, Clemson University, Clemson, SC, May 19, 2008.

Presentation transcript:

Introduction to Grid Computing CI-Days workshop, Clemson University, Clemson, SC, May 19, 2008 1

Scaling up Science: Citation Network Analysis in Sociology Work of James Evans, University of Chicago, Department of Sociology 2

Scaling up the analysis: query and analysis of 25+ million citations. Work started on desktop workstations, and queries grew to month-long duration. With data distributed across the U of Chicago TeraPort cluster, 50 (faster) CPUs gave a 100x speedup, and many more methods and hypotheses can be tested. Higher throughput and capacity enable deeper analysis and broader community access. 3

Computing clusters have commoditized supercomputing. [Cluster diagram: a cluster management "frontend", tape backup robots, I/O servers (typically a RAID fileserver) with disk arrays, lots of worker nodes, and a few head nodes, gatekeepers, and other service nodes.] 4

Initial Grid driver: High Energy Physics. There is a "bunch crossing" every 25 nsec and about 100 "triggers" per second; each triggered event is ~1 MByte in size. Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. [Diagram of the LHC tiered computing model, image courtesy Harvey Newman, Caltech: the online system feeds the physics data cache at ~PBytes/sec and the CERN computer centre's offline processor farm (~20 TIPS, Tier 0) at ~100 MBytes/sec; Tier 1 regional centres (FermiLab ~4 TIPS, plus France, Italy, and Germany) connect at ~622 Mbits/sec or by air freight (deprecated); Tier 2 centres (~1 TIPS each, e.g. Caltech) connect at ~622 Mbits/sec; institutes (~0.25 TIPS) feed physicist workstations (Tier 4) at ~1 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents.] 5

Grids can process vast datasets. Many HEP and astronomy experiments consist of large datasets as inputs (find datasets), "transformations" which work on the input datasets (process), and output datasets (store and publish). The emphasis is on sharing the large datasets; independent programs can be run in parallel. [Figure: Montage workflow, ~1200 jobs, 7 levels; NVO, NASA, ISI/Pegasus, Deelman et al.; mosaic of M42 created on TeraGrid; the legend distinguishes data transfers from compute jobs.] 6

PUMA: Analysis of Metabolism. The PUMA Knowledge Base holds information about proteins analyzed against ~2 million gene sequences. Analysis on the Grid involves millions of BLAST, BLOCKS, and other processes. Natalia Maltsev et al. 7

Mining seismic data for hazard analysis (Southern California Earthquake Center). [Diagram: a seismic hazard model combining seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure, and rupture dynamics.] 8

Grids Provide Global Resources To Enable e-Science 9

Ian Foster's Grid Checklist. A Grid is a system that: (1) coordinates resources that are not subject to centralized control; (2) uses standard, open, general-purpose protocols and interfaces; (3) delivers non-trivial qualities of service. 10

Virtual Organizations (VOs) are groups of organizations that use the Grid to share resources for specific purposes. A VO supports a single community, and its members deploy compatible technology and agree on working policies (security policies are especially difficult). VOs deploy different network-accessible services: grid information, grid resource brokering, grid monitoring, and grid accounting. 11

What is a grid made of? Middleware. [Diagram: a grid client (application, user interface, grid middleware) speaks grid protocols to resource, workflow, and data catalogs, to grid storage, and to computing clusters running grid middleware; the resources may be dedicated by UC, IU, and Boston, purchased from a commercial provider, or shared by OSG, LCG, and NorduGrid.] The key middleware functions: security to control access and protect communication (GSI); a directory to locate grid sites and services (VORS, MDS); a uniform interface to computing sites (GRAM); a facility to maintain and schedule queues of work (Condor-G); a fast and secure data set mover (GridFTP, RFT); and a directory to track where datasets live (RLS). 12

We will focus on Globus and Condor. Condor provides both client- and server-side scheduling; in grids, Condor provides an agent to queue, schedule, and manage work submission. The Globus Toolkit (GT) provides the lower-level middleware: client tools which you can use from a command line; APIs (scripting languages, C, C++, Java, ...) to build your own tools or call directly from applications; web service interfaces; and higher-level tools built from these basic components, e.g. Reliable File Transfer (RFT). 13

Grid software is like a "stack". [Stack diagram: Grid Application (often includes a portal); workflow system (explicit or ad hoc); Condor grid client; job management, data management, and information (config & status) services; Grid Security Infrastructure; core Globus services; standard network protocols.] 14

Provisioning: Grid architecture is evolving to a service-oriented approach. A service-oriented Grid infrastructure provisions physical resources to support application workloads; service-oriented applications wrap applications as services and compose them into workflows. [Diagram: users, application services, workflows, composition, invocation.] This is beyond our workshop's scope; see "The Many Faces of IT as Service" (Foster, Tuecke) and "Service-Oriented Science" by Ian Foster. 15

Job and resource management 16

Local Resource Manager (LRM): a batch scheduler for running jobs on a computing cluster. Popular LRMs include PBS (Portable Batch System), LSF (Load Sharing Facility), SGE (Sun Grid Engine), and Condor (originally for cycle scavenging, Condor has evolved into a comprehensive workload management system). LRMs execute on the cluster's head node. The simplest "LRM" lets you "fork" jobs quickly: it runs on the head node (gatekeeper) for fast utility functions and does no queuing (though throttling of heavy loads is emerging). In GRAM, each LRM is handled by a "job manager". 17
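
To make the LRM layer concrete, here is a minimal sketch of submitting a job directly to a PBS-style scheduler from the cluster's head node; the script name, resource limits, and executable are hypothetical, and Condor or SGE would use different commands and directives.

# Create a minimal PBS batch script (hypothetical job and resource limits).
cat > myjob.pbs <<'EOF'
#PBS -N myjob
#PBS -l nodes=1,walltime=00:10:00
cd $PBS_O_WORKDIR
./myjob input.dat > output.dat
EOF

# Submit to the local scheduler and check the queue.
qsub myjob.pbs
qstat -u $USER

In effect, a GRAM job manager generates this kind of LRM-specific submission from a single remote job request.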

GRAM: Globus Resource Allocation Manager. [Diagram: a user in Organization A runs "globusrun myjob ...", which speaks the Globus GRAM protocol to a gatekeeper and job manager in Organization B, which in turn submits the job to the local LRM.] 18
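
As a rough illustration of the "globusrun myjob ..." step in the diagram, a pre-WS GRAM submission names a resource contact and an RSL job description; the gatekeeper contact below is hypothetical.

# Run /bin/hostname via the remote gatekeeper's default job manager,
# streaming output back to the client (-o). The contact string is hypothetical.
globusrun -o -r gatekeeper.example.edu/jobmanager '&(executable=/bin/hostname)(count=1)'

The globus-job-run wrapper (next example) offers a friendlier command-line syntax over the same protocol.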

GRAM provides a uniform interface to diverse cluster schedulers. [Diagram: a user in a grid VO submits through GRAM to sites A, B, C, and D, which run Condor, PBS, LSF, and UNIX fork() respectively.] 19
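
The uniformity is visible in the GRAM contact string: with the pre-WS tools, only the job manager suffix changes when the target site runs a different local scheduler. The hostnames below are hypothetical.

# The same command sent to three sites with three different LRMs.
globus-job-run siteA.example.edu/jobmanager-condor /bin/hostname
globus-job-run siteB.example.edu/jobmanager-pbs /bin/hostname
globus-job-run siteC.example.edu/jobmanager-fork /bin/hostname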

Condor-G: Grid Job Submission Manager. [Diagram: Condor-G in Organization A queues myjob1 through myjob5 and submits them via the Globus GRAM protocol to the gatekeeper and job manager in Organization B, which submits to the local LRM.] 20
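
A minimal sketch of a Condor-G submission, assuming a pre-WS GRAM (gt2) gatekeeper; the gatekeeper address, executable, and file names are hypothetical.

# Describe the job in a Condor submit file that targets a remote GRAM gatekeeper.
cat > myjob.sub <<'EOF'
universe      = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
executable    = myjob
arguments     = input.dat
output        = myjob.out
error         = myjob.err
log           = myjob.log
queue
EOF

# Submit through Condor-G and watch the local queue of grid jobs.
condor_submit myjob.sub
condor_q

Condor-G then queues, submits, and monitors the remote GRAM job on the user's behalf, as the diagram suggests.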

Data Management 21

Data management services provide the mechanisms to find, move, and share data. GridFTP: fast, flexible, secure, ubiquitous data transport, often embedded in higher-level services. RFT: a reliable file transfer service using GridFTP. Replica Location Service (RLS): tracks multiple copies of data for speed and reliability. Emerging services integrate replication and transport, and metadata management services are evolving. 22

GridFTP is secure, reliable, and fast. Security comes from GSI (authentication and authorization, with optional encryption). Reliability comes from restarting failed transfers. Speed comes from tunable TCP buffers, parallel transfers, and striping across multiple endpoints. Client tools include globus-url-copy, uberftp, and custom clients. 23
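
The speed features map onto globus-url-copy options. A hedged sketch (endpoint and paths hypothetical): parallel streams and a larger TCP buffer often help on high-latency wide-area links.

# Four parallel TCP streams (-p 4) and a 2 MB TCP buffer (-tcp-bs),
# copying a local file to a remote GridFTP server.
globus-url-copy -p 4 -tcp-bs 2097152 \
  file:///home/YOURLOGIN/bigfile \
  gsiftp://gridftp.example.edu/data/YOURLOGIN/bigfile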

GridFTP examples:

# Copy a local file to a remote GridFTP server.
globus-url-copy file:///home/YOURLOGIN/dataex/myfile gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex1

# Third-party transfer between two remote GridFTP servers.
globus-url-copy gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex2 gsiftp://tp-osg.ci.uchicago.edu/YOURLOGIN/ex3

Grids replicate data files for faster access: grid resources are used effectively (more parallelism), each logical file can have multiple physical copies, and single points of failure are avoided. Replication can be manual or automatic; automatic replication considers the demand for a file, transfer bandwidth, etc. 25

File catalogues tell you where the data is. File catalog services include the Replica Location Service (RLS), PhEDEx, and RefDB/PupDB. Requirements: abstract the logical file name (LFN) for a physical file, maintain the mappings between LFNs and PFNs (physical file names), and maintain the location information of each file. 26

RLS: Replica Location Service. RLS is the replica-catalog component of the data grid architecture (a Globus component). It provides access to mapping information from logical names to the physical names of items. Its goals are to reduce access latency, improve data locality, and improve robustness, scalability, and performance for distributed applications. RLS maintains local replica catalogs (LRCs), which record mappings between logical and physical files scattered across the storage system; for better performance, the LRCs can be indexed.

RLS maps logical filenames to physical filenames. A Logical Filename (LFN) names a file with interesting data in it and does not refer to location (which host, or where on a host). A Physical Filename (PFN) refers to a file on some filesystem somewhere, often specified as a gsiftp:// URL. There are two RLS catalogs: the Local Replica Catalog (LRC) and the Replica Location Index (RLI).

The Local Replica Catalog (LRC) stores mappings from LFNs to PFNs. Interaction: Q: Where can I get filename 'experiment_result_1'? A: You can get it from gsiftp://gridlab1.ci.uchicago.edu/home/benc/r.txt. It is undesirable to have a single LRC for the whole grid: that is a lot of data and a single point of failure.

The Replica Location Index (RLI) stores mappings from LFNs to LRCs. Interaction: Q: Who can tell me about filename 'experiment_result_1'? A: You can get more information from the LRC at gridlab1 (then go ask that LRC). Failure of one RLI or LRC doesn't break everything, and the RLI stores a reduced set of information, so it can cope with many more mappings.
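
To make the LRC/RLI interaction concrete, here is a hedged sketch using the globus-rls-cli client; the RLI address is hypothetical, the PFN is the one from the example above, and exact command spellings can differ between RLS versions.

# Register an LFN-to-PFN mapping in the local LRC.
globus-rls-cli create experiment_result_1 \
  gsiftp://gridlab1.ci.uchicago.edu/home/benc/r.txt \
  rls://gridlab1.ci.uchicago.edu

# Ask the LRC where that logical file lives.
globus-rls-cli query lrc lfn experiment_result_1 rls://gridlab1.ci.uchicago.edu

# Ask an RLI which LRCs know about it (hypothetical RLI endpoint).
globus-rls-cli query rli lfn experiment_result_1 rls://rli.example.edu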

Globus RLS example. [Diagram: two sites each run an LRC and an RLI. At site A (rls://serverA:39281) the LRC maps file1 and file2 to gsiftp://serverA/file1 and gsiftp://serverA/file2, while its RLI records that file3 and file4 are catalogued at rls://serverB. At site B (rls://serverB:39281) the LRC maps file3 and file4 to gsiftp://serverB/file3 and gsiftp://serverB/file4, while its RLI records that file1 and file2 are catalogued at rls://serverA.]

What is SRM? Storage Resource Managers (SRMs) are middleware components whose function is to provide dynamic space allocation and file management on shared storage resources on the Grid. Different implementations for the underlying storage systems are based on the same SRM specification.

The SRM's role in the data grid architecture: shared storage space allocation and reservation (important for data-intensive applications); getting and putting files from and into those spaces, including files archived on mass storage systems; file transfers from and to remote sites and file replication; negotiating transfer protocols; file and space management with lifetimes; support for non-blocking (asynchronous) requests; directory management; and interoperation with other SRMs.

SRM main concepts: space reservations; dynamic space management; pinning files in spaces; support for an abstract file name (the Site URL) with temporary assignment of file names for transfer (the Transfer URL); directory management and authorization; transfer protocol negotiation; support for peer-to-peer requests; support for asynchronous multi-file requests; support for abort, suspend, and resume operations; and non-interference with local policies.
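
As a hedged example of the SRM interface in practice: SRM client tools (names vary by implementation, e.g. srmcp from the dCache client suite or srm-copy from BeStMan) take srm:// URLs and let the service negotiate the actual transfer protocol. The endpoint, port, and path below are hypothetical, and the exact URL form (service path, SFN parameter) varies by site.

# Fetch a file from an SRM-managed storage element to local disk; the SRM
# service negotiates a transfer protocol (often GridFTP) underneath.
srmcp srm://se.example.edu:8443/data/myset/file1 file:////tmp/file1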

The Globus-based LIGO Data Grid (LIGO Gravitational Wave Observatory): replicating >1 terabyte/day to 8 sites, >40 million replicas so far, MTBF = 1 month. [Map of participating sites, including Birmingham, Cardiff, and AEI/Golm.] 35

Grid Security 36

Grid security is a crucial component. The problems being solved might be sensitive, resources are typically valuable, and resources are located in distinct administrative domains, each with its own policies, procedures, and security mechanisms. The implementation must be broadly available and applicable: standard, well-tested, well-understood protocols, integrated with a wide variety of tools. 37

The Grid Security Infrastructure (GSI) provides secure communications for all the higher-level grid services: secure authentication and authorization. Authentication ensures you are who you claim to be (ID card, fingerprint, passport, username/password); authorization controls what you are permitted to do (run a job, read or write a file). GSI provides uniform credentials and single sign-on: the user authenticates once and can then perform many tasks. 38
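
Single sign-on in practice: the user creates a short-lived proxy credential once, and grid tools (globus-job-run, globus-url-copy, condor_submit, ...) then authenticate with it automatically. A minimal sketch; "myvo" is a hypothetical VO name and is only needed where VOMS attributes are used.

# Create a 12-hour proxy certificate from your long-lived credential
# (prompts for your private-key passphrase once).
grid-proxy-init -valid 12:00

# Inspect the proxy: subject, issuer, remaining lifetime.
grid-proxy-info

# On VOMS-enabled grids (such as OSG VOs), request VO attributes instead:
voms-proxy-init -voms myvo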

National Grid Cyberinfrastructure 39

The Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines. OSG incorporates advanced networking and focuses on general services, operations, and end-to-end performance. It is composed of a large number (>50 and growing) of shared computing facilities, or "sites": a consortium of universities and national laboratories building a sustainable grid infrastructure for science. 40

A diverse job mix: the Open Science Grid spans 70 sites (20,000 CPUs) and growing, running 400 to >1000 concurrent jobs across many applications plus CS experiments, including long-running production operations. 41

TeraGrid provides vast resources via a number of huge computing facilities. 42

To use a Grid efficiently, you must locate and monitor its resources: check the availability of different grid sites, discover different grid services, check the status of jobs, and make better scheduling decisions with information maintained on the "health" of sites. 43

OSG Resource Selection Service: VORS 44

Grid Workflow 45

A typical workflow pattern in image analysis runs many filtering apps. Workflow courtesy James Dobson, Dartmouth Brain Imaging Center 46

Conclusion: Why grids? They support new approaches to inquiry based on deep analysis of huge quantities of data, interdisciplinary collaboration, large-scale simulation and analysis, smart instrumentation, and dynamically assembling the resources to tackle a new scale of problem, all enabled by access to resources and services without regard for location and other barriers. 47

Grids: because science needs community. Teams organized around common goals bring together people, resources, software, data, and instruments. They have diverse membership and capabilities, since expertise in multiple areas is required, and they are geographically and politically distributed, since no single location or organization possesses all the required skills and resources. They must adapt as a function of the situation: adjust membership, reallocate responsibilities, renegotiate resources. 48

What's next? Take the self-paced course: opensciencegrid.org/OnlineGridCourse Contact us: Periodically check the OSG site: opensciencegrid.org

Acknowledgements (OSG, Globus, and Condor teams and associates): Gabrielle Allen, LSU; Rachana Ananthakrishnan, ANL; Ben Clifford, UChicago; Alex Sim, BNL; Alain Roy, UW-Madison; Mike Wilde, ANL. 50