Introduction to Grids Ben Clifford University of Chicago


About Me ● Programmer at University of Chicago Computation Institute ● Open Science Grid – Education, Outreach, Training ● A Developer on Swift – grid workflow system ● Application support for grid users at University of Chicago

Motivation ● You can run lots of science applications on your desktop PC ● Sometimes they get too big ● Too much data to process ●... or... ● Too much computation

Scaling up Science: Citation Network Analysis in Sociology – work of James Evans, University of Chicago, Department of Sociology

Scaling up the analysis ● Query and analysis of 25+ million citations ● Work started on desktop workstations ● Queries grew to month-long duration ● With data distributed across the U of Chicago TeraPort cluster: – 50 (faster) CPUs gave a 100X speedup – many more methods and hypotheses can be tested! ● Higher throughput and capacity enable deeper analysis and broader community access.

High Energy Physics: the LHC computing tiers
● There is a “bunch crossing” every 25 nsecs, and about 100 “triggers” per second; each triggered event is ~1 MByte in size.
● Data flows from the online system and offline processor farm (~20 TIPS) at the CERN Computer Centre (Tier 0), over ~622 Mbit/s links (or air freight, deprecated), to Tier 1 regional centres: FermiLab (~4 TIPS) and regional centres in France, Italy and Germany; then to Tier 2 centres (~1 TIPS each, e.g. Caltech), institute servers (~0.25 TIPS), and physicist workstations.
● Bandwidths in the diagram range from ~PBytes/sec at the physics data cache and ~100 MBytes/sec into the processor farm, down to ~1 MBytes/sec at the edges.
● Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.
● 1 TIPS is approximately 25,000 SpecInt95 equivalents. (Image courtesy Harvey Newman, Caltech)

Computing “Clusters” are today’s Supercomputers
● A few head nodes, gatekeepers and other service nodes
● Lots of worker nodes
● I/O servers (typically RAID fileservers) and disk arrays
● Tape backup robots
● A cluster-management “frontend”

Cluster Architecture
● Cluster users reach the head node(s) over standard Internet protocols: login access via ssh, remote file access via scp or GridFTP.
● Job execution requests and status pass through a cluster scheduler (PBS, Condor, SGE) running on the head node(s).
● Compute nodes – Node 0 … Node N, anywhere from 10 to 10,000 PCs with local disks – run the jobs.
● A shared cluster filesystem provides storage for applications and data.

Grids consist of distributed clusters
● Grid client: the application and user interface, running on grid client middleware, with access to resource, workflow and data catalogs.
● Grid sites – e.g. Grid Site 1: Fermilab, Grid Site 2: Sao Paolo, … Grid Site N: UWisconsin – each run grid service middleware in front of a compute cluster and grid storage.
● Clients and sites communicate via common grid protocols.

Ian Foster’s Grid Checklist ● A Grid is a system that: – coordinates resources that are not subject to centralized control – uses standard, open, general-purpose protocols and interfaces – delivers non-trivial qualities of service

Open Science Grid map

Grid Resources in the US (JP Navarro, March 2007)
The OSG
● Origins: national grid projects (iVDGL, GriPhyN, PPDG) and the LHC software & computing projects.
● Current compute resources: 61 Open Science Grid sites, connected via Internet2 and NLR at speeds from 622 Mbps to 10 Gbps; compute & storage elements; all Linux clusters; most shared with campus grids and local non-grid users; more than 10,000 CPUs with a lot of opportunistic usage. Total computing capacity is difficult to estimate; the same goes for storage.
The TeraGrid
● Origins: national supercomputing centers, funded by the National Science Foundation.
● Current compute resources: 9 TeraGrid sites (soon 14), connected via dedicated multi-Gbps links; a mix of architectures (ia64 and ia32 Linux, Cray XT3, Alpha Tru64, SGI SMPs); resources are dedicated, but grid users share with local users; thousands of CPUs (> 40 TeraFlops) and hundreds of TeraBytes of storage.

Grid Middleware glues the grid together ● A short, intuitive definition: the software that glues together different clusters into a grid, taking into consideration the socio-political side of things (such as common policies on who can use what, how much, and for what)

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

Local Resource Managers (LRMs)
● Compute resources have a local resource manager (LRM) that controls:
– who is allowed to run jobs
– how jobs run on a specific resource
– the order and location of jobs
● Example policy: each cluster node can run one job; if there are more jobs, they must wait in a queue.
● LRMs also allow nodes in a cluster to be reserved for a specific person.
● Examples: PBS, LSF, Condor
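To make this concrete, here is what a minimal batch script for one such LRM (PBS) might look like. The `#PBS` directives are standard PBS; the job name, resource requests and program name are invented for illustration:

```shell
#!/bin/sh
#PBS -N citation-analysis     # job name (invented example)
#PBS -l nodes=1               # ask the LRM for one node
#PBS -l walltime=00:30:00     # ... and 30 minutes of run time
cd $PBS_O_WORKDIR             # run from the directory we submitted from
./my_analysis input.dat       # the actual work
```

The user hands this to the LRM with `qsub script.sh`; the LRM applies its policies (queueing, ordering, placement) before the job runs.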

Globus Toolkit
● The de facto standard for grid middleware
● Open source
● Adopted by different scientific communities and industries
● Conceived as an open set of architectures, services and software libraries that support grids and grid applications
● Provides services in the major areas of distributed systems: job submission, data management, security

GRAM – Globus Resource Allocation Manager ● GRAM provides a standardised interface for submitting jobs to LRMs. ● Clients submit a job request to GRAM. ● GRAM translates the request into something the underlying LRM can understand, so the same job request can be used with many different kinds of LRM.
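As an illustration, GRAM job requests are written in RSL (Resource Specification Language). The gatekeeper hostnames below are invented, and flag details vary between Globus Toolkit versions, but the point is that the same RSL string works against a PBS-managed and a Condor-managed site:

```shell
# Submit the same RSL job description to two differently-managed sites;
# GRAM translates it to each site's LRM.
globusrun -r gatekeeper.site1.example.org/jobmanager-pbs \
    '&(executable=/bin/hostname)(count=1)'
globusrun -r gatekeeper.site2.example.org/jobmanager-condor \
    '&(executable=/bin/hostname)(count=1)'
```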

Condor
● Condor is a specialized workload management system for compute-intensive jobs: a software system that creates a high-throughput computing (HTC) environment.
● Created at UW-Madison
● Detects machine availability
● Harnesses available resources
● Uses remote system calls to send R/W operations over the network
● Provides powerful resource management by matching resource owners with resource consumers (brokering)
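As a sketch, a minimal Condor submit description file looks like this (the executable and file names are invented):

```
universe   = vanilla
executable = my_analysis
arguments  = input.dat
output     = job.out
error      = job.err
log        = job.log
queue
```

Submitted with `condor_submit job.sub`, the job is matched by Condor's broker against an available machine that satisfies its requirements.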

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

High-performance tools are needed to solve several data problems
● The huge raw volume of data:
– storing it
– moving it
– measured in terabytes, petabytes, and beyond
● The huge number of filenames:
– very large numbers of filenames are expected soon
– a large enough collection of anything is a lot to handle efficiently
● How to find the data

Data Questions on the Grid ● Where are the files I want? ● How do I move data/files to where I want them?

GridFTP
● A high-performance, secure, reliable data transfer protocol based on standard FTP.
● Extensions include:
– strong authentication and encryption via Globus GSI
– multiple data channels for parallel transfers
– third-party transfers
– tunable network & I/O parameters
– authenticated, reusable channels
– server-side processing, command pipelining
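For illustration, the standard GridFTP command-line client is `globus-url-copy`; the hostnames and paths below are invented. `-p` requests parallel data channels, and giving two `gsiftp://` URLs performs a third-party transfer directly between the servers:

```shell
# Pull a file to the local machine over 4 parallel streams
globus-url-copy -p 4 \
    gsiftp://site1.example.org/data/run42.dat \
    file:///home/user/run42.dat

# Third-party transfer: site1 sends the file directly to site2
globus-url-copy \
    gsiftp://site1.example.org/data/run42.dat \
    gsiftp://site2.example.org/mirror/run42.dat
```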

RFT – Reliable File Transfer
● A service that provides reliability and fault tolerance for file transfers; part of the Globus Toolkit.
● RFT acts as a client to GridFTP, managing large numbers of transfer jobs (much as Condor manages jobs for GRAM).
● RFT can:
– keep track of the state of each job
– run several transfers at once
– deal with connection failure, network failure, or failure of any of the servers involved

RLS – Replica Location Service
● Provides access to mapping information from the logical names of items to their physical names.
● Its goal is to reduce access latency and to improve data locality, robustness, scalability and performance for distributed applications.
● RLS maintains local replica catalogs (LRCs), which hold the mappings between logical files and physical files scattered across the storage system. For better performance, the LRCs can be indexed.
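The core idea – one logical name mapping to many physical replicas – can be sketched in a few lines of Python. This is a toy illustration of the concept, not the RLS API; all names and URLs are invented:

```python
# A toy replica catalog: maps logical file names to the set of
# physical locations (URLs) where copies of the file live.

class ReplicaCatalog:
    def __init__(self):
        self._mappings = {}   # logical name -> set of physical URLs

    def register(self, logical, physical):
        """Record that a replica of `logical` exists at `physical`."""
        self._mappings.setdefault(logical, set()).add(physical)

    def lookup(self, logical):
        """Return all known physical locations for a logical name."""
        return sorted(self._mappings.get(logical, set()))

catalog = ReplicaCatalog()
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://site1.example.org/data/run42.dat")
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://site2.example.org/mirror/run42.dat")

# An application asks for the logical name and picks a nearby replica.
print(catalog.lookup("lfn://experiment/run42.dat"))
```

A real RLS adds the distributed part: many LRCs, plus index servers that tell you which LRC to ask.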

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

Grid Information ● Get useful information about grid sites to help use those sites. ● Which sites exist? ● Which sites are working? ● How do I use a site?

Why is this information important? ● Because the grid is always changing: sites come and go, some work one day but not the next, and new sites are often created. ● All sites look similar, but each is different.

VORS map

VORS entry for OSG_LIGO_PSU – Gatekeeper: grid3.aset.psu.edu (OSG Consortium Meeting, March 2007; Quick Start Guide to the OSG)

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

What do we need? ● Identity ● Authentication ● Message Protection ● Authorization ● Single Sign-On

Identity & Authentication ● Each entity should have an identity – entity = a person or a grid resource ● Authentication: establishing identity – is the entity who it claims to be? – stops masquerading imposters ● Authorisation: what is each identity allowed to do?

Single Sign-On ● Type in your password once ● Get access to all authorised resources
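In GSI-based grids this is typically done with a short-lived proxy credential: you type your passphrase once to create the proxy, and every grid tool afterwards authenticates with it silently. A sketch of a typical session (exact output and defaults vary by installation):

```shell
grid-proxy-init -valid 12:00   # prompts for your passphrase once;
                               # creates a proxy valid for 12 hours
grid-proxy-info                # inspect the proxy: subject, time left
```

From then on, tools such as `globus-url-copy` and `globusrun` pick up the proxy automatically and never prompt again until it expires.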

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

Grid Application ● The application is what you want to do – the science ● You usually cannot take an existing application and run it on the grid straight away: – make the application parallel – make the application distributed

File-coupled application model ● A very common grid application model ● Each piece of your application is a separate program ● The pieces communicate through files ● Other models of grid applications are possible

File-coupled application (diagram): Input file 1 → Executable A → Intermediate file 1; Input file 3 → Executable A → Intermediate file 3; both intermediate files → Executable B → Output file.

On the grid (diagram): the input files start on your computer and are transferred to the grid sites; Executable A runs at Grid site 1 and Grid site 2 to produce the intermediate files; Intermediate file 1 is transferred between sites so that Executable B can combine the intermediates into the Output file.
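The file-coupled model above can be sketched locally in a few lines of Python: each "executable" is a separate stage that reads input files and writes output files, and the stages communicate only through those files. The stage names and file layout are illustrative, not taken from the slides:

```python
# Local sketch of a file-coupled pipeline: stage A transforms each
# input file into an intermediate file; stage B combines all the
# intermediate files into one output file.
import pathlib
import tempfile

def executable_a(in_path, out_path):
    # Stage A: read an input file, write a transformed intermediate file.
    data = pathlib.Path(in_path).read_text()
    pathlib.Path(out_path).write_text(data.upper())

def executable_b(intermediate_paths, out_path):
    # Stage B: merge all intermediate files into the final output file.
    parts = [pathlib.Path(p).read_text() for p in intermediate_paths]
    pathlib.Path(out_path).write_text("\n".join(parts))

workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "input1.txt").write_text("alpha")
(workdir / "input3.txt").write_text("gamma")

# The two A stages are independent -- on a grid they could run at
# different sites, with the files staged in and out between steps.
executable_a(workdir / "input1.txt", workdir / "inter1.txt")
executable_a(workdir / "input3.txt", workdir / "inter3.txt")
executable_b([workdir / "inter1.txt", workdir / "inter3.txt"],
             workdir / "output.txt")

print((workdir / "output.txt").read_text())
```

On a real grid, the function calls become job submissions (GRAM/Condor) and the file reads and writes become transfers (GridFTP); the structure of the pipeline stays the same.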

The Grid Middleware Stack (top to bottom): Grid Application – Workflow system (explicit or ad-hoc) – Job Management – Data Management – Grid Information Services – Grid Security Infrastructure – Standard Internet Network Protocols

Grid workflow ● You've seen what a grid application might look like... ●... and how the various low-level components like GridFTP, GRAM and Condor can be used to make your application go... ●... but there are lots of fiddly parts to fit together. ● e.g. the application on the previous slide needed 4 file transfers and 3 job submissions.

Grid workflow ● Make it easier to express your application in a higher-level language ● Take advantage of work other people have done for: – fault tolerance – workflow sharing – site selection – provenance tracking

DAGMan – Directed Acyclic Graph Manager ● DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you (e.g., "Don't run job B until job A has completed successfully.")
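A DAGMan input file expresses exactly these dependencies. In this sketch (submit file names invented), each `JOB` line names a Condor submit description file and the `PARENT ... CHILD ...` lines give the ordering:

```
# diamond.dag -- run A first, then B and C in parallel, then D
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D
```

Run with `condor_submit_dag diamond.dag`; DAGMan submits each job only once all of its parents have completed successfully.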

Swift is… ● A language for writing scripts that: –process and produce large collections of persistent data –with large and/or complex sequences of application programs –on diverse distributed systems –with a high degree of parallelism –persisting over long periods of time –surviving infrastructure failures –and tracking the provenance of execution
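A sketch of what such a script looks like, in the style of early SwiftScript (the app name and file names are invented, and syntax details may differ between Swift versions):

```
type file;

// declare how one invocation of the 'analyze' program maps files
app (file out) analyze (file in) {
  analyze @filename(in) stdout=@filename(out);
}

file input  <"input.dat">;
file result <"result.dat">;
result = analyze(input);
```

The script says nothing about where `analyze` runs; Swift selects a site, stages the files in and out, retries on failure, and records the provenance of each execution.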

About Open Science Grid – the organization ● The OSG is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science. ● OSG has: – grid service providers: people who provide the software, compute and storage resources, and technical support – grid consumers: scientists who use OSG for their applications, sometimes individual researchers, sometimes global collaborations.

The OSG Software Stack
● A collection of software:
– grid software: Condor, Globus and lots more
– utilities: monitoring, authorization, configuration
– built for >10 flavors/versions of Linux
● Automated build and test: integration and regression testing.
● An easy installation: push a button, everything just works; quick update processes.
● Responsive to user needs: a process to add new components based on community needs.
● A support infrastructure: front-line software support, triaging between users and software providers for deeper issues.

Where to go for more... ● OSG Education, Outreach, Training group – me + Alina Bejan + Mike Wilde + Charles Bacon – notes available online 'forever' ● Basics similar to this course ● Get access to the real Open Science Grid to implement your applications ● Start using real OSG resources in the OSGEDU VO ● Me: