
1 Introduction to Grid Computing Midwest Grid Workshop Module 1

2 Scaling up Social Science: Citation Network Analysis. Work of James Evans, University of Chicago, Department of Sociology.

3 Scaling up the analysis
 Database queries of 25+ million citations
 Work started on small workstations; queries grew to month-long duration
 With the database distributed across the U of Chicago TeraPort cluster, 50 (faster) CPUs gave a 100× speedup
 Many more methods and hypotheses can now be tested!
Higher throughput and capacity enable deeper analysis and broader community access.

4 Computing clusters have commoditized supercomputing. A typical cluster comprises:
 A cluster-management “frontend”
 A few head nodes, gatekeepers, and other service nodes
 Lots of worker nodes
 I/O servers (typically RAID fileservers) with disk arrays
 Tape backup robots

5 PUMA: Analysis of Metabolism. The PUMA Knowledge Base holds information about proteins analyzed against ~2 million gene sequences; the analysis runs on the Grid and involves millions of BLAST, BLOCKS, and other processes. Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

6 Initial driver: High Energy Physics. The LHC computing model is tiered (image courtesy Harvey Newman, Caltech):
 Tier 0: the CERN Computer Centre, fed by the online system at ~100 MBytes/sec from a physics data cache of ~PBytes/sec, with an offline processor farm of ~20 TIPS
 Tier 1: regional centres such as FermiLab (~4 TIPS) and centres in France, Italy, and Germany, linked at ~622 Mbits/sec (or air freight, deprecated)
 Tier 2: centres of ~1 TIPS each (e.g., Caltech), linked at ~622 Mbits/sec
 Institutes (~0.25 TIPS) and physicist workstations (~1 MBytes/sec)
There is a “bunch crossing” every 25 nsecs and 100 “triggers” per second; each triggered event is ~1 MByte in size. Physicists work on analysis “channels”: each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. (1 TIPS is approximately 25,000 SpecInt95 equivalents.)

7 Grids Provide Global Resources To Enable e-Science

8 Ian Foster’s Grid Checklist. A Grid is a system that:
 Coordinates resources that are not subject to centralized control
 Uses standard, open, general-purpose protocols and interfaces
 Delivers non-trivial qualities of service

9 Virtual Organizations are groups of organizations that use the Grid to share resources for specific purposes. A virtual organization:
 Supports a single community
 Deploys compatible technology and agrees on working policies (security policies are the difficult part)
 Deploys various network-accessible services: Grid Information, Grid Resource Brokering, Grid Monitoring, Grid Accounting

10 What is a grid made of? Middleware. A grid client runs the application and user interface on top of grid middleware, alongside resource, workflow, and data catalogs and grid storage. Grid sites (resources dedicated by UC, IU, and Boston; resources shared by OSG, LCG, and NorduGrid; resource time purchased from a commercial provider) each run grid middleware over a computing cluster and grid storage. The middleware provides:
 Security to control access and protect communication (GSI)
 A directory to locate grid sites and services (VORS, MDS)
 A uniform interface to computing sites (GRAM)
 A facility to maintain and schedule queues of work (Condor-G)
 A fast and secure data-set mover (GridFTP, RFT)
 A directory to track where datasets live (RLS)

11 We will focus on Globus and Condor. Condor provides both client & server scheduling:
 In grids, Condor provides an agent to queue, schedule, and manage work submission
The Globus Toolkit provides the lower-level middleware:
 Client tools which you can use from a command line (a first taste appears below)
 APIs (scripting languages, C, C++, Java, …) to build your own tools, or to use directly from applications
 Web service interfaces
 Higher-level tools built from these basic components, e.g. Reliable File Transfer (RFT)
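As a first taste of the Globus command-line tools, a minimal sketch of running a trivial job through a remote site (the gatekeeper host name is hypothetical; globus-job-run is a standard GT2-era Globus client):

```bash
# Run /bin/hostname at a remote site through its PBS job manager;
# the gatekeeper host is hypothetical
globus-job-run gatekeeper.example.edu/jobmanager-pbs /bin/hostname
```

The same submission can be driven from the APIs, or, as later modules show, delegated to Condor-G.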

12 Grid components form a “stack”. From top to bottom: the Grid Application, which often includes a Portal (M6, M7, M8); a workflow system, explicit or ad-hoc (M6); the Condor Grid Client (M3); the Core Globus Services: Grid Security Infrastructure (M2), Job Management (M3), Data Management (M4), and Information (config & status) Services (M5); all running over Standard Network Protocols.

13 Provisioning. Grid architecture is evolving to a Service-Oriented approach:
 Service-oriented Grid infrastructure: provision physical resources to support application workloads
 Service-oriented applications: wrap applications as services and compose applications into workflows (users drive application services through workflow composition and invocation)
(“The Many Faces of IT as Service”, Foster & Tuecke, 2005.) ...but this is beyond our workshop’s scope; see “Service-Oriented Science” by Ian Foster.

14 Security Workshop Module 2

15 Grid security is a crucial component:
 Resources are typically valuable
 The problems being solved might be sensitive
 Resources are located in distinct administrative domains; each resource has its own policies, procedures, security mechanisms, etc.
 The implementation must be broadly available & applicable: standard, well-tested, well-understood protocols, integrated with a wide variety of tools

16 Security Services form the underlying communication medium for all the services:
 Secure authentication and authorization
 Single sign-on: the user explicitly authenticates only once, and sign-on then works for all service requests
 Uniform credentials
Example: GSI (Grid Security Infrastructure)
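A minimal sketch of GSI single sign-on from the command line (both commands are standard Globus clients):

```bash
# Authenticate once: create a short-lived proxy credential
# from your grid certificate (prompts for your passphrase)
grid-proxy-init

# Inspect the proxy; subsequent grid requests use it
# without prompting you again
grid-proxy-info
```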

17 Authentication stops imposters. Authentication means verifying that you are who you claim to be. Examples of authentication:
 Username and password
 Passport
 ID card
 Public keys or certificates
 Fingerprint
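For the certificate case, the claimed identity can be read directly from the credential; a small sketch (the file path is the conventional GSI location, assumed here):

```bash
# Show the subject (the claimed identity) and the issuer
# (the signing CA) of a user's grid certificate
openssl x509 -in ~/.globus/usercert.pem -subject -issuer -noout
```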

18 Authorization controls what you are allowed to do. Is this user allowed to access this service?
 Read, write, and execute permissions in Unix
 Access control lists (ACLs) provide more flexible control
 Special “callouts” in the grid stack, in job and data management, perform authorization checks
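One concrete GSI authorization callout is the grid-mapfile, which maps an authenticated certificate subject to a local account; a sketch with a hypothetical DN and username:

```
# /etc/grid-security/grid-mapfile
# Each line maps a certificate Distinguished Name to a local Unix account
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345" jdoe
```

A request authenticated as that DN then runs with the permissions of the local account jdoe.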

19 Job and resource management Workshop Module 3

20 Job Management Services provide a standard interface to remote resources, including CPU, storage, and bandwidth.
 The Globus component is the Globus Resource Allocation Manager (GRAM)
 The primary Condor grid client component is Condor-G
Other needs: scheduling, monitoring, job migration, notification

21 GRAM provides a uniform interface to diverse resource scheduling systems: a user in a Grid VO submits through GRAM to sites running Condor (Site A), PBS (Site B), LSF (Site C), or plain UNIX fork() (Site D).

22 GRAM: What is it? The Globus Resource Allocation Manager. Given a job specification, GRAM can:
 Create an environment for the job
 Stage files to and from the environment
 Submit the job to a local resource manager
 Monitor the job
 Send notifications of job state changes
 Stream the job’s stdout/err during execution
A short submission sketch follows.
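A minimal sketch of a GT2-style job specification in RSL, submitted with globusrun (the gatekeeper name is hypothetical):

```bash
# Submit an RSL job specification through a remote gatekeeper;
# -o streams the job's stdout/stderr back to this terminal
globusrun -o -r gatekeeper.example.edu/jobmanager-pbs \
  '&(executable=/bin/hostname)(count=1)'
```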

23 In GRAM, a “Local Resource Manager” is a batch system for running jobs across a computing cluster. Examples:
 Condor
 PBS
 LSF
 Sun Grid Engine
Most systems also allow you to access “fork”:
 The default behavior
 It runs the job on the gatekeeper: a bad idea in general, but okay for testing

24 Managing your jobs. We need something more than just the basic functionality of the Globus job-submission commands. Some desired features:
 Job tracking
 Submission of sets of inter-dependent jobs
 Checkpointing and job-resubmission capability
 Matchmaking to select an appropriate resource for executing each job
Options: Condor, PBS, LSF, … (a Condor-G sketch follows)
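As an illustration, a minimal Condor-G submit description file; the gatekeeper, executable, and file names are hypothetical, and the grid-universe syntax follows current Condor conventions:

```
# analyze.sub -- route a job to a remote GRAM gatekeeper via Condor-G
universe      = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
executable    = analyze.sh
output        = analyze.out
error         = analyze.err
log           = analyze.log
queue
```

Submitted with condor_submit analyze.sub; condor_q then tracks the job, and Condor-G handles resubmission on failure.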

25 Data Management Workshop Module 4

26 Data management services provide the mechanisms to find, move, and share data: fast, flexible, secure, and ubiquitous. Grids are used for analyzing and manipulating large amounts of data:
 Metadata (data about data): What is the data?
 Data location: Where is the data?
 Data transport: How do we move the data?

27 GridFTP is a secure, efficient, standards-based data transfer protocol: robust, fast, and widely accepted.
 Server: the Globus GridFTP server
 Client: Globus globus-url-copy; other clients exist (e.g., uberftp)
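A minimal transfer sketch with globus-url-copy (the host name and paths are hypothetical):

```bash
# Copy a remote file to local disk over GridFTP (the gsiftp:// scheme);
# authentication uses your GSI proxy
globus-url-copy \
  gsiftp://gridftp.example.edu/data/run42.dat \
  file:///tmp/run42.dat
```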

28 GridFTP is secure, reliable, and fast:
 Security through GSI: authentication and authorization; it can also provide encryption
 Reliability by restarting failed transfers
 Speed: TCP buffers can be tuned for optimal performance; parallel transfers; striping (multiple endpoints)
Not all features are accessible from the basic client, but GridFTP can be embedded in higher-level and application-specific frameworks.
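The basic client does expose parallelism and buffer tuning; a sketch, with illustrative flag values:

```bash
# Same transfer with 8 parallel TCP streams (-p) and a 2 MB
# TCP buffer (-tcp-bs, in bytes) for a high-latency WAN path
globus-url-copy -p 8 -tcp-bs 2097152 \
  gsiftp://gridftp.example.edu/data/run42.dat \
  file:///tmp/run42.dat
```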

29 File catalogues tell you where the data is. Examples: the Replica Location Service (RLS), PhEDEx, RefDB / PupDB.

30 Requirements for a File Catalogue:
 Abstract out the logical file name (LFN) for a physical file; maintain the mappings between the LFNs and the PFNs (physical file names)
 Maintain the location information of a file

31 To avoid “hotspots”, replicate data files in more than one location:
 Effective use of the grid resources
 Each LFN can have more than one PFN
 Avoids a single point of failure
 Manual or automatic replication (automatic replication considers the demand for a file, transfer bandwidth, etc.)
A small RLS sketch follows.
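A sketch of the LFN-to-PFN bookkeeping with the RLS command-line client; the server URL and names are hypothetical, and the command forms follow the globus-rls-cli documentation as best recalled:

```bash
# Register a logical file with its first physical replica
globus-rls-cli create run42.dat \
  gsiftp://gridftp.example.edu/data/run42.dat rls://rls.example.edu

# Add a second replica of the same LFN at another site
globus-rls-cli add run42.dat \
  gsiftp://gridftp2.example.edu/data/run42.dat rls://rls.example.edu

# Look up all physical replicas of the logical file
globus-rls-cli query lrc lfn run42.dat rls://rls.example.edu
```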

32 The Globus-Based LIGO Data Grid (LIGO: the Laser Interferometer Gravitational-wave Observatory) replicates >1 Terabyte/day to 8 sites, including Birmingham, Cardiff, and AEI/Golm: >40 million replicas so far, with MTBF = 1 month. www.globus.org/solutions

33 National Grid Cyberinfrastructure Workshop Module 5

34 TeraGrid provides vast resources via a number of huge computing facilities.

35 The Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines. OSG is a consortium of universities and national laboratories, building a sustainable grid infrastructure for science:
 It incorporates advanced networking and focuses on general services, operations, and end-to-end performance
 It is composed of a large number (>50 and growing) of shared computing facilities, or “sites”
http://www.opensciencegrid.org/

36 Open Science Grid status (www.opensciencegrid.org):
 50 sites (15,000 CPUs) & growing
 400 to >1000 concurrent jobs (as of 2004)
 Many applications + CS experiments, including long-running production operations
 Up since October 2003, with a few FTEs for central operations

37 To use a Grid efficiently, you must locate and monitor its resources:
 Check the availability of different grid sites
 Discover different grid services
 Check the status of “jobs”
 Make better scheduling decisions with information maintained on the “health” of sites

38 OSG Resource Selection Service: VORS

39 GT4 Monitoring and Discovery Service (MDS). Each GT4 container runs an MDS-Index; services in the container (GRAM, RFT, etc.) register automatically via WS-ServiceGroup, and registration and access use WSRF/WSN. Adapters (e.g., a GridFTP adapter) and custom protocols cover non-WSRF entities, MDS-Index services can register with one another, and users and clients (e.g., WebMDS) query the indexes.

40 Grid Workflow Workshop Module 6

41 A typical workflow pattern in image analysis runs many filtering apps. Workflow courtesy James Dobson, Dartmouth Brain Imaging Center

42 Workflows can process vast datasets. Many HEP and astronomy experiments consist of:
 Large datasets as inputs (find datasets)
 “Transformations” which work on the input datasets (process)
 Output datasets (store and publish)
The emphasis is on sharing the large datasets. Transformations are usually independent and can be parallelized, but they can vary greatly in duration. Example: the Montage workflow of ~1200 jobs in 7 levels, mixing data-transfer and compute jobs; a mosaic of M42 created on TeraGrid. A small dependency sketch follows.
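As a sketch of how such inter-dependent jobs can be expressed with Condor DAGMan (the node and file names are hypothetical): each node points to an ordinary submit file, and PARENT/CHILD lines encode the dependency levels.

```
# mosaic.dag -- a tiny three-level workflow:
# fetch stages the input dataset (data transfer),
# project reprojects the images, mosaic combines them (compute)
JOB fetch   fetch.sub
JOB project project.sub
JOB mosaic  mosaic.sub
PARENT fetch   CHILD project
PARENT project CHILD mosaic
```

Submitted with condor_submit_dag mosaic.dag; DAGMan then releases each node only when its parents have succeeded.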

43 The virtual data model enables workflows to abstract away grid details.

44 An important application pattern: Grid Data Mining Workshop Module 7

45 Mining seismic data for hazard analysis (Southern California Earthquake Center). A seismic hazard model draws together many inputs: seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure, and rupture dynamics.

46 Data mining in the grid (fig: S. G. Djorgovski). Data archives, analysis tools, discovery tools, and users are decomposed across the network, decoupling resource and service providers. Clients integrate dynamically:
 Select & compose services
 Select “best of breed” providers
 Publish the result as a new service

47 Conclusion: Why Grids? They enable new approaches to inquiry based on:
 Deep analysis of huge quantities of data
 Interdisciplinary collaboration
 Large-scale simulation and analysis
 Smart instrumentation
 Dynamically assembling the resources to tackle problems at a new scale
All enabled by access to resources & services without regard for location & other barriers.

48 Grids: Because Science Takes a Village … Teams organized around common goals: people, resources, software, data, instruments…
 With diverse membership & capabilities: expertise in multiple areas is required
 And geographic and political distribution: no single location or organization possesses all the required skills and resources
 Teams must adapt as a function of the situation: adjust membership, reallocate responsibilities, renegotiate resources

49 Based on: Grid Intro and Fundamentals Review, Dr Gabrielle Allen, Center for Computation & Technology and Department of Computer Science, Louisiana State University, gallen@cct.lsu.edu. Grid Summer Workshop, June 26-30, 2006.

