GridPrimer: An Introduction (Jon MacLaren, E-Science NorthWest; GridPrimer Training Course, University of Manchester, Monday 18th to Friday 22nd October 2004)

Slide 1: GridPrimer: An Introduction to the world of Grid Computing
Jon MacLaren, e-Science NorthWest
GridPrimer Training Course, University of Manchester, Monday 18th to Friday 22nd October 2004

Slide 2: Introduction: What is “Grid” anyway?

Slide 3: What is “Grid”?
- Many definitions exist.
- The definitions have changed over time.
- The term Grid Computing was originally coined by Ian Foster in the mid-to-late nineties, as an analogy to electrical power grids.
- This definition is rarely seen these days, but it is still the one that people remember best...

Slide 4: The power grid analogy
- "Computational Grid" was coined by analogy with power grids.
- In power grids, you plug in your appliance and draw current, without caring where the power is generated.
- In computational grids, you plug in your application and draw cycles.
- Be inspired by this, but don’t believe it’s so simple.

Slide 5: The roots of Grid computing
- The roots of Grid computing are in scientific High Performance Computing (HPC), informed by both capacity (high-throughput) and capability (large-scale) considerations.
- The Grid is descended from:
  - metacomputing
  - distributed computing
  - load balancing and scheduling
- It is enabled by increasingly ubiquitous and reliable infrastructure.
- It has outgrown its HPC roots.

Slide 6: Other Definitions of the Grid…
- "…is the web on steroids."
- "…is distributed computing across multiple administrative domains" (Dave Snelling, senior architect of UNICORE)
- […provides] "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
- "…enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."

Slide 7: The Grid…
- "…is the web on steroids."
- "…is Napster for Scientists" [of data grids]
- "…is the solution to all your problems."
- "…is evil." [a system manager, of Globus]
- "…is distributed computing re-badged."
- "…is distributed computing across multiple administrative domains" (Dave Snelling, senior architect of UNICORE)
- […provides] "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
- "…enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships." (from “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”)

Slide 8: Yet more definitions...
- The original “Grid” book (The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman):
  - a collection of papers, so not self-consistent
  - mainly “visions” of what Grid computing could be
  - focus is on computational Grids
- The new “Grid” book (The Grid 2):
  - second edition, not completely new
  - big shift away from computational Grids to other kinds of Grid...
  - much more Data Grid-oriented
  - Semantic Grid included

Slide 9: What is the Grid? Foster speaks. Again.
- In a column for Grid Today (July 2002), Ian Foster was asked to define “What is the Grid?”. His brief article (see link below) gave a three-point checklist. A Grid:
  1. coordinates resources that are not subject to centralized control …
  2. … using standard, open, general-purpose protocols and interfaces …
  3. … to deliver nontrivial qualities of service.
- http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
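Foster's checklist is a conjunction of three tests: a system is a Grid only if it passes all of them. The sketch below encodes that as code. SystemProfile, foster_checklist and is_grid are hypothetical names, and reducing each point to a yes/no flag is a simplifying assumption; the article itself discusses each criterion in prose.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical yes/no description of a distributed system (illustration only)."""
    name: str
    centrally_controlled: bool     # are all resources under one administrative authority?
    open_standard_protocols: bool  # standard, open, general-purpose protocols and interfaces?
    nontrivial_qos: bool           # delivers qualities of service beyond those of the parts?


def foster_checklist(system: SystemProfile) -> list:
    """Return the checklist points the system fails; an empty list means all three pass."""
    failures = []
    if system.centrally_controlled:
        failures.append("1: resources are subject to centralized control")
    if not system.open_standard_protocols:
        failures.append("2: protocols/interfaces are not standard, open and general-purpose")
    if not system.nontrivial_qos:
        failures.append("3: does not deliver nontrivial qualities of service")
    return failures


def is_grid(system: SystemProfile) -> bool:
    """In this sketch a system counts as a Grid only if it fails none of the points."""
    return not foster_checklist(system)


# Hypothetical examples, not assessments of real products.
local_cluster = SystemProfile("single-site batch cluster", True, False, True)
federation = SystemProfile("multi-institution federation", False, True, True)
print(is_grid(local_cluster), foster_checklist(local_cluster))
print(is_grid(federation))  # True
```

Returning the list of failed points, rather than just a boolean, makes it clear which part of the definition a system misses.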

Slide 10: Common Diagrammatical Representation of a Grid...

Slide 11: Virtual Organisations
[Figure: resources (R) belonging to several organisations, grouped into overlapping virtual organisations VO A, VO B and VO C.]
Community overlays on classic organisational structures:
- “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”
- "…enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."

Slide 12: What is a Virtual Organization?
- Facilitates the workflow of a group of users across multiple domains who share (some of) their resources to solve particular classes of problems.
- Collates and presents information about these resources in a uniform view.
- The UK e-Science community is effectively a Virtual Organization made up of real institutions: the universities involved.
- An organization can, of course, be part of many Virtual Organizations.

Slide 13: Sharing
Share (v):
- To participate in, use, enjoy, or experience jointly or in turns.
- To have part; to receive a portion; to partake, enjoy, or suffer with others.
- To allow someone to use or enjoy something that one possesses.
Sharing must be mutually beneficial:
- common goals
- trade

Slide 14: The Grid as Collaboratory
[Figure courtesy of Ian Foster] Controlled sharing of resources and know-how, with overlapping and volatile membership, to generate new results.

Slide 15: Collaboratory
"A collaboratory is… a center without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries." (William Wulf, U.S. National Science Foundation, 1989)

Slide 16: National Fusion Collaboratory
- The National Fusion Grid is a SciDAC Collaboratory Pilot project that is creating and deploying collaborative software tools throughout the magnetic fusion research community. The goal of the project is to advance scientific understanding and innovation in magnetic fusion research by enabling more efficient use of existing experimental facilities and more effective integration of experiment, theory, and modeling.
- Making widespread use of Grid technologies
- http://www.fusiongrid.org/

Slide 17: The Chimpanzee Collaboratory
- The Chimpanzee Collaboratory is a collaborative project of attorneys, scientists and public policy experts working to make significant and measurable progress in protecting the lives and establishing the legal rights of chimpanzees.
- Not making widespread use of Grid technologies
- http://www.chimpcollaboratory.org/

Slide 18: What computational grids are for
- Focus is on computation.
- Resources “shared” include:
  - computers: users have work and need a computer; providers have a computer and need work
  - applications: users need to run a particular application; providers make applications available to a community
  - networks
- Applications need input data and generate output data:
  - even computationally-oriented middleware needs the ability to transfer files and harvest results
  - interoperability with data-oriented middleware is an issue
- Some good examples in the pilot project presentations

Slide 19: Capabilities
Typical:
- Single sign-on (more later)
- Job submission, monitoring and management:
  - submit a job to a resource on the grid
  - monitor the progress of a submitted job
  - retrieve results
  - cancel a job
- File transfer:
  - move files from A to B, securely, reliably and efficiently
- Resource discovery:
  - locate resources or services with particular characteristics
Less typical:
- Metacomputing, workflow enactment, resource brokering, ...
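Job submission, monitoring, retrieval and cancellation all revolve around a job's lifecycle as seen from the client. The following is a minimal sketch of that lifecycle as a state machine. The state names, the allowed transitions and the GridJob class are illustrative assumptions, not the API or state set of any particular middleware.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"      # submitted, waiting in the remote queue
    ACTIVE = "active"        # running on the resource
    DONE = "done"            # finished; results can be retrieved
    FAILED = "failed"        # finished with an error
    CANCELLED = "cancelled"  # stopped at the user's request


# Allowed transitions (an assumption for this sketch); terminal states have none.
TRANSITIONS = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED, JobState.CANCELLED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED, JobState.CANCELLED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
    JobState.CANCELLED: set(),
}


class GridJob:
    """Hypothetical client-side view of one job submitted to a remote resource."""

    def __init__(self, job_id: str, resource: str):
        self.job_id = job_id
        self.resource = resource
        self.state = JobState.PENDING
        self.history = [JobState.PENDING]

    def update(self, new_state: JobState) -> None:
        """Monitoring: record a state reported by the remote job manager."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

    def cancel(self) -> None:
        """Management: user-initiated cancellation, only valid before the job finishes."""
        self.update(JobState.CANCELLED)

    def results_available(self) -> bool:
        """Retrieval only makes sense once the job has completed successfully."""
        return self.state is JobState.DONE


job = GridJob("job-001", "cluster.example.ac.uk")  # hypothetical resource name
job.update(JobState.ACTIVE)
job.update(JobState.DONE)
print(job.results_available())  # True
```

Rejecting illegal transitions catches, for example, a stale status message arriving after a job has already been cancelled.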

Slide 20: Is that it?
- Computational Grids are very focussed on being able to access compute cycles easily.
- What if you want to analyse huge amounts of data?
- What if there are:
  - numerous or large data sources
  - data updated frequently and on different schedules
  - data accessed by users at multiple locations in different organizations
- Then the ability to move compute cycles around will not be sufficient.
- Technology must be developed to directly support access to large, distributed data sources.

Slide 21: Data Grid (figure courtesy of Ian Foster)
[Figure: many sources of data, services and computation, linked by registries (R), security and policy services, and resource managers (RM).]
- Registries organize services of interest to a community.
- Discovery and access: data integration activities may require access to, and exploration of, data at many locations.
- Exploration and analysis may involve complex, multi-step workflows.
- Security and policy must underlie access and management decisions.
- Resource management is needed to ensure progress and arbitrate competing demands.
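A registry is what makes discovery possible: providers publish descriptions of their services, and clients query by kind and required attributes. The sketch below is a minimal in-memory illustration under stated assumptions. ServiceEntry, Registry and the "numbers are minimums, everything else must match exactly" rule are hypothetical choices, and the example.org endpoints are placeholders. Real directory services such as Globus MDS use their own schemas and query languages.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Hypothetical registry record describing one service."""
    name: str
    endpoint: str                  # illustrative URL; not a real service
    kind: str                      # e.g. "data", "compute", "security"
    attributes: dict = field(default_factory=dict)


def _satisfies(have, want) -> bool:
    """Numbers are treated as minimum requirements; anything else must match exactly."""
    if have is None:
        return False
    if isinstance(want, (int, float)) and not isinstance(want, bool):
        return isinstance(have, (int, float)) and have >= want
    return have == want


class Registry:
    """Minimal in-memory registry for one community (a sketch, not MDS or UDDI)."""

    def __init__(self):
        self._entries = []

    def register(self, entry: ServiceEntry) -> None:
        self._entries.append(entry)

    def discover(self, kind: str, **required) -> list:
        """Return entries of the given kind that satisfy every stated requirement."""
        return [
            e for e in self._entries
            if e.kind == kind
            and all(_satisfies(e.attributes.get(k), v) for k, v in required.items())
        ]


reg = Registry()
reg.register(ServiceEntry("seq-archive", "https://data.example.org/seq", "data",
                          {"format": "FASTA", "size_gb": 40}))
reg.register(ServiceEntry("seq-mirror", "https://mirror.example.org/seq", "data",
                          {"format": "FASTA", "size_gb": 2}))
reg.register(ServiceEntry("cluster", "https://hpc.example.org/jobs", "compute",
                          {"cpus": 128}))

print([e.name for e in reg.discover("data", format="FASTA", size_gb=10)])  # ['seq-archive']
```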

Slide 22: What do I have to choose from?
- Globus Toolkit:
  - version 2 is widely deployed; nearest thing to a de facto standard
  - horizontally integrated bag of tools
  - suits grid application developers better than end users
- UNICORE:
  - less widely deployed; few UK deployments
  - vertically integrated
  - suits end users better than application developers
- Condor:
  - high-throughput computing
  - great for cycle harvesting
- Web Services?
  - wait, or roll your own using Web Services tools
- Others:
  - yes, there are others

Slide 23: Globus Toolkit version 2
- "Single sign-on" through the Grid Security Infrastructure (GSI)
- Remote execution of jobs:
  - GRAM, job managers, Resource Specification Language (RSL)
- GridFTP:
  - efficient, reliable file transfer; third-party file transfers
- MDS (Metacomputing Directory Service):
  - resource discovery (GRIS and GIIS)
- Co-allocation (DUROC):
  - limited by support from the scheduling infrastructure
- Other GSI-enabled utilities:
  - gsi-ssh, grid-cvs, etc.
- Low-level APIs and command-line interfaces
- Commodity Grid Kits (CoG Kits) for Java, Perl and Python
- Widespread deployment, lots of projects
[Figure: layered architecture, with applications built on diverse global services, which are built on core services, which run on the local OS.]
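Jobs submitted through GRAM are described in RSL, which chains parenthesised attribute=value relations behind a leading "&". The helper below is a hypothetical sketch (build_rsl is not part of Globus) that assembles such a string from Python values. It uses the common GT2 attributes executable, arguments, count and stdout. To keep the sketch simple, it refuses values containing a double quote rather than guessing at RSL's escaping rules. Check attribute names against the GT2 RSL documentation before relying on them.

```python
def build_rsl(executable: str, arguments=(), count: int = 1, **extra) -> str:
    """Build a GT2-style RSL string such as
    &(executable="/bin/echo")(arguments="hello" "grid")(count=1)
    Extra keyword arguments become further (attribute="value") relations.
    """
    def quote(value) -> str:
        text = str(value)
        if '"' in text:
            # Escaping is deliberately not attempted in this sketch.
            raise ValueError("value contains a double quote: " + repr(text))
        return '"' + text + '"'

    relations = ["(executable=" + quote(executable) + ")"]
    if arguments:
        # Multiple arguments are separate quoted literals inside one relation.
        relations.append("(arguments=" + " ".join(quote(a) for a in arguments) + ")")
    relations.append(f"(count={count})")
    for attr, value in extra.items():  # e.g. directory, stdout, stderr
        relations.append("(" + attr + "=" + quote(value) + ")")
    return "&" + "".join(relations)


print(build_rsl("/bin/echo", ["hello", "grid"], count=1, stdout="out.txt"))
# &(executable="/bin/echo")(arguments="hello" "grid")(count=1)(stdout="out.txt")
```

A string like this could then be handed to a GT2 submission tool such as globusrun. The exact command-line options should be taken from the GT2 documentation rather than from this sketch.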

Slide 24: UNICORE (http://www.unicore.org/)
- Packaged software with a GUI
- Open source: http://unicore.sourceforge.net/
- Designed for firewalls
- Strict security model:
  - explicit delegation
- Abstract Job Object (AJO):
  - built-in workflow management
- Resource Broker:
  - can submit to Globus grids
- Has the notion of a software resource
- Few APIs:
  - extend through plug-ins
  - starting to expose service interfaces
- Serves the user

Slide 25: UNICORE Client

Slide 26: Condor: High-throughput computing (http://www.cs.wisc.edu/condor/)
Condor converts collections of workstations and clusters into a distributed high-throughput computing facility:
- emphasis on policy management and reliability
- high-throughput scheduler
- supports job checkpointing and migration (single-processor jobs only)
- remote system calls
Condor-G lets Condor users add Globus-enabled resources to their private view of a Condor pool ("flock"):
- "glide-in"
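Work is handed to Condor through a submit description file, which condor_submit reads. The sketch below generates a small one. The commands used (universe, executable, arguments, output, error, log, queue) are standard submit-file commands, and $(Process) expands to 0, 1, 2, ... for each queued instance. The function name and file-name pattern are assumptions made for illustration.

```python
def condor_submit_description(executable: str, arguments: str = "",
                              universe: str = "vanilla", instances: int = 1) -> str:
    """Return the text of a simple Condor submit description file.

    Each queued instance gets its own output/error files via $(Process);
    all instances share one event log.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments  = {arguments}")
    lines += [
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {instances}",
    ]
    return "\n".join(lines) + "\n"


# Saved as, e.g., job.sub, this would be submitted with: condor_submit job.sub
print(condor_submit_description("/bin/hostname", instances=3), end="")
```

The vanilla universe runs unmodified executables but does not checkpoint them. The checkpointing, migration and remote system calls listed on the slide belong to the standard universe, which requires relinking the program with condor_compile.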

Slide 27: Requirements
End users:
- Easy access to current, consistent data
- Convenient access to applications and processing
- Easy collaboration with colleagues and partners
- Regardless of location, administrative domain, or platform
- Applications often cannot, or will not, be modified: MUST WORK WITH LEGACY APPLICATIONS
- Increasingly Java, J2EE as the execution environment
IT:
- Support requests for better access and more resources
- Streamline data management
- Enable more flexibility in the use of resources
- Protect corporate assets and intellectual property
- IT managers are overworked, and represent 40% of IT costs
Grids must simplify life, not make it more complicated!

Slide 28: AVAKI 2.5 Data Grid
[Figure: enterprise desktops, servers, a shared data cluster, shared data sources, shared output and a server queuing system, accessed by user departments, IT departments, partner enterprise users and partner users.]
- Federates multiple data sources
- Provides access to data in local and virtual file systems (DAS, NAS, SAN)
- Provides access to shared data through standard interfaces
- Caches data locally

Slide 29: Anything else? Why is it so diverse?
- You don’t have to use Globus or UNICORE or Condor or Avaki to be doing Grid.
- Grid is also an approach to (distributed) computing. One can argue that building your own solution with your own protocols is still Grid.
- However, as Grids are also about interoperability, this is perhaps harder to justify.
- Wouldn’t it be nice if:
  - all the complexity of the middleware was hidden behind a web page (Portals), or
  - there was a common API to all the different middleware systems (GAT), or
  - there was a common infrastructure for all the middleware systems to use (OGSI/WS), or ...
- More about these later!

Slide 30: Generation Game (adapted from Ian Foster, GGF7 Plenary)
[Figure: three generations plotted against time (horizontal axis) and increased functionality and standardization (vertical axis).]
- Custom solutions: computationally intensive; file access/transfer; a bag of various heterogeneous protocols and toolkits; monolithic design; recognised the Internet, ignored the Web; academic teams.
- De facto standards: Globus Toolkit, Condor, UNICORE; GridFTP, GSI; X.509, LDAP, FTP, …; app-specific services.
- Open Grid Services Architecture: data and knowledge intensive; open services-based architecture; builds on Web services; GGF + OASIS + W3C; multiple implementations; Global Grid Forum; industry participation.

Slide 31: Are there any other kinds of Grid?
- What about AccessGrid?
- What about the Semantic Grid?
- Is SETI@HOME a Grid?

Slide 32: Interactive environments and virtual presence, integrated with Grid middleware, using multicast over IP (http://www.accessgrid.org)
- SARS Combat Grid, Taiwan
- Emergency Access Grids
- Integration of patient data
- Integration of models of disease dissemination
- Data mining using a compute grid

Slide 33: AccessGrid: VideoConferencing on Steroids!
- Manchester was the UK’s first AccessGrid node.
[Photos: Solar Terrestrial Physics Workshop; Teleradiology, Denver]

Slide 34 (no text)

Slide 35: Access Grid Support Centre (http://www.agsc.ja.net/)

Slide 36 (slide footer: 16th ECAI 2004, Valencia, 27th August 2004): a protein database entry in SWISS-PROT flat-file format
ID   MURA_BACSU   STANDARD;   PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116    BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374    S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

Slide 37: GRID.ORG™ Grid Computing Projects (http://www.grid.org/home.htm)

Slide 38: GRID.ORG
- “Grid.org is unmatched in its ability to accelerate today’s grand-scale research. Scientists now have at their disposal, an available resource that can open up new areas of discovery and analysis in time frames that were previously impossible.” (Dr Graham Richards, Chairman of Chemistry, University of Oxford)
- Cycle-stealing, based on volunteered machines
- Current projects:
  - PatriotGrid: combatting bioterrorism (?!)
  - Cancer Research
  - Smallpox Research

Slide 39: Smallpox Research Grid (http://www.grid.org/projects/smallpox/)
- Analysis of 35 million drug compounds against 11 smallpox proteins, to try to find a way to stop the replication of the virus.
- Volunteers from over 190 countries donated spare CPU power at www.grid.org, the world's largest public computing resource.
- Contributed over 39,000 years of computing time in less than six months.
- 44 lead molecules identified.
Partners: United Devices, IBM, Oxford University, Accelrys
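The slide's figures can be cross-checked with a little arithmetic. Two assumptions are ours, not the slide's: that every compound was screened once against every protein, and that "less than six months" is taken as exactly half a year. The latter turns the average machine count into a lower bound.

```latex
\begin{align*}
N_{\text{pairs}} &= 35 \times 10^{6}\ \text{compounds} \times 11\ \text{proteins}
                  = 3.85 \times 10^{8}\ \text{compound--protein pairs} \\
\bar{P} &\geq \frac{39\,000\ \text{CPU-years}}{0.5\ \text{years}}
         = 78\,000\ \text{CPUs busy, on average, throughout} \\
\bar{t}_{\text{pair}} &\approx \frac{39\,000 \times 365.25 \times 86\,400\ \text{s}}{3.85 \times 10^{8}}
                       \approx 3.2 \times 10^{3}\ \text{s} \approx 53\ \text{minutes of CPU time per pair}
\end{align*}
```

These are averages only. Volunteered machines differ widely in speed and availability, so the actual number of participating PCs would have been much larger than the average number busy at any moment.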

Slide 40: Myth-Busting I: Where is The Grid?

Slide 41: There is no ONE Grid: logical and physical Grid configurations
- BIRN: http://www.nbirn.net
- http://egee-intranet.web.cern.ch/
- http://www.teragrid.org/
- http://www.ngs.ac.uk

Slide 42: This leads to some important questions...
- Am I building a Grid? (architecting/installation/politicking)
- Am I joining someone else’s Grid? (installation/politicking)
- Am I just consuming resources through a pre-existing Grid? (client installation)
- Am I developing new middleware/services to fit into a pre-existing Grid? (engineering)

Slide 43: Are there Grids we can use? Or join?
- National initiatives:
  - US TeraGrid: Manchester to join as a satellite node
  - UK National Grid Service (and Level 2 Grid)
- Project-wide initiatives:
  - EUROGRID (UNICORE Grid)
  - DataGrid (for the Large Hadron Collider at CERN)
  - EGEE

Slide 44: National Grid Service (http://www.ngs.ac.uk)

Slide 45: NGS
- Currently in pre-production phase
- Core comprises:
  - JISC-funded nodes:
    - compute clusters at Leeds and Oxford (64 dual-processor systems)
    - data clusters at RAL and Manchester (20 dual-processor systems, 18 TB)
    - access is free at point of use, subject to light-weight peer review
  - national HPC services HPCx and CSAR
- Volunteer nodes to be added, subject to a minimum SLD
- Middleware basis:
  - Globus Toolkit version 2.4.3 (from the VDT distribution) plus goodies
  - data nodes also provide Oracle, SRB and OGSA-DAI
  - SRB client on compute nodes
- Access through UK e-Science (or other recognised) certificates
- First line of support provided by the Grid Support Centre, until the Grid Operations Support Centre is established
- Sequel to the UK “Level 2 Grid”

Slide 46: GSC: Grid Support Centre
- Partnership between CLRC, the University of Edinburgh and the University of Manchester
- Supports the UK e-Science programme
- Will be replaced by the Grid Operations Support Centre in October 2004
- www.grid-support.ac.uk, support@grid-support.ac.uk
[Figure: the UK Grid Support Centre (CLRC Rutherford Appleton and Daresbury Laboratories and the Universities of Edinburgh and Manchester) serves Research Council Pilot Projects, Regional e-Science Centres, Research Labs, Universities and Industry through a Helpdesk (support@grid-support.ac.uk), a Web Information Resource (www.grid-support.ac.uk) and a Technical Support Team. Services shown: Globus, Condor, SRB, Web services, Network; installation support; software download; evaluation reports; links to other resources; documentation; National Directory Service; reference systems; system admin training; National Certificate Authority.]

Slide 47: GSC provides
- Helpdesk (support@grid-support.ac.uk):
  - first point of contact for requests and queries
  - phone contact during office hours
  - provides access to technical expertise at all sites
- Web information resource (http://www.grid-support.ac.uk):
  - tutorials
  - evaluation reports
  - links to other resources
- Grid starter kit:
  - Globus, Condor, SRB, OGSA-DAI
  - downloadable software
  - installation support
  - documentation

Slide 48: ... and other services
- Certificate Authority for the UK e-Science programme:
  - issues X.509 digital certificates (usable with Globus, inter alia)
  - uses a network of registration authorities (RAs) to validate users
  - training courses for RA operators
  - used by both the e-Science and GridPP communities
  - CPS (Certification Practice Statement) complies with other European and US recommendations
  - see http://ca.grid-support.ac.uk
- National resource directory service:
  - based on Globus MDS 2 and BDII
  - holds published information on Grid-enabled resources
- Training for system administrators:
  - to help with setting up local installations
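Once a user holds an X.509 certificate, a GT2 resource decides which local account they run as by looking up the certificate's subject DN in a grid-mapfile (typically /etc/grid-security/grid-mapfile). Each entry is a double-quoted DN followed by a local user name. The parser below is a minimal sketch of that lookup. The DNs and account names are made up. Treating a comma-separated list of accounts as "first one is the default" is an assumption of this sketch.

```python
def parse_grid_mapfile(text: str) -> dict:
    """Map certificate subject DNs to lists of local account names.

    Assumed line format: "<subject DN>" account[,account...]
    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.startswith('"'):
            raise ValueError(f"line {lineno}: DN must be double-quoted")
        end = line.find('"', 1)
        if end == -1:
            raise ValueError(f"line {lineno}: unterminated DN")
        dn = line[1:end]
        accounts = [a.strip() for a in line[end + 1:].split(",") if a.strip()]
        if not accounts:
            raise ValueError(f"line {lineno}: no local account for {dn!r}")
        mapping[dn] = accounts
    return mapping


def local_account(mapping: dict, dn: str):
    """Return the default (first-listed) account for a DN, or None if it is unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


# Hypothetical entries in the style of UK e-Science certificate subjects.
EXAMPLE = '''
# hypothetical entries
"/C=UK/O=eScience/OU=Manchester/L=MC/CN=jane bloggs" jbloggs
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=john smith" jsmith,guest01
'''
gridmap = parse_grid_mapfile(EXAMPLE)
print(local_account(gridmap, "/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=john smith"))  # jsmith
```

This is why a certificate from a recognised CA is necessary but not sufficient for access: the resource owner still has to map the DN to an account.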

Slide 49: Where do I get the stuff from?
- Most of the Grid middleware shown is freely available:
  - http://www.globus.org/
  - http://www.unicore.org/
  - http://www.cs.wisc.edu/condor/
  - but not Avaki! See http://www.avaki.com/
- For Globus, the picture is more complicated:
  - you can get GT2 from Globus...
  - ...or from the Virtual Data Toolkit (VDT)
  - VDT is used by GriPhyN, iVDGL, PPDG and LCG
- For Web Services, you can also get things for free:
  - Axis, etc. from http://www.apache.org/
  - or the Sun J2EE Web Services container from http://www.sun.com/
  - but integrating things like WS-Security can be difficult
- Could do with a middleware repository...

Slide 50: "Our vision for the OMII is for it to become the source for reliable, interoperable and open-source Grid middleware, ensuring the continued success of Grid-enabled e-Science in the UK." (http://www.omii.ac.uk/)

Slide 51: Myth-Busting II
- The academics-only myth:
  - 67% of companies are using or planning to use Grids (Forrester 2004)
  - commercial vendors are investing
- The particle-physics-only myth:
  - Life Sciences and Medicine will dominate, because of their complex organisational, data and diversity characteristics

Slide 52: The Computational Grid myth
- Isn’t it just High Performance Computing and cycle stealing?
- It is the most mature kind of Grid.
- But Grid is really a generic mechanism for forming, managing and disbanding dynamic federations of services.
- Data integration, data access, data transport and transaction management will dominate.
- Application integration and cooperative information systems are key.
- This myth persists in the USA. Everyone else has gotten over it.

Slide 53: Killer App
- Dave Snelling said (at the EU Concertation meeting) that we don’t really know what the Grid is for yet.
- We continue to develop the technology, trying to apply it in various ways.
- Someone will come along, pick up the technology, and use it for something we’ve not imagined yet.
- Then this will really all take off...

Slide 54: So, now you know what Grid is all about!
- Thanks to others for the slides:
  - Carole Goble
  - Stephen Pickles
  - Rob Allan
- Also:
  - Mark McKeown
  - Andrew Grimshaw
  - Steven Newhouse
- So, any questions?

