What do we mean by the Grid and e-research? An overview of some key aspects and technologies in 30 minutes. Jennifer M. Schopf, UK National eScience Centre.


1 What do we mean by the Grid and e-research? An overview of some key aspects and technologies in 30 minutes
Jennifer M. Schopf, UK National eScience Centre and Argonne National Lab

2 Talk Outline
- Definition of Grid and eResearch
- Globus Toolkit
  - Provider of basic infrastructure
  - Focus on data tools
- OMII: Open Middleware Infrastructure Institute
  - UK repository and distribution of eResearch tools

3 What is a Grid?
- Many definitions, with many differences, especially between academia and industry
  - Both use the buzzword to get funding
- My definition:
  - Resource sharing
  - Coordinated problem solving
  - Dynamic, multi-institutional virtual organizations

4 Resource Sharing
- Resources can be anything:
  - Computers
  - Storage/repositories
  - Sensors and networks
  - People and software
- Local control of the resources, and local policies for their use
- Sharing is always conditional
  - Issues of trust and policy
  - Negotiation and payment

5 Coordinated Problem Solving
- Beyond client-server
  - Client-server defines a small set of well-understood interactions as the only ones that can take place
- Actions in this space can include:
  - Distributed data analysis
  - Computation and visualization of results
  - Collaboration

6 Dynamic, Multi-institutional Virtual Organizations
- Crossing administrative domains
  - No one has full control over the resources
  - Local policy, not global
  - Different local policies at different sites
- Community overlays on classic organizational structures
- Large or small, static or dynamic

7 What is eScience or eResearch?
- Use of distributed resources, in a coordinated way, across multiple administrative domains to do science or further your research
- “Classic” eScience
  - Use compute and data resources at many sites to run large-scale simulations for a physics or biology application
- Today’s use cases
  - Replicate data across multiple sites to increase reliability, redundancy, and performance
  - Use one common interface to access a variety of data resources at multiple sites
  - Look at a number of available resources to select the one that best suits the application’s needs at this time

8 Why is this hard/different?
- Lack of central control
  - Where things run
  - When they run
- Shared resources
  - Contention, variability
- Off-label use
  - Resources or software developed for one purpose (or community) are now being used in ways that were not originally planned for
- Communication
  - Different sites imply different sysadmins, users, institutional goals, and often “strong personalities”

9 So why do it?
- Work that needs to be done within a time limit
- Data that can’t fit on one site
- Data owned by multiple sites
- Applications that need to be run bigger, faster, more

10 What functionality is needed to use a Grid?
- Basics:
  - Run a job
  - Transfer a file
  - Find out what’s going on (service and job monitoring)
  - All done securely
- Higher-level:
  - Replication
  - Higher-level data movement
  - Workflow scheduling

11 Grid2003: An Operational Grid
- 28 sites (2100-2800 CPUs) and growing
- 10 substantial applications + CS experiments
- Running since October 2003, still up today
- http://www.ivdgl.org/grid2003

12 Globus Toolkit Was Created To Help Applications
- The Globus Toolkit consists of collections of solutions to problems that frequently come up when trying to build collaborative distributed applications
- Heterogeneity
  - Focus on simplifying heterogeneity for application developers
  - Working towards more “vertical solutions”
- Standards
  - Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
  - Reference implementations of new/proposed standards in these organizations

13 Globus is Service-Oriented Infrastructure Technology
- Software for service-oriented infrastructure
  - Service-enable new and existing resources
  - E.g., GRAM on a computer, GridFTP on a storage system, a custom application service
  - Uniform abstractions and mechanisms
- Tools to build applications that exploit service-oriented infrastructure
  - Registries, security, data management, …
- Open source and open standards
  - Each empowers the other
  - E.g., monitoring across different protocols is hard
- Enabler of a rich tool and service ecosystem

14 Our Goals for Globus Toolkit v4
- Usability, reliability, scalability, …
  - Web service components with quality equal or superior to pre-WS components
  - Documentation at an acceptable quality level
- Consistency with the latest standards (WS-*, WSRF, WS-N, etc.) and the Apache platform
  - WS-I Basic (Security) Profile compliant
- New components, platforms, languages
  - And links to the larger Globus ecosystem

16 Why Grid Security is Hard
- Resources being used may be valuable, and the problems being solved sensitive
- Resources are often located in distinct administrative domains
  - Each resource has its own policies and procedures
- The set of resources used by a single computation may be large, dynamic, and unpredictable
  - Not just client/server; requires delegation
- It must be broadly available and applicable
  - Standard, well-tested, well-understood protocols, integrated with a wide variety of tools

17 Basic Grid Security Mechanisms
- Grid-wide identities implemented as PKI certificates
- Transport-level and message-level authentication
- Ability to delegate credentials to agents
- Ability to map between Grid and local identities
- Local security administration and enforcement
- Single sign-on support implemented as “proxies”
- A “plug-in” framework for authorization decisions
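Mapping between Grid and local identities is commonly configured with a grid-mapfile, where each line pairs a quoted certificate subject (distinguished name, DN) with one or more local account names. The sketch below assumes that simple layout and shows how a Grid identity could be mapped to a local account. The DNs and account names are invented, and details a real deployment would handle (escaping inside DNs, authorization callouts) are ignored.

```python
import shlex
from typing import Dict, List

def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse grid-mapfile style text: a quoted DN followed by local account name(s)."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the double-quoted DN (which contains spaces) as one token
        parts = shlex.split(line)
        if len(parts) < 2:
            raise ValueError(f"malformed entry: {raw!r}")
        # accounts may be comma-separated; this sketch treats the first as the default
        accounts = [a for a in ",".join(parts[1:]).split(",") if a]
        mapping[parts[0]] = accounts
    return mapping

def map_identity(mapping: Dict[str, List[str]], dn: str) -> str:
    """Return the default local account for a Grid identity, or refuse access."""
    accounts = mapping.get(dn)
    if not accounts:
        raise PermissionError(f"no local account mapped for {dn}")
    return accounts[0]

# Hypothetical entries for illustration only
SAMPLE = '''
# Grid identity (certificate subject)      local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"            jdoe
"/O=Grid/OU=Example/CN=Sam Roe"             sroe,shared
'''

gridmap = parse_gridmap(SAMPLE)
print(map_identity(gridmap, "/O=Grid/OU=Example/CN=Jane Doe"))  # jdoe
```

Keeping the mapping in a local file reflects the "local security administration and enforcement" point: each site decides which Grid identities get which local accounts.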

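18 A Model Architecture for Data Grids
[Figure. Components: Application, Metadata Catalog, Replica Location Service, Replica Selection, MDS, NWS, GridFTP (control and data channels), and replica locations 1 to 3 backed by storage such as a tape library, disk caches, and a disk array. Numbered flow:
1. The application sends an attribute specification to the Metadata Catalog and receives a logical collection and logical file names.
2. The Replica Location Service maps the logical names to multiple physical locations.
3. Replica Selection uses performance information and predictions (MDS, NWS) to choose a replica.
4. The selected replica is accessed over the GridFTP control and data channels.]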
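In this architecture, replica selection weighs the multiple physical locations of a logical file against performance information and predictions (the figure names NWS and MDS as sources). A minimal sketch, assuming a simple cost model of predicted latency plus size divided by predicted bandwidth; the locations and numbers are hypothetical, and real selectors may use richer models.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    url: str               # physical location, as returned by a replica catalog
    bandwidth_mb_s: float  # predicted bandwidth to this site (e.g. from a forecaster)
    latency_s: float       # predicted connection/startup latency

def predicted_time(replica: Replica, size_mb: float) -> float:
    """Simple cost model: startup latency plus size divided by bandwidth."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s

def select_replica(replicas: List[Replica], size_mb: float) -> Replica:
    """Pick the replica with the lowest predicted transfer time."""
    if not replicas:
        raise LookupError("no physical replicas for this logical file")
    return min(replicas, key=lambda r: predicted_time(r, size_mb))

# Hypothetical locations and predictions
replicas = [
    Replica("gsiftp://near.example.org/data/run42.dat", bandwidth_mb_s=10.0, latency_s=0.1),
    Replica("gsiftp://far.example.org/data/run42.dat", bandwidth_mb_s=100.0, latency_s=2.0),
]
print(select_replica(replicas, size_mb=1).url)     # small file: low latency wins
print(select_replica(replicas, size_mb=1000).url)  # large file: high bandwidth wins
```

The point of using predictions rather than a fixed preference is visible here: the best replica depends on the request, not only on the site.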

19 GT4 Data Functions
- Find your data: Replica Location Service
  - Managing ~40M files in production settings
- Move/access your data: GridFTP, Reliable File Transfer (RFT)
  - High-performance striped data movement
- Couple data and execution management
  - GRAM uses GridFTP and RFT for staging
- Access databases through standard Grid interfaces: OGSA-DAI

20 GridFTP in GT4
- Basic file transfer support, and memory-to-memory copies
- Underlying protocol of access can vary
  - Functions as an hourglass, offering one interface to different resources
- Supports partial file transfers
- Can use parallel streams and striping
  - Greatly improves performance over most FTP implementations
  - On the TeraGrid network, achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
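Parallel streams, striping, and partial transfers all come down to working with byte ranges. The sketch below splits a range into contiguous chunks, one per stream, and checks the utilization figure on the slide (27/30 = 0.9). It is a simplified, static split and not the GridFTP wire protocol: in GridFTP's extended block mode each data block carries its own offset, so blocks can arrive out of order on different streams and still be placed correctly.

```python
from typing import List, Tuple

def partition(offset: int, length: int, streams: int) -> List[Tuple[int, int]]:
    """Split the byte range [offset, offset + length) into contiguous
    (offset, length) chunks, one per stream. A partial file transfer is
    simply a range that does not start at 0 or does not cover the whole file."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(length, streams)
    chunks = []
    pos = offset
    for i in range(streams):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size:
            chunks.append((pos, size))
        pos += size
    return chunks

def utilization(achieved_gbps: float, link_gbps: float) -> float:
    """Fraction of link capacity actually used."""
    return achieved_gbps / link_gbps

print(partition(0, 10, 3))     # whole 10-byte file over 3 streams
print(partition(1000, 8, 4))   # partial transfer: bytes 1000..1007 over 4 streams
print(utilization(27, 30))     # figures quoted on the slide
```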

21 Reliable File Transfer: Third-Party Transfer
[Figure: the RFT client exchanges SOAP messages with the RFT service and can optionally receive notifications. The RFT service drives two GridFTP servers; in each, a protocol interpreter and master DSI are connected over an IPC link to a slave DSI (IPC receiver) that handles the data channel, and data flows directly between the two servers' data channels.]
- Fire-and-forget transfer
- Web services interface
- Many files and directories
- Integrated failure recovery
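One way to get fire-and-forget behavior with integrated failure recovery is to persist per-file status after every transfer, so a restarted service skips files that already completed and retries the rest. The sketch below assumes a hypothetical `transfer(src, dst)` hook standing in for a GridFTP third-party transfer and keeps state in a JSON file; a production service would more likely use a database, and this is not RFT's actual implementation.

```python
import json
import os
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical transfer hook: copies src to dst and raises OSError on failure.
TransferFn = Callable[[str, str], None]

def run_request(pairs: Iterable[Tuple[str, str]], transfer: TransferFn,
                state_path: str, max_attempts: int = 3) -> Dict[str, str]:
    """Run a multi-file transfer request, persisting per-file status so that a
    restarted service resumes where it left off instead of starting over."""
    state: Dict[str, str] = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    for src, dst in pairs:
        if state.get(src) == "Done":
            continue  # already transferred before a crash or restart
        for _ in range(max_attempts):
            try:
                transfer(src, dst)
                state[src] = "Done"
                break
            except OSError:
                state[src] = "Failed"  # stays Failed if every attempt fails
        # persist after every file so progress survives a crash
        with open(state_path, "w") as f:
            json.dump(state, f)
    return state
```

Because the client only submits the request and the service owns the retry loop and the saved state, the client can disconnect ("fire and forget") and check status later.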

22 RFT Performance Stats
- Current maximum request size is approximately 20,000 entries with the default 64 MB heap size
- Infinite transfer, LAN
  - ~120,000 transfers (stopped only because the servers were killed by mistake)
- Infinite transfer, WAN
  - ~67,000 transfers (stopped for the same reason as above)
- Sloan Digital Sky Survey DR3 archive move
  - 900K+ files, 6 TB
  - The transfer was killed several times for recoverability testing
  - No human intervention has been required to date

23 Replica Location Service
- Identifies the location of files via a logical-to-physical name map
- Distributed indexing of names, fault-tolerant update protocols
- GT4 version is scalable and stable
- Managing ~40 million files across ~10 sites

Updates from a local catalog (Local DB) to an index:
Local DB size | Update send (secs) | Bloom filter (secs) | Bloom filter (bits)
10K | <1 | 2 | -
1 M | 2 | 24 | 10 M
5 M | 7 | 175 | 50 M
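The Bloom filter sizes quoted on this slide appear to correspond to roughly 10 bits per catalog entry. A Bloom filter lets a local catalog send a compact summary of its logical file names to an index: lookups never miss a name that is present, but may occasionally report a name that is absent. With 10 bits per entry and 7 hash functions, the standard estimate (1 - e^(-kn/m))^k gives a false-positive rate just under 1%. The sketch below is a generic Bloom filter under those assumed parameters; the hash scheme, catalog contents, and function names are illustrative, not RLS internals.

```python
import hashlib
from typing import Dict, Iterable, List

class BloomFilter:
    """Fixed-size bit array with k hash positions per item.
    No false negatives; false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def summarize(lfns: Iterable[str], bits_per_entry: int = 10, num_hashes: int = 7) -> BloomFilter:
    """Build the compact summary a local catalog would send to an index."""
    names = list(lfns)
    bf = BloomFilter(max(64, len(names) * bits_per_entry), num_hashes)
    for name in names:
        bf.add(name)
    return bf

def candidate_sites(index: Dict[str, BloomFilter], lfn: str) -> List[str]:
    """Sites whose summary may hold the logical file name (false positives possible)."""
    return sorted(site for site, bf in index.items() if bf.might_contain(lfn))

# Hypothetical local catalogs: logical file name -> physical copies at each site
local_catalogs = {
    "siteA": {"lfn://run42/events.dat": ["gsiftp://a.example.org/d/events.dat"]},
    "siteB": {"lfn://run43/events.dat": ["gsiftp://b.example.org/d/events.dat"]},
}
index = {site: summarize(catalog) for site, catalog in local_catalogs.items()}
print(candidate_sites(index, "lfn://run42/events.dat"))
```

A client would then ask only the candidate sites' local catalogs for the physical names, tolerating the occasional false positive.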

24 OGSA-DAI
- Data access
  - Relational and XML databases, semi-structured files
- Data integration
  - Multiple data delivery mechanisms, data translation
- Extensible and efficient framework
  - Request documents contain multiple tasks
    - A task = execution of an activity
    - Group work to enable efficient operation
  - Extensible set of activities
    - More than 30 predefined, plus a framework for writing your own
  - Moves computation to data
  - Pipelined and streaming evaluation
  - Concurrent task evaluation
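Pipelined, streaming evaluation of activities can be illustrated with chained generators: each activity consumes the upstream stream and yields results downstream, so rows flow through one at a time instead of being fully materialized. The sketch below is a hypothetical illustration, not the OGSA-DAI API; `sql_query`, `to_csv_lines`, and `run_pipeline` are invented names, and an in-memory SQLite database stands in for a relational data resource. The filter runs inside the SQL query, which mirrors "moving computation to data".

```python
import sqlite3
from typing import Callable, Iterable, Iterator, List

# An "activity" consumes an upstream stream of items and yields a downstream stream.
Activity = Callable[[Iterable], Iterator]

def sql_query(conn: sqlite3.Connection, sql: str) -> Activity:
    def activity(_upstream: Iterable) -> Iterator:
        # The query runs where the data lives; rows are yielded one at a time
        yield from conn.execute(sql)
    return activity

def to_csv_lines() -> Activity:
    def activity(upstream: Iterable) -> Iterator:
        for row in upstream:
            yield ",".join(str(v) for v in row)
    return activity

def run_pipeline(activities: List[Activity]) -> Iterator:
    """Chain activities so each item flows through the whole pipeline lazily."""
    stream: Iterable = iter(())
    for act in activities:
        stream = act(stream)
    return iter(stream)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [(1, 0.5), (2, 1.5), (3, 7.25)])

request = [
    sql_query(conn, "SELECT id, value FROM obs WHERE value > 1 ORDER BY id"),
    to_csv_lines(),
]
for line in run_pipeline(request):
    print(line)
```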

25 OGSA-DAI
- Current release: Release 5, included in GT4
  - Added installation wizards and indexed files
  - More than 1,100 registered users that we know about
  - Running on 3 message-passing infrastructures
- Release 6: May 2005
  - Improved client-side API
  - Explicit control of sequential and parallel tasks
  - Dynamic reconfigurability
  - WS-DAI reference implementation

26 Any questions on data management?

27 The Resource Management Challenge
- Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
  - Authentication and authorization
  - Resource discovery and characterization
  - Reservation and allocation
  - Computation monitoring and control
- Addressed by a set of protocols and services
  - GRAM protocol as a basic building block
  - Resource brokering and co-allocation services
  - GSI for security, MDS for discovery

28 Execution Management (GRAM)
- Common WS interface to schedulers
  - Unix, Condor, LSF, PBS, SGE, …
- More generally: an interface for process execution management
  - Lay down execution environment
  - Stage data
  - Monitor and manage lifecycle
  - Kill it, clean up
- A basis for application-driven provisioning
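The lifecycle steps listed for GRAM (lay down the environment, stage data, monitor, kill and clean up) can be pictured as a small state machine. The sketch below is a simplified, hypothetical model: the state names loosely resemble job states used by GT4 GRAM, but the transition table and the `Terminated` state are assumptions made for illustration, not GRAM's specification.

```python
from enum import Enum
from typing import List, Set

class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"        # waiting in the local scheduler's queue
    ACTIVE = "Active"
    STAGE_OUT = "StageOut"
    CLEAN_UP = "CleanUp"
    DONE = "Done"
    FAILED = "Failed"
    TERMINATED = "Terminated"  # hypothetical: job was killed by the user

S = JobState
# Normal forward path through the lifecycle
FORWARD = {
    S.UNSUBMITTED: {S.STAGE_IN},
    S.STAGE_IN: {S.PENDING},
    S.PENDING: {S.ACTIVE},
    S.ACTIVE: {S.STAGE_OUT},
    S.STAGE_OUT: {S.CLEAN_UP},
    S.CLEAN_UP: {S.DONE, S.TERMINATED},
}
TERMINAL = {S.DONE, S.FAILED, S.TERMINATED}

def allowed_next(state: JobState) -> Set[JobState]:
    if state in TERMINAL:
        return set()
    # Any live state may fail, or jump to clean-up (e.g. when the job is killed)
    return (FORWARD[state] | {S.FAILED, S.CLEAN_UP}) - {state}

class Job:
    def __init__(self) -> None:
        self.state = S.UNSUBMITTED
        self.history: List[JobState] = [self.state]

    def advance(self, new: JobState) -> None:
        if new not in allowed_next(self.state):
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.state = new
        self.history.append(new)

    def kill(self) -> None:
        """Move a live job through clean-up into the Terminated state."""
        if self.state in TERMINAL:
            raise ValueError("job has already finished")
        if self.state is not S.CLEAN_UP:
            self.advance(S.CLEAN_UP)
        self.advance(S.TERMINATED)

job = Job()
for step in (S.STAGE_IN, S.PENDING, S.ACTIVE, S.STAGE_OUT, S.CLEAN_UP, S.DONE):
    job.advance(step)
print([s.value for s in job.history])
```

Routing a kill through clean-up reflects the slide's "kill it, clean up": staged files and the execution environment are removed whether the job finishes or is stopped.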

29 Monitoring and Discovery Challenges
- Grid Information Service
- Requirements and characteristics
  - Uniform, flexible access to information
  - Scalable, efficient access to dynamic data
  - Access to multiple information sources
  - Decentralized maintenance
  - Secure information provision
- Basic monitoring for resource selection and notification of errors

30 Monitoring and Discovery
- “Every service should be monitorable and discoverable using common mechanisms”
  - WSRF/WSN provides those mechanisms
- A common aggregator framework for collecting information from services, giving:
  - Index Service: registry supporting XPath queries, with caching
  - Trigger Service: performs an action when a condition holds
- Deep integration with Globus containers and services: every GT4 service is discoverable
  - GRAM, RFT, GridFTP, CAS, …
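The aggregator idea can be sketched as a cache of the latest status document from each service, a query method over that cache, and triggers evaluated on every update (for example, a disk-nearly-full alert). This is a hypothetical illustration: it uses Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath, whereas the Index Service above is described as supporting XPath queries. The document layout, element names, and the 100 MB threshold are invented.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Condition = Callable[[ET.Element], bool]
Action = Callable[[str, ET.Element], None]

class Aggregator:
    """Hypothetical aggregator: caches the latest status document per source,
    answers path queries over the cache, and fires triggers on each update."""

    def __init__(self) -> None:
        self.cache: Dict[str, ET.Element] = {}
        self.triggers: List[Tuple[Condition, Action]] = []

    def add_trigger(self, condition: Condition, action: Action) -> None:
        self.triggers.append((condition, action))

    def update(self, source: str, xml_text: str) -> None:
        doc = ET.fromstring(xml_text)
        self.cache[source] = doc  # later queries are answered from this cache
        for condition, action in self.triggers:
            if condition(doc):
                action(source, doc)

    def query(self, path: str) -> List[Tuple[str, ET.Element]]:
        # ElementTree implements only a limited subset of XPath
        return [(source, el)
                for source, doc in sorted(self.cache.items())
                for el in doc.findall(path)]

def disk_nearly_full(doc: ET.Element) -> bool:
    free = doc.find("./Disk/FreeMB")
    return free is not None and int(free.text) < 100

alerts: List[str] = []
agg = Aggregator()
agg.add_trigger(disk_nearly_full, lambda source, doc: alerts.append(source))
agg.update("siteA", "<Host><Disk><FreeMB>50</FreeMB></Disk><Service type='GridFTP'/></Host>")
agg.update("siteB", "<Host><Disk><FreeMB>5000</FreeMB></Disk><Service type='GRAM'/></Host>")
print(alerts)
print([source for source, _ in agg.query("./Service[@type='GridFTP']")])
```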

31 With a standard deployment, a project can…
- Discover needed data from services in order to make job submission or replica selection decisions, by querying the VO-wide Index
- Evaluate the status of Grid services by looking at the VO-wide WebMDS setup
- Be notified when disks are full or other error conditions occur, by being on the list of administrators
- Individual projects can examine the state of the resources and services of interest to them

32 The Globus Ecosystem
- Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.
  - GT4 is the latest version
- A larger Globus ecosystem of open source and proprietary components provides complementary capabilities
  - A growing list of components
- These components can be combined to produce solutions to Grid problems
  - We’re building a list of such solutions

33 Many Tools Build on, or Can Contribute to, GT4-Based Grids
- Condor-G, DAGman
- MPICH-G2
- GRMS
- Nimrod-G
- Ninf-G
- Open Grid Computing Env.
- Commodity Grid Toolkit
- GriPhyN Virtual Data System
- Virtual Data Toolkit
- GridXpert Synergy
- Platform Globus Toolkit
- VOMS
- PERMIS
- GT4IDE
- Sun Grid Engine
- PBS scheduler
- LSF scheduler
- GridBus
- TeraGrid CTSS
- NEES
- IBM Grid Toolbox
- …

34 Open Middleware Infrastructure Institute
- Formed at the University of Southampton (2004)
  - Focus on an easy-to-install e-Infrastructure solution
  - Utilise existing software and standards
- Expanding with new partners in 2006
  - OGSA-DAI team at Edinburgh
  - myGrid team at Manchester
To be a leading provider of reliable, interoperable, and open-source Grid middleware components, services, and tools to support advanced Grid-enabled solutions in academia and industry.
Slides compliments of Steven Newhouse

35 Activity
- By providing a software repository of Grid components and tools from e-science projects
- By re-engineering software, hardening it, and providing support for components sourced from the community
- By a managed programme to contract the development of “missing” software components necessary in Grid middleware
- By providing an integrated Grid middleware release of the sourced software components

36 The Managed Programme: Distribution and Repository
- OGSA-DAI (Data Access service)
- GridSAM (Job Submission and Monitoring service)
- Grimoires (Registry service based on UDDI)
- GeodiseLab (Matlab and Jython environments)
- FINS (Notification services using WS-Eventing)
- BPEL (Workflow service)
- MANGO (Managing workflows with BPEL)
- FIRMS (Reliable messaging)

37 So…
- eResearch is expanding in scope
- The Globus Toolkit provides many basic tools and is incorporated in many projects, especially those focused on data movement
- In the UK, OMII is another useful source of eInfrastructure software
[Image: 2nd Edition, www.mkp.com/grid2]

38 Additional Information
- Contact:
  - Jennifer M. Schopf
  - jms@mcs.anl.gov
  - http://www.mcs.anl.gov/~jms
- Globus Alliance:
  - http://www.globus.org
- Information about OMII:
  - http://www.omii.ac.uk
  - s.newhouse@omii.ac.uk

