14.1 “Grid-enabling” applications ITCS 4146/5146 Grid Computing, 2007, UNC-Charlotte, B. Wilkinson. March 27, 2007.


1 14.1 “Grid-enabling” applications ITCS 4146/5146 Grid Computing, 2007, UNC-Charlotte, B. Wilkinson. March 27, 2007

2 14.2 “Grid-enabling” A poorly defined and understood term! One simple definition: –Being able to execute an application on a grid platform, using the distributed resources available on that platform.

3 14.3 Another definition from the literature: “Turning an existing application, installed on a Grid resource, into a service and generating the application-specific user interfaces to use that application through a web portal.” 1 This definition assumes a portal interface and the use of services. 1 From: "A Service-Oriented, Scalable Approach to Grid-Enabling of Legacy Scientific Applications" by Sanjeepan, Vivekananthan; Matsunaga, Andrea; Zhu, Liping; Lam, Herman; Fortes, Jose A.B. Proc. of 2005 Int. Conf. on Web Services (ICWS-2005), Orlando, Florida, p. 553-560, 11-15 July, 2005.

4 14.4 How does one do “Grid-enabling”? Still an open question and in the research domain, without a standard approach. Here we will describe various approaches.

5 14.5 Simple “grid-enabling” First step: Simply running an application on a grid resource. Might just mean making sure the executable and input files are available to the application. Not exactly making the most of the grid platform!

6 14.6 Best types of applications for grid-enabling One homogeneous application that needs to be executed multiple times with different arguments (“parameter sweep”) – perfect Computationally intensive –a high 'compute time' vs. 'communication time' ratio An MPI-type parallel application with minimal message-passing between grid sites

7 14.7 Parameter Sweep Examples Molecular biologist (drug designer) looking for compounds in large chemical data sets that best dock with a particular protein Geologist looking at change in density and depth of ore-body and overlying rock’s density to optimise cost and production Aerospace engineer understanding role of geometry parameters in aerodynamic design and optimization process High energy physicist investigating origin of mass by analyzing petabytes of data generated by high-energy accelerators such as the LHC (Large Hadron Collider) Neuroscientist performing brain activity analysis by conducting pair-wise cross-correlation analysis of MEG (Magneto-EncephaloGraphy) sensor data Source: Alchemi project.
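
A parameter sweep can be sketched in a few lines. The following is only an illustration running on one machine with a thread pool, not a grid submission; `simulate` and its `density` and `depth` parameters are hypothetical stand-ins for a real application and its arguments.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(density, depth):
    """Hypothetical stand-in for one run of the application."""
    return density * depth


def parameter_sweep(func, **param_values):
    """Run func once for every combination of the given parameter values."""
    names = list(param_values)
    combos = [dict(zip(names, values))
              for values in product(*param_values.values())]
    # Each run is independent of the others, so the runs can proceed
    # concurrently; on a grid each combination would become a separate job.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda kwargs: func(**kwargs), combos))
    return list(zip(combos, results))


if __name__ == "__main__":
    for args, value in parameter_sweep(simulate,
                                       density=[2.5, 3.0],
                                       depth=[100, 200]):
        print(args, value)
```

Because no run depends on another, the same structure maps naturally onto independent grid jobs, which is why this class of application is described as a perfect fit.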

8 14.8 Grid-enabling MPI programs Globus version of MPI available to run MPI jobs across a grid (MPICH-G2). http://www.globus.org/grid_software/computation/mpich-g2.php Message passing can cross sites:

9 14.9 MPICH-G2 programs Ideally one can simply run the MPI job unmodified across the grid. However, it is not that simple.

10 14.10 Problems: Firewalls: Need to accommodate firewalls by opening up ports. Job Schedulers: Each site will have a separate, independent local job scheduler, which means one cannot guarantee that the MPI processes at different sites will all be running at the same time in order to communicate. (This issue does not seem to be mentioned in MPICH-G2 documentation.) Latency: The delays in messages in transit are much larger and more variable between sites (Internet).
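
The effect of latency on the 'compute time' vs. 'communication time' ratio can be illustrated with a deliberately crude model. The sketch below assumes every message costs one full latency and that communication never overlaps with computation; the function name and all numbers are illustrative assumptions, not measurements.

```python
def parallel_efficiency(compute_s, messages, latency_s):
    """Fraction of wall-clock time spent computing.

    Assumes each message costs one full latency and nothing overlaps,
    which is a simplification made only for illustration.
    """
    return compute_s / (compute_s + messages * latency_s)


if __name__ == "__main__":
    # Hypothetical job: 10 s of computation and 1000 messages.
    within_site = parallel_efficiency(10.0, 1000, 0.0001)  # assumed 0.1 ms
    between_sites = parallel_efficiency(10.0, 1000, 0.05)  # assumed 50 ms
    print(f"within one site: {within_site:.3f}")
    print(f"between sites:   {between_sites:.3f}")
```

Under these assumed figures the same program spends most of its time waiting once its messages cross sites, which is why minimal message-passing between grid sites matters.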

11 14.11 http://www.ngpp.ngp.org.sg/

12 14.12 More advanced “grid-enabling” Some strategies: 1. Using Globus and Grid service APIs 2. Using Grid wrappers to form services 3. Higher-level toolkits

13 14.13 1. Using Globus APIs Globus provides a suite of services that have APIs 1 (C and Java interfaces) that could be called from the application. 1 API: An application programming interface is a source code interface that a computer system or program library provides in order to support requests for services to be made of it by a computer program. http://en.wikipedia.org/wiki/API

14 14.14 Examples GridFTP for high performance file transfers. MDS (Monitoring and Discovery Service) for resource monitoring and discovery. Provides information about available grid resources and their status. RLS (Replica Location Service): maintains and provides access to mapping information from logical names for data items to target names - a database that maps logical file names or file aliases to physical locations. GASS (Global Access to Secondary Storage): Provides mechanisms for transferring data to and from a remote HTTP, FTP, or GASS server. Condor-G uses GASS to transfer the executable, stdin, stdout, and stderr to/from the remote resource.

15 Globus Services [Figure: Globus Toolkit components arranged in columns (Security, Data Management, Execution Management, Information Services, Common Runtime) and split into Web Services components and non-WS components, each marked GT2, GT3, or GT4. Components shown: Pre-WS Authentication Authorization, WS Authentication Authorization, Community Authorization Service, Credential Management, Delegation Service, GridFTP, Reliable File Transfer, OGSA-DAI [Tech Preview], Replica Location Service, Grid Resource Allocation Mgmt (Pre-WS GRAM), Grid Resource Allocation Mgmt (WS GRAM), Community Scheduler Framework [contribution], Monitoring & Discovery System (MDS2), Monitoring & Discovery System (MDS4), C Common Libraries, XIO, Java WS Core, C WS Core, Python WS Core [contribution].]

16 14.16 GridFTP Built on FTP using separation of data and control channels Provides features for –Large data transfers –Secure transfers –Fast transfers –Reliable transfers –Third party transfers Not a web service –RFT (Reliable File Transfer) service provides a WS-level interface

17 14.17 Third party transfers PI = FTP Protocol Interpreter, DTP = FTP Data Transfer Process [Figure: client and server protocol interpreters (PI) linked by control channels; data transfer processes (DTP) linked by a data channel.]

18 14.18 Performing a third-party transfer 1. Client establishes control channel with server. 2. Using control channel, client sets up transfer parameters and requests data channel creation. 3. Data channel established. 4. Client sends transfer command over control channel. 5. Data transfer starts through data channel. Either client or server can send.
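
For the third-party case, where the client holds a control channel to each of two servers and the data flows directly between them, the control-channel traffic can be sketched with standard FTP commands (PASV, PORT, STOR, RETR). This is only an illustration of command ordering under plain FTP; the function name, the `source`/`dest` labels, and the `ADDR` placeholder are assumptions, and GridFTP adds its own extensions that are not shown.

```python
def third_party_transfer_commands(src_path, dst_path):
    """Return (server, command) pairs in the order a client might send them.

    'source' and 'dest' label the two servers. ADDR is a placeholder for
    the host/port value that the destination returns in its PASV reply.
    """
    return [
        ("dest", "TYPE I"),             # binary mode on both servers
        ("source", "TYPE I"),
        ("dest", "PASV"),               # dest listens and reports its ADDR
        ("source", "PORT ADDR"),        # source is told to connect to ADDR
        ("dest", f"STOR {dst_path}"),   # dest will store incoming data
        ("source", f"RETR {src_path}"), # source sends over the data channel
    ]


if __name__ == "__main__":
    for server, command in third_party_transfer_commands("/data/in.dat",
                                                         "/data/out.dat"):
        print(f"{server:>6}: {command}")
```

The client never touches the file contents: it only issues commands on the two control channels, and the data channel is set up between the servers.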

19 14.19 Parallel transfers and striping Using multiple (virtual) connections for transfer –Same external network –Speed improvement possible, but limited by network card Striping – a version of parallel transfers that can use separate hardware interfaces –Implemented in GT 4.
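
The idea behind parallel transfers can be sketched by dividing a file into byte ranges and moving each range on its own worker. The example below copies an in-memory byte string with threads, where each worker merely stands in for one data connection; it is not GridFTP code, and `split_ranges` and `parallel_copy` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Divide [0, size) into `streams` contiguous (start, end) ranges."""
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        # The first `extra` ranges get one additional byte each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_copy(source, streams=4):
    """Copy `source` bytes using one worker per range, then return the copy."""
    dest = bytearray(len(source))

    def copy_range(bounds):
        start, end = bounds
        # Stands in for sending one byte range over one data connection.
        dest[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, split_ranges(len(source), streams)))
    return bytes(dest)


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    print(parallel_copy(data, streams=4) == data)
```

Because every range carries its own offsets, the ranges can arrive in any order and still be reassembled correctly.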

20 14.20 GridFTP and RFT [Figure: a WS client calls the RFT service (Java), which uses the client API (Java) to drive XIO-based (C) GridFTP servers over control channels; a data channel connects the GridFTP servers.] From Gridwise

21 14.21 GT 4 Replica Location Service Identify location of files via logical to physical name map. Distributed indexing of names, fault tolerant update protocols. [Figure label: Index] Source: I. Foster
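
The logical-to-physical name map at the heart of a replica location service can be pictured as a dictionary from one logical name to a set of physical locations. The sketch below is an in-memory illustration only, not the RLS API; the class name, method names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica locations."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name):
        """Return all known locations, or an empty list if none exist."""
        return sorted(self._replicas.get(logical_name, ()))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
    catalog.register("run42.dat", "gsiftp://site-b.example.org/store/run42.dat")
    print(catalog.lookup("run42.dat"))
```

An application asks for the logical name and picks any returned replica, so data can be moved or copied between sites without changing application code.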

22 14.22 Monitoring and Discovery WSRF provides common mechanisms for monitoring and discovering a service. Every GT 4 service is discoverable.

23 14.23 2. Grid service wrapper approach Providing a wrapper to make it possible to access the application as a grid service. [Figure: Request -> Grid service -> Application] One of our guest speakers (Joel Hollingsworth) will discuss this in more detail.
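
The wrapper idea can be sketched without any grid middleware: a function accepts a request, launches the unmodified application, and returns its output as a response. The example below assumes the application is a command-line program; `make_service`, the request and response field names, and the tiny stand-in `LEGACY_APP` are all hypothetical, and a real grid service would add a network interface and security on top.

```python
import subprocess
import sys


def make_service(command):
    """Wrap a command-line program behind a request/response function."""

    def service(request):
        # Append the request arguments to the fixed command and run it.
        completed = subprocess.run(
            command + [str(arg) for arg in request.get("args", [])],
            input=request.get("stdin", ""),
            capture_output=True,
            text=True,
        )
        return {
            "exit_code": completed.returncode,
            "stdout": completed.stdout,
            "stderr": completed.stderr,
        }

    return service


# Stand-in "legacy application": prints the sum of its arguments.
LEGACY_APP = [
    sys.executable,
    "-c",
    "import sys; print(sum(map(float, sys.argv[1:])))",
]

if __name__ == "__main__":
    add_service = make_service(LEGACY_APP)
    print(add_service({"args": [1, 2.5]}))
```

The application itself is untouched; only the wrapper knows how it is launched, which is what makes the approach attractive for existing code.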

24 14.24 3. Higher-level toolkits Objective is to provide a suite of APIs that are system independent, to hide the underlying grid structure, and even the fact that it is using Globus or any other lower-level grid middleware. Examples: Grid Application Toolkit (GAT)

25 14.25 Grid Application Toolkit (GAT) APIs for developing and executing portable grid applications that are independent of the underlying grid infrastructure and available services GAT APIs used by application to access grid services Essentially wrapper code that hides Globus API.
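
One way to picture how a toolkit can hide the underlying middleware is an adapter layer: the application calls a single portable interface, and the toolkit tries whichever back ends are available. The sketch below illustrates that pattern only and is not the GAT API; every class and method name is hypothetical, and the "grid" adapter simply raises an error to mimic missing middleware.

```python
import shutil
from abc import ABC, abstractmethod


class FileAdapter(ABC):
    """Interface each back end implements."""

    @abstractmethod
    def copy(self, src, dst):
        ...


class UnavailableGridAdapter(FileAdapter):
    """Mimics a grid back end whose middleware is not installed."""

    def copy(self, src, dst):
        raise RuntimeError("grid middleware not available")


class LocalFileAdapter(FileAdapter):
    """Fallback that copies on the local file system."""

    def copy(self, src, dst):
        shutil.copyfile(src, dst)
        return "local"


class PortableFiles:
    """Application-facing API: tries adapters in order until one works."""

    def __init__(self, adapters):
        self._adapters = list(adapters)

    def copy(self, src, dst):
        errors = []
        for adapter in self._adapters:
            try:
                return adapter.copy(src, dst)
            except Exception as exc:  # remember the failure, try the next
                errors.append(exc)
        raise RuntimeError(f"no adapter could copy {src}: {errors}")
```

The application only ever calls `PortableFiles.copy`, so swapping or adding a back end requires no change to application code.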

26 14.26

27 14.27 Deploying legacy code For the most part, people want to re-use their existing high performance code. Several projects to make this easier. Example GriddLeS: Grid Enabling Legacy Software http://www.csse.monash.edu.au/~davida/griddles/

28 14.28 Uses GAT

29 14.29

30 14.30 Other tools

31 14.31 Data Grids Data integration: Data integration is the capability to link different datasets together, thereby enabling users to interact with them as if they were a single, unified and homogeneous resource.

32 14.32 OGSA-DAI Project Open Grid Services Architecture Data Access and Integration Aim of the OGSA-DAI project is to develop middleware to assist with access and integration of data from separate sources via the grid. http://www.ogsadai.org.uk/

33 14.33 Grid-enabling a data resource using OGSA “ … Placing it behind wrapper middleware for the Grid, e.g., OGSA-DAI. … Once a data resource is Grid-enabled, its availability can be easily advertised in registries where advanced Grid middleware will know to find them and learn of their specific usage conditions for both access and update, as the case may be. ” http://www.ncess.ac.uk/learning/tutorials/datagrids/grid_en/why_grid_en_important/what_grid_en_involves/

34 14.34 http://www.ncess.ac.uk/learning/tutorials/datagrids/grid_en/why_grid_en_important/what_grid_en_involves/

35 14.35 OGSA-DAI Architecture

36 14.36 End of formal lecture materials in course !!

37 14.37 What Next Mini-project: Will be discussed Thursday March 29th, 2007. PLEASE BE SURE TO ATTEND THIS CLASS Actually, the mini-project will not start until April, after the MPI assignment, but next week we have a guest presentation.

