OGSA-DAI Lectures Part 2 Tom Sugden, EPCC 2nd International Summer School on Grid Computing, Vico Equense, Italy.


Outline
- Inside a Grid Data Service (15 mins)
- OGSA-DAI User Guide (30 mins)
- The Client Toolkit APIs (20 mins)
- Wrap-up (15 mins)

Status
- OGSA-DAI middleware
  - Release 4 of 7
  - Functional and flexible
  - Performance and scalability issues
- Depends on:
  - Globus Toolkit 3.2
  - Java 1.4+
  - Apache Ant
- Supports various databases
  - MySQL, Oracle, DB2, PostgreSQL, Xindice

Inside a Grid Data Service

[Diagram labels: Grid Data Service, Data Resource, Perform Document, Response Document, Result Data]

Overview
- Low-level components of a Grid Data Service
  - Engine
  - Activities
  - Data Resource Implementation
  - Role Mapper
- Extensibility of OGSA-DAI architecture
  - Interfaces
  - Abstract classes
  - Implementations

GDS Internals
[Diagram. Components: The Engine, Query Activity, Transform Activity, Delivery Activity, Data Resource Implementation, Role Mapper. Arrow labels: perform document, response document, element, query, data, credentials, role, connection.]

Grid Data Service
- GDS has a document-based interface
  - Consumes perform documents
  - Produces response documents
  - Additional operations for 3rd-party data delivery
- Motivation for using a document interface
  - A change in behaviour does not imply an interface change
  - Reduces the number of operation calls
  - Extensible

The GDS Engine
- Engine is the central GDS component
- Dictates behaviour when perform documents are submitted
  - Parses and validates the perform document
  - Identifies the required activity implementations
  - Processes activities
  - Composes the response document
  - Returns the response document to the GDS

Perform Documents
- Perform documents
  - Encapsulate multiple interactions with a service into a single interaction
  - Abstract each interaction into an "activity"
  - Data can flow from one activity to another
  - Not quite workflow
    - No control constructs present (conditionals, loops, variables)
Query → Transformation → Delivery

Activities
- An Activity dictates an action to be performed
  - Query a data resource
  - Transform data
  - Deliver results
- Engine processes a sequence of activities
- Subset of activities available to a GDS
  - Specified in a configuration file
- Data can flow between activities
[Diagram: SQL Query Statement → WebRowSet data → XSLT Transform → HTML data → Delivery ToURL]

Activity Taxonomy
- Activities fall into three main functional groups
- Statement
  - Interact with the data resource
- Delivery
  - Deliver data to and from 3rd parties
- Transform
  - Perform transformations on data
[Diagram: Activity, with subtypes Statement, Delivery, Transform]

Predefined Activities: Building Blocks
- sqlQueryStatement, sqlStoredProcedure, sqlUpdateStatement, sqlBulkLoadRowset
- xPathStatement, xUpdateStatement, xQueryStatement
- xmlResourceManagement, xmlCollectionManagement, relationalResourceManager
- gzipCompression, zipArchive, xslTransform
- inputStream, outputStream
- DeliverFromURL, DeliverToURL, DeliverToGFTP, DeliverFromGFTP, DeliverToStream, DeliverFromGDT, DeliverToGDT

The Activity Framework
- Extensibility point
- Users can develop additional activities
  - To support different query languages
    - XQuery
  - To perform different kinds of transformation
    - STX
  - To deliver results using a different mechanism
    - WebDAV
- An activity requires
  - XSD schema: sql_query_statement.xsd
  - Java implementation: SQLQueryStatementActivity

The Activity Class
- All Activity implementations extend the abstract Activity class
[Class diagram]
    Activity
      ~ mContext: ActivityContext
      + Activity( element: Element )
      ~ cleanUp()
      ~ initialise()
      ~ processBlock() : void
      ~ setCompleted()
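The lifecycle implied by this class diagram can be sketched roughly as below. This is a self-contained stand-in, not the OGSA-DAI source: the `Activity` class here only mirrors the method names on the slide, while `isCompleted`, `UpperCaseActivity`, the `run` loop, and the use of plain strings as blocks are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

final class ActivityLifecycleSketch {

    /** Stand-in for the abstract Activity class; only the method names come from the slide. */
    abstract static class Activity {
        private boolean completed = false;

        void initialise() { }            // acquire resources before the first block
        abstract void processBlock();    // handle one block of data per call
        void cleanUp() { }               // release resources after the last block

        void setCompleted() { completed = true; }
        boolean isCompleted() { return completed; }   // hypothetical helper, not on the slide
    }

    /** Hypothetical activity that upper-cases each input block. */
    static final class UpperCaseActivity extends Activity {
        private final Iterator<String> input;
        private final List<String> output;

        UpperCaseActivity(List<String> input, List<String> output) {
            this.input = input.iterator();
            this.output = output;
        }

        @Override
        void processBlock() {
            if (input.hasNext()) {
                output.add(input.next().toUpperCase(Locale.ROOT));
            } else {
                setCompleted();          // nothing left to read
            }
        }
    }

    /** Assumed engine loop: initialise once, process until completed, always clean up. */
    static void run(Activity activity) {
        activity.initialise();
        try {
            while (!activity.isCompleted()) {
                activity.processBlock();
            }
        } finally {
            activity.cleanUp();
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        run(new UpperCaseActivity(Arrays.asList("frog", "toad"), out));
        System.out.println(out);
    }
}
```

Processing one block per call is what would let an engine interleave several activities, rather than running each one to completion in turn.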

Connected Activities
[Diagram: Sql Query Statement connected to Deliver ToURL]
    select * from myTable where id=10

Connected Activities cont.
[Diagram: Sql Query Statement connected to Deliver ToURL]
    select * from myTable where id=10

The Perform Document
    <gridDataServicePerform xmlns=" xmlns:xsi=" xsi:schemaLocation="
    This example performs a simple select statement to retrieve one row from the test database then delivers the results to an FTP location.
    select * from littleblackbook where id=10
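Since most of the XML on this slide did not survive, the overall shape of such a document (a query activity whose named output feeds a delivery activity) can be sketched as below. Only `gridDataServicePerform` and `sqlQueryStatement` appear in the slides. The child elements `expression`, `webRowSetStream`, `deliverToURL`, `fromLocal` and `toURL`, the attribute names, and the FTP URL are assumptions for illustration and should be checked against the schemas shipped with OGSA-DAI. Namespaces are omitted because the originals are not recoverable here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PerformDocumentSketch {

    /** Builds a two-activity document: SQL query, then delivery of the query output. */
    static String build(String sql, String targetUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element perform = doc.createElement("gridDataServicePerform");
            doc.appendChild(perform);

            // Query activity: names its output so that a later activity can refer to it.
            Element query = doc.createElement("sqlQueryStatement");
            query.setAttribute("name", "statement");
            Element expression = doc.createElement("expression");      // assumed element name
            expression.setTextContent(sql);
            query.appendChild(expression);
            Element stream = doc.createElement("webRowSetStream");     // assumed element name
            stream.setAttribute("name", "statementOutput");
            query.appendChild(stream);
            perform.appendChild(query);

            // Delivery activity: takes the query output as input (data flow, not control flow).
            Element deliver = doc.createElement("deliverToURL");       // assumed element name
            deliver.setAttribute("name", "deliverOutput");
            Element from = doc.createElement("fromLocal");             // assumed element name
            from.setAttribute("from", "statementOutput");
            deliver.appendChild(from);
            Element to = doc.createElement("toURL");                   // assumed element name
            to.setTextContent(targetUrl);
            deliver.appendChild(to);
            perform.appendChild(deliver);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build perform document", e);
        }
    }

    public static void main(String[] args) {
        // The FTP location is a placeholder, not a real endpoint.
        System.out.println(build("select * from littleblackbook where id=10",
                                 "ftp://ftp.example.org/results.xml"));
    }
}
```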

Activity Inputs and Outputs
- Activities read and write blocks of data
  - Allows efficient streaming between activities
  - Reduces memory overhead
- A block is a Java Object
  - Untyped but usually a String or byte array
- Interfaces for reading and writing
  - BlockReader and BlockWriter
[Diagram: SQL Query Statement → XSL Transform Activity → Deliver To URL]
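The memory argument for block streaming can be illustrated with a small sketch. The interface names `BlockReader` and `BlockWriter` come from the slide, but their method signatures here are assumptions, and `Pipe`, `stream` and the `<td>` wrapping are hypothetical stand-ins for the real query, transform and delivery activities.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class BlockStreamingSketch {

    /** Interface names are from the slide; the method signatures are assumptions. */
    interface BlockReader { Object read(); }            // returns null when no block is waiting
    interface BlockWriter { void write(Object block); }

    /** Hypothetical pipe joining one activity's output to the next activity's input. */
    static final class Pipe implements BlockReader, BlockWriter {
        private final Deque<Object> blocks = new ArrayDeque<>();
        private int maxBuffered = 0;

        @Override
        public void write(Object block) {
            blocks.addLast(block);
            maxBuffered = Math.max(maxBuffered, blocks.size());
        }

        @Override
        public Object read() {
            return blocks.pollFirst();
        }

        int maxBuffered() { return maxBuffered; }
    }

    /** Pushes rows through query -> transform -> delivery one block at a time. */
    static int stream(List<String> rows, List<String> delivered) {
        Pipe queryToTransform = new Pipe();
        Pipe transformToDelivery = new Pipe();
        for (String row : rows) {
            queryToTransform.write(row);                          // "query" emits a block
            Object block = queryToTransform.read();               // "transform" consumes it
            transformToDelivery.write("<td>" + block + "</td>");  // and emits a new block
            delivered.add((String) transformToDelivery.read());   // "delivery" consumes that
        }
        // Peak number of blocks held in either pipe at any moment.
        return Math.max(queryToTransform.maxBuffered(), transformToDelivery.maxBuffered());
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        int peak = stream(Arrays.asList("row1", "row2", "row3"), delivered);
        System.out.println(delivered + ", peak buffered blocks: " + peak);
    }
}
```

Because each block is consumed as soon as it is produced, the pipes never hold more than one block regardless of how many rows the query returns.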

Data Resource Implementations
- Governs access to a data resource
  - Open/close connections
  - Validate user credentials using a RoleMapper
  - Facilitate connection pooling
- Provided for JDBC and XML:DB
[Diagram labels: SQL Query Statement, JDBC Data Resource, Relational database; get connection, return connection, open connection, close connection]

Accessing Data Resource: Sequence Diagram
Participants: :Activity, :Context, :DataResourceImplementation, :RoleMapper, :DatabaseRole
- Get user credentials and data resource implementation (from :Context)
- Get connection using user credentials
- Get database role using user credentials
- Get user ID and password
- Open connection using user ID and password
- Do exciting things with the connection
- Return connection
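The credential-to-role step in this sequence can be sketched as a simple lookup. This is an illustration under assumptions, not the OGSA-DAI `RoleMapper` API: `InMemoryRoleMapper`, `openConnection` and the certificate DN for Alice are made up. The user IDs, passwords and the "No Certificate Provided" entry are the ones that appear on the roles slides.

```java
import java.util.HashMap;
import java.util.Map;

final class RoleMapperSketch {

    /** Database user ID and password that a grid identity maps to. */
    static final class DatabaseRole {
        final String userId;
        final String password;

        DatabaseRole(String userId, String password) {
            this.userId = userId;
            this.password = password;
        }
    }

    /** Hypothetical in-memory role mapper keyed by certificate distinguished name (DN). */
    static final class InMemoryRoleMapper {
        private final Map<String, DatabaseRole> rolesByDn = new HashMap<>();

        void addUser(String dn, String userId, String password) {
            rolesByDn.put(dn, new DatabaseRole(userId, password));
        }

        /** Returns the mapped role, or throws if the identity has no entry. */
        DatabaseRole getDatabaseRole(String dn) {
            DatabaseRole role = rolesByDn.get(dn);
            if (role == null) {
                throw new SecurityException("No database role for: " + dn);
            }
            return role;
        }
    }

    /** Stand-in for the data resource step: resolve the role, then "open" a connection. */
    static String openConnection(InMemoryRoleMapper mapper, String dn) {
        DatabaseRole role = mapper.getDatabaseRole(dn);
        // A real implementation would hand role.userId and role.password to the database driver.
        return "connected as " + role.userId;
    }

    static InMemoryRoleMapper exampleMapper() {
        InMemoryRoleMapper mapper = new InMemoryRoleMapper();
        mapper.addUser("/O=Example/CN=Alice", "alice", "amph1b1an");   // made-up DN
        mapper.addUser("No Certificate Provided", "guest", "guest");   // anonymous access
        return mapper;
    }

    public static void main(String[] args) {
        InMemoryRoleMapper mapper = exampleMapper();
        System.out.println(openConnection(mapper, "/O=Example/CN=Alice"));
        System.out.println(openConnection(mapper, "No Certificate Provided"));
    }
}
```

Keeping the mapping outside the activity means the activity code never sees database passwords. It only asks for a connection on behalf of the caller's credentials.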

Advantages of the Activity Model
- Avoid multiple message exchanges
  - Multiple activities within a single request
- Extensible
  - Developers can add functionality
  - Could import third-party trusted activities
- Simplicity
  - Internal classes manage data flow, access to databases, etc.

Issues with Activity Model
- Incomplete syntax
  - No typing of inputs and outputs
    - How do you determine the data types that can be accepted?
    - Keeping implementation and XML Schema fragment in sync
- Puts workload on the server
  - May need dynamic job placement
- DAIS has factored out the perform document from the draft specs

Summary
- The Engine is the central component of a GDS
- Activities perform actions
  - Querying, Updating
  - Transforming
  - Delivering
- Data Resource Implementations manage access to underlying data resources
- Architecture designed for extensibility
  - New Activities
  - New Role Mappers
  - New Data Resource Implementations

OGSA-DAI User Guide

OGSA-DAI in a Nutshell
- All you need to know to get started with OGSA-DAI in a handy pocket-sized book!
- Updated for Version 4

Overview
- Installing OGSA-DAI
- Configuring Grid Data Service Factories
- Registering Services
- Using Grid Data Services
  - Writing perform documents
  - Using the supplied client applications
  - Using the client toolkit
- Learn by scenario

Scenario: Red-Eyed Tree Frogs
- Alice is a molecular biologist
  - Based at the University of Edinburgh
  - Mapped the genetic sequence of the Red-Eyed Tree Frog

Background
- Alice wants to make her work available to the scientific community
  - Publish an on-line database
  - Use OGSA-DAI
[Diagram labels: Alice, Bob, Carroll]

Alice's Database
- MySQL relational database
  - jdbc:mysql://localhost:3306/TreeFrogs
- Contains 1 table with 1,000,000 rows
  - GeneticSequence
- JDBC Database Driver
  - org.gjt.mm.mysql.Driver
[Diagram labels: Tree Frogs, Driver]

Installing OGSA-DAI
- Download OGSA-DAI software
- Follow installation notes
  - Set up prerequisite software
    - Java (JDK1.3 or newer)
    - Web services container (Tomcat)
    - Grid middleware (Globus Toolkit 3.2)
    - Build tool (Ant)
    - Additional libraries (Log4J, database drivers, etc.)
  - Deploy OGSA-DAI

Configuring Services
- Configure Grid Data Service Factories (GDSF)
  1. Allow specific users read/write access
  2. Allow anonymous users to search data
[Diagram: Private Factory creates a GDS with read/write access to Tree Frogs; Public Factory creates a GDS with read access to Tree Frogs]

Part 1: Configuring Private Factory
- Allow specific users to perform
  - SQL query statements
  - SQL update statements
  - Bulk load of data
- To configure the factory:
  - Create data resource configuration file
  - Create activity configuration file
  - Create database roles file
  - Update server configuration

Data Resource Configuration
- Configuration file describes the data resource
  - Create TreeFrogsPrivate.xml
  - Base on examples\GDSFConfig\dataResourceConfig.xml
    <roleMap implementation="...rolemap.SimpleFileRoleMapper"
             configuration="path/PrivateDatabaseRoles.xml"/>
    <dataResource implementation="...SimpleJDBCDataResourceImplementation">
    jdbc:mysql://localhost:3306/treefrogs

Activity Configuration
- Describes the activities that are supported by the data resource
  - Create TreeFrogsPrivateActivities.xml
  - Base on examples\GDSFConfig\activityConfig.xml
    <activity name="sqlQueryStatement"
              implementation="package.SQLQueryStatementActivity"
              schemaFileName="path/sql_query_statement.xsd"/>
    <activity name="sqlUpdateStatement"
              implementation="package.SQLUpdateStatementActivity"
              schemaFileName="path/sql_update_statement.xsd"/>

Create Database Roles
- Enables access to TreeFrogs database
  - Create file PrivateDatabaseRoles.xml
  - Base on examples\RoleMap\ExampleDatabaseRoles.xml
[Diagram: user ID / password pairs alice / amph1b1an and bob / tadp0le]

Edit Server Configuration
- Specifies the services for the container
- Loaded when Tomcat starts up
  - Edit file server-config.xml
    ...
    <parameter name="ogsadai.gdsf.config.xml.file"
               value="path/TreeFrogsPrivate.xml"/>
    <parameter name="ogsadai.gdsf.activity.xml.file"
               value="path/TreeFrogsPrivateActivities.xml"/>
    ...

Starting the Factory
- Start service container (Tomcat)
- View the factory using a web/service browser
  - Causes factory to start up
    ogsa/services/ogsadai/TreeFrogFactoryPrivate?wsdl

Milestone 1
- Configuration for Private Tree Frog Factory complete
- Specific users can
  - Locate factory using known location
  - Create GDS
  - Query and update database
[Diagram: Private Tree Frog Factory creates a GDS with read/write access to Tree Frogs]

Use-case 1: Remote update
- Bob is a Professor of Biology
  - Based at the University of Sydney
  - Working in collaboration with Alice on the Red-Eyed Tree Frog genome
- Through Alice's OGSA-DAI services
  - Bob can contribute new sequences

Interactions
[Diagram participants: Client, Private Tree Frog Factory, Tree Frog Service, Tree Frogs]
1. creation parameters
2. creates
3. new gene sequence
4. bulk upload of data
5. updated row count
6. updated row count

Perform Documents
- Perform documents are used to communicate with GDS
- Contain only supported activity types (specified in data resource configuration)
  - sqlQueryStatement
  - sqlUpdateStatement
  - sqlBulkLoadRowSet
- Results delivered in the response document
- Many examples provided with OGSA-DAI
[Diagram: perform document → GDS → response document]

Simple Query
- Select a range of chromosomes from GeneSequence
- Use sqlQueryStatement activity
    SELECT Chromosome FROM GeneSequence
    WHERE Position > 1.1 AND Position < 1.2

Simple Query Response
- Response contains WebRowSet XML

OGSA-DAI Clients
- Send perform documents to a GDS using a client
- OGSA-DAI provides 3 simple clients
  - Command-Line Client
      > java uk.org.ogsadai.client.Client registryURL|factoryURL performDocPath
  - Graphical Demonstrator
      > ant demonstrator
  - Data Browser
      > ant databrowser

Performing Remote Update
- Bob stores his new gene sequence in a local file
- Use deliverFromURL and sqlBulkLoadRowSet activities to update remote database
    file://path/to/newSequence.xml

Interactions
[Diagram labels: Client, GDS, Tree Frogs, new gene sequence file; perform document, response document, data pulled by GDS, updates, updated row count]

Part 2: Configure Public Factory
- Publish to the UK National Biology Registry
- Allow anonymous users to search data
[Diagram labels: Tree Frogs, Public Factory, GDS, National Biology Registry; creates, read, register, find services, handle]

Public Factory Set-up
- Database changes
  - Alice defines findGene stored procedure
- Supported activities
  - SQL stored procedure
- To configure factory:
  - Create data resource configuration
  - Create activity configuration file
  - Create database roles file
  - Create service registration list
  - Update server configuration

Data Resource Configuration
- Configuration file describes the data resource
  - Create TreeFrogsPublic.xml
  - Base on examples\GDSFConfig\dataResourceConfig.xml
    <roleMap implementation="...rolemap.SimpleFileRoleMapper"
             configuration="path/PublicDatabaseRoles.xml"/>
    <dataResource implementation="...SimpleJDBCDataResourceImplementation">
    jdbc:mysql://localhost:3306/treefrogs

Activity Configuration
- Describes the activities that are supported by the data resource
  - Create TreeFrogsPublicActivities.xml
  - Base on examples\GDSFConfig\activityConfig.xml
    <!-- Only the sqlStoredProcedure activity is available to this GridDataService -->
    <activity name="sqlStoredProcedure"
              implementation="package.SQLStoredProcedureActivity"
              schemaFileName="path/sql_stored_procedure.xsd"/>

Create Database Roles
- Enables access to TreeFrogs database
  - Create file PublicDatabaseRoles.xml
  - Base on examples\RoleMap\ExampleDatabaseRoles.xml
    <User dn="No Certificate Provided" userid="guest" password="guest"/>
[Diagram: user ID / password pair guest / guest]

Edit Server Configuration
- Specifies the services for the container
- Loaded when Tomcat starts up
  - Edit file server-config.xml
    ...
    <parameter name="ogsadai.gdsf.config.xml.file"
               value="path/TreeFrogsPublic.xml"/>
    <parameter name="ogsadai.gdsf.activity.xml.file"
               value="path/TreeFrogsPublicActivities.xml"/>
    <parameter name="ogsadai.gdsf.registrations.xml.file"
               value="path/TreeFrogsRegistrationList.xml"/>
    ...

Create Service Registration List
- Specifies a list of service group registries
- Factory is registered with each registry
  - Create file TreeFrogsRegistrationList.xml
  - Base on examples\GDSFConfig\registrationList.xml
    <gdsfRegistration... gsh=" ogsadai/NationalBiologyRegistry"/>
[Diagram labels: GDSF-Private, register, National Biology Registry]

Starting the Factory
- Start service container (Tomcat)
- View the factory using a web/service browser
  - Causes factory to start up
  - Automatically registers with NationalBiologyRegistry
    ogsa/services/ogsadai/TreeFrogFactoryPublic?wsdl

Milestone 2
- Configuration for Public and Private Factories complete
  - Specific users have read/write access
  - Anonymous users can search data via stored procedure
[Diagram: GDSF-Private creates a GDS with read/write access to Tree Frogs; GDSF-Public creates a GDS with read access and registers with the National Biology Registry]

Use-case: Query with transformations
- Carroll is a biochemist
  - Works for a small drugs company in Chicago
  - Investigating toxin in saliva of Fire-Bellied Toad
  - Wants to compare proteins with Red-Eyed Tree Frog

Transforming Sequences
- Carroll has a protein sequence
- Alice's data is encoded as a gene sequence
- There is a public Grid Data Transformation Service available at Newcastle University
[Diagram: protein sequence → Transform Service → gene sequence]

Interactions
1. Transform protein sequence needed for query
[Diagram participants: Client, Transform Service, Tree Frog Service. Arrow labels: 1.1 protein sequence, 1.2 gene sequence]

Interactions
1. Transform protein sequence needed for query
2. Query tree frog gene sequence asynchronously
[Diagram participants: Client, Transform Service, Tree Frog Service. Arrow labels: 1.1 protein sequence, 1.2 gene sequence, 2.1 asynchronous query using gene sequence]

Interactions
1. Transform protein sequence needed for query
2. Query tree frog gene sequence asynchronously
3. Transform results back into protein sequence
[Diagram participants: Client, Transform Service, Tree Frog Service. Arrow labels: 2.1 asynchronous query using gene sequence, 3.1 pull results, 3.2 results as gene sequence, 3.3 results as protein sequence]

Client Toolkit
- Why? Writing XML is a pain!
- A programming API which makes writing applications easier
  - Now: Java
  - Next: Perl, C, C#?
    // Create a query
    SQLQuery query = new SQLQuery(SQLQueryString);
    // Perform the query
    Response response = gds.perform(query);
    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);

Conclusion
- OGSA-DAI provides middleware tools to grid-enable existing databases
[Diagram labels: access, discovery, integration, transformation, collaboration]

The Client Toolkit
Amy Krause and Tom Sugden

Overview
- The Client Toolkit
- OGSA-DAI Service Types
- Locating and Creating Data Services
- Requests and Results
- Delivery
- Data Integration Example

Why use a Client Toolkit?
- Nobody wants to read or write XML!
- Protects developer from
  - Changes in activity schema
  - Changes in service interfaces
  - Low-level APIs
  - DOM manipulation

OGSA-DAI Services
- OGSA-DAI uses three main service types
  - DAISGR (registry) for discovery
  - GDSF (factory) to represent a data resource
  - GDS (data service) to access a data resource
[Diagram: DAISGR locates GDSF; GDSF creates GDS; GDSF represents the Data Resource; GDS accesses the Data Resource]

ServiceFetcher
- The ServiceFetcher class creates service objects from a URL
    ServiceGroupRegistry registry =
        ServiceFetcher.getRegistry( registryHandle );
    GridDataServiceFactory factory =
        ServiceFetcher.getFactory( factoryHandle );
    GridDataService service =
        ServiceFetcher.getGridDataService( handle );

Registry
- A registry holds a list of service handles and associated metadata
- Clients can query the registry for all Grid Data Service Factories
    GridServiceMetaData[] services =
        registry.listServices( OGSADAIConstants.GDSF_PORT_TYPE );
- The GridServiceMetaData object contains the handle and the port types that the factory implements
    String handle = services[0].getHandle();
    QName[] portTypes = services[0].getPortTypes();

Creating Data Services
- A factory object can create a new Grid Data Service.
    GridDataService service = factory.createGridDataService();
- Grid Data Services are transient (i.e. have finite lifetime) so they can be destroyed by the user.
    service.destroy();

Interaction with a GDS
- Client sends a request to a data service
- A request contains a set of activities
[Diagram: Client sends a Request containing Activities to the GDS]

Interaction with a GDS
- The data service processes the request
- Returns a response document with a result for each activity
[Diagram: GDS returns a Response containing Results to the Client]

Activities and Requests
- A request contains a set of activities
- An activity dictates an action to be performed
  - Query a data resource
  - Transform data
  - Deliver results
- Data can flow between activities
[Diagram: SQL Query Statement → WebRowSet data → XSLT Transform → HTML data → Deliver ToURL]

Predefined Activities
- sqlQueryStatement, sqlStoredProcedure, sqlUpdateStatement, sqlBulkLoadRowset
- xPathStatement, xUpdateStatement, xQueryStatement
- xmlResourceManagement, xmlCollectionManagement, relationalResourceManager
- gzipCompression, zipArchive, xslTransform
- inputStream, outputStream
- DeliverFromURL, DeliverToURL, DeliverToGFTP, DeliverFromGFTP, DeliverToStream, DeliverFromGDT, DeliverToGDT, DeliverToFile, DeliverFromFile
- fileWriting, directoryAccess, fileAccess, fileManipulation

Examples of Activities
- SQLQuery
    SQLQuery query = new SQLQuery(
        "select * from littleblackbook where id='3475'");
- XPathQuery
    XPathQuery query = new XPathQuery( );
- XSLTransform
    XSLTransform transform = new XSLTransform();
- DeliverToGFTP
    DeliverToGFTP deliver = new DeliverToGFTP(
        "ogsadai.org.uk", 8080, "myresults.txt" );

Simple Requests
- Simple requests consist of only one activity
- Send the activity directly to the perform method
    SQLQuery query = new SQLQuery(
        "select * from littleblackbook where id='3475'");
    Response response = service.perform( query );

Constructing a Request
[Diagram: SQL Query Statement, XSLT Transform and Delivery ToURL are each added to the Request]

Constructing a Request cont.
[Diagram: ActivityRequest containing SQL Query, XSL Transform and Delivery ToURL]
    ActivityRequest request = new ActivityRequest();
    request.add( query );
    request.add( transform );
    request.add( delivery );

Data Flow
- Connecting activities
    SQLQuery query = new SQLQuery(
        "select * from littleblackbook where id<=1000");
    DeliverToURL deliver = new DeliverToURL( url );
    deliver.setInput( query.getOutput() );
[Diagram: SQL Query Statement → Deliver ToURL]

Performing Requests
- Finally… perform the request!
    Response response = service.perform( request );
- The response contains the status and results of each activity in the request.
    System.out.println( response.getAsString() );

Processing Results
- Varying formats of output data
  - SQLQuery
    - JDBC ResultSet:
        ResultSet rs = query.getResultSet();
  - SQLUpdate
    - Integer:
        int rows = update.getModifiedRows();
  - XPathQuery
    - XML:DB ResourceSet:
        ResourceSet results = query.getResourceSet();
- Output can always be retrieved as a String
    String output = myactivity.getOutput().getData();

Delivery
- Data can be pulled from or pushed to a remote location.
- OGSA-DAI supports third-party transfer using FTP, HTTP, or GridFTP protocols.
    DeliverToURL deliver = new DeliverToURL( url );
    deliver.setInput( myactivity.getOutput() );

    DeliverToGFTP deliver = new DeliverToGFTP(
        "ogsadai.org.uk", 8080, "tmp/data.out" );
    deliver.setInput( myactivity.getOutput() );

Delivery Methods
[Diagram: the GDS exchanges data with four kinds of endpoint]
- Web Server: DeliverFromURL
- FTP server: DeliverTo/FromURL
- GridFTP server: DeliverTo/FromGFTP
- Local Filesystem: DeliverTo/FromFile

Delivering data to another GDS
- The GDT port type allows data to be transferred from one data service to another.
- An InputStream activity of GDS1 connects to a DeliverToGDT activity of GDS2
- Alternatively, an OutputStream activity can be connected to a DeliverFromGDT activity
[Diagram: InputStream at GDS1 connected to DeliverToGDT at GDS2]

Delivering Data
- Transfer in blocks or in full
- InputStream activities wait for data to arrive at their input
- Therefore, the InputStream activity at the sink has to be started before the DeliverToGDT activity at the source
- Same for OutputStream and DeliverFromGDT
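The ordering rule above (sink first, then source) can be illustrated with plain Java concurrency primitives. This is only an analogy under assumptions: a `SynchronousQueue` stands in for the GDT connection, a thread stands in for the InputStream activity, and none of the names below belong to OGSA-DAI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;

final class SinkBeforeSourceSketch {

    private static final String END = "<end-of-stream>";   // hypothetical end marker

    /** Starts the sink first, waits until it is running, then lets the source deliver. */
    static List<String> transfer(List<String> blocks) {
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        CountDownLatch sinkStarted = new CountDownLatch(1);
        List<String> received = new ArrayList<>();

        // Sink: plays the role of the InputStream activity, waiting for data to arrive.
        Thread sink = new Thread(() -> {
            sinkStarted.countDown();
            try {
                for (String b = channel.take(); !b.equals(END); b = channel.take()) {
                    received.add(b);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        try {
            sinkStarted.await();              // source must not begin before the sink exists
            for (String block : blocks) {
                channel.put(block);           // hand over one block at a time
            }
            channel.put(END);
            sink.join();                      // after join, 'received' is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    /** With no sink waiting, a non-blocking hand-off has nowhere to go. */
    static boolean deliverWithoutSink(String block) {
        return new SynchronousQueue<String>().offer(block);
    }

    public static void main(String[] args) {
        System.out.println(transfer(Arrays.asList("block1", "block2", "block3")));
        System.out.println("delivered without sink: " + deliverWithoutSink("block1"));
    }
}
```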

Data Integration Scenario
[Diagram: Client, GDS1, GDS2 and GDS3, each GDS backed by a relational database. Labels: select + output stream (twice), deliver, deliver from GDT, bulk load, join tables]
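The final "join tables" step of the scenario can be illustrated in isolation. The sketch below is a generic in-memory hash join over two row lists that are assumed to have already been gathered in one place; in the scenario itself the join would be carried out by the relational database after the bulk load. The class name `JoinSketch` and the two-column row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic hash join, for illustration only. Each row is {key, value}.
class JoinSketch {
    static List<String> join(List<String[]> left, List<String[]> right) {
        // Build phase: index the left rows by key.
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : left) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: look up each right row and emit key,leftValue,rightValue.
        List<String> joined = new ArrayList<>();
        for (String[] row : right) {
            List<String> matches = index.get(row[0]);
            if (matches == null) {
                continue; // no partner row on the left side
            }
            for (String leftValue : matches) {
                joined.add(row[0] + "," + leftValue + "," + row[1]);
            }
        }
        return joined;
    }
}
```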

Conclusion
- Easy to use
  - No XML!
  - Fewer low-level APIs
  - Improves usability and shortens the learning curve for OGSA-DAI client development
- Protects the developer
  - Shielded from schema changes, protocols, GT3
- Limitations
  - Metadata and service data not adequately addressed
  - Higher-level abstraction possible (no factory)

OGSA-DAI Wrap-up

Overview
- Future Developments
- The OGSA-DAI Webpage
- Support Information
- Tutorials
- Links

Future Developments
[Timeline: Jan '04 to Dec '05]
- R3.1: Technical preview of parts of R4
- R4: Enhancements and additional DBMS, SQL, File, Client toolkit
- R5: Compliance with DAIS, distributed query and transactions, improved performance, scalability, dependability and security, installation wizard, coordinated contributor community
- R6: Features depend on user priorities, context and research
- R7: Maintainable release for the user community

R5 → R7
- R5 (October 04)
  - Compliance with DAIS standards proposal
  - Distributed Relational Query Processing
  - Improved dependability and security integration
  - Extended & integrated XML and relational facilities
  - Distributed transaction participation
  - Coordinated OGSA-DAI contributor community
- R6 (April 05)
  - Integrated with GT4
  - New facilities depend on user priorities, context and research
  - OGSA-DAI components from contributor community
- R7 (October 05)
  - Maintainable release for the user community

OGSA-DAI Project Webpage
- Background, News & Events, Software Releases, Documentation, Support, Training Courses, Links

Support
- Long-term support for OGSA-DAI is provided by the UK Grid Support Centre
- Web forms for submission of
  - General queries
  - Problems with installation and configuration
  - Problems with usage of the software
- Submissions are tracked and logged

FAQ and Mailing List
- Frequently Asked Questions
  - Updated as common problems become clear
- Users mailing list
  - General discussion of OGSA-DAI, data and the Grid
  - Use support instead to report problems
- Suggestions for additions and improvements to the support service are welcome

Tutorials
- Graphical Demonstrator User Guide
- How to write an Activity Tutorial
- Using the Client Toolkit Tutorial

Links
- OGSA-DAI Webpage
- Globus Toolkit 3
- Database Access and Integration Services (DAIS-WG)
- Grid Technology Repository
- ELDAS - Enterprise-Level Data Access Services (Eldas)
- Web Services Choreography

Projects using OGSA-DAI
- DQP
  - Service Based Distributed Query Processor
- FirstDIG
  - Data mining analysis of OGSA-DAI service-enabled data sources
- BIOGRID
  - Construction of a Supercomputer Network to meet IT needs for biology and medical science in Japan
- OGSA-WebDB
  - Provides a uniform view of heterogeneous database resources in a grid environment
- BioSimGrid
  - A distributed database for biomolecular simulations
- More projects

ODD-Genes
- Data analysis for genetics
  - Sites:
    - GTI (microarray data)
    - HGU (genex data)
    - EPCC (compute server)
  - Software:
    - OGSA-DAI (Data)
    - TOG (Computation)
    - Globus Toolkit 2 and 3

FirstDIG
- Data mining with the First Transport Group, UK
  - Example: "When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%"
[Diagram: OGSA-DAI, OGSA-DAI Client Application, Data Mining Application]

EdSkyQuery-G
- Collaboration between OGSA-DAI & Eldas
- Based on the SkyQuery project by Johns Hopkins University, Baltimore, USA
- Identify astronomical objects and dropouts amongst different distributed catalogues
- Large-scale data transport
- Plug-in algorithms
- Platform and DBMS independence

EdSkyQuery-G
[Diagram: four Sky Data sources]

EdSkyQuery-G Challenges
- Data formats
  - XML (WebRowSet)
  - CSV
  - Binary
  - Compressed CSV or XML
- Data transport
  - SOAP over HTTP/HTTPS
  - FTP, Secure-FTP, Grid-FTP
- Importing/Exporting data
  - Through services
  - Direct from stored procedures
  - Using native tools
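As an illustration of the "Compressed CSV" format option, the sketch below compresses a CSV string with GZIP from the Java standard library and restores it. The slides do not say which compression scheme EdSkyQuery-G used, so GZIP is an assumption, and the class name `CompressedCsvSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Assumption: GZIP stands in for whatever compression the project used.
class CompressedCsvSketch {
    // Compresses CSV text into a GZIP byte array.
    static byte[] compress(String csv) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(csv.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return buffer.toByteArray();
    }

    // Restores the original CSV text from a GZIP byte array.
    static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compression of this kind trades CPU time at both ends for less data on the wire, which is one reason it appears alongside transport choices on the slide.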

SkyQuery.net

Conclusion
- Try out OGSA-DAI
  - It's free!
  - Supported
- Please send us feedback!
- Evolving and improving
  - Data integration
  - Performance and scalability
- Become involved
  - Write activities
  - Contribute to the DAIS working group

HPC-Europa
- EC-funded research visit programme
- Fully-funded, multi-disciplinary
- Visits between 3 and 13 weeks
  - EPCC in Edinburgh
  - CEPBA-CESCA in Barcelona/Catalonia
  - HLRS in Stuttgart
  - CINECA in Bologna
  - SARA in Amsterdam
  - IDRIS in Paris

OGSA-DAI Tutorial
- Introduction to data access and integration on the Grid using OGSA-DAI
  - Using the Data Browser
  - Writing Clients using the Client Toolkit APIs
- Start workstations in Windows mode
  - OGSA-DAI, Tomcat, MySQL and Xindice have already been configured