OGSA-DAI Data Access and Integration for the Grid Neil Chue Hong


2 Overview
- Motivation
- Goals
- Partners
- Features
- Projects
- Further information
- Overview and demo of FirstDIG/INWA

3 OGSA-DAI Motivation
- Entering an age of data
  - Data explosion
    - CERN: LHC will generate 1 GB/s = 10 PB/y
    - VLBA (NRAO) generates 1 GB/s today
    - Pixar generates 100 TB per movie
  - Storage getting cheaper
- Data stored in many different ways
  - Data resources
    - Relational databases
    - XML databases
    - Flat files
- Need ways to facilitate
  - Data discovery
  - Data access
  - Data integration
- Empower e-Business and e-Science
  - The Grid is a vehicle for achieving this

4 Goals for OGSA-DAI
- Aim to deliver application mechanisms that:
  - Meet the data requirements of Grid applications
    - Functionality, performance and reliability
    - Reduce development cost of data-centric Grid applications
    - Provide consistent interfaces to data resources
  - Are acceptable to and supportable by database providers
    - Trustable, imposed demand is acceptable, etc.
    - Provide a standard framework that satisfies standard requirements
- A base for developing higher-level services
  - Data federation
  - Distributed query processing
  - Data mining
  - Data visualisation

5 Integration Scenario
A patient moves hospital
- Sources shown: DB2, Oracle, CSV file
- Data A: (PID, name, address, DOB)
- Data B: (PID, first_contact)
- Data C: (PID, first_name, last_name, address, first_contact, DOB)
- Result: amalgamated patient record
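The amalgamation step in this scenario can be sketched in plain Java. This is only an illustration, not OGSA-DAI code: the class name `PatientRecordMerge`, the method `amalgamate`, the map-based row shape and the sample values are all assumptions made for the example. It merges rows that share a PID, letting earlier sources win when two sources supply the same field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: amalgamates rows about one patient from several sources. */
class PatientRecordMerge {

    /**
     * Combines rows for a single patient into one record.
     * Every row must carry the same PID; when two sources supply
     * the same field, the value from the earlier row is kept.
     */
    static Map<String, String> amalgamate(List<Map<String, String>> rows) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            String pid = row.get("PID");
            String seen = merged.get("PID");
            if (pid == null || (seen != null && !seen.equals(pid))) {
                throw new IllegalArgumentException("rows must share one PID");
            }
            for (Map.Entry<String, String> field : row.entrySet()) {
                merged.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical rows shaped like Data A and Data B on the slide.
        Map<String, String> a = new LinkedHashMap<>();
        a.put("PID", "17");
        a.put("name", "Ada Example");
        a.put("address", "1 Sample Street");
        a.put("DOB", "1970-01-01");

        Map<String, String> b = new LinkedHashMap<>();
        b.put("PID", "17");
        b.put("first_contact", "2001-05-02");

        System.out.println(amalgamate(List.of(a, b)));
    }
}
```

A real deployment would also have to reconcile differing column names and formats between sources, which this sketch deliberately leaves out.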

6 Why OGSA-DAI?
Why use OGSA-DAI over JDBC?
- Language independence at the client end
  - Do not need to use Java
- Platform independence
  - Do not have to worry about connection technology and drivers
- Can handle XML and file resources
- Can embed additional functionality at the service end
  - Transformations, compression, third-party delivery
  - Avoiding unnecessary data movement
- Provision of metadata is powerful
- Usefulness of the Registry for service discovery
  - Dynamic service binding process
- The quickest way to make data accessible on the Grid
  - Installation and configuration of OGSA-DAI is fast and straightforward

7 Project Partners
Powered by ...
Funded by the Grid Core Programme
- OGSA-DAI
  - £3 million, 18 months, from Feb 2002
  - Three major releases, three interim releases
- DAIT (DAI-Two)
  - Keep the OGSA-DAI brand name
  - £1.5 million, 24 months, from Oct 2003
  - Four major releases
- GGF DAIS WG
  - Strong involvement
  - Standardise the interfaces
  - OGSA-DAI to be a reference implementation

8 Core features
- An extensible framework for building applications
  - Supports relational, XML and some file resources
    - MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV, EMBL
  - Supports various delivery options
    - SOAP, FTP, GridFTP, HTTP, files, inter-service
  - Supports various transforms
    - XSLT, ZIP, GZip
  - Supports message-level security using X509 certificates
  - Client Toolkit library for application developers
  - Comprehensive documentation and tutorials
- Third production release is coming in November
  - OGSI/GT3 based
  - Also previews of WS-I and WS-RF/GT4 releases

9 Activities are the drivers
- Express a task to be performed by a GDS
- Three broad classes of activities:
  - Statement
  - Transformations
  - Delivery
- Extensible:
  - Easy to add new functionality
  - Does not require modification to the service interface
  - Extensions operate within the OGSA-DAI framework
- Functionality:
  - Implemented at the service
  - Work where the data is (no need to move data back)
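The three classes of activity and the extensibility claim can be illustrated with a small, self-contained Java sketch. Nothing here is the OGSA-DAI API: the `Activity` interface, the `perform` method and the factory methods are hypothetical names invented for the example, and a canned string stands in for a real query result. The point is only that a request is a chain of statement, transformation and delivery steps run where the data is, and that adding a new step does not change how the chain is invoked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: a request modelled as a chain of activities. */
class ActivitySketch {

    /** One step of a request; hypothetical, not the OGSA-DAI interface. */
    interface Activity {
        byte[] run(byte[] input) throws IOException;
    }

    /** Statement: produces data. A canned result stands in for a query. */
    static Activity statement(String cannedResult) {
        return input -> cannedResult.getBytes(StandardCharsets.UTF_8);
    }

    /** Transformation: GZIP-compresses whatever the previous step produced. */
    static Activity gzip() {
        return input -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(input);
            }
            return buffer.toByteArray();
        };
    }

    /** Delivery: writes the data to a sink and passes it through unchanged. */
    static Activity deliverTo(ByteArrayOutputStream sink) {
        return input -> {
            sink.write(input);
            return input;
        };
    }

    /** Runs the activities in order, feeding each output to the next step. */
    static byte[] perform(List<Activity> request) throws IOException {
        byte[] data = new byte[0];
        for (Activity activity : request) {
            data = activity.run(data);
        }
        return data;
    }

    /** Helper for checking results: reverses the gzip() transformation. */
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A new kind of step is added by supplying another `Activity`; `perform` and its signature stay the same, which mirrors the slide's point that extensions do not require changes to the service interface.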

10 OGSA-DAI Deck

11 Client Toolkit
Why? Nobody wants to write XML!
- A programming API which makes writing applications easier
  - Now: Java
  - Next: Perl, C, C#?, ML!?

    // Create a query
    SQLQuery query = new SQLQuery(SQLQueryString);
    ActivityRequest request = new ActivityRequest();
    request.addActivity(query);

    // Perform the query
    Response response = gds.perform(request);

    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);

12 Project classification
[Diagram: projects using OGSA-DAI, grouped by area]
- Areas: Biological Sciences, Physical Sciences, Commercial Applications, Computer Sciences
- Projects: FirstDig, INWA, Bridges, AstroGrid, BioSimGrid, BioGrid, eDiamond, myGrid, ODD-Genes, N2Grid, GEON, MCS, IU RGBench, OGSA Web-DB, GeneGrid, GridMiner

13 e-Digital MammOgraphy National Database
- Built a prototype of a national database of mammographic images in support of the UK Breast Screening Programme
- Employ Grid technologies to facilitate this process

14 [Architecture diagram: four sites (UCL, KCL, UED, CHU), each with DB2 Content Manager, OGSA-DAI, Core Services, a Core & Training API, Data Load and a Training App; DB2 Federation over database and files; Training Services, Core API, Training API and Training Application]

15 eDiaMoND
Findings:
- OGSA-DAI provides a flexible framework
- Dynamically configure the system through discovery
- Activities can operate with different levels of granularity
- Federation can be introduced at various levels
- Extended Activities to access IBM DB2 Content Manager

16 GeneGrid
- Grid-Based Framework for Bioinformatics: Virtual Bioinformatics Laboratory
  - Integration of Existing Technologies & Data Sets
  - Gene Study in Silico
  - Develop Specialist Data Sets
  - Grid Services for Commercial or 3rd Party Use
- Data resources as XML collections (Xindice), flat files and relational databases (MySQL)
  - OGSA-DAI plus custom extensions
  - Beta testers for file-based activities

17 GeneGrid Architecture
[Architecture diagram. Components shown: GeneGrid Portal; GeneGrid Environment; GeneGrid Workflow Manager Service; GeneGrid Process Manager Service; GeneGrid Application Management Registry; GeneGrid Data Manager Registry; GeneGrid Workflow Definition; GeneGrid Workflow Status; GeneGrid Input & Results Parameters; GAM Services (TMHMM, Blast, SignalP, mpiBlast, iGAP); GDM Services (EMBL DB, SwissProt DB); EMBL Database; SwissProt Database. Sites: SDSC, BeSC, EBI]

18 Distributed Query Processing
- Queries mapped to algebraic expressions for evaluation
- Parallelism represented by partitioning queries
  - Use exchange operators
- Prototype available from:
[Query plan diagram. Operators shown: table_scan (protein); table_scan termID=S92 (proteinTerm); reduce; hash_join (proteinId); op_call (Blast); reduce; exchange]
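The idea of partitioning a query with exchange operators can be sketched in Java as follows. This is a toy, single-process illustration under stated assumptions, not the prototype the slide refers to: `PartitionedJoinSketch`, `Row`, `exchange`, `hashJoin` and `parallelJoin` are hypothetical names, and a parallel stream stands in for separate evaluation nodes. Rows from both inputs are routed to partitions by hashing the join key, so each partition can run an ordinary hash join independently of the others.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Illustrative only: hash-partitioned (exchange) evaluation of a hash join. */
class PartitionedJoinSketch {

    /** A row is a join key plus a payload; hypothetical shape for illustration. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Exchange step: routes each row to a partition by hashing its key. */
    static List<List<Row>> exchange(List<Row> rows, int partitions) {
        List<List<Row>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (Row row : rows) {
            out.get(Math.floorMod(Integer.hashCode(row.key), partitions)).add(row);
        }
        return out;
    }

    /** Hash join on one partition: build a table on the left, probe with the right. */
    static List<String> hashJoin(List<Row> left, List<Row> right) {
        Map<Integer, List<String>> table = new HashMap<>();
        for (Row l : left) {
            table.computeIfAbsent(l.key, k -> new ArrayList<>()).add(l.value);
        }
        List<String> joined = new ArrayList<>();
        for (Row r : right) {
            for (String leftValue : table.getOrDefault(r.key, List.of())) {
                joined.add(r.key + ":" + leftValue + ":" + r.value);
            }
        }
        return joined;
    }

    /** Partitions both inputs the same way, joins each pair in parallel, merges. */
    static List<String> parallelJoin(List<Row> left, List<Row> right, int partitions) {
        List<List<Row>> leftParts = exchange(left, partitions);
        List<List<Row>> rightParts = exchange(right, partitions);
        return IntStream.range(0, partitions)
                .parallel()
                .mapToObj(i -> hashJoin(leftParts.get(i), rightParts.get(i)))
                .flatMap(List::stream)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Because matching keys always land in the same partition, the merged output equals that of a single unpartitioned join; the final sort is there only to make the result order deterministic.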

19 GridMiner
- Test application area: medical
  - Traumatic brain injury treatment
  - Predicting the outcome of seriously ill patients
  - Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
- Target:
  - Provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
  - Building on and extending OGSA-DAI

20 GridMiner Scenario
- Heterogeneities:
  - Name in A is First Last (as the target format)
  - Name in C has to be combined
- Distribution:
  - 3 data sources
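The name heterogeneity in this scenario can be resolved with a small mapping step, sketched here in Java. The class `NameHeterogeneity`, the method `toTargetFormat` and the map-based row are assumptions for illustration, not GridMiner code; the column names `first_name`, `last_name` and `name` are taken from the integration scenario earlier in the deck. The sketch combines the two name columns of a row from source C into the single "First Last" field used as the target format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: maps a row from source C onto the target name format. */
class NameHeterogeneity {

    /**
     * Returns a copy of the row in which first_name and last_name
     * are replaced by one "name" field in "First Last" form.
     */
    static Map<String, String> toTargetFormat(Map<String, String> rowFromC) {
        Map<String, String> out = new LinkedHashMap<>(rowFromC);
        String first = out.remove("first_name");
        String last = out.remove("last_name");
        if (first == null || last == null) {
            throw new IllegalArgumentException("row needs first_name and last_name");
        }
        out.put("name", first.trim() + " " + last.trim());
        return out;
    }
}
```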

21 Future work
- Architecture review
  - Better concurrency model
  - Better AAA framework
  - Better definition of extensibility points
    - Security, activities, dynamic configuration, mobile code, ...
- Improved support for
  - WS Security profiles
  - Stored procedures
  - Data transport
  - XQuery
  - Database-specific datatypes and SQL
- Additionally
  - JDBC and ODBC driver for OGSA-DAI
  - Contribution process

22 Further information
- The OGSA-DAI Project Site:
- The DAIS-WG site:
- OGSA-DAI Users Mailing list
  - General discussion on grid DAI matters
- Formal support for OGSA-DAI releases
- OGSA-DAI training courses

23 Project Membership
[Organisation chart]
- Roles shown: Principal Investigators, Project Manager, Programme Management Board Chair, Technical Review Board Chair, Research Team, IBM Dissemination Team, EPCC Team, IBM Development Team
- Names shown: Charaka, Tom, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Dave, Patrick, Neil

24 The End Questions?

25 INWA Objectives
Innovation Node Western Australia
- Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
- Project
  - Run from Nov Aug 2004
  - Involved 10 partners (6 UK + 4 Australia)
- Aim
  - Data mine commercially sensitive data
  - Security an absolute MUST
  - Employ Grid technologies
  - Need access to data and computational resources
- Demonstrator using:
  - OGSA-DAI
    - Incorporate data resources
  - Sun DCG's TOG (Transfer-queue Over Globus)
    - Handle job submission to analyse micro array data

26 INWA
[Deployment diagram: two sites, Curtin (Australia) and EPCC (UK), each running Grid Engine with Bank and Telco resources; OGSA-DAI, TOG and a Data Browser; data sets: Telco data, Bank data, Australian property, UK property]

27 INWA: Lessons Learned
- Performing data integration:
  - Time zone date problems
- Security issues:
  - Bugs in Java CoG in GT3
  - OGSA-DAI could not switch security for Grid data transfers
  - TOG had no security option
  - All of these have been fixed
- Middleware not mature enough for commercial deployment

28 Biomedical Research Informatics Delivered by Grid Enabled Services
- Want a Grid-enabled front end to their software
- Want to do a comparative evaluation between
  - IBM's Information Integrator
  - OGSA-DAI

29 Bridges: Data Sources
[Map of sites: Edinburgh, Glasgow, Leicester, Oxford, MRC/Imperial, Eindhoven, Maastricht]

30 [Diagram: MGI and CSV sources reached by a client in two ways, through IBM Information Integrator and through OGSA-DAI]

31 FirstDIG
- Data mining with the First Transport Group, UK
  - Example: When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%
[Diagram: OGSA-DAI, OGSA-DAI Client Application, Data Mining Application]
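A finding of the form "when X happens there is an N% chance of Y" is the confidence of an association rule: the share of observations satisfying X that also satisfy Y. The Java sketch below shows that calculation only; it is not the FirstDIG data mining application, and the class `RuleConfidence`, its method and the observation layout (minutes late, revenue change in percent) are assumptions chosen to mirror the bus example.

```java
import java.util.List;

/** Illustrative only: confidence of the rule "late => revenue drop". */
class RuleConfidence {

    /**
     * Each observation is {minutesLate, revenueChangePercent}.
     * Returns count(late and drop) / count(late), or NaN when no
     * observation is late by more than the threshold.
     */
    static double confidence(List<double[]> observations,
                             double lateMinutes,
                             double dropPercent) {
        int late = 0;
        int lateAndDrop = 0;
        for (double[] obs : observations) {
            if (obs[0] > lateMinutes) {
                late++;
                // A drop of at least dropPercent is a change of -dropPercent or lower.
                if (obs[1] <= -dropPercent) {
                    lateAndDrop++;
                }
            }
        }
        return late == 0 ? Double.NaN : (double) lateAndDrop / late;
    }
}
```

Called with thresholds of 10 minutes and 10 percent, a return value of 0.82 would correspond to the statement on the slide.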

32 EdSkyQuery-G
[Diagram: four Sky Data sources]

33 [Diagram: data resources PostgreSQL, MySQL, Xindice, DB2, Oracle; Oracle Federation and DB2 Federation; three Data Services, each with a Scratch DB]

34 OGSA-DAI Downloads
- R4: 690 downloads since May 04
  - Actual user downloads, not search engine crawlers
  - Does not include downloads as part of GT3.2 releases
- Total of 838 registered users (7/10/04)

Version (release date)   Downloads
R1.0 (Jan 03)            104
R1.5 (Feb 03)            108
R2.0 (Apr 03)            250
R2.5 (Jun 03)            291
R3.0 (Jul 03)            792
R3.1 (Feb 04)            630
Total                    2865

Downloads by Country, OGSA-DAI R4.0
Country          Share
United Kingdom   21%
China            26%
United States    13%
Japan            5%
Unknown          7%
Germany          5%
Italy            5%
Austria          2%
Australia        2%
France           3%
Taiwan           2%

35 Users Group
- A separate independent body to engage with users and feed back to developers
  - Chair: Prof. Beth Plale of Indiana University
- Twice-yearly meetings