OGSA-DQP Steven Lynden University of Manchester

Data access & integration with OGSA-DAI: GGF 17

Introduction

OGSA-DQP is a service-based distributed query processor.
It evaluates queries over distributed data sources wrapped by OGSA-DAI.
It is built using OGSA-DAI extensibility points.

People involved:
  University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
  University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson

OGSA-DAI Prototype release 3.0 available from the OGSA-DAI website

OGSA-DQP high-level overview

OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers.
Usability: it is used like any other OGSA-DAI data service.
DQP is capable of planning, scheduling and executing distributed queries in parallel.
Calls to analysis (Web) services can be declared within queries and invoked by DQP.

[Figure labels: Query, Results, OGSA-DQP, OGSA-DAI, DBMS, data]

OGSA-DQP architecture

[Figure: a perform document is sent to an OGSA-DAI data service with the DQP activities installed. This is the "OGSA-DQP service", also known as the Grid Distributed Query Service (GDQS) or "Coordinator". It works with several evaluators (QE), also known as Grid Query Evaluation Services (GQES).]

OGSA-DQP architecture

DQP evaluator services:
  Are plain Web services.
  Implement the QueryEvaluation port type:
    evaluate – the input is a query plan partition, which is subsequently executed.
    receiveData – allows the evaluator to receive data from other evaluators.

OGSA-DAI extensions:
  DQP resource – a resource which encapsulates a distributed query infrastructure: DQP evaluator services, OGSA-DAI data services etc. Implemented as a data resource accessor.
  OQL query statement activity – enables the submission of a query in Object Query Language (OQL).
  DQP factory activity – enables the creation and configuration of DQP resources.
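The two operations of the QueryEvaluation port type can be pictured with a small in-process sketch. This is only an illustration: the real evaluators are Web services, and the class name `Evaluator`, the `inbox` buffer and the use of plain Python functions as "plan partitions" are assumptions made here, not part of OGSA-DQP.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluator:
    """Hypothetical in-process stand-in for a DQP evaluator Web service."""
    name: str
    inbox: list = field(default_factory=list)

    def receive_data(self, rows):
        # Mirrors receiveData: accept rows pushed by another evaluator.
        self.inbox.extend(rows)

    def evaluate(self, partition):
        # Mirrors evaluate: execute a plan partition. In this sketch a
        # partition is just a function applied to the buffered rows.
        return partition(self.inbox)


upstream = Evaluator("evaluator1")
downstream = Evaluator("evaluator2")

upstream.receive_data([1, 2, 3, 4])
# The upstream partition filters rows; its output is shipped downstream.
downstream.receive_data(upstream.evaluate(lambda rows: [r for r in rows if r % 2 == 0]))
result = downstream.evaluate(lambda rows: [r * 10 for r in rows])
print(result)
```

The point of the sketch is the division of labour: one call hands an evaluator its share of the plan, the other lets evaluators feed each other.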

Example query

Given two DBMSs and one analysis tool (i.e., a Web service):
  goTerm: a table in a Gene Ontology (GO) database running as a remote MySQL DB, exposed by an OGSA-DAI data service
  protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
  Blast: a sequence alignment scoring Web service

We want to obtain alignment scores for a sequence against proteins of a certain kind.
The user submits a single query referencing data stored at multiple sites.
The author of the query need not be aware of how or where the data is stored.

Queries are written in Object Query Language (OQL):

  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = ‘GO: ’ and p.proteinId=t.proteinId
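To make the meaning of the example query concrete, the sketch below evaluates the same join and function call over small in-memory lists. Everything in it is illustrative: the sample rows are invented, `GO:EXAMPLE` is a made-up identifier standing in for the term that is not legible in the slide, and `blast` is a placeholder that returns the sequence length rather than a real alignment score.

```python
# Invented sample data standing in for the two remote tables.
proteins = [
    {"proteinId": "P1", "sequence": "MKV"},
    {"proteinId": "P2", "sequence": "MKVLA"},
    {"proteinId": "P3", "sequence": "GG"},
]
go_terms = [
    {"termId": "GO:EXAMPLE", "proteinId": "P1"},
    {"termId": "GO:OTHER", "proteinId": "P2"},
    {"termId": "GO:EXAMPLE", "proteinId": "P3"},
]


def blast(sequence):
    # Placeholder for the Blast Web service: NOT a real alignment score.
    return len(sequence)


def run_query(term_id):
    # select p.proteinId, Blast(p.sequence)
    # from protein p, goTerm t
    # where t.termId = term_id and p.proteinId = t.proteinId
    return [
        (p["proteinId"], blast(p["sequence"]))
        for p in proteins
        for t in go_terms
        if t["termId"] == term_id and p["proteinId"] == t["proteinId"]
    ]


rows = run_query("GO:EXAMPLE")
print(rows)
```

In OGSA-DQP the two tables live at different sites and Blast is a remote service; the query author writes only the declarative statement and the processor decides where each step runs.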

Client interaction with OGSA-DQP

Two kinds of client/server interactions:
  1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource.
  2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1).

The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries.
It differs from typical OGSA-DAI data service resources, e.g. a relational data service resource.
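The two interactions above can be sketched as a pair of calls against a toy service. This is a hypothetical model, not the OGSA-DAI client API: real perform documents are XML, whereas here they are Python dictionaries, and the class `FakeDataService`, the activity names and the example URL are all assumptions chosen for illustration.

```python
import itertools


class FakeDataService:
    """Hypothetical in-memory stand-in for the OGSA-DAI data service."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._resources = {}

    def perform(self, document):
        activity = document["activity"]
        if activity == "dqpFactory":
            # Interaction 1: create and configure a DQP data service resource.
            resource_id = f"dqp-{next(self._ids)}"
            self._resources[resource_id] = list(document["evaluators"])
            return resource_id
        if activity == "oqlQueryStatement":
            # Interaction 2: run a query against a resource created earlier.
            evaluators = self._resources[document["resource"]]
            return {"query": document["query"], "evaluators": evaluators}
        raise ValueError(f"unknown activity: {activity}")


service = FakeDataService()
resource_id = service.perform(
    {"activity": "dqpFactory", "evaluators": ["http://example.org/evaluator1"]}
)
answer = service.perform(
    {
        "activity": "oqlQueryStatement",
        "resource": resource_id,
        "query": "select p.proteinId from protein p",
    }
)
print(resource_id, answer)
```

The sketch shows why the order matters: the query step needs the resource identifier returned by the configuration step, and the same resource can be reused for many queries.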

DQP configuration

[Figure: the client sends a perform document to the OGSA-DAI data service, supplying evaluator URLs, OGSA-DAI data service resources and Web service URLs. The DQP factory activity contacts the OGSA-DAI data services (GetRP) and creates a DQP DSR.]

The DQP DSR holds:
  Global schema of imported DBs & analysis services
  Set of evaluators that can be used
  Physical DB metadata (used to optimise queries)

Result: resource ID of the created DSR

DQP query evaluation

[Figure: the client sends a perform document containing an OQL query to the OGSA-DAI data service. The OQLQueryStatement activity uses the DQP DSR to drive the evaluators (QE). Evaluators exchange data with one another (transport), send perform documents to OGSA-DAI data services, and call analysis services.]

Result: WebRowSet XML stream

OQL Query Statement activity detail

[Figure: query -> parser -> logical optimiser -> physical optimiser -> partitioner -> scheduler -> evaluators -> query results. Further labels: single-node optimiser, multi-node optimiser.]

Logical optimisation

Consider the query:

  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = ‘GO: ’ and p.proteinId = t.proteinId

The plan is expressed in a logical algebra.
Multiple equivalent plans are generated.

[Figure: logical plan with the operators scan termId=GO: (goTerm), scan (protein), reduce, join (proteinId), op_call (Blast), reduce]
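A logical plan of this kind can be modelled as a small operator tree, and an equivalent plan obtained by a rewrite such as swapping the join inputs. The sketch below is one plausible reading of the slide's diagram, simplified to a single reduce at the root; the `Op` class, the rendering format and the `GO:EXAMPLE` term are assumptions introduced here, not OGSA-DQP's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical logical operator: a name, an argument and input operators."""
    name: str
    arg: str = ""
    inputs: tuple = ()


def render(op):
    # Print the tree in prefix form, e.g. join[proteinId](scan[a], scan[b]).
    label = f"{op.name}[{op.arg}]" if op.arg else op.name
    inner = ", ".join(render(i) for i in op.inputs)
    return f"{label}({inner})" if inner else label


def swap_join_inputs(op):
    # One simple equivalence: a join is commutative, so its inputs may be swapped.
    inputs = tuple(swap_join_inputs(i) for i in op.inputs)
    if op.name == "join":
        inputs = inputs[::-1]
    return Op(op.name, op.arg, inputs)


go_scan = Op("scan", "goTerm, termId=GO:EXAMPLE")
protein_scan = Op("scan", "protein")
join = Op("join", "proteinId", (go_scan, protein_scan))
plan = Op("reduce", "proteinId, score", (Op("op_call", "Blast", (join,)),))
alternative = swap_join_inputs(plan)

print(render(plan))
print(render(alternative))
```

Both trees return the same rows; an optimiser generates such alternatives so that a later stage can compare them.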

Physical optimisation

The plan is expressed in a physical algebra.
The plan is chosen by cost-ranking the equivalent plans.

[Figure: physical plan with the operators table_scan termId=GO: (goTerm), table_scan (protein), reduce, hash_join (proteinId), op_call (Blast), reduce]
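Cost-ranking can be illustrated with a toy model for the hash join in the plan. The cost formula and the row counts below are invented for the example and are not OGSA-DQP's actual cost model; the sketch only shows the mechanism of scoring each candidate and keeping the cheapest.

```python
def hash_join_cost(build_rows, probe_rows):
    # Toy cost model (an assumption): inserting a row into the hash table
    # costs 2 units, probing with a row costs 1 unit.
    return 2 * build_rows + probe_rows


# Invented cardinality estimates, as might come from physical DB metadata.
filtered_go_terms = 50
protein_rows = 10_000

candidates = {
    "build on goTerm, probe with protein": hash_join_cost(filtered_go_terms, protein_rows),
    "build on protein, probe with goTerm": hash_join_cost(protein_rows, filtered_go_terms),
}

# Rank candidates from cheapest to most expensive and keep the first.
ranked = sorted(candidates.items(), key=lambda item: item[1])
best_plan, best_cost = ranked[0]
print(best_plan, best_cost)
```

Under this model the hash table is built on the smaller, filtered input, which is the usual outcome of such a ranking.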

Query partitioning

The plan is transformed into a parallel algebra (physical operators + data exchange).
Exchange operators are placed where data exchange must take place.

[Figure: the physical plan (table_scan termId=GO: (goTerm), table_scan (protein), reduce, hash_join (proteinId), op_call (Blast), reduce) with exchange operators added]
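The placement rule can be sketched for a single pipeline of operators, each tagged with the site where it must run: an exchange goes wherever two consecutive operators sit at different sites. The site names and the linear-pipeline simplification are assumptions for illustration; the real partitioner works on a full operator tree.

```python
def place_exchanges(pipeline):
    """Insert an exchange between consecutive operators on different sites.

    `pipeline` is a list of (operator, site) pairs ordered from data source
    to final result.
    """
    result = []
    previous_site = None
    for operator, site in pipeline:
        if previous_site is not None and site != previous_site:
            # Data has to cross a site boundary here.
            result.append("exchange")
        result.append(operator)
        previous_site = site
    return result


# Hypothetical site assignment: the scan runs next to the database,
# the join and the Blast call run on compute nodes.
pipeline = [
    ("table_scan(protein)", "database-site"),
    ("hash_join(proteinId)", "compute-site"),
    ("op_call(Blast)", "compute-site"),
    ("reduce", "compute-site"),
]
parallel_plan = place_exchanges(pipeline)
print(parallel_plan)
```

Operators that stay on the same site are left adjacent, so no exchange is paid for between the join, the Blast call and the final reduce in this example.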

Query scheduling

Operators are allocated to evaluator nodes.

  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = ‘GO: ’ and p.proteinId = t.proteinId

[Figure: the partitioned plan (table_scan termId=GO: (goTerm), table_scan (protein), reduce, hash_join (proteinId), op_call (Blast), reduce, exchange) with its operators allocated to evaluator nodes. Annotations: partitioned parallelism, pipelined parallelism.]
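Partitioned parallelism for a join can be sketched by hash-partitioning both inputs on the join key, so that matching rows always reach the same evaluator and each evaluator can join its share independently. The evaluator names and sample rows are invented, and the use of `zlib.crc32` as the partitioning function is a choice made for this example, not something stated in the slides.

```python
import zlib


def partition_rows(rows, key, evaluators):
    """Hash-partition rows so that equal key values reach the same evaluator."""
    assignment = {name: [] for name in evaluators}
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        index = zlib.crc32(row[key].encode("utf-8")) % len(evaluators)
        assignment[evaluators[index]].append(row)
    return assignment


# Invented sample inputs for the two sides of the join.
proteins = [{"proteinId": f"P{i}"} for i in range(8)]
go_terms = [{"proteinId": f"P{i}"} for i in range(0, 8, 2)]
evaluators = ["evaluator1", "evaluator2"]

protein_parts = partition_rows(proteins, "proteinId", evaluators)
go_parts = partition_rows(go_terms, "proteinId", evaluators)

for name in evaluators:
    print(name, len(protein_parts[name]), len(go_parts[name]))
```

Because both inputs use the same partitioning function, every goTerm row finds its matching protein row on the evaluator it was sent to. Pipelined parallelism is complementary: downstream operators start consuming rows while upstream operators are still producing them.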

Conclusion

OGSA-DQP is a service-based distributed query processor that is:
  Exposed as a service
  Implemented as an orchestration of services

Benefits:
  Queries are executed in parallel.
  OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI.
  Web services can be invoked during query execution, merging data access with data analysis.