Presentation is loading. Please wait.

Presentation is loading. Please wait.

OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed.

Similar presentations


Presentation on theme: "OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed."— Presentation transcript:

1 OGSA-DQP Steven Lynden University of Manchester

2 Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed query processor It evaluates queries over distributed data sources wrapped by OGSA-DAI It is built using OGSA-DAI extensibility points People involved: University of Manchester Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton University of Newcastle Jim Smith, Arijit Mukherjee, Paul Watson OGSA-DAI Prototype release 3.0 available from the OGSA-DAI website

3 Data access & integration with OGSA-DAI: GGF 17 3 OGSA-DQP high-level overview OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers. Usability: use it as an OGSA- DAI data service. DQP is capable of planning, scheduling and executing in parallel the distributed queries Calls to analysis (Web) services can be declared within queries and invoked by DQP. DBMS data OGSA-DQP QueryResults OGSA-DAI DBMS data

4 Data access & integration with OGSA-DAI: GGF 17 4 OGSA-DQP architecture OGSA-DAI data service perform Evaluator QE Evaluator QE Evaluator QE The “OGSA-DQP service”, Grid Distributed Query Service (GDQS) AKA “Coordinator” AKA Grid Query Evaluation Service (GQES) DQP activities installed

5 Data access & integration with OGSA-DAI: GGF 17 5 OGSA-DQP architecture DQP evaluator services: Are plain Web services Implement the QueryEvaluation port type: evaluate – the input is a query plan partition which is subsequently executed receiveData – allows the evaluator to receive data from other evaluators OGSA-DAI extensions: DQP resource – a resource which encapsulates a distributed query infrastructure: DQP evaluator services, OGSA-DAI data services etc. Implemented as a data resource accessor. OQL query statement activity – enables the submission of a query in Object Query Language (OQL) DQP factory activity – enables the creation and configuration of DQP resources.

6 Data access & integration with OGSA-DAI: GGF 17 6 Example query Given two DBMSs and one analysis tool (i.e., a Web service): goTerm : a table in a GO Gene Ontology database running as a remote mySQL DB, exposed by an OGSA-DAI data service protein : a table in a protein sequence DB, exposed by an OGSA-DAI data service Blast (sequence alignment scoring Web service); We want to obtain alignment scores for a sequence against proteins of a certain kind The user submits a single query referencing data stored at multiple sites. The author of the query need not be aware of how/where data is stored. Queries are written in Object Query Language (OQL): select p.proteinId, Blast(p.sequence) from protein p, goTerm t where t.termId = ‘GO:0005942’ and p.proteinId=t.proteinId

7 Data access & integration with OGSA-DAI: GGF 17 7 Client interaction with OGSA-DQP Two kinds of client/server interactions: 1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource 2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1) The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. Differs from the typical OGSA-DAI data service resources e.g. relational data service resource

8 Data access & integration with OGSA-DAI: GGF 17 8 DQP configuration OGSA-DAI data service perform Evaluator URLs OGSA-DAI data service resources Web service URLs OGSA-DAI data service GetRP DQP factory activity OGSA-DAI data service GetRP creates DQP DSR Global schema of imported DBs & analysis services Set of evaluators that can be used Physical DB metadata (used to optimise queries) Result: resource ID of created DSR

9 Data access & integration with OGSA-DAI: GGF 17 9 DQP query evaluation OGSA-DAI data service perform OQL query OGSA-DAI data service perform OQLQueryStatement DQP DSR Evaluator QE transport OGSA-DAI data service perform Analysis service... Evaluator QE Evaluator QE Result: WebRowSet XML Stream

10 Data access & integration with OGSA-DAI: GGF 17 10 OQL Query Statement activity detail logical optimiser physical optimiser partitionerscheduler evaluators parser single-node optimiser multi-node optimiser query query results

11 Data access & integration with OGSA-DAI: GGF 17 11 Logical optimisation Consider the query: select p.proteinId, Blast(p.sequence) from protein p, goTerm t where t.termId = ‘GO:0005942’ and p.proteinId = t.proteinId Plan is expressed as a logical algebra Multiple equivalent plans are generated scan termId=GO:0005942 (goTerm) scan (protein) reduce join (proteinId) op_call (Blast) reduce

12 Data access & integration with OGSA-DAI: GGF 17 12 Physical optimisation Plan is expressed as a physical algebra Plan is chosen by cost- ranking of equivalent plans table_scan termId=GO:0005942 (goTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce

13 Data access & integration with OGSA-DAI: GGF 17 13 Query partitioning Plan is transformed into a parallel algebra (physical operators + data exchange) Exchange operators are placed where data exchange must take place table_scan termId=GO:0005942 (goTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange

14 Data access & integration with OGSA-DAI: GGF 17 14 Query scheduling Allocate operators to evaluator nodes table_scan termId=GO:0005942 (goTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange 4,5 2 13 select p.proteinId, Blast(p.sequence) from protein p, goTerm t where t.termId = ‘GO:0005942’ and p.proteinId = t.proteinId partitioned parallelism pipelined parallelism

15 Data access & integration with OGSA-DAI: GGF 17 15 Conclusion OGSA-DQP is a service based distributed query processor that is: Exposed as a service Implemented as an orchestration of services Benefits: Queries are executed in parallel OGSA-DAI OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI Web services can be invoked during query execution, merging data access with data analysis


Download ppt "OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed."

Similar presentations


Ads by Google