
14-18 March 2004, EDBT'04: Service-Based Distributed Query Processing for the Grid (M N Alpdemir)


1 Title, places, people, funding, projects
Service-Based Distributed Query Processing on the Grid: OGSA-DQP
Manchester: Anastasios Gounaris, Desmond Fitzgerald, M. Nedim Alpdemir, Norman W Paton, Alvaro A A Fernandes, Rizos Sakellariou
Newcastle upon Tyne: Arijit Mukherjee, Jim Smith, Paul Watson

2 context services
1. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
2. Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
3. Distributed query processing technology is one approach to delivering (1), given the availability of (2).

3 OGSA-DQP goals
1. To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
2. To benefit from Grid abstractions for on-demand, transparent allocation of resources required for a task [OGSA/OGSI/GT3].
3. To provide transparent, implicit parallelism and distribution [Polar*].
4. To orchestrate the composition of data retrieval and analysis services using query mechanisms.
5. To expose this orchestration capability as a Grid data service.

4 OGSA-DQP innovations
• OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
• All available nodes can be allocated for query evaluation (not just the nodes with data sources).
• A distributed query execution plan is resourced on the fly.
• This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
• The query plan is the outcome of optimising a declarative service orchestration expressed as a query.
• OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.
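The allocation idea on this slide can be illustrated with a minimal sketch. It is not OGSA-DQP's scheduler: the `Partition` class, the `allocate` function, the node names and the one-unit load model are all hypothetical, chosen only to show that data-free partitions can be placed on any available node while scans stay next to their data source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Partition:
    """Hypothetical plan partition (not an OGSA-DQP type)."""
    name: str
    data_node: Optional[str] = None  # set when the partition must read a local source
    copies: int = 1                  # requested degree of parallelism


def allocate(partitions: List[Partition],
             available_nodes: List[str],
             load: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign each partition to nodes, using the current load as the runtime input."""
    load = dict(load)  # work on a copy so the caller's view is untouched
    assignment: Dict[str, List[str]] = {}
    for p in partitions:
        if p.data_node is not None:
            # Scans are pinned to the node that hosts the data source.
            chosen = [p.data_node]
        else:
            # Any available node qualifies; prefer the least loaded ones.
            ranked = sorted(available_nodes, key=lambda n: (load.get(n, 0.0), n))
            chosen = ranked[:p.copies]
        for node in chosen:
            load[node] = load.get(node, 0.0) + 1.0  # assumed cost of one unit per partition
        assignment[p.name] = chosen
    return assignment


plan = [
    Partition("scan_goterms", data_node="db1"),
    Partition("scan_proteins", data_node="db2"),
    Partition("call_entropy", copies=2),
]
placement = allocate(plan, ["db1", "db2", "n1", "n2"], {})
```

In this toy run the two scans occupy the data nodes first, so the analysis partition is spread over the two nodes that hold no data.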

5 OGSA-DQP wrapper-mediator approach
OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers.
It promises bottom-line benefits regarding:
– efficiency: “delegate optimisation and parallel scheduling to the system”;
– effectiveness: “delegate service orchestration to the system”;
– usability: “use it as a Grid data service”.
[Figure labels: Query, Results, OGSA-DQP, OGSA-DAI, DBMS, data]

6 OGSA-DQP provides two grid services
Exposes to clients Grid Distributed Query Services (GDQSs) that:
– interact with clients;
– find and retrieve service descriptions;
– parse, compile, partition and schedule the query execution over a union of distributed data sources;
– coordinate the GQESs into executing the plan.
The query plan is an orchestration of GQESs.
Coordinates transparently Grid Query Evaluation Services (GQESs) that:
– implement the physical query algebra;
– implement the query execution model and semantics;
– run a partition of a query execution plan generated by a GDQS;
– interact with other GQESs/GDSs/WSs but not with clients.
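The division of labour between the two services can be sketched as a toy model. The class names mirror the slide's acronyms, but every method here (`schedule`, `execute`, `evaluate`) is a hypothetical stand-in, and plan partitions are reduced to plain Python callables over rows; real GQESs would run on separate nodes and exchange data over the network.

```python
from typing import Callable, Iterable, List

Operator = Callable[[Iterable], Iterable]


class GQES:
    """Toy evaluator: runs one plan partition and never talks to the client."""

    def __init__(self, node: str, operator: Operator):
        self.node = node
        self.operator = operator

    def evaluate(self, rows: Iterable) -> Iterable:
        return self.operator(rows)


class GDQS:
    """Toy coordinator: the only service the client interacts with."""

    def __init__(self, nodes: List[str]):
        self.nodes = nodes

    def schedule(self, operators: List[Operator]) -> List[GQES]:
        # Assumed policy: one evaluator per partition, nodes taken round-robin.
        return [GQES(self.nodes[i % len(self.nodes)], op)
                for i, op in enumerate(operators)]

    def execute(self, operators: List[Operator], source_rows: Iterable) -> list:
        rows = source_rows
        for evaluator in self.schedule(operators):
            rows = evaluator.evaluate(rows)  # output of one partition feeds the next
        return list(rows)


gdqs = GDQS(["node1", "node2"])
partitions = [
    lambda rows: (r for r in rows if r % 2 == 0),  # a selection
    lambda rows: (r * 10 for r in rows),           # a projection-like step
]
result = gdqs.execute(partitions, [1, 2, 3, 4])
```

The client only ever calls the coordinator; the evaluators are created and chained behind it, which is the sense in which the plan is an orchestration of GQESs.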

7 Brief tour: an illustration
[Figure residue: configuration and schema fragments shown on the slide. Recoverable content follows.]
http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory
http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory
http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory
http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL
http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory
http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049
http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481
Schema fragments: varchar 12, varchar 12, varchar 12, id; host 130.88.192.230; goterm (goterm.OID string, goterm.id string, goterm.type string, goterm.name); goterms; goterms LIKE...

8 The Demonstration: Environment Setup

9 The Demonstration: Configuring the DQP
– Select DQP Factory
– Select Data Sources
– Select Web Services
– Import Metadata

10 The Demonstration: Example Query
Given two DBMSs and one analysis tool (e.g., a WS):
– Goterm: a GO (Gene Ontology) database running as a remote MySQL DB,
– proteinSequence: yeast protein sequences,
– EntropyAnalyser (information content analyser);
we can obtain the information content of protein sequences of a certain kind specified by certain gene ontology terms:
    select p.ORF, go.id, calculateEntropy(p.sequence)
    from p in protein_sequences, go in goterms, pg in protein_goterms
    where go.id=pg.GOTermIdentifier and p.ORF=pg.ORF
      and p.ORF like "YBL06%" and go.id like "GO:0000%";
Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid.
[Figure: query plan annotated with "Partition boundaries" and "Parallelized on nodes 1 & 2"]
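To make the meaning of the example query concrete, here is a small in-memory sketch of what it computes. This is not how OGSA-DQP evaluates it. The sample rows are invented, ORFs are assumed to be unique, and `calculate_entropy` uses Shannon entropy over residue frequencies as an assumed stand-in, since the slides do not say what the EntropyAnalyser service actually computes.

```python
import math
from collections import Counter


def calculate_entropy(sequence: str) -> float:
    """Assumed stand-in for the web service: Shannon entropy in bits per symbol."""
    total = len(sequence)
    counts = Counter(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def run_query(protein_sequences, goterms, protein_goterms):
    # like "YBL06%" and like "GO:0000%" are prefix matches.
    seq_by_orf = {p["ORF"]: p["sequence"]
                  for p in protein_sequences if p["ORF"].startswith("YBL06")}
    go_ids = {g["id"] for g in goterms if g["id"].startswith("GO:0000")}
    # protein_goterms is the link table joining the two filtered inputs.
    return [(pg["ORF"], pg["GOTermIdentifier"],
             calculate_entropy(seq_by_orf[pg["ORF"]]))
            for pg in protein_goterms
            if pg["ORF"] in seq_by_orf and pg["GOTermIdentifier"] in go_ids]


# Invented sample data, for illustration only.
proteins = [{"ORF": "YBL061C", "sequence": "AABB"},
            {"ORF": "YAL001C", "sequence": "AAAA"}]
terms = [{"id": "GO:0000001"}, {"id": "GO:0001234"}]
links = [{"ORF": "YBL061C", "GOTermIdentifier": "GO:0000001"},
         {"ORF": "YBL061C", "GOTermIdentifier": "GO:0001234"},
         {"ORF": "YAL001C", "GOTermIdentifier": "GO:0000001"}]
rows = run_query(proteins, terms, links)
```

Only the pair that satisfies both prefix filters and appears in the link table survives, and the service call is applied once per surviving row.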

11 possible future work
More functional:
– Semi-structured data;
– Streams.
More dynamic:
– Use Index Services;
– Dynamically install factories.
More application test-beds:
– Metabolomics;
– Sensor networks.
More adaptive:
– Queries may be long running, environment is constantly changing, and charging: static optimisation is likely to become stale fast.
– So: monitor, assess and respond (e.g., switch operators/algorithms, spawn more copies, relocate).
More deployable:
– E.g., as a web service.
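The "monitor, assess and respond" item could take a shape like the following sketch. It is purely illustrative of the proposed future work: the `respond` function, its rate-based measurements and the 0.5 tolerance are hypothetical, and only two of the responses named on the slide (spawning more copies, relocating) are modelled.

```python
from typing import Tuple


def respond(observed_rate: float, expected_rate: float,
            copies: int, max_copies: int,
            tolerance: float = 0.5) -> Tuple[str, int]:
    """Assess one monitored operator and pick a response.

    observed_rate / expected_rate: tuples per second seen vs. assumed by the optimiser.
    Returns the chosen action and the resulting number of operator copies.
    """
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    ratio = observed_rate / expected_rate
    if ratio >= tolerance:
        return ("keep", copies)        # the static plan is still good enough
    if copies < max_copies:
        return ("spawn", copies + 1)   # add another parallel copy
    return ("relocate", copies)        # no spare parallelism: move the operator


decisions = [
    respond(90.0, 100.0, copies=1, max_copies=4),
    respond(20.0, 100.0, copies=1, max_copies=4),
    respond(20.0, 100.0, copies=4, max_copies=4),
]
```

A real adaptive engine would run such an assessment repeatedly during a long-running query rather than once.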

12 where to find out more: software
OGSA-DQP: Grid middleware to query distributed data sources, www.ogsadai.org.uk/dqp
OGSA-DAI: Grid middleware to interface with data(bases), www.ogsadai.org.uk/
Globus Toolkit: Open-source implementation of OGSA/OGSI, www.globustoolkit.org/

