Achieving Adaptivity for OLAP-XML Federations Torben Bach Pedersen Aalborg University Joint work with Dennis Pedersen, TARGIT.

1 Achieving Adaptivity for OLAP-XML Federations Torben Bach Pedersen Aalborg University Joint work with Dennis Pedersen, TARGIT

2 Torben Bach Pedersen · DOLAP 2003 · 25-05-2015
Overview
Background: OLAP-XML federations
New challenges
–XML data changes
–Slow or unreliable XML sources
–Schema changes in data sources
–Other challenges
Integration in TARGIT architecture
Other applications of the techniques
Conclusion and future work
Related work

3 Data Warehousing & OLAP
Multidimensional analysis: TARGIT Analysis

4 OLAP
Good for complex ad hoc queries
–Simple: natural, graphical queries
–Fast: pre-aggregation
A number of problems with physical integration
–Short-term and varying data needs: population, product info, ...
–Dynamic data: stock quotes, competitor pricing, ...
–Data with limited access: competitor product info, public databases, ...

5 OLAP-XML Federations
Traditional OLAP architecture:
[Diagram: Client, OLAP server, Cube]

6 OLAP-XML Federations
Logical integration of XML data
–External dimensions
–External measures
Data combined at query time
[Diagram: Client, Federation, XML, Cube]

7 OLAP-XML Federations
Logical integration of XML data
–External dimensions
–External measures
Data combined at query time
Transparent for users
Flexible: many XML sources
Quick: running in a few minutes
Data is always fresh
Performance often comparable to physical integration
[Diagram: Client, Federation, XML, Cube]

8 XPath Queries for Fetching XML
Example XML data: the books "1984" (author Orwell) and "Of Mice and Men" (author Steinbeck)
XPath: /Books/Book[Author="Steinbeck"]/Title
[Diagram: Client, Federation, XML, Cube; the XPath result is used as a dimension value]
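
The XPath on this slide can be tried out with a minimal Python sketch. The XML layout below is an assumption reconstructed from the path itself (a Books root with Book elements holding Title and Author children), and titles_by_author is a hypothetical helper, not part of the federation system.

```python
import xml.etree.ElementTree as ET

# Assumed document layout, inferred from /Books/Book[Author="Steinbeck"]/Title.
BOOKS_XML = """
<Books>
  <Book><Title>1984</Title><Author>Orwell</Author></Book>
  <Book><Title>Of Mice and Men</Title><Author>Steinbeck</Author></Book>
</Books>
"""

def titles_by_author(xml_text, author):
    """Return the titles of all books whose Author child equals `author`."""
    root = ET.fromstring(xml_text)  # root is the <Books> element
    # ElementTree paths are relative to the root, so /Books/Book[...] becomes ./Book[...]
    path = f"./Book[Author='{author}']/Title"
    return [title.text for title in root.findall(path)]

print(titles_by_author(BOOKS_XML, "Steinbeck"))  # prints ['Of Mice and Men']
```

Each returned title would play the role of one dimension value in the federation.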

9 Old And New TARGIT Architecture

10 New Challenges
Our previous work focused on basic aspects
–Flexibility
–General performance
–Implementation
New: what can go wrong? This creates a need for adaptivity
–XML data changes
–XML sources slow or unreliable
–Schema changes (XML, OLAP, federation)
We often have no control over the XML sources
A solution has broad interest: views over XML sources

11 XML Data Changes
Basic federation
–XML data is integrated at query time => XML data changes are handled automatically
However, XML data is cached for performance
–Cache timeout value ensures fresh data (set manually or automatically)
–0 cache timeout => always fetch from source
Only a few current XML databases inform about changes
–Xyleme allows users to subscribe to changes
–Only the delta should be transferred
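
The cache timeout rule above can be sketched as follows. This is only an illustration under stated assumptions: TimeoutCache, its fetch callback and the injectable clock are hypothetical names, not the federation's actual Cache Manager.

```python
import time

class TimeoutCache:
    """Cache of XPath results that expire after `timeout_s` seconds (hypothetical sketch)."""

    def __init__(self, fetch, timeout_s, clock=time.monotonic):
        self._fetch = fetch          # callable(xpath) -> data, goes to the XML source
        self._timeout_s = timeout_s  # 0 means: never serve from cache
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # xpath -> (fetched_at, data)

    def get(self, xpath):
        now = self._clock()
        entry = self._entries.get(xpath)
        # Serve the cached copy only while it is younger than the timeout.
        if entry is not None and now - entry[0] < self._timeout_s:
            return entry[1]
        data = self._fetch(xpath)
        self._entries[xpath] = (now, data)
        return data
```

With timeout_s set to 0 the age test can never succeed, so every call goes to the source, which matches the "0 cache timeout" bullet.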

12 ICE: Information and Content Exchange
Protocol proposed by W3C for automatically informing about and requesting changes
–Supported by major vendors
–Push: subscribe to changes and keep cache up-to-date
–Pull: request changes from source at query time

13 Slow and Unreliable XML Sources
Overload, maintenance, HW breakdown, attacks
–Often we have no influence on this
Incremental presentation for user
–What if the source is too slow or does not reply at all?
–Inform user that the system is not working…?
Specification of alternative sources
–Several queries per external dimension/measure
–Increased fault tolerance, also better performance
[Diagram: Source, Server, Client]

14 Slow and Unreliable XML Sources
Start several queries and use the fastest
–Always uses the fastest, but heavy load on sources
–Use first response time as indicator for total time
Start one query at a time
–Minimal load on sources, but slower
[Diagram: federation (Fed) choosing among sources 1, 2 and 3]
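
The two strategies on this slide can be contrasted in a short Python sketch. Everything here is assumed for illustration: the function names, the dictionary of source callables, and the demo sources slow_source, fast_source and broken_source. It is not the algorithm from the paper. The cancel_futures argument requires Python 3.9 or later.

```python
import concurrent.futures as cf
import time

def fetch_fastest(sources, xpath):
    """Query all sources in parallel and return (name, data) of the first success."""
    pool = cf.ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, xpath): name for name, fn in sources.items()}
    try:
        for fut in cf.as_completed(futures):
            if fut.exception() is None:
                return futures[fut], fut.result()
        raise RuntimeError("all sources failed")
    finally:
        # Queries not yet started are cancelled; threads already running
        # cannot be interrupted and simply finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)

def fetch_one_at_a_time(sources, xpath):
    """Try sources in the given order; minimal load on sources, but slower."""
    for name, fn in sources.items():
        try:
            return name, fn(xpath)
        except Exception:
            continue  # this source is down, move on to the next one
    raise RuntimeError("all sources failed")

# Hypothetical demo sources standing in for real XML servers.
def slow_source(xpath):
    time.sleep(0.3)
    return "slow copy"

def fast_source(xpath):
    time.sleep(0.01)
    return "fast copy"

def broken_source(xpath):
    raise IOError("source is down")
```

The parallel variant puts load on every source for each query, which is the trade-off the slide points out.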

15 Slow and Unreliable XML Sources
Alternative sources of lower quality: better than no data?
Alternatives
–Expired cache data
–Google, Xyleme, The WayBack Machine
–Backup-disk, tape
–Etc.

Source            Speed       Quality
Local cache       Fastest     Fresh
Original source   Fast?       Freshest
Expired cache     Fastest     Old
Backup source     Fast/slow   Very old

16 Slow and Unreliable XML Sources
In practice?
Sources with equal priority chosen at random
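
One way to read "sources with equal priority chosen at random" is sketched below. The numeric priorities, the source names and both helper functions are assumptions made for illustration; the slides do not specify how priorities are encoded.

```python
import random

def order_sources(sources, rng=random):
    """sources: list of (name, priority); a lower number is tried first.
    Sources sharing a priority are shuffled so ties are broken at random."""
    by_priority = {}
    for name, priority in sources:
        by_priority.setdefault(priority, []).append(name)
    ordered = []
    for priority in sorted(by_priority):
        group = list(by_priority[priority])
        rng.shuffle(group)
        ordered.extend(group)
    return ordered

def fetch_with_fallback(sources, fetchers, xpath, rng=random):
    """Walk the ordered sources and return (name, data) from the first that answers."""
    for name in order_sources(sources, rng):
        try:
            return name, fetchers[name](xpath)
        except Exception:
            continue  # fall back to the next, possibly lower-quality, source
    return None, None
```

Passing a seeded random.Random instance as rng makes the tie-breaking reproducible in tests.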

17 Result: Algorithm for Fetching XML Data

18 Experiments
1st experiment: fetching a 137 KB dimension
–Start 8 queries; when the first 3 respond, (cancel) the last 5; when the fastest query finishes, (cancel) the remaining 2
–Fast reply = good indication of overall speed
2nd experiment: search local cache, then Google cache
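
The two-stage selection in the first experiment can be simulated offline. The sketch below assumes each source is described by a hypothetical pair (first_response_s, total_s); it models only the selection logic, not real network queries, and the timings in any usage are made up.

```python
def select_by_first_response(timings, keep=3):
    """timings: name -> (first_response_s, total_s), both assumed known for the simulation.

    Stage 1: keep the `keep` sources that respond first; the rest are cancelled.
    Stage 2: among the kept sources, the one that finishes first wins.
    Returns (winner, kept_names).
    """
    by_first_response = sorted(timings, key=lambda name: timings[name][0])
    kept = by_first_response[:keep]
    winner = min(kept, key=lambda name: timings[name][1])
    return winner, kept
```

A source that responds late but would finish early is dropped in stage 1, so the heuristic relies on the slide's observation that a fast reply is a good indication of overall speed.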

19 Schema Changes In XML Sources
How to synchronize XML views after schema change? (solution described in separate paper)
XPath view: /Bibliography/Author[AName="Orwell"]/Book/Title
[Two schema tree diagrams, before and after the change, each with the elements Bibliography, Publisher, PName, Book, Author, AName, Title, Price]

20 Additional Challenges
Changes to federation schema
–Cache may be invalidated
–Discard affected cache results (unproblematic)
OLAP data changes
–Cache may be invalidated
–Less frequent than XML data changes => cache will often have expired anyway
OLAP schema changes
–Federated schema may be invalidated
–Rare and easy to detect (and correct)

21 Integrating Techniques - Architecture

22 Integrating Techniques – Query Processing
Query Evaluator splits query into XML+OLAP parts and determines query plan based on cost
Execution Engine coordinates and executes plan
Cache Manager maintains cache, e.g., through ICE
XML Component interface fetches XML data, chooses between available XML sources (Algorithm 1)
View Synchronizer handles schema changes
Metadata Manager manages info about external dimensions and measures + XML component characteristics

23 Other Applications
All XPath-based views on XML data
Links to parts of XML documents
–Web pages
–Documents (DocBook)
–Software applications
–and many more…
Automatic recreation of broken links
Increased fault tolerance and performance using alternative sources

24 Conclusion and Future Work
Operational problems in OLAP-XML federations
XML data changes
Slow and unreliable XML sources
–Using several sources (Algorithm 1)
–Experiment with Algorithm 1
Techniques integrated into federation architecture
Schema evolution and other challenges
Future work
–TARGIT implementation and testing
–Using techniques in other applications

25 Related Work
Data changes in XML/semistructured documents
–Xyleme + Zhuge
Schema changes in scientific documents
–Not XML
Adaptive/dynamic query optimization
–Telegraph project
–We use once per source, rather than per tuple
Does not consider one or more of: OLAP+XML concepts, schema changes, slow and unreliable sources
Own previous OLAP-XML work is not adaptive

