Transforming and leveraging OLAP queries
Patrick Marcel
Université François Rabelais Tours, Laboratoire d'Informatique
SAP-BO, 06.22.2010.

2 Outline
– Short CV
– Personalizing OLAP queries
– Recommending OLAP queries
– Summarizing OLAP queries
– Perspectives

3 About me
– PhD « Multidimensional data(base) manipulations and rule-based languages », defended in 1998 at LISI (now LIRIS), INSA Lyon; supervisors: J. Kouloumdjian and MS Hacid
– Associate professor (Maître de Conférences), UFRT, Computer Science Department
– Head of the Master's program in Information Systems and Decision Making
– Semester off (September 2010 – January 2011)

4 About me (cont'd)
– Member of the DB & NLP team (4 full professors, 8 associate professors)
– Topics: NLP; XML and web technology; data mining and OLAP
– Recent activities:
  – Pattern-based global models (PhD Eynollah Khanjari, 2009)
  – Summarizing and visualizing large sets of association rules (PhD Marie Ndiaye, 2010)
  – Collaborative exploration of data warehouses (PhD Elsa Negre, 2009)

5 Personalizing OLAP queries
– PhD Hassina Mouloudi (2007)
– Main publications: ACM DOLAP 2005, BDA 2006, Hassina's dissertation (in French)
– Prototype: a mobile application for querying a cube with query personalization (Mondrian, Oracle, Tomcat, Axis)

6 Motivation
  SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
         {2003, 2004, 2005, 2006} ON COLUMNS
  FROM SalesCube
  WHERE (Measures.quantity)
How the result should be visualized depends on the user's profile.

7 The problem
Given:
– an MDX query q
– user preferences P
– a visualization constraint v
Find a preferred query q' that is:
– included in q
– nearest to q among the queries satisfying v
– the most interesting w.r.t. P

8 Example of preferred query
Query 1:
  SELECT CROSSJOIN({City.Tours}, {Category.Food, Category.Drink}) ON ROWS,
         {Year.2005} ON COLUMNS
  FROM SalesCube
  WHERE (Measures.quantity)
is less preferred (<) than query 2:
  SELECT CROSSJOIN({City.Tours}, {Year.2006}) ON ROWS,
         {Category.Drink} ON COLUMNS
  FROM SalesCube
  WHERE (Measures.quantity)
since the user profile contains Location < Product, Product < Time, 2005 < 2006 and Food < Drink.
Indeed, each fact of query 1 is less preferred than the fact of query 2:
  (2005, Food, Tours, quantity) < (2006, Drink, Tours, quantity)
  (2005, Drink, Tours, quantity) < (2006, Drink, Tours, quantity)
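The comparison above can be sketched in Python. This is a minimal sketch under assumptions the slide does not state: a dimension order "D1 < D2" is read as "D2 is compared first" (a prioritized, lexicographic composition), member orders are total and listed from least to most preferred, and a query is taken to be less preferred than another when each of its facts is below some fact of the other. The names `DIM_PRIORITY`, `MEMBER_ORDER`, `compare` and `query_less` are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding of the profile "Location < Product, Product < Time,
# 2005 < 2006, Food < Drink" (an assumption, not the thesis' format).
# Dimensions are listed from most to least important.
DIM_PRIORITY: List[str] = ["Time", "Product", "Location"]
# Members are listed from least to most preferred.
MEMBER_ORDER: Dict[str, List[str]] = {
    "Time": ["2005", "2006"],
    "Product": ["Food", "Drink"],
    "Location": ["Tours"],
}

Fact = Dict[str, str]  # dimension name -> member (the measure is left out)


def compare(f: Fact, g: Fact) -> int:
    """Return -1 if f < g, 1 if f > g and 0 if they tie.

    Dimensions are compared in priority order; the first dimension
    on which the two facts differ decides.
    """
    for dim in DIM_PRIORITY:
        order = MEMBER_ORDER[dim]
        rf, rg = order.index(f[dim]), order.index(g[dim])
        if rf != rg:
            return -1 if rf < rg else 1
    return 0


def query_less(q1: List[Fact], q2: List[Fact]) -> bool:
    """Assumed lifting to queries: every fact of q1 is below some fact of q2."""
    return all(any(compare(f, g) < 0 for g in q2) for f in q1)


# The two queries of the slide, as lists of facts.
query1 = [
    {"Time": "2005", "Product": "Food", "Location": "Tours"},
    {"Time": "2005", "Product": "Drink", "Location": "Tours"},
]
query2 = [{"Time": "2006", "Product": "Drink", "Location": "Tours"}]

if __name__ == "__main__":
    print(query_less(query1, query2))
    print(query_less(query2, query1))
```

With this encoding both comparisons on the slide are decided by the Time dimension alone, so query 1 is below query 2 but not the other way around.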

9 Personalizing
[Figure: architecture with the components User query, User profile, Dimension tables, Fact table, Query processor, Personalization engine and Result]

10 Personalizing OLAP queries
Context:
– dimension tables are kept in main memory
– no access to the fact table
Principle:
– compute sets of positions in the resulting crosstab that are
  – as large as possible,
  – visualizable w.r.t. the visualization constraint,
  – corresponding to the preferred facts
– compute the structure of the crosstabs

11 Example of personalization (1)
The query:
  SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
         {2003, 2004, 2005, 2006} ON COLUMNS
  FROM SalesCube
  WHERE (Measures.quantity)
Preferences:
– Time < Location and Product < Location
– 2002 < 2003 < 2004 < 2005 < 2006
– Electronics < Shoes < Cloth < Food < Drink
– Quantity < Price
Constraint: 2 axes, no more than 4 positions on each axis

12 Example of personalization (2)
Step 1: the most preferred facts

13 Example of personalization (3)
Step 2: the second most preferred facts

14 Example of personalization (4)
Step 3: the next most preferred facts, but the selected facts must satisfy the visualization constraint

15 Example of personalization (5)
Finally, one of the constructed queries is:
  SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Food, Category.Drink}) ON ROWS,
         {2003, 2004, 2005, 2006} ON COLUMNS
  FROM SalesCube
  WHERE (Measures.quantity)
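One simple way to mimic the step-by-step selection in this example is a greedy pass over the facts of the query in decreasing order of preference, accepting a fact only if the crosstab still has at most 4 positions per axis. This is a minimal sketch, not the algorithm of the thesis: the encoding of the query and profile, the choice to rank Product above Time (the slide leaves these two unordered) and the names `personalize`, `rank`, `PRIORITY` and `PREFS` are all assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical encoding of the example query (assumption, not the thesis' format).
ROW_DIMS = ["Location", "Product"]  # CROSSJOIN on rows
COL_DIMS = ["Time"]                 # years on columns
QUERY: Dict[str, List[str]] = {
    "Location": ["Tours", "Orleans"],
    "Product": ["Electronics", "Shoes", "Cloth", "Food", "Drink"],
    "Time": ["2003", "2004", "2005", "2006"],
}
# Dimensions from most to least important. The slide only states
# Time < Location and Product < Location; ranking Product above Time is assumed.
PRIORITY = ["Location", "Product", "Time"]
# Members from least to most preferred; unlisted members (the cities) tie.
PREFS: Dict[str, List[str]] = {
    "Time": ["2002", "2003", "2004", "2005", "2006"],
    "Product": ["Electronics", "Shoes", "Cloth", "Food", "Drink"],
}
MAX_POSITIONS = 4  # visualization constraint: at most 4 positions per axis


def rank(dim: str, member: str) -> int:
    """Higher rank = more preferred; members without a preference get -1."""
    order = PREFS.get(dim, [])
    return order.index(member) if member in order else -1


def personalize() -> Tuple[List[tuple], List[tuple]]:
    """Greedily keep the most preferred facts whose positions still fit."""
    dims = ROW_DIMS + COL_DIMS
    facts = [dict(zip(dims, combo)) for combo in product(*(QUERY[d] for d in dims))]
    # Most preferred facts first (lexicographic key over PRIORITY);
    # the sort is stable, so ties keep the query's member order.
    facts.sort(key=lambda f: tuple(rank(d, f[d]) for d in PRIORITY), reverse=True)
    rows: List[tuple] = []
    cols: List[tuple] = []
    for f in facts:
        row = tuple(f[d] for d in ROW_DIMS)
        col = tuple(f[d] for d in COL_DIMS)
        new_rows = rows if row in rows else rows + [row]
        new_cols = cols if col in cols else cols + [col]
        # Accept the fact only if the crosstab stays visualizable.
        if len(new_rows) <= MAX_POSITIONS and len(new_cols) <= MAX_POSITIONS:
            rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    rows, cols = personalize()
    print(rows)
    print(cols)
```

Under these assumptions all Drink facts are accepted first, then the Food facts fill the remaining row positions, and every Cloth, Shoes or Electronics fact would create a fifth row and is rejected. The result keeps Tours and Orleans crossed with Food and Drink on rows and all four years on columns, which matches the query above.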

16 Prototype

17 Speedup

18 Recommending OLAP queries
– PhD Elsa Negre (2009)
– Main publications: ACM DOLAP 2008, DaWaK 2009, ACM DOLAP 2009, International Journal of Data Warehousing and Mining
– Prototype: various methods for OLAP query recommendation (Mondrian, MySQL)

19 Context and principle

20 Distances
Between positions in the cube:
– Hamming
– based on shortest path
Between queries:
– based on differences in dimensions
– Hausdorff
Between sessions:
– based on the subsequence
– edit distance
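Three of the distances listed above can be sketched directly, assuming a position is a tuple with one member per dimension, a query is a non-empty set of positions, and a session is a sequence of queries. The shortest-path and subsequence-based distances are not shown. The function names, and normalizing the Hausdorff distance by the number of dimensions to get a substitution cost in [0, 1], are assumptions made for this illustration.

```python
from typing import AbstractSet, Any, Callable, Sequence, Tuple

Position = Tuple[str, ...]  # one member per dimension, e.g. ("2005", "Food", "Tours")


def hamming(p: Position, q: Position) -> int:
    """Number of dimensions on which two positions differ."""
    return sum(a != b for a, b in zip(p, q))


def hausdorff(a: AbstractSet[Position], b: AbstractSet[Position],
              d: Callable[[Position, Position], float] = hamming) -> float:
    """Hausdorff distance between two queries seen as non-empty sets of positions."""
    def directed(x: AbstractSet[Position], y: AbstractSet[Position]) -> float:
        # Largest distance from a point of x to its closest point in y.
        return max(min(d(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


def unit_cost(x: Any, y: Any) -> float:
    """Default substitution cost: 0 for identical items, 1 otherwise."""
    return 0 if x == y else 1


def edit_distance(s: Sequence[Any], t: Sequence[Any],
                  sub_cost: Callable[[Any, Any], float] = unit_cost) -> float:
    """Levenshtein distance between two sessions (sequences of queries).

    Insertions and deletions cost 1; substitutions cost sub_cost(x, y).
    """
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,  # delete s[i-1]
                           dp[i][j - 1] + 1,  # insert t[j-1]
                           dp[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return dp[n][m]


def query_cost(x: AbstractSet[Position], y: AbstractSet[Position]) -> float:
    """Assumed substitution cost: Hausdorff distance over 3 dimensions, in [0, 1]."""
    return hausdorff(x, y) / 3


if __name__ == "__main__":
    a = frozenset({("2005", "Food", "Tours")})
    b = frozenset({("2006", "Food", "Tours")})
    c = frozenset({("2006", "Drink", "Tours")})
    print(edit_distance([a, b], [a, c], query_cost))
```

Plugging the query distance into the substitution cost is what lets the session distance reflect how similar two queries are, rather than only whether they are identical.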

21 Experiments
Cube: Foodmart (Mondrian sample cube)
Session generator:
– max. 100 cells per MDX query
– 25–50 sessions
– 20–50 queries per session
– log of 150–25,000 queries
– 1–20 queries per current session

22 Efficiency
[Figure: panels for shortest path, Hausdorff distance and edit distance]

23 Effectiveness
10-fold cross-validation:
– the query set is split into 10 equally sized subsets: 9 for the log, 1 for the current sessions
– for each current session, remove the last query and check how often this last query is recommended
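The protocol above can be sketched as a small evaluation loop. This assumes the query set is split by whole sessions, that a session is a list of query identifiers, and that a recommender is any function taking the log and the truncated current session and returning one query. `cross_validate` and the toy `next_query_recommender` are hypothetical; the latter only stands in for the recommendation methods of the prototype.

```python
import random
from typing import Callable, List, Optional, Sequence

Session = List[str]  # a session as a sequence of query identifiers (assumption)
Recommender = Callable[[List[Session], Session], Optional[str]]


def cross_validate(sessions: Sequence[Session], recommend: Recommender,
                   k: int = 10, seed: int = 0) -> float:
    """Fraction of current sessions whose removed last query is recommended."""
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k subsets of (nearly) equal size
    hits = total = 0
    for i in range(k):
        # k - 1 folds form the log, fold i provides the current sessions.
        log = [s for j, fold in enumerate(folds) if j != i for s in fold]
        for session in folds[i]:
            if len(session) < 2:
                continue  # at least one query must remain after removing the last
            prefix, expected = session[:-1], session[-1]
            total += 1
            if recommend(log, prefix) == expected:
                hits += 1
    return hits / total if total else 0.0


def next_query_recommender(log: List[Session], current: Session) -> Optional[str]:
    """Toy recommender (hypothetical): return the query that followed the
    current session's last query in the first log session containing it."""
    last = current[-1]
    for s in log:
        for a, b in zip(s, s[1:]):
            if a == last:
                return b
    return None
```

Passing the recommender as a function keeps the protocol independent of the distance or method being evaluated.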

24 Effectiveness
E = members of the expected query, R = members of the recommended query
$\mathrm{Precision} = \frac{|E \cap R|}{|R|}$
$\mathrm{Recall} = \frac{|E \cap R|}{|E|}$
$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
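The three measures translate directly to set operations on query members. The function name and the choice to return 0 when a set or the denominator is empty are assumptions; the member sets in the usage example are made up.

```python
from typing import AbstractSet, Tuple


def precision_recall_f(expected: AbstractSet[str],
                       recommended: AbstractSet[str]) -> Tuple[float, float, float]:
    """Member-based precision, recall and F-measure of one recommendation."""
    inter = len(expected & recommended)  # |E ∩ R|
    precision = inter / len(recommended) if recommended else 0.0
    recall = inter / len(expected) if expected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    E = {"2005", "2006", "Food", "Tours"}  # members of the expected query (made up)
    R = {"2006", "Food", "Drink"}          # members of the recommended query (made up)
    print(precision_recall_f(E, R))
```

Here two members are shared, so precision is 2/3, recall is 1/2 and the F-measure is 4/7 (about 0.571).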

25 Query recommendation for discovery-driven analysis?
"Hm, this looks strange to me... interesting..."

31 Processing the log
1: consider all sessions
2: consider all queries
3: consider all difference pairs
4: detect their drilldown pairs
5: detect their exception pairs
6: keep only the most general pairs having drilldown pairs or exception pairs
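Step 3 relies on a notion of difference pair that the slides do not define. As a hedged illustration only, assume a difference pair is two cells of the same query result that differ on exactly one dimension and whose measure values differ by at least a relative threshold. The function name `difference_pairs`, the relative-difference test and the 0.5 threshold are hypothetical; the prototype relies on Sarawagi's icube for this kind of analysis, and its actual criterion may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]  # one member per dimension of the query result


def difference_pairs(result: Dict[Cell, float],
                     threshold: float = 0.5) -> List[Tuple[Cell, Cell]]:
    """Hypothetical detector: pairs of cells that differ on exactly one
    dimension and whose measures differ by at least `threshold`, relative
    to the larger absolute value."""
    pairs = []
    for (c1, v1), (c2, v2) in combinations(result.items(), 2):
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # only compare cells that differ on a single dimension
        scale = max(abs(v1), abs(v2))
        if scale and abs(v1 - v2) / scale >= threshold:
            pairs.append((c1, c2))
    return pairs


if __name__ == "__main__":
    result = {("2005", "Food"): 100.0, ("2006", "Food"): 40.0, ("2005", "Drink"): 95.0}
    print(difference_pairs(result))
```

In the usage example only the Food cells of 2005 and 2006 form a pair: their values differ by 60 %, while the two 2005 cells differ by 5 % and the remaining pair differs on both dimensions.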

36 Recommending
1: detect difference pairs
2: does a detected pair specialize a most general pair in the log?
3: suggest the most general queries...
4: ... then the drilldown queries
5: ... then the exception queries

37 Prototype
– Java, Mondrian OLAP engine and Sarawagi's icube
– Preliminary tests show that for small logs (a few hundred queries), recommendation time does not exceed 50 ms

38 Conclusion: so far...
"Hm, this looks strange to me..."
Ongoing work with IRSA (a French social security health examination center) to analyze over 500,000 health care examination questionnaires

39 Summarizing OLAP queries
– Master's thesis of Julien Aligon (in progress)
– Problem: how to obtain viewpoints on former sessions?
  – by summarizing the log: a sequence of queries is summarized by a sequence of queries
  – by browsing/querying the summary
– Experiments on healthcare data
– Related publications: EDA 2007, 2010

40 Perspectives
– STIC-AmSud project PQUERY: preference models for personalized queries
– Forthcoming work with M. Golfarelli (U. Bologna): preference mining to dynamically add preferences to an MDX query
– Contributions to a collaborative query management system for OLAP

