Transforming and leveraging OLAP queries Patrick Marcel Université François Rabelais Tours Laboratoire d'Informatique SAP-BO, 06.22.2010.



2 Outline: Short CV; Personalizing OLAP queries; Recommending OLAP queries; Summarizing OLAP queries; Perspectives

3 About me
– PhD, "Multidimensional data(base) manipulations and rule-based languages", defended 1998 at LISI (now LIRIS), INSA Lyon; supervisors J. Kouloumdjian and M. S. Hacid
– Maître de Conférences (associate professor), UFRT, Computer Science Department
– Head of the Master's program in Information Systems and Decision Making
– Semester off (September 2010 – January 2011)

4 About me (cont'd)
Member of the DB & NLP team (4 PR, 8 MCF)
– NLP
– XML and web technology
– Data mining and OLAP
Recent activities
– Pattern-based global models (PhD Eynollah Khanjari, 2009)
– Summarizing and visualizing large sets of association rules (PhD Marie Ndiaye, 2010)
– Collaborative exploration of data warehouses (PhD Elsa Negre, 2009)

5 Personalizing OLAP queries
PhD Hassina Mouloudi (2007)
Main publications
– ACM DOLAP 2005
– BDA 2006
– Hassina's dissertation (in French)
Prototype
– Mobile application for querying a cube with query personalization → Mondrian, Oracle, Tomcat, Axis

6 Motivation
SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)
How this result is visualized depends on the user's profile.

7 The problem
Given
– an MDX query q
– user preferences P
– a visualization constraint v
Find a preferred query q' that is
– included in q
– nearest to q while satisfying v
– the most interesting w.r.t. P

8 Example of preferred query
SELECT CROSSJOIN({City.Tours}, {Category.Food, Category.Drink}) ON ROWS,
       {Year.2005} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)
<
SELECT CROSSJOIN({City.Tours}, {Year.2006}) ON ROWS,
       {Category.Drink} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)
The first query is less preferred than the second because the user profile contains Location < Product, Product < Time, 2005 < 2006 and Food < Drink.
Indeed:
– (2005, Food, Tours, quantity) < (2006, Drink, Tours, quantity)
– (2005, Drink, Tours, quantity) < (2006, Drink, Tours, quantity)
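The fact comparisons on this slide can be reproduced with a small sketch. The slide does not say how the dimension-level and member-level preferences are combined, so the lexicographic rule below (compare on the most important dimension first) is an assumption, as are the names `DIMENSION_PRIORITY`, `MEMBER_RANK`, `fact_key` and `less_preferred`.

```python
# Hypothetical encoding of the user profile shown on the slide.
# Assumption: "Location < Product < Time" means Time matters most.
DIMENSION_PRIORITY = ["Time", "Product", "Location"]  # most important first

# Member-level preferences: a higher rank means "more preferred".
MEMBER_RANK = {
    "Time": {"2005": 0, "2006": 1},
    "Product": {"Food": 0, "Drink": 1},
    "Location": {"Tours": 0},
}


def fact_key(fact):
    """Map a fact to a tuple of ranks, ordered by dimension priority."""
    return tuple(MEMBER_RANK[dim][fact[dim]] for dim in DIMENSION_PRIORITY)


def less_preferred(a, b):
    """True if fact a is strictly less preferred than fact b (lexicographic)."""
    return fact_key(a) < fact_key(b)


f1 = {"Time": "2005", "Product": "Food", "Location": "Tours"}
f2 = {"Time": "2005", "Product": "Drink", "Location": "Tours"}
f3 = {"Time": "2006", "Product": "Drink", "Location": "Tours"}

print(less_preferred(f1, f3))  # True, as stated on the slide
print(less_preferred(f2, f3))  # True, as stated on the slide
```

Under this assumed rule both facts of the first query are dominated by the single fact of the second query, which matches the ordering of the two queries shown above.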

9 Personalizing
[Architecture diagram. Labels: user query, result, user profile, dimension tables, fact table, query processor, personalization engine]

10 Personalizing OLAP queries
Context
– Dimension tables in main memory
– No access to the fact table
Principle
– Compute sets of positions in the resulting crosstab that are:
  – the largest possible
  – visualizable w.r.t. the visualization constraint
  – corresponding to the preferred facts
– Compute the structures of the crosstabs

11 Example of personalization (1)
The query:
SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)
Preferences:
– Time < Location and Product < Location
– 2002 < 2003 < 2004 < 2005 < 2006
– Electronics < Shoes < Cloth < Food < Drink
– Quantity < Price
Constraint: 2 axes, no more than 4 positions on each axis

12 Example of personalization (2) Step 1: the most preferred facts

13 Example of personalization (3) Step 2: the second most preferred facts

14 Example of personalization (4) Step 3: the next most preferred facts. But the selected facts have to satisfy the visualization constraint

15 Example of personalization (5)
Finally, one of the constructed queries is:
SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Food, Category.Drink}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)
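The visualization constraint from the start of the example (2 axes, no more than 4 positions on each axis) can be checked with a short sketch. It assumes that an axis defined by a CROSSJOIN has as many positions as the product of its set sizes, and that Category has exactly the five members listed in the preferences; neither is stated on the slides. The names `axis_positions`, `is_visualizable` and `ASSUMED_CATEGORIES` are hypothetical.

```python
from math import prod


def axis_positions(member_sets):
    """Positions on an axis built as a CROSSJOIN of the given member sets."""
    return prod(len(members) for members in member_sets)


def is_visualizable(axes, max_axes=2, max_positions=4):
    """Check the constraint: at most max_axes axes, max_positions on each."""
    return len(axes) <= max_axes and all(
        axis_positions(axis) <= max_positions for axis in axes
    )


# Assumption: Category.Members expands to these five members.
ASSUMED_CATEGORIES = ["Electronics", "Shoes", "Cloth", "Food", "Drink"]
cities = ["Tours", "Orleans"]
years = ["2003", "2004", "2005", "2006"]

# Each query is a list of axes; each axis is a list of crossjoined member sets.
original_query = [[cities, ASSUMED_CATEGORIES], [years]]
personalized_query = [[cities, ["Food", "Drink"]], [years]]

print(is_visualizable(original_query))      # False: 2 x 5 positions on rows
print(is_visualizable(personalized_query))  # True: 2 x 2 on rows, 4 on columns
```

Under these assumptions the original query violates the constraint on its row axis, while the constructed query fills both axes exactly to the limit of 4 positions.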

16 Prototype

17 Speedup

18 Recommending OLAP queries
PhD Elsa Negre (2009)
Main publications
– ACM DOLAP 2008
– DaWaK 2009
– ACM DOLAP 2009
– Int. Journal of Data Warehousing and Mining
Prototype
– Various methods for OLAP query recommendation → Mondrian, MySQL

19 Context and principle

20 Distances
Between positions in the cube
– Hamming
– Based on shortest path
Between queries
– Based on differences in dimension
– Hausdorff
Between sessions
– Based on the subsequence
– Edit distance
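Three of the distances named on this slide can be sketched in their textbook forms. The slides do not give the exact definitions used in the work, so the following are assumptions: a position is a tuple with one member per dimension, a query is a non-empty set of positions, a session is a sequence of queries, and the edit distance uses unit costs.

```python
def hamming(p, q):
    """Number of dimensions on which two positions differ (equal lengths)."""
    return sum(a != b for a, b in zip(p, q))


def hausdorff(query_a, query_b, dist=hamming):
    """Hausdorff distance between two non-empty sets of positions."""
    def directed(xs, ys):
        # Farthest any x is from its closest y.
        return max(min(dist(x, y) for y in ys) for x in xs)
    return max(directed(query_a, query_b), directed(query_b, query_a))


def edit_distance(session_a, session_b):
    """Levenshtein distance between two sequences of queries (unit costs)."""
    prev = list(range(len(session_b) + 1))
    for i, x in enumerate(session_a, 1):
        cur = [i]
        for j, y in enumerate(session_b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


p1 = ("2005", "Food", "Tours")
p2 = ("2006", "Drink", "Tours")
print(hamming(p1, p2))                                 # 2
print(hausdorff({p1}, {p1, p2}))                       # 2
print(edit_distance(["q1", "q2", "q3"], ["q1", "q3"]))  # 1
```

A natural variation, not confirmed by the slides, would be to replace the 0/1 substitution cost in `edit_distance` with a query-level distance such as `hausdorff`.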

21 Experiments Cube – Foodmart (Mondrian sample cube) Session generator – Max 100 cells per MDX query – sessions – queries/session – Log of queries – 1-20 queries/current session

22 Efficiency
[Plots for: shortest path, Hausdorff distance, edit distance]

23 Effectiveness
10-fold cross-validation
– 1 query set = 10 equally sized subsets
  – 9 for the log
  – 1 for the current sessions
For the current sessions
– Remove the last query
– Check how often this last query is recommended
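The protocol on this slide can be sketched as follows. The sketch assumes the unit being split is a session (a list of queries) and that folds are formed by simple striding; the slides do not say how the subsets were drawn. The names `ten_fold_splits` and `hold_out_last` are hypothetical.

```python
def ten_fold_splits(sessions, k=10):
    """Yield (log, current_sessions) pairs: k-1 folds form the log, 1 is held out."""
    folds = [sessions[i::k] for i in range(k)]  # sizes differ by at most 1
    for i in range(k):
        log = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield log, folds[i]


def hold_out_last(session):
    """Split a current session into its prefix and the query to be predicted."""
    return session[:-1], session[-1]


# Toy data: 20 sessions of 3 queries each (placeholder query names).
sessions = [[f"q{i}_{j}" for j in range(3)] for i in range(20)]
splits = list(ten_fold_splits(sessions))

log, current = splits[0]
prefix, expected = hold_out_last(current[0])
print(len(splits), len(log), len(current))  # 10 18 2
print(prefix, expected)                     # ['q0_0', 'q0_1'] q0_2
```

For each held-out session, a recommender would be given `prefix` and `log`, and its output compared with `expected`.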

24 Effectiveness
E = members of the expected query
R = members of the recommended query
Intersect = E ∩ R
Precision = |Intersect| / |R|
Recall = |Intersect| / |E|
F-measure = 2 × Precision × Recall / (Precision + Recall)
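These measures translate directly into code. In the sketch below the function name `evaluate`, the zero-division conventions, and the example member sets are illustrative assumptions, not taken from the experiments.

```python
def evaluate(expected_members, recommended_members):
    """Return (precision, recall, f_measure) for one recommendation."""
    expected, recommended = set(expected_members), set(recommended_members)
    intersect = len(expected & recommended)
    # Assumption: a measure is 0 when its denominator is empty.
    precision = intersect / len(recommended) if recommended else 0.0
    recall = intersect / len(expected) if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative member sets only.
expected = {"Tours", "Orleans", "Food", "Drink"}
recommended = {"Tours", "Food", "Shoes"}
precision, recall, f_measure = evaluate(expected, recommended)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Here 2 of the 3 recommended members are expected and 2 of the 4 expected members are recommended, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.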

25 Query recommendation for discovery-driven analysis? "Hm, this looks strange to me... interesting..."

26 Processing the log 1: Consider all sessions

27 Processing the log 1: consider all sessions 2: consider all queries

28 Processing the log 1: consider all sessions 2: consider all queries 3: consider all difference pairs

29 Processing the log 1: consider all sessions 2: consider all queries 3: consider all difference pairs 4: detect their drilldown pairs

30 Processing the log 1: consider all sessions 2: consider all queries 3: consider all difference pairs 4: detect their drilldown pairs 5: detect their exception pairs

31 Processing the log 1: consider all sessions 2: consider all queries 3: consider all difference pairs 4: detect their drilldown pairs 5: detect their exception pairs 6: consider only the most general pairs having drilldown pairs or exception pairs

32 Recommending 1: detect difference pairs

33 Recommending 1: detect difference pairs 2: specialize a most general pair in the log?

34 Recommending 1: detect difference pairs 2: specialize a most general pair in the log? 3: suggest the most general queries...

35 Recommending 1: detect difference pairs 2: specialize a most general pair in the log? 3: suggest the most general queries... 4: ... then drilldown queries

36 Recommending 1: detect difference pairs 2: specialize a most general pair in the log? 3: suggest the most general queries... 4: ... then drilldown queries 5: ... then exception queries

37 Prototype
Java, Mondrian OLAP engine & Sarawagi's icube
Preliminary tests show that for a small log (a few hundred queries), recommendation time does not exceed 50 ms

38 Conclusion: so far... Hm this looks strange to me... Ongoing work with IRSA (a French social security health examination center) to analyze over health care examination questionnaires

39 Summarizing OLAP queries
Master's thesis Julien Aligon (in progress)
Problem: viewpoints on former sessions?
– By summarizing the log: summarize a sequence of queries by a sequence of queries
– By browsing/querying the summary
Experiments on healthcare data
Related publications
– EDA 2007, 2010

40 Perspectives
Project STIC-AmSud PQUERY: preference models for personalized queries
Forthcoming work with M. Golfarelli (U. Bologna)
– Preference mining to dynamically add preferences to an MDX query
Contributions to a collaborative query management system for OLAP