A Thin Monitoring Layer for Top-k Aggregation Queries over a Database. Foteini Alvanaki, Sebastian Michel, Saarland University. DBRank 2013, Riva Del Garda, Italy

Presentation transcript:

A Thin Monitoring Layer for Top-k Aggregation Queries over a Database
Foteini Alvanaki, Sebastian Michel
Saarland University
DBRank 2013, Riva Del Garda, Italy

Data Cube sum(price*quantity)

Data Cube
    Brand    Product Type   Country    sum(Price*Quantity)
    Brand1   Type1          Country1   1234
    Brand1   Type2          Country1   3522
    Brand1   Type1          ANY        1234
    Brand1   Type2          ANY        3522
    Brand1   ANY            Country1   4756
    ANY      Type1          Country1   1234
    ANY      Type2          Country1   3522
    Brand1   ANY            ANY        4756
    ANY      Type1          ANY        1234
    ANY      Type2          ANY        3522
    ANY      ANY            Country1   4756
1. What are the top-2 product types with the highest revenue of each brand in each country?
2. What are the top-2 brands with the highest revenue in each country?

Top-k Queries
Primary Attribute: the attribute/dimension over which the selection is performed (e.g. product type)
Secondary Attributes: used to filter specific results (e.g. brand, country)
Aggregated Attributes: used to compute an aggregated score (e.g. price, quantity)
Aggregate Function: e.g. sum
One top-k query is maintained for each combination of secondary attribute instances (filtering condition).

Filtering Conditions: Example (1)
brand = {X}, country = {Y, W}
Condition brand=X AND country=Y:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=Y
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K
Condition brand=X AND country=W:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=W
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K

Filtering Conditions: Example (2)
country = {Y, W}, brand = {X}
Condition country=Y:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE country=Y
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K
Condition country=W:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE country=W
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K
Condition brand=X:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K

Filtering Conditions: Example (3)
country = {Y, W}, brand = {X}
No filtering condition (all brands, all countries):
    SELECT type, SUM(price*quantity)
    FROM relation
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT K
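The three examples above can be generated mechanically: every secondary attribute is either fixed to one of its instances or left unconstrained. The following is a minimal Python sketch of that enumeration, not the authors' code. The helper names `filtering_conditions` and `top_k_sql` and the `ANY` marker are hypothetical; the table and column names are taken from the slides.

```python
from itertools import product

ANY = None  # hypothetical marker meaning "attribute is not constrained"

def filtering_conditions(secondary):
    """Yield one dict per filtering condition.

    `secondary` maps each secondary attribute to its instances,
    e.g. {"brand": ["X"], "country": ["Y", "W"]}.
    """
    names = list(secondary)
    # Each attribute takes one of its instances or stays unconstrained.
    choices = [list(secondary[name]) + [ANY] for name in names]
    for combo in product(*choices):
        yield {n: v for n, v in zip(names, combo) if v is not ANY}

def top_k_sql(condition, k):
    """Build a parameterised top-k query for one filtering condition."""
    sql = "SELECT type, SUM(price*quantity) AS score FROM relation"
    if condition:
        sql += " WHERE " + " AND ".join(f"{attr} = ?" for attr in condition)
    sql += " GROUP BY type ORDER BY score DESC LIMIT ?"
    return sql, list(condition.values()) + [k]

conds = list(filtering_conditions({"brand": ["X"], "country": ["Y", "W"]}))
for cond in conds:
    print(cond, "->", top_k_sql(cond, 2)[0])
```

For brand = {X} and country = {Y, W} this produces six conditions: the two of Example (1), the three of Example (2), and the unconstrained query of Example (3).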

Updates
Insertions into the underlying database that contain all information related to the top-k queries:
    INSERT INTO relation (type, brand, country, price, quantity)
    VALUES (T, X, Y, 100, 3)

Problem
How to maintain all these queries in the presence of fast updates?

Outline
Setting/Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Example
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=Y
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT 2
Update: (type, X, Y, 300)

Naïve Approach
Case 1: type in the top-2, e.g. (B,X,Y,300)
Top-2 before: A 3452, B 2406
Top-2 after (+300): A 3452, B 2706
Case 2: type NOT in the top-2, e.g. (K,X,Y,300)
Verification Query:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=Y AND type=K
    GROUP BY type
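The two cases of the Naïve Approach can be sketched in a few lines of Python on top of an in-memory SQLite database. This is an illustrative sketch under assumptions, not the authors' implementation: the function `naive_update`, the dictionary `topk`, and the rule for replacing the weakest entry are hypothetical, and the top-k structure is assumed to be full already.

```python
import sqlite3

VERIFY = ("SELECT SUM(price*quantity) FROM relation "
          "WHERE brand = ? AND country = ? AND type = ?")

def naive_update(conn, topk, brand, country, row):
    """Apply one insertion; return True if a verification query was issued.

    `topk` maps type -> exact score for the ranking (brand, country).
    """
    t, b, c, price, qty = row
    conn.execute("INSERT INTO relation (type, brand, country, price, quantity) "
                 "VALUES (?, ?, ?, ?, ?)", row)
    if (b, c) != (brand, country):
        return False                      # update does not concern this ranking
    if t in topk:                         # Case 1: adjust the score in memory
        topk[t] += price * qty
        return False
    # Case 2: the type is unknown, so ask the database for its exact score.
    score = conn.execute(VERIFY, (brand, country, t)).fetchone()[0]
    weakest = min(topk, key=topk.get)
    if score > topk[weakest]:             # assumed replacement rule
        del topk[weakest]
        topk[t] = score
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (type TEXT, brand TEXT, country TEXT, "
             "price INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO relation VALUES (?, ?, ?, ?, ?)",
                 [("A", "X", "Y", 3452, 1), ("B", "X", "Y", 2406, 1),
                  ("K", "X", "Y", 2300, 1)])
topk = {"A": 3452, "B": 2406}

q1 = naive_update(conn, topk, "X", "Y", ("B", "X", "Y", 100, 3))  # Case 1
q2 = naive_update(conn, topk, "X", "Y", ("K", "X", "Y", 100, 3))  # Case 2
q3 = naive_update(conn, topk, "X", "Y", ("K", "X", "Y", 100, 3))  # Case 2 again
print(q1, q2, q3, topk)
```

Every update for a type outside the top-k costs one database round trip, which is the overhead the other two approaches try to avoid.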

Estimates Approach
In-memory structures:
top-(k+N) instances with exact aggregated scores
B instances with estimated aggregated scores: best possible score (basic score) + inserted values
Top-5 (the first two rows form the top-2):
    Type   Score
    A      3452
    B      2406
    C      2356
    D      2167
    E      1987
Buffer:
    Type   Score
    O      1990
    P      2112
    Q      2076
    R      1997

Estimates Approach
Case 1.1: type in the top-2, e.g. (B,X,Y,300)
Top-5 before: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Top-5 after: A 3452, B 2706, C 2356, D 2167, E 1987 (top-2: A, B)

Estimates Approach
Case 1.2: type in the top-5, e.g. (D,X,Y,300)
Top-5 before: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Top-5 after +300: A 3452, B 2406, C 2356, D 2467, E 1987
Top-5 after re-sorting: A 3452, D 2467, B 2406, C 2356, E 1987 (top-2: A, D)

Estimates Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
Top-5: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer before: O 1990, P 2112, Q 2076, R 1997
Buffer after +300: O 1990, P 2412, Q 2076, R 1997
Verification Query:
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=Y AND type=P
    GROUP BY type

Estimates Approach
Sub-case 2.1: score(P) < score(E), e.g. score(P) = 756
Top-5: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer with the verified score: O 1990, P 756, Q 2076, R 1997
Buffer after: O 1990, Q 2076, R 1997

Estimates Approach
Sub-case 2.2: score(P) > score(E), e.g. score(P) = 2178
Top-5 before: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer with the verified score: O 1990, P 2178, Q 2076, R 1997
Top-5 after: A 3452, B 2406, C 2356, P 2178, D 2167
Buffer after: O 1990, Q 2076, R 1997

Estimates Approach
Sub-case 2.3: score(P) > score(B), e.g. score(P) = 2407
Top-5 before: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer with the verified score: O 1990, P 2407, Q 2076, R 1997
Top-5 after: A 3452, P 2407, B 2406, C 2356, D 2167 (top-2: A, P)
Buffer after: O 1990, Q 2076, R 1997

Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Estimated score(T) = basic score + 300 = 2287
Buffer full: Reset Query
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X AND country=Y AND type IN (O,P,Q,R)
    GROUP BY type

Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Reset Query result: score(O)=1254, score(P)=432, score(Q)=2050, score(R)=1990
Top-5 before: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer before: O 1990, P 2112, Q 2076, R 1997
Top-5 after: A 3452, B 2406, C 2356, D 2167, Q 2050
Buffer after: T 2287
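The cases of the Estimates Approach can be tied together in one small in-memory structure. The sketch below is a reading of the slides under stated assumptions, not the authors' implementation: the class `EstimatesRanking`, the callback `exact_score` (a stand-in for the verification and reset queries), and in particular the trigger "verify when an estimate exceeds the k-th exact score" are assumptions. The demo replays Case 3 with the numbers from the slides.

```python
class EstimatesRanking:
    """Top-(k+N) exact scores plus a bounded buffer of estimated scores."""

    def __init__(self, exact, k, buffer_size, exact_score):
        self.exact = dict(exact)        # type -> exact score, top-(k+N)
        self.k = k
        self.buffer = {}                # type -> estimated (upper-bound) score
        self.buffer_size = buffer_size
        self.exact_score = exact_score  # hypothetical callback replacing DB queries
        self.queries = 0                # number of database round trips

    def basic_score(self):
        return min(self.exact.values())

    def kth_score(self):
        return sorted(self.exact.values(), reverse=True)[self.k - 1]

    def top_k(self):
        return sorted(self.exact, key=self.exact.get, reverse=True)[:self.k]

    def _admit(self, t, score):
        # Sub-cases 2.1-2.3: keep t only if it beats the weakest exact entry.
        weakest = min(self.exact, key=self.exact.get)
        if score > self.exact[weakest]:
            del self.exact[weakest]
            self.exact[t] = score

    def _reset_buffer(self):
        self.queries += 1               # one reset query covers the whole buffer
        for t in list(self.buffer):
            self._admit(t, self.exact_score(t))
        self.buffer.clear()

    def update(self, t, value):
        if t in self.exact:             # Case 1: exact score kept in memory
            self.exact[t] += value
            return
        if t in self.buffer:            # Case 2: raise the estimate
            self.buffer[t] += value
        else:                           # Case 3: unknown type
            estimate = self.basic_score() + value
            if len(self.buffer) >= self.buffer_size:
                self._reset_buffer()
            self.buffer[t] = estimate
        if self.buffer[t] > self.kth_score():   # assumed verification trigger
            self.queries += 1
            score = self.exact_score(t)
            del self.buffer[t]
            self._admit(t, score)

# Exact scores the database would report for the buffered types (from the slides).
true_scores = {"O": 1254, "P": 432, "Q": 2050, "R": 1990}
r = EstimatesRanking({"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987},
                     k=2, buffer_size=4, exact_score=true_scores.get)
for t, inserted in [("O", 3), ("P", 125), ("Q", 89), ("R", 10)]:
    r.update(t, inserted)               # buffer: O 1990, P 2112, Q 2076, R 1997
r.update("T", 300)                      # Case 3 with a full buffer
print(r.exact, r.buffer, r.queries)
```

The estimate for T uses the basic score from before the reset (1987 + 300 = 2287), matching the slide; this remains a valid upper bound because T's true score could not exceed the old basic score.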

Queries Characteristics
SAME primary attribute
SAME aggregate attributes
SAME aggregate function
SAME top-k condition
DIFFERENT filtering condition

Lattice organisation

Groups Approach
The updates are forwarded from top to bottom in the lattice.
Each ranking forwards the queried results to the rankings lying in lower levels of the lattice.
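A lattice over filtering conditions can be sketched as follows, assuming that a ranking sits directly below another when it fixes exactly one additional secondary attribute. This ordering is an interpretation of the slides; the helpers `conditions` and `lattice` are hypothetical names.

```python
from itertools import product

def conditions(secondary):
    """All filtering conditions as frozensets of (attribute, value) pairs."""
    names = sorted(secondary)
    # None means the attribute stays unconstrained (ANY).
    choices = [[(n, v) for v in secondary[n]] + [None] for n in names]
    return [frozenset(p for p in combo if p is not None)
            for combo in product(*choices)]

def lattice(conds):
    """Map each condition to the conditions one level below it."""
    return {c: [d for d in conds if c < d and len(d) == len(c) + 1]
            for c in conds}

conds = conditions({"brand": ["X"], "country": ["Y", "W"]})
below = lattice(conds)
top = frozenset()                      # no filter: the most general ranking
brand_x = frozenset({("brand", "X")})
print(sorted(sorted(c) for c in below[brand_x]))
```

Under this reading, the ranking brand=X has the two rankings {brand=X, country=Y} and {brand=X, country=W} directly below it, and the unfiltered ranking sits at the top.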

Groups Approach: Example
Ranking: brand=X, country=ANY
    SELECT type, SUM(price*quantity)
    FROM relation
    WHERE brand=X
    GROUP BY type
    ORDER BY SUM(price*quantity) DESC
    LIMIT 2
Update: (type, X, Y, 300)

Groups Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
Top-5: A 3452, B 2406, C 2356, D 2167, E 1987 (top-2: A, B)
Buffer before: O 1990, P 2112, Q 2076, R 1997
Buffer after +300: O 1990, P 2412, Q 2076, R 1997
A Verification Query is issued.

Groups Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
Verification Query:
    SELECT type, brand, country, price*quantity
    FROM relation
    WHERE brand=X AND type=P
Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)
Buffer Reset Query:
    SELECT type, brand, country, price*quantity
    FROM relation
    WHERE brand=X AND type IN (O,P,Q,R)

Groups Approach
The ranking that issues a query receives tuples (type, brand, country, price*quantity) limited to those satisfying its filtering condition.
It uses them to compute the scores.
It forwards them to the rankings lower in the lattice.
Rankings receiving tuples use those that satisfy their own filtering condition to compute the scores.

Groups Approach: Verification Query
    SELECT brand, country, price*quantity
    FROM relation
    WHERE brand=X AND type=P
The result is a set of (brand, country, price*quantity) tuples limited to those that have brand=X.
The ranking uses them to compute score(P).
It forwards them to the rankings lower in the lattice ({brand=X, country=Y}, {brand=X, country=W}).
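The point of returning raw tuples instead of an aggregate is that a single query can serve several rankings. The following minimal sketch shows this with SQLite; the sample rows, the helper names `verification_tuples` and `score`, and the dictionary of rankings are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (type TEXT, brand TEXT, country TEXT, "
             "price INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO relation VALUES (?, ?, ?, ?, ?)", [
    ("P", "X", "Y", 100, 12),
    ("P", "X", "W", 50, 10),
    ("P", "X", "Y", 712, 1),
    ("P", "Z", "Y", 999, 1),    # other brand: not fetched
    ("A", "X", "Y", 3452, 1),   # other type: not fetched
])

def verification_tuples(conn, brand, type_):
    """Run the verification query once for the ranking brand=<brand>."""
    cur = conn.execute(
        "SELECT brand, country, price*quantity FROM relation "
        "WHERE brand = ? AND type = ?", (brand, type_))
    return [{"brand": b, "country": c, "value": v} for b, c, v in cur]

def score(condition, tuples):
    """Sum the values of the tuples that satisfy a filtering condition."""
    return sum(t["value"] for t in tuples
               if all(t[attr] == val for attr, val in condition.items()))

tuples = verification_tuples(conn, "X", "P")       # one database round trip
rankings = {                                       # issuing ranking first,
    "brand=X": {"brand": "X"},                     # then the rankings below it
    "brand=X,country=Y": {"brand": "X", "country": "Y"},
    "brand=X,country=W": {"brand": "X", "country": "W"},
}
scores = {name: score(cond, tuples) for name, cond in rankings.items()}
print(scores)
```

Here the ranking brand=X computes score(P) from all fetched tuples, while the two lower rankings reuse the same tuples and keep only those matching their own country.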

Groups Approach
Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)
Estimated score(T) = score(E) + 300 = 2287
Buffer full: Reset Query
    SELECT type, brand, country, price*quantity
    FROM relation
    WHERE brand=X AND type IN (O,P,Q,R)

Outline
Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Experiments (1)
TPC-H data
Select on part.p_partKey (200,000 unique values)
Filter on customer.c_mktsegment, orders.o_orderpriority and region.r_name
Aggregation: sum on lineitem.l_quantity
216 total rankings
30,000 updates/insertions
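The figure of 216 rankings is consistent with the TPC-H value domains: each of the three filter attributes has five distinct values, and allowing each attribute to stay unconstrained as well gives (5+1)^3 = 216 filtering conditions. The slides do not spell this derivation out, so the sketch below is an inference; `ANY` and `matching_rankings` are hypothetical names.

```python
from itertools import product

# Value domains as defined by the TPC-H specification.
SEGMENTS = ["AUTOMOBILE", "BUILDING", "FURNITURE", "HOUSEHOLD", "MACHINERY"]
PRIORITIES = ["1-URGENT", "2-HIGH", "3-MEDIUM", "4-NOT SPECIFIED", "5-LOW"]
REGIONS = ["AFRICA", "AMERICA", "ASIA", "EUROPE", "MIDDLE EAST"]
ANY = "ANY"  # hypothetical marker for an unconstrained attribute

# One ranking per combination of (value or ANY) for each filter attribute.
rankings = list(product(SEGMENTS + [ANY], PRIORITIES + [ANY], REGIONS + [ANY]))

def matching_rankings(segment, priority, region):
    """Rankings whose filtering condition is satisfied by one inserted tuple."""
    return [r for r in rankings
            if r[0] in (segment, ANY)
            and r[1] in (priority, ANY)
            and r[2] in (region, ANY)]

affected = matching_rankings("BUILDING", "2-HIGH", "EUROPE")
print(len(rankings), len(affected))
```

Under this reading, every insertion touches 2^3 = 8 of the 216 rankings, since each attribute either matches its concrete value or is unconstrained.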

Experiments (2)
Updates
Random: inserts a quantity between 1 and 50 for a random part.p_partKey
80-20: inserts a quantity between 1 and 50 for a part.p_partKey selected according to the 80-20 rule
N-extra
Gap: difference between top-k and top-(k+N) scores, 100% (1*50) and 200% (2*50)
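The two update workloads could be generated along the following lines. The slides do not define the 80-20 rule precisely; this sketch assumes that 80% of the updates go to a fixed 20% of the part keys, and the function names and the choice of the first 40,000 keys as the "hot" set are illustrative only.

```python
import random

NUM_KEYS = 200_000     # unique part.p_partKey values (from the slides)
MAX_QUANTITY = 50      # inserted quantity is between 1 and 50

def random_update(rng):
    """Uniformly random key with a uniformly random quantity."""
    return rng.randint(1, NUM_KEYS), rng.randint(1, MAX_QUANTITY)

def skewed_update(rng, hot_fraction=0.2, hot_probability=0.8):
    """Assumed 80-20 rule: 80% of the updates hit 20% of the keys."""
    hot_keys = int(NUM_KEYS * hot_fraction)
    if rng.random() < hot_probability:
        key = rng.randint(1, hot_keys)
    else:
        key = rng.randint(hot_keys + 1, NUM_KEYS)
    return key, rng.randint(1, MAX_QUANTITY)

rng = random.Random(42)                      # fixed seed for repeatability
sample = [skewed_update(rng) for _ in range(10_000)]
hot_share = sum(key <= 40_000 for key, _ in sample) / len(sample)
print(f"share of updates on hot keys: {hot_share:.2f}")
```

A skewed workload concentrates insertions on few keys, so the same instances are updated repeatedly.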

80-20 Updates: Queries

80-20 Updates: Time

Random Updates: Queries

Random Updates: Time

Naïve Approach
80-20 updates: 239,985 Verification Queries, 4 secs/update
Random updates: 239,977 Verification Queries, 4 secs/update

Outline
Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Conclusion
Two algorithms to maintain top-k rankings in the presence of fast updates arriving in an underlying database
Exact top-k results
Faster than the Naïve Approach, while the Groups Approach further limits the communication with the database
Preliminary results that provide insights into the impact of the various parameters on the effectiveness of our methods

Thank you!

Additional Instances