Tuning the top-k view update process. Eftychia Baikousi, Panos Vassiliadis. University of Ioannina, Dept. of Computer Science.

Presentation transcript:


M-Pref 2007, Vienna 23/9/

Forecast
- Problem: maintaining materialized top-k views when updates occur in the base relation.
- Extra difficulty: addressing the problem in the presence of high deletion rates.
- The crux of the approach is to materialize kcomp tuples, that is, k plus an appropriate number of extra tuples, to sustain deletion rates that are drastically higher than average.
- The correct estimation and fine tuning of kcomp is not obvious; we use appropriate probabilistic methods.

Contents
- Motivation & Problem Definition
- Overview of our Method
- Computation of rates affecting the view
- Computation of kcomp
- Fine tuning kcomp
- Experiments
- Conclusions


Top-k query
Given a relation R(id, x1, x2, x3) and a query Q = sum(x1, x2, x3), find the k tuples with the highest grades according to Q.
[Figure: relation R with tuples a, b, c, d, their sum scores, and the resulting top-2 tuples]

Motivating Example
Shopping center:
- Customers sign in with a palmtop (PDA).
- Need for advertisements: special offers to customers.
Given the relation Customers(id, name, age, salary, …), keep a materialized view V of the top-2 customers (younger and highly paid) according to the query Q = -age + 2*salary.
The view V must be maintained while customers sign in and out (e.g., train departures, working hours).

Customers
id  name   age  salary
1   John   18   20
2   Mary   42   25
3   Bill   26   35
4   Peter  57   37

V
name  Q
Bill  44
John  22
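
The view V of this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `q` and `materialize_top_k` are illustrative and do not come from the slides; the data and the scoring query are the ones shown above.

```python
# Customers(id, name, age, salary) as listed on the slide.
customers = [
    (1, "John", 18, 20),
    (2, "Mary", 42, 25),
    (3, "Bill", 26, 35),
    (4, "Peter", 57, 37),
]


def q(age, salary):
    """Scoring query from the slide: Q = -age + 2*salary."""
    return -age + 2 * salary


def materialize_top_k(rows, k):
    """Return the k best (name, score) pairs, highest score first."""
    scored = [(name, q(age, salary)) for _, name, age, salary in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


V = materialize_top_k(customers, 2)
print(V)
```

Younger and better paid customers get higher scores because age enters Q with a negative sign and salary with a positive weight.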

Problem definition
Given:
- a base relation R(ID, X, Y) that originally contains N tuples,
- a materialized view V that contains the top-k tuples in the form (id, val), where val is the score according to a function Q(x, y) = ax + by and a, b are constant parameters,
- the update ratios ins, del and upd for insertions, deletions and updates, respectively, over the base relation R,
compute kcomp, of the form kcomp = k + Δk, such that the view will contain at least k tuples, k ≤ kcomp, with probability p after a period T.
[Figure: view V with columns (id, Q), made up of the top k tuples plus Δk extra tuples, kcomp in total]

Related Work
Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo Chen: Efficient Maintenance of Materialized Top-k Views, ICDE 03
- Maintains a materialized top-k view when updates occur in the base table.
- Computes a k_max (instead of the necessary k), adjusted at runtime so that a refill query is rarely needed.
- Formulates the problem through a random walk model.
- The method is theoretically guaranteed to work well only when insertions and deletions are equally probable (p_ins = p_del) or insertions are more frequent than deletions (p_ins > p_del).
- There is no quality-of-service guarantee when deletions are more probable than insertions (p_ins < p_del).

Motivating Example
Customers sign in and out due to train departures and working hours.
At certain time periods, deletions are more probable than insertions (p_ins < p_del). In that case the view may end up containing fewer than k tuples.

Customers
id  name   age  salary
1   John   18   20
2   Mary   42   25
3   Bill   26   35
4   Peter  57   37

V
name  Q
Bill  44
John  22


Overview of the method
1. Compute the ratios of the incoming source updates that affect the view
2. Compute kcomp
3. Fine tune kcomp

Empirical Cumulative Distribution Function (ECDF)
The ECDF is a non-parametric cumulative distribution function that adapts itself to the data.
Definition: F_n(x) represents the proportion of observations in a sample that are less than or equal to x. It
- assigns the probability 1/n to each of the n observations in the sample,
- estimates the true population proportion F(x).
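
The definition of F_n(x) translates directly into code. The sketch below is illustrative only: `make_ecdf` is a hypothetical helper name, and the sample consists of the Q scores of the four example customers (22, 8, 44, 17), computed with Q = -age + 2*salary.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def f_n(x):
        # bisect_right gives the number of observations that are <= x,
        # so each observation contributes probability 1/n.
        return bisect_right(data, x) / n

    return f_n


scores = [22, 8, 44, 17]  # Q scores of John, Mary, Bill, Peter
F = make_ecdf(scores)
print(F(22))      # share of scores <= 22
print(1 - F(22))  # share of scores strictly above 22
```

Sorting once and using binary search keeps each evaluation of F_n logarithmic in the sample size, which matters when the base relation is large.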

Computation of update rates that affect V
Given:
- a relation Customers(id, name, age, salary, …) with N = 4 tuples,
- a materialized view V containing the top-2 tuples (k = 2) in the form (id, Q), where Q = -age + 2*salary is the score,
- update ratios ins = 1, del = 2, upd = 0,
find ins_aff and del_aff (the insertions and deletions affecting the view).

Customers
id  name   age  salary
1   John   18   20
2   Mary   42   25
3   Bill   26   35
4   Peter  57   37

V
name  Q
Bill  44
John  22

Computation of update rates that affect V
Given N = 4, ins = 1, del = 2, upd = 0, we compute the following:
- updates are treated as a combination of deletions and insertions,
- from the ECDF, the probability of a new tuple affecting the view,
- the ratios affecting the view.
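
The three steps could be sketched as follows. This is a rough illustration under explicit assumptions, not the authors' formulas: it assumes that the probability of a new tuple affecting the view is 1 - F_n(val_k), where val_k is the lowest score in the view, and that deletions hit base tuples uniformly at random, so that a deleted tuple belongs to the view with probability k/N. All function and parameter names are hypothetical.

```python
from bisect import bisect_right


def prob_insert_affects_view(base_scores, val_k):
    """Estimate P(z > val_k) as 1 - F_n(val_k), F_n being the ECDF of the scores."""
    data = sorted(base_scores)
    return 1.0 - bisect_right(data, val_k) / len(data)


def rates_affecting_view(n, k, n_ins, n_del, n_upd, p_aff):
    """Return (ins_aff, del_aff) under the assumptions stated in the comments."""
    # An update is treated as one deletion plus one insertion.
    ins_total = n_ins + n_upd
    del_total = n_del + n_upd
    # Only insertions that score above the view threshold affect the view.
    ins_aff = ins_total * p_aff
    # ASSUMPTION: deletions are uniform over the base relation, so a
    # deleted tuple is in the view with probability k / n.
    del_aff = del_total * k / n
    return ins_aff, del_aff


scores = [22, 8, 44, 17]  # Q scores of the example customers
p = prob_insert_affects_view(scores, val_k=22)
ins_aff, del_aff = rates_affecting_view(n=4, k=2, n_ins=1, n_del=2, n_upd=0, p_aff=p)
print(p, ins_aff, del_aff)
```

Other deletion models (for example, deletions that favor high-scoring tuples) would change only the `del_aff` line.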


Computation of kcomp
Compute kcomp
- such that it guarantees that the view will contain at least k tuples, k ≤ kcomp, with probability p after a period of operation T,
- of the form kcomp = k + Δk.

Customers
id  name   age  salary
1   John   18   20
2   Mary   42   25
3   Bill   26   35
4   Peter  57   37

V
name   Q
Bill   44
John   22
Peter  17

[Figure: view V with columns (id, Q), made up of the top k tuples plus Δk extra tuples, kcomp in total]

Computation of kcomp

Customers
id  name   age  salary
1   John   18   20
2   Mary   42   25
3   Bill   26   35
4   Peter  57   37
5   Kate   25   30

V
name   Q
Bill   44
Kate   25
John   22
Peter  17

There is 1 insertion and 2 deletions affecting the view: tuple (5, Kate, 25, 30) is inserted, and tuples (3, Bill, 26, 35) and (4, Peter, 57, 37) are deleted from the view.
The view will contain 2 tuples, as initially needed.
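
The scenario on this slide can be replayed with a toy view. The sketch below is an assumption-laden illustration rather than the authors' implementation: `TopKView` is a hypothetical class, scores are recomputed from Q = -age + 2*salary, and capacity handling and refill queries are deliberately left out.

```python
def q(age, salary):
    """Scoring query from the slides: Q = -age + 2*salary."""
    return -age + 2 * salary


class TopKView:
    """Toy materialized view holding (score, id, name) entries, best first."""

    def __init__(self, rows, kcomp):
        entries = sorted(((q(a, s), i, n) for i, n, a, s in rows), reverse=True)
        self.entries = entries[:kcomp]  # materialize kcomp = k + delta_k tuples

    def insert(self, row):
        i, n, a, s = row
        score = q(a, s)
        # A new tuple affects the view only if it beats the lowest stored score.
        if not self.entries or score > self.entries[-1][0]:
            self.entries.append((score, i, n))
            self.entries.sort(reverse=True)

    def delete(self, row_id):
        # Deleting a tuple that is not in the view leaves the view unchanged.
        self.entries = [e for e in self.entries if e[1] != row_id]

    def names(self):
        return [n for _, _, n in self.entries]


customers = [(1, "John", 18, 20), (2, "Mary", 42, 25),
             (3, "Bill", 26, 35), (4, "Peter", 57, 37)]
k = 2
view = TopKView(customers, kcomp=3)  # k = 2 plus one extra tuple
initial = view.names()
view.insert((5, "Kate", 25, 30))
view.delete(3)  # Bill leaves
view.delete(4)  # Peter leaves
print(initial, view.names(), len(view.names()) >= k)
```

With only k = 2 tuples materialized, the same update sequence would leave the view with fewer than k tuples and force a refill query against the base relation.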


Fine tune kcomp
- kcomp is expressed as a formula depending on ins_aff and del_aff, the ratios of insertions and deletions affecting the view.
- The probability of a tuple affecting the view may vary according to probabilistic properties.
- Fine tune kcomp by adding the appropriate variance.

Fine tune kcomp
The probability of a new tuple z affecting the view is p(z > val_k). Each insertion is a Bernoulli experiment with 2 possible events:
- the new tuple z affects the view, with probability p(z),
- the new tuple z does not affect the view, with probability 1 - p(z).
ins insertions in the base relation correspond to ins Bernoulli experiments. The number of successes of ins Bernoulli experiments follows a Binomial distribution with variance VAR_ins = ins * p(z) * (1 - p(z)).
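
The Binomial variance n * p * (1 - p) can be checked against a simulation of the Bernoulli experiments. The sketch below is illustrative only; the function names and the chosen values (ins = 100, p = 0.25) are assumptions made for the example.

```python
import random


def binomial_variance(n, p):
    """Variance of the number of successes in n Bernoulli(p) experiments."""
    return n * p * (1.0 - p)


def simulate_affecting_insertions(ins, p, trials, seed=0):
    """Empirical mean and variance of the number of insertions hitting the view."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(ins)) for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return mean, var


theory = binomial_variance(100, 0.25)
mean, var = simulate_affecting_insertions(ins=100, p=0.25, trials=5000)
print(theory, mean, var)
```

The same reasoning applies to deletions, with the probability that a deleted tuple belongs to the view in place of p(z).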

Fine tune kcomp
In the worst case, in order to guarantee with 95% confidence that the view will contain at least k tuples, kcomp is computed from ins_aff, del_aff and the following variances:
- VAR_ins denotes the variance of the insertions,
- VAR_del denotes the variance of the deletions.
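
One plausible way to turn this worst-case reasoning into a formula is sketched below. It is an assumption, not the formula from the slides: it takes Δk as the expected net loss del_aff - ins_aff, and for roughly 95% confidence it pushes the deletions two standard deviations above their mean and the insertions two standard deviations below theirs. The function names and the parameter `z` are hypothetical.

```python
import math


def kcomp_plain(k, ins_aff, del_aff):
    """kcomp = k + delta_k without fine tuning."""
    # ASSUMPTION: delta_k is the expected net loss of view tuples, never negative.
    delta_k = max(0.0, del_aff - ins_aff)
    return k + math.ceil(delta_k)


def kcomp_fine_tuned(k, ins_aff, del_aff, var_ins, var_del, z=2.0):
    """kcomp with a variance margin; z = 2 corresponds to roughly 95% confidence."""
    # ASSUMPTION: worst case means more deletions and fewer insertions than expected.
    worst_del = del_aff + z * math.sqrt(var_del)
    worst_ins = max(0.0, ins_aff - z * math.sqrt(var_ins))
    delta_k = max(0.0, worst_del - worst_ins)
    return k + math.ceil(delta_k)


print(kcomp_plain(2, ins_aff=1, del_aff=2))
print(kcomp_fine_tuned(10, ins_aff=4.0, del_aff=9.0, var_ins=4.0, var_del=9.0))
```

For k = 2, ins_aff = 1 and del_aff = 2, the plain variant gives kcomp = 3, which matches the three-tuple view of the running example.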


Experimental methodology
Methods tested:
- kcomp without fine tuning
- kcomp with fine tuning
- Yi et al., ICDE 03
Measures:
- Number of tuples (# tuples) deleted from the view that fall below the threshold value of k
- Memory overhead for kcomp with and without fine tuning, as the number of extra tuples that need to be kept in the view
- Number of extra tuples for kcomp with and without fine tuning, compared to the number of extra tuples of the related work

Experimental methodology
Synthetic data sets:
- Gaussian distribution with mean μ=50 and variance σ=10
- Negative exponential distribution with parameters a=1.0 for X and a=2.0 for Y
- Zipf distribution with parameter a=2.1

Experimental parameters:
Size of source table R (tuples)              |R|   1x10^5, 5x10^5, 1x10^6, 2x10^6
Size of mat. view (tuples)                   k     5, 10, 100, 1000
Size of update stream (pct over |R|)               1/1000, 1/100
Deletion rate over insertion rate (ratio)    D/I   1.0, 1.5, 2.0
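
Data sets of this kind could be generated along the following lines. This is a sketch under assumptions, not the authors' generator: it treats the value 10 as the standard deviation of the Gaussian, and it approximates the Zipf distribution by truncating it to ranks 1..max_rank (the standard library has no Zipf sampler). All function names are hypothetical.

```python
import random


def gaussian_column(n, rng, mu=50.0, sigma=10.0):
    # ASSUMPTION: sigma is used as the standard deviation.
    return [rng.gauss(mu, sigma) for _ in range(n)]


def neg_exponential_column(n, rng, a):
    # expovariate(a) samples an exponential distribution with rate a.
    return [rng.expovariate(a) for _ in range(n)]


def zipf_column(n, rng, a=2.1, max_rank=1000):
    # ASSUMPTION: truncated Zipf, P(rank = r) proportional to r ** -a.
    ranks = list(range(1, max_rank + 1))
    weights = [r ** -a for r in ranks]
    return rng.choices(ranks, weights=weights, k=n)


def build_relation(xs, ys):
    """Assemble R(id, x, y) from two attribute columns."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(xs, ys), start=1)]


rng = random.Random(42)
n = 1000  # kept small here; the slides use 1x10^5 to 2x10^6 tuples
gauss_r = build_relation(gaussian_column(n, rng), gaussian_column(n, rng))
expo_r = build_relation(neg_exponential_column(n, rng, 1.0),
                        neg_exponential_column(n, rng, 2.0))
zipf_r = build_relation(zipf_column(n, rng), zipf_column(n, rng))
print(len(gauss_r), len(expo_r), len(zipf_r))
```

Seeding the generator makes a run repeatable, which helps when comparing kcomp with and without fine tuning on identical data.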

Max & average misses
kcomp without fine tuning, Gaussian distribution
[Plots: one as a function of R, one as a function of k and D/I]

Memory overhead
[Plot: number of extra tuples as a function of R and D/I]

Comparison with related work
[Plot: number of extra tuples of kcomp with fine tuning compared with k_max of the related work, as a function of R]

Comparison with related work
[Plot: number of extra tuples of kcomp with fine tuning compared with k_max of the related work, as a function of k]


Conclusions
We handled the problem of maintaining materialized top-k views in the presence of high deletion rates.
The method comprises the following steps:
- a computation of the rate that actually affects the materialized view,
- a computation of the necessary extension to k in order to handle the augmented number of deletions that occur,
- a fine tuning part that adjusts this value to take the fluctuation of its statistical properties into consideration.

Thank you for your attention! … many thanks to our hosts!
This research was co-funded by the European Union in the framework of the program Pythagoras II of the Operational Program for Education and Initial Vocational Training of the 3rd Community Support Framework of the Hellenic Ministry of Education, funded by 25% from national sources and by 75% from the European Social Fund (ESF).

Auxiliary slides
Formulas for kcomp

Time to build top-k view in microseconds
[Table with columns N, K, Gauss, Negative exponential, Zipf]