K-Hit Query: Top-k Query Processing with Probabilistic Utility Function SIGMOD2015 Peng Peng, Raymond C.-W. Wong CSE, HKUST 1.

Slides:

Advertisements

Similar presentations

Naïve-Bayes Classifiers Business Intelligence for Managers.

Advertisements

指導教授：陳良弼老師報告者：鄧雅文  Introduction  Related Work  Problem Formulation  Future Work.

Jianxin Li, Chengfei Liu, Rui Zhou Swinburne University of Technology, Australia Wei Wang University of New South Wales, Australia Top-k Keyword Search.

Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.

Representing and Querying Correlated Tuples in Probabilistic Databases

Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng.

PAPER BY : CHRISTOPHER R’E NILESH DALVI DAN SUCIU International Conference on Data Engineering (ICDE), 2007 PRESENTED BY : JITENDRA GUPTA.

Fast Algorithms For Hierarchical Range Histogram Constructions

Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.

Maintaining Sliding Widow Skylines on Data Streams.

1 Finding Shortest Paths on Terrains by Killing Two Birds with One Stone Manohar Kaul (Aarhus University) Raymond Chi-Wing Wong (Hong Kong University of.

Top-k Query Evaluation with Probabilistic Guarantees By Martin Theobald, Gerald Weikum, Ralf Schenkel.

Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.

Indexing the imprecise positions of moving objects Xiaofeng Ding and Yansheng Lu Department of Computer Science Huazhong University of Science & Technology.

Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.

Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data Wenjie Zhang University of New South Wales & NICTA, Australia Joint work:

New Sampling-Based Summary Statistics for Improving Approximate Query Answers P. B. Gibbons and Y. Matias (ACM SIGMOD 1998) Rongfang Li Feb 2007.

A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.

Sensitivity Analysis & Explanations for Robust Query Evaluation in Probabilistic Databases Bhargav Kanagal, Jian Li & Amol Deshpande.

SLIDE 1IS 240 – Spring 2010 Logistic Regression The logistic function: The logistic function is useful because it can take as an input any.

Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Probabilistic Similarity Search for Uncertain Time Series Presented by CAO Chen 21 st Feb, 2011.

1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.

Top-k Queries on Uncertain Data: On score Distribution and Typical Answers Presented by Qian Wan, HKUST Based on [1][2]

An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger Department of Computer.

Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin The Chinese.

7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.

Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.

1 Efficient Algorithms for Optimal Location Queries in Road Networks Zitong Chen (Sun Yat-Sen University) Yubao Liu (Sun Yat-Sen University) Raymond Chi-Wing.

Computer Science and Engineering Loyalty-based Selection: Retrieving Objects That Persistently Satisfy Criteria Presented By: Zhitao Shen Joint work with.

Da Yan and Wilfred Ng The Hong Kong University of Science and Technology.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Bayesian Sets Zoubin Ghahramani and Kathertine A. Heller NIPS 2005 Presented by Qi An Mar. 17 th, 2006.

Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,

Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.

Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.

Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.

Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.

7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.

Privacy Preservation of Aggregates in Hidden Databases: Why and How? Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu.

CHAPTER 15 SECTION 1 – 2 Markov Models. Outline Probabilistic Inference Bayes Rule Markov Chains.

Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Efficient Processing of Top-k Spatial Preference Queries

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

New Sampling-Based Summary Statistics for Improving Approximate Query Answers Yinghui Wang

Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.

Sparse Bayesian Learning for Efficient Visual Tracking O. Williams, A. Blake & R. Cipolloa PAMI, Aug Presented by Yuting Qi Machine Learning Reading.

Efficient Computation of Combinatorial Skyline Queries Author: Yu-Chi Chung, I-Fang Su, and Chiang Lee Source: Information Systems, 38(2013), pp

Point Estimation of Parameters and Sampling Distributions Outlines:  Sampling Distributions and the central limit theorem  Point estimation  Methods.

On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU.

Knowledge based Question Answering System Anurag Gautam Harshit Maheshwari.

1 Finding Competitive Price Yu Peng (Hong Kong University of Science and Technology) Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)

Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.

A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB

Presented by: Dardan Xhymshiti Fall  Type: Research paper  Authors:  International conference on Very Large Data Bases. Yoonjar Park Seoul National.

Parallel Computation of Skyline Queries COSC6490A Fall 2007 Slawomir Kmiec.

Written By: Presented By: Swarup Acharya,Amr Elkhatib Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy Join Synopses for Approximate Query Answering.

CS & CS ST: Probabilistic Data Management Fall 2016 Xiang Lian Kent State University Kent, OH

7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.

A paper on Join Synopses for Approximate Query Answering

Probabilistic Data Management

Chapter 15 QUERY EXECUTION.

CS & CS Probabilistic Data Management

Pyramid Sketch: a Sketch Framework

CS & CS ST: Probabilistic Data Management

Probabilistic n-of-N Skyline Computation over Uncertain Data Streams

Presented by: Mahady Hasan Joint work with

Efficient Processing of Top-k Spatial Preference Queries

Presentation transcript:

K-Hit Query: Top-k Query Processing with Probabilistic Utility Function SIGMOD2015 Peng Peng, Raymond C.-W. Wong CSE, HKUST 1

Outline 1. Introduction 2. Contributions 3. Problem Definition 4. Algorithm k-Hit_Alg - Geometry Property 5. Experiment 6. Conclusion 2

1. Introduction An online car selling service place a car showroom in a prominent place of its webpage, where a very limited number of cars are shown. How can we determine which cars to be shown to the users? 3

1. Introduction We expect to show “good” cars to the users. Here, a "good" car means a car that a user would be interested in. In the literature, we are given a utility function for the user. With this utility function, existing studies could find the best car that this user is interested in. In this paper, we are given a distribution of users' utility functions (called probabilistic utility functions). 4

1. Introduction How to obtain the distribution? User log information Statistics from the users The distribution has been used by existing systems/models User’s recommendation systems Bayesian Learning models User’s preference elicitations The distribution on utility functions can be applied over not only a group of users but also a single user. This can help companies a lot for decision-making such as an effective promotion to a particular group of users. 5

1. Introduction Existing Approaches Top-k query Skyline query 6

1. Introduction Top-k queries and skyline queries have their own limitations. Top-k queries assume that the utility function of a single user is known. But, it is possible that the utility function of a single user is not known in some cases. Skyline queries may return too many outputs. 7

1. Introduction Top-k queries Top-k queries on certain databases with certain utility functions (ICDE12, TKDE07) Top-k queries on uncertain databases with certain utility functions (ICDE07, SIGMOD08) Top-k queries on certain databases with uncertain utility functions (our work) Other queries Skyline queries (ICDE01,SIGMOD06) K-Regret queries (VLDB10,SIGMOD12,ICDE14) Order-based skyline queries (SIGMOD10) 8

1. Introduction 9

2. Contributions We propose a novel k-hit query in the presence of the distribution of utility functions to maximize the number of users who are interested in the selection set. We propose algorithms using geometry properties for answering the k-hit query. 10

3. Problem Definition 11

3. Problem Definition 12

3. Problem Definition Different Settings: The form of a utility function Linear case Non-linear case The distribution of utility functions Uniform distribution Non-uniform distribution 13 (with geometry properties) f = 2x[1] + 3x[2] + x[3]

4. Algorithm k-Hit_Alg - Geometry Property 14

4. Algorithm k-Hit_Alg - Geometry Property We use the concept of “solid angle” used in the computational geometry to compute the hitting probability of a point. Solid angle is a measure of how large the object appears to an observer looking from that point. 15

An Example in 2-d 16

An Example in 2-d 17

4. Algorithm k-Hit_Alg - Geometry Property We have just given a formal definition of the solid angle. Next, we will present our new concept called “hitting solid angle”, one component of our problem. After that, we will give the relationship between the solid angle and the hitting solid angle. 18

4. Algorithm k-Hit_Alg - Geometry Property 19

4. Algorithm k-Hit_Alg - Geometry Property 20

4. Algorithm k-Hit_Alg - Geometry Property 21

4. Algorithm k-Hit_Alg - Geometry Property 22

4. Algorithm k-Hit_Alg - Geometry Property 23

4. Algorithm k-Hit_Alg - Geometry Property 24

5. Experiment Datasets Household dataset (6d): n = , d = 6 Household dataset (10d): n = , d = 10 NBA dataset: n = 21962, d = 5 Color dataset: n = 68040, d = 9 Stocks dataset: n = , d = 5 25

Experiments on “NBA” dataset 26 For the NBA dataset, k-hit_Alg gives the greatest hitting probability with a reasonable query time (within 1 second), using memory of a reasonable size (within 400 KB).

6. Conclusion We proposed a k-hit query, which returns a set of k tuples such that the probability that one of the k tuples is considered as the “best” tuple by a user is maximized. We proposed an algorithm called k-hit_Alg for answering the k-hit query. 27

THANK YOU! 28 ANY QUESTIONS?