Boolean + Ranking: Querying a Database by K-Constrained Optimization Joint work with: Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi.

Presentation transcript:

Boolean + Ranking: Querying a Database by K-Constrained Optimization
Joint work with: Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang
Presented by: Rashmi Pagadala, Swetta Bhaskar

Introduction
K-constrained optimization query: Q = (B, O, k)
- B: qualifying constraint
- O: quantifying constraint
- k: number of tuples

Many queries naturally combine Boolean and ranking
- Traditional databases: Boolean query, e.g. dept = CS and year = 2
- Information retrieval: ranking query, e.g. top 5 ranked by gpa
- Database applications on the Web: Boolean + ranking
Qualifying constraint B: dept = CS and year = 2
Quantifying function R: gpa
Find the top answers

Boolean + Ranking form a coherent goal function
Boolean B + Ranking R = Goal function G
For a tuple t:
G(t) = B(t) * R(t)
     = R(t)  if B(t) is true
     = 0     if B(t) is false (i.e., the lowest score)
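A minimal sketch of the goal function above, written as a brute-force scan rather than the search the talk goes on to develop. The helper names `goal` and `top_k` and the sample `students` rows are hypothetical; the Boolean and ranking parts reuse the slide's dept/year/gpa example. It assumes R is non-negative, so that 0 really is the lowest score.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def goal(boolean: Callable[[Row], bool],
         ranking: Callable[[Row], float]) -> Callable[[Row], float]:
    """Combine B and R into G(t) = B(t) * R(t)."""
    def g(t: Row) -> float:
        # A tuple failing B gets 0, assumed to be the lowest possible score.
        return ranking(t) if boolean(t) else 0.0
    return g


def top_k(rows: List[Row], g: Callable[[Row], float], k: int) -> List[Row]:
    """Brute force: score every tuple with G and keep the k best."""
    return sorted(rows, key=g, reverse=True)[:k]


# Hypothetical sample data for illustration only.
students = [
    {"name": "ann", "dept": "CS", "year": 2, "gpa": 3.9},
    {"name": "bob", "dept": "CS", "year": 3, "gpa": 4.0},
    {"name": "cy", "dept": "EE", "year": 2, "gpa": 3.8},
    {"name": "dee", "dept": "CS", "year": 2, "gpa": 3.5},
]

G = goal(lambda t: t["dept"] == "CS" and t["year"] == 2,
         lambda t: t["gpa"])
best = top_k(students, G, 2)
```

Note that if fewer than k tuples satisfy B, this scan pads the answer with zero-score tuples; a real evaluator would filter those out.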

Motivating scenarios
Data retrieval: find houses in a certain price range with a good price/sqft ratio
  Select h.address from House h
  Where h.price ≤ 200k ∨ h.price ≥ 400k
  Order by h.size/|h.price-300k|
  Limit 1

  Select h.address from House h, CrimeRate c
  Where (h.price ≤ 200k ∨ h.price ≥ 400k) and h.zipcode = c.zipcode
  Order by h.size/|h.price-300k| * c.crimerate^-1
  Limit 10
Data analysis: find products with the highest sale increase in consecutive years
  Select itemid from Sales s1, Sales s2
  Where s1.itemid = s2.itemid and s2.year - s1.year = 1
  Order by s2.sale - s1.sale
  Limit 10

Current techniques lack a global search mechanism
If evaluated as separate operators (Boolean query B and ranking query R over D):
- Current techniques optimize only condition by condition
If searched by an overall goal function G used as a ranking function over D:
- Current techniques restrict G to be monotonic

The nature of Boolean + Ranking is a K-constrained optimization query
Optimize goal function G over database D
Goal function G: h.size/|h.price-300k| [h.price ≤ 200k ∨ h.price ≥ 400k]
[Table: database D with columns Addr, Zip, Price, Size; sample rows include Oak Park, Chicago and Mattis, Champaign]

Our goal: evaluate the query as its nature suggests!
Optimize G over D
- Function optimization of G
- Discrete state search over D
OPT*

Query mechanism
- Discrete state search: search over a discrete set of index nodes to find the satisfying data tuples
- Continuous function optimization: optimize the goal function G over the domain of a database

Challenge 1: What is the search mechanism?

We encode as A* because it is optimal
What A* is: finding the shortest path from an origin to a destination
Why we choose it: completeness and optimality with proper heuristics
- Complete: guaranteed to find the shortest path
- Optimal: visits the least number of nodes

To do:
- Discrete state search perspective (indices)
- Shortest path problem (encoding)

Index-Induced State Space: A*-Driven Construction
K-constrained optimization over the state space induced by indices: states and routes
Index I_i over relation D_i: I_i = (V, E), where V = R ∪ T, with Dom(n_i) for n_i ∈ R
The reachable nodes depend on the type of n
The vertices and edges of an index I are referred to as I.V and I.E

We view compound index as discrete space
[Figure: an index on size with nodes a1, a2, a3, a6, a7, … and an index on price (k) with nodes b1, b2, b3, b6, b7, …]
M_ij = (a_i, b_j), e.g. M11, M22, M23, M32, M33, M55, M56, M66, M67, M76, M77, …
Conceptually, a combined space
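As a rough illustration of the combined space M_ij = (a_i, b_j), the sketch below pairs every node of one index with every node of another. The interval values, the dictionaries `size_index` and `price_index`, and the helper `compose` are all hypothetical; the slides give only the node labels.

```python
from itertools import product
from typing import Dict, Tuple

Interval = Tuple[int, int]

# Hypothetical index nodes: each node covers an interval of its attribute.
size_index: Dict[str, Interval] = {
    "a1": (0, 4000), "a2": (0, 2000), "a3": (2000, 4000),
}
price_index: Dict[str, Interval] = {
    "b1": (0, 600), "b2": (0, 300), "b3": (300, 600),
}


def compose(first: Dict[str, Interval],
            second: Dict[str, Interval]) -> Dict[str, Tuple[Interval, Interval]]:
    """Build composite states M_ij = (a_i, b_j) as a Cartesian product."""
    return {
        f"M{a[1:]}{b[1:]}": (first[a], second[b])
        for a, b in product(first, second)
    }


space = compose(size_index, price_index)
```

Each composite state here is simply a rectangle in (size, price) space, which is what lets a function optimizer bound G over it later.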

Mapping the space: states and transitions
States: index graph, composite graph
The composite graph is categorized as Regions ∪ Tuples
- Region state: internal state or leaf state
- Tuple state
Transitions:
- Internal state: branch in (top down)
- Leaf state: branch out (bottom up)
Cartesian product + intersection of nodes

Example
Consider the set of states constructed from the set of index graphs I, together with the transitions among those states, and suppose the goal is to reach tuple 1. The search should, in principle, follow those transitions to look for the tuple states that maximize the goal function. For instance, suppose we start from the root of the graph, M11; this is essentially a top-down search strategy, and the search may follow the path M11 → M33 → M77 → 1 to reach the target tuple state. Alternatively, in a bottom-up search starting from M67, the search may follow the route M67 → M77 → 1.

Encoding our problem into shortest path is challenging
K-constrained optimization: find a tuple with maximal score
Shortest path: find a path with minimal distance
How to encode:
- a tuple → a path?
- score of tuple → distance of path?

Therefore, we encode K-constrained optimization as:
How to encode a tuple as a path?
- Add a virtual target t* reachable only through tuples
How to encode the maximal tuple as the minimal path?
- The quality of a path depends solely on the tuple it passes through
For a tuple state t: D(t, t*) = -G(t)
For two states r, u: D(r, u) = 0
[Figure: state graph in which edges into t* carry negated tuple scores such as -G(1), and all other edges carry 0]
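The encoding can be checked on a toy graph. This sketch assumes a hand-made set of links and tuple scores (all hypothetical) and uses a plain Bellman-Ford relaxation, chosen only because it tolerates the negative edge weights; it is not the search strategy of the talk. The shortest path to t* should pass through the tuple with the highest G.

```python
from typing import Dict, List, Tuple

TARGET = "t*"  # virtual target, reachable only through tuple states
Edge = Tuple[str, str, float]


def build_edges(links: List[Tuple[str, str]],
                scores: Dict[str, float]) -> List[Edge]:
    """D(r, u) = 0 between states; D(t, t*) = -G(t) for tuple states."""
    edges: List[Edge] = [(u, v, 0.0) for u, v in links]
    edges += [(t, TARGET, -g) for t, g in scores.items()]
    return edges


def shortest_path(edges: List[Edge], source: str) -> Tuple[float, List[str]]:
    """Bellman-Ford relaxation; returns (distance, path) to the target."""
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dist = {n: float("inf") for n in nodes}
    prev: Dict[str, str] = {}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
    if dist[TARGET] == float("inf"):
        return dist[TARGET], []
    path = [TARGET]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[TARGET], path[::-1]


# Hypothetical links and goal scores for two tuples.
links = [("M11", "M33"), ("M33", "M77"), ("M33", "M76"),
         ("M77", "t1"), ("M76", "t4")]
scores = {"t1": 5.0, "t4": 2.0}

distance, path = shortest_path(build_edges(links, scores), "M11")
```

The minimal distance is the negated maximal score, so reading off the node just before t* recovers the best tuple.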

Challenge 2: How to guide the search?

We use function optimization to sketch the landscape of G
Function optimization measures the quality of states: it returns local optima O and an upper-bound score U
Function optimization answers:
1. How to define heuristics?
2. How to configure the space?
3. Where to start the search?

1. Define admissible heuristics: measure the tightest upper bound
H(region) = OPTMAX(G, region), i.e., the maximal value of G in the region
To guarantee completeness:
- A* requires admissible heuristics, i.e., ones that estimate optimistically
To ensure admissible heuristics:
- Function optimization gives the tightest upper bound
  - Analytical approaches
  - Numeric analysis package
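A small sketch of an analytical upper bound, assuming the linear goal 0.4*a + 0.6*b over rectangular regions. With non-negative weights the maximum over a box sits at its upper corner, so H can be computed in closed form. The regions, sample points, and function names `g`, `h`, and `contains` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Region = Tuple[Interval, Interval]  # ((a_lo, a_hi), (b_lo, b_hi))
Point = Tuple[float, float]


def g(a: float, b: float) -> float:
    """Assumed goal function: a linear average with non-negative weights."""
    return 0.4 * a + 0.6 * b


def h(region: Region) -> float:
    """H(region) = max of G over the box, attained at the upper corner."""
    (_, a_hi), (_, b_hi) = region
    return g(a_hi, b_hi)


def contains(region: Region, point: Point) -> bool:
    (a_lo, a_hi), (b_lo, b_hi) = region
    a, b = point
    return a_lo <= a <= a_hi and b_lo <= b <= b_hi


# Hypothetical parent region, one child region, and a few tuples.
parent: Region = ((0.0, 10.0), (0.0, 8.0))
child: Region = ((0.0, 5.0), (4.0, 8.0))
points: List[Point] = [(1.0, 7.5), (4.0, 4.0), (9.0, 2.0), (5.0, 8.0)]

# Admissible: the bound never underestimates any tuple inside the region.
admissible = all(g(a, b) <= h(r)
                 for r in (parent, child)
                 for (a, b) in points if contains(r, (a, b)))
# Descending: a sub-region's bound does not exceed its parent's.
descending = h(child) <= h(parent)
```

For goal functions without such a closed form, the slide's alternative is a numeric optimizer, which this sketch does not attempt.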

2. Configure descending space: disconnect uphills
To guarantee optimality:
- A* requires descending heuristics
To ensure descending heuristics:
- Remove uphill links
[Figure: state graph M11, M22, …, M77 with uphill links removed]

3. Find the right start point: start from local optima
To guarantee correctness:
- Every tuple state must be reachable from the start states
- Taking only downhill links requires starting at high points
To ensure reachability:
- Initial states should contain all local optima
[Figure: state graph M11, M22, …, M77 with the local optima marked as start states]
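The two configuration steps (drop uphill links, then start from local optima) can be sketched together. The neighbor graph and heuristic values below are hypothetical, and `configure` and `reachable` are illustrative helpers, not names from the talk. A state counts as a local optimum here when no neighbor has a strictly higher bound.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def configure(neighbors: Dict[str, List[str]],
              h: Dict[str, float]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Keep only downhill links and pick local optima as start states."""
    downhill = {s: [n for n in ns if h[n] <= h[s]]
                for s, ns in neighbors.items()}
    starts = [s for s, ns in neighbors.items()
              if all(h[n] <= h[s] for n in ns)]
    return downhill, starts


def reachable(downhill: Dict[str, List[str]], starts: List[str]) -> Set[str]:
    """Breadth-first walk along downhill links from the start states."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        s = queue.popleft()
        for n in downhill[s]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


# Hypothetical symmetric neighbor graph and upper-bound scores.
neighbors = {
    "M55": ["M56"],
    "M56": ["M55", "M66"],
    "M66": ["M56", "M67", "M77"],
    "M67": ["M66", "M77"],
    "M77": ["M67", "M66"],
}
h = {"M55": 9.0, "M56": 7.0, "M66": 6.0, "M67": 8.0, "M77": 5.0}

downhill, starts = configure(neighbors, h)
covered = reachable(downhill, starts)
```

In this toy graph there are two peaks, so starting from only one of them would leave part of the space unreachable once uphill links are gone.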

OPT* SEARCH ALGORITHM

Putting together: executing OPT* on the configured space
Search is implemented as a priority-queue-driven traversal
[Figure: top-down traversal of the state graph M11, M22, …, M77]

Putting together: executing OPT* on the configured space
Bottom-up approach is always better than top-down
[Figure: top-down and bottom-up traversals of the same state graph, side by side]
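A priority-queue-driven traversal of this kind can be sketched as a generic best-first search. This is not the authors' OPT* implementation, only an illustration in its spirit: it assumes every state carries an admissible, descending upper bound, that a tuple state's bound equals its goal score, and that tuple states are exactly the states with no outgoing links. The graph, the bounds, and the name `best_first_top_k` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def best_first_top_k(starts: List[str],
                     children: Dict[str, List[str]],
                     bound: Dict[str, float],
                     k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Pop the state with the highest bound until k tuples are found.

    Returns the found tuples with their scores and the number of
    states popped from the queue.
    """
    # heapq is a min-heap, so bounds are negated to pop the largest first.
    heap = [(-bound[s], s) for s in starts]
    heapq.heapify(heap)
    seen = set(starts)
    results: List[Tuple[str, float]] = []
    visited = 0
    while heap and len(results) < k:
        neg, state = heapq.heappop(heap)
        visited += 1
        if state not in children:  # tuple state: nothing left to expand
            results.append((state, -neg))
            continue
        for child in children[state]:
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-bound[child], child))
    return results, visited


# Hypothetical top-down state graph and upper bounds.
children = {
    "M11": ["M33", "M32"],
    "M33": ["M77", "M76"],
    "M32": ["t3"],
    "M77": ["t1"],
    "M76": ["t4"],
}
bound = {"M11": 9.0, "M33": 9.0, "M32": 4.0, "M77": 9.0, "M76": 6.0,
         "t1": 9.0, "t4": 6.0, "t3": 4.0}

results, visited = best_first_top_k(["M11"], children, bound, 2)
```

Because a tuple is emitted only when its score is at least every bound still in the queue, the low-bound region M32 is never expanded in this example.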

Experiments
Comparison vs.:
- Boolean then ranking
- Ranking then Boolean
Metric: nodes accessed = N_l + N_t
Settings:
- Benchmark queries over a real dataset
- Controlled queries over synthetic datasets

Benchmark queries
Dataset:
- 19,706 real estate listings crawled online
Queries:
- Q1: size * bedrms / |price-450k| : [40k<=price<=50k]
- Q2: size * e^bedrms / |price-350k| : [price 4000]
- Q3: size/price : [bedrms=3 ∨ bedrms=4]
[Figure: BR_unclustered, BR_clustered, and OPT* compared on Q1, Q2, Q3]

Controlled queries
Datasets:
- Three randomly generated datasets of 100k points: uniform, Gaussian, log-normal
Queries:
- Linear average queries (e.g., 0.4*a + 0.6*b)
- Nearest neighbor queries (e.g., (x-3)^2 + (y-4)^2)
- Join queries (0.4*R.a + 0.6*S.b : R.c=R.d)

Conclusion
- Problem: study K-constrained optimization queries as Boolean + ranking
- Abstraction: encode K-constrained optimization into a shortest path problem
- Framework: develop OPT* to process K-constrained optimization

References:
- Boolean + Ranking: Querying a Database by K-Constrained Optimization. Joint work with: Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang
- W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Cambridge University Press, 2nd Edition
- www-forward.cs.uiuc.edu/talks/2006/asopt-sigmod06-zzhang-jun06

THANK YOU ! Questions?