Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Presentation transcript:

Overview
- Keyword Search in Graphs/Relational Databases
- r-clique Definition
- Challenges in Finding r-cliques
- Approximation Algorithm for Finding r-cliques
- Enumerating Top-k r-cliques in Polynomial Delay
- Empirical Results
- Conclusion
VLDB’11 Keyword Search in Graphs: Finding r-cliques

Keyword Search in Graphs/Relational Databases
Keyword search is a well-known mechanism for retrieving relevant information from a set of documents; Google is a familiar example. What about structured data, such as XML documents or relational databases? Current enterprise search engines over structured data require:
- knowledge of the schema
- knowledge of a query language
- knowledge of the role of the keywords
Do users have all of this knowledge? The answer is no.

Keyword Search in Graphs/Relational Databases
Users need a simple system that receives some keywords as input and returns a set of nodes that together cover all or part of the input keywords. Relational databases can be modeled as graphs:
- Tuples are the nodes of the graph.
- Foreign-key relationships are the edges, each connecting two nodes (tuples).
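The modeling step can be sketched in a few lines of Python. This is only an illustration: the rows are the ones from the Mondial excerpt used in the slides, while the node naming scheme, the helper names `build_graph` and `hops`, and the choice to treat every foreign-key edge as having length 1 are assumptions made for this sketch.

```python
from collections import defaultdict, deque

# Rows from the Mondial excerpt shown in the slides.
cities = [(22, "Toronto", "CA"), (16, "New York", "US")]
organizations = [(135, "UN", 16), (175, "EU", 81)]
countries = [("CA", "Canada"), ("US", "United States")]
memberships = [("CA", 135), ("US", 135)]


def build_graph():
    """Return (adjacency, text): one node per tuple, one edge per foreign key."""
    text = {}
    for cid, name, _ in cities:
        text[("city", cid)] = name
    for oid, name, _ in organizations:
        text[("org", oid)] = name
    for code, name in countries:
        text[("country", code)] = name
    for code, oid in memberships:
        text[("member", (code, oid))] = ""

    adj = defaultdict(set)

    def link(u, v):
        # Skip foreign keys that point outside the excerpt (e.g. city 81).
        if u in text and v in text:
            adj[u].add(v)
            adj[v].add(u)

    for cid, _, code in cities:
        link(("city", cid), ("country", code))
    for oid, _, hq in organizations:
        link(("org", oid), ("city", hq))
    for code, oid in memberships:
        link(("member", (code, oid)), ("country", code))
        link(("member", (code, oid)), ("org", oid))
    return adj, text


def hops(adj, src, dst):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return None
```

With this graph, `hops(adj, ("city", 16), ("country", "US"))` follows a single foreign key, while New York reaches Canada only through the UN tuple and a membership tuple.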

Example: Search in Relational Databases (part of the Mondial dataset)

Cities
ID    Name        Country
22    Toronto     CA
16    New York    US

Organizations
ID    Name    Head Q.
135   UN      16
175   EU      81

Memberships
Country    Org.
CA         135
US         135

Countries
Code    Name
CA      Canada
US      United States

New York is located in the United States
Keywords: “New York”, “United States”
In the Mondial tables, the Cities tuple (16, New York, US) references the Countries tuple (US, United States) through its Country foreign key.

New York hosts the UN, and Canada is a member
Keywords: “New York”, “Canada”
In the Mondial tables, the Organizations tuple (135, UN, 16) has its headquarters in city 16 (New York), and the Memberships tuple (CA, 135) links Canada to the UN.

Previous Approaches
Most existing work finds minimal connected trees that contain all or part of the input keywords; such a tree is called a Steiner tree. More recently, methods that produce sub-graphs have been proposed, which might provide more informative answers. One recent approach is the multi-center community (ICDE 2009). So, what is the problem with previous approaches?

Problems with Previous Approaches
1. Some content nodes might be far away from each other, so weak relationships among content nodes might exist. There is no guarantee on the closeness of the nodes. Since all keywords are equally important, all of them should be close to each other, and they should count equally in the ranking function.
2. While searching for answers, current methods explore both content and non-content nodes. This might lead to poor performance.

r-cliques
To solve the problems of previous approaches, we propose to find r-cliques. An r-clique is a set of content nodes that together contain all of the input keywords and in which the shortest distance between each pair of nodes is no larger than r.
Weight of an r-clique: suppose that the nodes of an r-clique are denoted as {v_1, v_2, …, v_n}. The weight of the r-clique is defined as
$$\mathrm{weight} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathrm{dist}(v_i, v_j),$$
where dist(v_i, v_j) is the shortest distance between v_i and v_j.
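The definition can be checked mechanically. The sketch below assumes that the weight is the sum of shortest distances over all unordered pairs of nodes, and that a distance function `dist` and a keyword lookup `keywords_of` are already available; the node names and distances in the toy data are hypothetical.

```python
from itertools import combinations


def covers(nodes, keywords_of, query):
    """True if the nodes together contain every query keyword."""
    found = set()
    for v in nodes:
        found |= keywords_of[v]
    return set(query) <= found


def is_r_clique(nodes, dist, r):
    """True if every pair of nodes is at shortest distance at most r."""
    return all(dist(u, v) <= r for u, v in combinations(nodes, 2))


def weight(nodes, dist):
    """Sum of shortest distances over all unordered pairs of nodes."""
    return sum(dist(u, v) for u, v in combinations(nodes, 2))


# Hypothetical toy data: three content nodes and their pairwise distances.
keywords_of = {"a": {"james"}, "b": {"john"}, "c": {"jack"}}
pair_dist = {frozenset("ab"): 2, frozenset("ac"): 3, frozenset("bc"): 4}


def dist(u, v):
    return 0 if u == v else pair_dist[frozenset((u, v))]
```

On the toy data, the set {a, b, c} covers the three keywords, is a 4-clique but not a 3-clique, and has weight 2 + 3 + 4 = 9.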

Benefits of Finding r-cliques
Finding r-cliques as the answers for keyword search in graphs does not have the problems of previous approaches:
- All of the content nodes are reasonably close to each other.
- The weight function evaluates all of the content nodes equally.
- The algorithm for finding r-cliques (discussed later) concentrates on the content nodes rather than on all of the nodes in the graph, so it is faster and more efficient.
- For presenting the relationships, the final answer has fewer irrelevant nodes than a multi-center community.

An Example
Input keywords: James, John, Jack

[Figure: two candidate answers for the example query]
First answer: r-clique weight 12, tree weight 8, community weight 8
Second answer: r-clique weight 14, tree weight 7, community weight 7

Challenges in Finding r-cliques
Problem 1: Given a distance threshold r, a graph G and a set of input keywords, find an r-clique in G whose weight is minimum.
Theorem: Problem 1 is NP-hard. This is proved in the paper by reduction from 3-satisfiability (3-SAT).
Solution: an approximation algorithm with a guaranteed ratio.
The total number of answers is exponential in the number of input keywords, so it is not efficient to generate all answers and then sort them.
Solution: enumerating answers with polynomial delay.

r-clique Problem: NP-Hard
Decision version; reduction from the 3-SAT problem with n words and m clauses.
Step 1: construct nodes: 2*n
Step 2: construct keywords: n + m
Step 3: construct weights: 2*w, w/(n+m, 2)
Step 4: =>, <=
Refer to the paper.

Branch and Bound

What We Need
- Producing r-cliques in ranked order: r-cliques with lower weights should be presented before ones with higher weights.
- Producing top-k r-cliques efficiently with a bound on the approximation ratio: each r-clique must be generated in polynomial time, and there must be a bound on the quality of a generated r-clique. The weight of a generated r-clique should be within some factor of the current optimal solution.
- Generating all the r-cliques if needed: no r-clique should be missed.

Heuristic and Approximate Order
Heuristic order: the answer is expected to be close to the optimal answer, but there is no guarantee.
Approximate order: the answer is close to the optimal answer with a provable guarantee. This is the desired choice.

Enumerating in Approximate Order
Lawler’s technique is used for finding the top-k answers. In each iteration, the next r-clique is generated by finding the top answer under constraints. Two problems must be solved:
1. What are the constraints?
2. How can the top answer be found efficiently under the constraints?

Overview of the System
1. Input: the keywords and the value of k.
2. Find the best answer with no constraints.
3. Insert the best r-clique, together with its search space, into a priority queue.
4. Fetch the best r-clique from the priority queue and print it.
5. Divide the search space of that top answer into sub-spaces.
6. Find the best r-clique in each sub-space under its associated constraints.
7. Insert each answer, together with its search space, into the priority queue.
8. If the top-k answers have been printed or the priority queue is empty, terminate; otherwise go back to step 4.
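The loop in this overview can be sketched with a binary heap as the priority queue. Several parts are assumptions made for illustration rather than the paper's GenerateAnswers procedure: the sub-spaces follow the standard Lawler partition (fix a prefix of the previous answer, exclude its next node), `best_in_space` is an exact brute-force search rather than the approximation algorithm, and the nodes, positions and distance function are hypothetical toy data.

```python
import heapq
from itertools import combinations, product

# Hypothetical toy data: nodes on a line, distance = absolute difference.
pos = {"a1": 0, "a2": 7, "b1": 1, "b2": 4, "c1": 2}
candidates = [{"a1", "a2"}, {"b1", "b2"}, {"c1"}]  # one set C_i per keyword


def dist(u, v):
    return abs(pos[u] - pos[v])


def weight(nodes):
    return sum(dist(u, v) for u, v in combinations(nodes, 2))


def best_in_space(space, r):
    """Brute force: the minimum-weight r-clique with one node per set, or None."""
    best = None
    for combo in product(*(sorted(c) for c in space)):
        if all(dist(u, v) <= r for u, v in combinations(combo, 2)):
            w = weight(combo)
            if best is None or w < best[0]:
                best = (w, combo)
    return best


def top_k(space, r, k):
    """Return up to k (weight, answer) pairs in increasing weight order."""
    heap, tie = [], 0
    first = best_in_space(space, r)
    if first is not None:
        heapq.heappush(heap, (first[0], tie, first[1], space))
    results = []
    while heap and len(results) < k:
        w, _, answer, sp = heapq.heappop(heap)
        results.append((w, answer))
        # Sub-space i keeps answer[0..i-1] fixed and forbids answer[i].
        for i in range(len(sp)):
            sub = [{answer[j]} for j in range(i)]
            sub.append(sp[i] - {answer[i]})
            sub.extend(sp[i + 1:])
            found = best_in_space(sub, r)
            if found is not None:
                tie += 1
                heapq.heappush(heap, (found[0], tie, found[1], sub))
    return results
```

The sub-spaces are pairwise disjoint and together cover everything except the answer just printed, which is why no answer is produced twice or missed. The integer `tie` only keeps heap entries comparable when two weights are equal.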

GenerateAnswers Algorithm
Refer to the paper.

Constraints and Search Space
Let’s do it using an example. Suppose that the input keywords are {k_1, k_2, k_3, k_4}, and let C_i be the set of nodes that contain keyword k_i. The whole search space, which contains the best r-clique, can be represented as C_1 × C_2 × C_3 × C_4. Assume that the best r-clique is (v_1, v_2, v_3, v_4), where v_i is a node containing keyword k_i.

FindTopRankedAnswer

Prove Theorem 2 & 3
Theorem 2 is obvious. Theorem 3:

Properties of the Approximation Algorithm
- Only content nodes are searched when finding the best answer in the search space.
- The approximation ratio of the algorithm is 2: the weight of the answer is at most twice the weight of the optimal answer.
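One common way to obtain a factor-2 bound for a sum-of-pairwise-distances objective is a star-based greedy search: try every content node as a center, attach to it the nearest node for each other keyword within distance r, and keep the star with the smallest total. The sketch below illustrates that idea only; whether it matches the paper's FindTopRankedAnswer in detail is an assumption, and the node names and positions are hypothetical. It does touch only content nodes, as the slide describes.

```python
from itertools import combinations

# Hypothetical toy data: nodes on a line, distance = absolute difference.
pos = {"a1": 0, "a2": 7, "b1": 1, "b2": 4, "c1": 2}
candidates = [{"a1", "a2"}, {"b1", "b2"}, {"c1"}]  # one set C_i per keyword


def dist(u, v):
    return abs(pos[u] - pos[v])


def weight(nodes):
    return sum(dist(u, v) for u, v in combinations(nodes, 2))


def greedy_answer(space, r):
    """Return the best star found (one node per keyword), or None."""
    best = None
    for i, c_i in enumerate(space):
        for center in sorted(c_i):
            total, picked = 0, []
            for j, c_j in enumerate(space):
                if j == i:
                    picked.append(center)
                    continue
                near = [(dist(center, v), v) for v in sorted(c_j)
                        if dist(center, v) <= r]
                if not near:
                    break  # this center cannot reach keyword j within r
                d, v = min(near)
                total += d
                picked.append(v)
            else:
                # Reached only when every keyword was attached to the center.
                if best is None or total < best[0]:
                    best = (total, tuple(picked))
    return None if best is None else best[1]
```

Every chosen node is within r of the center, so by the triangle inequality any two chosen nodes are within 2r of each other; a final check against r may still be needed if a strict r-clique is required. On the toy data the greedy answer coincides with the optimal one.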

Presenting r-cliques to the User
To show the relationships between the nodes in an r-clique, a Steiner tree is found and presented to the user.
Keywords (in the DBLP dataset): “Parallel”, “Algorithm”, “Optimization”, “Graph”
[Figure: the r-clique answer and the community answer for this query. Node labels: “Distributed Parallel Algorithm For Nonlinear Optimization Without Derivatives”, “A Binding Number Computation of Graph”, “A New Non-interior Continuation Method for Second-Order Cone Programming”, Xuping Zhang, Congying Han, Guoping He. Edge labels: w. Part of the community answer is marked as irrelevant.]

Experimental Results
The r-clique method is compared with the multi-center community method, referred to as com-k. Our approximation algorithm is referred to as poly-delay-k. Two datasets are used: DBLP and IMDb. The sets of input keywords and the parameters are the same as in the community paper.

Running Time
[Figure: running time on the DBLP dataset and on the IMDb dataset]

Quality of the Answers
[Figure: quality of the answers on the DBLP dataset]

Search Accuracy from a User Study (DBLP Dataset)
Top-k precision: the percentage of the top-k answers that are relevant to the query. The users are asked to evaluate the answers using two methods. In the first method, scores between 0 and 1 are assigned to the nodes of an answer, and their average is used as the precision. In the second method, the whole answer is evaluated and a single score is assigned to it. The results of the two methods are similar.
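The two scoring methods can be written down directly. This is a minimal sketch under stated assumptions: scores lie between 0 and 1, an answer's node-based score is the plain mean of its node scores, and precision over the top-k answers is the mean answer score expressed as a percentage. The function names and the sample scores are hypothetical and are not data from the study.

```python
def answer_score_from_nodes(node_scores):
    """First method: average the per-node scores of one answer."""
    return sum(node_scores) / len(node_scores)


def top_k_precision(answer_scores, k):
    """Percentage of relevance among the first k answers."""
    top = answer_scores[:k]
    if not top:
        return 0.0
    return 100.0 * sum(top) / len(top)


# Hypothetical ratings for four ranked answers.
node_level = [[1, 1], [1, 0.5], [0, 0], [1, 1]]   # first method: per-node scores
whole_answer = [1, 1, 0, 1]                        # second method: one score each

precision_nodes = top_k_precision(
    [answer_score_from_nodes(s) for s in node_level], 4)
precision_whole = top_k_precision(whole_answer, 4)
```

With binary whole-answer scores the second method reduces to the usual fraction of relevant answers among the top k.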

Conclusion
- A novel and efficient approach for keyword search in graphs has been proposed.
- All of the content nodes are reasonably close to each other.
- An approximation algorithm with a bounded guarantee has been proposed.
- Only content nodes are explored during the search process.
- A Steiner tree with as few middle nodes as possible is generated to reveal the relations among content nodes.

Thank you! Any Questions?