Exploiting Relevance Feedback in Knowledge Graph Search

Slides:

Advertisements

Similar presentations

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,

Advertisements

Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.

Optimizing search engines using clickthrough data

Extracting Semantic Relationships Between Wikipedia Articles Lowell Shayn Hawthorne Suzette Stoutenburg Supervisor: Jugal Kalita University of Colorado.

Optimal Design Laboratory | University of Michigan, Ann Arbor 2011 Design Preference Elicitation Using Efficient Global Optimization Yi Ren Panos Y. Papalambros.

GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.

 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. 1 The Architecture of a Large-Scale Web Search and Query Engine.

Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,

Navigating the Intranet with High Precision Huaiyu Zhu Alexander L¨oser Sriram Raghavan Shivakumar Vaithyanathan.

6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.

Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.

Disambiguation Algorithm for People Search on the Web Dmitri V. Kalashnikov, Sharad Mehrotra, Zhaoqi Chen, Rabia Nuray-Turan, Naveen Ashish For questions.

J. Chen, O. R. Zaiane and R. Goebel An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities.

Query Biased Snippet Generation in XML Search Yi Chen Yu Huang, Ziyang Liu, Yi Chen Arizona State University.

Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dörre, Peter Gerstl, and Roland Seiffert Presented By: Jake Happs,

Query-Based Outlier Detection in Heterogeneous Information Networks Jonathan Kuck 1, Honglei Zhuang 1, Xifeng Yan 2, Hasan Cam 3, Jiawei Han 1 1 University.

Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.

Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.

Overview of Search Engines

Finding Advertising Keywords on Web Pages Scott Wen-tau YihJoshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University.

Information Retrieval in Practice

Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.

Amarnath Gupta Univ. of California San Diego. An Abstract Question There is no concrete answer …but …

C OLLECTIVE ANNOTATION OF WIKIPEDIA ENTITIES IN WEB TEXT - Presented by Avinash S Bharadwaj ( )

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

A Two Tier Framework for Context-Aware Service Organization & Discovery Wei Zhang 1, Jian Su 2, Bin Chen 2,WentingWang 2, Zhiqiang Toh 2, Yanchuan Sim.

Using Hyperlink structure information for web search.

A Survey for Interspeech Xavier Anguera Information Retrieval-based Dynamic TimeWarping.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

Querying Structured Text in an XML Database By Xuemei Luo.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Query Suggestions Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.

Similar Document Search and Recommendation Vidhya Govindaraju, Krishnan Ramanathan HP Labs, Bangalore, India JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE.

Universit at Dortmund, LS VIII

Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.

Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.

LATENT SEMANTIC INDEXING Hande Zırtıloğlu Levent Altunyurt.

LOD for the Rest of Us Tim Finin, Anupam Joshi, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 15 March 2012

Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.

Question Answering over Implicitly Structured Web Content

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.

1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)

Linking Organizational Social Networking Profiles PROJECT ID: H JEROME CHENG ZHI KAI (A H ) 1.

Page 1 PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi.

Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.

DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.

Post-Ranking query suggestion by diversifying search Chao Wang.

Support Vector Machines and Kernel Methods for Co-Reference Resolution 2007 Summer Workshop on Human Language Technology Center for Language and Speech.

LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.

ASSOCIATIVE BROWSING Evaluating 1 Jinyoung Kim / W. Bruce Croft / David Smith for Personal Information.

Unsupervised Relation Detection using Automatic Alignment of Query Patterns extracted from Knowledge Graphs and Query Click Logs Panupong PasupatDilek.

KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.

Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:

Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011

Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.

A Fast Kernel for Attributed Graphs Yu Su University of California at Santa Barbara with Fangqiu Han, Richard E. Harang, and Xifeng Yan.

DOWeR Detecting Outliers in Web Service Requests Master’s Presentation of Christian Blass.

Neighborhood - based Tag Prediction

Associative Query Answering via Query Feature Similarity

Web IR: Recent Trends; Future of Web Search

Disambiguation Algorithm for People Search on the Web

Table Cell Search for Question Answering Huan Sun

ISWC 2013 Entity Recommendations in Web Search

Combining Keyword and Semantic Search for Best Effort Information Retrieval Andrew Zitzelberger 1.

CS246: Information Retrieval

Topic: Semantic Text Mining

Presentation transcript:

Exploiting Relevance Feedback in Knowledge Graph Search Xifeng Yan University of California at Santa Barbara with Yu Su, Shengqi Yang, Huan Sun, Mudhakar Srivatsa, Sue Kase, Michelle Vanni Thanks much for the introduction…Good morning, everyone … Facility: Impacts (healthcare, intelligent policing, patent, bing search), technical merits (challenges: knowledge base, deep learning) + Benefit: 1. project 2. Apple siri: problem failed: one laughable question 1.

Transformation in Information Search 4/25/2017 Transformation in Information Search Desktop search Mobile search Hi, what can I help you with? “Which hotel has a roller coaster in Las Vegas?” Read Lengthy Documents? Direct Answers Desired! The way people search information is currently undergoing a transformation from traditional desktop search, where people type keywords in search engines, to mobile search, where people can directly speak to the smart phones and get the information they want. Such mobile search brings much more convenience to our life. Think about this scenario: In this coming holiday season, suppose you are traveling with your little kids to Las Vegas, little kids suddenly came up with the idea to ride a roller coaster when you are driving. But you need the information about which hotel has a roller coaster and how to go there. In this case, you can turn to your smart phone for help. Instead of typing keywords and reading webpages on the small screen, you can directly speak to the personal assistants such as Siri or google now on the phone, and get the exact answer. New York-New York hotel Answer:

Strings to Things Since text is the main carrier of knowledge, it is natural to extract knowledge from text. For instance, we can construct an structured knowledge graph from the large volume of text data, where nodes are entities and edges encode relationships among entities. Such formal and structural representation of knowledge has the advantage of being easy to manage and reason with, which can greatly facilitate many AI applications. Despite the big attraction, automatically extracting knowledge from text is quite challenging. For computer system, texts are just strings, or combinations of letters. It has no idea what they really mean. In order to distill knowledge from text, we have to move from strings to things.

Knowledge Graphs tagged listen watch follow Video Photo music Business Univeristy join Yellowstone NP Photo: bison Mammal Football Team class City Country

Broad Applications Customer Service Healthcare Business Intelligence RoboBrain [Saxena et al., Cornell & Stanford] Customer Service Healthcare Robotics Business Intelligence Enterprise search Query resolution, or more broadly, knowledge dicovery, has a variety of application scenarios and a great prospect. For example, in customer service, specialized agents need to be found to efficiently resolve queries from customers. Similarly important in healthcare and BI domains. By mining massive EHR, scientific reports, etc., a doctor-assistant system can help doctors to find individualized treatment options; in other businesses, decision-support systems can help humans make critical decisions by analyzing massive enterprise data. IBM cognitive systems Watson is one of the pioneering systems in these areas. (IBM Watson for Oncology is doing this with a famous cancer center Memorial Sloan Kettering center…) ; (other decision making such as product positioning, pricing …) Big companies : IBM watson (health care, travel and insurance services), Gartner, HG data, ca Query resolution and knowledge discovery assist humans to acquire knowledge and make decisions. In fact, Not only humans need such systems, but also More interestingly, not only humans could query, but also robots. Robots need them. Currently researchers in Cornell and Stanford are building a query engine for robots called Robobrain. Given a task, a robot can query robobrain to get the knowledge that In this way, robots can execute commands in a more active and autonomous manner in the future, by querying the information they need. So, in this exciting area, what have I been doing ? effectively mine (cause and treatment) clues for a certain disease and predict the potential outcome of a certain treatment from massive scientific reports / records. Business intelligence, some real example. Last, not only humans have queries to resolve, but also for robots, researchers from Cornell and Stanford. Intelligent Policing

Certainly, You Do Not Want to Write This! 4/25/2017 Certainly, You Do Not Want to Write This! “find all patients diagnosed with eye tumor” Another option is to use graph query languages, such as SPARQL queries. They can specify a result in a very exact way, with strictly defined syntax and grammar. But they become so complex that it is very difficult for people to learn how to use them. What I’m showing here is a SPARQL query to search NCIthesaurus, a network of medicine terminologies. To accurately find patients with eye tumor, a doctor has to write such a query specifying that eye tumor is actually related to A set of disease via a relationship named ‘synonym. Can you see the problem here? How can we assumes that a user already know what a data graph looks like or what the answer exactly looks like when he tries to look for it? Unfortunately this dilemma seems to be ignored by most graph query languages! “Semantic queries by example”, Lipyeow Lim et al., EDBT 2014

Search Knowledge Graphs Structured search: Exact schema item + structure Precise and expressive Information overload: too complex schema Keyword search: Free keywords, no structure User-friendly Low expressiveness

Natural language query Graph Query Geoffrey Hinton (1947-) Prof., 70 yrs. Find professors at age of 70, who works at Toronto and joined Google recently. Univ. of Toronto Toronto Google DNNResearch Google Natural language query Graph query A result (match)

Geoffrey Hinton (Professor, 1947) Schema-less Graph Querying (SLQ, VLDB 2014) Users freely post queries, without any knowledge on data graphs. SLQ finds results through a set of transformations. Query A Match Prof., ~70 yrs Google UT Geoffrey Hinton (Professor, 1947) University of Toronto DNNresearch Google Our recent work on schema-less graph querying (SLQ for short) aims at alleviating such challenges. Under our proposed querying framework, The application scenario is that users can freely post queries, without possessing any knowledge of the underlying data. The methodology for our querying framework is to find the matches of a user query based on a set of transformations. Given a query, by the following transformations, we can find a match in the knowledge graph to this query. Acronym transformation: ‘UT’  ‘University of Toronto’ Abbreviation transformation: ‘Prof.’ ‘Professor’ Numeric transformation: ‘~70’  ‘1947’ Structural transformation: an edge  a path 9

Geoffrey Hinton (Professor, 1947) 4/25/2017 Evaluate a Candidate Match: Ranking Function Query: Candidate Match: Prof., ~70 yrs Google UT Geoffrey Hinton (Professor, 1947) University of Toronto DNNresearch Google ? Features Node matching features: Edge matching features: Matching Score Under the set of transformations, there could be many matches for a given query. Which transformations are more important or can induce more relevant matches? We do not know at the very beginning. We learn the weights or importance of different transformations by designing a ranking function and rank relevant matches as high as possible. In our work, we resort to a well-studied conditional random fields (CRFs) model to model graph querying. Conditional random fields model is not the only choice here. We select it because it fits our problem setting and is well studied in terms of parameter learning and inference. In our formulation, finding a good match to a user’s query, is treated as finding the most likely configuration of latent variables in CRFs, given the observed variables. Given a graph query Q and one possible match phi(Q) from the knowledge graph, we evaluate the relevance of phi(Q) , based on the conditional probability formulated under the CRFs framework. Based on the training set where we have the relevant and irrelevant matches, we can learn the parameters in the model. 10

Query-specific Ranking via Relevance Feedback Generic ranking: sub-optimal for specific queries By “Washington”, user A means Washington D.C., while user B might mean University of Washington Query-specific ranking: tailored for each query But need additional query-specific information for further disambiguation Where to get? From users! Relevance Feedback: Users indicate the (ir)relevance of a handful of answers

Problem Definition Graph Relevance Feedback (GRF): Generate a query-specific ranking function for based on and

Query-specific Tuning The represents (query-independent) feature weights. However, each query carries its own view of feature importance Find query-specific that better aligned with the query using user feedback User Feedback Regularization

Type Inference Infer the implicit type of each query node The types of the positive entities constitute a composite type for each query node Positive Feedback Candidate Nodes Query

Context Inference Entity context: neighborhood of the entity The contexts of the positive entities constitute a composite context for each query node Query Positive Entities Candidates

The Next Action with the New Ranking Function Re-searching Re-ranking How Search the data graph again Re-rank the initial answer list Pros Find new relevant answers, probably higher accuracy Save time Cons Time-consuming May lose some good answers It’s a query-dependent decision Many underlying factors may affect this decision Lead to a trade-off between answer quality and runtime

A Predictive Solution Build a binary classifier to predict the optimal action for each query Key: Training set construction Feature extraction Query, match and feedback features Convert each query into a 18-dimensional feature vector Label assignment Assign a label to each query in the training set specifies the preference between answer quality and runtime.

Experiment Setup Base graph query engine: SLQ (Yang et at., 2014) Knowledge graph: DBpedia (4.6M nodes, 100M edges) Graph query sets: WIKI (50) and YAGO (100) Graph Query WIKI Wikipedia List Page Structured Information need Links between Wikipedia and DBpedia Ground Truth … …

Experiment Setup Base graph query engine: SLQ (Yang et at., 2014) Knowledge graph: DBpedia (4.6M nodes, 100M edges) Graph query sets: WIKI (50) and YAGO (100) Graph Query YAGO YAGO Class Naval Battles of World War II Involving the United States Structured Information need Links between YAGO and DBpedia Instances Ground Truth Battle of Midway Battle of the Caribbean … … … …

Exp 1. Overall performance of different GRF variants (a) WIKI (b) YAGO Exp 1. Overall performance of different GRF variants

Answer Quality vs. Runtime Exp 3. Tradeoff between answer quality and runtime