CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN EMAIL USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,

Slides:



Advertisements
Similar presentations
Mustafa Cayci INFS 795 An Evaluation on Feature Selection for Text Clustering.
Advertisements

Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Social networks, in the form of bibliographies and citations, have long been an integral part of the scientific process. We examine how to leverage the.
Text-Based Measures of Document Diversity Date : 2014/02/12 Source : KDD’13 Authors : Kevin Bache, David Newman, and Padhraic Smyth Advisor : Dr. Jia-Ling,
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.
Sentiment Diversification with Different Biases Date : 2014/04/29 Source : SIGIR’13 Advisor : Prof. Jia-Ling, Koh Speaker : Wei, Chang 1.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Mining Query Subtopics from Search Log Data Date : 2012/12/06 Resource : SIGIR’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Introduction Information Management systems are designed to retrieve information efficiently. Such systems typically provide an interface in which users.
Introduction to Automatic Classification Shih-Wen (George) Ke 7 th Dec 2005.
Iterative Set Expansion of Named Entities using the Web Richard C. Wang and William W. Cohen Language Technologies Institute Carnegie Mellon University.
Language-Independent Set Expansion of Named Entities using the Web Richard C. Wang & William W. Cohen Language Technologies Institute Carnegie Mellon University.
Predicting Missing Provenance Using Semantic Associations in Reservoir Engineering Jing Zhao University of Southern California Sep 19 th,
Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval Microsoft Research Asia Yunhua Hu, Guomao Xin, Ruihua Song, Guoping.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
1 A Unified Relevance Model for Opinion Retrieval (CIKM 09’) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
Automated Social Hierarchy Detection through Network Analysis (SNAKDD07) Ryan Rowe, Germ´an Creamer, Shlomo Hershkop, Salvatore J Stolfo 1 Advisor:
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Querying Structured Text in an XML Database By Xuemei Luo.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Summarizing Conversations with Clue Words Giuseppe Carenini Raymond T. Ng Xiaodong Zhou Department of Computer Science Univ. of British Columbia.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh 1.
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG,
Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online.
Chapter 23: Probabilistic Language Models April 13, 2004.
Algorithmic Detection of Semantic Similarity WWW 2005.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.
Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
Using Wikipedia for Hierarchical Finer Categorization of Named Entities Aasish Pappu Language Technologies Institute Carnegie Mellon University PACLIC.
Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Date: 2012/5/28 Source: Alexander Kotov. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Interactive Sense Feedback for Difficult Queries.
Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
1 Blog Cascade Affinity: Analysis and Prediction 2009 ACM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Contextual Search and Name Disambiguation in Using Graphs Einat Minkov, William W. Cohen, Andrew Y. Ng Carnegie Mellon University and Stanford University.
LOGO Comments-Oriented Blog Summarization by Sentence Extraction Meishan Hu, Aixin Sun, Ee-Peng Lim (ACM CIKM’07) Advisor : Dr. Koh Jia-Ling Speaker :
Usefulness of Quality Click- through Data for Training Craig Macdonald, ladh Ounis Department of Computing Science University of Glasgow, Scotland, UK.
Web Page Clustering using Heuristic Search in the Web Graph IJCAI 07.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
Contextual Search and Name Disambiguation in using Graphs
Enriching Taxonomies With Functional Domain Knowledge
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Learning to Rank Typed Graph Walks: Local and Global Approaches
Connecting the Dots Between News Article
A Neural Passage Model for Ad-hoc Document Retrieval
Presentation transcript:

CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun 1

OUTLINE Introduction as a graph Graph similarity Learning Evaluation Conclusion 2

INTRODUCTION In modern IR settings, documents are frequently connected to other objects, via hyperlinks or meta-data It is important to understand how text-based document similarity measures can be extended to documents embedded in complex structural settings This paper considers schemes for propagating similarity across a graph that naturally models a structured dataset like an 3

AS A GRAPH 4 A graph consists of nodes and labeled directed edges Edge labels determine the source and target node type Multiple relations can hold between any particular pair nodes of types It could be that or, where The edges need not denote functional relations Inverse label: the graph will definitely be cyclic

GRAPH SIMILARITY * EDGE WEIGHTS Similarity between two nodes is defined by a lazy walk process, and a walk on the graph is controlled by a small set of parameters Θ To walk away from a node x, one first picks an edge label l ; then, given l, one picks a node y such that The probability of picking the label l depends only on the type T(x) of the node x Once l is picked, y is chosen uniformly from the set of all y such that. That is, the weight of an edge of type l connecting source node x to node y is 5

GRAPH SIMILARITY * GRAPH WALKS Associate nodes with integers, and make M a matrix indexed by nodes, then a walk of k steps can then be defined by matrix multiplication If V 0 is some initial probability distribution over nodes, then the distribution after a k -step walk is proportional to V k = V 0 M k In our framework, a query is an initial distribution V q over nodes, plus a desired output type T out, and the answer is a list of nodes y of type T out, ranked by their score in the distribution V k 6

LEARNING Re-order an initial ranking The advantage is that the learned classifier can take advantage of “global” features that are not easily used in a walk M. Collins and T. Koo. Discriminative reranking for natural language parsing. Computational Linguistics,

EVALUATION * CORPORA In this paper we evaluate the system on two tasks: Person name disambiguation threading Corpora: Cspace corpus Enron corpus 8

EVALUATION * PERSON NAME DISAMBIGUATION Task definition: Automatically determining that “Andrew” refers to “Andrew Y. Ng” and not “Andrew McCallum” is especially difficult when an informal nickname is used, or when the mentioned person does not appear in the header Based on a name-mention in an message m, we formulate query distribution V q, and then retrieve a ranked list of person nodes 9

EVALUATION * PERSON NAME DISAMBIGUATION Results for person name disambiguation 10

EVALUATION * PERSON NAME DISAMBIGUATION 11

EVALUATION * THREADING Threading is the problem of retrieving other messages in an thread given a single message from the thread Users make inconsistent use of the “reply” mechanism, and there are frequent irregularities in the structural information that indicates threads Thread information can improve message categorization into topical folders Given an file as a query, produce a ranked list of related files, where the immediate parent and child of the given file are considered to be “correct” answers 12

EVALUATION * THREADING 13

EVALUATION * THREADING 14

CONCLUSION A scheme for representing a corpus of messages with a graph of typed entities An extension of the traditional notions of document similarity to documents embedded in a graph Using boosting-based learning scheme to rerank outputs based on graph-walk related features provides an additional performance improvement 15

CONCLUSION Preserving entity type allows one to formulate a broad range of problems as typed search queries Structural relation modeling provides a unified framework for integration of multiple types of information 16