Enriching Taxonomies With Functional Domain Knowledge

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.
Date: 2012/8/13 Source: Luca Maria Aiello. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Behavior-driven Clustering of Queries into Topics.
Knowledge Base Completion via Search-Based Question Answering
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.
Date : 2013/09/17 Source : SIGIR’13 Authors : Zhu, Xingwei
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots Chao-Yeh Chen and Kristen Grauman University of Texas at Austin.
Wei Shen †, Jianyong Wang †, Ping Luo ‡, Min Wang ‡ † Tsinghua University, Beijing, China ‡ HP Labs China, Beijing, China WWW 2012 Presented by Tom Chao.
Presented by Zeehasham Rasheed
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA ImprovingWord.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
1 Helping Editors Choose Better Seed Sets for Entity Set Expansion Vishnu Vyas, Patrick Pantel, Eric Crestan CIKM ’ 09 Speaker: Hsin-Lan, Wang Date: 2010/05/10.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.
Ranking Related Entities Components and Analyses CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.
Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
ENHANCING CLUSTER LABELING USING WIKIPEDIA David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab SIGIR’09.
Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding Xu Linhe 14S
Contextual Text Cube Model and Aggregation Operator for Text OLAP
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
DeepWalk: Online Learning of Social Representations
ORec : An Opinion-Based Point-of-Interest Recommendation Framework
Customized of Social Media Contents using Focused Topic Hierarchy
Concept Grounding to Multiple Knowledge Bases via Indirect Supervision
Open question answering over curated and extracted knowledge bases
A Brief Introduction to Distant Supervision
Semantic Processing with Context Analysis
Vector-Space (Distributional) Lexical Semantics
A Large Scale Prediction Engine for App Install Clicks and Conversions
Word Embedding Word2Vec.
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.
[jws13] Evaluation of instance matching tools: The experience of OAEI
Review-Level Aspect-Based Sentiment Analysis Using an Ontology
Learning Literature Search Models from Citation Behavior
A Framework for Benchmarking Entity-Annotation Systems
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Intent-Aware Semantic Query Annotation
Sourse: Www 2017 Advisor: Jia-Ling Koh Speaker: Hsiu-Yi,Chu
Date : 2013/1/10 Author : Lanbo Zhang, Yi Zhang, Yunfei Chen
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization
Clinically Significant Information Extraction from Radiology Reports
Speaker: Ming-Chieh, Chiang
Deep Interest Network for Click-Through Rate Prediction
Discovering Important Nodes through Graph Entropy
Human-object interaction
Ask and Answer Questions
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Heterogeneous Graph Attention Network
Preference Based Evaluation Measures for Novelty and Diversity
Presentation transcript:

Enriching Taxonomies With Functional Domain Knowledge Advisor: Jia-Ling, Koh Source: ACM SIGIR '18 Speaker: Shao-wei, Huang Date: 2019/02/12

Outline Introduction Method Experiment Conclusion

Outline Introduction Method Experiment Conclusion

Introduction Augment a taxonomic hierarchy by accurately placing novel or unfamiliar concepts in it at an appropriate location and level of granularity. 儲水蛙 胃育蛙 棕樹蛙 湍雨濱蛙 白氏樹蛙 綠紋樹蛙 IUCN無危物種 雨濱蛙屬 澳大利亞兩棲類 溪蟾亞科 溪蟾屬 IUCN絕滅物種 無危物種 IUCN紅色名錄 澳雨蛙亞科 滅絕物種 澳大利亞動物 兩棲類 龜膽科 Category New entity 大雨蛙 Entity

Introduction Challenges Detection of previously unknown entities or concepts. Insertion of the new concepts into the knowledge structure, respecting the semantic integrity of the created relationships

Introduction Problem definition & Framework Input Output A corpus of unseen, unlabeled, unstructured text documents containing a set of new concepts X. A taxonomic hierarchy of categories and entities, T. A corpus of text documents D associated with the concepts present in T. Output If concept x not in T, then a solution x into T by outputting a set of semantically appropriate parents in T for x.

Outline Introduction Method Experiment Conclusion

Method Find concepts and Taxonomic relations Acquire the entities and categories from the given taxonomic structure and utilize their ancestor-descendant relationships to construct T. Use Bi-LSTM and CRF to extraction new concept from text corpus.

Method 𝑒 𝑖 𝑒 𝑖 Learning concept representations Exist concept doc2vec解說 Method Learning concept representations Exist concept Word2vec: Doc2vec:replace all occurrences of entity names in the corpus D of existing text documents(removes a great deal of ambiguity from the corpus). 𝑒 𝑖 Aggregate 𝑒 𝑖 New concept Word2vec:add the tf-weighted sum of its context terms.(approximate) Doc2vec:generate c as targets and infer a doc2vec by keeping the current doc2vec model. Aggregate

Method Filter potential parents Potential parents: Find the k-nearest neighbours(kNNs) of new concept. The potential parents for the new concepts are highly likely to be in common with the ancestors of these kNNs.

Method Ranking potential parents Learning to rank: Pairwise model

Method Ranking potential parents Graph-based features: Katz Similarity Random Walk Betweenness Centrality Information Propagation Score Semantic features: Ancestor-Neighbor Similarity New concept-Ancestor Similarity Pointwise Mutual Information (PMI)

Method p x Ranking potential parents Katz Similarity(Graph-base) (EX) Assume 𝑙 𝑚𝑎𝑥 = 3 and η = 0.2 l = 1  = 0.2 1 * 1 l = 2  = 0.2 2 * 1 l = 3  = 0.2 3 * 2

Method 1 2 3 4 Ranking potential parents Random Walk Betweenness Centrality(Graph-base) How often a node be passed by. (EX) 1 2 3 4 𝑉 1 𝑉 2 𝑉 3 𝑉 4 𝐼 1 𝐼 2 𝐼 3 𝐼 4 S=1 ; t=2 1 S=1 ; t=3 -1 S=1 ; t=4 2 S=2 ; t=3 S=2 ; t=4 S=3 ; t=4 s:起點 t:終點

Method p 𝑥 1 = ( 1−0.005 ∗𝐼𝑃(𝑢) 1 0.75 ∗ 𝑒 2∗0.005 ) Ranking potential parents Information Propagation Score(Graph-base) (EX) 𝑥 1 p Parameter: 𝛿= 0.005, 𝛼 = 0.75, 𝛽 = 0.005 = ( 1−0.005 ∗𝐼𝑃(𝑢) 1 0.75 ∗ 𝑒 2∗0.005 ) + ( 1−0.005 ∗𝐼𝑃(𝑢) 2 0.75 ∗ 𝑒 2∗0.005 ) 𝑢 1 𝑢 2 = 1−0.005 ∗0.5 3 0.75 ∗ 𝑒 1∗0.005 𝑢 1 v 𝑢 2 w = 0.5 u

Method Ranking potential parents Ancestor-Neighbor Similarity(Semantic-base) Term overlap between the text document linked to its potential parents, and that associated with each of its nearest neighbor entities. New concept-Ancestor Similarity(Semantic-base) Computes the Jaccard similarity between the document associated with the new concept, and those associated with potential parents. Pointwise Mutual Information (PMI)(Semantic-base)

Outline Introduction Method Experiment Conclusion

EXPERIMENT Dataset Wikipedia: WordNet: This taxonomy exceeds 5 million concepts. WordNet: This dataset contains 1000 concepts, split into a training set of 400 and a testing set of 600 concepts, which are either nouns or verbs.

EXPERIMENT Wikipedia

EXPERIMENT Wikipedia

EXPERIMENT WordNet Wu&P:predicted parent 𝑥 1 and the true parent 𝑥 2 as Wu&P( 𝑥 1 , 𝑥 2 ) = Lemma Match:how often the system picked the right word but wrong sence. Recall:the percentage of test concepts for which an output ancestor was identified by an algorithm. F1 score:Harmonic mean of theWu&Palmer score and recall.

EXPERIMENT 危及應對 醫學領域 2015年埃里卡熱帶風暴 2015年智利Illapel地震 2014年渥太華連環槍擊案 禽流感 豬流感

Outline Introduction Method Experiment Conclusion

Conclusion We propose a solution to the problem of automated taxonomy enrichment, where we insert new concepts at appropriate positions into an existing taxonomy. Our proposed approach ETF learns a high-dimensional vector embedding via a generated context of terms, for each existing and new concept. We then predict the potential parents of the new concepts from the ancestors of their nearest neighbors, by ranking them based on semantic and graph features.