Text-Based Measures of Document Diversity Date : 2014/02/12 Source : KDD’13 Authors : Kevin Bache, David Newman, and Padhraic Smyth Advisor : Dr. Jia-Ling,


Text-Based Measures of Document Diversity Date : 2014/02/12 Source : KDD’13 Authors : Kevin Bache, David Newman, and Padhraic Smyth Advisor : Dr. Jia-Ling, Koh Speaker : Shun-Chen, Cheng

Outline: Introduction, Method, Experiment, Conclusions

Introduction Hypothesis: interdisciplinary research can lead to new discoveries at a faster rate than traditional research projects conducted within single disciplines. (Figure: interdisciplinary vs. single-discipline research.)

Introduction Task: assign a diversity score to each document. Goal: quantify how diverse a document is in terms of its content.

Framework Pipeline: corpus -> LDA (learn T topics for D documents) -> D x T matrix -> topic co-occurrence similarity measures -> Rao's diversity measure -> diversity score of each document. T: topics. D: documents.

Outline: Introduction, Method, Experiment, Conclusions

Topic-based Diversity (1) LDA is fit with a collapsed Gibbs sampler. Using the topic-word assignments from the final iteration of the Gibbs sampler, ndj is the number of word tokens in document d that are assigned to topic j. Example of creating the D x T matrix: rows d1 to d4, columns t1 to t3 [entry values missing]; n13 is the entry for document 1 and topic 3.
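
The counting step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_doc_topic_counts` and the toy assignments are hypothetical, and the assignments stand in for the final Gibbs iteration of a real LDA run.

```python
from typing import List, Sequence


def build_doc_topic_counts(
    assignments: Sequence[Sequence[int]], num_topics: int
) -> List[List[int]]:
    """Return the D x T matrix whose entry (d, j) is n_dj.

    assignments[d] lists the topic index assigned to each word token of
    document d (hypothetical input format).
    """
    counts = [[0] * num_topics for _ in assignments]
    for d, token_topics in enumerate(assignments):
        for j in token_topics:
            counts[d][j] += 1
    return counts


# Made-up assignments for 4 documents and 3 topics (0-based topic indices).
example_assignments = [
    [0, 0, 2, 2, 2],  # d1: two tokens on t1, three on t3
    [1, 1, 1, 1],     # d2: all tokens on t2
    [0, 1, 2],        # d3: one token per topic
    [2, 2],           # d4: all tokens on t3
]
doc_topic = build_doc_topic_counts(example_assignments, num_topics=3)
```

Here `doc_topic[0][2]` plays the role of n13 on the slide, with 0-based indexing.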

Topic-based Diversity (2) Rao's diversity for a document d [formula missing]. ndj: the value of entry (d,j) in the D x T matrix. nd: the number of word tokens in d. δ(i,j): a measure of the distance between topic i and topic j.
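
The formula is not present in the transcript. Rao's diversity (quadratic entropy) is conventionally written as div(d) = sum over i and j of (ndi/nd) * (ndj/nd) * δ(i,j), which matches the symbols defined on this slide. The sketch below assumes that form; the function name `rao_diversity` and the toy distance matrix are illustrative only.

```python
from typing import Sequence


def rao_diversity(
    counts_d: Sequence[float], distance: Sequence[Sequence[float]]
) -> float:
    """Rao's quadratic entropy for one row of the D x T matrix.

    counts_d[j] is n_dj; distance[i][j] is the topic distance delta(i, j).
    Assumes the conventional form sum_ij p_i * p_j * delta(i, j).
    """
    n_d = sum(counts_d)
    if n_d == 0:
        return 0.0  # an empty document has no topical spread
    p = [c / n_d for c in counts_d]
    num_topics = len(p)
    return sum(
        p[i] * p[j] * distance[i][j]
        for i in range(num_topics)
        for j in range(num_topics)
    )


# Made-up symmetric distances between 3 topics (zero on the diagonal).
toy_distance = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
focused = rao_diversity([10, 0, 0], toy_distance)  # one topic only
mixed = rao_diversity([5, 0, 5], toy_distance)     # two distant topics
```

A document concentrated on one topic scores 0, while a document split evenly between two distant topics scores higher, which is the behavior the measure is meant to capture.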

Topic-based Diversity (3) Example of Rao's diversity on the D x T matrix (rows d1 to d4, columns t1 to t3) [entry values missing]: div(1) = 1.26; div(2), div(3), div(4) = [values missing].

Topic Co-occurrence Similarity Cosine similarity: [formula missing]. Probabilistic-based: [formula missing]. N: number of word tokens in the corpus. ndj: the value of entry (d,j) in the D x T matrix.
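
Since the formulas are missing from the transcript, the sketch below uses the standard definition of cosine similarity, applied to pairs of topic columns of the D x T matrix. Whether the slide used exactly this normalization is an assumption, and the probabilistic measure is not reconstructed here. The name `topic_cosine_similarity` is hypothetical.

```python
import math
from typing import List, Sequence


def topic_cosine_similarity(
    doc_topic: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Cosine similarity between every pair of topic columns.

    doc_topic[d][j] is n_dj. Topics that never occur get similarity 0.
    """
    num_topics = len(doc_topic[0])
    columns = [[row[j] for row in doc_topic] for j in range(num_topics)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * num_topics for _ in range(num_topics)]
    for i in range(num_topics):
        for j in range(num_topics):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


# Made-up D x T counts: 4 documents, 3 topics.
toy_counts = [[2, 0, 3], [0, 4, 0], [1, 1, 1], [0, 0, 2]]
topic_sim = topic_cosine_similarity(toy_counts)
```

Two topics come out as similar when they tend to receive tokens in the same documents, which is the co-occurrence idea named in the slide title.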

Similarity to Distance The similarity measures (cosine similarity, probability-based) are transformed into distances [transformation formulas missing].
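
The transformation used on the slide is not preserved. As an illustration only, the sketch below converts a similarity matrix to a distance matrix with a pluggable transform; the default of 1 - s and the choice to force a zero diagonal are assumptions, not the authors' stated method.

```python
from typing import Callable, List, Sequence


def similarity_to_distance(
    sim: Sequence[Sequence[float]],
    transform: Callable[[float], float] = lambda s: 1.0 - s,  # assumed default
) -> List[List[float]]:
    """Apply `transform` to each off-diagonal similarity.

    The diagonal is set to 0 so that a topic is at distance 0 from itself
    (an assumption made for this sketch).
    """
    return [
        [0.0 if i == j else transform(s) for j, s in enumerate(row)]
        for i, row in enumerate(sim)
    ]


toy_sim = [[1.0, 0.25], [0.25, 1.0]]
toy_dist = similarity_to_distance(toy_sim)
```

Passing a different `transform` makes it easy to compare alternatives, in the spirit of the "Evaluating transformations" slide.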

Outline: Introduction, Method, Experiment, Conclusions

Experiment Datasets: PubMed Central Open Access dataset (PubMed); NSF Awards from 2007 to 2012 (NSF); Association of Computational Linguistics Anthology Network (ACL). Topic modeling (LDA): MALLET; α : 0.05*(N/D*T); β : [value missing]; [digits missing],000 iterations. Keep only the final sample in the chain. T = 10, 30, 100 and 300 topics.

Pseudo-Documents Reason: there is no ground-truth measure for a document's diversity. A set of pseudo-documents was built, half designed to have high diversity and half designed to have low diversity. High-diversity pseudo-document: manually select a pair of relatively unrelated journals, A and B, then randomly select one article from A and one from B and combine them into a pseudo-document.
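
The construction can be sketched roughly as below. The high-diversity branch follows the slide; the low-diversity branch (two articles from the same journal) is an assumption, since the slide does not say how those were built. All names and the toy journals are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def make_pseudo_documents(
    journals: Dict[str, List[str]],
    unrelated_pairs: List[Tuple[str, str]],
    num_docs: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs; label 1 = high diversity, 0 = low."""
    rng = random.Random(seed)
    pseudo_docs = []
    for k in range(num_docs):
        journal_a, journal_b = rng.choice(unrelated_pairs)
        first = rng.choice(journals[journal_a])
        if k % 2 == 0:
            # High diversity: one article from A, one from unrelated B.
            second = rng.choice(journals[journal_b])
            label = 1
        else:
            # Low diversity (assumption): both articles from journal A.
            second = rng.choice(journals[journal_a])
            label = 0
        pseudo_docs.append((first + " " + second, label))
    return pseudo_docs


# Toy stand-ins for article texts from two unrelated journals.
toy_journals = {
    "A": ["gene protein cell", "cell membrane receptor"],
    "B": ["galaxy star orbit", "orbit telescope survey"],
}
pseudo_docs = make_pseudo_documents(toy_journals, [("A", "B")], num_docs=4)
```

The labels give a stand-in ground truth, so a diversity measure can be judged by how well it separates the two halves.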

Experiment ROC curve. AUC: area under the ROC curve.
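
For reference, AUC equals the probability that a randomly chosen positive (here, a high-diversity pseudo-document) receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are illustrative and are not results from the experiments.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up diversity scores; label 1 = high-diversity pseudo-document.
toy_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of high- and low-diversity pseudo-documents.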

Experiment AUC scores for different diversity measures based on 1000 pseudo-documents from PubMed

Experiment Evaluating transformations

Experiment Most diverse NSF grant proposals

Outline: Introduction, Method, Experiment, Conclusions

Conclusions Presented an approach for quantifying the diversity of individual documents in a corpus based on their text content. The approach is more data-driven, performing the equivalent of learning journal categories by learning topics from text. It can be run on any collection of text documents, even without a prior categorization scheme. A possible direction for future work is temporal document diversity.