MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei,

Presentation transcript:

MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION
Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz
Presented by: Qiaozhu Mei, UIUC
University of Illinois at Urbana-Champaign

Motivation
- The common task: mining and extracting information from a text collection with ad hoc information needs
  - Structured, faceted summarization
  - Clustering search results
  - Integrating expert/customer reviews
  - Semi-structured summarization of scientific literature
  - Etc.

Multifaceted Text Overview
- Even if relevant information is found:
  - Too much information…
    - 10^3 research papers
    - 10^4 customer reviews
    - 10^5 web search results
- A multifaceted overview
  - Facet 1: Price
  - Facet 2: Design
  - Facet 3: Driving experience
[Figure: a facet shown with its word distribution (price 0.4, finance 0.3, cheap 0.05, interest 0.05, …) and its sentences (Sentence 1, Sentence 2, …, Sentence k)]

Multi-Faceted Overview Mining
- Unsupervised
  - A topic clustering problem
  - Limitations:
    - Topics do not necessarily reflect users' preferences
    - Summarizing a topic cluster is still challenging
- Supervised
  - A categorization problem with training examples
  - Limitations:
    - Predefined facets may not fit the needs of a particular user
    - Only works for a predefined domain and topics
    - Training examples for each facet are often unavailable
- What is missing here? User interactions…

More Realistic New Setup
- Allow a user to flexibly describe each facet with 1-2 keywords
  - Let the user determine what they want
- Mine a multi-faceted overview in a semi-supervised way
  - No training examples needed
  - Technical challenge: how to cast it as a semi-supervised learning problem

Example (1): Consumer vs. Editor (Honda Accord)
Columns: Facets / Generated Overview (10k customer rev.) / Editor's Review (1)

Facet: Body Styles, Exterior Design
  Generated overview:
    - Like the minor exterior styling changes from 2005 to
    - Tried the Camry XLE first, nice ride, but lacked a few features i wanted, like dual zone A/C, and didn't like the wood trim....
  Editor's review:
    - Available trim levels include... The VP provides air conditioning, power windows...

Facet: Powertrains
  ……

Facet: Safety
  ……

Facet: Interior Design
  Generated overview:
    - The interior is beautiful - I got all of the features and the navigation is extremely easy to use.
    - Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space
  Editor's review:
    - …The seating arrangements are top-notch, and the interior design and materials quality continue the high-caliber standards... The car's backseat is among the roomiest in the segment...

Facet: Driving Impressions
  ……

Example (2): Different Facets
- What if the users want an overview with different facets?
Columns: Facets / User Input / Generated Overview

Facet: Design
  User input: design, style
  Generated overview:
    - Like the minor exterior styling changes from 2005 to
    - Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space

Facet: Engine
  User input: engine, fuel
  Generated overview: …

Facet: Finance
  User input: finance, price
  Generated overview:
    - When I bought it I was amazed at the trim level for the price.
    - It is extremely fun to drive, fit and finish is fantastic, the oversteer could easily be corrected, at the price, it has no peer and is 10k less then a comparable BMW

Facet: Safety
  User input: safety
  Generated overview: …

Facet: Driving
  User input: comfort, fun
  Generated overview: …

Approach
- Two-stage framework, using probabilistic topic models
  - Model each facet with a language model (word distribution)
- Facet model initialization: a bootstrapping method expands the original facet keywords with additional correlated words from the document collection
- Facet model estimation: "guide" a generative topic model with user-defined facets
  - Propose probabilistic mixture models to estimate the word distribution of every facet
  - Meanwhile, constrain each facet model to stay close to the user specification
- Generate the overview: apply the estimated facet models to categorize the sentences into a semi-structured overview
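The last step, categorizing sentences with the estimated facet models, can be sketched roughly as follows. This is only an illustration under assumptions: the slides do not say how sentences are scored, so the average smoothed log-likelihood, the `background` distribution, the floor of `1e-6`, the mixing weight `lam`, and the function names `score_sentence` and `categorize` are all hypothetical.

```python
import math

def score_sentence(sentence, facet_model, background, lam=0.5):
    """Average per-word log-likelihood of a sentence under a facet model.

    The facet model is mixed with a background distribution so that unseen
    words do not produce log(0). The mixing scheme is an assumption.
    """
    words = sentence.lower().split()
    total = 0.0
    for w in words:
        p = (1 - lam) * facet_model.get(w, 0.0) + lam * background.get(w, 1e-6)
        total += math.log(p)
    return total / len(words)

def categorize(sentences, facet_models, background, lam=0.5):
    """Assign each non-empty sentence to the facet whose model scores it highest."""
    overview = {name: [] for name in facet_models}
    for s in sentences:
        if not s.split():
            continue  # skip empty sentences
        best = max(
            facet_models,
            key=lambda name: score_sentence(s, facet_models[name], background, lam),
        )
        overview[best].append(s)
    return overview

# Toy facet models; the "price" weights are the ones shown on the overview slide.
facet_models = {
    "price": {"price": 0.4, "finance": 0.3, "cheap": 0.05, "interest": 0.05},
    "design": {"design": 0.5, "style": 0.5},
}
overview = categorize(
    ["great price and cheap finance", "love the design and style"],
    facet_models,
    background={},
)
```

A real system would presumably also rank the sentences within each facet and keep only the top k; that part is left out here.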

Bootstrapped facet model initialization
[Figure: words in the collection: design, feature, fun, drive, comfortable, price, horsepower, smooth, performance, fuel, safety, reliability, exterior, roof, seat, cheap, engine]
- Initial facet keywords: performance 0.5, fuel 0.5, …
- After bootstrapping: performance 0.4, fuel 0.3, horsepower 0.05, engine 0.03, smooth 0.03, …
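One way to picture the bootstrapping step is the sketch below. It is not the authors' procedure: the slides do not say how "correlated words" are found, so sentence-level co-occurrence counting, the interpolation weight `alpha`, the cutoff `top_k`, and the name `bootstrap_facet` are all assumptions made for illustration.

```python
from collections import Counter

def bootstrap_facet(seed_words, sentences, alpha=0.7, top_k=5):
    """Expand seed keywords into a word distribution.

    Seeds share probability mass `alpha` uniformly; the `top_k` words that
    co-occur most often with a seed (in the same sentence) share the
    remaining mass in proportion to their co-occurrence counts.
    """
    seeds = set(seed_words)
    seed_dist = {w: 1.0 / len(seeds) for w in seeds}
    cooc = Counter()
    for s in sentences:
        tokens = s.lower().split()
        if seeds & set(tokens):
            cooc.update(t for t in tokens if t not in seeds)
    top = cooc.most_common(top_k)
    total = sum(c for _, c in top)
    if total == 0:
        return seed_dist  # nothing co-occurs: keep the uniform seed distribution
    model = {w: alpha * p for w, p in seed_dist.items()}
    for w, c in top:
        model[w] = (1 - alpha) * c / total
    return model

model = bootstrap_facet(
    ["performance", "fuel"],
    [
        "the engine has great performance",
        "fuel economy and horsepower",
        "nice seat",
    ],
    top_k=10,
)
```

The sketch does no stop-word filtering, so function words such as "the" are picked up as well; a practical version would filter them or use a stronger association measure.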

Semi-supervised facet model estimation
- Guide facet model estimation with Dirichlet priors
[Figure: the initialized distribution is used as a Dirichlet prior, which can be interpreted as pseudo word counts]
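The "pseudo word counts" reading of a Dirichlet prior can be illustrated with a small sketch. It assumes the standard MAP update for a multinomial with a conjugate prior, p(w) = (c(w) + mu * p0(w)) / (sum of counts + mu), where `prior` sums to 1; the prior strength `mu` and the name `map_estimate` are hypothetical, and in a mixture model the counts would be expected counts from an EM E-step rather than raw counts.

```python
def map_estimate(counts, prior, mu=10.0):
    """MAP word distribution with a Dirichlet prior acting as pseudo counts.

    Each word w receives mu * prior[w] extra (pseudo) counts on top of its
    observed or expected count, and the result is renormalized.
    """
    vocab = set(counts) | set(prior)
    total = sum(counts.values()) + mu
    return {
        w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
        for w in vocab
    }

# Initialized facet distribution used as the prior, with toy counts.
theta = map_estimate(
    counts={"fuel": 3, "horsepower": 1},
    prior={"performance": 0.5, "fuel": 0.5},
    mu=4.0,
)
```

With a larger `mu` the estimate stays closer to the initialized distribution; with `mu` near zero it follows the data alone.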

Semi-supervised facet model estimation
- Guide facet model estimation with regularization
[Figure: objective function, with its terms annotated as follows]
  - The log likelihood of the text collection
  - A term that propagates the constraint through the entire collection according to document similarities
  - A term that constrains the estimated facet models to be close to the initial facet models

Experimental Results
- The gene summarization task in biomedical literature
- The car review mining task for online customer reviews
- Our proposed system, especially the regularized topic model, is quite effective in mining multi-faceted overviews
[Table: columns Facets, Prior, Reg, MQR; rows SI, GI, GP, EL, MP, WFPI, Avg]
[Table: columns Facets, Prior, Reg, MQR; rows BS, PP, SF, IF, DI, Avg]
[Table labels: ROUGE-1; Average R scores]

- Please stop by our poster on Tuesday