MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei,


1 MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei, UIUC 2008.08.25 1 University of Illinois at Urbana-Champaign

2 Motivation
- The common task: mining and extracting information from a text collection to meet ad hoc information needs
- Structured, faceted summarization
- Clustering search results
- Integrating expert/customer reviews
- Semi-structured summarization of scientific literature
- Etc.

3 Multifaceted Text Overview
- Even if relevant information is found:
- Too much information…
- 10^3 research papers
- 10^4 customer reviews
- 10^5 web search results
[Figure: a multifaceted overview. Facet1: Price; Facet2: Design; Facet3: Driving experience. Each facet lists sentences (Sentence 1, …; Sentence 2, …; …; Sentence k, …) and has a word distribution, e.g. price 0.4, finance 0.3, cheap 0.05, interest 0.05, …]

4 Multi-Faceted Overview Mining
- Unsupervised
  - A topic clustering problem
  - Limitations:
    - Topics do not necessarily reflect users' preferences
    - Summarizing a topic cluster is still challenging
- Supervised
  - A categorization problem with training examples
  - Limitations:
    - Predefined facets may not fit the needs of a particular user
    - Only works for a predefined domain and topics
    - Training examples for each facet are often unavailable
What is missing here? User interactions…

5 More Realistic New Setup
- Allow a user to flexibly describe each facet with 1-2 keywords
- Let the user determine what they want
- Mine a multi-faceted overview in a semi-supervised way
- No need for training examples
- Technical challenge: how to cast it as a semi-supervised learning problem

6 Example (1): Consumer vs. Editor
Honda Accord 2006. Columns: Facets | Generated Overview (10k customer rev.) | Editor's Review (1)
- Body Styles, Exterior Design
  - Generated overview: Like the minor exterior styling changes from 2005 to 2006. Tried the Camry XLE first, nice ride, but lacked a few features i wanted, like dual zone A/C, and didn't like the wood trim....
  - Editor's review: Available trim levels include... The VP provides air conditioning, power windows...
- Powertrains: ……
- Safety: ……
- Interior Design
  - Generated overview: The interior is beautiful - I got all of the features and the navigation is extremely easy to use. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space
  - Editor's review: …The seating arrangements are top-notch, and the interior design and materials quality continue the high-caliber standards... The car's backseat is among the roomiest in the segment...
- Driving Impressions: ……

7 Example (2): Different Facets
Facets  | User Input     | Generated Overview
Design  | design, style  | Like the minor exterior styling changes from 2005 to 2006. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space
Engine  | engine, fuel   | …
Finance | finance, price | When I bought it I was amazed at the trim level for the price. It is extremely fun to drive, fit and finish is fantastic, the oversteer could easily be corrected, at the price, it has no peer and is 10k less then a comparable BMW
Safety  | safety         | …
Driving | comfort, fun   | …
- What if the users want an overview with different facets?

8 Approach
- Two-stage framework, using probabilistic topic models
- Model each facet with a language model (word distribution)
- Facet model initialization: a bootstrapping method expands the original facet keywords with additional correlated words from the document collection
- Facet model estimation: "guide" a generative topic model with user-defined facets
  - Propose probabilistic mixture models to estimate the word distribution of every facet
  - Meanwhile, constrain each facet model to stay close to the user specification
- Generate the overview: apply the estimated facet models to categorize the sentences into a semi-structured overview
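
The last step on this slide (applying facet models to categorize sentences) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the scoring rule (average log-ratio of a smoothed facet probability to a background probability), the mixing weight `lam`, the "design" facet model, and the two toy sentences are all hypothetical. The "price" word distribution reuses the numbers shown on slide 3.

```python
import math
from collections import Counter

def score(sentence, facet_model, background, lam=0.5):
    """Average log-ratio of a smoothed facet probability to the background probability.

    Assumption: this scoring rule is illustrative; the slides do not specify one.
    """
    words = sentence.lower().split()
    total = 0.0
    for w in words:
        p_bg = background.get(w, 1e-6)
        # Interpolate with the background so unseen words do not give log(0).
        p_facet = lam * facet_model.get(w, 0.0) + (1 - lam) * p_bg
        total += math.log(p_facet / p_bg)
    return total / max(len(words), 1)

def categorize(sentences, facet_models, background):
    """Assign each sentence to the facet whose model scores it highest."""
    overview = {name: [] for name in facet_models}
    for s in sentences:
        best = max(facet_models, key=lambda name: score(s, facet_models[name], background))
        overview[best].append(s)
    return overview

# Hypothetical toy data.
facet_models = {
    "price": {"price": 0.4, "finance": 0.3, "cheap": 0.05, "interest": 0.05},
    "design": {"design": 0.4, "style": 0.3, "interior": 0.1, "exterior": 0.1},
}
sentences = ["the price is cheap", "nice interior design"]

# Background model: unigram frequencies over the whole (toy) collection.
counts = Counter(w for s in sentences for w in s.lower().split())
n = sum(counts.values())
background = {w: c / n for w, c in counts.items()}

overview = categorize(sentences, facet_models, background)
print(overview)
```

In a real setting the sentences within each facet would also need to be ranked and truncated to keep the overview short; that step is omitted here.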

9 Bootstrapped facet model initialization
[Figure: words from the collection (design, feature, fun, drive, comfortable, price, horsepower, smooth, performance, fuel, safety, reliability, exterior, roof, seat, cheap, engine). A keyword distribution (performance 0.5, fuel 0.5, …) is expanded into (performance 0.4, fuel 0.3, horsepower 0.05, engine 0.03, smooth 0.03, …)]
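
A rough sketch of keyword expansion by co-occurrence, in the spirit of this slide. The slides do not give the actual bootstrapping procedure, so everything here is an assumption: the function name `expand_keywords`, the rule "count words in documents that contain a seed", the `seed_weight` mixing parameter, and the toy documents. No stop-word filtering is done, which a practical version would need.

```python
from collections import Counter

def expand_keywords(seed_words, documents, seed_weight=0.5):
    """Mix a uniform distribution over seed keywords with words that co-occur with them.

    Assumption: document-level co-occurrence is used as the notion of "correlated".
    """
    seeds = set(seed_words)
    cooc = Counter()
    for doc in documents:
        words = doc.lower().split()
        if seeds & set(words):
            # Count every non-seed word in a document that mentions a seed.
            cooc.update(w for w in words if w not in seeds)
    total = sum(cooc.values())
    if total == 0:
        # Nothing co-occurs: fall back to the uniform seed distribution.
        return {w: 1.0 / len(seeds) for w in seeds}
    model = {w: seed_weight / len(seeds) for w in seeds}
    for w, c in cooc.items():
        model[w] = (1 - seed_weight) * c / total
    return model

# Hypothetical toy collection.
documents = [
    "great performance and smooth engine",
    "fuel economy and horsepower",
    "comfortable seat",
]
model = expand_keywords(["performance", "fuel"], documents)
for word, prob in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{word} {prob:.3f}")
```

The seed keywords keep half of the probability mass here; the remaining half is spread over co-occurring words in proportion to their counts.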

10 Semi-supervised facet model estimation
- Guide facet model estimation with Dirichlet priors
[Figure: the initialized distribution of each facet is used as a Dirichlet prior, which can be interpreted as pseudo word counts]
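
The "pseudo word counts" reading of a Dirichlet prior can be illustrated with the standard MAP-style update for a word distribution: prior probabilities, scaled by a strength `mu`, are added to the observed counts before normalizing. This is a minimal sketch, not the authors' estimator. In a mixture model the counts would be expected counts from an EM step; the fixed counts, the value of `mu`, and the function name below are assumptions.

```python
from collections import Counter

def map_estimate(counts, prior_model, mu):
    """Add mu * prior(w) pseudo counts to the observed counts, then normalize.

    Assumes prior_model sums to 1, so the result also sums to 1.
    """
    vocab = set(counts) | set(prior_model)
    total = sum(counts.values())
    return {
        w: (counts.get(w, 0) + mu * prior_model.get(w, 0.0)) / (total + mu)
        for w in vocab
    }

# Initialized facet model (values taken from the previous slide's example).
prior_model = {"performance": 0.5, "fuel": 0.5}
# Hypothetical counts of words attributed to this facet.
counts = Counter({"performance": 3, "horsepower": 1})

estimate = map_estimate(counts, prior_model, mu=4)
print(estimate)
```

A larger `mu` keeps the estimate closer to the initialized distribution; `mu = 0` reduces to the plain maximum-likelihood estimate.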

11 Semi-supervised facet model estimation
- Guide facet model estimation with regularization
[Figure: objective function with three annotated terms]
- The log likelihood of the text collection
- A term that propagates the constraint through the entire collection according to document similarities
- A term that constrains the estimated facet models to be close to the initial facet models
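
The formula itself is not preserved in this transcript. One generic shape that matches the three annotations is sketched below; it is an assumption for illustration, not the authors' objective. All symbols are hypothetical: C is the collection, Λ the model parameters, w(d, d') a document similarity weight, p(θ_j | d) the weight of facet j in document d, θ_j^(0) the initialized model of facet j, D a divergence between word distributions, and λ1, λ2 trade-off weights.

```latex
O(C) \;=\; \log p(C \mid \Lambda)
\;-\; \lambda_1 \sum_{(d, d')} w(d, d') \sum_{j=1}^{k}
      \bigl( p(\theta_j \mid d) - p(\theta_j \mid d') \bigr)^2
\;-\; \lambda_2 \sum_{j=1}^{k} D\!\left( \theta_j^{(0)} \,\middle\|\, \theta_j \right)
```

Under this reading, the first term fits the data, the second makes similar documents share similar facet weights, and the third keeps each facet near its initialization.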

12 Experimental Results
- The gene summarization task in biomedical literature
- The car review mining task for online customer reviews
- Our proposed system, especially the regularized topic model, is quite effective in mining multi-faceted overviews

Facets  Prior  Reg    MQR
SI      0.44   0.45   0.47
GI      0.51   0.47   0.41
GP      0.20   0.22   0.20
EL      0.22   0.25   0.18
MP      0.25          0.20
WFPI    0.09   0.19   0.15
Avg.    0.29   0.31   0.27

Facets  Prior  Reg    MQR
BS      0.193  0.200  0.174
PP      0.273  0.278  0.207
SF      0.235  0.243  0.208
IF      0.309  0.324  0.294
DI      0.316  0.319  0.264
Avg.    0.265  0.273  0.229

Metrics: ROUGE-1 Average R scores; Precision @5
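
For readers unfamiliar with the two metric names, a minimal sketch of how each is commonly computed follows. Reading "R" as recall is an assumption, as are the whitespace tokenization and the toy inputs; the slides do not describe the evaluation setup, and this is not the official ROUGE toolkit.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of unigrams in the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(c, cand[w]) for w, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked items that are judged relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k

# Hypothetical toy inputs.
r = rouge1_recall("the interior is beautiful", "the interior design is top notch")
p = precision_at_k(["s1", "s2", "s3", "s4", "s5", "s6"], {"s1", "s4", "s6"})
print(r, p)
```

Here three of the six reference words are covered, and two of the top five ranked items are relevant.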

13 - Please stop by our poster on Tuesday

