Exploiting Structured Ontology to Organize Scattered Online Opinions Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai University of Illinois at Urbana-Champaign.


1 Exploiting Structured Ontology to Organize Scattered Online Opinions
Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
August 24, COLING’2010, Beijing, China

2 Online Opinions: Valuable Resource
Need to organize them in a meaningful way!

3 Aspect Summarization
Childhood
- Barack Obama is an African American whose father was born in Kenya and got a scholarship to study in America.
- born in Honolulu, Hawaii, to Barack Hussein Obama Sr., a Kenyan, and Kansas born Ann Dunham.
Presidential Campaign
- The Obama campaign’s use of new media technologies to revitalize political activism among youth, engage the public at large, and raise enormous, record-breaking sums of money was unlike that of any political campaign to date.
Health Care Reform
- Several months after the landmark healthcare bill was passed, America's faith in healthcare increases dramatically.
- For health insurance brokers, the new health care reform legislation has created uncertainty of …
What are “good aspects”? 1. Concise 2. Relevant to topic 3. Captures major opinions 4. Reasonable order

4 Existing Work
What are “good aspects”? 1. Concise 2. Relevant to topic 3. Captures major opinions 4. Reasonable order
Clustering + Phrase Selection NA [Chen&Dumais 2000]
Our idea: use structured ontology

5 Why Use Ontology?
What are “good aspects”? 1. Concise 2. Relevant to topic 3. Captures major opinions 4. Reasonable order
Ontology based. In addition:
- Great coverage: 12 million entities, e.g. person, place, or thing
- Consistently growing: anyone can contribute data
Clustering based NA

6 Problem Definition
Input
- Topic = “Abraham Lincoln”
- Ontology (>50 aspects), e.g. Professions, Quotations, Parents, Date of Birth, Place of Death, Books written, Place of Birth, Children, Spouse, …
- Online Opinion Sentences
Output
- Selected Subset of Aspects
- Selected Matching Opinions
- Ordered to optimize readability
Two Main Tasks:
- Aspect Selection
- Aspect Ordering

7 Aspect Selection: Task Definition
What are “good aspects”? 3. Captures major opinions
KL-divergence retrieval model
- Query: an aspect (e.g. Professions)
- Collection: the opinion sentences
- Result: aligned relevant opinions for each aspect (Professions, Parents, …)
Task: Select a subset of K aspects
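The alignment step on this slide can be sketched as below. This is a minimal illustration, not the authors' implementation: the slide names only a "KL-divergence retrieval model", so the Dirichlet smoothing, the value of `mu`, the tokenized toy sentences, and the query tokens are all assumptions. Ranking by the sum of p(w|query) log p(w|sentence) is rank-equivalent to ranking by negative KL divergence from the query model to the sentence model.

```python
import math
from collections import Counter

def kl_scores(query_tokens, sentences, mu=10.0):
    """Score each tokenized sentence against the query.

    Uses sum_w p(w|q) * log p(w|d) with Dirichlet-smoothed p(w|d);
    the smoothing choice and mu are assumptions, not from the slides.
    """
    collection = Counter(w for s in sentences for w in s)
    total = sum(collection.values())
    query = Counter(query_tokens)
    query_len = sum(query.values())
    scores = []
    for s in sentences:
        tf = Counter(s)
        score = 0.0
        for w, count in query.items():
            p_coll = collection[w] / total
            if p_coll == 0:
                continue  # word never seen in the collection: skip it
            p_doc = (tf[w] + mu * p_coll) / (len(s) + mu)
            score += (count / query_len) * math.log(p_doc)
        scores.append(score)
    return scores

# Hypothetical toy data: three opinion sentences and a one-word query.
sentences = [
    ["lincoln", "was", "a", "lawyer"],
    ["he", "was", "born", "in", "kentucky"],
    ["lincoln", "became", "president"],
]
scores = kl_scores(["lawyer"], sentences)
best = scores.index(max(scores))
print(best, scores)
```

The highest-scoring sentences for an aspect would then form its set of aligned opinions; how many to keep is left open here.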

8 Aspect Selection: Methods (1) (2)
(1) Size-based
- Size = number of aligned relevant opinions
- Select K aspects of largest size
(2) Opinion Coverage-based
- Reduce redundancy, maximum coverage
- Select K aspects sequentially (max cover problem)
Example: Professions covers opinions 1, 2, 3, … (Size=800); Position covers 4, 5, 3, … (Size=600); Parents covers 4, 5, 6, … (Size=500)
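The two methods on this slide can be sketched as follows. This is a minimal sketch under stated assumptions: each aspect is represented by the set of opinion IDs aligned to it, the coverage-based method is implemented as the standard greedy heuristic for max cover, and the toy data and function names are hypothetical.

```python
def select_by_size(aspect_opinions, k):
    """Size-based: keep the k aspects with the most aligned opinions."""
    ranked = sorted(aspect_opinions,
                    key=lambda a: len(aspect_opinions[a]),
                    reverse=True)
    return ranked[:k]

def select_by_coverage(aspect_opinions, k):
    """Coverage-based: greedily add the aspect that covers the most
    opinions not yet covered by earlier picks (greedy max cover)."""
    remaining = dict(aspect_opinions)
    covered = set()
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen

# Hypothetical toy data: opinion IDs aligned to each aspect.
toy = {
    "Professions": {1, 2, 3, 4},
    "Position": {2, 3, 4},   # large, but redundant with Professions
    "Parents": {5, 6},       # small, but covers new opinions
}
print(select_by_size(toy, 2))
print(select_by_coverage(toy, 2))
```

On this toy input the size-based method keeps the two largest aspects even though they overlap, while the coverage-based method prefers the smaller aspect that adds uncovered opinions.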

9 Aspect Selection: Method (3)
(3) Conditional Entropy-based
- Collection: cluster the opinions, e.g. with K-means, into clusters C: C1, C2, C3, …
- Aspect subset A: A1, A2, A3, … (e.g. Professions, Parents, Position, …)
- $A = \arg\min H(C \mid A) = \arg\min \left[ -\sum_{i} p(A_i, C_i) \log \frac{p(A_i, C_i)}{p(A_i)} \right]$
- Use a greedy algorithm to approximate the solution
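A greedy approximation of the conditional entropy criterion might look like the sketch below. Several simplifications here are assumptions rather than details from the slides: every opinion carries exactly one aspect label and one cluster label, opinions whose aspect is not selected are pooled into a single `__other__` bucket, and the greedy step adds whichever aspect lowers H(C|A) the most.

```python
import math
from collections import Counter

def conditional_entropy(pairs, selected):
    """H(C|A) over (aspect, cluster) pairs; unselected aspects are
    pooled into one '__other__' bucket (an assumption of this sketch)."""
    n = len(pairs)
    joint = Counter()
    marginal = Counter()
    for aspect, cluster in pairs:
        a = aspect if aspect in selected else "__other__"
        joint[(a, cluster)] += 1
        marginal[a] += 1
    return -sum((cnt / n) * math.log(cnt / marginal[a])
                for (a, _), cnt in joint.items())

def greedy_select(pairs, k):
    """Add, one at a time, the aspect that minimizes H(C|A)."""
    candidates = sorted({a for a, _ in pairs})
    selected = []
    for _ in range(min(k, len(candidates))):
        best = min((a for a in candidates if a not in selected),
                   key=lambda a: conditional_entropy(pairs, selected + [a]))
        selected.append(best)
    return selected

# Hypothetical toy data: one (aspect, cluster) pair per opinion.
pairs = ([("Professions", "C1")] * 3
         + [("Position", "C2")] * 2
         + [("Parents", "C2"), ("Parents", "C3")])
print(greedy_select(pairs, 1))
```

Professions is picked first on this toy input because all of its opinions fall into a single cluster, so conditioning on it removes the most uncertainty about the cluster labels.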

10 Aspect Ordering: Task Definition
What are “good aspects”? 4. Reasonable order
Input: un-ordered aspect subset (e.g. Date of Birth, Place of Death, Professions, Quotations)
Output: the same aspects, ordered

11 Aspect Ordering: Methods
Ontology Order
- Use the order in which aspects appear in the ontology
Coherence Order
- Follow the order of aligned opinions in their original articles (e.g. blog article, customer review)

12 Aspect Ordering: Coherence Order
Original articles contain opinions aligned to aspects A1 and A2 (e.g. Date of Birth, Place of Death).
Coherence(A1, A2) ∝ #(an opinion on A1 is before an opinion on A2)
Coherence(A2, A1) ∝ #(an opinion on A2 is before an opinion on A1)
… So, Coherence(A2, A1) > Coherence(A1, A2)
$\Pi(A) = \arg\max \sum_{A_i \text{ before } A_j} \mathrm{Coherence}(A_i, A_j)$
Use a greedy algorithm to approximate the solution
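The coherence ordering can be sketched as below. The slide does not spell out the greedy algorithm, so the rule used here is an assumption: repeatedly place next the aspect whose "before" counts against the remaining aspects most exceed its "after" counts. Articles are represented as lists of aspect labels in sentence order, which is also a simplification.

```python
from collections import Counter
from itertools import combinations

def coherence_counts(articles):
    """counts[(a, b)] = number of sentence pairs in which an opinion
    on aspect a appears before an opinion on aspect b."""
    counts = Counter()
    for labels in articles:
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] != labels[j]:
                counts[(labels[i], labels[j])] += 1
    return counts

def greedy_order(aspects, counts):
    """Greedy heuristic (assumed): pick the aspect with the largest
    net 'before' score against the aspects still unplaced."""
    remaining = list(aspects)
    order = []
    while remaining:
        def net(a):
            return sum(counts[(a, b)] - counts[(b, a)]
                       for b in remaining if b != a)
        best = max(remaining, key=net)
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical toy articles, each a sequence of aspect labels.
articles = [
    ["Date of Birth", "Professions", "Place of Death"],
    ["Date of Birth", "Place of Death"],
    ["Professions", "Date of Birth", "Place of Death"],
]
counts = coherence_counts(articles)
order = greedy_order(["Place of Death", "Professions", "Date of Birth"], counts)
print(order)
```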

13 Experiments: Data Sets
Ontology: Freebase
Opinions: blog entries and CNET customer reviews
Statistics          US Presidents   Digital Cameras
# Topics            36              110
# Aspects/Topic     65±26           32±4
# Opinions/Topic    1001±1542       140±249

14 Sample Results: Sony Cybershot DSC-W200
Freebase aspect (sup): representative opinion sentences
- Format: Compact (sup 13): “Quality pictures in a compact package.” “…amazing is that this is such a small and compact unit but packs so much power”
- Supported Storage Types: Memory Stick Duo (sup 11): “This camera can use Memory Stick Pro Duo up to 8 GB” “Using a universal storage card and cable (c’mon Sony)”
- Sensor type: CCD (sup 10): “I think the larger ccd makes a difference.” “but remember this is a small CCD in a compact point-and-shoot.”
- Digital zoom: 2X (sup 47): “once the digital “smart” zoom kicks in you get another 3x of zoom.” “I would like a higher optical zoom, the W200 does a great digital zoom translation...”

15 Aspect Selection: Evaluation Measures
- Aspect Coverage (AC)
- Aspect Precision (AP) = Jaccard similarity
- Average Aspect Precision (AAP)
Example with aspects A1, A2, A3 (Professions, Parents, Position) and clusters C1, C2, C3:
- J(A1,C2)=1, J(A2,C2)=2/4, J(A3,C1)=2/4
- Per cluster: AP=0.5, AP=0.75, AP=0
- AC = 2/3, AP = 0.625, AAP = 0.42
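The slide gives only the names of the measures and one worked example, so the definitions in this sketch are an assumed reading chosen because it reproduces the example's numbers (2/3, 0.625, 0.42): each aspect is matched to its best cluster by Jaccard similarity, AC is the fraction of clusters matched by some aspect, AP averages the per-cluster scores over matched clusters, and AAP averages them over all clusters. The opinion-ID sets are hypothetical, built to give the same Jaccard values as the slide.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def selection_measures(aspects, clusters):
    """Return (AC, AP, AAP) under the assumed reading described above."""
    per_cluster = defaultdict(list)
    for opinions in aspects.values():
        best = max(clusters, key=lambda c: jaccard(opinions, clusters[c]))
        per_cluster[best].append(jaccard(opinions, clusters[best]))
    cluster_ap = {c: sum(v) / len(v) for c, v in per_cluster.items()}
    ac = len(cluster_ap) / len(clusters)
    ap = sum(cluster_ap.values()) / len(cluster_ap)
    aap = sum(cluster_ap.values()) / len(clusters)
    return ac, ap, aap

# Hypothetical opinion IDs giving J(A1,C2)=1, J(A2,C2)=2/4, J(A3,C1)=2/4.
clusters = {"C1": {5, 6, 7}, "C2": {1, 2}, "C3": {3, 4, 8, 9}}
aspects = {"A1": {1, 2}, "A2": {1, 2, 3, 4}, "A3": {5, 6, 9}}
ac, ap, aap = selection_measures(aspects, clusters)
print(ac, ap, aap)
```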

16 Conditional Entropy-based method provides best trade-off for Aspect Selection
US Presidents
Methods       Aspect Coverage   Aspect Precision   Average Aspect Precision
Random        0.5140            0.0933             0.1223
Size-based    0.3108            0.1508             0.0949
Opin-Cover    0.5463            0.0913             0.1316
Cond Ent      0.5770            0.0856             0.1552
Digital Cameras
Methods       Aspect Coverage   Aspect Precision   Average Aspect Precision
Random        0.6554            0.0871             0.1271
Size-based    0.6071            0.1077             0.1340
Opin-Cover    0.6998            0.0914             0.1564
Cond Ent      0.7497            0.0789             0.1574

17 Aspect Ordering: Human Labeling
Input: aspect subset of size K (e.g. Professions, Quotations, Parents, …)
Human labeling (× 3) produces:
- Cluster Constraints, e.g. Parents, Spouse, Children; Party, Positions; …
- Order Constraints, e.g. Date of Birth, Date of Death; Education, Positions; Spouse, Children; …
Human Agreement

18 Aspect Ordering: Measures
Cluster Constraints (e.g. Parents, Spouse, Children; Party, Positions) are checked pair by pair against the output.
- Cluster Precision: is this pair presented together in the output? Per-pair values 1, 0, 1, 0 give Cluster Precision = 0.5
- Cluster Penalty: how many aspects are placed between this pair in the output? Per-pair values 0, 2, 0, 3 give Cluster Penalty = 1.25
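The two cluster measures can be computed as in the sketch below. It assumes that "presented together" means adjacent in the output and that both measures are averaged over the constraint pairs. The output order and the pairs are hypothetical, chosen so that the per-pair values match the slide's example (precision 0.5, penalty 1.25).

```python
def cluster_measures(output, pairs):
    """Return (cluster precision, cluster penalty) for an output order.

    precision: fraction of constraint pairs that are adjacent
    penalty:   mean number of aspects placed between the two members
    """
    position = {aspect: i for i, aspect in enumerate(output)}
    gaps = [abs(position[a] - position[b]) - 1 for a, b in pairs]
    precision = sum(gap == 0 for gap in gaps) / len(gaps)
    penalty = sum(gaps) / len(gaps)
    return precision, penalty

# Hypothetical output order and cluster-constraint pairs.
output = ["Parents", "Spouse", "Party", "Positions", "Children"]
pairs = [
    ("Parents", "Spouse"),    # adjacent, 0 aspects in between
    ("Spouse", "Children"),   # 2 aspects in between
    ("Party", "Positions"),   # adjacent, 0 aspects in between
    ("Parents", "Children"),  # 3 aspects in between
]
precision, penalty = cluster_measures(output, pairs)
print(precision, penalty)
```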

19 Aspect Ordering: Evaluation Results
Measure: Cluster Precision (higher is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.2540         0.9355           0.8978
2          0.2335         0.7758           0.8323
3          0.2523         0.4030           0.5545
union      0.3067         0.7268           0.7488
Measure: Cluster Penalty (lower is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          2.0656         0.2957           0.2016
2          2.1790         0.7530           0.5222
3          2.3079         2.1328           1.1611
union      1.9735         1.0720           0.7196

20 Aspect Ordering: Evaluation Results
Order Constraints (e.g. Date of Birth, Date of Death; Education, Positions; Spouse, Children): is this order pair preserved in the output? Per-pair values 1, 0, 1 give Order Precision = 0.67
Measure: Order Precision (higher is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.5106         0.7111           0.5444
2          0.4759         0.6759           0.5093
3          0.5294         0.7143           0.8175
union      0.5006         0.6500           0.6833
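Order precision, as described on this slide, is the fraction of order constraints that the output preserves. A minimal sketch, with a hypothetical output order and constraint list chosen so that two of three pairs are preserved, as in the slide's 0.67 example:

```python
def order_precision(output, constraints):
    """Fraction of (earlier, later) constraints kept by the output order."""
    position = {aspect: i for i, aspect in enumerate(output)}
    kept = sum(position[a] < position[b] for a, b in constraints)
    return kept / len(constraints)

# Hypothetical output order and order constraints (earlier, later).
output = ["Date of Birth", "Education", "Positions",
          "Children", "Spouse", "Date of Death"]
constraints = [
    ("Date of Birth", "Date of Death"),  # preserved
    ("Spouse", "Children"),              # violated: Children comes first
    ("Education", "Positions"),          # preserved
]
result = order_precision(output, constraints)
print(round(result, 2))
```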

21 Conclusions
Novel Problem: exploit ontology for structured organization of online opinions
- Aspect selection
- Aspect ordering
Evaluation: US presidents and digital cameras
- Conditional Entropy-based aspect selection
- Coherence ordering
Future Directions:
- New aspect suggestion for ontology
- Better alignment of opinion sentences and aspects
- Ontology + well-written articles

22 Thank you! & Questions?

