Published by Bernadette Owens. Modified over 9 years ago.
1
Product Review Summarization from a Deeper Perspective
Duy Khang Ly, Kazunari Sugiyama, Ziheng Lin, Min-Yen Kan
National University of Singapore
2
Introduction
WING, NUS
Example product page: “754 customer reviews”, including “Best photos that I have ever taken and a joy to use” and “fantastic results”
- Other customers can refer to the reviews when deciding whether or not to buy the product
- Manufacturers can get feedback from customers
3
Introduction
Output of summary in existing systems [Hu and Liu, KDD’04], [Hu and Liu, AAAI’04], [Popescu and Etzioni, HLT/EMNLP’05]

a. Lens
   (+): 57 sentences
        1. The lens feels very solid!
        2. I have taken a whole bunch of excellent pictures with this lens.
        …
   (-): 15 sentences
        1. I do not satisfy with the included lens kit.
        2. The lens cap is very loose and come off very easily !
        …
b. Battery Life
   (+): 32 sentences
        1. The battery lasts for ever on one single charge.
        2. The battery duration is amazing !
        …
   (-): 8 sentences
        1. I experienced very short battery life from this camera.
        2. It uses a heavy battery.
        …

- Existing systems do not organize the sentences within each sentiment
- Users need to read through all the sentences to find the reasons that justify the sentiment
4
Introduction
Output of the desirable summary that our system aims at

a. Lens
   (+): The lens feels very solid! (+10 similar)
   (-): I think the lens does not worth it, it’s a bit too fragile. (+2 similar)
   (+): I have taken a lot of excellent pictures with this lens. (+7 similar)
   (-): Don’t buy this lens, I always get my pictures blurred. (+0 similar)
   …
b. Battery Life
   (+): The battery lasts for ever on one single charge. (+18 similar)
   (-): I experienced very short battery life from this camera. (+4 similar)
   (+): 0 sentence
   (-): It uses a heavy battery.
   …

- Provides a representative reason for each sentiment
- Users can read a concise summary
5
Proposed Method
Product Reviews -> (1) PRODUCT FACET IDENTIFICATION -> (2) SUMMARIZATION -> Output Summary

(1) PRODUCT FACET IDENTIFICATION
- Pre-processing
- Association Rule Mining
- Post-processing
- Infreq. Facet Extraction

(2) SUMMARIZATION
- Opinionated Sentence Extraction, which yields groups of sentences such as:
    1. The lens is too plastic!
    2. The price of this lens is affordable!
    …
    1. The output pictures are crystal clear.
    2. I like the sharpness of the picture.
    …
- Subtopic Clustering: Sentence Representation, Sentence Clustering
- Compact Presentation

Highlighted in the diagram: Syntactic role, Clustering
6
Product Facet Identification
- Pre-processing
  - POS tagging
  - Extract nouns and noun phrases
  - Syntactic roles: filter away noisy results
- Association Rule Mining
  - Identify all the frequent explicit product facets
- Post-processing
  - Remove irrelevant facets
- Infreq. Facet Extraction
  - Help discover infrequent facets
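The frequent-facet step above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes sentences are already POS-tagged with Penn Treebank style tags, counts only single nouns (real association rule mining also considers multi-word itemsets), and the function name `frequent_facets` and the `min_support` parameter are hypothetical.

```python
from collections import Counter

def frequent_facets(tagged_sentences, min_support=0.01):
    """Return nouns that appear in at least `min_support` of the sentences.

    tagged_sentences: list of sentences, each a list of (token, POS) pairs.
    Assumption: noun tags start with "NN" (Penn Treebank convention).
    """
    support = Counter()
    for sentence in tagged_sentences:
        # Use a set so a noun is counted once per sentence (sentence-level support).
        nouns = {tok.lower() for tok, pos in sentence if pos.startswith("NN")}
        support.update(nouns)
    threshold = min_support * len(tagged_sentences)
    return {noun for noun, count in support.items() if count >= threshold}

# Toy, hand-tagged input (hypothetical data).
reviews = [
    [("The", "DT"), ("lens", "NN"), ("is", "VBZ"), ("solid", "JJ")],
    [("Lens", "NN"), ("cap", "NN"), ("is", "VBZ"), ("loose", "JJ")],
    [("The", "DT"), ("battery", "NN"), ("lasts", "VBZ")],
    [("Great", "JJ"), ("battery", "NN"), ("and", "CC"), ("lens", "NN")],
]
print(sorted(frequent_facets(reviews, min_support=0.5)))
```

With a support threshold of half the sentences, only nouns seen in at least two of the four toy sentences are kept, so a rare noun such as "cap" is left for the infrequent-facet step.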
7
Summarization
- Opinionated Sentence Extraction [Ding et al., WSDM’08]
  - Assign a polarity score to each sentence
  - The score is the sum of the polarity scores of the sentence's constituent words
  - Example output:
      1. The lens is too plastic!
      2. The price of this lens is affordable!
      …
      1. The output pictures are crystal clear.
      2. I like the sharpness of the picture.
      …
- Subtopic Clustering
  - Sentence Representation
    - Compute content-based pairwise similarities between all resulting opinion sentences
  - Sentence Clustering
    - Hierarchical clustering with groupwise-average distance
    - Non-hierarchical clustering
- Compact Presentation
  - Select the most representative sentence in each cluster
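A minimal sketch of the summarization steps listed above, under stated assumptions: the polarity lexicon `POLARITY` is a hypothetical toy, the content-based similarity is taken to be bag-of-words cosine similarity, and group-average agglomerative clustering stands in for the hierarchical variant. The slides do not specify these details, so treat every name here as illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would use a full opinion lexicon.
POLARITY = {"solid": 1, "excellent": 1, "amazing": 1,
            "loose": -1, "short": -1, "heavy": -1}

def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in sentence.split()]

def polarity_score(sentence):
    """Sum of the polarity scores of the sentence's words (0 if unknown)."""
    return sum(POLARITY.get(w, 0) for w in tokenize(sentence))

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def group_average_clusters(sentences, k):
    """Merge the two clusters with the highest average pairwise similarity
    until only k clusters remain. Returns lists of sentence indices."""
    k = max(k, 1)
    sim = [[cosine(a, b) for b in sentences] for a in sentences]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters

sentences = ["The lens feels very solid!", "The lens is solid.",
             "Battery lasts forever.", "Battery lasts long."]
print(polarity_score(sentences[0]))
print(group_average_clusters(sentences, k=2))
```

The quadratic search over cluster pairs keeps the sketch short; it is only suitable for the small per-facet sentence sets used in this kind of summary.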
8
Experiments
Experimental Data: 3 products from [Hu and Liu, KDD’04]

Products      Number of sentences
Camera        160
Phone         139
DVD player    111

Evaluation Measure
(1) Product Facet Identification
- Recall, Precision
(2) Summarization
- Purity, Inverse purity
- F (harmonic mean of purity and inverse purity)
[Hotho et al., GLDV-Journal for Computational Linguistics and Language Technology ‘05]
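For the facet identification measures, recall and precision can be computed by comparing the set of extracted facets with the manually extracted set. The sketch below assumes exact string matching between facets, which the slides do not confirm; the function name and the sample facets are hypothetical.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted facets against manual (gold) facets.

    Assumption: a facet counts as correct only on an exact string match.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: 3 of 4 extracted facets are among 5 gold facets.
extracted = ["battery", "picture", "lens", "thing"]
gold = ["battery", "picture", "lens", "flash", "shutter"]
p, r = precision_recall(extracted, gold)
print(round(p, 2), round(r, 2))  # 0.75 0.6
```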
9
Purity
(i) For each generated cluster, precision is computed with respect to each label, and the maximum value is selected.
(ii) The overall purity is the weighted average of the values from (i), weighted by cluster size.
[Figure: 20 target documents for clustering, grouped into clusters of sizes 8, 5, 4, and 3. Annotations: (i) maximum precision of each cluster; (ii) “purity” for this clustering result.]
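The two steps of the purity computation can be sketched as follows. The cluster sizes (8, 5, 4, 3) mirror the figure, but the labels inside each cluster are hypothetical, since the figure's contents did not survive extraction.

```python
from collections import Counter

def purity(clusters):
    """clusters: list of clusters, each given as the gold labels of its members."""
    total = sum(len(members) for members in clusters)
    score = 0.0
    for members in clusters:
        if not members:
            continue
        # (i) maximum precision of this cluster over all labels
        max_precision = max(Counter(members).values()) / len(members)
        # (ii) weight by the cluster's share of all documents
        score += (len(members) / total) * max_precision
    return score

# Hypothetical labels for 20 documents in clusters of sizes 8, 5, 4, 3.
example = [["A"] * 7 + ["B"],
           ["B"] * 4 + ["C"],
           ["C"] * 3 + ["A"],
           ["A"] * 2 + ["B"]]
print(round(purity(example), 3))  # 0.8
```

In this toy case the majority labels cover 7 + 4 + 3 + 2 = 16 of the 20 documents, which is where the value 0.8 comes from.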
10
Inverse purity
(i) For each label, recall is computed with respect to each generated cluster, and the maximum value is selected.
(ii) The overall inverse purity is the weighted average of the values from (i), weighted by label size.
[Figure: 20 target documents for clustering, grouped into clusters of sizes 8, 5, 4, and 3. Annotations: (i) maximum recall of each label; (ii) “inverse purity” for this clustering result.]
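Inverse purity swaps the roles of clusters and labels. A minimal sketch, again with hypothetical labels for 20 documents in clusters of sizes 8, 5, 4, and 3:

```python
from collections import Counter

def inverse_purity(clusters):
    """clusters: list of clusters, each given as the gold labels of its members."""
    total = sum(len(members) for members in clusters)
    label_sizes = Counter(label for members in clusters for label in members)
    score = 0.0
    for label, size in label_sizes.items():
        # (i) maximum recall of this label over all clusters
        max_recall = max(members.count(label) for members in clusters) / size
        # (ii) weight by the label's share of all documents
        score += (size / total) * max_recall
    return score

# Hypothetical labels: A has 10 documents, B has 6, C has 4.
example = [["A"] * 7 + ["B"],
           ["B"] * 4 + ["C"],
           ["C"] * 3 + ["A"],
           ["A"] * 2 + ["B"]]
print(round(inverse_purity(example), 3))  # 0.7
```

Here the best cluster for each label holds 7, 4, and 3 of its documents, so 14 of the 20 documents count toward the score.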
11
F1-measure
Harmonic mean of “purity” and “inverse purity” (α = 0.5)
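The formula on this slide did not survive extraction. Assuming the usual weighted harmonic mean, with P for purity and IP for inverse purity (symbols introduced here for illustration), it would read:

```latex
F = \frac{1}{\alpha \,\dfrac{1}{P} + (1 - \alpha)\,\dfrac{1}{IP}},
\qquad
\alpha = 0.5 \;\Rightarrow\; F_1 = \frac{2 \cdot P \cdot IP}{P + IP}
```

For example, a clustering with P = 0.8 and IP = 0.7 would give F1 = 1.12 / 1.5 ≈ 0.747.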
12
(1) Product Facet Identification
Examples of extracted facets:
- Camera: “battery,” “picture,” “lens”
- Phone: “signal,” “headset”
- DVD player: “remote control,” “format”
13
(1) Product Facet Identification

Performance of the product facet identification component [Hu and Liu, KDD’04]

Data      Manually extracted   Association mining     Post processing        Infrequent facet
          facets               Recall    Precision    Recall    Precision    Recall    Precision
Camera    79                   0.671     0.552        0.658     0.825        0.822     0.747
Phone     67                   0.731     0.563        0.716     0.828        0.761     0.718
DVD       49                   0.754     0.531        0.754     0.765        0.797     0.793
Average   65                   0.719     0.549        0.709     0.806        0.793     0.753

Performance of the product facet identification component [Hu and Liu, KDD’04] + syntactic role

Data      Manually extracted   Association mining     Post processing        Infrequent facet
          facets               Recall    Precision    Recall    Precision    Recall    Precision
Camera    79                   0.671     0.646        0.658     0.894        0.822     0.842
Phone     67                   0.731     0.648        0.716     0.903        0.761     0.769
DVD       49                   0.754     0.610        0.754     0.818        0.797     0.867
Average   65                   0.719     0.634        0.709     0.872        0.793     0.826
14
(2) Summarization
Number of facets in each product

Data: Facet (number of manually defined clusters)
- Camera: Battery (4), Memory (3), Flash (4), LCD (6), Lens (7), Megapixels (5), Mode (6), Shutter (6). Average: 5.13
- Phone: Battery (3), Camera (3), Headset (4), Radio (3), Service (5), Signal (3), Size (3), Speaker (4). Average: 3.50
- DVD: Price (1), Remote (4), Format (1), Design (1), Service (1), Picture (4). Average: 2.00

“Camera” has richer properties.
15
(2) Summarization
Performance of summarization (F1-measure)

Data     Facet        Manually defined   Hierarchical   Non-hierarchical   Random
                      clusters           clustering     clustering         clustering
Camera   Battery      4                  0.702          0.733              0.596
         Memory       3                  0.783          0.707              0.563
         Flash        4                  0.628          0.693              0.550
         LCD          6                  0.606          0.722              0.473
         Lens         7                  0.884                             0.571
         Megapixels   5                  0.543          0.626              0.473
         Mode         6                  0.897                             0.556
         Shutter      6                  0.760                             0.555
         Average      5.13               0.725          0.753              0.542
DVD      Price        1                  0.833          0.865              0.688
         Remote       4                  0.682          0.643              0.579
         Format       1                  0.833          0.727              0.667
         Design       1                  1.000
         Service      1                  0.850          0.686
         Picture      4                  0.824                             0.474
         Average      2.00               0.837          0.791              0.682

- Hierarchical clustering: effective when the number of subtopics is small.
- Non-hierarchical clustering: effective when the number of subtopics is large.
16
Conclusion
Designed a system that summarizes product reviews and organizes them into a structured, extractive summary
- Product facet identification
  - Syntactic role information within a sentence is effective.
- Summarization
  - Both hierarchical and non-hierarchical clustering outperform random clustering.

Future Work
Recognize brand names to improve facet identification, e.g. “My Canon camera has longer battery life than Nikon.”

Thank you very much!