Presentation on theme: "Discriminative Dialog Analysis Using a Massive Collection of BBS comments Eiji ARAMAKI (University of Tokyo) Takeshi ABEKAWA (University of Tokyo) Yohei."— Presentation transcript:

1 Discriminative Dialog Analysis Using a Massive Collection of BBS Comments. Eiji ARAMAKI (University of Tokyo), Takeshi ABEKAWA (University of Tokyo), Yohei MURAKAMI (NICT), Akiyo NADAMOTO (NICT). Why Bulletin Board Systems (BBS)? In Japan, the volume of BBS text far exceeds that of other sources: |BBS| >> |Newswire|, |Wikipedia|. What sort of text is it?

2 An example BBS thread (each comment carries an ID and a name). Some comments reply to earlier ones (Reply), others do not (Not Reply):
"Please tell me why my nano sometimes stops even though the battery still remains."
"How about the iriver N12? Extremely light and small."
"It is because the battery display is approximate: even when the battery runs out, the display sometimes shows charge remaining."
"The iriver N series has stopped production."
"What is the lightest or smallest MP3 player?"
"Is the iPod Shuffle the best choice?"

3 The same thread again. BUT: NLP suffers from gaps between corresponding comments. A human reader can still piece the thread together: the "N12" is a "small and light" "MP3 player", but now "has stopped production".

4 How often do such gaps occur? Gap length (distance) vs. frequency: no gap (distance = 1) accounts for only 50% of replies; distances of 2 to 5 are common. Gaps are a pervasive phenomenon. QUESTION: despite gaps, how do human beings capture REPLY-TO relations?

5 Linguistics has already offered several answers. One is Relevance Theory [Sperber 1986]: human communication is based on relevance. For a computer scientist, however, this is not enough: how do we calculate relevance? THIS STUDY'S GOAL: to formalize relevance.

6 Outline Background Method Task setting / Our Approach How to formalize two types of relevance Experiment Related Works Conclusion

7 Task setting. The natural task setting: to which earlier comment (i-1, i-2, i-3, ...) does the i-th comment reply? This is a complex task. INSTEAD, we adopt a discriminative task. Input: two comments P and Q from the same BBS. Output: True (Q is a reply to P) or False. This framing suits machine learning (such as SVM).

8 Our approach/assumption: two types of relevance are available. (1) Contents relevance, roughly speaking sentence similarity, e.g. "What is the lightest or smallest MP3 player?" vs. "How about the iriver N12? Extremely light and small." (2) Discourse relevance, the discourse function of the comments, e.g. a WHY-QUESTION ("Please tell me why my nano sometimes stops ...") answered by a REASON ("It is because the battery display ...").

9 Outline Background Method Task setting / Our Approach How to formalize two types of relevance (1) Contents Relevance (2) Discourse Relevance Experiment Related Works Conclusion

10 Two contents-relevance measures. (1) Word overlap ratio: in the example pair above, 4 of the 12 tokens are shared, giving 4/12 = 0.33. But a simple word overlap ratio cannot capture that "mp3 player" and "iriver N12" are related. (2) WebPMI-based sentence similarity, where WebPMI [Bollegala 2007] is defined on the next slide.
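A minimal sketch of the word overlap ratio as described above. The tokenization and the exact counting convention (shared token occurrences over total tokens on both sides) are assumptions chosen to reproduce the slide's 4/12 example:

```python
def overlap_ratio(p_tokens, q_tokens):
    """Word overlap ratio: token occurrences shared with the other
    comment, divided by the total number of tokens in both comments.
    (Counting convention is an assumption matching the slide's 4/12.)"""
    p_set, q_set = set(p_tokens), set(q_tokens)
    hits = sum(1 for w in p_tokens if w in q_set) \
         + sum(1 for w in q_tokens if w in p_set)
    return hits / (len(p_tokens) + len(q_tokens))

# Slide's pair: only "light" and "small" are shared -> 4 / 12 = 0.33
p = ["what", "most", "light", "small", "mp3", "player"]
q = ["how", "iriver", "n12", "extremely", "light", "small"]
print(overlap_ratio(p, q))  # 0.333...
```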

11 Web-PMI: the mutual information of two words over web pages:

WebPMI(p, q) = log( (H(p ∩ q) / N) / ((H(p) / N) · (H(q) / N)) )

where H(p) is the number of web pages that contain p (e.g., "N12"), H(q) the number that contain q (e.g., "MP3"), H(p ∩ q) the number that contain both, and N the total number of pages. Contents relevance of a comment pair: for each word in P, search for the word in Q with the highest WebPMI, and sum up those values.
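A sketch of the WebPMI formula and the summed contents-relevance score. Real systems obtain H(·) from a web search engine's hit counts; here the hit counts and N are invented toy values for illustration, and treating unseen pairs as zero relevance is an assumption:

```python
import math

# Invented toy page-hit counts standing in for search-engine hit counts.
N = 1e10
HITS = {("n12",): 2e4, ("mp3",): 5e7, ("n12", "mp3"): 1.5e4}

def web_pmi(p, q, hits=HITS, n=N):
    """WebPMI(p, q) = log((H(p&q)/N) / ((H(p)/N) * (H(q)/N)))."""
    joint = hits.get((p, q), hits.get((q, p), 0))
    if not joint:
        return 0.0  # assumption: unseen word pairs contribute nothing
    return math.log((joint / n) / ((hits[(p,)] / n) * (hits[(q,)] / n)))

def contents_relevance(p_words, q_words):
    """For each word in P, take the best-matching word in Q; sum the values."""
    return sum(max(web_pmi(p, q) for q in q_words) for p in p_words)
```

With the toy counts above, `web_pmi("n12", "mp3")` is log(150), about 5.01, so the measure links "N12" and "MP3" even though the strings never overlap.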

12 Outline Background Method Task setting How to formalize two types of relevance (1) Contents Relevance (2) Discourse Relevance Experiment Related Works Conclusion

13 Discourse relevance: CPMI (Corresponding PMI, newly proposed). Also a PMI-based measure, BUT it counts phrases co-occurring across P and Q:

CPMI(p, q) = log( (H(p ∩ q) / N) / ((H(p) / N) · (H(q) / N)) )

where H(p) is the number of pairs whose P contains the phrase p (e.g., "please tell me why"), H(q) the number whose Q contains the phrase q (e.g., "It is because"), H(p ∩ q) the number of P-Q pairs containing both, and N the total number of pairs. Problem: to calculate CPMI we need a large set of P-Q pairs; that is, to score even one comment pair, we first need many comment pairs.
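The CPMI count can be sketched directly over a list of (P, Q) comment pairs. The substring-containment test and the -inf return for unseen phrases are implementation assumptions, not details given on the slide:

```python
import math

def cpmi(phrase_p, phrase_q, pairs):
    """CPMI(p, q) = log((H(p&q)/N) / ((H(p)/N) * (H(q)/N))), where
    H(p) counts pairs whose P side contains phrase_p, H(q) pairs whose
    Q side contains phrase_q, and H(p&q) pairs containing both."""
    n = len(pairs)
    hp = sum(phrase_p in p for p, _ in pairs)
    hq = sum(phrase_q in q for _, q in pairs)
    hpq = sum(phrase_p in p and phrase_q in q for p, q in pairs)
    if not (hp and hq and hpq):
        return float("-inf")  # assumption: unseen phrase pairs score lowest
    return math.log((hpq / n) / ((hp / n) * (hq / n)))
```

For example, over a toy collection where "please tell me why" questions are always answered by "it is because" comments, the pair scores log 2 (about 0.69) when it occupies half the collection.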

14 Building a collection of P-Q pairs using lexical patterns. Sometimes (5.1% of comments) we can easily identify the response target via lexical clues (a NAME or a COMMENT-ID). Example: comment 100, "It's my first comment! Nice to meet you.", answered by comment 102, "100> nice to meet you..". Of course, 5.1% is a low ratio. OUR SOLUTION: rely on data scale; 17,300,000 comments yield enough explicit pairs for the PMI calculation.
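A sketch of the pattern-based pair extraction. The anchor markup below (">>100" or "100>" pointing at comment ID 100) is an assumption modeled on the slide's example; real BBS markup varies:

```python
import re

# Hypothetical anchor pattern: a reply beginning with ">>100" or "100>"
# is taken to point at comment ID 100.
ANCHOR = re.compile(r"^\s*(?:>>?\s*(\d+)|(\d+)\s*>)")

def extract_pairs(comments):
    """comments: {comment_id: text}. Returns (target_text, reply_text)
    pairs for every reply that carries an explicit ID anchor."""
    pairs = []
    for text in comments.values():
        m = ANCHOR.match(text)
        if m:
            target = int(m.group(1) or m.group(2))
            if target in comments:
                pairs.append((comments[target], text))
    return pairs
```

Run over the full comment collection, this yields the large set of explicit P-Q pairs needed for the CPMI counts.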

15 Outline Background Method Experiment Related Works Conclusion

16 Experiment 1. TEST SET: 140 comment pairs (140 P-Q pairs); half are positive (extracted by the lexical patterns), the other half are random pairs. TASK: decide whether Q is a reply to P. METHODS: Human-A, B, C; OVERLAP (overlap ratio only: TRUE iff ratio > threshold); WEBPMI (contents relevance only: TRUE iff WebPMI > threshold); CPMI (discourse relevance only: TRUE iff CPMI > threshold); SVM (features: the WEBPMI and CPMI values, plus the words in P and Q).
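The three single-score baselines share one decision rule, which can be sketched as a small wrapper (the scores and thresholds here are placeholders; the paper does not give its threshold values):

```python
def threshold_classifier(score_fn, threshold):
    """Turn a relevance score into a reply-to decision, as in the
    OVERLAP / WEBPMI / CPMI baselines: TRUE iff score > threshold."""
    def classify(p, q):
        return score_fn(p, q) > threshold
    return classify

# Usage with a stand-in score function (a real run would pass the
# overlap-ratio, WebPMI, or CPMI scorer):
toy_score = lambda p, q: 0.33
clf = threshold_classifier(toy_score, 0.2)
print(clf("comment P", "comment Q"))  # True
```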

17 Result summary (Accuracy / Precision / Recall / F β=1):

Human-A  79.2  83.3  75.3  79.1
Human-B  75.7  78.2  73.9  76.0
Human-C  70.7  71.6  72.6  72.1
OVERLAP  61.4  58.7  87.6  70.3
WEBPMI   61.4  72.0  42.4  53.4
CPMI     65.7  66.2  69.8  67.9
SVM      63.8  64.4  79.4  72.1

Humans reach 70-79%. By accuracy, OVERLAP ≒ WEBPMI < SVM < CPMI: discourse relevance is the strongest automatic signal. The SVM's feature design may not be suitable.
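The F column combines precision and recall in the usual way; a minimal sketch, which reproduces the table's Human-A row from its precision and recall:

```python
def f_beta(precision, recall, beta=1.0):
    """F-measure combining precision and recall; beta = 1 (as in the
    table's F column) weights the two equally."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(83.3, 75.3), 1))  # 79.1, matching Human-A's F score
```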

18 Kappa matrix: agreement between methods.

          Human-B  Human-C  OVERLAP  WEBPMI  CPMI
Human-A     0.56     0.49     0.08    0.20   0.28
Human-B              0.47     0.09    0.21   0.25
Human-C                       0.15    0.05   0.25
OVERLAP                               0.21   0.13
WEBPMI                                       0.16

Values around 0.5 indicate moderate (high) agreement; values near 0 indicate slight (low) agreement. Human outputs are similar to one another. WEBPMI and CPMI show low mutual agreement (0.16), meaning they succeed on different examples; this supports our assumption that relevance decomposes into two parts: (1) contents and (2) discourse.
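The agreement values above are Cohen's kappa scores; a minimal sketch of the computation for two binary labelings (the label representation is an assumption):

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    chance = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                 for c in set(labels_a) | set(labels_b))
    if chance == 1.0:
        return 1.0  # degenerate case: both annotators always agree by chance
    return (observed - chance) / (1.0 - chance)
```

For example, two annotators agreeing on 3 of 4 balanced binary judgments get kappa 0.5, i.e. moderate agreement, the level the human pairs reach in the matrix.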

19 Examples of phrase pairs with high CPMI values:

CPMI   P                      Q
8.43   I'd like to go ...     Wait for you
8.37   Where is it ...        It is in/at ...
7.62   Please tell me ...     I think it is ...
7.47   How about ...          ... as soon as possible
7.38   You can ...            I try ...
7.12   I think ...            Thank you
6.93   ..., isn't it?         Maybe
6.80   Thank you              You're welcome
6.72   I ...                  I ... too

These capture discourse patterns such as ANSWER and THANKING, and even event sequences (P says "go", Q says "wait"). They are outside the reach of sentence similarity, motivating discourse clues.

20 Outline Background Method Experiment Related Works (if enough time left) Conclusion

21 Related Works (1/2): in linguistics. The four conversational maxims [Grice 1975] and Relevance Theory [Sperber 1986] leave open how to calculate a maxim or relevance; we have formalized it. Adjacency pairs [Schegloff & Sacks 1973]: sequences of two utterances (such as offer-acceptance). In BBSs, adjacency pairs are not adjacent; this motivates our task.

22 Related Works (2/2): in NLP. Previous dialog and discourse studies, such as DAMSL [Core & Allen 1997], RST-DT [Carlson 2002], and Discourse GraphBank [Wolf 2005], are based on carefully annotated corpora with rich sets of labels/relations. This study uses only one relation (REPLY-TO), BUT requires no human annotation; it therefore scales to large data, enabling statistical values (PMI) to be calculated.

23 Outline Background Method Experiment Related Works Conclusion

24 Conclusions. (1) NEW TASK: detecting the REPLY-TO relation between comments. (2) Formalization of relevance: to solve the task, we formalized two kinds of relevance, CONTENTS and DISCOURSE. (3) Automatic corpus building: to calculate discourse relevance, we also proposed pattern-based corpus construction. FINALLY: we believe this study will boost larger-scale dialog studies using the Web.

