Presentation on theme: "MAKING CONVERSATION STRUCTURE EXPLICIT: IDENTIFICATION OF INITIATION-RESPONSE PAIRS WITHIN DISCUSSION FORUMS Yi-Chia Wang and Carolyn Rosé Language Technologies."— Presentation transcript:

1 MAKING CONVERSATION STRUCTURE EXPLICIT: IDENTIFICATION OF INITIATION-RESPONSE PAIRS WITHIN DISCUSSION FORUMS Yi-Chia Wang and Carolyn Rosé Language Technologies Institute School of Computer Science Carnegie Mellon University 06/03/2010 NAACL 2010

2 Discussion Forums and Thread Structure
- Often thread structure is explicitly represented
- Sometimes thread structure is implicit
- Initiation-response pairs are not necessarily adjacent to each other

3 Outline
- Related works
- Identification of initiation-reply pairs as a ranking problem
- Usenet: data preparation
- Error analysis for the purely lexical approach
- Variations of Latent Semantic Analysis
- Experimental results and current directions

4 Related Works
- Thread Recovery
  - Application of thread recovery to education (Trausan-Matu et al., 2007)
    - No evaluation
  - Basic research in thread recovery (Wang et al., ICWSM 2008; Wang et al., CSCW 2008)
    - Investigated the contribution of temporal information and similarity
- Conversation Disentanglement (Elsner and Charniak, 2008; Eisenstein and Barzilay 2008; Wang and Oard, 2009)
  - Identify subtopic clusters of contributions in a conversation
  - Did not identify the explicit parent-child relationships between contributions

5 Ranking Problem
- Ranking is more suitable than classification
    John: The weather is good.
    Mary: I want to have a picnic.
    Kevin: What’s your plan for this weekend?
- The degree of relatedness between two contributions is conditioned on the relationship between them and other surrounding posts within the discussion.

6 Pairwise Ranking
- Given a pair of posts $p_i$ and $p_j$, we represent their relatedness as the quantity $x_{ij}$
  - $x_{ij} = \mathrm{sim}(p_i, p_j)$
  - where sim is a similarity function
- Model
  - Input: ordered pair $(x_{ij}, x_{ik})$
  - Scoring function: $\mathrm{score}(x_{ij}, x_{ik}) = x_{ij} - x_{ik}$
  - Output: $\mathrm{sign}(\mathrm{score}(x_{ij}, x_{ik}))$
    - +: $x_{ij}$ is ranked higher than $x_{ik}$
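The pairwise decision on this slide can be sketched in a few lines of Python. This is only an illustration: `word_overlap` is a hypothetical stand-in for sim (Jaccard overlap of word sets), not the similarity function used in the talk, and `rank_pair` is a name introduced here.

```python
from typing import Callable

Sim = Callable[[str, str], float]


def word_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in for sim: Jaccard overlap of lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def score(x_ij: float, x_ik: float) -> float:
    """score(x_ij, x_ik) = x_ij - x_ik, as defined on the slide."""
    return x_ij - x_ik


def sign(value: float) -> int:
    """Return +1 or -1, or 0 when the two candidates tie."""
    return (value > 0) - (value < 0)


def rank_pair(p_i: str, p_j: str, p_k: str, sim: Sim = word_overlap) -> int:
    """+1 if p_j is ranked above p_k as the parent of p_i, -1 if below."""
    return sign(score(sim(p_i, p_j), sim(p_i, p_k)))
```

Passing the similarity function as a parameter keeps the ranking step unchanged when a different relatedness measure is swapped in.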

7 Corpus - Usenet
- One of the most active newsgroups, alt.politics.usa
- From June 2003 through 2008
- Parent-child relationships between whole posts are explicit in meta-data
- Statistics
  - 784,708 posts
  - 625,116 posts are explicit responses to others
  - 77,985 discussion threads with 2 or more posts

8 Setting up the Data Set
- For every post $p_i$, we generated one instance $((p_i, p_j), (p_i, p_k))$
  - $p_i$ is the reply message
  - $p_j$ is the correct parent message of $p_i$
  - $p_k$ is a randomly chosen incorrect parent message of $p_i$
    - Random post from the same thread as $p_i$
    - Not the parent of $p_i$
[Figure: the three example posts below, with one pair marked as the positive example and the other as the negative example]
    John: The weather is good.
    Kevin: What’s your plan for this weekend?
    Mary: I want to have a picnic.
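A minimal sketch of how one such instance could be generated, assuming each thread is available as a list of post ids plus a mapping from each post to its parent. The function name, the data layout, and the choice to exclude the reply itself from the negative candidates are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

Instance = Tuple[Tuple[str, str], Tuple[str, str]]


def make_instance(reply: str,
                  parent_of: Dict[str, Optional[str]],
                  thread: List[str],
                  rng: random.Random) -> Optional[Instance]:
    """Build ((p_i, p_j), (p_i, p_k)) for one reply id, or None if impossible."""
    parent = parent_of.get(reply)
    if parent is None:
        return None  # thread root: there is no correct parent
    # Candidates for p_k: same thread, not the reply itself, not its parent.
    candidates = [p for p in thread if p != reply and p != parent]
    if not candidates:
        return None  # two-post thread: no incorrect parent available
    wrong = rng.choice(candidates)
    return (reply, parent), (reply, wrong)
```

Seeding `random.Random` makes the sampled negatives reproducible across runs.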

9 Dividing the Data Set
- Three subsets
  - Learning set (~90% of the data): 90,028
    - Used for constructing LSA space
  - Testing set (~10% of the data): 10,000
    - Evaluation results.

10 Baseline
- Higher lexical cohesion is expected between the reply and the correct parent
- $((p_i, p_j), (p_i, p_k)) \rightarrow (\mathrm{cossim}(p_i, p_j), \mathrm{cossim}(p_i, p_k))$
- Accuracy is 66%
    pos    neg    pos-split  neg-split
    0.30   0.1    H          L
    0.25   0.15   L          L
    …      …      …          …
    0.28   0.22   L          H
    0.35   0.20   H          H
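The baseline comparison can be sketched as follows. The slide does not say how posts are tokenized or weighted, so whitespace tokenization and raw term counts are assumptions here, as is counting ties as errors.

```python
import math
from collections import Counter


def cossim(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two posts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def baseline_accuracy(instances) -> float:
    """Fraction of instances where the correct parent gets the higher cosine."""
    correct = sum(1 for (p_i, p_j), (_, p_k) in instances
                  if cossim(p_i, p_j) > cossim(p_i, p_k))
    return correct / len(instances)
```

Each instance is expected in the ((p_i, p_j), (p_i, p_k)) form described on the data-set slide, with post texts in place of ids.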

11 Error Analysis of Baseline
- Is there a way to amplify indirect connections?
Link info: Single word overlapping
    Initiation: As much as I despise Wal*mart, I have to say "kudos" to them for this. I am by no means conservative, but I think PC has run way to amok when people cannot use the word Christmas
    Response: "Jesus was born, and so I get presents. Thank you, Jesus, for being born." Eric Cartman understands Christmas!
Link info: Synonym, hypernym, etc.
    Initiation: A 100m square array in our Southwest, would provide ALL the electricity the US can consume, now and into the future.
    Response: You have no clue how much energy this country uses, how the power grid works or how much 100 sq mi of panels would produce in the daytime, do you? You don't know what you're talking about.
Link info: Paraphrase
    Initiation: He is right, so the question is, why would a bank give a credit card to someone they know they can never recover their money from, if they decide to just disappear?
    Response: Your history misses something - a lot. As you hint at, its original customer base was Italian immigrants and their descendants in the San Francisco area - NOT a subculture of illegal aliens notorious for changing aliases and related identification at the drop of a hat any time they were in a little trouble under the latest alias.

12 Latent Semantic Analysis (LSA) (Landauer et al., 1998)
- LSA can group semantically related words together
- Represent word meanings in a concept space with dimension k
- Documents are the positive examples in the learning set
[Figure: a term-by-document matrix (terms t1 … tm by documents d1 … dn) is reduced by LSA to a term-by-concept matrix (terms t1 … tm by concepts c1 … ck)]
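As a rough illustration of the input to LSA, the sketch below builds a term-by-document count matrix from raw text. It covers only this first step: the reduction to k concept dimensions (a truncated singular value decomposition in standard LSA) is left out, and the whitespace tokenization and function name are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def term_document_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (terms, matrix) where matrix[t][d] counts term t in document d."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Rows follow the sorted vocabulary so the layout is deterministic.
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix
```

Each row of the matrix corresponds to one term and each column to one document, matching the layout in the figure.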

13 Versions of LSA
- $((p_i, p_j), (p_i, p_k)) \rightarrow (?(p_i, p_j), ?(p_i, p_k))$
- lsa-avg($p_i$, $p_j$)
  - Foltz et al., 1998
- lsa-cart($p_i$, $p_j$)
  - Introduced in this work
[Figure: two diagrams, one per method, each showing post $p_i$ with terms $t_{i1}, t_{i2}, t_{i3}$ and post $p_j$ with terms $t_{j1}, t_{j2}$]
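The slide does not define the two measures, so the following sketch rests on two assumptions: lsa-avg is taken to be the cosine between the averaged concept vectors of the two posts, and lsa-cart is taken to be the mean cosine over every term pair in the Cartesian product of the two posts. The `space` dictionary (term to concept vector) is a hypothetical stand-in for a trained LSA space.

```python
import math
from itertools import product
from typing import Dict, List, Sequence

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine of two vectors; 0.0 if either is empty or all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsa_avg(p_i: List[str], p_j: List[str], space: Dict[str, Vec]) -> float:
    """Assumed lsa-avg: cosine between the mean concept vectors of two posts."""
    def mean(terms: List[str]) -> List[float]:
        vecs = [space[t] for t in terms if t in space]
        if not vecs:
            return []
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return cosine(mean(p_i), mean(p_j))


def lsa_cart(p_i: List[str], p_j: List[str], space: Dict[str, Vec]) -> float:
    """Assumed lsa-cart: mean cosine over all term pairs in p_i x p_j."""
    pairs = [(a, b) for a, b in product(p_i, p_j) if a in space and b in space]
    if not pairs:
        return 0.0
    return sum(cosine(space[a], space[b]) for a, b in pairs) / len(pairs)
```

Under these assumptions lsa-cart computes one cosine per term pair, so its cost grows with the product of the two post lengths, while lsa-avg computes a single cosine per post pair.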

14 Experimental Results
- Two independent factors
  - Text expansion with WordNet (or not)
  - Relatedness representation
- Results
  - Syn+Hyper, Gloss > NoExp
  - LSA-cart > Cos > LSA-avg
  - (LSA-cart, NoExp) > all the others

15 Conclusions and Future Work
- Formalize thread recovery as a ranking problem
- A detailed error analysis of a simple baseline
- A novel variation of LSA
- Future work
  - LSA-cart has higher time complexity
  - Take discourse focus (i.e. salience) into account to select the most informative word pairs
  - Take discourse function (i.e. conversation act) of contributions into account
    - Proposal, Counter-proposal, Acceptance, Rejection …

16 Questions?

17 More Details
- Narrow down to the specific text spans that have the initiation-reply relation.
- The text immediately following the quoted text tends to have an explicit discourse connection with it
    >> Why is the quality of life of the child, mother,
    >> and society at large, more important than the
    >> sanctity of life?
    >
    > Because in the case of anencephaly at least,
    > the life is ended before it begins.
    We disagree on this point. Why do you refuse to provide your very own positive definition of life? Do you believe life begins before birth? At birth? After birth? Never?
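One way to act on this observation is to pair each quoted block with the unquoted text that directly follows it. The sketch below assumes Usenet-style quoting with leading `>` markers and, as a simplification, merges all quote depths into a single quoted span; the function name is introduced here for illustration.

```python
from typing import List, Tuple


def quote_reply_spans(lines: List[str]) -> List[Tuple[str, str]]:
    """Pair each run of '>'-quoted lines with the unquoted text right after it."""
    spans: List[Tuple[str, str]] = []
    quoted: List[str] = []
    reply: List[str] = []
    for line in lines:
        if line.lstrip().startswith(">"):
            if reply:  # a new quote begins, so close the previous pair
                spans.append((" ".join(quoted), " ".join(reply)))
                quoted, reply = [], []
            text = line.lstrip().lstrip("> ").strip()
            if text:
                quoted.append(text)
        elif line.strip() and quoted:
            reply.append(line.strip())
    if quoted and reply:
        spans.append((" ".join(quoted), " ".join(reply)))
    return spans
```

Applied to the post above, this would return one pair: the merged quoted question and answer, followed by the text starting "We disagree on this point."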

18 Taking Advantage of Initiation-Response Relations
- Theoretical aspect
  - What makes conversation coherent?
  - How do people relate to each other?
- Practical aspect
  - Influence prediction (Java et al., 2006; Kale et al., 2006)
  - Newsgroup search (Xi et al., 2004)
    - Meta features extracted from the discussion threads
  - Text classification (Wang et al., 2007)
  - Email summarization (Carenini et al., 2007)
    - Quotation graph to organize conversations

