
1 Query Chain Focused Summarization
Tal Baumel, Rafi Cohen, Michael Elhadad
Jan 2014

2 Generic Summarization
Generic extractive multi-document summarization:
– Given a set of documents {D_i}
– Identify a set of sentences S s.t. |S| < L
– The "central information" in {D_i} is captured by S
– S does not contain redundant information
Representative methods: KLSum, LexRank (a KLSum-style sketch follows below)
Key concepts: centrality, redundancy
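A minimal sketch of KLSum-style greedy selection (an assumption-laden reconstruction, not the authors' code): sentences are added one at a time so that the unigram distribution of the growing summary stays closest, in KL divergence, to the distribution of the full document set. Add-one smoothing and whitespace tokenization are illustrative choices.

```python
from collections import Counter
import math

def unigram_dist(words, vocab):
    """Add-one-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(p, q):
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def klsum(sentences, max_words):
    """Greedily pick sentences minimizing KL(corpus || summary)."""
    all_words = [w for s in sentences for w in s.lower().split()]
    vocab = set(all_words)
    corpus_p = unigram_dist(all_words, vocab)
    summary, summary_words, candidates = [], [], list(sentences)
    while candidates and len(summary_words) < max_words:
        best = min(candidates, key=lambda s: kl_divergence(
            corpus_p, unigram_dist(summary_words + s.lower().split(), vocab)))
        summary.append(best)
        summary_words += best.lower().split()
        candidates.remove(best)
    return summary
```

LexRank would instead rank sentences by eigenvector centrality in a sentence-similarity graph; the greedy KL objective above directly operationalizes "centrality without redundancy".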

3 Update Summarization
Given a set of documents split into a background set A = {a_i} and a new set B = {b_j}
Select a set of sentences S s.t.:
– |S| < L
– S captures the central information in B
– S does not repeat information conveyed by A
Key concepts: centrality, redundancy, novelty (a sketch of the novelty penalty follows)
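A hedged sketch of the extra constraint the update setting adds: any centrality score can be discounted by a candidate's maximum overlap with the background set A. The Jaccard overlap and the weight `beta` are illustrative assumptions, not taken from the slides.

```python
def jaccard(s1, s2):
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def update_score(sentence, centrality, background_sentences, beta=1.0):
    """Centrality in B, discounted by redundancy against the background A."""
    redundancy = max((jaccard(sentence, a) for a in background_sentences),
                     default=0.0)
    return centrality - beta * redundancy
```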

4 Query-Focused Summarization
Given a set of documents {D_i} and a query Q
Select a set of sentences S s.t.:
– |S| < L
– S captures the information in {D_i} relevant to Q
– S does not contain redundant information
Key concepts: relevance, redundancy (a relevance-scoring sketch follows)
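A minimal sketch of how relevance to Q can be folded into sentence scoring; the linear mixture and the term-overlap measure are illustrative assumptions.

```python
def query_overlap(sentence, query):
    """Fraction of query terms appearing in the sentence."""
    s, q = set(sentence.lower().split()), set(query.lower().split())
    return len(s & q) / len(q) if q else 0.0

def query_focused_score(sentence, centrality, query, alpha=0.5):
    """Mix generic centrality with query relevance."""
    return alpha * centrality + (1 - alpha) * query_overlap(sentence, query)
```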

5 Query-Chain Focused Summarization
We define a new task to clarify the relations among key concepts:
– Relevance
– Novelty
– Contrast
– Similarity
– Redundancy
The task is also useful for exploratory search.

6 QCFS Task
Given a set of topic-related documents {D_i} and a chain of queries q_1, ..., q_n
Output a chain of summaries S_1, ..., S_n s.t.:
– |S_j| < L
– S_j is relevant to q_j
– S_j does not contain information already conveyed in S_l for l < j
(the control flow this implies is sketched below)
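A sketch of the control flow the task definition implies, not the authors' algorithm: each query q_j is answered against the same document set, while sentences too similar to any earlier summary are filtered out. `score_relevance` and `too_similar` are hypothetical placeholders (e.g., the functions sketched above).

```python
def qcfs(sentences, query_chain, max_words, score_relevance, too_similar):
    """Produce one summary per query, avoiding content shown earlier in the chain."""
    summaries, history = [], []
    for query in query_chain:
        ranked = sorted(sentences,
                        key=lambda s: score_relevance(s, query), reverse=True)
        summary, length = [], 0
        for sent in ranked:
            if length >= max_words:
                break
            # enforce the chain constraint: nothing already conveyed in S_l, l < j
            if any(too_similar(sent, prev) for prev in history + summary):
                continue
            summary.append(sent)
            length += len(sent.split())
        summaries.append(summary)
        history.extend(summary)
    return summaries
```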

7 Query Chains
Query chains are observed in query logs:
– Mined from PubMed search logs
– Query chains (length 3) extracted from the same session / with related terms (verified manually)
Query-chain evolution may correspond to:
– Zooming in (asthma → atopic dermatitis)
– Query reformulation (respiratory problem → pneumonia)
– Focus change (asthma → cancer)

8 Query Chains vs. Novelty Detection
TREC Novelty Detection Task (2005):
– Task 1: Given a set of documents for the topic, identify all relevant and novel sentences.
– Task 2: Given the relevant sentences in all documents, identify all novel sentences.
– Task 3: Given the relevant and novel sentences in the first 5 docs only, find the relevant and novel sentences in the remaining docs.
– Task 4: Given the relevant sentences from all documents and the novel sentences from the first 5 docs, find the novel sentences in the remaining docs.

9 Novelty Detection Task
Create 50 topics:
– Compose a topic (textual description)
– Select 25 relevant docs from a news collection
– Sort the docs chronologically
– Mark relevant sentences
– Among relevant sentences, mark the novel ones (not covered by previous relevant sentences)
– 28 "event" topics / 22 "opinion" topics

10 TREC Novelty – Dataset Analysis
Select parts of documents (not full docs).
                         Events   Opinion
Relevant rate:           25%      15%
Consecutive sentences:   85%      65%
Relevant agreement:      68%      50%
Novelty rate:            38%      42%
Novelty agreement:       45%      29%

11 TREC Novelty Methods
Relevance = similarity to the topic; novelty = dissimilarity to past sentences (a baseline sketch follows).
Methods:
– tf.idf and Okapi with a threshold for retrieval
– Topic expansion
– Sentence expansion
– Named entities as features
– Coreference resolution
– Named-entity normalization (entity linking)
Results:
– High recall / low precision
– Almost no distinction between relevant and novel
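A hedged sketch of the two-threshold baseline behind these methods: a sentence is relevant if its cosine similarity to the topic clears one threshold, and novel if its maximum similarity to previously seen relevant sentences stays below another. Raw count vectors and the threshold values are illustrative choices.

```python
from collections import Counter
import math

def cosine(c1, c2):
    """Cosine similarity between two Counter term vectors."""
    dot = sum(v * c2[w] for w, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def relevant_and_novel(sentences, topic, rel_t=0.2, nov_t=0.8):
    topic_vec = Counter(topic.lower().split())
    seen, selected = [], []
    for sent in sentences:
        vec = Counter(sent.lower().split())
        if cosine(vec, topic_vec) < rel_t:
            continue                          # not relevant to the topic
        if all(cosine(vec, old) <= nov_t for old in seen):
            selected.append(sent)             # relevant and novel
        seen.append(vec)                      # novelty is judged against all prior relevant sentences
    return selected
```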

12 QCFS and Contrast
QCFS differs from query-focused summarization:
– When generating S_2, we must take S_1 into account.
QCFS differs from update summarization:
– The split A/B is not observed.
QCFS differs from novelty detection:
– Chronology is not relevant.
Key concepts:
– Query relevance
– Query distinctiveness (how q_{i+1} contrasts with q_i)

13 Contrastive IR
CWS: A Comparative Web Search System (Sun et al., WWW 2006)
Given two queries q1 and q2, rank a set of "contrastive pairs" (p1, p2), where p1 and p2 are snippets of relevant docs.
Method (a sketch of the scoring and ranking loop follows):
– Retrieve relevant snippets SR1 = {p1i} and SR2 = {p2j}
– Score(p1, p2) = a·R(p1, q1) + b·R(p2, q2) + c·T(p1, p2, q1, q2)
– T(p1, p2, q1, q2) = x·Sim(url1, url2) + (1 − x)·Sim(p1 \ q1, p2 \ q2)
– Greedy ranking of pairs: rank all pairs (p1, p2) by score, take the top pair, remove p1_top and p2_top from all remaining pairs, and iterate
– Cluster pairs into comparative clusters
– Extract terms from the comparative clusters
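A minimal reconstruction of the scoring and greedy ranking loop described above; the relevance function R, the contrast function T, and the weights a, b, c are passed in as parameters because the slide does not fix their implementations.

```python
import itertools

def rank_contrastive_pairs(SR1, SR2, q1, q2, R, T, a=1.0, b=1.0, c=1.0):
    """SR1/SR2: snippets relevant to q1/q2. Returns greedily ranked pairs."""
    def score(p1, p2):
        return a * R(p1, q1) + b * R(p2, q2) + c * T(p1, p2, q1, q2)
    pairs = list(itertools.product(SR1, SR2))
    ranked = []
    while pairs:
        p1, p2 = max(pairs, key=lambda pr: score(*pr))
        ranked.append((p1, p2))
        # drop every remaining pair that reuses either chosen snippet, then iterate
        pairs = [(x1, x2) for x1, x2 in pairs if x1 != p1 and x2 != p2]
    return ranked
```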

14 Document Clustering
A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results (Kummamuru et al., WWW 2004)
Desirable properties of a clustering:
– Coverage
– Compactness
– Sibling distinctiveness
– Reach time
Incremental algorithm (a concept-scoring sketch follows):
– Decide on the width n of the tree (# children per node)
– Nodes are represented by "concepts" (terms)
– Rank concepts by score and add the top ones under the current node
– Score(S_a^k, c_j) = a·ScoreC(S_a^{k-1}, c_j) + b·ScoreD(S_a^{k-1}, c_j)
– ScoreC = document coverage
– ScoreD = sibling distinctiveness
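A sketch of the concept-scoring step under stated assumptions: documents are sets of terms, coverage is the fraction of documents containing the concept, and distinctiveness is the fraction of the concept's documents not already covered by its sibling concepts. The paper's exact ScoreC/ScoreD definitions may differ.

```python
def coverage(concept, docs):
    """Fraction of documents (term sets) containing the concept."""
    return sum(1 for d in docs if concept in d) / len(docs) if docs else 0.0

def distinctiveness(concept, siblings, docs):
    """Fraction of the concept's documents not covered by existing siblings."""
    mine = {i for i, d in enumerate(docs) if concept in d}
    covered = {i for i, d in enumerate(docs) if any(s in d for s in siblings)}
    return len(mine - covered) / len(mine) if mine else 0.0

def pick_children(candidates, docs, width, a=0.5, b=0.5):
    """Greedily choose `width` child concepts for the current node."""
    chosen = []
    for _ in range(width):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        chosen.append(max(remaining,
                          key=lambda c: a * coverage(c, docs)
                                        + b * distinctiveness(c, chosen, docs)))
    return chosen
```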

