Trains of Thought: Generating Information Maps Dafna Shahaf, Carlos Guestrin and Eric Horvitz.


1 Trains of Thought: Generating Information Maps Dafna Shahaf, Carlos Guestrin and Eric Horvitz

2 "The abundance of books is a distraction" (Lucius Annaeus Seneca, 4 BC – 65 AD)

3 So, you want to understand a complex topic… Now what?

4 Search Engines are Great, but they do not show how it all fits together

5 Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

6 Real Stories are not Linear

7 Metro Map
– A set of lines
– Each line follows a coherent narrative thread
– Structure + multiple aspects
[Figure: example map with stops austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

8 Map Definition
A map M is a pair (G, Π) where
– G = (V, E) is a directed graph
– Π is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line
[Figure: example map with stops austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
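The definition above can be written down as a small data structure. This is only a sketch: the class name `MetroMap`, its field names, and the choice to validate in the constructor are assumptions made for illustration, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetroMap:
    """Hypothetical container for a map M = (G, paths), with G = (V, E)."""
    vertices: frozenset   # V: article identifiers
    edges: frozenset      # E: directed (u, v) pairs
    lines: tuple          # metro lines; each line is a tuple of vertices

    def __post_init__(self):
        # Every edge must connect known vertices.
        for u, v in self.edges:
            if u not in self.vertices or v not in self.vertices:
                raise ValueError(f"edge {(u, v)} uses an unknown vertex")
        # Every metro line must be a path in G.
        covered = set()
        for line in self.lines:
            for u, v in zip(line, line[1:]):
                if (u, v) not in self.edges:
                    raise ValueError(f"line {line} leaves the graph at {(u, v)}")
                covered.add((u, v))
        # Every edge must belong to at least one metro line.
        uncovered = self.edges - covered
        if uncovered:
            raise ValueError(f"edges on no metro line: {sorted(uncovered)}")


# Usage: two lines that share the stop "bailout".
example = MetroMap(
    vertices=frozenset({"austerity", "bailout", "strike", "Merkel"}),
    edges=frozenset({("austerity", "bailout"), ("bailout", "strike"),
                     ("Merkel", "bailout")}),
    lines=(("austerity", "bailout", "strike"), ("Merkel", "bailout")),
)
```

Validating at construction time means an edge that sits on no line is rejected immediately, which mirrors the third condition of the definition.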

9 Game Plan

10 Properties of a Good Map 1. Coherence ???

11 Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
Coherence is not a property of local interactions.
[Figure: chain of articles 1 to 5 plotted against the words Greece, Europe, Italy, Republican, Protest, Debt default]
Incoherent: each pair shares different words

12 Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
A more-coherent chain.
[Figure: chain of articles 1 to 5 plotted against the words Greece, Austerity, Italy, Republican, Protest, Debt default]
Coherent: a small number of words captures the story

13 Properties of a Good Map 1. Coherence Is it enough?

14 Max-coherence Map
Query: Clinton
– Clinton visits Belfast
– Clinton set for Dublin
– High hopes for Clinton visit
– Clinton, Religious Leaders Share Thoughts
– Church Leaders Praise Clinton's 'Spirituality'
– Religion Leaders Divided on Clinton Moral Issue
– Clinton Should Resign, 2 Religious Leaders Say

15 Properties of a Good Map 1. Coherence 2. Coverage Should cover diverse topics important to the user

16 Coverage Select a small set of diverse articles that covers the most important stories January 17, 2009 Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09]

17 Coverage: The Idea Documents cover concepts: Corpus Coverage
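One way to make "documents cover concepts" concrete is sketched below. The slide does not give a formula, so everything here is an assumption: each document covers each concept to a degree in [0, 1], a set of documents covers a concept if at least one of them does (a probabilistic-OR), and corpus coverage is a weighted sum over concepts. The greedy selector is likewise just a common way to pick a small diverse set under such a function, not necessarily the method used in the talk.

```python
from math import prod


def concept_coverage(selected, doc_cover, concept):
    """Degree to which the selected documents jointly cover one concept.

    Assumed form: 1 - prod(1 - cover_d(concept)) over selected documents.
    """
    return 1.0 - prod(1.0 - doc_cover[d].get(concept, 0.0) for d in selected)


def coverage(selected, doc_cover, weights):
    """Weighted sum of concept coverage (weights are hypothetical importances)."""
    return sum(w * concept_coverage(selected, doc_cover, c)
               for c, w in weights.items())


def greedy_select(doc_cover, weights, k):
    """Pick up to k documents, each time adding the one with the largest gain."""
    chosen = []
    for _ in range(min(k, len(doc_cover))):
        remaining = [d for d in doc_cover if d not in chosen]
        best = max(remaining,
                   key=lambda d: coverage(chosen + [d], doc_cover, weights))
        chosen.append(best)
    return chosen


# Toy data, made up for illustration only.
doc_cover = {
    "a": {"x": 0.5},
    "b": {"x": 0.5, "y": 1.0},
    "c": {"x": 0.5},
}
weights = {"x": 1.0, "y": 1.0}
picked = greedy_select(doc_cover, weights, k=2)
```

Under this form, adding a second document about an already-covered concept helps less than the first did, which is what pushes the selection toward diverse articles.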

18 High-coverage, Coherent Map
– Greek Civil Servants Strike over Austerity Measures
– Greece Paralyzed by New Strike
– Greeks Take to the Streets, but Lacking Earlier Zeal
– Infighting Adds to Merkel’s Woes
– It’s Germany that Matters
– UK Backs Germany’s Effort
– Germany says the IMF should Rescue Greece
– IMF more Likely to Lead Efforts
– IMF is Urged to Move Forward

19 Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity

20 Definition: Connectivity
– Experimented with formulations
– Users do not care about connection type
– Encourage connections between pairs of lines
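A simple score consistent with "encourage connections between pairs of lines" is to count the pairs of lines that share at least one article. The slide does not state the exact formula, so treat the function below as one plausible reading; the name `connectivity` and the article identifiers are hypothetical.

```python
from itertools import combinations


def connectivity(lines):
    """Number of unordered pairs of lines that share at least one article."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


# Toy lines over made-up article ids: only the first two lines intersect.
lines = [("d1", "d2", "d3"), ("d3", "d4"), ("d5", "d6")]
score = connectivity(lines)
```

Counting pairs rather than shared articles means a second intersection between the same two lines adds nothing, in keeping with the remark that users do not care about the connection type.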

21 Tying it all Together: Map Objective
– Coherence: either coherent or not, so it is a constraint
– Coverage: must have!
– Connectivity: nice to have
Consider all coherent maps with maximum possible coverage. Find the most connected one.

22 Game Plan

23 Approach Overview
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase Connectivity

24 Coherence Graph: Main Idea
– Vertices correspond to short coherent chains
– Directed edges between chains which can be conjoined and remain coherent
[Figure: short chains of numbered articles and a longer chain formed by conjoining them]

25 Finding Vertices
Vertices are short, coherent chains
Can use [KDD’10]
– Expensive
– Solving many LPs
Take advantage of simplicity of short stories
– No topic drift
– Sampling-based (fast) algorithm

26 Finding Edges
Problem: combining several strong chains may result in a much-weaker chain
Discontinuity: change of focus

27 m-Coherence
A chain is m-coherent if each sub-chain (d_i, …, d_{i+m}) is coherent.
Control discontinuity points with m, the size of the user's ‘history window’:
– m = length(chain): standard coherence
– m = 1: optimize transitions without context
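The sliding-window check can be sketched as follows. It follows the slide's indexing, where a window runs from d_i to d_{i+m} inclusive. The coherence test itself is not specified on the slide, so it is passed in as a function; `shares_a_word` is a deliberately crude stand-in invented for this example, not the coherence measure from the talk.

```python
def is_m_coherent(chain, m, is_coherent):
    """True if every window (d_i, ..., d_{i+m}) of the chain is coherent.

    `is_coherent` is any predicate on a list of documents (an assumption:
    the real coherence test is not given here).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    window = m + 1  # d_i through d_{i+m}, inclusive
    if len(chain) <= window:
        # The window is at least as long as the chain: standard coherence.
        return is_coherent(chain)
    return all(is_coherent(chain[i:i + window])
               for i in range(len(chain) - window + 1))


def shares_a_word(docs):
    """Toy predicate: the documents (sets of words) have a word in common."""
    return bool(set.intersection(*docs))


# Made-up chain in which each adjacent pair shares a different word.
chain = [{"greece", "debt"}, {"debt", "europe"}, {"europe", "italy"}]
local_only = is_m_coherent(chain, 1, shares_a_word)
with_context = is_m_coherent(chain, 2, shares_a_word)
```

With m = 1 the toy chain passes because every transition looks fine in isolation; with m = 2 the window covers the whole chain and the lack of a common thread shows up.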

28 Observation
If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent.

29 Using the Observation
If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent.
Useful for divide and conquer:
– Add edge if m-1 overlap
[Figure: short chains of numbered articles that overlap, and the longer chain obtained by conjoining two of them]
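The edge rule can be sketched as a suffix/prefix match between short chains. The function names are hypothetical, and `overlap` is passed in directly (the slide sets it to m-1). The example chains are one reading of the numbers left over from the slide's figure, so treat them as illustrative.

```python
def conjoin(first, second, overlap):
    """Join two chains if the last `overlap` documents of `first` equal the
    first `overlap` documents of `second`; otherwise return None."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    if first[-overlap:] == second[:overlap]:
        return first + second[overlap:]
    return None


def build_edges(chains, overlap):
    """Directed edges of the coherence graph: chain -> chains it can extend into."""
    edges = {chain: [] for chain in chains}
    for a in chains:
        for b in chains:
            if a != b and conjoin(a, b, overlap) is not None:
                edges[a].append(b)
    return edges


# Chains are tuples of article ids; with m = 3 the required overlap is 2.
chains = [(1, 2, 3), (2, 3, 4), (2, 3, 5)]
edges = build_edges(chains, overlap=2)
longer = conjoin((1, 2, 3), (2, 3, 5), overlap=2)
```

Because edges are only added between chains that already overlap, longer chains come from walking the graph rather than from re-scoring every candidate from scratch.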

30 Approach Overview
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase Connectivity

31 Finding High-Coverage Chains
Paths correspond to coherent chains.
Problem: find a path of length K maximizing coverage of underlying articles
[Figure: candidate paths through chains of numbered articles]
Cover( ) > Cover( ) ?

32 Reformulation
Paths correspond to coherent chains.
Problem: find a path of length K maximizing coverage of underlying articles
Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial time recursive greedy
– O(log OPT) approximation
Orienteering: the objective is a function of the nodes visited
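The recursive greedy idea can be sketched on a toy graph. This is a simplified illustration, not the implementation from the talk: every edge is assumed to have length 1, the budget counts edges, the recursion depth is a small fixed number, and the graph, the concept sets, and all names are made up. The scheme guesses a middle node and a budget split, solves the first half, then solves the second half for the gain on top of what the first half already covers.

```python
from collections import deque


def shortest_path(graph, s, t):
    """Breadth-first shortest path by edge count; list of nodes, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def recursive_greedy(graph, f, s, t, budget, seen=frozenset(), depth=2):
    """Walk from s to t with at most `budget` edges, aiming for high f-gain.

    `f` maps a set of nodes to a number (assumed monotone submodular);
    `seen` holds nodes already credited by earlier pieces of the walk.
    """
    path = shortest_path(graph, s, t)
    if path is None or len(path) - 1 > budget:
        return None                      # no feasible walk
    if depth == 0:
        return path

    def gain(p):
        return f(seen | set(p)) - f(seen)

    best, best_gain = path, gain(path)
    for mid in graph:                    # guess the middle node
        for b1 in range(budget + 1):     # guess the budget split
            first = recursive_greedy(graph, f, s, mid, b1, seen, depth - 1)
            if first is None:
                continue
            second = recursive_greedy(graph, f, mid, t, budget - b1,
                                      seen | set(first), depth - 1)
            if second is None:
                continue
            candidate = first + second[1:]
            if gain(candidate) > best_gain:
                best, best_gain = candidate, gain(candidate)
    return best


# Toy instance: every node must appear as a key of `graph`.
graph = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"], "t": []}
concepts = {"s": {"x"}, "a": {"x"}, "b": {"y"}, "c": {"z"}, "t": {"w"}}


def f(nodes):
    """Number of distinct concepts covered by a set of nodes."""
    return len(set().union(*(concepts[n] for n in nodes)))


long_walk = recursive_greedy(graph, f, "s", "t", budget=3)
short_walk = recursive_greedy(graph, f, "s", "t", budget=2)
```

With a budget of three edges the sketch takes the longer route through "b" and "c" because it covers more concepts; with two edges only the direct route through "a" fits.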

33 Approach Overview: Recap
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase Connectivity
– Encodes all m-coherent chains as graph paths
– Submodular orienteering [Chekuri & Pal, 2005]: quasipolynomial-time recursive greedy, O(log OPT) approximation

34 Example Map: Greece Debt

35 Game Plan

36 Evaluation
User study
– Document selection: capturing important content?
– Micro-knowledge: question-answering
– Macro-knowledge: high-level summaries
– Effect of structure
New York Times (2008-2010)
– 18K+ articles
– Chile, Haiti, Greece

37 Document Selection
– Experts compose a list of important events
– Subtopic recall (% of events in the map)
[Plot: subtopic recall vs. # lines]
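Subtopic recall as described above is a simple fraction. The sketch below assumes a lookup from each article to the expert-listed events it reports; that lookup, the function name, and the identifiers are hypothetical, since the slide does not say how articles were matched to events.

```python
def subtopic_recall(expert_events, map_articles, article_events):
    """Fraction of expert-listed events reported by at least one map article."""
    if not expert_events:
        raise ValueError("expert_events must not be empty")
    covered = set()
    for article in map_articles:
        covered |= article_events.get(article, set())
    return len(covered & set(expert_events)) / len(expert_events)


# Made-up example: the map touches two of the four listed events.
expert_events = {"e1", "e2", "e3", "e4"}
article_events = {"a1": {"e1"}, "a2": {"e1", "e2"}, "a3": {"e4"}}
recall = subtopic_recall(expert_events, ["a1", "a2"], article_events)
```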

38 Micro-Knowledge (Question Answering)
Mechanical Turk
Competitors:
– Google News
– Event threading (TDT) [Nallapati et al, 04]
– Structureless maps
Results: minor gains
– Map structure helps
Question 2: How many miners were trapped?

39 Macro-Knowledge (High-Level Summaries)
Summarize complex story in a paragraph
– Maps vs. Google News
– ~15 paragraphs per task
MTurk to evaluate paragraphs:
– Which paragraph provided a more complete and coherent picture of the story?
– Justification: Paragraph A is more…
– ~300 evaluations per task

40 Macro-Knowledge: Results
Greece: 72% prefer maps
– Justifications: [Figure: justifications for Maps vs. Google News]
Haiti: 59% prefer maps
– Map users mostly summarized one story line
Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

41 Conclusions
– Formulated metrics characterizing good maps
– Efficient methods with theoretical guarantees
– User studies highlight the promise of the method
– Website on the way!
– Personalization
Thank you!


43 Finding Coherent Chains
Goal: represent all coherent chains
Problem: intractable
Divide and conquer:
– Find short coherent chains
– Concatenate to form longer coherent chains

44 Website

