Kiran Garimella.  News  Scientific papers  Email  Search Queries  Twitter ◦ Gender ◦ Relationships ◦ Migration ◦ Politics.


1 Kiran Garimella

2

3

4

5
• News
• Scientific papers
• Email
• Search Queries
• Twitter
  ◦ Gender
  ◦ Relationships
  ◦ Migration
  ◦ Politics

6  I’m a..  Just kidding!

7
• Link structure
• Connected text
• Hidden structure/patterns
• This talk
  ◦ Summarizing scientific articles
  ◦ Political trends from search queries
  ◦ Romantic relationship breakups on Twitter

8

9
• Motivation
  ◦ Not many existing systems
  ◦ Completely different from news document summarization
  ◦ Many topics
  ◦ Strong citation network
  ◦ Precise structure: Introduction, Related work, Experiments, etc.

10 [Diagram labels: Papers; New paper; Model; Relevant Sentences; Irrelevant Sentences; Categories (Aim, Own, Background, Contrast, Other); Categorized Sentences; Final Summary]

11
• Manually annotating is a very tedious and difficult job
• Final summary depends on the classification accuracy
• Summary might depend on the training data

12
• Make use of the strong citation network

13  Page Rank?

14 [Diagram: Citations. Paper A: X1, X2, X3, X4. Paper B: X1, X5. Paper C: X1, X5, X7.]
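
As an illustration of ranking papers through the citation network, here is a minimal PageRank sketch using power iteration. The edge list is one reading of the diagram (Paper A cites X1 to X4, Paper B cites X1 and X5, Paper C cites X1, X5 and X7). That reading, the damping factor and the iteration count are all assumptions, not something stated in the talk.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {paper: [cited papers]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # A citing paper splits its score among the papers it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Papers with no outgoing citations spread their score evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical edge list read off the citation diagram.
citations = {
    "A": ["X1", "X2", "X3", "X4"],
    "B": ["X1", "X5"],
    "C": ["X1", "X5", "X7"],
}
ranks = pagerank(citations)
```

In this toy graph the paper cited by all three sources, X1, ends up with the highest score.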

15 [Diagram labels: Paper to be Summarized (X); search; Citation 1; Citation 2; Extracted Citation Sentences; classify; Topics; +ve; -ve; Summary Sentences from X]

16
• Contains the negative points of a paper too.
• Different viewpoints covered.
• Can be useful to create a survey.
• Did not work
  ◦ Not many negative statements made
  ◦ Difficult to classify as positive or negative

17 Example:

18
• Split text into sentences? Paragraphs?
• Text tiling to the rescue
• Text tiling: a technique for automatically subdividing texts into multi-paragraph units that represent passages, or subtopics.
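
The idea behind text tiling can be sketched in a few lines. This is a heavily simplified illustration, not the full TextTiling algorithm and not the implementation used in the talk: it compares the bag of words before and after each sentence gap and starts a new tile wherever the cosine similarity drops below a threshold. The function names, window size and threshold are all assumptions.

```python
import math
import re
from collections import Counter


def bag(sentences):
    """Bag of lowercase words for a list of sentences."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def tile(sentences, window=2, threshold=0.1):
    """Group sentences into tiles, cutting where lexical similarity is low."""
    tiles, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        gap = i + 1
        if gap == len(sentences):
            break
        before = bag(sentences[max(0, gap - window):gap])
        after = bag(sentences[gap:gap + window])
        if cosine(before, after) < threshold:
            tiles.append(current)
            current = []
    tiles.append(current)
    return tiles


sentences = [
    "chunking uses machine learning",
    "machine learning models do chunking",
    "pizza sauce tomato",
    "tomato pizza cheese",
]
tiles = tile(sentences)  # two tiles: the chunking sentences, then the pizza ones
```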

19
Various Machine Learning approaches have been proposed for chunking. (a,b,c,d)
Chunking is a widely used technique in Natural language processing.
Under the same shallow structure..
Step I – Extract text tiles
Step II – Cluster cited papers

20
Various Machine Learning approaches have been proposed for chunking. (a,b,c,d)
Step III – Extract keywords from text tiles
Step IV – Search for keywords in the clusters obtained in Step II
Step V – Rank relevant sentences and present to the user
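
Steps III to V can be sketched roughly as follows. This is a minimal illustration only. The frequency-based keyword extraction, the tiny stopword list and the keyword-overlap score are assumptions standing in for whatever the actual system used, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"a", "an", "and", "been", "for", "have", "in", "is", "of", "the", "to"}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text_tile, k=3):
    """Step III: take the k most frequent non-stopword terms of a text tile."""
    counts = Counter(w for w in words(text_tile) if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def rank_sentences(keywords, sentences):
    """Steps IV and V: keep sentences that mention a keyword, best match first."""
    keyword_set = set(keywords)
    scored = [(len(keyword_set & set(words(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [s for score, s in scored if score > 0]


tile_text = (
    "Chunking is a widely used technique. "
    "Machine learning approaches have been proposed for chunking."
)
keywords = extract_keywords(tile_text)
ranked = rank_sentences(
    ["chunking", "learning"],
    ["We use learning for chunking.", "Parsing is hard.", "Chunking is shallow."],
)
```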

21 Pipeline
[Diagram labels: User; Search Paper; Viewing Module; Search Module; Text tiling Module; Generate Text Tiles; Cluster cited papers; Extract Context; Clustering Module; Rank Sentences; Ranking Module; Citation Sentences; Summary; Presentation Module]
Link: http://ankara.lti.cs.cmu.edu/~nitina/cgi-bin/summarization/summarizer.html

22

23 http://politicalsearchtrends.sandbox.yahoo.com/

24 Left-leaning blogs (387) | Right-leaning blogs (644)
From Benkler and Shaw, “A tale of two blogospheres” (2010), and Wonkosphere Blog Directory

25
Use self-provided age and gender and ZIP-derived estimates
People clicking on right-leaning blogs:
– Are older (50 vs. 45 years)
– Are more male (63% vs. 55%)
– Are more white (81% vs. 78%)
– More likely to study at La Sapienza (92.3% vs. 11.4%)
All these trends agree with voters' demographics

26 From Blogs to Queries
“huffingtonpost.com” is left-leaning, so a click on it counts as a left-leaning vote for the query “pizza is a vegetable”
Aggregate votes across all clicks on political blogs to compute overall leaning
v_L = left clicks for query
V_L = total left clicks
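
A minimal sketch of how such votes could be aggregated into a per-query leaning. The slide defines only v_L and V_L. The right-side counterparts v_R and V_R, the normalized-ratio formula and the additive smoothing are assumptions made for illustration, not the formula from the talk.

```python
from collections import Counter


def query_leaning(clicks, smoothing=1.0):
    """Return {query: leaning in [0, 1]}, where values above 0.5 lean left.

    clicks: list of (query, side) pairs with side "L" or "R", one per click
    on a left- or right-leaning blog.
    """
    clicks = list(clicks)
    left = Counter(q for q, side in clicks if side == "L")
    right = Counter(q for q, side in clicks if side == "R")
    queries = set(left) | set(right)
    n = len(queries)
    total_left = sum(left.values())    # V_L
    total_right = sum(right.values())  # V_R (assumed counterpart)
    leaning = {}
    for q in queries:
        # Smoothed share of all left (right) clicks that came from this query.
        frac_left = (left[q] + smoothing) / (total_left + smoothing * n)
        frac_right = (right[q] + smoothing) / (total_right + smoothing * n)
        leaning[q] = frac_left / (frac_left + frac_right)
    return leaning


# Hypothetical click log for two queries.
log = [("a", "L")] * 3 + [("a", "R")] + [("b", "R")] * 3 + [("b", "L")]
scores = query_leaning(log)
```

Normalizing by the total clicks on each side keeps the score from being skewed when one side simply has more blogs or more traffic.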

27 Some background first

28
• Largest known knowledge repository
• Covers wide range of domains
• Manually tagged hierarchical categorization system
• Frequently updated
• Well built link structure
• Categories
  ◦ Pages
    ▪ Links

29

30

31

32

33 Examples using Wikipedia mapping for 6 months of data, July 4, 2011 – January 8, 2012.
Queries for Wikipedia entity “Patient Protection & Affordable Care Act”:
obama healthcare bill text (.91) | who pays for obamacare (.04)
obama health care privileges (.83) | obamacare reaches the supreme court (.09)
is affordable care act unconstitutional (.78) | is obamacare constitutional (.16)
Queries for Wikipedia category “Occupy”:
who started occupy wall street (.94) | occupy wall street rape (.09)
we are the 99% (.91) | occupy movement violence (.25)
occupy movement supporters (.78) | crime in occupy movement (.44)

34 Mapping Queries to Statements
“cost obama trip to india”
364 distinct queries mapped to true facts
574 distinct queries mapped to false facts

35

36  Small pieces of text, which may not give a lot of information, can be enhanced using external knowledge sources.

37

38 * Fake profiles. 28-hour snapshot of Twitter from July 2013.

39 [Timeline labels: Nov 4, 2013; Nov 11, 2013; Nov 25, 2013; ……; Feb 23, 2014 (BREAKUP); Apr 24, 2014]
Tweets, mutual friendships and profile information collected every week.
Data collected for 24 weeks.

40

41 Before breakup | After breakup

42 After?
Source: http://cdn.sheknows.com/articles/young-couple-fighting-against-red-background.jpg, http://myjoyofliving.com/wp-content/uploads/2013/02/couple-fighting.jpg

43 Before breakup | After breakup

44

45
• Don’t break up and fight publicly
• Word clouds are an easy way to get an overview

46
• Use entity extraction on the abstracts.
• Co-occurring entities might indicate something.
• Create an entity co-occurrence graph.
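
A minimal sketch of the co-occurrence graph idea, assuming entity extraction has already been run and each abstract is represented by a list of entity strings. The function name and the sample entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(entity_lists):
    """Count how often each unordered pair of entities shares an abstract.

    Returns a Counter mapping (entity_a, entity_b) to the number of
    abstracts containing both, with each pair stored in sorted order.
    """
    edges = Counter()
    for entities in entity_lists:
        # Deduplicate within an abstract so each pair counts once per abstract.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical output of an entity extractor run on two abstracts.
abstracts = [
    ["Twitter", "breakup"],
    ["breakup", "Twitter", "gender"],
]
graph = cooccurrence_graph(abstracts)
```

Edge weights can then be thresholded or visualized to surface entity pairs that co-occur unusually often.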

47

48 Kiran.garimella@aalto.fi @gvrkiran

