Taming Information Overload Carlos Guestrin Khalid El-Arini Dafna Shahaf Yisong Yue.

1 Taming Information Overload Carlos Guestrin Khalid El-Arini Dafna Shahaf Yisong Yue

2

3 Search Limitations. Input, Output, Interaction: today, interaction is just a new query.

4 Our Approach. Input: phrase complex information needs. Output: structured, annotated output. Interaction: richer forms of interaction, beyond a new query.

5 Millions of blog posts published every day Some stories become disproportionately popular Hard to find information you care about

6 Our goal: coverage. Turn down the noise in the blogosphere: select a small set of posts that covers the most important stories. January 17, 2009.

7 Our goal: coverage. Turn down the noise in the blogosphere: select a small set of posts that covers the most important stories.

8 Our goal: personalization. Tailor post selection to user tastes. Posts selected without personalization prompt the user: “But, I like sports! I want articles like:”; the selection changes after personalization based on Zidane’s feedback.

9 TDN outline [El-Arini, Veda, Shahaf, G. ’09]. Coverage: formalize the notion of covering the blogosphere; near-optimal solution for post selection; evaluate on real blog data against existing systems. Personalization: learn a personalized coverage function; algorithm for learning user preferences using limited feedback; evaluate on real blog data.

10 Approach Overview. Blogosphere → Feature Extraction → Coverage Function → Post Selection, plus Personalization.

11 Document Features. Low level: words, noun phrases, named entities (e.g., Obama, China, peanut butter). High level: topics, where a topic is a probability distribution over words (e.g., an Inauguration topic, a National Security topic).

12 Coverage Function. cover_d(f) = the amount by which post d covers feature f; cover_A(f) = the amount by which the set A covers f. Some features are more important than others, e.g., weigh by frequency. A post never covers a feature completely, so use a soft notion of coverage, e.g., the probability that at least one post in A covers feature f.
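The soft coverage above can be sketched in a few lines (a minimal illustration: the real objective also weighs features by importance, and the per-post coverage probabilities here are assumed inputs):

```python
def soft_cover(cover_probs):
    """Soft coverage of one feature f by a set of posts A: the probability
    that at least one post covers f, given each post's individual
    coverage probability cover_d(f)."""
    p_not_covered = 1.0
    for p in cover_probs:
        p_not_covered *= 1.0 - p
    return 1.0 - p_not_covered
```

For example, two posts that each cover a feature with probability 0.5 jointly cover it with probability 0.75, so near-duplicate posts yield diminishing returns.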

13 Objective Function for Post Selection. Select the set of posts shown to the user to maximize weighted coverage over the feature set (with weights on features). Maximizing is NP-hard, but the probability that set A covers feature f is submodular. Greedy algorithm: a (1-1/e)-approximation. CELF algorithm: very fast, same guarantees.
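The greedy/CELF selection can be sketched as follows (a simplified sketch: `coverage` is a toy unweighted objective over hypothetical feature sets; CELF's lazy evaluation relies on submodularity, which guarantees cached marginal gains can only shrink):

```python
import heapq

def celf(items, f, k):
    """Lazy greedy (CELF) for a monotone submodular set function f.
    Returns indices of up to k selected items, with the same (1-1/e)
    guarantee as plain greedy but far fewer function evaluations."""
    selected = []
    f_sel = f([])
    # Min-heap of negated marginal gains (stale gains are upper bounds).
    heap = [(-(f([i]) - f_sel), i) for i in range(len(items))]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, i = heapq.heappop(heap)
        gain = f(selected + [i]) - f_sel   # re-evaluate vs. current set
        if not heap or gain >= -heap[0][0]:
            selected.append(i)
            f_sel += gain
        else:
            heapq.heappush(heap, (-gain, i))
    return selected

# Toy coverage objective over hypothetical per-post feature sets:
features = [{"obama", "china"}, {"china"}, {"nba", "tennis", "obama"}]
def coverage(idxs):
    return len(set().union(*(features[i] for i in idxs))) if idxs else 0
```

On this toy input, `celf(features, coverage, 2)` first picks the post with three new features, then the one adding the most beyond it.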

14 Approach Overview. Blogosphere → Feature Extraction → Coverage Function → Post Selection (submodular function optimization), plus Personalization.

15 Evaluating Coverage. Real blog data from Spinn3r: a 2-week period in January 2009, ~200K posts per day (after pre-processing). Two variants of our algorithm: TDN+LDA (high-level features: Latent Dirichlet Allocation topics) and TDN+NE (low-level features: named entities). User study involving 27 subjects to evaluate topicality and redundancy.

16 Topicality = relevance to current events. Given reference stories (ground truth) and a post for evaluation, e.g., “Downed jet lifted from ice-laden Hudson River. NEW YORK (AP) - The airliner that was piloted to a safe emergency landing in the Hudson…”: is this post topical, i.e., is it related to any of the major stories of the day?

17 User study: Are the stories important that day? (Topicality; higher is better.) TDN+LDA uses LDA topics as features; TDN+NE uses named entities and common noun phrases. We do as well as Yahoo! and Google.

18 Evaluation: Redundancy. 1. Israel unilaterally halts fire as rockets persist. 2. Downed jet lifted from ice-laden Hudson River. 3. Israeli-trained Gaza doctor loses three daughters and niece to IDF tank shell. 4. ... Is this post redundant with respect to any of the previous posts?

19 User study: Are the stories redundant with each other? (Lower is better.) Google performs poorly; we do as well as Yahoo!.

20 User study summary. Topicality (higher is better) and redundancy (lower is better). Google: good topicality, but high redundancy. Yahoo!: performs well on both, but uses rich features (CTR, search trends, user voting, etc.). TDN performs as well as Yahoo! using only post content.

21 TDN outline. Coverage: formalize the notion of covering the blogosphere; near-optimal solution for post selection; evaluate on real blog data against existing systems. Personalization: learn a personalized coverage function; algorithm for learning user preferences using limited feedback; evaluate on real blog data.

22 Personalization People have varied interests Our Goal: Learn a personalized coverage function using limited user feedback Barack Obama Britney Spears

23 Personalized post selection. The pipeline Blogosphere → coverage function → post selection becomes Blogosphere → personalized coverage function → personalized post selection. Learning your coverage function is an online learning of submodular functions problem; we give an algorithm with strong theoretical guarantees.

24 Modeling User Preferences. A weight π_f represents the user’s preference for feature f; we want to learn this preference over the features (high on sports features for a sports fan, high on politics features for a politico). The objective combines user preference with the importance of the feature in the corpus.

25 Exploration vs Exploitation Goal: want to recommend content that users like Exploiting previous feedback from users However: users only provide feedback on recommended content Explore to collect feedback for new topics Solution: algorithm to balance exploration vs exploitation Linear Submodular Bandits Problem

26 Balancing Exploration & Exploitation [Yue & Guestrin, 2011]. For each slot, maximize a trade-off between the estimated coverage gain and the uncertainty of the estimate (e.g., pick an article about tennis when its mean estimate plus uncertainty is highest across topics).
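A minimal sketch of that per-slot trade-off (the actual algorithm of Yue & Guestrin optimizes a submodular gain with proper confidence intervals; `mean`, `var`, and `alpha` here are illustrative stand-ins):

```python
import math

def pick_topic(topics, mean, var, alpha=1.0):
    """Select the topic with the best optimistic estimate:
    estimated coverage gain plus an exploration bonus for uncertainty.
    alpha controls how aggressively we explore."""
    return max(topics, key=lambda t: mean[t] + alpha * math.sqrt(var[t]))
```

With exploration (alpha > 0), an uncertain topic like tennis can win a slot even when its mean estimate is lower; with alpha = 0, the choice is purely exploitative.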

27 Learning User Preferences (Approach 2) [Yue & Guestrin, 2011]. Loop: recommend articles via the explore-exploit trade-off, collect feedback, update the preference estimate with a least-squares update (shown before any feedback, after 1 day, and after 2 days of personalization). Theorem: the bandit algorithm achieves no-regret: the gap between (1-1/e) times the average value under the true π* and the average value under the learned π goes to 0, i.e., we learn a good approximation of the true π*, at rate 1/sqrt(kT) when recommending k documents over T rounds.

28 User Study: Online Learning for News Recommendation. 27 users in the study. Counting wins/ties/losses, submodular bandits wins against static weights, against multiplicative updates (no exploration), and against no-diversity bandits.

29 We Synthesist [Guestrin, Khan ’09]

30 Our Approach. Input: phrase complex information needs. Output: structured, annotated output. Interaction: richer forms of interaction. What about more complex information needs?

31 A Prescient Warning. Today: 10^7 papers in 10^5 conferences/journals (Thomson Reuters Web of Knowledge). How do we cope? “As long as the centuries…unfold, the number of books will grow continually… as convenient to search for a bit of truth concealed in nature as to find it hidden away in an immense multitude of bound volumes” - Denis Diderot, Encyclopédie (1755)

32 Motivation (1) Is there an approximation algorithm for the submodular covering problem that doesn’t require an integral-valued objective function? Any recent papers influenced by this?

33 Motivation (2). It’s 11:30pm Samoa Time. Your “Related Work” section is a bit sparse. Here are some papers we’ve cited so far. Anything else?

34 Given a set of relevant query papers, what else should I read?

35 An example query set. A paper cited by all query papers: a seminal/background paper? A paper that cites all query papers: a competing approach? However, we are unlikely to find papers directly connected to the entire query set; we need something more general.

36 Select a set of papers A with maximum influence to/from the query set Q

37 Modeling influence Ideas flow from cited papers to citing papers

38 Influence context. Why do I cite this paper? For ideas such as: generative model of text, variational inference, EM, … We call these concepts.

39 Concept representation. Words, phrases, or important technical terms; proteins, genes, or other advanced features. Our assumption: influence always occurs in the context of concepts.

40 Influence by Word. Influence with respect to the word “bayesian”, among: Probabilistic Latent Semantic Indexing; Modern Information Retrieval; Text Classification… Using EM; Introduction to Variational Methods for Graphical Models; Latent Dirichlet Allocation.

41 Influence by Word. Influence with respect to the word “simplex”, among the same papers.

42 Influence by Word. Influence with respect to the word “mixture”, among the same papers. The weight on edge (u, v) in G_c represents the strength of direct influence from u to v: what is the chance that a random instance of concept c in v came about as a result of u?

43 Influence: Definition. Each edge (u, v) in G_c is active with some probability. Influence_c(x → y): the probability that there exists a path in G_c between x and y consisting of active edges. Computing it is #P-complete, but there is an efficient approximation using sampling or dynamic programming.
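The sampling approximation mentioned above can be sketched like this (a minimal Monte Carlo estimate; the edge-probability graph `graph` is a hypothetical stand-in for G_c, and edges are sampled lazily during the traversal, which is equivalent for reachability):

```python
import random

def influence_mc(graph, x, y, trials=2000, seed=0):
    """Monte Carlo estimate of Influence_c(x -> y): the probability that a
    path of active edges exists from x to y, where edge (u, v) is active
    independently with probability graph[u][v]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one random graph lazily while traversing from x.
        frontier, seen = [x], {x}
        while frontier:
            u = frontier.pop()
            for v, p in graph.get(u, {}).items():
                if v not in seen and rng.random() < p:
                    seen.add(v)
                    frontier.append(v)
        hits += y in seen
    return hits / trials
```

The fraction of sampled graphs in which y is reachable from x converges to the influence probability as `trials` grows.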

44 Putting it all together. Maximize F_Q(A) s.t. |A| ≤ k (output k papers), where F_Q(A) combines, over the query set Q and the concept set, the prevalence of concept c in paper q with the probability of influence between q and at least one paper in the selected set A. A submodular optimization problem, solved efficiently and near-optimally.

45 But should all users get the same results?

46 Personalized trust. Different communities trust different researchers for a given concept, e.g., “network”: Kleinberg? Hinton? Pearl? Goal: estimate personalized trust from limited user input.

47 Specifying trust preferences. Specifying trust should not be an onerous task. Assume a given (non-exhaustive!) set of trusted papers B, e.g., a BibTeX file of all the researcher’s previous citations; a short list of favorite conferences and journals; someone else’s citation history! (a committee member? journal editor? someone in another field? a Turing Award winner?)

48 Computing trust. How much do I trust Jon Kleinberg with respect to the concept “network”? An author is trusted if he/she influences the user’s trusted set B (e.g., influence from Kleinberg’s papers into B).

49 Personalized Objective. Add extra weight in the influence term: does the user trust at least one of the authors of d with respect to concept c? This modulates the probability of influence between q and at least one paper in A.

50 Personalized Recommendations: general recommendations vs. personalized recommendations.

51 User Study Evaluation. 16 PhD students in machine learning. For each: selected a recent publication of the participant (the study paper) for which we find related work. Two variants of our methodology (with and without trust). Three state-of-the-art alternatives: Relational Topic Model (generative model of text and links) [Chang, Blei ’10]; Information Genealogy (uses only document text) [Shaparenko, Joachims ’07]; Google Scholar (based on keywords provided by a coauthor). Double-blind study where the participant was provided with the title/author/abstract of one paper at a time and asked several questions.

52 Usefulness (higher is better). On average, our approach provides more useful and more must-read papers than the comparison techniques.

53 Trust (higher is better). On average, our approach provides more trustworthy papers than the comparison techniques, especially when incorporating the participant’s trust preferences.

54 Novelty. On average, our approach provides more familiar papers than the comparison techniques, especially when incorporating the participant’s trust preferences.

55 Diversity. In pairwise comparisons, our approaches produce more diverse results than the comparison techniques.

56 Our Approach. Input: phrase complex information needs. Output: structured, annotated output. Interaction: richer forms of interaction. What about structured outputs?

57 Connecting the Dots: News Domain 3.19.2008

58 Input: pick two articles (start, goal), e.g., Housing Bubble and Bailout. Output: bridge the gap with a smooth chain of articles.

59 Example chain from Housing Bubble to Bailout: Keeping Borrowers Afloat; A Mortgage Crisis Begins to Spiral,...; Investors Grow Wary of Bank’s Reliance on Debt; Markets Can’t Wait for Congress to Act; Bailout Plan Wins Approval.

60 Connecting the Dots [Shahaf, G. ‘10]

61 What is a Good Chain? What’s wrong with shortest-path? Build a graph: a node for every article, edges based on similarity, chronological order (a DAG); run BFS from s to t.

62 Shortest-path chain from the Lewinsky story to the Florida recount. A1: Talks Over Ex-Intern’s Testimony On Clinton Appear to Bog Down. A2: Judge Sides with the Government in Microsoft Antitrust Trial. A3: Who will be the Next Microsoft? – trading at a market capitalization… A4: Palestinians Planning to Offer Bonds on Euro. Markets. A5: Clinton Watches as Palestinians Vote to Rescind 1964 Provision. A6: Contesting the Vote: The Overview; Gore asks Public For Patience.


64 Shortest-path, continued: stream of consciousness? Each transition is strong, but there is no global theme.

65 More-Coherent Chain, from B1: Talks Over Ex-Intern’s Testimony On Clinton Appear to Bog Down, to B8: Contesting the Vote: The Overview; Gore asks Public For Patience. B2: Clinton Admits Lewinsky Liaison to Jury. B3: G.O.P. Vote Counter in House Predicts Impeachment of Clinton. B4: Clinton Impeached; He Faces a Senate Trial. B5: Clinton’s Acquittal; Senators Talk About Their Votes. B6: Aides Say Clinton Is Angered As Gore Tries to Break Away. B7: As Election Draws Near, the Race Turns Mean.

66 More-Coherent Chain, continued: what makes it coherent?

67 Word patterns for the shortest-path chain: the topic changes every transition (jittery).

68 Word patterns for the coherent chain: the topic is consistent over transitions. Use this intuition to estimate the coherence of chains.

69 Strong transitions between consecutive documents d1…d4. Score each transition by the words it carries (w1: Lewinsky, w2: Clinton, w3: Oath, w4: Intern, w5: Microsoft); the chain is only as strong as its weakest transition: min(4, 3, 1) = 1.
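The weakest-link scoring in this example can be sketched as follows (a toy illustration; `word_influence` here is a lookup table standing in for the real word-level influence score):

```python
def chain_coherence(chain, word_influence, words):
    """Score a chain by its weakest transition: sum word-level influence
    over the given words for each consecutive pair, then take the min."""
    return min(
        sum(word_influence(u, v, w) for w in words)
        for u, v in zip(chain, chain[1:])
    )

# Toy transition scores mirroring the slide: 4, 3, and 1.
scores = {("d1", "d2"): 4, ("d2", "d3"): 3, ("d3", "d4"): 1}
def word_influence(u, v, w):
    return scores[(u, v)]
```

Here `chain_coherence(["d1", "d2", "d3", "d4"], word_influence, ["clinton"])` reproduces min(4, 3, 1) = 1: one weak transition drags down the whole chain.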

70 min(4, 3, 1) = 1 is too coarse: it ignores word importance within a transition and missing words. Intuitively, we want a score that is high iff d_i and d_i+1 are very related and w plays an important role in the relationship.

71 Influence with No Links! We want the influence of d_i on d_i+1 with respect to word w. Most methods assume edges (influence propagates through the edges), but there are no edges in our dataset. New method: measure the influence of d_i on d_i+1 with respect to word w without links, using a random-walk idea.

72 Global Theme, No Jitter Jittery chain can score well! But need a lot of words… Good chains can often be represented by a small number of segments

73 Global Theme, No Jitter. Each chain chooses 3 segments of activated words to be scored on: the coherent chain gets a good score; the jittery chain gets score = 0.

74 Coherence: New Objective. Maximize over legal activations: limit the total number of active words; limit the number of words per transition; each word activated at most once.

75 Finding a good chain. The problem is NP-complete; we can’t brute-force (n^d possible chains, still >>10^20 after pruning). A softer notion of activation in [0,1] yields a natural formalization as a linear program (LP) over next(u, v) variables.

76 Rounding. The LP doesn’t output a chain, but edge weights in [0,1], so we need to round. Approximation guarantees: chain length K in expectation; objective within O(sqrt(ln(n/δ))) with probability 1-δ.
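One way to picture the rounding step (a simplified sketch, not the paper's exact procedure: walk from start to goal, choosing each successor with probability proportional to its fractional next(u, v) weight; `next_weights` is hypothetical data and is assumed to reach the goal):

```python
import random

def round_to_chain(next_weights, start, goal, rng=None):
    """Randomized-rounding sketch: turn fractional LP edge weights into a
    single chain by walking from start to goal, sampling each next article
    in proportion to its weight."""
    rng = rng or random.Random(0)
    chain = [start]
    while chain[-1] != goal:
        options = next_weights[chain[-1]]           # {successor: LP weight}
        nodes, weights = zip(*options.items())
        chain.append(rng.choices(nodes, weights=weights, k=1)[0])
    return chain
```

Sampling in proportion to the fractional weights is what makes the rounded chain's length match the LP's in expectation.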

77 Evaluation: Competitors. Shortest path. Google Timeline: enter a query, pick k equally-spaced articles. Event threading (TDT) [Nallapati et al ’04]: generate a cluster graph, pick representative articles from clusters.

78 Example Chain (1), Google News Timeline, from Simpson trial to Simpson verdict: Simpson Strategy: There Were Several Killers; O.J. Simpson’s book deal controversy; CNN OJ Simpson Trial News: April Transcripts; Tandoori murder case a rival for OJ Simpson case.

79 Example Chain (2), Connect-the-Dots, from Simpson trial to Simpson verdict: Issue of Racism Erupts in Simpson Trial; Ex-Detective’s Tapes Fan Racial Tensions in LA; Many Black Officers Say Bias Is Rampant in LA Police Force; With Tale of Racism and Error, Lawyers Seek Acquittal.

80 Evaluation #1: Familiarity. 18 users, 5 news stories. Show two articles, before and after reading the chain: do you know a coherent story linking these articles?

81 Effectiveness (improvement in familiarity): average fraction of the gap closed, higher is better. Base familiarity per story: 2.1, 3.1, 3.2, 3.4, 1.9.


85 Effectiveness (improvement in familiarity): significant improvement in 4 of 5 stories.

86 Evaluation #2: Chain Quality. Compare two chains for coherence, relevance, and redundancy.

87 Relevance, coherence, and non-redundancy across complex stories (higher is better).


91 What’s left? Interaction: so far, the input is two documents and the output is a chain.

92 Interaction Types: 1. Refinement. 2. User interests.

93 Interaction. Example: Simpson chains steered toward different concepts (verdict; race; blood, glove), e.g., Defense cross-examines state DNA expert; With fiber evidence, prosecution…; Simpson Defense Drops DNA Challenge; A day the country stood still; In the joy of victory, defense team in discord; Many black officers say bias is rampant in LA police force; Racial split at the end. Algorithmic ideas from online learning.

94 Interaction User Study. Refinement: 72% prefer chains refined by our method over the baseline approach. User interests: 63.3% were able to reverse-engineer the selected concepts when asked which 2 of 10 shown words were used to personalize the chain.

95 Where We Are Going… Input: phrase complex information needs. Output: structured, annotated output. Interaction: richer forms of interaction. Can we provide more structure than chains?

96 An example of a structured view: given a query, select important documents; the output is selected, structured, and annotated with relations.

97 An example of a structured view: “machines can’t have emotions” is supported by “the concept of feeling only applies to living organisms” [Ziff ’59] and is disputed by “we can imagine artifacts that have feelings” [Smart ’59]. Deeper understanding helps address information overload. Challenge: build the structured view automatically!

98 Trains of Thought. Given a set of documents, show important pieces of information … and how they relate (e.g., austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel).

99 What makes a good map? 1. Coherence. 2. Coverage. 3. Connectivity.

100 Approach overview. Documents D → (1) coherence graph G, which encodes all coherent chains as graph paths; (2) coverage function f; (3) find high-coverage, high-connectivity paths: a set of paths that maximizes coverage.

101 User Study Three stories: – Chile miners – Haiti earthquake – Greek debt crisis

102 Task: Effectiveness. A 10-question form: answer it once at the beginning… and again while interacting with Maps / Google News / Maps without graph structure. How fast do people acquire knowledge?

103 Effectiveness: Results. Rate of closing the knowledge gap (% of the gap closed over minutes): Structureless > Google News, due to content selection; Structure > Structureless, especially for complex stories.

104 Taming Information Overload. Input: phrase complex information needs (a set of query papers; the end points of a chain). Output: structured, annotated output (smooth chains connecting the dots, metro maps, issue maps), via efficient algorithms with theoretical guarantees. Interaction: richer forms of interaction (like/dislike with online learning, a BibTeX file for trust, feedback on concepts). Multiple user studies: a promising direction for taming the challenge of information overload.

