1 Amr Ahmed Thesis Proposal Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms

2 This thesis is about document collections: they are everywhere, and they cover many domains.

3 Research publications (conference proceedings, journal transactions, arXiv, PubMed Central); news (Yahoo! News, Google News, CNN, BBC); social media and blogs (Daily Kos, Red State).

4 [Motivating figure: research topics (CS, Bio, Phy) evolving over time; a news storyline unfolding over time ("Drill explosion", "BP wasn't prepared for an oil spill at such depths", "BP: We will make this right"); opposing ideological statements on the same issue ("Ban abortion with a Constitutional amendment" vs. "Choice is a fundamental, constitutional right") — illustrating Temporal Dynamics and Structural Correspondence.]

5 Thesis Question How to build a structured representation of document collections that reveals – Temporal Dynamics How ideas/events evolve over time – Structural Correspondence How ideas are addressed across modalities and communities

6 Thesis Approach Models – Probabilistic graphical models Topic models and Non-parametric Bayes – Principled, expressive and modular Algorithms – Distributed To deal with large-scale datasets – Online To update the representation with new data

7 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

8 What is a Good Model for Documents? Clustering – the mixture-of-unigrams model. How do we specify a model? Generative process – assume some hidden variables and use them to generate documents. Inference – invert the process: given documents, recover the hidden variables. [Plate diagram: c_i → w_i, plate over N.]

9 Mixture of Unigrams [plate diagram: c_i → w_i, plate over N]. Generative process – for each document w_i: sample c_i ~ Multi(θ), then sample each word w_i ~ Multi(β_{c_i}). When is this a good model for documents? When documents are single-topic – which is not true in our settings.
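
A minimal sketch of the mixture-of-unigrams generative process described above (Python/NumPy, not from the proposal; the topic proportions, topic-word distributions, and sizes are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 20            # toy sizes: topics, vocabulary, words per document
theta = rng.dirichlet(np.ones(K))    # mixing proportions over topics
beta = rng.dirichlet(np.ones(V), K)  # one word distribution per topic

def generate_doc():
    """Mixture of unigrams: one topic per document, all words drawn from it."""
    c = rng.choice(K, p=theta)                      # c_i ~ Multi(theta)
    words = rng.choice(V, size=doc_len, p=beta[c])  # each w_i ~ Multi(beta_{c_i})
    return c, words

c, words = generate_doc()
```

Because every word in the document shares the single indicator c, the model is only appropriate when documents are single-topic.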

10 What Do We Need to Model? Q: What is it about? A: Mainly MT, with syntax and some learning. "A Hierarchical Phrase-Based Model for Statistical Machine Translation": We present a statistical phrase-based translation model that uses hierarchical phrases — phrases that contain sub-phrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system. [Figure: topics as unigram distributions over the vocabulary — MT (source, target, SMT, alignment, score, BLEU), Syntax (parse tree, noun phrase, grammar, CFG), Learning (likelihood, EM, hidden parameters, estimation, argmax) — plus per-document topic mixing proportions: topic models.]

11 Mixed-Membership Models [plate diagram: Prior → θ, θ → z → w, plates over N and D, topics β_1 … β_K; the abstract above shown with its words colored by topic]. Generative process – for each document d: sample θ_d ~ Prior; for each word w in d: sample z ~ Multi(θ_d), then sample w ~ Multi(β_z).

12 Topic Models Prior over the topic vector – Latent Dirichlet Allocation (LDA) – correlated priors (CTM) – hierarchical priors. Topics – unigrams, bigrams, etc. Document structure – bag of words – multi-modal – side information. [Plate diagram: Prior → θ → z → w, plates over N, D, K.]

13 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

14 Problem Statement Given research papers over time (CS, Bio, Phy), discover the topics: a potentially infinite number of topics – with time-varying trends – and time-varying distributions – and variable durations; topics can die and new topics can be born.

15 The Big Picture [diagram with two axes, time and model dimension: LDA, dynamic clustering, dynamic LDA, HDPM; infinite dynamic topic models combine temporal dynamics with a growing number of topics].

16 LDA: The Generative Process – for each document d: sample θ_d ~ Dirichlet(α); for each word w in d: sample z ~ Multi(θ_d), then sample w ~ Multi(β_z). Checklist: do topic distributions evolve over time? do topic trends evolve over time? does the number of topics grow with the data?
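
A minimal sketch of the LDA generative process above (Python/NumPy; the hyperparameters and corpus sizes are arbitrary toy values, not taken from the proposal):

```python
import numpy as np

rng = np.random.default_rng(1)
K, V, D, doc_len = 4, 50, 5, 30
alpha, eta = 0.5, 0.1
beta = rng.dirichlet(eta * np.ones(V), K)  # topic-word distributions beta_k

def generate_corpus():
    docs = []
    for _ in range(D):
        theta_d = rng.dirichlet(alpha * np.ones(K))           # theta_d ~ Dirichlet(alpha)
        z = rng.choice(K, size=doc_len, p=theta_d)            # z ~ Multi(theta_d)
        w = np.array([rng.choice(V, p=beta[k]) for k in z])   # w ~ Multi(beta_z)
        docs.append((theta_d, z, w))
    return docs

docs = generate_corpus()
```

Unlike the mixture of unigrams, each word gets its own topic indicator z, so a document can mix several topics; but the topics themselves are static and their number K is fixed.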

17 The Big Picture [same diagram: time vs. model dimension — LDA, dynamic clustering, dynamic LDA, HDPM, infinite dynamic topic models].

18 Dynamic LDA: The Generative Process (research papers) – for each document d: sample θ_d ~ Normal(α, σ²); for each word w in d: sample z ~ Multi(L(θ_d)), then sample w ~ Multi(L(β_z)). The Gaussian parameterization is necessary to evolve trends; L is the logistic transformation, L(η)_i = exp(η_i) / Σ_j exp(η_j).

19 Dynamic LDA: The Generative Process (research papers) – chain the epochs: α_t ~ Normal(·| α_{t−1}, σ²) and β_{k,t} ~ Normal(·| β_{k,t−1}, σ²). For each document d in epoch t: sample θ_d ~ Normal(α_t, σ²); for each word position i in d: sample z_{d,i} ~ Multi(L(θ_d)), then sample w_{d,i} ~ Multi(L(β_{z_{d,i},t})).
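
A sketch of the chained logistic-normal dynamics above (Python/NumPy; the variances and sizes are made-up, and L is the logistic/softmax transformation):

```python
import numpy as np

rng = np.random.default_rng(2)
K, V, T = 3, 20, 5
var_alpha, var_beta, var_theta = 0.1, 0.05, 0.5   # assumed chain and document variances

def L(eta):
    """Logistic transformation: map a real vector onto the simplex."""
    e = np.exp(eta - eta.max())
    return e / e.sum()

alpha = np.zeros((T, K))
beta = np.zeros((T, K, V))
for t in range(1, T):
    alpha[t] = rng.normal(alpha[t - 1], np.sqrt(var_alpha))  # alpha_t ~ N(alpha_{t-1}, .)
    beta[t] = rng.normal(beta[t - 1], np.sqrt(var_beta))     # beta_{k,t} ~ N(beta_{k,t-1}, .)

def generate_doc(t, doc_len=25):
    theta_d = rng.normal(alpha[t], np.sqrt(var_theta))            # theta_d ~ N(alpha_t, .)
    z = rng.choice(K, size=doc_len, p=L(theta_d))                 # z ~ Multi(L(theta_d))
    w = np.array([rng.choice(V, p=L(beta[t, k])) for k in z])     # w ~ Multi(L(beta_{z,t}))
    return z, w

z, w = generate_doc(t=3)
```

The Gaussian parameterization is what makes chaining epochs with Normal random walks possible; the logistic transformation maps the chained real-valued vectors back to valid multinomial parameters.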

20 Dynamic LDA: The Generative Process [plate diagram unrolled over several epochs: each epoch has its own documents, topic trends α_t, and topic distributions β_{k,t}, chained to the previous epoch].

21 Dynamic LDA: The Generative Process [unrolled plate diagram]. Checklist: topic distributions evolve over time – yes; topic trends evolve over time – yes; the number of topics grows with the data – no, it is still fixed at K.

22 The Big Picture [same diagram: time vs. model dimension — LDA, dynamic clustering, dynamic LDA, HDPM, infinite dynamic topic models].

23 The Chinese Restaurant Franchise Process The HDP mixture (HDPM) automatically determines the number of topics in LDA. We will focus on the Chinese Restaurant Franchise construction – a set of restaurants that share a global menu. Metaphor – restaurant = document – customer = word – dish = topic – global menu = set of topics.

24 The Chinese Restaurant Franchise Process [figure: Restaurant 1 and Restaurant 2; each table serves one dish from the global menu; customers sitting at a table share its dish; m_k = number of tables serving dish (topic) k; φ_k = distribution for topic k].

25 The Chinese Restaurant Franchise Process Generative process – for customer w in restaurant 3: choose an existing table j with probability ∝ N_j (its number of customers), or a new table with probability ∝ b; a new table then samples a dish from the menu.

26 The Chinese Restaurant Franchise Process [figure: the customer joins an existing table and the word is emitted from that table's dish, w ~ Multi(L(φ_3))].

27 The Chinese Restaurant Franchise Process [figure: alternatively, the customer opens a new table, with probability ∝ b].

28 The Chinese Restaurant Franchise Process The new table samples its dish: an existing dish k with probability ∝ m_k (the number of tables serving it across all restaurants), or a new dish with probability proportional to the concentration parameter.

29 The Chinese Restaurant Franchise Process [figure: the new table picks an existing dish, and the word is emitted as w ~ Multi(L(φ_3))].

30 The Chinese Restaurant Franchise Process [figure: the new table picks a brand-new dish, sampled from the base measure, φ_5 ~ H, and the word is emitted as w ~ Multi(L(φ_5))].

31 The Chinese Restaurant Franchise Process Checklist: the number of topics grows with the data – yes; but topic distributions and topic trends do not yet evolve over time.
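
A minimal sketch of one Chinese Restaurant Franchise seating step as described in the slides above (Python/NumPy; the concentration parameters b and gamma and the Gaussian base measure H are toy stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)
V, b, gamma, var_H = 20, 1.0, 1.0, 1.0

def L(eta):
    e = np.exp(eta - eta.max()); return e / e.sum()

dishes = []   # global menu: one parameter vector phi_k per dish (topic)
m = []        # m_k: number of tables serving dish k, across all restaurants

def seat_customer(restaurant):
    """restaurant = list of tables; each table is {'n': #customers, 'dish': k}."""
    # choose an existing table j with prob ~ N_j, or a new table with prob ~ b
    weights = np.array([tbl['n'] for tbl in restaurant] + [b], dtype=float)
    j = rng.choice(len(weights), p=weights / weights.sum())
    if j < len(restaurant):
        tbl = restaurant[j]; tbl['n'] += 1
    else:
        # new table: existing dish k with prob ~ m_k, brand-new dish with prob ~ gamma
        dish_w = np.array(m + [gamma], dtype=float)
        k = rng.choice(len(dish_w), p=dish_w / dish_w.sum())
        if k == len(dishes):
            dishes.append(rng.normal(0.0, np.sqrt(var_H), size=V))   # phi_new ~ H
            m.append(0)
        m[k] += 1
        tbl = {'n': 1, 'dish': k}
        restaurant.append(tbl)
    return rng.choice(V, p=L(dishes[tbl['dish']]))   # emit word w ~ Multi(L(phi_k))

restaurants = [[], [], []]
words = [seat_customer(restaurants[2]) for _ in range(10)]
```

Because the dish counts m_k are shared across restaurants, documents share topics, and new topics are created on demand, so the number of topics grows with the data.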

32 The Big Picture [same diagram: time vs. model dimension — LDA, dynamic clustering, dynamic LDA, HDPM, infinite dynamic topic models].

33 Recurrent Chinese Restaurant Franchise Process [figure: global menu at T=1 with dishes φ_{1,1} … φ_{5,1}]. Documents in epoch 1 are generated as before. Topics at the end of epoch 1: the height m_{k,1} represents topic k's popularity, and φ_{k,1} represents topic k's distribution. Observations – popular topics at epoch 1 are likely to be popular at epoch 2, and φ_{k,2} is likely to evolve smoothly from φ_{k,1}. The menu at T=2 therefore starts from pseudo-counts: (pseudo-count for dish k) = (decay factor) × m_{k,1}.

34 Recurrent Chinese Restaurant Franchise Process [figure: the menu at T=2 inherits the epoch-1 dishes as pseudo-counts]. When an inherited dish is actually served at epoch 2 it becomes a real dish, with φ_{3,2} ~ Normal(·| φ_{3,1}, σ²); dishes inherited but not yet used keep only their pseudo-counts.

35 Recurrent Chinese Restaurant Franchise Process Generative process at epoch 2 – for customer w in restaurant 1: [as in the static case] choose an existing table j with probability ∝ N_j, or a new table with probability ∝ b. A new table samples a dish: an inherited dish already served at epoch 2, with probability ∝ m'_{k,2} + m_{k,2}; an inherited dish not yet served at epoch 2, with probability ∝ m'_{k,2}, in which case φ_{k,2} ~ Normal(·| φ_{k,1}, σ²); or a brand-new dish, in which case φ_new ~ H.

36 Recurrent Chinese Restaurant Franchise Process [figure: one sampling step under the process above].

37 Recurrent Chinese Restaurant Franchise Process [figure: a new table picks inherited dish 1 for the first time at epoch 2, so φ_{1,2} ~ Normal(·| φ_{1,1}, σ²)].

38 Recurrent Chinese Restaurant Franchise Process [figure: a new table picks a brand-new dish, φ_{6,2} ~ H].

39 Recurrent Chinese Restaurant Franchise Process [figure: menus at T=1, T=2, T=3 and the documents of epochs 1 and 2; topics that stop being used die out, while newly born topics (e.g. φ_{6,2}) enter the menu].

40 Recurrent Chinese Restaurant Franchise Process Checklist: topic distributions evolve over time – yes; topic trends evolve over time – yes; the number of topics grows with the data – yes.

41 Recurrent Chinese Restaurant Franchise Process We have just described a first-order RCRF process; in a general higher-order process the menu at epoch t inherits decayed pseudo-counts from a window of previous epochs.
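
A sketch of how the menu could be carried from one epoch to the next in the recurrent process; the exponential-decay kernel and the window width used here are my assumptions about the higher-order generalization, not a formula quoted from the proposal:

```python
import numpy as np

def inherit_menu(m_history, lam=1.0, delta=2):
    """Pseudo-counts for the current epoch from the previous `delta` epochs,
    assuming an exponential decay kernel: m'_k = sum_d exp(-d/lam) * m_{k, t-d}."""
    K = max(len(m) for m in m_history)
    m_prime = np.zeros(K)
    for d, m in enumerate(reversed(m_history[-delta:]), start=1):
        padded = np.pad(np.asarray(m, dtype=float), (0, K - len(m)))
        m_prime += np.exp(-d / lam) * padded
    return m_prime   # dishes whose pseudo-count reaches zero have effectively died out

# toy table counts per dish for the two previous epochs
m_history = [[4, 3, 2, 1, 1], [5, 0, 3, 0, 1, 2]]
print(inherit_menu(m_history))
```

At sampling time, an inherited dish k is then chosen with probability proportional to m'_k plus its current-epoch count, and the first time it is re-used its parameter is drawn as φ_{k,t} ~ Normal(·| φ_{k,t−1}, σ²), exactly as in the first-order description above.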

42 Inference Gibbs sampling – sample a table for each word – sample a topic for each table – sample the topic parameters over time – sample the hyper-parameters. How to deal with non-conjugacy – Algorithm 8 of Neal (2000) – Metropolis-Hastings. Efficiency – the Markov blanket contains the previous and following epochs.

43 Sampling a Topic for a Table [figure: menus at T=1, T=2, T=3]. The conditional for a table's dish combines a past term, a future term, and an emission term; the practical issues are efficiency and non-conjugacy.

44 Sampling a Topic for a Table [figure]. For a brand-new dish the parameter is drawn from the base measure, φ ~ H = N(0, σ²), which is non-conjugate to the multinomial emission.

45 Sampling a Topic for a Table [figure]. Efficiency: the past and future terms can be pre-computed and updated incrementally; non-conjugacy remains in the emission term.

46 Sampling Topic Parameters The word counts follow v | φ ~ Mult(Logistic(φ)): a linear state-space model over φ_{k,t} with a non-Gaussian emission. Use a Laplace approximation inside the forward-backward algorithm, and use the resulting distribution as a proposal.
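
A sketch of the Laplace approximation for one epoch's topic parameter: given word counts v with v | φ ~ Mult(Logistic(φ)) and a Gaussian prior (here a generic N(mu, sigma2 I) standing in for the forward-backward message), a few Newton steps yield a Gaussian that can serve as the proposal. The prior, step count, and toy counts are illustrative:

```python
import numpy as np

def laplace_approx(v, mu, sigma2, iters=20):
    """Gaussian approximation to p(phi | v) where v ~ Mult(N, softmax(phi))
    and phi ~ N(mu, sigma2 * I).  Returns (mode, covariance)."""
    phi, N, I = mu.copy(), v.sum(), np.eye(len(mu))
    for _ in range(iters):
        p = np.exp(phi - phi.max()); p /= p.sum()
        grad = v - N * p - (phi - mu) / sigma2                    # gradient of log posterior
        neg_hess = N * (np.diag(p) - np.outer(p, p)) + I / sigma2
        phi = phi + np.linalg.solve(neg_hess, grad)               # Newton step
    p = np.exp(phi - phi.max()); p /= p.sum()
    cov = np.linalg.inv(N * (np.diag(p) - np.outer(p, p)) + I / sigma2)
    return phi, cov

v = np.array([12., 3., 0., 5.])   # toy word counts for one topic at one epoch
mode, cov = laplace_approx(v, mu=np.zeros(4), sigma2=1.0)
```

Inside forward-backward, this Gaussian would replace the multinomial emission at each epoch; a draw from the smoothed state distribution is then accepted or rejected with a Metropolis-Hastings step, which corrects for the approximation.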

47 Experiments Simulated data – 20 epochs with 100 data points in each epoch. Timeline of the NIPS conference – 13 years – 1740 documents – 950 words per document – a vocabulary of ~3500 words.

48 Simulation Experiment Sample Documents:

49 Ground Truth Recovered

50 [Figure: timeline of NIPS topics recovered by the model — e.g. speech, neuroscience, NN, classification, methods, control, probabilistic models, image, SOM, RL, Bayesian, mixtures, and generalization present in 1987; boosting born around 1990, clustering around 1991, ICA and kernels around 1995; by the end, kernels, ICA, probabilistic models, classification, mixtures, and control persist while others die out.]

51 [Figure: how individual topics' word distributions evolve over the years — a graphical-models topic (field, boltzmann, energy, annealing, tree, node, probability, variables, graph, belief, inference, propagation by 1999); a mixtures/EM topic (em, expert, mixture, gating, missing, gaussian, parameters, likelihood, density, factor); an ICA topic (wavelet, natural, separation, source, ica, coefficients, independent, basis, blind; 1995-1999); an optimization-methods topic (gradient, convergence, descent, method, solution, energy, matrix, equation).]

52 [Figure: the kernels topic and its evolution (support, kernel, svm, regularization, sv, vectors, feature, regression, machines, vapnik, margin, pca, matrix), together with representative NIPS papers it covers: "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing" (V. Vapnik, S. E. Golowich, A. Smola); "Support Vector Regression Machines" (H. Drucker, C. Burges, L. Kaufman, A. Smola, V. Vapnik); "Improving the Accuracy and Speed of Support Vector Machines" (C. Burges, B. Scholkopf); "From Regularization Operators to Support Vector Kernels" (A. Smola, B. Schoelkopf); "Prior Knowledge in Support Vector Kernels" (B. Schoelkopf, P. Simard, A. Smola, V. Vapnik); "Uniqueness of the SVM Solution" (C. Burges, D. Crisp); "An Improved Decomposition Algorithm for Regression Support Vector Machines" (P. Laskov); and many more.]

53 The Big Picture [same diagram: time vs. model dimension — LDA, dynamic clustering, dynamic LDA, HDPM, infinite dynamic topic models].

54 Quantitative Analysis

55 Analyzing the NIPS Corpus [figure panels (a)-(c): the start state and posterior samples].

56 [Recap figure: Temporal Dynamics (research topics and the BP oil-spill storyline over time) and Structural Correspondence (opposing statements on abortion).]

57 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

58 Problem Statement Rapid growth of social media and news outlets, with lots of redundancy. How to get the big picture? – What are the stories? – Who are the main entities? – When and how do they develop over time? – How are they categorized (sports, economics, etc.)?

59 Proposed Solution Topic models – discover long-term, high-level themes (sports, health, politics). Dynamic clustering – discover short-term, ephemeral themes (a cricket match, the SARS epidemic). Inference – an online algorithm using Sequential Monte Carlo.
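
A minimal sketch of the online Sequential Monte Carlo idea for storyline discovery: each particle carries a clustering of the documents seen so far; a new document is assigned to a storyline from a CRP-style predictive, particles are reweighted by the document's predictive likelihood, and the particle set is resampled when its effective sample size drops. The Dirichlet-multinomial storylines, the CRP prior, and all constants here are illustrative stand-ins for the proposal's richer storyline model:

```python
import numpy as np

rng = np.random.default_rng(4)
V, ALPHA, ETA, P = 30, 1.0, 0.1, 50    # vocab size, CRP concentration, smoothing, #particles

def predictive(counts, doc):
    """Dirichlet-multinomial predictive probability of a doc (list of word ids)."""
    logp, c, tot = 0.0, counts.copy(), counts.sum()
    for w in doc:
        logp += np.log((c[w] + ETA) / (tot + V * ETA))
        c[w] += 1; tot += 1
    return np.exp(logp)

particles = [{'counts': [], 'ndocs': [], 'w': 1.0 / P} for _ in range(P)]

def process_doc(doc):
    for pt in particles:
        prior = np.array(pt['ndocs'] + [ALPHA], dtype=float)        # CRP over storylines
        likes = np.array([predictive(c, doc) for c in pt['counts']]
                         + [predictive(np.zeros(V), doc)])
        post = prior * likes
        k = rng.choice(len(post), p=post / post.sum())               # assign a storyline
        if k == len(pt['counts']):                                   # a new storyline is born
            pt['counts'].append(np.zeros(V)); pt['ndocs'].append(0)
        for w in doc:
            pt['counts'][k][w] += 1
        pt['ndocs'][k] += 1
        pt['w'] *= post.sum() / prior.sum()                          # incremental weight
    total = sum(pt['w'] for pt in particles)
    for pt in particles:
        pt['w'] /= total
    if 1.0 / sum(pt['w'] ** 2 for pt in particles) < P / 2:          # resample if ESS is low
        idx = rng.choice(P, size=P, p=[pt['w'] for pt in particles])
        particles[:] = [{'counts': [c.copy() for c in particles[i]['counts']],
                         'ndocs': list(particles[i]['ndocs']), 'w': 1.0 / P} for i in idx]

for doc in ([0, 1, 2, 1], [0, 2, 1], [20, 21, 22], [21, 20, 23]):
    process_doc(doc)
```

The key property is that each document is touched once as it arrives, so the representation can be updated online as the news stream grows.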

60 Preliminary Result [figure: high-level topics with their top words — Sports (games, won, team, final, season, league, held); Politics (government, minister, authorities, opposition, officials, leaders, group); Accidents (police, attack, run, man, group, arrested, move); Tax-bills (tax, billion, cut, plan, budget, economy, lawmakers; entities: Bush, Senate, US Congress, Fleischer, White House, Republican) — and storylines — Border-Tension (nuclear, border, dialogue, diplomatic, militant, insurgency, missile; entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee); UEFA-soccer (champions, goal, leg, coach, striker, midfield, penalty; entities: Juventus, AC Milan, Real Madrid, Lazio, Ronaldo, Lyon).]

61 Structured Browsing [figure: "more like this story" queries — by storyline (Middle-east-conflict: peace, roadmap, suicide, violence, settlements, bombing; entities: Israel, Palestinian, West Bank, Sharon, Hamas, Arafat), by topic (Nuclear programs: nuclear, summit, warning, policy, missile, program; entities: North Korea, South Korea, U.S., Bush, Pyongyang), or by combinations such as nuclear + [politics] topics, India in any topic, Pakistan in any topic, India and Pakistan in any topic, …]

62 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

63 Modeling Dynamic User Intent How to model users' intents? – long-term – short-term – spurious. Input – queries issued by the user – documents viewed by the user. Output – a dynamic distribution over intents.

64 The Big Picture [figure: a user's query stream begins — car deals, van, job, hiring, diet].

65 The Big Picture [figure: later queries are added — hiring, salary, diet, calories, auto price, used, inception, flight, London, hotel, weather].

66 The Big Picture [figure: more queries — movies, theatre, art gallery].

67 The Big Picture [figure: still more queries — diet, calories, recipe, chocolate, school supplies, loan, college].

68 The Big Picture [figure: the query stream is summarized into intents — cars, art, diet, jobs, travel, college, finance].

69 Highlights Applications – behavioral targeting: matching users to ads – but you can also match users to stories or new research papers. Challenges – large scale (~35M users) – incremental data.

70 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

71 [Recap figure: Temporal Dynamics (research topics and the BP oil-spill storyline over time) and Structural Correspondence (opposing statements on abortion).]

72 Biological Images High-throughput imaging devices have spread in recent years; the resulting images are an important source of information for biologists, and there is a pressing need to manage and organize this information for retrieval and visualization. These figures are embedded within research papers and pose challenges to mainstream text-image systems. [Examples: FMI images, gel images, papers.]

73 Biological Figures Are Challenging Hierarchical organization – multiple panels – image labels and image pointers; scoped captions vs. the global caption; protein annotations and free-text annotations. [Contrast figure: mainstream image-retrieval datasets carry flat keyword annotations such as "market, people", "Scotland, water, bridge", "sky, water, fish", "water, clouds, jet, plane".]

74 The Big Picture [system diagram: a query-handling module serves image retrieval, textual retrieval, multimodal retrieval, annotation, and a high-level overview/visualization for queries such as "mice + antibodies", "cancer + tubulin", "actin"]. Tasks: retrieval across modalities – image retrieval – text-based retrieval – text + protein-based retrieval – annotation; mixed granularity – the input can be either a panel or a figure – the output can be either a panel or a figure.

75 Why Are Queries Hard? What if I only want to retrieve figures that address the role of vha-8 during the larval stage – when it is only addressed in panel E? How can we compare figures with vastly different numbers of panels – the same study but with different time resolutions?

76 The Big Picture: the Extraction System [figure: a sample figure and its caption — "Affinity-purified rabbit anti-rmnp 41 antibodies … monoclonal anti-cPABP antibodies … double immunofluorescence confocal microscopy using mAb against cPABP … and the bound antibodies were visualized" — feeding the query-handling module across modality and granularity]. Extraction steps: segment the figure into panels; detect panel image pointers (a, b); detect mentions of pointers in the text, e.g. "(a)"; match image pointers to text labels (CRF); detect named entities in the text (see the paper for references). The output is scoped captions, the global caption, and protein entities.

77 The Big Picture [same diagram: the extraction system's output (sample caption shown) flows into the query-handling module across modality and granularity].

78 The Big Picture [diagram: extraction system → topic modeling → query-handling module serving image retrieval, textual retrieval, multimodal retrieval, annotation, and a high-level overview/visualization].

79 The Big Picture [diagram: topic modeling produces a semantic representation at both figure and panel granularity, plus learnt topics (Topic 1 … Topic K) used for visualization].

80 Topic Models Each topic has a triplet of distributions – a multinomial distribution over words – a multinomial distribution over protein words – a Gaussian distribution over image features (texture and histogram features). Each topic models the correspondence between its facets. [Figure: a topic's top panels and its image features, Feature 1 … Feature M.]
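
A sketch of one multi-facet topic and how a panel's words, protein mentions, and image features could be drawn from it; the facet sizes, feature dimensionality, and independence assumptions are illustrative rather than the proposal's exact correspondence structure:

```python
import numpy as np

rng = np.random.default_rng(5)
V_WORDS, V_PROT, F_IMG = 200, 40, 16   # caption vocabulary, protein names, image-feature dim

def new_topic():
    """One topic = a triplet: word multinomial, protein multinomial, Gaussian over features."""
    return {'words':    rng.dirichlet(0.1 * np.ones(V_WORDS)),
            'proteins': rng.dirichlet(0.1 * np.ones(V_PROT)),
            'img_mean': rng.normal(0.0, 1.0, F_IMG),
            'img_var':  np.ones(F_IMG)}

def generate_panel(topic, n_words=15, n_prot=2):
    words    = rng.choice(V_WORDS, size=n_words, p=topic['words'])
    proteins = rng.choice(V_PROT, size=n_prot, p=topic['proteins'])
    img_feat = rng.normal(topic['img_mean'], np.sqrt(topic['img_var']))
    return words, proteins, img_feat

topics = [new_topic() for _ in range(5)]
panel = generate_panel(topics[0])
```

Because the three facets share the same topic, observing any one of them (the caption words, the protein mentions, or the image features) informs the others — which is what makes cross-modal retrieval and annotation possible.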

81 Structured Correspondence LDA [plate diagram with the sample caption: per-panel latent topic indicators z tie together protein words (w_p), caption words (w_f), and SLIF image features; learnt topics Topic 1 … Topic K each carry a protein facet, a word facet, and a SLIF-feature facet; a background topic is also included].

82 Structured Correspondence LDA [plate diagram, continued: the panel plate replicates over the number of panels; the weight of the background topic is governed by an annotation ratio].

83 A Sample Topic: tumorigenesis [figure: the topic's top panels; its top proteins include known tumor suppressors, a protein with a tumor-suppressing effect, and a member of the caspase family with a role in apoptosis (programmed cell death)].

84 Figure Embedding

85 The Big Picture [full pipeline recap: extraction system → topic modeling → semantic representation (figure and panel) and learnt topics → query-handling module across modality and granularity].

86 Protein Annotation [diagram: the query-handling module produces a ranked list of proteins for a figure, which is then evaluated]. How to rank: by the similarity between the latent representation of the figure and the latent representation of each protein.

87 Protein Annotation How to rank: by the similarity between the latent representation of the figure and the latent representation of each protein. How to evaluate the ranking: best rank, average rank, and rank at full recall. [Example ranked list: actin, mAb, tubulin, vha-8, MTP-1, cPABP.]
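
A sketch of ranking proteins against a figure by the similarity of their latent topic representations, together with the three evaluation numbers named above; cosine similarity is my choice of similarity here, since the slides only say "similarity between latent representations":

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_proteins(fig_topics, protein_topics):
    """fig_topics: (K,) topic proportions of a figure; protein_topics: {name: (K,) vector}."""
    scores = {p: cosine(fig_topics, v) for p, v in protein_topics.items()}
    return sorted(scores, key=scores.get, reverse=True)

def evaluate(ranking, gold):
    """Best rank, average rank, and rank at full recall (1-indexed) of the gold proteins."""
    ranks = sorted(ranking.index(p) + 1 for p in gold)
    return ranks[0], sum(ranks) / len(ranks), ranks[-1]

protein_topics = {'actin':   np.array([.6, .3, .1]),   # toy latent representations
                  'tubulin': np.array([.2, .7, .1]),
                  'cPABP':   np.array([.1, .1, .8])}
ranking = rank_proteins(np.array([.5, .4, .1]), protein_topics)
print(ranking, evaluate(ranking, gold={'actin', 'tubulin'}))
```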

88 Protein Annotation [results figure: ranked lists of proteins and the ranking evaluation].

89 Text-Based Image Retrieval Input: words (w) + a protein (r). Output: a ranked list of figures – scored with a query language model built from the latent figure, word, and protein representations. Measure precision-recall trade-offs.
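
A sketch of a query-likelihood scorer over the latent representations: a figure is scored by the probability its topic mixture assigns to the query words and the query protein, p(q | figure) = prod_w sum_k p(w | k) p(k | figure), and figures are returned in decreasing score. The factorized scoring form and the smoothing constant are assumptions:

```python
import numpy as np

def score_figure(theta_fig, query_words, query_protein, beta_words, beta_prot, eps=1e-12):
    """Query log-likelihood under a figure's topic mixture theta_fig (shape K)."""
    logp = 0.0
    for w in query_words:
        logp += np.log(theta_fig @ beta_words[:, w] + eps)   # sum_k p(w|k) p(k|figure)
    logp += np.log(theta_fig @ beta_prot[:, query_protein] + eps)
    return logp

def retrieve(figures_theta, query_words, query_protein, beta_words, beta_prot):
    scores = [score_figure(th, query_words, query_protein, beta_words, beta_prot)
              for th in figures_theta]
    return np.argsort(scores)[::-1]   # figure indices, best first

# toy model: K=2 topics, 5 caption words, 3 proteins
beta_words = np.array([[.4, .3, .1, .1, .1],
                       [.1, .1, .2, .3, .3]])
beta_prot  = np.array([[.7, .2, .1],
                       [.1, .2, .7]])
figures_theta = [np.array([.9, .1]), np.array([.2, .8])]
print(retrieve(figures_theta, query_words=[0, 1], query_protein=0,
               beta_words=beta_words, beta_prot=beta_prot))
```

Sweeping a score threshold over the ranked list then traces out the precision-recall trade-off mentioned above.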

90 Transfer Learning from Partial Figures [figure: full figures (image, caption, and proteins) and partial figures (caption and proteins only) are modeled jointly by tying their topic parameters].

91 Does It Help? [results figure: protein annotation performance with and without the transfer from partial figures].

92 Transfer Learning from Partial Figures Full figures contribute p(image, words, proteins) while partial figures contribute q(words, proteins); tying the parameters yields a better marginal p(words, proteins), and this better distribution is lifted back into the full model.

93 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies

94 [Recap figure: Temporal Dynamics (research topics and the BP oil-spill storyline over time) and Structural Correspondence (opposing statements on abortion).]

95 Problem Statement Given document collections written from different ideologies, build a model that can answer the following. Visualization: how does each ideology view mainstream events? On which topics do they differ? On which topics do they agree?

96 Problem Statement Given the same collections, build a model that can answer the following. Classification: given a new news article or blog post, the system should decide from which side it was written and justify its answer at a topical level – e.g. because its view on abortion coincides with the pro-choice stance.

97 Problem Statement Given the same collections, build a model that can answer the following. Structured browsing: given a news article or blog post, the user can ask for examples of other articles from the same ideology about the same topic, or for documents that exemplify alternative views from other ideologies.

98 Approach: Build a Factored Model [diagram: shared topics φ_1 … φ_k, together with ideology-specific views — φ_{1,1} … φ_{1,k} for Ideology 1 and φ_{2,1} … φ_{2,k} for Ideology 2].
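
A sketch of the factored generative idea: each topic k has a shared distribution plus one view per ideology, and a word from topic k in an ideology-v document is drawn either from the shared topic or from that ideology's view of it. The mixing weight and this particular factorization are my reading of the diagram, not the proposal's exact model:

```python
import numpy as np

rng = np.random.default_rng(6)
K, V, LAM = 4, 100, 0.5   # topics, vocabulary size, P(word comes from the ideology view)

shared = rng.dirichlet(0.1 * np.ones(V), K)           # phi_k: shared topic distributions
views = {v: rng.dirichlet(0.1 * np.ones(V), K)        # phi_{v,k}: ideology v's view of topic k
         for v in ('ideology_1', 'ideology_2')}

def generate_doc(ideology, doc_len=50, alpha=0.5):
    theta = rng.dirichlet(alpha * np.ones(K))          # document's topic proportions
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)                     # pick a topic
        dist = views[ideology][z] if rng.random() < LAM else shared[z]
        words.append(rng.choice(V, p=dist))
    return words

doc = generate_doc('ideology_1')
```

Classification then amounts to comparing the likelihood of a new document under each ideology's views, and the topics on which the two views diverge provide the topical justification for the decision.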

99 Example: the Bitterlemons corpus [figure: topics with a shared word distribution plus a Palestinian-view and an Israeli-view distribution — e.g. the overall peace-process topic (palestinian, israeli, peace, political, process, security, conflict, negotiation), a US/administration topic (bush, US, president, american, sharon, administration, powell, arafat, roadmap), a "Roadmap process" topic (roadmap, phase, security, ceasefire, plan, quartet, settlement, implementation, terrorism), and an "Arab involvement" topic (syria, lebanon, negotiation, agreement, regional, plo, hizballah, intifada)].

100 Outline Background Temporal Dynamics – Timelines for research publications – Storylines from news streams – User interest-lines Structural Correspondence – Across modalities – Across ideologies Summary and Timeline

101 Summary Topic models are a flexible framework. They are very useful if you – care about the hidden structure – want to leverage the hidden structure in tasks for which you have few labels – have partially labeled data (many-to-many). Bayesian and hierarchical models are not slow – they can be scaled – and they can be made to work online.

102 Main Contributions Models – Time-varying non-parametric framework Inference – Distributed incremental inference algorithms – Online SMC algorithms Applications – In research publications – Social media

103 Thanks! Questions?

104 Backup slides

105 Hyper-parameter Sensitivity [figure].

106 [Figure: hyper-parameter sensitivity, continued.]

107 [Figure: global menu at T=3.]

108 Structured cLDA and cLDA [figure: a mainstream image with a flat annotation ("market, people") as modeled by correspondence LDA (Blei and Jordan, SIGIR 2003), contrasted with a multi-panel biological figure and its caption as modeled by structured cLDA].

109 Can we use cLDA instead? [Figure: the sample caption and a mainstream image with flat annotations.] Two work-arounds and their problems: replicate the whole caption for every panel, or replicate only the scoped captions — one over-represents and the other under-represents the text, and either way we lose the structure and can no longer answer figure-level queries.

110 Mixtures and Mixed-Membership Models [two plate diagrams] – two orthogonal dimensions: mixture models vs. mixed-membership models.

111 Example Story Story: Obama's controversial pastor – Topics: politics, religion, race – Entities: Obama, Wright, Illinois.

112 Storyline Models We could use clustering – each document belongs to a story (cluster) – but this lacks global structure: what is shared across stories? what about story classification? We could use topic models – but they ignore the notion of a story (tightly focused and short-term); topics are high-level, coarse-grained, long-term concepts.

