Multimodal Learning with Deep Boltzmann Machines


1 Multimodal Learning with Deep Boltzmann Machines
Authors: Nitish Srivastava, Ruslan Salakhutdinov
Presenter: Shuochao Yao

2 Data - Collection of Modalities
Multimedia content on the web: image + text + audio.
Product recommendation systems.
Robotics applications.
Modalities shown: vision, audio, motion sensors, motor control.
Example tags: sunset, pacificocean, bakerbeach, seashore, ocean; car, automobile.

3 Shared Concept
“Concept”: a “modality-free” representation.
“Modality-full” representations.

4 Building a Probabilistic Model
Learn a joint density model P(h, v_image, v_text).
h: “fused” representation for classification and retrieval, inferred from P(h | v_text, v_image).
Generate data from conditional distributions: P(h, v_text | v_image) for image annotation, P(h, v_image | v_text) for image retrieval.
Figure: the “concept” h sits above v_image and v_text; either modality can be missing data.
Example text: sunset, pacificocean, bakerbeach, seashore, ocean.
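As a brief worked note, each conditional on this slide follows from the joint density by normalising over whatever is not observed. The primed symbols below are summation variables introduced only for this sketch; for real-valued image features the sum over image configurations would be an integral.

```latex
P(h \mid v_{image}, v_{text})
  = \frac{P(h, v_{image}, v_{text})}{\sum_{h'} P(h', v_{image}, v_{text})}

P(h, v_{text} \mid v_{image})
  = \frac{P(h, v_{image}, v_{text})}{\sum_{h'} \sum_{v'_{text}} P(h', v_{image}, v'_{text})}
```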

5 Challenges - 1
Very different input representations.
Images: real-valued, dense features.
Text: discrete, sparse word counts.
Difficult to learn cross-modal features from low-level representations.
Example text: sunset, pacificocean, bakerbeach, seashore, ocean.

6 Challenges - 2
Noisy and missing data.
Text:
pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion
mickikrimmel, mickipedia, headshot
< no text >
unseulpixel, naturey, crap
Text generated by model:
beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
fall, autumn, trees, leaves, foliage, forest, woods, branches, path
portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model
night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow

7 Restricted Boltzmann Machines
An RBM is a Markov Random Field with:
Stochastic binary visible variables.
Stochastic binary hidden variables.
Bipartite connections.
Equation labels: pair-wise potentials, unary potentials, hidden variables.
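The equation on this slide did not survive in the transcript. For reference, the standard binary RBM energy has exactly one pair-wise term and two unary terms; the symbols W (weights), b (visible biases), a (hidden biases) and Z (normalising constant) are notation assumed for this sketch rather than taken from the slide.

```latex
E(v, h; \theta) = -\sum_{i}\sum_{j} v_i W_{ij} h_j \;-\; \sum_{i} b_i v_i \;-\; \sum_{j} a_j h_j

P(v; \theta) = \frac{1}{Z(\theta)} \sum_{h} \exp\bigl(-E(v, h; \theta)\bigr)
```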

8 RBMs on Real-Valued Data
A Gaussian RBM is an RBM with:
Stochastic real-valued visible variables.
Stochastic binary hidden variables.
Bipartite connections.
Equation labels: pair-wise potentials, unary potentials, hidden variables.
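The slide's equation is missing from the transcript. One common parametrisation of the Gaussian RBM energy is sketched below; the per-dimension standard deviations σ_i and the symbols W, a, b are assumptions of this sketch, and other texts scale the terms differently.

```latex
E(v, h; \theta) = \sum_{i} \frac{(v_i - b_i)^2}{2\sigma_i^2}
  \;-\; \sum_{i}\sum_{j} \frac{v_i}{\sigma_i} W_{ij} h_j \;-\; \sum_{j} a_j h_j

P(h_j = 1 \mid v) = \mathrm{sigmoid}\Bigl(\sum_{i} W_{ij} \frac{v_i}{\sigma_i} + a_j\Bigr),
\qquad
P(v_i \mid h) = \mathcal{N}\Bigl(b_i + \sigma_i \sum_{j} W_{ij} h_j,\; \sigma_i^2\Bigr)
```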

9 RBMs on Word Counts
Replicated Softmax Model: an undirected topic model with:
Stochastic 1-of-K visible variables.
Stochastic binary hidden variables.
Bipartite connections.
Equation labels: pair-wise potentials, unary potentials.
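A minimal sketch of the two conditionals of a Replicated Softmax model, assuming the usual formulation in which the hidden biases are scaled by the document length. The function names, array shapes and parameter names (W, a, b) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def replicated_softmax_hidden_probs(counts, W, a):
    """P(h_j = 1 | word counts).

    counts: (K,) word-count vector over a K-word vocabulary
    W:      (K, F) word-to-hidden weights (assumed shape)
    a:      (F,) hidden biases
    """
    counts = np.asarray(counts, dtype=float)
    M = counts.sum()  # document length: total number of word tokens
    # The hidden bias is multiplied by M because the same softmax unit
    # is "replicated" once per token in the document.
    return sigmoid(counts @ W + M * a)

def replicated_softmax_word_probs(h, W, b):
    """Softmax distribution over the K words given a hidden state h."""
    logits = W @ np.asarray(h, dtype=float) + b
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Sampling a document of length M from the second function would amount to M draws from the returned distribution.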

10 A nice thing about RBMs
P(h|v) is easy to compute exactly.
Binary/Gaussian/Softmax RBMs: all have binary hidden variables but use them to model different kinds of data (binary, real-valued, and 1-of-K, respectively).
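To illustrate why P(h|v) is cheap to compute exactly, the sketch below compares the closed-form factorial posterior of a small binary RBM against brute-force enumeration of every hidden state. The parameter names (W, a, b) and the toy sizes are assumptions made for the example.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_posterior(v, W, a):
    """Closed form: P(h_j = 1 | v) = sigmoid(sum_i v_i W_ij + a_j)."""
    return sigmoid(v @ W + a)

def brute_force_posterior(v, W, a, b):
    """Same marginals, obtained by enumerating all 2^F hidden states."""
    F = W.shape[1]
    states = np.array(list(itertools.product([0, 1], repeat=F)), dtype=float)
    # Negative energy v^T W h + b^T v + a^T h for each hidden state h.
    neg_energy = states @ (v @ W + a) + v @ b
    p = np.exp(neg_energy - neg_energy.max())
    p /= p.sum()              # P(h | v) over the enumerated states
    return p @ states         # marginal P(h_j = 1 | v) for each j

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 visible units, 3 hidden units
a = rng.normal(size=3)        # hidden biases
b = rng.normal(size=4)        # visible biases
v = np.array([1.0, 0.0, 1.0, 1.0])

fast = hidden_posterior(v, W, a)
slow = brute_force_posterior(v, W, a, b)
print(np.allclose(fast, slow))
```

The two agree because, with bipartite connections, the hidden units are conditionally independent given v, so the posterior factorises into one sigmoid per hidden unit.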

11 A Simple Multimodal Model
Use a joint binary hidden layer over the real-valued (image) and 1-of-K (text) inputs.
Problem: inputs have very different statistical properties.
Difficult to learn cross-modal features.

12 Deep Boltzmann Machines
Layers of binary hidden variables.
Undirected connections across layers.
Hidden variables are dependent even when conditioned on the input.
Learns higher order features.
Equation labels: same as RBMs, plus hidden-hidden potentials.
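Because the hidden layers stay coupled even when the input is given, the posterior no longer factorises as it does in an RBM. A mean-field fixed-point iteration is one common approximation; the slides do not state which inference procedure is used, so the sketch below is an assumption, written for a two-hidden-layer DBM with binary visibles and hypothetical parameter names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbm_mean_field(v, W1, W2, b1, b2, n_iters=50):
    """Approximate P(h1, h2 | v) with a fully factorised distribution.

    v:  (D,) visible vector
    W1: (D, F1) visible-to-hidden1 weights
    W2: (F1, F2) hidden1-to-hidden2 weights
    b1, b2: hidden biases for the two layers
    Returns mean activations (mu1, mu2).
    """
    # Initialise with a single bottom-up pass.
    mu1 = sigmoid(v @ W1 + b1)
    mu2 = sigmoid(mu1 @ W2 + b2)
    for _ in range(n_iters):
        # Layer 1 receives bottom-up input from v and top-down input from layer 2.
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T + b1)
        # Layer 2 only sees layer 1.
        mu2 = sigmoid(mu1 @ W2 + b2)
    return mu1, mu2
```

The top-down term `mu2 @ W2.T` is what distinguishes this from stacking two independent RBM posteriors.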

13 Multimodal DBMs
Image pathway: Gaussian RBM on dense image features.
Text pathway: Replicated Softmax on sparse word counts.

14 Multimodal DBMs
Bottom-up + top-down.
Image pathway: Gaussian RBM on dense image features.
Text pathway: Replicated Softmax on sparse word counts.

15 Text Generated From Images
insect, butterfly, insects, bug, butterflies, lepidoptera
dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
graffiti, streetart, stencil, sticker, urbanart, street, mural, nyc, graff, sanfrancisco
food, art, dessert, cooking, delicious, cake, lunch, sugar
architecture, reflection, window, building, facade, architektur
sea, france, boat, mer, beach, river, bretagne, plage, brittany
portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
canada, nature, sunrise, ontario, fog, mist, bc, morning

16 Text Generated From Images
portrait, women, army, soldier, mother, postcard, soldiers
obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop

17 Images from Text
water, red, sunset
nature, flower, red, green
blue, green, yellow, colors
chocolate, cake

18 Pretraining
The two pathways are pretrained independently.
Image pathway: Gaussian RBM on dense image features.
Text pathway: Replicated Softmax on sparse word counts.

19 Data Set
MIR-Flickr dataset (Huiskes et al.).
1 million images along with user-assigned tags.
Example tags:
anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography
nikon, abigfave, goldstaraward, d80, nikond80
food, cupcake, vegan
sculpture, beauty, stone
sky, geotagged, reflection, cielo, bilbao, reflejo
white, yellow, abstract, lines, bus, graphic
nikon, green, light, photoshop, apple, d70 d80

20 Data and Architecture
12M parameters.
Image features: Gist, SIFT, MPEG-7 descriptors.
Text: 2000 most frequent tags.
25K labelled subset: 38 classes (sky, tree, baby, car, cloud ...).
Layer sizes shown in the figure: 2048, 1024, 2000, 3857.
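As a rough sanity check on the "12M parameters" figure, the arithmetic below counts weights under an assumed wiring that the transcript does not spell out: a 3857-dimensional image input and a 2000-dimensional text input, each followed by two 1024-unit hidden layers, with both pathways feeding a 2048-unit joint layer. Only the four numbers come from the slide; how they are connected is a guess, and biases are ignored.

```python
# ASSUMED wiring: the slide lists only the sizes 2048, 1024, 2000, 3857.
IMAGE_DIM, TEXT_DIM = 3857, 2000
PATHWAY_HIDDEN, JOINT_HIDDEN = 1024, 2048

weight_shapes = [
    (IMAGE_DIM, PATHWAY_HIDDEN),       # image input    -> image hidden 1
    (PATHWAY_HIDDEN, PATHWAY_HIDDEN),  # image hidden 1 -> image hidden 2
    (TEXT_DIM, PATHWAY_HIDDEN),        # text input     -> text hidden 1
    (PATHWAY_HIDDEN, PATHWAY_HIDDEN),  # text hidden 1  -> text hidden 2
    (PATHWAY_HIDDEN, JOINT_HIDDEN),    # image hidden 2 -> joint layer
    (PATHWAY_HIDDEN, JOINT_HIDDEN),    # text hidden 2  -> joint layer
]

# Total number of connection weights, biases excluded.
total_weights = sum(rows * cols for rows, cols in weight_shapes)
print(f"{total_weights:,} weights")
```

Under this assumed wiring the total comes to roughly 12.3 million weights, which is in line with the slide's figure.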

21 Classification Results
Logistic regression on top-level representation.
Multimodal inputs; MAP = mean average precision.
Method                   MAP     Prec@50
Random                   0.124
LDA [Huiskes et al.]     0.492   0.754
SVM [Huiskes et al.]     0.475   0.758
DBM-Labelled             0.526   0.791
DBM-Unlabelled           0.585   0.836
Deep Belief Net          0.599   0.867
Autoencoder              0.600   0.875
DBM                      0.609   0.873
Annotations on the table: similar features, 25K; + 1 million unlabelled; + SIFT features.
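For readers unfamiliar with the metric, here is a minimal sketch of mean average precision using the standard ranking definition. The exact evaluation protocol behind the numbers on this slide is not given in the transcript, so the function names and the labels-by-classes layout are assumptions for illustration.

```python
import numpy as np

def average_precision(labels, scores):
    """Average precision for one class.

    labels: binary relevance per example (1 = class present)
    scores: classifier scores, higher means more confident
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels)[order]       # labels sorted by descending score
    n_pos = ranked.sum()
    if n_pos == 0:
        return 0.0                            # no positives: define AP as 0
    hits = np.cumsum(ranked)                  # positives seen up to each rank
    ranks = np.arange(1, len(ranked) + 1)
    # Precision at each rank, averaged over the ranks holding a positive.
    return float(np.sum((hits / ranks) * ranked) / n_pos)

def mean_average_precision(label_matrix, score_matrix):
    """Mean of per-class AP; rows are examples, columns are classes."""
    label_matrix = np.asarray(label_matrix)
    score_matrix = np.asarray(score_matrix)
    aps = [average_precision(label_matrix[:, c], score_matrix[:, c])
           for c in range(label_matrix.shape[1])]
    return float(np.mean(aps))
```

With 38 classes, `label_matrix` and `score_matrix` would each have 38 columns, and the reported MAP would be the mean of the 38 per-class values.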

22 Classification Layer-wise

23 Classification Results
Figure labels: Training Phase, Test Phase, Missing Text.
Method                          MAP     Prec@50
Image-LDA [Huiskes et al.]      0.315   -
Image-SVM [Huiskes et al.]      0.375
Image-DBM                       0.469   0.803
Multimodal-DBM (missing text)   0.531   0.832

24 Thank you!

25 Image features Words

26 Generated Text

