Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction. Gunhee Kim, Leonid Sigal, Eric P. Xing. June 16, 2014.


Outline: Problem Statement; Algorithm (video summarization; storyline reconstruction); Experiments; Conclusion

Background: Online photo and video sharing has become extremely popular, which causes an information overload problem in visual data: on average, 3,000 pictures and 100 hours of video are uploaded per minute. Can we obtain an efficient and comprehensive summary of them?

Our Objective (1/2): Jointly summarize large sets of online images and videos, exploiting the fact that the characteristics of the two media are complementary. Videos (e.g., a user video) contain much redundant and noisy information, such as backlit subjects, frames full of trivial background, and overexposure. Images (e.g., a set of photo streams) are more carefully taken from canonical viewpoints. Hence, collections of images can help video summarization.

Our Objective (2/2): Conversely, sequential structure is often missing in images (e.g., a photo stream), whereas videos (e.g., a set of user videos) are motion pictures. Hence, collections of videos can help image summarization.

Problem Statement: (Input) A set of photo streams and user videos for a topic of interest. (Output 1) Video summary: a keyframe-based summary of each video. (Output 2) Image summary as a storyline graph, whose vertices are dominant image clusters and whose edges are chronological or causal relations between them (i.e., transitions that recur in many photo streams).

Flickr and YouTube Dataset: 20 outdoor recreational classes: Surfing Beach, Horse Riding, Rafting, Yacht, Air Ballooning, Rowing, Scuba Diving, Formula One, Snowboarding, Safari Park, Mountain Camping, Rock Climbing, Tour de France, London Marathon, Fly Fishing, Independence Day, Chinese New Year, Memorial Day, St. Patrick's Day, and Wimbledon. In total, the dataset contains 15,912 videos and 2,769,504 images in 35,545 photo streams.

Algorithm for Video Summarization (1/2): (1) For each video, find its K nearest photo streams using the Naïve-Bayes Nearest Neighbor method; this is needed because photo streams are extremely diverse even when retrieved with the same keywords. (2) Build a similarity graph between video frames and images: the frames are connected by a k-th order Markov chain, and each image casts m similarity votes to video frames.
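A minimal sketch of steps (1) and (2), assuming each video and each photo stream is given as a matrix of local descriptors (one row per descriptor). It uses the standard NBNN image-to-class distance (for every query descriptor, the squared distance to its nearest descriptor in the candidate stream, summed), and builds a symmetric weight matrix in which each frame is linked to its next k frames and each image votes for its m most cosine-similar frames. The function names, the unit chain weights, and cosine similarity as the vote weight are illustrative assumptions, not the authors' exact choices.

```python
import numpy as np

def nbnn_distance(query, stream):
    """NBNN distance: for each query descriptor, the squared distance to its
    nearest descriptor in the stream, summed over all query descriptors."""
    d2 = ((query[:, None, :] - stream[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())

def k_nearest_streams(video_desc, streams, K):
    """Indices of the K photo streams with the smallest NBNN distance to the video."""
    dists = [nbnn_distance(video_desc, s) for s in streams]
    return [int(i) for i in np.argsort(dists)[:K]]

def frame_image_graph(frame_feats, image_feats, k=2, m=3):
    """Symmetric weight matrix over the nodes [frames..., images...].

    Frames are linked to their next k frames (k-th order chain, unit weight,
    an assumption); each image votes for its m most cosine-similar frames
    (hypothetical weighting by the cosine similarity itself)."""
    n_f, n_i = len(frame_feats), len(image_feats)
    W = np.zeros((n_f + n_i, n_f + n_i))
    for i in range(n_f):
        for j in range(i + 1, min(i + k + 1, n_f)):
            W[i, j] = W[j, i] = 1.0
    F_n = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    I_n = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    S = I_n @ F_n.T                                # (n_images, n_frames) cosine similarity
    for a in range(n_i):
        for f in np.argsort(-S[a])[:m]:            # the image's m votes
            W[n_f + a, f] = W[f, n_f + a] = max(S[a, f], 0.0)
    return W
```

The resulting matrix W is the graph on which the diversity-ranking step operates.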

Algorithm for Video Summarization (2/2): (3) Solve a diversity-ranking optimization problem on this graph: choose the nodes at which to place heat sources so as to maximize the temperature. Sources should be (i) densely connected nodes that are (ii) distant from one another. The objective is submodular [Kim et al. ICCV 2011], so a simple greedy algorithm achieves a constant-factor approximation.
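The greedy strategy can be sketched as follows. Following the heat-diffusion analogy, this sketch assumes that chosen sources are held at temperature 1, that every node leaks heat to a zero-temperature ground through a constant conductance z (a hypothetical parameter), and that the steady-state temperatures of the other nodes solve a linear system built from the graph Laplacian. At each step, the node with the largest gain in total temperature becomes a new source. It illustrates greedy maximization only and is not the exact formulation of [Kim et al. ICCV 2011].

```python
import numpy as np

def temperatures(W, sources, z=0.1):
    """Steady-state temperatures on a symmetric nonnegative weight matrix W.

    Sources are held at temperature 1; every node also leaks heat to a
    0-temperature ground through conductance z (assumption of this sketch).
    Each free node i satisfies sum_j W[i, j] * (T_i - T_j) + z * T_i = 0."""
    n = W.shape[0]
    S = list(sources)
    S_set = set(S)
    U = [i for i in range(n) if i not in S_set]
    T = np.zeros(n)
    T[S] = 1.0
    if U:
        L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
        A = L[np.ix_(U, U)] + z * np.eye(len(U))       # positive definite for z > 0
        b = W[np.ix_(U, S)].sum(axis=1)                # inflow from sources at T = 1
        T[U] = np.linalg.solve(A, b)
    return T

def greedy_heat_sources(W, K, z=0.1):
    """Greedily add the node whose placement as a source most increases
    the total temperature of the graph."""
    chosen = []
    for _ in range(K):
        base = temperatures(W, chosen, z).sum() if chosen else 0.0
        best, best_gain = None, -np.inf
        for c in range(W.shape[0]):
            if c in chosen:
                continue
            gain = temperatures(W, chosen + [c], z).sum() - base
            if gain > best_gain:
                best, best_gain = c, gain
        chosen.append(best)
    return chosen
```

On two densely connected groups joined by a weak edge, the first source heats its own group, so the second source is placed in the other group: the picks are dense yet mutually distant, as the slide requires.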

Outline: Problem Statement; Algorithm (video summarization; image summarization, i.e., storyline reconstruction); Experiments; Conclusion

Definition of Storyline Graphs (1/2): The vertex set of a storyline graph is the set of codewords (i.e., image clusters) rather than individual images, because there are too many images and many of them are largely redundant. Its edges represent popular transitions recurring across many photo streams, and they should be sparse and time-varying [Song et al. 09, Kolar et al. 10]. Sparsity: each node has only a small number of branching stories, so the adjacency matrix has only a few nonzero elements.

Definition of Storyline Graphs (2/2): Time-varying: popular transitions change over time. [Figure: a timeline (t = 10AM, 12PM, 2PM) and the transitions from one cluster at 1PM and at 7PM.]
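One way to write this definition, assuming M codewords and writing A_t for the adjacency matrix at time t, with the orientation chosen so that row d lists the predecessors of codeword d (our convention, matching a linear model in which x_i is predicted from A_t x_{i-1}):

```latex
\begin{aligned}
G_t &= (\mathcal{O}, E_t), \qquad \mathcal{O} = \{1,\dots,M\} \ \text{(codewords)},\\
A_t &\in \mathbb{R}^{M\times M}, \qquad (A_t)_{d d'} \neq 0 \iff (d' \to d) \in E_t,\\
\operatorname{nnz}(A_t) &\ll M^2 \quad \text{(sparsity: few branching stories per node)},\\
A_t &\approx A_{t'} \ \text{for } t \approx t' \quad \text{(smoothly time-varying)}.
\end{aligned}
```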

Directed Tree Derived from a Photo Stream (1/2): (1) For each photo stream, find its K nearest videos using the Naïve-Bayes Nearest Neighbor method. (2) Connect the images in the photo stream by a k-th order Markov chain. (3) Detect keyframes in each neighbor video. (4) Connect additional links based on one-to-one correspondences.

Directed Tree Derived from a Photo Stream (2/2): (5) Replace each vee structure, an impractical artifact, by two parallel edges. In a vee structure, two images are followed by a common image, which would mean that both must occur in order for the common image to appear.
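A sketch of steps (2) and (5) for a single photo stream, under one reading of step (5): every image is linked to each of its k predecessors by a separate (parent, child) edge, so a child with several parents becomes several parallel single-parent transitions instead of one joint dependency. It assumes each image is identified by its index and carries a timestamp; steps (1), (3), and (4), which involve the neighbor videos, are omitted, and the function name is hypothetical.

```python
def transition_pairs(timestamps, k=1):
    """(parent, child) index pairs for one photo stream under a k-th order chain.

    Images are first sorted by timestamp. A child with several parents
    contributes one separate edge per parent (parallel edges), instead of a
    single joint 'vee' dependency that would require all parents to occur."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    pairs = []
    for pos in range(1, len(order)):
        for prev in range(max(0, pos - k), pos):
            pairs.append((order[prev], order[pos]))
    return pairs
```

For example, with k = 2 and four images in time order, image 2 yields the two parallel edges (0, 2) and (1, 2).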

Inferring Photo Storyline Graphs (1/3): Input: a set of photo streams. Output: a set of adjacency matrices A_t, one per time step t. Objective: derive the likelihood of an observed set of photo streams under reasonable assumptions. (A1) All photo streams are taken independently, so the likelihood is a product of the likelihoods of single photo streams. (A2) A k-th order Markovian assumption holds between consecutive images in a photo stream (e.g., k = 1). (A3) The codewords of x_i^l (the i-th image of photo stream l) are conditionally independent of one another given x_{i-1}^l. The likelihood therefore reduces to a product of transition models.
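With k = 1, the three assumptions give the following factorization. This is a sketch in our own notation: 𝒫 = {P^1, …, P^L} is the set of photo streams, P^l = (x_1^l, …, x_{N_l}^l), and x_{i,d}^l is the d-th codeword entry of x_i^l.

```latex
\begin{aligned}
P(\mathcal{P}) &= \prod_{l=1}^{L} P(P^l) && \text{(A1)}\\
P(P^l) &= P(x_1^l)\prod_{i=2}^{N_l} P(x_i^l \mid x_{i-1}^l) && \text{(A2, } k=1\text{)}\\
P(x_i^l \mid x_{i-1}^l) &= \prod_{d=1}^{M} P(x_{i,d}^l \mid x_{i-1}^l) && \text{(A3)}
\end{aligned}
```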

Inferring Photo Storyline Graphs (2/3): For the transition model, use a linear dynamic model with Gaussian noise, in which x_i^l is a linear function of x_{i-1}^l under the 1st-order Markovian assumption, or of its k preceding images under the k-th order Markovian assumption. Example: a transition from x to y is very unlikely!

Inferring Photo Storyline Graphs (3/3): Under the linear dynamic model with Gaussian noise and the 1st-order Markovian assumption, the transition model can be written per dimension d, using the d-th row of A_t. The log likelihood then decomposes into a sum of per-dimension terms.
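For a fixed t, and assuming isotropic Gaussian noise with variance σ² (our notation, with A_{t,d·} the d-th row of A_t), the 1st-order transition model, its per-dimension form, and the resulting log likelihood can be written as follows; the initial-image terms P(x_1^l) do not depend on A_t and are absorbed into the constant.

```latex
\begin{aligned}
x_i^l &= A_t\, x_{i-1}^l + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I_M),\\
x_{i,d}^l &= A_{t,d\cdot}\, x_{i-1}^l + \epsilon_d, \qquad \epsilon_d \sim \mathcal{N}(0, \sigma^2),\\
\log P(\mathcal{P}) &= -\frac{1}{2\sigma^2}\sum_{d=1}^{M}\sum_{l=1}^{L}\sum_{i=2}^{N_l}\bigl(x_{i,d}^l - A_{t,d\cdot}\, x_{i-1}^l\bigr)^2 + \text{const}.
\end{aligned}
```

Because the sum splits over d, each row A_{t,d·} can be estimated separately.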

Optimization (1/2): (A4) Graphs vary smoothly over time. For each t, estimate A_t by maximizing the log likelihood, in which the data (i.e., images) along the timeline are weighted by a Gaussian kernel centered at t.
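A minimal sketch of the Gaussian kernel weighting, assuming each transition carries a single timestamp (e.g., that of its child image) and a bandwidth h (a hypothetical parameter); the weights are normalized to sum to one. Under isotropic Gaussian noise, maximizing the weighted log likelihood for A_t is equivalent to minimizing the weighted squared error computed below.

```python
import numpy as np

def kernel_weights(timestamps, t, h):
    """Normalized Gaussian kernel weights of transitions around time t.

    timestamps: one time per transition; h: kernel bandwidth (assumption)."""
    ts = np.asarray(timestamps, dtype=float)
    w = np.exp(-((ts - t) ** 2) / (2.0 * h ** 2))
    return w / w.sum()

def weighted_loss(A, X_prev, X_next, w):
    """Kernel-weighted squared error sum_i w_i * ||x_i - A x_{i-1}||^2.

    Row i of X_prev is x_{i-1}; row i of X_next is x_i."""
    R = X_next - X_prev @ A.T          # row i: x_i - A x_{i-1}
    return float((w * (R ** 2).sum(axis=1)).sum())
```

Transitions far from t receive little weight, so A_t is fit mostly to images taken near t, which is what makes the estimated graphs vary smoothly.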

Optimization (2/2): In summary, graph inference iteratively solves a weighted L1-regularized least-squares problem, where the L1 penalty enforces sparsity. This problem is trivially parallelizable (over each dimension d) and can be solved by linear-time algorithms (e.g., coordinate descent), which is important in our problem (i.e., handling millions of images).
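The per-row problem can be sketched as minimizing Σ_i w_i (y_i − a·x_i)² + λ‖a‖₁ over a, where x_i is a parent codeword vector, y_i the d-th codeword entry of its child, and w_i a kernel weight. The sketch below uses cyclic coordinate descent with soft-thresholding; each coordinate update costs time linear in the number of transitions. It is a plain NumPy illustration (fixed number of sweeps, no convergence check), not the authors' solver, and the function names are hypothetical.

```python
import numpy as np

def weighted_lasso_cd(X, y, w, lam, n_sweeps=200):
    """Cyclic coordinate descent for
        min_a  sum_i w_i * (y_i - X[i] @ a)**2 + lam * ||a||_1.
    X: (n, M) parent codeword vectors, y: (n,) d-th codeword of the children,
    w: (n,) nonnegative kernel weights."""
    _, p = X.shape
    a = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()        # residual y - X @ a (a starts at 0)
    z = (w[:, None] * X ** 2).sum(axis=0)         # per-coordinate curvature
    for _ in range(n_sweeps):
        for j in range(p):
            if z[j] == 0.0:
                continue                           # column never active: keep a_j = 0
            r += X[:, j] * a[j]                    # residual without coordinate j
            rho = np.dot(w * X[:, j], r)
            # soft-thresholding gives the exact minimizer along coordinate j
            a[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / z[j]
            r -= X[:, j] * a[j]
    return a

def fit_storyline_rows(X_prev, X_next, w, lam, n_sweeps=200):
    """Estimate A_t row by row; every row is an independent problem,
    so this loop could be run in parallel over d."""
    return np.vstack([
        weighted_lasso_cd(X_prev, X_next[:, d], w, lam, n_sweeps)
        for d in range(X_next.shape[1])
    ])
```

With lam = 0 the result coincides with weighted least squares; increasing lam drives more entries of each row exactly to zero, which yields the sparse storyline edges.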

Evaluation of Video Summarization via AMT: Ground truths for video summarization are collected via Amazon Mechanical Turk (AMT): (1) for each of 100 test videos, each algorithm selects K keyframes; (2) at least five turkers are asked to choose ground-truth (GT) keyframes; (3) the keyframes chosen by each algorithm are compared with the GT keyframes. Compared methods: (OursV) our method with videos only; (OursIV) our method with videos and images; (Unif) uniform sampling; (Spect) and (Kmeans) spectral clustering and k-means; (RankT) keyframe extraction using the rank-tracing technique.

Comparison of Video Summarization: [Figure: keyframes for air+ballooning and fly+fishing from the AMT ground truth, (OursIV), (OursV), (Kmeans), and (Unif).] (Unif) cannot correctly handle subshots of different lengths. (OursIV) benefits from the votes of more carefully taken images. (Kmeans) has difficulty choosing the best K. (OursV) suffers from the limitations of using only low-level features.

Evaluation on Storyline Graphs via AMT (1/2): The main difficulty of quantitative evaluation is that no ground truth is available; for a human subject, there are too many images and the graphs are too big. We therefore use crowdsourcing-based evaluation via AMT. [Figure: example for fly+fishing: which is better?]

Evaluation on Storyline Graphs via AMT (2/2): (1) Each algorithm creates a storyline per topic. (2) Sample 100 important images as test images. (3) Each algorithm predicts the most likely next image after each test image. (4) Run a pairwise preference test: given the test image, which of A and B is more likely to come next? Responses are collected from at least 3 turkers per test image. In this way, human subjects evaluate only a basic unit (i.e., the important edges of a storyline) rather than a whole graph. [Figure: a test image followed by candidate A (our method) and candidate B (a baseline).]
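Step (3) can be sketched under the linear dynamic model by scoring the candidate next clusters with A_t x, where x is the codeword vector of the test image; taking the top b scores exposes the branching stories. How a concrete image is then picked from a predicted cluster is not specified in the slides, and the function name and parameter b are hypothetical.

```python
import numpy as np

def predict_next_clusters(A_t, x, b=1):
    """Return the b codewords with the highest predicted score A_t @ x.

    x: codeword vector of the test image (length M); A_t[d, d'] is assumed
    to be the weight of the transition d' -> d."""
    scores = A_t @ np.asarray(x, dtype=float)
    return [int(d) for d in np.argsort(-scores, kind="stable")[:b]]
```

For instance, if cluster 0 is followed by cluster 2 with weight 0.7 and by cluster 3 with weight 0.2, the two branches from cluster 0 are returned in that order.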

Quantitative Evaluation of Storyline Graphs via AMT: Results of the pairwise preference tests. Each number indicates the percentage of responses judging our prediction more likely to occur next; a number must be higher than 50% to validate the superiority of our algorithm. Compared methods: (OursV) our method with videos only; (OursIV) our method with videos and images; (NET) network-based topic models [Kim et al. 2008]; (HMM) hidden Markov models; (Page) PageRank-based image retrieval (no structural information).
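The reported numbers are fractions of pairwise responses that favor our prediction. The sketch below shows how such a fraction could be computed, together with a one-sided binomial sign test of whether it exceeds 50%; the sign test is an illustrative addition, not part of the slides.

```python
import math

def preference_rate(responses):
    """Fraction of turker responses that prefer our prediction (True = ours)."""
    return sum(bool(r) for r in responses) / len(responses)

def sign_test_pvalue(k, n):
    """One-sided binomial sign test: P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

A rate above 0.5 with a small p-value indicates that the preference for our prediction is unlikely to be due to chance.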

Qualitative Evaluation on Storyline Graphs (1/2): Given a pair of images in a novel photo stream, each method predicts 10 images that are likely to occur between them using its storyline graph. (HMM) retrieves reasonably good but highly redundant images, with no branching structure. (PageRank) retrieves high-quality images but with no sequential structure. [Figure: GT images and the predictions of Ours, (HMM), and (PageRank).]

Qualitative Evaluation on Storyline Graphs (2/2): Given a pair of images in a novel photo stream, predict 10 images that are likely to occur between them using the storyline graph. [Figure: GT images, the predictions of Ours, and a downsized storyline graph.]

Conclusion: Joint summarization of Flickr images and YouTube videos, exploiting the complementary characteristics of the two media: images are more carefully taken from canonical viewpoints, while videos are motion pictures. An inference algorithm for sparse time-varying directed graphs, with global optimality, linear complexity, and easy parallelization. A structural summary with branching narratives. A semantic summary even with simple feature similarity. Experiments on 2.7M Flickr images and 17K YouTube videos for 20 classes.

Thank you!