Using Cross-Media Correlation for Scene Detection in Travel Videos

Outline
• Introduction
• Approach
• Experiments
• Conclusion

Introduction
Why use cross-media correlation for scene detection in travel videos, and what is the correlation between photos and videos? More and more people record their daily lives and travel experiences with both digital cameras and camcorders, now that both kinds of device are much cheaper.

Why use cross-media correlation for scene detection in travel videos? What is the correlation between photos and videos?
• People often capture travel experiences with both still cameras and camcorders, so the content stored in photos and videos carries similar information, such as landmarks and human faces.
• Massive amounts of home video are captured in uncontrolled environments and suffer from problems such as overexposure, underexposure, and hand shaking.

Why use cross-media correlation for scene detection in travel videos?
• Direct scene detection in video is hard.
• There is high correlation between photos and videos.
• Photos provide high-quality data, which makes scene detection much easier.

Approach
Why do people use photos and videos for different purposes, even when capturing the same things?
• Photos: to obtain high-quality records, capturing famous landmarks or human faces.
• Videos: to capture the evolution of an event.
We exploit this correlation to accomplish tasks that are hard to carry out on videos but easy to do on photos.

Framework
• To perform scene detection in photos, we first cluster photos by their time information.
• To perform scene detection in videos, we first extract several keyframes for each video shot, then find the optimal matching between the photo and keyframe sequences.

The idea of scene detection based on cross-media alignment

The proposed cross-media scene detection framework:
• Photos: time-based clustering → visual word representation
• Videos: shot change detection → keyframe extraction → filtering (motion-blurred keyframes) → visual word representation
• DP-based matching of the two sequences → scene boundaries
The filtering step not only reduces the time of cross-media matching but also eliminates the influence of bad-quality images.

Preprocessing: scene detection for photos
• Cluster photos by their shooting times. Denote the time difference between the i-th photo and the (i+1)-th photo as g_i = t_{i+1} - t_i.
• A scene change is claimed to occur between the n-th and (n+1)-th photos when g_n is large relative to the neighboring gaps. We set K = 17 and d = 10 in this work, where K is an empirical threshold and d is the size of the sliding window.
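
The slide defines the gap g_i and the parameters K and d, but the boundary criterion itself was lost with the slide image. Below is a minimal sketch, assuming the common adaptive log-gap threshold (declare a boundary where a gap exceeds K times the geometric mean of the gaps in the surrounding window); the function name and the epsilon guard are illustrative, not from the paper:

```python
import numpy as np

def photo_scene_boundaries(timestamps, K=17.0, d=10):
    """Cluster photos into scenes by shooting time (sketch, see note above)."""
    t = np.asarray(sorted(timestamps), dtype=float)
    g = np.diff(t)                        # g_i = t_{i+1} - t_i
    logs = np.log(g + 1e-9)               # guard against zero gaps
    boundaries = []
    for n in range(len(g)):
        lo, hi = max(0, n - d), min(len(g), n + d + 1)
        # Boundary when the gap is K times larger than the geometric
        # mean of the gaps inside the sliding window (assumed criterion).
        if logs[n] >= np.log(K) + logs[lo:hi].mean():
            boundaries.append(n)          # scene change after photo n
    return boundaries
```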

Preprocessing: keyframe extraction for videos
• Use the global k-means algorithm to extract keyframes.
• Detect and filter out blurred keyframes. This not only reduces the time of cross-media matching but also eliminates the influence of bad-quality images.
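
The slides do not say how blurred keyframes are detected. A minimal filtering sketch, assuming the widely used variance-of-Laplacian blur measure; `threshold` is a hypothetical tuning parameter:

```python
import cv2

def is_blurred(keyframe_bgr, threshold=100.0):
    """True if a keyframe looks motion-blurred (sketch, see note above)."""
    gray = cv2.cvtColor(keyframe_bgr, cv2.COLOR_BGR2GRAY)
    # Low variance of the Laplacian means few sharp edges, i.e. blur.
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```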

Visual Word Representation
• Apply the difference-of-Gaussians (DoG) detector to find feature points in keyframes and photos.
• Use SIFT (Scale-Invariant Feature Transform) to describe each point as a 128-dimensional feature vector.
• Cluster the SIFT feature vectors with a k-means algorithm; feature points in the same cluster are claimed to belong to the same visual word.

Visual Word Representation (pipeline)
Keyframes and photos → SIFT feature points (feature vectors) → k-means → visual words
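
A minimal bag-of-visual-words sketch of this pipeline. OpenCV's SIFT implementation uses DoG keypoints and 128-dimensional descriptors, matching the slides; the vocabulary size `n_words` and the helper name are illustrative (the experiments vary the number of visual words):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_visual_word_histograms(images_bgr, n_words=500):
    """Map each image to a visual-word histogram (sketch, see note above)."""
    sift = cv2.SIFT_create()              # DoG keypoints, 128-D descriptors
    per_image = []
    for img in images_bgr:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc)
    all_desc = np.vstack([d for d in per_image if d is not None])
    vocab = KMeans(n_clusters=n_words).fit(all_desc)   # the visual vocabulary
    hists = []
    for desc in per_image:
        if desc is None:                  # image with no feature points
            hists.append(np.zeros(n_words, dtype=int))
        else:
            words = vocab.predict(desc)   # assign each point to a visual word
            hists.append(np.bincount(words, minlength=n_words))
    return hists
```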

Visual Word Histogram Matching
Let X_i denote the i-th prefix of X, i.e., X_i = (x_1, x_2, ..., x_i). LCS(X_i, Y_j) denotes the length of the longest common subsequence between X_i and Y_j.
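
A sketch of the DP-based matching, assuming a keyframe x_i and a photo y_j "match" when their visual-word histograms are similar enough; the histogram-intersection similarity and the threshold value are assumptions (the experiments below vary the similarity threshold):

```python
import numpy as np

def histogram_similarity(h1, h2):
    """Normalized histogram intersection in [0, 1] (an assumed choice)."""
    return np.minimum(h1, h2).sum() / max(h1.sum(), h2.sum(), 1)

def lcs_table(keyframe_hists, photo_hists, sim_threshold=0.3):
    """Fill the LCS DP table; L[i, j] = LCS(X_i, Y_j)."""
    m, n = len(keyframe_hists), len(photo_hists)
    L = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if histogram_similarity(keyframe_hists[i - 1],
                                    photo_hists[j - 1]) >= sim_threshold:
                L[i, j] = L[i - 1, j - 1] + 1   # x_i matches y_j
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    return L   # backtracking through L recovers the actual alignment
```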

Evaluation Data

Evaluation Metric

purity = Σ_i (τ(s_i) / T) · Σ_j (τ(s_i, s_j*) / τ(s_i))²

where τ(s_i, s_j*) is the length of the overlap between the scene s_i and the ground-truth scene s_j*, τ(s_i) is the length of the scene s_i, and T is the total length of all scenes. The first term indicates the fraction of the currently evaluated scene, and the second term indicates how much a given scene is split into smaller scenes. The purity value ranges from 0 to 1; a larger purity value means the result is closer to the ground truth.
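
A small helper that computes purity from interval lists, following the formula above (itself reconstructed from the slide's definitions, so treat it as an assumption):

```python
def purity(detected, ground_truth, T):
    """Purity of detected scenes against ground-truth scenes.

    `detected` and `ground_truth` are lists of (start, end) intervals;
    T is the total length of all scenes.
    """
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    value = 0.0
    for s in detected:
        length = s[1] - s[0]
        if length <= 0:
            continue
        # Second term: close to 1 if s maps to one ground-truth scene,
        # small if s is split across many of them.
        split = sum((overlap(s, g) / length) ** 2 for g in ground_truth)
        value += (length / T) * split
    return value
```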

Performance in terms of purity based on different numbers of visual words, with different similarity thresholds

Performance based on four different scene detection approaches, including a baseline using Hue-Saturation-Value (HSV) color features

Conclusion
• For videos, extract keyframes with the global k-means algorithm. (Scene spots can easily be determined from the time information of photos.)
• Represent the keyframes and the photo set as sequences of visual words.
• Transform scene detection into a sequence matching problem.

Conclusion
• Using a dynamic programming approach, find the optimal matching between the two sequences and determine video scene boundaries with the help of photo scene boundaries.
• Experiments on different travel videos with different parameter settings show that exploiting the correlation between different modalities is effective.