
Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009 Yi-Fan Zhang, Student Member, IEEE, Changsheng Xu, Senior Member, IEEE, Hanqing Lu, Senior Member, IEEE, and Yueh-Min Huang, Member, IEEE

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions


Introduction In a film, the interactions among the characters form a relationship network, so a film can be treated as a small society. In the video, faces can stand for characters, and the co-occurrence of faces in a scene can represent an interaction between characters. In the film script, the spoken lines of different characters appearing in the same scene likewise represent an interaction. A script scene is structured as: scene title; brief description (environment, actions); speaker name; spoken line.

Introduction System overview (diagram): the speaking face tracks build the face affinity network, the script names build the name affinity network, the two networks are associated by a graph matching method, face track distance is measured with an EMD-based metric, and the result exposes the leading characters and their cliques. To stay as consistent as possible with the name statistics in the script, we select the speaking face tracks to build the face affinity network, which is based on the co-occurrence of the speaking face tracks.

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions

Face Clustering Face detection: detect faces in each frame of the video. Face track (the same person): store the face position, scale, and the start and end frame numbers of the track. Scene segmentation: 1. Scene segmentation points can be inserted at the boundaries between shots that have a high degree of discontinuity. 2. To align with the scene partition in the film script, the discontinuity threshold is adjusted so that the video has the same number of scenes as the script. Speaking face track detection: 1. the mouth ROI is located; 2. SIFT; 3. normalized sum of absolute differences (NSAD); 4. if a face track has more than 10% of its frames labeled as speaking, it is determined to be a speaking face track.
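The per-track decision in step 4 can be sketched as follows (a minimal Python sketch; the NSAD motion threshold `frame_thresh` is an illustrative assumption — only the 10% speaking-frame ratio comes from the slide):

```python
import numpy as np

def nsad(roi_a, roi_b):
    """Normalized sum of absolute differences between two mouth ROIs
    (8-bit grayscale patches of identical size)."""
    a = roi_a.astype(np.float64)
    b = roi_b.astype(np.float64)
    return np.abs(a - b).sum() / (a.size * 255.0)

def is_speaking_track(mouth_rois, frame_thresh=0.05, track_ratio=0.10):
    """Label a frame 'speaking' when the mouth region changes between
    consecutive frames; the track is a speaking face track when more
    than track_ratio of its frames are labeled speaking."""
    if len(mouth_rois) < 2:
        return False
    speaking = [nsad(mouth_rois[i], mouth_rois[i + 1]) > frame_thresh
                for i in range(len(mouth_rois) - 1)]
    return bool(np.mean(speaking) > track_ratio)
```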

Face Clustering Face representation by locally linear embedding (LLE): 1. LLE is a dimensionality reduction technique. 2. It projects high-dimensional face features into an embedding space that still preserves the neighborhood relationships. Extracting the dominant clusters: 1. We employ spectral clustering on all the faces in the LLE space. 2. The number of clusters K is set by prior knowledge derived from the film script.
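These two steps map directly onto standard implementations. A minimal sketch with scikit-learn (the neighborhood size and embedding dimensionality are illustrative assumptions; K would come from the script as described above):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import SpectralClustering

def cluster_faces(face_features, n_clusters, n_neighbors=10, n_components=8):
    """Project face feature vectors into a low-dimensional LLE space,
    then spectral-cluster them into n_clusters dominant clusters."""
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                 n_components=n_components)
    embedded = lle.fit_transform(np.asarray(face_features))
    sc = SpectralClustering(n_clusters=n_clusters,
                            affinity='nearest_neighbors',
                            n_neighbors=n_neighbors,
                            random_state=0)
    return sc.fit_predict(embedded)
```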

Face Clustering Earth mover's distance (EMD): a metric that evaluates the dissimilarity between two distributions. Measuring face track distance by EMD: a face track P is represented as a signature over the dominant clusters, P = {(c_1, w_1), ..., (c_K, w_K)}, where c_i is the i-th cluster center and w_i is the number of the track's faces belonging to that cluster. The distance between two tracks P and Q is then EMD(P, Q) = (sum_{i,j} d_ij * f_ij) / (sum_{i,j} f_ij), where d_ij is the ground distance between the cluster centers c_i and c_j, and f_ij is the flow between them under the minimum-cost transportation plan.
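The EMD above is the optimum of a small transportation linear program and can be computed directly, e.g. with SciPy (a sketch assuming a Euclidean ground distance between cluster centers; the paper's exact ground distance may differ):

```python
import numpy as np
from scipy.optimize import linprog

def emd(weights_p, centers_p, weights_q, centers_q):
    """Earth mover's distance between two signatures {(c_i, w_i)},
    solved as the classical transportation linear program."""
    wp = np.asarray(weights_p, dtype=float)
    wq = np.asarray(weights_q, dtype=float)
    cp = np.atleast_2d(centers_p)
    cq = np.atleast_2d(centers_q)
    m, n = len(wp), len(wq)
    # ground distance d_ij between cluster centers (Euclidean here)
    d = np.linalg.norm(cp[:, None, :] - cq[None, :, :], axis=-1)
    # flow variables f_ij >= 0, minimize sum_ij d_ij * f_ij
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0      # sum_j f_ij <= w_i (sources)
    for j in range(n):
        A_ub[m + j, j::n] = 1.0               # sum_i f_ij <= w_j (sinks)
    b_ub = np.concatenate([wp, wq])
    A_eq = np.ones((1, m * n))                # move all available mass
    b_eq = [min(wp.sum(), wq.sum())]
    res = linprog(d.ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun / b_eq[0]                  # normalize by total flow
```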

Face Clustering Constrained K-means clustering: 1. K-means clustering is performed to group the scattered face tracks. 2. Two face tracks that share common frames cannot be clustered together (faces in the same frame must be different people). 3. The target number of face track clusters is the same K we set in spectral clustering on the faces. 4. We also ignore those characters whose spoken lines number fewer than three in the script. 5. To clean the noise from the clustering results, a pruning method is employed in the next step. Face track cluster pruning: we refine the clustering results by pruning the marginal points that have low confidence of belonging to their current cluster. The confidence combines d(F, c_k), the EMD between the face track F and its cluster center c_k, with the fraction of F's K nearest neighbors that belong to the same cluster as F. All the marginal points then undergo a re-classification that incorporates speaker voice features for enhancement: for each candidate character we evaluate the likelihood of that character's voice model on X, the feature vector of the corresponding audio clip. 1. To reject noise, we set a threshold on the score. 2. The face track is classified into the cluster whose score function is maximal.
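The nearest-neighbor part of the confidence check can be sketched as follows (k and the agreement threshold are illustrative assumptions; the paper's confidence score also involves the EMD between the track and its cluster center, which is omitted here):

```python
import numpy as np

def prune_marginal_tracks(dist, labels, k=5, min_agree=0.6):
    """Flag face tracks as marginal when fewer than min_agree of their
    k nearest neighbors (under the pairwise track distance, e.g. EMD)
    share the track's cluster label; marginal tracks are then sent to
    voice-based re-classification.

    dist: (N, N) pairwise track distance matrix; labels: length-N array."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    marginal = []
    for i in range(len(labels)):
        order = np.argsort(dist[i])
        neighbors = [j for j in order if j != i][:k]
        agree = float(np.mean(labels[neighbors] == labels[i]))
        if agree < min_agree:
            marginal.append(i)
    return marginal
```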

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions

Face-Name Association We use named entity recognition software to extract every name in front of the spoken lines and in the scene titles. Name occurrence matrix: an m x n matrix, where m is the number of names, n is the number of scenes, and entry (i, k) is the count of the i-th character's name in the k-th scene. Name affinity matrix: the affinity value between two names is represented by their co-occurrence across scenes. Face affinity matrix: built analogously from the co-occurrence of the speaking face track clusters across scenes.
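One plausible way to realize the two name matrices (as an assumption, co-occurrence is counted here as the number of scenes in which both names appear; the paper's exact weighting may differ):

```python
import numpy as np

def name_affinity(occurrence):
    """occurrence: m x n name occurrence matrix (m names, n scenes),
    entry (i, k) = count of the i-th character's name in scene k.
    Returns the m x m name affinity matrix of scene co-occurrences."""
    O = np.asarray(occurrence)
    present = (O > 0).astype(int)   # 1 iff name i appears in scene k
    A = present @ present.T         # A[i, j] = #scenes containing both i, j
    np.fill_diagonal(A, 0)          # a name has no affinity with itself
    return A
```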

Face-Name Association Vertex matching between two graphs: the name affinity network and the face affinity network can each be represented as an undirected, weighted graph. We use the spectral matching method to find the final name-face association.

Face-Name Association Spectral matching method: it is commonly used for finding consistent correspondences between two sets of features. Consider matching vertices {A, B, C, D} of one graph to vertices {1, 2, 3, 4} of the other. For a candidate assignment a = (i, i'), the diagonal entry M(a, a) measures how well feature i matches feature i' (e.g., M(A,3) = 4, M(A,1) = 1). For two assignments a = (i, i') and b = (j, j'), the off-diagonal entry M(a, b) measures how well the edge (i, j) matches the edge (i', j') (e.g., M((A,3),(B,1)) = 4, M((A,3),(B,4)) = 0).

(Slide: the full 16 x 16 affinity matrix M over all candidate assignments {A, B, C, D} x {1, 2, 3, 4}.)
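The matching itself can be sketched in the spirit of spectral matching: take the principal eigenvector of the assignment affinity matrix M and greedily keep the highest-scoring assignments that preserve a one-to-one matching (a simplified discretization; the small M in the usage below is illustrative, not the slide's matrix):

```python
import numpy as np

def spectral_match(M, n_left, n_right):
    """Greedy spectral matching over candidate assignments.
    Assignment index a = i * n_right + j pairs left vertex i (a name)
    with right vertex j (a face cluster); M is the symmetric affinity
    matrix over assignments."""
    vals, vecs = np.linalg.eigh(M)
    x = np.abs(vecs[:, np.argmax(vals)])   # principal eigenvector
    used_left, used_right, matches = set(), set(), []
    for a in np.argsort(-x):               # strongest assignments first
        i, j = divmod(int(a), n_right)
        if x[a] <= 0 or i in used_left or j in used_right:
            continue
        matches.append((i, j))
        used_left.add(i)
        used_right.add(j)
    return sorted(matches)
```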

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions

Applications

Character-Centered Browsing

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions

Experiment Film information and speaking face track detection results (tables).

Experiment The higher the pruning threshold is, the more speaking face tracks will be pruned. Precision/recall curves of face track clustering.

Experiment Name-face association results and relationship mining.

Outline: Introduction, Face Clustering, Face-Name Association, Applications, Experiment, Conclusions

Conclusions A graph matching method has been utilized to build the name-face association between the name affinity network and the face affinity network. As an application, we have mined the relationships between characters and provided a platform for character-centered film browsing.