
1 김덕주 (Duck Ju Kim)

2 Problems: What is the objective of content-based video analysis? Why is supervised identification limited? Why should integrated media data be used?

3 Introduction. Analysis: structured organization, embedded semantics. Indexing: tagging semantic units, limited machine perception. Skimming: abstraction & presentation, video browsing.

4 Event Detection Approach. Shot detection: low-level structure; does not correspond directly to video semantics. Scene extraction: higher-level context; much unimportant content. Event extraction: higher semantic level; better for revealing, representing, and abstracting content.

5 Speaker Identification Approach. Standard speech databases: YOHO, HUB4, SWITCHBOARD. Integration of media cues: speaker recognition + facial analysis; speech cues + visual cues. Supervised identification: fixed speaker models; insufficient training data; data collection before processing.

6 Video Skimming Approach. Pre-developed schemes: discontinuous semantic flow; embedded audio cues ignored. Computation of six types of features; importance evaluation; assembling important events.

7 Content Pre-analysis. Shot detection: color histogram-based approach; extract keyframes (the first and last frames). Audio content: classification into silence, speech, music, environmental sounds. Visual content: detect human faces.
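The slide names a color histogram-based approach but the transcript gives no details. The following is a minimal sketch of one common variant, assuming frames are H x W x 3 uint8 arrays; the bin count, the L1 distance, and the threshold value are illustrative assumptions, not values from the presentation.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Normalized per-channel histogram of an H x W x 3 uint8 frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where a new shot is assumed to start at frame i."""
    boundaries = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

def keyframes(boundaries, n_frames):
    """(first, last) frame index of each shot, matching the keyframe choice on the slide."""
    starts = [0] + boundaries
    ends = [b - 1 for b in boundaries] + [n_frames - 1]
    return list(zip(starts, ends))
```

In practice the threshold would have to be tuned per video; gradual transitions such as fades need more than a single frame-to-frame comparison.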

8 Movie Event Extraction. Develop thematic topics through actions or dialogs. What to extract? Two-speaker dialogs, multiple-speaker dialogs, hybrid events.

9 Movie Event Extraction. How to extract? Shot sink computation: grouping close and similar shots. Sink clustering and characterization: periodic, partly-periodic, non-periodic. Event extraction and classification. Post-processing.

10 Shot Sink Computation. Pool of close and similar shots; using visual information; window-based sweep algorithm.
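The transcript does not preserve the sweep algorithm itself. As a rough sketch of the idea of pooling temporally close and visually similar shots, assuming each shot is summarized by one feature vector (for example a keyframe histogram): the function name, the window size, the L1 distance, and the threshold are all hypothetical.

```python
import numpy as np

def compute_shot_sinks(shot_features, window=5, threshold=0.3):
    """Group each unassigned shot with later shots that are both close in
    time (within `window` shots of the sink's latest member) and visually
    similar (L1 feature distance below `threshold`)."""
    n = len(shot_features)
    assigned = [False] * n
    sinks = []
    for i in range(n):
        if assigned[i]:
            continue
        sink = [i]
        assigned[i] = True
        last = i
        j = i + 1
        # The window slides forward from the most recent member of the sink.
        while j < n and j <= last + window:
            if not assigned[j]:
                d = np.abs(shot_features[j] - shot_features[i]).sum()
                if d < threshold:
                    sink.append(j)
                    assigned[j] = True
                    last = j
            j += 1
        sinks.append(sink)
    return sinks
```

For an alternating A, B, A, B shot pattern, as in a two-person conversation, this yields one sink per camera setup.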

11 Shot Sink Clustering. Clustering & characterizing: periodic, partly-periodic, non-periodic; degree of shot repetition. Determining the sink periodicity: calculate relative temporal distance; compute mean μ and standard deviation σ; grouping with the K-means algorithm.
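A sketch of the periodicity step under stated assumptions: the relative temporal distance is taken here to be the gap between consecutive shot indices in a sink, and the (μ, σ) pairs are grouped with a plain Lloyd-style K-means. How the resulting clusters map to periodic, partly-periodic, and non-periodic is not given in the transcript and is left open.

```python
import numpy as np

def sink_stats(sink):
    """Mean and standard deviation of gaps between consecutive shot indices."""
    gaps = np.diff(np.sort(np.asarray(sink)))
    if gaps.size == 0:            # single-shot sink: no repetition to measure
        return 0.0, 0.0
    return float(gaps.mean()), float(gaps.std())

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):   # keep the old center if a cluster empties
                centers[c] = points[labels == c].mean(axis=0)
    return labels
```

A sink with evenly spaced shots has a small σ relative to μ, which is the intuition behind calling it periodic.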

13 Integrating Speech & Face Information. False alarms: montage presentation -> spoken dialog; multiple-speaker dialog -> two-speaker dialog. Solutions for reducing false alarms: embedded audio information integration (speech shot ratio calculation); facial cue inclusion (face detection).
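The speech shot ratio can be read as the fraction of shots in a candidate event whose audio was classified as speech. A minimal sketch under that reading; the label strings and the 0.5 cutoff are assumptions for illustration only.

```python
def speech_shot_ratio(shot_audio_labels):
    """Fraction of shots in an event whose audio class is 'speech'."""
    if not shot_audio_labels:
        return 0.0
    n_speech = sum(1 for label in shot_audio_labels if label == "speech")
    return n_speech / len(shot_audio_labels)

def is_spoken_dialog(shot_audio_labels, min_ratio=0.5):
    """Reject candidate dialogs (e.g. montages set to music) with too little speech."""
    return speech_shot_ratio(shot_audio_labels) >= min_ratio
```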

14 Adaptive Speaker Identification. Shot detection & audio classification; face detection & mouth tracking; speech segmentation / clustering; initial speaker modeling; audiovisual-based speaker identification; unsupervised speaker model adaptation.

16 Face Detection & Mouth Tracking. Detection & recognition of talking faces. Distance between eyes and mouth: dist. Eyes' positions: (x1, y1), (x2, y2). Mouth center: (x, y).
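The formula for dist is not preserved in the transcript. One plausible reading, shown here only as an assumption, is the Euclidean distance from the midpoint between the two eyes to the mouth center.

```python
import math

def eye_mouth_distance(eye1, eye2, mouth):
    """Distance from the midpoint of the eyes to the mouth center (assumed definition)."""
    (x1, y1), (x2, y2) = eye1, eye2
    x, y = mouth
    mid_x, mid_y = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return math.hypot(x - mid_x, y - mid_y)
```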

17 Speech Segmentation

18 Speech Clustering. Two separate segments X1, X2; joined segment X = {X1, X2}. A cluster C has n homogeneous speech segments. Dist(X, C) = (expression not preserved in the transcript); a negative value means X is considered to be from the same speaker.
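The expression for Dist(X, C) is missing from the transcript. Since a later slide refers to BIC distances, the sketch below uses the standard ΔBIC between two segments, each modeled by a single full-covariance Gaussian, as a stand-in. This is an assumption about the measure, not the presenter's formula; `lam` is the usual penalty weight. With this sign convention a negative value favors a single model, that is, the same speaker.

```python
import numpy as np

def _logdet_cov(x):
    """Log-determinant of the sample covariance of an (n, d) array."""
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    _sign, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x1, x2, lam=1.0):
    """Standard delta-BIC for splitting x = {x1, x2} into two Gaussians.
    Negative: one model suffices (same speaker). Positive: two models fit better."""
    x = np.vstack([x1, x2])
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    d = x.shape[1]
    # Extra parameters of the second Gaussian: d means + d(d+1)/2 covariances.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(x)
            - 0.5 * n1 * _logdet_cov(x1)
            - 0.5 * n2 * _logdet_cov(x2)
            - penalty)
```

Comparing a segment against a cluster of n segments could reuse the same function by stacking the cluster's segments into one array, though the slide may have defined it differently.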

19 Initial Speaker Modeling. Required for the identification process. Exploits the inter-relations between facial and speech cues. For each target cast member A: find a speech shot where A is talking; collect all the speech segments; build the initial model, a Gaussian Mixture Model (GMM).

20 Likelihood-based Speaker Identification. GMM model notation (symbols not preserved in the transcript), j = 1, 2, …, m, for the ith enrolled speaker. The log likelihood between X and M_i (expression not preserved in the transcript).
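The log-likelihood expression is not preserved, so the following sketch uses the textbook form: the sum over frames of the log of the weighted component densities, with the speaker chosen by the highest score. Diagonal covariances and the `(weights, means, variances)` parameter layout are assumptions made for brevity.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Total log likelihood of frames x (n, d) under a diagonal-covariance GMM
    with m components: weights (m,), means (m, d), variances (m, d)."""
    x = np.asarray(x, dtype=float)[:, None, :]              # (n, 1, d)
    means = np.asarray(means, dtype=float)[None, :, :]      # (1, m, d)
    variances = np.asarray(variances, dtype=float)[None, :, :]
    log_w = np.log(np.asarray(weights, dtype=float))[None, :]
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=2)    # (1, m)
    log_exp = -0.5 * (((x - means) ** 2) / variances).sum(axis=2)  # (n, m)
    log_comp = log_w + log_norm + log_exp                          # (n, m)
    # Numerically stable log-sum-exp over the m components.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())

def identify_speaker(x, models):
    """Pick the enrolled speaker whose model M_i gives the highest log likelihood."""
    scores = {name: gmm_log_likelihood(x, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores
```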

21 Audiovisual Integration for Speaker Identification. Finalizing the speaker identification task by integrating audio and video cues. Examine the existence of temporal overlap: if overlap ratio > threshold, assign the face vector to the cluster; otherwise, set the face vector to null. Speaker identity.
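A sketch of the overlap test, assuming speech segments and talking-face tracks are given as (start, end) times and that the ratio is normalized by the speech segment's length. The normalization, the `face_tracks` dictionary, and the 0.5 threshold are hypothetical choices; the slide does not specify them.

```python
def overlap_ratio(speech_seg, face_seg):
    """Share of the speech segment that is covered by the face track."""
    (s0, s1), (f0, f1) = speech_seg, face_seg
    overlap = max(0.0, min(s1, f1) - max(s0, f0))
    length = s1 - s0
    return overlap / length if length > 0 else 0.0

def face_for_segment(speech_seg, face_tracks, threshold=0.5):
    """Return the id of the best-overlapping face, or None (the null face vector)."""
    best_id, best_ratio = None, 0.0
    for face_id, face_seg in face_tracks.items():
        ratio = overlap_ratio(speech_seg, face_seg)
        if ratio > best_ratio:
            best_id, best_ratio = face_id, ratio
    return best_id if best_ratio > threshold else None
```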

22 Unsupervised Speaker Model Adaptation. Updating the speaker model. Three approaches: average-based, MAP-based, and Viterbi-based model adaptation.

23 Average-based Model Adaptation. Compute BIC distances. Compare d_min with threshold T. d_min < T: (action not preserved in the transcript). d_min > T: initialize a new mixture component. Update the weight for each component.

24 MAP-based Model Adaptation. μ_i: mean of b_i; L_i: occupation likelihood of the adaptation data; μ-bar: mean of the observed adaptation data.
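The update equation itself is missing from the transcript. The symbols listed match the standard MAP mean update, which is given here as a likely reconstruction rather than a quotation from the slide; τ, the weight of the prior mean relative to the adaptation data, is not named in the transcript and is an assumption.

```latex
\hat{\mu}_i = \frac{L_i}{L_i + \tau}\,\bar{\mu} + \frac{\tau}{L_i + \tau}\,\mu_i
```

When L_i is small the updated mean stays close to the existing component mean μ_i; as more adaptation data occupies the component, it moves toward the observed mean.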

25 Viterbi-based Model Adaptation. Allows different feature vectors from different components. Hard decision: any vector either occupies a mixture component or does not. Indicator function instead of probability function.
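The hard decision can be illustrated by replacing soft posteriors with a 0/1 indicator: each feature vector counts only toward its single most likely component. The sketch assumes per-vector, per-component log densities are already available as an (n, m) array; how the presenter folds the resulting statistics back into the model is not preserved.

```python
import numpy as np

def hard_occupancy(log_comp):
    """Indicator matrix: 1 for each vector's most likely component, else 0."""
    log_comp = np.asarray(log_comp, dtype=float)
    indicator = np.zeros_like(log_comp)
    indicator[np.arange(len(log_comp)), log_comp.argmax(axis=1)] = 1.0
    return indicator

def hard_component_means(x, log_comp):
    """Occupation counts and means per component under hard assignment.
    Components that receive no vectors get NaN means."""
    x = np.asarray(x, dtype=float)
    indicator = hard_occupancy(log_comp)
    counts = indicator.sum(axis=0)
    sums = indicator.T @ x
    means = np.full((indicator.shape[1], x.shape[1]), np.nan)
    used = counts > 0
    means[used] = sums[used] / counts[used, None]
    return counts, means
```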

26 Event-based Movie Skimming. Event feature extraction: six types of mid- to high-level features. Evaluation of importance. Movie skim generation: assemble major events -> final skim.

27 Event Feature Extraction. Music ratio; speech ratio; sound loudness; action level; normalized by dividing by the largest value; present cast; theme topic.

28 Event Feature Extraction. M: number of features extracted; N: number of events; a_{i,j}: value of the jth feature in the ith event.

29 Movie Skim Generation. Choosing important events; user's feature preference; event importance vector.
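One way to picture how the N x M feature matrix, the user's feature preference, and the event importance vector could fit together: normalize each feature column by its largest value, weight the columns by the preference, and keep the top-scoring events in temporal order. The weighted sum and the `top_k` cutoff are assumptions; the slides do not preserve the actual scoring rule. Features are assumed non-negative.

```python
import numpy as np

def normalize_features(a):
    """Divide each feature column of the N x M matrix by its largest value."""
    a = np.asarray(a, dtype=float)
    col_max = a.max(axis=0)
    col_max[col_max == 0] = 1.0      # leave all-zero columns unchanged
    return a / col_max

def select_events(a, preference, top_k):
    """Score events by a preference-weighted sum and return the top_k
    event indices in temporal order, plus the importance vector."""
    scores = normalize_features(a) @ np.asarray(preference, dtype=float)
    order = np.argsort(-scores, kind="stable")[:top_k]
    return sorted(order.tolist()), scores
```

Returning the chosen events in their original order keeps the assembled skim chronologically consistent.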

30 Event Detection Results. Correctness of the event classification; system performance evaluation; hybrid class excluded.

32 Speaker Identification Results. Evaluation of the adaptive speaker identification system: false acceptance (FA), false rejection (FR), identification accuracy (IA).

34 Average-based, MAP-based, Viterbi-based

36 Movie Skimming Results. Difficulties of qualitative evaluation. Quantitative measure based on a user study, 5-point scale (1~5): visual comprehension, audio comprehension, semantic continuity, good abstraction, quick browsing, video skipping.
