Content-based Video Indexing, Classification & Retrieval Presented by HOI, Chu Hong Nov. 27, 2002.

1 Content-based Video Indexing, Classification & Retrieval Presented by HOI, Chu Hong Nov. 27, 2002

2 Outline
- Motivation
- Introduction
- Two approaches for semantic analysis
  - A probabilistic framework (Naphade, Huang ’01)
  - Object-based abstraction and modeling (Lee, Kim, Hwang ’01)
- A multimodal framework for video content interpretation
- Conclusion

3 Motivation
- The amount of digital video data has grown enormously in recent years.
- There is a lack of tools for classifying and retrieving video content.
- A gap exists between low-level features and high-level semantic content.
- Making machines understand video is important and challenging.

4 Introduction
Content-based video indexing
- the process of attaching content-based labels to video shots
- essential for content-based classification and retrieval
- uses automatic analysis techniques:
  - shot detection, video segmentation (see the sketch after this slide)
  - key frame selection
  - object segmentation and recognition
  - visual/audio feature extraction
  - speech recognition, video text, VOCR
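As a rough illustration of the first analysis step on this slide, the sketch below detects shot boundaries by comparing HSV color histograms of consecutive frames. It assumes OpenCV (cv2) is available; the correlation threshold of 0.5 is an illustrative placeholder, not a value taken from the talk.

```python
# Minimal sketch: histogram-based shot-boundary detection (assumes OpenCV).
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    """Return frame indices where the HSV histogram changes abruptly."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)
        if prev_hist is not None:
            # Low correlation between consecutive histograms => likely shot cut.
            score = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if score < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

Real systems typically use adaptive thresholds and also handle gradual transitions (fades, dissolves), which a single fixed threshold misses.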

5 Introduction
Content-based video classification
- segment and classify videos into meaningful categories
- classify videos based on predefined topics
- useful for browsing and searching by topic
- multimodal method: visual, audio, motion, and textual features
- domain-specific knowledge

6 Introduction
Content-based video retrieval
- Simple visual feature query: retrieve video whose key frame has color R (80%), G (10%), B (10%)
- Feature combination query: retrieve video with high upward motion (70%) and blue (30%) (see the sketch below)
- Query by example (QBE): retrieve video similar to a given example
- Localized feature query: retrieve video with a car running toward the right
- Object relationship query: retrieve video with a girl watching the sunset
- Concept query (query by keyword): retrieve "explosion", "White Christmas"
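The feature combination query above can be read as a weighted sum of per-feature similarities. The sketch below shows one plausible way to score shots this way; the feature names and weights are illustrative placeholders, not part of the original slides.

```python
# Minimal sketch: scoring a shot against a feature-combination query.
import numpy as np

def combined_score(shot_features, query_features, weights):
    """shot_features / query_features: dicts mapping feature name -> vector."""
    score = 0.0
    for name, w in weights.items():
        a = np.asarray(shot_features[name], dtype=float)
        b = np.asarray(query_features[name], dtype=float)
        # Cosine similarity of the two feature vectors.
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        score += w * sim
    return score

# Example weighting: upward motion 0.7, color 0.3; shots are then ranked
# by combined_score in descending order.
weights = {"motion": 0.7, "color": 0.3}
```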

7 Introduction
Feature extraction
- Color features
- Texture features
- Shape features
- Sketch features
- Audio features (see the sketch below)
- Camera motion features
- Object motion features
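As one concrete example from this list, the sketch below computes a simple audio feature, short-time average energy, over fixed-length frames of a mono signal. The frame length and hop size are arbitrary illustrative choices.

```python
# Minimal sketch: short-time average energy of a mono audio signal.
import numpy as np

def short_time_energy(signal, frame_len=1024, hop=512):
    """Return the average energy of each analysis frame."""
    signal = np.asarray(signal, dtype=np.float64)
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.mean(frame ** 2))  # mean squared amplitude
    return np.array(energies)
```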

8 Semantic Indexing & Querying
Limitations of QBE
- similarity is measured using only low-level features
- does not reflect the user's perception
- high-level features are difficult to annotate
From syntactic to semantic
- bridge the gap between low-level features and semantic content
- semantic indexing, query by keyword (QBK)
Semantic description scheme (MPEG-7)
- describes semantic interaction between concepts
- but provides no scheme for learning a model for each individual concept

9 Semantic Modeling & Indexing
Two approaches
- Probabilistic framework, ‘Multiject’ (Naphade, Huang ’01)
- Object-based abstraction and indexing (Lee, Kim, Hwang ’01)

10 A probabilistic approach: ‘Multiject’ & ‘Multinet’ (Naphade, Huang ’01)
A multiject is a probabilistic multimedia object covering three categories of semantic concepts:
- Objects: face, car, animal, building
- Sites: sky, mountain, outdoor, cityscape
- Events: explosion, waterfall, gunshot, dancing

11 Multiject for the semantic concept "Outdoor"
(Diagram: visual features, audio features, text features, and other multijects feed the multiject node, which outputs P(Outdoor = present | features, other multijects) = 0.7.)

12 How to create a Multiject
- Shot-boundary detection
- Spatio-temporal segmentation of within-shot frames
- Feature extraction (color, texture, edge direction, etc.)
- Modeling
  - Sites: mixtures of Gaussians (see the sketch below)
  - Events: hidden Markov models (HMMs) with Gaussian-mixture observation densities
  - All audio events: modeled using HMMs
- Each segment is tested for each concept, and the information is then composed at the frame level
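For the "sites" case above, a minimal sketch of fitting a Gaussian mixture to region features and scoring a new region is shown below, assuming scikit-learn; the feature dimensionality, number of mixture components, and random training data are placeholders rather than values from the paper.

```python
# Minimal sketch: a Gaussian-mixture model for a site concept such as "sky".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
sky_training_features = rng.normal(size=(500, 8))   # placeholder region features
candidate_region = rng.normal(size=(1, 8))          # placeholder test region

# Fit the mixture to regions labelled "sky" during training.
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(sky_training_features)

# Higher log-likelihood => the region is more consistent with the "sky" model.
log_likelihood = gmm.score_samples(candidate_region)
```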

13 Multiject: Hierarchical HMM
(Diagram legend)
- ss1 … ssm: state sequence of the supervisor HMM
- sa1 … sam: state sequence of the audio HMM
- xa1 … xam: audio observations
- sv1 … svm: state sequence of the video HMM
- xv1 … xvm: video observations

14 Multinet: Concept Building based on Multijects
A network of multijects that models the interactions between them; "+" / "-" denotes a positive/negative interaction between multijects.

15 Bayesian Multinet
- Nodes: binary random variables (presence/absence of a multiject)
- Layer 0: frame-level multiject-based semantic features
- Layer 1: inference from layer 0 (a toy numerical example follows this slide)
- Layer 2: higher level, for performance improvement
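A hand-rolled toy example of the layered idea: layer-0 frame-level multiject probabilities are combined through a small conditional probability table to infer a related concept. The concept names and all numbers in the table are made up for illustration and are not taken from the paper.

```python
# Toy sketch: combining layer-0 multiject probabilities in a tiny Bayes net.
import numpy as np

# Layer 0: frame-level detections P(concept = present | features).
p_sky, p_water = 0.8, 0.6

# P(beach = present | sky, water) for parent configurations
# (sky, water) in {(0,0), (0,1), (1,0), (1,1)} -- made-up values.
cpt_beach = np.array([0.05, 0.30, 0.20, 0.85])

# Probability of each parent configuration, assuming independent detections.
parents = np.array([
    (1 - p_sky) * (1 - p_water),
    (1 - p_sky) * p_water,
    p_sky * (1 - p_water),
    p_sky * p_water,
])

# Marginalise over parent configurations to get the layer-1 belief.
p_beach = float(parents @ cpt_beach)
print(f"P(beach present) = {p_beach:.3f}")
```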

16 Object-based Semantic Video Modeling
Pipeline: video sequence → VO (video object) extraction → object-based video abstraction → object-based low-level feature extraction → semantic feature modeling → indexing/retrieval

17 Object Extraction based on Object Tracking (Kim, Hwang ’00)
(Diagram: the current frame I_n and the previous object vo_{n-1}, taken from frame I_{n-1} via a one-frame delay, are passed through motion projection, model update (histogram backprojection), and post-processing to produce the current object vo_n.)
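A minimal sketch of the histogram-backprojection step named in the diagram, assuming OpenCV; mean-shift is used here as a simple stand-in for the paper's motion projection and model update, and the object window format (x, y, w, h) is an assumption.

```python
# Minimal sketch: re-locating an object in the next frame via backprojection.
import cv2

def track_object(prev_frame, prev_window, next_frame):
    """prev_window is (x, y, w, h) of the object in prev_frame."""
    x, y, w, h = prev_window
    hsv_prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2HSV)
    roi = hsv_prev[y:y + h, x:x + w]

    # Color model of the object: hue histogram of the previous object region.
    roi_hist = cv2.calcHist([roi], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    # Project the color model into the next frame.
    hsv_next = cv2.cvtColor(next_frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv_next], [0], roi_hist, [0, 180], 1)

    # Mean-shift moves the window toward the backprojection peak.
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _, new_window = cv2.meanShift(back_proj, prev_window, criteria)
    return new_window
```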

18 Semantic Feature Modeling
- Modeling based on the temporal variation of object features
- Boundary shape and motion statistics of the object area
(Diagram: abstracted frame sequence → pre-processing → object features → HMM training.)

19 HMM Modeling
1. Observation sequence O_1, …, O_T (object features per frame)
2. Left-right 1-D HMM modeling with states S_1, S_2, …, S_T
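A minimal sketch of training such a left-right HMM on per-frame object feature vectors, assuming the third-party hmmlearn package; the number of states, feature dimensionality, and random training data are placeholders.

```python
# Minimal sketch: a left-right Gaussian HMM over object feature sequences.
import numpy as np
from hmmlearn import hmm

n_states, feat_dim = 3, 6
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, feat_dim))  # one sequence of T=40 frames

# Keep the left-right topology fixed; only emission parameters are learned.
model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50, init_params="mc", params="mc")
model.startprob_ = np.array([1.0, 0.0, 0.0])        # start in the first state
model.transmat_ = np.array([[0.5, 0.5, 0.0],        # self- or forward transitions only
                            [0.0, 0.5, 0.5],
                            [0.0, 0.0, 1.0]])

model.fit(sequence)                      # EM re-estimation of means/covariances
log_likelihood = model.score(sequence)   # used to compare candidate event models
```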

20 Video Modeling: Three-Layer Structure
Three-layer structure of video modeling, compared to natural language processing (NLP):
- Layer 1: audio-visual feature extraction (NLP analogue: word recognition)
- Layer 2: frame-based and object-based structural modeling (NLP analogue: sentence structure & grammar)
- Layer 3: content interpretation / video understanding (NLP analogue: interpretation)

21 A Multimodal Framework for Video Content Interpretation
- Long-term goal: an automatic TV-program scout that allows users to request programs at the topic level
- Integrates multiple modalities: visual, audio, and text information
- Multi-level concepts
  - Low: low-level features
  - Mid: object detection, event modeling
  - High: classification result of semantic content
- Probabilistic model: a Bayesian network for classification (causal relationships, domain knowledge)

22 (figure-only slide: framework diagram)

23 How to work with the framework?
Preprocessing
- Story segmentation (shot detection)
- VOCR, speech recognition
- Key-frame selection
Feature extraction
- Visual features based on key frames: color, texture, shape, sketch, etc.
- Audio features: average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.
- Textual features (transcript): knowledge tree with many keyword categories (politics, entertainment, stock, art, war, etc.); word spotting and a vote histogram (sketched below)
- Motion features: camera operations (panning, tilting, zooming, tracking, booming, dollying); motion trajectories of moving objects; object abstraction and recognition
Building and training the Bayesian network
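A toy sketch of the word-spotting / vote-histogram step for the textual modality is shown below; the keyword categories and word lists stand in for the knowledge tree and are purely illustrative.

```python
# Toy sketch: keyword spotting over a transcript, producing a vote histogram.
from collections import Counter

KNOWLEDGE_TREE = {
    "politics": {"election", "senate", "minister", "vote"},
    "stock": {"nasdaq", "shares", "market", "index"},
    "war": {"troops", "missile", "ceasefire"},
}

def vote_histogram(transcript):
    """Return a category -> keyword-hit-count histogram for a transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    votes = Counter()
    for word in words:
        for category, keywords in KNOWLEDGE_TREE.items():
            if word in keywords:
                votes[category] += 1
    return votes
```

The normalized histogram can then be combined with the visual, audio, and motion features as evidence in the Bayesian network for topic classification.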

24 Challenging points
- Preprocessing is critical to the framework
  - accuracy of key-frame selection
  - accuracy of speech recognition & VOCR
- Good feature extraction is important for classification performance
- Modeling semantic video objects and events
- Integrating multiple modalities still needs careful consideration

25 Conclusion
- Introduced several basic concepts
- Discussed semantic video modeling and indexing
- Proposed a multimodal framework for topic classification of video
- Discussed challenging problems

26 Q & A Thank you!

