DL:Lesson 11 Multimedia Search Luca Dini


1 DL:Lesson 11 Multimedia Search Luca Dini dini@celi.it

2 MPEG-4: Content-based Encoding Encodes objects that can be tracked from frame to frame. Video frames are built up from layers of video object planes (VOPs). Each VOP is segmented and coded separately throughout the shot; the background is encoded only once. Objects are not defined by what they represent, only by their motion, shape, color and texture, which allows them to be tracked through time. Objects and their backgrounds are composited again by the decoder.
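To make the last point concrete, here is a minimal sketch (assuming NumPy; not the actual MPEG-4 decoder) of how decoded object planes could be composited back onto the once-encoded background. The `composite` function and its binary-mask overlay are purely illustrative.

```python
# Minimal sketch (not the MPEG-4 decoder): compositing decoded video
# object planes (VOPs) back onto the once-encoded background.
# `background`, `vop_pixels` and `vop_mask` are assumed to be produced
# by the decoder for each frame.
import numpy as np

def composite(background: np.ndarray, vops: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Overlay each (pixels, binary mask) VOP onto a copy of the background."""
    frame = background.copy()
    for vop_pixels, vop_mask in vops:
        frame[vop_mask] = vop_pixels[vop_mask]   # object pixels replace background
    return frame

# Toy usage: a 240x320 grey background and one bright square "object"
bg = np.full((240, 320, 3), 128, dtype=np.uint8)
obj = np.zeros_like(bg); obj[50:100, 60:120] = 255
mask = np.zeros(bg.shape[:2], dtype=bool); mask[50:100, 60:120] = True
frame = composite(bg, [(obj, mask)])
```

A real decoder also handles grey-level (alpha) shape information rather than a hard binary mask, which the sketch ignores.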

3 MPEG-4: Content-based Encoding (figure from Ghanbari, M. (1999), Video Coding: An Introduction to Standard Codecs, showing a video object plane (VOP) and the background, which is encoded only once)

4 AMOS: Tracking Objects Beyond the Frame http://www.ctr.columbia.edu/~dzhong/rtrack/demo.htm

5 “Are We Doing Multimedia?”* Multimodal Indexing
Ramesh Jain: “To solve multimedia problems, we should use as much context as we can.”
– Visual (frames, shots, scenes)
– Audio (soundtrack: speech recognition)
– Text (closed captions, subtitles)
– Context (hyperlinks, etc.)
*IEEE Multimedia, Oct.-Nov. 2003. http://jain.faculty.gatech.edu/media_vision/doing_mm.pdf

6 Snoek, C. & Worring, M., “Multimodal Indexing: A Review of the State-of-the-art”, Multimedia Tools & Applications, January 2005. (Figure: settings, objects, people; modalities: video, audio, text.)

7 Building Video Indexes Same as any indexing process… decide:
– What to index: granularity
– How to index: modalities (images, audio, etc.)
– Which features?
Discover spatial and temporal structure: deconstructing the authoring process.
Construct data models for access.
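As a purely illustrative reading of “granularity” and “data models for access”, the sketch below models a video as scenes and shots, attaches per-shot keyframes, features and annotations, and exposes one simple access path. The class and field names are hypothetical and do not come from any cited system.

```python
# Illustrative data model for a video index: video -> scene -> shot,
# with keyframes and low-level features attached at shot granularity.
# Class and field names are hypothetical, not from any cited system.
from dataclasses import dataclass, field

@dataclass
class Shot:
    start_frame: int
    end_frame: int
    keyframe: int                                                    # representative frame number
    features: dict[str, list[float]] = field(default_factory=dict)  # e.g. {"color_hist": [...]}
    annotations: list[str] = field(default_factory=list)            # e.g. ["Indoors"]

@dataclass
class Scene:
    shots: list[Shot] = field(default_factory=list)

@dataclass
class Video:
    title: str
    fps: float
    scenes: list[Scene] = field(default_factory=list)

    def shots_labelled(self, label: str) -> list[Shot]:
        """Simple access path: retrieve all shots carrying a given annotation."""
        return [s for scene in self.scenes for s in scene.shots if label in s.annotations]
```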

8 Building Video Indexes: Structured Modeling Predict relationships between shots using pattern-recognition techniques: hidden Markov models, SVMs (support vector machines), neural networks. Relevance feedback via machine learning.
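A minimal sketch of one technique named above, assuming scikit-learn is available and using a hypothetical per-shot feature layout (a 16-bin color histogram plus one motion statistic) with made-up labels; it only shows the mechanics of training an SVM shot classifier, not a tuned system.

```python
# Sketch: SVM classification of shots from low-level features.
# Feature layout and labels are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 17))          # 16-bin color histogram + 1 motion feature per shot
y = rng.integers(0, 2, size=200)   # e.g. 0 = "dialogue shot", 1 = "action shot"

clf = SVC(kernel="rbf", C=1.0).fit(X[:150], y[:150])   # train on 150 shots
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```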

9 Data Models for Video IR
– Based on text (DBMS, MARC)
– Semi-structured (video + XML or hypertext): MPEG-7, SMIL
– Based on context: Yahoo Video, Blinkx, Truveo
– Multimodal: Marvel, Virage

10 Virage VideoLogger™ SMPTE timecode; keyframes; text or audio extracted automatically; mark & annotate clips.
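As background for the SMPTE timecodes the logger records, the sketch below does generic non-drop-frame timecode arithmetic (converting 'HH:MM:SS:FF' to an absolute frame count and back). It is ordinary timecode math, not part of the Virage API.

```python
# Generic non-drop-frame SMPTE timecode arithmetic; not Virage's API.
def timecode_to_frames(tc: str, fps: int = 30) -> int:
    """Convert 'HH:MM:SS:FF' to an absolute frame count at integer fps."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def frames_to_timecode(frames: int, fps: int = 30) -> str:
    """Inverse of timecode_to_frames."""
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

assert frames_to_timecode(timecode_to_frames("00:00:31:23")) == "00:00:31:23"
```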

11 Annotation: Metadata Schemes MPEG-7 MPEG-21 METS SMIL

12 IBM MPEG-7 Annotation Tool

13 MPEG-7 Output from IBM Annotation Tool (XML excerpt; markup lost in transcription): one video segment starts at media time point T00:00:27:20830F30000 and lasts 248 frames (the duration of the shot in frames); the next segment starts at T00:00:31:23953F30000. The free-text annotation is “Indoors”, and a spatial locator gives the location and dimensions of the annotated region in pixels: 14, 15, 351, 238.
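A rough sketch of producing such a segment description with Python's standard library is shown below. The element names (VideoSegment, MediaTime, MediaTimePoint, MediaIncrDuration, TextAnnotation, FreeTextAnnotation) follow MPEG-7 MDS vocabulary, but the nesting is simplified and attributes are omitted, so the output is illustrative rather than schema-valid.

```python
# Rough sketch of generating an MPEG-7-style segment description.
# Element names approximate the MPEG-7 MDS vocabulary; the real schema
# nests these inside a Mpeg7/Description wrapper and carries attributes
# (e.g. a time unit on the duration), which are omitted here.
import xml.etree.ElementTree as ET

def video_segment(start: str, duration_frames: int, label: str) -> ET.Element:
    seg = ET.Element("VideoSegment")
    media_time = ET.SubElement(seg, "MediaTime")
    ET.SubElement(media_time, "MediaTimePoint").text = start
    # Duration expressed as a frame count, as in the IBM tool's output
    ET.SubElement(media_time, "MediaIncrDuration").text = str(duration_frames)
    annotation = ET.SubElement(seg, "TextAnnotation")
    ET.SubElement(annotation, "FreeTextAnnotation").text = label
    return seg

shot = video_segment("T00:00:27:20830F30000", 248, "Indoors")
print(ET.tostring(shot, encoding="unicode"))
```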

14 The MPEG Group The Moving Picture Experts Group, founded by the ISO (International Organization for Standardization) in 1988. Four standards: MPEG-1, MPEG-2, MPEG-4 and MPEG-7.

15 MPEG-1 Standardized in 1992. Gives good-quality audio and video. Usually low-resolution video at around 30 frames per second. Three audio layers.

16 MPEG-2 Standardized in 1994. The codec used on DVD. Very good quality audio and video. Uses high resolution and a high bit rate.

17 MPEG-4 Standardized in 1998. Based on MPEG-1, MPEG-2 and QuickTime. The first real multimedia representation standard. Intended for videoconferencing. Several different versions.

18 MPEG-7 Standardized in 2001. Not a video codec; called the “Multimedia Content Description Interface”. Utilizes the earlier MPEG standards. Developed to simplify searching for media elements.

19 Standardization Progress
Standards come from ITU-T, ISO/IEC, and joint ITU-T / ISO/IEC work: H.261 (1990), JPEG (1992), MPEG-1 (1992), MPEG-2 (1994), H.263 (1995), H.26L (2001), MPEG-4 (1999), MPEG-7 (2001).
Application areas and features:
– H.261: videophone over PSTN / B-ISDN; low quality; 64 kbps ~ 1.5 Mbps
– MPEG-1: Video CD, Internet; VHS quality; < 1.5 Mbps; stereo audio
– MPEG-2: digital broadcasting, DVD, digital camcorder; high quality; 1.5 ~ 80 Mbps; 5.1-channel audio
– MPEG-4: content production, Internet, multimedia broadcast; various quality; synthetic audio/video; user interactivity
– MPEG-7: content search; Internet, DSM, broadcasting; user interactivity
The coding standards target data compression; MPEG-7 targets content manipulation.

20 MPEG-7 Scope
Diversity of applications – multimedia, music/audio, graphics, video
Descriptors (Ds)
– describe basic characteristics of audiovisual content
– examples: shape, color, texture, …
Description Schemes (DSs)
– describe combinations of descriptors
– example: Spoken Content

21 MPEG-7 Scope Description production (extraction) → standard description → description consumption. Only the standard description is the normative part of the MPEG-7 standard. MPEG-7 does not specify:
– how to extract descriptions
– how to use descriptions
– how to compute the similarity between contents

22 Descriptions
Annotations – information that cannot be deduced from the content itself: recording date & conditions, author, copyright, viewing age, etc.
Features – information that is present in the content:
– low-level features: color, texture, shape, key, mood, tempo, etc.
– high-level features: composition, event, action, situation, etc.

23 MPEG-7 Terminology
Data – audiovisual information that will be described using MPEG-7
Feature – a distinctive part or characteristic of the data (e.g. color, shape, …)
Descriptor – associates a representation value with one or more features
Description Scheme – defines the structure and semantics of descriptors and their relationships, in order to model data content
Description Definition Language (DDL) – a language for specifying Description Schemes
Coded description – a representation of a description that allows efficient storage and transmission

24 Components 1) MPEG-7 Systems 2) MPEG-7 Description Definition Language 3) MPEG-7 Visual 4) MPEG-7 Audio 5) MPEG-7 Multimedia DSs 6) MPEG-7 Reference Software 7) MPEG-7 Conformance

25 Visual Descriptors Color Descriptors Texture Descriptors Shape Descriptors Motion Descriptors for Video
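A simplified extraction sketch in the spirit of MPEG-7's color descriptors (assuming NumPy): a coarse HSV histogram per keyframe, compared by histogram intersection. The normative Scalable Color descriptor additionally Haar-transforms and quantizes the histogram; that step is omitted here, so this is not the standard's extraction procedure.

```python
# Simplified color-descriptor extraction: a coarse HSV histogram per
# keyframe. Illustrative only; not the normative MPEG-7 extraction.
import colorsys
import numpy as np

def color_descriptor(rgb_frame: np.ndarray, bins=(16, 4, 4)) -> np.ndarray:
    """rgb_frame: HxWx3 uint8 array. Returns a normalized H/S/V histogram."""
    rgb = rgb_frame.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in rgb])
    hist, _ = np.histogramdd(hsv, bins=bins, range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel()
    return hist / hist.sum()

def similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    """Histogram intersection: a common choice for comparing color histograms."""
    return float(np.minimum(d1, d2).sum())
```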

26 Colors

27 Etc… http://mp7.watson.ibm.com/marvel/

