TEMPORAL VIDEO BOUNDARIES - PART ONE - SNUEE KIM KYUNGMIN


Why do we need temporal segmentation of videos? How do we set up boundaries between video frames? How do we merge two separate but uniform segments?

ABSTRACT Much work has been done on automatic video analysis. But while techniques such as local video segmentation, object detection, and genre classification have been developed, little work has been done on retrieving the overall structural properties of video content.

ABSTRACT(2) Retrieving the overall structure of video content means splitting the video into meaningful tokens by setting boundaries within it. => Temporal Video Boundary Segmentation. We classify these boundaries into three categories : micro-, macro-, and mega-boundaries.

ABSTRACT(3) Our goal is a system for automatic video analysis, which should eventually work for applications where complete metadata is unavailable.

INTRODUCTION What's going on?  Great increase in the quantity of video content.  More demand for content-aware applications.  Still, the majority of video content has insufficient metadata. => More demand for information on temporal video boundaries.

BOUNDARIES : DEFINITIONS Micro-boundaries : the boundaries of the shortest observable temporal segments, usually enclosing a sequence of contiguously shot video frames. (frames under the same micro-boundaries.)

Micro-boundaries are associated with the smallest video units for which a given attribute is constant or slowly varying. The attribute can be visual, audio, or textual; depending on the attribute, the resulting micro-boundaries can differ.

BOUNDARIES : DEFINITIONS(2) Macro-boundaries : boundaries between different parts of the narrative or the segments of a video content. (frames under the same macro-boundaries.)

Macro-boundaries are boundaries between micro-segments that are clearly identifiable, organic parts of an event, defining a structural or thematic unit.

BOUNDARIES : DEFINITIONS(3) Mega- Boundaries : a boundary between a program and any non-program material. (frames under different mega-boundaries.)

Mega-boundaries are boundaries between macro-segments, which typically exhibit structural and feature consistency.

BOUNDARIES : FORMAL DEFINITION A video content contains three types of modalities : visual, audio, and textual. Each modality has three levels : low-, mid-, and high-level. These levels describe the “amount of detail” in each modality in terms of granularity and abstraction.

BOUNDARIES : FORMAL DEFINITION(2) For each modality and level there is an attribute, defined as an attribute vector

a_{m,i}(t) = ( a_{m,i,1}(t), ..., a_{m,i,N}(t) )

where m denotes the modality (ex : m = 1, 2 and 3 means visual, audio and text respectively), i denotes the index of the attribute (ex : m = 1 and i = 1 indexes color), N denotes the total number of vector components, and t is the time instant (can be expressed in integers or milliseconds).

BOUNDARIES : FORMAL DEFINITION(3) If the time interval is defined as T, the average and the deviation of an attribute throughout the video can be expressed as below :

average : ā_{m,i} = (1/|T|) Σ_{t∈T} a_{m,i}(t)
deviation : σ_{m,i} = sqrt( (1/|T|) Σ_{t∈T} ( a_{m,i}(t) − ā_{m,i} )² )

where |T| is the number of time instants in the interval.
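As a concrete illustration of the per-component average and deviation above, here is a minimal sketch in plain Python (the function name and data layout are mine, not the paper's):

```python
# Sketch: per-component average and deviation of an attribute vector
# sampled over a time interval T, using only the standard library.
import math

def attribute_stats(samples):
    """samples: list of attribute vectors a(t) (equal-length lists of
    floats) over the interval T. Returns (average, deviation) vectors."""
    n = len(samples)
    dim = len(samples[0])
    avg = [sum(s[k] for s in samples) / n for k in range(dim)]
    dev = [math.sqrt(sum((s[k] - avg[k]) ** 2 for s in samples) / n)
           for k in range(dim)]
    return avg, dev

# Example: a 2-component attribute observed at four instants.
avg, dev = attribute_stats([[1.0, 2.0], [3.0, 2.0], [1.0, 2.0], [3.0, 2.0]])
```

The second component is constant, so its deviation is zero; a constant or slowly varying attribute is exactly what characterizes a uniform segment.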

BOUNDARIES : FORMAL DEFINITION(4) By using the vectors defined previously, we now have two different methods to estimate temporal boundaries :
1) Without memory : given a threshold τ and a distance metric Dist, if Dist( a(t−1), a(t) ) is larger than τ, then there exists a boundary at instant t.
2) With memory : the difference is computed over a series of time instants, so we calculate the distance metric against the running average ā instead of the previous attribute. If Dist( ā, a(t) ) > τ, a boundary exists at instant t.
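The two estimators can be sketched as follows (a hedged illustration, not the paper's code; L1 is used as the distance metric, and resetting the running average at each detected boundary is my assumption):

```python
# Two boundary estimators over a sequence of attribute vectors.
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def boundaries_memoryless(attrs, tau):
    """Boundary at t if Dist(a(t-1), a(t)) > tau."""
    return [t for t in range(1, len(attrs)) if l1(attrs[t - 1], attrs[t]) > tau]

def boundaries_with_memory(attrs, tau):
    """Boundary at t if Dist(running average, a(t)) > tau; the running
    average is reset after each detected boundary (assumption)."""
    out, acc, n = [], list(attrs[0]), 1
    for t in range(1, len(attrs)):
        avg = [x / n for x in acc]
        if l1(avg, attrs[t]) > tau:
            out.append(t)
            acc, n = list(attrs[t]), 1  # start a new segment
        else:
            acc = [x + y for x, y in zip(acc, attrs[t])]
            n += 1
    return out

# A 1-D attribute that jumps at instant 3.
signal = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]
```

On this noisy step signal both methods agree; the memory variant becomes more robust when single-frame outliers would fool the frame-to-frame comparison.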

MICRO-BOUNDARIES In multimedia, the term “shot” or “take” is widely used. A similar concept can be used to define the segment between micro-boundaries, which is often called a “family of frames.” Each segment has a representative frame called the “keyframe.” The keyframe of a family has audio/video data that represents the segment well, but the method used to pick the keyframe may vary.

MICRO-BOUNDARIES(2) Each family has a “family histogram”; together these eventually form a “superhistogram.” A family histogram is a data structure that represents the color information of a family of frames. A superhistogram is a data structure that contains the information of non-contiguous family histograms within the larger video segment.

MICRO-BOUNDARIES(3) Generation of family histograms and superhistograms may vary depending on the pre-defined dimensions below.
1) The amount of memory - no memory means comparing only with the previous frame.
2) Contiguity of the compared families - determining the time step.
3) Representation for a family - how we choose the keyframe.

MICRO-BOUNDARIES : FAMILY OF FRAMES An image histogram is a vector representing the color values and the frequency of their occurrence in the image. Finding the difference between consecutive histograms and merging similar histograms enables generating families of frames. For each frame, we compute its histogram and then search the previously computed family histograms to find the closest match.

MICRO-BOUNDARIES : FAMILY OF FRAMES(2) There are several ways to compute the histogram difference, e.g. the L1 distance, the L2 distance, and histogram intersection. Among them, the L1 distance and bin-wise histogram intersection gave the best results.
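Illustrative implementations of these common histogram measures (a sketch; the paper's exact bin-wise variant is not reproduced here, so the plain intersection below is an assumption):

```python
# Common histogram comparison measures over equal-length bin vectors.
def l1_dist(h1, h2):
    """L1 (city-block) distance: sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def l2_dist(h1, h2):
    """L2 (Euclidean) distance between the two bin vectors."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def intersection_sim(h1, h2):
    """Histogram intersection: shared mass normalized by total mass,
    a similarity in [0, 1] for histograms of equal total count."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / float(sum(h1))

h1 = [4, 4, 2]
h2 = [2, 4, 4]
```

Note that L1/L2 are distances (lower means more similar) while intersection is a similarity (higher means more similar), so the thresholding logic is inverted between them.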

MICRO-BOUNDARIES : BOUNDARY DETECTION If the difference between two family histograms is less than a given threshold, the current histogram is merged into the family histogram. Each family histogram consists of : 1) pointers to each of the constituent histograms and frame numbers; 2) a merged family histogram.

MICRO-BOUNDARIES : BOUNDARY DETECTION(2) Merging of family histograms is performed as a running mean : when a frame histogram h joins a family whose merged histogram F was built from N frames, the new merged histogram is (N·F + h)/(N + 1) (basically, the mean of all histograms merged into the family).
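A minimal sketch of this family-histogram maintenance, assuming a running-mean merge as described above (the dictionary layout and function name are mine):

```python
# Each family keeps pointers to its constituent frame numbers and a merged
# histogram equal to the running mean of all member histograms.
def merge(family, frame_no, hist):
    """Merge a frame histogram into a family in place using the
    running-mean update (N*F + h) / (N + 1)."""
    n = len(family["frames"])
    family["merged"] = [(m * n + h) / (n + 1)
                        for m, h in zip(family["merged"], hist)]
    family["frames"].append(frame_no)

# One-frame family; merge a second frame into it.
family = {"frames": [0], "merged": [2.0, 4.0]}
merge(family, 1, [4.0, 0.0])
```

Keeping the frame-number pointers alongside the merged histogram is what later lets a keyframe be selected from the family.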

MICRO-BOUNDARIES : BOUNDARY DETECTION(3) There are multiple ways to compare and merge families, depending on the choice of contiguity and memory.
1) Contiguous with zero memory
2) Contiguous with limited memory
3) Non-contiguous with unlimited memory
4) Hybrid : first a new frame histogram is compared against the contiguous frames, and then the generated family histograms are merged using the non-contiguous case.

MICRO-BOUNDARIES : EXPERIMENTS A CNN news sample of 27,000 frames. Tested with 9, 30, 90, and 300 bins in HSB and 512 bins in RGB. Multiple histogram comparisons : L1, L2, bin-wise intersection, and histogram intersection. Tried 100 threshold values.

MICRO-BOUNDARIES : EXPERIMENTS(2) Tested on a video clip, the best results were obtained with a threshold of 10, the L1 comparison, the contiguous-with-limited-memory boundary method, and the HSB space quantized to 9 bins.

MICRO-BOUNDARIES : EXPERIMENTS(3)

MACRO-BOUNDARIES A story is a complete narrative structure conveying a continuous thought or event. We want micro-segments belonging to the same story to fall into the same macro-segment. Usually we need textual cues (transcripts) to set such boundaries, but this paper suggests methodologies that do the job solely with audio and visual cues. We focus on the observation that stories are characterized by multiple constant or slowly varying multimedia attributes.

MACRO-BOUNDARIES(2) Two types of uniform segment detection : unimodal and multimodal. Unimodal : a video segment exhibits the “same” characteristic over a period of time within a single modality. Multimodal : the uniform characteristic is detected across multiple modalities.

MACRO-BOUNDARIES : SINGLE MODALITY SEGMENTATION In the case of audio-based segmentation :
1) Partition a continuous audio stream into non-overlapping segments.
2) Classify the segments using low-level audio features like bandwidth.
3) Divide the audio signal into portions of different classes (speech, music, noise, etc.).

MACRO-BOUNDARIES : SINGLE MODALITY SEGMENTATION(2) In the case of text-based segmentation :
1) If a transcript doesn't exist, extract text data from the audio stream using speech-to-text conversion.
2) The transcript is segmented with respect to a predefined topic list.
3) A frequency-of-word-occurrence metric is used to compare incoming stories with the profiles of manually pre-categorized stories.
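Step 3 can be sketched as follows. The exact metric is not given in the transcript, so cosine similarity over raw term-count vectors is an assumption, as are the function names and the toy profiles:

```python
# Compare an incoming story against pre-categorized topic profiles using
# a word-frequency (term-count) representation and cosine similarity.
from collections import Counter
import math

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def categorize(story, profiles):
    """profiles: {category: profile text}. Returns the category whose
    profile's word-count vector is most similar to the story's."""
    counts = Counter(story.lower().split())
    return max(profiles,
               key=lambda k: cosine(counts, Counter(profiles[k].lower().split())))

profiles = {"sports": "game team score win match",
            "weather": "rain storm forecast temperature wind"}
best = categorize("the team won the match with a late score", profiles)
```

In practice stop-word removal and term weighting (e.g. TF-IDF) would sharpen the match, but the bare count comparison already captures the frequency-of-word-occurrence idea.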

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS What we want to do : Retrieve better segmentation results by using the results from various unimodal segmentations. What we need to do : first the pre-merging steps, and then the descent steps.

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(2) Pre-merging steps : detect micro-segments that exhibit uniform properties, and determine attribute templates for further segmentation.
1) Uniform segment detection
2) Intra-modal segment clustering
3) Attribute template determination - an attribute template is a combination of numbers that characterize the attribute.
4) Dominant attribute determination
5) Template application

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(3) Descent methods : by combining multimedia segments across multiple modalities, each attribute, with its segments of uniform values, is associated with a line.

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(4) The single descent method describes the process of generating story segments by combining these segments.
1) Single descent with intersecting union
2) Single descent with intersection
3) Single descent with secondary attribute
4) Single descent with conditional union
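As one concrete case, "single descent with intersection" can be read as keeping only the time spans where uniform segments from two modalities overlap. The interval representation and function below are my own sketch of that idea, not the paper's formulation:

```python
# Combine uniform segments from two modalities by pairwise interval
# intersection; each segment is a (start, end) pair in frame numbers.
def intersect_segments(a, b):
    """a, b: sorted lists of (start, end) intervals from two modalities.
    Returns the spans where both modalities are simultaneously uniform."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:  # keep only non-empty overlaps
                out.append((s, e))
    return out

visual = [(0, 10), (10, 25)]   # hypothetical uniform visual segments
audio = [(0, 12), (15, 25)]    # hypothetical uniform audio segments
common = intersect_segments(visual, audio)
```

The union-based variants would instead merge overlapping intervals, trading precision of the boundaries for fewer, longer story segments.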

MACRO-BOUNDARIES : EXPERIMENTS Single descent process with conditional union, using the text transcript as the dominant attribute, combined with :
- uniform visual/audio segments
- uniform audio segments
A lag can be observed between the beginning of a story and the production of its transcript.

Questions?