Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi Beckman Institute University of Illinois at Urbana-Champaign.

LSCOM (Large Scale Concept Ontology for Multimedia)
A broadcast news video dataset
200+ news videos / 170 hours
61,901 shots
Language
◦ English/Arabic/Chinese

Why a broadcast news ontology?
Critical mass of users, content providers, and applications
Good content availability (TRECVID LDC FBIS)
Shares a large set of core concepts with other domains

LSCOM Provides
Richly annotated video content for carrying out the required access and analysis functions over massive amounts of video content
A large-scale, useful, well-defined semantic lexicon
◦ More than 3000 concepts
◦ 374 annotated concepts
◦ Bridges the semantic gap from low-level features to high-level concepts

An LSCOM concept: Parade
Concept ID: 000
Name: Parade
Definition: Multiple units of marchers, devices, bands, banners or music.
Labeled: Yes

LSCOM Hierarchy
Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalanche
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
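The hierarchy listing above marks each concept's depth by the number of dots in front of its name. As a minimal sketch, assuming that convention holds throughout (the function name `parse_hierarchy` and the shortened example string are hypothetical, not part of LSCOM), the root-to-concept path of every entry can be recovered like this:

```python
import re


def parse_hierarchy(text):
    """Return one root-to-node path per concept in a dot-depth listing."""
    paths = []
    stack = []  # stack[d] holds the most recent concept seen at depth d
    for dots, name in re.findall(r"(\.*)([A-Za-z_]+)", text):
        depth = len(dots)
        del stack[depth:]  # forget concepts at this depth or deeper
        stack.append(name)
        paths.append(list(stack))
    return paths


# Shortened excerpt of the listing, used only for illustration.
listing = (
    "Thing.Individual..Dangerous_Thing...Dangerous_Situation"
    "....Natural_Hazard.....Earthquake.....Tornado"
    "...Dangerous_Tangible_Thing....Cutting_Device"
)
paths = parse_hierarchy(listing)
print(" > ".join(paths[-1]))
```

The same function works whether the listing is written on one line or with one concept per line, because only the dots directly before a name are counted.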

Definition: What is an ontology? (Wikipedia)
An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.

Ontology
Represents the visual knowledge base in a structured way
◦ Graph structure
◦ Tree (hierarchy) structure
Images/videos can be learned and retrieved effectively by exploiting the coherence between concepts
◦ Logical coherence
◦ Statistical coherence

An Ontology Hierarchy: Military Vehicle

An example from Wikipedia

Ontology Tree for LSCOM

A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite)
The aim is to divide the semantic space using a small number of concepts (39 concepts).
Selection Criteria
◦ Semantic coverage: the light concept set should cover as many of the semantic concepts in news videos as possible.
◦ Compactness: the concepts should not overlap semantically.
◦ Modelability: the concepts should be ones that can be modeled with a smaller semantic gap.

Selected concept dimensions
Divide the semantic space into a multi-dimensional space in which each dimension is nearly orthogonal to the others
◦ Program Category
◦ Setting/Scene/Site
◦ People
◦ Objects
◦ Activities
◦ Events
◦ Graphics

Histogram of LSCOM-Lite Concepts

Some example keyframes

Applications
Application I: Conceptual Fusion (most basic, early fusion)
Application II: Cross-Category Classification (inter-class relations)
Application III: Event Dynamics in Concept Space

Application I: Conceptual Fusion
[Diagram labels: Video, Visual Features, Classifier, Concept 1, Concept 2, Concept 3, …, Concept n]

LSCOM 374 Models
374 LIBSVM models
◦ a374/ a374/
◦ Features used (MPEG-7 descriptors): Color Moments, Edge Histogram, Wavelet Texture
◦ LIBSVM: a library for support vector machines, at

Application II: cross-category classification with concept transfer G.-J. Qi et al. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, in CVPR 2011

Instance-Level Concept Correlation
[Figure labels: Mountain, Castle; instance groups: Mountain and castle, Castle only, Mountain only]

Transfer Function
[Figure labels: Mountain, Castle; cases: Mountain, Castle, None of them]

Model Concept Relations

Automatically construct ontology in a data-driven manner

Application III: Event Dynamics in Concept Space

Event Detection with Concept Dynamics W. Jiang et al, Semantic event detection based on visual concept prediction, ICME, Germany, 2008.

Open Problems
Cross-Dataset Gap
◦ Generalize from the LSCOM dataset to other datasets (e.g., non-news video datasets)
Cross-Domain Gap
◦ Text scripts associated with news videos: can they help information extraction for visual concepts?
Automatic ontology construction
◦ Task dependent vs. task independent
◦ Data driven vs. prior knowledge (e.g., WordNet)
◦ Incorporate prior human knowledge (logical relations, etc.)

TRECVID Competition
Task I: High-Level Feature Extraction
◦ Input: subshot
◦ Output: detection results for the 39 LSCOM-Lite concepts in the subshot

High-Level Feature Extraction
Each concept is assumed to be binary (absent or present) in each subshot
Submission: find the subshots that contain a given concept, rank them by detection confidence score, and submit the top
Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39, using a 50% random sample of all the submission pools

20 Evaluated Concepts

Evaluation Metric: Average Precision
Relevant subshots should be ranked higher than irrelevant ones.
R is the total number of relevant images, R_j is the number of relevant images among the top j images, and I_j indicates whether the j-th image is relevant (1) or not (0).
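The formula itself did not survive in this transcript. The symbols defined above match the standard non-interpolated average precision, AP = (1/R) * sum over j of (R_j / j) * I_j, so the sketch below assumes that form. It also assumes that every relevant item appears somewhere in the ranked list unless R is passed explicitly; the function name `average_precision` is hypothetical and not taken from any TRECVID tool.

```python
def average_precision(relevance, total_relevant=None):
    """Non-interpolated average precision of a ranked list.

    relevance: 0/1 flags in ranked order (best first); this is I_j.
    total_relevant: R; defaults to the number of relevant items in the list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0  # R_j: relevant items seen in the top j
    score = 0.0
    for j, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / j  # precision at rank j, counted only at relevant ranks
    return score / total_relevant


# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([1, 0, 1, 0]))
```

Moving a relevant item to a higher rank can only raise the score, which is the ranking behaviour the slide asks for.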

Results

TRECVID Competition
Task II: Video Search
◦ Input: 24 text-based topics
◦ Output: relevant subshots in the database

Topics to search

Topics to search (cont’d)

Topics to search

Three Types of Search Systems

Results: Automatic Runs

Results: Manual Runs

Results: Interactive Runs

Machine Problem 7: Shot Boundary Detection in Videos

Goals
Detect abrupt content changes between consecutive frames.
◦ Scene changes
◦ Scene cuts

Steps
Step 1: Measure the change of content between video frames
◦ Visual/acoustic measurements
Step 2: Compare the content distance between successive frames against a threshold. If the distance is larger than the threshold, a shot boundary may exist.

Measuring Content Based on Visual Information
256-dimensional color histogram
◦ In RGB space, normalize r, g, b to [0,1]
◦ In the (nr, ng) color space, build an 8×8 histogram

Color Histograms
Divide each image into four parts. Each part has an 8×8 histogram (64 bins), giving 4 × 64 = 256 feature dimensions in total.
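A minimal pure-Python sketch of the visual pipeline described in these slides: a four-part 8×8 histogram per frame, followed by thresholding the distance between successive frames. Several details are assumptions rather than part of the assignment: nr and ng are taken to be the chromaticities r/(r+g+b) and g/(r+g+b), the four parts are taken to be image quadrants, the distance is L1, and all function names are hypothetical.

```python
def quadrant_histogram(frame, bins=8):
    """frame: list of rows, each a list of (r, g, b) tuples with values 0..255.

    Returns 4 * bins * bins values (256 for bins=8), normalized by pixel count.
    """
    height, width = len(frame), len(frame[0])
    feature = [0.0] * (4 * bins * bins)
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            total = r + g + b
            # Assumed chromaticity normalization; black pixels map to (1/3, 1/3).
            nr = r / total if total else 1 / 3
            ng = g / total if total else 1 / 3
            i = min(int(nr * bins), bins - 1)
            j = min(int(ng * bins), bins - 1)
            # Quadrant index 0..3: top-left, top-right, bottom-left, bottom-right.
            quadrant = 2 * int(y >= height / 2) + int(x >= width / 2)
            feature[quadrant * bins * bins + i * bins + j] += 1
    pixel_count = height * width
    return [value / pixel_count for value in feature]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_boundaries(frames, threshold):
    """Return indices k where a boundary is declared between frames k-1 and k."""
    features = [quadrant_histogram(frame) for frame in frames]
    return [
        k
        for k in range(1, len(features))
        if l1_distance(features[k - 1], features[k]) > threshold
    ]


# Toy video: two solid red frames followed by two solid blue frames.
red = [[(255, 0, 0)] * 4 for _ in range(4)]
blue = [[(0, 0, 255)] * 4 for _ in range(4)]
print(detect_boundaries([red, red, blue, blue], threshold=0.5))
```

Because each histogram is divided by the pixel count, the L1 distance lies between 0 and 2 regardless of frame size, which makes a single threshold easier to reason about across videos.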

Acoustic Features
12 cepstral coefficients
Energy (sum of squares of the raw signal)
Zero crossing rate (ZCR)
ZCR = sum(|sign(S(2:N))-sign(S(1:N-1))|)
Hint: normalize the energy so that it does not dominate when computing distances between successive frames
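The energy and ZCR definitions above translate directly into code. The sketch below follows the ZCR formula as written, so each full sign flip contributes 2 to the sum. The min-max scaling in `normalize` is only one possible way to follow the hint and is an assumption, as are all function names; cepstral coefficients are left out.

```python
def sign(x):
    """Return -1, 0 or 1, like MATLAB's sign()."""
    return (x > 0) - (x < 0)


def zero_crossing_rate(samples):
    """ZCR = sum(|sign(S(2:N)) - sign(S(1:N-1))|) over one audio frame."""
    return sum(abs(sign(b) - sign(a)) for a, b in zip(samples, samples[1:]))


def energy(samples):
    """Sum of squares of the raw samples in one audio frame."""
    return sum(s * s for s in samples)


def normalize(values):
    """Min-max scale a per-frame feature to [0, 1] (assumed normalization)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]


# Three toy audio frames: quiet, loud, and rapidly alternating.
audio_frames = [[0.1, 0.1, 0.1, 0.1], [2.0, 2.0, -2.0, -2.0], [1.0, -1.0, 1.0, -1.0]]
print([zero_crossing_rate(f) for f in audio_frames])
print(normalize([energy(f) for f in audio_frames]))
```

Scaling each feature to a common range before computing frame-to-frame distances keeps the large raw energy values from swamping the ZCR and cepstral terms.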

Datasets
Two videos of a little over one minute
Manually label the shot boundaries

What to submit
Source code
Report
◦ Compare the shot boundary detection results returned by your algorithm with the manually labeled boundaries
◦ Compare
◦ Explain your choice of threshold
◦ Explain the differences between the acoustic-based and visual-based detection results

Where and when to submit to
Due: May 2nd

Thanks! Q&A