Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi Beckman Institute University of Illinois at Urbana-Champaign.


1 Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi Beckman Institute University of Illinois at Urbana-Champaign

2 LSCOM (Large Scale Concept Ontology for Multimedia) A broadcast news video dataset: 200+ news videos / 170 hours; 61,901 shots; Languages ◦ English/Arabic/Chinese

3 Why a broadcast news ontology? Critical mass of users, content providers, and applications. Good content availability (TRECVID, LDC, FBIS). Shares a large set of core concepts with other domains.

4 LSCOM Provides Richly annotated video content for accomplishing required access and analysis functions over massive amounts of video content. A large-scale, useful, well-defined semantic lexicon ◦ More than 3000 concepts ◦ 374 annotated concepts ◦ Bridging the semantic gap from low-level features to high-level concepts

5 An LSCOM concept 000 - Parade Concept ID: 000 Name: Parade Definition: Multiple units of marchers, devices, bands, banners, or music. Labeled: Yes

6 LSCOM Hierarchy http://www.lscom.org/ontology/index.html
Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalance
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
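The leading dots in the hierarchy listing appear to encode depth (one dot per level). As a minimal sketch under that assumption, the following Python reads such dot-prefixed lines and recovers each concept's parent. The function name `parse_dotted_hierarchy` and the shortened input list are illustrative only and are not part of LSCOM.

```python
def parse_dotted_hierarchy(lines):
    """Turn dot-prefixed lines into (depth, name, parent) records.

    Assumption: the number of leading dots equals the depth in the tree.
    """
    records = []
    stack = []  # stack[d] is the most recent concept seen at depth d
    for line in lines:
        name = line.lstrip(".")
        depth = len(line) - len(name)
        del stack[depth:]  # forget concepts at this depth or deeper
        parent = stack[-1] if stack else None
        stack.append(name)
        records.append((depth, name, parent))
    return records


# A shortened excerpt of the listing, used only for illustration.
lines = [
    "Thing",
    ".Individual",
    "..Dangerous_Thing",
    "...Dangerous_Situation",
    "....Emergency_Incident",
    ".....Disaster_Event",
    "....Natural_Hazard",
    ".....Earthquake",
    "...Dangerous_Tangible_Thing",
    "....Cutting_Device",
]
records = parse_dotted_hierarchy(lines)
parents = {name: parent for _, name, parent in records}
```

With this excerpt, `parents["Earthquake"]` is `"Natural_Hazard"` and `parents["Cutting_Device"]` is `"Dangerous_Tangible_Thing"`.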

7 Definition: What is an ontology? (Wikipedia) An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.

8 Ontology Represents the visual knowledge base in a structured way ◦ Graph structure ◦ Tree (hierarchy) structure Images/videos can be effectively learned and retrieved by exploiting the coherence between concepts ◦ Logical coherence ◦ Statistical coherence

9 An Ontology Hierarchy: Military Vehicle

10 An example from Wikipedia

11 Ontology Tree for LSCOM

12 A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite) The aim is to partition the semantic space using a small set of concepts (39 concepts). Selection Criteria ◦ Semantic Coverage ▪ As many of the semantic concepts in news videos as possible should be covered by the light concept set. ◦ Compactness ▪ These concepts should not overlap semantically. ◦ Modelability ▪ These concepts should be ones that can be modeled with a smaller semantic gap.

13 Selected concept dimensions Divide the semantic space into a multi-dimensional space, where the dimensions are nearly orthogonal ◦ Program Category ◦ Setting/Scene/Site ◦ People ◦ Objects ◦ Activities ◦ Events ◦ Graphics

14 Histogram of LSCOM-Lite Concepts

15 Some example keyframes

16 Applications Application I: Conceptual Fusion (most basic: early fusion) Application II: Cross-Category Classification (inter-class relation) Application III: Event Dynamics in Concept Space

17 Application I: Conceptual Fusion [Diagram labels: Video; Concept 1, Concept 2, Concept 3, …, Concept n; Visual Features; Classifier]

18 LSCOM 374 Models 374 LIBSVM models ◦ http://www.ee.columbia.edu/ln/dvmm/columbia374/ ◦ Features used (MPEG-7 descriptors) ▪ Color Moments ▪ Edge Histogram ▪ Wavelet Texture ◦ LIBSVM, a library for support vector machines, at http://www.csie.ntu.edu.tw/~cjlin/libsvm/

19 Application II: cross-category classification with concept transfer G.-J. Qi et al. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, in CVPR 2011

20 Instance-Level Concept Correlation [Figure labels: Mountain, Castle; +1, +1; Mountain and castle; Castle only; Mountain only]

21 Transfer Function [Figure labels: Mountain, Castle; Mountain; Castle; None of them]

22 Model Concept Relations

23 Automatically construct ontology in a data-driven manner

24 Application III: Event Dynamics in Concept Space

25 Event Detection with Concept Dynamics W. Jiang et al, Semantic event detection based on visual concept prediction, ICME, Germany, 2008.

26 Open Problems Cross-Dataset Gap ◦ Generalize from the LSCOM dataset to other datasets (e.g., non-news video datasets) Cross-Domain Gap ◦ Text scripts associated with news videos ▪ Can they help information extraction for visual concepts? Automatic ontology construction ◦ Task dependent vs. task independent ◦ Data driven vs. prior knowledge (e.g., WordNet) ◦ Incorporate prior human knowledge (logical relations, etc.)

27 TRECVID Competition Task I: High-Level Feature Extraction ◦ Input: subshot ◦ Output: detection results for the 39 LSCOM-Lite concepts in the subshot

28 High-Level Feature Extraction Each concept is assumed to be binary (absent or present) in each subshot. Submission: find the subshots that contain a given concept, rank them by detection confidence score, and submit the top 2000. Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39, using a 50% random sample of all the submission pools.
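The submission rule (rank by confidence, keep the top 2000) can be sketched in a few lines of Python. This is only an illustration: the function name `build_submission`, the dictionary input, and the subshot identifiers are hypothetical, and the official TRECVID submission format is not reproduced here.

```python
def build_submission(scores, max_results=2000):
    """Rank subshots for one concept and keep the highest-scoring ones.

    scores: dict mapping a subshot id to a detection confidence score
            (hypothetical input format, chosen for illustration).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [subshot_id for subshot_id, _ in ranked[:max_results]]


# Toy example with made-up ids and scores.
example_scores = {"shot1_1": 0.10, "shot1_2": 0.90, "shot2_1": 0.50}
top_two = build_submission(example_scores, max_results=2)
```

Here `top_two` is `["shot1_2", "shot2_1"]`, the two most confident subshots in descending order of score.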

29 20 Evaluated Concepts

30 Evaluation Metric: Average Precision Relevant subshots should be ranked higher than irrelevant ones. R is the total number of relevant images, R_j is the number of relevant images among the top j images, and I_j indicates whether the j-th image is relevant.
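The formula itself did not survive in the transcript. The symbols defined on the slide are consistent with the standard non-interpolated average precision, AP = (1/R) * sum over j of (R_j / j) * I_j, and the sketch below assumes that form, with I_j taken to be 1 when the j-th ranked item is relevant and 0 otherwise. The function name and the list-of-flags input are illustrative choices.

```python
def average_precision(relevance, total_relevant=None):
    """Non-interpolated average precision for one ranked list.

    relevance: list of 0/1 flags in ranked order (I_j in the slide).
    total_relevant: R, the total number of relevant items. If omitted,
        it is assumed that every relevant item appears in the list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0  # R_j: relevant items seen in the top j
    precision_sum = 0.0
    for j, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / j  # precision at rank j
    return precision_sum / total_relevant


ap_example = average_precision([1, 0, 1, 0])  # (1/1 + 2/3) / 2
```

A ranking that places all relevant items first scores 1.0, and pushing relevant items further down the list lowers the score, which matches the requirement that relevant subshots be ranked above irrelevant ones.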

31 Results

32 TRECVID Competition Task II: Video Search ◦ Input: 24 text-based topics ◦ Output: relevant subshots in the database

33 Topics to search

34 Topics to search (cont’d)

35 Topics to search

36 Three Types of Search Systems

37 Results: Automatic Runs

38 Results: Manual Runs

39 Results: Interactive Runs

40 Machine Problem 7: Shot Boundary Detection in Videos

41 Goals Detect the abrupt content changes between consecutive frames. ◦ Scene changes ◦ Scene cuts

42 Steps Step 1: Measuring the change of content between video frames ◦ Visual/Acoustic measurements Step 2: Compare the content distance between successive frames. If the distance is larger than a certain threshold, then a shot boundary may exist.
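A minimal sketch of the two steps, assuming each frame has already been reduced to a feature vector. The slide does not name a distance measure, so the Euclidean distance used here is an assumption, and the function names and toy features are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_boundaries(features, threshold):
    """Return each index i where a shot boundary may lie between
    frame i-1 and frame i, i.e. where the distance exceeds the threshold."""
    return [
        i
        for i in range(1, len(features))
        if euclidean(features[i - 1], features[i]) > threshold
    ]


# Toy per-frame features: the content jumps between frame 1 and frame 2.
toy_features = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 1.0]]
toy_boundaries = detect_boundaries(toy_features, threshold=0.5)
```

In the toy input only the jump into frame 2 exceeds the threshold, so `toy_boundaries` is `[2]`. The threshold value is the main tuning knob: too low and gradual motion triggers false boundaries, too high and real cuts are missed.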

43 Measuring Content based on Visual Information 256-dimensional color histogram ◦ In RGB space, normalize r, g, b to [0,1] ◦ In the (nr, ng) color space, build an 8×8 histogram

44 Color Histograms Divide each image into four parts; each part has an 8×8 histogram (64 bins), giving 4 × 64 = 256 feature dimensions in total.
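The slides do not define nr and ng. The sketch below assumes they are the normalized chromaticities nr = r/(r+g+b) and ng = g/(r+g+b), splits the frame into four quadrants (one possible reading of "four parts"), and concatenates the four 8×8 histograms into a 256-dimensional vector. The image is represented as a plain list of rows of (r, g, b) tuples, and all function names are illustrative.

```python
def rg_histogram(pixels, bins=8):
    """Normalized bins x bins histogram over (nr, ng) chromaticity.

    Assumption: nr = r / (r + g + b), ng = g / (r + g + b).
    """
    hist = [0.0] * (bins * bins)
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            nr = ng = 1.0 / 3.0  # assumption: treat black as neutral
        else:
            nr, ng = r / total, g / total
        i = min(int(nr * bins), bins - 1)  # clamp nr == 1.0 into last bin
        j = min(int(ng * bins), bins - 1)
        hist[i * bins + j] += 1.0
    count = len(pixels)
    return [h / count for h in hist] if count else hist


def frame_feature(image, bins=8):
    """Concatenate the histograms of the four quadrants of an image."""
    height, width = len(image), len(image[0])
    mid_y, mid_x = height // 2, width // 2
    quadrants = [
        (0, mid_y, 0, mid_x), (0, mid_y, mid_x, width),
        (mid_y, height, 0, mid_x), (mid_y, height, mid_x, width),
    ]
    feature = []
    for y0, y1, x0, x1 in quadrants:
        pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        feature.extend(rg_histogram(pixels, bins))
    return feature


# A 4x4 pure-red test image: every pixel falls into the same bin.
red_image = [[(255, 0, 0)] * 4 for _ in range(4)]
red_feature = frame_feature(red_image)
```

Normalizing each quadrant's histogram by its pixel count keeps the feature independent of frame resolution, which is a design choice of this sketch rather than something the slides specify.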

45 Acoustic Features 12 cepstral coefficients; energy (sum of squares of the raw signal); zero crossing rate (ZCR), where ZCR = sum(|sign(S(2:N)) - sign(S(1:N-1))|). Hint: normalize the energy so that it does not dominate when computing distances between successive frames.
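The energy and ZCR definitions translate directly into Python. The sketch follows the slide's formula literally, so each sign flip between +1 and -1 contributes 2 to the sum. The normalization shown (dividing by the largest energy across frames) is only one possible way to follow the hint and is an assumption, as are the function names. Cepstral coefficients are not covered here.

```python
def sign(x):
    """Return -1, 0, or 1, matching the sign() used in the slide's formula."""
    return (x > 0) - (x < 0)


def zero_crossing_rate(samples):
    """ZCR = sum(|sign(S(2:N)) - sign(S(1:N-1))|) over one audio frame."""
    return sum(abs(sign(b) - sign(a)) for a, b in zip(samples, samples[1:]))


def energy(samples):
    """Sum of squares of the raw signal in one audio frame."""
    return sum(x * x for x in samples)


def normalize_energies(energies):
    """Scale per-frame energies to [0, 1] by the maximum (an assumption)."""
    peak = max(energies, default=0.0)
    if peak == 0:
        return [0.0 for _ in energies]
    return [e / peak for e in energies]


alternating = [1, -1, 1, -1]
zcr_value = zero_crossing_rate(alternating)  # 3 sign flips, each counts 2
energy_value = energy(alternating)
```

For the alternating signal, `zcr_value` is 6 and `energy_value` is 4. Without some scaling, raw energy can be orders of magnitude larger than the other features and would dominate a distance computed over the combined acoustic vector.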

46 Datasets Two videos, each a little over one minute long. Manually label the shot boundaries.

47 What to submit Source code Report ◦ Compare the shot boundary detection results returned by your algorithm with the manually labeled boundaries ◦ Compare ◦ Explain your choice of threshold ◦ Explain the differences between the acoustic-based and visual-based detection results

48 Where and when to submit Email to ece.ece.ece.417@gmail.com Due: May 2nd

49 Thanks! Q&A

