MPEG-7 Audio Overview Beinan Li MUMT 611 Week 2 2005. 1. 20.

1 MPEG-7 Audio Overview Beinan Li MUMT 611 Week 2 2005. 1. 20

2 Content
• MPEG-7 overview
  • What is…
  • Why?
  • Objectives and scope
  • Main elements and organization
• MPEG-7 Audio
  • Low-level features
  • High-level tools

3 What is MPEG-7
• “Multimedia Content Description Interface”
• ISO/IEC standard by MPEG (Moving Picture Experts Group)
• Provides metadata for multimedia
• MPEG-1, -2, -4 make content available; MPEG-7 makes content accessible, retrievable, filterable, and manageable (via device / computer).
• Allows multiple degrees of interpretation of the information’s meaning
• Supports as broad a range of applications as possible.
• A standard that is compatible with existing technology and extensible.

4 Why MPEG-7
• “The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.”
• Past: digital multimedia sources were scarce -> simple access mechanisms were enough
• Now: growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. “record only news about sport.”

5 Why MPEG-7
• For future multimedia services, content representation and description may have to be addressed jointly.
• Many services dealing with content representation will have to deal first with content description
  • “a non-described content may be useless”
• Need for access only to the content description:
  • New original services (e.g. optimizing personal time)
  • Adaptation to networks and terminal capabilities

6 Application domains (incomplete list)
• Broadcast media selection (e.g. radio channel, TV channel).
• Digital libraries (e.g. film, video, audio and radio archives).
• E-commerce (e.g. personalized advertising).
• Education (e.g. repositories of multimedia courses, multimedia search for support material).
• Home entertainment (e.g. management of personal multimedia collections, including manipulation of content, such as karaoke).
• Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face).
• Multimedia directory services (e.g. yellow pages, GIS).
• Surveillance and remote sensing.

7 MPEG-7 Objectives
Standardize content-based description for various types of audiovisual information
• Independent of media support (encoding and storage)
• Different granularity
  • Low-level features: shape, size, key, tempo changes
  • High-level semantic info: “scene with a barking brown dog on the left and with the sound of passing cars in the background.”
• Meaningful in the context of the application
  • Same material -> different types of features and combinations, e.g. timbre vs. loudness

8 MPEG-7 Objectives
• Information about the content
  • The form: e.g. the coding format used
  • Conditions for accessing the material: e.g. intellectual property rights / price
  • Classification: e.g. parental rating
  • Links to other relevant materials
  • The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”
• Information present in the content:
  • Combination of low-level and high-level descriptors

9 Scope of the Standard processing chain:

10 An example of architecture
• Pull: (Client queries -> Descriptions repository -> Matched Ds)
• Push: (Filter descriptions -> Programmed actions)
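
The two modes on this slide can be sketched in a few lines of Python. This is only an illustration, not part of MPEG-7: the dictionary fields (`title`, `genre`, `topic`), the sample repository and the function names `pull` and `push` are all hypothetical.

```python
# Hypothetical in-memory "descriptions repository"; field names are illustrative only.
REPOSITORY = [
    {"title": "Evening news", "genre": "news", "topic": "sport"},
    {"title": "Morning news", "genre": "news", "topic": "politics"},
    {"title": "Cup final", "genre": "live", "topic": "sport"},
]


def pull(repository, **query):
    """Pull: a client query is matched against the stored descriptions."""
    return [d for d in repository
            if all(d.get(key) == value for key, value in query.items())]


def push(stream, predicate, action):
    """Push: incoming descriptions are filtered; each match triggers an action."""
    return [action(d) for d in stream if predicate(d)]


matches = pull(REPOSITORY, genre="news", topic="sport")
recorded = push(
    REPOSITORY,
    predicate=lambda d: d["genre"] == "news" and d["topic"] == "sport",
    action=lambda d: "record: " + d["title"],
)
```

The push call mirrors the earlier “record only news about sport” example: the filter runs on descriptions alone, without touching the content itself.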

11 Workplan

12 Where are the descriptions from?
• Preservation of existing descriptive data (e.g. scripts) through production and delivery
• Generated automatically by capture devices (e.g. time or GPS location in a camera)
• Extracted automatically or semi-automatically (i.e. with some human assistance)
• Manually produced (e.g. for legacy material such as existing film archives)

13 Main Elements of MPEG-7
• Description Tools: (textual / binary)
  • Descriptors (D): define the syntax and the semantics of each feature (metadata element)
  • Description Schemes (DS): specify the structure and semantics of the relationships between their components (Ds and other DSs)
• Description Definition Language (DDL):
  • Defines the syntax of the MPEG-7 Description Tools
  • Allows creation, extension and modification of DSs
• System tools:
  • Storage and transmission, synchronization of descriptions with content, multiplexing of descriptions, etc.
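
As a rough illustration of how individual feature values (descriptor-like) can be grouped into one structured, textual description (scheme-like), here is a small Python sketch. The element names `Segment`, `Title`, `Time` and `MeanPower` are invented for this example and are not taken from the MPEG-7 schema; a real description would follow the types defined through the DDL.

```python
import xml.etree.ElementTree as ET


def describe_segment(title, start_s, duration_s, mean_power):
    """Group single feature values into one structured description.

    All element and attribute names are hypothetical, NOT MPEG-7 names.
    """
    segment = ET.Element("Segment")                 # plays the role of a scheme
    ET.SubElement(segment, "Title").text = title    # plays the role of a descriptor
    ET.SubElement(segment, "Time",
                  start=str(start_s), duration=str(duration_s))
    ET.SubElement(segment, "MeanPower").text = str(mean_power)
    return segment


segment = describe_segment("Interview", 0.0, 12.5, 0.031)
xml_text = ET.tostring(segment, encoding="unicode")
```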

14 Main Elements of MPEG-7  Relationship among elements introduced above.

15 Description Tools
• Creation and production processes: (director, title)
• Usage: (broadcast schedule)
• Storage features.
• Structural information: (spatial-temporal components)
• Segmentations
• Low-level features: (sound timbres, melody description)
• Conceptual information: (objects and events, interactions)
• Navigation and access: (summaries, variations)
• Collections of objects.
• User-content interactions: (user preferences, usage history)

16 Organization of Description Tools

17 Descriptions (further)
• MPEG-7 approaches the description of content from several viewpoints.
• A set of methods and tools for the different viewpoints of the description (not a monolithic system)
• Interrelated and can be combined in many ways.
• Associated with the content itself: (searching, filtering)
• Location: (document vs. stream)
  • physically located with the material
  • somewhere else on the globe (maybe not)
• Interoperability with other metadata standards: (XML)

18 Use of Description Tools
• The description tools are presented on the basis of the functionality they provide.
• In practice, they are combined into meaningful sets of description units.
• Furthermore, each application will have to select a subset of descriptors and DSs.
• Library of tools!
• DDL can be used to handle specific needs of the application (like scripting in many current applications).

19 Major Functionalities
• MPEG-7 Systems
• MPEG-7 Description Definition Language
• MPEG-7 Visual
• MPEG-7 Audio
• MPEG-7 Multimedia Description Schemes (D.T.)
• Reference Software: the eXperimentation Model (test)
• MPEG-7 Conformance (syntax checking)
• MPEG-7 Extraction and use of descriptions (technical report)

20 MPEG-7 Audio
• Audio provides structures, building upon some basic structures from the MDS, for describing audio content.
• Low-level Descriptors:
  • audio features that cut across many applications
• High-level Description Tools:
  • more specific to a set of applications.

21 Low-level Features
• “MPEG-7 Audio Framework”:
  • Two low-level descriptor types: (for sample and segment)
    • Scalar: (e.g. power or fundamental frequency)
    • Vector: (e.g. spectra)
  • Hierarchical, consistent interface
  • Any descriptor inheriting from these types can be instantiated, describing a segment with a single summary value or a series of sampled values, as the application requires.
• Scalable Series: (hierarchical re-sampling)
  • Progressively down-sample the data contained in a series (application-oriented)
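
The idea of progressive down-sampling can be sketched as follows. This is a simplified illustration, not the normative Scalable Series definition: the function names, the fixed ratio of 2 and the choice of the mean as the summary value are assumptions made for the example (a minimum or maximum could be substituted).

```python
def downsample(series, ratio):
    """Replace each group of `ratio` consecutive values with its mean."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    groups = [series[i:i + ratio] for i in range(0, len(series), ratio)]
    return [sum(group) / len(group) for group in groups]


def scalable_series(series, ratio=2, levels=3):
    """Return the original series followed by progressively coarser versions."""
    result = [list(series)]
    for _ in range(levels):
        if len(result[-1]) <= 1:
            break  # nothing left to summarize
        result.append(downsample(result[-1], ratio))
    return result


levels = scalable_series([1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 8.0])
# levels[1] == [2.0, 4.0, 4.0, 4.0], levels[2] == [3.0, 4.0], levels[3] == [3.5]
```

An application that only needs a coarse overview can read the last level and ignore the finer ones.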

22 Low-level Features (types)
• Basic
• Basic Spectral
• Signal Parameters
• Timbral Temporal
• Timbral Spectral
• Spectral Basis
• MPEG-7 Silence Descriptor

23 Low-level Features (graph)

24 Low-level Features (details)
• Basic: (temporally sampled scalar values for general use)
  • AudioWaveform Descriptor
    • waveform envelope: (for display purposes).
  • AudioPower Descriptor
    • temporally-smoothed instantaneous power: (quick summary of a signal)
  • Applicable to all kinds of signals
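
A minimal sketch of these two basic features, assuming non-overlapping frames and plain Python lists of samples. The function names and the frame size are hypothetical, and the normative extraction details of the standard are not reproduced here.

```python
import math


def frames(samples, size):
    """Split the signal into non-overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def waveform_envelope(samples, size):
    """Per-frame (minimum, maximum) pairs, enough to draw a compact waveform."""
    return [(min(frame), max(frame)) for frame in frames(samples, size)]


def audio_power(samples, size):
    """Per-frame mean of the squared samples (power smoothed over each frame)."""
    return [sum(x * x for x in frame) / size for frame in frames(samples, size)]


sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
envelope = waveform_envelope(tone, 80)   # 10 frames of 80 samples each
power = audio_power(tone, 80)            # about 0.125 per frame for a 0.5-amplitude sine
```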

25 Low-level Features (details)
• Basic Spectral: (single time-frequency analysis of signal)
  • AudioSpectrumEnvelope: (base class)
    • the short-term power spectrum: (display, synthesize, general-purpose search)
  • AudioSpectrumCentroid:
    • is the spectrum dominated by high or low frequencies?
  • AudioSpectrumSpread:
    • is the power spectrum centered near the spectral centroid, or spread out over the spectrum?
    • distinguishes pure-tone and noise-like sounds
  • AudioSpectrumFlatness: (the presence of tonal components)
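
The three summary measures can be sketched on a toy power spectrum. This is a simplified illustration under stated assumptions: frequencies are mapped to octaves relative to 1 kHz, the flatness is the ratio of the geometric to the arithmetic mean over the whole spectrum, and details such as low-frequency handling and per-band flatness are omitted. The function names and the example spectra are made up.

```python
import math


def log_centroid(freqs_hz, powers, ref_hz=1000.0):
    """Power-weighted mean of log2(f / ref_hz): the spectrum's centre of gravity in octaves."""
    total = sum(powers)
    return sum(math.log2(f / ref_hz) * p for f, p in zip(freqs_hz, powers)) / total


def log_spread(freqs_hz, powers, ref_hz=1000.0):
    """RMS deviation (in octaves) of the power spectrum around its centroid."""
    centre = log_centroid(freqs_hz, powers, ref_hz)
    total = sum(powers)
    variance = sum((math.log2(f / ref_hz) - centre) ** 2 * p
                   for f, p in zip(freqs_hz, powers)) / total
    return math.sqrt(variance)


def flatness(powers):
    """Geometric mean over arithmetic mean: near 1 for noise, near 0 for a pure tone."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
noise_like = [1.0, 1.0, 1.0, 1.0, 1.0]
tone_like = [1e-6, 1e-6, 1.0, 1e-6, 1e-6]
```

With these inputs the noise-like spectrum has a flatness of 1 and a spread of about 1.41 octaves, while the tone-like spectrum scores close to 0 on both.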

26 Low-level Features (details)
• Signal Parameters: (periodic or quasi-periodic signals)
  • AudioFundamentalFrequency:
    • carries a “confidence measure”, because “pitch-tracking” extraction is not perfectly accurate
  • AudioHarmonicity:
    • distinguishes sounds with a harmonic / inharmonic / non-harmonic spectrum
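
To show what a fundamental-frequency value paired with a confidence measure might look like, here is a sketch based on plain autocorrelation. The autocorrelation method is a stand-in chosen for brevity, not the extraction method of the standard; the function name, the search range and the definition of confidence as the normalized autocorrelation peak are all assumptions.

```python
import math


def estimate_f0(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Return (f0 in Hz, confidence in [0, 1]) from the strongest autocorrelation peak."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return 0.0, 0.0  # silence: no pitch, no confidence
    shortest = int(sample_rate / fmax)
    longest = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(shortest, longest + 1):
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag == 0:
        return 0.0, 0.0  # no positive correlation in the search range
    return sample_rate / best_lag, best_value / energy


sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0, confidence = estimate_f0(tone, sample_rate)
```

For the 200 Hz test tone the estimate lands on a lag of 40 samples, and the confidence is high because the signal is strictly periodic; a noisy or inharmonic signal would give a lower value.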

27 Low-level Features (details)
• Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)
  • LogAttackTime
  • TemporalCentroid
    • where in time the energy of a signal is focused.
    • Useful when attack times are identical
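
The sketch below shows why the temporal centroid helps when attack times are identical. It assumes an amplitude envelope is already available as a list; the threshold fractions (2% and 80% of the peak), the function names and the synthetic envelopes are assumptions for illustration only.

```python
import math


def log_attack_time(envelope, sample_rate, start_frac=0.02, stop_frac=0.8):
    """log10 of the time the envelope takes to rise from start_frac to stop_frac of its peak."""
    peak = max(envelope)
    start = next(i for i, v in enumerate(envelope) if v >= start_frac * peak)
    stop = next(i for i, v in enumerate(envelope) if v >= stop_frac * peak)
    duration = max(stop - start, 1) / sample_rate  # at least one sample long
    return math.log10(duration)


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time, in seconds."""
    total = sum(envelope)
    return sum(i / sample_rate * v for i, v in enumerate(envelope)) / total


def make_envelope(attack_n, decay_n):
    """Synthetic envelope: linear rise to 1.0, then linear decay to 0.0."""
    rise = [i / attack_n for i in range(1, attack_n + 1)]
    fall = [1.0 - i / decay_n for i in range(1, decay_n + 1)]
    return rise + fall


rate = 100  # envelope samples per second (hypothetical)
plucked = make_envelope(10, 20)     # same attack, short decay
sustained = make_envelope(10, 200)  # same attack, long decay
```

Both envelopes yield the same log attack time, but the sustained one has a later temporal centroid, so the two sounds remain distinguishable.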

28 Low-level Features (details)
• Timbral Spectral: (spectral features in a linear-frequency space)
  • SpectralCentroid:
    • power-weighted average of the frequency of the bins in the linear power spectrum.
    • distinguishes musical instrument timbres
  • 4 Ds for the harmonic, regularly spaced components of signals:
    • HarmonicSpectralCentroid
    • HarmonicSpectralDeviation
    • HarmonicSpectralSpread
    • HarmonicSpectralVariation

29 Low-level Features (details)
• Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)
  • AudioSpectrumBasis:
    • a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum.
  • AudioSpectrumProjection:
    • low-dimensional features of a spectrum after projection upon a reduced-rank basis.
  • independent subspaces of a spectrum correlate strongly with different sound sources.
  • Provide more salience using less space.
  • Used with the Sound Classification and Indexing Description Tools.

30 Low-level Features (details)
• Silence segment: (no significant sound)
  • aids further segmentation of the audio stream, or serves as a hint not to process a segment
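
A silence description only states that a segment contains no significant sound; how that is detected is left to the application. One simple possibility, sketched here as an assumption, is to threshold the mean power of fixed-size frames. The function name, frame size and threshold are hypothetical.

```python
def silent_segments(samples, frame_size, threshold):
    """Return (start, end) sample indices of runs of frames whose mean power is below threshold."""
    segments = []
    run_start = None
    n_frames = len(samples) // frame_size
    for k in range(n_frames):
        frame = samples[k * frame_size:(k + 1) * frame_size]
        power = sum(x * x for x in frame) / frame_size
        if power < threshold:
            if run_start is None:
                run_start = k * frame_size      # a silent run begins
        elif run_start is not None:
            segments.append((run_start, k * frame_size))
            run_start = None                    # the silent run ended
    if run_start is not None:
        segments.append((run_start, n_frames * frame_size))
    return segments


signal = [0.5] * 100 + [0.0] * 200 + [0.5] * 100
silences = silent_segments(signal, frame_size=50, threshold=1e-4)
```

Later processing stages could skip the returned ranges or use their boundaries as segmentation points.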

31 High-level audio Description Tools (Ds and DSs)
• Exchange some generality for descriptive richness:
  • a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content

32 High-level audio Description Tools (details)
• Audio Signature Description Scheme
  • SpectralFlatness Ds
  • a unique content identifier for the purpose of robust automatic identification
  • e.g. audio fingerprinting

33 High-level audio Description Tools (details)
• Musical Instrument Timbre Description Tools
  • HarmonicInstrumentTimbre Ds:
    • LogAttackTime Descriptor
  • PercussiveInstrumentTimbre Ds:
    • SpectralCentroid Descriptor

34 High-level audio Description Tools (details)
• Melody Description Tools:
  • efficient, robust, and expressive melodic similarity matching.
  • MelodyContour Description Scheme:
    • terse, efficient melody contour / rhythm
  • MelodySequence Description Scheme:
    • verbose, complete, expressive melody / rhythm.
    • Interval encoding
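
A terse contour representation can be sketched by quantizing the interval between adjacent notes into five levels, from -2 to +2. The cent thresholds used below (±50 and ±250) and the use of MIDI note numbers as input are assumptions for this example and should be checked against the standard; rhythm information is left out.

```python
def contour_step(cents):
    """Quantize a signed interval in cents into one of five contour levels."""
    if cents <= -250:
        return -2   # large step down
    if cents <= -50:
        return -1   # small step down
    if cents < 50:
        return 0    # roughly the same pitch
    if cents < 250:
        return 1    # small step up
    return 2        # large step up


def melody_contour(midi_notes):
    """Contour of a note sequence; one value per pair of adjacent notes."""
    return [contour_step(100 * (b - a))
            for a, b in zip(midi_notes, midi_notes[1:])]


contour = melody_contour([60, 62, 64, 64, 60, 67])
# contour == [1, 1, 0, -2, 2]
```

Two performances of the same tune in different keys produce the same contour, which is what makes such a compact form useful for similarity matching.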

35 High-level audio Description Tools (details)
• General Sound Recognition and Indexing Description Tools:
  • SoundModel Description Scheme
  • SoundClassificationModel Description Scheme
    • a set of SoundModel DSs -> multi-way classifier
  • SoundModelStatePath Descriptor
    • indices to states generated by a SoundModel of a segment
  • immediately applicable to sound effects
  • automatically index and segment sound tracks.
  • Low -> mid -> high level analyses

36 High-level audio Description Tools (details)
• Spoken Content Description Tools:
  • detailed description of words spoken within an audio stream.
  • indexing into and retrieval of an audio stream
  • indexing of multimedia objects annotated with speech.
• Recall of audio/video data by memorable spoken events.
  • a character or person spoke a particular word
• Spoken Document Retrieval
  • separate spoken documents
• Annotated Media Retrieval
  • photograph retrieved using a spoken annotation

37 Development
• Currently under development:
  • MPEG-7 Audio COR.1 (currently at DCOR1)
  • MPEG-7 Amendment 1 (currently at FPDAM1)
• New Audio Description Tools specified (MPEG-7 version 2):
  • Spoken Content
  • Audio Signal Quality
  • Audio Tempo
• Currently proposed tools:
  • Low Level Descriptor for Audio Intensity
  • Low Level Descriptor for Audio Spectrum Envelope Evolution
  • Generic mechanism for data representation based on ‘modulation decomposition’
  • MPEG-7 Audio-specific binary representation of descriptors

38 MPEG-7 version 1 Schedule
• Call for Proposals: October 1998
• Evaluation: February 1999
• First version of Working Draft (WD): December 1999
• Committee Draft (CD): October 2000
• Final Committee Draft (FCD): February 2001
• Final Draft International Standard (FDIS): July 2001
• International Standard (IS): September 2001

39 MPEG-7 work plan:
• See: Annex A of MPEG-7 Overview (version 9), 7/mpeg-7.htm

40 Annotated Link Page / References
• All pictures taken from:
  • P. Salembier and O. Avaro, “MPEG-7: Multimedia Content Description interface”,
