CS 430: Information Discovery
Lecture 22: Non-Textual Materials: Informedia

2. Course Administration
Discussion classes: attend! speak!

3. Informedia Digital Video Library
- Collections: segments of video programs, e.g., TV and radio news and documentary broadcasts from Cable News Network (CNN), the British Open University, and WQED television.
- Segmentation: video is automatically broken into short segments, such as the individual items in a news broadcast.
- Size: more than 1,500 hours (about 1 terabyte).
- Objective: research into automatic methods for organizing and retrieving information from video.
- Funding: NSF, DARPA, NASA, and others.
- Principal investigator: Howard Wactlar.

5. Surrogates
A video sequence is awkward for information discovery:
- Textual methods of information retrieval cannot be applied directly.
- Browsing requires the user to view the sequence, and fast skimming is difficult.
- Computing requirements are demanding (MPEG-1 requires about 1.2 Mbit/s).
Therefore, surrogates are required.
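
The bit rate on this slide can be checked against the collection size on slide 3. A quick calculation, assuming every hour is stored as constant-rate MPEG-1 at 1.2 Mbit/s (real material varies), gives about 540 MB per hour and about 0.81 TB for 1,500 hours, which is consistent with the figure of roughly 1 terabyte.

```python
# Back-of-the-envelope storage estimate for a video collection.
# Assumption: constant 1.2 Mbit/s MPEG-1 for all material.

BITS_PER_SECOND = 1.2e6   # MPEG-1 rate quoted on the slide
SECONDS_PER_HOUR = 3600


def storage_bytes(hours: float, bits_per_second: float = BITS_PER_SECOND) -> float:
    """Return the bytes needed to store `hours` of video at a constant bit rate."""
    bits = hours * SECONDS_PER_HOUR * bits_per_second
    return bits / 8  # 8 bits per byte


if __name__ == "__main__":
    print(f"{storage_bytes(1) / 1e6:.0f} MB per hour")      # 540 MB per hour
    print(f"{storage_bytes(1500) / 1e12:.2f} TB in total")  # 0.81 TB (decimal terabytes)
```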

6. Thumbnails, Filmstrips and Video Skims
- Thumbnail: a single image that illustrates the content of a video.
- Filmstrip: a sequence of thumbnails that illustrates the flow of a video segment.
- Video skim: a short video that summarizes the contents of a longer sequence by combining shorter excerpts of video and sound that give an overview of the full sequence.

7. Creating a Filmstrip
- Separate the video sequence into shots: use techniques from image recognition to identify dramatic changes in scene. Frames with similar color characteristics are assumed to be part of a single shot.
- Choose a sample frame: the default is to select the middle frame of the shot. If there is camera motion, select the frame where the motion ends.
- User feedback: frames are tied to the time sequence.
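
The two steps above can be sketched as follows. This is an illustration, not Informedia's detector: it assumes each frame has already been reduced to a normalized color histogram, and it uses a common shot-boundary rule (a large jump in histogram difference between consecutive frames). The threshold of 0.4 and the 30 fps default are hypothetical, and the camera-motion rule is not modeled.

```python
from typing import List, Sequence, Tuple

Histogram = Sequence[float]  # normalized color histogram (bins sum to 1)


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shots(frames: List[Histogram], threshold: float = 0.4) -> List[Tuple[int, int]]:
    """Split frames into shots. A new shot starts wherever consecutive frames
    differ in color distribution by more than `threshold` (hypothetical value).
    Returns (start, end) frame-index pairs with `end` exclusive."""
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        if histogram_distance(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def filmstrip(frames: List[Histogram], fps: float = 30.0,
              threshold: float = 0.4) -> List[Tuple[int, float]]:
    """Pick the middle frame of each shot (the slide's default rule) and pair
    it with its timestamp in seconds, tying the thumbnail to the time sequence."""
    strip = []
    for start, end in detect_shots(frames, threshold):
        middle = (start + end - 1) // 2
        strip.append((middle, middle / fps))
    return strip
```

For example, four all-red frames followed by three all-blue frames form two shots, and the filmstrip contains frames 1 and 5.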

8. Creating Video Skims
- Static: precomputed from video and audio phrases, with fixed compression, e.g., a one-minute skim of a 10-minute sequence.
- Dynamic: created after a query to emphasize the context of the hit, with variable compression selected by the user and adjustable during playback.
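
A static skim with fixed compression can be sketched as a budgeted selection. This assumes the candidate excerpts (video and audio phrases) and an importance score for each have already been computed; the scoring and the greedy rule are hypothetical choices for illustration, not the method described on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Excerpt:
    start: float   # seconds from the start of the full sequence
    end: float
    score: float   # hypothetical importance score from phrase analysis

    @property
    def duration(self) -> float:
        return self.end - self.start


def make_skim(excerpts: List[Excerpt], total_duration: float,
              compression: float) -> List[Excerpt]:
    """Greedily keep the highest-scoring excerpts that fit within
    total_duration / compression seconds, then return them in time order
    so the skim plays back in the original sequence."""
    budget = total_duration / compression
    chosen, used = [], 0.0
    for ex in sorted(excerpts, key=lambda e: e.score, reverse=True):
        if used + ex.duration <= budget:
            chosen.append(ex)
            used += ex.duration
    return sorted(chosen, key=lambda e: e.start)
```

With `total_duration=600` and `compression=10`, the skim is at most 60 seconds, matching the slide's one-minute skim of a 10-minute sequence. For a dynamic skim, one option (also an assumption) is to raise the scores of excerpts near query hits and let the user choose `compression`.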

9. Multi-Modal Information Discovery
The multi-modal approach to information retrieval uses computer programs to analyze video materials for clues, e.g., changes of scene:
- methods from artificial intelligence, e.g., speech recognition, natural language processing, image recognition;
- analysis of the video track, the sound track, closed captioning if present, and any other available information.
Each mode gives imperfect information, so use many approaches and combine the evidence.
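
One simple way to combine evidence is a weighted average of per-mode scores, skipping modes that are unavailable (e.g., no closed captions). The mode names and reliability weights below are hypothetical, and weighted averaging is just one fusion rule chosen for illustration; the slide does not say how Informedia combines evidence.

```python
from typing import Dict, Optional

# Hypothetical reliability weights per mode (not Informedia's values).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "speech": 0.5,    # speech-recognition transcript (noisy)
    "captions": 1.0,  # closed captions, when present
    "ocr": 0.7,       # text extracted from the screen
    "image": 0.3,     # visual cues such as scene changes
}


def combine_evidence(scores: Dict[str, Optional[float]],
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-mode scores in [0, 1]. Modes that are missing
    (None) or have no weight are skipped rather than counted as zero."""
    total, weight_sum = 0.0, 0.0
    for mode, score in scores.items():
        if score is None or mode not in weights:
            continue
        total += weights[mode] * score
        weight_sum += weights[mode]
    return total / weight_sum if weight_sum else 0.0
```

Skipping missing modes keeps a segment without closed captions from being penalized just because one source is absent.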

10. Informedia Library Creation
[Diagram: video, audio and text → speech recognition, image extraction, natural language interpretation → segmentation → segments with derived metadata]

11. Informedia: Information Discovery
[Diagram: the user browses via multimedia surrogates and queries via natural language against the segments with derived metadata, and receives the requested segments and metadata]
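
The slides do not say how Informedia ranks segments for a query. As a sketch, assume a TF-IDF vector-space ranking (the evaluation experiment below mentions a vector ranking) over the text derived for each segment, such as its transcript, captions and on-screen text. The segment IDs, the tokenizer and the smoothed IDF are hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def _norm(vec: Dict[str, float]) -> float:
    return math.sqrt(sum(w * w for w in vec.values()))


def rank_segments(query: str, segments: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank segment IDs by cosine similarity between TF-IDF vectors of the
    query and of each segment's derived text."""
    tfs = {sid: Counter(tokenize(text)) for sid, text in segments.items()}
    n = len(tfs)
    # Document frequency: number of segments containing each term.
    df = Counter(term for tf in tfs.values() for term in tf)
    # Smoothed IDF: a term found in every segment still gets weight 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def weigh(tf: Counter) -> Dict[str, float]:
        # Query terms that occur in no segment are ignored.
        return {t: c * idf[t] for t, c in tf.items() if t in idf}

    q = weigh(Counter(tokenize(query)))
    q_norm = _norm(q)
    scored = []
    for sid, tf in tfs.items():
        d = weigh(tf)
        d_norm = _norm(d)
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        scored.append((sid, dot / (q_norm * d_norm) if q_norm and d_norm else 0.0))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A spoken query would first pass through speech recognition and then be ranked the same way as a typed one.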

12. Text Extraction
Source:
- Sound track: automatic speech recognition using the Sphinx II recognition system (unrestricted vocabulary, speaker independent, multilingual, background sounds). Error rates of 25% and up.
- Closed captions: digitally encoded text (not available on all video, and often inaccurate).
- Text on screen: can be extracted by image recognition and optical character recognition (useful for matching a speaker with a name).
Query:
- Spoken query: automatic speech recognition using the same system that is used to index the sound track.
- Typed by the user.
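
Speech recognition error rates such as those quoted here are usually reported as word error rates. The slide does not define the metric, so the sketch below assumes the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed as a word-level Levenshtein (edit) distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds the previous row of the edit-distance table:
    # distances from a prefix of ref to every prefix of hyp.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Because insertions count as errors, the word error rate can exceed 100%.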

13. An Evaluation Experiment
- Test corpus: 602 news stories from CNN and other sources, average length 672 words, manually transcribed to obtain accurate text.
- Speech recognition of the sound track using Sphinx II gave a 50.7% error rate.
- Errors were introduced artificially to give error rates from 0% to 80%.
- Relative precision and recall (using a vector ranking) were used as measures of retrieval performance.
As the word error rate increased from 0% to 50%:
- relative precision fell from 80% to 65%;
- relative recall fell from 90% to 80%.
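
The slide does not say how the artificial errors were generated. A minimal sketch, assuming errors are independent word substitutions drawn from a vocabulary, replaces each word with a different word with probability `error_rate`, so the fraction of changed words is close to the requested rate. The vocabulary and seed are hypothetical, and real recognition errors also include deletions and insertions.

```python
import random
from typing import List, Sequence


def inject_word_errors(words: Sequence[str], error_rate: float,
                       vocabulary: Sequence[str], seed: int = 0) -> List[str]:
    """Replace each word, independently with probability `error_rate`,
    by a different word drawn from `vocabulary` (substitutions only)."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    noisy = []
    for w in words:
        if rng.random() < error_rate:
            choices = [v for v in vocabulary if v != w]
            noisy.append(rng.choice(choices) if choices else w)
        else:
            noisy.append(w)
    return noisy
```

Running the same queries against transcripts corrupted at 0%, 10%, ..., 80% would then show how retrieval performance degrades as the error rate grows.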

14. Multimodal Metadata Extraction
[Figure]

