© Copyright 2005 Michael Smith 1 AVA Media Copyright © 2001-2003 Digital Media Classification and Asset Management Michael Smith IS246 Spring 2007.

Presentation transcript:

1 AVA Media. Digital Media Classification and Asset Management. Michael Smith. IS246, Spring 2007

2 History
- 1990 – 1998: Technical Innovation; Digital Libraries and Automated Video Editing; Multimodal Content Analysis
- 1996 – 2002: Attempts at Commercialization; Corporate Spin Offs lead to Mergers
- 2000 – 2004: Broadband for Video; Internet Search and Enterprise Asset Management
- 2005 – : Mobile Media, Social Media and Personalization
- Image and Audio Features
- Camera and Object Motion
- Text, Face and Object Detection
- Video Summarization
- Hierarchical Rules for Video Summaries
- Combination of Text, Audio and Image Features

3 Goal: Automatic Video Characterization Scene Cuts Camera Objects Action Captions Scenery Yellowstone Static Adult Female Head Motion [Logo] Indoor Static Animal Left Motion Yellowstone Outdoor Zoom Two adults [Logo] Indoor

5 Static Filmstrip Abstraction

6 Active “Video Skim” Generation

7 Techniques Underlying Video Metadata
- Image processing
  - Detection of text overlaid on video
  - Detection of faces
  - Identification of camera and object motion
  - Breaking video into component shots
  - Detecting corpus-specific categories, e.g., anchorperson shots and weather map shots
- Speech recognition
  - Text extraction and alignment
- Natural language processing
  - Determining best text matches for a given query
  - Identifying places, organizations, people
  - Producing phrase summaries
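The slide lists "breaking video into component shots" without naming a method. A common baseline, assumed here and not taken from the presentation, is to compare intensity histograms of consecutive frames and flag a shot change when the difference exceeds a threshold. The sketch below uses toy frames (flat lists of 8-bit grayscale pixels); the function names, bin count, and threshold are illustrative assumptions.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Return a normalized intensity histogram for one grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame) or 1  # avoid division by zero on an empty frame
    return [count / total for count in counts]


def detect_shot_changes(frames: Sequence[Sequence[int]],
                        threshold: float = 0.5,
                        bins: int = 8) -> List[int]:
    """Return indices of frames that start a new shot.

    A frame starts a new shot when the L1 distance between its histogram
    and the previous frame's histogram exceeds `threshold` (assumed value).
    """
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            distance = sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts
```

For example, with `dark = [10] * 16` and `bright = [240] * 16`, calling `detect_shot_changes([dark, dark, bright, bright, dark])` flags frames 2 and 4. Histogram differencing handles hard cuts; gradual transitions such as fades and dissolves generally need a more elaborate test.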

8 Combined Technologies Integration
- Text Detection
- Camera Motion
- Face Detection
- Shot Changes
- Word Relevance
- Audio Level

9 Text Detection Text and Face Detection

10 “Name-It” Face/Name Association
- Video
- Transcript: “…said President Clinton. Al Gore presented his policies…. Gore stated…. In a gala affair, Clinton addressed….”
- Face/Name Association (Co-occurrence evaluation)
- Face Extraction
- Name Extraction
- Who is Gore? Clinton
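The slide names "co-occurrence evaluation" as the step that links extracted faces to extracted names. As a rough illustration only, the sketch below scores a face/name pair by counting how often the name is spoken within a few seconds of the face appearing. The data layout, the `window` value, and both function names are assumptions made for this example; the actual Name-It scoring is not described in the slide and is likely more elaborate.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def cooccurrence_scores(face_times: Dict[str, List[float]],
                        name_times: Dict[str, List[float]],
                        window: float = 5.0) -> Dict[Tuple[str, str], float]:
    """Count, per (face, name) pair, the appearances that fall within
    `window` seconds of a transcript mention of that name."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for face, appearances in face_times.items():
        for name, mentions in name_times.items():
            for face_time in appearances:
                for mention_time in mentions:
                    if abs(face_time - mention_time) <= window:
                        scores[(face, name)] += 1.0
    return dict(scores)


def best_name_for_face(face: str,
                       scores: Dict[Tuple[str, str], float]) -> Optional[str]:
    """Return the highest-scoring name for a face, or None if nothing co-occurs."""
    candidates = {name: score for (f, name), score in scores.items() if f == face}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

With hypothetical timestamps such as `face_times = {"face_A": [10.0, 42.0, 80.0]}` and `name_times = {"Clinton": [12.0, 79.0], "Gore": [44.0]}`, `face_A` co-occurs twice with "Clinton" and once with "Gore", so `best_name_for_face` returns "Clinton".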

11 Camera and Motion Detection: Pan Right; object motion (not pan left)

12 MPEG IBP Frames

13 MPEG and Editing
- Limitations in Frame and Cut Accuracy
- Most editors use raw or lossless compression
- Thomson to release JPEG 2000 based I-frame only compression
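One way to see the cut-accuracy limitation: P and B frames are coded as differences from other frames, so a cut made without re-encoding has to start on an I frame. The sketch below snaps a requested cut point back to the nearest preceding I frame. The 12-frame group-of-pictures pattern and the function names are assumptions for illustration, and the sketch ignores the difference between display order and coded order.

```python
from typing import List


def frame_types(num_frames: int, gop_pattern: str = "IBBPBBPBBPBB") -> List[str]:
    """Label each frame I, P, or B by repeating an assumed GOP pattern."""
    return [gop_pattern[i % len(gop_pattern)] for i in range(num_frames)]


def snap_cut_to_i_frame(requested: int, types: List[str]) -> int:
    """Return the index of the closest I frame at or before `requested`."""
    for index in range(requested, -1, -1):
        if types[index] == "I":
            return index
    raise ValueError("no I frame at or before the requested cut")
```

With this pattern, a cut requested at frame 17 lands on frame 12, five frames early. An I-frame-only format, such as the JPEG 2000 based compression mentioned on the slide, makes every frame a valid cut point.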

15 MPEG 7 Ontology
http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm#E11E42

16 Useful Video Format Links
http://www.ultimatewebdesigning.com/articles/formats.html
http://www.theasc.com/news/index.html
http://users.tkk.fi/~iisakkil/videoformats.html

17 Speech Recognition Functions
- Generates transcript to enable text-based retrieval from spoken language documents
- Improves text synchronization to audio/video in presence of scripts
- Provides speech interface to digital library
- Supplies necessary information for library segmentation and multimedia abstractions
- Modern systems rely more on single phoneme detection than double or triple phoneme pairs

18 Speech Recognition Accuracy
[Chart: Word Error Rate; labels: Commercial Benchmark Lab TV Studio Dialog Broadcast News]
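Word error rate, the metric on this chart, is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance; the function name and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)
```

For the reference "al gore presented his policies" and the recognizer output "al gore presents his policy today", there are two substitutions and one insertion against five reference words, giving a WER of 0.6. Because insertions count as errors, WER can exceed 1.0.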

19 Information Retrieval Recall vs. Speech Recognition Accuracy
[Chart: Relative Recall, % of Text IR (axis 40 to 100), plotted against Word Error Rate (axis 0 to 80)]

20 Early Lessons Learned
- Titles frequently used, should include length and production date
- Results and title placement affect usage
- Greater quantity of video was desired
- Storyboards (filmstrips) used infrequently

21 Yahoo Search Example
- Search bars placed at different locations for each Browser
- Placement of Search bar improves usage
- Centered Placement preferred over left or right placement in portal applications

22 Empirical Study: Skims
- DFL - “default” long skim
- DFS - default short skim
- NEW - selective skim
- RND - same audio as NEW but with unsynchronized video

23 Skim Study Results
- Subjects asked if image was in the video just seen
- Subjects asked if text summarizes info. that would be in full source video
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

24 Skim Study QUIS Results: wonderful, satisfying, stimulating; terrible, frustrating, dull

25 Skims: Preliminary Findings
- Real benefit for skims appears to be for comprehension rather than navigation
- For PBS documentaries, information in audio track is very important
- Empirical study conducted in September 1997 to determine advantages of skims over subsampled video, and synchronization requirements for audio and visuals

26 Adding Imagery to Visualizations
- Query-based thumbnail images added to timeline and map interfaces
- Extend concept of “highest scoring” to represent country, or a point in time

27 How Much Text, and Does Layout Matter? NoText; BriefByRow; Brief; AllByRow; All

28 Informedia Research Timeline

29 Applications
- Corporate Spin Offs
  - Virage -> Autonomy
  - Virage -> Pictron -> Yahoo?
  - Excalibur -> Convera (Enterprise Search)
  - ISLIP -> MediaSite -> Sonic Foundry
- Media Asset Management
  - Context Media
  - Semagix
  - Blue Order

30 Systems That Work
- Image Matching: Evolution Robotics; Neven Vision
- Internet Video: Google; Yahoo; Blinkx

31 ViPR™ Algorithm
- Database building
  - SIFT feature extraction
  - Add SIFT features to database
- Matching a new image
  - SIFT feature extraction
  - Feature pair-wise matching
  - Clustering by voting
  - Pose refinement
- Model Match
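The database-building and matching steps listed on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ViPR implementation: descriptors are short toy tuples standing in for real SIFT descriptors, pairwise matching uses a nearest-neighbor search with a distance-ratio test, "voting" simply counts accepted matches per model rather than clustering in pose space, and pose refinement is omitted. All function names and the `ratio` value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def build_database(models: Dict[str, Sequence[Descriptor]]) -> List[Tuple[str, Descriptor]]:
    """Flatten per-model descriptors into (model_name, descriptor) entries."""
    return [(name, descriptor)
            for name, descriptors in models.items()
            for descriptor in descriptors]


def match_image(query_descriptors: Sequence[Descriptor],
                database: List[Tuple[str, Descriptor]],
                ratio: float = 0.8) -> Counter:
    """Vote for database models using nearest-neighbor descriptor matches.

    A query descriptor votes for its nearest model only if the nearest
    distance is clearly smaller than the second-nearest (ratio test).
    """
    votes: Counter = Counter()
    for query in query_descriptors:
        ranked = sorted((math.dist(query, descriptor), name)
                        for name, descriptor in database)
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best_distance, best_name), (second_distance, _) = ranked[0], ranked[1]
        if best_distance < ratio * second_distance:
            votes[best_name] += 1
    return votes
```

Given a database with models "mug" and "book", a query whose descriptors lie close to the mug's descriptors accumulates votes for "mug", and `votes.most_common(1)` gives the best-matching model. Ambiguous descriptors that sit equally far from several database entries fail the ratio test and cast no vote.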

32 Semantic Music Correlation
- Predexis: Automated Statistical Features (pitch, frequency, etc.)
- Siren Systems: Pseudo Automated Features (Statistical and User Genre)
- Savage Beast: Manual Feature Set (300 – 400 features per Genre)

33 Semantic Video Correlation
- Commercial: Netflix, Amazon, Blockbuster, YouTube
- Social: YouTube, Cuts, StumbleVideo
- Research: Machine Learning on User and Commercial MetaData; Video Buzz Tracking and Usage; Visual Pattern Recognition?

34 Comparison of Video Buzz Sites (Copyright material removed; see Splashcast Blog)

35 What’s next
- Content
  - Media Remixing – Video Remixing
  - User-Generated Content
- Monetization
  - The Legal Aspects of New Media
  - Digital Advertising
- Emerging Technology
  - Mobility
  - High Def and Super High Def
  - Virtual Environments and Immersive Systems
  - Previsualization

36 2007 Advertising Market Projection
Source: expand - March 2007
- Online: $19 Billion
- Radio: $21 Billion
- Other: $43 Billion
- TV: $71 Billion
- Print: $102 Billion
- Direct Marketing: $478 Billion
- Total: $734 Billion

37 Emerging Technology: High Def and Super High Def and Photo Realism

38 Emerging Technology: Virtual Environments and Immersive Systems
- Previsualization
- Synthetic Humans

39 3D Visualization
- 3D Previsualization: Pixel Liberation Front, www.thefront.com
- 3D morphable model face animation: http://www.kyb.tuebingen.mpg.de/bu/people/volker/

40 Emerging Technology
- Sports
  - Ad insertion
  - Logging - http://www.dixonsports.com/images/liveevent/diagrams.html
- Healthcare
  - Patient Monitoring
  - Remote Diagnostics
- Security and Surveillance
- Forensics and DRM
  - Cameras as Sensors
  - Watermarking

41 Credits
Many Informedia Project and CMU research community members contributed to this work; a partial list appears here:
- Project Director: Howard Wactlar
- User Interface: Mike Christel, Chang Huang, Adrienne Warmack, Dave Winkler
- Image Processing: Takeo Kanade, Norm Papernick, Toshio Sato, Henry Schneiderman, Michael Smith
- Speech and Language Processing: Alex Hauptmann, Ricky Houghton, Rong Jin, Raj Reddy, Michael Witbrock
- Informedia Library Essentials: Bob Baron, Bruce Cardwell, Colleen Everett, Mark Hoy, Melissa Keaton, Bryan Maher, Craig Marcus

