Informedia Interface Evaluation and Information Visualization: Digital Video Library. December 10, 2002. Mike Christel.



© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

Outline
- Surrogates for the Informedia Digital Video Library
  - Abstractions for a single video document
  - Empirical studies on thumbnail images and skims
  - Quick overview of early HCI investigations
- Summaries across video documents (collages)
  - Demonstration of information visualization
  - Required advances in automated content extraction
- TREC Video Retrieval Track 2002
  - Overview of Carnegie Mellon participation and results
  - Multiple storyboard interface emphasizing imagery

Informedia Digital Video Library Project
- Initiated by the National Science Foundation, DARPA, and NASA under the Digital Libraries Initiative
- Continued funding via Digital Libraries Initiative Phase 2 (NSF, DARPA, National Library of Medicine, Library of Congress, NASA, National Endowment for the Humanities)
- New work and directions via NSF NSDL, ARDA VACE, the “Capturing, Coordinating, and Remembering Human Experience” (CCRHE) project, etc.

Techniques Underlying Video Metadata
- Image processing
  - Detection of text overlaid on video
  - Detection of faces
  - Identification of camera and object motion
  - Breaking video into component shots
  - Detecting corpus-specific categories, e.g., anchorperson shots and weather-map shots
- Speech recognition
  - Text extraction and alignment
- Natural language processing
  - Determining best text matches for a given query
  - Identifying places, organizations, people
  - Producing phrase summaries
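One image-processing step listed above, breaking video into component shots, can be sketched as a frame-to-frame histogram comparison. This is a minimal illustration, not Informedia's actual detector; the histogram values and the threshold are invented for the example:

```python
# Minimal shot-boundary sketch: declare a new shot wherever the color
# histogram changes sharply between consecutive frames. The threshold
# is an illustrative assumption, not a tuned value.

def histogram_diff(h1, h2):
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Return frame indices where a new shot is judged to begin."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        if histogram_diff(frame_histograms[i - 1], frame_histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Three near-identical frames, then an abrupt content change at index 3.
frames = [[0.9, 0.1], [0.88, 0.12], [0.9, 0.1], [0.1, 0.9]]
print(detect_shot_boundaries(frames))  # [3]
```

Real systems add heuristics for gradual transitions (fades, dissolves), which a single threshold misses.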

Text and Face Detection

Text Extraction and Alignment
[diagram: raw audio and raw video feed a text-extraction pipeline, yielding a time-aligned transcript with SILENCE and MUSIC markers, e.g., “electric cars are they are the jury every toy owner hopes to please”]

Deriving “Matching Shots”
[pipeline: shot detection → shot frame extraction; speech recognition and alignment (“These strange markings, preserved in the clay of a Texas riverbed, are footsteps… dinosaur graveyards… group of scientists… nature’s special effects”); align words to shots → matching shot for “dinosaur footprint”]
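The “align words to shots” step can be sketched as an interval lookup over word timestamps. The timings and shot boundaries below are invented for illustration; in Informedia the word times come from speech recognition output:

```python
# Sketch of word-to-shot alignment: each recognized word carries a
# timestamp, each shot an interval; a word belongs to the shot whose
# interval contains it. All data here is made up for the demo.

def align_words_to_shots(words, shots):
    """words: [(word, time)]; shots: [(start, end)] -> {shot_index: [words]}"""
    assignment = {i: [] for i in range(len(shots))}
    for word, t in words:
        for i, (start, end) in enumerate(shots):
            if start <= t < end:
                assignment[i].append(word)
                break
    return assignment

words = [("dinosaur", 1.2), ("footprint", 1.9), ("scientists", 4.0)]
shots = [(0.0, 3.0), (3.0, 6.0)]
print(align_words_to_shots(words, shots))
# {0: ['dinosaur', 'footprint'], 1: ['scientists']}
```

Once words are attached to shots, query matches per shot fall out directly, which is what the “matching shot” selection relies on.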

Initial User Testing of Video Library
- 104 hour library consisting of 3481 clips
- Average clip length of 1.8 minutes, consuming 15.7 megabytes of storage
- Automatic logs generated for usage of the Informedia Library by high school science teachers and students
- 243 hours logged (2473 queries, 2910 video clips played)

Early Lessons Learned
- Titles frequently used; should include length and production date
- Results and title placement affect usage
- Greater quantity of video was desired
- Storyboards (filmstrips) used infrequently

Empirical Study Into Thumbnail Images

Text-based Result List

“Naïve” Thumbnail List (Uses First Shot Image)

Query-based Thumbnail Result List

Query-based Thumbnail Selection Process
1. Decompose the video segment into shots.
2. Compute a representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use the frame from the highest scoring shot.
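The four-step process above can be sketched as follows. The shot data and the scoring rule (counting query-word hits in each shot's aligned transcript words) are simplified stand-ins, and the field names are hypothetical:

```python
# Sketch of query-based thumbnail selection: pick the representative
# frame of the shot whose transcript words best match the query.
# Shots, frames, and words are invented for this demo.

def query_based_thumbnail(shots, query_terms):
    """shots: [{'frame': str, 'words': [str]}] -> frame of the
    shot containing the most query-term occurrences."""
    def score(shot):
        return sum(shot["words"].count(t) for t in query_terms)
    best = max(shots, key=score)
    return best["frame"]

shots = [
    {"frame": "frame_000.jpg", "words": ["good", "morning"]},
    {"frame": "frame_042.jpg", "words": ["dinosaur", "footprint", "dinosaur"]},
    {"frame": "frame_090.jpg", "words": ["weather", "map"]},
]
print(query_based_thumbnail(shots, ["dinosaur", "footprint"]))  # frame_042.jpg
```

A “naïve” thumbnail list, by contrast, would always return the first shot's frame regardless of the query.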

Thumbnail Study Results

Empirical Study Summary*
- Significant performance improvements for the query-based thumbnail treatment over the other two treatments
- Subjective satisfaction significantly greater for the query-based thumbnail treatment
- Subjects could not identify differences between the thumbnail treatments, but their performance definitely showed differences!

*Christel, M., Winkler, D., and Taylor, C.R. Improving Access to a Digital Video Library. In Human-Computer Interaction: INTERACT ’97, Chapman & Hall, London, 1997.

Thumbnail View with Query Relevance Bar

Close-up of Thumbnail with Relevance Bar
- Relevance score in [0, 100]; this document has a score of 30
- Color-coded scoring words: “asylum” contributes some, “rights” a bit, “refugee” contributes 50%
- Query-based thumbnail
- Shortcut to storyboard

“Skim Video”: Extracting Significant Content
Original video (1100 frames) → skim video (78 frames)

Skims: Preliminary Findings
- The real benefit of skims appears to be for comprehension rather than navigation
- For PBS documentaries, information in the audio track is very important
- Empirical study conducted in September 1997 to determine the advantages of skims over subsampled video, and the synchronization requirements for audio and visuals

Empirical Study: Skims
- DFL: “default” long skim
- DFS: default short skim
- NEW: selective skim
- RND: same audio as NEW but with unsynchronized video
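The contrast between the default (subsampled) and selective study conditions can be illustrated with a toy sketch. The frames and the importance scores standing in for Informedia's significance analysis are invented:

```python
# Toy contrast between the two skim families in the study above:
# a "default" skim keeps every Nth frame blindly; a "selective" skim
# keeps the frames judged most important. Scores are made up here.

def default_skim(frames, keep_every):
    """Subsampled skim: keep every Nth frame regardless of content."""
    return frames[::keep_every]

def selective_skim(frames, importance, budget):
    """Selective skim: keep the `budget` most important frames, in order."""
    ranked = sorted(range(len(frames)), key=lambda i: importance[i], reverse=True)
    chosen = sorted(ranked[:budget])
    return [frames[i] for i in chosen]

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
importance = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
print(default_skim(frames, 3))                # ['f0', 'f3']
print(selective_skim(frames, importance, 2))  # ['f1', 'f3']
```

The RND condition corresponds to keeping the selective audio but pairing it with video chosen without regard to synchronization.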

Skim Study Results
- Subjects asked if an image was in the video just seen
- Subjects asked if text summarizes information that would be in the full source video

Skim Study QUIS (Questionnaire for User Interaction Satisfaction) Results
Rating scale from “wonderful, satisfying, stimulating” to “terrible, frustrating, dull”

Skim Study Results*
- 1996 “selective” skims performed no better than subsampled skims, but results from the 1997 study show significant differences, with “selective” skims more satisfactory to users:
  - audio is less choppy than in the earlier 1996 skim work
  - synchronization with video is better preserved
  - grain size has increased

*Christel, M., Smith, M., Taylor, C.R., and Winkler, D. Evolving Video Skims into Useful Multimedia Abstractions. In Proc. ACM CHI ’98 (Los Angeles, CA, April 1998), ACM Press.

Match Information

Using Match Information For Browsing

Using Match Info to Reduce Storyboard Size

Adding Value to Video Surrogates via Text
- Captions AND pictures better than either modality alone
  - Large, A., et al. Multimedia and Comprehension: The Relationship among Text, Animation, and Captions. J. American Society for Information Science 46(5) (June 1995).
  - Nugent, G.C. Deaf Students' Learning from Captioned Instruction: The Relationship between the Visual and Caption Display. J. Special Education 17(2) (1983).
- Video surrogates better with BOTH images and text
  - Ding, W., et al. Multimodal Surrogates for Video Browsing. In Proc. ACM Conf. on Digital Libraries (Berkeley, CA, Aug. 1999).
  - Christel, M. and Warmack, A. The Effect of Text in Storyboards for Video Navigation. In Proc. IEEE ICASSP (Salt Lake City, UT, May 2001), Vol. III.
- For news/documentaries, the audio narrative is important, but other video genres may be different
  - Li, F., Gupta, A., et al. Browsing Digital Video. In Proc. ACM CHI ’00 (The Hague, Netherlands, April 2000).

How Much Text, and Does Layout Matter?
Treatments: NoText, BriefByRow, Brief, AllByRow, All

Results from Christel/Warmack Study
[graph: mean task completion times in seconds, with 95% confidence intervals, for the treatments NoText, AllByRow, All, BriefByRow, Brief]

More Results from Storyboard/Text Study
- Mean ranking for treatments (1 = favorite, 5 = least favorite): NoText, AllByRow, All, BriefByRow, Brief
- AllByRow was favored, but had relatively poor performance (160 seconds for tasks)
- BriefByRow ranked 2nd by preference, and had the best performance (117 seconds for tasks)

Conclusions from Storyboard/Text Study
- Storyboard surrogates clearly improved with text
- Participants favored the interleaved presentation
- Navigation efficiency is best served with reduced interleaved text (BriefByRow)
- BriefByRow and All had the best task performance, but BriefByRow requires less display space
- If interleaving is done in conjunction with text reduction, to better preserve and represent the time association between lines of text, imagery, and their affiliated video sequence, then a storyboard with great utility for information assessment and navigation can be constructed

Discussed Multimedia Surrogates, i.e., Abstractions Based on Library Metadata
[chart plotting surrogates from static to temporal and by content detail/object size: text title, thumbnail image, storyboard, match bars, skim video]

Range of Multimedia Surrogates
[the same chart, extended with full text transcript, audio data, and storyboard with audio]

Evaluating Multimedia Surrogates
- Techniques discussed here:
  - transaction logs
  - formal empirical studies
- Other techniques used in interface refinement:
  - contextual inquiry
  - heuristic evaluation
  - cognitive walkthroughs
  - “think aloud” protocols

Extending to Surrogates ACROSS Video
- As digital video assets grow, so do result sets
- As automated processing techniques improve (e.g., speech and image processing), more metadata is generated with which to build interfaces into video
- Need an overview capability to deal with the greater volume
- Prior work offered many solutions:
  - Visualization By Example (VIBE) for matching entity relationships
  - Scatter plots for low-dimensionality relationships, e.g., timelines
  - Dynamic query sliders for direct manipulation of plots
  - Colored maps for geographic relationships

Enhancing Library Utility via Better Metadata
[diagram: metadata extractor → perspective templates → summarizer (people, event topics, affiliation, location, time) → user interface (final representation)]

Displaying Metadata in Effective “Collages”
Map collage (North and South Pacific Ocean region) emphasizing the distribution by nation of “El Niño effects,” with overlaid thumbnails

Zooming into “Collage” to Reveal Details

Example of “Chrono-Collage”
Timeline collage, March 1998 to May 1998, emphasizing “key player faces” and short event descriptors, representing the same data shown in the Indonesia map perspective: Suharto economic reform meetings, U.S. policy on Indonesia, Habibie new president, El Niño wildfires, student protests against Suharto

Named Entity Extraction
Example (key: place, time, organization/person): “CNN national correspondent John Holliman is at Hartsfield International Airport in Atlanta. Good morning, John. … But there was one situation here at Hartsfield where one airplane flying from Atlanta to Newark, New Jersey yesterday had a mechanical problem and it caused a backup that spread throughout the whole system because even though there were a lot of planes flying to the New York area from the Atlanta area yesterday, ….”

F. Kubala, R. Schwartz, R. Stone, and R. Weischedel, “Named Entity Extraction from Speech,” Proc. DARPA Workshop on Broadcast News Understanding Systems, Lansdowne, VA, February 1998.
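A toy gazetteer lookup can illustrate the kind of tagging shown in the example. The real system cited on the slide is statistical, so this sketch, its entity lists, and its labels are purely illustrative assumptions:

```python
# Toy named-entity tagging by gazetteer (list) lookup, mimicking the
# place / time / organization-person key on the slide. Entity lists
# are assumptions for the demo, not from any real system.

PLACES = {"Atlanta", "Newark", "New Jersey"}
ORGS_PEOPLE = {"CNN", "John Holliman"}
TIMES = {"yesterday", "morning"}

def tag_entities(text):
    """Return sorted (entity, label) pairs found by substring lookup."""
    found = []
    for label, names in (("PLACE", PLACES), ("ORG/PERSON", ORGS_PEOPLE), ("TIME", TIMES)):
        for name in names:
            if name in text:
                found.append((name, label))
    return sorted(found)

sample = "CNN national correspondent John Holliman is at the airport in Atlanta."
print(tag_entities(sample))
# [('Atlanta', 'PLACE'), ('CNN', 'ORG/PERSON'), ('John Holliman', 'ORG/PERSON')]
```

On errorful speech-recognition transcripts, statistical taggers matter precisely because fixed lists like these miss misrecognized or unseen names.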

Challenge: Integrating Imagery into Collages

Great Volume of Imagery Requires Filtering
- Video can be decomposed into shots
- Consider 2050 hours of CNN videos:
  - 1,688,000 shots
  - 67,700 segments/stories
  - 1 minute 53 seconds average story duration
  - 4.5 seconds average shot duration
  - 23 shots per segment on average
- Result sets for queries number in the hundreds or thousands
  - Against the 2001 CNN collection, the top 1000 stories for queries on “terrorism” and “bomb threat” each produced tens of thousands of shots
- User needs a way to filter down tens of thousands of images
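The collection figures above can be cross-checked against one another. This quick sketch uses only the slide's own numbers; the derived averages agree with the slide's rounded figures to within rounding:

```python
# Arithmetic cross-check of the CNN collection statistics quoted above.

hours = 2050
shots = 1_688_000
segments = 67_700

total_seconds = hours * 3600            # 7,380,000 s of video
avg_shot_seconds = total_seconds / shots
avg_story_seconds = total_seconds / segments
shots_per_segment = shots / segments

print(round(avg_shot_seconds, 1))   # ~4.4 s per shot (slide: 4.5 s)
print(round(avg_story_seconds))     # ~109 s per story (slide: 1 min 53 s)
print(round(shots_per_segment))     # ~25 shots per segment (slide: 23)
```

The small discrepancies are expected, since the slide's counts and averages were each rounded independently.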

Adding Imagery to Visualizations
- Query-based thumbnail images added to VIBE, timeline, and map summaries
- Layout differs: overlap in VIBE/timeline; tile in map
- Extend the concept of “highest scoring” to represent a country, a point in time, or a point on the VIBE plot

Leveraging from Our Prior Video Summarization Work
- Context (e.g., matching terms) and synchronization between imagery and narrative can reduce summary complexity
- Text with imagery is more useful in video summaries than either text alone or imagery alone
- “Overview first, zoom and filter, then details on demand”
  - the Visual Information-Seeking Mantra of Ben Shneiderman
- Direct manipulation interfaces leave the user in control
- Iterative prototyping reveals areas needing further work

Adding Text Overviews to Collages
- Transcript and other derived text, such as scene text and characters overlaid on broadcast video, provide input for further processing
- Named entity tagging and common-phrase extraction provide a filtering mechanism to reduce the text into defined subsets
- The visualization interface allows subsets (e.g., people, organizations, locations, and common phrases) to be displayed for the set of documents plotted in the visualization view

Example of Text-Augmented Timeline
Most frequent common phrases and people from a query on “anthrax” against 2001 news, listed beneath the timeline plot

Example of Text-Augmented VIBE Plot
Left pane: videos from 1/01 to 3/01 focus on refugees and asylum. Right pane: videos from 5/01 to 8/01 focus on human rights and stem cell research.

Refinement of Collages*
- Image addition to summaries improved over time:
  - Anchorperson removal for more representative visuals
  - Consume more space in the timeline with images via better layout
  - Image resizing under user control, to see detail on demand
- Text addition found to require new interface controls:
  - Selection controls, e.g., list people, organizations, locations, and/or common phrases
  - Stopping rules, e.g., list at most X terms, or list terms only if they are covered by Y documents or Z% of the document set
  - Show some text where the user’s attention is focused, by the mouse pointer, i.e., pop-up tooltip text

*Christel, M., et al. Collages as Dynamic Summaries for News Video. In Proc. ACM Multimedia ’02 (Juan-les-Pins, France, December 2002).
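The stopping rules above (list at most X terms; require coverage by at least Y documents) can be sketched directly. The thresholds and term counts below are illustrative assumptions:

```python
# Sketch of collage stopping rules: cap the displayed term list at
# max_terms and require each term to appear in at least min_docs
# documents. All values here are invented for the demo.

def terms_to_display(term_doc_counts, max_terms=3, min_docs=2):
    """term_doc_counts: {term: number of documents containing it}."""
    eligible = [(t, n) for t, n in term_doc_counts.items() if n >= min_docs]
    eligible.sort(key=lambda tn: (-tn[1], tn[0]))  # most-covered first
    return [t for t, _ in eligible[:max_terms]]

counts = {"anthrax": 40, "postal service": 12, "Tom Daschle": 9,
          "spores": 5, "one-off mention": 1}
print(terms_to_display(counts))  # ['anthrax', 'postal service', 'Tom Daschle']
```

The percentage variant (Z% of the document set) is the same filter with `min_docs` computed from the result-set size.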

NIST TREC Video Retrieval Track
- Definitive information at the NIST TREC Video Track web site
- TREC series sponsored by the National Institute of Standards and Technology (NIST) with additional support from other U.S. government agencies
  - Goal is to encourage research in information retrieval from large amounts of text by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results
- Video Retrieval Track started in 2001
  - Goal is investigation of content-based retrieval from digital video
  - Focus on the shot as the unit of information retrieval, rather than the scene or story/segment/clip

TREC-Video 2001 and 2002
- 2001 collection had ~11 hours of MPEG-1 video: 260 segments, 8000 shots, 80,000 I-frames
- 2002 search test collection had ~40 hours of MPEG-1 video: 1160 segments, 14,524 shots (given by TREC-V), 292,000 I-frames
- 2001 results:
  - Definite need to define the unit of information retrieval
  - Automatic search (no human in the loop) is difficult: about 1/3 of queries were unanswered by any of the automatic systems
  - Research groups submitting search runs were Carnegie Mellon, Dublin City Univ., Fudan Univ. China, IBM, Johns Hopkins Univ., Lowlands Group Netherlands, Univ. Maryland, Univ. North Texas
- 2002 results to be published after the TREC Conference in 11/02

TREC-Video 2001 Queries
- Specific item or person: the planet Jupiter, corn on the cob, Ron Vaughn, Harry Hertz, Lou Gossett Jr., R. Lynn Bonderant
- Specific fact: number of spikes on the Statue of Liberty’s crown
- Specific event or activity: liftoff of the Space Shuttle, Ronald Reagan reading a speech about the Space Shuttle
- Instances of a category: mountains as prominent scenery, scenes with a yellow boat, pink flowers
- Instances of events/activities: vehicle traveling on the moon, water skiing, speaker talking in front of the US flag, chopper landing

Carnegie Mellon TREC-Video 2001 Results*

Retrieval using                                     ARR       Recall
Speech Recognition Transcripts only                 1.84 %    13.2 %
Raw Video OCR only                                  5.21 %    6.10 %
Raw Video OCR + Speech Transcripts                  6.36 %    n/a
Enhanced VOCR with dictionary post-processing       5.93 %    7.52 %
Speech Transcripts + Enhanced Video OCR             7.07 %    n/a
Image Retrieval only using a probabilistic model    n/a       n/a
Image Retrieval + Speech Transcripts                n/a       n/a
Image Retrieval + Face Detection                    n/a       n/a
Image Retrieval + Raw VOCR                          n/a       n/a
Image Retrieval + Enhanced VOCR                     n/a       n/a

Average reciprocal rank (ARR) used as evaluation metric. *See for full report.
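The average reciprocal rank metric used above can be sketched in a few lines. This is an illustrative implementation of the standard reciprocal-rank definition, not code from the Informedia system or the TREC evaluation tools, and all function and variable names are made up:

```python
def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant result in a ranked list,
    or 0.0 if no relevant result appears at all."""
    for rank, shot_id in enumerate(ranked, start=1):
        if shot_id in relevant:
            return 1.0 / rank
    return 0.0

def average_reciprocal_rank(runs):
    """runs: one (ranked_list, relevant_set) pair per query.
    Averages the per-query reciprocal ranks."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

A query whose first relevant shot appears at rank 2 contributes 0.5; an unanswered query contributes 0, which is why systems that leave a third of the queries unanswered score so low on this metric.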

TREC-Video 2002 Queries
- Specific item or person: Eddie Rickenbacker, James Chandler, George Washington, Golden Gate Bridge, Price Tower in Bartlesville, OK
- Specific fact: Arch in Washington Square Park in NYC, map of continental US
- Instances of a category: football players, overhead views of cities, one or more women standing in long dresses
- Instances of events/activities: people spending leisure time at the beach, one or more musicians with audible music, crowd walking in an urban environment, locomotive approaching the viewer

TREC-Video 2002 Features for Auto-Detection
- Outdoors: recognizably outdoor location
- Indoors: recognizably indoor location
- Face: at least one human face with nose, mouth, and both eyes
- People: group of two or more humans
- Cityscape: recognizably city/urban/suburban setting
- Landscape: a predominantly natural inland setting, i.e., one with little or no evidence of development by humans
- Text Overlay: superimposed text large enough to be read
- Speech: human voice uttering recognizable words
- Instrumental Sound: sound produced by one or more musical instruments, including percussion instruments
- Monologue: an event in which a single person is at least partially visible and speaks for a long time without interruption by another speaker

New Interface Development for TREC-V 2002
- Multiple document storyboards
- Resolution and layout under user control
- Query context plays a key role in filtering image sets to manageable sizes
- TREC 2002 image feature set offers additional filtering capabilities for indoor, outdoor, faces, people, etc.
- Displaying filter count and distribution guides the user in manipulating the storyboard views

Multiple Document Storyboards

Resolution and Layout under User Control

Leveraging From Query Context
- User has already expressed an information need via the query
- Query-based thumbnail representation has proven summarization effectiveness*
- Therefore, use query-based scoring for shot selection to reduce thousands of shots to tens or hundreds
- Decompose video into shots, align query matches to shots, and use the highest-scoring shot to represent the video segment

*See the INTERACT '97 conference paper by Christel et al. for more details.
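The decompose/align/select step can be sketched as follows. The data model here is assumed for illustration (shots carrying their aligned transcript words, a simple term-match score); it is not the Informedia implementation, and all names are hypothetical:

```python
def score_shot(transcript_words, query_terms):
    """Score a shot by the number of query-term matches
    in its aligned speech transcript."""
    return sum(1 for word in transcript_words if word.lower() in query_terms)

def representative_shot(segment_shots, query_terms):
    """Pick the highest-scoring shot to stand in for the whole
    segment, e.g. as its thumbnail in a storyboard."""
    terms = {t.lower() for t in query_terms}
    return max(segment_shots, key=lambda s: score_shot(s["words"], terms))
```

The point of the design is that the thumbnail a user sees is chosen per query, so the same segment is represented by different shots depending on what the user asked for.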

TREC 2002 Image Feature Set

Filter Interface for Using Image Features

Example: Looking for Beach Shots, 863 Shots

Example: “Outdoor” Beach Shot Set at 469 Shots

Example: Beach Shot Set Reduced to a Manageable 56 Shots after Filtering Out Shots with No People
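The filtering sequence in this example (863 beach shots, to 469 "outdoor" shots, to 56 shots that also contain people) amounts to intersecting the query result with the auto-detected feature flags. A minimal sketch, assuming each shot carries a set of detected feature labels (the field names are illustrative, not the Informedia data model):

```python
def filter_shots(shots, require=(), exclude=()):
    """Keep shots that have every feature in `require`
    and none of the features in `exclude`."""
    return [s for s in shots
            if all(f in s["features"] for f in require)
            and not any(f in s["features"] for f in exclude)]
```

Chaining calls with progressively stricter `require` sets mirrors the interface's drill-down: each added feature filter shrinks the storyboard until it is small enough to inspect visually.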

Conclusions
- Multi-document storyboard view facilitates quick inspection of large sets of images
- First-order filtering by query is very useful in providing the user with an initial set of images for investigation
- Shots temporally near relevant shots were often relevant as well, so image ordering by video segment and time is useful
- Image features are useful filters, though specific to certain queries
- Drill-down to details, from images to video, is necessary to eliminate ambiguity
- These strategies hold promise for finding visual information in video corpora beyond the TREC 2002 collection

Credits
Many Informedia Project and CMU research community members contributed to this work; a partial list appears here:
- Project Director: Howard Wactlar
- User Interface: Mike Christel, Chang Huang, Adrienne Warmack, Dave Winkler
- Image Processing: Takeo Kanade, Norm Papernick, Toshio Sato, Henry Schneiderman, Michael Smith
- Speech and Language Processing: Alex Hauptmann, Ricky Houghton, Rong Jin, Raj Reddy, Michael Witbrock
- Informedia Library Essentials: Bob Baron, Bruce Cardwell, Colleen Everett, Mark Hoy, Melissa Keaton, Bryan Maher, Craig Marcus