Carnegie Mellon TRECVID 2004 Workshop – November 2004 Mike Christel, Jun Yang, Rong Yan, and Alex Hauptmann Carnegie Mellon University

Carnegie Mellon University Search – TRECVID 2004 Workshop, November 2004. Mike Christel, Jun Yang, Rong Yan, and Alex Hauptmann, Carnegie Mellon University

Carnegie Mellon Talk Outline
- CMU Informedia interactive search system features
- 2004 work: novice vs. expert users, and visual-only system (no audio processing, hence no automatic speech recognition [ASR] text and no closed-captioned text) vs. the full system that does use ASR and CC text
- Examination of results, especially visual-only vs. full system: questionnaires, transaction logs
- Automatic and manual search
- Conclusions

Carnegie Mellon Informedia Acknowledgments
- Supported by the Advanced Research and Development Activity (ARDA) under contract numbers NBCHC and H C-0406
- Contributions from many researchers – see the Informedia project website for more details

Carnegie Mellon CMU Interactive Search, TRECVID 2004
- Challenge from TRECVID 2003: how usable is the system without the benefit of ASR or closed-caption (CC) text?
- Focus in 2004 on "visual-only" vs. "full system"
- Maintain some runs for historical comparisons
- Six interactive search runs submitted:
  - Expert with full system (addressing all 24 topics)
  - Experts with visual-only system (6 experts, 4 topics each)
  - Novices, within-subjects design where each novice saw 2 topics in the "full system" and 2 in the "visual-only" system
    - 24 novice users (mostly CMU students) participated
    - Produced 2 "visual-only" runs and 2 "full system" runs

Carnegie Mellon Two Clarifications
- Type A, Type B, or Type C? Search runs were marked as Type C only because of the use of a face classifier by Henry Schneiderman that was trained with non-TRECVID data; that face classification was provided to the TRECVID community.
- Meaning of "expert" in our user studies: "expert" meant expertise with the Informedia retrieval system, NOT expertise with the TRECVID search test corpus. "Novice" meant the user had no prior experience with video search as exhibited by the Informedia retrieval system, nor any experience with Informedia in any role. ALL users (novice and expert) had no prior exposure to the search test corpus before the practice run for the opening topic (limited to 30 minutes or less) was conducted.

Carnegie Mellon Interface Support for Visual Browsing

Carnegie Mellon Interface Support for Image Query

Carnegie Mellon Interface Support for Text Query

Carnegie Mellon Interface Support to Filter Rich Visual Sets

Carnegie Mellon Characteristics of Empirical Study
- 24 novice users recruited via electronic bboard postings
- Independent work on 4 TRECVID topics, 15 minutes each
- Two treatments: F – full system; V – visual-only (no closed-captioned or automatic speech recognized text)
- Each user saw 2 topics in treatment "F" and 2 in treatment "V"
- With 24 topics for TRECVID 2004, this study produced four complete runs through the 24 topics: two in "F", two in "V"
- Intel Pentium 4 machine, 1600 x inch color monitor
- Performance results were remarkably close for the repeated runs:
  - MAP for the first run through treatment "F" vs. MAP for the second run through "F"
  - MAP for the first run through treatment "V" vs. MAP for the second run through "V"

Carnegie Mellon A Priori Hope for Visual-Only Benefits
Optimistically, we hoped that the visual-only system would produce better average precision on some "visual" topics than the full system, since the visual-only system would promote "visual" search strategies.

Carnegie Mellon Novice Users’ Performance

Carnegie Mellon Expert Users’ Performance

Carnegie Mellon Mean Avg. Precision, TRECVID 2004 Search
- 137 runs submitted (62 interactive, 52 manual, 23 automatic)
- [Chart: mean average precision of all submitted runs, grouped as interactive, manual, and automatic]
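The results on this and the following slides are reported as mean average precision (MAP), the standard TREC measure; as a reminder, the standard definition (not specific to these slides) is

$$\mathrm{AP}(q) = \frac{1}{|R_q|} \sum_{k=1}^{N} P_q(k)\,\mathrm{rel}_q(k), \qquad \mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q),$$

where $P_q(k)$ is the precision of topic $q$'s ranked shot list at rank $k$, $\mathrm{rel}_q(k)$ is 1 if the shot at rank $k$ is relevant and 0 otherwise, $|R_q|$ is the number of relevant shots for topic $q$, $N$ is the depth of the submitted result list, and $Q$ is the set of topics.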

Carnegie Mellon TRECVID04 Search, CMU Interactive Runs
- [Chart: interactive, manual, and automatic runs, with the CMU interactive runs highlighted: CMU Expert, Full System; CMU Novice, Full System; CMU Expert, Visual-Only; CMU Novice, Visual-Only]

Carnegie Mellon TRECVID04 Search, CMU Search Runs
- [Chart: interactive, manual, and automatic runs, with all CMU runs highlighted: CMU Expert, Full System; CMU Novice, Full System; CMU Expert, Visual-Only; CMU Novice, Visual-Only; CMU Manual; CMU Automatic]

Carnegie Mellon Satisfaction, Full System vs. Visual-Only
- 12 users were asked which system treatment was better: 4 liked the first system they used better, 4 liked the second, and 4 had no preference
- By treatment, 7 liked the full system better, 1 liked the visual-only system better, and the rest expressed no preference
- [Chart: questionnaire responses for "Easy to find shots?", "Enough time?", and "Satisfied with results?", comparing Full System and Visual-Only]

Carnegie Mellon Summary Statistics, User Interaction Logs (statistics reported as averages across the four conditions: Novice Full, Novice Visual, Expert Full, Expert Visual)
- Number of minutes spent per topic (fixed by study): 15
- Text queries issued per topic
- Word count per text query
- Number of video story segments returned by each text query
- Image queries per topic
- Precomputed feature sets (e.g., "roads") browsed per topic

Carnegie Mellon Breakdown, Origins of Submitted Shots
- [Chart: breakdown by condition – Expert Full, Expert Visual-Only, Novice Full, Novice Visual-Only]

Carnegie Mellon Breakdown, Origins of Correct Answer Shots
- [Chart: breakdown by condition – Expert Full, Expert Visual-Only, Novice Full, Novice Visual-Only]

Carnegie Mellon Manual and Automatic Search
- Use text retrieval to find the candidate shots
- Re-rank the candidate shots by linearly combining scores from multimodal features:
  - Image similarity (color, edge, texture)
  - Semantic detectors (anchor, commercial, weather, sports, ...)
  - Face detection / recognition
- Re-ranking weights trained by logistic regression
  - Query-Specific-Weight
    - Trained on development set (truth collected within 15 min)
    - Trained via pseudo-relevance feedback
  - Query-Type-Weight
    - 5 query types: Person, Specific Object, General Object, Sports, Other
    - Trained using sample queries for each type
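As a concrete illustration of the linear score combination described above, here is a minimal Python sketch; the feature names, the dictionary-based shot representation, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the actual Informedia code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative feature set; the real system combined text, image-similarity,
# semantic-detector, and face scores.
FEATURES = ["text_score", "image_similarity", "anchor", "face", "sports"]

def train_weights(dev_shots, dev_labels):
    """Fit per-feature combination weights on a development set.
    dev_shots: list of dicts mapping feature name -> score; dev_labels: 1 = relevant."""
    X = np.array([[s[f] for f in FEATURES] for s in dev_shots])
    model = LogisticRegression()
    model.fit(X, dev_labels)
    return model

def rerank(candidate_shots, model):
    """Re-rank text-retrieval candidates by the learned linear combination of scores."""
    X = np.array([[s[f] for f in FEATURES] for s in candidate_shots])
    scores = model.decision_function(X)   # linear combination w . x + b
    order = np.argsort(-scores)
    return [candidate_shots[i] for i in order]
```

The query-type weights amount to training one such model per query class (Person, Specific Object, General Object, Sports, Other) and selecting the model that matches the incoming query.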

Carnegie Mellon Text Only vs. Text & Multimodal Features
- Multimodal features are slightly helpful when the weights are trained by pseudo-relevance feedback
- Weights trained on the development set degrade the performance
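A hedged sketch of what training the weights by pseudo-relevance feedback can look like in this setting: the top-ranked text-retrieval results are treated as pseudo-positive examples and low-ranked results as pseudo-negative ones, so no manual truth is needed. The cutoffs and the shot representation below are assumptions for illustration, not CMU's actual parameters:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prf_train_weights(ranked_shots, feature_names, top_k=10, bottom_k=100):
    """Train combination weights without ground truth: pseudo-relevance feedback.
    ranked_shots: text-retrieval results in rank order, each a dict of feature scores."""
    pos = ranked_shots[:top_k]            # assume top text hits are relevant
    neg = ranked_shots[-bottom_k:]        # assume deep tail hits are not
    X = np.array([[s[f] for f in feature_names] for s in pos + neg])
    y = np.array([1] * len(pos) + [0] * len(neg))
    model = LogisticRegression()
    model.fit(X, y)
    return model
```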

Carnegie Mellon Development Set vs. Testing Set
- "Train-on-Testing" >> "Text only" > "Train-on-Development"
- Multimodal features are helpful if the weights are well trained
- Multimodal features with poorly trained weights hurt
- The gap stems from the difference in data distribution between the development and testing data

Carnegie Mellon Contribution of Non-Textual Features (Deletion Test)
- Anchor is the most useful non-textual feature
- Face detection and recognition are slightly helpful
- Overall, image examples are not useful
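The deletion test referenced in the slide title is an ablation: remove one feature at a time from the full combination and measure the change in MAP. A generic sketch of that procedure, where `evaluate_map` is a hypothetical stand-in for re-running retrieval and scoring against the relevance judgments:

```python
def deletion_test(features, evaluate_map):
    """Measure each feature's contribution by deleting it and re-scoring.

    features: list of feature names used by the full re-ranking model
    evaluate_map: callable taking a feature subset and returning MAP on the test topics
    """
    full_map = evaluate_map(features)
    contributions = {}
    for f in features:
        reduced = [g for g in features if g != f]
        contributions[f] = full_map - evaluate_map(reduced)  # positive = feature helps
    return contributions

# Example (hypothetical feature list):
# deletion_test(["text", "anchor", "face_detect", "face_recog", "hsv_color"], evaluate_map)
```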

Carnegie Mellon Contributions of Non-Textual Features (by Topic)
- Face recognition – overall helpful: ++ "Hussein", +++ "Donaldson"; - "Clinton", "Hyde", "Netanyahu"
- Face detection (binary) – overall helpful: + "golfer", "people moving stretcher", "handheld weapon"
- Anchor – overall and consistently helpful: + all person queries
- HSV color – slightly harmful: ++ "golfer", + "hockey rink", + "people with dogs"; -- "bicycle", "umbrella", "tennis", "Donaldson"

Carnegie Mellon Conclusions
- The relatively high retrieval performance of both experts and novices comes from relying on an intelligent user with excellent visual perception skills, which compensates for the comparatively low precision of automatic classification of video's visual content
- Visual-only interactive systems outperform full-featured manual or automatic systems
- ASR and CC text enable better interactive, manual, and automatic retrieval
- Anchor and face features improve manual/automatic search over text alone
- Novices will need additional interface scaffolding and support to try interfaces beyond traditional text search

Carnegie Mellon TRECVID 2004 Concept Classification
- Boat/ship: video of at least one boat, canoe, kayak, or ship of any type
- Madeleine Albright: video of Madeleine Albright
- Bill Clinton: video of Bill Clinton
- Train: video of one or more trains, or railroad cars which are part of a train
- Beach: video of a beach with the water and the shore visible
- Basket scored: video of a basketball passing down through the hoop and into the net to score a basket, as part of a game or not
- Airplane takeoff: video of an airplane taking off, moving away from the viewer
- People walking/running: video of more than one person walking or running
- Physical violence: video of violent interaction between people and/or objects
- Road: video of part of a road, any size, paved or not

Carnegie Mellon CAUTION: Changing MAP with users/topic
It is likely that a group's MAP can be trivially improved merely by adding more users per topic together with a simple selection strategy.
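To make the caution concrete, here is a small simulation sketch with hypothetical numbers (not drawn from the actual runs): if a group simply submits the best individual user's result for each topic, its MAP grows as more users are added, independent of any system improvement.

```python
import random

def group_map_with_selection(num_users, num_topics=24, seed=0):
    """Simulate per-topic average precision for each user, then take the best
    user per topic -- the 'simple selection strategy' the slide warns about."""
    rng = random.Random(seed)
    # Hypothetical: each user's AP on each topic drawn uniformly from [0, 0.6]
    ap = [[rng.uniform(0.0, 0.6) for _ in range(num_topics)] for _ in range(num_users)]
    best_per_topic = [max(ap[u][t] for u in range(num_users)) for t in range(num_topics)]
    return sum(best_per_topic) / num_topics

for n in (1, 2, 4, 8):
    print(n, round(group_map_with_selection(n), 3))  # reported MAP climbs with more users
```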

Carnegie Mellon Carnegie Mellon University Thank You

Carnegie Mellon TRECVID 2004 Search Topics (by type, generic vs. specific)
- Objects – Generic: buildings with flood waters, hockey net, umbrellas, wheelchairs; Specific: U.S. Capitol dome
- People – Generic: street scenes, people walking dogs, people moving stretcher, people going up/down steps, protest/march with signs; Specific: Henry Hyde, Saddam Hussein, Boris Yeltsin, Sam Donaldson, Benjamin Netanyahu, Bill Clinton with flag
- Events – Generic: fingers striking keyboard, golfer making shot, handheld weapon firing, moving bicycles, tennis player hitting ball, horses in motion
- Scenes – Generic: buildings on fire

Carnegie Mellon TRECVID 2004 Example Images for Topics

Carnegie Mellon Evaluation – TRECVID Search Categories
- INTERACTIVE: TOPIC -> HUMAN -> QUERY -> SYSTEM -> RESULT. The human (re)formulates the query based on the topic, the query, and/or the results; the system takes the query as input and produces results without further human intervention on this invocation.
- MANUAL: TOPIC -> HUMAN -> QUERY -> SYSTEM -> RESULT. The human formulates the query based on the topic and the query interface, not on knowledge of the collection or search results; the system takes the query as input and produces results without further human intervention.
- AUTOMATIC: TOPIC -> QUERY -> SYSTEM -> RESULT. The system directly evaluates the query and produces results without human intervention.

Carnegie Mellon TRECVID 2004 Top Interactive Search Runs