Published by Elaine Anna Robertson. Modified over 8 years ago.
1
Carnegie Mellon
Maximizing the Synergy between Man and Machine
Alex Hauptmann
School of Computer Science, Carnegie Mellon University
Exploiting Human Abilities in Video Retrieval Interfaces
2
Background
Automatic video analysis for detection/recognition is still quite poor
  Consider baseline (random guessing)
  Improvement is limited
  Consider near-duplicates (trivial similarity)
  Does not generalize well over video sources
  Better than nothing
Need humans to make up for this shortcoming!
3
Differences to VIRAT
Most interface work done on broadcast TV
Harder:
  Unconstrained subject matter
  Graphics, animations, photos
  Broadcasters
  Many different, short shots out of context
Easier:
  Better resolution
  Conventions in editing, structure
  Audio track
Keyframes are typical unit of analysis
4
“Classic” Interface Work
Interactive Video Queries
  Fielded text query matching capabilities
  Fast image matching with simplified interface for launching image queries
Interactive Browsing, Filtering, and Summarizing
  Browsing by visual concepts
  Quick display of contents and context in synchronized views
5
Informedia Client Interface Example
6
Interface options (Christel08)
7
Suggesting Related Concepts (Zavesky07)
9
Fork Browser (Snoek09)
10
“Classic” Video Interface Results
Concept browsing and image search used frequently
Novices still have lower performance than experts
Some topics reduce “interactivity” to a one-shot query with little browsing/exploration
Classic Informedia interface, including concept browsing, is often good enough that the user never proceeds to any additional text or image query
“Classic Informedia” scored highest of the systems tested with novice users in TRECVID evaluations
11
Visual Browsing
12
Augmented Video Retrieval
The computer observes the user and LEARNS, based on what is marked as relevant
The system can learn:
  What image characteristics are relevant
  What text characteristics (words) are relevant
  What combination weights should be used
We exploit the human’s ability to quickly mark relevant video and the computer’s ability to learn from given examples
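One way to picture "learning combination weights from what is marked as relevant" is a small online update over the shots the user has seen. The sketch below is an assumption for illustration only: it uses plain logistic regression, which the slides do not name, and the function `update_weights`, the source names, and the score values are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def update_weights(
    weights: Dict[str, float],
    feedback: List[Tuple[Features, int]],
    learning_rate: float = 0.5,
    epochs: int = 50,
) -> Dict[str, float]:
    """Fit combination weights to user feedback with logistic regression.

    feedback holds (per-source scores, label) pairs, where label is 1 for a
    shot the user marked relevant and 0 for a shot seen but not marked.
    This is an illustrative stand-in, not the system's actual learner.
    """
    w = dict(weights)  # copy so the caller's weights are left untouched
    for _ in range(epochs):
        for features, label in feedback:
            z = sum(w.get(k, 0.0) * v for k, v in features.items())
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = label - predicted
            for k, v in features.items():
                w[k] = w.get(k, 0.0) + learning_rate * error * v
    return w


# Hypothetical feedback: shots with a high "ocean" score were marked relevant.
initial = {"text": 0.5}
feedback = [
    ({"text": 0.2, "ocean": 0.9}, 1),
    ({"text": 0.3, "ocean": 0.1}, 0),
    ({"text": 0.1, "ocean": 0.8}, 1),
    ({"text": 0.4, "ocean": 0.0}, 0),
]
learned = update_weights(initial, feedback)
```

In this toy run the previously unweighted "ocean" source ends up with a positive weight, which is the kind of adjustment the slide attributes to the computer's side of the partnership.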
13
Combining Concept Detectors for Retrieval
Combination of diverse knowledge sources
[Figure. Labels: multimodal question (Text: "Find Pope John Paul", Audio, Image, Motion); Video library; Knowledge Source/API: Closed Caption, Color Feature, Motion Feature, Audio Feature, Face, Building, … (3k); Output 1, Output 2, …, Output n; Final ranked list for interface]
14
Why Relevance Feedback?
Limited training data
Untrained sources are useful for some specific searches
Query: finding some boats/ships
  Q-Type: general object
  Txt: 0.5, Img: 0.3, Face: -0.5 (Learned from training set)
  Outdoor: ?, Ocean: ? (Unable to be learned)
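The boats/ships example can be read as a weighted sum of per-source scores. The sketch below assumes that reading; the weights 0.5, 0.3, and -0.5 come from the slide, while `fuse_scores`, the shot IDs, and the detector scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Weights learned for the "general object" query type (values from the slide).
LEARNED_WEIGHTS: Dict[str, float] = {"text": 0.5, "image": 0.3, "face": -0.5}


def fuse_scores(
    shot_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank shots by a weighted sum of their per-source scores.

    Sources without a weight (e.g. "outdoor", "ocean") contribute nothing,
    which is the gap that relevance feedback is meant to fill.
    """
    fused = {
        shot: sum(weights.get(src, 0.0) * s for src, s in scores.items())
        for shot, scores in shot_scores.items()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical detector outputs for three shots.
example = {
    "shot_a": {"text": 0.9, "image": 0.2, "face": 0.0, "ocean": 0.8},
    "shot_b": {"text": 0.4, "image": 0.6, "face": 0.9, "ocean": 0.1},
    "shot_c": {"text": 0.1, "image": 0.9, "face": 0.0, "ocean": 0.9},
}
ranking = fuse_scores(example, LEARNED_WEIGHTS)
```

Note that the "ocean" scores are ignored here even though they look useful for this query; no weight for them could be learned from the training set.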
15
Probabilistic Local Context Analysis (pLCA) (Yan07)
Goal: Refine results of the current query
Method: assume the combination parameters of “un-learned” sources υ to be latent variables and compute P(y_j | a_j, D_j)
Discover useful search knowledge based on initial results A_i
[Figure. Labels: Query; Initial Search Result; Video 1, Video 2, …, Video M; A_1, A_2, …, A_m; Y_1, Y_2, …, Y_m; υ_1: ?, υ_2: ?, …, υ_N: ?]
16
Undirected Model and Parameter Estimation
Compute the posterior probability of document relevance Y given initial results A based on an undirected graphical model
Variational inference, i.e., iterate until convergence and approximate P(y_j | a_j) by q(y_j)
  Maximize w.r.t. variational parameters of Y
  Maximize w.r.t. variational parameters of υ
[Figure: undirected graphical model. Labels: D_1, Y_1, A_1; D_2, Y_2, A_2; …; D_m, Y_m, A_m; υ]
17
Automatic vs Interactive Search
18
Extreme Video Retrieval
Automatic retrieval baseline for ranking order
Two methods of presentation:
  System-controlled Presentation: Rapid Serial Visual Presentation (RSVP)
  User-controlled Presentation: Manual Browsing with Resizing of Pages
19
System-controlled Presentation
Rapid Serial Visual Presentation (RSVP)
  Minimizes eye movements
    All images in same location
  Maximizes information transfer: System → Human
    Up to 10 key images/second
    1 or 2 images per page
  Presentation intervals are dynamically adjustable
  Click when relevant shot is seen
    Mark previous page also as relevant
  A final verification step is necessary
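The rule "click when a relevant shot is seen, and mark the previous page as well" can be sketched as follows. This is a minimal illustration that assumes a fixed presentation interval (the slide says intervals are adjustable) and click times measured from the start of the presentation; `paginate` and `candidates_from_clicks` are hypothetical names.

```python
from typing import List, Set


def paginate(shots: List[str], per_page: int) -> List[List[str]]:
    """Split the ranked shot list into pages of per_page key images."""
    return [shots[i:i + per_page] for i in range(0, len(shots), per_page)]


def candidates_from_clicks(
    pages: List[List[str]],
    click_times: List[float],
    interval: float,
) -> Set[str]:
    """Collect shots to verify: the page on screen at each click plus the
    page before it, since the reaction may lag the image that caused it."""
    marked: Set[str] = set()
    for t in click_times:
        page_index = int(t // interval)
        if page_index >= len(pages):
            continue  # click arrived after the last page was shown
        for idx in (page_index - 1, page_index):
            if idx >= 0:
                marked.update(pages[idx])
    return marked


pages = paginate([f"shot_{i}" for i in range(10)], per_page=2)
# Two images per page at 0.2 s per page gives 10 key images/second.
to_verify = candidates_from_clicks(pages, click_times=[0.45, 0.05], interval=0.2)
```

Marking the previous page inflates the candidate set, which is why the slide calls for a final verification step afterwards.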
20
User-controlled presentations
Manual Browsing with Resizing of Pages
  Manually page through images
  User decides to view next page
  Vary the number of images on a page
  Allow chording on the keypad to mark shots
  A very brief final verification step
21
MBRP - Manual Browsing with Resizable Pages
22
Extreme QA with RSVP
3x3 display
1 page/second
Numpad chording to select shots
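Numpad chording on a 3x3 page can be pictured as a direct mapping from keypad position to grid cell. The sketch below assumes the standard keypad layout (7-8-9 on the top row) and a row-major page; that mapping, and the name `shots_for_chord`, are assumptions not stated on the slide.

```python
from typing import Iterable, List

# Standard numeric keypad layout, top row first, matching a 3x3 image grid.
NUMPAD_ROWS = ["789", "456", "123"]
KEY_TO_CELL = {
    key: (row, col)
    for row, keys in enumerate(NUMPAD_ROWS)
    for col, key in enumerate(keys)
}


def shots_for_chord(page: List[str], chord: Iterable[str]) -> List[str]:
    """Return the shots on a 3x3 page selected by keys pressed together."""
    selected = []
    for key in sorted(set(chord), key=lambda k: KEY_TO_CELL[k]):
        row, col = KEY_TO_CELL[key]
        index = row * 3 + col
        if index < len(page):  # the last page may be only partly filled
            selected.append(page[index])
    return selected


page = [f"shot_{i}" for i in range(9)]  # row-major order, top-left first
picked = shots_for_chord(page, chord={"7", "5", "3"})
```

Pressing 7, 5, and 3 together selects the diagonal of the page in a single gesture, which is what makes one page per second workable.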
23
Mindreading – an EEG interface
Learn Relevant/Non-Relevant
5 EEG probes
Simple features
Too slow
  250 ms/image
  Significant recovery time after hit
Relevance feedback
24
Summarizing Video: Beyond Keyframes
BBC Rushes
  Unedited video for TV series production
Summarize as video in 1/50th of total
  Note the non-scalable target factor
Lots of smart analysis
  Clustering, salience, redundancy, importance
Best performance for retrieval was to play every frame
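To make the 1/50th target concrete, the sketch below picks evenly spaced frames so that the summary lasts about one fiftieth of the source at the original frame rate. It illustrates a plain uniform speed-up only, not any of the clustering or salience analysis the slide mentions; `uniform_summary` and the clip length are hypothetical.

```python
from typing import List


def uniform_summary(num_frames: int, target_fraction: float = 1 / 50) -> List[int]:
    """Pick evenly spaced frame indices so the summary lasts about
    target_fraction of the source when played at the original frame rate."""
    if num_frames <= 0:
        return []
    budget = max(1, round(num_frames * target_fraction))
    step = num_frames / budget
    return [int(i * step) for i in range(budget)]


# A hypothetical 30-minute rush at 25 frames/second.
frames = uniform_summary(30 * 60 * 25)
```

For this clip the budget is 900 frames, i.e. 36 seconds of summary for 30 minutes of source, regardless of how much of the source is actually interesting.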
25
Speed-up summarization results
26
Surveillance Event Detection
Interesting stuff is rare
Detection accuracy is limited
Monitor many streams
27
Surveillance Event Detection
Need action not key frame
Difficult for humans
Combine speed-up with automatic analysis
Slow down when interesting stuff happens
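"Combine speed-up with automatic analysis" can be sketched as choosing a playback speed per segment from a detector's score. Everything numeric here is an assumption for illustration: the threshold, the 16x and 1x speeds, the 8-second segments, and the scores themselves; the function names are hypothetical.

```python
from typing import List


def playback_speeds(
    interest: List[float],
    threshold: float = 0.5,
    fast: float = 16.0,
    slow: float = 1.0,
) -> List[float]:
    """Choose a playback speed per segment from a detector's interest score.

    Segments at or above threshold play at `slow`; the rest play at `fast`.
    """
    return [slow if score >= threshold else fast for score in interest]


def viewing_time(segment_seconds: float, speeds: List[float]) -> float:
    """Total time a human needs to watch all segments at the chosen speeds."""
    return sum(segment_seconds / s for s in speeds)


# Hypothetical detector scores for six 8-second segments of one stream.
scores = [0.1, 0.05, 0.7, 0.9, 0.2, 0.1]
speeds = playback_speeds(scores)
total = viewing_time(8.0, speeds)
```

With these numbers, 48 seconds of footage takes 18 seconds to review, and the viewer still sees the two flagged segments as continuous action at normal speed.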
28
Summary
Interfaces have much to contribute in retrieval
We don’t know what is best
  Task-specific
  User-specific
  System-dependent
Collaborative search
  Combining “best of current systems”
Simpler is usually better (Occam’s razor)
General principles are difficult to find
29
Questions?