Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities in Video Retrieval Interfaces

Carnegie Mellon Background: Automatic video analysis for detection/recognition is still quite poor. Consider baseline (random guessing). Improvement is limited. Consider near-duplicates (trivial similarity). Does not generalize well over video sources. Better than nothing. Need humans to make up for this shortcoming!

Carnegie Mellon Differences to VIRAT: Most interface work done on broadcast TV. Harder: unconstrained subject matter; graphics, animations, photos; broadcasters; many different, short shots out of context. Easier: better resolution; conventions in editing, structure; audio track. Keyframes are typical unit of analysis.

Carnegie Mellon “Classic” Interface Work. Interactive Video Queries: fielded text query matching capabilities; fast image matching with simplified interface for launching image queries. Interactive Browsing, Filtering, and Summarizing: browsing by visual concepts; quick display of contents and context in synchronized views.

Carnegie Mellon Informedia Client Interface Example

Carnegie Mellon Interface options Christel08

Carnegie Mellon Suggesting Related Concepts Zavesky07

Carnegie Mellon Fork Browser Snoek09

Carnegie Mellon “Classic” Video Interface Results: Concept browsing and image search used frequently. Novices still have lower performance than experts. Some topics cause “interactivity” to be one-shot query with little browsing/exploration. Classic Informedia interface including concept browsing often good enough that user never proceeds to any additional text or image query. “Classic Informedia” scored highest of those testing with novice users in TRECVID evaluations.

Carnegie Mellon Visual Browsing

Carnegie Mellon Augmented Video Retrieval: The computer observes the user and LEARNS, based on what is marked as relevant. The system can learn: what image characteristics are relevant; what text characteristics (words) are relevant; what combination weights should be used. We exploit the human’s ability to quickly mark relevant video and the computer’s ability to learn from given examples.
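One way to picture learning combination weights from marked shots is a small logistic-regression fit. This is a minimal sketch, not the Informedia implementation: the two-feature layout (text score, image score), the sample values, and the function name `learn_weights` are all assumptions made for illustration.

```python
import math


def learn_weights(features, labels, epochs=200, lr=0.5):
    """Fit logistic-regression weights by batch gradient ascent.

    features: list of per-shot score vectors (hypothetical layout).
    labels:   1 if the user marked the shot relevant, else 0.
    """
    n = len(features[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted relevance
            for i in range(n):
                grad[i] += (y - p) * x[i]
        # Step along the average gradient of the log-likelihood.
        w = [wi + lr * g / len(features) for wi, g in zip(w, grad)]
    return w


# Hypothetical per-shot scores: [text_score, image_score].
shots = [[0.9, 0.2], [0.8, 0.1], [0.2, 0.9], [0.1, 0.8]]
marked = [1, 1, 0, 0]  # the user marked the first two shots as relevant

weights = learn_weights(shots, marked)
print(weights)
```

In this toy data the relevant shots have high text scores, so the learned text weight comes out positive and the image weight negative.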

Carnegie Mellon Combining Concept Detectors for Retrieval. [Diagram labels: multimodal question (Text: "Find Pope John Paul", Audio, Image, Motion); Video library; Knowledge Source/API: Closed Caption, Color Feature, Motion Feature, Audio Feature, Face, Building, … (3k); Output 1, Output 2, …, Output n; combination of diverse knowledge sources; final ranked list for interface.]

Carnegie Mellon Why Relevance Feedback? Limited training data. Untrained sources are useful for some specific searches. Query: finding some boats/ships. Q-Type: general object. Txt: 0.5, Img: 0.3, Face: -0.5 (learned from training set); Outdoor: ?, Ocean: ? (unable to be learned).
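The boats/ships example can be made concrete with a weighted sum over source scores. The sketch below assumes a simple linear fusion and uses the three weights from the slide; the shot names, the per-source scores, and the helper `fuse` are hypothetical. Sources with no learned weight (here "ocean") contribute nothing, which is the gap relevance feedback is meant to close.

```python
# Weights from the slide for the "general object" query type.
LEARNED = {"text": 0.5, "image": 0.3, "face": -0.5}


def fuse(scores, weights):
    """Linear fusion; sources without a learned weight are ignored."""
    return sum(weights.get(name, 0.0) * s for name, s in scores.items())


# Hypothetical detector outputs for two shots.
shots = {
    "shot_a": {"text": 0.8, "image": 0.6, "face": 0.1, "ocean": 0.9},
    "shot_b": {"text": 0.8, "image": 0.6, "face": 0.9, "ocean": 0.0},
}

ranked = sorted(shots, key=lambda s: fuse(shots[s], LEARNED), reverse=True)
print(ranked)
```

Here shot_a outranks shot_b only because of the negative face weight; its strong "ocean" score, which is arguably the most useful cue for this query, plays no part.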

Carnegie Mellon Probabilistic Local Context Analysis (pLCA) Yan07. Goal: refine results of the current query. Method: assume the combination parameters of “un-learned” sources $\upsilon$ to be latent variables and compute $P(y_j \mid a_j, D_j)$. Discover useful search knowledge based on initial results $A_i$. [Diagram: Query; Initial Search Result (Video 1, Video 2, …, Video M) with nodes $A_1, A_2, \dots, A_m$ and $Y_1, Y_2, \dots, Y_m$; unknown parameters $\upsilon_1$: ?, $\upsilon_2$: ?, …, $\upsilon_N$: ?]

Carnegie Mellon Undirected Model and Parameter Estimation. Compute the posterior probability of document relevance $Y$ given initial results $A$ based on an undirected graphical model. Variational inference, i.e., iterate until convergence and approximate $P(y_j \mid a_j)$ by $q_{y_j}$: maximize w.r.t. the variational parameters of $Y$; maximize w.r.t. the variational parameters of $\upsilon$. [Diagram: nodes $D_1, Y_1, A_1$; $D_2, Y_2, A_2$; …; $D_m, Y_m, A_m$; and $\upsilon$]

Carnegie Mellon Automatic vs Interactive Search

Carnegie Mellon Extreme Video Retrieval: Automatic retrieval baseline for ranking order. Two methods of presentation: (1) System-controlled Presentation: Rapid Serial Visual Presentation (RSVP); (2) User-controlled Presentation: Manual Browsing with Resizing of Pages.

Carnegie Mellon System-controlled Presentation: Rapid Serial Visual Presentation (RSVP). Minimizes eye movements: all images in same location. Maximizes information transfer, System → Human: up to 10 key images/second; 1 or 2 images per page. Presentation intervals are dynamically adjustable. Click when relevant shot is seen. Mark previous page also as relevant. A final verification step is necessary.
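The click handling in RSVP can be sketched as follows. This is an illustrative guess at the bookkeeping, not the system's code: the function names, the fixed page interval, and the millisecond timestamps are assumptions. Including the previous page presumably compensates for the user's reaction time, and the resulting candidates would still go through the final verification step.

```python
def paginate(shots, per_page):
    """Split the ranked shot list into pages of 1 or 2 images."""
    return [shots[i:i + per_page] for i in range(0, len(shots), per_page)]


def page_shown_at(click_ms, interval_ms):
    """Index of the page on screen at the time of the click."""
    return click_ms // interval_ms


def candidates_for_click(pages, clicked_page):
    """Shots on the clicked page plus the page shown just before it."""
    start = max(0, clicked_page - 1)
    return [s for page in pages[start:clicked_page + 1] for s in page]


ranked = [f"s{i}" for i in range(10)]   # hypothetical ranked shot ids
pages = paginate(ranked, per_page=2)    # 2 images per page
clicked = page_shown_at(click_ms=500, interval_ms=200)
candidates = candidates_for_click(pages, clicked)
print(clicked, candidates)
```

With 2 images per page and a 200 ms interval the display runs at the slide's upper bound of 10 key images per second.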

Carnegie Mellon User-controlled Presentations: Manual Browsing with Resizing of Pages. Manually page through images. User decides to view next page. Vary the number of images on a page. Allow chording on the keypad to mark shots. A very brief final verification step.

Carnegie Mellon MBRP - Manual Browsing with Resizable Pages

Carnegie Mellon Extreme QA with RSVP: 3x3 display; 1 page/second; numpad chording to select shots.
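Numpad chording on a 3x3 page could work roughly as below. The mapping is an assumption: it takes the standard numeric-keypad layout (7 8 9 on the top row, 1 2 3 on the bottom) and aligns it spatially with the grid, with shots stored in row-major order. The function names are hypothetical.

```python
def numpad_to_index(key):
    """Map a numpad digit (1-9) to a row-major index in a 3x3 grid."""
    if not 1 <= key <= 9:
        raise ValueError("expected a numpad digit from 1 to 9")
    row = 2 - (key - 1) // 3   # keys 7-9 are the top row (row 0)
    col = (key - 1) % 3
    return row * 3 + col


def shots_for_chord(page, chord):
    """Return the shots selected by a set of keys pressed together."""
    return [page[i] for i in sorted(numpad_to_index(k) for k in chord)]


page = [f"shot{i}" for i in range(9)]        # one 3x3 page, row-major
selected = shots_for_chord(page, {7, 5, 3})  # keys on the main diagonal
print(selected)
```

Pressing several keys at once lets the user mark multiple shots on a page in a single gesture, which matters at one page per second.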

Carnegie Mellon Mindreading, an EEG interface: Learn Relevant/Non-Relevant. 5 EEG probes. Simple features. Too slow: 250 ms/image; significant recovery time after hit. Relevance feedback.

Carnegie Mellon Summarizing Video: Beyond Keyframes. BBC Rushes: unedited video for TV series production. Summarize as video in 1/50th of total (note the non-scalable target factor). Lots of smart analysis: clustering, salience, redundancy, importance. Best performance for retrieval was to play every frame.

Carnegie Mellon Speed-up summarization results

Carnegie Mellon Surveillance Event Detection: Interesting stuff is rare. Detection accuracy is limited. Monitor many streams.

Carnegie Mellon Surveillance Event Detection: Need action, not key frame. Difficult for humans. Combine speed-up with automatic analysis. Slow down when interesting stuff happens.
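Combining speed-up with automatic analysis might look like the sketch below, where a detector score per frame controls the playback rate. Everything specific here is assumed for illustration: the linear mapping, the 16x maximum speed-up, the 0.5 score at which playback returns to real time, and the 25 fps frame rate.

```python
def playback_speed(score, max_speedup=16.0, slow_at=0.5):
    """Map a detector score in [0, 1] to a playback speed multiplier.

    Score 0 plays at max_speedup; scores at or above slow_at play at 1x.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= slow_at:
        return 1.0
    return max_speedup - (max_speedup - 1.0) * (score / slow_at)


def viewing_time(scores, frame_s=1.0 / 25):
    """Seconds needed to watch the frames at their adaptive speeds."""
    return sum(frame_s / playback_speed(s) for s in scores)


# Hypothetical stream: 100 uneventful frames, then 25 interesting ones.
scores = [0.0] * 100 + [0.9] * 25
print(viewing_time(scores), len(scores) / 25)
```

The uneventful stretch is skimmed quickly while the flagged second plays at normal speed, so the viewer still sees the action rather than a single key frame.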

Carnegie Mellon Summary: Interfaces have much to contribute in retrieval. We don’t know what is best: task-specific, user-specific, system-dependent. Collaborative search. Combining “best of current systems”. Simpler is usually better (Occam’s razor). General principles are difficult to find.

Carnegie Mellon Questions?
