Presentation is loading. Please wait.

Presentation is loading. Please wait.

Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities.

Similar presentations


Presentation on theme: "Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities."— Presentation transcript:

1 Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities in Video Retrieval Interfaces

2 Carnegie Mellon Background Automatic video analysis for detection/recognition is still quite poor Consider baseline (random guessing) Improvement is limited Consider near-duplicates (trivial similarity) Does not generalize well over video sources Better than nothing Need humans to make up for this shortcoming!

3 Carnegie Mellon Differences to VIRAT Most interface work done on broadcast TV Harder: Unconstrained subject matter Graphics, animations, photos Broadcasters Many different, short shots out of context Easier: Better resolution Conventions in editing, structure Audio track Keyframes are typical unit of analysis

4 Carnegie Mellon “Classic” Interface Work Interactive Video Queries Fielded text query matching capabilities Fast image matching with simplified interface for launching image queries Interactive Browsing, Filtering, and Summarizing Browsing by visual concepts Quick display of contents and context in synchronized views

5 Carnegie Mellon Informedia Client Interface Example

6 Carnegie Mellon Interface options Christel08

7 Carnegie Mellon Suggesting Related Concepts Zavesky07

8 Carnegie Mellon

9 Fork Browser Snoek09

10 Carnegie Mellon “Classic” Video Interface Results Concept browsing and image search used frequently Novices still have lower performance than experts Some topics cause “interactivity” to be one-shot query with little browsing/exploration Classic Informedia interface including concept browsing often good enough that user never proceeds to any additional text or image query “Classic Informedia” scored highest of those testing with novice users in TRECVID evaluations

11 Carnegie Mellon Visual Browsing

12 Carnegie Mellon Augmented Video Retrieval The computer observes the user and LEARNS, based on what is marked as relevant The system can learn: What image characteristics are relevant What text characteristics (words) are relevant What combination weights should be used We exploit the human’s ability to quickly mark relevant video and the computer’s ability to learn from given examples

13 Carnegie Mellon Combining Concept Detectors for Retrieval Final ranked list for interface Combination of diverse knowledge sources Text Find Pope John Paul AudioImageMotion multimodal question Output 1 Output 2 Output n … Closed Caption Color Feature Motion Feature Audio Feature Video library Face … (3k) Knowledge Source/API Building

14 Carnegie Mellon Why Relevance Feedback? Limited training data Untrained sources are useful for some specific searches Query: finding some boats/ships Txt: 0.5, Img: 0.3, Face: -0.5 Q-Type: general object (Learned from training set) Outdoor: ?, Ocean: ? (Unable to be learned)

15 Carnegie Mellon Probabilistic Local Context Analysis (pLCA) Yan07 Goal: Refine results of the current query Method: assume the combination parameters of “un-learned” sources υ to be latent variables and compute P(y j |a j,D j ) Discover useful search knowledge based on initial results A i Query Initial Search Result Video1 Video 2 Video M A1A1 A2A2 AmAm Y1Y1 Y2Y2 YmYm υ 1 :?, υ 2 :?,…, υ N :?

16 Carnegie Mellon Undirected Model and Parameter Estimation Compute the posterior probability of document relevance Y given initial results A based on an undirected graphical model D1D1 D1D1 Y1Y1 Y1Y1 A1A1 A1A1 υ υ Variational inference, i.e., iterate until convergence and approximate P(y j |a j ) by q yj  Maximize w.r.t. variational para. of Y,  Maximize w.r.t. variational para. of v D2D2 D2D2 Y2Y2 Y2Y2 A2A2 A2A2 DmDm DmDm YmYm YmYm AmAm AmAm

17 Carnegie Mellon Automatic vs Interactive Search

18 Carnegie Mellon Extreme Video Retrieval Automatic retrieval baseline for ranking order Two methods of presentation: System-controlled Presentation - Rapid Serial Visual Presentation (RSVP) User-controlled Presentation – Manual Browsing with Resizing of Pages

19 Carnegie Mellon System-controlled Presentation Rapid Serial Visual Presentation (RSVP) Minimizes eye movements All images in same location Maximizes information transfer: System  Human Up to 10 key images/second 1 or 2 images per page Presentation intervals are dynamically adjustable Click when relevant shot is seen Mark previous page also as relevant A final verification step is necessary

20 Carnegie Mellon User-controlled presentations Manual Browsing with Resizing of Pages Manually page through images User decides to view next page Vary the number of images on a page Allow chording on the keypad to mark shots A very brief final verification step

21 Carnegie Mellon MBRP - Manual Browsing with Resizable Pages

22 Carnegie Mellon Extreme QA with RSVP 3x3 display 1 page/second Numpad chording to select shots

23 Carnegie Mellon Mindreading – an EEG interface Learn Relevant/Non-Relevant 5 EEG probes Simple features Too slow 250 ms/image Significant recovery time after hit Relevance feedback

24 Carnegie Mellon Summarizing Video: Beyond Keyframes BBC Rushes Unedited video for TV series production Summarize as video in 1/50 th of total Note the non-scalable target factor Lots of smart analysis Clustering, salience, redundancy, importance Best performance for retrieval was to play every frame

25 Carnegie Mellon Speed-up summarization results

26 Carnegie Mellon Surveillance Event Detection Interesting stuff is rare Detection accuracy is limited Monitor many streams

27 Carnegie Mellon Surveillance Event Detection Need action not key frame Difficult for humans Combine speed-up with automatic analysis Slow down when interesting stuff happens

28 Carnegie Mellon Summary Interfaces have much to contribute in retrieval We don’t know what is best Task-specific User-specific System-dependent Collaborative search Combining “best of current systems” Simpler is usually better (Occam’s razor) General principles are difficult to find

29 Carnegie Mellon Questions?


Download ppt "Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities."

Similar presentations


Ads by Google