Presentation is loading. Please wait.

Presentation is loading. Please wait.

{ rsarukkai@yahoo.com ramesh@yahoo-inc.com } Video Search: Opportunities and Challenges Keynote Speech at ACM MIR Workshop ACM Multimedia Conference 2005.

Similar presentations


Presentation on theme: "{ rsarukkai@yahoo.com ramesh@yahoo-inc.com } Video Search: Opportunities and Challenges Keynote Speech at ACM MIR Workshop ACM Multimedia Conference 2005."— Presentation transcript:

1 { rsarukkai@yahoo.com ramesh@yahoo-inc.com }
Video Search: Opportunities and Challenges Keynote Speech at ACM MIR Workshop ACM Multimedia Conference 2005 Singapore Dr. Ramesh R. Sarukkai Yahoo! Search { }

2 About the Presenter Dr. Ramesh Rangarajan Sarukkai currently heads up Video Search engineering at Yahoo where he manages video search/core media search infrastructure. Over his 6+ years tenure at Yahoo, Dr. Sarukkai has grown and managed different teams including the Small Business (e-commerce stores, hosting, domains) billing platform, and architected wireless applications & voice portals including my-yahoo. Prior to Yahoo, he has worked in Kurzweil A.I., IBM Watson Lab, and has collaborated with HP Labs. Dr. Sarukkai is the author of the book entitled “Foundations of Web Technology” (Kluwer/Springer 2002). In addition, he holds the first patent on large vocabulary voice web browsing, and has a number of patents awarded/pending in the areas of speech, natural language modeling, personalized search, information retrieval, optimal sponsored search, media and wireless technologies. Dr. Sarukkai also served on the World Wide Web Consortium (W3C) Voice Browser working group, recently chaired a panel in World Wide Web 2005 conference (Japan) on networking effects of the Web, and has contributed as a reviewer/Program Committee in a number of leading conferences/journals. His current interests include next-generation media search, social annotation, networking effects on the Web, emergent web technologies and new business models.

3 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

4 Introduction Key questions discussed in this talk:
Why is video search important? What broadly constitutes video search? How is Web Video Search different? What are some opportunities and challenges?

5 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

6 Whoever controls the media - the images - controls the culture.
Market Trends Whoever controls the media - the images - controls the culture. - Allen Ginsberg American Poet

7 Market Trends Broadband doubling over next 3-5 years
Video enabled devices are emerging rapidly Emergence of mass internet audience Mainstream media moving to the Web International trends are similar Money Follows…

8 Market Trends How many of you are aware of video on the Web?
How many have viewed a video on the Web in The last 3 months? The last 6 months? Ever? Would you watch video on your devices (ipod/wireless)? How many of you have produced video (personal or otherwise) recently? How many of you have shared that with your friends/community? Would you have liked to?

9 Market Trends How many of you are aware of video on the Web?
Large portion of online users How many have viewed a video on the Web in The last 3 months? 50% The last 6 months? Ever? Would you watch video on your devices (ipod/wireless)? 1M downloads in 20 days (iPod) How many of you have produced video (personal or otherwise) recently? Continuing to skyrocket with digital camera phones/devices How many of you have shared that with your friends/community? Would you have liked to? Huge interest & adoption in viral communities * Source: Forrester Report

10 Market Trends Technology more media friendly
Storage costs plummeting (GB  TB) CPU speed continuing to double (Moore’s law) Increased bandwidth Device support for media Adding media to sites drives traffic Web continues to propel scalable infrastructure for media products/communities

11 Market Trends Democratization of Mass Media in the 21st Century (aka the long tail) Of the People (media is ultimately defined by users) By the people (production,tagging) For the people (consumption, devices, personalization) - Abraham Lincoln as a Media Mogul in the 21st Century! “The Long Tail” was coined by Chris Andersen’s article in the Wired magazine.

12 Market Trends Mainstream media migrating online & interactive
Media content online is growing Video Streaming continuing to grow aggressively online. More interactive in nature (user voting): MTV Interactive American Idol Reality shows

13 Market Trends Video search is a key part of the future of digital media

14 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

15 Multimedia? As far as I'm concerned, it's reading with the radio on!
Video Search What is Video Search? Multimedia? As far as I'm concerned, it's reading with the radio on! - Rory Bremner Actor/Writer

16 Video Search Structured + Unstructured Data Content Features +
Meta-Content Video Production Video Consumption /Community Meta-Content Unstructured Data

17 Video Search Media Information Retrieval
Been around since the late 1970s Text Based/DB Issues: Manual Annotation, Subjectivity of Human Perception Content Based Color, texture, shape, face detection/recognition, speech transcriptions, motion, segmentation boundaries/shots High Dimensionality Limited success to date. Citations: “Image Retrieval: Current Techniques, Promising Directions, and Open Issues” [Rui et al 99]

18 Video Search Active Research Area
Graph from “A new perspective on Visual Information Retrieval”, Horst Eidenberger, 2004 Black: “Image Retrieval”; Grey:”Video Retrieval”; IEEE Digital Library

19 Video Search Traditional 3-Step Overview:
Meta-Data/Visual Feature Extraction Multi-dimensional indexing Retrieval System design

20 Video Search Popular features/techniques:
Color, Shape, Texture, Shape descriptors OCR, ASR A number of prototype or research products with small data sets More researched for visual queries

21 Video Search: Features
Texture: Autocorrelation; Wavelet transforms; Gabor Filters Color: Color Moments Color Histograms Color Autocorrelograms Shape: Edge Detectors; Moment invariants; Animate Vision Marr; Finite Element Methods; Shape from Motion; Segmentation: Scene segmentation; Scene Segmentation; Shot detection; OCR: Modeling; Successful OCR deployments; Face: Face Detection algorithms; Neural Networks; EigenFaces ASR: Acoustic analysis; HMMS; N-grams; CSR; LVCSR; Domain Specific; Media IR systems NIST Video TREC Starts Web Media Search 1970 1980 1990 2000 2010

22 Video Search: Features
Texture One of the earliest Image features [Harlick et al 70s] Co-occurrence matrix Orientation and distance on gray-scale pixels Contrast, inverse deference moment, and entropy [Gotlieb & Kreyszig] Human visual texture properties: coarseness, contrast, directionality, likeliness, regularity and roughness [Tamura et al] Wavelet Transforms [90s] [Smith & Chang] extracted mean and variance from wavelet subbands Gabor Filters And so on Region Segmentation Partition image into regions Strong Segmentation: Object segmentation is difficult. Weak segmentation: Region segmentation based on some homegenity criteria Scene Segmentation Shot detection, scene detection Look for changes in color, texture, brightness Context based scene segmentation applied to certain categories such as broadcast news Color Robust to background Independent of size, orientation Color Histogram [Swain & Ballard] “Sensitive to noise and sparse”- Cumulative Histograms [Stricker & Orgengo] Color Moments Color Sets: Map RGB Color space to Hue Saturation Value, & quantize [Smith, Chang] Color layout- local color features by dividing image into regions Color Autocorrelograms

23 Video Search: Features
Face Face detection is highly reliable - Neural Networks [Rwoley] - Wavelet based histograms of facial features [Schneiderman] Face recognition for video is still a challenging problem. - EigenFaces: Extract eigenvectors and use as feature space OCR OCR is fairly successful technology. Accurate, especially with good matching vocabularies. Script recognition still an open problem. ASR Automatic speech recognition fairly accurate for medium to large vocabulary broadcast type data Large number of available speech vendors. Still open for free conversational speech in noisy conditions. Shape Outer Boundary based vs. region based Fourier descriptors Moment invariants Finite Element Method (Stiffness matrix- how each point is connected to others; Eigen vectors of matrix) Turing function based (similar to Fourier descriptor) convex/concave polygons[Arkin et al] Wavelet transforms leverages multiresolution [Chuang & Kao] Chamfer matching for comparing 2 shapes (linear dimension rather than area) 3-D object representations using similar invariant features Well-known edge detection algorithms.

24 Video Search: Video TREC
Overview: Shot detection, story segmentation, semantic feature extraction, information retrieval Corpora of documentaries, advertising films Broadcast news added in 2003 Interactive and non-interactive tests CBIR features Speech transcribed (LIMSI) OCR

25 Video Search: Video TREC
1st TREC (2001) Mean Average items (MAP) (in general category) Transcript only was better than transcript+image aspects+ASR Also there were different types of test: known items versus general. Known items had at least one result. 2nd TREC (2002) Interactive runs to rephrase queries Again text based on only ASR was the best performing system Mean Average precision of 0.137 Leading system employed multiple systems: TF-IDF variants (Mean Average Precision MAP 0.093). Other was boolean with query expansion (0.101 MAP) OCR was not particularly applicable; Phonetic ASR did not help. 3rd TREC (2003) Data changes radically (Broadcast news added – CNN, ABC, CSPAN) Baseline ASR + CC  MAP 0.155 ASR + CC + VOCR + Image Similarity + Person X retrieval  MAP 0.218 * “Successful Approaches in the TREC Video Retrieval Evaluations”, Alexander Hauptmann, Michael Christel, ACM Multimedia 2004.

26 Video Search: Browsing
Search by text Navigation with customized categories Random Browsing Search by example Search by Sketch

27 Video Search: Pre-2000 or Research QBIC (IBM Almaden)
Photobook (MIT Media Lab) FourEyes (MIT Media Lab) Netra (UCSB Digital Library) MARS (UCI) PicToSeek (UVA.nl) VisualSEEK (Columbia) PicHunter (NEC) ImageRover (BU) WebSEEK (Columbia) Virage (now Autonomy) Visual RetrievalWare (Convera) AMORE (NEC) BlobWorld (UC Berkeley)

28 Video Search We have discussed the background of video search, techniques used, and applications. It should be clear that “video search” is a fairly broad term.

29 Video Search Note the differences in these broader definitions of video search: No mention of content based versus meta-data based. No mention of the actual browsing paradigm. Production and consumption are a key aspect of video search, and should be taken into core consideration in the models. Device, personalization and on-demand needs addressed.

30 Video Search Video Search has evolved a lot since its inception and opportunities much broader. Multimedia technologists (We) are defining and influencing the emergent media culture!

31 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

32 Opportunities Engineering is the art of compromise, and there is always room for improvement in the real world; but engineering is also the art of the practical. Engineers realize that they must, at some point curtail design, and begin to manufacture or build - H. Petrovski Invention by design (Harvard Univ. Press)

33 Opportunities The Internet Revolution
Exposed the internet backbone to the masses Standardized publication of web media (HTML,CSS,XML) Scaled to millions of users Pre-dot-com bust- slow emergence of media search (notably AltaVista Audio-Visual search) Post-dot-com bust Recent trends of community tagging especially for Images: Flickr (Y!), DeliCious, and so on.

34 Opportunities Media Search Now
Yahoo! Image, Video, Audio, Podcast searches Flickr(Y!) AOL/SingingFish BlinxTV Google Many smaller companies

35 Opportunities Web Based Video Search Web Search meets Media Search
Leverage Meta-content from: Web (inferred meta-data) Community (Tagging and user submission/mRSS) Structured Meta-Data from different sources Augment with limited media analysis OCR, ASR, Classification

36 Traditional Media Search
Video Search Traditional Media Search Structured + Unstructured Data Content Features + Meta-Content Video Production Web Video Search Video Consumption /Community Meta-Content Unstructured Data

37 Opportunities Yahoo! Media Search Billions of Images
Millions of Videos Hundreds of Millions of users Reasonable Precision rates

38 Opportunities Example 1: User query “zorro”
User 1 wants to see Zorro videos User 2 wants to see Legend of Zorro movie clips User 3 just wants to see home videos about Zorro Can content based analysis help over structured meta-data query inference?

39

40 Opportunities Example 2:
For main-stream head content such as news videos. Meta-data are fairly descriptive Usually queried based on non-visual attributes. Task: “Pull up recent Hurricane Katrina videos”

41

42 Opportunities Example 3: Creative Home Video
Community video rendering! The now “famous” Star Wars Kid Example of “social buzz” combined with innovative tail content video production.

43 Play Video 1 Play Video 2

44 Opportunities A hard example: Lets take an example:
“Supposing you want to find videos that depict a monkey/chimp doing karate”! CBIR Approach: Train models for Chimps/Monkeys Motion Analysis for Karate movement models Many open issues/problems!

45 Play Video

46 Opportunities What are the key opportunities for Video Search?
Automatic Speech Recognition (ASR) as a comparable case study

47 Opportunities Speech Recognition: Case study
From small experimental models to usable large scale products in restricted domains. 60’s – Very small experimental tests; Digit recognition; 70’s – Core Acoustic modeling work; Vocal Tract Modeling; Formant analysis; Signal Processing (FFT, MFCC) 80’s – HMMs; Statistical Language Models; Large collections of acoustic data. Very Large collection of non-acoustic Text Data; Speaker ID; 90’s – Large vocabulary speech recognition deployments in many domains (IVR, LVCSR) Natural Language Understanding, Core Background Noise/Multiple speakers, Spontaneous Dialog still open research problems

48 Opportunities Acoustic Modeling Vocal Tract Modeling Formant Analysis
DSP: FFT, MFCC Digit Recognizers Viterbi search; HMMs Sub-phonetic Models FSMs, Grammars, Statistical N-grams Digit Recognition; Small Vocabulary Discrete Medium Vocabulary Continuous Large Vocabulary Continuous LVCSR Telephony/Broadcast 1970 1980 1990 2000 2010

49 Opportunities Modeling Framework Milestones Millions of
Acoustic Modeling Vocal Tract Modeling Formant Analysis DSP: FFT, MFCC Millions of non-audio Text Data Digit Recognizers Viterbi search; HMMs Sub-phonetic Models FSMs, Grammars, Statistical N-grams Digit Recognition; Small Vocabulary Discrete Medium Vocabulary Continuous Large Vocabulary Continuous LVCSR Telephony/Broadcast Hundreds of Hours of Audio Data 1970 1980 1990 2000 2010

50 Opportunities Speech Recognition Case Study Key Milestones:
Foundations: HMMs, Viterbi search, Statistical Language Models Creative use of Data: Millions of non-audio text data from Wall Street Journal Corpus applied for Language model training Data Crunching at the lowest levels: Hundreds of hours of data applied to model thousands of sub-phonetic models

51 Opportunities Speech Recognition Case Study:
Exploiting non-media sources of meta/text data are beneficial Applying contextual restraints enables improved applicability Large scale data crunching does indeed help solve hard problems

52 Opportunities Web Video Search:
Mass Adoption: Large Community of users/taggers/web developers Different sources of Media and non-Media data Highly Scalable systems

53 Opportunities The ASR takeaways can be applied to video search:
Exploiting non-media sources of meta/text data are beneficial (Web Meta-data, Community Tagging) Applying contextual restraints enables improved applicability (User models, structured data,Tailored) Large scale data crunching does indeed help solve hard problems (Throw Horsepower)

54 Opportunities Web Browsing Paradigms Search by text (Very popular)
Navigation with customized categories (Very Popular) Random Browsing (Popular for certain categories: Discovery) Search by example (Not popular currently) Search by Sketch (Not popular) Combined usage models - users skipping to scan and then pick selective videos to play.

55 Opportunities Leverage Community Exploit Structured Meta-Data
Constrain with Context

56 Opportunities Tying into social network modeling for media search
How do you define social networks? How do you integrate into video search indexing/ranking? How do you filter out the noise from the authoritative ones?

57 Opportunities Exploit Structured Meta-Data
Video has a number of different editorial sources; Exploit such structured data Tailor results to users’ perceived needs Music Videos, Movies – aka traditional video markets

58 Opportunities Constrain with context
Occam’s razor: Simpler the model, lesser the number of parameters to train, the more general the model. Rather than throw in all meta + media feature vectors, tailor the model; A decision tree based approach to picking the right algorithms. E.g. News broadcast will require certain type of content features and meta-data. Context can also be constrained by user (preferences), device parameters (media type, geo-location)

59 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

60 Challenges A man will occasionally stumble over truth, but usually manages to pick himself up, walk over or around it and carry on. - Winston Churchill

61 Challenges Theoretical Frameworks Leverage CBIR solutions
Learn User Needs/Behavior Inherit Web Search Problems

62 Challenges Theoretical Frameworks
Establishing the right theoretical frameworks for video search is still a problem for a variety of reasons: Lack of proper classified data Large dimensionality; Different sources of meta-data. Difficult to define the correct “perceived relevance” metric (“Semantic Gap*”) Cost functions such as “average precision” are not differentiable & need to be approximated with heuristics Integrate with Application specific cost functions: E.g. search: Click through rates, clicks, video views, customer time spent Sometimes ranking needs to be based on user buzz, quality and other non-IR metrics * “Content-Based Image Retrieval at the End of the Early Years”, Arnold Smeulders et al, IEEE PAMI, Dec. 2000

63 Challenges Leverage CBIR Video TREC Summary:
Text based retrieval still is much better than non-text based searches. Visual queries have poor MAP, but amenable for interactive search. Commercial detectors work okay with broadcast news, but not okay with documentaries with no commercials. (also face detection) Visual features are seldom useful: Not without exceptions, edge detectors usually helped only in aircraft and animal categories when color features were not as discriminative. ASR was not as useful in tasks such as shot detection (as opposed to indexing whole clip). OCR is more closer to the actual occurrence of the word. “Successful Approaches in the TREC Video Retrieval Evaluations”, Alexander Hauptmann, Michael Christel, ACM Multimedia 2004. “Video Retrieval using Speech and Image Information”, Alex Hauptmann et al.

64 Challenges Leverage CBIR Apply what works in a tailored manner:
Color, Texture, OCR, Speech Recognition Classification (in restricted domains) Copyright infringement prevention Story-boarding/Shots Music/Speech Detection Apply other techniques in selective domains Broadcast news: ASR, Segmentation, Speaker Detection Motion for event detection restricted to time regions isolated by ASR

65 Challenges Overcome the temptation to “solve the computer vision” problem*. Yet leverage computer vision techniques Today, CBIR is mostly image based Leverage motion flow information more Apply techniques in compressed domain (or leverage compressed data- e.g. MPEG-4 already computes local/global motion) Strong Image Segmentation is hard- Consistent Segmentation across image sequences in video can be exploited. * “Guest Introduction: The changing Shape of Computer Vision in the Twenty-First Century”, M. Shah, IJCV, 2002

66 Challenges User Needs/Behavior Understand user needs
Better modeling of query and user intent for video search Not much work has been done on user intent analysis for video search Leverage User Behavior Implicit versus explicit relevance feedback Challenges in tracking user feedback Blind/Implicit feedback is useful for limited domains How do you assess user relevance? More challenging for video since there is a temporal duration associated with the actions. Explicit ratings are an option, but not incentivizable

67 Challenges Inherit Web Search Problems Meta-data Spam
Users spam through Web or tagging Leverage Web Search solutions of spam detection based on text, linkage analysis, or user disrepute. Potential dilution due to “long tail” How do you balance classic web search ranking with media based ranking?

68 Outline Introduction Market Trends Video Search Opportunities
Challenges Conclusion

69 If you look back too much, you will soon be headed that way.
Conclusion If you look back too much, you will soon be headed that way. - Unknown

70 Conclusion In this presentation, we covered:
Market trends that are fuelling large scale demand globally for video search. The changing nature of video search largely driven by the Web. Discussed different systems and the techniques that have been applied to media search. Identified key areas of opportunities and challenges.

71 Video Search is here to stay.
Conclusion Video Search is here to stay. We (the attendees) have an opportunity to define and shape next generation media production, consumption, usage and ultimately the culture.

72 Conclusion Many thanks to: MIR Workshop organizers Y! Search Qi Tian
Alex Jaimes ACM Multimedia 2005 Conference Organizers Y! Search John Thrall Qi Lu Bradley Horowitz Video Search team, esp. Ruofei (Bruce) Zhang for help with references Y! Blore R&D: Sesha Shah, Srinivasan, YBL (Marc Davis et al) & YRL (Malcolm Slaney) - Prof. S.K. Rangarajan for help with quotes

73 Conclusion Thanks to the audience Questions?

74 Appendix: Leveraging Research & Industrial Communities
Scalable systems deployed and processing millions of media files End-user testing with huge volume of traffic General problem areas covered in R&D (e.g. VideoTREC) are very relevant Selective application and proper modelling of information fusion is key. How do we share or leverage that data to foster research? How can industries help? How can universities help?


Download ppt "{ rsarukkai@yahoo.com ramesh@yahoo-inc.com } Video Search: Opportunities and Challenges Keynote Speech at ACM MIR Workshop ACM Multimedia Conference 2005."

Similar presentations


Ads by Google