Shin’ichi Satoh National Institute of Informatics.




 Nowadays, abundant multimedia information is available  Web, broadband networks, CATV, satellite broadcasting...  digital cameras, mobile phones, ...

 YouTube: 35 hours of video uploaded every minute

 Flickr: 5 billion photos  Facebook: 3 billion photos per month

 How can we utilize such huge amounts of multimedia?  Search could be one promising option  Any technical problems?  It seems like multimedia search is already available  Google, Yahoo!, Bing image search, Flickr, YouTube, etc...

 In reality, multimedia search is possible only via text search technology  This problem is especially prominent for visual media (audio can be converted into text via ASR)

 But the major part of multimedia data has no text data  We checked a number of photos on Flickr and found that around 85% of photos have no tags or description  as long as we rely on text search-based technologies, such large amounts of multimedia remain inaccessible!

 Moreover, text-based multimedia search is NOT perfect  searching for images of "people playing drums"  some results are good  but some results are very strange

 Multimedia semantic content analysis is required  However, it is difficult ◦ Multimedia is difficult for computers to handle ◦ Inherently difficult due to the “Semantic Gap” (figure: example query “Lion”)

 Multimedia data is huge ◦ text: 1 kb/s (10 words), audio: 100 kb/s (MP3), video: 10 Mb/s (MPEG-2)  computers since the 1940s (ENIAC, 1946)  text processing by computer since the 1950s! (Turing test 1950, ELIZA and SHRDLU in the 1960s)  Project Gutenberg since 1971  CD-ROM (1985), DVD (1993), larger memory, external storage (hard disk drives)  multimedia data (audio/image/video) became manageable only after the 1990s
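The gap between these data rates can be made concrete with a quick back-of-the-envelope calculation, using only the rates quoted on this slide:

```python
# Back-of-the-envelope comparison of the per-medium data rates quoted
# above: 1 kb/s for text, 100 kb/s for MP3 audio, 10 Mb/s for MPEG-2 video.
RATES_BITS_PER_SEC = {
    "text": 1_000,
    "audio (MP3)": 100_000,
    "video (MPEG-2)": 10_000_000,
}

def hourly_megabytes(bits_per_sec: int) -> float:
    """Storage needed for one hour of a stream, in megabytes."""
    return bits_per_sec * 3600 / 8 / 1_000_000

for medium, rate in RATES_BITS_PER_SEC.items():
    print(f"{medium}: {hourly_megabytes(rate):,.2f} MB/hour")
```

An hour of MPEG-2 video works out to roughly 4.5 GB, about four orders of magnitude more than an hour of text, which is why multimedia only became manageable once storage and bandwidth caught up in the 1990s.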

 Please guess what this is. (image: Water Lilies, Monet)

 Please guess what this is.

 Computers are very good at handling text, but not so good at handling multimedia  text: an artificial medium, symbolic by nature  multimedia: ambiguous, dependent on cognition, natural media, not symbolized, etc...  humans can easily “see” or perceive  but we cannot explain how we “see” (example text: “The quick brown fox jumps over the lazy dog”)

 1980s  Landsat images, medical images, stock photos  search using relational DBs  only via statistics and text  the issue was how to handle “huge” amounts of image data  less attention was paid to content analysis

 CBIR: Image retrieval based on “content”  T. Kato, TRADEMARK & ART MUSEUM (1989)  IBM QBIC (1990s)  Take an image as a query, and return “similar” images  Use “features,” e.g., color histograms, edges, shapes, etc.  It worked for images without metadata  Assumes that images that are similar in feature space are also semantically similar  But this is not always true
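The QBIC-style pipeline described here can be sketched in a few lines. This is a toy illustration, assuming images arrive as (H, W, 3) uint8 RGB arrays and using histogram intersection as the similarity measure (one of several classic choices):

```python
import numpy as np

# Minimal sketch of retrieval by color histogram: compute a joint RGB
# histogram per image, then rank database images by histogram
# intersection with the query's histogram.
def color_histogram(image: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Joint RGB histogram, normalized to sum to 1."""
    quantized = (image.astype(int) * bins_per_channel) // 256  # 0..bins-1 per channel
    index = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
        * bins_per_channel + quantized[..., 2]
    hist = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.minimum(h1, h2).sum())

def rank(query: np.ndarray, database: list[np.ndarray]) -> list[int]:
    """Indices of database images, most similar to the query first."""
    q = color_histogram(query)
    sims = [histogram_intersection(q, color_histogram(img)) for img in database]
    return sorted(range(len(database)), key=lambda i: -sims[i])
```

Note that this ranks purely by color statistics: a red flower and a red car would score as highly similar, which is exactly the feature-space-versus-semantics mismatch the slide warns about.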

(figure: the Semantic Gap between feature space and semantics)

 Let’s take face detection as an example...  Face detection is now a very stable technology  Before 1990, face detection was very unstable ◦ The shapes of facial features and their geometric relations were hard-coded  After the late 1990s, face detectors using machine learning achieved very stable performance ◦ Simply provide many face image examples (a few thousand) to the system and let it learn (figures: early face detection method vs. machine learning)
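The contrast between hard-coded rules and learning from examples can be shown with a toy sketch. This is not the actual Viola-Jones detector; the 16-dimensional "patch features" and the synthetic face/non-face clusters are invented for illustration:

```python
import numpy as np

# Toy illustration of the slide's point: instead of hand-coding rules
# about facial geometry, give the system labeled examples and let it
# learn a decision boundary.
rng = np.random.default_rng(0)

# Synthetic "feature vectors" standing in for real image patches:
# face-like examples cluster around +1, non-face examples around -1.
faces = rng.normal(loc=1.0, scale=0.3, size=(200, 16))
non_faces = rng.normal(loc=-1.0, scale=0.3, size=(200, 16))

X = np.vstack([faces, non_faces])
y = np.concatenate([np.ones(200), -np.ones(200)])

# Least-squares linear classifier: the weights are estimated from the
# examples rather than written by hand.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def is_face(patch: np.ndarray) -> bool:
    return float(patch @ w) > 0.0
```

The key shift is that `w` is fitted from a few hundred labeled examples rather than authored by an expert, which is exactly the change the slide describes.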

Following the success of machine-learning-based approaches in face detection, OCR, ASR, etc., researchers decided to “train” computers for semantic content analysis of media:  build a corpus: tens, hundreds, or thousands of images/video shots per concept, with manual annotation  extract features (low-level; recently “local” features have been shown to be more effective)  train computers to automatically map low-level features to semantic categories using machine learning  Several corpora are available
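The pipeline on this slide (a corpus of examples per concept, feature extraction, a learned mapping to categories) can be sketched minimally. Here a nearest-centroid rule stands in for the heavier learners (e.g., SVMs) used in practice, and `extract_features` is a deliberately crude stand-in for real low-level features:

```python
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Toy low-level feature: per-channel mean and standard deviation."""
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

def train(corpus: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Map each concept to the centroid of its training-example features."""
    return {concept: np.mean([extract_features(im) for im in images], axis=0)
            for concept, images in corpus.items()}

def classify(image: np.ndarray, model: dict[str, np.ndarray]) -> str:
    """Assign the concept whose centroid is nearest in feature space."""
    f = extract_features(image)
    return min(model, key=lambda c: np.linalg.norm(f - model[c]))
```

The structure mirrors the slide: annotated examples in, features extracted, and a mapping from feature space to semantic labels learned automatically.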

 Caltech 101 (2003), Caltech 256 (2007)  101/256 concepts  define the set of concepts first, then collect images (via an image search engine)  manual selection, so clean annotation  up to a few hundred images per concept  standard benchmark datasets  “small world effect” anticipated  questionable selection of concepts

 airplane, chair, elephant, faces, leopards, rhino  bonsai, brain, scorpion, trilobite, yin_yang...

 Large number of concepts, large number of images  #concepts: 10,000+  #images: 10,000,000+  concepts are systematically selected from WordNet (a machine-readable thesaurus)
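The idea of selecting concepts systematically from a thesaurus can be illustrated with a toy "is-a" hierarchy. The hierarchy below is hand-made for illustration; ImageNet walks the real WordNet graph:

```python
# Tiny hand-made "is-a" hierarchy standing in for WordNet.
HYPONYMS = {
    "animal": ["dog", "cat", "bird"],
    "dog": ["dalmatian", "terrier"],
    "bird": ["eagle"],
}

def all_hyponyms(concept: str) -> list[str]:
    """Depth-first collection of every concept below `concept`."""
    result = []
    for child in HYPONYMS.get(concept, []):
        result.append(child)
        result.extend(all_hyponyms(child))
    return result
```

Enumerating subtrees this way yields a concept set that covers a domain systematically, instead of the hand-picked (and arguably arbitrary) concept lists of earlier datasets.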

 Manual annotation by Amazon Mechanical Turk  Hard to control quality  Scalability issues
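A common quality-control tactic in this setting (a sketch, not ImageNet's exact protocol) is redundant labeling: ask several workers to label each image and keep the majority answer, recording agreement as a rough confidence score.

```python
from collections import Counter

def majority_label(labels: list[str]) -> tuple[str, float]:
    """Return (winning label, fraction of workers who agreed)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)
```

Redundancy trades cost for quality: each extra worker per image multiplies the annotation bill, which is the scalability issue the slide points at.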

Currently, researchers are focusing on the issue of how to effectively learn semantic concepts from a GIVEN training media corpus  Corpus: the larger, the better  But how do we obtain a large corpus?  CGM (Flickr, Web): noisy  Manual annotation (AMT): costly, less scalable  Other approaches, such as the ESP game, could be interesting

Milestones per medium (timeline figure):  Text: Project Gutenberg, bag-of-words, TF/IDF, WSJ, TREC, PageRank  Audio/Speech: MFCC, Viterbi, HMM, 1000-word LVCSR, IBM ViaVoice  Image: USPS OCR (single digit), CMU-MIT Face DB, V-J Face Det., Caltech101, Pascal VOC, ImageNet  Video: TRECVID

 Multimedia content analysis research has “just started”  More advanced results are to come  Business value?  Killer applications?