Chorus cluster meeting, Vilamoura April
SAPIR: Search in Audio-visual content using P2P IR
Yosi Mass, Raul Santos

Why SAPIR?
The searchable space created by the growing volume of video and multimedia files may greatly exceed the area covered by the major search engines.
Traditional search engines are limited to searching the text and metadata associated with multimedia content.
If content providers do not describe their multimedia files clearly and accurately, or use inaccurate tags, the current method falls short.
Current internet search is geared mainly to relatively powerful desktop machines accessed via regular web browsers, not to lightweight mobile devices with their connectivity and interactivity limitations.

SAPIR Objectives
Develop cutting-edge technology to index and search large-scale audio-visual information by content.
Make information available on many devices, enhanced by social networking, while preserving privacy and preventing fraud.
Support new trends in multimedia (MM) content production: personal producers vs. professional producers.

SAPIR challenges
Dimensions of the search problem:
  Efficiency (scalability is the key issue)
  Effectiveness (quality measures of results)
Efficiency challenges
  Scale in collection size
  Scale in number of users
Effectiveness challenges
  New search paradigm combining text + audio-visual content
Usability challenges

SAPIR Consortium
Organization  Activity type       Country         Nr. Employees / RTD Person Months
IBM           IND                 Israel          62188
CNR           Research Institute  Italy
MPI           Research Institute  Germany         15064
UPD           University          Italy
Eurix         SME                 Italy           3066
Xerox         IND                 France
MU-Brno       University          Czech Republic
TID           IND                 Spain
Telenor       IND                 Norway          67429

SAPIR approach: P2P Architecture

Search using the Query by Example Paradigm
Search for information about a physical object by taking an image of it with a mobile phone, or find a song by humming the melody.
Support similarity search for metric spaces.
Figure: Image Database
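
Similarity search in a metric space can be illustrated with a small sketch. The slides do not describe SAPIR's actual metric index, so what follows is a generic single-pivot filter based on the triangle inequality, assuming feature vectors compared by Euclidean distance. The names `PivotIndex`, `range_query` and `distance_computations` are hypothetical and chosen only for this illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, a metric (it satisfies the triangle inequality)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical single-pivot index for range queries in a metric space."""

    def __init__(self, items: List[Vector], pivot: Vector) -> None:
        self.pivot = pivot
        # Precompute d(pivot, item) once, at indexing time.
        self.entries = [(euclidean(pivot, item), item) for item in items]
        self.distance_computations = 0

    def range_query(self, query: Vector, radius: float) -> List[Tuple[float, Vector]]:
        """Return (distance, item) pairs with d(query, item) <= radius, nearest first."""
        d_query_pivot = euclidean(query, self.pivot)
        self.distance_computations = 0
        hits = []
        for d_pivot_item, item in self.entries:
            # Triangle inequality: |d(q, p) - d(p, x)| <= d(q, x).
            # If the lower bound already exceeds the radius, skip the item
            # without computing the real distance.
            if abs(d_query_pivot - d_pivot_item) > radius:
                continue
            self.distance_computations += 1
            d = euclidean(query, item)
            if d <= radius:
                hits.append((d, item))
        return sorted(hits)


# Query by example: the query is itself a feature vector (made-up 2-D values).
index = PivotIndex([(0, 0), (1, 0), (5, 5), (10, 10)], pivot=(0, 0))
for distance, item in index.range_query((1, 1), radius=1.5):
    print(item, round(distance, 3))
```

In this toy data set the pivot filter discards the two distant items without computing their distance to the query, which is the property that lets metric indexes scale to large collections.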

Feature extraction
Example image text: when waves collide; beautiful… very powerful…; waves Victoria beach

Indexing
Example image text: when waves collide; beautiful… very powerful…; waves Victoria beach
Visual Descriptors: Overlay, Metric index
Text: Overlay, Text index

Querying
Tag: names waves
Visual Descriptors Overlay
Text Overlay
Merge Results
Approximation
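
The "Merge Results" step can be sketched as a score fusion over the two result lists. The slide does not say how SAPIR merges results, so the rule below (min-max normalization followed by a weighted sum) is an assumption, and `merge_results`, `text_weight` and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize scores into [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(
    text_scores: Dict[str, float],
    visual_distances: Dict[str, float],
    text_weight: float = 0.5,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Fuse text relevance scores with visual distances (assumed fusion rule).

    Text scores are "higher is better"; visual distances are "lower is
    better", so they are negated before normalization.
    """
    text = normalize(text_scores)
    visual = normalize({doc: -d for doc, d in visual_distances.items()})
    combined = {
        doc: text_weight * text.get(doc, 0.0)
        + (1.0 - text_weight) * visual.get(doc, 0.0)
        for doc in set(text) | set(visual)
    }
    # Highest combined score first; ties broken by document id for stability.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up results from the text overlay and the visual-descriptor overlay.
text_hits = {"a": 2.0, "b": 1.0, "c": 0.0}
visual_hits = {"b": 0.1, "c": 0.5, "d": 0.9}
for doc, score in merge_results(text_hits, visual_hits):
    print(doc, round(score, 2))
```

A document that appears in only one list contributes zero from the other, so items matched by both text and visual descriptors tend to rank first under this rule.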

Project status for Apr 2008
A scalable, extensible and versatile architecture for P2P was defined.
APIs for P2P content management, indexing and search were defined and implemented.
Several scenarios were defined and tested in focus groups.
A common schema for feature representation using MPEG-7 was defined.
A demo of indexing and search over 10M Flickr files, combining content-based image search with text and metadata, was implemented using the SAPIR APIs.
A testbed of 50M Flickr files was crawled using the EGEE grid, aiming at 100M by year end. This testbed collection will be available for scientific experiments (CoPhir – site).
The next demo (due Nov ’08) will include search in music, video and speech, as well as some scenario integration.

Tests
P2P architecture for search in Audio-Visual content
Efficiency, some initial results:
  1M Flickr XML files: ~500 msec per query, 50 peers (8 CPU, 16 GB)
  10M Flickr XML files: ~500 msec per query, 500 peers (16 CPU, 64 GB)
Effectiveness
  Text + image improves over text or image only
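
One way to read the efficiency figures is to work out the load per peer. The short calculation below assumes the files are spread evenly across peers, which the slide does not state; `files_per_peer` is a hypothetical helper.

```python
def files_per_peer(total_files: int, peers: int) -> float:
    """Average number of indexed files per peer, assuming an even spread."""
    return total_files / peers


# Collection sizes and peer counts as reported on the slide.
runs = [
    ("1M Flickr XML files", 1_000_000, 50),
    ("10M Flickr XML files", 10_000_000, 500),
]
for label, files, peers in runs:
    print(f"{label}: {files_per_peer(files, peers):,.0f} files per peer")
```

Under that assumption both runs come to 20,000 files per peer, which fits the query time staying at ~500 msec as the collection and the number of peers grow tenfold together.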

WP9 – Dissemination and exploitation
Public website
Dissemination
  First DUP was published
  Participate in Chorus meetings and road map
  Workshops – SIGIR’07, ECIR’08, SAC’08
  Demos
Publications
  More than 20 SAPIR related publications so far
Contacts with Standards Bodies
  MPEG-21, MPEG-A, MPEG-7
Exploitation

WP9 – Dissemination and exploitation
Proposed contribution to standards
  Extension to MPEG-7 for music and speech
  Proposals for MPQF (MPEG-7 Query Format)
  A DRM implementation for P2P based on Chillout
  Propose a call for MPEG-21 Query Format

Thank You!
For more info visit

Results (Jan 2007 – Mar 2008)
WP1 – Scenarios and a complete guideline for usability and user interface design
WP2 – Architecture for P2P and APIs
WP3 – Definition of a common schema for feature representation using MPEG-7
WP4, WP5 – Demo of indexing and search in 10M Flickr files combining text and low-level visual descriptors
WP6 – Work on an interoperable DRM solution (Chillout) for P2P networks
WP7 – Initial design of social networking and support for mobile devices