

1
Presented by: Michal Nir, Saar Gross
Supervisors: Nadav Golbandi, Oren Somekh
Computer Science Department, Industrial Project (234313)
Tuesday, January 24, 2012

2
- This project extends a previous project, which includes a client application (Android) and a server application (running on Tomcat).
- The user takes a photo with a smartphone and records an audio clip linked to that photo.
- Tags are extracted from the audio using speech-to-text, and the photo, with its tags, is uploaded to Flickr.
- The speech-to-text engine (Sphinx) works best with small dictionaries.
- In this project, we try to supply Sphinx with a custom dictionary created for each photo (or stack of photos) from the photo’s geo-location information.
- Using the geo-location information, we can extract relevant tags from Flickr and build the custom dictionary from them.

3
- Implement a new module, running on the server application, that creates custom dictionaries for the Sphinx voice-to-text engine.
- Optimize the algorithm for creating the custom dictionary, aiming for the best results at an acceptable performance cost.

4
The server generates tag recommendations in one of two ways:
- Uploading an image (or multiple images) that contains a geo-location, with an audio file attached, triggers the server to create a custom dictionary for the Sphinx voice-to-text engine.
- The client may ask for tag recommendations by sending a request containing only the image’s geo-location.
The server can also be instructed not to use the image’s geo-location when compiling the recommendation list (privacy concerns); in that case, only the user’s “private tags” are used.

5
The server supports uploading multiple images:
- When uploading multiple images, the images are clustered into groups based on location (using a simple, deterministic algorithm).
- The server compiles a recommendation list for each group.
- Every image with an audio file attached is processed by Sphinx with its group’s custom dictionary.
- All images are uploaded to Flickr with their identified tags and user-supplied tags.
- Returning recommendations only for a group of images works essentially the same way, except that recommendations are returned only for the largest group of images.
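The slides describe the clustering only as "simple and deterministic". As an illustration, the sketch below assumes one possible scheme: a greedy pass in which each image joins the first group whose first member lies within a fixed radius, and otherwise starts a new group. The class name, method names, and the radius parameter are hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedy location clustering (an assumed scheme, not the project's code).
class GeoClusterSketch {

    /** A photo location in decimal degrees. */
    static final class GeoPoint {
        final double lat;
        final double lon;
        GeoPoint(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two points, using the haversine formula. */
    static double distanceKm(GeoPoint a, GeoPoint b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    /** Assigns each point to the first group whose first member is within radiusKm. */
    static List<List<GeoPoint>> cluster(List<GeoPoint> points, double radiusKm) {
        List<List<GeoPoint>> groups = new ArrayList<>();
        for (GeoPoint p : points) {
            List<GeoPoint> target = null;
            for (List<GeoPoint> g : groups) {
                if (distanceKm(g.get(0), p) <= radiusKm) {
                    target = g;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(p);
        }
        return groups;
    }

    /** Returns the largest group; ties go to the group created first. */
    static List<GeoPoint> largest(List<List<GeoPoint>> groups) {
        List<GeoPoint> best = new ArrayList<>();
        for (List<GeoPoint> g : groups) {
            if (g.size() > best.size()) {
                best = g;
            }
        }
        return best;
    }
}
```

For the same input order the result is always the same, which is one way to read "deterministic"; `largest` corresponds to the recommendations-only case, where only the biggest group is used.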

6
Method of compiling a recommendation list for an image (or group of images):
- Input: group of images.
- Tag sources:
  - Public tags (based on geo-location): by ranking tags found in images near the given geo-location.
  - Public tags (based on geo-location): by querying Flickr’s Places API.
  - Private tags (NOT using geo-location): by ranking the user’s previously used tags.
- The sources are implemented using independent threads (all running in parallel).
- Merging results: merging parameters are configurable.
- Output: to the Android client (when asking for tag recommendations only) or to Sphinx (when uploading images to Flickr).
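The flow above (independent sources running in parallel, followed by a configurable merge) can be sketched roughly as follows. This is an illustration under assumptions: the slides do not say how the merge works, so the sketch assumes a per-source quota of tags, and every name in it is hypothetical. The timeout parameter is likewise an assumed way to bound the wait on slow sources.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: parallel tag sources merged by per-source quotas (assumed scheme).
class TagMergeSketch {

    /** Takes up to quotas[i] new tags from ranked list i, in source order, skipping duplicates. */
    static List<String> merge(List<List<String>> rankedLists, int[] quotas) {
        Set<String> merged = new LinkedHashSet<>();
        for (int i = 0; i < rankedLists.size(); i++) {
            int taken = 0;
            for (String tag : rankedLists.get(i)) {
                if (taken >= quotas[i]) {
                    break;
                }
                if (merged.add(tag.toLowerCase())) {
                    taken++;
                }
            }
        }
        return new ArrayList<>(merged);
    }

    /** Runs all sources in parallel; a source that fails or misses the time limit adds nothing. */
    static List<String> recommend(List<Callable<List<String>>> sources,
                                  int[] quotas, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        List<List<String>> results = new ArrayList<>();
        try {
            List<Future<List<String>>> futures =
                    pool.invokeAll(sources, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<List<String>> f : futures) {
                try {
                    results.add(f.get());
                } catch (CancellationException | ExecutionException e) {
                    results.add(new ArrayList<String>()); // timed out or failed
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, merge what we have
        } finally {
            pool.shutdownNow();
        }
        return merge(results, quotas);
    }
}
```

The quotas play the role of the configurable merging parameters: raising a source's quota lets it contribute more tags to the final list.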

7
Server side:
1. Tag recommendations are compiled for an image or group of images and can be presented to the user (recommendation only) or used for Sphinx voice-to-text.
2. Performance:
   1. In general, pretty good.
   2. Compiling a recommendation list usually takes no more than a few seconds.
   3. In any case, a time limit is enforced.
   4. Most interaction with Flickr is multi-threaded to avoid bottlenecks.
   5. Compiled recommendation lists are cached based on time and location to optimize performance further.
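Caching "based on time and location" could be done in several ways; the slides do not give details. The sketch below assumes one simple option: locations are bucketed into grid cells of a fixed size in degrees, and an entry expires after a fixed time-to-live. The class, its parameters, and the cell/TTL scheme are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: recommendation cache keyed by grid cell, with a time-to-live.
class RecommendationCacheSketch {

    private static final class Entry {
        final List<String> tags;
        final long storedAtMillis;
        Entry(List<String> tags, long storedAtMillis) {
            this.tags = tags;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;      // how long an entry stays valid
    private final double cellDegrees;  // grid cell size; nearby points share a cell

    RecommendationCacheSketch(long ttlMillis, double cellDegrees) {
        this.ttlMillis = ttlMillis;
        this.cellDegrees = cellDegrees;
    }

    /** Maps a coordinate to the identifier of the grid cell that contains it. */
    private String key(double lat, double lon) {
        long row = (long) Math.floor(lat / cellDegrees);
        long col = (long) Math.floor(lon / cellDegrees);
        return row + ":" + col;
    }

    synchronized void put(double lat, double lon, List<String> tags, long nowMillis) {
        entries.put(key(lat, lon), new Entry(tags, nowMillis));
    }

    /** Returns the cached list for this cell, or null if absent or older than the TTL. */
    synchronized List<String> get(double lat, double lon, long nowMillis) {
        String k = key(lat, lon);
        Entry e = entries.get(k);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.storedAtMillis > ttlMillis) {
            entries.remove(k);
            return null;
        }
        return e.tags;
    }
}
```

The current time is passed in explicitly so the expiry logic is easy to exercise; a server would typically pass `System.currentTimeMillis()`.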

8
Server properties file:
1. Virtually all parameters needed by the server are read from an external properties (settings) file.
   1. Tweaking the server becomes an easy and intuitive task.
2. The server uses 2 different sets of settings:
   1. Settings used when uploading images to Flickr.
   2. Settings used when asking for tag recommendations only.
      1. This gives us more flexibility when changing the server’s settings.
3. Example from imageupload.properties: [image not included in transcript]
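Reading such settings in Java is usually done with `java.util.Properties`. The sketch below shows the general pattern only; the property keys and default values are invented for illustration and are not the contents of the project's imageupload.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: the keys and defaults below are assumptions, not the project's settings.
class ServerSettingsSketch {

    final int maxPublicTags;
    final int maxPrivateTags;
    final long timeLimitMillis;

    ServerSettingsSketch(Properties p) {
        // getProperty(key, default) falls back to the default when the key is missing.
        maxPublicTags = Integer.parseInt(p.getProperty("recommendation.maxPublicTags", "20").trim());
        maxPrivateTags = Integer.parseInt(p.getProperty("recommendation.maxPrivateTags", "10").trim());
        timeLimitMillis = Long.parseLong(p.getProperty("recommendation.timeLimitMillis", "3000").trim());
    }

    /** Parses settings from properties-file text (one key=value pair per line). */
    static ServerSettingsSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read settings", e);
        }
        return new ServerSettingsSketch(p);
    }
}
```

Two instances, one loaded from each settings file, would give the upload path and the recommendations-only path their own independent tuning.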

9 Client side:

10
- Merged the Camera and Gallery applications into one.
- Added a new Tag Editor (can now add, edit, and remove tags on images).
- Added support for working with multiple images and getting tag recommendations.
- Many bug fixes and GUI improvements:
  - New Image Properties dialog.
  - Updated menus and icons.
  - Improved gallery performance and design.

11
To evaluate the algorithm’s performance, we would like to do the following:
- Find a user who uploaded many tagged images (with a reasonable time difference between them) in a popular location (San Francisco bridge, Las Vegas Strip).
- Perform a cross-validation analysis:
  - Choose a subset of the user’s images.
  - Send the images to the server and receive tag recommendations for them.
  - Evaluate the accuracy (precision and recall) of the recommendations using the 2 left-out images.
  - Repeat…
- We expect accuracy to be affected by many factors:
  - Number of tags merged into the final recommendation list from each source.
  - Dictionary size.
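Precision and recall for one evaluation round can be computed as in the sketch below. It assumes that the "actual" tags are those the user gave the left-out images and that tags are compared case-insensitively; both points, and all names, are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: precision and recall of recommended tags against held-out tags.
class TagAccuracySketch {

    /** Lower-cases and trims tags, and removes duplicates. */
    private static Set<String> normalize(List<String> tags) {
        Set<String> out = new HashSet<>();
        for (String t : tags) {
            out.add(t.trim().toLowerCase());
        }
        return out;
    }

    /** Number of tags present in both sets. */
    private static int hits(Set<String> recommended, Set<String> actual) {
        Set<String> common = new HashSet<>(recommended);
        common.retainAll(actual);
        return common.size();
    }

    /** Fraction of recommended tags that appear among the actual tags. */
    static double precision(List<String> recommended, List<String> actual) {
        Set<String> rec = normalize(recommended);
        if (rec.isEmpty()) {
            return 0.0;
        }
        return (double) hits(rec, normalize(actual)) / rec.size();
    }

    /** Fraction of actual tags that were recommended. */
    static double recall(List<String> recommended, List<String> actual) {
        Set<String> act = normalize(actual);
        if (act.isEmpty()) {
            return 0.0;
        }
        return (double) hits(normalize(recommended), act) / act.size();
    }
}
```

For example, recommending four tags of which two match a held-out set of three tags gives a precision of 2/4 and a recall of 2/3.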

12
We wrote TagRecTestFramework:
- Completely automated.
- Behaves like a “normal” client (the server thinks it is talking to an Android client).
- For each given location:
  - Finds a user with enough tagged images (configurable…) in the area, with a small time difference between images (also configurable).
  - Performs cross-validation on grouped images.

13
Piazza San Pietro (Vatican City) (41.902309, 12.457341)
- 10 images in each group, min. of 20 tags per image
- Search radius: 1 km; time difference between images: max. 1 day

14
The algorithm’s accuracy is very image- and user-dependent:
- We found that most images on Flickr are untagged or tagged with irrelevant tags.
- Most images on Flickr are not geotagged.
  - Flickr has ~5 billion photos.
  - Only ~170 million are geotagged (~3% of all photos).
- The quality of results could be improved by tweaking the server’s settings:
  - Giving more weight to private or public tags affects the accuracy.
  - Compiling a larger recommendation list (and thus a larger dictionary for Sphinx) improves recall but may hurt Sphinx’s performance (Sphinx works best with small dictionaries).

