Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach Yih-Cheng Chang Department of Computer Science and Information.


1 Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach (ImageCLEF 2005)
  Yih-Cheng Chang, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
  Hsin-Hsi Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
  Wen-Cheng Lin, Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan

2 (NTU NLPL) Why Combine Text and Image Queries in Cross-Language Image Retrieval?
  Text-based image retrieval
    - Translation errors in cross-language image retrieval
    - Annotation errors in automatic annotation
    - Easy to capture semantic meaning
    - Easy to construct a textual query
  Content-based image retrieval (CBIR)
    - Semantic meaning is hard to represent
    - Example images have to be found or drawn
    - Avoids translation in cross-language image retrieval
    - Annotation is not necessary

3 How to Combine Text and Image Features in Cross-Language Image Retrieval?
  Parallel approach
    - Conduct text-based and content-based retrieval separately and merge the retrieval results
  Pipeline approach
    - Use textual or visual information to perform the initial retrieval, then employ the other feature to filter out the irrelevant images
  Transformation-based approach
    - Mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa
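The parallel approach can be illustrated with a small merging sketch. The slides do not say how the two result lists are merged, so everything here is an assumption made for illustration: the min-max normalization, the linear combination, the `text_weight` parameter, and the function names.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrieval runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {image_id: 1.0 for image_id in scores}
    return {image_id: (s - lo) / (hi - lo) for image_id, s in scores.items()}


def merge_results(
    text_scores: Dict[str, float],
    visual_scores: Dict[str, float],
    text_weight: float = 0.7,  # hypothetical weight, not taken from the slides
) -> List[Tuple[str, float]]:
    """Merge two independently produced result lists by a weighted sum."""
    text_norm = min_max_normalize(text_scores)
    visual_norm = min_max_normalize(visual_scores)
    merged = {}
    for image_id in set(text_norm) | set(visual_norm):
        merged[image_id] = (
            text_weight * text_norm.get(image_id, 0.0)
            + (1.0 - text_weight) * visual_norm.get(image_id, 0.0)
        )
    # Highest merged score first; ties broken by image id for determinism.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    ranked = merge_results({"a": 2.0, "b": 1.0}, {"b": 0.9, "c": 0.1})
    print([image_id for image_id, _ in ranked])
```

An image that appears in only one list contributes a zero score for the missing modality, which is one simple way to handle partial overlap between the runs.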

4 Approach at ImageCLEF 2004
  Automatically transform textual queries into visual representations
    - Mine the relationships between text and images
        - Divide an image into several smaller parts
        - Link the words in the caption to the corresponding parts
        - Analogous to word alignment in a sentence-aligned parallel corpus
    - Build a transmedia dictionary
    - Transform a textual query into a visual one using the transmedia dictionary

5 System at ImageCLEF2004
  [Architecture diagram. Labels: source language textual query, query translation, language resources, target language textual query, text-based image retrieval, textual index; training collection (images, image captions), text-image correlation learning, transmedia dictionary, query transformation, visual query, content-based image retrieval, visual index; target collection (images, image captions); result merging, retrieved images]

6 Learning Correlation
  [Figure: an image with the caption "Mare and foal in field, slopes of Clatto Hill, Fife". Caption words: hill, mare, foal, field, slope. After segmentation, the image is divided into blocks B01, B02, B03, B04]

7 Text-Based Image Retrieval at ImageCLEF2004
  Run       Query translation             Backward transliteration   Mean average precision
  WCO       WCO                           No                         0.2920
  WCO+NT    WCO                           Yes                        0.3276
  F2hf      First-two-highest-frequency   No                         0.4015
  F2hf+NT   First-two-highest-frequency   Yes                        0.4395
  Mono      -                             -                          0.6304
  Using similarity-based backward transliteration improves performance.
  69.71% (F2hf+NT, 0.4395, relative to the monolingual run, 0.6304)

8 Cross-Language Experiments at ImageCLEF2004
  Query type                                                       Mean average precision
  Textual query (F2hf+NT)                                          0.4395
  Generated visual query (18 topics)                               0.0110 (poor)
  Textual query + generated visual query (N+V+A, n=30, t=0.02)     0.4441
  +0.46% (0.4395 to 0.4441): insignificant performance increase

9 Analyses of These Approaches
  Parallel approach and pipeline approach
    - Simple and useful
    - Do not employ the relations between visual and textual features
  Transformation-based approach
    - Textual and visual queries can be translated into each other using the relations between visual and textual features
    - It is hard to learn all relations between all visual and textual features
    - The degree of ambiguity of the relations is usually high

10 Our Approach at ImageCLEF2005: A Corpus-Based Relevance Feedback Method
  - Initiate a content-based retrieval
  - Treat the retrieved images and their text descriptions as aligned documents
  - Adopt a corpus-based method to select key terms from the text descriptions, and generate a new query
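The feedback cycle above can be sketched roughly as follows. The slides do not state how key terms are selected, so this sketch swaps in a simple term-frequency times inverse-document-frequency weighting as a stand-in. The function names, the stopword list, and the `top_images` and `num_terms` parameters are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = frozenset({"a", "an", "and", "at", "in", "of", "on", "the", "with"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def select_feedback_terms(
    ranked_ids: List[str],
    captions: Dict[str, str],
    top_images: int = 2,
    num_terms: int = 3,
) -> List[str]:
    """Pick key terms from the captions of the top images of a visual run."""
    n_docs = len(captions)
    # Document frequency over the whole collection.
    doc_freq = Counter()
    for text in captions.values():
        doc_freq.update(set(tokenize(text)))
    # Term frequency over the captions of the top-ranked images only.
    term_freq = Counter()
    for image_id in ranked_ids[:top_images]:
        term_freq.update(tokenize(captions[image_id]))
    scores = {
        term: count * math.log(1.0 + n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    ranked_terms = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked_terms[:num_terms]


if __name__ == "__main__":
    captions = {
        "img1": "Aeroplane on the airfield",
        "img2": "Military aeroplane at air base",
        "img3": "Mare and foal in field",
        "img4": "Harbour with fishing boats",
    }
    # ranked_ids stands in for the output of an initial content-based retrieval.
    new_query = select_feedback_terms(["img1", "img2", "img3"], captions)
    print(" ".join(new_query))
```

The selected terms form a new textual query that can be run alone or combined with the original textual query.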

11 Fundamental Concepts of a Corpus-Based Relevance Feedback Approach

12 (Aircraft on the ground) VIPER system

13

14 Bilingual Ad hoc Retrieval Task
  - 28,133 photographs from St. Andrews University Library's photographic collection
  - The collection is in English and the queries are in different languages
  - In our experiments, the queries are in Chinese
  - All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
  - The test set contains 28 topics, and each topic has a text description and an example image

15 An Example – An Image and Its Description

16 An Example – A Topic in Chinese
  [Figure: a topic with a Chinese title and an English title]

17 Some Models in Formal Runs

18 Experiment Results at ImageCLEF2005
  [Chart of run performance; improvement labels: +25.96%, +15.78%, +11.01%]
  Performance: EE+EX > CE+EX ≈ EE > EX > CE > Visual run

19 Lessons Learned
  - Compared with the initial visual retrieval, average precision increases from 8.29% to 34.25% after the feedback cycle
  - Combining textual and visual information can improve performance

20 Example: Aircraft on the Ground ( )
  [Figure: retrieved images for the text-only monolingual run and the text-only cross-lingual run]
  The top 2 images in the cross-lingual run are non-relevant because of a query translation problem: clear ( ), above ( ), floor ( )

21 Example: Aircraft on the Ground (after integration)
  [Figure: retrieved images for the Text (monolingual) + Visual run]
  The Text+Visual run is better than the monolingual run because it expands the query with some useful words, e.g., aeroplane, military air base, airfield

22 ImageCLEF2004 vs. ImageCLEF2005
  - Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005). This year's topics are a little harder
  - Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
  - Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005), i.e., 70.45% vs. 100.63% of the monolingual text-based result in the same year
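The two percentages on this slide appear to be the cross-lingual Text+Image score divided by the monolingual text-only score of the same year. A quick check of that reading, using only the numbers on the slide (the helper name `relative_percent` is made up for this sketch):

```python
def relative_percent(value: float, reference: float) -> float:
    """Express value as a percentage of reference."""
    return 100.0 * value / reference


# Cross-lingual Text+Image MAP relative to monolingual text-only MAP.
cross_2004 = relative_percent(0.4441, 0.6304)
cross_2005 = relative_percent(0.3977, 0.3952)

print(f"{cross_2004:.2f}% vs. {cross_2005:.2f}%")  # prints "70.45% vs. 100.63%"
```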

23 Automatic Annotation Task
  - The automatic annotation task in ImageCLEF 2005 can be seen as a classification task, since each image can only be annotated with one word (i.e., a category)
  - We propose several methods to measure the similarity between a test image and a category; a test image is assigned to the most similar category
  - The proposed methods use the same image features but different classification approaches

24 Image Feature Extraction
  - Resize images to 256 x 256 pixels
  - Segment each image into 32 x 32 blocks (each block is 8 x 8 pixels)
  - Compute the average gray value of each block to construct a vector with 1,024 elements
  - The similarity between two images is measured by the cosine formula
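A minimal sketch of these steps in plain Python. It assumes the image has already been resized to 256 x 256 and converted to gray values, given as a list of rows; the function names are made up here, and the resizing step itself is left to an image library.

```python
import math
from typing import List, Sequence


def block_average_features(
    image: Sequence[Sequence[float]], blocks_per_side: int = 32
) -> List[float]:
    """Average gray value of each block, read row by row (32 x 32 -> 1,024 values)."""
    side = len(image)
    if side == 0 or any(len(row) != side for row in image):
        raise ValueError("expected a non-empty square gray-level image")
    if side % blocks_per_side != 0:
        raise ValueError("image side must be a multiple of blocks_per_side")
    block = side // blocks_per_side  # 8 pixels for a 256 x 256 image
    features = []
    for by in range(blocks_per_side):
        for bx in range(blocks_per_side):
            total = 0.0
            for y in range(by * block, (by + 1) * block):
                total += sum(image[y][bx * block:(bx + 1) * block])
            features.append(total / (block * block))
    return features


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


if __name__ == "__main__":
    # Synthetic 256 x 256 gradient image standing in for a resized photograph.
    image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
    features = block_average_features(image)
    print(len(features))
```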

25 Some Models and Experimental Results
  NTU-annotate05-1NN
    - Baseline model. It uses the 1-NN method to classify each image
  NTU-annotate05-Top2
    - Computes the similarity between a test image and a category using the top 2 nearest images in each category, and assigns the test image to the most similar category
  NTU-annotate05-SC
    - The training data is clustered using the k-means algorithm (k=1000). We compute the centroid of each category in each cluster, and assign a test image to the category of the nearest centroid
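The 1-NN and Top2 models might look roughly like the sketch below. The slides do not say how the two nearest similarities per category are combined, so averaging them is an assumption; the function names and the toy two-dimensional vectors are also illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, category)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def classify_1nn(test_vec: Sequence[float], training: List[Example]) -> str:
    """Return the category of the single most similar training image."""
    best = max(training, key=lambda item: cosine_similarity(test_vec, item[0]))
    return best[1]


def classify_top2(test_vec: Sequence[float], training: List[Example]) -> str:
    """Score each category by its 2 most similar images (averaged: an assumption)."""
    sims: Dict[str, List[float]] = defaultdict(list)
    for vec, category in training:
        sims[category].append(cosine_similarity(test_vec, vec))

    def category_score(values: List[float]) -> float:
        top = sorted(values, reverse=True)[:2]
        return sum(top) / len(top)

    return max(sims, key=lambda category: category_score(sims[category]))


if __name__ == "__main__":
    training = [
        ([1.0, 0.0], "A"),
        ([0.0, 1.0], "A"),
        ([0.8, 0.6], "B"),
        ([0.6, 0.8], "B"),
    ]
    test_vec = [1.0, 0.1]
    print(classify_1nn(test_vec, training), classify_top2(test_vec, training))
```

In this toy data the two rules disagree: the single nearest image belongs to category A, while category B has the better pair of neighbours, which shows how Top2 can smooth over one unusually close training image.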

26 Conclusion: Bilingual Ad hoc Retrieval Task
  - An approach that combines textual and image features is proposed for Chinese-English image retrieval. → A corpus-based feedback cycle from CBIR
  - Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977). → Resolves part of the translation errors
  - The integration of visual and textual queries also improves the performance of monolingual IR from 0.3952 to 0.5053. → Provides more information
  - The improvement is the best among all the groups. → 78.2% of the best monolingual text retrieval

27 Conclusion: Automatic Annotation Task
  - A feature extraction algorithm is proposed, and several classification approaches are explored using the same image features
  - The 1-NN and top-2 approaches, which both have an error rate of 21.7%, outperform the centroid-based approach (error rate 22.5%)
  - Our method is about 9 percentage points worse than the best-performing group (error rate 12.6%), but is better than most of the groups in this task

28 Thank You and Comments

