Support for Multilingual Information Access (Douglas W. Oard, Szechenyi National Library, August 21, 2002)

1 Support for Multilingual Information Access
Szechenyi National Library, August 21, 2002
Douglas W. Oard
College of Information Studies and Institute for Advanced Computer Studies
University of Maryland, College Park, MD, USA

2 Multilingual Information Access
Help people find information that is expressed in any language

3 Outline
User needs
System design
User studies
Next steps

4 Global Languages (chart)
Source: http://www.g11n.com/faq.html

5 Global Internet User Population, 2000 and 2005 (chart; labeled series include English and Chinese)
Source: Global Reach

6 Global Internet Hosts (chart)
Source: Network Wizards, Jan 1999 Internet Domain Survey

7 European Web Size Projection (chart)
Source: Extrapolated from Grefenstette and Nioche, RIAO 2000

8 Global Internet Audio
Over 2500 Internet-accessible radio and television stations
Source: www.real.com, Mar 2001

9 Who needs Cross-Language Search?
Searchers who can read several languages
  – Eliminate multiple queries
  – Query in most fluent language
Monolingual searchers
  – If translations can be provided
  – If it suffices to know that a document exists
  – If text captions are used to search for images

10 Outline
User needs
→ System design
User studies
Next steps

11

12 Multilingual Information Access (diagram)
Components: Cross-Language Search, Query Translation, Document Delivery, Cross-Language Browsing (Select, Examine)

13 The Search Process (diagram)
Author: Select Document-Language Terms → Document
Monolingual Searcher: Infer Concepts → Choose Document-Language Terms → Query
Cross-Language Searcher: Infer Concepts → Choose Query-Language Terms → Choose Document-Language Terms → Query
Document and Query meet in Query-Document Matching

14 Interactive Search (diagram)
Query Formulation → Query → Query Translation → Translated Query → Search → Ranked List → Selection → Document → Examination → Document → Use
Query Reformulation loops back to the query
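One pass through this loop (query translation, search, ranked list) can be sketched with a toy dictionary-based translator and term-overlap scoring. This is a minimal sketch, not the system described in the talk: the dictionary `BILINGUAL_DICT`, the German collection `DOCUMENTS`, and the functions `translate_query` and `search` are hypothetical, and term counting stands in for a real retrieval model. Selection, examination, and reformulation are left to the searcher.

```python
from collections import Counter

# Hypothetical English -> German dictionary (illustration only; not from the talk).
BILINGUAL_DICT: dict[str, list[str]] = {
    "workers": ["arbeiter"],
    "strike": ["streik", "schlag"],  # labor strike vs. a blow
    "treasure": ["schatz"],
    "hunting": ["jagd", "suche"],
}

# Hypothetical German documents, pre-tokenized into lowercase words.
DOCUMENTS: dict[str, list[str]] = {
    "d1": "arbeiter im streik".split(),
    "d2": "die suche nach dem schatz".split(),
    "d3": "ein schlag gegen den gegner".split(),
}


def translate_query(query: str) -> list[str]:
    """Query translation: replace each term with every dictionary translation.

    Terms missing from the dictionary pass through untranslated.
    """
    translated: list[str] = []
    for term in query.lower().split():
        translated.extend(BILINGUAL_DICT.get(term, [term]))
    return translated


def search(translated_query: list[str]) -> list[tuple[str, int]]:
    """Search: score each document by occurrences of the translated terms."""
    results: list[tuple[str, int]] = []
    for doc_id, tokens in DOCUMENTS.items():
        counts = Counter(tokens)
        score = sum(counts[term] for term in translated_query)
        if score > 0:
            results.append((doc_id, score))
    # Ranked list: highest score first, ties broken by document id.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# One pass of the loop; the searcher would then select and examine documents.
print(search(translate_query("workers strike")))  # [('d1', 2), ('d3', 1)]
```

Because every translation of "strike" is kept, d3 (about a blow, not a labor strike) still reaches the ranked list. Letting the searcher choose translations (synonym selection) is one way to reduce that noise.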

15

16 Synonym Selection
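Synonym selection lets the searcher decide which candidate translations of each query term are used. The following is a minimal sketch of that idea, not the interface shown on the slide. It assumes a hypothetical candidate dictionary `CANDIDATES` and a hypothetical `chosen` mapping that stands in for the interface's checkboxes. When no valid choice is made for a term, it falls back to all candidates, as a fully automatic system would.

```python
# Hypothetical English -> German candidate translations (illustration only).
CANDIDATES: dict[str, list[str]] = {
    "workers": ["arbeiter"],
    "strike": ["streik", "schlag"],
}


def select_translations(query: str, chosen: dict[str, set[str]]) -> list[str]:
    """Semi-automatic query translation.

    For each query term, keep only the candidates the searcher ticked. If the
    searcher ticked nothing valid for a term, keep all of its candidates.
    """
    result: list[str] = []
    for term in query.lower().split():
        options = CANDIDATES.get(term, [term])  # unknown terms pass through
        kept = [o for o in options if o in chosen.get(term, set())]
        result.extend(kept or options)
    return result


print(select_translations("workers strike", {"strike": {"streik"}}))  # ['arbeiter', 'streik']
print(select_translations("workers strike", {}))  # ['arbeiter', 'streik', 'schlag']
```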

17 KeyWord In Context (KWIC)
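A KWIC display shows each occurrence of a keyword surrounded by a few words of its context, which helps a searcher judge a (translated) document quickly. This is a minimal sketch, not the display used in the talk. It assumes whitespace tokenization, and the hypothetical `width` parameter counts the words of context shown on each side.

```python
import string


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` words on each side, with the matched word
    in brackets. Matching ignores case and surrounding punctuation.
    """
    words = text.split()
    target = keyword.lower()
    lines: list[str] = []
    for i, word in enumerate(words):
        if word.strip(string.punctuation).lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


for line in kwic("The strike ended when the strike committee met.", "strike", width=2):
    print(line)
# The [strike] ended when
# when the [strike] committee met.
```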

18

19 Outline
User needs
System design
→ User studies
Next steps

20 Cross-Language Evaluation Forum
Annual European-language retrieval evaluation
  – Documents: 8 languages (Dutch, English, Finnish, French, German, Italian, Spanish, Swedish)
  – Topics: 8 languages, plus Chinese and Japanese
  – Batch retrieval since 2000
Interactive track (iCLEF) started in 2001
  – 2001 focus: document selection
  – 2002 focus: query formulation

21 iCLEF 2001 Experiment Design
Task order
  Participant 1: Topic11, Topic17 | Topic13, Topic29
  Participant 2: Topic11, Topic17 | Topic13, Topic29
  Participant 3: Topic17, Topic11 | Topic29, Topic13
  Participant 4: Topic17, Topic11 | Topic29, Topic13
Topic key: Narrow: 11, 13; Broad: 17, 29
System key: System A, System B
144 trials, in blocks of 16, at 3 sites

22 An Experiment Session
Task and system familiarization
4 searches (20 minutes each)
  – Read topic description
  – Examine document translations
  – Judge as many documents as possible: Relevant, Somewhat relevant, Not relevant, Unsure, Not judged
  – Instructed to seek high precision
8 questionnaires
  – Initial, each topic (4), each system (2), final

23 Measure of Effectiveness
Unbalanced F-measure: $F_{\alpha} = \frac{1}{\alpha/P + (1-\alpha)/R}$
  – P = precision
  – R = recall
  – α = 0.8
Favors precision over recall
This models an application in which:
  – Fluent translation is expensive
  – Missing some relevant documents would be okay
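The following worked example computes this measure and shows why α = 0.8 favors precision: swapping the values of P and R changes the score noticeably. This is a minimal sketch. The function name `f_alpha` is hypothetical, and returning 0.0 when P or R is zero is a convention chosen here (the limit of the formula).

```python
def f_alpha(precision: float, recall: float, alpha: float = 0.8) -> float:
    """Van Rijsbergen's F measure with weight alpha on precision.

    F_alpha = 1 / (alpha / P + (1 - alpha) / R); alpha = 0.5 gives the
    balanced F1, and alpha > 0.5 weights precision more heavily.
    Returns 0.0 if either precision or recall is zero.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Same two values, swapped: high precision scores better than high recall.
print(round(f_alpha(1.0, 0.5), 3))  # 0.833
print(round(f_alpha(0.5, 1.0), 3))  # 0.556
```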

24 French Results Overview (chart; series: CLEF, AUTO)

25 English Results Overview (chart; series: CLEF, AUTO)

26 Commercial vs. Gloss Translation
Commercial machine translation (MT) is almost always better
  – Significant with a one-tailed t-test (p < 0.05) over 16 trials
Gloss translation usually beats random selection
(Chart grouped into broad topics and narrow topics)
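The following is a minimal sketch of a one-tailed t-test over 16 trials. The slide does not say whether the test was paired or which per-trial score was compared, so this sketch assumes a paired test on per-trial scores (for example F with α = 0.8). The function names are hypothetical. The critical value 1.753 is the standard one-tailed t-table value for α = 0.05 with 15 degrees of freedom.

```python
import math
from statistics import mean, stdev

# One-tailed critical t value for alpha = 0.05 with 15 degrees of freedom,
# i.e. 16 paired trials (standard t-table value).
T_CRIT_ONE_TAILED_DF15 = 1.753


def paired_t(a: list[float], b: list[float]) -> float:
    """t statistic for the paired differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significantly_better(a: list[float], b: list[float]) -> bool:
    """One-tailed test of H1: mean(a) > mean(b) at p < 0.05, for exactly 16 pairs."""
    if len(a) != 16:
        raise ValueError("critical value above assumes 16 paired trials")
    return paired_t(a, b) > T_CRIT_ONE_TAILED_DF15
```

For example, `significantly_better(mt_scores, gloss_scores)` takes two hypothetical lists of 16 per-trial scores, aligned so that index i refers to the same trial under both systems.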

27 iCLEF 2002 Experiment Design (diagram)
Topic Description → Query Formulation → Automatic Retrieval → Standard Ranked List → Interactive Selection
Measures: Mean Average Precision (ranked list), F with α = 0.8 (selection)
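Mean average precision scores the ranked list produced by automatic retrieval. Below is a minimal sketch of the usual definition, with hypothetical function names and toy topics. It assumes that relevant documents never retrieved count as zero. A topic with no relevant documents (one Maryland topic had none) makes average precision undefined, and this sketch returns 0.0 for it by convention.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of the precision values at each rank holding a relevant document.

    Relevant documents that are never retrieved contribute zero, so the sum is
    divided by the total number of relevant documents. A topic with no relevant
    documents returns 0.0 here by convention (an assumption of this sketch).
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average of per-topic average precision over (ranked list, relevant set) pairs."""
    if not runs:
        raise ValueError("need at least one topic")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical topics: AP = (1/1 + 2/3) / 2 and (1/2) / 2.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6", "d9"}),
]
print(round(mean_average_precision(runs), 4))  # 0.5417
```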

28 Maryland Experiments
48 trials (12 participants)
  – Half with automatic query translation
  – Half with semi-automatic query translation
4 subjects searched Der Spiegel and SDA
  – 20-60 relevant documents for 4 topics
8 subjects searched Der Spiegel
  – 8-20 relevant documents for 3 topics
  – 0 relevant documents for 1 topic!

29 Some Preliminary Results
Average of 8 query iterations per search
Relatively insensitive to topic
  – Topic 4 (Hunger Strikes): 6 iterations
  – Topic 2 (Treasure Hunting): 16 iterations
Sometimes sensitive to system
  – Topics 1 and 2: system effect was small
  – Topics 3 and 4: fewer iterations with the semi-automatic system
Topic 3: European Campaigns against Racism

30 Subjective Evaluation
Semi-automatic system:
  – Ability to select translations: good
Automatic system:
  – Simpler, less user involvement needed: good
  – Few functions, easier to learn and use: good
  – No control over translations: bad
Both systems:
  – Highlighting keywords helps: good
  – Untranslated or poorly translated words: bad
  – No Boolean or proximity operators: bad

31 Outline
User needs
System design
User studies
→ Next steps

32 Next Steps
Quantitative analysis from 2002 (MAP, F)
  – Iterative improvement of query quality
      Utility of MAP as a measure of query quality?
      Utility of semi-automatic translation
  – Accuracy of relevance judgments
Search strategies
  – Dependence on system
  – Dependence on topic
  – Dependence on density of relevant documents

33 An Invitation
Join CLEF
  – A first step: Hungarian topics
  – http://clef.iei.pi.cnr.it
Join iCLEF
  – Help us focus on true user needs!
  – http://terral.lsi.uned.es/iCLEF