Thomas Mandl: GeoCLEF Track Overview 2007. Cross-Language Evaluation Forum (CLEF), Thomas Mandl (U. Hildesheim), 8th Workshop.


1 GeoCLEF Track Overview: Query Classification and Geographic Search
Thomas Mandl (U. Hildesheim), mandl@uni-hildesheim.de
8th Workshop of the Cross-Language Evaluation Forum (CLEF), Budapest, 20 Sept. 2007

2 GeoCLEF Administration
Joint effort of
– Fredric Gey (U. California at Berkeley)
– Diana Santos (Linguateca, SINTEF ICT, Norway)
– Mark Sanderson (U. Sheffield)
– Nicola Ferro, Giorgio Di Nunzio (U. Padua)
– Thomas Mandl, Christa Womser-Hacker (U. Hildesheim)
– and many others...

3 Geographic Information Systems

4 Initial Aim of GeoCLEF
Aim: to evaluate retrieval of multilingual documents with an emphasis on geographic search (GIR)
– “find me news stories about riots near Dublin” (Fred Gey @ CLEF Workshop 2005)

5 Participation
CLEF Year                       2005   2006   2007
Nr. of Participants               11     17     13
Nr. of Submitted Experiments     117    149    108
New: Query Classification Task with 6 participants

6 Search Task 2007
Three languages
600,000+ docs
25 topics (75 over the three years so far)
Intention behind topics
– geographically challenging

7 Reliability?
25 topics are sufficient under most circumstances to reliably order systems (Sanderson & Zobel 2005)

8 Partial Swap Rate Analysis
Average correlation between the system rankings obtained on the full topic set and on partial topic sets, for the German monolingual task 2006
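The comparison of full and partial topic sets can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say which correlation coefficient was used, so Kendall's tau is assumed here, the per-topic scores are invented, and the function names `kendall_tau` and `average_correlation` are hypothetical.

```python
import itertools
import random
from statistics import mean


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score lists over the same systems."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(scores_a)), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(scores_a) * (len(scores_a) - 1) / 2
    return (concordant - discordant) / pairs


def average_correlation(per_topic, subset_size, trials=200, seed=0):
    """Average tau between the full-set ranking and random partial-set rankings.

    per_topic maps a system name to its per-topic scores (same topic order).
    """
    rng = random.Random(seed)
    systems = sorted(per_topic)
    n_topics = len(per_topic[systems[0]])
    full = [mean(per_topic[s]) for s in systems]
    taus = []
    for _ in range(trials):
        topics = rng.sample(range(n_topics), subset_size)
        partial = [mean(per_topic[s][t] for t in topics) for s in systems]
        taus.append(kendall_tau(full, partial))
    return mean(taus)


# Invented per-topic scores for three hypothetical systems over four topics.
per_topic = {
    "sys_a": [0.40, 0.35, 0.50, 0.20],
    "sys_b": [0.30, 0.45, 0.25, 0.15],
    "sys_c": [0.10, 0.20, 0.15, 0.05],
}
print(average_correlation(per_topic, subset_size=2))
```

Plotting this average against the subset size would show how quickly the partial ranking converges to the full one.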

9 Search Task
How much and which geo knowledge and reasoning is necessary?
Each year, keyword-based systems do well on the task

10 Query Classification Task
Goal: find geo queries in a log of real queries
New in 2007
Organized by Xing Xie (Microsoft Research Asia, Beijing, China)

11 Data
Query log from the MSN search engine
– in English
– 800,000 queries (collected August 2006)
– 500 queries were labelled and used for evaluation
   100 queries for training
   400 for testing

12 Task
Find queries with a geographic scope
– Extract where component
– Extract geo-relation-type
– Extract what component
– Classify what type {information, yellow page, map}
Example: Lottery in Florida
– geographic scope: YES
– what: lottery
– what type: information
– where: Florida, US
– geo-relation-type: in
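One way to picture the expected output for a single query is as a small record. The sketch below is an assumption for illustration only: the class name `GeoQueryLabel` and its field names are hypothetical and do not reproduce the official submission format.

```python
from dataclasses import dataclass
from typing import Optional

# The three "what type" classes named on the slide.
WHAT_TYPES = {"information", "yellow page", "map"}


@dataclass
class GeoQueryLabel:
    """Hypothetical record for one labelled query (not the official format)."""
    query: str
    is_geo: bool
    what: Optional[str] = None
    what_type: Optional[str] = None
    geo_relation: Optional[str] = None
    where: Optional[str] = None

    def __post_init__(self):
        # A geographic query must carry one of the three what types.
        if self.is_geo and self.what_type not in WHAT_TYPES:
            raise ValueError(f"unknown what type: {self.what_type!r}")


# The example from the slide.
example = GeoQueryLabel(
    query="Lottery in Florida",
    is_geo=True,
    what="lottery",
    what_type="information",
    geo_relation="in",
    where="Florida, US",
)
print(example)
```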

13 Geo-relation-types
27 classes
Examples:
– In
– On
– Near
– Along
– Distance
– North_of
– North_west_of
– North_to
– …

14 Evaluation Set
1. Choose 800 queries randomly from the query set.
2. Manually remove the typos and the ambiguous queries from these 800.
3. Manually select queries with special geo-relations from the remaining queries in the query set and add them to the evaluation set.
4. Select 500 queries for the final evaluation set.
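The four steps above can be sketched in code. This is a rough illustration, not the organizers' procedure: the manual steps are stood in for by a caller-supplied predicate `is_clean` and a hand-picked list `special_queries`, and the final selection is assumed to be random because the slide does not say how the 500 queries were chosen. All names are hypothetical.

```python
import random


def build_evaluation_set(query_log, is_clean, special_queries,
                         sample_size=800, final_size=500, seed=0):
    """Sketch of the four-step construction; manual steps are parameters."""
    rng = random.Random(seed)
    # Step 1: random sample from the query log.
    sampled = rng.sample(query_log, sample_size)
    # Step 2: drop typos and ambiguous queries (manual in the original).
    cleaned = [q for q in sampled if is_clean(q)]
    # Step 3: add hand-picked queries with special geo-relations.
    already_chosen = set(cleaned)
    extras = [q for q in special_queries if q not in already_chosen]
    pool = cleaned + extras
    # Step 4: final selection (assumed random here).
    return rng.sample(pool, min(final_size, len(pool)))


# Invented stand-in data.
log = [f"query {i}" for i in range(2000)]
evaluation_set = build_evaluation_set(
    log,
    is_clean=lambda q: not q.endswith("7"),
    special_queries=["hotels north of Dublin"],
)
print(len(evaluation_set))
```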

15 Query Types in Evaluation Set

16 Evaluation Metrics
Three assessors
– individually assessed all system answers
– reached an agreement
Fully correctly classified query instances
Recall, precision and combined F1 score
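The three metrics can be computed as in the sketch below. It assumes that precision is the share of a system's returned geo queries that are fully correct and that recall is the share of geo queries in the test set that the system got fully correct; the slide does not spell out these denominators, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(n_correct, n_returned, n_relevant):
    """Metrics from counts of fully correct, returned, and relevant queries."""
    precision = n_correct / n_returned if n_returned else 0.0
    recall = n_correct / n_relevant if n_relevant else 0.0
    return precision, recall, f1_score(precision, recall)


# Invented counts: 50 fully correct out of 80 returned, 200 geo queries in total.
print(precision_recall_f1(50, 80, 200))

# Recomputing F1 from the precision and recall reported for team Ask.
print(round(f1_score(0.625, 0.258), 3))
```

Because the reported precision and recall are themselves rounded, recomputed F1 values can differ from the reported ones in the last digit.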

17 Results
Team       Precision   Recall   F1
Ask        0.625       0.258    0.365
Csusm      0.201       0.197    0.199
Linguit    0.112       0.038    0.057
Miracle    0.428       0.566    0.488
Talp       0.222       0.249    0.235
Xldb       0.096       0.08     0.088

18 Approaches
Gazetteers for location identification
– Large base of geo names
Pre-defined rules
Issues
– Low performance
– Few training classes for many geo-types
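A gazetteer lookup combined with a pre-defined rule might look like the toy sketch below. It is not any participant's system: the two-entry gazetteer, the three relation words, the single pattern "what + relation + where", and the function name `classify` are all assumptions made for illustration.

```python
import re

# Hypothetical miniature gazetteer; real systems use a large base of geo names.
GAZETTEER = {
    "florida": "Florida, US",
    "dublin": "Dublin, Ireland",
}

# A few relation words, longest first so "north of" is tried before "in".
RELATION_WORDS = ["north of", "near", "in"]

# One pre-defined rule: <what> <relation> <where>.
PATTERN = re.compile(
    r"^(?P<what>.+?)\s+(?P<rel>" + "|".join(RELATION_WORDS) + r")\s+(?P<where>.+)$"
)


def classify(query):
    """Return the extracted components, or None if the query is not matched."""
    match = PATTERN.match(query.lower().strip())
    if match is None:
        return None
    place = GAZETTEER.get(match["where"])
    if place is None:
        return None  # the where part is not a known place name
    return {
        "what": match["what"],
        "geo_relation": match["rel"].replace(" ", "_"),
        "where": place,
    }


print(classify("Lottery in Florida"))
print(classify("python tutorial"))
```

A rule set this small misses most queries, which is consistent with the low performance noted on the slide for rule-based approaches with little training data.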

19 More on GeoCLEF
Parallel Session on Friday 9:00 – 10:30
Please come to the Breakout Session on Friday 11:00 – 12:00 and help us to promote GeoCLEF

20 Overview
More on the geographic search task …
– Mark Sanderson: Topic Development and Relevance Assessment
– Giorgio Di Nunzio: Results
– Diana Santos: Approaches and Interpretation

