A confidence-based framework for disambiguating geographic terms Erik Rauch, Michael Bukatin, and Kenneth Baker MetaCarta, Inc.

3 ‘wine’ in Europe

4 Al Hamra (= ‘red’ in Arabic)

6 Local and non-local information
Madison
Wisconsin
Milwaukee’s downtown
More non-local information -> too many joint states to compute probabilities exactly
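
The state explosion on this slide can be made concrete. Suppose a document contains several ambiguous names, and each name has a handful of gazetteer candidates (Madison alone has WI, ID, CT, KY, …). A model that scores complete joint readings must then consider the product of the candidate counts. The helper name and the counts below are hypothetical.

```python
from math import prod

def joint_assignments(candidate_counts):
    """Number of joint readings when every name in a document is resolved
    to exactly one of its gazetteer candidates (hypothetical helper)."""
    return prod(candidate_counts)

# Five ambiguous names with an invented number of candidates each.
counts = [30, 12, 8, 25, 4]
print(joint_assignments(counts))  # 288000
```

Each additional ambiguous name multiplies the count again. This is why a purely probabilistic treatment is only feasible for local information.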

7 Candidate places
38°01'10.5"N 121°44'48.8"W
four miles south of Lusaka
–(22.10 S 15.51 E)
Deir az Zor
–(32.10 N 41.11 E), 0.325
–(25.03 N 31.44 E), 0.151
–(….)
confidence
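
Explicit coordinates like the first example on this slide still need to be normalized before they can be compared with gazetteer points. The sketch below makes one assumption: the degree sign may be missing from extracted text, as it was here. The regular expression and helper names are illustrative. They are not MetaCarta's parser.

```python
import re

# Degrees, minutes, seconds and hemisphere, e.g. 38°01'10.5"N.
# The degree sign is optional because extraction often drops it.
DMS = re.compile(r"""(\d{1,3})°?\s*(\d{1,2})'(\d{1,2}(?:\.\d+)?)"([NSEW])""")

def dms_to_decimal(deg, minutes, seconds, hemi):
    """Convert one DMS value to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if hemi in "SW" else value

def parse_coordinate_pair(text):
    """Return (lat, lon) in decimal degrees, or None unless exactly two
    DMS values are found (hypothetical helper)."""
    parts = DMS.findall(text)
    if len(parts) != 2:
        return None
    lat, lon = (dms_to_decimal(*p) for p in parts)
    return lat, lon

print(parse_coordinate_pair("38°01'10.5\"N 121°44'48.8\"W"))
print(parse_coordinate_pair("38 01'10.5\"N 121 44'48.8\"W"))
```

Both calls should yield roughly (38.0196, -121.7469). The second call shows the form with the degree signs missing.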

8 Local context
resident of Madison
Minister Ishihara
Ishihara, Japan (32.36 N 147.21 E)
Madison, WI; Madison, ID; Madison, CT; Madison, KY…

9 Context affects confidence
Increase or decrease c(p,n), the confidence that name n refers to point p, based on the strength of context words
–“by Madison” vs. “President Madison”
–context words can be added manually or learned automatically, and/or an HMM can be used
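
Here is a minimal sketch of adjusting c(p,n) using the word just before a name. It assumes a hand-built table of multiplicative weights. The words and weight values are invented for illustration. The slide leaves open whether such strengths are set manually, learned, or replaced by an HMM.

```python
# Hypothetical weights for the word immediately preceding a name.
# Values > 1 strengthen the geographic reading; values < 1 weaken it.
CONTEXT_WEIGHTS = {
    "near": 1.8,
    "in": 1.5,
    "by": 1.1,         # ambiguous: "by Madison" may mean near it or written by him
    "president": 0.1,  # "President Madison" points to a person
    "minister": 0.1,   # "Minister Ishihara" points to a person
}

def adjust_confidence(c, preceding_word):
    """Scale the confidence c = c(p, n) by the weight of the preceding word
    (1.0 if the word is unknown) and clamp the result to [0, 1]."""
    weight = CONTEXT_WEIGHTS.get(preceding_word.lower(), 1.0)
    return min(1.0, max(0.0, c * weight))

print(adjust_confidence(0.5, "President"))  # strongly lowered
print(adjust_confidence(0.5, "near"))       # raised
```

A multiplicative weight keeps an unknown context neutral. Clamping keeps the output usable as a confidence.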

10 Local context problems
Madison family attractions
Madison, WI; Madison, ID; Madison, CT; Madison, KY…
Milwaukee

11 Using spatial patterns of geographic references

12 Madison Milwaukee Wisconsin
Increase c(p,n) based on the number of other references to:
–enclosing regions
–nearby points
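
One way to sketch this boost assumes that each candidate carries its coordinates and the names of the regions that enclose it. The sketch counts the other references in the document that are either enclosing regions or points within some radius. It then raises c(p,n) by a fixed step for each supporting reference. The radius, step size, data layout, and the approximate coordinates in the usage lines are all assumptions.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def spatial_boost(c, candidate, other_refs, radius_km=150.0, step=0.1):
    """Raise c(p, n) by `step` for each other reference that is an enclosing
    region of the candidate or a point within radius_km of it.
    candidate: {"point": (lat, lon), "regions": set of region names}
    other_refs: list of ("region", name) or ("point", (lat, lon))."""
    support = 0
    for kind, value in other_refs:
        if kind == "region" and value in candidate["regions"]:
            support += 1
        elif kind == "point" and haversine_km(candidate["point"], value) <= radius_km:
            support += 1
    return min(1.0, c + step * support)

# Usage: the document also mentions Wisconsin and Milwaukee.
madison_wi = {"point": (43.07, -89.40), "regions": {"Wisconsin", "USA"}}
madison_ct = {"point": (41.28, -72.60), "regions": {"Connecticut", "USA"}}
refs = [("region", "Wisconsin"), ("point", (43.04, -87.91))]
print(spatial_boost(0.3, madison_wi, refs))  # about 0.5: two supporting references
print(spatial_boost(0.3, madison_ct, refs))  # 0.3: no support
```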

13 Pitfalls
Ishihara, Japan (32.36 N 147.21 E)
“Ishihara, Japan’s leading epidemiologist,”
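
The pitfall can be reproduced with a naive "Name, Country" pattern, which fires on the epidemiologist sentence. The guarded variant below adds one hypothetical rule: it rejects the match when the country name is possessive. The short country list is illustrative only.

```python
import re

# Naive pattern: a capitalised word followed by ", <Country>".
NAIVE = re.compile(r"\b([A-Z][a-z]+), (Japan|France|Syria)\b")
# Guarded pattern: reject a possessive country name ("Japan's", "Japan’s"),
# which usually introduces a description of a person, not an address.
GUARDED = re.compile(r"\b([A-Z][a-z]+), (Japan|France|Syria)\b(?!['’]s)")

person = "Ishihara, Japan’s leading epidemiologist,"
place = "Ishihara, Japan (32.36 N 147.21 E)"
print(bool(NAIVE.search(person)), bool(GUARDED.search(person)))  # True False
print(bool(GUARDED.search(place)))                               # True
```

The guard also rejects legitimate uses such as "Tokyo, Japan’s capital". Such hard rules therefore fit better as confidence adjustments than as filters.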

14 Training
“Philadelphia” is usually geographic; “Bend” usually isn’t
If name n often refers to point p in documents, give (n,p) a high initial confidence
Use the average confidence over a large corpus
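
This sketch shows one simple way to get a starting confidence for each (n,p) pair. It assumes a corpus in which mentions are hand-tagged with the place they refer to, or with None when they are not geographic. It then uses the relative frequency of each reading. The input format and the example tags are hypothetical.

```python
from collections import Counter, defaultdict

def initial_confidence(tagged_mentions):
    """Estimate a starting c for each (name, place) pair as the fraction of
    the name's occurrences in a tagged corpus that refer to that place.
    tagged_mentions: iterable of (name, place_id or None)."""
    totals = Counter()
    hits = defaultdict(Counter)
    for name, place in tagged_mentions:
        totals[name] += 1
        if place is not None:
            hits[name][place] += 1
    return {
        (name, place): count / totals[name]
        for name, places in hits.items()
        for place, count in places.items()
    }

mentions = ([("Philadelphia", "Philadelphia, PA")] * 9 + [("Philadelphia", None)]
            + [("Bend", "Bend, OR")] + [("Bend", None)] * 9)
print(initial_confidence(mentions))
# {('Philadelphia', 'Philadelphia, PA'): 0.9, ('Bend', 'Bend, OR'): 0.1}
```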

15 Training cont’d
Extract local linguistic contexts that often occur with geographic names in tagged corpora
Or train an HMM
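
As a minimal sketch of extracting such contexts, the function below considers only the single word before each candidate name. For every such word, it measures how often the following name was tagged as geographic. The resulting scores could then serve as context-word strengths for raising or lowering c(p,n). The token format and the `min_count` threshold are assumptions.

```python
from collections import Counter

def context_word_scores(tagged_sentences, min_count=2):
    """Return, for each word seen immediately before a candidate name at
    least min_count times, the fraction of cases where the name was geographic.
    tagged_sentences: lists of (word, is_geo) tokens; is_geo is None for
    tokens that are not candidate names."""
    geo, seen = Counter(), Counter()
    for tokens in tagged_sentences:
        for (prev, _), (_, is_geo) in zip(tokens, tokens[1:]):
            if is_geo is None:
                continue
            seen[prev.lower()] += 1
            if is_geo:
                geo[prev.lower()] += 1
    return {w: geo[w] / n for w, n in seen.items() if n >= min_count}

sentences = [
    [("lives", None), ("in", None), ("Madison", True)],
    [("in", None), ("Bend", True)],
    [("President", None), ("Madison", False)],
    [("President", None), ("Lincoln", False)],
    [("near", None), ("Bend", True)],
]
print(context_word_scores(sentences))  # {'in': 1.0, 'president': 0.0}
```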

16 Relevance
Several dimensions to relevance:
–traditional textual relevance of query terms
–georelevance
Query: “cheese” in France

17 Georelevance
Aim: the combination reflects the user’s preferred balance between recall and correctness of the geographic reference
e.g. georelevance = query term relevance × geoconfidence
Depends on:
–attributes of the geotext, e.g. document frequency, font size, position
–geoconfidence
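
The example combination on this slide can be sketched for the query “cheese” in France. Each document below carries a textual relevance score for “cheese” and a geoconfidence for its reference to France. An optional threshold shifts the balance toward correctness of the geographic reference at the cost of recall. The tuple format, threshold, and scores are invented for illustration. The geotext attributes (frequency, font size, position) are left out for brevity.

```python
def georelevance(term_relevance, geoconfidence):
    """The slide's example combination: query term relevance x geoconfidence."""
    return term_relevance * geoconfidence

def rank_documents(docs, min_geoconfidence=0.0):
    """docs: list of (doc_id, term_relevance, geoconfidence).
    Drops documents below min_geoconfidence, then sorts by georelevance."""
    kept = [d for d in docs if d[2] >= min_geoconfidence]
    return sorted(kept, key=lambda d: georelevance(d[1], d[2]), reverse=True)

docs = [("a", 0.9, 0.2), ("b", 0.6, 0.9), ("c", 0.4, 0.8)]
print([d[0] for d in rank_documents(docs)])       # ['b', 'c', 'a']
print([d[0] for d in rank_documents(docs, 0.5)])  # ['b', 'c']
```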

18 Conclusion
The ambiguity problem is much worse with large gazetteers
Probabilistic methods can be used where feasible (local information), combined with confidence-based heuristics

