Published by Amelia Chandler. Modified over 9 years ago.
1
A confidence-based framework for disambiguating geographic terms
Erik Rauch, Michael Bukatin, and Kenneth Baker
MetaCarta, Inc.
3
‘wine’ in Europe
4
Al Hamra (= ‘red’ in Arabic)
6
Local and non-local information
Madison Wisconsin Milwaukee’s downtown
More non-local information -> too many states to get probabilities
7
Candidate places
38 01'10.5"N 121 44'48.8"W
four miles south of Lusaka
–(22.10 S 15.51 E)
Deir az Zor
–(32.10 N 41.11 E), 0.325
–(25.03 N 31.44 E), 0.151
–(….)
confidence
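One way to picture the candidate list on this slide is a gazetteer that maps each name to several points, each carrying its own confidence. The sketch below is an assumption for illustration only: the `Candidate` class, the `GAZETTEER` dictionary, and the `candidates` function are hypothetical names, and only the two scored Deir az Zor entries come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    lat: float         # degrees, north positive
    lon: float         # degrees, east positive
    confidence: float  # c(p, n) for this point and name


# Hypothetical in-memory gazetteer; the two entries reuse the slide's values.
GAZETTEER = {
    "Deir az Zor": [
        Candidate(25.03, 31.44, 0.151),
        Candidate(32.10, 41.11, 0.325),
    ],
}


def candidates(name):
    """Return the candidate points for a name, most confident first."""
    found = GAZETTEER.get(name, [])
    return sorted(found, key=lambda c: c.confidence, reverse=True)
```

A name that is missing from the gazetteer simply yields an empty list, so callers can treat it as non-geographic.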
8
Local context
resident of Madison
Minister Ishihara
Ishihara, Japan (32.36 N 147.21 E)
Madison, WI; Madison, ID; Madison, CT; Madison, KY…
9
Context affects confidence
Increase or decrease c(p,n) based on strength of context words
–“by Madison” vs. “President Madison”
–context words can be added manually or automatically, and/or an HMM can be used
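The adjustment of c(p,n) by context words could be sketched roughly as follows. This is not the authors' implementation: the `CUE_STRENGTH` table, its numeric values, and the update rule are assumptions chosen only to show a strong cue moving the confidence further than a weak one. A weak cue would simply get a strength near zero.

```python
# Hypothetical cue strengths in [-1, 1]: positive cues favor a place reading,
# negative cues favor a non-place reading (for example a person's name).
CUE_STRENGTH = {
    "resident of": 0.4,
    "president": -0.8,
    "minister": -0.6,
}


def adjust_confidence(confidence, preceding_words):
    """Raise or lower a confidence using the words just before the name."""
    text = preceding_words.lower().strip()
    for cue, strength in CUE_STRENGTH.items():
        if text.endswith(cue):
            if strength >= 0:
                # Move part of the way toward 1.
                return confidence + strength * (1.0 - confidence)
            # Move part of the way toward 0.
            return confidence * (1.0 + strength)
    return confidence  # no known cue: leave the confidence unchanged
```

Both branches keep the result inside [0, 1] as long as the input confidence is, which avoids any separate clamping step.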
10
Local context problems
Madison family attractions
Madison, WI; Madison, ID; Madison, CT; Madison, KY…
Milwaukee
11
Using spatial patterns of geographic references
12
Madison Milwaukee Wisconsin
Increase c(p,n) based on number of other references:
Enclosing regions or nearby points
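A minimal sketch of this spatial heuristic, assuming a great-circle distance test for "nearby points" and a bounding-box test for "enclosing regions". The function names, the 200 km radius, the 0.25 step, and the update rule are all assumptions made for illustration, not values from the talk.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def spatial_boost(confidence, candidate, other_points, enclosing_regions,
                  radius_km=200.0, step=0.25):
    """Increase a confidence once per supporting reference in the same document.

    candidate: (lat, lon) of the point being scored.
    other_points: (lat, lon) pairs for other references in the document.
    enclosing_regions: (south, west, north, east) boxes for referenced regions.
    """
    lat, lon = candidate
    support = sum(
        1 for olat, olon in other_points
        if haversine_km(lat, lon, olat, olon) <= radius_km
    )
    support += sum(
        1 for south, west, north, east in enclosing_regions
        if south <= lat <= north and west <= lon <= east
    )
    # Each supporting reference closes a fixed fraction of the gap to 1.
    return 1.0 - (1.0 - confidence) * (1.0 - step) ** support


# Approximate coordinates, used only as example data.
milwaukee = (43.04, -87.91)
wisconsin_box = (42.49, -92.89, 47.08, -86.25)
madison_wi = spatial_boost(0.2, (43.07, -89.40), [milwaukee], [wisconsin_box])
madison_ct = spatial_boost(0.2, (41.28, -72.60), [milwaukee], [wisconsin_box])
```

With "Milwaukee" and "Wisconsin" in the same document, the Wisconsin candidate for "Madison" gains support from both references, while the Connecticut candidate gains none and keeps its starting confidence.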
13
Pitfalls
Ishihara, Japan (32.36 N 147.21 E)
Ishihara, Japan’s leading epidemiologist,
14
Training
“Philadelphia” is usually geographic; “Bend” usually isn’t
If name n often refers to point p in documents, give (n,p) high confidence to start with
Use average confidence in a large corpus
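The starting confidences could be estimated along these lines. The sketch assumes a hand-tagged corpus in which each mention of a name is labeled with the point it refers to, or with `None` when it is not geographic, and it uses a plain relative frequency as a stand-in for the "average confidence" on the slide. The function name and the data layout are hypothetical.

```python
from collections import Counter, defaultdict


def initial_confidences(tagged_mentions):
    """Estimate a starting confidence for each (name, point) pair.

    tagged_mentions: iterable of (name, point) pairs, where point is None
    when the mention was tagged as non-geographic.
    """
    totals = Counter()
    per_point = defaultdict(Counter)
    for name, point in tagged_mentions:
        totals[name] += 1
        if point is not None:
            per_point[name][point] += 1
    # Fraction of all mentions of the name that refer to this point.
    return {
        (name, point): count / totals[name]
        for name, points in per_point.items()
        for point, count in points.items()
    }


# Toy corpus: "Philadelphia" is mostly a place, "Bend" mostly is not.
mentions = (
    [("Philadelphia", "Philadelphia, PA")] * 3 + [("Philadelphia", None)]
    + [("Bend", "Bend, OR")] + [("Bend", None)] * 3
)
start = initial_confidences(mentions)
```

In this toy corpus the pair ("Philadelphia", "Philadelphia, PA") starts well above ("Bend", "Bend, OR"), matching the slide's intuition.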
15
Training cont’d
Extract local linguistic contexts that often occur with geographic names in tagged corpora
Or train an HMM
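Extracting local contexts from a tagged corpus might look like the sketch below. It makes strong simplifying assumptions: the "context" is only the single word before a name, the corpus is a flat list of (token, is_geographic) pairs, and the score is the fraction of times the next token was tagged geographic. None of this is stated in the talk, and the function name is hypothetical.

```python
from collections import Counter


def context_cue_scores(tagged_tokens, min_count=2):
    """Score each word by how often the token after it is a geographic name.

    tagged_tokens: list of (token, is_geo) pairs in document order.
    Words seen fewer than min_count times as a preceding word are dropped.
    """
    before_any = Counter()
    before_geo = Counter()
    for (prev, _), (_, next_is_geo) in zip(tagged_tokens, tagged_tokens[1:]):
        word = prev.lower()
        before_any[word] += 1
        if next_is_geo:
            before_geo[word] += 1
    return {
        word: before_geo[word] / count
        for word, count in before_any.items()
        if count >= min_count
    }


tokens = [
    ("in", False), ("Madison", True),
    ("in", False), ("Paris", True),
    ("President", False), ("Madison", False),
    ("President", False), ("Lincoln", False),
]
scores = context_cue_scores(tokens)
```

Words with a score near 1 are candidate positive cues and words near 0 are candidate negative cues; a real system would need far more data and longer contexts.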
16
Relevance
Several dimensions to relevance:
–Traditional textual relevance of query terms
–Georelevance
Query: “cheese” in France
17
Georelevance
Aim: combination reflects user’s preferred balance between recall and correctness of the geographic reference
e.g. Georelevance = query term relevance * geoconfidence
Depends on:
–Attributes of the geotext, e.g. document frequency, font size, position
–Geoconfidence
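The example combination on this slide, Georelevance = query term relevance * geoconfidence, can be sketched as a ranking function. The tuple layout, the function names, and the sample scores are assumptions for illustration; the slide gives the product only as one example of how the two scores might be combined.

```python
def georelevance(term_relevance, geoconfidence):
    """Combine textual relevance and geoconfidence as a simple product."""
    return term_relevance * geoconfidence


def rank(hits):
    """Sort hits by georelevance, best first.

    hits: list of (doc_id, term_relevance, geoconfidence) tuples.
    """
    return sorted(hits, key=lambda h: georelevance(h[1], h[2]), reverse=True)


# Made-up scores for a query such as "cheese" in France.
hits = [
    ("doc_a", 0.9, 0.2),  # strong text match, doubtful place reference
    ("doc_b", 0.6, 0.8),  # weaker text match, confident place reference
]
ranked = rank(hits)
```

With a product, a document with a doubtful geographic reference is ranked below one with a weaker text match but a confident reference. A different combination would shift that balance between recall and correctness.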
18
Conclusion
Ambiguity problem much worse with large gazetteers
Can use probabilistic methods where feasible (local information), combine with confidence-based heuristics