Alexandria Digital Library Project: Textual-Geospatial Integration Project. James Frew, University of California, Santa Barbara.

1 Alexandria Digital Library Project: Textual-Geospatial Integration Project. James Frew, University of California, Santa Barbara

2 Geospatially-Augmented Search
Textual-Geospatial Integration, Frew, JCDL 2002 gazetteer workshop, 2002-07-18
o What’s here?
  ▪ Find library objects associated with a given location:
    – Place name(s)
    – “Footprint” (geographic extent)
o Where’s this?
  ▪ Find the location(s) associated with a given library object

3 Examples (from TREC-9)
o Find documents that contain residential real estate listings within New Jersey.
o Find reports on automobile traffic in the Washington, DC metropolitan area.
o What forms of entertainment are available in Newport Beach, California?

4 Why Is GAS® Difficult?
o Few library objects have explicit locations
  ▪ Assigned reliably
  ▪ Identified in object’s metadata
o Many objects (especially text documents) have implicit locations
  ▪ Present in, or inferable from, object’s content
  ▪ Not necessarily identified as locations

5 “Where’s This” Service
[Diagram: four-stage pipeline PARSE → LOOKUP → ANALYZE → EVALUATE]
o Input: text document
o Resources: type thesaurus, gazetteer
o Intermediate results: potential names, types, coordinates; gazetteer entries (known places); ranked footprints and placenames
o Output: “best” name(s), composite footprint

6 Geo-parsing
o Extract “geographic facts” from text
o Characterize by
  ▪ Potential place component
    – name, type, footprint
  ▪ Related fact (with preposition)
    – “in …”, “northeast of …”, etc.
  ▪ Frequency
  ▪ Importance
  ▪ Context

7 Geo-parsing Example (1/2)
(California,,,,1,K)
(Callahan,,,(in,California),1,K)
(Callahan-Yreka,,,(area of,),1,T)
(Early Cambrian,,,,1,B)
(Klamath Mountains,,,(eastern,),1,T)
(Klamath Mountains,,,(within,),1,B)
(Klamath Mountains,,,,1,K)
(Northern California,,,,1,T)
(Ordovician,,,,1,B)
(Ordovician,,,,1,K)
(Paleozoic,,,(in,California),1,B)
(Paleozoic,,,,1,K)

8 Geo-parsing Example (2/2)
(Silurian,,,,1,K)
(Siskiyou County,,,(in,California),1,K)
(Skookum Gulch,,,,1,K)
(Skookum Gulch,,,,1,T)
(Skookum Gulch,,,,2,B)
(United States,,,,1,K)
(Yreka,,,(in,California),1,K)
(,fault,,,2,B)
(,rocks,,,6,B)
(,,N410000N420000W1220000W1230000,,1,C)
(,,,(in,North America),1,B)
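
The tuples in the two example slides appear to share one six-field layout: name, type, footprint, related fact, frequency, and a one-letter code. The following is a minimal Python sketch of how such strings could be parsed, not the project's actual parser. The names `GeoFact`, `split_top_level`, and `parse_fact` are hypothetical, and because the slides do not define the trailing letter (K, T, B, C), the sketch keeps it as an opaque string.

```python
from typing import List, NamedTuple, Optional, Tuple


class GeoFact(NamedTuple):
    """One parsed tuple; field names are assumptions based on slide 6."""
    name: str
    feature_type: str
    footprint: str
    related: Optional[Tuple[str, str]]  # (preposition, object), if present
    frequency: int
    code: str  # trailing letter, meaning not defined in the slides


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_fact(raw: str) -> GeoFact:
    """Parse one string such as '(Callahan,,,(in,California),1,K)'."""
    inner = raw.strip()[1:-1]  # drop the outer parentheses
    name, ftype, footprint, related, freq, code = split_top_level(inner)
    rel = None
    if related:
        prep, obj = split_top_level(related[1:-1])
        rel = (prep, obj)
    return GeoFact(name, ftype, footprint, rel, int(freq), code)
```

For example, `parse_fact("(Callahan,,,(in,California),1,K)")` yields a `GeoFact` whose `related` field is `("in", "California")`, while an entry such as `(,rocks,,,6,B)` has an empty name and a `feature_type` of `"rocks"`.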

9 Lookup Example: Feature Type
o Fault: partial match: fault zones
o Rocks: use: natural rock formations

10 Lookup Example: Gazetteer
Place name: exact partial
Skookum Gulch: 1 0
Klamath Mountains: 1 0
Northern California: 1 0
California: 1492
Callahan*: 1 1
Silurian: 0 5
Siskiyou County*: 1 14
United States: 1273
Yreka*: 1 12
North America: 0 8
*within footprint of California
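
The exact and partial columns can be read as counts of gazetteer entries matching each candidate name. The slides do not say how the gazetteer defines a partial match, so the sketch below assumes the simplest rule: a case-insensitive substring match that is not an exact match. The function `count_matches` and the toy name list are hypothetical.

```python
from typing import Iterable, Tuple


def count_matches(name: str, gazetteer_names: Iterable[str]) -> Tuple[int, int]:
    """Return (exact, partial) match counts for one candidate place name.

    Assumed rule: exact means case-insensitive equality; partial means the
    candidate occurs inside a longer gazetteer name.
    """
    target = name.casefold()
    exact = 0
    partial = 0
    for entry in gazetteer_names:
        folded = entry.casefold()
        if folded == target:
            exact += 1
        elif target in folded:
            partial += 1
    return exact, partial


# Toy data for illustration only; not taken from any real gazetteer.
toy_names = ["Yreka", "Yreka City", "West Yreka", "Callahan"]
yreka_counts = count_matches("yreka", toy_names)
```

A real gazetteer service would run these lookups as queries rather than scanning a list, and might also restrict partial matches to entries inside a known footprint, as the asterisked rows on the slide suggest.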

11 Analysis Criteria
o Placement in document
  ▪ e.g. keywords, title > body
o Frequency in document
o Exact match in gazetteer
o Accuracy of gazetteer footprint
  ▪ e.g. points < bounding boxes
o Scale of gazetteer footprint
  ▪ Size of focus area / size of footprint
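
One way to combine these five criteria is a simple additive score. The slides do not give weights or a formula, so everything numeric below is an assumption made for illustration: each criterion contributes at most 1.0, frequency saturates at five mentions, and the scale term is the focus-area-to-footprint ratio capped at 1.0. The names `Candidate` and `confidence` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: the slide only says keywords and title outrank body.
PLACEMENT_SCORE = {"keywords": 1.0, "title": 1.0, "body": 0.5}


@dataclass
class Candidate:
    name: str
    placement: str          # "keywords", "title", or "body"
    frequency: int          # mentions in the document
    exact_match: bool       # exact gazetteer match found
    footprint_is_box: bool  # bounding box (True) or point (False)
    footprint_area: float   # same units as focus_area; 0.0 for a point


def confidence(c: Candidate, focus_area: float) -> float:
    """Additive score over the five criteria; higher means more confident."""
    placement = PLACEMENT_SCORE.get(c.placement, 0.0)
    frequency = min(c.frequency, 5) / 5.0
    exact = 1.0 if c.exact_match else 0.0
    accuracy = 1.0 if c.footprint_is_box else 0.5  # points < bounding boxes
    if c.footprint_area <= focus_area:
        scale = 1.0
    else:
        scale = focus_area / c.footprint_area  # large footprints score low
    return placement + frequency + exact + accuracy + scale


# Made-up candidates: a small place versus a very large one.
small = Candidate("town-sized place", "keywords", 1, True, False, 0.0)
large = Candidate("country-sized place", "keywords", 1, True, True, 1000.0)
```

With a focus area of 1.0, the small place outscores the large one because the scale term penalizes footprints much bigger than the area the document is about.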

12 Analysis Example: Results
o High confidence
  ▪ Callahan in California
  ▪ Yreka in California
  ▪ Skookum Gulch
  ▪ Klamath Mountains (eastern)
  ▪ Siskiyou County
o Low confidence
  ▪ Northern California
  ▪ United States
  ▪ North America

13 Evaluation Example
Skookum Gulch; Klamath Mountains; California; Callahan in California; Siskiyou County in California; United States; Yreka in California
Additional placenames: Shasta Butte City; Yreka City; Thompson's Dry Diggings; Eastern Klamath Mountains; Area of Callahan-Yreka; Skookum Gulch
[Figure: derived footprint]
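
The slides do not say how the derived footprint is computed. One plausible reading is the smallest bounding box enclosing the footprints of the retained places. The sketch below assumes that reading, and also assumes that the coordinate string in the geo-parsing example (`N410000N420000W1220000W1230000`) encodes two latitudes and two longitudes as hemisphere letter plus degrees, minutes, and seconds. The functions `dms_to_degrees`, `parse_box`, and `union_boxes` are hypothetical.

```python
import re
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east) in degrees


def dms_to_degrees(hemisphere: str, digits: str) -> float:
    """Convert e.g. ('W', '1220000') to -122.0, assuming D..DMMSS digits."""
    degrees = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in "SW" else value


def parse_box(text: str) -> Box:
    """Parse a string holding two latitudes and two longitudes."""
    parts = re.findall(r"([NSEW])(\d+)", text)
    lats = [dms_to_degrees(h, d) for h, d in parts if h in "NS"]
    lons = [dms_to_degrees(h, d) for h, d in parts if h in "EW"]
    return (min(lats), min(lons), max(lats), max(lons))


def union_boxes(boxes: Iterable[Box]) -> Box:
    """Smallest box enclosing all inputs (ignores antimeridian wraparound)."""
    boxes = list(boxes)
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


document_box = parse_box("N410000N420000W1220000W1230000")
```

Under these assumptions the example string describes a one-degree cell from 41°N to 42°N and 123°W to 122°W. A weighted combination, for instance one favoring high-confidence places, would be an equally reasonable design.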

14 “What’s Here” Service
[Diagram components: Gazetteer; AIRE; Document Ranker; User Interface; Query Parser; Query Expansion]
Example Query: Bodies of Water near Chicago
Expansion Terms: Lake Michigan, Chicago River

15 Manual Relevance Feedback
[Diagram components: Gazetteer; AIRE; Query Parser; User Interface]
[Diagram labels: Place Names “Chicago”; Spatial Synonyms “Chicago, IL”, “Chicago River”; Query]

16 Automatic Relevance Feedback
[Diagram components: Gazetteer; AIRE; Document Ranker; RF System]
[Diagram labels: Place Names, Surrounding Type Terms “Bodies of Water”; Spatial Query Results “Chicago River, Lake Michigan”; Expanded Query]

17 “What’s Here” Components
o Place names → footprints
  ▪ Requires: place name ranking scheme
    – Chicago, IL > Chicago tectonic plate in Brazil
o Type terms → classes
  ▪ Requires: class thesaurus API
    – “Bodies of Water” → “Water Bodies”
o Gazetteer → spatial synonyms
  ▪ Requires: gazetteer API; results ranking
    – “Bodies of Water near Chicago” → set of gazetteer queries
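
The three components above can be sketched end to end on the deck's own example query. This is a toy illustration, not the AIRE or gazetteer API: the thesaurus is a one-entry dictionary, the gazetteer is an in-memory list, the bounding boxes are rough illustrative values rather than authoritative coordinates, and "near" is assumed to mean that a feature's box intersects the place's box. All names (`THESAURUS`, `GAZETTEER`, `intersects`, `expand_query`) are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east)

# Type terms -> classes (the one mapping shown on the slide).
THESAURUS = {"bodies of water": "water bodies"}

# Toy gazetteer; boxes are approximate and for illustration only.
GAZETTEER = [
    {"name": "Chicago", "cls": "populated places",
     "box": (41.64, -87.94, 42.02, -87.52)},
    {"name": "Lake Michigan", "cls": "water bodies",
     "box": (41.60, -88.00, 46.10, -84.80)},
    {"name": "Chicago River", "cls": "water bodies",
     "box": (41.83, -87.75, 41.98, -87.61)},
    {"name": "Lake Tahoe", "cls": "water bodies",
     "box": (38.90, -120.16, 39.25, -119.93)},
]


def intersects(a: Box, b: Box) -> bool:
    """True if two (south, west, north, east) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def expand_query(query: str) -> List[str]:
    """Turn '<type term> near <place>' into a sorted list of spatial synonyms."""
    type_term, _, place = query.partition(" near ")
    cls = THESAURUS.get(type_term.strip().lower())
    footprint = next(
        (e["box"] for e in GAZETTEER
         if e["name"].lower() == place.strip().lower()),
        None,
    )
    if cls is None or footprint is None:
        return []
    return sorted(
        e["name"] for e in GAZETTEER
        if e["cls"] == cls and intersects(e["box"], footprint)
    )


expansion_terms = expand_query("Bodies of Water near Chicago")
```

On this toy data the expansion terms are Chicago River and Lake Michigan, matching the example on the service slide. The sketch skips the ranking step the slide calls for: a real place name lookup would need to prefer Chicago, IL over other features named Chicago.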

18 The Light at the End of the Tunnel
o You submit:
  ▪ a document
o You get:
  ▪ a place
    – Best
    – Also-rans
    – Alternatives
o What you do with this is your business

19 Brought To You By
o UCSB
  ▪ Linda Hill
  ▪ Greg Janée
  ▪ Dave Valentine
  ▪ Satoshi Ikeda (Japan Patent Office)
o IIT
  ▪ Steven Beitzel
  ▪ Ophir Frieder
  ▪ David Grossman
  ▪ Eric Jensen
  ▪ Vasif Shaikh
