Geographical Information Retrieval. GeoInfo 2006 presentation by Christopher Jones, Cardiff University. See www.geo-spirit.org.


1 Geographical Information Retrieval
Christopher Jones, Cardiff University (GeoInfo 2006 presentation)
See www.geo-spirit.org for information on the SPIRIT project, the contributing partners, and downloads of articles and project deliverables.

2 What is Geo-information?
Geo-information associates things and events with places.
Rich vocabulary: place names, coordinates, geometric objects, spatial relationships, spatial structures, patterns, paths, flows, interactions…

3 Where is Geo-information?
Personal knowledge
– of landscape, of where things, people and services are located, where things happened
Documents (various media)
– Lists of where facilities, resources, structures are located
– Textual descriptions of geographic phenomena
– Images and videos of geographic space
Maps

4 GIS and the Web
A GIS is typically:
– Isolated
– Supports an individual organisation
– Small range of topics
– Structured data / geo-coded locations
– Finds answers
– Accessed privately
– Complicated to use
The World Wide Web is:
– Globally networked
– Supports everyone on the Internet
– Vast range of topics
– Unstructured free text / images
– Finds documents
– Accessed publicly
– Easy to use

5 Problems with WWW as a source of geo-information
– Geographic context is embedded in natural language descriptions
– Place names are ambiguous and confused with names of organisations, people, buildings and streets
– Web queries depend on exact match of text terms
– No intelligent interpretation of spatial relationships (near, west, etc.)
– No geo-relevance ranking

6 Current motivation of GIR: find geo-specific resources on the Web
Find web resources about Something related_to Somewhere, where related_to = in, near, within X km, north_of, etc.
– Resolve ambiguity of names (many places have the same name)
– Interpret the query's spatial relationships to obtain a query footprint
– Find documents geographically associated with the region of the query footprint
– Relevance-rank geographically by place and subject
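The "Something related_to Somewhere" pattern above can be sketched in Python. This is only an illustration: the `GeoQuery` type, the `parse_geo_query` function and the list of relation phrases are assumptions made for the example, not part of SPIRIT, and a real parser would also need to cope with place names that themselves contain relation words.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    concept: str                       # the "Something"
    relation: str                      # the "related_to"
    place: str                         # the "Somewhere"
    distance_km: Optional[float] = None


# Hypothetical relation phrases; multi-word phrases are tried before "in".
_RELATIONS = ["north of", "south of", "east of", "west of", "outside", "near", "in"]

_WITHIN = re.compile(
    r"^(?P<concept>.+?)\s+within\s+(?P<dist>\d+(?:\.\d+)?)\s*km\s+of\s+(?P<place>.+)$",
    re.IGNORECASE,
)


def parse_geo_query(text: str) -> Optional[GeoQuery]:
    """Split a query into concept, spatial relation and place, if possible."""
    text = text.strip()
    m = _WITHIN.match(text)
    if m:
        return GeoQuery(m["concept"], "within", m["place"], float(m["dist"]))
    for rel in _RELATIONS:
        rel_pattern = rel.replace(" ", r"\s+")
        m = re.match(
            rf"^(?P<concept>.+?)\s+{rel_pattern}\s+(?P<place>.+)$",
            text,
            re.IGNORECASE,
        )
        if m:
            return GeoQuery(m["concept"], rel.replace(" ", "_"), m["place"])
    return None  # no recognised spatial relation


castles = parse_geo_query("castles in Wales")
hotels = parse_geo_query("hotels within 30 km of Rio")
```

The place part would then go to query disambiguation and be converted into a query footprint.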

7 GIR, GIS and The Web
[Diagram labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

8 Geographical Search Engines
Google etc. have local versions.
– Based on business (yellow pages) directories.

9 Geographical Search Engines
SPIRIT research prototype: general geo-web search
Structured user interface: dropdown menu of spatial relationships

10 Geographical search engines: SPIRIT
Results listed as URLs, plus symbols on map
User interface screenshots from Ross Purves et al., University of Zurich

11 Anatomy of a Geographical Search Engine
[Architecture diagram. Components: User Interface, Broker, Query disambiguation, Place Ontology, Search Engine, Textual and Spatial Indexes, Relevance Ranking, Geo-tagging, Text Indexing, Web Resources, Document Footprints. Flows: Search Request + Query footprint, Unranked Results, Ranked Results.]

12 Geo-Tagging = Geo-parsing + Geo-coding
Geo-parsing
– Recognising geographic references (ignoring non-geographic uses of place terminology)
Geo-coding
– Attaching a unique quantitative location (footprint) to geographic references

13 Geo-parsing
The presence of place names can be recognised with gazetteers (i.e. lists of names).
Some types of genuine geographic reference:
– the name of the place: Sao Paulo
– an address: School of Computer Science, Cardiff University, 5 The Parade, Cardiff
– an address fragment: Ross lived in Dalmeny Street in Edinburgh
– a postcode / zip code: CF24 3XF
– a phone number: most Cardiff phone numbers start with 02920

14 Geo-Parsing: true & false references
Some types of false geographic reference:
– Personal names: Smedes York, Jack London
– Business names: Dorchester Hotel, York Properties…
– Street names: Oxford Street, London Road…
– Common words that are also places: bath, battle, derby, over, well, …

15 Geo-Parsing: distinguishing between false and true geo-references
Look for patterns and context:
– Personal names (Jack London, Mr York): [patterns missing from transcript]
– Business names (Paris Hotel): [pattern missing from transcript] (or vice versa)
– Street names (Oxford Street): [pattern missing from transcript]
– Detect spatial prepositions (in, near, south of, outside, etc.): he lived in Over
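A minimal Python sketch of gazetteer lookup with context filtering, in the spirit of the rules above. The gazetteer, the word lists and the `geoparse` function are toy assumptions for illustration. The rules only look at the neighbouring tokens and are not the patterns used in SPIRIT.

```python
import re

# Toy gazetteer and context word lists (illustrative assumptions only).
GAZETTEER = {"London", "York", "Oxford", "Paris", "Over", "Cardiff", "Edinburgh"}
AMBIGUOUS_COMMON_WORDS = {"Over", "Bath", "Well", "Battle", "Derby"}
PERSON_TITLES = {"Mr", "Mrs", "Ms", "Dr"}
FIRST_NAMES = {"Jack", "Smedes"}
BUSINESS_WORDS = {"Hotel", "Properties"}
STREET_WORDS = {"Street", "Road"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside", "of"}  # "of" as in "south of"


def geoparse(text):
    """Return gazetteer names that look like genuine place references."""
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    for i, tok in enumerate(tokens):
        if tok not in GAZETTEER:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Personal name: title or first name before the place name.
        if prev in PERSON_TITLES or prev in FIRST_NAMES:
            continue
        # Business name: place name next to a business word (either order).
        if nxt in BUSINESS_WORDS or prev in BUSINESS_WORDS:
            continue
        # Street name: place name followed by a street word.
        if nxt in STREET_WORDS:
            continue
        # Common words need a spatial preposition in front to count as places.
        if tok in AMBIGUOUS_COMMON_WORDS and prev.lower() not in SPATIAL_PREPOSITIONS:
            continue
        found.append(tok)
    return found


places = geoparse("he lived in Over")
```

With these toy lists, "he lived in Over" keeps Over because of the preceding "in". "Jack London", "Paris Hotel" and "Oxford Street" are all rejected.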

16 Geo-coding (grounding) the genuine geo-references
Many different places have the same name (referent ambiguity): Newport, Cambridge, Springfield…
– Use context to decide (references to parent or nearby places)
– Or choose the most important one (by population or place type hierarchy)
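The two grounding strategies can be sketched as follows. The `Candidate` records, the `size` values (made-up relative sizes, not real population figures) and the `ground` function are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str
    size: int  # made-up relative importance, NOT real population data


# Hypothetical candidate referents for one ambiguous name.
CANDIDATES = {
    "Newport": [
        Candidate("Newport", "Wales", 3),
        Candidate("Newport", "Isle of Wight", 1),
        Candidate("Newport", "Rhode Island", 1),
    ],
}


def ground(name, context_places):
    """Pick a referent: prefer parents mentioned in context, else the largest."""
    candidates = CANDIDATES.get(name, [])
    if not candidates:
        return None
    supported = [c for c in candidates if c.parent in context_places]
    pool = supported or candidates  # fall back to importance if no context helps
    return max(pool, key=lambda c: c.size)


with_context = ground("Newport", {"Rhode Island"})
without_context = ground("Newport", set())
```

A document that also mentions Rhode Island grounds Newport there. With no supporting context the sketch falls back to the candidate with the largest assumed size.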

17 Geo-Coding: what is the geo-focus of a web page?
– Frequency of occurrence
– Do multiple places have a common parent?
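One way to combine the two cues is sketched below, as an assumption rather than the method used in SPIRIT: if all the distinct places mentioned share one parent, take the parent as the focus, otherwise take the most frequently mentioned place. The `PARENT` table and `geo_focus` function are hypothetical.

```python
from collections import Counter

# Hypothetical parent lookup, as might come from a place ontology.
PARENT = {
    "Cardiff": "Wales",
    "Swansea": "Wales",
    "Newport": "Wales",
    "Bristol": "England",
}


def geo_focus(mentions):
    """Estimate the geographic focus from a list of grounded place mentions."""
    if not mentions:
        return None
    counts = Counter(mentions)
    parents = {PARENT.get(place) for place in counts}
    # Several distinct places that all share one known parent: focus on the parent.
    if len(counts) > 1 and len(parents) == 1 and None not in parents:
        return parents.pop()
    # Otherwise fall back to frequency of occurrence.
    return counts.most_common(1)[0][0]


focus = geo_focus(["Cardiff", "Swansea", "Cardiff"])
```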

18 Anatomy of a Geographical Search Engine
[Architecture diagram repeated from slide 11.]

19 Indexing Web Resources
Standard text index is an inverted file.
Query: Restaurants in Cardiff
– Find documents that contain all terms
– Works literally for "in" but won't find contained places
– Doesn't work in general for near, X km from, north_of, etc.

Text term      List of resources containing term
apple          Doc79, Doc89, Doc822, …
Cardiff        Doc2, Doc19, Doc37, …
door           Doc16, Doc49, Doc112, …
hotel          Doc1, Doc2, Doc23, …
in             Doc4, Doc7, Doc19, …
London         Doc20, Doc35, Doc150, …
pub            Doc9, Doc11, Doc100, …
restaurant     Doc19, Doc22, Doc37, …
…              …
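A small Python sketch of an inverted file and an all-terms query, showing why a contained place is missed. The three documents are invented for the example, and Roath is used as an example of a district inside Cardiff.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Invented example documents.
docs = {
    "Doc1": "restaurant in Cardiff",
    "Doc2": "restaurant in Roath",   # relevant, but never says "Cardiff"
    "Doc3": "hotel near Cardiff",
}
index = build_inverted_index(docs)
hits = search_all_terms(index, "restaurant in Cardiff")
```

Only Doc1 is returned. Doc2 is about a place inside Cardiff but does not contain the term, which is the gap that spatial indexing is meant to close.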

20 Why Spatial Indexing?
Query: Castles in Wales
– Need to find documents that refer to names of places in Wales (perhaps without mentioning Wales)
Query: Hotels outside and within 30 km of Rio
– Need to find documents referring to hotels that are in places other than Rio
In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Rio within 30 km.

21 Spatial indexing of resources
– Use prime foci of documents to create document footprints (point, polygon, bounding rectangle, …)
– Use footprints to index documents
– Convert query to a query footprint
– Match query footprint to document footprints
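A minimal sketch of footprint matching with bounding rectangles. The `BBox` type, the coordinates and the function names are assumptions for illustration. The sketch scans every footprint linearly, whereas a real system would look footprints up through a spatial index such as a grid or an R-tree.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a, b):
    """True if two axis-aligned rectangles overlap or touch."""
    return (
        a.min_x <= b.max_x and b.min_x <= a.max_x
        and a.min_y <= b.max_y and b.min_y <= a.max_y
    )


def match_footprints(query, doc_footprints):
    """Return ids of documents whose footprint intersects the query footprint."""
    return sorted(
        doc_id for doc_id, fp in doc_footprints.items() if intersects(query, fp)
    )


# Arbitrary example coordinates, not real locations.
doc_footprints = {
    "Doc1": BBox(0.0, 0.0, 2.0, 2.0),
    "Doc2": BBox(5.0, 5.0, 6.0, 6.0),
    "Doc3": BBox(1.5, 1.5, 3.0, 3.0),
}
query_footprint = BBox(1.0, 1.0, 2.5, 2.5)
matched = match_footprints(query_footprint, doc_footprints)
```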

22 Combining text and spatial indexing: spatio-textual indexing
– Space-primary (ST): textual index for each spatial cell
– Text-primary (TS): spatial index for each term
– Separate S and T indexes (T)

23 Spatial-primary (ST) method
– Each spatial cell has a text index
– Retrieve document ids for query terms lying in cells intersected by query footprint
– High storage overhead with multiple text indexes
[Figure: spatial cells A, B, C, D with a spatial query and results. Text Index B lists term1, term2, term3; Text Index D lists term2, term4, term5.]
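A sketch of the space-primary layout as nested dictionaries: cell, then term, then document ids. The document ids and the `st_query` function are invented for illustration. The cells and term names follow the figure.

```python
def st_query(st_index, query_cells, terms):
    """Space-primary lookup: open the text index of each intersected cell."""
    results = set()
    for cell in query_cells:
        text_index = st_index.get(cell, {})
        postings = [text_index.get(term, set()) for term in terms]
        if postings:
            # Documents in this cell that contain all query terms.
            results |= set.intersection(*postings)
    return results


# One text index per spatial cell (document ids are made up).
st_index = {
    "B": {"term1": {"D1", "D7"}, "term2": {"D1"}, "term3": {"D7"}},
    "D": {"term2": {"D3", "D11"}, "term4": {"D3"}, "term5": {"D13"}},
}

hits = st_query(st_index, ["B", "D"], ["term2"])
```

Each term's postings are repeated in every cell where the term occurs, which is where the storage overhead noted above comes from.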

24 Text-primary spatial indexing
– For each term, store a spatial index of the documents containing the term
– For each text query term, retrieve ids of documents lying in spatial cells intersecting the query footprint
– High storage overhead (multiple spatial indexes)
– Time performance better than ST
Index entry: term2 : cellB(D1, D7); cellD(D3, D11, D13)…
[Figure: text index of term1, term2, term3, each pointing to its own spatial index over cells A, B, C, D, with a spatial query and results.]
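The text-primary layout inverts the nesting: term, then cell, then document ids. In this sketch the `term2` entry is the index entry from the slide. The `term1` entry and the `ts_query` function are assumptions added for illustration.

```python
def ts_query(ts_index, query_cells, terms):
    """Text-primary lookup: for each term, collect docs in intersected cells."""
    per_term = []
    for term in terms:
        cells = ts_index.get(term, {})
        docs = set()
        for cell in query_cells:
            docs |= cells.get(cell, set())
        per_term.append(docs)
    # A document must match every query term.
    return set.intersection(*per_term) if per_term else set()


ts_index = {
    "term2": {"B": {"D1", "D7"}, "D": {"D3", "D11", "D13"}},  # entry from the slide
    "term1": {"B": {"D1"}, "C": {"D5"}},                      # made-up entry
}

hits = ts_query(ts_index, ["B", "D"], ["term2"])
```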

25 Separate spatial and textual index
– Access spatial index with query footprint
– Access text index with concept terms
– Merge results: find intersection
– Relatively small storage overhead with spatial index
– Time performance superior (in latest experiments)

Term1    D1, D2, D23, …
Term2    D9, D11, D100, …
Term3    D27, D85, …
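A sketch of the separate-index approach: one lookup in each index, then a set intersection. The text index rows come from the slide. The spatial index contents and the `separate_index_query` function are assumptions for illustration.

```python
def separate_index_query(text_index, spatial_index, query_cells, terms):
    """Query text and spatial indexes independently, then intersect the results."""
    # Spatial side: all documents in cells intersected by the query footprint.
    spatial_hits = set()
    for cell in query_cells:
        spatial_hits |= spatial_index.get(cell, set())

    # Text side: documents containing every concept term.
    text_hits = None
    for term in terms:
        postings = text_index.get(term, set())
        text_hits = set(postings) if text_hits is None else text_hits & postings
    if text_hits is None:  # no concept terms: purely spatial query
        return spatial_hits

    return text_hits & spatial_hits


text_index = {
    "Term1": {"D1", "D2", "D23"},
    "Term2": {"D9", "D11", "D100"},
    "Term3": {"D27", "D85"},
}
# Made-up spatial index: cell id -> documents whose footprint falls in the cell.
spatial_index = {"B": {"D1", "D9", "D23"}, "D": {"D2", "D11"}}

hits = separate_index_query(text_index, spatial_index, ["B"], ["Term1"])
```

Each document id is stored once per term and once per cell, which is consistent with the relatively small storage overhead noted above.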

26 Anatomy of a Geographical Search Engine
[Architecture diagram repeated from slide 11.]

27 Geographical Relevance Ranking
– Determine distance between query footprint and document footprint
– Depends on query spatial operator (in, outside, X km, north_of, etc.)
– Spatial score
Example: airports near Leicester; the further away, the lower the spatial score
[Figure with labels D and Q. Figure from Marc van Kreveld, University of Utrecht]

28 Combining textual and spatial scores
– Textual scores: BM25
– Spatial scores: by spatial footprint analysis
[Figure: plot of normalized BM25 score against spatial score, both from 0 to 1, showing the query / ideal footprint and the footprints of documents. Figure from Marc van Kreveld, University of Utrecht]
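A hedged sketch of how a distance-based spatial score for "near" might be combined with a normalized text score. The exponential decay, the `scale_km` parameter and the weighted sum are assumptions chosen for illustration. The slides do not state the formula actually used in SPIRIT.

```python
import math


def near_score(distance_km, scale_km=10.0):
    """Spatial score in (0, 1]: 1 at distance 0, decaying with distance."""
    return math.exp(-max(distance_km, 0.0) / scale_km)


def combined_score(text_score, spatial_score, text_weight=0.5):
    """Weighted sum of a normalized text score and a spatial score (both 0..1)."""
    return text_weight * text_score + (1.0 - text_weight) * spatial_score


def rank(docs, scale_km=10.0, text_weight=0.5):
    """Rank (doc_id, normalized_text_score, distance_km) tuples, best first."""
    scored = [
        (combined_score(text, near_score(dist, scale_km), text_weight), doc_id)
        for doc_id, text, dist in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Made-up scores: A matches the text slightly better but is far from the query.
ranking = rank([("A", 0.9, 100.0), ("B", 0.8, 1.0)])
```

Here B outranks A: its small distance to the query footprint more than compensates for the slightly lower text score.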

29 Anatomy of a Geographical Search Engine
[Architecture diagram repeated from slide 11.]

30 Place Ontology
Encodes knowledge of terminology and structure of geographic space:
– alternative names, languages
– place types (political, topographic, social, …)
– footprint (point, MBR, polygon)
– spatial relationships and attributes: containment, adjacency, overlap
– imprecise (vernacular) places (Midlands, south of France, Scottish borders, Pennines, Highlands, …)
Derive from gazetteers, thesauri, maps & web

31 Roles of Place Ontology
[Diagram labels: ontology, User Interface, Query Disambiguation, Query Expansion (query footprint), Geo-Tagging, Metadata Extraction, Web collection, document footprints, Spatial Index, Relevance Ranking, Search Component]

32 Mining text on the web for vernacular place name knowledge
Objective: estimate spatial extent of vague place
– Documents that refer to vague places may also refer to more precise places inside them
– Places that occur frequently in association with a target named place may have a higher chance of being inside it
– Analyse frequency of occurrence of co-located places

33 Places mentioned in documents retrieved by queries on the Cotswolds
[Figure from Ross Purves et al., University of Zurich]

34 Summary of web mining procedure
– Submit web search engine queries referring to a target place
– Geo-parse the resulting highest-ranking web pages for occurrences of place names
– Geocode place names with coordinates
– Create density surface model of co-occurring places and extract approximate boundary (contour)
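The last step can be sketched in a simplified form: count geocoded points per grid cell, keep the cells above a threshold, and report their extent. Raw cell counts stand in for a smoothed density surface and a bounding rectangle stands in for a contour, so this is a rough approximation of the procedure. The coordinates and function names are invented.

```python
from collections import Counter


def density_grid(points, cell_size):
    """Count geocoded points per square grid cell."""
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


def cells_above(grid, threshold):
    """Cells whose point count reaches the threshold."""
    return {cell for cell, count in grid.items() if count >= threshold}


def approximate_extent(cells, cell_size):
    """Bounding rectangle (min_x, min_y, max_x, max_y) of the selected cells."""
    if not cells:
        return None
    xs = [cx for cx, _ in cells]
    ys = [cy for _, cy in cells]
    return (
        min(xs) * cell_size,
        min(ys) * cell_size,
        (max(xs) + 1) * cell_size,
        (max(ys) + 1) * cell_size,
    )


# Invented coordinates: a cluster near the origin plus one wrongly geocoded outlier.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 0.9), (9.5, 9.5)]
grid = density_grid(points, cell_size=1.0)
core = cells_above(grid, threshold=2)
extent = approximate_extent(core, cell_size=1.0)
```

Thresholding drops the isolated outlier, which is one reason a density threshold can tolerate some wrongly geocoded places.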

35 Formulating appropriate web queries
Region only, e.g. Rocky Mountains
– Retrieves all documents mentioning the name
Region + Concept, e.g. Hotels in Cotswolds
– Tends to retrieve directory pages listing places associated with the target place
Region and lexical pattern (trigger phrase), e.g. Midwest towns such as; in the South of France
– Reduces the number of relevant documents retrieved but can work well for those documents
– Problem of not enough hits for statistical analysis
Region + Concept produces the highest numbers of co-associated places in top-ranking documents.

36 Devon (county)
[Figure panels: distribution of associated places; density surface; density surface at three threshold levels (1, 0.5, 0.25 points per cell); thresholded boundary compared with actual boundary. Note: some places wrongly geocoded. Figure from Ross Purves et al., University of Zurich]

37 Vague place: Mittelland
Evidence for validity of method
[Figure panels: human interpretations of the extent (+ is the core); density surface of web mining results. Figure from Ross Purves, University of Zurich]

38 GIR and GIS
GIR currently dominated by web search
– Unstructured results in multiple documents
Sometimes a single focused result is wanted:
– Hotels within 1 kilometre of the British Museum in London
– Where are pre-sixteenth century dwellings in USA?
– Which areas of East Anglia would be flooded if sea level rose by 1 metre?

39 Bringing GIR and GIS together
[Two diagrams with the same labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

40 GeoInformation Services
– Encode geo-information in Web Services (Geo-services)
– Parse natural language queries
– Interpret geo-terminology of queries
– Identify the relevant geo-services to match geo and non-geo concepts
– Compose appropriate chain of services

41 Where is GIR going?
Improve conventional GIR components:
– Geo-tagging, spatio-textual indexing and geo-relevance ranking
Creation of rich place ontologies with world-wide coverage
Improved understanding of spatial natural language terminology
Open GeoInformation Web services
Adapt GIR to personal needs (What's the quickest way out of here?)

42 More Information
SPIRIT project partners with local representatives:
– Cardiff University (Chris Jones, Project coordinator)
– University of Sheffield (Mark Sanderson and Paul Clough)
– IGN, Paris (Anne Ruas)
– University of Utrecht (Marc van Kreveld)
– University of Hannover (Monika Sester)
– University of Zurich (Ross Purves and Rob Weibel)
See www.geo-spirit.org for information on the SPIRIT project and downloads of articles and project deliverables.
[N.B. Prototype search engine (with link from SPIRIT web site) is no longer functional]

