Published by Ashtyn Woolery. Modified over 9 years ago.
1
Geographic Web Information Retrieval
Alexander Markowetz, University of Marburg
Thomas Brinkhoff, FH Oldenburg
Bernhard Seeger, University of Marburg
2
Current Situation in Web-IR
- Everybody is online
- But never seen
3
Current Situation in Web-IR
- Queries are too short
- Result sets are too large
  - You can effectively block your competitors
  - Good results get buried
- Smaller results
- Ways to drill into the iceberg
4
Solutions
- Personalized search
- Dynamic/interactive search
5
Geographic Web-IR
- Location is the most personal property
- "All business is local"
- People already use the web geographically
  - "Yoga Brooklyn"
  - "Linux usergroup Frankfurt"
- And get poor results
- We are going to make that a lot better
6
How-Not-To
- Semantic Web: "If just everybody included geographic markup in their web pages"
- Two problems
  - Chicken and egg
  - Malicious webmasters
- Metatags, anyone?
- Bottom line: the Semantic Web is for "B2B" situations only.
7
How-To
- Modify traditional IR techniques to extract geographic markers
- Multigranular approach
- Extending basic Web-IR
  - Map pages to geographic positions (footprint)
  - Aggregate and cluster them
- Build applications
  - Geographic search
  - Geographic web mining
8
Geocoding
- Footprint: the geographic position of a web page
- A set of points and polygons, each associated with some amplitude
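The footprint described above could be represented roughly as follows. This is a minimal sketch, not the authors' implementation: the class names `PointMark`, `PolygonMark`, and `Footprint`, and the reading of "amplitude" as a plain numeric weight, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PointMark:
    """A single location (hypothetical name) with an evidence weight."""
    lat: float
    lon: float
    amplitude: float


@dataclass
class PolygonMark:
    """A region given by its (lat, lon) vertices, with an evidence weight."""
    vertices: List[Tuple[float, float]]
    amplitude: float


@dataclass
class Footprint:
    """Geographic position of a page: points and polygons with amplitudes."""
    points: List[PointMark] = field(default_factory=list)
    polygons: List[PolygonMark] = field(default_factory=list)

    def total_amplitude(self) -> float:
        # Sum of all evidence attached to this page, over points and polygons.
        return (sum(p.amplitude for p in self.points)
                + sum(p.amplitude for p in self.polygons))
```

A page mentioning one address and one region would then carry one `PointMark` and one `PolygonMark`, and `total_amplitude()` gives a rough measure of how much geographic evidence the page has.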
9
Preliminaries
- Basic IR assumptions can easily be extended to "geographic IR"
  - Radius-1 hypothesis
  - Radius-2 hypothesis (co-citation)
  - Intra-site hypothesis
    - Intra-subdomain
    - Intra-directory
10
Multigranularity
- Information extraction on different levels
  - Domain
  - Subdomain
  - Directory
  - File
- Need to aggregate
[Figure: hierarchy of levels labeled Dom, SDom, Dir, File]
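One way to picture the aggregation across levels is sketched below. It is an illustration under stated assumptions, not the method from the talk: each page's markers are simplified to a mapping from place name to amplitude, the function names `levels` and `aggregate` are hypothetical, and the domain is taken naively as the last two host labels (which is wrong for suffixes such as `co.uk`).

```python
from collections import defaultdict
from urllib.parse import urlsplit

LEVELS = ("domain", "subdomain", "directory", "file")


def levels(url):
    """Return the key of a URL at each of the four levels."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: the registered domain is the last two labels of the host.
    domain = ".".join(host.split(".")[-2:])
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return {
        "domain": domain,
        "subdomain": host,
        "directory": host + directory,
        "file": host + parts.path,
    }


def aggregate(page_marks):
    """Sum per-page amplitudes upward into every level.

    page_marks maps url -> {place: amplitude}.
    Returns level -> key -> {place: summed amplitude}.
    """
    totals = {lvl: defaultdict(lambda: defaultdict(float)) for lvl in LEVELS}
    for url, marks in page_marks.items():
        for lvl, key in levels(url).items():
            for place, amp in marks.items():
                totals[lvl][key][place] += amp
    return totals
```

With this layout, evidence found in a single file also contributes to its directory, subdomain, and domain, so a site whose pages all mention the same city ends up with a strong domain-level signal.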
11
Sources
- On all levels
  - Names of places
  - Zip codes
  - Area codes
- On site level
  - Whois
  - Business directories
- Links
  - Density over a given area
  - Radius-1 and Radius-2
  - Kevin S. McCurley, "Geospatial Mapping and Navigation of the Web", 10th WWW, 2001
  - J. Ding, L. Gravano, and N. Shivakumar, "Computing Geographical Scopes of Web Resources", VLDB 2000
[Figure: hierarchy of levels labeled Dom, SDom, Dir, File]
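Extracting the page-level sources listed above (place names, zip codes, area codes) might look like the sketch below. The lookup tables are tiny illustrative stand-ins for a real gazetteer, the regular expressions assume German five-digit zip codes and area codes with a leading zero, and the function name `extract_markers` is hypothetical.

```python
import re

# Illustrative stand-ins for a real gazetteer (assumption, not from the talk).
ZIP_TO_CITY = {"10115": "Berlin", "60311": "Frankfurt am Main"}
AREA_CODE_TO_CITY = {"030": "Berlin", "069": "Frankfurt am Main"}
PLACE_NAMES = ["Berlin", "Frankfurt", "Marburg"]

# German zip codes have exactly five digits.
ZIP_RE = re.compile(r"\b\d{5}\b")
# An area code starts with 0 and is followed by a separator and more digits.
AREA_RE = re.compile(r"\b0\d{2,4}(?=[\s/-]\d)")


def extract_markers(text):
    """Return (kind, matched_text, place) tuples found in the page text."""
    markers = []
    for z in ZIP_RE.findall(text):
        if z in ZIP_TO_CITY:
            markers.append(("zip", z, ZIP_TO_CITY[z]))
    for a in AREA_RE.findall(text):
        if a in AREA_CODE_TO_CITY:
            markers.append(("area_code", a, AREA_CODE_TO_CITY[a]))
    for name in PLACE_NAMES:
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            markers.append(("name", name, name))
    return markers
```

Filtering every match through a lookup table keeps arbitrary five-digit numbers from being treated as zip codes; the same function could be run on text collected at the file, directory, or site level.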
12
Geographic Search
- A simple interface
- Not so exciting, but...
[Search form: Key Words, City, Street, State, Area code; SEARCH button]
13
Dynamic Geographic-IR
- Replacing the "next" button
[Figure: buttons labeled Closer, Continue, Wider, Next]
Radius steps: ½ mile, 1 mile, 2 miles, 5 miles, 10 miles, 25 miles, 100 miles
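The interaction on this slide can be sketched as a small piece of state: a position on the radius ladder plus a result page. The class name `GeoSearchState`, the default starting radius, and the choice to reset the page whenever the radius changes are assumptions for illustration only.

```python
# Radius steps taken from the slide, in miles.
RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]


class GeoSearchState:
    """Tracks the current search radius and result page (hypothetical)."""

    def __init__(self, radius_index=3, page=0):
        self.radius_index = radius_index
        self.page = page

    @property
    def radius_miles(self):
        return RADII_MILES[self.radius_index]

    def closer(self):
        # Shrink the radius one step, stopping at the smallest one.
        self.radius_index = max(self.radius_index - 1, 0)
        self.page = 0

    def wider(self):
        # Grow the radius one step, stopping at the largest one.
        self.radius_index = min(self.radius_index + 1, len(RADII_MILES) - 1)
        self.page = 0

    def next(self):
        # Classic "next": same radius, following page of results.
        self.page += 1
```

Instead of only paging through one long result list, the user can change the geographic scope of the same query with `closer()` and `wider()`.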
14
Locality
- Final ranking is a (linear) combination of importance and geographic distance.
- Chances are: Amazon will still rank first, no matter where you are
  - Amazon is a "global bully"
- Idea: eliminate global bullies by computing importance differently
  - Give less weight to links that span a longer distance
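Both ideas on this slide can be illustrated with a small sketch. The concrete formulas are assumptions, since the slide does not give them: proximity is modeled as `1 / (1 + d / scale)`, importance is simplified to a distance-damped count of incoming links rather than a full link-analysis score, and the names `link_weight`, `local_importance`, and `combined_score` are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def link_weight(distance_km, scale_km=50.0):
    """Assumed damping: a link counts less the longer the distance it spans."""
    return 1.0 / (1.0 + distance_km / scale_km)


def local_importance(target_pos, source_positions, scale_km=50.0):
    """Distance-damped in-link count (a stand-in for real link analysis)."""
    return sum(link_weight(haversine_km(src, target_pos), scale_km)
               for src in source_positions)


def combined_score(importance, distance_km, alpha=0.5, scale_km=50.0):
    """Linear combination of importance and proximity to the user."""
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return alpha * importance + (1.0 - alpha) * proximity
```

Under this damping, a site with a few links from its own city can outscore a site whose links all come from far away, which is the effect the slide aims for against "global bullies". The values of `alpha` and `scale_km` are free parameters that would need tuning.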
15
Evaluation
- Evaluating Web-IR is hard
- Evaluating geo-search is even harder
- Mistakes are hard to find
16
Impact of geo-IR
- Next-generation search engine
- Location-based service
  - For cellphones under UMTS
- Move traffic from A&E
  - Local companies will get more traffic
- Increase profits from AdWords
  - Smallest businesses will advertise online
  - Locally focused
- The "leaflet industry" will shrink
17
Geographic Web-Mining
- The web reflects human society.
  - Distorted
  - Delayed/ahead
- A lot of interesting social questions can be answered by looking at a large web crawl
- You can save time and money compared to door-to-door surveys
- This is widely used
- But: most of these questions are of a geographic nature
18
Example Queries
- Where in Germany are vintage sneakers a trend?
- Is there a fashion authority that is accepted in all regions of Germany?
- Do Britney and Madonna have the same audience?
- Draw a map of Germany with all sites about vintage sneakers.
- Find all fashion sites that get a minimum of 1000 equally distributed links.
- Map the areas in Germany where there are significantly more sites for B. than for M.
- Precise semantics?
19
Current Work
- Older prototype
  - Metasearch on top of lycos.de
  - Screen-scrape & re-order
  - Whois only
  - Did very well
20
Current Work
- Current prototype for geographic search
  - Limited to Germany = .de domains
  - 50,000,000 pages
  - Expected online by late summer
- In cooperation with
  - Yen-Yu Chen
  - Xiaohui Long
  - Torsten Suel
  - Polytechnic University, Brooklyn
21
Reinventing Web-IR
- Nearly no (academic) work in geo-IR
- Almost every aspect of Web-IR needs to be looked at again
  - Interfaces
  - Query processing
  - Index distribution
  - Link analysis
  - User profile analysis
  - Spam detection
- Even:
  - Other aspects of personalized search
  - Changes in the web
22
Thank you
Any questions?