Geographic Web Information Retrieval Alexander Markowetz, University of Marburg Thomas Brinkhoff, FH Oldenburg Bernhard Seeger, University of Marburg.


1 Geographic Web Information Retrieval
Alexander Markowetz, University of Marburg
Thomas Brinkhoff, FH Oldenburg
Bernhard Seeger, University of Marburg

2 Current Situation In Web-IR
- Everybody is online
- But never seen

3 Current Situation In Web-IR
- Queries are too short
- Result sets are too large
- You can effectively block your competitors
- Good results get buried
- Smaller Results
- Ways to drill the iceberg

4 Solutions
- Personalized Search
- Dynamic/Interactive Search

5 Geographic Web-IR
- Location is the most personal property
- "All business is local"
- People already use the web geographically
  - "Yoga Brooklyn"
  - "Linux usergroup Frankfurt"
- And get poor results
- We are going to make that a lot better

6 How-Not-To
- Semantic Web
  - "If just everybody included Geographic Markup in their web-pages"
- Two problems
  - Chicken-Egg
  - Malicious Webmaster
  - Metatags Anyone?
- Bottom line
  - Semantic web is for "B2B" situations only.

7 How-To
- Modify traditional IR techniques to extract geographic markers
  - Multigranular approach
  - Extending basic Web-IR
- Map pages to geographic positions
  - Footprint
- Aggregate and Cluster them
- Build Applications
  - Geographic Search
  - Geographic Web-Mining

8 Geocoding
- Footprint
  - Geographic Position of a Webpage
  - Set of points and polygons, associated with some amplitude
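One way to picture the footprint described on this slide is as a small data structure. This is a minimal sketch only: the class name `Footprint`, its fields, and the choice of a plain float for the amplitude are assumptions for illustration and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees; an assumed convention


@dataclass
class Footprint:
    """Hypothetical footprint: points and polygons, each with an amplitude."""
    points: List[Tuple[Point, float]] = field(default_factory=list)
    polygons: List[Tuple[List[Point], float]] = field(default_factory=list)

    def add_point(self, point: Point, amplitude: float) -> None:
        # A single location, e.g. derived from a street address.
        self.points.append((point, amplitude))

    def add_polygon(self, outline: List[Point], amplitude: float) -> None:
        # An area, e.g. the outline of a zip-code region.
        self.polygons.append((list(outline), amplitude))

    def total_amplitude(self) -> float:
        # Sum of all amplitudes; useful for normalising before comparing pages.
        return (sum(a for _, a in self.points)
                + sum(a for _, a in self.polygons))
```

A page that mentions one address and one region would then carry one point and one polygon, each weighted by how strongly the page is associated with it.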

9 Preliminaries
- Basic IR Assumptions can easily be extended to "geographic-IR"
  - Radius-1 Hypothesis
  - Radius-2 Hypothesis (co-citation)
  - Intra-Site Hypothesis
    - Intra-subdomain
    - Intra-directory

10 Multigranularity
- Information extraction on different levels
  - Domain
  - Subdomain
  - Directory
  - File
- Need to aggregate
[Figure: hierarchy of domain (Dom), subdomain (SDom), directory (Dir), and file levels]
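The four levels and the aggregation step can be sketched as follows. The function names, the use of place-name counts as the thing being aggregated, and the naive "last two host labels" rule for the domain are all assumptions made for this illustration (the rule is adequate for plain .de hosts but not for suffixes such as .co.uk).

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit
import posixpath

LEVELS = ("domain", "subdomain", "directory", "file")


def granularity_keys(url):
    """Return one key per level (domain, subdomain, directory, file) for a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    domain = ".".join(host.split(".")[-2:])  # naive registered-domain guess
    path = parts.path or "/"
    directory = posixpath.dirname(path)
    if not directory.endswith("/"):
        directory += "/"
    return {
        "domain": domain,
        "subdomain": host,
        "directory": host + directory,
        "file": host + path,
    }


def aggregate_markers(pages):
    """pages: iterable of (url, list of place names found on that page).

    Returns, per level, a mapping key -> Counter of place names, so that
    evidence found in single files is summed up at the coarser levels.
    """
    totals = {level: defaultdict(Counter) for level in LEVELS}
    for url, places in pages:
        for level, key in granularity_keys(url).items():
            totals[level][key].update(places)
    return totals
```

With two pages on different subdomains of the same domain, the counts stay separate at the subdomain level but add up at the domain level.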

11 Sources
- On all levels
  - Names of places
  - Zip-codes
  - Area-codes
- On Site Level
  - Whois
  - Business Directories
- Links
  - Density over a given area
  - Radius-1 and Radius-2
- Geospatial Mapping and Navigation of the Web, Kevin S. McCurley, 10th WWW, 2001
- Computing Geographical Scopes of Web Resources, J. Ding, L. Gravano, and N. Shivakumar, VLDB 2000
[Figure: hierarchy of domain (Dom), subdomain (SDom), directory (Dir), and file levels]
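Extracting place names and zip codes from page text could look roughly like the sketch below. The two lookup tables are tiny illustrative stand-ins for a real gazetteer, and the five-digit pattern assumes German-style zip codes; neither comes from the presentation.

```python
import re

# Illustrative gazetteer entries only; a real system would load a full table.
ZIP_TO_PLACE = {"35037": "Marburg", "26121": "Oldenburg"}
PLACE_NAMES = {"marburg": "Marburg", "oldenburg": "Oldenburg"}

ZIP_RE = re.compile(r"\b\d{5}\b")            # assumed five-digit zip format
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")    # words, including German umlauts


def extract_geo_markers(text):
    """Return (kind, matched_text, place) tuples for zip codes and place names."""
    markers = []
    for match in ZIP_RE.finditer(text):
        place = ZIP_TO_PLACE.get(match.group())
        if place:  # five-digit numbers not in the gazetteer are ignored
            markers.append(("zip", match.group(), place))
    for match in WORD_RE.finditer(text):
        place = PLACE_NAMES.get(match.group().lower())
        if place:
            markers.append(("name", match.group(), place))
    return markers
```

Area codes could be handled the same way with a third lookup table; ambiguous names (places that share a name, or names that are also ordinary words) are not addressed here.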

12 Geographic Search
- A simple interface
- Not so exciting, but...
[Search form: fields Key Words, City, Street, State, Area code; SEARCH button]

13 Dynamic Geographic-IR
- Replacing the "next" button
[Interface mock-up: buttons Closer, Continue, Wider, Next; radius choices ½ mile, 1 mile, 2 miles, 5 miles, 10 miles, 25 miles, 100 miles]
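The "Closer" and "Wider" controls can be read as steps along the radius ladder shown on the slide. The sketch below takes that reading; the function names, the result format, and the use of the haversine great-circle distance are assumptions for illustration.

```python
import math

RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]  # the radius choices from the slide
EARTH_RADIUS_MILES = 3958.8                # approximate mean Earth radius


def wider(radius):
    """Next larger radius on the ladder (stays at the largest)."""
    i = RADII_MILES.index(radius)
    return RADII_MILES[min(i + 1, len(RADII_MILES) - 1)]


def closer(radius):
    """Next smaller radius on the ladder (stays at the smallest)."""
    i = RADII_MILES.index(radius)
    return RADII_MILES[max(i - 1, 0)]


def distance_miles(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def results_within(results, center, radius):
    """Keep the names of results whose position lies within radius miles."""
    return [name for name, pos in results
            if distance_miles(center, pos) <= radius]
```

Pressing "Wider" would then re-run `results_within` with `wider(radius)` instead of paging through a fixed result list.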

14 Locality
- Final ranking is a (linear) combination of importance and geographic distance.
- Chances are:
  - Amazon will still rank first, no matter where you are
  - Amazon is a "global bully"
- Idea:
  - Eliminate global bullies by computing importance differently
  - Give less weight to links that span a longer distance
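The slide gives no formulas, so the following is only one possible reading of both ideas. The damping function `1 / (1 + d / scale)`, the default scale of 50 km, the weight `alpha`, and all function names are assumptions chosen for illustration.

```python
def link_weight(distance_km, scale_km=50.0):
    """Weight of one link; links spanning longer distances count less (assumed form)."""
    return 1.0 / (1.0 + distance_km / scale_km)


def local_importance(inlink_distances_km, scale_km=50.0):
    """Distance-damped in-link count: sum of link weights over all incoming links."""
    return sum(link_weight(d, scale_km) for d in inlink_distances_km)


def final_score(importance, distance_km, alpha=0.5, scale_km=50.0):
    """Linear combination of importance and geographic proximity to the user."""
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return alpha * importance + (1.0 - alpha) * proximity
```

Under this sketch a site with three nearby in-links gets importance 3.0, while a site whose three in-links each span 500 km gets about 0.27, which is the intended effect on "global bullies". A production system would more likely fold the link weights into a PageRank-style computation than into a plain in-link sum.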

15 Evaluation
- Evaluating Web-IR is hard
- Evaluating geo-Search is even harder
- Mistakes are hard to find

16 Impact of geo-IR
- Next generation Search Engine
- Location based Service
  - For cellphones under UMTS
- Move traffic from A&E
- Local companies will get more traffic
- Increase Profits from Adwords
  - Smallest businesses will advertise online
  - Locally focused
- The "Leaflet-industry" will shrink

17 Geographic Web-Mining
- The web reflects human society.
  - Distorted
  - Delayed/Ahead
- A lot of interesting social questions can be answered by looking at a large webcrawl
- You can save time and money compared to door-to-door surveys
- This is widely used
- But:
  - Most of them are of geographic nature

18 Example Queries
- Where in Germany are vintage sneakers a trend?
- Is there a fashion authority that is accepted in all regions of Germany?
- Do Britney and Madonna have the same audience?
- Draw a map of Germany with all sites about vintage sneakers.
- Find all fashion-sites that get a min of 1000 equally distributed links.
- Map the areas in Germany, where there are significantly more Sites for B. than for M.
- Precise Semantics?
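The slide leaves the precise semantics open. As one hypothetical interpretation of "a min of 1000 equally distributed links", the sketch below counts in-links per region and requires their normalised Shannon entropy to be close to 1. The entropy criterion, the 0.9 threshold, and the function names are assumptions, not the authors' definition.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the counts, scaled to [0, 1] by log(number of regions)."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(counts))


def is_equally_distributed(links_per_region, min_links=1000, min_entropy=0.9):
    """True if the site has enough in-links and they are spread evenly over regions."""
    return (sum(links_per_region) >= min_links
            and normalized_entropy(links_per_region) >= min_entropy)
```

A site with 250 links from each of four regions passes, while a site with 997 links from one region and one link from each of the others does not.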

19 Current Work
- Older Prototype
  - Metasearch on top of lycos.de
  - Screen-scrape & re-order
  - Whois only
  - Did very well

20 Current Work
- Current Prototype for Geographic Search
  - Limited to Germany (= .de domains)
  - 50,000,000 pages
  - Expected online by late summer
- In co-operation with
  - Yen-Yu Chen
  - Xiaohui Long
  - Torsten Suel
  - Polytechnic University, Brooklyn

21 Reinventing Web-IR
- Nearly no (academic) work in geo-IR
- Almost every aspect of Web-IR needs to be looked at again
  - Interfaces
  - Query processing
  - Index distribution
  - Link analysis
  - User profile analysis
  - Spam detection
- Even:
  - Other aspects of personalized search
  - Changes in the web

22 Thank you
Any questions?

