Published by Bryce Elliott. Modified over 10 years ago.
1
Search Engines and Services
Course: Location Aware Machine Intelligence
Presented by: Celestine Mkama Kalendero, 25.02.2014
2
Outline
1. Search engine results ranking based on location
2. Review of Personalized Mobile Search Engine
3. Extraction of address data from unstructured text
3
Search Engine Results Ranking based on Location
Carolyn Watters and Ghada Amoudi
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
E-mail: watters@cs.dal.ca
Publication year: 2003
4
Result Ranking in Search Engines (as of 2002)
Search engines build their indexes based on:
a) Keyword occurrence; frequency of query negotiation
Pros:
+ Robust, fast
Cons:
- Users must sort through the pages themselves when queries relate to physical distance and location
- 44% of users are frustrated by search engines (Realname, 2000)
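To make the "Cons" point concrete, here is a minimal sketch of purely keyword-based ranking. It is a simplified, hypothetical stand-in (raw term counts, not any real engine's scoring): the `keyword_score` and `rank_by_keywords` names and the sample pages are illustrative assumptions. Note that nothing in the score reflects where a site is located.

```python
import re
from collections import Counter
from typing import Dict, List


def keyword_score(query: str, page_text: str) -> int:
    """Count how often the query's terms occur in a page (simplified,
    hypothetical stand-in for keyword-occurrence ranking)."""
    terms = set(re.findall(r"\w+", query.lower()))
    counts = Counter(re.findall(r"\w+", page_text.lower()))
    return sum(counts[t] for t in terms)


def rank_by_keywords(query: str, pages: Dict[str, str]) -> List[str]:
    """Return page URLs sorted by descending keyword score; location is ignored."""
    return sorted(pages, key=lambda url: keyword_score(query, pages[url]), reverse=True)


pages = {"a.example": "ski ski resort", "b.example": "resort in Columbia"}
print(rank_by_keywords("ski resort", pages))
```

A page that simply repeats the query terms wins, however far away the resort is, which is the gap Geosearcher addresses.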
5
Geosearcher
Location-based ranking system:
- Translates the search reference point into coordinates (Long, Lat)
- Ranks search results in ascending order of distance
(Figure: Geosearcher architecture)
6
Geosearcher architecture: Query
Entered by end users, e.g. "skiing resort District of Columbia"
- Query: skiing resort
- Reference point: District of Columbia
A sample of random URLs is available (used for evaluation)
7
Geosearcher architecture: Geocoding
The process of assigning latitude and longitude coordinates to the host of each site.
Preliminary work (performed by the researchers):
a) Determine location
b) Create lookup table
8
Geosearcher architecture: Geocoding
a) Determining location
- From host URLs: DNS, country codes, Whois database
- Map the location into coordinates, e.g. using the Getty Thesaurus (GS)
+ Containing state and area code for the US and Canada
+ Other countries
b) Lookup table
- Country codes with coordinates
Example hosts: www.about.com, www.dartmouth.ca, mathresource.com
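The "country codes" source can be sketched as reading the top-level domain of the host. This is an assumed simplification, not the paper's exact procedure: a two-letter TLD is taken as a country code, and generic TLDs such as .com return None, meaning another source (e.g. Whois) must be consulted. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_of(url_or_host: str) -> str:
    """Return the lower-cased host name of a URL or bare host string."""
    if "://" not in url_or_host:
        url_or_host = "http://" + url_or_host
    return (urlsplit(url_or_host).hostname or "").lower()


def country_code_from_host(url_or_host: str) -> Optional[str]:
    """Return a two-letter country-code TLD (e.g. 'ca'), or None when the
    host has a generic TLD and another source such as Whois is needed."""
    tld = host_of(url_or_host).rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isalpha():
        return tld
    return None


for host in ("www.about.com", "www.dartmouth.ca", "mathresource.com"):
    print(host, country_code_from_host(host))
```

Only www.dartmouth.ca yields a code here; the two .com hosts show why the Whois lookup is needed.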
9
Geosearcher architecture: Geocoding (continued)
Lookup table:
Country Code | State Code | Area Code | Coordinates (Lat, Long)
US | AL | 256 | 34.9200, 87.2703
US | CA | 530 | 38.8951, 77.0367
CA | NS | 902 | 45.0000, 63.0000
FI | Helsinki | | 60.1708, 24.9375
NO | Oslo | | 59.9500, 10.7500
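A lookup table like this maps naturally onto a dictionary keyed by (country, state, area code). The sketch below is an illustration, not the researchers' implementation: it copies the rows above (longitudes as printed, without an east/west sign) and uses None where no area code is given.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Rows from the lookup table above; longitudes as printed, without a sign.
LOOKUP: Dict[Tuple[str, str, Optional[str]], Coord] = {
    ("US", "AL", "256"): (34.9200, 87.2703),
    ("US", "CA", "530"): (38.8951, 77.0367),
    ("CA", "NS", "902"): (45.0000, 63.0000),
    ("FI", "Helsinki", None): (60.1708, 24.9375),
    ("NO", "Oslo", None): (59.9500, 10.7500),
}


def lookup_coordinates(country: str, state: str,
                       area_code: Optional[str] = None) -> Optional[Coord]:
    """Return (lat, long) for a table entry, or None if it is not listed."""
    return LOOKUP.get((country.upper(), state, area_code))


print(lookup_coordinates("ca", "NS", "902"))
```

Upper-casing the country code lets lower-case Whois output such as "ca" match the table.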
10
Example: location information (screenshots: Getty Thesaurus, Whois database)
11
Geosearcher architecture: Geocoding process
a) Check for coordinates in the host table
b) If none are found, send the domain to Whois
- On a match, Whois returns the country code (CC) and area code
- If the CC is ca or us and an area code is present, look it up in the lookup table to get the state or province name
c) If there is still no match, strip the domain down by one level (e.g. data.about.com to about.com)
d) Names that remain unmatched are checked with IPtoLL (host to lat/long conversion)
- IPtoLL uses the administrative contact
Store the results in the host table
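The fallback chain in steps a) to d) can be sketched as follows. This is a hedged illustration only: Whois and IPtoLL are passed in as plain callables (hypothetical stand-ins for the real services), and it is an assumption that non-US/Canada country codes are resolved through a separate country-level table.

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]
WhoisRecord = Tuple[str, Optional[str]]  # (country code, area code)


def strip_one_level(host: str) -> Optional[str]:
    """data.about.com -> about.com; None once only 'name.tld' is left."""
    parts = host.split(".")
    return ".".join(parts[1:]) if len(parts) > 2 else None


def geocode_host(host: str,
                 host_table: Dict[str, Coord],
                 whois: Callable[[str], Optional[WhoisRecord]],
                 area_code_table: Dict[Tuple[str, str], Coord],
                 country_table: Dict[str, Coord],
                 ip_to_ll: Callable[[str], Optional[Coord]]) -> Optional[Coord]:
    # a) Reuse coordinates already stored in the host table.
    if host in host_table:
        return host_table[host]
    coord: Optional[Coord] = None
    candidate: Optional[str] = host
    while candidate and coord is None:
        record = whois(candidate)  # b) ask Whois for country and area code
        if record:
            cc, area = record[0].lower(), record[1]
            if cc in ("us", "ca") and area:
                coord = area_code_table.get((cc, area))
            else:
                coord = country_table.get(cc)  # assumed country-level entry
        if coord is None:
            candidate = strip_one_level(candidate)  # c) strip one level
    if coord is None:
        coord = ip_to_ll(host)  # d) last resort: IPtoLL
    if coord is not None:
        host_table[host] = coord  # store the result in the host table
    return coord
```

Caching every result in `host_table` means each host hits the slower external services at most once.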
12
Geosearcher architecture: Geocoding process (continued)
Host table:
Host | Coordinates (Lat, Long)
www.skibluemt.com | 34.9200, 87.2703
www.dcski.com | 38.8951, 77.0367
13
Distance and Ranking
- The distance of each URL in the host table from the reference location is calculated using the haversine distance
- Distances are stored in a session host table
- Results are ranked by distance (insertion sort)
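A minimal sketch of this step, assuming coordinates in degrees and a mean Earth radius of 6371 km (the slide does not give a radius). The haversine formula gives the great-circle distance, and the explicit insertion sort mirrors the slide; in practice Python's built-in `sorted` would do the same job.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius (assumed constant)


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(ref: Coord, host_table: Dict[str, Coord]) -> List[Tuple[str, float]]:
    """Build a session table of (url, distance) and insertion-sort it ascending."""
    session = [(url, haversine_km(ref, c)) for url, c in host_table.items()]
    for i in range(1, len(session)):
        item = session[i]
        j = i - 1
        while j >= 0 and session[j][1] > item[1]:
            session[j + 1] = session[j]  # shift farther entries right
            j -= 1
        session[j + 1] = item
    return session


hosts = {"www.skibluemt.com": (34.9200, 87.2703), "www.dcski.com": (38.8951, 77.0367)}
print(rank_by_distance((38.8951, 77.0367), hosts))
```

With the District of Columbia as reference point, www.dcski.com (distance 0) is ranked ahead of www.skibluemt.com.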
14
Results
Unranked results (AltaVista) vs. results using Geosearcher
15
Results (contd.)
Validation of accuracy:
- 100 results examined manually for location information
- 90 websites assigned correctly
- 78% of 83 URLs accurately identified
16
Results (contd.)
Algorithm effectiveness: tested with 10 sets of 100 URLs from the Yahoo random link generator
17
Personalized Mobile Search Engine Using Location and Content Concepts
Namrata G Kharate, ME-Computer-II, MCOERC, Nasik, India
Prof. S. A. Bhavsar, Assistant Professor, Computer Dept., MCOERC, Nasik, India
Publication: November 2013
18
Search on Mobile Devices
- Search queries on mobile devices are shorter and ambiguous
- Search results are less accurate
Solution: a system that captures user preferences to return a personalized result ranking, the Personalized Mobile Search Engine (PMSE)
19
PMSE: System Architecture
RSVM: Ranking Support Vector Machine
21
PMSE
Client:
- Receives user requests
- Stores clickthrough data (location, content)
- Submits requests to the server
- Displays results
- Profiles preferences in an ontology-based user profile
Server:
- Forwards requests to a commercial search engine
- RSVM training
- Search result reranking
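The server-side RSVM training and reranking can be sketched with a linear ranking SVM. This is an illustrative substitute, not PMSE's actual code: it assumes the common "clicked result beats unclicked results shown above it" rule to turn clickthrough into preference pairs, trains by subgradient descent on the pairwise hinge loss, and represents each result by a hypothetical two-feature vector [content score, location score].

```python
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def preference_pairs(features: List[Vector], clicked: Set[int]) -> List[Tuple[Vector, Vector]]:
    """Assumed heuristic: a clicked result is preferred over every
    unclicked result that was ranked above it."""
    pairs = []
    for i in sorted(clicked):
        for j in range(i):
            if j not in clicked:
                pairs.append((features[i], features[j]))
    return pairs


def train_rsvm(pairs: List[Tuple[Vector, Vector]], dim: int,
               epochs: int = 100, lr: float = 0.1, reg: float = 0.01) -> Vector:
    """Linear ranking SVM: subgradient descent on
    max(0, 1 - w.(x_preferred - x_other)) plus an L2 penalty."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = dot(w, diff)
            for k in range(dim):
                grad = reg * w[k] - (diff[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w


def rerank(features: List[Vector], w: Vector) -> List[int]:
    """Return result indices ordered by descending learned score."""
    return sorted(range(len(features)), key=lambda i: dot(w, features[i]), reverse=True)


feats = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.9]]  # [content, location] per result
w = train_rsvm(preference_pairs(feats, {2}), dim=2)
print(rerank(feats, w))
```

After the user skips two content-heavy results and clicks the nearby one, the learned weights favour the location feature and that result moves to the top.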
22
Extraction of Address Data from Unstructured Text using Free Knowledge Resources
Sebastian Schmidt, schmidt@kom.tudarmstadt.de
Simon Manschitz, manschitz@stud.tudarmstadt.de
Ralf Steinmetz, steinmetz@kom.tudarmstadt.de
Christoph Rensing, rensing@kom.tudarmstadt.de
Multimedia Communications Lab, Technische Universität Darmstadt, Germany
Publication: November 2013
23
Extraction of Address Data
Of interest in various domains:
o Location-based services
o Address repositories, created automatically
- Automatic harvesting of addresses from the web is not possible
Solution: identify business address data with a hybrid approach that combines patterns and gazetteers
24
Address Structure: Germany
- Company name: no special pattern
- Street: varies, e.g. Bürgermeister-Jung, Bgm.-Jung
- Street number: digit sequence, e.g. 45a, 45-47
- Postal code: exactly 5 digits, reserved
- City: e.g. Frankfurt, Ffm, Frankfurt/Main
25
Address Data Identification Workflow
26
Address Data Identification: Preprocessing
- Strip HTML markup, e.g. using the Beautiful Soup library
- Cleaning: removing non-Unicode characters and whitespace between numbers
- Line splitting and tokenizing using the Apache OpenNLP toolkit
- Part-of-speech tagging using TreeTagger
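The first preprocessing steps can be approximated with the standard library alone. This is a sketch under stated assumptions, not the authors' pipeline: `html.parser` stands in for Beautiful Soup, a simple regular expression stands in for the OpenNLP tokenizer, "non-Unicode characters" is read as Unicode control/format/unassigned characters, and part-of-speech tagging is omitted.

```python
import re
import unicodedata
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects page text, adding line breaks at block-level tags."""
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "address"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def clean(text: str) -> str:
    # Drop Unicode "C*" characters (control, format, unassigned), keeping newlines.
    text = "".join(ch for ch in text if ch == "\n" or unicodedata.category(ch)[0] != "C")
    # Remove spaces/tabs between digits, e.g. "64 283" -> "64283".
    return re.sub(r"(?<=\d)[ \t]+(?=\d)", "", text)


def split_lines(text: str) -> List[str]:
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def tokenize(line: str) -> List[str]:
    # Word-like runs (may contain . - /) or single punctuation marks.
    return re.findall(r"\w[\w.\-/]*|[^\w\s]", line)
```

For example, `split_lines(clean(strip_html("<p>Muster GmbH</p><p>64 283 Darmstadt</p>")))` yields the lines "Muster GmbH" and "64283 Darmstadt".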
28
Address Data Identification: Line Splitting
Line splitting and tokenizing using the Apache OpenNLP toolkit
29
Address Data Identification
1. Postal codes: token regular expression [0-9]{5}
2. Cities: list generated from OpenStreetMap, accessed via the Overpass API (28,087 entries)
o Known city found in the list
o Directly preceded by a postal code
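Combining the postal-code pattern with the city gazetteer might look like this. It assumes both criteria must hold (known city *and* directly preceded by a postal code), and `CITIES` is a tiny hypothetical excerpt standing in for the 28,087-entry OpenStreetMap list. The regex is applied with `fullmatch` so that longer digit runs such as "123456" are rejected.

```python
import re
from typing import List, Optional, Set, Tuple

POSTAL_CODE = re.compile(r"[0-9]{5}")

# Hypothetical excerpt of the OSM-derived city gazetteer.
CITIES: Set[str] = {"Darmstadt", "Frankfurt", "Frankfurt/Main", "Ffm"}


def find_postal_city(tokens: List[str], cities: Set[str] = CITIES) -> Optional[Tuple[str, str]]:
    """Return (postal code, city) for the first known city token that is
    directly preceded by a 5-digit postal-code token."""
    for prev, tok in zip(tokens, tokens[1:]):
        if POSTAL_CODE.fullmatch(prev) and tok in cities:
            return prev, tok
    return None


print(find_postal_city(["64283", "Darmstadt"]))
```

Requiring the postal code in front filters out city names that merely appear in running text.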
30
Address Data Identification
3. Street numbers: regular expression
([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|- ])([0-9]{1,3})([a-zA-Z][0-9]?)?)?
4. Street names: list generated from OpenStreetMap, accessed via the Overpass API (300,000 entries)
o Street name endings are also used, e.g. str
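A sketch of street-number and street-name detection. One assumption matters here: the separator class `[+|- ]` above, taken literally, contains the reversed range `|- ` (from "|" to space) and is rejected by Python's `re`, so this sketch reads it as `[-+ ]` ("-", "+" or space). The ending list beyond "str" is also an assumption, and `known_streets` stands in for the OSM street list.

```python
import re
from typing import Optional, Set, Tuple

# Slide pattern, with the separator class rewritten as [-+ ] (assumed intent).
STREET_NUMBER = re.compile(r"([0-9]{1,3})([a-zA-Z][0-9]?)?(([-+ ])([0-9]{1,3})([a-zA-Z][0-9]?)?)?")

# Assumed street-name endings; the slide only names "str" as an example.
STREET_ENDINGS = ("straße", "strasse", "str.", "str", "weg", "allee", "platz", "gasse")


def is_street_number(token: str) -> bool:
    return STREET_NUMBER.fullmatch(token) is not None


def looks_like_street(name: str, known_streets: Set[str]) -> bool:
    """Known street (from the gazetteer) or a name with a typical ending."""
    return name in known_streets or name.lower().endswith(STREET_ENDINGS)


def parse_street_line(line: str, known_streets: Set[str]) -> Optional[Tuple[str, str]]:
    """Split 'Name Number' into (street name, house number) if both parts pass."""
    name, _, number = line.rpartition(" ")
    if name and is_street_number(number) and looks_like_street(name, known_streets):
        return name, number
    return None


print(parse_street_line("Bürgermeister-Jung-Str. 45-47", set()))
```

The ending heuristic catches streets missing from the gazetteer, while the gazetteer covers names without a telltale ending.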
31
Address Data Identification
5. Company names
- Search for identifying legal-form terms (from Wikipedia): 29 terms, e.g. GmbH (private company), AG (public company)
- Exploit the standard address structure
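One way to combine the legal-form terms with the address structure is sketched below. It assumes that the company name appears within two lines above the street line, and `LEGAL_FORMS` is a small illustrative subset of the 29-term list, not the list itself. Matching whole tokens avoids false hits such as the "ag" inside "Verlag".

```python
from typing import List, Optional

# Illustrative subset of German legal-form designations.
LEGAL_FORMS = {"GmbH", "AG", "KG", "OHG", "UG", "e.K."}


def has_legal_form(line: str) -> bool:
    """True if any whitespace/comma-separated token is a legal-form term."""
    return any(tok in LEGAL_FORMS for tok in line.replace(",", " ").split())


def find_company(lines: List[str], street_index: int, window: int = 2) -> Optional[str]:
    """Search up to `window` lines above the street line (assumed layout)
    for a line containing a legal-form term."""
    for i in range(street_index - 1, max(street_index - window, 0) - 1, -1):
        if has_legal_form(lines[i]):
            return lines[i]
    return None


lines = ["Impressum", "Muster Software GmbH", "Hauptstr. 5", "64283 Darmstadt"]
print(find_company(lines, 2))
```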
32
Evaluation & Methodology
- Sites with a legal notice (1,576 websites)
- Metric: fraction of full addresses identified correctly
- R(correct address) = 0.946, R(company) = 0.82
33
Conclusion
Search engine ranking evaluation:
- The algorithm was accurate and effective
- Efficiency was impacted by reliance on external databases
Recommendations:
- Maintain a database of the special resources to increase efficiency
- Adapt address extraction to other languages
34
Thank You! (Q&A)