Search engine and services Course: Location Aware Machine Intelligence Presented by : Celestine Mkama Kalendero 25.02.2014.

1 Search Engine and Services. Course: Location Aware Machine Intelligence. Presented by: Celestine Mkama Kalendero, 25.02.2014

2 Outline
1. Search engine results ranking based on location
2. Review of Personalized Mobile Search Engine
3. Extraction of Address Data from Unstructured Text

3 Search Engine Results Ranking based on Location. Carolyn Watters and Ghada Amoudi, Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada. E-mail: watters@cs.dal.ca. Publication year: 2003

4 Result Ranking in Search Engines (as of 2002): Search engines build their indexes based on a) keyword occurrence and the frequency of query negotiation. Pros: robust, fast. Cons: users must sort through the returned pages themselves when queries relate to physical distance and location; 44% of users are frustrated by search engines (Realname, 2000).
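
To make the "Cons" concrete, here is a minimal keyword-occurrence ranker. It is a hypothetical sketch, not any particular engine's scoring: each document is scored by how often the query terms occur in it. Nothing in that score reflects where a skiing resort actually is, which is the gap Geosearcher targets.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted index: term -> {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def rank(index, query):
    """Score documents by total occurrences of the query terms (no location signal)."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]

docs = {"a": "ski resort ski", "b": "ski shop", "c": "museum"}
print(rank(build_index(docs), "ski resort"))
```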

5 Geosearcher: a location-based ranking system. It translates the search reference point into coordinates (long, lat) and ranks the search results in ascending order of distance. Figure: Geosearcher architecture.

6 Geosearcher architecture: Query. Queries are presented by end users, e.g. "skiing resort District of Columbia", which splits into Query: skiing resort, and Reference point: District of Columbia. A sample of random URLs is available (used for evaluation).

7 Geosearcher architecture: Geocoding. Geocoding is the process of assigning latitude and longitude coordinates to the host of each site. Preliminary work (performed by the researchers): a) determine the location; b) create a lookup table.

8 Geosearcher architecture: Geocoding
a) Determining the location: from host URLs (DNS, country codes, Whois database), then map the location into coordinates, e.g. using the Getty Thesaurus (GS), covering state and area codes for the US and Canada, plus other countries.
b) Lookup table: country codes with coordinates.
Example hosts: www.about.com, www.dartmouth.ca, mathresource.com

9 Geosearcher architecture: Geocoding (continued). Lookup table:
Country code | State code | Area code | Coordinates (lat, long)
US | AL | 256 | 34.9200, 87.2703
US | CA | 530 | 38.8951, 77.0367
CA | NS | 902 | 45.0000, 63.0000
FI | Helsinki | - | 60.1708, 24.9375
NO | Oslo | - | 59.9500, 10.7500
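
A minimal sketch of the two preliminary pieces follows: reading a country code from a host's top-level domain, and consulting the lookup table. It assumes that a two-letter TLD is a country code and that generic TLDs such as .com carry no location, so those hosts need Whois. The function names are hypothetical. The table values are copied as shown on the slide, where western longitudes appear without a minus sign and the state-code column holds city names for FI and NO.

```python
# Lookup table from the slide: (country, state/city, area code) -> (lat, long).
# Values copied as shown; western longitudes are unsigned on the slide.
LOOKUP_TABLE = {
    ("US", "AL", "256"): (34.9200, 87.2703),
    ("US", "CA", "530"): (38.8951, 77.0367),
    ("CA", "NS", "902"): (45.0000, 63.0000),
    ("FI", "Helsinki", None): (60.1708, 24.9375),
    ("NO", "Oslo", None): (59.9500, 10.7500),
}

def country_from_host(host):
    """Assumption: two-letter TLDs are country codes; generic TLDs return None."""
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld.upper() if len(tld) == 2 and tld.isalpha() else None

def lookup_coordinates(country, state=None, area_code=None):
    """Within one country, match on area code first, then on state/city name."""
    country = country.upper()
    rows = [(st, ac, xy) for (cc, st, ac), xy in LOOKUP_TABLE.items() if cc == country]
    for st, ac, xy in rows:
        if area_code is not None and ac == area_code:
            return xy
    for st, ac, xy in rows:
        if state is not None and st == state:
            return xy
    return None

print(country_from_host("www.dartmouth.ca"), country_from_host("www.about.com"))
print(lookup_coordinates("us", area_code="256"))
```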

10 Example: location information from the Getty Thesaurus and from the Whois database.

11 Geosearcher architecture: Geocoding process
a) Check for coordinates in the host table.
b) If none are found, send the domain to Whois, which returns the country code (CC) and area code on a match. If the CC is ca or us and an area code is present, look it up in the lookup table to get the state or province name.
c) If not, strip the domain down by one level (e.g. data.about.com to about.com).
d) Unmatched names are checked in IPtoLL (host to lat/long conversion), which uses the administrative contact.
Results are stored in the host table.
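
Steps a) to d) read naturally as a fallback chain. The sketch below passes Whois and IPtoLL in as plain functions, because the slide does not describe their interfaces. It also simplifies step b) by keying the lookup table on (country code, area code). The function name `geocode` and all stub data in the usage are hypothetical.

```python
def geocode(host, host_table, whois, area_lookup, ip_to_ll):
    """Fallback chain sketched from the slide; whois(domain) -> (cc, area) or None,
    ip_to_ll(host) -> (lat, long) or None (both assumed interfaces)."""
    if host in host_table:                      # a) cached coordinates
        return host_table[host]
    coords = None
    domain = host
    while coords is None:
        record = whois(domain)                  # b) Whois lookup
        if record is not None:
            cc, area = record
            if cc.lower() in ("ca", "us") and area:
                coords = area_lookup.get((cc.upper(), area))
        if coords is None:
            parts = domain.split(".")
            if len(parts) <= 2:                 # nothing left to strip
                break
            domain = ".".join(parts[1:])        # c) data.about.com -> about.com
    if coords is None:
        coords = ip_to_ll(host)                 # d) IPtoLL fallback
    if coords is not None:
        host_table[host] = coords               # store result in the host table
    return coords

# Hypothetical stubs standing in for the real services.
host_table = {"www.dcski.com": (38.8951, 77.0367)}
whois_db = {"example-ski.com": ("US", "256")}
area_lookup = {("US", "256"): (34.9200, 87.2703)}
print(geocode("data.example-ski.com", host_table, whois_db.get, area_lookup, lambda h: None))
```

Injecting the lookups as functions keeps the chain testable without network access, and makes it easy to swap in a local database, as the conclusion recommends.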

12 Geosearcher architecture: Geocoding process (continued). Host table:
Host | Coordinates (lat, long)
www.skibluemt.com | 34.9200, 87.2703
www.dcski.com | 38.8951, 77.0367

13 Distance and Ranking: For ranking, the distance of each URL in the host table from the reference location is calculated using the haversine formula and stored in a session host table. Results are then ranked by distance (insertion sort).
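
A minimal sketch of this step follows. The haversine distance is d = 2R arcsin(sqrt(sin²(Δφ/2) + cos φ1 cos φ2 sin²(Δλ/2))), with latitudes φ and longitudes λ in radians. The sketch assumes R = 6371 km (mean Earth radius) and signed longitudes, with west negative. The slide's tables list western longitudes unsigned, so those values would need negating first. Placing URLs without coordinates last is also an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; the slide gives no value)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def rank_by_distance(urls, host_table, ref):
    """Fill a session table of distances, then insertion-sort ascending.
    URLs with no coordinates get math.inf and end up last (assumption)."""
    session = {}
    for url in urls:
        coords = host_table.get(url)
        session[url] = math.inf if coords is None else haversine_km(ref[0], ref[1], *coords)
    ranked = []
    for url, dist in session.items():
        i = len(ranked)
        while i > 0 and ranked[i - 1][1] > dist:  # walk left past larger distances
            i -= 1
        ranked.insert(i, (url, dist))
    return ranked

ref = (38.8951, -77.0367)  # District of Columbia, signed longitude
hosts = {"www.dcski.com": (38.8951, -77.0367), "www.skibluemt.com": (34.9200, -87.2703)}
print(rank_by_distance(["www.skibluemt.com", "nowhere.example", "www.dcski.com"], hosts, ref))
```

Insertion sort is simple and stable, and it is cheap for the short result lists a single query page returns. For large lists, `sorted(session.items(), key=...)` would be the idiomatic choice.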

14 Results: unranked results (AltaVista) versus results using Geosearcher.

15 Results (contd.): Validation of accuracy. 100 results were examined manually for location information: 90 websites were assigned correctly, and 78% of 83 URLs were accurately identified.

16 Results (contd.): Algorithm effectiveness, tested with 10 sets of 100 URLs using the Yahoo random link generator.

17 Personalized Mobile Search Engine Using Location and Content Concepts. Namrata G. Kharate, ME-Computer-II, MCOERC, Nasik, India; Prof. S. A. Bhavsar, Assistant Professor, Computer Dept., MCOERC, Nasik, India. Publication: November 2013

18 Search on Mobile Devices: Search queries on mobile devices are shorter and more ambiguous, so search results are less accurate. Solution: a system that captures user preferences to return a personalized result ranking, the Personalized Mobile Search Engine (PMSE).

19 PMSE: System Architecture (RSVM: Ranking Support Vector Machine)

20 PMSE: System Architecture (RSVM: Ranking Support Vector Machine)

21 PMSE
Client: receives user requests; stores click-through data (location, content); submits requests to the server; displays results; profiles preferences in an ontology-based user profile.
Server: forwards requests to a commercial search engine; performs RSVM training and search result reranking.
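
The slide does not give PMSE's feature set or say how preferences are extracted from click-through data, so the following is a generic illustration of a Ranking SVM, not PMSE's implementation. It assumes the "skip-above" heuristic, in which a clicked result is preferred over unclicked results ranked above it. It trains a plain linear model by stochastic subgradient descent on the pairwise hinge-loss objective, using hypothetical two-dimensional features (a location score and a content score).

```python
import random

def preference_pairs(ranked, clicked):
    """Skip-above heuristic (assumed): clicked doc > each unclicked doc ranked above it."""
    pairs = []
    for i, doc in enumerate(ranked):
        if doc in clicked:
            pairs.extend((doc, above) for above in ranked[:i] if above not in clicked)
    return pairs

def train_rsvm(pairs, features, c=1.0, lr=0.01, epochs=200, seed=0):
    """Linear Ranking SVM via stochastic subgradient descent on
    0.5*||w||^2 + C * sum(max(0, 1 - w . (x_preferred - x_other)))."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    pairs = list(pairs)
    n = max(len(pairs), 1)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            diff = [a - b for a, b in zip(features[preferred], features[other])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            w = [wi * (1 - lr / n) for wi in w]          # this pair's share of the L2 term
            if margin < 1:                               # hinge loss active: push pair apart
                w = [wi + lr * c * di for wi, di in zip(w, diff)]
    return w

def rerank(results, features, w):
    """Order results by the learned linear score, highest first."""
    def score(doc):
        return sum(wi * xi for wi, xi in zip(w, features[doc]))
    return sorted(results, key=score, reverse=True)

# Hypothetical features per result: [location match, content match].
features = {"a": [0.0, 1.0], "b": [0.1, 0.9], "c": [1.0, 0.0]}
w = train_rsvm(preference_pairs(["a", "b", "c"], clicked={"c"}), features)
print(rerank(["a", "b", "c"], features, w))
```

In this toy case the user skipped two content-heavy results to click a location-matching one. The learned weights therefore favour the location feature, and the clicked result moves to the top.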

22 Extraction of Address Data from Unstructured Text using Free Knowledge Resources. Sebastian Schmidt (schmidt@kom.tudarmstadt.de), Simon Manschitz (manschitz@stud.tudarmstadt.de), Ralf Steinmetz (steinmetz@kom.tudarmstadt.de), Christoph Rensing (rensing@kom.tudarmstadt.de), Multimedia Communications Lab, Technische Universität Darmstadt, Germany. Publication: November 2013

23 Extraction of Address Data: This is of interest in various domains, e.g. location-based services and automatically created address repositories. However, automatic harvesting of web addresses is not possible. Solution: identify business address data with a hybrid approach that combines patterns and gazetteers.

24 Address Structure (Germany)
Company name: no special pattern.
Street: varies, e.g. Bürgermeister-Jung, Bgm.-Jung.
Street number: digit sequence, e.g. 45a, 45-47.
Postal code: exactly 5 digits, reserved.
City: varies, e.g. Frankfurt, Ffm, Frankfurt/Main.

25 Address Data Identification Workflow

26 Address Data Identification: Preprocessing
Strip HTML markup, e.g. using the Beautiful Soup library.
Cleaning: remove non-Unicode characters and whitespace between numbers.
Line splitting and tokenizing, using the Apache OpenNLP toolkit.
Part-of-speech tagging, using TreeTagger.
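
The preprocessing chain can be sketched with the Python standard library alone. This is a stand-in, not the authors' pipeline: `html.parser` replaces Beautiful Soup, splitting at block-level tags and whitespace tokenization replace OpenNLP, and POS tagging is omitted. The set of tags treated as line breaks, and the reading of "non-Unicode characters" as non-printable characters and U+FFFD, are assumptions. Applied literally, the whitespace-between-numbers rule would also merge a house number with a postal code that follows it on the same line, so a real implementation would need a narrower rule.

```python
import re
from html.parser import HTMLParser

# Tags treated as line breaks (assumption; the slide does not list them).
BLOCK_TAGS = {"br", "p", "div", "li", "tr", "td", "h1", "h2", "h3", "address"}

class TextExtractor(HTMLParser):
    """Collects text, skips <script>/<style>, and inserts newlines at block tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

def clean(text):
    # Non-printable characters (e.g. NBSP, tabs) become spaces; U+FFFD is dropped.
    text = "".join(ch if ch == "\n" or ch.isprintable() else " " for ch in text)
    text = text.replace("\ufffd", "")
    text = re.sub(r"(?<=\d) +(?=\d)", "", text)   # "603 11" -> "60311"
    return re.sub(r" +", " ", text)

def preprocess(html):
    """One token list per non-empty line, using whitespace tokenization."""
    lines = [line.strip() for line in clean(strip_html(html)).split("\n")]
    return [line.split() for line in lines if line]

page = "<div>Muster GmbH<br>Hauptstr.&nbsp;5<br/>603 11 Frankfurt</div><script>var x = 1;</script>"
print(preprocess(page))
```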

28 Address Data Identification: Line Splitting. Line splitting and tokenizing are done with the Apache OpenNLP toolkit.

29 Address Data Identification
1. Postal codes: tokens matching the regular expression [0-9]{5}.
2. Cities: a list generated from OpenStreetMap, accessed via the Overpass API (28,087 entries). A token is accepted as a city if it is a known city found in the list and is directly preceded by a postal code.
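
A minimal sketch of rules 1 and 2 follows. The regular expression is applied with `re.fullmatch`, so the whole token must be exactly five digits and "123456" is rejected. The small city set is a hypothetical stand-in for the OpenStreetMap list. Trying multi-token names such as "Frankfurt am Main" longest-first is an added assumption.

```python
import re

POSTAL_CODE = re.compile(r"[0-9]{5}")

# Hypothetical stand-in for the 28,087-entry OpenStreetMap city list.
CITIES = {"Frankfurt", "Frankfurt am Main", "Frankfurt/Main", "Ffm", "Darmstadt"}

def is_postal_code(token):
    return POSTAL_CODE.fullmatch(token) is not None

def find_postal_city(tokens, cities=CITIES, max_city_tokens=3):
    """Return (postal code, city) pairs where a known city directly follows a postal code."""
    hits = []
    for i, token in enumerate(tokens):
        if not is_postal_code(token):
            continue
        for n in range(max_city_tokens, 0, -1):   # longest city name first
            span = tokens[i + 1:i + 1 + n]
            if len(span) == n and " ".join(span) in cities:
                hits.append((token, " ".join(span)))
                break
    return hits

print(find_postal_city(["60311", "Frankfurt", "am", "Main"]))
```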

30 Address Data Identification
3. Street numbers: use the regular expression ([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|- ])([0-9]{1,3})([a-zA-Z][0-9]?)?)?
4. Street names: a list generated from OpenStreetMap, accessed via the Overpass API (300,000 entries); street-name endings are also used, e.g. "str".
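
Rules 3 and 4 can be sketched as follows. The separator class is written on the slide as `[+|- ]`. Python's `re` rejects that as a reversed character range (from `|` to space), so the sketch escapes the hyphen as `[+|\- ]` and otherwise keeps the pattern unchanged. The street gazetteer and every ending except "str" are hypothetical. Requiring a street name to be directly followed by a street number is an assumption based on the standard German address order.

```python
import re

# Slide pattern with the hyphen in the separator class escaped so it compiles.
STREET_NO = re.compile(r"([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|\- ])([0-9]{1,3})([a-zA-Z][0-9]?)?)?")

KNOWN_STREETS = {"Zeil"}  # hypothetical stand-in for the 300,000-entry OSM list
# "str" is from the slide; the other endings are assumptions.
STREET_ENDINGS = ("str", "str.", "straße", "strasse", "weg", "allee", "platz")

def is_street_number(token):
    return STREET_NO.fullmatch(token) is not None

def is_street_name(token):
    return token in KNOWN_STREETS or token.lower().endswith(STREET_ENDINGS)

def find_street(tokens):
    """Return the first (street name, street number) pair of adjacent tokens, if any."""
    for i in range(len(tokens) - 1):
        if is_street_name(tokens[i]) and is_street_number(tokens[i + 1]):
            return tokens[i], tokens[i + 1]
    return None

print(find_street(["Bürgermeister-Jung-Straße", "45-47"]))
```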

31 Address Data Identification
5. Company name: search for identifying terms (legal forms, taken from Wikipedia), 29 terms, e.g. GmbH (private), AG (public), and exploit the standard address structure.
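
A minimal sketch of rule 5 follows, assuming the usual order of company line, then street line, then postal code and city. It searches a few lines above an already-found street line for a legal-form term. The term set is a hypothetical subset of common German legal forms, not the 29-term Wikipedia list, and the two-line window is an assumption.

```python
# Hypothetical subset of German legal-form designations (not the slide's 29-term list).
LEGAL_FORMS = {"GmbH", "AG", "KG", "OHG", "UG", "e.V.", "KGaA"}

def has_legal_form(line):
    tokens = line.replace(",", " ").split()
    return any(token in LEGAL_FORMS for token in tokens)

def find_company(lines, street_index, window=2):
    """Search up to `window` lines above the street line, nearest first,
    for a line containing a legal-form term."""
    for i in range(street_index - 1, max(street_index - 1 - window, -1), -1):
        if has_legal_form(lines[i]):
            return lines[i]
    return None

lines = ["Impressum", "Muster Software GmbH", "Hauptstr. 5", "60311 Frankfurt"]
print(find_company(lines, street_index=2))
```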

32 Evaluation & Methodology: sites with a legal notice (1,576 websites). Measure: the fraction of full addresses identified correctly. Results: R_correct (address) = 0.946, R_company = 0.82.

33 Conclusion
Search engine ranking: evaluation showed the algorithm was accurate and effective; efficiency was impacted by reliance on external databases.
Recommendations: maintain a database of special resources to increase efficiency; adapt address extraction to other languages.

34 Thank You! (Q&A)

