IR & Web Search Engines Architectural design considerations.


1 IR & Web Search Engines Architectural design considerations

2 About Patrick O’Leary
Search Architect for AT&T Interactive
Principal Search Engineer, AOL
Co-Founder, Cost2Drive
Creator of Local / Spatial Lucene
Author, Photographer, Dog person

3 Basics of a search engine
Consumer facing ( Critical, often forgotten )
  – User interface is a science, not just an art!
  – http://eyetools.com/research_google_eyetracking_heatmap.html
  – Part of Power Log click distribution
Search engine software
  – Retrieval of candidate results
  – Ranking of results
  – Categorization / Classification / Federation ( Google single search box )
Data Acquisition
  – Crawling the web
  – Purchasing from data providers
  – Editorial content
  – Enriching, matching, merging ( Google Place Pages )
  – De-duplication ( Helps extended results )
Measuring
  – Business Intelligence
  – Quality Feedback ( click-through rates )
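The "retrieval of candidate results" and "ranking of results" steps on this slide can be illustrated with a toy example. This is a minimal sketch, not the presenter's implementation: it assumes a tiny in-memory corpus, whitespace tokenization, and a raw term-frequency score, and the names `build_index` and `search` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(query, docs, index):
    """Retrieve every document matching any query term, then rank the candidates."""
    terms = query.lower().split()

    # Retrieval: union of the posting sets of all query terms.
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())

    # Ranking: sum of raw query-term frequencies (an assumed, simplistic score).
    def score(doc_id):
        counts = Counter(docs[doc_id].lower().split())
        return sum(counts[t] for t in terms)

    # Highest score first; ties broken by document id for stable output.
    return sorted(candidates, key=lambda d: (-score(d), d))


# Hypothetical sample corpus.
docs = {
    1: "pizza restaurant in glendale",
    2: "pizza pizza delivery",
    3: "dog park",
}
index = build_index(docs)
results = search("pizza delivery", docs, index)
print(results)
```

A production engine would replace the raw term-frequency score with something like TF-IDF or BM25, but the split into a cheap candidate-retrieval step followed by a ranking step stays the same.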

4 Beyond the algorithm
In 2001, demand for content changed
  – Driven by news, media, “what just happened”.
  – Search engines were unable to respond
  – TV, Print, Editorial Driven content providers still important
  – Need to go beyond the data
Customizable Search Results
  – Drill down, restrict, reshape results
  – Vertical / Federated
  – Yelp.com
Personalized
  – Collaborative filtering
Real time
  – News, Sports
Viral
  – Twitter, Foursquare
  – URL shorteners, Bit.ly
  – ISPs’ access log data
Trending
  – Provide Navigation & Recommendations, not Search ( Google News )
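The "Personalized – Collaborative filtering" item can be sketched in a few lines. This is an illustrative user-based variant under stated assumptions: the click data is made up, user similarity is simply the number of shared clicks, and the function name `recommend` is hypothetical rather than anything named in the slides.

```python
from collections import Counter


def recommend(user, clicks, top_n=3):
    """Suggest items clicked by users whose click history overlaps with `user`."""
    seen = clicks[user]
    scores = Counter()
    for other, items in clicks.items():
        if other == user:
            continue
        # Similarity is the count of shared clicks (an assumed, very crude measure).
        overlap = len(seen & items)
        if overlap == 0:
            continue
        # Each unseen item is credited with the similarity of the user who clicked it.
        for item in items - seen:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


# Hypothetical click log: user -> set of clicked result categories.
clicks = {
    "ann": {"pizza", "sushi"},
    "bob": {"pizza", "sushi", "tacos"},
    "cat": {"pizza", "burgers"},
    "dan": {"hiking"},
}
recs = recommend("ann", clicks)
print(recs)
```

Here "tacos" outranks "burgers" for ann because bob shares two clicks with her while cat shares only one.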

5 Better Than Google?
How do you beat Google
  – Good
      The biggest, best & brightest
      The most money
      Household name
      Dear Yahoo, I've never heard anyone say, "I don't know, let's Yahoo! it..." just saying... Sincerely, Google
  – Bad
      Too many engineers!
      Limited user focus
      Clinical vs. Avatars, Facebook, MySpace, Bing (backgrounds), social, sharing
Google became great because of PageRank, clean UI, and AdSense.
Google stayed great because they focused on scaling out what they were great at.

6 Why better than Google?
CMS ( digital newspaper, editorial content )
  – Index faster
  – Restricted content
Competitive Search Engine
  – Controlled ranking
  – Trade off relevancy for monetization
Intranet
  – Not publicly accessible, can’t be crawled
  – Cheaper implementations than Google Search Appliance or Google Mini
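"Controlled ranking" and "trade off relevancy for monetization" can be pictured as a single tunable weight. The sketch below is an assumption about how such a trade-off might be expressed, not a method stated in the slides: the linear blend, the `weight` parameter, and the sample listings are all hypothetical.

```python
def blended_score(relevance, revenue, weight=0.2):
    """Linear blend: weight=0 ranks purely on relevance, weight=1 purely on revenue."""
    return (1 - weight) * relevance + weight * revenue


def rank(listings, weight=0.2):
    """Order listings by blended score, highest first."""
    return sorted(
        listings,
        key=lambda item: -blended_score(item["relevance"], item["revenue"], weight),
    )


# Hypothetical listings with relevance and revenue both normalised to [0, 1].
listings = [
    {"name": "organic", "relevance": 0.9, "revenue": 0.0},
    {"name": "sponsored", "relevance": 0.7, "revenue": 1.0},
]

by_relevance = [item["name"] for item in rank(listings, weight=0.0)]
by_blend = [item["name"] for item in rank(listings, weight=0.5)]
print(by_relevance, by_blend)
```

With `weight=0.0` the more relevant organic listing comes first; raising the weight to 0.5 moves the paying listing to the top, which is the relevancy cost the slide refers to.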

7 Recommended Reading
The Long Tail – Chris Anderson ( Read with Caution! )
A Picture of Search – Abdur Chowdhury & Greg Pass

