Searching the Web. Dr. Frank McCown, Intro to Web Science, Harding University. This work is licensed under Creative Commons Attribution-NonCommercial 3.0.



How do you locate information on the Web?
When seeking information online, one must choose the best way to fulfill one’s information need
Most popular:
– Web directories
– Search engines (primary focus of this lecture)
– Social media

Web Directories
Pages ordered in a hierarchy
Usually powered by humans
Yahoo started as a web directory in 1994 and still maintains one
Open Directory Project (ODP) is the largest and is maintained by volunteers

Search Engines
Most often used to fill an information need
Pages are collected automatically by web crawlers
Users enter search terms into a text box and get back a SERP (search engine result page)
Queries are generally modified and resubmitted to the SE if the desired results are not found on the first few pages of results
Types of search engines:
– Web search engines (Google, Yahoo, Bing)
– Metasearch engines; includes Deep Web (Dogpile, WebCrawler)
– Specialized (or focused) search engines (Google Scholar, MapQuest)

Components of a Search Engine
Figure from Introduction to Information Retrieval by Manning et al., Ch 19.

[Figure: annotated SERP with callouts for the search query, paid results, organic results, page title, text snippet, and indexed copy]

Social Media
Increasingly being used to find info
Limits influence of results to trusted group
Figure: Nielsen study (August 2009)

Search Queries
Search engines store every query, but companies usually don’t share them with the public because of privacy issues
– 2006 AOL search log incident
– 2006 govt subpoenas Google incident
Often short: 2.4 words on average [1] but getting longer [2]
Most users do not use advanced features [1]
Distribution of terms is long-tailed [3]
[1] Spink et al., Searching the web: The public and their queries, 2001
Lempel & Moran, WWW 2003

Search Queries
10-15% contain misspellings [1]
Often repeated: Yahoo study [2] showed 1/3 of all queries are repeat queries, and 87% of users click on the same result
[1] Cucerzan & Brill
[2] Teevan et al., History Repeats Itself: Repeat Queries in Yahoo's Logs, Proc SIGIR 2006

Query Types
Informational
– Intent is to acquire info about a topic
– Examples: safe vehicles, albert einstein
Navigational
– Intent is to find a particular site
– Examples: facebook, google
Transactional
– Intent is to perform an activity mediated by a website
– Examples: children books, cheap flights
Broder, Taxonomy of web search, SIGIR Forum, 2002

Google Trends

Google Flu Trends

Google Zeitgeist

My 2010 Top Queries, Sites, & Clicks

My 2010 Monthly, Daily, & Hourly Search Activity

Relevance
Search engines are useful if they return relevant results
Relevance is hard to pin down because it depends on the user’s intent & context, which are often not known
Relevance can be increased by personalizing search results
– What is the user’s location?
– What queries has this user made before?
– How does the user’s searching behavior compare to others?
Two popular metrics are used to evaluate whether the results returned by a search engine are relevant: precision and recall

Precision and Recall
[Figure: Venn diagram of the Corpus showing the Retrieved and Relevant sets and their Overlap]
Precision = Overlap / Retrieved
Recall = Overlap / Relevant

Example
Given a corpus of 100 documents
20 are about football
Search for football results in 50 returned documents, of which 10 are about football
Precision = Overlap / Retrieved = 10/50 = 0.2
Recall = Overlap / Relevant = 10/20 = 0.5
Note: Usually precision and recall are at odds
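The football example can be checked with a few lines of Python. This is only a minimal sketch: the function name `precision_recall` and the integer document IDs are assumptions made for illustration, not part of the lecture.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    overlap = retrieved & relevant  # relevant documents that were retrieved
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    recall = len(overlap) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus of 100 documents (IDs 0..99); IDs 0..19 are about football.
relevant = set(range(20))
# Hypothetical result set of 50 documents: 10 about football, 40 about other topics.
retrieved = set(range(10)) | set(range(20, 60))

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.2 0.5
```

Returning 0.0 for an empty set is a design choice of this sketch; other conventions exist for that edge case.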

High Precision, Low Recall
[Figure: Venn diagram of the Corpus with the Retrieved and Relevant sets]
Precision = Overlap / Retrieved
Recall = Overlap / Relevant
Missing a lot of relevant docs!

Low Precision, High Recall
[Figure: Venn diagram of the Corpus with the Retrieved and Relevant sets]
Precision = Overlap / Retrieved
Recall = Overlap / Relevant
Lots of irrelevant docs!

Evaluating Search Engines
We don’t usually know how many documents on the entire Web are about a particular topic, so computing recall for a web search engine is not possible
Most people view only the first page or two of search results, so the top N results are most important, where N is 10 or 20
P@N is the precision of the top N results
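The precision of the top N results (commonly written P@N) might be computed as in the sketch below. The function name `precision_at_n`, the document IDs, and the relevance judgments are hypothetical. Dividing by N even when fewer than N results come back is one common convention, assumed here.

```python
def precision_at_n(ranked_results, relevant, n=10):
    """Fraction of the top n ranked results that are judged relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_results[:n]
    hits = sum(1 for doc in top if doc in relevant)
    # Assumed convention: divide by n, so missing results count as misses.
    return hits / n


# Hypothetical ranked result list (best first) and relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
judged_relevant = {"d3", "d1", "d4", "d5"}

p_at_5 = precision_at_n(ranked, judged_relevant, n=5)
p_at_10 = precision_at_n(ranked, judged_relevant, n=10)
print(p_at_5, p_at_10)  # 0.6 0.4
```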

Comparing Search Engine with Digital Library
McCown et al. [1] compared the precision of Google and the National Science Digital Library (NSDL)
School teachers evaluated relevance of search results with regard to Virginia's Standards of Learning
Overall, Google’s precision was found to be 38.2% compared to NSDL’s 17.1%
[1] McCown et al., Evaluation of the NSDL and Google search engines for obtaining pedagogical resources, Proc ECDL 2005

F-score
F-score combines precision and recall into a single metric
F-score is the harmonic mean of precision and recall
Highest = 1, Lowest = 0
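Since the F-score is the harmonic mean of precision and recall, it can be written as F = 2PR / (P + R). A minimal sketch follows; the function name `f_score` is an assumption, as is returning 0 when both inputs are 0.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0  # assumed convention: avoids dividing by zero
    return 2 * precision * recall / (precision + recall)


# A precision of 0.2 and a recall of 0.5 give F = 0.2 / 0.7, roughly 0.286.
f = f_score(0.2, 0.5)
print(round(f, 3))
```

The harmonic mean stays close to the smaller of the two values, so a system cannot reach a high F-score by maximizing only precision or only recall.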

Quality of Ranking
Issue search query to SE and have humans rank the first N results in order of relevance
Compare human ranking with SE ranking (e.g., Spearman rank-order correlation coefficient)
Other ranking methods can be used
– Discounted cumulative gain (DCG) [2], which gives higher ranked results more weight than lower ranked results
– M measure [3], a function similar to DCG which gives a sliding scale of importance based on ranking
[1] Vaughan, New measurements for search engine evaluation proposed and tested, Info Proc & Mgmt (2004)
[2] Järvelin & Kekäläinen, Cumulated gain-based evaluation of IR techniques, TOIS (2002)
[3] Bar-Ilan et al., Methods for comparing rankings of search engine results, Computer Networks (2006)
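Two of the ranking measures named above can be sketched in Python. The Spearman function uses the textbook formula for rankings without ties. The DCG function uses the widely used log2(rank + 1) discount, which is an assumption here and may differ in detail from the formulation in the cited paper. All function names and sample values are hypothetical.

```python
import math


def spearman_rho(rank_a, rank_b):
    """Spearman rank-order correlation for two rankings without ties.

    rank_a[i] and rank_b[i] are the positions (1 = best) that the two
    rankers assigned to item i.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def dcg(gains):
    """Discounted cumulative gain with an assumed log2(rank + 1) discount.

    gains[i] is the graded relevance of the result at rank i + 1.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


# Hypothetical positions given to five results by the SE and by human judges.
se_rank = [1, 2, 3, 4, 5]
human_rank = [2, 1, 3, 5, 4]
rho = spearman_rho(se_rank, human_rank)
print(round(rho, 3))  # 0.8

# Hypothetical graded relevance (0 to 3) of results in SE order.
good_order = dcg([3, 2, 0, 1])
bad_order = dcg([1, 0, 2, 3])
print(good_order > bad_order)  # True
```

A rho of 1 means the two rankings agree completely and -1 means they are exactly reversed. For DCG, the same relevance grades score higher when the most relevant results appear nearer the top.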