
1 Searching the Web Ed Milne

2 Theme ● How to find information on the World Wide Web.

3 How Search Engines Work ● Web crawlers are software programs that collect information from the web. – Read the web pages like your browser. – Add the significant words in the web page to the search engine database. These words are matched to your search requests. – Collect references to any other web pages. ● The crawler uses these references to find other pages to analyze ● The search engine uses the number of references to a page as a factor to determine its importance.
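
The crawl loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not how any real search engine is built: the PAGES dictionary, the example URLs and the STOP_WORDS set are all invented stand-ins for the real web, and "significant words" is approximated by dropping a few common words.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, links found on the page).
PAGES = {
    "a.example": ("cats and dogs", ["b.example", "c.example"]),
    "b.example": ("cats behaviour", ["c.example"]),
    "c.example": ("dogs behaviour", []),
}
STOP_WORDS = {"and", "the", "of"}  # assumed list of insignificant words


def crawl(start):
    """Read pages breadth-first, index their words and count references."""
    index = defaultdict(set)    # word -> URLs of pages containing it
    inbound = defaultdict(int)  # URL -> number of pages referring to it
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(url)
        for link in links:
            inbound[link] += 1          # one more reference to this page
            if link not in seen:        # follow each reference only once
                seen.add(link)
                queue.append(link)
    return index, inbound


index, inbound = crawl("a.example")
print(sorted(index["cats"]))
print(inbound["c.example"])
```

The index answers "which pages contain this word", and the inbound counts are the raw material for the importance factor mentioned in the last bullet.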

4 Responding to a Query ● When you enter a query, the work is divided among many computers in the search engine's server farm. – This accounts for, say, Google's fraction-of-a-second response time.
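
One way to picture the division of work is to split the index into pieces and ask every piece at the same time. The sketch below assumes three made-up index shards and uses threads in place of separate computers; real server farms are far more elaborate.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "computer" holds the index for part of the web.
SHARDS = [
    {"cats": {"a.example"}, "dogs": {"a.example"}},
    {"cats": {"b.example"}},
    {"dogs": {"c.example"}},
]


def search_shard(shard, word):
    """Look the word up in one shard of the index."""
    return shard.get(word, set())


def search(word):
    """Query every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial = pool.map(lambda shard: search_shard(shard, word), SHARDS)
        return sorted(set().union(*partial))


print(search("cats"))
```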

5 Search Engine Optimization ● For all practical purposes, only the first few entries in a search list will be explored by users ● The SEO industry provides advice on how to get a web site to the top of the list. ● Factors that can affect ranking include – Number of other web pages that refer to the page – Placement of the search words ● In the title ● In a heading ● How close multiple search words are to each other
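
To make the listed factors concrete, here is a toy scoring function. The weights (10 for a title match, 5 for a heading match, up to 5 for closeness) and the page dictionaries are invented for illustration; real ranking algorithms are not public.

```python
from itertools import combinations


def min_gap(words, a, b):
    """Smallest distance, in words, between any occurrence of a and of b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def score(page, terms):
    """Toy relevance score built from the factors on the slide."""
    total = page["inbound_links"]            # pages that refer to this page
    title = page["title"].lower().split()
    heading = page["heading"].lower().split()
    body = page["body"].lower().split()
    for term in terms:
        if term in title:
            total += 10                      # search word in the title
        if term in heading:
            total += 5                       # search word in a heading
    for a, b in combinations(terms, 2):
        gap = min_gap(body, a, b)
        if gap is not None:
            total += max(0, 5 - gap)         # closer words score higher
    return total


page1 = {"title": "Cat behaviour", "heading": "Why cats purr",
         "body": "cat behaviour explained", "inbound_links": 3}
page2 = {"title": "Pets", "heading": "Animals",
         "body": "the cat has many kinds of odd behaviour", "inbound_links": 3}
terms = ["cat", "behaviour"]
print(score(page1, terms), score(page2, terms))
```

Both pages have the same number of inbound links, so the difference in score comes entirely from where the search words are placed.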

6 Search Engine Optimization ● The exact algorithms used by search engines like Google are closely guarded secrets and are frequently changed. ● Some search engines allow websites to buy a preferred position. ● Search engines often have policies regarding acceptable and unacceptable ways of optimizing search results. ● See http://searchenginewatch.com/

7 Metadata ● In HTML, metadata is supposed to describe the page. The metadata description does not appear when you browse a page but can be used by the search engines to classify the page. ● Thus, you can get search results for a page where the words in your search do not appear on the page.
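
As a rough illustration, the description metadata can be read out of a page with Python's standard html.parser module. The sample page below is made up; note that the words in its description do not appear in the visible body text.

```python
from html.parser import HTMLParser


class MetaDescription(HTMLParser):
    """Collect the content of <meta name="description"> if present."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content")


# Hypothetical page used only for this example.
PAGE = """<html><head><title>Habs</title>
<meta name="description" content="Montreal Canadiens hockey news">
</head><body><p>Game recap.</p></body></html>"""

parser = MetaDescription()
parser.feed(PAGE)
print(parser.description)
```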

8 Sponsored Links ● Ethical search engines, like Google, will clearly identify paid links ● Less ethical search engines will not.

9 Search Engine Types ● Directories – Categorize web sites. You search the categories. ● General Search Engines – Collect information about web pages on the web. ● Metasearch Engines – Search other search engines rather than their own database. ● Some sites combine these types.

10 Directories ● The major directories are – Yahoo - http://dir.yahoo.com/ – Open Directory Project - http://www.dmoz.org/ – Google Directory - http://www.google.com/dirhp ● Based on the Open Directory Project – AltaVista - http://dir.yahoo.com/ – HotBot - http://www.hotbotdirectory.com/ – Lycos - http://yellowpages.lycos.com/ ● Yellow pages ● Good for initial research into a subject.

11 Open Directory Project ● Crowd-sourced directory ● Volunteers create and maintain the entries ● Can be downloaded and used by any website ● In multiple languages ● Anyone can submit a URL for a site to be added to the ODP – Sites are not necessarily accepted

12 General Search Engines ● The major general search engines are – Google web pages - http://www.google.ca – Bing - http://www.bing.com/ – Yahoo web pages - http://search.yahoo.com/ ● Uses the Bing search engine – AltaVista - http://www.altavista.com/ ● Owned by Yahoo – Lycos - http://www.lycos.com/ ● Best when looking for specific information.

13 Metasearch Engines ● The major ones are – HotBot - http://www.hotbot.com – Dogpile - http://www.dogpile.com/ – Mamma - http://www.mamma.com/ – Ixquick - http://www.ixquick.com/ – Metacrawler - http://www.metacrawler.com/ – Ask Jeeves - http://ca.ask.com ● Good when looking for hard-to-find information.

14 Boolean Logic ● Specifies the logical relationship between terms used in your search. ● Named in honour of the British mathematician George Boole.

15 OR ● The web page is selected if either term is present. ● college OR university – Select web pages that contain either college or university or both ● OR and other Boolean terms must be in upper case

16 AND ● Both terms must be present to select the web page ● poverty AND crime – Select web pages that contain both poverty and crime ● Google uses an implied AND for all terms

17 NOT ● Exclude pages that contain the term. ● cat NOT dog – Find pages that contain cat but do not contain dog ● Leading minus sign (-) implies NOT – cat -dog ● You can use NOT to refine a search.

18 Parentheses ● Parentheses force the order of processing. ● (cats OR felines) AND behaviour – First select the web pages that contain the words cats or felines. From these pages, only select the pages that also contain the word behaviour. ● cats OR (felines AND behaviour) – Select the web pages that contain the word cats or contain both the word felines and the word behaviour.
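
The Boolean operators on the last few slides map directly onto set operations. The sketch below assumes a tiny made-up index (word to set of page names) and shows how the parentheses change the result.

```python
# Hypothetical index: word -> set of pages that contain it.
INDEX = {
    "cats": {"p1", "p2"},
    "felines": {"p3"},
    "behaviour": {"p2", "p3", "p4"},
    "dogs": {"p2", "p4"},
}

cats, felines = INDEX["cats"], INDEX["felines"]
behaviour, dogs = INDEX["behaviour"], INDEX["dogs"]

either = cats | felines                   # cats OR felines
both = cats & behaviour                   # cats AND behaviour
without = cats - dogs                     # cats NOT dogs (cats -dogs)
grouped_1 = (cats | felines) & behaviour  # (cats OR felines) AND behaviour
grouped_2 = cats | (felines & behaviour)  # cats OR (felines AND behaviour)

print(sorted(grouped_1))
print(sorted(grouped_2))
```

With this index the first grouping returns only pages that mention behaviour, while the second also returns p1, which mentions cats but not behaviour.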

19 Exact Phrase or Word ● Double quotes around terms select the phrase rather than the individual words ● "montreal canadiens" – Selects the web pages that contain the phrase montreal canadiens ● montreal canadiens – Selects web pages that contain the words montreal and canadiens ● A leading plus sign (+) means only search for an exact match – Montreal +Canadien will not match canadiens
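
The difference between a quoted phrase and separate words can be sketched like this. The two helper functions are illustrative only and split text on whitespace, which is much cruder than what a real search engine does.

```python
def words(text):
    """Lower-case the text and split it into words."""
    return text.lower().split()


def has_all_words(text, query):
    """Unquoted query: every word must appear somewhere on the page."""
    page = set(words(text))
    return all(w in page for w in words(query))


def has_phrase(text, phrase):
    """Quoted query: the words must appear together and in order."""
    page, target = words(text), words(phrase)
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))


scattered = "the canadiens play in montreal"
together = "the Montreal Canadiens won"
print(has_all_words(scattered, "montreal canadiens"))
print(has_phrase(scattered, "montreal canadiens"))
print(has_phrase(together, "montreal canadiens"))
```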

20 Symbols ● * - matches any known term – Google * finds a list of Google services ● A hyphen between two words indicates the words are strongly connected – e.g. wild-card matches wild card, wild-card and wildcard ● A dollar sign means a price – e.g. Nikon 400 and Nikon $400 give different results

21 Synonyms ● Google automatically uses synonyms – e.g. Child care will also search for childcare ● To suppress synonyms – Add a plus sign (+) before the word - +childcare – Enclose the word in quotes – "child care"

22 Word Stemming ● Word stemming causes the search engine to look for any words with the same root as the word that you enter. ● E.g. tests will also match on test, testers, testing and testable ● You can turn off word stemming by adding a plus sign (+) at the beginning of the word ● E.g. +tests will only search for tests
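
A toy version of stemming is shown below. It simply strips a short, assumed list of suffixes, which is enough for the example on the slide; production search engines use far more careful rules, and the SUFFIXES list and function names here are invented.

```python
SUFFIXES = ("able", "ing", "ers", "er", "s")  # assumed, checked in this order


def stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, word):
    """A leading plus sign turns stemming off and demands an exact match."""
    if query.startswith("+"):
        return word.lower() == query[1:].lower()
    return stem(query) == stem(word)


for candidate in ["test", "testers", "testing", "testable"]:
    print(candidate, matches("tests", candidate), matches("+tests", candidate))
```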

23 Advanced Search ● Most search engines have an Advanced Search option that accepts additional criteria for matching pages. Typical advanced search options are: – Language of the page – How old the page is - how long since the page was updated – Filter out pornography and foul language – Set the number of results by page

24 Google Advanced Search ● All these words – Normal query with implied AND ● Exact wording or phrase – Quoted phrase or word ● One or more of these words – OR expression ● Unwanted words – NOT expression

25 Google Advanced Search ● Reading level – Basic, intermediate or advanced ● Results per page ● Language – Only return pages written in a specific language ● File type – e.g. Only find PDF files ● Domain – Only search within a specific domain

26 Google Advanced Search ● Date – How recent is the page ● Usage rights – e.g. Free to use or share ● Where keywords appear – Anywhere, page title, text, etc. ● Region – e.g. Canada

27 Google Advanced Search ● Numeric range – e.g. $1500..$3000 ● SafeSearch ● Find pages similar to a specific page ● Find pages that link to a specific page ● See also http://www.googleguide.com/advanced_operators_reference.html

28 Translation ● Some search engines provide the ability to translate pages written in different languages. E.g. you will get a different history of Canada if the page is written in French rather than English. – Google ● click on the Translate this page link or ● Copy the link and go to http://translate.google.com – Yahoo ● click on the Translate this page link or ● go to http://babelfish.yahoo.com

29 Translation ● Google's Chrome browser automatically detects web pages written in a different language and offers to translate them

30 Cached Pages ● Google saves the version of the page used by its web crawler to create the index. This can be different from the current version of the page. ● Click on the Cached link. ● This also gives you access to the page on a 404 error.

31 Other Google Functions ● If you have a Google account and Google knows your location – Sunrise gives the time of sunrise – Sunset gives the time of sunset – Weather gives a four day weather report ● Calculator – e.g. 5*9+(sqrt 10)^3 gives 76.6227766 ● Currency exchange – e.g. 100 USD in CAD gives 100 U.S. dollars = 96.470157 Canadian dollars
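
The calculator example can be checked by hand or with a couple of lines of Python: the square root of 10 is about 3.1622777, its cube is about 31.6227766, and adding 5*9 = 45 gives the value on the slide.

```python
import math

# Same expression as the slide: 5*9+(sqrt 10)^3
result = 5 * 9 + math.sqrt(10) ** 3
print(round(result, 7))
```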

32 Other Google Functions ● Unit conversion – e.g. 10 cm in inches gives 3.93700787 inches ● Movies – Provides a description of movies playing locally ● Local businesses – e.g. Kingston on chinese food ● Dictionary – e.g. Define: monarchy

33 Other Google Functions ● Time – e.g. time london england ● Sports scores – e.g. ottawa senators ● Postal or zip codes – e.g. k7p 2t1 – gives a Google map ● Package tracking – Enter the UPS or FedEx tracking number ● Stocks – e.g. tsx:bce

34 Archive.org ● http://www.archive.org contains the archives of the internet – Run by the Internet Archive, a non-profit organization ● Moving images - 84,851 movies ● Live music archive - 90,226 concerts ● Audio - 850,843 recordings – e.g. Old Time Radio recordings ● Texts - 2,743,181 texts

35 Wayback Machine ● The part of Archive.org that contains the archives of the World Wide Web – Currently 2 petabytes and growing by 20 terabytes per month – Newly crawled pages generally take about 6 months to appear in the archive ● You can search here for almost any page that was ever on the web rather than just the current pages.

36 Regional Search Engines ● Some search engines or subsidiaries of the general search engines only search within a country or region rather than the entire (US dominated) web. – e.g. uk+web+search lists various UK search engines, such as http://www.wrx.zen.co.uk/searchuk.htm

37 Web Rings ● Web rings are self-selected groups of web sites on the same topic. ● A directory of web rings is http://dir.webring.org ● Now obsolete

38 Speciality Search Engines ● Also called topical search engines, vertical search engines or vortals (vertical portals). ● Search only on a specific topic or type of information, like MP3 files, legal information or web rings.

39 Speciality Search Engines ● General search engines have taken over this sector e.g. – Google Videos – Google Books – Google Scholar – Google Photos – Google Images – Google Finance – Google News – Google Blogs

40 Search Tools ● Programs (often shareware or freeware) that help you search. ● E.g. WebFerret – http://www.webferret.com/ – One of many such tools

41 WebFerret ● Metasearch engine – submits your query to multiple web search engines ● Features – Save your query and results – Filter out pornography and foul language – Full Boolean logic with multiple levels of parentheses – Works with all versions of Windows

42 Newsgroups ● Newsgroups are forums where individuals post messages and replies to other messages. ● Newsgroups give you access to other people, their knowledge and expertise. ● I have found Google Groups particularly valuable for resolving computer problems.

43 Google Alerts ● Lets you establish a standing search ● Google sends you an email with a summary of any new web pages or blogs that contain the search terms ● http://www.google.com/alerts

44 Bookmarks ● Web pages are stateless – They have no record of where you have been or what parameters you have entered on other pages ● Parameters are passed as part of the URL – They follow the question mark (?) in the URL

45 Bookmarks ● When you bookmark a page, the parameters are included – e.g. http://www.google.com/search?hl=en&lr=&q=%22seperate%22+-talk+-user+site%3Aen.wikipedia.org&aq=f&aqi=&aql=&oq=&gs_rfai= ● You can bookmark and reproduce searches
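
To see what such a bookmark actually stores, the URL can be taken apart with Python's standard urllib.parse module. The URL below is the one from the slide with the line-wrap spaces removed; everything after the question mark is the parameter list.

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://www.google.com/search?hl=en&lr=&q=%22seperate%22+-talk+-user"
       "+site%3Aen.wikipedia.org&aq=f&aqi=&aql=&oq=&gs_rfai=")

# keep_blank_values=True keeps parameters such as lr= that have no value.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

print(params["hl"][0])   # interface language
print(params["q"][0])    # the decoded search query
```

The q parameter decodes to the original search: the quoted word "seperate", the excluded terms -talk and -user, and a site: restriction to en.wikipedia.org.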

46 Searching within a page ● Press Ctrl+F to open a search text box ● A function of the browser, not the web – Exact function varies with the browser

47 Safe Surfing ● Log in under a standard user account rather than an administrator account – The operating system will prevent the installation of any software without your knowledge ● Check the security options in your browser – e.g. Firefox 4 ● Warn me when sites try to install add-ons ● Block reported attack sites ● Block reported web forgeries ● Use a good firewall and virus scanner ● Use a tool like Web of Trust or AVG Safe Search that flags suspicious sites

