Exploring the Internet Topic: Searching the Web 91.113-021 Instructor: Michael Krolak 91.113-031 Instructor: Patrick Krolak See Lecturer notes

Exploring the Internet Topic: Searching the Web Instructor: Michael Krolak Instructor: Patrick Krolak See Lecturer notes Authors: P. D. & M. S. Krolak Copyright 2005

Search for Information on the Web Finding information on the web requires some understanding of how the various types of search engines work. Archives that capture the changes in the documents on the web are highly useful for those in the social sciences, technology, and business dynamics. “Intelligence is not the ability to store information, but to know where to find it.” - Albert Einstein

How do we find information? Memory; media (books, movies, music, art); observation; asking other people.

The Problem with the Internet The “Surface Web” contains 2.5 billion pages. Each day 7.5 million web pages are added to the World Wide Web. Information is submitted to the web without any context or test of validity.

The Archives of the Web 1. Archives of the Web’s websites 2. Google’s archive of the Internet newsgroups 3. Google’s archive of the world’s newspaper archives

The Way Back Machine Frustrated by dead links? There is an answer. Fill in the URL of the dead link at the WayBack Machine, and it will show the link's history (how the page changed over time) and allow you to view the dead page.

Google’s Newsgroup archive Archives over 100,000 groups. Goes back over 30 years for some groups. There are for-fee sites that provide competing services. Depending on the group, it can provide a treasure trove of insight into the cyber information society and its early history. Not all messages in the database are true, have merit or redeeming value, or are appropriate for children.

Google Newsgroup Google has a major effort to archive newsgroups

Google’s News Project Google plans to digitize the 250 years of published news and other data online.

Google’s news archive Google is creating an archive that will provide a free search of both free and fee publishers' archives. The goal is to provide search for articles going back to The project involves a partnership of many major information publishers.

Google’s News archive The archive is located at: chivesearch The archive works like any other Google search, and the results can be requested in timeline order.

Searching for Information on the web 1. Search engines 2. Directory searches 3. Meta-search engines

What is a Search Engine? search engine n. 1. A software program that searches a database and gathers and reports information that contains or is related to specified terms. 2. A website whose primary function is providing a search engine for gathering and reporting information available on the Internet or a portion of the Internet. Source: The American Heritage® Dictionary Copyright © 2002, 2001, 1995 by Houghton Mifflin Company. Published by Houghton Mifflin Company.

Search Engines Search engines have two parts: 1. The search system sends out onto the Internet a program called a spider or bot (robot). The spider traces all the links and returns all the pages found. The pages are characterized by algorithms and stored in databases. 2. The retrieval system takes a query and matches it against the databases, then ranks the responses by relevance. Each search engine uses a unique technique for retrieval and ranking.
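
The two parts described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine works: the page names and texts are made up, the "database" is an in-memory inverted index, and relevance is assumed to be a simple count of matching words.

```python
from collections import defaultdict

def build_index(pages):
    """Part 1: map each lowercase word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, pages, query):
    """Part 2: find pages with any query word, ranked by how often the words occur."""
    terms = query.lower().split()
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())

    def score(name):
        words = pages[name].lower().split()
        return sum(words.count(term) for term in terms)

    # Highest score first; ties are broken alphabetically by page name.
    return sorted(candidates, key=lambda name: (-score(name), name))

# Hypothetical pages that a spider might have brought back.
PAGES = {
    "a.html": "search engines index the web",
    "b.html": "a spider crawls the web and the web grows",
    "c.html": "librarians answer questions",
}
INDEX = build_index(PAGES)

print(search(INDEX, PAGES, "web spider"))
```

Real engines differ mainly in the second part: each one scores relevance in its own way, which is why the same query can give different rankings on different engines.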

What is a spider? spider n. 1. An automated program which crawls over the World Wide Web, gathering web pages for search engines. Spiders will ignore sites that explicitly ask not to be indexed by the search engines. Also referred to as a webcrawler, crawler, or bot.
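
One common way a site asks not to be indexed is a robots.txt file. The sketch below uses Python's standard urllib.robotparser module to show how a polite spider might check a URL before fetching it. The robots.txt content, the bot name, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/notes.html"))
```

A real spider would download robots.txt from the site itself; the text is passed in directly here so the example runs without a network connection.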

What are Meta Tags meta tags n. 1. Attributes that describe information about the content of the document. Some spiders use these tags to determine the relevance of a site to future queries. Example
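
As an illustration, the sketch below shows a made-up page with two meta tags and how a spider might read them, using Python's standard html.parser module. The page content and the class and function names are assumptions for this example only.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

def extract_meta(html):
    """Return a dictionary of meta tag names and their content."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

# Hypothetical page header.
SAMPLE_PAGE = """<html><head>
<title>Searching the Web</title>
<meta name="description" content="Lecture notes on web search engines">
<meta name="keywords" content="search engine, spider, meta tag">
</head><body></body></html>"""

print(extract_meta(SAMPLE_PAGE))
```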

How do search engines work?

Meta Search Engines Meta search engines may use their own resources to answer a question, but mostly they package the user's query and send it to many other search engines simultaneously (a process called spawning), then wait for the replies. After a fixed time, the meta search engine pulls the responses it has received together into a report. There are many ways to build a meta search engine on this idea. Some search only the web; others search newsgroups, newspapers, and scientific journals.
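
The spawning idea can be sketched as follows. The two "engines" here are hypothetical stand-in functions that return fixed lists of made-up URLs; a real meta search engine would send the query over the network instead. The merging rule (keep the first occurrence of each URL) is also just one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines: each returns a ranked list of URLs.
def engine_one(query):
    return ["http://example.com/a", "http://example.com/b"]

def engine_two(query):
    return ["http://example.com/b", "http://example.com/c"]

def meta_search(query, engines, timeout=2.0):
    """Send the query to every engine at once and merge the unique replies."""
    merged = []
    seen = set()
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Spawning: all engines start working on the query at the same time.
        futures = [pool.submit(engine, query) for engine in engines]
        for future in futures:
            try:
                # Simplification: the timeout applies to each reply in turn,
                # not to the search as a whole.
                results = future.result(timeout=timeout)
            except Exception:
                continue  # skip engines that fail or answer too late
            for url in results:
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged

print(meta_search("search engines", [engine_one, engine_two]))
```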

Why is an understanding of how a search engine works important? From the view of a user: –The user wants to find the information with as few downloads as possible. –The easier to use and the more accurate the ranking, the better. From the view of a web site developer: –The developer wants the site to be found in the first 5-10 ranked responses to a query. –The merit of a web design is often based on the search rankings. This requires a knowledge of how a given search engine ranks a page.

When in doubt, ask a librarian: The librarian is a trained professional and is well versed in using the various WWW resources for finding answers on a vast array of subjects. The librarian should be used for difficult searches, but the student will wisely observe, learn, and contemplate the librarian's techniques, resources, and methods.

What is a Subject Directory? subject directory n. 1. An Internet research tool on the World Wide Web that organizes Internet resources by subject headings and subheadings. Subject directories are usually compiled by human beings who apply some selection criteria to resources included in the database.

Examples of Subject Directories Yahoo!; BUBL (http://bubl.ac.uk/); Internet Public Library (http://); About.com (www.about.com); Jump City (www.jumpcity.com); Joe Ant (http://)

What is a Meta Search Engine? meta search engine n. 1. A search engine that uses its own database and also sends the query to many other search engines simultaneously (called spawning), then reports the unique responses from those search engines. 2. Some meta search engines are limited to one kind of source: only the web, newsgroups, newspapers, or scientific journals.

Examples of Meta Search Engines Ask Jeeves -- you frequently get the answer in the first pass; Jeeves allows queries in natural language. Dogpile -- for its variety of sources (web, newsgroups, newspapers). Ixquick. Metacrawler. ProFusion.

Current Information on search engines Search Engine Watch is a source for comparing search engines and keeping up with innovations as they occur in the field. Recently Google was asked to turn over records about its customers' search topics and the number of times pornographic information was accessed. The federal government was looking to prove its case for protecting children. MSN and Yahoo had complied with the request. Google also recently complied with the Chinese government's request to censor political inquiries on the Chinese version of Google.

The Deep Web

What is the invisible or deep web? Invisible Web (n.) Also referred to as the deep Web, the term refers to either Web pages that cannot be indexed by a typical search engine or Web pages that a search engine purposely does not index, rendering the data “invisible” to the general user. One of the most common reasons that a Web site’s content is not indexed is because of the site’s use of dynamic databases, which opens the door for a potential spider trap. Web pages can also fall into the invisible Web if there are no links leading to them, since search engine spiders typically crawl through links that lead them from one destination to another. Data on the invisible Web is not inaccessible; the information is out there – it is stored on a Web server somewhere and can be accessed using a browser – but the data must be found using means other than the general-purpose search engines, such as Google and Yahoo!. Source:

The deep web The deep web is not mysterious. It simply means that normal search engines, whose spiders go from one link to another, will not work with pages that are generated on the fly from data requested from a database, pages that are not linked from other pages, and so on. Examples of deep web sites are the yellow or white pages, catalogues, and patents. Google can index and search PDF, text, and Word documents.

What is the Deep Web? Estimated to be 500 times the size of the surface web. With 2.5 billion pages on the surface web, that is about 500 × 2.5 billion = 1.25 trillion pages.

Using the search tools to find information on the web

Successful searching Plan your search: 1. Which words will appear only on the right web page? Should they all be there, or are there alternatives? The most specific concept is the best. 2. If you do not know your topic well, use a meta search engine to get an overview. Then refine your search with a search engine like Google or Alta Vista. 3. If the topic is technical, use a virtual library site to find information reviewed by experts.

What is Boolean Logic? We use Boolean logic to evaluate the truth of one or more propositions. There are three important operators: AND, OR, NOT. AND – true only if A and B are both true. OR – true if A is true, B is true, or both are true. NOT – true only when A is false. When searching for information, we use Boolean logic to find results that are relevant to our search terms. If a web page is relevant to a search term, the search engine evaluates the page as true.
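
The three operators can be tried out directly in Python, whose and, or, and not keywords behave the same way. The small sketch below prints the full truth table for two propositions A and B; the function name is chosen just for this example.

```python
from itertools import product

def truth_table():
    """Return rows of (A, B, A AND B, A OR B, NOT A) for every combination."""
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b, a and b, a or b, not a))
    return rows

for a, b, both, either, not_a in truth_table():
    print(f"A={a!s:5} B={b!s:5} AND={both!s:5} OR={either!s:5} NOT A={not_a}")
```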

Examples of Searching with Boolean Logic Yankees and Choke – all web pages that contain both the terms Yankees and Choke. Yankees or Choke – all web pages that contain the word Yankees, all web pages that contain the word Choke, and all web pages that contain both. Choke and not Yankees – all web pages that contain the word Choke but don’t contain the word Yankees.
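
These three searches map naturally onto set operations. In the sketch below, three made-up pages stand in for the web: AND becomes set intersection, OR becomes union, and AND NOT becomes set difference.

```python
# Hypothetical pages standing in for the web.
DOCS = {
    "page1": "yankees win the series",
    "page2": "yankees choke in the playoffs",
    "page3": "do not choke on your food",
}

def containing(word, docs):
    """Return the set of page names whose text contains the word."""
    word = word.lower()
    return {name for name, text in docs.items() if word in text.lower().split()}

yankees = containing("Yankees", DOCS)
choke = containing("Choke", DOCS)

and_result = yankees & choke   # Yankees and Choke
or_result = yankees | choke    # Yankees or Choke
not_result = choke - yankees   # Choke and not Yankees

print(sorted(and_result), sorted(or_result), sorted(not_result))
```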

More Advanced Uses of Boolean Logic If you are looking for a proper name, a phrase, or another collection of words that are normally found together, then enclose them in double quotes, e.g. "President Gerald Ford". If several words must all appear on the page, then use the logical And, e.g. President And Ford And "United States". If the web page may have different forms of the name, or titles, etc., then use the logical Or, e.g. (President Or "Vice President" Or Representative) And "Gerald Ford". If the document should exclude a word or phrase, then use the logical Not, e.g. "Gerald Ford" And Not "Ford automotive" And Not "Ford car" And Not "Ford truck".
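
A rough sketch of how quoted phrases combine with And, Or, and Not is shown below. The function names and the sample sentences are invented for this example, and real search engines handle punctuation and word forms in their own ways. A phrase counts as found only when its words appear next to each other in the same order.

```python
import re

def words(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_phrase(text, phrase):
    """True if the words of phrase appear consecutively, in order, in text."""
    doc, target = words(text), words(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

def matches(text, all_of=(), any_of=(), none_of=()):
    """all_of ~ And, any_of ~ Or (ignored when empty), none_of ~ Not."""
    return (
        all(has_phrase(text, p) for p in all_of)
        and (not any_of or any(has_phrase(text, p) for p in any_of))
        and not any(has_phrase(text, p) for p in none_of)
    )

# Hypothetical page texts.
page_a = "President Gerald Ford took office in 1974."
page_b = "The Gerald Ford dealership sells every Ford truck."

query = dict(all_of=["Gerald Ford"],
             any_of=["President", "Vice President"],
             none_of=["Ford truck", "Ford car"])

print(matches(page_a, **query), matches(page_b, **query))
```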

Other Helpful Hints While not part of Boolean logic, some search engines also allow operators such as NEAR and FOLLOWED BY, which indicate the relationship of a word or phrase to other words and phrases. Normally these relations specify which word comes first or whether a word is within a certain number of words of the first word. This concept is called proximity logic. Not all search engines use the AND, OR, NOT notation; some, like Alta Vista, use "+" for AND and "-" for NOT.
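
Proximity logic can be sketched as a comparison of word positions. In the example below, the function names, the max_distance parameter, and its default of 5 words are assumptions made for illustration; each search engine defines NEAR and FOLLOWED BY in its own way.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, word):
    """Return every position at which word occurs in tokens."""
    return [i for i, token in enumerate(tokens) if token == word.lower()]

def near(text, first, second, max_distance=5):
    """True if the two words occur within max_distance words, in either order."""
    tokens = tokenize(text)
    return any(abs(i - j) <= max_distance
               for i in positions(tokens, first)
               for j in positions(tokens, second))

def followed_by(text, first, second, max_distance=5):
    """True if second occurs after first, within max_distance words."""
    tokens = tokenize(text)
    return any(0 < j - i <= max_distance
               for i in positions(tokens, first)
               for j in positions(tokens, second))

# Hypothetical sentence from a web page.
SENTENCE = "Gerald Ford became President of the United States"

print(near(SENTENCE, "President", "Ford", max_distance=2))
print(followed_by(SENTENCE, "President", "Ford"))
```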

Tips for Using Search Engines When searching a large-scale database, it is important to be extremely precise. Avoid using vague or common words that will only produce millions of pages. Read the instructions for each new search engine you use. Search methods differ among the search engines and subject directories.

Finding Audio and Video video, audio, news http:// – Good source of images http://images.google.com – One of the few search engines that provides searches for video. – Provides limited video and image searching capabilities www.fazzle.com -- A new beta product may have bugs. Stock photo images http://

Finding Movies and Films

Sources of Audio Information An Audio Archive (software & Music, etc.) Speeches –

Dogpile for finding non-text based files The number of sites that allow so-called “anonymous or guest” ftp directories is now greatly diminished. Due to security considerations, most sites do not have non-text directories that are open to search and file download. Hence Dogpile no longer maintains a search engine that can find files in ftp sites, but it still allows searches for images, audio, and videos. Similarly, tools like Archie and Gopher are now obsolete.