Online Database vs. Web Search Engines 571-Information Access and Retrieval.

1 Online Database vs. Web Search Engines 571-Information Access and Retrieval

2 Online Database

3 Overview of Online Database: 30 years (William, 2006). From 1975 to 2005, databases grew considerably: the number of databases rose from 301 to 17539, database records from 52 million to 21.02 billion, and database entries from 301 to 16532. The number of producers has not grown as fast as the number of databases because one producer might publish multiple databases. The number of producers increased from 200 to 3208 between 1975 and 2005. In 2005, the average producer produced 5.13 databases. Since each vendor might provide services from multiple databases, the number of vendors grew at a slower pace, from 105 to 2811.

4 Types of search: Known item search; Specific-information search; Subject search; Exploring/Browsing information; Others.

5 General search steps: Search plan; System access; Database selection (Optional); Search query formulation; Preliminary results evaluation; Search query reformulation (Optional); Final results evaluation (Optional).

6 Some search strategies. Building blocks: combine sub-searches. Citation pearl growing: use the index term to retrieve further similar citations. Successive fractions: reduce the set using narrower index terms. Most specific facet first: start with the most specific concept.

7 Search Strategy Formulation. Imagine the title and keywords of relevant documents. Boolean: and, or, not. Proximity operator: adj, near, freq, atleast. Search fields/segments: au, co, ti, de. Use controlled vocabulary to identify context. Truncation: string, plural, single character.
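The Boolean operators and truncation listed on this slide can be sketched as set operations over an inverted index. This is a minimal illustration, not how any particular online database implements them: the three records are invented, and the trailing `?` as a truncation symbol is an assumption (each system defines its own).

```python
import re

# Hypothetical three-record collection, invented for illustration.
RECORDS = {
    1: "Libraries and online databases",
    2: "Web search engines and web crawlers",
    3: "Searching an online library with Boolean logic",
}

def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(records):
    """Map each word to the set of record ids that contain it."""
    index = {}
    for rec_id, text in records.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(rec_id)
    return index

def lookup(index, term):
    """Return matching record ids; a trailing '?' (assumed symbol) truncates."""
    if term.endswith("?"):
        stem = term[:-1]
        hits = set()
        for word, ids in index.items():
            if word.startswith(stem):
                hits |= ids
        return hits
    return set(index.get(term, set()))

INDEX = build_index(RECORDS)

both = lookup(INDEX, "online") & lookup(INDEX, "library")     # AND
either = lookup(INDEX, "web") | lookup(INDEX, "library")      # OR
without = lookup(INDEX, "online") - lookup(INDEX, "library")  # NOT
truncated = lookup(INDEX, "librar?")                          # library, libraries
```

AND narrows the set, OR widens it, NOT subtracts from it, and truncation behaves like an OR over every indexed word that shares the stem.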

8 How to find related words? Personal knowledge: terminology, relevant document. Term mapping provided by system. Feedback from search results: title, descriptor, text. Others.

9 Search Strategy Reformulation. System: search fields, vocabulary, more like this, refine search, limit/focus search. User: relevance feedback.

10 Narrow search. Find the right database. Add another word or phrase. Negative feedback (exclude one aspect of the search statement). Exclude related terminology. Restrict to certain field: title, descriptor, frequency, etc. Restrict to certain types of publication. Restrict to certain time range. Restrict to certain language.

11 Evaluate search results. Known item: title, author, publication, date. Specific information: Key Word In Context (KWIC). Subject information: title, abstract, descriptor, full text.
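A KWIC display shows each occurrence of the search word with a few words of context on either side, which lets a searcher judge a hit without opening the record. The sketch below is one simple way to produce such lines; the function name `kwic`, the bracket formatting, and the sample text are all invented for illustration.

```python
def kwic(text, keyword, width=3):
    """Return one line per keyword occurrence with `width` words of context."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore surrounding punctuation.
        if word.strip(".,;:!?\"'()").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# Invented sample text.
SAMPLE = ("Online databases index records with descriptors. "
          "Web search engines index pages automatically.")

for line in kwic(SAMPLE, "index", width=2):
    print(line)
```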

12 Tutorials for online databases: http://www.uwm.edu/Libraries/ris/courses/sois510/ http://training.dialog.com/onlinecourses/recorded/ http://www.sois.uwm.edu/DE_Info/cahansen/WT3/WT3.html

13 Web Search Engines

14 Characteristics of web IR. Web documents: distributed storage, growing in size, deep and surface documents, multiple formats, varying quality, frequently changed, others. Users: various user groups, others. Systems.

15 What is a search engine? [Diagram: Users, Internet, Search Engine]

16 Key components. Data collection: web spider or crawler. Data processing: ranking, indexing. Query formulating: interface. Matching. Result displaying.
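The collection, indexing, and matching components can be sketched end to end on a toy scale. Everything here is an assumption made for illustration: the "web" is an in-memory dictionary of invented pages rather than real HTTP fetches, and the crawler is a plain breadth-first traversal.

```python
from collections import deque

# Hypothetical in-memory "web": page address -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("search engines crawl the web",
                       ["a.example/docs", "b.example/"]),
    "a.example/docs": ("an index maps words to pages", ["a.example/home"]),
    "b.example/": ("ranking orders the matching pages", []),
    "c.example/": ("this page has no inbound links", []),
}

def crawl(seed, pages):
    """Data collection: breadth-first traversal, visiting each page once."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in pages[url][1]:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls, pages):
    """Data processing: map each word to the set of pages containing it."""
    index = {}
    for url in urls:
        for word in pages[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

def match(index, query):
    """Matching: return the pages that contain every query word."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

crawled = crawl("a.example/home", PAGES)
index = build_index(crawled, PAGES)
print(sorted(match(index, "pages")))
```

The page at `c.example/` is never reached because nothing links to it, so a query for a word that appears only there returns nothing: a crawler-built index can only match what the crawler has collected.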

17 How does ranking work? Literal matching. Measure of word significance: the frequency of word occurrence (term frequency). Location: relative position of a word. Examples: http://www.searchenginewatch.com/webmasters/work.html http://www.searchenginewatch.com/webmasters/rank.html
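Term frequency and word location can be combined into a single score in many ways. The sketch below is one hypothetical scheme, not the formula of any real engine: it uses the share of body words that match the term and adds a fixed bonus (`title_boost`, an invented parameter) when the term also appears in the title. The three documents are made up.

```python
def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def score(doc, terms, title_boost=2.0):
    """Sum, per query term, its body frequency plus a bonus for title hits."""
    title = tokenize(doc["title"])
    body = tokenize(doc["body"])
    total = 0.0
    for term in terms:
        # Term frequency: share of body words equal to the term.
        total += body.count(term) / len(body) if body else 0.0
        # Location: a word in the title counts for more than one in the body.
        if term in title:
            total += title_boost
    return total

def rank(docs, query):
    """Return titles of matching documents, best score first."""
    terms = tokenize(query)
    scored = [(score(doc, terms), doc["title"]) for doc in docs]
    scored.sort(key=lambda pair: -pair[0])
    return [title for value, title in scored if value > 0]

# Invented documents.
DOCS = [
    {"title": "Cooking basics", "body": "ranking recipes by taste ranking ranking"},
    {"title": "Search engine ranking", "body": "ranking uses word frequency and word location"},
    {"title": "Gardening", "body": "soil and water"},
]

print(rank(DOCS, "ranking"))
```

The second document repeats the word less often in its body but carries it in the title, so under this weighting it ranks first.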

18 How does ranking work? (Cont'd) Hyperlinks (Brin & Page, 1998): PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) *. PR(A): PageRank of document A. T1 … Tn: documents that link to A. C(A): number of outgoing links from document A. d: damping factor between 0 and 1, usually set to 0.85. * http://infolab.stanford.edu/~backrub/google.html
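The PageRank formula is recursive, so in practice it is evaluated by repeated substitution until the values settle. The sketch below applies the formula exactly as written on the slide to an invented three-page link graph; the graph, the starting value of 1.0, and the fixed iteration count are assumptions, and pages without outgoing links are not handled.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            inbound = [t for t in pages if a in links[t]]
            new_pr[a] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound)
        pr = new_pr
    return pr

# Invented link graph: A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C collects links from both A and B and ends up with the highest score; B, linked only from A (which splits its vote between two pages), ends up lowest.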

19 Other Types of Search Engines. Directories: hierarchically organized indexes that allow you to browse through lists of web sites by category or subject. Meta-search engines: query multiple search engines simultaneously and return a complete set of hits. Specialized search engines: create a database of sites on a specific topic using robots or spiders; for specific user groups; visualization.
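A meta-search engine has to fold several ranked lists into one. The slide does not say how; the sketch below uses summed reciprocal ranks, a simple merging heuristic chosen here for illustration rather than the method of any engine named in this deck. The two result lists are invented stand-ins for real engine responses.

```python
def merge_results(result_lists):
    """Merge ranked lists: each hit earns 1/rank per list, duplicates add up."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Invented result lists from two hypothetical engines.
ENGINE_A = ["x.example", "y.example", "z.example"]
ENGINE_B = ["y.example", "w.example"]

merged = merge_results([ENGINE_A, ENGINE_B])
print(merged)
```

A page returned by both engines (`y.example`) appears once in the merged list and rises above pages that only one engine returned.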

20 Examples of Directories: Yahoo Directory http://dir.yahoo.com/; The Internet Public Library http://www.ipl.org/; Librarians’ Index to the Internet http://sunsite.berkeley.edu/InternetIndex; INFOMINE, from the University of California, is a good example of an academic subject directory.

21 Examples of Meta-Search Engines: MetaCrawler www.metacrawler.com; Ixquick http://ixquick.com/; Clusty http://clusty.com/; Mamma www.mamma.com

22 More examples of Specialized Search Engines: Career Mosaic www.careermosaic.com; Diseases, Disorders and related topics www.mic.ki.se/Diseases/index.html; The Day in History www.historychannel.com/today; Shareware.com www.shareware.com

23 User Behaviors. Web queries are short, rarely modified, and very simple in structure. Users rarely use advanced search features; when they do, about half of the attempts contain mistakes. Users view only the first one or two pages of results. Users are not interested in relevance feedback.

24 User search patterns in different environments (Jansen & Pooch, 2001)

25 Appendix A: Tips Most search engines employ the principles of Boolean logic in the formulation of search queries. If you take the time to understand the basics of Boolean logic, you will have a better chance of search success. Search engines tend to have a default Boolean logic. This means that the space between multiple search terms defaults to either OR logic or AND logic. This has become a de facto standard. It is imperative that you know which logical operator is the default. Nowadays, the default logic tends to be AND, but you should always check the site's Help file to make sure. Another de facto standard is the requirement to search for phrases within quotations, e.g., "death penalty".
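The effect of the default operator and of phrase quoting is easy to see on a small example. The sketch below is a toy matcher with invented documents: `default_op` is a hypothetical parameter standing in for an engine's built-in default, and matching is by plain substring, which is cruder than what real engines do.

```python
import shlex

# Invented documents.
DOCS = {
    1: "The death penalty in modern law",
    2: "Death rates and the penalty for fraud",
    3: "A history of the penalty kick",
}

def search(query, docs, default_op="AND"):
    """Return ids of matching docs; double-quoted text is kept as one phrase.

    Simplification: a term matches if it occurs anywhere as a substring.
    """
    # shlex.split keeps "death penalty" together when it is quoted.
    terms = [term.lower() for term in shlex.split(query)]
    combine = all if default_op == "AND" else any
    return [doc_id for doc_id, text in docs.items()
            if combine(term in text.lower() for term in terms)]

print(search("death penalty", DOCS, default_op="AND"))
print(search("death penalty", DOCS, default_op="OR"))
print(search('"death penalty"', DOCS))
```

The same two words return three different result sets depending on the default operator and on whether they are quoted, which is why checking the Help file matters.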

26 Appendix A (Cont'd) If they are available, use proximity operators (e.g., NEAR) rather than specifying an AND relationship between your keywords. This will make sure that your search terms are located near each other in the full-text document. The closer together your terms are, the more likely the document is to be relevant. Google does proximity searching by default. Field searching is another extremely important way of limiting your search results in large search engines that contain millions of full-text files. For example, TITLE:slavery in a search engine such as AltaVista will bring you more relevant hits than merely searching on the keyword slavery. To enhance subject searches, try the URL field to narrow your results. The URL field offers a good way to search for certain subject terms. This is because of the make-up of the URL.
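The difference between AND and NEAR comes down to word positions. The sketch below is a minimal proximity check under stated assumptions: the function name `near`, the `max_gap` parameter, and the sample text are invented, and real engines define the NEAR window in their own ways.

```python
def near(text, first, second, max_gap=5):
    """True if the two words occur within max_gap word positions of each other."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == first.lower()]
    positions_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap
               for i in positions_a for j in positions_b)

# Invented sample text.
DOC = ("The abolition of slavery reshaped trade. "
       "Decades later, unrelated tariffs changed shipping law.")

print(near(DOC, "slavery", "abolition", max_gap=3))
print(near(DOC, "slavery", "tariffs", max_gap=3))
```

Both word pairs would satisfy an AND query because all three words occur in the text, but only the first pair passes the proximity test.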

27 Appendix A (Cont’) The Internet is a self-publishing medium. It is not a library of evaluated publications selected by professionals. Rather, the Internet is a bulletin board containing everything from the definitive to the spurious. Everything, everything must be analyzed for its appropriateness for research use. Before you select a search tool, always think about your topic and what you are trying to find. Once you begin your research, be sure to try out a handful of sites. Don't rely on a single site. Don't just Google everything! Google is great, but there are other useful tools on the Web, too. Google has become so popular that many people use this tool exclusively, and miss out on others that might be more useful for their particular search. Others?

28 Appendix B Anatomy of a URL. This is a URL on the CNN home page: http://www.cnn.com/feedback/comments.html This URL is typical of addresses hosted in domains in the United States. Protocol: http. Host computer name: www. Second-level domain name: cnn. Top-level domain name: com. Directory name: feedback. File name: comments.html. The directory name and file name often contain subject terms. These can be searched with the URL field. For example, URL:slavery will give you more relevant results than the keyword slavery by searching for this term as a directory name or a file name.
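The same breakdown can be produced with Python's standard `urllib.parse` module. This is a small sketch: the function name `anatomy` and the dictionary keys are invented, and the host/domain split assumes a simple three-label host name such as www.cnn.com.

```python
import posixpath
from urllib.parse import urlparse

def anatomy(url):
    """Split a URL into the parts named on the slide.

    Assumes a host of the form host.second-level.top-level.
    """
    parts = urlparse(url)
    labels = parts.hostname.split(".")
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": labels[0],
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directory": directory.strip("/"),
        "file": filename,
    }

result = anatomy("http://www.cnn.com/feedback/comments.html")
for name, value in result.items():
    print(f"{name}: {value}")
```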

29 Appendix C. Search engine comparison charts: http://www.infopeople.org/search/chart.html http://www.searchengineshowdown.com/features/ Tutorials: Google Tutorial.

