1 Information retrieval for scientists Vrije Universiteit Brussel Information and Library Science, University of Antwerp Belgium.

1 1 Information retrieval for scientists Paul.Nieuwenhuysen@vub.ac.be Vrije Universiteit Brussel Information and Library Science, University of Antwerp Belgium Presented at VUB Brussels for doctoral researchers 3 sessions, each of 3 hours, in May 2004

2 2 The slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio)

3 3 SESSION 1 of 3 Basic concepts of information Databases and computerized information retrieval (on fundamental difficulties in information retrieval, and how to take these into account.) Thesaurus systems for better information retrieval. - contents - summary - structure - overview of this tutorial

4 4 SESSION 2 of 3 Online access information sources and services (1) »types of information sources »a systematic overview of information sources and services that are accessible through the Internet: »dictionaries and encyclopedias »Internet subject directories for browsing »Internet indexes for text searching - contents - summary - structure - overview of this tutorial

5 5 Online access information sources and services (2) »making better search queries with general thesaurus systems that are available free of charge »meta-search systems »the invisible web and how to exploit its contents, even though it is hidden away from text search systems »finding images/pictures - contents - summary - structure - overview of this tutorial

6 6 SESSION 3 of 3 Online access information sources and services (3) »using image retrieval systems on the WWW to find relevant texts »finding books »finding journal articles »fee-based databases »using fee-based electronic journals »open access electronic journals »a link resolver to find the appropriate e-document - contents - summary - structure - overview of this tutorial

7 7 How to evaluate information retrieval systems and queries? How to evaluate the quality of information sources? - contents - summary - structure - overview of this tutorial

8 8 Is there a subject about which YOU personally would like to learn more?

9 9 -Interruptions -Questions -Remarks -Discussions are welcome

10 10 About “information” Information concepts ****

11 11 Our world: future trends Future trends in our world Complexity  Dynamics and evolution  Speed and acceleration  Internationalization  Globalization  Economic products less based on natural resources and more on “knowledge” Answers / Requirements / Solutions / Reactions Knowledge and skills  Adaptability  Flexibility  Global co-operation  Mobility  Education, research, exploitation of knowledge is important ***-

12 ?? Question ?? What is “information literacy”? ***- 12

13 13 Information literacy: definition (Part 1) 1. To understand in general the nature, the value and the importance of information. The ability to recognize when information is needed, when information can help to make progress, to solve a problem, to make a decision… ***-

14 14 Information literacy: definition (Part 2) 2. The ability to search, find, locate relevant, needed information! (Identify concepts, select information sources, formulate queries….) The ability to cope with information overload. ***-

15 15 Information literacy: definition (Part 3) 3. The ability to evaluate the suitability of retrieved information! The ability to manage information! (Saving information, ordering information in folders on computer, finding information in your own collection…) ***-

16 16 Information literacy: definition (Part 4) 4. The ability to apply/use information effectively. The ability to keep up to date on the topics/subjects that are relevant for you. The ability to communicate with others, to share information with others! (Using citations, live presentations, electronic mail, your own WWW site…) ***-

17 ?? Question ?? Compare “information” for instance with “bananas”. ***- 17

18 18 Information versus other products = bits versus atoms The essential difference between information and other economical products or natural products is that information on computers (such as databases) consists of bits (and bytes), while other economic / natural products (such as bananas) consist of atoms. This has many interrelated consequences. ***- 01010101101011010010

19 19 Information: some strange properties (Part 1) Information is never consumed and does not deteriorate. Nevertheless, information becomes obsolete; speed of delivery can be crucial. The context is important. There is no agreed measure of a unit of information. The price of an information item is not well linked to its value in a particular situation. Moreover, the benefit/value of information is difficult to quantify. ***-

20 20 Information: some strange properties (Part 2) One information item can be available to different persons at the same time. Information can be reproduced well, which makes it cheap for wide consumption. However, copyright can keep the price high. Most digital information items (documents) can be changed, modified, falsified, manipulated… more easily than physical products/items. “Is this document real, authentic, original?” ***-

21 21 ***- Information sources: people and documents Information sources come essentially in two formats: »less formal: people communicating by —telephone —electronic mail,… »more formal: documents such as —hard copy documents —electronic, digital documents; computer-based files Here we focus mainly on information that is stored in documents.

22 22 The flow of documentary information through many channels ***- [Figure: several Authors / Creators / Senders; many media / channels; Reader / User / Receiver]

23 23 The flow of documentary information with primary and secondary sources ***- [Figure: several Authors / Creators / Senders; primary sources / systems: mainly journal articles / books / electronic mail / online sources /...; Reader / User / Receiver]

24 24 The flow of documentary information with primary and secondary sources **** [Figure: several Authors / Creators / Senders; primary sources / systems: mainly journal articles / books / electronic mail / online sources /...; secondary sources / systems: mainly reference works (printed, CD-ROM, online), library catalogues including OPACs...; Reader / User / Receiver]

25 ?? Question ?? Why is secondary information created? ***- 25

26 26 The role of secondary information sources The secondary information flow is generated on the basis of the primary flow, mainly because the large volume of primary information lowers the chance of retrieving and using the appropriate information item. Secondary information tries to bring some order into the great chaos. ****

27 27 Various categorisations of documentary information sources Information sources can be categorised in various ways. For instance: **** Primary / Secondary; Hard copy (not digital) / Digital; Offline / Online; Text / Image / Sound / Animation, video / Software / Data / Interactive; Books / Serials

28 28 Retrospective searching versus current awareness: scheme **** [Figure: timeline Past / Now / Future, with retrospective searching and current awareness]

29 29 Retrospective searching versus current awareness: the basics ***- Searching for suitable information takes the form of retrospective searching mainly when we enter a new, unknown field or subject domain where we need supporting information. Once we have found enough information, we need to keep aware of new information, because we are always challenged »by the continuous flow of newly generated information and »by the changing environment in which we work and live.

30 30 What is a current awareness service? ***- A service which provides the recipient with information on the latest developments within the subject areas in which he/she has a specific interest or need to know. Aims: »Saving time »Covering many information sources »...
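A current awareness service can be pictured as a stored search profile that is run again on each batch of new records. The sketch below is a minimal illustration under assumptions: records are plain Python dicts with hypothetical `id`, `title` and `date` fields, and the profile is a hypothetical set of keywords; real alerting services use much richer profiles.

```python
from datetime import date


def run_profile(profile, records, last_run):
    """Return ids of records newer than last_run whose title contains a profile keyword."""
    hits = []
    for rec in records:
        if rec["date"] <= last_run:
            continue  # older records were covered by an earlier run
        title_words = set(rec["title"].lower().split())
        if title_words & profile:  # at least one keyword in common
            hits.append(rec["id"])
    return hits


# Hypothetical records and profile, for illustration only.
records = [
    {"id": 1, "title": "Seawater pollution in harbours", "date": date(2004, 4, 1)},
    {"id": 2, "title": "Monitoring effluents", "date": date(2004, 5, 10)},
    {"id": 3, "title": "Trade in bananas", "date": date(2004, 5, 12)},
]
profile = {"effluents", "pollution"}
print(run_profile(profile, records, last_run=date(2004, 5, 1)))  # [2]
```

Each run only looks at what arrived since the previous run, which is what distinguishes current awareness from a retrospective search over the whole collection.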

31 31 Information retrieval: evolution of storage and distribution media **** 1450: printing with reusable characters/fonts; 1975 +: online access to databases; from the 1970s: growing Internet; 1985 +: CD-ROM; 1990 +: World-Wide Web (based on the Internet)

32 32 Information retrieval: end user or information intermediaries End-user Information intermediary (Broker or library or...) Information ****

33 33 End user versus information intermediary People can retrieve information themselves, directly, as so-called “end users”. However, »the information landscape is complex, »it may take a lot of time to find the right information, »it may be costly to search for information. Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker. ****

34 34 About “information” Computer- and network-based information ****

35 35 Information: from bits to meaningful information Digital computer data = bits or 01 Program code, meaningful for and to be interpreted / executed by a suitable / compatible computer Information = “documents”, meaningful for and to be interpreted by human beings ****

36 36 Information: digitally stored and managed information Categories of digital, computer readable information / data, forming electronic “documents”, understandable by human beings. 01 text numbers images video sounds multimedia + ****

37 37 01 Digital information Multimedia / Hypermedia Information: types of digital information Linear text Hypertext Static images Video Sound Programs for computers ****

38 38 Some publication media compared **** [Figure: printed, CD-ROM and online / networked media compared by update speed and volume]

39 39 Electronic publishing: evolutionary stages ***- To produce print on paper, using computers Dual mode: on paper and as database Simulation of print on computer display Repackaging of data for computer display (e.g. text to hypermedia) Creation by author directly for the computer (hypermedia) and no printed version

40 40 ?? Question ?? Which advantages do you see in electronic publishing, in digital information sources? ***-

41 41 Publications on CD-ROM or online: advantages compared with hard copy ***- Can be cheaper to produce, to transport and to store. Can offer better search features. Can offer various output formats. Can offer fast and efficient “copy and paste” by the reader/user of information to other documents. Taken together, these features allow more efficient access to large, high volume documents or databases.

42 42 Convergence of media to computer-based communication Already based on computers and networks: »CD-ROM / DVD / Hypermedia / Remote login into a computer / File transfer from a computer / Electronic mail / Usenet / the World-Wide Web... Evolving towards a computer- and network-based technology: »Telephone / Radio / Television / Video / Fax / Journals / Books /... ***-

43 43 Scientific publishing in Utopia: an ideal scheme **** [Figure: many authors; many editors / publishers; an online remote access multimedia database server; many database search clients and user interfaces; many readers / users; all linked by one global, international computer data communication network. In science, author = reader.]

44 44 ?? Question ?? Indicate the differences between reality and that simplified, ideal scheme of the information flow. ****

45 45 ?? Question ?? Which basic problems/difficulties hinder people from finding / accessing / using information? ****

46 46 Information retrieval: basic difficulties (Part 1) **** In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed, required. In many cases the need for information cannot be expressed completely in the form of a query. One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.

47 47 Information retrieval: basic difficulties (Part 2) **** Computer systems are artificial, but nevertheless most use human language in their interface with the human users, for instance in database search systems. This may cause difficulties related to language and vocabulary in particular. Some examples: People use different languages and different terms (vocabularies) to describe a similar concept. Concepts, vocabularies and meanings of words and terms may change over time. Meanings of words / terms may depend on their context.

48 48 Information retrieval: basic difficulties (Part 3) **** Many different and imperfect retrieval systems should or must be used. »To retrieve and access the information that is in principle available, many different retrieval systems must be available and be mastered. »Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.

49 49 Information retrieval: basic difficulties (Part 4) **** Information overload Users are often overwhelmed by the amount of available information and by the large influx of new information.

50 50 Information retrieval: basic difficulties (Part 5) **** The price (or inaccessibility) of particular information Much information cannot be obtained at all, or at least not free of charge.

51 51 Information retrieval: browsing and searching as methods To make information available, the producer of an information system can offer to the user basically two different ways for retrieval of the right information from the system: »by browsing or navigating or »by searching. ***-

52 52 Browsing a logically ordered list of terms Logical order / Sorted by subject Table of contents Classification Hypertext-Hypermedia: jump from a page to a linked page Searching by submitting a search term to the system Alphabetical order / Not sorted by subject Alphabetical index Thesaurus Hypertext-Hypermedia: search built in a page Information retrieval: browsing versus searching ***-

53 53 Information retrieval: browsing systems support In browsing systems, the user can follow some of the paths offered by the system. The information is ordered, according to subject for instance. The user does not have to use his own words to indicate his needs. ***-

54 54 Information retrieval: browsing systems To support organising and browsing of information items, some type of classification is applied in many cases. ***-

55 55 Information retrieval: examples of browsing systems Examples of browsing systems are »a table of contents in the front part of a book, »a set of books placed on shelves according to some classification system, »a hypertext hierarchical directory on the WWW, or more generally all hypermedia systems. ***-

56 56 Information retrieval: search systems In search systems, the user has to express his need for information by formulating a query, normally in a natural language or in a more formal language. In this case the information is normally not ordered according to some logic; in most cases it forms a well-structured compilation of items of a similar form, namely the records of a database when a computer system is applied. ***-

57 57 Information retrieval: examples of search systems Examples of search systems are »the index (the register) in the back part of a book, »a library or museum catalogue with a search interface, »a search form on a web page. ***-

58 58 Advantages: »Browsing is relatively easy for the user.  Difficulties for the user: »The user can explore the information space only along roads constructed according to the system designers' view of the world, and not according to his own view.  Difficulties for the producer: »It is relatively costly to construct an information system based on browsing. Information retrieval: pro and contra of browse systems ***-

59 59 Advantages: »Creation of keyword indexes for fast searching is relatively simple and cheap and can be automated.  Difficulties for the user: »Searching is hindered by vocabulary / language problems. »The users cannot always fully articulate their needs. Information retrieval: pro and contra of search systems ***-

60 60 Databases and computerized information retrieval Introduction ****

61 61 What is a database? A database is a collection of similar data records stored in a common file (or collection of files). ****

62 62 Types of databases: examples Examples: The databases that form the basis for »catalogues of books or other types of documents »computerized bibliographies »address directories »a full text newspaper, newsletter, magazine, journal + collections of these »WWW and Internet search engines »intranet search engines »... ****

63 63 Information retrieval and related activities: figure ***- [Figure: information management; information retrieval; text retrieval; image retrieval; presentation of information]

64 64 Information retrieval and related activities: explanation “Text retrieval” can be considered as a part of the larger concept “information management”. There is a great overlap between “text retrieval” and “image retrieval”, because image retrieval is in most cases based on text retrieval: in most cases retrieval of images is not based on computerized investigation of the images themselves, but on searches in the text that accompanies each image. ***-

65 65 Information retrieval: the terminology Several words are used with similar or related meanings: »database / databank / corpus / collection / catalog / site / archive / file / web /... »contents of a database / records / documents / items / (web) pages /... »search / query / filter /... »thesaurus / controlled vocabulary / dictionary / lexicon / term bank / ontology /... »results / selection / retrieved documents / retrieved items /... ***-

66 66 Information retrieval software: a particular type of DBMS Software for information storage and retrieval (ISR software) Text(-oriented) database management systems (Text-DBMS) Text information management systems (TIMS) Document retrieval systems Document management systems ***-

67 67 Information retrieval: via a database to the user ***- Information content / Database (linear file, inverted file) / Search engine / Search interface / User

68 68 Information retrieval: the basic processes in search systems **** [Figure: information problem → representation → query; text documents → representation → indexed documents; comparison of query and indexed documents → retrieved, sorted documents; evaluation and feedback]

69 69 Information retrieval systems: many components make up a system Any retrieval system is built up of many more or less independent components. These components can be modified to increase the quality of the results more or less independently. ***-

70 70 Information retrieval systems: important components ***- the information content system to describe formal aspects of information items system to describe the subjects of information items concrete descriptions of information items = application of the used information description systems information storage and retrieval computer program(s) computer system used for retrieval type of medium or information carrier used for distribution

71 71 Information retrieval systems: the information content The information content is the information that is created or gathered by the producer. The information content is independent of software and of distribution media. The information content is input into the retrieval system using »a system (rules) to describe the formal aspects »a system (rules) to describe the contents (classification, thesaurus,...) ***-

72 72 Information retrieval systems: media used for distribution Hard copy (for information retrieval systems only in the broad sense) »Print »Microfiche For computers (for information retrieval systems stricto sensu): »Magnetic tape »Floppy disk; optical disk (CD-ROM, Photo-CD, DVD...) »Online ***-

73 73 Information retrieval systems: the computer program The information retrieval program consists of several modules, including: The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s). The search engine provides the search features and power that allow the inverted file(s) to be searched. The interface between the system and the user determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands). ***-
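The relation between the linear file and the inverted file can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ISR package works: the sample records are hypothetical, words are split on white space only, and each index entry maps a word to the set of record numbers that contain it.

```python
from collections import defaultdict


def build_inverted_file(linear_file):
    """Map each word to the set of record numbers in which it occurs."""
    inverted = defaultdict(set)
    for rec_no, text in linear_file.items():
        for word in text.lower().split():
            inverted[word].add(rec_no)
    return inverted


def lookup(inverted, term):
    """Return the sorted record numbers for one search term (no scan of the linear file)."""
    return sorted(inverted.get(term.lower(), set()))


# Hypothetical linear file: record number -> title.
linear_file = {
    1: "Information retrieval for scientists",
    2: "Thesaurus systems for better retrieval",
    3: "Finding journal articles",
}
inverted = build_inverted_file(linear_file)
print(lookup(inverted, "Retrieval"))  # [1, 2]
```

The index is built once, when records are loaded; every later search only consults the inverted file, which is why searching stays fast even for large databases.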

74 74 What determines the results of a search in a retrieval system? the information retrieval system ( = contents + system) the user of the retrieval system and the search strategy applied to the system ***- Result of a search

75 75 Layered structure of a database Database (File) Records Fields Characters + in many systems: relations / links between records ***-

76 76 A simple database architecture: all records together form a database The ‘salami architecture’ = ‘sliced bread architecture’ »the salami or the bread is a “database” »each slice of salami or bread is a “database record” »there are no relations between slices / records »the retrieval system tries to offer the appropriate slices / records to the user ***-
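The layered structure (database, records, fields, characters) and the “salami architecture” map naturally onto a list of independent record objects. A minimal sketch, assuming hypothetical field names `title` and `year` and hypothetical sample records:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One 'slice': a record made up of fields, each field made up of characters."""
    title: str
    year: int


# The database is simply the collection of records; there are no links between them.
database = [
    Record(title="Example record A", year=1998),
    Record(title="Example record B", year=2003),
    Record(title="Example record C", year=2004),
]


def select(db, predicate):
    """Offer the user those slices/records that satisfy a condition."""
    return [rec for rec in db if predicate(rec)]


recent = select(database, lambda rec: rec.year >= 2003)
print([rec.title for rec in recent])  # ['Example record B', 'Example record C']
```

Systems that do store relations between records would need an extra structure, for example a field holding the identifiers of linked records.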

77 77 Databases and computerized information retrieval Text retrieval and language ****

78 78 Text retrieval and language: an overview Problems related to language / terminology occur 1. even when the same language is used in searching and in the searched databases 2. in the case of “multi-linguality”: “cross-language information retrieval” that is when more than 1 language is used »in the search terms »in the contents of the searched database(s) and/or in the subject descriptors of the searched database(s) ***- 

79 79 Text retrieval and language: enhancing retrieval Retrieval can be enhanced by coping with the problems caused by the use of natural language. Contributions to this enhancement of retrieval can be made by »the database producer »the computerized retrieval system »the searcher/user (The distinction between these is not very sharp and clear in all cases.) ***-

80 80 Text retrieval and language: a word is not a concept (a) Problem: A word or phrase or term is not the same as a concept or subject or topic. **** Word Concept 

81 81 Text retrieval and language: a word is not a concept (a’) So, to ‘cover’ a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to ‘cover’ the concept. **** 

82 82 Text retrieval and language: a word is not a concept (a’’) »synonyms! (such as: Latin names of species in biology besides the common names, scientific names besides common names of substances in chemistry…) **** 

83 83 Text retrieval and language: a word is not a concept (a’’’) »narrower terms, more specific terms (such as particular brand names); including terms with prefixes (for instance: viruses, retroviruses, rotaviruses,...) »spelling variations (such as UK English versus US English); possible variations after transliteration **** 

84 84 Text retrieval and language: a word is not a concept (a’’’’) »singular or plural forms of a noun (when this is used as a search term) »(relevant) related terms »various forms of a verb (when this is used in the query) »broader terms (perhaps) **** 
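The query expansion described above amounts to replacing one search term by an OR group of related forms. The sketch assumes small hand-made lookup tables (the names `synonyms`, `narrower` and `variants` and their contents are hypothetical); in practice these relations would come from a thesaurus or from the searcher's own knowledge.

```python
def expand(term, synonyms, narrower, variants):
    """Return an OR group covering a term, its synonyms, narrower terms and variant forms."""
    forms = {term}
    for table in (synonyms, narrower, variants):
        forms.update(table.get(term, []))
    return "(" + " OR ".join(sorted(forms)) + ")"


# Hypothetical lookup tables for illustration.
synonyms = {"colour": ["color"]}
narrower = {"virus": ["retrovirus", "rotavirus"]}
variants = {"virus": ["viruses"], "colour": ["colours", "colors"]}

print(expand("virus", synonyms, narrower, variants))
# (retrovirus OR rotavirus OR virus OR viruses)
print(expand("colour", synonyms, narrower, variants))
# (color OR colors OR colour OR colours)
```

Each added form raises the chance of covering the concept (higher recall), at the risk of retrieving more irrelevant items.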

85 85 Text retrieval and language: a word is not a concept (b) Method to solve the problem at the time of database production: »adding to each database record those codes from a classification system or terms from a thesaurus system that are relevant, and providing the user with knowledge about the system used; in some cases, this process is computerized (with intellectual intervention or completely automatic) ***-

86 86 Text retrieval and language: a word is not a concept (b’) »However, this solution is not perfect: —Addition of terms by humans from a controlled vocabulary / from a thesaurus is difficult and time-consuming. Consequences: –the added value lags behind the availability of the document –the process can delay access to the document –the process is expensive —Moreover, in practice, most users of the resulting database do not exploit this feature. ***-

87 87 Text retrieval and language: a word is not a concept (c) Method to solve the problem, provided by the computerized retrieval system: »offering to the user a partly computerized access to the particular subject description system used by the database producer, and then linking to the database for searching »computerized, automatic, analysis of the ‘free text’ search terms applied in a query by the user, for transparent ‘mapping’ to the corresponding particular classification codes, categories, or thesaurus terms used by the database producer ***-
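The transparent mapping of free-text terms to the vocabulary of the database producer can be illustrated with an entry vocabulary of the kind found in many thesauri (“USE” references from non-preferred to preferred terms). The mini-thesaurus below is hypothetical, and terms without an entry simply pass through unchanged:

```python
# Hypothetical mini-thesaurus: entry term (non-preferred) -> preferred descriptor.
USE = {
    "sea water": "seawater",
    "marine water": "seawater",
    "effluent": "effluents",
    "waste water discharge": "effluents",
}


def map_to_descriptors(free_text_terms):
    """Replace each free-text term by its preferred descriptor, if the thesaurus has one."""
    mapped = []
    for term in free_text_terms:
        key = term.strip().lower()
        mapped.append(USE.get(key, key))  # unknown terms pass through unchanged
    return mapped


print(map_to_descriptors(["Marine water", "effluent", "Tanzania"]))
# ['seawater', 'effluents', 'tanzania']
```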

88 ?? Question ?? Which problems in text retrieval are illustrated by the following sentences? **** 88 

89 89 Time flies like an arrow. Fruit flies like a banana. ? ****Examples

90 90 Time flies like an arrow. Fruit flies like a banana. ****Examples

91 91 Time flies like an arrow. Fruit flies like a banana. OK! ****Examples

92 92 Text retrieval and language: ambiguity of meaning (a) Problem: A word or phrase can have more than 1 meaning. Ambiguity of the meaning of a word is a problem for retrieval. This decreases the precision of many searches. The meaning can depend on the context. The meaning may depend on the region where the term is used. **** 

93 93 Text retrieval and language: ambiguity of meaning (a’) Example of a word: »Pascal the philosopher »Pascal the computer language ****Example 

94 94 Text retrieval and language: ambiguity of meaning (a’’) Example of sentences: »The banks of New Zealand flooded our mailboxes with free account proposals. »The banks of New Zealand flooded with heavy rains account for the economic loss. ****Example 

95 95 Text retrieval and language: ambiguity of meaning (a’’’) Problem: Ambiguity of meaning may be the cause of low precision. **** Word Concept 

96 96 Text retrieval and language: ambiguity of meaning (b) Method to solve the problem at the time of database production: »adding to each database record codes from a classification system or terms from a thesaurus system, and providing the user with knowledge about the system used; in some cases, this process is computerized (completely automatic or with intellectual intervention); ***-

97 97 Text retrieval and language: ambiguity of meaning (b’) Method to solve the problem, provided by the computerized retrieval system: »offering to the user a partly computerized access to the subject description system and then linking to the database for searching ***-

98 98 Text retrieval and language: ambiguity of meaning (b’’) »searching normally (without added value), but adding value by categorizing the retrieved items in the presentation phase to assist in the ‘disambiguation’; this feature is offered for instance by —the public access module of the book catalogue of the library automation system VUBIS at VUB, Belgium, when searching for items that were assigned a particular keyword ***-
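Categorizing retrieved items in the presentation phase can be sketched as grouping the result list by the subject category assigned to each record. This is a generic illustration, not the VUBIS implementation; the records and category labels below are hypothetical, chosen to mirror the Pascal example:

```python
from collections import defaultdict


def group_by_category(results):
    """Group (title, category) pairs so the user can pick the intended meaning."""
    groups = defaultdict(list)
    for title, category in results:
        groups[category].append(title)
    return dict(groups)


# Hypothetical result list for the ambiguous keyword "Pascal".
results = [
    ("Pascal and the wager", "Philosophy"),
    ("Programming in Pascal", "Computer languages"),
    ("The thoughts of Pascal", "Philosophy"),
]
for category, titles in sorted(group_by_category(results).items()):
    print(category, titles)
```

The search itself is unchanged; only the presentation helps the user to separate the meanings.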

99 99 Text retrieval and language: ambiguity of meaning (b’’’) »Natural language processing of the queries: linguistic analysis to determine possible meanings of the query, which includes disambiguation of words in their context: “lexical” analysis = at the level of the word “semantic” analysis = at the level of the sentence However, most queries are short and therefore it is difficult to apply semantic analysis for disambiguation. ***-

100 100 Text retrieval and language: ambiguity of meaning (b’’’’) »Natural language processing of the documents: linguistic analysis to determine possible meanings of a sentence, which includes disambiguation of words in their context: “lexical” analysis = at the level of the word “semantic” analysis = at the level of the sentence However, most retrieval systems do not apply this complicated method. ***-

101 101 A word is not a concept A concept is not a word **** Word1 Word2 Word3 Concept1 Concept2 Concept3 A concept cannot be “covered” by only 1 word or term; this may be the cause of low recall of a search. The meaning of many words is ambiguous; this may be the cause of low precision of a search.
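Recall and precision, mentioned on this slide, have standard definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. A small sketch with hypothetical record numbers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 items retrieved, 8 relevant items exist in the database.
print(precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8, 9, 10]))
# (0.5, 0.25)
```

Missing synonyms lower recall (relevant items are not found); ambiguous words lower precision (irrelevant items are found).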

102 102 Text retrieval and language: phrases composed of words (a) Problem: Most retrieval systems can search for words, but they do not directly recognize or ‘know’ phrases / terms composed of more than 1 word. ***- 

103 103 Text retrieval and language: phrases composed of words (b) Methods to solve the problem, provided by the computerized retrieval system: »the user can and should indicate explicitly that a few words should be considered together by the retrieval system as forming a phrase/term (for instance in many Internet search engines by putting the phrase in quotes like “three word phrase”) ***-
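Phrase searching requires the system to know not only which records contain each word, but also the position of each word in the record. A minimal sketch of a positional index, assuming simple word tokenization with a regular expression and hypothetical sample texts:

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """word -> {record number -> list of word positions}"""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return records in which the words of the phrase occur next to each other, in order."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Records containing every word (this alone would be a plain AND search).
    candidates = set.intersection(*(set(index[w]) for w in words))
    hits = []
    for doc_id in sorted(candidates):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return hits


# Hypothetical records.
docs = {
    1: "Information retrieval for scientists",
    2: "Retrieval of information in libraries",
}
index = build_positional_index(docs)
print(phrase_search(index, "information retrieval"))  # [1]
```

A plain AND of the two words would also return record 2, where the words occur but do not form the phrase.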

104 104 Text retrieval and language: phrases composed of words (b’) »better: the retrieval system automatically recognizes a phrase/term relying on a term bank that has been created in advance; examples: the Internet search engines AltaVista and Scirus work in this way ***-
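Automatic phrase recognition with a term bank can be sketched as a greedy longest-match scan over the query words. This is a generic technique shown for illustration; it is not a description of how AltaVista or Scirus actually worked, and the small term bank is hypothetical:

```python
# Hypothetical term bank of multi-word terms, stored as word tuples.
TERM_BANK = {("information", "retrieval"), ("current", "awareness"), ("new", "zealand")}
MAX_LEN = max(len(term) for term in TERM_BANK)


def recognize_terms(query):
    """Split a query into search units, joining words that form a known term."""
    words = query.lower().split()
    units, i = [], 0
    while i < len(words):
        # Try the longest possible term first, down to 2 words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in TERM_BANK:
                units.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            units.append(words[i])  # no term starts here: keep the single word
            i += 1
    return units


print(recognize_terms("banks of New Zealand information retrieval"))
# ['banks', 'of', 'new zealand', 'information retrieval']
```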

105 105 Text retrieval and language: conclusions The use of terms and language to retrieve information from databases/collections/corpora causes many problems. Many users of search/retrieval systems do not recognize these problems, or underestimate them; in other words, many users overestimate the power of retrieval systems. Much research and development is still needed to enhance text retrieval. ***-

106 106 Databases and computerized information retrieval Hints on how to use information sources ****

107 107 Hints on how to use information sources: overview (Part 1) Know the purpose and motivation for each search. Do not be lazy: search on your own, before bothering experts with requests for advice. Plan your search in advance. Choose the best source(s) for each search. Use the available tools for subject searching well. Try to cope with the language problems; avoid spelling errors in your search query; use spelling variations in your search query ****

108 108 Hints on how to use information sources: overview (Part 2) Match your search strategy with the type of source. Work cost-effectively. Use special care when searching for names. Be specific. Avoid broad searches. Limit your search to a specific country or region if required. Work iteratively. Keep a record of your work. ****

109 109 Hints on how to use information sources: overview (Part 3) Do not only focus on a single source. Consider citation indexes besides subject-oriented databases, as useful secondary information sources. Stop searching when “enough is enough” Give up if necessary... (Not all questions have an answer.) Be critical: not all information is correct or useful. ****

110 110 Hints on how to use information sources: overview (Part 4) In computer-based retrieval systems, consider applying »truncation of search terms (using a symbol like * or ?) »combinations of search terms, using —Boolean operators: OR; AND / +; NOT / AND NOT / - —proximity operators (for instance “NEAR”) —phrase searching (“word1 word2”) »searching limited to a field (for instance URL, title…) ****
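Truncation can be illustrated by matching a pattern against the words in an inverted file. The sketch uses Python's `fnmatch` module, where `*` stands for any sequence of characters and `?` for exactly one character; truncation symbols and their exact meaning differ from one search system to another, and the word list is hypothetical:

```python
import fnmatch

# Hypothetical words taken from an inverted file.
index_terms = ["virus", "viruses", "viral", "virology", "retrovirus", "vitamin"]


def truncate(pattern, terms):
    """Return the index terms that match a truncated search term."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


print(truncate("vir*", index_terms))    # ['virus', 'viruses', 'viral', 'virology']
print(truncate("*virus", index_terms))  # ['virus', 'retrovirus']
```

The matched words are then combined with OR, so one truncated term stands for a whole group of search terms.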

111 111 Hints on how to use information sources: subject searching When you search for information on a particular topic/subject, investigate whether the database producer offers »a subject classification scheme and/or »controlled/approved/accepted subject terms and/or »a subject thesaurus. Exploit these, if they are available. In most cases you should find and use synonyms and narrower terms. Use broader and/or related terms, if appropriate. ****

112 112 Hints on how to use information sources: Boolean combinations Most text search systems understand the basic Boolean operators: OR = obtain records that contain one or both search terms AND = obtain records that contain both search terms NOT = exclude records that contain a search term ****
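The three operators behave like set operations on the lists of records retrieved for each term separately. In this sketch the record numbers are invented for illustration.

```python
# Hypothetical record numbers retrieved by searching each term on its own
seawater = {1, 3, 5, 8}
pollution = {1, 2, 5, 9}

either = seawater | pollution     # seawater OR pollution
both = seawater & pollution       # seawater AND pollution
without = seawater - pollution    # seawater AND NOT pollution

print(sorted(either))   # [1, 2, 3, 5, 8, 9]
print(sorted(both))     # [1, 5]
print(sorted(without))  # [3, 8]
```

The sizes show the typical effect: OR widens the result set, while AND and NOT narrow it.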

113 113 Hints on how to use information sources: Boolean combinations In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible. **** (term x1 OR term x2 OR term x3) AND (term y1 OR term y2 OR term y3) AND (term z1 OR term z2 OR term z3) AND ...
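The pattern of OR within one concept and AND between concepts can be generated mechanically. A small Python sketch; the terms are only examples, and the exact syntax (parentheses, phrase quotes) must be adapted to each search system.

```python
def or_group(terms):
    """Join the alternative terms for one concept with OR; quote multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def faceted_query(concepts):
    """Join the OR groups of all concepts (facets) with AND."""
    return " AND ".join(or_group(terms) for terms in concepts)


query = faceted_query([
    ["sea", "seawater", "marine", "coastal"],   # concept x
    ["pollution", "contamination"],             # concept y
    ["effluent", "effluents", "wastewater"],    # concept z
])
print(query)
```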

114 114 Hints on how to use information sources: Boolean queries Most text search systems understand the basic Boolean operators typed in capital characters: OR, AND ****

115 115 Hints on how to use information sources: default Boolean operator Find out whether a default, implicit Boolean operator is applied in the search system that you use. Such an operator works even when no operator is typed explicitly between words. It can be OR, AND, NEAR... ****
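This sketch illustrates why the default operator matters: the same two words retrieve one record under an implicit AND, but three under an implicit OR. The three records are invented, and real systems may also apply NEAR or ranking instead of a pure Boolean test.

```python
# Three hypothetical records
DOCS = {
    "A": "monitoring seawater pollution",
    "B": "seawater temperature",
    "C": "air pollution",
}


def search(query, default_operator="AND"):
    """Evaluate a query made of words separated only by spaces,
    applying the system's implicit operator between the words."""
    if default_operator not in ("AND", "OR"):
        raise ValueError("this sketch only models AND or OR")
    words = query.lower().split()
    combine = all if default_operator == "AND" else any
    return [name for name, text in DOCS.items()
            if combine(w in text.split() for w in words)]


print(search("seawater pollution", "AND"))  # ['A']
print(search("seawater pollution", "OR"))   # ['A', 'B', 'C']
```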

116 116 ?? Question ?? Why is it important to know the default Boolean operator in the search system that you use? You can also explain this with an example. ***-

117 117 !! Task - Assignment !! You can read Cohen, Laura. Boolean searching on the Internet. [online] University Libraries. University at Albany, USA. [cited 2003] ***-

118 118 ?? Question ?? How many (and which) concepts/facets do you see in a search for “general reviews about monitoring seawater pollution that is due to effluents in Tanzania”? ****

119 119 !! Task - Assignment !! Prepare off-line, on paper, a suitable search query in a generic format, to find “general reviews about monitoring seawater pollution that is due to effluents” as the basis for later, concrete searches in databases. (Limit yourself to 1 of the concepts.) ****

120 120 Hints on how to use information sources: example of a search query Example: Searching for the concept “sea” can or should involve for instance the following words in a Boolean OR combination: baltic OR bay OR bays OR coast OR coastal OR coastline OR coasts OR cove OR coves OR gulf OR mangrove OR mangroves OR marine OR mediterranean OR noordzee OR noordzeekust OR noordzeekusten OR ocean OR oceanic OR oceans OR pacific OR reef OR reefs OR “saline-freshwater interface” OR sea OR seas OR seashore OR seawater OR seawaters OR shore OR shores ***-Example

121 121 ?? Question ?? What did you learn from the exercise on the formulation of a query? ****

122 122 Hints on how to use information sources: work iteratively Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search. **** [Diagram: cycle of Query → Searching → Results → Feedback]

123 123 **** Hints on how to use information sources: work iteratively: example When you search a database with subject keywords from a controlled list, added to each record: 1. Search with search terms that you know 2. Investigate the results and select good, relevant items 3. Look for the keywords added to these items 4. Select the good, relevant keywords 5. Formulate a new search with these keywords added 6. Execute the new search 7. Repeat the procedure
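The procedure above can be sketched as a loop. The records, keywords and the `about_pollution` function (which stands in for your own judgement in step 2) are hypothetical, and step 4 is simplified: all keywords of the relevant items are kept.

```python
# Hypothetical records, each indexed with descriptors from a controlled list
RECORDS = [
    {"title": "Oil spills in coastal waters", "keywords": {"marine pollution", "oil spills"}},
    {"title": "Sewage outfalls and beach quality", "keywords": {"marine pollution", "sewage"}},
    {"title": "Heavy metals in estuarine sediments", "keywords": {"sediment pollution", "heavy metals"}},
    {"title": "Tourism trends", "keywords": {"tourism"}},
]


def search(keywords):
    """Steps 1 and 6: retrieve records that carry at least one of the keywords."""
    return [r for r in RECORDS if r["keywords"] & keywords]


def refine(keywords, is_relevant, max_rounds=5):
    """Steps 2-7: judge the results, harvest their keywords, search again."""
    for _ in range(max_rounds):
        relevant = [r for r in search(keywords) if is_relevant(r)]     # step 2
        harvested = set().union(*(r["keywords"] for r in relevant))  # steps 3-4
        if harvested <= keywords:  # nothing new learned: stop repeating
            break
        keywords = keywords | harvested                               # step 5
    return keywords, search(keywords)


def about_pollution(record):
    """Stand-in for the searcher's own relevance judgement."""
    return any("pollution" in k for k in record["keywords"])


final_keywords, found = refine({"oil spills"}, about_pollution)
print(sorted(final_keywords))
print([r["title"] for r in found])
```

The record indexed under “sediment pollution” is never reached: the loop only finds what the harvested keywords lead to, which is one reason not to rely on a single source.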

124 124 “The ability to ask the right question is more than half the battle of finding the answer.” Thomas J. Watson **** ?

125 125 Hints on how to use information sources: when to stop searching? Develop a feel for the “curve of diminishing returns”: If you spend too much time, effort, and/or money with too few benefits, you should stop. **** [Diagram: payoff versus time / effort / money, with the question “Time to stop?”]
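The “curve of diminishing returns” can be turned into a simple, personal stopping rule. The rule and its thresholds (`min_yield`, `patience`) are arbitrary assumptions for illustration, not recommended values.

```python
def stopping_round(new_relevant_per_round, min_yield=2, patience=2):
    """Return the search round after which to stop, or None to continue.

    Heuristic: stop once the number of *new* relevant items per round has
    stayed below `min_yield` for `patience` consecutive rounds.
    """
    low_rounds = 0
    for round_no, found in enumerate(new_relevant_per_round, start=1):
        low_rounds = low_rounds + 1 if found < min_yield else 0
        if low_rounds >= patience:
            return round_no
    return None


print(stopping_round([12, 7, 3, 1, 1, 0]))  # 5
```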

126 126 Knowledge organisation: classifications, and thesaurus systems Introduction ****

127 127 Knowledge organisation: introduction To organise knowledge / documents / books / reports / information / data / records / things / items / materials for more efficient storage and retrieval, some related, similar tools / systems / methods / approaches are used. Often, but not yet always, this process is assisted by a computer system. Good systems are expanded and updated when the need arises. Ideally, the organisation system applied should be clearly and immediately visible to the user of the materials, or even searchable by computer. ****

128 128 Knowledge organisation: some tools Various related tools / systems / methods / approaches are available: »Classification »Taxonomy »Controlled list of selected keywords »Thesaurus »Ontology »Subject-related metadata »… ***-

129 129 Knowledge organisation: classifications, and thesaurus systems Classifications ****

130 ?? Question ?? Give examples of general, universal classification systems. **-- 130

131 131 Classification systems: introduction Classification systems present the subjects in a logical order, usually going from the more general to the more specific. ***-Examples

132 132 Classification systems: examples of universal systems Universal means here: covering all subjects. Not just one but several competing systems exist. Examples »Universal Decimal Classification = UDC, used mainly outside the U.S.A. »Dewey Decimal Classification = DDC, used mainly in the U.S.A. »Library of Congress Classification, used mainly in the U.S.A. »... ****Examples

133 133 Knowledge organisation: classifications, and thesaurus systems Thesaurus systems ****

134 134 Thesaurus: description Thesaurus (contents) = »system to control a vocabulary (= words and phrases + their relations) »+ the contents of this vocabulary Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer ****

135 135 Thesaurus relations of a term: »BT (= Broader Term): term(s) with broader meaning »NT (= Narrower Term): term(s) with narrower meaning »RT (= Related Term): other term(s) »UF (= Use(d) For): synonym(s) ****
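A thesaurus entry and its relations map naturally onto a small data structure. This Python sketch, with invented entries, expands a term with its synonyms (UF) and, recursively, its narrower terms (NT), the usual expansion for better recall; broader (BT) and related (RT) terms are stored but deliberately not followed.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    bt: set = field(default_factory=set)  # BT: broader terms
    nt: set = field(default_factory=set)  # NT: narrower terms
    rt: set = field(default_factory=set)  # RT: related terms
    uf: set = field(default_factory=set)  # UF: synonyms this term is used for


# Hypothetical fragment of a thesaurus
THESAURUS = {
    "water pollution": Entry(bt={"pollution"}, nt={"marine pollution"},
                             rt={"water quality"}, uf={"water contamination"}),
    "marine pollution": Entry(bt={"water pollution"}, nt={"oil spills"},
                              uf={"sea pollution"}),
    "oil spills": Entry(bt={"marine pollution"}, uf={"oil slicks"}),
}


def expand(term):
    """Collect the term, its synonyms (UF) and all narrower terms (NT), recursively."""
    result, visited, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        entry = THESAURUS.get(current, Entry())
        result.add(current)
        result |= entry.uf
        stack.extend(entry.nt)
    return result


print(" OR ".join(sorted(expand("marine pollution"))))
```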

136 ?? Question ?? Which applications do you see for a thesaurus? ***- 136

137 137 Thesaurus applications related to information searching (1) For producers of a database: To choose index terms from a controlled vocabulary and add these to the items in a database, to increase precision and recall in the searches by users of the database. ***-

138 138 Thesaurus applications related to information searching (2) For users (!) of a database: When the database to be searched is produced with added descriptors (words and terms) that are taken from a controlled list of approved, selected words and terms, then the searcher can use some printed or computer-based system first, to find more and ‘correct’ suitable words and terms that belong to that controlled list of descriptors; then, the searcher can use these descriptors (and only these words or terms) in a database query. ***-

139 139 Thesaurus applications related to information searching (3) For users (!) of a database: When the database to be searched is NOT produced with added descriptors (words and terms) that are taken from a controlled list of words and terms, then the searcher can use one or several thesaurus systems first, to find more words and terms and more suitable words and terms; then the searcher can use these found words and terms to formulate a query for that database (to increase recall and precision). ***-

140 140 Thesaurus applications To find more and/or better terms during writing. To understand the meaning of a term, by inspecting »the scope note of the term and/or »the relations with other terms. **--

141 141 Thesaurus systems that cover all subjects General systems Universal systems Covering all subjects Broad and shallow systems Horizontal systems ***-

142 142 Thesaurus systems that cover all subjects: examples (1) »Library of Congress Subject Headings (LCSH) »thesaurus system built into word processing software »thesaurus system that runs on a PC (independent of the Internet), see for instance http://www.wordweb.co.uk/free/ ***-Examples

143 143 Thesaurus systems that cover all subjects: examples (2) thesaurus systems that can be used free of charge through the WWW »http://education.yahoo.com/reference/thesaurus/index.html »http://thesaurus.plumbdesign.com/ ***-Examples

144 144 Thesaurus systems covering all subjects: comments An ideal, complete thesaurus that covers all subjects does not exist. ***-Examples

145 145 !! Task - Assignment - Exercise !! Try to find suitable search terms to retrieve documents on “pollution” from a database on marine science, by using for instance the thesaurus included in the program for word processing that you use. ****

146 !! Task - Assignment - Exercise !! Have a look at various global, general, universal thesaurus systems. Consider which ones may be useful for your future online information searches. **-- 146

147 147 Thesaurus systems focused on a particular subject Focused on a particular subject domain = narrow and deep, vertical systems ***-

148 148 Thesaurus systems focused on a particular subject: examples »ERIC: education, information science,... »Psychological Abstracts / PsycInfo »Sociological Abstracts / SocioFile »INSPEC: physics, electronics, information technology »the Aquatic Sciences and Fisheries Information System »Medline (the Medical Subject Headings = MeSH) Various thesaurus systems for art and architecture can be found online: http://www.getty.edu/research/tools/vocabulary/ ***-Examples

149 149 Knowledge organisation: classifications, and thesaurus systems Classification systems versus thesaurus systems ****

150 150 Knowledge organization: classifications versus thesauri Classification »Good for placement of documents in a library (because documents on many related subjects can be kept together) »Not well suited for computer searching (too complicated) Thesaurus »Not suited for placement of documents in a library (because documents with related subjects would NOT be kept together) » Well suited for computer searching (relatively simple alphabetic listing of keywords) ****

151 151 Online access information sources and services Introduction ****

152 152 Online information sources: summary The following gives a general overview of online accessible information sources. This overview is not limited to or focusing on a particular concrete subject domain/area. ****

153 153 Online information sources: prerequisites Before using online accessible information sources, you should ideally have some knowledge and skills related to computer hardware computer software the Internet the WWW the concept of ‘information’ information retrieval in general the information market ****

154 154 Discovering online access information sources Equipment and tools required: »A microcomputer »Data communication facilities »Tools to locate information sources »Some knowledge and skills »... **--

155 155 Growing importance of computer network information resources Networked information resources are growing at a high rate, not only in volume but also in importance. There are many sources there which are vital to research and many others which are useful generally. To keep abreast of their field, most academics and researchers will find an increasing need to use the network for fast and efficient communication and for access to information. If they don’t, they are likely to be left behind, because most of their colleagues in institutions around the world will be doing just that. ***-

156 156 Online access to information: avoid network traffic jams When you access online information sources in the US from Europe, work when the lines are not saturated (better in the morning than in the afternoon). ****

157 157 Internet based information sources: problems / difficulties (Part 1) Redundancy and overlap: On the one hand, there is too much information on some topics; in other words, the redundancy and overlap are high in many cases. Too few information sources: On the other hand, there are too few information sources on some topics. ****

158 158 Internet based information sources: problems / difficulties (Part 2) No order is imposed on most sources. Quality checks / quality controls are not performed. Related to this: it is not required to register new information offered. Is the information that you find real, honest, authentic? ****

159 159 Internet based information sources: problems / difficulties (Part 3) Change is the only constant: Information sources are constantly changing, growing, but sometimes disappearing. ****

160 160 Internet based information sources: problems / difficulties (Part 4) Scattering: There is no single simple but powerful system to find relevant information through the Internet. In other words: integration / aggregation is still far from perfect. ****

161 161 Internet based information sources: problems / difficulties (Part 5) Slow: The Internet is in many places and for many applications not yet fast enough. ****

162 162 Internet based information sources: problems / difficulties (Part 6) In conclusion: Surfing, using the Internet, the WWW, can be a time sink instead of a productive activity. ****

163 163 Internet based information sources: how many? how much information? More than 10 terabytes (= 10 000 gigabytes) of text data (in 2001) More than 10 million WWW sites (in 2003) More than 4 000 million (= 4 billion) unique URLs in the total Internet (in 2004) ****

164 164 Increasing number of online public access databases Source: Gale Directory of Databases, 1997. ***-

165 165 Online access information sources and services Types of online access information systems ****

166 166 Primary versus secondary computer sources / systems / services Primary sources /systems /services directly useful Secondary sources /systems /services »helping to access / use the primary services »“travel agencies”, “navigation services”,... ****

167 167 ?? Question ?? Do you know examples of primary and of secondary online information systems? ***-

168 168 Types of online access information systems by contents Documents (with or without hyperlinks) Catalogues of publishers and bookshops Online public access library catalogues (OPACs) Community/Campus-Wide Information Systems (CWIS) Online access databases about journal articles Electronic newsletters and journals Computer file archives (documents, programs) Interest groups (for instance Usenet Newsgroups)... ***-

169 169 Types of online access information systems by access method Remote login information systems and bulletin board systems (BBS) (telnet in the Internet) Anonymous ftp servers, in the Internet Usenet News servers (nntp in the Internet) Gopher servers, in the Internet Wide Area Information Servers (WAIS), in the Internet World Wide Web servers = http servers (WWW), in the Internet... ***-

170 170 Types of online access information systems: “free” versus “fee” **** Public access information sources free of charge Fee-based online information services (NOT free of charge)

171 171 Types of online access information sources by file format For instance: »TXT (ASCII) »DOC »HTM, HTML, SHTML,… »PDF »PCX »TIF, TIFF »GIF »JPG »PNG »AVI »MPG »ASF »… ***-

172 172 **-- Commercial information provided through the Internet Most of the information that is freely available on the WWW is provided by commercially oriented organisations. Thus that information is not objective or scientific in most cases, but subjective or perhaps even misleading, and certainly attracting more attention than more scientific information. (Of course many information sources are also provided by commercial organisations that belong to the so-called information industry, but these are bound to supply more objective information of high quality, as this is their way to survive commercially.)

173 173 Online access information sources and services Dictionaries and encyclopaedias accessible through the WWW ****

174 174 Dictionaries and encyclopedias through the WWW: introduction Dictionaries and encyclopedias are the first choice among many types of information sources, »when we do not need detailed information on a common topic »when we want to prepare a more detailed search on an unfamiliar topic, by searching for the right spelling, synonyms, context,… Some dictionaries and encyclopedias are available through the WWW free of charge. ****

175 175 Dictionaries accessible through Internet and the WWW: example The American Heritage® Dictionary of the English Language »Over 200,000 entries, 70,000 audio word pronunciations, 900 full-page color illustrations »Available free of charge from http://education.yahoo.com/reference/dictionary/ ****Example

176 176 Dictionaries accessible through Internet and the WWW: compilation A compilation/collection of dictionaries can be searched simultaneously and free of charge: http://www.onelook.com/ ****Example

177 177 Encyclopedias accessible through Internet and the WWW: examples Encarta Concise Free Encyclopedia »http://encarta.msn.com/ »Available in English and in some other languages ****Example

178 178 Encyclopedias accessible through Internet and the WWW: examples Encyclopædia Britannica only a small part is available free of charge + links to selected WWW sites »http://www.britannica.com/ Encyclopædia Britannica Concise »http://education.yahoo.com/reference/encyclopedia/ ****Example

179 179 Encyclopedias accessible through Internet and the WWW: examples The Canadian Encyclopedia (in English and in French): »http://thecanadianencyclopedia.com/ ****Example

180 180 Encyclopedias accessible through Internet and the WWW: overviews A list / overview of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm Other lists of encyclopedias on the Internet can be found as part of more general directories of Internet-based information sources. ****Example

181 181 Online access information sources and services Internet search functions built in browser software ***-

182 182 The Internet search functions built into browsers Some Internet search functions are built into common leading browsers like »Microsoft Internet Explorer »Netscape When connected to the Internet, you can use »The functions behind the “Search button” »Searching through the “Address” form ***-

183 183 The Internet search button of browsers: introduction Common graphical browsers provide a search function and a search button. Examples: Netscape, Microsoft Internet Explorer ***-

184 184 ***- The Internet search button of browsers: comments (Part 1) Such a search function offers in fact no searching, but (only) a link to a WWW site, often in the USA, which offers links or gateways to search tools on other servers. It is faster in many cases to contact search tools directly.

185 185 ***- The Internet search button of browsers: comments (Part 2) The gateways may offer only a limited view on the properties of the real search tool used. Such a search function can confuse users who may think that the searching capability is built more or less into the browser software, while searching relies on external servers.

186 186 ***- Searching with browsers using the address form: introduction A search for particular Internet documents can be performed by typing in keywords in the address form, when you are connected to the Internet, for instance with »Microsoft Internet Explorer »Netscape This is based on transmitting the keywords to some Internet index through the Internet.
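What happens behind the address form can be approximated as follows: if the typed text does not look like an address, it is packed into the query string of a search URL. Both the detection heuristic and the search address `https://search.example.com/search` with parameter `q` are hypothetical; each browser and each Internet index uses its own.

```python
from urllib.parse import urlencode

# Hypothetical search address and parameter name
SEARCH_BASE = "https://search.example.com/search"


def looks_like_address(text):
    """Simplified heuristic: a single token containing a dot is treated as an address."""
    text = text.strip()
    return " " not in text and "." in text


def handle_address_form(text):
    """Return the URL the browser would request for what was typed."""
    text = text.strip()
    if looks_like_address(text):
        return text if "://" in text else "http://" + text
    # Keywords: transmit them to an Internet index as a query string
    return SEARCH_BASE + "?" + urlencode({"q": text})


print(handle_address_form("www.vub.ac.be"))
print(handle_address_form("seawater pollution Tanzania"))
```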

187 187 !! Task - Assignment - Exercise !! Get some experience in using the address form of your web browser program to search for documents on a particular subject that are available on the WWW. ***-

188 188 ***- Searching with browsers using the address form: comments + An advantage is the ease of use. - A disadvantage is that it is less clear what really happens, than when you access a well chosen and well known Internet directory or Internet index directly.

189 189 Online access information sources and services Internet directories and indexes ****

190 190 !! Task - Assignment !! You can find an introduction to general Internet information skills in the form of a computer-based interactive tutorial at http://www.rdn.ac.uk/vts/instructor/index.html ***-

191 191 Internet: meta-information about Internet information sources in printed manuals and guides: - it is not always possible to get a copy fast - it costs money to get a copy - they are soon out of date offered on the WWW!: + directly available when we want to use the Internet + many systems are accessible free of charge + most systems are regularly updated (“intelligent agent” software on client PC) ****

192 192 Internet: subject-oriented meta- information offered via WWW Information about information sources: in the form of »subject guides = texts with references »subject hypertext directories = subject guides »key word indexes, generated automatically, for searching »collections of links or forms to the above »(multi-threaded search systems) ****

193 193 Internet global subject directories: introduction They are virtual libraries with open shelves, for browsing. They are compiled manually, by many people. They can be browsed following a tree structure or a more complicated variation. The most famous of these systems are among the most popular and most visited sites on the WWW, e.g. Yahoo! ****

194 194 Internet global subject directories: structure The structure corresponds to a classification that is in most cases specific for the particular overview. In other words: the well-known and classical universal classification systems are not used in most Internet directories. ****
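The browsing structure of such a directory can be modelled as nested categories. The category names below come from the Yahoo! pediatrics example slides, heavily simplified; the `browse` function is illustrative only.

```python
# Simplified fragment of a directory tree (category names from the Yahoo! examples)
DIRECTORY = {
    "Health": {
        "Medicine": {
            "Pediatrics": {
                "Organizations": ["American Academy of Pediatrics"],
                "Schools, Departments, and Programs": ["University of Rochester", "Tohoku University"],
            },
        },
    },
}


def browse(tree, path):
    """Follow a breadcrumb such as 'Health > Medicine > Pediatrics'.

    Returns the subcategories of a category, or the listed sites at a leaf."""
    node = tree
    for step in path.split(">"):
        node = node[step.strip()]
    return sorted(node) if isinstance(node, dict) else node


print(browse(DIRECTORY, "Health > Medicine > Pediatrics"))
```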

195 195 Internet global subject directories: pros and cons They cover a small number of selected WWW sites, in comparison with the total number of sites that are accessible.  + The selected, included sites should be better than average. - They are not suitable for deep, detailed, specific searches with a high coverage.

196 196 Internet global subject directories: why use one? They are suitable mainly for broad searches that can be difficult to formulate in words, but NOT for more specific searches that require combinations of several concepts.

197 197 **** Internet global subject directories: searching directories with a query Many of the Internet directories include an index to search their contents with a query. However, the supporting classification structure is then not well exploited, and the user should be aware of the problems and difficulties of information retrieval with natural language queries. Furthermore, the possibility to use the system in this way may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search tools.

198 198 Internet global subject directories: Yahoo! A hypertext global subject directory can be found at http://www.yahoo.com/ and at many other sites, including http://www.yahoo.co.uk/ Entries are NOT rated. Accessible free of charge. ****Example

199 199 Internet global subject directories: Yahoo! links in pediatrics Health > Medicine > Pediatrics: »International Pediatric Chat - for professionals to share information and education regarding children's health care. »National Med/Peds Residents' Association - organization for residents, practitioners and medical students interested in combined internal medicine and pediatrics. »Neonatology Network - information and communication platform for neonatologists and pediatricians. »Pediatria OnLine - here we talk about children, among pediatricians and with families. »Pediatric Critical Care »Pediatric Database (PEDBASE) - containing descriptions of over 500 childhood illnesses. »Pediatric Endocrinology Conference - LWPES/ESPE joint meeting occurring July 6-10 2001. »Pediatric Endoscopic Photos - illustrating intestinal problems in children. ***-Example

200 200 Internet global subject directories: Yahoo! for pediatrics Health > Medicine > Pediatrics: link to a digital library (health sciences) for young patients ***-Example

201 201 Internet global subject directories: Yahoo! to pediatrics organisations Health > Medicine > Pediatrics > Organizations: link to the American Academy of Pediatrics ***-Example

202 202 Internet global subject directories: Yahoo! links to pediatrics schools Health > Medicine > Pediatrics > Schools, Departments, and Programs »University of Rochester - partnership between pediatric residents and community-based agencies that serve children and their families. »Michigan State University@ »Royal College of Paediatrics and Child Health - responsible for training, examinations, professional standards, and organisation of child health services for the UK. »Tohoku University »University of Alabama at Birmingham - programs and training opportunities in pediatrics. Also contains faculty information and sub-specialty descriptions. »… ***-Example

203 203 Internet global subject directories: searching with a query in Yahoo! (1) The directory of Yahoo! can not only be browsed but also searched with a query. However, in this way the hierarchical structure is not well exploited. For the formulation of a search query, Yahoo! can provide automatic assistance related to spelling and word variations. For instance, after a search for “Capetown”, Yahoo! answers: “Other Spellings: Try searching for cape town instead.” ***-Example
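Yahoo! does not document how its spelling suggestions are produced. One common technique, shown here only as an assumption, is to propose vocabulary entries within a small edit distance (Levenshtein distance) of the query; the vocabulary is invented.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(query, vocabulary, max_distance=2):
    """Vocabulary entries close to the query, nearest first."""
    q = query.lower()
    scored = sorted((edit_distance(q, v), v) for v in vocabulary if v != q)
    return [v for d, v in scored if d <= max_distance]


# Hypothetical vocabulary of known place names
print(suggest("Capetown", ["cape town", "cape cod", "georgetown"]))  # ['cape town']
```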

204 204 Internet global subject directories: searching with a query in Yahoo! (2) When such a query does not provide results, then Yahoo! uses a much larger external Internet index, to execute a query based on textual search statements. The chosen Internet index has varied over time. This mechanism is not made very clear and may confuse the user. ***-Example
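The fallback behaviour described above can be sketched as follows. The matching rule (all query words must occur in an entry title) and the `fake_index` callable, standing in for whichever Internet index is used at the time, are assumptions; the titles are taken from the pediatrics example.

```python
def search_with_fallback(query, directory_titles, external_index):
    """Search the directory entries first; if nothing matches,
    pass the query on to an external full-text Internet index."""
    words = query.lower().split()
    hits = [title for title in directory_titles
            if all(w in title.lower().split() for w in words)]
    if hits:
        return "directory", hits
    return "external index", external_index(query)


TITLES = ["Pediatric Database (PEDBASE)", "Neonatology Network", "Pediatric Endoscopic Photos"]


def fake_index(query):
    """Stand-in for the external Internet index."""
    return [f"full-text result for '{query}'"]


print(search_with_fallback("pediatric database", TITLES, fake_index))
print(search_with_fallback("capetown weather", TITLES, fake_index))
```

The caller does get a label saying which source answered; the slide's point is that the real interface did not make this very clear.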

205 205 Internet global subject directories: Yahoo! and full-text search engines The company Yahoo! started and became famous by offering a WWW global subject directory. Afterwards it started offering many other services and became one of the most used WWW portals. In 2003, Yahoo! also owned 3(!) big Internet search engines: All the Web, AltaVista, Inktomi ***-Example

206 206 Internet global subject directories: Britannica A hypertext global subject directory can be found at http://britannica.com/ Entries are rated. Accessible free of charge. Combined and integrated with a great encyclopedia. **--Example

207 207 Internet global subject directories: BUBL link A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/ Accessible free of charge. **--Example

208 208 Internet global subject directories: Google directory A hypertext global subject directory can be found at http://directory.google.com/ Accessible free of charge. Based on the Netscape DMOZ Open Directory Project. Do not confuse this with the famous Google WWW search engine. ****Example

209 209 Internet global subject directories: Librarians' Index to the Internet A hypertext global subject directory can be found at http://www.lii.org/ Accessible free of charge. **--Example

210 210 Internet global subject directories: LookSmart A hypertext global subject directory can be found at http://www.looksmart.com/ Accessible free of charge. **--Example

211 211 Internet global subject directories: Open Directory Project A hypertext global subject directory can be found at http://www.dmoz.org/ The contents are also used in other systems, such as Google Directory and Webbrain. Accessible free of charge. ****Example

212 212 Internet global subject directories: Point (Communications) A hypertext global subject directory can be found at http://www.pointcom.com/ Accessible free of charge. **--Example

213 213 Internet global subject directories: Resource Discovery Network A collection of hypertext subject directories that focus on academic information sources can be found at http://www.rdn.ac.uk/ Together these lead to more than 30 000 selected WWW sites. Accessible free of charge. ***-Example

214 214 !! Task - Assignment - Exercise !! Try to find Internet sources which are relevant for you, by using an Internet-based global subject directory. ****

215 215 Internet global subject directories: evaluation criteria - desiderata (1) Usage free of charge? Wide coverage? Up to date? Frequent updates? Only a few dead / broken links? Good coverage of the sources in that part of the world in which you are interested? ***-

216 216 Internet global subject directories: evaluation criteria - desiderata (2) Does the manager of the directory refuse to give priority to sites that want to pay to get a prominent place in the directory? Easy user interface? Short response times? Are mirror sites available closer to you for faster response? Good presentation, description of each site? ***-

217 217 Internet global subject directories: evaluation criteria - desiderata (3) Is a rating, appreciation, review offered for each listed site? Is translation of documents offered free of charge? Good documentation and online help? Good help desk available? High stability and reliability? ***-

218 218 Internet global subject directories: evaluation criteria - desiderata (4) Are other services offered from the same site or with the same interface? Is the subject directory integrated with other services? Additional services can be »an Internet index or a WWW index or a gateway to such an index for searching with a query »weather, travel guides, flight and hotel reservations, maps,... »WWW-based e-mail and e-mail address directories »auctions through WWW ***-

219 219 !! Task - Assignment - Exercise !! ***- Compare two global Internet subject directories to start a search. (for instance on ecological marine management.)

220 220 ***- Internet subject directories: non-global, more specific systems [Diagram: a global subject directory can lead to the complete WWW; more specific systems are »a directory limited to sources in/of a country or region »a directory restricted to a specific subject domain (“portal”)]

221 221 Internet subject directories focusing on a specific subject domain (Part 1) “Specialised subject directories” or “gateways” Examples: Educational materials in the USA: »http://www.thegateway.org/ Marine science and oceanography: »http://oceanportal.org/ = http://ioc.unesco.org/oceanportal/ ***- Examples

222 222 Internet subject directories focusing on a specific subject domain (Part 2) Engineering, mathematics, computing: »http://www.eevl.ac.uk/ »http://www.ub.lu.se/eel/ Civil engineering: »http://www.icivilengineer.com/ Fishing: »http://www.onefish.org/ ***- Examples

223 223 Internet subject directories focusing on a specific subject domain (Part 3) Medicine and healthcare: general: http://www.achoo.com/ http://www.medmatrix.org/ http://www.medscape.com/ http://www.omni.ac.uk Medicine and healthcare: General pediatrics: http://GeneralPediatrics.com http://www.medscape.com/pediatricshome http://www.pedinfo.com/ ***- Examples

224 224 Internet local subject directories: examples in Belgium http://yellow.advalvas.be/weblist.html http://search.msn.be/ The guide developed by the public libraries in Flanders: http://www.bib.vlaanderen.be/webwijzer ****Examples

225 225 Internet indexes: automated search tools Several systems allow you to search for and locate many items (addressable resources) on the Internet in a more systematic, direct way than by browsing/navigating alone. These systems do NOT search the contents of computers across the real Internet, completely and in real time, when a user submits a query. Searching in that way would be much too slow, given the limitations of the technology. ****

226 226 Internet indexes: scheme of the mechanism **** [Diagram: a user searching for Internet-based information works through Internet client hardware and software and the user interface to a search engine; the Internet index search engine searches a database of Internet files, including an index, which is built by an Internet crawler and indexing system from the Internet information sources.]

227 227 Internet indexes: description of the mechanism Each of these search systems is based on: a database of links to pages / URLs that can be retrieved by searching with queries through a big index that is built automatically from the contents, the texts, of these pages (to build this database and to keep it up to date, pages are continuously collected from the Internet by a “robot” software system) a search system with a user interface in a WWW form, to allow the user to search through that database ****

228 228 Internet indexes: building their database ***- [Diagram: Internet documents are fed into the database management system (indexing); records derived from this input are stored in the database, together with an inverted file (full-text index, register of the database), which the user searches (retrieval).]
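The indexing and retrieval steps in this scheme can be illustrated with a toy inverted file. This is a minimal Python sketch under simplifying assumptions (the crawler has already fetched the pages into a dictionary keyed by URL, words are simply split on non-alphanumeric characters, and there is no ranking); the function names are hypothetical, and real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map every word to the set of documents (URLs) that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search_all_words(index: dict[str, set[str]], query: str) -> set[str]:
    """Retrieval: documents containing every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "http://example.org/a": "Marine pollution in the North Sea",
    "http://example.org/b": "Ocean currents and climate",
    "http://example.org/c": "Pollution of the ocean",
}
index = build_inverted_file(pages)
print(search_all_words(index, "ocean pollution"))  # {'http://example.org/c'}
```

The query is answered from the inverted file alone: no page on the Internet is contacted at search time, which is why the results can be out of date.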

229 229 Internet indexes: AltaVista **** The primary search interface can be found in the US. The following addresses all lead to the same information: »http://www.altavista.com/ »http://www.av.com/ »http://av.com/ Mirror site in UK: »http://uk.altavista.com/ »http://www.altavista.co.uk/

230 230 Internet indexes: AltaVista: features Allows full-text searching of the WWW Offers relevance ranking of search results Also allows advanced Boolean searching (in “Advanced” mode) Offers a link to an Internet subject directory (Looksmart) Offers links to systems to find images, sounds… (multimedia) in the Internet ****

231 231 Internet indexes: AltaVista simple versus advanced “Simple” mode is suited, for instance, to searches »with only 1 concept expressed as a series of synonyms, narrower terms,... such as a search for a person, a company, an institute,... »when ranking is important “Advanced” mode is suited, for instance, to searches »with more than 1 concept, so that an AND combination is useful in addition to an OR combination »when ranking is not important ***-

232 232 Internet indexes: AltaVista as a company AltaVista and the other leading Internet search engines All the Web and Inktomi have been owned by the same U.S. company, Yahoo!, since 2003. Their most important competitor is Google. ***-

233 233 Internet indexes: All the Web **** The search interface can be found at: http://www.alltheweb.com/ http://alltheweb.com/ You can search the WWW and FTP servers. The database is one of the biggest. Not only HTML and plain-text files, but also the full text of many Adobe PDF files is indexed. Also offers a module to search for pictures/images. Offers spelling suggestions in the search interface.

234 234 Internet indexes: All the Web as a company All the Web and the other leading Internet search engines AltaVista and Inktomi have been owned by the same U.S. company, Yahoo!, since 2003. Their most important competitor is Google. ***-

235 235 Internet indexes: Google (Part 1) http://www.google.com/ One of the most popular systems in 2001, 2002, 2003, 2004… For ranking the retrieved pages, an algorithm is used that takes into account the links between WWW pages. A retrieved page is ranked higher when »many sites/pages point to it »“important” sites/pages point to it ****
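The idea that a page ranks higher when many pages, or “important” pages, point to it can be sketched with a simplified version of the published PageRank calculation. This is an illustration under assumptions (a tiny hypothetical link graph, a damping factor of 0.85, a fixed number of iterations), not the ranking that Google actually uses.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank-style link analysis.

    links maps each page to the pages it links to. A page scores higher when
    many pages link to it, and when the pages linking to it score high themselves.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump" to any page).
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, in equal parts, to the pages it links to.
                for target in targets:
                    new_score[target] += damping * score[page] / len(targets)
            else:
                # A page without outgoing links spreads its score over all pages.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score


links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(links)
print(max(scores, key=scores.get))  # c
```

Page c is linked from three pages and scores highest; page a is linked only from c, but because c scores high, a still ends up well above b and d, which nobody links to.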

236 236 Internet indexes: Google (Part 2) Full-text searching is possible in many files that are available through the WWW. Not only HTML and plain-text pages are covered; the first part of many files in other file formats is also indexed, such as »Adobe PDF, »Microsoft Word, Microsoft Excel, Microsoft PowerPoint »Rich Text Format… Also the contents of some databases can be searched. ****

237 237 Internet indexes: Google (Part 3) In other words, not only static WWW pages are harvested and made searchable. Many other search systems on all kinds of WWW sites are based on Google. ****

238 238 Internet indexes: Google computer servers Google uses a system of more than 10 000 small computer servers to offer its information services. ***-

239 239 Internet indexes: Google refers to a dictionary In Google, the words used in a search query are returned to the user with hyperlinks to a dictionary and to a thesaurus on the WWW, which can be used partly free of charge. The dictionary can teach the user more about the meaning of the words used in the query. **--

240 240 Internet indexes: Google refers to a dictionary: display **--Example

241 241 Internet indexes: from Google into a dictionary **--Example

242 242 Internet indexes: Google refers to a thesaurus In Google, the words used in a search query are returned to the user with hyperlinks to a dictionary and to a thesaurus on the WWW, which can be used partly free of charge. The thesaurus can of course show the user synonyms, narrower terms and related terms for the word. In this way, the thesaurus can be used to expand a search query, so that the query better covers the search concept. ***-
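Expanding a query with a thesaurus amounts to replacing each search concept by an OR group of the term and its synonyms, narrower terms or related terms, and then combining the concepts with AND. A minimal sketch, assuming a small hand-made thesaurus (the entries in THESAURUS are hypothetical); the resulting syntax with brackets suits systems that allow full Boolean searching, such as the AltaVista advanced mode, rather than Google.

```python
# Hypothetical mini-thesaurus; a real one would supply synonyms,
# narrower terms and related terms for each entry.
THESAURUS = {
    "sea": ["ocean", "marine"],
    "pollution": ["contamination"],
}


def expand_query(concepts: list[str], thesaurus: dict[str, list[str]]) -> str:
    """Build a Boolean query: one OR group per concept, groups joined with AND."""
    groups = []
    for term in concepts:
        variants = [term] + thesaurus.get(term, [])
        if len(variants) == 1:
            groups.append(term)  # no thesaurus entry: keep the term as it is
        else:
            groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(expand_query(["sea", "pollution"], THESAURUS))
# (sea OR ocean OR marine) AND (pollution OR contamination)
```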

243 243 Internet indexes: from Google into a thesaurus ***-Example

244 244 Internet indexes: Google can expand a query: how? ***- If you want to retrieve more documents, you can ask Google to include synonyms of one or more of the words in your query automatically. This has been possible since 2003. To do this, put a tilde ~ in front of the selected word. Example of a query: word1 ~word2 word3 word4

245 245 Internet indexes: Google can expand a query: comment ***- Of course, this is only a “quick and dirty” method. The system does not really understand your information need. Manual, intellectual expansion of a query should yield better results.

246 246 Internet indexes: Google additional features Besides a system to search for WWW pages, Google also offers »a subject directory »searching for images/pictures on the WWW »searching an archive of Usenet messages + posting to Usenet groups »searching for news Thus Google has become a great integrator / aggregator. ****

247 247 !! Task - Assignment - Exercise !! Read the manual and make a search with Google. ***-

248 248 Internet indexes: Google as a company The important competitors of Google are »the well-established, classical Yahoo! subject directory system »the Yahoo! search engine, new since 2004 »All the Web and AltaVista, well-established Internet search engines These are all owned by the same U.S. company, Yahoo! (All the Web and AltaVista since 2003). **--

249 249 Internet indexes: MSN Web Search Offered free of charge by Microsoft. You can search for WWW content. Since 1998. A well-known system, because its search interface can be reached through the search functions built into one of the most widespread Internet browsers, Microsoft Internet Explorer, and because it is offered at http://search.msn.com/ ***-Example

250 250 Internet indexes: MSN Web Search It is based on an Internet index created by another company. However, in 2003 Microsoft started building its own WWW crawler. **--Example

251 251 Internet indexes: Scirus Allows you to search for manually selected scientific information (only) on the WWW. This includes »the peer-reviewed articles in the journals published in ScienceDirect by Elsevier, which can be downloaded in full text only after a fee has been paid to the publisher »files in scientific open archives, which contain scientific research articles that can be downloaded free of charge. The search interface: http://www.scirus.com ***-Example

252 252 Internet indexes: Scirus features Offered free of charge by Elsevier. Is partly based on the Fast WWW search system that is also used by All the Web. Offers access to information ordered according to a classification system / taxonomy. Offers access not only to files in HTML format, but also to PDF files. ***-Example

253 253 Internet indexes: Scirus: screenshot **--Example

254 254 Internet indexes: Teoma Allows you to search for information on the WWW. Offers a feature that most other search systems do not offer: categorization = classification = refinement = clustering of search results, to help the user cope with the problem of the ambiguity of meaning of the search query. The search interface: http://www.teoma.com/ **--Example

255 255 Internet indexes: Teoma example Example of coping with ambiguity: searching for pascal gives results related to the philosopher and to the computer programming language: **--Example
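Teoma, Mooter and WiseNut do not describe their clustering methods here, so the following is only a generic sketch with a swapped-in simple technique: result snippets are grouped greedily when they share enough words (Jaccard similarity), after removing the query words that all results have in common. The stop-word list, the threshold of 0.2 and the example snippets are assumptions; with this greedy approach the result also depends on the order of the snippets.

```python
import re

# Hypothetical stop-word list, kept small for the example.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def snippet_terms(snippet: str) -> set[str]:
    """The set of content words in a result snippet."""
    return {w for w in re.findall(r"[a-z]+", snippet.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two word sets: size of the intersection / size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(snippets: list[str], query: str, threshold: float = 0.2) -> list[list[int]]:
    """Greedy grouping: a result joins the first cluster that contains a result
    sharing enough words with it; otherwise it starts a new cluster.
    The query words are ignored because every result contains them."""
    query_words = snippet_terms(query)
    word_sets = [snippet_terms(s) - query_words for s in snippets]
    clusters: list[list[int]] = []
    for i, words in enumerate(word_sets):
        for cluster in clusters:
            if any(jaccard(words, word_sets[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


results = [
    "Blaise Pascal French philosopher and mathematician",
    "Pascal programming language compiler tutorial",
    "Pensees by the philosopher Blaise Pascal",
    "Free Pascal compiler for the programming language",
]
print(cluster_results(results, "pascal"))  # [[0, 2], [1, 3]]
```

The two clusters correspond to the philosopher and to the programming language.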

256 256 Internet indexes: Mooter Allows you to search for information on the WWW. Offers a feature that most other search systems do not offer: categorization = classification = refinement = clustering of search results, to help the user cope with the problem of the ambiguity of meaning of the search query. The clusters are displayed in a diagram. The search interface: http://mooter.com/ **--Example

257 257 Internet indexes: WiseNut Allows you to search for information on the WWW. Offers a feature that most other search systems do not offer: categorization = classification = refinement = clustering of search results, to help the user cope with the problem of the ambiguity of meaning of the search query. The search interface: http://www.wisenut.com/ **--Example

258 258 Internet indexes: WiseNut: screenshot of the guide **--Example

259 259 Internet indexes: Yahoo! ***-Example An Internet search system is offered through http://www.yahoo.com/ This is offered besides the well-established, classical Yahoo! subject directory. Before 2004, the search system was provided by an external company, most recently by Google. Since 2004, an independent system has been offered that competes with other similar systems. It is probably based on the well-established INKTOMI Internet database that Yahoo! has owned since 2003.

260 260 ?? Question ?? How can you indicate in a search query in most of the general, popular search engines that you want to search for a phrase that is composed of more than 1 word (for instance word1 and word2)? ***-

261 261 Internet indexes: coverage Internet indexes do not cover all static documents on the WWW. Most indexes grow and their “size ranking” is variable. If exhaustive results are desired, then more than one Internet index search system should be used. ****

262 262 Internet indexes: coverage and size of each index Most indexes grow and their “size ranking” is variable. The biggest systems in 2003: »Google ! »AltaVista »All the Web (serving also Lycos) »Systems based on the INKTOMI database of WWW pages. ****

263 263 !! Task - Assignment - Exercise !! Try to find Internet sources which are relevant for you, by using an Internet index. ****

264 264 **-- Internet indexes: delay in indexing new pages The large, well-known, international Internet indexes take more than 1 month to index new pages (according to Lawrence and Giles, Nature, 1999, Vol. 400, pp. 107-109). So they are not suitable for searching rapidly changing, recent information (such as “news”), unless they index a small selection of important news sites more frequently.

265 265 **-- Internet indexes: specialised systems More specialised search engines / systems can yield better result sets: »higher recall »higher precision Specialised Internet indexes / search engines can be found for instance in the directory http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
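Recall and precision can be computed for a single search once you know which documents were retrieved and which documents are relevant. A minimal sketch; the document identifiers in the example are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of the retrieved documents that are relevant.
    Recall: share of the relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist, 2 overlap.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(p, round(r, 2))  # 0.5 0.67
```

A specialised system that retrieves fewer irrelevant documents raises precision; one that finds more of the relevant documents in its domain raises recall.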

266 266 Internet indexes: non-global, regional systems **-- [Diagram: within the complete WWW, the part covered by a global / international Internet index, and the part covered by an index limited to sources in/of a country or region.]

267 267 Internet indexes: subject-specific, specialised systems **-- [Diagram: within the complete WWW, the part covered by a global / international Internet index, and the part covered by an Internet index limited to sources related to a specific subject.]

268 268 ?? Question ?? Which Internet search system is closest to a classical library catalogue: an Internet subject directory or an Internet index? ***-

269 269 ?? Question ?? Which differences do you see between a classical library catalogue and an Internet index? ***-

270 270 Internet indexes: comparison with library catalogues Most Internet indexes have a larger database than most catalogues. Internet index databases do not correspond as well to the Internet as a normal, good catalogue corresponds to the collection, because the documents on the Internet change more often and their number is growing fast. Most Internet indexes contain all the words of the documents that they index, whereas catalogues only contain short descriptions of the documents. ***-

271 271 Index spamming on WWW: introduction To increase the visibility of their WWW document or site, some information producers use several methods that are not appreciated by most users. This is called “index spamming”. **--

272 272 Index spamming on WWW: methods used (Part 1) Repeated words that are made invisible for human readers by using »very small characters, and/or »colours matching the background Using words not directly related to the page to attract readers. Examples: »names of competing companies and brands »words related to popular topics **--

273 273 Index spamming on WWW: methods used (Part 2) Fake HTML document titles that are not shown to the user (because ranking by Internet indexes is heavily based on words occurring in the HTML title) Duplicate pages / documents on the same site or even on several sites **--

274 274 Index spamming on WWW: methods used (Part 3) Creating many useless WWW pages / documents that all point to the page that should be promoted (because ranking of a page by Internet indexes is in many cases based on the number of links pointing to that page, as an indication of its popularity and quality) **--

275 275 Internet indexes: variations among various systems Besides their common aims and characteristics, the searchable Internet index systems nevertheless show differences and variations. To illustrate these variations and to help Internet users decide which search system to use, the following list of features and evaluation criteria can be useful. ***-

276 276 Internet indexes: general evaluation criteria - desiderata Is usage free of charge? How complete is the coverage? Is the coverage good (or poor) for a particular geographic region? Is the coverage good (or poor) for a particular type of documents? Is the searchable database up to date? Is the database updated frequently? Do the search results contain only few dead (broken) links? ***-

277 277 Internet indexes: indexing + searching evaluation criteria - desiderata (1) Does the database system work with full text indexing of each document that has a place in the database, so that full text searching is possible? Is the complete text indexed and searchable, even for very long documents? ***-

278 278 Internet indexes: indexing + searching evaluation criteria - desiderata (2) Are the contents of meta-fields also indexed to make them searchable? ***-

279 279 Internet indexes: indexing + searching evaluation criteria - desiderata (3) Does the system also index the text in files on the web that are not plain ASCII text, to make these searchable and retrievable as well? For instance files in the formats of the various versions of »Microsoft Word (DOC), Microsoft PowerPoint (PPT, PPS), Microsoft Excel »Adobe Acrobat Portable Document Format (PDF) ***-

280 280 Internet indexes: indexing + searching evaluation criteria - desiderata (4) Is field indexing used, so that a search can be limited to the contents of a particular field? For instance: HTML title, HTML keywords, URL, date, link, Java applet, text, image file, sound file, video file... ***-

281 281 Internet indexes: indexing + searching evaluation criteria - desiderata (5) Does the system offer powerful search options like »searching for terms composed of several words, in queries like “word1 word2” with the words enclosed in double quote characters »truncation of words in a query? »Boolean search combinations? »an unlimited number of search terms in a query? »proximity/nearby/adjacency searching, with operators like “word1 NEAR word2” or “word1 ADJ word2” ***-
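Phrase searching and proximity operators such as NEAR or ADJ require an index that stores the position of every word in every document, and right truncation can be answered by matching word prefixes. The following Python sketch shows one way to do this under simplifying assumptions (hypothetical function names, distances measured in words, documents held in memory); the operator syntax of real search systems differs from system to system.

```python
import re
from collections import defaultdict


def positional_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """word -> {document id -> positions (0, 1, 2, ...) of that word in the document}"""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(position)
    return index


def near(index, word1: str, word2: str, max_distance: int, ordered: bool = False) -> set[str]:
    """Documents in which word2 occurs at most max_distance words from word1.
    ordered=True requires word2 to follow word1; with max_distance=1 this is
    a phrase search for "word1 word2" (an ADJ-like operator)."""
    postings1, postings2 = index.get(word1, {}), index.get(word2, {})
    hits = set()
    for doc_id in set(postings1) & set(postings2):
        for i in postings1[doc_id]:
            for j in postings2[doc_id]:
                distance = j - i
                if ordered:
                    ok = 0 < distance <= max_distance
                else:
                    ok = 0 < abs(distance) <= max_distance
                if ok:
                    hits.add(doc_id)
    return hits


def truncated(index, prefix: str) -> set[str]:
    """Right truncation: documents containing any word that starts with prefix (e.g. pollut*)."""
    return {doc_id for word, postings in index.items() if word.startswith(prefix) for doc_id in postings}


docs = {
    "a": "marine pollution of the sea",
    "b": "pollution of marine areas",
    "c": "marine biology and water pollution",
}
idx = positional_index(docs)
print(near(idx, "marine", "pollution", 1, ordered=True))  # {'a'}
```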

282 282 Internet indexes: indexing + searching evaluation criteria - desiderata (6) »spelling check of search terms in the query, and suggesting spelling variations? »automatic expansion of the search terms in the initial user’s query, to achieve a higher recall, for instance by —automatic stemming of words in a query —including synonyms —including narrower terms —including translations into several other languages ***-
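Automatic stemming reduces different forms of a word to a common stem, so that a query for one form also retrieves the others. The sketch below uses a deliberately crude suffix list chosen for this example; it is not the Porter stemmer or any algorithm used by a particular search engine, and it will make mistakes on many English words.

```python
# Deliberately crude suffix list for this example only (checked in the order listed).
SUFFIXES = ["ations", "ation", "ings", "ing", "ions", "ion", "ed", "es", "s"]


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["pollution", "pollutions", "polluting", "polluted", "pollutes"]
print({crude_stem(w) for w in forms})  # {'pollut'}
```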

283 283 Internet indexes: indexing + searching evaluation criteria - desiderata (7) Can the results be limited to a certain time period? For instance based on the date »of the file as noted by the server computer, or »of the most recent indexing of the file Is the user interface easy to understand and efficient to use? Is a user interface offered in your own language? ***-

284 284 Internet indexes: indexing + searching evaluation criteria - desiderata (8) Is the search/query also submitted to another database to obtain more results? for instance: to a book database to obtain book descriptions besides WWW documents ***-

285 285 Internet indexes: indexing + searching evaluation criteria - desiderata (9) Is spamming filtered out, to give other pages a better chance of turning up in the result set? Can the system cluster presumed duplicate documents in the results? Or does the system simply eliminate presumed duplicate documents from its database? ***-
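One published technique for detecting presumed duplicates is “shingling”: compare the sets of short word sequences that two documents contain. The sketch below assumes sequences of 3 words and a similarity threshold of 0.8, both arbitrary choices, and hypothetical example pages; we do not know which method, if any, each search engine actually applies.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All sequences of k consecutive words in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(text1: str, text2: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (1.0 = same word sequence)."""
    s1, s2 = shingles(text1, k), shingles(text2, k)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def group_presumed_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Put each document in the first group whose first member it resembles enough."""
    groups: list[list[str]] = []
    for url, text in docs.items():
        for group in groups:
            if resemblance(text, docs[group[0]]) >= threshold:
                group.append(url)
                break
        else:
            groups.append([url])
    return groups


pages = {
    "http://example.org/report": "Marine pollution in the North Sea is monitored every year by several institutes",
    "http://mirror.example.net/report": "Marine pollution in the North Sea is monitored every year by several institutes.",
    "http://example.org/currents": "Ocean currents transport heat from the equator towards the poles",
}
print(group_presumed_duplicates(pages))
```

The mirrored copy, which differs only in punctuation, lands in the same group as the original; a results page could then show one member per group.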

286 286 Internet indexes: output evaluation criteria - desiderata (1) Short response times? Are mirror sites available closer to you for faster response? Does the system rank the items in the result set according to their presumed relevance? Possibility to combine Boolean retrieval with relevance ranking of results? ***-
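Boolean retrieval and relevance ranking can be combined by first selecting the documents that satisfy a Boolean condition and then ordering only those documents by a relevance score. A minimal sketch, assuming a plain AND condition and a simple smoothed tf-idf score (term frequency weighted by how rare the term is in the collection); the example documents are hypothetical, and actual search systems use their own, mostly undisclosed, scoring.

```python
import math
import re
from collections import Counter


def boolean_then_rank(docs: dict[str, str], must_contain: list[str], rank_terms: list[str]) -> list[str]:
    """Step 1 (Boolean): keep only documents that contain ALL words in must_contain.
    Step 2 (ranking): order those documents by a simple tf-idf score for rank_terms.
    All query words are expected in lower case."""
    counts = {doc_id: Counter(re.findall(r"[a-z0-9]+", text.lower())) for doc_id, text in docs.items()}
    n_docs = len(docs)

    def idf(term: str) -> float:
        # Rare terms weigh more; the +1 terms avoid division by zero and negative weights.
        df = sum(1 for c in counts.values() if term in c)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    def score(doc_id: str) -> float:
        c = counts[doc_id]
        length = sum(c.values()) or 1
        return sum(c[t] / length * idf(t) for t in rank_terms)

    matches = [doc_id for doc_id, c in counts.items() if all(c[w] > 0 for w in must_contain)]
    return sorted(matches, key=score, reverse=True)


docs = {
    "a": "sea pollution sea pollution report",
    "b": "sea pollution and ocean currents in the sea region today",
    "c": "ocean currents",
}
print(boolean_then_rank(docs, ["sea", "pollution"], ["pollution"]))  # ['a', 'b']
print(boolean_then_rank(docs, ["sea", "pollution"], ["ocean"]))  # ['b', 'a']
```

The Boolean step decides which documents appear at all (document c never does), while the ranking step only changes their order.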

287 287 Internet indexes: output evaluation criteria - desiderata (2) Can the results be ordered according to date »of the file as noted by the server computer, or »of the most recent indexing of the file Can the results be ordered according to size? Can the system rank the results (documents) on the basis of the number of WWW hyperlinks to that document? The system does NOT place/rank some results (documents) higher in the results list, on the basis of payments by the producer of those documents to the search system company. ***-

288 288 Internet indexes: output evaluation criteria - desiderata (3) Are advertisements / sponsored links / sponsored results clearly distinguished from normal (not sponsored) search results? Good and detailed summary of each result available? Does the system offer a good presentation format of each result (document/page/item)? For instance: are search terms indicated / highlighted in the results? ***-

289 289 Internet indexes: output evaluation criteria - desiderata (4) Is any (automatic?) evaluation of the quality of each result offered, besides ranking in an order related to the probable relevance and importance of the results? Can all the results (documents) from the same site be grouped together (clustered)? ***-

290 290 Internet indexes: output evaluation criteria - desiderata (5) Are results (retrieved documents) grouped / classified / categorized / clustered by the search system, on the basis of the subjects of the documents and are these presented as groups / clusters / classes / categories to the user of the search system, to assist the user in coping with the problems that can be caused for instance by multiple meanings of words used in a search query. ***-

291 291 Internet indexes: output evaluation criteria - desiderata (6) Is free translation offered of the search result set, that is, of the list of brief descriptions of retrieved documents? Is any fact extraction from the information sources offered, in an attempt to answer the query more directly than by only offering links to documents? ***-

292 292 Internet indexes: output evaluation criteria - desiderata (7) Term suggestion: Does the system analyse the search results of the first query, to find frequently occurring terms and to suggest these to the user as new and potentially interesting additional query terms? High stability and reliability? No large variations/fluctuations in the results from identical searches at different times. ***-
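Term suggestion can be approximated by counting in how many of the retrieved documents each word occurs, and proposing the most frequent words that are not yet in the query. A minimal sketch; the stop-word list and the example texts are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept small for the example.
STOPWORDS = {"and", "for", "from", "near", "the", "with"}


def suggest_terms(result_texts: list[str], query: str, how_many: int = 5) -> list[str]:
    """Suggest words that occur in many of the retrieved documents
    but are not yet part of the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    doc_freq: Counter = Counter()
    for text in result_texts:
        # Count each word at most once per document.
        words = set(re.findall(r"[a-z]+", text.lower()))
        doc_freq.update(w for w in words if len(w) > 2 and w not in STOPWORDS and w not in query_words)
    # Most frequent first; ties broken alphabetically so the output is stable.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:how_many]]


results = [
    "marine pollution by oil spills",
    "oil spills and marine pollution near coasts",
    "pollution from oil tankers",
]
print(suggest_terms(results, "marine pollution", 3))  # ['oil', 'spills', 'coasts']
```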

293 293 Internet indexes: output evaluation criteria - desiderata (8) Relevance feedback: Can the user indicate among the search results of a first query the “good, relevant” and the “bad, irrelevant” results, so that the system can use this information to offer better results in a second query? ***-

294 294 Internet indexes: output evaluation criteria - desiderata (9) Relevance feedback 2: even better: Can the user indicate among the search results of a first query + “good, relevant” results, - as well as the “bad, irrelevant” results, so that the system can use this information to suggest + additional, new interesting query terms that can be included in a second query, - as well as query terms that should be excluded in a second query? ***-
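The classic approach to relevance feedback is Rocchio's method, which moves the query towards the results marked relevant and away from those marked irrelevant. The sketch below is a simplified, Rocchio-style variant assumed for illustration: it only uses the resulting term weights to suggest terms to add and terms to exclude (the weight of the original query is left out, because words already in the query are not suggested anyway). The weights beta = 0.75 and gamma = 0.25 and the example results are arbitrary choices.

```python
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """How often each word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def rocchio_suggestions(query: str, relevant: list[str], irrelevant: list[str],
                        beta: float = 0.75, gamma: float = 0.25, k: int = 3):
    """Rocchio-style weights: beta * average count in the "good" results
    minus gamma * average count in the "bad" results.
    Returns (terms to add, terms to exclude), leaving out words already in the query."""
    weights: Counter = Counter()
    for text in relevant:
        for term, count in term_counts(text).items():
            weights[term] += beta * count / len(relevant)
    for text in irrelevant:
        for term, count in term_counts(text).items():
            weights[term] -= gamma * count / len(irrelevant)
    query_terms = set(term_counts(query))
    candidates = [(w, t) for t, w in weights.items() if t not in query_terms]
    to_add = [t for w, t in sorted(candidates, key=lambda x: (-x[0], x[1])) if w > 0]
    to_exclude = [t for w, t in sorted(candidates, key=lambda x: (x[0], x[1])) if w < 0]
    return to_add[:k], to_exclude[:k]


good = ["pascal programming language compiler", "pascal compiler tutorial"]
bad = ["pascal philosopher pensees", "blaise pascal philosopher"]
print(rocchio_suggestions("pascal", good, bad))
# (['compiler', 'language', 'programming'], ['philosopher', 'blaise', 'pensees'])
```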

295 295 Internet indexes: output evaluation criteria - desiderata (10) Does the system automatically and directly check the availability/reachability of the WWW pages that correspond to the hyperlinks it has retrieved for your search? The system can then discard invalid/broken links from the result set. This is useful because some links may be “dead/broken” even though they were included in the search system earlier. The Internet and the WWW are volatile media. ***-
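An automatic check of the links in a result set can be sketched with Python's standard urllib: send an HTTP HEAD request to each URL and drop the links that fail. This is a minimal sketch under assumptions (HEAD requests, a 10-second timeout, any error counted as a broken link); the check function can be replaced, for instance for testing without network access.

```python
import http.client
import urllib.request
from typing import Callable, Iterable


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request; an answer without an error status counts as reachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError; ValueError covers
        # malformed URLs; HTTPException covers garbled server responses.
        return False


def drop_broken_links(urls: Iterable[str],
                      check: Callable[[str], bool] = is_reachable) -> list[str]:
    """Keep only the result links for which the check succeeds."""
    return [url for url in urls if check(url)]
```

Some servers do not answer HEAD requests correctly, so a more careful checker might retry with an ordinary GET request before declaring a link dead.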

296 296 Internet indexes: help evaluation criteria - desiderata Is good documentation and online help available free of charge? Is a good help desk available? Does the system clearly explain which content (information) is harvested and made searchable and which is not, and how completely this content is covered? Does the system clearly explain the mechanism that is applied to rank the search results in the output? ***-

297 297 Internet indexes: current awareness evaluation criteria - desiderata Can the search system provide updated results, based on your interest profile, through electronic mail for instance, as a current awareness tool? ***-

298 298 Internet indexes: other services evaluation criteria - desiderata Other services available besides the normal WWW index: »index to news resources, that is more frequently updated?! »Internet subject directory?! »anonymous ftp file index? »gopher index? »searchable Usenet newsgroups archive? »white pages = people finder = addresses =... »WWW-based e-mail and e-mail address directories »auctions through WWW ***-

299 299 !! Task - Assignment - Exercise !! Discuss the pros and cons of Boolean searching, relevance ranking, and the combination of both. **--

300 300 ?? Question ?? What is wrong with the question: “Do you prefer Boolean searching or relevance ranking?” ***-

301 301 ?? Question ?? Why do different Internet search engines (in most cases) give different results for an identical search, even though they have access to the same (all) documents on the Internet? ***-

302 302 Internet search systems: overviews of their relations Some relations among the most important public Internet search systems can be seen on maps in colours with hyperlinks, available from http://www.bruceclay.com/searchenginerelationshipchart.htm http://www.bruceclay.com/searchenginechart.pdf http://www.search-this.com/search_engine_decoder.asp Kept up to date (at least up to 2004). **--

303 303 ?? Question ?? In spite of the high popularity and the quality of the Google Internet index search system, there are still limitations in the search features. Which limitations? ***-

304 304 Internet indexes: Google limitations (Part 1) Google does NOT offer/allow »an unlimited number of search terms in a search query »manual or automatic truncation of words in a query »manual or automatic stemming of words in a query »full Boolean search formulations (OR, AND, brackets…) like in (sea OR ocean) AND (pollution OR contamination) ***-

305 305 Internet indexes: Google limitations (Part 2) Google does NOT offer/allow »a proximity/nearby operator in the queries (such as NEAR) »full-text searching of complete text in the case of very long documents »a relevance feedback mechanism ***-

306 306 Internet indexes: Google limitations (Part 3) Google does NOT offer/allow »powerful searching to find WWW documents that link to some document in a given WWW site (WWW site citation searching), as truncation is not possible in a Google query; only searching is possible to find documents that link to a particular WWW document; in other words, the URL of the WWW document as written in the query must be perfect and cannot be truncated (AltaVista is superior in this application, because it allows truncations in the search queries) ***-

307 307 Internet indexes: Google limitations (Part 4) Google does NOT offer/allow »automatic classification/clustering/categorization of retrieved WWW pages, to cope with the problem of the natural ambiguity of meaning of the terms that were used in the search query »any evaluation of documents retrieved and offered as results ***-

308 308 Internet indexes: Google limitations (Part 5) Google does NOT offer/allow »fact extraction from the information sources, in an attempt to answer the query more directly than by offering only links to documents »a current awareness service, by email for instance (Googlealert exists however, a service independent of Google, but based on Google) ***-

309 309 Internet indexes: evolution, scalability, sustainability? Will one or several Internet indexes (search engines) be able to keep growing in order to cover a large, interesting part of the growing amount of information on the Internet and the WWW with a good retrieval system? In other words, are current systems in this area scalable and sustainable in an affordable way? The answer to this question is not straightforward. **--

310 310 Internet indexes: distribution of searching facilities? An approach that may be considered is distribution of the workload among various more or less independent players/parties that would offer only information related to a particular region or subject-domain. **--

311 311 ?? Question ?? Compare Internet directories with Internet indexes (collection of data, coverage, ease of use...) ***-

312 312 Meta-search systems: scheme 1 [Diagram: user → client computer + WWW client program → Internet / WWW → WWW server computer (running the meta-search system) → WWW server computers with Internet search systems; query in, results out.] **--

313 313 Meta-search systems: scheme 2 [Diagram: user → client computer + multi-threaded Internet search client program → Internet / WWW → WWW server computers with Internet search systems; query in, results out.] **--

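314 314 Meta-search systems: scheme 1+2 [Diagram: schemes 1 and 2 side by side: a client computer + WWW client program that uses a meta-search system on a WWW server computer, and a client computer + multi-threaded Internet search client program; both reach the WWW server computers with Internet search systems through the Internet / WWW; query in, results out.] **--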

315 315 **-- Meta-search systems: vocabulary / synonyms “multi-threaded search systems” “multiple search systems” “multi-search systems” “meta-search systems / tools” “intelligent search agents” “federated search systems” “portals” (but this word has more meanings)...

316 316 Meta-search systems: server-based: scheme **-- [Diagram: user → client computer + WWW client program → Internet / WWW → WWW server computer (running the meta-search system) → WWW server computers with Internet search systems; query in, results out.]

317 317 **-- Meta-search systems: relations [Diagram: the user queries an Internet meta-search system, which in turn queries Internet search system 1 and Internet search system 2; each search system works with its own database collected from WWW pages (collected database 1, collected database 2).]

318 318 **-- Meta-search systems: server-based or client-based Server-based: accessible online on a server in the Internet. Client-based: “meta-search software” running on the client computer.

319 319 **--Examples Meta-search systems: server-based systems http://www.all4one.com http://www.bytesearch.com http://www.cyber411.com http://www.dogpile.com = http://dogpile.com/ http://www.go2net.com = http://www.metacrawler.com http://www.kartoo.com http://www.mamma.com http://www.museseek.com http://www.profusion.com http://www.search.com http://www.vivisimo.com = http://vivisimo.com/

320 320 **--Examples Meta-search systems: server-based systems An overview of meta-search systems that are based on a server in the Internet is available via http://directory.google.com/Top/Computers/Internet/Searching/Metasearch/

321 321 **--Example Meta-search systems: server-based: example: Vivisimo

322 322 **--Example Meta-search systems: server-based: example: Vivisimo

323 323 **--Example Meta-search systems: server-based: example: Vivisimo Vivisimo adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster. Vivisimo can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.

324 324 **--Example Meta-search systems: server-based: example: Vivisimo In the test search for a family name, Vivisimo succeeded in clustering documents related to different persons with the same family name. For comparison: the clustering search engine Teoma did not accomplish this.

325 325 !! Task - Assignment - Exercise !! Have a look at Vivisimo, a meta-search engine through the WWW, that offers automatic clustering=classification=categorisation=grouping of search results. **--

326 326 **--Example Meta-search systems: server-based: example: Dogpile The clustering software of Vivisimo is also used on other systems. Example: http://dogpile.com/

327 327 **--Example Meta-search systems: server-based: example: Kartoo

328 328 **--Example Meta-search systems: server-based: example: Kartoo Kartoo offers an advanced graphical user interface. Before you can exploit the system, reading the manual is recommended.

329 329 !! Task - Assignment - Exercise !! Have a look at Kartoo, a meta-search engine through the WWW that offers sophisticated information visualisation. **--

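330 330 Meta-search systems: client-based: scheme **-- [Scheme: the user works on a client computer with a multi-threaded Internet search client program, which exchanges queries and results (Out / In) through the Internet / WWW with several WWW server computers that run Internet search systems]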
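The scheme can be sketched in Python as a client that sends the same query to several search systems in parallel threads. The search systems here are hypothetical in-memory stand-ins (no real network access), and `make_engine` and `meta_search` are illustrative names, not the interface of Copernic or any other product.

```python
from concurrent.futures import ThreadPoolExecutor

def make_engine(index):
    """Hypothetical stand-in for one remote search system: index maps URL -> page text."""
    def search(query):
        return [url for url, text in index.items() if query.lower() in text.lower()]
    return search

def meta_search(query, engines, timeout=10.0):
    """Send one query to every engine in its own thread; collect results and failures."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # a slow or failing engine must not spoil the others
                errors[name] = exc
    return results, errors
```

The timeout only limits how long each answer is awaited in turn; when the pool shuts down it still waits for threads that are running slowly, so this is a sketch rather than a robust client.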

331 331 **--Examples Meta-search systems: client-based: example Example: Copernic http://www.copernic.com

332 332 **-- Meta-search systems: advantages (Part 1) + Saves time when otherwise more than one Internet-based information source would have to be used one after the other; for instance when searching for specific information that is hard to find in any single source. In other words: for the same time spent, more sources can be covered. + Only 1 user interface must be learned for many sources.

333 333 **-- Meta-search systems: advantages (Part 2) + The user interface of some meta-search systems can be adapted to the local user population, which cannot be realised with most external, Internet-based information sources. + In comparison with systems that first integrate/merge the information from several sources into one source and afterwards provide access, meta-search systems provide more up-to-date access to the information.

334 334 **-- Meta-search systems: advantages (Part 3) + Some meta-search systems provide a useful integration of the results they get from the various primary search systems, removing repeated (duplicate) results.
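One simple way to integrate ranked result lists while removing repeated results is to interleave them rank by rank and skip any URL already seen. The sketch below assumes that URLs differing only in scheme, upper/lower case of the host name, a leading `www.` or a trailing slash point to the same page (as with http://www.dogpile.com = http://dogpile.com/ in the list of examples); that heuristic is an assumption and can occasionally merge distinct pages. The function names are hypothetical.

```python
from itertools import zip_longest
from urllib.parse import urlsplit

def normalize(url):
    """Heuristic duplicate key: host without 'www.', path without trailing '/', query kept."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")

def merge_results(*ranked_lists):
    """Interleave ranked lists (all 1st hits, then all 2nd hits, ...) and drop repeats."""
    seen, merged = set(), []
    for rank_group in zip_longest(*ranked_lists):
        for url in rank_group:
            if url is None:  # this list has fewer hits than the others
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged
```

Round-robin interleaving is only one possible merging rule; it gives every primary search system an equal share of the top positions, whereas other meta-search systems may re-score the hits.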

335 335 **-- Meta-search systems: advantages (Part 4) + Some server-based and client-based meta-search systems show links among retrieved pages. + Some client-based meta-search systems allow a search query to be stored on the client computer for later, repeated use; such a system can even exclude documents that were already retrieved in an earlier search.
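A stored query that hides documents already retrieved in an earlier search can be sketched as follows. The class name `SavedQuery`, its JSON storage format and the stand-in search functions are hypothetical; any function that maps a query string to a list of URLs can play the role of the search system.

```python
import json

class SavedQuery:
    """A search query stored on the client, remembering which documents it already returned."""

    def __init__(self, query, seen=None):
        self.query = query
        self.seen = set(seen or [])

    def run(self, search):
        """Run the stored query with `search` (query -> list of URLs); return only new hits."""
        hits = dict.fromkeys(search(self.query))  # drop repeats, keep ranking order
        new = [url for url in hits if url not in self.seen]
        self.seen.update(new)
        return new

    def to_json(self):
        """Serialise for storage on the client computer."""
        return json.dumps({"query": self.query, "seen": sorted(self.seen)})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["query"], data["seen"])
```

Typical use: run the query once, save `to_json()` to a file, and on a later day reload it with `from_json()` and run it again; only documents not seen before are shown.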

336 336 **-- Meta-search systems: advantages (Part 5) + Can add value, for instance by analysing the results / hits so that they can be clustered / grouped / categorized / classified, to make further selections by the user / searcher easier and faster. Example: http://www.vivisimo.com

337 337 **-- Meta-search systems: disadvantages (Part 1) - It is not always clear through which Internet indexes the meta-search system will search. - Not all meta-search systems can search all the major primary search systems; for instance the famous Google Internet index is normally NOT included. - The systems are often slower than a direct, primary search system. - Only a limited number of the results that can be obtained from the various Internet indexes are shown.

338 338 **-- Meta-search systems: disadvantages (Part 2) - Some specific or advanced features of the individual search systems cannot be used through all the meta-search systems, such as: »Boolean searching, »proximity searching, »field searching, »categorization / clustering of search results, »...

339 339 ?? Question ?? List advantages and disadvantages of meta-search systems **--

340 340 Internet information sources: coverage of Internet directories and Internet indexes **** [Diagram comparing the coverage of a global Internet index with that of a global Internet directory]

341 341 ?? Question ?? Suppose that you have searched for documents on the WWW about a particular subject by a query with a word or a term in a popular search engine, and that you found only a few documents. How can you try to find more relevant WWW pages? **--

342 342 ?? Question ?? A WWW directory is: (select one) 1. the same as a search engine 2. different from a search engine, because it is made by real people 3. a listing of where people live ***-

343 343 **** Global Internet search tools: a comparison »Global Internet directories: only a limited selection of Internet sources; browsing information sources is easy; good for broad searches. »Global Internet indexes: about 1/3 of the Internet is covered by an index; searching requires some skills and knowledge; good for specific, narrow searches. »Multi-threaded search systems: these get information from directories and indexes; searching requires some skills and knowledge; good when a single index does not yield the information.

344 344 Internet indexes cover only a part of the Internet: introduction (1) ***- »The “visible” part of the Internet »The “hidden, invisible” part of the Internet and the WWW (not searchable using a global index like Alltheweb, AltaVista, Google...)

345 345 ?? Question ?? Which information on the Internet is not covered by many searchable Internet indexes? ***-

346 346 Internet indexes cover only a part of the Internet: introduction (2) ***- Why can Internet indexes find only a part of what is in fact available through the Internet? 1. Quantitative technical limitations: each Internet search system has indexed only a part of the static WWW pages that are available for indexing. 2. Qualitative technical limitations: besides the static WWW pages that Internet search engines try to cover, many other, quite different sources are also available through the Internet, but they are not incorporated in those search engines.

347 347 Internet indexes cover only a part of the Internet: scheme ***- [Scheme of the Internet and the WWW, with these elements: static indexable texts in the WWW (= on HTTP server computers), covered partly by Internet indexes; Word files; PDF files; databases and file archives accessible through the Internet (via telnet, ftp..., CGI, ASP,...); rapidly changing information, such as news; information accessible only when passwords are used]

348 348 ?? Question ?? Give an example of a database that is accessible through the WWW. ***-

349 349 Database accessible over the Internet: a famous example: Medline/PubMed ***- Example

350 350 Database accessible over the Internet: a famous example: Medline/PubMed Medline is a database of descriptions of articles in the area of medicine, published in more than 4000 scientific journals. This database is accessible through several different retrieval systems on the Internet and the WWW. Medline/PubMed is one of the systems that provide access to the database. Available from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi These systems are provided free of charge by the U.S. National Library of Medicine. **-- Example

351 351 Internet indexes cover only a part of the Internet: conclusion for users When you want to retrieve information about a particular subject from the Internet, use not only WWW indexes but also other sources accessible through the Internet: »databases! (book and journal bibliographies, library catalogues, archives of group messages, directories, atlases,…) »rapidly changing information, such as news »information accessible only when passwords are used »anonymous ftp file archives »e-mail based interest groups; Usenet newsgroups ***-

352 352 ?? Question ?? The popularity of the Internet is growing. People often use the Internet to retrieve information, using a search engine that is based on a big database with a database index for fast retrieval. In this way relevant information can indeed be found in most cases. However, explain all the reasons you can see why not all the information that is available through the Internet can be found directly in this way. ***-

353 353 ***- Gateways to Internet databases accessible free of charge Most Internet search engines search classical, static WWW pages and not databases accessible through the WWW. However, some systems offer a gateway to search databases on the Internet. Examples: »http://www.completeplanet.com/ »http://www.invisible-web.net/ (See also other more general directories/overviews/lists of Internet information sources.)

354 354 ***-Example Gateways to Internet databases accessible free of charge: screenshot

355 355 ?? Question ?? What does “the invisible web” mean? ***-

356 356 **-- Finding information in PDF documents on the Internet The contents of documents on the Internet in Adobe’s Portable Document Format (PDF) are not indexed and thus not retrievable by most Internet search engines. Therefore Adobe, the creator of PDF, makes available free of charge a specialised search engine to find PDF files on the Internet at http://searchpdf.adobe.com/ Some more general Internet search engines can also find PDF files. Examples: Google, Alltheweb, AltaVista…

357 357 Collections of Internet search tools: introduction Collections / overviews have been made of the existing, accessible Internet search tools. These are often presented in the form of HTML WWW pages with forms. Some offer links to other collections. **--

358 358 ?? Question ?? Do all Internet search tools support truncation in a query / search? Explain your “yes” or “no”. **--

359 359 **-- Guides to searching the Internet available through WWW Searching the Internet: recommended sites and search techniques. [online] Available from: http://www.albany.edu/library/internet/search.html The RDN virtual training suite. [online] Available from: http://www.vts.rdn.ac.uk/ It offers training for users with a specific academic or professional interest.

360 360 Internet: a hierarchy of tools to locate sources **-- »Meta-meta-information: meta-indexes = overviews = collections of Internet indexes »Meta-information: Internet indexes of Internet sources »Information and other sources: Internet sources

361 361 Internet: who owns the search tools? In 2003: The company Yahoo! owns »the most famous global Internet subject directory »3 (!) Internet full-text search engines: All the Web, AltaVista, Inktomi The company Google owns »the most famous Internet full-text search engine »one of the best Internet image search engines »a gateway to old and new Usenet news messages ****

362 362 ?? Question ?? How can you easily find new pages that become accessible on the WWW about a particular topic that is interesting for you? ***-

363 363 Current awareness services focusing on WWW pages: introduction Tracking changes in one or more public access pages on the WWW, or finding new pages, is possible in an automated way, »by using one of the available, suitable programs installed on your client workstation! example: the advanced version of Copernic, which is not available free of charge »through “alert” services based on a server on the WWW —that track updates for the user/subscriber —and send alerts by email to the user/subscriber ***-

364 364 Current awareness services focusing on WWW pages: modified versus new Several systems exist that can track changes / modifications / updates in a particular existing WWW page for you, even free of charge. Some systems can find new pages on the WWW for you. ***-
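Detecting that an existing page has been modified can be sketched by storing a fingerprint (hash) of its text and comparing it on the next visit. This Python sketch uses SHA-256 from the standard library and collapses whitespace first; how real tracking services decide what counts as a change (for example ignoring advertisements or dates) is not described here, so that normalisation is an assumption. Fetching the page is left out, and the function names are hypothetical.

```python
import hashlib
import re

def fingerprint(page_text):
    """SHA-256 of the page text with whitespace collapsed, so re-indentation is not a 'change'."""
    normalized = re.sub(r"\s+", " ", page_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def check_for_update(stored, url, page_text):
    """Compare with the fingerprint stored for `url`, update the store, report whether it changed.

    The first visit to a URL returns False: there is nothing to compare with yet.
    """
    new_fp = fingerprint(page_text)
    changed = url in stored and stored[url] != new_fp
    stored[url] = new_fp
    return changed
```

A tracking program would call `check_for_update` periodically for each page on its list and send an alert whenever it returns True.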

365 365 Current awareness services focusing on WWW pages: Google Alert Can discover relevant changed or new WWW pages for you in the future. Is based on the external Internet index Google. Works with search queries given by you that are stored on their server computer. Free of charge, at least up to 2003. http://www.googlealert.com/ ***-Example

366 366 Current awareness services focusing on WWW pages: Google Alert ***-Example

367 367 Online access information sources and services Public access book databases ****

368 368 Public access book databases: introduction Even in this age of Internet-based information sources, a lot of information is still distributed in the form of printed books. The contents of most books are (still) not available on the Internet. Most general Internet search tools do NOT allow you to find out about the existence of books that may be interesting for you. So, specific search tools to find books can be useful. ****

369 369 Public access book databases: an overview 1. (Databases by publishers.) 2. Fee-based databases by commercial providers 3. Databases by book distributors / bookshops! 4. Online public access catalogues of »local libraries, »national libraries (which normally produce and offer their national bibliography)! »big, famous libraries!! 5. (Databases of computer-based versions of books.) ****

370 370 Public access book databases: which one to use? For years, the market of bibliographic information on books was limited to the services and databases of subscription-based bibliographic providers. Nowadays, the WWW provides a key to unlock many possibilities to find bibliographic information. Which book database should be preferred for particular applications is not clear for most librarians or end-users. ****

371 371 Suitable book databases? AIM: RECOMMENDED SYSTEMS »To find book titles about a specific subject / topic: ? »To find book titles published before 1990: ? »To find a book title through a title search: ? »To find the price of a book: ? »To be informed regularly about new books: ? ***-

372 372 Public access book databases by commercial producers To find currently available books, some databases assembled by commercial producers can be interesting. Example: Global Books in Print These databases offer formal descriptions of books, prices of the books, short descriptions of the contents with subject terms… However, access to such a database is not free of charge and can be expensive (in comparison with alternatives). ****

373 373 Public access book databases provided by bookshops To find currently available books, the bibliographic databases assembled by big bookshops are interesting. Several offer a good coverage and are accessible free of charge. The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book. Some provide a current awareness service, also free of charge. ****

374 374 Book databases accessible free of charge: examples in U.S.A. Amazon.com (US): http://www.amazon.com/ (UK: http://www.amazon.co.uk/); note: amazon, NOT amazone. Subject description is poor. Allows full-text searching in the contents of a selection of recent books, free of charge. Barnes and Noble (US): http://www.bn.com/ ****Examples

375 375 Book databases accessible free of charge: examples in Europe »Blackwell’s on the Internet (international, academic books): http://www.blackwell.co.uk/ »VLB, for books in German: http://www.buchhandel.de/ »For books in French: http://www.chapitre.com »Boeknet - De Nederlandse Internet Boekhandel (Dutch): http://www.boeknet.nl/ ***-Examples

376 376 Book databases accessible free of charge: examples in Belgium Proxis (Belgium) http://www.proxis.be/ ***-Examples

377 377 Book databases accessible free of charge: for old books To find used, secondhand, rare, hard-to-find, and out-of-print books around the world: abebooks http://www.abebooks.com/ ***-Examples

378 378 Free public access bibliographic book database + price comparisons Even comparisons of the catalogues of bookshops (as well as of shops for music, movies and many other goods) are available free of charge. See for instance »http://www.bookfinder.com/ »http://www.dealtime.com/ ****

379 379 Example of an international public access dissertation database The dissertation database of UMI is available from: http://wwwlib.umi.com/dissertations/ The two most recent years are available without charge. ***-Examples

380 380 Databases of links to the full text of many books Databases (accessible free of charge) of links to the full text of many books: »http://digital.library.upenn.edu/books/ »http://wordtheque.com/ **--Examples

381 381 Collection of links to public access book databases See for instance Internet directories like Yahoo! that lead to information about books. **--Examples

382 382 !! Task - Assignment - Exercise !! Search for titles of books which are relevant for you, using an online database provided by a book publisher or bookshop. ****

383 383 Online Public Access Catalogues of libraries **** Mainly to find older books, the catalogues of libraries can be useful. Most are accessible online and free of charge.

384 384 Online Public Access Catalogues = OPACs: definition ***- Online Public Access Catalogue: a term used to describe any type of computerized library catalog offered to the public by online login

385 385 Online Public Access Catalogues of the big famous libraries For instance: Library of Congress (USA) Their coverage is good. They offer the best subject descriptions. Access is free of charge. So they form excellent sources to find books about a particular subject/topic. ***-

386 386 Online Public Access Catalogues: The British Library Accessible online via the WWW since 2000: http://blpc.bl.uk/ Access free of charge ***-Example

387 387 Online Public Access Catalogues: The British Library: screenshot ***-Example

388 388 !! Task - Assignment - Exercise !! Search for titles of books which are relevant for you, in the British Library. ***-

389 389 Online Public Access Catalogues: catalogues of national libraries National libraries are first of all an outstanding source for the local publications. The national libraries are the most reliable source for bibliographic searching and verification. ***-

390 390 Online Public Access Catalogues: union catalogues of libraries Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’. Example: Copac http://www.copac.ac.uk/ is accessible free of charge. ***-

391 391 Public access book databases: evaluation criteria - desiderata (1) Is usage free of charge? Wide coverage? Specialized coverage of books »in your preferred language? »on particular subjects / topics? »published in a specific country? »published in a particular time period? »of particular types (such as conference proceedings)? Up to date? Frequent updates? ***-

392 392 Public access book databases: evaluation criteria - desiderata (2) Does the database offer, besides the formal description of each book, also »an abstract / summary / description of the contents? »a table of contents? »the price? »information about the publisher? »titles of related books? »reviews by readers? ***-

393 393 Public access book databases: evaluation criteria - desiderata (3) Full text indexing of each item (book description) in the database, so that full text searching is possible? Field indexing, so that searching limited to the contents of a particular field is possible? for instance »the title »the date of publication »the author »the publisher »the language ***-
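The difference between searching all fields and searching one field can be sketched on a few book descriptions stored as field/value pairs. The two records and the function name `field_search` are invented for illustration.

```python
def field_search(records, term, field=None):
    """Return records whose given field (or, if field is None, any field) contains `term`."""
    term = term.lower()
    hits = []
    for rec in records:
        # Field searching: look only at one field; otherwise look at every field.
        values = [rec.get(field, "")] if field else rec.values()
        if any(term in str(value).lower() for value in values):
            hits.append(rec)
    return hits

# Invented book descriptions, for illustration only.
books = [
    {"title": "Searching the Internet", "author": "Brown", "year": 2001,
     "publisher": "Library Press", "language": "English"},
    {"title": "Library catalogues today", "author": "Smith", "year": 1995,
     "publisher": "Academic Books", "language": "English"},
]
```

A search for "library" in all fields matches both records, because one publisher name contains the word; limited to the title field, it matches only the book whose title contains it.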

394 394 Public access book databases: evaluation criteria - desiderata (4) Does the database producer improve retrieval by »adding subject terms, or »classifying the books in categories? ***-

395 395 Public access book databases: evaluation criteria - desiderata (5) Powerful search options: »truncation of words in a query? »stemming of words in a query? »Boolean search combinations? combined field searching? »proximity searching? »spelling check of your search terms? »suggestions by the system of spelling variations of the words in the query? »translation of your search terms into several other languages? ***-
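Two of these options, right truncation and proximity searching, can be sketched in Python as follows. The `*` truncation symbol, the word-distance rule and the function names are assumptions chosen for the example; each search system defines its own syntax (its own truncation symbol and proximity operators).

```python
import re

def words(text):
    return re.findall(r"\w+", text.lower())

def term_matches(term, word):
    """A trailing '*' means right truncation: 'librar*' matches library, libraries, librarian..."""
    term = term.lower()
    return word.startswith(term[:-1]) if term.endswith("*") else word == term

def contains(text, term):
    return any(term_matches(term, w) for w in words(text))

def near(text, term1, term2, max_distance=3):
    """Proximity search: both terms occur at word positions at most max_distance apart."""
    ws = words(text)
    pos1 = [i for i, w in enumerate(ws) if term_matches(term1, w)]
    pos2 = [i for i, w in enumerate(ws) if term_matches(term2, w)]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)
```

In "catalogue of the national library" the two truncated terms are 4 positions apart, so they are "near" with a distance limit of 4 but not with a limit of 3.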

396 396 Public access book databases: evaluation criteria - desiderata (6) Can the user browse through subject categories that are used in the book database? Is a user interface offered in your own language? Easy user interface? Relevance ranking of results? Possibility to combine Boolean retrieval with relevance ranking of results? Can results be limited to a certain time period? Short response times? ***-
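Combining Boolean retrieval with relevance ranking can be sketched in two steps: a Boolean AND first decides which documents qualify, then a simple score (here, the total number of occurrences of the query terms) orders them. Term-frequency scoring is only an assumed, very simple ranking rule, and the function name is hypothetical; real systems use more elaborate weighting.

```python
import re
from collections import Counter

def boolean_and_ranked(docs, terms):
    """Keep only documents containing ALL terms (Boolean AND), then rank by total term frequency."""
    terms = [t.lower() for t in terms]
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        if all(counts[t] > 0 for t in terms):  # Boolean step
            scored.append((sum(counts[t] for t in terms), doc_id))  # ranking step
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

A document that mentions only one of the terms is excluded, however often it mentions it; among the remaining documents, those that use the terms more often come first.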

397 397 Public access book databases: evaluation criteria - desiderata (7) Can the results be ordered according to date, size, origin...? Good presentation of each result? For instance: Are search terms highlighted? Can search results be downloaded, well structured with field tags? (For instance to allow incorporation of the data in another database.) ***-

398 398 Public access book databases: evaluation criteria - desiderata (8) Does the system offer a current awareness service, sending information on new titles that may be of interest to you? ***-

399 399 Public access book databases: evaluation criteria - desiderata (9) Are other services offered from the same site or with the same interface? Is the system integrated with other services? Additional services can be »searchable databases of videos, music CDs, CD-ROMs and DVDs, all also for sale »WWW-based e-mail and e-mail address directories »auctions through WWW **--

400 400 Public access book databases: evaluation criteria - desiderata (10) Is the database system accessible through the Z39.50 Internet database search and retrieve protocol? In other words, is the database Z39.50 compliant? This would offer the following advantages: »The system can then be searched using one of the available Z39.50 client software packages. »The database can then be searched simultaneously with other Z39.50 compliant databases, and the results from the various databases can be merged. This is useful for rare, uncommon, special items that are difficult to find. **--
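The Z39.50 protocol itself is handled by the client software packages and is not shown here. The Python sketch below only illustrates the merging step, assuming that each catalogue has already returned records with an ISBN and a title: records are grouped by normalised ISBN, and the list of catalogues holding each item is kept. The catalogue names, ISBNs and function names are invented (check digits are not validated), and conversion between ISBN-10 and ISBN-13 is not handled.

```python
def normalize_isbn(isbn):
    """Remove hyphens and spaces; upper-case a final 'x' check character."""
    return "".join(ch for ch in isbn if ch.isalnum()).upper()

def merge_catalogues(results_by_catalogue):
    """Merge records from several catalogues by ISBN, noting which catalogues hold each item."""
    merged = {}
    for catalogue, records in results_by_catalogue.items():
        for rec in records:
            key = normalize_isbn(rec["isbn"])
            # The first catalogue that reports an ISBN supplies the title.
            entry = merged.setdefault(key, {"title": rec["title"], "held_by": []})
            entry["held_by"].append(catalogue)
    return merged
```

For a rare item, the merged list immediately shows every catalogue (and thus every library) that holds it.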

401 401 Recommended book databases AIM: RECOMMENDED SYSTEMS »To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon) »To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks »Book title search in general: Library of Congress, British Library, Infoball »To find the price of a book: Global Books in Print, Infoball, online bookshops »To be informed regularly about new books: Amazon, Alapage, Bol ***-

402 402 General conclusion concerning book databases The one and only, international, complete, ideal bibliographic database does NOT exist, but the combined use of the different available book databases should be satisfactory. ***-

403 403 ?? Question ?? Why are book databases useful? In other words, why can we NOT simply use one of the general, all-purpose WWW search engines to find relevant books? **--

404 404 Online access information sources and services Fee-based online public access information services ****

405 405 Types of online access information systems: “free” versus “fee” A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor. The first commercial computer systems that made information available online appeared around 1975. Most of them are now also available through the Internet. Some organisations pay these fees for some sources and then organise access, so that the members of the organisation can retrieve and exploit the information as if it were free of charge. ****

406 406 Types of online access information systems: “free” versus “fee” **** »Public access information sources free of charge »Fee-based online information services (NOT free of charge)

407 407 Types of online access information systems: “free” for members only **** »Public access information sources free of charge »Fee-based online information services (NOT free of charge) »Fee-based online information services, made accessible “free of charge” by an institute to its members

408 408 Fee-based online access services: examples (Part 1) Location of the computer(s): U.S.A. U.S.A. U.S.A. U.S.A., Taiwan, UK Switzerland U.S.A. Name: America On Line OCLC Ovid Technologies CompuServe Cambridge Data-Star Dialog EBSCO ***-Examples

409 409 Fee-based online access services: examples (Part 2) Location of the computer(s): U.S.A. U.S.A. U.S.A., The Netherlands,... Germany - U.S.A. - Japan The Netherlands... Name: Elsevier ScienceDirect Factiva ISI (Web of Knowledge, JCR,…) LexisNexis MSN (Microsoft) Prodigy Silver Platter STN Swets-Blackwell (e-journals)... ***-Examples

410 410 Online information services: various names for similar systems »(fee-based) online (access) information service »(fee-based) online (access) computer service »databank »database vendor »host computer »aggregator »... ***-

411 411 Online information services: access methods ***- Using generic, common communications software »through the telephone network (telephone + modem) »through X-25 data communication networks »through Internet, using client-server systems: —telnet —WAIS or Z39.50 —http (WWW)! (Examples: http://www.dialogweb.com; http://www.datastarweb.com) (Using client software dedicated to the particular service)

412 412 Online information services: total size of their databases In 1999: The big host systems and the public access WWW pages offer a comparable quantity of information: WWW offered about 8 terabytes (= 8 000 gigabytes) of text data (according to Lawrence and Lee Giles, Nature, 1999, Vol. 400, pp. 107-109.) Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998) »6 billion pages of text »3 million images ****
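A rough worked estimate of what the Dialog figures mean per page, assuming decimal units (1 terabyte = 1 000 gigabytes = 10^12 bytes, as the slide's own conversion suggests) and attributing all 9 terabytes to the 6 billion text pages (ignoring the space taken by the 3 million images, so the result is an upper bound):

```latex
\frac{9 \times 10^{12}\ \text{bytes}}{6 \times 10^{9}\ \text{pages}}
  = 1.5 \times 10^{3}\ \text{bytes per page} \approx 1.5\ \text{kB per page},
\qquad
\frac{9\ \text{TB (Dialog, 1998)}}{8\ \text{TB (WWW text, 1999)}} \approx 1.1
```

The second ratio shows why the two quantities can be called comparable: Dialog's total was only about 10 % larger than the estimated amount of text on the WWW.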

413 413 Database hosts / distributors: evaluation criteria - desiderata (1) Contract not required? A priori payment not required? Satisfactory stability / history / evolution / future of host? Low costs of data communication? Many databases available? Whole records available (or only parts)? Frequent updates? Whole database available? As one file or fragmented? ***-

414 414 Database hosts / distributors: evaluation criteria - desiderata (2) Low price of access? Low price of information? Good searching facilities? (cf. desiderata for Internet indexes) Can the indexes of more than one database be searched simultaneously? ***-

415 415 Database hosts / distributors: evaluation criteria - desiderata (3) Online indication of costs? Practice free of charge? Good manuals, documentation and online help? Training courses available? Quality? Good help desk available? Gateway service offered?... ***-

416 416 Databases of online public access databases Example »Gale directory of databases ! Their coverage: »online access databases »(databases accessible on CD-ROM) »... ***-

417 417 Databases of databases: Gale Produced in U.S.A. Not free of charge Available in various formats: »printed »on CD-ROM »online via the host systems Data-Star, Dialog, with a payment required for each use »online via the Internet through various hosts, for a fixed price per year to be paid in advance ***-

418 418 !! Task - Assignment - Exercise !! Identify databases which may be relevant for you, using a directory of online databases. ***-

419 419 Online access information sources and services Online access databases about journal articles ****

420 420 Online access databases about journal articles: overview Thousands of fee-based online access databases offer bibliographies or full texts of journal articles in particular subject domains and published by many publishers. Many publishers offer searchable bibliographies, but only of their own publications (for instance Emerald, Elsevier). Only a few large databases offer free-of-charge access to bibliographies of articles published in journals from many publishers. ****

421 421 Online access databases about journal articles: Ingenta (1) Ingenta Journals allows you to search a bibliographic database of millions of journal articles, including titles, authors and, in many cases, abstracts. Searching is free of charge. ****Example

422 422 Online access databases about journal articles: Ingenta (2) Payment is required to receive the full text of an article. Available from »http://www.ingenta.co.uk/ »http://www.ingenta.com/ Ingenta acquired Uncover in 2000. ****Example

423 423 Online access databases about journal articles: Article@INIST Article@INIST allows you to search a bibliographic (NOT full-text) database of journal articles, journal issues, books, reports, conferences and doctoral dissertations at the Institut de l'Information Scientifique et Technique, France. It does not offer the use of a classification or thesaurus. Searching is free of charge. Available from http://form.inist.fr/public/eng/conslt.htm Payment is required to receive the full text of an article. ****Example

424 424 Online access databases about journal articles: Infotrieve Infotrieve allows you to search free of charge in a bibliographic (NOT full-text) database of the articles of more than 20 000 journal titles and conference proceedings. Available from http://www3.infotrieve.com/ Payment is required to receive the full text of a document. Current awareness services are also offered free of charge: the tables of contents of new issues of the journals that you have selected are sent to you by email. ****Example

425 425 Online access databases about journal articles: Scirus This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier. An article can be downloaded in full-text format only when a fee has been paid to the publisher. The search interface: http://www.scirus.com ****Example

426 426 Online access databases about journal articles: Scirus features Offered free of charge by Elsevier. Is partly based on the Fast WWW search system that is also used by Alltheweb. Offers access to information ordered according to some classification system / taxonomy. ****Example

427 427 Online access databases about journal articles: Medline Medline, produced by the National Library of Medicine (USA), allows searching a bibliographic database of articles in the field of medicine. It is free of charge and available from many sites, including »PubMed of the National Library of Medicine (USA) and »Ingenta **--Example

428 428 Online access databases about journal articles: Medline through PubMed **--Example

429 429 !! Task - Assignment - Exercise !! Search for titles of journal articles that are relevant for you, in a database provided free of charge. ***-

430 430 Online access databases: Web of Knowledge The Web of Science, or more recently the Web of Knowledge, offers access through the WWW to a database of bibliographic descriptions of scientific journal articles in all subject domains. This database is (only) available to members of organisations / institutes / companies / consortia that pay a high yearly fee to the producer/publisher of the database. This database is suitable not only for subject searching but also for citation searching. ***-

431 431 !! Task - Assignment - Exercise !! If the Web of Science database is available to you, then use it for subject searching. ***-

432 432 ?? Question ?? Bibliographical descriptions or other records/items/texts in a database have in most cases a field structure. Compare this with static web pages that do not form part of a database. What are the consequences? **--

433 433 ?? Question ?? Which differences do you see between most of the commercial search systems for searching commercial databases, and the freely accessible search systems to search for information that is available free of charge through the Internet? **--

434 Online access information sources and services: Online information sources about journal titles ***-

435 ***- Online information sources about journal titles: introduction Besides directories / catalogues / overviews / databases / lists of electronic, computer-based, online accessible newsletters, newspapers and journals, and besides databases of articles published in journals (bibliographic databases), the WWW also offers information about journal titles in general: their exact names, name changes, editors, prices, formats (printed or electronic online), online full-text availability, …

436 ***-Example Online information sources about journal titles: example Available free of charge: http://www.publist.com/ about classical journals

437 Online access information sources and services: Electronic newsletters and journals ***-

438 Electronic newsletters and journals: introduction ***- Since the end of the 1990s, electronic journals have become a new communication medium that cannot be neglected. (Diagram: Author / Sender, Editor, Reader / Receiver)

439 **-- Electronic newsletters and journals: variations on a theme We can distinguish several methods »of distribution and access »of formatting the information (PDF, HTML, …) »of pricing and licensing »of restricting access (authentication and authorization of legitimate users) »of integrating access to e-journals with access to other information sources

440 Electronic newsletters and journals: various types and the price of access ***- We can distinguish various types: »equivalents of a version printed on paper, either published almost simultaneously with the print version, or published a long time after the print version (a deliberate long delay for the electronic version) »purely electronic publications Price of access: from free of charge to very expensive

441 Electronic newsletters and journals: access and distribution methods ***- Many different methods are used: »anonymous FTP »Gopher »WAIS / Z39.50 »electronic mail, listserv, ... »Usenet News »loaded on local systems in universities or institutes »http, WWW ! »Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) + http, WWW
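
The harvesting approach in the last item can be illustrated with a small Python sketch. An OAI-PMH request is an ordinary HTTP GET request whose parameters name the protocol verb (here ListRecords) and the metadata format (oai_dc, unqualified Dublin Core, which every OAI-PMH repository must support). The repository address below is hypothetical, and the sketch only builds the request URL; it neither sends the request nor parses the XML response.

```python
from urllib.parse import urlencode

def oai_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records added or changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, used for illustration only.
url = oai_list_records_url("http://repository.example.org/oai", from_date="2004-01-01")
print(url)
```

A harvester would repeat such requests, following the resumptionToken returned by the repository, until all records have been collected.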

442 Electronic newsletters and journals through the WWW ***- The WWW has become the most important platform for access to electronic newsletters and journals.

443 Electronic newsletters and journals: example ***-Example

444 ***- Electronic newsletters and journals: problems and challenges There is no central database with all article titles, summaries and full contents. There is not even a central, complete and up-to-date directory of journal titles. There is no standard licensing/pricing method. Not all electronic journals are accessible through one user interface. Many passwords must be used. Archiving: by whom? Forever?

445 ***- Electronic newsletters and journals: integration with other sources It is not (yet) clear and straightforward how electronic journals should be integrated »in a library collection »in a library web site »in the catalogue database »in interlibrary lending (this depends on the licensing agreement for each individual journal)

446 ***- Electronic newsletters and journals: integration and access methods Access is possible through »A gateway offered by a subscription agent or the publisher »A commercial bibliographic database »A web-based static listing of journal titles »A web-based OPAC (for instance in the MARC 856 field) »A local searchable database for e-journals »Special linking mechanisms, based on OpenURL (for instance SFX commercialised by Ex Libris or VLINK commercialised by GEAC) COMPLEXITY

447 ***- Electronic newsletters and journals: more than one access method How should libraries and readers/users cope with the fact that many e-journals can be accessed in more than one way, that is, by hyperlinks starting from various information systems or services, while authentication and authorization are NOT fully automated for all those systems once a licensing agreement has been established? What mechanisms can support this situation? This is called the “multiple copy problem” or the “appropriate copy problem”.

448 Link resolver to guide users to the appropriate e-document: introduction Link resolver = appropriate hyperlink generator: it guides users to the electronic sources that are most suitable for the specific library or the specific user, for instance to cope with the multiple copy / appropriate copy problem (such as SFX software from Ex Libris or V-link software from VUBIS-GEAC) ***-
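
As an illustration, the Python sketch below shows how an information source might encode a reference as an OpenURL, using OpenURL 0.1 keys such as genre, issn, volume, spage and atitle, and point it at the library's link resolver. The resolver address, the source identifier (sid) and the citation are hypothetical; real sources and resolvers may use other keys or the later OpenURL 1.0 (Z39.88) format.

```python
from urllib.parse import urlencode

# Hypothetical address of an institution's link resolver.
RESOLVER_BASE = "http://resolver.example.edu/openurl"

# The OpenURL 0.1 keys used in this sketch (the specification defines more).
OPENURL_KEYS = ("genre", "issn", "date", "volume", "issue", "spage", "atitle")

def make_openurl(citation, source_id="example:db"):
    """Encode citation metadata as an OpenURL 0.1 link to the resolver."""
    params = {"sid": source_id}  # identifies the referring information source
    for key in OPENURL_KEYS:
        value = citation.get(key)
        if value:  # skip missing or empty metadata
            params[key] = value
    return RESOLVER_BASE + "?" + urlencode(params)

citation = {  # hypothetical reference found in a bibliographic database
    "genre": "article", "issn": "1234-5678", "date": "2003",
    "volume": "12", "issue": "3", "spage": "45",
    "atitle": "An example article",
}
url = make_openurl(citation)
print(url)
```

Because the link carries metadata rather than a fixed address, the same reference can lead users of different libraries to different, appropriate copies.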

449 Link resolver to guide users to the appropriate e-document: scheme ***- (Diagram: an incoming reference from an information source goes to the appropriate hyperlink generator, which consults a database about the local situation, the “Knowledgebase”, and then leads the user to the target.)
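
The role of the “Knowledgebase” in this scheme can be sketched in a few lines of Python: for an incoming reference, the resolver looks up which full-text targets the library has licensed for that journal and year. The knowledge base contents, the ISSN and the target names below are hypothetical, and a real link resolver handles many more cases (for instance embargo periods, issue-level coverage and print holdings).

```python
# Hypothetical knowledge base: for each ISSN, the full-text targets licensed
# by this library and the publication years each licence covers.
KNOWLEDGE_BASE = {
    "1234-5678": [
        {"target": "publisher web site", "years": range(1998, 2005)},   # 1998-2004
        {"target": "aggregator database", "years": range(1995, 2003)},  # 1995-2002
    ],
}

def appropriate_targets(issn, year, kb=KNOWLEDGE_BASE):
    """Return the licensed targets that cover the given journal and year."""
    return [entry["target"] for entry in kb.get(issn, [])
            if year in entry["years"]]

print(appropriate_targets("1234-5678", 2000))  # ['publisher web site', 'aggregator database']
print(appropriate_targets("1234-5678", 2004))  # ['publisher web site']
print(appropriate_targets("9999-9999", 2000))  # []
```

An empty list means that no licensed electronic copy is known; a resolver could then offer other services, such as a catalogue search or an interlibrary loan request.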

450 !! Task - Assignment - Problem !! Find out how you can efficiently access electronic journals from your institute. ***-

451 **-- Electronic newsletters and journals: cross-linking by CrossRef CrossRef is a non-profit membership organisation. It provides a cross-publisher citation linking network that contributes to the virtual integration of distributed Internet-based information resources, in order to improve access to published scholarship. The cross-linking is based on digital object identifiers (DOIs).
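
A DOI consists of a prefix, which starts with "10.", and a suffix, separated by the first slash. Citation links based on DOIs work because a DOI can be turned into a URL at the DOI proxy server (http://dx.doi.org/ at the time of these slides, https://doi.org/ today), which redirects to the current location of the article. The Python sketch below shows this under simplifying assumptions: it accepts an optional "doi:" label, performs only a minimal syntax check, and does not percent-encode the special characters that some DOIs contain. The DOI 10.1000/182 is used only as a sample string.

```python
def doi_to_url(doi):
    """Split a DOI into prefix and suffix and build a resolver URL (simplified)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop an optional "doi:" label
    prefix, sep, suffix = doi.partition("/")  # split at the first slash
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    # No percent-encoding here: DOIs containing characters such as '#' or '?' would need it.
    return prefix, suffix, "https://doi.org/" + doi

print(doi_to_url("doi:10.1000/182"))  # ('10.1000', '182', 'https://doi.org/10.1000/182')
```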

452 Directory of Open Access Journals The Directory of Open Access Journals is a directory of electronic journals that can be accessed free of charge. It has been available since May 2003. http://www.doaj.org/ ***-

453 ***- Directory of Open Access Journals: screenshot

454 Online access information sources and services: Finding multimedia files on the Internet ****

455 **** Finding multimedia files on the Internet: introduction Several public access search systems are available free of charge to search the Internet for multimedia files: »images / pictures (artwork, photos, or both) »sound / audio files (music, speeches, ...) »video

456 **** Finding images on the Internet: introduction Several public access search systems are available free of charge to search the Internet for images / pictures (artwork, photos, or both). When you search for images, the results from such a system offer not only links to the image files on the Internet, but also small versions of the images themselves (so-called “thumbnails”).

457 **** Examples Finding images on the Internet: screenshot of a Google image search

458 ****Examples Finding images on the Internet: examples of search engines (1) http://alltheweb.com/ !! http://gallery.yahoo.com/ ! http://images.google.com/ !!! or via http://www.google.com/ The largest database in this category (at least in 2002, 2003, 2004). For each result, it offers not only a thumbnail but also the origin, with a readable URL; this makes it easier to judge the relevance of the document.

459 ****Examples Finding images on the Internet: examples of search engines (2) http://multimedia.lycos.com/ http://www.altavista.com/ !! (also audio and video; in the user interface, choose IMAGES rather than the normal text search.)

460 **--Examples Finding images on the Internet: examples of search engines (3) http://www.ask.com/ or http://www.aj.com/ or http://aj.com/ Ask Jeeves. It offers no indication of the number of images retrieved, which is a disadvantage when many pictures are found but only a few can be seen at a time.

461 **--Examples Finding images on the Internet: examples of search engines (4) http://www.ditto.com/ ! http://www.picsearch.com/ Does NOT show the origin of each picture as a readable URL together with each thumbnail.

462 **-- Examples Finding images on the Internet: directories of search engines A collection of links to suitable Internet search engines: http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

463 !! Task - Assignment - Exercise !! Use a specialised search engine to find images about a particular subject on the Internet. ****

464 ?? Question ?? Why can we say that most of the well-known, popular, specialised systems for searching images (rather than texts) on the WWW are in fact text searching systems? ***-

465 **--Example Finding audio on the Internet: example of a search engine http://www.findsounds.com Allows you to find sound files in the formats AIFF, AU and WAV.

466 **--Example Finding audio and video on the Internet: example of a search engine http://www.altavista.com/ (use the special multimedia finder)

467 **--Examples Finding audio and video on the Internet: directories of search engines A collection of links to suitable Internet search engines: http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Multimedia/

468 Online access information sources and services: Evolution and future trends ****

469 Online access information: evolution and future trends An increasing amount of information is becoming available online. A growing share of this online information is available free of charge. The quality and ease of use of the software, on the server as well as on the client, are improving. A consequence: an increasing number of end-users search for information online. ****

470 Online access information: easier and more complicated?! At the same time, information retrieval is becoming both easier and more complicated. This paradox may seem strange and contradictory, but it is reality. ****

471 Online access information: easier information retrieval systems Individual information retrieval systems are becoming easier to use: »they react faster; »they can provide access to more data/information in one action; »their user interfaces are simple, yet their more sophisticated, intelligent retrieval algorithms deliver satisfactory results in most simple cases. ****

472 Online access information: more complicated information market The whole information landscape consists of more and more decentralised information sources, each with its own user interface that must be mastered. Making the right choice among the sources is not becoming easier; it is perhaps even becoming more complicated every day. ****

473 Online access information: more complicated information market Furthermore, for many sources the accessibility / availability, the user interface and the interlinking depend on the organisation in which the searcher is active. ****

474 Online access information: conclusion In the case of simple information needs, the WWW and its search tools can work like “magic”. However, in the case of more complicated information needs, there is still no “magic button” that brings you immediately to all the required information. ****

475 Evaluations in information retrieval ****

476 Evaluations in information retrieval: summary The following slides give an overview of approaches applied to assess the quality of »information retrieval systems, and more concretely of search systems »the set of records obtained after performing a query in an information retrieval system Note: this should not be confused with assessing the quality and value of the content of an information source. ****

477 Evaluations in information retrieval: introduction The quality of the results of any search in any retrieval system depends on many components / factors. These components can be evaluated and modified more or less independently to increase the quality of the results. ****

478 Evaluations in information retrieval: important factors **** (Diagram: two factors determine the Result of a search: »the information retrieval system (= contents + system) »the user of the retrieval system and the search strategy applied to the system)

479 Evaluations in information retrieval: why? (Part 1) To study the differences in outcome/results when a component of a retrieval system is changed, such as »the user interface »the retrieval algorithm »the addition to the database of uncontrolled, natural-language keywords versus keywords selected from a more rigid, controlled vocabulary ****

480 Evaluations in information retrieval: why? (Part 2) To study the differences in outcome/results when a search strategy is changed. To study the differences in outcome/results when searches are performed by different groups of users, such as »children versus adults »inexperienced users versus experienced, professional information intermediaries ****

481 Evaluations in information retrieval: the simple Boolean model Boolean model: $\#\ \text{items in database} = \#\ \text{items selected} + \#\ \text{items not selected}$ and $\#\ \text{items selected} = \#\ \text{relevant items selected} + \#\ \text{irrelevant items selected}$. Binary values: Relevant / Yes / 1 / In versus Irrelevant / No / 0 / Out. ****

482 Relevant items in a database: scheme **** (Diagram: the database is divided into Relevant items! (in most cases the small subset) and Irrelevant / NOT relevant items (in most cases the large subset); this division depends on the aims and is independent of the search strategy.)

483 Selecting relevant items by searching a database: scheme **** (Diagram: the division into relevant and irrelevant items depends on the aims and is independent of the search strategy; the division into selected and not selected items depends on the aims and on the search strategy. Together they give four subsets: Selected and relevant! / Selected but not relevant / Not selected but relevant / Not selected and not relevant.)
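
This scheme and the Boolean model described above can be expressed with Python sets. The database, the relevance judgements and the query result below are hypothetical toy data; the point of the sketch is that the four subsets follow from two independent divisions of the same database.

```python
def four_subsets(database, relevant, selected):
    """Divide a database into the four subsets of the scheme."""
    database, relevant, selected = set(database), set(relevant), set(selected)
    return {
        "selected and relevant": selected & relevant,
        "selected but not relevant": selected - relevant,
        "not selected but relevant": relevant - selected,
        "not selected and not relevant": database - selected - relevant,
    }

database = range(1, 11)   # hypothetical items numbered 1 to 10
relevant = {1, 2, 3, 4}   # hypothetical relevance judgements (depend on the aims)
selected = {3, 4, 5}      # hypothetical result of one query (depends on the strategy)

subsets = four_subsets(database, relevant, selected)
for name, items in subsets.items():
    print(f"{name}: {sorted(items)}")

# The counts of the Boolean model add up.
not_selected = set(database) - selected
assert len(set(database)) == len(selected) + len(not_selected)
assert len(selected) == (len(subsets["selected and relevant"])
                         + len(subsets["selected but not relevant"]))
```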

484 Recall: definition and meaning **** Definition: $\text{Recall} = \frac{\#\ \text{selected relevant items}}{\text{total}\ \#\ \text{relevant items in database}} \times 100\%$ Aim: high recall. Difficulty: in most practical cases, the total # of relevant items in a database cannot be measured.
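
The recall formula translates directly into a small Python function. The sets in the example are hypothetical; in practice the set of all relevant items is exactly what usually cannot be measured, so such a function is mainly useful in test settings where relevance judgements are available.

```python
def recall(selected, relevant):
    """Recall in %: share of all relevant items in the database that were selected."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    hits = len(set(selected) & relevant)
    return 100.0 * hits / len(relevant)

# Hypothetical: 4 relevant items in the database, 3 items selected, 2 of them relevant.
print(recall(selected={3, 4, 5}, relevant={1, 2, 3, 4}))  # 50.0
```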

485 Selecting relevant items: recall **** (Diagram of the four subsets: Selected and relevant! / Selected but not relevant / Not selected but relevant / Not selected and not relevant)

486 ?? Question ?? How can you change your search strategy to increase the recall? ***-

487 Precision: definition and meaning **** Definition: $\text{Precision} = \frac{\#\ \text{selected relevant items}}{\text{total}\ \#\ \text{selected items}} \times 100\%$ Aim: high precision.
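
The precision formula can be written in the same way in Python. With hypothetical sets (4 relevant items in the database, 3 items selected, 2 of which are relevant), precision is 2 / 3, or about 66.7%. Unlike recall, precision needs relevance judgements only for the selected items.

```python
def precision(selected, relevant):
    """Precision in %: share of the selected items that are relevant."""
    selected = set(selected)
    if not selected:
        raise ValueError("precision is undefined when no items are selected")
    hits = len(selected & set(relevant))
    return 100.0 * hits / len(selected)

print(round(precision(selected={3, 4, 5}, relevant={1, 2, 3, 4}), 1))  # 66.7
```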

488 Selecting relevant items: precision **** (Diagram of the four subsets: Selected and relevant! / Selected but not relevant / Not selected but relevant / Not selected and not relevant)

489 ?? Question ?? How can you change your search strategy to increase the precision? ***-

490 ?? Question ?? When you change your search strategy to increase the precision, what consequence do you expect for the recall, in most cases? ***-

491 Relation between recall and precision of searches **** (Figure: search results plotted with Recall from 0 to 100% on one axis and Precision from 0 to 100% on the other; Ideal = 100% recall together with 100% precision = impossible to reach in most systems.)

492 Recall and precision should be considered together Examples: An increase in the number of relevant items retrieved may be accompanied by an impractical decrease in precision. A search precision close to 100% may NOT be ideal, because the recall of the search may be too low; make the search / query broader to increase recall! Poor (low) precision is more noticeable than poor (low) recall. ****
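
The trade-off can be made concrete with hypothetical numbers: suppose a database contains 10 relevant items, a narrow query selects 3 items that are all relevant, and a broader query selects 40 items of which 8 are relevant. The Python sketch below computes both measures for each query.

```python
def recall_and_precision(selected, relevant):
    """Return (recall %, precision %) for one Boolean search."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    return 100.0 * hits / len(relevant), 100.0 * hits / len(selected)

relevant = set(range(1, 11))                      # hypothetical: 10 relevant items
narrow = {1, 2, 3}                                # 3 selected, all relevant
broad = set(range(1, 9)) | set(range(100, 132))   # 8 relevant + 32 irrelevant selected

for name, selected in (("narrow", narrow), ("broad", broad)):
    r, p = recall_and_precision(selected, relevant)
    print(f"{name} query: recall {r:.0f}%, precision {p:.0f}%")
# narrow query: recall 30%, precision 100%
# broad query: recall 80%, precision 20%
```

The broader query more than doubles the recall, while the precision drops from 100% to 20%; looking at only one of the two measures would give a misleading picture.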

493 Evaluation in the case of systems offering relevance ranking Many modern information retrieval systems offer output with relevance ranking. This is more complicated than simple Boolean retrieval, and the simple concepts of recall and precision cannot be applied directly. To compare retrieval systems or search strategies, we can decide to compare only a fixed number of the highest-ranked items in each output. This leads, for instance, to “first-20 precision”: the precision among the 20 highest-ranked items. ****
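
First-k precision (often written P@k) looks only at the k highest-ranked items. The Python sketch below assumes, as one common convention, that the number of relevant items among the top k is divided by k even if the output is shorter than k. The rankings and relevance judgements are hypothetical, and k = 5 is used only to keep the example short.

```python
def precision_at_k(ranked, relevant, k=20):
    """Precision (in %) among the k highest-ranked items of an output.

    Missing positions (output shorter than k) count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return 100.0 * hits / k

relevant = {"d1", "d4", "d7"}                   # hypothetical relevance judgements
ranked_a = ["d1", "d2", "d4", "d3", "d7"]       # hypothetical output of system A
ranked_b = ["d2", "d3", "d1", "d5", "d6"]       # hypothetical output of system B
print(precision_at_k(ranked_a, relevant, k=5))  # 60.0
print(precision_at_k(ranked_b, relevant, k=5))  # 20.0
```

First-20 precision then corresponds to precision_at_k(ranked, relevant, k=20). Only the top 20 items of each output need relevance judgements, which keeps such comparisons practical.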

494 Evaluating the quality of information Documentary information sources: evaluating their quality ****

495 Documentary information sources: evaluating their quality We should always be critical when using information sources, in view of »the widely varying quality of information sources, and »the costs associated with searching for, finding and using information. ****

496 Documentary information sources: evaluation criteria (1) Is the information valid, reliable, trustworthy, genuine, authentic? Is the author honest? Is the source objective rather than subjective, without cultural, political, ideological or commercial bias? Is the origin an individual, a company or an organisation? Is the publication sponsored by a company or organisation? ***-

497 Documentary information sources: evaluation criteria (2) Is the information accurate, correct? Who is the author or producer? Does the source have an author or producer with high expertise, a good reputation and good qualifications? Can the author be contacted for clarification or discussion? Was the information reviewed, edited, improved, corrected, censored, approved or verified before publication? Do experts agree on the information provided? ***-

498 Documentary information sources: evaluation criteria (3) Is the information source unique? Does it offer a great amount of primary information that is not obtainable from other sources? Is the information complete? Is the work available in its entirety? Does the source offer a wide coverage? Is the source comprehensive, substantive? Is the information current enough, up to date? Is a publication date provided? Is an expiration date provided? ***-

499 Documentary information sources: evaluation criteria (4) Does the document provide suitable references, so that you can verify statements and find older suitable information sources? Good clear format and layout of the information / User-friendly information system / Easy for users to orientate themselves within the resource and to find their way around it? Good user support / Good customer support? Is the type of distribution medium appropriate? (print, e-mail, online,...) ***-

500 Documentary information sources: evaluation criteria (5) Is the information what you want? If not, then reassess your needs and consider other types of information as well. ***-

501 Documentary information sources: evaluation criteria (6) Is the information suitable for your level of understanding of the subject? Is the document popular, suitable for the general public, for students, for professionals, for scholarly/academic use…? Does it report new, primary research (survey, experiment, observation, measurement, invention) or is it a review of sources published earlier? Does the information repeat or confirm what you already know, or is it complementary, contradictory, new? ***-

502 Evaluating the quality of information Computer-based information sources: evaluating their quality ***-

503 Computer-based information sources: evaluation criteria (Part 1) ***- Besides the more general criteria applicable to all information sources, the following criteria apply to sources based on computers and networks: Easy to navigate? »User-friendly information system? »Easy for users to orientate themselves within the resource and to find their way around it? »Is the resource organised into manageable chunks of information that can be browsed easily?

504 Computer-based information sources: evaluation criteria (Part 2) ***- »Is a contents page or index offered that describes what is contained within the site? »Are there good navigational links within the pages (e.g. 'back', 'forward', 'home')? »Are the links clearly labelled? »Is the navigation process supported by images? »Is there a single downloadable file for documents that exist as a series of separate pages? »Is there a search facility within the resource?

505 Computer-based information sources: evaluation criteria (Part 3) ***- Good user support? »Is good support offered to help users with queries and problems that arise while using the resource? »Good computer-based, contextual help, documentation, training materials or tutorials? »E-mail contact(s) and telephone number(s) available?

506 Computer-based information sources: evaluation criteria (Part 4) ***- Based on appropriate technologies? »Are technologies and standards used that enable users to access and use all aspects of the resource? »Does the resource avoid requiring proprietary software? »Does the resource avoid proprietary extensions to HTML, which some browsers cannot recognise?

507 Computer-based information sources: evaluation criteria (Part 5) ***- »Does the format allow access to the resource for all users, including, for instance, sight-impaired users and those who can navigate only with the keyboard? Information integrity / High stability of the contents / Low volatility of the contents? »Is the information content adequately maintained?

508 Computer-based information sources: evaluation criteria (Part 6) ***- System integrity? »Site integrity relates to the stability of the site over time. This usually relates to the work of the site manager or web master. »Realise that individual sites can be moved or withdrawn at any time by those responsible for publishing information on the Internet, and that addresses, file structures, formats and interfaces can be altered without warning. »Is the site current and up to date?

509 Computer-based information sources: evaluation criteria (Part 7) ***- »Is the site proven to be or expected to be durable in nature? »Is the site adequately administered and maintained?

510 These slides will be available through the WWW from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ References to publications about this subject and more slides are available through the WWW from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/ (note: BIBLIO and not biblio)

511 Questions? Suggestions? Topics for further discussion?
