Universal Search Techniques
Precision: getting results that are relevant, “on topic.”
Recall: getting all of the relevant results, i.e. not being too narrow in search scope
Subject vs. Keyword
Strategies:
– Boolean
– Nesting
– Phrase Searching
– Truncation
– Proximity
– Field Searching

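Precision and recall can also be read as simple ratios. The sketch below is one common way to compute them for a single search; the function name and the record IDs are hypothetical and only illustrate the two definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of record IDs the search returned
    relevant:  set of record IDs that are actually on topic
    """
    hits = retrieved & relevant  # relevant records the search found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records returned, 6 relevant records exist, 3 overlap.
retrieved = {1, 2, 3, 4}
relevant = {2, 3, 4, 5, 6, 7}
precision, recall = precision_recall(retrieved, relevant)
```

Here three of the four results are on topic (high precision), but only half of the relevant records were found (modest recall).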
Subject vs. Keyword
All cataloged and indexed materials have assigned headings called “subjects.”
Subject headings describe the “aboutness” or topic of the work and bring together all of the works on the same topic, despite differences in wording.
Subject headings are “controlled”: they are carefully selected from existing lists called “controlled vocabularies.”
Subject searches search only the assigned subject field within a database record.

Subject vs. Keyword
Keywords are natural language.
Different people (including authors) use different words to describe the same topic.
Keywords are not controlled.
Keyword searches typically search an entire database record, which increasingly includes document text.

Subject Searching
Advantages
– More precise; fewer irrelevant results
– More manageable numbers of results
– Increases recall by disambiguating and co-locating like terms
Disadvantages
– Unfamiliar to users
– Controlled vocabulary is difficult to discover and manage

Keyword Searching
Advantages
– “Natural” language: requires no special knowledge
– Increases recall by searching full record
– Can lead to subject searching
Disadvantages
– Less precision; more irrelevant results (due to ambiguous terms, or to retrieving terms from irrelevant parts of the record, e.g. notes, author, etc.)
– Synonyms/different terms mean loss of recall unless all terms are searched
– More weeding/larger numbers of results

Boolean Searching
Boolean operators: And, Or, Not
Allows us to broaden our search by adding like terms (using “or”)
Allows us to narrow our search by searching more than one topic at a time (using “and”)
Allows us to eliminate unwanted results (using “not”)

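The three operators behave like set operations on the records each term retrieves. A minimal sketch, assuming each term has already been searched on its own and returned the hypothetical record IDs shown:

```python
# Hypothetical record IDs matching each single-term search.
obesity = {1, 2, 3, 4}
overweight = {3, 4, 5}
adults = {4, 5, 6}

broader = obesity | overweight   # OR: records with either term
narrower = obesity & overweight  # AND: records with both terms
excluded = obesity - adults      # NOT: obesity records that do not mention adults
```

The "or" result is the largest set and the "and" result the smallest, which is why "or" broadens a search and "and" narrows it.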
Boolean Searching
Keyword grid: helps organize thoughts
Topic: Childhood obesity

Childhood   OR   Youth        OR   Adolescent
AND
Obesity     OR   Overweight   OR   BMI Index
AND
Rates       OR   Statistics   OR   Prevalence

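A keyword grid translates fairly directly into a search string: each row of like terms is joined with "or", and the rows are joined with "and". The helper below is a hypothetical sketch, not any database's syntax; it assumes the database accepts parentheses for grouping and straight double quotes for multi-word terms.

```python
def build_query(grid):
    """Join each row's synonyms with OR, then join the rows with AND."""
    groups = []
    for row in grid:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{t}"' if " " in t else t for t in row]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


grid = [
    ["childhood", "youth", "adolescent"],
    ["obesity", "overweight", "BMI index"],
    ["rates", "statistics", "prevalence"],
]
query = build_query(grid)
print(query)
```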
Nesting
When we combine several terms, we have to group like terms together
Search engines often “read” search strings like a sentence. Child* and obesity or overweight will look for:
– child and obesity as one search, with both terms appearing in the document, and
– overweight as a separate search, not combined with child

Nesting
To avoid confusion, we nest terms with parentheses
In general, interchangeable terms (those connected with “or”) go in parentheses.
Child* and (obesity or overweight) will look for the word child with either obesity or overweight; every record will have one of these combinations.

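The effect of the parentheses can be seen with sets of hypothetical record IDs. Python happens to evaluate "&" before "|", which gives the same grouping as the left-to-right reading described for this example; actual databases may order their operators differently.

```python
# Hypothetical record IDs for each term.
child = {1, 2, 3}
obesity = {2, 3, 4}
overweight = {3, 5, 6}

# Without parentheses: (child AND obesity) OR overweight
unnested = child & obesity | overweight

# With parentheses: child AND (obesity OR overweight)
nested = child & (obesity | overweight)

# Records that slipped in without the word child.
stray = unnested - child
```

Records 5 and 6 mention overweight but not child, so they appear only in the unnested search.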
Phrase Searching
Most databases automatically “and” search terms together if no Boolean operator is specified.
With “and,” both words will appear in the record (or full text), but they may not be anywhere near each other and may not be related.
To specify a phrase, use quotation marks: “information literacy” will search for those two words as a phrase.

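The difference between an automatic "and" and a phrase search can be sketched as two small matching functions. The function names and the sample record are made up for illustration; real databases also handle punctuation and word forms in their own ways.

```python
import re


def words(text):
    """Lowercase a text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_all(text, terms):
    """Implicit AND: every term appears somewhere in the text."""
    tokens = set(words(text))
    return all(t.lower() in tokens for t in terms)


def matches_phrase(text, phrase):
    """Phrase search: the words appear next to each other, in order."""
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


record = "Literacy programs improve access to health information."
and_match = matches_all(record, ["information", "literacy"])
phrase_match = matches_phrase(record, "information literacy")
```

The sample record contains both words, so the "and" search retrieves it, but the words are far apart and in the wrong order, so the phrase search does not.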
Truncation
Allows us to search word variations automatically
The symbol varies by database, but is usually an asterisk (“*”)
– child* finds child, children, childhood
– obes* finds obese, obesity

Wildcard
Like truncation, but allows for variations within a word
– colo*r finds color and colour
– behavio*r finds behavior and behaviour

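Truncation and wildcards can both be approximated with regular expressions. This sketch assumes "*" stands for zero or more letters, at the end of a word or inside it; many databases use a different symbol or limit a wildcard to a single character, so the helper names and behavior here are illustrative only.

```python
import re


def pattern_to_regex(pattern):
    """Turn a search term containing "*" into a whole-word regular expression."""
    escaped = re.escape(pattern.lower())
    # re.escape turns "*" into "\*"; swap that for "any run of letters".
    return re.compile(r"\b" + escaped.replace(r"\*", "[a-z]*") + r"\b")


def find_variants(pattern, text):
    """Return the distinct words in text that the pattern matches."""
    return sorted(set(pattern_to_regex(pattern).findall(text.lower())))


text = "Child and childhood obesity: obese children, colour and color, behaviour."
truncated = find_variants("child*", text)
wildcard = find_variants("colo*r", text)
```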
Proximity
Specify the distance and/or word order of search terms
Operators: “within,” usually written as “w,” and “near,” usually written as “n”
– Childhood w5 obesity = the words childhood and obesity must both appear, in that order, with no more than 5 words between.
– Childhood n5 obesity = the words childhood and obesity must both appear, with no more than 5 words between, in any order.

Proximity
Increases precision, as words close together are more likely related
Databases vary in level of specificity:
– Some allow you to search terms up to 25 words apart.
– Some allow you to search within sentence, paragraph, or page rather than give a word count.

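The "w" and "n" operators can be sketched as one function with an ordered flag. This is a simplified model under the definition given for w5 and n5 (at most that many words between the two terms); the function name is hypothetical, and databases may count distance slightly differently.

```python
import re


def proximity(text, first, second, max_between, ordered):
    """True if the two words occur with at most max_between words between them.

    ordered=True models "w" (first must come before second);
    ordered=False models "n" (either order).
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in first_pos:
        for j in second_pos:
            gap = j - i if ordered else abs(j - i)
            # gap - 1 is the number of words strictly between the two terms.
            if gap >= 1 and gap - 1 <= max_between:
                return True
    return False


text = "Obesity rates rose sharply during childhood in the study."
w5 = proximity(text, "childhood", "obesity", 5, ordered=True)
n5 = proximity(text, "childhood", "obesity", 5, ordered=False)
```

In the sample sentence obesity comes before childhood, so the ordered search fails while the unordered one succeeds.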
Field Searching
Databases offer many additional ways to limit our search
Typical are author, title, subject, journal title, date, document type, etc.
These are called fields; they vary by database
There may be more fields in an “advanced” or “expert” search than are available on the basic search screen.

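Field searching limits a match to one part of the record. A minimal sketch, using two made-up records and hypothetical field names, compares a search of the whole record with a search of the author field only:

```python
# Hypothetical database records; field names vary between real databases.
records = [
    {"author": "Smith, J.", "title": "Childhood obesity trends", "journal": "Journal A", "year": 2010},
    {"author": "Lee, A.", "title": "Smith College nutrition survey", "journal": "Journal B", "year": 2012},
]


def keyword_search(records, term):
    """Match the term anywhere in the record."""
    term = term.lower()
    return [r for r in records if any(term in str(v).lower() for v in r.values())]


def field_search(records, field, term):
    """Match the term only within the named field."""
    term = term.lower()
    return [r for r in records if term in str(r.get(field, "")).lower()]


anywhere = keyword_search(records, "smith")
by_author = field_search(records, "author", "smith")
```

The keyword search returns both records, including the one where Smith is only part of the title; the author search returns just the record written by Smith.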
Other considerations
Stop words: a number of words, considered superfluous, are automatically dropped from searches. These include the Boolean operators (and, or, not) as well as articles and prepositions (a, an, the, of, etc.). If these words are essential, use quotation marks to indicate a phrase search and they will be included.
Thesaurus: many databases allow you to search their controlled vocabulary through a “thesaurus” feature.
Browse index: similarly, you may be able to search or browse other fields, such as journal names.

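Stop-word handling can be sketched as a small query parser. The stop list below is a short illustrative sample (each database keeps its own), the function name is hypothetical, and the sketch assumes phrases are marked with straight double quotes.

```python
import re

# A small illustrative stop list; real databases each use their own.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "not"}


def parse_query(query):
    """Keep quoted phrases whole; drop stop words from everything else."""
    terms = []
    # Each match is either a quoted phrase (group 1) or a single bare word (group 2).
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase.lower())
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
    return terms


plain = parse_query("the history of the book")
quoted = parse_query('"the history of the book"')
```

Without quotation marks only history and book are searched; with them, the whole phrase is kept, stop words included.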
Other considerations
Inconsistencies: there is no standardization or universal control of databases. As a result, field names, search operators, available fields, etc. can vary. For example, the field for the title of a journal is variously called “journal name,” “journal title,” and “source.”
Time-outs: some databases automatically time out after a certain period of inactivity, and all work will be lost.