LIS618 lecture 1. Thomas Krichel, 2004-02-01.
structure of talk Recap on Boolean (aurally) Before online searching Working with DIALOG –Overview –Search command Boolean exercise (on the fly)
before a search I What is the purpose of the query? –brief overview –comprehensive search What perspective on the topic is required? –scholarly –technical –business –popular
before a search II What type of information does the patron want? –fulltext –bibliographic –directory –numeric Are there any known sources? –authors –journals –papers –conferences
before a search III What are the language restrictions? What, if any, are the cost restrictions? How current do the data need to be? How much of each record is required?
concept analysis This is the art/science of taking the topic to search for and developing facets from it. Example: Internet filtering in libraries –Internet filter –Libraries –Controversy, not technical issues We may also need to think about the aim of the search.
search aims a known needle in a known haystack a known needle in an unknown haystack an unknown needle in an unknown haystack any needle in a haystack the sharpest needle in a haystack most of the sharpest needles in a haystack
search aims all the needles in a haystack affirmation of no needles in a haystack things like needles in a haystack is there a new needle in the haystack where are the haystacks needles, haystacks, anything
types of searches known-item searches negative searches selective dissemination of information topical or subject searches passage searching, where the user is only interested in part of the item
search strategies I Building block approach –Do a number of elementary searches –Combine the resulting sets with Boolean operators This is what I did in the example in the previous lecture Works only with the Boolean model
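The building block approach can be sketched with Python sets. This is only an illustration: the four records and the `search` helper are made up for the example and have nothing to do with how DIALOG stores or indexes data.

```python
# Hypothetical toy collection: record number -> text.
records = {
    1: "internet filter software in public libraries",
    2: "library funding controversy",
    3: "internet filter controversy in school libraries",
    4: "network filter hardware",
}

def search(term):
    """Elementary search: the set of record numbers whose text contains the term."""
    return {num for num, text in records.items() if term in text.split()}

# Step 1: one elementary search per facet.
s1 = search("filter")
s2 = search("libraries")
s3 = search("controversy")

# Step 2: combine the resulting sets with a Boolean operator (here AND).
combined = s1 & s2 & s3
print(combined)
```

Each elementary search returns a set, so the Boolean combination is plain set intersection, union or difference.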
search strategies II Snowballing approach –Start with a very specific query –Think of other terms that can be added to get more results –Stop when a reasonable number of results is reached. Not sure this really works well in practice.
search strategies III The successive fraction approach is the opposite of the snowballing approach –First search for a broad concept –Then repeat the query, adding various limiting factors. Can work well if the IR system allows you to repeat and edit queries. But queries can become unwieldy.
search strategies IV Most specific facet first –Conduct concept analysis –Look for the most specific facet –Search that first, add others later Presupposes that you have done a decent concept analysis.
two steps in DIALOG step one: select databases (aka files) to look at step two: perform searches on the selected databases You may wonder why one does not have one single step like in a search engine. Discuss. Today we concentrate on the second step.
working on selected files We assume that we have selected a database that we know, and we look at the search interface on the selected database. The database selection process is a bit more complicated and is covered next week. First, let us log in and look at the command prompt. Then we select the first database (file) with the begin command.
the begin command As its name suggests, this is usually the first command. "begin number, number,…" selects the files with the given numbers. Once they are selected they can be searched. Now select ERIC with "begin 1". "Begin 1" can be abbreviated as "b 1".
substeps in the second step Identify search terms Use Dialog basic commands to conduct a search View records online or print the results
the 's' (select) command Once we have issued the "begin" command to select a database, we issue the "s" command on that database: "s query_expression", where query_expression is a query expression. This searches the index of the selected database in full-text view for the query issued. It will not find any of the following: "an and by for from of the to with". They are stop words.
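To see which words of a query can actually be searched, here is a minimal sketch that drops the stop words listed above. The function name `searchable_terms` is made up for this illustration; it only mimics the effect of the stop list and is not how DIALOG processes a query.

```python
# The stop words from the slide above.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}

def searchable_terms(query):
    """Return the words of the query that are not stop words, in lower case."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(searchable_terms("History of the Library"))
```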
query expression A query expression contains search terms expressed in special ways –You can truncate search terms. –You can build an elementary expression by putting several keywords together. This is achieved by DIALOG's connectors. –You can combine several expressions with the use of Boolean operators We will cover these in turn now.
truncation of terms I Open Truncation –"select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology Controlled-Length Truncation –"select path??" retrieves the root and up to two additional characters: paths, pathos
truncation of terms II Embedded Character truncation can be used for variant spellings: –"select organi?ation" ->organization organisation –"select fib??board" ->fiberboard fibreboard This truncation feature is also useful for searching for unusual plural forms: –"select wom?n" ->woman women Apparently you can also do prefixes by putting the ? in the beginning. –"?mobile"-> automobile metamobile
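The truncation rules on the last two slides can be approximated with regular expressions. This is a sketch under assumptions: the helper `truncation_to_regex` is made up for the illustration, a trailing run of n question marks (n > 1) is taken to mean "up to n additional characters" by extension of the "??" rule, and each embedded "?" is taken to stand for exactly one character. DIALOG itself may behave differently in edge cases.

```python
import re

def truncation_to_regex(term):
    """Translate a truncated search term into a compiled regular expression."""
    leading = len(term) - len(term.lstrip("?"))
    trailing = len(term) - len(term.rstrip("?"))
    body = term.strip("?")
    pattern = ""
    if leading:
        pattern += r"\w*"            # prefix truncation: any beginning
    # An embedded "?" stands for exactly one character (assumption).
    pattern += "".join(r"\w" if ch == "?" else re.escape(ch) for ch in body)
    if trailing == 1:
        pattern += r"\w*"            # open truncation: any ending
    elif trailing > 1:
        pattern += r"\w{0,%d}" % trailing   # controlled-length truncation
    return re.compile(pattern, re.IGNORECASE)

def matches(term, word):
    """True if the whole word is retrieved by the truncated term."""
    return truncation_to_regex(term).fullmatch(word) is not None

words = ["paths", "pathos", "pathway", "pathology"]
print([w for w in words if matches("path?", w)])
print([w for w in words if matches("path??", w)])
```

The same helper handles the embedded and prefix cases, for example `matches("organi?ation", "organisation")` or `matches("?mobile", "automobile")`.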
use of connectors Connectors are used to put several words together. One instance where this is useful is when you have words that on their own mean different things. For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.
example: terms related to "mate" What other terms could be used? –matear (drink mate) –matero (mate drinker) –cebar (prepare mate) –cebador (mate preparer) –yerba (mate herb) –bombilla (mate straw)
connectors I '(W)' requires the terms to appear next to each other, in the given order, e.g. 'yerba(W)mate?' matches "yerba mate". '(iW)', where i is an integer, means the second term follows the first with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate".
connectors II '(N)' requires the terms to be next to each other, in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba". '(iN)', where i is an integer, means proximity by at most i words in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora". '(S)' searches for the occurrence of the connected terms in the same paragraph.
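The (W) and (N) connectors can be imitated on a list of words. The sketch below is an illustration only: the helper names `term_matches` and `within` are made up, only a trailing "?" is supported for truncation, and "at most i words" is read as "at most i words between the two terms", which agrees with the examples on the slides.

```python
def term_matches(term, word):
    """'ceba?' matches any word starting with 'ceba'; otherwise exact match."""
    if term.endswith("?"):
        return word.startswith(term[:-1])
    return word == term

def within(text, first, second, gap=0, ordered=True):
    """True if both terms occur with at most `gap` words between them.

    ordered=True imitates (W) and (iW); ordered=False imitates (N) and (iN).
    """
    words = text.lower().split()
    for i, w in enumerate(words):
        if not term_matches(first, w):
            continue
        for j, v in enumerate(words):
            if i == j or not term_matches(second, v):
                continue
            distance = j - i if ordered else abs(j - i)
            if 1 <= distance <= gap + 1:
                return True
    return False

# 'ceba?(3W)mate?'
print(within("cebar un maravilloso mate", "ceba?", "mate?", gap=3))
print(within("cebador guapo mirando un buen mate", "ceba?", "mate?", gap=3))
# 'ceba?(3N)mate?'
print(within("matear con la cebadora", "ceba?", "mate?", gap=3, ordered=False))
```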
using Boolean operators In your query, you can combine several expressions with Boolean operators Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION" But I usually do not issue such fancy queries.
executing several searches Several searches can be done sequentially, and the result sets are saved by the system. Each time, the system assigns a set number, Si. These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3'. Remember that Boolean operations are set-theoretic!
Boolean operators on sets When using Booleans, be aware that "and" has higher precedence than "or". Thus "a or b and c" is not the same as "(a or b) and c"; it is "a or (b and c)". Use parentheses when in doubt.
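Python's set operators follow the same precedence (`&` for "and" binds tighter than `|` for "or"), so the difference can be checked on three small sets. The set contents are made up for the illustration.

```python
# Three hypothetical result sets of record numbers.
a = {1, 2}
b = {2, 3}
c = {3, 4}

default = a | b & c      # read as a | (b & c)
explicit = a | (b & c)   # the same thing, spelled out
grouped = (a | b) & c    # a different query

print(default, explicit, grouped)
```

Here `default` and `explicit` both contain records 1, 2 and 3, while `grouped` contains only record 3.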
DS (display sets) This command can be executed any time to review the sets that have been formed since the last B (begin) command. This can be useful to review your search history.
the target command "target set" where set is a search result set creates a subset of the "statistically most relevant results" in the original set. I have not seen details about how this subset is computed. A new result set is being formed.
display: the type command type set/format/range set is a result set format is a format range can be –start – end start is a record number to start end is a record number to end –all
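The three slash-separated parts of a type command can be pulled apart as follows. This is only a sketch: `parse_type` is a made-up helper, and the example strings such as "type S1/5/1-10" assume that the range is written as start, hyphen, end.

```python
def parse_type(command):
    """Split 'type set/format/range' into its three parts."""
    verb, _, argument = command.strip().partition(" ")
    if verb.lower() != "type":
        raise ValueError("not a type command")
    result_set, fmt, rng = argument.split("/")
    if rng.lower() == "all":
        records = "all"
    else:
        start, end = rng.split("-")
        records = (int(start), int(end))
    return {"set": result_set, "format": fmt, "records": records}

print(parse_type("type S1/5/1-10"))
print(parse_type("type S2/full/all"))
```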
standard delivery formats 2: full record except abstract; 3 or medium: citation; 5 or long: full except full text; 6 or free: title and dialog number; 8 or short: title plus indexing terms (useful to find other indexing terms); 9 or full: everything; KWIC or K: keywords in context
options for delivery I once tried to email results to myself, to no avail. You can save the html of the search results in the browser. You can print the results within the browser.
http://openlib.org/home/krichel Thank you for your attention!