Information retrieval (IR)


1 Information retrieval (IR)
Basics, models, interactions

2 Central ideas
Information retrieval (IR) is at the heart of ALL: indexing & abstracting databases, information resources, and search engines all work on the basis of IR algorithms and procedures.
Contemporary IR is also interactive, to such a degree that pragmatically IR cannot be separated from interaction.
As a searcher you will constantly use IR, thus you have to be knowledgeable about it.

3 ToC
1. Information retrieval (IR)
2. Matching algorithms: exact match & best match
3. Strengths & weaknesses
4. IR interaction & interactive models

4 1. Information retrieval
Definitions. Traditional model

5 Information retrieval (IR) - original definition
Calvin Mooers coined the term: “Information retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines are employed to carry out the operation.” (Mooers, 1951)

6 IR: Objective & problems
Objectives:
Provide users with effective access to & interaction with information resources.
Retrieve information or information objects that are relevant.
Problems addressed:
1. How to organize information intellectually?
2. How to specify search & interaction intellectually?
3. What systems & techniques to use for those processes?

7 IR models
A model depicts, represents what is involved: a choice of features, processes, things for consideration.
Several IR models used over time:
traditional: oldest, most used, shows basic elements involved
interactive: more realistic, favored now, shows also interactions involved; several models proposed
Each has strengths, weaknesses.
We start with the traditional model to illustrate many points, from general to specific examples.

8 Description of traditional IR model
It has two streams of activities:
one is the system side, with processes performed by the system
the other is the user side, with processes performed by users & intermediaries (you)
These two sides led to “system orientation” & “user orientation”: on the system side automatic processing is done; on the user side human processing is done.
They meet at the matching process, where the query is fed into the system and the system looks for documents that match the query.
Feedback is also involved, so that things change based on results, e.g. the query is modified & new matching is done.

9 Traditional IR model
[Diagram: two streams that meet at Matching. System side: Acquisition (documents, objects), Representation (indexing, ...), File organization (indexed documents). User side: Problem (information need), Representation (question), Query (search formulation). Both feed Matching (searching), which yields Retrieved objects; feedback leads back from the results.]

10 Acquisition - system side
Content: what is in databases. In Dialog: first part of Blue Sheets (File Description, Subject Coverage); in Scopus: Subject Areas.
Selection of documents & other objects from various sources: journals, reports … In Dialog Blue Sheets: Sources; in Scopus: Sources.
Mostly text-based documents: full texts, titles, abstracts ... But also: data, statistics, images (e.g. maps, trademarks) ...
Importance: determines contents of databases. Key to file selection in searching!

11 Representation of documents, objects … - system side
Indexing, many ways: free text terms (even in full texts); controlled vocabulary - thesaurus; manual & automatic techniques.
Abstracting; summarizing.
Bibliographic description: author, title, sources, date … (metadata).
Classifying, clustering.
Organizing in fields & limits. In Dialog: Basic Index, Additional Index, Limits; in Scopus: pull-down menus.
Basic to what is available for searching & displaying.

12 File organization - system side
As mentioned:
Sequential: record (document) by record.
Inverted: term by term; list of records under each term.
Combination: indexes inverted, documents sequential.
When only citations are retrieved, there is a need for document files or document delivery.
Enables searching & interplay between types of files.
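The difference between a sequential file and an inverted file can be sketched in a few lines of Python. This is only an illustration under assumed data: the three sample records and the helper name build_inverted_file are hypothetical and do not describe how Dialog or Scopus actually store their files.

```python
from collections import defaultdict

# Sequential file: records stored one after another (hypothetical sample data).
records = {
    1: "digital library evaluation",
    2: "library catalog design",
    3: "digital image retrieval",
}

def build_inverted_file(sequential):
    """Return a mapping: term -> sorted list of record numbers containing it."""
    inverted = defaultdict(set)
    for record_id, text in sequential.items():
        for term in text.lower().split():
            inverted[term].add(record_id)
    return {term: sorted(ids) for term, ids in inverted.items()}

inverted_file = build_inverted_file(records)
print(inverted_file["digital"])   # records listed under the term "digital"
print(inverted_file["library"])   # records listed under the term "library"
```

In the combined organization mentioned above, a search would consult inverted_file first and then fetch the full records from the sequential file by number.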

13 Problem - user side
Related to the user’s task, situation, problem at hand; these vary in specificity, clarity.
Produces the information need: the ultimate criterion for effectiveness of retrieval. How well was the need met?
The information need for the same problem may change, evolve, shift during the IR process: adjustment in searching; often more than one search for the same problem over time. You will experience this in your term project.
Critical for examination in interview.

14 Representation: question - user side
Non-mediated: end user alone.
Mediated: intermediary + user; interviews; human-human interaction.
Question analysis: selection, elaboration of terms. Various tools may be used: thesaurus, classification schemes, dictionaries, textbooks, catalogs …
Focus toward deriving search terms & logic, and selection of files, resources.
Subject to feedback changes.
Critical roles of intermediary - you.
Determines search specification - a dynamic process.

15 Query: search formulation - user side
Translation into system requirements & limits: start of human-computer interaction.
Selection of files, resources.
Search strategy - selection of: search terms & logic; possible fields, delimiters; controlled & uncontrolled vocabulary; variations in tactics.
Reiterations from feedback. Several feedback types: relevance feedback, magnitude feedback ...; query expansion & modification.
What & how of actual searching.

16 Matching: searching - system side
Process of comparing. Search: what documents in the file match the query as stated?
Various search algorithms:
exact match - Boolean: still available in most, if not all, systems
best match - ranking by relevance: increasingly used, e.g. on the web
hybrids incorporating both, e.g. Target, Rank in Dialog
Each has strengths, weaknesses: no ‘perfect’ method exists and probably never will.
Involves many types of search interactions & formulations.

17 Retrieved objects - from system to user
Various orders of output: sorted by Last In First Out (LIFO); ranked by relevance & then LIFO; ranked by other characteristics.
Various forms of output. In Dialog: Output options; in Scopus: title (default), abstract + references, cited by, plus more.
When only citations are available: possible links to document delivery; in Scopus: View at publisher; accessing RUL for digital journals.
Base for relevance, utility evaluation by users.
What a user (or you) sees, gets, judges - can be specified.
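The output orders named above can be illustrated with a small Python sketch. The records, dates and scores are invented for the example, and the relevance score stands in for whatever a given system calculates; nothing here reflects actual Dialog or Scopus behavior.

```python
from datetime import date

# Hypothetical retrieved records: (record id, date added, system relevance score).
retrieved = [
    ("r1", date(2011, 3, 1), 0.40),
    ("r2", date(2010, 6, 15), 0.90),
    ("r3", date(2010, 1, 20), 0.90),
]

# LIFO: the most recently added records come first.
lifo = sorted(retrieved, key=lambda r: r[1], reverse=True)

# Ranked by relevance, then LIFO among records with equal scores.
ranked_then_lifo = sorted(retrieved, key=lambda r: (r[2], r[1]), reverse=True)

print([r[0] for r in lifo])
print([r[0] for r in ranked_then_lifo])
```

The same three records come out in a different order depending on the sort chosen, which is why the order of output matters for what the user sees first.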

18 2. Matching algorithms
Exact match & best match searches

19 Exact match - Boolean search
You retrieve exactly what you ask for in the query: all documents that have the term(s) with the logical connection(s), and possibly other restrictions (e.g. to be in titles), as stated in the query. Exactly: nothing less, nothing more.
Based on matching following the rules of Boolean algebra, or algebra of sets: the ‘new algebra’, presented by circles in Venn diagrams.

20 Boolean algebra: operates on sets of documents
Has four operations (like in algebra):
A: retrieves the set that has term A. I want documents that have the term library.
A AND B: retrieves the set that has terms A and B; often called intersection & labeled A ∩ B. I want documents that have both terms library and digital someplace within.
A OR B: retrieves the set that has either term A or B; often called union and labeled A ∪ B. I want documents that have either term library or term digital someplace within.
A NOT B: retrieves the set that has term A but not B; often called negation and labeled A - B. I want documents that have term library, but if they also have term digital I do not want those.
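Because Boolean searching operates on sets of documents, the four operations map directly onto Python's built-in set operators. The document numbers below are made up for illustration; only the operators themselves are the point.

```python
# Hypothetical postings: the set of document numbers that contain each term.
library = {1, 2, 4, 7}
digital = {2, 3, 7, 9}

a_alone = library               # A: documents that have the term library
a_and_b = library & digital     # A AND B: intersection
a_or_b = library | digital      # A OR B: union
a_not_b = library - digital     # A NOT B: set difference

print(sorted(a_and_b))
print(sorted(a_or_b))
print(sorted(a_not_b))
```

Note that documents 2 and 7 are suppressed by NOT even though they contain library, which is exactly the behavior the searcher has to watch for.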

21 Potential problems
But beware:
digital AND library will retrieve documents that have digital library (together as a phrase), but also documents that have digital in the first paragraph and library in the third section, 5 pages later, and that do not deal with digital libraries at all. Thus in Scopus or Google you will ask for “digital library”, and in Dialog for digital(w)library, to retrieve the exact phrase digital library.
digital NOT library will retrieve documents that have digital and suppress those that along with digital also have library, but sometimes those suppressed may very well be relevant. Thus, NOT is also known as the “dangerous operator”.
Also beware of order: venetian AND blind will retrieve documents that have venetian blind and also those that have blind venetian (oldest joke in information retrieval).
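The gap between AND and a phrase search can be made concrete with a short Python sketch. The tokenizer is deliberately naive and the function names are hypothetical; real systems implement adjacency operators such as Dialog's (w) in their own ways, which this does not attempt to reproduce.

```python
def tokens(text):
    """Lowercase the text and split it into words (a deliberately naive tokenizer)."""
    return text.lower().split()

def matches_and(text, first, second):
    """Boolean AND: both terms occur anywhere in the document."""
    words = tokens(text)
    return first in words and second in words

def matches_phrase(text, first, second):
    """Adjacency: `first` is immediately followed by `second`."""
    words = tokens(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))

docs = {
    "d1": "the digital library opened today",
    "d2": "digital cameras were on sale near the library",
    "d3": "a blind venetian bought a venetian vase",
}

for doc_id, text in docs.items():
    print(doc_id,
          matches_and(text, "digital", "library"),
          matches_phrase(text, "digital", "library"))
```

Document d2 satisfies digital AND library but not the phrase, and d3 satisfies venetian AND blind without containing venetian blind at all.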

22 Boolean algebra depicted in Venn diagrams
Four basic operations, e.g. A = digital, B = libraries.
[Venn diagram, repeated for each operation: two overlapping circles A and B; region 1 is A only, region 2 is the overlap, region 3 is B only.]
A alone: all documents that have A. Shade 1 & 2. (digital)
A AND B: shade 2. (digital AND libraries)
A OR B: shade 1, 2, 3. (digital OR libraries)
A NOT B: shade 1. (digital NOT libraries)

23 Venn diagrams … cont.
Complex statements allowed, e.g.
[Venn diagram: three overlapping circles A, B and C, with regions numbered 1-7.]
(A OR B) AND C: shade 4, 5, 6. (digital OR libraries) AND Rutgers
(A OR B) NOT C: shade what? (digital OR libraries) NOT Rutgers

24 Venn diagrams cont.
Complex statements can be made as in ordinary algebra, e.g. (2+3)x4.
As in ordinary algebra, watch for parentheses: 2+(3x4) is not the same as (2+3)x4.
(A AND B) OR C is not the same as A AND (B OR C).
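The effect of parentheses is easy to check with sets. The three sets below are invented document numbers standing in for three search terms; the comparison, not the data, is what matters.

```python
# Hypothetical sets of document numbers for three terms.
A = {1, 2, 3}   # e.g. digital
B = {2, 3, 4}   # e.g. libraries
C = {3, 4, 5}   # e.g. Rutgers

left = (A & B) | C    # (A AND B) OR C
right = A & (B | C)   # A AND (B OR C)

print(sorted(left))
print(sorted(right))
print(left == right)

# The arithmetic analogy from the slide:
print(2 + (3 * 4), (2 + 3) * 4)
```

With these sets the first statement retrieves four documents and the second only two, so moving the parentheses changes the search, just as it changes 14 into 20.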

25 Adding variations to Boolean searches
digital AND libraries can be specified to appear in given fields, as present in the given system, e.g. to appear in titles only:
in Dialog the command is s digital AND libraries/TI
in Scopus a pull-down menu allows for selection of a given field, so for digital library specify Article Title in the pull-down menu
in Google, Advanced Search gets you to a pull-down menu for “Where your keywords show up:”; then go to “in the title of the page”
Various systems have different ways to retrieve singulars and plurals for the same term:
in Scopus the term library will also retrieve libraries & vice versa
in Dialog you have to specify librar? to retrieve variants
in Google library retrieves library but not libraries
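Field restriction and truncation can be imitated in a few lines of Python. This sketch assumes that truncation means "any word beginning with the stem", and the records, field names and functions are all hypothetical; it is not a description of how Dialog, Scopus or Google process such requests.

```python
records = [
    {"title": "Digital libraries in practice", "abstract": "A survey of library services."},
    {"title": "Image retrieval", "abstract": "Methods used by a digital library."},
    {"title": "The librarian's handbook", "abstract": "Reference interviews."},
]

def words(text):
    """Lowercase, split on whitespace, and strip simple punctuation from word edges."""
    return [w.strip(".,;:!?'\"") for w in text.lower().split()]

def has_stem(text, stem):
    """Truncation: true if any word in the text starts with the stem."""
    return any(w.startswith(stem) for w in words(text))

def search(field, stem):
    """Return positions of records whose given field has a word beginning with stem."""
    return [i for i, r in enumerate(records) if has_stem(r[field], stem)]

title_hits = search("title", "librar")
abstract_hits = search("abstract", "librar")
print(title_hits, abstract_hits)
```

The same stem retrieves different records depending on the field searched, and in the title field it also picks up librarian's, a reminder that truncation can bring in more variants than the plural alone.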

26 Best match searching
Output is ranked: it is NOT presented as a Boolean set but in some rank order.
You retrieve documents ranked by how similar (close) they are to a query (as calculated by the system):
similarity is assumed as relevance
ranked from highest to lowest relevance to the query; mind you, as considered by the system
you change the query, the system changes the rank
Thus, documents as answers are presented from those that are most likely relevant downwards to less & less likely relevant, as determined by a given algorithm.
Remember: a system algorithm determines relevance ranking.

27 Best match ... cont.
Best match process deals with PROBABILITY: what is the probability that a document is relevant to a query?
compares the set of query terms with the sets of terms in documents
calculates a similarity between query & each document based on common terms &/or other aspects
sorts the documents in order of similarity
assumes that the higher ranked documents have a higher probability of being relevant
allows for cut-off at a chosen number, e.g. the first 20 documents
BIG issue: what representation & similarity measures are better? Subject of IR experiments.
“Better” is determined by a number of criteria, e.g. relevance, speed …
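The steps of the best match process (compare term sets, calculate similarity, sort, cut off) can be sketched as follows. The similarity measure used here, the Jaccard coefficient, is an assumption chosen for simplicity; the text does not name a specific measure, and real systems use many others. The documents and function names are likewise hypothetical.

```python
def similarity(query_terms, doc_terms):
    """Jaccard coefficient: shared terms divided by all distinct terms."""
    q, d = set(query_terms), set(doc_terms)
    if not q and not d:
        return 0.0
    return len(q & d) / len(q | d)

def best_match(query, documents, cutoff=20):
    """Rank documents by similarity to the query and keep the top `cutoff`."""
    q_terms = query.lower().split()
    scored = [(similarity(q_terms, text.lower().split()), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by id
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:cutoff]]

documents = {
    "d1": "digital library services",
    "d2": "library catalog",
    "d3": "weather report",
}
ranking = best_match("digital library", documents, cutoff=2)
print(ranking)
```

Changing the query or swapping in a different similarity function changes the ranking, which is the sense in which the system, not the user, determines the order.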

28 Best match (cont.)
Variety of algorithms (formulas) used to determine similarity, using statistical &/or linguistic properties, e.g. if digital appears a lot of times in a given document relative to its size, that document will be ranked higher when the query is digital.
many proposed & tested in IR research
many developed by commercial organizations; Google also uses calculations as to the number of links to/from a document & other methods
many algorithms are now proprietary & not disclosed
The way a system ranks and the way you rank may not necessarily be in agreement.
Web outputs are mostly ranked, but Dialog allows ranking as well, with special commands.
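The example of digital appearing often relative to document size corresponds to a simple relative term frequency. The sketch below ranks three invented documents that way; it is one illustrative statistic under assumed data, not the formula of any particular system.

```python
def relative_frequency(term, text):
    """Occurrences of `term` divided by the number of words in the document."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

documents = {
    "short": "digital archives and digital maps",
    "long": "digital tools appear once in this much longer report on library budgets",
    "none": "library budgets and staffing",
}

# Rank document names from highest to lowest relative frequency of the query term.
ranked = sorted(documents,
                key=lambda name: relative_frequency("digital", documents[name]),
                reverse=True)
print(ranked)
```

The short document ranks first because two of its five words are the query term, while the longer one mentions it only once among twelve.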

29 3. Strengths & weaknesses
Best vs. exact match. Traditional IR model

30 Boolean vs. best match
Boolean:
allows for logic
provides all that has been matched
BUT
has no particular order of output, usually LIFO
treats all retrievals equally, from the most to least relevant ones
often requires examination of large outputs
Best match:
allows for free terminology
provides for a ranked output
provides for cut-off - any size output
BUT
does not include logic
ranking method (algorithm) not transparent
whose relevance?
where to cut off?

31 Strengths of traditional IR model
Lists major components in both system & user branches.
Suggests:
What to explain to users about the system, if needed
What to ask of users for more effective searching (problem ...)
Aids in selection of component(s) for concentration: mostly ever better representation.
Provides a framework for evaluation of (static) aspects.
© Tefko Saracevic

32 Weaknesses
Does not address nor account for interaction & judgment of results by users: identifies interaction with matching only, while interaction is a much richer process.
Many types of & variables in interaction are not reflected.
Feedback has many types & functions - also not shown.
Evaluation is thus one-sided.
IR is a highly interactive process - thus additional model(s) needed.

33 4. IR interaction
Models. Implications: what happens in searching?

34 Enters interaction
There is MUCH more to searching than knowing computers, networks & commands, as there is more to writing than knowing word processing packages.

35 IR as interaction
© Tefko Saracevic, Rutgers University
If we consider USER & USE central, then interaction is a dominant feature of contemporary IR.
Interaction has many facets:
with systems, technology
with documents, texts viewed/retrieved
with people, intermediaries
Several interactive IR models exist, none as widely accepted as the traditional IR model.
Broader area: human-computer interaction (HCI) studies.

36 HCI: broader concepts
“Any interaction takes place through one or more interfaces & involves two or more participants who each have one or more purposes for the interaction” (Storrs, 1994)
Participants: people & ‘computer’ (everything in it: software, hardware, resources …)
Interface: a common boundary
Purposes: people have purposes and the ‘computer’ has purposes built in
At issue: identification of important aspects, roles of each

37 HCI … definitions
“Interaction is the exchange of information between participants where each has the purpose of using the exchange to change the state of itself or of one or more of others”
“An interaction is a dialogue for the purpose of modifying the state of one or more participants”
Key concepts: exchange, change. For the user: change the state of knowledge related to a given problem, task, situation.

38 IR interaction is ...
“... the interactive communication processes that occur during the retrieval of information by involving all the major participants in IR, i.e. the user, the intermediary, and the IR system.” (Ingwersen, 1992)
Involved:
users
intermediaries (possibly)
everything in the IR system
communication processes - exchange of information

39 Questions
What variables are involved in interaction? Models give lists.
How do they affect the process? How to control? Experiments, experience, observation give answers.
Do given interventions (actions) or communications improve or degrade the process? E.g. searchers’ (intermediaries’ or end-users’) actions.
Can systems be designed so that the searcher’s intervention improves performance?

40 Interactive IR models
Several models proposed, none as widely accepted as the traditional IR model.
They all try to incorporate:
information objects (“texts”)
IR system & setting
interface
intermediary, if present
user’s characteristics: cognitive aspects; task; problem; interest; goal; preferences ...
social environment
variety of processes between them all

41 User modeling
(treated in unit 11, but introduced here to illustrate one of the important aspects of human-human interaction)
Identifying elements about a user that impact interaction, searching, types of retrieval …:
who is the user (e.g. education)
what is the problem, task at hand
what is the need; question
how much s/he knows about it
what it will be used for
how much is wanted, how fast
what environment is involved
Much more than just analyzing a question posed by the user; related to the reference interview.
Used to select resources, specify search concepts and terms, formulate query, select format and amount of results provided, follow up with feedback and reiteration, change tactics …

42 Three interactive models
Three differing models are presented here; each concentrates on a different thing:
Ingwersen concentrates on enumeration of general elements that enter into interaction
Belkin on different processes that are involved as interaction progresses through time
Saracevic on strata or levels of interaction elements on the computer and user sides
As mentioned, no one interaction model is as widely accepted as the traditional IR model.

43 Ingwersen’s interactive cognitive model
Among the first to view IR differently from the traditional model.
Includes IR as a system but concentrates also on elements outside the system that interact:
inf. objects: documents, images …
intermediary (you) & interface
user cognitive aspects
user & general environment
path of request (we call it question) from environment (problem) to query
path of cognitive changes
path of communication
various other paths of interactions

44 Ingwersen’s model graphically
[Diagram: Information objects, IR system setting, Interface/Intermediary, the user’s cognitive space and the Environment, linked through Query and Request; the paths shown are cognitive transformations and interactive communication.]

45 Belkin’s episodes model
Concentrates on what happens in interaction as a process (Ingwersen concentrated on elements).
Views interaction as a series of episodes where a number of different things happen over time:
depending on user’s goals, tasks
there is judgment, use, interpretation …
processes of navigation, comparison, summarization …
involving different aspects of information & inf. objects
While interacting we do diverse things, perform various tasks, & involve different objects.
Think: what do you do while searching?

46 Belkin’s episodes model
[Diagram: a series of episodes along a Time axis. In each episode the USER (goals, tasks ...) and INFORMATION (type, medium, mode, level) meet in INTERACTION (judgment, use, interpretation, modification), with the processes COMPARISON, REPRESENTATION, SUMMARIZATION, NAVIGATION and VISUALIZATION.]

47 Saracevic’s stratified model
Interaction: considers it as a sequence of processes/episodes occurring in several levels or strata. Interaction = INTERPLAY between levels.
Structure:
Several User levels. Produce a Query - it has characteristics.
Several Computer levels.
They all meet on the Surface level.
Dialogue enabled by Interface: user utterances; computer ‘utterances’.
Adaptation/changes in all.
Geared toward Information use.
The IR interaction is then a dialogue between the participants - user and computer - through an interface, with the main purpose to affect the cognitive state of the user for effective use of information in connection with an application at hand. The dialogue can be reiterative, incorporating among others various feedback types, and can exhibit a number of patterns - all of which are topics for study. The major elements in the stratified model are users and computer, each with a host of variables of their own, having a discourse through an interface. The interface instantiates a variety of interactions, but it is not the focus of interactions, even though it can in its own right effectively support or frustrate other interactions. We can think of interaction as a sequence of processes occurring in several connected levels or strata. Each stratum/level involves different elements and/or specific processes. On the human side processes may be physiological (e.g. visual, tactile, auditory), psychological, and cognitive.

48 Saracevic’s stratification model
[Diagram: INTERACTION STRATA (levels), set within Context (social, cultural …).
USER side: Situational (tasks; work context ...), Affective (intent; motivation ...), Cognitive (knowledge; structure ...), Query (characteristics …).
Surface level: INTERFACE.
COMPUTER side: Engineering (hardware; connections ...), Processing (software; algorithms …), Content (inf. objects; representations ...).
Adaptation runs through the levels on both sides; the process is geared toward Use of information.]

49 Roles of levels or strata
Defining of what’s involved: whassup?
Help in recognition/separation of differing variables: each stratum or level involves different elements, roles, & processes.
Observation of interaction between strata - complex dynamics.
On the user side, suggests what factors affect the query and judgment of responses; thus elements for user modeling.
The user side has a number of levels. I suggest three to start with: Cognitive, Affective, and Situational.
On the Cognitive level users interact with texts and their representations in the information resources, considering them as cognitive structures. Users interpret and judge cognitively the texts obtained, and may assimilate them cognitively.
On the Affective level users interact with their intentions, and all that goes with intentionality, such as beliefs, motivation, feelings (e.g. frustration), desires (e.g. for a given degree of completeness), urgency, and so on. Intentionality may be a critical aspect governing all the other user variables.
On the Situational level users interact with the given situation or problem-at-hand which produced the information need and resulting question. The results of the search may be applied to the resolution or partial resolution of a problem.
However, things are not that simple. The situation that was the reason for interaction to start with produced a problem that may sometimes be well and sometimes ill defined, and the related question, if not on paper then in the user’s mind, may also be defined in various well-ill degrees. In addition, a user also brings a given knowledge or cognitive state related to the situation, as well as an intentionality - these also may be well or ill defined. All this is used on the surface level to specify and modify queries, select files, search terms, search tactics, and other attributes to use in searching and decision-making, and make or change relevance inferences and other decisions.

50 Interplay between levels
Interplay on user side:
Cognitive: between cognitive structures of texts & users
Affective: between intentions & other
Situational: between texts & tasks
Similar interplay on computer side.
Surface level - interface: searching, navigation, browsing, display, visualization, query characterization …
Interplay judgments in searching: evaluation of results - relevance; changing of models: situation, need ...; selection of search terms; resulting modifications - feedback.
As the interaction proceeds, a series of dynamic adaptations occur in both elements, user and computer, concentrating toward the surface level, the point where they meet. However, I assume that the use of information proceeds toward application, that is, toward the situational level. Adaptations may also signify changes or shifts in a variety of these levels. Various types of feedback play a critical role in various types of adaptation and change. Of great interest is to study the nature, manifestations, and effects of these changes and shifts. Shifts, relatively little explored events, are probably among the most important ones that occur in interaction.
Intuitively, we understand that we are doing different things for different purposes while interacting with an IR system. Those who deal with design and other aspects of the computer side concentrate at different times on very different computer levels. The stratified model deliberately decomposes the many elements that enter into different types of interaction. In that sense the model is related to the idea of different kinds of interactions on the human side, as suggested by Belkin, and different types of ‘things’ and processes involved on the computer side, as suggested in the traditional IR model. It tries to incorporate both sides in the interaction.

51 Intermediaries - YOU
Intermediaries could participate as an additional interface, in many roles:
diagnostic help in problem, query formulation
system interface handling
selection, interpretation & manipulation of inf. resources
interpretation of results
education of users
enablers of end-users
Basic role: optimizing results.
Act in processes at different levels.

52 Implications
Interaction is central to IR, including in searching of the Web.
We see it on the surface level, but it is the result of MANY variables, levels & their interplay.
IR interaction requires knowledge of these levels & interplays: many users have difficulties; so do many professionals.
Design of interfaces for interaction is still lacking.
People compensate in many ways, including trial & error, failures.

53 What happens in searching?
Highly reiterative process: back & forth between user modeling & (re)formulating search strategy; goes on & on in many feedback loops, twists & turns, shifts.
Search strategy (the big picture): selection/reselection of sources; stating a query (search statement) from a question; terms, their expansions, logic, qualifications, limitations.

54 Searching … (cont.)
Search tactics (action steps): what to do first, next, e.g. from broad to narrow searches; format of results.
Evaluation of results: as to magnitude - how much? as to relevance - how well?
Feedback to change after that: user model - e.g. question; strategy - e.g. files, query; tactics - e.g. narrowing, broadening.

55 Practical suggestions for searchers (filched from a source I cannot find anymore)
Prepare carefully.
Understand your opponent, e.g. Dialog, Scopus, LexisNexis.
Anticipate, e.g. hidden meaning of terms.
Have a contingency plan: assessing odds of success or points of diminishing returns.
Avoid ambiguity inherent in language.
Stay loose!

56 Stay loose?
I copied that, but always wondered what it really means. Dictionary says: not firmly fastened or fixed in place ???? Well, sounds OK! or
