1
WMES3103 INFORMATION RETRIEVAL WEEK 1 AND 2
2
WHAT IS INFORMATION RETRIEVAL?
Information Retrieval (IR)
Lancaster (1968): An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.
3
IR – process of getting/retrieving information
Now: a lot of information, both print and electronic.
Requirement: obtain information quickly and accurately.
IR aims to provide fast, effective and efficient methods of representing, managing, searching, retrieving and presenting such information.
IR = the representation, storage and organization of, and access to, information items.
4
Computer science perspective
Design and build a large-scale system that will store, manipulate, retrieve and display electronic information of any kind: text, audio, image and graphics, stored in such a way that they are available for interaction with human or machine.
Library and information perspective
Search features: author (au), title (ti), subject (su), keywords.
Relevance of retrieved items.
5
Examples of IRS
6
Examples of IRS
7
3 challenges for IR researchers and practitioners
Technical challenge: what tools should IR systems provide to allow effective and efficient manipulation of information within such diverse media as text, image, video and audio?
Interaction challenge: what features should IR systems provide in order to support a wide variety of users in their search for relevant information?
Evaluation challenge: how can we evaluate which tools and features are effective and usable, given the increasing diversity of end-users and information-seeking situations?
8
3 basic areas of research
Content analysis: describing the contents of the documents in a form suitable for computer processing.
Information structures: exploiting relationships between documents to improve the efficiency and effectiveness of retrieval strategies.
Evaluation: measurement of the effectiveness of retrieval.
9
Information Retrieval System
Information Retrieval System = IRS
Before: index documents and retrieve them. Eg. the OPAC of a library (cataloguing).
Now: modelling, document classification and categorization, system architecture, user interface, data visualization, filtering, languages. Eg. the WWW.
10
Basic Information Retrieval Process
Question, or full description of the user's information need.
Translate into a query, or keywords which summarize the description of the user's information need.
The query is processed by a search engine or IRS.
The IRS retrieves information which is useful/relevant to the user.
11
Basic Concepts in Information Retrieval
User task
Logical view of documents
12
User Task
A user has to translate his information need into a query in the language provided by the system, usually by specifying a set of words.
English-language statement: I want a book by J. K. Rowling titled The Chamber of Secrets.
13
Query entered in a computer system
Au = Rowling Ti = Chamber of Secrets
“Chamber of Secrets”
Rowling AND Stone
Au rowling ti chamber of secrets ti stone
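The field-tagged queries on this slide (for example `Au = Rowling Ti = Chamber of Secrets`) can be read mechanically. The following is a minimal sketch, not the syntax of any particular OPAC: the function name `parse_query`, the tag table `FIELD_NAMES`, and the rule that a tag may be followed by an optional `=` are all assumptions made for illustration.

```python
import re

# Hypothetical tag table modelled on the slide's "Au" / "Ti" notation.
FIELD_NAMES = {"au": "author", "ti": "title", "su": "subject"}

# A tag is "au", "ti" or "su" as a whole word, optionally followed by "=".
TAG_PATTERN = re.compile(r"\b(au|ti|su)\b\s*=?\s*", re.IGNORECASE)


def parse_query(query):
    """Split a field-tagged query into {field name: [values]}."""
    parsed = {}
    matches = list(TAG_PATTERN.finditer(query))
    for i, match in enumerate(matches):
        # The value runs from the end of this tag to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(query)
        value = query[match.end():end].strip()
        if value:
            field = FIELD_NAMES[match.group(1).lower()]
            parsed.setdefault(field, []).append(value)
    return parsed


print(parse_query("Au = Rowling Ti = Chamber of Secrets"))
# {'author': ['Rowling'], 'title': ['Chamber of Secrets']}
```

A sketch this small will mis-split any value that itself contains a tag as a whole word; a real system would need a stricter query language.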
14
2 User Tasks: browsing and retrieval
Browsing: the process of retrieving information whereby the main objective is not clearly defined from the beginning and may change during the interaction with the system. Eg. a user searching the internet for information about marine organisms goes on to look for information about Australian Aborigines; the user is said to be browsing the collection, not searching it. Eg. looking for a book on the library shelves.
15
Retrieval
The process of retrieving information whereby the main objective is clearly defined from the onset of the search process. Eg. searching for a book on the library shelves.
16
2 actions when user interacts with an IRS
Two actions can be identified when a user interacts with an IRS: pulling and pushing.
Pulling action: the user requests information interactively, eg. browsing and retrieval.
Pushing action: information is pushed towards the user periodically by specified or specially designed software; this is also known as filtering. Eg. the Yahoo Messenger service alerts the user each time a new message arrives; online stock exchange.
17
Interaction of the user with the IRS through distinct tasks
[Diagram; surviving labels: USER, Browsing, DB]
18
Logical View of Documents
Documents in a collection are represented by a set of index terms or keywords.
Keywords, abstract, full text.
19
Logical View of Documents
Documents in a collection are represented by a set of index terms/keywords.
Documents go through an indexing process in which keywords are either extracted from the text of the document or assigned by humans.
Keywords/subject headings = logical view of the document.
20
LISANET – search by abstract
21
MJLIS - EJournal
22
If full text
Each word in the text is a keyword. Most complex form. Expensive.
If the full text is too large, there are mechanisms built into the IRS to reduce the number of keywords:
23
Logical view of documents (continued)
Stop words (eg. articles and connectives: a, the, an, and, of, etc.).
Stemming (reducing distinct words to their common grammatical root), eg. diary will find diary or diaries, and run will represent runs and running.
Truncation, eg. catalog* will retrieve catalog, catalogs, catalogue, catalogues.
Noun words (eliminating adjectives, adverbs and verbs).
Compression.
Together these steps make up the conversion process.
24
Logical view of documents (continued)
This conversion process is known as text operations or transformation. It reduces the complexity of the document representation and allows the logical view to move from that of a full text to a set of index terms. On the other hand, human-assigned keywords provide the most concise logical view of a document but might lead to retrieval of poor quality, because of different interpretations and, if a thesaurus is used, a limited set of keywords.
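The text operations described on these slides (stop-word removal, stemming, truncation) can be sketched in a few lines. This is an illustration under stated assumptions only: the stop list contains just the slide's examples, `crude_stem` is a hypothetical suffix stripper and not a real stemming algorithm such as Porter's, and all function names are invented for this example.

```python
import re

# Only the slide's examples; real stop lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "of"}


def tokenize(text):
    """Lower-case the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip one common suffix; deliberately naive, for illustration only."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("s", "")):
        # Leave very short words alone so that e.g. "is" is not reduced to "i".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def index_terms(text):
    """Full text -> sorted set of index terms (stop words dropped, words stemmed)."""
    return sorted({crude_stem(w) for w in tokenize(text) if w not in STOP_WORDS})


def truncation_match(pattern, vocabulary):
    """'catalog*' matches every vocabulary word that starts with 'catalog'."""
    prefix = pattern.rstrip("*")
    return sorted(w for w in vocabulary if w.startswith(prefix))


print(index_terms("The diaries of a librarian and the catalogues"))
# ['catalogue', 'diary', 'librarian']
```

The naive stemmer maps runs to run but turns running into runn, which shows why production systems use more careful stemming rules.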
25
2 modes of retrieval
Ad hoc: the documents in the IRS remain static while new queries are submitted to the system, eg. a CD-ROM database.
Filtering: the queries remain relatively static while new documents come into the IRS, eg. stock market.
26
Filtering
Construct a user profile that reflects the user's preferences; the profile is matched against incoming documents to find a match or a hit.
Retrieve only documents of interest to the user, as specified in the user profile.
The user selects relevant documents from the list. Filtered documents can also be ranked to further assist the user as to relevance.
Construction of a user profile: the user provides the necessary keywords, or the system collects information about the user's preferences and uses it to construct a user profile dynamically.
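Filtering as described here, a fairly static profile matched against a stream of incoming documents, can be sketched as follows. The keyword-set profile, the hit-count score and the names `count_hits` and `filter_stream` are assumptions chosen for illustration; the slides do not specify how the match is computed.

```python
def count_hits(profile_keywords, document_text):
    """Number of profile keywords that occur as words in the document."""
    words = set(document_text.lower().split())
    return len(profile_keywords & words)


def filter_stream(profile_keywords, incoming_documents, min_hits=1):
    """Keep documents that match the profile, ranked by number of hits."""
    hits = []
    for doc_id, text in incoming_documents:
        score = count_hits(profile_keywords, text)
        if score >= min_hits:
            hits.append((score, doc_id))
    # Highest score first; ties are broken by document id for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


profile = {"stock", "bank", "shares"}  # hypothetical user profile
stream = [
    ("d1", "Bank shares rise as stock market rallies"),
    ("d2", "Local football team wins"),
    ("d3", "Stock prices fall"),
]
print(filter_stream(profile, stream))
# ['d1', 'd3']
```

A dynamically constructed profile would update `profile` from the documents the user marks as relevant, rather than relying only on keywords supplied up front.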
27
INFORMATION RETRIEVAL PROCESS
A. DEFINE TEXT DATABASE
The text database has to be defined before the retrieval process begins. This is done by the database manager, who specifies the documents to be used, the operations to be performed on the text, and the text model.
The original documents are transformed into a logical view of the documents via the various text operations.
The database manager then builds the index of the text, either manually or by computer.
The retrieval system is tested.
28
B. RETRIEVAL PROCESS
The IRS can be used once the document database has been indexed.
The user presents his question/user need to the IRS.
The question is changed, via the text operations, into the same kind of logical view used for the documents.
The query operations then present this to the system in a form the system can understand.
The query is processed to obtain the retrieved documents.
29
Continued…
The retrieved documents are ranked according to relevance.
The retrieved documents are sent to the user.
The user looks through the ranked documents and can modify the question/user need/query via the user feedback cycle.
The same process is repeated.
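The retrieve, rank and feedback cycle above can be illustrated with a toy example. The scoring rule used here (number of terms shared by query and document) is an assumption picked for brevity, not the ranking method of any particular IRS, and the names `to_terms` and `retrieve` are hypothetical.

```python
def to_terms(text):
    """Stand-in for the text operations: lower-case and split on whitespace."""
    return set(text.lower().split())


def retrieve(query, documents):
    """Return ids of documents sharing a term with the query, best match first."""
    query_terms = to_terms(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & to_terms(text))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # More shared terms ranks higher; ties are ordered by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


documents = {
    "d1": "harry potter and the chamber of secrets",
    "d2": "secrets of the deep sea",
    "d3": "a history of the library",
}

print(retrieve("secrets", documents))      # first attempt: ['d1', 'd2']
print(retrieve("secrets sea", documents))  # refined after feedback: ['d2', 'd1']
```

The second call stands for one turn of the user feedback cycle: the user inspects the ranked list, adjusts the query, and the same process runs again.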
30
DEVELOPMENT
For the past 4000 years, man has been organizing information for retrieval and usage. It started out with a table of contents for a book. Then the amount of information extended over a number of books, and a specialized data structure was needed to ensure faster access to the stored information. The oldest and most popular form of data structure for fast IR is a collection of words or concepts with which are associated pointers to the related information = INDEX. Previously, indexes were built manually.
31
Development (continued)
Now, with the advent of computers, large indexes can be generated automatically. These automatic indexes provide the logical view of the document as perceived by the system and not the user.
2 different views of the IR problem:
Computer-centered: building efficient indexes, processing user queries with high performance, and developing ranking algorithms which will improve the quality of the answer set.
Human-centered: studying the behavior of the user, understanding his main needs, and determining how such understanding affects the organization and operation of the IRS.
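The index described on the development slides, a collection of words with associated pointers to the related information, corresponds to what is usually called an inverted index. Below is a minimal sketch of generating one automatically; the sample documents and the function names `build_index` and `lookup` are assumptions for illustration, and tokenization is reduced to whitespace splitting.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, word):
    """Follow the 'pointers' for one word; unknown words give an empty list."""
    return sorted(index.get(word.lower(), set()))


docs = {
    1: "marine organism survey",
    2: "marine biology",
    3: "library catalogue",
}
index = build_index(docs)
print(lookup(index, "marine"))
# [1, 2]
```

Looking up a word costs one dictionary access instead of a scan of every document, which is the faster access the slide refers to.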
32
IR in the Library
Libraries were among the first users of IRS to retrieve information. The systems were usually developed by academic institutions and later by commercial vendors.
1st generation: automation of the card catalog, allowing searches based on author and title.
2nd generation: increased search functionality, such as searching by subject headings, keywords and complex queries (OPAC).
3rd generation: graphical interfaces, electronic forms, hypertext features, open system architecture (Digital Libraries).
33
The Web and Digital Libraries
Search engines on the web still use indexes similar to the ones used by libraries years ago. So, what has changed?
Advances in computer technology have led to:
Cheaper access to various sources of information.
Greater access to networks, due to advances in all kinds of digital communication.
Freedom to post information on the web.
34
Problems
People still find it difficult to retrieve information relevant to their information needs from the web.
Issues to address:
The dynamic world of the web.
Demand for access and quick response.
The quality of the retrieval task is affected by the user's interaction with the system.
35
THANK YOU