Information Retrieval and Web Design

1 Information Retrieval and Web Design
Lecture (6) Prepared by Dr. Dunia Hamid Hameed

2 Information Retrieval
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

3 Structured and Unstructured Data
The term “unstructured data” refers to data that does not have a clear, semantically overt, easy-for-a-computer structure. It is the opposite of “structured data”, the canonical example of which is a relational database, of the sort companies usually use to maintain product inventories and personnel records.

4 IR is also used to facilitate “semistructured” search, such as finding a document where the title contains “Java” and the body contains “threading”.
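The title/body query above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` class, the `field_search` function, and the sample documents are hypothetical, and "contains" is taken to mean a case-insensitive whole-word match.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical two-field document: a title and a body.
    title: str
    body: str


def field_search(docs, title_term, body_term):
    """Return documents whose title contains title_term and whose body
    contains body_term (case-insensitive, whole-word match)."""
    title_term = title_term.lower()
    body_term = body_term.lower()
    return [
        d for d in docs
        if title_term in d.title.lower().split()
        and body_term in d.body.lower().split()
    ]


# Made-up sample collection.
docs = [
    Document("Java concurrency notes", "An overview of threading and locks"),
    Document("Java collections", "Lists, maps and sets"),
    Document("Python tips", "Notes on threading in Python"),
]

matches = field_search(docs, "Java", "threading")
print([d.title for d in matches])
```

Only the first document satisfies both field conditions; the other two each match one field but not the other.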

5 Clustering and Classification
Given a set of documents, clustering is the task of coming up with a good grouping of the documents based on their contents. It is similar to arranging books on a bookshelf according to their topic.
Given a set of topics, standing information needs, or other categories (such as suitability of texts for different age groups), classification is the task of deciding which class(es), if any, each of a set of documents belongs to. It is often approached by first manually classifying some documents and then hoping to be able to classify new documents automatically.
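The classification workflow described above (label some documents by hand, then classify new ones automatically) might look like the following toy sketch. The word-overlap scoring, the function names, and the sample data are assumptions made for illustration; real classifiers use more careful statistics and usually discard very common words such as "the".

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Build one word-count profile per class from manually labeled documents."""
    profiles = {}
    for text, label in labeled_docs:
        profiles.setdefault(label, Counter()).update(tokenize(text))
    return profiles


def classify(text, profiles):
    """Return the class whose profile overlaps most with the text,
    or None if the text shares no words with any class."""
    words = tokenize(text)
    scores = {
        label: sum(profile[w] for w in words)
        for label, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Manually classified examples (made-up data).
labeled = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell on inflation fears", "finance"),
]
profiles = train(labeled)

print(classify("a late goal in the match", profiles))
print(classify("interest rates and inflation", profiles))
print(classify("quantum physics", profiles))
```

Returning `None` for a document with no overlap reflects the "if any" in the definition: a document need not belong to any of the given classes.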

6 I.R. Scales
1- In web search, the system has to provide search over billions of documents stored on millions of computers.
2- At the other extreme is personal information retrieval.
3- In between is the space of enterprise, institutional, and domain-specific search, where retrieval might be provided for collections.

7 Web search has its roots in information retrieval (or IR for short), a field of study that helps the user find needed information from a large collection of text documents.

8 Retrieving information simply means finding a set of documents that is relevant to the user query. A ranking of the set of documents is usually also performed according to their relevance scores to the query. The most commonly used query format is a list of keywords, which are also called terms.
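A minimal sketch of keyword retrieval with ranking, assuming the relevance score of a document is simply the number of times the query terms occur in it. That scoring rule, the function names, and the sample documents are illustrative assumptions, not the scheme of any particular search engine.

```python
def score(query_terms, doc_text):
    """Relevance score: total occurrences of the query terms in the document."""
    words = doc_text.lower().split()
    return sum(words.count(term.lower()) for term in query_terms)


def retrieve(query, docs):
    """Return (doc_id, score) pairs for documents with a non-zero score,
    ranked from highest to lowest score (ties broken by doc_id)."""
    terms = query.split()
    scored = [(score(terms, text), doc_id) for doc_id, text in docs.items()]
    relevant = [(s, d) for s, d in scored if s > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(d, s) for s, d in relevant]


# Made-up collection keyed by document id.
docs = {
    "d1": "web search engines index web pages",
    "d2": "information retrieval finds relevant documents",
    "d3": "a search over documents on the web",
}

results = retrieve("web search", docs)
print(results)
```

Document `d2` contains neither query term, so it is left out of the result set, while `d1` ranks above `d3` because "web" occurs in it twice.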

9 IR is different from data retrieval in databases using SQL queries because the data in databases are highly structured and stored in relational tables, while information in text is unstructured. There is no structured query language like SQL for text retrieval.
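To make the contrast concrete, the sketch below runs an exact SQL query against a small relational table and then a keyword match against free text. The table, column names, and notes are made up for illustration; the database part uses Python's standard `sqlite3` module.

```python
import sqlite3

# Structured data: a relational table with typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("keyboard", 12), ("monitor", 3), ("mouse", 0)],
)
# The SQL query states an exact condition and gets an exact answer.
low_stock = conn.execute(
    "SELECT product FROM inventory WHERE quantity < 5 ORDER BY product"
).fetchall()
conn.close()

# Unstructured data: free text with no columns to query.
notes = [
    "We are almost out of monitors",
    "The monitor shelf needs restocking",
]
# A keyword match is only an approximation of the information need:
# the first note is relevant but uses the plural form, so it is missed.
text_hits = [n for n in notes if "monitor" in n.lower().split()]

print(low_stock)
print(text_hits)
```

The missed first note shows why text retrieval relies on relevance scoring and ranking rather than exact conditions.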

10 Web pages are also quite different from conventional text documents used in traditional IR systems:
Web pages have hyperlinks and anchor texts, which do not exist in traditional documents.
Web pages are semi-structured. A Web page is not simply a few paragraphs of text, as in a traditional document. A Web page has different fields, e.g., title, metadata, body, etc.
The content in a page is typically organized and presented in several structured blocks (of rectangular shapes). Some blocks are important and some are not (e.g., advertisements, privacy policy, copyright notices, etc.).
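As a rough illustration of these fields, the sketch below pulls the title, hyperlinks, and anchor texts out of a small HTML page using Python's standard `html.parser` module. The `PageFields` class and the sample page are hypothetical, and the parser ignores many details that a real crawler would have to handle (nested tags inside links, metadata, block detection).

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collect the page title and (href, anchor text) pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []          # list of (href, anchor text)
        self._in_title = False
        self._href = None        # href of the link currently being read
        self._anchor = []        # text pieces inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._anchor.append(data)


# Made-up sample page.
page = """<html><head><title>IR Lecture 6</title></head>
<body><p>See the <a href="https://example.com/ir">course notes</a>.</p></body></html>"""

parser = PageFields()
parser.feed(page)
print(parser.title)
print(parser.links)
```

The anchor text ("course notes") describes the page being linked to, which is information a traditional plain-text document does not carry.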

