IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.


1 IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany

2 Course Contents (Provided)
– Modelling
– Query operations
– Markup languages
– XML technologies and their applications
– Searching the Web
– IR models and languages
– Indexing and searching
– Digital libraries
– Project: designing and developing parts of IR systems

3 A correction… Williams, H. E. and D. Lane, “Building Effective Database-Driven Web Sites”, 2004, ISBN-13: 9780596005436. Reference: “Web Database Applications with PHP and MySQL”, 2nd Edition

4 A correction regarding the book

5 Sessional Marks
– Mid-1: 20 marks
– Mid-2: 20 marks
– Assignment: 10 marks
– Project: 10 marks
– Final: 40 marks

6 Course Description
This course has two major inter-related portions:
– Information retrieval (more towards theoretical discussion and formulae)
– Web databases (more towards the practical side)
  – Web theory
  – PHP and MySQL

7 Definition of Information Retrieval
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

8 Types of Information
– Structured: databases
– Semi-structured: XML, RDF
– Unstructured: text documents
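To make the three types concrete, here is a small Python sketch that holds the same fact (a course code) in each form. The record, the XML snippet, and the sentence are invented for illustration and are not taken from the course material.

```python
import xml.etree.ElementTree as ET

# Structured: a fixed schema; fields are addressed by name, as in a database row.
row = {"code": "IT-522", "title": "Web Databases And Information Retrieval"}

# Semi-structured: self-describing markup; tags carry the structure.
xml_doc = (
    "<course><code>IT-522</code>"
    "<title>Web Databases And Information Retrieval</title></course>"
)
course = ET.fromstring(xml_doc)

# Unstructured: plain text; the code can only be found by searching the text.
text = "The course IT-522 covers web databases and information retrieval."

code_from_row = row["code"]              # direct field access
code_from_xml = course.findtext("code")  # navigate by tag name
code_in_text = "IT-522" in text          # substring search only
```

The point of the sketch is the access pattern: the more structure the data has, the more precisely a value can be addressed without searching.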

9 Information Retrieval
– The indexing and retrieval of textual documents.
– Searching for pages on the World Wide Web is the most recent and perhaps the most widely used IR application.
– Concerned firstly with retrieving documents that are relevant to a query.
– Concerned secondly with retrieving efficiently from large sets of documents.

10 Relevance
Relevance is a subjective judgment and may include:
– Being on the proper subject.
– Being timely (recent information).
– Being authoritative (from a trusted source).
– Satisfying the goals of the user and his/her intended use of the information (information need).
Main relevance criterion: an IR system should fulfill the user’s information need.

11 Typical IR Task
Given:
– A corpus of textual natural-language documents.
– A user query in the form of a textual string.
Find:
– A ranked set of documents that are relevant to the query.
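The task above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus is a toy in-memory dictionary, and the score (the number of times query terms occur in a document) is a placeholder chosen for simplicity, since the slides do not prescribe a ranking function. The names `tokenize`, `rank`, and `corpus` are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def rank(query, corpus):
    """Return (doc_id, score) pairs, best first; documents scoring 0 are omitted."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        # Placeholder score: total occurrences of query terms in the document.
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

corpus = {
    "Doc1": "Information retrieval finds relevant documents.",
    "Doc2": "Databases store structured information.",
    "Doc3": "Web spiders crawl pages; retrieval systems index pages for retrieval.",
}
ranked = rank("information retrieval", corpus)
```

With this toy corpus, Doc1 and Doc3 each score 2 and Doc2 scores 1. Real systems replace the placeholder score with a weighting scheme drawn from an IR model.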

12 Typical IR System Architecture
[Diagram: a query string and a document corpus are the inputs to the IR system, which outputs ranked documents: 1. Doc1, 2. Doc2, 3. Doc3, and so on.]

13 Key Terms Used in IR
– QUERY: a representation of what the user is looking for; can be a list of words or a phrase.
– DOCUMENT: an information entity that the user wants to retrieve.
– COLLECTION: a set of documents.
– INDEX: a representation of information that makes querying easier.
– TERM: a word or concept that appears in a document or a query.
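One common form of index is an inverted index, which maps each term to the set of documents that contain it. The slides do not name a specific index structure, so the following Python sketch is an assumption made for illustration; `build_index`, `lookup`, and the sample collection are hypothetical.

```python
from collections import defaultdict

def build_index(collection):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, query):
    """Return ids of the documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

collection = {
    "d1": "web databases with php",
    "d2": "information retrieval on the web",
    "d3": "php and mysql",
}
index = build_index(collection)
```

Here `lookup(index, "web")` returns `{"d1", "d2"}` and `lookup(index, "php web")` returns `{"d1"}`. The index makes querying easier because a query only touches the entries for its own terms instead of scanning every document.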

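14 Web Search System
[Diagram: a Web spider gathers pages into the document corpus; a query string and the corpus are the inputs to the IR system, which outputs ranked documents: 1. Page1, 2. Page2, 3. Page3, and so on.]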

15 A spider is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, also known as a "crawler" or a "bot." Spiders are typically programmed to visit sites that have been submitted by their owners as new or updated. Entire sites or specific pages can be selectively visited and indexed. Spiders are called spiders because they usually visit many sites in parallel, their "legs" spanning a large area of the "web." Spiders can crawl through a site's pages in several ways. One way is to follow all the hypertext links in each page until all the pages have been read.
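The link-following strategy described above can be sketched as a breadth-first traversal. To keep the example self-contained, the site is modeled as an in-memory dictionary from page to outgoing links; a real spider would fetch each page over HTTP and extract its links. The function `crawl` and the sample `site` are hypothetical.

```python
from collections import deque

def crawl(start, links):
    """Follow links breadth-first from `start`; return pages in visit order."""
    visited = []
    seen = {start}           # pages already queued, so none is read twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real spider would fetch and index the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

# Toy site: each page maps to the pages it links to.
site = {
    "/": ["/about", "/courses"],
    "/about": ["/"],
    "/courses": ["/courses/it-522", "/about"],
    "/courses/it-522": [],
}
order = crawl("/", site)
```

The `seen` set is what lets the traversal stop once all reachable pages have been read, even though the pages link back to one another.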

