Information Retrieval and Web Design

Information Retrieval and Web Design Lecture (6) Prepared by Dr. Dunia Hamid Hameed

Information Retrieval
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Structured and Unstructured Data
The term “unstructured data” refers to data that does not have a clear, semantically overt, easy-for-a-computer structure. It is the opposite of “structured data”, the canonical example of which is a relational database, of the sort companies usually use to maintain product inventories and personnel records.

IR is also used to facilitate “semistructured” search, such as finding a document whose title contains “Java” and whose body contains “threading”.
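The semistructured query above can be sketched in a few lines of Python. This is only an illustration: the `docs` collection, its `title` and `body` field names, and the `field_search` helper are assumptions made for the example, and matching is done by plain case-insensitive substring tests rather than by a real index.

```python
# Hypothetical toy collection: each document has separate "title" and "body" fields.
docs = [
    {"id": 1, "title": "Java Concurrency Basics", "body": "Threading and locks explained."},
    {"id": 2, "title": "Python Tips", "body": "Threading in Python uses the GIL."},
    {"id": 3, "title": "Java Collections", "body": "Lists, maps and sets."},
]


def field_search(documents, title_term, body_term):
    """Return ids of documents whose title contains title_term
    and whose body contains body_term (case-insensitive substring match)."""
    title_term = title_term.lower()
    body_term = body_term.lower()
    return [
        d["id"]
        for d in documents
        if title_term in d["title"].lower() and body_term in d["body"].lower()
    ]


matches = field_search(docs, "Java", "threading")
print(matches)  # [1]
```

Only document 1 satisfies both field conditions; document 2 fails the title test and document 3 fails the body test.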

Clustering and Classification
Given a set of documents, clustering is the task of coming up with a good grouping of the documents based on their contents. It is similar to arranging books on a bookshelf according to their topic.
Given a set of topics, standing information needs, or other categories (such as suitability of texts for different age groups), classification is the task of deciding which class(es), if any, each of a set of documents belongs to. It is often approached by first classifying some documents manually and then using them to classify new documents automatically.
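The classification workflow (label a few documents by hand, then classify new ones automatically) can be illustrated with a very small Python sketch. It covers classification only, not clustering. The labelled examples, the class names, and the word-overlap rule are all assumptions chosen for illustration; practical systems use stronger methods.

```python
from collections import Counter

# Hypothetical hand-labelled documents (the "manual classification" step).
labelled = [
    ("sports", "the team won the football match"),
    ("sports", "the player scored a goal in the match"),
    ("cooking", "bake the cake in a hot oven"),
    ("cooking", "mix flour and sugar for the cake"),
]


def build_profiles(examples):
    """Sum word counts per class to get one bag-of-words profile per class."""
    profiles = {}
    for label, text in examples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(text, profiles):
    """Assign the class whose profile shares the most word occurrences with text."""
    words = Counter(text.lower().split())

    def overlap(label):
        return sum(min(count, profiles[label][w]) for w, count in words.items())

    return max(profiles, key=overlap)


profiles = build_profiles(labelled)
print(classify("who scored the goal", profiles))  # sports
print(classify("sugar cake recipe", profiles))    # cooking
```

Note that a document sharing no words with any class is still assigned to one, so this sketch cannot express the "if any" case from the definition above.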

IR Scales
1- In web search, the system has to provide search over billions of documents stored on millions of computers.
2- At the other extreme is personal information retrieval.
3- In between is the space of enterprise, institutional, and domain-specific search, where retrieval might be provided for collections.

Web search has its roots in information retrieval (or IR for short), a field of study that helps the user find needed information from a large collection of text documents.

Retrieving information simply means finding a set of documents that is relevant to the user query. A ranking of the set of documents is usually also performed according to their relevance scores to the query. The most commonly used query format is a list of keywords, which are also called terms.
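As a rough illustration of keyword retrieval with ranking, the sketch below scores each document by how often the query terms occur in it and returns the matching documents in decreasing score order. The term-count score, the function names, and the three sample documents are assumptions made for the example; the slides do not say which scoring function is used.

```python
def score(query_terms, text):
    """Relevance score: total number of times the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms)


def search(query, documents):
    """Return (doc_id, score) pairs for matching documents, highest score first."""
    terms = query.lower().split()
    scored = [(doc_id, score(terms, text)) for doc_id, text in documents.items()]
    relevant = [(doc_id, s) for doc_id, s in scored if s > 0]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini collection.
documents = {
    "d1": "web search engines rank web pages",
    "d2": "databases store structured data",
    "d3": "a search engine answers a keyword query",
}

results = search("web search", documents)
print(results)  # [('d1', 3), ('d3', 1)]
```

Document d2 contains neither term and is left out of the result set; d1 ranks above d3 because it matches more term occurrences.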

IR is different from data retrieval in databases using SQL queries because the data in databases are highly structured and stored in relational tables, while information in text is unstructured. There is no structured query language like SQL for text retrieval.
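The contrast can be made concrete with a small sketch using Python's built-in `sqlite3` module. The `products` table, its columns, and the sample review texts are invented for illustration. The SQL query states an exact condition on typed columns, so a row either satisfies it or does not; the keyword query over free text can only measure how well each text matches.

```python
import sqlite3

# Data retrieval: structured rows queried with exact conditions in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL, stock INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("keyboard", 25.0, 10), ("monitor", 150.0, 0), ("mouse", 12.5, 40)],
)
in_stock_cheap = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM products WHERE price < 50 AND stock > 0 ORDER BY name"
    )
]
conn.close()
print(in_stock_cheap)  # ['keyboard', 'mouse']

# Information retrieval: unstructured text matched against keywords.
# Each text matches to a degree (here, the number of query terms it contains).
reviews = {
    "r1": "cheap keyboard with quiet keys",
    "r2": "the monitor is bright but not cheap",
    "r3": "fast delivery",
}
query = {"cheap", "keyboard"}
overlap = {rid: len(query & set(text.split())) for rid, text in reviews.items()}
print(overlap)  # {'r1': 2, 'r2': 1, 'r3': 0}
```

Review r2 shows why text retrieval is harder: it contains "cheap" but says the opposite, which a keyword match cannot detect.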

Web pages are also quite different from conventional text documents used in traditional IR systems:
1- Web pages have hyperlinks and anchor texts, which do not exist in traditional documents.
2- Web pages are semi-structured. A Web page is not simply a few paragraphs of text, as in a traditional document. A Web page has different fields, e.g., title, metadata, body, etc.
3- The content in a page is typically organized and presented in several structured blocks (of rectangular shapes). Some blocks are important and some are not (e.g., advertisements, privacy policy, copyright notices, etc.).
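To show what fields and anchor texts look like to a program, here is a minimal sketch using Python's standard `html.parser` module. The `PageFields` class and the sample page (including its example.com URL) are hypothetical. The sketch pulls out only the title and the hyperlinks with their anchor text, and it does not attempt to detect page blocks.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collect the page title and (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []        # list of (href, anchor text) tuples
        self._in_title = False
        self._href = None      # href of the <a> element currently open, if any
        self._anchor = []      # text pieces seen inside the open <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._anchor.append(data)


# Hypothetical sample page.
page = """<html><head><title>IR Lecture</title></head>
<body><p>See the <a href="https://example.com/ir">course notes</a>.</p></body></html>"""

parser = PageFields()
parser.feed(page)
print(parser.title)  # IR Lecture
print(parser.links)  # [('https://example.com/ir', 'course notes')]
```

A search engine can index these fields separately, for example by treating anchor text as a description of the page the link points to.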