1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2nd Semester 2008-2009.

2 QUERY LANGUAGES CHAPTER 4

3 What is a Query? A query is a representation of the user’s information need. It is composed of keywords, and the system searches for documents containing those keywords. A query may not represent the information need exactly because information needs are difficult to describe (semantic difficulty), and the query must be in a format acceptable to the retrieval system (syntactic difficulty). A query can be a single word or a combination of several words.

4 Types of Queries 1. Single-Word Queries. 2. Context Queries. 3. Boolean Queries. 4. Natural Language Queries.

5 1. Single-Word Queries The elementary query in a text retrieval system is a word. A word is a sequence of letters surrounded by separators, where the alphabet is split into ‘letters’ and ‘separators’. The choice of which characters are letters and which are separators is left to the manager of the text database.

6 Single-Word Queries Cont… Example: in the word ‘On-line’, the hyphen is not a letter, but it does not split the word either.
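To make the ‘letters’ versus ‘separators’ decision concrete, here is a minimal sketch of a configurable word splitter. The function name `tokenize` and its parameters are hypothetical; it assumes letters are alphanumeric characters and models the hyphen as a "joiner" that stays inside a word only when it sits between two letters, with every other character acting as a separator.

```python
def tokenize(text, is_letter=str.isalnum, joiners=frozenset("-")):
    """Split text into words (hypothetical helper, for illustration).

    Letters build words; a joiner character (here the hyphen) is kept
    inside a word when it sits between two letters; every other
    character acts as a separator and ends the current word.
    """
    words, current = [], []
    for i, ch in enumerate(text):
        if is_letter(ch):
            current.append(ch)
        elif (ch in joiners and current
              and i + 1 < len(text) and is_letter(text[i + 1])):
            current.append(ch)  # joiner between letters: do not split
        elif current:
            words.append("".join(current))  # separator ends the word
            current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("On-line retrieval, of text"))
print(tokenize("On-line", joiners=frozenset()))  # hyphen as a plain separator
```

Passing an empty `joiners` set shows how the same text splits differently once the database manager decides the hyphen is an ordinary separator.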

7 Single-Word Queries Cont… The division of text into words is not arbitrary: many models (e.g., the vector model) are built entirely on the concept of words, and words are the only type of query they allow. The result of a word query is the set of documents containing at least one of the words in the query, and the resulting documents are ranked according to their degree of similarity to the query.
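A minimal sketch of answering a word query as described above: keep every document that contains at least one query word, then rank by similarity. The slides do not fix a similarity measure, so this assumes raw term-frequency vectors compared with cosine similarity; the names `word_query` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def word_query(query_words, docs):
    """Return (doc_id, score) pairs, best first, for documents that
    contain at least one query word (assumed TF + cosine ranking)."""
    q = Counter(w.lower() for w in query_words)
    results = []
    for doc_id, text in docs.items():
        d = Counter(text.lower().split())
        if any(w in d for w in q):  # at least one query word occurs
            results.append((doc_id, cosine(q, d)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


docs = {
    "d1": "information retrieval systems",
    "d2": "retrieval of information retrieval",
    "d3": "database systems",
}
print(word_query(["retrieval", "information"], docs))
```

Here `d3` is dropped because it contains neither query word, and `d2` outranks `d1` because it repeats ‘retrieval’.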

8 2. Context Queries Many systems can search for words in a given context, that is, near other words. For example, the word ‘network’ is more likely to be relevant when it appears near ‘computer’ or ‘computer science’.

9 Types of Context Queries 1. Phrase: a sequence of single-word queries. For example, the phrase ‘enhance retrieval’ searches for the word ‘enhance’ and then for the word ‘retrieval’. It is understood that the separators in the text need not be the same as those in the query (e.g., two spaces versus one space), and uninteresting words are not considered at all. Therefore, the previous example could match a text such as ‘…enhance the retrieval…’.
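A minimal sketch of phrase matching under the two rules above: any run of non-letters counts as one separator, and uninteresting words are skipped. The stopword list and the names `words` and `phrase_match` are hypothetical, chosen only for illustration.

```python
import re

STOPWORDS = {"the", "a", "an", "of"}  # hypothetical list of uninteresting words


def words(text):
    """Lower-cased words; any run of non-letters acts as a single
    separator, so two spaces and one space are treated the same."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def phrase_match(phrase, text):
    """True if the phrase's words occur consecutively in the text,
    after stopwords have been removed from both."""
    p, t = words(phrase), words(text)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


print(phrase_match("enhance retrieval", "we enhance  the retrieval of text"))  # True
print(phrase_match("enhance retrieval", "retrieval can enhance"))             # False
```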

10 Types of Context Queries Cont... 2. Proximity: the sequence of single words or phrases is given together with a maximum allowed distance between them. For example, if ‘enhance’ and ‘retrieval’ must occur within four words of each other, a match could be ‘…enhance the power of retrieval…’. The distance may be measured in characters or in words, depending on the system.

11 Types of Context Queries Cont… Depending on the system, the words and phrases may or may not be required to appear in the same order as in the query. Proximity queries can be ranked in the same way as single words if the ranking technique does not depend on physical proximity, for instance when proximity only means that the words appear in the same paragraph.
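A minimal sketch of a two-word proximity query. It assumes distance is measured in word positions (the slides note that some systems use characters instead) and adds a hypothetical `ordered` flag for systems that require the query order to be kept.

```python
import re


def proximity_match(w1, w2, text, max_dist, ordered=False):
    """True if w1 and w2 occur at most max_dist word positions apart.

    If ordered is True, w2 must appear after w1 (hypothetical option).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos1 = [i for i, t in enumerate(tokens) if t == w1.lower()]
    pos2 = [j for j, t in enumerate(tokens) if t == w2.lower()]
    for i in pos1:
        for j in pos2:
            if ordered and j < i:
                continue  # wrong order when order is required
            if 0 < abs(j - i) <= max_dist:
                return True
    return False


text = "we enhance the power of retrieval"
print(proximity_match("enhance", "retrieval", text, max_dist=4))  # True
print(proximity_match("enhance", "retrieval", text, max_dist=3))  # False
```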

12 3. Boolean Queries A Boolean query is composed of atoms (i.e., basic queries) that retrieve documents, and Boolean operators that work on their operands (sets of documents) and deliver sets of documents. A query syntax tree is used to represent a Boolean query: the leaves correspond to the basic queries and the internal nodes to the operators.

13 Boolean Queries Cont… Example of a Query Syntax Tree: the query translation AND (syntax OR syntactic) retrieves all documents that contain the word ‘translation’ as well as either the word ‘syntax’ or the word ‘syntactic’.
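A minimal sketch of evaluating a query syntax tree bottom-up: each leaf looks up its set of documents, and each internal node combines its children's sets. It assumes an inverted index given as a plain dictionary from word to document-ID set; the classes `Word` and `Op`, the function `evaluate`, and the index contents are hypothetical, and only the AND and OR operators from the example are handled.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass
class Word:
    term: str  # leaf: a basic single-word query


@dataclass
class Op:
    op: str  # internal node: "AND" or "OR"
    left: "Node"
    right: "Node"


Node = Union[Word, Op]


def evaluate(node: Node, index: Dict[str, Set[int]]) -> Set[int]:
    """Return the set of document IDs that satisfy the tree."""
    if isinstance(node, Word):
        return index.get(node.term, set())
    left, right = evaluate(node.left, index), evaluate(node.right, index)
    if node.op == "AND":
        return left & right  # documents in both operand sets
    if node.op == "OR":
        return left | right  # documents in either operand set
    raise ValueError(f"unknown operator {node.op!r}")


index = {"translation": {1, 2, 4}, "syntax": {1, 3}, "syntactic": {4, 5}}
query = Op("AND", Word("translation"), Op("OR", Word("syntax"), Word("syntactic")))
print(evaluate(query, index))  # {1, 4}
```

The result is an unranked set, which is exactly the behaviour of Boolean systems described next.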

14 Boolean Queries Cont… Boolean systems provide no ranking of the retrieved documents: a document either satisfies the Boolean query or it does not. There is no partial matching between a document and a user query.

15 4. Natural Language Queries The query is written in the user’s own language (e.g., English, Arabic, or French). The query is treated as an enumeration of words and context queries, and all documents matching a portion of the user query are retrieved.

16 Natural Language Cont… Higher ranking is assigned to those documents matching more parts of the query. A threshold may be selected so that the documents with very low weights are not retrieved. Boolean queries are a simplified abstraction of natural language queries.
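A minimal sketch of the ranking and threshold idea above. It assumes each query part is a single word (context queries are not modelled) and uses the fraction of matched parts as the document weight; the function name `rank_natural_language` and its `threshold` parameter are hypothetical.

```python
def rank_natural_language(query_parts, docs, threshold=0.0):
    """Rank documents by the fraction of query parts they match.

    Documents matching no part, or whose weight does not exceed the
    threshold, are not retrieved. Ties are broken by document ID.
    """
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = sum(1 for part in query_parts if part.lower() in words)
        weight = matched / len(query_parts)
        if matched and weight > threshold:
            scored.append((doc_id, weight))
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored


docs = {
    "d1": "boolean query processing",
    "d2": "natural language query",
    "d3": "natural language query processing",
}
parts = ["natural", "language", "processing"]
print(rank_natural_language(parts, docs))                 # d3, d2, d1
print(rank_natural_language(parts, docs, threshold=0.4))  # d1 is cut off
```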

17 Pattern Matching A pattern is a set of syntactic features that must occur in a text segment. Patterns allow the retrieval of pieces of text that have some property. The segments that satisfy the pattern are said to ‘match’ it.

18 Pattern Matching Cont… Types of patterns: Words: a string (sequence of characters), e.g., ‘computer’, ‘space’. Prefixes: a string that must form the beginning of a word. Given the prefix ‘comput’, all documents containing words such as ‘computer’, ‘computing’, etc. are retrieved.

19 Pattern Matching Cont... Suffixes: a string that must form the end of a word. Given the suffix ‘ters’, all documents containing words such as ‘computers’, ‘painters’, etc. are retrieved. Substrings: a string that can appear anywhere within a word. Given the substring ‘tal’, all documents containing words such as ‘coastal’, ‘talk’, etc. are retrieved.
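The four simple pattern types above differ only in where the string must sit inside a word. A minimal sketch, assuming the patterns are matched against the vocabulary (the set of distinct words) and that the documents containing the matching words would then be retrieved; the name `match_words` is hypothetical.

```python
def match_words(vocabulary, pattern, kind):
    """Return the vocabulary words matching a word, prefix, suffix,
    or substring pattern, in sorted order."""
    tests = {
        "word": lambda w: w == pattern,
        "prefix": lambda w: w.startswith(pattern),
        "suffix": lambda w: w.endswith(pattern),
        "substring": lambda w: pattern in w,
    }
    test = tests[kind]  # KeyError for an unknown pattern type
    return sorted(w for w in vocabulary if test(w))


vocab = {"computer", "computing", "computers", "painters", "coastal", "talk", "space"}
print(match_words(vocab, "comput", "prefix"))
print(match_words(vocab, "ters", "suffix"))
print(match_words(vocab, "tal", "substring"))
```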

20 Pattern Matching Cont... Ranges: a pair of strings that matches any word lying between them in lexicographical order. For example, the range between the words ‘held’ and ‘hold’ will retrieve strings such as ‘hoax’ and ‘hissing’.
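Because a range query relies on lexicographical order, it can be answered with two binary searches over a sorted vocabulary. A minimal sketch, assuming both bounds are inclusive (the slides do not say); the name `range_query` is hypothetical, while `bisect_left` and `bisect_right` are from Python's standard `bisect` module.

```python
import bisect


def range_query(sorted_vocab, low, high):
    """Words w with low <= w <= high in a lexicographically sorted list."""
    lo = bisect.bisect_left(sorted_vocab, low)    # first word >= low
    hi = bisect.bisect_right(sorted_vocab, high)  # first word > high
    return sorted_vocab[lo:hi]


vocab = sorted(["hat", "held", "hissing", "hoax", "hold", "house"])
print(range_query(vocab, "held", "hold"))  # ['held', 'hissing', 'hoax', 'hold']
```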

21 Pattern Matching Cont... Allowing errors: a word together with an error threshold. This search pattern retrieves all words that are ‘similar’ to the given word. Errors come from typing, spelling, or OCR software. The query should retrieve the given word and its likely erroneous variants.

22 Pattern Matching Cont... The similarity model used is the Levenshtein distance, or edit distance. The edit distance between two strings is the minimum number of character insertions, deletions, and replacements needed to make them equal. Example: the query word ‘flo wer’ is at distance 1 from ‘flower’ (one deletion of the space).
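The edit distance defined above is usually computed with dynamic programming, keeping one row of the distance table at a time. A minimal sketch, with a hypothetical `approximate_matches` helper that applies an error threshold `k` to a vocabulary:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and
    replacements needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # replacement (or match)
        prev = cur
    return prev[-1]


def approximate_matches(vocabulary, word, k):
    """Vocabulary words within edit distance k of the query word."""
    return sorted(w for w in vocabulary if edit_distance(w, word) <= k)


print(edit_distance("flo wer", "flower"))  # 1
print(approximate_matches({"flower", "flowers", "flow", "glower", "power"}, "flower", 1))
```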

23 Pattern Matching Cont... Regular expressions: a general pattern built up from simple strings and the following operators: union, concatenation, and repetition. Example: the query ‘pro (blem | tein) (s | ε) (0 | 1 | 2)*’ will match words such as ‘problem02’ and ‘proteins’.
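The example query can be written directly in Python's `re` syntax, where `|` is union, juxtaposition is concatenation, and `*` is repetition. The empty second alternative in `(s|)` plays the role of ε; `fullmatch` requires the whole word to match, as a word-level pattern should.

```python
import re

# pro (blem | tein) (s | ε) (0 | 1 | 2)*  -- ε written as an empty alternative
PATTERN = re.compile(r"pro(blem|tein)(s|)(0|1|2)*")

for word in ["problem02", "proteins", "protein", "problems3", "protean"]:
    print(word, bool(PATTERN.fullmatch(word)))
```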

24 Pattern Matching Cont... Extended patterns (EPs): subsets of regular expressions that are expressed with a simpler syntax. The retrieval system either converts an EP into a regular expression or searches for it with a specific algorithm. Each system supports its own EPs.

25 Pattern Matching Cont... Examples of EPs: 1. Classes of characters, e.g., case-insensitive matching and ranges of characters. 2. Conditional expressions, i.e., a part of the pattern may or may not appear.

26 Pattern Matching Cont... 3. Wild characters, which match any sequence in the text, e.g., ‘ret*’. 4. Combinations that allow some parts of the pattern to match exactly and other parts with errors.
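Since each system defines its own EPs, the syntax below is purely hypothetical: a minimal sketch of the "convert the EP into a regular expression" strategy for an EP language with only two features, `*` as a wild character matching any sequence and `[...]` as a character class or range copied verbatim, plus optional case-insensitive matching. Every other character is treated literally. The name `ep_to_regex` is hypothetical.

```python
import re


def ep_to_regex(ep, case_insensitive=True):
    """Compile a hypothetical extended pattern into a Python regex.

    '*'   -> any sequence of characters (wild character)
    [...] -> character class, copied verbatim (e.g. [0-9])
    other characters are escaped and matched literally.
    Raises ValueError if a '[' has no closing ']'.
    """
    out = []
    i = 0
    while i < len(ep):
        ch = ep[i]
        if ch == "*":
            out.append(".*")
        elif ch == "[":
            end = ep.index("]", i)  # end of the character class
            out.append(ep[i:end + 1])
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile("".join(out), flags)


print(bool(ep_to_regex("ret*").fullmatch("Retrieval")))  # True
print(bool(ep_to_regex("doc[0-9]").fullmatch("DOC7")))   # True
```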