Query Languages. Information retrieval is concerned with the representation of, storage of, organization of, and access to information items.



Recap 1
Important points
– Data retrieval vs. information retrieval
– Users specify their needs via an intermediary language.
– Documents are represented by an abstraction of their content.
– Traditional model vs. berry-picking model
– Evaluation (precision/recall, single-value measures, human measures)
– Task characteristics (question answering, open-ended analysis, ad-hoc vs. filtering)
– Collection characteristics (size, document relations)

Recap 2
Important points
– Content types (text, descriptive/semantic metadata, multimedia)
– Metadata (formats & sets)
– Information theory (entropy)
– Models of symbol distribution (Zipf's law, Heaps' law)
– Distance measures (Hamming distance, Levenshtein distance)
– Markup languages (SGML, HTML, XML)
– Multimedia formats (header + data)

Query Languages
– The query language determines which queries can be formulated
  – User-oriented languages
  – System-oriented languages (protocols)
– The language depends on the underlying information retrieval model
– Systems may enhance the query using
  – Word expansion (thesaurus & stemming)
  – Stopword removal
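The query enhancement step can be sketched as follows. This is a minimal illustration, not how any particular system does it: the stopword list is a hypothetical five-word sample, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemming algorithm.

```python
# Hypothetical, tiny stopword list used only for illustration.
STOPWORDS = {"the", "of", "a", "and", "to"}


def crude_stem(word):
    """Strip a few common English suffixes (a naive stand-in for a real stemmer)."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_query(query):
    """Lowercase the query, drop stopwords, and stem the remaining terms."""
    terms = query.lower().split()
    return [crude_stem(term) for term in terms if term not in STOPWORDS]


print(preprocess_query("enhance the retrieval of computers"))
```

The same preprocessing would normally be applied to the documents at indexing time, so that stemmed query terms and stemmed index terms agree.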

Keyword-Based Querying
– A query is composed of keywords
  – Documents containing the keywords are returned
  – Intuitive, easy to express, fast ranking
  – Single-word and multi-word queries
– Classification of keyword-based queries
  – single-word queries
  – context queries
  – Boolean queries
  – natural language queries

Single-Word Queries
– Word query
  – the most elementary query that can be formulated
  – a word is a sequence of letters surrounded by separators
  – in many models, words are the only type of query allowed
– Result of word queries
  – the set of documents containing at least one of the words of the query
  – the resulting documents are ranked according to their degree of similarity to the query, using term frequency and inverse document frequency
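A minimal sketch of ranking word queries by term frequency and inverse document frequency. It assumes whitespace tokenization and one common weighting, tf × log(N/df); the slides do not say which variant is intended, and the function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace and lowercase (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def rank(query, docs):
    """Return (doc_id, score) pairs for documents containing at least one
    query word, highest score first."""
    counts_by_doc = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for word in set(tokenize(query)):
        # Document frequency: number of documents containing the word.
        df = sum(1 for counts in counts_by_doc.values() if word in counts)
        if df == 0:
            continue
        idf = math.log(n_docs / df)
        for doc_id, counts in counts_by_doc.items():
            if word in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[word] * idf
    # Ties are broken by document id to keep the order deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "query languages for text retrieval",
    "d2": "text text text",
    "d3": "boolean query",
}
print(rank("text query", docs))
```

Note that a word occurring in every document gets an idf of zero under this weighting, so it contributes nothing to the ranking even though the documents are still returned.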

Context Queries
– Phrase query
  – a sequence of single-word queries
  – separators and uninteresting words are ignored
    example: "…enhance the retrieval…"
  – results are ranked in a fashion analogous to single words
– Proximity query
  – a phrase query with a maximum allowed distance (in characters or words) between the words of the query
    example: distance = 4 matches "enhance the power of retrieval"
  – physical proximity has semantic value: words in the same paragraph are related in some way
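Phrase and proximity queries can be sketched with a positional index. The example below is an assumption-laden illustration: distance is counted in words, the two query words must appear in the given order, and stopwords are not dropped. A phrase query is treated as a proximity query with distance 1.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def proximity_match(index, first, second, max_distance):
    """Documents where `second` occurs at most `max_distance` words after `first`."""
    hits = set()
    for doc_id, first_positions in index.get(first, {}).items():
        second_positions = index.get(second, {}).get(doc_id, [])
        if any(0 < q - p <= max_distance
               for p in first_positions for q in second_positions):
            hits.add(doc_id)
    return hits


def phrase_match(index, first, second):
    """Two-word phrase query: the words must be adjacent."""
    return proximity_match(index, first, second, 1)


docs = {
    "d1": "enhance the power of retrieval",
    "d2": "enhance retrieval",
    "d3": "retrieval can enhance search",
}
index = build_positional_index(docs)
print(proximity_match(index, "enhance", "retrieval", 4))
print(phrase_match(index, "enhance", "retrieval"))
```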

Boolean Queries
– Boolean query
  – composed of atoms (basic queries) that retrieve documents, and of Boolean operators which work on their operands (sets of documents)
– Query syntax tree
  – compositional scheme
  – leaves: basic queries
  – internal nodes: operators
[Figure: query syntax tree with operator nodes AND and OR over the leaves "translation", "syntax", and "syntactic"]

Boolean Queries (cont.)
– Operators in Boolean queries
  – e1 OR e2: selects all docs satisfying e1 or e2
  – e1 AND e2: selects all docs satisfying both e1 and e2
  – e1 BUT e2: selects all docs satisfying e1 but not e2
– Classic Boolean system
  – no ranking of the retrieved docs
  – does not allow partial matching
  – alternative: a fuzzy Boolean set of operators, in which the meaning of AND and OR can be relaxed (e.g., terms appearing in only some operands)
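The three operators map directly onto set operations, and a query syntax tree can be evaluated recursively. The sketch below assumes a hypothetical in-memory index from keyword to a set of document ids, and represents internal nodes as `(operator, left, right)` tuples; both choices are illustrative only.

```python
def evaluate(node, index):
    """Evaluate a query syntax tree to a set of document ids.

    A leaf is a keyword string (a basic query); an internal node is a
    tuple (operator, left_subtree, right_subtree).
    """
    if isinstance(node, str):
        return set(index.get(node, set()))
    operator, left, right = node
    left_docs = evaluate(left, index)
    right_docs = evaluate(right, index)
    if operator == "AND":
        return left_docs & right_docs   # docs satisfying both operands
    if operator == "OR":
        return left_docs | right_docs   # docs satisfying either operand
    if operator == "BUT":
        return left_docs - right_docs   # docs satisfying left but not right
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical index: keyword -> ids of documents containing it.
index = {
    "translation": {1, 2, 3},
    "syntax": {2, 4},
    "syntactic": {3, 5},
}
query = ("AND", "translation", ("OR", "syntax", "syntactic"))
print(evaluate(query, index))
```

The result is an unordered set, which reflects the classic Boolean system: a document either matches or it does not, with no ranking and no partial matches.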

Natural Language
– Natural language query
  – blurs the distinction between AND and OR → the query becomes an enumeration of words and context queries
  – higher ranking is assigned to those documents matching more parts of the query
– Characteristics
  – retrieves all the documents close to the query
  – a complete document can be used as a query → leads to the use of relevance feedback techniques (the user selects a document from the result and submits it as a new query)
  – example system: AskJeeves
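One simple way to picture "more matching parts, higher rank" is to count how many distinct query words each document contains. This is only a sketch under that assumption; real systems weight the parts rather than counting them, and the function name is hypothetical.

```python
def rank_by_matches(query, docs):
    """Rank documents by the number of distinct query words they contain."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        matched = len(query_words & set(text.lower().split()))
        if matched:
            scored.append((doc_id, matched))
    # Most matches first; ties broken by document id.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


docs = {
    "a": "query languages for information retrieval",
    "b": "information theory",
    "c": "cooking recipes",
}
print(rank_by_matches("information retrieval query", docs))

# A whole document can itself be submitted as the query, as in relevance feedback.
print(rank_by_matches(docs["b"], docs))
```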

Query Languages: Patterns & Structures

Pattern Matching
– Pattern
  – a set of syntactic features that must occur in a text segment
– Types of patterns
  – Words: a string (sequence of characters) in the text
  – Prefixes: a string forming the beginning of a text word (e.g., comput → computer, computation)
  – Suffixes: a string forming the termination of a text word (e.g., ters → computers, painters)
  – Substrings: a string appearing within a text word; word separators may be allowed (e.g., tal → talk, metallic; any flow → many flowers)
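The four pattern types correspond to basic string operations. A minimal sketch, with hypothetical function names, that filters a word list by pattern type and also checks a substring that spans a word separator:

```python
def match_words(words, pattern, kind):
    """Return the words matching `pattern` under the given pattern type."""
    tests = {
        "word": lambda w: w == pattern,
        "prefix": lambda w: w.startswith(pattern),
        "suffix": lambda w: w.endswith(pattern),
        "substring": lambda w: pattern in w,
    }
    return [w for w in words if tests[kind](w)]


def match_text(text, pattern):
    """Substring match over raw text, so the pattern may span word separators."""
    return pattern in text


words = ["computer", "computation", "computers", "painters", "talk", "metallic"]
print(match_words(words, "comput", "prefix"))
print(match_words(words, "ters", "suffix"))
print(match_words(words, "tal", "substring"))
print(match_text("many flowers", "any flow"))
```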

Pattern Matching (cont.)
– More types of patterns
  – Ranges: a pair of strings matching any word lying between them in lexicographical order (e.g., held to hold → hoax, hissing)
  – Allowing errors: retrieving words similar to a given word (e.g., flower → flo wer [edit distance = 1])
  – Regular expressions: general patterns built up from simple strings and operators (e.g., pro (blem | tein) (s | ε) (0 | 1 | 2)* → problem02, proteins)
  – Extended patterns: classes of characters, conditional expressions, wild characters, combinations
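The range, error-tolerant, and regular-expression examples can be checked with a short sketch. The edit distance here is the standard Levenshtein dynamic program (insertions, deletions, substitutions); the regular expression is a translation of the slide's pattern into Python `re` syntax, with `s?` for (s | ε) and `[012]*` for (0 | 1 | 2)*.

```python
import re


def in_range(word, low, high):
    """True if `word` lies between `low` and `high` in lexicographical order."""
    return low <= word <= high


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


PATTERN = re.compile(r"pro(blem|tein)s?[012]*")

print(in_range("hoax", "held", "hold"), in_range("hissing", "held", "hold"))
print(edit_distance("flower", "flo wer"))
print(bool(PATTERN.fullmatch("problem02")), bool(PATTERN.fullmatch("proteins")))
```

`fullmatch` is used so that the whole word must fit the pattern; a search for the pattern anywhere in a longer text would use `search` instead.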

Structural Queries
– Structural query
  – mixes content and structure in queries
  – content constraints (words, phrases, patterns)
  – structural constraints (containment, proximity) and restrictions on structural elements (chapters, sections)
– Types of text structure
  – form-like fixed structure
  – hypertext structure
  – hierarchical structure

Structural Queries (cont.)
[Figure: types of structures: form, hypertext, hierarchical]

Fixed Structure
– Traditional restrictions
  – documents had a fixed set of fields
  – each field had some text inside
  – only rarely could the fields appear in any order or repeat
  – fields were not allowed to nest or overlap
  – retrieval: specifying a given basic pattern to be found only in a given field
– Characteristics
  – reasonable for retrieving from text collections with a fixed structure (e.g., a mail archive), but inadequate for representing hierarchical structure such as that of HTML docs
  – can be expanded to the relational DB model
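Retrieval over a fixed structure amounts to looking for a pattern inside one named field. A minimal sketch over a made-up mail archive; the records, the field names, and the case-insensitive substring test are all assumptions for illustration.

```python
def search_field(records, field, pattern):
    """Return the records whose `field` contains `pattern` (case-insensitive)."""
    needle = pattern.lower()
    return [record for record in records
            if needle in record.get(field, "").lower()]


# Hypothetical mail archive: every document has the same flat set of fields.
mails = [
    {"from": "alice@example.org", "subject": "Query languages", "body": "Slides attached."},
    {"from": "bob@example.org", "subject": "Lunch", "body": "Any query about the menu?"},
]

# The pattern is matched only in the chosen field, not in the whole document.
print([m["from"] for m in search_field(mails, "subject", "query")])
print([m["from"] for m in search_field(mails, "body", "query")])
```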

Hypertext
– Hypertext (navigational)
  – a directed graph where the nodes hold some text and the links represent connections between nodes or between positions inside the nodes
– Browsing / searching in hypertext
  – retrieval from a hypertext: browsing (traversing the hypertext nodes by following links → a navigational activity)
  – even on the Web, one can search by the text contents of the nodes, but not by their structural connectivity
  – some search engines now allow searching for specific source or destination anchors (but not general structure + content queries)
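Browsing a hypertext can be pictured as a traversal of the directed graph. The sketch below assumes a hypothetical adjacency-list representation (node name → list of link targets) and performs a breadth-first walk limited to a number of link-follows; it ignores node text and positions inside nodes.

```python
from collections import deque


def browse(links, start, max_clicks):
    """Nodes reachable from `start` by following at most `max_clicks` links,
    in breadth-first order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_clicks:
            continue  # do not follow links beyond the click budget
        for target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical hypertext: each node lists the nodes it links to.
links = {
    "home": ["about", "docs"],
    "docs": ["api", "home"],
    "api": ["home"],
}
print(browse(links, "home", 1))
print(browse(links, "home", 2))
```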

Hierarchical Structure
– Hierarchical structure
  – an intermediate structuring model lying between fixed structure and hypertext
  – represents a recursive decomposition of the text
  – a natural model for many text collections, e.g., books, articles, legal documents, structured programs, etc.
– Hierarchical models
  – PAT Expressions, Overlapped Lists, List of References, Proximal Nodes, Tree Matching
– Issues in hierarchical models
  – static or dynamic structure, restrictions on the structure, integration with text, query language
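A containment query over a hierarchical structure can be sketched with a plain tree. This is a generic illustration and not an implementation of any of the named models (PAT Expressions, Proximal Nodes, and so on); the `Node` class and `find` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # structural element, e.g. "chapter"
    text: str = ""                  # text held directly by this element
    children: list = field(default_factory=list)


def full_text(node):
    """Text of a node together with the text of everything nested inside it."""
    return " ".join([node.text] + [full_text(child) for child in node.children])


def find(node, kind, word):
    """All elements of type `kind` whose subtree contains `word`."""
    hits = []
    if node.kind == kind and word in full_text(node).lower().split():
        hits.append(node)
    for child in node.children:
        hits.extend(find(child, kind, word))
    return hits


book = Node("book", children=[
    Node("chapter", "Models", [
        Node("section", "Boolean model"),
        Node("section", "Vector model"),
    ]),
    Node("chapter", "Query languages", [
        Node("section", "Boolean queries"),
    ]),
])

print([n.text for n in find(book, "section", "boolean")])
print([n.text for n in find(book, "chapter", "vector")])
```

The query combines a structural restriction (the element type) with a content constraint (the word), which is the mix the hierarchical models are designed to support.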

Query Protocols
– Query protocols
  – query languages used to query text databases
  – standards intended not for human use but for querying library systems and CD-ROMs
– Some important query protocols
  – Z39.50
  – WAIS
  – CCL
  – CD-RDx
  – SFQL