Oracle Text Operations J. Molka-Danielsen Sept. 30, 2002.


Index types
Oracle Text offers index types for traditional full-text retrieval, eBusiness catalogs, and classification and routing applications.
- Standard index type (CONTEXT) for traditional full-text retrieval over documents and web pages. The CONTEXT index type provides a rich set of text search capabilities for finding the content you need without returning pages of spurious results.
- Catalog index type (CTXCAT), the first text index designed specifically for eBusiness catalogs. The CTXCAT index type provides flexible searching and sorting at web speed.
- Classification index type (CTXRULE) for building classification or routing applications. The CTXRULE index type is created on a table of queries, where the queries define the classification or routing criteria.
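Underneath, a CONTEXT index is an inverted index: for each token it records which documents contain the token and at which positions. The following Python sketch builds a toy positional inverted index in memory to show that structure. The regular-expression tokenizer and the names `tokenize` and `build_index` are assumptions for illustration; they do not reproduce Oracle Text's lexer or its index tables. The two sample rows reuse the text from the DIRECT_DATASTORE example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplified lexer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index

docs = {
    111555: "this text will be indexed",
    111556: "this is a direct_datastore example",
}
index = build_index(docs)
print(index["this"])  # {111555: [0], 111556: [0]}
```

Storing positions, not only document ids, is what lets an index answer proximity and phrase queries without rereading the documents.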

Where each index type is used
- CONTEXT (query operator: CONTAINS). Use this index to build a text retrieval application when your text consists of large, coherent documents. You can index documents in different formats, such as MS Word, HTML, XML, or plain text. With a CONTEXT index, you can customize your index in a variety of ways.
- CTXCAT (query operator: CATSEARCH). Use this index type to index small text fragments, such as item names, prices, and descriptions, that are stored across columns. With this index, query performance is improved for mixed queries, which combine a text condition with structured conditions.
- CTXRULE (query operator: MATCHES). Use a CTXRULE index to build a document classification application. The CTXRULE index is created on a table of queries, where each query has a classification. Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator.
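A mixed catalog query combines a short text condition with structured criteria such as price. The Python sketch below models that idea over an in-memory list: every search word must appear in the item name, a structured predicate filters the rows, and the results are sorted by a column. The product rows, `catalog_search`, and its parameters are hypothetical; in Oracle Text such a query would be written with the CATSEARCH operator against a CTXCAT index.

```python
import re

def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def catalog_search(items, text_query, predicate=lambda item: True, sort_key=None):
    """Return items whose name contains every query word and that satisfy the predicate."""
    required = words(text_query)
    hits = [item for item in items
            if required <= words(item["name"]) and predicate(item)]
    if sort_key is not None:
        hits.sort(key=lambda item: item[sort_key])
    return hits

products = [  # hypothetical catalog rows
    {"id": 1, "name": "Flat Panel Monitor", "price": 399.0},
    {"id": 2, "name": "CRT Monitor", "price": 299.0},
    {"id": 3, "name": "Widescreen Monitor", "price": 799.0},
    {"id": 4, "name": "Laser Printer", "price": 199.0},
]
# Text condition "monitor" plus structured condition price < 500, sorted by price.
cheap_monitors = catalog_search(products, "monitor",
                                predicate=lambda p: p["price"] < 500,
                                sort_key="price")
print([p["id"] for p in cheap_monitors])  # [2, 1]
```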

Query operator strategies
- Keyword searching: searching for keywords in a document. The user enters one or more keywords that best describe the query.
- Context queries: searching for words in a given context. The user searches for text that contains words near each other.
- Boolean operations: combining keywords with Boolean operators. The user can express a query by connecting keywords with Boolean operators.
- Linguistic features: using fuzzy matching and other natural language processing techniques. The user searches for text that is about something.
- Pattern matching: retrieving text that has a certain property. The user searches for text containing words that contain a given string.
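Three of these strategies can be illustrated with plain set operations. The Python sketch below models keyword search, Boolean AND/OR/NOT, and wildcard pattern matching over a small hypothetical collection, using `%` for any run of characters and `_` for a single character, as Oracle Text's wildcard syntax does. The documents and the helper names `keyword` and `pattern` are assumptions; this is not how Oracle Text evaluates CONTAINS internally.

```python
import re

docs = {  # hypothetical document collection: id -> text
    1: "Oracle Text indexes documents stored in the database",
    2: "A catalog index supports fast product searches",
    3: "Text retrieval finds documents about databases",
}

def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword(word):
    """Keyword search: ids of documents that contain the word."""
    return {i for i, text in docs.items() if word.lower() in tokens(text)}

def pattern(wildcard):
    """Pattern search: '%' matches any run of characters, '_' exactly one."""
    regex = re.compile("".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in wildcard.lower()))
    return {i for i, text in docs.items()
            if any(regex.fullmatch(tok) for tok in tokens(text))}

# Boolean operations map onto set operations.
print(keyword("documents") & keyword("text"))    # AND -> {1, 3}
print(keyword("catalog") | keyword("oracle"))    # OR  -> {1, 2}
print(keyword("documents") - keyword("oracle"))  # NOT -> {3}
print(pattern("database%"))                      # {1, 3}
```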

Features
- Languages: Oracle Text supports all Oracle NLS character sets, for example ASCII, UTF-8, JA16SJIS, GBK, and BIG5. Oracle Text supports search across documents in Western languages (English, French, Spanish, German, etc.), Japanese, Korean, and Traditional and Simplified Chinese.
- Highlighting: the highlight service takes a query string, fetches the document contents, and shows you which words in the document cause it to match the query.
- Markup: markup takes the highlight service one step further and produces a text version of the document with the matching words marked up.
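In Oracle Text these services are provided by the CTX_DOC package. The Python sketch below only models the idea on a plain string: `highlight` returns (offset, length) pairs for the words that match the query, and `markup` uses those pairs to wrap each match in tags. The exact-word matching and the `<<`/`>>` tags are assumptions of the sketch, not Oracle's defaults.

```python
import re

def highlight(text, query_words):
    """Return (offset, length) pairs for every word in text that matches a query word."""
    wanted = {w.lower() for w in query_words}
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[A-Za-z0-9]+", text)
            if m.group().lower() in wanted]

def markup(text, query_words, start_tag="<<", end_tag=">>"):
    """Return a copy of text with each matching word wrapped in start_tag/end_tag."""
    pieces, last = [], 0
    for offset, length in highlight(text, query_words):
        pieces.append(text[last:offset])
        pieces.append(start_tag + text[offset:offset + length] + end_tag)
        last = offset + length
    pieces.append(text[last:])
    return "".join(pieces)

doc = "High resolution monitors need a high refresh rate."
print(highlight(doc, ["high", "monitors"]))  # [(0, 4), (16, 8), (32, 4)]
print(markup(doc, ["high", "monitors"]))
# <<High>> resolution <<monitors>> need a <<high>> refresh rate.
```

Building markup on top of highlight mirrors the relationship described above: the offsets are computed once, and markup only decides how to render them.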

Features (continued)
- Theme extraction: a theme provides a snapshot that describes what the document is about. Rather than searching for documents that contain specific words or phrases, users can search for documents that are about a certain subject, even if that subject is not mentioned explicitly in the document. Theme queries return a hit list of the documents that are about the requested subject, along with a score that indicates how strongly each document reflects the subject in question.
- Gist: a Generic Gist is a summary consisting of the sentences or paragraphs that best represent the overall subject matter of the document. You can use the Generic Gist to skim the main content of the text or to assess your interest in its subject matter. You can generate paragraph-level or sentence-level gists.
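Oracle Text generates gists with its own linguistic analysis (exposed through the CTX_DOC package), which this sketch does not reproduce. As a stand-in, the Python function below builds a sentence-level summary with a simple word-frequency heuristic: each sentence is scored by the average document frequency of its content words, and the top sentences are returned in their original order. The heuristic, the tiny stopword list, and the name `sentence_gist` are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "it"}

def content_words(sentence):
    """Lowercase words of the sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def sentence_gist(text, num_sentences=1):
    """Pick the sentences whose content words are most frequent across the document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)

text = ("Oracle Text indexes documents. "
        "Text indexes make document search fast. "
        "The weather was pleasant.")
print(sentence_gist(text))  # Oracle Text indexes documents.
```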

Advanced linguistics
A document classification application classifies an incoming stream of documents based on their content; such applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming articles into categories such as politics, economy, or sports as they arrive. Oracle Text enables you to build such applications with the new CTXRULE index type, which indexes the rules (queries) that define the classification or routing criteria. When documents arrive, the new MATCHES operator can be used to categorize each document.
The CTX_CLS package generates CTXRULE query rules for a set of documents. The user supplies a training set of categorized documents, where each document belongs to one or more categories. The package generates the queries that define the categories and writes the results to a table.
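The MATCHES idea can be modeled in a few lines of Python: a rule table pairs each category with a query, and an incoming document is assigned every category whose query it satisfies, so it may land in several categories or in none. The categories come from the news-agency example above; the rule words, the AND-only query language, and the `matches` function are hypothetical simplifications, not CTXRULE's query syntax or the rules CTX_CLS would generate.

```python
import re

# Hypothetical rule table: each row pairs a category with a query.
# A query here is a set of words that must all appear (an AND query).
RULES = [
    ("politics", {"election"}),
    ("politics", {"parliament", "vote"}),
    ("economy", {"inflation"}),
    ("economy", {"stock", "market"}),
    ("sports", {"match", "score"}),
]

def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(document):
    """Return the sorted categories whose rule query matches the document."""
    words = tokens(document)
    return sorted({category for category, query in RULES if query <= words})

print(matches("Parliament will vote on the inflation report"))  # ['economy', 'politics']
print(matches("The final match ended with a score of 2-1"))    # ['sports']
print(matches("A new museum opened downtown"))                 # []
```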

Examples: creating indexes with Oracle Text
Let's assume the following table containing some typical product information.

    > describe product_information
    Name                 Null?     Type
    -------------------  --------  --------------
    PRODUCT_ID           NOT NULL  NUMBER(6)
    PRODUCT_NAME                   VARCHAR2(50)
    PRODUCT_DESCRIPTION            VARCHAR2(2000)
    CATEGORY                       NUMBER(2)
    PRODUCT_STATUS                 VARCHAR2(20)
    LIST_PRICE                     NUMBER(8,2)

Create index
We would like to create a text index on the PRODUCT_DESCRIPTION column to make it searchable. The index is created with a single SQL statement:

    CREATE INDEX description_idx ON product_information(product_description)
      INDEXTYPE IS CTXSYS.CONTEXT;

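Select and result
The following query finds products whose description contains "monitor" near the phrase "high resolution" and returns the best matches first. The third argument of CONTAINS (1) is a label that ties the CONTAINS clause to SCORE(1), which returns the relevance score of each row.

    SELECT score(1), product_id, product_name
      FROM product_information
     WHERE CONTAINS(product_description, 'monitor NEAR "high resolution"', 1) > 0
     ORDER BY score(1) DESC;

The query returns the following products, in descending order of score:

    Monitor 21/HR
    Monitor 17/HR
    LCD Monitor 11/PM
    Plasma Mon 10/XGA
    Monitor 21/HR/M
    Monitor 17/HR/F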
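Oracle Text scores NEAR queries by the proximity of the terms, using its own formula. The Python sketch below only illustrates the general idea that closer occurrences score higher, and then orders the matching rows like ORDER BY score(1) DESC. The scoring rule (100 divided by the word distance), the `max_distance` cutoff of 10 words, the function `near_score`, and the sample descriptions are all assumptions made for the example.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near_score(text, term, phrase, max_distance=10):
    """Toy NEAR score: 0 when term or phrase is missing or too far apart, higher when closer."""
    toks = words(text)
    phrase_toks = words(phrase)
    n = len(phrase_toks)
    term_pos = [i for i, tok in enumerate(toks) if tok == term]
    phrase_pos = [i for i in range(len(toks) - n + 1) if toks[i:i + n] == phrase_toks]
    best = None
    for t in term_pos:
        for p in phrase_pos:
            # Word distance from the term to the nearest edge of the phrase;
            # a term that falls inside the phrase (d <= 0) is ignored.
            d = p - t if t < p else t - (p + n - 1)
            if d > 0 and (best is None or d < best):
                best = d
    if best is None or best > max_distance:
        return 0
    return 100 // best

docs = {  # hypothetical product descriptions
    101: "monitor with high resolution display",
    102: "high resolution flat panel monitor",
    103: "a monitor for the office",
    104: "high resolution monitor",
}
scores = {doc_id: near_score(text, "monitor", "high resolution") for doc_id, text in docs.items()}
# Keep matching rows only and order them like ORDER BY score(1) DESC.
hits = sorted((d for d in scores if scores[d] > 0), key=lambda d: scores[d], reverse=True)
print([(d, scores[d]) for d in hits])  # [(104, 100), (101, 50), (102, 33)]
```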

Another example of creating a CONTEXT index
Specifying DIRECT_DATASTORE: the following example creates a table with a CLOB column to store text data, populates two rows with text data, and indexes the table using the system-defined preference CTXSYS.DEFAULT_DATASTORE. This preference uses the DIRECT_DATASTORE type, which means the text to be indexed is stored directly in the indexed column.

    create table mytable(id number primary key, docs clob);
    insert into mytable values(111555, 'this text will be indexed');
    insert into mytable values(111556, 'this is a direct_datastore example');
    commit;
    create index myindex on mytable(docs)
      indextype is ctxsys.context
      parameters ('DATASTORE CTXSYS.DEFAULT_DATASTORE');

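Specifying file data storage
The following example creates a datastore preference of type FILE_DATASTORE. This tells the system that the files to be indexed are stored in the operating system's file system, so the indexed column holds file names rather than the text itself. The example uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs, and then names the preference in the PARAMETERS clause when the index is created.

    begin
      ctx_ddl.create_preference('mypref', 'FILE_DATASTORE');
      ctx_ddl.set_attribute('mypref', 'PATH', '/docs');
    end;
    /
    -- hypothetical table whose filename column holds file names under /docs
    create index myfileindex on mydocs(filename)
      indextype is ctxsys.context
      parameters ('DATASTORE mypref');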
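The difference between the two datastores is where the indexer gets each row's text. The Python sketch below models only that lookup step: with a direct datastore the column value is the text, while with a file datastore the column value is a file name that is read from the configured directory. The function names, the temporary directory that stands in for /docs, and the sample file and row are hypothetical.

```python
import os
import tempfile

def direct_datastore(column_value):
    """DIRECT_DATASTORE idea: the column value is the text itself."""
    return column_value

def file_datastore(column_value, path):
    """FILE_DATASTORE idea: the column value is a file name, read from the PATH directory."""
    with open(os.path.join(path, column_value), encoding="utf-8") as f:
        return f.read()

with tempfile.TemporaryDirectory() as docs_dir:  # stands in for /docs
    with open(os.path.join(docs_dir, "intro.txt"), "w", encoding="utf-8") as f:
        f.write("this text lives in a file")
    rows = {111555: "intro.txt"}  # hypothetical rows: id -> file name
    texts = {row_id: file_datastore(name, docs_dir) for row_id, name in rows.items()}

print(texts)  # {111555: 'this text lives in a file'}
print(direct_datastore("this text will be indexed"))
```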
