Published by Bartholomew Barber, modified over 9 years ago
1
eXist Indexing: Using the Right Index for Your Data
Date: 9/29/2008
Dan McCreary, President, Dan McCreary & Associates
dan@danmccreary.com (952) 931-9198
M D Metadata Solutions
2
M D Copyright 2008 Dan McCreary & Associates
Overview: Using eXist Indexes
– Types of indexes
– Configuring indexes
– Testing indexes
3
Index Types
– Structural indexes: index the nodal structure (elements, i.e. tags, and attributes) of the documents in a collection.
– Range indexes: ideal for indexing measurements (integers, doubles, floats, currency, or discrete values).
– Full-text indexes: map specific text nodes and attributes of the documents in a collection to text tokens.
– NGram indexes: map specific text nodes and attributes of the documents in a collection to overlapping tokens of n characters (n = 3 by default). They are very efficient for exact substring searches and for queries on program source code, which cannot easily be split into whitespace-separated tokens and is therefore a poor match for the full-text index.
– Spatial indexes (experimental): map elements containing geo-referenced geometries to dedicated data structures that allow efficient spatial queries.
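To make the n-gram idea concrete, here is a minimal Python sketch of a 3-gram index answering an exact substring query. It is a conceptual model only, not eXist's implementation; the function names and sample nodes are hypothetical, and the search is case-sensitive by assumption.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Return the overlapping n-character grams of text (n = 3 mirrors the default above)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_index(nodes: dict[int, str], n: int = 3) -> dict[str, set[int]]:
    """Map every n-gram to the IDs of the text nodes that contain it."""
    index: dict[str, set[int]] = {}
    for node_id, text in nodes.items():
        for gram in ngrams(text, n):
            index.setdefault(gram, set()).add(node_id)
    return index


def substring_search(index: dict[str, set[int]], nodes: dict[int, str],
                     needle: str, n: int = 3) -> set[int]:
    """Exact (case-sensitive) substring search using the n-gram index."""
    if len(needle) < n:
        # Too short to form a gram: fall back to scanning every node.
        return {nid for nid, text in nodes.items() if needle in text}
    # A matching node must contain every gram of the needle...
    candidates = set.intersection(*(index.get(g, set()) for g in ngrams(needle, n)))
    # ...but gram co-occurrence does not prove adjacency, so verify each candidate.
    return {nid for nid in candidates if needle in nodes[nid]}


# Hypothetical text nodes holding program code, which whitespace tokenizing handles poorly.
nodes = {
    1: "getElementById(id)",
    2: "document.getElementsByTagName",
    3: "plain prose here",
}
index = build_ngram_index(nodes)
print(sorted(substring_search(index, nodes, "ElementBy")))  # [1]
```

Because "ElementBy" is matched gram by gram rather than word by word, the query works even though the code contains no whitespace to split on.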
4
Structural Indexes
– Keep track of the elements (tags), attributes, and nodal structure of all XML documents in a collection
– Created and maintained automatically by eXist
– Cannot be reconfigured or disabled by the user
– Used by all non-wildcard XPath and XQuery expressions in eXist (but not by wildcard steps such as “//*”)
– Stored in the database file elements.dbx
5
How Do Structural Indexes Work?
The index maps every element and attribute qname (qualified name) in a document collection to a list of (document ID, node ID) pairs. The query engine uses this mapping to resolve the steps of an XPath expression.
Example:
– //book/section
– eXist uses two index lookups: the first for the <book> nodes and the second for the <section> nodes
– eXist then computes the structural join between these two node sets to determine which <section> elements are in fact children of <book> elements
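The two lookups plus the join can be sketched in Python. This is a toy model under stated assumptions: documents are nested `(tag, [children])` tuples, and node IDs are tuples in which a node's parent ID is its own ID minus the last component. That numbering is illustrative, not eXist's actual storage format, and `build_element_index` and `child_join` are hypothetical names.

```python
from collections import defaultdict


def build_element_index(docs):
    """Map each element qname to a list of (document ID, node ID) pairs.

    Node IDs are tuples; a node's parent ID is its own ID without the last
    component (an illustrative hierarchical numbering, not eXist's format).
    """
    index = defaultdict(list)

    def walk(doc_id, node, node_id):
        tag, children = node
        index[tag].append((doc_id, node_id))
        for position, child in enumerate(children, start=1):
            walk(doc_id, child, node_id + (position,))

    for doc_id, root in docs.items():
        walk(doc_id, root, (1,))
    return index


def child_join(index, parent_qname, child_qname):
    """Parent/child structural join: keep child entries whose parent is in the parent list."""
    parents = set(index.get(parent_qname, []))
    return [
        (doc_id, node_id)
        for doc_id, node_id in index.get(child_qname, [])
        if (doc_id, node_id[:-1]) in parents
    ]


docs = {
    1: ("library", [("book", [("title", []), ("section", []), ("section", [])]),
                    ("section", [])]),
    2: ("book", [("section", [("section", [])])]),
}
index = build_element_index(docs)
# //book/section: one lookup for "book", one for "section", then the join.
book_sections = child_join(index, "book", "section")
print(book_sections)  # [(1, (1, 1, 2)), (1, (1, 1, 3)), (2, (1, 1))]
```

The section directly under library and the nested section inside another section are both rejected, because their parent IDs are not in the list returned for `book`.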
6
Range Index
Range indexes provide a shortcut that lets the database directly select nodes based on their typed values. They are used when matching or comparing nodes with standard XPath operators and functions. Without a range index, comparison operators such as =, > or < fall back to a "brute-force" inspection of the DOM, which can be extremely slow if eXist has to search through millions of nodes: each node has to be loaded and cast to the target type.
7
Example
– You have a catalog containing 50,000 items
– You want to find all items with a price under $100
– XPath: //item[price < 100.0]
– Without a range index, each search may require up to 50,000 comparisons
– With a range index, the subset of items priced under $100 is found with a single index lookup
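The difference between the two strategies can be sketched in Python, assuming a sorted array searched with `bisect` as a stand-in for the real on-disk index structure. The catalog data, `brute_force_under`, and `RangeIndex` are hypothetical.

```python
import bisect


def brute_force_under(items, max_price):
    """No range index: load every node, cast its text to a number, compare."""
    hits, comparisons = [], 0
    for item_id, price_text in items:
        comparisons += 1
        if float(price_text) < max_price:
            hits.append(item_id)
    return hits, comparisons


class RangeIndex:
    """Typed values kept in sorted order (a stand-in for a B+-tree)."""

    def __init__(self, items):
        entries = sorted((float(price_text), item_id) for item_id, price_text in items)
        self.keys = [value for value, _ in entries]
        self.ids = [item_id for _, item_id in entries]

    def less_than(self, bound):
        # One binary search finds where the qualifying range ends.
        cut = bisect.bisect_left(self.keys, bound)
        return self.ids[:cut]


# 50,000 hypothetical catalog items with prices stored as text, as in an XML document.
items = [(i, f"{(i * 37) % 2000 / 10:.1f}") for i in range(50_000)]

scan_hits, scan_comparisons = brute_force_under(items, 100.0)
index_hits = RangeIndex(items).less_than(100.0)
print(scan_comparisons, len(index_hits), sorted(scan_hits) == sorted(index_hits))
```

Both paths return the same items; the scan pays one cast and one comparison per item on every query, while the index pays the sorting cost once and then answers each query with a single binary search.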
8
Restrictions on Range Indexes
– All collections included in the search must be indexed
– The data types must match
– There must be no context dependencies
9
All Collections Must Be Indexed
– The range index must be defined on all items in the input sequence
  – If you search collections A and B but only A is range indexed, the query will not use the index
[Diagram: an XQuery searching Collection A (with range index) and Collection B (no range index)]
10
Full-Text Fallback
If the collections being searched do not all have exactly the same type of range index, the search automatically reverts to using the default full-text indexes (slow)
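The all-or-nothing rule can be expressed as a small decision function. This Python sketch assumes a hypothetical `Collection` model that records one range-index type per qname; it illustrates the rule stated above and is not eXist's query optimizer.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical model of a collection and the range indexes it defines."""
    path: str
    range_indexes: dict[str, str] = field(default_factory=dict)  # qname -> XML Schema type


def choose_strategy(collections, qname, value_type):
    """Use the range index only if every searched collection has the same one."""
    if all(c.range_indexes.get(qname) == value_type for c in collections):
        return "range index"
    return "fallback (slow)"


a = Collection("/db/a", {"price": "xs:double"})
b = Collection("/db/b")                          # no range index
c = Collection("/db/c", {"price": "xs:string"})  # different index type

print(choose_strategy([a], "price", "xs:double"))     # range index
print(choose_strategy([a, b], "price", "xs:double"))  # fallback (slow)
print(choose_strategy([a, c], "price", "xs:double"))  # fallback (slow)
```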
11
Data Types Must Match
The data type of the index (the type of the first argument) must match the data type of the comparison value (the type of the second argument)
– Wrong: //item[price = '1000.0']
– Right: //item[price < xs:double($max-price)]
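A Python analogy shows why an index built over one type cannot answer a comparison made in another: strings and doubles disagree on both ordering and equality. The sample values are hypothetical.

```python
# Three hypothetical price values as they appear in the XML text.
prices = ["99.5", "1000.0", "250.0"]

# Compared as strings, values order character by character...
as_strings = sorted(prices)
# ...while an xs:double index orders them numerically.
as_doubles = sorted(prices, key=float)

print(as_strings)   # ['1000.0', '250.0', '99.5']
print(as_doubles)   # ['99.5', '250.0', '1000.0']

# Equality differs as well: the same number written two ways.
print("1000" == "1000.0", float("1000") == float("1000.0"))  # False True
```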
12
Context Dependencies
The right-hand argument must not depend on the current context item.
– Wrong: //item[price = self]
– Right: //item[price < xs:double($max-price)]
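The reason for this restriction can be illustrated in Python: a constant bound lets one index lookup answer the whole query, whereas a bound that changes per item forces a separate comparison for every node. The `list_price` field and both function names are hypothetical.

```python
import bisect

# Hypothetical items: a price plus a second, per-item field (list_price).
items = [{"id": i, "price": float(i % 500), "list_price": float((i * 3) % 500)}
         for i in range(10_000)]

# A range index on price: (value, id) pairs in sorted order.
entries = sorted((item["price"], item["id"]) for item in items)
keys = [value for value, _ in entries]


def under_constant(max_price):
    """Right-hand side is independent of the context item: one lookup serves every node."""
    cut = bisect.bisect_left(keys, max_price)
    return {item_id for _, item_id in entries[:cut]}


def under_own_list_price():
    """Right-hand side changes with each context item, so each node is compared separately."""
    return {item["id"] for item in items if item["price"] < item["list_price"]}


print(len(under_constant(100.0)), len(under_own_list_price()))
```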
13
Full-Text Index
The full-text index is used to query for a sequence of separate "words" or tokens in a longer stream of text. While the index is being built, the text is parsed into single tokens, which are then stored in the index.
Historically, eXist has created a default full-text index on all text nodes and attribute values. This will likely change in the future, as the index is undergoing a major redesign; as it becomes more configurable, the current default indexing behaviour may be dropped. As with the other index types, you can configure the full-text index in the collection configuration, and the eXist developers intend to keep the configuration of the new index backwards compatible. We therefore recommend creating a collection configuration file, disabling the default index-all behaviour, and defining explicit full-text indexes on your documents; configuration files are covered later in this presentation.
The full-text index is only used in combination with eXist's full-text search extensions. In particular, you can use the following eXist-specific operators and functions, which apply a full-text index:
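Tokenizing text and storing each token in an inverted index can be sketched in Python as follows. The tokenizer here (regex word split plus lower-casing) is an assumed simplification; eXist's real tokenizer and any stop-word handling differ. The names and sample nodes are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-cased word tokens (a simplified stand-in for eXist's tokenizer)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_fulltext_index(nodes):
    """Inverted index: token -> IDs of the text nodes containing it."""
    index = defaultdict(set)
    for node_id, text in nodes.items():
        for token in tokenize(text):
            index[token].add(node_id)
    return index


nodes = {
    1: "Native XML databases store XML natively.",
    2: "Relational databases store rows.",
}
index = build_fulltext_index(nodes)
print(sorted(index["databases"]), sorted(index["xml"]))  # [1, 2] [1]
```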
14
Full-Text Operators and Functions
Operators:
– &= (match all terms)
– |= (match any term)
Main functions:
– text:match-all()
– text:match-any()
– near()
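The semantics behind these operators can be approximated over a positional index, sketched here in Python. `match_all`, `match_any`, and `near` are hypothetical helpers, not eXist's functions. In particular, treating two terms as "near" when their word positions differ by at most `max_distance` is an assumed simplification; eXist's `near()` has its own signature and defaults.

```python
import re
from collections import defaultdict


def tokenize(text):
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_positional_index(nodes):
    """token -> {node ID -> [word positions]}"""
    index = defaultdict(lambda: defaultdict(list))
    for node_id, text in nodes.items():
        for position, token in enumerate(tokenize(text)):
            index[token][node_id].append(position)
    return index


def match_all(index, terms):
    """Nodes containing every term (the idea behind &= and text:match-all())."""
    node_sets = [set(index.get(term.lower(), {})) for term in terms]
    return set.intersection(*node_sets) if node_sets else set()


def match_any(index, terms):
    """Nodes containing at least one term (the idea behind |= and text:match-any())."""
    return set().union(*(set(index.get(term.lower(), {})) for term in terms))


def near(index, first, second, max_distance=1):
    """Nodes where the two terms occur within max_distance word positions of each other."""
    a = index.get(first.lower(), {})
    b = index.get(second.lower(), {})
    return {
        node_id
        for node_id in a.keys() & b.keys()
        if any(abs(p - q) <= max_distance for p in a[node_id] for q in b[node_id])
    }


nodes = {
    1: "eXist is a native XML database",
    2: "XML and JSON are data formats",
    3: "a native database for graphs",
}
idx = build_positional_index(nodes)
print(sorted(match_all(idx, ["native", "database"])))  # [1, 3]
print(sorted(near(idx, "native", "database")))         # [3]
```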
15
Disabling Indexes
If you have disabled full-text indexing for certain elements, these operators and functions are effectively disabled for those elements as well and will not return matches, even for queries that would have returned results with full-text indexing enabled. This is in direct contrast to range indexing, which falls back to a full search of the documents when no range index applies.
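The contrast can be modelled in a few lines of Python: a full-text query is a pure index lookup, so an element that was never indexed simply yields nothing, while a range comparison without an index still scans and returns the correct answer. All names and data here are hypothetical, and the dictionary passed as `range_index` is only a stand-in for a real index.

```python
# Hypothetical text/value nodes: node ID -> (element qname, content).
nodes = {1: ("title", "Native XML"), 2: ("price", "42.0"), 3: ("price", "150.0")}


def build_fulltext(nodes, indexed_qnames):
    """Tokenize only the elements whose full-text indexing is enabled."""
    index = {}
    for node_id, (qname, text) in nodes.items():
        if qname in indexed_qnames:
            for token in text.lower().split():
                index.setdefault((qname, token), set()).add(node_id)
    return index


def fulltext_match(index, qname, word):
    """Index lookup only: there is no fallback, so unindexed elements never match."""
    return index.get((qname, word.lower()), set())


def range_less_than(range_index, nodes, qname, bound):
    """Use the range index when one exists, otherwise scan and cast every node."""
    if qname in range_index:
        return {node_id for value, node_id in range_index[qname] if value < bound}
    return {node_id for node_id, (q, text) in nodes.items()
            if q == qname and float(text) < bound}


print(fulltext_match(build_fulltext(nodes, {"title"}), "title", "xml"))  # {1}
print(fulltext_match(build_fulltext(nodes, set()), "title", "xml"))      # set()
print(range_less_than({}, nodes, "price", 100.0))                        # {2}
```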
16
Geospatial Indexing (Beta)
A working proof-of-concept index that listens for spatial geometries described in the Geography Markup Language (GML)
17
Sample Geospatial Data
278515.400,187060.450 278515.150,187057.950 278516.350,187057.150 278546.700,187054.000 278580.550,187050.900 278609.500,187048.100 278609.750,187051.250 278574.750,187054.650 278544.950,187057.450 278515.400,187060.450
18
Sample Geospatial Queries
– What is the distance from point X to point Y?
– Which items are within X miles of this point?
– Which items are inside county Y?
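The three query types can be sketched in Python against the sample coordinate list, which forms a closed ring (its first and last points are equal). The sketch assumes planar coordinates, so distances come out in the coordinates' own units; converting them to miles would depend on the coordinate reference system, which the slide does not state. Ray casting is used here as one common point-in-polygon test; it is not necessarily what eXist's spatial index does. The function names are hypothetical.

```python
import math

# Coordinate list from the sample slide (x,y pairs separated by spaces; a closed ring).
SAMPLE = ("278515.400,187060.450 278515.150,187057.950 278516.350,187057.150 "
          "278546.700,187054.000 278580.550,187050.900 278609.500,187048.100 "
          "278609.750,187051.250 278574.750,187054.650 278544.950,187057.450 "
          "278515.400,187060.450")


def parse_coordinates(text):
    """Turn 'x,y x,y ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def distance(p, q):
    """Planar (Euclidean) distance, in the units of the coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def within(points, centre, radius):
    """Points no farther than radius from centre."""
    return [p for p in points if distance(p, centre) <= radius]


def point_in_polygon(point, ring):
    """Ray casting: count edge crossings of a horizontal ray to the right of the point."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


ring = parse_coordinates(SAMPLE)
print(len(ring), ring[0] == ring[-1])                  # 10 True
print(point_in_polygon((278560.0, 187054.4), ring))    # True
print(point_in_polygon((278560.0, 187060.0), ring))    # False
```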
19
Custom Indexing
– eXist version 1.2 and later feature a modularized indexing architecture
– Allows arbitrary indexes to be plugged into an indexing pipeline
– Requires Java development skills
– See http://exist-db.org/devguide_indexes.html
20
For the eXist Database Administrator
For each collection you want to configure, for example /db/foo:
– Create a file named collection.xconf and store it as /db/system/config/db/foo/collection.xconf
Inheritance:
– Subcollections that do not have a collection.xconf file of their own are governed by the configuration of the closest ancestor collection that does have one
21
Inheritance Example
– Collections: /db, /db/foo, /db/foo/bar
– Configuration file: /db/system/config/db/foo/collection.xconf
If no collection.xconf exists for a collection (here /db/foo/bar), it defaults to the configuration of its parent (here /db/foo).
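The lookup rule can be sketched in Python: map a collection path to its configuration path under /db/system/config, then walk up the ancestors until a stored configuration is found. `stored_configs` is a hypothetical stand-in for the configuration files actually present in the database; this is an illustration of the rule, not eXist code.

```python
import posixpath

CONFIG_ROOT = "/db/system/config"


def config_path_for(collection):
    """/db/foo -> /db/system/config/db/foo/collection.xconf"""
    return CONFIG_ROOT + collection.rstrip("/") + "/collection.xconf"


def resolve_config(collection, stored_configs):
    """Return the configuration governing a collection: its own, else the closest ancestor's."""
    path = collection.rstrip("/")
    while path:
        candidate = config_path_for(path)
        if candidate in stored_configs:
            return candidate
        # Stop after checking /db itself; otherwise move to the parent collection.
        path = "" if path == "/db" else posixpath.dirname(path)
    return None


stored_configs = {"/db/system/config/db/foo/collection.xconf"}
print(resolve_config("/db/foo/bar", stored_configs))
# /db/system/config/db/foo/collection.xconf
```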
22
Thank You!
Please contact me for more information on:
– Native XML Databases
– Metadata Management
– Metadata Registries
– Service Oriented Architectures
– Business Intelligence and Data Warehouse
– Semantic Web
Dan McCreary, President
Dan McCreary & Associates
Metadata Strategy Development
dan@danmccreary.com
(952) 931-9198
23
Index Creation and Updates
– The eXist index system automatically maintains and updates the indexes defined by the user
– You therefore do not need to update an index when you update a database document or collection; eXist even updates indexes after partial document updates via XUpdate or XQuery Update expressions
– The only exception to eXist's automatic updates occurs when you add a new index definition to an existing database collection: the data already stored there must then be reindexed manually
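This behaviour can be modelled with a toy collection in Python. Stores and updates keep the index in sync automatically, but a newly added index definition only covers documents stored afterwards until an explicit reindex. `ToyCollection` and its methods are hypothetical and greatly simplified.

```python
class ToyCollection:
    """Hypothetical collection whose value index is kept in sync on every store."""

    def __init__(self):
        self.docs = {}              # document name -> {qname: value}
        self.indexed_qnames = set()
        self.index = {}             # (qname, value) -> set of document names

    def _index_doc(self, name):
        for qname, value in self.docs[name].items():
            if qname in self.indexed_qnames:
                self.index.setdefault((qname, value), set()).add(name)

    def _unindex_doc(self, name):
        for key in list(self.index):
            self.index[key].discard(name)
            if not self.index[key]:
                del self.index[key]

    def store(self, name, fields):
        if name in self.docs:
            self._unindex_doc(name)  # updates keep the index in sync automatically
        self.docs[name] = dict(fields)
        self._index_doc(name)

    def add_index_definition(self, qname):
        # Only affects documents stored or updated from now on.
        self.indexed_qnames.add(qname)

    def reindex(self):
        self.index.clear()
        for name in self.docs:
            self._index_doc(name)

    def lookup(self, qname, value):
        return set(self.index.get((qname, value), set()))


coll = ToyCollection()
coll.store("a.xml", {"price": "10"})
coll.add_index_definition("price")       # new definition on existing data
before_reindex = coll.lookup("price", "10")
coll.store("b.xml", {"price": "10"})     # newly stored data is indexed
after_store = coll.lookup("price", "10")
coll.reindex()                           # existing data is now covered too
after_reindex = coll.lookup("price", "10")
coll.store("a.xml", {"price": "20"})     # an update keeps the index in sync
after_update = (coll.lookup("price", "10"), coll.lookup("price", "20"))
print(before_reindex, after_store, sorted(after_reindex))
```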
24
Sample Collection Index