Adam Koehler Index Speed Demons - How To Turbo-Charge Your Text Based Queries Using Full-Text Indexing.

Thank you sponsors!

About Me
Adam Koehler, Senior Database Administrator at ScriptPro LLC
15 years of progressive experience with SQL Server from 7.0 to
LinkedIn:
Blog:

What are we going to cover?
Relational & full-text indexes:
  What are they?
  How are they implemented?
  What are the benefits?
  What are the downsides?
Other search products: Apache Lucene.NET

Relational Indexes – What are they?
A set of pages organized in a B-tree structure with a multi-level hierarchy
Can be defined as a clustered or non-clustered index
Built into the SQL Server Engine itself
Can be created on tables or views

Relational Indexes – Clustered Indexes
The physical ordering of the data into an organized structure based on the key values of the index
Only one allowed per table, because the rows can be stored in only one order (ascending or descending)
The leaf nodes of a clustered index are the data rows themselves

Relational Indexes – Non-clustered Indexes
Contain a row locator that points to the clustered index, or to the data row if the table is a heap
Up to 999 can be created on an individual table
Non-key columns can be added as included columns at the leaf level of the index, which lets fully covered queries execute optimally
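The relationship between the two index types can be sketched with a toy model. This is only an illustration in Python using sorted lists and binary search, not how SQL Server stores B-tree pages; the table, column names and sample rows are made up.

```python
from bisect import bisect_left

# Toy "clustered index": rows kept sorted by the clustering key (id).
# Each row is (id, last_name, city); the sample data is made up.
clustered = [
    (1, "Nguyen", "Austin"),
    (2, "Baker", "Denver"),
    (3, "Lopez", "Boston"),
    (4, "Adams", "Denver"),
]
clustered_keys = [row[0] for row in clustered]

def clustered_seek(row_id):
    """Binary-search the clustering key; the 'leaf' is the row itself."""
    pos = bisect_left(clustered_keys, row_id)
    if pos < len(clustered_keys) and clustered_keys[pos] == row_id:
        return clustered[pos]
    return None

# Toy "non-clustered index" on last_name: sorted (key, row locator) pairs.
# The row locator here is the clustering key, as it would be for a table
# that has a clustered index.
nonclustered = sorted((row[1], row[0]) for row in clustered)
nonclustered_keys = [entry[0] for entry in nonclustered]

def nonclustered_seek(last_name):
    """Seek the non-clustered index, then follow the locator to the row."""
    pos = bisect_left(nonclustered_keys, last_name)
    if pos < len(nonclustered) and nonclustered_keys[pos] == last_name:
        locator = nonclustered[pos][1]
        return clustered_seek(locator)  # the extra lookup step
    return None

# Variant with an included column: city is carried at the leaf level, so a
# query that needs only last_name and city never touches the clustered index.
covering = sorted((row[1], row[0], row[2]) for row in clustered)
covering_keys = [entry[0] for entry in covering]

def covered_city(last_name):
    """Answer the query from the covering index alone."""
    pos = bisect_left(covering_keys, last_name)
    if pos < len(covering) and covering_keys[pos] == last_name:
        return covering[pos][2]
    return None
```

In this sketch, nonclustered_seek("Lopez") needs two searches (index, then row), while covered_city("Lopez") needs one, which is the point of included columns.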

Relational Indexes – Benefits
Easy to implement; no additional code required
Commonly used, so there is a lot of information available on indexing strategies
Work with most data types

Relational Indexes – Downsides
Depending on the index structure, as the table gets bigger, so do the indexes on the table
As the indexes get larger, the time to query data based on that index can increase
Fragmentation can occur in the indexes, which can increase space usage & slow down queries
Queries against this data are row by row and byte by byte, which can be slow, depending on the amount of data you're dealing with
Certain data types cannot be key columns: varchar(max), nvarchar(max), varbinary(max), xml

Relational Indexes – Implementation
Uses the CREATE CLUSTERED INDEX & CREATE NONCLUSTERED INDEX statements
Have visibility into indexes using the following DMVs, DMFs and catalog views:
  sys.dm_db_index_physical_stats
  sys.dm_db_index_operational_stats
  sys.dm_db_index_usage_stats
  sys.dm_db_partition_stats
  sys.allocation_units

Full-Text Indexing – What is it?
A token (word) based index that allows searching against character data and binary (BLOB) data such as Excel & Word documents
Has been part of SQL Server since SQL Server 7.0
Significantly updated in SQL Server 2008 to integrate fully into the SQL Server Engine

Full-Text Indexes – Architecture
Consists of two parts:
  Full-Text Engine in sqlservr.exe
    Responsible for query compilation and processing
  Filter daemon host process (fdhost.exe)
    Responsible for loading the filters that the Full-Text Engine uses
    Started by the MSSQLFDLauncher service

Full-Text Indexes – Full-Text Engine
sqlservr.exe is responsible for the following components of Full-Text Search:
  User tables
  Full-text gatherer
    Works with the full-text crawl threads to schedule and run index populations and to monitor full-text catalogs
  Thesaurus files
    Stored in <sql instance directory>\MSSQL\FTData
  Stoplist objects
    Common (noise) words that are not searched on
  Query processor
    If a query contains a full-text search, the processor passes it off to the Full-Text Engine for compilation and execution
  Full-Text Engine index writer
    Builds the structure used to store the indexed items

Full-Text Indexes – FD Host Process
Responsible for accessing, filtering, and word breaking data from tables, and for stemming the query input. Has the following components:
  Protocol handler
    Pulls the data from memory for processing and accesses data from user tables.
  Filters
    Data in varbinary, varbinary(max), image or xml columns requires filtering before it can be indexed. The filters are based on the document type; they extract chunks of text from the documents, removing formatting and keeping the text and position information.
  Word breakers and stemmers
    Language-specific components. Word breakers find word boundaries based on the lexical rules of a given language. Stemmers conjugate verbs and expand word tenses. At indexing time, the filter daemon uses these to perform linguistic analysis on the text data from a given column, based on the language defined on the index itself.
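The word breaking, stoplist and stemming steps described above can be sketched as a toy pipeline. This is a minimal Python illustration, not SQL Server's implementation: the stoplist, the suffix rules and the sample documents are all hypothetical, and the crude stemmer is applied to the query input only.

```python
import re
from collections import defaultdict

# Hypothetical stoplist and suffix rules, for illustration only; real word
# breakers and stemmers are language-specific and far more thorough.
STOPLIST = {"the", "a", "an", "of", "and", "to", "is"}
SUFFIXES = ("ing", "es", "s", "ed")

def word_break(text):
    """Split text into lowercase tokens on non-alphanumeric boundaries."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(documents):
    """Map each non-stopword token to {doc key: [word positions]}."""
    index = defaultdict(dict)
    for key, text in documents.items():
        for position, token in enumerate(word_break(text), start=1):
            if token in STOPLIST:
                continue  # position still advances, so distances stay true
            index[token].setdefault(key, []).append(position)
    return index

def search(index, term):
    """Return doc keys whose tokens share a crude stem with the term."""
    wanted = crude_stem(term.lower())
    hits = set()
    for token, postings in index.items():
        if crude_stem(token) == wanted:
            hits.update(postings)  # adds the document keys
    return sorted(hits)

documents = {
    1: "The driver is driving to the index seminar",
    2: "Indexes speed up the queries",
}
index = build_index(documents)
```

With this sample data, search(index, "index") finds both documents because "index" and "indexes" reduce to the same crude stem, while stoplist words such as "the" never enter the index at all.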

Full-Text Indexes – Search Processing

Full-Text Indexes – Benefits
Allows for semantic search operations against fields in the database
As long as automatic population is turned on, full-text index maintenance is fairly simple
The size of the full-text index on the table is usually smaller than that of a relational index

Full-Text Indexes – Downsides
Requires modification of existing code to support searches
Only one full-text index allowed per table
Can only be created on the following data types:
  char, varchar, nchar, nvarchar
  text, ntext
  image
  xml
  varbinary and varbinary(max)

Full-Text Indexes – Implementation
The FDHost service must be started
Named Pipes must be an enabled network protocol for SQL Server
A full-text catalog must be created first to group any full-text indexes together (CREATE FULLTEXT CATALOG)
  Can have multiple catalogs per database
The table that will hold the full-text index must have a unique key index defined (i.e. primary key or unique index)

Full-Text Indexes – Implementation
Have visibility into the Full-Text subsystem via the following catalog views and DMVs/DMFs
Database level:
  sys.fulltext_indexes
  sys.fulltext_catalogs
  sys.fulltext_stopwords
  sys.fulltext_stoplists
  sys.dm_fts_index_keywords
  sys.dm_fts_index_keywords_by_document
  sys.dm_fts_index_keywords_position_by_document
Instance level:
  sys.dm_fts_active_catalogs
  sys.dm_fts_fdhosts
  sys.dm_fts_index_population
  sys.dm_fts_memory_buffers
  sys.dm_fts_memory_pools
  sys.dm_db_fts_index_physical_stats
  sys.dm_fts_parser

Full-Text Indexes – Usage
In order to use the full-text index, your query must include one of the following predicates or functions:
  FREETEXT
  FREETEXTTABLE
  CONTAINS
  CONTAINSTABLE

Full-Text Indexes – CONTAINS
Used in the WHERE clause of a query
Searches for precise or less precise matches to single words and phrases
Can search for the following:
  Prefix of a word or phrase
  Word near another word
  A word that is inflectionally generated from another (i.e. drive, drives, drove, driving, driven)
  Synonyms of another word using a thesaurus
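The kinds of match listed above can be illustrated with a small Python sketch. It only mimics the matching rules on in-memory strings; the function names, the inflection table and the sample rows are hypothetical, and distance here is simply the difference in word positions, which may not match how SQL Server measures proximity.

```python
import re

def tokenize(text):
    """Lowercase word tokens, in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_prefix(text, prefix):
    """True if any word starts with the prefix (a prefix-style term)."""
    prefix = prefix.lower()
    return any(token.startswith(prefix) for token in tokenize(text))

def matches_near(text, first, second, max_distance):
    """True if the two words occur within max_distance positions."""
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(a - b) <= max_distance
               for a in first_pos for b in second_pos)

# Hypothetical table of inflectional forms; a real engine derives these
# from language-specific stemmers.
INFLECTIONS = {"drive": {"drive", "drives", "drove", "driving", "driven"}}

def matches_inflection(text, word):
    """True if any inflected form of the word appears in the text."""
    forms = INFLECTIONS.get(word.lower(), {word.lower()})
    return any(token in forms for token in tokenize(text))

rows = {
    1: "She drove the truck across the mountain pass",
    2: "Mountain bikes and road bikes are on sale",
    3: "The driver checked the truck before driving",
}

prefix_hits = [k for k, v in rows.items() if matches_prefix(v, "driv")]
near_hits = [k for k, v in rows.items()
             if matches_near(v, "truck", "mountain", 3)]
inflection_hits = [k for k, v in rows.items()
                   if matches_inflection(v, "drive")]
```

Note how the prefix search and the inflectional search return different rows: "drove" is a form of "drive" but does not start with "driv", and "driver" starts with "driv" but is not a verb form of "drive".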

Full-Text Indexes – CONTAINSTABLE
Returns a table of zero, one, or more rows for the columns queried, containing precise or less precise matches to single words and phrases, words within a given distance of one another, or weighted matches
Used in the FROM clause
Returns a relevance ranking value and the full-text key in the result set
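The idea of returning a key and a rank per matching row, rather than a yes/no answer, can be sketched as follows. This is a toy Python model: the contains_table function, its integer weights and the scoring rule (term count times weight) are assumptions for illustration and are not SQL Server's ranking formula.

```python
import re

def contains_table(rows, weighted_terms):
    """Return (key, rank) pairs for rows matching any term, best first.

    Rank is a toy score: occurrences of each term times its weight.
    Terms are expected in lowercase.
    """
    results = []
    for key, text in rows.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        rank = sum(tokens.count(term) * weight
                   for term, weight in weighted_terms.items())
        if rank > 0:
            results.append((key, rank))
    # Highest rank first; ties broken by key for a stable order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))

rows = {
    10: "index tuning and index maintenance",
    20: "full text search with a full text index",
    30: "backup and restore",
}
ranked = contains_table(rows, {"index": 8, "text": 2})

# Join the (key, rank) result back to the source rows by key.
joined = [(rows[key], rank) for key, rank in ranked]
```

The result carries only keys and ranks, so the caller joins it back to the base rows by key, which mirrors why the function is used in the FROM clause.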

Full-Text Indexes – FREETEXT
Used in the WHERE clause of a query
Searches for values that match the meaning, not the exact wording, of the search criteria
Queries using FREETEXT are less precise than CONTAINS
Matches are generated if any term or form of any term is found

Full-Text Indexes – FREETEXTTABLE
Uses the same search conditions as FREETEXT, but also adds a rank and key value for each row
Used in the FROM clause of a query, like CONTAINSTABLE

Apache Lucene.NET – What is it?
Port of the Java Lucene search library to .NET
Based on an inverted index
  Mapping from content to locations in files or a database
  Used in search engine indexing

Apache Lucene.NET – Benefits
Allows C# developers to index documents and tables while only having to learn basic T-SQL constructs
Is a module that can be built into C# applications with minimal effort
Does not interact with the database, except when the query is executed to build and maintain the index files on disk

Apache Lucene.NET – Downsides
Separate files exist on disk that must be maintained & backed up with file backups to make sure that the indexing service still runs
Cannot tune the queries against the index files without recompiling your application
  Unless those queries are in stored procedures; then you can tune the stored procedures

Apache Lucene.NET – Implementation
Main components of Lucene.NET:
  Analyzer – breaks down the search criteria into single words/terms
  IndexWriter – coordinates with the Analyzer and moves results into storage
  IndexSearcher – performs the actual search against the index file
  Document – entity which is to be retrieved by the index
    Table in a database
  Field – metadata that describes a document; this data is what is searchable
    Columns in a table
  Store Directory – directory in which the index files are stored
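How these components fit together can be sketched with a toy inverted index. The Python classes below borrow the component names from the list above purely for illustration; they are hypothetical stand-ins and not the Lucene.NET API, and the directory lives in memory instead of on disk.

```python
import re
from collections import defaultdict

class Analyzer:
    """Breaks text into single lowercase terms."""
    def analyze(self, text):
        return re.findall(r"[a-z0-9]+", text.lower())

class Directory:
    """In-memory stand-in for the on-disk index store."""
    def __init__(self):
        # postings[(field, term)] -> set of document ids
        self.postings = defaultdict(set)
        self.documents = {}

class IndexWriter:
    """Runs each field through the analyzer and stores the postings."""
    def __init__(self, directory, analyzer):
        self.directory = directory
        self.analyzer = analyzer

    def add_document(self, doc_id, fields):
        self.directory.documents[doc_id] = fields
        for field, value in fields.items():
            for term in self.analyzer.analyze(value):
                self.directory.postings[(field, term)].add(doc_id)

class IndexSearcher:
    """Looks terms up in the stored postings and returns documents."""
    def __init__(self, directory, analyzer):
        self.directory = directory
        self.analyzer = analyzer

    def search(self, field, query):
        terms = self.analyzer.analyze(query)
        if not terms:
            return []
        # .get avoids creating empty entries in the defaultdict
        hit_sets = [self.directory.postings.get((field, t), set())
                    for t in terms]
        ids = set.intersection(*hit_sets)  # every term must match
        return [self.directory.documents[i] for i in sorted(ids)]

analyzer = Analyzer()
directory = Directory()
writer = IndexWriter(directory, analyzer)
writer.add_document(1, {"title": "Full-Text Indexing",
                        "body": "token based index"})
writer.add_document(2, {"title": "Relational Indexes",
                        "body": "B-tree pages"})
searcher = IndexSearcher(directory, analyzer)
```

Here each document plays the role of a table row and each field the role of a column; searcher.search("body", "index token") returns the first document because both terms appear in its body field.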

Summary
Relational indexes are the easiest to implement to get good performance boosts on your systems.
Full-Text indexes increase what can be indexed in your database, allow for search engine-like queries against SQL Server, and can speed up your character-based queries dramatically.
Apache Lucene.NET is nice for C# developers, but not for DBAs to implement.

Links, and Thank you!
CREATE FULLTEXT INDEX
Query with Full-Text Search
Apache Lucene.NET
Lucene.NET main concepts
Lucene.NET Sample application
LinkedIn:
