Presentation is loading. Please wait.

Presentation is loading. Please wait.

Connect with life Nauzad Kapadia Quartz Systems

Similar presentations


Presentation on theme: "Connect with life Nauzad Kapadia Quartz Systems"— Presentation transcript:

1 Connect with life www.connectwithlife.co.in Nauzad Kapadia Quartz Systems nauzadk@quartzsystems.com

2 Session Objectives And Takeaways Session Objective(s): Discover what we have learned from using the new Integrated Full Text Search Find out what works and what doesn’t work iFTS is faster than SQL 2005 full text A great base for future improvements

3 Background Based on MSSearch Lack of Integration limited the performance of queries Limits the ability to integrate with high availability and scalability functionality

4 New Architecture

5 T-SQL Parser Algebrizer SQL Algeb. FTS Algeb. Bind Language Module FTS Algeb. Parse Ranking Func. Integration SQL SERVER process FDHOST process Shared Memory WB Stemmer iFilters STOPLIST THESAURUS WB client QO QE SQL/FTS integrated query tree FTLogicalOperator Cardinality FTExecutionOperator QUERY Execution Plan Full-Text Index Results

6 Demo – Using Full Text Search

7 StopLists New STOPLIST support Simplified noise words utilization and manageability. DB object associated with the FT index. CREATE FULLTEXT STOPLIST stoplist_name [ FROM {[database_name.] source_stoplist_name} | SYSTEM STOPLIST] [AUTHORIZATION owner_name] ALTER FULLTEXT STOPLIST stoplist_name { | ADD LANGUAGE language_term | DROP { | LANGUAGE language_term | ALL LANGUAGE language_term | ALL }

8 Demo – Stop Lists and Creating FullText Indexes

9 Thesaurus improvements Stored in internal tables (in tempdb) in XML form instead of being parsed from external files Instance level thesaurus sys.sp_fulltext_load_thesaurus_file (lcid) Loads all the data specified in the Thesaurus XML corresponding to the language with specified lcid.

10 Demo - Thesaurus

11 New family of Word-Breakers (WB): WBs are components responsible of parse the textual data in a given language and pass the tokenized result to the Full-Text Index. 51 languages/WBs out of the box Improved quality in many already existing word-breakers

12 English English UK Simplified Chinese Traditional Chinese Chinese (Hong Kong) Chinese (Macau) Chinese (Singapore) Thai Korean French German Japanese Italian Spanish Bengali Bulgarian Catalan Croatian Neutral Punjabi Romanian Serbian Cyrillic Serbian Latin Slovak Slovenian Tamil Telugu Ukrainian Urdu Lithuanian Malay Icelandic Indonesian Hindi Gujarati Vietnamese Arabic Norwegian Portuguese Brazilian Russian Dutch Malayalam Marathi Hebrew Canada Latvian Swedish Danish Polish Turkish WBs available in SQL Server 2008: Languages present but disabled by default New languages supported in SQL Server 2008 Existing in SQL Server 2005, and being replaced by new WBs in SQL Server 2008 Unchanged language/WB from SQL Server 2005

13 The indexing performance has improved in most scenarios 2005 Crawl2005 TotalIFTS CrawlIFTS Total 20M rows 1k text data 02:0602:2501:2201:28 5M rows 8k text data 02:1002:4102:2202:32 20M rows 1k nvarchar data 01:3701:5501:2001:26 Measured on 4 processor AMD64 2793 MHz, 8G RAM. Numbers are in HH:MM format. Total time is combining time to crawl and time of merge into index For some HW configuration and data types, specific best practices are recommended to improve indexing performance (i.e: capping SQL Server’s memory)

14 To see the word frequency Sys.dm_fts_index_keywords() Sys.dm_fts_index_keywords_by_document() Get number and size of fragments Sys.fulltext_index_fragments Understanding Query Behavior Sys.dm_fts_parser(““This is test” AND “This is also a test”, 1033,0)

15 Demo – Understanding Indexes

16 - Due to new architecture, we have now new Full-Text Indexes. Former ones are not compatible in SQL Server 2008. Solution: Full-Text Catalog Upgrade Option - Import: (default) Faster method although performance and semantic implications are possible. - Rebuild: Slower method although ideal final state of new FTCatalogs guaranteed. - Reset: Faster Upgrade method although your Search app will not have the FTCatalogs available afterwards. You need to rebuild them when possible. - Possible Upgrade methods: 1.In place Upgrade: User will be prompted for what Upgrade Option to choose for existing FTCatalogs. 2.Restore/Attach : Instance level setting will be applied to former Full-Text Catalogs brought up with the former DB.

17 Put full text index on separate file group to avoid fragmenting main data files Use varchar(max) instead of text/image If you see excessive blocking by FTGATHER turn off auto change tracking Schedule manual job to do updates Watch number of fragments to determine how often to schedule Don’t run full text master merges with other index rebuilds or reorgs at the same time If you have large documents (>2MB) may need to reduce SQL memory a bit so FDHost daemon has memory to run.

18

19 No fielded searches on XML or any other document format. Still will be at document level. No partitioned full-text indexes. So no support for SWITCH partition on tables that are FT indexed. Some of the customer wish list items: Snippets, column weights, language detection, customizable wordbreakers and proximity operators, etc.. Forms of words when searching secondary languages; i.e. -s, -ing, -ed; e.g. in a document that used Russian word breaker initially and you know it contains the English word ‘Books’, it will find ‘book’ but not ‘books’.

20 Related Content Webcasts MSDN Webcast: Using Full-Text Search in SQL Server Express http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=103229497 9&Culture=en-US White Paper on SQL Server FTS 2005 http://msdn2.microsoft.com/en-us/library/ms345119.aspx MSDN http://msdn2.microsoft.com/en-us/library/ms142571.aspx Technical Case Study http://www.microsoft.com/technet/itshowcase/content/intdocmgmtsql2005.mspx FTS 2008 (iFTS) White Paper http://msdn.microsoft.com/en-us/library/cc721269(SQL.100).aspx Program Manager on SQL Server FTS Fernando.azpeitia@microsoft.com

21 Demo

22 © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Download ppt "Connect with life Nauzad Kapadia Quartz Systems"

Similar presentations


Ads by Google