Introduction to Full-Text Searching in SQL Server 2012 Adolfo J. Socorro, Ph.D. IT Impact, Inc.

Outline  What can we do with FTS?  How to install FTS  FTS components  Creating FTS indexes  How to query with FTS  FILESTREAM and FileTable

FTS Basics  FTS allows searching against character-based data  Supported column types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, varbinary(max)

Search Functionality  Specific words or phrases: “hotel” => “hotel”  Prefixes: “fan” => “fantastic”, “fantasy”; “local store” => “locally stored”  Inflectional forms: “minimized” => “minimizing”, “minimise”

Search Functionality  Proximity: “search,query” => “query to perform search”  Synonyms: “folder” => “directory”  Weighted values
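
The search forms listed on these two slides map onto CONTAINS search conditions roughly as follows. This is a sketch only: the table dbo.Documents and its columns are hypothetical, and a full-text index on Body is assumed to exist already.

```sql
-- Assumption: hypothetical table dbo.Documents(DocumentId, Title, Body)
-- with a full-text index on Body.

-- Specific word, then a specific phrase (phrases go in double quotes)
SELECT DocumentId FROM dbo.Documents WHERE CONTAINS(Body, N'hotel');
SELECT DocumentId FROM dbo.Documents WHERE CONTAINS(Body, N'"local store"');

-- Prefix term: words starting with "fan" (fantastic, fantasy, ...)
SELECT DocumentId FROM dbo.Documents WHERE CONTAINS(Body, N'"fan*"');

-- Prefix phrase: every word in the phrase is treated as a prefix,
-- so "locally stored" can match
SELECT DocumentId FROM dbo.Documents WHERE CONTAINS(Body, N'"local store*"');

-- Inflectional forms generated by the language's stemmer
SELECT DocumentId FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(INFLECTIONAL, minimize)');

-- Proximity: both terms within 5 non-search terms of each other, in any order
SELECT DocumentId FROM dbo.Documents
WHERE CONTAINS(Body, N'NEAR((search, query), 5)');

-- Synonyms: expands or replaces the term using the thesaurus file
SELECT DocumentId FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, folder)');
```

The thesaurus query only returns synonym matches if the thesaurus file for the column's language defines them.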

A First Look  Let’s run some simple examples to get a feel for FTS!

LIKE vs FTS  LIKE works on character patterns only  Cannot use the LIKE predicate to query formatted binary data  FTS is much faster against large amounts of unstructured text data
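
To make the contrast concrete, here is the same lookup written both ways. The table dbo.Documents is hypothetical and is assumed to have a full-text index on Body.

```sql
-- Assumption: hypothetical table dbo.Documents with a full-text index on Body.

-- LIKE: character pattern matching. The leading wildcard forces every row
-- to be examined, and the pattern also matches inside longer words
-- such as "research".
SELECT DocumentId, Title
FROM dbo.Documents
WHERE Body LIKE N'%search%';

-- FTS: looks up the word "search" in the full-text index.
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, N'search');
```

The two queries are not equivalent: LIKE matches substrings, while CONTAINS matches words as produced by the word breaker.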

Supported SQL Server Editions  Enterprise  Business Intelligence  Standard  Web  Express with Advanced Services  Available since at least SQL Server 2000

FTS Components  Word Breaker  Stemmer  Stoplists  Thesaurus  Filters  Property Lists

Language Support  50+ languages  Language-specific components  Word breakers and stemmers  Stoplists  Thesaurus files

How to Install
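
After setup, one quick way to confirm that the full-text component is present on the instance is the built-in FULLTEXTSERVICEPROPERTY function:

```sql
-- Returns 1 if the full-text component is installed on this instance, 0 if not.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;
```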

Default FTS Language
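
The default language can also be inspected and changed from T-SQL. The sketch below assumes you want U.S. English (LCID 1033); substitute an LCID from the first query as needed.

```sql
-- Languages that have full-text components registered on this instance
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- 'default full-text language' is an advanced option
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Show the current setting
EXEC sp_configure 'default full-text language';

-- Assumption for illustration: set the default to 1033 (U.S. English)
EXEC sp_configure 'default full-text language', 1033;
RECONFIGURE;
```

The default only applies to columns indexed without an explicit LANGUAGE clause.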

FTS Indexes  One index per table or indexed view  Must have a unique, single-column, non-nullable index on the table  Grouped within the same database into one or more full-text catalogs (“containers”)

Full-Text Catalogs  A logical construct  A way to manage FT indexes together

Index Population  Population: the addition of data to full-text indexes  Automatic  Manual  On Request  Scheduled
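
The population modes correspond to ALTER FULLTEXT INDEX options. A minimal sketch, assuming a hypothetical table dbo.Documents that already has a full-text index:

```sql
-- Assumption: hypothetical table dbo.Documents with an existing full-text index.

-- Automatic: changes are tracked and propagated to the index in the background
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING AUTO;

-- Manual: changes are tracked but applied only when you ask for it,
-- either on request or from a scheduled job (for example, SQL Server Agent)
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON dbo.Documents START UPDATE POPULATION;

-- Rebuild the index contents from scratch
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;

-- Check progress: 0 means idle, non-zero means a population is in progress
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Documents'),
                        'TableFulltextPopulateStatus') AS PopulateStatus;
```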

Steps to Set Up an Index on a Table  Create a full-text catalog  For each column to index: indicate the language and the document type *  Choose a change-tracking mechanism

Full-Text Index Wizard

Example: Create Catalog and Index
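
The demo script for this slide is not part of the transcript. The following is a stand-in sketch of the steps listed earlier; every object name (dbo.Documents, PK_Documents, DocumentsCatalog) is hypothetical.

```sql
-- Hypothetical table. The primary key provides the unique, single-column,
-- non-nullable index that a full-text index requires.
CREATE TABLE dbo.Documents
(
    DocumentId INT IDENTITY(1, 1) NOT NULL,
    Title      NVARCHAR(200)      NOT NULL,
    Body       NVARCHAR(MAX)      NOT NULL,
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentId)
);
GO  -- batch separator understood by SSMS and sqlcmd

-- Step 1: create a catalog (a logical container for full-text indexes)
CREATE FULLTEXT CATALOG DocumentsCatalog AS DEFAULT;
GO

-- Steps 2 and 3: choose columns and languages, then the change-tracking mode
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Title LANGUAGE 1033,   -- 1033 = U.S. English
    Body  LANGUAGE 1033
)
KEY INDEX PK_Documents
ON DocumentsCatalog
WITH CHANGE_TRACKING AUTO;
GO
```

For binary columns (varbinary(max), image), each column would also need a TYPE COLUMN clause naming the column that holds the document's file extension.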

CONTAINS  Precise or prefix matches to single words and phrases  Proximity matches  Logical operations between conditions: AND, OR, AND NOT  Optional use of inflectional forms and thesaurus
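
A short sketch of combining conditions with logical operators. The table dbo.Documents is hypothetical, with Title and Body assumed to be full-text indexed in the same language.

```sql
-- Assumption: hypothetical table dbo.Documents with Title and Body
-- full-text indexed using the same language.

-- Logical operators between simple terms
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, N'filestream AND (index OR catalog) AND NOT backup');

-- A phrase searched across several columns at once
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS((Title, Body), N'"change tracking"');
```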

FREETEXT  Matching the meaning, but not the exact wording, of specified words or phrases  Always uses inflectional forms and thesaurus
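
With FREETEXT there is no search-condition syntax to learn; the string is free-form text. A minimal sketch against the same hypothetical dbo.Documents table:

```sql
-- Assumption: hypothetical table dbo.Documents with a full-text index on Body.
-- The string is word-broken, stemmed, and expanded through the thesaurus.
SELECT DocumentId, Title
FROM dbo.Documents
WHERE FREETEXT(Body, N'storing large files in the database');
```

This is less precise than CONTAINS, which makes it a reasonable fit for search boxes that take arbitrary user input.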

CONTAINSTABLE and FREETEXTTABLE  Return a relevance ranking value (RANK) and full-text key (KEY) for each row  RANK only orders rows relative to each other within one result set; the actual values are unimportant and typically differ each time the query is run  ISABOUT/WEIGHT influence the ranking in CONTAINSTABLE
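
Because these are table-valued functions, the usual pattern is to join them back to the base table on the full-text key. A sketch with hypothetical names and arbitrary weights:

```sql
-- Assumption: hypothetical table dbo.Documents whose full-text key column
-- is DocumentId. The weights (0.0 to 1.0) are arbitrary illustration values.
SELECT TOP (10) d.DocumentId, d.Title, ft.[RANK]
FROM CONTAINSTABLE(
         dbo.Documents, Body,
         N'ISABOUT (filestream WEIGHT (0.8), filetable WEIGHT (0.4), backup WEIGHT (0.1))'
     ) AS ft
JOIN dbo.Documents AS d
    ON d.DocumentId = ft.[KEY]
ORDER BY ft.[RANK] DESC;

-- FREETEXTTABLE works the same way; the optional last argument
-- limits the result to the top N rows by rank.
SELECT d.DocumentId, d.Title, ft.[RANK]
FROM FREETEXTTABLE(dbo.Documents, Body, N'storing large files', 10) AS ft
JOIN dbo.Documents AS d
    ON d.DocumentId = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```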

Example: Queries

Stoplists  A mechanism to discard commonly occurring strings that do not help the search, such as “a”, “is”, “the”, “by”, “and”, …
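
Custom stoplists are managed with DDL. A sketch, assuming a hypothetical stoplist name, a hypothetical noise word, and the hypothetical dbo.Documents table:

```sql
-- Assumption: DocumentsStoplist, the word 'lorem', and dbo.Documents
-- are hypothetical names used for illustration.

-- Start from a copy of the system stoplist
CREATE FULLTEXT STOPLIST DocumentsStoplist FROM SYSTEM STOPLIST;

-- Add a domain-specific noise word for U.S. English
ALTER FULLTEXT STOPLIST DocumentsStoplist ADD N'lorem' LANGUAGE 1033;

-- Attach the stoplist to an existing full-text index
ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST DocumentsStoplist;

-- Inspect the stopwords in the custom stoplist
SELECT sw.stopword, sw.language
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl
    ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'DocumentsStoplist';
```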

Thesaurus  Nicknames: Robert/Bob  Common misspellings: calendar/calender  Homophones: Geoff/Jeff  Technical terms: proc/procedure  Very powerful if you log searches and learn what users commonly search for

Thesaurus  One file per language  Expansions: “bike” in addition to “bicycle”  Replacements: “calendar” instead of “calender”
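
A sketch of what the two rule types look like and how to test them. It assumes U.S. English (LCID 1033, thesaurus file tsenu.xml in the instance's FTData folder) and the hypothetical dbo.Documents table; the XML is shown inside a comment because it lives in the thesaurus file, not in T-SQL.

```sql
/*
Assumed contents of the U.S. English thesaurus file (tsenu.xml):

<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
      <sub>bike</sub>
      <sub>bicycle</sub>
    </expansion>
    <replacement>
      <pat>calender</pat>
      <sub>calendar</sub>
    </replacement>
  </thesaurus>
</XML>
*/

-- Reload the thesaurus file for LCID 1033 after editing it
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Inspect how a term is expanded (expansion_type 4 = thesaurus)
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, bike)', 1033, 0, 0);

-- Use the thesaurus explicitly in a query
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, bike)');
```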

Filters  Extract textual information from the document (removing the formatting)  Send the text to the word-breaker component for the language associated with the column  Need to manually install Office 2010 and PDF filters
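
To see which document types the instance can currently filter, and to make newly installed filters visible to the full-text engine, something like the following is commonly used:

```sql
-- Document types (file extensions) with a filter registered on this instance
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;

-- After installing additional filters at the OS level,
-- allow the engine to load OS-registered components and restart the filter hosts
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
EXEC sp_fulltext_service @action = 'restart_all_fdhosts';
```

Rerunning the first query afterwards should list the newly available extensions if the filter installation succeeded.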

Example: FTS Components

Where to Store Large Objects?  Database vs. File System  Security  Manageability, recoverability  Transactional consistency  Performance

Why Store in the Database?  Integrating unstructured data into the relational database provides significant benefits:  Integrated storage and data management capabilities (e.g., backup)  Ease of administration and policy management  Full-text search

FILESTREAM  A database/file system hybrid  FILESTREAM is an attribute that can be assigned to a varbinary(max) column  Allows storing BLOB data in the file system  Not restricted to the 2 GB limit SQL Server imposes on BLOBs

FILESTREAM  SQL Server buffer pool is not used  Isolation semantics are governed by Database Engine transaction isolation levels

Steps to FILESTREAM  Enable at OS level  Configure at instance level  Create a filegroup  Add a file to the filegroup  Indicate root folder

OS-level Configuration of FILESTREAM

Instance-level Configuration of FILESTREAM
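
The instance-level setting can also be applied from T-SQL once FILESTREAM is enabled at the OS level:

```sql
-- 0 = disabled, 1 = T-SQL access only, 2 = T-SQL and Win32 streaming access
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;

-- Compare the configured level with the level actually in effect
SELECT SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel,
       SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel,
       SERVERPROPERTY('FilestreamShareName')       AS ShareName;
```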

Example: FILESTREAM
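
The demo script is not included in the transcript. A stand-in sketch of the filegroup and table steps follows; the database name, filegroup name, folder path, and table are all hypothetical.

```sql
-- Assumption: DocumentsDb, DocumentsFs, C:\Data\DocumentsFs and
-- dbo.DocumentFiles are hypothetical. C:\Data must exist; the DocumentsFs
-- folder must not exist yet (SQL Server creates it).
ALTER DATABASE DocumentsDb
    ADD FILEGROUP DocumentsFs CONTAINS FILESTREAM;

ALTER DATABASE DocumentsDb
    ADD FILE (NAME = N'DocumentsFsData', FILENAME = N'C:\Data\DocumentsFs')
    TO FILEGROUP DocumentsFs;
GO

-- A FILESTREAM table needs a unique, non-null uniqueidentifier ROWGUIDCOL column
CREATE TABLE dbo.DocumentFiles
(
    DocumentFileId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    FileName       NVARCHAR(260)    NOT NULL,
    Content        VARBINARY(MAX)   FILESTREAM NULL
)
FILESTREAM_ON DocumentsFs;
GO

-- From T-SQL the column behaves like any varbinary(max) column
INSERT INTO dbo.DocumentFiles (FileName, Content)
VALUES (N'hello.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));
```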

FILESTREAM  All data access must be transactional  Must use specific APIs for file I/O  Do not edit the files directly!
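
For streaming access, the client needs a logical path and a transaction context from SQL Server, both obtained inside a transaction. A sketch against the hypothetical dbo.DocumentFiles table from the previous example:

```sql
-- Assumption: hypothetical table dbo.DocumentFiles with a FILESTREAM column
-- named Content.
BEGIN TRANSACTION;

SELECT Content.PathName()                   AS FilePath,   -- logical path, not a disk path
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext   -- token tied to this transaction
FROM dbo.DocumentFiles
WHERE FileName = N'hello.txt';

-- The client application passes both values to the FILESTREAM streaming API
-- (for example, SqlFileStream in .NET), reads or writes the data,
-- and then commits.

COMMIT TRANSACTION;
```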

When to Use FILESTREAM  Objects that are being stored are, on average, larger than 1 MB  Store smaller objects in the database  Fast read access is important  You are using a middle tier for application logic

FileTables  A special, fixed-schema kind of table  Builds on top of existing FILESTREAM capabilities  Store files and documents in the database, but access them from Windows applications as if they were stored in the file system (Win32 API)

FileTables  Hierarchical namespace  Includes file system properties as columns  Preserves full file names  Non-transactional access through the FS

FileTables  Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable

Example: FTS over FileTables
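
The demo for this slide is not in the transcript. A stand-in sketch follows, assuming FILESTREAM is already configured with a FILESTREAM filegroup; DocumentsDb, dbo.DocumentStore, the constraint name, and DocumentsCatalog are hypothetical.

```sql
-- Assumption: hypothetical database DocumentsDb with a FILESTREAM filegroup
-- and an existing full-text catalog named DocumentsCatalog.

-- Database-level prerequisites for FileTables
ALTER DATABASE DocumentsDb
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'DocumentsDb');
GO

-- The schema is fixed; naming the stream_id unique constraint makes it easy
-- to reference as the full-text key index
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = N'DocumentStore',
    FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = UQ_DocumentStore_stream_id
);
GO

-- Index the file contents; file_type (the extension) selects the filter
CREATE FULLTEXT INDEX ON dbo.DocumentStore
(
    name        LANGUAGE 1033,
    file_stream TYPE COLUMN file_type LANGUAGE 1033
)
KEY INDEX UQ_DocumentStore_stream_id
ON DocumentsCatalog;
GO

-- Files copied into the Windows share become searchable rows
SELECT name, file_stream.GetFileNamespacePath() AS RelativePath
FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, N'invoice');
```

Only files whose extension has an installed filter are indexed.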

FileTables vs FILESTREAM  File and directory hierarchy maintained in the database  Windows application compatibility  Relational access to file attributes  Both are available in all editions

Wrap Up  Advanced searching on character-based data, including documents  FTS setup, components, and queries  FILESTREAM  FileTables

Other Topics  Document-property search  Semantic search  Optimizations  Query plans and execution traces

References  Posts and presentations by Bob Beauchemin   Blog: SQL Server FTS Team Blog   SQL Server 2012 Books Online  us/library/cc645577(SQL.110).aspx

Filter Packs  Adobe PDF Filter  u.jsp?ftpID=4025&fileID=3941  Office 2010 Filters  us/download/details.aspx?id=17062