A short introduction to Full Text Search

YOU NEED THE ADVENTUREWORKS DATABASE FOR THIS BLOG!!!!

I think that everybody that works with SQL Server hates when users wants to make advanced searches, where the where-clause should be configured in such a way that it is impossible to get a good performance and to understand what is happening.

With Full text Search, Microsoft has a unique tool to optimize searches for e.g. websites or internal systems that requires a fast response and that must be flexible.

Imagine that you want to search in a producttable for Bike in the name field and the descriptionfield, You would then write something like:

Select *
from producttable
where name like ‘%Bike%’ or description like ‘%Bike%’

First of all, that would cause a lot of bad performance since you have a wildcard at the beginning of the searchfield, and second problem would be that  you can’t do an intelligent search, so that e.g. it also returns rows where it does not say Bike, but Racer, Biking and so on.

Full Text Search is the answer for that.

To do a full text search you need to setup a catalog and then an Index, and since it is a short introduction we will do it with the least amount of effort and without talking about all the properties:

— CODE START —

USE [AdventureWorks]
GO
CREATE FULLTEXT CATALOG [MainCatalog]WITH ACCENT_SENSITIVITY = ON
AS DEFAULT

— THen we create the full-text index

USE [AdventureWorks]
GO
CREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON ([MainCatalog]) WITH (CHANGE_TRACKING AUTO)
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Color])
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ENABLE
GO
— CODE END —

Now you first have a full text catalog, and within that catalog you have a full text index. the index says on what table you want to add a full text index and what rows that should be included.

An easy way to do that is to go to Management studio, choose your database –> Storage and right click “Full Text Catalogs” to add a new catalog, and when you are finished. Click the catalog and choose properties.

Now we can query the table and get an incredible performance and flexiblity.

You query by using four different words, and we will look at them now:

  1. Freetext
  2. Contains
  3. FreetextTable
  4. ContainsTable

The most simple ones are freetext. The syntax is:

— CODE START —
Select *
from Production.Product
where freetext(*, ‘Black’)

Select *
from Production.Product
where Contains(*, ‘Black’)
— CODE END —

The star means that it should look in all columns that are in the index, that means name and color. I could write a column name instead or [col1, col2, colx].

‘Black’ is of course the searchexpression.

With freetext it is simple, and you write a word or a short statement. What freetext then do is:

  1. Wrap words according to languagespecific rules, so that it does not only look for a sentense but is capable of finding words also.
  2. Stemming. Meaning that it looks after other types of the word, e.g. Start = Started = Starting e.g.
  3. Look up in the thesaurus. A file that is situated in the SQL Server folder FTDATA (there is one for each language. This one you can modify, and it tells the Full Text search, that if you search for e.g. NT then it should also return Windows, Operating system e.g.

Contains let you do more advanced searches, e.g.

— CODE START —
Select *
from Production.Product
where Contains(Name, ‘HEadSet NEAR Bearings’)
— CODE END —

Try to do that with a normal SQL Statement.

If i want contains to find all stemmings of a word (start, started e.g.), then I could write.

— CODE START —
Select *
from Production.Product
where Contains(Name, ‘FORMSOF(Inflectional, Bike)’)
— CODE END —

If I want to test it, I code do the following:

— CODE START —
— We REset Product 876
update Production.Product
set name = ‘Bike’
where productID=’876′

— See product 876
Select *
from Production.Product
where Contains(Name, ‘FORMSOF(Inflectional, Bike)’)

— Now we change it to Biking
update Production.Product
set name = ‘Biking’
where productID=’876′

— Product 876 is gone
Select *
from Production.Product
where Contains(Name, ‘Bike’)

— But with Inflectional it is here again
Select *
from Production.Product
where Contains(Name, ‘FORMSOF(Inflectional, Bike)’)
— CODE END —

NICE.. The difference between Inflectional and writing Thesaurus is that the latter looks up in the Thesaurus file, and you can controll it.

Remember.. If you change in the Thesaurus file, you must run:

— CODE START —
exec sys.sp_fulltext_load_thesaurus_file 1033;
— CODE END —

1033 is the localeID (Language code).

If you want to test what a searchphrase would return, then you could use the following Dynamic Management View:

— CODE START —
Select *
from sys.dm_fts_parser(‘FORMSOF(inflectional, Yellow)’,1033,0,0)

Select *
from sys.dm_fts_parser(‘FORMSOF(inflectional, blau)’,1031,0,0)

Select *
from sys.dm_fts_parser(‘FORMSOF(thesaurus, NT5)’,1033,0,0)

Select *
from sys.dm_fts_parser(‘FORMSOF(thesaurus, BIKE)’,1033,0,0)
— CODE END —

One problem with Full Text search is that if I live en Denmark I would love to return all rows were the word THE is in. It means tea. But if I am in you, and a user search for The Company, then I would nok like to return all rows where THE is in. I must add a stopword, and a stopword means a word that is not included in the search.

To do that I must:

  1. Create a stoplist
  2. Add stopwords
  3. attach a stoplist to an index

— CODE START–

select *
from Production.Product
where contains(*,’Headset’)

— Then Lets create a stoplist
USE [AdventureWorks]
GO
CREATE FULLTEXT STOPLIST [SorensStop]
FROM SYSTEM STOPLIST
;

GO
— Then we add a stopword
USE [AdventureWorks]
GO
ALTER FULLTEXT STOPLIST [SorensStop] ADD ‘Headset’ LANGUAGE ‘English’;
GO

alter  Fulltext index on Production.Product
set stoplist=SorensStop — YOUR FULLTEXT INDEX

— Lets try again
select *
from Production.Product
where contains(*,’Headset’)
— CODE STOP —

Nice and easy 🙂

Søren

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s