Configuring SQL Server Full-Text Search

Here are step-by-step instructions for configuring tables in a SQL Server database for Full-Text search.  The following process should work for SQL Server 2005, 2008, 2008 R2, and 2012.

For more information, see Microsoft’s documentation on configuring Full-Text search:

SQL Server 2005
SQL Server 2008
SQL Server 2008 R2
SQL Server 2012

STEP 1:  Identify the tables to be full-text indexed. 

Tables to be full-text indexed must have a unique, single-column, non-nullable index.  For this example, a single table will be indexed.  Here is the DDL for the table:

CREATE TABLE dbo.Book
(  
    BookID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Book PRIMARY KEY,  
    Title nvarchar(2000) NOT NULL DEFAULT(”), 
    Authors nvarchar(2000) NOT NULL DEFAULT(”)
)
GO

— Sample data
INSERT dbo.Book (Title, Authors)
VALUES (‘A manual of the British marine Algae’,’Harvey, William H.’)
INSERT dbo.Book (Title, Authors)
VALUES (‘A manual of marine zoology for the British Isles.’,’Gosse, Philip Henry,’)
INSERT dbo.Book (Title, Authors)
VALUES (‘A history of the British sea-anemones and corals’,’Gosse, Philip Henry,’)
INSERT dbo.Book (Title, Authors)
VALUES (‘The life of North American insects’,’Jaeger, Benedict.|Preston, Henry C.’)
INSERT dbo.Book (Title, Authors)
VALUES (‘Proceedings of the California Academy of Sciences.’,’California Academy of Sciences.’)
INSERT dbo.Book (Title, Authors)
VALUES (‘The romance of natural history’,’Gosse, Philip Henry,’)
INSERT dbo.Book (Title, Authors)
VALUES (‘British conchology : or, an account of the Mollusca which now inhabit the British Isles and the surrounding seas’,’Jeffreys, John Gwyn,’)
INSERT dbo.Book (Title, Authors)
VALUES (‘Proceedings of the Entomological Society of Philadelphia.’,’Entomological Society of Philadelphia.’)
INSERT dbo.Book (Title, Authors)
VALUES (‘Observations on the Pedipalpi of North America’,’Wood, H. C.’)
INSERT dbo.Book (Title, Authors)
VALUES (‘On the origin of species, or, The causes of the phenomena of organic nature : a course of six lectures to working men’,’Huxley, Thomas Henry,’)
GO

This DDL represents a table that contains data about books.  The first column provides the unique, single-column, non-nullable index for the table, in the form of the table’s primary key.  The Title and Authors columns contain the searchable information about each book.  In a real-world scenario, the table would likely contain additional fields, both searchable and not. 

A good strategy for full-text indexing is to create a set of denormalized tables just to support full-text search, and accumulate data from the primary relational model into those tables.  This strategy separates tables used for OLTP from tables that support search functionality.  The key to making this strategy effective is deciding on an appropriate method and schedule for syncing the data between the OLTP tables and the search tables.

STEP 2: Enable the database for full-text indexing (SQL 2005 only)

In SQL Server 2008 and above, databases are always full-text enabled, so no action is needed.  In SQL 2005, full-text indexing is enabled be default.  If for some reason it is disabled, it can be enabled by executing the following command:

exec sp_fulltext_database ‘enable’

STEP 3 (OPTIONAL):  Edit the Noise Words list (SQL 2005 only)

Noise Words are very common words that, if full-text indexed, decrease the accuracy of full-text searches.  Examples are words like ‘a’, ‘the’, and ‘or’.  SQL Server 2005 ships with default lists of Noise Words for many different languages.

To view or edit the list, go to the database server and find the appropriate language file (for US English, the default file location is C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt).  If editing this file, don’t forget to make a backup!

For this example, after editing, the content of the file will be:

$
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

This represents the removal of most the contents of the file.  The table being indexed contains data about books, and many “noise” words prove to be significant for books.  For example, with the default Noise Word list, a full-text search for "origin of species" may fail to find Darwin’s best-known work (notice the noise word “of” in that title).  That is why, for this example, the content of the noise word file has been greatly reduced.

The need to modify the noise word list will vary by application; many times the file will not need to be edited.

If the noise word file is edited after the full-text catalogs have been created, they will need to be rebuilt.

STEP 4: Create a full-text catalog

A full-text catalog is a physical (SQL Server 2005) or logical collection of full-text indexes.

For SQL Server 2005, use the following command to set up a full-text catalog:

CREATE FULLTEXT CATALOG FullTextSearchCatalog IN PATH ‘D:\’ 

The path to the full-text catalog (in this case ‘D;\’) will be unique to the server on which SQL Server is installed.

Note:  As a best practice, full-text catalogs should be created on their own physical drive.

In SQL Server 2008 and after, a full-text catalog no longer has a physical implementation.  It is instead a logical concept that represents a group of full-text indexes, so no file paths are necessary.  Therefore, the command for setting up a full-text catalog in SQL Server 2008 and above is simply:

CREATE FULLTEXT CATALOG FullTextSearchCatalog

For information about other arguments for this command, look at this article on MSDN.

STEP 5: Create a full-text index

Use the CREATE FULLTEXT INDEX command to create a full-text index on a table, as shown here:

CREATE FULLTEXT INDEX ON dbo.Book
(  
    Title,  
    Authors  
)
KEY INDEX PK_Book ON FullTextSearchCatalog
GO

There are several things to notice about this command.  First, the index is being placed into the full-text catalog that was created in the previous step (FullTextSearchCatalog).  Next, the primary key of the table (PK_Book) is used to map table rows to full-text keys.  Finally, all columns to be full-text indexed (Title and Authors) are included in the index definition. 

STEP 6 (OPTIONAL): Populate the full-text indexes

This step is considered optional because the creation of the full-text index in the previous step should have automatically started the index population process.  However, this might need to be done again at a later date (if the noise words file is modified, for example).

To fully populate the full-text index, use this command:

ALTER FULLTEXT INDEX ON dbo.Book START FULL POPULATION
GO

Note the following command will accomplish the same thing:

ALTER FULLTEXT CATALOG FullTextSearchCatalog REBUILD
GO

It is also possible to update full-text indexes incrementally.  See the Microsoft’s documentation for more information about populating full-text indexes.

The status of index population can be verified by checking the PopulateStatus property of the full-text catalog.  Do this by using the FULLTEXTCATALOGPROPERTY function, as shown here:

SELECT FULLTEXTCATALOGPROPERTY(‘FullTextSearchCatalog’, ‘Populatestatus’)

A return value other than "0" indicates that index population operations are in progress.  This can be used to determine if the full-text indexes are online and available to respond to queries.  Prior to issuing a full-text query, a production system might check the full-text catalog status and issue an alternate (non-full-text) query if full-text indexes are offline.

As of SQL Server 2012, the PopulateStatus full-text catalog property is deprecated.  New development work should instead check the TableFullTextPopulateStatus property of a particular table using the OBJECTPROPERTYEX function.  Again, a return value other than “0” indicates that index population operations are in progress.

STEP 7 (OPTIONAL): Edit the default Stop List (SQL 2008 and up)

Stop Lists replace the Noise Word lists used in SQL 2005, and serve the same purpose (see STEP 3 above for more information).  Whereas Noise Word lists require manual editing of a text file on the server on which SQL Server is installed, Stop Lists can be modified via T-SQL scripts and the SQL Management Studio.

Following are the commands used to configure Noise Word lists on SQL 2008 and later.  First, create a Stop List:

CREATE FULLTEXT STOPLIST FullTextStopList FROM SYSTEM STOPLIST;

Once a stop list has been created, the sys.fulltext_stopwords table can be queried to view the contents of the Stop List.  Here is the command for viewing the English words in the Stop List:

SELECT * FROM sys.fulltext_stopwords WHERE language = ‘English’

For this example, the English-language Stop List should contain only the entries that we included in the SQL 2005 Noise Words list (STEP 3).  Because this requires removing most of the default words in the Stop List, the easiest way to produce the desired list is to drop all of the default entries and re-add only those that are needed.  Here are the commands to achieve this:

ALTER FULLTEXT STOPLIST FullTextStopList DROP ALL LANGUAGE ‘English’ ;

ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘$’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘A’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘B’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘C’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘D’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘E’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘F’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘G’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘H’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘I’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘J’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘K’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘L’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘M’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘N’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘O’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘P’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘Q’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘R’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘S’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘T’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘U’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘V’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘W’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘X’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘Y’ LANGUAGE ‘English’ ;
ALTER FULLTEXT STOPLIST FullTextStopList ADD ‘Z’ LANGUAGE ‘English’ ;

An important difference from SQL Server 2005 Noise Word lists is that Stop Lists can be applied only to selected full-text indexes.  (SQL Server 2005 Noise Word lists affect the entire server.) 

To associate the Stop List with a full-text indexed table, use the following command:

ALTER FULLTEXT INDEX ON dbo .Book SET STOPLIST FullTextStopList;

After associating a Stop List with a full-text index, the index should be repopulated (repeat STEP 6).

CONCLUSION

After completing these steps, it should be possible to submit full-text queries against the full-text indexed table.  Here is one example query that uses the new index:

SELECT  b.BookID AS ID, b.Title, b.Authors, x.[RANK]
FROM    CONTAINSTABLE(dbo.Book, (Title, Authors),
                      ‘("manual" and "British") or "Gosse"’) x
        INNER JOIN dbo.Book b ON x.[KEY] = b.BookID

If everything has been configured correctly, this query will return the following results:

ID Title                                             Authors             RANK
1  A manual of the British marine Algae              Harvey, William H.   32
2  A manual of marine zoology for the British Isles. Gosse, Philip Henry, 48
3  A history of the British sea-anemones and corals  Gosse, Philip Henry, 48
6  The romance of natural history                    Gosse, Philip Henry, 48

Hope this helps!

Advertisements

One Response to Configuring SQL Server Full-Text Search

  1. jj says:

    This post, especially the bit on modifying the default stop list, was really helpful today. Thank you for posting!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: