Faced with this very issue, I decided to setup a local SQL Server Full-Text Search.
Some of the cool things Full-Text Search will give you, over and above, a standard search include the following:
- One or more specific words or phrases (simple term)
- A word or a phrase where the words begin with specified text (prefix term)
- Inflectional forms of a specific word (generation term)
- A word or phrase close to another word or phrase (proximity term)
- Synonymous forms of a specific word (thesaurus)
- Words or phrases using weighted values (weighted term)
In order to get stared with the setup, it's important to know that the Full-Text Search architecture relies on filters for searching various file types. This is important for this example because the PDF filter is not installed by default. So, for starters, we need to go download and install the PDF ifilter (PDFFilter64Setup.msi).
There is a trick with getting the installation setup correctly; you have to setup a system variable. Here's how I did it.
Environment Variables:
From Environment Variables I first created a new Variable "AdobeiFilter" and pointed it to the bin directory associated with the install:
I then clicked Edit on the Path Variable, in environment Variables, and appended %AdobeiFilter% to the end:
I can validate this by opening run and typing "%AdobeiFilter%" which drops me to my bin directory here: "C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin"
You may also need to grant your SQL Server Service account access to this folder.
Next we'll configure SQL Server Full-Text. You can check to see if it's already installed by running the following:
SELECT Serverproperty('IsFullTextInstalled')
If it's not installed you can run through the standard SQL Installation choosing "New SQL Server stand-alone installation or add features to an existing installation":
The feature is listed as "Full-Text and Semantic Extractions for search":
Next we run a couple of sp_fulltext_service commands:
--Load operating system filters and word breakers
EXEC Sp_fulltext_service @action='load_os_resources',
@value=1;
--Do not verify whether binaries are signed
EXEC Sp_fulltext_service
@action='verify_signature',
@value=0;
Validate SQL is associating Full-Text and the PDF filter:
SELECT document_type,
path FROM sys.fulltext_document_types
WHERE document_type = '.pdf'
You may need to restart the SQL Service, possibly the box, at this point if the association isn't there.
For this blog I created a new database, "FULLTEXTDEMO". By default full-text indexing was enabled at the time of creation. This can also be checked with the following query:
SELECT NAME,
is_fulltext_enabled
FROM sys.databases
ORDER BY NAME
is_fulltext_enabled
FROM sys.databases
ORDER BY NAME
Next I created a table to hold my full-text pdf files:
USE [FULLTEXTDEMO]
go
CREATE TABLE [dbo].[files]
(
[id] [INT] IDENTITY(1, 1) NOT NULL,
[filename] [NVARCHAR](max) NULL,
[filetype] VARCHAR(5) NULL,
--File type is required for creating a Full-Text Index
[file] [VARBINARY](max) NULL,
CONSTRAINT [PK_files] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (pad_index =
OFF, statistics_norecompute = OFF, ignore_dup_key = OFF, allow_row_locks =
on, allow_page_locks = on) ON [PRIMARY]
)
ON [PRIMARY]
textimage_on [PRIMARY]
go
CREATE TABLE [dbo].[files]
(
[id] [INT] IDENTITY(1, 1) NOT NULL,
[filename] [NVARCHAR](max) NULL,
[filetype] VARCHAR(5) NULL,
--File type is required for creating a Full-Text Index
[file] [VARBINARY](max) NULL,
CONSTRAINT [PK_files] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (pad_index =
OFF, statistics_norecompute = OFF, ignore_dup_key = OFF, allow_row_locks =
on, allow_page_locks = on) ON [PRIMARY]
)
ON [PRIMARY]
textimage_on [PRIMARY]
Next we create a Full-Text Catalog
USE [FULLTEXTDEMO]
go
CREATE fulltext catalog [FILESCATALOG]
go
go
CREATE fulltext catalog [FILESCATALOG]
go
...and a full-text index to go with it:
Next, I loaded a couple of files:
INSERT INTO [FULLTEXTDEMO].[dbo].[files]
([filename],
[filetype],
[file])
SELECT 'D:\Files\Administrators Guide.pdf',
'.pdf',
bulkcolumn
FROM OPENROWSET(BULK 'D:\Files\Administrators Guide.pdf', single_blob) AS
input
INSERT INTO [FULLTEXTDEMO].[dbo].[files]
([filename],
[filetype],
[file])
SELECT 'D:\Files\Installation Guide.pdf',
'.pdf',
bulkcolumn
FROM OPENROWSET(BULK 'D:\Files\Installation Guide.pdf', single_blob) AS input
Finally, some fun with Query with Full-Text Search
--Simple Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"SQL Server"')
--Prefix Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"MS*"')
--Generation Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], 'FORMSOF(INFLECTIONAL, "install")')
--Proximity Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], 'NEAR((".NET","SQL Server"),2)')
--Weighted Term
SELECT F.[filename],
KEY_TBL.rank
FROM [FULLTEXTDEMO].[dbo].[files] AS F
INNER JOIN CONTAINSTABLE([FULLTEXTDEMO].[dbo].[files], [file],
'ISABOUT ("Install*", Windows WEIGHT(0.1), SQL WEIGHT(0.9) ) ' ) AS
KEY_TBL
ON F.id = KEY_TBL.[key]
ORDER BY KEY_TBL.rank DESC
--PROTIP: if you do it like this you can get a clickable link
SELECT ( 'file://' + [filename] ) AS [File]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"SQL Server"')
FOR xml raw
You can also get crazy with custom synonyms via: Configure and Manage Thesaurus Files for Full-Text Search
Statistical Semantic Search is really interesting, and it requires a separate setup. Details are here, if you're interested:
Install and Configure Semantic SearchINSERT INTO [FULLTEXTDEMO].[dbo].[files]
([filename],
[filetype],
[file])
SELECT 'D:\Files\Administrators Guide.pdf',
'.pdf',
bulkcolumn
FROM OPENROWSET(BULK 'D:\Files\Administrators Guide.pdf', single_blob) AS
input
INSERT INTO [FULLTEXTDEMO].[dbo].[files]
([filename],
[filetype],
[file])
SELECT 'D:\Files\Installation Guide.pdf',
'.pdf',
bulkcolumn
FROM OPENROWSET(BULK 'D:\Files\Installation Guide.pdf', single_blob) AS input
Finally, some fun with Query with Full-Text Search
--Simple Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"SQL Server"')
--Prefix Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"MS*"')
--Generation Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], 'FORMSOF(INFLECTIONAL, "install")')
--Proximity Term
SELECT [filename]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], 'NEAR((".NET","SQL Server"),2)')
--Weighted Term
SELECT F.[filename],
KEY_TBL.rank
FROM [FULLTEXTDEMO].[dbo].[files] AS F
INNER JOIN CONTAINSTABLE([FULLTEXTDEMO].[dbo].[files], [file],
'ISABOUT ("Install*", Windows WEIGHT(0.1), SQL WEIGHT(0.9) ) ' ) AS
KEY_TBL
ON F.id = KEY_TBL.[key]
ORDER BY KEY_TBL.rank DESC
--PROTIP: if you do it like this you can get a clickable link
SELECT ( 'file://' + [filename] ) AS [File]
FROM [FULLTEXTDEMO].[dbo].[files]
WHERE CONTAINS([file], '"SQL Server"')
FOR xml raw
You can also get crazy with custom synonyms via: Configure and Manage Thesaurus Files for Full-Text Search
Thanks a lot. explained perfectly
ReplyDelete