Select-String word search

by Mar 14, 2013

Hi,

Bit of a weird one… oh, and I'm not trying to be dirty here 😉

I am writing a script that scans documents for a list of banned words, using the Select-String cmdlet. The script seems to do the job, but I have now hit a problem: it detects the word '***' ('***' appears in the banned word list) in Word documents. The documents don't actually have the word '***' in the body, but the word 'do***ent' appears in the file's metadata, so the file gets flagged (as '***' is a substring of 'document').
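To make the behaviour concrete, here is a minimal sketch of the substring matching described above. The words and sample lines are hypothetical stand-ins ('cat' plays the banned word, 'concatenate' the innocent one); they are not taken from my real list.

```powershell
# Hypothetical stand-ins: 'cat' plays the banned word, 'concatenate' the innocent one.
$BannedWords = @('cat')
$Lines = @('please concatenate the files', 'the cat sat down')

# -Pattern is treated as a regular expression, so it matches anywhere
# inside a longer word, not just on whole words.
$Hits = $Lines | Select-String -Pattern $BannedWords

# Both lines come back, including the one that only contains 'concatenate'.
$Hits | ForEach-Object { $_.Line }
```

This is the same effect I am seeing with 'document' in the .doc metadata.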

This is a snippet of my script:

if ($StandardFileGroup)
{
    foreach ($File in $StandardFileGroup)
    {
        # Find every line in the file that matches any banned word
        $FileBuffer = Select-String -Path $File -Pattern $BannedWords

        if ($FileBuffer)
        {
            # Pull the matched text out of each hit (no second Select-String pass needed)
            $FileInfo = $FileBuffer | select -ExpandProperty Matches | select -ExpandProperty Value

            Write-Host "BANNED WORD FOUND:" -ForegroundColor Red
            Write-Host "FILE: $File"
            Write-Host "BANNED WORD/S: $FileInfo" -ForegroundColor Red
            Add-Content $outputFile "$Date $File : $FileInfo"
        }
        else
        {
            Write-Host "NO BANNED WORDS FOUND IN FILE $File" -ForegroundColor Green
            # Log the current file ($File; the original line used an undefined $item)
            Add-Content $outputFile "$Date $File : NO BANNED WORDS FOUND!"
        }
    }
}

 

Below is what $FileBuffer contains; it seems the word 'document' is getting flagged:

PS C:Userssadevebp> $Filebuffer

C:ScriptsBannedWordsFilterSampleDocsTestDoc1.doc:53:  ï¿½ï¿½ï¿½ï¿½         �      F     Microsoft Word 97-2003 Document
C:ScriptsBannedWordsFilterSampleDocsTestDoc1.doc:54:   MSWordDoc   Word.Document.8 �9�q

 

Any ideas on how to get around this? Obviously I don't want all Word docs to get flagged, but I still need to be able to search the actual doc for the word '***'.
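One direction I have wondered about, sketched below under assumptions, is wrapping each banned word in regex word boundaries (\b) so that only whole words match. The word list and sample lines are hypothetical, and $WholeWordPatterns is a name I made up for illustration. I have not confirmed this against real .doc files, and it only addresses the substring issue, not how Select-String reads the binary content of a .doc.

```powershell
# Hypothetical stand-ins for the real banned list and document text.
$BannedWords = @('cat', 'dog')
$Lines = @('please concatenate the files', 'the cat sat down', 'hotdog stand')

# Escape each word so regex metacharacters are taken literally, then
# require a word boundary (\b) on both sides so only whole words match.
$WholeWordPatterns = $BannedWords | ForEach-Object { '\b' + [regex]::Escape($_) + '\b' }

$Hits = $Lines | Select-String -Pattern $WholeWordPatterns

# Only 'the cat sat down' is reported; 'concatenate' and 'hotdog' are skipped.
$Hits | ForEach-Object { $_.Line }
```

If that is sound, the same $WholeWordPatterns array could presumably be passed to -Pattern in place of $BannedWords in my script.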

 

Thanks in advance!