Hi,
Bit of a weird one… oh, and I'm not trying to be dirty here 😉
I am writing a script that scans documents for a list of banned words, using the Select-String cmdlet. The script seems to do the job, but I have now run into a problem: it detects the word '***' (which is in the banned word list) in Word documents. The documents don't actually contain '***' in the body text, but the word 'do***ent' appears in the file's metadata, so the file gets flagged ('***' is a substring of 'document').
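One common way to stop substring hits is to wrap each banned word in regex word boundaries (\b), so a word only matches when it is not part of a longer word. The sketch below is an illustration, not the original script: the word list ('cat', 'dog') is a hypothetical stand-in for $BannedWords, and the words are passed through [regex]::Escape so any punctuation in them is matched literally.

```powershell
# Hypothetical banned-word list standing in for the script's $BannedWords
$BannedWords = @('cat', 'dog')

# Escape each word so characters such as '.' or '+' are matched literally,
# join the words into one alternation, and wrap it in \b word boundaries so
# a banned word only matches when it is not part of a longer word
$Escaped = $BannedWords | ForEach-Object { [regex]::Escape($_) }
$Pattern = '\b(' + ($Escaped -join '|') + ')\b'

'concatenate the files' -match $Pattern   # False: 'cat' only appears inside another word
'the cat sat down' -match $Pattern        # True: 'cat' appears as a whole word
```

$Pattern can then be passed to Select-String -Path $File -Pattern $Pattern in place of the raw list; like -match, Select-String is case-insensitive by default. Note that \b relies on word characters, so a banned word that starts or ends with punctuation would need a different boundary rule.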
This is a snippet of my script:
if ($StandardFileGroup)
{
    foreach ($File in $StandardFileGroup)
    {
        $FileBuffer = Select-String -Path $File -Pattern $BannedWords
        if ($FileBuffer)
        {
            # Each MatchInfo already carries its matches, so there is no need to run Select-String a second time
            $FileInfo = $FileBuffer | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
            Write-Host "BANNED WORD FOUND:" -ForegroundColor Red
            Write-Host "FILE: $File"
            Write-Host "BANNED WORD/S: $FileInfo" -ForegroundColor Red
            Add-Content $outputFile "$Date $File : $FileInfo"
        }
        else
        {
            Write-Host "NO BANNED WORDS FOUND IN FILE $File" -ForegroundColor Green
            # Log the current file ($File); the original referenced an undefined $item here
            Add-Content $outputFile "$Date $File : NO BANNED WORDS FOUND!"
        }
    }
}
Below is what $FileBuffer outputs; it seems the word 'Document' is what is getting flagged:
PS C:\Users\sadevebp> $FileBuffer
C:\Scripts\BannedWordsFilter\SampleDocs\TestDoc1.doc:53: ���� � F
Microsoft Word 97-2003 Document
C:\Scripts\BannedWordsFilter\SampleDocs\TestDoc1.doc:54: MSWordDoc Word.Docum
ent.8 �9�q
Any ideas on how to get around this? Obviously I don't want every Word document to get flagged, but I still need to be able to search the actual document text for the word '***'.
Thanks in advance!
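Word boundaries alone may not be enough, because Select-String -Path reads a .doc file as raw bytes, so any standalone banned word in the file properties or other binary streams can still match. One possible approach, assuming Microsoft Word is installed on the machine running the script, is to extract the body text through Word's COM automation (Word.Application, Documents.Open, Document.Content) and search only that. The helper name Get-WordBodyText is hypothetical, and Content covers the main text story only, so headers, footers and comments are not included in this sketch.

```powershell
# Sketch: pull the main body text out of a Word document via COM automation.
# Assumes Microsoft Word is installed; Get-WordBodyText is a hypothetical helper name.
function Get-WordBodyText {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    $Word = New-Object -ComObject Word.Application
    $Word.Visible = $false
    try {
        # Documents.Open(FileName, ConfirmConversions, ReadOnly): open read-only
        $Doc = $Word.Documents.Open($Path, $false, $true)
        try {
            # Content is the main text story, so file properties and other
            # binary metadata are not part of the returned string
            $Doc.Content.Text
        }
        finally {
            # Close without saving (wdDoNotSaveChanges = 0)
            $Doc.Close(0)
        }
    }
    finally {
        $Word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)
    }
}
```

Inside the loop, Word files could then be checked with Get-WordBodyText -Path $File | Select-String -Pattern $BannedWords, while plain-text files keep using Select-String -Path directly. Starting Word is slow, so for many files it may be worth creating one Word.Application instance and reusing it across documents.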