Get Text File Encoding

by Jan 22, 2019

Text files can be stored using different encodings, and to correctly reading them, you must specify the encoding. That’s why most cmdlets dealing with text file reading offer the -Encoding parameter (for example, Get-Content). If you don’t specify the correct encoding, you are likely ending up with messed up special characters and umlauts.

Yet how do you (automatically) determine the encoding a given text file uses? Here is a handy function that can help:

function Get-Encoding
{
  param
  (
    [Parameter(Mandatory,ValueFromPipeline,ValueFromPipelineByPropertyName)]
    [Alias('FullName')]
    [string]
    $Path
  )

  process 
  {
    $bom = New-Object -TypeName System.Byte[](4)
        
    $file = New-Object System.IO.FileStream($Path, 'Open', 'Read')
    
    $null = $file.Read($bom,0,4)
    $file.Close()
    $file.Dispose()
    
    $enc = [Text.Encoding]::ASCII
    if ($bom[0] -eq 0x2b -and $bom[1] -eq 0x2f -and $bom[2] -eq 0x76) 
      { $enc =  [Text.Encoding]::UTF7 }
    if ($bom[0] -eq 0xff -and $bom[1] -eq 0xfe) 
      { $enc =  [Text.Encoding]::Unicode }
    if ($bom[0] -eq 0xfe -and $bom[1] -eq 0xff) 
      { $enc =  [Text.Encoding]::BigEndianUnicode }
    if ($bom[0] -eq 0x00 -and $bom[1] -eq 0x00 -and $bom[2] -eq 0xfe -and $bom[3] -eq 0xff) 
      { $enc =  [Text.Encoding]::UTF32}
    if ($bom[0] -eq 0xef -and $bom[1] -eq 0xbb -and $bom[2] -eq 0xbf) 
      { $enc =  [Text.Encoding]::UTF8}
        
    [PSCustomObject]@{
      Encoding = $enc
      Path = $Path
    }
  }
}

Here is a test run checking all text files in your user profile:

 
PS> dir $home -Filter *.txt -Recurse | Get-Encoding

Encoding                    Path                                                                        
--------                    ----                                                                        
System.Text.UnicodeEncoding C:\Users\tobwe\E006_psconfeu2019.txt                                        
System.Text.UnicodeEncoding C:\Users\tobwe\E009_psconfeu2019.txt                                        
System.Text.UnicodeEncoding C:\Users\tobwe\E027_psconfeu2019.txt                                        
System.Text.ASCIIEncoding   C:\Users\tobwe\.nuget\packages\Aspose.Words\18.12.0\...
System.Text.ASCIIEncoding   C:\Users\tobwe\.vscode\extensions\ms-vscode.powers...
System.Text.UTF8Encoding    C:\Users\tobwe\.vscode\extensions\ms-vscode.powers... 
 

psconf.eu – PowerShell Conference EU 2019 – June 4-7, Hannover Germany – visit www.psconf.eu There aren’t too many trainings around for experienced PowerShell scripters where you really still learn something new. But there’s one place you don’t want to miss: PowerShell Conference EU – with 40 renown international speakers including PowerShell team members and MVPs, plus 350 professional and creative PowerShell scripters. Registration is open at www.psconf.eu, and the full 3-track 4-days agenda becomes available soon. Once a year it’s just a smart move to come together, update know-how, learn about security and mitigations, and bring home fresh ideas and authoritative guidance. We’d sure love to see and hear from you!

Twitter This Tip! ReTweet this Tip!