Converting Word Documents from .doc to .docx (Part 2)

Nov 12, 2019

Converting old Word documents to the new .docx format can be a lot of work, and in part 1 you learned the basic steps to automate conversion.

However, to do it right, there are a number of extra steps. If you want to adhere to security guidelines, you need to find out whether the documents contain macros, and change the extension accordingly. Also, if a document is in read-only mode, you cannot convert it and should skip it. And if you bulk-convert a lot of documents, a progress bar would be nice to have.
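
To make the "change the extension accordingly" step concrete, here is a minimal sketch of the mapping as a lookup table. The helper name Get-WordConversionTarget and its -HasMacros switch are hypothetical and only serve as illustration. The numbers are Word's WdSaveFormat values (13 = macro-enabled document, 14 = template, 15 = macro-enabled template, 16 = default document format).

```powershell
function Get-WordConversionTarget
{
  param
  (
    [Parameter(Mandatory)]
    [string]
    $Path,

    # hypothetical switch: pass the result of $doc.HasVBProject here
    [switch]
    $HasMacros
  )

  # target extension and WdSaveFormat value per source extension
  $targets = @{
    '.doc' = @{ Plain = '.docx', 16; Macro = '.docm', 13 }
    '.dot' = @{ Plain = '.dotx', 14; Macro = '.dotm', 15 }
  }

  $extension = [System.IO.Path]::GetExtension($Path)

  # anything that is not .doc or .dot yields no output
  if (-not $targets.ContainsKey($extension)) { return }

  $kind = if ($HasMacros) { 'Macro' } else { 'Plain' }
  $targetExtension, $saveFormat = $targets[$extension][$kind]

  [PSCustomObject]@{
    Path       = [System.IO.Path]::ChangeExtension($Path, $targetExtension)
    SaveFormat = $saveFormat
  }
}
```

With a document already opened in Word, a call could look like Get-WordConversionTarget -Path $Path -HasMacros:$doc.HasVBProject. Keeping the mapping in one table makes it easier to review which file types end up macro-enabled.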

A big thanks to Lars Köpcke, security specialist at the city of Wuppertal, for adding the macro check and testing for read-only documents!

Here is a revised function that bulk-converts like a charm. Still, this is just a prototype. If you want to use it in production, make sure you understand it and add all the error handling and reporting you need. If you don’t want the function to overwrite existing files, add checks.

function Convert-WordDocument
{
  param
  (
    # accept path strings or items from Get-ChildItem
    [Parameter(Mandatory,ValueFromPipeline,ValueFromPipelineByPropertyName)]
    [string]
    [Alias('FullName')]
    $Path
  )
  
  begin
  {
    # we are collecting all paths first
    [Collections.ArrayList]$collector = @()
  }

  process
  {
    # find extension
    $extension = [System.IO.Path]::GetExtension($Path)
    
    # we only process .doc and .dot files
    if ($extension -eq '.doc' -or $extension -eq '.dot')
    {
        # add to list for later processing
        $null = $collector.Add($Path)

    }
  }
  end
  {   
    # pipeline is done, now we can start converting!

    Write-Progress -Activity Converting -Status 'Launching Application'

    # initialize Word (must be installed)
    $word = New-Object -ComObject Word.Application

    $counter = 0
    Foreach ($Path in $collector)
    {
        # increment a counter for the progress bar
        $counter++

        # open document in Word
        $doc = $word.Documents.Open($Path)

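        # determine target document type
        # if the doc has macros, use the macro-enabled extensions
        # the numbers are WdSaveFormat values:
        # 13 = wdFormatXMLDocumentMacroEnabled, 14 = wdFormatXMLTemplate,
        # 15 = wdFormatXMLTemplateMacroEnabled, 16 = wdFormatDocumentDefault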

        [string]$targetExtension = ''
        [int]$targetConversion = 0

        switch ([System.IO.Path]::GetExtension($Path))
        { 
          '.doc' {    
            if ($doc.HasVBProject -eq $true)
            { 
              $targetExtension = '.docm'
              $targetConversion = 13
            }
            else
            {
              $targetExtension = '.docx'  
              $targetConversion = 16     
            }
          }
          '.dot' {
            if ($doc.HasVBProject -eq $true)
            { 
              $targetExtension = '.dotm'
              $targetConversion = 15 
            }
            else
            {
              $targetExtension = '.dotx'  
              $targetConversion = 14      
            }
          }
        }

        # conversion cannot work for read-only docs
        # (a document that Word shows in reading layout is treated as read-only here)
        if (!$doc.ActiveWindow.View.ReadingLayout)
        {
            if ($targetConversion -gt 0)
            {
              $pathOut = [IO.Path]::ChangeExtension($Path, $targetExtension)
              
              $doc.Convert()
              $percent = $counter * 100 / $collector.Count
              Write-Progress -Activity 'Converting' -Status $pathOut -PercentComplete $percent
              $doc.SaveAs([ref]$pathOut, [ref]$targetConversion)
            }
        }

        # close the document we opened, regardless of which one is active
        $doc.Close()
    } 

    # quit Word when done
    Write-Progress -Activity Converting -Completed
    $word.Quit()
  }
}

This is what an example call would look like:

 
PS> dir F:\documents -Include *.doc, *.dot -Recurse | Convert-WordDocument 
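
The function overwrites existing target files, and the text above suggests adding checks if you don't want that. One possible check is sketched here, assuming you would rather save under a fresh name than skip the document. The helper name Get-UniqueTargetPath is hypothetical and not part of the original function.

```powershell
function Get-UniqueTargetPath
{
  param
  (
    [Parameter(Mandatory)]
    [string]
    $Path
  )

  # nothing in the way: use the path as-is
  if (-not (Test-Path -LiteralPath $Path)) { return $Path }

  $folder    = [System.IO.Path]::GetDirectoryName($Path)
  $baseName  = [System.IO.Path]::GetFileNameWithoutExtension($Path)
  $extension = [System.IO.Path]::GetExtension($Path)

  # append (1), (2), ... until a free name is found
  $index = 1
  do
  {
    $candidate = [System.IO.Path]::Combine($folder, "$baseName ($index)$extension")
    $index++
  } while (Test-Path -LiteralPath $candidate)

  $candidate
}
```

Inside Convert-WordDocument, you could then run $pathOut = Get-UniqueTargetPath -Path $pathOut right before the call to SaveAs. If you prefer to skip documents that were already converted, a plain Test-Path check with a warning does the job as well.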
 
