Finding (and Deleting) Duplicate Files

Jan 11, 2009

There are numerous ways of finding duplicate files. One approach uses Group-Object to group your files by LastWriteTime and Length, assuming that files with the same LastWriteTime and Length are indeed identical.

This line searches for duplicate PowerShell scripts in your user profile:

dir $home\*.* -include '*.ps1' | Group-Object Length, LastWriteTime

You can then filter out all groups with only one file in them so only the duplicate files are left.

dir $home\*.* -include '*.ps1' | Group-Object Length, LastWriteTime |
Where-Object {$_.Count -gt 1} | ForEach-Object { $_.Group }

To search for duplicate files recursively, add the -recurse parameter to Dir:

dir $home\*.* -include '*.ps1' -recurse | Group-Object Length, LastWriteTime |
Where-Object {$_.Count -gt 1} | ForEach-Object { $_.Group }

Important: this only shows files that share the same size and last write time. They could still differ in content (which is very unlikely, but possible). To be certain, you could hash the candidate duplicates based on their content.
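On PowerShell 4.0 and later you could use Get-FileHash for that content check. Here is a minimal sketch (the SHA256 algorithm is just an example choice): it hashes the candidate duplicates and keeps only files whose hashes collide, which means they are byte-for-byte identical.

# Take the size/time duplicates, hash them, and regroup by hash.
# Only groups with more than one member are true content duplicates.
dir $home\*.* -include '*.ps1' -recurse |
    Group-Object Length, LastWriteTime |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    Get-FileHash -Algorithm SHA256 |
    Group-Object Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group.Path }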

To archive duplicate files, you would pick all files in each group except the first one (which you want to keep) and then move the rest to a backup folder:

dir $home\*.* -include '*.ps1' | Group-Object Length, LastWriteTime |
Where-Object {$_.Count -gt 1} | ForEach-Object { $_.Group[1..1000] }

[1..1000] selects the elements from the second one upwards because array indexes always start at 0 (the range 1..1000 assumes no group contains more than 1,001 files). Simply append a move or delete cmdlet to complete the operation:

dir $home\*.* -include '*.ps1' | Group-Object Length, LastWriteTime |
Where-Object {$_.Count -gt 1} | ForEach-Object { $_.Group[1..1000] } |
Move-Item -destination c:\backup -whatIf
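
The upper bound of 1000 is arbitrary. As a sketch of an alternative, Select-Object -Skip 1 drops the first file of each group no matter how large the group is (again with -whatIf so nothing is actually moved yet):

# Keep the first file of each duplicate group, move the rest to the backup folder
dir $home\*.* -include '*.ps1' | Group-Object Length, LastWriteTime |
Where-Object {$_.Count -gt 1} |
ForEach-Object { $_.Group | Select-Object -Skip 1 } |
Move-Item -destination c:\backup -whatIf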