In the previous tip we explained what Group-Object can do for you, and how awesome it is. Unfortunately, Group-Object does not scale well. When you try to group a large number of objects, the cmdlet may take a very long time.
Here is a line that groups all files in your user profile by size. This could be an important prerequisite when you want to check for duplicate files. While this line will eventually yield results, it may take many minutes or even hours:
$start = Get-Date
$result = Get-ChildItem -Path $home -Recurse -ErrorAction SilentlyContinue -File |
    Group-Object -Property Length
$stop = Get-Date
($stop - $start).TotalSeconds
Because of these limitations, we created a PowerShell-based implementation of Group-Object and called it Group-ObjectFast. It basically does the same thing, just faster.
function Group-ObjectFast
{
    param
    (
        [Parameter(Mandatory,Position=0)]
        [Object]
        $Property,

        [Parameter(ParameterSetName='HashTable')]
        [Alias('AHT')]
        [switch]
        $AsHashTable,

        [Parameter(ValueFromPipeline)]
        [psobject[]]
        $InputObject,

        [switch]
        $NoElement,

        [Parameter(ParameterSetName='HashTable')]
        [switch]
        $AsString,

        [switch]
        $CaseSensitive
    )

    begin
    {
        # if comparison needs to be case-sensitive, use a
        # case-sensitive hash table
        if ($CaseSensitive)
        {
            $hash = [System.Collections.Hashtable]::new()
        }
        # else, use a default case-insensitive hash table
        else
        {
            $hash = @{}
        }
    }

    process
    {
        foreach ($element in $InputObject)
        {
            # take the key from the property that was requested
            # via -Property

            # if the user submitted a script block, evaluate it
            if ($Property -is [ScriptBlock])
            {
                $key = & $Property
            }
            else
            {
                $key = $element.$Property
            }

            # convert the key into a string if requested
            if ($AsString)
            {
                $key = "$key"
            }

            # make sure NULL values turn into empty string keys
            # because NULL keys are illegal
            # ($null goes on the left so that array-valued keys
            # are compared as a whole instead of being filtered)
            if ($null -eq $key) { $key = '' }

            # if there was already an element with this key previously,
            # add this element to the collection
            if ($hash.ContainsKey($key))
            {
                $null = $hash[$key].Add($element)
            }
            # if this was the first occurrence, add a key to the hash table
            # and store the object inside an arraylist so that objects
            # with the same key can be added later
            else
            {
                $hash[$key] = [System.Collections.ArrayList]@($element)
            }
        }
    }

    end
    {
        # default output are objects with properties
        # Count, Name, Group
        if ($AsHashTable -eq $false)
        {
            foreach ($key in $hash.Keys)
            {
                $content = [Ordered]@{
                    Count = $hash[$key].Count
                    Name = $key
                }
                # include the group only if it was requested
                if ($NoElement -eq $false)
                {
                    $content["Group"] = $hash[$key]
                }

                # return the custom object
                [PSCustomObject]$content
            }
        }
        else
        {
            # if a hash table was requested, return the hash table as-is
            $hash
        }
    }
}
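The core idea in the function above is a hash table that maps each key to a growable list. Here is a minimal sketch of just that part, stripped of parameters and output formatting. The sample strings and the variable names $words and $groups are made up for illustration. Each element costs one hash lookup plus one append, so the total work should grow roughly linearly with the number of input objects:

```powershell
# sample input (hypothetical): a few strings to be grouped by their length
$words = 'apple', 'fig', 'pear', 'kiwi', 'plum', 'yam'

# the hash table maps each key (here: the string length) to a growable list
$groups = @{}

foreach ($word in $words)
{
    $key = $word.Length

    if ($groups.ContainsKey($key))
    {
        # key seen before: append to the existing list
        $null = $groups[$key].Add($word)
    }
    else
    {
        # first occurrence: start a new list for this key
        $groups[$key] = [System.Collections.ArrayList]@($word)
    }
}

# show each key together with the elements collected for it
foreach ($key in $groups.Keys | Sort-Object)
{
    '{0}: {1}' -f $key, ($groups[$key] -join ', ')
}
```

An ArrayList is used for the values because appending to it does not create a new copy of the collection, which is what happens when you use "+=" on a plain PowerShell array.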
Simply replace Group-Object with Group-ObjectFast in the sample above, and check how much time it takes:
$start = Get-Date
$result = Get-ChildItem -Path $home -Recurse -ErrorAction SilentlyContinue -File |
    Group-ObjectFast -Property Length
$stop = Get-Date
($stop - $start).TotalSeconds
In our tests, Group-ObjectFast was roughly 10 times faster than the original Group-Object.
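Grouping by size was mentioned earlier as a prerequisite for finding duplicate files. Here is one way the follow-up could look. This is only a sketch, not a finished duplicate finder. It creates its own scratch folder with made-up sample files so that it can run anywhere, and it uses the built-in Group-Object so that it works on its own. Since the grouping is driven by the Count and Group properties, Group-ObjectFast should be usable in its place:

```powershell
# create a scratch folder with a few sample files (hypothetical test data)
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -Path $folder -ItemType Directory
Set-Content -Path (Join-Path $folder 'a.txt') -Value 'same content'
Set-Content -Path (Join-Path $folder 'b.txt') -Value 'same content'
Set-Content -Path (Join-Path $folder 'c.txt') -Value 'other text!!'
Set-Content -Path (Join-Path $folder 'd.txt') -Value 'a file with a unique size'

# step 1: group by size; only sizes shared by several files
# can contain duplicates
$candidates = Get-ChildItem -Path $folder -File |
    Group-Object -Property Length |
    Where-Object Count -gt 1

# step 2: hash only the remaining candidates and group by hash
$duplicates = $candidates.Group |
    ForEach-Object { Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256 } |
    Group-Object -Property Hash |
    Where-Object Count -gt 1

# list the paths of files with identical content
$duplicates | ForEach-Object { $_.Group.Path }

# clean up the scratch folder
Remove-Item -Path $folder -Recurse -Force
```

Files a.txt, b.txt and c.txt share the same size, so all three are hashed, but only a.txt and b.txt end up in the same hash group. Hashing is the expensive part, which is why filtering by size first pays off.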