Today I would like to show you a performance comparison of different ways to reduce a list or array to its unique elements.
Sometimes a list or an array has to be sorted and stripped of its duplicates. With large inputs this can be a time-consuming task.
In this post we will have a look at 3 ways to do this:
- Sort-Object -Unique
- Get-Unique
- HashSet-Class
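Before measuring anything, here is a minimal sketch of the three approaches applied to a tiny array. The variable names (`$Colors`, `$BySort`, `$ByGetUnique`, `$Set`) are only illustrative and do not appear in the benchmark script.

```powershell
# Illustrative input with duplicates
$Colors = "Red","Blue","Red","Green","Blue"

# 1) Sort and remove duplicates in one step
$BySort = $Colors | Sort-Object -Unique

# 2) Get-Unique only removes adjacent duplicates, so the input is sorted first
$ByGetUnique = $Colors | Sort-Object | Get-Unique

# 3) A HashSet ignores values it already contains (Add returns $false for them)
$Set = New-Object System.Collections.Generic.HashSet[string]
foreach ($Color in $Colors) {
    $Set.Add($Color) | Out-Null
}

$BySort -join ","       # Blue,Green,Red
$ByGetUnique -join ","  # Blue,Green,Red
$Set.Count              # 3
```

The first two approaches return a sorted result. The HashSet only guarantees uniqueness, not order.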
First we will create 3 lists of random strings in different sizes (small, medium, large):
# List elements
$ListOptionA = "Blue","Red","Green"
$ListOptionB = "Dog","Horse","Cat"
# Note: Get-Random treats -Maximum as exclusive, so "-Maximum ($ListOptionA.count-1)" only
# ever returns index 0 or 1. "Green" and "Cat" are never picked, which leaves
# 2 x 2 x 10 = 40 possible strings (this matches the Result column of the result table).
# Create a small set of strings based on the list elements and a random number
$ListSmall = (0..100).ForEach({
    "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})
# Create a medium set of strings based on the list elements and a random number
$ListMedium = (0..10000).ForEach({
    "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})
# Create a large set of strings based on the list elements and a random number
$ListLarge = (0..1000000).ForEach({
    "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})
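The generator above passes `-Maximum ($ListOptionA.count-1)` to `Get-Random`. Because `-Maximum` is exclusive, the last element of each list ("Green" and "Cat") is never selected. If every element should be able to appear, one possible variant is to let `Get-Random` pick directly from the array with `-InputObject`. This is only a sketch: `$ListAllValues` is a hypothetical name, and a list built this way can contain up to 3 x 3 x 10 = 90 distinct strings, so it would not reproduce the counts shown in the result table.

```powershell
$ListOptionA = "Blue","Red","Green"
$ListOptionB = "Dog","Horse","Cat"

# Hypothetical variant: pick a random element from each array directly.
# Get-Random -Minimum 0 -Maximum 10 still returns 0..9 (the maximum is exclusive).
$ListAllValues = (0..100).ForEach({
    "$(Get-Random -InputObject $ListOptionA)_$(Get-Random -InputObject $ListOptionB)_$(Get-Random -Minimum 0 -Maximum 10)"
})

$ListAllValues.Count   # 101, because 0..100 is inclusive on both ends
```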
Now we can run each method against the three lists and collect the timings:
# Collects one result row per method and list size
$Results = New-Object -TypeName System.Collections.Generic.List[PSCustomObject]
$ListOptions = "Small","Medium","Large"
# Method 1: Sort-Object -Unique
$Method = "Sort-Object -Unique"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    # Time only the sort/unique operation itself
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $UniqueList = $($_ | Sort-Object -Unique)
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName    = $Method
        ListSize      = "$($ListOptions[$Index]) $($_.Count)"
        Result        = $UniqueList.count
        TimeElapsed   = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
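The stopwatch pattern above is repeated for each method. As a rough sketch, it could be wrapped in a small helper function. `Measure-UniqueMethod` and its parameters are hypothetical names and not part of the original script, and the extra script block call adds a small overhead of its own.

```powershell
# Hypothetical helper: runs $Action against $List and returns one result row
function Measure-UniqueMethod {
    param(
        [string]$MethodName,
        [scriptblock]$Action,
        [object[]]$List
    )
    $StopWatch = [System.Diagnostics.Stopwatch]::StartNew()
    # @() guarantees an array, so .Count also works for a single result
    $UniqueList = @(& $Action $List)
    $StopWatch.Stop()
    [PSCustomObject]@{
        MethodName    = $MethodName
        ListSize      = $List.Count
        Result        = $UniqueList.Count
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    }
}

# Usage with a tiny sample list
$Sample = "b","a","b","c","a"
$Row = Measure-UniqueMethod -MethodName "Sort-Object -Unique" -List $Sample -Action {
    param($Items)
    $Items | Sort-Object -Unique
}
$Row.Result   # 3
```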
# Method 2: Sort-Object followed by Get-Unique
# (Get-Unique only removes adjacent duplicates, so the list has to be sorted first)
$Method = "get-unique"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $UniqueList = $($_ | Sort-Object | Get-Unique)
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName    = $Method
        ListSize      = "$($ListOptions[$Index]) $($_.Count)"
        Result        = $UniqueList.count
        TimeElapsed   = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
# Method 3: HashSet (removes duplicates, but does not sort)
$Method = "Hashset"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $HashSet = New-Object System.Collections.Generic.HashSet[string]
    foreach($ListElement in $_){
        # Add returns $true/$false; Out-Null suppresses that output
        $HashSet.Add($ListElement) | Out-Null
    }
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName    = $Method
        ListSize      = "$($ListOptions[$Index]) $($_.Count)"
        Result        = $HashSet.count
        TimeElapsed   = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
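Note that the HashSet loop above only removes duplicates; the remaining elements are not sorted. If a sorted result is needed, the (usually much smaller) set can be sorted afterwards. The following sketch also fills the set through its constructor instead of a loop. It assumes PowerShell 5.0 or later for the `::new()` syntax, uses illustrative variable names, and was not part of the measurements in this post.

```powershell
# Illustrative input with duplicates
$Items = "Red_Dog_1","Blue_Cat_2","Red_Dog_1","Blue_Cat_2","Blue_Dog_0"

# The constructor accepts a collection; the [string[]] cast helps PowerShell
# pick the IEnumerable[string] overload
$HashSet = [System.Collections.Generic.HashSet[string]]::new([string[]]$Items)

# Sort only the unique elements
$SortedUnique = $HashSet | Sort-Object

$HashSet.Count            # 3
$SortedUnique -join ","   # Blue_Cat_2,Blue_Dog_0,Red_Dog_1
```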
On my machine, the results of this run look like this:
MethodName | ListSize | Result | TimeElapsed | TimeElapsedMS |
---|---|---|---|---|
Sort-Object -Unique | Small 101 | 34 | 00:00:00.0003934 | 0 |
Sort-Object -Unique | Medium 10001 | 40 | 00:00:00.0582319 | 58 |
Sort-Object -Unique | Large 1000001 | 40 | 00:00:12.6371431 | 12637 |
get-unique | Small 101 | 34 | 00:00:00.0005651 | 0 |
get-unique | Medium 10001 | 40 | 00:00:00.0877467 | 87 |
get-unique | Large 1000001 | 40 | 00:00:15.0103995 | 15010 |
Hashset | Small 101 | 34 | 00:00:00.0050367 | 5 |
Hashset | Medium 10001 | 40 | 00:00:00.0995172 | 99 |
Hashset | Large 1000001 | 40 | 00:00:07.8959100 | 7895 |
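What can we conclude from the table above? First of all, no single method is the best solution for every situation. For small lists, from 0 up to about 1,000 elements, we should choose Sort-Object -Unique; in this run it was also the fastest method for the medium list (58 ms compared to 99 ms for the HashSet). If the list grows dramatically, we should choose the HashSet approach, which took about 7.9 seconds for the large list compared to about 12.6 seconds for Sort-Object -Unique. Keep in mind that the HashSet only removes duplicates and does not sort the remaining elements. Finally, we should not use Get-Unique: it only removes adjacent duplicates, so the list has to be sorted first, and as the result table shows, this combination was slower than plain Sort-Object -Unique at every list size.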
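One caveat when swapping these methods for each other: they do not treat upper and lower case the same way by default. Sort-Object -Unique compares case-insensitively, while Get-Unique and a default HashSet of strings are case-sensitive. The test data in this post uses consistent casing, so the counts agree, but with mixed-case input they can differ. A small sketch with illustrative variable names, again assuming PowerShell 5.0 or later for `::new()`:

```powershell
$Words = "red","Red","RED"

# Case-insensitive by default: the three spellings count as one value
$SortUnique = @($Words | Sort-Object -Unique)

# Case-sensitive: all three spellings are kept
$GetUnique = @($Words | Sort-Object | Get-Unique)

# Default HashSet[string] comparer is case-sensitive
$DefaultSet = [System.Collections.Generic.HashSet[string]]::new([string[]]$Words)

# A comparer can be passed to make the HashSet case-insensitive
$IgnoreCaseSet = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$Words, [System.StringComparer]::OrdinalIgnoreCase)

$SortUnique.Count      # 1
$GetUnique.Count       # 3
$DefaultSet.Count      # 3
$IgnoreCaseSet.Count   # 1
```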
Best regards, Christian