My goal is to use multithreading to build a hash of folder names and a frequency count for each (a sort of “heat map” of where files are accessed). My code is included below. I already have this working as a plain loop, but I want to learn how to use multiple threads for this case.
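For reference, here is a minimal sketch of the single-threaded version, based on the loop that is commented out in the script block further down. The three paths are made-up sample data standing in for the lines of dirlist.txt, and it assumes a Windows machine where the C: drive exists.

```powershell
# Hypothetical sample paths, standing in for the lines of dirlist.txt
$dirList = @('C:\a\b\c', 'C:\a\b', 'C:\a\d')
$dirHash = @{}

foreach ($path in $dirList) {
    $p = $path
    # Walk up from the folder to the drive root, counting every level
    while ($p) {
        # A missing key yields $null, which [int] converts to 0
        $dirHash[$p] = [int]$dirHash[$p] + 1
        # Split-Path returns the parent folder, or '' once the root is passed
        $p = Split-Path -Path $p
    }
}

$dirHash
```

Every folder is counted once for each listed path that sits at or below it, which is the count I would like the threads to produce between them.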
My problems as I know them are:
1. $dirHash may not be atomic enough to be accessed by multiple threads at a time (this may be the showstopper). My thinking is that $dirHash is a pointer to a location in memory and it should be an atomic write for each update. Even if the update is for the same key, I thought this should work even though the intention is to “share” the hash between threads.
2. I am not able to get access to the variables in the main part of the script while executing the script block. I have tried to use all types of creation methods, but everything seems out of scope. I even tried adding another parameter for my debug variable with no luck. It acts like the threads are executing in a different session (though I don’t know where that is created yet) based on http://technet.microsoft.com/en-us/library/dd315289.aspx where sessions are described as:
Sessions:
A session is an environment in which Windows PowerShell runs. When you create a session on a remote computer, Windows PowerShell establishes a persistent connection to the remote computer. The persistent connection lets you use the session for multiple related commands.
Because a session is a contained environment, it has its own scope, but a session is not a child scope of the session in which it was created. The session starts with its own global scope. This scope is independent of the global scope of the session. You can create child scopes in the session. For example, you can run a script to create a child scope in a session.
3. “Normal” output methods like Write-Output and [System.Console]::WriteLine do not seem to work inside the script block.
4. The reference to the array element does not fully expand as expected when inside double quotes (used for debugging or progress tracking). I thought that double quotes expanded variables while single quotes did not. It seems like double quotes only expand part way.
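Item 4 can be reproduced outside of any threading code. The sketch below uses a made-up three-element list (not my real data) and shows the partial expansion next to the subexpression form `$( )`, which as far as I can tell is the usual way to get a single element expanded inside double quotes.

```powershell
# Hypothetical sample data, standing in for the contents of dirlist.txt
$dirList = @('C:\one', 'C:\two', 'C:\three')
$id = 1

# Only $dirList is expanded here (the whole array, space separated);
# "[$id]" is then appended as ordinary text
$partial = "DirList is : $dirList[$id]"

# Wrapping the indexing in $( ) evaluates it before the string is built
$full = "DirList is : $($dirList[$id])"

$partial   # DirList is : C:\one C:\two C:\three[1]
$full      # DirList is : C:\two
```

The first form matches what I see in the script: the full array followed by the index in brackets, rather than one element.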
The code I am basing this on is example 4-2 from Dr. Tobias Weltner’s demofiles_multithreading.zip file. I have added an obnoxious number of debug statements so that simply running it shows where execution is going and what the values are. I have commented out the actual working loop because, as written, it “runs away” in the threaded environment, and most times killing PowerShell is the only way to get the machine back.
The code:
# (C) 2012 Dr. Tobias Weltner
# you may freely use this code for commercial or non-commercial purposes at your own risk
# as long as you credit its original author and keep this comment block.
# For PowerShell training or PowerShell support, feel free to contact tobias.weltner@email.de
$dirList = @()
$dirHash = @{}
$debug = 1;
(Get-Variable dirList).options = "AllScope"
(Get-Variable dirHash).options = "AllScope"
(Get-Variable debug).options = "AllScope"
$dirList += Get-Content -Path c:\temp\dirlist.txt
$handleLimit = $dirList.Count
if ($debug) {"HandleLimit is $handleLimit"}
$throttleLimit = 4
$SessionState = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$Pool = [runspacefactory]::CreateRunspacePool(1, $throttleLimit, $SessionState, $Host)
$Pool.Open()
$ScriptBlock = {
param($id)
# param($id, $myDebug)
"Debug is : $global:debug"
"Processing ID : $id"
"DirList is : $global:dirList[$id]"
$p = $dirList[$id]
"P is : $p"
# while ($p -ne "" ) {
# if ($Global:debug) {Write-Output "Processing ID $id $p"} # print $p if debug is set
# $dirHash.$p = $dirHash.$p + 1 # Increment the count for the current path
# $p = Split-Path $p # Return the parent folder from path
# }
if ($myDebug) {Write-Output "Message 2 Done processing ID $id : " $dirList[$id]} # "Done processing ID $id : $dirList[$id]" outputs full array when we want 1 element
[System.Console]::WriteLine("Message 3 Done processing ID $id :")
}
$threads = @()
$handles = for ($x = 0; $x -lt $handleLimit; $x++) { # -lt, not -le: valid indexes run from 0 to Count - 1
$powershell = [powershell]::Create().AddScript($ScriptBlock).AddArgument($x)
# $powershell = [powershell]::Create().AddScript($ScriptBlock).AddArgument($debug)
$powershell.RunspacePool = $Pool
$powershell.BeginInvoke()
$threads += $powershell
}
# if ($debug) {Write-Output "handles are $handles"}
# if ($debug) {Write-Output "threads are $threads"}
"Outside: Debug is : $Debug"
"Outside: Processing ID : $id"
"Outside: DirList is : $dirList[$id]"
do {
$i = 0
$done = $true
foreach ($handle in $handles) {
if ($debug) {Write-Output "Handle is $handle"}
if ($handle -ne $null) {
if ($debug) {Write-Output "Handle is NOT null"}
if ($handle.IsCompleted) {
if ($debug) {Write-Output "Handle is Completed"}
$threads[$i].EndInvoke($handle)
if ($debug) {Write-Output "End Invoke"}
$threads[$i].Dispose()
if ($debug) {Write-Output "Dispose"}
$handles[$i] = $null
if ($debug) {Write-Output "Handle is Null"}
if ($debug) {Write-Output ""}
} else {
$done = $false
}
}
$i++
}
if (-not $done) { Start-Sleep -Milliseconds 500 }
if ($debug) {Write-Output "Waited 500 ms"}
} until ($done)
$dirHash
The test data:
T:\Projects\780 – KS\112 Safety, Health, and Environmental\112-2 SHE Plan and Forms\JHA's\Offload
T:\Projects\780 – KS\112 Safety, Health, and Environmental\112-2 SHE Plan and Forms\JHA's\Offload
T:\Projects\780 – KS\112 Safety, Health, and Environmental\112-2 SHE Plan and Forms\JHA's
T:\Projects\780 – KS\112 Safety, Health, and Environmental\112-2 SHE Plan and Forms\JHA's\Offload
T:\Projects\780 – KS\112 Safety, Health, and Environmental
T:\Projects\796 – TX\110 Reports and Logs\110-8 Monthly Project Reporting\Cost to Completes\Monthly Package 201202
T:\Projects\797 – ND\Submittals\Sent to Jess\Transmittal 0328
T:\Projects\797 – ND\Submittals\Sent to Jess\Transmittal 0328
T:\Projects\797 – ND\Submittals\Sent to Jess\Transmittal 0328
Any pointers to help me understand what is going on with the problems listed above would be appreciated.
Steve