Examining what was utilizing the CPU, we found that the 'winlogon.exe' process was consuming nearly the entire difference between what Process Explorer and Task Manager reported.
Process Explorer allows us to dive 'deeper' into the process to determine the thread that is utilizing our CPU.
twi3.dll is a Citrix file. Clicking 'Module' and viewing its properties gives us more information about the file and its purpose:
"Seamless 2.0 Host Agent - main component"
Now that we have an idea of what twi3.dll is for, we can begin to test why it's consuming so much CPU. Citrix has options for modifying the behaviour of twi3.dll via "Seamless Flags".
For our environment we had the following set:
I experimented with modifying each value to determine its impact on the CPU used by the twi3.dll threads.
You can modify these values at any time; they take effect on the next session connect and do not impact existing sessions. With that, here were my results:
CPU% figures are per user, and lower is better. More CPUs actually lower the maximum percentage winlogon.exe can consume; this test was done on a 6-core CPU. With fewer CPUs the maximum percentage increases. I imagine this is due to a thread limit or something similar.
The values that had the most impact on CPU utilization were the lowest values for WorkerWaitInterval or WorkerFullCheckInterval, followed by Disable Active Accessibility Hook.
So what does WorkerWaitInterval / WorkerFullCheckInterval do?
WORKER WAIT INTERVAL / WORKER FULL CHECK INTERVAL
Explanation: This update addresses a custom application's performance when run seamlessly. Some applications appeared to be slower to respond when performing actions such as moving, resizing, or closing windows. This fix introduces two new registry settings that allow administrators to configure an explicit time interval for the seamless engine mechanism to monitor when changes take place in the seamless applications.
For both values, a larger size slows responsiveness but improves scalability; a smaller size increases responsiveness but decreases scalability slightly. The level of scalability depends on several factors, such as hardware sizing, types of applications, network performance, and number of users.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Citrix\wfshell\TWI
Value Name: WorkerWaitInterval
Type: REG_DWORD
Value: (Values are between 5 – 500; the default is 50.)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Citrix\wfshell\TWI
Value Name: WorkerFullCheckInterval
Type: REG_DWORD
Value: (Values are between 50 – 5000; the default is 500.)
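If you want to script these two settings, here is a minimal sketch that only builds `reg add` command lines and checks the values against the ranges quoted above. The helper name `build_reg_commands` is hypothetical and not part of any Citrix tooling; the commands are printed, not executed, so review them before running anything on a server.

```python
# Hypothetical helper: builds reg.exe command lines for the two TWI values.
# Nothing is written to the registry here; the commands are only printed.
TWI_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Citrix\wfshell\TWI"

# Allowed ranges, taken from the Citrix description quoted above.
RANGES = {
    "WorkerWaitInterval": (5, 500),
    "WorkerFullCheckInterval": (50, 5000),
}


def build_reg_commands(wait_ms=50, full_check_ms=500):
    """Return reg.exe command strings; raise ValueError if a value is out of range."""
    requested = {
        "WorkerWaitInterval": wait_ms,
        "WorkerFullCheckInterval": full_check_ms,
    }
    commands = []
    for name, value in requested.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        commands.append(
            f'reg add "{TWI_KEY}" /v {name} /t REG_DWORD /d {value} /f'
        )
    return commands


# Defaults correspond to the documented defaults of 50 and 500.
for command in build_reg_commands():
    print(command)
```

Rejecting out-of-range values up front is a design choice for the sketch; how the seamless engine itself treats such values is not covered here.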
WorkerWaitInterval and WorkerFullCheckInterval are defined in milliseconds. As the graph above shows, setting them to a low value forces the seamless engine to monitor for changes at a higher rate, which consumes additional CPU cycles.
In our environment we encountered issues as the user count increased. It turns out that each user who logged on consumed a fairly constant amount of CPU. The winlogon.exe process for the server in the screenshot averaged around 0.8% CPU per user; with 35 users, that's 28% of the CPU no longer available. So why does Task Manager not display these values? The author of Process Explorer has the answer:
The intervals we configured were 5 ms, so the executing threads have a good chance of starting and finishing between the 15.6 ms timer ticks, where Task Manager does not see them.
This raises the question of what the values *should* be. Citrix Adaptive Display technology has a maximum frame rate of 30 for XenApp 6.5 and 60 for XenApp 7+. To achieve these frame rates, I think a value of 16 for XA7 or 33 for XA65 could potentially be set. If the FPS is capped at a lower maximum, it would probably make more sense to do the math for that maximum FPS.
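To put the per-user figure above in context, here is a rough sketch of the arithmetic. It assumes the cost scales linearly with user count, which is only what we observed on this one server; the function name is made up for illustration.

```python
def winlogon_cpu_percent(users, per_user_percent=0.8):
    """Estimate total CPU% spent in winlogon.exe, assuming linear scaling per user."""
    return users * per_user_percent


total = winlogon_cpu_percent(35)
print(f"{total:.1f}% of the CPU consumed by winlogon.exe across 35 users")
```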
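The math is simply 1000 ms divided by the maximum FPS, rounded down. A small sketch, assuming the result should also stay inside the 5 to 500 range documented for WorkerWaitInterval; the function name is hypothetical, and whether these values actually deliver the target frame rate is untested.

```python
def interval_for_fps(max_fps, low=5, high=500):
    """Whole milliseconds per frame at max_fps, clamped to the documented range."""
    frame_ms = 1000 // max_fps  # rounded down, e.g. 1000 // 60 == 16
    return max(low, min(high, frame_ms))


for fps in (60, 30, 24):
    print(f"{fps} FPS -> {interval_for_fps(fps)} ms")
```

For 60 and 30 FPS this gives 16 and 33, matching the values suggested above.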
Further testing may be needed here.
Lastly, if you are not running seamlessly (for example, if you use Shift+F1 to switch to windowed mode), winlogon.exe will use no CPU for twi3.dll. You get the same result with RDP: no CPU is used.
2 comments:
Nice work :)
Great Article. Just for clarity, you set both of those reg keys to 5