Friday, December 18, 2015

Help! My Citrix application is running slow (Meditech, Citrix, Imprivata)

We have an application (Meditech) that users have reported as performing poorly in Citrix.  They were able to confirm the poor performance by comparing it to a desktop that had the same software installed.  I was brought on to try and understand these performance differences and why Citrix would be performing so much worse.  To measure the performance, the user took me to a screen where they held down a key on the keyboard to increment a counter in the software.  When holding the key down on the desktop the counter was quick and incremented at a steady pace, on Citrix it seemed OK at first but then slowed as time went on.   To illustrate these performance differences, I made a video:

So we definitely have a problem.  My first steps on troubleshooting this problem was to compare the differences between the desktop and the VM.  The VM had server processors but they were older, and at less GHz at 2.6GHz for the server.  The desktop had 3.4GHz processors with a newer generation.  If the processing is NOT latent sensitive then the faster processor's can make a difference, and the CPUMark had them at 1500 and 2100 (~40% difference).  At first glance, this seems like it could be the difference in our timings but it's still too drastic at ~5s vs ~20s, a 400% difference.  To try and narrow the difference I took the application to a completely bare server with no software on it what-so-ever and reran the test.  It completed in about ~5-6s.  The processor it ran on was comparable to the original server processor but ran at 2.7GHz instead.  The processor was not running at nearly enough speed to make up the difference, something else must be consuming those cycles.

At this point I procmon'ed the benchmark and came up with the following:

(PID 6660 is the process that the benchmark runs against)

Very obviously, the pattern of each key stroke on the Citrix server is present with the initial pattern highlighted:

About 400ms from each key stroke to when the next one is registered.  So what is delaying the 400ms?  From my experience, unexplained delays are CPU related.  I then looked at the process activity summary to see if I can find the bottleneck:

Again, very obviously we are seeing the CPU ramp up, and it can also explain the faster, initial iterations of the GUI as the CPU 'ramps' up and doesn't spike to the it's maximum.  None of the other graphs show any activity at the time of the benchmark so the suspect is highly on the CPU.  When hovering over the graph we see the CPU percentage.

This is a 4 core box, so ~25% equals one full core for a single threaded application.  Again, this points to the application being bottlenecked by the processor, but again, the difference is too large to consider just the CPU at this point.  We need to find what part of the process is consuming these resources.  Fortunately, ProcExp (process explorer) can help us determine what is going on within a process.

I started a new run and got properties of the process:

MGUI.DLL is consuming all of the CPU.  That is a DLL utilized by the application, clicking on 'Stack' gives us the hierarchy of commands being utilized by that thread.

From this, I can understand that ntoskrnl.exe, ntdll.dll are native Microsoft Windows functions, MGUI.DLL is utilized by Meditech, but what is ISXHook.dll?

Doing a search within the process shows that it's utilized by Imprivata, a single sign-on solution we utilize to try and increase user efficiency.  It works by 'screen scraping' to determine fields that it needs to populate with user credentials to try and speed up user logins.  Logically, this sounds like it could be causing the delay by screen scraping every time a key is stroked or a change is registered.  To confirm this, we need to remove Imprivata from the application.  Fortunately, it's hooked in by services that can easily be terminated.

I'm going to terminate everything that says 'Imprivata Inc.'

With the processes terminated I reran my test.  Using Process Explorer and getting properties on the process, immediately CPU usage went from 25% down to 15% at a peak:

And getting Stack information showed a much cleaner stack:

In conclusion, I may need to investigate into Imprivata to determine if I can reduce its polling rate or find some way of allowing it to 'stop' polling the CSMagic process *after* the accelerated login.  Its current settings (which I'm not familiar with, sadly) is not acceptable and causes a significant slow down.  Fortunately, the root cause has been determined and we can work towards a full resolution.


will rears said...

Just curious if you ever found away around this problem. We are in the midst of rolling out imprivata accessing Paragon via Citrix, and noticed that the imprivata server agents is polling so frequently it is slowing application performance. No answers yet.

Trentent said...

Hi Will, we've escalated this to Imprivata engineering. Last I heard, they were working on it. If I get any update I'll post it here.

will rears said...

Thanks. We are pushing on our end as well. If we hear anything I'll keep you updated.

will rears said...

Worked with Imprivata today. We upgraded appliances to latest version, and applied a hot fix 4. Then reinstalled agent on citrix servers. Speed for our application has increased. Response time went from 20/30 seconds to 3 seconds. Hope this helps.

Trentent said...

We have been able to narrow this down to the GDIHook Imprivata uses. The hook can be disabled via the registry "SSOProvider/ISXAgent/HookInit" (this isn't exact, I can't recall exactly what the key's values were) to prove it out. After, Imprivata provided us with registry values "SSOProvider/ISXAgent/GDIHook /v program.exe /d *Window Title to Exclude GDIHook with Wildcards* (new line) * (wild card to hook into all window classes) /t RegSZ_MultiLine"

This solution doesn't quite work for us because Meditech generates the window titles dynamically, so trying to exclude every possible window title would be impossible. But the login windows actually follow a set pattern so if we could set an INCLUDE for GDIHook that would be faaaar more preferable. Our imprivata engineer didn't think that was an option but he's checking.

Trentent said...

Update, we have found that our Imprivata Profile had a 'window title target' of *. Of course, that targets EVERY window so that's why Imprivata is showing up and consuming CPU. Setting the profile to target the window title we want resolves this issue and stops Imprivata from scanning each window's content.

Trentent said...

Last update. We were provided with a new Imprivata client that resolves this issue in conjunction with the 'window title target' having an appropriate string as opposed to just wildcard. Without the new client we still experience slowness. This was 4.9.125.XXXX (engineering release). My understanding is the fix for this issue is not in the 'RTM' 4.9.125 so we need to wait for a further release.