Scaling and Load Balancing Session Recording

By Hal Lange posted 04-02-2019 03:07 PM

  

One of our favorite and most overlooked products in the Citrix line is Session Recording. It lets you see what users are actually doing instead of relying on hearsay. One of the problems is that if the wrong people have access, it becomes an HR nightmare. I am not here to discuss that issue, but to share our latest frustrations with this wonderful product.

Background

We have what most people would call a very large environment. With a daily concurrent peak of over 20k users, I agree. This includes XenDesktop desktop and shared workloads across multiple datacenters. Here are the specs we started with after reading the Citrix articles:

  • 4 Session Recording servers – each with 8 vCPU, 32 GB RAM, and 20 GB of space for the MSMQ buffer
  • File share – 10 TB Nutanix AFS

Here are the links that we followed:

https://support.citrix.com/article/CTX230015
https://support.citrix.com/article/CTX230013
https://docs.citrix.com/en-us/session-recording/current-release/configure/load-balancing.html
https://support.citrix.com/article/CTX200869

While following the first link will get you the closest, even the combination of all of these links still misses some very important points.

Here are the steps Citrix does have documented correctly:

  • LB on the NetScaler – Make sure to configure all 3 ports: HTTP-80 / HTTPS-443 / TCP-1801
  • The registry value that tells Session Recording to load balance – Under HKLM\Software\Citrix\SmartAuditor\Server, set EnableLB=dword:1
  • SQL permissions on the DB – Make sure all server accounts have db_owner membership on the database

The parts that Citrix mentions without enough detail are the file share and MSMQ redirection.

Redirecting MSMQ

The Citrix documentation seems straightforward until you read the PowerShell script they provide to copy the configuration. At that point, you will see that it edits one file and then creates a completely different file on the other server. Which is the correct file?

Both methods are correct. The easiest way I have found to keep the configuration looking the same on all of your servers is to create a new file.

Create a new file C:\Windows\System32\msmq\Mapping\sr_lb_map.xml

Add the following lines to the newly created file, changing <Load_Balance_FQDN> and <Local_Server_FQDN> to your own names:

<redirections xmlns="msmq-queue-redirections.xml">
    <redirection>
        <from>http://<Load_Balance_FQDN>*/msmq/private$/CitrixSmAudData</from>
        <to>http://<Local_Server_FQDN>/msmq/private$/CitrixSmAudData</to>
    </redirection>
    <redirection>
        <from>https://<Load_Balance_FQDN>*/msmq/private$/CitrixSmAudData</from>
        <to>https://<Local_Server_FQDN>/msmq/private$/CitrixSmAudData</to>
    </redirection>
</redirections>

Note: every server in the LB group will have a unique file because of the <Local_Server_FQDN> component.

Restart the MSMQ service or restart the server and your MSMQ service is ready to be load balanced.
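Because every server needs its own copy of the file, it can help to generate the content rather than hand-edit it. Here is a minimal Python sketch that renders the sr_lb_map.xml text for a given pair of names. The function name and the example.com host names are made up for illustration; only the XML layout and the queue path come from the file shown above.

```python
from xml.sax.saxutils import escape

# Queue path used in the mapping file shown in this article.
QUEUE_PATH = "/msmq/private$/CitrixSmAudData"


def render_mapping(lb_fqdn: str, local_fqdn: str) -> str:
    """Return the sr_lb_map.xml text for one Session Recording server.

    One <redirection> is emitted per scheme (http and https), mapping the
    load-balanced name to this server's own name.
    """
    blocks = []
    for scheme in ("http", "https"):
        blocks.append(
            "    <redirection>\n"
            f"        <from>{scheme}://{escape(lb_fqdn)}*{QUEUE_PATH}</from>\n"
            f"        <to>{scheme}://{escape(local_fqdn)}{QUEUE_PATH}</to>\n"
            "    </redirection>\n"
        )
    return (
        '<redirections xmlns="msmq-queue-redirections.xml">\n'
        + "".join(blocks)
        + "</redirections>\n"
    )


if __name__ == "__main__":
    # Hypothetical names; replace with your load balancer and server FQDNs.
    lb_name = "srlb.example.com"
    for server in ("sr01.example.com", "sr02.example.com"):
        print(f"--- {server} ---")
        print(render_mapping(lb_name, server))
```

The output for each server would be saved as C:\Windows\System32\msmq\Mapping\sr_lb_map.xml on that server, followed by the MSMQ restart described above.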

File Storage

The file storage is the tricky one. We have the following requirements:

  • over 20k recordings concurrently
  • multiple datacenters
  • 30 days of retention

If you were to follow the recommendations from Citrix, you would use a file share that all the servers have Read/Write access to and leave it at that. Given the requirements listed above, there are a few problems with this approach. How do you have multiple datacenters write to the same share? How do you keep Session Recording servers writing within their own datacenter?

When we first tested it in our environment, we used one share in one datacenter. On the surface everything looked like it was working, but when you tried to view the recordings, only about 60% were available to watch. As we went through the Event Viewer on the Session Recording servers, we found many SQL deadlock errors and file failures. The file failures had multiple causes: the file was opened by another user, the file could not be appended to, or the file could not be found. The files were an issue, but we were much more concerned about the SQL deadlocks.

Troubleshooting SQL led to an interesting discovery: the deadlocks were coming from the Session Recording servers and NOT from SQL. The deadlock message was simply the wrong error being reported for the file access issue that was actually occurring.

At this point, we had followed the Citrix recommendations to the letter and were getting a success rate of about 60% on recordings, along with countless file errors.

As we kept doing more research, we started reading the Citrix articles more closely. There was a line in CTX200869 that caught our attention:

“Store data on a set of local disks controlled either as RAID by a local disk controller or as a Storage Area Network (SAN). Storing data on a Network Attached Storage (NAS) based on file-based protocols such as SMB, CIFS, or NFS has serious performance and security implications. Never use this configuration in a production deployment of Session Recording.”

That is a very interesting statement. According to all of their other Session Recording load balancing documentation, you must use a file share. Now that we have found the conflicting statements, we have a starting point to work with.

File Storage Corrections

Now this leads to a new problem. How do we set up shared file storage across multiple servers and across datacenters when file shares are not recommended?

Here are the requirements that need to be considered.

  • Local disk is supported
  • Multiple servers read/write the same files
  • Ownership of a file changes from server to server
  • All servers need to be able to find the file for playback
  • Access across datacenters must be seamless

The scenario that Ryan floated is what we affectionately call Local Drive Loopback. It entails adding local storage to each Session Recording server and then sharing it.

Figure 1:  Local Drive Loopback

To use this setup:

  • Configure the Session Recording file locations as you would for a local server
  • Stop the two Session Recording services
  • Add the local disks as mount points inside the folder C:\SessionRecordings
  • Create a share on C:\SessionRecordings and give all Session Recording servers full access
  • Point the Session Recording server configuration back at its own local share

           

Figure 2:  Server 1 configuration
                               

Figure 3: Server 2 configuration

With this configuration, each server has local access to its own storage, but every server can still reach the others when necessary. Recordings are load balanced across the folders listed.

Current Configuration

Now that we have figured out how to make this work appropriately, without all the SQL and file errors, it is time to look at the server configuration.

We started with:

  • 4 Session Recording servers – each with 8 vCPU, 32 GB RAM, and 20 GB of space for the MSMQ buffer
  • File share – 10 TB Nutanix AFS

I always felt those servers were way too big for Session Recording. Now that the configuration has been corrected, I have changed it to more servers with a smaller footprint.

We currently use 12 Session Recording servers each with:

  • 4 vCPU
  • 8 GB RAM
  • 1 x 100 GB OS drive
  • 5 x 2 TB drives for Recording storage

It turns out that for 30 days' worth of retention plus some headroom in case of failure, we need 120 TB of storage across all the servers (12 servers × 5 drives × 2 TB). Mileage definitely varies on how much space you need for your retention.
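If you want to redo this math for your own environment, here is a rough Python sketch of the arithmetic. The server and drive counts are the ones from this article. The single-server failure case and the daily budget are only illustrations of what the headroom means, not measured figures from our farm.

```python
# Figures taken from the configuration described in this article.
SERVERS = 12
DRIVES_PER_SERVER = 5
DRIVE_TB = 2
RETENTION_DAYS = 30


def total_capacity_tb(servers: int, drives_per_server: int, drive_tb: int) -> int:
    """Raw recording capacity across all servers, in TB."""
    return servers * drives_per_server * drive_tb


def capacity_after_failures_tb(
    servers: int, drives_per_server: int, drive_tb: int, failed_servers: int
) -> int:
    """Capacity left when some servers (and their local disks) are unavailable."""
    return total_capacity_tb(servers - failed_servers, drives_per_server, drive_tb)


def daily_budget_tb(capacity_tb: int, retention_days: int) -> float:
    """Average recording volume per day that fits inside the retention window."""
    return capacity_tb / retention_days


if __name__ == "__main__":
    total = total_capacity_tb(SERVERS, DRIVES_PER_SERVER, DRIVE_TB)
    # Hypothetical scenario: one server is down.
    degraded = capacity_after_failures_tb(SERVERS, DRIVES_PER_SERVER, DRIVE_TB, 1)
    print(f"Total capacity:            {total} TB")
    print(f"Capacity with 1 server down: {degraded} TB")
    print(f"Daily budget at {RETENTION_DAYS} days:   {daily_budget_tb(total, RETENTION_DAYS):.2f} TB/day")
```

Swap in your own server count, drive sizes, and retention period. The daily budget is an upper bound on average recording volume, so the real number to compare it with is whatever your environment actually writes per day.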

The last piece of advice I offer: don’t forget to run the cleanup at a regularly scheduled interval using the icldb tool. Session Recording will not clean up after itself without it.
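As a starting point for that scheduled job, here is a small Python sketch that only builds the icldb command line for a given retention period. Treat the details as assumptions: the install path is the default location as I understand it, and the remove /RETENTION /DELETEFILES /F /S /L options are what I recall from the Citrix ICLDB documentation. Check them with icldb /? on your version before scheduling anything.

```python
from typing import List

# ASSUMPTION: default install path of the Session Recording server tools.
DEFAULT_ICLDB = r"C:\Program Files\Citrix\SessionRecording\Server\Bin\icldb.exe"


def build_cleanup_command(retention_days: int, icldb_path: str = DEFAULT_ICLDB) -> List[str]:
    """Build (but do not run) an icldb cleanup command as an argument list.

    ASSUMPTION: option names are recalled from the Citrix ICLDB documentation
    and should be verified against the installed version.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return [
        icldb_path,
        "remove",
        f"/RETENTION:{retention_days}",  # remove records older than this many days
        "/DELETEFILES",                  # also delete the recording files
        "/F",                            # run without prompting
        "/S",                            # suppress the copyright banner
        "/L",                            # log results to the Windows event log
    ]


if __name__ == "__main__":
    cmd = build_cleanup_command(30)
    # Quote any part containing spaces so the line can be pasted into a task.
    print(" ".join(f'"{part}"' if " " in part else part for part in cmd))
```

The resulting line can go into a Windows scheduled task on one of the Session Recording servers, with the retention value matching your own policy (30 days in our case).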

 
Hal Lange and Ryan Revord

