ASP.NET session state in a web farm not being shared correctly

TL;DR The session ID in the database uses the Site ID from IIS as part of a composite key. Ensure the IIS Site ID is consistent in a web farm.

The website I work on recently needed a RadCaptcha on a form. Although it was configured as per the Telerik article to use out-of-process session state (SQL Server in our case), it would occasionally show a grey box instead of the captcha.

After some investigation with Fiddler we found that one of the three web servers couldn’t share session state with the other two. I.e. a captcha generated on server 1 couldn’t be read by server 2, but could be read by server 3, and the inverse was also true. Machine keys were already being shared between the servers so it wasn’t a decryption problem.

Monitoring session creation in the ASPStateTempSessions table of the ASPState database, we found that servers 1 and 3 were creating the same session ID whilst server 2 was creating a different but very similar ID.

After a bit of investigation I found this article, which describes how the stored session ID is made up of the cookie’s session ID plus an application ID. The application ID is a hash of the AppName, which is in turn based on the metabase path of your IIS site.
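You can watch this happening yourself: the composite IDs are visible directly in the session table. A sketch, assuming the standard ASPState database created by aspnet_regsql:

    -- The trailing characters of SessionId are the application ID hash, so rows
    -- written by a server with a different IIS Site ID will not match
    SELECT SessionId, Created, Expires
    FROM ASPState.dbo.ASPStateTempSessions
    ORDER BY Created DESC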

As our servers run multiple sites and we’d brought those sites online in a different order on each server, the Site IDs had got out of sync. We found the following metabase paths:

server 1 => /lm/w3svc/7/root
server 2 => /lm/w3svc/5/root
server 3 => /lm/w3svc/7/root

The AppNames on servers 1 and 3 hashed to the same value, but server 2’s didn’t, so it couldn’t find the session data for the RadCaptcha.

I changed the IIS Site ID using IIS Manager under Site => Advanced Settings and it fixed the problem. Note: this will recycle your app pool.
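If you’d rather script the change (IIS 7 onwards), appcmd can set the ID too. A sketch, assuming a site named “MySite” and a target ID of 7:

    REM Run on every server in the farm so the site gets the same ID everywhere
    %windir%\system32\inetsrv\appcmd set site "MySite" /id:7

    REM Verify the change
    %windir%\system32\inetsrv\appcmd list site "MySite"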

Moving from Redis on Windows to AppFabric Cache

The site I work on fetches significant amounts of data on demand from a remote data centre via a JSON feed. In order to provide resilience against the feed being unavailable I implemented Redis on Windows as a read-through cache. (Non-Windows operating systems were not permitted at the time.) I also implemented a refresh-ahead cache service which ensured ‘hot’ content was always fresh.

Although the solution worked, I was recently asked to look at replacements for several reasons:

  • Redis on Windows wasn’t a supported platform
  • We couldn’t cluster it trivially, as members within a distributed Redis setup were not equal: some would need to be read slaves and some write masters. This would add complexity to our code and deployment, and could have introduced a single point of failure.
  • When a Redis process reached its 32-bit Windows memory limit of 2GB it would crash. We had deployed 4 instances to each server and distributed the reads/writes across them.

I chose Microsoft’s AppFabric Caching Services for a few reasons, and within an hour or so had replaced Redis with AppFabric.

  • It supports equal cluster members, so all members are read/write
  • Native 64-bit support meant no 2GB process limit
  • Active support and community
  • Like Redis, it’s free

Having deployed AppFabric in a production environment for several months now, I thought I’d write up a few findings:

  • Installation is not entirely scriptable. Despite Microsoft’s efforts to make everything scriptable with PowerShell, the very first installation and subsequent upgrades had to be done via the wizard. After that, the PowerShell cmdlets were available for use.
  • There can be long waits (5 minutes or so) during startup when using a distributed cache. It makes sense that the various nodes require time to synchronise, but the error messages that are reported are cryptic.
  • It’s rock solid once you get it running. We’ve used it for about 3 months now in a 3-server cluster, storing items of up to 50MB in the cache, and it’s fast.
  • You can’t enumerate the cache keys. With Redis our refresh-ahead service would iterate over the keys and freshen content if necessary; we’ve had to drop this functionality and take a small performance hit with AppFabric.
  • Upgrading isn’t entirely trivial. I upgraded our servers from v1.1 to v1.2 and although it’s designed to allow heterogeneous nodes with differing versions in the cluster, some of the security settings appeared to have changed between versions, meaning that our website ‘client’ was not permitted access to the cluster. Luckily we still had Redis running in the background and a config switch directed the website at that temporarily.
  • Changing cache settings requires you to delete the cache. Carefully consider the largest size object you want in the cache when you create the cache – if you want to change it in future, you’ll need to either start a new cache and update your config, or take the site off-line whilst you upgrade your cache.
  • You need to specify the CacheItemVersion when deleting items. When I tried Remove(key) it always returned false and the item remained. In a distributed cache it makes sense that you must say explicitly which version to remove, though this only seemed to occur when we had a cache cluster. Use something like the following to remove an item:

    public bool Remove(string key)
    {
        // GetCacheItem returns the DataCacheItem wrapper, which carries the version
        var dataCacheItem = _cache.GetCacheItem(key);
        if (dataCacheItem == null)
            return false;

        return _cache.Remove(key, dataCacheItem.Version);
    }
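On the cache-settings point above, dropping and recreating a cache is done with the AppFabric PowerShell cmdlets. Roughly as follows – the cache name and secondary count are hypothetical:

    # Load the AppFabric admin cmdlets and connect to the cluster configuration
    Import-Module DistributedCacheAdministration
    Use-CacheCluster

    # Settings are fixed at creation time, so recreate the cache to change them
    Remove-Cache MyCache
    New-Cache MyCache -Secondaries 1

Remember that everything in the old cache is lost, so your application needs to cope with a cold cache afterwards.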

Overall I’m pretty happy with AppFabric, but some of these gotchas and the general lack of enthusiasm on forums etc. concern me slightly about its future. But hey, it didn’t take long to write the concrete implementation, tests and config switches needed to get it into the codebase, so it won’t take long to replace it if it does get dropped.

Visual Studio ‘Attach to w3wp process’ macro

In Visual Studio, choose Tools | Macros | Macros IDE, create a new module and drop this in. I bind it to Ctrl-Alt-1 for quick access.

    Sub AttachToW3WP()
        Dim attached As Boolean = False
        Dim proc As EnvDTE.Process

        For Each proc In DTE.Debugger.LocalProcesses
            If (Right(proc.Name, 8) = "w3wp.exe") Then
                proc.Attach()
                attached = True
            End If
        Next

        If attached = False Then
            MsgBox("w3wp.exe is not running")
        End If

    End Sub

Update: 12/March/2014 If you’re using Visual Studio 2012 or newer, you will need to use the AttachTo extension.

Fiddler tips for HTTP Debugging

Fiddler is a Web Debugging Proxy which logs all HTTP(S) traffic between your computer and the Internet. Fiddler allows you to inspect all HTTP(S) traffic, set breakpoints, and “fiddle” with incoming or outgoing data.

Download Fiddler – it’s freeware! It runs on Windows, but can debug traffic originating from any operating system (by making that OS point to Fiddler on Windows as a proxy). Before reading this you should read these articles, which provide an overview of Fiddler.

Stubbing network responses

During development with a third party it’s often handy to insulate yourself from any downtime/network problems that might affect your testing. Quite often this involves writing a piece of code to simulate network responses and pointing your app to that. Instead of doing this, turn to Fiddler.

Record and replay

Configure your application to use Fiddler as a proxy (see this for .NET apps; use localhost:8888), then hit your third-party endpoint with your application. Fiddler will capture the traffic in the session list. Now click on the AutoResponder tab and enable automatic responses. Drag each row from the session list into the AutoResponder list. Now re-run your app and, instead of connecting to the remote machine, Fiddler will auto-respond for you. (If you are using SSL, read how to decrypt SSL traffic; in .NET you’ll also need to suppress the invalid man-in-the-middle certificate that Fiddler uses by returning true from the ServerCertificateValidationCallback.)
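For the .NET part, a minimal sketch of that callback looks like this. It’s for local debugging only – never ship it, as it disables certificate validation entirely:

    // Trust Fiddler's man-in-the-middle certificate during local debugging only
    System.Net.ServicePointManager.ServerCertificateValidationCallback =
        (sender, certificate, chain, sslPolicyErrors) => true;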

From an interface spec

If you have an interface spec but no endpoint to hit, create a file matching the content you expect to be returned, define a match for the URI, and use your sample file as the response content. See AutoResponder reference for more information.

You can use regex pattern matching for the URI, and you can respond with either a local file or a captured session. With a regex that matches the entire host you can make all calls to your network resource respond with an HTTP 403 Denied and ensure your app behaves as expected.
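As an illustration, an AutoResponder rule is just a match plus an action. The host and file path here are hypothetical:

    Match:  regex:^https://api\.example\.com/.*
    Action: C:\stubs\sample-response.json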

Custom rules to show Akamai cached pages

I’ve used Akamai edge caching on a number of sites over the past few years to improve site performance, and it’s always useful to see which pages are being served from cache, and which aren’t. The easiest way I’ve found to do this is to add a custom rule to Fiddler to highlight requests for me. From Fiddler, choose Rules, Customize Rules. In the Javascript that opens, enter the following code:

With the other field definitions…

	public static RulesOption("Highlight Akamai cache Hits")
	var m_HighlightAkamaiHits: boolean = false;

In the “OnBeforeRequest” method…

	if (m_HighlightAkamaiHits) {
		oSession.oRequest.headers.Add("Pragma", "akamai-x-get-cache-key");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-cache-on");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-cache-remote-on");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-get-true-cache-key");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-check-cacheable");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-get-extracted-values");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-get-nonces");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-get-ssl-client-session-id");
		oSession.oRequest.headers.Add("Pragma", "akamai-x-serial-no");
	}

In the “OnBeforeResponse” method…

	if (m_HighlightAkamaiHits) {
		if (oSession.oResponse.headers.ExistsAndContains("X-Cache","TCP_MEM_HIT")) {
			oSession["ui-customcolumn"] = "HIT";
		} else if (oSession.oResponse.headers.ExistsAndContains("X-Cache","TCP_IMS_HIT")) {
			oSession["ui-customcolumn"] = "HIT";
		}
	}
Now close the Javascript file and go back to Fiddler. If you made any mistakes in the Javascript, Fiddler will tell you immediately. From the Rules menu you now have a new option – “Highlight Akamai cache Hits”. Enable this, and visit an Akamai-fronted site in your browser. In Fiddler you should see the word “HIT” for several of the requests in the “custom” column. You can rearrange the column order to move the custom column if you like.

Add request time

This is a simple new rule but surprisingly handy.

With the other field definitions…

	public static RulesOption("Show response time")
	var m_ShowResponseTime: boolean = false;

Add to either “OnBeforeRequest” or “OnBeforeResponse” method…

	if (m_ShowResponseTime) {
		oSession["ui-customcolumn"] = DateTime.Now.ToString();
	}

Remember that when you save the Javascript file the Rules menu is reset, so any previously enabled rules will need re-enabling.

Fiddler also has a nice set of C# APIs which allow you to embed the fiddler engine directly into your test suite, which makes for a really nice set of integration tests (using the AutoResponder) with only a few lines of code. I’ll go into this in a future post.

Getting NHibernate Query Analyzer to work

Every time I try to get Ayende’s NHibernate Query Analyzer working I seem to encounter some weirdness. Here’s what I did to get it running this time.

  1. Download the version appropriate to your version of NHibernate. For me, that was 1.2GA
  2. Create a new app.config file, and into it place just the NHibernate configuration section and the appropriate settings. Note that when I tried it, NQA didn’t like the <hibernate-configuration> notation for the hibernate section, instead I had to change it to use the <nhibernate> notation.
  3. Load up NQA and start a new project.
  4. Add the app.config file you created and the assemblies which contain your embedded HBM files.
  5. Hit Build Project. IMPORTANT! If you see any errors, then quit NQA, fix the errors, and start again. If you leave NQA running and try again you’ll see the same error message as it seems NQA doesn’t close the old app domain properly.
  6. If all that works, then you’re home and dry. File, New Query, and off you go. Brilliant tool once it’s going.

Improving your code with Kaizen sessions

From Wikipedia, “Kaizen is a Japanese philosophy that focuses on continuous improvement throughout all aspects of life”. After a review of our systems by a software coach in the department, I decided to instigate a weekly Kaizen session for the whole dev team. (I first heard the term Kaizen a few years ago when my then department manager spoke of it, and I guess I now understand what he meant – thanks Marc!)

The plan is to spend 1 to 1.5 hours a week pairing with another dev investigating a certain area of the code trying to improve it with no strict deliverables. The hope is that the tools and techniques we use in these sessions will become integral to our daily work and code quality will measurably improve.

To feed into these sessions I created a backlog of features to be investigated. These were stuck on cards on the whiteboards for the pairs to select from. Anyone is free to submit a card for an area of code they’d like to look at.

Here’s a few of the cards I created for week 1 to start us off.

Method foo has a Cyclomatic Complexity of 28, aim to reduce it.
Using NDepend I’ve been looking at some metrics across one of our projects. NDepend gives you so much data that it’s easy to be overwhelmed, so I chose Cyclomatic Complexity. This found a single method in our code rated at 28 – that’s to say, there are 28 paths through that method. Although we have 13 unit tests for this method we aren’t testing every line of code, and there’s probably some refactoring that can be done.

Class foo has test coverage of 72%, aim to increase it.

Using the Visual Studio plug-in with NCover we monitor code coverage of our unit tests. 72% is great, but quite a lot of our code has over 95% coverage so this class needs more testing.

There are 8 unit tests currently marked as [Ignore], without simply deleting the unit test, aim to reduce that count.

This is pretty self explanatory. I don’t know why the tests were marked with the Ignore attribute, but they are, and they shouldn’t be.

I was shown today how the Duplicates.NET build runner in our TeamCity CI server can automatically identify duplicate code across the system. Despite the fact that this project was a greenfield development, we’ve still ended up with some duplicate code in need of an Extract Method refactor. Metrics from these reports will feed into our Kaizen backlog.

For those interested, the end result of the above 3 cards was 12, 74% and 4 respectively, so I’m happy we’ve improved the code and got familiar with the tool set.

NHibernate Lifecycle Callbacks appearing not to fire

I had a problem with NHibernate lifecycle events recently where they appeared not to fire when I called ISession.Save(entity). After some investigation I finally realised that if you create a new entity and then query NHibernate, it may implicitly persist the transient object during a Flush. When it does this, any interceptors fire at *query time* rather than when you explicitly call Save. This is by design in NHibernate, to ensure that query results are always valid, and is documented here.
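A minimal sketch of the behaviour, with a hypothetical Customer entity and assuming the default FlushMode.Auto:

    using (var session = sessionFactory.OpenSession())
    using (var tx = session.BeginTransaction())
    {
        var customer = new Customer { Name = "Acme" };

        // Any lifecycle callbacks do not necessarily fire here...
        session.Save(customer);

        // ...because this query auto-flushes the session so that its results
        // are consistent, and the callbacks fire during that flush instead
        var all = session.CreateQuery("from Customer").List<Customer>();

        tx.Commit();
    }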