Pushing to Amazon Elastic Beanstalk with Git

I’ve just deployed my first Ruby and Sinatra app to AWS Elastic Beanstalk and it was pretty simple. The one catch I hit at first involved Bundler. Everything was set up on my Mac and working fine, but once I had pushed to AWS, Passenger returned the error:

Could not locate Gemfile (Bundler::GemfileNotFound)

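After some trial and error I discovered that the gemfile on my Mac was named ‘gemfile’, but on my EC2 instance it needed to be called ‘Gemfile’ (with a capital G). The default Mac filesystem is case-insensitive, so Bundler found the file locally, whereas the Linux filesystem on EC2 is case-sensitive. A quick rename, commit and push and Bundler worked fine.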
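
A check like this could run before a push to catch the problem early. It is only a sketch: the function names `find_miscased` and `check_directory` are my own, and it simply looks for files whose name matches ‘Gemfile’ when case is ignored but not when case is respected.

```python
import os


def find_miscased(names, expected="Gemfile"):
    """Return the names that equal `expected` only when case is ignored."""
    return [
        name for name in names
        if name.lower() == expected.lower() and name != expected
    ]


def check_directory(path=".", expected="Gemfile"):
    """List wrongly-cased variants of `expected` found directly in `path`."""
    # os.listdir reports names as stored on disk, even on a
    # case-insensitive filesystem, so the real casing is visible here.
    return find_miscased(os.listdir(path), expected)
```

Calling `check_directory()` from the project root returns an empty list when the file is named correctly.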

ASP.NET session state in a web farm not being shared correctly

TL;DR The ASP.NET session ID in the database uses the Site ID from IIS as part of a composite key. Ensure the IIS Site ID is consistent across a web farm.

The website I work on needed to use a RadCaptcha recently on a form. Although it was configured as per the Telerik article to use out of process session state (SQL for us), it would occasionally show a grey box instead of the captcha.

After some investigation with Fiddler we found that one of the three web servers couldn’t share session state with the other two. That is, a captcha generated on server 1 couldn’t be read by server 2, but could be read by server 3, and the inverse was also true. Machine keys were already being shared between the servers, so it wasn’t a decryption problem.

Watching session creation in the ASPState database, we found that servers 1 and 3 were creating the same session ID whilst server 2 was creating a different but very similar ID. We saw the following IDs in the ASPStateTempSessions table:

3jt3wvhazn22rcliw1vyij3h2d3aafb7
3jt3wvhazn22rcliw1vyij3h2d3aafb5

After a bit of investigation I found this article which describes how the SessionID is made up of the Session ID + Application ID. The Application ID is a hash based on the AppName which is based on the metabase path of your IIS site.
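
To make the difference easier to see, the two stored IDs can be split into their two parts. This is a sketch that assumes the first 24 characters are the session ID and the remainder is the application ID suffix; that split is my reading of the IDs we saw rather than something I have confirmed, and `split_state_key` is a name made up for the example.

```python
# Assumed length of the session ID portion of the stored key.
SESSION_ID_LENGTH = 24


def split_state_key(stored_id):
    """Split a stored key into (session ID, application ID suffix)."""
    return stored_id[:SESSION_ID_LENGTH], stored_id[SESSION_ID_LENGTH:]


stored_ids = [
    "3jt3wvhazn22rcliw1vyij3h2d3aafb7",
    "3jt3wvhazn22rcliw1vyij3h2d3aafb5",
]

for stored_id in stored_ids:
    session_id, app_suffix = split_state_key(stored_id)
    print(session_id, app_suffix)
```

Under that assumption both rows carry the same session ID and differ only in the application suffix, which points at the application rather than the browser session.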

As our servers run multiple sites and we’d brought these sites online in a different order on each server, the Site IDs had got out of sync. We found the following values:

server 1 => /lm/w3svc/7/root
server 2 => /lm/w3svc/5/root
server 3 => /lm/w3svc/7/root

The AppNames of servers 1 and 3 hashed to the same value, but server 2’s didn’t, so server 2 couldn’t find the session data for the RadCaptcha.

I changed the IIS Site ID using IIS admin under Site => Advanced Settings and it fixed the problem. Note: this will recycle your app pool.
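
Once the paths have been collected from each server, spotting the odd one out is easy to automate. The sketch below assumes the values have already been gathered by hand into a dictionary; `find_outliers` is a hypothetical helper, and it treats the most common path as the correct one.

```python
from collections import Counter


def find_outliers(app_paths):
    """Return the servers whose path differs from the most common path."""
    normalised = {
        server: path.strip().lower() for server, path in app_paths.items()
    }
    most_common_path, _ = Counter(normalised.values()).most_common(1)[0]
    return sorted(
        server for server, path in normalised.items()
        if path != most_common_path
    )


paths = {
    "server 1": "/lm/w3svc/7/root",
    "server 2": "/lm/w3svc/5/root",
    "server 3": "/lm/w3svc/7/root",
}

print(find_outliers(paths))  # ['server 2']
```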

Moving from Redis on Windows to AppFabric Cache

The site I work on fetches significant amounts of data on-demand from a remote data centre via a JSON feed. In order to provide resilience against the site being unavailable I implemented Redis on Windows as a read-through cache. (Non-Windows operating systems were not permitted at the time.) I also implemented a refresh-ahead cache service which would ensure ‘hot’ content was always fresh.
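
For anyone unfamiliar with the two terms, here is a minimal in-memory sketch of the idea. It is not our production code: the class name, the `loader` callback, the injectable `clock` and the choice to serve a stale value when the loader fails are all assumptions made for illustration, and it refreshes every entry that is close to expiry rather than tracking which content is ‘hot’.

```python
import time


class ReadThroughCache:
    """Cache that loads missing or expired values via a loader callback."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # e.g. a function that fetches the feed
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        """Read-through: return a fresh cached value, else load and store it."""
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        try:
            value = self._loader(key)
        except Exception:
            if entry is not None:
                return entry[0]    # assumption: stale data beats no data
            raise
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def refresh_ahead(self, threshold_seconds):
        """Reload entries that will expire within `threshold_seconds`."""
        now = self._clock()
        refreshed = []
        for key, (_, expires_at) in list(self._entries.items()):
            if expires_at - now <= threshold_seconds:
                self._entries[key] = (self._loader(key), now + self._ttl)
                refreshed.append(key)
        return refreshed
```

A background service would call `refresh_ahead` on a timer, which is why it needs to be able to walk the keys held in the cache.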

Although the solution worked, I was recently asked to look at replacements for several reasons:

  • Windows wasn’t a supported platform for Redis.
  • We couldn’t cluster it trivially, as members of a distributed Redis cluster were not equal: some would need to be read slaves and some write masters. This would have added complexity to our code and deployment, and could have introduced a single point of failure.
  • When a Redis process reached its 32-bit Windows memory limit of 2GB it would crash. We had deployed 4 instances to each server and we distributed the reads and writes across these instances.
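
Spreading keys across several instances can be as simple as hashing the key and taking the remainder. This is a sketch of that general technique rather than the scheme we actually used; the host and port values are hypothetical.

```python
import zlib

# Hypothetical addresses for the four instances on one server.
INSTANCES = [
    "localhost:6379",
    "localhost:6380",
    "localhost:6381",
    "localhost:6382",
]


def pick_instance(key, instances=INSTANCES):
    """Map a cache key to one instance using a stable CRC32 hash."""
    # zlib.crc32 returns the same value on every run and every machine,
    # unlike the built-in hash(), which is randomised per process for strings.
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]
```

The drawback of plain modulo hashing is that adding or removing an instance moves most keys to a different instance, so the cache is effectively cold afterwards.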

I chose Microsoft’s AppFabric Caching Services, and within an hour or so had replaced Redis with it. My reasons:

  • It supports equal cluster members, so all members are read/write
  • Native 64-bit support meant no 2GB process limit
  • Active support and community
  • Like Redis, it’s free

Having run AppFabric in a production environment for several months now, I thought I’d write up a few findings:

  • Installation is not entirely scriptable. Despite efforts by Microsoft to make everything scriptable with PowerShell, the very first installation and subsequent upgrades had to be done via the Wizard. After that, the PowerShell cmdlets were available for use.
  • There can be long waits (5 minutes or so) during startup when using a distributed cache. It makes sense that the various nodes require time to synchronise, but the error messages that are reported are cryptic.
  • It’s rock solid once you get it running. We’ve used it now for about 3 months in a 3 server cluster, storing items of up to 50MB in the cache, and it’s fast.
  • You can’t enumerate the cache keys. With Redis our refresh-ahead service would iterate over the keys and freshen content if necessary. We’ve had to drop this functionality and take a small performance hit with AppFabric.
  • Upgrading isn’t entirely trivial. I upgraded our servers from v1.1 to v1.2, and although a cluster is designed to allow heterogeneous nodes with differing versions, some of the security settings appeared to have changed between versions, meaning that our website ‘client’ was not permitted access to the cluster. Luckily we still had Redis running in the background and a config switch directed the website at that temporarily.
  • Changing cache settings requires you to delete the cache. Carefully consider the largest size object you want in the cache when you create the cache: if you want to change it in future, you’ll need to either start a new cache and update your config, or take the site off-line whilst you upgrade your cache.
  • You need to specify the item’s version (a DataCacheItemVersion) when deleting items. When I tried Remove(key) it always returned false and the item remained. It makes sense that a distributed cache needs to know explicitly which version to remove, and we only saw this behaviour when running a cache cluster. Use the following to remove an item from the cache.

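    public bool Remove(string key)
    {
        // GetCacheItem returns the item along with its version, or null if the key is absent.
        var dataCacheItem = _cache.GetCacheItem(key);
        if (dataCacheItem == null) return false;
        return _cache.Remove(key, dataCacheItem.Version);
    }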

Overall I’m pretty happy with AppFabric, but some of these gotchas and the general lack of enthusiasm on forums concern me slightly about its future. But hey, it didn’t take long to write the concrete implementation, tests and config switches needed to get it into the codebase, so it won’t take long to replace it if it does get dropped.

Visual Studio ‘Attach to w3wp process’ macro

In Visual Studio, choose Tools | Macros | Macros IDE, create a new module and drop this in. I bind it to Ctrl-Alt-1 for quick access.

    Sub AttachToW3WP()
        Dim attached As Boolean = False
        Dim proc As EnvDTE.Process

        For Each proc In DTE.Debugger.LocalProcesses
            If (Right(proc.Name, 8) = "w3wp.exe") Then
                proc.Attach()
                attached = True
            End If
        Next

        If attached = False Then
            MsgBox("w3wp.exe is not running")
        End If

    End Sub

Update: 12/March/2014 If you’re using Visual Studio 2012 or newer, you will need to use the AttachTo extension.

VS doesn’t start, all of a sudden – “Cannot find one or more components. Please reinstall the application.”

After a reboot yesterday, I got the following error message from Visual Studio.

“Cannot find one or more components. Please reinstall the application.”

I tried the Control Panel repair option for the application, but it made no difference. Today I’ve found the solution on a message board, so I thought I’d post it to save others hours of frustrated re-installation.

To resolve…

Look in C:\WINDOWS\WinSxS\x86_Microsoft.VC90.ATL_1fc8b3b9a1e18e3b_9.0.30729.4148_x-ww_353599c2
Is it empty? Mine was. Copy atl90.dll from an adjacent x86 folder into that folder.
Start Visual Studio.
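
If you’d rather not eyeball the folders, something like the sketch below could list which ATL folders are empty and which adjacent ones still hold the DLL. It only reads the directory and copies nothing. The function name is made up, and the folder prefix is taken from the path above, so treat both as assumptions for your own machine.

```python
import os


def find_empty_and_donors(winsxs_root,
                          prefix="x86_Microsoft.VC90.ATL_",
                          dll="atl90.dll"):
    """Return (empty folders, folders containing the DLL) under winsxs_root."""
    empty, donors = [], []
    for name in sorted(os.listdir(winsxs_root)):
        path = os.path.join(winsxs_root, name)
        # Compare case-insensitively, since Windows file names are.
        if not name.lower().startswith(prefix.lower()):
            continue
        if not os.path.isdir(path):
            continue
        contents = [entry.lower() for entry in os.listdir(path)]
        if not contents:
            empty.append(name)
        elif dll.lower() in contents:
            donors.append(name)
    return empty, donors
```

Any folder in the first list is a candidate for the fix, and any folder in the second list is a place to copy atl90.dll from.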

This worked for me on Windows 2003 Server, with Visual Studio 2008.

Getting NHibernate Query Analyzer to work

Every time I try to get Ayende’s NHibernate Query Analyzer to work I seem to encounter some weirdness. Here’s what I did to get it running this time.

  1. Download the version appropriate to your version of NHibernate. For me, that was 1.2GA
  2. Create a new app.config file, and into it place just the NHibernate configuration section and the appropriate settings. Note that when I tried it, NQA didn’t like the <hibernate-configuration> notation for the hibernate section. Instead I had to change it to use the <nhibernate> notation.
  3. Load up NQA and start a new project.
  4. Add the app.config file you created and the assemblies which contain your embedded HBM files.
  5. Hit Build Project. IMPORTANT! If you see any errors, then quit NQA, fix the errors, and start again. If you leave NQA running and try again you’ll see the same error message, as it seems NQA doesn’t close the old app domain properly.
  6. If all that works, then you’re home and dry. File, New Query, and off you go. Brilliant tool once it’s going.

Improving your code with Kaizen sessions

From Wikipedia, “Kaizen is a Japanese philosophy that focuses on continuous improvement throughout all aspects of life”. After a review of our systems by a software coach in the department, I decided to instigate a weekly Kaizen session for the whole dev team. (I first heard the term Kaizen a few years ago when my then department manager spoke of it, and I guess I now understand what he meant. Thanks Marc!)

The plan is to spend 1 to 1.5 hours a week pairing with another dev investigating a certain area of the code trying to improve it with no strict deliverables. The hope is that the tools and techniques we use in these sessions will become integral to our daily work and code quality will measurably improve.

To feed into these sessions I created a backlog of features to be investigated. These were stuck on cards on the whiteboards for the pairs to select from. Anyone is free to submit a card for an area of code they’d like to look at.

Here are a few of the cards I created for week 1 to start us off.

Method foo has a Cyclomatic Complexity of 28, aim to reduce it.

Using NDepend I’ve been looking at some metrics across one of our projects. NDepend gives you so much data that it’s easy to be overwhelmed, so I chose a single metric: Cyclomatic Complexity. This found a single method in our code rated at 28, which means there are 28 linearly independent paths through that method. Although we have 13 unit tests for this method we aren’t testing every line of code, and there’s probably some refactoring that can be done.
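
To give a feel for how the number is arrived at, here is a rough sketch that counts decision points in a piece of Python source using the standard `ast` module. It is an approximation for illustration only: NDepend works on .NET code and has its own counting rules, and the `classify` sample function is invented.

```python
import ast

# Each of these node types adds one extra path through the code.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" contributes two extra branches.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    if n == 0 or n > 100:
        return "edge"
    for _ in range(n):
        pass
    return "ok"
'''

print(cyclomatic_complexity(SAMPLE))  # 5
```

The sample scores 5: one for the method itself, two for the `if` statements, one for the `or` and one for the loop. A method rated at 28 has a lot more going on.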

Class foo has test coverage of 72%, aim to increase it.

Using the TestDriven.net Visual Studio plug-in with NCover we monitor code coverage of our unit tests. 72% is great, but quite a lot of our code has over 95% coverage so this class needs more testing.

There are 8 unit tests currently marked as [Ignore]. Without simply deleting the tests, aim to reduce that count.

This is pretty self-explanatory. I don’t know why the tests were marked with the Ignore attribute, but they are, and they shouldn’t be.
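
Keeping track of that count by hand gets tedious, so a small script could do it. This sketch counts attributes that start a line with `[Ignore` in C# source text. It assumes the attribute sits on its own line, so it will miss combined forms such as `[Test, Ignore]`, and the sample test class is invented.

```python
import re

# Matches [Ignore] and [Ignore("reason")] at the start of a line,
# but not other attributes that merely begin with the same letters.
IGNORE_ATTRIBUTE = re.compile(r"^\s*\[\s*Ignore\b[^\]]*\]", re.MULTILINE)


def count_ignored(source):
    """Count the [Ignore] attributes found in a C# source string."""
    return len(IGNORE_ATTRIBUTE.findall(source))


SAMPLE = '''
[TestFixture]
public class FooTests
{
    [Test]
    [Ignore]
    public void First() { }

    [Test]
    [Ignore("flaky on the build server")]
    public void Second() { }

    [Test]
    public void Third() { }
}
'''

print(count_ignored(SAMPLE))  # 2
```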

I was shown today how the Duplicates.NET build runner in our TeamCity CI server can automatically identify duplicate code across the system. Despite the fact that this project was a greenfield development, we’ve still ended up with some duplicate code in need of an Extract Method refactor. Metrics from these reports will feed into our Kaizen backlog.

For those interested, the end results for the 3 cards above were 12, 74% and 4 respectively, so I’m happy we’ve improved the code and got familiar with the tool set.