Peter Gillard-Moss

Transient opinion made permanent.

Safety First With AWS Roles and STS

AWS credentials are an extremely precious and powerful asset. In the wrong hands they can cause serious damage, either by disrupting services or, more commonly, by acquiring free compute power, usually to mine bitcoins, at your expense - bills running to five-figure sums over a matter of days are not unheard of. It’s a serious and very real risk that can jeopardise the financial viability of a product or team.

Development teams need AWS credentials, often powerful ones, to do their jobs: to create instances for debugging, to run tests and applications locally as part of the development cycle etc. Keeping development credentials safe by following basic security hygiene, such as keeping them out of source control, is one course of action every development team should be taking. But even then mistakes happen, and credentials can be leaked or put at risk in other ways. So it is important to ensure that, where credentials are needed for development, they are limited in the scope of damage they can do.

AWS Security Token Service

AWS STS enables the request of temporary, limited-privilege credentials. By leveraging AWS STS and temporary credentials, teams can greatly reduce not only the impact of leaked credentials but also the steps to recovery, as the credentials are automatically rotated and revoked.

AWS originally provided temporary credentials for Instance Roles. This technique enables EC2 instances to run without the overhead and inherent risks of managing users and passing credentials around. For any service running within AWS infrastructure, the use of Instance Roles alone removes a large part of the attack surface.

Developers still require credentials when they work outside of AWS’s estate and these credentials are often the most powerful and therefore dangerous. Fortunately, the services behind Instance Roles are available for all AWS users via the AWS Security Token Service.

Principle of least privilege

Using AWS STS for all users, regardless of where they operate, is a powerful security measure. Roles can be defined which users assume based on need. By granting only the permissions needed to carry out those roles, the scope of potential damage is reduced.

This also enables parity regardless of the source. This is both important and powerful: an IAM user, or an EC2 instance, or a user authenticated by a corporate identity provider, or even a mobile app, can all assume the same role and operate under the exact same permissions.

For developers this means they are able to run applications and execute tasks in the development cycle with only the intended permissions. This removes the need for developers to have system-wide privileges. It has the additional benefit of simplifying the development lifecycle (e.g. by removing common permission bugs during deployment). If a developer needs to deploy a server for debugging they can assume the same role the build server would use for deployment. If they need to run a web server locally with permissions to S3 then they can assume the role that has been defined for running that application in production-like environments.

Temporary credentials for developers

AWS STS will issue temporary credentials which last a maximum of 3600 seconds (1 hour), although they can be requested for even less. Because the credentials are generated ‘out-of-band’ and are disposable, they are less likely to end up checked into source control (as they tend not to be added to config files). Unfortunately, obtaining these credentials is not obvious, nor is it well supported by existing tools.

There are two ways to obtain temporary credentials from AWS STS: the first is by providing a token from an Identity Federation, the second is via an authenticated IAM user. From there AWS STS generates credentials following a request to assume a Role. This is exactly the same technique used by Instance Roles. This separation between the authentication of a user and the credentials used against AWS offers another layer of Defence in Depth.
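
Under the hood this comes down to a single STS call. As a minimal sketch using boto3 (the role ARN and session name here are purely hypothetical):

    import boto3

    # Exchange the caller's existing credentials for temporary,
    # limited-privilege credentials by assuming a role.
    sts = boto3.client('sts', region_name='us-east-1')
    response = sts.assume_role(
        RoleArn='arn:aws:iam::111111111:role/Developer',  # hypothetical role
        RoleSessionName='dev-session',
        DurationSeconds=3600,  # the maximum lifetime mentioned above
    )
    creds = response['Credentials']
    # creds contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.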

If you use an identity provider (IdP) you can bootstrap into a role by providing a SAML assertion or with a Web Identity (such as Amazon Cognito or OpenID Connect). Unless you have tight control over the IdP I wouldn’t recommend using Web Identity, as you essentially open up the ability to assume your role to every user of that IdP (e.g. every user with a Google account).

If you prefer to use AWS for your account management you can bootstrap from an authenticated IAM user from the same, or even a different, account. Although this means that the user will still require AWS credentials, those credentials are only ever used for acquiring temporary ones and have no value when running code.

To bootstrap and generate credentials I have created the following CLI tool: aws_role_credentials. This allows developers to easily generate temporary credentials from the command line to be used by other applications. The credentials are saved in a named profile in the standard AWS credentials file (e.g. ~/.aws/credentials), which all AWS SDKs support transparently via the default credentials provider chain without the need for any code changes. From there any process can pick up those credentials and use them.

Using a SAML provider

This is the recommended way of operating, as the vast majority of organisations will already have a form of Identity Provider such as Active Directory or LDAP. If that IdP supports SAML, or can integrate with a SAML IdP such as Shibboleth, it can be integrated with AWS. This has many advantages for keeping your organisation secure, including the simple fact that if someone is removed from your main corporate IdP then they no longer have access to AWS. This is even more valuable if there are multiple AWS accounts, as they can all be managed by the same IdP. It also enables your IdP to be the source of authentication for the acquisition of temporary keys (which, incidentally, is all that happens behind the scenes when you use the IdP to access the AWS console).

First you will need to set up an IAM Identity Provider for SAML and a Role for it to assume, or modify an existing role to add a Trust Relationship with the SAML Identity Provider (the SAML assertion will need to pass the role).

To achieve this the Role needs a statement similar to the following:

"Statement": [
        "Effect": "Allow",
        "Principal": {
            "Federated": "arn:aws:iam::1111111111:saml-provider/AcmeSaml"
        "Action": "sts:AssumeRoleWithSAML",
        "Condition": {
            "StringEquals": {
                "SAML:aud": ""

Then it is a simple case of authenticating with the IdP to obtain a SAML assertion and passing it to aws_role_credentials via stdin. In the following example our developer, Jo Bloggs, will use a tool called oktaauth, which authenticates them against Okta, to obtain the necessary SAML assertion - although aws_role_credentials has been written to plug into any other SAML IdP:

$ oktaauth -u jobloggs | aws_role_credentials saml --profile dev

This will assume the role provided in the SAML assertion and generate credentials in the named dev profile in the AWS profile configuration file ~/.aws/credentials.
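
For illustration, the kind of STS call performed behind the scenes can be sketched with boto3 (this is not the tool’s actual implementation, and the ARNs are hypothetical):

    import sys
    import boto3

    # The base64-encoded SAML assertion is piped in on stdin (e.g. from oktaauth).
    saml_assertion = sys.stdin.read().strip()

    # AssumeRoleWithSAML needs no AWS credentials; the assertion is the proof.
    sts = boto3.client('sts', region_name='us-east-1')
    response = sts.assume_role_with_saml(
        RoleArn='arn:aws:iam::1111111111:role/Developer',              # hypothetical
        PrincipalArn='arn:aws:iam::1111111111:saml-provider/AcmeSaml',
        SAMLAssertion=saml_assertion,
        DurationSeconds=3600,
    )
    creds = response['Credentials']  # AccessKeyId, SecretAccessKey, SessionToken

The tool then writes these values into the named profile in ~/.aws/credentials.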

To test whether the credentials work, simply use the awscli:

$ aws s3 ls --profile dev

Any application which uses an AWS SDK or similar (such as boto) will also read these credentials from the AWS profile configuration file without any code changes as part of the default credentials provider chain. To tell the profile credentials provider to use the named profile simply set the AWS_PROFILE environment variable.

export AWS_PROFILE=dev
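
With that in place a short boto3 script picks up the temporary credentials without any code changes (the region and the bucket listing are arbitrary choices for the example):

    import boto3

    # boto3 resolves credentials via the default provider chain, which honours
    # AWS_PROFILE, so the temporary 'dev' credentials are used automatically.
    s3 = boto3.client('s3', region_name='eu-west-1')
    for bucket in s3.list_buckets()['Buckets']:
        print(bucket['Name'])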

Using an IAM user

In those scenarios where an IdP is not available, or the user cannot be added to the IdP, a normal IAM user can be used to bootstrap. The IAM user can exist in the same account or in another account owned by the same organisation.

To allow an IAM user to assume a role you first need to create one and add the trust relationship to the role in a similar way to the SAML example. As an aside, a single role can have multiple trust relationships.

For this example we’ll assume that our master account id is 111111111 and the role is called Developer:

"Statement": [
        "Effect": "Allow",
        "Principal": {
        "AWS": "arn:aws:iam::111111111:root"
        "Action": "sts:AssumeRole"

Then the developer’s IAM user is given permissions to assume the role. The developer should have no other permissions.

"Statement": [
        "Effect": "Allow",
        "Action": [
        "Resource": ["arn:aws:iam::111111111:role/Developer"]

Our developer (jobloggs) can then use aws_role_credentials to assume the role and obtain temporary credentials. Jo will need to provide their usual AWS credentials to authenticate against AWS STS (by using the default profile or environment variables). Note that these credentials only have permissions to assume the role.

$ export AWS_ACCESS_KEY_ID=feaedafda12312cfd
$ export AWS_SECRET_ACCESS_KEY=secretaccesskey

In this scenario aws_role_credentials needs to be provided with two arguments: the full ARN of the role, and a name for the session (AWS uses this session name for logging and auditing):

$ aws_role_credentials user arn:aws:iam::111111111:role/Developer jobloggs-session --profile dev

The outcome is the same as the SAML example: the tool assumes the role provided and generates credentials in the named dev profile in the AWS profile configuration file ~/.aws/credentials.

Again, to test whether the credentials work, simply use the awscli:

$ aws s3 ls --profile dev
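
You can also check which role the credentials are operating under with a small boto3 sketch (get-caller-identity is just a convenient read-only call):

    import boto3

    # Use the named 'dev' profile written by aws_role_credentials and ask STS
    # who we are; the ARN should show the assumed role and the session name.
    session = boto3.Session(profile_name='dev')
    sts = session.client('sts', region_name='us-east-1')
    print(sts.get_caller_identity()['Arn'])
    # e.g. arn:aws:sts::111111111:assumed-role/Developer/jobloggs-session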

IAM with Multi Factor Authentication

The role can be further protected by enforcing MFA. To do this, simply update the statement to add the condition:

"Statement": [
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::111111111:root"
        "Action": "sts:AssumeRole",
        "Condition": {
            "Bool": {
                "aws:MultiFactorAuthPresent": "true"

This now requires that the developer calls aws_role_credentials and passes in the MFA serial number and the generated token. The MFA serial number can be obtained by running aws iam list-virtual-mfa-devices. The token comes from the MFA device.

Here’s an example of the full call:

$ aws_role_credentials user arn:aws:iam::111111111:role/Developer jobloggs-session --profile dev \
    --mfa-serial-number arn:aws:iam::111111111:mfa/Jo \
    --mfa-token 102345
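
Behind this sits the same STS AssumeRole call, now carrying the MFA details. A minimal boto3 sketch (not the tool’s internal code; the ARNs and token are hypothetical):

    import boto3

    # The developer's long-lived IAM credentials (which can only call
    # AssumeRole) are picked up from the environment or the default profile.
    sts = boto3.client('sts', region_name='us-east-1')
    response = sts.assume_role(
        RoleArn='arn:aws:iam::111111111:role/Developer',
        RoleSessionName='jobloggs-session',
        SerialNumber='arn:aws:iam::111111111:mfa/Jo',  # from list-virtual-mfa-devices
        TokenCode='102345',                            # current code from the MFA device
        DurationSeconds=3600,
    )
    creds = response['Credentials']  # temporary keys plus a session token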

Using a custom provider

You can also use a custom provider. This simply requires the creation of a service or web app that authenticates the user against the provider and, once authenticated, invokes the STS AssumeRole call and passes the temporary credentials to the client. From an implementation perspective it is the same as the IAM user approach except that AssumeRole is called on the developer’s behalf.

This gives the advantages of the SAML method in situations where a SAML provider is not available for your IdP.

Further recommendations

AWS STS enables a policy where all developers and service accounts can be maintained in one place, with restricted access to AWS via lower-risk, limited-time, limited-privilege credentials. To achieve this the following additional guidelines can be followed:

Service accounts

Inside AWS

Never use an IAM user for services that run inside AWS. Always use Instance Roles.

Outside AWS

For service accounts which exist outside of the AWS estate (such as build servers or other services), AWS STS should be used to provide the service with temporary credentials. These could either be injected (where the operation is already time limited) or requested from a Custom Provider that is able to authorise and establish the role on their behalf (in exact replication of Instance Roles).

IdP Recommendations

Where an IdP is available there should be no IAM users at all (although there may be one or two edge cases, such as the IdP itself requiring access to AWS). The IdP can be relied on to authenticate both developer and service accounts, allowing them to assume the necessary roles.

IAM Users

In some instances an IdP may not be available and IAM users may still need to be used. In those instances IAM users should only have privileges to AssumeRole and absolutely no other permission.

Separate account for IAM users

Most organisations operate multiple AWS accounts. If you must use IAM users then all IAM users should be kept in the master account, with only the AssumeRole permission. Roles are then used to delegate access across accounts to obtain credentials and console access.

Multi Factor Authentication

If developers must use IAM users to assume roles then their originating credentials should be protected by AWS’s Multi Factor Authentication. This option is completely free and easy to set up and gives an additional layer of security, as no IAM user would be able to assume a role without MFA.

Authentication Out of App

The majority of modern web applications require some sort of authentication mechanism. Whether that’s an internal reporting site, a build server, an online store, a blogging engine or even an API. Users need to log in to gain access to the entire system or to specific parts, or they need to send credentials when making API calls.

Typically authentication is integrated into the app or service by installing a 3rd party component and using the web framework’s inbuilt mechanism for allowing or denying specific routes or resources. For example, in Ruby on Rails you would use before_action, which delegates to a custom written method.

This is a less than ideal practice, for a number of reasons, from engineering to security. I advocate keeping authentication out of the application altogether by pushing it to a higher layer (typically a reverse proxy).

Authentication vs Authorization vs Identity and user flow

vs Authorization

Authentication and authorization are often conflated, yet they are orthogonal. Where authentication is about providing some level of confidence that the person, or thing, accessing the system is who they say they are, authorization is concerned with enforcing domain rules around behaviours attributed to individuals or groups of users. You can have a system with neither, either or both. For example, a web site blocks content unless it has verified you are a registered user, and all authenticated users have the same level of functionality (authentication only); alternatively a different site asks ‘are you a member?’ and, if the user responds positively, allows them access to different content (authorization only); or, more typically, a user is authenticated and the site authorizes different behaviours (admin vs readonly), such as updating content (authentication and authorization).

vs Identity

Identity is also commonly conflated with authentication. Especially as the purpose of authentication is to validate an identity. Even more confusingly the information by which you validate an identity (typically username and password) is often conflated with the entity being identified (name, address etc.).

They are, in fact, two separate things. The entity which that identity refers to is not a concern of authentication but of something else (depending on the domain it could be a customer service, or even LDAP). Likewise the application itself is concerned with resolving the identity, not with how to validate it.

Identity has other rules around users which may affect their interactions with the system. For example, if they have logged on for the first time, or if they have changed status (perhaps become an admin) and need to add more information. Or if they access a specific part of the site for the first time they may have to accept some Terms and Conditions etc. These rules, around what a user can and cannot do, and other behaviours, are part of the application domain.

Likewise, authentication has different rules around validation (see Trust). For example, it may initially require a username and password but then rely on a cookie or token to ensure validity. It may not require a username and password at all, instead using API keys or even authorisation from another system (e.g. OAuth, SSO etc.).

vs Trust

Trust is a much overlooked aspect of authentication. Trust is a variable matter and is usually derived from metadata around the authentication request itself and it is always authentication which determines trust. Trust can influence the behaviour around both authentication and authorization.

Commonly this is around the behaviour of authentication; more or less information can be demanded (e.g. multi-factor) depending on the application itself or the circumstances in which the request was made (untrusted network, untrusted device, new user etc.).

The trust score can affect authorization too, for example by changing what access is given to what data based on factors beyond a binary result of authentication. If the authentication results in a low trust score the application could decide to provide a lower level of access to the same user than normal.

Separate concerns

Authentication is a very distinct capability of the system. It is also a fairly complex one, yet it has a very well defined boundary. Engineering principles encourage that we treat authentication as distinct and encapsulate it, thus enforcing separation of concerns. Good engineering practice also encourages appropriate layering, again to help enforce separation of concerns but also to simplify.

Thanks to these characteristics authentication can easily be pushed up to a higher layer. Typically this layer is within the application framework itself (e.g. Rails callbacks) however most frameworks make it difficult to enforce isolation and functional bleed is not uncommon. This leads to ambiguity (especially with authorization) which, in turn, leads to the authentication surface area increasing, making it more difficult to isolate. These factors create a higher probability of error (i.e. bugs, exploits), which are potentially costly. Therefore it is preferable to remove it as an application concern altogether.

Into the reverse proxy

Removing authentication from the application framework completely and pushing it into a higher layer both realises separation of concerns and enforces it.

A reverse proxy is a good candidate for this behaviour. Most reverse proxy and HTTP servers already offer a plethora of authentication modules (from basic auth to LDAP integration to SAML integration etc.) and for some (e.g. nginx) it isn’t hard to roll your own. The reverse proxy simply blocks any unauthenticated request from reaching the application. Any behaviour related to an unauthenticated request (redirects, blocking etc.) is well within the capabilities of the reverse proxy.

As HTTP servers and reverse proxies tend to employ a declarative approach to configuration other subtleties of authentication are also surfaced in a far clearer manner. For example, protecting specific routes whilst leaving others open (css etc.).

The application itself has to be configured in such a way that it is only accessible via the proxy (such as being bound to localhost or restricted to whitelisted IP addresses via iptables rules), but again these concerns are external to the application itself.

Contract with the Application

For the application to fulfil its function it will require the identity key (e.g. username or email) which it can then use internally to resolve to the entity itself (e.g. user or customer) plus any meta-data related to the authorization (e.g. trust levels, groups etc.). Because the application implicitly trusts the parent layer it can safely assume that the identity passed is valid at all times. If it was not valid, it would never have received the request.
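
As a sketch of this contract, here is a minimal WSGI application that trusts an identity header set by the proxy (the X-Remote-User header name is an assumption for the example, not something the article prescribes):

    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # The reverse proxy has already authenticated the request, so the
        # identity header can be trusted implicitly and resolved internally.
        username = environ.get('HTTP_X_REMOTE_USER', 'unknown')
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [('Hello, %s' % username).encode('utf-8')]

    if __name__ == '__main__':
        # Bind to localhost only, so the app is reachable solely via the proxy.
        make_server('127.0.0.1', 8080, app).serve_forever()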

There are a number of positive side-effects to this. The application is now simpler: a whole concern has been removed. Functional bleed in the application becomes impossible. As authentication is a cross-cutting concern its influence on application behaviour can be quite wide. This means other areas, such as testing, have also removed a layer of complexity. As a whole, both authentication, and the remaining application behaviour, become more robust and predictable.


Authentication is a common target for exploits, for obvious reasons, and so security is critical. The simple step of moving authentication out of the application and into the reverse proxy has the potential to reduce vulnerability.

Security through layering is a good practice. The reverse proxy is one of the first lines of defence in a system. As a result they are well tested and proven against large numbers of security exploits and often include plugins and modules to protect your system from attack. This is one of the reasons it is recommended to place an application behind a reverse proxy.

A rule of security is that it is only as strong as its weakest link. When authentication lives in the application, security becomes as strong as the weakest part of the app. This includes all of the application’s dependencies, and since modern application development often involves large numbers of third-party dependencies, that is a significantly large surface area.

Application dependencies are also highly volatile. This makes it very difficult to ensure that a new library, or an upgrade to an existing one, doesn’t introduce a weakness which exposes authentication. I witnessed a real world example of this where an application’s SAML library relied on an XML library which was upgraded by a separate component in the application. The XML library had a bug which resulted in incorrect behaviour in the SAML parser.

This situation is made worse in organisations with more than one application or service. If each application has self-rolled authentication then, regardless of how strong the other authentication implementations are, the weakest link becomes the weakest of those apps. The result is that the whole organisation becomes exposed due to one single poor implementation.

Conversely, the modules used in reverse proxies are far more security hardened. Even in the case of a bespoke authentication system (which is a risk in itself), the lifecycle of the authentication service can be separated from the application’s, so library upgrades or bugs in the application do not affect the authentication service.

Authentication in the reverse proxy allows an organisation to test and harden one point, taking a more conservative attitude to change within authentication itself whilst allowing a more liberal attitude to the protected applications.

Multiple authentication mechanisms

It is increasingly rare for a system to have only one authentication mechanism. Web browsers authenticate differently to mobile apps, third parties authenticate to APIs in a different manner to native apps, apps that make requests on behalf of users (e.g. OAuth) authenticate differently to applications that are run as server processes (such as cron jobs, ETLs, other applications etc.). Also it is becoming common for sites to offer multiple providers (Google+, Facebook etc.) to authenticate against.

Reverse proxies allow different mechanisms to be declaratively defined and mixed and matched based on clear rules. Different auth modules can be invoked based on specific request headers, paths, client IPs etc.

Multiple authentication mechanisms make for a complex implementation. Luckily a reverse proxy configuration can easily be tested without the need for the real application running behind it. Best of all, this complexity is kept out of the app.
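
For example, a smoke test of the proxy’s authentication rules can be as small as the following sketch (the URL, port and expected status codes are assumptions about a hypothetical setup):

    import urllib.error
    import urllib.request

    # Hit a protected path on the proxy without credentials and expect it to be
    # blocked before it ever reaches the application.
    try:
        urllib.request.urlopen('http://localhost:8443/admin')
        raise AssertionError('unauthenticated request should have been blocked')
    except urllib.error.HTTPError as error:
        assert error.code in (401, 403), error.code
        print('proxy correctly blocked the unauthenticated request')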

For packaged software this is also a real enabler. Products are often under pressure to meet the demands of the wide range of different authentication mechanisms out there in the market, and whether a product integrates with their authentication is a deal breaker for many enterprises. Rather than having to provide and support an implementation for every single possible provider, the product only has to support integration with a proxy. This means covering a significant part of the market with considerably less effort.

SSO systems vs custom provider

When it comes to SSO, integrations for reverse proxies are a well established pattern. Many SSO providers already have robust, well-proven implementations for Apache and nginx. Shibboleth is a great example: it uses a FastCGI process, allowing the reverse proxy to hand off authentication requests. As a general rule these implementations are more widely used, often provided as system packages in the standard distribution repos, more vigorously tested and have communities with a higher level of specialism in authentication. This makes security exploits and bugs less likely and gives a higher guarantee of response.

For API authentication, the majority of providers either integrate directly with reverse proxies or provide their own. Personally I prefer those that integrate with reverse proxies. This is for a number of reasons: it simplifies the layers - a general reverse proxy, such as Apache or nginx, is often standard, and adding another increases the number of layers and thus the complexity; it gives one place to see all security related config (the reverse proxy configuration file); and, as previously mentioned, reverse proxies have a high amount of scrutiny and history with respect to security and other key functionality (performance etc.). Any custom proxy for API traffic would need to sit behind a general proxy regardless.

When websites have their own authentication mechanism (for example a basic user authenticated against a database lookup), usually highly stylised and customized to the site, a separate authentication application is still hugely beneficial for all of the above reasons. Better still, use an established commodity product which has been tested, proven and hardened rather than a potentially weak and vulnerable self-rolled one.

Authentication as a commodity

Commoditisation of common patterns is a real productivity booster. Teams rewrite the auth wheel over and over again. Even if they are using plugins or libraries there is still a good deal of effort in getting the authentication working.

Pushing authentication out of the application layer increases commodity. There are fewer implementations of reverse proxy than there are languages and frameworks.

Due to the cross-cutting nature of authentication an organisation can produce a single reverse proxy implementation which covers their entire family of applications with very low effort. Any modifications (adding more providers etc.) come at a low cost.

Commodity in the wider world of open source etc. makes the industry as a whole more productive. Rather than expending effort writing yet another authentication implementation for your language, site or product, establishing a standard, following a common, easy to implement pattern and keeping authentication out of the application lets the industry reach a high level of reuse and productivity.

It also ensures that applications follow a similar consistent pattern. This reduces maintenance and troubleshooting. If all applications follow the same pattern of behaviour it becomes far cheaper to roll out and adapt authentication for the organisation or product.

Three Rational Reasons for Being Vegetarian

I read the most ridiculous article that a friend posted on Facebook titled THE 3 reasons to give up meat (and 1 not to). The article led with some really strange and more than slightly ridiculous arguments about vegetarians getting more light photons or something.

For some reason vegetarianism and veganism tend to attract that particular brand of lefty cynics that see things built on rationalism (such as science, drugs, chemicals!) as a great conspiracy to cover up those overlooked powers of nature, spirituality and the ‘alternative’. As a vegetarian I’ve had to regularly brace my rational self when obtaining vegetarian or vegan goods, whether from a health food shop or a vegetarian cafe, against the overt existence of homeopathy, vitamin supplements, acupuncture, magic water, tarot card reading and every other ridiculousness that prevails in these areas of society, which most of us manage to avoid.

The original article was obviously from that brand. Which upsets me. I’ve been a strict vegetarian for fifteen years and I like to feel my reasons for doing so are sound and not built on theories as probable as water memory.

My purpose here is not to try to convince people that they should become vegetarian but to support those of us that aren’t insane against the deluge of delusion that people put out there - and don’t believe that eating all those photons will make us glow. So here is a list of three rational reasons for being vegetarian.

1. Ethics

I’ve started with ethics because that is probably the single biggest reason (although I have no evidence for this) that people turn vegetarian. Often due to concern for animal welfare in an industrialised meat industry or being uncomfortable with killing animals for food full stop. Having a fundamental disagreement with the right to kill another living animal unnecessarily is the single reason I gave up meat. It is also, quite probably, the most contentious. Which is maybe why many vegetarians and vegans avoid ethics as a reason when asked. Most of us learn, pretty quickly, that a not insignificant number of people get quite offended when you tell them you don’t want to kill animals and, too often, an aggressive response is the result. The ethics around killing of animals for food or other products (such as clothes) is a personal decision, and vegetarians are often going against the norm in many cultures (with notable exceptions).

All of us already have an ethical framework in regards to what living things are acceptable to eat, regardless of how much explicit thought has been given. For some this might be based on culture or religion (the avoidance of beef or pork), it might be the simplest one (as long as it’s not human) or it might be arbitrary (not dog or cat but rabbit and pig are fine thanks). There are many meat eaters who have a far more considered approach and make their decisions on what animals are acceptable based on reasonable argument, such as avoiding anything that has a significant level of intelligence and consciousness (although there is often an arbitrary cutoff that usually lets pigs in) or nothing in a close biological family tree (apes, monkeys etc.) or perhaps based on farming methods (only organic or free range) or nothing endangered (no whales or north sea cod). Or a combination of the above. Regardless, everyone has applied some sort of reasoning to why they would eat a pig but not a gorilla, even if it did taste delicious.

Generally meat eaters start at the top of the food chain and exclude based on criteria. Criteria which is often arbitrary (see the British reaction to processed food ‘contamination’ with horse meat). I’d like to argue that this is problematic and that it is more coherent to start at the bottom of the food chain and include upwards. This means consume (for food, clothing whatever) things from the Plantae and Fungi kingdoms and you’ll be OK. Once you get to the Animalia kingdom, for each thing you wish to include, some coherent ethical criteria is required. This is pretty challenging; which is why I like to stay at the bottom.

Of course there are theoretical problems with this approach; what if there was a plant as clever as a cow (there isn’t)? This is why some people include criteria such as whether the life form feels pain (which lets in molluscs). It’s also not entirely as black and white as I make out; there are some grey areas. An animal is defined as “a living organism which feeds on organic matter, typically having specialized sense organs and nervous system and able to respond rapidly to stimuli”. That definition could include some of the weird and the wonderful that manage to survive at the bottom of the food chain (sponges). However, for most of us those organisms in the grey area don’t tend to fall on our dinner plate and by sticking to fruit, vegetables and some mushrooms you keep well away from the grey.

Vegetarians (a category in which I include myself) enter a whole ethical grey area with the consumption of animal products. Consuming animal products (for food: milk, eggs, honey etc. or fabric: wool etc.) means moving up from the bottom of the food chain. Whereas theoretically one could consume milk and eggs without ever causing suffering, pain etc. this is not how modern farming works. The reality is that animals are killed (mainly in conjunction with the meat industry) to enable this produce. I reluctantly admit that vegans have the logical upper hand here. So how do I, as a vegetarian, justify my decision to eat eggs and drink milk whilst acknowledging the realities of modern farming?

Here in my argument I pull in an auxiliary: to obtain a reduction of overall suffering rather than absolute elimination. This becomes an argument based on balancing benefit and harm. Vegans are the environmentalists who only get around by walking. They never use a train, bus or aeroplane (or horse - animal product). They never buy goods that caused pollution in their construction or their supply chain (so they can’t even cycle). Although not impossible - you live in a forest and can cut down your own trees to build your own house with its own vegetable patch - it will have a pretty significant impact on your life and ability to exist in modern society. Somewhere you have to create a cut off and make decisions, a significant number of which will be arbitrary (even vegans make arbitrary choices). Flying becomes okay under certain circumstances because you are the head of Greenpeace and the benefit of your presence outweighs the harm. Likewise with the consumption of animal products, keep them to a minimum and choose products that cause the least harm overall. So personally I use soya milk mostly in my tea and cereal but I will still accept a cup of tea, or a slice of home made cake, with cows’ milk from a friend.

The benefit vs harm argument places vegetarians pretty close to those much scathed ‘vegetarians’ who consume fish. It also places them pretty close to those other vegetarians who will eat animals under certain circumstances (special occasions, festivals, in restaurants where there is no palatable alternative) and also those meat eaters who try to keep their consumption to a minimum. I think the key difference is whether you take a top down exclusive approach or a bottom up inclusive one, and how many arbitrary decisions you make overall. I argue that inclusion forces explicit decisions rather than the implicit ones that exclusion, by its nature, encourages. This approach, while it doesn’t completely remove all personal whim, reduces it overall, leading to a potentially higher degree of logical coherence.

2. Health

There are plenty of studies that draw a correlation between higher life expectancy and non-meat eaters. There are also a number of studies on diseases and cancers that draw a correlation between lower incidence, higher survival rates etc. and non-meat eaters. These are not causal however, and some argue that vegetarians just tend to be a more health conscious bunch anyway and are less likely to drink, smoke etc.

One could argue that once you remove meat from your diet you have to replace it with something. You could argue therefore that this encourages vegetarians to increase the amount of fruit and veg they eat. A vegetable curry goes that much further towards contributing to your five-a-day than a chicken one. However being vegetarian doesn’t automatically make you eat healthily: I’ve met plenty of ‘cheese, vegeburger and chips’ vegetarians who have never let a green leafy vegetable onto their dinner plate and whose only contribution to their five-a-day is the peas and sweetcorn in the rice burger.

There are also health issues related specifically to animal produce. When yet another meat scandal breaks out (which they do with regularity) I sometimes joke “wake me up when there is a cabbage scandal”. Although this is slightly disingenuous as listeria, the pathogen responsible for most deaths from food poisoning in the UK, does nestle in brassicas. And vegetables are not immune from causing infectious intestinal disease. However meat and seafood account for nearly two thirds of all foodborne disease, with salmonella from poultry ranked first in terms of hospital admissions. Also one must consider how many of those non-meat related incidents are caused by cross-contamination with meat (using the same utensils for meat and vegetables being a notable risk point). Vegans reduce their risk further by avoiding cases from eggs and dairy (slightly above 5%). To sum up, the Health Protection Agency states that “the foods least likely to cause food poisoning are cooked vegetables, fruit and rice.” The reduction or elimination of animal products significantly reduces the risk of illness and even death from foodborne pathogens.

Vegetarians, at all stages of their lives, are able to get everything they need nutritionally (contrary to popular opinion) from their diet (unlike vegans, who require supplements). This aligns with the evidence suggesting that vegetarians are overall healthier, less prone to disease and live longer. Whilst being vegetarian or vegan doesn’t automatically make you healthy, for whatever reasons it does seem to increase your chances. Although one could argue that the majority of the health benefits and reduction in risks would be achievable by simply reducing meat consumption overall rather than completely eliminating it.

3. Environment

The meat industry’s impact on the environment outweighs the impact of every other human activity. If you wanted to reduce your environmental impact and you had a choice between either: a) selling your car and then cycling and walking everywhere; or b) giving up meat, you would have a greater impact giving up meat. And it’s not just beef, the single biggest cause of deforestation; the fishing industry too has a significant impact on our oceans, including the devastation of coral reefs.

Meat can be grown sustainably, fish can be fished sustainably. Both could be achieved in a manner that creates minimal environmental impact. However we are just nowhere near that. The harsh truth is that even if the country only bought locally reared, free range, organic meat, not only would your environmental impact still be large but you wouldn’t be able to keep to those restrictions without a dramatic reduction in overall consumption of meat-based products. It’s a zero sum game: if you maintain your meat consumption and buy exclusively locally reared beef you force someone else to buy beef reared on cleared Brazilian rainforest.

Biofuels caused a great degree of controversy amongst environmentalists. Fields that would have been used for food crops were instead used for fuel. The result was deforestation in order to balance the crop needed to feed people. There was also evidence that food prices were impacted, which affected the availability of food for the poorest. The choice, for environmentalists, became between avoiding unsustainable fossil fuels or feeding the poorest. However the amount of crop used for rearing meat (36% globally - a whopping 67% in the US!) outweighs that used for biofuels by four times. Whilst people were happy to argue that people were getting fuel at the expense of food for the poorest, it also has to be argued that cattle reared for meat are metaphorically taking food from the poor to feed meat to the rich (it costs 100 calories of grain for every 3 calories of beef).

A common rebuttal is that a proportion of land is not suitable for crops whereas it is more than suitable for grazing cattle or sheep. This is a non sequitur: not every piece of land has to be squeezed for productivity - the very philosophy which contributes to our environmental problems. In fact leaving wild land wild is the best thing, environmentally speaking, to do. But mainly it is a straw man to claim that a significant reduction in the environmental impact of animal products requires the complete elimination of every doe-eyed cow from the British countryside.

The harsh, harsh reality is that meat is incredibly environmentally damaging (18% of all anthropogenic emissions) and also incredibly expensive and inefficient in terms of resources. Whilst flying bananas halfway across the world isn’t the most environmentally neutral decision, even allowing for the odd faux-pas, your average vegetarian’s environmental footprint is going to be significantly smaller than the average meat eater’s.

This does tie back to the benefit vs harm, bottom up vs top down argument. Whilst being vegan would have the greatest benefit for the environment, and being vegetarian the next greatest, one could still achieve this by dramatically reducing meat consumption and only buying locally reared meat.


One thing about a rational approach to animal products is that the only completely coherent position is to become vegan. Yet when employing benefit vs harm as an auxiliary, vegetarian, vegan or meat eater isn’t a binary, black and white matter but more a spectrum.

People can still make rational choices that are coherent and sound, based on ethics, health, the environment and economics, which still allow some consumption of meat. However, in terms of a ‘kill count’, how big is the gap that separates the vegetarian from the person who has the occasional bit of fish or meat? Whilst theoretically one could drink milk and eat eggs without any animals suffering, that is not the reality. To sustain a vegetarian diet animals still die. Eggs and dairy still have a higher environmental impact. Arguably the impact is significantly less than that of a diet with meat. However, what really is the difference on all those scales between a strict vegetarian and someone who eats animals rarely?

If I strictly followed my ethics I, and many other vegetarians, should be vegan. For me, personally, I try to minimise my impact on all counts (e.g. by mostly using soya milk). I acknowledge that rationally little separates a vegetarian and the infrequent meat eater. Yet ultimately it becomes an emotional response: the difference between indirect, implicit killing and direct, explicit killing. The thought of eating animals disquiets me on a number of levels, which means I make the choice to be vegetarian.

Machine Images as Build Artefacts

Thanks to the cloud, new and innovative approaches to infrastructure management, which make it considerably more reliable, consistent and repeatable, are being proven at scales never before imagined. By combining the benefits of virtualization with high levels of automation, mainstream cloud implementations such as AWS have enabled new properties of infrastructure management such as elasticity and autoscaling. Prior to this most machines in a datacentre were deliberate, long lived and had a strong one-to-one relationship with their hardware. The OS was installed shortly after the iron was installed in the rack. Initial installation was a long, often manual task and machines were updated infrequently. When machines were changed it was a very deliberate, explicit and controlled process with an emphasis on managing risk. For many these changes needed to be rehearsed and babysat and were often done manually. The result was data centres comprised entirely of snowflakes.

In the last few years great improvements have been made with Infrastructure as Code (IaC). Tools such as Puppet and Chef reduced the footprint of the snowflake and machines were theoretically recreatable. Yet base images remained long lived. The OS is installed when the iron is installed, and low level packages, such as Java, would be installed once and never again (save security updates). The result was configuration drift which, again, ultimately led to snowflakes.

Combining the highly automatable nature of the cloud with IaC gave birth to patterns such as immutable servers and phoenix servers. Entire stacks can be Configured on Demand (CoD) at rates and in timeframes several orders of magnitude beyond the limitations of Moore’s Law. By considering machine instances disposable, the characteristics of legacy static infrastructure - which led to problems and limitations such as configuration drift and scaling - are completely removed. Thanks to the highly automated nature of the cloud the bind between the iron and the machine has been severed, resulting in a shift from machines being provisioned only once in their lifetime to machines being provisioned tens or hundreds of times a day, hour, minute. Figures absolutely unimaginable a few years ago.

New tools, new problems

Yet this has raised a new set of problems that either weren’t experienced in legacy static infrastructure or were tolerated due to the low frequency, highly controlled environments in which configuration changes were managed. New cloud architectures operate at such high rates that the variability of the internet becomes exposed. Before, sys admins were doing one package upgrade (yum upgrade or apt-get upgrade) for every machine at regular intervals (perhaps once a week). Now, in the cloud, package updates are run every time a machine is provisioned, initiating hundreds or thousands of package downloads an hour.

These sorts of frequencies make the system vulnerable to variability and failure. A slow third party package provider (and they are subject to seasonality, especially at times of new distro releases) can cause provisioning times to go from a few minutes to potentially dozens of minutes. This can result in deploys that are impossible to get out, or autoscaling failing as it struggles to keep up with demand. Or, more terminally, the third party is unavailable or has a corrupt package, preventing any provisioning at all. Either way, the result is a system unable to cope under periods of load.

Then there is general change. With static infrastructure packages are installed as part of the machine’s original provision, making changes a relatively uncommon occurrence. Now the same package is installed tens or hundreds of times a day. This introduces the risk that when the third party updates the package to a new version, new machines become unintentional early adopters. The result is bugs, inconsistencies between sibling machines and in some cases complete failure due to incompatibility.

Other factors previously unconsidered, such as provisioning performance, also become a consideration, especially under autoscaling where quick turnaround time is an important factor. A few big packages with long install times cause the time to add up. With static infrastructure, where packages are installed once, and machines are often taken offline to do so, time is a cheap variable. In the cloud automated world however this has consequences in areas such as deploy times and can increase the latency of autoscaling. Latency in autoscaling can ultimately result in significant impacts on overall system performance at critical times.

The sheer rate and frequency at which machines are provisioned means that the system hits the gaps, often gaps that were never previously noticed or acknowledged. What were mere irritations before are now critical. These gaps knock on down the chain. Teams lose productivity because they can’t bring up a development environment because pypi, or a similar package repository, is down.

Being dependent on third parties is risky, and even more so at these high rates of provisioning. There are no guarantees. No guarantees of consistency, no guarantees on reliability, no guarantees on performance. No assurances that environments are identical to each other. No two runs are guaranteed to be the same, especially when you consider that tools like Puppet deliberately run in random order. The end result is that what was previously a minor outside factor now has a significant effect.

There are a number of traditional techniques that can be applied to reduce these problems and thus increase reliability and consistency.

Comprehensive configuration

At the simplest level the problem can be solved in configuration. Package managers can be instructed to fall back on mirrors for backup to increase reliability (though not completely guarantee it). Versions of packages can be explicitly pinned for consistency. Using distributions that have long term support reduces package variability as security updates or critical bugs should be the only changes.

This improves the situation somewhat. However, it is still a problem for less rigorous package systems such as gems and eggs etc. where the dependencies of the packages themselves are not locked down. So while you may install aws-sdk 1.21, it is instructed to accept json ~>1.4. If a new version of json comes out during a deployment then you inadvertently pick it up and are exposed to the same risks already discussed. Also mirrors do not resolve issues with large packages and stressed third parties.

Pushed to infrastructure

Rather than solve the problem in configuration it can be pushed to infrastructure. The entire environment can be locked down by creating local repository mirrors, caches, proxies etc. This solves reliability. It partly solves performance: while large packages will be quicker due to proximity, they’ll still be time consuming. And although consistency is much higher there are still no cast iron guarantees. Minor changes in run order could expose bugs at critical times.

It also requires considerable investment in infrastructure to create high availability, high bandwidth package repositories and carefully manage upgrades and version changes. This is a significant increase in infrastructure complexity and requires large amounts of systems investment.

Machine images as build artefacts

In the May 2013 edition of the Thoughtworks Technology Radar, “machine images as build artefacts” (MIasA) were placed in “assess”. This is a technique that creates a one-to-one relationship between machine images and applications by actively embracing patterns such as phoenix and immutable servers. Thus it removes problems such as configuration drift and snowflakes whilst simultaneously, almost serendipitously, resolving the problems of reliability, consistency and performance inherent in Configure on Demand approaches, without the need for comprehensive configuration or supporting infrastructure. It is a technique used extensively and exclusively in the Netflix architecture, which they term ‘baking’.

CoD is heavily reliant on IaC tools such as Puppet and Chef running in production. Configuration scripts run as the machine comes up, bringing it to its final state. Images as Artefacts move the provisioning upstream and out of the production environment by producing images in advance. The process is analogous to compiling code as opposed to interpreting it.

Baking an image

There are various underlying tools and platforms for image production, from Vagrant to EC2 (AMIs) to LXC to Docker to VMware. The images are created as part of the build pipeline and are the artefacts output to deployment. The mechanism for configuring the machines is orthogonal to the process: they could use shell scripts, Puppet, Chef, Ansible or even be hand rolled (which may actually make sense in some rare cases).

Not everything can be baked into an image, though; there has to be some configuration of some sort. Database URLs etc. are environment specific and may be variable (rotating passwords etc.) so they cannot be pre-baked into the image. It is desirable to keep the image variation to a minimum. This can be achieved by externalizing configuration using traditional techniques such as DNS, LDAP, ZooKeeper etc. or machine metadata (supported by AWS’s CloudFormation). To avoid extra infrastructure, techniques such as automating minimal configuration in cloud-init can be employed. Values can either be retrieved from external services at application runtime or at provision time, by leveraging established techniques such as /etc/default files which can be created as part of cloud-init.
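
As an illustration of the runtime side, a small sketch of an application reading environment-specific values from an /etc/default-style file written at provision time, falling back to environment variables (the path and keys are hypothetical):

    import os

    def load_defaults(path='/etc/default/myapp'):
        # Parse a simple KEY=value file written at provision time (e.g. by
        # cloud-init); the path and keys are assumptions for illustration.
        values = {}
        try:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith('#') and '=' in line:
                        key, value = line.split('=', 1)
                        values[key] = value.strip('"')
        except IOError:
            pass  # no file baked or written; rely on the environment instead
        return values

    config = load_defaults()
    # Environment-specific values stay out of the baked image entirely.
    DATABASE_URL = os.environ.get('DATABASE_URL', config.get('DATABASE_URL'))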

Shared configuration

MIasA allows images to be developed independently without central coordination. In CoD it is often the case that all boxes share the same configuration management code, either in a server-slave configuration or ‘masterless’ with a common package. This presents a challenge in keeping consistency across images for cross-cutting configuration. It can be resolved in different ways. Netflix takes the approach of producing ‘base AMIs’ where all common and stable packages are installed; application images are then built on top of the recognised de facto base AMI. This is analogous to object inheritance. The alternative is to share code using the exact same techniques used for other code dependency management (copy-and-paste, git submodules, packages etc.). This is analogous to object composition. Each has its own advantages and disadvantages. A heavy reliance on base images or shared code requires management of change propagation, in order to prevent the update of a base image or shared component triggering downstream pipelines and inadvertently overloading the system or resulting in a potentially undesirable upgrade of the entire environment. Although in some cases this may be desirable, if it is not catered for in the deployment architecture it could have disastrous consequences.


Security fixes need careful consideration when using MIasA. As the configuration is baked into the images, updates only occur when their respective pipelines are triggered, usually by code changes. Therefore applications that are stable and change infrequently risk running with known vulnerabilities. This is simple to resolve in CoD by issuing an OS package update before provisioning starts in earnest. The same technique could be employed at provision time when using MIasA, yet it arguably reintroduces many of the issues and risks that have been avoided by using images. In situations where development cadence cannot be relied upon, pipelines can be triggered on timers. Allowing security updates to be part of the image production process, thus keeping them upstream, has the advantage over CoD of enabling validation before hitting production.


From a development perspective MIasA encourages a modular architecture. In order to develop images efficiently they must operate in isolation. Changing a piece of shared code, causing dozens of unrelated applications to produce new images, would potentially be expensive and time consuming. Therefore developing applications in a way that allows independence is a desirable prerequisite to MIasA. This suits it to microservice architectures.

From a process perspective there are advantages to traceability (knowing which version of the image is in which environment and how it got there) and change detection is made easier (simply see if the image has changed). It also encourages a clear separation of runtime vs build time configuration (logically in repos and actual in images).

Overall, reliability, consistency and repeatability are implicit with MIasA. Due to the lack of heavy provisioning and the minimal configuration, MIasA is extremely performant, requiring only the time it takes to start the box and its applications. It is well suited to heavy autoscaling environments where latency is critical.


One of the more critical changes from CoD is that of provision style. CoD, along with immutable phoenix servers, reduces a large amount of provision complexity by not needing to be concerned with ensuring a correct start on machine restarts. With MIasA, however, the servers are not strictly phoenix as they all carry the previous life of image creation with them. When creating images more thought has to be put into ensuring that the machine achieves the correct state when it is brought up from the produced image. This introduces a degree of complexity and requires careful planning and testing of upstart scripts, LSB init scripts etc. Run order and dependencies (network, external endpoints, configuration files, other apps etc.) have to be configured correctly. The good news is that once achieved it should be fairly predictable (although, in the nature of these things, there will always be some variability).

While MIasA keeps the complexity out of the infrastructure, it does so by moving the problem into the build process. Overall the complexity is reduced: there is less infrastructure, less to go wrong, and deployments are much simpler and more deterministic. However the images themselves become more complex due to the restart problem. Despite pairing well with phoenix and immutable servers, machines essentially live twice and are no longer true phoenixes. Also, moving complexity into the application build risks contradicting the philosophy of moving complexity from components into the architecture.

There are other downsides to take into consideration before employing MIasA. Ultimately MIasA moves effort from deploy time to build time. Creating images is costly and can take a long time. The full cycle is also more difficult to test (as you need to create the image and then test the machine created from it). This introduces cycle time challenges, as changes take longer to propagate at the beginning of pipelines, although it is safe to assume that as the tools and technologies mature they will become more performant and the cost may decrease dramatically. Another downside is the rigid modularization required. This can result in a loss of flexibility in the development cycle on smaller, less complex systems and may require some innovation to abstract it away.

A hybrid approach

In an effort to balance some of these costs some deployment architectures use a hybrid model. Base images are employed for low-variation configuration (common base packages), which is generally stable, and CoD is used for high-variation configuration, such as custom application packages and configuration, which tends to be more closely related to application development. The cost is that concerns of reliability and consistency, although reduced, are not completely eliminated, and complexity and effort move back into the infrastructure (e.g. custom application repositories), so the economy may ultimately be a false one.

Overall, successful employment of MIasA requires a careful balance of system qualities vs process qualities. As MIasA moves costs from deployment to development, teams need to consider carefully how to balance potential impacts on pipeline cycle time, development time and configuration complexity. If teams prefer to continue with CoD, the full cost of failure in their production systems needs to be assessed, and the cost and effort required to increase reliability, consistency and performance using infrastructure needs to be balanced against the cost of the more robust MIasA solution.

Why Russell Brand Is Right

Russell Brand got a grilling from Paxman after writing an article in the New Statesman saying he didn’t vote and telling people there is no point voting. Brand’s central argument is that the current system of democracy in the UK (at least) is a sham, a charade, a pantomime; that its purpose is to maintain the status quo and keep power with those who already hold it. Paxman’s retort was that if you don’t engage then what hope is there of changing the system? Brand admits he doesn’t know.

Brand’s rhetoric has struck a chord. He works by splitting the country into two camps: the poor, disengaged and thus disenfranchised; and the rich, the elite and the corporates who run the show and want to keep running it. Brand agrees with the Occupy movement that this is a ninety-nine/one percent split. He argues that the challenge to go and vote is for the privileged one percent, not the rest of us. Since the interview, it turns out that Paxman isn’t impervious to Brand’s reasoning and has experienced similar disenfranchisement.

Now I’m more a Paxman than a Brand. I am politically engaged, I am a card-carrying member of a political party, I am a local political activist and have run for the party and represented it at parish level. And - because I understand in some detail how politics really works and how it is essentially gamed and won - I agree, mostly, with Brand.

Here’s the problem. The ‘you can’t have change unless you vote’ rhetoric doesn’t refute Brand’s central point. It also places the responsibility for disengagement onto the electorate. Apathy is their crime, not our responsibility, is the implication from the political class. However, the truth is that it is the political class that has disengaged from the electorate, and that the design of the current democratic system Brand attacks manufactures that disengagement.

The disengagement starts when you consider that our democratic system is non-representative. The result is that for many people in the UK the vote holds little to no value. Due to the political makeup of the country, in many wards (at all levels, from parish to parliament) the swing is so strong - by orders of magnitude - in one party’s direction that your vote is worthless. In my particular borough, categorised as an ultra-safe seat, my vote is the equivalent of 0.044 votes. The average UK voter has 0.253 votes. We can be almost certain that 60% of seats will NOT change hands in the general election. This is exacerbated by the fact that those in key swing wards wield disproportionate power over the national political landscape (about 5.17 times more power than the average voter and nearly 30 times more power than I have in my ward). Because of this, all the political activity centres on those wards.

To demonstrate the real effect of this, look to Kent as an example. In the 2005 election the Conservatives held 74 out of 84 seats, despite the fact that nearly half the Kent population didn’t vote Conservative. There isn’t even an effective opposition. The Conservatives can do as they will, as they wield so much of the power (disproportionate to the votes they received) - evidence of this is the correlation between safe seats and dishonest behaviour such as the expenses scandal. This is a pattern that repeats across the country (in favour of both the Labour and Conservative parties). It is a clear example of how the system is disengaged from the people, not the other way round.

In these safe wards the voting is so entrenched that parties don’t even bother engaging with their voters come election time (apart from the lip service of the odd leaflet). For those in safe seats the parties disengage, quite deliberately, from the voters in that area. This disengagement is an economic and pragmatic decision. The inertia in a safe seat is so great that dislodging a party in that ward requires serious investment. It takes years of campaigning and parties have to be in it for the long haul (if it takes even two elections, that can be over a decade of investment). That means some serious cash, and remember you’ll be playing against people with some big funding. Any political party would be crazy to make these investments. Better to put your efforts and money into the swing seats, or areas where there could be a shift, and leave the safe seats to rot. And that’s a slippery slope, because the area becomes more and more entrenched with the main party: hence the trend for councils like KCC to become increasingly dominated by a single party.

Now, here’s the other way the system corrupts. Let’s say you are a high flyer in a political party, though not an MP, and the party wants you on the front benches. What do you do? Well, you wait for a retiring MP in a safe seat (or gently suggest they make way) and move your favourite into their seat. They’re guaranteed to get in. This works at all levels. If you want power at a local level (for whatever motivation) in, say, Kent, the pragmatic thing to do is join the Conservative party - with party membership falling across the main parties this isn’t an especially difficult thing to achieve. Join the Tories in Kent and you are guaranteed a seat on the council, regardless of who you are and what you stand for.

Telling people to go and vote for change is hollow rhetoric and a means of distraction. Look over here, not at the real problem. Voting (at a parliamentary level, which is where Brand’s point lies) is an activity that occurs once every five years. For some, that means by the time they reach their thirty-third birthday they will have had only two opportunities to vote in their lifetime. To pass fifteen years of adult life with only two measly opportunities to instigate change is nothing short of a mockery, a two-fingered salute to the electorate. In the same period they would have had three chances of entering the Olympics or the World Cup. And, given the odds stacked against them in my ward, they would have had more chance of representing Britain in the Olympic stadium than of instigating any form of change through their vote.

The reason voting is a hollow call is that much of what happens in government happens outside the democratic process. This is where we have evidence of the political system disengaging the ninety-nine percent of the electorate in favour of the one percent. We have donors given positions as special advisors, bypassing the democratic process completely; hereditary peers and donors given titles to get them into the Lords, without even lip service to democracy. We have lobbyists with special access to ministers and parliament. And we’re not talking about people getting only a little bit of power here, we’re talking real influence. Now compare their non-democratic power to influence change with my measly 0.044 votes. Again, it’s a complete mockery and an insult to the electorate.

On top of that, and on top of that, we have massive parts of how the country is run operating entirely outside the democratic process. Let’s take arms deals: in which government’s manifesto, past or present, does it say ‘if you vote for us we’ll sell arms to dictators with human rights violations’? And yet, if you put it on a ballot paper do you think that would get through? Oh yes, then there’s all this spying stuff: who voted for that? What manifesto was that in? Erm? Worse, it was an active decision to hide it from the public in order to avoid stimulating public debate. Worse still, when public debate is stimulated, those who stimulate it get threatened with law courts and notices, held under terrorism laws, have property destroyed etc. etc. What part of the vote endorsed that?

The claim that parliament and government represent us is exposed from time to time, probably at no clearer time than the war in Iraq. The majority of the public were against the war. So strong was the feeling that it motivated record turnouts to several marches, some of which were the largest in the UK’s history. Yet the government completely ignored the people. Worse than that, it lied to and blackmailed them. And then there are the broken manifesto promises: no top-down reorganisation of the NHS, scrapping tuition fees, the coalition agreement (and manifesto promise) to create a democratically elected House of Lords. The basis on which those votes were obtained is ignored and disregarded on a whim with no consequence (no really, no consequence).

So back to the central point. Voting is a sham, a charade, a pantomime and the system is heavily skewed to keep those with power in power. Not only that but the political parties actively disengage from large areas of the country. On top of that they actively campaign against reform to give voters more power. On top of that they have created systems that allow a very small percentage of the population to wield far greater influence over the democratic process than the voter.

Where Brand is wrong, though, is on apathy having power: the idea that the vote legitimises, and therefore if enough people don’t vote, power will not be legitimate. Recently there were the Police Commissioner elections. Where a democratic process (a referendum) was used to decide whether to have a Police Commissioner, the public dismissed them. Where the system was forced upon them, the public refused to acknowledge it, resulting in record low turnouts (14.9%) with some polling stations not having a single vote cast. Yet the government defended the outcomes and claimed legitimacy. In the absence of votes they fell back on hollow rhetoric. How low would the vote have to go before the system crumbles? Under ten percent? And what would the reaction be? The idea of forcing people to vote has been floated before and will likely come up again.

So yes, your vote is pointless, but the problem is you have to keep doing it. You have to prop up what little democracy is left in the system in the only way the political class will allow. In the meantime, seek alternatives to wield legitimate democratic power and create real change, however you decide that may be.

Abstract Away & Abstract Into

You’re about to use a third party library in your codebase. Every good developer knows that the first thing to do is create some domain specific abstractions by sticking a layer of objects over the top. This encapsulates the third party library and keeps it away from the client code.

This is the typical layering approach drummed into developers in their first few weeks at university, on the job, in training, whatever. Leaky abstractions are bad, okay? Lock them away behind a wall of bespoke interfaces and let nothing through. Our own code becomes simpler by using a narrower, more domain specific interface in the higher layers. It also makes testing a lot easier, as we avoid testing someone else’s code (which is a bad thing).

This approach is what I call Abstract Away because, effectively, you are distancing yourself from the third party library by creating new abstractions.
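As a rough illustration, here is a minimal sketch in Python. The ThirdPartyChart class is a hypothetical stand-in for some external charting library, defined inline only so the sketch runs on its own; SalesChart is the bespoke, domain-specific wall placed in front of it.

```python
# Hypothetical stand-in for a third party charting library (inline so the
# sketch is self-contained).
class ThirdPartyChart:
    def __init__(self, kind, title):
        self.kind, self.title, self.series = kind, title, []

    def add_series(self, x, y):
        self.series.append((x, y))

    def to_svg(self):
        return "<svg><!-- %s: %s, %d series --></svg>" % (
            self.kind, self.title, len(self.series))


# Abstract Away: a narrower, domain-specific interface hides the library;
# client code never touches ThirdPartyChart directly.
class SalesChart:
    def __init__(self, title):
        self._chart = ThirdPartyChart(kind="line", title=title)

    def plot_monthly_revenue(self, revenue_by_month):
        self._chart.add_series(list(revenue_by_month),
                               list(revenue_by_month.values()))

    def render(self):
        return self._chart.to_svg()


chart = SalesChart("Monthly revenue")
chart.plot_monthly_revenue({"Jan": 10000.0, "Feb": 12500.0})
print(chart.render())
# Any power in ThirdPartyChart not exposed by SalesChart is out of reach.
```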

One of the big problems (and benefits) of abstracting away is that the third party library is now inaccessible except via the abstractions. Any power in that library not covered by your own abstractions is also inaccessible. It’s also a rather expensive thing to do, and you essentially end up duplicating a lot of the underlying library’s concepts. And because helpful behaviours of the library are too distant to discover, you lose out on the opportunities they offer.

The alternative is to Abstract Into. This takes the opposite approach by creating abstractions using the existing library’s interfaces and building on them in a complementary fashion. The abstractions and the existing library sit as siblings in the same layer rather than one over the other, almost as if they were just another set of abstractions in the same library. This is a lot more powerful for the consumer because they are no longer barred from the underlying library’s power, giving them the opportunity to do things beyond your initial intent.
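Continuing the sketch above (the hypothetical ThirdPartyChart stand-in is repeated so this snippet also runs on its own), the same convenience can be offered by abstracting into the library: the helpers accept and return the library’s own objects instead of hiding them.

```python
# The same hypothetical third party library stand-in as before.
class ThirdPartyChart:
    def __init__(self, kind, title):
        self.kind, self.title, self.series = kind, title, []

    def add_series(self, x, y):
        self.series.append((x, y))

    def to_svg(self):
        return "<svg><!-- %s: %s, %d series --></svg>" % (
            self.kind, self.title, len(self.series))


# Abstract Into: a domain helper that builds on the library's own interface
# and hands the real object back to the caller.
def monthly_revenue_chart(revenue_by_month, title="Monthly revenue"):
    chart = ThirdPartyChart(kind="line", title=title)
    chart.add_series(list(revenue_by_month), list(revenue_by_month.values()))
    return chart


chart = monthly_revenue_chart({"Jan": 10000.0, "Feb": 12500.0})
# Nothing is hidden, so when the helper runs out of road the client can
# still reach for the full power of the underlying library.
chart.add_series(["Jan", "Feb"], [9000.0, 11000.0])
print(chart.to_svg())
```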

Abstract Away is very controlling and is about limiting, or protecting, the client. Abstract Into is very liberal and is about providing extra value alongside the original library.

Clojure, as a language, encourages and uses a lot of Abstract Into over Abstract Away. A great example of this is Compojure. Compojure abstracts into the Ring web application library to provide functions for building routes. However it is really very difficult to know where Compojure starts and Ring ends (or vice versa). This is because Compojure doesn’t attempt to push Ring away from you, instead it provides helpful patterns. This is incredibly powerful as the user can still harness all the power of Ring. It also makes Compojure very composable with other libraries and easy to add your own abstractions.

An example where Abstract Away is used where Abstract Into would, in my opinion, have been more appropriate is the JavaScript graphing library Rickshaw. Rickshaw builds on top of the amazingly powerful and versatile D3.js library by providing convenient and simple abstractions, in a DSL fashion, for building charts. This is fantastic and the abstractions provided are very helpful, allowing you to rig up attractive charts in no time. However, although it uses D3 you would have no way of knowing from the Rickshaw API; the only hint is in the docs. All the power and loveliness of D3 is locked away from you. The fact that it is built on top of D3 is utterly irrelevant. So, when you reach the limits of what Rickshaw can do, what are your options? None, but to throw it away and rewrite in D3.

Libraries that have been abstracted into don’t suffer from this problem. If you hit their limits you just abstract into them even more. Libraries can be composed from other libraries. This sounds similar to a plug-in model but it’s actually very different, as you aren’t restricted in the same manner. If you look at Ring there is no plugin model.

One of the downsides of Abstract Into is that it requires a high quality library in the first place. Preferably one that has considered being extended in this way. Of course many libraries make this claim with their heavy use of interfaces etc. but how many actually match up to the promise in practice?

Abstract Away does have its uses, however. It is very good at containing bad libraries by acting as an anti-corruption layer. It also makes testing simpler. Abstract Away also tends to be good for wrapping multiple libraries with a common interface. boto and fog follow this method by providing a cohesive set of abstractions over multiple cloud providers to create a ubiquitous API. Of course this could also be accomplished using Abstract Into, but it is far more complicated and can get messy rapidly. It is also very useful for scenarios where you may wish to change the underlying library without disruption to the client (rarer than you think: I doubt Rickshaw will be dropping D3).
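For the ‘common interface over multiple providers’ case, a minimal sketch (with two hypothetical provider stand-ins rather than boto or fog’s real APIs) might look like this:

```python
# Two hypothetical provider SDK stand-ins with incompatible interfaces.
class AwsishStorage:
    def put_object(self, bucket, key, body):
        print("AWS-ish: put %s/%s (%d bytes)" % (bucket, key, len(body)))


class RackspaceishStorage:
    def upload(self, container, name, data):
        print("Rackspace-ish: upload %s/%s (%d bytes)" % (container, name, len(data)))


# Abstract Away as a ubiquitous API: one adapter per provider, all exposing
# the same narrow `save` interface to client code.
class AwsFiles:
    def __init__(self):
        self._sdk = AwsishStorage()

    def save(self, folder, name, data):
        self._sdk.put_object(folder, name, data)


class RackspaceFiles:
    def __init__(self):
        self._sdk = RackspaceishStorage()

    def save(self, folder, name, data):
        self._sdk.upload(folder, name, data)


for store in (AwsFiles(), RackspaceFiles()):
    store.save("backups", "db.dump", b"...")
```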

Abstract Into, on the other hand, is great for enhancing existing libraries and leveraging them to build new abstractions: a standing-on-the-shoulders-of-giants approach. However it is dependent on high quality libraries in the first place and has an element of ‘vendor lock-in’. I think the proliferation of low quality libraries over the last few decades is one of the reasons why many developers have established the habit of using Abstract Away when Abstract Into would be better suited.

So, next time you are writing code with a third party library and creating new abstractions, think carefully about whether it would be better to Abstract Away or Abstract Into. And when writing your own libraries, design them to promote Abstract Into and allow for composition.

Monitorama 2013

I spent the back end of the week attending the Monitorama EU 2013 hackathon in Berlin. It was an enjoyable, well organized affair. The talks were generally of high quality, and those I didn’t find engaging others called out as some of the best of the day, which suggests a good spread.

The conference was centred on one common goal: to make monitoring better. A community is definitely forming. One which is generally polite and respectful, with a focus on co-operation, open source, open platforms and moving the industry forward regardless of differences (devs, ops, dev-ops, marketeers). It was one of the few events - both in professional and external pursuits - where sponsors genuinely support and enable a cause rather than the usual cynical sponsorship we have become used to: railroading, restricting and blackmailing the consumer. Instead there was more a symbiotic relationship between participants, organizers and sponsors. All had a common goal and vision and enabled a great event. As a participant I wanted to see the sponsors’ products because they were genuinely of interest and provided education. As a sponsor it’s a great way to get your message to people. As an organizer it enables the event, and its message, to be realized. I think that is a rare thing these days.

As I reflected upon the conference and its common threads on my bus journey to the airport, it occurred to me how immature and lost we are as an industry on the topic. One co-participant said to me: “I came expecting someone to tell me it was all easy and I was doing it wrong. But everyone is struggling with this stuff.” It was a true observation. No talk dared suggest a “silver bullet” or evangelise an approach. Talk after talk focused on the struggles and the questions; although there were some answers, many of the big questions were left unanswered or addressed with dreams or speculation.

Though it is not quite that hopeless. Further reflection enlightened me as I realized that, although it is an accurate picture, it is still one that tells a story of great progress. Monitoring has evolved a long way. As a community the monitoring enthusiasts have solved a large number of monitoring problems, the biggest of which is how to even get data and where to put it. Collecting data, storing it, processing it: the last few years have solved these problems (though, as the conference demonstrated, there are still areas for innovation). Where we are all lost is what to do with this data. Speaker after speaker admitted that it is beyond them and reached out for help to those more skilled, knowledgeable and smarter in these areas: arms held out to a data science and statistics community that is all too disconnected from the community that needs it. This is the great challenge.

Alerts and surfacing problems came up again and again as a problem unsolved, due to the limited skill set of the software engineer. And with that came another admission: our clumsy attempts at using naive school education statistics - predominantly normal distributions and percentiles - are misguided, perhaps even dangerous (well, relative to the sense of danger in business software rather than airplanes, as speaker after speaker informed us). We’ve learnt a tool and some basic maths but it does not apply to our domain and never will. A classic case of a hammer making everything look like a nail. Our domains are unpredictable, with complex models that will not and cannot fit the mathematical models of the predictable, steady, rhythmic metrics of the factory line. The result is unacceptable numbers of false positives, and true positives missed, lost or even ignored in all the noise of the false positives. We need better models. And with that come many challenges.
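To make the criticism concrete, here is a minimal sketch (my own illustration, not anything presented at the conference) of the sort of naive normal-distribution alert being warned about: flag a sample that strays more than three standard deviations from the mean of a recent window. On the bursty, seasonal traffic most of us run, it fires on perfectly normal ramps and stays silent through genuine outages.

```python
# A naive, normal-distribution-based alert: not a recommendation, just an
# illustration of why this class of check misbehaves on real traffic.
from statistics import mean, stdev


def naive_threshold_alert(window, latest, sigmas=3.0):
    """Return True if `latest` is more than `sigmas` standard deviations
    away from the mean of `window` (a list of recent samples)."""
    mu, sd = mean(window), stdev(window)
    return sd > 0 and abs(latest - mu) > sigmas * sd


# Quiet weekend window followed by a normal Monday-morning ramp: it fires
# anyway (a false positive).
print(naive_threshold_alert([10, 12, 11, 9, 10], 40))       # True

# Bursty-but-healthy window followed by a real outage (requests drop to 0):
# the deviation is swamped by the window's own variance, so no alert.
print(naive_threshold_alert([100, 900, 120, 850, 110], 0))  # False
```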

Some speakers entertained suspicions that as an industry we are at the forefront. The evidence is convincing. Not only are we pioneers in our industry but pioneers in monitoring across the spectrum. Other industries flounder and fall as we do, even when lives depend on it. And other industries fall back on the most faithful algorithms of all, despite their great flaws: those held secret by the human brain.

For the human brain is the best pattern matcher, the best instrument for sorting the signal from the noise, the best algorithm for detecting genuine or even potential problems. Yet watching for them is a mundane, unskilled activity of staring at screens, disruptive and demoralising. It conjures images of the Simpsons episode where Homer is employed to monitor the reactor plant, to disastrous consequences. This, of course, creates a hankering in the technologist: where there is a mundane activity performed by a human, a well-engineered automated solution offers permanent relief. Yet, as already mentioned, adequate algorithms are beyond us and current attempts hinder rather than help.

At the end of the conference I felt that alerts were good intentions leading to hell. To paraphrase Gogol, the road seems straight and well lit, yet we have all wandered off course and are scrabbling around in the darkness. A darkness caused by biblical swarms of alerts. Some speakers suggested turning them off altogether and relying on humans, because that’s all you can rely on. Others suggested minimizing them, or prioritizing them, or collating and aggregating, or other meta strategies. It was an area on which I remained wholly unconvinced. Nobody stood up and dared suggest that they had an alerting strategy that worked.

Given this is traditionally an operational concern, there was a refreshing absence of developer bashing. There was common agreement that upstream developers need to consider monitoring as a first class concern and create applications that are well monitored. Downstream there was broad agreement that operations need to provide the services and tools that allow developers to integrate easily with monitoring platforms. And ultimately, for success, there must be communication and collaboration.

At the end of the two day event I concluded that this is a four part problem: one of technology, people, analytics and usability. These parts all need different skills and different communities. Monitoring is also not just about the CPU usage of servers. As monitoring grows I hope that next year most of the talks will be from data scientists, usability experts and business people telling stories beyond the CPU gauge and the disk space alert.

Resource Centric Application

Web application frameworks, from cgi-bin through PHP and Java Servlets all the way up to ASP.NET MVC, Ruby on Rails etc., are built around the paradigm of modelling a request and response pair: the application receives a request and then generates content, on demand, to return as a response to the client. Each and every request, whether or not it is for the same resource, results in a little thread popping up (or being acquired from the thread pool) and executing a bunch of custom code. Often the same resource is requested multiple times, the exact same code is executed and the exact same result is returned.

In many situations this burden on the web servers is undesirable. So the more complete applications will serve cache headers which tell the various caches, at various points between the web application (and sometimes inside it), through the data centre, through the various ISPs, possibly through reverse proxies and finally at the client (browser or otherwise), that they needn’t make the same request to the web server again until the cache headers expire or are invalidated in some way (using etags etc.). On top of that, other strategies will be employed to separate out certain resources, such as images and css files. Thus the server application is protected, by the layers between it and the client, from excess and unnecessary traffic.
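A minimal, framework-agnostic sketch of that game, using a short Cache-Control lifetime plus an ETag and answering conditional requests with a 304 (the function and its parameters are illustrative, not any particular framework’s API):

```python
# Sketch of the cache-header game: hand caches a lifetime and a validator,
# and answer revalidation requests without regenerating the content.
import hashlib


def respond(body, if_none_match=None, max_age=300):
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    headers = {"Cache-Control": "public, max-age=%d" % max_age, "ETag": etag}

    if if_none_match == etag:
        return 304, headers, b""   # the cached copy is still valid
    return 200, headers, body      # full response; caches may keep it


status, headers, _ = respond(b"<html>...</html>")
print(status, headers["ETag"])

# A later conditional request from a cache that already holds the content:
status, headers, _ = respond(b"<html>...</html>", if_none_match=headers["ETag"])
print(status)  # 304
```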

It’s a complicated game. It requires putting smarts in your application, choosing cache policies, configuring extra layers of infrastructure such as reverse proxies. It can all too easily turn into a fragile system built using a delicate art form.

For some parts of the system, such as images, it is simpler to consider the resources as ‘static’ and publish them using a completely separate and more efficient mechanism (say a lightweight, highly performant file-based web server, or direct to CDNs) far away from the application platform. And it tends to be at these two extremes that we architect our applications.

The result is a simple split between static and dynamic content, with dynamic being clearly defined as any resource that requires computation of any sort, regardless of its lifespan.

This approach of modelling your application around the request and response pair can, in some cases, dramatically increase its complexity. By using a request/response model for anything that requires computation, the opportunity to simplify and clarify the application is lost.

An alternative approach is to model the resources themselves and to model their mechanism of change. For example, is the resource’s change well known, such as on a time basis (every hour), or based on an event (someone has updated a piece of information)? Analysing these attributes of the different resources, and modelling them that way, allows alternative architectures to emerge which reduce the struggle against caches etc. It may even be possible, in some applications, to remove dynamic, on-demand, computed resources altogether, reducing everything to static resources and thus making an always-on, request-and-response-driven web application redundant.

Publishing systems such as Octopress and Jekyll take full advantage of this approach. The attributes of change in the engine are well understood and even allow for different rates of change depending on context. Change has been reduced to such a degree that all resources are static. For example, the Octopress application builds static content based on changes to the blog post source files at the content owner’s request (via a rake file). If the layout of the site is changed then the entire site is regenerated. Then, using simple mechanisms such as rsync - or even git to push deltas - Octopress’ publishing mechanism simply pushes the static files. Change is so well understood that whether you publish several pieces of content an hour, or only a few pieces every week, month or even year, the approach to resource production and publication is consistent and sound. The result, compared to similar systems in the same domain such as Wordpress, is that no special hosting is required, there are no concerns about downtime, no custom extensions to install on web servers, no custom running software etc. In fact the delivery mechanism (GitHub Pages, S3, CDN, nginx) is not a concern of the publishing app itself and in no way affects the consistency of what is delivered to the client. This is an inspired employment of separation of concerns, with all the benefits it begets.

There are other approaches which achieve the same end goal. For example, a job could read events off a queue and generate new static resources in response. Many applications have backend jobs that transfer data, in bulk, at regular intervals: typically data is transferred from a third party system directly into the web application’s database, which the web application, at the moment of request, then fetches and transforms (into a presentable form more representative of its own domain). This process could be reversed by placing the responsibility for generating the resource, as a static resource, on the batch process itself, completely bypassing the web application. Another approach is to memoize by generating static content directly off the web application in a manner similar to warming a cache, except that, rather than warming a cache, the computed resource is published as a piece of static content elsewhere.
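A minimal sketch of the event-driven variant, with an in-process queue and a local directory standing in for whatever the real message broker and static host (S3, a CDN origin etc.) would be:

```python
# A backend job consumes change events and publishes ready-to-serve static
# resources, bypassing the web application entirely.
import json
import queue
from pathlib import Path

events = queue.Queue()              # stand-in for SQS, RabbitMQ, etc.
OUT = Path("published")             # stand-in for an S3 bucket / CDN origin
OUT.mkdir(exist_ok=True)


def render_product_page(product):
    return "<html><body><h1>%s</h1><p>%s</p></body></html>" % (
        product["name"], product["price"])


def handle(event):
    # On a 'product updated' event, regenerate that product's resources.
    product = event["product"]
    (OUT / ("product-%d.html" % product["id"])).write_text(render_product_page(product))
    (OUT / ("product-%d.json" % product["id"])).write_text(json.dumps(product))


# Simulate an upstream system announcing a change.
events.put({"type": "product-updated",
            "product": {"id": 42, "name": "Kettle", "price": "£25"}})

while not events.empty():
    handle(events.get())
```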

These various approaches have many advantages. Not only do they dramatically simplify the application itself, by removing the complexity around ‘protecting’ the web server from load, they also remove, or drastically reduce, the problems of high availability, failover and zero downtime. Generally speaking, the processes that generate the static content will tolerate far, far higher downtimes than a web application would. Building a highly available, scalable, zero-downtime system for static resources is a far simpler, well known, solved, commodity problem. And it is far cheaper too.

Another simplification is to treat resources as immutable. An immutable resource is incredibly simple in terms of cache headers etc. (which are still useful for traffic reduction) as it never expires; if the resource will never change then it can be cached indefinitely. Another side effect is that all resources, by default, have permalinks. Mechanisms can then be employed to allow clients to obtain the latest version, such as redirects or published atom feeds.
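A small sketch of what that might look like: each version is published under a content-addressed permalink that can be cached indefinitely, and a tiny mutable pointer (from which a redirect or feed would be built) names the current version. The paths and header values are illustrative.

```python
# Immutable resources: content-addressed permalinks cached forever, plus a
# small mutable 'latest' pointer for clients that want the newest version.
import hashlib
import json
from pathlib import Path

OUT = Path("published")
OUT.mkdir(exist_ok=True)

IMMUTABLE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


def publish(body):
    digest = hashlib.sha256(body.encode()).hexdigest()[:12]
    permalink = "/articles/%s.html" % digest       # never changes, never expires
    (OUT / ("%s.html" % digest)).write_text(body)
    (OUT / "latest.json").write_text(json.dumps({"latest": permalink}))
    return permalink


print(publish("<html><body>v1</body></html>"))
print(publish("<html><body>v2</body></html>"))     # new permalink; v1 stays valid
```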

Of course, there may be some functions of a system where this behaviour isn’t desirable and a request/response model is preferable. There are clear constraints that determine that case. For example, if resources change at a far, far higher rate than they are read, or the number of resources is so great that the sheer volume of static resources would be overwhelming, or they are genuinely unique (or mostly unique) per request, or the resource’s life is very short, then perhaps, for those specific resources alone, the generate-on-demand model may be better suited. These situations are probably quite rare, both in any single application and across all domains. Particular examples may be shopping baskets (which are short lived) or search results (whose combinations are so great and unknown). However, in these situations a hybrid approach can be taken.

The request/response approach could be perceived as a form of premature optimization and over-engineering. It assumes that we are building systems that are highly dynamic in nature and therefore introduces complexity to the platform by optimizing for that scenario. For the vast majority of applications this is just not the case, and so they inherit complexity for a solution that is unlikely to be required. It is a clear case of YAGNI. As applications such as Octopress show, things can be dramatically simplified if these optimizations are rejected, by starting from the assumption that all resources can be static until proven otherwise.

Unfortunately, for the time being, frameworks are built on the request and response model as opposed to a resource, or hybrid, model. For this reason, regardless of the architectural advantages, the productivity boost from using these established frameworks will outweigh the benefits of a resource-centric approach.

Tiered Support Is an Anti-pattern

Back when the first internet bubble was bursting I had my first web development job. We thought we were sophisticated because we used Macromedia Drumbeat whose killer feature was, gosh, dynamic ASP and JSP websites. This put us a cut above those ‘amateurs’ who chopped huge TIFFs into static HTML using Fireworks (pfff)!

We also did a lot of other stuff, like edit our files directly on production. We were doing Continuous Delivery (without the source control or build server ;P). We also did our own support. We were responsible for everything, from the code we wrote to the web server itself and even the relationship with our ISP.

When something went wrong people phoned or emailed the web site team direct. Yes, there were plenty of ‘it works on my machine’ and ‘ignore that error, just refresh’ and ‘hang on, I’ll just recycle the IIS process’ but we had our fingers on the pulse of our users. And when there was a problem we could fix, we fixed it. Immediately. While the user was on the phone.

That’s how we rolled in those days. Seat of your pants, in touch with our users. Yes, we had a lot of bad habits but we knew IIS4 was rubbish and crashed regularly, we knew that one of our badly written pages was responsible for bringing down the whole site at peak times, we knew that our perl script to upload ads failed on a regular basis.

My next job, some four years later, was as Development Lead for a bank call centre application. The environment was far more ‘enterprise’: source control, change requests, dev and test environments etc. Due to a shortage in desk space we ended up sat right in the heart of the call centre. When our app went wrong we heard the call centre staff complaining. They would even come over to our desk (or shout across) and ask for our help directly.

In both roles the same thing happened: managers decided that we were spending far too long investigating users’ problems and not long enough building the new features the business wanted. Developers needed to be more productive, and more productive meant developers developing more new features. To get developers to develop they need to be ‘in the zone’. They need headphones and big screens to glue their eyes to. They did not need petty interruptions like stupid users ringing up because they got a pop up saying their details will be resent when they tried to refresh.

In both cases the same method was prescribed: support calls would no longer come to the development team. We were to redirect emails, and our desks were moved to another ‘quieter’ building. From now on everything would go via the IT help desk. Someone (L1) would log the request, raising a ticket. They’d search their knowledge base and if they didn’t have an answer they’d pass it on (L2). And if the support team couldn’t resolve it then it would be escalated to the development team (L3).

In both cases the development team became disconnected. When there was a support ticket it was an interruption. It broke our flow, became something to get irritated and annoyed by. It was someone else’s problem, why can’t support just deal with it? Was it really that hard? Jeeeeeeeez. Bugs were something to be captured and handed over to someone else to prioritise the fix, they weren’t something to do with us.

In both cases the team was no longer responsible for what they produced. There was now a process for dealing with deficiencies in the system and that moved responsibility to the process, not to the developers.

A systems thinker would tell you this is wrong. You’ve gone from a system that connected a user to the team responsible with one degree of separation to one that has three degrees of separation. Or think of it another way: the team producing the product, responsible for improvements and fixes, used to be one degree away from their end users, who use the product and feed back its shortcomings and issues, but is now three degrees away. And not even three degrees all of the time: the majority of the time the team won’t ever hear about most of the support issues, and most of the time the team won’t have much interaction with the team that does hear about them.

This is wrong for many reasons. Let’s ask some questions: the user doesn’t find the software easy to use - who needs to know that? The user gets 500 errors every time they click ‘submit’ - who needs to know that? The user can’t clearly work out what currency the prices are in when ordering from a different country - who needs to know that? The user never received a confirmation email - who needs to know that? Is the answer to any of those questions the support team? The help desk? Or perhaps the team responsible for the development and maintenance of the system?

There is another side effect: failure demand. Essentially you create greater demand by moving support away from the team because, rather than issues being resolved first time, people return to raise further requests off the back of that ‘failure’. Development teams also create additional failure demand by producing more bugs rather than fixing the existing ones, except they are so far removed they have no idea that they are doing this. The result is a suboptimization of the whole.

Supporting a product should be an essential responsibility of the product team. That means developers, QAs, PMs, UX etc. Tiered support removes and distances the team from their users and encourages an ‘over-the-wall’ culture. Ultimately the product suffers and the users suffer as, ironically, the team becomes more productive at producing bugs.

The alternative solution presents itself in Continuous Delivery. It’s easier for teams to connect with their users if they can roll out changes quickly. Exposing the team to production via information radiators and monitoring also keeps the team on the pulse and enables them to react quickly and effectively. And ultimately it means being in direct contact with the end user via whatever means suit (Twitter, phone, email, face-to-face). All without the need for layers of support and bureaucracy.

Website as Decorator

The conventional way to build websites, over the last decade or so, has been to treat them as first class applications in their own right. After all, they often have behaviours, and domains, that are very specific to their usage.

There has been a downside to this. The result has been the production of an entire generation of monolithic applications that are expensive to maintain and extend. And that’s before we come on to concerns like scalability and the ability to leverage developments in cloud technology.

There have also been two orthogonal movements that have given rise to these unwieldy monoliths: LAMP and CRUD frameworks. One side effect of LAMP has been the dismissal of architecture as a concern. For years developers were liberated from worrying about overall architectural design and, instead, reached for a LAMP stack, or its MS/Sun equivalent. Likewise CRUD frameworks, such as Rails and its imitators, removed concerns about internal engineering principles in favour of a focus on ‘the domain’. Unfortunately this is potentially a highly skewed and limiting abstraction, as movements driven by the Command Query Responsibility Segregation (CQRS) pattern have attempted to demonstrate.

The other problem has been that, for increasing numbers of companies, their website has become their business. Thus what was a simple, lightweight application has grown in functionality until it is essentially where the majority of the business operations are run.

Combine all these factors, as the large majority of web applications do, and there’s more than a fair share of problems. Problems which have been exacerbated by the stellar rise in web consumption, the need to integrate with third parties, the growth of client side scripting and the little-predicted fragmentation of client side consumers caused by the rise of mobile. Development teams are caught unawares by problems such as scalability, high availability and multi-channel consumption, and struggle to manoeuvre systems that were never designed to handle these concerns. The result is that teams reach for another rewrite.

Web applications that are built with these concerns from the ground up look very different. The same engineering principles (such as SOLID) that are applied every day at a code level are applied at an architectural level. Architecture is important again. The result is a movement away from web applications to web systems. The responsibilities of the website are drastically reduced.

The reduction is severe. Websites are hacked back until they essentially become decorators, providing placeholders for a set of disparate content modules served from varying endpoints (i.e. services). The website essentially composes resources from various locations and styles them (using CSS of course). The site is left with very specific concerns, such as layout, style, co-ordinating security between the various services (e.g. ensuring that the HTML returned from the Basket Service is for the logged-in user), defending against integration issues and providing SEO-optimized URLs.

All the real functionality, all the real content, is provided by a set of independent services. These services provide the real resources as blobs of HTML for the web site to style as needed, or as JSON documents for client side JavaScript. A Products service provides blobs of HTML ready for the web site to style; a Search service takes inputs and returns HTML, again ready for the web site to style, or JSON for the client to process as-you-type, Google style. All of this happens somewhere in the background in a manner opaque to the user. From a client perspective it is indistinguishable from any other website.
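A minimal sketch of the idea, with canned fragments standing in for HTTP calls to hypothetical service endpoints (a real decorator would fetch them with short timeouts and cache aggressively):

```python
# The website as decorator: it only lays out and styles fragments that the
# independent services have already produced.
LAYOUT = """<html><head><link rel="stylesheet" href="/site.css"></head>
<body><header>{header}</header><main>{products}</main>
<aside>{basket}</aside></body></html>"""

FAKE_SERVICES = {   # canned responses standing in for real service calls
    "header-service": "<nav>Home | Offers | Help</nav>",
    "product-service": "<ul><li>Kettle £25</li><li>Toaster £30</li></ul>",
    "basket-service": "<p>2 items - £55</p>",
}


def fetch_fragment(service, fallback="<p>temporarily unavailable</p>"):
    # A real implementation would make an HTTP call with a short timeout;
    # falling back keeps the page up when a service is down.
    return FAKE_SERVICES.get(service, fallback)


page = LAYOUT.format(
    header=fetch_fragment("header-service"),
    products=fetch_fragment("product-service"),
    basket=fetch_fragment("basket-service"),
)
print(page)
```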

Web applications become web systems. Systems comprised of services which provide narrow, vertical, domain specific resources and capabilities.

This provides huge benefits and opens up new opportunities. Each service can be independently deployed, scaled, cached and developed. Independent ‘two pizza teams’ work on each codebase using whatever tools, languages and techniques are most appropriate for their particular domain. For example, one service may process requests up front using a Python batch job and deliver resources to a CDN to reduce load, while another may generate resources as needed and handle requests asynchronously using NodeJS.

There are other advantages. The front end is able to employ techniques such as segregation by freshness to increase the cacheability of the site without any changes to the architecture, simply by changing the behaviour of the front end. The website can easily be made Anti-Fragile by removing single points of failure and providing coping strategies when other services are unavailable, increasing overall uptime and availability, again without huge change. Heavily used resources can be moved to CDNs to increase capacity simply by telling the website to load them from a different endpoint. Client side applications become easier to write as endpoints can simply be exposed. Third parties can integrate with and leverage your system in unimaginable ways by externally exposing the very same service endpoints used to build the site internally, a technique employed to great success by the likes of Amazon, Facebook etc. Mobile apps can likewise be built off the same services without requiring any architectural changes. Testing is dramatically reduced in complexity, as each service is individually verifiable without having to apply expensive, long running regression suites against the whole, dramatically reducing turnaround time. And, again, everything can be independently (auto)scaled and deployed in a manner that makes sense to the individual services, giving teams a larger number of levers to pull to ensure a good user experience without expensive and wasteful vertical scaling. Legacy problems become considerably easier to solve - in what is quite probably the most common application of the website-as-decorator pattern - by acting as a Strangler Application which pulls resources from legacy sites and repurposes and strips back the content from old systems.

Tools such as SiteMesh exist to enable such patterns. Movements such as Microservices are a natural extension of these techniques. Sites such as Amazon and Netflix are built upon these principles and techniques.

I would argue that building the website as a decorator and backing it with independent vertical services is a viable approach regardless of the size of the web application (within reason).