dev, DevOps and trusted systems

Software’s coming home

Software development and technical architecture design have been taken back in-house over the past 5 years – ending the era when Whitehall departments outsourced almost everything to a large systems integrator.

This client-side ownership of technology is a return to the way many of us operated during the 1980s and 1990s. Back then we had accountability for everything: from user needs and business analysis to user research and design, and from software engineering using rapid prototyping, testing and iterative user feedback to going live through a series of incremental releases and then continuing to endlessly refine and improve the live services. None of this is new.

Roll forward to 2017, however, and it seems a counter-intuitive time to build large amounts of custom software in-house. Many common technology needs have become a commodity, available as services from the cloud or as mainstream products, both proprietary and open source. So why this resurgence of in-house software development? Is it just a continuation of the tendency of government to see itself as special? After all:

Public sector organisations tend to see their problems as complex and unique, thus requiring tailor-made, high-risk, state-of-the-art solutions even when alternative, off-the-shelf, cheap, tried and tested systems are available. [1]

Well, that all depends. If organisations apply techniques such as Wardley mapping and objectively analyse and understand the current landscape – where they are now and where they need to be – they may have a rational basis to decide which work needs to be brought in-house and custom designed and built. For those with a map, taking ownership of specific custom systems in-house can make perfect sense.

A Wardley Map: source Simon Wardley

But if they’re deciding to build things in-house without a map, just because it seems like a good idea, then yes, it’s simple vanity, stuck in the tired dogma of “build it and they will come”. And it’s expensive vanity at that – wasting taxpayers’ time and resources on developers’ pet projects rather than focusing on the most effective and timely way of redesigning and delivering better services.

Applying the right approach in the right place was part of the logic of disaggregation, distinguishing between commodity needs – acquired from the marketplace at competitive prices – and government’s niche needs, such as complex welfare and taxation requirements derived from decades of incremental policy and legislative changes. This approach, which was also reflected in the principles of “cloud first” and “buy before build”, was set out by the Government Digital Service (GDS) in guidance later deleted:

A move to platforms does not mean that government has to develop everything in-house: many of government’s needs can be met by existing cost-efficient utility services. However, government can help to establish best practice in areas such as personal data privacy.

Wherever appropriate, the government should use existing external platforms, such as payments services (ranging from third party merchant acquirer services to the UK’s national payments infrastructure). Deciding to develop platforms in-house will happen only where that is the best way to meet users’ needs in the most flexible and cost-effective way. [2]

Even where in-house bespoke software projects are required, many of the requirements can be met by consuming existing services or by building in the cloud on platforms such as AWS Lambda, Heroku and Cloud Foundry, rather than by hand-welding expensive, bespoke infrastructure as in the past.

The variable state of practice

This renewed emphasis on in-house development means organisations taking ownership of software engineering quality (a duty, in theory at least, previously the responsibility of the big systems integrators). Many of the public sector systems are critical national infrastructure and require robust engineering practices – hence initiatives such as the Trustworthy Software Foundation.

Process-centric approaches, however, while useful, often don’t address the most important issue – the quality of the software produced. There’s been a strong emphasis within the UK government on using standards to help drive cross-government consistency and interoperability. So it’s surprising there isn’t more visible activity in this area, exploring for example whether work such as the OMG CISQ standards, which address software quality issues at a system level, might help improve consistency and quality within and across government programmes.

Software itself is also becoming increasingly diverse, partly because programme teams, even within a single organisation, are often free to adopt their own pet programming languages – without always considering which language is best suited to the task at hand. Then there’s the challenge of integrating this new code not only with web front-ends but also the legacy systems, processes and data that keep the majority of government services running.

Up-front analysis and understanding of how these components, new and old, will interact with each other, and identifying potential risks to critical public services, are essential to success. This is why standards are so important. Regardless of the programming language used, the sensible application of standards can help ensure consistency in everything from code quality and security to architecture and API design, within and across programmes.

The current state of dev, DevOps and continuous deployment is highly variable – from well designed and managed environments, to chaotic shanty towns lacking robust code and deployment practices and cobbled together on the fly by teams learning in the dark as they go. Some of the poor practices I’ve encountered – in both private and public sectors – include:

  • dismal “discovery”, with no situational awareness or objective analysis of the landscape and what already exists – but instead a desire to rush into building something new
  • no consistent technical performance measures, tooling, automation or dashboards
  • time and resource displaced into rolling hand-baked bespoke infrastructure and approaches rather than consuming common tooling and automation
  • no technical assurance or objective visibility of which programmes are performing well, and which least well

In the absence of a meaningful map and strong leadership, it’s not uncommon to find developers busy building new shiny things just because they think it’s the right thing to do. Rather than focusing 80% of effort on solving the particular user and business needs of a programme, time is consumed duplicating and hand-forging the basic plumbing – everything from provisioning and pipeline management to identity and access management.

Good practice

Organisations with good DevOps practices have adopted re-usable models. New programmes hit the ground running, instantiating and consuming the core environments, processes, functions and systems they need. Unless there are exceptional security issues, these environments are scripted, configured and run in the public cloud. Automation and standard performance management tooling let teams focus on the specific business problem at hand, not the underlying technology.

Programmes run this way can focus on their specific business and policy issues rather than hobby tinkering with the technology stack. They also provide a more consistent and open view of their work through the use of real-time dashboards. The best examples cover areas that include:

  • technical debt (automated measurement based on quality measures for security, reliability, performance efficiency and maintainability)
  • backlog management (state of backlog, criteria/priority for working through backlog)
  • deployment frequency (e.g. twice a week, 6 times a day)
  • lead time (how long to get code from development into production, performance over time)
  • change volume (number of new features a day/week/month, new/changed lines of code per week)
  • percentage of failed deployments (how many failed, recovery time, performance over time)
  • availability (overall uptime per service, benchmark against user needs / SLAs)
  • customer issues / tickets (comparative volume by time and mapped to releases, number of P1s, P2s, etc.)
  • usage (number of users or systems accessing the services – website or API-based)
  • performance / service response times (validating scalability, i.e. that response time stays stable regardless of load, e.g. always under 3 seconds)
  • percentage change in utilisation (number of users of service(s) or function(s) per time period)
  • code vulnerabilities (number of security vulnerabilities a day/week/month, performance over time)
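
Several of the measures in this list can be derived from nothing more than a log of deployments. The following is a minimal sketch in Python, assuming a hypothetical `Deployment` record with a commit time, a deployment time and a failure flag; the record shape, the `delivery_metrics` function and the sample data are illustrative assumptions rather than any particular tool's format. It computes deployment frequency, median lead time and the percentage of failed deployments for a reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # True if the deployment failed or was rolled back


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise deployment frequency, lead time and failure rate for a period."""
    if not deployments:
        return {"deploys_per_week": 0.0, "median_lead_time_hours": 0.0, "failed_pct": 0.0}
    # Lead time: hours from commit to production for each deployment.
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deploys_per_week": len(deployments) * 7 / period_days,
        "median_lead_time_hours": median(lead_times),
        "failed_pct": 100 * failures / len(deployments),
    }


# Illustrative data only: four deployments over a 14-day period.
start = datetime(2017, 3, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=4), failed=False),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=8), failed=False),
    Deployment(start + timedelta(days=7), start + timedelta(days=7, hours=2), failed=True),
    Deployment(start + timedelta(days=10), start + timedelta(days=10, hours=6), failed=False),
]
metrics = delivery_metrics(sample, period_days=14)
print(metrics)
```

In practice the records would come from the deployment pipeline itself rather than being typed in by hand, so that the figures shown to a team reflect what actually happened.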

This is essential, baseline data, across all programmes in an organisation. It makes transparent what’s going well and what isn’t – making visible where problems exist and informing decisions on where resource can most effectively be applied. And it’s not just about dull, textual lists like the one above – live, colourful dashboards provide a much more effective way of making previously obscure technical data visible to all levels of a team.

An application analytics dashboard: image source CAST

These dashboards also provide higher order benefits, such as insight into which teams – in-house, contractors or supplier-provided – are coping well, and which ones are struggling. Over time, they help inform decisions about which resources are best placed to handle which type of programme – further improving quality and outcomes, and helping manage and improve the supply chain and its capabilities.

Less good practice – poor teams abusing “Agile”

At the other end of the scale, I’ve encountered “Agile” programmes where nothing’s been deployed for months. I’ve even come across “Agile” programmes running for years that haven’t delivered any meaningful outcomes. Programmes where there’s inadequate separation of roles, with developers checking in their own code and approving it into production. Programmes with privileged processes running, yet no-one’s quite sure who owns them or what they’re doing. Programmes where no-one’s trained in writing secure code and there’s no automated tooling to check code against well known vulnerabilities. Basic, basic stuff.
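
The separation of roles mentioned above is simple to enforce automatically. As a minimal sketch in Python, assuming a hypothetical `ChangeRequest` record that names an author and a set of approvers (both the record and the `release_blockers` function are illustrative, not any specific tool's API), a release gate can refuse any change that lacks an approver other than its author:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeRequest:
    """A proposed change to production (hypothetical record shape)."""
    change_id: str
    author: str
    approvers: frozenset[str] = field(default_factory=frozenset)


def release_blockers(change: ChangeRequest) -> list[str]:
    """Return the reasons a change must not be released; an empty list means it may proceed."""
    blockers = []
    if change.author in change.approvers:
        blockers.append("author approved their own change")
    # Approvers other than the author are the only ones that count as independent.
    if not change.approvers - {change.author}:
        blockers.append("no independent approver")
    return blockers


# Illustrative data only.
self_approved = ChangeRequest("CR-1", author="alice", approvers=frozenset({"alice"}))
reviewed = ChangeRequest("CR-2", author="alice", approvers=frozenset({"bob"}))

print(release_blockers(self_approved))
print(release_blockers(reviewed))
```

Most source control and pipeline tooling offers an equivalent built-in rule, which is generally preferable to a hand-rolled check like this one.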

Equally damaging are project assurance teams who should be identifying and flagging such problems, but who lack the necessary competence and experience in modern technical practice to meaningfully assure anything. I’ve seen such teams rely upon qualitative verbal assurances and “specially prepared” documentation rather than reviewing live dashboards of real data. They make no obvious effort to objectively analyse current performance or how it has changed over time. Yet this is baseline data any decent Agile team will have at their fingertips – it’s part of the air they breathe, part of how they operate and deploy high quality code to a regular heartbeat.

Project review teams that are ignorant of good technical practice, or that obsequiously connive to accept what they are told rather than examining the data, are as much part of the problem as development teams that masquerade under the pretence of Agile while exhibiting all the characteristics of old-style programmes – running late, descoped and over budget.

But it’s not just the code …

Ensuring continuous deployment through good software engineering and DevOps practices is all very well. But these practices are fairly meaningless on their own. They need to dovetail into wider aspects too, including:

  • technical measures (such as system level component integration and overall architectural compliance)
  • business measures (such as delivery to budget and adaptability to meet changing needs)
  • policy measures (such as the extent to which legal deadlines and requirements are being met, and the ease with which future policy changes can be implemented)

All of these – technology, business and policy measures – need to be composed and orchestrated together for a programme to succeed.

It’s not just about the code – but meeting wider technical, business and policy needs too

I’ll consider how coding practices and continuous deployment need to dovetail into these wider issues of business and policy needs in a future blog post. After all, there’s not much value in well-engineered, secure and adaptable code if it doesn’t fit the wider technical architecture – and particularly if it doesn’t meet ever-evolving business and policy needs.

[1] Heeks, R. “Reducing the risk of information systems failure”, p.99. In Heeks, R. (Editor), “Reinventing government for the information age”. Routledge, 1999.
[2] GDS original guidance on “Government as a Platform”. Later deleted, but still available on GitHub.

