The problem with “data sharing”

The consultation on the “Better use of data in government” proposals closed recently. The proposals have been some two years in the making and yet seem to be a superficial retread of many of the ideas repeatedly surfaced by civil servants to previous administrations – Transformational Government (2005), the Identity Cards Act (2006), and the Coroners and Justice Bill (2008) amongst them.

Here’s a quick rundown of just some of the areas where I found the proposals lacking in rigour.

Lack of definitions

The paper provides no objective analysis of the problem(s) it aims to fix and no evidential reason as to why “data sharing” is proposed as the (only? best?) solution. Oddly there is no definition of what is meant by “data sharing”. Does it mean data duplication / copying / distribution, or data access, or alternatives such as attribute / claim confirmation? These are all quite different things with their own distinct risk profiles.

Paragraph 54 (p.15) implies some potential use of attributes (“flags”), but without detail or context, and paragraph 77 (p.21) hints at some levels of granularity in types of data access. These issues need to be at the core of the paper rather than mentioned in passing. Similarly, there is no definition of “public” and “private” data, and hence no quantified levels of granularity of sensitivity within those definitions (e.g. for at risk children, protected witnesses, undercover officials, etc.).

The paper hence displays little understanding of, and makes few references to, current best practice. For example, there is an inadequate exploration of how technology could be mandated that enables confirmation of attributes (e.g. “this person is over 21”, “this household is entitled to a fuel discount”) without disclosure of any personal data records. The document is discursive and verbose when it needs to be analytical, evidence-based and precise.
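To make the distinction concrete, here is a minimal sketch of attribute confirmation in Python. Everything in it is an assumption for illustration only: the `PersonRecord` fields, the in-memory `_RECORDS` store and the `confirm_age_at_least` function are hypothetical and do not come from the consultation paper or any real government system. The point is simply that the caller gets back a yes/no answer and never sees the underlying record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical record held by the data owner; it is never returned to callers.
    name: str
    date_of_birth: date
    address: str


# Illustrative in-memory store standing in for the data owner's own system.
_RECORDS = {
    "p-001": PersonRecord("Alex Example", date(1990, 5, 17), "1 Example Street"),
    "p-002": PersonRecord("Sam Sample", date(2010, 1, 2), "2 Sample Road"),
}


def age_on(born: date, today: date) -> int:
    """Whole years between a date of birth and a given day."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def confirm_age_at_least(person_id: str, years: int, today: date) -> bool:
    """Answer 'is this person at least N years old?' and nothing else.

    Only a boolean leaves this function: no name, address or date of birth.
    """
    record = _RECORDS[person_id]
    return age_on(record.date_of_birth, today) >= years
```

A relying service would call something like `confirm_age_at_least("p-001", 21, date(2016, 4, 1))` and receive `True`, without the date of birth, name or address ever being copied out of the system that holds them. A real design would of course also need to authenticate the caller and log the request.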

There is an odd claim that APIs are a “new” technique (they are not). Nor do they, by themselves, “allow the access to the minimal necessary information” (p.4, para 12): any API will merely do what is specified for it, including distributing an entire sensitive personal data set for fraudsters or hostile governments to mine and exploit.

This lack of definitions is also exhibited in the “Illustrative clauses”, where references are made to “disclosure of information” without defining what that means – whether copying information to third parties; providing them with controlled one-time access; or whether it would merely be to confirm e.g. “this person is in debt” or “not eligible for benefit X”.

So what’s the problem we’re aiming to solve?

In terms of better and more efficient services for citizens, the description of the fragmentation of experiences of users of public services suggests the core problem is not data but poor service design. The fact that service “design” (and hence data) is fragmented across organisations is a reflection of services designed around organisational structures and their needs rather than citizens. The paper however contains no analysis of whether better services can be created by redesigning them around users’ needs rather than by trying to reverse engineer a solution through “data sharing”.

The idea of applying “data sharing” to problems that actually frequently derive from inadequate organisational and service design makes this paper read as if its purpose is to paper over the cracks and inefficiencies of existing public sector organisations and hence protect them and their existing poor processes rather than fixing the underlying problems. The paper appears rooted in a bureaucracy-centric viewpoint when what is required is a user / service-centric one.

In terms of better search and statistics to inform better decision-making, there is inadequate distinction between public and private data. The paper provides no specific detail on the process of de-identifying personal data, nor any reference to the known problems of achieving this successfully, although paragraph 107 (p.30) proposes putting into legislation the key criteria of the de-identification process. This will need to be about more than just removing personally identifiable information before disclosure, however: it will also need to place an obligation on the data owner to ensure no re-identification can be made using the data released in combination with other data. This is a much more complex issue than focusing on a single data set released in isolation and requires co-ordination and risk assessment across and between data sets.
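The re-identification risk is easy to demonstrate. The sketch below is illustrative only: both data sets, the field names and the `reidentify` function are invented for the example, and the matching is a deliberately simple join on shared fields rather than any particular published technique. It shows how a release with names removed can still be tied back to individuals when another data set shares the same indirect identifiers.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, indirect identifiers kept.
released = [
    {"postcode_area": "AB1", "birth_year": 1980, "sex": "F", "benefit": "X"},
    {"postcode_area": "AB1", "birth_year": 1980, "sex": "M", "benefit": "Y"},
    {"postcode_area": "CD2", "birth_year": 1975, "sex": "F", "benefit": "X"},
    {"postcode_area": "CD2", "birth_year": 1975, "sex": "F", "benefit": "Z"},
]

# Hypothetical second data set, available elsewhere, with names attached.
other_source = [
    {"name": "Person A", "postcode_area": "AB1", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "postcode_area": "AB1", "birth_year": 1980, "sex": "M"},
    {"name": "Person C", "postcode_area": "CD2", "birth_year": 1975, "sex": "F"},
    {"name": "Person D", "postcode_area": "CD2", "birth_year": 1975, "sex": "F"},
]

# Fields present in both data sets that can be used to link them.
SHARED_FIELDS = ("postcode_area", "birth_year", "sex")


def link_key(row):
    """Combination of shared fields used to match rows across data sets."""
    return tuple(row[f] for f in SHARED_FIELDS)


def reidentify(released_rows, named_rows):
    """Return {name: released_row} wherever the link is unambiguous.

    A released row is re-identified when its combination of shared fields
    is unique in the release and matches exactly one named person.
    """
    released_by_key = defaultdict(list)
    for row in released_rows:
        released_by_key[link_key(row)].append(row)

    named_by_key = defaultdict(list)
    for row in named_rows:
        named_by_key[link_key(row)].append(row)

    matches = {}
    for key, rows in released_by_key.items():
        people = named_by_key.get(key, [])
        if len(rows) == 1 and len(people) == 1:
            matches[people[0]["name"]] = rows[0]
    return matches
```

With this toy data, `reidentify(released, other_source)` ties Person A and Person B to their benefit records, while Person C and Person D stay ambiguous only because two people happen to share the same combination of fields. Whether a release is safe therefore depends on what other data exists, not just on what has been stripped from the release itself.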

Where public data rather than private data is concerned, a useful policy position would be that by default all such data should be automatically published or accessible via open APIs. Nor should it be limited to government’s own internal needs: it should provide a public good, an open resource for the wider UK economy. There should be no additional cost in doing this: the same interface can serve ONS alongside everyone else.

Lack of policy alignment

The paper does not make clear how proposals to “data share” comply with the government policy of citizens’ data being under their own control rather than civil servants’ (para 8 UK Government Technology Code of Practice). Instead, it appears to place the bureaucracy at the centre, weakening citizens’ control over their data so that public bodies and their employees can “share” it around between their organisations, rather than improving the design of public services. It thus appears out of step with the focus on user needs and better services being pioneered by the Government Digital Service.

The paper appears unaware of, or unaligned with, other government initiatives. For example, it is notable for the complete absence of any reference to the Verify user identification programme. If Verify is to be used, why is it not included? If it is not to be used, why not?

Its absence suggests that citizens and their needs do not lie at the centre of this paper. What alternative identification, authentication and verification mechanism will be used by citizens to ensure secure and authorised access to personal data if Verify is apparently to be ignored? And what identity and access management approach is going to be used within and between public sector bodies? No such system currently exists.

What alpha or proof of concept work has taken place over the 2 years this paper has been running to explore and validate different models and inform the policymaking process? Why not “show the thing” rather than just spend 2 years producing paperwork?

Illustrations are simplistic and unrealistic

The illustrations provided are well-meaning but overly simplistic. For example, the illustration given of registration of a birth makes no mention of user identification, authentication or verification. As a result, the examples as they stand are more likely to increase fraud than to help mitigate it: fraud often arises from poor data management and inappropriate data access and access controls (including social engineering to exploit such weaknesses), providing fraudsters (both insiders and external agents) with the ability to game the system. “Data sharing” more widely – providing an even larger pool of people and organisations with access to useful personal data – will remove that data from its current domains of control and context, exacerbating problems of fraud.

Security is mentioned only 11 times in the entire paper, with no detail of the computer security techniques to be applied. In particular, the paper makes only one mention of encryption. Given the further absence of any detail about identification, authentication, access controls (authorisation), confidentiality, integrity, non-repudiation, audit, protective monitoring etc., the proposals fail to show how opening up personal/private data will reduce fraud rather than increase it. There is a risk of repeating the poor design of earlier central government programmes (e.g. see the lessons learned on New Tax Credits [1], which took a well-meaning but simplistic approach to simplifying a complex data issue).

In the absence of such details, these proposals could effectively become a fraudsters’ charter.

Drilldown into an Example

Let’s take just one example – that provided on p.17 – to explore how these proposals lack detail and an explicit understanding of the problem domain and the issues that need to be tackled:

For a paper that purports to set out the major proposed policy on sharing citizens’ private data with more civil servants and more organisations, it is notable for its failure to provide detail on how data will be protected, whether it is attributes that are being confirmed, how users are authenticated, how audit happens, etc.

The questions lack a meaningful context

The weaknesses above undermine the questions posed in the paper, since they lack an objective basis against which they can be assessed and answered. Let’s take an example, that of Question 8:

“Should a government department be able to access birth details electronically for the purpose of providing a public service, e.g. an application for child benefit?” (p.17)

Of course government services should be more efficient and online and seamless and painless: no-one argues with that. But this is not the issue here: it is unclear how anyone will be able to answer this question with any meaning given the absence of any description of how the system would work. Details missing include:

how will a civil servant in the “government department” identify themselves as a person with a legitimate interest in the birth details and with an appropriate level of clearance?
how will their access be monitored and how will it be audited – and will this be real-time protection or retrospective? (particularly important if someone is accessing an at risk individual’s personal data)
will civil servants be able to trawl all records or only the specific one related to the event on which they are currently working, and how will such mapping/matching happen?
how will the data be “shared”? Will it be copied to their system, will they get access to the full record, will they merely view the record on the system where it is currently retained, or will the system merely confirm attributes (e.g. “this parent has a child and is eligible for child benefit”) without disclosing any details about the child or their data?
how will data be secured, and what levels of protection are being applied to data at rest and in motion? What levels of granularity are being applied to access controls to ensure more sensitive data is not disclosed without appropriate authority?
how does the civil servant prove they are acting on behalf of a legitimate parent or guardian and not participating in a potential or actual fraud and merely “fishing” for data?
how does the parent or guardian initiating the claim for child benefit prove who they are and prove that the child that they are asserting is theirs *is* theirs?
how is the data of those most at risk going to be “shared” whilst ensuring preservation of security and without tell-tale flags that in turn reveal that a sensitive record has been “hidden”?
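Several of these questions (who is asking, on which case, and what gets logged) can be made concrete even in a toy model. The following Python sketch is purely hypothetical: the `CASE_ASSIGNMENTS` and `BIRTH_RECORDS` tables, the `AuditLog` class and the `confirm_birth_registered` function are invented for illustration and describe no existing system. It assumes the official has already been authenticated by some other means, and it shows one possible shape: access is tied to a specific open case, every attempt is logged before any data is touched, and only an attribute comes back.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when no open case links the official to the requested record."""


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, official_id, case_id, record_id, allowed):
        # Every attempt is logged, whether or not it is allowed.
        self.entries.append({
            "at": datetime.now(timezone.utc),
            "official": official_id,
            "case": case_id,
            "record": record_id,
            "allowed": allowed,
        })


# Hypothetical case register: which official works on which case and record.
CASE_ASSIGNMENTS = {
    "case-42": {"official": "off-7", "record": "birth-1001"},
}

# Hypothetical birth register held by the data owner.
BIRTH_RECORDS = {
    "birth-1001": {"registered": True},
    "birth-1002": {"registered": True},
}


def confirm_birth_registered(official_id, case_id, record_id, log):
    """Confirm a single attribute for the one record tied to an open case."""
    case = CASE_ASSIGNMENTS.get(case_id)
    allowed = (
        case is not None
        and case["official"] == official_id
        and case["record"] == record_id
    )
    log.record(official_id, case_id, record_id, allowed)
    if not allowed:
        raise AccessDenied("no open case links this official to this record")
    # Only the attribute is returned, never the record itself.
    return BIRTH_RECORDS[record_id]["registered"]
```

In this model an official assigned to `case-42` can confirm the status of `birth-1001`, but an attempt to look at `birth-1002` under the same case raises `AccessDenied` and still leaves an audit entry. It deliberately leaves open the harder questions in the list above, such as how the claimant proves who they are and how protected records are handled without tell-tale flags.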

In the absence of a definition of how this will work, how are the questions asked in this paper going to be answered with any credibility or meaning? Without such detail, the potential for more widespread and automated fraud and the compromising of potentially at risk people, such as vulnerable children, will be compounded.

Summary

We all need government to become smarter in the way it works, and to play a more positive role in our economy. Better use of data must play an essential role in making this happen. The problem is that this paper provides inadequate detail about basic, fundamental areas (such as security, privacy, accountability) – or indeed any early proof points – that will determine the success of the systems it proposes to put into place.

Without these details being clearly defined, in either the paper or the draft illustrative clauses, the proposals to “data share” will expand the pool of people and organisations able to access citizens’ personal data. In an increasingly digital economy, expanding access to useful personal data is more likely to increase the risk of fraud, not reduce it. There are smarter ways of tackling these problems – from improved service design to technical measures to protect data whilst enabling it to inform decision-making.

Disappointingly, they are inadequately covered by this paper.

[1] See for example “Online tax credit system closed” and https://www.nao.org.uk/press-releases/hm-revenue-customs-2005-06-accounts-the-comptroller-and-auditor-generals-standard-report-2/

New tech observations from the UK (ntouk)

Jerry Fishenden's technology policy blog
