Why are data standards so hard?

If you want to smooth the flow of data between systems, reduce cost and complexity, get better information out, and gain a myriad of other benefits, you need to standardise your data.

It’s blindingly obvious – why have a dozen different ways of representing the same thing … and then have to deal with mappings and mismatches and inconsistencies that render your analysis meaningless?
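To put a rough number on that mapping burden, here is a minimal sketch. It assumes that, without a shared specification, every pair of representations needs its own translator in each direction, and that with one, each representation only maps to and from the shared specification. The function names are illustrative, not taken from any real library.

```python
def point_to_point_mappings(n_formats: int) -> int:
    """Directed translators needed when every format maps straight to every other."""
    return n_formats * (n_formats - 1)


def via_shared_standard_mappings(n_formats: int) -> int:
    """Translators needed when each format maps only to and from one shared specification."""
    return 2 * n_formats


# Compare the two approaches as the number of representations grows.
for n in (3, 6, 12):
    print(n, point_to_point_mappings(n), via_shared_standard_mappings(n))
```

For a dozen representations that is 132 point-to-point translators against 24 via a shared specification, under the assumptions above.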

So, data standards are a good thing. Really, nobody will ever argue against that principle.

So why don’t we just do it?

The thing is, data standardisation is really very difficult to achieve. I have seen many standards initiatives consume vast amounts of resource and deliver no lasting benefit. I think there are four main reasons for this.

1. Specification or standard?

Having spent much of my career working with data specifications I have become a bit of a definition pedant. It’s unavoidable really and I am spectacularly bad company at parties.

One of the things I find most infuriating is the way many people define a data standard. A data standard is a data specification that has widespread and consistent adoption. That second part is really important.

You cannot create a data standard by just creating a specification, irrespective of how beautifully presented it is, how much analysis and review goes into it, or how much governance and control you wrap around it. A data specification that is not widely adopted is not a data standard, and data standards initiatives that don’t focus on implementation are doomed to failure. I have witnessed that too many times.

Rant over.

2. Costs and benefits

While the benefits of standardisation are easy to set out, there are also costs associated with adopting standards. Because standardisation is inherently ‘for the greater good’, the costs and benefits often do not fall in the same place. You have to be prepared to shoulder a greater cost in order to contribute to the greater good, and this can be a very difficult case to make.

In addition to the one-off cost of change, there is an ongoing cost associated with the loss of control over your data specification. Using an externally managed standard means that you can’t change a specification when it suits you; you might not even be able to make the change you want at all. You might also become liable for implementing changes that others want but which deliver no benefit to you. There is some inherent loss of sovereignty in the adoption of data standards, and it takes really strong leadership to properly commit to this.

3. Edge cases and oddities

They say that the devil is in the detail – and that certainly kills a lot of standards work. You can pick an area for standardisation and fairly easily define a model that standardises 99% of your domain or 99% of your use cases.

Then there is the 1%.

The 1% are those real-world oddities that don’t fit your simple model. So you either ignore the edge cases (in which case your model doesn’t meet usability requirements) or you create a more granular, abstract model which caters for the edge cases but becomes far more complex (and expensive) to implement and to maintain.
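As a small illustration of that trade-off, here is a sketch using a hypothetical example: personal names. The classes, field names and the "kind" vocabulary are all assumptions made up for this illustration, not part of any real specification. The simple model handles the common case and rejects a person who has only one name; the abstract model accepts everyone but pushes the work of interpreting the parts onto every implementer.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleName:
    """Covers the common case: one given name and one family name, both required."""
    given_name: str
    family_name: str

    def __post_init__(self) -> None:
        if not self.given_name or not self.family_name:
            raise ValueError("both given_name and family_name are required")


@dataclass
class NamePart:
    kind: str   # e.g. "given", "family", "mononym" (hypothetical vocabulary)
    value: str


@dataclass
class FlexibleName:
    """Caters for the edge cases, but every consumer must now interpret the parts."""
    parts: list[NamePart] = field(default_factory=list)

    def display(self) -> str:
        # Join the parts in the order they were supplied.
        return " ".join(part.value for part in self.parts)


# The edge case: someone with a single name fits only the abstract model.
single_name = FlexibleName([NamePart("mononym", "Teller")])
print(single_name.display())
```

Trying to build SimpleName("", "Teller") raises a ValueError, which is the "ignore the edge cases" option in practice.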

4. Standardising that which is not

My final type of standardisation failure is perhaps related to the edge cases and oddities, in that it is where data specifications hit up against the real world. However, in this case the problem is attempting to create a single data specification that covers a domain that is inherently not homogeneous, normally because it is too broad in scope. Now, I’ve written before about the fundamental tensions that exist between hard data and our soft world, so this is, to some extent, always going to be a problem with any data specification. But attempting to create a data specification for a domain that contains too much variability – entities that are inherently dissimilar – will run into the same problems as set out above. You either create a specification that over-simplifies reality and therefore fails in any real-world application, or you create a specification that is so abstract and complex it becomes unimplementable.

So how do we achieve true standardisation?

None of these issues are easy to fix. They require strong leadership, realistic thinking around scope and specifications and a clear focus on driving implementation. The benefits are massive, but it is not an easy road to travel.