What are the principles of joined-up data?

This post was written by Tom Orrell with contributions from Beata Lisowska. It was originally published on the Open Data Watch blog.

At the Friday Seminar that preceded this year’s UN Statistical Commission (UNSC), Open Data Watch’s Eric Swanson asked me a challenging yet pertinent question following my presentation to the plenary. He asked: “The definition and principles of ‘open data’ are quite clear and simple but the principles of joined-up data are less clear. Can you enunciate five principles of joined-up data that could serve as a practical guide for others?”

This is a question that we at the Joined-Up Data Standards (JUDS) project have begun to answer through our discussion papers, blogs and consultation paper. That said, Eric touched on a real gap: there is no commonly recognised list of principles that offers concrete guidance on interoperability at a global level.

Proposed checklist for new standards from our consultation paper

Is there a clear need and demand?
Does it duplicate the efforts of, or compete directly with, standards that already exist?
Are the design of the architecture and individual elements intellectually, logically and methodologically sound?
Do components (building blocks) within the standard adopt other existing standards wherever possible?
Is it designed to ensure comparability and interoperability with other standards?
Will the data be available through open, sustainable and easily accessible channels?
Is there political buy-in from the institutions that need to produce the data?
Are timelines for development, implementation and adoption realistic?
Does the data that can feed into the standard already exist?
Is it realistic to expect that new data can be produced to feed the standard?
Does any historical data exist that can act as a ‘rear-view mirror’ for the standard?

This blog builds on the answer that I gave to Eric at the UNSC and sets out five core interoperability principles:

Principle 1: Reuse existing data standards

Perhaps the most basic principle that underpins joined-up data is the notion that new classifications and standards should not be developed unless absolutely necessary. Where possible, those seeking to develop a new standard should spend time considering what is already out there and whether an open data standard already exists that can simply and easily be adapted to their needs. This principle is implicitly recognised within our consultation paper, where we suggest a ‘checklist for new data standards’ as a guide for anyone seeking to produce a new data standard. Moreover, any new standard developed must be compatible with existing standards.

Principle 2: Ensure standards are user-driven

The explosion in open data publication that has taken place over the last twenty-odd years has happened with the key consideration of ‘openness’ at its heart. Whilst this is great and important, openness does not automatically equate to usability. For data to be usable, the way it is published has to be driven by the needs of users themselves. Take the Humanitarian eXchange Language (HXL) standard, for example: its beauty and functionality stem from its simplicity and ease of use.
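To give a feel for why HXL is considered easy to use, here is a minimal sketch of its central idea: a row of hashtags placed beneath the human-readable headers of a spreadsheet. The two sheets, their headers and their values are invented for illustration, and read_tagged is a hypothetical helper rather than part of any HXL library. The tags #org, #sector and #adm1 are used here on the assumption that they match the published HXL hashtag dictionary.

```python
import csv
import io

# Two hypothetical publishers use different human-readable headers
# but add the same hashtag row underneath (all values are invented).
SHEET_A = """Organisation,Sector,Region
#org,#sector,#adm1
Example Org A,Health,Example Region
"""

SHEET_B = """Who,What,Where
#org,#sector,#adm1
Example Org B,Education,Example Region
"""


def read_tagged(text):
    """Return data rows as dicts keyed by hashtag, ignoring the headers."""
    rows = list(csv.reader(io.StringIO(text)))
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        # The hashtag row is the first row whose non-empty cells all start with '#'.
        if cells and all(cell.startswith("#") for cell in cells):
            tags = [cell.strip() for cell in row]
            return [dict(zip(tags, data)) for data in rows[index + 1:] if data]
    raise ValueError("no hashtag row found")


# Because both sheets share the hashtags, they can be combined directly.
combined = read_tagged(SHEET_A) + read_tagged(SHEET_B)
```

The publishers keep whatever column headers suit their own users, while a machine only needs the hashtag row to line the columns up.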

Principle 3: Don’t forget metadata

Metadata standards are arguably the most important prerequisite to joined-up data. Metadata includes information on the source of a piece of data, its author, the version being published and the link to the original dataset. Taken together, this information is crucial to ensuring that both machine and human users can discover, identify and contextualise data. Ensuring that machine-readable metadata formats are standardised and used across data-producing institutions and bodies therefore greatly enhances the ability of data to be joined up.

These attributes make metadata particularly important for the official statistics community as it starts to consider how statistical data can be made open by default. As my colleague Beata Lisowska recently put it in another blog, when it comes to metadata, “in essence, we’re really asking: can we trust this data?”
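As a rough illustration of what a machine-readable metadata record might look like, the sketch below captures the four items mentioned above (source, author, version and link to the original dataset) and serialises them as JSON. The field names, the DatasetMetadata class and the example values are assumptions made for this post, not an established metadata standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical field names chosen for this sketch.
REQUIRED_FIELDS = ("source", "author", "version", "original_url")


@dataclass(frozen=True)
class DatasetMetadata:
    source: str        # who produced the data
    author: str        # who prepared this publication
    version: str       # which version is being published
    original_url: str  # link back to the original dataset


def to_machine_readable(meta: DatasetMetadata) -> str:
    """Serialise the record as JSON so that software can read it."""
    return json.dumps(asdict(meta), sort_keys=True)


def missing_fields(record: dict) -> List[str]:
    """List the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Invented example values.
example = DatasetMetadata(
    source="Example Statistics Office",
    author="Example Author",
    version="1.0",
    original_url="https://example.org/datasets/population.csv",
)
serialised = to_machine_readable(example)
```

A check like missing_fields is one simple way a data user could test whether a dataset comes with enough context to be trusted.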

Principle 4: Use common classifications wherever possible

As more and more data is opened up and proactively published by governments, international institutions, private sector actors, open standard initiatives and others, we need to make sure that the language used, meaning the classifications to which data is published, is the same. Often, similar information is classified using slightly different definitions, which hinders the machine-readability, and therefore the interoperability, of that data. Within the international development sector it’s crucial that data standards, such as the International Family of Economic and Social Classifications curated by the official statistical community, are fit for purpose and actively used, or at least linked to, by all stakeholders producing data.

Classifications of organisations and time formats are two cases in point where the absence of universally agreed definitions can seriously inhibit broad-scale interoperability. The identify-org.net site succinctly explains why the issue of organisational identifiers is important: “If my dataset tells you I have contacts with ‘IBM Ltd’, ‘International Business Machines’ and ‘I.B.M’ – how many firms am I working with?” Unique identifiers would go a long way to overcoming basic semantic challenges like this.
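The identifier problem can be sketched in a few lines. In the example below, the lookup table and the identifier EXAMPLE-ORG-0001 are made up for illustration; a real publisher would take identifiers from an agreed register rather than maintain a list of name variants.

```python
# Hypothetical lookup from name variants to a made-up unique identifier.
ORG_IDS = {
    "ibm ltd": "EXAMPLE-ORG-0001",
    "international business machines": "EXAMPLE-ORG-0001",
    "i.b.m": "EXAMPLE-ORG-0001",
}


def resolve(name):
    """Return the identifier for a name, or None if the name is unknown."""
    return ORG_IDS.get(name.strip().lower())


def count_distinct_orgs(names):
    """Count organisations by identifier, keeping unknown names separate."""
    seen = set()
    for name in names:
        org_id = resolve(name)
        if org_id is None:
            org_id = "unresolved:" + name.strip().lower()
        seen.add(org_id)
    return len(seen)


names_in_dataset = ["IBM Ltd", "International Business Machines", "I.B.M"]

by_name = len(set(names_in_dataset))                 # counts three "firms"
by_identifier = count_distinct_orgs(names_in_dataset)  # counts one
```

Counting by name suggests three firms, while counting by identifier shows there is only one, which is the answer the question in the quote is looking for.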

Principle 5: Publish data in machine-readable formats

For joined-up data solutions to offer real efficiency gains and value, it’s imperative that a machine is able to do most of the hard work in joining up the data itself. This is already possible but requires many data publishers to change the way they currently publish their data. Publishing data only in PDF or Excel formats is not enough; data must also be published in machine-readable formats such as RDF, XML and JSON. Publishing in these formats would enable a computer to access, identify and filter data in an automated way, making it far simpler and less time-consuming for data users to put data to good use.
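To show what "access, identify and filter data in an automated way" can mean in practice, here is a minimal sketch using JSON. The records, field names and values are invented for illustration and do not come from any real dataset; filter_records is a hypothetical helper.

```python
import json

# Invented records, standing in for a dataset published as JSON.
PUBLISHED = """
[
  {"country": "AA", "year": 2015, "indicator": "population", "value": 10},
  {"country": "AA", "year": 2016, "indicator": "population", "value": 11},
  {"country": "BB", "year": 2016, "indicator": "population", "value": 20},
  {"country": "AA", "year": 2016, "indicator": "school_enrolment", "value": 5}
]
"""


def filter_records(raw_json, country, indicator):
    """Parse the published JSON and keep only the matching records."""
    records = json.loads(raw_json)
    return [
        record
        for record in records
        if record["country"] == country and record["indicator"] == indicator
    ]


selected = filter_records(PUBLISHED, "AA", "population")
values_by_year = {record["year"]: record["value"] for record in selected}
```

The same task on a PDF table would typically involve copying figures out by hand, which is the kind of work machine-readable publication is meant to remove.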

These are some of the core principles of interoperability that we’ve uncovered during the course of our research. They offer a starting point for further discussion and we will continue to explore these issues and others with the various stakeholders involved. One thing that we can be sure of already is that political will and policy coordination between governments, international organisations, open standard setters and others are key to finding solutions to interoperability challenges.

Until the end of March 2017, the Joined-Up Data Standards project is inviting feedback on its consultation paper, and it will publish an updated paper building on these principles and other aspects of its work in the summer of 2017.