Further Findings – #09.
What is data quality?
We’ve been pushing for improvements in the quality of donors’ data for some years now. What is data quality? There are lots of definitions and this is certainly not an exhaustive list, but there are four main areas we think of in terms of data quality. Note that this doesn’t delve into frequency, timeliness, or coverage: we’re focusing here on what is contained in the data that does get published.
1. Correctly structured
The IATI Standard provides a standard way in which data must be structured. Publishers can check their files conform to the requirements of the IATI schema by using the IATI Validator. Although the Index doesn’t require data to pass validation, we encourage all publishers to check their files against the validator. If files don’t pass validation, it’s more likely that they will fail certain tests.
For example, if a donor uses
<status>Implementation</status>
rather than…
<activity-status>Implementation</activity-status>
… then the Aid Transparency Tracker won’t be able to pick up the data, and the publisher will not score for those activities that are incorrectly structured. This is because data that does not conform to the IATI schema cannot be compared across publishers. The vast majority of the IATI files assessed for the Aid Transparency Index are correctly structured.
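To illustrate why the element name matters, here is a minimal sketch of a check that looks only for the element name the schema defines. It is not the Aid Transparency Tracker's actual code: the sample file, the identifiers and the function name `activities_with_status` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-activity file: the first uses the wrong element name,
# the second uses the element name from the example above.
SAMPLE = """
<iati-activities>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <status>Implementation</status>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-2</iati-identifier>
    <activity-status>Implementation</activity-status>
  </iati-activity>
</iati-activities>
"""


def activities_with_status(xml_text):
    """Return identifiers of activities that contain an <activity-status> element."""
    root = ET.fromstring(xml_text)
    found = []
    for activity in root.iter("iati-activity"):
        # Only the schema's element name is looked up; <status> is invisible here.
        if activity.find("activity-status") is not None:
            found.append(activity.findtext("iati-identifier"))
    return found


print(activities_with_status(SAMPLE))
```

A tool that searches by the schema's element names simply never sees the first activity's status, however sensible the `<status>` tag looks to a human reader.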
Codelists
IATI codelists are crucial to ensuring that IATI activity and organisation data from different publishers is comparable. For example, the Activity Status codelist makes it possible to ask for all the activities that are in a particular stage (1 = Pipeline/identification; 2 = Implementation, etc.). In 2014, we ran more tests to check whether files were using the correct codelists. In the example given above, the <activity-status> element is missing the code attribute entirely. We found that a few donors used no code or an unrecognised code; scoring 100% for an indicator required both that the element was correctly structured and that the code was on a recognised codelist, in this case the ActivityStatus codelist.
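The two-part requirement (correct structure plus a recognised code) can be sketched roughly as follows. This is an illustration, not the Index's real test: the function name `status_check` is hypothetical, and the set of codes is an assumption standing in for the published ActivityStatus codelist, which should be consulted for the actual values.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: stand-in for the published ActivityStatus codelist.
# Only codes 1 and 2 are named in the text above; the rest are illustrative.
KNOWN_STATUS_CODES = {"1", "2", "3", "4", "5"}


def status_check(activity_xml):
    """Classify the activity-status of a single <iati-activity> snippet."""
    activity = ET.fromstring(activity_xml)
    element = activity.find("activity-status")
    if element is None:
        return "missing element"      # incorrectly structured
    code = element.get("code")
    if code is None:
        return "no code"              # element present, code attribute absent
    if code.strip() not in KNOWN_STATUS_CODES:
        return "unrecognised code"    # code is not on the codelist
    return "ok"


# The example from the text: right element, but no code attribute.
print(status_check(
    "<iati-activity><activity-status>Implementation</activity-status></iati-activity>"
))
# The same status expressed with a code from the codelist.
print(status_check(
    '<iati-activity><activity-status code="2">Implementation</activity-status></iati-activity>'
))
```

Only the last outcome, "ok", corresponds to the case described above as scoring fully for the indicator.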
2. Which fields are used
The Index focuses particularly on which fields are used at the organisation and activity level. In terms of each activity, we look at whether there’s associated results data, sub-national location (geocoding), etc. We don’t require that for all activities – for example, we only expect an evaluation to exist if the project is in the completion stage (NB – relating to point 1 above, if the activity status isn’t specified then the activity will always be expected to have the information).
Aside from these caveats, we encourage publishers to include as much information as possible about each activity.
Minimum requirements for an activity
The Index is quite lenient in terms of what it requires to exist for each activity. A unique IATI identifier is required for each activity, and data will be rejected without this. Furthermore, to count as “current data” – the only data that we take into account for the Index (i.e. we exclude historical data) – at least one activity start date, end date, or transaction date is required.
However, if activities don’t have an activity status or aid type, we can’t exclude them from tests that would be inappropriate for them (e.g. evaluations aren’t expected for activities that are currently in the Implementation stage). It’s important to include these elements alongside each activity so that we can identify the type of information that can be expected at any given stage of the project cycle.
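The rules in this section can be expressed as a small sketch. Everything here is an assumption made for illustration: the dictionary keys, the function names, and the choice of code "3" to represent the completion stage are hypothetical, and the real Index tests are more detailed than this.

```python
# ASSUMPTION: code "3" stands for the completion stage; check the published
# ActivityStatus codelist before relying on this value.
COMPLETION_CODES = {"3"}


def meets_minimum_requirements(activity):
    """An activity needs an identifier and at least one relevant date."""
    has_identifier = bool(activity.get("iati_identifier"))
    has_date = any(
        activity.get(key)
        for key in ("start_date", "end_date", "transaction_dates")
    )
    return has_identifier and has_date


def evaluation_expected(activity):
    """Decide whether an evaluation should be looked for on this activity."""
    status = activity.get("activity_status")
    if status is None:
        # Without a status the activity cannot be excluded from the test,
        # so the information is always expected.
        return True
    return status in COMPLETION_CODES


in_progress = {
    "iati_identifier": "XX-EXAMPLE-1",
    "start_date": "2014-01-01",
    "activity_status": "2",
}
no_status = {"iati_identifier": "XX-EXAMPLE-2", "start_date": "2014-01-01"}

print(meets_minimum_requirements(in_progress), evaluation_expected(in_progress))
print(meets_minimum_requirements(no_status), evaluation_expected(no_status))
```

The second activity shows the cost of omitting the status: it is held to the evaluation test even if the project has only just begun.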
3. Contains meaningful information
In 2014 we also began to look at the information within some of the important fields – for example, whether documents included the information they said they did, or whether location data was really sub-national and appropriate to the activity or just tagging the location of the country. We didn’t assess whether the information was correct, or pass any other value judgement on the information (for example, whether it was a “good” evaluation).
When sampling IATI activities we found that most contained meaningful information. However, some activities were linked to generic pages rather than specific information about that activity, whilst for others the links had expired.
4. Contains the correct information
Something that we don’t do is check whether the information provided is really correct: for example, whether the amounts are correct, whether the appropriate sectors are used, or whether the project even exists. Our aim is to open up good quality, comparable data so that others can easily access and use it – the next step is to get their feedback on the data.