Figuring out Fixes for the IATI Dashboard and Publishing Errors
A month or so ago, the IATI Dashboard was looking a bit the worse for wear.
It had a big yellow banner announcing “technical issues”, and the footer text showed the data hadn’t been refreshed in two months. That’s a problem, because the Dashboard provides important statistics, charts and metrics about the IATI data in the Registry. For example, it can tell you how many publishers (847 when checked today), organisation files (594), activity files (5,996) and unique activities were present at the last count, which is done every three days. Perhaps more importantly, it can tell you how many download errors (115), XML errors (15) and schema validation errors (725) exist within the data. In many cases this information shows that the vast majority of errors come from just a few publishers, often through one or two relatively straightforward data entry mistakes repeated throughout their files. So when the Dashboard is broken, these errors are hard to identify, and many publishers are left unaware that their data may not be working properly.
On top of that, the IATI tech team had been losing one day of developer time per month to fixing intermittent issues with the Dashboard. IATI reached out to Publish What You Fund to ask if we could help. Fortunately, between two projects (a data visualisation tool for org file data, and something too top secret to mention) we were able to donate some developer time, so that I could work with Imogen and John at IATI to investigate the problem and get some fixes merged.
We sent a couple of fixes, including this one, which makes Dashboard generation about 22x faster. Imogen and John provided before and after logfiles to confirm the fixes had worked. A month on, the Dashboard is back to normal, updating every three days without the need for further intervention.
And because it’s now working, we couldn’t resist digging into the latest data to see what we could find. We started with schema validation failures. These highlight instances where publishers’ IATI data doesn’t conform to the standard, so it could be misinterpreted or even fail to import into tools like d-portal or the datastore. These errors could also indicate more fundamental issues with the data.
So what did we find? Of the 725 failures, about 74% (540 failures) were down to just five publishers, and three of these are agencies of the same donor. Broken down by publisher, it looks like this:
Donor 1
- Agency 1 – 196 out of 197 files fail schema validation (99%). It looks like the problem started in November 2017 and is probably related to their upgrade to v2 of the standard which happened around the same time.
- Agency 2 – 137 out of 139 files fail schema validation (99%). It appears this problem started in January 2018 and it looks like the bulk of the errors relate to missing `value-date`s. With a bit more digging we’ll uncover the reasons behind the other failures.
- Agency 3 – 57 out of 57 files fail schema validation (100%). As with Agency 1, it appears that things went wrong when they upgraded to v2 of the standard in November 2017.
Donor 2
- 86 out of 177 files fail schema validation (49%). On first inspection it appears the problem started in March 2018, and the bulk of the errors are element ordering issues: the elements do not appear in the order the schema requires (e.g. the results for each activity need to come at the end). This requirement is specific to version 2 of the standard, and the errors are fixed by simply rearranging the elements.
Donor 3
- 64 out of 79 files fail schema validation (81%). Again, this one dates back to March 2018, but in this case the failures come from a consistent date format error. This one looks suspicious because the problematic date is always the same, suggesting there may be some more fundamental problem.
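For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the per-publisher failure rates and the share of all schema validation failures held by these five publishers. The counts are copied from the list above; the variable names and anonymised labels are just for illustration.

```python
# (failing files, total files) for each publisher, copied from the list above.
publishers = {
    "Donor 1, Agency 1": (196, 197),
    "Donor 1, Agency 2": (137, 139),
    "Donor 1, Agency 3": (57, 57),
    "Donor 2": (86, 177),
    "Donor 3": (64, 79),
}

# Total schema validation failures reported by the Dashboard.
TOTAL_FAILURES = 725

# Failure rate within each publisher's own files.
for name, (failed, total) in publishers.items():
    print(f"{name}: {failed}/{total} files fail ({failed / total:.0%})")

# Share of all failures that these five publishers account for.
top_five_failures = sum(failed for failed, _ in publishers.values())
share = top_five_failures / TOTAL_FAILURES
print(f"Top five: {top_five_failures} of {TOTAL_FAILURES} failures ({share:.0%})")
```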
We’re going to do a bit more digging. Then, since there is currently no feedback mechanism in place to let these publishers know about the issues with their data, we’re going to give them a call and see if we can help them with a fix.
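As a rough idea of what that digging can look like, here is a small Python sketch that spot-checks a file for the two most common problems described above: `value` elements missing a `value-date`, and `result` elements appearing too early. This is not the Dashboard’s validation code. The sample XML, identifier and function names are made up for illustration, and the ordering check covers only one simplified rule (a `result` appearing before a `transaction`), assuming results must come after transactions in v2. Full validation against the IATI schema covers far more.

```python
import xml.etree.ElementTree as ET

# Tiny made-up activity file for illustration; not real publisher data.
SAMPLE = """<iati-activities version="2.02">
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <result type="1"/>
    <transaction>
      <transaction-type code="3"/>
      <transaction-date iso-date="2018-01-15"/>
      <value>1000</value>
    </transaction>
  </iati-activity>
</iati-activities>"""


def missing_value_dates(root):
    """Return identifiers of activities with a <value> that has no value-date."""
    flagged = []
    for activity in root.iter("iati-activity"):
        if any(v.get("value-date") is None for v in activity.iter("value")):
            flagged.append(activity.findtext("iati-identifier"))
    return flagged


def results_before_transactions(root):
    """Return identifiers of activities where a <result> precedes a <transaction>."""
    flagged = []
    for activity in root.iter("iati-activity"):
        tags = [child.tag for child in activity]
        if "result" in tags and "transaction" in tags:
            first_result = tags.index("result")
            last_transaction = len(tags) - 1 - tags[::-1].index("transaction")
            if first_result < last_transaction:
                flagged.append(activity.findtext("iati-identifier"))
    return flagged


root = ET.fromstring(SAMPLE)
print("Missing value-date:", missing_value_dates(root))
print("Result out of order:", results_before_transactions(root))
```

Run against the sample, both checks flag the single made-up activity. To try it on a real file, swap `ET.fromstring(SAMPLE)` for `ET.parse(path).getroot()`.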