Taking Responsibility for Data

One of my early aims for the local election website was to list each candidate, along with their party and website, directly on the page. If I was promoting ease of access, it seemed right that this should be a basic requirement.

Pretty quickly it became clear that this was an ambitious task. For most of the 2000+ wards the candidate lists are published as PDFs, some as Word documents and only a tiny minority as HTML. The PDFs are not easily readable (as an aside, I'd love to know how they fare in disability discrimination tests), so any hope of automating the process went out of the window.

I simply didn't have the time either. It took two days just to get the links together. At a conservative estimate of one minute per document, 2000-plus documents comes to more than 33 hours, roughly another full working week just to get the local elections up.

The bigger problem is liability. While I have rightly included a disclaimer that there are probably errors and that users should double-check the council website, presenting concise summary information assembled by copy and paste would risk introducing mistakes. A disclaimer might protect me if anybody got nasty, but I want neither the hassle nor the exposure.

With the links as-is, there is a failsafe of sorts. Each linked PDF has the ward and constituency listed at the top of the page. If the viewer sees the wrong name, they know something is amiss.

If I were to put a candidate in the wrong political party, or list them under the wrong election, the user would have little reason to doubt what they were reading. There is no failsafe.

Had I had more time, I might have tried to build something that enabled a community effort. There are plenty of people out there who support this kind of project and would likely donate some time and effort. That extra manpower, combined with proper QA, could help reduce the risk.

However, I still think a decent dataset from central and/or local government would be the ideal solution. They are likely to take considerable care to ensure the data is correct, and the candidates will undoubtedly check their lists for completeness and promptly feed back any corrections.

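One of the strongest benefits of opening up data is that it changes the economics. Effort need not be duplicated, and because the data is published once at source rather than re-keyed by everyone who wants it, the human error that comes with copying is largely eliminated. Time spent recreating these lists is time wasted.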