February | 2014 | reuben binns | data, tech, policy

I wrote a tweet last week that got a bit of unexpected (but welcome) attention;

Co-author and I are using @github to revise a conference paper after peer review. Creating 'issues' to track progress. #openscience

— Reuben Binns @RDBinns@someone.elses.computer (@RDBinns) February 10, 2014

I got a number of retweets and replies in response, including:

https://twitter.com/chadkoh/statuses/433012506097745920

@rdbinns @openscience Can you share more on using @github for reports etc?

— GIGAmacro (@giga_macro) February 12, 2014

https://twitter.com/n3siy/status/433462895116943360

@RDBinns is it a public repository?

— Craig B (@NeuroCraig) February 12, 2014

The answers are: Yes, yes, yes and yes. I thought I’d respond in more detail and pen some short reflections on github, collaboration and open research.

The backstory; I had submitted a short paper (co-authored with my colleague David Matthews from the Maths department) to a conference workshop (WWW2014: Social Machines). While the paper was accepted (yay!), the reviewers had suggested a number of substantial revisions. Since the whole thing had to be written in Latex, with various associated style, bibliography and other files, and version control was important, we decided to create a github repo for the revision process. I’d seen Github used for paper authoring before by another colleague and wanted to try it out.

Besides version control, we also decided to make use of other features of Github, including ‘issues’. I took the long list of reviewer comments and filed them as a new issue. We then separated these out into themes which were given their own sub-issues. From here, we could clearly see what needed changing, and discuss how we were going to do it. Finally, once we were satisfied with an issue, that could be closed.

At first I considered making the repository private – I was a little bit nervous to put all the reviewer comments up on a public repo, especially as some were fairly critical of our first draft (although, in hindsight, entirely fair). In the end, we opted for the open approach – that way, if anyone is interested they can see the process we went through. While I doubt anyone will be interested in such a level of detail for this paper, opening up the paper revision process as a matter of principle is probably a good idea.

With the movement to open up the ‘grey literature’ of science – preliminary data, unfinished papers, failed experiments – it seems logical to extend this to the post-peer-review revision process. For very popular and / or controversial papers, it would be interesting to see how authors have dealt with reviewer comments. It could help provide context for subsequent debates and responses, as well as demystify what can be a strange and scary process for early-career researchers like myself.

I’m sure there are plenty of people more steeped in the ways of open science who’ve given this a lot more thought. New services like FigShare, and open access publishers like PLoS and PeerJ, are experimenting with opening up all the whole process of academic publishing. There are also dedicated paper authoring tools that extend on git-like functionality – next time, I’d like to try one of the collaborative web-based Latex editors like ShareLatex or WriteLatex. Either way, I’d recommend adopting git or something git-like, for co-authoring papers and the post-peer-review revision process. The future of open peer review looks bright – and integrating this with an open, collaborative revision process is a no-brainer.

Next on my reading list for this topic is this book on the future of academic publishing by Kathleen FitzPatrick – Planned Obsolescence
— UPDATE: Chad Kohalyk just alerted me to a relevant new feature rolled out by Github yesterday – a better way to track diffs in rendered prose. Thanks!

In an ideal world, our collective medical records would be a public good, carefully stewarded by responsible institutions, used to derive medical insights and manage public health better. This is the basic premise of the care.data scheme, and construed as such it suggests a simple moral equation with an obvious answer; give up a little individual privacy for the greater public good. The problem is, our world is not ideal. We’re in the midst of multiple crises of trust in government, the private sector and the ability of our existing global digital infrastructure to adequately deal with the challenges of personal data.

The NHS conducted a privacy impact assessment for the care.data scheme, to identify and weigh its risks and benefits. In discussing why citizens might choose to opt-out of sharing their own data (as 40% of surveyed GP’s said they would), the final paragraph is both infuriating and revealing:

‘However, some people may believe that any use of patient identifiable data without explicit patient consent is unacceptable. These people are unlikely to be supportive of care.data whatever its potential benefits and may object to the use of personal confidential data for wider healthcare purposes.’

In other words, there are some people who will selfishly exercise their individual rights to privacy (for whatever misguided reasons), to the cost and detriment of the public good.

While the leaflet promoting the scheme encourages donating ones data as a contribution to the public health service, even left-wing Bevanites have reason to be sceptical. While many of us instinctively trust ‘our NHS’, the truth is large parts of it are no longer ‘ours’, and the care.data scheme is a perfect example. As expected, the contract to provide the ‘data extraction’ service was won by an unnaccountable private sector provider (Atos, who are also responsible for disability benefit assessments), while some of the main beneficiaries of all the data itself will be a plethora of commercial entities.

This is not to say that private sector use of health data is inherently bad. The trouble with the care.data scheme goes deeper than that; it is a microcosm of a much wider malaise about the future of personal data and the value of privacy.

The social contract governing the use of our health information was written for a different age, where ‘records’ meant paper, folders and filing cabinets rather than entries in giant, mine-able databases. This social contract (if it ever even existed) never granted a mandate for the new kinds of purposes HSCIC proposes.

Such a mandate would have to be based on a realistic and robust assessment of the long-term risks and a stronger regulatory framework for downstream users. Crucially, it would need to proactively engage citizens, enabling them to make informed choices about their personal data and its role in our national information infrastructure. Rather than seizing this opportunity to negotiate a new deal around data sharing, the architects of this scheme have attempted to hush it in through the backdoor.

Thankfully, there are alternative ways to reap the benefits of aggregated health data. One example is Swiss initiative HealthBank.ch, a patient data co-operative, owned and run by its members. By giving patients themselves a stake and a say in the governance of their data, the project aims to harness that data to ‘benefit the individual citizen and society without discrimination and invasion into privacy’.

Personal data collected unethically is like bad debt. You can aggregate it into complex derivatives, but in the end it’s still toxic. If the NHS start out on the wrong foot with health data, no amount of beneficial re-use will shore up public trust when things go wrong.

reuben binns | data, tech, policy

Monthly Archives: February 2014

Open Research in Practice: responding to peer review with GitHub

Care.Data: Why we need a new social contract for personal health data