Category Archives: copyright

Open Research in Practice: responding to peer review with GitHub

I wrote a tweet last week that got a bit of unexpected (but welcome) attention;

I got a number of retweets and replies in response, including:

The answers are: Yes, yes, yes and yes. I thought I’d respond in more detail and pen some short reflections on github, collaboration and open research.

The backstory; I had submitted a short paper (co-authored with my colleague David Matthews from the Maths department) to a conference workshop (WWW2014: Social Machines). While the paper was accepted (yay!), the reviewers had suggested a number of substantial revisions. Since the whole thing had to be written in Latex, with various associated style, bibliography and other files, and version control was important, we decided to create a github repo for the revision process. I’d seen Github used for paper authoring before by another colleague and wanted to try it out.

Besides version control, we also decided to make use of other features of Github, including ‘issues’. I took the long list of reviewer comments and filed them as a new issue. We then separated these out into themes which were given their own sub-issues. From here, we could clearly see what needed changing, and discuss how we were going to do it. Finally, once we were satisfied with an issue, that could be closed.

At first I considered making the repository private – I was a little bit nervous to put all the reviewer comments up on a public repo, especially as some were fairly critical of our first draft (although, in hindsight, entirely fair). In the end, we opted for the open approach – that way, if anyone is interested they can see the process we went through. While I doubt anyone will be interested in such a level of detail for this paper, opening up the paper revision process as a matter of principle is probably a good idea.

With the movement to open up the ‘grey literature’ of science – preliminary data, unfinished papers, failed experiments – it seems logical to extend this to the post-peer-review revision process. For very popular and / or controversial papers, it would be interesting to see how authors have dealt with reviewer comments. It could help provide context for subsequent debates and responses, as well as demystify what can be a strange and scary process for early-career researchers like myself.

I’m sure there are plenty of people more steeped in the ways of open science who’ve given this a lot more thought. New services like FigShare, and open access publishers like PLoS and PeerJ, are experimenting with opening up all the whole process of academic publishing. There are also dedicated paper authoring tools that extend on git-like functionality – next time, I’d like to try one of the collaborative web-based Latex editors like ShareLatex or WriteLatex. Either way, I’d recommend adopting git or something git-like, for co-authoring papers and the post-peer-review revision process. The future of open peer review looks bright – and integrating this with an open, collaborative revision process is a no-brainer.

Next on my reading list for this topic is this book on the future of academic publishing by Kathleen FitzPatrick – Planned Obsolescence
— UPDATE: Chad Kohalyk just alerted me to a relevant new feature rolled out by Github yesterday – a better way to track diffs in rendered prose. Thanks!

What can innovators in personal data learn from Creative Commons?

“License Layers” by Creative Commons, used under Creative Commons Attribution 3.0 License

This post was originally published on the Ctrl-Shift website.

A few weeks ago I attended the Creative Commons global summit, as a member of the CC-UK affiliate team, and came away thinking about lessons for the growing personal data ecosystem.

Creative Commons is a non-profit organisation founded in 2003 to create and promote a set of alternative copyright licenses which allow creative works to be legally shared, remixed and built upon by others. Creators can communicate which rights they want to keep, and which they would like to waive. These licenses are now used in education, cultural archives, science, as well as in commercial contexts. By creating a set of legally robust, standardised and easy-to-use licenses, the organisation has turned a complicated and costly legal headache into an usable piece of public infrastructure fit for the digital age.

What lessons does this movement have for the management and use of personal data? In one sense, managing content is radically different to managing personal data. Consumers generally want to be able to restrict the publication of their personal information, while creative content is generally made for public consumption from the outset. But despite the differences, there are some striking parallels – parallels which point to possible innovations in personal data.

Just as for creative works, personal data bridges technical, legal and human challenges. Personal data is stored, transferred and transformed by technology, in ways that are not always captured by the legal terminology. In turn, the law is usually too complex for humans – whether they be data controllers or individual data subjects themselves – to understand. Creative commons licenses translate a complex legal tool into something that both humans and machines can understand. There are easy tools to help creators choose the right license for their work, and a simple set of visual icons help users understand what they can do with the work. By attaching metadata to content, search engines and aggregators can automatically find and organise content according to the licenses applied.

There are already pioneering initiatives which attempt to apply aspects of this approach to personal data. One promising area is privacy policies. Much like copyright licenses, these painfully obscure documents are usually written in legalese, and can’t be understood by humans or parsed by computers. Various projects are working to make them machine-readable, and to develop user-friendly icons to represent important clauses – for instance, whether data is shared with third parties. Conversely, if individuals want to create and share data about themselves, under certain conditions, they may need an equivalent easy-to-use license-chooser.

The personal data ecosystem is in need of public, user-friendly, and standardised tools for managing data. The Creative Commons approach shows this can be done for creative works. Can a similar approach work for personal data?


I’ve been a fan of the Open Rights Group – the UK’s foremost digital rights organisation – for a few years now, but yesterday was my first time attending ORGcon, their annual gathering. The turnout was impressive; upon arrival I was pleasantly surprised to see a huge queue stretching out of Westminster University and down Regent’s Street.

The day kicked off with a rousing keynote from Cory Doctorow on ‘The Coming War On General-Purpose Computing’ (a version of the talk he gave at the last Chaos Communication Camp, [video]). In his typical sardonic style, Doctorow argued that in an age when computers are everywhere – in household objects, medical implants, cars – we must defend our right to break into them and examine exactly what they are doing.  Manufacturers don’t want their gadgets to be general-purpose computers, because this enables users to do things that scare them. They will disable computers that could be programmed to do anything, lock them down and turn them into appliances which operate outside of our control and obscured from our oversight.

Doctorow mocked the naive attempts of the copyright industries to achieve this using digital locks – but warned of the coming legal and technological measures which are likely to be campaigned for by industries with much greater lobbying power. In the post-talk Q&A session, an audience member linked the topic to the teaching of IT in schools; the need for children to understand from an early age how to look inside gadgets, understand how they work and that they may be operating against the users best interests.

As is always the way with parallel sessions, throughout the day I found myself wanting to be in multiple places at once. I opted to hear Wendy Seltzer give a nice summary of the current state of digital rights activism. She likened the grassroots response to SOPA and PIPA to an immune system fighting a virus. She warned that, like an overactive immune system, we run the risk of attacking the innocuous. If we cry wolf too often, legislators may cease to listen. She went on to imply that the current anti-ACTA movement is guilty of this. Personally, I think that as long as such protest is well informed, it cannot do any harm and hopefully will do some good. Legislators are only just beginning to recognise how serious these issues are to the ‘net generation’, and the more we can do to make that clear, the better.

The next hour was spent in a crowded and stuffy room, watching my Southampton colleague Tim Davies grill Chris Taggart (OpenCorporates), Rufus Pollock (OKFN), and Heather Brooke (journalist and author) about ‘Raw, Big, Linked, Open: is all this data doing us any good?’ The discussion was interesting and good to see this topic, which has until recently been confined to a relatively niche community, brought to an ORG audience.

After discussing university campus-based ORG actions over lunch, I went along to a discussion of the future of copyright reform in the UK in the wake of the Hargreaves report. Peter Bradwell went through ORG’s submission to the government’s consultation on the Hargreave’s measures. Saskia Wazkel from Consumer Focus gave a comprehensive talk and had some interesting things to say about the role of consumers and artists themselves in copyright reform. Emily Goodhand (more commonly known as @copyrightgirl on twitter) spoke about the University of Reading’s submission, and her perspective of as Copyright and Compliance officer there. Finally Professor Charlotte Waelde, head of Exeter Law School, took the common call for more evidence-based copyright policy and urged us to ask ‘What would evidence-based copyright policy actually look like?’. Particularly interesting for me, as both an interdisciplinary researcher and believer in evidence-based policy, was her question about what mixture of disciplines are needed to create conclusions to inform policy. It was also encouraging to see an almost entirely female panel and chair in what is too often a male-dominated community.

I spent the next session attending an open space discussion proposed by Steve Lawson, a musician, about the future of music in the digital age. It was great to hear the range of opinions – from data miners, web developers and a representative from the UK Pirate Party – and hear about some the innovations in this space. I hope to talk to Steve in more detail soon in lieu of a book I’m working on about consumer ethics/activism for the pirate generation.

Finally, we were sent off with a talk from Larry Lessig, on ‘recognising the fight we’re in’. His speech took in a bunch of different issues: open access to scholarly literature; the economics of the radio spectrum (featuring a hypothetical three way battle between economist Robert Coase, dictator Joseph Stalin and singer Hetty Lamar [whom I’d never heard of but apparently co-invented ‘frequency hopping’ which paved the way for modern day wireless communication]); and corruption in the US political system, the topic of his latest book.

In the Q+A I asked his opinion on academic piracy (the time honoured practice of swapping PDFs to get around lack of institutional access, which has now evolved into the twitter hashtag phenomenon #icanhazPDF), and whether he prefers the ‘green’ or ‘gold’ routes to open access. He seemed to generally endorse PDF-swapping. He came down on the side of ‘gold’ open access (where publishers become open-access), rather than ‘green’ (where academic departments self-archive), citing the importance of being able to do data-mining. I’m not convinced that data-mining isn’t possible under green OA; so long as self-archiving repositories are set up right (for example, Southampton’s eprints software is designed to enable this kind of thing).

After Lessig’s talk, about a hundred sweaty, thirsty digital rights activists descended on a nearby pub, then pizza, then said our goodbyes until next time. All round it was a great conference; roll on ORGcon2013.