Monthly Archives: July 2013

Southampton CyberSecurity Seminar

I recently delivered a seminar for the Southampton University Cyber Security seminar series. My talk introduced some of the research I’ve been doing into the UK’s Data Protection Register, and was entitled ‘Data Controller Registers: Waste of Time or Untapped Transparency Goldmine?’.

The idea of a register of data controllers came from the EU Data Protection Directive, which set out a blueprint for member states’ data protection laws. Data controllers – any entity responsible for collection and use of personal data – must provide details about the purposes of collection, categories of data subjects, categories of personal data, any recipients, and any international data transfers, to the supervisory authority (in the UK, this is the Information Commissioner’s Office). This represents a rich data source on the use of personal data by over 350,000 UK entities.

My talk explored some initial results from my research into three years’ worth of data from this register. A number of broad trends have been identified, including:

  • The amount of personal data collection reported is increasing. This is measured in terms of the number of distinct register entries for individual instances of data collection, which have increased by around 3% each year.
  • There are over 60 different stated reasons for collection of data, with ‘Staff Administration’, ‘Accounts & Records’ and ‘Advertising, Marketing & Public Relations’ being the most popular (outnumbering all other purposes combined).
  • The categories of personal data collected exhibit a similar ‘long tail’, with ten very common categories (including ‘Personal Details’, ‘Financial Details’ and ‘Goods or Services Provided’) accounting for the majority of instances.
  • In terms of transfers of data outside the EU, the vast majority of international data transfers are described as ‘Worldwide’. Of those who do specify, the most popular countries are the U.S., Canada, Australia, New Zealand and India.
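
As a rough illustration of how purpose counts like these could be computed, here is a minimal Python sketch. It assumes each register entry has already been parsed into a dictionary with a `purposes` list; that field name, the helper functions and the sample entries are all made up for illustration and are not taken from the real register.

```python
from collections import Counter


def purpose_counts(entries):
    """Count how many register entries state each purpose."""
    counts = Counter()
    for entry in entries:
        # Count each purpose once per entry, even if it is listed twice.
        counts.update(set(entry.get("purposes", [])))
    return counts


def top_share(counts, n):
    """Fraction of all stated purposes covered by the n most common ones."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical sample entries (not real register data).
sample = [
    {"name": "Controller A", "purposes": ["Staff Administration", "Accounts & Records"]},
    {"name": "Controller B", "purposes": ["Staff Administration"]},
    {"name": "Controller C", "purposes": ["Research"]},
]

counts = purpose_counts(sample)
print(counts.most_common(2))
print(top_share(counts, 2))
```

Running the same two functions over each year's snapshot would give a simple way to compare how the 'long tail' of purposes changes over time.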

Beyond these general trends, I explored one particular category of personal data collection which has been raised as a concern in studies of EU public attitudes, namely, trading and sharing of personal data. The kinds of data likely to be collected for this purpose are broadly reflective of the general trends, with the exception of ‘membership details’, which are far more likely to be collected for the purpose of trading.

Digging further into this category, I selected one particularly sensitive kind of data – ‘Sexual Life’ – to see how this was being used. This uncovered 349 data controllers who hold data about individuals’ sexual lives, for the purpose of trading and sharing with other entities (from the summer 2012 dataset). I visualised this activity as a network graph, looking at the relationship between individual data controllers and the kinds of entities they share this information with. By clicking on blue nodes you can see individual data controllers, while categories of recipients are in yellow.
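
The data behind a graph like this can be sketched in a few lines of Python. This is only an illustration under assumptions: the entry structure (`purposes`, `data_classes`, `recipients`), the purpose label "Trading" and the sample entries are hypothetical, and the real register's labels and format may differ.

```python
from collections import defaultdict


def build_sharing_graph(entries, purpose, data_class):
    """Map each matching controller to the recipient categories it names.

    An entry matches when one of its purpose blocks has the given purpose
    and lists the given class of data.
    """
    graph = defaultdict(set)
    for entry in entries:
        for block in entry.get("purposes", []):
            if block["purpose"] == purpose and data_class in block["data_classes"]:
                graph[entry["name"]].update(block["recipients"])
    return dict(graph)


def recipient_degree(graph):
    """Count how many controllers share with each recipient category."""
    degree = defaultdict(int)
    for recipients in graph.values():
        for recipient in recipients:
            degree[recipient] += 1
    return dict(degree)


# Hypothetical entries (not real register data).
entries = [
    {"name": "Controller A", "purposes": [
        {"purpose": "Trading",
         "data_classes": ["Sexual Life", "Personal Details"],
         "recipients": ["Traders in personal data", "Business associates"]},
    ]},
    {"name": "Controller B", "purposes": [
        {"purpose": "Trading",
         "data_classes": ["Personal Details"],
         "recipients": ["Suppliers"]},
    ]},
    {"name": "Controller C", "purposes": [
        {"purpose": "Staff Administration",
         "data_classes": ["Sexual Life"],
         "recipients": ["Employers"]},
        {"purpose": "Trading",
         "data_classes": ["Sexual Life"],
         "recipients": ["Business associates"]},
    ]},
]

graph = build_sharing_graph(entries, "Trading", "Sexual Life")
print(sorted(graph))
print(recipient_degree(graph))
```

Controllers and recipient categories form the two kinds of node, and each controller-to-recipient pair is an edge, which is the structure a network visualisation tool would need as input.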

I also explored how this dataset can be used to create personalised transparency tools, or to ‘visualise your digital footprint’. By identifying the organisations, employers, retailers and suppliers who have my personal details, I can pull in their entries from the register in order to see who knows what about me, what kinds of recipients they’re sharing it with and why. A similar interactive network graph shows a sample of this.

Open data is often seen as in tension with privacy. However, through this research I hope to demonstrate some of the ways that open data can address privacy concerns. These concerns often stem from a lack of transparency about the collection and use of personal data by data controllers. By providing knowledge about data controllers, open data can be a basis for accountability and transparency about the use (or abuse) of personal data.

Data on Strike

What happens to a smart city when there’s no access to personal data?

Last week I had the pleasure of attending the Digital Revolutions Oxford summer school, a gathering of PhD students doing research into the ‘digital economy’. On the second day, we were asked to form teams and engage in some wild speculation. Our task was to imagine a news headline in 2033, covering some significant event that relates to the research we are currently undertaking. My group took this as an opportunity to explore various utopian / dystopian themes relating to power struggles over personal data, smart cities and prosthetic limbs.

The headline we came up with was ‘Data Strike: Citizens refuse to give their data to Governments and Corporations’. Our hypothesis was that as ‘smart cities’ materialise, essential pieces of infrastructure will become increasingly dependent on the personal data of the city’s inhabitants. For instance, the provision of goods and services will be carefully calibrated to respond and adjust to the circumstances of individual consumers. Management of traffic flow and transportation systems will depend on uninterrupted access to every individual’s location data. Distributed public health systems will feed back data live from our immune systems to the health authorities.

In a smart city, personal data itself is as critical a piece of infrastructure as you can get. And as any observer of strike action will know, critical infrastructure can quickly be brought to a halt if the people it depends on decide not to co-operate. What would happen in a smart city if its inhabitants decided to go on a data strike? We imagined a city-wide personal data blackout, where individuals turn off or deliberately scramble their personal devices, wreaking havoc on the city’s systems. Supply chains would misfire as targeted consumers disappear from view. Public health monitoring signals would be scrambled. Self-driving cars would no longer know when to pick up and drop off passengers – or when to stop for pedestrians.

We ventured out into the streets of Oxford to see what ‘the public’ thought about our sensational predictions, and whether they would join the strike. I had trouble selling the idea of a ‘data co-operative’ to sceptical passengers waiting at the train station, but was surprised by the general level of concern and awareness about the use of personal data. As a break from dry academic work, this exercise in science fiction was a bit of light relief. But I think we touched on a serious point. Smart cities need information infrastructure, but ensuring good governance of this infrastructure will be paramount. Otherwise we may sleepwalk into a smart future where convenience and efficiency are promoted at the expense of privacy, autonomy and equality. We had better embed these values into smart infrastructure now, while the idea of a data strike still sounds ridiculous.

Thanks to the Research Councils UK Digital Economy Theme, Know Innovation and the Oxford CDT in healthcare innovation for funding / organising / hosting the event. More comprehensive coverage can be found over on Chris Phethean’s write-up.

5 Stars of Personal Data Access

As a volunteer ‘data donor’ at the Midata Innovation Lab, I’ve recently been attempting to get my data back from a range of suppliers. As our lives become more data-driven, an increasing number of people want access to a copy of the data gathered about them by service providers, personal devices and online platforms. Whether it’s financial transactions data, activity records from a Fitbit or Nike Fuelband, or gas and electricity usage, access to our own data has the potential to drive new services that help us manage our lives and gain self-insight. But anyone who has attempted to get their own data back from service providers will know the process is not always simple. I encountered a variety of complicated access procedures, data formats, and degrees of detail.

For instance, BT gave me access to my latest bill as a CSV file, but previous months were only available as PDF documents. And my broadband usage was displayed as a web page in a separate part of the site. Wouldn’t it be useful to have everything – broadband usage, landline, and billing – in one file, covering, say, the last year of service? Or, even better, a secure API which would allow trusted applications to access the latest data directly from my BT account, so I don’t have to?

Another problem was that in order to get my data, I sometimes had to sign up for unwanted services. My mobile network provider, GiffGaff, require me to opt in to their marketing messages in order to receive my monthly usage report. Fitbit users need to pay for a premium account to get access to the raw data from their own device.

Wouldn’t it be nice to rate these services according to a set of best practices? In 2006, when the open data movement was in its infancy, Tim Berners-Lee defined ‘Five Stars of Open Data’ to describe how ‘open’ a data source is. If it’s on the web under an open license, it gets one star. Five stars means that it is in a machine-readable, non-proprietary format, and uses URIs and links to other data for context. While we don’t necessarily want our private, personal data to be ‘open’ in Berners-Lee’s sense, we do want standard ways to get access to our personal data from a service. So, here are my suggested ‘Five Stars of Personal Data Access’ (to be read as complementary, not necessarily hierarchical):

1. My data is made available to me for free in a digital form. For instance, through a web dashboard, or email, rather than as a paper statement. There are no strings attached; I do not need to pay for premium services or sign up to marketing alerts to read it.

2. My data is machine-readable (such as CSV rather than PDF).

3. My data is in a non-proprietary format (such as CSV, XML or JSON, rather than Excel).

4. My data is complete; all the relevant fields are included in the same place. For instance, usage history and billing are included in the same file or feed.

5. My data is up-to-date; available as a regularly-updated feed, rather than a static file I have to look up and download. This could be via a secure API that I can connect trusted third-party services to.
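
To make the scheme concrete, the five criteria could be applied to a service as a simple checklist. The Python sketch below is one possible reading, not a definitive scoring method: the class name, field names and example profile are invented for illustration. Because the stars are meant to be complementary rather than hierarchical, it simply counts how many criteria are met and reports which are missing.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AccessProfile:
    """One flag per star, in the order listed above (names are illustrative)."""
    free_digital: bool      # 1. free, digital, no strings attached
    machine_readable: bool  # 2. e.g. CSV rather than PDF
    non_proprietary: bool   # 3. e.g. CSV, XML or JSON rather than Excel
    complete: bool          # 4. all relevant fields in one place
    up_to_date: bool        # 5. regularly updated feed or secure API


def star_rating(profile):
    """Return the number of stars earned and the names of the missing ones."""
    checks = asdict(profile)
    missing = [name for name, met in checks.items() if not met]
    return len(checks) - len(missing), missing


# Hypothetical supplier: a free CSV download, split across files and static.
example = AccessProfile(
    free_digital=True,
    machine_readable=True,
    non_proprietary=True,
    complete=False,
    up_to_date=False,
)

score, missing = star_rating(example)
print(f"{score}/5 stars; missing: {', '.join(missing)}")
```

Listing the missing criteria alongside the score keeps the rating useful as feedback to a supplier, rather than just a number.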

The Midata programme has considered these issues from the outset, calling for suppliers to adopt common procedures and formats. Simplifying this process is an important step towards a world where individuals are empowered by their own data. My initial attempts to get my data back from suppliers point to a number of areas for improvement, which I’ve tried to reflect in these star ratings. Of course, there’s lots of room for debate over the definitions I’ve given here. And I’m sure there are other important aspects I’ve missed out. What would you add?