Enigma Machine – Version 2

After finishing my wooden Enigma stepping mechanism in 2020, I started working on building a fully functioning electro-mechanical system with hand-turned rotors, interoperable with original Enigma machines.

After nearly two years, I’ve finally finished! I’m going to do a longer post going into more detail on the construction, but first I wanted to share some photos. Some people have asked for a copy of the plans used to cut out the parts on the laser cutter, so I’m sharing them here.

I have a twitter thread logging the work-in-progress here:

And another about the finished machine:

A video of the latest version in action:

The plans are available as a PDF file (EPS also available on request). They are licensed under CC-BY. They are designed to be laser cut into 3mm MDF.

Handmade Enigma Machine

To take my mind off all this plague, I’ve been making a sort of replica of an Enigma machine.

This version is a subset of the full Enigma machine. It includes the rotors and stepping mechanism, but no keyboard and electrical subsystem. I took a lot of inspiration from this similar project.

Since we can’t go out for fear of catching / spreading a deadly virus, I used bits of scrap lying around the house. I didn’t follow any plans, just guesswork, trial and error. The wood is mostly MDF from old IKEA drawers, and there are lots of barbecue skewers and a bicycle inner tube in there too.

I wanted to minimise use of power tools (really don’t want to have to go to hospital right now!), so I cut everything by hand using a Japanese Dozuki saw, which is incredible for making precise cuts in soft wood (I can’t believe people still use any other kind of saw for this sort of work).

There are a few major differences from the real Enigma stepping mechanism:

First, there is space for 36 characters rather than the original 26, so you can encode numbers 0-9 as well as letters A-Z. This means each rotor has 36 teeth, and therefore a lot of tiny cuts to make; I slightly regretted this decision after cutting the third rotor.

Second, rather than making a continuous wavy notch to control the stepping (tricky to cut by hand), I made a separate pair of ratchets and pawls running in the opposite direction, attached to the primary ratchet and (pushing) pawls, to stop the rotors over-stepping. This second set of pawls hides just below the primary set. It makes the underside slightly more fiddly to assemble, especially as each pawl needs its own individual little harness, but they do look pretty cute.

Does it work?

The double-stepping mechanism works, with a satisfying chorus of clicking noises, although it’s pretty janky and requires frequent re-adjusting of the pawls. See videos posted on twitter:

While it obviously doesn’t have the electrical subsystem, keyboard, and lamps, it can be used manually to do something like a fully-functional Enigma encryption / decryption in a couple of ways:

  1. Stepping only: To encrypt, you tap the single ‘key’ (which would connect to the keyboard in a full system) for every letter you want to encrypt, which turns at least the first rotor (and maybe also the second, and maybe also the third rotor). You then read across from the letter on the first rotor to the equivalently-placed letter on the third rotor. To decrypt, you do the same in reverse, making sure that the starting position of the rotors is the same as during the encryption.
  2. Stepping and scrambling: as above, but you can also simulate the internal electrical wiring inside each rotor in a real Enigma, which scrambles the input and output. This can be done by maintaining separate tables for each rotor which indicate the scrambled key pairs, or by drawing two sets of characters directly on each rotor, and arrows to indicate how they are to be scrambled (like this very clever and cute paper-and-Pringles-can Enigma machine). A rough code sketch of these procedures follows below.
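To make the ‘stepping only’ procedure concrete, here is a rough Python model of what the manual process computes. This is only a sketch of my own: it uses simple odometer-style stepping rather than the machine’s true double-step, and the sign conventions for reading across the rotors are arbitrary.

import string

ALPHABET = string.ascii_uppercase + string.digits   # 36 characters, as on the rotors

def step(pos):
    # Odometer-style stepping: rotor 1 advances on every key press, and each
    # later rotor advances when the previous one completes a full revolution.
    # (The real machine's double-stepping quirk isn't modelled here.)
    pos[0] = (pos[0] + 1) % 36
    if pos[0] == 0:
        pos[1] = (pos[1] + 1) % 36
        if pos[1] == 0:
            pos[2] = (pos[2] + 1) % 36

def encrypt_stepping_only(plaintext, start=(0, 0, 0)):
    # 'Stepping only': tap the key once per character, then read across from the
    # letter on rotor 1 to the letter at the same physical position on rotor 3.
    pos = list(start)
    out = ""
    for ch in plaintext:
        step(pos)
        i = ALPHABET.index(ch)
        out += ALPHABET[(i - pos[0] + pos[2]) % 36]
    return out

def decrypt_stepping_only(ciphertext, start=(0, 0, 0)):
    # Same steps in reverse, starting from the same rotor positions.
    pos = list(start)
    out = ""
    for ch in ciphertext:
        step(pos)
        i = ALPHABET.index(ch)
        out += ALPHABET[(i + pos[0] - pos[2]) % 36]
    return out

# 'Stepping and scrambling' would additionally pass the index through a fixed
# permutation (the rotor 'wiring') for each rotor, offset by that rotor's position.
print(decrypt_stepping_only(encrypt_stepping_only("HELLO42")))   # prints HELLO42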

Self-sufficient programming: RSA cryptosystem with plain Python

[WARNING: this is an exercise purely for fun: it is definitely very insecure, do not use this code in a real system and do not attempt to write your own cryptographic functions!]

Despite working in computer science, these days I barely have the need to write any code in my day-to-day work. This is not unusual for someone working in CS, especially those of us at the ‘softer’ edges. So I decided to set myself a little project to get back into it.

I’ve also been thinking about how the development stack for most programming languages is so dependent on a byzantine labyrinth of third-party libraries and packages. This is great in many ways, because whatever you want to do, someone else has probably already done it better and more securely.

But it also means that most applications are dependent on code written and maintained by other people. In some cases this can lead to an unexpected global mess, such as the time when a developer of several popular NPM libraries pulled them from the package management system, breaking the ‘Jenga tower of Javascript‘.

While I don’t think this is actually the answer to the above problems, it could be a fun exercise to see how many of the basic building blocks of modern computing could be re-created from scratch. By me. Someone whose programming has always been a little shoddy and is now very out of practice. Armed with nothing more than a few scribbled notes and old slides covering undergraduate-level CS, and using only basic Python (i.e. the Standard Library, no external packages). This is a challenge, because while Python supposedly ships with all the basic stuff you should need (‘batteries included’), in practice this is arguably not true.

DIY RSA

What project to pick? I decided to have a go at implementing the RSA cryptosystem for asymmetric key cryptography, a pretty fundamental part of modern computer security. This seemed like a good challenge for two reasons:

  • RSA is not in the Python standard library, and requires various functions which I’d naturally go looking for in external libraries (e.g. finding coprimes).
  • In debates about national security, it is often said that governments can’t effectively ban encryption because it’s not any particular piece of software, it’s just math. Anyone who knows RSA and how to program can implement it themselves given a general purpose computer. But how many people does that include? Can even I, a non-cryptographer with a basic knowledge of how it works in theory, create an actual working version of RSA?

So as an entirely pedagogical / auto-didactical weekend project I’m going to have a go. I’m sure that it will be highly insecure, but I’ll be happy if I can make something that just about works.

I’m allowing myself access to material describing the RSA system (e.g. lecture slides, the original paper), and to stackexchange for basic Python syntax etc. that I’ve forgotten. But no use of external libraries or peeking at existing Python implementations of RSA.

So far I’ve got the following steps of the key generation process:

Pick two large prime numbers:

First we need to pick two large prime numbers at random. To do that, we’ll get the user to type some random keys on the keyboard (this will be familiar if you’ve used e.g. GnuPG):

user_entropy = input("please generate some entropy by typing lots of random characters: ")

entropy = 0
for letter in user_entropy:
    entropy = entropy + ord(letter)

That turns the user input into a single number. Then we need to find the nearest prime number, with two functions:

def isPrime(num):
    if num < 2:
        return False
    for i in range(2, num):
        if (num % i) == 0:
            return False    # found a divisor, so not prime
    return True

def find_nearest_prime(num):
    # search upwards from num for the first prime (capped at 100000)
    while num < 100000:
        if isPrime(num):
            return num
        else:
            num += 1

Get N and Φ(N)

Now we have two prime numbers, we multiply them together to get the composite number N which will be the second part of the private and public keys.

n = prime1*prime2

We also need Φ(N) (the ‘totient’):

phi_n = ((prime1-1)*(prime2-1))

Get e (public key)

Now we have to find e, the public key component. The easy condition is that it has to be between 1 and Φ(N). The trickier condition is that it has to be coprime with both N and Φ(N). Two numbers are coprime if they have no common factors other than 1. So the first thing we need is a function to find the factors of a number:

def get_factors(num):
    factors = []
    for i in range(2, num):
        if (num % i) == 0:
            factors.append(i)
    return factors

Now we can write a function to check if two numbers have common factors other than 1:

def isCoprime(num1, num2):
    num1_factors = get_factors(num1)
    num2_factors = get_factors(num2)
    if set(num1_factors).isdisjoint(set(num2_factors)):
        # print('no common factors - they coprime!')
        return True
    else:
        # print('there are common factors, not coprime')
        return False
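(As an aside: the standard library can do this check far more efficiently. Two numbers are coprime exactly when their greatest common divisor is 1, and math.gcd still counts as ‘basic Python’, so an equivalent check, sketched here, needs no factorising at all.)

import math

def isCoprimeFast(num1, num2):
    # coprime means the only common factor is 1, i.e. the gcd is 1
    return math.gcd(num1, num2) == 1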

Now we can write a function to find values for e that will satisfy those conditions:

def find_e(n, phi_n):
    candidates = []
    for i in range(3, n):
        if isPrime(i):
            if (isCoprime(i, n)) and (isCoprime(i, phi_n)):
                candidates.append(i)
    return candidates

This returns a list of potential values for e which we can pick.

Get d (private key)

How about the private key for decrypting messages (d)? This should be the multiplicative inverse of e; that means that when multiplied by e, the result should be congruent to 1. My notes say this is equivalent to: e * d = 1 mod N. This is where I’ve run into trouble. My initial attempts to define this function don’t seem to have worked when I tested the encryption and decryption functions on it. It’s also incredibly slow.

At this point I’m also not sure if the problem lies somewhere else in my code. I made some changes and now I’m waiting for it to calculate d again. Watch this space …

[UPDATE: I got it working …]

OK, so it seems like it wasn’t working before because I’d transcribed the multiplicative inverse condition wrong. I had (d * e) % N, but it’s actually (d * e) % Φ(N). So the correct function is:

def find_d(prime1, n):
    # brute-force search for a d such that (d * e) % phi_n == 1
    # (relies on e and phi_n already being defined)
    for i in range(prime1, n):
        if ((i * e) % phi_n) == 1:
            print(i)
            return i
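(For the record, newer versions of Python (3.8 and later) can compute this modular inverse directly with the built-in pow, which would replace the brute-force search above.)

# modular inverse of e mod Φ(N); the three-argument pow supports this since Python 3.8
d = pow(e, -1, phi_n)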

So that’s the basic key generation functions done. I put in some command line interactions to get this all done, and save the keypair to a text file in the working directory.
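I won’t reproduce the command line plumbing, but the saving step can be as simple as dumping the dictionaries to disk (a sketch; the filename and format here are my own choices):

import json

with open('keypair.txt', 'w') as f:
    # write both halves of the keypair to a plain text file in the working directory
    json.dump({'public_key': public_key, 'private_key': private_key}, f)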

Encryption and decryption

With the keypairs saved as dictionaries, the encryption and decryption functions are relatively simple:

def encrypt(pt):
    return (pt ** public_key['e']) % public_key['n']

def decrypt(ct):
    return (ct ** private_key['d']) % public_key['n']
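A note on speed: (pt ** e) % n computes the full power before reducing, which gets hopelessly slow for big numbers. Python’s built-in three-argument pow reduces modulo n as it goes, so an equivalent but much faster pair would look something like this:

def encrypt(pt):
    # pow(base, exp, mod) does modular exponentiation without the huge intermediate value
    return pow(pt, public_key['e'], public_key['n'])

def decrypt(ct):
    return pow(ct, private_key['d'], public_key['n'])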

At the moment, this only works to encrypt integers rather than text strings. There are various ways we could handle encoding text to integers.
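One standard-library-only option would be to pack the text’s bytes into a single big integer and back again, along these lines (just a sketch, and only valid when the resulting integer is smaller than n, since there is no padding here):

def text_to_int(message):
    # treat the UTF-8 bytes of the message as one big-endian integer
    return int.from_bytes(message.encode('utf-8'), 'big')

def int_to_text(number):
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, 'big').decode('utf-8')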

Functional … just!

So there we go. A day was just enough to put together a minimally functional RSA cryptosystem.

The main issue is that even for pairs of small prime numbers, it takes a while to find e. Key sizes in the tens of bits are pretty quick to compute, but NIST recommends that asymmetric keys should be at least 2048 bits. Trying to generate a key this big means leaving the script running for a long, long time.

There are probably loads of ways I could improve the code. It would also be better to default to a higher value for e. Finally, the default key management is basically nothing (an unencrypted plaintext file).

How to comply with GDPR Article 22? Automated credit decisions

This post explores automated decision-making systems in the context of the EU General Data Protection Regulation (GDPR). Recent discussions have focused on what exactly the GDPR requires of data controllers who are implementing automated decision-making systems. In particular, what information should be provided to those who are subject to automated decisions? I’ll outline the legal context and then present an example of what that might mean at a technical level, working through a very simple machine learning task in an imaginary credit lending scenario.

The legal context

In May next year, the GDPR will come into force in EU member states (including the UK). Part of the Regulation that has gained a fair amount of attention recently is Article 22, which sets out rights and obligations around the use of automated decision making. Article 22 gives individuals the right to object to decisions made about them purely on the basis of automated processing (where those decisions have significant / legal effects). Other provisions in the GDPR (in Articles 13,14, and 15) give data subjects the right to obtain information about the existence of an automated decision-making system, the ‘logic involved’ and its significance and envisaged consequences. Article 22 is an updated version of Article 15 in the old Data Protection Directive. Member states implemented the Directive into domestic law around a couple of decades ago, but the rights in Article 15 of the Directive have barely been exercised. To put it bluntly, no one really has a grip on what it meant in practice, and we’re now in a similar situation with the new regulation.

In early proposals for the GDPR, the new Article 22 (Article 20 in earlier versions) looked like it might be a powerful new right providing greater transparency in the age of big data profiling and algorithmic decision-making. However, the final version of the text significantly watered it down, to the extent that it is arguably weaker in some respects than the previous Article 15. One of the significant ambiguities is around whether Articles 13, 14, 15, or 22 give individuals a ‘right to an explanation’, that is, an ex post explanation of why a particular automated decision was made about them.

Explaining automated decision-making

The notion of a ‘right to an explanation’ for an automated decision was popularised in a paper which garnered a lot of media attention in summer of last year. However, as Sandra Wachter and colleagues argue in a recent paper, the final text carefully avoids mentioning such a right in the operative provisions. Instead, the GDPR only gives the subjects of automated decisions the right to obtain what Wachter et al describe as an ex ante ‘explanation of system functionality’. Under Articles 15 (1) h, and 14 (2) g, data controllers must provide ‘meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing’ to the data subject. But neither of these provisions amount to an ex post explanation for a particular decision that has been made. The only suggestion of such a right appears in a Recital (71), which says appropriate safeguards should include the ability of data subjects ‘to obtain an explanation of the decision reached after such assessment’.

Since Recitals are not operative provisions and therefore not binding, this suggests that there is no ex post ‘right to explanation’ for specific decisions in the GDPR. It is conceivable that such a right becomes established by the court at a later date, especially if it were to pay close attention to Recital 71. It is also possible that member state DPAs, and the new European Data Protection Board in its harmonisation role, interpret Article 22 in this way, and advise data controllers accordingly. But until then, it looks like data controllers will not be required to provide explanations for specific automated decisions.

Having followed discussions about this aspect of the GDPR since the first proposal text was released in 2012, one of the difficulties has been a lack of specific and detailed examples of how these provisions are supposed to operate in practice. This makes it hard to get a grip on the supposed distinction between a ‘right to explanation of a decision’ and a mere ‘right to an explanation of system functionality’.

If I’m entitled as a data subject to an explanation of a system’s ‘functionality’, and its ‘likely effects’ on me, that could mean a lot of things. It could be very general or quite detailed and specific. At a certain level of generality, such an explanation could be completely uninformative (e.g. ‘based on previous data the model will make a prediction or classification, which will be used to make a decision’). On the other hand, if the system were to be characterised in a detailed way,  showing how particular outputs relate to particular inputs (e.g. ‘applicants with existing debts are 3x less likely to be offered credit’), it might be possible for me to anticipate the likely outcome of a decision applied to me. But without looking at specific contexts, system implementations, and feasible transparency measures, it’s difficult to interpret which of these might be feasibly required by the GDPR.

A practical example: automated credit decisions

Even if legal scholars and data protection officers did have a clear idea about what the GDPR requires in the case of automated decision-making systems, it’s another matter for that to be implemented at a technical level in practice. In that spirit, let’s work through a specific case in which a data controller might attempt to implement an automated decision-making system.

Lots of different things could be considered as automated decision-making systems, but the ones that are getting a lot of attention these days are systems based on models trained on historical data using machine learning algorithms, whose outputs will be used to make decisions. To illustrate the technology, I’m going to explain how one might build a very simple system using real data (note: this is not intended to be an example of ‘proper’ data science; I’m deliberately going to miss out some important parts of the process, such as evaluation, in order to make it simpler).

Imagine a bank wants to implement an automated system to determine whether or not an individual should be granted credit. The bank takes a bunch of data from previous customers, such as their age, whether or not they have children, and the number of days they have had a negative balance (in reality, they’d probably use many more features, but let’s stick with these three for simplicity). Each customer has been labelled as a ‘good’ or ‘bad’ credit risk. The bank then wants to use a machine learning algorithm to train a model on this existing data to classify new customers as ‘good’ or ‘bad’. Good customers will be automatically granted credit, and bad customers will be automatically denied.

German credit dataset

Luckily for our purposes, a real dataset like this exists from a German bank, shared by Professor Hans Hofmann of the University of Hamburg in 1994. Each row represents a previous customer, with each column representing an attribute, such as age or employment status, and a final column in which the customer’s credit risk has been labelled (either 1 for ‘Good’, or 2 for ‘Bad’).

For example, the 1,000th customer in the dataset has the following attributes:

‘A12 45 A34 A41 4576 A62 A71 3 A93 A101 4 A123 27 A143 A152 1 A173 1 A191 A201 1’

The ‘A41’ attribute in the 4th column indicates that this customer is requesting the credit in order to purchase a used car (a full description of the attribute codes can be found here http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)). The final column represents the classification of this customer’s credit risk (in this case 1 = ‘good’).

Building a model

Let’s imagine I’m a data scientist at the bank and I want to be able to predict the target variable (risk score of ‘good’ or ‘bad’) using the attributes. I’m going to use Python, including the pandas module to wrangle the underlying CSV file into an appropriate format (a ‘data frame’), and the scikit-learn module to do the classification.

import pandas as pd
from sklearn import tree

Next, I’ll load in the German credit dataset and give the columns names (remember, for simplicity we’re only going to look at three features – how long they’ve been in negative balance, their age and the number of dependents):

features = ["duration", "age", "num_depend", "risk"]
df = pd.read_csv("../Downloads/german.data", sep=" ", header=0, names=features)

The target variable, the thing we want to predict, is ‘risk’ (where 1 = ‘good’ and 2 = ‘bad’). Let’s label the target variable y and the features X.

y = df[["risk"]]
X = df[features]

Now I’ll apply a basic Decision Tree classifier to this data. This algorithm partitions the data points (i.e. the customers) into smaller and smaller groups according to differences in the values of their attributes which relate to their classification (i.e. ‘people over 30’, ‘people without dependents’). This is by no means the most sophisticated technique for this task, but it is simple enough for our purposes. We end up with a model which can take as input any customer with the relevant set of attributes and return a classification of that customer as a good or bad credit risk.
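The fitting step itself is just a couple of lines with scikit-learn; something like the following (the depth limit here is my own guess to keep the tree readable, not necessarily what produced the tree described below):

# train a small decision tree on the three features
clf = tree.DecisionTreeClassifier(max_depth=4)
clf = clf.fit(X, y)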

The bank can then use this model to automatically decide whether to grant or deny credit to a customer. Imagine a customer, Alice, makes an application for credit, and provides the following attributes:

Alice = {'duration' : 20, 'age' : 40, 'num_depend' : 1}

We then use our model to classify Alice:

# convert the Python dict into a pandas Series
Alice = pd.Series(Alice)
# reshape the values since sklearn doesn't accept 1d arrays
Alice = Alice.values.reshape(1, -1)
print(clf.predict(Alice))

The output of our model for Alice is 2 (i.e. ‘bad’), so Alice is not granted the credit.

Logic, significance and consequences of automated decision taking

How could the bank provide Alice with ‘meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing’?

One proposal might be to provide Alice with a representation of the decision tree model that resulted from the training. This is what that looks like:

Reading from the top-down, each fork in the tree shows the criteria for placing an individual in one of two sides of the fork.
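For what it’s worth, one way to generate such a representation from the fitted model (not necessarily how this one was produced) is scikit-learn’s graphviz export:

from sklearn.tree import export_graphviz

# write the fitted tree to a .dot file, which graphviz can render as an image
export_graphviz(clf, out_file="credit_tree.dot",
                feature_names=["duration", "age", "num_depend"],
                class_names=["good", "bad"], filled=True)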

If Alice knows which personal attributes the bank knows about her (i.e. 40 years old, 1 dependent, 20 days in negative balance), she could potentially use this decision tree to work out whether or not this system would decide that she was a good credit risk. Reading from the top: the first fork asks whether the individual has 15.5 days or less in negative balance; since Alice has 20 days in negative balance, she is placed in the right-hand category. The next fork asks whether Alice has 43.5 days or less in negative balance, which she does. The next fork asks whether Alice is 23.5 years old or less, which she isn’t. The final fork on this branch asks if Alice has been in negative balance for 34.5 days or more, which she hasn’t, and at this point the model concludes that Alice is a bad credit risk.

While it’s possible for Alice to follow the logic of this decision tree, it might not provide a particularly intuitive or satisfactory explanation to Alice as to why the model gives the outputs it does. But it does at least give Alice some warning about the logic and the effects of this model.

Another way that the bank might provide Alice with ‘meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing’ would be to allow Alice to try out what decisions the model would recommend based on a variety of different values for the attributes it considers. For instance, what if Alice was older, or younger? Would she receive a different decision?

The bank could show Alice the age threshold at which she would be considered a good or bad credit risk. If we begin with age = 40, we find that Alice is classified as a bad credit risk. The same is true for Alice at 41, 42, and 43. However, at age 44, Alice’s credit risk classification would tip over from bad to good. That small exercise in experimentation may give Alice an intuitive sense of at least one aspect of the logic of the decision-making system and its envisaged effects. We could do something similar with the other attributes – what if Alice had only had a negative balance for 10 days? What if Alice had more or fewer children?
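That kind of probing is easy to script against the model; here is a sketch that holds the other two attributes at Alice’s values and varies her age:

# try a range of ages, holding duration and dependents at Alice's values
for age in range(20, 60):
    candidate = pd.Series({'duration': 20, 'age': age, 'num_depend': 1})
    label = clf.predict(candidate.values.reshape(1, -1))[0]
    print(age, 'good' if label == 1 else 'bad')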

If this kind of interactive, exploratory analysis were made available to Alice before she applied for credit, it might help her to decide whether or not she felt OK about this kind of automated decision-making system. It might help her decide whether she wants to object to it, as Article 22 entitles her to do. Rather than being presented in dry mathematical terms, the relationships between these variables and the target variable could be presented in colloquial and user-friendly ways; Alice could be told ‘you’re 4 years too young’ and ‘you’ve been in the red for too long’ to be offered credit.

At a later date, the data on which this model is trained might change, and thus the resulting model might give a different answer for Alice. But it is still possible to take a snapshot of the model at a particular time and, on that basis, provide potentially meaningful interfaces through which Alice could understand the logic, significance and effects of the system on her ability to gain credit.

Explanation: unknown

The point of this exercise is to put the abstract discussions surrounding the GDPR’s provisions on automated decision making into a specific context. If data controllers were to provide dynamic, exploratory systems which allow data subjects to explore the relationships between inputs and outputs, they may actually be functionally equivalent to an ex post explanation for a particular decision. From this perspective, the supposed distinction between an ex ante ‘explanation of system functionality’ and an ex post ‘explanation of a specific decision’ becomes less important. What’s important is that Alice can explore the logic and determine the likely effects of the automated decision-making system given her personal circumstances.

Some important questions remain. It’s easy enough, with a simple, low-dimensional model, to explore the relationships between certain features and the target variable. But it’s not clear how these relationships can be meaningfully presented to the data subject, especially for the more complex models that arise from other machine learning methods. And we know very little about how those who are subject to such automated decisions would judge their fairness, and what grounds they might have for objecting to them. Might Alice reject the putative relationship between a particular feature and the target variable? Might she object to the sampling techniques (in this case, Alice might quite reasonably argue that the attributes of German bank customers in 1994 have little bearing on her as a non-German credit applicant in 2017)? Perhaps Alice would reject the thresholds at which applicants are judged as ‘good’ or ‘bad’?

I hope this simplistic, but specific and somewhat realistic example can serve as a starting point for focused discussion on the options for usable, human-centered transparency around automated decision-making. This is a significant challenge which will require more research at the intersection of machine learning, law and human-computer interaction. While there has been some promising work on transparent / interpretable machine learning in recent years (e.g. ‘Quantitative Input Influence‘ and LIME), relatively little research has focused on the human factors of these systems. We know very little about how people might interpret and evaluate these forms of transparency, and how that might be affected by their circumstances and relative position in the context in which the decision is made.

These questions are important to explore if we want to create automated decision-making systems which adhere not just to the letter of data protection law, but also its spirit. The duty to provide information on the logic, significance and effects of algorithmic decision-making will mean very little, if it doesn’t provide data subjects with the ability to make an informed and reasonable decision about whether to subject themselves to such decisions.

When good algorithms go bad…

I recently spoke on a panel at Strata called ‘When good algorithms go bad’, organised by the good people at DataKind and sponsored by Pivotal Labs and O’Reilly Media. My co-panellists were Madeleine Greenhalgh (Cabinet Office), Shahzia Holtom (Pivotal Labs), and Hetan Shah (Royal Statistical Society), with Duncan Ross (DataKind) chairing what was a very stimulating discussion about the ethics of data science.

I won’t attempt to summarise the excellent contributions of the other speakers (and the audience), but here are some relevant (I hope!) references for those interested in delving further into some of the topics raised:

Many thanks to DataKind UK for organising and inviting me!

White House report on big data and discrimination

The White House has recently published another report on the social and ethical impacts of big data, entitled ‘Big Data: A Report on Algorithmic Systems, Opportunity and Civil Rights’. This could be considered the third in a trilogy, following 2012’s ‘Consumer Data Privacy in a Networked World’ and 2014’s ‘Big Data: Seizing Opportunities, Preserving Values’.

Each report has been a welcome contribution in an evolving debate about data and society, with impacts around the world as well as in the US context. They also reflect, I think, the progress that’s been made in the general direction of understanding in this complex and fast-moving policy area.

The 2012 report was largely an affirmation of commitment to the decades-old fair information practice principles, which noted the challenges posed by new technology. The 2014 report addressed the possibility that big data might lead to forms of unintended discrimination, but didn’t demonstrate any advanced understanding of the potential mechanisms behind such effects. In a paper written shortly after, Solon Barocas and Andrew Selbst commented that ‘because the origin of the discriminatory effects remains unexplored, the 2014 report’s approach does not address the full scope of the problem’.

The latest report does begin to dig more deeply into the heart of big data’s discrimination problem. It describes a number of policy areas – including credit, employment, higher education and criminal justice – in which there is a ‘problem’ to which a ‘big data opportunity’ might be a solution, along with a civil rights ‘challenge’ which must be overcome.

This framing is not without its problems. One might reasonably suspect that the problems in these policy areas are themselves at least partly the result of government mismanagement or market failure, and that advocating a big data ’solution’ would merely be a sticking plaster.

In any case, the report does well to note some of the perils and promise of big data in these areas. It acknowledges some of the complex processes by which big data may have disparate impacts – thus filling the gap in understanding identified by Barocas and Selbst in their 2014 paper. It also alludes to ways in which big data could also help us detect discrimination and thus help prevent it (something I have written about recently). It advocates what it calls ’equal opportunity by design’ approaches to algorithmic hiring. Towards the end of the report, it refers to ‘promising avenues for research and development that could address fairness and discrimination in algorithmic systems, such as those that would enable the design of machine learning systems that constrain disparate impact or construction of algorithms that incorporate fairness properties into their design and execution’. This may be a reference to nascent interdisciplinary research on computational fairness, transparency and accountability (see e.g. the FAT-ML workshop).

While I’d like to see more recognition of the latter, both among the wider academic community and in policy discussions, I hope that its inclusion in the White House report signals a positive direction in the big data debate over the coming years.

Predictions, predictions

The Crystal Ball by John William Waterhouse

It is prediction time. Around about this time of year, pundits love to draw on their specialist expertise to predict significant events for the year ahead. The honest ones also revisit their previous yearly predictions to see how they did.

Philip Tetlock is an expert on expert judgement. He conducted a series of experiments between 1984 and 2003 to uncover whether experts were better than average at predicting outcomes in their given specialism. Predictions had to be specific and quantifiable, in areas ranging from economics to politics and international relations. He found that many so-called experts were pretty bad at telling the future, and some were even worse than the man or woman on the street. Most worryingly, there was an inverse relationship between an expert’s fame and the accuracy of their predictions.

But he also discovered certain factors that make some experts better predictors than others. He found that training in probabilistic reasoning, avoidance of common cognitive biases, and evaluating previous guesses enabled experts to make more accurate forecasts. Predictions based on the combined judgements of multiple individuals also proved helpful.

This work has continued in the Good Judgement Project, a research project involving several thousand volunteer forecasters, which now has a commercial spin-off, Good Judgement Inc. The project has experimented with various methods of training, forming forecasters into complementary teams, and aggregation algorithms. By rigorously and systematically testing these and other factors, the system aims to uncover the determinants of accurate forecasts. It has already proven highly successful, winning a forecasting tournament run by the US intelligence community’s research agency several years in a row (the ‘Intelligence Advanced Research Projects Activity – Aggregative Contingent Estimation’ programme, IARPA-ACE).

The commercial spin-off began as an invite-only scheme, but now it has a new part called ‘Good Judgement Open’ which allows anyone to sign up and have a go. I’ve just signed up and made my first prediction in response to the following question:

“Before the end of 2016, will a North American country, the EU, or an EU member state impose sanctions on another country in response to a cyber attack or cyber espionage?”

You can view the question and the current forecast from users of the site. Users compete to be the most prescient in their chosen areas of expertise.

It’s an interesting concept. I expect it will also prove to be a pretty shrewd way of harvesting intelligence and recruiting superforecasters for Good Judgement Inc. In this sense the business model is like Facebook’s and Google’s, i.e. monetising user data, although not in order to sell targeted advertising.

I can think of a number of ways the site could be improved further. I’d like to be given a tool which helps me break down my prediction into various necessary and jointly sufficient elements and allows me to place a probability estimate on each. For instance, let’s say the question about international sanctions in response to a cyberattack depends on several factors: the likelihood of severe attacks, the ability of digital forensics to determine the origin of the attack, and the likelihood of sanctions as a response. I have more certainty about some of these factors than others, so a system which split them into parts, and let me periodically revise them, would be helpful (in Bayesian terms, I’d like to be explicit about my priors and posteriors).
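To illustrate with entirely made-up numbers, the decomposition I have in mind is just multiplying my estimate for each necessary link in the chain:

# purely hypothetical component estimates for the sanctions question
p_severe_attack = 0.6   # a sufficiently severe attack occurs before the deadline
p_attribution = 0.5     # forensics confidently attribute it, given an attack
p_sanctions = 0.3       # sanctions follow, given a confident attribution

print(p_severe_attack * p_attribution * p_sanctions)   # combined estimate: 0.09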

One could also experiment with keeping predictions secret until after the outcome of some event. This would mean one forecaster’s prediction wouldn’t contaminate the predictions of others (particularly if they were well known as a reliable forecaster). This would allow forecasters to say ‘I knew it!’ without saying ‘I told you so’ (we could call it the ‘Reverse Cassandra’). Of course you’d need some way to prove that you didn’t just write the prediction after the event and back-date it. Or create a prediction for every possible outcome and then selectively reveal the correct one, a classic con illustrated by TV magician Derren Brown. If you wanted to get really fancy, you could do that kind of thing with cryptography and the blockchain.
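The simplest version of this doesn’t even need a blockchain: publish a hash commitment at prediction time, then reveal the prediction (plus a secret nonce) after the event. A sketch:

import hashlib
import secrets

prediction = "Sanctions will be imposed before the end of 2016: NO"
nonce = secrets.token_hex(16)   # kept secret until the reveal, so nobody can brute-force the hash

commitment = hashlib.sha256((nonce + prediction).encode()).hexdigest()
print(commitment)               # publish this now, somewhere timestamped

# Later: reveal (nonce, prediction); anyone can recompute the hash and check it matches.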

After looking into this a bit more, I came across this blog by someone called gwern, who appears to be incredibly knowledgeable about prediction markets (and much else besides).

Galileo and Big Data: The Philosophy of Data Science

The heliocentric view of the solar system was initially rejected… for good reasons.

I recently received a targeted email from a recruitment agency, advertising relevant job openings. Many of them were for ‘Data Science’ roles. At recruitment agencies. There’s a delicious circularity here; somewhere, a data scientist is employed by a recruitment agency to use data science to target recruitment opportunities to potential data scientists for data science roles at recruitment agencies… to recruit more data scientists.

For a job title that barely existed a few years ago, Data Science is clearly a growing role. There are many ways to describe a data scientist, but ‘experts who use analytical techniques from statistics, computing and other disciplines to create value from new (‘big’) data’ provides a good summary (from Nesta).

For me, one of the most interesting and novel aspects of these new data-intensive scientific methods is the extent to which they seem to change the scientific method. In fact, the underpinnings of this new approach can be helpfully explained in terms of an old debate in the philosophy of science.

Science is typically seen as a process of theory-driven hypothesis testing. For instance, Galileo’s heliocentric theory made predictions about the movements of the stars; he tested these predictions with his telescope; and his telescopic observations appeared to confirm his predictions and, at least in his eyes, proved his theory (if Galileo were a pedantic modern scientist, he might instead have described it more cautiously as ‘rejecting the null hypothesis’). As we know, Galileo’s theory didn’t gain widespread acceptance for many years (indeed, it still isn’t universally accepted today).

We could explain the establishment’s dismissal of Galileo’s findings as a result of religious dogma. But as Paul Feyerabend argued in his seminal book Against Method (1975), Galileo’s case was actually quite weak (despite being true). His theory appeared to directly contradict many ordinary observations. For instance, birds are not flung off into the sky, as one might expect if the earth were moving; they fly a stable course. It was only later that such phenomena could be accommodated by the heliocentric view.

So Galileo’s detractors were, at the time, quite right to reject his theory. Many of their objections came from doubts about the veracity of the data gathered from Galileo’s unfamiliar telescopic equipment. Even if a theory makes a prediction that is borne out by an observation, we might still rationally reject it if it contradicts other observations, if there are competing theories with more explanatory power, and if – as with Galileo’s telescope – there are doubts about the equipment used to derive the observation.

Fast forward three centuries, to 1906, when French physicist Pierre Duhem published La Théorie Physique. Son Objet, sa Structure (The Aim and Structure of Physical Theory). Duhem argued that physicists cannot simply test one hypothesis in isolation, because whatever the result, one could always reject the ‘auxiliary’ hypotheses that support the observation. These include hypotheses about the background conditions, about the efficacy of your measurement equipment, and so on. No experiment can conclusively confirm or deny the hypothesis because there will always be background assumptions that are open to question.

This idea became known as the Duhem-Quine thesis, after the American philosopher and logician W. V. O. Quine argued that this problem applied not only to physics, but to all of science and even to the truths of mathematics and logic.

The Duhem-Quine thesis is echoed in the work of leading proponents of data-driven science, who argue that a paradigm shift is taking place. Rather than coming up with a hypothesis that predicts a linear relationship between one set of variables and another (‘output’) variable, data-driven science can simply explore every possible relationship (including highly complex, non-linear functions) between any set of variables. This shift has been described as going from data models to algorithmic models, from ‘model-based’ to ‘model-free’ science, and from parametric to ‘non-parametric’ modelling (see, for instance, Stuart Russell & Peter Norvig’s 2009 Artificial Intelligence, chapter 18).

While this paradigm shift may be less radical than the Duhem-Quine thesis (it certainly doesn’t cast doubt on the foundations of mathematical truths, as Quine’s holism appears to do), it does embrace one of its key messages; you cannot test a single hypothesis in isolation with a single observation, since there are always auxiliary hypotheses involved – other potential causal factors which might be relevant to the output variable. Modern data science techniques attempt to include as many potential causal factors as possible, and to automatically test an extensive range of complex, non-linear relationships between them and the output variables.

Contemporary philosophers of science are beginning to draw these kinds of connections between their discipline and this paradigm shift in data science. For instance, Wolfgang Pietsch’s work on theory-ladenness in data-intensive science compares the methodology behind common data science techniques (namely, classificatory trees and non-parametric regression) to theories of eliminative induction originating in the work of John Stuart Mill.

There are potential dangers in pursuing theory-free and model-free analysis. We may end up without comprehensible explanations for the answers our machines give us. This may be scientifically problematic, because we often want science to explain a phenomenon rather than just predict it. It may also be problematic in an ethical sense, because machine learning is increasingly used in ways that affect people’s lives, from predicting criminal behaviour to targeting job offers. If data scientists can’t explain the underlying reasons why their models make certain predictions, why they target certain people rather than others, we cannot evaluate the fairness of decisions based on those models.