Self-sufficient programming: RSA cryptosystem with plain Python | reuben binns

[WARNING: this is an exercise purely for fun: it is definitely very insecure, do not use this code in a real system and do not attempt to write your own cryptographic functions!]

Despite working in computer science, these days I barely have the need to write any code in my day-to-day work. This is not unusual for someone working in CS, especially those of us at the ‘softer’ edges. So I decided to set myself a little project to get back into it.

I’ve also been thinking about how the development stack for most programming languages is so dependent on a byzantine labyrinth of third-party libraries and packages. This is great in many ways, because whatever you want to do, someone else has probably already done it better and more securely.

But it also means that most applications are dependent on code written and maintained by other people. In some cases this can lead to an unexpected global mess, such as the time when a developer of several popular NPM libraries pulled them from the package management system, breaking the ‘Jenga tower of JavaScript’.

While I don’t think this is actually the answer to the above problems, it could be a fun exercise to see how many of the basic building blocks of modern computing could be re-created from scratch. By me. Someone whose programming has always been a little shoddy and is now very out of practice. Armed with nothing more than a few scribbled notes and old slides covering undergraduate-level CS, and using only basic Python (i.e. the Standard Library, no external packages). This is a challenge, because while Python supposedly ships with all the basic stuff you should need (‘batteries included’), in practice this is arguably not true.

DIY RSA

What project to pick? I decided to have a go at implementing the RSA cryptosystem for asymmetric key cryptography, a pretty fundamental part of modern computer security. This seemed like a good challenge for two reasons:

RSA is not in the Python standard library, and requires various functions which I’d naturally go looking for in external libraries (e.g. finding coprimes).
In debates about national security, it is often said that governments can’t effectively ban encryption because it’s not any particular piece of software, it’s just math. Anyone who knows RSA and how to program can implement it themselves given a general purpose computer. But how many people does that include? Can even I, a non-cryptographer with a basic knowledge of how it works in theory, create an actual working version of RSA?

So as an entirely pedagogical / auto-didactical weekend project I’m going to have a go. I’m sure that it will be highly insecure, but I’ll be happy if I can make something that just about works.

I’m allowing myself access to material describing the RSA system (e.g. lecture slides, the original paper), and to stackexchange for basic Python syntax etc. that I’ve forgotten. But no use of external libraries or peeking at existing Python implementations of RSA.

So far I’ve got the following steps of the key generation process:

Pick two large prime numbers:

We need to pick two large prime numbers at random. To start, we’ll get the user to type some random keys on the keyboard (this will be familiar if you’ve used e.g. GnuPG):

user_entropy = input("please generate some entropy by typing lots of random characters: ")

entropy = 0
# Add up the character codes of everything the user typed
for letter in user_entropy:
    entropy = entropy + ord(letter)

That turns the user input into a single number. Then we need to find the nearest prime number, with two functions:

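def isPrime(num):
    # Numbers below 2 are not prime
    if num < 2:
        return False
    # Any divisor found means num is composite, so stop straight away
    for i in range(2, num):
        if (num % i) == 0:
            return False
    return True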

def find_nearest_prime(num):
    # Search upwards from num; falls through (returning None) at 100000
    while num < 100000:
        if isPrime(num):
            return num
        else:
            num += 1
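
The steps so far produce one number from the typed text, but two primes are needed. Here is a sketch of one possible way to get them, under an assumption that is not in the original code: split the typed string in half and search upward from the sum of each half. The trial division also stops at the square root of the candidate, which is a swapped-in shortcut. The name `primes_from_text` is hypothetical.

```python
import math

def is_prime(num):
    """Trial division, testing divisors only up to the integer square root."""
    if num < 2:
        return False
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True

def next_prime(num):
    """Return the smallest prime that is >= num."""
    while not is_prime(num):
        num += 1
    return num

def primes_from_text(text):
    """Hypothetical helper: derive two distinct primes from typed text."""
    half = len(text) // 2
    # Assumption: each half of the input seeds one prime
    seed1 = sum(ord(c) for c in text[:half])
    seed2 = sum(ord(c) for c in text[half:])
    prime1 = next_prime(seed1)
    prime2 = next_prime(seed2)
    # RSA needs two different primes, so step past a collision
    if prime2 == prime1:
        prime2 = next_prime(prime1 + 1)
    return prime1, prime2
```

Summing character codes gives quite small seeds, so the primes stay tiny unless the user types a great deal. That is fine for a toy like this and nothing more.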

Get N and Φ(N)

Now we have two prime numbers, we multiply them together to get the composite number N which will be the second part of the private and public keys.

n = prime1*prime2

We also need Φ(N) (the ‘totient’):

phi_n = ((prime1-1)*(prime2-1))
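
As a small worked example of those two formulas, here they are with a pair of illustrative primes (61 and 53 are chosen purely for demonstration and are far too small for real use):

```python
# Illustrative values only, not taken from the original script
prime1, prime2 = 61, 53

n = prime1 * prime2                  # the modulus N
phi_n = (prime1 - 1) * (prime2 - 1)  # the totient of N

print(n, phi_n)  # 3233 3120
```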

Get e (public key)

Now we have to find e, the public key component. The easy condition is that it has to be between 1 and Φ(N). The trickier condition is that it has to be coprime with both N and Φ(N). Two numbers are coprime if they have no common factors other than 1. So the first thing we need is a function to find the factors of a number:

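def get_factors(num):
    factors = []
    # Include num itself: otherwise a prime reports no factors at all
    # and would look coprime with its own multiples
    for i in range(2, num + 1):
        if (num % i) == 0:
            factors.append(i)
    return factors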

Now we can write a function to check if two numbers have common factors other than 1:

def isCoprime(num1, num2):
    num1_factors = get_factors(num1)
    num2_factors = get_factors(num2)
    # Coprime means the two factor lists share nothing
    return set(num1_factors).isdisjoint(set(num2_factors))

Now we can write a function to find values for e that will satisfy those conditions:

def find_e(n, phi_n):
    candidates = []
    # e has to lie between 1 and phi_n
    for i in range(3, phi_n):
        if isPrime(i):
            if isCoprime(i, n) and isCoprime(i, phi_n):
                candidates.append(i)
    return candidates

This returns a list of potential values for e which we can pick.
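
Since the standard library is fair game, `math.gcd` gives a much cheaper coprimality test: two numbers are coprime exactly when their greatest common divisor is 1. The sketch below is an alternative to the factor-list approach. The names `is_coprime` and `find_e_gcd` are hypothetical, and it returns the first match instead of a full list. It also drops the requirement that e be prime, since strictly e only needs to be coprime with Φ(N). The check against N is kept to mirror the conditions above.

```python
import math

def is_coprime(a, b):
    """Two numbers are coprime when their greatest common divisor is 1."""
    return math.gcd(a, b) == 1

def find_e_gcd(n, phi_n):
    """Hypothetical helper: return the first e in (2, phi_n) meeting both conditions."""
    for e in range(3, phi_n):
        if is_coprime(e, n) and is_coprime(e, phi_n):
            return e
    return None  # no suitable e exists for this n and phi_n

# Toy values from p = 61, q = 53 (illustrative only)
print(find_e_gcd(3233, 3120))  # 7
```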

Get d (private key)

How about the private key for decrypting messages (d)? This should be the modular multiplicative inverse of e; that means that when multiplied by e, the result should equal 1 under the modulus. My notes say this is equivalent to: e * d = 1 mod N. This is where I’ve run into trouble. My initial attempts to define this function don’t seem to have worked when I tested the encryption and decryption functions on it. It’s also incredibly slow.

At this point I’m also not sure if the problem lies somewhere else in my code. I made some changes and now I’m waiting for it to calculate d again. Watch this space …

[UPDATE: I got it working …]

OK, so it seems like it wasn’t working before because I’d transcribed the multiplicative inverse condition wrong. I had (d * e) % N, but it’s actually (d * e) % Φ(N). So the correct function is:

def find_d(prime1, n):
    # Relies on e and phi_n already being defined at module level
    for i in range(prime1, n):
        if ((i * e) % phi_n) == 1:
            print(i)
            return i
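
The brute-force search above tries every candidate in turn. A faster route to the same d is the extended Euclidean algorithm, sketched below as an alternative and not as the code this project uses. The name `mod_inverse` is hypothetical, and the toy values (e = 17, Φ(N) = 3120) are illustrative. On Python 3.8 and later, the built-in `pow(e, -1, phi_n)` computes the same thing.

```python
def mod_inverse(e, phi_n):
    """Return d such that (d * e) % phi_n == 1, via the extended Euclidean algorithm."""
    # Invariant: old_r == old_x * e (mod phi_n) and r == x * e (mod phi_n)
    old_r, r = e, phi_n
    old_x, x = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    # old_r now holds gcd(e, phi_n); an inverse exists only if it is 1
    if old_r != 1:
        raise ValueError("e and phi_n are not coprime, so no inverse exists")
    return old_x % phi_n

d = mod_inverse(17, 3120)
print(d)  # 2753
```

The loop is iterative, not recursive, so it should not hit Python’s recursion limit even with very large numbers.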

So that’s the basic key generation functions done. I put in some command line interactions to get this all done, and save the keypair to a text file in the working directory.

Encryption and decryption

With the keypairs saved as dictionaries, the encryption and decryption functions are relatively simple:

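def encrypt(pt):
    # Three-argument pow() does modular exponentiation without
    # building the enormous intermediate power first
    return pow(pt, public_key['e'], public_key['n'])

def decrypt(ct):
    return pow(ct, private_key['d'], public_key['n'])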

At the moment, this only works to encrypt integers rather than text strings. There are various ways we could handle encoding text to integers.
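
One of those ways, sketched here as an assumption and not as the approach this project settles on: encode the string as UTF-8 bytes and treat the bytes as numbers. `int.from_bytes` turns a whole string into one integer, but that integer has to be smaller than N. With a toy key (n = 3233, e = 17, d = 2753, all illustrative) the sketch therefore encrypts one byte at a time. All function names below are hypothetical.

```python
def text_to_int(text):
    """Read the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), "big")

def int_to_text(number):
    """Inverse of text_to_int (leading zero bytes are not preserved)."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")

# Toy key built from p = 61, q = 53 (illustrative only)
n, e, d = 3233, 17, 2753

def encrypt_text(text):
    """Encrypt each byte separately, since every byte value is below n."""
    return [pow(byte, e, n) for byte in text.encode("utf-8")]

def decrypt_text(blocks):
    """Decrypt each block back to a byte, then decode the bytes as UTF-8."""
    return bytes(pow(block, d, n) for block in blocks).decode("utf-8")

print(text_to_int("hi"))                    # 26729
print(decrypt_text(encrypt_text("hello")))  # hello
```

Encrypting byte by byte means identical characters always produce identical ciphertext blocks, which is yet another reason this stays firmly in toy territory.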

Functional … just!

So there we go. A day was just enough to put together a minimally functional RSA cryptosystem.

The main issue is that even for pairs of small prime numbers, it takes a while to find e. Key sizes in the 10 to 20 range are pretty quick to compute, but NIST recommends that asymmetric keys be at least 2048 bits. Trying to generate a key this big means leaving the script running for a long, long time.

There are probably loads of ways I could improve the code. Also it would be better to default to a higher value for e. Finally, the default key management is basically nothing (an unencrypted plaintext file).