Sunday, December 18, 2016

The one-time pad

A long, long time ago, on a blog right, right here, I briefly mentioned that I wanted to write a blog post about the one-time pad. Today is the day that I do so. 

The one-time pad is an elementary part of cryptography and I feel that every computer scientist should know about it. Don't worry, though, I'll keep this understandable for just about everyone. In fact, the one-time pad is very simple and could probably be explained to a seven-year-old.

Let's say that you have written a love letter that you want to be readable only by the target of your affections. In order to hide its content you take piece of paper with completely random letters, which we will call the key.

Now you take the first letter of your love letter and remember its position in the alphabet. Then, you take the first letter of the key and also remember its number. You add the two, and if the total is higher than 26, subtract 26. You end up with a number that can once again be translated back to a letter. This will be the first letter of your encrypted message. 

You do this for the second letter as well. And then for the third. In fact, you keep doing this until you have encrypted the entire love letter. You send this encrypted message to your beloved, who must also have a copy of the key; only people with the key can read your message. 

Do note that you do not loop around on the key. This means that the key must be at least as long as the message itself. Moreover, you cannot use this key again (or at least the part of the key you used, if the key is longer than the message, you can use the remainder it next time). This is really an important part of the strength of the one-time pad. 

In fact, the strength of the one-time pad is such that it is unbreakable. Actually, it has been mathematically proven that it cannot be broken without knowledge of the key. The proof goes something like this: there is any number of possible decrypted messages that the encrypted message could represent. Without any knowledge of the key, each of those is exactly equally likely to be the original message. This means that you can't discern between them in any way. Some might make sense while others may not, but there is absolutely no way to tell which of the decryptions that does make sense is the intended one.

It should be noted that in the way we did this above, we are "leaking" a lot of information because we are not encrypting spaces and punctuation, which compromises the strength of the encryption. There are of course ways to encrypt these as well. In fact, there is no reason we should stick to the traditional alphabetical encryption. Instead of using letters as a basis for the encryption, you could use "bytes of data represented in binary" and instead of asking the key to the message you could use a "bitwise xor". This way you would end up with system that has the same mathematical properties, makes more sense for computers and can represent any type of data. 

Though the one-time pad is very strong, it is not used all that much. This is because it isn't very practical. You need to share a key with the recipient, which needs to be as long as your message. If you can do this securely, why wouldn't you just send the message instead? With that in mind, the only practical use of the one-time pad is that you can share the key at a moment when secure communications are possible and send the message later, when they aren't. 

While the one-time pad may be the ultimate in symmetric encryption, most of the encryption we do today is actually asymmetric. This is when there are different keys for encryption and decryption. One of the two can be public knowledge while the other is kept secret. In order to sign something and prove that you have written it (and nobody has changed anything about it) you use a secret key to encrypt and a public key to decrypt your "signature". In order to make sure there are no eavesdroppers, you use a encryption key that is freely available that has a corresponding decryption that only the intended recipient has.

Using one-time pads isn't entirely trivial. First of all, your key needs to be truly random. Any kind of pattern in it reduces the strength of the encryption. True randomness actually isn't easy, especially not in the large amounts needed for a one-time pad. Computers actually often simulate randomness, which really doesn't cut it for this purpose. 

The second problem is sharing the key. We already mentioned this, but the problem here becomes that you may not know how much data you will want to send later on. This means that you might need to share a very, very long key just to be sure you can send enough long messages later on. And having a long key lying around is a problem in itself, as a stolen key loses all its value...

Finally, you really really really can't reuse (parts of) a key. This really can't be stressed enough. During the Second World War, the Germans actually used one-time pads. Except... they reused keys and thus the encryption wasn't unbreakable and was in fact broken. Because of this, the Allies were able to listen in on their communications...

Allright, that is about all that I feel everyone ought to know about the one-time pad. I do intend to write about the ways I would use pads in different situations some other time, but I don't yet know when that will be. The situations I am thinking of would be modern day espionage and and an interstellar empire, but nothing is set in stone yet.

No comments: