This page attempts to give a very basic conceptual introduction to cryptographic methods. Before we start, the usual disclaimer:
I am not a cryptographer. This document is only for educational purposes. Crypto is hard, and you should never trust a home-grown implementation. Unless you’re a cryptographer you will probably overlook some crucial details. Developers should only use the high-level functions that have been implemented by an actual cryptographer.
Now that we got this out of the way, let’s start hacking :)
The bitwise XOR operator outputs true only when both inputs differ (one is true, the other is false). It is sometimes called an inverter because a bit in x gets inverted if and only if the corresponding bit in y is true:
# XOR two (8bit) bytes 'x' and 'y'
x <- as.raw(0x7a)
y <- as.raw(0xe4)
z <- base::xor(x, y)
dput(z)
as.raw(0x9e)
# Show the bits in each byte
cbind(x = rawToBits(x), y = rawToBits(y), z = rawToBits(z))
x y z
[1,] 00 00 00
[2,] 01 00 01
[3,] 00 01 01
[4,] 01 00 01
[5,] 01 00 01
[6,] 01 01 00
[7,] 01 01 00
[8,] 00 01 01
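One consequence of this inverting behaviour is that XOR undoes itself: applying the same y a second time restores x. A minimal check in base R, re-using the bytes from the example above (the name back is only illustrative):

```r
# XOR is its own inverse: (x XOR y) XOR y gives x back
x <- as.raw(0x7a)
y <- as.raw(0xe4)

z <- base::xor(x, y)      # "encrypt" x with y
back <- base::xor(z, y)   # apply the same y again

identical(back, x)
```

This property is what lets the same operation serve for both encryption and decryption in the rest of this page.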
In cryptography we xor a message x with secret random data y. Because each bit in y is randomly true with probability 0.5, the xor output is completely random and uncorrelated to x. This is called perfect secrecy. Only if we know y can we decipher the message x.
# Encrypt message using random one-time-pad
msg <- charToRaw("TTIP is evil")
one_time_pad <- random(length(msg))
ciphertext <- base::xor(msg, one_time_pad)
# It's really encrypted
rawToChar(ciphertext)
[1] ".\xf8F\xfe\xa4 \xe8\xb6\035\xb5\xe3\x95"
# Decrypt with same pad
rawToChar(base::xor(ciphertext, one_time_pad))
[1] "TTIP is evil"
This method is perfectly secure, provided that y is truly random and at least as long as the message, and it forms the basis for most cryptographic methods. However, the challenge is generating and communicating unique, truly random y data every time we want to encrypt something. One-time pads as in the example are not very practical for large messages. Also, we should never re-use a one-time pad y for encrypting multiple messages, as this compromises the secrecy.
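To see why re-using a pad is dangerous, here is a small sketch with two made-up messages of equal length encrypted under the same pad. The messages and variable names are assumptions chosen for illustration. XOR-ing the two ciphertexts cancels the pad and leaves the XOR of the two plaintexts:

```r
library(sodium)

# Two messages of equal length, encrypted with the SAME pad (the mistake)
msg1 <- charToRaw("attack at dawn")
msg2 <- charToRaw("retreat at six")
stopifnot(length(msg1) == length(msg2))

pad <- random(length(msg1))
c1 <- base::xor(msg1, pad)
c2 <- base::xor(msg2, pad)

# The pad drops out when the two ciphertexts are combined
leak <- base::xor(c1, c2)
identical(leak, base::xor(msg1, msg2))

# Anyone who knows or guesses one message can now read the other
rawToChar(base::xor(leak, msg1))
```

An eavesdropper never needs to learn the pad itself, which is why each pad must be used exactly once.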
The solution to this problem is a stream cipher. A stream cipher generates a unique stream of pseudo-random data based on a secret key and a unique nonce. For a given set of parameters the stream cipher always generates the same stream of data. Sodium implements a few popular stream ciphers:
password <- "My secret passphrase"
key <- hash(charToRaw(password))
nonce <- random(8)
chacha20(size = 20, key, nonce)
[1] ff 4b ac d7 db b6 10 bf 82 3f 1c 3e 0b 0e 0f 98 40 9c 87 4a
Each stream requires a key and a nonce. The key forms the shared secret and should only be known to trusted parties. The nonce is not secret and is stored or sent along with the ciphertext. The purpose of the nonce is to make each random stream unique, to protect against re-use attacks. This way you can re-use your key to encrypt multiple messages, as long as you never re-use the same nonce.
salsa20(size = 20, key, nonce)
[1] 70 ac 75 1a 61 e9 91 b4 2d 48 6c 2d f8 05 5f ae 6e 1a 29 4c
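Both properties described above can be checked directly: the same key and nonce always reproduce the same stream, while a fresh nonce gives a different one. A minimal sketch (the names nonce_a, nonce_b and s1 to s3 are illustrative):

```r
library(sodium)

key <- hash(charToRaw("My secret passphrase"))
nonce_a <- random(8)
nonce_b <- random(8)

# Same key and same nonce: the stream is reproduced exactly
s1 <- chacha20(size = 20, key, nonce_a)
s2 <- chacha20(size = 20, key, nonce_a)
identical(s1, s2)

# Same key, different nonce: a different stream
# (FALSE unless the two random nonces happen to collide)
s3 <- chacha20(size = 20, key, nonce_b)
identical(s1, s3)
```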
Over the years cryptographers have come up with many more variants. Many stream ciphers are based on a block cipher such as AES: a keyed permutation of a fixed-length block of data. The block cipher is used in a particular mode of operation, which repeatedly applies the cipher’s single-block operation to securely transform amounts of data larger than a block.
We are not going to discuss implementation details, but you could probably come up with something yourself. For example, you could use a hash function such as sha256 in place of the block cipher and append a counter which is incremented for each block (this is called CTR mode).
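# Illustrative example only: do not use this in practice.
sha256_ctr <- function(size, key, nonce){
  n <- ceiling(size/32)  # sha256 yields 32 bytes per block
  output <- raw()
  for(i in seq_len(n)){
    # Encode the block counter as 4 raw bytes
    counter <- packBits(intToBits(i))
    block <- sha256(c(key, nonce, counter))
    output <- c(output, block)
  }
  return(output[seq_len(size)])
}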
This allows us to generate an arbitrary length stream from a single secret key:
password <- "My secret passphrase"
key <- hash(charToRaw(password))
nonce <- random(8)
sha256_ctr(50, key, nonce)
[1] 24 88 a6 d9 50 e4 08 c2 03 c2 56 15 55 95 1a 57 53 a0 d6 dd 38 24 37 dc 79
[26] 14 56 78 64 45 19 e8 1c d8 a1 40 4f d7 57 77 96 99 4f 4a 89 83 0e 4a 2f 45
In practice, you should never write your own ciphers. In the remainder we just use the standard Sodium ciphers: chacha20, salsa20 or xsalsa20.
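The three ciphers are called the same way, but they do not take the same nonce size. As far as I know from the sodium documentation, chacha20 and salsa20 expect an 8-byte nonce and xsalsa20 a 24-byte nonce. A short sketch (the names s_chacha, s_salsa and s_xsalsa are illustrative):

```r
library(sodium)

key <- hash(charToRaw("My secret passphrase"))

# chacha20 and salsa20: 8-byte nonce; xsalsa20: 24-byte nonce
s_chacha <- chacha20(32, key, random(8))
s_salsa  <- salsa20(32, key, random(8))
s_xsalsa <- xsalsa20(32, key, random(24))

# Each call returns a raw vector of the requested size
lengths(list(s_chacha, s_salsa, s_xsalsa))
```

The longer nonce of xsalsa20 makes it safer to pick nonces at random, since accidental collisions become far less likely.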
Symmetric encryption means that the same secret key is used for both encryption and decryption. All that is needed to implement symmetric encryption is xor and a stream cipher. For example, to encrypt an arbitrary length message using a password:
# Encrypt 'message' using 'password'
myfile <- file.path(R.home(), "COPYING")
message <- readBin(myfile, raw(), file.info(myfile)$size)
passwd <- charToRaw("My secret passphrase")
A hash function converts the password to a key of suitable size for the stream cipher, which we use to generate a pseudo-random stream of equal length to the message:
# Basic secret key encryption
key <- hash(passwd)
nonce8 <- random(8)
stream <- chacha20(length(message), key, nonce8)
ciphertext <- base::xor(stream, message)
Now the ciphertext is an encrypted version of the message. Only those who know the key and the nonce can re-generate the same keystream in order to xor the ciphertext back into the original message.
# Decrypt with the same key
key <- hash(charToRaw("My secret passphrase"))
stream <- chacha20(length(ciphertext), key, nonce8)
out <- base::xor(ciphertext, stream)
# Print part of the message
cat(substring(rawToChar(out), 1, 120))
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
The Sodium functions data_encrypt and data_decrypt provide a more elaborate implementation of the above. This is what you should use in practice for secret key encryption.
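A minimal sketch of these two functions follows. It assumes, as I understand the package, that data_encrypt attaches the nonce to its result as an attribute and that data_decrypt raises an error when the ciphertext has been modified. The message and the name tampered are made up for illustration.

```r
library(sodium)

key <- hash(charToRaw("My secret passphrase"))
msg <- charToRaw("a short secret message")

# Encrypt; a random nonce is generated and attached to the result
cipher <- data_encrypt(msg, key)
nonce <- attr(cipher, "nonce")

# Decrypt with the same key and nonce
out <- data_decrypt(cipher, key, nonce)
identical(out, msg)

# Flip a single bit of the ciphertext: decryption should be refused
tampered <- cipher
tampered[1] <- base::xor(tampered[1], as.raw(0x01))
result <- tryCatch(data_decrypt(tampered, key, nonce),
                   error = function(e) "rejected")
result
```

Unlike the plain xor construction above, this also detects tampering, which is one reason to prefer it in practice.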
Symmetric encryption can be used for e.g. encrypting local data. However because the same secret is used for both encryption and decryption, it is impractical for communication with other parties. For exchanging secure messages we need public key encryption.
Rather than using a single secret key, asymmetric (public key) encryption requires a keypair, consisting of a public key for encryption and a private key for decryption. Data that is encrypted using a given public key can only be decrypted using the corresponding private key.
The public key is not confidential and can be shared on e.g. a website or keyserver. This allows anyone to send somebody a secure message by encrypting it with the receiver's public key. The encrypted message will only be readable by the owner of the corresponding private key.
# Create keypair
key <- keygen()
pub <- pubkey(key)
# Encrypt message for receiver using his/her public key
msg <- serialize(iris, NULL)
ciphertext <- simple_encrypt(msg, pub)
# Receiver decrypts with his/her private key
out <- simple_decrypt(ciphertext, key)
identical(msg, out)
[1] TRUE
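To illustrate that only the matching private key works, the sketch below tries to decrypt with an unrelated keypair. It assumes that simple_decrypt raises an error on failure, which is how I understand the package to behave. The names receiver and stranger are hypothetical.

```r
library(sodium)

receiver <- keygen()
stranger <- keygen()

msg <- charToRaw("for the receiver only")
ciphertext <- simple_encrypt(msg, pubkey(receiver))

# The intended receiver recovers the message
identical(simple_decrypt(ciphertext, receiver), msg)

# An unrelated private key should fail; the error is caught and mapped to NULL
attempt <- tryCatch(simple_decrypt(ciphertext, stranger),
                    error = function(e) NULL)
is.null(attempt)
```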
How does this work? Public key encryption makes use of Diffie-Hellman (D-H): a method which allows two parties that have no prior knowledge of each other to jointly establish a shared secret key over an insecure channel. In the simplest case, both parties generate a temporary keypair and exchange their public keys over the insecure channel. Then both parties use the D-H function to calculate the (same) shared secret key by combining their own private key with the other person’s public key:
# Bob generates keypair
bob_key <- keygen()
bob_pubkey <- pubkey(bob_key)
# Alice generates keypair
alice_key <- keygen()
alice_pubkey <- pubkey(alice_key)
# After Bob and Alice exchange pubkey they can both derive the secret
alice_secret <- diffie_hellman(alice_key, bob_pubkey)
bob_secret <- diffie_hellman(bob_key, alice_pubkey)
identical(alice_secret, bob_secret)
[1] TRUE
Once the shared secret has been established, both parties can discard their temporary keypairs and use the shared secret to start encrypting communications with symmetric encryption as discussed earlier. Because the shared secret cannot be calculated using only the public keys, the process is safe from eavesdroppers.
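Putting the two pieces together, here is a sketch in which the D-H output is used directly as the key for data_encrypt. This is a simplification for illustration: real protocols usually pass the shared secret through a hash or key derivation function first. The message is made up.

```r
library(sodium)

alice_key <- keygen()
bob_key <- keygen()

# Each side combines its own private key with the other's public key
alice_secret <- diffie_hellman(alice_key, pubkey(bob_key))
bob_secret <- diffie_hellman(bob_key, pubkey(alice_key))

# Alice encrypts with her copy of the shared secret
msg <- charToRaw("Hello Bob")
cipher <- data_encrypt(msg, alice_secret)

# Bob decrypts with his copy; the nonce is not secret and travels along
out <- data_decrypt(cipher, bob_secret, attr(cipher, "nonce"))
rawToChar(out)
```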
The classical Diffie-Hellman method is based on the discrete logarithm problem with large prime numbers. Sodium uses curve25519, a state-of-the-art D-H function by Daniel Bernstein, designed for use with the elliptic curve Diffie–Hellman (ECDH) key agreement scheme.
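For intuition, the classical method can be sketched in base R with toy numbers. The values p = 23 and g = 5 and both private exponents are textbook-sized assumptions and far too small to be secure. The helper mod_pow is written here for illustration and is not part of Sodium.

```r
# Modular exponentiation by repeated squaring: (base^exp) mod m
mod_pow <- function(base, exp, m) {
  result <- 1
  base <- base %% m
  while (exp > 0) {
    if (exp %% 2 == 1) result <- (result * base) %% m
    base <- (base * base) %% m
    exp <- exp %/% 2
  }
  result
}

p <- 23  # public prime modulus (toy size)
g <- 5   # public generator

a <- 6   # Alice's private exponent
b <- 15  # Bob's private exponent

A <- mod_pow(g, a, p)  # Alice's public value
B <- mod_pow(g, b, p)  # Bob's public value

# Each side raises the other's public value to its own private exponent
alice_shared <- mod_pow(B, a, p)
bob_shared <- mod_pow(A, b, p)
alice_shared == bob_shared
```

An eavesdropper sees only p, g, A and B. Recovering a or b from these is the discrete logarithm problem, which is easy for numbers this small but believed to be infeasible for properly chosen large primes.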