Pond consists of users and servers. Servers exist in order to provide availability by accepting messages for a user while that user may be offline. Servers may be freely chosen by users and each user may have their own server if they wish.
Pond assumes the existence of an overlay network that prevents a network attacker from learning which servers a user is connecting to. That network is taken to be Tor for the remainder of this document and servers are Tor hidden services. Since a global, passive attacker can deanonymise Tor, that attacker is capable of violating this assumption and breaking Pond. Other attacks on Tor may also break this assumption.
Users are assumed to have trustworthy computers with storage that reliably deletes old information. For more details, see the threat model.
Users communicate with servers (and only with servers, never with other users directly) using the transport protocol. Users communicate with their own server to receive messages, but unlike email, they communicate with the recipient's server in order to send a message.
Although Tor hidden services already provide authentication of the server to the client, we don't rely on that. The transport layer provides integrity, forward secure confidentiality and mutual authentication. It's assumed that the public key of the server is already known to the user and the public key of the user isn't revealed to anyone other than the expected server. (Tor does provide another layer of forward secrecy which means that a compromise of the server's keys does not compromise the public keys of users that connected to it.)
After authentication, the user sends a fixed amount (currently 16KB) of data to the server and the server replies with an identically sized flow. Finally the user sends a secure close message and the connection is complete. Connections are never reused. (The actual messages between the user and the server might be very small, but those messages will be padded to 16KB before being encrypted in order to make the traffic profile constant.)
Users make connections periodically in order to send and receive messages. The time between each connection is exponentially distributed. Each connection is either to the user's own server, to fetch any messages that may be waiting, or to the server of another user, to deliver a message. Connections to the user's own server for fetching messages are authenticated as that user. Connections for delivering messages (whether to a user on the same server or not) are authenticated as fresh, random identities. These connections need not be authenticated, but it's easier to keep the traffic profile identical if they are authenticated with a random key.
From the point of view of the network close to the user, all that is seen are randomly timed connections over Tor, all with identical traffic profiles. That profile is fairly distinctive and would allow the network to learn that a host is using Pond, but the network cannot tell when messages are being sent or received.
From the point of view of the user's server, it receives exponentially distributed connections from the user, over Tor, and can reply with a message when one is waiting. The connections are randomly timed, rather than having a constant period, in order to avoid the server learning when the user is sending messages. If a server saw connections at 11 minutes past the hour, 21 minutes and 41 minutes, it's not too hard to guess when the user connected to another server in order to deliver a message. With an exponential distribution, although there is a slight, probabilistic bias in the timing between two fetches when the user sent a message in between, we can assume that messages are sent by users on a reasonably random basis and so there's no practical use for the bias.
An attacker who can monitor the network of the server and the user can tell when the user is sending messages to another server because the connection from the user doesn't appear in the expected place. This attacker breaks the overlay network assumptions and therefore breaks the security guarantees.
Message encryption follows the lead of OTR. Parties exchange a stream of Diffie-Hellman public values as they exchange messages. We omit OTR's key identifier because that counter reveals information about the number of messages that a pair of users have exchanged. Rather we simply use trial decryption to find the DH values that a particular message is using.
In addition to omitting the key identifier, we also omit OTR's anti-replay counter for the same reason. This allows servers to replay messages to users, although each message has a unique ID and a timestamp. Users have a time-limited anti-replay window within which duplicate messages are suppressed and, outside of that, using the timestamp on the message is viable and abnormally old messages should be highlighted.
Messages are protocol buffers which are padded to a fixed size (currently 16KB less 512 bytes). Messages are encrypted with a DH ratchet (thanks to Trevor Perrin for the design). This means that, if a client chooses not to retain a plaintext copy of an outgoing message, then even the sending client cannot decrypt it. (This is not currently surfaced in the UI however.) The random, 24-byte nonce is prepended to the boxed message. Messages contain within them a ‘next’ Diffie-Hellman public value from the client.
On receipt of a message, the client tries each of the four combinations in order to avoid numbering the keys and having to maintain state that reveals the number of messages exchanged. If the client finds that a message was encrypted using its ‘current’ DH key, the contact has clearly received that key and the client rotates keys and generates a new ‘current’ private key which will be included in future messages to that contact.
In order to facilitate the exchange of messages (and the ratcheting of Diffie-Hellman values) when, semantically, the flow of information may be more unidirectional, Pond encourages messages to be acknowledged. An acknowledgment is a user-initiated action that involves sending an empty reply to a message. This both lets the sender know that their message has been received and allows new Diffie-Hellman values to flow back.
As a matter of custom, users delete messages after a number of days (currently a week). Of course, there is nothing to enforce this in the same way that there's nothing to enforce the custom that OTR IM conversations aren't logged.
Since the bandwidth of this system is so low, a user can trivially be incapacitated by a small number of messages. Because of this, we make the system closed: only authorised users can cause a message to be queued for delivery. This very clearly sets Pond apart from email. There are no public addresses to which a Pond message can be sent. Likewise, it's no longer true that the network is fully connected; if you send a message to two people, they may not be able to reply to each other.
This policy has to be enforced at the servers as they're the parties who accept and enqueue messages. However, we don't want the servers to learn who is sending each message, so we cannot have them authenticate against a list of allowed senders for each user. We could give all allowed senders some shared secret, but then we cannot revoke someone's access without rekeying the entire user's contact list.
So we use the BBS group signature scheme. A server knows a group public key for each user that it accepts mail for. The user provides their contacts with a distinct member private key during key exchange. In order for a message to be delivered to a user's server, the message must be signed by that user's group and the server cannot learn which member of the group signed it.
However, if access needs to be revoked, the user can calculate a revocation for a specific member private key and hand it to the server. In order that the server not then be able to deanonymise all previous messages from the newly revoked contact, the revocation works in such a way that all previous signatures become invalid and each member has to update their private keys given the revocation information. (Except the revoked user, who cannot update their key.) Contacts learn the revocation information when they next contact the user's server and attempt to deliver a message. The delivery will fail, but non-revoked users can update their member private keys and retry successfully.
In order for forward secrecy to be effective, we need a method of deleting expired data so that it cannot be recovered. This is largely an implementation issue. Normal disks work fairly well, but backup schemes can confound secure deletion and so can bad-block remapping. To address the latter as much as possible, we save all state in a single file that is encrypted with a key derived from the user's passphrase and overwritten for every update. This causes large amounts of overwriting as a matter of course and we also smear the random nonce over 32KB of disk, all of which must be recovered in order to decrypt the file.
While such anti-forensic methods are standard, it also appears likely that they are critically flawed in the face of copy-on-write filesystems and log-structured SSDs. Thus, wherever possible, we try to make use of a source of ‘erasure storage’. This is anything that can store a small value and later overwrite it. Currently the only implementation is in the form of TPM NVRAM. The erasure storage value is a random, 32-byte mask that is XORed with the storage key before use. (The storage key is generated using scrypt from the passphrase.)
Because NVRAM has a limited number of update cycles, the mask value is only updated once per day, at most. TPM NVRAM, on top of typical NVRAM, also has the property that it can only be read after presenting the correct key. This stops an attacker from copying the state file and erasure storage value and later attempting to obtain the passphrase. Other types of erasure storage may not have this property however.
Initially both sides exchange fresh Diffie-Hellman public values in the curve25519 group. From the shared secret, sessions keys for each direction are derived as SHA256("client keys\x00" + shared_secret) and SHA256("server keys\x00" + shared_secret). All future messages on the connection are encrypted and authenticated as described in section 9 of the NaCl paper (i.e. NaCl's secretbox) with a counter nonce.
The server calculates the Diffie-Hellman shared secret between the client's ephemeral public value and the server's long-term, Diffie-Hellman public value, which we'll call k. It then sends HMAC_SHA256(key = k, message = "server proof\x00" + SHA256(clients_ephemeral_public + servers_ephemeral_public)).The client is assumed to know the server's long-term key via other means and decrypts and verifies the HMAC. If correct, it sends its long-term, Diffie-Hellman public, as well as a similar HMAC keyed with the shared-secret between the two long-term keys: HMAC_SHA256(key = k2, message = "client proof\x00" + SHA256(clients_ephemeral_public + servers_ephemeral_public + hmac_from_server)). (Note that the client's long-term key may actually be ephemeral if the client wishes to authenticate anonymously.)
At this point, the handshake is complete. The client now sends a padded request to the server, the server replies with an identically padded message and the client finishes by sending an empty message. All these are encrypted and authenticated with NaCl secretbox, as everything since the initial Diffie-Hellman exchange has been.
For details of the higher level protocol, see the protobufs.
It's currently an unanswered question whether timing differences between the home server and other servers will reveal to a network attacker which the user is communicating with and thus when the user is sending messages. Pond sets a random SOCKS5 username in order to request of Tor that different paths be used for every connection but it may be that servers need to delay their replies for a fixed amount of time.
Key exchange between two users involves each creating a KeyExchange protobuf (see protobufs). Since Pond servers demand a group signature before accepting a message for delivery, there are no public Pond addresses that can be used to bootstrap a Pond relationship. Rather we currently assume an external, confidential and authentic means for users to exchange KeyExchange messages.
(This may be an important weakness. One obvious answer is to have servers accept, say, 10% of a user's quota of unsigned ‘introduction’ messages.)
A KeyExchange contains a group member private key (which is why they need to be confidential), as well as the information needed to direct a message to a user: their home server and public identity at that server. They also contain an Ed25519 key and the intent is that, in the future, users will be able to update their home servers by sending messages to each of their contacts. Group member revocations are also be signed by this key.
There are two methods of key exchange supported. The first is manual: the KeyExchange messages are serialised with PEM and the user is expected to exchange them with the intended contact, authentically and confidentially. For example, a pre-existing relationship with PGP or OTR keys could be used to bootstrap Pond.
The second method we have deemed Phrase Automated Nym Discovery Authentication or PANDA for short. PANDA allows a relationship to be established via a shared-secret. This works by using a “meeting point” which, currently, involves contacting a central server over Tor. The shared secret is processed with scrypt into a key and two ‘meeting’ IDs. The meeting point simply acts as a pinboard where messages can be posted to a given meeting ID. If a different message has already been posted to the same ID then it's returned. The meeting point only allows two different messages to exist for a given meeting ID until it's garbage collected (in about a week).
The problem of performing a key exchange given a shared secret falls into the topic of Password Authenticated Key Exchange (PAKE), which is a well studied problem. The first PAKE, EKE, was described by Bellovin and Merritt in 1992. However, we have one additional requirement: that the protocol execute in a single round. A round is defined as a message sent by each party simultaneously, thus a message in a given round cannot depend on any other message in the same round. This requirement arises because the meeting point may have very high latency and we have no way of assigning the labels ‘client’ and ‘server’ to our parties. Many PAKE protocols implicitly assume that the parties can behave differently based on these labels (including the original EKE), but our parties have only a shared secret. An exchange of messages could be used to break this symmetry but the additional latency is undesirable.
I'm most familiar with SPAKE2, but SPAKE2 appears to be unsuitable because it masks Diffie-Hellman values with a mask value raised to the power of the password and requires different mask values for the two parties. Due to our symmetry requirement the two parties would have to use the same value mask. I've proven that the CCDH problem from the SPAKE paper is equally strong when the two mask values are equal, however the mask values appear again in the security proof and it's not obvious how to patch that second hole.
Instead Pond uses EKE2. EKE2 is very similar to SPAKE2 but uses an ideal cipher to encrypt the Diffie-Hellman values rather than masking them with secret group elements.
Our Diffie-Hellman group of choice is curve25519 which results in a 255-bit Diffie-Hellman value, which is larger than the 128-bit block size of most ciphers. Using a typical block cipher mode would result in an attacker having a discomforting amount of control of the resulting plaintext. Using an AEAD mode would give an attacker an authenticator which they could use to mount an offline, brute-force attack on the shared secret. (The attacker may already be able to mount an offline, brute-force attack on the shared secret by using an identifier revealed to the meeting point. But we still don't want to give them another avenue for this attack in case a particular meeting point doesn't require that). A wide-block construction like LIONNESS doesn't appear to be applicable because the 256-bit block is too small to split into usefully sized left and right parts for the Luby-Rackoff construction. Many other wide-block constructions have proposed subsequently including EME, CMC, XCB, HCTR and others. However, the area is a patent minefield.
Instead we instantiate the ideal-cipher for EKE2 with a 256-bit block Rijndael. Although AES specifies only a 128-bit block size, Rijndael was originally specified with 128, 192 and 256-bit block sizes. It's very uncommon to find an Rijndael implementation that implements a 256-bit block size, but since performance isn't a concern and nor are side-channels, a custom, toy Rijndael implementation is sufficient.
So EKE2 is used to go from a shared-secret to an ephemeral, shared-key and then the KeyExchange messages are encrypted with that key and exchanged via the second meeting ID.
The shared secret itself can consist of a passphrase, a time and a deck or two of cards. Since humans are poor at generating and remembering secrets, a deck of cards, shuffled and split in half serves quite well. Decks of cards are also commonly available and don't have the downsides that bringing a mobile device to a meeting might have.
Although the two halves of the deck are different, they are sufficient to define a common, canonical set of cards. For Pond, the cards are ordered and the canonical half is the one with the majority of the lowest numbered card. If both halves have the same number of the lowest numbered card then the next lowest is considered. (If each half has exactly the same cards then both are canonical.)
A single deck of cards results in an approximately 48 bit shared secret and two decks is 97 bits. The stacks of cards are a problem to dispose of (unlike a memory, they can be obtained by dumpster diving) but they only need to be protected until the key exchange is complete. After that they are no longer sensitive.