Dat Protocol
DEP-0010: Hypercore Wire Protocol
Title: DEP-0010: Hypercore Wire Protocol
Short Name: 0010-wire-protocol
Type: Standard
Status: Draft (as of 2019-02-27)
Github PR: Draft
Authors: Paul Frazee, Bryan Newbold
Summary
This DEP describes the Hypercore wire protocol: a transport-agnostic message stream spoken between nodes in a swarm of hypercore network peers (including Dat clients). The wire protocol includes mechanisms for framing, stream encryption, and feed key authentication.
Motivation
The protocol described here is already in use as of 2017 (by Hypercore, Dat, and Beaker Browser users), and was partially described in an earlier whitepaper. This document fills in some additional details.
Stream Connections
The Dat wire protocol depends on the use of a binary transport protocol which provides the following semantics:
- reliable delivery (no dropped or partial messages)
- in-order delivery of messages
Two notable transport protocols that satisfy these requirements are TCP and µTP (the "Micro Transport Protocol").
Peers wishing to connect need to discover each other using some mechanism or another (see forthcoming DEPs on some options; this process is modular and swappable), and both need to have the public key for the primary hypercore they wish to exchange.
Messages are framed by the Dat protocol itself (see Messages section for details).
Channels
Multiple hypercore feeds can be synchronized over the same protocol connection. Messages pertaining to the separate channels are tagged with an id for disambiguation.
Note that at least one feed is necessary for each connection (for handshaking to succeed), and that the first feed is the one used for discovery and as an encryption key.
To initiate a new channel (after the primary is established), a peer can send a Feed message, followed by an Info message. Unlike the first message sent on an overall connection, later Feed messages are encrypted. Either party may initiate a new channel with a Feed message at any time.
Handshake Procedure
A handshake procedure needs to occur for each feed on a channel; the first part of the first handshake happens in cleartext and both validates discovery keys and establishes encryption parameters used for the rest of the connection. The first (primary) channel has id=0
.
The first (cleartext) message is a Feed message, and includes two fields: a nonce and a discovery key.
The nonce is generated by each peer as a random 32-byte sequence.
The discovery key is generated from the public encryption key for a hypercore feed (in this case the first, or "primary" feed) by using the public key to perform a keyed hash of the 9-byte ASCII string "HYPERCORE" (with no trailing NULL byte) using the BLAKE2b keyed hashing algorithm (provided by most BLAKE2b hash implementations). The discovery key is 32 bytes long.
Example of generating a discovery key in pseudo-code:
discoveryKey = BLAKE2b(message = 'HYPERCORE', key = publicKey, outputLengthBytes = 32)
The discovery key is used in cleartext instead of the public key to avoid leaking the public key to the network; read access to hypercore feeds (including Dat archives) is controlled by limiting access to public keys.
When the connection is first opened, the connecting peer sends their Feed message. The receiving peer checks that the discovery key was expected (eg, that they know of a public key matching that discovery key and are willing to synchronize the feed associated with that key). If so, they reply with their own Feed message. If not, they drop the connection.
Once Feed messages are exchanged, both peers have all information they need to encrypt all further content on the channel, and do so (see below for details). The second part of the handshake is to exchange Handshake messages, which set some parameters for the channel. Handshakes also include the self-identified peer id, which can be used to detect accidental self-connections or redundant connections to the same peer (eg, over different transports). The peer id is typically random and is not authenticated.
Encryption Scheme
After the first Feed messages are exchanged (one message in each direction, in cleartext), all further bytes exchanged over the channel are encrypted.
Framing metadata (aka, message length and type) is encrypted, but a third party could likely infer message lengths (and thus potentially message types) by observing packet sizes and timing; no padding is applied at the protocol layer.
The encryption scheme used is libsodium's stream primitive, specifically the XSalsa20 cipher. The cipher is fed a shared key (the primary hypercore feed public key), a nonce (selected by the sender and exchanged during handshake), and a block offset (representing all encrypted bytes sent on the connection in this direction so far).
The specific libsodium function used is crypto_stream_xsalsa20_xor_ic()
. Some interfacing code is necessary to process messages that don't align with the cipher's 64-byte chunk size; unused bytes in any particular chunk can be ignored (or, as an optimization, cached for use with the next message). For example, if 1000 encrypted bytes had been sent on a connection already, and then a new 50 byte message needed to be encrypted and sent, then one would offset the message by 1000 % 64 = 40
bytes and XOR the first 24 bytes against block 15, then XOR the remaining 26 bytes against block 16. The bytes would be shifted back and recombined before sending, so only 50 bytes would go down the connection; the same process would be followed by the receiver.
Want/Have Procedure
The wire protocol is designed to be efficient when syncing extremely large Hypercore feeds (ie millions of blocks in length). Therefore a procedure exists for each peer to indicate which subset of the feed they would like to synchronize, and for the remote to then announce which blocks within those subsets they possess. This is the Want/Have Procedure.
By default, a peer wants no blocks. It can add or remove wanted block using the Want and Unwant messages.
When new wanted blocks are made available, the peer should react by sending a Have message. Likewise, when wanted blocks are no longer available, the peer should react by sending an Unhave message. Finally, any time a Want message is received, the peer should react by sending a Have message.
Request Procedure
After the Want/Have Procedure, a peer will know what Hypercore feed blocks are available on the remote, and can then send Request messages for the individual blocks. Each Request specifies which block it needs, and also specifies some information about Merkle tree proof nodes which should be included. In reaction, the remote sends a Data message with the requested content.
At present, Request messages can only specify one block at a time. This is to encourage an equal distribution of requests between multiple connected peers. However, multiple requests can be sent in parallel.
Extension Procedure
To allow for experimentation within userland, the wire protocol supports an extension process. All extensions are identified by strings, and sent in an array in the Handshake message. Both peers must declare an extension in the handshake to consider it 'supported'.
Each extension is capable of sending custom payloads through the Extension message type.
Message Details
The connection between peers is an endless stream of bytes, so each message must be "framed" so the recipient knows when it starts and ends. The wire framing format is <len>(<header><message>)
. len
is a varint with the number of bytes in the following message (the sum of the header
and message
). header
is a varint, of form channel << 4 | <4-bit-type>
. Note that in most cases the header
varint will be a single byte, but clients should treat it as a varint to accommodate large channel counts.
Messages are encoded (serialized) using Google's protobuf encoding.
type code | Name |
---|---|
N/A | Keep-Alive |
0 | Feed |
1 | Handshake |
2 | Info |
3 | Have |
4 | Unhave |
5 | Want |
6 | Unwant |
7 | Request |
8 | Cancel |
9 | Data |
15 | Extension |
Keep-Alive
A message of body length 0 (giving a total message size of 1 byte for the len
varint) is a keep-alive. Peers must always handle keep-alive messages correctly (aka, ignore them), regardless of transport.
Depending on transport and application needs, peers may optionally send keep-alive messages to help detect and prevent connection loss. Implementors are free to chose their own period or strategy for sending keep-alives. A reasonable default period is 300 seconds (5 minutes).
Feed
type=0
Should be the first message sent on a channel. Establishes the content which will be exchanged on the channel. See the Channels section for details.
message Feed {
required bytes discoveryKey = 1;
optional bytes nonce = 2;
}
Handshake
type=1
Overall connection handshake. Should be sent just after the Feed message on the first channel only.
Some notes on the fields:
- id Typically a 32-byte identifier which is allocated randomly by a peer at the start of its swarming session. Not authenticated. Used to help detect multiple connections to a single peer.
- live If both peers set to true, the connection will be kept open indefinitely.
- userData An open field for sending data which you can retrieve on handshake. No value is prescribed by the protocol.
- extentions A list of strings identifying additional message-types which are supported via the Extension message.
- ack A peer (the "sender") can set this flag to request that every time another peer (a "receiver") gets a
Data
message from the sender, that they reply with an explicitHave
message once they have successfully persisted the data in storage. This lets the "sender" track how many persisted copies of a chunk there are.
message Handshake {
optional bytes id = 1;
optional bool live = 2;
optional bytes userData = 3;
repeated string extensions = 4;
optional bool ack = 5;
}
Info
type=2
Indicates state changes in a channel. The initial state for uploading & downloading is true
. If both ends are not downloading and not live, it is safe to consider the channel ended.
message Info {
optional bool uploading = 1;
optional bool downloading = 2;
}
Have
type=3
Announces the availability of Hypercore feed blocks for download. If no bitfield
is present, the start
and length
represent a continuous set of blocks. If bitfield
is present, it will convey the availability of individual blocks, offset from the start
. For information about the encoding of bitfield
, see "Run-Length Encoded Bitfield" below.
The Have message is sent in the following contexts:
- In response to a Want message.
- To announce new blocks which have been added within a remote-wanted range.
message Have {
required uint64 start = 1;
optional uint64 length = 2 [default = 1]; // defaults to 1
optional bytes bitfield = 3;
}
Unhave
type=4
Announces the loss of availability of Hypercore feed blocks for download. The start
and length
represent a continuous set of blocks.
This message is sent in the following contexts:
- To inform the peer that data which was sent was rejected, for instance because it was not requested and pushes are not being allowed.
- To announce blocks have been removed from local storage within a remote-wanted range.
message Unhave {
required uint64 start = 1;
optional uint64 length = 2 [default = 1]; // defaults to 1
}
Want
type=5
Announces a range of Hypercore feed blocks which the peer would like to receive Have/Unhave messages about. Remote should respond with a Have message which describes which of the wanted blocks are available, and should send additional Have/Unhave messages if availability within the wanted range changes.
message Want {
required uint64 start = 1;
optional uint64 length = 2; // defaults to Infinity or feed.length (if not live)
}
Unwant
type=6
Announces a range of Hypercore feed blocks which the peer would no longer like to receive Have/Unhave messages about.
message Unwant {
required uint64 start = 1;
optional uint64 length = 2; // defaults to Infinity or feed.length (if not live)
}
Request
type=7
Request data from the remote. The remote should be react by sending a Data message.
The fields are as follows:
- index The Hypercore feed block being requested.
- bytes An alternative to
index
, a byte offset into the Hypercore feed. Instructs the remote to send the block at the givenbytes
offset. - hash If true, only send the nodes, not the block data.
- nodes Communicates which parent and uncle nodes should be included in the Data response. See "Block Tree Digest" (below) for an explanation of the encoding.
message Request {
required uint64 index = 1;
optional uint64 bytes = 2;
optional bool hash = 3;
optional uint64 nodes = 4;
}
Cancel
type=8
Cancel a Request message sent earlier. The remote should react by aborting any active or queued Request message which matches the index
, bytes
, and hash
parameters.
message Cancel {
required uint64 index = 1;
optional uint64 bytes = 2;
optional bool hash = 3;
}
Data
type=9
Send a Hypercore feed block. May contain the data of the block, and may contain the hashes of the ancestor nodes in the merkle tree. Should only be sent in reaction to a Request message.
When a Data message is received, its node hashes and signature should be verified against any local tree information. If the nodes can be verified, they and the block data may be stored locally.
If a Data message is received for a block which was not requested, the peer can react by ignoring the data and sending an Unhave message in response. This will inform the remote that the data was rejected, and is not stored.
message Data {
message Node {
required uint64 index = 1;
required bytes hash = 2;
required uint64 size = 3;
}
required uint64 index = 1;
optional bytes value = 2;
repeated Node nodes = 3;
optional bytes signature = 4;
}
Extension
type=15
Send a custom message.
The user-type
is an index into the extensions
array sent in the Handshake. For instance, if two extensions were declared ['foo', 'bar']
then the user-type
of 'foo'
would be 0
and of 'bar'
would be 1
.
The payload
may be any message content, as needed by the extension.
<varint user-type><payload>
Run-Length Encoded Bitfield
The Run-Length Encoded (RLE) bitfield is a series of compressed and uncompressed bit sequences.All sequences start with a header that is a varint. If the last bit is set in the varint (it is an odd number) then a header represents a compressed bit sequence. If the last bit is not set then a header represents an non compressed sequence.
compressed-sequence = varint(byte-length-of-sequence << 2 | bit << 1 | 1)
uncompressed-sequence = varint(byte-length-of-bitfield << 1 | 0) + bitfield
Block Tree Digest
As described in DEP-0002 (Hypercore), a peer should be able to verify both the integrity of received data (aka, was there corruption somewhere along the way, detected via hash) and the authenticity (aka, is this the data from the original writer, detected via signature). Hypercore transmits the hash for every data block, but only signatures for the root hashes of Merkle trees, not individual block hashes, which means a peer may need additional hashes (for data blocks they do not have a copy of) if they want to verify the signatures of individual blocks.
Redundantly transmitting all such hashes on every request would be inefficient if the receiver already had some hashes, so requesting peers can specify which hashes they need in the nodes
field of the Request
message. Instead of sending an array of node indexes, the nodes
field is a compact bitfield (serialized as a uint64
), indicating for each "uncle" and "parent" whether the hash should be transmitted.
Consider the following tree:
0
1
2
3
4
5
6
If the receiving peers wants to fetch, and verify, block 6, it needs to communicate whether it already has the uncle hashes (4, 1) and the parent hashes (3). This information can be compressed into a small bit vector with the following scheme:
- the least-significant bit indicates whether the most-significant bit is a "parent" (if '1') or an "uncle" (if '0')
- all other bits, in order from least- to most-significant, indicate whether the corresponding "uncle" hash does need to be transmitted (if bit '0') or does not (if bit '1')
As an example, suppose we want to fetch block 6 from a remote peer, and we already have the sparse node metadata (hashes) for blocks 3 and 4, but not 1. In other words:
- 4, an uncle, we already have the hash
- 1, next uncle, we don't have hash
- 3, a parent (and root hash), we do have hash
- root signature (of hash 3) will be sent with request
Our Request
should have the following nodes
bitfield (0b1011
encoded in the least-significant bits of a uint64
):
1011
│││└── indicates that the most-significant bit is a parent, not an uncle
││└─── do not send hash for the first uncle, 4
│└──── do send hash for the next uncle, 1
└───── do not send hash for next parent, 3
Using the index
field of the Request
message, the receiving peer can calculate the number of packed bits and extract the bitfield. The "most-significant bit" referenced above is of just the fixed-size bitfield, not the uint64
as a whole.
From this, the remote peer will know to only send one hash (for block 1) for us to verify block 6. Note that we (the receiver) can calculate the hash for block 6 ourselves when we receive it.
As a special case, the bit vector 1
(only contains a single one) means that the sender should not send any hashes at all.
These digests are very compact in size. Only (log2(number-of-blocks) + 2) / 8
bytes are needed in the worst case. For example if you are sharing one trillion blocks of data the digest would be (log2(1000000000000) + 2) / 8 ~= 6
bytes long (which fits in a single uint64
). This scheme works for Hypercores with up to 2^62 = 4,611,686,018,427,387,904
blocks.
Examples
Simple Download
Alice has an e-book identified by a public-key (PK) which Bob would like to download. The e-book also has a discovery-key (DK). Bob connects to Alice, and the following exchange occurs:
BOB: sends Feed (unencrypted) {discoveryKey: DK, nonce: BobNonce}
BOB: sends Handshake {id: BobID}
ALICE: sends Feed (unencrypted) {discoveryKey: DK, nonce: AliceNonce}
ALICE: sends Handshake {id: AliceID}
BOB: waits for Feed
BOB: waits for Handshake
ALICE: waits for Feed
ALICE: waits for Handshake
BOB: sends Want {start: 0}
ALICE: receives Want {start: 0}
ALICE: sends Have {start: 0, bitfield: ...}
BOB: receives Have {start: 0, bitfield: ...}
BOB: sends Request (for block 0)
ALICE: receives Request (for block 0)
ALICE: sends Data (for block 0)
BOB: receives Data (for block 0)
BOB: sends Request (for block 1)
ALICE: receives Request (for block 1)
ALICE: sends Data (for block 1)
... (repeat for all other blocks) ...
BOB: sends Info {downloading: false}
ALICE: closes connection
Privacy and Security Considerations
All security and privacy guarantees of this protocol implicitly depend on the soundness of the underlying cryptographic primitives, algorithms, and implementations, which include BLAKE2b and XSalsa20.
A connecting peer, or any observer able to decrypt traffic, may be able to infer the following from protocol traffic:
- what subset of content the local peer already has access to (via Have message)
- what subset of content the local peer is interested in (via Want and Request messages)
A "persistent" (all-seeing) observer can infer the following with reasonable confidence, using only protocol traffic:
- which peers (identified by IP address, timing, topology, or other network metadata) have connected which other peers
- the relative flow of content (which peer provided data to the other, and how much)
- which primary hypercore feeds which peers have knowledge of (identified by discovery key, not public key)
- which peers have the public key for a given discovery key (based on whether connections succeed)
Such an observer can not determine the specific content of hypercore feeds, or (with confidence) which peer (or peers) have write access to a feed.
The wire protocol does not pad message sizes. This means that a persistent observer could potentially identify traffic content by inferring message sizes.
The hypercore protocol does not intentionally identify peers across connections, but it has been shown in other network protocols (like HTTP) that even the smallest amount of identifying metadata can be used, statistically, to track and surveil users. These techniques are sometimes called "device fingerprints", "browser fingerprints", or "permacookies". Hypercore protocol users should not consider themselves immune to such tracking without specific additional effort to identify and mitigate such fingerprints.
Any network observer with the public key can fully decrypt the network traffic of any and all peers establishing a connection using that key. This includes channel content other than the first (discovery) channel. Peers should not consider extension messages, "Have"/"Want" metadata, or any other messages or metadata private from other peers (or observers) with the public key.
The wire protocol encryption provides message secrecy (from parties who do not have the public key), but does not guarantee any form of authenticity. In the case of Dat and hypercore, the application layer itself verifies authenticity of transferred content using hashes and signatures. However, implementors should note that network agents who can manipulate traffic can modify data in flight without detection, regardless of whether they have the feed public key. However, peers are already not trusted parties, and thus implementors must already take care to treat protocol messages as potentially hostile input.
The issues of observer decryption and message authenticity may be addressed in a future revision of the wire protocol.
References
"Dat - Distributed Dataset Synchronization And Versioning" by Maxwell Ogden, Karissa McKelvey, and Mathias Buus Madsen. Whitepaper, May 2017 (pdf).
"The BitTorrent Protocol Specification (BEP #3)", by Bram Cohen. January 2008 (html)
"uTorrent transport protocol (BEP #29)", by Arvid Norberg. June 2009 (html)
"Merkle hash torrent extension (BEP #30)", by Arno Bakker. May 2009. (html)
"Updating Torrents Via DHT Mutable Items (BEP #46)", by Luca Matteis. July 2016. (html)
"Rarest First and Choke Algorithms Are Enough", by Arnaud Legout, G. Urvoy-Keller, and P. Michiardi. October 2006. (pdf)
"Extending the Salsa20 nonce", by Daniel J. Bernstein. February 22011. (pdf)
libsodium is a fork of NaCl (the "Networking and Cryptography library", developed by Daniel J. Bernstein). More information available from the NaCl website and libsodium website. Specific information about the BLAKE2b hash function available from the BLAKE2 website.
Google Protocol Buffers Documentation (website)
Unresolved questions
- Encryption might not make sense in some contexts (eg, IPC, or if the transport layer is already providing encryption). Should this DEP recognize this explicitly? Does not need to be addressed before Draft status.
- There is a potential race condition with channel index numbers. If each peer sends a new Feed message on a new channel at the same time (aka, before the remote message is received), what should peers do? Probably ignore the channel and try again. Possibly channel indices should go even/odd depending on the peer proposing to prevent conflicts. Does not need to be resolved before Draft status.
Changelog
The Dat wire protocol was initially described in the 2016 white paper referenced above.
- 2018-03-04: Early "WIP" draft circulated on github
- 2019-01-19: First complete draft submitted for review
- 2019-02-27: Accepted as Draft