Merkle Trees

Merkle Tree1 is a data structure where every non-leaf node contains the hash of the labels of its child nodes, and the leaves have their own values hashed2. Because of this characteristic, Merkle Trees are used to verify that two or more parties have the same data without exchanging the entire data collection. The following figure shows an example of a Merkle Tree:

                                          ┌───────────────┐
                                          │     Root      │
                        ┌─────────────────│Hash(AA1 + BB2)│───────────────┐
                        │                 └───────────────┘               │
                        │                                                 │
                        │                                                 │
                        │                                                 │
                        │                                                 │
                 ┌─────────────┐                                   ┌─────────────┐
                 │     AA1     │                                   │     BB2     │
            ┌────│Hash(A1 + B1)│──────┐                         ┌──│Hash(C1 + D1)│────────┐
            │    └─────────────┘      │                         │  └─────────────┘        │
            │                         │                         │                         │
            │                         │                         │                         │
      ┌───────────┐             ┌───────────┐             ┌───────────┐             ┌───────────┐
      │    A1     │             │    B1     │             │    C1     │             │    D1     │
    ┌─│Hash(A + B)│─┐         ┌─│Hash(C + D)│─┐         ┌─│Hash(E + F)│─┐         ┌─│Hash(G + H)│─┐
    │ └───────────┘ │         │ └───────────┘ │         │ └───────────┘ │         │ └───────────┘ │
    │               │         │               │         │               │         │               │
    │               │         │               │         │               │         │               │
┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐
│   A   │       │   B   │ │   C   │       │   D   │ │   E   │       │   F   │ │   G   │       │   H   │
│Hash(A)│       │Hash(B)│ │Hash(C)│       │Hash(D)│ │Hash(E)│       │Hash(F)│ │Hash(G)│       │Hash(H)│
└───────┘       └───────┘ └───────┘       └───────┘ └───────┘       └───────┘ └───────┘       └───────┘

Trees created by different machines can be compared as a way to guarantee that each machine has the same data as the other. This is exactly what Amazon’s Dynamo [1] does in its anti-entropy phase3: nodes exchange their Merkle tree roots and compare the root hashes to see if they have the same data; if that’s not the case they proceed by exchanging the inner nodes to see which branches are different and then synchronize accordingly.

Another system that uses Merkle trees is Bitcoin [2]. In Bitcoin, each block includes the Merkle root of all transactions in such block. And somewhat similar to what happens in Dynamo, the Merkle tree in Bitcoin can be used to verify that a given transaction is present in the block4.

All this reading led me to start building a library for creating and manipulating Merkle Trees in Erlang as a proof of concept. So far, it provides a way to create a binary Merkle Tree from {Key, Value} pairs, find which keys a given tree has, and compare two trees to find which keys are different.

References

  • [1] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s Highly Available Key-value Store, 2007, pp. 1–16.
  • [2] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System, pp. 1–9, Mar. 2009.

Notes

  1. Also known as hash tree. 

  2. If the leaf holds a key/value pair you could hash both together. 

  3. If you read the previous post about the Bayou system you might have recognized this name. 

  4. For more information about Merkle trees in Bitcoin you can read the following material: http://chimera.labs.oreilly.com/books/1234000001802/ch07.html#merkle_trees