Monday, September 29, 2014

A fairly random collection of links on Merkle Trees

Just sitting around, hanging out, musing about Merkle Trees...

  • Merkle tree
    Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. Currently the main use of hash trees is to make sure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks.
  • A Certified Digital Signature
    The method is called tree authentication because the computation of H(1,n,Y) forms a binary tree of recursive calls. Authenticating a particular leaf Y(i) in the tree requires only those values of H() starting from the leaf and progressing to the root, i.e., from H(i,i,Y) to H(1,n,Y).
  • Recent Improvements in the Efficient Use of Merkle Trees: Additional Options for the Long Term
    Fractal Merkle Tree Representation and Traversal, shows how to modify Merkle’s scheduling algorithm to achieve a space-time trade-off. This paper was presented at the Cryptographer’s Track, RSA Conference 2003 (May 2003). This construction roughly speeds up the signing operation inherent in Merkle’s algorithm by an arbitrary factor of T, (less than H), at a cost of requiring more space: (2^T times the space).
  • Merkle Signature Schemes, Merkle Trees and Their Cryptanalysis
    The big advantage of the Merkle Signature Scheme is, that the security does not rely on the difficulty of any mathematic problem. The security of the Merkle Signature Scheme depends on the availability of a secure hash function and a secure one-time digital signature. Even if a one-time signature or a hash function becomes insecure, it can be easily exchanged. This makes it very likely that the Merkle Signature Scheme stays secure even if the conventional signature schemes become insecure.
  • Caches and Merkle Trees for Efficient Memory Authentication
    Our work addresses the issues in implementing hash tree machinery in hardware and integrating this machinery with an on-chip cache to reduce the log N memory bandwidth overhead.
  • Protocol specification
    Merkle trees are binary trees of hashes. Merkle trees in bitcoin use a double SHA-256, the SHA-256 hash of the SHA-256 hash of something.

    If, when forming a row in the tree (other than the root of the tree), it would have an odd number of elements, the final double-hash is duplicated to ensure that the row has an even number of hashes.

    First form the bottom row of the tree with the ordered double-SHA-256 hashes of the byte streams of the transactions in the block.

    Then the row above it consists of half that number of hashes. Each entry is the double-SHA-256 of the 64-byte concatenation of the corresponding two hashes below it in the tree.

    This procedure repeats recursively until we reach a row consisting of just a single double-hash. This is the Merkle root of the tree.

  • Amazon's Dynamo
    Merkle trees help in reducing the amount of data that needs to be transferred while checking for inconsistencies among replicas. For instance, if the hash values of the root of two trees are equal, then the values of the leaf nodes in the tree are equal and the nodes require no synchronization. If not, it implies that the values of some replicas are different. In such cases, the nodes may exchange the hash values of children and the process continues until it reaches the leaves of the trees, at which point the hosts can identify the keys that are “out of sync”. Merkle trees minimize the amount of data that needs to be transferred for synchronization and reduce the number of disk reads performed during the anti-entropy process.
  • Cassandra: Using Merkle trees to detect inconsistencies in data
    A repair coordinator node requests Merkle tree from each replica for a specific token range to compare them. Each replica builds a Merkle tree by scanning the data stored locally in the requested token range. The repair coordinator node compares the Merkle trees and finds all the sub token ranges that differ between the replicas and repairs data in those ranges.
  • The Dangers of Rebasing A Branch
    Personally, having studied Merkle Trees and discussed a possible use-case for using git/Merkle Trees as a caching solution, I view git as a entirely immutable structure of your code. Rebases break this immutability of commits.
  • Sigh. "grow-only", "rebase is dangerous", "detached head state is dangerous". STOP. Stop it now.
    git is a bag of commits organized into a tree (a tree of Merkle hash chains). Branches and tags are symbolic names for these. Think of it this way and there's no danger.

    ...

    I didn't need a local branch crutch to find my way around because I know the model: a tree of commits.

    Understanding the model is the key.

    There are other VCSes that also use Merkle hash trees. Internally they have the power that git has.

  • Google's end-to-end key distribution proposal
    Smells like a mixed blockchain/git type approach - which is a good thing. The "super-compressed" version of the log tip sounds like git revision hash. The append-only, globally distributed log is pretty much like a blockchain.
  • Ticking time bomb
    Given only one verified hash in such a system, no part of the data, nor its history of mutation can be forged. "History" can mean which software runs on your computer (TPM), which transactions are valid (Bitcoin), or which commits have been done in a SCM (git, mercurial).

    So git is not magical, it is just a practical implementation of something that works. Any other *general* solution will be based on similar basic principles. Mercurial does this and there is a GPG extension for it.

Wow, that's a lot.

I shall have to find more time to read...

No comments:

Post a Comment