Exploring Blockchain State Management Through Merkle Tries

1 QniCDbVAOa8JM2Z jiyAMA 1

Understanding Merkle Tries for Blockchain State Management

Every blockchain has some state associated with it. That state can be thought of as a collection of key/value pairs. This state might represent something simple, like how many tokens each account has. The state might represent something more complicated like the code of a smart contract. No matter what this state represents, everyone on the blockchain has to reach agreement. If I thought that you had ten tokens, but you thought that you had a million tokens, we obviously couldn’t get much done until we figured out which of us was right.

Unfortunately reaching agreement about the state is quite a big problem. One issue is how to set things up so that bad participants can never trick trustworthy participants into accepting a bad state. The other issue is that the state is very large. How can participants reach agreement quickly if even transmitting the state takes a long time? In this article we will talk about one possible solution to the latter issue, the Merkle trie.

What is a Merkle Trie?

A Merkle trie is a data structure that combines the features of two other data structures. The first is a Merkle tree and the second is a radix trie.

What is a Merkle Tree?

A Merkle tree is a special type of tree. The special feature of Merkle trees is that they generate “fingerprints” for each part of the tree, as well as the whole tree. Just like a real fingerprint, the ones generated by the tree are fairly unique. Whenever you change any data in the tree, the fingerprint changes as well.

This provides us with a solution to the problem of communicating what each participant thinks the state currently is. You can take the fingerprint that represents the entire tree and send it to other blockchain participants. If everyone has generated the same fingerprint for the state data, then they must have the same state. If the fingerprints are different then the state must have been different.

What is a Radix Trie?

A radix trie is a special type of trie (a trie is a special type of tree) that makes storing and retrieving key/value pairs easy. A Merkle tree is really good at allowing you to prove that some data is part of the tree, but it does not make it easy to look up the values associated with keys, so we combine the radix trie and the Merkle tree to get something that can do both.

How does it all work?

The exact details of how a Merkle trie works can vary as there are many different ways to implement similar functionality. Trying to explain all of these different ways is beyond the scope of this article, so the rest of this article will describe just one particular implementation.

It’s nodes all the way down

The Merkle trie is made up of just one object, nodes. Each node is a little chunk of data that is stored within the trie. The data each node contains is key information, value information (if one exists for the node’s key), and the fingerprints of any children nodes.

Generating fingerprints

How do you use a Merkle trie to generate the fingerprint of each node? The most common way is to use a cryptographic hash function. A lot of different cryptographic hash functions exist, but they all share two features important to Merkle tries.

  • The first feature of a hash function is that it takes in data as input and outputs a number of fixed size that represents that data. Usually the number that is output is significantly smaller in size than the input data. For example, SHA-256, a popular hash function, takes inputs of any size and generates an output number 256 bits long. The function is very sensitive, so small changes in the input will lead to large differences in the output.
  • The second feature of hash functions is that it needs to be very difficult to “reverse” the function, i.e., given some output number, find any inputs that would generate that number.

One issue with cryptographic hash functions is that they tend to be somewhat slow to calculate. On one hand, this slowness is a security feature. If you wanted to create a fake state with the same hash function output, you could repeatedly try different inputs until you found something that worked. The slower it is to execute the hash function, the longer that search will take. On the other hand, slow calculation speed can make using the Merkle trie slower since it makes a lot of calls to the hash function in normal usage.

Finding a key/value pair

Recall that the data in this Merkle trie is the current blockchain state and that the state is made up of key/value pairs. One of the common actions that you have to do is look up the value associated with a key. This is done by turning the key into a path through the trie. By starting at the Root node and then following this key path, you will end up at the node that stores the value you want.

Inserting a key/value pair

Inserting a key/value pair into the trie is very similar to searching for an existing key. The main difference is the key you are looking for might not be in the trie. While following the key path, one of several situations will occur.

  • Scenario 1: Find a node corresponding to the key.
  • Scenario 2: Ran out of nodes while following the key path.
  • Scenario 3: Conflicting Compressed Path Found.

https://medium.com/avalancheavax/from-the-labs-handling-blockchain-state-b2611d540a0a