With the increasing adoption of blockchains, and still very limited capacity of modern protocols, many so-called Layer 2 protocols emerged, such as State Channels, Side Chains, Plasma and Roll Ups. This blog post dives relatively deeply into the technical details of each approach and their benefits and disadvantages.
The core idea behind any Layer 2 solution is that it allows several parties to securely interact in some way without issuing transactions on the main chain (which is the Layer 1), while still to some extent leveraging the security of the main chain as the arbitrator.
Different layer 2 solutions have different properties, advantages and disadvantages. The L2 solution that I personally find the most exciting is Roll Ups, which we will cover last.
As a good introductory example of Layer 2 solutions, let’s first consider simple payment channels. Payment channels are one of the most adopted Layer 2 solutions today. Lightning Network, for example, is based on Payment Channels.
Payment channels are a specific instantiation of a more generic concept called State Channels. A good overview of the history of the state channels can be found here. A state channel is a protocol between a fixed set of participants (often two) that want to transact securely between themselves off-chain; in the case of a payment channel specifically, to exchange money. The protocol for the payment channel goes as follows: two participants first both deposit some money, say $10 worth of Bitcoin, on-chain using two on-chain transactions. After the money is deposited, both participants can instantaneously send each other money without interacting with the main chain by sending each other state updates in the form of [turn_number, amount, signature], for as long as the balances of both participants remain non-negative.
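A minimal sketch of such off-chain state updates follows. Everything specific in it is an assumption made for illustration: HMAC with a per-party secret stands in for a real public-key signature, `amount` is read as the net amount moved from Alice to Bob so far, and the names `KEYS`, `StateUpdate`, `make_update` and `is_valid` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical per-party secrets. HMAC is only a stand-in for real signatures.
KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}


def sign(party: str, turn_number: int, amount: int) -> str:
    """Produce the stand-in signature over (turn_number, amount)."""
    message = f"{turn_number}:{amount}".encode()
    return hmac.new(KEYS[party], message, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class StateUpdate:
    turn_number: int
    amount: int      # assumed meaning: net amount moved from Alice to Bob so far
    signer: str
    signature: str


def make_update(signer: str, turn_number: int, amount: int) -> StateUpdate:
    return StateUpdate(turn_number, amount, signer, sign(signer, turn_number, amount))


def is_valid(update: StateUpdate, deposits: dict) -> bool:
    """Check the signature and that both balances stay non-negative."""
    expected = sign(update.signer, update.turn_number, update.amount)
    if not hmac.compare_digest(expected, update.signature):
        return False
    alice_balance = deposits["alice"] - update.amount
    bob_balance = deposits["bob"] + update.amount
    return alice_balance >= 0 and bob_balance >= 0
```

With deposits of 10 each, `make_update("alice", 1, 3)` is accepted, while an update that would move 11 from Alice is rejected because her balance would become negative.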
Once one of the participants wants to stop using the payment channel, they perform a so-called “exit”: submit the last state update to the main chain, and the latest balances are transferred back to the two parties that initiated the payment channel. The main chain can validate the validity of the state update by verifying signatures and final balances, thus making it impossible to try to exit from an invalid state.
The problem with the exits is that the main chain cannot validate that the sequence of the state updates submitted is complete, i.e. that no more state updates happened after those that are presented. For example, consider the following scenario:
From the state in which Alice has $11 and Bob has $9 Alice sends a state update to Bob that transfers him $7, waits for him to provide her some service for those $7, and then exits with the state update that was before she sent $7 to Bob. The main chain cannot know that an extra State Update existed, and thus sees the exit as valid.
The way to get around it is to have some time after the exit was initiated for Bob to challenge the exit, by providing a state update that is signed by Alice, and has a higher turn_number than the one that Alice submitted. In the example above Bob can submit the last state update that transfers $7 to him from Alice during the challenge period, and claim his $16 instead of just $9 in the attempted exit by Alice.
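The exit game described above can be sketched as a toy on-chain contract. This is an illustration under stated assumptions, not any real channel implementation: signature checks are omitted, time is passed in explicitly as seconds, and `SignedState` and `ChannelContract` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedState:
    turn_number: int
    balances: tuple  # (alice_balance, bob_balance); signatures omitted in this sketch


class ChannelContract:
    CHALLENGE_PERIOD = 24 * 60 * 60  # seconds; the 24 hours mentioned in the text

    def __init__(self):
        self.pending = None
        self.exit_started_at = None

    def start_exit(self, state: SignedState, now: int) -> None:
        if self.pending is not None:
            raise ValueError("an exit is already in progress")
        self.pending = state
        self.exit_started_at = now

    def challenge(self, newer: SignedState, now: int) -> None:
        """Replace the pending exit state with one that has a higher turn_number."""
        if self.pending is None:
            raise ValueError("no exit to challenge")
        if now >= self.exit_started_at + self.CHALLENGE_PERIOD:
            raise ValueError("challenge period is over")
        if newer.turn_number <= self.pending.turn_number:
            raise ValueError("state is not newer than the pending one")
        self.pending = newer

    def finalize(self, now: int) -> tuple:
        """Pay out the pending balances once the challenge period has passed."""
        if self.pending is None:
            raise ValueError("no exit in progress")
        if now < self.exit_started_at + self.CHALLENGE_PERIOD:
            raise ValueError("challenge period is still running")
        return self.pending.balances
```

In the scenario above, Alice starts an exit with balances (11, 9), Bob challenges with the later state (4, 16), and `finalize` pays out (4, 16) once the period has elapsed.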
While the construction with the exit game is now secure, it presents a major inconvenience: the participants might be forced to wait for a rather long period of time to exit, usually 24 hours, and need to frequently (at least once per exit period) monitor the main chain to make sure their counterparty doesn’t try to exit using some past state. A technique called WatchTowers can be used to delegate watching the network to a 3rd party. Read more about watchtowers here and here.
To remove the necessity to wait for the exit timeout if both parties collaborate many implementations have a concept of “collaborative close”, in which participants can sign a “conclusion proof”, and the presence of such a conclusion proof allows the other party to exit without waiting for the challenge period.
The payment channel can be generalized to arbitrary state transitions, not just payments, for as long as the main chain can validate the correctness of such transitions. For example, a game of chess can be played using state channels, by players submitting their moves as transactions to each other.
Despite the inconveniences listed above, State Channels are widely used today for payments, games and other use cases, due to their immediate finality (once the counterparty confirms the receipt of a state update it is final), no fees except for those paid for the deposit and exit, and relative simplicity of the construction.
While the high level definition of the state channels is relatively simple, accounting for all the corner cases, so that no party can illegally take money from the other party, is a relatively complex task. See this whiteboard session with Tom Close from Magmo in which we dive very deeply into the intricacies of building secure state channels.
State Channel Networks
Another disadvantage of the state channels as described above is that they only exist between two participants. You can have constructions with N of N multisignatures that allow multiple parties to maintain a state channel between themselves, but it would be desirable to have a layer 2 solution with the properties that State Channels have, and that also allows parties that do not have a channel open between them directly to still transact.
Interestingly, it is possible to construct the state channels in such a way that if Alice has a state channel with Bob, and Bob has a state channel with Carol, then Alice can securely and atomically send money to Carol via Bob. With this an entire network of state channels can be built, allowing a large number of participants to transact with each other, without maintaining a connection between every pair of participants.
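The text does not spell out the mechanism. One common technique, used by the Lightning Network, is a hash-locked payment (the hash part of a hashed timelock contract). The sketch below shows only the hash lock and leaves out timeouts entirely; `HashLockedPayment` is a hypothetical name.

```python
import hashlib


class HashLockedPayment:
    """A conditional payment that can be claimed only by revealing a preimage."""

    def __init__(self, amount: int, payment_hash: bytes):
        self.amount = amount
        self.payment_hash = payment_hash
        self.claimed = False

    def claim(self, preimage: bytes) -> int:
        if hashlib.sha256(preimage).digest() != self.payment_hash:
            raise ValueError("wrong preimage")
        self.claimed = True
        return self.amount


# Carol picks a secret and shares only its hash with Alice.
secret = b"carol-secret"
payment_hash = hashlib.sha256(secret).digest()

# Alice locks funds for Bob and Bob locks funds for Carol under the same hash.
alice_to_bob = HashLockedPayment(5, payment_hash)
bob_to_carol = HashLockedPayment(5, payment_hash)

# Carol claims from Bob by revealing the secret, and Bob reuses it against Alice.
carol_received = bob_to_carol.claim(secret)
bob_received = alice_to_bob.claim(secret)
```

Because both payments are locked under the same hash, Bob can only be paid by Alice using the very secret that Carol had to reveal to get paid by Bob, which is what makes the two hops behave atomically in this simplified picture.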
This is the idea behind the Lightning Network. In our whiteboard session with Dan Robinson on Interledger we dived pretty deep into the Lightning Network's design, check it out here.
The core idea behind a simple side chain is to have a completely separate blockchain, with its own validators and operators, that has bridges to transfer assets to and from the main chain, and optionally snapshots the block headers to the main chain to prevent forks.
The snapshots can provide security against forks even when the validators of the side chain collude and try to fork out:
On the figure above the side chain produces blocks, and snapshots them to the main chain. The snapshot is just the hash of the block that is stored on the main chain. The fork choice rule on the side chain is such that the chain cannot be canonical if it doesn’t build on top of the latest snapshotted block. On the figure above even if the validators of the side chain collude and try to produce a longer chain A’<-B’<-C’ after producing block A to perform a double spend, if block A was snapshotted to the main chain, the longer chain will be ignored by the side chain participants.
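The fork choice rule can be sketched as follows, assuming chains are represented as lists of block names and that ties beyond the snapshot are broken by length. The genesis block "G" and the function name `canonical_chain` are hypothetical.

```python
def canonical_chain(candidate_chains, latest_snapshot):
    """Pick the longest chain that contains the latest snapshotted block.

    Chains that do not build on the latest snapshot are never canonical,
    no matter how long they are.
    """
    eligible = [chain for chain in candidate_chains if latest_snapshot in chain]
    if not eligible:
        return None
    return max(eligible, key=len)


honest = ["G", "A"]
attack = ["G", "A'", "B'", "C'"]  # longer, but does not contain snapshotted block A
chosen = canonical_chain([honest, attack], latest_snapshot="A")
```

Here `chosen` is the shorter honest chain, because the longer fork does not include the snapshotted block A.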
If a participant wants to move assets from the main chain to the side chain, they “lock” the assets on the main chain, and provide a proof of the lock on the side chain. To unlock the assets on the main chain, they initiate an exit on the side chain, and provide a proof of the exit once it is included in the side chain block.
However, despite the fact that the side chain can leverage the security of the main chain to prevent forks, the validators can still collude and perform a different kind of attack called Invalid State Transition. The idea behind the attack is that the main chain cannot possibly validate all the blocks that the side chain produces (it would invalidate the purpose of the side chain, that is to offload the main chain from validating each transaction), and thus if more than 50% or 66% (depending on the construction) of the validators collude, they can create a completely invalid block that steals money from other participants, snapshot such a block, initiate an exit for the stolen funds and complete it.
We wrote a great overview of the invalid state transition problem in the context of sharding here. This problem maps to side chains one to one, in which case side chains would correspond to shards in the overview, and the main chain would correspond to the beacon chain.
The article linked above also covers some ways to get around invalid state transitions, but those ways are not implemented in practice today, and most side chains are built with the assumption that more than 50% (or 66% depending on construction) of validators never get corrupted.
Plasma is a construction that enables “non-custodial” sidechains, that is, even if all sidechain (commonly called “plasma chain”) validators collude to conduct any type of adversarial behavior, the assets on the plasma chain are safe, and can be exited to the mainchain.
The simplest construction of plasma, commonly referred to as Plasma Cash, only operates with simple non-fungible tokens, and only allows transferring a particular constant amount in each transaction. It operates in the following way:
Each block contains a Sparse Merkle Tree whose leaves contain the changes to the ownership of particular tokens. For example, on the above figures there are four total tokens in circulation, and in the block B tokens 1, 3 and 4 do not change hands (there’s nil in the leaf), while token 2 now belongs to Alice. If block D then contained a transaction signed by Alice that transfers the token to Bob, then the leaf for the same token 2 in block D would contain the transaction from Alice to Bob.
To transfer a token to someone one needs to provide the full history of the token across all the blocks that were produced on the plasma chain since the token was moved to the chain until the transaction. In the example above if Bob wants to transfer the token to Carol, then a transaction (depicted at the bottom) would include one entry per block, with a merkle proof of change of ownership or lack thereof in each block.
Plasma chain snapshots the headers of all the blocks to the main chain, and thus Carol can validate that all the merkle proofs correspond to the hashes snapshotted to the main chain, and that the change of ownership in each block is valid. Once such transaction ends up in a block of the plasma chain, the corresponding entry with Carol is written into the merkle tree, and Carol is now the owner.
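The per-block check Carol performs can be sketched with a small Merkle tree. This is a simplification: it uses a dense four-leaf tree rather than a real sparse Merkle tree, the leaf encoding `b"owner:alice"` is invented for the example, and the function names are hypothetical.

```python
import hashlib

NIL = b""  # leaf value meaning "this token did not change hands in this block"


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, hashed leaves first, root last (power-of-two leaf count)."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root, index, leaf, proof):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# Block B from the example: only token 2 (index 1) changes hands.
levels = build_levels([NIL, b"owner:alice", NIL, NIL])
root = levels[-1][0]  # this is what would be snapshotted to the main chain
```

A token history is then one such proof per block: either a proof of a transfer leaf, or a proof that the leaf is nil, each checked against the root snapshotted for that block.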
Since it is assumed that the Plasma operator can be corrupted at any moment, the exits cannot be instantaneous (since the hash of the state snapshotted by the plasma operator cannot be trusted), and an exit game is required. Even for the relatively simple construction discussed above the exit game is pretty complex. In an episode of Whiteboard Series with Georgios Konstantopoulos from Loom Network, that goes very deep into the technical details of Plasma Cash, he presents the exit game that was used at that point by Loom Network, and we find an example where by withholding the data the operator can either steal tokens from an honest participant, or have an ability to execute a relatively painful griefing attack (see the video starting from 41:40 for the details). Later Dan Robinson proposed a simpler exit game that addressed the issue, but again an example that reorders blocks was found that broke it.
Overall, the biggest advantage of Plasma is the security of the tokens that are stored on the plasma chain. An honest participant can be certain that they will be able to withdraw their tokens no matter which of the following events occurs: the plasma operator creates an invalid state transition (before or after the honest participant received their tokens), the plasma operator withholds the produced blocks, or the plasma operator completely stops producing blocks. In all these scenarios, or in general under any circumstances, the tokens cannot be lost.
The disadvantages are the necessity to provide the full history of the token when it is transferred, and the complexity of the exit games (and in general reasoning about them).
For more technical details see the episode with Loom Network mentioned above, as well as the episode with Ben Jones from Plasma Group, in which he talks about Plasma CashFlow, a more sophisticated flavor of Plasma Cash that allows transacting in arbitrary denominations.
As I mentioned when discussing the side chains, one of the ways to get around the Invalid State Transition problem in side chains is to provide a cryptographic proof of correctness of each state transition. The particular instantiation of this approach presently built by Matter Labs is called Roll Ups, and was initially proposed on ethresear.ch by Barry White Hat here.
The Roll Up is effectively a side chain, in the sense that it produces blocks, and snapshots those blocks to the main chain. The operators in the Roll Up, however, are not trusted. Thus it is assumed that at any point the operators can attempt to stop producing blocks, produce an invalid block, withhold data, or attempt some other adversarial behavior.
Similar to regular side chains, the operators cannot produce a fork that precedes any block snapshotted to the main chain, so once a block on the main chain that contains the snapshot is finalized, so is the block on the Roll Up chain that is snapshotted.
To get around the state validity issue, each time the Roll Up operator snapshots the block, they provide a zk-SNARK attesting that a list of transactions performs a valid state transition. Consider the example below:
There are three blocks on the roll up chain: A, B and C, snapshotted correspondingly to blocks X, Y and Z on the main chain. At each point in time the main chain doesn’t need to store anything besides the last merkle root of the state of the roll up chain. When the block A is snapshotted, a transaction is sent to the main chain that contains:
- The merkle root h(S2) of the new state S2;
- Either the full state S2, or all the transactions in the block;
- A zk-SNARK that attests that there’s a valid series of transactions that moves from a state whose hash is equal to h(S1) to a state whose hash is equal to h(S2), and that the applied transactions match the data provided in the second item.
The transaction verifies that the zk-SNARK is correct, and stores the new merkle root h(S2) on chain. Importantly, it doesn’t store the full content of A in the state, but it naturally is kept in the call data, so can be fetched in the future.
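The on-chain side of this can be sketched as a toy contract that stores only the state root. The important caveat is that a real zk-SNARK cannot be reproduced in a few lines: here the "proof" is simply the previous state handed over as a witness and re-executed, which gives up succinctness entirely and only illustrates what statement gets checked. The state root is a plain hash of the serialized state rather than a merkle root, and all names are hypothetical.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Stand-in for the merkle root: a hash of the canonically serialized state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_transactions(state: dict, transactions) -> dict:
    """Apply (sender, receiver, amount) transfers, rejecting any overdraft."""
    new_state = dict(state)
    for sender, receiver, amount in transactions:
        if amount <= 0 or new_state.get(sender, 0) < amount:
            raise ValueError("invalid transaction")
        new_state[sender] -= amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def verify_proof(old_root, new_root, transactions, witness_state) -> bool:
    """Stand-in for SNARK verification: re-execute from the witnessed old state."""
    if state_root(witness_state) != old_root:
        return False
    try:
        result = apply_transactions(witness_state, transactions)
    except ValueError:
        return False
    return state_root(result) == new_root


class RollupContract:
    def __init__(self, genesis_root: str):
        self.root = genesis_root  # the only value kept in contract storage

    def submit_block(self, new_root, transactions, proof) -> None:
        # `transactions` play the role of the data that ends up in call data.
        if not verify_proof(self.root, new_root, transactions, proof):
            raise ValueError("proof rejected")
        self.root = new_root
```

Submitting a root for a state that the listed transactions do not actually produce is rejected, and the stored root stays unchanged.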
The fact that the full block is stored in the call data is somewhat of a bottleneck, but it provides a solution to the data availability issue. With the current Matter Labs implementation it takes one minute to compute the snark for a transaction (which can be done in parallel for multiple transactions), and each transaction costs 1K gas on-chain and occupies 9 bytes of call data on the main chain.
With this construction the malicious operator cannot do any harm besides going offline:
- It cannot withhold the data, since the transaction that snapshots a block must have the full block or the full state passed as an argument, and validates that the content is correct, and then such content is persisted in the mainnet calldata.
- It cannot produce a block with an invalid state transition, since it must submit a zk-SNARK that attests to the correctness of the state transition, and such a zk-SNARK cannot be obtained for an invalid block;
- It cannot create a fork since the fork choice rule always prefers the chain that contains the last snapshotted block, even if a longer chain exists.
While the amount of storage in the call data is not improved significantly with this L2 solution, the amount of the actual writable storage consumed is constant (and is very small), and the gas cost of on-chain verification is only 1k gas/tx, which is 21x lower than an on-chain transaction.
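The arithmetic behind the 21x figure can be written out as below. The 21,000 gas value is the base cost of a plain Ethereum transfer; the other two constants are the per-transaction figures quoted in the text, and the helper names are hypothetical.

```python
BASE_TX_GAS = 21_000        # base cost of a plain on-chain Ethereum transfer
ROLLUP_TX_GAS = 1_000       # on-chain verification cost per Roll Up transaction (from the text)
CALLDATA_BYTES_PER_TX = 9   # call data per Roll Up transaction (from the text)


def savings_factor() -> float:
    """How many times cheaper a Roll Up transaction is in gas terms."""
    return BASE_TX_GAS / ROLLUP_TX_GAS


def batch_cost(transaction_count: int) -> dict:
    """Total on-chain footprint of a batch of Roll Up transactions."""
    return {
        "gas": transaction_count * ROLLUP_TX_GAS,
        "calldata_bytes": transaction_count * CALLDATA_BYTES_PER_TX,
    }
```

For instance, a batch of 1,000 transactions would consume 1,000,000 gas and 9,000 bytes of call data under these figures, compared to 21,000,000 gas for the same number of plain transfers.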
Importantly, assuming the Roll Up operators cooperate, the exit is instantaneous, and doesn’t involve an exit game. These properties combined make the Roll Up chains one of the most exciting L2 solutions today.
I work on a sharded Layer 1 protocol called Near. There’s a common misconception that sharded Layer 1 protocols compete with Layer 2 as solutions for blockchain scalability. In practice, however, it is not the case, and even when sharded protocols become live, Layer 2 solutions will still be actively used.
Designed for specific use cases, Layer 2 will remain cheaper and provide higher throughput, even when Layer 1 blockchains scale significantly better.
This write-up is part of an ongoing effort to create high quality technical content about blockchain protocols and related topics. We run a video series in which we talk to the founders and core developers of many protocols. We have episodes with Ethereum Serenity, Cosmos, Polkadot, Ontology, QuarkChain and many other protocols. All the episodes are conveniently assembled into a playlist here.
Follow me on twitter to get updated when we publish new write-ups and videos.
IOTA Foundation recently announced a project called Coordicide, and an accompanying paper. The goal of the project is to remove the so-called IOTA Coordinator, a centralized service that finalizes transactions on IOTA.
The paper outlines multiple changes that are being considered in order to make it possible for the protocol to remove the coordinator and thus make IOTA completely decentralized.
One of the most interesting changes proposed is the new consensus algorithm for choosing between multiple conflicting transactions. The consensus algorithm is described in a separate paper that Dr. Serguei Popov, one of the core researchers at IOTA, published shortly before the Coordicide project was announced.
The new consensus design attracted a lot of interest because of the similarities people drew between it and Avalanche, a paper that was publicly endorsed by Emin Gun Sirer (who is also one of Avalanche’s likely co-authors) a few years ago.
DAGs of Transactions
Indeed, IOTA and Avalanche are both built around a directed acyclic graph (DAG) of transactions, with edges of the DAG representing approvals.
In both IOTA and Avalanche the DAG is used to reduce the amount of communication needed in order to finalize the transactions. If a certain transaction is final from the perspective of some node, then all the transactions approved by that transaction are also final (such transactions are called Ancestry in Avalanche and Past Cone in IOTA). If a particular transaction is rejected (which can only happen if the transaction spends the same UTXO as some other approved transaction) then all the transactions that approve the rejected transaction are also rejected (such transactions are called Progeny in Avalanche and Future Cone in IOTA).
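The two sets can be computed with a simple graph traversal. In this sketch the DAG is a dictionary mapping each transaction to the transactions it directly approves; the transaction names and function names are hypothetical.

```python
def past_cone(approvals, tx):
    """All transactions directly or indirectly approved by `tx` (Ancestry / Past Cone)."""
    seen = set()
    stack = list(approvals.get(tx, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(approvals.get(current, []))
    return seen


def future_cone(approvals, tx):
    """All transactions that directly or indirectly approve `tx` (Progeny / Future Cone)."""
    approvers = {}
    for child, parents in approvals.items():
        for parent in parents:
            approvers.setdefault(parent, []).append(child)
    # The future cone is the past cone in the graph with all edges reversed.
    return past_cone(approvers, tx)


# d approves b and c, which both approve a; e approves d.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
```

If `d` becomes final, everything in `past_cone(dag, "d")` is final too; if `b` is rejected, everything in `future_cone(dag, "b")` is rejected with it.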
At the core of such a DAG-based protocol is the consensus algorithm that chooses one transaction amongst several conflicting transactions, i.e. transactions that spend the same UTXO. Such consensus doesn’t have to converge if multiple conflicting transactions appear at approximately the same time, and have an approximately equal number of participants initially preferring each one, since conflicting transactions can only come from malicious actors, and the protocol doesn’t need to guarantee finality for such actors. But if a particular transaction came first, and a sufficiently large percentage of participants preferred it for sufficiently long, a conflicting transaction that comes later shall never become the preferred transaction. The motivation behind this is that more transactions now exist in the Progeny / Future Cone of the transaction that arrived first, and if it gets rejected, all the new transactions from the honest nodes will get rejected too.
The consensus protocol that chooses between multiple conflicting transactions presented in the Avalanche paper is called Snowball. The consensus protocol for IOTA is presented in the aforementioned paper.
Since the consensus algorithms pursue similar goals, they also are quite similar underneath. At the core of both consensus algorithms is a node doing the following procedure iteratively until it becomes sufficiently certain that consensus has been reached:
- Choose a small sample of other nodes (on the order of 10) and query the outcome they currently prefer;
- Update the node’s current belief based on the resulting votes.
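The two steps above can be sketched with the simplest possible update rule, a plain majority vote over the sample. This is deliberately the naive variant, shown only to make the loop concrete; binary outcomes, the tie handling and the function names are assumptions of the sketch.

```python
import random


def query_sample(preferences, node, k, rng):
    """Ask k other nodes, chosen at random, which outcome they currently prefer."""
    others = [n for n in preferences if n != node]
    return [preferences[n] for n in rng.sample(others, k)]


def naive_round(preferences, k, rng):
    """One round in which every node adopts the majority outcome of its sample."""
    updated = {}
    for node in preferences:
        votes = query_sample(preferences, node, k, rng)
        ones = sum(votes)
        if 2 * ones > k:
            updated[node] = 1
        elif 2 * ones < k:
            updated[node] = 0
        else:
            updated[node] = preferences[node]  # tie: keep the current belief
    return updated
```

A node would call `naive_round` repeatedly until its belief stops changing; what differs between real protocols is the rule that replaces the majority vote.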
The “update the current belief” part is the core of such consensus algorithms. Since the consensus protocols need to work in the presence of adversaries that will behave in such a way as to prevent the network from reaching consensus, naive approaches (such as just preferring the outcome that was preferred by the majority of sampled nodes, or changing preference if a large percentage of sampled nodes (say 80%) believe in the opposite outcome) do not work.
Before we proceed, let’s define what “do not work” means. There are three ways in which these consensus algorithms can break:
- Agreement failure — when two nodes both decide that some outcome was agreed upon, but those outcomes differ;
- Termination failure — when no consensus is reached after an arbitrarily long period of time;
- Integrity failure — when consensus is reached on some outcome, but that outcome was not proposed by anyone. An example of integrity failure is reaching consensus on value 0, when all the participants initially proposed the value of 1.
In the context of Snowball and the new IOTA consensus protocol, Agreement failure is absolutely not acceptable, and Integrity failure is also not acceptable, but in a slightly adjusted way. It is not only necessary that consensus is reached on the outcome that was proposed by someone, but also that if a majority of nodes were proposing some outcome, no other outcome shall be agreed upon, even if some nodes in the minority were proposing it.
The termination would also be desirable, but both protocols deemphasize it for the cases with more than one proposed outcome, arguing that more than one proposed outcome means multiple transactions spending the same UTXO, which can only come from malicious actors.
Both Snowball and the new IOTA consensus protocol provide agreement, at least as far as I can tell (though it’s important to note that the Avalanche paper has a typo that currently means Snowball doesn’t provide Agreement; with the typo fixed it is unlikely that Agreement can be violated). For both of them, it is easy to argue that if a majority of nodes initially sway towards one of the outcomes, the nodes will not switch to the other outcome no matter what malicious actors do, so Integrity (as defined above) is also present.
Termination of the new IOTA consensus and Snowball
The important difference comes when we consider Termination.
Let’s get back to the “update the current belief” part. After sampling the votes of the 10 peers a node needs to somehow adjust their current preference. In snowball each node maintains several counters to remember its confidence in each of the outcomes, and waits for several consistent consecutive samples before it changes its belief. This way adversaries cannot easily sway nodes towards one decision or the other, and cannot violate Agreement.
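A counter-based update of this kind can be sketched as below. It is loosely modeled on the description above and is not a verbatim transcription of the Avalanche paper; the parameter values `alpha` and `beta`, the reset behavior on inconclusive samples, and all names are assumptions of the sketch.

```python
class SnowballNode:
    """Toy node for binary consensus with confidence counters."""

    def __init__(self, preference, alpha=8, beta=5):
        self.preference = preference
        self.confidence = {0: 0, 1: 0}  # how often each outcome won a sample
        self.last_winner = None
        self.streak = 0                 # consecutive samples won by the same outcome
        self.decided = False
        self.alpha = alpha              # votes needed in a sample; assumed > k / 2
        self.beta = beta                # consecutive wins needed to decide

    def observe(self, votes):
        """Update the node's belief from one sample of votes (a list of 0s and 1s)."""
        for outcome in (0, 1):
            if votes.count(outcome) >= self.alpha:
                self.confidence[outcome] += 1
                # Switch preference only when the other outcome is strictly ahead.
                if self.confidence[outcome] > self.confidence[self.preference]:
                    self.preference = outcome
                if outcome == self.last_winner:
                    self.streak += 1
                else:
                    self.last_winner = outcome
                    self.streak = 1
                if self.streak >= self.beta:
                    self.decided = True
                return
        # Inconclusive sample: the streak of consecutive wins is broken.
        self.last_winner = None
        self.streak = 0
```

Since `alpha` is assumed to be more than half the sample size, at most one outcome can win a given sample, so an adversary has to produce many consistent samples in a row to move a node.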
However, this doesn’t help much with Termination. In our simulation of Snowball we show that with the parameters provided in the paper, consensus can be kept in a metastable state for thousands of iterations if just 4% of the nodes are adversarial. With just 10% adversaries, not only does the consensus process remain in the metastable state indefinitely, but also, the nodes that believe in each outcome increase their confidence in their preferred outcome to very large numbers and continue to become more confident, bringing the consensus process into a state from which it provably cannot escape.
Thus, Snowball as-is only has Termination in the binary consensus setting with a very low percentage of adversaries. Here’s a simulation with 17% adversaries:
Read more about it here.
IOTA consensus uses a very different approach. It doesn’t maintain any counters, and instead does the following to choose between 0 and 1:
- Each node samples current beliefs of k (say k=10) other nodes;
- After that, all the nodes run some distributed randomness generator to generate some threshold between some value beta and 1 – beta. I.e. if beta = 0.3, then the value will be picked between 0.3 and 0.7;
- Each node then chooses 1 if the number of nodes that prefer 1 in their sample was bigger than the generated threshold times k, otherwise it chooses 0.
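The steps above can be sketched as a single round. The main assumption is that a shared local random number generator stands in for the distributed randomness generator, which is the hard part in practice; the function names are hypothetical, and a node may happen to sample itself in this simplified version.

```python
import random


def draw_threshold(beta, rng):
    """Stand-in for the distributed randomness: one threshold in [beta, 1 - beta]."""
    return rng.uniform(beta, 1 - beta)


def next_opinion(sample, threshold):
    """Choose 1 if the share of ones in the sample exceeds the threshold, else 0."""
    return 1 if sum(sample) > threshold * len(sample) else 0


def run_round(opinions, k, beta, rng):
    """One round: every node samples first, then a common threshold is revealed."""
    nodes = list(opinions)
    samples = {n: [opinions[p] for p in rng.sample(nodes, k)] for n in nodes}
    threshold = draw_threshold(beta, rng)  # drawn only after all samples are taken
    return {n: next_opinion(samples[n], threshold) for n in nodes}
```

The ordering inside `run_round` is the point of the design: the threshold is generated only after the samples are fixed, so whoever influenced the samples could not have known which threshold to aim for.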
The core idea here is that since the random value is chosen after the samples were performed, even an omniscient adversary (an adversary who knows the current preferences and states of all the nodes, but not the random value that will be generated in the future) doesn’t know which threshold to sway the samples towards.
To understand why this helps, imagine that the adversary can actually predict the value of the random generator, and knows it will be 0.7. If k=10, the adversary knows that in order to keep the consensus process in a metastable state, it wants approximately half of the nodes to sample fewer than 7 ones, and approximately half the nodes to sample more than 7 ones. If it also knows that in the current population of honest nodes 62 nodes prefer 1, it will make exactly 8 of the malicious nodes report 1 as well (so that together with the honest nodes, exactly 70 nodes report 1), and the remaining malicious nodes report 0. This way the median number of sampled ones will be 7, and thus approximately half of the nodes will end up sampling more than threshold * k = 7 ones and choose one, while approximately half of the nodes will end up sampling fewer than 7 ones and choose zero. The adversary will then continue doing it in the future rounds, preventing the consensus process from converging.
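The split in this example can be checked with a short binomial calculation. It assumes 100 nodes in total (so that 70 reporting 1 means 70%) and approximates each node's sample of k=10 as independent draws with replacement; the function names are hypothetical.

```python
from math import comb


def prob_ones(k, p, ones):
    """Probability that exactly `ones` of k sampled nodes report 1."""
    return comb(k, ones) * p**ones * (1 - p) ** (k - ones)


def split_around(k, p, cutoff):
    """Probabilities of sampling fewer than, exactly, and more than `cutoff` ones."""
    below = sum(prob_ones(k, p, i) for i in range(0, cutoff))
    exactly = prob_ones(k, p, cutoff)
    above = sum(prob_ones(k, p, i) for i in range(cutoff + 1, k + 1))
    return below, exactly, above


below, exactly, above = split_around(k=10, p=0.7, cutoff=7)
```

Under these assumptions roughly 35% of nodes sample fewer than 7 ones, about 27% sample exactly 7, and about 38% sample more than 7, so the population is indeed split into two large camps around the threshold.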
However, consider what happens if the adversary doesn’t know the threshold. The adversary could try to report different outcomes to the different honest nodes that query them, but no matter what distribution of sampled votes they end up signalling, (with some non-trivial probability) the threshold they choose will cause a large percentage of the honest nodes to have queried samples that lie on the same side of the threshold. Thus, a large percentage of the participants will end up choosing the same outcome for the next round.
Such a consensus algorithm is significantly harder to stall. However, it relies on the existence of a distributed random number generator that generates the required randomness for the thresholds. Such randomness generation is a rather hard problem, especially given that the consensus strives for low network overhead. If Snowball had access to a distributed random number generator, the protocol’s designers could just make nodes choose the outcome of a random number generator in the event that the consensus process gets stuck in a metastable state.
For example, an idea like this is used in one of the components of Spacemesh. See professor Tal Moran explaining this approach here in a whiteboard session we recorded with him a few weeks ago.
In short, the new IOTA consensus is definitely in the same family of consensus algorithms as Snowball, but it is far from just being a Snowball copycat. As described, it is likely to have better liveness, but it relies on the existence of a distributed random number generator, which in itself is a complex problem (though the IOTA paper provides several references to existing research). If we assumed such a generator were available, it could be used in a variety of ways to escape the metastable state.
My co-founder Illia and I often invite founders and core researchers of other protocols to a room with a whiteboard and record an hour-long video diving deep into their tech. We have episodes with Ethereum Serenity, Cosmos, Polkadot, Ontology, QuarkChain and many other protocols. All the episodes are conveniently assembled into a playlist here.
On usability of blockchain applications
Imagine you want to play a blockchain game. For example, say you want to get a cryptokitty. Or play some collectible card game. It is actually a pretty involved process. You need to:
- Create a key pair, and securely store the private key; if you want to later play from another device, you need to understand how to transfer it to that device;
- Register on Coinbase;
- Do a KYC, which involves sending your documents to Coinbase;
- Wait for a few days;
- Buy Ether. Yes, you need to make a purchase before you can even try the game!
- Transfer Ether from Coinbase;
- Finally, buy your kitty! Though now you need to pay for every interaction with the game, and your latency is at least 20 seconds.
Modern games and applications running on blockchains report a whopping 95–97% drop-off rate during the above onboarding process. That means that out of 100 users who try the application only 5 or fewer actually get to start using it!
The problem above can be roughly split into three subproblems:
- The necessity to install a browser plugin (or have a wallet application) to securely interact with the chain;
- The necessity to have and understand public/private keys security;
- The necessity to pay for gas for each transaction.
The first two items above are mandatory for interacting with the blockchain securely: they are designed to make sure the user doesn’t lose their funds or assets. The last item, besides providing a financial incentive to the miners, is also necessary to keep the blockchain itself secure: if transactions were free, it would be trivial to DDoS the system by spamming it with lots of free useless transactions.
Once a person is involved in a particular blockchain ecosystem, such as Ethereum or NEAR, they do have the browser plugin or wallet installed, have some assets on their accounts, and have all their devices set up to use the proper key pairs. For them using Web3 applications is relatively easy, besides maybe the fact that the applications are slow (latency and throughput of blockchain applications are beyond the scope of this writeup, check out our previous posts on sharding: one and two, as well as tech deep dives with developers of plasma: one and two, and state channels).
However, as of today, the majority of internet users do not use blockchain, and if we want that to change, we need to make the onboarding for them as streamlined as possible. In the ideal world developing a decentralized application running on a blockchain should be no harder than building a nodeJS application, and once such an application is deployed, a user that never used blockchain before should be able to just open it in a browser and start interacting with it.
Let’s consider each of the barriers described above, what efforts are made to fix them today, and what changes we are developing on the protocol level to support them.
If you prefer video, you can watch me giving a talk at Berkeley on the same topic here:
Browser plugins / Wallet apps
You do need some custom binary running on your machine to securely interact with the blockchain. The motivation behind it is that anything hosted that you just open in your browser is completely controlled by the host, and thus can be arbitrarily changed at any point. Even if the hosted solution stores the keys locally encrypted, the code of it can later be changed to fetch the data from the local storage and send it to the remote server immediately after it was decrypted to be used for some interaction with the blockchain.
However, consider a person not involved in blockchain today buying crypto. Are they likely to set up their account locally and store funds there, or just store them on Coinbase, which is a completely centralized service? They will probably choose the latter.
Similar reasoning should apply to using decentralized applications. When users start interacting with the blockchain, they should be able to do so through a hosted solution. It will provide lower security, since the centralized entity will have the ability to take over the account, but early on the user doesn’t have much to lose, so such security is no worse than what one gets today with centralized services to which users entrust a great deal of their assets and data.
To emphasize this point, observe that most people install MetaMask from the Firefox or Chrome extensions catalog, and wallet applications from iTunes or PlayStore, effectively trusting both the MetaMask / wallet application developers and some big player such as Mozilla, Apple or Google. It is extremely rare for one to install MetaMask from source after carefully examining the code. Thus we already trust the security of our accounts to centralized entities.
There are solutions developed today that developers can integrate into their decentralized applications that would make it possible to interact with the application without installing browser plugins and wallet applications, such as Portis. The problem with such services is that once one trusted their private key to such a service, ultimately the security of the account is permanently compromised. If one later wants to get the full ownership over the account, they must create a new account and transfer all assets to such an account. If a particular application doesn’t provide a convenient way to transfer assets, the user will never be able to gain full ownership over such assets.
One solution to this problem is to have a contract-based account such that the user can replace the key that controls the account once they wish to do so. But for this to work the account needs to be contract-based from day one, and unless Portis or other service creates such a contract based account by default, users will not have this ability. Further, contract-based accounts cannot do everything that a regular account protected by a private key can do.
In NEAR each account is contract-based by default, and a hosted wallet is provided by NEAR. The user can start interacting with the blockchain by using the hosted wallet, and then later, at any time, upgrade the security of the account by creating a new key pair locally and updating the account to use that key pair.
Someone suggested a term we like a lot for this approach: progressive security. The user transitions from the highest usability and low security to the highest security and low usability over time, as their involvement and investment into the blockchain increases.
Understanding private/public key pairs
If we convinced you that progressive security is a good thing, and hosted wallets are the way to go, the need to understand key pairs goes away naturally. In the simplest approach, the hosted wallet stores the private keys of all the users in its own hosted database, and provides its own authentication layer to the users. It can offer them Facebook login, Google login, or just the good old email and password option. Only once the user wants to transition away from the hosted wallet do they need to set up a private key properly, and learn how to transfer it to other devices.
Interestingly, with contract-based accounts, the transfer process itself can be made easier while maintaining full security. Instead of transferring the private key to another device via some insecure channel, the contract that governs the user’s account can have two methods: `proposeNewSk` and `approveSk`. The first method can be invoked by anyone and adds the public key of a newly generated key pair to a list of proposed keys for the account. `approveSk` can only be called with a signature from one of the existing keys on the account, and can approve any of the proposed keys. This way a user can set up a brand new key pair on the new device, propose its public key from that device, and approve it from the existing device, so no private key ever leaves the device it was created on.
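To make the flow concrete, here is a minimal sketch in Python of an account contract with a propose/approve pair mirroring `proposeNewSk` and `approveSk`. It is an illustration, not NEAR's actual contract: the class and method names are hypothetical, only public keys are stored, and we assume the runtime has already verified the transaction signature and passes in the public key that produced it.

```python
class ContractAccount:
    """Toy model of a contract-based account whose keys can be rotated."""

    def __init__(self, initial_public_key: str):
        self.active_keys = {initial_public_key}
        self.proposed_keys = set()

    def propose_new_key(self, new_public_key: str) -> None:
        # Anyone may call this: a proposed key grants no authority by itself.
        self.proposed_keys.add(new_public_key)

    def approve_key(self, signer_public_key: str, proposed_public_key: str) -> None:
        # Assumption: the runtime verified the signature on this call and
        # reports which public key it belongs to.
        if signer_public_key not in self.active_keys:
            raise PermissionError("caller is not an active key of this account")
        if proposed_public_key not in self.proposed_keys:
            raise KeyError("this key was never proposed")
        self.proposed_keys.remove(proposed_public_key)
        self.active_keys.add(proposed_public_key)


account = ContractAccount("laptop-pk")
account.propose_new_key("phone-pk")            # sent from the new device
account.approve_key("laptop-pk", "phone-pk")   # confirmed from the existing device
```

A proposal from a stranger is harmless in this model, because it only becomes an active key after a call signed by a key the account already trusts.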
Any transaction that is executed on chain consumes a somewhat large amount of resources. For a state change to be executed securely, a large number of independent entities need to validate the state transition before it is applied. Since there’s some amount of resources spent on executing the transaction, it cannot be free.
When one compares web3 to web2, they often argue that web2 services by nature are free today. One doesn’t pay for every transaction when they use Facebook. But in reality, they do. Facebook would not provide a free service to the users if the expected long term value from the user didn’t exceed the resources spent on processing their requests and storing their data, as well as the cost of acquiring such a user. When using Facebook, users pay both with their data, access to which Facebook then abuses in the most unacceptable ways, and with their attention. The following screenshot literally doesn’t have a single block of information that is not sponsored:
(the value of x is 2)
In the case of the blockchain, if an application developer believes that the total lifetime value from the user will exceed the gas cost of their transactions, they should be able to pay for such transactions. It is one of the few ideas coming from EOS that make a lot of sense. Similarly, if the hosted wallet derives some value from users using the applications, it can choose to cover the costs as well. For example, NEAR might opt in to cover some gas costs for each user, since it is highly motivated to get higher adoption for the protocol. CryptoKitties can choose to cover the cost of interactions with their contracts, since users that start playing CryptoKitties are very likely to buy one, and the expected value of a user is extremely high.
This only solves one part of the puzzle: offsetting the costs of executing transactions. If users don’t have to pay for transactions, they can spam the system with free transactions, and saturate the allowance that the hosted wallet or the application developers set for free usage. But similarly, people who use Facebook can spam them with free requests and saturate their resources. This problem is by no means unique to blockchain and has plenty of solutions already existing. The hosted wallet can choose to implement one such DDoS prevention solution, and still provide users with free transactions.
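As a rough sketch of how a sponsor might bound its exposure, the Python snippet below tracks a per-user gas allowance and falls back to the user paying once it is exhausted. The class name, the allowance value, and the "sponsor"/"user" labels are all hypothetical; a real hosted wallet would combine this with the usual abuse-prevention measures (account verification, rate limits, and so on).

```python
class SponsoredGasPolicy:
    """Decides who pays for a transaction under a fixed per-user allowance."""

    def __init__(self, per_user_allowance: int):
        self.per_user_allowance = per_user_allowance
        self.spent = {}  # user id -> gas already covered by the sponsor

    def who_pays(self, user: str, gas_cost: int) -> str:
        used = self.spent.get(user, 0)
        if used + gas_cost <= self.per_user_allowance:
            # Still within the free allowance: the sponsor covers this one.
            self.spent[user] = used + gas_cost
            return "sponsor"
        # Allowance exhausted: the user has to pay for the transaction.
        return "user"


policy = SponsoredGasPolicy(per_user_allowance=100)
payers = [policy.who_pays("alice", 40) for _ in range(3)]
# payers == ["sponsor", "sponsor", "user"]
```

A spammer can burn through their own allowance in this model, but not through anyone else's.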
There’s still a problem. The model in which someone pays for the user expecting some value from them later is easily abusable. There’s a reason why Google, Facebook, Apple, and other tech giants have non-transparent privacy policies and completely disrespect users’ privacy. The entire motivation behind web3 is to put an end to such practices, but the very way we try to attract users promotes such practices again.
There’s however a fundamental difference. In web3, while the user can start using a service paying with the future expected value, they can at any point switch to paying for transactions themselves and use a hosted wallet, or a browser extension, that doesn’t try to take any advantage of the user’s privacy.
With the progressive security concept and particular solutions above, we can provide users with the onboarding as simple as it is today in web2, with an ability to upgrade to the full blockchain security at any moment in the future.
We are writing a separate blog post on the other side of the problem: ease of development. The state of the developer experience in Ethereum is far from perfect, and we believe that it can be improved significantly.
While waiting for the blog post, you can already experiment with our development experience. Try out our online IDE, and read the documentation.
NEAR Protocol builds a sharded proof of stake blockchain with a fanatical emphasis on usability. If we intrigued you, please follow us on Twitter, and join our Discord, where we discuss all the topics related to tech, economy, governance and more.
In the first part of the series we provided motivation for blockchain sharding and discussed some core concepts. In this post we will discuss some more advanced aspects of sharding, including its two biggest unsolved challenges: data availability and data validity.
The core idea in sharded blockchains is that most participants operating or using the network cannot validate blocks in all the shards. As such, whenever any participant needs to interact with a particular shard they generally cannot download and validate the entire history of the shard.
The partitioning aspect of sharding, however, raises a significant potential problem: without downloading and validating the entire history of a particular shard the participant cannot necessarily be certain that the state with which they interact is the result of some valid sequence of blocks and that such sequence of blocks is indeed the canonical chain in the shard. This problem doesn’t exist in a non-sharded blockchain.
We will first present a simple solution to this problem that has been proposed by many protocols and then analyze how this solution can break and what attempts have been made to address it.
The supposed simple solution
The naive solution to data validity is the following: let’s say we assume that the entire system has on the order of thousands of validators, out of which no more than 20% are malicious or will otherwise fail (such as by failing to be online to produce a block). Then if we sample ~200 validators, the probability of more than ⅓ of them failing can, for practical purposes, be assumed to be zero.
⅓ is an important threshold. There’s a family of consensus protocols, called BFT consensus protocols, that guarantees that for as long as fewer than ⅓ of participants fail, either by crashing or by acting in some way that violates the protocol, the consensus will be reached.
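The sampling claim can be checked with a short calculation. The sketch below assumes each sampled validator is faulty independently with probability 0.2 (a binomial model, which is a simplification of sampling without replacement from a finite validator set) and sums the tail of the distribution.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


sample_size = 200
threshold = sample_size // 3 + 1   # 67: the smallest count that is more than one third
p_fail = prob_at_least(sample_size, threshold, 0.2)
print(f"P(more than 1/3 of the sample is faulty) = {p_fail:.2e}")
```

Under these assumptions the probability comes out well below one in ten thousand for a single sample, which is what makes the naive argument look attractive before adaptive corruption is taken into account.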
With this assumption of honest validator percentage, if the current set of validators in a shard provides us with some block, the naive solution assumes that the block is valid and that it is built on what the validators believed to be the canonical chain for that shard when they started validating. The validators learned the canonical chain from the previous set of validators, who by the same assumption built on top of the block which was the head of the canonical chain before that. By induction the entire chain is valid, and since no set of validators at any point produced forks, the naive solution is also certain that the current chain is the only chain in the shard.
This simple solution doesn’t work if we assume that the validators can be corrupted adaptively, which is not an unreasonable assumption (see here to learn more about adaptive corruption). Adaptively corrupting a single shard in a system with 1000 shards is significantly cheaper than corrupting the entire system. Therefore, the security of the protocol decreases linearly with the number of shards. To have certainty in the validity of a block, we must know that at any point in history no shard in the system has a majority of validators colluding; with adaptive adversaries, we no longer have certainty. As we discussed in the previous part, colluding validators can exercise two basic malicious behaviors: create forks, and produce invalid blocks.
Malicious forks can be addressed by blocks being cross-linked to the Beacon chain that is generally designed to have significantly higher security than the shard chains. Producing invalid blocks, however, is a significantly more challenging problem to tackle.
Consider the following figure on which Shard #1 is corrupted and a malicious actor produces invalid block B. Suppose in this block B 1000 tokens were minted out of thin air on Alice’s account. The malicious actor then produces valid block C (in a sense that the transactions in C are applied correctly) on top of B, obfuscating the invalid block B, and initiates a cross-shard transaction to Shard #2 that transfers those 1000 tokens to Bob’s account. From this moment the improperly created tokens reside on an otherwise completely valid blockchain in Shard #2.
Some simple approaches to tackle this problem are:
- For validators of Shard #2 to validate the block from which the transaction is initiated. This won’t work even in the example above, since block C appears to be completely valid.
- For validators in Shard #2 to validate some large number of blocks preceding the block from which the transaction is initiated. Naturally, for any number of blocks N validated by the receiving shard the malicious validators can create N+1 valid blocks on top of the invalid block they produced.
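The weakness of the second approach is easy to see in a toy model. In the Python sketch below a chain is just a list of booleans saying whether each block applied its transactions correctly, and the receiving shard re-validates only the last `lookback` blocks. The function names and the model itself are illustrative assumptions.

```python
def receiver_accepts(chain_validity, lookback):
    """The receiving shard re-validates only the most recent `lookback` blocks (lookback >= 1)."""
    return all(chain_validity[-lookback:])


def attacker_chain(lookback):
    """Some honest history, one invalid block, then lookback + 1 valid blocks on top."""
    return [True] * 5 + [False] + [True] * (lookback + 1)


for n in (1, 10, 100):
    chain = attacker_chain(n)
    # The chain contains an invalid block, yet the bounded check never sees it.
    print(n, receiver_accepts(chain, n), all(chain))
```

Whatever value of N the protocol picks, the attacker only has to bury the invalid block one block deeper.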
A promising idea to resolve this issue would be to arrange shards into an undirected graph in which each shard is connected to several other shards, and only allow cross-shard transactions between neighboring shards (e.g. this is how Vlad Zamfir’s sharding essentially works, and a similar idea is used in Kadena’s Chainweb). If a cross-shard transaction is needed between shards that are not neighbors, such a transaction is routed through multiple shards. In this design a validator in each shard is expected to validate all the blocks in their shard as well as all the blocks in all the neighboring shards. Consider the figure below with 10 shards, each having four neighbors, and no two shards requiring more than two hops for a cross-shard communication:
Shard #2 is not only validating its own blockchain, but also blockchains of all the neighbors, including Shard #1. So if a malicious actor on Shard #1 is attempting to create an invalid block B, then build block C on top of it and initiate a cross-shard transaction, such cross-shard transaction will not go through since Shard #2 will have validated the entire history of Shard #1 which will cause it to identify invalid block B.
While corrupting a single shard is no longer a viable attack, corrupting a few shards remains a problem. In the following figure an adversary corrupting both Shard #1 and Shard #2 successfully executes a cross-shard transaction to Shard #3 with funds from an invalid block B:
Shard #3 validates all the blocks in Shard #2, but not in Shard #1, and has no way to detect the malicious block.
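Both scenarios can be reproduced in a small simulation. The sketch below is a simplified model, not any protocol's actual logic: shards are laid out as a circulant graph (an assumption, chosen because it gives 10 shards with four neighbors each and at most two hops between any pair, not necessarily the layout in the figure), honest validators detect an invalid block only if it sits in the neighboring shard the funds arrive from, and corrupted validators accept anything.

```python
NUM_SHARDS = 10
OFFSETS = (1, 4)  # assumption: circulant layout, each shard links to +-1 and +-4


def neighbors(shard):
    return {(shard + d) % NUM_SHARDS for o in OFFSETS for d in (o, -o)}


def hop_accepted(src, dst, origin, corrupted):
    """Does shard `dst` accept funds arriving from shard `src`?

    `origin` is the shard that contains the invalid block that minted the funds.
    """
    if dst not in neighbors(src):
        raise ValueError("cross-shard transactions are only allowed between neighbors")
    if dst in corrupted:
        return True  # colluding validators accept anything
    # Honest validators of dst fully validate src's chain, so they only
    # catch the invalid block if it lives in src itself.
    return src != origin


def transfer_succeeds(route, corrupted):
    origin = route[0]
    return all(hop_accepted(a, b, origin, corrupted) for a, b in zip(route, route[1:]))


print(transfer_succeeds([1, 2], corrupted={1}))        # False: Shard #2 catches block B
print(transfer_succeeds([1, 2, 3], corrupted={1, 2}))  # True: Shard #3 never looks at Shard #1
```

The second call is the attack from the figure: the tainted funds are laundered through a corrupted intermediate shard before reaching honest validators.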
There are two major directions of properly solving data validity: fishermen and cryptographic proofs of computation.
The idea behind the first approach is the following: whenever a block header is communicated between chains for any purpose (such as cross-linking to the beacon chain, or a cross-shard transaction), there’s a period of time during which any honest validator can provide a proof that the block is invalid. There are various constructions that enable very succinct proofs that the blocks are invalid, so the communication overhead for the receiving nodes is way smaller than that of receiving a full block.
With this approach for as long as there’s at least one honest validator in the shard, the system is secure.
This is the dominant approach (besides pretending the problem doesn’t exist) among the proposed protocols today. This approach, however, has two major disadvantages:
- The challenge period needs to be sufficiently long for the honest validator to recognize a block was produced, download it, fully verify it, and prepare the challenge if the block is invalid. Introducing such a period would significantly slow down the cross-shard transactions.
- The existence of the challenge protocol creates a new vector of attacks when malicious nodes spam with invalid challenges. An obvious solution to this problem is to make challengers deposit some amount of tokens that are returned if the challenge is valid. This is only a partial solution, as it might still be beneficial for the adversary to spam the system (and burn the deposits) with invalid challenges, for example to prevent the valid challenge from an honest validator from going through. These attacks are called Griefing Attacks.
Neither of the two problems with fishermen has a satisfactory solution, but using fishermen is still strictly better than having the possibility of an invalid block being finalized.
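A minimal sketch of the challenge window, assuming time is measured in abstract ticks and that checking a fraud proof is reduced to a boolean, looks as follows. The class, its parameters, and the deposit handling are hypothetical and only meant to show the two moving parts: the waiting period and the deposit that is burned on an invalid challenge.

```python
class PendingHeader:
    """A block header that becomes final only after an unchallenged waiting period."""

    def __init__(self, posted_at: int, challenge_period: int, required_deposit: int):
        self.deadline = posted_at + challenge_period
        self.required_deposit = required_deposit
        self.rejected = False

    def challenge(self, now: int, deposit: int, fraud_proof_is_valid: bool) -> int:
        """Submit a challenge; returns the amount refunded to the challenger."""
        if now >= self.deadline:
            raise ValueError("challenge period is over")
        if deposit < self.required_deposit:
            raise ValueError("deposit too small")
        if fraud_proof_is_valid:
            self.rejected = True
            return deposit   # valid challenge: deposit is returned
        return 0             # invalid challenge: deposit is burned

    def is_final(self, now: int) -> bool:
        return not self.rejected and now >= self.deadline


header = PendingHeader(posted_at=0, challenge_period=10, required_deposit=5)
refund = header.challenge(now=3, deposit=5, fraud_proof_is_valid=False)  # spam, burned
final_early = header.is_final(now=9)    # still inside the window
final_late = header.is_final(now=10)    # window elapsed with no valid challenge
```

Both disadvantages are visible here: nothing can be treated as final before the deadline, and every challenge, valid or not, still has to be processed by someone.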
Succinct Non-interactive Arguments of Knowledge
The second solution to multiple-shard corruption is to use some sort of cryptographic constructions that allow one to prove that a certain computation (such as computing a block from a set of transactions) was carried out correctly. Such constructions do exist, e.g. zk-SNARKs, zk-STARKs and a few others, and some are actively used in blockchain protocols today for private payments, most notably ZCash. The primary problem with such primitives is that they are notoriously slow to compute. E.g. Coda Protocol, that uses zk-SNARKs specifically to prove that all the blocks in the blockchain are valid, said in one of the interviews that it can take 30 seconds per transaction to create a proof (this number is probably smaller by now).
Interestingly, a proof doesn’t need to be computed by a trusted party, since the proof not only attests to the validity of the computation it is built for, but to the validity of the proof itself. Thus, the computation of such proofs can be split among a set of participants with significantly less redundancy than would be necessary to perform some trustless computation. It also allows for participants who compute zk-SNARKs to run on special hardware without reducing the decentralization of the system.
The challenges of zk-SNARKs, besides performance, are:
- Dependency on less-researched and less-time-tested cryptographic primitives;
- “Toxic waste” — zk-SNARKs depend on a trusted setup in which a group of people performs some computation and then discards the intermediate values of that computation. If all the participants of the procedure collude and keep the intermediate values, fake proofs can be created;
- Extra complexity introduced into the system design;
- zk-SNARKs only work for a subset of possible computations, so a protocol with a Turing-complete smart contract language wouldn’t be able to use SNARKs to prove the validity of the chain.
While many protocols are looking into using zk-SNARKs long term, I do not know of any that plan to launch with them besides Coda.
The second problem we will touch upon is data availability. Generally nodes operating a particular blockchain are separated into two groups: Full Nodes, those that download every full block and validate every transaction, and Light Nodes, those that only download block headers, and use Merkle proofs for parts of the state and transactions they are interested in.
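For readers unfamiliar with Merkle proofs, here is a minimal sketch of how a light node can check that a transaction belongs to a block while knowing only the Merkle root from the header. It uses SHA-256 and duplicates the last node on odd-sized levels; both are assumptions made for illustration, and real chains differ in hashing and tree layout details.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:            # odd level: duplicate the last node
            level.append(level[-1])
        sibling = index ^ 1           # the neighbor within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """What a light node does: recompute the path from the leaf up to the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root, proof = merkle_root_and_proof(txs, index=2)
print(verify(b"tx2", proof, root))     # the full node's proof checks out
print(verify(b"forged", proof, root))  # a different transaction does not
```

Note what the proof does not show: it proves that a transaction is committed to by the header, not that the rest of the block was ever published.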
Now if a majority of full nodes collude, they can produce a block, valid or invalid, and send its hash to the light nodes, but never disclose the full content of the block. There are various ways they can benefit from it. For example, consider the figure below:
There are three blocks: the previous, A, is produced by honest validators; the current, B, has validators colluding; and the next, C, will be also produced by honest validators (the blockchain is depicted in the bottom right corner).
You are a merchant. The validators of the current block (B) received block A from the previous validators, computed a block in which you receive money, and sent you a header of that block with a Merkle proof of the state in which you have money (or a Merkle proof of a valid transaction that sends the money to you). Confident the transaction is finalized, you provide the service.
However, the validators never distribute the full content of the block B to anyone. As such, the honest validators of block C can’t retrieve the block, and are either forced to stall the system or to build on top of A, depriving you as a merchant of money.
When we apply the same scenario to sharding, the definitions of full and light node generally apply per shard: validators in each shard download every block in that shard and validate every transaction in that shard, but other nodes in the system, including those that snapshot shard chains state into the beacon chain, only download the headers. Thus the validators in the shard are effectively full nodes for that shard, while other participants in the system, including the beacon chain, operate as light nodes.
For the fisherman approach we discussed above to work, honest validators need to be able to download blocks that are cross-linked to the beacon chain. If malicious validators cross-linked a header of an invalid block (or used it to initiate a cross-shard transaction), but never distributed the block, the honest validators have no way to craft a challenge.
We will cover two approaches to address this problem that complement each other.
Proof of Custody
The most immediate problem to be solved is whether a block is available once it is published. One proposed idea is to have so-called Notaries, whose only job is to download a block and attest to the fact that they were able to download it, and who rotate between shards more often than validators. They can be rotated more frequently because, unlike the validators, they don’t need to download the entire state of the shard.
The problem with this naive approach is that it is impossible to prove later whether the Notary was or was not able to download the block, so a Notary can choose to always attest that they were able to download the block without even attempting to retrieve it. One solution to this is for Notaries to provide some evidence or to stake some amount of tokens attesting that the block was downloaded. One such solution is discussed here.
When a particular light node receives a hash of a block, to increase the node’s confidence that the block is available it can attempt to download a few random pieces of the block. This is not a complete solution, since unless the light nodes collectively download the entire block the malicious block producers can choose to withhold the parts of the block that were not downloaded by any light node, thus still making the block unavailable.
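The limitation of plain sampling can be quantified. The sketch below computes the probability that a single light node, downloading a given number of distinct random pieces, hits at least one withheld piece. The piece counts are made-up numbers for illustration.

```python
from math import comb


def detection_probability(total_pieces: int, withheld_pieces: int, samples: int) -> float:
    """Chance that at least one of `samples` distinct random pieces is withheld."""
    available = total_pieces - withheld_pieces
    # comb() returns 0 when there are fewer available pieces than samples.
    return 1 - comb(available, samples) / comb(total_pieces, samples)


# Withholding a single piece out of 100 is caught by a 10-piece sample
# only 10% of the time ...
print(detection_probability(100, 1, 10))
# ... whereas withholding half of the pieces is caught almost always.
print(detection_probability(100, 50, 10))
```

A producer who withholds only a tiny part of the block is therefore very likely to go unnoticed by any individual sampler.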
One solution is to use a construction called Erasure Codes to make it possible to recover the full block even if only some part of the block is available:
Both Polkadot and Ethereum Serenity have designs around this idea that provide a way for light nodes to be reasonably confident the blocks are available. The Ethereum Serenity approach has a detailed description in this paper. Both approaches rely on challenges, and thus are potentially vulnerable to griefing attacks.
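To illustrate the idea behind erasure codes, here is the simplest possible one: a single XOR parity piece, which lets any one missing piece be rebuilt from the others. This is a toy, assuming equal-sized chunks; production designs rely on much stronger codes (Reed-Solomon style codes, for example, tolerate many missing pieces), and the function names below are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks):
    """Append one parity piece: the XOR of all (equal-sized) data chunks."""
    return list(chunks) + [reduce(xor_bytes, chunks)]


def recover(pieces):
    """Rebuild the original chunks when exactly one piece is None."""
    missing = pieces.index(None)
    present = [p for i, p in enumerate(pieces) if i != missing]
    restored = list(pieces)
    # XOR of all surviving pieces equals the missing one.
    restored[missing] = reduce(xor_bytes, present)
    return restored[:-1]  # drop the parity piece


block = [b"aaaa", b"bbbb", b"cccc"]
coded = encode(block)
coded[1] = None                 # one piece is withheld
print(recover(coded) == block)  # the block is still recoverable
```

With a code like this, withholding a small part of the block no longer makes it unavailable, so a malicious producer has to withhold a large fraction, which random sampling is much more likely to notice.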
Long term availability, and Conclusion
Note that all the approaches discussed above only attest to the fact that a block was published at all, and is available now. Blocks can later become unavailable for a variety of reasons: nodes going offline, nodes intentionally erasing historical data, and others.
A whitepaper worth mentioning that addresses this issue is Polyshard, which uses erasure codes to make blocks available across shards even if several shards completely lose their data. Unfortunately their specific approach requires all the shards to download blocks from all other shards, which is prohibitively expensive.
Luckily, long term availability is not as pressing an issue: since no participant in the system is expected to be capable of validating all the chains in all the shards, the security of the sharded protocol needs to be designed in such a way that the system is secure even if some old blocks in some shards become completely unavailable.
Data validity and data availability remain two problems in designing secure protocols that do not yet have a satisfactory solution. We are actively researching these problems. Stay tuned for updates.
Near Protocol builds a sharded general purpose blockchain with a huge emphasis on usability. If you like our write-ups, follow us on Twitter to learn when we post new content.
If you want to be more involved, join our Discord channel where we discuss all technical and non-technical aspects of Near Protocol, such as consensus, economics and governance.
Near Protocol is being actively developed, and the code is open source. Follow our progress on GitHub.
Thanks to Justin Drake from Ethereum Foundation, Alistair Stewart from Polkadot, Zaki Manian from Cosmos Protocol, Monica Quaintance from Kadena Protocol and Dan Robinson from Interstellar for reviewing an early draft of this post and providing feedback.
This blog post is the first in the series of two on Blockchain Sharding. After reading this blog post you will know why Sharding is the path to the future-proof blockchain protocols, how Sharding is being built today, what challenges all the sharded protocols face, and how such challenges can be addressed. The second post covers more advanced topics such as data availability, data validity and corrupting shards.
It is well-known that Ethereum, the most used general purpose blockchain at the time of this writing, can only process fewer than 20 transactions per second on the main chain. This limitation, coupled with the popularity of the network, leads to high gas prices (the cost of executing a transaction on the network) and long confirmation times: even though a new block is produced approximately every 10–20 seconds at the time of this writing, the average time it actually takes for a transaction to be added to the blockchain is 1.2 minutes, according to ETH Gas Station. Low throughput, high prices, and high latency all make Ethereum unsuitable for running services that need to scale with adoption.
What is the primary reason for Ethereum’s low throughput? The reason is that every node in the network needs to process every single transaction. Developers have proposed many solutions to address the issue of throughput on the protocol level. These solutions can be mostly separated into those that delegate all the computation to a small set of powerful nodes, and those that have each node in the network only do a subset of the total amount of work. An extreme case of the former approach is Thunder that has one single node processing all the transactions and claims to achieve 1200 tx/sec, a 100x improvement over Ethereum (I do not, however, endorse Thunder, or attest to the validity of their claims). Algorand, SpaceMesh, Solana all fit into the former category, building various improvements in the consensus and the structure of the blockchain itself to run significantly more transactions, but still bounded by what a single (albeit very powerful) machine can process.
The latter approach, in which the work is split among all the participating nodes, is called sharding. This is how Ethereum Foundation currently plans to scale Ethereum. At the time of this writing the full spec is still not published. I wrote a detailed overview of Ethereum shard chains and comparison of their Beacon chain consensus to Near’s.
Near Protocol is also building sharding. The Near team includes three ex-MemSQL engineers responsible for building sharding, cross-shard transactions and distributed JOINs, as well as five ex-Googlers, and has significant industry expertise in building distributed systems.
In this post I summarize the core ideas of blockchain sharding, on which both Near and the majority of other sharded protocols are based. The subsequent post will outline more advanced topics in sharding.
The simplest Sharding, a.k.a. Beanstalk
Let’s start with the simplest approach to sharding, which we will call a Beanstalk throughout this write-up. This is also what Vitalik calls “scaling by a thousand altcoins” in this presentation.
In this approach instead of running one blockchain, we will run multiple, and call each such blockchain a “shard”. Each shard will have its own set of validators. Here and below we use a generic term “validator” to refer to participants that verify transactions and produce blocks, either by mining, such as in Proof of Work, or via a voting-based mechanism. For now let’s assume that the shards never communicate with each other.
The Beanstalk design, though simple, is sufficient to outline some major challenges in sharding.
Validator partitioning and Beacon chains
The first challenge is that with each shard having its own validators, each shard is now less secure than the entire chain. If a non-sharded chain with X validators decides to hard-fork into a sharded chain and splits its X validators across 10 shards, each shard now only has X/10 validators, and corrupting one shard only requires corrupting 5.1% (51% / 10) of the total number of validators. Each shard is thus 10 times less secure than the original chain.
Which brings us to the second point: who chooses validators for each shard? Controlling 5.1% of validators is only damaging if all those 5.1% of validators are in the same shard. If validators can’t choose which shard they get to validate in, a participant controlling 5.1% of the validators is highly unlikely to get all their validators in the same shard, heavily reducing their ability to compromise the system.
Almost all sharding designs today rely on some source of randomness to assign validators to shards. Randomness on a blockchain is in itself a very challenging topic and would deserve a separate blog post at some later date, but for now let’s assume there’s some source of randomness we can use.
Both the randomness and the validator assignment require computation that is not specific to any particular shard. For that computation, practically all existing designs have a separate blockchain that is tasked with performing operations necessary for the maintenance of the entire network. Besides generating random numbers and assigning validators to the shards, these operations often also include receiving updates from shards and taking snapshots of them, processing stakes and slashing in Proof-of-Stake systems, and rebalancing shards when that feature is supported. Such a chain is called a Beacon chain in Ethereum and Near, a Relay chain in Polkadot, and the Cosmos Hub in Cosmos.
Throughout this post we will refer to such a chain as a Beacon chain. The existence of the Beacon chain brings us to the next interesting topic, quadratic sharding.
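The effect of random assignment can be seen in a quick experiment. The sketch below uses Python's seeded `random.Random` as a stand-in for the protocol's randomness source (an assumption; a real protocol needs randomness that validators cannot bias) and splits 1000 validators, 51 of them malicious, across 10 shards of 100.

```python
import random


def assign_to_shards(validators, num_shards, seed):
    """Shuffle validators and deal them out to shards round-robin."""
    rng = random.Random(seed)  # stand-in for the protocol's randomness source
    shuffled = list(validators)
    rng.shuffle(shuffled)
    return [shuffled[i::num_shards] for i in range(num_shards)]


validators = ["bad"] * 51 + ["good"] * 949   # the adversary controls 5.1%
shards = assign_to_shards(validators, num_shards=10, seed=2024)
worst = max(shard.count("bad") for shard in shards)
print(f"largest number of malicious validators in one shard: {worst} of 100")
```

To control a shard the adversary would need all 51 of its validators to land in the same shard of 100, which a uniform shuffle makes astronomically unlikely; if validators could pick their shard, it would be trivial.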
Sharding is often advertised as a solution that scales infinitely with the number of nodes participating in the network operation. While it is in theory possible to design such a sharding solution, any solution that has the concept of a Beacon chain doesn’t have infinite scalability. To understand why, note that the Beacon chain has to do some bookkeeping computation, such as assigning validators to shards, or snapshotting shard chain blocks, that is proportional to the number of shards in the system. Since the Beacon chain is itself a single blockchain, with computation bounded by the computational capabilities of nodes operating it, the number of shards is naturally limited.
However, the structure of a sharded network does bestow a multiplicative effect on any improvements to its nodes. Consider the case in which an arbitrary improvement is made to the efficiency of nodes in the network which will allow them faster transaction processing times.
If the nodes operating the network, including the nodes in the Beacon chain, become four times faster, then each shard will be able to process four times more transactions, and the Beacon chain will be able to maintain 4 times more shards. The throughput across the system will increase by the factor of 4 x 4 = 16 — thus the name quadratic sharding.
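The arithmetic can be written down in a couple of lines. In the sketch below the baseline figures (20 transactions per second per shard, 100 shards) are arbitrary placeholders; the point is only that a node speedup multiplies both factors.

```python
def system_throughput(tps_per_shard: float, max_shards: float, speedup: float = 1.0) -> float:
    """Total throughput when nodes, including Beacon chain nodes, get `speedup` times faster."""
    faster_shard_tps = tps_per_shard * speedup   # each shard processes more
    supported_shards = max_shards * speedup      # the Beacon chain tracks more shards
    return faster_shard_tps * supported_shards


baseline = system_throughput(20, 100)
improved = system_throughput(20, 100, speedup=4)
print(improved / baseline)  # 16.0
```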
It is hard to provide an accurate measurement for how many shards are viable today, but it is unlikely that in any foreseeable future the throughput needs of blockchain users will outgrow the limitations of quadratic sharding. The sheer number of nodes necessary to operate such a volume of shards securely is orders of magnitude higher than the number of nodes operating all the blockchains combined today.
However, if we want to build future-proof protocols, it might be worth starting to research solutions to this problem today. The most developed proposal as of now is exponential sharding, in which the shards themselves form a tree, and each parent shard orchestrates a series of child shards while possibly being itself a child of some other shard.
Vlad Zamfir is known to be working on a sharding design that doesn’t involve a beacon chain; I worked with him on one of the prototypes, the detailed overview of which is here.
Up until now we haven’t defined very well what exactly is and is not separated when a network is divided into shards. Specifically, nodes in the blockchain perform three important tasks: not only do they 1) process transactions, they also 2) relay validated transactions and completed blocks to other nodes and 3) store the state and the history of the entire network ledger. Each of these three tasks imposes a growing requirement on the nodes operating the network:
- The necessity to process transactions requires more compute power with the increased number of transactions being processed;
- The necessity to relay transactions and blocks requires more network bandwidth with the increased number of transactions being relayed;
- The necessity to store data requires more storage as the state grows. Importantly, unlike the processing power and network, the storage requirement grows even if the transaction rate (number of transactions processed per second) remains constant.
From the above list it might appear that the storage requirement would be the most pressing, since it is the only one that grows over time even if the number of transactions per second doesn’t change, but in practice the most pressing requirement today is compute power. The entire state of Ethereum as of this writing is 100GB, easily manageable by most of the nodes. But the number of transactions Ethereum can process is around 20 per second, orders of magnitude less than what is needed for many practical use cases.
Zilliqa is the most well-known project that shards processing but not storage. Sharding of processing is an easier problem because each node has the entire state, meaning that contracts can freely invoke other contracts and read any data from the blockchain. Some careful engineering is needed to make sure that updates from multiple shards touching the same parts of the state do not conflict. In that regard Zilliqa is taking a very simplistic approach, which I analyze in this post.
While sharding of storage without sharding of processing was proposed, I’m not aware of any project working on it. Thus in practice sharding of storage, or State Sharding, almost always implies sharding of processing and sharding of network.
Practically, under State Sharding the nodes in each shard are building their own blockchain that contains transactions that affect only the local part of the global state that is assigned to that shard. Therefore, the validators in the shard only need to store their local part of the global state and only execute, and as such only relay, transactions that affect their part of the state. This partition linearly reduces the requirement on all compute power, storage, and network bandwidth, but introduces new problems, such as data availability and cross-shard transactions, both of which we will cover below.
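As a minimal sketch of this partitioning, the snippet below assigns each account to a shard and reports which shards a payment touches. The hash-based assignment rule, the shard count, and all names are assumptions made for illustration; real protocols choose their own mapping.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(account: str) -> int:
    """Assign an account to a shard by hashing its name (illustrative rule only)."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(sender: str, receiver: str) -> set:
    """Shards whose local state a payment from sender to receiver affects."""
    return {shard_of(sender), shard_of(receiver)}


def is_cross_shard(sender: str, receiver: str) -> bool:
    """True if no single shard holds both accounts."""
    return len(shards_touched(sender, receiver)) > 1


print(shards_touched("alice", "bob"))
print(is_cross_shard("alice", "alice"))  # False: one account lives on one shard
```

A validator of shard s only needs the balances of accounts for which shard_of returns s, and only executes and relays payments whose shards_touched set contains s.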
Beanstalk as a model is not a very useful approach to sharding, because if individual shards cannot communicate with each other, they are no better than multiple independent blockchains. Even today, when sharding is not available, there’s a huge demand for interoperability between various blockchains.
Let’s for now only consider simple payment transactions, where each participant has an account on exactly one shard. If one wishes to transfer money from one account to another within the same shard, the transaction can be processed entirely by the validators in that shard. If, however, Alice, who resides on shard #1, wants to send money to Bob, who resides on shard #2, neither the validators on shard #1 (they won’t be able to credit Bob’s account) nor the validators on shard #2 (they won’t be able to debit Alice’s account) can process the entire transaction.
There are two families of approaches to cross-shard transactions:
- Synchronous: whenever a cross-shard transaction needs to be executed, the blocks in multiple shards that contain state transition related to the transaction get all produced at the same time, and the validators of multiple shards collaborate on executing such transactions. The most detailed proposal known to me is Merge Blocks, described here.
- Asynchronous: a cross-shard transaction that affects multiple shards is executed in those shards asynchronously, the “Credit” shard executing its half once it has sufficient evidence that the “Debit” shard has executed its portion. This approach tends to be more prevalent due to its simplicity and ease of coordination. This system is today proposed in Cosmos, Ethereum Serenity, Near, Kadena, and others. A problem with this approach lies in that if blocks are produced independently, there’s a non-zero chance that one of the multiple blocks will be orphaned, thus making the transaction only partially applied. Consider the figure below that depicts two shards both of which encountered a fork, and a cross-shard transaction that was recorded in blocks A and X’ correspondingly. If the chains A-B and V’-X’-Y’-Z’ end up being canonical in the corresponding shards, the transaction is fully finalized. If A’-B’-C’-D’ and V-X become canonical, then the transaction is fully abandoned, which is acceptable. But if, for example, A-B and V-X become canonical, then one part of the transaction is finalized and one is abandoned, creating an atomicity failure. We will cover how this problem is addressed in proposed protocols in the second part, when covering changes to the fork-choice rules and consensus algorithms proposed for sharded protocols.
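The three outcomes of the fork scenario described for the asynchronous approach can be enumerated with a short sketch. Representing a chain as a list of block names and the function below are assumptions for illustration only.

```python
def transaction_outcome(canonical_1, canonical_2, block_1="A", block_2="X'"):
    """Classify a cross-shard transaction recorded in block_1 (shard 1) and block_2 (shard 2).

    canonical_1 and canonical_2 are the chains that ended up canonical in each shard.
    """
    in_shard_1 = block_1 in canonical_1
    in_shard_2 = block_2 in canonical_2
    if in_shard_1 and in_shard_2:
        return "finalized"
    if not in_shard_1 and not in_shard_2:
        return "abandoned"
    # Exactly one half survived: the transaction is only partially applied.
    return "atomicity failure"


print(transaction_outcome(["A", "B"], ["V'", "X'", "Y'", "Z'"]))      # finalized
print(transaction_outcome(["A'", "B'", "C'", "D'"], ["V", "X"]))      # abandoned
print(transaction_outcome(["A", "B"], ["V", "X"]))                    # atomicity failure
```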
Note that communication between chains is useful outside of sharded blockchains too. Interoperability between chains is a complex problem that many projects are trying to solve. In sharded blockchains the problem is somewhat easier since the block structure and consensus are the same across shards, and there’s a beacon chain that can be used for coordination. In a sharded blockchain, however, all the shard chains are the same, while in the global blockchain ecosystem there are lots of different blockchains, with different target use cases, decentralization and privacy guarantees.
Building a system in which a set of chains have different properties but use sufficiently similar consensus and block structure and have a common beacon chain could enable an ecosystem of heterogeneous blockchains that have a working interoperability subsystem. Such a system is unlikely to feature validator rotation, so some extra measures need to be taken to ensure security. Both Cosmos and Polkadot are effectively such systems. This writeup by Zaki Manian from Cosmos provides a detailed overview and comparison of the key aspects of the two projects.
You now have a good understanding of how sharding is implemented, including the concepts of the beacon chain, validator rotations and cross-shard transactions.
With all that information, there’s one last important thing to consider: what adversarial behavior malicious validators can exercise.
A set of malicious validators might attempt to create a fork. Note that it doesn’t matter whether the underlying consensus is BFT or not: corrupting a sufficient number of validators will always make it possible to create a fork.
It is significantly more likely for more than 50% of a single shard to be corrupted than for more than 50% of the entire network to be corrupted (we will dive deeper into these probabilities in the second part). As discussed above, cross-shard transactions involve certain state changes in multiple shards, and the corresponding blocks in such shards that apply such state changes must either be all finalized (i.e. appear in the selected chains on their corresponding shards), or all be orphaned (i.e. not appear in the selected chains on their corresponding shards). Since generally the probability of shards being corrupted is not negligible, we can’t assume that the forks won’t happen even if a byzantine consensus was reached among the shard validators, or many blocks were produced on top of the block with the state change.
This problem has multiple solutions, the most common one being occasional cross-linking of the latest shard chain block to the beacon chain. The fork choice rule in the shard chains is then changed to always prefer the chain that is cross-linked, and only apply shard-specific fork-choice rule for blocks that were published since the last cross-link. We will talk more about what fork-choice rules are, and provide an in-depth analysis of proposed fork-choice rules for sharded blockchains in the second part.
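A minimal sketch of such a modified fork choice rule follows. It assumes chains are lists of block names and uses "most blocks since the cross-link" as a stand-in for the shard-specific rule; both are illustrative assumptions, not the rule of any particular protocol.

```python
def blocks_since(chain, block):
    """Number of blocks that follow `block` in `chain`."""
    return len(chain) - chain.index(block) - 1


def choose_chain(candidate_chains, cross_linked_block):
    """Prefer chains containing the last cross-linked block, then apply the shard rule."""
    eligible = [chain for chain in candidate_chains if cross_linked_block in chain]
    if not eligible:
        raise ValueError("no candidate chain contains the cross-linked block")
    # Stand-in shard-specific rule: the chain with the most blocks after the cross-link.
    return max(eligible, key=lambda chain: blocks_since(chain, cross_linked_block))


honest = ["G", "A", "B"]
attacker = ["G", "A'", "B'", "C'", "D'"]

# Once A is cross-linked to the beacon chain, the longer competing chain is ignored.
print(choose_chain([honest, attacker], "A"))  # ['G', 'A', 'B']
```

If only G had been cross-linked, both chains would remain eligible and the stand-in rule would pick the longer one, which is why the frequency of cross-links matters.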
Approving invalid blocks
A set of validators might attempt to create a block that applies the state transition function incorrectly. For example, starting with a state in which Alice has 10 tokens and Bob has 0 tokens, the block might contain a transaction that sends 10 tokens from Alice to Bob, but ends up with a state in which Alice has 0 tokens and Bob has 1000 tokens.
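Detecting such a block amounts to re-executing its transactions and comparing the result with the state the block claims. The sketch below does this for plain transfers; the dictionary-based state and the function names are assumptions for illustration.

```python
def apply_transfer(state, sender, receiver, amount):
    """Return a new state with `amount` moved from sender to receiver."""
    if amount < 0 or state.get(sender, 0) < amount:
        raise ValueError("invalid transfer")
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def block_is_valid(pre_state, transactions, claimed_post_state):
    """Re-execute the transactions and compare with the state the block claims."""
    state = pre_state
    try:
        for sender, receiver, amount in transactions:
            state = apply_transfer(state, sender, receiver, amount)
    except ValueError:
        return False
    return state == claimed_post_state


pre = {"Alice": 10, "Bob": 0}
txs = [("Alice", "Bob", 10)]
print(block_is_valid(pre, txs, {"Alice": 0, "Bob": 10}))    # True
print(block_is_valid(pre, txs, {"Alice": 0, "Bob": 1000}))  # False
```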
In a classic non-sharded blockchain such an attack is not possible, since all the participants in the network validate all the blocks, and a block with such an invalid state transition will be rejected both by other block producers and by the participants of the network that do not create blocks. Even if the malicious validators continue creating blocks on top of such an invalid block faster than honest validators build the correct chain, thus making the chain with the invalid block longer, it doesn’t matter, since every participant that is using the blockchain for any purpose validates all the blocks, and discards all the blocks built on top of the invalid block.
On the figure above there are five validators, three of whom are malicious. They created an invalid block A’, and then continued building new blocks on top of it. Two honest validators discarded A’ as invalid and were building on top of the last valid block known to them, creating a fork. Since there are fewer validators in the honest fork, their chain is shorter. However, in a classic non-sharded blockchain every participant that uses the blockchain for any purpose is responsible for validating all the blocks they receive and recomputing the state. Thus any person who has any interest in the blockchain would observe that A’ is invalid, and thus also immediately discard B’, C’ and D’, as such taking the chain A-B as the current longest valid chain.
In a sharded blockchain, however, no participant can validate all the transactions on all the shards, so they need some way to confirm that at no point in the history of any shard of the blockchain was an invalid block included.
Note that unlike with forks, cross-linking to the Beacon chain is not a sufficient solution, since the Beacon chain doesn’t have the capacity to validate the blocks. It can only validate that a sufficient number of validators in that shard signed the block (and as such attested to its correctness).
I am aware of only two solutions to this problem, neither of which is really satisfactory today:
- Have some reasonable mechanism that will alert the system if an attempt to apply the state transition incorrectly is made. Assuming that each shard is running some sort of BFT consensus, for as long as the number of malicious validators in a particular shard is less than ⅔, at least one honest validator would need to attest to a block, and verify that the state transition function is applied correctly. If more than ⅔ of the nodes are malicious, they can finalize a block without a single honest node participating. Assuming that at least one node in the shard is not malicious, some mechanism is needed that would allow such nodes to monitor what blocks are being produced, and have sufficient time to challenge blocks with an invalid state transition.
- Have some information in the blocks that is sufficient to prove that the state transition is applied correctly but is significantly cheaper to validate than the actual application of the state transition function. The closest mechanism to achieve that is zk-SNARKs (though we don’t really need the “zk”, or zero-knowledge, part, a non-zk SNARK would be sufficient), but zk-SNARKs are notoriously slow to compute at this point.
Many protocols today assume that with proper validator rotation and a byzantine fault tolerant consensus neither forks nor invalid state transitions are possible. We will begin the subsequent Sharding 201 by addressing why this assumption is not reasonable.
With the above information you now know most of the important aspects of sharding, such as the concept of the Beacon chain, computation versus state sharding, and cross-shard transactions. Stay tuned for the second part with Sharding 201 which will dive deeper into attack prevention.
NEAR Protocol builds a sharded public blockchain that executes smart contracts on a WASM virtual machine. If this sounds like Ethereum 2.0 (aka Serenity), it’s because they actually are very similar. However, based on Serenity’s multi-year roadmap, we believe that with our team and focus, we can deliver significantly faster.
Despite the release not being around the corner, Serenity’s specification is mostly available. As of the time of this writing (November 6th, 2018), the spec for the beacon chain is complete, and the spec for the shard chains, while not published yet, is mostly finalized (in the absence of the official spec, I have a rather detailed blog post that describes the design for Ethereum 2.0 shard chains). This blog post also contains some details of Ethereum’s beacon chain design that are not immediately obvious from the specification, which I learned from an in-depth conversation with Vitalik.
NEAR differs from Serenity in several aspects. Most notably, NEAR uses different consensus algorithms and fork choice rules in both the beacon chain and the shard chains. Given the extensive experience that Ethereum researchers have, strong motivations are required to justify such decisions.
In this post, I will describe the differences between the two protocols and our motivations for using our own consensus algorithms and fork choice rules rather than those designed by the Ethereum team.
The Beacon Chain
It is highly desirable for the beacon chain not to have forks. In both Ethereum and NEAR, the beacon chains are responsible for selecting validators for shard chains and for snapshotting the state of the shard chains (the so-called cross-linking), with both processes relying on the beacon chain not having forks.
BFT consensus tradeoffs
Achieving zero forkfulness is highly challenging. The majority of modern BFT consensus algorithms do not scale beyond 1000 participants, while the permissionless blockchain networks are expected to scale to millions or possibly billions of people. Therefore, consensus on each block has to be reached by a number of participants that is significantly smaller than the total number of participants in the system. It can be done in two somewhat similar but fundamentally different ways (assuming a proof-of-stake sybil resistance mechanism exists):
- Make the stake for becoming a consensus participant (“validator”) so high that only on the order of 1000 participants can participate. Generally, that would be a six-digit sum in US dollars per validator. In this approach, a fork in the blockchain would result in millions or tens of millions of dollars slashed. Even if a fork does occur, it will be a significant event, with consequences that are likely to result in a hard fork with some mitigation of the damage. For all practical purposes, such a system can be assumed to have zero forkfulness. It is arguable, however, whether such a system is decentralized. People capable of staking such sums of money within a particular blockchain ecosystem tend to know each other, meaning that the security of the system will be in the hands of a tight-knit group of people. It can result in all sorts of non-slashable misbehavior such as censoring, stalling, etc.
- Make the stake for becoming a validator low, and randomly select 1000 validators to create blocks. A new set of validators can be selected for each block, or rotated every few blocks. Assuming that the total number of malicious actors in the system is substantially less than ⅓, the probability of more than ⅓ corrupted validators appearing in a sample is very low. The problem with this approach is that the validators can be corrupted after they are selected (see this blog post with some further analysis). Corrupting a sufficient percentage of a shard is extremely hard, but not impossible. Therefore, a system that uses such an approach cannot rely on the absence of forks.
Footnote: For example, Algorand, which claims to never have forks, uses the latter approach. When answering a direct question about bribing validators, Silvio Micali responded that Algorand assumes that less than 50% of all the validators are corruptible. It is not only an unreasonable assumption but also, in my opinion, invalidates some of Algorand’s other declared properties.
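The sampling claim in the second approach can be made concrete with a binomial tail estimate. This is a sketch under a simplifying assumption that is not in the text: each of the sampled validators is malicious independently with the same probability, which approximates sampling from a very large validator pool. It does not model corruption after selection.

```python
from math import exp, lgamma, log


def log_binomial_pmf(n, k, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def prob_more_than_third_corrupt(sample_size, malicious_fraction):
    """Probability that strictly more than 1/3 of the sample is malicious (0 < fraction < 1)."""
    first_bad = sample_size // 3 + 1  # smallest count strictly above one third
    return sum(exp(log_binomial_pmf(sample_size, k, malicious_fraction))
               for k in range(first_bad, sample_size + 1))


for fraction in (0.20, 0.25, 0.30):
    print(fraction, prob_more_than_third_corrupt(1000, fraction))
```

The probability falls off very quickly as the malicious fraction drops below ⅓, which is what makes sampling attractive; the weakness discussed above is that the fraction inside a sample can change once its members are known.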
In essence, the design decision comes down to some compromise between centralization and forkfulness. An early design of Casper heavily favored centralization (see this link with a deprecated design, in particular MIN_DEPOSIT_SIZE being set to 1500 ETH). In the present designs NEAR favors forkfulness, while Ethereum’s Casper builds a consensus algorithm that scales to hundreds of thousands of validators, thus avoiding the compromise altogether. The pros and cons of both and why we do not use Casper are as follows.
With our current constants, each block is backed by approximately 0.1% of all the stake in the system. Thus, assuming the same valuation as Ethereum’s today ($20B) and 5% of all the tokens staked for validation, the cost of corrupting 50% of one block’s validators is around $0.5M, which is significantly less than the cost of corrupting the entire system.
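The estimate follows directly from the figures quoted above. The variable names are illustrative, and equating the cost of corruption with the value of the stake is a simplifying assumption.

```python
from fractions import Fraction

valuation_usd = 20_000_000_000         # assumed valuation, as quoted above
staked_fraction = Fraction(5, 100)     # 5% of all tokens staked for validation
stake_per_block = Fraction(1, 1000)    # each block backed by ~0.1% of the stake
fraction_to_corrupt = Fraction(1, 2)   # corrupt 50% of one block's validators

total_staked_usd = valuation_usd * staked_fraction
block_backing_usd = total_staked_usd * stake_per_block
corruption_cost_usd = block_backing_usd * fraction_to_corrupt

print(int(total_staked_usd))     # 1000000000
print(int(block_backing_usd))    # 1000000
print(int(corruption_cost_usd))  # 500000
```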
Importantly, however, while for each block (produced once a minute) the probability of a fork is not negligible, the probability of reverting a large sequence of blocks is very low. Within one day, the validators (in terms of tokens staked) for each block do not intersect, so the number of tokens slashed to revert a tail of X blocks is linear in X. In particular, reverting all the blocks produced in one day would result in at least ⅓ of the total stake of all validators slashed.
Despite the fact that the beacon chain spec is published, the exact details of how the validation on the beacon chain is done and which subset of validators finalizes the blocks is not easy to derive from the spec. I had an in-depth conversation with Vitalik to better understand the current design.
To become a validator in Ethereum, it is sufficient to stake 32 ETH. The number of validators is capped at approximately 4 million, but the expected value in practice should be around 400K. Shards sample committees from those validators, but on the beacon chain, all validators attest to each block, and all validators participate in Casper (see my blog post for the overview of the shard chains in Ethereum, and an overview of proposing and attesting; from now on, I assume the reader is familiar with those concepts).
The attestations on the beacon chain serve multiple purposes, two of which are relevant for us:
- The attestations are used for the LMD (latest-message driven) fork choice rule that is used for blocks produced since the last block finalized by Casper;
- The attestations are reused for Casper finalization itself (see the Casper FFG paper).
Unlike the previous proposals, all the ~400K validators rather than a sample participate in each Casper finalization. LMD still relies on samples of 1/64 of all the validators.
Update: make sure to read Vitalik’s response here, where he provides more details and clarifications.
The blocks on the beacon chain are produced every 16 seconds (increased from 8 seconds in a recent spec update), and Casper finalization happens every 100 blocks. This effectively means that every 16 seconds, 400K/64 participants create a multisignature on a block, and every ~26 minutes all 400K participants reach a byzantine consensus on a block.
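These figures can be rederived from the constants in the text. The constant names below are illustrative and are not taken from the Ethereum specification.

```python
BLOCK_TIME_SECONDS = 16         # beacon chain block interval quoted above
BLOCKS_PER_FINALIZATION = 100   # Casper finalizes every 100 blocks
TOTAL_VALIDATORS = 400_000      # expected validator count
LMD_SAMPLE_DIVISOR = 64         # each block is attested by 1/64 of the validators

attesters_per_block = TOTAL_VALIDATORS // LMD_SAMPLE_DIVISOR
finalization_seconds = BLOCK_TIME_SECONDS * BLOCKS_PER_FINALIZATION
finalization_minutes = finalization_seconds / 60

print(attesters_per_block)             # 6250
print(finalization_seconds)            # 1600
print(round(finalization_minutes, 1))  # 26.7
```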
Both sending 400K signatures over the network and aggregating them are expensive. To make it feasible, the validators are split into committees. Assuming 400K participants, each committee consists of 4096 participants (with 1024 total committees). Each committee aggregates the BLS signature internally, and propagates it up to the whole validator set, where only the resulting combined signatures from the committees are aggregated into the final BLS signature. The validation of a BLS signature is rather cheap, as is computing an aggregated public key for the 400K validators. I personally estimate the most expensive part will be validating 4K signatures within each committee, but according to Vitalik that should be doable in a couple of seconds.
While Casper FFG, in practice, indeed provides almost zero forkfulness, there are a few reasons why we chose our consensus instead of adopting Casper FFG:
- In Ethereum, the underlying block production mechanism relies on synchronized clocks; I will discuss problems with this reliance below when I talk about shard chains;
- Casper only finalizes blocks every 26 minutes. Blocks between such finalizations can theoretically have forks: the attestations do not provide theoretical guarantees, and even with ⅔ attestations on a block and less than ⅓ of malicious actors a block could be reverted.
Besides those reasons, NEAR aims to enable network operators to run nodes on mobile phones. To fully leverage the benefits of linear scalability that sharding provides, a blockchain network needs to have significantly more participating nodes than there are in any blockchain network existing today, and the ability to run nodes on (high end) mobile phones taps into a pool of hundreds of millions of devices. With Thresholded Proof of Stake, a participant on the beacon chain only needs to participate in a cheap consensus once per stake per day, while with Ethereum’s approach one would need to be constantly online, participating in heavy computations (validating thousands of BLS signatures every few seconds). Ethereum doesn’t target mobile devices as operating nodes, so for them, such a decision makes sense.
It is also important to note that the majority of participants on Ethereum will stake significantly more than 32 ETH, and will thus participate in multiple committees, which might create a networking bottleneck (a participant that staked 32000 ETH and thus participates in ~1000 committees will have to receive around 1000 x 4096 signatures every 16 seconds).
Overall, the main consideration for NEAR is the ability to run on low end devices, so we chose simpler and cheaper BFT consensus with small committees instead of running a consensus among all the validators. As a result, the beacon chain in NEAR Protocol can in theory have forks, and the rest of the system is designed to work without assuming that the beacon chain has zero forkfulness.
The Shard Chains
NEAR uses its own consensus called TxFlow for shard chains, while Ethereum 2.0 uses the proposers / attesters framework. While TxFlow provides byzantine fault-tolerant consensus under the assumption that less than ⅓ of nodes are malicious in each shard, such an assumption is completely unreasonable for a shard chain, for reasons discussed above.
With that assumption removed, TxFlow and Attestations have very similar properties: blocks are produced relatively quickly, and the probability of forks is reasonably small under normal operation. The major drawback of TxFlow is that it stalls if more than ⅓ of the participants are offline. Ethereum maintains liveness with any number of validators dropping out (though the speed of block production linearly degrades with fewer participants online).
On the other hand, Ethereum shard chains depend crucially on participants having synchronized clocks. The blocks are produced at a regular schedule (one every 8 seconds), and for the system to make progress, the clocks need to be synchronized with an accuracy of a few seconds. I personally do not believe that such synchronization is possible without depending on centralized time servers that become a single point of failure for the system. Also, the security analysis of possible timing attacks when there’s a dependency on a clock appears to be extremely complex.
At NEAR, we have a principled position to not have any dependency on synchronized clocks, and thus cannot use the proposers/attesters framework for the shard chains.
It is also worth mentioning that we are actively researching ways to adjust TxFlow in such a way that it maintains liveness when fewer than ⅔ of validators are online (naturally at the expense of higher forkfulness under such circumstances).
When designing complex sharded blockchains, many design decisions come down to choosing from multiple suboptimal solutions, such as choosing between centralization and forkfulness in the beacon chain.
We are working closely with the Ethereum Foundation on sharding research, and both teams are aware of the pros and cons of different approaches. In this blog post I presented our thinking behind the decisions that differ in our design from Ethereum Serenity.
If you want to stay up to date with what we build at NEAR, use the following channels:
- Twitter — https://twitter.com/nearprotocol,
- Discord — https://discord.gg/kRY6AFp, we have open conversations on tech, governance and economics of blockchain on our discord.
- Our recently launched research forum — http://research.nearprotocol.com/
Huge thanks to Vitalik Buterin for providing detailed explanation on how the beacon chain in Ethereum Serenity works.