Ethereum’s client diversity: with 66% running Prysm, is The Merge safe to pursue?
With around two thirds of the staking power on the Ethereum 2.0 chain running a single client, what happens if that client is hit by a serious vulnerability, and what can the community do to fix the issue?
Around the middle of this year, Ethereum, the second-largest blockchain by monetary value, with hundreds of billions of dollars’ worth of assets depending on its operation, will transition from the Proof-of-Work consensus algorithm that secures the system today to the Proof-of-Stake system of tomorrow, a procedure many have described as changing the engine of an airplane while flying it. Ethereum can, under no circumstances whatsoever, stop producing valid blocks.
Unlike most blockchains, Bitcoin for example, Ethereum’s developer community, encouraged by the Ethereum Foundation (EF) and many of the community’s prominent figures, has agreed to develop several independent client implementations of the Proof-of-Stake consensus protocol, often referred to as Ethereum 2.0. The clients are written in different programming languages and developed by separate teams.
Post-merge, there will be two types of nodes
The transition will be a merger, often simply referred to as The Merge, between today’s Ethereum network nodes, a subset of which function as miners, and the nodes running the so-called beacon chain, which has been up and running since December 2020. At the same time, the duties of nodes will be separated. Today, nodes both execute transactions and validate those same transactions.
Post-merge, there will be two types of nodes. Execution nodes will present the Ethereum Virtual Machine (EVM) to users and smart contracts, execute transactions, and pass them on to validator nodes for validation. Execution nodes on the execution chain will basically perform the same duties as they do now, except that validation will be taken care of by the validator nodes on the consensus chain.
The two types of clients can share some code when they are developed in the same programming language, and the execution clients have needed only modest updates to accommodate the merge. Most parts of the execution client, such as the EVM, can be reused with slight modifications. Eventually, the execution clients may drop altogether the code that handles validation on the present Proof-of-Work chain.
The merge, then, is not a merge in the common sense of two chains becoming one. Rather, at a certain point, set by the chain’s accumulated mining difficulty rather than by a fixed block height, today’s nodes will stop validating transactions, and validators will take over that duty. This is a classic way of enhancing robustness by separating duties into different logical layers.
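In the merge specification, EIP-3675, the switch-over point is the first Proof-of-Work block whose total difficulty (the sum of all block difficulties so far) reaches a preset terminal total difficulty (TTD), while its parent’s total difficulty is still below it. The Python sketch below illustrates that rule only; the function name, the toy chain, and the TTD value are hypothetical, and real difficulty values are vastly larger.

```python
def is_terminal_pow_block(total_difficulty: int,
                          parent_total_difficulty: int,
                          terminal_total_difficulty: int) -> bool:
    """Return True if this block is the last Proof-of-Work block.

    Rule as described in EIP-3675: the block's total difficulty has reached
    the TTD, but its parent's total difficulty had not.
    """
    return (total_difficulty >= terminal_total_difficulty
            and parent_total_difficulty < terminal_total_difficulty)


# Toy chain of cumulative (total) difficulties, one entry per block.
chain_total_difficulties = [100, 250, 420, 600, 790]
TTD = 500  # hypothetical threshold for this example

terminal_index = next(
    i for i in range(1, len(chain_total_difficulties))
    if is_terminal_pow_block(chain_total_difficulties[i],
                             chain_total_difficulties[i - 1], TTD)
)
print(terminal_index)  # 3: the block at 600 is the first to cross 500
```

Because the trigger depends on accumulated work rather than a block number, the exact block height of the transition is only known once it happens.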
66% on one client could mean game over
The reasoning behind having several clients is that a fault, whether a bug or a vulnerability, in one client won’t affect the others, because they don’t share code, or even programming languages.
It’s fair to ask why Bitcoin doesn’t take the same approach. The reason is that the Bitcoin protocol and its implementation are very simple compared to Ethereum’s. Ethereum is an order of magnitude more complex than Bitcoin, and added complexity by nature means a higher risk of vulnerabilities and a larger attack surface.
This is all fine as long as the different clients are evenly distributed, or close to it, and in particular as long as no single client is used by more than 33% of the staking power in the network. If that isn’t the case, and certainly if one client is used by around 66% of the staking power or more, which is roughly the situation today, then the whole idea of having different code bases for different clients is pretty much useless.
Without going too far into the weeds of how different distributions affect the operation of the network, it suffices to say that if a serious bug hits a client with less than ⅓ of the staking power, no harm is done. The network will continue to operate without any hiccups. The bug will be fixed and everything will go back to normal.
If the same thing happens to a client with between ⅓ and ½ of the staking power, it’s a bit more serious, since the chain stops finalizing, but users won’t notice. Automatic mechanisms, such as the inactivity leak that gradually drains the stake of non-participating validators, will take care of it. If a serious bug hits a client with more than ½ of the staking power, a host of mechanisms will kick in automatically and eventually mend the situation, but there will be complications and disturbances to the network, and users will be affected.
If, however, a bug hits a client that is used by more than ⅔ of the staking power, it’s basically game over. The buggy clients have a super-majority and all the power that comes with it, and the buggy chain will finalize. In essence, all the non-buggy clients can do is to either permanently split the chain, in which case we will have two Ethereums, or join the buggy chain and live with whatever the bug has caused.
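The tiers above boil down to a few stake thresholds. The Python sketch below encodes them as a simple lookup; the function name and outcome labels are illustrative, and only the ⅔ finality threshold comes from the protocol itself. Exact fractions are used so that a share of exactly ⅔ is not misclassified by floating-point rounding.

```python
from fractions import Fraction


def impact_of_buggy_client(share: Fraction) -> str:
    """Classify the damage when a client holding `share` of all stake has a consensus bug.

    Tiers follow the article. Finality needs at least 2/3 of the stake, so the
    honest validators keep finalizing as long as the buggy client holds at most
    1/3. Treatment of the exact 1/2 boundary is a simplifying assumption.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    if share <= Fraction(1, 3):
        return "no harm: honest validators still finalize"
    if share < Fraction(1, 2):
        return "finality stalls; automatic mechanisms recover"
    if share < Fraction(2, 3):
        return "disruption: users are affected until the network recovers"
    return "game over: the buggy chain can finalize"


for label, share in [("a minority client", Fraction(1, 5)),
                     ("Prysm at roughly 66%", Fraction(66, 100)),
                     ("a client at 70%", Fraction(70, 100))]:
    print(f"{label}: {impact_of_buggy_client(share)}")
```

A share of 66% lands in the “disruption” tier, but only because it sits a fraction of a percentage point under ⅔; a small shift in stake would move it into the last tier.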
Readers interested in the details are encouraged to read jmcook.eth’s article on Mirror.
Supermajority on Prysm, not an ideal situation
As of today, around ⅔ of the network’s staking power runs the Prysm client implementation, developed by Prysmatic Labs. This is, to say the least, not an ideal situation should Prysm turn out to contain a bug that could be exploited to cause a consensus failure on the network. To be fair, this scenario is unlikely, but its probability is nevertheless non-zero.
The other clients on the market are Lighthouse, Teku, Nimbus, Grandine, and Lodestar. Of these, Grandine and Lodestar have very small market shares, both well below 1%. Grandine is the only one published under a closed-source license.
The distribution of consensus clients as of press time is shown in the illustration below. As the reader can see, Prysm’s dominance is far from satisfactory, though just below the critical ⅔ level. For up-to-date details and resources, visit clientdiversity.org.
A fair question to ask is why the Prysm client is so dominant: why did the people and organizations that run validator nodes choose Prysm? To answer the question, CryptoSlate reached out to Marius van der Wijden, an Ethereum core developer working on the Geth (Go Ethereum) Proof-of-Work client.
Prysm rules due to first-mover advantage
“I think the big reasons for Prysm’s success are a first-mover advantage, tooling, and golang. Prysm was the first prototype implementation of a beacon client. Thus they could start optimizing their client early on and they had more time to create additional tooling (e.g. the Web UI) and good documentation.”
“Another big advantage is the programming language used by Prysm – golang – which is reasonably performant and very easy to read and develop in. Go-ethereum is also written in golang, thus devs familiar with Geth could also easily understand and audit Prysm,” van der Wijden says.
The Geth connection matters, because the distribution among Proof-of-Work execution clients is even more uneven than among consensus clients. At the time of writing, Geth’s “market share” is over 85%. In a post-merge world, however, this is less of a problem, since execution nodes merely execute transactions and don’t provide security the way consensus clients do.
“Go-ethereum currently has a supermajority of 85% on the execution layer. It will be a bit better post-merge since stakers can run multiple execution layer clients, with one beacon client, in order to always end up on the correct chain,” van der Wijden says.
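One way to picture van der Wijden’s point is a staker who asks several execution clients for the same block and only trusts a hash that a majority of them agree on. The Python sketch below is hypothetical: it assumes the block hashes have already been fetched from each client (for example via the standard `eth_getBlockByNumber` JSON-RPC method), and the client labels and hash values are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def agreed_block_hash(hashes_by_client: Dict[str, str]) -> Optional[str]:
    """Return the block hash reported by a strict majority of execution clients.

    Returns None when no hash has a strict majority, signalling that the
    clients disagree and the operator should investigate before following
    any of them.
    """
    if not hashes_by_client:
        return None
    block_hash, votes = Counter(hashes_by_client.values()).most_common(1)[0]
    return block_hash if votes * 2 > len(hashes_by_client) else None


def dissenting_clients(hashes_by_client: Dict[str, str]) -> List[str]:
    """List clients whose reported hash differs from the majority hash."""
    majority = agreed_block_hash(hashes_by_client)
    if majority is None:
        return sorted(hashes_by_client)  # no majority: treat every client as suspect
    return sorted(c for c, h in hashes_by_client.items() if h != majority)


# Hypothetical reports for the same block number from three execution clients.
reports = {"geth": "0xaaa", "nethermind": "0xbbb", "besu": "0xbbb"}
print(agreed_block_hash(reports))   # 0xbbb
print(dissenting_clients(reports))  # ['geth']
```

The example deliberately shows the majority client as the odd one out: with a cross-check like this, a bug in Geth would be flagged instead of silently followed.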
Big exchanges are the big Prysm contributors
Now, not all node operators are equal. Some have staked vastly more ether than others and thus wield more staking power than their smaller peers. The biggest stakers are so-called staking services and pools, which let users stake ether on the beacon chain without having to cough up 32 ETH. If it weren’t for all of the major staking services running the Prysm client, client diversity wouldn’t be an issue.
These staking services have familiar names: Coinbase, Kraken, and Binance. Yes, the same exchanges.
Of the 278,407 validators on the beacon chain today, Coinbase alone runs 48,864 (17.5%), and 92.4% of those run Prysm. That makes Coinbase responsible for 24.3% of all Prysm validators.
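The percentages above can be reproduced with a little arithmetic. One assumption is needed that the article does not spell out: the “contribution” figure appears to be the operator’s Prysm validators divided by all Prysm validators, taking Prysm’s share as two thirds of the 278,407 validators. With that assumption, the Coinbase numbers come out as published.

```python
TOTAL_VALIDATORS = 278_407
# Assumption: Prysm runs two thirds of all validators (the article's "around 2/3").
PRYSM_VALIDATORS = TOTAL_VALIDATORS * 2 / 3


def network_share(validators: int) -> float:
    """Fraction of all beacon chain validators run by one operator."""
    return validators / TOTAL_VALIDATORS


def prysm_contribution(validators: int, prysm_usage: float) -> float:
    """Fraction of all Prysm validators that belong to one operator."""
    return validators * prysm_usage / PRYSM_VALIDATORS


coinbase_share = network_share(48_864)
coinbase_contribution = prysm_contribution(48_864, 0.924)
print(f"{coinbase_share:.2%}")         # 17.55%
print(f"{coinbase_contribution:.1%}")  # 24.3%
```

The same two functions match the Kraken, Binance, Lido, and Rocket Pool figures quoted below to within about a tenth of a percentage point; the published numbers appear to be truncated rather than rounded.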
When CryptoSlate asked Coinbase how it views the client diversity issue, the company’s contribution to it, and what, if anything, Coinbase would do to mitigate it, Jaclyn Sales of Coinbase’s communications team pointed to a tweet thread by Coinbase Cloud from February 22.
In the thread, Coinbase mainly points to security as the motivation behind the choice to run Prysm.
“Coinbase uses multiple eth2 staking providers to maximize security and client distribution. When launching eth2 staking, Coinbase evaluated existing clients and providers to maximize these traits, which meant starting with Prysm because it was the only viable client supporting remote signers.”
“Remote signers allow validators to generate and store keys in isolated environments instead of keeping them on the validator itself, which greatly increases the security of the eth2 validators on Coinbase.”
Coinbase: Prysm had better security features
According to the thread, remote signers also allow Coinbase Cloud to offer double-signing protection through high-watermark software, which helps protect validators from any issues with the signing modules in clients.
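A high watermark is a simple form of slashing protection: the signer remembers the highest slot and epochs it has signed for and refuses anything that does not move forward. The Python sketch below shows the general idea only; it is not Coinbase Cloud’s software, and the class and method names are hypothetical. It is deliberately conservative, since refusing a legitimate message only costs a small reward, while signing a slashable one gets the validator slashed.

```python
from typing import Dict


class HighWatermarkGuard:
    """Refuse to sign blocks or attestations below a per-validator watermark."""

    def __init__(self) -> None:
        self.max_block_slot: Dict[str, int] = {}
        self.max_source_epoch: Dict[str, int] = {}
        self.max_target_epoch: Dict[str, int] = {}

    def allow_block(self, pubkey: str, slot: int) -> bool:
        # A second block at the same slot could be a double proposal, so the
        # slot must be strictly higher than anything signed before.
        if slot <= self.max_block_slot.get(pubkey, -1):
            return False
        self.max_block_slot[pubkey] = slot
        return True

    def allow_attestation(self, pubkey: str, source_epoch: int, target_epoch: int) -> bool:
        if source_epoch > target_epoch:
            return False  # malformed attestation
        # Target must move strictly forward: rules out double votes and
        # being surrounded by an earlier vote.
        if target_epoch <= self.max_target_epoch.get(pubkey, -1):
            return False
        # Source must never move backwards: rules out surrounding an earlier vote.
        if source_epoch < self.max_source_epoch.get(pubkey, -1):
            return False
        self.max_source_epoch[pubkey] = source_epoch
        self.max_target_epoch[pubkey] = target_epoch
        return True


guard = HighWatermarkGuard()
print(guard.allow_attestation("0xabc", 10, 11))  # True
print(guard.allow_attestation("0xabc", 10, 11))  # False: same target epoch again
print(guard.allow_attestation("0xabc", 9, 12))   # False: source moved backwards
print(guard.allow_block("0xabc", 100))           # True
print(guard.allow_block("0xabc", 100))           # False: slot already signed
```

Because the guard sits in front of the keys rather than inside the client, it keeps working even if the client’s own signing module misbehaves, which is the benefit the thread describes.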
“On the Coinbase Cloud team, we service Coinbase Retail, but also many other customers. We have supported Lighthouse for almost a year, and worked with @sigp_io to add remote signer support to Lighthouse late last year,” the tweet continues.
As for Kraken, which has 30,847 validators (11%), 94.9% of them running Prysm, and accounts for 15.7% of all Prysm validators, Brian Hoffman, Senior Product Manager at Kraken, answered in an email:
“When we first built our ETH2 staking model, we found Prysm the most appropriate solution due to its maturity and stability.”
“Following discussions with the Ethereum Foundation, both Kraken and Staked have also started to roll out new validators that are built on Teku, as well as migrate some existing ones. This way we can increase diversity in our validator client software and offer clients an even more resilient on-chain staking service.”
Binance, with 24,410 validators (8.7%), 76.6% of them running Prysm, accounts for 10% of all Prysm validators. The exchange did not answer CryptoSlate’s request for comment.
Lido, with 50,274 validators (18%), has roughly twice as many validators as Binance, but at least its Prysm usage is “only” 42.8%, so Lido accounts for 11.5% of all Prysm validators.
Decentralized, Rocket Pool leads the way
There are, of course, exceptions, but they are very small. The decentralized staking pool Rocket Pool, for one, has 2,100 validators (0.75%), with only 10.6% of them running Prysm, so Rocket Pool accounts for a mere 0.12% of all Prysm validators.
All in all, the four major staking services and pools have it within their reach to solve the situation, and on the bright side, there are ongoing discussions among the staking services, and between the staking services and the Ethereum Foundation. According to Ethereum core developer Marius van der Wijden, progress in these discussions is “good”.
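A rough calculation supports the claim that the big four could fix the problem on their own. Using the validator counts and Prysm usage figures quoted above, and assuming Prysm runs two thirds of all 278,407 validators, the sketch below estimates Prysm’s share if Coinbase, Kraken, Binance, and Lido moved all of their Prysm validators to other clients. This is a hypothetical scenario, not a plan any of them has announced.

```python
TOTAL_VALIDATORS = 278_407
PRYSM_VALIDATORS = TOTAL_VALIDATORS * 2 / 3  # assumption: Prysm's share is about 2/3

# (validators, fraction running Prysm), as quoted in the article
operators = {
    "Coinbase": (48_864, 0.924),
    "Kraken": (30_847, 0.949),
    "Binance": (24_410, 0.766),
    "Lido": (50_274, 0.428),
}

# Prysm validators that would move if all four switched completely.
moved = sum(count * prysm_usage for count, prysm_usage in operators.values())
remaining_share = (PRYSM_VALIDATORS - moved) / TOTAL_VALIDATORS

print(f"Prysm validators moved: {moved:,.0f}")           # 114,639
print(f"Prysm share afterwards: {remaining_share:.1%}")  # 25.5%
print(remaining_share < 1 / 3)                           # True
```

Under these assumptions Prysm would drop to roughly a quarter of the network, comfortably below the ⅓ threshold discussed earlier.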
“Yes, there are talks about this, both internally and externally. I think big staking pools are working on switching parts of their infrastructure to other clients. They need to update their metrics and monitoring infrastructure for the new clients, so it might take longer for them to switch than home validators,” van der Wijden says.
According to van der Wijden, it’s neither risky nor difficult for a node operator to switch client software.
“All major implementations are pretty well tested and maintained. If a user is already staking, they should shut down and persist their slashing database; if they don’t have a slashing database, they should wait for a couple of minutes (> 7 minutes) between shutting down the old client and starting the new client. The only difficulties might arise for bigger stakers as some clients provide different APIs than others,” van der Wijden says.
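One way to read the “> 7 minutes” figure is as “just over one epoch”. With the mainnet consensus constants of 12-second slots and 32 slots per epoch, an epoch lasts 6.4 minutes, so a client that stays off for a full epoch restarts in a later epoch than any it could have attested in before shutdown, and cannot vote twice for the same target epoch. The Python sketch below illustrates that reasoning; the helper names are hypothetical, and carrying over the slashing database (for instance via the interchange format standardized in EIP-3076) remains the safer route.

```python
SECONDS_PER_SLOT = 12  # mainnet consensus-layer constant
SLOTS_PER_EPOCH = 32   # mainnet consensus-layer constant
EPOCH_SECONDS = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 s = 6.4 minutes


def epoch_at(unix_time: int, genesis_time: int) -> int:
    """Epoch number containing the given timestamp (assumes unix_time >= genesis_time)."""
    return (unix_time - genesis_time) // EPOCH_SECONDS


def earliest_safe_restart(shutdown_time: int, genesis_time: int) -> int:
    """First timestamp of the epoch after the one in which the old client stopped.

    Attestations made from then on target a later epoch than anything the
    old client could have signed.
    """
    return genesis_time + (epoch_at(shutdown_time, genesis_time) + 1) * EPOCH_SECONDS


print(EPOCH_SECONDS / 60)  # 6.4
# Worst case over every possible shutdown moment: never more than one epoch,
# which is why waiting "more than 7 minutes" (420 s) is always enough.
print(max(earliest_safe_restart(t, 0) - t for t in range(0, 2 * EPOCH_SECONDS)))  # 384
```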
Is The Merge safe to pursue?
With the merge only months away, the Ethereum community will probably have to accept a less-than-ideal client distribution; the likelihood that Prysm’s share will fall below 33% before then must be seen as very small. This, however, does not discourage Marius van der Wijden or the other Ethereum core developers from pursuing the merge.
“I think it’s safe to pursue. The chances of a consensus failure happening are very small in my opinion. We have great testing and fuzzing infrastructure that runs permanently to find differences between clients. Even if a consensus failure occurs, we will be able to push out new releases and resolve forks quickly and easily.”
“We also have strong consensus that we will not bail out stakers that run a majority client if their clients misbehave,” van der Wijden says.