Nomadic Labs

Adaptive Issuance in Paris

2024-04-08T16:00:00+02:00

TL;DR: Adaptive Issuance has been revised in close cooperation with Tezos ecosystem stakeholders. This post recaps what Adaptive Issuance means for Tezos staking, and what’s different in the Paris proposal.

Since its launch in 2018, the Tezos protocol has emitted a fixed number of tez, roughly 5% of the total supply per year, as rewards to those securing the network: the bakers.

5% of the total supply is more than enough to incentivize a secure network, and as a result most rewards received by bakers are re-distributed to tez holders through the delegation system. However, over time this approach has been seen to cause friction with real-world use of Tezos, as we’ll expand on below.

Adaptive Issuance is a mechanism that lets the network automatically set the issuance rate to the minimal level necessary to secure the network, thereby minimizing inefficiencies that arise from the re-distribution of rewards.

The mechanism continuously adjusts the rewards based on a target ratio for tez staked out of the total supply. When the ratio is below the target, the protocol increases rewards to encourage more staking. When it’s above the target, the protocol reduces rewards to minimize dilution.

This way, the Tezos protocol emits only as much new tez as needed to achieve a desired security level – no more, no less.

Below we look into why it matters. A more in-depth discussion of arguments for (and against) can be found in this Tezos Agora post.

Lower rates, less dilution, minimal friction

The market for delegation suggests that the reward required to incentivize a secure network, the security budget, is notably lower than the current 5% rate.

The delegation system essentially lets tez holders 1) contribute to staking and governance, and 2) minimize dilution by being refunded any “excess inflation” beyond what bakers require to secure the network. Bakers typically share 80-90% of rewards with delegators, indicating that current rewards are somewhat higher than what’s required.

With Adaptive Issuance, rewards adjust towards a ‘minimal viable level’ to avoid excess inflation. It has the important effect of greatly reducing the opportunity cost of undelegated tez, which makes Tezos better adapted to real-world use:

Better composability. Tez becomes more attractive for use in lending markets, DeFi, rollups, etc. Needing to delegate creates friction when lending, locking or pooling tez, as it raises questions about who gets rewards and who votes in governance. Technical workarounds exist, but they also add friction.
Simpler for custodians. Redistributing baking rewards requires some administration, and it’s an area likely to see increased regulation. Lower rewards make it less important for custodians, such as exchanges, to engage in baking to avoid dilution of users’ funds.
Tax considerations. Some jurisdictions tax received rewards as income irrespective of whether they simply serve as compensation for dilution or are created by the validators themselves. A lower rewards rate lowers the potential tax liability.

More at stake for a more secure network

Less incentive to delegate carries the risk of less tez being involved in staking and governance.

To avoid this, the target percentage for staked funds is set to 50% of total tez supply, and reward rates will continuously adjust to incentivize this level.

The 50% target increases network security while still leaving plenty of liquid tez in circulation. It’s lower than the 70% currently involved in staking (baker funds + delegated funds), but requirements for staked funds under Adaptive Issuance are stricter than for delegation.

Delegated funds are neither frozen nor subject to slashing, the economic penalties for dishonest baker behavior. They merely assign staking rights and governance voting power to the baker. The 50% target under Adaptive Issuance requires funds to be actually at stake, i.e. frozen and subject to slashing. This is equivalent to what the current system requires for baker security deposits, which only make up around 7% of total supply today. Under Adaptive Issuance, staked funds are the security deposits.

So while 50% appears lower than 70%, the “quality” of the stake is much higher and should rather be compared to the 7%.

The new staker role

Currently only funds in the baker’s address can be frozen and slashed. Reaching 50% purely with bakers’ funds would be difficult.

To address this, the Paris proposal introduces the role of staker, which can be seen as a step up from delegator. Staked funds contribute to a baker’s staking and voting power, and are also frozen and subject to slashing. However, the baker never takes custody of the funds, and they never leave the staker’s wallet. The protocol handles the staking assignment and freezing of funds.

Rewards accrue to stakers directly from the protocol, without the baker’s involvement, unlike delegation where rewards are paid to the bakers who then pay delegators. In-protocol accrual means more security for external stakers, and further reduces the administrative burden for bakers.

Classic delegation remains possible, but to incentivize staking and compensate for the added risk, staked funds carry double the weight when assigning baking rights and in governance votes.

Bakers define if they’re open to external stake and how much they’re willing to accept, with a maximum of 5x their own stake. The baker can also define a fee that is taken out of rewards accruing to externally staked funds.

External stake beyond the baker’s limit will still be frozen, but is treated as delegated when it comes to rights assignment, governance votes, reward payouts, and in case of slashing.
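To make the weighting and the external-stake limit concrete, here is a minimal Python sketch of how a baker's weighted stake could be computed from the rules above. It is not protocol code: the `Baker` type, its field names and the `baking_power` function are hypothetical, and we assume the weights described in this post (staked funds count fully, delegated funds count half) together with the 5x cap on external stake.

```python
from dataclasses import dataclass

# Weights and cap as described in this post (illustrative constants).
STAKED_WEIGHT = 1.0       # own stake and accepted external stake count fully
DELEGATED_WEIGHT = 0.5    # delegated funds count half as much as staked funds
MAX_EXTERNAL_RATIO = 5.0  # external stake is capped at 5x the baker's own stake


@dataclass
class Baker:
    own_stake: float       # tez frozen by the baker itself
    external_stake: float  # tez staked with this baker by external stakers
    delegated: float       # liquid tez delegated to the baker (incl. its own unstaked funds)
    limit_ratio: float     # baker-chosen external-stake limit, as a multiple of own stake


def baking_power(baker: Baker) -> float:
    """Weighted stake used for baking rights and votes (hypothetical sketch)."""
    limit = min(baker.limit_ratio, MAX_EXTERNAL_RATIO) * baker.own_stake
    accepted = min(baker.external_stake, limit)
    # External stake above the limit stays frozen but counts as delegated.
    overstaked = baker.external_stake - accepted
    staked = baker.own_stake + accepted
    delegated = baker.delegated + overstaked
    return STAKED_WEIGHT * staked + DELEGATED_WEIGHT * delegated
```

For example, a baker with 100,000 tez of own stake, a 5x limit, 700,000 tez of external stake and 200,000 tez delegated ends up with 600,000 tez counted as staked and 400,000 tez counted as delegated, for a weighted total of 800,000.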

	Currently: baker security deposit*	Currently: delegated funds	Adaptive Issuance: baker stake*	Adaptive Issuance: external stake*	Adaptive Issuance: delegated funds
Frozen and slashable	Yes	No	Yes	Yes	No
Weight in baking rights / voting power	100%	100%	100%	100%	50%
Control of funds	Baker	Delegator	Baker	Staker	Delegator
Who pays out rewards	Protocol	Baker	Protocol	Protocol	Baker
Counts towards staking target	N/A	N/A	Yes	Yes	No
*Funds held by the baker beyond the security deposit / stake are treated as delegated. The same applies to external stake that exceeds the baker’s limit.

What’s new in Paris

Adaptive Issuance in the Paris protocol proposal is a modified version of what was previously proposed in the Oxford proposal, which did not reach the required supermajority for adoption (a revised Oxford proposal without Adaptive Issuance, Oxford 2, was later adopted).

The changes highlighted here are the ways in which the Paris version differs from the Oxford version. They are the result of several extensive feedback rounds involving the wider ecosystem.

Progressive min-max rate. After activation, the Adaptive Issuance rate is initially kept in a narrow range, between 4.5% and 5.5%. Over the course of 6 months the range gradually widens, eventually spanning from 0.25% to 10%. This is to create a smoother transition for bakers and avoid abrupt fluctuations after activation, which was a concern with the Oxford version.


Figure: A gradual widening of the range ensures a smooth transition to Adaptive Issuance.
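The progressive range can be pictured as a clamp whose bounds move over time. The Python sketch below assumes, purely for illustration, that both bounds widen linearly over the 6-month transition; the function names are hypothetical, and the protocol's actual schedule may differ in shape and granularity (for example, stepping per cycle rather than continuously).

```python
def issuance_bounds(elapsed_months: float, transition_months: float = 6.0) -> tuple[float, float]:
    """(min, max) annual issuance rate, assuming linear widening (hypothetical)."""
    start_min, start_max = 0.045, 0.055   # range right after activation
    final_min, final_max = 0.0025, 0.10   # range once the transition is over
    # Fraction of the transition completed, clamped to [0, 1].
    progress = max(0.0, min(1.0, elapsed_months / transition_months))
    low = start_min + progress * (final_min - start_min)
    high = start_max + progress * (final_max - start_max)
    return low, high


def clamp_rate(rate: float, elapsed_months: float) -> float:
    """Clip a candidate issuance rate into the bounds in force at that time."""
    low, high = issuance_bounds(elapsed_months)
    return min(max(rate, low), high)
```

Under this assumption, a candidate rate of 8% would be held at 5.5% right after activation, but would be left untouched once the transition is complete.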

Only staking rewards. Only rewards associated with staking become variable under Adaptive Issuance, while rewards for Liquidity Baking remain constant (or 0 if turned off).

Adaptive Slashing. As slashing now affects not only the baker but also, directly, the baker’s stakers, it becomes more important to differentiate between accidental errors and deliberate attacks. To do so, the penalty for double attestation is made proportional to the share of attestation rights in the block held by the bakers involved in double attestation. This means that accidental double attestations are unlikely to cause extensive slashing, while concerted attacks result in severe penalties. This differs from Oxford / Oxford 2, where a fixed percentage is slashed, regardless of the degree of misconduct.


Figure: Accidental double attestations are unlikely to cause extensive slashing, while concerted attacks involving many bakers result in severe penalties.

Manual staking. All stake is handled manually, including when balances change with rewards or incoming transactions. Auto-staking is no longer an option. The reason is that the new staker role introduces complexity that makes it difficult to define a fixed percentage. It also becomes unfeasible for the protocol to calculate auto-staking for both bakers and external stakers, as the number of required calculations goes from around 400 to potentially hundreds of thousands.

Bug fixes. The Oxford proposal contained bugs that have been addressed. While it’s impossible to entirely avoid bugs, we have greatly expanded our testing framework and are continuously working to improve it.

Co-built with ecosystem stakeholders

The Adaptive Issuance mechanism included in the Paris proposal has been built in close cooperation with Tezos bakers, blockchain indexers, tooling providers, and other ecosystem stakeholders. This is part of a longer-running effort to strengthen the collaborative aspect of protocol development.

Adaptive Issuance is a major revision of Tezos staking economics, with significant implications for ecosystem stakeholders, and we are grateful for the time invested by them in helping us shape it. Feedback rounds about progressive min-max, Adaptive Slashing, and in particular the staking UX, have been very fruitful.

With the Paris proposal, we believe that the most prevalent points of friction have been identified and addressed, and that the ecosystem is generally better prepared for Adaptive Issuance.

Get involved

It is already possible to experiment with Adaptive Issuance on the Weeklynet testnet, and anyone looking to explore it is highly encouraged to join.

Nomadic Labs’ engineers have played out various realistic scenarios on Weeklynet to showcase the baking and staking user experience, as well as the behavior of Adaptive Issuance and Adaptive Slashing under different conditions. Reports covering the actions and their effects are available here.

Additionally, we’ve released the code for an Adaptive Issuance simulator which can be used to model the behavior of the issuance rate in different scenarios. It also enables estimations of baking rewards for different baker configurations.

For more information, we recommend the Paris proposal announcement. For questions about Adaptive Issuance that may remain unanswered, feel free to reach out.

Adaptive Issuance is an ambitious upgrade which in its functionality embodies the Tezos ethos of constant evolution and adaptation. We are happy to have worked closely with the broader ecosystem on this feature, and it doesn’t stop here.

Close cooperation with bakers and other ecosystem actors is key to creating the best possible blockchain. We look forward to further strengthening this relationship, as the work to improve Tezos continues.

Faster, Higher, Stronger: introducing the Paris protocol upgrade proposals!

2024-03-28T09:00:00+01:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, & Functori.

Following the successful activation of the Oxford 2 protocol on February 9th, we are pleased to announce that the Paris protocol proposals, Paris A and Paris B, are ready!

As usual, their “true names” are given by their hashes: PtParisA6ruu136piHaBC7cQLDP87JEqtczJWP2pLa5QCELGBH5 for Paris A and PtParisBQscdCm6Cfow6ndeU6wKJyA3aV1j4D3gQBQMsTQyJCrz for Paris B.

If adopted, either proposal would bring the following major updates and improvements to the Tezos protocol:

Lower latency and faster finality with 10s block times without compromising decentralization or security.
The activation of the Data Availability Layer (the DAL) on Mainnet, boosting throughput and scalability of Smart Rollups.
Refinement to Tezos PoS, reducing the delays to acquire and update baking rights and simplifying their computation.

The two proposals differ regarding Adaptive Issuance, Staking, and Adaptive Slashing, a major overhaul of the fundamentals of Proof-of-Stake in Tezos:

Paris B includes Adaptive Issuance, Staking, and Adaptive Slashing. That is, these features would be immediately enabled upon protocol activation.
Paris A does not include these features. Instead, it offers bakers the possibility to vote for activating them later, via a dedicated on-chain signaling mechanism.

In this article, we give a preview of the improvements described above and expand on the choices provided by both alternatives. Both proposals also include further minor improvements and other changes. A complete list of changes is provided in the Changelog.

10s block times bring lower latency and faster finality

As recently announced, the Paris protocol proposals reduce block time to 10 seconds, lowering latency and enabling faster finality, without compromising on decentralization.

This work builds upon previous foundations, like the adoption of the Tenderbake consensus algorithm and validation pipelining, which led to halving block time to 15 seconds with Mumbai. In order to reduce block time further, the challenge was to identify and resolve remaining performance bottlenecks without compromising network safety and without forcing Tezos bakers to invest in expensive hardware.

It has always been our core belief that bakers should be able to participate in Tezos consensus with affordable, lower-end infrastructure. This is a core strength of Tezos that inherently impacts safety and decentralization, so we want to preserve it as Tezos evolves.

Based on our experiments, the following minimum specification is both affordable and performant enough to bake blocks on time:

3 CPU cores (arm64 or amd64/x86-64 architectures) – 2 are needed by the Octez node and 1 is needed by the Octez baker;
8GB RAM + 8GB swap (or 16GB RAM);
100GB SSD storage (or similar I/O performance);
a low-latency, reliable broadband internet connection.

Please see our recent blogpost for a thorough, in-depth analysis of this work.

The DAL activates on Mainnet, boosting Smart Rollups capacity

The Data Availability Layer (DAL) is a foundational step towards ensuring Tezos’ long-term scalability and provides a massive throughput boost for Smart Rollups like Etherlink. It enables Tezos Layer 1 to attest the publication of data living outside Layer 1 blocks, increasing the bandwidth of data attested by Layer 1 by orders of magnitude.

After being tested live on Weeklynet, both Paris proposals enable the DAL on Mainnet — with opt-in (but warmly encouraged!) participation of Tezos bakers.

At its core, the DAL is a permissionless, peer-to-peer (P2P) network, running in parallel with Layer 1. Anyone can join the Tezos DAL and submit and retrieve published data.

Tezos bakers participate in the DAL by attesting the published data’s availability. Concretely, bakers can attest the data seen on the DAL P2P network by including a DAL payload in their attestation consensus operations (the extended format of these operations is backward-compatible, and will be supported by upcoming versions of the Tezos baking app for Ledger devices).

Smart Rollups like Etherlink can now retrieve attested data from the DAL, allowing for higher throughput without decentralization trade-offs.

All participants, including Smart Rollup operators and Tezos bakers, engage in the DAL via the DAL node – which is released in the Octez suite as an executable binary called octez-dal-node. It caters to the needs of anyone wanting to engage with the DAL, allowing them to publish, store, and exchange data in the DAL.

Participation by Tezos bakers in the DAL is optional. However, published data is only considered attested if it has collected attestations representing at least 66% of the aggregated stake of designated bakers.
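As a rough illustration of the attestation rule, the sketch below checks whether the bakers that attested a piece of published data hold at least 66% of the designated stake. It models the threshold directly on stake, as described here; the function name and data layout are hypothetical, and the protocol's own accounting may be expressed in different units.

```python
THRESHOLD_PERCENT = 66  # published data counts as attested at >= 66% of designated stake


def is_attested(designated_stake: dict[str, int], attesters: set[str]) -> bool:
    """Return True if the attesting bakers hold enough of the designated stake.

    Hypothetical helper: stake is given in arbitrary integer units (e.g. mutez).
    Attesters that are not designated bakers are ignored.
    """
    total = sum(designated_stake.values())
    attested = sum(stake for baker, stake in designated_stake.items() if baker in attesters)
    # Integer comparison avoids floating-point rounding at the threshold.
    return total > 0 and attested * 100 >= THRESHOLD_PERCENT * total
```

With designated stakes of 40, 30 and 30 units, the two largest bakers alone (70%) are enough, while the largest one alone (40%) is not.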

Baker participation is therefore essential for the DAL to become a successful foundation for a high-throughput Tezos ecosystem! Here are a few pointers to get started:

Data Availability Layer (Octez and Tezos Protocol Docs)
The Data Availability Layer (Tezos Docs)
How to join the Tezos DAL as a baker, in 5 steps (Tezos Docs)

Adaptive Issuance, Staking, and Adaptive Slashing

Adaptive Issuance and Staking (only activated in Paris B) are a major overhaul of Tezos’ Proof-of-Stake mechanism. They adapt the economics of tez to fit real-world usage better, as argued in the Tezos Agora post “Why adaptive inflation matters for Tezos”, and increase the security of the chain. This new design is the result of a thorough rework of the initial one in the Oxford protocol proposal, based on extensive testing and on feedback gathered from interactions with bakers, tooling and infrastructure providers, and other Tezos ecosystem members.

The Paris B proposal also introduces Adaptive Slashing, a refinement of the slashing mechanism that aims to distinguish innocent mistakes from deliberate attacks.

Paris A does not include Adaptive Issuance, Staking, and Adaptive Slashing (more on this below). In Paris A, these features are only activated if bakers vote for them through a dedicated signaling mechanism.

Adaptive Issuance

The proposed mechanism ties the protocol’s regular issuance of tez to the ratio of staked tez over the total supply. In other words, the value of consensus participation rewards is no longer determined by fixed protocol constants. Instead, it is recomputed automatically at the end of each blockchain cycle, in order to nudge the staked fund ratio towards a protocol-defined target of 50%. When the ratio of staked funds decreases and diverges from the target, emission rates will increase, incentivizing participants to stake funds to re-approach the target, and vice versa.
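The feedback loop described above can be sketched as a simple proportional controller. This is an illustrative assumption, not the Paris formula: `next_rate`, the `sensitivity` parameter and the linear update rule are hypothetical, and only the 50% target and the per-cycle recomputation come from the text. The default bounds reuse the final 0.25% to 10% range of the progressive min-max mechanism.

```python
TARGET_STAKED_RATIO = 0.5  # protocol-defined target from the text


def next_rate(current_rate: float, staked: int, total_supply: int,
              sensitivity: float = 0.01,
              bounds: tuple[float, float] = (0.0025, 0.10)) -> float:
    """One end-of-cycle update of a hypothetical issuance controller."""
    if total_supply <= 0:
        raise ValueError("total supply must be positive")
    ratio = staked / total_supply
    # Below target: raise the rate to attract stake; above target: lower it.
    candidate = current_rate + sensitivity * (TARGET_STAKED_RATIO - ratio)
    low, high = bounds
    return min(max(candidate, low), high)
```

Calling it once per cycle with the latest staked ratio moves the rate up while the ratio sits below 50% and down once it rises above, always within the configured bounds. The protocol's actual formula is specified in the Paris documentation and is not reproduced here.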

Other rewards and incentives unrelated to staking are unaffected by the Paris protocol proposals, and their values remain set by fixed constants. Notably, the value of the Liquidity Baking subsidy remains a fixed amount.

The original implementation of Adaptive Issuance could cause abrupt fluctuations in participation rewards after the mechanism becomes active. Instead, the progressive min-max bounds mechanism implemented for Paris ensures that issuance rates change gradually over 6 months, smoothing the transition to the new system and allowing bakers to adjust fees progressively.

Staking

The Paris proposals introduce a new role — staker — in addition to delegate (baker) and delegator. Stakers contribute to their chosen baker’s security deposit without the baker taking custody of their funds, for which they are allocated a proportional share of rewards automatically by the Tezos economic protocol. Unlike liquid delegation, staked funds are frozen and are subject to slashing, in proportion to their contribution to their chosen baker’s deposit.
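A minimal sketch of the pro-rata split follows. It assumes that a reward earned on a baker's frozen deposit is divided in proportion to each contribution, and that the baker's optional fee on externally staked funds (a hypothetical `edge` parameter here) is deducted from each staker's share and credited to the baker. Names and the exact order of operations are illustrative, not taken from the protocol.

```python
def split_rewards(reward: float, baker_stake: float,
                  staker_stakes: dict[str, float], edge: float) -> dict[str, float]:
    """Split a reward earned on a frozen deposit pro rata (hypothetical sketch).

    `edge` is the baker's fee, as a fraction of each staker's gross share.
    """
    if not 0.0 <= edge <= 1.0:
        raise ValueError("edge must be between 0 and 1")
    total = baker_stake + sum(staker_stakes.values())
    if total <= 0:
        raise ValueError("deposit must be positive")
    shares = {"baker": reward * baker_stake / total}
    for staker, stake in staker_stakes.items():
        gross = reward * stake / total
        fee = edge * gross
        shares["baker"] += fee          # the fee goes to the baker
        shares[staker] = gross - fee    # the remainder accrues to the staker
    return shares
```

For instance, with a 100 tez reward, 100 tez of baker stake, 300 tez from one staker and a 10% fee, the staker's gross share is 75 tez, of which 7.5 tez goes to the baker, leaving 67.5 tez for the staker and 32.5 tez for the baker.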

Bakers and stakers can modify or remove their stakes via an improved dedicated manual interface which replaces auto-staking and the set deposit limit operation. Changes in staked balances affect baking rights after 2 cycles. Unstaked funds become finalizable (they are unfrozen) after 4 cycles.

Bakers can configure their staking policy by setting parameters that explicitly state the desired limits to staking capacity, including the possibility of refusing third-party stakes. By default, delegates do not accept any funds from stakers.

To encourage staking over liquid delegation, staked and delegated funds have different weights in the computation of a delegate’s baking and voting powers: staked funds (their own and those from stakers) count twice as much as delegated funds.

The report from a recent live testing experience on Weeklynet provides several usage scenarios for bakers and stakers, and gives further insight into how to observe and analyze the impact of manual stake adjustments.

Adaptive Slashing

As stakers are subject to slashing if their chosen baker misbehaves, the effect of penalties extends to more users. It then becomes important to differentiate sporadic incidents arising from involuntary errors from malicious, sustained attacks.

Adaptive Slashing refines the computation of slashing penalties for double-attesting blocks according to the total attestation power of those involved in double-attesting a single block.

If only a small fraction of the total attesting stake is involved, the offending bakers and their stakers receive moderate penalties. These penalties increase smoothly with the amount of stake involved, until a critical level is reached (defined as one third of the total attesting stake). At the critical level and beyond, bakers and stakers incur the maximum penalty: 100% of their deposit is slashed.
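The shape of the penalty can be sketched as follows. We assume, for illustration only, a linear ramp: the slashed fraction grows in proportion to the offenders' share of the attesting stake and reaches 100% at the one-third critical level. The actual Paris curve may be shaped differently below that level, and `slashing_fraction` and `slashed_amount` are hypothetical names. Exact fractions avoid rounding issues at the threshold.

```python
from fractions import Fraction

CRITICAL_SHARE = Fraction(1, 3)  # one third of the total attesting stake


def slashing_fraction(offending_power: int, total_attesting_power: int) -> Fraction:
    """Fraction of deposits slashed, assuming a linear ramp (hypothetical)."""
    if total_attesting_power <= 0:
        raise ValueError("total attesting power must be positive")
    share = Fraction(offending_power, total_attesting_power)
    # Penalty grows with the offenders' share and caps at 100% at the critical level.
    return min(Fraction(1), share / CRITICAL_SHARE)


def slashed_amount(deposit: int, offending_power: int, total_attesting_power: int) -> Fraction:
    """Amount removed from one baker's (or staker's) deposit."""
    return deposit * slashing_fraction(offending_power, total_attesting_power)
```

Under this assumption, a lone baker holding 1% of the attesting stake who double-attests by accident loses 3% of their deposit, while any coalition holding a third or more loses everything.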

See this design document for further detail.

Paris A, Paris B and feature activation

As mentioned earlier, Paris B includes Adaptive Issuance, Staking, and Adaptive Slashing, while Paris A does not.

Paris A: the protocol proposal does not include Adaptive Issuance, Staking, and Adaptive Slashing. Instead, it offers the option to activate them later via a separate per-block voting mechanism, where bakers can signal for or against enabling these features. If net support stays above 80% over a long enough period, the feature activates. A similar vote to gauge baker sentiment has been running since Oxford 2 was activated, and we want to emphasize that the activation of Paris A would reset the current vote.

Specifically, the per-block voting mechanism enabled by Paris A works as follows:

Bakers can vote for (On) or against (Off) the activation of the guarded features, and they can also vote Pass. Absence of signaling equals a Pass vote.
Bakers’ votes are weighted by their staking power, as in the on-chain governance mechanism.
The vote is driven by an exponential moving average (EMA) whose half-life is two weeks. That is, it takes two weeks for the EMA to rise from 0% to 50%, assuming only On votes are cast during that period.
Activation of the features requires the EMA to reach a supermajority of 80% On votes out of all On + Off votes. Pass votes are not counted.
There is no time limit or quorum requirement for the vote.
If the EMA reaches the 80% On supermajority, an adoption phase of around two weeks (5 cycles) launches, after which the features are activated.
There is no mechanism for deactivating Adaptive Issuance after the adoption phase launches. Continued signaling is possible, but the EMA will then have no effect on the status of these features.
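The signaling rules above can be simulated with a few lines of Python. This sketch assumes one signal per block from the block's baker (so stake weighting comes from how often each baker proposes blocks), 10-second blocks, and a floating-point EMA; the protocol may implement the average with integer arithmetic and different constants, and all names here are hypothetical.

```python
import math

BLOCK_TIME_S = 10                                  # assumed block time
HALF_LIFE_BLOCKS = 14 * 24 * 3600 // BLOCK_TIME_S  # two weeks of blocks
DECAY = 0.5 ** (1 / HALF_LIFE_BLOCKS)              # per-block decay factor
ACTIVATION_THRESHOLD = 0.8                         # 80% On out of On + Off


def update_ema(ema: float, vote: str) -> float:
    """Fold one block's signal into the EMA (hypothetical sketch)."""
    if vote == "pass":  # Pass (or no signal) is not counted
        return ema
    if vote not in ("on", "off"):
        raise ValueError(f"unknown vote: {vote!r}")
    target = 1.0 if vote == "on" else 0.0
    return DECAY * ema + (1 - DECAY) * target


def blocks_to_reach(threshold: float, start: float = 0.0) -> int:
    """Blocks of unanimous On signals needed to lift the EMA to `threshold`."""
    if start >= threshold:
        return 0
    # After n On blocks, ema = 1 - (1 - start) * DECAY**n; solve for n.
    return math.ceil(math.log((1 - threshold) / (1 - start)) / math.log(DECAY))
```

Solving 1 - 0.5^(t / 2 weeks) = 0.8 gives t ≈ 4.6 weeks of unanimous On signals starting from 0%; adding the roughly two-week adoption phase yields about 6.6 weeks, in line with the minimal time to activation of roughly 6 weeks discussed below.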

Paris B: Adaptive Issuance, Staking and Adaptive Slashing would activate roughly two weeks (5 cycles) after the proposal is activated on Mainnet, without any additional vote (and regardless of the status of the currently active signaling mechanism).

Paris B offers bakers an opportunity to activate Adaptive Issuance, Staking and Adaptive Slashing right away, if they are already satisfied with the merits of the proposal and want to see it come to effect as soon as possible. Paris A allows the debate on the merit of the features (or lack thereof) to extend further, but comes of course with the opportunity cost of deferring their adoption if they are deemed useful.

Adaptive Issuance, Staking, and Adaptive Slashing would be active on Mainnet earlier with Paris B than with Paris A, even assuming that 100% of votes are in favor. In that scenario, the necessary inertia of the signaling mechanism implies a minimal time to activation of roughly 6 weeks after protocol activation. More conservative participation scenarios would entail delays of at least 6 months.

Both Paris proposals implement further changes to Tezos Proof-of-Stake which are independent of Adaptive Issuance, Staking, and Adaptive Slashing. These changes allow for a simpler computation of baking rights, and reduce the length of a number of delay and grace periods in Tezos Proof-of-Stake. The updated values for the protocol constants governing these periods are chosen by taking into account Tenderbake’s deterministic finality, and the network’s current social organization.

One key change is shortening the consensus rights delay to 2 cycles – that is the wait period for newly computed consensus rights to take effect. This change propagates further to other periods and delays, notably:

The delay for new or updated Consensus keys to become active is now 2 cycles.
The grace period for baker inactivity is now 3 cycles, plus 2 extra cycles for bakers that have just been (re-)activated.

See for further details on all changed constants.

Exciting times ahead

The Paris proposals are packed with features aimed at making Tezos faster, enabling higher throughput and standing on (even) stronger foundations.

We are quite proud of the proposal content, and the effort put into developing these features. That we can announce them today is also thanks to the feedback from, and continuous exchanges with, tooling and infrastructure builders, bakers, and the Tezos community at large.

On that note, we would like to invite everyone to reach out to us on Tezos Discord for further questions and feedback. And stay tuned to the new #baking-announcements channel.

Let’s continue building a bright future for Tezos together.

10 Second Blocks: A Faster Tezos, Fully Decentralized

2024-03-08T16:00:00+01:00

TL;DR: Based on meticulous testing, Nomadic Labs and TriliTech propose reducing Tezos’ Layer 1 block time from 15 to 10 seconds. This blog post walks you through the test setup and results, including a hardware recommendation for bakers.

(EDITED 27-03-2024 to update minimal recommended hardware specs)

Block time is an important and highly visible parameter for a blockchain.

It determines how often incoming transactions are added to the chain, and on Tezos it impacts the user experience in two noticeable ways: latency and finality.

Latency is how quickly a new transaction is first applied by the network. The shorter the block time, the shorter the wait before moving on with what you were doing, and the smoother the user experience.

Finality is the time it takes for a transaction to be considered irreversible. The shorter the time to finality, the smaller the window for malicious actors to attempt double-spending or other fraudulent activities. Faster finality makes a blockchain more reliable and useful for mission critical and high-value transactions.

Tezos’ ‘Tenderbake’ consensus algorithm achieves finality after two blocks, and the current block time of 15 seconds hence equals a time to finality of around 30 seconds. In comparison, the time to finality on Ethereum is ~15 minutes (64+ blocks), and on Solana it’s 12 seconds (32 blocks).

10 second blocks: secure, stable, decentralized

Tezos launched with a Layer 1 block time of 60 seconds in 2018, and developer teams have since worked on optimizing the network to enable shorter block times. In 2021 block time was reduced to 30 seconds, and in 2023 it was further reduced to the current 15 seconds.

In an upcoming “P” protocol proposal, we include a reduction to 10 seconds, enabling a time to finality of 20 seconds.

Why not just step on the pedal and lower block time to, say, 1 second?

As with most parameters – block size being another example – it’s not quite that simple. Each blockchain network works differently, but they share common challenges in that they must balance speed with other factors, notably:

Security. Transactions are sufficiently verified before being accepted by the network.
Stability. The blockchain runs reliably, with outages close to non-existent. Also called liveness.
Decentralization. Broad distribution of validators prevents colluding actors from selectively censoring transactions.

While most blockchains more or less align on the first two, decentralization is often treated differently by various blockchains with diverging philosophies and assumptions.

To achieve higher speeds, some blockchains define a limited set of validators that secure the network. Others have high requirements in terms of hardware, internet connection, and minimum stake, effectively creating an economic barrier.

The Tezos ecosystem prioritizes open participation and a low barrier to entry for bakers (validators). The goal is to have bakers of different geographical regions and economic abilities participate in securing the network, which encourages decentralization and makes the network more inclusive.

Recommended hardware

With recent optimizations of the network, we are confident that a block time of 10 seconds will not harm security, stability or decentralization.

The conclusion is based on thorough testing, performed with strict performance requirements, high load on the network (full blocks), and with added delays simulating Ledger Nano S signing and unstable network conditions.

In fact, the test was performed successfully with a block time of 8 seconds, and 10 seconds was chosen to provide extra safety margin. For a deep dive into our testing methodology and the optimizations that made the reduction possible, see sections below.

Based on our results, we recommend the following minimum hardware specifications, which are viable for all and performant enough to bake at the target block time.

3 CPU cores
8GB RAM + 8GB swap (or 16GB RAM)¹
100GB SSD storage (or similar I/O performance)
A low-latency, reliable internet connection

As the work to reduce latency and time to finality on the Tezos network continues, we expect further reductions in Layer 1 block time, but low hardware requirements remain a goal.

And now, let’s dive into the details behind the reduction to 10 seconds. Heads up, it’s going to get technical.

The road to 10 seconds

We start with a brief history of Tezos network optimizations and block time reductions.

In 2021, a reduction from 60 to 30 seconds was made possible with tweaks to the consensus algorithm (Emmy*) in protocol Granada.

In 2022, protocol Ithaca replaced Emmy* with Tenderbake, a brand new consensus algorithm with deterministic finality, which opened the door to further optimizations. Ithaca also introduced a ‘light check’ validation of manager operations (such as transfers and contract calls) allowing nodes to propagate blocks faster through the network. It also introduced mempool optimizations. The light check approach was later improved in protocol Kathmandu, as part of the pipelining project.

In 2023, protocol Lima extended the light check concept to other types of operations (consensus, voting, etc.). Building on these advancements, the block time was reduced to 15 seconds in the Mumbai protocol.

More recently, in the Nairobi protocol, the propagation of consensus operations was again improved to further accelerate the consensus process.

At this point, identifying the obstacles for further block time reduction was not straightforward, and we deemed it necessary to improve our methodology with reproducible large-scale experiments.

Below, we cover the setup, the results, identified obstacles and improvements, and next steps.

Methodology

The purpose of our experiments is to examine whether a given block time is “safe”, and if not, identify bottlenecks. We define a safe block time as one where network performance lives up to these criteria:

at least 99% of blocks are baked at round 0
no blocks are baked with a round higher than 1
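These two criteria translate directly into a check over the rounds at which blocks were baked during a run. The sketch below is a hypothetical helper written for illustration, not part of Tzimulator; it uses integer arithmetic so that exactly 99% passes.

```python
def is_safe_block_time(rounds: list[int]) -> bool:
    """Apply the two safety criteria to the rounds of all blocks in a run."""
    if not rounds:
        raise ValueError("need at least one block")
    at_round_zero = sum(1 for r in rounds if r == 0)
    # Criterion 1: at least 99% of blocks at round 0 (integer math, no rounding).
    enough_round_zero = at_round_zero * 100 >= 99 * len(rounds)
    # Criterion 2: no block baked at a round higher than 1.
    no_late_rounds = max(rounds) <= 1
    return enough_round_zero and no_late_rounds
```

A run of 100 blocks with a single round-1 block passes; two round-1 blocks, or any block at round 2, fail.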

Framework

To get meaningful data about the consequences of a reduced block time in our simulations, we need to mimic Tezos Mainnet as closely as possible and load the chain with heavy traffic.

For this we have developed the framework Tzimulator, which is capable of executing large-scale experiments within a controlled environment.

The tool uses real-world data in the form of historical operations from Tezos Mainnet. It takes a snapshot of the chain as a starting point, and executes the subsequent two weeks of operations on top of it.

A cluster of Tezos nodes is started
The snapshot from Tezos Mainnet is imported
Using the historical operations, an injector node keeps a high level of proposed operations in the mempool, ensuring full blocks (at the gas limit) throughout the test
The test is executed for a specified number of operations or for a specified time.
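The injector's role in the third step can be illustrated with a small top-up routine. This is a hypothetical sketch, since the internals of Tzimulator are not described here: it assumes the replayed Mainnet operations sit in an in-memory queue and that the injector periodically learns the current mempool size.

```python
from collections import deque


def top_up(mempool_size: int, target: int, history: deque) -> list:
    """Return the replayed operations to inject so the mempool reaches `target`.

    `history` holds not-yet-injected historical Mainnet operations in their
    original order; injected operations are removed from it.
    """
    missing = max(0, target - mempool_size)
    batch = []
    while missing > 0 and history:
        batch.append(history.popleft())
        missing -= 1
    return batch


# Hypothetical usage: with a target of 2500 and 2400 operations currently
# pending, the next 100 historical operations are injected.
history = deque(f"op{i}" for i in range(5000))
batch = top_up(2400, 2500, history)
print(len(batch))  # 100
```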

Architecture

The Tzimulator framework is built on top of Kubernetes (also known as K8s), an open-source system for automating deployment of containerized applications.

Thanks to this highly automated framework, we can efficiently run different realistic simulations while tweaking various parameters. For testing block times, we have also run simulations on different hardware architectures.

All the experiments use machines with:

6.5GB of RAM,
100GB persistent SSD storage,
Gigabit network with simulated network delays

Meanwhile, the CPUs have been configured differently to reflect different hardware requirements. The experiment was carried out with 4 CPU cores of the following architectures:

n1-standard-4 (2 GHz),
n2-standard-4 (2.6 GHz)

In addition to the above parameters we can adjust the following:

Signing delay: Simulates signing times similar to slow hardware devices such as Ledger Nano S, remote signers, etc.
Network delay: Simulates network congestion, with ad-hoc delays ranging from 20ms to 150ms.
Mempool load: Defines target number of operations in the mempool at all times.
Block time: The targeted time between blocks. Various block times were tested (15s, 10s, 8s, 5s) with variations of the above parameters and different versions of Octez to determine the safety of each configuration.

We can also configure which data are collected for a given simulation run. For every node and baker we can recover standard Octez logs (daily-log). For a comprehensive overview and analysis of node behavior, Octez Metrics can be enabled to feed a Prometheus database with a Grafazos dashboard. Finally, our consensus inspection tool Teztale can be used for analyzing consensus performance.

Differences with Mainnet

Presently, the framework diverges from Mainnet in a few ways.

Node and baker configurations are uniform, with all instances using identical hardware, signing hardware, and docker images.

The framework also uses a reduced number of bakers (up to 250) and nodes (up to 250) compared to Mainnet’s larger count (around 400 bakers and 5000 nodes). All nodes operate within the same cluster, and network delay is instead simulated using a custom ad-hoc solution.

Regarding the network topology, the framework has a simpler setup than Mainnet. It uses the default P2P configuration, with each node connected to ~50 other nodes. For 250 nodes, the resulting graph is not fully connected, so operations need several hops to travel from the injection point to the farthest node.

The workflow: Identifying and fixing bottlenecks

Pinpointing bottlenecks and implementing solutions was carried out as an iterative process:

Choose block time and other parameters.
Conduct an experiment with the network under full load.
Analyze data collected from an internally developed profiler, Octez Metrics, and Teztale.
Identify potential bottlenecks based on analysis of the collected data.
Develop an optimization and conduct additional experiments using the same parameters and the newly implemented improvement.
If the optimization demonstrates improved results, merge it into the Tezos codebase, and re-initiate the process.

The baseline

We started out with a baseline experiment: a network configured with the parameters below, notably a block time of 15 seconds, equivalent to current Mainnet.

Node version: Octez v18.1
Block time: 15 seconds
Simulated signing delay: 1 second
Random network delay: 20-150ms
Number of nodes (each with an associated baker): 250
Target mempool operations: 2500
Hardware:
- 6.5GB RAM
- 4 CPU cores
- n1-standard-4 machine with Intel Broadwell CPU platform
- PersistentSSD disk type
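For scripting runs, the baseline parameters above can be captured in a single configuration object. The sketch below is hypothetical: the class and field names are ours, not Tzimulator's, and drawing the network delay uniformly from the 20-150ms range is an assumption about how the random delay is distributed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for one simulation run's parameters."""

    node_version: str
    block_time_s: float
    signing_delay_s: float
    network_delay_ms: tuple[int, int]  # (min, max) of the random delay
    nodes: int                         # each node has an associated baker
    target_mempool_ops: int
    ram_gb: float
    cpu_cores: int
    machine_type: str

    def sample_network_delay_ms(self, rng: random.Random) -> int:
        """Draw one per-message delay (uniform distribution assumed)."""
        low, high = self.network_delay_ms
        return rng.randint(low, high)


BASELINE = ExperimentConfig(
    node_version="Octez v18.1",
    block_time_s=15,
    signing_delay_s=1,
    network_delay_ms=(20, 150),
    nodes=250,
    target_mempool_ops=2500,
    ram_gb=6.5,
    cpu_cores=4,
    machine_type="n1-standard-4",
)
```

A variant run, such as the same setup at a 10 second block time, is then `dataclasses.replace(BASELINE, block_time_s=10)`.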

The summary report for the experiment shows that, out of 363 levels, only one was baked at round 1; all the others were baked at round 0.

We observed that quorum (the 66% of attestations required to complete a block level) was reached in just over 11 seconds on average. The average time for block application (validation time + application time) was 6 seconds.

Average time elapsed since block timestamp, high network load (full blocks)
Stages of a round	Baseline results
Block validated: ready for consensus	4.51s
Block fully applied: new chain head	6.02s
Pre-quorum reached: first consensus vote complete	7.47s
Quorum reached: second and final consensus vote complete	11.14s

The graph below (‘Attestation Reception Delay’ from Teztale’s Level page) shows the result for a single, randomly chosen block level during the experiment. It aligns with the summary, but also reveals a noticeable spread between the first and last (pre-)attestations received.

Overall, our observations confirmed that the network was safe with a 15 second block time. On average, quorum was reached after 11.14 seconds, with a maximum time of 12.46 seconds.

However, it also confirmed that improvements would be required to lower the block time to 10 seconds.

Implemented improvements

Providing a stepwise walkthrough of every experiment performed and every subsequent improvement would make this blog post very long.

Rather, we present an overview of all improvements implemented through iterations of the process described above. We also present the results of an experiment run with all improvements implemented.

The improvements consist of changes to the Octez baker and to the Octez node’s mempool handling. A good example of an improvement with a significant impact is pre-emptive forging, implemented in the Octez baker.

Previously, the baker proposing the block after the current one would wait until the end of the current level to begin the process. But that is not necessary.

Proposing a block can be split into three parts: forging, signing and injecting. Forging and signing are the most time consuming parts, but they can be started before the block level has started.

Most blocks have “idle” waiting time for bakers after quorum is reached, and pre-emptive forging lets the next baker begin forging the next block during this time, so it is ready for injection at the start of the new level. The result is earlier propagation through the network.

With 15 second block time, this change led to quorum being reached ~2 seconds faster – a significant improvement.
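The gain from pre-emptive forging can be illustrated with a toy timing model. In this Python sketch, forging and signing durations are hypothetical inputs and times are measured in seconds from the current block's timestamp; the model ignores propagation and validation, so it only shows why the proposal can be ready at the start of the level, not the measured gain.

```python
def injection_time(level_start: float, forge_s: float, sign_s: float,
                   quorum_at: float, pre_emptive: bool) -> float:
    """Time at which the next block proposal can be injected (toy model).

    `level_start` is when the next level begins and `quorum_at` is when
    quorum on the current level is reached, both in seconds.
    """
    if pre_emptive:
        # Forge and sign during the idle time after quorum, then inject as
        # soon as the new level starts (or when signing ends, if later).
        ready_at = quorum_at + forge_s + sign_s
        return max(level_start, ready_at)
    # Previous behavior: only start forging once the new level has begun.
    return level_start + forge_s + sign_s
```

With a 15 second block time, quorum at 11.14s, and an assumed 1 second each for forging and signing, the proposal is ready for injection at 15s instead of 17s.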

Additional baker improvements:

Operations selection. Bakers are prevented from wasting time on validating a manager operation that would always fail due to insufficient remaining gas in the block.
Operations request mechanisms. Bakers no longer fetch complete mempools from peers, which can delay consensus operations when the mempool is large.
Consensus disk writes. The number of on-disk writes when registering consensus votes is reduced.
Optional state disk writes. Consensus state is no longer written to disk at each block.
Attestation injection time. Attestations are sent immediately after pre-quorum has been detected, without waiting for block application.
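The operations-selection improvement amounts to a cheap gas check before expensive validation. The following is a simplified Python sketch, assuming each manager operation declares a gas limit and that candidates are considered greedily in a given order; the ManagerOp type and select_ops function are illustrative, not the Octez baker's code.

```python
from dataclasses import dataclass


@dataclass
class ManagerOp:
    op_hash: str
    gas_limit: int  # gas declared by the operation


def select_ops(candidates: list[ManagerOp], block_gas_limit: int) -> list[ManagerOp]:
    """Greedily select operations, skipping those that cannot fit.

    An operation whose declared gas exceeds the gas still available in the
    block would always fail, so it is skipped before any validation work.
    """
    remaining = block_gas_limit
    selected = []
    for op in candidates:
        if op.gas_limit > remaining:
            continue  # cheap check: validating this operation is wasted time
        # (A real baker would fully validate `op` at this point.)
        selected.append(op)
        remaining -= op.gas_limit
    return selected
```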

Mempool improvements:

Advertisement computation. A change to how the mempool data structure is filled, which saves a lot of computation.
Advertisement priority. Avoids re-advertising non-consensus operations that have been advertised for a previous block.
Classification of operations order. Optimizations to mempool search reduce the time spent by placing the more time-consuming elements last.

Lastly, the number of cycles with metadata kept on disk for full and rolling nodes has been reduced from 6 to 2. The result is a lighter storage allowing faster access.

Result of improvements

Below we present the results of an experiment using a development version of Octez (v20-dev) with the above improvements enabled. We used the same hardware specifications and parameters as in the baseline experiment.

Perhaps the most important metric, the time required to reach quorum and allow the network to progress to the next block, decreased significantly, from ~11 to ~4.5 seconds.

Average time elapsed since block timestamp, high network load (full blocks), before Octez improvements (v18.1) and after (v20-dev)
Stages of a round	v18.1	v20-dev
Block validated: ready for consensus	4.51s	1.08s (-76%)
Block fully applied: new chain head	6.02s	2.53s (-58%)
Pre-quorum reached: first consensus vote complete	7.47s	2.88s (-61%)
Quorum reached: second and final consensus vote complete	11.14s	4.55s (-59%)
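The relative changes in the last column can be recomputed from the raw timings as a quick sanity check:

```python
# Average times (s) since block timestamp, copied from the table above.
STAGES = {
    "Block validated": (4.51, 1.08),
    "Block fully applied": (6.02, 2.53),
    "Pre-quorum reached": (7.47, 2.88),
    "Quorum reached": (11.14, 4.55),
}


def relative_change(before: float, after: float) -> int:
    """Percentage change from `before` to `after`, rounded to an integer."""
    return round((after - before) / before * 100)


for stage, (v18, v20) in STAGES.items():
    print(f"{stage}: {relative_change(v18, v20)}%")
```

This prints -76%, -58%, -61% and -59%, matching the table.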

The ‘Attestation Reception Delay’ graph shows the result for a single, random block level during the Octez v20-dev experiment. It aligns with the summary and shows that (pre-)attestations now arrive quicker and more consistently around the average time.

Based on these results, we are confident that it is safe to reduce Tezos’ block time to 10 seconds, once the improvements are implemented in Mainnet nodes and bakers.

Our experiments showed even an 8 second block time to be safe, but the more conservative 10 seconds was chosen for the upcoming “P” protocol proposal for extra safety margin.

Hardware recommendations (again)

The following hardware specifications match the ones used for our experiments. They are provided here as a guideline, but bakers are advised to perform their own testing.

3 cores, 2 needed by the node and 1 needed by the baker (arm64 or amd64/x86-64)
8GB of RAM + 8GB of swap or 16GB of RAM¹
100GB SSD storage (or similar I/O performance)
A low-latency reliable internet connection

While it is possible to run the baker setup on a cloud platform, it may not be cost-effective over the long term. Instances with specs similar to the following (or higher) would work:

n1-standard-4 (2 GHz) machine with Intel Broadwell CPU platform
6.5GB RAM allocated to baker and node processes
100GB persistent SSD disk

Impact on Mainnet

As mentioned initially, lowering the block time improves the latency of Tezos Layer 1, resulting in a smoother experience and faster finality.

Some parameters are affected by the change:

Gas per block. Since the gas limit per block is scaled with the block time, a 10 second block time means the gas limit will be lowered from 2.6M to 1.73M gas units. As there are more blocks per minute, throughput is unchanged. The hard gas limit per operation is left unchanged at 1M gas units.
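Because the per-block gas limit scales with block time, throughput in gas per second is preserved. A minimal Python sketch of that arithmetic, assuming a per-block limit of 2,600,000 gas units at a 15 second block time (the value that scales to the 1.73M figure at 10 seconds); the function names are illustrative:

```python
from fractions import Fraction


def scale_gas_limit(gas_limit: int, old_block_time_s: int, new_block_time_s: int) -> int:
    """Scale a per-block gas limit so gas per second stays (almost) constant."""
    return gas_limit * new_block_time_s // old_block_time_s


def gas_per_second(gas_limit: int, block_time_s: int) -> Fraction:
    """Throughput of a chain whose blocks are all full."""
    return Fraction(gas_limit, block_time_s)


old_limit = 2_600_000  # assumed per-block limit at a 15 second block time
new_limit = scale_gas_limit(old_limit, 15, 10)
print(new_limit)  # 1733333
print(float(gas_per_second(old_limit, 15)), float(gas_per_second(new_limit, 10)))
```

Rounding down leaves the 10 second limit at 1,733,333, so throughput stays at roughly 173,333 gas units per second in both cases.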

Consensus traffic. Increasing the number of blocks per minute increases bandwidth consumption from consensus operations. Experiments showed an increase from 200kB/s to 250kB/s on a network running at full load, which is well within acceptable limits.

Cycle length. Increasing the number of blocks per minute (from 4 to 6) also impacts the cycle size as each cycle will have more blocks (from 16384 to 24576).
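Since the number of blocks per cycle grows in the same 6/4 ratio as the number of blocks per minute, the wall-clock length of a cycle does not change. A quick Python check, assuming every block is baked at round 0:

```python
def cycle_duration_hours(blocks_per_cycle: int, block_time_s: int) -> float:
    """Wall-clock length of a cycle if every block is baked at round 0."""
    return blocks_per_cycle * block_time_s / 3600


print(cycle_duration_hours(16384, 15))  # current: 4 blocks per minute
print(cycle_duration_hours(24576, 10))  # proposed: 6 blocks per minute
```

Both calls return about 68.3 hours, or roughly 2.8 days.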

For nodes running a rolling history mode, we’ve implemented a change that counteracts the increase in storage due to the extra blocks. Historically, such nodes have been required to store two weeks of blockchain data (5-6 cycles) for security purposes, but the deterministic finality of the Tenderbake consensus algorithm allows us to safely reduce it to 3-6 days (1-2 cycles).

Hence, while the number of blocks is increased by 50%, the storage footprint for rolling nodes is reduced by ~75%. Meanwhile, archive nodes and full nodes, which store all the chain’s history, will see an increased disk footprint with a 10 second block time.

Next steps

We will continue to run experiments, identify bottlenecks and implement improvements to further improve overall performance of Octez and enable even lower block times.

A factor worth pointing out is the signing delay introduced by older hardware devices, such as the Ledger Nano S. Experiments with added delays corresponding to these hardware devices yielded the following results:

Machine type	Signing device	Minimal safe block time (s)
n1-standard-4 (4-core vCPU @ 2GHz, 8GB RAM assigned)	Ledger Nano S	8
n1-standard-4 (4-core vCPU @ 2GHz, 8GB RAM assigned)	Ledger Nano S+	7
n2-standard-4 (4-core vCPU @ 2.6GHz, 8GB RAM assigned)	Ledger Nano S	6
n2-standard-4 (4-core vCPU @ 2.6GHz, 8GB RAM assigned)	Ledger Nano S+	5

The results indicate that the block time could confidently be reduced by a further second if the majority of bakers currently using a Ledger Nano S were to adopt the Ledger Nano S+ or another, equivalently fast signing solution.

Also worth noting is the difference in performance between the two machine types. Unsurprisingly, the ‘n2’ machine’s faster CPU enables significantly lower block times. However, hardware specifications must always be considered in the context of making sure Tezos has a low barrier for participation to encourage decentralization.

Keeping Tezos decentralized

As our experiments show, Tezos is able to have a Layer 1 block time of 10 seconds while maintaining low hardware requirements.

A quick way to enable further block time reductions would be to increase requirements for hardware. This is not an uncommon approach for blockchains looking to boost performance.

Ethereum, for example, has higher official hardware requirements than Tezos – at least 16GB RAM and a more performant CPU. A chain like Solana is able to produce blocks at a very high pace, but relies on significantly higher official hardware requirements, including 12 CPU cores @2.8 GHz, 256GB of RAM, 2 SSD disks up to 1TB, and 1 Gbit/s internet connection with 10Gbit/s preferred.

For now, we maintain a more conservative target for our work on Tezos. This is because higher barriers for participation may affect decentralization negatively – the economic cost of the hardware is one factor, with access to the required internet bandwidth being another.

As a reminder, decentralization is not just a principled ideal. Ultimately, decentralization remains the only way to enable true censorship resistance – a core purpose of blockchains.

Updated on March 27th 2024. The original version mentioned 8GB of RAM. ↩↩

PoS refinements and Private Rollups: Oxford 2 upgrade is live!

2024-02-09T14:40:00+01:00

On February 9, 2024 at 13:23:45 UTC, the Tezos blockchain successfully activated the Oxford 2 protocol upgrade at block #5,070,849.

This 15th upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

Oxford 2 contains the following changes:

Refinement of Tezos PoS: the Oxford 2 proposal introduces changes to slashing and an automated staking mechanism for bakers. The latter aims to smooth the transition towards a new staked funds management API and avoids manual bookkeeping to counter over-delegation.
Private rollups: Oxford 2 introduces private Smart Rollups, allowing developers to choose between permissionless or permissioned deployments. Additionally, Oxford 2 simplifies the deployment of rollups both on protocol and periodic test networks, as well as on ad-hoc dedicated ones.
Timelocks are re-enabled: a new design and implementation of Timelocks addresses security concerns that led to their temporary deactivation in a previous protocol upgrade.

For more details, see the Oxford 2 announcement post.

A deeper technical description can be found in the protocol proposal’s technical documentation, and a complete list of changes is provided in Oxford 2’s changelog.

Big things coming

The activation of Oxford 2 is just the beginning of a big 2024 for Tezos. We are excited to be part of the ongoing work to evolve Tezos – and the blockchain space.

Data-Availability Layer (DAL) on testnet. The DAL is currently live on Tezos’ Weeklynet, and we encourage bakers and rollup operators to help us test this cornerstone of Tezos’ future. To understand what it does, and what it means for Tezos, see our introduction to the DAL.

Get ready for the launch of Etherlink. Etherlink will be an EVM-compatible optimistic rollup with high throughput, a decentralized sequencer, low gas fees, and MEV protection, powered by Tezos Smart Rollups. Check out etherlink.com to learn more and get involved.

Meanwhile, we are busy working on an upcoming protocol upgrade proposal, “P”. Stay tuned!

Introducing Private Smart Rollups on Tezos

2024-01-30T16:00:00+01:00

TL;DR: A new whitelisting feature allows Smart Rollup builders to choose their desired level of decentralization.

The majority of blockchains don’t provide transaction privacy by default. Tezos is no exception. An address is a pseudonym, and transactions are public. A resourceful and determined actor will often be able to connect an address to a real identity, and then all transaction history is freely available.

This is the case whether you are transacting on Layer 1, in an optimistic rollup, or in a zk-rollup (also called validity rollup). By default there is no privacy and special solutions are required to achieve it.

So how can we achieve privacy in Smart Rollups for use cases that require it?

A common way to achieve privacy is to use zero-knowledge proofs, also called zk-proofs¹. In short they enable you to prove properties about some data without revealing the data itself. Zk-proofs have been adopted by many blockchain projects, including Tezos’ Layer 1, in order to provide optional privacy for simple transactions. However, using zk-proofs quickly becomes complicated and resource intensive when used for complex transactions and smart contracts with many participants.

Some techniques exist that can be applied, such as multi-party computation where a group of participants perform collaborative computation over pooled data while preserving confidentiality for individual inputs. But this is a complex solution that relies on advanced cryptography. It’s costly to implement, and it doesn’t scale well.

So, is there not a simpler way to achieve privacy in an optimistic rollup? There is, but it requires us to relax some security assumptions, and it is only useful for some specific use cases.

Trading (some) decentralization for privacy

The Nomadic Labs adoption team has worked with financial institutions with specific requirements for privacy, and we have developed a simple solution that satisfies their needs. It is a Smart Rollup, but we relax assumptions regarding 1) publication of transaction data, and 2) decentralization of the rollup.

Transaction data for the rollup is not posted on Layer 1 or otherwise publicly, and Layer 1 provides no guarantees about its availability for later verification. Instead transaction data is posted to a non-public Data-Availability Committee (DAC). A DAC is typically run by a consortium of interested parties, which retain control over the data. This enables privacy for incoming transaction data at the cost of verifiability for outsiders.

Imagine a consortium of banks that uses a Smart Rollup for fast and easily auditable settlement. As the transaction data is confidential, it is posted to a non-public DAC. The only things made public are commitments: hash values representing the rollup’s state that reveal nothing about the underlying data.

Using a DAC has been possible on Tezos for a while, but it isn’t enough for keeping a rollup private. The permissionless nature of Smart Rollups means that anyone could create challenges to rollup commitments, even honest ones, which could force rollup operators to reveal some data publicly or face an economic penalty. Since only parties with access to the DAC can meaningfully challenge commitments in the first place, it does not make sense to let anyone do so.

For this reason, the Oxford 2 protocol proposal contains a small change to the protocol allowing Smart Rollup developers to define a whitelist of allowed operators (and hence potential challengers). It’s a simple and private solution for a consortium-style rollup that comes with the benefits of being secured by Tezos consensus, and having a direct bridge to Layer 1 and the wider Tezos ecosystem.

To be clear, implementing a whitelist effectively makes the Smart Rollup permissioned – not fully decentralized. Having it as an option lets Smart Rollup builders choose their desired level of decentralization based on their particular use case.

How Smart Rollup whitelists work

Whether a Smart Rollup is private (permissioned) or public (permissionless) is defined at deployment. If a whitelist is defined, the rollup is considered private by the protocol. If not, it is considered public.

For a private rollup, only addresses on the whitelist can be operators for the rollup, including publishing commitments, challenging commitments, and participating in refutation games.

It is the responsibility of the rollup kernel to maintain the whitelist. It can be updated with a specific outbox message, which includes a new whitelist replacing the existing one. Kernels must therefore implement their own access control logic to add and remove addresses.

Note that a private rollup can become public, but not the other way around.

Private rollups becoming public. If the specified outbox message is sent without containing a whitelist, the rollup becomes public. If a rollup is never meant to become public, its kernel can be designed to disallow empty whitelist messages.
Public rollups cannot become private. An outbox message with a whitelist for a public rollup will fail. This is to prevent a rollup becoming private after it has been advertised as public.
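The rules above can be summarized as a small state machine. The Python sketch below is a toy model with hypothetical names (Rollup, can_operate, apply_whitelist_update); in the real system, the whitelist update arrives as a kernel outbox message and is enforced by the Tezos protocol. How an update without a whitelist affects an already public rollup is not specified above; leaving it public is our assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Rollup:
    """Toy model of a Smart Rollup's whitelist state (hypothetical names)."""

    # None means no whitelist, i.e. the rollup is public (permissionless).
    whitelist: Optional[Set[str]] = None

    @property
    def is_private(self) -> bool:
        return self.whitelist is not None

    def can_operate(self, address: str) -> bool:
        """Gate for publishing/challenging commitments and refutation games."""
        return not self.is_private or address in self.whitelist

    def apply_whitelist_update(self, new_whitelist: Optional[Set[str]]) -> None:
        """Apply the kernel's whitelist-update outbox message."""
        if new_whitelist:
            if not self.is_private:
                # Public rollups can never become private.
                raise ValueError("public rollups cannot become private")
            # The new whitelist replaces the existing one.
            self.whitelist = set(new_whitelist)
        else:
            # A message without a (non-empty) whitelist makes the rollup public.
            self.whitelist = None
```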

More control for builders and testers

The whitelisting feature doesn’t just benefit those working with confidential data.

Some developers may prefer to initially deploy a “beta” rollup with a whitelist, even if still on testnet, and only later make it public. This kind of “training wheels” approach is often used for launching blockchain projects.

Others may opt for a permanent semi-private setup, which can reduce risk for the rollup operator. The transaction data are available for all to check, but only whitelisted operators can initiate refutation games. Transparency is kept, but bad-faith actors can’t leverage kernel bugs or errors by the rollup operators to cause problems, such as incurring a loss of the 10,000 tez stake required to operate a rollup.

With the whitelisting feature, builders have the option to deploy rollups with the level of decentralization that fits their use case, whether it has to do with privacy, security or simply comfort.

We welcome any kind of feedback that helps us make Tezos the best possible solution for blockchain builders. If you are interested in launching a project on Tezos and have questions or suggestions, don’t hesitate to reach out.

Notes

Zk-rollups use elements from zero-knowledge technology, but the “zk” name is slightly misleading as the technology is implemented in a way that provides no more privacy by default than Layer 1 or optimistic rollups. ↩

The Data-Availability Layer is Coming to Tezos – Now on Testnet

2024-01-25T16:00:00+01:00

TL;DR: The Data-Availability Layer is live on Tezos’ Weeklynet, and we call on bakers and rollup operators to help us test this groundbreaking “rollup booster”. This blog post shows you how to get started.

In a previous blog post, we introduced the Data-Availability Layer (DAL) as a game-changing “rollup booster” for Tezos. If you’re new to the DAL concept, we highly recommend starting with this blog post. For a more technical walk-through, see this presentation from the TezDev 2023 conference.

In a nutshell, the DAL is a permissionless peer-to-peer (P2P) network which is part of the Tezos protocol and runs in parallel with Layer 1. It allows for rollup transaction data (Layer 2) to be published outside of the confines of Layer 1 blocks. Layer 1 bakers continuously monitor the DAL and attest on Layer 1 whether a given piece of data is available on the DAL.

What’s happening?

We expect the DAL to go live on Mainnet in 2024, which is why we are now publishing this call for testers on Weeklynet.

Once the DAL goes live on Mainnet, bakers will have to set up a DAL node in addition to the baker daemon. It’s important to test this step ahead of Mainnet activation, as DAL participants will be launching a new P2P network. To be clear: testing distributed networks is hard, and nothing can compete with a live, realistic testnet.

Additionally, we need feedback from rollup operators. The DAL offers rollup operators a permissionless, out-of-the-box data availability solution, enabling them to publish large amounts of data for a rollup at a very low cost and without sacrificing decentralization. Of course, success for such a system is contingent upon a great developer experience. We will need fresh eyes on this process and feedback to help us make it the best possible experience.

Getting started

To get you up and running, we have recently published several resources:

Technical documentation for the DAL
A tutorial explaining how to easily set up a DAL node
A tutorial demonstrating how a Smart Rollup can use the DAL

These documents should be enough to get you started. Afterward, the sky is the limit: you decide which data will be published on the DAL and for what purposes.

The DAL is a cornerstone for the future of Tezos, including innovations such as Etherlink and Tezos 2.0. That is why your feedback is very important. We encourage anyone involved with testing these features to share their feedback in a comment on this Tezos Agora post.

Next steps

We are planning to introduce the DAL with a phased approach. More precisely, the rollout plan for the DAL — assuming the community embraces it — is to:

First enable the DAL on Mainnet without baker incentives and with optional participation.
Later introduce baker incentives for the DAL, at which point DAL participation becomes an integrated part of the baker role.

The rationale is that we want to give bakers ample time to familiarize themselves with the DAL and validate its functionality before integrating incentives on Mainnet, as the incentives touch on a key part of Tezos Layer 1: the baker role and related economics/rewards.

We intend to publish a design proposal specifically for baker incentives around the time when we propose the protocol enabling the DAL, along with an initial implementation live on Weeklynet. This should give plenty of time for bakers to provide feedback on these aspects too.

The DAL is a key element in keeping Tezos a technological frontrunner. Proper testing of it is vital, and we look forward to working with the broader Tezos community to make it ready for Mainnet.

Announcing Oxford 2: a revised 15th protocol upgrade proposal

2023-12-05T10:00:00+01:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

After the Oxford protocol upgrade proposal did not gather the required supermajority of affirmative votes in its promotion period vote, we presented an action plan centered around improving staking UX for bakers and furthering the continuous development of Smart Rollups and other features of the Tezos protocol, e.g. Timelocks.

We thank bakers, infrastructure and application developers, and the Tezos ecosystem at large for their feedback, in particular towards the proposed changes in slashing and automated staked funds management for bakers.

Today, we are pleased to announce that this revised Oxford 2 protocol proposal is ready to be submitted to the Tezos governance mechanism. The proposal’s hash is ProxfordYmVfjWnRcgjWH36fW6PArwqykTFzotUxRs6gmTcZDuH.

In a nutshell, the proposed changes in Oxford 2 can be summarized as follows:

Adaptive Issuance and the new Staking mechanism are disabled: Oxford 2 would not allow the activation of Adaptive Issuance nor extend the Staking mechanism to delegators in its lifetime. The feature vote mechanism inherited from Oxford remains available, but the result of the vote is ignored by the Oxford 2 protocol.
Refinement of Tezos PoS: the Oxford 2 proposal introduces changes to slashing and an automated staking mechanism for bakers. The latter aims to smooth the transition towards the new staked funds management API and avoids manual bookkeeping to counter over-delegation.
Timelocks are re-enabled: a new design and implementation of Timelocks addresses security concerns that led to their temporary deactivation in a previous protocol upgrade.
Smart Rollups keep evolving: Oxford 2 introduces private Smart Rollups, allowing developers to choose between permissioned or permissionless deployments. Oxford 2 also simplifies the deployment of rollups both on protocol and periodic test networks, as well as on ad-hoc dedicated ones.

Below, we provide further detail on the main features of the Oxford 2 protocol upgrade proposal, and highlight key changes with respect to its predecessor, Oxford.

A complete summary of changes is available in Oxford 2’s Changelog.

From Oxford to Oxford 2: Adaptive Issuance and Staking

First and foremost, the Oxford 2 proposal does not enable the activation of Adaptive Issuance, nor does it allow delegators to become stakers. A disabled feature flag has been introduced, which would prevent these features from activating during the protocol’s lifetime.

The disabled feature flags signal our position that the feature is not yet mature enough to be activated on Tezos Mainnet, and we will continue working together with infrastructure builders to ensure that both the feature and the required ecosystem tooling support are ready soon.

However, the feature activation vote introduced in the original Oxford proposal is still available to bakers: participation is still recorded by the Tezos protocol, but it does not take effect and can never activate the feature. We will reset the associated EMA to 0 in any future protocol proposal developed by us that enables the feature flag.

The rationale behind this choice is twofold: first, bakers can manifest their attitude towards Adaptive Issuance and the new Staking mechanism in an informal manner, which is a useful signal. Second, this is a conservative approach which causes less friction to ongoing efforts to develop and test support for these features.
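The feature vote tally is an exponential moving average (EMA) of per-block votes. The Python sketch below only illustrates what recording votes and resetting the EMA to 0 mean; the window size, the yes/no encoding, and the floating-point representation are hypothetical and do not reproduce the protocol's actual constants or arithmetic.

```python
class FeatureVoteEMA:
    """Toy exponential moving average of per-block feature votes."""

    def __init__(self, window: int = 2000) -> None:
        # `window` is a hypothetical smoothing parameter: each vote moves the
        # average 1/window of the way towards 1 ("yes") or 0 ("no").
        self.window = window
        self.value = 0.0  # smoothed share of "yes" votes, between 0 and 1

    def record(self, vote_yes: bool) -> None:
        target = 1.0 if vote_yes else 0.0
        self.value += (target - self.value) / self.window

    def reset(self) -> None:
        """Resetting the EMA to 0 discards the influence of all past votes."""
        self.value = 0.0
```

Under Oxford 2, such a value would keep being updated as an informal signal, but the protocol never acts on it.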

Automated Staking

The Oxford 2 proposal introduces an automated staking mechanism for bakers aiming to smooth the transition between the current frozen deposits management system in Tezos Mainnet, and the staked funds management mechanism originally proposed in Oxford.

The rationale is that the resulting mechanism should not entail noticeable changes for baker operations. In a nutshell,

The set_deposit_limit operation is preserved, enabling bakers to continue capping their frozen deposits below the default value of 10% of their staking balance
The manual staked funds management API implemented in Oxford is disabled, and the protocol automatically adjusts frozen deposits to match the chosen deposit limit at the end of each cycle.
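The end-of-cycle adjustment can be sketched as follows. This is a simplified, hypothetical Python model of the behavior described above, not the protocol's implementation: amounts are in mutez, the target is 10% of the staking balance capped by the limit set with set_deposit_limit, increases are assumed to be bounded by the baker's spendable balance, and any delay before unfrozen funds become spendable is ignored.

```python
from typing import Optional

DEFAULT_DEPOSIT_PERCENT = 10  # default frozen deposit: 10% of staking balance


def auto_stake_delta(staking_balance: int, frozen: int, spendable: int,
                     deposit_limit: Optional[int] = None) -> int:
    """Mutez to stake (> 0) or unstake (< 0) at the end of a cycle.

    The target is 10% of the staking balance, capped by the baker's
    deposit limit if one has been set.
    """
    target = staking_balance * DEFAULT_DEPOSIT_PERCENT // 100
    if deposit_limit is not None:
        target = min(target, deposit_limit)
    delta = target - frozen
    # Assumption: the baker cannot freeze more than its spendable balance.
    return min(delta, spendable) if delta > 0 else delta
```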

See this design document for further technical details of the proposed mechanism – and don’t hesitate to reach out to us on this Tezos Agora thread with questions or feedback.

Refined slashing

In Oxford 2, there are also a few changes to slashing penalties for double signing, that is, for behaviors that might threaten the safety and liveness of the Tezos ledger. As originally proposed in Oxford, the penalties for double baking are made proportional to the baker’s staking balance. However, Oxford 2 sets the penalty at 5% of the stake, instead of the 10% set in Oxford or the fixed 640 tez currently applied on Tezos Mainnet.
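The three penalty rules can be compared with a short Python sketch. It assumes amounts in mutez (1 tez = 1,000,000 mutez), a single double-baking event, and rounding down; the function and rule names are illustrative, not protocol identifiers.

```python
MUTEZ_PER_TEZ = 1_000_000


def double_baking_penalty(stake_mutez: int, rule: str) -> int:
    """Penalty in mutez for one double-baking event (illustrative rule names)."""
    if rule == "mainnet":
        # Fixed penalty on Mainnet at the time of writing.
        return 640 * MUTEZ_PER_TEZ
    percent = {"oxford": 10, "oxford2": 5}[rule]
    # Proportional penalty; rounding down is our assumption.
    return stake_mutez * percent // 100


stake = 100_000 * MUTEZ_PER_TEZ  # a hypothetical baker with 100,000 tez staked
for rule in ("mainnet", "oxford", "oxford2"):
    print(rule, double_baking_penalty(stake, rule) // MUTEZ_PER_TEZ, "tez")
```

For this hypothetical stake, the penalties are 640, 10,000 and 5,000 tez respectively.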

Oxford 2 also implements changes to how and when slashing penalties are applied by the Tezos economic protocol:

Slashing penalties take effect at the end of a blockchain cycle instead of immediately after the inclusion of the corresponding denunciation operation. They are applied after the distribution of the cycle’s attestation rewards and before the computation of new voting and baking rights.
Balance updates from slashing penalties are published in the block ending the cycle, including burned tez and denunciation rewards.
Repeat offenders (bakers who have had 51% of their stake slashed over 2 cycles) will be marked as forbidden as soon as the triggering denunciation is included. This immediately prevents them from participating in consensus for the rest of the cycle, at the end of which their status is reassessed using the same criteria.

See Slashing and you: a primer for further detail, including the effect of slashing on frozen deposits. Again, don’t hesitate to give feedback or ask questions on Tezos Agora.

Timelocks are re-enabled

As originally planned for Oxford, the Michelson instructions providing support for cryptographic Timelocks are re-enabled in Oxford 2.

Timelocks were disabled in the Lima protocol upgrade, following the discovery of a security flaw at the application level. This error could plausibly allow an adversary to play two roles at the same time in the Timelock protocol to bypass the security of the system, acquiring the secret. As a result, the attacker would be able to create and post false proofs and unlock the Timelock chest.

The feature has since been redesigned¹, and after thorough testing, including an external audit performed by Inference AG, we are ready to re-activate this feature in the Tezos protocol.

Additionally, the new version of Timelock features new client CLI commands allowing users to create, open, and verify time-locked chests, as well as to execute the pre-computations required for fast chest generation.

Introducing private rollups — and other improvements to Smart Rollups

Oxford 2 introduces a new complementary feature for Smart Rollups: the ability to deploy private rollups. They allow for defining a whitelist of allowed rollup operators, which is maintained by the rollup kernel and enforced by the Tezos protocol.

Note that this new feature does not affect the existing permissionless approach — which will continue to be the default rollup behavior. It is rather a complementary choice for operators, broadening the spectrum of potential adopters of Tezos Layer 2 solutions.

Indeed, this feature was developed in response to requests by potential adopters of Tezos Smart Rollups, whose data privacy needs are incompatible with permissionless fraud proofs, typically where user data cannot be shared openly. By restricting the participation in refutation games to whitelisted operators, private rollups prevent potentially sensitive data from being leaked in the generation of fraud proofs.

We will provide further insights into this new feature in an upcoming blog post.

Oxford 2 also includes several quality-of-life improvements for Smart Rollup developers, inherited from the Oxford proposal:

The rollup origination operation has been simplified, and no longer requires a so-called “origination proof” as an argument. This should benefit projects building on top of the Smart Rollups infrastructure stack.
Similarly to what was already possible for smart contracts, it is now possible to hard-code the origination of Smart Rollups in a chain’s genesis block. This is used, for example, in the deployment of EVM rollups on test networks.
A new WASM PVM revision is released (2.0.0-r3). It enriches the set of host functions available for querying the durable storage of a Smart Rollup, and introduces new capabilities aimed at integrating with Tezos’ upcoming DAL. Existing rollups will see their PVM automatically upgrade, and newly originated Smart Rollups will use this version directly.

Looking ahead to 2024

Oxford 2 comes at the end of an exciting year for Tezos:

Smart Rollups went live on Mainnet when Mumbai activated, a crucial milestone in our roadmap for scalability.
We demonstrated that Tezos can support – and surpass – the 1M TPS throughput benchmark.
Mumbai also halved block time from 30s to 15s, after the Pipelining project completed its full revamp of the internal block and operation validation logic.
Nairobi increased overall performance on Tezos Layer 1, reducing gas costs for common operations and enabling faster propagation of consensus operations.
Data-availability solutions continued their development, with DACs being finalized, and the DAL getting closer and closer to Tezos Mainnet.
Etherlink, an EVM-compatible rollup, is available on Tezos Ghostnet as it gets ready for Mainnet deployment.

In 2024, we look forward to continuing to work closely with the Tezos community to cement Tezos’ position as a technical leader in the blockchain space. Stay tuned for more updates!

In a recent research paper, we described two possible alternative designs we considered. The Oxford 2 protocol proposal implements the algorithm presented in part 4.2. ↩

Introducing Teztale – a Dashboard for Tezos Consensus

2023-10-31T17:00:00+01:00

TL;DR: Teztale is a new consensus-inspection tool for Tezos, developed by Nomadic Labs. This post shows you how to use it.

Consensus algorithms play a central role in maintaining the integrity and security of blockchain networks. They define the rules for how nodes reach agreement on the state of the network and how new valid blocks are added.

However, due to changing network conditions, it is not always a straightforward process. To monitor the behavior of Tezos’ consensus algorithm, Tenderbake, we have developed a monitoring tool called Teztale. Initially built for our Incident Response Team, we are now making Teztale available for anyone to use.

Tip: If you are not already familiar with Tenderbake, we suggest first taking a look at the following resources:

Tezos’ consensus algorithm (documentation)
A look ahead to Tenderbake (blog post)
Tenderbake has been injected (blog post)
Tenderbake’s Baker as a StateMachine (blog post)
A Solution to Dynamic Repeated Consensus for Blockchains (research article)

Tenderbake Rounds

In Tenderbake, a block level (also known as “block height”) is made up of rounds, starting with round 0. Each round consists of three successive phases:

A proposal phase where one of the bakers proposes a new block payload – the non-consensus content of the block (transfers, smart contract calls, voting operations, etc.).
A pre-attestation voting phase, where validators vote to accept the proposed payload.
An attestation voting phase, where validators vote upon the contents of the whole candidate block.

A quorum of votes (amounting to over 2/3 of the total active stake) must be gathered for the proposed block at the end of each voting step.
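The quorum condition itself fits in a few lines. This is a minimal sketch: the function names and stake figures are hypothetical, and it deliberately ignores the rest of Tenderbake (timeouts, locking, reproposals). Comparing 3 × stake with 2 × total keeps the strict “over 2/3” test exact in integer arithmetic.

```python
def has_quorum(voting_stake: int, total_active_stake: int) -> bool:
    """True if the votes amount to strictly more than 2/3 of the total active stake."""
    # 3 * v > 2 * t is equivalent to v / t > 2/3, without floating-point rounding.
    return 3 * voting_stake > 2 * total_active_stake


def round_status(preattesting_stake: int, attesting_stake: int, total_active_stake: int) -> str:
    """Classify a round by which voting phases reached quorum (simplified model)."""
    if not has_quorum(preattesting_stake, total_active_stake):
        return "no prequorum"
    if not has_quorum(attesting_stake, total_active_stake):
        return "prequorum only"
    return "quorum"


# Hypothetical stakes, in tez, for a network with 1,000,000 tez of active stake.
print(round_status(700_000, 690_000, 1_000_000))  # quorum
print(round_status(700_000, 600_000, 1_000_000))  # prequorum only
```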

Ideally, whenever consensus is reached for a proposal at some level (i.e., attestation quorum is observed), each participant adds the proposed block to their local copy of the blockchain. Then a new instance of the algorithm is started for the next level.

However, things may go wrong for many reasons in practice. For instance, messages such as block proposals or (pre-)attestations could be lost or delayed, or some bakers could be desynchronized or offline. Consequently, in some situations, participants may only reach a consensus after multiple rounds.

Teztale enables you to keep track of this process as it unfolds.

Getting The Data

At the top right corner of the Teztale page, you see the head of the chain (#3797158 in the example shown below) and the address of the Teztale server providing data. If you serve the data visualization page from your own Teztale server, this address can simply be / (the default value when the page is distributed by the server itself), but any publicly accessible Teztale server can be used. You can also use a Teztale server with a data visualization page hosted somewhere else or running locally.

The Teztale UI gets data from a Teztale server connected to one or more Teztale archivers that filter and forward data from a Tezos node. The server gathers the data, deduplicates where needed, and stores it in the database.

The server also serves the raw JSON data for a level through its /<LEVEL>.json API endpoint. This data contains basic information about the blocks at that level (timestamp, validation/application time, etc.), as well as information about each baker that should be involved in consensus at this level: all consensus operations published by the baker, when they were received by the archiver node, and whether they were included in a block.
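To work with this data outside the UI, a script only needs to request that endpoint. The sketch below uses Python’s standard library; the server address is a placeholder, and since the exact JSON schema is not described here, the code returns the decoded document as-is rather than assuming any field names.

```python
import json
import urllib.request
from typing import Any


def level_url(server: str, level: int) -> str:
    """Build the URL of a Teztale server's /<LEVEL>.json endpoint."""
    return f"{server.rstrip('/')}/{level}.json"


def fetch_level(server: str, level: int, timeout: float = 10.0) -> Any:
    """Download and decode the raw JSON data recorded for one level."""
    with urllib.request.urlopen(level_url(server, level), timeout=timeout) as response:
        return json.load(response)


# Usage (hypothetical server address):
# data = fetch_level("https://teztale.example.org", 3019851)
```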

It is possible to connect multiple nodes/archivers running on the same Tezos network (Mainnet, testnets, local deployments) to a single server. Aggregating data from multiple archivers makes the collected information more reliable and representative, as it reduces biases stemming from individual archivers’ performance. Moreover, it makes the whole system more robust, ensuring that data is collected even if one (or a few) archiver nodes are down.

The data visualization page can use multiple sources in different ways, depending on what you are looking for: a single source, using minimal time across all the sources, using the average value, etc.
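As an illustration of these aggregation modes, the sketch below combines the reception delays reported by several archivers for a single attestation. The input shape, archiver names, and the `combine` helper are hypothetical; this is not Teztale’s own code.

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical input: for one delegate's attestation, the reception delay (in
# seconds) reported by each archiver; None means that archiver never saw it.
Reports = Dict[str, Optional[float]]


def combine(reports: Reports, mode: str = "min", source: Optional[str] = None) -> Optional[float]:
    """Combine per-archiver reception delays into a single value."""
    if mode == "source":
        return reports.get(source) if source is not None else None
    seen = [delay for delay in reports.values() if delay is not None]
    if not seen:
        return None  # no archiver received the operation
    if mode == "min":
        return min(seen)
    if mode == "mean":
        return mean(seen)
    raise ValueError(f"unknown mode: {mode!r}")


reports = {"archiver-a": 1.2, "archiver-b": 0.8, "archiver-c": None}
print(combine(reports, "min"))   # 0.8
print(combine(reports, "mean"))  # 1.0
```

Taking the minimum answers “when did the network first see this operation?”, while the mean smooths out the performance of any single archiver.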

The Level Page

The navigation bar gives you quick access to different pages (Level, Delay to Consensus, etc.).

On the Level page, users can monitor events in real-time and investigate past behavior based on various parameters. It has a sub-menu with a few options:

Level is the block level. By default it’s locked on the alias Head-1, which means you are in ‘streaming mode’ and the page will automatically refresh when a new block is added to the chain. Pressing the pause button will halt the stream and let you see the relevant block number (#3797157 in our case). You can also enter any block level you would like to inspect. The arrow buttons let you browse through block levels.
Round allows you to select the round for a given level. When the network runs smoothly, a level usually takes a single round (shown as round 0/0). For levels where it has taken several rounds to reach consensus, you can switch back and forth between rounds as you would with levels.
Source is the Teztale archiver used as a data source.

In order to see Teztale at work, we can use the extreme case of block level #3019851 as an example. This level took an exceptional 18 rounds for the network to reach consensus (see our investigation), causing approximately 52 minutes to elapse between block levels #3019850 and #3019852.

For block levels that were completed in a single round (normal conditions), the Level page shows three sections:

a histogram;
key block metrics and other info; and,
a table classifying consensus operations according to different types (i.e., missed operations, valid operations, lost operations, etc.)

If you are inspecting a block level that has more than one round, as is the case with #3019851, you will see an additional table classifying delegates according to arrivals and departures (more on that below).

The Histogram

The histogram tracks the reception of the block proposal for each round, and its associated pre-attestations and attestations. The x-axis tracks the elapsed time since round rights were enabled, and the y-axis plots the delegates seen (pre-)attesting this proposal. At the top right, you can choose whether the y-axis should display the number of delegates or the number of slots.

The histogram also features additional indicators such as block validation, block application, pre-quorum, and quorum.

You can also see a delegate’s individual performance for (pre)attestation using the ‘Search delegate’ bar in the top right-hand corner. Paste a baker’s tz-address and search, or select a baker in the dropdown menu. The pre-attestation of the delegates you have selected will be shown on the histogram as violet and green dotted lines.

Pro tip 1: Click and drag to zoom in on an area of the chart.
Pro tip 2: Click a bar to see (pre-)attestation details. From there, you can either download a .csv file or copy the whole section.

Next to the histogram you’ll find information about the current block and round. For example, you can see if a round is a reproposal of a previous round. You can also find information about the preceding and succeeding block.

The Tables

The first table classifies the delegates’ consensus operations according to the following scenarios:

Missed: These delegates were expected to participate in this round, but the node failed to receive their attestations, which were not included in the block.
Valid: The node successfully received these delegates’ attestations, which were included in the block.
Lost: The node successfully received these delegates’ attestations, but their attestations were not included in the block.
Held Captive: Attestations from these delegates were included in the block, but the archiver node did not receive these operations.
Erroneous: A Teztale archiver reported something wrong with these delegates’ attestations.
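The first four categories follow from two facts about each expected attestation: whether the archiver node received it, and whether it was included in the block. A minimal sketch of that mapping (the function is hypothetical, and giving “Erroneous” precedence is our assumption):

```python
def classify(received_by_archiver: bool, included_in_block: bool, erroneous: bool = False) -> str:
    """Map an expected attestation onto the categories of the Level page table."""
    if erroneous:
        # Assumption: an archiver-reported error takes precedence over everything else.
        return "Erroneous"
    if received_by_archiver and included_in_block:
        return "Valid"
    if received_by_archiver:
        return "Lost"          # seen by the node, but not in the block
    if included_in_block:
        return "Held Captive"  # in the block, but never seen by the node
    return "Missed"            # expected, but neither seen nor included


print(classify(received_by_archiver=True, included_in_block=False))  # Lost
```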

If the block level has more than one round, an additional table called ‘Delegates’ will be displayed. ‘Arriving’ delegates are validators that joined in during the current round, while ‘exiting’ delegates are validators that left during the current round.

Clicking a delegate’s address will present you with two options: copying the delegate’s address, or going to the Delegate Stats page, where you can take a closer look at any delegate’s performance.

The Delegate Stats page

The Delegate Stats page comes in handy when you need to pinpoint if and why a baker or the whole network is in trouble. Select the range of blocks you want to review, select a delegate, and then choose whether you want to display the delay or deviation (from the mean of all delegates).
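The “deviation” view compares each delegate with the rest of the network. A minimal sketch, with hypothetical delegate names and delays, of computing each delegate’s deviation from the mean delay of all delegates:

```python
from statistics import mean
from typing import Dict


def deviations(delays: Dict[str, float]) -> Dict[str, float]:
    """Deviation of each delegate's attestation delay from the mean over all delegates."""
    average = mean(delays.values())
    return {delegate: delay - average for delegate, delay in delays.items()}


# Hypothetical delays (seconds after the candidate block) for one level.
delays = {"baker_a": 1.0, "baker_b": 2.0, "baker_c": 12.0}
print(deviations(delays))  # {'baker_a': -4.0, 'baker_b': -3.0, 'baker_c': 7.0}
```

A delegate whose deviation stays large and positive over many levels, as `baker_c` here, is a good candidate for closer inspection.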

The example below shows a baker missing a few attestations in a row.

At the left of the scatter plot, we can see that the baker’s attestations were all received around 12 seconds after the candidate block. This is late considering that since the activation of the Mumbai protocol on Tezos Mainnet, the minimal (and desired) block time is 15 seconds.

There are a number of possible causes for something like this to happen, but the baker was able to identify and fix the problem quickly (starting from block #4103106), as they knew their clock had been out of sync.

Happy monitoring!

Teztale was designed for our Incident Response Team as a tool to monitor the overall health of the Tezos network, but as we have shown in this guide, Teztale’s utility is much broader.

Whether you are looking to get a better understanding of Tezos’ consensus algorithm at work, or need to inspect and analyze a specific baker’s performance, Teztale has you covered.

Baking and the Oxford Proposal: A Technical Guide

2023-09-21T18:00:00+02:00

TL;DR: The Oxford upgrade proposal contains changes that affect the workflow of bakers. This blog post covers what to be aware of if Oxford is activated.

NB 27/03/2024: The features described in this document have not been included in the Oxford 2 protocol, currently active on Tezos Mainnet. See Oxford 2’s announcement for further detail. The Paris protocol proposals implement revised versions of these features.

The Oxford protocol proposal currently going through Tezos’ on-chain governance process contains several changes that affect not just staking economics, but also how bakers operate.

The most significant changes are subject to a separate vote and will only be activated later, provided they are accepted. However, other changes affecting bakers will take effect immediately after activation, if the Oxford protocol is adopted.

In this blog post we highlight what Tezos bakers should be aware of, and the necessary steps to take for ensuring a smooth transition.

Adaptive Issuance and the new staker role

Adaptive Issuance is a new approach to staking economics on Tezos, adapting the network’s emission of its native token, tez, to fit better with real-world usage. For a deeper dive we recommend our blog post introducing Adaptive Issuance.

Adaptive Issuance will not be automatically enabled upon activation of the Oxford protocol. Instead, bakers will be able to vote using a configuration option when running the baking software. This process is described in detail further below.

Adaptive Issuance also introduces the role of staker, in addition to the existing roles of baker (or delegate) and delegator. However, no account can become a staker unless Adaptive Issuance is approved through the separate vote.

Changes in a baker’s workflow

In order to ease the potential transition to Adaptive Issuance, additional changes have been included in the Oxford protocol proposal¹, which will take effect immediately from the protocol proposal’s activation.

They do not affect baking/attestation rights or rewards, but they affect the actions required to operate a baker. These changes are:

No more automatic freezing and unfreezing. Currently, the protocol regularly adjusts the baker’s frozen balance to accommodate variations in their delegated balance. With Oxford, bakers may adjust their staked balance (which is frozen) via new explicit “stake” and “unstake” operations. This gives them more control, as the protocol will never move tez from their spendable balance without the baker explicitly signing an operation for it.
Changes to slashing. Double-baking penalties are changed from a fixed sum of 640 tez to 10% of the staked balance². Denunciation rewards (for double-baking and double-attestations) are reduced from one half to one-seventh of the slashed funds.
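For a sense of scale, the sketch below compares the two double-baking rules for a hypothetical baker. Amounts are in mutez (1 tez = 1,000,000 mutez). The integer rounding, and the assumption that the part of the slashed amount not paid to the denouncer is burned, are ours rather than taken from the protocol code.

```python
from typing import Tuple

MUTEZ_PER_TEZ = 1_000_000


def split_penalty(penalty: int, reward_divisor: int) -> Tuple[int, int]:
    """Split a slashed amount into (denunciation reward, remainder assumed burned)."""
    reward = penalty // reward_divisor
    return reward, penalty - reward


def pre_oxford_double_baking(staked: int) -> Tuple[int, int, int]:
    """Fixed 640 tez penalty (independent of `staked`), half paid to the denouncer."""
    penalty = 640 * MUTEZ_PER_TEZ
    return (penalty, *split_penalty(penalty, 2))


def oxford_double_baking(staked: int) -> Tuple[int, int, int]:
    """10% of the staked balance, one-seventh of it paid to the denouncer."""
    penalty = staked * 10 // 100
    return (penalty, *split_penalty(penalty, 7))


staked = 60_000 * MUTEZ_PER_TEZ  # hypothetical baker with 60,000 tez staked
print(pre_oxford_double_baking(staked))  # (640000000, 320000000, 320000000)
print(oxford_double_baking(staked))      # (6000000000, 857142857, 5142857143)
```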

We also remind bakers that the “endorsement” and “preendorsement” operations are renamed “attestation” and “preattestation”, respectively. This can affect API calls or automated log review setups. The renaming reflects the fact that a baker does not so much endorse a block as attest to its existence and validity, which the new terminology makes clear. This change was initiated in Nairobi and is completed with Oxford.

Transitioning from Nairobi to Oxford

A smooth transition to Oxford requires bakers to take steps which they are familiar with from previous protocol upgrades. In this case:

Update to Octez v18.
Run both Nairobi and Oxford bakers (and accusers) in parallel at the time of activation. The former will automatically stop participating in consensus, and the latter will take over.

There is no requirement for bakers to take further immediate action in order to continue baking after activation, but some changes need to be taken into consideration for optimal operation in the longer term, especially for public bakers.

Frozen deposits become stakes

Upon activation of Oxford, the frozen deposit of existing bakers will be preserved and the same amount will become the baker’s stake.

The baking and attesting rights are calculated in a similar fashion as before:

Rights are proportional to the baker’s delegated balance, that is, the baker’s own balance plus the balance of all their delegators.
Rights from delegations are capped at 9x the baker’s own staked balance. Above that, the baker is overdelegated, and any additional delegation does not bring additional rights or rewards.
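The two rules above can be read as “staked funds count in full, everything else counts up to 9x the staked balance”. The sketch below encodes that reading; the function names are hypothetical, treating the baker’s own non-staked balance like delegated funds is our interpretation, and details such as which cycle’s snapshot is used are ignored.

```python
def rights_stake(staked: int, own_unstaked: int, delegated: int) -> int:
    """Stake counted for baking/attestation rights, in tez (simplified model).

    staked:       the baker's own staked (frozen) balance
    own_unstaked: the rest of the baker's own balance
    delegated:    the total balance of the baker's delegators
    """
    non_staked = own_unstaked + delegated
    # Funds that are not staked count only up to 9x the staked balance.
    return staked + min(non_staked, 9 * staked)


def is_overdelegated(staked: int, own_unstaked: int, delegated: int) -> bool:
    """True if additional delegations no longer bring additional rights."""
    return own_unstaked + delegated > 9 * staked


print(rights_stake(10_000, 50_000, 30_000))      # 90000
print(rights_stake(10_000, 50_000, 100_000))     # 100000 (capped)
print(is_overdelegated(10_000, 50_000, 100_000))  # True
```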

However, while previously the frozen deposits would automatically adjust to the delegate’s total balance, in Oxford the staked balance does not. This will require manual adjustments.

How to manage stake

Oxford introduces a series of new commands for managing a baker’s staked balance.

Adding tez to the staked balance is done by executing this CLI command:

octez-client stake 2000 for <baker_key>

This will make 2000 of the baker’s spendable tez immediately unavailable for transfer, and the amount will be included in the computation of baking rights after 6 cycles.

Reducing the staked balance is done with this command:

octez-client unstake 2000 for <baker_key>

If a baker wishes to stop baking, they may bring their stake to zero:

octez-client unstake everything for <baker_key>

After executing unstake commands, the baker must wait for an unfreezing period of 7 cycles. During this time, the amount in question is no longer taken into account to determine baking rights, but is still subject to slashing. Afterwards, they need to finalize their unstake operation with the following command:

octez-client finalize unstake for <baker_key>

Note that these operations are pseudo-operations: each is a transfer operation whose destination is the same as its source. Therefore they must be signed by the baker’s manager key, not the consensus key.

New bakers must explicitly stake

Up until now, an account could become a baker by using just this command:

octez-client register key <baker_key> as delegate

Under Oxford, registering as a delegate is no longer sufficient to start baking: registration alone adds no tez to the staked balance, and hence no baking rights are assigned.

The baker needs to subsequently run the stake command in order to stake at least 6,000 tez, before they will have rights assigned to them:

octez-client stake 6000 for <baker_key>

Set deposits limit operation is removed

Prior to Oxford, baking rewards have accrued to the spendable balance. But at the end of every cycle, the protocol automatically increases or decreases the frozen deposit based on the variations of the delegated balance.

To avoid public bakers risking all of their balance being frozen, with no tez left to pay their delegators, it has been possible to set an upper bound for the frozen balance. This is known as the deposit limit.

In Oxford, the staked balance is always set explicitly, and therefore the “set deposits limit” operation is deprecated.

Adjusting stake as balance changes

Baking requires at least 10% of the total balance staked to maximize rewards. But as the Oxford protocol does not freeze/unfreeze tez automatically, it is the responsibility of the baker to adjust as needed.

As baking/attestation rewards are received, the total balance increases over time. Therefore, it is advisable for a baker without delegators to have a staked balance slightly above 10%, and to adjust it upwards from time to time³.

Note that if Adaptive Issuance is later activated, going above 10% staked will increase rewards, due to staked funds counting twice as much as (self)delegated funds in calculating baking rights and voting power. Hence, staking 100% will generate more rewards than 10%. But until Adaptive Issuance activates, going above 10% will only increase funds at risk of slashing without increasing rewards.
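A quick comparison makes the point concrete for a hypothetical baker holding 100,000 tez with no delegators. The sketch applies only the two rules stated in this post (the 9x cap before Adaptive Issuance, and staked funds counting twice as much as delegated funds after it); any other limits Adaptive Issuance may introduce are deliberately ignored.

```python
def weight_before_ai(staked: int, non_staked: int) -> int:
    """Oxford without Adaptive Issuance: non-staked funds count up to 9x staked."""
    return staked + min(non_staked, 9 * staked)


def weight_with_ai(staked: int, non_staked: int) -> int:
    """With Adaptive Issuance: staked funds count twice as much as delegated funds.
    Other limits that Adaptive Issuance may apply are ignored in this sketch."""
    return 2 * staked + non_staked


balance = 100_000  # hypothetical baker with no delegators, in tez
for staked in (10_000, 100_000):
    print(staked, weight_before_ai(staked, balance - staked), weight_with_ai(staked, balance - staked))
# 10000 100000 110000
# 100000 100000 200000
```

Before Adaptive Issuance, staking 100% yields the same weight as staking 10% while exposing ten times more tez to slashing; with it, the weight doubles.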

Bakers accepting delegations will want to set their staked balance high enough to leave room for new delegations, while still keeping it below their total balance, so they are able to pay out rewards. A baker can accept up to 9x their staked balance before being overdelegated.

Activation of Adaptive Issuance and new staking mechanism

The activation of Adaptive Issuance and the new staker role is decided in a separate signaling vote, which begins if and when the Oxford proposal is activated.

Bakers will be able to vote ‘On’ (in favour), ‘Off’ (against), or ‘Pass’. Abstaining from explicit signaling will have the same effect as a ‘Pass’ vote.

Before voting, bakers are invited to first familiarize themselves with Adaptive Issuance. Should they want to vote in favor of its activation, they must pass the following parameter when running their Oxford baker:

--adaptive-issuance-vote on

To vote against activation:

--adaptive-issuance-vote off

To explicitly vote pass:

--adaptive-issuance-vote pass

The signaling vote is driven by an Exponential Moving Average (EMA) whose half-life is 2 weeks. That is, it takes two weeks for the EMA to rise from 0% to 50%, assuming that only ‘On’ votes are cast. The target threshold is a supermajority of 80% ‘On’ votes out of all ‘On’ and ‘Off’ votes.
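The timing can be explored with a simplified model. The sketch assumes one vote per block, 15-second blocks, and a per-block decay factor derived from the two-week half-life; the actual protocol uses its own integer-based EMA and constants, so treat the results as approximations. In this model an ‘On’ vote pulls the EMA towards 100%, an ‘Off’ vote towards 0%, and a ‘Pass’ vote leaves it unchanged.

```python
import itertools
import math
from typing import Iterable, Optional

BLOCK_TIME_S = 15       # assumed Mainnet block time
HALF_LIFE_DAYS = 14     # "half-life is 2 weeks"
THRESHOLD = 0.80        # 80% 'On' out of 'On' + 'Off'

BLOCKS_PER_DAY = 24 * 3600 // BLOCK_TIME_S            # 5760
HALF_LIFE_BLOCKS = HALF_LIFE_DAYS * BLOCKS_PER_DAY    # 80640
DECAY = 0.5 ** (1 / HALF_LIFE_BLOCKS)  # so that DECAY ** HALF_LIFE_BLOCKS == 0.5


def update(ema: float, vote: str) -> float:
    """Apply one block's signaling vote to the EMA."""
    if vote == "on":
        return ema * DECAY + (1 - DECAY)  # move towards 1
    if vote == "off":
        return ema * DECAY                # move towards 0
    return ema                            # "pass": no effect


def blocks_to_threshold(votes: Iterable[str], max_blocks: int = 2_000_000) -> Optional[int]:
    """Number of blocks until the EMA (starting at 0) reaches THRESHOLD, or None."""
    ema = 0.0
    for block, vote in enumerate(itertools.islice(votes, max_blocks), start=1):
        ema = update(ema, vote)
        if ema >= THRESHOLD:
            return block
    return None


def days_to_threshold_all_on() -> float:
    """Closed form: solve 1 - 0.5 ** (t / half_life) = THRESHOLD for t."""
    return HALF_LIFE_DAYS * math.log(1 - THRESHOLD) / math.log(0.5)


all_on = blocks_to_threshold(itertools.repeat("on"))
print(all_on / BLOCKS_PER_DAY, days_to_threshold_all_on())  # both about 32.5 days
```

With only ‘On’ votes the model reaches 80% after about 32.5 days, in line with the ~33 days given in the notes. Alternating ‘On’ and ‘Pass’ votes roughly doubles that time, and an even split between ‘On’ and ‘Off’ never reaches the threshold.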

The signaling vote has no time limit and continues to run, as long as the threshold is not met. There is also no quorum. Lack of participation (equal to ‘Pass’ votes) is not taken into account by the EMA, and hence only affects the time required to reach the threshold⁴.

If the threshold is met, the Voting phase will complete at the end of the current cycle, and an Adoption phase lasting 7 cycles (about 20 days) will begin.

Oxford on test networks

The protocol-specific ‘Oxfordnet’ test network is now active and available for you to test the new CLI commands. Adaptive Issuance has been enabled on this network, so the distribution of baking rights and reward amounts will differ from what this blog post describes, since it focuses on the period when the Oxford protocol is active but Adaptive Issuance is not.

Should Oxford be voted in during the Promotion Period, the permanent Tezos test network, Ghostnet, will upgrade to Oxford one to two weeks before mainnet. Adaptive Issuance will be off on Ghostnet until voted in by Ghostnet bakers. Ghostnet bakers are advised to signal ‘Off’ for the time being, so as to mimic mainnet.

Questions?

The Oxford protocol proposal represents a major change to Tezos staking, and it’s natural that questions arise. We invite you to ask questions on the Tezos Stack Exchange, and label them with the tag ‘adaptive-issuance’. This way, we will build a community-driven, collaborative knowledge base on the topic for all to benefit from.

For further information on Adaptive Issuance and other changes brought about by the Oxford protocol proposal, consider the following resources:

Announcing Oxford, Tezos’ 15th protocol upgrade proposal (Nomadic Labs)

Adaptive Issuance and Staking (Nomadic Labs)

A Walkthrough of Tezos’ New Staking Mechanism (Nicolas Ochem / Oxhead Alpha)

Understanding “Adaptive Issuance” In Less Than 10 Minutes (Cryptonio.tez / Tezos Commons)

Special thanks to Nicolas Ochem for his contributions to this blog post.

Notes

With these upfront changes, the potential later transition to Adaptive Issuance only consists of a change to new distribution and reward formulas. ↩
To leave room for new delegations, public bakers must maintain a larger staked balance than what their own total stake requires. However, this increases their slashing risk, and this parameter should be adjusted carefully. ↩
Given the max network issuance of 5% annually, setting staked balance to 12.1% of current total balance is enough to ensure that you are still above 10% after one year. Note that this only applies to bakers not accepting delegations, and assumes no further funds are added to the total balance. ↩
This design of the voting mechanism allows time for discussions and possibly other amendments. If Oxford is activated and all bakers vote in favor immediately, it would still take ~33 days to reach the 80% threshold, and then a further ~20 days to activate the features. Any ‘Pass’ or ‘Off’ votes would extend this period. ↩

We’re doing 1 million TPS on Tezos! Here’s how

2023-08-24T14:00:00+02:00

TL;DR: In the spirit of “show, don’t tell”, we are proud to present a demonstration of Tezos’ ability to process one million TPS, using Smart Rollups and a data-availability solution built for Tezos.

With Smart Rollups live on Mainnet since March, it’s time to demonstrate the power of this technology to the world!

Smart Rollups represent a groundbreaking design philosophy in that they enable custom rollup solutions to be built using any programming language that compiles to WASM. See this page for a quick introduction, or the Tezos documentation for a more technical walk-through.

But Smart Rollups also play another crucial role: scaling the Tezos network to serve billions of users engaging in high-throughput activities, enabling gas intensive computation, and keeping fees low.

This is done with a combination of horizontal and vertical scaling, as described in our early-2022 blog post on how to scale Tezos. Now we are happy to show the outlined approach in action, with a public demo showcasing a throughput of 1 million transactions per second (TPS).

Watch each transaction happen

To show the throughput achieved in the demo as more than a few metrics on a dashboard, the setup includes a visual representation of the transactions as they happen.

The visualization consists of a series of 5-megapixel images being displayed one by one on a webpage.

Each transaction in the demo contains information about one RGB color component in one pixel, meaning that three transactions are required to update a pixel. Hence, the display of each 5-megapixel image represents 15 million transactions.

As the transactions are executed, the image changes progressively, showing a full change of image for each 15-second block. 15 million transactions executed in each 15-second block: one million transactions per second.
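The headline figure follows directly from numbers stated in this post; the sketch below simply redoes the arithmetic, including the horizontal-times-vertical breakdown of the demo setup.

```python
PIXELS_PER_IMAGE = 5_000_000   # 5-megapixel images
TX_PER_PIXEL = 3               # one transaction per RGB component
BLOCK_TIME_S = 15              # one full image per 15-second block
IMAGES = 8                     # images shown during the demo
ROLLUPS = 1_000                # Smart Rollup nodes (horizontal scaling)
TPS_PER_ROLLUP = 1_000         # throughput of each node (vertical scaling)

tx_per_block = PIXELS_PER_IMAGE * TX_PER_PIXEL   # 15,000,000 transactions
tps = tx_per_block // BLOCK_TIME_S               # 1,000,000 TPS
total_tx = tx_per_block * IMAGES                 # 120,000,000 transactions
duration_s = IMAGES * BLOCK_TIME_S               # 120 s, i.e. two minutes

# Horizontal x vertical scaling gives the same figure.
assert ROLLUPS * TPS_PER_ROLLUP == tps
print(f"{tps:,} TPS, {total_tx:,} transactions in {duration_s} s")
# 1,000,000 TPS, 120,000,000 transactions in 120 s
```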

The demo runs for two minutes, and a total of eight images are displayed. We should perhaps emphasize that displaying these images serves purely to create a more tangible experience of the throughput achieved. The transactions can represent any type of blockchain operation.

The visualization setup was created by Elevated Labs, and the technical details were presented at the recent TezDev developer conference in Paris. Here, Trilitech’s Emma Turner and Nomadic Labs’ Thomas Letan also gave the audience an exclusive preview of the demo. Watch their presentation and the demo preview here.

A realistic demonstration

The demo is designed to be as realistic and publicly verifiable as possible. In particular, we are not cutting corners when it comes to defining a transaction.

Transactions in this demo consist of the transfer of a Tezos ticket between two rollup accounts.
Transaction size is similar to that of transactions typically processed on Mainnet¹.
Every transaction is individually signed and checked. It’s an important factor when comparing TPS claims across different blockchains, because these are the costly aspects of the computation done for each transfer.
Rollups are working exactly as they would in a real-world setting (publishing commitments, etc.).
The procedure and code will be made public, so that any third-party may reproduce it.

The demo is carried out on the public testnet Mondaynet with 1,000 Smart Rollup nodes (horizontal scaling), each processing 1,000 TPS (vertical scaling).

All nodes are deployed in Google Cloud on 600 ‘c2-standard-16’ instances. The transactions themselves could be executed with half the computing power, but the extra CPU is required for the visualization.

Avoiding data bottlenecks with DACs

In total, transactions equivalent to about 16 gigabytes of data are carried out over the two minutes, requiring a bandwidth of about 133 megabytes/second.

This is far above Tezos’ current block capacity. A protocol-level solution to this bandwidth challenge, the Data-Availability Layer, is currently under development and is expected to launch in early 2024.

In the meantime, to avoid making Layer 1 a bottleneck in this demo, we use Data Availability Committees (DACs), with each rollup having its own DAC, i.e. 1000 instances. For a deeper dive into DACs, see this article.

The principle is this: Instead of posting newly received rollup messages directly to Layer 1 in their raw form, the messages are given to a DAC, which stores the data off-chain, creates a Merkle tree with hashes of all the messages, and returns a root hash signed by the DAC. Together, the root hash and signature(s) are called a DAC certificate.

The DAC certificate is then posted to the rollup inbox on Layer 1. Anyone running a Smart Rollup node can verify the contents covered by the root hash by requesting the input data from the DAC using the reveal data channel.
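To illustrate the principle, the sketch below builds a root hash over a batch of messages and checks revealed data against it. It is a generic binary Merkle tree using BLAKE2b-256 from Python’s hashlib, with leaf/node prefixes for domain separation; it is not the actual page format used by Tezos DACs, and the DAC members’ signatures over the root are left out.

```python
import hashlib
from typing import List


def _hash(prefix: bytes, data: bytes) -> bytes:
    return hashlib.blake2b(prefix + data, digest_size=32).digest()


def merkle_root(messages: List[bytes]) -> bytes:
    """Root hash over the messages (generic binary Merkle tree)."""
    if not messages:
        raise ValueError("cannot build a tree over zero messages")
    level = [_hash(b"\x00", m) for m in messages]  # leaves
    while len(level) > 1:
        parents = [_hash(b"\x01", level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # an odd node is carried up unchanged
        level = parents
    return level[0]


def matches_certificate(messages: List[bytes], certified_root: bytes) -> bool:
    """Check revealed data against the root hash carried by a certificate."""
    return merkle_root(messages) == certified_root


batch = [b"transfer #1", b"transfer #2", b"transfer #3"]
root = merkle_root(batch)  # what the committee would sign and post
print(matches_certificate(batch, root))                                # True
print(matches_certificate([b"transfer #1", b"forged", b"transfer #3"], root))  # False
```

Only the 32-byte root (plus signatures) needs to reach Layer 1, however large the batch is.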

The use of DACs makes the setup very efficient in terms of bandwidth, as it allows us to represent the full 16 gigabytes of data in just 8000 Layer 1 messages (~1.2 megabytes in total), with each of the 1,000 DAC instances posting one message in each of the eight blocks that are produced during the 2-minute demo. This is far from the 120 million messages it would require to post all transactions directly to Layer 1.
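The bandwidth figures in this section can be recomputed from the totals it gives; note that the average certificate size below is derived from the stated ~1.2 megabytes, not measured.

```python
DATA_BYTES = 16 * 10**9        # about 16 gigabytes of transaction data
DURATION_S = 120               # two minutes
DAC_INSTANCES = 1_000          # one DAC per rollup
BLOCKS = 8                     # 15-second blocks during the demo
L1_BYTES = 1_200_000           # ~1.2 megabytes posted to Layer 1
TOTAL_TX = 120_000_000         # transactions in the demo

bandwidth_mb_s = DATA_BYTES / DURATION_S / 10**6   # ~133 MB/s
l1_messages = DAC_INSTANCES * BLOCKS               # 8,000 certificates
avg_message_bytes = L1_BYTES / l1_messages         # ~150 bytes each
reduction = TOTAL_TX // l1_messages                # 15,000x fewer L1 messages

print(f"{bandwidth_mb_s:.0f} MB/s, {l1_messages:,} L1 messages "
      f"of ~{avg_message_bytes:.0f} bytes, {reduction:,}x fewer")
# 133 MB/s, 8,000 L1 messages of ~150 bytes, 15,000x fewer
```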

We’re not stopping at one million

Reaching a million TPS is just a first milestone, as the work to scale Tezos continues.

In a keynote presentation at TezDev 2023 entitled ‘Tezos 2.0: The Next Era of Rollups - Ultra High Throughput’, Tezos co-founder Arthur Breitman outlined a way to scale Tezos even further with a rollup-centric approach. Other talks and highlights from the TezDev conference are available here.

We take pride in making this demo realistic and reproducible. Should you wish to check our work and perhaps set up your own test, we are happy to assist.

And similarly, don’t hesitate to reach out if you are looking for a blockchain solution that is future-proof, flexible, highly scalable, and not least, fully decentralized. Because that is exactly what Smart Rollups are all about.

Notes

About 160 bytes. The payload includes sender and receiver addresses, an amount, the RGB component identifier (which can be seen as a token), and the transaction signature. ↩

Announcing Oxford, Tezos’ 15th protocol upgrade proposal

2023-08-11T15:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, & Functori.

Following the smooth activation of the Nairobi protocol upgrade on June 24 — and a successful TezDev conference on July 21 — we are happy to unveil our latest Tezos protocol proposal.

For our 15th upgrade proposal, we head to Oxford, home to the world’s second-oldest university in continuous operation, with evidence of teaching happening as early as 1096. As usual, the proposal’s “true” name is its hash, ProxfordSW2S7fvchT1Zgj2avb5UES194neRyYVXoaDGvF9egt8.

The Oxford proposal first and foremost contains two major changes to the staking economics of Tezos. Note that, if the Oxford proposal is adopted, these changes will not be enabled automatically upon activation. Instead, bakers will be able to signal their position in a separate vote. The changes in question are:

Adaptive Issuance: a new approach to tez issuance in the Tezos economic protocol, where emission is adjusted dynamically, and tied to the ratio of staked tez over the total supply.
A new Staking mechanism: a reworking of PoS on Tezos, which introduces the new role of staker in addition to the existing delegate (also known as baker) and delegator roles.

In this post we provide an overview of the functionality. For a deeper and more technical description, see the specification document.

Other prominent changes in Oxford (enabled upon activation):

Refinements to PoS penalties and rewards: slashing penalties are made proportional to funds at stake, and new tools for easier management of staked funds are introduced.
Timelocks are back: a new design and implementation of Timelocks has been finalized, addressing security concerns that led to their temporary deactivation in a previous protocol upgrade.
Further improvements of Smart Rollups: new features simplify the deployment of Smart Rollups both on public periodic test networks and ad-hoc dedicated ones.

The protocol proposal also includes other minor changes and improvements which can be found in Oxford’s changelog.

Adaptive Issuance and Staking

Adaptive Issuance, originally introduced as “Adaptive Inflation”, is an evolution of the staking mechanism, adapting the economics of tez to fit better with real-world usage, as argued in the Tezos Agora post “Why adaptive inflation matters for Tezos”.

The proposed mechanism ties the Tezos protocol's regular issuance of tez (from participation rewards and the Liquidity Baking subsidy) to the ratio of staked tez over the total supply. At the end of each blockchain cycle, the nominal issuance rate is recomputed to nudge the staked funds ratio towards a protocol-defined target of 50%. When the ratio of staked funds decreases and diverges from the target, emission rates will increase, incentivizing participants to stake funds to re-approach the target, and vice versa.

Consequently, the values of participation rewards and the Liquidity Baking subsidy are no longer fixed by protocol constants. Instead, they change automatically as the mechanism encourages (or discourages) staking of funds.

New Staking Mechanism

A new role — staker — is introduced, in addition to delegate and delegator. It enables tez holders to contribute to a baker’s security deposit without the baker taking custody of their funds, with in-protocol reward sharing.

When a staker provides staking funds, these funds are frozen and are subject to both rewards and slashing, in proportion to their weight in their delegate’s deposit. Stakers can modify or remove their stakes, with the changes taking effect after 5 cycles for staking and 7 cycles for unstaking.

Delegates (or bakers) can configure their staking policy by setting parameters which explicitly state whether they accept staking by stakers, and if so, up to which fraction of their total deposit. By default, delegates do not accept any funds from stakers.

To encourage staking over liquid delegation, staked and delegated funds have different weights in the computation of a delegate’s baking and voting powers: staked funds (own and external) count twice as much as delegated funds.

Additional changes to the staking mechanism and economic incentives are also included in the Oxford protocol proposal. These are independent of Adaptive Issuance and the new staking mechanism, and would take effect from the protocol proposal’s activation:

Delegates will have more control over their frozen deposits via a dedicated interface. There will be no more automatic freezing and unfreezing of deposits.
Double-baking penalties are changed from a fixed sum of 640 tez to 10% of the frozen deposit, making the slashed funds proportional to the delegate’s (and potential stakers’) funds at stake.
Denunciation rewards (for double-baking and double-attestations) are reduced from one half to one-seventh of the slashed funds.
Bakers can have a part of their rewards paid directly to their frozen deposits, making management of staked funds easier.

Activating Adaptive Issuance and Staking

As previously mentioned, approval of the Oxford protocol proposal by the community will not immediately enable Adaptive Issuance and the new Staking mechanism upon protocol activation. Instead, it will enable a per-block vote: a continuous signaling vote giving bakers the opportunity to activate Adaptive Issuance and Staking on Tezos Mainnet.

Concretely, the features guarded behind the voting mechanism are:

Adaptive Issuance (and adaptive rewards).
Ability for delegators to become stakers.
The changes in relative weights for staked and delegated funds for the computation of a delegate’s baking and voting power.

This choice is based on the following considerations:

Separating the acceptance of Adaptive Issuance (and the rest of the guarded features) from the other contributions and changes included in Oxford avoids the technical cost of creating, testing, and maintaining two proposals: one with and one without Adaptive Issuance.
It gives the community more time to evaluate and discuss the adoption of the feature, without blocking the protocol amendment process.

The threshold for enabling the guarded features is set to 80% Yes votes out of Yes + No votes (the same threshold used for protocol amendments) to ensure a high degree of community consensus. The vote, however, differs from protocol amendment votes in that the voting phase is driven by an exponential moving average (EMA), and there is no quorum. Absence of signaling counts as a Pass vote, which is not taken into account by the EMA. If the threshold is reached, the guarded features will be activated 7 cycles later.

A high-level functional specification of Adaptive Issuance and the new Staking mechanism is given in: “Adaptive Issuance and Staking”. For more insight into the rationale and design choices behind these features, see the original proposal on Tezos Agora and this episode of the Blockchained Evolved Show by Tezos Commons.

Timelocks are re-enabled

The Michelson op-codes enabling support for cryptographic Timelocks are re-enabled in Oxford.

Timelocks address a challenge with blockchain-based transactions, primarily in trading, known as Maximal Extractable Value (MEV). Since a transaction can be observed in the mempool before it is included in a block, a user can exploit knowledge of the pending transaction to their advantage against another user. For example, upon receiving a transaction, a baker could craft a block including this transaction and one of their own such that the sequential execution of these two transactions guarantees a gain to the baker.

The use of Timelocks can mitigate (but not entirely prevent) this kind of value extraction, by enabling the payload of a transaction to be encrypted until it is too late to change the order of transactions.
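As we understand it, timelocked chests build on time-lock puzzles in the Rivest-Shamir-Wagner (RSW) style. Whoever knows the factorisation of an RSA modulus can lock a value quickly, while everyone else must perform a long chain of sequential squarings to open it. The toy Python sketch below illustrates only that asymmetry. The primes, the SHA-256 key derivation, and the XOR mask are our own illustrative choices. This is not the Octez timelock implementation or its chest format, and the parameters are far too small to be secure.

```python
import hashlib

# Toy RSW time-lock puzzle; parameters are hypothetical and insecure.
P, Q = 104_729, 1_299_709   # two well-known primes
N = P * Q                   # public modulus
PHI = (P - 1) * (Q - 1)     # secret: only the chest creator knows it


def lock_fast(x: int, t: int) -> int:
    """Creator: x^(2^t) mod N, reducing 2^t modulo phi(N) (valid when gcd(x, N) = 1)."""
    return pow(x, pow(2, t, PHI), N)


def unlock_slow(x: int, t: int) -> int:
    """Anyone else: t sequential squarings, which cannot be parallelised."""
    y = x % N
    for _ in range(t):
        y = y * y % N
    return y


def mask(secret: int, payload: bytes) -> bytes:
    """XOR a payload of at most 32 bytes with a key derived from the puzzle solution."""
    key = hashlib.sha256(str(secret).encode()).digest()
    if len(payload) > len(key):
        raise ValueError("toy mask only handles payloads of up to 32 bytes")
    return bytes(p ^ k for p, k in zip(payload, key))


X, T = 5, 100_000                           # public puzzle base and number of squarings
order = b"buy 10 tez"
chest = mask(lock_fast(X, T), order)        # what would be published with the transaction
recovered = mask(unlock_slow(X, T), chest)  # only possible after T squarings
```

The choice of T sets the delay. A real deployment would pick it so that opening the chest takes longer than the window in which a block producer could reorder transactions.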

Timelocks were disabled in the Lima protocol upgrade, following the discovery of a security flaw at the application level. The feature has since been redesigned, and after thorough testing, including an external audit performed by Inference AG, we are ready to re-activate this feature in the Tezos protocol.

Additionally, the new version of Timelocks comes with new client CLI commands allowing users to create, open, and verify time-locked chests, as well as to execute the pre-computations required for fast chest generation.

Improvements to Smart Rollups

As with the Nairobi protocol, the Oxford protocol proposal brings several quality-of-life improvements for Smart Rollup developers:

The rollup origination operation has been simplified. With Oxford, it no longer requires a so-called “origination proof” as an argument. This should benefit projects building on top of the Smart Rollups infrastructure stack.
Similarly to what was already possible for smart contracts, it is now possible to hard-code the origination of Smart Rollups in a chain's genesis block. This is used in the deployment of the EVM rollup on test networks.
A new WASM PVM revision is released (2.0.0-r2). It enriches the set of host functions available for querying the durable storage of a Smart Rollup with a faster alternative to store_has.

Tezos evolves, again

We consider the new staking economics proposed in Oxford another good example of Tezos’ ability to evolve and adapt, and look forward to the community discussions on the activation of the Adaptive Issuance and Staking features.

Note that if this proposal is voted in by the community, upgrading to Octez v18.0 (or later) will be necessary for participating in consensus.

In order to allow the community to start testing the Oxford proposal as soon as possible, v18.0~rc1, a release candidate for Octez v18.0, will be published in the coming days. A dedicated protocol test network, Oxfordnet, is also scheduled to launch soon after.

Adaptive Issuance and Staking

2023-08-11T14:00:00+02:00

$\newcommand\F[2]{\mathit{#1}\left(#2\right)}$ $\newcommand{\minR}{\mathit{min_r}}$ $\newcommand{\maxR}{\mathit{max_r}}$ $\newcommand{\tmult}{\cdot}$ $\newcommand\static[1]{\F{static}{#1}}$ $\newcommand{\sfr}{\frac{1}{1600}}$ $\newcommand\tc{\tau_c}$ $\newcommand\tr{\tau_r}$ $\newcommand\grf{\gamma}$ $\newcommand\dyn[1]{\F{dyn}{#1}}$ $\newcommand\sgn[1]{\F{sign}{#1}}$ $\newcommand\dist[1]{\F{distance}{#1}}$ $\newcommand\DTF{{\Delta t}}$ $\newcommand\IL[1]{\normalsize{#1}}$ $\newcommand\MX[2]{\F{max}{#1,#2}}$ $\newcommand\adr[1]{\F{adaptive}{#1}}$ $\newcommand\clip[3]{\F{clip}{#1,#2,#3}}$ $\newcommand\supply[1]{\F{supply}{#1}}$ $\newcommand\iss[1]{\F{issuance}{#1}}$ $\newcommand\isb[1]{\F{issuance_{block}}{#1}}$ $\newcommand\tw{\Sigma_w}$ $\newcommand\rw[2]{\F{reward_{#1}}{#2}}$ $\newcommand\tip[2]{\F{tip_{#1}}{#2}}$ $\newcommand\lbs[1]{\F{subsidy_{LB}}{#1}}$ $\newcommand\exp[1]{\F{exp}{#1}}$ $\newcommand{\vdf}{\mathit{VDF}}$

TL;DR This document provides a high-level functional specification for Adaptive Issuance and Staking, two new features of the Oxford protocol upgrade proposal, which together constitute a major evolution of Tezos’ Proof-of-Stake mechanism.

NB 27/03/2024 The features described in this document have not been included in the Oxford 2 protocol, currently active on Tezos Mainnet. See Oxford 2's announcement for further details. The Paris protocol proposals implement revised versions of these features.
EDITED on 02/10/2023 to correct the value for $\IL{\grf}$¹ in the dynamic rate formula, which did not reflect the value used in its implementation in Oxford.

Adaptive Issuance

Adaptive Issuance is a novel mechanism regulating tez issuance in Tezos.

Currently, the Tezos economic protocol issues new tez via:

Participation rewards: incentives given to delegates for participation in consensus and random seed generation.
The Liquidity Baking (LB) subsidy.
Protocol “invoices”: lump sums of tez issued and allocated during protocol migration.

Participation rewards and the LB subsidy are regularly issued by the protocol, whereas the value and recipients of invoices are defined discretionarily by the developers of a protocol proposal. In the Nairobi protocol, and in previous ones, the values for participation rewards and the LB subsidy, if any, are defined by the Tezos protocol using fixed constants.

The Oxford protocol proposal introduces Adaptive Issuance: a mechanism where the amount of regularly issued tez (participation rewards and the LB subsidy, if active) depends on the global staked funds ratio — that is, the ratio of staked tez to the total supply. This lets issuance roughly match the actual security budget the chain requires, the amount needed to encourage participants to stake and produce blocks, but no more.

At the end of each blockchain cycle, the regular issuance is adjusted to nudge the staked funds ratio towards a protocol-defined target (set at 50% in Oxford). Participation rewards and the LB subsidy are recomputed to match that budget. When the staked funds ratio decreases and diverges from the target, emission rates increase, incentivizing participants to stake funds to re-approach the target. Conversely, incentives decrease as the ratio increases beyond the target.

Adaptive issuance rate

The adaptive issuance rate determines, at the end of cycle $\IL{c}$, the issuance for cycle $\IL{c + 5}$. The adaptive issuance rate is the sum of a static rate and a dynamic rate. The final result is clipped to ensure nominal emissions remain within $\IL{[\minR,\ \maxR]}$ (set to [0.05%, 5%] in Oxford) of the total supply.

Figure 1: adaptive issuance rate as a function of the staked funds ratio f.

Figure 1 plots the nominal issuance rate and the static rate as a function of the staked ratio $\IL{f}$. In the graph above, we picked the value 0.0075 (or 0.75%) for the dynamic rate, but this is just an example and that number varies dynamically over time.

The static rate is a static mechanism, which approximates a Dutch auction to compute a nominal issuance rate as a function of the staked funds ratio for a given cycle. Its value decreases as the staked funds ratio increases, and vice versa.

STATIC RATE Let $\IL{f}$ be the staked funds ratio at the end of cycle $\IL{c}$. Then, the static rate is defined as:

$\static{f} = \sfr \tmult \frac{1}{f^2}$

The choice of $\IL{\sfr}$ as a scaling factor ensures that the curve takes reasonable values for plausible staking ratios. Moreover, assuming Adaptive Issuance is activated with a dynamic rate of 0, and at the current staked funds ratio (that is, ~7.5% of the total supply), this factor allows for a smooth transition from the current issuance rate (~4.6%).

The dynamic reward rate adjusts itself over time based on the distance between the staked funds ratio $\IL{f}$ and the 50% (±2%) target ratio ($\IL{\tc}$ and $\IL{\tr}$ parameters below), increasing when $\IL{f}$ < 48% and decreasing when $\IL{f}$ > 52%, provided the total issuance rate is not hitting its lower or upper limit.

DYNAMIC RATE The dynamic rate $\IL{\dyn{c}}$ is defined at the end of cycle $\IL{c}$ as:

$\dyn{c} = \dyn{c -1} + \sgn{\tc - \F{f}{c}} \tmult \grf \tmult \dist{\F{f}{c}} \tmult {\Delta t}$
$\dyn{c_0} = 0$

$\IL{\dyn{c}}$ is then clipped to $\IL{\left[ 0, \maxR - \static{\F{f}{c}}\right]}$, ensuring that $\IL{\static{\F{f}{c}} + \dyn{c} \leq \maxR}$.

In this formula:

$\IL{c_0}$ is the first cycle where Adaptive Issuance is active.
Given a cycle $\IL{c}$, $\IL{\F{f}{c}}$ denotes the staked funds ratio at the end of the cycle, and $\IL{\dyn{c}}$ the value of the dynamic rate computed in that cycle.
$\IL{\tc}$ = 0.5 and $\IL{\tr}$ = 0.02 denote, respectively, the target staked funds ratio and the radius of the interval centered on the target ratio.
$\IL{\grf}$ = 0.01 controls the speed at which the dynamic rate adjusts. The value is set so that each percentage point by which the staked funds ratio lies outside the interval $\IL{\left[ \tc - \tr, \tc + \tr \right]}$ changes the dynamic rate by 0.01 percentage points per day¹.
$\IL{\dist{\F{f}{c}} = \MX{0}{\left|\F{f}{c} - \tc \right| - \tr}}$ denotes the (absolute) distance between the staked funds ratio $\IL{\F{f}{c}}$ and the interval $\IL{\left[ \tc - \tr, \tc + \tr \right]}$.
$\IL{\DTF = \frac{16384 \tmult 15}{86400} = 2.8\overline{444}}$, denotes the minimal duration (in days) of a Tezos cycle, assuming all 16384 blocks in the cycle are produced at the minimal allowed time – that is, every 15 seconds.
$\IL{\sgn{\tc - \F{f}{c}} = 1}$ if $\IL{\F{f}{c} \leq \tc}$ and $-1$ otherwise, denotes the sign of the distance between the target ratio $\IL{\tc}$ and the staked funds ratio $\IL{\F{f}{c}}$.

In a nutshell, $\IL{\dyn{c}}$ increases or decreases by an amount proportional to the distance between the staked funds ratio $\IL{\F{f}{c}}$ and the interval $\IL{\left[ \tc - \tr, \tc + \tr \right]}$, while ensuring that the adaptive issuance rate is kept within the $\IL{[\minR,\ \maxR]}$ bounds.

Finally, as mentioned before, the nominal adaptive issuance rate² for a cycle $\IL{c + 5}$ is defined as the sum of the static rate and the dynamic rate, clipped to stay within the 0.05% to 5% range.

ADAPTIVE ISSUANCE RATE Let $\F{f}{c}$ be the staked funds ratio at the end of cycle $\IL{c}$, the adaptive issuance rate for cycle $\IL{c+5}$ is defined as:

$\adr{c + 5} = \clip{\dyn{c} + \static{\F{f}{c}}}{\minR}{\maxR}$
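The three definitions above (static rate, dynamic rate, and clipped adaptive rate) can be put together in a short Python sketch. It transcribes the formulas with the Oxford constants given in the text. It uses floating point for readability, whereas the protocol's own arithmetic and rounding may differ. The 40% staking scenario in the example is hypothetical.

```python
# Constants as given for Oxford in the text.
MIN_R, MAX_R = 0.0005, 0.05        # issuance bounds: 0.05% and 5%
TAU_C, TAU_R = 0.5, 0.02           # target staked funds ratio and radius
GAMMA = 0.01                       # adjustment speed
DELTA_T = 16384 * 15 / 86400       # minimal cycle duration, in days


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(x, hi))


def static_rate(f: float) -> float:
    """static(f) = 1/1600 * 1/f^2."""
    return (1 / 1600) / f**2


def distance(f: float) -> float:
    """Distance from f to the interval [TAU_C - TAU_R, TAU_C + TAU_R]."""
    return max(0.0, abs(f - TAU_C) - TAU_R)


def next_dyn(prev_dyn: float, f: float) -> float:
    """dyn(c) from dyn(c - 1) and the staked funds ratio f = f(c)."""
    sign = 1 if f <= TAU_C else -1
    dyn = prev_dyn + sign * GAMMA * distance(f) * DELTA_T
    # Clip to [0, MAX_R - static(f)]; if static(f) alone exceeds MAX_R, this yields 0.
    return clip(dyn, 0.0, MAX_R - static_rate(f))


def adaptive_rate(dyn: float, f: float) -> float:
    """Nominal issuance rate for cycle c + 5, given dyn(c) and f(c)."""
    return clip(dyn + static_rate(f), MIN_R, MAX_R)


# Hypothetical scenario: the staked funds ratio stays at 40% for three cycles.
dyn = 0.0
for _ in range(3):
    dyn = next_dyn(dyn, 0.40)
rate = adaptive_rate(dyn, 0.40)
```

Inside the 48% to 52% band, `distance` is 0 and the dynamic rate stays put. Below the band it grows each cycle, and above it shrinks towards 0.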

Adaptive rewards

In Nairobi, and in previous Tezos protocols, participation rewards and the LB subsidy are fixed values defined by protocol constants. With the proposed mechanism, the adaptive issuance rate instead provides a budget for the whole cycle, which is allocated equally to each block of the cycle and distributed among the various rewards in proportion to their relative weights.

ADAPTIVE ISSUANCE PER BLOCK Let $\supply{c}$ be the total supply at the end of cycle $\IL{c}$, the issuance per block for cycle $\IL{c+5}$ is defined as:

$\isb{c + 5} = \frac{\adr{c + 5}}{2102400} \tmult \supply{c}$

Where 2102400 = $\IL{\frac{365 \tmult 24 \tmult 60 \tmult 60}{15}}$ is the maximal number of blocks produced in a year, given a minimal block time of 15 seconds.

REWARD WEIGHTS The Oxford proposal defines the weights for participation rewards and the LB subsidy as:

Attestation (formerly, endorsing) rewards : 10,240.
Fixed baking reward: 5,120.
Bonus baking reward: 5,120.
LB subsidy: 1,280.
Nonce revelation tip: 1.
VDF tip: 1.

The total sum of all weights is $\tw$ = 21762. The total issuance per block, $\IL{\isb{c}}$, is distributed amongst the different rewards in proportion to their weight.

Consensus rewards. Since the adoption of Tenderbake, Tezos protocols have rewarded delegates for their participation in consensus with the following rewards per block:

A fixed baking reward, given to the delegate which produced the payload of the block (i.e. choosing transactions, and other non-consensus operations).
A variable baking bonus reward, given to the delegate which produced the block included in the chain. This bonus is given for including attestations, if their combined attesting power exceeds the minimal threshold (two thirds of total slots).
A collective attestation reward shared between the delegates selected in the consensus committee for that block level, for attesting block proposals.

We refer to the technical documentation for further insight on the pre-requisites and distribution of these rewards. Here, we derive the new formulas which compute their values per block for a cycle $\IL{c}$:

$\rw{baking}{c} = \rw{bonus}{c} = \frac{5120}{\tw} \tmult \isb{c}$
$\rw{attestation}{c} = \frac{10240}{\tw} \tmult \isb{c}$

Note that these formulas change the value of available rewards, but not why and how they are awarded. Hence, $\IL{\rw{bonus}{c}}$ still denotes the maximal value for this reward: the actual reward issued depends on the total number of attested slots in a block. Similarly, $\IL{\rw{attestation}{c}}$ is also a maximal value per block, which is further shared between multiple delegates depending on the number of attested slots and subject to the existing participation conditions.

Nonce and VDF revelation tips. The rewards allocated to delegates for contributing to random seed generation (that is, for revealing nonce seeds and posting VDF proofs) are not paid every block, but rather every 128 blocks. The adjusted formulas are:

$ \tip{\vdf}{c} = \tip{nr}{c} = 128 \tmult \frac{1}{\tw} \tmult \isb{c}$

Liquidity baking subsidy. The LB subsidy per block is determined by the following formula:

$\lbs{c} = \frac{1280}{\tw} \tmult \isb{c}$

Note that while the subsidy is issued only if the feature is on, its weight is always counted in the computation of $\IL{\tw}$. In other words, the budget for the LB subsidy is always allocated, regardless of whether it is issued or not.
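The per-block budget and its weight-based split can be checked with a small Python sketch. It uses exact `fractions.Fraction` arithmetic so that the shares add up exactly. The protocol itself works in integer mutez with its own rounding, which this sketch does not reproduce. The 1% rate and the 1,000,000,000 tez supply in the example are hypothetical inputs.

```python
from fractions import Fraction

BLOCKS_PER_YEAR = 365 * 24 * 60 * 60 // 15   # 2,102,400 blocks at 15 s
TIP_PERIOD = 128                              # tips are paid every 128 blocks

# Reward weights from the Oxford proposal.
WEIGHTS = {
    "attestation": 10_240,
    "baking": 5_120,
    "bonus": 5_120,
    "lb_subsidy": 1_280,   # always counted, even if the subsidy is not issued
    "nonce_revelation_tip": 1,
    "vdf_tip": 1,
}
TOTAL_WEIGHT = sum(WEIGHTS.values())          # 21,762
TIPS = ("nonce_revelation_tip", "vdf_tip")


def issuance_per_block(rate: Fraction, supply: Fraction) -> Fraction:
    """issuance_block(c + 5) = adaptive(c + 5) / 2102400 * supply(c)."""
    return rate / BLOCKS_PER_YEAR * supply


def reward_values(rate: Fraction, supply: Fraction) -> dict:
    """Maximal value of each reward; a tip carries TIP_PERIOD blocks' worth of its share."""
    per_block = issuance_per_block(rate, supply)
    values = {}
    for name, weight in WEIGHTS.items():
        share = Fraction(weight, TOTAL_WEIGHT) * per_block
        values[name] = share * TIP_PERIOD if name in TIPS else share
    return values


# Hypothetical inputs: a 1% adaptive rate and a supply of 1,000,000,000 tez.
values = reward_values(Fraction(1, 100), Fraction(1_000_000_000))
```

Because tips are paid once every 128 blocks, dividing them back by 128 and adding the other rewards recovers exactly one block's issuance.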

The Oxford protocol proposal implements a new RPC endpoint, /issuance/expected_issuance, which reports the precomputed values of all participation rewards and the LB subsidy, for the cycle corresponding to the queried block level, and the next 4 cycles.

New Staking mechanism

Staking is an evolution of the existing Tezos Liquid Proof-of-Stake mechanism. It introduces a new role for network participants, called staker, complementary to the existing delegate (also known as baker) and delegator roles. A staker must also be a delegator — that is, they must first choose a delegate.

When stakers stake funds towards a delegate’s staking balance, the associated baking and voting powers accrue to that delegate. Similarly to how delegated funds work, staked funds remain within the staker’s account at all times.

Staked and delegated funds have different weights in the computation of delegates’ baking and voting powers: staked funds (both external stakes by stakers and the delegate’s own) count twice as much as delegated funds.

Unlike delegated funds, staked funds are considered to contribute to the security deposit associated with their chosen delegate. Thus, they are subject to slashing if the delegate misbehaves by double-signing block proposals or consensus operations, and are subject to the same withdrawal delays — colloquially, they are “frozen”.

Stakers are slashed proportionally to their contribution to the delegate's staking balance. To simplify slashing, double-baking penalties are now proportional to staked funds: instead of the previous fixed sum of 640 tez, they are now set to 10% of the delegate's stake. Moreover, denunciation rewards (both for double-baking and double-attestations) are reduced from one half to one seventh of the slashed funds. The chosen value prevents adversarial delegates from abusing the slashing mechanism for profit at the expense of their stakers.
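The proportional penalty and the one-seventh denunciation share can be sketched in Python as follows. The sketch also includes one possible reading of the "no profit from self-denunciation" argument. Suppose a delegate with own stake s and external stake e double-bakes and denounces itself. It loses s/10 and receives (s + e)/70, a net of (e - 6s)/70. That is negative as long as e < 6s, which the global cap of 5 on limit_of_staking_over_baking guarantees for accepted stake. This reading is our own. It ignores overstaked funds, lost rewards, and any other costs, and the fate of the non-rewarded remainder (here simply returned) is not specified in the text.

```python
from fractions import Fraction
from typing import Dict, Tuple

DOUBLE_BAKING_PENALTY = Fraction(1, 10)   # 10% of the funds at stake
DENUNCIATION_SHARE = Fraction(1, 7)       # denouncer's share of slashed funds


def slash_double_baking(stakes: Dict[str, int]) -> Tuple[Dict[str, Fraction], Fraction, Fraction]:
    """Return each party's loss, the denunciation reward, and the remainder.

    `stakes` maps the delegate and each staker to their frozen stake.
    Everyone loses the same fraction, so losses are proportional to contributions.
    """
    losses = {who: DOUBLE_BAKING_PENALTY * amount for who, amount in stakes.items()}
    slashed = sum(losses.values(), Fraction(0))
    reward = DENUNCIATION_SHARE * slashed
    return losses, reward, slashed - reward


def self_denunciation_net(own_stake: int, external_stake: int) -> Fraction:
    """Net gain for a delegate who double-bakes and then denounces itself."""
    losses, reward, _ = slash_double_baking({"delegate": own_stake, "stakers": external_stake})
    return reward - losses["delegate"]
```

With the maximum accepted external stake (5 times own stake), the net is -s/70. The scheme would only become profitable beyond 6 times own stake.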

Delegates configure their staking policy by setting staking parameters which regulate whether they accept stakers (the default being to reject them), and if so, up to which fraction of their total staking balance. They can also configure which proportion of the staking rewards is set to accrue to their own staked balance versus their unfrozen, spendable balance. As participation rewards are paid to the staked balance, and automatically shared between delegates and their stakers, delegates can use this parameter to collect an edge from the rewards attributable to their stakers.

If and when Oxford activates, freezing and unfreezing of staked funds will be controlled directly by delegates and stakers, and will no longer be automatic. This entails that staked funds remain frozen until stakers manually unfreeze them. This is a two-step process that spans at least 7 cycles (cf. Staked funds management).

A new user interface is provided for delegates and stakers to interact with the mechanism. It is based on four pseudo-operations: stake, unstake, finalize_unstake, and set_delegate_parameters. Pseudo-operations are self-transfers: a transfer operation where the destination matches the source – each involving a special entry-point of the same name introduced for implicit accounts. This approach was chosen to minimize the work required by wallets, custodians, exchanges, and other parties to support the functionality.

NB Until feature activation: only delegates can stake funds and the relative weight of staked and delegated funds remains unchanged. In the current implementation, only implicit accounts can become stakers. In other words, smart contracts cannot stake funds (they can of course still delegate them).

Staking policy configuration

Delegates can configure their staking policy by setting the following parameters:

edge_of_baking_over_staking: a ratio between 0 and 1, whose default value is 1. This parameter determines the fraction of the rewards that accrue to the delegate’s liquid spendable balance — the remainder accrues to frozen stakes.
limit_of_staking_over_baking: a non-negative number, denoting the maximum portion of external stake by stakers over the delegate’s own staked funds. It defaults to 0 — which entails that delegates do not accept external stakes by default. It is moreover capped by a global constant, set to 5 in Oxford, which ensures the baker controls a significant part of the stake.

Delegates can modify these staking parameters at any time using the set_delegate_parameters pseudo-operation, that is, by transferring 0 tez to their own set_delegate_parameters entry-point. Values for both parameters must be supplied. The new parameters are then applied 5 cycles later.

On overstaking and overdelegation. Note that if a delegate's limit_of_staking_over_baking is exceeded (that is, the delegate is overstaked), the excess stake is automatically treated as delegated funds in the calculation of the delegate's baking and voting power, but it remains slashable. The new mechanism does not alter overdelegation (delegated funds beyond 9 times the delegate's own stake), nor its consequences for voting and baking power. That is, overdelegated funds are not counted towards a delegate's baking power, but they do increase their voting power.
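The rules above can be combined into a rough Python sketch of how a delegate's baking and voting power might be tallied once the features are active. Several details are our assumptions rather than statements from the text. Powers are given in arbitrary weighted units, since only ratios matter. Overstaked funds are included in the overdelegation check. The 9x overdelegation bound is measured against the delegate's own stake, as worded above. This is not the protocol's actual computation.

```python
from typing import Tuple

GLOBAL_STAKING_LIMIT = 5      # global cap on limit_of_staking_over_baking in Oxford
OVERDELEGATION_FACTOR = 9     # delegated funds beyond 9x own stake are overdelegated
STAKED_WEIGHT = 2             # staked funds count twice as much as delegated funds


def powers(own_staked: int, external_staked: int, delegated: int,
           limit_of_staking_over_baking: float) -> Tuple[float, float]:
    """Return (baking_power, voting_power) in arbitrary weighted units."""
    limit = min(limit_of_staking_over_baking, GLOBAL_STAKING_LIMIT)
    accepted_external = min(external_staked, limit * own_staked)
    overstaked = external_staked - accepted_external   # still slashable
    staked = own_staked + accepted_external
    delegated_total = delegated + overstaked           # overstake counts as delegation
    # Overdelegated funds do not count for baking power but do count for voting power.
    counted_for_baking = min(delegated_total, OVERDELEGATION_FACTOR * own_staked)
    baking = STAKED_WEIGHT * staked + counted_for_baking
    voting = STAKED_WEIGHT * staked + delegated_total
    return baking, voting
```

For example, a delegate with 100 tez of its own and 2,000 tez delegated gets 1,100 units of baking power but 2,200 units of voting power.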

Staked funds management

Stakers (and delegates) can use the stake, unstake, and finalize_unstake pseudo-operations to control their stakes. Figure 2 illustrates their effect on a staker’s funds. Note that while these pseudo-operations change the state of the involved funds, they remain otherwise within the staker’s account at all times.

Figure 2: staked funds management using pseudo-operations.

To stake funds, a delegator uses the stake pseudo-operation, transferring the chosen amount of spendable tez to their own stake entry-point. The staked tez will then be frozen and contribute to their chosen delegate’s staking balance. Note that the stake pseudo-operation will fail if the sender account is not delegated.

To unstake funds, a staker first submits an unstake request with the unstake pseudo-operation. This is implemented by transferring 0 tez to their unstake entry-point, while passing the chosen amount as a parameter.

The requested amount will be unstaked but will remain frozen. After 7 cycles, unstaked frozen tokens are no longer considered at stake nor slashable. They are then said to be both unstaked and finalizable.

A staker can retrieve all unstaked and finalizable tokens at any time, making them spendable again. This is done with the finalize_unstake pseudo-operation, that is, by transferring 0 tez to their own finalize_unstake entry-point.
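The life cycle of staked funds described above (spendable, staked, unstaked but frozen, finalizable) can be modelled as a toy state machine in Python. The `Account` class, its method names, and the delegate label are hypothetical. They mirror the pseudo-operations only loosely and are not a wallet or Octez API. The sketch also ignores slashing, rewards, and the cycle delays that apply to staking itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

UNSTAKE_DELAY_CYCLES = 7


@dataclass
class Account:
    """Toy model of one implicit account's funds, in mutez."""
    spendable: int
    delegate: Optional[str] = None
    staked: int = 0
    # (cycle of the unstake request, amount): unstaked but still frozen.
    pending: List[Tuple[int, int]] = field(default_factory=list)

    def stake(self, amount: int) -> None:
        if self.delegate is None:
            raise ValueError("stake fails if the sender account is not delegated")
        if amount > self.spendable:
            raise ValueError("not enough spendable funds")
        self.spendable -= amount
        self.staked += amount

    def unstake(self, amount: int, cycle: int) -> None:
        if amount > self.staked:
            raise ValueError("not enough staked funds")
        self.staked -= amount
        self.pending.append((cycle, amount))

    def finalizable(self, cycle: int) -> int:
        return sum(a for c, a in self.pending if cycle >= c + UNSTAKE_DELAY_CYCLES)

    def finalize_unstake(self, cycle: int) -> int:
        """Move every finalizable amount back to the spendable balance."""
        amount = self.finalizable(cycle)
        self.pending = [(c, a) for c, a in self.pending if cycle < c + UNSTAKE_DELAY_CYCLES]
        self.spendable += amount
        return amount


acc = Account(spendable=1_000, delegate="hypothetical_delegate")
acc.stake(600)                            # spendable 400, staked 600
acc.unstake(200, cycle=10)                # staked 400, 200 pending since cycle 10
early = acc.finalize_unstake(cycle=16)    # 0: still within the 7-cycle delay
later = acc.finalize_unstake(cycle=17)    # 200 becomes spendable again
```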

Feature activation vs protocol activation

Should the Oxford protocol proposal be accepted by the community, once the protocol becomes active on Tezos Mainnet most of the features described in this document will not be enabled by default. They will remain latent in the protocol, awaiting a separate activation.

In particular, the following changes will require additional approval from delegates via a separate feature activation vote:

Adaptive issuance — including notably the changes to the computation of consensus rewards and the LB subsidy.
Ability for delegators to become stakers — until feature activation delegates continue to be the only participants who can stake funds.
The changes in weight for staked and delegated funds towards the computation of baking and voting rights.

Other changes described earlier would be enabled from Oxford’s activation:

The new interface for stake manipulation based on pseudo-operations. Note that this entails the deprecation of the set/unset deposits limit interface and also the end of automatic deposit freezing. On protocol activation, each delegate’s stake is derived from the frozen deposits at the end of the last cycle of Nairobi.
The changes in slashing penalties (double-baking penalties are set to 10% of the staked funds) and denunciation rewards (they amount to one seventh of slashed funds).
Changes to protocol constants. Note that this entails calculating participation rewards and the LB subsidy using the weight-based formulas, but these are defined so that they match the previous values when Adaptive Issuance is not active.

Activation Vote

We highlight the following principles behind the feature activation vote mechanism:

If and when Oxford activates, delegates can start voting for (On) or against (Off) the feature activation of the changes listed above in each block they bake. They can also abstain with a Pass vote.
These votes are cast by block-producing delegates, and are included in block headers.
Participation is not mandatory, defaulting to Pass in the absence of signaling.
The feature activation vote has two phases: a Voting phase and a subsequent Adoption phase.
The Voting phase is driven by an Exponential moving average (EMA) whose half-life is 2 weeks. That is, it takes two weeks for the EMA to rise from 0% to 50% assuming only On votes are cast.
The target threshold is a supermajority of 80% of On votes over On plus Off votes.
There is no time limit or fixed duration for the Voting phase. It continues as long as the threshold is not met. There is no quorum either, the lack of participation (reified as Pass votes) is not taken into account by the EMA, and hence only affects the time required to reach the threshold.
If the threshold is met, the Voting phase will complete at the end of the current cycle, and the Adoption phase will start at the beginning of the following cycle.
The Adoption phase lasts 7 cycles. The beginning of the cycle following the end of the Adoption phase activates the guarded features.
There is no automatic deactivation of the guarded features once in (and after) the Adoption phase — subsequent votes continue to be counted towards an updated EMA, but without any further effect.
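The Voting-phase rules can be illustrated with a floating-point EMA in Python. The sketch assumes a 15-second block time, so two weeks corresponds to 80,640 blocks. It also assumes a per-block decay chosen so that unanimous On votes take exactly one half-life to reach 50%. The protocol's own fixed-point constants may differ. Under these assumptions, unanimous On signaling would need roughly a month to cross 80%, and Pass votes only stretch that time.

```python
from itertools import cycle

BLOCK_TIME_S = 15                                      # assumption: Oxford's minimal block time
HALF_LIFE_BLOCKS = 14 * 24 * 60 * 60 // BLOCK_TIME_S   # two weeks = 80,640 blocks
DECAY = 0.5 ** (1 / HALF_LIFE_BLOCKS)                  # per-block decay factor
THRESHOLD = 0.8                                        # 80% On over On + Off


def update_ema(ema: float, vote: str) -> float:
    """Fold one block's vote ('on', 'off' or 'pass') into the EMA."""
    if vote == "pass":
        return ema                  # abstentions are not taken into account
    sample = 1.0 if vote == "on" else 0.0
    return DECAY * ema + (1 - DECAY) * sample


def blocks_until_threshold(pattern, ema: float = 0.0, max_blocks: int = 10_000_000) -> int:
    """Count blocks until the EMA reaches THRESHOLD when votes repeat `pattern`."""
    for n, vote in enumerate(cycle(pattern), start=1):
        ema = update_ema(ema, vote)
        if ema >= THRESHOLD:
            return n
        if n >= max_blocks:
            raise RuntimeError("threshold not reached within max_blocks")
```

Alternating On and Pass, for instance, roughly doubles the number of blocks needed, since only the On blocks move the EMA.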

NB In the implementation in the Oxford protocol, the issuance rate is computed 5 cycles in advance. Thus, in the first 5 cycles in which Adaptive Issuance is active, the protocol does not use the adaptive reward formulas and keeps using the current reward values.

EDITED on 02/10/2023 to correct the value for $\IL{\grf}$. The original version mistakenly defined it as 0.0001. ↩↩
Note that if the nominal annual issuance rate is $r$, the annualized rate is close to $\IL{\exp{r} - 1}$ as it is compounded at every cycle. ↩

EVM Rollups Are Coming to Tezos – Now on Testnet

2023-07-18T18:00:00+02:00

TL;DR: An EVM-compatible Smart Rollup is available on Ghostnet, Tezos’ permanent testnet. This blog post shows you how to get started with it.

Connect your MetaMask wallet, deploy Solidity contracts with Remix, build dApps using Ethereum JSON-RPCs, and still benefit from the latest Tezos innovations!

We are happy to present EVM-compatibility on Tezos via an open source EVM Smart Rollup developed by Trilitech, Marigold, Functori and Nomadic Labs. As the next step towards Mainnet-readiness, an instance of this rollup has been deployed on the Ghostnet test network at block level 3289331.

This deployment is a community rollup, meaning it is an EVM-compatible execution environment where all interested community members and project developers can explore and experiment with EVM-development on Tezos.

With that goal in mind, this blog post provides instructions on how to interact with the deployed rollup using the Octez software suite.

What’s an EVM-compatible Smart Rollup?

The Ethereum Virtual Machine (EVM) is the execution environment in which all Ethereum accounts and smart contracts live. An EVM-compatible Smart Rollup is a rollup whose kernel implements an EVM-compatible execution environment. This enables Ethereum smart contracts to be frictionlessly deployed and executed on Tezos.

Because Smart Rollups allow developers to write applications in any programming language that compiles to WASM, we were able to develop an open source kernel using SputnikVM, an established implementation of the EVM in Rust. This kernel creates a separate blockchain running in Tezos’ Layer 2, producing its own blocks and processing its own transactions.

Currently, there is one EVM block per Tezos block, but Smart Rollups are upgradable, and nothing prevents us from having different block times from Layer 1 in the future. We retrieve the list of transactions to include in EVM-rollup blocks by reading the Smart Rollups’ shared inbox. Thus, we use the Tezos network as our consensus layer, as it provides a sequence of transactions to include in Layer 2 blocks.

The EVM kernel is executed by an Octez Smart Rollup node, which is kernel-agnostic and not equipped to communicate with Ethereum wallets or block explorers out of the box. For this, we provide another layer on top of the Octez rollup node in the form of an EVM node.

This facade node is provided as a new Octez binary. It currently partially supports Ethereum’s JSON-RPC API specification, but we aim to provide full support in the future, so as to achieve full compatibility with the Ethereum tooling ecosystem.

Getting started with the EVM rollup

For reasons well-explained here, we have decided to use Ctez as the native token for this deployment of the EVM rollup. Therefore, your balance will be given in ctez, you’ll transfer ctez, and you’ll pay fees with ctez.

1. Get Ctez tokens from the faucet

The first step is getting hold of sufficient Ctez tokens on Layer 1, which in this case is Ghostnet. In this tutorial, we use the Tezos address tz1g2RbQtxZRHw8oj4oSnhRzy7iRA2hxU4TD and the Ethereum address 0x8aaD6553Cf769Aa7b89174bE824ED0e53768ed70. Replace these with your own addresses.

For the deployment of the rollup on Ghostnet, we picked this instance of the Ctez contract. To speed up user onboarding, Marigold has integrated support for Ctez in their test network’s faucet.

2. Authorize the Layer 1 bridge to deposit your Ctez tokens

The EVM rollup accepts deposits only from a specific Layer 1 contract, called the Layer 1 bridge, which is responsible for bridging Ctez tokens to the rollup’s native Layer 2 tokens.

The Layer 1 bridge needs to be able to transfer Ctez tokens from your account to itself. As the Ctez contract is compliant with the FA1.2 specification, we do this by creating a token allowance for the bridge contract, using the %approve entrypoint:

./octez-client --endpoint https://rpc.ghostnet.teztnets.xyz from fa1.2 contract KT1Q4ecagDAmqiY3ajvtwfNZyChWy86W7pzb as tz1g2RbQtxZRHw8oj4oSnhRzy7iRA2hxU4TD approve <amount-to-deposit> from KT1HJphVV3LUxqZnc7YSH6Zdfd3up1DjLqZv --burn-cap 0.0175

This call allows the bridge contract to take up to <amount-to-deposit> from your account and transfer it to itself in the next step. Note that the amount must be given in µctez (10^-6 ctez). For example, to deposit 100 ctez, use 100000000 µctez.
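The amount is an integer number of µctez, so a decimal ctez amount has to be scaled by 10^6 without rounding errors. Below is a small Python helper that does this with `decimal.Decimal` and rejects amounts finer than 1 µctez. It is hypothetical and not part of Octez.

```python
from decimal import Decimal

MUCTEZ_PER_CTEZ = 10**6  # 1 µctez = 10^-6 ctez


def to_muctez(ctez: str) -> int:
    """Convert a decimal ctez amount (passed as a string to avoid float
    rounding) into the integer µctez value expected by the approve call."""
    amount = Decimal(ctez) * MUCTEZ_PER_CTEZ
    if amount != amount.to_integral_value():
        raise ValueError(f"{ctez} ctez is not a whole number of µctez")
    return int(amount)


if __name__ == "__main__":
    print(to_muctez("100"))
```

For example, `to_muctez("100")` yields the 100000000 µctez mentioned above.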

3. Deposit your Ctez tokens on Layer 2 via the Layer 1 bridge

The Layer 1 bridge exposes an entrypoint %deposit, enabling users to deposit ctez. The entrypoint takes 3 parameters:

The receiver’s EVM address. Note that the user must make sure this is a valid Ethereum address.
The amount of ctez to deposit.
The maximum fee per unit of gas the sender is willing to pay, in Wei.

You can discover the current gas price by using the RPC eth_gasPrice (note that the result is hex-encoded), e.g.

curl -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","id":35,"method":"eth_gasPrice","params":[]}' https://evm.ghostnet-evm.tzalpha.net/
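The same call can be made from a script. This Python sketch uses only the standard library, and its helper names are hypothetical. It targets the endpoint above, which may no longer be reachable. `decode_quantity` handles the hex encoding of the result: for instance, "0x3b9aca00" is 1,000,000,000 Wei.

```python
import json
import urllib.request

ENDPOINT = "https://evm.ghostnet-evm.tzalpha.net/"  # from the text; may be offline


def build_request(method: str, params: list, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 request body, as in the curl example."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode()


def decode_quantity(result: str) -> int:
    """Ethereum JSON-RPC returns quantities as 0x-prefixed hex strings."""
    if not result.startswith("0x"):
        raise ValueError(f"not a hex quantity: {result!r}")
    return int(result, 16)


def gas_price_wei(endpoint: str = ENDPOINT) -> int:
    """Call eth_gasPrice and return the price in Wei (performs a network request)."""
    req = urllib.request.Request(
        endpoint,
        data=build_request("eth_gasPrice", [], 35),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return decode_quantity(reply["result"])
```

Calling `gas_price_wei()` gives a decimal figure that can serve as a reference when choosing the maximum fee per unit of gas below.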

After choosing values for the 3 parameters above, you can call the entrypoint %deposit. In this example we’ve set the maximum fee per unit of gas to 21000 Wei:

./octez-client --endpoint https://rpc.ghostnet.teztnets.xyz transfer 0 from tz1g2RbQtxZRHw8oj4oSnhRzy7iRA2hxU4TD to KT1HJphVV3LUxqZnc7YSH6Zdfd3up1DjLqZv --entrypoint "deposit" --arg "Pair (Pair <amount-to-deposit> 0x8aaD6553Cf769Aa7b89174bE824ED0e53768ed70) 21000" --burn-cap 0.1

Using the EVM rollup

We have deployed a Ghostnet instance of Blockscout, a public Ethereum block explorer, which is available at https://explorer.ghostnet-evm.tzalpha.net/. Once you have deposited ctez to your EVM address, you can verify its balance via the explorer, for example: https://explorer.ghostnet-evm.tzalpha.net/address/0x8aaD6553Cf769Aa7b89174bE824ED0e53768ed70.

You are now ready to interact with the EVM rollup, whose public JSON-RPC endpoint lies at https://evm.ghostnet-evm.tzalpha.net. MetaMask is one of the most popular wallets of the Ethereum ecosystem, and below is a quick guide to using it with the EVM rollup deployed on Ghostnet. However, any EVM-compatible wallet should work.

Once you have connected your favorite wallet to the EVM rollup and deposited funds, you can start signing and sending transactions from it. For example, you can perform regular transfers between externally-owned (non-contract) accounts.

You can also deploy your own contracts and interact with them. Here is an example of a contract deployed on the EVM rollup.

The EVM rollup is work-in-progress

The EVM rollup is still in alpha, and we are actively working on making it feature complete and bug-free. You can follow the development process directly on the Tezos GitLab repository.

It is subject to frequent upgrades as new features are introduced and existing ones are polished and hardened. However, these updates should not affect end-user activity in the deployed EVM-compatible rollup.

Similarly to Ghostnet’s governance, a dictator key is implemented in the EVM rollup and will be used to migrate the running instance, as future and ongoing developments mature. When we reach the first stable version of the EVM rollup, the dictator key will be removed and replaced by an upgrade mechanism owned by Tezos Layer 1 governance.

If you feel more adventurous and want to try out the latest features of the EVM rollup while they are being developed, it is possible to experiment with the EVM-compatible rollup automatically deployed on the bleeding-edge Dailynet test network. Check out the section ‘EVM Rollup’, and connect your tools with the endpoint: https://evm.dailynet-YYYY-MM-DD.teztnets.xyz.
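When scripting against Dailynet, the date placeholder in the endpoint can be filled in from a `datetime.date`. This trivial sketch assumes that an instance exists for the chosen date. `dailynet_evm_endpoint` is a hypothetical helper.

```python
from datetime import date


def dailynet_evm_endpoint(day: date) -> str:
    """Fill the YYYY-MM-DD placeholder of the Dailynet EVM endpoint template."""
    return f"https://evm.dailynet-{day.isoformat()}.teztnets.xyz"


if __name__ == "__main__":
    print(dailynet_evm_endpoint(date.today()))
```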

Join us in creating the best possible EVM experience

To advance the EVM from the current alpha state to a production-ready solution, further testing is needed. We invite you to experiment with the Ghostnet deployment and let us know your experience.

We would also like to hear from Ethereum/Solidity-focused developers interested in deploying their project on an EVM rollup on Tezos. Reach out to us on the Tezos-Dev Slack workspace (in the channel #evm-testnets) and let’s talk about how we can help each other.

Finally, if you’re curious about the rollup-centric future of Tezos, join us at the TezDev developer conference in Paris on July 21 (right after EthCC). This year’s theme is #RiseofRollups. Learn about the future of Tezos, engage with relevant engineers about the EVM roadmap – and witness a live demonstration of 1 million tps!

The Rollup Booster: A Data-Availability Layer for Tezos

2023-07-06T18:00:00+02:00

TL;DR: To achieve millions of TPS, Smart Rollups currently rely on publishing data outside of the Tezos protocol. This blog post introduces the next step: a protocol-level solution for rollup data that is both highly scalable and fully decentralized.

With the activation of Smart Rollups in the Mumbai protocol upgrade, the Tezos ecosystem took a major step towards massive scalability. We are thrilled to see the ecosystem already building and deploying Smart Rollups on testnets, and we look forward to seeing them in action on Mainnet.

Meanwhile, at Nomadic Labs, Marigold, Functori and Trilitech, protocol developers are busy building a next-level data solution for Smart Rollups.

The challenge is to have the Tezos protocol guarantee availability of rollup transaction data without storing it in Layer 1 blocks. In this blog post we explain our approach, starting with an overview of the challenge at hand and then moving into a more technical explanation.

What’s the problem?

The main feature of rollups is that they move transaction and smart contract execution off-chain (to Layer 2). Layer 1 nodes can then focus on tasks that require high decentralization but are computationally lightweight, such as consensus.

While Smart Rollups are designed so that anyone can carry out execution for any rollup, execution does not need to be widely decentralized. That is because rollup operators are required to regularly post receipts, called commitments, to Layer 1 with the result of the execution. These can be verified by anyone, and dishonest or erroneous commitments can be challenged by honest actors to ensure integrity.

However, rollup operators must of course be able to receive instructions from the rollup’s users to carry out any execution in the first place.

One way is to include these instructions in Layer 1 blocks, which is done on Tezos using the shared rollup inbox. This ensures that the original transaction data is available for verifying the execution done by rollup operators. But even when applying compression techniques, Layer 1 block size will eventually become a bottleneck. A rough estimate puts the maximum throughput using this approach on Tezos at approximately 3400 TPS¹ with current protocol parameters (block size, block time, etc).
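The estimate comes from dividing the number of rollup transactions that fit in a block by the block time. This back-of-the-envelope Python sketch uses the assumptions from the footnote: 10-byte transactions, 512 KB blocks (with 1 KB taken as 1,000 bytes, which reproduces the footnote's 51.2K figure) and 15-second blocks. `max_inbox_tps` is a hypothetical name.

```python
def max_inbox_tps(block_bytes: int, tx_bytes: int, block_time_s: float) -> float:
    """Upper bound on rollup TPS when every transaction is posted to
    Layer 1: transactions that fit in one block, divided by block time."""
    return (block_bytes // tx_bytes) / block_time_s


if __name__ == "__main__":
    print(f"{max_inbox_tps(512_000, 10, 15):.0f} TPS")
```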

For handling millions of transactions per second (TPS) it’s necessary to keep transaction data out of Layer 1 blocks. But once it is off-chain, how can we trust it, and how can we be sure it is available for others to verify the work of rollup operators?

This is known as the data-availability problem and is a widely recognized issue among those trying to scale blockchains with Layer 2 solutions.

Data-availability committees are already here

For achieving millions of TPS on Tezos we currently rely on Data-Availability Committees, or DACs for short.

A DAC is a group of data providers that host data for rollups off-chain and make it available via the reveal data channel, an interface enabling Smart Rollups to access data external to the Tezos blockchain. For a deeper dive into DACs, see this article.

The DAC can provide a high degree of security as long as a few DAC participants are honest, with the tradeoff that a few dishonest participants can limit the throughput of the rollup.

But ultimately the Tezos protocol does not control or monitor what data is stored on a DAC and can provide no guarantees about the data being available. As a result, using a DAC involves trusting its members to actually store the data and provide it on request.

DACs remain practical for many applications, especially ones that don’t require a high level of decentralization or guarantees of rollup data being publicly available. This can be gaming, ticketing, or other use cases where a centralizing actor is already involved. DACs are also practical when private control of the data is required for confidentiality or legal reasons.

But Tezos’ enshrined rollups are intended as an integrated, decentralized scaling stack. Thus, the next step is to make sure rollup data’s availability is guaranteed by the Tezos protocol itself.

The fully decentralized solution: A data-availability layer

A Data-Availability Layer, or DAL, stores data and provides guarantees about its availability by relying on Layer 1 consensus, i.e. bakers.

It’s an independent peer-to-peer (P2P) network, running in parallel with Tezos’ Layer 1, where data can be submitted and retrieved. Bakers continuously monitor the DAL and attest on Layer 1 whether a given piece of data is available on the DAL.

Unlike the P2P protocol employed by Layer 1, where each node receives all data, the P2P protocol utilized by the DAL is designed such that DAL nodes receive only part of the data. Each data “chunk” is accompanied by an erasure code, which makes it possible to reconstruct the original data, even if parts of the data are lost or never properly published.
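The reconstruction property can be illustrated with a toy Reed-Solomon-style erasure code in Python. The k data symbols are treated as values of a polynomial over a prime field, and the extra shards are further evaluations of that polynomial. Any k distinct shards then recover the data by Lagrange interpolation. This is a teaching sketch built on our own choices of field modulus, shard counts and function names. It is not the DAL's actual encoding and does not use its parameters.

```python
P = 2**31 - 1  # a prime modulus; all arithmetic is in the integers mod P


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) that
    passes through the given (xi, yi) points, modulo P (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the inverse of den mod P (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: shard x = i carries data[i] for i < k, and
    shards k..n-1 carry extra evaluations of the same polynomial."""
    k = len(data)
    if n < k or not all(0 <= d < P for d in data):
        raise ValueError("symbols must lie in [0, P) and n must be >= k")
    base = list(enumerate(data))
    return base + [(x, _interpolate(base, x)) for x in range(k, n)]


def decode(shards, k):
    """Recover the k data symbols from any k shards with distinct x values."""
    distinct = list(dict(shards).items())[:k]
    if len(distinct) < k:
        raise ValueError("need at least k distinct shards")
    return [_interpolate(distinct, x) for x in range(k)]


if __name__ == "__main__":
    data = [11, 22, 33, 44]
    shards = encode(data, 8)  # 4 data shards plus 4 redundant ones
    survivors = [shards[1], shards[4], shards[6], shards[7]]  # half were lost
    print(decode(survivors, 4))
```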

In addition to attestations by bakers, DAL nodes perform data availability sampling², a technique that ensures, with a high probability, that the data is available while only downloading a small portion of the entire data. This means that anyone can ascertain data availability merely by running a node, without needing to trust bakers.
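A simplified model shows why a small sample is enough. Assume a code in which any half of the chunks suffices to rebuild the data. This is our assumption, not a stated DAL parameter. Hiding the data then requires withholding more than half of the chunks. Assume also that samples are drawn independently and uniformly. Each sample then hits a withheld chunk with probability above 1/2, so the chance that s samples all miss is below $(1/2)^s$.

```python
import math


def miss_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that `samples` independent, uniformly random chunk requests
    all land on available chunks, i.e. the withholding goes unnoticed."""
    return (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, max_miss: float) -> int:
    """Fewest samples that keep the miss probability at or below `max_miss`."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - withheld_fraction))


if __name__ == "__main__":
    # Hiding the data requires withholding more than half of the chunks,
    # so 0.5 is the adversary's best case in this model.
    for s in (5, 10, 20, 30):
        print(f"{s:>2} samples: miss probability < {miss_probability(0.5, s):.2e}")
    print("samples for < 1e-6:", samples_needed(0.5, 1e-6))
```

In this model, `samples_needed(0.5, 1e-6)` returns 20: twenty samples bring the chance of being fooled below one in a million.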

Overall, this approach is similar to what is proposed for Ethereum under the name ‘danksharding’, and what is planned for Celestia. It effectively circumvents the limitations inherent to Layer 1, massively scaling bandwidth and storage capacity for the Tezos network, while maintaining a high level of security and decentralization.

A quick overview of roles and responsibilities:

Adding data: Anyone can submit new data to the DAL, though pre-approval by Layer 1 is required to deter spamming.
Storing data: Anyone can contribute to storing data. The more people contributing, the higher the resiliency and efficiency of the DAL.
Verifying availability: Bakers continuously publish attestations on Layer 1, declaring the availability of the data. Other DAL nodes perform data availability sampling.
Retrieving data: Anyone can retrieve any data from the DAL.

Smart Rollups were designed for future compatibility with a DAL and will be able to leverage it as soon as it is activated through a protocol upgrade.

Notes on infrastructure

Given the different P2P protocol, we’ve decided to implement a separate node for connecting to the DAL network. We aim to make the command line interface and the configuration closely resemble those of the Octez node.

This means that both rollup operators and bakers will need to run DAL nodes, though based on feedback from the community, the setup with separate DAL nodes may change in the future. Hardware recommendations for participating in the DAL will be provided at a later stage, when the DAL has been evaluated on test networks.

Bakers should be aware that attesting requires downloading the data, which can be demanding in terms of bandwidth. As with Layer 1 consensus, the number of DAL attestations by a baker will be proportional to their stake, so their bandwidth requirements will also depend on their stake.

Status and roadmap

An early version of the DAL is now available on the Mondaynet testnet. Concurrently, we are preparing the DAL for production, including stress testing its P2P protocol.

The current DAL version does not have data-availability sampling implemented, and participation is optional for testnet bakers. We expect both to change before Mainnet release, which is estimated for early 2024.

Those interested in the technical aspects of the DAL can explore our design document. Also, more blog posts will follow, explaining design decisions and providing in-depth technical descriptions.

We consider the DAL a ground-breaking innovation when it comes to providing decentralized data-availability solutions. Upon activation through a protocol upgrade, the DAL would solidify Tezos’ position as a technical leader – and a blockchain that doesn’t compromise on decentralization.

Notes

¹ Assuming a Layer 2 transaction takes up 10 bytes of Layer 1 block space. Roughly, we can put at most 512 KB of transactions in a Layer 1 block, which is equivalent to 51.2K transactions. As block time is 15 seconds, this means we can achieve a throughput of about 3413 TPS.
² Data availability sampling is not implemented in the DAL version currently available on test networks.

Higher TPS and new rollup features: Nairobi upgrade is live!

2023-06-24T02:30:00+02:00

On June 24 2023 00:07:10 UTC, the Tezos blockchain successfully activated the Nairobi protocol upgrade at block #3,760,129.

This 14th upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, and Functori.

Included in Nairobi:

Up to an 8x increase in TPS for transactions, smart contract calls, Smart Rollup maintenance operations, and other manager operations, thanks to an improved gas model for cryptographic signatures.
New functionality for Smart Rollups, including new host functions and new internal Layer 2 messages allowing rollup kernels to sync with Tezos protocol upgrades.
Renaming endorsements to attestations.
Faster propagation of pre-attestations to reach consensus earlier.

For more details, see the Nairobi announcement post. A deeper technical description can be found in the protocol proposal’s technical documentation, and a complete list of changes is provided in Nairobi’s changelog.

Join us at TezDev 2023!

TezDev is back in Paris on July 21 for a full day of innovation and collaboration, bringing together voices from across the ecosystem. Join us to connect with a thriving community, get inspired, and learn from experts on the latest and most exciting developments on the Tezos blockchain!

This year’s focus is the #RiseofRollups, with experts from Nomadic Labs, Marigold, Trilitech, and Tezos Commons charting the course for Layer 2 scaling on Tezos.

Key speakers include:

Arthur Breitman, co-founder of Tezos.
Martin Lynge, VP of Gaming for Misfits Gaming and Block Born.
Vlad Horilyi, COO of Madfish Solutions.
Jean Schmitt, Lead Backend Developer at Ubisoft.

In keynote speeches, Nomadic Labs’ Hadrien Zerah will cover adoption strategies and value creation, and Tezos co-founder Arthur Breitman will present an ambitious vision for the future of rollups on Tezos.

Throughout the day there will be workshops covering data-availability solutions, how to develop Smart Rollup kernels, and the upcoming EVM support on Tezos.

Additionally, panel sessions will unpack DeFi success stories on Tezos, review the state of Layer 2s in 2023, and discuss universal interoperability.

… and, of course, there will be lots of networking opportunities in the beautiful surroundings of La Fabrique Événementielle in the heart of Paris.

Register today

Register here to secure your spot. Use promo code Tez0sEarLy1 to get a €15 discount on your ticket. The early bird offer ends on June 30, so act now!

We look forward to seeing you!

Incident Report: Mumbai 2 user-activated protocol override

2023-06-16T15:00:00+02:00

TL;DR The updated Mumbai upgrade proposal, Mumbai 2, patched a vulnerability that could potentially halt block production on the Tezos network. No funds were at risk.

On March 7th 2023, we announced Mumbai 2, a patched version of the Mumbai protocol proposal addressing a liveness vulnerability witnessed on the Ghostnet test network. In this report, we revisit the events as they occurred and the decisions taken in response to this incident. But first, we provide a short summary:

An “inconsistent hash” error was reported at Ghostnet level #2,022,087.
The issue was traced back to an inconsistency in the Tezos protocol cache, which stored different representations of certain values for a deployed smart contract, depending on node uptime.
The network was safe: at worst this issue would affect network liveness (i.e. block production would stall or stop), but it could not lead to an inconsistent ledger state.
Mumbai 2: A patched version of the Mumbai protocol proposal was published on March 7th.
Mumbai 2 activated successfully on Tezos Mainnet on block level #3,268,609.
There was no evidence of this issue being exploited (or attempted to be exploited) on Ghostnet or Tezos Mainnet.

Incident discovery

On February 21st 2023, we observed that several nodes in the Ghostnet test network reported an “inconsistent hash” error message for block proposals for level #2,022,087.

Our investigation identified an issue in the way Michelson lambdas are stored in the economic protocol’s cache. It manifested as a divergence between different nodes in the cache entry for a deployed contract, KT1Ja7Cq1HUTzmk1Qh8iERrEzp1LCjRXvqei.

This divergence was not the result of a bug in either the smart contract or the Michelson interpreter, but rather of a difference in runtime behavior. Depending on their uptime and recent activity, some nodes stored an “unoptimized”, human-readable version of the value, while others stored an optimized, byte-based representation.

Risk assessment and mitigation

This issue threatened network liveness, as it risked dividing nodes depending on the values stored in their protocol caches. If each side accounted for more than one third of the attestation power but less than two thirds, block production would halt. It’s worth highlighting that under Tenderbake consensus rules, there was no risk of a diverging network split where each fork progressed separately¹.

In this worst case scenario, the ledger state would remain safe, but not live.
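The liveness-versus-safety argument can be checked with a few lines of arithmetic. This minimal Python sketch assumes that a side makes progress only if it holds strictly more than two thirds of the attestation power, which simplifies Tenderbake's threshold rule. The helper name is hypothetical.

```python
from fractions import Fraction

QUORUM = Fraction(2, 3)  # assumption: a side needs strictly more than 2/3 of the power


def sides_that_can_progress(powers):
    """Given the attestation power on each side of a network split, return
    the indices of the sides that could still reach quorum on their own."""
    total = sum(powers)
    return [i for i, p in enumerate(powers) if Fraction(p, total) > QUORUM]


if __name__ == "__main__":
    print(sides_that_can_progress([50, 50]))  # both sides above 1/3: no side progresses
    print(sides_that_can_progress([70, 30]))  # only the 70% side can progress
```

Two sides can never both exceed two thirds, since together they would need more than 4/3 of the total. The worst outcome of a split is therefore a halt, not two diverging forks.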

Even so, it was quickly noted that, should that scenario occur, there would be a relatively straightforward path to recovery. It would suffice to reboot enough of the nodes holding the unoptimized value in their cache so that they would be updated with the optimized value, which would get the network unstuck.

In spite of this, and the lack of evidence that there was indeed a Byzantine motivation in the deployment of this contract, we still decided to treat this issue as a 0-day vulnerability, and worked on fixing this issue quickly and silently:

The vulnerability had been witnessed on Ghostnet, a public test network, and nothing prevented it from happening again on the same test network nor on Mainnet – if an exploit was derived.
The information necessary to weaponize this bug was public, and moreover didn’t require a very deep understanding of the core issue – it needed only deploying and interacting with a similar contract on Mainnet.

We had also investigated various ways of further mitigating the problem via an Octez shell update, but none were found to be satisfactory. Thus, we had no option but to address this issue at its core and modify the way the Tezos economic protocol interacts with the cache: when updating an entry, values should always be normalized to the optimized byte representation.
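A toy model illustrates the fix. In this hypothetical Python sketch, none of the classes exist in Octez. A cache value can be held either as readable text or as bytes. A digest of the cache stands in for whatever downstream result depends on the stored form. Without normalization, two nodes holding the same value in different forms disagree. Normalizing on every update makes them agree.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Readable:
    """Hypothetical "unoptimized", human-readable form of a value."""
    source: str


@dataclass(frozen=True)
class Optimized:
    """Hypothetical optimized, byte-based form of the same value."""
    encoded: bytes


Value = Union[Readable, Optimized]


def normalize(value: Value) -> Optimized:
    """Always map to the optimized form (toy encoding: UTF-8 of the source)."""
    if isinstance(value, Optimized):
        return value
    return Optimized(value.source.encode("utf-8"))


def serialize(value: Value) -> bytes:
    """Toy serialization whose output depends on which form is held."""
    if isinstance(value, Readable):
        return b"R" + value.source.encode("utf-8")
    return b"O" + value.encoded


class Cache:
    def __init__(self, normalize_on_update: bool) -> None:
        self.normalize_on_update = normalize_on_update
        self.entries: Dict[str, Value] = {}

    def update(self, key: str, value: Value) -> None:
        self.entries[key] = normalize(value) if self.normalize_on_update else value

    def digest(self) -> str:
        """Hash of everything the cache would feed downstream."""
        h = hashlib.sha256()
        for key in sorted(self.entries):
            h.update(key.encode("utf-8"))
            h.update(serialize(self.entries[key]))
        return h.hexdigest()


def two_nodes_agree(normalize_on_update: bool) -> bool:
    lam = "{ DROP ; UNIT }"  # hypothetical lambda
    node_a, node_b = Cache(normalize_on_update), Cache(normalize_on_update)
    node_a.update("contract", Readable(lam))                   # node A holds the readable form
    node_b.update("contract", Optimized(lam.encode("utf-8")))  # node B holds the byte form
    return node_a.digest() == node_b.digest()


if __name__ == "__main__":
    print("without normalization:", two_nodes_agree(False))
    print("with normalization:   ", two_nodes_agree(True))
```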

We took the decision to deploy this fix only on Mumbai, after considering the following:

While this was a high-risk threat, it was a liveness issue rather than a safety one: no funds were compromised and there was no risk of an inconsistent ledger state. At worst, block production would slow down or grind to a halt.
Even though this bug was also present in Lima on Tezos Mainnet, there was no evidence that the contract triggering the incident was deployed with an intent to attack the network, nor were there subsequent attempts to exploit it on either Ghostnet or Mainnet.
Mitigation was straightforward by rebooting affected nodes. If the situation escalated (e.g. by a repeated exploit), we would still have the option to deploy a patch for Lima on Mainnet as a user-activated upgrade.

After thorough testing and review, an updated protocol proposal, Mumbai 2, was announced on March 7th.

In hindsight, this decision seems to have been the right one: the user-activated protocol override was not controversial, and after the original Mumbai proposal successfully passed the Promotion period vote, Mumbai 2 activated at block #3,268,609 on March 29th. Moreover, we witnessed neither a repetition of the incident nor any attempt to exploit it.

Moving forward

Ideally, we would have discovered this issue on bleeding-edge test networks like Dailynet or Mondaynet.

However, this requires being able to support environments closer to Tezos Mainnet in test networks: more smart contracts and rollups deployed, increased traffic, etc. Moving forward, we are working towards increasing the capability to reproduce Mainnet conditions in our test infrastructure. It is also imperative to increase community participation in bleeding-edge test networks.

Tezos is constantly evolving, and a new protocol upgrade, Nairobi, is set to activate on Tezos Mainnet around June 23rd. Staying on top of the blockchain game requires us to move fast. Yet, we don’t buy the part of the mantra which requires breaking things – at least, not on Mainnet.

¹ Indeed, when we said that adopting Tenderbake was a trade-off between (more) safety and (less) liveness, we had scenarios like this in mind: in the event of a network split, it is not possible for both forks to advance, as at most one can reach sufficient attestation power.

Introducing Data Availability Committees

2023-05-23T16:00:00+02:00

A Data Availability Committee (DAC) is a solution for scaling the transaction throughput of Tezos Smart Rollups. In summary, a DAC enables storing transaction data for a smart rollup off-chain. Rollup nodes retrieve the transaction payloads from the DAC members and import them into their Smart Rollup virtual machines, instead of retrieving them directly from Tezos blocks, and thus circumvent the data limit imposed by Tezos block sizes. In this article we’ll take a look at the infrastructure built by TriliTech and Marigold engineers to support a DAC. But before delving into the specifics, let’s first understand a bit better how rollups work, to see how they fit in.

Smart Rollups

With the activation of the Mumbai protocol upgrade on Tezos, Smart Rollups are live on Mainnet. A Smart Rollup is an application that is executed off-chain by one or many rollup nodes that periodically submit a hash of the application state on-chain. Anyone can run a Smart Rollup node and post commitments by locking a bond of 10,000 tez. Furthermore, the Tezos economic protocol provides a mechanism to challenge and disprove fraudulent commitments. Since Smart Rollups execute outside of Layer 1, they have the potential to massively scale the amount of computation involved in running blockchain applications.

There are currently two data sources from which a Smart Rollup can import messages to process: the protocol-wide rollup inbox and the reveal data channel. We’ll cover both of them shortly, but before doing so let’s outline the requirements we’d like to satisfy:

Integrity: The data imported into a Smart Rollup must be verifiably correct, that is, it is possible to prove that it has not been tampered with.
Availability: A Smart Rollup must be able to retrieve any message addressed to it. Not being able to do so could hinder the progress of a rollup at best, and could allow a dishonest party to take control of the rollup at worst.

The Rollup Inbox

The rollup inbox is stored in the Tezos blockchain’s context. Users add messages to the rollup inbox via a dedicated manager operation. Rollup nodes then download blocks from the Tezos network to retrieve the contents of the inbox.

The protocol ensures that the data published through the rollup inbox is well-defined, that is there can be no disputes about what messages consist of or in which order they were received. Data availability is also guaranteed by being able to download blocks through a node participating in the Tezos network. However, this method sacrifices scalability since the bandwidth of the rollup inbox is limited by the size of a block (currently 500 KB) and the minimum time it takes to bake a new block (15 seconds at the time of writing). As a result, the bandwidth of the rollup inbox is restricted to around 33 KB/s, which is further shared among all active rollups.
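The bandwidth figure is simply block size divided by block time. Here is a minimal sketch. The even split among rollups in `per_rollup_share` is a hypothetical illustration, since the text only says that the bandwidth is shared.

```python
def inbox_bandwidth_kb_per_s(block_size_kb: float, block_time_s: float) -> float:
    """Upper bound on rollup inbox bandwidth: one block's worth of data per block time."""
    return block_size_kb / block_time_s


def per_rollup_share(block_size_kb: float, block_time_s: float, active_rollups: int) -> float:
    """Hypothetical even split of that bandwidth among the active rollups."""
    return inbox_bandwidth_kb_per_s(block_size_kb, block_time_s) / active_rollups


if __name__ == "__main__":
    print(f"{inbox_bandwidth_kb_per_s(500, 15):.1f} KB/s in total")
    print(f"{per_rollup_share(500, 15, 10):.1f} KB/s each with 10 rollups")
```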

The Reveal Data Channel

To overcome the scalability limitations of the rollup inbox, Smart Rollups offer an additional source for importing data: the reveal data channel. This allows rollup kernels (the application logic of a rollup) to request from the rollup node the preimage of a given hash. Such a preimage is called a page and has a size limit of 4KB. The rollup node is expected to provide pages in a fixed location known as the reveal data directory. There is no limit on the number of pages a rollup can request at each Tezos level, ensuring scalability. The integrity of the data is enforced since a preimage is requested through its hash. However, unlike the Smart Rollup’s inbox, the reveal data channel does not guarantee availability of the data. That is, there are no assurances that a rollup node will have the page corresponding to a given hash when it’s requested. This is precisely the problem addressed by Data Availability Committees.
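The reveal data channel guarantees integrity but not availability, and this can be sketched in Python under a few assumptions. Pages are keyed by a BLAKE2b 32-byte hash, though the actual hashing scheme may differ. "4KB" is read as 4,096 bytes. `RevealDataDir` and `reveal` are hypothetical names. A page whose hash does not match is rejected, but a missing page simply cannot be produced.

```python
import hashlib
from typing import Dict, Optional

PAGE_SIZE_LIMIT = 4096  # assumption: "4KB" read as 4,096 bytes


def page_hash(page: bytes) -> str:
    # Assumption: BLAKE2b with a 32-byte digest; the real scheme may differ.
    return hashlib.blake2b(page, digest_size=32).hexdigest()


class RevealDataDir:
    """Hypothetical stand-in for a rollup node's reveal data directory."""

    def __init__(self) -> None:
        self._pages: Dict[str, bytes] = {}

    def add(self, page: bytes) -> str:
        if len(page) > PAGE_SIZE_LIMIT:
            raise ValueError("page exceeds the size limit")
        h = page_hash(page)
        self._pages[h] = page
        return h

    def lookup(self, h: str) -> Optional[bytes]:
        return self._pages.get(h)


def reveal(directory: RevealDataDir, h: str) -> bytes:
    """What the kernel obtains when it asks for the preimage of h."""
    page = directory.lookup(h)
    if page is None:
        # Nothing guarantees availability: the node may simply lack the page.
        raise LookupError(f"preimage of {h} is not available")
    if page_hash(page) != h:
        # Integrity is guaranteed: a page that does not hash to h is rejected.
        raise ValueError("page does not match the requested hash")
    return page
```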

Data Availability Committees

A DAC consists of a group of parties that commit to storing copies of input data and keeping the data available upon request. Our implementation of DACs:

Provides the infrastructure needed to send, distribute and store data of arbitrary size among the DAC members.
Enables rollup nodes to download payloads from a DAC and to populate the reveal data directory.
Defines a communication pattern that can be used by Smart Rollups to import the data stored by committee members, provided that a sufficient number of committee members have signed their commitment to make a copy of the data available.

It’s important to note that the DAC stack is external to the Tezos economic protocol. That is, the Tezos Layer 1 is not aware of any DAC. The relationship between DACs and Smart Rollups is one-to-many. A DAC can serve multiple rollups while a single rollup must use at most one DAC.

Here’s a high-level description of the workflow:

The user sends the payload to a DAC (1a) and waits for a certificate with a sufficient number of signatures (1b), as determined by the rollup kernel (the application logic of a particular rollup).
The certificate, which is small in size (approximately 140 bytes), is posted to the rollup inbox as a Layer 1 message (2a) and will eventually be downloaded by the rollup node (2b).
The rollup kernel imports the certificate contained in the rollup inbox, and verifies that it contains valid signatures of several committee members. It is the responsibility of the rollup kernel to define the minimum number of signatures required for a certificate to be considered valid.
If the certificate is deemed valid, the rollup kernel will request to import the pages of the original payload to the rollup node. The rollup node downloads those pages from the DAC infrastructure (4a) before importing them into the kernel (4b).

The rollup kernel must implement the logic to determine if a DAC certificate is valid, and to request the original payload by importing the corresponding pages through the reveal data channel.
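Below is a sketch of the certificate check a kernel might perform. It is written in Python for readability, although real kernels are compiled to WASM. The certificate layout (root hash, signer bitset, signature) and all names are hypothetical. Signature verification is passed in as a function, standing in for verification of a BLS signature against the committee's tz4 keys.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Certificate:
    """Hypothetical certificate shape: the payload's root hash, a bitset of
    the committee members who signed, and a signature."""
    root_hash: bytes
    signers: int
    signature: bytes


VerifyFn = Callable[[List[bytes], bytes, bytes], bool]


def certificate_is_valid(cert: Certificate, committee: Sequence[bytes],
                         threshold: int, verify: VerifyFn) -> bool:
    """Kernel-side check: enough committee members signed, and the signature
    verifies against exactly those members' public keys."""
    if cert.signers >> len(committee):
        return False  # bits set for members that do not exist
    signer_keys = [pk for i, pk in enumerate(committee) if (cert.signers >> i) & 1]
    if len(signer_keys) < threshold:
        return False
    return verify(signer_keys, cert.root_hash, cert.signature)
```

The threshold is left as a parameter because, as described above, each rollup kernel chooses its own minimum number of signatures.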

Deploying a DAC committee

Following is a brief description of how to set up and deploy a DAC committee for serving data for a Smart Rollup. Note that the software has not yet been released, but it can still be tested on Mondaynet and Nairobinet after building Octez from source.

For reference, this blog post uses the current Tezos repository master branch at the time of writing. We plan to publish all the DAC binaries with one of the coming Octez releases.

After building from source, users will find two experimental executables:

./octez-dac-node (the DAC node)
./octez-dac-client (the DAC client)

The octez-dac-node executable can be used to set up a new committee or track an existing one. The octez-dac-client can be used to send payloads to the DAC for storage, and to retrieve certificates signed by the data availability committee.

To set up a DAC, several inter-connected instances of the DAC node must be executed. In particular, the DAC node supports three modes of operation: coordinator mode, committee member mode, and observer mode, described next.

Coordinator

The Coordinator acts as a gateway between the clients of the DAC and the other DAC nodes. It is responsible for receiving a payload, splitting it into pages of 4 KB each (the maximum size of a preimage that can be imported into a rollup), and forwarding the resulting pages to other nodes. It is also responsible for providing DAC clients with data availability certificates. A DAC node running in coordinator mode must have access to the public keys of the committee members. A coordinator can be configured with the following command:

./octez-dac-node configure as coordinator with data availability committee members $TZ4_PUBLIC_KEYS --data-dir $DATA_DIR --reveal-data-dir $REVEAL_DATA_DIR

where:

$TZ4_PUBLIC_KEYS is a list of the public keys of the committee members' BLS accounts (tz4 accounts).
$DATA_DIR is the directory containing the persisted store of the DAC node instance. This argument is optional and defaults to ~/.octez-dac-node when missing. It is suggested to give it an explicit value if multiple DAC nodes run on the same host.
$REVEAL_DATA_DIR is a separate directory where payloads are stored. This argument is also optional.

Once it has been configured, the coordinator can be run with:

./octez-dac-node run --data-dir $DATA_DIR

where $DATA_DIR is the same as for the configuration command.
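The coordinator's page-splitting step can be sketched as follows. The sketch assumes a page size of 4096 bytes and ignores any page headers, hash pages, or tree structure that the actual DAC node may add to link pages together. The function name is hypothetical.

```ocaml
(* Simplified sketch: split a payload into fixed-size pages. The 4096-byte
   page size is an assumption based on the 4 KB limit mentioned above. *)
let page_size = 4096

let split_into_pages (payload : string) : string list =
  let len = String.length payload in
  (* Walk the payload left to right, cutting at most [page_size] bytes. *)
  let rec loop pos acc =
    if pos >= len then List.rev acc
    else
      let n = min page_size (len - pos) in
      loop (pos + n) (String.sub payload pos n :: acc)
  in
  loop 0 []
```

For example, a 10,000-byte payload yields three pages of 4096, 4096, and 1808 bytes. Concatenating the pages gives back the original payload.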

Committee Member

A committee member receives pages from the coordinator and stores them on disk. Once all the pages for the original payload are received, the committee member sends a cryptographic signature to the coordinator to confirm its commitment to storing the data and making it available to external entities upon request. The coordinator collects these signatures and includes them in the data availability certificate for the payload.

To connect with the coordinator, a committee member node needs to use the following command for configuration:

./octez-dac-node configure as committee member with coordinator $COORDINATOR_RPC_ADDR and signer $TZ4_ADDRESS --data-dir $DATA_DIR --reveal-data-dir $REVEAL_DATA_DIR

where:

$COORDINATOR_RPC_ADDR is the RPC address of the coordinator node, in the format {host}:{port};
$TZ4_ADDRESS is the address of the tz4 account that will be used to sign commitments to the availability of payloads; and,
$DATA_DIR and $REVEAL_DATA_DIR serve the same function as in the coordinator node, but should have different values from the coordinator node if the two run on the same machine.

Observer

Similar to a committee member, an observer node also receives pages from the coordinator and stores them on disk. If the observer is run on the same host machine as a rollup node, and its reveal data directory is set to the same one as on the rollup node, it becomes responsible for providing the pages corresponding to the input payload. To configure an observer you can run the following command:

./octez-dac-node configure as observer with coordinator $COORDINATOR_RPC_ADDR and committee member rpc addresses $COMMITTEE_MEMBER_RPC_ADDRESSES --data-dir $DATA_DIR --reveal-data-dir $REVEAL_DATA_DIR

where:

$COMMITTEE_MEMBER_RPC_ADDRESSES is the list of the RPC addresses of the committee member nodes, in the format {host}:{port}.

Retrieving a DAC certificate

Once the DAC infrastructure has been set up, users can request the committee members to store a payload of arbitrary size via the octez-dac-client command, by running the following:

./octez-dac-client send payload to coordinator $COORDINATOR_RPC_ADDR with content $PAYLOAD --wait-for-threshold $THRESHOLD

where:

$COORDINATOR_RPC_ADDR is the address of the DAC coordinator, in the form {host_ip}:{port}
$PAYLOAD is the hex-encoded payload that committee members will store
$THRESHOLD is the minimum number of committee members that must commit to make the data available, before the command returns.

Upon executing the command, a hex-encoded data availability certificate is returned, with a size of approximately 140 bytes. This certificate can be posted to the global rollup inbox and will eventually be processed by the rollup kernel.
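Because `$PAYLOAD` must be hex-encoded before it is passed to octez-dac-client, raw data has to be converted first. A small helper such as the hypothetical `hex_encode` below would do. The lowercase output is an assumption, since the text only says that the payload is hex-encoded.

```ocaml
(* Hex-encode raw bytes: each byte becomes two lowercase hexadecimal digits.
   Lowercase output is an assumption of this sketch. *)
let hex_encode (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    s;
  Buffer.contents buf
```

For example, `hex_encode "hello"` evaluates to `"68656c6c6f"`. On OCaml 4.14 or later, `In_channel.with_open_bin path In_channel.input_all` reads a file's raw contents, which can then be passed to this helper.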

Next Steps

Moving forward, we aim to enhance DACs to ensure seamless integration with Smart Rollups. Some of our planned improvements include:

Introducing new DAC-related functions to the Rollup Kernel SDK.
Allowing committee members to commit to data availability for a set period of time.
Providing in-depth tutorials demonstrating how to write a rollup kernel that utilizes DACs.

We welcome feedback from the community on their experience with DACs to help inform our future efforts, so don’t hesitate to join the conversation at #scoru-ecosystem on the Tezos Dev Slack workspace.

What’s Cooking for Smart Rollups in Nairobi?

2023-05-11T15:00:00+02:00

TL;DR: The Nairobi protocol upgrade brings new features and improvements to Smart Rollups that will make life easier for developers.

Mumbai has only just found its way to Mainnet, and a new protocol amendment is already being proposed to the community: Nairobi.

As mentioned in the Nairobi announcement, the upgrade brings several quality of life improvements for developers of Smart Rollups. This blog post is an introduction to these new features. It is aimed at developers who are already familiar with Smart Rollups, but includes context and links to existing documentation for less versed readers.

First and foremost, it is important to understand that every new feature covered by this blog post will benefit every Smart Rollup on Mainnet, including those deployed prior to the activation of Nairobi. Hence, you can already deploy your Smart Rollup today and benefit from the new features once Nairobi reaches Mainnet (though you will need to upgrade your kernel for some of them).

This “free upgrade” in Nairobi is a good example of the advantages that come with Smart Rollups being enshrined – i.e. part of the Tezos protocol itself. It is also a first re-affirmation of Tezos core developers’ commitment to continuously maintain and improve Smart Rollups.

The Protocol_migration Internal Message

Internal messages are a particular kind of message sent to the Smart Rollups shared inbox by the Tezos protocol itself. In Mumbai, there are four kinds of internal messages:

Start_of_level is the very first message of the shared inbox in every Tezos level.
Info_per_level comes just after Start_of_level and provides some information about the Layer 1 progress (namely, the block hash and timestamp of the previous Tezos block).
Transfer messages are injected by the protocol when a Michelson smart contract performs a contract call targeting a Smart Rollup.
End_of_level is the very last message of the shared inbox for any Tezos level.

Nairobi introduces a new kind of internal message, called Protocol_migration. This message will be injected by the protocol after Info_per_level for the very first block of a given Layer 1 protocol. This way, Smart Rollup kernels can be made aware of new protocol activations. This opens new possibilities, like a kernel taking advantage of new features as soon as they are available.
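To make the message kinds above concrete, here is a hypothetical OCaml model of a kernel that records protocol activations. The constructor payloads are assumptions for illustration, for instance `Protocol_migration` carrying the protocol name as a string. The `kernel_state` record is also an assumption. Real kernels are typically written in Rust against the kernel SDK, which defines its own types.

```ocaml
(* Hypothetical model of the shared-inbox internal messages. *)
type internal_message =
  | Start_of_level
  | Info_per_level of { predecessor_hash : string; predecessor_timestamp : int64 }
  | Transfer of { payload : string; sender : string }
  | Protocol_migration of string (* assumed: name of the new protocol *)
  | End_of_level

(* Hypothetical kernel state. *)
type kernel_state = { mutable protocol : string option; mutable levels_seen : int }

let handle state = function
  | Start_of_level -> state.levels_seen <- state.levels_seen + 1
  | Protocol_migration proto ->
      (* A kernel could enable new features here once a protocol activates. *)
      state.protocol <- Some proto
  | Info_per_level _ | Transfer _ | End_of_level -> ()
```

In the model, feeding a level's messages in order (Start_of_level, Info_per_level, Protocol_migration, End_of_level) leaves the state holding the newly activated protocol's name.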

The Typed_transactions_batch Outbox Message

In Mumbai, Smart Rollup kernels can interact with Layer 1 thanks to their outbox. More precisely, they can send operation batches which can be executed on Layer 1 as soon as the commitment containing them is cemented, i.e. can no longer be refuted.

However, these batches have a potential flaw: they are untyped. This is an issue because some Micheline expressions can have several valid types. In the worst case, a kernel could be tricked into withdrawing tickets by mistake. Even though this cannot be used to forge tickets on Layer 1 (which is well protected against ticket forgery), it could allow attackers to drain a vulnerable rollup.

To prevent such a scenario, Nairobi introduces a new kind of outbox message which allows the kernel to specify the types of the batched transactions it wants to execute on Layer 1.
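The ambiguity can be illustrated with a toy type checker. A Micheline literal such as `5` is valid as a `nat`, an `int`, or a `mutez`, so an untyped output leaves its interpretation open. The fragment below is a hypothetical simplification, not the protocol's representation. It only shows how attaching a type to a transaction pins down one interpretation and rejects mismatches.

```ocaml
(* Toy fragments of Micheline expressions and Michelson types. *)
type micheline = Int of int | String of string | Pair of micheline * micheline
type ty = Nat | Int_ty | Mutez | String_ty | Pair_ty of ty * ty

(* Does expression [e] type-check against [t] in this toy fragment? *)
let rec has_type e t =
  match (e, t) with
  | Int n, (Nat | Mutez) -> n >= 0
  | Int _, Int_ty -> true
  | String _, String_ty -> true
  | Pair (a, b), Pair_ty (ta, tb) -> has_type a ta && has_type b tb
  | _ -> false

(* A typed transaction fixes the expected type of its parameters. *)
type typed_transaction = { parameters : micheline; parameters_ty : ty }

let check_typed tx = has_type tx.parameters tx.parameters_ty
```

In this toy model, `Int 5` is accepted by three distinct types, while a typed transaction checks against exactly one.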

The First WASM PVM Upgrade

In our communications about Smart Rollups we have continuously emphasized how the runtime of Smart Rollups, the WASM Proof-generating Virtual Machine (PVM), allows for installed kernels to be upgraded.

But that isn’t all. In Nairobi we demonstrate that the WASM PVM itself is also upgradable.

A kernel can inspect the version of the WASM PVM executing it by looking at its durable storage, more precisely under the key /readonly/wasm_version. The version name of the PVM released in Mumbai is 2.0.0.

This can be verified using an up-to-date octez-smart-rollup-wasm-debugger, which also provides an option to choose which WASM PVM to use when debugging (with the -p option).

Nairobi introduces the first revision of the WASM PVM, version 2.0.0-r1, which is key to the new features contained in the protocol upgrade. Every Smart Rollup originated in Mumbai will have its PVM upgraded automatically at the beginning of the first block of Nairobi, should the protocol amendment be adopted by the community.

The new version is fully backwards compatible with the interactive fraud proof system introduced in Mumbai: refutation games started before the activation of Nairobi will not be affected by the presence of these new features.
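As an illustration, a kernel could gate its use of the new host functions on the version it reads from /readonly/wasm_version. The sketch below makes several assumptions. It models the durable storage as an association list, assumes the version is stored as a plain string, and uses hypothetical helper names. A real kernel would read the key through the corresponding host function instead.

```ocaml
(* Hypothetical model: durable storage as an association list from keys to
   values. A real kernel would call a host function to read the key. *)
let read_value storage key = List.assoc_opt key storage

(* Versions known to provide the new host functions. Listing only 2.0.0-r1
   is an assumption that ignores future PVM revisions. *)
let versions_with_new_host_functions = [ "2.0.0-r1" ]

let supports_new_host_functions storage =
  match read_value storage "/readonly/wasm_version" with
  | Some v -> List.mem v versions_with_new_host_functions
  | None -> false
```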

Stack Size Limit

In version 2.0.0 of the PVM, the stack of a kernel is limited to 300 stack frames. This value was inherited from the reference WASM interpreter, but early adopters of Smart Rollups quickly found it restrictive and hard to satisfy. To this day, octez-smart-rollup-wasm-debugger still lacks a feature that reports the maximum number of frames created by kernel_run invocations. Besides, Wasmer (the standard execution engine used for Smart Rollups when no proof is required) does not limit its stack by frame count but by memory usage.

As a consequence, in version 2.0.0-r1 the stack size limit of the WASM PVM has been bumped significantly to ensure that as long as Wasmer is able to execute a kernel, the WASM PVM can execute it too. This should make the lives of kernel developers easier.

Two New Host Functions

Finally, version 2.0.0-r1 of the WASM PVM introduces two new host functions:

store_create lets you preallocate large values in the durable storage at almost no cost in ticks. It is analogous to calling truncate on an empty file in most file systems.
store_delete_value is the counterpart of store_delete, restricted to the value under a given key. The durable storage allows a given key to be both a value and a directory, and store_delete deletes both.

The following figure summarizes the difference between the two. Consider a durable storage which contains three values, at /foo, /foo/bar and /foobar. Calling store_delete with /foo as its argument also removes /foo/bar, while store_delete_value will leave /foo/bar untouched.
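The figure's example can be modelled with a plain map from keys to values. This is a simplified, hypothetical model of the durable storage, not its actual implementation. It reproduces only the deletion semantics described above, including the fact that `/foobar` is a sibling of `/foo` rather than a child.

```ocaml
module Store = Map.Make (String)

(* A key such as /foo can hold a value and also act as a directory for keys
   such as /foo/bar; /foobar shares a textual prefix but is not a child. *)
let is_under ~dir key =
  let prefix = dir ^ "/" in
  String.length key >= String.length prefix
  && String.sub key 0 (String.length prefix) = prefix

(* Models store_delete: removes the value at [key] and everything under it. *)
let store_delete key store =
  Store.filter (fun k _ -> not (k = key || is_under ~dir:key k)) store

(* Models store_delete_value: removes only the value stored at [key]. *)
let store_delete_value key store = Store.remove key store
```

Starting from the three values at /foo, /foo/bar, and /foobar, `store_delete "/foo"` leaves only /foobar. `store_delete_value "/foo"` leaves both /foo/bar and /foobar.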

Warning against store_get_nth_key

Please note that the store_get_nth_key host function is now considered harmful. While it has not been removed from the WASM PVM set of host functions, we advise kernel developers to not use it, as it has been discovered that its behavior is potentially incompatible with the refutation game.

Kernels which do not use this host function are safe, and we see no indication of this issue affecting the other host functions.

On the future of validity rollups on Tezos

2023-05-04T17:00:00+02:00

TL;DR: Validity rollups currently suffer from a “scaling trilemma” that calls for a strategic shift in how they are integrated into Tezos. Instead of offering both optimistic and validity rollups, we will combine them in a single product.

Validity rollups (aka. zk-rollups) are all the rage, and we would like to update the community on our work to bring this technology to Tezos.

As you may know from previous communications, our implementation is referred to as Epoxy, and an early version is enabled on Mondaynet. In order to test the system, we have also developed epoxy-tx, a transactional rollup capable of handling Tezos tickets.

It is the result of two years of R&D by our cryptography team and has given us great insights into the usefulness and applicability of Zero Knowledge Proofs, but also into the challenges and limitations involved. This has led us to draw some conclusions about this technology that may be surprising for some.

In short, we believe that validity technology won’t achieve general compatibility and high throughput at a reasonable cost for at least a few years. Not just on Tezos, but in general.

In this blog post we lay out our perspective on the current state of validity rollups, based on our research and conclusions. And we present an exciting strategic shift in how we aim to integrate this technology into Tezos in a way that counterbalances its intrinsic limitations.

Optimistic rollups vs. validity rollups

The scaling roadmap for Tezos, published in early 2022, focused on two rollup technologies: optimistic rollups and validity rollups.

The Tezos variant of optimistic rollups, Smart Rollups, launched with the Mumbai upgrade. With optimistic rollups, the work of rollup operators is treated as honest by default, hence the “optimistic” element. However, if there is foul play, an operator’s fraud can be refuted by another operator by posting a proof demonstrating the wrong-doing within a two-week period.

Because proofs are only produced in case of a dispute, hardware requirements for rollup operators are moderate even with high throughput and complex operations. You can run any virtual machine – on Tezos, Smart Rollups offer a WASM execution environment. And it takes just one honest operator to ensure the integrity of the rollup.

The main drawback is the two-week dispute period. Until this period has passed, transactions in the rollup can’t be considered final, and when withdrawing assets from the rollup to Layer 1, assets are only released after expiration of the dispute period. Additionally, all incoming transaction data needs to be kept publicly available for verification during this period, though solutions for this are underway.
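The finality rule of optimistic rollups can be sketched as a simple check. The model is a simplification made for illustration. It measures time in seconds and hard-codes a two-week window, whereas the protocol expresses its challenge window as a number of blocks. It also reduces refutation to a boolean flag.

```ocaml
(* Two weeks, in seconds. The protocol itself counts in blocks; seconds are
   an assumption made for readability. *)
let dispute_period_s = 14 * 24 * 60 * 60

(* A published commitment: when it was posted, and whether a refutation
   against it has succeeded. Both fields are simplifications. *)
type commitment = { published_at : int; refuted : bool }

(* A commitment can be cemented, and the withdrawals it contains executed on
   Layer 1, once the dispute period has elapsed without a successful
   refutation. *)
let can_cement ~now c =
  (not c.refuted) && now - c.published_at >= dispute_period_s
```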

Validity rollups work differently. A proof is posted with every commitment, guaranteeing the correctness of the operator’s work. The proof is small, lightweight, and easily verified by anyone. Hence, a validity rollup can be run by a single operator without concerns about the honesty of that operator’s work. Foul play is automatically rejected by the protocol itself, and no other actor needs to keep track of the rollup to guarantee security.

Additionally, the technology enables the state and operations of the rollup to (optionally) be completely hidden from the main chain. They can even be hidden from the users of the service, e.g. each user knows their balance and transactions but not anyone else’s. Only rollup operators have full access to the data of the rollup in order to produce the proof.

Third, providing a proof with every commitment means that validity rollups have instant finality. This greatly simplifies the security analysis and the implementation of applications when compared with optimistic rollups.

So, is this the solution that solves everything? Unfortunately not.

The challenges with validity rollups

Validity rollups currently have their own significant drawbacks – some of which we believe deserve a bit more attention in the general promotion of them as a scaling solution.

These drawbacks are a couple of interconnected challenges that we, and everyone else working on this technology, are currently faced with.

Challenge #1: Proofs are very expensive. Creating SNARK proofs – which must be done with every commitment – requires a considerable amount of computing power. This makes running validity rollups very expensive and puts a limit on the complexity of operations that can be expressed (see challenge #2). For example, support of standard cryptography, such as ed25519 signatures (used for ‘tz1’ accounts) and Blake2b hashes, is currently difficult to achieve at high throughput. New cryptographic primitives for use in validity rollups are being developed, but this again requires that existing infrastructure is adapted.

In our efforts to address this challenge, we have developed aPlonK, an advanced proving system tailored to Tezos, which uses a novel technique for efficient proof aggregation. Essentially, it reduces the proof size and verification time when multiple statements are proven in a batch. It contains a language to describe circuits, and a prover that enables proof generation to be distributed over a cluster of machines to achieve, in theory, unlimited scalability, the limiting factor being the data-center bill.

But regardless of computing power (and funds), even our best prover, fully optimized and parallelized, cannot go faster than an optimistic rollup operator, which by default does not produce any proof, but simply executes the program and posts a commitment. As long as there is at least one honest participant continuously validating rollup activity, security is ensured and in a much more cost-efficient way than using SNARK proofs.

Challenge #2: Limited compatibility. The ability to execute arbitrary code is a crucial feature for a Layer 2 solution. This not only enables compatibility with existing smart contracts programmed for blockchain ecosystems (such as EVM and Michelson), but also enables the Layer 2 solution to become a distributed backend for more conventional development environments.

However, validity rollups rely on so-called Succinct Non-interactive Arguments of Knowledge (SNARKs) for their proofs. For this to work, all statements must be translated into circuits – a set of mathematical equations that the proving systems can process. In effect, every smart contract or dApp must be ported into what is essentially another programming language.

This also brings us back to challenge #1: current proving technology doesn’t support sufficiently advanced circuits to interpret the execution of existing smart contracts directly, if cost of computation is to be kept reasonable. Hence, general compatibility is currently prohibitively expensive at the throughput necessary to make validity rollups relevant as scaling solutions. Again, this is true not just for Epoxy, but for validity tech in general.

Challenge #3: Fragmentation in tooling. Due to the use of circuits for the proofs, a whole new stack of tooling, SDKs, wallet integration and other kinds of infrastructure must be created. While this in itself may not be the biggest challenge, having parallel stacks for Layer 1, optimistic rollups and validity rollups introduces a level of fragmentation which we believe will become problematic. We believe there are better ways, which we will go into further below.

A (current) validity trilemma

The above can be illustrated as a “trilemma” of validity rollups. The three desired properties of validity rollups are:

Compatibility – existing dApps and programs can be executed in the rollup
Reasonable cost – requirements for computing power are realistic
High throughput – proofs can cover many operations and still be generated fast enough for blockchain purposes

Smart Rollups give you all three properties – at the cost of longer finality – while the current state of validity rollup tech allows you to only pick two.

If you want high compatibility and reasonable cost, throughput will be too low for practical purposes. If you go for high compatibility and high throughput, it will become incredibly expensive to run a rollup – we’re talking massive data centers. And if you prioritize reasonable cost of operation and high throughput, the complexity of operations that can be processed will be severely limited. For example, epoxy-tx, our rollup for Tezos tickets transactions, has high throughput and low cost relative to other validity designs, but is limited to, well, transactions.

Of course, a lot of work is currently going into reducing the required computational resources. However, our own experiments and extensive review of current research into this by various projects in the industry lead us to conclude that this trilemma will remain relevant for at least a couple of years – possibly longer.

A brief overview of currently available validity rollups and their approach:

Optimized for throughput (and cost): ZkSync Lite, dYdX, Loopring, and Immutable X follow the same philosophy as Tezos’ Epoxy, relying on application-specific ZK circuits to get good scalability (on the order of 1000 transactions per second). Building those circuits, however, is an expensive and delicate task, and compatibility is limited.
Optimized for compatibility (and cost): ZkVM projects, such as ZkSync Era and Polygon ZkEVM, go for a compatibility-first approach. Even though this technology is promising, we’ve yet to see them achieve high throughput.

So, what about Epoxy?

The question is then: what does this conclusion mean for our work?

To answer that question, it is important to make clear what we have built. We have referred to our exploration into validity rollups on Tezos as Epoxy, but it actually consists of two parts: a prover and a connecting framework.

Our prover is called aPlonK and includes a language for describing circuits. The prover is the ‘engine’ of a validity rollup and by far its most important element, and it is the part we have spent the most resources developing. What we call Epoxy is the framework, the glue, that connects this proving system to the Tezos blockchain.

Based on our conclusions presented above, we have decided to go a different route than launching Epoxy as a product competing with Smart Rollups. Not because we don’t believe in a bright future for validity rollups – far from it. In fact, we are excited to be able to give a sneak peek into a strategic shift that we believe will benefit everyone using rollups on Tezos.

We are essentially taking the Epoxy prototype apart and re-purposing the engine and other parts in what we believe is a revolutionary new product.

The hybrid approach

In Smart Rollups, we already have a high-compatibility and high-throughput solution at a low cost, but with longer finality. Tooling is developing rapidly and soon vast Smart Rollup-based infrastructure will be built out on Tezos.

Launching validity rollups as a product competing with Smart Rollups, but with different tooling and the above mentioned trade-offs, is not the best approach.

What is the better approach? Upgrading Smart Rollups with validity tech! Think instant finality for a higher fee. Or standard transactions with short finality, and longer finality for more complex operations. Or new confidentiality features.

This ‘hybrid’ approach has several advantages:

Cementing Smart Rollup longevity: Users of Smart Rollups can be confident that their toolchains and infrastructure will remain relevant for the foreseeable future.
Complementary features: Validity tech can be implemented where it makes sense and used depending on the needs and priorities of a given rollup. Optimistic rollup tech covers everything else.
Gradual implementation: As validity tech matures, the balance between optimistic and validity elements can be adjusted through kernel upgrades after new features are introduced in Tezos protocol upgrades. In other words, existing Smart Rollups can evolve as validity technology evolves.

Make no mistake: In the long term, we see a bright future for validity rollups. But our analysis tells us that the technology just isn’t there yet for them to be competitive with optimistic rollups. Again, this is not specific to Epoxy, but true of validity rollups in general. We believe that a gradual implementation, resting on the solid foundation of Smart Rollups, is the right solution for Tezos for the years to come.

The hybrid design for Smart Rollups is ongoing R&D, but we will soon be able to release more details. We look forward to embarking on this exciting journey in cooperation with the Tezos ecosystem and community!

Labelled type parameters in OCaml

2023-04-25T14:00:00+02:00

The OCaml language allows for labelled parameters in function definitions and calls. This is a pleasant feature of the language which can be used to make the code self-documenting. For example, consider the difference between the blit functions in the Stdlib.String and Stdlib.StringLabels modules:

val blit : string -> int -> bytes -> int -> int -> unit
val blit : src:string -> src_pos:int -> dst:bytes -> dst_pos:int -> len:int -> unit

In the second version, the meaning of each parameter is easier to remember and their order is not a source of confusion. And when the function is called, the order of the parameters can be changed:

blit ~len:1024 ~src:s ~src_pos:0 ~dst:b ~dst_pos:0
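Here is a runnable version of the comparison, with concrete values for `s` and `b` chosen for illustration. Both hypothetical helper functions copy the same five bytes. Only the labelled call can list its arguments in an arbitrary order.

```ocaml
(* Positional version: the reader must remember which int is which. *)
let copy_word_positional () =
  let s = "hello, world" in
  let b = Bytes.make 5 '.' in
  String.blit s 7 b 0 5;
  Bytes.to_string b

(* Labelled version: the arguments may be given in any order. *)
let copy_word_labelled () =
  let s = "hello, world" in
  let b = Bytes.make 5 '.' in
  StringLabels.blit ~dst_pos:0 ~len:5 ~src:s ~dst:b ~src_pos:7;
  Bytes.to_string b
```

Both functions return `"world"`.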

The OCaml language also supports type constructors, but their type parameters cannot be labelled. The meaning and order of each parameter can then become a source of confusion, e.g., with the format types from the OCaml Stdlib:

type ('a, 'b, 'c, 'd, 'e, 'f) format6 = ('a, 'b, 'c, 'd, 'e, 'f) CamlinternalFormatBasics.format6
type ('a, 'b, 'c, 'd) format4 = ('a, 'b, 'c, 'c, 'c, 'd) format6
type ('a, 'b, 'c) format = ('a, 'b, 'c, 'c) format4

Well, it turns out there is a way to label type parameters, and we are about to show you how!

Orthogonal features

One of the strengths of the OCaml language is how separate orthogonal features tend to combine predictably.

It turns out OCaml’s type system has some orthogonal features that, taken together, let you label your type parameters.

Type constructors with type parameters: Type constructors allow you to parametrise a whole family of types over some other types. For example, the Stdlib.Result module defines a type:
```
type ('a, 'e) t = Ok of 'a | Error of 'e
```
We call 'a and 'e the type parameters and t the type constructor because, given actual types for the parameters (say int and string), the application (int, string) t constructs an actual type.

Note that the parameter name chosen to represent the error type uses the mnemonic 'e. However, when you encounter an instance such as (int, string) t, there is no indication of which parameter is which. (The Result.t type is common enough throughout the OCaml ecosystem that this is not generally a problem; we are just setting up a small, manageable example.)
Object types: types which describe objects. These types list the publicly available methods of an object. Importantly for our purpose, they list those methods by name, and two object types are equivalent regardless of the order in which the methods appear.
```
type o = < get_x: int; get_y: int >
```
Type parameter constraints: these narrow down the possible instantiations of a type parameter. This is a rarely used feature, but it is handy on occasion (e.g., when passing a module that is monomorphic through and through to a functor that expects a polymorphic one).
```
type 'a t = 'a list
    constraint 'a = int
```

Now putting all three features together, we can define labelled type parameters:

type 'p res = ('a, 'e) Result.t
  constraint 'p = < ok: 'a; error: 'e >

This is a bit convoluted, but essentially the type constructor res is a one-parameter alias for the two-parameter type constructor Result.t, and the type parameter of res is constrained to be an object type with two methods. The method names serve as our parameter labels:

let catch_exceptions
  : (unit -> 'a) -> < ok: 'a; error: string > res
  = fun f ->
  match f () with
  | v -> Ok v
  | exception exc -> Error (Printexc.to_string exc)
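The self-contained version below adds a hypothetical `parse_int` helper to show the labels at work. Its annotation lists `error` before `ok`, yet it matches the type returned by `catch_exceptions`, because object types do not depend on method order. The `exception` pattern in the match catches anything raised by `f`.

```ocaml
type 'p res = ('a, 'e) Result.t
  constraint 'p = < ok : 'a; error : 'e >

let catch_exceptions : (unit -> 'a) -> < ok : 'a; error : string > res =
 fun f ->
  match f () with
  | v -> Ok v
  | exception exc -> Error (Printexc.to_string exc)

(* The labels are written in the opposite order here; object types ignore
   method order, so this is still (int, string) Result.t. *)
let parse_int (s : string) : < error : string; ok : int > res =
  catch_exceptions (fun () -> int_of_string s)
```

`parse_int "42"` evaluates to `Ok 42`. `parse_int "abc"` evaluates to an `Error` carrying the printed `Failure` exception.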

A real-world use-case

The example above is somewhat artificial: the type constructor Result.t has only two parameters and is widespread in the OCaml ecosystem. But the idea can be used for real-world use-cases, and in fact it is!

In the Octez Tezos suite, one of the type constructors has six parameters. The type constructor is for RPC services, and the parameters correspond to:

the type of allowed HTTP methods,
the type of parameters in the URL (think /block/<level>/operations),
the type of parameters in the URL but differently this time because of reasons irrelevant to the blog post at hand,
the type of parameters in the query section of the URL (think /version?format=<format>),
the type of parameters in the body of the request, and
the type of the returned value.

So roughly, there is a type

type ('meth, 'prefix, 'params, 'query, 'input, 'output) service = …

The details are not too important for this blog post; the important part is that this number of type parameters is cumbersome. To make the code more readable (not more concise, but more readable), we introduce a type alias using the labelled type parameter technique above:

type 'rpc service =
  ('meth, 'prefix, 'params, 'query, 'input, 'output) Tezos_rpc.Service.service
  constraint
    'rpc =
    < meth : 'meth
    ; prefix : 'prefix
    ; params : 'params
    ; query : 'query
    ; input : 'input
    ; output : 'output >

With this alias in place, we write concrete types using labels rather than positions:

let post_commitment :
    < meth : [`POST]
    ; input : Cryptobox.slot
    ; output : Cryptobox.commitment
    ; prefix : unit
    ; params : unit
    ; query : unit >
    service =
  …
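The technique can also be tried without the Octez libraries. In the sketch below, `raw_service` is a hypothetical stand-in for `Tezos_rpc.Service.service`, with six phantom type parameters and only a path as its content. The hypothetical `describe_get` fixes the `meth` label and leaves the other five open. That would be awkward to express with positional parameters.

```ocaml
(* Stand-in for a six-parameter service type: the parameters are phantom
   types and a service only carries its path. *)
type ('meth, 'prefix, 'params, 'query, 'input, 'output) raw_service = {
  path : string;
}

type 'rpc service =
  ('meth, 'prefix, 'params, 'query, 'input, 'output) raw_service
  constraint
    'rpc =
    < meth : 'meth
    ; prefix : 'prefix
    ; params : 'params
    ; query : 'query
    ; input : 'input
    ; output : 'output >

(* Accepts any GET service; the other five parameters are left open. *)
let describe_get
    (s : < meth : [ `GET ]; prefix : _; params : _; query : _; input : _; output : _ > service)
    =
  "GET " ^ s.path

(* Labels may be listed in any order. *)
let version :
    < output : string; meth : [ `GET ]; input : unit; query : unit; params : unit; prefix : unit >
    service =
  { path = "/version" }
```

A service whose `meth` is ``[`POST]`` would be rejected by `describe_get` at compile time.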

Without an alias

The combination of features presented above only works as an alias of an existing type constructor with unlabelled parameters. This is the actual use-case we had in the Octez project, because the service type comes from an external library.

But you can eschew the alias using GADTs. Or rather GADT syntax. For example:

type _ either =
  | Left : 'a -> <left: 'a; right: 'b> either
  | Right : 'b -> <left: 'a; right: 'b> either
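With the GADT version, the constructors carry the labels directly. The hypothetical `to_result` below converts this `either` into the standard `result` type. Because the labels are object methods, an annotation can list `right` before `left` and still denote the same type.

```ocaml
type _ either =
  | Left : 'a -> < left : 'a; right : 'b > either
  | Right : 'b -> < left : 'a; right : 'b > either

(* Convert to the standard result type; the labels say which side is which. *)
let to_result : type a b. < left : a; right : b > either -> (a, b) result =
  function
  | Left x -> Ok x
  | Right y -> Error y
```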

The importance of orthogonal features

It could be tempting to suggest that labelled type parameters should be added to OCaml as a native feature. This could help with the syntax, with the compiler error messages, and in a few other ways. However, adding such a feature to the language increases the maintenance cost of the compiler: the feature needs to be added, tested, documented, and maintained through releases. That takes time, which is better spent adding features that are unavailable even by combining existing ones.

OCaml’s strength is not measured as the sum of its features but as ways in which these features combine together.

Announcing Nairobi, Tezos’ 14th protocol upgrade proposal

2023-04-13T14:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, & Functori.

Following the successful activation of the Mumbai protocol upgrade on March 29th, we are happy to unveil our latest Tezos protocol proposal, Nairobi.

As usual, Nairobi’s “true name” is its hash, PtNairobiyssHuh87hEhfVBGCVrK3WnS8Z2FT4ymB5tAa4r1nQf.

The Nairobi protocol proposal contains several updates and improvements to the Tezos economic protocol, the most prominent being:

An up to 8x increase in TPS for certain manager operations (including tez transfers and smart contract calls).
Improved gas model for signatures to reflect the cost of different curves.
Renaming endorsements to attestations.
Faster propagation of pre-attestations to reach consensus earlier.
New host functions for Smart Rollups, and new internal Layer 2 messages allowing rollup kernels to be aware of Tezos protocol upgrades.

In this article, we give a preview of the improvements described above. More in-depth descriptions can be found in Nairobi’s technical documentation. This protocol proposal includes further minor improvements and other changes; a complete list is provided in the protocol proposal’s Changelog.

Increased TPS thanks to a finer-grained gas model

Currently, the gas cost of signature verification for manager operations is a flat-tax constant, which over-approximates the required resources. With Nairobi, their cost depends on the cryptographic curve used and the size of the payload.

For instance, the gas cost of checking the signature of a tz1 implicit account will be significantly smaller than that of a tz3 account, in order to better reflect the difference in computational costs associated with signature verification when using the ed25519 (tz1) and p256 (tz3) cryptographic curves. See this entry in Nairobi’s Changelog for further detail (including breaking changes).

Thanks to this finer-grained accounting of the gas costs for signature verification, the maximum number of manager operations (transactions, smart contract calls, Smart Rollups maintenance operations, etc.) that can be included in a block has increased by a factor of about 8.

Consequently, we observe a similar 8x increase in performance for certain common operations, such as basic tez transactions between tz1 or tz2 accounts¹. For smart contract calls the number will be lower, and operations involving ‘tz3’ accounts will see no increase due to their more computationally demanding signature scheme.
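The shape of such a curve-dependent model can be sketched as follows. This is an illustration under loudly stated assumptions: the curve names match the account prefixes mentioned above, but every gas constant is a made-up placeholder, not a value from Nairobi's actual gas model, and the function names are hypothetical.

```ocaml
(* Hypothetical sketch of a signature-check cost that depends on the
   curve and on the payload size. All constants are PLACEHOLDERS, not
   the values used by the Nairobi protocol. *)
type curve =
  | Ed25519 (* tz1 *)
  | Secp256k1 (* tz2 *)
  | P256 (* tz3 *)

(* Placeholder fixed cost per curve, in gas units. *)
let base_gas = function Ed25519 -> 60 | Secp256k1 -> 80 | P256 -> 900

(* Placeholder cost per byte of signed payload (hashing it). *)
let gas_per_byte = 2

let signature_check_gas curve ~payload_bytes =
  base_gas curve + (gas_per_byte * payload_bytes)

(* With a fixed per-block gas budget, cheaper checks let more
   operations fit in a block. *)
let max_ops ~block_gas ~gas_per_op = block_gas / gas_per_op
```

Because a block has a fixed gas budget, lowering the per-operation cost for cheaper curves directly raises how many such operations fit, which is where the throughput gain for tz1 and tz2 transfers comes from.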

Renaming Endorsements to Attestations

The term “endorsement” might incorrectly convey the idea that bakers (validators) approve of the contents of a block – that is, of the transactions and other operations chosen by the payload producer – even though in reality they are merely attesting that a valid block has been produced. Moreover, ongoing work towards a Data Availability Layer will change the behavior of consensus operations in a way that makes the term “endorsements” less fitting.

We thus propose renaming endorsements to “attestations”, a more precise term which better reflects the past, present, and potentially future semantics of these consensus operations.

With this protocol proposal, we begin this process by launching the deprecation of the endorsing_rights RPC endpoint, introducing attestation_rights as a replacement. This symbolic step is not breaking, as both old and new names would be available if Nairobi is activated on Mainnet. In future protocol proposals, we intend to go beyond symbolic steps though, and may have to introduce breaking changes in API names – which will be communicated with ample notice.
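For tooling authors, the rename mostly means choosing a different RPC path. The helper below is a hypothetical sketch; it assumes the usual /chains/main/blocks/head/helpers/ prefix and a level query parameter, so check both against the RPC reference of the node you target.

```ocaml
(* Hypothetical helper building the rights RPC path for a given level.
   Under Nairobi both names are served; the old one is deprecated. *)
let rights_path ?(deprecated_name = false) ~level () =
  let endpoint =
    if deprecated_name then "endorsing_rights" (* deprecated name *)
    else "attestation_rights" (* replacement introduced by Nairobi *)
  in
  Printf.sprintf "/chains/main/blocks/head/helpers/%s?level=%d" endpoint level
```

Defaulting to the new name means tools keep working once the old endpoint is eventually removed, while the optional flag eases a gradual migration.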

Fine-tuned validation pipelining for faster consensus

In order to keep Tezos consensus both live and fast, it is paramount to ensure consensus operations are validated and propagated swiftly, especially after Mumbai halved block time from 30 to 15 seconds on Mainnet. To this end, this protocol proposal fine-tunes its Mempool module, which implements the business logic used by the Octez node to validate operations in the mempool.

With Nairobi, an Octez node’s prevalidator would initialize mempool validation faster, and it would moreover accept consensus operations that are slightly in the future or branched in cousin blocks². As a result, the node can immediately propagate pre-attestations for blocks which have just been validated but which have not yet been applied. Consequently, bakers participating in a consensus committee would be able to communicate their votes in the pre-attestation voting phase faster, which should lead to reaching consensus earlier on block proposals.
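The “cousin block” relation defined in the footnote can be made concrete on a toy block tree. This is an illustrative sketch only: the tree representation, the block labels, and the function names are hypothetical, and we follow the footnote's example in requiring the two blocks to have different parents.

```ocaml
(* Toy block tree: maps each block label to its predecessor's label.
   Labels are made up; they are not real Tezos block hashes. *)
module SMap = Map.Make (String)

type tree = string SMap.t

let parent (t : tree) h = SMap.find_opt h t

let grandparent (t : tree) h = Option.bind (parent t h) (parent t)

(* Two distinct blocks with different parents that share a grandparent
   (the "grandfather block") are cousins. *)
let are_cousins (t : tree) a b =
  a <> b
  && parent t a <> parent t b
  && (match (grandparent t a, grandparent t b) with
     | Some ga, Some gb -> String.equal ga gb
     | _ -> false)
```

For example, if two proposals P0 and P1 for different rounds both extend block G, then a successor of P0 and a successor of P1 are cousins.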

Continuous Improvement of Smart Rollups

Tezos is a constantly evolving blockchain, so it is only fitting that Smart Rollups, its general-purpose Layer 2 solution, follow the same path.

This new protocol amendment, if deployed on Mainnet, will enable additional host functions for all Smart Rollups. Nairobi also introduces a new kind of Layer 2 internal message, which broadcasts the activation of new Tezos protocol upgrades on Layer 1 to all originated rollups.

Buckle up, ‘Nairobinet’ test network will launch soon

Note that if Nairobi is voted in by the community, upgrading to Octez v17.0 (or later) will be necessary for participating in consensus.

In order to allow the community to start testing the Nairobi proposal as soon as possible, a release candidate for Octez v17.0 will be published in the coming days, and a dedicated protocol test network Nairobinet is also scheduled to begin soon. More information about this test network will be available on https://teztnets.xyz/.

Read more about Tezos test networks here, and don’t hesitate to reach out in the Tezos Developer Slack or in the Tezos Discord if you need help getting started.

¹ These observations arise from our experiments using the TPS benchmark tool distributed with the Tezos protocol and the Octez suite. See this earlier blog post for further detail on how to replicate this experiment.
² Unpacking the jargon: operations in Tezos are branched on an earlier block hash (that is, they reference it). By “slightly in the future”, we mean consensus operations not branched on the predecessor of the target block (also known as the “grandfather block”) but rather on other close descendants. Then, a “cousin block” is a block that shares a grandfather with another block, usually the successor of a block proposal for a different round than the current mempool head.

Smart Rollups and 15 second blocks: Mumbai upgrade is live!

2023-03-29T19:00:00+02:00

On March 29 2023 16:40:29 UTC, the Tezos blockchain successfully upgraded by activating the Mumbai proposal at block #3,268,609.

This 13th Tezos protocol upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

Included in Mumbai:

Smart Rollups are live on Mainnet: With Smart Rollups, anyone can deploy decentralized WebAssembly applications with dedicated computational and networking resources.
Block time reduced to 15 seconds: With improved pipelining validation fully deployed, block propagation times are significantly reduced, allowing for minimal block time to be halved to 15 seconds.
Ticket transfers between user accounts: In Mumbai, Tezos tickets can be transferred between user accounts (aka. implicit accounts) and not just to/from smart contracts and rollups.
RPCs for ticket balances: Two new RPC endpoints were added to improve the visibility of ticket ownership. all_ticket_balances returns a complete list of tickets owned by a given contract. And ticket_balance returns the given-contract’s balance of the ticket with specified ticketer, content type, and content.
New Michelson operations: Michelson opcodes AND, OR, XOR, NOT, LSL and LSR were extended to support logical operations on bytes, similar to those on nat. And an opcode was added to convert between bytes and nat values.
A peek at validity rollups: Our exploration into validity rollup (aka zk-rollup) technology, Epoxy, arrives on the Mondaynet testnet.

For more details, see the Mumbai preview post.

A deeper technical description can be found in the protocol proposal’s technical documentation, and a complete list of changes is provided in Mumbai’s changelog.

Get started with Smart Rollups

Smart Rollups are the backbone of Tezos’ scaling strategy and hence instrumental in reaching our goal of 1 million transactions per second in 2023.

But Smart Rollups are much more than that. They enable an entirely new way of building decentralized applications on Tezos, using any programming language that compiles to WASM, such as Rust, C/C++, Go and Python.

This scalability and flexibility make Smart Rollups the perfect solution for any organization looking for a future-proof blockchain solution. Don’t hesitate to reach out to Nomadic Labs’ adoption and support team if you have questions regarding this.

For builders and tinkerers we recommend the Tezos Developer Slack or the Tezos Discord if you need help getting started, as well as these community resources:

Smart Rollups (Tezos Technical Documentation)
Setting up a Tezos Smart Rollup in 5 steps (Tezos Commons)
How to Write a Rollup Kernel (Marigold)
How to Deploy a Kernel Bigger Than 24kB (Marigold)

We are excited to be part of this major step in Tezos’ evolution, and look forward to seeing Smart Rollups deployed in all varieties over the coming months!

Liveness vulnerability found: A patched Mumbai proposal is available

2023-03-07T16:00:00+01:00

TL;DR: We are proposing a patched Mumbai protocol upgrade to address a liveness vulnerability.

We have discovered a vulnerability which could affect the Tezos network’s liveness but not its safety. In other words, the vulnerability could slow or halt the network, but not put funds at risk.

We have investigated various ways of mitigating the problem via an Octez shell update (everything but the protocol), but none were found to be satisfactory, and a patch for the protocol itself is required.

We have taken the step of producing a patched version of the Mumbai protocol upgrade proposal that addresses this vulnerability. The hash of Mumbai 2, the patched version of Mumbai, is: PtMumbai2TmsJHNGRkD8v8YDbtao7BLUC3wjASn1inAKLFCjaH1.

As the Mumbai proposal is currently in the Promotion period of the governance cycle, there are two scenarios going forward:

User-activated protocol override: The community votes Yay to Mumbai in the current voting period. All nodes running v16.0 of Octez will activate the patched version, PtMumbai2, instead of PtMumbaii. This is what happened with the patched Babylon, Edo, and Hangzhou proposals.
Restarting the governance cycle: The community votes Nay to Mumbai in the current voting period, and the governance process “reverts” to a new proposal period. A patched Mumbai protocol will then be proposed and must undergo a full governance cycle. This should last about 2.5 months assuming approval in every voting period.

A new release candidate, Octez v16.0~rc3, has just been published, and v16.0 is expected shortly after. In addition to the Mumbai 2 protocol, this release candidate also includes performance improvements in the baker that we consider necessary for the block time reduction coming with Mumbai.

Important: Octez v16.0 will be required for Mumbai

As stated in a previous blogpost, upgrading to v16.0 (or later) is necessary to participate in consensus once Mumbai is activated, independent from the discussion above.

We remind the community that bakers representing at least 2/3 of the total stake must be running the same protocol version for the chain to stay live, due to the requirements of the Tenderbake consensus algorithm.

The Mumbainet test network will be restarted shortly to instead use the Mumbai 2 protocol. Joining the new rebooted test network will require Octez v16.0~rc3, or the upcoming stable release. Please check this page for further details.

We acknowledge that this is coming at a late stage in the current voting process, and we ask for the community’s understanding and cooperation in addressing the situation. A full write-up on the vulnerability will be published when it is safe to do so.

We are confident that the measures outlined in this post will sufficiently address the present issue and look forward to unleashing the potential of the ground-breaking features contained in Mumbai!

Introducing new Octez node logs for better UX

2023-03-02T12:00:00+01:00

TL;DR: Future versions of the Octez node will only output essential information, while a more detailed log is written to disk in the background.

Core developers and users of the Octez software suite know it well: the Octez node is pretty verbose.

For every block, the node yells out a whole series of steps taken:

Jan  6 14:33:10.879 - validator.block: prechecked block BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.900 - validator.block: block BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E validated
Jan  6 14:33:10.900 - validator.block:   Request pushed on 2023-01-06T13:33:10.853-00:00, treated in 153us, completed in 45.531ms
Jan  6 14:33:10.920 - prevalidator: switching to new head BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.920 - prevalidator:   Request pushed on 2023-01-06T13:33:10.900-00:00, treated in 308us, completed in 19.333ms
Jan  6 14:33:10.920 - validator.chain: Update current head to BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.920 - validator.chain:   (level 462824, timestamp 2023-01-06T13:33:10-00:00, fitness
Jan  6 14:33:10.920 - validator.chain:   02::00070fe8::::ffffffff::00000000), same branch

To improve the user experience, we’ve decided to make a significant change to these logs. In the next major version of Octez (17.0), the node will output ONE SINGLE LINE for each new block, plus one line for each injected operation and each baked block (if running a baker). But don’t worry, all the nitty-gritty technical information remains available.

Read on to learn what we did and why.

What a node log should do

Logs are a way to communicate information on the state of some software. In the context of the Octez node, the goal is to keep the user aware of the interaction between local Octez software and the Tezos blockchain.

Some examples of what could be going on:

The chain is running, node is synced
The node is stuck or bootstrapping
New blocks are being discovered
Connecting or disconnecting to/from peers
Operations are being received
Various errors
…

When everything goes well, a new head appears regularly with every new block. Or the node reports receiving new operations. During these peaceful times there is really no need to be verbose.

Still, at every new block, the node is yelling the eight lines in the introduction of this post. These lines contain what to most users must seem like dark magic incantations, like prevalidator and Request pushed or timing in micro-seconds.

Is that interesting for a user? We don’t believe it is. The only pertinent information for a user is:

Hey, it's going well, the level is now 462824 and the most recent head 
has the hash BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E.

Wait, I want the juicy details!

On the other hand, when issues arise, we want users to be able to communicate errors to relevant people, such as the developer team and incident managers.

In fact, it’s not only the errors we need, but also the state of the node during the last few hours or days. This information can sometimes be even more important than the error itself.

To make the logs less of a pain to read for a user, we need to remove information. But if we want logs to stay useful enough for a developer to fix issues after an incident, we need to keep this information somewhere, and even add more!

So, we’ve split logs into two channels:

User logs: Simple and clean, except for exceptional behaviors or errors. They are shown on the standard output.
Internal logs: Verbose, but silently written and stored to disk for a limited number of days to avoid excessive use of disk space.

For internal logs, we added a new node configuration option (as specified in the documentation), daily-logs, which creates a new log file every day and keeps it for a user-defined number of days.

Users can find these log files at the path <node-data-dir>/daily-logs/. By default, the daily log files will be kept for seven days, in which case the directory shouldn’t exceed 500MB.
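The two-channel split and the retention policy can be sketched in a few lines. This is not the Octez logging code: the severity levels, the threshold separating user-facing events, and the representation of daily files by a day number are all assumptions made for illustration.

```ocaml
(* Hypothetical severity levels; the node's real event levels may differ. *)
type level = Debug | Info | Notice | Warning | Error

(* Constant constructors compare in declaration order, so this keeps
   Notice, Warning and Error for the user-facing channel. *)
let shown_to_user lvl = compare lvl Notice >= 0

(* Every event goes to the internal log; only notable ones reach stdout. *)
let route lvl msg ~(user : string -> unit) ~(internal : string -> unit) =
  internal msg;
  if shown_to_user lvl then user msg

(* Retention for daily files identified by a day number: keep the last
   [keep_days] days (today included) and return the older ones to delete. *)
let files_to_delete ~today ~keep_days files =
  List.filter (fun (day, _file) -> day <= today - keep_days) files
```

With the default of seven days and the post's figure of at most 500MB for the directory, each daily file stays around 70MB on average.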

Stripping down the node output

Now that the important debugging logs are stored elsewhere, we can safely remove information from the user logs. At each block, we now have only one line:

Jan  6 14:33:10.879: head is now BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E (462824)

In short, we

Removed the part of the log (like prevalidator, validator.chain) that references a location in the code
Removed a few redundant messages saying the same thing: a new block has been processed and added to the chain
Split some messages containing too much information, such as timestamps or fitness, which can be found elsewhere anyway.
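For illustration, the new user-facing line can be produced by a formatter like the hypothetical head_line below. It only reproduces the output format shown above; it is not the node's actual logging code.

```ocaml
(* Hypothetical formatter for the one-line "new head" message. *)
let head_line ~timestamp ~hash ~level =
  Printf.sprintf "%s: head is now %s (%d)" timestamp hash level

let () =
  print_endline
    (head_line ~timestamp:"Jan  6 14:33:10.879"
       ~hash:"BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E"
       ~level:462824)
```

Everything else the old eight lines carried (worker names, request timings, fitness) is left to the internal daily logs.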

Most of this work is expected to appear in Octez suite v17.

What’s next?

The next step is mostly a refactoring of the node logging engine, and hopefully removing a deprecated library, lwt_log. This work will allow us to have colors in the logs to emphasize specific information, like when your baker baked a block, or when an error is shown.

Announcing Tezos’ 13th protocol upgrade proposal, Mumbai

2023-01-17T15:00:00+01:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

We’re happy to announce that Mumbai, our next Tezos protocol proposal, is ready. As usual, Mumbai’s ‘true’ name is its hash, which is PtMumbaiiFFEGbew1rRjzSPyzRbA51Tm3RVZL5suHPxSZYDhCEc.

The main features of this protocol upgrade proposal are:

Smart rollups are activated: Smart rollups are fully operational on Mainnet offering a powerful scaling solution. Anyone can deploy decentralized WebAssembly applications with dedicated computational and networking resources.
Epoxy makes its first appearance: Validity Rollups (aka ZK-rollups) arrive on the Mondaynet testnet. Epoxy allows for instant finality thanks to SNARK proofs of validity.
Block time reduced to 15 seconds: With improved pipelining fully deployed, block propagation times are significantly reduced – this allows for the minimal block time to be cut in half to 15 seconds!
Ticket transfers between accounts: In Mumbai, tickets can be transferred between user accounts (aka. implicit accounts) and not just to/from smart contracts and rollups.
RPCs for ticket balances: Two new RPC endpoints were added to improve the visibility of ticket ownership. /all_ticket_balances returns a complete list of tickets owned by a given contract. And /ticket_balance returns the given-contract’s balance of the ticket with specified ticketer, content type, and content.
New Michelson operations: Michelson opcodes AND, OR, XOR, NOT, LSL and LSR were extended to support logical operations on bytes, similar to those on nat. And an opcode was added to convert between bytes and nat.

To learn more about Mumbai’s contents, see our full preview post.

In addition to these exciting new features and steady progress, the Mumbai proposal disables Transaction Optimistic Rollups (TORUs) on Mainnet. TORUs were the first optimistic rollup implementation on Mainnet, enabled with Jakarta in mid-2022, and the first step of the scalability roadmap presented last March.

Transaction rollups were always intended to be a temporary solution, as clearly indicated by the sunset they were released with. Now that Smart Rollups are reaching Tezos Mainnet, it makes little sense to keep transaction rollups enabled, as this functionality can easily be implemented through Smart Rollups. It is important to understand that, if Mumbai is voted in by the community, the transaction rollup subsystem will be completely disabled as soon as the protocol becomes active on Mainnet. This means tickets deposited on transaction rollups will be lost forever, and operators will not be able to reclaim their ꜩ 10,000 bond. Currently, no transaction rollups have been originated on Mainnet.

The changelog provides a detailed list of changes, and a general technical overview of Mumbai can be found in the protocol proposal’s technical documentation.

Note that, if Mumbai is voted in by the community, upgrading to Octez v16.0 (or later) will be necessary for participating in consensus. A release candidate for Octez v16.0 will be published in the coming days, and a dedicated protocol test network Mumbainet is also scheduled to begin soon. More information about the test network will be available on https://teztnets.xyz/.

Smart Rollups and Epoxy testers wanted

First, Smart Rollups will be active on Mumbainet soon. Functioning rollups implemented in Rust are currently running on Mondaynet. We highly encourage ecosystem participants to experiment and build with these rollups. In roughly 2 months they are expected to activate on Mainnet, assuming Mumbai is voted in. Smart Rollups are instrumental in our efforts to reach 1 million TPS.

Second, Epoxy, Tezos’ validity rollup (aka ZK-rollup) solution, makes its way onto Mondaynet. It is not, however, part of the Mumbai protocol upgrade, as more time needs to be spent on validation, testing, and integration with ecosystem tools before it can be activated on Mainnet. We ask ecosystem participants to start experimenting with Epoxy on Mondaynet.

Broad testing and feedback from the ecosystem is invaluable in our efforts to minimize the risk of undetected issues upon Mainnet activation.

Read more about Tezos testnets here, and don’t hesitate to reach out in the Tezos Developer Slack or in the Tezos Discord if you need help getting started.

Anyone interested in getting started with building a smart rollup node can reach out to contact@nomadic-labs.com.

Incident Report: slow consensus on block #3,019,851

2023-01-05T18:00:00+01:00

TL;DR: A low-probability scenario caused a temporary slowdown of the network. This is a known possibility and intended behavior of the Tenderbake consensus algorithm.

On January 1st, around 14:13 UTC, the Tezos Mainnet experienced a major slowdown: ~52 minutes elapsed between block levels #3,019,850 and #3,019,852, due to block #3,019,851 requiring 18 rounds to reach consensus. Network endorsing power recovered immediately on the following block, producing a slower round 0 block. After #3,019,853, block time stabilized to the expected 30 second time between blocks.

Following the incident, we’ve examined how this slowdown came to happen, and why the network failed to reach a quorum on earlier rounds.

Today, we are confident that no bug in either the Tenderbake consensus algorithm or its implementation by the Octez baker was behind this incident. The results of our investigation point towards a particularly unlikely, yet known, scenario.

Below, we will detail how it unfolded. But first, a short summary:

Block production did not stop, Tenderbake kept running: it took 43 minutes for the network to agree on the next blockchain head, as a result of #3,019,851 being finalized at round 17. But proposals were made in all previous rounds at that level. Remember that Tenderbake entails a trade-off, choosing network safety over network liveness.
Network was perfectly safe: the behavior observed fits within a known slow consensus scenario for Tenderbake: the endorsing stake was split roughly evenly between bakers which had locked their endorsement for the Round 0 block and those who had not, with neither group reaching the 67% endorsing power requirement.
Liveness was slow, but kicking: it took 17 extra rounds for one of the locked bakers to re-propose the locked payload. It immediately got sufficient endorsements for the chain to move on.
A very late block proposal was the trigger: the scenario was triggered by a block proposal arriving unusually late within the expected time slot. The reasons for this delay are still under investigation. However, it was less pressing than assessing whether the Octez baker implemented Tenderbake correctly.
An unlikely yet known (and tested!) scenario: the particulars of this slow consensus scenario (detailed below) make it a quite unfortunate incident. Moreover, similar scenarios were already thoroughly tested both in the Octez baker’s test suite and using Functori’s Mitten simulation toolkit.
Not related to baker crashes: we had first observed reports of baking nodes crashing due to a lack of available storage space. This working hypothesis, mentioned in our early messages on Sunday, turned out to be a false lead. It is not, a priori, connected to the slow consensus scenario that unfolded.

So, what happened?

In focus: the consensus committee for #3,019,851

To understand what happened, we dug into the committee for #3,019,851, and analyzed its interactions before it reached a quorum. We relied on Teztale, a live consensus-introspection tool, currently being developed at Nomadic Labs. We started building Teztale to inform our Incident Response Team in situations like this.

As an aside, the description that follows is fairly technical and requires some understanding of how Tenderbake works. If you are not familiar with the concepts or the terminology, we highlight a few resources:

The Lima protocol’s Consensus documentation entry.
A look ahead to Tenderbake (blog post).
Tenderbake’s Baker as a StateMachine (blog post).
A Solution to Dynamic Repeated Consensus for Blockchains (research article).

Round 0

Let’s plunge into the consensus committee for #3,019,851. The histogram above tracks the reception of the block proposal at round 0, and its associated preendorsements and endorsements. Here, the X axis tracks the elapsed time since round rights were enabled, and the Y axis plots the number of delegates seen (pre)endorsing this proposal.

The rights to propose a block at round 0 at level #3,019,851 were enabled at 2023-01-01T14:14:29Z – that corresponds to the origin of the plot. The baker holding those rights proposed a payload, whose block hash was BLZL7AtZKP21eQKEESymTCJYvfV5fpuh6ZZQXffTwg4QstzfxoD.

However, this block proposal was not seen (and validated!) by our node running Teztale, or any of the nodes we have access to, until 14:14:54Z – denoted by the red vertical bar in the histogram. That is, the block payload could only be (pre)endorsed 25 seconds after rights were enabled. This is awkward, as most bakers usually produce and propagate blocks significantly faster than this.

Given a 30-second round duration, Round 0 was scheduled to finish at 14:14:59Z. This left only 6 seconds for the network to reach consensus on this block. This clearly did not happen: preendorsements and endorsements for this round arrived too late to our node, as the histogram shows.

Let’s focus on the (pre)endorsements in this round. In this particular committee, 229 bakers (out of the 240 that had endorsing rights) managed to inject a preendorsement within this period. Then, 122 bakers (accounting for 43.83% of the endorsing power) observed in time that a pre-quorum was reached, and proceeded to endorse that block – witnessed by the violet bars in the histogram.

This action “locked” them to the endorsed payload content for all successor rounds at level #3,019,851. This means they would only endorse re-proposals of the original block payload.

Given that 43.8% of endorsing power is obviously more than 33% but far from the required 67%, the rules of Tenderbake specify that the only way to move forward in this scenario is when one of the “locked” bakers re-proposes the payload of this round’s proposal.

That is, when it is their turn to do so.
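The arithmetic behind this deadlock is short. The sketch below takes the post's 67% requirement at face value together with the shares reported above; the names are ours.

```ocaml
(* Endorsing-power shares (in %) reported above for round 0 of #3,019,851. *)
let locked = 43.83
let unlocked = 100. -. locked

(* The post's 67% requirement, taken at face value. *)
let quorum = 67.

let reaches_quorum share = share >= quorum

(* The locked bakers hold more than 100 - 67 = 33% of the power, so the
   unlocked bakers cannot reach the quorum on a fresh payload by themselves. *)
let unlocked_blocked = locked > 100. -. quorum
```

Neither group reaches the quorum alone, and the locked group only endorses re-proposals of the round 0 payload, so progress requires a locked baker to hold the proposing rights.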

Rounds 1 – 16

Here is when Fate kicks in. Unfortunately, all bakers with proposing rights between Rounds 1 and 16 were part of the 104 bakers (amounting to 56.17% of the stake) that had not locked on the payload of round 0!

The probability of arriving at this scenario, given the stake distribution in play, was 0.5617^16 = 0.0098191%. That is, in roughly 1 out of 10000 possible committee configurations, the right to propose a block in the next 16 rounds would be given to a baker which had not locked its commitment to the payload proposed at Round 0. And this was the case.

Indeed, in every round between Round 1 and Round 16, the designated delegate proposed a block. Yet none of them were locked, so each made a fresh proposal with a new payload instead.

They were all doomed to fail: their proposals would never be endorsed by the locked delegates and it was not possible to reach the minimal endorsement requirements without them.

The network was both safe and live. Yet, the dice had rolled a bad hand, and there was regrettably nothing to do but wait for fairer winds.
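The odds quoted above can be recomputed directly. The sketch assumes, as the figure does, that each round's proposer is an unlocked baker independently, with probability equal to the unlocked share of the stake (56.17%).

```ocaml
(* Share of the stake held by bakers that had NOT locked on the round 0
   payload, as reported above. *)
let p_unlocked = 0.5617

(* Rounds 1 to 16 inclusive. *)
let rounds = 16

(* Assumption: proposers are drawn independently, proportionally to stake. *)
let p_all_unlocked = p_unlocked ** float_of_int rounds

let () =
  Printf.printf "probability: %.7f%%, i.e. about 1 in %.0f\n"
    (100. *. p_all_unlocked) (1. /. p_all_unlocked)
```

This evaluates to roughly 0.00982%, or about 1 in 10,000, in line with the figure given above.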

Round 17 and beyond

Finally, it was the turn for tz1gUNyn3hmnEWqkusWPzxRaon1cs7ndWh7h to propose a block at Round 17. This delegate was one of the delegates locked on the payload proposed at Round 0, and correctly re-proposed it. The resulting block proposal, BLfpYc4HvRpyacP6f75hpTgRJjzzpfyQRaxzNQys5gByEdr6T24, was produced with timestamp 2023-01-01T14:56:59Z, as expected.

Most delegates in the committee endorsed the block proposal in time, as seen in the histogram above, resulting in 6,982 out of 7,000 endorsed slots. Having reached consensus on the head of the chain, it became #3,019,851 and the network finally moved on.

Note that the next block #3,019,852 was produced at Round 0. Still, the “time between blocks” #3,019,852 and #3,019,851 is around 5 minutes because the timestamp of the former corresponds to the end of the last round of the latter, namely the end of Round 17. Its successor, another Round 0 block, stabilized time between blocks to 30 seconds.
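These timestamps follow from Tenderbake's growing round durations. The sketch assumes Lima's Mainnet parameters, a 30-second minimal block delay plus 15 seconds per additional round, so that round r lasts 30 + 15r seconds. The constant names mirror protocol parameters, but the functions are ours.

```ocaml
(* Assumed Lima Mainnet parameters: round r lasts
   minimal_block_delay + r * delay_increment_per_round seconds. *)
let minimal_block_delay = 30
let delay_increment_per_round = 15

let round_duration r = minimal_block_delay + (delay_increment_per_round * r)

(* Seconds elapsed from the start of round 0 to the start of round r. *)
let round_start_offset r =
  let rec go acc i =
    if i >= r then acc else go (acc + round_duration i) (i + 1)
  in
  go 0 0

(* Format a number of seconds since midnight as hh:mm:ss. *)
let hms s = Printf.sprintf "%02d:%02d:%02d" (s / 3600) (s / 60 mod 60) (s mod 60)

(* Round 0 rights for #3,019,851 were enabled at 14:14:29 UTC. *)
let round0_start = (14 * 3600) + (14 * 60) + 29

let () =
  Printf.printf "round 17 starts at %s and lasts %d s\n"
    (hms (round0_start + round_start_offset 17))
    (round_duration 17)
```

With these parameters, round 17 starts 2,550 seconds after round 0, at 14:56:59, matching the timestamp of the re-proposed block, and lasts 285 seconds (4 min 45 s), which accounts for the roughly 5-minute gap before #3,019,852.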

Lessons and take-aways

The foremost conclusion is that what we experienced is just Tenderbake at work.

We are relieved to see the implementation of Tenderbake working as expected in a slow consensus scenario. Still, we regret having experienced the worst slowdown in block propagation since the adoption of Tenderbake on Tezos Mainnet:

#2244609, the first block of the Tenderbake era, required 15 rounds to reach consensus, as some delegates needed time to adapt to the migration.
#2244610, the second block in the Tenderbake era, required 13 rounds.
#2490369, the first block of Jakarta, required 12 rounds, because of an expected longer migration caused by patching contracts during stitching.

Looking ahead, we consider possible optimizations targeting scenarios like the one described above. For instance, Tenderbake could be less aggressive when increasing round duration. Another possibility is to allow bakers which are not locked to still consider late preendorsements for older payloads, instead of just discarding them. This would allow them to re-propose a block consistent with the potentially locked payload, leading to faster agreement.

When we first implemented Tenderbake, we gave more priority to the safety of the implementation over heuristics to improve liveness. Today, as we propose to further reduce the time between blocks, these optimizations become more relevant, and we look to implement them in the near future.

Lima, the latest Tezos upgrade, is LIVE!

2022-12-19T03:30:00+01:00

On December 19 2022 01:57:59 UTC, the Tezos blockchain successfully upgraded by activating the Lima proposal at block #2,981,889.

This 12th Tezos protocol upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, & Functori.

Lima introduces:

More pipelining: Speeding up propagation of operations and blocks to enable higher Layer 1 throughput on Tezos. This work is also the foundation for reducing block time to 15 seconds in the upcoming Mumbai proposal.
Consensus keys: This much requested feature lets bakers change their key for signing blocks and consensus operations without changing the baker’s public address.
Improvements to Tickets: Ticket ownership updates are now part of transaction receipts, which helps indexers keep track of tickets. Also, zero-amount tickets will be deprecated.
Ghostnet fixes: Lima fixes two problems that arose during the migration from Jakarta to Kathmandu on the permanent ‘Ghostnet’ test network.
Liquidity Baking sunset removed: The subsidy can be turned off with the moving-average toggle introduced with the Jakarta upgrade.
Temporary Timelock deprecation: Due to a discovered vulnerability, origination of new contracts using Timelock is disabled while a safer mechanism is developed.

To learn more about Lima’s contents, see our preview post.

The changelog provides a detailed list of changes, and a general technical overview of Lima can be found in the protocol proposal’s technical documentation.

Octez v15.1

With Lima activated, we highly recommend upgrading to Octez v15.1, as this version fixes a bug in the bootstrap pipeline that would make v15.0 nodes apply blocks without prechecking them first.

As always, we encourage the ecosystem to participate in testnets. Read more about Tezos testnets here, and don’t hesitate to reach out in the Tezos Developer Slack or in the Tezos Discord if you need help getting started.

Pruning the context — and other seasonal activities

2022-12-12T18:00:00+01:00

TL;DR The latest Octez major release introduced a long-awaited feature: context pruning for rolling and full nodes’ storage — aka the garbage collection of the context storage. Context pruning significantly reduces disk usage of Octez nodes, lowering the barrier of entry, and improving the quality of life of node operators and bakers.

Where does all this data come from?

A desirable property of any blockchain is that the chain grows. Most blockchains share the notion of a “genesis block” — a Big Bang event where the first block was produced, and information is added continuously and (hopefully) forever: transactions, smart contract calls, NFT mints, etc. Ultimately, this ever-growing amount of data has to be stored by node operators.

In Tezos parlance, we speak of context when referring to the ledger state. For each blockchain level, the context reflects the account balances, the code and storage of smart contracts and other global data pertinent to the business logic of the Tezos economic protocol: information about stake and consensus rights, tickets, global constants, and more, that are updated by the block operations of a given level.

Moreover, the context storage in the Octez node is versioned: you can think of the context store as a key-value store mapping each block level (or each block hash) to the full context at that level. Of course, the real implementation is smarter (and more efficient). The Octez node relies on the Irmin storage system, developed and maintained by Tarides, which compresses and optimizes this history. Irmin works much like Git, storing diffs between different versions of the same entry rather than recording each version of an entry in full¹. Still, this takes some space.
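To make the idea of a versioned context concrete, here is a minimal Python sketch of a content-addressed store in which each level's context points at hashed values, so a value that does not change between levels is stored only once. It is a toy under stated assumptions, not Irmin's API or on-disk layout: the `VersionedStore` class, JSON-serialized values and BLAKE2b hashing are our own illustrative choices, and unlike Irmin it copies a flat index per level instead of sharing tree nodes.

```python
import hashlib
import json


class VersionedStore:
    """Toy versioned key-value store (hypothetical, not Irmin's API).

    Values are content-addressed: identical values share one stored copy,
    so data unchanged between levels is not duplicated.
    """

    def __init__(self):
        self.objects = {}  # value hash -> serialized value, stored once
        self.commits = {}  # level -> {key: value hash}

    def _put(self, value):
        data = json.dumps(value, sort_keys=True).encode()
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        self.objects.setdefault(digest, data)  # reuse an existing copy if any
        return digest

    def commit(self, level, updates, parent=None):
        # Start from the parent's index (a flat copy, unlike Irmin's shared
        # trees) and overwrite only the keys that changed at this level.
        index = dict(self.commits.get(parent, {}))
        for key, value in updates.items():
            index[key] = self._put(value)
        self.commits[level] = index

    def get(self, level, key):
        return json.loads(self.objects[self.commits[level][key]])


store = VersionedStore()
store.commit(1, {"tz1alice": 100, "tz1bob": 50})
store.commit(2, {"tz1alice": 90}, parent=1)
print(store.get(1, "tz1alice"), store.get(2, "tz1alice"), store.get(2, "tz1bob"))  # prints: 100 90 50
print(len(store.objects))  # prints: 3
```

Committing two balances at level 1 and updating only one of them at level 2 leaves three stored values rather than four, because the unchanged balance is shared by both versions.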

Thus, since its origin block, almost 3 million blocks ago², the size of the Tezos storage has never stopped growing.

Why do we need to keep all this data around? There are two parts to the answer. First, it is paramount that enough participants do: for the sake of verifiability, we need to be able to replay (that is, to reproduce) the complete history of the Tezos blockchain from its genesis block to a desired point in time, and check that the resulting context is the expected one. This task entails being able to find the contents of any given block at all times, so that we can apply them on top of the predecessor block’s context.
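The replay requirement can be illustrated with a small sketch: starting from a genesis context, apply every block in order and compare a hash of the resulting context with the expected one. Everything here is hypothetical (transfers as the only operation kind, dictionaries as contexts, a JSON-plus-BLAKE2b context hash); the point is only that deterministic application makes the final context checkable by anyone holding the block contents.

```python
import hashlib
import json


def apply_block(context, block):
    """Hypothetical block application: every operation is a (src, dst, amount) transfer."""
    new_context = dict(context)  # contexts are never mutated in place
    for src, dst, amount in block:
        if new_context.get(src, 0) < amount:
            raise ValueError(f"insufficient balance for {src}")
        new_context[src] = new_context.get(src, 0) - amount
        new_context[dst] = new_context.get(dst, 0) + amount
    return new_context


def context_hash(context):
    data = json.dumps(context, sort_keys=True).encode()
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def replay(genesis_context, blocks, expected_hash):
    """Replay every block from genesis and check the resulting context hash."""
    context = genesis_context
    for block in blocks:
        context = apply_block(context, block)
    return context_hash(context) == expected_hash, context


genesis = {"tz1alice": 100}
blocks = [[("tz1alice", "tz1bob", 30)], [("tz1bob", "tz1carol", 10)]]
expected = {"tz1alice": 70, "tz1bob": 20, "tz1carol": 10}
ok, final = replay(genesis, blocks, context_hash(expected))
print(ok)  # prints: True
```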

Then, storing the results of those computations (the context is computed and updated each time a node validates a new block) makes sense for efficiency purposes as well: compute once, read from disk later, as those values are immutable³. For instance, a baker should not need to replay the chain each time to find out, say, what its delegators’ balances were during recent cycles in order to pay delegation rewards.

Do we all need to keep this data around? Not really: not every node operator requires access to the complete history of the chain. A baker doesn’t need to know all of the chain’s history to bake new blocks, and we rely instead on indexers and block explorers to query operation details from any given block in the past.

To cater to these different needs, the Octez implementation provides different storage modes, known as history modes: rolling, full, and archive. Nodes running in archive history mode keep all data (and metadata) from all operations since genesis, while nodes running in rolling or full history modes keep less information, and hence have a leaner footprint. Indexers and block explorers rely on archive nodes, whereas bakers are recommended to rely on full nodes.

Even for users running the lighter rolling and full history modes, keeping multiple versions of the context imposes a significant disk space overhead.

Starting with Octez v15.0, we incorporate the first version of a long-awaited feature, dubbed context pruning⁴. This recurrent maintenance process drastically reduces the disk footprint of the context store for full and rolling nodes, and the overall free space required to operate an Octez Tezos node. Moreover, users should notice an overall improvement in node performance as well!

This feature is a result of our ongoing collaboration with Tarides. The bulk of the work entailed working under the hood in Irmin, to implement the necessary changes in the overall architecture so that the Octez node could profit from this new functionality.

Tarides has published a detailed, in-depth article about the technical aspects of context pruning, garbage collection, and the intricacies of its implementation within Irmin.

Here, we focus instead on a higher-level description of this feature and how it fits into the overall Octez architecture, and report on our early experiments measuring the impact of the new feature on our working Mainnet nodes.

What’s context pruning then?

As we mentioned earlier, there are three different history modes in Tezos. And here today we focus on two: rolling and full. These two history modes aim to keep only relatively recent chain data, and reduce the disk footprint of the node’s data directory. The full history mode stores the minimal data necessary to replay the chain from genesis, but doesn’t snapshot context information from older levels, like older balances or staking rights. The rolling mode is even more aggressive, as it keeps information from just a limited number of cycles — on Mainnet, the default is the last 6.
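As a rough model of what each history mode keeps, the sketch below computes the oldest level whose block data, and whose context, a node retains. It assumes a Mainnet-like cycle of 8,192 blocks and the default window of 6 cycles mentioned above, and it simplifies the window to a sliding range of levels rather than whole cycles; the function and field names are hypothetical and do not correspond to Octez code.

```python
from dataclasses import dataclass


@dataclass
class Retention:
    oldest_block_level: int    # oldest level whose block data is kept
    oldest_context_level: int  # oldest level whose context is kept


def retention(mode, head_level, blocks_per_cycle=8192, kept_cycles=6):
    """Simplified model of what each history mode keeps (hypothetical)."""
    # Sliding window of the most recent kept_cycles * blocks_per_cycle levels.
    window_start = max(0, head_level - kept_cycles * blocks_per_cycle + 1)
    if mode == "archive":
        return Retention(0, 0)  # everything since genesis
    if mode == "full":
        # Block data back to genesis, enough to replay the chain,
        # but contexts only for recent levels.
        return Retention(0, window_start)
    if mode == "rolling":
        return Retention(window_start, window_start)  # recent data only
    raise ValueError(f"unknown history mode: {mode}")


for mode in ("archive", "full", "rolling"):
    print(mode, retention(mode, head_level=100_000))
```

The full and rolling modes only differ in how far back block data goes; both keep contexts for the recent window only, which is exactly the data that context pruning targets.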

Still, the way in which irmin-pack (the Irmin back-end used by and conceived for Tezos) was designed did not allow nodes to fully reap the rewards of running leaner history modes: it was not possible to “delete” the information from dead, unreachable objects (the data from blocks deemed too old to be kept in the store) and reclaim valuable free space without prohibitive performance trade-offs.

The new Irmin version addresses this limitation by re-structuring the design of the context store.

How does context pruning work? Context pruning is, in essence, an automated garbage collector (GC) mechanism. It executes periodically, and asynchronously, with regard to other Irmin tasks. The GC determines a window of data to be kept, and iterates over the data on disk to extract the information from blocks belonging to that window which needs to be preserved — those from reachable objects only. When all the reachable data is extracted, the storage will switch to that newly garbage collected data, and drop the former one, which contains unreachable objects.
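The mark-and-copy idea described above can be sketched in a few lines of Python. This is a toy model, assuming a content-addressed store given as plain dictionaries (`objects`, `children` and `commits` are hypothetical names); Irmin's actual garbage collector runs asynchronously and works on its own on-disk format, as detailed in Tarides' post.

```python
def prune(objects, children, commits, window):
    """Toy mark-and-copy GC over a content-addressed store (hypothetical).

    objects:  dict hash -> payload
    children: dict hash -> list of hashes the object points to
    commits:  dict level -> root hash of that level's context
    window:   set of levels whose contexts must be kept
    """
    # Mark: walk everything reachable from the roots inside the window.
    reachable = set()
    stack = [commits[level] for level in window if level in commits]
    while stack:
        h = stack.pop()
        if h in reachable:
            continue
        reachable.add(h)
        stack.extend(children.get(h, []))
    # Copy: build a fresh store holding only the reachable objects.
    new_objects = {h: objects[h] for h in reachable}
    new_children = {h: children[h] for h in reachable if h in children}
    new_commits = {lvl: root for lvl, root in commits.items() if lvl in window}
    # The caller then switches to the new store and drops the old one.
    return new_objects, new_children, new_commits


objects = {"r1": "root@1", "r2": "root@2", "a": "alice=100", "a2": "alice=90", "b": "bob=50"}
children = {"r1": ["a", "b"], "r2": ["a2", "b"]}
commits = {1: "r1", 2: "r2"}
new_objects, new_children, new_commits = prune(objects, children, commits, window={2})
print(sorted(new_objects))  # prints: ['a2', 'b', 'r2']
```

Note that the shared object `b` survives because level 2 still reaches it, while `a` and `r1` are only reachable from the dropped level 1.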

As a result, once context pruning kicks in, the disk footprint of the context grows with its actual live size rather than with the age of the data directory, eliminating the need for manual maintenance operations on long-running Octez nodes. In addition, the reduced footprint makes for a better-performing Octez node, as reads and writes become cheaper.

We refer to Tarides’ blog post for further details on how the process works. In the following section, we revisit the impact the new context storage has on Mainnet nodes.

Putting pruning in context

As excited as you are to test this new feature, we ran a limited experiment to assess its impact on our Mainnet nodes.

The experiment consisted of importing the same rolling Mainnet snapshot on two similar machines:

one running a vanilla Octez v15.0 node — that is, with context pruning enabled; and,
one machine running a v15.0 Octez node with context pruning manually (and artificially) disabled — so, no pruning for you.

The purpose of the experiment was to simulate running both nodes under the same conditions and throughout a period of time (validating around 40 cycles in a row) where we could witness the effect of several calls to the context pruning mechanism.

Figure 1 below plots the disk footprint of context storage on both machines; the raw data snapshot can be explored interactively here.

Figure 1: Contrasting context storage footprints on Octez v15.0 nodes: context pruning enabled vs. context pruning disabled.

We see that the disk footprint of the stored context on the “artificially unpruned” node quickly rose to ~140 GB. On the vanilla Octez v15.0 node with context pruning enabled, the size of the context storage on disk is significantly smaller: it oscillates between ~30 GB and ~60 GB. The spikes are caused by the duplication needed before each context pruning call, and they will be lowered in upcoming Octez releases. Still, that’s a 2x to 4x reduction in disk footprint!

How to profit from context pruning

To benefit from context pruning, there are a few simple steps to follow. As pruning requires a particular storage format introduced in Octez v13.0 (storage version 0.0.8), you must make sure that your node uses this compatible storage format.

The easiest way to make sure of that is to import a fresh snapshot using an Octez v15.1 node. You will find detailed instructions on how to import snapshots in the online docs.

If the data directory was imported recently enough, that is by a node running Octez v13.0 or later, then you can skip the step to import the snapshot. However, the first context pruning operation for nodes that have been running on the same data directory for a long time can take a while, and use a significant amount of memory. To avoid this, it is also recommended to import a fresh snapshot.

As context pruning is enabled by default, there is not much to do. If you have configured your node as usual, you should be all set. If something is amiss, though, a warning message like this will be printed when your node starts:

... : garbage collection is not fully enabled on this data directory:
context cannot be garbage collected. Please read the documentation or
import a snapshot to enable it.

That’s pretty much all you need to care about. Context pruning being an internal maintenance procedure, we have not (yet) considered scraping specific telemetry to monitor it, other than, of course, witnessing its effect by measuring the full size as in Figure 1 above⁵. This maintenance procedure is executed at the start of each cycle. Note that, if you have imported a fresh snapshot, context pruning will only occur after the node has validated blocks from 6 complete cycles, i.e. around 3 weeks⁶.
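The triggering condition described above (run at the start of a cycle, and only once 6 complete cycles have been validated) can be summarized as a predicate. This is a hypothetical sketch, not the Octez code path; the 5 + 1 cycle split follows the PRESERVED_CYCLES footnote.

```python
PRESERVED_CYCLES = 5  # protocol constant quoted in the footnote
EXTRA_CYCLES = 1      # additional cycle kept by default by the node
KEPT_CYCLES = PRESERVED_CYCLES + EXTRA_CYCLES


def should_prune(is_new_cycle, complete_cycles_validated, history_mode):
    """Hypothetical trigger: prune at a cycle boundary once the window is full."""
    if history_mode not in ("full", "rolling"):
        return False  # archive nodes keep every context
    return is_new_cycle and complete_cycles_validated >= KEPT_CYCLES


print(should_prune(True, 5, "rolling"), should_prune(True, 6, "rolling"))  # prints: False True
```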

Looking ahead

While a significant part of our collective focus in 2022 has been on developing and deploying Tezos (vertical) scalability solutions, like transaction or smart rollups, that doesn’t mean we have forgotten about making the Tezos Layer 1 faster and more secure. Together with earlier improvements to the Octez node’s storage layer, context pruning makes the node leaner, demanding fewer resources and improving quality of life for node operators and bakers. It also improves performance: a lighter context implies faster reads and writes.

Further upgrades to Irmin and context pruning. As Tarides’ blog post announces, this is a first deployment of the context pruning mechanism, and more improvements to Irmin are already in development for future Octez releases. These target:

Avoiding duplication of the context before the actual pruning takes place (effectively keeping the footprint around 40GB).
Improving the overall performance of the mechanism to be less resource intensive.
Adding a “lower layer” to archive nodes, in which old data will be kept. The live data will remain in a lighter “upper layer”, allowing an overall performance increase.

Other changes in Octez v15.x In addition to the context pruning feature described in this article, the latest Octez major release includes other minor changes. It is also worth pointing out that it was recently superseded by a new minor release, Octez v15.1, to correct a small-but-critical bug in the bootstrap pipeline. We invite you to take a careful look at the Changelog before upgrading your node. Another breaking change is a complete renaming of all executables: tezos-client is now named octez-client, tezos-baker-014-PtKathma becomes octez-baker-PtKathma, etc. This was an overdue change, to reflect that the Octez implementation is one of possibly many Tezos node implementations, as described when we announced the Octez name. In a previous blog post, we expanded on this rationale, and discussed the backward-compatibility mitigation provided in the release.

Getting ready for Lima! Note that the Lima protocol will activate on Tezos Mainnet at block #2,981,889, which is currently expected in the early hours of December 19th CET. Octez v15.1 will be the minimal compatible version required to participate in consensus. As such, we recommend you take sufficient time to test this new Octez version to avoid surprises closer to the activation date.

This is, of course, a very simplified version of the story. Tezos uses a particular Irmin back-end, irmin-pack, developed by Tarides and catering for our special needs. If you are interested in learning more about it — this blog post is a good entrypoint. ↩
Or a bit over 50 months, 4 years and change, in human time. But, as Tezos reminds us at each protocol upgrade activation… the natural unit of time in a blockchain is the block height, or level. ↩
Oh dear, if they were not! ↩
Also known as garbage collection of the context storage. However, we prefer the term pruning here because, unlike good old memory-management garbage collection, the data in question is definitely not garbage, even if the technique is indeed a garbage collection mechanism, as we discuss later on. ↩
If Figure 1 caught your attention and you want to know more on how to monitor your Octez Tezos node, we recommend you add this recent article to your reading list. ↩
For consensus reasons, the Octez node must keep the information from at least 5 cycles, as prescribed by the PRESERVED_CYCLES protocol constant. To ease bakers’ daily operations, the node keeps one additional cycle by default — that is, a total of 6 cycles. Until this 6-cycle window is reached, no context pruning will be performed. ↩

A faster and more scalable Tezos: A sneak peek at the Mumbai proposal

2022-12-08T18:00:00+01:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

With the Lima upgrade set to activate on Dec 19, at block 2,981,889, let’s look ahead to what’s next for Tezos. For our 13th upgrade proposal to come in 2023, we will journey from Lima in the Pacific to the gateway of India in the Arabian Sea: Mumbai.

There are 3 major innovations coming in the Mumbai upgrade proposal, in addition to other improvements. We recently wrote about how smart rollups are coming to Tezos, and with Mumbai they officially arrive. With the capabilities that Smart Rollups provide, the rails to reach 1 million TPS are in place. In addition, we are introducing Epoxy—our validity rollup (aka ZK-rollup) solution — on testnets. And lastly, Tezos is getting faster: we are reducing the block time to 15 seconds!

Smart Rollups activated

The activation of smart rollups in Mumbai is the result of an intense year-long undertaking. Smart rollups offer a powerful scaling solution for Tezos, allowing anyone to deploy decentralized WebAssembly applications with dedicated computational and networking resources. Smart rollups have been available on the Mondaynet testnet since the Lima proposal, but have undergone a number of significant improvements in their design and implementation during the development of Mumbai.

First, while in Lima each rollup had a dedicated inbox, in Mumbai there is only one global inbox shared by all rollups, which enables more use cases. Second, decentralized applications running on rollups can now retrieve information from a “reveal-data channel”, enabling access to data sources external to the inbox or even external to the Tezos blockchain. And third, smart rollups are forward compatible with the Data-Availability Layer (DAL), a data sharding feature that is in the works for improving scalability.

Epoxy: Validity Rollups on Mondaynet testnet

Validity rollups (also known as ZK-rollups) are coming to Tezos starting with Mumbai (in addition to Smart Rollups). They will be introduced gradually over multiple upgrades. Our implementation of validity rollups is named Epoxy, and it allows for instant finality thanks to SNARK proofs of validity, as opposed to the “refutation game” used by optimistic rollups. Epoxy allows applications to be built on Tezos that do not necessarily rely on Tezos’ Layer 1 to publish and distribute data, but can instead implement their own data availability layers. This will enable privacy-preserving solutions that require keeping sensitive information (reified as part of their state or transactions) private from the public Layer 1.

Epoxy will soon be available on Mondaynet, and it allows Tezos’ economic protocol to host new validity rollups written as circuits for the aPlonK proving system. The first instance to be originated on Mondaynet is a transactional rollup handling tickets, that is, providing similar functionality to what TORUs do, but benefiting from instant validation.

Pipelining is fully deployed, giving us 15 second block times!

The validation pipelining project has been transforming the protocol’s internal business logic, and different milestones have been deployed with the Kathmandu and Lima protocol upgrades. Our end goal was clear from the start: streamlining the internal validation process without compromising the security of the chain.

Mumbai brings the final installment of this project, which delivers optimizations in the protocol by separating the validation from application of blocks and operations. This allows blocks and operations to be propagated faster, giving us confidence to reduce block times. With Mumbai, we halve the minimal time between blocks from 30s to 15s.

To summarize, in Kathmandu we first focused on applying this new validation discipline to manager operations (e.g. transactions and smart contract calls), which consume the most resources. In Lima, we extended our scope to all operations supported by the Tezos economic protocol, and to how blocks are handled when nodes receive them. And now in Mumbai, pipelining is brought to the baker, so the block payload (smart contracts, etc.) is not executed before the proposed block is propagated: the block must only go through a lightweight validity check (see our previous discussions on pipelining for more details).

Since baking will only consist of sorting and validating selected operations and signing the block, the delay in block propagation is reduced significantly, giving us sufficient confidence to safely halve minimal block time to 15 seconds.
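To illustrate why baking gets cheaper, here is a minimal sketch of a baker that sorts candidate manager operations by fee and keeps those passing a few lightweight validity checks (fee solvency, expected counter, remaining block gas) without executing them. The checks, the `ManagerOp` type and the function names are hypothetical simplifications; the real protocol's validation covers more conditions, such as signatures, and application happens afterwards.

```python
from dataclasses import dataclass


@dataclass
class ManagerOp:
    source: str
    fee: int
    gas_limit: int
    counter: int


def validate(op, balances, counters, remaining_gas):
    """Cheap checks only (hypothetical subset); the operation is not executed."""
    return (
        balances.get(op.source, 0) >= op.fee                 # can pay the fee
        and op.counter == counters.get(op.source, 0) + 1     # expected counter
        and op.gas_limit <= remaining_gas                    # fits the block gas budget
    )


def bake(candidates, balances, counters, block_gas_limit):
    """Sort by fee, keep operations that validate, and return them unexecuted."""
    balances, counters = dict(balances), dict(counters)
    selected, remaining_gas = [], block_gas_limit
    for op in sorted(candidates, key=lambda o: o.fee, reverse=True):
        if validate(op, balances, counters, remaining_gas):
            # Reserve the fee so later operations from the same source see it.
            balances[op.source] = balances.get(op.source, 0) - op.fee
            counters[op.source] = op.counter
            remaining_gas -= op.gas_limit
            selected.append(op)
    return selected


ops = [
    ManagerOp("tz1a", fee=10, gas_limit=500, counter=1),
    ManagerOp("tz1b", fee=50, gas_limit=800, counter=1),  # cannot pay its fee
    ManagerOp("tz1c", fee=5, gas_limit=100, counter=3),   # wrong counter
    ManagerOp("tz1d", fee=8, gas_limit=600, counter=1),   # exceeds remaining gas
]
balances = {"tz1a": 100, "tz1b": 40, "tz1c": 100, "tz1d": 100}
counters = {"tz1a": 0, "tz1b": 0, "tz1c": 0, "tz1d": 0}
print([op.source for op in bake(ops, balances, counters, block_gas_limit=1000)])  # prints: ['tz1a']
```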

This is the second time Tezos reduces block times via on-chain protocol amendment. They were first cut in half from 1 minute to 30 seconds with the adoption of the Emmy* consensus algorithm, activated in Granada. And now with Mumbai, they are halved again to 15 seconds. We will continue to streamline the implementation of the Tezos economic protocol and the Octez suite to bring faster block times.

Other Improvements

Implicit account tickets: With Mumbai, implicit accounts can receive, store and send tickets to other implicit accounts and originated contracts. In the previous protocols, tickets could only be stored in contract/TORU storage or sent to other contracts/TORUs. With this upgrade, contracts may also send tickets to implicit accounts so that they can store and send to other entities later; or tickets can be exchanged between implicit accounts.
RPCs for ticket balances: We added two RPC endpoints to improve the visibility of ticket ownership.
- <contract-id>/all_ticket_balances endpoint that returns a complete list of tickets owned by a given contract by scanning the contract’s storage. It can only be used for originated contracts. In conjunction with the ticket update field in transaction receipt, this enables indexers to reconstruct a “global ticket table” that keeps track of all ticket ownership
- <contract-id>/ticket_balance endpoint that returns the given contract’s balance of the ticket with specified ticketer, content type, and content. It can be used for implicit accounts, originated contracts, and rollups. This can be used to double-check ticket ownership.
New Michelson operations: First, we extended the Michelson opcodes AND, OR, XOR, NOT, LSL and LSR to support logical operations on bytes. Their semantics are similar to those on nat. Second, we added an opcode to convert between bytes and nat.

Now it’s your turn

As was the case in 2022, we are pushing to make 2023 a year of ground-breaking advances, and Mumbai brings to life the monumental work initiated in Kathmandu and continued with Lima. Of course, the technical advances in smart rollups and validity rollups (Epoxy) will only really show their power once new applications are built with them. So we encourage the Tezos developer community to build with these technologies and show everyone what is possible.

To learn more about Smart Rollups:

The road to a million TPS (and beyond): Smart rollups are coming (blog post).
Why the next generation of optimistic rollups are a game-changer for Tezos (blog post).
Scaling Tezos with rollups (presentation by developers).
Look to our blog for articles on Epoxy in the coming weeks.

Anyone interested in getting started with building a smart rollup node can reach out to contact@nomadic-labs.com.

Improving Randomness in Tezos with Verifiable Delay Functions

2022-11-21T18:00:00+01:00

In this article we explain how a novel cryptographic primitive called a Verifiable Delay Function (VDF) improves random seed generation in Tezos. This novel feature was announced in July, and went live with the activation of the Kathmandu protocol in September.

Randomness in a blockchain

Tezos is a blockchain that relies on a consensus mechanism based on proof-of-stake, which requires that participants be chosen at random for each cycle. These participants, or delegates, are appointed to bake and endorse blocks for which they earn rewards. Since a delegate’s stake can fluctuate, randomness is used as a mechanism to fairly distribute baking and endorsing rights based on a discrete probability distribution in proportion to the delegate’s stake.
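A minimal sketch of stake-proportional selection: derive a pseudo-random number from the seed, the cycle and a slot index, and map it onto cumulative stakes, so that a delegate with three times the stake is drawn about three times as often. The protocol's actual sampler and seed derivation differ; the hashing scheme, the use of BLAKE2b and the function name here are illustrative assumptions.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def draw_delegate(seed: bytes, cycle: int, slot: int, stakes: dict) -> str:
    """Pick a delegate for a slot with probability proportional to stake (sketch)."""
    delegates = sorted(stakes)  # deterministic order, so every node agrees
    cumulative = list(accumulate(stakes[d] for d in delegates))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("total stake must be positive")
    # Derive a pseudo-random integer from the seed, the cycle and the slot.
    digest = hashlib.blake2b(
        seed + cycle.to_bytes(4, "big") + slot.to_bytes(4, "big"), digest_size=32
    ).digest()
    r = int.from_bytes(digest, "big") % total  # negligible modulo bias for a sketch
    return delegates[bisect_right(cumulative, r)]


stakes = {"tz1a": 1, "tz1b": 3}
picks = [draw_delegate(b"seed", 1, slot, stakes) for slot in range(4000)]
print(picks.count("tz1a") / len(picks))  # close to 0.25
```

Because the draw depends only on public data, every node computes the same rights; the security of the whole scheme therefore rests on the seed being unpredictable, which is what the rest of this article is about.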

However, since a blockchain is supposed to be deterministic (everything should be re-computable), how can we achieve randomness? Further discussion on why randomness is challenging in a blockchain can be found here. It is important to be able to produce random numbers in a way that can’t be gamed, since participants have a strong incentive to cheat in order to get more rights and thus more rewards.

Various approaches to getting random numbers in blockchains have been devised. Generally they either rely on external submissions of values that are provably random, or they combine multiple values produced by participants into a single random value. These range from VRF (Verifiable Random Function) solutions, including Chainlink’s VRF to the RANDAO algorithm used in Ethereum. The Tezos whitepaper describes our MPC (multiparty computation) commit-and-reveal scheme. While it precedes Ethereum’s RANDAO, we have subsequently adopted the term since the algorithms are similar.¹

Randomness in Tezos with RANDAO

Prior to Kathmandu, Tezos only used RANDAO (that is, our variant of it). The basic idea is to incentivize participants to submit random values, or “nonces”, that we combine on-chain into a “random seed”. When selected to submit one of these values, delegates stand to lose part of their rewards for a given cycle if they fail to do so.

RANDAO works as follows: to compute the seed for cycle n, i.e. seed_n, we go back in time to cycle n-2-PRESERVED_CYCLES (PRESERVED_CYCLES is a protocol constant, set to 5 at the time of writing), in which we asked delegates to submit commitments to random values, commitments_{n-2-PRESERVED_CYCLES}. The commitments are hashes of random values. Submitting hashes rather than the random values themselves at this stage prevents an adversary from adaptively choosing its random value to its advantage, for instance to obtain more baking rights and thereby more rewards. In the following cycle, n-1-PRESERVED_CYCLES, the delegates were asked to submit their random values, nonces_{n-1-PRESERVED_CYCLES}. If the hash of a revealed value equals the corresponding commitment, we store it. The seed is finally computed at the end of the cycle as the hash of the previous seed and the stored revealed values (in the order in which they were committed): seed_n = Randao_{n-1-PRESERVED_CYCLES} = Hash(seed_{n-1}, nonces_{n-1-PRESERVED_CYCLES}). As long as at least one of the participants is honest (that is, submits an honestly generated random value) and the hash function is secure, so is the scheme.
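The commit-and-reveal steps above translate into a short Python sketch: commitments are hashes of nonces, a revealed nonce is kept only if it hashes to its commitment, and accepted nonces are folded into the previous seed in commitment order. The BLAKE2b hash, the byte-concatenation encoding and the function names are assumptions for illustration, not the protocol's exact encoding.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.blake2b(b"".join(parts), digest_size=32).digest()


def commit(nonce: bytes) -> bytes:
    """Commitment phase (cycle n-2-PRESERVED_CYCLES): publish only the hash."""
    return h(nonce)


def accept_reveals(commitments, reveals):
    """Reveal phase (cycle n-1-PRESERVED_CYCLES): keep nonces matching their commitment.

    Both lists are in commitment order; a missing reveal is None.
    """
    return [n for c, n in zip(commitments, reveals) if n is not None and h(n) == c]


def randao_seed(previous_seed: bytes, nonces) -> bytes:
    """Fold the accepted nonces into the previous seed, in commitment order."""
    seed = previous_seed
    for nonce in nonces:
        seed = h(seed, nonce)
    return seed


commitments = [commit(n) for n in (b"n1", b"n2", b"n3")]
accepted = accept_reveals(commitments, [b"n1", b"forged", b"n3"])
print(accepted)  # prints: [b'n1', b'n3']
seed = randao_seed(b"previous seed", accepted)
```

With nonces `b"n1"`, `b"n2"`, `b"n3"` committed and reveals `b"n1"`, a forged value, and `b"n3"`, only the first and third nonces are kept and folded into the seed.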

Last revealer’s advantage

As previously stated, this algorithm is deemed secure if at least one honest member participates. However, the algorithm isn’t perfect, as it can give an advantage to the last revealer. The last revealer, seeing all the revealed values, can choose whether or not to reveal their own nonce, in effect choosing the seed between two potential values, resulting in “biasable randomness”. In truth, the advantage can extend to the last few revealers if they collude: k colluding revealers can choose among up to 2^k seeds.
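The last revealer's decision can be sketched as follows, assuming for simplicity that its nonce comes last in commitment order and that a hypothetical `score` function measures how favorable a seed is (for instance, how many rights it would yield). Withholding costs the revealer part of its rewards, so the bias is only worth exploiting when the expected gain exceeds that penalty.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.blake2b(b"".join(parts), digest_size=32).digest()


def seed_from(previous_seed: bytes, nonces) -> bytes:
    seed = previous_seed
    for nonce in nonces:
        seed = h(seed, nonce)
    return seed


def last_revealer_choice(previous_seed, honest_nonces, my_nonce, score):
    """Compute both possible seeds and reveal only if that gives the better one."""
    if_reveal = seed_from(previous_seed, honest_nonces + [my_nonce])
    if_withhold = seed_from(previous_seed, honest_nonces)
    if score(if_reveal) >= score(if_withhold):
        return "reveal", if_reveal
    return "withhold", if_withhold


# Hypothetical score: treat a larger first byte as "more rights".
choice, chosen_seed = last_revealer_choice(b"s", [b"n1", b"n2"], b"mine", lambda s: s[0])
print(choice)
```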

This potential for bias can be mitigated in various ways. Examples include using a 1-bit clock and Avalanche’s RANDAO construction.

The other main alternative to RANDAO for generating unbiased randomness is Threshold Relay, which is used in DFINITY. However, it relies on an expensive distributed key generation mechanism ($\mathcal{O}(n^2)$ messages for n members) and high liveness requirements: the beacon may stall if even 15% of honest players go offline.

Unbiased randomness with Verifiable Delay Functions

Currently, the most sophisticated method to remove this bias in RANDAO is to use a Verifiable Delay Function (VDF). VDFs are a novel cryptographic primitive that allow a verifier to check that a value was computed by a prover as the result of a long computation. Appending a VDF phase after our current RANDAO scheme prevents any participant from anticipating what the next seed might be, because the fast hash function is replaced by a (provably) long computation: if the VDF takes longer to evaluate than the time left in the revelation phase, the last revealer cannot compute the effect of withholding before having to decide. VDFs are state-of-the-art cryptographic primitives, and they were recently integrated into Tezos with the activation of the Kathmandu protocol upgrade. Ethereum is planning to add VDFs for random number generation in the future².

For the implementation of the underlying cryptographic primitive, we adapt the VDF library developed by Chia. Chia however doesn’t use it for random number generation, but rather for creating time slots in its consensus algorithm, to prove some amount of elapsed time.

Seed Generation with VDF

The diagram below illustrates the steps in creating a random seed with VDF.

RANDAO with VDF works as follows:

1. In cycle n-2-PRESERVED_CYCLES, the parties publish commitments to random values;
2. In cycle n-1-PRESERVED_CYCLES, the nonces are revealed, but the revelation phase is now shortened from the entire cycle to only the first NONCE_REVELATION_THRESHOLD blocks;
3. At block NONCE_REVELATION_THRESHOLD, the RANDAO output is computed and used to instantiate a hidden order group (an algebraic group whose number of elements is unknown), together with a challenge for which a VDF solution must be computed. Any party can query the information needed to compute the VDF solution through RPCs;
4. The first baker to disclose the correct solution before the end of cycle n-1-PRESERVED_CYCLES receives a small tip;
5. The random seed for cycle n is then computed as the hash of the RANDAO output and the VDF solution if a valid VDF solution was received, or as the RANDAO output otherwise.
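To make the VDF phase more concrete, here is a toy Wesolowski-style VDF in Python: evaluation performs T sequential squarings, and the proof lets a verifier check the result with a couple of modular exponentiations instead of redoing the squarings. As a loud assumption, it uses an RSA-style modulus whose factors are printed in the code, rather than a hidden order group such as the class groups used by Chia's library, and tiny parameters; it illustrates the evaluate/verify asymmetry only and offers no security. `N`, `T` and all function names are hypothetical.

```python
import hashlib

# Toy group: the factors are public here, so anyone could shortcut the delay.
N = 1000000007 * 998244353
T = 1 << 12  # number of sequential squarings (the "delay")


def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin, valid for n < 3.3e24 (enough for 64-bit values)."""
    if n < 2:
        return False
    bases = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def hash_to_prime(*values: int) -> int:
    """Fiat-Shamir challenge: hash the inputs, then take the next prime."""
    hasher = hashlib.sha256()
    for v in values:
        raw = v.to_bytes((v.bit_length() + 7) // 8 or 1, "big")
        hasher.update(len(raw).to_bytes(4, "big") + raw)  # length-prefix each input
    candidate = int.from_bytes(hasher.digest()[:8], "big") | 1
    while not is_prime(candidate):
        candidate += 2
    return candidate


def evaluate(x: int):
    """Slow part: y = x^(2^T) mod N by T sequential squarings, plus a proof."""
    y = x
    for _ in range(T):
        y = y * y % N
    l = hash_to_prime(x, y)
    proof = pow(x, (1 << T) // l, N)  # pi = x^floor(2^T / l)
    return y, proof


def verify(x: int, y: int, proof: int) -> bool:
    """Fast part: since 2^T = q*l + r, check pi^l * x^r == y (mod N)."""
    l = hash_to_prime(x, y)
    r = pow(2, T, l)
    return pow(proof, l, N) * pow(x, r, N) % N == y


x = 123456789  # in the protocol, the challenge would be derived from the RANDAO output
y, proof = evaluate(x)
print(verify(x, y, proof))  # prints: True
```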

That is, seed_n = Hash(Randao_{n-1-PRESERVED_CYCLES}, VDF_{n-1-PRESERVED_CYCLES}), or seed_n = Randao_{n-1-PRESERVED_CYCLES} if no valid VDF solution was submitted in cycle n-1-PRESERVED_CYCLES.
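The final combination rule fits in a few lines. In this sketch `is_valid` is a hypothetical callback standing in for the protocol's VDF verification, and BLAKE2b with plain concatenation is an assumed encoding.

```python
import hashlib
from typing import Callable, Optional


def h(*parts: bytes) -> bytes:
    return hashlib.blake2b(b"".join(parts), digest_size=32).digest()


def final_seed(
    randao_output: bytes,
    vdf_solution: Optional[bytes],
    is_valid: Callable[[bytes, bytes], bool],
) -> bytes:
    """Hash(RANDAO, VDF) if a valid VDF solution was published, else RANDAO alone."""
    if vdf_solution is not None and is_valid(randao_output, vdf_solution):
        return h(randao_output, vdf_solution)
    return randao_output  # fallback: no (valid) VDF solution this cycle


randao = b"r" * 32
print(final_seed(randao, None, lambda r, v: True) == randao)  # prints: True
```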

Note that even with a valid VDF output, we still use the time-tested RANDAO when computing the randomness seed, as shown in step 5, to make the whole system more robust. Indeed, while the VDF output can be considered unpredictable, it may not be random enough; hashing it guarantees the randomness of the seed. See the Tezos documentation and this Medium article for more details on VDFs.

Conclusion

The regular upgrade cycles of the Tezos protocol mean the latest advances in cryptographic research can be quickly implemented into the protocol. While the RANDAO protocol is used in a number of blockchains, and while VDFs are widely discussed, Tezos is currently the only blockchain that uses them for randomness generation, making random seed generation in Tezos highly robust compared to its competitors.

And that is the ethos of Tezos: to continuously monitor new research and technological developments, evaluate what’s best suited to be adopted, and integrate it quickly.

Note that in some versions of RANDAO there is a vulnerability when using XOR to combine nonces. Tezos’ RANDAO hashes the nonces together so this vulnerability doesn’t exist. ↩
Ethereum is working on a hardware solution, while we currently use a software one (a hardware solution doesn’t exist yet). Note that Tezos and Ethereum are advisory partners in the VDF Alliance. ↩

The road to a million TPS (and beyond): Smart rollups are coming

2022-11-16T14:00:00+01:00

TL;DR: Smart rollups are feature complete, a reference manual is ready, and we are looking to onboard early adopters with relevant use cases.

The next generation of optimistic rollups on Tezos are nearing their inclusion in a protocol upgrade proposal, and we have interesting news to share.

If you haven’t read our introduction to smart rollups, the elevator pitch is that they are more than just smart contract-enabled optimistic rollups with high scalability.

They allow anyone to program and deploy their own interoperable Layer 2 solution in a number of popular programming languages, using a general-purpose infrastructure built for decentralization and security.

Fancy an EVM-compatible “sidechain” in the Tezos ecosystem? This can be implemented using the new rollup infrastructure. We are already exploring this and will ramp up our work once smart rollups are live on Tezos Mainnet.

Smart rollups also enable an entirely new way to write decentralized applications on Tezos. In addition to using smart contracts on Layer 1, developers will be able to implement a dapp as a rollup, though this is mostly relevant for users requiring high throughput and/or intense computation.

Doing so effectively makes the smart rollup work like an app-chain – a blockchain dedicated to a single application – but one that draws on the time-tested security model of the Tezos network.

Finally, let’s not forget scaling: smart rollups play a central role in our mission to reach one million transactions per second (TPS) on Tezos, as announced at the TezDev Paris conference earlier this year.

Note that the million TPS goal is one we have set for ourselves for the short term. In the long term, due to the combination of horizontal and vertical scaling, there is virtually no limit to the scaling potential of Tezos rollups.

Enshrined? What?

The key to the flexibility, decentralization, and security of these next-generation rollups is that they are implemented at the protocol level, making them a type of enshrined rollups.

Googling “enshrined” will get you the following definitions:

“place (a revered or precious object) in an appropriate receptacle”
“preserving something in a form that ensures it will be protected and respected”.

And that is what enshrined rollups are about. Instead of being deployed as smart contracts, as is done on Ethereum, for example, they are officially recognized by the Tezos protocol as special entities with particular privileges and features.

You could even argue that such tight integration with the underlying blockchain makes enshrined rollups a feature of Layer 1, rather than a Layer 2 solution.

Enshrined rollups also represent a social contract between protocol developers and the community. By design, they are equally available to everyone. And, if adopted by the community through a protocol proposal, they will be maintained and continuously improved in ways that benefit the Tezos network as a whole.

On the technical side, the benefits of enshrined rollups are significant:

They enable complex features that are difficult at best, and often impossible, to implement using a smart contract-based rollup.
Rollup-related Layer 1 activity can be made much more computation/gas efficient.
Common infrastructure enables standardized and efficient communication with Layer 1 and between rollups, even with different execution environments.
Special data solutions can serve these rollups, so rollup data doesn’t consume excessive storage and bandwidth on Layer 1.

Ready for data availability solutions

The ability to create tailored data solutions is particularly important for scalability.

While smart rollups are deeply integrated into the Layer 1 protocol, we want to avoid Layer 1 becoming a bottleneck, as described in our post on scaling strategy. Trying to reach our million TPS goal would quickly fill up Layer 1 blocks, and bandwidth on the network would become an issue.

To tackle this challenge, we are implementing a general and flexible mechanism for providing arbitrary data to a rollup without going through Layer 1. The mechanism can be utilized in different ways, but we expect to see two approaches:

A Data Availability Committee (DAC) is an off-chain mechanism to keep data available for verification, which relies on a trusted set of data providers. This presents a compromise with regard to decentralization, and unless a user has specific needs for control of rollup data (for instance, in a private solution), the use of DACs will likely be transitional.

The more decentralized approach is a Data Availability Layer (DAL), which serves data like a DAC, but is a “public good” provided by the protocol itself: a dedicated rollup data layer for Tezos. Such a layer is in development, and will be included in a future protocol upgrade proposal.

New reference manual for dapp developers

The (first) execution environment for smart rollups is WebAssembly (WASM), which was chosen for at least two reasons. First, WASM is designed for fast execution, and second, WASM is becoming a broadly adopted compilation target, which means you will be able to write your “smart rollup contracts” in a large set of popular programming languages.

We invite interested readers to explore our new reference manual, which demonstrates how to concretely implement a dapp as a smart rollup.

The manual covers:

The generic optimistic rollup infrastructure developed for Tezos.
The life-cycle of a WASM smart rollup.
The software environment in which the smart rollup program is executed, including the API smart rollups can use.
How Rust can be used to implement a simple smart rollup.
How this simple smart rollup can be deployed on Tezos.

We have also released a Rust module that defines the low-level API that is made available to developers. We would love to see what kind of safe and efficient API Rust developers can come up with.

Of course, the language used is not limited to Rust. Any programming language that compiles to WASM and does not assume a browser as its sole execution environment can be used to implement smart rollups.

Start your WASM engines

The coming months are about making sure the Tezos ecosystem is off to a running start, once smart rollups are activated on Mainnet. It involves two important tasks:

Ecosystem testing: Smart rollups on Mondaynet are now feature complete. Things like APIs can still change, so (minor) breaking changes are possible, but there will be no paradigm or design changes. It’s time to start experimenting, and we are looking to onboard early adopters.

If you are part of a project/organization that requires the scalability or flexibility that smart rollups offer, don’t wait for Mainnet activation. Get in touch now, and get help with building your solution while helping us ensure smart rollups are optimized for our users’ needs upon Mainnet activation.

Tweaking and tuning: We are working on making sure the infrastructure for WASM smart rollups can handle the high throughput we have committed to reaching on Tezos. For instance, the rollup node that will be shipped with Octez will make good use of the performance of the Wasmer execution engine.

Provided that no blocking issues are uncovered in this final testing phase, activation of smart rollups on Mainnet will be included in the ‘M’ protocol proposal in early 2023.

We look forward to exploring the power of this new rollup engine, and to sharing our experiences with the community. Stay tuned!

Learn more about smart rollups:

Why the next generation of optimistic rollups are a game-changer for Tezos (blog post)
Scaling Tezos with rollups (presentation by developers)

Announcement: New Names for Octez Executables in v15.0

2022-10-11T14:00:00+02:00

In Octez v15.0, the next major release of the Octez suite, the names of executable binaries will change:

the tezos- prefix becomes octez-; and
the protocol number (e.g., the 014 in tezos-tx-rollup-node-014-PtKathma) is removed from protocol-specific executables — baker, accuser, Layer 2 nodes and clients, etc.

These changes have already been implemented in the Octez master branch, and will be enforced from its first release candidate, v15.0~rc1.

The prefix change from tezos- to octez- is enacted for consistency with the name we have given to this specific implementation of Tezos — the motivation for these changes was described in this blog post when Octez was announced to the world. This step is meant to further clarify that Octez executables are one particular implementation of Tezos — that is, of everything but the economic protocol itself.

Removing protocol numbers will make it clearer that they are not related to the Octez version numbers. Indeed, some users were confused when Octez adopted the use of version numbers, as they eventually grew to match protocol numbers (although this was the result of a fortuitous coincidence).

Furthermore, removing the protocol number will help clarify why different versions of protocol-specific binaries are shipped with Octez releases, making instructions (especially around protocol upgrades) more precise and less ambiguous. Those executables now contain only the name of the protocol they target, which should make it clearer that each one is a particular implementation belonging to a specific version of Octez.

Here is the full list of renamings:

tezos-node becomes octez-node.
tezos-client becomes octez-client.
tezos-admin-client becomes octez-admin-client.
tezos-signer becomes octez-signer.
tezos-codec becomes octez-codec.
tezos-protocol-compiler becomes octez-protocol-compiler.
tezos-proxy-server becomes octez-proxy-server.
tezos-validator becomes octez-validator.
tezos-accuser-014-PtKathma becomes octez-accuser-PtKathma.
tezos-accuser-015-PtLimaPt becomes octez-accuser-PtLimaPt.
tezos-accuser-alpha becomes octez-accuser-alpha.
tezos-baker-014-PtKathma becomes octez-baker-PtKathma.
tezos-baker-015-PtLimaPt becomes octez-baker-PtLimaPt.
tezos-baker-alpha becomes octez-baker-alpha.
tezos-tx-rollup-client-014-PtKathma becomes octez-tx-rollup-client-PtKathma.
tezos-tx-rollup-client-015-PtLimaPt becomes octez-tx-rollup-client-PtLimaPt.
tezos-tx-rollup-client-alpha becomes octez-tx-rollup-client-alpha.
tezos-tx-rollup-node-014-PtKathma becomes octez-tx-rollup-node-PtKathma.
tezos-tx-rollup-node-015-PtLimaPt becomes octez-tx-rollup-node-PtLimaPt.
tezos-tx-rollup-node-alpha becomes octez-tx-rollup-node-alpha.
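The renaming follows two mechanical rules, so it can be expressed as a small function, for instance to audit scripts that still use the old names. This is a minimal Python sketch: `new_octez_name` is a hypothetical helper, not part of Octez, and it assumes (as in every name listed above) that protocol numbers are exactly three digits.

```python
import re

# A three-digit protocol number with its surrounding hyphens, e.g. "-014-"
# in "tezos-baker-014-PtKathma". Assumption: protocol numbers are always
# three digits, as in every name in the list above.
PROTOCOL_NUMBER = re.compile(r"-\d{3}-")


def new_octez_name(old_name: str) -> str:
    """Map a pre-v15.0 executable name to its v15.0 name (hypothetical helper)."""
    if not old_name.startswith("tezos-"):
        raise ValueError(f"not an old-style executable name: {old_name!r}")
    # Rule 1: the tezos- prefix becomes octez-.
    name = "octez-" + old_name[len("tezos-"):]
    # Rule 2: drop the protocol number but keep the protocol name.
    return PROTOCOL_NUMBER.sub("-", name, count=1)


print(new_octez_name("tezos-baker-014-PtKathma"))  # octez-baker-PtKathma
print(new_octez_name("tezos-baker-alpha"))         # octez-baker-alpha
```

Names without a protocol number, such as tezos-node or tezos-baker-alpha, only go through the first rule.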

Adapting to these changes

We understand that these are breaking changes, which may affect years of existing deployment infrastructure and documentation. With this in mind, we have included the following provisions in Octez v15.0 to maintain backwards compatibility.

When building Octez v15.0 from sources using make, symbolic links will be created from the old names to the new names. The list of symbolic links is exactly the list given above. For instance, a symbolic link named tezos-baker-014-PtKathma is created, which targets octez-baker-PtKathma. This way, Octez v15 should remain compatible with, e.g., existing deployment scripts that rely on the explicit binary names of previous Octez versions.

In the case of Octez Docker images, the transition is slightly less smooth: when specifying the PROTOCOL environment variable to choose which baker and accuser to run, the protocol number must not be included. For instance, PROTOCOL=PtKathma must be used instead of PROTOCOL=014-PtKathma.

Additionally, there are no symbolic links from the old names to the new names in the Docker images themselves. However, the entrypoint script accepts both tezos- and octez- prefixed commands. The old tezos-node and tezos-baker commands should still work as before, although it is now strongly recommended to use the newly added respective replacement commands, octez-node and octez-baker, instead.

In the case of Octez executables downloaded from their release page on GitLab, these static binaries use the new naming conventions. Users will need to either adopt the new names, or create symbolic links themselves.
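For downloaded static binaries, recreating the old names takes only a few lines. The following Python sketch does what `make` does for source builds, under stated assumptions: `create_legacy_symlinks` and the `OLD_TO_NEW` mapping are hypothetical, the mapping is a subset of the list above, and it targets Unix-like systems where `os.symlink` is available. A plain `ln -s` in a shell achieves the same thing.

```python
import os
from pathlib import Path

# Hypothetical subset of the renaming list above; extend it with the
# entries matching the binaries you actually downloaded.
OLD_TO_NEW = {
    "tezos-node": "octez-node",
    "tezos-client": "octez-client",
    "tezos-baker-015-PtLimaPt": "octez-baker-PtLimaPt",
    "tezos-accuser-015-PtLimaPt": "octez-accuser-PtLimaPt",
}


def create_legacy_symlinks(bin_dir: Path, mapping: dict[str, str]) -> list[str]:
    """Create old-name symlinks pointing at new-name binaries in bin_dir.

    A link is created only if the new binary exists and the old name is
    still free. Returns the old names that were linked.
    """
    created = []
    for old_name, new_name in mapping.items():
        target = bin_dir / new_name
        link = bin_dir / old_name
        if target.exists() and not link.exists() and not link.is_symlink():
            # A relative target keeps the link valid if the directory moves.
            os.symlink(new_name, link)
            created.append(old_name)
    return created
```

Because existing names are never overwritten, running the function twice is harmless.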

Note that the old naming convention is now formally deprecated. In a future major release, possibly Octez v16.0, we plan to remove the symbolic links as well as the old command names in the Docker image entrypoint. It is thus a good idea to migrate to the new naming conventions before Octez v16.0 is out!

If you are unsure about what needs to be done on your end, reach out to us on the Tezos baking Slack or feel free to contact the Nomadic Labs support team.

Announcing Tezos’ 12th protocol upgrade proposal, “Lima”

2022-10-10T16:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, & Functori.

We’re happy to announce that Lima, our next Tezos protocol proposal, is ready. As usual, Lima’s ‘true’ name is its hash, which is PtLimaPtLMwfNinJi9rCfDPWea8dFgTZ1MeJ9f1m2SRic6ayiwW.

The main features of this protocol upgrade proposal are:

More pipelining: The work to separate validation from application of operations and blocks continues. It will enable higher Layer 1 throughput on Tezos.
Consensus keys: This much requested feature lets bakers change their key for signing blocks and consensus operations without changing the baker’s public address.
Improvements to Tickets: Ticket ownership updates are now part of transaction receipts, which helps indexers keep track of tickets. Also, zero-amount tickets will be deprecated.
Ghostnet fixes: Two problems arose during the migration from Jakarta to Kathmandu on the permanent ‘Ghostnet’ test network. These are fixed in Lima.
Liquidity Baking sunset removed: The sunset is no longer needed, as the subsidy can be shut off with the moving-average toggle introduced with the Jakarta upgrade.
Temporary Timelock deprecation: Due to a discovered vulnerability, origination of new contracts using Timelock is disabled while a safer mechanism is developed.

To learn more about Lima’s contents, see our full preview post.

The changelog provides a detailed list of changes, and a general technical overview of Lima can be found in the protocol proposal’s technical documentation.

Note that, if Lima is voted in by the community, upgrading to Octez v15.0 (or later) will be necessary for participating in consensus. A release candidate for Octez v15.0 will be published in the coming days, and a dedicated protocol test network Limanet is also scheduled to begin soon – more information about the latter test network will be available on https://teztnets.xyz/.

Rollup testers wanted

Development of next-generation enshrined optimistic rollups on Tezos is progressing steadily. Functioning rollups implemented in Rust are currently running in our test suite.

However, these rollups are not part of the Lima protocol upgrade proposal, as more time needs to be spent on validation, testing, and integration with ecosystem tools before they can be activated on Mainnet.

We highly encourage ecosystem participants to start experimenting with these rollups on the Mondaynet testnet, where they are already activated.

Broad testing and feedback from the ecosystem is invaluable in our efforts to minimize the risk of undetected issues upon Mainnet activation. Read more about how to get started.

What to expect in Lima — our 12th protocol upgrade proposal for Tezos

2022-10-05T18:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda & Functori.

With the Kathmandu upgrade successfully activated, it’s time to look at what’s next for Tezos.

For our 12th upgrade proposal, we are leaving the Himalayan mountains behind and heading to the coastal climate of Lima, capital of Peru and one of the largest cities in the Americas.

But before we embark on that journey…

An update on optimistic rollups

Development of next-generation enshrined optimistic rollups on Tezos is progressing steadily. Functioning rollups implemented in Rust are currently running in our test suite.

These rollups will however not be part of the Lima protocol upgrade proposal, as more time needs to be spent on validation, testing, and integration with ecosystem tools before they can be activated on Mainnet.

We highly encourage ecosystem participants to start experimenting with these rollups on the Mondaynet testnet, where they are already activated. Broad testing and feedback from the ecosystem is invaluable in our efforts to minimize the risk of undetected issues upon Mainnet activation. More on this further below.

Now, let’s look at the series of improvements to Tezos functionality contained in the Lima upgrade proposal.

More pipelining for a faster blockchain

Our pipelining work continues to separate validation from application of operations and blocks in order to speed up their processing. It may not sound sexy, but this is an important part of increasing Layer 1 throughput on Tezos.

A quick distinction:

Validating means performing basic checks, such as the cryptographic signature being valid, and that there are funds to pay fees. This is a light and quick process.
Applying means executing the full contents of the operation, whether a simple transaction or a complex contract call. This can be much more computationally intensive and hence time consuming for the node.

The Kathmandu upgrade reduced the number of times a node applies manager operations before propagating them through the network. This minimizes the delay introduced by each node as blocks and operations are gossiped through the peer-to-peer network.

The Lima protocol proposal extends pipelined validation to all remaining classes of operations, and to blocks themselves. When receiving a new block from its peers, a node will only check the validity of the block before forwarding it to other peers, speeding up block propagation on the network. Afterwards, the node applies the block.
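The essential change is ordering: cheap validation comes first, then forwarding, and only then the expensive application. The Python toy model below illustrates just that ordering for a single operation. It is a deliberately simplified, hypothetical sketch: a boolean stands in for signature verification, "applying" merely debits the fee, and none of the names correspond to Octez internals.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    source: str
    fee: int
    signature_ok: bool  # stands in for a real cryptographic check
    payload: str        # stands in for the operation's full contents


@dataclass
class ToyNode:
    name: str
    balances: dict[str, int]
    inbox: list[Operation] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def validate(self, op: Operation) -> bool:
        # Light and quick: valid signature and enough funds to pay the fee.
        return op.signature_ok and self.balances.get(op.source, 0) >= op.fee

    def apply(self, op: Operation) -> None:
        # Potentially expensive in reality; here it only debits the fee.
        self.balances[op.source] -= op.fee
        self.log.append(f"applied {op.payload}")

    def receive(self, op: Operation, peers: list["ToyNode"]) -> None:
        if not self.validate(op):
            self.log.append(f"rejected {op.payload}")
            return
        # Pipelined: gossip as soon as the operation is known to be valid...
        for peer in peers:
            peer.inbox.append(op)
            self.log.append(f"forwarded {op.payload} to {peer.name}")
        # ...and only then pay the cost of applying it locally.
        self.apply(op)


a = ToyNode("a", {"tz1alice": 100})
b = ToyNode("b", {"tz1alice": 100})
a.receive(Operation("tz1alice", 10, True, "transfer"), peers=[b])
print(a.log)  # ['forwarded transfer to b', 'applied transfer']
```

In a non-pipelined node, `apply` would run before the forwarding loop, so every hop would add the full execution time to propagation.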

The upcoming step will be to extend pipelined validation to block production itself, reducing the effort (and hence time) required for bakers to propose new blocks. Along with further upcoming optimizations, this opens the door to reduced block times.

This one’s for the bakers: introducing consensus keys

We are happy to introduce consensus keys — a feature which has been highly requested by bakers.

Consensus keys allow bakers to designate a special key — separate from the baker address key — for signing blocks and consensus operations, such as (pre)endorsements. The proposed implementation lets bakers change their consensus key without changing the baker’s public address.

Rotating keys is generally good practice in computer security. And this feature will be of great benefit in situations where:

There are concerns about a baker’s private key having been compromised.
A baker using a Key Management System (KMS) or Hardware Security Module (HSM) wishes to switch to a different setup. These generally don’t allow key extraction.
There is a loss of access, e.g., if a geographically remote baking setup using KMS/HSM fails. With consensus keys, the baker can remotely deploy a new setup under the same baking address.

Hence, a baker’s delegators no longer need to actively redelegate to a new address. Redelegation was cumbersome and required off-chain coordination, which might not reach all of the baker’s delegators.
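The key point is that delegations point at the baker’s address, not at its signing key. The hypothetical Python registry below makes that separation explicit. It is a sketch only: the class and method names are invented, the addresses and keys are placeholder strings, and the protocol's actual rules for when a new consensus key becomes active are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    # baker address -> key currently used to sign blocks and consensus operations
    consensus_keys: dict[str, str] = field(default_factory=dict)
    # delegator address -> baker address
    delegations: dict[str, str] = field(default_factory=dict)

    def register_baker(self, baker: str, key: str) -> None:
        self.consensus_keys[baker] = key

    def delegate(self, delegator: str, baker: str) -> None:
        self.delegations[delegator] = baker

    def rotate_consensus_key(self, baker: str, new_key: str) -> None:
        # Only the signing key changes. The baker's public address, and
        # therefore every delegation pointing at it, stays the same.
        if baker not in self.consensus_keys:
            raise KeyError(baker)
        self.consensus_keys[baker] = new_key

    def signing_key_for(self, delegator: str) -> str:
        return self.consensus_keys[self.delegations[delegator]]


registry = ToyRegistry()
registry.register_baker("tz1baker", "key-A")
registry.delegate("tz1alice", "tz1baker")
registry.rotate_consensus_key("tz1baker", "key-B")
print(registry.delegations["tz1alice"], registry.signing_key_for("tz1alice"))
# tz1baker key-B
```

The delegator never acts: rotating the key is a change on the baker's side only.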

The consensus key feature is based on contributions made by G.-B. Fefe (anonymous contributor) and Nicolas Ochem. As a reward for their involvement, invoices of 15,000 and 10,000 tez, respectively, are included in the Lima proposal.

Improvements to Tickets

We are deprecating creation, storage and transfer of zero-amount tickets. This removes a source of inconvenience and reduces the risk of bugs in smart contracts, but introduces a breaking change in the TICKET instruction.

Furthermore, we added ticket ownership updates to transaction receipts. This enables indexers to maintain a table that tracks which accounts own what tickets by traversing the receipts.
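The indexer side amounts to folding receipt entries into a table. The Python sketch below assumes a simplified receipt shape that we made up for illustration: each update is a (ticket identifier, account, signed amount change) triple. Real receipts carry more structure (see the design document referenced below), and `index_ticket_ownership` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical, simplified ticket update: (ticket_id, account, delta),
# where delta > 0 means the account gains tickets and delta < 0 means it loses them.
TicketUpdate = tuple[str, str, int]


def index_ticket_ownership(receipts: list[list[TicketUpdate]]) -> dict[str, dict[str, int]]:
    """Build ticket_id -> {account: amount} by replaying receipts in order."""
    table: dict[str, dict[str, int]] = defaultdict(dict)
    for receipt in receipts:
        for ticket_id, account, delta in receipt:
            new_amount = table[ticket_id].get(account, 0) + delta
            if new_amount < 0:
                raise ValueError(f"{account} would hold a negative amount of {ticket_id}")
            if new_amount == 0:
                table[ticket_id].pop(account, None)  # drop empty holdings
            else:
                table[ticket_id][account] = new_amount
    # Omit tickets nobody holds any more.
    return {ticket: owners for ticket, owners in table.items() if owners}


receipts = [
    [("KT1minter:gold", "KT1minter", 10)],                                          # mint
    [("KT1minter:gold", "KT1minter", -4), ("KT1minter:gold", "KT1market", 4)],    # transfer
]
print(index_ticket_ownership(receipts))
# {'KT1minter:gold': {'KT1minter': 6, 'KT1market': 4}}
```

Rejecting negative balances doubles as a consistency check: it flags receipts that were skipped or replayed out of order.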

The change to receipts was introduced after fruitful meetings with ecosystem actors, and we are happy to collaborate with the ecosystem in this way and implement their valuable feedback.

In Lima, receipts will include this information for ticket minting/removal within a single contract and transfers between originated contracts. In upcoming upgrades, we will cover all combinations of transactions between implicit accounts, originated accounts, and rollups. A design document can be found here.

Ghostnet fixes

During the migration from Jakarta to Kathmandu on the Ghostnet test network, two problems arose, both of which are fixed in Lima:

The VDF feature activated itself with the same difficulty as on Mainnet, but cycles on Ghostnet are ¼ of Mainnet! So it’s impossible to do the required computation within the allocated time frame. For this reason, the VDF challenge’s difficulty on Ghostnet is now set to ¼ of Mainnet’s.
The length of a voting period on Ghostnet was changed from two cycles to one cycle. However, due to the way a protocol upgrade is executed, a “time until next period”-counter became negative after Kathmandu was activated, leading Ghostnet to not advance through voting periods automatically. Therefore, a ‘force reset’ of the voting period is scheduled for Ghostnet’s migration to Lima, should the proposal be adopted by the community.

Other changes

Liquidity Baking sunset removed: The liquidity baking sunset will be removed, since the subsidy can now be shut off with the moving-average toggle introduced with the Jakarta upgrade.
Temporary Timelock deprecation: As recently announced, a vulnerability has been discovered in the Timelock mechanism. A safer Timelock mechanism is currently being developed. As a preventive measure, Octez v14 already disabled interaction with smart contracts using this feature. The Lima protocol proposal complements this measure by preventing the origination of new contracts using this functionality. This is achieved by temporarily deprecating the CHEST_OPEN instruction in Michelson.

Help us get rollups rolling!

As mentioned initially, we are looking to have our next-generation optimistic rollups tested extensively and integrated with ecosystem tools before activation on Mainnet.

Careful and incremental integration in close cooperation with the Tezos ecosystem is always desirable, but with optimistic rollups being the backbone of Tezos’ scaling strategy and possibly the largest evolutionary leap for the Tezos protocol so far, this becomes paramount.

In particular, the WebAssembly (WASM) Proof-generating Virtual Machine (PVM), which is at the heart of the system, is currently considered to be in beta.

The WASM PVM included in Lima lets users originate a rollup on the Mondaynet testnet by providing a WASM program — called a kernel — which interprets any Layer 2 operations targeting the rollup.

Aside from intra-rollup operations, these Layer 2 operations may also transfer assets wrapped as tickets from Layer 1 accounts to rollups, or produce asynchronous transfers of tickets from the rollup to Layer 1 contracts. A mechanism of data revelation is also introduced, so that data from sources external to Tezos’ Layer 1 can be imported in a rollup.
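This post does not detail how data revelation works. A common way to import external data safely is content addressing: the rollup requests data by its hash and checks whatever it receives against that hash, so it never has to trust the provider. The following Python sketch illustrates that general idea only. The store class, function names, and the choice of a 32-byte BLAKE2b digest are assumptions for illustration, not the protocol's actual interface.

```python
import hashlib


def preimage_hash(data: bytes) -> bytes:
    # Assumption: a 32-byte BLAKE2b digest identifies each piece of data.
    return hashlib.blake2b(data, digest_size=32).digest()


class ToyPreimageStore:
    """Hypothetical stand-in for whatever serves revealed data to a rollup."""

    def __init__(self) -> None:
        self._pages: dict[bytes, bytes] = {}

    def publish(self, data: bytes) -> bytes:
        h = preimage_hash(data)
        self._pages[h] = data
        return h

    def reveal(self, h: bytes) -> bytes:
        return self._pages[h]


def kernel_import(store: ToyPreimageStore, h: bytes) -> bytes:
    """Import external data into the rollup, trusting only its hash."""
    data = store.reveal(h)
    # Re-hashing means a dishonest provider cannot substitute other data.
    if preimage_hash(data) != h:
        raise ValueError("revealed data does not match the requested hash")
    return data


store = ToyPreimageStore()
h = store.publish(b"large payload kept off Layer 1")
print(kernel_import(store, h))  # b'large payload kept off Layer 1'
```

Only the short hash needs to travel through Layer 1; the payload itself can come from any source, since it is verified on arrival.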

There is no restriction on who can write and deploy a WASM kernel. An example of a kernel facilitating transactions is available here, and we expect such kernels to fully replace Transaction Optimistic Rollups once kernel-based rollups are activated on Mainnet.
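To make “a kernel interprets Layer 2 operations” more concrete, here is the core logic of a tiny transaction kernel written as plain Python. This is a hypothetical illustration only. Real kernels are WASM programs (for example, compiled from Rust) that read inbox messages and write outbox messages through the host API described in the reference documentation, and none of those calls appear here. The message format is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKernelState:
    ledger: dict[str, int] = field(default_factory=dict)
    # Withdrawals destined for Layer 1: (Layer 1 destination, amount).
    outbox: list[tuple[str, int]] = field(default_factory=list)


def run_kernel(state: ToyKernelState, inbox: list[dict]) -> ToyKernelState:
    """Interpret one batch of Layer 2 messages (hypothetical format).

      {"kind": "deposit", "to": ..., "amount": ...}     tickets from Layer 1
      {"kind": "transfer", "from": ..., "to": ..., "amount": ...}
      {"kind": "withdraw", "from": ..., "l1_dest": ..., "amount": ...}
    Invalid messages are skipped rather than halting the rollup.
    """
    for msg in inbox:
        amount = msg.get("amount", 0)
        if amount <= 0:
            continue
        kind = msg.get("kind")
        if kind == "deposit":
            state.ledger[msg["to"]] = state.ledger.get(msg["to"], 0) + amount
        elif kind in ("transfer", "withdraw"):
            sender = msg["from"]
            if state.ledger.get(sender, 0) < amount:
                continue  # insufficient balance: ignore the message
            state.ledger[sender] -= amount
            if kind == "transfer":
                state.ledger[msg["to"]] = state.ledger.get(msg["to"], 0) + amount
            else:
                # Asynchronous transfer back to a Layer 1 contract.
                state.outbox.append((msg["l1_dest"], amount))
    return state


state = run_kernel(ToyKernelState(), [
    {"kind": "deposit", "to": "alice", "amount": 10},
    {"kind": "transfer", "from": "alice", "to": "bob", "amount": 3},
    {"kind": "transfer", "from": "bob", "to": "alice", "amount": 5},  # ignored
    {"kind": "withdraw", "from": "alice", "l1_dest": "KT1dest", "amount": 2},
])
print(state.ledger, state.outbox)  # {'alice': 5, 'bob': 3} [('KT1dest', 2)]
```

Skipping bad messages instead of failing is a deliberate choice in this sketch: since anyone can post to a rollup's inbox, a kernel that crashed on malformed input would be easy to disrupt.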

We encourage the ecosystem to start identifying use cases for kernel-based optimistic rollups, and to prepare for building WASM kernels of their own. We are currently finalizing the necessary documentation and guides, but anyone interested in getting started with building kernels can reach out to contact@nomadic-labs.com.

More info will be released soon. Stay tuned!

Kathmandu, the latest Tezos upgrade, is LIVE!

2022-09-23T23:00:00+02:00

On September 23, 2022, at 20:36:44 UTC, the Tezos blockchain successfully upgraded by activating the Kathmandu proposal at block #2,736,129.

This 11th Tezos protocol upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori, Tweag, and G.-B. Fefe (an anonymous contributor).

Kathmandu’s main features are:

Smart contract optimistic rollups: the next generation of optimistic rollups for Tezos begins its journey towards future integration in Tezos. At this stage, these rollups are available on the bleeding-edge Mondaynet and Dailynet test networks to ensure the community has sufficient time to build integration, tooling and applications.
Pipelined validation of manager operations: increasing throughput for Tezos’ Mainnet, without compromising the network’s safety. The contributions of this ongoing project to Kathmandu reduce the need to fully execute time-expensive operations (like smart contract calls) before they reach a baker, resulting in faster propagation of new blocks and operations across the network.
Improved randomness: integration of Verifiable Delay Functions into the protocol’s random seed generation, reinforcing the security of the rights allocation mechanism.
Tailored governance support for permanent testnets: changes brought by Kathmandu will reduce the need for user-activated upgrades in Ghostnet. The Oxhead Alpha team will be able to centrally, and automatically, upgrade this test network after a protocol proposal is elected in the Promotion period on Tezos Mainnet.
Event logging in Michelson smart contracts: this new feature will enable DApp developers to send publicly visible on-chain messages in order to trigger effects in off-chain applications.
A new operation for increasing paid storage of a smart contract.

For more details, see our Kathmandu preview post.

A deeper technical description can be found in the protocol proposal’s technical documentation, and a complete list of changes is provided in Kathmandu’s changelog.

Building solid foundations for the Future… now

As the Tezos network evolves, and new exciting features boost adoption, broad testing becomes increasingly important. Therefore, it is paramount to have as many bakers, builders and users as possible in our public test networks.

By running nodes, producing blocks, and deploying apps and infrastructure as early as possible, the community can gain significant foresight into the integration of new features. In addition, protocol developers can gain valuable insights from feedback, and adjust the design and implementation of features while protocols are still being developed.

In particular, extensive testing of next-generation optimistic rollups, currently available on Mondaynet and Dailynet, is of paramount importance, as these form the backbone of Tezos’ scaling strategy.

Read more about Tezos testnets here, and don’t hesitate to reach out in the Tezos Developer Slack or in the Tezos Discord if you need help getting started.

Monitoring Your Node with Octez Metrics

2022-09-22T14:00:00+02:00

Introduction

Until now, the only tool developers had to monitor the behavior of their Tezos node was to look at the logs, adjust the log verbosity, and reconstruct all relevant information from this stream. But getting more insight into a node’s performance was tedious and difficult. For instance, the number of connected peers, the number of pending operations, or the number of times the validator switched branches, were not easy to observe continuously.

After a few iterations of different methods to gather node information and statistics that could be easily analyzed, we have recently chosen to include metrics within the node. With Octez Metrics it’s simple to get a myriad of statistics about your node — and quite efficiently so. You can also attach a Grafana dashboard to get a visual representation of how your node is performing. And with Grafazos, you can get customized ready-to-use dashboards for monitoring your Tezos node.

A Grafazos dashboard looks like this:

Table 1: Grafana dashboard of a Tezos node

As you can immediately see at the top, the dashboard will tell you your node’s bootstrap status and whether it’s synchronized, followed by tables and graphs of other data points.

Node metrics

In previous versions of Octez, a separate tool was needed for this task. tezos-metrics exported the metrics which were computed from the result of RPC calls to a running node. However, since the node API changed with each version, it required tezos-metrics to update alongside it, resulting in as many versions of tezos-metrics as Octez itself. Starting with Octez v14, metrics were integrated into the node and can be exported directly, making it simple to set up. Moreover, as the metrics are now generated by the node itself, no additional RPC calls are needed anymore. This is why the monitoring is now considerably more efficient!

Setting up Octez Metrics

To use Octez Metrics, you just start your node with the metrics server enabled. The node integrates a server that registers the implemented metrics and outputs them for each /metrics HTTP request.

When starting your node, add the --metrics-addr option, which takes <ADDR:PORT>, <ADDR>, or :<PORT> as its parameter. The option can be given either on the command line when starting your node, or in the configuration file (see https://tezos.gitlab.io/user/node-configuration.html).
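To make the three accepted forms concrete, here is how a monitoring script might interpret a --metrics-addr value when deciding where to scrape. This Python sketch is not the node's own parser. The host used for the :<PORT> form is our assumption, the default port 9932 comes from this post, and IPv6 literals (which contain colons) are deliberately not handled.

```python
DEFAULT_METRICS_PORT = 9932  # the default metrics port


def parse_metrics_addr(spec: str, default_host: str = "localhost") -> tuple[str, int]:
    """Split a value of the form ADDR:PORT, ADDR, or :PORT into (host, port).

    default_host is an assumption, used only for the :PORT form.
    IPv6 literals are not supported by this sketch.
    """
    if ":" not in spec:
        return spec, DEFAULT_METRICS_PORT          # ADDR form
    host, _, port = spec.rpartition(":")
    return (host or default_host), int(port)      # ADDR:PORT or :PORT form


print(parse_metrics_addr(":9932"))  # ('localhost', 9932)
```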

Your node is now ready to have its metrics scraped with requests to the metrics server. For instance, if the node is configured to expose metrics on port 9932 (the default), you can scrape them with the request http://localhost:9932/metrics. The result is the list of node metrics, each described in the following format:

# HELP octez_metric_name metric description
# TYPE octez_metric_name metric type
octez_metric_name{label_name="label_value"} x.x
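Because the output is plain text, a few lines of code are enough to turn a scrape into structured data. The Python sketch below assumes the standard Prometheus text exposition format. `parse_metrics` and `scrape` are hypothetical helpers, and the parser intentionally skips cases it does not handle (escaped quotes in label values, trailing timestamps). In practice, a Prometheus server does this parsing for you.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value.
SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$")
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse /metrics output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE.match(line)
        if m is None:
            continue  # unsupported line (e.g. with a timestamp): skip it
        name, raw_labels, value = m.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(url: str = "http://localhost:9932/metrics") -> list[tuple[str, dict[str, str], float]]:
    """Fetch and parse the metrics of a running node."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```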

Note that the metrics are implemented to have the lowest possible impact on node performance, and most of them are only computed when scraped, so starting the node with the metrics server enabled shouldn’t be a cause for concern. More details on Octez Metrics can be found in the Tezos Developer Documentation: see here for further detail on how to set up your monitoring, and here for the complete list of metrics exported by the Octez node.

Types of metrics

The available metrics give a full overview of your node, including its characteristics, status, and health. In addition, they can give insight into whether an issue is local to your node, or it is affecting the network at large — or both.

The metric octez_version delivers the node’s main properties through label-value pairs, such as the node version and the network it is connected to.

Other primary metrics you likely want to see are the chain validator¹ ones, which describe the status of your node: octez_validator_chain_is_bootstrapped and octez_validator_chain_synchronisation_status. A healthy node should always have these values set to 1. You can also see information about the head and requests from the chain validator.
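That rule of thumb translates directly into an automated check. A minimal Python sketch, assuming the scrape has already been parsed into (metric name, labels, value) tuples, as any simple parser of the /metrics output would produce. `chain_is_healthy` is a hypothetical helper, and a missing metric is treated as unhealthy.

```python
Sample = tuple[str, dict[str, str], float]  # (metric name, labels, value)

REQUIRED_STATUS = {
    "octez_validator_chain_is_bootstrapped",
    "octez_validator_chain_synchronisation_status",
}


def chain_is_healthy(samples: list[Sample]) -> bool:
    """True when both chain validator status metrics are present and equal to 1."""
    values = {name: value for name, _labels, value in samples if name in REQUIRED_STATUS}
    return set(values) == REQUIRED_STATUS and all(v == 1 for v in values.values())
```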

There are two other validators, the block validator² and the peer validator³, which give you insight into how your node is handling the progression of the chain. You can learn more about the validators here.

To keep track of pending operations, you can check the octez_mempool metric.

You can get a view of your node’s connections with the p2p layer metrics (prefixed with octez_p2p). These metrics allow you to keep track of the connections, peers and points of your node.

The store can also be monitored with metrics on the save-point, checkpoint and caboose level, including the number of invalid blocks stored, the last written block size, and the last store merge time.

Finally, if you use your node’s RPC server, it is likely central to how your node operates. Each RPC called is associated with two metrics: octez_rpc_calls_sum{endpoint="...",method="..."} and octez_rpc_calls_count{endpoint="...",method="..."} (with the appropriate label values). octez_rpc_calls_sum is the sum of the execution times, and octez_rpc_calls_count is the number of executions.
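Dividing the sum by the count gives the average execution time per RPC, in whatever unit the sum is reported. The Python sketch below assumes the scrape is already parsed into (name, labels, value) tuples. It also assumes that, as is conventional for Prometheus summaries, both values accumulate from node start; that is why `window_mean` uses the difference between two scrapes to get the average over a recent window. Both helpers are hypothetical.

```python
from typing import Optional

Sample = tuple[str, dict[str, str], float]  # (metric name, labels, value)


def mean_rpc_times(samples: list[Sample]) -> dict[tuple[str, str], float]:
    """Average execution time per (endpoint, method) since the counters started."""
    sums: dict[tuple[str, str], float] = {}
    counts: dict[tuple[str, str], float] = {}
    for name, labels, value in samples:
        key = (labels.get("endpoint", ""), labels.get("method", ""))
        if name == "octez_rpc_calls_sum":
            sums[key] = value
        elif name == "octez_rpc_calls_count":
            counts[key] = value
    # Skip endpoints with no recorded calls to avoid dividing by zero.
    return {key: total / counts[key] for key, total in sums.items() if counts.get(key)}


def window_mean(prev: tuple[float, float], curr: tuple[float, float]) -> Optional[float]:
    """Average time per call between two scrapes, given (sum, count) at each."""
    (sum0, count0), (sum1, count1) = prev, curr
    calls = count1 - count0
    return (sum1 - sum0) / calls if calls > 0 else None
```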

Note that the metrics described here are those available in Octez v14, and they are likely to evolve with future Octez versions.

Dashboards

While scraping metrics with server requests does give access to node metrics, it is, unfortunately, not enough for useful node monitoring. Since it only gives a single slice into the node’s health, you don’t really see what’s happening over time. Therefore, a more useful way of monitoring your node is to create a time series of metrics.

Indeed, if you liked the poster, why not see the whole movie?

The Prometheus tool is designed for this purpose, to collect metric data over time.

In addition, to get the most out of your metrics, you should pair them with a visual dashboard. A Grafana dashboard generated by Grafazos gives a much better view into your node. Once your node is running, you can feed the extracted time series of metrics to Grafana dashboards.

The Grafazos version for Octez v14 provides the following four ready-to-use dashboards:

octez-compact: A compact dashboard that gives a brief overview of the various node metrics on a single page.
octez-basic: A basic dashboard with all the node metrics.
octez-with-logs: Same as basic, but also displays the node’s logs, using Promtail (for exporting the logs).
octez-full: A full dashboard with the logs and hardware data. This dashboard should be used with Netdata (for supporting hardware data) in addition to Promtail.

Note that the last two dashboards require additional (though standard) tools for hardware metrics and logs (Netdata, Loki, and Promtail).

Let’s look at the basic dashboard in more detail. The dashboard is divided into several panels. The first one is the node panel, which can be considered the main part of the dashboard. This panel lays out the core information on the node such as its status, characteristics, and statistics on the node’s evolution (head level, validation time, operations, invalid blocks, etc.).

The other panels are specific to different parts of the node:

the p2p layer;
the workers;
the RPC server;

along with a miscellaneous section.

Some metrics are self-explanatory, such as P2P total connections, which shows both the connections your node initiated and the number of connections initiated by peers. Another metric you may want to keep an eye on is Invalid blocks history, which should always be 0; any other value would indicate that something unusual or malicious is going on.

Another useful metric is the Block validation time, which measures the time from when a request is registered in the worker until the worker pops the request and marks it complete. This should generally be under 1 second. If it is persistently longer, that could also indicate trouble.
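To act on "persistently longer" rather than on a single slow block, one option is to raise a flag only when every one of the most recent samples exceeds the threshold. The sketch below is a hypothetical helper, not part of Octez or Grafazos; the window size is a free parameter and the 1-second default comes from the guideline above.

```ocaml
(* Returns true only when each of the last [window] block validation times
   (in seconds, oldest first) exceeds [threshold]. A single slow block
   therefore does not trigger the flag. *)
let persistently_slow ?(threshold = 1.0) ~window (samples : float list) : bool =
  let n = List.length samples in
  if window <= 0 || n < window then false
  else
    (* keep only the most recent [window] samples *)
    let recent = List.filteri (fun i _ -> i >= n - window) samples in
    List.for_all (fun t -> t > threshold) recent
```

In practice the same rule would more likely be expressed as a Grafana or Prometheus alert with a "for" duration; the sketch only makes the logic explicit.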

Graph 2: Block validation time

The P2P connections graph will show you immediately if your node is having trouble connecting to peers, or if there’s a drop-off in the number of connections. A healthy node should typically have a few dozen peer connections (depending on how it was configured).

Graph 3: P2P connections

The Peer validator graph shows a number of different metrics, including unavailable protocols. On an up-to-date, healthy node this number should be low. If it is not, your node may be running an old version of Octez, or it may be receiving bad data from peers.

Note again that these dashboards are built for Octez v14 and are likely to evolve with future Octez versions.

Working with Grafazos

Grafazos allows you to set different options when generating the ready-to-use dashboards described above. For instance, you can specify the node instance label, which is useful for a dashboard that aims to monitor several nodes.

Furthermore, you can manually explore the metrics from the Prometheus data source with Grafana and design your own dashboards. Alternatively, you can use Grafazos to import ready-to-use dashboards for monitoring your node. You can find the packages stored here; there is a package for each version of Octez.

Grafana is a relatively user-friendly tool, so play with creating a custom dashboard as you like. You may also want to use the “explore” section of Grafana. Grafazos is also particularly useful in automatic deployment of Tezos nodes via provisioning tools such as Puppet or Ansible.

Conclusion

We developed Octez Metrics to give Tezos users better insight into how their node is performing, and to observe the overall health of the network. This product will continuously improve, and we encourage node operators and bakers to suggest new features and additional metrics they would like to see. We believe that the best way to keep your node healthy, and in turn to keep the entire Tezos network healthy, is to provide everyone with the tools needed to monitor their setup.

The chain validator is responsible for handling valid blocks and selecting the best head for its chain.
The block validator validates blocks and notifies the corresponding chain validator.
Each peer validator treats new head proposals from its associated peer, retrieving all the operations, and if valid, triggers a validation of the new head.

Verify, but test: extracting QCheck Property-Based Tests from F* specifications

2022-09-12T17:00:00+02:00

The focus of this blog post is on the automated extraction of an F* specification to OCaml and its execution as a QCheck Property-Based Test against an OCaml implementation. This work was done by Antonio Locascio during a 4-month internship at Nomadic Labs under the supervision of Germán Delbianco and Marco Stronati.

This work will also be presented at the ML 2022 workshop.

Introduction

A typical workflow when developing high-assurance software is to start with a prototype to better understand the problem at hand, write a specification for the desired system (having learnt from all the mistakes in the process of implementing said prototype), write a reference implementation of the specification, and then formally verify that the implementation matches the specification, a process that is slow and costly.

However, in a fast-paced development cycle, the not-yet-verified implementation is often sent to production, continues to be improved over time, and becomes a moving target for the verification effort, which often does not bear fruit. This is not a new problem. One of the main goals behind the development of novel formal verification tools has been to reduce the cost of verification so that it can be better integrated into the development cycle (see for example proof oriented programming).

In many cases it is possible today to extract an implementation that matches a specification, thus merging the last two steps of the workflow described above.

Going a step further, we would like to start sketching a specification while we develop the prototype, and use this model to test the prototype itself. This moves us from a linear development process to an iterative one, where implementation and specification are constantly refined and adapted. When the specification abstracts away implementation details, however, the verification gap problem arises: given that the verified model is only loosely coupled with the implementation, do the properties that we have proved for the model really hold for the implementation?

One way of addressing the verification gap problem is to rely on Property-Based Testing (PBT), that is, to extract the model as properties that can be executed as tests on the implementation. The focus of this work is a lightweight automated tool for extracting F* specifications to OCaml and executing them as QCheck tests against an OCaml implementation. We propose that OCaml programmers write specifications as F* programs instead of manually writing QCheck tests. Our tool then automatically extracts the specification as QCheck properties, which can be used to test either the code extracted from the model or another implementation.

Thanks to this approach, rather than waiting for a finished implementation and then attempting a complete formal verification, we can move along a verification spectrum: start with a prototype implementation tested by PBT against its co-developed abstract model, then refine the model over time, extracting parts of it to replace the corresponding parts of the implementation, until (possibly) a full extraction is achieved.

It should be noted that a full extraction is sometimes too costly, as it may require the model to precisely represent complex parts of the code that are not essential to the program’s core functionality. However, being able to find the sweet spot along this spectrum where only the crucial parts have been verified is still very valuable and common practice.

Even where a full extraction can be achieved, the complex toolchains and large trusted computing base can still benefit from a testing infrastructure. For example, it is common practice to assume (and axiomatize) the existence of a List library that, during extraction, will be linked to the List module of the standard library, despite the fact that there is no formal connection between the two. For this reason, even if our properties have already been proved in the model, they should also be tested in the extracted implementation. In other words, the verification gap may never be fully closed, even at the end of the spectrum, and property-based testing is a way to account for it in the development process.

The rest of the document is structured as follows: we first give a brief overview of the F* language and Property-Based Testing. Then, we give a detailed account of our proposed specification extraction tool for F*. After that, we present the case study for our workflow, Incremental Merkle Trees, which illustrates the wide spectrum of verification and testing strategies. Next, we describe our experience of working with F*, and explain our contributions to that project. Finally, we discuss related projects and some future work.

Overview

The workflow presented in this post, and the tooling developed, require choosing a verification framework capable of extracting correct code to the target programming language, and a PBT library for the target language. In particular, given our OCaml codebase, we choose F* as a mechanization medium and QCheck as a testing library. In the following, we give a brief overview of both systems.

An F* primer

F* is an ML-style functional programming language aimed at formal verification. F* combines automated verification through an SMT solver with the interactive functionality of a proof assistant based on dependent types, making it a hybrid verification tool. After the verification step, an F* program can be extracted to OCaml in order to be executed. Additionally, F* supports meta-programming, which was mainly developed to support proofs by tactics, but which is a crucial feature for this work, as it allows the extraction of specifications.

We’ll work through an example to introduce F*’s fundamental aspects. Let’s take a look at the following function, which reverses a list:

val reverse : list α -> Tot (list α)
let rec reverse l =
  match l with
  | []      -> []
  | x :: xs -> reverse xs @ [x]

Those familiar with ML-style languages (such as OCaml) will straight away feel at home with F*’s syntax. One difference with OCaml’s syntax is that in F* signatures are usually given before a definition. These are a central aspect of F*, because of its rich type system.

One key feature of the type system is the effect system. Notice that in the return type of the example, Tot is an effect annotation, expressing that the function is total, pure and non-diverging. The generic shape of a function type in F* is α -> E β, where E is an effect. Additional built-in effects are Dv for diverging functions, ST for stateful computations, and ML for functions with arbitrary side-effects. Tot is the default effect, meaning that its declaration can be omitted.

Refinement types are another important part of F*’s type system. The refinement type x:T{φ(x)} is a sub-type of T for which all its inhabitants satisfy the property φ, also called the refinement. In other words, type x:T{φ(x)} is the type of values x from T for which φ(x) is true. For simplicity, we can assume that the predicates φ are total functions returning a boolean, although they can be more general. For instance, we can use them to give a more precise signature to our example:

val reverse : l:list α -> Tot (l':(list α){length l' = length l})

Here, we use the refinement type to state that we expect the resulting list to have the same length as the original one. To refer to the original list, we need to name the argument (l) in the signature, making this a dependent function type.

F* provides an alternative syntax for writing a function’s pre- and post-conditions explicitly. Let’s look at it with a new example:

val tail : l:list α -> Pure (list α)
                            (requires (length l > 0))
                            (ensures (fun l' -> length l' = length l - 1))

Pure is just a version of the Tot effect that, in addition to the type of the return value (list α), is parameterized by a pre-condition (requires clause) and a post-condition (ensures clause). This signature is therefore equivalent to:

val tail : l:(list α){length l > 0} ->
           Tot (l':(list α){length l' = length l - 1})
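Outside F*, the same contract can only be expressed as runtime checks. The OCaml sketch below writes the requires and ensures clauses of tail as boolean predicates; the names tail_pre, tail_post and check_tail are illustrative choices for this post, and the implementation of tail is a plain OCaml one rather than extracted code.

```ocaml
(* The requires and ensures clauses of [tail] as boolean predicates. *)
let tail_pre (l : 'a list) : bool = List.length l > 0
let tail_post (l : 'a list) (l' : 'a list) : bool =
  List.length l' = List.length l - 1

(* Plain OCaml implementation: without refinement types, the
   pre-condition becomes a runtime obligation on the caller. *)
let tail (l : 'a list) : 'a list =
  match l with
  | [] -> invalid_arg "tail: empty list"
  | _ :: xs -> xs

(* The contract holds on input [l] if the pre-condition fails (nothing is
   promised) or the post-condition holds on the result. *)
let check_tail l = (not (tail_pre l)) || tail_post l (tail l)
```

The short-circuiting || ensures tail is never called on an input that violates its pre-condition.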

It’s not always desirable to pollute a function’s signature with properties about it. That’s when lemmas come into play. In F*, a lemma is a computationally irrelevant function (a total function returning unit), used to state some property about a program. For example, we could state (and subsequently prove) that our reverse function is involutive with the following lemma:

val reverse_involutive : l:list α -> Lemma (l = reverse (reverse l))
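A lemma has no computational content, but its statement can still be evaluated on concrete inputs. The sketch below, written by hand in OCaml with only the standard library, checks the statement of reverse_involutive on randomly generated lists; this is the idea that the rest of the post automates.

```ocaml
(* OCaml counterpart of the F* [reverse] above (quadratic, like the spec). *)
let rec reverse = function
  | [] -> []
  | x :: xs -> reverse xs @ [x]

(* The lemma's statement as a boolean property. *)
let reverse_involutive l = reverse (reverse l) = l

(* Random int lists of length 0..20. *)
let random_list st =
  List.init (Random.State.int st 21) (fun _ -> Random.State.int st 1000)

let () =
  (* fixed seed so that runs are reproducible *)
  let st = Random.State.make [| 42 |] in
  for _ = 1 to 500 do
    let l = random_list st in
    if not (reverse_involutive l) then
      failwith "reverse_involutive: counterexample found"
  done;
  print_endline "reverse_involutive: 500 random lists passed"
```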

A more detailed introduction to the F* language can be found in its tutorial.

Property-Based Testing with QCheck

Property-based testing (or PBT for short) is a testing discipline in which properties about programs are checked to be true by validating them on a large number of randomly generated examples. Property-based testing a program generally requires the user to do two things: (i) to define the desired properties about the code, and (ii) to provide functions for generating random inputs for those properties (commonly known as generators).
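The two ingredients, properties and generators, can be seen in a deliberately tiny driver built on the standard library's Random module. This is not QCheck and has none of its features (such as shrinking or reporting); the names gen, check and gen_pair are hypothetical.

```ocaml
(* A generator draws a random value from a PRNG state. *)
type 'a gen = Random.State.t -> 'a

(* Runs [prop] on [count] generated inputs and returns the first
   counterexample found, if any. *)
let check ?(count = 1000) ?(seed = 0) (gen : 'a gen) (prop : 'a -> bool) : 'a option =
  let st = Random.State.make [| seed |] in
  let rec loop i =
    if i >= count then None
    else
      let x = gen st in
      if prop x then loop (i + 1) else Some x
  in
  loop 0

(* (ii) a generator for pairs of small integers *)
let gen_pair : (int * int) gen =
  fun st -> (Random.State.int st 100, Random.State.int st 100)

(* (i) properties: one true, one false *)
let () =
  assert (check gen_pair (fun (a, b) -> a + b = b + a) = None);
  match check gen_pair (fun (a, b) -> a - b = b - a) with
  | Some (a, b) -> Printf.printf "counterexample: (%d, %d)\n" a b
  | None -> print_endline "no counterexample found"
```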

To aid the definition of these properties and custom data type generators, several libraries have been developed. One of these is QCheck, which offers a wide range of combinators to do PBT in OCaml. In this project we present a tool that automatically extracts QCheck properties from F* specifications. The generator definition, however, is still left to the user, as explained in the following section.

Specification Extraction Toolchain

Let’s start with some motivation for using a specification extraction tool. Say you have an OCaml program, or even an idea for one, and you want to write a specification for it. We propose that, instead of directly writing QCheck tests for it, you write the specification as an F* program. If you do this, our tool will automatically extract the specification as QCheck properties, which can be used to test both the code extracted from the model and the original implementation.

It’s not required to do a full verification of the code in order to use our tool. As previously mentioned, one could specify an abstract model of the implementation, whose properties could be tested against the more complex OCaml implementation. Further, one might even write the specification as an axiomatization of the implementation, i.e., assuming the properties and leaving their actual proofs as a future step.

In order to extract a specification written in F* as properties for property-based testing we introduce a Meta-F* tactic called extract_spec, which is defined in the SpecExtraction module.

This tactic does two things. First, it extracts a function’s pre- and post-conditions as OCaml boolean functions. Second, it synthesizes the QCheck boilerplate code for defining the test.

Functionality

Before getting into the tool’s design, let’s take a look at a simple example of its usage. Its main functionality consists of extracting a function’s or lemma’s pre- and post-conditions as OCaml boolean functions. For instance, given the following F* function:

val foo : x:T{P1 x} -> Pure T' (requires (P2 x)) (ensures (fun y -> Q x y))
let foo x = ...

One can call the extract_spec tactic through a splice, which inserts syntax generated through Meta-F*. This is done by adding the following line (anywhere foo is in scope):

%splice[] (extract_spec (`%foo))

As a result, in the extracted OCaml code of the module where the splice was called, the following predicates will be defined:

let foo_pre = fun x -> P1 x && P2 x
let foo_post = fun x -> fun y -> Q x y

As shown in this example, a function’s pre-condition is parametrized by the function’s arguments, while its post-condition takes as an additional argument the function’s return value.
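To make the pattern concrete, take a hypothetical foo with signature val foo : x:int{x >= 0} -> Pure int (requires (x < 100)) (ensures (fun y -> y = 2 * x)). The OCaml below is hand-written to show the shape of predicates one would expect extract_spec to produce for it; it uses native ints where extracted code would use F*'s unbounded integers.

```ocaml
(* Hypothetical implementation of foo. *)
let foo (x : int) : int = x + x

(* Implicit refinement (x >= 0) conjoined with the requires clause. *)
let foo_pre = fun x -> x >= 0 && x < 100

(* The post-condition also receives the return value y. *)
let foo_post = fun x -> fun y -> y = 2 * x
```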

The second step, generating the QCheck tests, is performed by the make_test tactic (called by extract_spec). Continuing with the example, this tactic generates the following declarations:

let (gen_foo_args : T FStar_QCheck.arbitrary)
  = FStar_QCheck.undefined_gen "Generator for foo's input not yet implemented"

let (test_foo_spec : FStar_QCheck.test_t) =
  FStar_QCheck.test_make "foo_spec" gen_foo_args
    (fun a0 ->
      match a0 with
      | x -> (FStar_QCheck.assume_ (foo_pre x);
              foo_post x (foo x)))

The first of these corresponds to a template definition of the QCheck generator for foo’s arguments. Deriving the generators is outside the scope of this project. Some discussion regarding this can be found in this issue.

The second is the definition of the QCheck test, in which the pre-condition is assumed and the post-condition is checked. The FStar_QCheck module is just a simple interface wrapper to expose some QCheck functions in F*.
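An assumed pre-condition does not make a test fail: inputs that violate it are discarded. The standard-library sketch below imitates that behaviour with an exception; assume_, Precondition_failed and run are hypothetical names chosen for this illustration, not the FStar_QCheck or QCheck API, and foo is the hypothetical function used above.

```ocaml
exception Precondition_failed

(* Abort the current test case, without failing it, when [cond] is false. *)
let assume_ cond = if not cond then raise Precondition_failed

type outcome = { passed : int; discarded : int; failed : int }

(* Runs [prop] on [count] generated inputs, counting discarded cases
   separately from failures. *)
let run ?(count = 1000) gen prop =
  let st = Random.State.make [| 0 |] in
  let passed = ref 0 and discarded = ref 0 and failed = ref 0 in
  for _ = 1 to count do
    match prop (gen st) with
    | true -> incr passed
    | false -> incr failed
    | exception Precondition_failed -> incr discarded
  done;
  { passed = !passed; discarded = !discarded; failed = !failed }

(* Hypothetical foo and its extracted-style predicates. *)
let foo x = x + x
let foo_pre x = x >= 0 && x < 100
let foo_post x y = y = 2 * x

(* Mirrors test_foo_spec: assume the pre-condition, check the post. *)
let result =
  run (fun st -> Random.State.int st 400 - 200)
    (fun x -> assume_ (foo_pre x); foo_post x (foo x))
```

Here only about a quarter of the generated inputs satisfy foo_pre, so many cases are discarded. This is one reason why hand-written generators that target the pre-condition matter.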

Tool’s design

The tool’s design revolves around Meta-F* tactics. Meta-F* is a metaprogramming framework for F*, i.e. a library for manipulating F* terms from within F*. A tactic, then, is simply a function that uses Meta-F*.

The bulk of the specification extraction work is performed by the extract_spec tactic, which takes as argument the name of the function the user wants to extract a specification from. Here is an overview of the algorithm it implements:

The first step is to query the environment to get the function’s type. This could either be annotated in a signature or inferred by the typechecker.
The retrieved type is subject to some (mostly trivial) transformations. The most interesting preprocessing step is synonym resolution (for example turning nat into v:int{v >= 0}), which is crucial to capture properties hidden in refinements.
Next, the type is traversed, one arrow at a time, collecting the names, types and refinements for each argument. For instance, for a type x1:T1{P1 x1} -> x2:T2{P2 x1 x2} -> C, collect_args will compute the list: [(x1;T1; fun x1 -> P1 x1); (x2;T2; fun x1 x2 -> P2 x1 x2)]. For each refinement we abstract all the previous binders. For now, we leave the final computation C untouched. It’s important to note that an argument’s type might have nested refinements, e.g. x:(v:int{v >= 0}){x < 5}. In this step, these refinements are flattened to get fun x -> x >= 0 && x < 5.
All of the arguments’ refinements are then joined to get the function’s implicit pre-condition.
Finally, the final computation type C is inspected. From it the post-condition (both from an ensures clause and the return type’s refinement) and potential explicit pre-condition (requires clause) are retrieved. When there is an explicit pre-condition, it is joined with the computed implicit pre-condition. There’s special treatment for lemmas, as in that case the post-condition doesn’t require an extra argument.
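The flattening and joining steps can be illustrated on a toy representation of argument types. This is a sketch of the idea only, not the Meta-F* implementation: it assumes every argument is an int and that each refinement mentions only its own argument, whereas the real tactic also abstracts over previous binders.

```ocaml
(* Toy argument types: a base type wrapped in zero or more refinements. *)
type ty =
  | Base of string               (* e.g. "int" *)
  | Refine of ty * (int -> bool) (* x:(ty){p x} *)

(* Flatten nested refinements into one predicate, inner ones first:
   x:(v:int{v >= 0}){x < 5} becomes fun x -> x >= 0 && x < 5. *)
let rec flatten (t : ty) : int -> bool =
  match t with
  | Base _ -> fun _ -> true
  | Refine (inner, p) -> let q = flatten inner in fun x -> q x && p x

(* Join all arguments' refinements into the implicit pre-condition.
   Raises Invalid_argument if the two lists differ in length. *)
let implicit_pre (args : (string * ty) list) : int list -> bool =
  fun values -> List.for_all2 (fun (_, t) v -> flatten t v) args values

(* Synonym resolution: nat stands for v:int{v >= 0}. *)
let nat = Refine (Base "int", fun v -> v >= 0)
let bounded = Refine (nat, fun x -> x < 5)
```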

Most of this procedure is performed by the pre_post tactic defined in SpecExtraction.PrePost. The QCheck test boilerplate generation, defined in SpecExtraction.TestGeneration.make_test, is not discussed, as it’s mostly trivial syntax declaration.

Limitations

Our specification extraction tool has two limitations worth mentioning.

The first one is that only bool refinements are supported. This restriction is necessary because Type predicates might not be computable: they can include universal and existential quantifiers anywhere in the term.

The second is that, for now, only pure functions work with the specification extraction mechanism. Although some other effect monads might be easy to support, this is not true in general. For instance, supporting the ubiquitous ST effect, which carries a model of the heap in its type, does not seem to be a trivial matter.

Case study: Incremental Merkle Trees

We applied our proposed workflow of verification and testing to the implementation of the Incremental Merkle Tree data structure, used in the Sapling protocol. This was a suitable target for our project, as:

It’s a self-contained piece of code, whose only important dependencies are well-specified cryptographic primitives.
The data structure has clear invariants, the preservation of which is not trivial.
There are multiple implementations for it, of varying complexity, which help showcase the wide spectrum of verification and testing models.

We now proceed to give a brief introduction to Sapling and IMTs, before delving into a detailed account of our case study.

Sapling

Sapling is a protocol used in Tezos that enables privacy-preserving transactions of tokens. For storage purposes, this protocol uses the Incremental Merkle Tree data structure, or IMT. This IMT structure is simply a fixed height Merkle tree, in which the leaves are only stored on the last level in the leftmost positions and cannot be deleted. Because of their use in Zero-Knowledge proofs, the IMTs must always be considered to be full trees of fixed capacity. In the next section, we provide a more detailed description of IMTs. For more documentation, refer to the protocol’s specification.

In Tezos, there are currently two implementations of Sapling. The first one, found in lib_sapling, implements IMTs through purely functional ADTs. In contrast, the second implementation, which is part of the economic protocol, is written in a stateful style to make use of the protocol’s key-value storage. While the latter is more efficient, the former is simpler and thus easier to verify.

Incremental Merkle Trees

An Incremental Merkle Tree of height h contains 2^h leaves and h + 1 levels of nodes, with all leaves at level 0 and root at level h.

For now, we focus on the IMT implementation found in the Storage module from lib_sapling. This implementation includes the following definition of an algebraic data type tree to represent IMTs:

type tree = Empty | Leaf of C.Commitment.t | Node of (H.t * tree * tree)

Trees of this type have commitments in their leaves and hashes (H.t) in their nodes. Although both have the same internal representation, they are differentiated because the commitments correspond to the values stored in the tree, while the nodes’ hashes are used to preserve integrity (as in any Merkle tree).

The trees are always treated as being full, using the default value H.uncommitted 0 for unused leaves. All the nodes at the same level of an empty tree have the same hash, which can be computed from the default value of the leaves. These hashes are stored in the H.uncommitted list, so H.uncommitted n is the hash of an empty tree of height n.

To avoid storing huge empty trees, any subtree filled with default values is represented by the Empty constructor; given its height, its hash can be computed using the H.uncommitted list. Because of this, we’ll sometimes refer to this representation of IMTs as compressed trees.
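The sketch below shows how such a table of empty-subtree hashes can be precomputed and used to hash a compressed tree. The hash is a stand-in built on the standard library's Digest (MD5) with the level mixed in, and default_leaf is a made-up value; lib_sapling uses its own Merkle hash and default commitment.

```ocaml
(* Stand-in Merkle hash, NOT lib_sapling's: MD5 of level and children. *)
let merkle_hash (level : int) (l : string) (r : string) : string =
  Digest.string (string_of_int level ^ "|" ^ l ^ r)

let default_leaf = "uncommitted-leaf" (* hypothetical default value *)

(* unc.(n) is the hash of an empty tree of height n: unc.(0) is the
   default leaf, and each level hashes two copies of the level below. *)
let uncommitted (max_height : int) : string array =
  let a = Array.make (max_height + 1) default_leaf in
  for n = 1 to max_height do
    a.(n) <- merkle_hash (n - 1) a.(n - 1) a.(n - 1)
  done;
  a

type tree = Empty | Leaf of string | Node of string * tree * tree

(* Hash of a compressed tree of height h: Empty subtrees are not stored,
   so their hash is looked up by height. *)
let root_hash (unc : string array) (h : int) (t : tree) : string =
  match t with
  | Empty -> unc.(h)
  | Leaf c -> c
  | Node (ha, _, _) -> ha
```

Precomputing the table costs one hash per level, which is what makes it cheap to treat every IMT as full.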

The leaves are indexed by their position pos, ranging from 0 to (2^h) - 1. The tree is incremental in the sense that leaves cannot be modified, only added, and exclusively in successive positions from the leftmost to the rightmost leaf, i.e., from 0 to (2^h) - 1.
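Since leaves are filled left to right, the next free position is simply the number of leaves already inserted, and the path from the root to any position is encoded in its binary digits. The helper below illustrates this general property of full binary trees; it is not taken from lib_sapling.

```ocaml
(* Path from the root of a full binary tree of height [height] to the leaf
   at position [pos] (0 <= pos < 2^height): bits of pos, most significant
   first, where 0 means go left and 1 means go right. *)
let path_to_leaf ~(height : int) (pos : int) : [ `Left | `Right ] list =
  if pos < 0 || pos >= 1 lsl height then
    invalid_arg "path_to_leaf: position out of range";
  List.init height (fun i ->
      let bit = (pos lsr (height - 1 - i)) land 1 in
      if bit = 0 then `Left else `Right)
```

For example, in a tree of height 3, position 5 (binary 101) is reached by going right, left, then right.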

IMT F* Model

Commitments axiomatization

Let’s begin our dive into the model by focusing on the Commitments module. This interface axiomatizes the cryptographic primitives needed for IMTs, namely commitments and hashes. Here is where most of the model’s assumptions are introduced, most notably the following lemma:

val perfect_merkle : h:v_height ->
  Lemma (ensures (forall h1 h2 h3 h4.
          merkle_hash h h1 h2 = merkle_hash h h3 h4
            ==> h1 = h3 /\ h2 = h4))

which states that the assumed Merkle hash function is injective at every valid level (h). This property, which can be read as “if two outputs are equal, then their inputs must be equal”, models the absence of collisions in an ideal hash function.

This module interface’s OCaml realization (ml/Commitments.ml) is just a wrapper for the corresponding functions in lib_sapling.

Machine integers

The second aspect of the model we’ll discuss is the use of machine integers. Our initial version of the model was based on F* unbounded ints, which are implemented as Zarith ints. In order to make the model as close to the implementation as possible, we refined the specification using machine integer types, following the ones used in the OCaml code. This not only means that we prove the absence of arithmetic over/underflow in our model, but also that the extracted code should have the same performance (w.r.t. arithmetic operations) as the original implementation.

In the OCaml implementation from lib_sapling, both Stdint and OCaml native ints are used. For the former, F* already provides [U]IntN modules in the standard library, that are extracted to their Stdint counterpart.

Things are a little trickier for OCaml’s native int type, whose representation is architecture dependent. For this model, we assume a 64-bit architecture, so we need to deal with 63-bit integers. To do so, we provide an interface module Int63 following the structure of those provided by the standard library. We also include an OCaml realization for it (ml/Int63.ml) that implements them using the native int type.
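On a 64-bit platform Sys.int_size is 63, so max_int is 2^62 - 1, and native int arithmetic silently wraps around on overflow. The hypothetical helper below makes the overflow visible at runtime; the point of the F* model is to prove statically that such overflows cannot occur in the first place.

```ocaml
(* Checked addition on native ints: None on overflow, Some sum otherwise.
   Overflow happens exactly when both operands have the same sign and the
   wrapped-around result has the opposite sign. *)
let add_checked (a : int) (b : int) : int option =
  let s = a + b in
  if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0) then None else Some s
```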

Refining the specification using these representations of machine integers proved to be much smoother than what we initially feared. There was a downside, however, from an extraction point of view, which will be discussed in a future section.

Main Tree model

Now we can delve into the main part of the model for IMTs, defined in the Tree module. There are three important components that build up this model, each implemented as a sub-module. The first of these is Tree.Data, which introduces the underlying data type that we use to represent IMTs.

type tree α β : Type =
  | Empty : tree α β
  | Leaf : α -> tree α β
  | Node : β -> l:tree α β -> r:tree α β -> tree α β

As implied by its name, the type tree models lib_sapling’s type of the same name. The only difference is that in our model, the types of the values of the leaves and internal nodes are parameterized. For our IMT spec, α will generally be instantiated with the type for commitments and β with the one for hashes.

This tree type, then, is a direct translation of its OCaml counterpart. Things only start getting interesting when the type is refined with richer properties. Hence, in Tree.Properties the core properties used to specify IMTs are presented, alongside some lemmas about them. Let’s take a look at the two properties that give IMTs their name.

val incremental : h:C.v_height -> t:tree α β -> Tot bool
let incremental h t =
    valid t && left_leaning t && balanced t && has_height h t

The first one is incrementality. This property essentially states that the leaves are filled in consecutive positions from left to right. Additionally, some other structural invariants are included in this property, such as the fact that no internal node should have two empty sub-trees. The next one, merkle, checks that the values in the internal nodes actually are the hashes of their sub-trees.
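As an OCaml illustration of these ingredients, the sketch below renders the parameterized tree type and the structural invariant mentioned above (no internal node with two empty sub-trees), together with to_list and count_leaves, which appear in insert_list's specification below. These definitions are plausible reconstructions for illustration, not the model's own; the other components of incremental (left_leaning, balanced, has_height) are not reproduced here.

```ocaml
type ('a, 'b) tree =
  | Empty
  | Leaf of 'a
  | Node of 'b * ('a, 'b) tree * ('a, 'b) tree

(* No internal node may have two empty sub-trees: such a node should
   have been represented as Empty in the compressed representation. *)
let rec valid = function
  | Empty | Leaf _ -> true
  | Node (_, Empty, Empty) -> false
  | Node (_, l, r) -> valid l && valid r

(* Leaf values in left-to-right order. *)
let rec to_list = function
  | Empty -> []
  | Leaf v -> [v]
  | Node (_, l, r) -> to_list l @ to_list r

let count_leaves t = List.length (to_list t)
```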

val merkle :
  h:C.v_height ->
  t:tree C.t_C C.t_H{valid t /\ has_height h t /\ balanced t} ->
  Tot bool
  (decreases %[v63 h])
let rec merkle h t =
  match t with
  | Empty -> true
  | Leaf _ -> true
  | Node ha l r ->
    let hs = pred63 h in
    ha = hash hs l r && merkle hs l && merkle hs r
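For readers more at home in OCaml, here is a transliteration of the merkle predicate. The hash function and the empty-subtree table are stand-ins (MD5 from the standard library's Digest module and a made-up default leaf), and hash_of is a hypothetical helper that returns the stored hash of a node, the commitment of a leaf, or the precomputed hash of an empty subtree.

```ocaml
type tree = Empty | Leaf of string | Node of string * tree * tree

(* Stand-in Merkle hash and empty-subtree hashes (hypothetical). *)
let merkle_hash level l r = Digest.string (string_of_int level ^ "|" ^ l ^ r)
let uncommitted n =
  let rec go i acc = if i = n then acc else go (i + 1) (merkle_hash i acc acc) in
  go 0 "uncommitted-leaf"

(* Hash of a compressed subtree of height h. *)
let hash_of h = function
  | Empty -> uncommitted h
  | Leaf c -> c
  | Node (ha, _, _) -> ha

(* [hash hs l r]: Merkle hash of two children of height hs. *)
let hash hs l r = merkle_hash hs (hash_of hs l) (hash_of hs r)

(* Every internal node stores the hash of its two children. *)
let rec merkle h t =
  match t with
  | Empty | Leaf _ -> true
  | Node (ha, l, r) ->
    let hs = h - 1 in
    ha = hash hs l r && merkle hs l && merkle hs r
```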

With these two properties, we can define the following type:

type imt (h:C.v_height) : Type  =
   t:tree C.t_C C.t_H {incremental h t && merkle h t}

So, imt h is the type of incremental Merkle trees of height h. As we’ll see next, this type makes it quite simple and elegant to specify that a tree manipulating function preserves the IMT invariants directly on the function’s signature.

The third component of the core IMT model is Tree.Methods, where the functions that operate on IMTs are defined. We’ll focus on insertion, as it’s the only way to modify an IMT. The insert_list function, whose signature we show next, inserts a list of commitments into a tree. Its implementation is a model of insert, from lib_sapling.

val insert_list :
  vs:list C.t_C ->
  pos:uint32 ->
  h:C.v_height ->
  t:(imt h) ->
  Pure (imt h)
       (requires (length vs <= pow2 (v63 h) - UInt32.v pos
                  && UInt32.v pos = count_leaves t))
       (ensures (fun t' ->  to_list t' = to_list t @ vs))
       (decreases %[v63 h;0])

Let’s break down this signature. The first three arguments are quite simple: a list of commitments vs, the position pos in the tree at which those commitments will be inserted, and the height h of the tree (which is bounded by 32). Then, the function takes the initial tree t, which must be an IMT of height h. The return type first states that the function is Pure, meaning that it terminates and has no side-effects. After that, it describes that the function returns an IMT of height h. Finally, we have the requires and ensures clauses, which define additional pre- and post-conditions. As pre-conditions, the function requires that the list of commitments fits in the tree, and that pos is the next position to fill. As a post-condition, the list of elements of the resulting tree must be equal to that of t with vs appended. The decreases clause simply helps the termination checker.

It’s important to note that some pre- and post-conditions are implicit (those in the types of the arguments and return value), while others are explicit (requires/ensures clauses). In this example, the implicit properties are used to state the preservation of the IMT invariants, while the explicit clauses are reserved for the functional specification of insert_list.

Insertion identities

There were three insertion functions defined in the initial IMT specification:

insert_model: a naive implementation that checks if the left sub-tree is full to determine the path,
insert_pow: an optimization carrying the next available position,
insert_list: a further optimization for batch insertion. This is the one found in the OCaml implementation.

In the Tree.InsertionIdentities module the first two are defined, and all three are proven to be semantically equivalent.
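The equivalence of the first two functions can also be tested rather than proved. The sketch below is a simplified OCaml rendering on trees without hashes: insert_model and insert_pow here are hypothetical reconstructions from the one-line descriptions above, not the F* definitions, and are compared after successive insertions into an empty tree.

```ocaml
(* Simplified trees: shape and leaves only, no hashes. *)
type 'a tree = Empty | Leaf of 'a | Node of 'a tree * 'a tree

let rec count = function
  | Empty -> 0
  | Leaf _ -> 1
  | Node (l, r) -> count l + count r

let full h t = count t = 1 lsl h

(* Children of a node; Empty (or an unexpected Leaf) yields two Empty. *)
let children t = match t with Node (l, r) -> (l, r) | _ -> (Empty, Empty)

(* Naive: go right iff the left sub-tree is already full.
   Assumes the tree of height h is not full. *)
let rec insert_model h v t =
  if h = 0 then Leaf v
  else
    let l, r = children t in
    if full (h - 1) l then Node (l, insert_model (h - 1) v r)
    else Node (insert_model (h - 1) v l, r)

(* Optimised: the next free position pos decides the path. *)
let rec insert_pow h pos v t =
  if h = 0 then Leaf v
  else
    let l, r = children t in
    let half = 1 lsl (h - 1) in
    if pos < half then Node (insert_pow (h - 1) pos v l, r)
    else Node (l, insert_pow (h - 1) (pos - half) v r)

(* Both functions agree after n <= 2^h successive insertions. *)
let same_after h n =
  let rec go i t1 t2 =
    if i = n then t1 = t2
    else go (i + 1) (insert_model h i t1) (insert_pow h i i t2)
  in
  go 0 Empty Empty
```

insert_model recomputes count at every level, while insert_pow only compares integers, which illustrates why carrying the position is an optimization.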

Full Merkle Tree Translation

As explained in a previous section, our model uses a compressed representation of IMTs, following their implementation. An important verification target was to prove a translation from this representation to a standard Merkle tree model. To do so, we used the off-the-shelf Merkle tree model from F*, being able to adapt the Merkle properties proven in this model for our IMT type.

Certified OCaml implementation extraction

For the certified OCaml implementation, we need to extract the Tree.Data and Tree.Methods modules. The module ml/Tree.ml bundles these two extracted modules, and should replace the original OCaml implementation.

To perform the extraction, we just ask F* to extract these modules to OCaml. This will generate the OCaml files in _out/ containing the extracted code (alongside the extracted specification).

We can now take a brief look at some of the extracted OCaml code. First, the tree is extracted to the expected OCaml type.

type ('a, 'b) tree =
  | Empty
  | Leaf of 'a
  | Node of 'b * ('a, 'b) tree * ('a, 'b) tree

More interestingly, the refined imt h type becomes:

type 'h imt = (Commitments.t_C, Commitments.t_H) Tree_Data.tree

Although the refinements are erased as expected, we’re left with a phantom type parameter h, which would be instantiated to unit in the rest of the extracted code. However, as the imt type is tagged with inline_for_extraction, it’s replaced by its definition. As for the insert_list function, the resulting OCaml code is:

let rec (insert_list :
  Commitments.t_C list ->
    Tree_Data.uint32 ->
      Commitments.v_height ->
        (Commitments.t_C, Commitments.t_H) Tree_Data.tree ->
          (Commitments.t_C, Commitments.t_H) Tree_Data.tree)
  =
  fun vs ->
    fun pos ->
      fun h ->
        fun t ->
          match (t, (Int63.eq h (0)), vs) with
          | (Tree_Data.Empty, true, v::[]) -> Tree_Data.Leaf v
          | (uu___, uu___1, []) -> t
          | (Tree_Data.Empty, uu___, uu___1) ->
              insert_node (Tree_Data.pred63 h) Tree_Data.Empty
                Tree_Data.Empty pos vs
          | (Tree_Data.Node (uu___, l, r), uu___1, uu___2) ->
              insert_node (Tree_Data.pred63 h) l r pos vs

Here we see that the final code looks quite similar to the original version, apart from some very clearly automatically generated names for the unused variables in patterns.

The extracted code has important shortcomings, some of which we managed to solve, as explained in a future section.

The Makefile target to run the extraction is:

make extract

Specification extraction and QCheck integration

In our concrete case, we put all the splices in a new module called Tree.PrePost.fst. This is not only helpful for organizing the extracted code, but also allows the user to verify the IMT model without running the specification extraction tactic (which can be slow).

In turn, running the splices will generate the OCaml file _out/Tree_PrePost.ml, containing the pre- and post-conditions and tests spliced with the extract_spec tactic.

We can take a look at the extracted property for insert_list’s pre-condition:

let (insert_list_pre :
  Commitments.t_C list ->
    Stdint.Uint32.t ->
      Int63.t ->
        (Commitments.t_C, Commitments.t_H) Tree_Data.tree -> bool)
  =
  fun vs ->
    fun pos ->
      fun h ->
        fun t ->
          (((Int63.gte h (0)) &&
              (Int63.lte h Commitments.max_height))
             &&
             ((Tree_Properties.incremental h t) &&
                (Tree_Properties.merkle h t)))
            &&
            (((Util.(Z.of_int << List.length) vs) <=
                ((Prims.pow2 (Tree_Data.v63 h)) - (Util.(Z.of_int << Stdint.Uint32.to_int) pos)))
               && ((Util.(Z.of_int << Stdint.Uint32.to_int) pos) = (Tree_Properties.count_leaves t)))

Although it may be somewhat hard to see at once, this is just the conjunction of the refinements in the function’s signature. Note that the << operator is just function composition.
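To make the structure of insert_list_pre easier to read, here is a hand-written sketch of the same conjunction in plain OCaml. It rests on these assumptions:

- Native ints stand in for Int63 and Uint32.
- Stdlib's Int64 stands in for Zarith's Z, so no external library is needed.
- max_height is a hypothetical value.
- The incremental and merkle predicates are omitted.

```ocaml
(* Function composition, as in the extracted code: (f << g) x = f (g x) *)
let ( << ) f g x = f (g x)

type ('a, 'b) tree =
  | Empty
  | Leaf of 'a
  | Node of 'b * ('a, 'b) tree * ('a, 'b) tree

let rec count_leaves = function
  | Empty -> 0
  | Leaf _ -> 1
  | Node (_, l, r) -> count_leaves l + count_leaves r

(* Hypothetical stand-ins: Int64 for Z.t, and an arbitrary height bound *)
let to_big = Int64.of_int
let max_height = 32

(* Simplified pre-condition: height in range, enough free leaves for vs,
   and pos equal to the number of leaves already in t.
   The incremental and merkle checks are deliberately left out. *)
let insert_list_pre vs pos h t =
  (0 <= h && h <= max_height)
  && (to_big << List.length) vs <= Int64.sub (Int64.shift_left 1L h) (to_big pos)
  && to_big pos = (to_big << count_leaves) t
```

The `to_big << List.length` pattern mirrors the extracted `Util.(Z.of_int << List.length)`: it measures a list and lifts the result to a wider integer type in one step.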

Finally, we want to actually test the certified implementation against the extracted properties. To do so, we first complete the definition of the extracted QCheck generator templates, which can be found in ml/generators.ml. Next, we copy the extracted QCheck tests into ml/test_internal_tree.ml. For instance, one of these tests is:

let (test_insert_model_spec : FStar_QCheck.test_t) =
  FStar_QCheck.test_make "insert_model_spec" gen_insert_model_args
    (fun a0 ->
       match a0 with
       | Prims.Mkdtuple2 (v, Prims.Mkdtuple2 (h, t)) ->
           (FStar_QCheck.assume_ (insert_model_pre v h t);
            insert_model_post v h t
              (Tree_InsertionIdentities.insert_model v h t)))

These splices are executed by running:

make extract

To compile the tests, just run:

make test

This will generate an executable called test.exe in the _out directory, which runs the tests.
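Each extracted test follows the same pattern:

1. Generate arguments.
2. Discard those that violate the pre-condition (assume_).
3. Check the post-condition on the function's result.

As an illustration only, the sketch below implements that pattern with the OCaml standard library's Random module instead of QCheck. The check function, the isqrt model and its pre- and post-conditions are hypothetical examples, not part of the project.

```ocaml
(* Outcome of one property run *)
type 'a outcome = { passed : int; discarded : int; failed : 'a list }

(* Hypothetical harness mimicking test_make + assume_:
   inputs failing [pre] are discarded, the others must satisfy [post] *)
let check ?(count = 1000) ~gen ~pre ~post f =
  let st = Random.State.make [| 42 |] in
  let rec loop i passed discarded failed =
    if i = count then { passed; discarded; failed = List.rev failed }
    else
      let x = gen st in
      if not (pre x) then loop (i + 1) passed (discarded + 1) failed
      else if post x (f x) then loop (i + 1) (passed + 1) discarded failed
      else loop (i + 1) passed discarded (x :: failed)
  in
  loop 0 0 0 []

(* Example model: integer square root with its specification *)
let isqrt n =
  let rec go r = if (r + 1) * (r + 1) <= n then go (r + 1) else r in
  go 0

let isqrt_pre n = n >= 0
let isqrt_post n r = r * r <= n && n < (r + 1) * (r + 1)

(* Generator over [-1000, 1000]; negative inputs exercise the discard path *)
let gen_int st = Random.State.int st 2001 - 1000
```

Discarding inputs rather than failing on them is what lets a single generator serve functions with narrow pre-conditions. The price is that a generator producing mostly invalid inputs tests very little.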

Testing the Protocol Sapling Implementation

As mentioned earlier, there are two Sapling implementations in Tezos. Up to now, we’ve solely focused on the one found in lib_sapling, for its relative simplicity. The other implementation is part of the Tezos Protocol, and is written in a stateful manner to make use of the Context (Tezos’ key-value store). This makes this IMT implementation much harder to verify directly than the ADT-based version.

This is a perfect opportunity to recall the verification spectrum we proposed in the introduction. The IMT implementation from lib_sapling sat at one end of this spectrum: it was easy to verify it directly and it could be replaced by the extracted OCaml code. The PBTs extracted from the specification in this case help close a somewhat narrow verification gap (mostly assumptions about machine integers, lists, cryptographic primitives and F*’s extraction mechanism).

The protocol version, on the other hand, stands at the opposite end of the spectrum. Due to its pervasive use of the Context and Lwt (an OCaml library for promises and concurrent I/O with a monadic interface), fully verifying it and replacing the code with the extracted specification becomes infeasible. Instead, the verification effort in this case is limited to a model of the implementation, whose extracted code then becomes a reference implementation against which the complex version can be tested. In our particular case, the work of specifying and verifying an abstract model for IMTs is already done, as we can reuse the work done for lib_sapling.

Following this idea, we decided to test the protocol Sapling implementation against the certified and tested extracted ADT implementation. This is implemented in the ml/test/proto-test.ml module. Several tests were considered, which are discussed in this issue.

The key to carrying this out was to define a projection function (project_tree) from the context tree representation to the IMT ADT. With this, we can describe the test we implemented with the following diagram:

This diagram is divided in two by the vertical dotted line that separates the protocol IMTs from the ADT IMTs. For instance, t0 stands for a context in which an IMT is stored, following the representation defined in sapling_storage.ml.

Here, proj0 and proj1 are calls to the projection function, and insert' is the IMT insertion function from our ADT spec (insert_list).

The function that we want to test is insert, the protocol IMT insertion (actually called add). Our test consists of checking that this diagram commutes, that is, that for any given t0, proj1 (insert t0) is equal to insert' (proj0 t0).

By performing this test we ensure that the complex protocol insertion function is equivalent to the one we know is well-behaved, modulo the projections.
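The commutativity check can be sketched in a few lines. The model below is deliberately simplified and entirely hypothetical:

- A Hashtbl plus a mutable counter stands in for the Context.
- The stored leaves are projected to a plain list rather than to the tree ADT.
- List append plays the role of insert'.

What the sketch does show faithfully is the order of operations: project t0 before the stateful insertion mutates it, then compare.

```ocaml
(* Hypothetical stateful store standing in for the Context:
   leaf position -> value, plus the number of stored leaves *)
type store = { tbl : (int, int) Hashtbl.t; mutable size : int }

(* Protocol-side insertion (insert): mutates the store in place *)
let add (s : store) (vs : int list) =
  List.iter
    (fun v ->
      Hashtbl.replace s.tbl s.size v;
      s.size <- s.size + 1)
    vs

(* Projection (proj0 / proj1) from the stored representation to the model *)
let project (s : store) : int list = List.init s.size (fun i -> Hashtbl.find s.tbl i)

(* Pure reference insertion (insert') on the model *)
let insert_model (leaves : int list) (vs : int list) : int list = leaves @ vs

(* proj1 (insert t0) = insert' (proj0 t0); the projection of t0 must be
   taken before [insert] mutates the store *)
let commutes ~insert (s : store) (vs : int list) =
  let expected = insert_model (project s) vs in
  insert s vs;
  project s = expected
```

A buggy insertion, for example one that stores the new values in reverse order, makes `commutes` return false.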

Our experience of working with F*

In this section we’ll summarize our experience with the F* language, and outline our contributions to the F* project.

General evaluation of F* for verifying OCaml code

One aspect that makes F* suitable for verifying OCaml code is the similarity of their syntax and type systems. However, F* is not a superset of OCaml, so modelling a piece of OCaml code requires a varying degree of manual work. This depends on the source program, as the translation of certain OCaml constructs (such as functors) is not trivial. In our experience, this process was quite straightforward, but that might not be the case for code bases that make heavy use of first-class modules.

Another of F*’s key selling points is its hybrid nature. The language’s SMT support did in fact allow for many of our model’s properties to be proven with little to no work, while being flexible enough to spell out more complex proofs. This, however, came at a cost. Overreliance on the SMT backend often led to increasingly flaky (i.e. unstable) proofs. A small change in a seemingly unrelated lemma might cause a previously proven theorem to no longer be accepted. This issue is often exacerbated by uninformative error messages that require some deciphering, which might negate some of the time savings gained by proof automation. All this should not be read as taking merit away from F*’s hybrid approach, but rather as a remark on the importance of finding the right balance between explicitness and reliance on automation when writing a proof.

Contributions to F*

As previously mentioned, during this project we encountered some of F*’s rough edges, which are to be expected in a research language. After discussing them with some of the language’s designers, we agreed to try to provide solutions for some of these shortcomings, under their guidance. We can split these contributions into two groups.

Extraction of machine integers

After going to the trouble of refining the specification with machine integer types, it came as a surprise that the extracted OCaml code using machine integers was quite inefficient.

This was mainly due to how machine integer constants were extracted. The code generated for a literal such as 42l was the following:

FStar.Int32.int_to_t (Prims.of_int 42)

where int_to_t is defined as:

let int_to_t x = Stdint.Int32.of_string (Z.to_string x)

This means that for every machine integer constant, the extracted OCaml code would:

Convert it to a Zarith int,
Convert that Zarith int into a string, and
Parse that string to get the desired Stdint int.

This added significant and unnecessary overhead, considering that the same result can be achieved with:

Stdint.Int32.of_int 42

This was precisely the solution we implemented in PR FStar/#2325, which has already been merged.
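The difference between the two code paths can be reproduced with the OCaml standard library alone. In this sketch, Stdlib's Int32 stands in for Stdint.Int32, and a native int stands in for the Zarith integer produced by Prims.of_int. Both function names are hypothetical.

```ocaml
(* Mirrors the old extraction of a literal:
   integer -> decimal string -> parsed Int32 *)
let int_to_t_via_string (x : int) : int32 = Int32.of_string (string_of_int x)

(* Mirrors the fix: a single direct conversion *)
let int_to_t_direct (x : int) : int32 = Int32.of_int x
```

For in-range values the two agree. The string-based version, however, allocates and parses a string every time the constant is evaluated.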

A particular case of this issue was the extraction of 0 and 1 constants, which followed a similar logic. These cases are special, because all the Stdint modules, which implement F*’s machine integers, expose zero and one constants. So the solution for this was much easier: to add these constants to the machine integer interface. This is implemented by PR FStar/#2306, which is also merged.

Another source of inefficiency is the Int.Cast module, which defines the conversions between machine integer types. The issue here is almost the same as in the extraction of literals. To cast a value of one machine integer type to another, the extracted code converts the value into a string, parses the string into a Zarith int, which in turn gets converted into a new string that finally gets parsed as a value of the desired type. Again, this means that casting has a significant overhead. This can be avoided by providing an OCaml realization for this module instead of extracting it. The realization can use Stdint casts directly, improving performance. We implemented this in PR FStar/#2315, which hasn’t been merged yet.
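The cast overhead can likewise be sketched with Stdlib types. Int32 and Int64 stand in for the Stdint types, and a native int stands in for the intermediate Zarith value. The string-based function is a hypothetical reconstruction of the conversion chain described above, not the extracted F* code.

```ocaml
(* Conversion chain described above:
   value -> string -> big integer -> string -> target type *)
let cast_via_strings (x : int32) : int64 =
  let big = int_of_string (Int32.to_string x) in (* stand-in for Zarith parsing *)
  Int64.of_string (string_of_int big)

(* Direct realization using a stdlib cast: no allocation, no parsing *)
let cast_direct (x : int32) : int64 = Int64.of_int32 x
```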

Meta-F*

The biggest contribution we made to F*, and the most crucial to our project, was adding support for mutually recursive let-bindings to the reflected syntax. Reflection allows inspecting, manipulating and creating F* terms from within F*, and was the tool we used for extracting specifications.

However, F*’s reflection AST didn’t support mutually recursive let-bindings, so our toolchain wouldn’t work for the function we cared the most about (insert_list).

After discussion with the F* team, and several iterations, we arrived at the solution in PR FStar/#2291, which has already been merged. This was a breaking change, albeit a slight one, so we also had to fix some uses of the reflection interface in PR HACL-star/#471.

Finally, we encountered an issue when extracting polymorphic functions built through Meta-F*. In short, if a function’s type had more than one universally quantified type variable, then all except the first one would be instantiated as unit in the extracted code. We solved this issue in PR FStar/#2305, which is yet to be merged, but has already been reviewed. Although the fix is quite small, finding the source of this error took some time.

Unsolved issues with F*

Even though we managed to overcome the limitations of F* outlined in the previous section, there’s still one problem with the extracted OCaml code: its reliance on fstarlib (the OCaml implementation of F*’s standard library).

The issue lies in the fact that the extracted code uses fstarlib even when that could be avoided. For instance, primitive types (such as bool) are extracted as their F* standard library symbols (Prims.bool) instead of directly as their OCaml counterparts. Although both alternatives are semantically equivalent (Prims.bool is defined as a synonym for bool), this means that the extracted OCaml code will depend on fstarlib, even if the only part of the standard library it uses is the primitive types. Then, if one wanted to plug a module extracted from F* into an existing codebase, a number of possibly unnecessary dependencies needed by fstarlib (e.g. batteries, yojson) might be added, greatly hindering the adoption of such a workflow.

There are possible, though unsatisfactory, workarounds to this issue. One of them is to manually tinker with the extracted code to avoid uses of the F* types, which is essentially what we did with the cleanup script. It goes without saying that doing this somewhat defeats the purpose of having a certified implementation, so one has to be very careful when doing so.

Before we go, we discuss projects that address similar problems through different approaches.

Extraction vs Lightweight Validation

Another well-established workflow is to automatically extract correct implementations directly from mechanized, verified developments. Projects like the Verified Software Toolchain narrow the verification gap by building a vertical stack of mechanized components on the foundations provided by the CompCert certified C compiler. Closer to F*, the KaRaMeL project allows the extraction of certified programs written in Low*, a subset of F*, to C.

Even though this approach has been successfully adopted in large industrial contexts, such as the verification of the seL4 micro-kernel, it would be hard to apply to a codebase like Tezos’, which is naturally designed to evolve. Instead, we propose to tackle the verification gap by leveraging the automated extraction of property-based tests from a formal specification, in order to validate that the implementation complies with its formal model.

QuickChick

QuickChick is a randomized property-based testing plugin for the Coq Proof Assistant. Its central idea is foundational testing, which means that the testing code is formally verified to be testing the desired property. Additionally, QuickChick supports automatic derivation of generators for data satisfying a particular predicate. Our approach is more lightweight: implementing QuickChick’s features for F* would have required a very significant effort, and QCheck was already integrated into the Octez test suite.

There has also been some recent work leveraging QuickChick for testing OCaml code. Their workflow is the opposite of ours: they use coq-of-ocaml to translate OCaml definitions into Coq, and then rely on QuickChick to test the implementation. Moreover, this approach suffers from the limitation that coq-of-ocaml cannot fully translate all OCaml features into Gallina (Coq) without making arbitrary design decisions which carry semantic weight.

Monolith

Monolith is a framework for testing OCaml libraries, which supports random testing and fuzzing. The user has to specify the library’s interface (types and operations) and provide a reference implementation. Monolith then runs sequences of operations, trying to find unexpected behaviours.

Given that we propose extracting reference implementations from formally verified abstract models, it would be interesting to study if those could be integrated with Monolith. This way, the open issue of defining generators could be solved by making use of Monolith’s fuzzing and random testing features.

Extracting effectful specifications

Currently, our tool targets only pure F* programs with Boolean pre- and post-conditions. An interesting line of future work would be to extend it to support other F* monadic effects, and to leverage the specification extraction mechanism to test arbitrary monadic OCaml code implementing such effects. For example, in our setting this could be used for directly extracting a specification for the more complex stateful implementation of Incremental Merkle Trees.

Two weeks at the OPLSS 2022 — and some reflections on the elegance of Call-by-Push-Value

2022-08-03T12:00:00+02:00

Nomadic Labs PhD student Colin Gonzalez attended the Oregon Programming Language Summer School, which the Tezos Foundation sponsored in 2022. In this blog post he reflects on his experiences.

One of the perks of doing a PhD is that you sometimes get to travel to conferences or summer schools. A summer school is like a summer camp where the main activity is attending lectures with fellow grad students from around the world. In your free time, you gather with people to grab a couple of drinks and discuss the lectures, but also your own PhD experience. I recently got to attend this year’s edition of the Oregon Programming Language Summer School in Eugene, OR.

OPLSS usually lasts two weeks. The first week covers a set of introductory courses on proof theory, type theory and logic. In the second week, lecturers selected from among the top researchers in the programming languages field are invited to discuss more advanced topics in a privileged setting, where students and lecturers share the campus in a very casual fashion. This creates opportunities for great discussions over lunch and dinner.

The first few days were structured around a combination of proof theory by Frank Pfenning, type theory by Thorsten Altenkirch, algebraic programming by Jeremy Gibbons and game semantics by Pierre-Louis Curien. The week closed with great courses on classical realizability by Paul Downen and rewriting theory by Silvia Ghilezan.

Admittedly, one of the most difficult lectures was the one on Game Semantics. It was the kind of lecture during which you would glance at your neighbour with a puzzled face and quietly ask: “Do you understand anything?” The answer would often be just “No” and the same puzzled face. It was a new topic for most of us.

All these topics, whether new or a refresher, were a great preparation for the hours we would spend with Robert Harper discussing Logical Relations, or implementing a type checker for a dependent type theory with Stephanie Weirich. It was also an opportunity to have Stephanie Balzer walk us through the world of Pi-Calculus and Session Types, and to hear Adam Chlipala explain how to use a proof assistant like Coq to certify production-grade software and hardware. Finally, the lectures of Sam Lindley and Steve Zdancewic made a great pair, covering respectively effect handlers and the use of the free monad in DeepSpec to formalise and verify imperative programs.

I noticed that the lecturers kept connecting Call-by-Push-Value[1] with linear logic and type theory. Call-by-Push-Value, CBPV for short, plays a central role in my PhD research. It happens to be a very active topic in the US, and it was a pleasant surprise to get so much new insight into this domain during the summer school. CBPV is a formalisation of the connection between values and computations. It was an amazing opportunity to get a new perspective on my work.

We can observe this duality when comparing OCaml and Haskell. While they share similar semantics and features, they differ in the way they treat values. We could say that in Haskell, every program is a value in its own right. Whereas in OCaml, values are programs that are in a special form that can’t be reduced any further. Take for instance the very short OCaml program (fun x -> x) (1 + 1): OCaml will first replace 1 + 1 with the value 2, before handing it to the function, while Haskell would happily pass (1 + 1) as an unevaluated argument to the function. Eventually, they both give the same result. But this tiny difference has important implications: OCaml and Haskell programs do not behave the same because computations happen in different orders. And this is where CBPV shines: it can be seen as a core language capable of interpreting both OCaml and Haskell programs correctly.
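The difference can be observed directly in OCaml. This sketch logs when the argument and the function body run:

- The first version uses OCaml's default call-by-value.
- The second passes a lazy value, which approximates Haskell's call-by-need.

In CBPV terms, suspending a computation and later forcing it correspond to its thunk and force constructs. The helper names (note, trace, by_value, by_need) are ours.

```ocaml
(* Records events in the order they happen *)
let log : string list ref = ref []
let note msg = log := msg :: !log

(* Call-by-value: in (fun x -> x) (1 + 1), the argument is evaluated first *)
let by_value () =
  let id x = note "body"; x in
  id (note "arg"; 1 + 1)

(* Call-by-need, approximated with Lazy: the argument is evaluated
   only when the body forces it *)
let by_need () =
  let id x = note "body"; Lazy.force x in
  id (lazy (note "arg"; 1 + 1))

(* Runs f with a fresh log and returns its result with the event order *)
let trace f =
  log := [];
  let v = f () in
  (v, List.rev !log)
```

Both versions return 2, but `trace by_value` records "arg" before "body", while `trace by_need` records "body" before "arg".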

And the obvious question arises: why is CBPV of interest while doing a PhD at Nomadic Labs? In my PhD, I work on compiling spreadsheets, considered as programs, to Michelson smart contracts, and I use CBPV as a formal framework to reason about the semantics of these programs.

Overall, OPLSS 2022 was a great experience. I got to enjoy some of the best CS lectures I have ever attended, and I gained new perspectives and insight on my own work.

[1] Paul Blain Levy. 1999. Call-by-Push-Value: A Subsuming Paradigm.

Announcing Tezos’ 11th protocol upgrade proposal, “Kathmandu”

2022-07-13T22:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

We were proud to see Jakarta go live on June 28th, 2022. In keeping with our policy of proposing upgrades on a regular schedule, we’re happy to announce our next Tezos protocol proposal, Kathmandu. As usual, Kathmandu’s true name is its hash, which is PtKathmankSpLLDALzWw7CGD2j2MtyveTwboEYokqUCP4a1LxMg.

The main features are:

Smart contract optimistic rollups enabled on bleeding edge testnets
Pipelined validation of manager operations (ongoing)
Improved randomness with integration of Verifiable Delay Functions
Support for tailored governance for permanent testnets (Ghostnet)
Event logging in Michelson smart contracts
New operation for increasing paid storage of a smart contract

For more details, see our Kathmandu preview post. A deeper technical description can be found in the protocol proposal’s technical documentation, and a complete list of changes is provided in the changelog.

Continued efforts to strengthen testing

As always, we encourage the community to participate in Tezos testnets, which is of great help in the protocol development process. There are different options:

Protocol testing on Kathmandunet

We are happy to see Beacon and Taquito having already added support for Kathmandunet. It is critical to have as many bakers and builders as possible participating in this testnet, by running nodes, producing blocks and deploying apps and infrastructure.

Nomadic Labs will publish a release candidate for a new Octez suite version, v14.0~rc1, within a few days. It will include the daemon binaries that enable participation in the Kathmandunet test network.

Should the Kathmandu protocol proposal be accepted by the community, v14 of Octez (or later) will be required to participate in consensus due to necessary changes introduced to the protocol environment.

Smart contract rollup testing on bleeding edge testnets

The core business logic of smart contract optimistic rollups and the Wasm PVM are mature enough for the community to start developing and testing applications and infrastructure on testnets.

As mentioned in our Kathmandu preview post, smart contract rollups will not be enabled on Tezos mainnet by the Kathmandu protocol proposal. This is to give the community extra time to develop and test infrastructure and Layer 2 applications.

Given that Kathmandunet is meant to mirror the current protocol proposal, it will not have support for smart contract rollups enabled. Instead, we encourage the community to start building and testing on the bleeding edge Mondaynet and Dailynet testnets.

Continuous dApp testing on Ghostnet

Ghostnet is live! It’s a permanent testnet that follows Mainnet upgrades, meaning dApp developers will no longer have to redeploy to a new testnet after each upgrade. It was activated with a User Activated Upgrade (Ithacanet -> Jakarta), and another UAU will be necessary to migrate to Kathmandu. After that, further evolution of Ghostnet will happen via a new upgrade mechanism managed by Oxhead Alpha.

Further details are available in the TZIP advocating for this feature. The implementation is a contribution of G.-B. Fefe, a community member not affiliated with the core developing teams behind this protocol proposal. For this work, an invoice of 3000 XTZ is included in the proposal.

We are interested in the community’s input on the optimal date for testing the migration of Ghostnet to the next Mainnet protocol. The migration from Ithaca to Jakarta was performed a few hours before Mainnet activation. For Kathmandu, we are considering migrating Ghostnet 72 hours prior to Mainnet activation, should the proposal be adopted.

However, we would like to empirically figure out the best time (during the Adoption period) to perform the migration. The community’s participation in the test network and feedback will be most welcome in this process.

Why the next generation of optimistic rollups are a game-changer for Tezos

2022-07-11T14:00:00+02:00

TL;DR: It’s not just about smart contracts and scaling. The next generation of optimistic rollups are a plug’n’play solution for running any software on Tezos.

This is a joint post from Nomadic Labs, TriliTech, Tarides, and Functori.

With the Jakarta upgrade and Transactions Optimistic Rollups successfully activated, it’s time to look ahead to the next major step in Tezos protocol development: Smart Contract Optimistic Rollups (SCORUs).

These next-generation optimistic rollups don’t just enable smart contracts. They are a platform for running any type of software, including emulating the Ethereum Virtual Machine (EVM), and having all computation made verifiable on the Tezos blockchain. That is a game changer for Tezos functionality.

As explained in the recent preview of the upcoming ‘Kathmandu’ protocol upgrade proposal, they will hit test networks soon, while mainnet activation is expected to be included in the ‘L’ protocol upgrade proposal.

For a general understanding of optimistic rollups and the rationale behind them as a scaling solution for Tezos, check out our blog post on this topic. See also our blog post on Transaction Optimistic Rollups (TORUs).

Let’s jump in.

From transactions to general computation

Where TORUs have a fixed design focused on enabling higher transaction throughput while being highly decentralized, SCORUs can do whatever they are programmed to. Executing smart contracts is just one of many use cases. In fact, general computation optimistic rollups might have been a better description, but here we are.

For this, a different rollup dispute mechanism is needed.

TORUs are limited to transactions because of the simpler design of this mechanism. If a commitment published by a node operator is wrong, an honest operator can neutralize it by broadcasting a single L1 operation containing a proof.

This one-step procedure means the proof must be small enough to be included in a single L1 operation, putting a limitation on the number and complexity of operations you can do between two commitments.

Smart contract rollups remove this limitation by using a different, more advanced dispute mechanism. Rather than being settled in one operation, it works as an interactive process between two rollup node operators.

This refutation game plays out over multiple rounds and Layer 1 blocks. The parties start broad and step by step narrow down the area of contention. When the exact point is found, and the required proof is small enough to be included in a Layer 1 block, the dispute can be resolved.
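The narrowing step is essentially a search for the first execution step on which the two parties disagree. Here is a minimal sketch, under these assumptions:

- Each party's claim is an array of state hashes, one per execution step.
- The disputed range is split in two at each round, for simplicity.

The actual refutation game may split the range into more sections per round. It also exchanges its moves as Layer 1 operations, which this sketch ignores.

```ocaml
(* Hypothetical model: honest.(i) and challenger.(i) are the state hashes
   each party claims after step i, for i = 0 .. n.
   Precondition: the claims agree at step 0 and disagree at step n. *)
let dissect (honest : int array) (challenger : int array) =
  let n = Array.length honest - 1 in
  if n < 1
     || Array.length challenger <> n + 1
     || honest.(0) <> challenger.(0)
     || honest.(n) = challenger.(n)
  then invalid_arg "dissect: claims must agree at step 0 and differ at step n";
  (* Invariant: the claims agree at lo and disagree at hi *)
  let rec go lo hi rounds =
    if hi - lo = 1 then (lo, hi, rounds)
    else
      let mid = lo + (hi - lo) / 2 in
      if honest.(mid) = challenger.(mid) then go mid hi (rounds + 1)
      else go lo mid (rounds + 1)
  in
  go 0 n 0
```

Once the pair (lo, hi) is found, only the single step from lo to hi needs to be proven on Layer 1. That is what keeps the final proof small, whatever the total length of the disputed computation.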

The bottom line: Interactive proofs remove the limit to the complexity of rollup operations that can be handled by Layer 1, opening the door for general computation.

	Transaction rollup	Smart contract rollup
Functionality	Asset transfers (simple)	General computation (complex)
Who can be a node operator	Anyone	Anyone
User restrictions	None	Configurable at deployment
Admin rights	None	Configurable at deployment
Dispute mechanism	Single step (non-interactive)	Refutation game (interactive)
Operation complexity	Limited	Unbounded

The rollup “computer”

To understand the design of SCORUs, think of a computer connected to a network.

The rollup itself is like an IP address on the network, through which users can access the computer “hardware” – a virtual machine run by rollup node operators.

The hardware has no inherent functionality. All functionality comes with an operating system that needs to be installed first. This we call a kernel.

A kernel can be an Ethereum Virtual Machine (EVM) emulation, enabling Solidity smart contracts. It can also be a simpler, single-application kernel that focuses on, e.g., transactions of assets. Or something completely different. As such, there are different ways to design a rollup “stack”.

The Tezos rollup “stack”
Computer analogy	EVM rollup example	Transaction rollup example
	Solidity smart contracts
Operating system	Kernel (EVM engine)	Kernel (Transaction engine)
Hardware	Virtual Machine (Wasm)	Virtual Machine (Wasm)
IP address	Rollup address	Rollup address
Network	Tezos blockchain	Tezos blockchain

At the time of launch, we will make a simple kernel with transaction functionality similar to TORUs available for demonstration purposes. A proof-of-concept kernel in the form of a token exchange using BLS signatures will also be available. The latter is meant to be integrated with the in-development data-availability layer to demonstrate the next-level scaling potential of Tezos. We are also actively working on an EVM kernel, though with a release date that is yet to be determined, and a Michelson kernel is being researched.

As with an operating system, kernels can be updated or replaced at any time, though it depends on how the host rollup is configured. Staying with the server hardware analogy, these rollups come with a kind of boot sector which defines a number of its properties at the time of deployment, such as who can update the kernel, if anyone.

The Proof-generating Virtual Machine

With all this talk of hardware and operating systems, keep in mind that the whole point of rollups is to execute off-chain and verify on-chain.

Rollup operations are processed off-chain, on Layer 2, by dedicated rollup nodes. They continuously post commitments on the Tezos main chain, Layer 1, representing updates to the state of the rollup.

As with all optimistic rollups, commitments are treated as correct by default but can be disputed within a given timeframe by anyone else running a rollup node. For those situations, a Proof-generating Virtual Machine (PVM) is implemented in the Tezos protocol. It’s a slightly modified virtual machine that can output a proof that operations have been processed correctly. The rollup node can use this implementation to produce a proof and post it to Layer 1, where it will be checked by the Layer 1 nodes.

This on-chain process is only activated when a dispute needs to be resolved, and only for a small execution step, once the exact point of contention has been pinpointed through a refutation game.

WebAssembly as the (initial) base layer

The first virtual machine, and PVM, for Tezos rollups will run WebAssembly, or Wasm – a low-level assembly-like language, which is well on track to become a standard for high-performance applications on the web. It’s designed as a compilation target for other languages, much like how desktop programs are compiled into a binary format.

Starting with a Wasm VM means Tezos will instantly be able to welcome a much larger developer community by supporting a number of popular programming languages. Notably, C, C++, and Rust all have good Wasm compilers. Support for blockchain-specific languages like Tezos’ Michelson and Ethereum’s Solidity can be added through kernels or with new (proof-generating) virtual machines added in future protocol upgrades.

For integrating Wasm, we had the advantage that the reference interpreter of Wasm is written in the same language used for the Tezos protocol: OCaml! For any programming language, you need to specify how it works. One way to do this is to program a virtual machine declaratively – that is, in a way that makes it easy for others to understand how it’s meant to work by simply reading the code. That is a reference interpreter.

So we have been able to integrate the official definition of Wasm into Tezos – with some modifications enabling it to produce proofs about its execution.

Calling all builders

Some kernels will be provided by protocol developer teams, but we encourage ecosystem builders to start developing their own, innovative kernels.

For this we provide a native token bridge based on Michelson tickets, similar to the one used for TORUs, and a bare-bones execution environment supporting token exchanges between Layer 1 and Layer 2, interaction with Layer 1 smart contracts, gas monitoring, and self-governance.

We also provide safe Rust bindings to the Wasm PVM. The point is to let kernel developers think less about I/O on the blockchain, how blocks work etc., and instead provide an abstract layer that makes it more like working with files as in a standard computer program.

The kernel has access to the full state of the rollup (Layer 2), but any I/O with Layer 1 has to go through the rollup’s inbox/outbox, asynchronously. For this task, the bindings act like an API: for instance, if you want to send an asset from the rollup to a Layer 1 address, there’s a ready-made function for that.
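To illustrate the asynchronous inbox/outbox model, here is a toy kernel loop in OCaml (the actual bindings are in Rust). Everything here is hypothetical: the message types, the balance table and run_kernel are illustrative names, not the Wasm PVM interface. The kernel drains its inbox and updates Layer 2 state. It queues outgoing Layer 1 transfers in its outbox instead of performing them directly.

```ocaml
(* Hypothetical message formats *)
type inbox_msg =
  | Deposit of string * int  (* tokens arriving from Layer 1 *)
  | Withdraw of string * int (* request to send tokens back to Layer 1 *)

type outbox_msg = Transfer_to_l1 of string * int

(* Process every pending inbox message; Layer 1 effects are only queued *)
let run_kernel inbox outbox (balances : (string, int) Hashtbl.t) =
  let balance acct = Option.value (Hashtbl.find_opt balances acct) ~default:0 in
  while not (Queue.is_empty inbox) do
    match Queue.pop inbox with
    | Deposit (acct, amt) -> Hashtbl.replace balances acct (balance acct + amt)
    | Withdraw (acct, amt) ->
        (* requests exceeding the balance are ignored *)
        if balance acct >= amt then begin
          Hashtbl.replace balances acct (balance acct - amt);
          Queue.push (Transfer_to_l1 (acct, amt)) outbox
        end
  done
```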

With these bindings, developers can write a kernel, compile it to WebAssembly, and know that its interaction with the Tezos blockchain will be reliable and secure. We chose Rust because it is a popular language with a mature toolchain providing robust compilation to Wasm.

Opening up Tezos

With this design of optimistic rollups, Tezos opens itself up to interacting with all kinds of software systems and making their operation verifiable on the Tezos blockchain. This puts Tezos solidly at the forefront of blockchain technology.

Wasm establishes a highly flexible and future-proof base layer that doesn’t exclude support for currently popular execution environments like the EVM, or the well-known security-focused environment of Michelson, while still inviting Rust and C/C++ developers on board.

Each rollup will be highly configurable, and will have governance modules for kernel upgrades. In this sense, a rollup will offer everything that parachains, subnets, app-chains and similar scaling solutions used by other blockchains do.

The difference from most of these is that rollups on Tezos offer all that while being secured by a time-tested, highly decentralized and censorship resistant Layer 1: the Tezos main chain.

We believe this to be the future of blockchains, and we are excited to be building it with the Tezos community!

Jakarta, the latest Tezos upgrade, is LIVE!

2022-06-29T01:00:00+02:00

On 28 June 2022 23:31 CET, the Tezos blockchain successfully upgraded by activating the Jakarta proposal (technically Jakarta 2) at block #2490369.

This tenth Tezos protocol upgrade was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

We were happy to observe that bakers representing 2/3 of the total stake had performed the required upgrade to Octez v13 or TezEdge v3 a day before activation. Still, we encourage bakers to upgrade to new required node versions at the earliest possible time. This minimizes the risk of the network getting stuck (or advancing slowly) due to the consensus requirements introduced with Tenderbake.

Further updates to the protocol environment, and hence mandatory node and baker upgrades, will be necessary for features in upcoming upgrade proposals, such as Smart Contract Optimistic Rollups. We greatly appreciate the community’s help and efforts in keeping Tezos at the forefront of blockchain innovation.

Now, let’s celebrate the new features and improvements included in the Jakarta protocol! Not least the first scaling solution on Tezos, and one of few truly decentralized Layer 2 solutions out there: Transaction Optimistic Rollups (TORUs).

TORUs play the important role of:

addressing short-term scaling needs;
demonstrating Tezos’ ability to implement scaling solutions through protocol upgrades;
enabling ecosystem developers to build rollup infrastructure;
paving the way for Smart Contract Optimistic Rollups, coming in a future proposal.

For more on rollups, see our blog posts introducing TORUs and outlining the rollup-based scaling strategy. Those interested in running a rollup node can check out this rollup tutorial.

Other changes in Jakarta include:

Tickets hardening: The protocol now explicitly tracks ownership of Michelson tickets (see also ‘Tickets for Dummies’) by checking ticket creation and ownership changes against a global balance table. This adds extra protection against attempts to forge tickets and increases security for Layer 2 solutions that use tickets to represent Layer 1 assets (e.g., TORUs). With this change, tickets are no longer considered experimental and are believed safe for use on mainnet.
A safer Sapling integration: The integration of Sapling transactions into Michelson smart contracts has been changed to a new, safer design. With Jakarta’s activation it is now only possible to originate Sapling smart contracts which conform to the new version.
New Liquidity Baking voting: The Liquidity Baking Escape Hatch is now a “Liquidity Baking Toggle Vote”, with options On, Off, or Pass. Furthermore, a deactivation is no longer permanent and can be reversed by a later change in votes. More information here.
Michelson interpreter improvements: In particular, type safety and performance have been improved. A few smart contracts which relied on legacy features have been patched to be compliant with the modern Michelson specification.
Rolls are no more: The Jakarta protocol proposal redefines the computation of delegates’ voting power in the self-amendment process. Instead of being measured in terms of rolls, it is now directly proportional to a delegate’s stake.

As always, the upgrade train doesn’t stop here. We are working hard on finalizing the upcoming Kathmandu proposal – and on further laying the groundwork for Smart Contract Optimistic Rollups, a Data Availability Layer, and other great improvements to Tezos.

From Jakarta to Kathmandu non-stop

2022-06-23T15:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

The Tezos community is preparing for exciting new developments in the coming days: the activation of the Jakarta protocol upgrade will be the first step towards massive scalability, with the introduction of the first Tezos Layer 2 solution: Transaction Optimistic Rollups, or TORUs.

Today we look a bit further ahead. However beautiful and lively a port Jakarta is, the Tezos protocol’s path to scalability doesn’t stop there. After mending our sails and reloading our batteries, we are eager to present to the community what’s next in our pipeline. We hope you brought your mountain gear with you – next stop is Kathmandu.

In this article we give you a preview of the different projects that have kept us busy in the last months. We will describe the features which, after some stabilization in the coming weeks, are likely to be part of Kathmandu, Tezos’ 11th protocol upgrade proposal. But first, we will focus on Smart Contract Optimistic RollUps (aka SCORUs), our next Layer 2 scalability solution, and on how big new features like this one require us to rethink the development process for protocol upgrades.

SCORUs are rolling into testnets

As we hinted in an earlier scalability preview, SCORUs provide a generic infrastructure in the Tezos protocol to implement any computational device, as long as its semantics can be described as a Proof-producing Virtual Machine (PVM).
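A minimal sketch of what "proof-producing" means, assuming a toy machine whose whole state is one integer: the rollup publishes state hashes, and a single disputed step can be checked by replaying it from a revealed pre-state. The module type PVM and Counter_pvm are hypothetical, and OCaml's Digest (MD5) stands in for a real hash. A real PVM such as the Wasm PVM would instead prove steps with Merkle proofs covering only the parts of a much larger state that the step touches.

```ocaml
(* Hypothetical toy Proof-producing VM. Real PVMs use Merkle proofs over
   the parts of the state a step touches; here the "proof" is simply the
   whole (tiny) pre-state. *)

module type PVM = sig
  type state
  type proof
  val hash : state -> Digest.t
  val step : state -> int -> state           (* consume one input *)
  val prove : state -> int -> proof          (* evidence for one step *)
  val verify : before:Digest.t -> after:Digest.t -> int -> proof -> bool
end

module Counter_pvm : PVM with type state = int = struct
  type state = int
  type proof = state

  let hash s = Digest.string (string_of_int s)
  let step s input = (s * 31 + input) mod 1_000_003
  let prove s _input = s

  (* Replay the single step from the revealed pre-state and check that it
     links the two published hashes. *)
  let verify ~before ~after input pre =
    Digest.equal (hash pre) before && Digest.equal (hash (step pre input)) after
end
```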

This design allows us to continuously develop and deploy new supported PVMs, and thereby new execution environments, via protocol amendments. In our upcoming protocol proposal, we plan to introduce a first PVM supporting WebAssembly (WASM) — although only available via bleeding-edge test networks.

WebAssembly-based rollups come with a native token bridge based on Tickets, similar to the one used for TORUs, and with a bare-bones execution environment supporting token exchanges between Layer 1 and Layer 2, gas monitoring, and self-governance. On top of this infrastructure, users can write custom rollup kernels, ranging from application-specific rollups to full-scale “parachains”.

Finally, a proof-of-concept rollup kernel will also be provided with the proposal: a token exchange with BLS signature verification. We aim to integrate this with the in-development Data Availability Layer to demonstrate scaling up to higher throughput values.

Start building with SCORUs on Mondaynet

Starting with the Kathmandu protocol proposal, we want to take a new approach to the protocol development process. Taking in feedback from the recent protocol proposal periods, Kathmandu marks a move towards a “continuous development” process that reduces friction and enables ecosystem developers to build support for new features incrementally in close collaboration with the core protocol developers.

We observe that the current development rhythm does not always provide sufficient time for infrastructure and application developers to embrace and support the new protocol features, including experimenting with them on test networks.

We are therefore changing the process of integrating major new features into Tezos protocol proposals so as to:

make sure that, together with ecosystem builders, we can provide end-to-end support for new features of the Tezos protocol;
enable the community to provide valuable feedback to protocol developers before features are locked by the governance process;
increase the chance of discovering bugs before proposals are injected, thereby avoiding re-injections and protocol overrides.

The SCORU core business logic and the Wasm PVM are in a mature enough state for the community to start developing and testing applications and infrastructure on testnets. But, given the new approach outlined above, we want to provide the community with extra time to develop and test infrastructure and Layer 2 applications. As a result, SCORUs will not be enabled on Tezos Mainnet by the Kathmandu protocol proposal.

Given that the “Kathmandunet” test network is meant to mirror the protocol proposal subject to on-chain governance, it will not be spawned with support for SCORUs enabled either. Instead, we encourage the community to start building and testing support for SCORUs on the bleeding edge Mondaynet and Dailynet testnets.

What’s coming in Kathmandu

In the following, we focus on the features we are confident will be enabled in the upcoming Kathmandu protocol proposal.

First steps towards pipelined block validation

The Validation Pipelining project aims to streamline the block validation process, in order to reduce the number of times blocks need to be applied (that is, executed) across the Tezos Layer 1 network. It also aims to propagate blocks before fully applying them, but after quickly validating a set of preconditions on the block and its operation payload that guarantee their correctness. Combined, these changes are meant to increase throughput on Tezos’ Layer 1 without compromising the network’s safety.

The Kathmandu protocol proposal will implement first steps in this direction by focusing on disentangling validation and application for manager operations¹ inside the Tezos protocol. That is, we separate the business logic that decides whether an operation can be safely included in a block from the operation's actual inclusion in a block.
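The separation can be pictured with a small hypothetical OCaml sketch. Here validate only checks cheap, fee-related preconditions (the source exists, its counter matches, it can pay the fee), while apply executes the operation, charging the fee and incrementing the counter even when the transfer itself fails. The record fields and function names are illustrative, not the protocol's.

```ocaml
(* Hypothetical sketch of validation vs. application for manager
   operations. Fields and checks are illustrative only. *)

type manager_op = { source : string; counter : int; fee : int; amount : int }

type account = { balance : int; next_counter : int }

(* Cheap, side-effect-free: can the operation be safely included?
   Only fee-related preconditions are checked, so a validated operation
   is guaranteed to pay its fee even if its application later fails. *)
let validate (accounts : (string, account) Hashtbl.t) op =
  match Hashtbl.find_opt accounts op.source with
  | None -> Error "unknown source"
  | Some acc when op.counter <> acc.next_counter -> Error "counter mismatch"
  | Some acc when acc.balance < op.fee -> Error "cannot pay fee"
  | Some _ -> Ok ()

(* Application: executes the operation. The fee is taken and the counter
   incremented first; the transferred amount is then simply debited
   (crediting a destination is omitted from this sketch). *)
let apply accounts op =
  let acc = Hashtbl.find accounts op.source in
  let after_fee =
    { balance = acc.balance - op.fee; next_counter = acc.next_counter + 1 }
  in
  if after_fee.balance >= op.amount then begin
    Hashtbl.replace accounts op.source
      { after_fee with balance = after_fee.balance - op.amount };
    `Applied
  end else begin
    Hashtbl.replace accounts op.source after_fee;
    `Failed_but_fee_paid
  end
```

Because validate never mutates state, it can run on a block's operations before any of them are applied.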

The second part of the project is planned for future protocol proposals. It will integrate the remaining pieces needed to make pipelined validation of blocks and operations fully operational inside the Tezos protocol.

Improved randomness using Verifiable Delay Functions

All Tezos protocol versions so far, including Jakarta, rely on a simple RANDAO-like scheme to generate a random seed at each cycle, which in turn feeds a Pseudo-Random Number Generator (PRNG). Sound randomness is paramount for enabling the Tezos protocol to fairly attribute consensus rights, which ultimately affect participation rewards.
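A RANDAO-like scheme is a commit-reveal protocol. The sketch below is hypothetical: OCaml's stdlib Digest (MD5) stands in for the protocol's cryptographic hash, and the function names are made up. One known weakness of such schemes is that the last participant to reveal can compute the resulting seed first and choose to withhold, which is the kind of bias an extra delay phase is meant to remove.

```ocaml
(* Hypothetical commit-reveal (RANDAO-like) seed sketch. Digest (MD5) is
   only a stand-in for the protocol's hash function. *)

(* During the cycle, each participant publishes a commitment to a secret. *)
let commit nonce = Digest.string nonce

(* Later, nonces are revealed and checked against their commitments. *)
let check_reveal ~commitment nonce = Digest.equal (commit nonce) commitment

(* The next seed mixes the previous seed with every revealed nonce. *)
let next_seed prev_seed revealed =
  List.fold_left (fun acc nonce -> Digest.string (acc ^ nonce)) prev_seed revealed

(* The seed initializes a deterministic PRNG, so every node derives the
   same draw, e.g. which delegate gets a given slot. *)
let draw_slot_owner seed delegates =
  let seed_ints = Array.init (String.length seed) (fun i -> Char.code seed.[i]) in
  let rng = Random.State.make seed_ints in
  List.nth delegates (Random.State.int rng (List.length delegates))
```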

In order to strengthen the Tezos protocol’s distributed randomness generation mechanism, we propose integrating Verifiable Delay Functions (VDFs). These are novel cryptographic primitives that allow a verifier to quickly assert that a value was computed as a result of an expensive computation. Appending a VDF phase after our current RANDAO scheme adds an extra layer of security to the protocol’s random seeds.
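The asymmetry between evaluating and verifying a VDF can be shown with a toy version of Wesolowski's construction: evaluation needs t sequential squarings, while verification needs only a few exponentiations. This is a sketch under loud assumptions: the group is integers modulo a tiny RSA-like modulus whose factors are known (so it is insecure), the challenge prime l is fixed rather than derived by hashing the input and output, and nothing here should be read as the protocol's actual group or parameters.

```ocaml
(* Toy Wesolowski-style VDF. ASSUMPTIONS: tiny insecure modulus with known
   factors, and a fixed challenge prime instead of a hash-derived one. *)

let n = 1009 * 1013                  (* toy group modulus *)
let l = 10007                        (* toy challenge prime *)

(* Modular exponentiation by repeated squaring. *)
let rec pow_mod m base e =
  if e = 0 then 1 mod m
  else
    let half = pow_mod m base (e / 2) in
    let sq = half * half mod m in
    if e mod 2 = 0 then sq else sq * (base mod m) mod m

(* Evaluation: t inherently sequential squarings, y = x^(2^t) mod n. *)
let eval x t =
  let y = ref (x mod n) in
  for _ = 1 to t do y := !y * !y mod n done;
  !y

(* Proof: pi = x^(floor(2^t / l)) mod n, computed by long division of 2^t
   by l one bit at a time, so 2^t is never written out. *)
let prove x t =
  let pi = ref 1 and r = ref 1 in
  for _ = 1 to t do
    let b = 2 * !r / l in            (* next quotient bit: 0 or 1 *)
    r := 2 * !r mod l;
    pi := !pi * !pi mod n * pow_mod n x b mod n
  done;
  !pi

(* Verification: since 2^t = q*l + r, check pi^l * x^r = y (mod n).
   This costs O(log l + log t) multiplications instead of t squarings. *)
let verify x t y pi =
  let r = pow_mod l 2 t in
  pow_mod n pi l * pow_mod n x r mod n = y
```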

Event logging in Tezos smart contracts

The Kathmandu protocol proposal will provide support for Tezos smart contracts to cheaply emit on-chain events via statically typed event data attachments. This new feature will enable DApp developers to send publicly visible on-chain messages in order to trigger effects in off-chain applications. For instance, events can be notifications of changes in contract states, be it explicit or implicit. We aim to provide a uniform interface with existing token transfer facilities to allow event logging without origination of any extra contracts.
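From an off-chain application's point of view, events can be consumed much like log entries. The sketch below assumes a hypothetical event record (emitting contract, tag, payload) and a small payload type standing in for typed Michelson data; transfers_of is an illustrative indexer filter, not an existing API.

```ocaml
(* Hypothetical shape of contract events as seen off-chain. In the
   protocol, payloads are statically typed Michelson values; this small
   variant type is only a stand-in. *)

type payload = Int of int | String of string | Pair of payload * payload

type event = {
  source : string;        (* the contract that emitted the event *)
  tag : string;           (* lets consumers filter events by kind *)
  payload : payload;
}

(* A toy indexer: collect "transfer" events from one contract, in order. *)
let transfers_of ~contract events =
  List.filter_map
    (fun e ->
      match e with
      | { source; tag = "transfer"; payload = Pair (String to_, Int amount) }
        when source = contract -> Some (to_, amount)
      | _ -> None)
    events
```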

Further details on the motivation behind this feature can be found in Contract event logging.

Support for permanent test networks

The Kathmandu protocol proposal will also include a new testnet-specific governance mechanism necessary to support a permanent test network, Ghostnet.

For any chain other than Tezos Mainnet, this mechanism lets the chain originator designate a special account that can upgrade the protocol unilaterally with a new protocol proposal. Instead of going through the normal governance election cycles, the chain will upgrade straight away to the new protocol. This will allow for the creation of a long-running testnet, Ghostnet, where protocol upgrades will be centrally managed by Oxhead Alpha and which will closely follow Tezos mainnet upgrades.
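The rule can be summarized in a few lines. This hypothetical sketch (the chain record and upgrade_path are made-up names) only encodes the behavior described above: on a non-mainnet chain, a proposal from the designated account activates immediately, and everything else goes through regular governance.

```ocaml
(* Hypothetical sketch of the testnet-only upgrade rule. *)

type chain = { is_mainnet : bool; upgrade_authority : string option }

type upgrade_path =
  | Immediate_activation            (* testnet: designated account decides *)
  | Regular_governance              (* mainnet, or any other submitter *)

let upgrade_path chain ~submitter =
  match chain with
  | { is_mainnet = false; upgrade_authority = Some authority }
    when submitter = authority -> Immediate_activation
  | _ -> Regular_governance
```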

Looking ahead

The contents of the Kathmandu proposal will be finalized shortly, after the last stages of stabilization are completed. As is customary with recent proposals, a dedicated test network, Kathmandunet, will be spawned soon after the protocol is announced. Moreover, as mentioned above, we invite the community to start testing the upcoming SCORU functionality on the Mondaynet test network, and we will provide more details soon.

We set out to make 2022 a year of ground-breaking advances in the Tezos technology stack, and we believe Kathmandu naturally follows the path set out by the Ithaca and Jakarta protocol upgrades. We look forward to sharing these exciting new features with you all!

¹ Manager operations include transfers, smart contract originations, calls to smart contracts, etc. In short, any fee-paying operation in competition for block space. Manager operations carry a counter associated (unsurprisingly) with their manager: the Tezos account that signs the operation and pays the fees.

Announcing Tezt

2022-06-16T14:00:00+02:00

Tezt (pronounced /tɛzti/) is a test framework for OCaml that has been developed and used at Nomadic Labs to test Octez, an OCaml implementation of the Tezos blockchain. It has become quite mature and we feel it would benefit the OCaml community at large, so we are releasing it publicly as a standalone product.

Use Cases

Tezt is well suited for:

unit tests;
integration tests;
regression tests.

This allows you to seamlessly use the same framework for most of your tests.

Features

General features:

easy and flexible test selection from the command-line;
run tests in parallel in separate processes;
automatically split tests into well-balanced batches for running in separate CI jobs;
various reporting options, including JUnit for CI integration;
write assertions (equalities, inequalities, existence in lists, etc.) that result in nice error messages when they fail.

For integration tests:

run external processes, automatically logging their output and terminating them at the end of a test;
declare temporary files that are automatically removed at the end of a test;
external processes can be run on distant runners through SSH;
a JSON module is provided to test REST APIs.

For regression tests:

capture some strings (e.g. output from external processes);
apply regular-expression-based substitutions to make those outputs deterministic;
compare with previous captured outputs.

Additionally, substantial effort was put into the user experience. This shows in many small details that may not seem like much individually, but that add up to significantly improve everyday use. These include:

colorful logs: for instance, each external process gets its own color;
by default, logs are quiet until an error occurs, in which case previous log messages are displayed to get some context;
an option allows you to see all commands that are used to run external processes, making it easy for you to copy-paste them and reproduce;
if one of your tests is flaky (i.e. randomly fails), an option allows you to run it in a loop to more easily reproduce the error;
JSON error messages show the location of the ill-typed JSON value;
all functions are clearly documented.

A Simple Test

Create a dune file containing:

(test (name main) (libraries tezt))

Now create a file named main.ml. To add a test, register it with Test.register:

open Tezt
open Tezt.Base

let () =
  Test.register ~__FILE__
    ~title: "example test"
    ~tags: ["example"; "addition"]
  @@ fun () ->
  if 1 + 1 <> 2 then Test.fail "1 + 1 is not 2";
  unit

let () =
  (* Call this after you are done registering tests. *)
  Test.run ()

Let’s see what is going on here.

~__FILE__ tells Tezt which source file is defining this test. You can tell Tezt to run all tests that are registered in a given source file with --file (or -f). Here, that would be --file main.ml.
~title should be a short descriptive title for your test. You can tell Tezt to run this particular test with --test 'example test'.
~tags is a list of identifiers that can be used to select a subset of your tests from the command-line. For instance, you can run all tests that have tag addition simply by adding addition on the command-line. You can also specify negative tags; for instance, you can have a tag slow for tests that take longer to run, and exclude them with /slow.
The last argument is a function that implements the test.
Test.fail raises an exception, similar to Failure. It stops the test. Tezt then cleans up if there is something to clean, and reports the error.
unit is Lwt.return_unit. All tests are written in the Lwt monad.
Test.run tells Tezt that all tests have been registered. Tezt takes it from here, handling command-line options and running tests that it should run.
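Tag selection, including negative tags, boils down to a simple predicate. The following is a hypothetical re-implementation for illustration, not Tezt's source; it assumes that when several tags are given, a test must have all the positive ones and none of the negated ones.

```ocaml
(* Hypothetical tag-selection predicate (not Tezt's code): a test is
   selected if it has every positive tag from the command line and none
   of the negated ones, written "/tag". *)

let selected ~cli_tags test_tags =
  List.for_all
    (fun arg ->
      let len = String.length arg in
      if len > 0 && arg.[0] = '/' then
        not (List.mem (String.sub arg 1 (len - 1)) test_tags)
      else List.mem arg test_tags)
    cli_tags
```

With the example test's tags ["example"; "addition"], passing addition selects it, while addition /example excludes it.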

To run your test, run either:

dune runtest

or:

dune exec main.exe

You can get a list of command-line options with:

dune exec main.exe -- --help

In particular, you can try the --list option, which gives you the list of registered tests and their tags. Don’t forget to use -- to separate Dune arguments and Tezt arguments:

dune exec main.exe -- --list

Some basic options that you may be interested in are:

--list to get the list of registered tests;
--test TITLE (or -t TITLE) to select a test from its title;
--file FILE.ml (or -f FILE.ml) to select tests from their source file;
--verbose (or -v), --info (or -i) to control log verbosity;
--log-file FILE to store verbose logs in FILE;
--keep-going (or -k) to continue with remaining tests even if a test fails;
--jobs N (or -j N) to run up to N tests in parallel.

A Simple Integration Test

An integration test is just a test that runs external processes. It often also manipulates temporary files. Here is a simple example:

open Tezt
open Tezt.Base

let () =
  Test.register ~__FILE__
    ~title: "example integration test"
    ~tags: ["example"; "cat"]
  @@ fun () ->
  let filename = Temp.file "test.txt" in
  with_open_out filename (fun ch ->
    output_string ch ("test" ^ string_of_int (Random.int 1000)));
  let* output = Process.run_and_read_stdout "cat" [ filename ] in
  if output =~! rex "^test\\d{1,4}$" then
    Test.fail "got %S instead of the expected output" output;
  unit

let () = Test.run ()

Let’s see what is going on here. This test uses three functions from the Base module of Tezt:

Temp.file "test.txt" returns a filename of the form /tmp/tezt-1234/1/test.txt. This file will be automatically removed at the end of the test unless you use --keep-temp, or --delete-temp-if-success and the test fails.
with_open_out opens a file for writing, gives you the out_channel, and ensures the file is closed after.
let* is Lwt.bind.

Then it uses the Process module to execute cat on the temporary file and check its output. The Process module will log all output from cat in verbose mode. This is particularly convenient for debugging.

Finally, this test uses =~!, which means: check that the left-hand-side does not match the regular expression at the right-hand-side. Regular expressions are often convenient to use with integration tests, so some easy-to-use regular expression operators are provided by Tezt.Base.

You can run this test like any other test. One option that may be useful is --commands (or -c): it tells Tezt to print commands that it runs. It quotes them using shell syntax so that you can copy-paste them into your terminal to easily reproduce.

A Simple Regression Test

Regression tests are tests that produce a string which should never change. This string is stored in a file that you can commit in your repository. When you run your test, the string that it produces will be compared with the contents of this file, and the test will fail if it differs.

Regression tests are particularly useful for checking that the output of external processes stays unchanged, so they are often used in integration tests. But they can also be used for unit tests.

Here is an example regression test:

open Tezt
open Tezt.Base

let () =
  Regression.register_test ~__FILE__
    ~title: "example regression test"
    ~tags: ["example"]
    ~output_file: "example.txt"
  @@ fun () ->
  let* () =
    Process.run ~hooks: Regression.hooks
      "git" [ "--help" ]
  in
  Regression.capture (string_of_int (1 + 1));
  unit

let () = Test.run ()

Let’s see what is going on here.

This test is registered with Regression.register_test instead of Test.register. This tells Tezt that the test produces a file containing the captured regression output.
output_file is the name of the file to produce. It is prefixed with tezt/_regressions/, so here the output file is tezt/_regressions/example.txt. You can override this prefix with --regression-dir.
We use ~hooks to tell the Process module to capture the output of the process in the regression output file.
We also manually capture the result of a computation.

You need to run the test once with --reset-regressions to generate the output file:

dune exec main.exe -- regression --reset-regressions

Here we also specify the regression tag on the command-line. This is optional, but it tells Tezt to only run tests that have tag regression. This tag is automatically added by Regression.register_test.

Next time you run your tests, for instance with dune exec main.exe, the output of the test will be compared with tezt/_regressions/example.txt. If it differs, the test will fail.

Regression tests are convenient because they are fast to write. However, they do have some drawbacks:

the expected output is not in the source code of the test itself (an alternative for this is to use ppx_expect);
if the output contains non-deterministic values, such as time data or random values, you first need to replace them with fixed values (this can be done with custom ~hooks).
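The core of a regression check, including the substitution step for non-deterministic values, can be sketched with the standard library alone. This is a hypothetical illustration, not Tezt's implementation: normalize_digits replaces every run of digits with <n>, and check_regression compares the normalized capture with the stored output.

```ocaml
(* Hypothetical regression check (not Tezt's code): normalize
   non-deterministic parts of captured output, then compare with the
   stored expected text. Here every run of digits becomes "<n>". *)

let normalize_digits s =
  let buf = Buffer.create (String.length s) in
  let in_digits = ref false in
  String.iter
    (fun c ->
      if c >= '0' && c <= '9' then begin
        if not !in_digits then Buffer.add_string buf "<n>";
        in_digits := true
      end else begin
        in_digits := false;
        Buffer.add_char buf c
      end)
    s;
  Buffer.contents buf

(* Pass only if the normalized capture equals the stored output;
   on failure, return the actual text so it can be diffed or reset. *)
let check_regression ~expected captured_lines =
  let actual = String.concat "\n" (List.map normalize_digits captured_lines) in
  if actual = expected then Ok () else Error actual
```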

CI Integration

Our tests run in GitLab’s CI for all merge requests. This prompted us to make Tezt well integrated with the CI. Here are some of the features that were built with that in mind.

GitLab’s job interface has a limit for how many lines of output can be shown for a given job. By default Tezt only shows detailed logs in case of a failure, and only the most recent lines (the exact amount is a command-line option). This means that we never reach GitLab’s limit. We can still see the full logs if we want: we use the --log-file option and store the log file as an artifact.
Tezt can generate record files that store the time each test took to run. It can then use those records to automatically split tests into a partition where each subset takes roughly the same amount of time. And you can tell Tezt to run only one of these subsets in each CI job. The result is that you can easily split your tests into well-balanced parallel jobs in the CI.
Tezt can generate JUnit reports. JUnit is supported by GitLab. This gives the ability to show a summary of test results in the merge request interface.
Because Tezt can run external processes using SSH, you could have the CI runner be the puppetmaster for a bigger cluster of machines. We don’t actually do that, but we could.

Note that outside of the JUnit format, none of these features assume that your CI runs on GitLab, so they should port rather easily to other CIs.
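The auto-balancing feature mentioned above amounts to a small scheduling problem. A minimal sketch, assuming recorded durations in seconds and the classic greedy "longest test first, to the least-loaded job" heuristic; Tezt's own algorithm may differ, and split_into_jobs is a hypothetical name.

```ocaml
(* Hypothetical sketch: split tests into [jobs] groups of similar total
   duration, using recorded run times and the greedy "longest first"
   heuristic. Not necessarily Tezt's algorithm. *)

let split_into_jobs ~jobs (records : (string * float) list) =
  let totals = Array.make jobs 0.0 in
  let buckets = Array.make jobs [] in
  (* Stable sort, longest test first; ties keep their recorded order. *)
  let longest_first =
    List.stable_sort (fun (_, a) (_, b) -> compare b a) records
  in
  List.iter
    (fun (name, duration) ->
      (* Find the least-loaded job (lowest index on ties). *)
      let best = ref 0 in
      Array.iteri (fun i t -> if t < totals.(!best) then best := i) totals;
      totals.(!best) <- totals.(!best) +. duration;
      buckets.(!best) <- name :: buckets.(!best))
    longest_first;
  (Array.map List.rev buckets, totals)
```

With durations of 5, 4, 3 and 3 seconds for tests a, b, c and d and two jobs, this yields groups [a; d] and [b; c], with totals of 8 and 7 seconds.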

And More

Tezt has a rather long list of command-line options to configure its behavior. Run your executable with --help to see it, or read Tezt’s cli.ml source file. We already had a look at some of these options in section A Simple Test, and we mentioned some others along the way. But there are too many to list here, as Tezt, being heavily used, has aggregated many small yet useful features over time.

Conclusion

Tezt development started more than two years ago. At the time, we had two frameworks for integration tests: Flextesa, which was written in OCaml but failed to convince all developers (possibly because it focused on interactive tests), and a Python-based framework. Most integration tests were thus written in Python. As primarily OCaml developers, we were eager to write our tests in OCaml.

Tezt thus began with a focus on integration tests and was developed with user experience, simplicity of implementation and ease of use in mind. It quickly convinced a few early adopters at Nomadic Labs, and expanded from there. Now, no new test is written in Python; all new integration tests are written using Tezt.

We eventually realized that Tezt could also be used for unit tests. We did not actually transition to Tezt for unit tests though; most unit tests are written using Alcotest, a test library for OCaml. But some developers are starting to voice preference for Tezt for unit testing, either because they like its focus on user experience, or because they like its CI integration capabilities like auto-balancing, or just because they would prefer to only have one framework. Tezt provides most of Alcotest’s features, except (for now) integration with QCheck, a library for property-based testing. It should, however, not be very hard to integrate QCheck with Tezt, and we may decide to make the jump one day.

It’s now clear to us that Tezt is a success with Octez developers, and we see no reason to keep it for ourselves. Version 1.0.0 was already released on opam a while ago, but with no announcement — it was mainly so that we could use it ourselves on other Tezos-related projects. We just released version 2.0.0 on opam: run opam install tezt and start tezting!

You can find the API documentation here.

Transaction Optimistic Rollups – a stepping stone for Tezos

2022-06-09T16:00:00+02:00

TL;DR: Transaction optimistic rollups are an experimental solution meant for short-term scaling and for adapting ecosystem infrastructure to work with rollups, while smart contracts rollups are under development.

With the Jakarta protocol proposal, the first step on the road towards scaling Tezos with optimistic rollups has been taken.

To quickly recap: Optimistic rollups enable higher throughput by moving the computationally demanding validation of transactions away from the main chain, Layer 1, to dedicated nodes, Layer 2.

Moving validation off-chain is far from new, but optimistic rollups do so while delegating consensus – the authority to determine the true state of the Layer 2 ledger – to Layer 1.

All data necessary to verify correctness of rollup activity remains available on Layer 1, along with an on-chain dispute mechanism for automatically correcting dishonest behavior by rollup nodes.
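The optimistic principle can be captured in a few lines: a commitment is accepted by default and becomes final only if no refutation succeeds within the dispute period. In this hypothetical sketch, the Layer 2 "state" is just the sum of inbox messages, and a refutation simply re-executes the whole inbox; real designs instead narrow a dispute down to a single step or message and check a proof for it.

```ocaml
(* Hypothetical sketch of optimistic commitments and disputes. *)

type commitment = {
  posted_at : int;              (* Layer 1 level of publication *)
  claimed_state : int;          (* toy stand-in for a state hash *)
}

type status = Pending | Final | Refuted

(* Toy Layer 2 semantics: the state is the sum of inbox messages. *)
let execute inbox = List.fold_left ( + ) 0 inbox

(* A challenger re-executes the inbox; the refutation succeeds if the
   commitment claimed a different result. *)
let refutation_succeeds ~inbox c = execute inbox <> c.claimed_state

(* Accepted by default, final once the dispute period has elapsed. *)
let status ~dispute_period ~current_level ~refuted c =
  if refuted then Refuted
  else if current_level >= c.posted_at + dispute_period then Final
  else Pending
```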

This way, optimistic rollups enable the use of powerful, centralized systems for validation of Layer 2 activity, while integrity is ultimately ensured by the security and censorship resistance of the Layer 1 protocol that hosts the rollup.

For a broader understanding of optimistic rollups and the rationale behind them as a scaling solution, we recommend our recent blog post on the topic.

TORUs as a stepping stone

Transaction Optimistic Rollups, TORUs, are the first implementation of optimistic rollups in the Tezos protocol. As the name implies, they allow for exchanges of assets, but not execution of smart contracts.

The purpose of TORUs is to

Demonstrate Tezos’ ability to implement scaling solutions through protocol upgrades
Provide a first implementation of optimistic rollups for ecosystem developers to start experimenting with
Pave the way for smart contract optimistic rollups – a more advanced implementation, expected to be part of a protocol proposal later in 2022.

TORUs should be treated as a short-term and experimental scaling solution. For this reason, they are introduced with a sunset of 1 year. This can be changed in future protocol upgrades, should the community desire it.

A roll of tickets

A transaction optimistic rollup is an entity with its own address residing on the Tezos main chain.

The rollup holds assets that are deposited from – and that can be withdrawn back to – the main chain, Layer 1. These assets can be transferred across rollup-native Layer 2 addresses at a higher throughput than what is possible on Layer 1.

Deposited assets are represented as Michelson tickets in the rollup. These are a special type of tokens that were introduced in the Edo upgrade in 2021.

With FA2 tokens (Tezos) and ERC-20 tokens (Ethereum), you don’t actually hold the token in your account. Rather, each token has a centralized smart contract that acts like a bank, keeping track of how much your account holds.

Tickets change this. They are recognized directly by the protocol and can be passed around freely without interacting with a centralized smart contract. If traditional tokens are like a bank account, tickets are like an unforgeable dollar bill. As close to true ownership as it gets.
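The properties that make tickets behave like unforgeable bills can be modeled with an abstract OCaml type. In this hypothetical sketch, code outside the Ticket module can split and join tickets but cannot inflate them, and only tickets with the same ticketer and contents can be joined. Passing the ticketer to mint is a simplification: in Michelson, the TICKET instruction records the contract that executes it as the ticketer.

```ocaml
(* Hypothetical model of ticket properties (not the protocol's code).
   The type is abstract: outside code can split or join tickets, but
   never change their total amount. *)

module Ticket : sig
  type t
  val mint : ticketer:string -> contents:string -> amount:int -> t
  val ticketer : t -> string
  val contents : t -> string
  val amount : t -> int
  val split : t -> int -> (t * t) option
  val join : t -> t -> t option
end = struct
  type t = { ticketer : string; contents : string; amount : int }

  (* Simplification: the caller passes the ticketer; Michelson fills it
     in automatically with the minting contract's address. *)
  let mint ~ticketer ~contents ~amount = { ticketer; contents; amount }
  let ticketer t = t.ticketer
  let contents t = t.contents
  let amount t = t.amount

  (* Split into two non-empty tickets whose amounts sum to the original. *)
  let split t n =
    if n <= 0 || n >= t.amount then None
    else Some ({ t with amount = n }, { t with amount = t.amount - n })

  (* Only tickets with the same ticketer and contents can be merged. *)
  let join a b =
    if a.ticketer = b.ticketer && a.contents = b.contents then
      Some { a with amount = a.amount + b.amount }
    else None
end
```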

Besides this desirable property, using tickets has the benefit of simplifying the code when used in TORUs. For a deep dive on Michelson tickets, check out this blog post by our friends at Marigold.

Built to promote decentralization

Optimistic rollups on Tezos are implemented as part of the Tezos protocol directly, and not as smart contracts. Such rollups are often called enshrined rollups.

An immediate benefit is that it allows for a specialized, more gas- and storage-efficient implementation. For example, deploying a TORU is way less costly in terms of storage than deploying a smart contract of similar complexity. And as Tezos evolves with further upgrades, new features and optimizations can boost functionality and throughput of all enshrined rollups.

The way TORUs are integrated into the protocol makes them highly decentralized and facilitates broad participation. By design, they are

Permissionless: Anyone can deploy a new TORU, and anyone can be a node operator for any existing rollup.
Easy to use: TORUs are identified by specific addresses with the prefix txr1. Layer 2 addresses inside the rollup have the prefix tz4. Furthermore, the interface is designed to make interacting with rollups similar to interacting with a smart contract.
Accessible: It is fairly simple to run a rollup node, which is an (experimental) daemon being developed as part of the Octez code-base. Once configured, it scans for rollup instructions in Layer 1 operations, executes the instructions, and posts a hash representing the updated rollup state to Layer 1.

TORUs on Tezos are not run by a single company or group of people. No one “owns” a smart contract containing the TORU. There’s no admin key. Once deployed, the network “owns” the rollup.

No rules can be applied other than the semantics of included operations. It’s not possible to prevent someone from depositing a ticket or from exchanging this ticket.

Rollup node operators also don’t decide which Layer 2 operations will be processed, or their order of processing, as this is determined by their inclusion in Layer 1 blocks. Any attempt at circumventing this will be neutralized by the Layer 1 dispute mechanism.

Using a TORU

Interaction with a TORU happens using the following workflow:

Depositing: Assets are sent to a Layer 1 smart contract which locks them and mints a corresponding Michelson ticket. The smart contract then sends a transaction containing the ticket and a recipient Layer 2 address to the rollup. It’s worth noting that such ticket minting contracts are not part of the rollup itself and should be seen as a trusted third party. We imagine NFT marketplaces and DEXs integrating this feature, as some trust is already placed in their existing smart contracts by the users.

Transferring: Just as on Layer 1, a Layer 2 address is tied to a private key held by the owner, which is used to sign transactions. Instructions for the TORU – signed transactions – are included in Layer 1 operations, but only interpreted by dedicated rollup nodes scanning the blockchain for these.

Withdrawing: If the recipient of a rollup transfer instruction is a Layer 1 address, the rollup sends the ticket(s) to that address on Layer 1, though it will remain frozen until the dispute period is over. This period will be two weeks initially, but might be changed in the future. Once unfrozen, the ticket can be exchanged for the underlying asset with the smart contract that minted it.
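The three steps can be modelled in a few lines. What follows is a minimal toy model, not the rollup node's or the protocol's code. Its assumptions are that the dispute period is measured in blocks at 30 seconds per block, and that `ToyRollup`, its balance keys and the example addresses are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumption: the two-week dispute period expressed in blocks, at 30 s per block.
DISPUTE_PERIOD_BLOCKS = 14 * 24 * 3600 // 30  # 40_320 blocks (illustrative)


@dataclass
class Ticket:
    ticketer: str   # Layer 1 contract that locked the asset and minted the ticket
    payload: str    # what the ticket stands for, e.g. an asset name
    quantity: int


class ToyRollup:
    """Toy model of the deposit / transfer / withdraw workflow (not protocol code)."""

    def __init__(self):
        self.balances = {}  # (l2_address, ticketer, payload) -> quantity
        self.pending = []   # (l1_address, Ticket, level at which it unfreezes)

    def _credit(self, owner, ticketer, payload, qty):
        key = (owner, ticketer, payload)
        self.balances[key] = self.balances.get(key, 0) + qty

    def _debit(self, owner, ticketer, payload, qty):
        key = (owner, ticketer, payload)
        if self.balances.get(key, 0) < qty:
            raise ValueError("insufficient Layer 2 balance")
        self.balances[key] -= qty

    def deposit(self, ticket, l2_address):
        # The minting contract sends the ticket plus a tz4 recipient to the rollup.
        self._credit(l2_address, ticket.ticketer, ticket.payload, ticket.quantity)

    def transfer(self, src, dst, ticketer, payload, qty):
        # A signed Layer 2 transaction, interpreted only by rollup nodes.
        self._debit(src, ticketer, payload, qty)
        self._credit(dst, ticketer, payload, qty)

    def withdraw(self, src, l1_address, ticketer, payload, qty, level):
        # The ticket goes back to Layer 1 but stays frozen for the dispute period.
        self._debit(src, ticketer, payload, qty)
        ticket = Ticket(ticketer, payload, qty)
        self.pending.append((l1_address, ticket, level + DISPUTE_PERIOD_BLOCKS))

    def claimable(self, l1_address, level):
        # Only unfrozen tickets can be exchanged with the minting contract.
        return [t for addr, t, unfreeze in self.pending
                if addr == l1_address and level >= unfreeze]
```

For example, after a deposit of 10 units to `tz4alice`, a transfer of 4 to `tz4bob` and a withdrawal by `tz4bob` to `tz1bob` at level 100, `claimable("tz1bob", 100)` is empty. The ticket only becomes claimable once the modelled dispute period has elapsed.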

Rollup users who also control a Layer 1 address can of course choose to interact with the rollup by including instructions for their Layer 2 address in Layer 1 operations themselves. However, we expect that most end users will interact with rollups via trusted third-party nodes that batch multiple Layer 2 transactions in a single Layer 1 operation. For example, a wallet provider might run a node that batches all Layer 2 transactions submitted by users of the wallet.

Such batching of transactions is an essential part of the efficiency of rollups. Through the use of BLS signatures, multiple transaction signatures can also be aggregated into one, which means even more Layer 2 transactions can be included in each Layer 1 block.
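Some quick arithmetic shows why aggregation matters for a fixed byte budget. This is a minimal sketch under stated assumptions: a BLS signature (as used by tz4 addresses) takes 96 bytes. The per-transaction body size and the budget used in the example are hypothetical placeholders, not protocol constants.

```python
BLS_SIGNATURE_BYTES = 96  # assumed size of one BLS12-381 signature


def batch_bytes(n_txs, tx_body_bytes, aggregated):
    """Bytes used by a batch of n transactions, with or without aggregation."""
    signatures = 1 if aggregated else n_txs
    return n_txs * tx_body_bytes + signatures * BLS_SIGNATURE_BYTES


def txs_that_fit(budget_bytes, tx_body_bytes, aggregated):
    """Largest n such that batch_bytes(n, ...) <= budget_bytes."""
    if aggregated:
        # One shared signature, then as many bodies as still fit.
        return max(0, (budget_bytes - BLS_SIGNATURE_BYTES) // tx_body_bytes)
    # Every transaction carries its own signature.
    return budget_bytes // (tx_body_bytes + BLS_SIGNATURE_BYTES)
```

With a hypothetical 5,000-byte budget and 30-byte transaction bodies, 39 transactions fit without aggregation and 163 with it. The larger the signature is relative to the transaction body, the bigger the gain.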

Why TORUs are experimental

It bears repeating that TORUs are an experimental solution. They should only be used by projects that are prepared to migrate to a different solution within a year.

The design chosen for TORUs allows for a simpler and faster integration of rollups, but has the side-effect that the maximum throughput of an individual TORU is capped at a lower level than required for long-term scalability.

A conservative estimate puts throughput for each individual TORU at 300-500 tps. Total throughput can be increased by running several rollups in parallel, but the design of TORUs means there is a hard limit imposed by Layer 1 blocksize at around 900 tps.
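A deliberately simple model captures how these two figures interact: total throughput grows with the number of parallel rollups until it reaches the Layer 1 ceiling. The sketch uses only the numbers quoted in the text and ignores congestion and signature aggregation.

```python
PER_ROLLUP_TPS_RANGE = (300, 500)  # conservative per-TORU estimate from the text
LAYER1_CAP_TPS = 900               # approximate hard limit from Layer 1 block size


def aggregate_tps(n_rollups, per_rollup_tps):
    """Idealised total throughput of n parallel TORUs, capped by Layer 1."""
    return min(n_rollups * per_rollup_tps, LAYER1_CAP_TPS)


for per in PER_ROLLUP_TPS_RANGE:
    print(per, [aggregate_tps(n, per) for n in range(1, 5)])
```

Under this model, adding rollups beyond two or three brings no further gain. That ceiling is why TORUs are treated as a stopgap.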

These numbers are however highly dependent on the ecosystem’s use of tools for optimizing rollup capacity, with aggregation of transaction signatures being a major factor. Potential Layer 1 block congestion is another factor.

All in all, TORUs should be seen as a short-term scaling solution and a way for the ecosystem to begin adapting infrastructure to working with rollups, while smart contract optimistic rollups, SCORUs, are being developed.

SCORUs are likely to eventually become the preferred solution for all rollup use cases, as they not only add smart contract functionality and other new features, but are also far more scalable in terms of throughput.

We will cover SCORUs in much more detail in future blog posts, as their inclusion in a protocol upgrade proposal approaches. Stay tuned!

Also coming with Jakarta: spring-cleaning the Michelson interpreter

2022-05-12T16:00:00+02:00

In recent months, the Michelson team at Nomadic Labs has launched a project dedicated to paying off the technical debt in the interpreter. Michelson is the smart contract language of the Tezos blockchain, and its interpreter is an integral part of the Tezos economic protocol. It evolves along with the protocol, and many new features were added to it in previous upgrades. As commonly happens in software development, technical debt accumulated with these changes, and it was decided that it was time to make a dedicated effort to pay at least part of it off.

What is technical debt?

It is a very broad term encompassing all inefficiencies, suboptimal design choices, insufficient testing and even bugs that accumulate in a software project along its lifetime. Writing a good piece of software is a very difficult and time-consuming task. It relies on information that isn’t always readily available. Environments change and so do users’ needs. When pressed by deadlines or uncertainty, developers often create suboptimal solutions, either erroneously or even intentionally as a trade-off necessary to deliver on time.

It is called “debt” because allowing for these deficiencies in the short term allows software to be delivered more quickly, but over time they take a toll on its quality. Hence it’s important to keep the technical debt low and pay it off by fixing inefficiencies on a regular basis.

Technical debt in the Michelson interpreter

When creating a programming language, an additional complication with respect to technical debt arises. While the language’s interpreter (or compiler) accumulates technical debt like any other piece of software, it must still support all the software previously written in the language. So it is imperative to ensure that every program previously written still works in the same way it did before any upgrade. This requirement makes it challenging to redesign programming languages and demands extensive testing of each change.

That said, it is sometimes possible (albeit risky) to announce a breaking change to the language, making it clear to the users that their programs may no longer work with the new version of the language. This was done for instance with Python. Python 3 was incompatible with Python 2 and the breaking changes were at least partially motivated by technical debt. The transition was so painful that most Linux distributions still ship with Python 2 (either as the default or additional interpreter), as a lot of software was never updated to work with Python 3.

However, in the case of Michelson this approach cannot work, because a smart contract, once submitted to the blockchain, stays there forever. The protocol must retain the ability to interpret old contracts, no matter how many protocol upgrades were implemented since the contract’s origination. The need to support every existing contract is often a great obstacle in refactoring and improving the code.

In addition, sometimes certain features of Michelson turn out to be less useful or safe than expected. They might also conflict with other features that are later deemed to be more important. It seems that technical debt can also arise in utterly normal development processes, because of external situational changes! For example, users behave differently than expected, or new scientific developments are announced that would be good to take advantage of. So, is the blockchain doomed to support those legacy features forever?

Fortunately, there is a solution! The Michelson interpreter has a so-called legacy mode, which supports all the features that were ever used on the blockchain. As the name suggests, it is an optional behaviour, enabled only when executing contracts that already exist on-chain. New contracts are type-checked in normal mode before they are originated, and normal mode does not have to support all the legacy features. Thanks to these two distinct modes of operation, it is possible to deprecate old features and disallow origination of contracts that use them, while still supporting them in old contracts.
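The two modes boil down to a single flag at type-checking time. The sketch below is a hypothetical simplification: the real type checker lives in the OCaml protocol code and works on full Michelson types. Here a script is reduced to the set of features it uses, and `contract_in_storage` is an illustrative feature name.

```python
DEPRECATED_FEATURES = {"contract_in_storage"}  # illustrative name only


def typecheck(features_used, legacy):
    """Accept or reject a script, given the (known) features it uses.

    legacy=True models executing a contract that is already on-chain;
    legacy=False models checking a new contract before origination.
    """
    if legacy:
        # Legacy mode keeps supporting every feature ever used on-chain.
        return True
    return not (set(features_used) & DEPRECATED_FEATURES)
```

The same deprecated script is rejected at origination (`legacy=False`) but still runs when it is already deployed (`legacy=True`). That is exactly the gap the patching work aims to close.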

But does it really help? The blockchain still has to support deprecated features in the legacy mode, doesn’t it?

Patching legacy contracts

In the Michelson team at Nomadic Labs, we decided that the time had come to take steps towards eliminating currently¹ deprecated features for good. Unlike Python, which is a general-purpose language, Michelson only makes sense within the Tezos blockchain. Because the blockchain is publicly available, we (unlike the Python developers) have access to all Michelson programs, or at least to all those that can have an impact on the blockchain. Although we cannot modify the on-chain data², we can tell the nodes to replace one contract with another at a later block. Because the Tezos protocol evolves, each new protocol version can (and often does) alter the way nodes store information about the current state of the blockchain (called the context). We can use this mechanism to modify contracts stored in the context. So although contracts retain their original form in the blocks that originated them, they can be patched at a later block by the protocol migration.

This gives us a clear procedure to remove deprecated features even from the legacy mode of the interpreter:

Announce the feature deprecation and remove the feature from the normal mode.
Wait until the protocol deprecating the feature gets activated.
Type check all the contracts on chain in the normal mode and select those that fail.
Patch the selected contracts so that they type check successfully again.
Finally remove legacy features from the legacy mode.
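Steps 3 and 4 can be sketched as a small function. It assumes a hypothetical normal-mode type checker and a hypothetical `patch` function, both passed in as callables. Because the text notes that most scripts were instantiated many times, the sketch patches each distinct script only once and reuses the result for every contract that shares it.

```python
def patch_deprecated(contracts, typecheck_normal, patch):
    """Select contracts that fail in normal mode and patch them.

    contracts: mapping from contract address to script (any hashable form).
    typecheck_normal: hypothetical callable(script) -> bool for normal mode.
    patch: hypothetical callable(script) -> script (hand-written in practice).
    Returns a mapping from address to patched script.
    """
    patched_scripts = {}  # each distinct script is patched only once
    result = {}
    for address, script in contracts.items():
        if typecheck_normal(script):
            continue  # still valid in normal mode, nothing to do
        if script not in patched_scripts:
            new_script = patch(script)
            if not typecheck_normal(new_script):
                raise ValueError(f"patch for {address} still fails to type check")
            patched_scripts[script] = new_script
        result[address] = patched_scripts[script]
    return result
```

Only after this step (and after verifying the patched behaviour) can step 5 remove the feature from legacy mode.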

The waiting step is unfortunate, but necessary. We have to wait for the new protocol with the deprecated feature to get activated, so that no new contracts are originated using the deprecated feature after we do the patching.

In the past we had deprecated features in several protocol upgrades, but this was the first time we actually started patching contracts in order to remove those features. After some patching and hacking on tezos-node, we managed to extract all the smart contract scripts from mainnet and type-checked them using the tezos-client typecheck command. We found 8 contract scripts that required patching, although most of them were instantiated multiple times. The patches were mostly trivial, and all the scripts type-checked successfully again. But does the story end here? As a careful reader will probably guess, it has in fact only just begun! We now need to make sure that all the patched contracts work exactly as they did before.

One example of a deprecated feature we decided to patch away was the ability to store typed references to other smart contracts (values of the Michelson contract type) in storage. Such references had at some point been forbidden in contract storage, so we had to find the contracts that still held contract references in storage and remove them. The following patch to smart contract KT1MzfYSbq18fYr4f44aQRoZBQN72BAtiz5 is an example of such a change:

--- patched_contracts/exprtgpMFzTtyg1STJqANLQsjsMXmkf8UuJTuczQh8GPtqfw18x6Lc.original.tz
+++ patched_contracts/exprtgpMFzTtyg1STJqANLQsjsMXmkf8UuJTuczQh8GPtqfw18x6Lc.patched.tz
@@ -1,10 +1,5 @@
 { parameter (or (lambda %do unit (list operation)) (unit %default)) ;
-  storage
-    (pair key_hash
-          (contract
-             (or (option address)
-                 (or (pair (option address) (option mutez))
-                     (or mutez (or (pair (option address) (option mutez)) address)))))) ;
+  storage (pair key_hash address) ;
   code { DUP ;
          CAR ;
          IF_LEFT
@@ -28,6 +23,8 @@
                NIL operation ;
                { DIP { DIP { DUP } ; SWAP } ; SWAP } ;
                { DIP { DIP { DIP { DROP } } } } ;
+               CONTRACT (or (option address) (or (pair (option address) (option mutez)) (or mutez (or (pair (option address) (option mutez)) address))));
+               IF_SOME {} {PUSH string "Bad contract in storage"; FAILWITH};
                AMOUNT ;
                SENDER ;
                SOME ;

As can be seen here, contract ... in the storage type is replaced by an untyped address. The difference between these values lies only in their types: contract is parametrised by the referenced contract’s parameter type, while address is not. The latter can be converted into the former by specifying that type, which is exactly what the two lines added to the body of the contract do.

It’s worth noting that the patched version of the contract can fail at runtime where the original version couldn’t: it fails when the stored address points to a contract of the wrong type. This is unfortunate, but could not be helped. We will take steps to verify that the change doesn’t break the contract, and once that is done, all should be well.
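The runtime behaviour added by the patch can be modelled outside Michelson. In this hypothetical model, a `registry` dict maps addresses to their parameter types, represented as plain strings. The real CONTRACT instruction also deals with entrypoints and implicit accounts, which the sketch ignores.

```python
def contract_of_address(registry, address, expected_param_type):
    """Model of Michelson CONTRACT: Some(handle) if the address exists and
    its parameter type matches the expected one, otherwise None."""
    actual = registry.get(address)
    if actual is None or actual != expected_param_type:
        return None
    return (address, expected_param_type)


def patched_prologue(registry, stored_address, expected_param_type):
    """Mirrors the two lines added by the patch:
    CONTRACT <type> ; IF_SOME {} { PUSH string "Bad contract in storage" ; FAILWITH }"""
    handle = contract_of_address(registry, stored_address, expected_param_type)
    if handle is None:
        raise RuntimeError("Bad contract in storage")
    return handle
```

The original contract could never reach the failure branch, because the type was fixed in storage. The patched one can, which is why the replay-based verification that follows matters.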

Verifying the patched contracts

How do we make sure that we didn’t break a program of perhaps a thousand lines of code that we’re seeing for the first time? This is especially worrisome considering that Michelson is not a particularly human-friendly language. There’s no specification for the contract (formal or even informal), no documentation, and all we know about the author is their public cryptographic key (and in fact there’s no guarantee that the person who originated the contract is also the one who wrote it). The situation seems almost hopeless.

Fortunately, we had written one of the particularly terrifying contracts ourselves some time ago. It came with some documentation and even some tests written as a shell script, so we started with that one. We rewrote the test, because it relied too much on old behavior of tezos-client that has since changed. Reworking that test also gave us a thorough understanding of the purpose and interface of the contract. We ran the test against both the original and the patched versions of the contract, and fortunately both passed. So far, so good.

With other contracts we had less luck, though. Fortunately, what happens on a blockchain stays there forever, so we have a complete record of all transactions ever made to these contracts. We could replay these transactions with the patched version of the script and check whether they yield the same results as they originally did. How can we replay a transaction to a smart contract while altering some of its aspects? Of course, we have the tezos-client run script command. It accepts a script, a storage and a parameter, and executes the given script in the current context of the node. However, there’s a lot more of the blockchain’s state to reproduce than just storage and parameter. The contract has a balance and an address (which is more or less randomly assigned to the contract during origination). In addition, the transaction comes from some other account, which also has an address, and so forth.

Indeed, as we replayed transactions with the original scripts to verify the soundness of our procedure, some contracts failed on transactions that had succeeded in the past. The apparent point of failure was the checking of user signatures. But why would a signature that used to be valid become invalid over time? Fortunately, the failures occurred in the contract we had tests for. From these tests we deduced that users sign a certain message with their private key, and that part of that message is the address of the contract being called.

With the run script command, the node originates a new contract with a fresh address in order to run the script. Hence the replayed scripts had a different address from the original ones, which is why historic signatures did not verify in replay. It was therefore necessary to force execution at a given address, so before testing the scripts we had to add new options to tezos-client. This is how paying off technical debt can sometimes lead to new features in the software.
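The failure mode is easy to reproduce in miniature. The sketch uses HMAC-SHA256 purely as a symmetric stand-in for the public-key signatures that Tezos contracts actually check, and the message layout is a hypothetical choice. The point is only that once the contract's address is part of the signed message, a check at any other address fails.

```python
import hashlib
import hmac


def message(contract_address, payload):
    # Hypothetical layout: the signed message includes the called contract's address.
    return contract_address.encode() + b"|" + payload


def sign(key, contract_address, payload):
    # HMAC is a stand-in here; real contracts verify public-key signatures.
    return hmac.new(key, message(contract_address, payload), hashlib.sha256).digest()


def check(key, contract_address, payload, sig):
    expected = sign(key, contract_address, payload)
    return hmac.compare_digest(expected, sig)
```

A signature produced for `KT1orig` verifies at `KT1orig` but not at a freshly assigned `KT1fresh`. That is why forcing the execution address was a prerequisite for faithful replays.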

Having done that we were able to replay transactions from the past.

Scaling up

Across all the contracts, there were several hundred transactions to replay in total. Typing the parameters for all these transactions on the command line would take a lot of time, so it was necessary to devise an automatic process for replaying the transactions and checking the results.

We decided to use the API of the tzkt.io indexer to download historic transaction data for our contracts. We wrote a script that, for each transaction, calls the local tezos-client to execute the contract, then parses the result and compares it with what the indexer reported.
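The core loop of such a script might look like the following. This is a minimal sketch under stated assumptions. `HistoricOp` is a hypothetical, flattened shape for indexer data (the real tzkt.io responses are richer). `run_script` stands for a hypothetical wrapper around tezos-client run script that returns the new storage. Comparing emitted internal operations, which the real script also did, is left out.

```python
from dataclasses import dataclass


@dataclass
class HistoricOp:
    # Hypothetical, flattened view of one transaction fetched from an indexer.
    op_hash: str
    contract: str
    parameter: str
    storage_before: str
    storage_after: str


def replay(ops, run_script):
    """Re-execute each operation and group mismatches by contract."""
    failures = {}
    for op in ops:
        try:
            result = run_script(op.contract, op.storage_before, op.parameter)
        except Exception as exc:  # any execution error counts as a mismatch
            result = f"error: {exc}"
        if result != op.storage_after:
            failures.setdefault(op.contract, []).append(op.op_hash)
    return failures
```

Grouping failures per contract gives the kind of summary described next: it shows at a glance which contracts need debugging.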

Developing the script took a lot of time as we had to translate between 2 different encodings of Michelson values (JSON and Micheline), distinguish between internal operations generated by the script and the “main” operation that triggered them, and generally understand the data returned by the indexer in various situations. We also had to decode operation results returned by the indexer and compare them to those produced by tezos-client. Lastly, we had to produce a nice summary, indicating which operations to which contracts failed so that we could debug them.
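One piece of that translation can be sketched concretely: rendering a value in the Micheline JSON encoding used by Tezos RPCs (`int`, `string`, `bytes`, `prim`/`args`/`annots` objects, and lists for sequences) as Micheline text. This is a simplified sketch, not the converter we used. It does no line breaking, and it approximates Michelson string escaping with `json.dumps`, which is adequate for printable ASCII strings.

```python
import json


def to_micheline(node, nested=False):
    """Render a Micheline JSON node as single-line Micheline text."""
    if isinstance(node, list):  # sequences are JSON arrays
        if not node:
            return "{}"
        return "{ " + " ; ".join(to_micheline(n) for n in node) + " }"
    if "int" in node:
        return node["int"]
    if "string" in node:
        return json.dumps(node["string"])  # simplified escaping
    if "bytes" in node:
        return "0x" + node["bytes"]
    parts = [node["prim"], *node.get("annots", [])]
    parts += [to_micheline(arg, nested=True) for arg in node.get("args", [])]
    text = " ".join(parts)
    # A primitive applied to arguments needs parentheses when it is itself an argument.
    return f"({text})" if nested and len(parts) > 1 else text
```

For instance, `{"prim": "Pair", "args": [{"int": "1"}, {"prim": "Some", "args": [{"string": "a"}]}]}` renders as `Pair 1 (Some "a")`.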

After fixing all false positives and errors, we were left with some operations that we couldn’t get to work. One of the contracts had previously been patched during a migration, so its storage type had already changed. Reproducing transactions from before that patch was of course impossible, so we had to skip them³. Moreover, some contracts relied on big maps, which are stored in the node’s storage, and there is currently no way to inject modified big map values into the run script command. It’s not at all obvious what an interface for this should look like, not to mention the implementation problems. Hence we decided to leave these operations unchecked. There were only a few of them, though.

Code review

The contracts will be patched along with a protocol migration. The node iterates over all the contracts, looking for hashes of the contracts we want to modify. Once such a hash is found, the contract for that account is replaced with a hard-coded binary string representing the patched version of the contract. How do we review such a change? Binary representation of Michelson code is even less readable to a human eye than the code itself. Reviewers could and did review the test for the one contract that we had a written test for, as well as the migration code. But how do reviewers read the contract patches if they’re binary-encoded?
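The migration step itself is conceptually a lookup table keyed by script hash. This minimal sketch stands in for the OCaml migration code. Its assumptions are that contexts are plain dicts from address to binary script, and that Blake2b with a 32-byte digest is a stand-in hash: the expr… names in the diff above are base58check-encoded script hashes, and that encoding is skipped here.

```python
import hashlib


def script_hash(script_bytes):
    # Stand-in: hex-encoded Blake2b-256; real script hashes are base58check "expr..." strings.
    return hashlib.blake2b(script_bytes, digest_size=32).hexdigest()


def migrate(context, patches):
    """Replace every script whose hash appears in `patches`.

    context: address -> binary script (modified in place).
    patches: script hash -> hard-coded patched binary script.
    Returns the addresses whose script was replaced.
    """
    replaced = []
    for address, script in list(context.items()):
        new_script = patches.get(script_hash(script))
        if new_script is not None:
            context[address] = new_script
            replaced.append(address)
    return replaced
```

Keying on the script hash rather than the address means that every instance of the same script is patched by a single entry.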

To make the review process easier we attached to the migration code both the original and the patched code for all the modified contracts as well as diffs between them. We wrote unit tests to verify that the patched versions were equivalent to hard-coded binary representations. We also checked that the supplied diffs are the same when comparing the hard-coded binary representation and the text files attached to the original code.

Finally we published the aforementioned script to let the reviewers run the test for themselves. Replaying all the transactions takes a few hours, so it’s not very convenient to run, but not that bad if one has to do it only once.

At the time of writing, the migration code is merged into the master branch and constitutes a part of the Jakarta proposal. Obviously, it won’t be executed until Jakarta is accepted by the community and activated. That said, having performed all these tests and checks, we’re confident that the patched contracts work the same as they did before, or at least they do for all the test cases that reality itself graciously provided us. After all, what better test for software is there than real usage by real users?

Actually, the word “currently” here refers to the moment the decision was made: while working on this, we had already decided to deprecate more features, which were not included in the scope of this work. ↩
Changing the contracts directly would disturb block hashes, thus making these blocks invalid. ↩
The states of the storage that the indexer remembered were invalid according to the patched version of the contract, so attempts to replay these transactions with the patched version resulted in a type error. ↩

Announcing “Jakarta 2”

2022-04-27T09:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

In a recent blog post, we reported two critical bugs in the original Jakarta protocol proposal, affecting the implementation of the experimental Transaction Optimistic Rollups (TORUs) feature.

Fortunately, we have been able to confidently fix both issues, and a new protocol proposal, Jakarta 2, has since been developed, tested and released. Moreover, this proposal (whose “real name” is, as usual, given by its hash, PtJakart2xVj7pYXJBXrqHgd82rdkLey5ZeeGwDgPp9rhQUbSqY) has already been injected and is already competing with the original Jakarta proposal in the current Proposal period.

Yesterday, Nomadic Labs published a release candidate for a new Octez suite version, v13.0~rc1. This release candidate includes the daemon binaries that enable testing the Jakarta 2 protocol proposal in the upcoming Jakartanet test network.

Changes in Jakarta 2

The changes in Jakarta 2 with respect to the previous Jakarta proposal concern only the fixes for the reported bugs. In this section we shed more light on these issues and detail how they were corrected in the new protocol proposal.

Bugfix for uncraftable rejection operations

The first bug concerns rejection operations, the Layer 1 operation that allows an honest node to refute an erroneous or malicious commitment in a TORU. A flaw in its implementation ultimately made it possible for an attacker to publish commitments whose refutation proofs would not fit within the maximum allowed size for a batch of Tezos manager¹ operations: 32 KiB.

This limit underpins a design decision in the current implementation of TORUs: there is an upper bound on the number of Layer 2 transaction batches from the same rollup that can be (ahem!) rolled up within a given Tezos Layer 1 block. For each of these batches, the rollup’s operators are required to include the batch’s hash in their commitment operation on Layer 1. Inserting too many batches from the same rollup in the same block would make it impossible to insert a valid commitment for it in a subsequent block, leaving the affected rollup stuck forever.

Rejection operations denounce a specific rollup commitment, which contains one hash per batch of Layer 2 transactions. Honest rollup accusers can trivially find the first hash they disagree with, and craft a rejection operation to refute that batch. A rejection operation requires two key pieces of data: (1) the target batch itself, and (2) a collection of Merkle proofs that allow the batch to be replayed. This raises an immediate concern: what if the provided Merkle proofs are simply too large to fit within the size limit of a Tezos Layer 1 manager operation? If the batch is large enough (that is, if it contains enough transactions), this can definitely happen.

To tackle this risk, TORUs rely on two design choices. First, TORUs encode Merkle proofs using a format that allows truncated proofs to be sent. Second, a batch for which one can exhibit a valid, “large enough” truncated proof has to be considered a no-op.

In the first Jakarta proposal, the size limit for Layer 2 operation batches was 5 KB, and a truncated proof larger than 30 KB was required to prove that a batch should be considered a no-op. So, for an honest rollup node to prove that a 5 KB batch was to be ignored, it needed to produce a refutation of at least 35 KB (about 34.2 KiB)… which does not fit within the 32 KiB limit on Layer 1.

The fix is straightforward: accept a truncated proof of 30KiB - sizeof(batch). In this way, rejection operations are guaranteed to fit within a single Tezos manager operation. The curious reader can have a look at the diff of the MR implementing this bugfix.
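The arithmetic behind the bug and the fix fits in a few lines. This sketch assumes that “5 KB” and “30 KB” in the first proposal mean 5,000 and 30,000 bytes. It takes the fixed threshold of 30 KiB minus the batch size as stated in the text, and it ignores the other fields of a rejection operation.

```python
KIB = 1024
MAX_MANAGER_OP_BYTES = 32 * KIB    # Layer 1 size limit for manager operations
MAX_BATCH_BYTES = 5_000            # assumed reading of "5 KB"
OLD_MIN_NOOP_PROOF_BYTES = 30_000  # assumed reading of "30 KB"


def old_rejection_size(batch_bytes):
    # Before the fix: the no-op proof threshold did not depend on the batch.
    return batch_bytes + OLD_MIN_NOOP_PROOF_BYTES


def new_proof_threshold(batch_bytes):
    # After the fix: the threshold shrinks as the batch grows.
    return 30 * KIB - batch_bytes


def new_rejection_size(batch_bytes):
    return batch_bytes + new_proof_threshold(batch_bytes)
```

With the old rule, a maximal batch gives 35,000 bytes, above the 32,768-byte limit. With the new rule, batch plus proof is always exactly 30 KiB, whatever the batch size.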

Bugfix for forged zero-valued Tickets

The second bug concerns Tezos Tickets. Tickets are an abstract representation of tokenized assets, and in TORUs they are used to represent the assets deposited and transacted within the rollup.

A ticket can be seen as a tuple consisting of (1) its (typed) payload, (2) the address of the smart contract that minted it, a.k.a. its ticketer, and (3) a quantity. A ticket can be split into two (or more) parts, provided that the sum of the quantities of the resulting tickets equals the quantity of the original one. Conversely, two tickets with the same payload and issued by the same ticketer can be joined, adding up their quantities.

A subtle and often-overlooked detail is that a zero-valued ticket is a perfectly valid one. In a nutshell, owning a ticket whose quantity is 0 is not the same thing as not owning any ticket at all, and the consequences or semantic meaning of the former depend on the application.

For both TORUs and the Tezos protocol’s Table of Tickets², specific design choices concerning zero-valued tickets were made in order to simplify each component’s design. On the one hand, the Table of Tickets does not take zero-valued tickets into account at all. This is a legitimate choice, because its purpose is to prevent the forgery, and the incorrect splitting, of tickets holding non-zero quantities. On the other hand, in TORUs we decided not to support zero-valued ticket transfers within Layer 2, and to rely on the Table of Tickets to transfer ticket ownership on Layer 1. In the end, these choices proved incompatible, as the TORU implementation relied on stronger assumptions than those provided by the Table of Tickets. As is often the case in software engineering, composing two software components brought unforeseen issues in the resulting system.

In Jakarta 2, we addressed this situation by enforcing more strictly that zero-valued tickets are not supported within rollups, adding extra checks to the Tezos protocol’s Layer 1. More precisely, three straightforward bound checks are enough to prevent zero-valued tickets from being forged by TORU withdrawal operations.
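The ticket semantics described above can be illustrated with a small model. This is a hypothetical sketch, not the protocol's code. `split` mirrors Michelson's behaviour of returning nothing when the quantities don't add up. `check_withdrawal` is a single illustrative guard in the spirit of the fix, not the three actual bound checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticketer: str
    payload: str
    quantity: int  # 0 is a valid quantity on Layer 1


def split(t, q1, q2):
    """Split into two tickets whose quantities sum to the original, else None."""
    if q1 < 0 or q2 < 0 or q1 + q2 != t.quantity:
        return None
    return Ticket(t.ticketer, t.payload, q1), Ticket(t.ticketer, t.payload, q2)


def join(a, b):
    """Join two tickets with the same ticketer and payload, else None."""
    if (a.ticketer, a.payload) != (b.ticketer, b.payload):
        return None
    return Ticket(a.ticketer, a.payload, a.quantity + b.quantity)


def check_withdrawal(t):
    # Hypothetical guard: a rollup withdrawal must not yield a zero-valued ticket.
    if t.quantity <= 0:
        raise ValueError("zero-valued tickets are not supported by TORUs")
    return t
```

Splitting a ticket of 3 into 3 and 0 is legitimate on Layer 1, so zero-valued tickets arise naturally there. The guard only ensures that a rollup withdrawal cannot mint one out of thin air.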

Join Jakartanet!

A testnet for the Jakarta 2 protocol proposal, named Jakartanet, will launch on April 27th at 15:00 UTC. It is critical to have as many bakers as possible participating in this testnet by running nodes, producing blocks and deploying apps. We are looking for more bakers to participate from day one. If you are interested, please join the Tezos Baking Slack and make yourself known in the #test-networks channel.

Once more, we strongly encourage you to test your own Tezos-based applications to check for compatibility problems.

Should the Jakarta 2 protocol proposal be accepted by the community, the following minimal version of Tezos node (shell) software will be necessary to participate in the consensus, due to necessary changes introduced to the protocol environment: v13 of Octez, or v3 of TezEdge.

Manager operations is the term we use in the Tezos architecture to denote any fee-paying operation in competition for block space, like transfers, smart contract originations, and smart contracts calls. Thus, the new Layer 1 operations related to rollups belong to this category as well (see here for more details). ↩
The Table of Tickets, introduced in Jakarta, is an internal component of the Tezos economic protocol that keeps track of all emitted tickets in the current context, that is, at the current head of the blockchain. ↩

We discovered two bugs in Jakarta — a reproposal is coming

2022-04-22T17:00:00+02:00

TL;DR: During continued stress-testing of the Jakarta protocol proposal, we discovered two critical bugs in the implementation of Transaction Optimistic Rollups. A new proposal, Jakarta 2, will be released in time to be considered in the ongoing proposal period. We encourage bakers to avoid upvoting the initial Jakarta proposal, and to upvote Jakarta 2 instead.

Update: Jakarta 2 has been released. More details in its announcement blog post.

We recently announced Jakarta, a new protocol proposal for the Tezos blockchain which notably introduces the first, experimental, step in our scalability roadmap: transaction optimistic rollups (TORUs).

Despite significant unit testing, property testing, and integration testing of the new protocol features prior to injection, we have found two critical bugs in the implementation of TORUs during continued post-injection testing.

The first bug is an issue with the rejection operation, the Layer 1 operation that allows an honest node to refute an erroneous (read: malicious) commitment. A flaw in its implementation ultimately makes it possible for an attacker to publish commitments for which honest participants cannot provide refutation proofs. This is a critical security bug, but one isolated to TORUs. It has no impact on the security of Tezos’ Layer 1.

The second bug concerns Ticket withdrawals, the underlying mechanism allowing for tokens to be transferred back from the Layer 2 rollup to Layer 1. The bug enables an attacker to forge tickets accounting for 0 tokens.

In TORUs, tickets are used to internally represent assets deposited to the rollup. The design of tickets allows for zero-valued tickets, i.e. tickets accounting for 0 tokens, which can make sense in some use cases. However, due to the bug in Jakarta, zero-valued tickets can be forged when withdrawing from a rollup. For other smart contracts where zero-valued tickets are meaningful, an attacker can exploit the bug to create them for free. Hence, the bug renders any ticket accounting for 0 tokens inherently suspicious, and ultimately worthless. This is therefore a critical bug which might affect existing or future applications building on Tezos Tickets.

Fortunately, we have been able to confidently fix both issues, using fairly small patches that were easy to review, and that are backed up with appropriate tests, which have now been added to the existing test suite.

A call to bakers

While the original Jakarta protocol proposal has already passed the minimal quorum, we are still early in the proposal period. There is sufficient time for Jakarta 2 to achieve enough upvotes for it to replace the original proposal as the one that will continue to the next step.

We encourage bakers who have still not voted to wait until the new Jakarta 2 proposal is released early next week. We also encourage bakers who have already voted for Jakarta to vote again for the new proposal once it is injected.

We have decided to delay the publication of the release candidate for the next Octez version v13~rc1, so as to provide the baker and accuser binaries for Jakarta 2 instead. The Jakartanet test network will thus be based on the Jakarta 2 protocol proposal.

We invite the community, especially developers of applications building upon TORUs, to join Jakartanet from the beginning. We also want to take this opportunity to advertise and advocate the use of the Dailynet and Mondaynet teztnets. Feel free to reach out to us if you need help deploying your contracts and applications on the test networks.

As for future proposals, we have already taken action to improve the specification and testing frameworks so as to better uncover similar flaws, and we intend to strengthen them even more. The occasional bug is unavoidable, some critical and others less so, but we strive to make it ever less likely that larger bugs escape the nets. And when they do, Tezos’ on-chain amendment process makes the blockchain uniquely equipped to react, fix, and upgrade accordingly.

TPS evaluation for Tezos

2022-04-19T18:00:00+02:00

In this post we are going to try to answer the question “what is the throughput of Tezos on the Mainnet” by examining the subject from different angles. First, we must note that there is no standard methodology for measuring the throughput of a blockchain. In general, people want to know the number of transactions per second (TPS) that a blockchain can process. There are, however, a few questions that need to be answered first:

What is the configuration of the network that is being examined? Is it a minimal network or a larger network where latencies are increasingly important?
What does it mean for a transaction to be “processed”? Does it mean that it is included in the next produced block or perhaps in a decided block?
Is the TPS value limited by the protocol constants/parameters or technical limitations of the system?
What kind of transactions are we dealing with? In most systems that support smart contracts the makeup of the stream of transactions that is being processed is going to directly affect the maximal TPS number. The situation is simpler with e.g. Bitcoin where all transactions are identical from that point of view.

The last point is an important one because, all things being equal, TPS of a blockchain can change over time depending on the transactions users tend to perform. Let’s take a closer look.

Analyzing transactions

In Tezos, all transactions can be divided into two groups depending on their destination:

Transactions that have an implicit contract as destination.
Transactions that have a smart contract as destination.

In the first case the cost (both in gas and in terms of processing power) is constant no matter the amount being transferred and other parameters. In the second case, however, the cost and the performance of such a transaction is going to depend on the code of the smart contract in question.

It follows that, in order to determine the TPS, one needs to know the proportions of different kinds of transactions in a typical block. The best way to achieve that is to analyze the history of performed transactions. We chose to use the Tezos indexer to do that. The tool stores Tezos’s transaction history in a PostgreSQL database, which we can query and analyze. The result is a JSON file such as this one:

$ cat 2022-01-01-to-2022-02-28.json
{
  "regular": 5026035,
  "origination": 16905,
  "contract": {
    "KT1RJ6PbjHpwc3M5rw5s2Nbmefwbuwbdxton": 1611069
  }
}

Here, we can see that in the first two months of 2022 there were 5026035 transactions that had an implicit contract as their destination, and that the most popular smart contract at the time was KT1RJ6PbjHpwc3M5rw5s2Nbmefwbuwbdxton (Hic et Nunc NFTs) with 1611069 calls. For now, we include only smart contracts that are involved in at least 10% of all transactions, but this threshold can be changed. With this data, we can run gas estimations for regular transfers and smart contract calls in order to calculate the average transaction cost.
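As a rough sketch of this step, assuming the JSON layout shown above, the following Python computes each transaction kind's share of the total and keeps only contracts at or above the 10% threshold. The function names, the choice to count originations in the total but not in the mix, and the renormalization in average_cost are illustrative assumptions, not the behavior of the Tezos TPS tool; the per-kind gas costs would have to be measured separately.

```python
import json

# Summary in the layout shown above (counts copied from the example file).
summary = json.loads("""
{
  "regular": 5026035,
  "origination": 16905,
  "contract": {"KT1RJ6PbjHpwc3M5rw5s2Nbmefwbuwbdxton": 1611069}
}
""")

def transaction_mix(summary: dict, threshold: float = 0.10) -> dict:
    """Share of each transaction kind among all recorded transactions.

    Regular transfers are always kept; a contract is kept only if its share
    of all transactions is at least `threshold` (assumption: originations
    count towards the total but are not part of the returned mix).
    """
    contracts = summary.get("contract", {})
    total = summary["regular"] + summary.get("origination", 0) + sum(contracts.values())
    mix = {"regular": summary["regular"] / total}
    for address, calls in contracts.items():
        share = calls / total
        if share >= threshold:
            mix[address] = share
    return mix

def average_cost(mix: dict, gas_cost: dict) -> float:
    """Weighted average gas cost, renormalized over the kinds kept in `mix`."""
    weight = sum(mix.values())
    return sum(share * gas_cost[kind] for kind, share in mix.items()) / weight

mix = transaction_mix(summary)
print(mix)
```

Renormalizing means the dropped long tail of contracts is treated as if it had the same cost profile as the kinds that were kept, which is one simple choice among several.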

Gas-based estimation

It is possible to perform a TPS estimation based on the maximal allowed gas consumption per block. The total gas consumed by the transactions in a block cannot exceed hard_gas_limit_per_block, a protocol parameter whose value (on the master branch) is 5200000. With Tenderbake, in the best-case scenario a block is decided at round 0; given the current value of minimal_block_delay, the maximal block production rate is therefore 1 block per 30 seconds. Next, we can use the average transaction cost obtained from the analysis of transaction data to determine TPS as:

tps = hard_gas_limit_per_block / (average_transaction_cost * minimal_block_delay)
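The formula can be checked with a few lines of Python. This is a sketch that plugs in the values quoted above; the helper name gas_tps is hypothetical. Rounding to the nearest integer gives the 60 and 122 reported by the tool for average costs of 2900 and 1421 gas, although how the tool itself rounds is an assumption.

```python
# Values quoted in the text; other networks or protocol versions may differ.
HARD_GAS_LIMIT_PER_BLOCK = 5_200_000  # gas units per block (master branch)
MINIMAL_BLOCK_DELAY = 30              # seconds per block, best case (round 0)

def gas_tps(average_transaction_cost: float,
            gas_limit: int = HARD_GAS_LIMIT_PER_BLOCK,
            block_delay: int = MINIMAL_BLOCK_DELAY) -> float:
    """Gas-bound TPS: average transactions that fit in a block, per second."""
    return gas_limit / (average_transaction_cost * block_delay)

for cost in (2900, 1421):  # average costs reported by the tool
    print(cost, round(gas_tps(cost)))
```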

Our utility (which is part of Tezos and can be built by running make build-tps-deps && make build-tps) can perform this estimation:

./tezos-tps-evaluation-gas-tps -a average-block=src/bin_tps_evaluation/average-block.json
[14:26:25.243] Starting test: tezos_tps_gas
[14:26:27.956] Reading description of the average block from src/bin_tps_evaluation/average-block.json
[14:26:28.061] Originating smart contracts
[14:26:28.285] Waiting to reach the next level
[14:26:57.514] Average transaction cost: 2900
[14:26:57.514] Gas TPS: 60
[14:26:57.536] [SUCCESS] (1/1) tezos_tps_gas

We can see that the gas-based estimation of TPS on Mainnet is currently 60.

For comparison with blockchains like Bitcoin that do not support smart contracts, here is a gas-based estimation for regular transactions only:

./tezos-tps-evaluation-gas-tps
[08:40:21.806] Starting test: tezos_gas_tps
[08:40:21.807] Gas TPS estimation
[08:40:24.382] Using the default average block description
[08:40:24.499] Originating smart contracts
[08:40:24.723] Waiting to reach the next level
[08:40:54.642] Average transaction cost: 1421
[08:40:54.643] Gas TPS: 122
[08:40:54.667] [SUCCESS] (1/1) tezos_gas_tps

Benchmark-based estimation

It is also possible to lift the limits imposed by the protocol, whether they take the form of constants and parameters or of hard-coded values, in order to determine the purely technical TPS limit by running a specialized benchmark. This approach requires us to make some choices regarding the setup:

We are going to examine the smallest self-sufficient network possible: a node + a baker. This way we can judge the raw performance of the node and the baker more directly, but not that of a distributed network. The former is much simpler and more predictable than the latter. Also, needless to say, the performance of a distributed network depends directly on the performance of the node.
A transaction is considered processed if it is successfully injected and included in the next produced block, i.e. it has been successfully applied and validated.
The stream of transactions is modeled on actual historical data.

The TPS benchmark is run with the tezos-tps-evaluation-benchmark-tps command. It spawns a network comprising a node, a baker, and a client. The network uses the same constants and parameters as the Mainnet. By default, 10 blocks are produced, but this can be changed with the blocks-total command-line option. The total number of applied operations in these blocks is divided by the total time spent producing the blocks, and the resulting value is presented as the empirical TPS. The benchmark also calculates the de facto TPS of injection, i.e. the number of transactions actually injected per second, which is useful in judging the results.

./tezos-tps-evaluation-benchmark-tps -a average-block=src/bin_tps_evaluation/average-block.json -a lift-protocol-limits
[20:42:19.167] Starting test: tezos_tps_benchmark
[20:42:19.167] Gas TPS estimation
[20:42:21.761] Reading description of the average block from src/bin_tps_evaluation/average-block.json
[20:42:21.858] Originating smart contracts
[20:42:22.082] Waiting to reach the next level
[20:42:51.262] Average transaction cost: 2900
[20:42:51.262] Gas TPS: 60
[20:42:51.293] Tezos TPS benchmark
[20:42:51.293] Protocol: Alpha
[20:42:51.293] Blocks to bake: 10
[20:42:51.293] Accounts to use: 30000
[20:42:51.293] Spinning up the network...
[20:43:01.039] Originating smart contracts
[20:43:01.861] Waiting to reach the next level
[20:43:27.301] Using the parameter file: /run/user/1000/tezt-185969/1/parameters.json
[20:43:27.301] Waiting to reach level 3
[20:43:57.054] The benchmark has been started
[20:49:03.482] Produced 10 block(s) in 306.43 seconds
[20:49:04.091] BLSh3JB76aHp2Zg37zrv35kr6XNu8Nti9hiGELriEWNJXHoEQN9 -> 5699
[20:49:04.431] BM2MMEawEUvKZm25LoB81rGEm2MtQtdkaXzAeuJACsbcZcsuuD2 -> 5651
[20:49:04.759] BLRN2oxtgpep1D3oCiD6NGo9GPha6Z2S69LiTa8SPskHLdjN1Ag -> 5762
[20:49:05.095] BMD3U3mv8EM9ZKYsK6sxFob5ZHHCYRUZiSn6LJVgvnp8eMaV8bY -> 5822
[20:49:05.411] BLCZf4mHGaawCCuJZLF2yFXwhg1tn32BbBezYm4dnik1mfDQ4wK -> 5892
[20:49:05.684] BLpPYyppLmp4enYopGBWNHvhGLzBFuWaFSJmEqkR1QQK1stWbPt -> 3868
[20:49:05.996] BKmqoWrZgevEmmd7RXaEX5umsBQaW3cnHJRK4AJq5uzEXzaryeE -> 5721
[20:49:06.303] BMGpJV4HtYjzfeq5HLJmvN9LFpPoPeuBeEdGdwYsZ57rc7gyvnQ -> 5810
[20:49:06.610] BLYCC8jRAb4xAQAYzZixC4PSYLacFtFHAV2C6jWi43obZL9aXVw -> 5537
[20:49:06.918] BLAb72vxpN5tnGoq2gbLP1ce8XfAqEruWgbkheyKdiUmMC5LuDt -> 5417
[20:49:06.918] Total applied transactions: 55179
[20:49:06.918] Total injected transactions: 55452
[20:49:06.918] TPS of injection (target): 1000
[20:49:06.918] TPS of injection (de facto): 180.96
[20:49:06.918] Empirical TPS: 180.07
[20:49:06.962] [SUCCESS] (1/1) tezos_tps_benchmark

We can see that on my laptop the empirical TPS result for the minimal network is 180.
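The empirical figure is a simple ratio of applied transactions to elapsed time. A minimal sketch, using numbers copied from the log above, reproduces it; it assumes that the 306.43 seconds reported for producing the 10 blocks is the denominator the benchmark uses.

```python
# Transactions applied per block, copied from the benchmark log above.
applied_per_block = [5699, 5651, 5762, 5822, 5892, 3868, 5721, 5810, 5537, 5417]
total_injected = 55452    # "Total injected transactions"
elapsed_seconds = 306.43  # "Produced 10 block(s) in 306.43 seconds"

def rate(count: int, seconds: float) -> float:
    """Transactions per second over the measured interval."""
    return count / seconds

total_applied = sum(applied_per_block)
empirical_tps = rate(total_applied, elapsed_seconds)
injection_tps = rate(total_injected, elapsed_seconds)
# prints: applied=55179 empirical=180.07 injection=180.96
print(f"applied={total_applied} empirical={empirical_tps:.2f} injection={injection_tps:.2f}")
```

The per-block counts add up exactly to the 55179 applied transactions reported in the log.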

The goal of the TPS benchmark is to give a high-level estimate of the TPS value that the system is capable of. It can be used to catch TPS regressions, but not to find where exactly the bottlenecks are.

The empirical TPS is significantly affected by the hardware on which the benchmark is run and other factors, such as the amount of logging that is performed. For example, passing -v is likely to result in lower empirical TPS values. This is why it is important to run this kind of benchmark on a dedicated runner that has predictable performance.

The empirical TPS should normally be very close to the de facto TPS of injection. If it isn’t, then it means that the system cannot keep up with the injection rate, i.e. the bottleneck is in the system. Otherwise the bottleneck is in the injecting code, as is the case in the log above.
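The rule of thumb above can be written as a small decision function. This is a hypothetical helper, not part of the benchmark; in particular, the 5% relative tolerance standing in for “very close” is an assumption. Applied to the run above (empirical 180.07, de facto injection 180.96, target 1000), it points at the injecting code.

```python
def locate_bottleneck(empirical_tps: float, injection_tps: float,
                      target_tps: float, tolerance: float = 0.05) -> str:
    """Rough reading of a TPS benchmark run.

    `tolerance` is a hypothetical relative margin for "very close".
    """
    if empirical_tps < injection_tps * (1 - tolerance):
        # Transactions were injected faster than the node and baker processed them.
        return "system"
    if injection_tps < target_tps * (1 - tolerance):
        # The system kept up, but the injector never reached the target rate.
        return "injector"
    # The target rate was injected and processed.
    return "none observed"

print(locate_bottleneck(180.07, 180.96, 1000))
```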

Automatic daily runs of the TPS benchmark

Every day the TPS benchmark is run and the following results are registered:

Gas TPS
De facto TPS of injection, with protocol limits
Empirical TPS, with protocol limits
De facto TPS of injection, with protocol limits lifted
Empirical TPS, with protocol limits lifted

Regressions for these values are detected and recorded using the same framework as the one Tezt long tests use.

Current limitations and future work

Since every smart contract is unique, we cannot automatically provide support for all smart contracts that might become popular in the future. Therefore, some work will be necessary in order to add more smart contracts to the benchmark.

Another area for improvement is making the injecting code more efficient so that it can reach higher rates of injection.

Announcing Tezos’ 10th protocol upgrade proposal “Jakarta”

2022-04-16T12:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

Update: A revised protocol proposal, Jakarta 2, has been released. Jakarta 2 addresses two critical bugs in the implementation of Transaction Optimistic Rollups.

We were proud to see Ithaca 2 go live on April 1st. In keeping with our policy of proposing upgrades on a regularly scheduled basis, we are happy to announce our latest Tezos protocol proposal, Jakarta.

Jakarta’s “true name” is its hash, PtJakartaiDz69SfDDLXJSiuZqTSeSKRDbKVZC8MNzJnvRjvnGw.

The Jakarta protocol proposal contains major updates to the Tezos economic protocol, as well as numerous minor improvements. In this article, we preview its most relevant features. A more thorough description can be read in the protocol proposal’s technical documentation entry, and a complete list of changes is provided in its changelog.

Transaction optimistic rollups

We are pleased to announce that Transaction Optimistic Rollups (or TORUs) have found their way into this protocol proposal, as a first and experimental implementation of optimistic rollups on Tezos. As the name implies, TORUs allow for exchanges of assets, but not execution of smart contracts.

Optimistic rollups enable higher throughput (TPS) by moving the validation of transactions away from the main chain to ‘Layer 2’. They are called optimistic because they work on the assumption that this validation is correct until explicitly proven otherwise. One consequence of this approach is longer finality when withdrawing from the rollup, currently set to 40,000 blocks (about 2 weeks).
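To make the “optimistic” assumption concrete, here is a toy model of a rollup commitment that becomes final once a window of 40,000 blocks has passed without a refutation. The Commitment class and its fields are hypothetical and much simpler than the actual TORU implementation; the 30-second block time assumes every block is produced at round 0.

```python
from dataclasses import dataclass

FINALITY_PERIOD_BLOCKS = 40_000  # withdrawal delay quoted for TORUs
BLOCK_TIME_SECONDS = 30          # assumption: every block produced at round 0

@dataclass
class Commitment:
    """Hypothetical, simplified rollup state commitment posted on the main chain."""
    published_level: int
    refuted: bool = False

    def is_final(self, current_level: int) -> bool:
        # Optimistic rule: the commitment is accepted as correct once the
        # window has elapsed, unless it has been refuted.
        window_closed = current_level >= self.published_level + FINALITY_PERIOD_BLOCKS
        return window_closed and not self.refuted

def finality_days(blocks: int = FINALITY_PERIOD_BLOCKS,
                  block_time: int = BLOCK_TIME_SECONDS) -> float:
    """Length of the withdrawal delay in days."""
    return blocks * block_time / 86_400

print(f"{finality_days():.1f} days")
```

Under these assumptions the delay is about 13.9 days, in line with the “about 2 weeks” above.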

Optimistic rollups have a number of desirable properties:

Trust minimized: You don’t have to trust that a majority of the rollup nodes are honest to always be able to withdraw your funds from the rollup. One honest node is enough.
Permissionless: Anyone can submit operations to a rollup since all the rollup block data is posted on the main chain.
Capital efficient: Unlike with state channels (e.g., Lightning Network), rollup users are not required to lock up a bond upfront. Only rollup node providers are.

With our proposal, Tezos rollups will not be implemented as smart contracts (as, e.g., Arbitrum is on Ethereum), but rather natively in the economic protocol, making them enshrined rollups. Leveraging Tezos’ unique self-amendment feature this way allows us to implement more efficient, expressive, and scalable solutions.

For a broader understanding of optimistic rollups, and the rationale behind them, we recommend checking out our blog post outlining a scaling strategy for Tezos.

Note that TORUs should be considered an experimental feature, and they are introduced with a sunset of 1 year. This sunset can be extended, or removed altogether, in future proposals, depending on community adoption.

A safer Sapling integration

We recently reported a design flaw of the existing integration of Sapling transactions into Michelson smart contracts, which makes unshielding tez from certain smart contracts vulnerable to manipulation.

The Jakarta protocol addresses this situation by implementing a new, safer design.

Jakarta also deprecates the previous format of the Michelson integration for Sapling transactions, and if Jakarta is adopted, it will only be possible to originate Sapling smart contracts which conform to the new version.

Further details on the vulnerability and the new integration design can be found here.

Liquidity Baking toggle vote

The Liquidity Baking Escape Hatch mechanism has been redesigned and renamed to the “Liquidity Baking Toggle Vote”.

The options are now “On”, to vote for the Liquidity Baking subsidy to be turned on; “Off”, to vote for the subsidy to be turned off; and a new “Pass” option, to abstain.

Another change is that deactivation of the subsidy is no longer permanent: if the threshold for deactivation is reached, and the proportion of bakers voting “On” later increases back over the threshold, the subsidy can be restarted.
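The text does not spell out how individual votes are aggregated against the threshold. A simplified model, assuming an exponential moving average of “Off” votes updated once per block (the feature's TZIP has the authoritative definition), shows how “Pass” leaves the state untouched and why deactivation is no longer permanent. The smoothing factor and threshold below are illustrative values, not protocol constants.

```python
from typing import Iterable, Tuple

# Illustrative parameters, NOT the protocol constants.
SMOOTHING = 1 / 2000  # weight of each new vote in the moving average
THRESHOLD = 0.5       # share of "Off" at or above which the subsidy is off

def update(off_share: float, vote: str) -> float:
    """Fold one block's toggle vote into a moving average of "Off" votes."""
    if vote == "pass":  # abstention: the average is left unchanged
        return off_share
    target = 1.0 if vote == "off" else 0.0
    return off_share + SMOOTHING * (target - off_share)

def run(votes: Iterable[str], off_share: float = 0.0) -> Tuple[float, bool]:
    """Apply a sequence of votes; return the average and whether the subsidy is on."""
    for vote in votes:
        off_share = update(off_share, vote)
    return off_share, off_share < THRESHOLD

s, active = run(["off"] * 5000)          # sustained "Off" votes turn the subsidy off
s2, active2 = run(["pass"] * 1000, s)    # abstentions change nothing
s3, active3 = run(["on"] * 5000, s)      # sustained "On" votes turn it back on
```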

More information can be found in the feature’s TZIP.

Note that the future Octez v13 baking daemon for Jakarta, tezos-baker-013-PtJakart, will require the command-line flag --liquidity-baking-toggle-vote to set the delegate’s default Liquidity Baking toggle vote, and will also provide an optional --votefile flag to declare the path of a JSON file encoding the delegate’s toggle vote. When provided, this file takes precedence over the mandatory flag.

Michelson interpreter improvements

We have implemented various improvements to type safety and performance of the Michelson interpreter. The majority of these changes do not impact the semantics of Michelson other than decreasing gas costs for parsing and unparsing scripts.

The only semantic change is that annotations are ignored. With Jakarta, annotations are used by the type-checker and the interpreter only to identify smart contract entry-points. Other annotations remain valid, but they no longer carry any semantic meaning.

Moreover, the protocol proposal upgrades a few smart contracts that relied on legacy features no longer available since the Babylon protocol upgrade, making them compliant with the modern Michelson specification. The list of patched contracts, the patches themselves, and a detailed description of the process can be found in the merge request.

Tickets hardening

Tickets provide a first-class notion of ownership in the Tezos Protocol. They can be used to implement fungible as well as non-fungible tokens.

We introduce a mechanism for explicitly tracking ownership of tickets in the protocol. Whenever a ticket is created or its ownership changes — for instance by sending it to a different contract — it is explicitly recorded and validated against a balance table. This extension does not impact the Michelson API.
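A minimal sketch of the idea of a balance table, assuming balances keyed by ticketer, contents and owner. The TicketBalanceTable class is hypothetical and only illustrates why an explicit ledger makes forged tickets detectable: a transfer that is not backed by a recorded balance is rejected.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

Key = Tuple[str, str, str]  # hypothetical key: (ticketer, contents, owner)

class TicketBalanceTable:
    def __init__(self) -> None:
        self._balances: DefaultDict[Key, int] = defaultdict(int)

    def mint(self, ticketer: str, contents: str, owner: str, amount: int) -> None:
        """Record a newly created ticket for its first owner."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balances[(ticketer, contents, owner)] += amount

    def transfer(self, ticketer: str, contents: str,
                 sender: str, receiver: str, amount: int) -> None:
        """Move ownership, rejecting transfers not backed by a recorded balance."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        key = (ticketer, contents, sender)
        if self._balances.get(key, 0) < amount:
            raise ValueError("ticket not owned by sender: possible forgery")
        self._balances[key] -= amount
        self._balances[(ticketer, contents, receiver)] += amount

    def balance(self, ticketer: str, contents: str, owner: str) -> int:
        return self._balances.get((ticketer, contents, owner), 0)

table = TicketBalanceTable()
table.mint("KT1Ticketer", "gold", "KT1Holder", 10)  # hypothetical addresses
table.transfer("KT1Ticketer", "gold", "KT1Holder", "tz1Receiver", 4)
```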

The feature serves two purposes:

Extra protection against attempts to forge tickets
Facilitate Layer 2 solutions that use tickets to represent assets that can be exchanged with the main chain (e.g., TORUs)

Rolls are no more

The Jakarta protocol proposal redefines the computation of delegates’ voting power in the self-amendment process. Instead of being measured in terms of rolls, it is now defined directly by the delegate’s stake (expressed in mutez). The minimal stake required to be assigned voting rights is kept at 6000 tez.

This change complements those introduced with the Ithaca2 protocol to unify voting and baking power. As a result, the notion of rolls is no longer relevant and has been deprecated.
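The difference between the two ways of counting voting power can be illustrated as follows. This sketch assumes a pre-Jakarta roll size of 6,000 tez (matching the minimal stake) and ignores every other detail of how the protocol computes a delegate's stake.

```python
MUTEZ_PER_TEZ = 1_000_000
MINIMAL_STAKE = 6_000 * MUTEZ_PER_TEZ  # minimal stake for voting rights
ROLL_SIZE = 6_000 * MUTEZ_PER_TEZ      # assumed roll size before Jakarta

def voting_power_rolls(stake_mutez: int) -> int:
    """Roll-based voting power: whole rolls only, the remainder is ignored."""
    return stake_mutez // ROLL_SIZE

def voting_power_stake(stake_mutez: int) -> int:
    """Stake-based voting power: the stake itself, in mutez, once above the minimum."""
    return stake_mutez if stake_mutez >= MINIMAL_STAKE else 0

stake = 15_999 * MUTEZ_PER_TEZ
print(voting_power_rolls(stake), voting_power_stake(stake))
```

A delegate with 15,999 ꜩ counts as 2 rolls (12,000 ꜩ) under the old scheme, whereas with stake-based power none of its stake is discarded.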

Testing, testing…

A testnet for the Jakarta protocol named Jakartanet will launch in the coming days. It is critical to have as many bakers, tool builders, indexers, wallets, etc. as possible participating in this testnet.

We are looking for more bootstrap bakers to participate from day one. If you are interested, please join the Tezos baking slack and reach out in the #test-networks channel.

Furthermore, we strongly encourage you to test your own Tezos-based applications for compatibility problems with Jakarta. Jakarta, and the configuration for its test network Jakartanet, will be included in version 13 of Octez.

Should the Jakarta protocol proposal be accepted by the community, v13 of Octez or v3 of TezEdge will be necessary to participate in consensus due to necessary changes introduced to the protocol environment.

Over the coming months, our teams will continue to work on increasing performance, lowering gas consumption, reducing block times, and increasing the overall throughput, as measured, for example, in transactions per second or smart contract invocations per second. We are excited to be part of this continued development of Tezos.

Activating Tenderbake — a story in data

2022-04-13T18:00:00+02:00

On April 1st, 2022, the Tezos blockchain successfully executed its most ambitious protocol upgrade to date. At block #2,244,609, the Ithaca2 protocol upgrade was activated on Tezos mainnet, moving the Tezos network away from the Emmy family of consensus algorithms to Tenderbake.

It is not the first time we have made changes to the consensus algorithm via the on-chain amendment process, but this time we have collectively gone where nobody in this space has gone before: hot-swapping to a completely different consensus algorithm on a live network without a hard-fork in just 1815 seconds.

Yes, it took only ~30 minutes for a majority of the network participants to surf the migration waters, get the first block endorsed by 2/3 of the network, and deliver Odysseus home to Ithaca.

We have been monitoring the network closely since then, and we are glad to see that the transition from Emmy* to Tenderbake was generally frictionless, and that the new consensus algorithm has been running without interruption since the first block of the cycle.

In this article, we present some early observations and reflections on the behavior of the network during cycle 468 - that is, the first 8192 blocks of the Ithaca2 protocol, between Friday evening and Monday evening (Paris time).

The data for this article was retrieved from the TzKT API.

Say hello to fast deterministic finality!

The following table breaks down the 8192 blocks of cycle 468, from the activation of Ithaca2 on block #2,244,609 until block #2,252,800 on Monday, April 4th (CEST).

Table 1: Cycle 468 statistics by block round

Consensus on the head of the chain was both safe and fast: the average time between blocks was 32.058s, close to the theoretical minimum of 30 seconds. This is a direct consequence of having an overwhelming majority of blocks proposed and agreed upon in round 0: 7936 blocks out of 8192 – around 97% of them.

The round number for a block denotes the number of attempts (minus one) that were necessary to reach a consensus on a block at a given level. Lower is better, and a high number of rounds indicates a slow or unreliable network. Under normal network conditions, the chain should reach a consensus about the next block in round 0 – that is, on the first attempt.

Of the remaining 256 blocks, 236 were produced on round 1, 11 on round 2, and 5 in rounds 3 to 5. The only two outliers were, understandably, the first two blocks of the protocol migration, which were decided at rounds 14 and 12 respectively. These were, also understandably, the slowest blocks of the cycle, with times between blocks of 1815s and 1590s.

These facts are better illustrated by observing the values of these attributes throughout the whole cycle. Figure 2 plots, for each blockchain level, the payload and block rounds. Further below, Figure 3 plots the time between blocks, excluding outliers above round 3¹.

Figure 2: Block and payload proposal rounds per level in cycle 468

With Tenderbake, we distinguish between the payload producer and the block proposer. The payload is the non-consensus content of a block (transfers, smart contract calls, etc.), and it can be locked even if no consensus was reached at the round. A baker with rights to propose a block on a higher round must then re-use the same payload – in jargon, it re-proposes the payload. Thus the payload round is the round at which the block’s payload was first proposed (at the current level), and it is less than or equal to the block round.

In the majority of cases, the payload and block rounds match, even for blocks proposed at rounds higher than 0. Although Tenderbake splits baking rewards (awarding 10 ꜩ to the payload producer and a variable bonus to the block producer), in practice most blocks in the cycle have the whole of the baking rewards allocated to the same baker. This also shows that even at higher rounds, the payload was usually first proposed at the same round as the block that was finalized, suggesting that failures to reach a quorum on the first round were more likely caused by bakers absent from the committee than by slow or untimely propagation of preendorsements.

Figure 3: Time between blocks per level in cycle 468

As we mentioned above, the average time between blocks was 32.058s, pulled down by the majority of round 0 blocks. For round 1 blocks, it was 62.521s, or a bit over a minute. In Figure 3, we also observe a significant minority of blocks with a 45s time between blocks. These are round 0 blocks that follow a round 1 block: the time between blocks measures the difference between two consecutive blocks’ timestamps, and timestamps are set at the beginning of the round. Thus, the difference in such cases matches the duration of round 1, exactly 45s. These corner cases also explain why the average time between blocks at a given round presented in the table above (e.g., for round 0) diverges slightly from the theoretical minimum. They moreover explain the maximal outliers, e.g., the 90s time between blocks for #2,250,528, a round 0 block that follows a round 4 predecessor, #2,250,527.
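These gaps follow from round durations that grow linearly with the round number. The sketch below assumes that round r lasts minimal_block_delay + 15·r seconds (30s for round 0 and 45s for round 1, as above; the 15-second increment is an assumption consistent with these figures), and that a new level starts when its predecessor's round ends.

```python
MINIMAL_BLOCK_DELAY = 30        # seconds, duration of round 0
DELAY_INCREMENT_PER_ROUND = 15  # seconds added per round (assumed)

def round_duration(r: int) -> int:
    return MINIMAL_BLOCK_DELAY + DELAY_INCREMENT_PER_ROUND * r

def time_between_blocks(predecessor_round: int, block_round: int) -> int:
    """Timestamp difference between a block and its predecessor.

    The new level starts once the predecessor's round is over; the block's
    timestamp is the start of its own round, after all earlier rounds elapsed.
    """
    level_start = round_duration(predecessor_round)
    earlier_rounds = sum(round_duration(r) for r in range(block_round))
    return level_start + earlier_rounds

for pred, cur in [(0, 0), (0, 1), (1, 0), (4, 0), (14, 12)]:
    print(pred, cur, time_between_blocks(pred, cur))
```

With these assumptions, a round 0 block after a round 1 block gives the 45s gap, one after a round 4 block gives the 90s gap of #2,250,528, and a round 12 block after a round 14 block gives the 1590s gap of the second migration block.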

A safe network means higher bonuses

Starting with Tenderbake, part of the baking rewards, up to 10 ꜩ, is paid to the baker for including in a block extra endorsements of its predecessor beyond the minimal 4667 endorsement slots, that is, beyond the 2/3 of the total validations required to keep the chain live. On average, the blocks in the first cycle had 6795.5 endorsed slots out of 7000, or about 97%. This not only resulted in the majority of fast round 0 blocks described above, but was also reflected in the bonus rewards paid to the bakers: the average bonus included in each block was 9.125202 ꜩ.
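As a back-of-the-envelope check, assume the bonus is linear in the number of endorsed slots beyond the threshold, reaches its 10 ꜩ cap when all 7000 slots are endorsed, and is rounded down to whole mutez per slot. These are assumptions for illustration; the protocol constants are the authoritative source.

```python
TOTAL_SLOTS = 7_000
MAX_BONUS_MUTEZ = 10_000_000                      # "up to 10 tez"
CONSENSUS_THRESHOLD = (2 * TOTAL_SLOTS + 2) // 3  # ceil(2/3 * 7000) = 4667 slots
# Assumption: a fixed per-slot bonus, rounded down so the cap is never exceeded.
BONUS_PER_SLOT = MAX_BONUS_MUTEZ // (TOTAL_SLOTS - CONSENSUS_THRESHOLD)

def bonus_mutez(endorsed_slots: int) -> int:
    """Bonus for endorsements included beyond the consensus threshold."""
    extra = max(0, min(endorsed_slots, TOTAL_SLOTS) - CONSENSUS_THRESHOLD)
    return extra * BONUS_PER_SLOT

print(CONSENSUS_THRESHOLD, BONUS_PER_SLOT, bonus_mutez(TOTAL_SLOTS))
```

Because the model is linear, the cycle's average of 6795.5 endorsed slots corresponds to about 2128.5 × 4286 mutez ≈ 9.12 ꜩ, close to the reported average bonus of 9.125202 ꜩ.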

Figure 4 plots the distribution of bonus baking rewards for each level of cycle 468: the first few blocks included lower bonuses, as the network was still recovering from the protocol migration and some bakers were catching up. The bonus rewards averaged 4.227153 ꜩ in the first 100 blocks, but then rose quickly to stabilize above 8 ꜩ by the 1000th block. Past the quarter-cycle mark, they settled above 9 ꜩ: the average bonus reward for the trailing three-quarters of the cycle was 9.379600 ꜩ.

Figure 4: Bonus rewards awarded per level in cycle 468

Around level #2,250,609, we can observe a noticeable drop in the bonus reward, consistent with a large drop in validation power due to a top public baker with a significant stake being temporarily offline. This highlights that, because Tenderbake favors safety over liveness, keeping the network live (so that everybody reaps the rewards) relies more than ever on collective effort.

The sweeping crew is still working

In the days following the activation, a few bakers using Ledger signers have been reporting issues with their baker software: the Ledger seems to either occasionally freeze and disconnect when signing (pre)endorsements, or return parsing errors when signing operations or blocks.

These can cause the baking daemon to freeze for up to ~10 minutes, resulting in missed endorsements and occasionally lost baking slots. This happens rarely, and we estimate it will not reduce the affected bakers’ participation enough to make them miss the full endorsement rewards for the cycle. Sadly, the same cannot be said for the occasional lost blocks.

We have been working on these (and other minor issues), which should be fully addressed by an upcoming release of the Ledger Tezos baking app.

I am hooked on Tenderbake, give me more to read

Along the way to delivering Tenderbake to Tezos, we have produced and shared reports on its development and features. If you want to read more about Tenderbake, here are a few good reads:

several blog entries;

a few academic articles;

L. Aştefănoaei, P. Chambart, A. Del Pozzo, T. Rieutord, S. Tucci, E. Zălinescu. Tenderbake — A Solution to Dynamic Repeated Consensus for Blockchains. In 4th International Symposium on Foundations and Applications of Blockchain (SCFAB) 2021.
S. Conchon, A. Korneva, Ç. Bozman, M. Iguernlala, A. Mebsout. Formally Documenting Tenderbake. In 3rd Workshop on Formal Methods for Blockchains (FMBC‘21), July 2021.

and, of course, the main documentation entry-point:

https://tezos.gitlab.io/ithaca/consensus.html

If you made it this far, you might be interested to know that our consensus team is hiring!

Lessons and looking ahead

The activation of the Ithaca2 protocol marks the end of a journey of more than two years undertaken by the research and development teams of Nomadic Labs, Functori, Tweag, and other Tezos core developers and academic partners, such as CEA List and Université Paris-Saclay.

The superlative performance of Tenderbake so far is also a consequence of the preparedness of the community. A large majority of bakers were ready for Tenderbake, and had done their homework in time. We are also thankful to the many community members that helped out spreading the word, and were ready to help others in different spaces. After all, this is a decentralized, collective effort.

Tenderbake is alive and Odysseus is finally home on Ithaca, but it does not mean we are done. We have ambitious plans for the future of the Tezos blockchain, and the performance of the early Tenderbake cycles gives us confidence to keep pushing the boundaries.

Sails are going to be raised back up soon.

Fair winds!

¹ Otherwise, it does not plot nicely, and we cannot distinguish between blocks on rounds 0, 1, and 2.

Ithaca 2, the latest Tezos upgrade, is LIVE!

2022-04-01T21:00:00+02:00

On 1 April 2022, the Tezos blockchain successfully upgraded by adopting the Ithaca 2 proposal at block #2,244,609.

It is the ninth Tezos protocol upgrade and was jointly developed by Nomadic Labs, Marigold, TriliTech, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

Ithaca 2 introduces Tenderbake, a new consensus algorithm for Tezos. A comprehensive introduction can be found here, but the headlines are:

Fast, deterministic finality: A block can be trusted to be final after two blocks, regardless of delays in network communication. Under normal circumstances this is one minute.
Choosing a safe network over a live network: As long as 2/3 of the stake is honest, Tenderbake allows no parallel block production that can potentially revert transactions. Any small fork stops producing blocks, and if more than 1/3 of the total stake is isolated, the entire network will halt and await reconnection.
The road to faster blocks: While block time is kept at 30 seconds in Ithaca 2, Tenderbake makes it safe to introduce lower block times in future protocol upgrades.

A number of other changes are mostly relevant for bakers:

Lower minimum stake: The minimum amount of tez required to receive baking/endorsement rights is reduced from 8,000 to 6,000.
Instant rewards: Baking/endorsement rewards will no longer be frozen for 5 cycles as has been the case up until now.
Fewer problems with overdelegation: Instead of making a deposit with each baked/endorsed block, bakers freeze 10% of their stake upfront in each cycle. Bakers can set a limit for how much of their stake will be used. This prevents missed slots from overdelegation.
Total stake is used: Rolls are no longer used for assigning baking/endorsement rights. Bakers will receive slots in proportion to their actual stake, which should benefit small bakers in particular.
More endorsements, steady participation: Endorsement slots per block are increased from 256 to 7,000. This means a baker with the minimum amount of tokens will participate every 10 blocks on average.
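How often a small baker participates depends on the total stake, which the text does not give. A sketch, assuming every slot is drawn independently with probability proportional to stake, shows why more slots per block mean steadier participation; the total stake used below is a hypothetical figure chosen for illustration only.

```python
def participation_per_block(stake: float, total_stake: float,
                            slots: int = 7_000) -> float:
    """Probability that a baker holds at least one endorsement slot in a block.

    Assumes each slot is drawn independently, with probability stake / total_stake.
    """
    p = stake / total_stake
    return 1 - (1 - p) ** slots

# Hypothetical total active stake of 420 million tez, for illustration only.
for slots in (256, 7_000):
    p = participation_per_block(6_000, 420_000_000, slots)
    print(f"{slots} slots: one block in {1 / p:.1f} on average")
```

With those inputs, a 6,000 ꜩ baker participates in about one block in 10.5 with 7,000 slots, against about one in 274 with 256 slots.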

Also included in Ithaca 2

Precheck of manager operations: The new version of the protocol allows any Tezos shell (e.g., Octez and TezEdge) to avoid fully executing manager operations (those created by end users, e.g. transfers and smart contract calls) before gossiping them through the network. This lets operations reach bakers faster, and is a prequel to further optimizations that can increase throughput.
Liquidity Baking: Ithaca 2 extends the liquidity baking sunset level by 819,200 blocks, or twenty voting periods, roughly an additional ten months. To balance this extension, the threshold for activating the escape hatch is lowered from 50% to 33%.
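The figures in the Liquidity Baking item are consistent with each other under two assumptions: a voting period of 5 cycles of 8192 blocks, and one block every 30 seconds. A quick check:

```python
BLOCKS_PER_CYCLE = 8_192
CYCLES_PER_VOTING_PERIOD = 5  # assumption
BLOCK_TIME_SECONDS = 30       # minimal block time

def voting_periods_to_blocks(periods: int) -> int:
    return periods * CYCLES_PER_VOTING_PERIOD * BLOCKS_PER_CYCLE

def blocks_to_days(blocks: int, block_time: float = BLOCK_TIME_SECONDS) -> float:
    return blocks * block_time / 86_400

extension = voting_periods_to_blocks(20)
print(extension, round(blocks_to_days(extension)))
```

That is about 284 days at exactly 30 seconds per block; slightly longer average block times push this toward the ten months quoted above.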

… and more: We invite you to look at the changelog for a full description of the contents of Ithaca 2, which includes a new environment version (V4), updates to Michelson, as well as minor changes and bug fixes.

Congratulations to everyone involved in the development of this protocol amendment and welcome to the Tezos blockchain, Ithaca 2!

A special thank you to the Tezos community for updating nodes, bakers, signer software/hardware, indexers, etc., in preparation for Tenderbake — and for reaching out to others to make sure everyone was ready.

Our next protocol proposal, “J”, is targeted for injection in mid-April. It will be the first of a series of scaling-focused upgrade proposals meant to prepare Tezos for high-throughput use cases and long-term growth. Stay tuned!

Get ready to roll. Tezos is scaling

2022-03-28T17:00:00+02:00

TL;DR: A series of Tezos protocol proposals in 2022 will focus on paving the way for further adoption, high-throughput use cases and expanded smart contract functionality.

In case you haven’t heard: 2022 is the year of scaling for Tezos.

The new Tenderbake consensus mechanism enables lower block times, which will improve latency and finality – how quickly a transaction is included and made irreversible by the network. For the user it means faster transactions and smoother running dapps.

But Tenderbake doesn’t significantly change how many transactions can be processed per second, throughput. And that is the essence of scaling: enabling more people to do more on the network at the same time.

As increased adoption brings more projects, companies, and institutions into the Tezos ecosystem, it’s time to prepare the network for more activity and high-throughput applications.

However, there are natural limits to how much we can increase throughput on the main chain, Layer 1. We are pursuing every possible avenue to push the limits but going beyond them has a cost. Any blockchain doing so sacrifices either decentralization, latency, stability, security, censorship resistance, or a mix of these.

If we want to avoid this for Tezos, we need a different approach. To support very high-throughput use cases and prepare Tezos for long-term growth, we therefore turn to optimistic rollups, a Layer 2 solution.

It’s a solution that can be implemented now to fulfill Tezos’ short-term scaling needs. We expect optimistic rollups to initially offer a 10-100x increase in throughput. And with further upgrades already in the making, it’s a step towards scaling on a whole new level.

The best of both worlds

Scaling fundamentally comes in two flavors.

You can use more powerful computers able to process and store more transactions, vertical scaling, or you can split the workload across a higher number of computers, horizontal scaling. Both have benefits and drawbacks for blockchains.

Pure vertical scaling – requiring powerful hardware to participate on the main chain – is a straightforward solution, but it means fewer people are able to participate. The security and censorship resistance of the main chain depends on an honest majority of block producers along with other nodes validating the chain, so it’s important to have low barriers for participation.

Some degree of vertical scaling can be achieved through software optimizations, allowing more throughput with the same hardware. This is a constant focus for Tezos development and has already resulted in significant performance improvements on the main chain – for example by refining the gas model, optimizing the Michelson interpreter, and introducing a cache for smart contracts.

Horizontal scaling of blockchains is mostly done by splitting up the main chain workload across clusters of nodes, known as sharding. It preserves decentralization, but it’s a complex solution and interaction between clusters adds latency – something we have just sought to minimize on the main chain with Tenderbake.

Optimistic rollups are a way to take elements from both approaches and apply them where it makes the most sense.

Large batches of transactions are moved from the main chain to rollup-specific nodes, a form of horizontal scaling. These transactions can then be processed using more powerful hardware, achieving vertical scaling.

Meanwhile, the security of the main chain acts as a backstop in case of foul play.

Rollups in a nutshell

The technical details of optimistic rollups on Tezos will be covered in upcoming blog posts, but the fundamental principle is this:

A rollup is an entity on the main chain, with its own address, that compactly represents off-chain transaction execution and state updates. Assets can be deposited to the rollup – and withdrawn – by accounts, smart contracts and other rollups.

Transactions using the rollup are still sent to an inbox on the main chain, but are left unprocessed by the main chain nodes. Instead they are collected by specialized rollup operators.

The operators, who store and maintain the state of a rollup, process the transactions and update the state to reflect the changes. This all happens off-chain, on Layer 2. The operator only posts a receipt back to the main chain, summarizing the new state of the rollup as a cryptographic hash. A commitment.

This process of continuously “rolling up” transactions into commitments is where the name comes from. Collect, update, commit, repeat. The cycle is executed as fast as possible by the rollup operators, using powerful hardware if needed.
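The collect, update, commit cycle can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the Tezos implementation. The rollup state is a dictionary of balances, transactions are (sender, recipient, amount) triples, and the commitment is a SHA-256 hash of a canonical JSON encoding of the state. The function names are hypothetical.

```python
import hashlib
import json

def apply_transfer(state: dict, tx: tuple) -> dict:
    """Apply one (sender, recipient, amount) transfer off-chain; skip invalid ones."""
    sender, recipient, amount = tx
    new_state = dict(state)
    if amount <= 0 or new_state.get(sender, 0) < amount:
        return new_state  # invalid transaction: state left unchanged
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def commit(state: dict) -> str:
    """Summarize the whole rollup state as a single hash: the commitment."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def operator_cycle(state: dict, inbox: list) -> tuple:
    """Collect the inbox, update the state, and produce the commitment to post on Layer 1."""
    for tx in inbox:                        # collect
        state = apply_transfer(state, tx)   # update
    return state, commit(state)             # commit

state = {"alice": 10, "bob": 0}
inbox = [("alice", "bob", 4), ("bob", "carol", 1), ("carol", "alice", 5)]
state, commitment = operator_cycle(state, inbox)
print(state)  # {'alice': 6, 'bob': 3, 'carol': 1}
```

Only the 64-character commitment goes back to the main chain. The last transfer is skipped because carol only holds 1 token. The full state and the execution stay with the operator.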

By contrast, main chain nodes simply store incoming rollup transaction data and the commitments. They are relieved of the computationally expensive task of validating the transactions, which lets them produce and validate main chain blocks at a higher rate.

The system is optimistic because a commitment is treated as being correct unless someone disputes it. If no dispute happens within a given time period, assets can be withdrawn back to the main chain.

To incentivize honest behavior by the rollup operators, there are economic penalties for posting incorrect commitments. Should a dispute happen, the main chain serves as a court.

This doesn’t just make incorrect commitments costly for the operator: any incorrect commitment is ultimately neutralized. As long as there is just a single honest node checking commitments, and the main chain is censorship resistant, posting incorrect commitments is futile.
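The optimistic part can be sketched in the same toy setting, again with hypothetical names and made-up parameters (DISPUTE_WINDOW, DEPOSIT). A commitment is accepted unless it is successfully disputed within a fixed number of blocks. A dispute is settled by having the main chain re-execute the inbox. This is a drastic simplification, since real designs typically narrow a dispute down to a single execution step. An operator who posted a wrong commitment forfeits a deposit.

```python
import hashlib
import json
from dataclasses import dataclass

DISPUTE_WINDOW = 20  # hypothetical dispute period, in main chain blocks
DEPOSIT = 100        # hypothetical amount an operator puts at stake per commitment

def state_hash(state: dict) -> str:
    """Commitment: a hash of the canonical JSON encoding of the rollup state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def execute(state: dict, inbox: list) -> dict:
    """Reference execution of the inbox: here, just crediting (account, amount) deposits."""
    new_state = dict(state)
    for account, amount in inbox:
        new_state[account] = new_state.get(account, 0) + amount
    return new_state

@dataclass
class Commitment:
    operator: str
    claimed_hash: str
    posted_at: int          # main chain level at which the commitment was posted
    rejected: bool = False

def dispute(c: Commitment, prev_state: dict, inbox: list, deposits: dict) -> bool:
    """Main chain as court: re-execute the inbox and compare with the claimed hash.
    Returns True if the commitment was wrong, in which case it is neutralized."""
    if state_hash(execute(prev_state, inbox)) != c.claimed_hash:
        c.rejected = True
        deposits[c.operator] -= DEPOSIT  # economic penalty for the dishonest operator
        return True
    return False

def can_withdraw(c: Commitment, current_level: int) -> bool:
    """Withdrawals are allowed once the window has passed without a successful dispute."""
    return not c.rejected and current_level >= c.posted_at + DISPUTE_WINDOW

# One honest and one dishonest commitment for the same inbox.
prev, inbox = {"alice": 5}, [("alice", 3)]
deposits = {"honest": DEPOSIT, "cheater": DEPOSIT}
good = Commitment("honest", state_hash(execute(prev, inbox)), posted_at=100)
bad = Commitment("cheater", state_hash({"alice": 1000}), posted_at=100)
print(dispute(good, prev, inbox, deposits), dispute(bad, prev, inbox, deposits))  # False True
print(can_withdraw(good, 110), can_withdraw(good, 120), can_withdraw(bad, 120))  # False True False
print(deposits)  # {'honest': 100, 'cheater': 0}
```

A single honest participant calling `dispute` is enough to neutralize the wrong commitment. This is why posting one is futile as long as disputes can reach the main chain.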

A rollup roadmap

Optimistic rollups are implemented in line with the general Tezos approach: gradual evolution through incremental upgrades.

The development is a joint effort by teams at Nomadic Labs, Marigold, Oxhead Alpha, Functori, Tarides, DaiLambda, and TriliTech.

The roll-out will differentiate between transaction rollups and smart contract rollups.

Both are implemented as protocol upgrades rather than deployed as smart contracts, which allows for more gas- and storage-efficient implementations. They are also permissionless. Anyone will be able to launch a rollup or be an operator for any rollup.

Transaction rollups are expected to be part of the upcoming “J” upgrade proposal. They can handle account-to-account transactions of assets, but won’t be able to run smart contracts. This allows for a simpler design and faster implementation.

Smart contract rollups are targeted for later this year. The chosen approach is inspired by proven designs for rollups, but tailored to Tezos. They are designed with a generic structure that is not limited to Michelson smart contracts, but can be adapted to support environments like WebAssembly or EVM, the Ethereum Virtual Machine.

Next-level scaling

While optimistic rollups are a good scaling solution for Tezos, they do currently have limitations.

One issue is that all transaction data for the rollup inbox must be included in blocks on the main chain, but there is a limit to how much data can be stored in each block.

Therefore, avenues for further scaling are being researched and developed.

While sharding all main chain functionality isn’t an approach we’re currently pursuing for Tezos, sharding can be used to increase the size of the inbox for rollups.

The approach is called data-availability sharding, and has the major benefit of further increasing rollup throughput as more bakers join the network. Horizontal scaling at work.

Instead of having every main chain node download and store all incoming rollup transaction data, the data is spread among the network nodes, and among bakers in proportion to their stake. Essentially making bakers data providers for rollup nodes.
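Spreading rollup data among bakers in proportion to their stake can be illustrated with a largest-remainder allocation. This is only a sketch under our own assumptions. The inbox data is cut into a fixed number of equally sized shards, and each baker is assigned a share of them proportional to its stake. The actual data-availability design (including any erasure coding) is not modeled, and `assign_shards` is a hypothetical name.

```python
from fractions import Fraction

def assign_shards(stakes: dict, num_shards: int) -> dict:
    """Split num_shards among bakers in proportion to stake (largest-remainder method)."""
    total = sum(stakes.values())
    quotas = {b: Fraction(s * num_shards, total) for b, s in stakes.items()}
    counts = {b: int(q) for b, q in quotas.items()}  # floor of each exact quota
    leftover = num_shards - sum(counts.values())
    # Hand the remaining shards to the bakers with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda b: quotas[b] - counts[b], reverse=True)
    for b in by_remainder[:leftover]:
        counts[b] += 1
    return counts

stakes = {"baker_a": 600_000, "baker_b": 300_000, "baker_c": 100_000}
print(assign_shards(stakes, 32))  # {'baker_a': 19, 'baker_b': 10, 'baker_c': 3}
```

Each baker then only needs to download and serve its own shards rather than the whole inbox. This is why total capacity can grow as more bakers join.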

This increases the total bandwidth and capacity, which improves throughput for rollups significantly. If rollups alone offer a 10-100x increase in throughput, the increase is 1,000-10,000x when combined with data-availability sharding.

The zero-knowledge track

Another issue is the time period needed for commitments to be checked and potential disputes resolved.

Sending assets to a rollup and transacting inside it has the same finality as the main chain. But to withdraw assets back to the main chain, you must wait for the dispute period to pass. On Arbitrum, a popular rollup system on Ethereum, the period is currently seven days.

In parallel to the work of implementing optimistic rollups, we are therefore also exploring zero-knowledge rollups, or zk-rollups.

The principle of moving transactions off-chain and posting commitments to the main chain is the same as with optimistic rollups, but zk-rollups attach a cryptographic proof of correctness with the commitment. No need for a waiting period for withdrawals.

This is possible because zk-proofs are very small and can be checked by main chain nodes in a fraction of a second, regardless of the number of transactions or smart contract operations involved.

The downside is that zk-proofs are currently highly time-consuming to generate: producing them requires a disproportionate amount of computing power, especially when dealing with smart contracts. The zero-knowledge team at Nomadic Labs is, however, working on improving this.

We look forward to updating the community, as the work on data-availability sharding and zk-rollups progresses.

In the meantime, let’s get ready to roll.

We Discovered a Flaw in the Sapling Protocol Integration — a fix is ready for J.

2022-03-15T16:00:00+01:00

Summary: A design flaw in the SAPLING_VERIFY_UPDATE Michelson opcode renders unshielding transactions from the shielded pools implemented in sapling_contract.tz malleable. This affects any deployment of this sample contract on the Tezos mainnet, and could potentially extend to other contracts following a similar pattern. We have implemented a new, strengthened version of this primitive, which should become the required opcode for safer integration of Sapling transactions in Michelson smart contracts. This new version will be available as part of the upcoming J protocol proposal. In the meantime, we advise against originating new Sapling contracts on mainnet and against interacting with the existing ones.

This blog post follows a discussion on Tezos Agora in which a vulnerability was disclosed in sapling_contract.tz, a sample Michelson smart contract distributed as part of the test suite of the Tezos protocol. This flaw could potentially apply to other contracts implementing a similar pattern.

The bug in the contracts arises from a design flaw in SAPLING_VERIFY_UPDATE, one of the Michelson opcodes provided to integrate Sapling transactions into Tezos smart contracts. That is, this is not a vulnerability in the Sapling protocol, its implementation, or even the implementation of the Michelson opcode per se. Rather, it results from the opcode’s specification and interface not being strong enough to write safe smart contracts.

Over the last few days, we have implemented a patch addressing this issue in tezos!4589, which will be included in protocol proposal J, strengthening the support for shielded transactions.

In the meantime, we advise the Tezos community to avoid originating new contracts creating shielded pools, and to avoid interacting with the deployed contracts. Although we have not observed any exploit of this vulnerability, we are confident that an attack is indeed plausible.

A recent inspection of deployed contracts on the Tezos mainnet at level 2155024 reveals only two contracts originated from a copy of the vulnerable shielded pool implemented by sapling_contract.tz: KT1BUZy6x and KT1E3kx2W¹. At the time of writing, the outstanding balance in the two shielded pools is under 6 tez. We recommend that users avoid interacting with them.

Given the low liquidity locked in the shielded pools, as this feature has not been used extensively on mainnet so far, we consider shipping this fix with the J protocol proposal to be the best of the available options.

In the rest of this post, we provide further detail on the vulnerability and describe the efforts undertaken to lift the current limitations of the Sapling protocol integration.

Sapling and unshielding transactions

Since the activation of the Edo2 protocol proposal at block 1,343,489, a little over a year ago, the Tezos economic protocol has provided an integration of the Sapling protocol, which enables privacy-preserving transactions. The design and implementation of Sapling in Tezos closely follow the original design by the Electric Coin Company for Zcash. Its integration into the Tezos economic protocol also provides two Michelson opcodes, SAPLING_EMPTY_STATE and SAPLING_VERIFY_UPDATE, enabling application developers to seamlessly integrate Sapling transactions into their smart contracts. This opened the door to new types of privacy-preserving applications, in particular asset transactions with selective disclosure, as implemented in sapling_contract.tz.

To support such applications, SAPLING_VERIFY_UPDATE provides a single, unified interface for three operations: shielding (i.e., depositing) tez into a shielded pool, unshielding (i.e., withdrawing) tez from a pool, and transferring shielded tokens between shielded accounts. The specification reads:

SAPLING_VERIFY_UPDATE / t : s : S => Some (Pair b s') : S

In words, the opcode consumes a shielded transaction t and a Sapling state s from the top of the execution stack. It then verifies the transaction and, if verification succeeds, returns Some (Pair b s'): an outstanding balance b and a new Sapling state s'². The balance b is a signed integer which determines an outstanding balance that needs to be consolidated with the smart contract’s unshielded balance:

A strictly-positive balance value entails unshielding tokens — e.g., to match burned shielded tokens in a pool.
A strictly-negative balance value entails shielding tez — e.g., to compensate minted shielded tokens in a pool.
A zero-valued balance denotes an internal shielded operation which doesn’t prescribe a change in the unshielded balance of the contract — e.g. an internal transfer between shielded accounts.

Thus, in the case of non-zero balances, a strictly-positive b entails that b tez must be debited from the contract to restore the application’s invariant. Conversely, a strictly-negative b entails that |b| tez must be credited to the contract to preserve the invariant.
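The sign convention for b can be summarized in a small Python function. It is a sketch of the bookkeeping a shielded-pool contract performs after SAPLING_VERIFY_UPDATE succeeds, not Michelson code. The function name, the mutez units, and the `amount_sent` parameter (the tez attached to the call, checked when shielding) are our own illustrative assumptions.

```python
def consolidate(b: int, amount_sent: int = 0) -> int:
    """Return the change to the contract's unshielded balance (in mutez) implied
    by the signed balance b returned by SAPLING_VERIFY_UPDATE.
    amount_sent: tez attached to the call (hypothetical parameter for this sketch)."""
    if b > 0:
        # Unshield: shielded tokens were burned, so b mutez must leave the contract.
        return -b
    if b < 0:
        # Shield: shielded tokens were minted, so |b| mutez must have been paid in.
        if amount_sent != -b:
            raise ValueError("attached amount must match the minted shielded tokens")
        return -b
    # b == 0: internal shielded transfer, no change to the unshielded balance.
    return 0

print(consolidate(5_000_000))              # -5000000: 5 tez leave the contract
print(consolidate(-2_000_000, 2_000_000))  # 2000000: 2 tez were paid in
print(consolidate(0))                      # 0
```

In every case the contract’s balance changes by exactly -b. That is what keeps shielded tokens pegged 1 to 1 to the tez held by the contract.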

The flaw in the current integration lies in the interaction between the Sapling protocol and the Tezos protocol, in particular when withdrawing tokens from a shielded pool. As a general rule, when designing an operation that transfers ownership of a token, it is crucial that the key owning the tokens signs a permission to debit a certain amount together with the address of the recipient. If the recipient is not signed by the owner, an attacker can intercept the transaction and replace the recipient with their own address. For example, in the case of a shielding transaction, the Tezos account that owns the tez signs a transfer which includes the Sapling recipient. Unfortunately, the converse is not true: in the case of an unshield, given the format of Sapling operations introduced in Edo, a Sapling key has no way to sign the Tezos recipient.

In short: when someone withdraws tez from the shielded pool, an attacker observing the pending smart contract call can steal the Sapling transaction parameter, and use it to inject a new smart contract call, with their own address as recipient. This can lead to a loss of funds, but it doesn’t compromise the privacy of the shielded pool.

Now, to illustrate this design flaw, we take a closer look at sapling_contract.tz. This contract manages a shielded pool where tokens are pegged 1 to 1 to tez. These tokens can be shielded into and unshielded from the pool, or transferred between shielded accounts, using Sapling transactions. We focus on a few selected lines.

On line 5, we observe the parameter declaration. The first parameter is the Sapling transaction to be verified by the protocol, of type (sapling_transaction 8) (just ignore the 8 value there³). The second has type option key_hash:

parameter (list (pair (sapling_transaction 8) (option key_hash) ) );

The latter is an optional implicit account, which denotes the intended destination when unshielding tokens from the pool. As you can see, the two arguments are given separately, and there is no cryptographic signature binding them together. In other words, the combination of a Tezos smart contract call and a Sapling unshield is malleable. An attacker can intercept an unshield operation and resubmit it with their own address as destination, effectively stealing the unshielded tez. Let’s see why:

The whole interaction with the Sapling protocol in this contract, as we hinted before, is concentrated in the SAPLING_VERIFY_UPDATE opcode on line 21:

# We verify the transaction and update the storage if the transaction is
# valid. The shielded transactions are handled here.
# The new state is pushed on top of the stack in addition to the balance
# of the transaction. If the rest of the script goes well, this state
# will be the new state of the smart contract.
    SAPLING_VERIFY_UPDATE;

The remark in the comments is worth revisiting: the pushed state will be the new state of the smart contract, making proof replay impossible. In the case of an unshield, where a positive balance is pushed onto the stack, the interesting part of the contract’s effect takes place further on, in the IFGT block spanning lines 36 to 45:

IFGT {
       DIIP { ASSERT_SOME;
              IMPLICIT_ACCOUNT };
       SWAP;
       DIP { UNIT;
             TRANSFER_TOKENS;
             SWAP;
             # Stack manipulation to order.
             # The operations will consist of the
             # TRANSFER_TOKEN operation.
             DIP {CONS} ;};

If on line 36 the value on the top of the stack is greater than zero, then, on line 41, we will TRANSFER_TOKENS (that is, unshielded tez) to the address pushed on the stack to restore the 1 to 1 peg, in this case the second parameter provided to the contract.

Now, we see clearly that there is no binding between the destination consumed as an argument by TRANSFER_TOKENS and the Sapling transaction verified by the protocol, as we mentioned above. Enforcing the operational correctness of an unshield, that is, ensuring that tokens arrive at their intended destination, is left to the rest of the smart contract, and ultimately to its developer. This contract does not implement any such mechanism and is therefore malleable. An adversary could plausibly exploit this by observing a valid call to this contract and replacing the destination with the attacker’s public key hash.
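The attack can be modeled in Python to make the missing binding explicit. This is a deliberately simplified sketch. The Sapling transaction is reduced to an opaque, already-valid blob, and its proofs and signatures are not modeled. Replay protection through the updated state is represented by a set of spent transactions. The contract call mirrors the (sapling_transaction, option key_hash) parameter pair. All names and addresses are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ShieldedPool:
    """Toy model of the vulnerable pattern: the recipient is a separate, unsigned argument."""
    spent: set = field(default_factory=set)    # stands in for the updated Sapling state
    payouts: list = field(default_factory=list)

    def call(self, sapling_tx: bytes, unshield_amount: int, recipient: str) -> None:
        # The updated state prevents the same Sapling transaction from being applied twice...
        if sapling_tx in self.spent:
            raise ValueError("already applied")
        self.spent.add(sapling_tx)
        # ...but nothing ties `recipient` to what the spending key actually signed.
        self.payouts.append((recipient, unshield_amount))

pool = ShieldedPool()
honest_tx = b"unshield 5 tez, signed by alice's spending key"

# The attacker sees alice's pending call, copies the Sapling transaction,
# and gets it included first with their own address as the recipient.
pool.call(honest_tx, 5, recipient="tz1_attacker")
try:
    pool.call(honest_tx, 5, recipient="tz1_alice")  # alice's original call now fails
except ValueError as err:
    print("alice's call rejected:", err)  # alice's call rejected: already applied
print(pool.payouts)  # [('tz1_attacker', 5)]
```

Note that replay protection does not help here. It only guarantees that whichever call lands first wins, and the attacker can arrange for that to be their own.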

Indeed, this is a critical bug in the contract, and any other contract following this pattern on mainnet is vulnerable as well. It is clearly an unintended design error in the integration of the Sapling protocol, and lies neither in the implementation of the Michelson API (which is correct with respect to its specification) nor in the Sapling protocol itself (which remains safe and sound).

That said, the limitations of the current API are clear, and the Tezos protocol needs an improved set of primitives for safe, privacy-preserving transactions. We discuss our immediate first step in the next section.

Strengthening shielded transactions

In order to address this issue, we have implemented the following changes to the Sapling integration in Tezos and Michelson, which will be included in the upcoming J protocol proposal. Adopting the proposed changes will prevent the issues described above, and enable Tezos smart contract developers to write safer privacy-preserving applications. In further detail:

We add an extra field to Sapling transactions, called bound_data. This field is signed by the Sapling spending key, as a part of the whole transaction. In the case of an unshield, it is intended to contain its recipient.
We update the Michelson type sapling_transaction, and we overload and extend the SAPLING_VERIFY_UPDATE opcode to handle the updated transaction type. When provided the new transaction type, SAPLING_VERIFY_UPDATE performs the same checks done by the previous version, and it additionally checks the signature on the supplied bound_data field. The latter is returned together with the balance and the updated state.
We provide a new sapling_contract.tz reference smart contract which expects the Tezos address of the recipient to be included in the bound_data field of the Sapling transaction. Moreover, the address should be encoded as a Micheline public_key_hash type. Thus, instead of having the recipient as a separate parameter to the smart contract, it is extracted from the Sapling transaction.
We update the Octez Tezos client to correctly populate the bound_data of a Sapling unshield transaction with the Tezos address of the recipient.
We deprecate the original Michelson type, renaming it to sapling_transaction_deprecated. As a result, it will continue to work on mainnet for previously originated contracts. However, if and once protocol proposal J is activated, new smart contracts using the deprecated transaction type can no longer be originated: the origination operation will fail.
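The effect of signing bound_data can be sketched by extending the same toy model. An HMAC from Python’s standard library is used purely as a stand-in for the Sapling spending-key signature. The real scheme is different and is not modeled. In particular, an HMAC is checked with the same secret key, which is acceptable only in a toy. The recipient is encoded as plain bytes rather than as a Micheline-encoded value. All names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class SaplingTx:
    payload: bytes     # the shielded part (proofs, nullifiers, etc. are not modeled)
    bound_data: bytes  # with the fix, this carries the unshield recipient
    signature: bytes

def sign(spending_key: bytes, payload: bytes, bound_data: bytes) -> bytes:
    """Stand-in 'signature' over BOTH payload and bound_data (HMAC-SHA256, not Sapling's scheme)."""
    # Length-prefixing the payload keeps the concatenation unambiguous.
    message = len(payload).to_bytes(4, "big") + payload + bound_data
    return hmac.new(spending_key, message, hashlib.sha256).digest()

def verify_update(spending_key: bytes, tx: SaplingTx) -> bytes:
    """Sketch of the strengthened check: the signature must also cover bound_data,
    which is then handed to the contract (balance and state are not modeled)."""
    expected = sign(spending_key, tx.payload, tx.bound_data)
    if not hmac.compare_digest(expected, tx.signature):
        raise ValueError("signature does not cover this bound_data")
    return tx.bound_data  # the contract takes the recipient from here

alice_key = b"alice spending key (hypothetical)"
payload = b"unshield 5 tez"
tx = SaplingTx(payload, b"tz1_alice", sign(alice_key, payload, b"tz1_alice"))
print(verify_update(alice_key, tx))  # b'tz1_alice'

# The attacker swaps the recipient but cannot produce a matching signature.
forged = SaplingTx(payload, b"tz1_attacker", tx.signature)
try:
    verify_update(alice_key, forged)
except ValueError as err:
    print("forgery rejected:", err)
```

Because the recipient now comes out of the verified transaction rather than from a separate parameter, the front-running trick from the previous sketch no longer redirects funds.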

We are confident this strengthening will prevent similar issues arising from loose bindings between Sapling transactions and their sources and destinations.

It should be noted that these changes will only take effect on the Tezos mainnet if, and only if, the J protocol proposal is accepted by the community and activated. Until then, we advise the community against deploying new contracts using the current support for Sapling transactions, and against interacting with previously deployed contracts. Shielded funds locked in the vulnerable contracts are safe, but attempting to unshield tokens from these pools entails a risk of being targeted.

Looking ahead

It is regrettable that this critical issue with the integration of the Sapling protocol exists, as we were very much looking forward to seeing increased adoption of privacy-preserving features. We are reevaluating our development process to make it increasingly hard for such issues to slip through in the future, and we renew our commitment to developing technologies that are innovative and trustworthy.

If there is a silver lining, it is that this was caught by testing and reviewing a contract before it was massively adopted by the community. We are also reassured by the fact that a patch could be implemented quickly.

We invite the community, especially developers of applications building on the Sapling protocol’s infrastructure, to test the new set of primitives on upcoming test networks. A teztnet for the J proposal will be announced soon, following the usual process for launching protocol test networks. Another option is to join the rolling Dailynet and Mondaynet teztnets. Feel free to reach out to us if you need help deploying your contracts on the test networks.

¹ We have observed three contracts which use the Sapling integration provided by SAPLING_VERIFY_UPDATE. In addition to the two contracts mentioned above, there is a third deployed contract, KT1UmxfNX. This contract implements different functionality; it has not been used since its origination and has thus never held any tez balance.
² We elide here the failing cases which would push None to the stack, as they are not needed to illustrate the bug.
³ The value denotes the size of the memo, encrypted arbitrary bytes attached to each output, which are only accessible to the owner of the viewing/spending keys.

All Hands on Deck for Tenderbake

2022-02-24T18:30:00+01:00

A successful transition to the Tenderbake consensus mechanism relies on the Tezos ecosystem making sure the infrastructure is prepared for the changes. Here is a checklist of the necessary steps.

The Ithaca2 protocol proposal contains perhaps the most significant upgrade of the Tezos protocol to date.

It introduces a new consensus mechanism, Tenderbake, which brings several improvements including deterministic finality, i.e. absolute certainty that transactions cannot be reversed after two blocks.

Replacing the consensus mechanism is in itself a major undertaking, as it defines the rules by which Tezos bakers decide the state of the ledger – a core function of a blockchain.

Performing such an upgrade on a live, global blockchain network further complicates things. It is essentially like replacing the engine of a car while it is running, and it is important to note that for certain types of existing infrastructure Tenderbake introduces breaking changes.

A decentralized network is dependent on its constituent actors doing their part, and in the Tezos ecosystem that means keeping on top of necessary changes as the protocol evolves.

It can get bumpy

After the Granada upgrade in August, the network experienced longer block times and many missed endorsements, temporarily lowering network “health”. After the Hangzhou upgrade in November, context flattening caused some low-spec nodes to run out of memory.

In both cases the network stayed online and the issues were resolved, but there is no denying that protocol upgrades can get bumpy.

Despite increased testing efforts on our part, it is not possible to fully model the complexities of the mainnet, and this is where preparation becomes important.

To increase the likelihood of a smooth transition, we call on all ecosystem participants to make sure they are prepared for Ithaca2 activation, which will happen around March 31st, provided that the upgrade is voted in by the community.

This includes bakers, block explorers, wallet providers, exchanges, indexing service providers, node-as-a-service providers, dapp maintainers, and everyone else involved in providing tooling or services in the Tezos ecosystem.

The Tenderbake checklist

Generally, we encourage the ecosystem to join the ithacanet testnet to make sure their setup and infrastructure works with the upcoming protocol version.

As a minimum, the following should be completed by the time of activation:

Tezos node and baking software need to be updated to a Tenderbake compatible version. For Octez, this is v12.0 and later versions. For TezEdge, it’s v2 and later versions.
Bakers using a Ledger hardware wallet for secure signing need to update the Tezos Baking app on their device to v2.2.15. Earlier versions will NOT work after Ithaca2 activation.
Remote signing software for baking will need to be significantly updated¹.
Block explorers and other indexing software will need to be significantly updated.
Dapp maintainers are highly encouraged to test their dapps on the testnet.

If you are unsure about what needs to be done on your end, reach out to us on the Tezos baking Slack or feel free to contact the Nomadic Labs support team.

Choosing safety over liveness

Given the scope and complexity of the upcoming upgrade, we find it important to remind the community that the Tenderbake consensus mechanism marks a shift to favoring the network being safe over being live.

The current Emmy* consensus mechanism, similar to how Bitcoin and Ethereum work, allows multiple versions of the network, forks, to run in parallel during a major network split, which could be caused by global internet disruptions or a software bug.

When connection between different forks is re-established, the version with the largest stake (or for Proof-of-Work networks, the most hash-power) will define the state of the ledger. Smaller forks are abandoned.

This keeps the network running, live, at all times, but carries the possibility of transactions being reverted if they are on a smaller fork.

With Tenderbake this changes. As a so-called classical BFT-style consensus algorithm, it operates on the assumption that at least 2/3 of the total stake is honest.

As long as this is the case, there can be no parallel block production that suddenly replaces or reverts transactions following a network split. Any small fork will halt². The network is safe.

The trade-off is that if more than 1/3 of the total stake is isolated or offline, the entire network will halt until connection between at least 2/3 of the total stake is re-established, rather than staying live as separate networks.
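The difference between the two behaviors after a network split can be summarized in a few lines. This is a simplified sketch of the rules stated above, not of either algorithm. Stake is given as integers, and we assume the usual BFT quorum of strictly more than two thirds of the total stake. Tenderbake’s exact threshold is a protocol parameter we do not reproduce here. Emmy*’s fork choice is reduced to keeping the partition with the most stake. The function names and partition labels are hypothetical.

```python
from fractions import Fraction

def tenderbake_can_progress(partition_stake: int, total_stake: int) -> bool:
    """A partition keeps producing blocks only with a quorum of > 2/3 of ALL stake."""
    return Fraction(partition_stake, total_stake) > Fraction(2, 3)

def emmy_star_after_reconnect(partitions: dict) -> str:
    """Emmy*-style: every partition stays live during the split; on reconnection the
    fork backed by the most stake wins and the others are abandoned."""
    return max(partitions, key=partitions.get)

total = 1_000
split = {"europe": 700, "rest_of_world": 300}

for name, stake in split.items():
    print(name, "live under Tenderbake:", tenderbake_can_progress(stake, total))
# europe live under Tenderbake: True
# rest_of_world live under Tenderbake: False
print("Emmy* keeps:", emmy_star_after_reconnect(split))  # Emmy* keeps: europe

# With 40% of the stake offline, no partition reaches the quorum: the network halts.
print(tenderbake_can_progress(600, total))  # False
```

Under Tenderbake the minority partition never produces blocks, so there is nothing to revert when connectivity returns. Under Emmy* the minority keeps going, and its blocks are discarded later.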

This is by design. Intended behavior. But it can be triggered by a bug — or by a large enough number of unprepared network participants, which this blog post aims to prevent.

A note on hardware

As Tezos continues to gain adoption and needs to offer higher throughput, questions have arisen as to whether hardware requirements for bakers will increase.

We do not see the current upgrade to Tenderbake consensus making this necessary. We expect current hardware setups to remain sufficient, provided they live up to the generally recommended specifications.

Upcoming Layer 2 scaling initiatives, such as optimistic roll-ups, are expected to greatly alleviate the pressure on the main chain, Layer 1.

However, Tenderbake is a first step towards tweaking the network to increase throughput on Layer 1. It is possible that increased adoption, resulting in fuller blocks, combined with tweaks of certain network parameters can lead to the lowest-powered baking systems eventually needing to be upgraded.

Should it become relevant, it is important to note that a slight upgrade of the lowest-powered systems — e.g. from a Raspberry Pi to a sufficiently powered Intel NUC — would not affect Tezos’ highly energy-efficient profile in any significant way.

As the recent PwC report assessing Tezos’ carbon footprint shows, 10% of respondent bakers reported using a Raspberry Pi, each drawing about 9 watts. Switching to an Intel NUC would increase the energy consumption of these bakers’ systems to about 12 watts.
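To put these figures in perspective, the per-node difference can be converted to energy per year. The arithmetic below uses only the 9 W and 12 W figures quoted above. It assumes the machine runs continuously (24 hours a day, 365 days a year). It says nothing about network-wide totals, which would depend on the number of bakers.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous operation

def annual_kwh(watts: float) -> float:
    """Energy drawn by a device running all year, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000

pi, nuc = annual_kwh(9), annual_kwh(12)
print(f"Raspberry Pi: {pi:.2f} kWh/yr, Intel NUC: {nuc:.2f} kWh/yr, delta: {nuc - pi:.2f} kWh/yr")
# Raspberry Pi: 78.84 kWh/yr, Intel NUC: 105.12 kWh/yr, delta: 26.28 kWh/yr
```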

Tenderbake is a major milestone for Tezos and its community. We are excited to be a part of this major protocol transformation and eager to see what new opportunities it will bring.

¹ The change of consensus algorithm introduces new operations (pre-endorsements), new concepts (such as rounds), and changes the semantics of existing software components. This changes the byte payload of the consensus operations signed by bakers, and the rules to prevent double (pre)endorsing/baking. Hence, a new version of the signer software is required. Read more here.
² Assuming the split is unintended. If more than 1/3 of the total stake is held by bad actors intentionally performing a coordinated attack, the network can indeed fork.

A POPL 2022 Retrospective

2022-02-15T19:00:00+01:00

A few weeks ago, we physically attended POPL 2022 — the 49th ACM SIGPLAN Symposium on Principles of Programming Languages — which took place in Philadelphia, PA and also virtually everywhere on Earth via Airmeet.

POPL is a¹ premier annual conference of the programming languages research community, showcasing, together with several colocated events, the latest cutting-edge results in programming languages, especially in functional programming and formal verification. The latter topics are very dear to the Tezos community, as they enable us to build safety-critical, complex-yet-beautiful software (such as the Tezos protocol and the Octez suite) on sound technical foundations. Moreover, research presented at conferences like POPL has an immediate impact on our daily work at Nomadic Labs.

This affinity is reflected in our continuous involvement as sponsors:

The Tezos Foundation was a platinum sponsor of POPL 2022, for the fourth consecutive year.
We at Nomadic Labs are pleased and proud to have sponsored the colocated CPP 2022 (Conference on Certified Programs and Proofs), for the third consecutive year.

After almost two years of virtual events, and debugging audio and video connections across different virtualization platforms, we were eager to return to in-person POPL-ing² and real-life hallway tracks. Our on-site team consisted of: Michel Mauny, Nomadic Labs’ CEO; Germán Delbianco, Research Engineer at Nomadic Labs; and Michael Holey, Blokhaus’ community manager³. They were present throughout the week at the Tezos booth, eager and ready to answer questions about Tezos. It was an amazing opportunity to promote our interest in these research fields, and also to advertise possible collaborations, openings in the Tezos ecosystem and at Nomadic Labs, as well as research funding opportunities.

We are happy to have had the chance to catch up with the PL community, and we hope that we have planted the seeds for fruitful collaborations which will benefit the Tezos community. There were also plenty of interesting, high-quality talks, and below we bring you a few (biased) highlights of this year’s program⁴.

The POPL Program and our Highlights

Each year, the ACM SIGPLAN Symposium on Principles of Programming Languages (POPL in short) features novel research contributions ranging from theoretical foundations of programming languages (e.g. semantics and type systems) to the development and application of formal tools for crafting and verifying reliable and correct systems.

Among the other colocated events on offer during the week (such as VMCAI and LAFI), we took time to attend CPP 2022 and CoqPL 2022.

The full program is available here, and several events, including the main conference, have made the talks available on SIGPLAN’s YouTube channel.

POPL 2022

This year the POPL program consisted of 60+ accepted papers, in addition to 3 keynotes, and a special session of TOPLAS articles. There was plenty of high-quality research catering to different tastes and appetites. An incomplete list of those we enjoyed⁵:

Alfred Aho’s keynote, “Principles of Programming Language Translators”, focused on the topic of computational thinking and how it applies to compiler design. In particular, it revisited what an abstraction is from a programming language perspective: we should define an abstraction as a model and a set of operations for manipulating data (the “programming language” of the data model). The talk presented a taxonomy of four abstractions for computer science (fundamental abstractions, abstract implementations, declarative abstractions, and computational abstractions) and illustrated them with examples from compiler design. Towards the end, moving away from compilers, Professor Aho focused on quantum computing and argued that the four postulates of quantum mechanics⁶ define new kinds of abstractions that extend the previously presented classical ones.
“Concurrent Incorrectness Separation Logic”. Incorrectness separation logic was recently introduced as a dual to separation logic: its goal is not to establish the correctness of programs, but rather to prove the presence of bugs, using under-approximate reasoning. This work extends the logic into a parametric framework for a concurrent, shared-memory model, which can be instantiated to soundly determine whether detected races, deadlocks, and memory-safety errors are true positives, where “sound” means that all reported results are true positives.
“The Leaky Semicolon: Compositional Semantic Dependencies for Relaxed-Memory Concurrency”. This talk featured a retrospective of 20 years of research in memory models, looking into how to support sequential composition while targeting modern hardware architectures, following an event-based approach with preconditions and families of predicate transformers.
“Simuliris: A Separation Logic Framework for Verifying Concurrent Program Optimizations”. This talk presents Simuliris, a simulation technique to establish termination preservation for a range of concurrent program transformations that exploit undefined behavior in the source language. The key idea is to use ownership to reason modularly about the compiler’s assumptions about well-defined behavior. Simuliris is built on the Iris framework.
“Pirouette: Higher Order Typed Functional Choreographies”. Pirouette is a programming language for typed, higher-order, functional distributed choreographies. It lets programmers write a centralized functional program and compile it into programs for each node in a distributed system. The paper provides a formalization in Coq, notably showing that the soundness of the type system entails deadlock-freedom.
Last, we want to highlight an effort on smart contract verification: “SolType: Refinement Types for Solidity”. This work presents a refinement type system for Solidity that can be used to prevent arithmetic over- and under-flows in Ethereum smart contracts.

This year, POPL also featured a virtual post-conference workshop event, across multiple time zones. In addition to a few select invited speakers, the novelty this year was a couple of speed-dating sessions where industrial sponsors (like us) could connect with potential candidates.

All presentations are available on a dedicated YouTube playlist.

CPP 2022

As mentioned above, Nomadic Labs were proud sponsors of CPP 2022, the 8th Conference on Certified Programs and Proofs. This conference covers a broad spectrum of mechanized verification efforts and tools, ranging from the formalization of mathematics and certified algorithms to new proof techniques, frameworks and tooling for interactive theorem proving.

This year, we highlight two very interesting keynotes that resonate with our own verification endeavors at Nomadic Labs:

Professor Andrew Appel’s keynote, “Coq’s vibrant ecosystem for verification engineering”, addressed the engineering aspects of building vertical stacks of verified software components, centered around the VST project, and how to scale up verification efforts by relying on community libraries and community-centric projects like the Coq Platform. In particular, he focused on how to address the verification gap across multiple layers of formalization efforts and mechanized software. The verification gap, that is, the semantic distance between verification artifacts and real-world production code, is one of the big hurdles of industrial-scale projects like ours, and it has been at the center of several recent in-house projects and internships.
June Andronick’s keynote, “The seL4 verification: the art and craft of proof and the reality of commercial support”, consisted of two parts. The first was a tour de force of the seL4 project, discussing the scalability of large verification stacks in general and the challenges of the project in particular. The second part focused on the broader picture of carrying out large-scale verification projects in industrial settings. She covered both technical proof-engineering aspects and non-technical ones, like how to properly explain the virtues of formal verification to non-technical audiences. That rings a bell!

Other talks we enjoyed were:

“Specification and Verification of a Transient-Stack”, a very cool verification effort using CFML and time credits in Coq to certify concurrent and persistent data structures.
“Mechanized Verification of a Fine-Grained Concurrent Queue from Meta’s Folly Library”, which presents the formal specification and verification, in the Iris framework, of a high-performance, fine-grained, concurrent multi-producer multi-consumer queue from Meta’s Folly library.

Blockchain-related topics like certified frameworks for programming correct smart contracts, or mechanized proofs of consensus algorithms, frequently appear in the program. This year we had:

“A verified algebraic representation of Cairo program execution”, which focuses on how to verify STARKs in the Lean theorem prover. StarkNet’s STARKs constitute the core of a zero-knowledge-based Layer 2 solution for Ethereum (eventually scalable to other blockchains), whose aim is to translate EVM bytecode into simple, low-level Cairo programs.
“Formal Verification of a Distributed Dynamic Reconfiguration Protocol”, presenting the specification and formal verification using TLA+ of the core MongoRaftReconfig reconfiguration protocol, implemented in the MongoDB distributed database.

All CPP 2022 presentations are available on ACM SIGPLAN’s YouTube channel playlist.

CoqPL 2022

On the last day of the conference, we attended CoqPL 2022, the Eighth International Workshop on Coq for Programming Languages. The Coq proof assistant is one of Nomadic Labs’ favorite verification tools, as witnessed by the Mi-Cho-Coq framework, the coq-of-ocaml project — and also, various internship projects on offer. Moreover, the Tezos Foundation supports the development of Coq and of its ecosystem (see here how to apply for support), and the CoqPL workshop provides an opportunity to interact with the Coq development team, learn about the recently released features and what’s coming next in the pipeline, and other recent library developments.

See you next year!

It was a great experience for us to be back at POPL, and the hybrid format allowed us to reach out and interact better with the community.

POPL turns 50 next year, and it will again be a hybrid event, with the physical event taking place somewhere on the West Coast of the United States⁷. We look forward to seeing you there for this special occasion, both in person and online!

or the premier conference, depending on whom you ask and how many papers they got in this year. That said, it is indeed a very reputable venue, and a mark of pedigree that can make a career as a researcher in the field. ↩
That said, the still-ongoing pandemic did not stop us from attending (and blogging about) virtual POPL last year. ↩
Many thanks also to Blokhaus’ Jenny Carbonaro who handled logistics, and was in place to set up our cool stand, and welcome our Nomads to Philly. ↩
A more detailed (and even more biased) live blog by Germán is available here. ↩
A practical lesson in linearizability. In-person parallel sessions mean that you can enjoy one talk live and watch the other (a few weeks) later on YouTube. This often means you find yourself in the wrong room, or that two talks you really wanted to attend were scheduled at the same time. You can attend virtual parallel sessions live in parallel, at your own risk. ↩
Specifically, he referred to the four postulates of quantum mechanics as they are presented in Chapter 2 of the seminal “Quantum Computation and Quantum Information” by Michael A. Nielsen & Isaac L. Chuang. ↩
It has not been officially announced yet, but preliminary announcements hinted at cities that, for instance, host teams in the Western Conference of the NBA, regardless of what maps have to say on the matter. ↩

Refactoring the Management of Native Tokens in the Tezos Economic Protocol

2022-02-01T14:00:00+01:00

Before the Ithaca protocol proposal, native protocol token¹ (tez) movements and associated balance updates could be difficult to follow in the Octez source code for developers and maintainers. For example, crediting or debiting a contract, or creating the corresponding balance updates, often happened at distant locations in the code, with a sometimes non-trivial control flow in between those operations. With Ithaca, we have refactored the code to centralize token movements (e.g. transactions, fee payments, rewards, …) in a dedicated module, so that:

Token transfers are more explicit and uniform, and associated balance updates are exhaustive.
It is easier to assess properties related to token movements, such as preserving the invariant circulating tokens = minted tokens - burned tokens. Here, the term circulating tokens refers to all tokens that are frozen or held by user accounts and smart contracts.

To achieve this, we aimed for an implementation of token transfers that is correct-by-construction in the following senses:

Tokens are either minted and deposited into an account, moved from one account to another, or withdrawn from an account and burned; and
Balance updates found in block metadata give a complete and exact account of all tokens minted, moved or burned.

The first property ensures that the total amount of tokens in circulation is equal to the difference between the amount of tokens minted and the amount of tokens burned. Continuously ensuring these properties while the protocol evolves can be particularly tedious when the implementation does not use an explicit notion of token transfer from one token holder to another. The second property allows anyone to audit all token movements that happen when a block is applied, just by looking at the balance updates in the block’s metadata.

Token management in Hangzhou

In protocol Hangzhou, token transfers are implemented in a few steps consisting of debiting a sender of tokens, crediting a receiver of tokens and constructing balance updates to report the movement. Even though those steps are semantically very close to each other, they are intermixed with other instructions and hence can be distant within the source code. For example, a transfer from an implicit account to another is implemented as follows:

...
Contract.spend ctxt sender amount >>=? fun ctxt ->
(match Contract.is_implicit receiver with
| None -> return (ctxt, [], false)
| Some _ -> (
    Contract.allocated ctxt receiver >>=? function
    | true -> return (ctxt, [], false)
    | false ->
        Lwt.return
          ( Fees.origination_burn ctxt >|? fun (ctxt, origination_burn) ->
            ( ctxt,
              [
                Receipt.
                  ( Contract payer,
                    Debited origination_burn,
                    Block_application );
              ],
              true ) )))
>>=? fun (ctxt, maybe_burn_balance_update, allocated_receiver_contract) ->
Contract.credit ctxt receiver amount >>=? fun ctxt ->
...

This implementation makes it difficult to ensure that the tokens have been accurately transferred from the sender to the receiver. Indeed, to ensure this, one must also be certain that the function calls between the transfer steps do not modify the balances of the accounts concerned. Moreover, the balance updates corresponding to debiting the sender and crediting the receiver are created much later, after many other instructions. Furthermore, between debiting the sender and crediting the receiver, a balance update is created to reflect the fact that tokens have been burned to pay the origination cost. However, the call to Fees.origination_burn does not involve tokens; it only accounts for the consumed storage space. Here the balance update is created in anticipation of tokens being burned at some other location in the code, when the following instruction is executed: Fees.burn_storage_fees ctxt ~storage_limit ~payer:.... Consequently, it is more difficult to verify that all balance updates found in block metadata faithfully reflect the token movements that have actually occurred.

Token management in Ithaca

In protocol Ithaca, token management explicitly consists of three possible operations:

Mint tokens and deposit them into an account
Transfer tokens from one account to another
Withdraw tokens from an account and burn them

A new module named Token provides functions to perform these operations, and to obtain the balance of an account. Here, the term account must be understood in a broader sense than just implicit or originated accounts. For instance, to freeze the deposits of a delegate, deposited funds are withdrawn from the delegate’s implicit account and its balance of frozen tokens is increased. To make that type of token management operation fit the pattern of a transfer from one account to another, the frozen deposits of a delegate must be viewed as a kind of account from the point of view of the Token module. Hence, the notion of account needs to be generalized.

A lightweight generalization of accounts

We want to make the notion of a transfer from one account to another more explicit, while minimizing the amount of rewritten code, and hence reducing the risk of introducing bugs. Hence, the notion of account is generalized in a lightweight fashion. There are three kinds of accounts: source accounts, container accounts, and sink accounts.

Source accounts are debited whenever new tokens are minted, and designate fictitious accounts with a virtually infinite balance from which tokens can be withdrawn. For example, `Nonce_revelation_rewards is the source account of tokens minted to reward delegates for revealing their nonces, and `Liquidity_baking_subsidies is the source account of tokens minted to subsidize the liquidity baking CPMM contract.

Container accounts are regular (user and smart contract) accounts, or convenience accounts that hold tokens temporarily (e.g. when parts of a delegate’s funds are frozen). These accounts have a finite capacity (of a fixed-size integer) and a balance that is increased or decreased whenever they are credited or debited. The function Token.balance returns the balance of a container account. For example, the account (`Contract c) represents an implicit or originated account c, and the account (`Frozen_deposits d) represents the account of the frozen deposits of the delegate d.

Sink accounts are credited whenever tokens are burned, and designate fictitious accounts virtually able to receive an unlimited number of tokens. For example, the sink account `Storage_fees is the receiver of storage fees burned for consuming storage space on the chain, and the sink account `Double_signing_punishments is the receiver of tokens burned as punishment for a delegate that has double baked or double endorsed.

Tokens can be transferred from a sender account (i.e. a source or container account) to a receiver account (i.e. a container or sink account). The type Token.container represents container accounts. The type Token.source represents accounts that can play the role of the sender in a transfer of tokens, and similarly, the type Token.sink represents accounts that can play the role of the receiver. Both Token.source and Token.sink contain Token.container since container accounts can be both sender and receiver of tokens. Those three types are wrappers that allow the Token module to dynamically dispatch the operations of crediting and debiting corresponding accounts to the right piece of code able to handle the operations.

Transferring tokens

The operations of token management can be performed by invoking the transfer or transfer_n functions of the Token module. A transfer of a given amount from a sender to a receiver simply consists in withdrawing that amount from the sender’s account and crediting the same amount to the receiver’s account. Hence:

to mint an amount of tokens and deposit it into a receiver account, one transfers that amount from a source account to the receiver account
to move tokens from one container account to another, one performs a transfer from the former (sender) account to the latter (receiver) account, and
to burn an amount of tokens withdrawn from a given account, one performs a transfer from the sender account to a sink account.

Consider the following examples:

Token.transfer ctxt (`Contract sender) (`Contract receiver) amount
Token.transfer ctxt `Endorsing_rewards (`Contract d) rewards
Token.transfer ctxt (`Frozen_deposits d) `Double_signing_punishments amount_to_burn

The first example is the instruction invoked during a transaction operation from a sender contract to a receiver contract. Here the constructor `Contract builds container accounts, which can designate sender accounts as well as receiver accounts. The implementation of the transfer functions is such that, by construction, transfers between container accounts leave the total amount of tokens in circulation unchanged. Also, those functions are the only locations where that property needs to hold.

The second and third examples correspond to the instructions invoked, respectively, to distribute endorsing rewards to a delegate d, and to punish a delegate d for double signing. Here Endorsing_rewards is a source account that cannot play the role of the receiver in a transfer, and Double_signing_punishments is a sink account that cannot play the role of the sender in a transfer. Using distinct types of accounts for different types of token transfers makes intent more explicit, more tractable and easier to verify. Tokens withdrawn from a source account are by definition minted, and tokens sent to sink accounts are by definition burned. When sticking to this transfer pattern, we can be sure that these are the only ways to respectively increase or decrease the amount of tokens in circulation. Since all token management operations now follow the transfer pattern, it is easy to list all locations in the protocol where token movements are involved just by running:

grep -R "Token.transfer" src/proto_alpha
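For readers who want to experiment with the pattern, here is a minimal OCaml sketch of it. It is a toy model, not the protocol’s Token module: the account constructors reuse names from this post, but the ledger (a hashtable), the function names debit, credit, transfer and invariant_holds, and the error strings are hypothetical. Following the description above, sources and sinks are distinct polymorphic variant types that both include containers, so a call such as transferring into `Endorsing_rewards is rejected by the type checker, and tokens can only be minted by debiting a source or burned by crediting a sink.

```ocaml
(* Toy model of the Ithaca transfer pattern; hypothetical, not the
   protocol's Token module. Amounts are in mutez. *)

type container = [ `Contract of string | `Frozen_deposits of string ]

(* Accounts that can send tokens: minting sources, or containers. *)
type source = [ `Endorsing_rewards | `Nonce_revelation_rewards | container ]

(* Accounts that can receive tokens: burning sinks, or containers. *)
type sink = [ `Storage_fees | `Double_signing_punishments | container ]

type ledger = {
  balances : (container, int) Hashtbl.t;  (* container balances *)
  mutable minted : int;                   (* total withdrawn from sources *)
  mutable burned : int;                   (* total credited to sinks *)
}

let create () = { balances = Hashtbl.create 16; minted = 0; burned = 0 }

let balance l (c : container) =
  Option.value ~default:0 (Hashtbl.find_opt l.balances c)

(* Debiting a source mints tokens; debiting a container lowers its balance. *)
let debit l (src : source) amount =
  match src with
  | `Endorsing_rewards | `Nonce_revelation_rewards ->
      l.minted <- l.minted + amount;
      Ok ()
  | #container as c ->
      let b = balance l c in
      if b < amount then Error "insufficient balance"
      else begin
        Hashtbl.replace l.balances c (b - amount);
        Ok ()
      end

(* Crediting a sink burns tokens; crediting a container raises its balance. *)
let credit l (dst : sink) amount =
  match dst with
  | `Storage_fees | `Double_signing_punishments -> l.burned <- l.burned + amount
  | #container as c -> Hashtbl.replace l.balances c (balance l c + amount)

(* The single entry point for moving tokens: debit, then credit. *)
let transfer l src dst amount =
  if amount < 0 then Error "negative amount"
  else Result.map (fun () -> credit l dst amount) (debit l src amount)

(* circulating tokens = minted tokens - burned tokens *)
let invariant_holds l =
  let circulating = Hashtbl.fold (fun _ b acc -> acc + b) l.balances 0 in
  circulating = l.minted - l.burned

let () =
  let l = create () in
  (* mint, move, burn: one of each kind of transfer shown above *)
  ignore (transfer l `Endorsing_rewards (`Contract "d") 1_000);
  ignore (transfer l (`Contract "d") (`Frozen_deposits "d") 600);
  ignore (transfer l (`Frozen_deposits "d") `Double_signing_punishments 300);
  Printf.printf "minted=%d burned=%d invariant=%b\n"
    l.minted l.burned (invariant_holds l)
```

Running it prints minted=1000 burned=300 invariant=true: 400 mutez remain on the contract and 300 in frozen deposits, matching 1000 minted minus 300 burned. Because the invariant only has to be argued for debit and credit, reviewing those two functions is enough to trust every call site, which is the point the refactoring makes about the real module.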

Balance updates

Whenever tokens are minted, moved, or burned, one or more accounts are debited, and another account is credited. This results in a sequence of balance updates reporting the amount of tokens debited from and credited to each of the accounts involved. In Ithaca, balance updates are exclusively generated by the transfer functions of the Token module. Therefore, by construction, they reflect exactly the token movements that have occurred, and it is now easier to assess that they are exhaustive and correct just by reviewing the implementation of the Token module.

In block metadata, the field balance_updates contains balance updates generated by the invocations of token transfer functions. Each invocation of those functions generates a list of balance updates starting with a series of debits, and ending with a credit matching those debits. Typically, this field contains a flat list resulting from the concatenation of one or more lists of balance updates. Consider for example the following flat list of balance updates:

  [ {"kind": "...", ..., "change": "-100", "origin": "block"},
    {"kind": "...", ..., "change": "100", "origin": "block"},
    {"kind": "...", ..., "change": "-125", "origin": "block"},
    {"kind": "...", ..., "change": "-75", "origin": "block"},
    {"kind": "...", ..., "change": "200", "origin": "block"} ]

This list reports that two transfers have occurred: 100 mutez are transferred from one account to another, and a total of 200 mutez are transferred from two accounts to a third. For a more complete description of the format for balance updates, see this page of the documentation.
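Because each group of updates ends with a credit equal to the sum of the preceding debits, a flat list can be split back into individual transfers. The following OCaml sketch does this on the change values only. The function name group_transfers is hypothetical, and the sketch assumes that every update in the list was produced by a transfer in the debits-then-credit shape described above; the other fields (kind, origin, …) are ignored.

```ocaml
(* Hypothetical helper: split the "change" values (in mutez) of a flat
   balance_updates list into transfers. Each transfer is assumed to be one
   or more debits (negative) followed by one credit (positive) equal to
   the total debited. *)
let group_transfers (changes : int list) : (int list list, string) result =
  (* [pending] holds the debits of the current transfer, most recent first. *)
  let rec go groups pending = function
    | [] ->
        if pending = [] then Ok (List.rev groups)
        else Error "debits without a matching credit"
    | c :: rest when c < 0 -> go groups (c :: pending) rest
    | c :: rest ->
        let debited = - (List.fold_left ( + ) 0 pending) in
        if pending = [] then Error "credit without preceding debits"
        else if c <> debited then Error "credit does not match debits"
        else go (List.rev (c :: pending) :: groups) [] rest
  in
  go [] [] changes

let () =
  match group_transfers [ -100; 100; -125; -75; 200 ] with
  | Ok groups -> Printf.printf "%d transfers\n" (List.length groups)
  | Error e -> print_endline e
```

On the example list it prints 2 transfers, with groups [-100; 100] and [-125; -75; 200]. An auditor could run the same kind of check over real block metadata, but would need the full update records, not just the amounts, to tell which accounts were involved.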

Conclusions & Future Work

In Ithaca we have centralized the management of native protocol token movements in a dedicated module so that token transfers are correct by construction. Basically we have implemented some of the ideas mentioned here and here in a lightweight manner so as to minimize the risk of introducing bugs. The transfer pattern is now the only uniform means of minting, moving, or burning tokens in the Tezos protocol. By construction, balance updates in block metadata faithfully and exhaustively reflect all token transfers that have occurred when the block has been applied. To test this refactoring work, we have very much relied on our existing suite of unit and integration tests, and we have also written more unit tests to check the expected properties of the token transfer functions.

In some cases quite sensitive changes were necessary before we could apply the new transfer pattern to manage tokens, while preserving previous behavior. For instance, this was the case for burning storage fees. Here it was important to preserve the validity of transactions where the sender’s balance is sufficient to pay the fees only once all internal transactions have been processed. To test such changes, we back-ported them to previous protocols, and the chain was replayed to ensure that the previous behavior had been preserved.

Our next work on the subject of token management will be to extend the Token module so that it also manages token delegations. The staking balance of a delegate can only be changed when token owners delegate or transfer their tokens, but updating staking balances is currently performed at various locations after each token transfer that affects staking balances. With the refactoring described above it will be possible to update the staking balance once and for all in the transfer functions of the Token module.

Tez are the Tezos blockchain’s native protocol tokens. We use native protocol here to make explicit that this article focuses on tez, and not on, e.g., digital assets implemented using Tezos smart contracts following the FA1.2 and FA2 token standards. ↩

Announcing “Ithaca 2”

2022-01-19T17:00:00+01:00

This is a joint post from TriliTech, Nomadic Labs, Marigold, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

We are happy to announce Ithaca 2, a revised version of Ithaca that contains an important performance improvement of metadata RPCs.

(As is usual, Ithaca 2’s “true name” is its hash, which is Psithaca2MLRFYargivpo7YvUr7wUDqyxrdhC5CQq78mRvimz6A.)

Ithaca 2’s sole difference from Ithaca is the deprecation of a redundant field in an error message. When the interpretation of the Michelson script of a smart contract results in a runtime error, the script code will no longer be included in the error message. This will save a lot of space, because the script is typically large and is duplicated in each error message, i.e. each time the smart contract is called and its execution results in a runtime error. Furthermore, the script code can already be fetched from the contract address, so there is no need to record this information again. Storing script code in the block metadata is anomalous behavior, and deprecating this redundant field will significantly reduce disk usage.

All other features of Ithaca 2 are identical to those of Ithaca; for these, we refer to our previous announcement.

We also encourage you to look at the changelog for a full description of the contents of the proposal.

As was the case with Ithaca, testing is critical. A testnet for the Ithaca 2 protocol will launch in the coming days. It is important to have as many bakers as possible participating in this testnet, by running nodes, producing blocks and deploying apps. We are looking for more bootstrap bakers to participate from day one. If you are interested, please join the baking slack and make yourselves known in the #test-networks channel.

Once more, we strongly encourage you to test your own Tezos-based applications to check for compatibility problems. Ithaca 2, and the configuration for its test network, will be included in the next release candidate of Octez version 12.

Similarly to Ithaca, should the Ithaca 2 protocol proposal be accepted by the community, the following minimal version of Tezos node (shell) software will be necessary to participate in the consensus, due to necessary changes introduced to the protocol environment: v12 of Octez, or v2 of TezEdge.

Tenderbake has been injected

2021-12-21T12:00:00+01:00

You may recall from earlier this year our in-depth discussion of Tenderbake, our new classic-style consensus algorithm that offers quick and deterministic finality and safety under asynchrony.

Since then, and in collaboration with Functori, we have prepared Tenderbake for inclusion in the next Tezos protocol upgrade proposal Ithaca. A big thank you to the incredible and amazing Functori for their incredibly amazing collaboration!

In this post, we will complement our previous discussion of Tenderbake with a recap of some practical aspects of the new consensus algorithm: what does Tenderbake mean for you, how can you use it, and what benefits will it bring? We will overview the most important points here, and you can find further details in

the full Tenderbake documentation, and
the Tenderbake changelog.

Overview

Recall that when determining which block to add to the blockchain, Tenderbake proceeds in rounds, starting from round 0:

If there is enough timely participation at round 0, then consensus is reached and a decision is made.
If there is not enough timely participation¹ at round 0 then we go to round 1, and so forth.

Provided that at least two thirds of the total active stake participates honestly in consensus, then a decision is eventually taken.² In the current implementation of Tenderbake the duration of each round increments by 15 seconds, starting from 30 seconds: thus the deadline for participation in round 0 is 30 seconds, that for round 1 is 45 seconds after that, and so on. So in normal conditions, when consensus is reached promptly at round 0 every time, we can expect Tenderbake to add one block every 30 seconds.
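The round schedule is simple arithmetic. Here is a minimal OCaml sketch, assuming the values quoted above (a 30-second first round and a 15-second increment per round). The constant names are meant to mirror the protocol parameters minimal_block_delay and delay_increment_per_round, but treat both names and values as illustrative, since protocol parameters can change.

```ocaml
(* Illustrative values, in seconds, taken from the text above. *)
let minimal_block_delay = 30
let delay_increment_per_round = 15

(* Duration of round r (round 0 is the first round). *)
let round_duration r = minimal_block_delay + (delay_increment_per_round * r)

(* Seconds elapsed from the start of round 0 until the end of round r. *)
let round_end r =
  let rec go i acc =
    if i > r then acc else go (i + 1) (acc + round_duration i)
  in
  go 0 0

let () =
  List.iter
    (fun r ->
      Printf.printf "round %d: lasts %ds, ends %ds after round 0 starts\n" r
        (round_duration r) (round_end r))
    [ 0; 1; 2 ]
```

Under these assumptions, rounds 0, 1 and 2 end 30, 75 and 135 seconds after round 0 starts, so a level that is only decided at round 2 takes up to 135 seconds, versus 30 seconds when round 0 succeeds.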

Note that:

Tenderbake has deterministic finality after just two blocks. In normal conditions, when the network is healthy, decisions are made at round 0, after 30 seconds. This means that in normal conditions the time to finality is about one minute.
Finality is guaranteed under an assumption that at most one third of the active stake is Byzantine (a Byzantine participant is one that is either offline and so unresponsive, or responsive but not following the rules). This is a standard assumption, but it does have the corollary that the chain will halt if more than one third of the active stake is offline.³

Speaking of bakers — baker daemons now endorse as well. Indeed, Tenderbake bids farewell to the concept of endorser daemon from the Emmy family of consensus algorithms which Tezos has used to date. Goodbye endorser daemon; thank you for the years of service!

The baker also emits a new kind of consensus operation called a preendorsement.

Aside from the new kind of consensus operations, there are changes to the block metadata and RPCs, so Tezos tool developers — for example, developers of block explorers and indexers — are invited to check the metadata and RPC changes so that their tools can be seamlessly updated.

Finally, there are notable improvements to the incentives mechanism, which we describe next:

The incentives mechanism

The deposit scheme

In Emmy*

In Emmy*, each time a delegate wants to bake or endorse, it must put down a deposit as a guarantee of honest behaviour. That is: in Emmy* there is a deposit for each individual baking and endorsement action, which is computed such that security deposits are around 10% of the total stake.⁴ Double signing in Emmy* is penalised by forfeiting the entirety of the deposits made by the delegate during the cycle when the double signing was made.⁵

This design has some drawbacks:

If a delegate becomes over-delegated (meaning that their stake is not sufficient for their required deposits) then this may lead to the delegate missing slots and thus slowing down the chain.
Handling deposits requires I/O operations. This imposes communication and computational overheads, which may slow down block validation.⁶
Penalties for double signing (see the discussion of slashing below) are not proportional to the double signer’s stake.

In Tenderbake

Tenderbake takes steps to address the issues above.

In Tenderbake’s new deposits scheme, delegates still need to put down deposits as a guarantee against double signings, but this deposit is now based on the delegate’s stake rather than on individual baking/endorsement actions. Deposits are frozen when baking and (pre)endorsement rights are allocated to stakeholders, five cycles in advance.

The deposit amount is determined by the stake a delegate has during the so-called active cycles: two cycles in the past and five cycles in the future. To be precise, a delegate’s frozen deposit at cycle c is 10% of the highest stake during cycles c-2 to c+5.

Delegates can put an upper limit on their deposit by using the client command set deposit limit. In particular, a delegate can set this limit to zero, and then the delegate will lose its baking rights in 7 cycles. To be able to bake again, the delegate would have to reset the deposit limit or simply unset it (with the client command unset deposit limit).
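The deposit rule fits in a few lines. The OCaml sketch below is an illustration under stated assumptions: stakes are given in mutez through a hypothetical lookup function stake_at, 10% is rounded down to the nearest mutez, and the optional deposit limit is modeled as a simple cap. The protocol’s actual computation (rounding, which balances count as stake) may differ in detail.

```ocaml
(* Hypothetical sketch: frozen deposit (in mutez) of a delegate at cycle c,
   i.e. 10% of its highest stake over cycles c-2 .. c+5, capped by the
   optional deposit limit set by the delegate. *)
let frozen_deposit ~(stake_at : int -> int) ?limit (c : int) : int =
  let highest = ref 0 in
  for cycle = c - 2 to c + 5 do
    highest := max !highest (stake_at cycle)
  done;
  let deposit = !highest / 10 in
  match limit with
  | None -> deposit
  | Some cap -> min deposit cap

let () =
  (* Stake of 10,000 tez, with a one-off peak of 20,000 tez at cycle 8. *)
  let stake_at cycle = if cycle = 8 then 20_000_000_000 else 10_000_000_000 in
  Printf.printf "%d %d\n"
    (frozen_deposit ~stake_at 10)
    (frozen_deposit ~stake_at ~limit:0 10)
```

At cycle 10 the window 8..15 still contains the peak, so the deposit is 2,000 ꜩ; at cycle 11 the peak has left the window and it drops to 1,000 ꜩ. With the limit set to 0 the sketch yields a zero deposit, the situation in which, as explained above, the delegate loses its baking rights after 7 cycles.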

Slashing

In Tenderbake, we say a delegate double signs when it bakes or (pre)endorses two blocks at the same level and round, but with different payload hashes.⁷ Double signing is penalised by slashing (forfeiting) a part of the delegate’s deposit:

Double baking is penalised by slashing 640 ꜩ (or the whole deposit, if less than 640 ꜩ).
Double (pre)endorsement is penalised by slashing half of the deposit. This means that a wrongdoer who double (pre)endorses twice during a single cycle will lose their entire deposit, and thus lose baking rights for the rest of the cycle.
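The two slashing rules can be sketched in OCaml as follows. This is a hypothetical model, not protocol code: it assumes the half-deposit penalty is computed on the deposit frozen for the cycle (which is what makes two offences exhaust it), rounds that half up, and never slashes more than what remains. Amounts are in mutez (1 ꜩ = 1,000,000 mutez).

```ocaml
(* Hypothetical sketch of Tenderbake slashing as described above. *)
type evidence = Double_baking | Double_preendorsement | Double_endorsement

let double_baking_penalty = 640_000_000 (* 640 tez *)

(* Returns (slashed, remaining deposit) for one piece of evidence.
   [initial] is the frozen deposit for the cycle, [remaining] what is
   left of it after earlier slashings in the same cycle. *)
let slash ~initial ~remaining evidence =
  let penalty =
    match evidence with
    | Double_baking -> double_baking_penalty
    (* half of the deposit, rounded up here so that two offences
       always exhaust it; the protocol's rounding may differ *)
    | Double_preendorsement | Double_endorsement -> (initial + 1) / 2
  in
  let slashed = min penalty remaining in
  (slashed, remaining - slashed)

let () =
  let initial = 10_000_000_000 in (* 10,000 tez *)
  let s1, r1 = slash ~initial ~remaining:initial Double_endorsement in
  let s2, r2 = slash ~initial ~remaining:r1 Double_preendorsement in
  Printf.printf "slashed %d then %d, %d left\n" s1 s2 r2
```

The example reproduces the point made above: two double (pre)endorsements in the same cycle take 5,000 ꜩ each and leave nothing of a 10,000 ꜩ deposit, whereas a single double bake costs at most 640 ꜩ.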

These penalties reflect the relative severities of double baking versus double (pre)endorsement: double baking must be penalised because it wastes resources, but double (pre)endorsement could fork the chain, which would be much more serious.⁸

Rewards

In Tenderbake, just as in Emmy*:

Participation in consensus (both baking and endorsement) is rewarded.
A delegate must have a minimal stake to participate.⁹ (The minimal stake for becoming a baker is 8000 ꜩ in Emmy* and 6000 ꜩ in Tenderbake.)

However in Tenderbake, baking rewards and fees can be dispensed immediately, thanks to the new deposit scheme.

The block, its payload, and baking rewards

In Tenderbake we distinguish between the payload producer and the block producer:

The payload producer is the baker who selects the non-consensus operations to be included in the block; we refer to these operations as the block’s payload.
The block producer is the baker who signs the block.

The payload producer is typically the same as the block producer — but not necessarily! For instance, if consensus is not reached within a given round and the baker has committed to the block proposed at the given round by endorsing the proposed block, then it might be that at a later round the baker is forced to propose a block with the same payload. In such a case the payload producer may differ from the block producer.

Given this separation:

Operation fees as well as a fixed baking reward of 10 ꜩ go to the payload producer, because the payload producer selected the operations.
A bonus for including endorsements above the required threshold goes to the block producer, to incentivize him/her to play fairly.

To be precise, a block producer who includes endorsements corresponding to x endorsement slots receives a bonus of 0.004286 * max (0, x - 4667) ꜩ, where

4667 = ⌊2/3 * 7000⌋ + 1 represents the required threshold for agreement,¹⁰ and
7000 is the value fixed for the total number of endorsement slots (see parameters consensus_committee_size and consensus_threshold in the documentation).
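The bonus formula translates directly into integer arithmetic on mutez. In this OCaml sketch, x is taken to be the total number of endorsement slots covered by the endorsements the block includes, so only slots beyond the 4667 threshold earn a bonus. The constants restate the values above; computing the threshold from the committee size and the function name baking_bonus are illustrative choices, since in the protocol both values are parameters.

```ocaml
(* Illustrative constants from the text above; amounts in mutez. *)
let consensus_committee_size = 7000

(* floor(2/3 * 7000) + 1 = 4667 endorsement slots *)
let consensus_threshold = (2 * consensus_committee_size / 3) + 1

let bonus_per_extra_slot = 4_286 (* 0.004286 tez *)

(* Bonus paid to the block producer for including endorsements worth
   [included_slots] slots: nothing up to the threshold, then a fixed
   amount per additional slot. *)
let baking_bonus ~included_slots =
  bonus_per_extra_slot * max 0 (included_slots - consensus_threshold)

let () =
  Printf.printf "threshold=%d max_bonus=%d mutez\n" consensus_threshold
    (baking_bonus ~included_slots:consensus_committee_size)
```

When every one of the 7000 slots is endorsed, the bonus tops out at 2333 extra slots times 4286 mutez, i.e. 9,999,238 mutez, just under 10 ꜩ.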

Endorsement rewards

Endorsement rewards are distributed at the end of each cycle, because only endorsers actively participating during the given cycle are rewarded. Such actively participating endorsers are known as present¹¹ endorsers.

An endorser is considered present¹² when the number of slots corresponding to its endorsements which are included in blocks is at least two thirds of its total slots during the cycle. Therefore an endorser may be rewarded even if some of its endorsements are not included.
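The presence criterion can be expressed as a one-line predicate. This is a sketch under the assumption that the rule is exactly "included slots ≥ two thirds of the endorser's slots in the cycle"; the function name is hypothetical, and integer cross-multiplication is used so the 2/3 boundary is not subject to floating-point error.

```python
def is_present(included_slots: int, total_slots: int) -> bool:
    """Return True if at least two thirds of an endorser's slots in a cycle were included.

    3 * included >= 2 * total is the integer form of included / total >= 2/3.
    """
    if not 0 <= included_slots <= total_slots:
        raise ValueError("included_slots must lie between 0 and total_slots")
    return 3 * included_slots >= 2 * total_slots
```

An endorser with 90 slots in a cycle is present as long as at least 60 of them end up in blocks, so it can miss up to 30 and still be rewarded.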

A present endorser is rewarded proportionally to its stake, and not proportionally to its allocated slots. To be precise, a present endorser holding a fraction x of the total active stake during a cycle is rewarded with 0.002857 * x * 8192 * 7000 ꜩ, where

8192 is the number of blocks per cycle and
7000 is the consensus_committee_size mentioned above.
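The endorsement reward formula can be sketched as follows. The constants are the ones quoted above, converted to mutez; the function name is hypothetical, and stake may be given in any unit as long as both arguments use the same one.

```python
ENDORSING_REWARD_PER_SLOT_MUTEZ = 2_857   # 0.002857 tez
BLOCKS_PER_CYCLE = 8192
CONSENSUS_COMMITTEE_SIZE = 7000

# Total endorsement rewards distributed over one cycle.
CYCLE_ENDORSING_REWARDS_MUTEZ = (
    ENDORSING_REWARD_PER_SLOT_MUTEZ * BLOCKS_PER_CYCLE * CONSENSUS_COMMITTEE_SIZE
)


def endorsement_reward_mutez(stake: int, total_active_stake: int, present: bool) -> int:
    """One cycle's reward: proportional to stake, zero if the endorser was not present."""
    if not present:
        return 0
    # Multiply before dividing so integer division loses as little as possible.
    return CYCLE_ENDORSING_REWARDS_MUTEZ * stake // total_active_stake
```

With these constants the cycle total is 163,831,808,000 mutez (about 163,832 ꜩ), so a present endorser holding 1% of the active stake receives 1,638,318,080 mutez.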

Happy Tenderbaking (remember to update!)

Please remember to update to the latest version of Octez — otherwise, your baker might end up being a Byzantine baker simply by virtue of running out-of-date software, and nobody wants that! You can learn more about Tenderbake in the full documentation and changelog.

That’s it! Stay tuned for another blog post on testing Tenderbake. Until then, we wish you all the best for the New Year: may all your chains reach prompt and productive consensus!

Perhaps the network is running slow … ↩
… for instance, once the network outage passes. ↩
An offline baker is (as standard) considered to be a Byzantine baker — a Byzantine baker is one that, by definition, does not properly act according to the consensus algorithm, and the simplest way to not act according to the consensus algorithm is to not act at all! Whether this is by accident (poor network) or design (actively hostile to the network) is, from the point of view of designing a consensus algorithm, unimportant. ↩
See this discussion of the consensus parameters: “Since deposits are locked for a period of PRESERVED_CYCLES, one can compute that, at any given time, about ((BLOCK_SECURITY_DEPOSIT + ENDORSEMENT_SECURITY_DEPOSIT * ENDORSERS_PER_BLOCK) * (PRESERVED_CYCLES + 1) * BLOCKS_PER_CYCLE) tokens of all staked tokens should be held as security deposits. For instance, if the amount of staked tokens is 720,000,000 ꜩ, then roughly 8.74% of this amount is stored in security deposits.” ↩
A cycle in Tezos is a certain number of blocks. This is 8192 blocks, and will stay the same in Tenderbake. At a typical block time of 30 seconds, this comes to about three days. ↩
The bottleneck here is the node. Each baking/endorsement event requires its own small deposit, which imposes its own small overhead. A node busy processing the I/O operations needed to update the data structure storing the security deposits has less time for validating its next block. ↩
A block’s payload is the sequence of non-consensus operations contained in the block. Preendorsements and endorsements contain the hash of the payload they refer to. ↩
Even if there are two blocks at the same level and round, honest bakers will (pre)endorse only one of them so safety is preserved. ↩
The reader might be familiar with the notion of token_per_roll in Emmy* which was used to denote the minimal stake needed to participate in consensus. In Emmy*, rolls played a role in the computation of baking and endorsement rights. In Tenderbake, this is no longer the case because rights are computed from delegates’ stake. Thus rolls are only used to determine the voting power of a delegate. ↩
A block at level l+1 must include endorsements for a block at level l as a proof that agreement has been reached at level l. These endorsements must correspond to at least two thirds of the total slots. ↩
Note that this notion is different from the notion of active delegates. ↩
To monitor participation, the protocol provides a new RPC, unsurprisingly dubbed participation. ↩
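The deposit formula quoted in the Emmy* footnote can be checked numerically. The parameter values below are our assumption (Granada-era Emmy* constants, in mutez) and are not stated in this post; with them, the formula gives roughly the 8.74% figure from the quote.

```python
# Assumed Emmy*-era (Granada) parameter values, in mutez; not taken from this post.
BLOCK_SECURITY_DEPOSIT = 640_000_000        # 640 tez
ENDORSEMENT_SECURITY_DEPOSIT = 2_500_000    # 2.5 tez per endorsement slot
ENDORSERS_PER_BLOCK = 256
PRESERVED_CYCLES = 5
BLOCKS_PER_CYCLE = 8192


def locked_deposits_mutez() -> int:
    """Deposits held at any time, per the quoted Emmy* formula."""
    per_block = BLOCK_SECURITY_DEPOSIT + ENDORSEMENT_SECURITY_DEPOSIT * ENDORSERS_PER_BLOCK
    return per_block * (PRESERVED_CYCLES + 1) * BLOCKS_PER_CYCLE


def deposit_share(staked_tez: int) -> float:
    """Fraction of the staked tokens that sits in security deposits."""
    return locked_deposits_mutez() / (staked_tez * 1_000_000)
```

Under these assumptions about 62.9 million ꜩ is locked, i.e. `deposit_share(720_000_000)` is about 0.0874.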

Announcing Tezos’ 9th protocol upgrade proposal “Ithaca”

2021-12-20T15:00:00+01:00

This is a joint post from TriliTech, Nomadic Labs, Marigold, Oxhead Alpha, Tarides, DaiLambda, Functori & Tweag.

We were proud to see Hangzhou go live on chain on December 4th, 2021. In keeping with our policy of proposing upgrades on a regularly scheduled basis, we’re happy to announce our latest Tezos protocol proposal, Ithaca.

(As is usual, Ithaca’s “true name” is its hash, which is PsiThaCaT47Zboaw71QWScM8sXeMM7bbQFncK9FLqYc6EKdpjVP).

Ithaca contains two major updates to the protocol, as well as numerous minor improvements. Below we discuss some of the most interesting and important changes.

Tenderbake

Tenderbake is a major update to the Tezos consensus algorithm. Like Tendermint, Tenderbake brings fast deterministic finality to the Tezos protocol.

Tenderbake comes with a set of important changes:

The protocol moves away from a roll-based model to an optimized stake-based model to allocate rewards: bakers will receive rewards depending on their current stake instead of the number of rolls they own.
The minimal number of tokens required to be selected as a validator is reduced from 8,000 tez to 6,000 tez. This minimal stake of 6,000 tez remains necessary for performance reasons.
The baking and endorsement rewards mechanism has been reworked (cf. the rewards documentation). In particular, baking rewards will be credited instantaneously, and not frozen for 5 cycles as is the case with Emmy*. Furthermore, there will no longer be a variance for endorsement rewards. The total sum of endorsement rewards for a cycle will be fully distributed at the end of the same cycle, provided delegates have at least 2/3 of their endorsement slots included in blocks.
A new security deposit mechanism is introduced: delegates are required to freeze, at minimum, 10% of their stake in advance in order to obtain baking and endorsement rights. A new operation Set_deposit_limit is also introduced to manually manage this limit.
The number of endorsement slots per block has been bumped from 256 to 7,000: this means that a delegate with the minimum amount of tokens will participate every 10 blocks on average. The node’s storage layer and prevalidator have been optimized to handle the load, with the precheck feature also contributing to the increase in performance. The number of endorsement operations, which will continue to endorse multiple slots, will be proportional to the number of validators in the network, i.e. around 500.
Since Tenderbake is modeled after classical BFT consensus algorithms, it favors safety over liveness and requires active participation of validators holding 2/3 of the stake in order for the chain to progress.
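The "every 10 blocks" figure for a 6,000 tez delegate depends on the total active stake, which the post does not state. The sketch below estimates the average gap between blocks in which a delegate holds at least one of the 7,000 slots, under the simplifying assumption that each slot is drawn independently with probability proportional to stake; the function name and the stake values used with it are hypothetical.

```python
CONSENSUS_COMMITTEE_SIZE = 7000


def expected_blocks_between_participations(stake_tez: float, total_active_stake_tez: float) -> float:
    """Average number of blocks between blocks where the delegate holds >= 1 slot.

    Assumes each of the 7000 slots is drawn independently, proportionally to stake.
    """
    p_slot = stake_tez / total_active_stake_tez
    p_at_least_one = 1.0 - (1.0 - p_slot) ** CONSENSUS_COMMITTEE_SIZE
    return 1.0 / p_at_least_one
```

With a hypothetical total active stake of 400 million ꜩ, this gives roughly 10 blocks for a 6,000 ꜩ delegate; the gap grows as the total active stake grows.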

This consensus algorithm also offers the possibility to easily reduce the minimal time between blocks, which may be proposed in future Tezos protocol amendments.

Precheck of operations

The new version of the protocol will enable the prechecking of operations. This is not a feature of the Ithaca protocol proposal per se, but it rather consists of a new set of functions which are exposed by the economic protocol, and which can be used by any Tezos shell (e.g., Octez and TezEdge) to avoid fully executing manager operations before gossiping them through the network.

The feature serves one main purpose: increasing the number of operations gossiped over the Tezos network. It is a prequel to further optimizations that should increase the transaction throughput over the Tezos network.

Liquidity Baking

Ithaca includes an increase to the liquidity baking sunset level of 819,200 blocks, or twenty voting periods, roughly an additional ten months. This bigger increase will avoid needing to worry about the sunset level for the next few protocol amendments. Also, to balance this increase, the threshold for activating the escape hatch is lowered from 50% to 33%.
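The conversion from blocks to voting periods and months can be checked with a little arithmetic. The sketch assumes 5 cycles (40,960 blocks) per voting period and a steady 30-second block time with no missed rounds; both are assumptions, not figures from this post.

```python
BLOCKS_PER_CYCLE = 8192
CYCLES_PER_VOTING_PERIOD = 5   # assumption: 40,960 blocks per voting period
BLOCK_TIME_SECONDS = 30        # assumption: minimal block time, no missed rounds
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 365.25 / 12


def describe_extension(extra_blocks: int):
    """Return (voting periods, approximate months) covered by extra_blocks."""
    voting_periods = extra_blocks / (BLOCKS_PER_CYCLE * CYCLES_PER_VOTING_PERIOD)
    months = extra_blocks * BLOCK_TIME_SECONDS / SECONDS_PER_DAY / DAYS_PER_MONTH
    return voting_periods, months
```

`describe_extension(819_200)` returns exactly 20 voting periods and about 9.3 months, consistent with "roughly an additional ten months".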

We invite you to look at the changelog for a full description of the contents of the proposal.

Ithaca is the biggest update to Tezos to date, and testing is critical. A testnet for the Ithaca protocol, named Ithacanet, will launch in the coming days. It is essential to have as many bakers as possible participating in this testnet, by running nodes, producing blocks and deploying apps. We are looking for more bootstrap bakers to participate from day one. If you are interested, please join the baking slack and make yourselves known in the test-networks channel.

Furthermore, we strongly encourage you to test your own Tezos-based applications to check for compatibility problems with Ithaca. Ithaca, and the configuration for its test network Ithacanet, will be included in version 12 of Octez.

Should the Ithaca protocol proposal be accepted by the community, the following minimal version of Tezos node (shell) software will be necessary to participate in the consensus, due to necessary changes introduced to the protocol environment: v12 of Octez, or v2 of TezEdge.

If Ithaca is adopted, the next proposal (which likely will have a name starting with the letter “J”) should be proposed and enter the Tezos amendment process next year.

Over the course of the coming months, our teams also intend to continue to develop and propose amendments to increase performance, lower gas consumption, reduce block times, and increase the overall Tezos network’s throughput, as measured, for example, in transactions per second or smart contract invocations per second. We are all excited to continue developing the future of Tezos.

Meanwhile at Nomadic Labs #13

2021-10-28T15:00:00+02:00

Welcome to our meanwhile series, the ongoing story of Nomadic Labs’ amazing adventures in the Tezos blockchain space. This post is a recap of our activities in the third quarter of 2021, following on from our 2020 recap and our 2021 Q2 Meanwhile. As always, you can find out more about us here: Twitter @LabosNomades ~ Website ~ LinkedIn ~ Technical blog ~ GitLab repo.

Table of contents (Q3 2021)

Octez
Mi-Cho-Coq
Umami
Protocol upgrade: Granada activated, Hangzhou proposed and testnet launched
Adoption and Support
Training
Technical documentation
PhD student, Intern, and apprentice interviews
Media Interviews and Academic Papers
NL research seminars and blog posts
Sponsorship
Contract calls
À la prochaine

Octez

You may recall from Meanwhile #12 that we announced the christening of Octez — the veteran implementation of Tezos which had previously been known just by its version number and by a GitLab repo.

The Octez team were active in Q3 2021, releasing Version 9.4 (3 July), 9.5 (29 July), 9.6 (6 August), and 9.7 (7 August); then 10.0~rc3 (10 August), 10.0 (19 August) and then Version 10.1 (26 August). Changelogs are here. To top this off, we released the release candidate Octez 11.0~rc1 on 22 September.

The currently-active version at time of writing (end October) is Octez 10.3. Here is the Octez GitLab repo. Feel free to get the Octez Tezos implementation and join the Tezos blockchain!

Mi-Cho-Coq

We are proud to announce that Mi-Cho-Coq version 1.0 was released on 2 July.

Mi-Cho-Coq is a free and open-source library for verifying the correctness of Michelson smart contracts in Coq using weakest-precondition calculus. It is a Coq library which models all aspects of the Michelson language: its syntax, its type system, and its semantics.¹

For concrete applications of this powerful tool, see for example verification of a spending-limit contract, of the FA1.2 token standard (see also an associated paper in FMBC‘21), and of several versions of the Dexter decentralised exchange (Dexter v2 and Liquidity Baking).

Umami

The Umami wallet was released in April 2021 and is an all-in-one Tezos cryptocurrency wallet for both beginner and advanced users. At time of writing the current version is version 0.5.3, available for macOS, Linux, and Windows.

We are pleased to announce that during Q3 2021:

Beacon SDK v2.3.0 supports the Umami wallet, and
Umami hardware wallet integration was released for Ledger devices.

Umami was built by OCaml developers for OCaml developers using Reason (formerly ReasonML) and supports all the native features of the Tezos protocol, including multiple accounts, tokens, batch transactions, and delegation, with more features in the pipeline. For more information see the Umami Wallet page on Medium, a short essay on the purpose of Umami, and the Umami GitLab repo.

Protocol upgrade: Granada activated, Hangzhou proposed and testnet launched

The Tezos economic protocol enjoys regular upgrades. How this happens concretely is that a self-amendment mechanism is activated to propose an upgrade to the protocol — and because Tezos is an open community, protocol upgrades are approved by community vote. This means that upgrades can only happen when you, the Tezos community, vote that it be so; which is why you’ll notice we only ever talk about us making upgrade proposals.

Recall from Q2 2021 that:

Florence was activated on 11 May 2021 (block height 1,466,368; cycle 357; changelog), and
Granada was proposed (ongoing election; changelog) — and approved on 20 July.

We are pleased to announce that in Q3 2021:

Granada went live on 6 August (vote details).
The Hangzhou protocol upgrade proposal was released on 21 September (changelog; vote details).
The Hangzhou testnet (Hangzhounet, of course) was released on 24 September. Joining instructions are here.

For more information on Granada, you can see:

A detailed blogpost on the Granada upgrade.
The summary in Meanwhile Q2.
Our analysis blog post on network updates from the Granada protocol amendment, which is an analysis of the first complete cycle under Granada.

Substantive upgrades in Hangzhou are listed here. These include a new Timelock primitive.

Adoption and Support

Our adoption team have been hard at work developing relationships, and thanks to their diligence we are proud to report that during Q3 2021:

Our adoption team supported Ipocamp, a start-up specialising in creating solutions to protect creations, in choosing Tezos for its intellectual property solution (also in French).
On 28 July Smartlink (a decentralised escrow smart contract platform built on Tezos) became a corporate Tezos baker. See also the announcement here.² You can view Smartlink’s baking activity at their baker address.
On 22 September Block0 (a blockchain consulting and development company with an emphasis on supply chain traceability and transparency) became a corporate Tezos baker.² It is also the first Tezos corporate baker in Belgium.

Our support team has developed several support documents (see the support homepage, ‘Useful Resources’ for a full list):

9 July. Baking: creating blocks on Tezos (also in French).
12 July. Nodes on the Tezos blockchain (also in French).
21 July. NFTs on Tezos.
27 July. Tickets on Tezos (also in French).
4 August. Tezos Amendment Process (also in French).
18 August. Sapling demo of confidential transactions (also in French).
25 August. Formal verification of smart contracts (also in French).
23 September. DeFi on the Tezos blockchain.

See also the list of Tezos-related research publications.

Training

Training is a good thing in itself, and also a key complement to adoption and support.

Training in Q3 2021 proceeded apace, and we are pleased to report that we ran half-a-dozen courses training roughly a hundred people in total from countries all over the world — in European countries like Germany, France, Luxembourg, and Belgium of course; but also India, the USA, Canada, and Vietnam — and together with the Tezos in Africa foundation we trained Tezos developers from Tunisia, Algeria, Senegal, Ghana, and Nigeria (see tweet 1 and tweet 2).

Check out the Nomadic Labs training webpage. If you’d like us to run a training course for you, in either English or French, then you can send us an e-mail at training@nomadic-labs.com.

Technical documentation

Our Technical Documentation team have laboured mightily to annotate and enrich the lives of Tezos developers everywhere by expanding the online documentation for Tezos developers. This includes the following new content:

A tutorial explains GADTs (Generalised Algebraic Data Types) in the context of Tezos. GADTs are a hugely powerful, but complex, OCaml mechanism. The tutorial illustrates example applications of GADTs within Tezos — especially as applied to the Michelson interpreter for ensuring the absence of certain classes of smart contract runtime errors.
A tutorial helps developers add unit tests to legacy code modules, by providing schemes for increasing the testability of existing code.
A series of pages explains the consensus algorithm, customised for each protocol version. As part of this revamp, existing pages on Proof of Stake have been restructured to describe the proof-of-stake mechanism and the associated concept of delegation, independently of the consensus algorithm.
We have enriched and documented event-based logging. This existing feature generalises the classic logging API (which will be deprecated in due course, though not just yet!). The documentation includes a page for developers and a page for users.
We have written structured documentation on releasing a new protocol proposal.
- The main page is a checklist detailing all the steps of this process, including both technical steps and public-relations steps. Two other pages (linked to from the main page) describe specific sub-procedures:
- Protocol freezing provides a guide for removing dead code from older protocols.
- Adding a new environment handles the case when a protocol proposal needs new features from the shell. In this case, a new protocol environment must be created. This page details all the steps involved.
A new tutorial guides developers in writing and executing long-running tests in the Tezt framework. This presents infrastructure for storing results in an InfluxDB database; for visualising the results with Grafana; and for sending alerts when test results differ significantly from the previous tests. Long live tests!
Finally, a new page in the User sections of the documentation gives an overview of the versioning schemes in use in the Tezos ecosystem: including Octez releases, protocol releases, protocol environment numbering, and RPC versions; and we explain which schemes are orthogonal and which are related. We hope you will find this useful.

PhD student, Intern, and apprentice interviews

We are extremely pleased at Nomadic Labs to host interns (stagiaires) and apprentices (apprentis), and to supervise some PhD students in collaboration with the local universities in Paris. In Q2 we introduced a ‘people’ category in our blog, to host interviews with our valued guests.

We are delighted by the variety of interesting and special people with whom we have been able to work. Interviewees in Q3 include:

Killian Delarue, an apprentice working with the Node and Tooling team on a clean, easy and flexible way to run and monitor a Tezos node on your terminal.
Daniel Jean, an apprentice working with the Support team to help corporate users experiment, test, and build on the Tezos blockchain.
Julien Coolen, an intern working with the Shell team on implementing a super-scalable distributed hash table using the Octez peer-to-peer library.
Valentin Chaboche, an intern working with the Verification and Testing teams on lightweight property-based testing through type annotations, for the Octez codebase.
Étienne Marais, an intern working with the Shell team on energy profiling of the Tezos node, with an emphasis on green computing and sustainability.
Mathis Gontier Delaunay, an intern working with the Umami team on new features for the Umami wallet.
Antonio Locascio, an intern working with the Privacy team on automatic extraction of property-based tests from F* specifications.
Tianchi Yu, an intern working with the Michelson team on superoptimisation for the Michelson language.
Corentin Calmels, an intern working with the Adoption team on new business solutions.
Paul Laforgue, a PhD student working with the Verification team on specification and verification of message-passing distributed systems using choreographies.

Media Interviews and Academic Papers

We are delighted to report that:

On 7 July our adoption manager Alexia Martinel was interviewed by BlockStart about Tezos and what it means for France, Belgium, and Luxembourg.
On 9 July the Blockchain Game Alliance interviewed Alexia Martinel, our senior support engineer Florian Pautot, and others on creating a gaming industry on blockchain.
On 23 September Alexia Martinel spoke during the European Blockchain Week (EBCW 2021) in Workshop 4 on Decentralized Digital Identity. You can view the video here.
On 13 August our CEO Michel Mauny was interviewed for the 21Millions newsletter (newsletter link).
On 15 September our head of adoption Hadrien Zerah was interviewed in a Taleo Consulting webinar on The impact of Blockchain on the Fund Industry.
On 11 August Daniel Jean and Charles Dehlinger of Nomadic Labs gave an Introduction to Tezos in the Tezasia hackathon.
Our support engineer Florian Pautot was a judge in the Tezasia hackathon, which ran from 10 August to 6 September.
On 24 September our apprentice Daniel Jean gave a talk on Day 5 of Blockchain Month Malaysia 2021.

We are also delighted to announce that:

Gabbay, Jakobsson, and Sojakova’s paper on the formal FA1.2 ledger standard was accepted to the FMBC 2021 workshop. Well done team GJS!
Conchon, Korneva, Bozman, Iguernlala, and Mebsout’s paper on Formally Documenting Tenderbake was also accepted to the FMBC 2021 workshop. Well done team CKBIM!

We maintain a list of Tezos-related publications; if your paper should be included then please let us know at contact@nomadic-labs.com.

Finally, we are pleased to report on two AMAs, and on a video which we released for World Youth Skills Day on 15 July:

An AMA on /r/Tezos on 13 July.
An AMA on /r/Tezos on 25 August.
We produced 30s of life advice for World Youth Skills Day (15 July) on tips for the next generation of software engineers. The question was: “What are the most important skills to build a career in the Blockchain field?”. The answers included: understanding decentralisation, persistence, rigorous development practices, communication skills, abstract critical thinking, good sense of humour, rigour, strong taste for technology, humble, hard worker, and to be passionate about blockchain technology. It does seem theoretically possible for a single human being to combine all of these skills — and if you do … then please see our careers page!

The following two talks by Arthur Breitman, the co-founder of Tezos, are not part of Nomadic Labs’ activities, but if you are reading this far then they may well be of interest:

On 5 July Arthur Breitman discussed Tezos: Approaches to Scalability.

On 2 August the Paul Barron network interviewed Arthur Breitman about upgradeable blockchains.

NL research seminars and blog posts

Our series of Nomadic Labs research seminars saw the following talks in Q3:

Prototype of a Typical Smart Contract Agency (6 July 2021).
Specifying a Concurrent Queue in Multicore OCaml (17 August 2021).
Then we paused for the summer. We’re French; it’s cultural. The seminars restart in October and you can click here for a full and up-to-date list of talks.

Our blog has been particularly active: check out our in-depth articles for a growing collection of detailed write-ups on our work.

Sponsorship

Nomadic Labs releases all of its software and research freely and openly. Consistent with this philosophy, we play our part in promoting an ecosystem of academic research, by contributing to the organisation of academic conferences and workshops. In particular:

We sponsored the very extensive QONFEST 2021 federation of conferences in Paris (online) in August 2021. QONFEST included four well-known conferences (CONCUR 2021, FMICS 2021, FORMATS 2021, and QEST 2021) and also four satellite workshops (Express/SOS 2021, PM 2021, SNR 2021, and TRENDS 2021). Feel free to browse the conference website to see if there are any publications that might interest you.
We sponsored FMBC 2021, the 3rd International Workshop on Formal Methods for Blockchains in July 2021. You can download the FMBC 2021 preproceedings here (permalink).

Contract calls

Nomadic Labs is part of the Tezos ecosystem, and on this topic it may be worth noting that the number of smart contract calls on Tezos has displayed a striking exponential increase in Q3 2021.

Here are the figures in brief: Tezos turned three years old on 30 June 2021 (see the genesis block, baked on 30 June 2018). In those first three years we had 5 million contract calls. Then we saw another 5 million contract calls in July and August, and then another 5 million contract calls in September alone. This means that Tezos saw twice as much smart contract activity in Q3 2021 as it saw in the entire first three years of its existence, so this quarter does appear to have witnessed a step change in activity.

You can view summary bar charts here. Nomadic Labs is pleased to have played its part in these encouraging and excellent developments.

À la prochaine

Before you go, we’d like to advertise:

The Infrachain summit on 18 November in Luxembourg, in Italy, and online. Infrachain aims to support and foster the European blockchain industry and the Summit is about supporting real business with a gathering dedicated to blockchain topics.
Nomadic Labs will have a booth at the Digital Finance Summit 2021 in Brussels, with an overarching theme of Sustaining the Economic Recovery.

We hope to see you there.

Thanks for reading what we’ve been up to in Quarter 3 of 2021: three months of Nomadic Labs building and testing software and extending public understanding and adoption of blockchain technology. Do check in again for the next Meanwhile for Quarter 4 of 2021.

Computer scientists would call this a deep embedding of Michelson in Coq. ↩
A corporate baker is a corporate institution that sets up one or more block validating nodes on the Tezos blockchain. More information on bakers is here. ↩↩

A Deep Dive into the Octez Prevalidator

2021-10-25T14:00:00+02:00

In this blog post we’ll describe recent work on improving the Tezos Octez prevalidator by making it faster and more resilient, and outline our plans for the future.

A brief map of Tezos, situating the prevalidator
Updating the prevalidator
- How we had to …
- … because of consensus operations
A primer on Tezos Prevalidation
Why the propagation of consensus operations should be as fast as possible
An operation on Tezos (from the shell’s point of view)
Propagation of endorsements arriving too early
Making the prevalidator harder
Testing the prevalidator
The future of the Octez prevalidator

A brief map of Tezos, situating the prevalidator

Let’s outline how the prevalidator fits into the Tezos blockchain workflow:

The Tezos blockchain is a chain of blocks — going right back to the genesis block on 30 June 2018 at just after 5pm.
Each block is a sequence of operations (e.g. “Credit 1 tez from my account to yours”; “credit 5 tez from my account to the supermarket”; “grab this nifty NFT from my favorite marketplace”; …)
Users propose operations, which are hashed, and the hashes are bundled into small packages called mempools, which are gossiped¹ across the Tezos peer-to-peer (p2p) network.
The Tezos prevalidator is a software component that mediates between the peer-to-peer (p2p) network communications layer (which gossips information across the network) and the Tezos economic protocol (which decides which operations are valid and thus potential candidates for inclusion in blocks on the blockchain).
A Tezos baker packages operations up into blocks, and then adds the block to the chain.
The Tezos consensus algorithm does the work of organizing consensus across the network on which blocks get included in the final, official blockchain history.

In a little more detail, the prevalidator does several things:

It scans incoming mempool data for unknown operation hashes, and if it sees one such hash then it may request the full contents of that operation.
It may invoke the economic protocol to assess the prevalidity of operations, and accordingly store them in appropriate reservoirs.
It stores known (but not-yet-included-in-blocks) operations as described below.
It may pass operations back to the p2p layer for distribution, with or possibly without a prevalidity check.
A Tezos baker may dip into the prevalidator’s reservoir of prevalidated operations, looking for operations to bake into new blocks.

To paraphrase Animal Farm: every part of the Tezos blockchain is equally critical, but the prevalidator is more critical than most. So it’s important that the Tezos Octez prevalidator should be correct, efficient, and resilient.

Updating the prevalidator

How we had to …

Since the Octez code went live with the launch of the Tezos mainnet over 3 years ago, we have tended to avoid changing the prevalidator’s code, except for bug-fixes and some code linting.

The prevalidator is a critical component and an error in it could crash the network. It ain’t broke, so we didn’t try to fix it. Nevertheless, while the prevalidator has served us well, it does need to be improved.

A prototype re-implementation of the prevalidator was considered, but we decided it would be safer to make careful, incremental improvements to the existing code base, and since April 2020 we have undertaken significant efforts to modernize its implementation and (just as importantly) to increase the prevalidator’s test coverage.

One of the key functions of the Octez prevalidator is to make sure that consensus operations — endorsements² — are propagated sufficiently quickly and efficiently, even in adverse conditions. This can be subtle because it has to do with the behavior of complex interacting systems, and this turns out to be a rich source of failure modes of the Tezos blockchain overall. Regrettably, the propagation of endorsements has been affected by a number of bugs recently, which became evident “in the wild” after activation of the Granada protocol which reduced the minimal block time from 60s to 30s.

It may seem counter-intuitive that making the time between blocks shorter could make the network overall slower — but this is the joy of designing blockchain systems. We will explain how this works below (a key point is here).

Fixing these bugs has motivated several recent releases of Octez: notably v9.2, v9.4 and v9.7, and to a lesser degree, v9.6.

… because of consensus operations

Propagation of consensus operations will be the particular focus of this article, because it is these operations’ particular interaction with the prevalidator that caused us issues over the summer.

We hope this article will explain what we are doing to get this right, and will shed some technical light on what happened with the Tezos network last summer, specifically during the activation of the Granada protocol. We will also describe current and future endeavors to make the prevalidator Harder, Better, Faster, Stronger!

Or to put it another way: we’ll explain what went wrong, what role the prevalidator played in this, and what we’re doing to update the prevalidator to ensure that this particular failure mode will not be repeated.

A primer on Tezos Prevalidation

The mempool data structure

On the Tezos blockchain, operations are gossiped using a container data structure called mempool, which can roughly³ be described as a collection of operation hashes. These operation hashes are advertised in a single CurrentHead network message which contains

a block header — which is usually the one from the head of the Tezos chain as seen by the sender node when it broadcasts its current head — and
a mempool data structure containing the hashes of some operations that the sender node wants to broadcast.

Initial processing: to `Pending` or not to `Pending`

On receiving this message, a Tezos node running Octez does two things:

it passes the block header to its block validator — the shell component in charge of block validation — and
it passes the mempool to its prevalidator.

The prevalidator processes the incoming mempool, finds any hashes that are unknown to the receiving node, and requests the full content of the corresponding operations from the sender node. This saves bandwidth, since operation hashes are far more compact than full operations.
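To make this step concrete, here is a minimal OCaml sketch. It assumes operation hashes are plain strings and reduces the block header to a placeholder; the names `current_head` and `hashes_to_request` are hypothetical and are not the Octez API.

```ocaml
(* Hypothetical, simplified model: hashes are plain strings. *)
module Hash_set = Set.Make (String)

(* A CurrentHead message: a block header (a placeholder string here) and a
   mempool of operation hashes that the sender wants to broadcast. *)
type current_head = {
  block_header : string;
  mempool : string list;
}

(* Advertised hashes the receiving node does not know yet, without
   duplicates. Only these need a request for their full content. *)
let hashes_to_request ~(known : Hash_set.t) (msg : current_head) : string list =
  List.fold_left
    (fun acc h -> if Hash_set.mem h known then acc else Hash_set.add h acc)
    Hash_set.empty msg.mempool
  |> Hash_set.elements

let () =
  let known = Hash_set.of_list [ "opA"; "opB" ] in
  let msg = { block_header = "BLhead"; mempool = [ "opA"; "opC"; "opD"; "opC" ] } in
  (* Prints opC and opD: opA is already known, opC is requested once. *)
  List.iter print_endline (hashes_to_request ~known msg)
```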

The prevalidator performs some basic checks on a new operation to ensure that it is well-formed according to the currently active economic protocol: for example, it parses the operation, checks that the data is not gibberish, and checks that the operation would pay enough fees. (What it does not do at this stage is call the larger and more expensive apply_operation method of the economic protocol; see the next paragraph.) If the incoming operation passes these basic checks, the prevalidator marks it as Pending.
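A minimal sketch of these cheap checks follows. The `parse` argument stands in for the economic protocol's parser and `min_fee` for a node-configured threshold; both, like the toy `fee:<n>` encoding, are assumptions made for illustration only.

```ocaml
(* Hypothetical outcome of the cheap, initial checks. *)
type precheck_result = Pending | Rejected of string

(* [parse] stands in for the economic protocol's parser: it returns the fee
   the operation would pay, or [None] if the bytes are gibberish.
   [min_fee] is an assumed, node-configured threshold. *)
let precheck ~(parse : string -> int option) ~(min_fee : int) (raw : string) :
    precheck_result =
  match parse raw with
  | None -> Rejected "cannot parse operation"
  | Some fee when fee < min_fee -> Rejected "fee too low"
  | Some _ -> Pending

(* Toy parser for this example only: "fee:<n>" is a well-formed operation. *)
let toy_parse (s : string) : int option =
  match String.split_on_char ':' s with
  | [ "fee"; n ] -> int_of_string_opt n
  | _ -> None

let () =
  List.iter
    (fun raw ->
      match precheck ~parse:toy_parse ~min_fee:100 raw with
      | Pending -> Printf.printf "%s: Pending\n" raw
      | Rejected why -> Printf.printf "%s: rejected (%s)\n" raw why)
    [ "fee:250"; "fee:10"; "gibberish" ]
```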

Further classification of `Pending` operations into reservoirs

Operations that pass these basic checks are collected in a set⁴ called pending (so pending is the set of operations with Pending status); these are next in line to be prevalidated using the apply_operation method of the economic protocol.⁵ To ensure fairness with respect to the other components of the node, pending operations are prevalidated in batches of fixed maximal size.
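The batching step can be sketched as below, assuming, as noted in the footnotes, that pending is a map keyed by operation hash, here simplified to string keys. `take_batch` and `max_batch` are hypothetical names, not Octez identifiers.

```ocaml
(* Pending operations keyed by hash (strings here, for simplicity). *)
module Op_map = Map.Make (String)

(* Split [pending] into a batch of at most [max_batch] operations and the
   remaining pending operations. [Op_map.fold] visits keys in increasing
   order, so with string keys the batch follows lexicographic hash order. *)
let take_batch ~max_batch (pending : 'a Op_map.t) =
  let batch, _ =
    Op_map.fold
      (fun h op (acc, n) ->
        if n < max_batch then ((h, op) :: acc, n + 1) else (acc, n))
      pending ([], 0)
  in
  let batch = List.rev batch in
  let rest = List.fold_left (fun m (h, _) -> Op_map.remove h m) pending batch in
  (batch, rest)

let () =
  let pending =
    List.fold_left (fun m (h, op) -> Op_map.add h op m) Op_map.empty
      [ ("opC", "transfer"); ("opA", "endorsement"); ("opB", "reveal") ]
  in
  let batch, rest = take_batch ~max_batch:2 pending in
  List.iter (fun (h, _) -> Printf.printf "batch: %s\n" h) batch;
  Printf.printf "still pending: %d\n" (Op_map.cardinal rest)
```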

The result returned by apply_operation enables the prevalidator to classify each operation in a batch as follows:⁶

Applied: the operation can be applied in the current context (meaning: the ledger state, as seen by the node in question). However, the application of the operation might still fail later, in the event a baker decides to include the operation in a block.
Branch delayed: the operation cannot be included in the next head of the chain, but it could be included in a descendant. Example: a manager operation⁷ with a counter value in the future of the expected one.
Branch refused: the operation cannot be included in the next head of the chain, nor in a successor, but it might be applied on a different branch if a reorganization happens. Example: a manager operation whose counter value is in the past of the current counter.
Refused: the operation is impossible and should be rejected. It is not valid, and there is no alternative chain in which it might be. Example: an operation with an invalid signature.

The prevalidator stores the operations in different reservoirs⁸ according to this classification.
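As an illustration of the classification, here is a toy OCaml model for manager operations that uses only the counter and signature examples given above. In the real prevalidator the classification is derived from the result returned by apply_operation; the `manager_op` type and `classify` function are hypothetical, and we assume the expected counter is the account's current counter plus one.

```ocaml
(* The four classifications listed above. *)
type classification = Applied | Branch_delayed | Branch_refused | Refused

(* Toy manager operation: just a counter and whether its signature checks
   out. Hypothetical, for illustration only. *)
type manager_op = { counter : int; signature_ok : bool }

(* [expected] is the counter value the next operation of this manager must
   carry in the current context (assumed: current counter + 1). *)
let classify ~expected (op : manager_op) : classification =
  if not op.signature_ok then Refused (* invalid on every branch *)
  else if op.counter = expected then Applied (* fits the next block *)
  else if op.counter > expected then Branch_delayed (* counter in the future *)
  else Branch_refused (* counter in the past *)

let () =
  let show = function
    | Applied -> "Applied"
    | Branch_delayed -> "Branch delayed"
    | Branch_refused -> "Branch refused"
    | Refused -> "Refused"
  in
  List.iter
    (fun op -> print_endline (show (classify ~expected:5 op)))
    [ { counter = 5; signature_ok = true };
      { counter = 7; signature_ok = true };
      { counter = 3; signature_ok = true };
      { counter = 5; signature_ok = false } ]
```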

Flushing the reservoirs

Regularly (in normal operation of the current economic protocol, every 30 seconds) the node’s chain changes, for example when a new block is added, and the block validator updates its head block. Since the current block state has changed, the prevalidity of operations, and hence their classification in the prevalidator, may be outdated and need to be revised. The node therefore triggers the prevalidator to perform a flush event (also called a recycling event) in which:

each operation classified as Applied or Branch delayed and not yet included in a block is reverted to a Pending state, and
each operation classified as Branch refused is reverted to a Pending state, provided that the new head is not a direct successor of the previous one; that is, if the node switched branches — otherwise they remain classified as Branch refused.

These newly pending operations should then be re-prevalidated.⁹
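The flush rules can be sketched as a pure function over (hash, status) pairs. This is an illustrative model, not Octez code: `flush`, `included`, and `switched_branch` are hypothetical names, and we assume that operations already included in the new chain are simply dropped.

```ocaml
type classification = Applied | Branch_delayed | Branch_refused | Refused

(* Status of an operation in the prevalidator. *)
type status = Pending | Classified of classification

(* Hypothetical flush. [included h] says whether operation [h] is now in a
   block of the new chain (such operations are dropped); [switched_branch]
   is true when the new head is not a direct successor of the old one. *)
let flush ~switched_branch ~(included : string -> bool)
    (ops : (string * status) list) : (string * status) list =
  List.filter_map
    (fun (h, st) ->
      if included h then None
      else
        match st with
        | Classified (Applied | Branch_delayed) -> Some (h, Pending)
        | Classified Branch_refused when switched_branch -> Some (h, Pending)
        | Pending | Classified (Branch_refused | Refused) -> Some (h, st))
    ops

let () =
  let ops =
    [ ("op1", Classified Applied); ("op2", Classified Branch_refused);
      ("op3", Classified Refused); ("op4", Classified Applied) ]
  in
  let included h = h = "op4" in
  let after = flush ~switched_branch:false ~included ops in
  Printf.printf "%d operations kept, %d back to Pending\n"
    (List.length after)
    (List.length (List.filter (fun (_, st) -> st = Pending) after))
```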

And the cycle repeats …

Our description so far has been a high-level, somewhat idealized account of the prevalidator’s execution logic. It omits some details, including protections and counter-measures against DDoS attacks from malicious peers. One of these protections will feature in what follows, because it impacts the propagation of consensus operations and will turn out to be fundamental to our story.

Why the propagation of consensus operations should be as fast as possible

The Tezos protocol has relied to date on Proof of Stake consensus algorithms from the Emmy family. The currently-active consensus protocol is Emmy*, which has been running on-chain since the activation of the Granada protocol (at block 1,589,248 on 6 August 2021).

Consensus algorithms à la Emmy come with a special kind of operation called an endorsement, whose purpose is to facilitate reaching consensus faster. This blog post digs deeper into how consensus algorithms work and the role endorsements play in effectively securing the Tezos blockchain. The details relevant to us here are:

An endorsement operation included in a block at level $n+1$¹⁰ endorses the predecessor block at level $n$. Hence, a baker cannot include an endorsement in its block without having seen and validated the predecessor block at level $n$.
The minimal block delay (the protocol-mandated time gap between two consecutive blocks at levels $n$ and $n + 1$)¹¹ depends on the priority of the block at level $n + 1$ and on the number of endorsements it includes. With Emmy*, the time between two blocks is $30$ seconds for a priority $0$ block that contains at least $60\%$ of all possible $256$ endorsements, that is $154$ endorsements. Below that threshold, a priority $0$ block must wait a base delay of $60$ seconds plus $4$ seconds for each endorsement short of $192$. With (say) $150$ endorsements, the delay escalates to $60 + 4\cdot (192-150) = 228$ seconds, nearly $4$ minutes!
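The arithmetic above can be written as a small function for a priority 0 block. The constants are the ones quoted in the text; the function and constant names are ours and purely illustrative, and the sketch ignores non-zero priorities.

```ocaml
(* Constants quoted in the text; names are ours, for illustration. *)
let endorsers_per_block = 256
let fast_delay = 30 (* seconds, priority 0 with >= 60% endorsements *)
let base_delay = 60 (* seconds, priority 0 otherwise *)
let reference_endorsements = 192
let delay_per_missing = 4 (* seconds per endorsement short of 192 *)

(* Minimal block delay, in seconds, for a priority 0 block. *)
let priority0_delay ~endorsements =
  (* "At least 60% of 256" means at least 153.6, i.e. 154 endorsements;
     comparing 5e >= 3 * 256 avoids floating point. *)
  if 5 * endorsements >= 3 * endorsers_per_block then fast_delay
  else base_delay + (delay_per_missing * max 0 (reference_endorsements - endorsements))

let () =
  (* Prints 30s, 216s and 228s respectively. *)
  List.iter
    (fun e -> Printf.printf "%d endorsements -> %ds\n" e (priority0_delay ~endorsements:e))
    [ 154; 153; 150 ]
```

The cliff between 154 and 153 endorsements is the point of the example: a single missing endorsement more than septuples the delay.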

Thus including more endorsements in a block means

a shorter minimal block delay (i.e. the block can be baked earlier), and so
a greater chance that this block (rather than any competing blocks from other block producers) will get included in the final blockchain history, and also
higher block rewards paid to the baker if the block gets included in the final blockchain history.

If endorsements are validated and propagated quickly across the network, from endorsers to bakers, then blocks can be produced with optimal speed. Conversely, endorsements must arrive promptly relative to the minimal block delay: if they consistently fail to arrive in time, then by design the consensus algorithm imposes substantially longer block delays.

This is how reducing the minimal block delay can make the network slower: endorsements start arriving “late” relative to the new, shorter delay. So as we reduce the minimal block delay in search of greater speed and efficiency, it becomes increasingly important to expedite the passage of endorsements across the network. This in turn requires us to tweak the prevalidator to recognize and prioritize endorsement operations, and (such is the reality of programming complex systems) this requires careful and precise work, which we now discuss.

An operation on Tezos (from the shell’s point of view)

The Tezos architecture distinguishes between

the (economic) protocol, and
the shell.

This is summed up in a well-known octopus diagram in the Tezos architecture description.¹²

The economic protocol defines the blockchain protocol’s execution logic (consensus algorithms, what kinds of operations are available, smart contracts, …). This can be updated and is subject to self-amendment.
The shell handles the lower level, more mundane tasks: p2p gossiping, context storage, maintaining a distributed database of known blocks and operations. And — key to our story today — the shell triggers the validation of blocks and the prevalidation of operations, by choosing blocks and operations and applying appropriate validation functions from the economic protocol.

The prevalidator is a component of the shell. It makes calls to functions from the economic protocol to help it decide whether operations are valid — but the prevalidator itself is agnostic about precisely how the economic protocol decides.

This agnosticism implies that the Octez suite must implement the shell parametrically over the economic protocol and in such a way that it can deal seamlessly with updates to the protocol. One particular design consequence of this is that the shell must take a rather high-level and generic view of the internal content of certain data structures like operations and blocks, so that there will be an abstract and generic view for the shell, and then a specific view for each specific economic protocol.¹³ Thus, from the shell’s perspective, we have roughly the following (permalink):

```ocaml
(* Block_hash is an Octez module, not shown here. *)

(* The branch of an operation is the block hash of the block upon which
   the operation was forged. *)
type shell_header = {branch : Block_hash.t}

(* An operation from the shell point of view is a branch and a list of bytes.
   Only the economic protocol knows how to interpret those bytes. *)
type t = {shell : shell_header; proto : Bytes.t}
```

The branch of an operation is then a block hash. It determines the lifespan of an operation: if the block hash is too old, the operation can be discarded. The current lifespan on Emmy* is $120$ blocks,¹⁴ which (assuming a normal 30s per block) translates to about one human hour.

For the prevalidator, the branch field is also used as an anti-spam filter: if the block hash is unknown, then the operation can be safely discarded. In this context, unknown means “a block that has not been validated yet”.
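A minimal sketch of the lifespan and anti-spam check, reusing the shell view of an operation shown above but with block hashes simplified to strings (an assumption). Here `live_blocks` stands for the node's set of recent, validated block hashes, and `branch_is_live` is a hypothetical helper.

```ocaml
module Hash_set = Set.Make (String)

(* Shell view of an operation, as above, with Block_hash.t simplified to
   string for this sketch. *)
type shell_header = { branch : string }
type operation = { shell : shell_header; proto : Bytes.t }

let max_op_ttl = 120 (* blocks, the lifespan under Emmy* *)
let block_time = 30 (* seconds, under normal conditions *)

(* Keep an operation only if its branch is one of the recent blocks the node
   has validated; an unknown or too-old branch means it can be dropped. *)
let branch_is_live ~(live_blocks : Hash_set.t) (op : operation) : bool =
  Hash_set.mem op.shell.branch live_blocks

let () =
  let live_blocks = Hash_set.of_list [ "BLrecent1"; "BLrecent2" ] in
  let op = { shell = { branch = "BLunknown" }; proto = Bytes.of_string "\x00" } in
  (* Prints "keep: false; lifespan: about 60 minutes". *)
  Printf.printf "keep: %b; lifespan: about %d minutes\n"
    (branch_is_live ~live_blocks op) (max_op_ttl * block_time / 60)
```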

The proto field is then — in the prevalidator’s eyes — a mere list of Bytes. The economic protocol has the key to unlock the secret of whether that operation is an endorsement or something else. And this detail is central to the plot of the next section.

Propagation of endorsements arriving too early

Granada (the currently active economic protocol) and the Hangzhou protocol proposal both require that the branch field of an endorsement operation point to the block being endorsed. This requirement stems from a pre-existing legacy format for endorsements,¹⁵ but it can hinder the propagation of endorsements by the prevalidator in the following corner case:

Suppose that the node receives a new CurrentHead message from the network, consisting of a block header and a mempool. Suppose further that this block header is not yet known to the receiving node, and that the mempool includes endorsements of that very block. In this corner case, the prevalidation of the received mempool may complete before the validation of the received block, even if only by a second or two.¹⁶ The block header only needs to be validated once, when it is first seen, but this is that first time. Any endorsements for that block contained in the mempool message will then be discarded and not propagated, since the branch of each such endorsement is precisely the not-quite-yet-validated block.

Other kinds¹⁷ of operations, like manager operations, avoid this problem by not branching on the immediate predecessor, but rather on an ancestor block three or four levels behind.

Addressing the race between the block validator and the prevalidator, which occurs when a new block header and an endorsement for that same block arrive simultaneously, required small patches to both the shell and the economic protocol:

On the shell side, the prevalidator now asks the economic protocol whether the freshly received operation is a valid endorsement before checking whether the operation was branched on a known block, that is, before checking whether the block targeted by the endorsement's branch field is a member of the set of recent live_blocks known to the node. If the operation is a valid endorsement whose branch is not yet known, the prevalidator classifies it as Branch refused, and triggers the advertisement of its current head and of a mempool including the newly found, not-yet-validated endorsement.
On the protocol side, the apply_operation function now checks whether the signature of an endorsement has been verified, before checking whether its branch is known. This avoids propagating endorsements with bad signatures (i.e. it avoids sending spam).
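The new order of checks can be modelled as follows. This is a hypothetical sketch: the `op` record pretends the shell already knows whether an operation is an endorsement and whether its signature verifies, whereas in Octez only the economic protocol can answer those questions; `decide`, `is_live`, and `apply` are illustrative names.

```ocaml
type classification = Applied | Branch_delayed | Branch_refused | Refused

(* Hypothetical decoded operation. In Octez only the economic protocol can
   tell whether the bytes are an endorsement and whether the signature holds. *)
type op = { branch : string; is_endorsement : bool; signature_ok : bool }

type decision =
  | Drop (* discarded, not propagated *)
  | Keep of classification * bool (* classification, advertise right away? *)

(* [is_live b]: is block [b] already validated and recent?
   [apply]: stand-in for prevalidation with apply_operation. *)
let decide ~(is_live : string -> bool) ~(apply : op -> classification) (o : op) =
  match (is_live o.branch, o.is_endorsement) with
  | true, _ -> Keep (apply o, false)
  (* Endorsement of a block not validated yet: the signature is checked
     first so that spam is not propagated; a valid one is kept as Branch
     refused and advertised together with the current head. *)
  | false, true when o.signature_ok -> Keep (Branch_refused, true)
  (* Unknown branch for anything else, or a bad signature: drop it. *)
  | false, _ -> Drop

let () =
  let is_live b = b = "BLvalidated" in
  let apply _ = Applied in
  let early = { branch = "BLjust_received"; is_endorsement = true; signature_ok = true } in
  match decide ~is_live ~apply early with
  | Keep (Branch_refused, true) -> print_endline "early endorsement kept and advertised"
  | _ -> print_endline "unexpected"
```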

These changes were both included in Octez v9.2, which in addition to the new version of the node also included the “protocol snapshot” for Granada, effectively advertising to the network (and to bakers in particular) that there was a protocol candidate which could be injected.

After the release of Octez v9.2, fewer endorsements were missed. However, the small patches mentioned above created two unforeseen issues:¹⁸

The network could propagate endorsements that were too old and were already included in previous blocks (more than $60$ blocks ago by then, $120$ nowadays¹⁴). This is wasteful, as these operations would be Refused by the receiving peer’s prevalidator.
The execution cost of checking the signature of an endorsement whose level is not the expected one is significantly higher than that of checking the signature of a well-branched endorsement (meaning one included in the successor block to its target). This is because the economic protocol only caches the public keys of endorsers for the current level; checking other endorsers' signatures requires a hard-drive read, which can be slow.

The first issue was fixed in v9.4, and the second issue — which is what caused nodes to slow down when Granada was activated on August 6th 2021 at 09:36 UTC — was fixed with v9.7.¹⁸

The idea behind the fixes is this: in Octez, there are filters specific to an economic protocol which can classify an operation based on its content. We tweaked those filters to only consider endorsements which are at the appropriate level.
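A sketch of such a level filter is below. What counts as "the appropriate level" is an assumption on our part: for illustration we accept the level of the current head and the level just after it (an endorsement that arrives before its block has been validated). The names are hypothetical.

```ocaml
(* Hypothetical endorsement view: the level it endorses, and its hash. *)
type endorsement = { level : int; hash : string }

(* Assumption for illustration: "appropriate" means endorsing the current
   head, or the block just after it (received before it was validated). *)
let endorsement_level_ok ~head_level level =
  level = head_level || level = head_level + 1

(* Apply the cheap level check before any expensive signature check. *)
let filter_endorsements ~head_level (es : endorsement list) =
  List.filter (fun e -> endorsement_level_ok ~head_level e.level) es

let () =
  let es =
    [ { level = 100; hash = "e1" }; { level = 101; hash = "e2" };
      { level = 40; hash = "old" } ]
  in
  (* Prints e1 and e2; the old endorsement never reaches signature checking. *)
  filter_endorsements ~head_level:100 es
  |> List.iter (fun e -> print_endline e.hash)
```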

In order to prevent such issues from recurring, we have structured our work on the prevalidator around answering three questions:

How can we test the prevalidator beyond integration tests?
How can we reliably benchmark the prevalidator?
How can we make the prevalidator faster?

We tackle test coverage first, then benchmarking, then speed: test coverage catches functional regressions, benchmarking lets us profile and measure speed, and with both in place we can measure performance and change code more safely to improve it.

Thus during the summer of 2021 (and learning from the experience with the activation of Granada) we focused on the first task above, which we call hardening the prevalidator — or to fit the story, making it harder. We aim to tackle the remaining two tasks during autumn 2021.

Making the prevalidator harder

The prevalidator is part of the shell as we mentioned above — but it calls functions from the economic protocol to parse operations and validate them, so that it can decide whether to propagate them. This interaction between the shell and the economic protocol, and how this separation is implemented in the codebase, is fruitful but it does make the prevalidator tricky to test — especially with unit tests.

Hence, the first stage of our work was to isolate the interface of the prevalidator with the economic protocol, by refactoring the prevalidator to separate

those components and those parts of the execution logic that just have to do with the shell, from
those components that are in the shell but interact directly with the economic protocol.

This refactoring enabled us to identify and fix a number of minor bugs, including several memory leaks and some corner cases where valid operations were not being propagated.

This was nice, but the main achievement of this work was to isolate the core of the prevalidator: a software component, reified as an OCaml module, which classifies operations correctly.¹⁹

Testing the prevalidator

Any sufficiently large software project should employ multiple testing frameworks to provide increased confidence in the implementation. This is especially true for a complex project like the Tezos Octez implementation, each of whose different components — from the lower-level grit of the distributed network to the higher-level Michelson interpreter — is an unruly beast of software in its own right, just waiting to spring bugs and regressions on the unsuspecting software engineer.

Of the many testing frameworks available to the working Tezos dev,²⁰ two are particularly relevant to our story here:

Unit and property-based tests for the prevalidator using, respectively, the Alcotest and QCheck OCaml testing libraries.
Integration tests using Tezt, a custom-made testing framework for system and integration tests for Tezos, focusing on the interaction between Octez nodes and clients.

Prior to April 2021 there were few unit tests for the prevalidator, because of the technical issues described above. Our refactoring allowed us to be more rigorous and to add many unit tests for the prevalidator, enabling us to detect and fix legacy bugs. Most of the new unit tests are moreover property-based tests: they not only assert the prevalidator's invariants, but also check that the root causes of certain bugs are no longer present.

An example of such a property is the following:

At any given time, an operation cannot have two different classifications.

For example, an operation cannot be both Applied and Branch refused at the same time.
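Such a property can be checked with a randomized test. The sketch below uses Stdlib `Random` rather than QCheck so that it stays self-contained; the reservoir store, `add`, and `invariant` are hypothetical stand-ins for the real classification module.

```ocaml
type classification = Applied | Branch_delayed | Branch_refused | Refused

let all = [ Applied; Branch_delayed; Branch_refused; Refused ]

(* Hypothetical store: one "reservoir" (hash table of operation hashes)
   per classification. *)
let make () : (classification * (string, unit) Hashtbl.t) list =
  List.map (fun c -> (c, Hashtbl.create 16)) all

(* Classifying an operation first removes it from every reservoir, so it
   can never sit in two reservoirs at once. *)
let add store c h =
  List.iter (fun (_, tbl) -> Hashtbl.remove tbl h) store;
  Hashtbl.replace (List.assoc c store) h ()

(* The property: each hash appears in at most one reservoir. *)
let invariant store hashes =
  List.for_all
    (fun h -> List.length (List.filter (fun (_, tbl) -> Hashtbl.mem tbl h) store) <= 1)
    hashes

(* Randomised check with Stdlib [Random] in place of QCheck. *)
let () =
  Random.init 42;
  let hashes = List.init 10 (fun i -> "op" ^ string_of_int i) in
  for _trial = 1 to 100 do
    let store = make () in
    for _step = 1 to 50 do
      add store (List.nth all (Random.int 4)) (List.nth hashes (Random.int 10))
    done;
    assert (invariant store hashes)
  done;
  print_endline "invariant held on all trials"
```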

In summary we have added:

18 property-based tests asserting correctness properties of the classification of operations, and
12 integration tests covering different scenarios in the prevalidator’s life-cycle.

Among the latter, one important integration test asserts that consensus operations which arrive too early are still propagated by the node.

The future of the Octez prevalidator

After making the prevalidator harder, the next step is to make it better. To this end we aim to enhance the prevalidator, working towards two main objectives:

We aim to minimize the elapsed time between when an operation is received (either injected via RPC from a client, or gossiped from the network), and when we propagate the operation to our peers.
We aim to prioritize the handling of pending operations to give more priority to pending consensus operations, and thus prevalidate them as quickly as possible (the current prevalidator treats all pending operations equally).
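One way the second objective could look is a priority ordering over pending operations. This is a speculative sketch, not the planned design: the `kind` type, the ranking (consensus first, then manager operations by decreasing fee, then everything else), and the `capacity` bound are all assumptions.

```ocaml
(* Hypothetical kinds of pending operations. *)
type kind = Consensus | Manager of int (* fee, in mutez *) | Other

(* Smaller rank = handled earlier: consensus first, then manager operations
   by decreasing fee, then everything else. *)
let rank = function
  | Consensus -> (0, 0)
  | Manager fee -> (1, -fee)
  | Other -> (2, 0)

let prioritize (pending : (string * kind) list) =
  List.stable_sort (fun (_, a) (_, b) -> compare (rank a) (rank b)) pending

(* Keep at most [capacity] operations, discarding the lowest-priority ones. *)
let bounded ~capacity pending =
  List.filteri (fun i _ -> i < capacity) (prioritize pending)

let () =
  let ops = [ ("m1", Manager 100); ("e1", Consensus); ("x1", Other); ("m2", Manager 500) ] in
  (* Prints e1, m2, m1: the low-priority x1 is discarded. *)
  bounded ~capacity:3 ops |> List.iter (fun (h, _) -> print_endline h)
```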

The first objective will help improve operation throughput — increasing the TPS metric (transactions per second). This will require non-trivial changes to the economic protocol, which we hope to include in the forthcoming protocol proposal (for protocol I).

The second objective will empower the node to discard operations sooner, if their priority is too low. This will help ensure that critical operations (e.g. endorsements, and consensus operations in general, and also high-fee operations) will be retained by the prevalidator and propagated more effectively.

Looking further into the future, making Octez scale to cope with the ongoing increase in network traffic will require ever more efficient validation and propagation time for operations and blocks, and this will require a comprehensive effort across the Octez architecture.

Our work so far has created a harder prevalidator, as a foundation to build a better one. Faster and stronger will follow.²¹ Stay tuned!

¹ Gossip here is a technical p2p term for “(unsolicited) broadcast to a subset of known peers”. Gossiping contributes to global network coverage of knowledge, like an office rumor (or an epidemic).
² Tezos is a Proof of Stake blockchain which requires participants to vote for block candidates. These votes, called endorsements, are included in blocks and recorded on-chain, just as “regular” transactions are. We expand on this below.
³ The current implementation (permalink) is a record with two fields: a known_valid list of operation hashes, presumably prevalidated by the node; and a pending set consisting of operations of unknown prevalidity, or invalid-yet-safe-to-broadcast operations.
⁴ We mean set here in the mathematical sense: an unordered collection with unique members. The current specification does not impose an order on the set of pending operations. However, since the implementation uses pending : Operation.t Operation_hash.Map.t, in practice iteration over pending follows the lexicographical order on operation hashes.
⁵ In Tezos we refer to this phase as operation prevalidation, to distinguish it from the validation of operations done by the baker when producing a new block, and by a node’s (block) validator when receiving a new block. The entry point to the economic protocol is the same in both cases: the apply_operation method from the protocol API.
⁶ The apply_operation function is so called because it checks whether it can apply the operation to the current ledger state (as seen by the node on which it is running); in Tezos jargon, this ledger state is called the (blockchain) context. If apply_operation succeeds, it returns an Ok value; if it cannot apply the operation, it returns an Error value of a certain kind. This returned value enables the prevalidator to further classify each operation depending on whether it could be included in the next head of the chain, on the same branch later on, possibly on some other branch, or never.
⁷ Manager operations include transfers, smart contract originations, calls to smart contracts, etc. In short, any fee-paying operation in competition for block space. Manager operations have a counter associated (unsurprisingly) with their manager, which is the Tezos account that signs the operation and pays the fees.
⁸ The pun with “mempool” is deliberate. Readers diving into this text with a low tolerance for wordplay can fish out the word “reservoir” and splash in the word “container” instead, and they will get on swimmingly.
⁹ “Reprevalidated”: this blog post’s contribution to the English language.
¹⁰ The level of a block is its distance from the genesis block, which is at level 0.
¹¹ The minimal block delay is defined as a function of a block’s priority and the number of included endorsements. See the full definition in the documentation.
¹² This is also the motif in a limited series of coveted laptop stickers.
¹³ You can read this article for a holistic account of the life cycle of operations, including the perspective from inside the economic protocol.
¹⁴ The constant regulating this behavior is called max_op_ttl (“maximal operation time-to-live”). It was bumped from $60$ to $120$ when Granada was activated, to compensate for the halving of the minimal block time and thus keep the operation lifespan (approximately) constant.
¹⁵ In it, the proto field contained only the level endorsed. The rationale behind this design was to save space, as endorsements must be included in the block immediately following their target. More compact endorsements mean more operations inside blocks, and that seemed a clever win at the time.
¹⁶ You may be wondering how the block validator can validate a block if it just has its header. The trick is that the block_header carries a lot of information (unlike the branch of operations), including the hash of the resulting context and the hash of the Merkle tree of operations. By design, a block header and the full contents of its operations are all the information that the block validator actually needs. More on this in the documentation for the validation subsystem.
¹⁷ From genesis to the latest Tezos protocol there have been four kinds (or classes) of operations: consensus (e.g., endorsements), voting (e.g., protocol injection and ballots), anonymous (e.g., double-baking or double-endorsement accusations), and the manager operations described above.⁷
¹⁸ A brief chronology is as follows: Octez v9.2 with the aforementioned patches and the advertisement of Granada; then v9.3 and v9.4; then Granada activation on August 6th; then Octez v9.5, v9.6, and v9.7.
¹⁹ That is, we isolated the core of the execution logic of the prevalidator, and reified this as an OCaml module called prevalidator_classification. Technically this is just a data structure, albeit one containing an API, functions, and internal data structures. OCaml is an object-oriented functional programming language in which modules are first-class, so this kind of encapsulation is practical and all in a good day’s work. If you’re reading this and it blows your mind, but in a good way, then feel free to be in touch to discuss employment opportunities.
²⁰ More details can be found on Testing in Tezos, the entry point in the Tezos Developer Documentation for all things tests.
²¹ If you liked this blog post you may also like a previous blog post on optimizing gas consumption. It describes a distinct set of optimizations in a distinct part of the Tezos universe, but it is interesting to note a resemblance in the overall workflow, described there as “make it work, make it right, make it fast”. Thank you for reading.

Introducing a new Storage Backend for Octez

2021-10-18T16:00:00+02:00

A blockchain is a chain of blocks, where a block is a blob of data holding relevant transactions and other data. This means that Octez — the Tezos blockchain implementation to which Nomadic Labs contributes — should optimize how it handles this particular data structure.

The recently-released Version 10 of Octez introduces a new storage backend for Octez, with new features and performance improvements (referenced as v0.0.6 in the Changelog).

In this blog post we will outline what these optimizations are, and how they work.

Octez node’s store in a nutshell

The storage layer of an Octez node has two parts:

the context, containing the ledger state as a git-like versioned repository, and
the store, containing the blocks, operations and protocols of the chain.

Our Version 10 update is to the store part; context storage is unchanged.

The previous implementation relied on the generic key-value database LMDB. This works, but because it is generic it cannot accommodate optimizations that (in the Tezos use-case) would increase performance.

Therefore, we implemented a bespoke storage layer to optimize read/write speed, memory usage, storage size, and concurrent access.

The key insight

LMDB introduces a global lock when writing new data to the store. It has to because (in the general case) it cannot know how a write might affect the database.

Global locks are expensive, and in the Tezos use-case they are mostly unnecessary. Intuitively, a blockchain consists mostly of immutable data (except for forks on recent blocks, discussed below), so fresh writes cannot affect data that is old enough, and only a lightweight lock on some concurrent read accesses is required.

We need a little background before we can flesh out the particulars of how to turn this insight into an optimized implementation:

Some background on time and consensus

In Tezos, time is split into cycles, where currently 1 cycle = 8192 blocks ≈ 68 hours.¹
Tezos reaches global consensus about the state of the chain in a few levels, under normal network conditions.² To have a very generous safety margin, Tezos considers that there cannot exist a fork that is longer than 5 cycles ≈ 14 days.
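The durations above follow from the 30s block time noted in the footnotes. A quick check of the arithmetic, with constant names of our own choosing:

```ocaml
let block_time_s = 30
let blocks_per_cycle = 8192
let no_fork_cycles = 5

let () =
  let cycle_s = blocks_per_cycle * block_time_s in
  (* 8192 * 30 = 245760 s, i.e. about 68.27 hours *)
  Printf.printf "1 cycle  = %d s = %.2f h\n" cycle_s (float_of_int cycle_s /. 3600.);
  (* 5 cycles = 1228800 s, i.e. about 14.22 days *)
  Printf.printf "%d cycles = %.2f days\n" no_fork_cycles
    (float_of_int (no_fork_cycles * cycle_s) /. 86400.)
```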

Thus the blockchain can be treated conceptually as consisting of

a long chain of immutable (cemented) blocks which come before the no fork point, and for which consensus is certain, and
a recent and shorter chain of possibly mutable blocks which come strictly after the no fork point, for which consensus (finality) is uncertain.

We illustrate this with a graphic:

This new structure allows us to optimize the store, as we discuss next.

The new store structure

Cemented store

The cemented store manages cemented cycles (left of the no fork point illustrated above), which contain cemented blocks. A cemented cycle is a contiguous list of blocks that matches the economic protocol's notion of a cycle. Cemented blocks are immutable, so they allow a fast ‘cemented look-up’:

We assume finality for cemented blocks, so history is canonical in the sense that every node has the same cemented storage representation.
The block metadata of each cemented block can be compressed, to reduce storage size.³
We maintain two indexes that allow for rapid block accesses, increasing the overall process performance: a hash -> level index, and its converse level -> hash. These are cheap to implement and maintain, because we know they need not be updated once they have been populated.
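A minimal in-memory sketch of the two indexes, assuming block hashes are strings. Because cemented blocks are contiguous and never change, the level to hash direction can be a plain array indexed by level offset; the record and function names are hypothetical.

```ocaml
(* Hypothetical in-memory model of the two cemented-store indexes. *)
type cemented_index = {
  hash_to_level : (string, int) Hashtbl.t;
  level_to_hash : string array; (* slot i holds the block at first_level + i *)
  first_level : int;
}

(* Built once from a contiguous run of cemented blocks, oldest first; never
   updated afterwards, since cemented blocks are immutable. *)
let build ~first_level (hashes : string list) : cemented_index =
  let level_to_hash = Array.of_list hashes in
  let hash_to_level = Hashtbl.create (Array.length level_to_hash) in
  Array.iteri (fun i h -> Hashtbl.replace hash_to_level h (first_level + i)) level_to_hash;
  { hash_to_level; level_to_hash; first_level }

let level_of_hash idx h = Hashtbl.find_opt idx.hash_to_level h

let hash_of_level idx level =
  let i = level - idx.first_level in
  if i >= 0 && i < Array.length idx.level_to_hash then Some idx.level_to_hash.(i)
  else None

let () =
  let idx = build ~first_level:8192 [ "BLa"; "BLb"; "BLc" ] in
  match (level_of_hash idx "BLb", hash_of_level idx 8194) with
  | Some l, Some h -> Printf.printf "BLb is at level %d; level 8194 is %s\n" l h
  | _ -> print_endline "not found"
```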

Floating store

The floating store splits blocks further into two sub-stores:

a Read Only (RO) floating store of blocks, and
a Read Write (RW) floating store.

The RO store is read-only; no blocks are added, removed, or updated. The RW store is append-only; no blocks are removed or updated, but blocks may be added.

A quick summary of why we have this distinction:

The RO store contains any blocks left over from the floating store of the previous merge procedure that were neither cemented nor trimmed. Details are below.
The RW store contains any blocks that have arrived since the previous merge procedure was initiated.
The floating store (RO+RW) contains blocks which are candidates for reorganization or trimming. It may contain forks and consensus is not final.

We now come to the merge procedure, whereby blocks in RO+RW are examined, organized, trimmed, and (where appropriate) shifted from the floating to the cemented store:

The merge procedure

When shifting the no fork point forward,⁴ we move some blocks from the floating store to the cemented store using a merge procedure, as follows:

Initiate the merge procedure by locking RW so that no further appends are accepted. To ensure liveness during and after the merge procedure, create an empty RW’ store to hold blocks as they continue to arrive.
Retrieve the blocks between the previous no fork point (Previous NFP) and the new no fork point (New NFP), starting from the head. This corresponds to the cycle(s) that we might cement.
Trim branches in the selected cycle(s) to those that remain valid according to the consensus algorithm.
Cement the blocks for which we have consensus and place them in a new cemented cycle.
Combine any leftover blocks — meaning blocks in RO+RW that were not deleted by the trimming procedure, but for which consensus is uncertain so that they have not made it into the cemented store — into a new RO’ store.
Promote the RO’ and RW’ as the new RO and RW stores.
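The merge steps can be sketched on an in-memory list of blocks. This simplified model is ours: it walks back from the head to find the chain consensus follows, cements blocks on that chain up to the new no fork point, trims other blocks at or below it, and keeps the rest as RO’. It does not model RW’ or the finer trimming rules of the consensus algorithm, and all names are hypothetical.

```ocaml
type block = { hash : string; level : int; pred : string }

(* Hypothetical merge over RO+RW, given the current head and the level of
   the new no fork point. Returns (cemented blocks, new RO' store). *)
let merge ~(head : string) ~(new_nfp : int) (blocks : block list) =
  let by_hash = Hashtbl.create 64 in
  List.iter (fun b -> Hashtbl.replace by_hash b.hash b) blocks;
  (* Walk back from the head to collect the chain that consensus follows. *)
  let rec walk h acc =
    match Hashtbl.find_opt by_hash h with
    | None -> acc
    | Some b -> walk b.pred (b.hash :: acc)
  in
  let canonical = walk head [] in
  let on_chain b = List.mem b.hash canonical in
  (* Blocks at or below the new NFP on the head's chain are cemented;
     other blocks at or below it sit on dead branches and are trimmed. *)
  let cemented =
    List.filter (fun b -> b.level <= new_nfp && on_chain b) blocks
    |> List.sort (fun a b -> compare a.level b.level)
  in
  (* Everything above the new NFP is still uncertain and goes to RO'. *)
  let ro' = List.filter (fun b -> b.level > new_nfp) blocks in
  (cemented, ro')

let () =
  let blocks =
    [ { hash = "a"; level = 1; pred = "genesis" };
      { hash = "b"; level = 2; pred = "a" };
      { hash = "b'"; level = 2; pred = "a" } (* a dead fork *);
      { hash = "c"; level = 3; pred = "b" };
      { hash = "d"; level = 4; pred = "c" } ]
  in
  let cemented, ro' = merge ~head:"d" ~new_nfp:2 blocks in
  (* Prints "cemented: a b | RO': c d"; the fork b' is trimmed. *)
  Printf.printf "cemented: %s | RO': %s\n"
    (String.concat " " (List.map (fun b -> b.hash) cemented))
    (String.concat " " (List.map (fun b -> b.hash) ro'))
```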

Performance benchmarks

These benchmarks were computed on a mid-range laptop with SSD storage running a Tezos node with Mainnet data (which at time of benchmarking had approximately 1.65 million blocks).

Store overall performance

With the bespoke store format discussed above, our new storage backend significantly reduces the node’s disk I/O requirements, memory consumption, and storage footprint.

The storage footprint compares as follows:

The block history maintained by a Mainnet node in full mode now requires around 8GB of disk space, compared to more than 26GB before.
The block history maintained by a Mainnet node in archive mode now requires 20GB of disk space, compared to between 40GB and 60GB before.

Snapshot overall performance

Thanks to the new snapshot version (moving from v.0.0.1 in Octez <= 9.x to v.0.0.2 in Octez >= v10.x) and the new store format, snapshots are lighter and faster to import and export. Indeed, snapshots are now based on the canonical layer of the store, which allows further optimizations:

Snapshot export time is improved by 1.8x, and snapshot import time is improved by 3.7x.
Snapshot size, for full mode, is reduced by around 20%.

New features of the storage backend

The new storage backend of Octez version 10 is lighter and faster as discussed above. It is also more resilient and contains numerous other improvements and tweaks. We conclude with a brief changelog:

New snapshot format (v2) is lighter and faster to export and import and can be exported in two formats:
- (default) as a tar archive: easy to exchange,
- as a raw directory: suitable for IPFS exchange thanks to the canonical representation of the block history.
A new command, tezos-node snapshot info, displays information about a given snapshot.
Rolling mode is now labelled rolling, rather than experimental-rolling.
History modes have been reworked so that a cycle offset can be configured to preserve more cycles, if needed. By default, the node now retains 5 cycles with metadata below the no fork point (i.e. full+5), instead of 3 cycles (i.e. full+3).
The store is more resilient thanks to a consistency check and an automatic restore procedure to recover from a corrupted state, meaning that if (for example) power to your node cuts, you are more likely to be able to just reboot and carry on.

Further details of new features, and information on how to upgrade, are in the full Octez v10 changelog.

At the time of writing, Tezos is running the Granada protocol, in which a block lasts 30s during typical, normal operation and a cycle consists of 8192 blocks. Thus, 1 block = 30s and 1 cycle = 8192 × 30s = 245,760s, just over 68 hours. Since this blog post is written from the point of view of Tezos, we will take blocks and cycles as our primitive notion of time henceforth (rather than seconds and hours). You know you think about blockchain too much when you tell your child “hurry up to bed, it’s just 20 blocks till bedtime”. ↩
A discussion of Nakamoto-style consensus is here (based on Emmy*) and one of classical BFT-style consensus is here (based on Tenderbake). For the purposes of this blog post, all that matters is that there are blocks, and consensus is reached. ↩
Compressing data reduces storage size but introduces overhead, since accessing the data requires some computation to decompress it. However, cemented blocks are accessed infrequently, so for the cemented store this tradeoff is worthwhile. ↩
When the no fork point moves, and how far, is governed by constants and algorithms in the current economic protocol. In practice the economic protocols so far just choose constants such that the no fork point gets updated to remain relatively recent, yet far enough into the past for consensus to be certain. ↩

Improving the implementation of cryptography in Tezos Octez

2021-10-14T13:00:00+02:00

1. Our cryptography toolchain, and why it matters
2. The HACL* library
2.1 Quick, correct, and compatible: pick three
2.2 Three cryptographic primitives
3. EverCrypt: an API for HACL* (and more)
3.1 How EverCrypt enhances HACL*
3.2 Advantages of having a cryptographic provider
3.3 The OCaml API
4. A deep dive into the OCaml API
4.1 The low-level API hacl-star-raw
4.2 The high-level API hacl-star
4.3 Using the high-level API
5. Further reading
Acknowledgements

1. Our cryptography toolchain, and why it matters

A safety-critical program is only as trustworthy as the libraries it relies on, so we at Nomadic Labs pay close attention to our tools and dependencies — i.e. to our toolchain.

Our toolchain is based on HACL*, a verified library of cryptographic primitives. These include the hash functions that are the backbone of blockchain technology (principally BLAKE2), and the digital signatures which we use to assure transaction authenticity (Ed25519, P-256, and secp256k1²).

In this blog post, we’ll discuss recent improvements to our cryptography toolchain, and how we integrated them into practical OCaml programming of the Tezos Octez implementation. Namely:

HACL* has been enriched and improved with new crypto primitives, and
access to those primitives has been improved by introducing a sophisticated new cryptographic provider called EverCrypt.

Specifically, we’ll survey

what the HACL* cryptographic library offers,
how the cryptographic provider EverCrypt can enhance it, and
the scaffolding we use to efficiently and reliably invoke these powerful tools from within the Tezos Octez implementation.

2. The HACL* library

2.1 Quick, correct, and compatible: pick three

HACL* is a cryptographic library¹ offering a comprehensive collection of cryptographic primitives (we give three examples below).

HACL* is complex, safety-critical code. Accordingly, it is written in and formally verified using the F* language, and then extracted (compiled) to correct and efficient C using KreMLin, a tool which translates a subset of F* to C. KreMLin also facilitates using the generated C code in OCaml projects — such as the Tezos Octez implementation — by automatically building the scaffolding that developers would otherwise write themselves.

The resulting code offers three formal safety guarantees:

Functional correctness: the code’s behaviour complies with its specification.
Memory safety: memory is managed correctly — so no buffer overflows, dereferencing null pointers, accessing memory after it has been freed, etc.
Secret independence: the C instructions executed, their order, and any memory accesses do not depend on any secret values. This protects against timing attacks that could otherwise leak secret information.
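To make secret independence concrete, here is a small OCaml illustration contrasting an early-exit comparison, whose running time depends on the secret, with one whose control flow does not. It is only a sketch: OCaml itself gives no formal constant-time guarantee, which is precisely why verified C such as HACL* is relied on for real primitives. The function names are hypothetical.

```ocaml
(* Early exit: stops at the first differing byte, so the running time
   reveals how long a prefix of the secret the guess got right. *)
let leaky_equal (secret : string) (guess : string) : bool =
  String.length secret = String.length guess
  && (let rec loop i =
        i = String.length secret || (secret.[i] = guess.[i] && loop (i + 1))
      in
      loop 0)

(* Secret-independent: always visits every byte and folds the differences
   together with bitwise OR, so no branch depends on the secret contents.
   Lengths are assumed to be public. *)
let ct_equal (secret : string) (guess : string) : bool =
  if String.length secret <> String.length guess then false
  else begin
    let diff = ref 0 in
    for i = 0 to String.length secret - 1 do
      diff := !diff lor (Char.code secret.[i] lxor Char.code guess.[i])
    done;
    !diff = 0
  end
```

Both functions return the same answers; they differ only in what an observer timing them could learn.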

For more details see the original HACL* paper.

The point of this toolstack is that it gives us three important properties which it is far from trivial to reconcile:

Safety guarantees as described above, while
retaining the state-of-the-art performance required of a cryptographic library in a real-world system, and
safely interfacing with the code from within OCaml.

2.2 Three cryptographic primitives

The Tezos Octez implementation has been using HACL* since before its launch, and Nomadic Labs has been actively supporting the continued development of HACL*: via grants to the Prosecco team at Inria Paris; and through the work of engineers at Nomadic.

HACL* offers implementations for all but one² of the core cryptographic primitives used by the Nomadic Labs Octez implementation of Tezos.

Let’s survey three examples, which are cryptographic primitives recently introduced into HACL* and relevant to the Tezos implementation:

2.2.1 P-256

P-256 (also called secp256r1) is an elliptic curve signature algorithm and one of the three signature schemes supported in Tezos: P-256; Ed25519; and secp256k1. P-256 is a NIST standard with wide industry support. It allows interoperability with HSMs (hardware security modules), including hardware wallets and Apple’s Secure Enclave.

Any Tezos address generated with P-256 starts with tz3.

A verified implementation of P-256 by the Prosecco team is now in HACL* and replaces our previous library.

2.2.2 SHA-3

Version 1 of the Tezos protocol environment introduced three hash functions based on the Keccak algorithm:

SHA3-256 and SHA3-512 from the official NIST SHA-3 standard, and
Keccak256, another variation of the Keccak algorithm, which is the hash function used in Ethereum.

Starting with the Edo protocol upgrade, these three hash functions are available as Michelson opcodes, alongside the existing hash functions BLAKE2, SHA-256, and SHA-512 (the latter two being variants of SHA-2).

2.2.3 BLAKE2

BLAKE2 is the main hash function of the Tezos protocol. BLAKE2 hashes everything from individual keys and messages, to whole blocks.³

We now use BLAKE2 via its new HACL* implementation. This gives us the three formal safety guarantees above, and furthermore HACL* offers

a portable C implementation of BLAKE2 which runs on any platform, and
a faster vectorized implementation which assumes Intel’s Advanced Vector Extensions 2 (AVX2), which offer CPU instructions for SIMD (single instruction, multiple data) parallelism.

Modern cryptographic algorithms, including BLAKE2, are often designed from the ground up to let implementors make certain optimizations where possible, such as using SIMD instruction sets like AVX2 where the hardware supports them. A paper describes how this fits into HACL*’s verification pipeline; for example, on Intel processors the SIMD implementation of BLAKE2 is around 30% faster than the non-SIMD one.⁴

3. EverCrypt: an API for HACL* (and more)

3.1 How EverCrypt enhances HACL*

Having a great library like HACL* is one thing. But how to package its features for developers to efficiently deploy in working code? This is where EverCrypt can help.

EverCrypt is a cryptographic provider that bundles the cryptographic primitives present in HACL* into a unified package. EverCrypt — which like HACL* is written in F* and extracted to correct and efficient C using KreMLin — does not replace HACL* so much as provide an additional interface to access the suite of cryptographic primitives offered by HACL*. EverCrypt also bundles routines written in highly-optimized assembly verified with a tool called Vale.⁵

We can sum this up as follows:

HACL* is a cryptographic library.
EverCrypt is a cryptographic provider.

3.2 Advantages of having a cryptographic provider

In practice, accessing cryptographic primitives through EverCrypt offers two advantages over accessing them directly (e.g. using HACL*): multiplexing and agile interfaces.

EverCrypt provides multiplexing:

Multiplexing means that EverCrypt automatically chooses the fastest available implementation of a given primitive, with no input required from the developer.

HACL* may offer multiple implementations of some cryptographic algorithms. In the case of BLAKE2 there is a choice between
- a C implementation that is portable and runs on all platforms, and
- a faster but less portable vectorized implementation that only works on some platforms.
For other primitives, there might be a choice between a portable C implementation, and one that uses verified assembly.

EverCrypt provides agile interfaces:

EverCrypt offers interfaces that group algorithms that perform the same general function, e.g. hashing or HMAC. Users call a single function, such as hash, and pass the name of the specific algorithm desired as a parameter. EverCrypt then also multiplexes to choose the specific implementation of that algorithm.⁶
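A minimal OCaml sketch of both ideas follows, assuming 64-bit OCaml. All names are hypothetical and none come from EverCrypt; FNV-1a (a non-cryptographic hash) stands in for real algorithms only because it fits in a few lines. `fnv1a_32` is selected once from two interchangeable implementations (multiplexing), and `hash ~alg` is a single entry point parameterized by the algorithm (agility).

```ocaml
type alg = Fnv1a_32 | Fnv1a_64

(* Two implementations of the SAME algorithm, which must agree bit for bit.
   In a real provider one might be portable C and the other vectorized or
   verified assembly; here they differ only in loop style. *)
let fnv1a_32_portable msg =
  let h = ref 0x811c9dc5 in
  String.iter (fun c -> h := ((!h lxor Char.code c) * 0x01000193) land 0xffffffff) msg;
  !h

let fnv1a_32_fast msg =
  let h = ref 0x811c9dc5 in
  for i = 0 to String.length msg - 1 do
    h := ((!h lxor Char.code msg.[i]) * 0x01000193) land 0xffffffff
  done;
  !h

let fnv1a_64 msg =
  let h = ref 0xcbf29ce484222325L in
  String.iter
    (fun c -> h := Int64.mul (Int64.logxor !h (Int64.of_int (Char.code c))) 0x100000001b3L)
    msg;
  !h

(* Multiplexing: the implementation is chosen once, with no input from
   callers. A real provider would probe CPU features; this hypothetical
   switch reads an environment variable instead. *)
let fnv1a_32 =
  if Sys.getenv_opt "TOY_DISABLE_FAST_PATH" = None then fnv1a_32_fast
  else fnv1a_32_portable

(* Agile interface: one entry point, with the algorithm as a parameter. *)
let hash ~alg msg =
  match alg with
  | Fnv1a_32 -> Printf.sprintf "%08x" (fnv1a_32 msg)
  | Fnv1a_64 -> Printf.sprintf "%016Lx" (fnv1a_64 msg)
```

For instance, `hash ~alg:Fnv1a_32 "a"` returns `"e40c292c"` whichever implementation was selected; the caller never sees the choice.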

See this paper for a much more in-depth look at how EverCrypt works and how it supports the development of verified applications.

3.3 The OCaml API

So now we have a cryptographic library (HACL*) and a cryptographic provider (EverCrypt).

There remains a technical issue: calling C functions (like those of HACL* and EverCrypt) from an OCaml application (like Octez). How best to call a C function from OCaml?

It depends! One common mechanism is a foreign function interface (FFI). The OCaml FFI allows developers to call C functions; it’s up to the developer to write a binding that matches the signature of the C function, and to manage the relevant memory.

Interfacing with an external library using the OCaml FFI works, but it can be error-prone, time-consuming, and can duplicate effort for each project that uses the external library.

To ameliorate this and facilitate adoption in the OCaml ecosystem, EverCrypt and HACL* support an OCaml API primarily developed by Nomadic Labs. Released as the hacl-star package on opam, this provides an idiomatic, high-level interface to the EverCrypt and HACL* APIs. Starting with the recently-released version 0.4, the API is also fully documented.

As is often the case in programming, a more convenient interface is also a safer one, and hacl-star offers safety benefits compared to binding with the C code directly:

The lower-level bindings which interact directly with the C code are automatically generated as part of the compilation of F* code to C and are thus valid by construction with respect to the C code.
In the higher-level interface, all function calls check the preconditions that these functions have in F*. The formal guarantees of the verified code only hold if the arguments satisfy these preconditions (such as buffers being a certain length or not passing the same buffer as multiple arguments). This prevents developers from using the API incorrectly, and is therefore safer.

The rest of this blog post dives into a description of how the tools above actually get invoked in the Octez codebase. We are proud of our code and would love you to read this — but you are also welcome to skip to the further reading!

4. A deep dive into the OCaml API

The OCaml API is split into

a low-level part called hacl-star-raw (this contains the actual cryptography) and
a high-level idiomatic interface hacl-star (this makes the low-level part much easier to use).

Let’s look in more detail at how hacl-star-raw replaces manually-written bindings with automatically generated ones, and how hacl-star builds on top of this to offer a convenient, safer API.

4.1 The low-level API `hacl-star-raw`

Consider the example of the SHA-256 function from HACL*. Its C signature is:

void Hacl_Hash_SHA2_hash_256(uint8_t *input, uint32_t input_len, uint8_t *dst);

To call it from OCaml we can write a C stub file and match the C types with compatible OCaml FFI types:

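#include <stdint.h>
#include <caml/mlvalues.h>
#include <caml/bigarray.h>
#include "Hacl_Hash.h"

/* Called from OCaml with two bytes buffers and the input length. The stub
   does not allocate on the OCaml heap, so it can be declared [@@noalloc]. */
CAMLprim value ml_Hacl_Hash_SHA2_hash_256(value input, value input_len, value dst) {
    Hacl_Hash_SHA2_hash_256((uint8_t *) Bytes_val(input),
                            (uint32_t) Int_val(input_len),
                            (uint8_t *) Bytes_val(dst));
    return Val_unit;
}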

Then we can bind this external function to an OCaml function which we can call in the rest of the code as we would any OCaml function:

external sha2_256_hash : Bytes.t -> int -> Bytes.t -> unit =
    "ml_Hacl_Hash_SHA2_hash_256" [@@noalloc]

However, there’s a simpler way to write bindings.

Using the Ctypes library, we can just write OCaml declarations for the C functions that we want to bind, and the library takes care of the rest. A Ctypes declaration for the SHA-256 example above would look like this:

open Ctypes
module Bindings (F : Cstubs.FOREIGN) = struct
  open F

  let hacl_Hash_SHA2_hash_256 =
    foreign "Hacl_Hash_SHA2_hash_256"
      (ocaml_bytes @-> uint32_t @-> ocaml_bytes @-> returning void)
end

This is cleaner than before, but we’ve only bound a single function – the SHA-256 hash. We would need to write one such binding for every function from the C library that we want to use in OCaml.

We now take this a step further by having the KreMLin tool automatically generate these Ctypes declarations, at the same time as the C code. This automation saves time and effort and furthermore it ensures that the signatures are correct with respect to the original F* code and the extracted C code, and that they remain in sync when the F* code changes.

This functionality in KreMLin can be fine-tuned:

users can specify which of the resulting C modules will come with Ctypes bindings, and
KreMLin correctly infers which dependencies also need to be bound.

The output is a collection of bindings that resemble the snippet above, along with the required Ctypes boilerplate and a .depend file listing dependencies between the generated bindings to be used as part of the build.

Thus, we generate bindings for the entirety of the EverCrypt/HACL* code every time a new snapshot is produced. This is then all packaged as the hacl-star-raw opam package.

4.2 The high-level API `hacl-star`

hacl-star-raw is better than writing C bindings by hand, but it still offers a low-level, C style interface to the library.

To complement this, we developed a handwritten idiomatic OCaml API. This further improves convenience and safety, primarily because the preconditions of the original F* functions (which are lost when compiling the F* code to C) can be enforced at runtime.

To illustrate, let’s look at the original F* signature (permalink) of the SHA-256 function above:

module B = LowStar.Buffer

let hash_st (a: hash_alg) =
  input:B.buffer uint8 ->
  input_len:size_t { B.length input = v input_len } ->
  dst:hash_t a->
  ST.Stack unit
    (requires (fun h ->
      B.live h input /\
      B.live h dst /\
      B.disjoint input dst /\
      B.length input <= max_input_length a))
    (ensures (fun h0 _ h1 ->
      B.(modifies (loc_buffer dst) h0 h1) /\
      Seq.equal (B.as_seq h1 dst) (Spec.Agile.Hash.hash a (B.as_seq h0 input))))

[...]

val hash_256: hash_st SHA2_256

Let’s break hash_st down:

We can see that hash_256 is an instantiation of the generic signature hash_st, parameterized by the specific algorithm. This is a pervasive pattern throughout the library.
Just as we specified in the OCaml binding above, it takes three arguments:
1. input, which is a uint8_t buffer,
2. input_len, with a refinement specifying that the size of input must be equal to input_len, and
3. dst of type hash_t a, which is also a uint8_t buffer of the correct digest size for algorithm a.

Glossing over some of the details, we see that any function of type hash_st (and therefore hash_256 too) comes with a precondition (the requires clause) and a postcondition (the ensures clause). In particular:

the requires clause demands that input and dst be live and disjoint, and that the input buffer be no longer than the maximum allowed size for the specific algorithm (B.length input <= max_input_length a), and
the ensures clause guarantees that hash_256 modifies only dst, and that dst will contain the result of the hash as defined in the spec.

In the original F* these preconditions are statically checked when compiling code that calls hash_256. But in OCaml we only get to work with the extracted C code, in which this information has been erased (as seen in the C signature of the function). Users of the C library must make sure that the arguments they pass respect the original preconditions of these functions.

In the OCaml API, we do check all of these preconditions at runtime. For example, this is what the functor used internally for hash functions roughly looks like:

module Make_HashFunction (C: Buffer)
    (Impl : sig
       val hash_alg : alg
       val hash : C.buf -> uint32 -> C.buf -> unit
     end)
= struct
    let hash ~msg ~digest =
      check_max_input_len Impl.hash_alg (C.size msg);
      assert (C.size digest = digest_len Impl.hash_alg);
      assert (C.disjoint msg digest);
      Impl.hash (C.ctypes_buf msg) (C.size_uint32 msg) (C.ctypes_buf digest)
end

Make_HashFunction is parameterized both by

C, which is the data type we want to use to represent C buffers (we currently use Bytes, but Bigstring is also possible), and
Impl, the specific hash function implementation.

Before calling the bound C function, the buffers are checked to have the correct size and to be disjoint (which for Bytes simply means checking that msg and digest are not the same physical buffer, since two distinct Bytes values never overlap).
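The same pattern can be reproduced in a self-contained OCaml sketch. `RAW_HASH`, `Make_checked` and `Toy_raw` are hypothetical names, and a toy FNV-1a checksum replaces SHA-256. The raw function trusts its caller, like a generated binding; the functor re-imposes the F* preconditions at runtime. Where the snippet above uses `assert`, this sketch raises `Invalid_argument`, a design choice that makes failures easier to handle in calling code.

```ocaml
module type RAW_HASH = sig
  val digest_len : int                      (* bytes written to dst *)
  val max_input_len : int
  val hash : bytes -> int -> bytes -> unit  (* input, input_len, dst *)
end

module Make_checked (Impl : RAW_HASH) = struct
  let hash ~msg ~digest =
    (* requires: B.length input <= max_input_length a *)
    if Bytes.length msg > Impl.max_input_len then invalid_arg "hash: input too long";
    (* dst must be exactly the digest size for this algorithm *)
    if Bytes.length digest <> Impl.digest_len then invalid_arg "hash: wrong digest size";
    (* requires: B.disjoint input dst (physical inequality for Bytes) *)
    if msg == digest then invalid_arg "hash: buffers must be disjoint";
    (* input_len is derived from msg, so the length refinement holds *)
    Impl.hash msg (Bytes.length msg) digest
end

(* Toy raw implementation: FNV-1a 32-bit, written big-endian into dst. *)
module Toy_raw : RAW_HASH = struct
  let digest_len = 4
  let max_input_len = 1 lsl 20  (* arbitrary limit for the example *)
  let hash input input_len dst =
    let h = ref 0x811c9dc5 in
    for i = 0 to input_len - 1 do
      h := ((!h lxor Char.code (Bytes.get input i)) * 0x01000193) land 0xffffffff
    done;
    Bytes.set_int32_be dst 0 (Int32.of_int !h)
end

module Toy = Make_checked (Toy_raw)
```

Calling `Toy.hash ~msg:buf ~digest:buf` with the same buffer, or with a 5-byte digest, fails before the raw function ever runs.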

4.3 Using the high-level API

We’ve seen how the OCaml library comes together, from the HACL* C code, to the low-level Ctypes bindings, to the idiomatic OCaml API. Now let’s look at how it can be used.

Most cryptographic algorithms exposed through this OCaml API can be called in more than one way, to suit different use cases. Our SHA-256 example above will conveniently illustrate them.

The API is split in two interfaces:

Hacl, which directly exposes the portable C implementations
EverCrypt, which exposes the agile and/or multiplexing interfaces

In Hacl, SHA-256 can be used in two styles:

Hacl.SHA2_256.hash which takes a buffer representing the message and returns the digest
Hacl.SHA2_256.Noalloc.hash which, more in keeping with the C style, takes as inputs both the message buffer and the output buffer into which the digest will be written

The first style is usually more convenient, but the second style can be useful in cases where an output buffer has already been allocated and we don’t want to allocate a new one. The choice is the programmer’s to make: most modules in Hacl and EverCrypt offer both styles above.
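Assuming a toy 4-byte FNV-1a digest in place of SHA-256, the relationship between the two styles might look like the following: the allocating `hash` is a thin wrapper that creates a buffer and delegates to `Noalloc.hash`. The module names mirror the description above, but the code is a hypothetical sketch, not hacl-star’s implementation.

```ocaml
module Toy_hash = struct
  let size = 4

  module Noalloc = struct
    (* C-like style: the caller owns [digest] and may reuse it. *)
    let hash ~msg ~digest =
      if Bytes.length digest <> size then invalid_arg "Noalloc.hash: bad digest size";
      let h = ref 0x811c9dc5 in
      Bytes.iter (fun c -> h := ((!h lxor Char.code c) * 0x01000193) land 0xffffffff) msg;
      Bytes.set_int32_be digest 0 (Int32.of_int !h)
  end

  (* Convenient style: allocate a fresh output buffer on every call. *)
  let hash msg =
    let digest = Bytes.create size in
    Noalloc.hash ~msg ~digest;
    digest
end
```

Reusing one `digest` buffer across many calls to `Toy_hash.Noalloc.hash` avoids an allocation per message, which is the situation the second style is meant for.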

EverCrypt offers three further options:

EverCrypt.SHA2_256 is an identical interface to Hacl.SHA2_256, but with a different underlying implementation: Hacl.SHA2_256 uses the portable C implementation of SHA-256; whereas EverCrypt.SHA2_256 uses the multiplexing EverCrypt interface which automatically uses code relying on Intel SHA extensions if the architecture supports it.
EverCrypt.Hash.hash is an agile and multiplexing interface to all hashing functions supported in EverCrypt and it is parameterized by the hashing algorithm:
```
let digest = EverCrypt.Hash.hash ~alg:SHA2_256 ~msg
```

The EverCrypt.Hash incremental hashing interface can be used when we need to update the internal state repeatedly before generating a digest:


    let st = EverCrypt.Hash.init ~alg:SHA2_256 in
    EverCrypt.Hash.update ~st ~msg; (* can be called multiple times *)
    let digest = EverCrypt.Hash.finish ~st
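The incremental pattern can be illustrated with a hypothetical OCaml sketch that keeps the same init/update/finish shape but drops the `~alg` parameter and uses a toy FNV-1a state instead of SHA-256. The property that matters is that feeding a message in several chunks yields the same digest as feeding it all at once. Real states such as SHA-256's also buffer partial blocks; FNV-1a consumes one byte at a time, so its state is a single integer.

```ocaml
type state = { mutable h : int }

(* Start from the FNV-1a 32-bit offset basis. *)
let init () = { h = 0x811c9dc5 }

(* Absorb a chunk; may be called any number of times. *)
let update ~st ~msg =
  Bytes.iter (fun c -> st.h <- ((st.h lxor Char.code c) * 0x01000193) land 0xffffffff) msg

(* Produce the 4-byte big-endian digest of everything absorbed so far. *)
let finish ~st =
  let digest = Bytes.create 4 in
  Bytes.set_int32_be digest 0 (Int32.of_int st.h);
  digest
```

Feeding "hel", "lo, " and "tezos" through `update` gives the same digest as a single `update` with "hello, tezos".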

5. Further reading

Taken together, these changes constitute a significant investment in improving our codebase, which has brought us up-to-date with the best technology available and allowed us to increase reliability and flexibility, without compromising on performance.

You can read further in:

The OCaml API documentation.
For web development, an official JavaScript API is available, exposing the version of HACL* extracted to WebAssembly.
For more examples using other algorithms, see the unit tests.
EverCrypt is free and open-source, for you to use.
See this paper for a more in-depth look at how EverCrypt works and how it supports the development of verified applications.

We hope this survey has been informative and may inspire you to use EverCrypt in your own OCaml projects too!

Acknowledgements

Thanks to Karthikeyan Bhargavan, Natalia Kulatova, and Marina Polubelova of the Prosecco team at Inria, and to Jonathan Protzenko of Microsoft Research, for their help, support, and collaboration. At Nomadic Labs, the work reported on in this article was carried out principally by Victor Dumitrescu.

HACL* was developed as part of Project Everest, a collaboration between Inria Paris, Microsoft Research, and other institutions and contributors. ↩
secp256k1 (Bitcoin’s signature scheme) is the only core primitive that is not currently implemented in the HACL* library; we just import the code directly from Bitcoin. ↩↩
BLAKE2 comes in two flavours: BLAKE2b, which is optimized for 64-bit platforms and produces digests between 1 and 64 bytes long, and BLAKE2s, which is optimized for 32-bit platforms and produces digests between 1 and 32 bytes long. There are also the 4-way parallel BLAKE2bp and the 8-way parallel BLAKE2sp, as well as BLAKE2x, which can produce digests of arbitrary length. None of that matters for this article, except to note that the Tezos Octez implementation uses BLAKE2b, so for the purposes of this article, BLAKE2b = BLAKE2. ↩
See Section 4.1 of the paper: “On Intel processors, vectorization speeds up Blake2 by about 30%”. In practice the computational cost of cryptographic operations in Tezos is small compared to that of the other components of the network, so the performance discussion mainly confirms that our upgraded tooling is certainly not slower (and is slightly faster) than the version it replaces. We checked this empirically with a benchmark, running a Tezos node on a fixed (large) set of operations with different BLAKE2 implementations. There was no noticeable difference in performance, even between BLAKE2 implementations whose performance differs significantly when compared directly. ↩
Not all primitives in HACL* are accessible through EverCrypt, and we don’t use all of them through EverCrypt. Primitives that only have a portable C implementation and are not part of an agile interface don’t benefit from being included in EverCrypt. Conversely, all the primitives in EverCrypt are available in HACL*, but parts of their implementations can sometimes come from a different source, e.g. Vale. ↩
Agile interfaces are just a limited form of higher-order programming (meaning functions passed as parameters to other functions), but remember that this is accessed as a C library, and passing functions as parameters is a bigger deal in the C world than in, e.g., the OCaml world. ↩

Timelock: a solution to miner/block producer extractable value

2021-10-11T12:00:00+02:00

Block Producer Extractable Value (BPEV) — called MEV on proof-of-work chains (Miner Extractable Value) — is a form of arbitrage present in most decentralized finance applications on blockchains. It is undesirable, because it imposes costs on users, to the benefit of block producers (examples below).

To help improve this situation, we propose in Hangzhou a new Michelson smart contract instruction, based on a cryptographic primitive named Timelock (technical documentation). Our solution allows users to temporarily hide the payload of a transaction sent to a smart contract for a period of time longer than it takes to include the transaction in a block.

In this way there is no leakage of information at the P2P level and no reordering of transactions can be applied in order to arbitrage the smart contract.

We argue that this tackles the issue of BPEV at its core.

In this post we explain BPEV and its relationship to the fees market.

We identify the core problem as a leakage of information between different layers of the architecture of modern blockchains. We argue that this problem is not inherent to blockchains and thus can be solved with the right cryptographic tool.

We explain the advantages of cryptographic commitments and why they are not enough to tackle our problem.

Finally we show how the Timelock primitive improves over commitments and we sketch how to use it to protect a smart contract from BPEV.

Block Producer Extractable Value (BPEV)

Block Producers and fees market

To understand BPEV it is useful to recall the role of block producers in a blockchain.

To enjoy the benefits of decentralization, blockchains have a permissionless design so as to attract a large number of block producers.¹

In Tezos for example anybody with a computer and the necessary stake can become one. Today the network is run by around 400 block producers. In order to motivate block producers to operate on the network they obtain an economic incentive for each block produced.

This incentive is usually composed of a fixed amount created from inflation (block reward) and, less importantly, the fees of all transactions included in the block.

Because each block can only include a finite amount of transactions, a fee market forms where users bid for inclusion of their transactions. The block producer naturally prioritizes transactions with a higher fee.

This system, popularized by Bitcoin, works well for simple transfers. However, things get more complicated when transactions are intended for complex smart contracts and can, for example, affect the price of an asset when executed. A first hint of the problem is that the order in which transactions are included in a block is the prerogative of the block producer, outside the control of the transaction emitter.

What is BPEV?

A large class of smart contracts (typically ones related to DeFi) are sensitive to the ordering of transactions. By scanning the transactions on the network, arbitrageurs can exploit the fees market to manipulate smart contracts and produce undesirable arbitrages.

A simple example involves a smart contract that automatically computes the price of an asset based on supply and demand (e.g. an Automated Market Maker). A user wishing to buy this asset sends a transaction with a fee. An arbitrageur monitoring the P2P network then produces a transaction to buy the same asset but with a higher fee, so as to have priority in the next block. When the arbitrageur’s transaction is executed, the smart contract raises the price of the asset, so the honest transaction then buys the asset at a higher price. The arbitrageur immediately sells the asset back at this inflated price and makes a profit: a so-called “sandwich”.
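The following OCaml toy model makes the sandwich concrete. It assumes a constant-product market maker (tez × token = k, no trading fees), which is one common kind of AMM rather than any specific Tezos contract, and all numbers and names are illustrative. The block producer orders transactions by descending fee, so the attacker pays more than the victim to buy first, and less than the victim to sell last.

```ocaml
(* Toy constant-product pool: tez *. token stays equal to k across swaps. *)
type pool = { mutable tez : float; mutable token : float }

(* Pay [dx] tez into the pool; returns the tokens received. *)
let buy p dx =
  let k = p.tez *. p.token in
  let token' = k /. (p.tez +. dx) in
  let out = p.token -. token' in
  p.tez <- p.tez +. dx;
  p.token <- token';
  out

(* Pay [dy] tokens into the pool; returns the tez received. *)
let sell p dy =
  let k = p.tez *. p.token in
  let tez' = k /. (p.token +. dy) in
  let out = p.tez -. tez' in
  p.token <- p.token +. dy;
  p.tez <- tez';
  out

(* The block producer includes pending transactions by descending fee. *)
type tx = { label : string; fee : float }
let order_by_fee txs = List.stable_sort (fun a b -> compare b.fee a.fee) txs

(* The victim buys with 100 tez, first alone, then sandwiched. *)
let sandwich () =
  let alone = buy { tez = 1000.; token = 1000. } 100. in
  let p = { tez = 1000.; token = 1000. } in
  let bought = buy p 100. in     (* attacker front-runs (higher fee) *)
  let victim = buy p 100. in     (* victim, now at a worse price *)
  let back = sell p bought in    (* attacker back-runs (lower fee) *)
  (alone, victim, back -. 100.)  (* last component: attacker profit in tez *)
```

With a pool of 1000 tez and 1000 tokens, the victim’s 100 tez buys about 90.9 tokens alone but only about 75.8 tokens when sandwiched, while the attacker ends up roughly 18 tez richer.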

This simple example is reminiscent of front-running in a classic financial market, although a key difference is that front-running in financial markets involves the malicious use of clients’ information by fiduciaries, whereas in this scenario no direct breach of trust is involved.²

The phenomenon is particularly relevant on blockchains, since the effect of a block can be determined algorithmically; the process is therefore easy to automate, which leads to the more sophisticated arbitrages used in practice.

It is important to note that several actors can compete among themselves for priority in a block thus causing spikes in the fee prices and reducing their margin of gain.

In the end, block producers are extracting the most value from this practice because of the inherent power of their role.

The value extracted in this attack is therefore called Miner Extractable Value (MEV) on Ethereum. Tezos faces a similar phenomenon, which we call Block Producer Extractable Value (BPEV), extending the terminology to proof-of-stake networks.

One concrete account of MEV (on the Ethereum network) is described in the article “Ethereum is a dark forest”, whose author suffered a 12,000 USD loss on a transaction that was submitted and arbitraged away. See also:

Flashbots docs (permalink), and
a detailed paper, “Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges”, which shows the extent of the problem.

The core of the problem

It is important to understand that this is an architectural problem. Blockchains are usually separated into three layers: a P2P (peer-to-peer) layer, a consensus layer, and a smart contracts layer. Let’s take an example to see the life of a transaction throughout the three layers in a typical blockchain. A user creates a transaction of the form (source, fees, payload). The source is the address of the user, the fees are meant for the block producer, and the payload is intended for a smart contract. The user then broadcasts the transaction in the clear on a gossip P2P network. The consensus layer validates the transaction and includes it in a block. In principle, this validation need only ensure that the source has enough balance to pay the fees. Once the transaction is included in a block by the consensus layer, the recipient smart contract executes the payload.

Note however that the P2P and consensus layers have access to the payload even if it’s not necessary for their operation. This unnecessary leakage of information is the core cause of BPEV.

As we will explain in the rest of this blog post, our solution consists of hiding the payload from the P2P layer. Note that this is a non-trivial task, as a smart contract cannot hold secret keys.

Users of the Tezos network might recall the integration of the Sapling protocol in the Edo amendment and wonder if it could solve the problem. Indeed in some cases Sapling can be used to alleviate BPEV, however it offers a very strong notion of privacy which in many cases is not required and which may limit the expressivity of the smart contract.

The solution we propose, called Timelock, allows users to temporarily hide the payload of a transaction sent to a smart contract for the time necessary to have it included in a block. The payload is then decrypted and the smart contract can operate as before.

Cryptography and fairness

Before explaining our solution based on Timelock, we digress to explain the concept of fairness through the children’s game rock-paper-scissors.

The commit-and-reveal technique

Consider the game of rock-paper-scissors. In this game, it is key that players reveal their moves simultaneously — or at least within a time interval less than a human reaction time. Unfortunately if we play the game on a blockchain, computers have ample time to react to broadcasted transactions on the P2P layer before they are included. This makes it impossible for our children to play a naïve version of rock-paper-scissors across a blockchain.

How to play rock-paper-scissors over a blockchain, using commitments

Using a commitment scheme, our children could still in principle play rock-paper-scissors on our blockchain. A commitment scheme is a well-known cryptographic primitive in which we commit to having made a decision, without immediately revealing what that decision is.

Formally, we require two properties:

Hiding property: given a value I can commit to this value. However, my commitment gives no information about the value committed to.
Binding property: If I reveal my value, this is guaranteed to be equal to the value that I committed to earlier.

We can now devise a protocol to play our game:

Player 1 chooses a move $M_1\in\{\text{rock}, \text{paper}, \text{scissors}\}$ and broadcasts a commitment $C(M_1)$ of $M_1$.
Player 2 receives $C(M_1)$ and sends a move $M_2\in\{\text{rock}, \text{paper}, \text{scissors}\}$ in the clear.
Player 1 receives $M_2$ and reveals $M_1$.

The hiding property gives Player 1 confidence that Player 2 cannot use the value of $M_1$ to choose $M_2$, and the binding property gives Player 2 confidence that $M_1$ was indeed the chosen move. Thanks to commitments, both players are protected, neither can gain an advantage, and the game is fair.
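A hash-based commitment is one common way to realise the protocol above. The sketch below assumes SHA-256 with a fresh 32-byte random nonce: the nonce is what makes the commitment hiding (without it, Player 2 could simply hash the three possible moves and compare), and the collision resistance of the hash is what makes it binding. The function names are illustrative and are not taken from any Tezos library.

```python
import hashlib
import hmac
import secrets

MOVES = ("rock", "paper", "scissors")

def commit(move: str) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). Only the commitment is broadcast at first."""
    assert move in MOVES
    nonce = secrets.token_bytes(32)  # fixed length, so nonce || move is unambiguous
    commitment = hashlib.sha256(nonce + move.encode()).digest()
    return commitment, nonce

def verify(commitment: bytes, move: str, nonce: bytes) -> bool:
    """Check a reveal: the (nonce, move) pair must hash to the earlier commitment."""
    expected = hashlib.sha256(nonce + move.encode()).digest()
    return hmac.compare_digest(commitment, expected)

def winner(m1: str, m2: str) -> int:
    """0 for a draw, otherwise the number of the winning player."""
    if m1 == m2:
        return 0
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    return 1 if beats[m1] == m2 else 2

c1, n1 = commit("paper")        # Player 1 broadcasts c1 only
m2 = "rock"                     # Player 2 answers in the clear
assert verify(c1, "paper", n1)  # Player 1 reveals; anyone can check the reveal
print(winner("paper", m2))      # prints 1
```

Note that nothing in this code forces Player 1 to call `verify`-able reveal at all, which is exactly the weakness discussed next.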

Commitments and BPEV

Commitments are a partial solution to BPEV.

Users can commit to transactions, and block producers can include these in the blockchain. Block producers are kept honest in the sense that the hiding property means they cannot interrogate the commitments for information with which to extract value, for example by reordering or inserting transactions. The binding property means that commitments really do commit users to the transactions that they chose.

The overall P2P-Consensus-SmartContracts blockchain architecture is untouched. Later, just as in the game above, users can reveal their actual transactions.

However, there is a catch: the binding property guarantees that a user cannot reveal a different transaction, but does not force the user to reveal. Users could spam the system with transactions and only reveal the ones that suit them.³

So commitments are a step in a good direction, but they do not fully solve the problem.

Timelock and BPEV

We propose a solution to alleviate this issue which is relatively easy to implement and is based on an old cryptographic technique called time-lock encryption (see Time-lock puzzles and timed-release Crypto for more details).

Time-lock encryption makes it possible to encrypt a message so that it can be decrypted in two ways: rapidly, using the author’s secret; or through a long, time-consuming computation that does not require access to the author’s secret. The duration of this computation can be set to an arbitrary predetermined constant $T$. In addition, a proof that the decryption is correct can be produced and checked rapidly (at a cost of order $\log T$ in our case).
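The two decryption paths can be illustrated with the classic Rivest-Shamir-Wagner time-lock puzzle, on which time-lock encryption is commonly based. In this toy sketch (insecure parameters, and a hypothetical SHA-256 keystream standing in for a real cipher), the author knows the factorization of $N$ and can therefore reduce the exponent $2^T$ modulo $\varphi(N)$, while anyone else must perform $T$ sequential squarings. The proof of correct decryption mentioned above is omitted, and this is not the Tezos implementation.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA modulus: both prime factors are far too small for real security.
P, Q = 1000003, 1000033
N = P * Q
PHI = (P - 1) * (Q - 1)  # secret: computable only from the factorization of N

def slow_unlock(x: int, t: int) -> int:
    """Compute x^(2^t) mod N with t sequential squarings (no secret needed)."""
    y = x % N
    for _ in range(t):
        y = y * y % N
    return y

def fast_unlock(x: int, t: int) -> int:
    """Same value, computed quickly by reducing 2^t modulo phi(N) (Euler's theorem)."""
    return pow(x, pow(2, t, PHI), N)

def keystream(y: int, length: int) -> bytes:
    """Hypothetical key derivation: hash the unlocked value into a short pad."""
    assert length <= 32, "toy construction: a single SHA-256 digest only"
    return hashlib.sha256(str(y).encode()).digest()[:length]

def lock(message: bytes, t: int) -> tuple[int, bytes]:
    """Author side: pick a base coprime to N, then encrypt using the fast path."""
    while True:
        x = secrets.randbelow(N - 2) + 2
        if gcd(x, N) == 1:
            break
    pad = keystream(fast_unlock(x, t), len(message))
    return x, bytes(m ^ k for m, k in zip(message, pad))

def force_open(x: int, ciphertext: bytes, t: int) -> bytes:
    """Anyone's side: recover the message by doing the slow computation."""
    pad = keystream(slow_unlock(x, t), len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, pad))

x, ciphertext = lock(b"paper", t=100_000)
assert force_open(x, ciphertext, t=100_000) == b"paper"
```

Here $T$ corresponds to the parameter `t`: doubling it doubles the work on the slow path, while the author’s cost stays essentially constant.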

We can see intuitively that this solves the problem we had with the commitment. Indeed, even if a user refuses to reveal the content of their transaction, someone else can take the slow path to decrypt it.

General principle

Armed with our new Timelock cryptographic primitive, we can imagine a smart contract implementing the following pattern:

In a first period, a contract collects user-submitted and Timelock encrypted Michelson values along with some valuable deposit, such as tez.
In a second period, after the values are collected, the contract collects from users a decryption of the value they submitted, along with a proof that the decryption is correct.
In a third period, if any value remains undecrypted, anyone can claim part of the deposit by submitting a decryption of the value, with the other part of the deposit being burnt. Different penalties can be assessed depending on whether the user merely failed to submit a decryption for their value, or if they also intentionally encrypted invalid data. Different rewards can be distributed for submitting a correct decryption. The third period needs to be long enough so that people have enough time to perform the Timelock decryption.
Finally, the contract can compute some function of all the decrypted data.
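The four steps above can be simulated with a small state machine. This is a hypothetical Python model, not Michelson and not the Hangzhou interface: a SHA-256 hash stands in for verifying a Timelock decryption and its proof, deposits are plain integers, a fixed 50% of an unrevealed deposit rewards whoever opens the chest while the rest is burnt, and the final function simply sorts the decrypted values. It does not model the different penalties for missing versus invalid decryptions.

```python
import hashlib
from dataclasses import dataclass, field

PERIODS = ["collect", "reveal", "claim", "done"]

def check_opening(chest: bytes, value: str) -> bool:
    """Hypothetical stand-in for verifying a Timelock decryption and its proof."""
    return hashlib.sha256(value.encode()).digest() == chest

@dataclass
class ThreePeriodContract:
    reward_percent: int = 50                      # share paid to a forced opener
    period: str = "collect"
    chests: dict = field(default_factory=dict)    # user -> (chest, deposit)
    values: dict = field(default_factory=dict)    # user -> decrypted value
    balances: dict = field(default_factory=dict)  # address -> amount paid out
    burnt: int = 0

    def advance(self) -> None:
        self.period = PERIODS[PERIODS.index(self.period) + 1]

    def submit(self, user: str, chest: bytes, deposit: int) -> None:
        """First period: collect encrypted values together with a deposit."""
        assert self.period == "collect" and user not in self.chests
        self.chests[user] = (chest, deposit)

    def reveal(self, user: str, value: str) -> None:
        """Second period: the author opens their own chest and is fully refunded."""
        assert self.period == "reveal" and user not in self.values
        chest, deposit = self.chests[user]
        assert check_opening(chest, value)
        self.values[user] = value
        self._pay(user, deposit)

    def force_open(self, user: str, claimer: str, value: str) -> None:
        """Third period: anyone may open a remaining chest; part of the deposit is burnt."""
        assert self.period == "claim" and user not in self.values
        chest, deposit = self.chests[user]
        assert check_opening(chest, value)
        self.values[user] = value
        reward = deposit * self.reward_percent // 100
        self._pay(claimer, reward)
        self.burnt += deposit - reward

    def result(self) -> list:
        """Final step: some function of all decrypted values (here, sorting them)."""
        assert self.period == "done" and len(self.values) == len(self.chests)
        return sorted(self.values.values())

    def _pay(self, address: str, amount: int) -> None:
        self.balances[address] = self.balances.get(address, 0) + amount

c = ThreePeriodContract()
c.submit("alice", hashlib.sha256(b"paper").digest(), 100)
c.submit("bob", hashlib.sha256(b"rock").digest(), 100)
c.advance()                           # collect -> reveal
c.reveal("alice", "paper")            # bob stays silent
c.advance()                           # reveal -> claim
c.force_open("bob", "carol", "rock")  # carol does the slow decryption
c.advance()                           # claim -> done
print(c.result(), c.balances, c.burnt)
# ['paper', 'rock'] {'alice': 100, 'carol': 50} 50
```

Because a user who withholds their decryption loses half of the deposit in this model, revealing during the second period is the rational choice, which is why the third period rarely needs to run.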

There is generally no incentive for users to withhold the decryption of their data, so the third period generally does not need to take place. However, the second period needs to be long enough that bakers cannot easily censor submissions of the decryption in a bid to claim the reward later. Burning part of the deposit also limits griefing attacks, in which a user withholds the decryption until the third period and then submits it themselves to recover their whole deposit, delaying everyone else in the process.

As we can see, our solution is generic in the way it solves the leakage issue in the P2P network. However, it has to be included in the design phase of an application, and many parameters (incentives, latency, etc.) need to be carefully evaluated depending on the application.

In Hangzhou, we propose introducing into Michelson an opcode (OPEN_CHEST) and two types (chest and chest_key) that allow Timelock-encrypted values to be used inside a Michelson contract. See the technical documentation for more information.

Caveats

Although Timelock solves the core of the problem of BPEV, it also moves part of the complexity of validating an operation to the smart contract. In the traditional setting, a number of errors can be detected during validation by the block producer, which can for example mark the transaction as invalid (all the while retaining the fee). Once this information is encrypted this is no longer possible. Special cases such as decryption and deserialization errors (see the technical documentation for more details) are inherently application-specific issues which we therefore leave to the designer of the specific smart contract application to address.

Other uses

While the problem we wished to solve was BPEV, Timelock might also be useful for other applications, for instance this self-tallying, fully decentralized voting system. Intuitively, as we outline in this blog post, Timelock brings some notion of fairness to interactive protocols with respect to the ordering of messages. We hope other creative uses can be made of the instruction we propose.

Conclusion

Thanks to the introduction of Timelock in the Michelson smart contract language, transactions can hide sensitive information at the P2P layer, preventing any malicious reaction before the consensus layer orders them. We believe our solution tackles the BPEV problem at its core, at the price of additional complexity during smart contract design. We hope Timelock will help protect the next generation of DeFi applications in the Tezos ecosystem from the threat of BPEV.

To be fair, not all blockchains are permissionless: a permissioned blockchain could address MEV/BPEV by fiat, by only admitting “good” block producers. We would expect this to be just as effective as (for example) the traditional global financial network is effective at ensuring honest behaviour by licensing only “good” brokers.

Setting aside how realistic honesty-by-fiat can be in general, for Tezos it begs the question, by replacing one problem (“How to ensure brokers stay honest”) with two (“How to ensure the permissioning authority remains honest” and “How to ensure brokers stay honest”). We in Tezos believe in inclusivity and decentralization by design, so our brief is to design a blockchain that is robust against MEV/BPEV without requiring a fiat permissioning authority. We believe that this is possible, feasible, and furthermore that this is the correct engineering solution; whence the topic of this blog post. ↩
MEV/BPEV is also called generalized front running. We deprecate this terminology since it might suggest to the reader that there is a fiduciary relationship between block producers and transaction emitters when, in fact, none exists unless explicitly contracted into. ↩
Imagine playing rock-paper-scissors with a three-armed alien child using commitments: it just puts down rock, paper, and scissors, one with each hand, then reveals whichever hand wins. ↩

Announcing new internship subjects for 2021!

2021-09-27T16:00:00+02:00

Are you a student in Computer Science or Cryptography? Are you excited about tackling complex programming problems, eager to acquire a deeper knowledge in your field of interest, and happy to work with talented and skilled staff to build a free, open-source, decentralized blockchain architecture dedicated to social good?

Then the Nomadic Labs internship program might be for you!

We at Nomadic Labs are one of the premier research and development centers of the Tezos ecosystem. We work on the core development, evolution, and adoption of this self-amendable blockchain protocol in France, Luxembourg, Belgium, and elsewhere. More than fifty talented engineers work with us, mixing industrial and academic skills and applying their expertise in distributed, decentralized and formally verified software.

In the previous internship program, eight interns spent between three and six months working with us on various subjects, from peer-to-peer networking to building testing tools. They came from Université de Paris, the École normale supérieure, the École Polytechnique, and even from the Universidad Nacional de Rosario in Argentina.

By joining one of our teams as an intern, you will grow your skills in a collegiate and collaborative space, and help to build a free and open-source decentralized ecosystem that is as committed to doing social good, as it is to technical excellence.

Internship subjects

We are offering the following list of 13 internship topics — or you can come to us to propose your own (see below).

Cartography, monitoring and analysis of the p2p network: build and use cartographer nodes to analyze network metrics and topology.
Ad-hoc Static Analysis of Octez: extend or develop a pragmatic tool to analyze the Octez codebase and provide metrics to improve code quality.
Memory footprint analysis of OCaml concurrent programs: profile the node to evaluate the overhead of the promise library currently in use.
FAT CAT, Formal Acceptance Testing of Contracts for Administering Tokens: use the Coq proof assistant to formally verify token smart contracts and provide a testing tool for developers of such contracts.
Generation of Scenario Tests: build a syntax extension tool that generates test scenarios and reports from OCaml function specifications.
Tezos-metrics, Live Monitoring of Tezos Nodes: enhance an existing monitoring tool and add live analysis of the protocol execution.
MechaTez, Formally Verifying Critical Features of Tezos Protocols: extend the existing Coq formal verification of the protocol to target new features.
Integrating static analysis in smart contract development tools: build an LSP-based interface for Michelson and bring it to actual development tools.
How to Reason on Traces between Tezos Nodes: build a logical framework to analyze traces of messages exchanged by Tezos nodes.
Contribute to the next Tezos blockchain protocol amendment: build a new feature for the protocol, from start to finish.
React/Reason programming on a wallet application in a blockchain setting: take part in the development of the umami-wallet using React and ReasonML.
Improve our Formal Verification Framework: design and implement formal tools in an industrial setting.
Comparing Program Proof Tools for OCaml Programs in an Industrial Setting: benchmark formal verification tools for OCaml, and experiment especially with FreeSpec.

You can also propose your own project, provided it is close to Tezos or the work we do at Nomadic Labs. Please just describe your interests and outline your ideal internship, and we will be happy to discuss developing the topic together.

To submit your application, please send an email to internship@nomadic-labs.com, or follow the link for the desired subject and fill in the form at the bottom of the page.

Are you a lecturer who would like to share these internships with your students? Thank you sincerely for your interest: please feel free to share our 2021-22 internship catalog.

Three questions to Nomadic Labs apprentices — Killian Delarue

2021-08-27T14:00:00+02:00

Nomadic Labs is delighted to host apprentices (apprentis). These are students — usually studying for an undergraduate or Master’s degree — who work with us in parallel with their studies for a period of one to two years. Each apprentice has a unique story to tell about how their understanding of blockchain as a practical and applied technology has developed during their time with us.

In this blogpost, we will ask three questions of one of our current apprentices: Killian Delarue (and a couple of questions of his mentor at Nomadic Labs).

Killian — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Killian

1. Please present yourself and your academic background

My name is Killian Delarue, I’m a 24-year-old student and an apprentice at Nomadic Labs.

I’m in the final year of a five-year degree in computer science and software engineering at ENSIIE, graduating in September 2021. My subject specialization is functional programming and formal methods.

In summer 2020 I did a three-month internship at Nomadic Labs, and I was very pleased in September 2020 to return on a professional development contract. This means that I get to work on a project with Nomadic Labs, in parallel with the final year of my engineering studies. The contract will continue until my graduation in September — so it’s a full year of apprenticeship with one of the best companies in the sector!

2. Tell us more about your apprenticeship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

I’m working with the Node and Tooling team, mentored by Julien Tesson, on a project to build a terminal user interface for launching and interacting with a Tezos node. My project aims to improve the user experience of Tezos: it provides a clean, easy, and flexible way to run and monitor a Tezos node in your terminal.

My project requires me to understand the architecture of Nomadic Labs’ software and to carry out an efficient technical development within it. I already had some familiarity with the basics of functional programming (especially OCaml) from my engineering school, but this project has taught me more advanced aspects of programming, and more about the OCaml toolstack.

This project has also helped me to develop new communication skills, as it requires close coordination with the development teams of Nomadic Labs. We have to discover, understand, and take into account the needs and suggestions of the developers and users, and conversely I have to communicate back to them about the progress of the project.

Working at Nomadic Labs has been an opportunity to be part of a team of awesome people with great human values and a great attitude to mentorship. Undergraduate courses are done by individuals but software is created by teams, so it has been a delight to learn how important it is to be technically competent and also well-integrated working with members of the team.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this apprenticeship?

Before my 2020 summer internship at Nomadic Labs and my year’s apprenticeship since then, I didn’t know much about blockchain. I heard about Nomadic Labs and Tezos from a teacher at school. I started to read Tezos project documentation and found myself fascinated by the algorithms, the techniques, the libraries used, and in general the detailed software architectures that help achieve robustness in a large-scale safety-critical distributed system. Also, I like that it is open-source and uses OCaml.

So I was delighted to get an apprenticeship with Nomadic Labs to develop my technical skills while learning more about OCaml, the blockchain ecosystem, and about Tezos in particular.

I have found the experience thrilling and hope to keep working in the OCaml community, and hopefully within the blockchain ecosystem.

Questions for Killian’s Nomadic Labs mentor Julien Tesson.

What is your input to the work carried out by Killian?

We started the project from scratch at the beginning of Killian’s apprenticeship back in September 2020, and he has become highly committed to it. Killian did a very good job of designing his prototype, and we have had some great discussions on the project architecture and on the user experience offered.

During the year of Killian’s apprenticeship it’s been my pleasure to watch Killian learn and evolve as a software engineer, and I have been impressed by his determination and his progress. His apprenticeship with us will end soon, when he graduates, and I wish Killian all the best for his future!

Why is the topic of this apprenticeship important?

Currently, you can download Octez, the Nomadic Labs implementation of Tezos, and fire up the command tezos-node to run a Tezos node (tutorial here). This is all good, but even a professional developer would agree that the tezos-node logging output is not the easiest to read.

A command-line client exists which can query a node for more information, but it does not necessarily deliver more easy-to-read information. Developers also have specific tools to monitor node health — you launch a dedicated program to query the node, that feeds a Prometheus service that itself feeds a Grafana service, which serves you a webpage. This is powerful, but not necessarily well-suited to every end-user.

Killian is developing a lightweight and user-friendly standalone prototype control utility with which a user can run a node and interact with it, getting meaningful and easily digested diagnostic information from the node, and issuing commands via the utility to control the node’s behaviour. Think: “a dashboard for Tezos nodes”.

This is quite a large project that would greatly improve the user experience, and Killian’s prototype has laid the foundations for it. I am very pleased with his work.

Three questions to Nomadic Labs apprentices — Daniel Jean

2021-08-20T14:00:00+02:00

In this blogpost, we will ask three questions of one of our current apprentices: Daniel Jean (and a couple of questions of his mentor at Nomadic Labs).

Daniel — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Daniel

1. Please present yourself and your academic background

I am Daniel Jean. I’m in the first year of a two-year Master’s degree in financial engineering at ESILV, with a specialisation in fintech. From 2018-2020 I did a three-year Bachelor’s degree in computing at the Université de Paris.

I am also an enthusiastic Judo player. I started when I was three and represented France at the 2018 European and World championships in Israel and Azerbaijan.

2. Tell us more about your apprenticeship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

With the growing adoption of blockchain, we are witnessing vigorous interest from users, companies, banks, and institutions for the technology in general and for the (rapidly developing) Tezos blockchain in particular.

Tezos has some unique features which in principle make it well-suited to industrial adoption: it’s fully open-source; it has an on-chain governance structure; and it’s particularly flexible with its regular auto-update feature. Indeed, Tezos builds in upgrade propositions and an associated voting mechanism which allows users to vote for or against proposed updates in a fully transparent manner.

This is fine in high-level terms — but blockchain is an industry that is very new and very competitive. So concretely, we still need to communicate the benefits and potential of the Tezos ecosystem, and discuss with companies like Exaion, Ubisoft and Société Générale how integration with Tezos software can meet their current and future business needs. Conversely, we also need to understand our users’ needs and feed these back to Nomadic Labs’ developers.

This process of communication and consensus-building is vital, and that’s why a good support team is needed. It’s never just about the tech; it’s always also about how that tech is communicated, developed, and supported.

As Nomadic Labs’ support team, we help users to experiment, test, and build on the Tezos blockchain. We design case studies and proofs of concept, and support the creation and deployment of solutions to the Tezos main network. We help users to solve issues, and create documentation to explain the core principles of Tezos and so make it easy for newcomers to join the Tezos ecosystem.

My mentor is Sébastien Choukroun. He has been a very helpful and engaged supervisor and I have learned a lot from him and from the rest of the support team.

I have been working at Nomadic Labs since August 2020. It has been a very fruitful opportunity to strengthen my knowledge of blockchain in general and deepen my expertise in Tezos in particular. I have learned about the core features of Tezos and how the different components of this complex software work. My discussions with institutional users have taught me how to listen effectively to users’ needs and deliver effective support, whether that means helping to launch Tezos nodes and new bakers or reviewing smart contracts.

Tezos brings the third pillar of blockchain technology to the industry: the ability to evolve, through its core governance principle of self-amendment. Tezos is capable of evolving every three months to improve its software and make it more robust, scalable and user-friendly. Self-amendment, and the technical robustness which it implies, can be a key differentiator in creating a sustainable network and securing long-term adoption.

For instance, I was delighted when Tezos users recently accepted the Granada amendment of the protocol, bringing improvements to scalability, to the liquidity of XTZ (the native coin of Tezos), and to gas consumption (which is related to the fees paid when performing operations on the blockchain).

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this apprenticeship?

I heard about blockchains three years ago after the 2018 crash, and I was curious about this new geek “thing” to allow people to exchange value and expand financial technology through smart contracts. I started to read about blockchain technology and before I knew it, I had fallen into the rabbit hole (as we say in the ecosystem). I’ve been happily in it ever since.

I learned about the basic principles of Tezos and found the blockchain really interesting, technically speaking. During a discussion with a friend, I discovered that Nomadic Labs was working on Tezos and that it could be a great place to learn and deepen my knowledge of the software. I jumped at the chance to apply at Nomadic Labs … and here I am!

After my apprenticeship, I plan to get further involved in the blockchain industry, hopefully as an engineer at Nomadic Labs. I am also delighted to run a Tezos baker, Baking.Finance, and I plan to keep maintaining it in the future.

Questions for Daniel’s Nomadic Labs mentor Sébastien Choukroun

What is your input to the work carried out by Daniel?

I helped Daniel to focus on priorities amongst the projects he worked on. I also gave him feedback on project management. This provided direction for his answers to users and for preparing materials for Tezos users. I also shared advice on good practice for blockchain projects with financial institutions.

Why is the topic of this apprenticeship important?

This apprenticeship helped users to interact with Tezos, in particular Exaion, Ubisoft and Société Générale. Daniel’s experience as a baker allowed him to share good practice and handle potential issues faced by these corporate bakers.

Three questions to Nomadic Labs interns — Julien Coolen

2021-08-16T14:00:00+02:00

Each year we host around half-a-dozen interns (stagiaires), for periods of two to six months. These young men and women are usually studying or have just finished studying for undergraduate or Master’s degrees, and are eager to gain experience in the blockchain industry. Each has a unique story to tell about their individual interest in this technology.

In this blogpost, we will ask three questions of one of our current interns: Julien Coolen (and a couple of questions of his mentors at Nomadic Labs).

Julien — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Julien

1. Please present yourself and your academic background

I am a first-year master’s student in cryptology (double master Mathématiques, Informatique de la cryptologie et sécurité) at the Université de Paris. I graduated in 2020 with bachelor’s degrees in Mathematics and Computer Science.

Being fond of open research, open source, and formal verification, I tried to gain experience in these areas. In 2019 I worked as a system administrator intern to configure and deploy a digital library for my mathematics faculty. Then in 2020 a friend and I attempted to program and prove the correctness of an LL(1) parser using the Coq proof assistant. We thank Yann Régis-Gianas for this learning opportunity. Also in 2020, I developed features for an OCaml bot to simplify collaborative software development, under the supervision of Théo Zimmermann at Inria.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

I am developing a distributed hash table (DHT) for file sharing during a three-month internship from June to August 2021, under the supervision and mentorship of research engineers Vivien Pelletier and Julien Tesson at Nomadic Labs.

A hash table is a key-value store; you need a key to retrieve each piece of stored information. However, as your store grows, you become limited by computer memory. The distributed nature of DHTs addresses this issue: the data is distributed across different computers on the internet. An example of a DHT is the IPFS distributed web.
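The post does not describe the chosen DHT design beyond its use of the Tezos P2P library, so the following is only a single-process sketch of one common approach, Kademlia-style XOR distance: each key is stored on the peer whose identifier is closest to the key’s hash. The class and function names are hypothetical, and networking, replication and peer churn are deliberately left out.

```python
import hashlib
from typing import Optional

def ident(name: str) -> int:
    """160-bit identifier derived with SHA-1, as Kademlia does (an assumption here)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class ToyDHT:
    """All peers live in one process; a real DHT routes each request over the network."""

    def __init__(self, peers: list[str]):
        self.ids = {peer: ident(peer) for peer in peers}
        self.stores = {peer: {} for peer in peers}  # each peer's local key-value store

    def responsible_peer(self, key: str) -> str:
        # The peer whose identifier is closest to the key's identifier under XOR.
        k = ident(key)
        return min(self.ids, key=lambda peer: self.ids[peer] ^ k)

    def put(self, key: str, value: bytes) -> None:
        self.stores[self.responsible_peer(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.stores[self.responsible_peer(key)].get(key)

dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.put("report.pdf", b"...file bytes...")
print(dht.get("report.pdf"))  # b'...file bytes...'
```

Since every peer can compute `responsible_peer` for itself, no central index is needed, and each peer holds only its share of the keys rather than the whole store.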

There are several ways to implement a DHT. We have chosen one that exploits the Tezos peer-to-peer (P2P) library as an off-the-shelf and well-tested component. With it, the distributed parts of the distributed hash table (called peers) can communicate.

This project aims to test the architecture of the peer-to-peer library of Tezos in these somewhat different circumstances:

In the Tezos blockchain, connections between peers are arbitrary, whereas in a DHT connections follow specific patterns and rules.
The size of a message sent across the Tezos network is typically a few kilobytes, and only occasionally megabytes. The Tezos network protocol never requires transferring a message larger than 100 megabytes; meanwhile, the serialization library (which interfaces between the Tezos network protocol and the peer-to-peer library itself) is limited to a message size of 1 gigabyte.

My DHT places no theoretical limit on message size (though there are practical ones, e.g. disk space), so we intend to benchmark and then optimize the performance of the peer-to-peer network using my DHT, to see how far we can scale practical message size. We hope to reach at least gigabyte-sized messages, which would put the practical capabilities of the peer-to-peer library a thousandfold above what the Tezos network requires.

After a bit more than a month, I (with help from my mentors) have built a prototype of DHT and tested it using the library for unit and integration testing from Tezos. To achieve this, I had to familiarize myself with part of the Octez codebase, which is the Tezos implementation to which Nomadic Labs contributes.

I thank Vivien and Julien for explaining the innards of a full-scale industrial codebase running live and safety-critical code. This experience has also brought home to me how just one tiny mistake in a distributed system can corrupt the entire network!

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I chose Nomadic Labs because the company applies formal methods to build robust, open-source software with the OCaml programming language. In particular, their smart-contract stack-based language Michelson is the only one of its kind to be formalized!

Blockchains are in an exploratory phase, so they raise many technical challenges and create many opportunities for innovation and discovery. The end goal is to build resilient, transparent, and accessible systems.

After this internship, I hope to continue working with functional programming languages and formal methods.

Questions for Julien’s Nomadic Labs mentors Vivien Pelletier and Julien Tesson.

What is your input to the work of Julien?

Julien is independent, and in a good way. Our supervision consists mostly of discussing objectives for the week and answering his many penetrating technical questions.

Why is the topic of this internship important?

This peer-to-peer layer is tailored to the OCaml Tezos node implementation (which is now called Octez). By using this layer for other purposes — sharing files, in this case — we hope to improve its maturity.

In developing a distributed hash table, Julien had to add some features to this layer that should be useful for future improvements of the Tezos node; for example, the ability to look up the network address of a peer, given its cryptographic identity. Another feature that Julien plans to add is to let users of this layer configure the topology to their needs.

Network Updates from the Granada Protocol Amendment

2021-08-13T08:01:00+02:00

The Granada protocol amendment was activated at block 1589248 on Friday 6 August 2021 at 11:36 AM CEST. Nomadic Labs would like to thank the community, bakers, and development partners for their diligent assistance in developing and adopting this proposal.

On May 31st, Nomadic Labs, together with Marigold, Oxhead Alpha, Tarides, and DaiLambda, proposed an amendment to the Tezos protocol named Granada. Tezos’ on-chain governance procedure enabled the adoption of the proposal by the community and the subsequent activation of Granada.

Granada introduced three important upgrades:

Liquidity Baking: 2.5 tez are minted at each block to incentivize liquidity providers of a tzBTC/tez pair
Emmy*: a change in the consensus algorithm making blocks up to twice as frequent
Gas improvements coming from a refactoring of the Michelson interpreter and an optimization of data serialization.

Since August 10, 2021 23:19 CEST, the Granada protocol has run on Tezos mainnet for a complete cycle (8192 blocks). Below, Nomadic Labs is pleased to share data from the first complete cycle and an analysis of the impact of these three features.
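As a quick check on these figures, the arithmetic below gives the liquidity baking issuance over one cycle, assuming the 2.5 tez subsidy is minted at every one of the 8192 blocks, and the shortest possible cycle duration at the minimal block delays described in the Emmy* section (60 seconds before Granada, 30 seconds after). The function names are illustrative.

```python
BLOCKS_PER_CYCLE = 8192  # from the text
LB_SUBSIDY_TEZ = 2.5     # tez minted per block for liquidity baking

def lb_subsidy_per_cycle(blocks: int = BLOCKS_PER_CYCLE) -> float:
    """Total liquidity baking subsidy over a cycle, in tez."""
    return LB_SUBSIDY_TEZ * blocks

def min_cycle_hours(block_delay_s: int, blocks: int = BLOCKS_PER_CYCLE) -> float:
    """Lower bound on a cycle's duration if every block arrives at the minimal delay."""
    return block_delay_s * blocks / 3600

print(lb_subsidy_per_cycle())         # 20480.0
print(round(min_cycle_hours(60), 1))  # 136.5 (pre-Granada minimum)
print(round(min_cycle_hours(30), 1))  # 68.3  (Granada minimum)
```

These are lower bounds: any block produced after the minimal delay lengthens the cycle.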

Liquidity Baking

Liquidity baking is a primary example of the utility of Tezos’ on-chain governance to provide for public goods that go beyond securing the network.

The liquidity baking proposal is an experiment to see whether a decentralised protocol can use incentives to create liquidity around its native token. As of publication, over 980,000 tez and 72.3 tzBTC (about $6.8M worth) were deposited in the liquidity baking contract, demonstrating a successful deployment of the contract and concept.

During the first cycle of Granada, total value locked in the liquidity baking contract steadily increased. In the first few hours after going live, over 2 million USD worth of coins (24.19 tzBTC + 331,000 tez) were deposited.

Graph of the total value locked (TVL), courtesy of liquidity-baking.com

Community members can track live statistics on liquidity baking on the statistics page developed by Tessellated Geometry — liquidity-baking.com.

Emmy*

The Emmy* consensus algorithm is one of the most important changes to the core protocol of Tezos in Granada. Emmy* tweaks the block delay formula to allow blocks to be produced more frequently.

Emmy* is the culmination of multiple updates to the consensus algorithm. In a recent survey we have detailed the various flavours of the Emmy family of consensus algorithms.

Before Granada, the minimum delay between blocks on Tezos mainnet was set to 60 seconds. When the network was perfectly healthy (with all bakers and endorsers online and able to quickly produce, validate, and transmit blocks and endorsements), the chain advanced at a speed of one block per minute. When network health dropped, however, the chain reacted in order to maintain consensus. This mechanism depends on the number of endorsements that a baker includes in a given block: each block contained 32 endorsement slots, and if fewer than 2/3 of these slots were filled, the baker had to wait longer before publishing its block (the fewer slots filled, the longer the delay).

Emmy* does not fundamentally change this behavior, but it changes the parameters in order to decrease the latency of the chain. In Granada, each block contains 256 endorsing slots (instead of 32), the minimal delay between blocks is 30 seconds (instead of 60), and the required proportion of endorsement slots needed to publish a block after 30 seconds is 60%.
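The parameter change can be illustrated with a small Python sketch. It is a simplified model built only from the description above: a priority-0 block can be published after the minimal delay once the required share of endorsement slots is filled, and each missing endorsement or each step down in priority adds waiting time. The slot counts, minimal delays and thresholds (2/3 of 32, 60% of 256) come from the text; the `per_priority` and `per_missing` increments are placeholder assumptions, and the actual protocol formula differs in its details.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayParams:
    slots: int              # endorsement slots per block
    minimal_delay: int      # seconds, for priority 0 with enough endorsements
    threshold: float        # share of slots needed to publish after minimal_delay
    per_priority: int = 40  # placeholder assumption: extra seconds per priority step
    per_missing: int = 8    # placeholder assumption: extra seconds per missing endorsement

    def delay(self, priority: int, endorsements: int) -> int:
        """Simplified delay (seconds) before a block may be published."""
        required = math.ceil(self.threshold * self.slots)
        missing = max(0, required - endorsements)
        return self.minimal_delay + self.per_priority * priority + self.per_missing * missing


# Thresholds and minimal delays from the text; increments are assumptions.
FLORENCE = DelayParams(slots=32, minimal_delay=60, threshold=2 / 3)
GRANADA = DelayParams(slots=256, minimal_delay=30, threshold=0.6, per_missing=4)

print(FLORENCE.delay(0, 32), GRANADA.delay(0, 256))  # 60 30
print(FLORENCE.delay(0, 20), GRANADA.delay(0, 150))  # 76 46
```

Changing only the `DelayParams` values, not the `delay` formula, mirrors the statement that Emmy* changes parameters rather than the underlying behaviour.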

Figure 1 shows the level of each block of the last cycle of Florence (cycle 387) and the first cycle of Granada (cycle 388) as a function of the time the block was produced.

Block Times

Activation of Granada and Emmy* caused a temporary slowdown of the chain. Blocks were produced on average every 2 minutes and 11 seconds, with a maximum time between blocks of roughly 15 minutes.

Our investigations uncovered several issues. First, blocks that arrived too late were not endorsed at all, causing the next block to be produced after the maximal block delay (almost 15 minutes for a block baked at priority 0). Version 9.6 of Octez was released on Friday at about 5PM CEST to mitigate the delay. The update increased the delay after which the endorser gives up on endorsing to 1200 seconds (previously 110 seconds).

Version 9.6 of Octez ensured that no blocks were baked completely without endorsements, solving a key aspect of the slowdown. After further research, we noticed that numerous computations took place during the handling of pending consensus operations. Fixes for this problem were included in version 9.7 of Octez, released on Saturday August 7 at about 10 PM CEST.

Bakers reacted quickly and updated to the new versions of Octez. This time, the rate at which blocks were produced and propagated on Mainnet improved significantly: shortly after the release of Octez 9.7, the average delay between blocks decreased to almost 30 seconds.

Taking as reference the block production rate of mainnet before the activation of Granada (one block per minute), it took 3 days, 6 hours, and 51 minutes for the chain to catch up, at block 1593978 (the first 4731 blocks of Granada were produced in 4731 minutes).
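The catch-up figure is easy to check. The Python snippet below assumes the count runs inclusively from level 1,589,248 (the level at which, according to the announcement, Granada was adopted) to level 1593978, at the reference rate of one block per minute.

```python
from datetime import timedelta

ADOPTION_LEVEL = 1_589_248   # assumption: first level counted as Granada
CATCH_UP_LEVEL = 1_593_978   # level at which the chain caught up

blocks = CATCH_UP_LEVEL - ADOPTION_LEVEL + 1   # inclusive count of levels
elapsed = timedelta(minutes=blocks)            # reference rate: one block per minute

print(blocks)    # 4731
print(elapsed)   # 3 days, 6:51:00
```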

Remaining Issues

For the rest of the cycle, the chain has run at the expected speed but problems remain: endorsements are still being missed more often than expected. We are continuing to monitor the situation and to investigate the potential causes of the missing endorsements.

At the moment, we are leaning towards the conclusion that, with Emmy*, block propagation itself may be reaching the limits of what some bakers are capable of sustaining. We are therefore investigating several ways to improve block propagation, with possible optimizations such as validating only block headers before propagating blocks.

We would like to thank the community for their continuous support. We are especially grateful to everyone who promptly upgraded their nodes, tested patches, and provided feedback. The collaboration and communication of the community and bakers contributed to our ability to identify and mitigate the network issues.

If you are running a node and haven’t upgraded it to version 9.7 yet, we strongly advise you to do so.

Gas improvements

As we explained in a recent post, Granada includes two important gas optimizations: a refactoring of the Michelson interpreter and memoization of the serialization function for recursive types.

In order to evaluate the impact of these gas improvements, we measured the gas consumption of all contract calls in the last cycle of Florence (cycle 387) and in the first cycle of Granada (cycle 388). There is a lot of variation in the gas consumed by smart contract calls, because the called contracts can be very different. To make the result smoother and more readable, we averaged the measured gas consumption over spans of 1000 smart contract calls.

The result can be seen in the figure below:

Overall, average gas per smart contract call decreased by a factor of 5, from 43197 gas units per call in cycle 387 to 8591 gas units per call in cycle 388.
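A minimal Python sketch of the smoothing and of the headline ratio. The post does not say whether the 1000-call spans overlap; the sketch assumes consecutive, non-overlapping windows (a sliding window would be an equally plausible reading), and `window_averages` is a hypothetical helper.

```python
def window_averages(gas_per_call, window=1000):
    """Average gas over consecutive, non-overlapping windows of `window` calls.

    A trailing partial window is dropped.
    """
    return [
        sum(gas_per_call[i:i + window]) / window
        for i in range(0, len(gas_per_call) - window + 1, window)
    ]


florence_avg, granada_avg = 43197, 8591   # per-call averages reported above
print(round(florence_avg / granada_avg, 2))   # 5.03
```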

Conclusion

Thanks to the activation of the Granada amendment, more than 5 million USD worth of liquidity has been provided to the newly deployed Liquidity Baking contract, the Tezos chain is moving almost twice as fast as before, and the gas consumption of smart contract calls has been reduced by a factor of 5.

We acknowledge that the migration was not as smooth as usual. All Tezos dev teams understand the inconvenience caused by the slowdown of the chain and by missed endorsements. This was caused by a combination of issues, some of which still require investigation. We would like to thank community members and bakers for their responsiveness in applying the emergency Octez updates.

Three questions to Nomadic Labs interns — Valentin Chaboche

2021-08-06T14:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Valentin Chaboche (and a couple of questions of his mentors at Nomadic Labs).

Valentin — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Valentin

1. Please present yourself and your academic background

My name is Valentin Chaboche, I am 23 years old and a student at the Université de Paris.

In 2019 I graduated from the Université Paris Diderot 7 with a bachelor’s degree in Computer Science, then I stayed for a master’s degree in Language et Programmation, graduating in 2021. The course content includes algorithms and programming, distributed systems, and formal verification.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

I am doing an internship at Nomadic Labs with the verification team under the supervision of Arvid Jakobsson and Zaynah Dargaye, from March to September 2021.

Nomadic Labs is always working to improve the quality of the Tezos codebase and ensure its correct behaviour, in particular through mass testing of the codebase and reduction of boilerplate code. My internship aims to contribute as follows:

Develop an annotation-based system for specifying invariants of OCaml functions.
Implement a preprocessor that transforms the annotations to property-based tests.
Explore heuristics for automatically constructing data-generators for types of particular relevance to the Octez codebase (the OCaml Tezos implementation).

My internship has exposed me to many new ideas, including formal verification of programs, blockchains and distributed systems, and metaprogramming. This is also my first experience in a collaborative open-source project. During my internship I have had the opportunity to develop and release libraries, create and review merge requests in Tezos, and participate in open-source projects used in Tezos. I had to learn to communicate and transfer knowledge about my work at Nomadic Labs within development teams, through meetings in small groups, documents, and team presentations.

During the three months of my internship so far, I have been surprised and delighted by the kindness and goodwill of people at Nomadic Labs and in the OCaml ecosystem. They have always taken the time to help me with technical and theoretical questions. I’ve found it to be a welcoming environment for a newcomer like me and I am grateful to have had an opportunity to be part of such a great community.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I strongly believe in distributed and decentralized systems, so participating in such a complex and rich project was a great opportunity for me! Nomadic Labs hosts very smart engineers — and I have discovered that a lot of theoretical insight goes into designing Tezos.

This internship has been an opportunity to work with and learn from talented software engineers and researchers. It’s the perfect opportunity for an OCaml enthusiast such as myself to participate in the development of a complex OCaml codebase, with sophisticated programming features, all while working on one of the leading projects of the OCaml ecosystem.

Questions for Valentin’s Nomadic Labs mentors Arvid Jakobsson and Zaynah Dargaye

What is your input to the work of Valentin?

Valentin already had good familiarity with algorithms and programming, and with the development and versioning tools which we use at Nomadic Labs. Our role was to teach him how industrial research and development is carried out: study the available state of the art, consider and select a solution that seems likely to work, and develop it incrementally as directed by relevant use cases.

Why is the topic of this internship important?

At Nomadic Labs, we are constantly looking to assure the high quality of our code.

Property-based testing is a lightweight style of testing, grounded in formal methods, that lies between full formal verification and unit tests. It relies on stating correctness properties of code, and then using the computer’s own computational power to generate huge numbers of unit tests from test-script heuristics which, in practice, are highly effective at finding corner cases, often more so than humans.
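The core loop of property-based testing fits in a few lines. The sketch below is a deliberately tiny, hand-rolled Python harness (`check_property` and `random_int_list` are hypothetical names, not the tooling used at Nomadic Labs): a property is a boolean function, a generator produces random inputs, and the harness reports the first counterexample it finds. Real frameworks add features such as shrinking counterexamples to minimal ones, which this sketch omits.

```python
import random


def check_property(prop, generate, runs=1000, seed=0):
    """Run `prop` on `runs` generated inputs; return the first counterexample, or None."""
    rng = random.Random(seed)   # fixed seed so failures are reproducible
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    """Generator: a list of 0 to 20 integers in [-100, 100]."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


# A true property: reversing a list twice gives back the original list.
print(check_property(lambda xs: list(reversed(list(reversed(xs)))) == xs, random_int_list))
# A false "property" (sorting is the identity): the harness finds a counterexample.
print(check_property(lambda xs: sorted(xs) == xs, random_int_list))
```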

Our experience with property-based testing has revealed some practical bottlenecks to its more widespread application. In particular, we have found that writing input-set generators and test-scripts can be time-consuming, requires expertise in the tooling, and can become a source of bugs.

Valentin’s internship is exploring techniques to automate such steps, and so to enable a more widespread practice of property-based testing in the validation phase of developments at Nomadic Labs, and more generally for developments in the OCaml language. This would ensure a higher-quality codebase while still making relatively light demands on our developers to provide the information needed to generate property-based tests.

Valentin’s first results are promising. He has been able to validate his approach on concrete use cases in our team’s developments, and we expect other software components could soon benefit from it too.

Granada, the latest Tezos upgrade, is LIVE

2021-08-06T08:00:00+02:00

This is a joint announcement from Nomadic Labs, Marigold, Oxhead Alpha, Tarides and DaiLambda.

On 6 August 2021, the Tezos blockchain successfully upgraded by adopting Granada at block #1,589,248. Granada was jointly developed by Nomadic Labs, Marigold, Oxhead Alpha, Tarides and DaiLambda. It follows the Florence protocol upgrade of three months ago, and is the seventh Tezos upgrade since Athens’ activation in May 2019 (overviews here and here).

Granada includes several bug fixes and small improvements, as well as the following substantive changes:

Emmy*: Granada brings faster finality to Tezos. With the Emmy* update of the consensus algorithm, the time between blocks can now be as low as 30 seconds. Also the number of endorsement slots per block has been increased from 32 to 256. Other constants including rewards and security deposits have been updated.
Liquidity Baking: A decentralized exchange between tez and wrapped bitcoins (via tzBTC) is deployed at activation, together with a mechanism to incentivize decentralized provision of liquidity.
Gas Improvements: a refactoring of the Michelson interpreter has led to significant performance improvements (typically a three- to six-fold decrease for already deployed contracts). Note that, unfortunately, this refactoring introduced a non-critical bug for smart contracts using the COMPARE operator.

Congratulations to everyone involved in the development of this amendment and welcome to the Tezos blockchain, Granada!

We intend to inject our next proposal “H” before the end of September. It has yet more interesting features, large and small. Stay tuned, and happy baking!

Follow-up on the verification of Liquidity Baking smart contracts

2021-08-03T18:00:00+02:00

Granada has passed all three stages of voting and will be activated at the end of the Adoption period, which is expected to take place this week, on August 6. Liquidity Baking is one of the key features of Granada. We already wrote a first progress report on the verification of Liquidity Baking smart contracts, in which we provided an in-depth description of our effort to provide strong safety and correctness assurances. In this post, we give a short update on our verification efforts around Liquidity Baking smart contracts.

In a nutshell, these efforts can be divided into two approaches: (1) formal verification of the CPMM contract using the Mi-Cho-Coq framework, and (2) intensive integration testing using property-based testing frameworks. Notably, the tests implemented in (2) focused on the hypotheses used to conduct our formal proofs in (1). Recently, we decided to extend the scope of our integration tests to also include some key properties proven in Coq using Mi-Cho-Coq. In particular, we now test that:

All three balances of the CPMM contract are always strictly positive
The CPMM invariant (see its definition here) is strictly increasing.

Including in our integration tests properties that have already been proven in Coq may seem redundant, but these tests will act as regression tests in case the CPMM contract is ever changed in a future protocol update. Furthermore, it is reassuring to have results validated by two different methods.
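For readers unfamiliar with such tests, here is a minimal Python sketch of the same idea on a deliberately simplified model. It assumes a plain constant-product pool with a hypothetical 0.1% swap fee, models only swaps (so it tracks the two pool balances; the liquidity-token supply, the third balance, does not change during a swap in this model), and uses the product of the two balances as a stand-in invariant. The actual Liquidity Baking contract, its fee and burn rules, and its precise invariant (defined in the linked post) are not modelled here.

```python
import random

FEE_NUM, FEE_DEN = 999, 1000   # hypothetical 0.1% fee, for illustration only


def swap_out(amount_in, reserve_in, reserve_out):
    """Output of a constant-product swap with a fee, rounded down (integer units)."""
    effective_in = amount_in * FEE_NUM
    return (effective_in * reserve_out) // (reserve_in * FEE_DEN + effective_in)


def apply_swap(pool, amount_in, tez_to_token):
    """Return the new (tez, token) pool after swapping `amount_in` in one direction."""
    xtz, tok = pool
    if tez_to_token:
        return (xtz + amount_in, tok - swap_out(amount_in, xtz, tok))
    return (xtz - swap_out(amount_in, tok, xtz), tok + amount_in)


def check_cpmm_properties(runs=2000, seed=42):
    """Random sequences of swaps; fail with an AssertionError on any violation."""
    rng = random.Random(seed)
    for _ in range(runs):
        pool = (rng.randint(1, 10**9), rng.randint(1, 10**9))
        for _ in range(rng.randint(1, 20)):
            before = pool[0] * pool[1]
            pool = apply_swap(pool, rng.randint(1, 10**9), rng.random() < 0.5)
            # Property 1: both pool balances stay strictly positive.
            assert pool[0] > 0 and pool[1] > 0, pool
            # Property 2: the simplified invariant x * y strictly increases.
            assert pool[0] * pool[1] > before, pool
    return True


print(check_cpmm_properties())
```

In this model the fee and the rounding-down of outputs are what make the product strictly increase; without a fee and with exact arithmetic it would stay constant.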

Other audits and verification efforts

Speaking of different methods, we would also like to highlight that audits and verification efforts for the Liquidity Baking smart contracts have also been carried out outside of Nomadic Labs.

Indeed, Least Authority published an audit of tzBTC in March 2020.

Furthermore, Runtime Verification published a Formal Verification Report on Dexter 2, and has just released one on Liquidity Baking, which is based on the same contract.

Runtime Verification’s verification approach shares some similarities with our formal verification efforts with Mi-Cho-Coq. In both cases, the work relies on a formal semantics of Michelson (K-Michelson for Runtime Verification, Mi-Cho-Coq for Nomadic Labs) that makes it possible to reason rigorously about the behavior of programs. Note also that similar properties have been proven (e.g., the functional correctness of the contracts’ Michelson bytecode, or the fact that the CPMM invariant is strictly increasing).

Of course, our approaches, while similar, differ in some aspects. In particular, Mi-Cho-Coq is based on Coq, while K-Michelson is based on the K Semantic Framework. We can also notice that the verification effort of Runtime Verification also covers the FA1.2 contract used by the CPMM to manage the tokens distributed to the liquidity providers. Overall, we believe that our approaches are complementary.

We are glad that several teams have independently studied the security of the Liquidity Baking contracts, as it gives the community more assurances that Liquidity Baking is safe to use.

Three questions to Nomadic Labs interns — Étienne Marais

2021-08-02T11:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Étienne Marais (and a couple of questions of his mentors at Nomadic Labs).

Étienne — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Étienne

1. Please present yourself and your academic background

My name is Étienne Marais, I’m 24 years old, and I live in Paris. I’ve been an OCaml enthusiast for 4 years now! I’m really interested in the free software movement and the open source community. I studied undergraduate computer science at Paris Diderot University from September 2016 to August 2019. Then, I started a two-year master’s degree at Université de Paris (formerly Paris Diderot University) in September 2019, and I’m now in my last year. My studies are mostly on programming languages, especially functional programming and compilation.

I got my interest in OCaml and blockchains from a pair of summer internships: two months in the summer of 2018 with IRIF (Research Institute on Fundamental Computer Science) showed me OCaml; then two months in the summer of 2020 with the OCaml Software Foundation working on the Learn-OCaml platform brought me into contact with the Tezos ecosystem and Nomadic Labs. I learned about blockchains and the possibilities they offer for decentralization, self-governance and transparency.

I’m concerned about global warming and ecology, so how blockchains deal with their energy consumption is an important issue to me. To answer it, I’ve dug further into the subject and learned about the difference between Proof of Work (which by design consumes a lot of energy) and Proof of Stake (which consumes less energy, and is what is used in Tezos).

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

In March 2021, I started an internship with Victor Allombert, Julien Tesson and Mathias Bourgoin on the energy consumption of Tezos. There are two goals:

To provide the Tezos community with tools to monitor the energy consumption of their live Tezos nodes.
To develop a benchmarking tool which will allow developers to profile the energy consumption of a Tezos node while performing various standard functions in the lab (i.e. under controlled and replicable conditions), such as baking a block or executing a simple smart contract.

This internship is really challenging and stimulating, for two reasons:

Firstly, it has pushed me deeper into the OCaml world, and it’s really thrilling to see the capabilities of the language in such a big project. Discussing with other developers how the tools I’m writing could be integrated into the actual workflow is amazing, because they always have good suggestions and help me improve the quality of my work.
Secondly, profiling computer performance is not new, but optimising the energy-efficiency of computer systems (some call this green computing, or ICT sustainability) is a relatively recent field of research. There are few precedents for this and we have to build the tools we need from scratch. It’s really stimulating to work with my mentors to try to produce a tool which is as accurate as possible.

I’m amazed by the technologies used to make a blockchain work, and how every person on the project is important to make the project succeed. It is exciting to be part of such a big project and make contributions to it.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I decided to join Nomadic Labs because people working here are close to the research world. Many of them have a PhD, so they are used to sharing their knowledge with others, and this creates a healthy place to learn the skills to develop a blockchain ecosystem. Furthermore, they belong to the OCaml community, so I get to learn about the design of those libraries in the ecosystem that they maintain.

I wouldn’t have called myself a blockchain enthusiast before starting my internship with Nomadic Labs, but now I’ve discovered that blockchain is a technology that exists at the intersection of many important fields, including formal verification, networking, and concurrency — and with broad applications that are not limited to finance.

After the internship, I will leave Nomadic Labs as I want to discover other companies, but I intend to join another company working on Tezos, and to continue helping develop the OCaml ecosystem.

Questions for Étienne’s Nomadic Labs mentors Victor Allombert and Mathias Bourgoin

What is your input to the work of Étienne?

Through regular meetings and spontaneous questions, we discuss, exchange and propose various solutions and ideas to solve our problems. This guides Étienne toward the implementation of prototypes that we validate and improve together over time, leading to viable implementations.

Why is the topic of this internship important?

There’s no point in building a technology nowadays unless it’s green. Unlike Proof-of-Work blockchains, Tezos’ Proof-of-Stake inherently requires much less energy and cost to operate, but we need to back this up and optimize performance using concrete figures. Fine-grained performance monitoring is necessary to understand the energy consumption of a node, and thus of the network overall. Thanks to such tools and metrics, we can continuously reduce the ecological (and economic) footprint of Tezos.

Emmy: seven years of updatable consensus

2021-07-30T12:00:00+02:00

Tezos is a blockchain¹, and a blockchain is a distributed database that gets updated by adding blocks (small bunches of database operations). Tezos’ basic design goes back to a 2014 whitepaper, which was genuinely groundbreaking², not least in its insight of making the protocol of the database itself a mutable database entry; this is the idea behind Tezos protocol upgrades. From this idea has grown a broad blockchain ecosystem.

A key design choice is the consensus algorithm: how our system decides which blocks to add, and which to discard. Thus far, Tezos has used and refined a Nakamoto-style consensus algorithm called Emmy. Consensus algorithms are genuinely subtle,³ so it is high time that we explain what Emmy is and how it works:

A survey (and explanation) of Emmy
How does the Emmy family work?
Forks
The three flavours of Emmy
The mathematics of withstanding attacks
Forks starting now
Forks started in the past
A tool, and the attack model
- A concrete tool
- Notes on the attack model
The future
- Further reading:

\(\newcommand\ie{\mathit{ie}} \newcommand\dpp{\mathit{pd}} \newcommand\te{\mathit{te}} \newcommand\md{\mathit{md}} \newcommand\db{\mathit{bd}} \newcommand\de{\mathit{ed}} \newcommand\trz{\mathit{level\_reward\_zero}} \newcommand\tr{\mathit{level\_reward\_nonzero}} \newcommand\br{\mathit{block\_reward}} \newcommand\er{\mathit{endorsement\_reward}} \newcommand\f[1]{\mathit{#1}} \newcommand{\set}[1]{\{#1\}} \newcommand{\var}[1]{\f{#1}} \newcommand\epdelay{\mathit{emmy}^{\text{+}}\hspace{-.12cm}\_\mathit{delay}} \newcommand\esdelay{\mathit{emmy}^{\star}\hspace{-.12cm}\_\mathit{delay}} \newcommand\edelay{\f{delay}} \newcommand\edelayp{\f{delay^+}} \newcommand\edelays{\f{delay^*}} % syntax highlighting hack \newcommand\edelaydelta{\edelay_{\Delta}} \newcommand\edelays{\f{delay^\star}} \newcommand\params{\lambda} \newcommand\ie{\f{ie}} \newcommand\te{\f{te}} \newcommand\dpp{\f{dp}} \newcommand\de{\f{de}} \newcommand\dpde{\f{dpde}} \newcommand\bd{\f{bd}} \newcommand\bdz{\f{bd_0}} \newcommand\od{\tau} \newcommand\emmy{\f{Emmy}} \newcommand\emmyp{\f{Emmy^+}} \newcommand\emmys{\f{Emmy^{\star}}} \newcommand\emmyf{[\f{Emmy}]} \newcommand\hb{H} \newcommand\cb{D} \newcommand\pr{\mathtt{Pr}} \newcommand\pprio[1]{\mathbb{P}_{prio}(#1)} \newcommand\pendo[1]{\mathbb{P}_{endo}(#1)} \newcommand\pdiff[1]{\mathbb{P}_{\Delta}(#1)} \newcommand\secu{\eta} \newcommand\steal{\f{sb}} \newcommand\seen{\f{seen}} \newcommand\pseen[1]{\mathbb{P}^{#1}} \newcommand\psteal[1]{\mathbb{P}^{#1}} \newcommand\pcst{\f{no\_delay}} \newcommand\pcstn{\f{past}} \newcommand\pc{p^D} \newcommand\maxp{\f{max\_p}} \newcommand\ph{p^H} \newcommand\barpc{\bar{p}^D} \newcommand\barph{\bar{p}^H} \newcommand\tsh{t^H} \newcommand\tsc{t} \newcommand\bh{b^H} \newcommand\difff[1]{\f{diff\_first}(#1)} \newcommand\diff[1]{\f{diff\_subseq}(#1)} \newcommand\diffl[1]{\f{diff}_{\ell}(#1)} \newcommand\difffp[1]{\f{diff\_first^{\f{no\_delay}}}(#1)} \newcommand\diffp[1]{\f{diff\_subseq^{\f{no\_delay}}}(#1)} 
\newcommand\diffg{\f{diff}} \newcommand\diffgns{\f{diff}^{ns}} \newcommand\diffgs{\f{diff}^{s}} \newcommand\cchain{\f{ch}^\cb} \newcommand\hchain{\f{ch}^\hb}\)

A survey (and explanation) of Emmy

In this blog post we will give an explanation and survey of the Emmy family of consensus algorithms, which has powered the Tezos blockchain since its inception. We will start with

an explanation of how the Emmy family works, then
discuss in specific detail the mathematics of withstanding attacks, then
discuss “forks starting now”, then
discuss “forks starting in the past”, then
mention a concrete tool to compute expected confirmation numbers — how long you expect to wait for transactions to become final; bankers would call this the “settlement time” — and finally
discuss the future, including a planned move to Tenderbake (which has also been the topic of a previous blog post).

We mentioned there are several flavours of Emmy:

Currently, Tezos uses the Emmy+ consensus algorithm.
This is a successor⁴ to Emmy.
An upgrade to Emmy* is imminent in the Granada protocol proposal (election).

So ‘Emmy’ is a family of consensus algorithms (technically, a collection of proof-of-stake Nakamoto-style consensus algorithms) parameterised over some dials and switches which can be tweaked to optimise safety and performance, and ideally both at the same time.

These parameters include (explanations will follow below):

the number of endorsement slots per block,
the minimal delay function, and
the chain fitness function.

The evolution from Emmy to Emmy+ to Emmy* consists of tweaks to these parameters, each of which had to be carefully considered and tested. These tweaks are practically motivated: each iteration offers worthwhile real-world improvements in either speed or security or (preferably) both.⁵

The evolution of these algorithms has been a part of the evolution of Tezos itself, and it illustrates just how far it is possible to optimise Nakamoto-style consensus in a live, functioning, industrial blockchain. So …

How does the Emmy family work?

A blockchain is a sequence of blocks. We call the position of a block in this sequence its level: thus, the “block at level $l$” is the $(l{+}1)$-th block in the sequence. We start counting at zero, so the first block (also called the genesis block) is at level $0$. (You can view details of the genesis block of the Tezos blockchain.)

Blocks are timestamped. So the block at level $l$ will also have some timestamp $t_l$.

For simplicity suppose also that the blockchain is unforked, meaning that the network agrees on the blockchain up to and including block $B$ at level $l$ (we consider the more general case of a forked chain below).

Now it’s time to decide on what block to add at level $l{+}1$ to our unforked chain. The consensus algorithm gets to work, according to the following rules:

The rules

High-level view …

At a high level, Emmy acts as follows:

sort participants into random order,
query each participant in order for a block, and
add the first valid block received to the blockchain.

But this is a blockchain system, so we need an algorithm that achieves the effect above and that

is scalable, efficient, and resilient to real-world faults such as network interruptions,
does not assume a central authority, and
does not assume all participants are well-behaved.

… in more detail

We will need two inputs to the algorithm:

We need a random seed.

This is a number used to seed a (pseudo)random generating function. This is obtained as a hash of the state of the blockchain approximately $4096\times 5$ blocks in the past — at one block per minute that means $20480$ minutes ago (about two weeks).
We need a mathematical function called the delay function.

$\mathit{delay}$ has type $\mathbb N\times\mathbb N\to\mathbb N_{+}$, meaning that it takes two nonnegative integers and returns a strictly positive integer. $\mathit{delay}$ is a parameter of the Emmy algorithm, meaning that it varies between Emmy, Emmy+, and Emmy*; for now, it suffices that it exists. More on this shortly.

We now proceed as follows:

Using the random seed described above, each active blockchain participant $a$ with at least one roll (1 roll = 8,000 ꜩ) is randomly assigned two pieces of data:
- a unique earliest priority $p(a)$, which is a non-negative number.⁷ Smaller numbers are better and each participant gets a different number (think: positions in a queue).
- an endorsement power $w(a)$, which is a number between 0 and 32 (think: magic talismans; see below).
Note that the earliest priority and the endorsement power are abstract data quantities. We discuss below how these quantities get mapped to a time in seconds.

The distribution of earliest priority $p(a)$ is proportional to participants’ stake: if $a$ has more stake then there is a correspondingly increased chance that e.g. $p(a)=0$ or $p(a)=1$.

Equivalently: priorities are dispensed by a uniform random distribution on a per-roll basis, so a participant who controls more rolls, will get proportionally more priorities.

The distribution of endorsement power $w(a)$ is also proportional to participants’ stake, and furthermore the total endorsement power dispensed must sum to 32. Equivalently: the 32 endorsement slots are distributed per roll, so a participant who controls more rolls, is proportionally more likely to get one or more endorsement slots.
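The per-roll allocation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the protocol's code: `allocate` is a hypothetical helper, Python's `random.Random` stands in for the protocol's own derivation of randomness from the seed, and partial rolls are simply ignored.

```python
import random

ROLL_SIZE = 8_000          # tez per roll, as stated above
ENDORSEMENT_SLOTS = 32     # total endorsement power dispensed per level


def allocate(stakes, seed, max_priority=100_000):
    """Return (earliest_priority, endorsement_power) dicts, drawn per roll from `seed`.

    `stakes` maps participant names to balances in tez. Python's PRNG stands in for
    the protocol's seed-based derivation, which is not reproduced here.
    """
    rng = random.Random(seed)
    # One entry per full roll, so drawing uniformly from this list is stake-weighted.
    rolls = [name for name, stake in stakes.items() for _ in range(stake // ROLL_SIZE)]
    holders = set(rolls)
    earliest = {}
    for priority in range(max_priority):
        owner = rng.choice(rolls)              # each priority goes to one random roll
        earliest.setdefault(owner, priority)   # keep the owner's first (smallest) priority
        if len(earliest) == len(holders):
            break
    power = dict.fromkeys(rolls, 0)
    for _ in range(ENDORSEMENT_SLOTS):
        power[rng.choice(rolls)] += 1          # each slot also goes to one random roll
    return earliest, power


stakes = {"alice": 800_000, "bob": 80_000, "carol": 8_000, "dave": 7_999}
earliest, power = allocate(stakes, seed=2021)
print(earliest)
print(power, sum(power.values()))
```

Because every priority index is handed to exactly one roll, the earliest priorities of distinct participants are automatically distinct, and "dave", with less than one roll, receives nothing.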

So intuitively:

imagine that Tezos is a fantasy magic kingdom which dispenses queue positions and a strictly limited total of 32 magic talismans to its landowners — and, because this is a proof-of-stake system, larger landowners tend to get better queue numbers and more talismans. We will continue this example below.
Recall that we assumed for simplicity that all the participants agree on the state of the blockchain so far, and in particular that it ends at block $B$ at level $l$.

Now, any participant $a$ can endorse (vote for) $B$ by transmitting an endorsement operation $d(a,B)$ across the network.

So suppose at some particular moment that block $B$ has received a set $e_B=\{d(a_1,B), d(a_2,B), \dots, d(a_n,B)\}$ of endorsements from distinct participants $(a_1,a_2,\dots,a_n)$ with endorsement powers $(w(a_1),w(a_2),\dots,w(a_n))$ respectively.⁸ Write $w(e_B)$ for the sum of the endorsement powers of $a_1$ to $a_n$:
$$ w(e_B) = w(a_1) + w(a_2) + \dots + w(a_n) . $$

We see that each endorsement is weighted by the endorsement power $w(a_i)$ of $a_i$ — so in effect, at most 32 participants can usefully endorse $B$, since there are at most 32 endorsement slots to go around.

Then a participant with earliest priority $p(a)$ has the right to bake a valid block $B'$ and attach it to the chain after a delay of
$$ \mathit{delay}(p(a),w(e_B)) $$
seconds. We discuss in detail below what the function $\mathit{delay}$ is; for now it is just a parameter of the algorithm.

Every block contains enough information (timestamp, priority, public key of creating participant, endorsements, and so on) that every participant can check — just by examining it and without reference to any external oracle — whether it validly follows on from the last block.⁶

Then $B'$ is transmitted to the network and, assuming everything is working efficiently, $B'$ gets appended to the blockchain at level $l{+}1$.
If everything is not working efficiently (perhaps the network is slow, or participant $a$ is rebooting), then time passes and eventually a delay of $\mathit{delay}(p(a'),w(e'_B))$ may elapse, where $a'$ is the next participant in line and $e'_B$ is the set of endorsements for $B$ that $a'$ has seen. From this moment, both $a$ and $a'$ can bake blocks. In particular, if $a$ remains out of action, then $a'$ may be the one to bake the next valid block. And so it goes on until somebody bakes a valid block and the blockchain is extended.
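The rule "whoever's delay elapses first, among participants able to bake, bakes the next block" can be sketched in Python. This ignores network propagation time, computes $w(e_B)$ separately for each baker from the endorsements that baker has seen, and uses `toy_delay`, a hypothetical placeholder whose constants are assumptions (the real delay function is discussed later in the post).

```python
def toy_delay(p, w, base=60, per_priority=40, required=22, per_missing=8):
    """Placeholder Emmy-style delay in seconds; every constant here is an assumption."""
    return base + per_priority * p + per_missing * max(0, required - w)


def next_baker(earliest, power, seen, online, delay=toy_delay):
    """Return (baker, seconds after B) for the first online participant allowed to bake.

    earliest: participant -> earliest priority p(a)
    power:    participant -> endorsement power w(a)
    seen:     participant -> set of participants whose endorsement of B it has seen
    online:   participants currently able to bake
    """
    candidates = []
    for a in online:
        w_seen = sum(power[e] for e in seen.get(a, ()))   # w(e_B) as seen by a
        candidates.append((delay(earliest[a], w_seen), earliest[a], a))
    if not candidates:
        return None
    seconds, _, baker = min(candidates)   # shortest delay wins; ties go to better priority
    return baker, seconds


earliest = {"alice": 0, "bob": 1, "carol": 2}
power = {"alice": 20, "bob": 8, "carol": 4}
everyone = {"alice", "bob", "carol"}
seen = {a: everyone for a in everyone}

print(next_baker(earliest, power, seen, online=everyone))           # ('alice', 60)
print(next_baker(earliest, power, seen, online={"bob", "carol"}))   # ('bob', 100)
```

If "alice" is offline and "bob" has only seen "carol"'s endorsement, then "carol" bakes first (140 s versus 244 s in this toy model), which shows how missing endorsements, and not only priority, decide who extends the chain.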

It might be helpful to continue our analogy of Tezos as a magic kingdom:

Suppose the Tezos blockchain is a magic kingdom and suppose block $B$ at level $l$ is the most recent Law of the magic kingdom to be passed. Landowners line up in a random order and 32 magic talismans are randomly dispensed, where both queue order and talisman distribution are weighted according to how much land each landowner owns. Landowners use their talismans to broadcast a magical blessing of the most recent Law. Depending on the delay function (which depends on a landowner’s queue position, and on the talisman-weighted number of blessings of the most recent Law $B$ that the landowner has received), the first landowner to bake a block gets to determine the next Law $B'$, i.e. to write the next law of the magic kingdom. Of course, this means that large landowners connected to large sections of the magical blessing broadcast network tend to get early queue positions and receive lots of the magical blessings, and so tend to set the laws; whereas those with smaller parcels of land or more restricted access to the network tend to miss out.

Why we need a consensus algorithm

If this seems complicated, note what it buys us:

there is no central authority, and
the system self-organises to reward having a large stake and following the rules, and is resilient against attack, non-participation, and network delays.

Large fast networks beat small slow ones

The algorithm dispenses endorsements amongst all active participants. Thus if a small group of nodes becomes isolated for a while — or just decides to go its own way — then it is not the case that they can just give one another early priorities, vigorously endorse one another’s blocks, and so build a long chain.⁹

An isolated group may fork, but

its members can garner relatively few priorities (because statistically speaking, most priorities go to participants not in the group) and
relatively few endorsement slots (ditto),

so that the values of the delay function within the group will be relatively large, and it will make relatively slow progress. Thus, if and when the group tries to rejoin the main body of the network, the group’s branch will be short compared to the branch of the rest of the blockchain, and it will die off.

Endorsements

At first sight it might seem strange to endorse the last block $B$. After all $B$ has already been attached to the blockchain, and we assumed for the sake of argument that there are no forks. So why does it even matter if we now endorse it?

Endorsements tend to unify the network and kill off small, slow, isolated chains: a small fragment of the network will find it difficult to ignore the rest of the community and go its own way,

not just because it will struggle to obtain small priorities, but also
because even once it gets a priority it may also struggle to gain endorsements from the wider network for the block that it bakes with that priority.

In principle, endorsements (or withholding them) could also be used to penalise $B$ if we disapprove of its contents. In practice this is not done: participants just endorse $B$ as soon as they can (for which they are rewarded with endorsing rewards). See the rules from the point of view of a participant, below.

Forks

Recall that we simplified and assumed that all participants agree on the state of the blockchain so far. But what if they don’t? In practice the chain can fork, meaning that different participants have different views of the blockchain so far.¹⁰

So what if the blockchain is forked?

Recall that the consensus algorithm depends on these three parameters:

the random seed, which was seeded from the state of the chain at least two weeks ago — we presume that no fork can last that long, so every participant now agrees on this quantity — and
the delay function, which is a fixed parameter of the current economic protocol, so again every participant agrees on this — and
the fitness function which is also a fixed parameter of the economic protocol.

So we can assume that these parameters are shared and that everybody is working from a shared notion of “what is a valid chain”. Furthermore, checking the validity of a proposed chain is precise and unambiguous, so participants should just work from the fittest blockchain branch that they can see. Using a fitness function to choose the canonical branch is common to Nakamoto-style consensus algorithms; this is known in the literature as the fork choice rule.

If the network becomes partitioned then the blockchain may fork, and the partitions could evolve independently for a while, but the chain will snap back once connectivity is restored, provided that most participants (at least half) follow the rules above. This is called a Nakamoto-style system because this is how Bitcoin works.¹¹

The rules: the Emmy family, from the point of view of a participant

Continually observe blocks and endorsements on the network.
Work from the ‘best’ valid chain $\mathcal B$ that you observe. If the system is unforked then picking the best chain is easy because there is only one. If the system is forked, only switch to another chain if it is ‘better’. What ‘better’ means for a chain is determined by a metric called chain fitness, discussed below.
Endorse the first block you observe that can validly attach to $\mathcal B$.
If you do not observe such a block, then produce one as soon as the delay function says that you validly can.
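The fork-choice part of these rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Tezos node implementation: `Block`, `choose_chain` and `length_fitness` are hypothetical names, and the fitness metric is passed in as a parameter because it is precisely what differs between the Emmy flavours.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Block:
    """Hypothetical, minimal block record (not the Tezos data structure)."""
    level: int
    endorsements: int  # endorsements of the predecessor carried by this block

Chain = List[Block]
Fitness = Callable[[Chain], int]

def choose_chain(current: Chain, candidate: Chain, fitness: Fitness) -> Chain:
    """Stay on `current` unless `candidate` is strictly fitter."""
    return candidate if fitness(candidate) > fitness(current) else current

def length_fitness(chain: Chain) -> int:
    """Fitness as chain length (the metric of Emmy+ and Emmy*)."""
    return len(chain)

mine = [Block(1, 30), Block(2, 28)]
rival = [Block(1, 32), Block(2, 32)]
longer = rival + [Block(3, 5)]
assert choose_chain(mine, rival, length_fitness) is mine     # equally fit: stay
assert choose_chain(mine, longer, length_fitness) is longer  # strictly fitter: switch
```

Ties keep the current chain, matching the rule of switching only to a strictly ‘better’ chain.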

The need for a chain fitness function

Suppose we are on the blockchain and it is forked. How is an honest participant to decide which chain is better?¹³ Here are two possible answers:

In Emmy:

chain fitness is equal to the sum of the length of a chain and the number of endorsements which it contains.

Thus, an honest participant who bakes on the fittest branch of a fork will prefer the branch with the most blocks-plus-endorsements.
In both Emmy+ and Emmy*:

chain fitness is equal to the length of the chain.

Thus, an honest participant who bakes on the fittest branch of a fork will prefer the longest branch.¹⁴

Reasons for the change in fitness function

This is discussed in the post on Emmy+: search for “simplifies the optimal baking strategy”.

The issue with the Emmy fitness function (item 1 in the list above) is that it gives equal weight to an endorsement and a block — and each block can hold up to 32 endorsements. Thus, in terms of chain fitness, gathering endorsements for the last block is more important than producing the next block and thus extending the chain.

This incentivises bakers to hold off baking a block until they can pack it with as close as possible to a full complement of 32 endorsements, for fear that if they do not then their chain might be overtaken by a shorter fork (as much as ¹⁄₃₂ of the length) but with more endorsements and so greater fitness.

This incentive structure could slow down the blockchain overall, and in particular it could slow down transaction rate (the overall rate at which transactions get included in blocks and baked onto the chain), which is not good for users, who are likely to prefer high transaction rates and rapid transaction settlement times.

Emmy+ and Emmy* address this by setting chain fitness to equal chain length. Endorsements still play a role, but the effect is exerted via the delay function — discussed below; essentially, more endorsements means you can bake earlier. Honest participants just prefer the longest chain and there is no reason not to bake a block, once the delay function permits it.
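A small sketch makes the incentive concrete. Chains are modelled (hypothetically) as lists of per-block endorsement counts, and the two branches below are made-up numbers, not chain data:

```python
from typing import List

# Hypothetical model: one entry per block, giving the number of endorsements
# (0 to 32) of the previous block that the block includes.
Chain = List[int]

def emmy_fitness(chain: Chain) -> int:
    """Emmy: chain length plus the total number of endorsements it contains."""
    return len(chain) + sum(chain)

def emmy_plus_fitness(chain: Chain) -> int:
    """Emmy+ and Emmy*: chain length only."""
    return len(chain)

hasty = [8] * 40     # 40 blocks, each baked promptly with only 8 endorsements
patient = [32] * 20  # 20 blocks, each held back until fully endorsed

print(emmy_fitness(hasty), emmy_fitness(patient))            # 360 660
print(emmy_plus_fitness(hasty), emmy_plus_fitness(patient))  # 40 20
```

Under Emmy the shorter but fully endorsed branch is fitter, which rewards waiting; under Emmy+ the longer branch wins.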

The three flavours of Emmy

List of flavours

Emmy exists in three flavours:

The original version, Emmy. This was used from the launch of the Tezos mainnet in 2018 until 18 October 2019 (the date of the first block to use the Babylon protocol). Emmy is no longer current.
The new version Emmy+. This is current.
The forthcoming upgrade to Emmy* in Granada.

These are essentially the same algorithm but they differ in tweaks to their parameters. Specifically:

Emmy and Emmy+ have 32 endorsement slots per level. Emmy* has 256, speeding up confirmation time and increasing user participation.¹⁵
The chain fitness function of Emmy is “blocks + endorsements”; that of Emmy+ and Emmy* is just “blocks”, as discussed above.
The delay functions of the three algorithms differ, as discussed below.

The Delay function of Emmy

In Emmy, $\edelay$ is a function just of the priority $p$:

$$ \begin{equation}\tag{1}\label{eq:delay} \begin{array}{r@{\ }l} \edelay(p) =& \db + \dpp \cdot p \\ =& 60 + 75 \cdot p \quad \text{(seconds)}. \end{array} \end{equation} $$

$\db=60$ is the base delay: a constant minimal offset from one block to the next.
$\dpp=75$ is the priority delay: the time between consecutive priorities, as discussed above.

Worked examples:

A participant with earliest priority $0$ can start baking after $60+75\cdot 0 = 60$ seconds.
A participant with earliest priority $1$ can start baking after $60+75\cdot 1 = 135$ seconds.
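Equation (1) translates directly into code. A minimal sketch (the function and parameter names are ours, not the protocol's) that reproduces the worked examples:

```python
def emmy_delay(p: int, base_delay: int = 60, priority_delay: int = 75) -> int:
    """Equation (1): seconds after the previous block at which a baker whose
    earliest priority is p may bake, under Emmy."""
    if p < 0:
        raise ValueError("priority must be non-negative")
    return base_delay + priority_delay * p

for p in range(3):
    print(p, emmy_delay(p))  # 0 60, then 1 135, then 2 210
```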

The Delay function of Emmy+

Emmy+ adds a dependency on $w$, the endorsement power of the endorsements in the block to be baked:

$$ \begin{equation}\tag{2}\label{eq:epdelay} \begin{array}{r@{\ }l} \edelayp(p, w) =& \db + \dpp \cdot p + \de\cdot \max(0, \frac{3}{4}\cdot\te - w) \\ =& 60 + 40 \cdot p + 8\cdot \max(0, 24 - w) \quad\text{(seconds)}. \end{array} \end{equation} $$

It might help to view $\mathit{delay}$ abstractly as a function that increases with its first argument (the priority slot) and decreases with its second (the endorsement power), linearly until the endorsement threshold is reached. So:

later priority slot = more delay;
more endorsements = less delay.

In words, Equation \eqref{eq:epdelay} says that the delay is:

a base delay of $\db=60$ seconds, plus
a priority delay of $\dpp=40$ seconds for each unit of priority $p$, plus
a delay per missed endorsement of $\de=8$ seconds for every endorsement slot that the block to be baked falls short of a threshold of $24=\frac{3}{4}\cdot \te$ out of the $\te=32$ available endorsement slots.

From the Babylon protocol upgrade (18 October 2019) to the time of writing, these parameters have been fixed at

$$ \db = 60,\quad \dpp=40,\quad \de=8,\quad \te=32. $$

Worked example:

A participant with earliest priority $0$, baking a block with $16$ endorsements for the previous block — thus, with half of the full complement of 32 possible endorsement slots — can start baking after $60+40\cdot 0+8\cdot 8=124$ seconds.
If the participant had gathered $24$ instead of $16$ endorsements, then this would drop to $60$ seconds.
A participant with earliest priority $1$, baking a block with $16$ endorsements for the previous block — thus, with half of the full complement of 32 possible endorsement slots — can start baking after $60+40\cdot 1+8\cdot 8=164$ seconds.
If the participant had gathered $24$ instead of $16$ endorsements, then this would drop to $100$ seconds.
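Equation (2) as a Python sketch, with the Babylon-era parameters above as defaults (the names are ours). The threshold $\frac{3}{4}\cdot\te$ is computed with integer division, which is exact whenever the number of slots is a multiple of 4, as it is here:

```python
def emmy_plus_delay(p: int, w: int, base_delay: int = 60, priority_delay: int = 40,
                    delay_per_missing_endorsement: int = 8,
                    endorsement_slots: int = 32) -> int:
    """Equation (2): earliest baking time (seconds after the previous block)
    for earliest priority p and endorsement power w."""
    threshold = 3 * endorsement_slots // 4  # 24 when there are 32 slots
    missing = max(0, threshold - w)         # slots short of the threshold
    return base_delay + priority_delay * p + delay_per_missing_endorsement * missing

# The worked examples above: (priority, endorsements) -> delay in seconds.
for p, w in [(0, 16), (0, 24), (1, 16), (1, 24)]:
    print(p, w, emmy_plus_delay(p, w))  # 124, 60, 164, 100 respectively
```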

The Delay function of Emmy*

Emmy* builds on the delay function of Emmy+ while observing that during normal operation most blocks are baked at priority 0 and with plenty of endorsements. So let’s fast-track this optimal case:

$$ \begin{equation}\tag{3}\label{eq:esdelay} \edelays(p, w) = \begin{cases} \md & \text{ if } p = 0 \wedge w \geq \frac{3}{5}\te \\ \edelayp(p, w) & \text{ otherwise} \end{cases} \end{equation} $$

Above $\md=30$ is the minimal delay and $\te=256$ is the number of endorsement slots. Furthermore, the constant $\de$ used by $\edelayp(p, w)$ has been changed to 4.

Worked example:

A participant with earliest priority $0$, baking a block with $153$ endorsements for the previous block (thus with less than 60% of the full complement of 256 possible endorsement slots, since $\frac{3}{5}\cdot 256 = 153.6$), does not qualify for the fast track and can start baking after $60+40\cdot 0+4\cdot (192 - 153)=216$ seconds.
A participant with earliest priority $0$, baking a block with $154$ endorsements for the previous block — thus, with at least 60% of the full complement of 256 possible endorsement slots — can start baking after $30$ seconds.
If the participant had gathered $192$ instead of $154$ endorsements, this would make no difference: they could still start baking after $30$ seconds.
A participant with earliest priority $1$, baking a block with $128$ endorsements for the previous block — thus, with half of the full complement of 256 possible endorsement slots — can start baking after $60+40\cdot 1+4\cdot (192 - 128)=356$ seconds.
If the participant had gathered $192$ instead of $128$ endorsements, then this would drop to $60+40\cdot 1=100$ seconds.
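A sketch of Equation (3), reusing the Emmy+ formula with the Emmy* constants ($\de=4$ and $\te=256$, so that the threshold $\frac{3}{4}\cdot\te$ is 192). The names are hypothetical; `Fraction` keeps the comparison with $\frac{3}{5}\cdot\te = 153.6$ exact:

```python
from fractions import Fraction

def emmy_plus_delay(p: int, w: int, base_delay: int = 60, priority_delay: int = 40,
                    delay_per_missing_endorsement: int = 4,
                    endorsement_slots: int = 256) -> int:
    """Equation (2) with the Emmy* constants as defaults."""
    threshold = 3 * endorsement_slots // 4  # 192 when there are 256 slots
    return (base_delay + priority_delay * p
            + delay_per_missing_endorsement * max(0, threshold - w))

def emmy_star_delay(p: int, w: int, minimal_delay: int = 30,
                    endorsement_slots: int = 256) -> int:
    """Equation (3): fast track at priority 0 with at least 3/5 of the slots."""
    if p == 0 and w >= Fraction(3, 5) * endorsement_slots:  # 3/5 * 256 = 153.6
        return minimal_delay
    return emmy_plus_delay(p, w, endorsement_slots=endorsement_slots)

for p, w in [(0, 153), (0, 154), (0, 192), (1, 128), (1, 192)]:
    print(p, w, emmy_star_delay(p, w))  # 216, 30, 30, 356, 100 respectively
```

Note that 153 endorsements fall just short of the fast track, so Equation (3) falls back to the Emmy+ formula: $60 + 4\cdot(192-153) = 216$ seconds.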

So much for our overview of the delay function. How does this help the blockchain to withstand attacks? What does an attack look like, anyway? Read on …

The mathematics of withstanding attacks

An attack scenario (a Valuable Car)

Let the time be $t=0$ and consider the following scenario:

You have a Valuable Car to sell.
I agree to purchase it and we agree on a sum of Quite A Lot for the sale.
I bake a block $B$ at time $t=0$ recording a transfer of Quite A Lot from my account to yours.
I also secretly bake another block $B'$ (this is called double baking). $B'$ is identical to $B$ except that it includes a transaction for Far Less from my account to yours.
The main Tezos chain $\mathcal M$ continues to evolve from $B$, and (still in secret) I bake an alternative chain $\mathcal S$ evolving from $B'$ instead of from $B$.
Meanwhile, I take possession of the Valuable Car, and drive it home.
When I get home, I reveal my chain to the network. In this chain I bought your car, but for Far Less rather than for Quite A Lot.

If you complain, I can say “What’s the problem? I paid you and you gave me the Valuable Car”.

Say that my attack succeeds when there exists some time $t>0$ at which $\mathcal S$ is longer than $\mathcal M$. Then, I can reveal $\mathcal S$ to the world and as per the rules honest participants will bake from $\mathcal S$.
Say that my attack fails when for every time $t>0$, $\mathcal S$ is not longer than $\mathcal M$. I can still reveal $\mathcal S$ to the world, but as per the rules it will be ignored.

Whether this attack succeeds or fails depends on the relative rates of growth of $\mathcal M$ and $\mathcal S$, which are dictated by the delay function discussed above. In the long term my chain $\mathcal S$ will grow more slowly than $\mathcal M$ (so long as I have less than half the total stake on the chain) because I will gain relatively fewer priorities and endorsements. Still, like a gambler in a casino, I might get lucky.

So we can ask:

For a given attacker stake, what are the chances of an attacker undoing a transaction as outlined above?
Turning the question around: how many blocks must follow my payment of Quite A Lot on the main chain before you can feel safe handing over your Valuable Car?

Calculating the confirmation number

Setting the scene, and some simplifying assumptions

We can amalgamate multiple dishonest bakers $\cb_i$ into a single ‘composite’ dishonest baker $\cb$ having as stake fraction the sum of the stake fractions of $\cb_i$, and similarly for the honest bakers. We can also think of a fork as two competing chains — an honest main chain $\mathcal M$ and a dishonest hostile chain $\mathcal S$ — both extending some genesis block at level $\ell = 0$.¹⁶

Thus we reason henceforth using

a single honest baker $\hb$ with stake fraction $1-f$ and
a single dishonest adversary baker $\cb$ with stake fraction $f$,
both competing to add a longer chain starting from a common block at level $\ell = 0$.

We’re playing for the adversary $\cb$ in the mathematics below, so we write

$f$ for the stake fraction of $\cb$,
$p$ for the earliest priority of $\cb$, and
$e$ for the number of endorsement slots of $\cb$.

To win, the dishonest chain $\mathcal S$ of $\cb$ must accumulate more blocks than the honest main chain $\mathcal M$. So we need to compute the probability that the timestamp of the $\ell$th and final block in $\mathcal S$ is less than the timestamp of the $\ell$th and final block in $\mathcal M$. If this happens, then $\cb$ can reveal the dishonest chain $\mathcal S$ to the network and participants will switch to it.

Assume that everyone favours themselves, and the network favours the attacker

We assume that:

Everybody bakes as early as they can (thus handing minimal advantage to the adversary).
The honest baker’s messages take time $\Delta$ to arrive, which intuitively corresponds to some slowest possible path in the network.
The attacker’s messages arrive instantly.

So this is a worst-case scenario for the honest chain, in which honest network communications are as slow as they possibly could be, and dishonest attacker communications are instantaneous. In symbols, we can calculate¹⁷ that for a block baked by $\hb$, the minimal delay $\edelaydelta(\ph, e)$ is:

$$ \edelaydelta(\ph, e) = \min\bigl(\max(2\Delta, \edelay(\ph, e)),\ \max(\Delta, \edelay(\ph, 0))\bigr) $$
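A direct transcription of this formula, with the Emmy+ delay function and Babylon parameters plugged in (names hypothetical). The comments describe what each term of the formula computes; why the bounds are $2\Delta$ and $\Delta$ is explained in the footnote, not by the code:

```python
def emmy_plus_delay(p: int, w: int) -> int:
    """Equation (2) with the Babylon-era parameters (60, 40, 8, 32 slots)."""
    return 60 + 40 * p + 8 * max(0, 24 - w)

def honest_min_delay(p_h: int, e: int, big_delta: int) -> int:
    """Minimal delay of an honest block when honest messages take big_delta seconds."""
    # Term 1: delay with endorsement power e, but never less than 2 * Delta.
    with_endorsements = max(2 * big_delta, emmy_plus_delay(p_h, e))
    # Term 2: delay with no endorsements, but never less than Delta.
    without_endorsements = max(big_delta, emmy_plus_delay(p_h, 0))
    return min(with_endorsements, without_endorsements)

for big_delta in (30, 59, 200):
    print(big_delta, honest_min_delay(0, 32, big_delta))  # 60, 118, 252 respectively
```

For small $\Delta$ the honest baker is limited only by the protocol delay; as $\Delta$ grows the $2\Delta$ bound takes over, and for very large $\Delta$ the second term becomes the smaller one.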

Some probabilities

We want to calculate the probability that the hostile chain $\mathcal S$ overtakes the main chain $\mathcal M$, under the assumptions above. This can be expressed in terms of differences between minimal block delays, using a large summation over the distribution of priorities and endorsements.

Write $\pprio{f,p}$ for the probability that the earliest priority of a player with stake fraction $f$ is $p$:
$$\pprio{f,p} = \pr[\var{best\_prio} = p] = (1-f)^p f$$
(recall: the first priority is $0$).
Write $\pendo{f,e}$ for the probability that $\cb$ gets $e$ many endorsement slots:
$$\pendo{f,e}=\pr[\var{num\_endo} = e] = \textstyle\binom{32}{e}f^e(1-f)^{32-e}.$$
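The two distributions can be sketched with the standard library (`math.comb`, Python 3.8+). The assumption, implicit in the formulas, is that each priority and each endorsement slot goes independently to the attacker with probability $f$. The stake fraction $f=0.2$ below is illustrative:

```python
from math import comb

def p_prio(f: float, p: int) -> float:
    """Pr[best_prio = p]: priorities 0..p-1 go to others, priority p to us."""
    return (1 - f) ** p * f

def p_endo(f: float, e: int, slots: int = 32) -> float:
    """Pr[num_endo = e]: e of the `slots` endorsement slots go to us (binomial)."""
    return comb(slots, e) * f ** e * (1 - f) ** (slots - e)

f = 0.2  # illustrative stake fraction
print(sum(p_prio(f, p) for p in range(200)))    # very close to 1: the tail is negligible
print(sum(p_endo(f, e) for e in range(33)))      # 1, up to rounding
print(sum(e * p_endo(f, e) for e in range(33)))  # about 6.4, the binomial mean 32 * f
```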

We distinguish two cases, depending on whether $\cb$ is baking the first block on top of the shared genesis block or a subsequent block:

\begin{align} \difff{\pc, \ph, e} & := \edelay(\pc,32) - \edelaydelta(\ph, 32 - e) \tag{4}\label{eq:diff1}\\ \diff{\pc, \ph, e} & := \edelay(\pc,e) - \edelaydelta(\ph, 32 - e) \tag{5}\label{eq:diff} \end{align}

Above:

$\ph$ and $\pc$ are parameters representing the best priorities of $\hb$ and $\cb$ respectively.
The equations just subtract the minimal delay for $\hb$ from that of $\cb$. A negative value here is good for the dishonest chain $\mathcal S$, and a positive value is good for the honest chain $\mathcal M$.
$\difff{}$ covers the first block that $\cb$ bakes on top of genesis: here $\cb$ has access to all the endorsements for genesis, while $\hb$ has only its own endorsements, not those of $\cb$. In contrast, $\diff{}$ covers the subsequent blocks that $\cb$ bakes: for these, $\cb$ can use only its own endorsements, since $\hb$’s endorsements are for blocks on $\mathcal M$ that are not on $\mathcal S$, so $\cb$ cannot use them to extend $\mathcal S$.

We write sequences of priorities and endorsements of length $\ell\geq 1$ as

$$ \barpc = (\pc_1,\dots, \pc_\ell), \qquad \barph = (\ph_1,\dots, \ph_\ell), \quad\text{and}\quad \bar{e} = (e_1,\dots, e_\ell) $$

and, parameterised over these sequences, we define an accumulated difference $\diffl{\barpc,\barph,\bar{e}}$ by

$$ \diffl{\barpc, \barph, \bar{e}} := \difff{\pc_1, \ph_1, e_1} + {\sum_{2\leq i\leq \ell}}\diff{\pc_i, \ph_i, e_i} . $$
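To make Equations (4), (5) and the accumulated difference concrete, here is a sketch that evaluates $\diffl{}$ for one hand-picked scenario. The names are hypothetical, and we assume Emmy+ with the Babylon parameters, 32 slots and $\Delta = 30$ seconds. The attacker holds priority 0 only at the first level and has 6 endorsement slots at every level:

```python
from typing import Sequence

SLOTS = 32  # Emmy+ endorsement slots per level

def delay(p: int, w: int) -> int:
    """Equation (2) with the Babylon-era parameters."""
    return 60 + 40 * p + 8 * max(0, 24 - w)

def honest_delay(p_h: int, e: int, big_delta: int) -> int:
    """Worst-case minimal delay for a block baked by the honest baker."""
    return min(max(2 * big_delta, delay(p_h, e)), max(big_delta, delay(p_h, 0)))

def first_diff(p_c: int, p_h: int, e: int, big_delta: int) -> int:
    """Equation (4): the attacker reuses all endorsements of the common block."""
    return delay(p_c, SLOTS) - honest_delay(p_h, SLOTS - e, big_delta)

def subseq_diff(p_c: int, p_h: int, e: int, big_delta: int) -> int:
    """Equation (5): the attacker has only its own e endorsements."""
    return delay(p_c, e) - honest_delay(p_h, SLOTS - e, big_delta)

def accumulated_diff(p_cs: Sequence[int], p_hs: Sequence[int],
                     es: Sequence[int], big_delta: int) -> int:
    """Head timestamp of S minus head timestamp of M (negative favours S)."""
    if not (len(p_cs) == len(p_hs) == len(es) >= 1):
        raise ValueError("sequences must be non-empty and of equal length")
    total = first_diff(p_cs[0], p_hs[0], es[0], big_delta)
    for p_c, p_h, e in zip(p_cs[1:], p_hs[1:], es[1:]):
        total += subseq_diff(p_c, p_h, e, big_delta)
    return total

print(accumulated_diff([0, 1, 1], [1, 0, 0], [6, 6, 6], big_delta=30))  # -40 + 184 + 184 = 328
```

The attacker gains 40 seconds at the first level but, once it can no longer borrow the honest endorsements, loses 184 seconds at each later level.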

Forks starting now

We can now calculate the probability that

for chain length $\ell\geq 1$, the timestamp of the head of $\cb$’s chain, minus the timestamp of the head of $\hb$’s chain, is equal to $\delta$ seconds

as follows:

\begin{align} \pr_\ell[\var{chains\_diff}=\delta] := \sum_{\substack{(\barpc, \barph)\in P_2^{\ell},\bar{e}\in [32]^{\ell}\\\diffl{\barpc,\barph,\bar{e}} = \delta}}\;\prod_{1\leq i\leq\ell}\pendo{f,e_i}\cdot \pprio{f,\pc_i,\ph_i} \label{eq:pr0} \end{align}

where above, $P_2 = \{(p,p')\mid \text{exactly one of } p, p' \text{ is } 0\}$ (priority 0 belongs to exactly one of the two bakers) and

$$ \pprio{f,\pc,\ph} := \left\{ \begin{array}{ll} \pprio{f, \pc} & \text{if $\ph = 0$}\\ \pprio{1-f, \ph} & \text{if $\pc = 0$}\\ \end{array} \right. $$

to distinguish the case where $\hb$ has priority 0 (first line) from the case where $\cb$ has priority 0 (second line).
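A sketch of this joint distribution (names hypothetical). It assumes, as the two cases suggest, that priority 0 belongs to exactly one of the two bakers, so the pair $(0,0)$ and pairs with both priorities positive get probability 0. With that reading the joint probabilities sum to 1:

```python
def p_prio(f: float, p: int) -> float:
    """Probability that the earliest priority of stake fraction f is p."""
    return (1 - f) ** p * f

def p_prio_pair(f: float, p_c: int, p_h: int) -> float:
    """Joint probability that the attacker's earliest priority is p_c and the
    honest baker's is p_h (stake fractions f and 1 - f)."""
    if p_c > 0 and p_h == 0:
        return p_prio(f, p_c)      # honest baker holds priority 0
    if p_c == 0 and p_h > 0:
        return p_prio(1 - f, p_h)  # attacker holds priority 0
    return 0.0                     # exactly one baker holds priority 0

total = sum(p_prio_pair(0.2, a, b) for a in range(200) for b in range(200))
print(total)  # very close to 1
```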

Next, we give an inductive characterisation of $\pr_\ell[\var{chains\_diff}=\delta]$ which is amenable to computation.

We first consider the probabilities corresponding to the differences in Equations \eqref{eq:diff1} and \eqref{eq:diff}.

\begin{align} \pr[\var{first\_diff}=\delta] := \sum_{\substack{(\pc, \ph)\in P_2,e\in [32]\\\difff{\pc,\ph,e} = \delta}}\;\pendo{f,e}\cdot \pprio{f,\pc,\ph} \tag{7} \label{eq:pri00} \\ \pr[\var{subseq\_diff}=\delta] := \sum_{\substack{(\pc, \ph)\in P_2,e\in [32]\\\diff{\pc,\ph,e} = \delta}}\;\pendo{f,e}\cdot \pprio{f,\pc,\ph} \tag{8} \label{eq:pri01} \end{align}

Above:

$\pr[\var{first\_diff}=\delta]$ is the probability that the delay difference between the first block $\cb$ bakes on its secret chain $\mathcal{S}$ and the first block $\hb$ bakes on $\mathcal{M}$, is $\delta$.
$\pr[\var{subseq\_diff}=\delta]$ is the probability that the delay difference between a block (other than the first one) $\cb$ bakes on its secret chain $\mathcal{S}$ and the block $\hb$ bakes on $\mathcal{M}$ at the same level, is $\delta$.

For a given difference $\delta$, we define the probability of forks of length $\ell$ inductively by:

\begin{align} &\pr^{\mathit{now}}_1[\var{chains\_diff}=\delta] := \pr[\var{first\_diff}=\delta] \tag{9} \label{eq:prn1} \\ &\pr^{\mathit{now}}_{\ell{+}1}[\var{chains\_diff}=\delta] := \displaystyle\sum_{\substack{(\delta_{\ell}, \delta_1) \in \mathbb{Z}^2\\\delta_\ell+\delta_1=\delta}}\pr^{\mathit{now}}_{\ell}[\var{chains\_diff}=\delta_\ell]\cdot\pr[\var{subseq\_diff}=\delta_1] \tag{10} \label{eq:prnl} \end{align}

We note that $\delta_\ell$ can be negative as long as $\delta_1$ is big enough to compensate.
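Equation (10) is a discrete convolution. A sketch over distributions stored as dictionaries from $\delta$ (seconds) to probability. The function names are ours, and the two input distributions are toy numbers, not values computed from Emmy+:

```python
from collections import defaultdict
from typing import Dict

Dist = Dict[int, float]  # delay difference delta (seconds) -> probability

def convolve(a: Dist, b: Dist) -> Dist:
    """Distribution of x + y for independent x ~ a and y ~ b."""
    out: Dict[int, float] = defaultdict(float)
    for d1, q1 in a.items():
        for d2, q2 in b.items():
            out[d1 + d2] += q1 * q2
    return dict(out)

def chains_diff_now(first: Dist, subseq: Dist, length: int) -> Dist:
    """Equations (9) and (10): distribution of the head difference at `length`."""
    if length < 1:
        raise ValueError("length must be at least 1")
    dist = dict(first)                 # Equation (9)
    for _ in range(length - 1):
        dist = convolve(dist, subseq)  # Equation (10); no sign constraint on delta_l
    return dist

def p_faster_fork(first: Dist, subseq: Dist, length: int) -> float:
    """Equation (12): probability that the attacker's head is not later than M's."""
    return sum(q for d, q in chains_diff_now(first, subseq, length).items() if d <= 0)

# Toy distributions (hypothetical, not computed from Emmy+).
first = {-40: 0.2, 100: 0.8}
subseq = {-20: 0.1, 180: 0.9}
for length in (1, 2, 3):
    print(length, p_faster_fork(first, subseq, length))  # roughly 0.2, 0.02, 0.002
```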

Putting this all together, the probability that $\cb$ bakes a fork of length $\ell$ that is faster than that of $\hb$ is:

\begin{align} \pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)} = \ell)] = \displaystyle\sum_{\delta \leq 0}\pr^{\mathit{now}}_\ell[\var{chains\_diff}=\delta] \tag{12} \label{eq:lf} \end{align}

We use $\pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}= \ell)]$ to answer the question:

Given some transaction in some block $B$, how many confirmations — blocks after $B$ — must we observe to be Reasonably Sure that the transaction will remain in the chain?

We call this number of blocks the confirmation number. By Reasonably Sure, we mean

“the probability of being wrong is smaller than some reasonable security threshold $\secu$”

whose value can be fixed to the reader’s taste. In practice, we fix $\secu$ so that we expect to be wrong about a block being final roughly once every two centuries; conversely, an attacker would have no reasonable confidence of seeing an attack succeed in their lifetime. For example, when there is one block per minute, we set $\secu=10^{-8}$, as in our previous analysis of Emmy+ ($10^8$ minutes is roughly 190 years).

So to return to our motivating example of the Valuable Car, and assuming one block per minute, all you need to do is wait as many minutes as the confirmation number before handing over the keys, and you should be Reasonably Sure that the payment of Quite A Lot is final. If you’re inclined to be paranoid, just wait a few minutes longer (meaning in effect that you use a lower, and so more stringent, value of $\secu$).

To compute the confirmation number concretely, we simply need to find the smallest $\ell$ such that

$$ \pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)} = \ell)] < \secu. $$
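Putting the pieces together, the following sketch computes Equations (7), (8), (10) and (12) for Emmy+ and searches for the smallest such $\ell$. It is a rough approximation, not the tool discussed next. Priorities are truncated at the hypothetical bound `MAX_PRIO`, differences whose probability falls below `PROB_FLOOR` are discarded to keep the dictionaries small, and priority pairs are taken to have exactly one zero. Truncation only ever drops probability mass, so the result can slightly underestimate the attacker's chances:

```python
from collections import defaultdict
from math import comb

SLOTS = 32          # Emmy+ endorsement slots per level
MAX_PRIO = 32       # assumption: ignore priorities above this bound
PROB_FLOOR = 1e-30  # assumption: drop differences with smaller probability

def delay(p, w):
    """Emmy+ delay, Equation (2), Babylon parameters."""
    return 60 + 40 * p + 8 * max(0, 24 - w)

def honest_delay(p_h, e, big_delta):
    """Worst-case minimal delay for a block of the honest baker."""
    return min(max(2 * big_delta, delay(p_h, e)), max(big_delta, delay(p_h, 0)))

def p_prio(f, p):
    return (1 - f) ** p * f

def p_endo(f, e):
    return comb(SLOTS, e) * f ** e * (1 - f) ** (SLOTS - e)

def step_distributions(f, big_delta):
    """Equations (7) and (8): distributions of first_diff and subseq_diff."""
    first, subseq = defaultdict(float), defaultdict(float)
    for k in range(1, MAX_PRIO + 1):
        # Exactly one of the two bakers holds priority 0.
        for p_c, p_h, q in ((k, 0, p_prio(f, k)), (0, k, p_prio(1 - f, k))):
            for e in range(SLOTS + 1):
                weight = q * p_endo(f, e)
                h = honest_delay(p_h, SLOTS - e, big_delta)
                first[delay(p_c, SLOTS) - h] += weight  # Equation (4)
                subseq[delay(p_c, e) - h] += weight     # Equation (5)
    return dict(first), dict(subseq)

def convolve(a, b):
    """Equation (10), dropping negligible entries."""
    out = defaultdict(float)
    for d1, q1 in a.items():
        for d2, q2 in b.items():
            out[d1 + d2] += q1 * q2
    return {d: q for d, q in out.items() if q >= PROB_FLOOR}

def confirmation_number(f, big_delta, threshold=1e-8, max_len=50):
    """Smallest length whose faster-fork probability (Equation 12) is below threshold."""
    first, subseq = step_distributions(f, big_delta)
    dist = first
    for length in range(1, max_len + 1):
        if length > 1:
            dist = convolve(dist, subseq)
        p_faster = sum(q for d, q in dist.items() if d <= 0)
        if p_faster < threshold:
            return length, p_faster
    raise RuntimeError("threshold not reached within max_len blocks")

print(confirmation_number(0.2, 30))  # (confirmation number, probability at that length)
```

Larger values of $\Delta$, such as 59 seconds, give finer-grained delay differences and so a slower run.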

We built a tool for computing this, discussed below. For reference, the maths above also underlies the results presented in “forks starting now”.

In practice (for Tezos as it currently runs Emmy+, and assuming an attacker controlling at most one fifth of the network, a message delay $\Delta$ less than 60 seconds, and a security threshold $\secu$ of $10^{-8}$), the confirmation number is about seven blocks.

Forks started in the past

The analysis above can be refined using conditional probabilities: as time passes we can observe the behaviour of the current blockchain $\mathcal M$ and gather information about its health, and this might inform our estimates of the remaining confirmation time.

We can ask:

My transaction $tx$ was included $n$ blocks ago into $\mathcal M$ (i.e. there are $n{-}1$ blocks on top of the block with my transaction $tx$) and $\delta^o$ time ago. Can I be Reasonably Sure that my $tx$ is final?

To answer this, we consider the accumulated delay of the current chain $\mathcal M$, relative to an ideal chain $\mathcal I$ operating in ideal conditions in which all blocks are produced at priority 0 (i.e. by the first available baker) with a full complement of 32 endorsements and without any network delays.

The accumulated delay $\delta^a$ of $\mathcal M$ is easy to calculate: in ideal conditions the ideal chain $\mathcal I$ bakes one block every $\mathit{time\_between\_blocks}$ seconds,¹⁸ so the accumulated delay of an $n$-block blockchain-fragment is just

$\delta^o$ (the timestamp of the current block minus the timestamp of the block containing my transaction $tx$),
minus $n \cdot \mathit{time\_between\_blocks}$,¹⁹

or in symbols:

$$ \delta^a = \delta^o - n\cdot \mathit{time\_between\_blocks} $$
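A one-line sketch of this formula (the name is hypothetical), applied to the figures of the example discussed later in this post: three blocks in 190 seconds on Emmy+, with an ideal block time of 60 seconds:

```python
def accumulated_delay(delta_o: float, n: int, time_between_blocks: float = 60) -> float:
    """delta^a = delta^o - n * time_between_blocks, in seconds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return delta_o - n * time_between_blocks

print(accumulated_delay(190, 3))  # 10
```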

The accumulated delay $\delta^a$ is a simple measure of the ‘health’ of $\mathcal M$: the smaller $\delta^a$ is, the healthier $\mathcal M$ is. Intuitively an unhealthy chain is easier for a hostile baker $\cb$ to attack with a hostile chain $\mathcal S$, so our task is now to quantify this.

We compute the probability that our hostile baker $\cb$ has a hostile chain $\mathcal S$ that $\cb$ has forked from the main chain just before the block with our transaction $tx$, and that has the same length as $\mathcal M$ but is faster. We conceptually split $\mathcal S$ into two parts: a “past” one consisting of the first $n$ blocks, and a “future” one consisting of the subsequent blocks.

We will use our ideal chain $\mathcal I$ as an intermediate reference step. Recall that we assume that blocks on $\mathcal I$ are baked with no delay, that is, every $\bd$ seconds. Then we proceed in three steps:

For the first $n$ blocks, we compare $\cb$’s chain $\mathcal S$ with the ideal chain $\mathcal I$.
We then shift from $\mathcal I$ to $\mathcal M$ to account for $\delta^a$.
At this point, we are in a similar situation as with forks starting now, and so we compare $\mathcal S$ with $\mathcal M$.

For Item 1, the basic element we need is the difference between the minimal block delays on $\mathcal S$ and on $\mathcal I$:

\begin{align} \difffp{\pc, \ph, e} & := \edelay(\pc,32) - \mathit{time\_between\_blocks} \tag{13} \label{eq:pdiff1}\\ \diffp{\pc, \ph, e} & := \edelay(\pc,e) - \mathit{time\_between\_blocks} \tag{14} \label{eq:pdiff} \end{align}

We note that the above equations are similar to Equations \eqref{eq:diff1} and \eqref{eq:diff}, except that we subtract the ideal block time $\bd$ of $\mathcal I$ instead of the minimal delay of a block baked by $\hb$.

We can write the probabilities corresponding to Equations \eqref{eq:pdiff1} and \eqref{eq:pdiff} as follows:

\begin{align} &\pr^{\pcst}[\f{first\_diff}=\delta] = \displaystyle\sum^{32}_{e=0}\pendo{f,e}\cdot \sum_{\substack{\pc\geq 0\\{\difffp{\pc,0,e}=\delta}}} \pprio{f,\pc} \tag{15} \label{eq:pasteq1} \\ &\pr^{\pcst}[\f{subseq\_diff}=\delta] = \displaystyle\sum^{32}_{e=0}\pendo{f,e}\cdot \sum_{\substack{\pc\geq 0\\{\diffp{\pc,0,e}=\delta}}} \pprio{f,\pc} \tag{16} \label{eq:pastge1} \end{align}

Similar to Equations \eqref{eq:prn1} and \eqref{eq:prnl}, for a given difference $\delta$, the probability that the difference between $\mathcal S$ and $\mathcal I$ is $\delta$ is defined inductively as:

\begin{align} &\pr^{\pcst}_1[\f{chains\_diff} = \delta] = \pr^{\pcst}[\f{first\_diff} = \delta] \tag{17} \label{eq:past1} \\ &\pr^{\pcst}_n[\f{chains\_diff} = \delta] = \displaystyle\sum_{\substack{(\delta_{n{-}1}, \delta_1) \in \mathbb{Z}^2\\\delta_{n{-}1}+\delta_1=\delta}}\pr^\pcst_{n{-}1}[\f{chains\_diff} = \delta_{n{-}1}]\cdot\pr^\pcst[\f{subseq\_diff} = \delta_1] \end{align}

For Item 2, since the head of $\mathcal M$ lags the head of $\mathcal I$ by $\delta^a$, the difference between $\mathcal S$ and $\mathcal M$ equals the difference between $\mathcal S$ and $\mathcal I$, minus $\delta^a$. So $\pr_n^\pcst[\f{chains\_diff} = \delta + \delta^a]$ captures the shift with respect to $\delta^a$, and this probability is the base case in the inductive definition for the probability that the difference between $\mathcal S$ and $\mathcal M$ is $\delta$, which is the probability mentioned in Item 3:

\begin{align} &\pr^\pcstn_1[\f{chains\_diff} = \delta \mid \delta^a,n] = \pr_n^\pcst[\f{chains\_diff} = \delta + \delta^a]\tag{18}\label{eq:shift}\\ &\pr^\pcstn_{\ell{+}1}[\f{chains\_diff} = \delta \mid \delta^a,n] = \displaystyle\sum_{\substack{(\delta_{\ell}, \delta_1) \in \mathbb{Z}^2\\\delta_\ell+\delta_1=\delta\\\delta_{\ell} > 0}}\pr^\pcstn_{\ell}[\f{chains\_diff} = \delta_\ell\,\mid \delta^a,n]\cdot\pr_1^\pcstn[\f{chains\_diff} = \delta_1] \tag{19} \label{eq:ppl} \end{align}

For brevity we will write

$\pr^\pcst$ for the probability that $\cb$ baked $n$ blocks until now, and
$\pr^\pcstn$ for the conditional probability that $\cb$ bakes $\ell$ blocks from now, given $\pr^\pcst$ and the accumulated delay.

Note the condition $\delta_{\ell} > 0$ in Equation \eqref{eq:ppl}: since $\delta$ is the timestamp of $\cb$’s chain minus that of $\hb$’s chain, this condition excludes the forks of length $\ell$ that the attacker would already have won. Thus the probability in Equation \eqref{eq:ppl} represents the probability that the difference between the timestamps of $\mathcal S$ and $\mathcal M$ is $\delta$ and that no shorter prefix of $\mathcal S$ was already faster than the corresponding prefix of $\mathcal M$. In other words, $\ell{+}1$ is the first level at which the attacker might have a faster chain.

We define $\pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell) \mid \delta^a,n]$ to be the probability the attacker has a faster fork of any length smaller than or equal to $\ell$:

\begin{align} \pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell) \mid \delta^a,n] = \displaystyle\sum_{i=1}^{\ell}\sum_{\delta \leq 0}\pr^\pcstn_i[\f{chains\_diff} = \delta \mid \delta^a,n]. \end{align}

For a block to be considered final, given an accumulated delay $\delta^a$ we need to find the smallest $n$ such that the probability to have a fork of any length in the future is smaller than our security threshold $\secu$ (e.g. $\secu=10^{-8}$), namely, we need to find the smallest $n$ such that $\forall \ell. \pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell) \mid \delta^a,n] < \secu$. To effectively capture the universal quantifier, we define $\pr[\exists\mathit{faster\_fork} \mid \delta^a,n]$ as the following limit:

\begin{align} \tag{20} \label{eq:pfl} \pr[\exists \mathit{faster\_fork} \mid \delta^a,n] = \displaystyle{\lim_{\ell \rightarrow \infty}} \pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell) \mid \delta^a,n]. \end{align}

In practice, we approximate the limit in Equation \eqref{eq:pfl} by computing the least $\ell$ such that:

\begin{align} \frac{\pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell) \mid \delta^a,n]}{\pr[\exists\mathit{faster\_fork}.(\mathit{len(faster\_fork)}\leq \ell{-}1) \mid \delta^a,n]} \leq 1 + \epsilon. \tag{21} \label{eq:pastf} \end{align}

where $\epsilon$ denotes some desired computational precision, introduced just to bound the computation. The larger $\ell$ is, the smaller the probability that the attacker’s first faster fork has length exactly $\ell$, so the cumulative probability grows ever more slowly; once the cumulative probability up to $\ell$ is within a factor $1+\epsilon$ of that up to $\ell{-}1$, we decide that we are precise enough and stop the computation.
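The recursion with the $\delta_\ell > 0$ filter and the stopping rule of Equation (21) can be sketched as follows. Two assumptions are ours. First, the distribution of each future block's delay difference is passed in as `future` rather than derived from the protocol. Second, the shift subtracts $\delta^a$, because $\mathcal S - \mathcal M = (\mathcal S - \mathcal I) - (\mathcal M - \mathcal I)$ and the head of $\mathcal M$ lags that of $\mathcal I$ by $\delta^a$. The ratio test is applied only once the cumulative probability is non-zero, since Equation (21) needs a non-zero denominator. The input distributions are toy numbers:

```python
from collections import defaultdict
from typing import Dict

Dist = Dict[int, float]  # delay difference delta (seconds) -> probability

def shift(past: Dist, delta_a: int) -> Dist:
    """From S - I (past blocks) to S - M: subtract the accumulated delay of M."""
    return {d - delta_a: q for d, q in past.items()}

def extend(dist: Dist, future: Dist) -> Dist:
    """One more block; only states with delta > 0 (attacker not yet faster) continue."""
    out: Dict[int, float] = defaultdict(float)
    for d1, q1 in dist.items():
        if d1 <= 0:
            continue
        for d2, q2 in future.items():
            out[d1 + d2] += q1 * q2
    return dict(out)

def p_faster_fork(past: Dist, future: Dist, delta_a: int,
                  eps: float = 1e-6, max_len: int = 1000) -> float:
    """Cumulative probability of a faster fork of any length (Equations 20 and 21)."""
    dist = shift(past, delta_a)
    cumulative = sum(q for d, q in dist.items() if d <= 0)
    for _ in range(max_len):
        dist = extend(dist, future)
        updated = cumulative + sum(q for d, q in dist.items() if d <= 0)
        # Stop when nothing is left, or when one more level barely changes the total.
        if not dist or (cumulative > 0 and updated <= (1 + eps) * cumulative):
            return updated
        cumulative = updated
    return cumulative

# Toy distributions (hypothetical numbers, not derived from Emmy+).
past = {0: 0.3, 60: 0.7}
future = {-40: 0.05, 120: 0.95}
for delta_a in (0, 30):
    print(delta_a, p_faster_fork(past, future, delta_a))  # about 0.3, then about 0.335
```

A larger accumulated delay shifts the distribution towards the attacker, so the less healthy chain yields the higher probability.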

So now, to compute (or rather: to estimate using precision $\epsilon$) the confirmation number for a given accumulated delay $\delta^a$, we just need to find the smallest $n$ such that

$$ \pr[\exists \mathit{faster\_fork} \mid \delta^a,n] < \secu. $$

Calculating this is best done by computer, so we have built …

A tool, and the attack model

A concrete tool

The methodology above underlies the results presented in a previous blogpost on forks started in the past. Since then, we have created a standalone tool to perform relevant calculations, which we have made accessible online as a web demo for forks started in the past.

We can use our tool to compute, and then compare, concrete confirmation numbers for both the forks starting now and the forks started in the past scenarios considered above. Note that forks started in the past is simply a conditional version of the same question. Forks starting now makes no assumptions about future chain health. Forks started in the past asks “given a particular value for chain health since the transaction concerned, what is the expected confirmation number?” Recall our previous assumptions:

an attacker controlling at most one fifth of the network and
a security threshold $\secu$ of $10^{-8}$.

Under a forks starting now scenario — that is, at the time that we generated our block — we expected to wait seven blocks before considering our transaction final.

Suppose now that time has passed and we see that since our transaction, three blocks have been baked in 190 seconds — i.e. with a 10 second delay with respect to an ideal chain in Emmy+ which bakes one block every 60 seconds. Then our tool tells us that, assuming $\Delta$ is less than 60 seconds, the confirmation number is in fact three, so we can already consider our transaction final — four blocks earlier than our original worst-case scenario. And this is just because we have observed that the chain has been reasonably healthy since it included our transaction.
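The accumulated delay in this example is plain arithmetic: the elapsed time minus the time an ideal chain would have taken. Here is a one-line sketch (the function name is ours, for illustration only):

```ocaml
(* Accumulated delay (in seconds): the observed time for [blocks] blocks,
   minus the time an ideal chain baking one block every [block_time]
   seconds would have taken. Hypothetical helper, for illustration. *)
let accumulated_delay ~(block_time : int) ~(blocks : int) ~(elapsed : int) : int =
  elapsed - (blocks * block_time)
```

Here accumulated_delay ~block_time:60 ~blocks:3 ~elapsed:190 evaluates to 190 - 3 * 60 = 10 seconds.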

Thus by taking account of chain health since our transaction was included, we may — if the chain remains healthy — get a smaller number of confirmations than in the case of forks starting now.

Notes on the attack model

As always, our security guarantees depend on our attack model: i.e. what we assume an attacker wants to accomplish, and what powers we assume the attacker has when trying to do so. So it is as well to be clear about the limitations, as follows:

In the case of forks starting now, we have only considered that the attacker’s goal is to undo a transaction, so the attack stops when the attacker succeeds. In an alternative attack model, the attacker could keep switching branches and start again with the purpose of maintaining a fork for as long as possible. (Don’t worry! We have analysed this scenario in our blog post on mixed forks in Emmy.) The attacker could also play on multiple branches, while we consider only two branches.
In the case of forks started in the past, we have only considered the case when the attacker starts a fork at the block with the transaction to be undone. In an alternative attack model, the attacker could have started a fork before the block with the transaction to be undone — though one can always bring this within the scope of our analysis by assuming the attacker is trying to undo an arbitrary transaction in the earlier block.

The future

Tezos plans to switch to the classical BFT-style consensus algorithm Tenderbake, because this offers faster finality. Does that mean the Emmy family of algorithms is obsolete? Yes, if you only care about Tezos; but absolutely not if you care about Nakamoto-style consensus in general.

Emmy* is a perfectly fine consensus algorithm which arguably represents an evolutionary peak in Nakamoto-style consensus: the fact that it and Tezos stand to part ways reflects more on what Tezos requires for its future, than on Emmy itself.

Furthermore, all Nakamoto-style consensus algorithms are quite similar. The lessons learned from optimising Nakamoto-style consensus to run the Tezos blockchain for seven years, with increasing sophistication and real-world reliability, may therefore be valid for, or at least indicative of, your favourite Nakamoto-style consensus algorithm too. Mathematics doesn’t rust, and if the maths in this blog post helps inform the design of future blockchain protocols, then it will have done its job.

Three questions to Nomadic Labs interns — Mathis Gontier Delaunay

2021-07-28T14:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Mathis Gontier Delaunay (and a couple of questions of his mentors at Nomadic Labs).

Mathis — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Mathis

1. Please present yourself and your academic background

My name is Mathis Gontier Delaunay and I am 21.

I took a Mathematics and Physics preparatory class in Rennes with the Computer Science option, which included algorithms, programming methods, and data structures with OCaml. This provided the grounding for my current interest in computer science.

Since September 2020 I have been studying at Telecom SudParis, a generalist engineering school oriented towards the digital sector.

In September 2020 I also joined KRYPTOSPHERE, the first French student association dedicated to blockchain technologies and cryptocurrencies. I became passionate about the technology, its associated ideology, and its fast-growing ecosystem. I find the underlying concepts fascinating, and I am more and more drawn to the engineering aspects of its applications and protocols. I really believe in the future of these technologies, which can solve many problems, and I want to understand them in depth to participate in their development.

In April 2021 I was delighted to begin a developer’s role with a team of three researchers and one student working on a decentralized finance protocol that aims to create a “zero spread money market”; a much more efficient way to use liquidity than current lending protocols. The project has since grown and seven people are working on it today, with a proof of concept on the way: https://morpho.best/.

I hope one day to fully understand decentralized ledger technology and to participate in its long-term development. I look forward to contributing to the maturation of decentralized ledgers into a practical technology and a useful part of our lives and society.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

In early June 2021 I joined the Umami team at Nomadic Labs, and I will stay until early September. Umami is a desktop cryptocurrency wallet for the Tezos ecosystem whose Beta version was launched in April. It is developed by a team of about ten people at Nomadic Labs, currently working on new features like hardware wallet integration and the interaction with Tezos decentralized apps (Dapps) as listed on this webpage.

For my internship I’m contributing to Umami’s development. My mentor is Rémy El Sibaïe, one of the five developers of the team.

I bring my understanding of blockchain technology operation, usage and applications to my role, and this internship is an opportunity for me to participate with a team in actual software development in a blockchain ecosystem.

A big part of this turns out to be learning specialist tooling and collaboration techniques! For example, I spent my first days getting used to Git version control, and learning how to work with a team of developers on the same project. That’s two valuable life skills learned right there in the first week!

Umami is written in ReasonML, a functional language that works much like OCaml (notably, it has a strong type system) but has a JavaScript-like syntax and integrates better with the React library. I now have a better understanding of functional languages and how to use them to build applications like Umami. My mentor also explained to me some of the more specialised techniques used in Umami, such as futures and promises.

Contributing to an open-source project is very rewarding: after only one month, my first contribution to the project is already in the main codebase and can be used by the community. Even so, simply working here at Nomadic Labs would already have been an accomplishment for this first internship. I am surrounded and supervised by blockchain experts with a really strong understanding of the technology and its challenges. Listening to their discussions about the future of Tezos is a pleasure each day.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I have a strong interest in computer science and particularly in decentralized ledger technologies, which for me have opened the door to many innovations that I expect to revolutionize the way we exchange value, finance, and a lot more. I naturally wanted to do my first year internship in this domain.

Nomadic Labs is a major player in the crypto-blockchain sector in France, and also one of the most technical ones. Nomadic Labs is a key contributor to the core protocol of the really innovative blockchain that is Tezos, and combines this with wider work on adoption and support. That combination is what attracted me.

Participating in the development of an ecosystem with infinite potential is very exciting and motivating, and this experience has confirmed my attraction to the sector. That is why I would like to specialize in distributed systems as soon as possible, probably at university, in France or abroad. I would also like to conduct a bigger research / development project for my next internship. Who knows: perhaps that too could be at Nomadic Labs.

Questions for Mathis’ Nomadic Labs mentor Rémy El Sibaïe:

What is your input to the work of Mathis?

Mathis has a good understanding of the blockchain world and has been making useful contributions to the Umami project from the start of his internship.

This is his first industrial experience, so his learning curve has included our development tools and professional good practice. And of course, Mathis has been learning the functional programming and type systems that are specific to OCaml and ReasonML.

Mathis has good ideas and a sound understanding of users’ requirements. We look forward to integrating his contribution into our production code.

Why is the topic of this internship important?

Mathis is making substantive contributions to core features of the Umami system and is really pulling his weight as he works with our team. We could not ask more of an intern.

Mathis also brings something else of great value to us: he sees our project with fresh eyes. This helps us find and fix accessibility mistakes that we, as the system’s developers who know Umami too well, are liable to miss. Thank you Mathis!

A tale of two reductions in gas consumption in Tezos

2021-07-26T15:00:00+02:00

Introduction
Tale 1: a new tail recursive monadic interpreter
Tale 2: hunt for a serialization performance bug
Conclusion

Introduction

We subscribe to Kent Beck’s motto:

Make it work, make it right, make it fast!

— in that order. This implies an incremental software development process:

When you make it work, you may elide corner cases or design considerations required to make it right; and
when you make it right, you may introduce abstractions which are good for provability but are too naive to make it fast in the implementation.

This is normal: incremental development separates concerns so we can properly deal with each of them.

We’d like to share with you two tales of making Tezos faster in Granada by following the motto above:¹

Tale 1 led to an optimization of the Michelson interpreter; and
Tale 2 led to an optimization of the serialization layer.

Both tales are set after the “make it right” stage and during the “make it fast” stage. The code was clean (and we wanted to keep it that way); we just wanted to increase performance.

With these two optimizations, Granada executes smart contracts with much better gas consumption.² Granada is the latest, and recently accepted, amendment proposal for the Tezos economic protocol. For example, a typical call to the Dexter entrypoint XtzToToken consumes roughly

75600 units of gas in Florence, and
9320 units of gas in Granada.

That’s an eightfold improvement.³

Tale 1: a new tail recursive monadic interpreter

Tezos executes smart contracts in Michelson, a strongly-typed low-level language.⁴ Roughly speaking, when provided with

an input argument, and
a storage state, and
a stack of pending operations,

a Michelson script computes

a new storage state, and
a new stack of pending operations, and
it may also carry out some concrete action to change the state of the blockchain — the technical term is a side-effect.

An aside: side-effects via monads

Monads are how functional programmers deal with effectful computation, meaning programs that may compute a return value and also have side-effects on the ambient world as they do so (input/output, token transfers, etc). The impatient reader is welcome to skip forward to our toy MichelBaby interpreter and return to this subsection for reference if required, especially when the reader sees us mention bind later.

Still here? Great!

A monad is a type constructor, meaning that given a type t and a monad monad, we have a type t monad. There are many different monads, e.g. the error monad, the state monad, and the Lwt (lightweight threads) monad. For the purposes of this blog post, however, these all just represent actions on some state machine.⁵ Thus t monad is a type for computations that compute a t while exerting side-effects on the state represented by monad.

A monad comes with the following key operations:

return : 'a -> 'a monad
This inputs a value of type 'a and returns that same value, in an empty (or initial) state.
bind : 'a monad -> ('a -> 'b monad) -> 'b monad
This binds a stateful computation from 'a to 'b, to an input 'a-in-state, thus obtaining an output 'b-in-state.
run : 'a monad -> 'a
This discards the state and just returns the value.⁶
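To make these three operations concrete, here is a minimal state monad in OCaml. It is a toy for illustration, not the monad used in Tezos. The state is a hypothetical integer counter, and tick and get are primitive effects we introduce for the example.

```ocaml
(* A minimal state monad, for illustration only. The state is a
   hypothetical integer counter of performed effects. *)
type state = int
type 'a monad = state -> 'a * state

(* return: yield [x], leaving the state unchanged *)
let return (x : 'a) : 'a monad = fun s -> (x, s)

(* bind: run [m], then pass its result and the updated state to [f] *)
let bind (m : 'a monad) (f : 'a -> 'b monad) : 'b monad =
  fun s ->
    let x, s' = m s in
    f x s'

(* run: start from the initial state 0 and discard the final state *)
let run (m : 'a monad) : 'a = fst (m 0)

(* Two primitive effects: increment the counter, and read it *)
let tick : unit monad = fun s -> ((), s + 1)
let get : state monad = fun s -> (s, s)
```

For example, run (bind tick (fun () -> bind tick (fun () -> get))) evaluates to 2: each tick increments the state, and get reads it back.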

C, Java, or Python have monads, or more precisely one monad: a single “global state monad” which represents how the actual state of the computer changes as it executes instructions. Since in these languages the global state monad permeates all computation, it need not be mentioned explicitly — but it is still there, as you can check: just run a virtual machine, take a snapshot, et voilà your global state is saved to disk as a datum.

OCaml (in common with many other functional programming languages) allows us to be far more fine-grained, by bundling state-fragments up into monads.⁷ This has at least two benefits:

fragments of machine state are now encapsulated as typed data and can be passed as arguments to other functions (as our crude example above of snapshotting a VM illustrates); and also,
using monads encourages the programmer to be discriminating about precisely what effects a given computation really needs — if any.

So if we think about it, execution of a Michelson script requires a few distinct kinds of state:

The state of the blockchain (e.g. to transfer tokens).
The gas level consumed in the current block (especially in case we are about to run out of gas).
A script may interact with the environment via I/O.
A script may also fail because it tried to perform some invalid operation, and this too is interaction with state; namely the failure monad whose effect (if triggered) is to say “I have no return value because I have failed”.

In practice the Michelson interpreter uses a monad in OCaml which is the composition of three different monads: the error monad, the state monad, and the Lwt (lightweight threads) monad.

If the reader sees return, bind, or run below — they are just the monadic glue coming from this or a closely related monadic combination.
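To make the idea of composing monads concrete, here is a minimal sketch of an error monad layered over a state monad. It omits Lwt entirely. The state is a hypothetical gas counter, and errors are plain strings. It illustrates the idea only; it is not the monad used by the Tezos protocol. Because this monad can fail, its run returns a result rather than a bare value.

```ocaml
(* Sketch: error monad composed with a state monad (Lwt omitted).
   State: a HYPOTHETICAL remaining-gas counter. Errors: strings. *)
type gas = int
type 'a monad = gas -> ('a * gas, string) result

let return (x : 'a) : 'a monad = fun g -> Ok (x, g)

(* bind: thread the gas through, and short-circuit on the first error *)
let bind (m : 'a monad) (f : 'a -> 'b monad) : 'b monad =
  fun g ->
    match m g with
    | Ok (x, g') -> f x g'
    | Error e -> Error e

(* fail: abort the computation with a message *)
let fail (msg : string) : 'a monad = fun _ -> Error msg

(* consume: spend [n] units of gas, failing if not enough remains *)
let consume (n : int) : unit monad =
  fun g -> if g < n then Error "out of gas" else Ok ((), g - n)

(* run: start from an initial gas budget and drop the final state *)
let run ~(initial_gas : gas) (m : 'a monad) : ('a, string) result =
  match m initial_gas with
  | Ok (x, _) -> Ok x
  | Error e -> Error e
```

For example, run ~initial_gas:2 (bind (consume 3) (fun () -> return "ok")) evaluates to Error "out of gas". The continuation never runs.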

An interpreter for Michelson scripts

An interpreter is a program that runs scripts. Let’s sketch what an OCaml interpreter might look like for a simple object-level language⁸ MichelBaby. MichelBaby is derived from a subset of Michelson’s instructions as used in the currently-live economic protocol Florence; so what follows can be viewed as a simplified but indicative reflection of the smart contracts system as it is currently deployed.

Here’s the OCaml type declaration for the abstract datatype instr of MichelBaby instructions (the impatient reader can skip forward to compare with the second version):

type (_, _) instr =
  |  Push : 'b -> ('s, 'b * 's) instr
  |   Mul :       (z * (z * 's), z * 's) instr
  |   Dec :       (z * 's, z * 's) instr
  | CmpNZ :       (z * 's, bool * 's) instr
  |  Loop :       ('s, bool * 's) instr
               -> (bool * 's, 's) instr
  |   Dup :       ('a * 's, 'a * ('a * 's)) instr
  |  Swap :       ('a * ('b * 's), 'b * ('a * 's)) instr
  |  Drop :       ('a * 's, 's) instr
  |   Dip :       ('s, 't) instr
               -> ('a * 's, 'a * 't) instr
  |    IO :       ( z * 's,  z * 's) instr
  |   Seq :       ('s, 't) instr * ('t, 'u) instr
               -> ('s, 'u) instr

(The type z above is the integer type Z.t from the OCaml library Zarith, which provides arbitrary-precision arithmetic.)

We notice that:

instr is polymorphic over two type parameters: ('s, 't) instr.
- The first parameter 's represents the input stack type and
- the second parameter 't represents the output stack type.
Each line in the datatype declaration corresponds to an individual instruction. The type parameters give useful information on the intended meaning. For example:
- Mul : (z * (z * 's), z * 's) instr is an instruction that inputs a stack headed by two integers (z * (z * 's)) and outputs a stack headed by one integer (z * 's). Intuitively, Mul pops the two integers off the stack, multiplies them, and pushes the result.
- Push : 'b -> ('s, 'b * 's) instr is an instruction that inputs a value of type 'b and returns an instruction that inputs a stack of type 's and outputs a stack of type 'b * 's. Intuitively, Push b pushes b (of course).
There is a special sequence constructor Seq that
- inputs an instruction that inputs stack 's and outputs a stack 't, and
- inputs an instruction that inputs a stack 't and outputs a stack 'u, and
- returns a composite instruction that inputs a stack 's and outputs a stack 'u.
Note how the types make it obvious what the interpreter should do with Seq (I, J): first I, then J.
IO is a basic effectful instruction — for concreteness, the reader can assume that it writes the topmost integer of the stack to a file, and could be used for example as a hook for logging and profiling execution.

MichelBaby typechecking for free, courtesy of the OCaml typechecker

This style of polymorphic datatype declaration makes the OCaml typechecker act as a MichelBaby typechecker too: only (representations of) well-typed scripts can inhabit the instr type. For example, consider the following MichelBaby implementation fact of the factorial:

(* Inline syntactic sugar for sequencing *)
let ( @ ) s1 s2 = Seq (s1, s2)

(* MichelBaby instruction to calculate n! *)
let fact n =
  assert (Z.(n > zero));
  Push n
  @ Push (Z.of_int 1)
  @ Dup @ CmpNZ @ Dip Swap
  @ Loop (Dup @ Dip Swap @ Mul @ IO @ Swap @ Dec @ Dup @ CmpNZ)
  @ Drop

Hurrah, this is well-typed! OCaml’s typechecker does not know that fact is a MichelBaby sequence of instructions to compute the factorial, but it does certify that fact has the following type:

fact : z -> ('s, z * 's) instr

In English, this is the OCaml typechecker telling us:

fact will input an integer and return a MichelBaby instruction that inputs a stack and returns a stack of the same shape except it has an integer pushed onto it.

Now suppose we forget the first CmpNZ instruction above. Then the OCaml typechecker will reject the definition with the following error message:

This expression has type (bool * 'a, 'b) instr
but an expression was expected of type (Z.t * 'c, 'd) instr
Type bool is not compatible with type Z.t

This is the OCaml typechecker, giving us MichelBaby typechecking for free.

Tracing factorial: a standard toy example

Our implementation fact of factorial is a tracing factorial, meaning that it computes factorial and contains an IO tracing (or breakpoint) hook so we can observe intermediate results of the computation using an input/output primitive of our choice — a similar hook is in our efficient OCaml factorial function below.

Tracing factorial is interesting because it is an easy-to-understand toy example of a mostly pure computation with some side-effects — which is what most smart contracts look like. This is a convenient toy example: no implication is intended that tracing factorial is the only thing that could be done in MichelBaby or Michelson!

The `step` function, Version 1

The instr datatype allows us to represent well-typed programs. Now we need to design an interpreter. First, suppose we are given a function step

step : ('i, 'o) instr -> 'i -> 'o monad

Then we can write our interpreter as follows:

type ('storage, 'argument) well_typed_script =
  (('storage * 'argument) * unit, (('storage * operations) * unit)) instr

type ('storage, 'argument) interpretation =
  'storage -> 'argument -> ('storage * operations) monad

let interpreter : type storage argument.
  (storage, argument) well_typed_script -> (storage, argument) interpretation  =
  fun instr storage argument ->
    bind (step instr ((storage, argument), ()))
    @@ fun ((storage, operations), ()) -> return (storage, operations)

So now we just have to write code for step. Standard practice is to just follow the inductive structure of instr.

We present Version 1 of the step function (cf. Versions 1.1 and 2 below):

let rec step : type a b. (a, b) instr -> a -> b monad =
  fun instr stack ->
  match (instr, stack) with
  | Push x, stack
    -> return (x, stack)
  | Mul, (x, (y, stack))
    -> return (Z.mul x y, stack)
  | Dec, (x, stack)
    -> return (Z.(sub x (of_int 1)), stack)
  | CmpNZ, (x, stack)
    -> return (Z.(compare x zero <> 0), stack)
  | Loop _, (false, stack)
    -> return stack
  | Loop body, (true, stack)
    -> bind (step body stack) @@ fun stack -> step (Loop body) stack
  | Dup, (x, stack)
    -> return (x, (x, stack))
  | Swap, (x, (y, stack))
    -> return (y, (x, stack))
  | Drop, (_, stack)
    -> return stack
  | Dip i, (x, stack)
    -> bind (step i stack) @@ fun stack -> return (x, stack)
  | IO, (z, _)
    -> bind (io z) @@ fun () -> return stack
  | Seq (i1, i2), stack
    -> bind (step i1 stack) @@ fun stack -> step i2 stack

The recursive function step observes the instruction and the stack and produces a monadic computation using return and bind. Let’s read through some cases of this definition:

Case for Push. As prescribed by the definition of Push, the input stack type is 's and we must return a stack of the form 'b * 's. That is the type of (x, stack). Moreover, since we need the returned value to be of type ('b * 's) monad, we use return to turn (x, stack) into a monadic computation.
Case for Seq. As prescribed by the definition of Seq, the instruction i1 must be of type ('s, 't) instr and i2 of type ('t, 'u) instr. The recursive call (step i1 stack) is well-typed and returns a value of type 't monad. Using bind, we can retrieve the stack of type 't needed to interpret i2. The final computation has type 'u monad as expected.

Case for IO. Here, we use an effectful operation io : z -> unit monad and compose it with a computation that returns that stack unchanged.

We check that the interpreter correctly computes the factorial on an input (other inputs work too!):

# run (step (fact (Z.of_int 100)) ()) |> fst = ocaml_fact (Z.of_int 100);;
- : bool = true
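For readers who want to experiment, the check above can be reproduced with the following self-contained sketch of Version 1. Two simplifications are ours, not those of the Tezos codebase. First, OCaml's native int stands in for Zarith's Z.t, so large factorials overflow. Second, the monad is a toy trace monad whose state is the list of integers written by IO. It replaces the combination of error, state and Lwt monads.

```ocaml
(* Self-contained sketch of MichelBaby, Version 1 of step.
   ASSUMPTIONS (ours): int replaces Z.t; the monad is a toy trace monad. *)
type z = int

type (_, _) instr =
  | Push : 'b -> ('s, 'b * 's) instr
  | Mul : (z * (z * 's), z * 's) instr
  | Dec : (z * 's, z * 's) instr
  | CmpNZ : (z * 's, bool * 's) instr
  | Loop : ('s, bool * 's) instr -> (bool * 's, 's) instr
  | Dup : ('a * 's, 'a * ('a * 's)) instr
  | Swap : ('a * ('b * 's), 'b * ('a * 's)) instr
  | Drop : ('a * 's, 's) instr
  | Dip : ('s, 't) instr -> ('a * 's, 'a * 't) instr
  | IO : (z * 's, z * 's) instr
  | Seq : ('s, 't) instr * ('t, 'u) instr -> ('s, 'u) instr

(* Trace monad: the state is the reversed list of values written by IO *)
type 'a monad = z list -> 'a * z list
let return x : 'a monad = fun tr -> (x, tr)
let bind (m : 'a monad) (f : 'a -> 'b monad) : 'b monad =
  fun tr -> let x, tr' = m tr in f x tr'
let run (m : 'a monad) : 'a * z list =
  let x, tr = m [] in (x, List.rev tr)
let io (x : z) : unit monad = fun tr -> ((), x :: tr)

let rec step : type a b. (a, b) instr -> a -> b monad =
  fun instr stack ->
  match (instr, stack) with
  | Push x, stack -> return (x, stack)
  | Mul, (x, (y, stack)) -> return (x * y, stack)
  | Dec, (x, stack) -> return (x - 1, stack)
  | CmpNZ, (x, stack) -> return (x <> 0, stack)
  | Loop _, (false, stack) -> return stack
  | Loop body, (true, stack) ->
      bind (step body stack) @@ fun stack -> step (Loop body) stack
  | Dup, (x, stack) -> return (x, (x, stack))
  | Swap, (x, (y, stack)) -> return (y, (x, stack))
  | Drop, (_, stack) -> return stack
  | Dip i, (x, stack) -> bind (step i stack) @@ fun stack -> return (x, stack)
  | IO, (x, _) -> bind (io x) @@ fun () -> return stack
  | Seq (i1, i2), stack -> bind (step i1 stack) @@ fun stack -> step i2 stack

(* Inline syntactic sugar for sequencing (shadows List's ( @ ) from here on) *)
let ( @ ) s1 s2 = Seq (s1, s2)

(* Tracing factorial, as in the text but over int *)
let fact n =
  assert (n > 0);
  Push n
  @ Push 1
  @ Dup @ CmpNZ @ Dip Swap
  @ Loop (Dup @ Dip Swap @ Mul @ IO @ Swap @ Dec @ Dup @ CmpNZ)
  @ Drop
```

Evaluating run (step (fact 5) ()) gives ((120, ()), [5; 20; 60; 120; 120]). The final stack holds 5!, and the trace records the value written by IO at each iteration of the loop.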

This interpreter has a good property: it can’t fail. Because we get MichelBaby type-checking for free, every application of step to a well-typed argument in instr is guaranteed to successfully execute (provided the ambient execution environment doesn’t suffer an overflow; see next paragraph). We can’t fail on an ill-formed operation like multiplying two strings or popping an empty stack, because our free MichelBaby typechecker will detect and reject the instruction-sequence as ill-typed.

Unfortunately, our interpreter above is not yet “right”, because step is not tail recursive. Note that:

Recursive calls to step are not the final step in the computation in the clauses for Loop (in step body stack), Seq (in step i1 stack) and Dip (in step i stack).
For comparison, the calls in Loop to step (Loop body) stack and in Seq to step i2 stack are tail recursive.

This is not “right” because each call to the interpretation loop that passes through a non-tail-recursive call to step may consume a bit of the OCaml call stack. We can use this to provoke incorrect behaviour, namely a stack overflow. The call stack fills up, and execution has to be aborted because no stack space is left:

# let rec long_seq accu n =
  if n = 0 then accu
           else long_seq (Seq (accu, Seq (Push Z.zero, Drop))) (n - 1);;
val long_seq : ('a, 'b) instr -> int -> ('a, 'b) instr = <fun>
# step (long_seq (Push Z.zero) 10000000) ();;
Stack overflow during evaluation (looping recursion?).

The point at which the stack overflow occurs is architecture-dependent: it depends on how much stack space is available. A cleaner way to handle this is to count the nesting depth of non-tail-recursive calls, and abort if we go too far.⁹

The `step` function, Version 1.1

We add a depth counter to Version 1 of step (cf. Version 2 below):

let rec step : type a b. int -> (a, b) instr -> a -> b monad =
  fun depth instr stack ->
  if depth > 100000 then fail "Too many recursive calls"
  else match (instr, stack) with
  | Push x, stack
      -> return (x, stack)
  | Mul, (x, (y, stack))
      -> return (Z.mul x y, stack)
  | Dec, (x, stack)
      -> return (Z.(sub x (of_int 1)), stack)
  | CmpNZ, (x, stack)
      -> return (Z.(compare x zero <> 0), stack)
  | Loop _, (false, stack)
      -> return stack
  | Loop body, (true, stack)
      -> bind (step (depth + 1) body stack) @@ fun stack -> step depth (Loop body) stack
  | Seq (i1, i2), stack
      -> bind (step (depth + 1) i1 stack) @@ fun stack -> step depth i2 stack
  | Dup, (x, stack)
      -> return (x, (x, stack))
  | Swap, (x, (y, stack))
      -> return (y, (x, stack))
  | Drop, (_, stack)
      -> return stack
  | Dip i, (x, stack)
      -> bind (step (depth + 1) i stack) @@ fun stack -> return (x, stack)
  | IO, (z, _)
      -> bind (io z) @@ fun () -> return stack

The program is still non-tail-recursive but at least we have replaced a stack overflow which we cannot control, with an explicit branch within our own program which we can control.

Some tests are required to determine whether 100000 is a good limit.¹⁰ Once this limit value is appropriately chosen, we can claim to have made the interpreter “right”:

it follows a standard OCaml programming style (hence natural to reason about), and
it is robust to stack overflows.

Making the interpreter fast

How fast is our step function? To get an idea, we can compare how long it takes to compute the factorial of a hundred — 100! = 100*99*98*...*1 — with an equivalent native OCaml implementation of the factorial:

let lwt_fact n =
  let rec aux k accu =
    if Z.(compare k zero = 0) then return accu
    else
      let accu = Z.mul accu k in
      bind (io accu) (fun () -> aux (Z.sub k (Z.of_int 1)) accu)
  in
  aux n (Z.of_int 1)

Here are the benchmarks for the OCaml factorial function and V1.1 of our step function:

┌─────────────┬──────────┬────────────┐
│ Name        │ Time/Run │ Percentage │
├─────────────┼──────────┼────────────┤
│ OCaml       │   3.71us │      23.4% │
│ Step (V1.1) │  15.83us │     100.0% │
└─────────────┴──────────┴────────────┘

On this specific example, the interpreter is more than four times slower than the reference implementation in OCaml (15.83us versus 3.71us). Why? Profiling execution with the Linux perf command, we learn that 50% of the time is spent in the monadic combinators (mentioned above). Mostly this is due to the code for handling Seq above, in which bind is called to glue together the interpretation of the two instructions of a sequence. This hurts performance because bind happens to be a rather complex operation when the monad includes the Lwt (lightweight threads) monad.

We need to get rid of those binds.

Reducing the number of `bind`s

Let’s reexamine the MichelBaby code for fact above:

(* Inline syntactic sugar for sequencing *)
let ( @ ) s1 s2 = Seq (s1, s2)

(* MichelBaby instruction to calculate n! *)
let fact n =
  assert (Z.(n > zero));
  Push n
  @ Push (Z.of_int 1)
  @ Dup @ CmpNZ @ Dip Swap
  @ Loop (Dup @ Dip Swap @ Mul @ IO @ Swap @ Dec @ Dup @ CmpNZ)
  @ Drop

The MichelBaby script contains only one inherently effectful instruction: IO. Yet each Seq (written @ above) calls a monadic bind operation, which is expensive. This means that we pay the cost of the monadic abstraction repeatedly, when in fact a pure computation could perform all but one of the instructions more efficiently.

Can we separate the pure instructions from the impure ones and then use bind just when impure instructions enter the scene? We can start by syntactically separating the pure computations from the impure ones, taking inspiration from our OCaml version of the factorial:

let lwt_fact n =
  let rec aux k accu =
    if Z.(compare k zero = 0) then return accu
    else
      let accu = Z.mul accu k in
      bind (io accu) (fun () -> aux (Z.sub k (Z.of_int 1)) accu)
  in
  aux n (Z.of_int 1)

Notice how

the accumulator accu accumulates the result of a sequence of multiplications — that’s the pure computation, whereas
the impure computation — namely the call to the effectful operation io — is performed in the body of the aux function.

We will now rephrase step so that:

The pure computations rewrite the input stack passed as an argument to a tail recursive call to the next instruction.
The impure computations return monadic computations.

First, we must rephrase the datatype that defines the instructions, so that every instruction points to the next instruction (compare the code below with the first version; z is the same integer type as before):

type (_, _) instr =
| KHalt  : ('s, 's) instr
| KPush  : 'b * ('b * 's, 'f) instr ->
                     ('s, 'f) instr
| KMul   :       (z * 's, 'f) instr ->
           (z * (z * 's), 'f) instr
| KDec   : (z * 's, 'f) instr ->
           (z * 's, 'f) instr
| KCmpNZ : (bool * 's, 'f) instr ->
              (z * 's, 'f) instr
| KLoop  : ('s, bool * 's) instr * ('s, 'f) instr ->
                            (bool * 's, 'f) instr
| KDup   : ('a * ('a * 's), 'f) instr ->
                 ('a * 's , 'f) instr
| KSwap  : ('a * ('b * 's), 'f) instr ->
           ('b * ('a * 's), 'f) instr
| KDrop  :      ('s, 'f) instr ->
           ('a * 's, 'f) instr
| KDip   : ('t, 's) instr * ('a * 's, 'f) instr ->
                            ('a * 't, 'f) instr
| KIO    : (z * 's, 'f) instr ->
           (z * 's, 'f) instr

Let’s compare Mul from the first version with KMul here:

Mul : (z * (z * 's), z * 's) instr is an instruction that
- inputs a stack headed by two integers (z * (z * 's)), and
- outputs a stack headed by one integer (z * 's).

Intuitively, Mul pops the two integers off the stack, multiplies them, and pushes the result.
KMul : (z * 's, 'f) instr -> (z * (z * 's), 'f) instr is an instruction that
- inputs a sequence of instructions that inputs a stack headed by an integer (z * 's) and outputs a stack 'f, and
- outputs a sequence of instructions that inputs a stack headed by two integers (z * (z * 's)) and outputs a stack 'f.

Intuitively, KMul just adds Mul onto the head of an existing instruction sequence.

Our new datatype does not represent instructions; it represents instruction-sequences.
The datatype-constructors represent instructions to push instructions onto sequences.

Ever written a shopping list? The family sits around the table calling out household essentials, and then Dad (or Mum — whoever does the shopping) adds items to a written list. KMul corresponds to somebody saying “Don’t forget to buy multiplication” and then Dad (or Mum) writes “Mul-ti-pli-cation” on the end of the list.

Except that, because this blog post is about Very Serious Programming, our lists start on the right-hand-side and expand leftwards.

The issue with the cost of bind is also relevant in this analogy. Most of the time it doesn't matter in what order items go into the shopping basket. So it makes sense to split the shopping list into a small number of big, heavy items that need to go into the trolley last, and a large number of smaller items that can be executed efficiently by traversing the supermarket in whatever order best suits its layout. Isn't computer science wonderful?

KHalt is a new instruction that marks the end of our to-do list (think: an EOF or EOS marker).
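Here is a minimal sketch of the right-to-left construction, written for illustration and not taken from the Granada code. It keeps only four of the constructors and uses OCaml's native int in place of Zarith's z, so the snippet needs no external library. None of these instructions makes a control-flow decision, so one tail-recursive loop over a single sequence is enough to run them. The helper square is hypothetical.

```ocaml
(* A cut-down instruction-sequence GADT: each constructor prepends one
   instruction to the sequence that follows it. int stands in for Z.t. *)
type (_, _) instr =
  | KHalt : ('s, 's) instr
  | KPush : 'b * ('b * 's, 'f) instr -> ('s, 'f) instr
  | KDup : ('a * ('a * 's), 'f) instr -> ('a * 's, 'f) instr
  | KMul : (int * 's, 'f) instr -> (int * (int * 's), 'f) instr

(* Walk the sequence from its head; every recursive call is a tail call. *)
let rec run : type a f. (a, f) instr -> a -> f =
  fun k s ->
  match k with
  | KHalt -> s
  | KPush (x, k) -> run k (x, s)
  | KDup k ->
      let x, s = s in
      run k (x, (x, s))
  | KMul k ->
      let x, (y, s) = s in
      run k (x * y, s)

(* "PUSH 3; PUSH 4; MUL": KHalt, the end of the list, is the innermost
   term, and each earlier instruction wraps the rest of the sequence. *)
let twelve : (unit, int * unit) instr = KPush (3, KPush (4, KMul KHalt))

(* Hypothetical reusable fragment: given the rest of the sequence k,
   prepend "DUP; MUL", which squares the top of the stack. *)
let square k = KDup (KMul k)

let () =
  Printf.printf "%d %d\n"
    (fst (run twelve ()))
    (fst (run (KPush (5, square KHalt)) ())) (* prints "12 25" *)
```

A fragment like square is just a function from "the rest of the list" to a longer list. That is what adding Mul onto the head of an existing sequence amounts to.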

There is one catch to our to-do list / shopping list analogy (to be fair, the reader can expect programming smart contracts to be a little harder than shopping): it may not always be possible to determine the next instruction entirely in advance. Execution may dynamically depend on input parameters: for example, a control-flow operator like KLoop dynamically chooses the next instruction by observing a Boolean.

This means that we need another stack — a control stack — that will allow control-flow operators to dynamically define what the next instruction should be:

type (_, _) instrs =
  | KNil : ('s, 's) instrs
  | KCons : ('s, 't) instr * ('t, 'u) instrs -> ('s, 'u) instrs

Intuitively, if instr represents a sequence of instructions, then instrs represents a list of sequences of instructions. The constraint is that the final stack type ('t above) of each sequence on the list must be compatible with the input stack type (also 't above) of the next sequence in the list, if there is one.

The `step` function, Version 2

We present Version 2 of the step function (cf. Versions 1 and 1.1 above):

let step : type a b. (a, b) instr -> a -> b monad =
  fun i stack ->
  (* exec runs the current sequence k; ks is the control stack of
     sequences still to be run once k reaches KHalt. *)
  let rec exec : type a i. (a, i) instr -> (i, b) instrs -> a -> b monad =
    fun k ks s ->
      match (k, ks) with
      (* Nothing left to do: the only use of return. *)
      | KHalt, KNil -> return s
      (* Current sequence finished: resume the next pending one. *)
      | KHalt, KCons (k, ks) -> exec k ks s
      (* The only impure instruction, and the only use of bind. *)
      | KIO k, ks ->
          let z, _ = s in
          bind (io z) (fun () -> exec k ks s)
      | KPush (z, k), ks -> exec k ks (z, s)
      (* On true, run the body, then come back to this same loop. *)
      | KLoop (ki, k), ks -> (
          match s with
          | true, s -> exec ki (KCons (KLoop (ki, k), ks)) s
          | false, s -> exec k ks s )
      | KMul k, ks ->
          let x, (y, s) = s in
          exec k ks (Z.mul x y, s)
      | KDec k, ks ->
          let x, s = s in
          exec k ks (Z.sub x (Z.of_int 1), s)
      | KCmpNZ k, ks ->
          let x, s = s in
          exec k ks (Z.(compare x zero) <> 0, s)
      | KDup k, ks ->
          let x, s = s in
          exec k ks (x, (x, s))
      | KSwap k, ks ->
          let x, (y, s) = s in
          exec k ks (y, (x, s))
      | KDrop k, ks ->
          let _, s = s in
          exec k ks s
      (* Run ki on the tail, then push x back and continue with k. *)
      | KDip (ki, k), ks ->
          let x, s = s in
          exec ki (KCons (KPush (x, k), ks)) s
  in
  exec i KNil stack

Compared with Version 1.1 and Version 1, Version 2 uses far fewer monadic combinators:

Version 2 has one use of return (for KHalt, of course) and one use of bind (for KIO).
Versions 1 and 1.1 have four uses of bind and more uses of return than we care to count.

How does this work?

Pure computations (i.e. stack manipulation and arithmetic) update the data stack directly and tail-call exec on the rest of the to-do list, with no monadic combinator involved.
The impure KIO computation calls bind to glue its interpretation to the rest of the evaluation.
Control operators, like KLoop, update the control stack to dynamically determine the next instruction-sequence.
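The three points above can be checked on a pure, cut-down copy of the Version 2 machine. This is a sketch with two simplifications. First, int replaces Zarith's z. Second, KIO and the monad are dropped, so return and bind disappear altogether. The factorial program is an illustration of our own, written right to left as a chain of constructors.

```ocaml
(* Pure sketch of the Version 2 machine: int instead of Z.t, no KIO, no monad. *)
type (_, _) instr =
  | KHalt : ('s, 's) instr
  | KPush : 'b * ('b * 's, 'f) instr -> ('s, 'f) instr
  | KMul : (int * 's, 'f) instr -> (int * (int * 's), 'f) instr
  | KDec : (int * 's, 'f) instr -> (int * 's, 'f) instr
  | KCmpNZ : (bool * 's, 'f) instr -> (int * 's, 'f) instr
  | KLoop : ('s, bool * 's) instr * ('s, 'f) instr -> (bool * 's, 'f) instr
  | KDup : ('a * ('a * 's), 'f) instr -> ('a * 's, 'f) instr
  | KSwap : ('a * ('b * 's), 'f) instr -> ('b * ('a * 's), 'f) instr
  | KDrop : ('s, 'f) instr -> ('a * 's, 'f) instr
  | KDip : ('t, 's) instr * ('a * 's, 'f) instr -> ('a * 't, 'f) instr

(* The control stack: a list of sequences whose stack types chain up. *)
type (_, _) instrs =
  | KNil : ('s, 's) instrs
  | KCons : ('s, 't) instr * ('t, 'u) instrs -> ('s, 'u) instrs

let run : type a b. (a, b) instr -> a -> b =
  fun i stack ->
  let rec exec : type a i. (a, i) instr -> (i, b) instrs -> a -> b =
    fun k ks s ->
    match (k, ks) with
    | KHalt, KNil -> s
    | KHalt, KCons (k, ks) -> exec k ks s (* resume the next pending sequence *)
    | KPush (x, k), ks -> exec k ks (x, s)
    | KMul k, ks -> let x, (y, s) = s in exec k ks (x * y, s)
    | KDec k, ks -> let x, s = s in exec k ks (x - 1, s)
    | KCmpNZ k, ks -> let x, s = s in exec k ks (x <> 0, s)
    | KLoop (body, k), ks -> (
        match s with
        (* run the body, then come back to this very loop *)
        | true, s -> exec body (KCons (KLoop (body, k), ks)) s
        | false, s -> exec k ks s)
    | KDup k, ks -> let x, s = s in exec k ks (x, (x, s))
    | KSwap k, ks -> let x, (y, s) = s in exec k ks (y, (x, s))
    | KDrop k, ks -> let _, s = s in exec k ks s
    (* set x aside by scheduling "push x, then k" to run after ki *)
    | KDip (ki, k), ks -> let x, s = s in exec ki (KCons (KPush (x, k), ks)) s
  in
  exec i KNil stack

(* Loop body, on stack n, acc: DUP; DIP { MUL }; DEC; DUP; CMPNZ
   leaves (n-1 <> 0), n-1, n*acc. *)
let body = KDup (KDip (KMul KHalt, KDec (KDup (KCmpNZ KHalt))))

(* PUSH 1; SWAP; DUP; CMPNZ; LOOP body; DROP computes n! (assumes n >= 0). *)
let factorial : (int * unit, int * unit) instr =
  KPush (1, KSwap (KDup (KCmpNZ (KLoop (body, KDrop KHalt)))))

let () = Printf.printf "%d\n" (fst (run factorial (5, ()))) (* prints 120 *)
```

Each KLoop iteration pushes the loop itself onto the control stack (a heap-allocated instrs value), not onto the OCaml call stack. This is why exec stays tail recursive however many iterations run.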

Question: does this improve performance? Answer: Yes, significantly:

┌───────────────┬──────────┬────────────┐
│ Name          │ Time/Run │ Percentage │
├───────────────┼──────────┼────────────┤
│ OCaml         │   3.62us │      22.9% │
│ Step (V1.1)   │  15.83us │     100.0% │
│ Step (V2)     │   6.39us │      40.4% │
└───────────────┴──────────┴────────────┘

On this specific micro-benchmark, Version 2 is roughly two and a half times faster than Version 1.1 (6.39us versus 15.83us), and its speed is within a factor of two of the efficient native OCaml implementation (6.39us versus 3.62us).

Note that protection against stack overflows (as discussed above) is not required, because the new interpreter is tail recursive. Note also that step Version 2 still gives us MichelBaby typechecking for free, so it is as safe (and offers as much static guarantee of correctness) as Version 1.

Is that everything?

Granada’s new Michelson interpreter gains most of its efficiency from the program transformation described above, but three other optimizations are also substantive:

We use a local gas counter in the interpretation loop instead of the gas level found in the context.¹¹ The OCaml compiler can represent this local gas counter in a machine register instead of a heap-allocated object; reading from and writing to a machine register is several orders of magnitude faster than for a heap-allocated object.
Similarly, we cache the top of the stack by passing it as a dedicated argument in the interpretation loop, separately from the tail of the stack. Semantically this makes no difference — it’s still a stack! — but by this device we encourage the compiler to store the top of the stack in a machine register, potentially saving many trivial RAM read/writes as the top of the stack would otherwise get pushed to and then popped from memory (see a paper on stack caching for interpreters).
The Michelson interpreter includes an integrated logging tool, so that users can test smart contract code off-chain (e.g. profile their smart contracts before they go live). This logging tool is inactive for live on-chain code, but in Florence the logger has nonzero cost even when inactive — thus, live smart contracts run slightly slower in Florence than they would if the profiler did not exist at all. For Granada, we found a way to exploit the control stack to implement genuinely zero-cost logging.
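The first two devices can be seen in a hypothetical toy interpreter. Its untyped five-instruction language is neither Michelson nor the Granada code, and each instruction is assumed to cost one unit of gas. The sketch shows only the shape of the idea:
- gas is an ordinary int parameter of a tail-recursive loop, rather than a field of a heap-allocated context, and
- the top of the stack travels in its own accu argument, separately from the tail.

Whether these values actually live in machine registers is the compiler's decision, and this sketch does not measure it.

```ocaml
(* Toy untyped instruction set, for illustration only. *)
type instr = Push of int | Add | Mul | Dup | Swap

type error = Out_of_gas | Stack_underflow

(* The toy stack is never empty: its top is [accu] and its tail is [stack].
   [gas] is threaded as a plain int argument through tail calls instead of
   being read from and written to a mutable context record. Returns the
   final top of stack together with the remaining gas. *)
let rec exec (gas : int) (accu : int) (stack : int list) (code : instr list) :
    (int * int, error) result =
  match code with
  | [] -> Ok (accu, gas)
  | _ when gas <= 0 -> Error Out_of_gas
  | i :: code -> (
      let gas = gas - 1 in
      match (i, stack) with
      | Push n, _ -> exec gas n (accu :: stack) code
      | Dup, _ -> exec gas accu (accu :: stack) code
      | Add, y :: stack -> exec gas (accu + y) stack code
      | Mul, y :: stack -> exec gas (accu * y) stack code
      | Swap, y :: stack -> exec gas y (accu :: stack) code
      | (Add | Mul | Swap), [] -> Error Stack_underflow)

let () =
  (* Start from the one-element stack [3]; PUSH 4; MUL leaves 12. *)
  match exec 100 3 [] [ Push 4; Mul ] with
  | Ok (v, gas_left) -> Printf.printf "%d (gas left: %d)\n" v gas_left
  | Error _ -> print_endline "error"
```

Run as written, this prints "12 (gas left: 98)". Exhausting the budget midway, for example with exec 1 3 [] [Push 4; Mul], returns Error Out_of_gas.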

Tale 2: hunt for a serialization performance bug

Doing serialization right thanks to well-typed encodings

Serialization converts data values into a form that can be stored or transmitted on the network (e.g. when you save a text file to disk, it gets serialized to 0s and 1s).
Deserialization is the dual operation of reconstructing the data from its serialized representation.

(De)serialization is ubiquitous in Tezos. For example:

Deserialization occurs whenever data is read from the chain (e.g. loading a script source code to execute a smart contract call).
Serialization occurs whenever the protocol stores information on the chain.
(De)serialization occurs whenever nodes communicate over the Tezos network.

Nearly anything useful you might do on a blockchain involves either reading from the chain, writing to it, or communicating with another node — so the short list above covers pretty much everything, other than the occasional few microseconds of pure computation.

The node and the economic protocol use a library named data-encoding — version 0.2 in Florence; version 0.3 in Granada — that provides convenient operations to define encodings and to automatically generate functions to convert values into sequences of bytes or JSON objects, and of course functions to decode serialized data back to values.

Serialization functions should generally not be written by hand: it is hard, error-prone work. It is better to generate them automatically. To this end, data-encoding provides combinators to define a value of type t encoding for an arbitrary serializable type t. For example, suppose t is a basic binary tree:

type tree = Leaf of int | Binary of tree * tree

Then we can produce an encoding just by following the structure of t as follows:

open Data_encoding

let tree_encoding : tree encoding =
  mu "tree" @@ (* `fix` is this function -> *) fun self ->
    union [
      case
        (Tag 0)
        ~title:"Leaf"
        int31
        (function Leaf x -> Some x | _ -> None)
        (fun x -> Leaf x);
      case
        (Tag 1)
        ~title:"Binary"
        (obj2 (req "left" self) (req "right" self))
        (function Binary (left, right) -> Some (left, right) | _ -> None)
        (fun (left, right) -> Binary (left, right));
    ]

With that definition, we describe the encoding declaratively through the equation that defines the type of the value it works over:

$$ tree = \mu T. (int + T \times T) $$
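Unfolding a recursive type once gives an isomorphic type; this is a standard property of $\mu$-types. The unfolded form is exactly the shape that the two cases of union spell out:

$$ tree = \mu T. (int + T \times T) \;\cong\; int + tree \times tree $$

The left summand is the Leaf case (Tag 0, encoded with int31). The right summand is the Binary case (Tag 1), and both of its tree components are encoded by self, which stands for the encoding being defined.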

Combinators’ types are informative and so limit the chances of error:

val mu    : string ->
            ('a encoding -> 'a encoding) ->
               'a encoding
val case  : title:string -> case_tag ->
            'a encoding -> ('t -> 'a option) -> ('a -> 't) ->
               't case
val union : ?tag_size:[`Uint8 | `Uint16] ->
            't case list ->
               't encoding

So data-encoding makes serialization right: we avoid writing error-prone boilerplate, and, thanks to the combinators,
- the input required from the programmer is minimal (thanks to their expressivity),
- the results are fairly likely to be correct (thanks to the combinators’ precise types), and
- the results are safe (again, thanks to static typing).

How costly is this abstraction?

Serialization with data-encoding version 0.2 (as in Florence, and thus before Granada) may have been “right”, but it was not particularly fast. Typically, if we compare the time required to encode a binary tree with $2^{20}$ nodes using data-encoding with the time taken to encode the same tree using Marshal (the unsafe serialization module of OCaml standard library)¹², we get the following numbers:

┌───────────────────┬──────────┬────────────┐
│ Name              │ Time/Run │ Percentage │
├───────────────────┼──────────┼────────────┤
│ Marshal           │  67.46ms │      30.4% │
│ data-encoding 0.2 │ 221.82ms │     100.0% │
└───────────────────┴──────────┴────────────┘

data-encoding version 0.2 is roughly three times slower than Marshal in that example.¹³

One might assume this is just the price of having a nice abstraction for and a well-typed implementation of serialization. After all, Marshal is untyped and implemented in C, and low-level programming in C can be fast.

Surprisingly, this is not the case. Using the Linux perf command, we can observe that a single function in data-encoding takes half of the execution time during writing: check_cases.¹⁴

check_cases checks that data-encoding’s combinator for union types is properly applied to a nonempty list of disjoint cases.

It’s surprising to see check_cases called during serialization of concrete values, because one would have expected it to be called just once when the definition of the encoding is processed.

In fact, things are even worse than they appear, because check_cases has quadratic complexity with respect to the number of cases. Thus execution time degrades further if we add new data constructors to our type for binary trees:

┌───────────────────┬──────────┬────────────┐
│ Name              │ Time/Run │ Percentage │
├───────────────────┼──────────┼────────────┤
│ Marshal           │  74.25ms │      25.4% │
│ data-encoding 0.2 │ 292.16ms │     100.0% │
└───────────────────┴──────────┴────────────┘

For datatypes with five or more constructors, the performance of data-encoding seems to bottom out at roughly ten times slower than Marshal.
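The following hypothetical pairwise disjointness check on case tags shows how a quadratic check grows with the number of cases. It is not data-encoding's check_cases, whose code we do not reproduce here. It only illustrates the cost shape: with n distinct cases it performs n(n-1)/2 comparisons.

```ocaml
(* Hypothetical stand-in for a pairwise disjointness check on union tags
   (not data-encoding's actual check_cases). Returns whether all tags are
   distinct, together with the number of tag comparisons performed. *)
let pairwise_disjoint (tags : int list) : bool * int =
  let comparisons = ref 0 in
  let rec go = function
    | [] -> true
    | t :: rest ->
        (* Compare t with every later tag; stop at the first clash. *)
        let distinct =
          List.for_all (fun t' -> incr comparisons; t <> t') rest
        in
        distinct && go rest
  in
  let ok = go tags in
  (ok, !comparisons)

let () =
  List.iter
    (fun n ->
      let _, c = pairwise_disjoint (List.init n Fun.id) in
      Printf.printf "%d cases -> %d comparisons\n" n c)
    [ 2; 5; 10 ] (* prints 1, 10 and 45 comparisons respectively *)
```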

So, why is check_cases called during serialization of concrete values? The investigation is simple since the definition of tree encoding uses only two combinators: union and mu.

check_cases is called during serialization of concrete values because of a bug in mu, which we now explain.

Fortunately, the case for mu-based encodings in the function write_rec is a mere one-liner:

| Mu {fix; _} ->
    write_rec (fix e) state value

This line says that fix (the function passed to mu) is called to get the encoding of the value to be encoded. Here Mu is the internal data constructor used in the function mu.

Let us see what this fix function is when write_rec is executed on our example for tree encoding:

 1. let tree_encoding : tree encoding =
 2.   mu "tree" @@ (* `fix` is this function -> *) fun self ->
 3.     union [
 4.       case
 5.         (Tag 0)
 6.         ~title:"Leaf"
 7.         int31
 8.         (function Leaf x -> Some x | _ -> None)
 9.         (fun x -> Leaf x);
10.       case
11.         (Tag 1)
12.         ~title:"Binary"
13.         (obj2 (req "left" self) (req "right" self))
14.         (function Binary (left, right) -> Some (left, right) | _ -> None)
15.         (fun (left, right) -> Binary (left, right));
16.     ]

This is a correct encoding function! The return type of the fix function is tree encoding, so it makes sense for the interpretation of mu to call write_rec recursively to encode value, which has type tree. Moreover, in the interpretation of mu, self is bound to e, that is, to Mu {fix; ...} itself. This is also consistent, because tree_encoding is what serializes the sub-trees.

However, even though the definition of tree_encoding is functionally correct, it has a hidden performance bug. Each time write_rec reaches the Mu node (for the tree itself and, through self, for every one of its sub-trees), it calls fix and so executes union (see line 3 in the code above). union carries out some sanity checks, including the aforementioned check_cases. The encoding for trees is an immutable value, so its definition need only be checked once, not once for every tree or sub-tree serialized!

How to fix `fix`?

Now that we understand the problem with fix, how can we address it?

The idea is simple: the construction of the encoding for an arbitrary recursive type (of which the binary trees considered above are an example) should execute fix only once and then produce an encoding that we can trust to serialize an arbitrary number of data elements of that type, with no further checks.

This works because an encoding is an immutable value so executing fix on e always returns the same value. On the implementation side, we just need to call fix e once, then remember the result in a local reference and return the content of this reference in subsequent evaluations of fix e. Programmers may recognize this as the common technique called memoization (caching the results of function calls).

Here is an excerpt of the corrected mu that illustrates how we proceed:

let self = ref None in
let fix_f = fix in
let fix e =
  match !self with
  | Some (e0, e') when e == e0 ->
      e'
  | _ ->
      let e' = fix_f e in
      self := Some (e, e') ;
      e'
in
...

self records that fix e = e'. This allows us to return e' when fix is called with e again, instead of recomputing fix e.
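The effect of the memoized mu can be reproduced on a toy model written from scratch with the OCaml standard library. Everything in it is hypothetical and much simpler than data-encoding:
- the encoding type has only a Prim writer and a Mu knot;
- write_rec mirrors the one-line Mu case shown earlier;
- incrementing a counter in the fix function stands in for union's sanity checks;
- mu_memo follows the corrected excerpt above.

```ocaml
type tree = Leaf of int | Binary of tree * tree

(* Toy stand-in for encodings (hypothetical, not data-encoding's type). *)
type 'a encoding =
  | Prim of (Buffer.t -> 'a -> unit) (* a ready-to-use writer *)
  | Mu of ('a encoding -> 'a encoding) (* a recursive encoding, given by fix *)

(* Mirrors the Mu case of write_rec: call fix on the Mu value itself. *)
let rec write_rec (e : 'a encoding) (buf : Buffer.t) (v : 'a) : unit =
  match e with
  | Prim w -> w buf v
  | Mu fix -> write_rec (fix e) buf v

(* Counts how often the stand-in for union's checks runs. *)
let checks = ref 0

(* The function passed to mu: builds the tree writer from self. *)
let tree_fix (self : tree encoding) : tree encoding =
  incr checks; (* plays the role of union's check_cases *)
  Prim
    (fun buf t ->
      match t with
      | Leaf x -> Buffer.add_string buf (Printf.sprintf "L%d;" x)
      | Binary (l, r) ->
          Buffer.add_char buf 'B';
          write_rec self buf l;
          write_rec self buf r)

(* Buggy behaviour: fix is re-run at every node. *)
let mu_naive (fix : 'a encoding -> 'a encoding) : 'a encoding = Mu fix

(* Memoized mu: remember that fix e = e' and reuse e'. *)
let mu_memo (fix : 'a encoding -> 'a encoding) : 'a encoding =
  let self = ref None in
  let fix_f = fix in
  let fix e =
    match !self with
    | Some (e0, e') when e == e0 -> e'
    | _ ->
        let e' = fix_f e in
        self := Some (e, e');
        e'
  in
  Mu fix

let serialize (enc : 'a encoding) (v : 'a) : string =
  let buf = Buffer.create 16 in
  write_rec enc buf v;
  Buffer.contents buf

let sample = Binary (Leaf 1, Binary (Leaf 2, Leaf 3))

let () =
  checks := 0;
  let s = serialize (mu_naive tree_fix) sample in
  Printf.printf "naive: %s checks=%d\n" s !checks; (* naive: BL1;BL2;L3; checks=5 *)
  checks := 0;
  let enc = mu_memo tree_fix in
  let s1 = serialize enc sample in
  let s2 = serialize enc sample in
  Printf.printf "memo: %s %s checks=%d\n" s1 s2 !checks (* ... checks=1 *)
```

With the naive mu, the check runs once per tree node, five times for this five-node tree. With the memoized mu, it runs once in total, however many trees are serialized with the same encoding.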

We re-run our first benchmark on data-encoding version 0.2, but with a corrected mu, and see that this patch significantly improves the performance of data-encoding:

┌────────────────────────────────┬──────────┬────────────┐
│ Name                           │ Time/Run │ Percentage │
├────────────────────────────────┼──────────┼────────────┤
│ Marshal                        │  69.19ms │      61.5% │
│ data-encoding with correct mu  │ 112.42ms │     100.0% │
└────────────────────────────────┴──────────┴────────────┘

We reduced the execution time, and our encoding is now also less sensitive to the number of cases. Indeed, the second benchmark (in which our type for binary trees is extended with additional data constructors) performs almost as well as the first (128.18ms vs 112.42ms):

┌───────────────────────────────┬──────────┬────────────┐
│ Name                          │ Time/Run │ Percentage │
├───────────────────────────────┼──────────┼────────────┤
│ Marshal                       │  74.14ms │      57.8% │
│ data-encoding with correct mu │ 128.18ms │     100.0% │
└───────────────────────────────┴──────────┴────────────┘

Is that everything?

The optimizations described above account for the bulk of the performance improvements moving from data-encoding 0.2 in Florence to data-encoding 0.3 in Granada.

data-encoding 0.3 does include other incremental optimizations, which yield real but less significant speedups. We can sum it all up in a little equation:

data-encoding 0.3 in Granada = data-encoding 0.2 from Florence + 
                               correct mu +
                               some other optimizations

Yet marginal gains are still gains and can still accumulate, and putting all of them together we were able to reach nearly the same level of efficiency as Marshal for important encodings — in particular the one dedicated to the Micheline format, a central data representation in the Tezos protocol.

Conclusion

We said at the start of this blog post that we followed a process of “Make it work, make it right, make it fast”.

These three qualities, far from being in opposition, complemented one another to make our optimizations possible and safe. We needed the codebase to be of high quality — having a sound architecture and using appropriate abstractions — to find, perform, and check our optimizations. Clean, well-engineered code meant that we could understand the code, effectively profile it, locate sources of inefficiency, and then apply local rewrites to improve efficiency.

Code that is right is easier to optimize. Even in the context of a critical system like Tezos, where correctness and security have the highest priority (indeed, especially in such a context), making it right also helps to make it fast.

Some background: the Tezos blockchain has what amounts to an operating system layer called the economic protocol, which is unusual amongst blockchains in that it is upgradable by stake-weighted community vote.

The currently-live economic protocol is Florence, and an upgrade to Granada is now scheduled for August 2021. So this blog post can be thought of as an inside peek into how a significant performance optimization was attained for Tezos’ next big OS upgrade. Developing such updates is a large part of Nomadic Labs’ day-to-day activities. ↩
A blockchain is “just” a distributed state machine and smart contracts are “just” programs that run on that machine to modify the state. “Gas” is used to bound computation and so prevent denial of service attacks: a smart contract receives a gas budget when started, and if it exhausts its gas, it gets terminated. Lower gas consumption is a corollary of optimizing on-chain computation to make it run more efficiently. Or to put it another way: optimizing Tezos to run faster on a given piece of actual hardware, means that more useful on-chain computation can be fit into a given gas budget. ↩
Test suite here: bench.sh, dexter.tz, fa1.2.tz, ↩
You can use another language to write your smart contract, but to execute it on the Tezos blockchain you’ll need to compile to Michelson. So does that make Michelson a bytecode language? Yes, but just calling it that somewhat belies its power: we describe it as a “strongly-typed low-level language” instead, which arguably captures the spirit of things somewhat better. ↩
The subtle simplification here is that monad does not quite represent a single state; it represents a state-change (i.e. a side-effect). However, if we assume that all computations start from an initial empty state, then a state change can be identified with the result of applying the state change to an initial empty state. ↩
That’s like running a calculation on a calculator, reading off the result … then switching off the calculator. The calculator had a state but we don’t care about it any more because we have the result we wanted. ↩
We’d like to call them bytes of state … but that would be confusing. Perhaps bits of state? ↩
Object-level language here just means that MichelBaby is the language we are implementing. Contrast this with the meta-level language which is the language in which we are writing the implementation; OCaml, in this case. ↩
Python does this for you whether you want it to or not. There’s a system-wide (mutable) limit of 1000 on recursive calls. The implication is that if your function has recursed that many times, it’s actually doing a loop (i.e. an iteration). Tail recursion is a functional programmer’s version of iteration and in particular, OCaml automatically recognizes and optimizes tail recursive calls to avoid stack allocation, so that a tail recursive OCaml program is operationally just a (generalisation of an) iterative loop. ↩
“Good” here means something specific. In practice, smart contracts are subject to gas limits (specifically, the current Operation Gas Limit for the currently-live Florence protocol is 1,040,000; see also this limit directly in the relevant line in the Florence source code). So a smart contract that gobbles our interpreter’s resources is likely to be terminated anyway by the ambient Tezos abstract machine for exhausting its gas allocation. This would normally happen before the hardware running the Tezos node, on which our abstract machine happens to be executing, runs out of stack space. Thus a “good” limit is
- large enough to allow complete gas exhaustion for practical smart contracts that run on-chain (so in practice it would never actually be reached), yet
- small enough to prevent stack overflows on the underlying execution environment (so our code has predictable, controllable behaviour that we can reason about).
Does the fact that we do not expect this branch to be reached in practice make it irrelevant? Not really: for impractical smart contracts, stack could still be exhausted before gas. For instance, a deep recursion of sufficiently cheap operations (e.g. PUSH 1; DROP) might still overflow the call stack before gas is exhausted. So we do still have to detect and fail deep recursions — just in case. ↩
The context is the state of the blockchain used to validate a block. ↩
The Marshal module of OCaml is untyped. For this reason, the module must trust the programmer that some serialized bytes can be turned into a value of a given type. If the programmer is wrong, the program may crash. ↩
Test file here: comparing-marshal-and-data-encoding-0.2.ml.html. ↩
The files linked are at https://gitlab.com/nomadic-labs/data-encoding/-/tree/0.2/src at time of writing. However, we will give permalinks to an archived snapshot of the repository at https://archive.softwareheritage.org/browse/origin/content/?branch=refs/tags/0.2&origin_url=https://gitlab.com/nomadic-labs/data-encoding/&path=src/. ↩

Meanwhile at Nomadic Labs #12

2021-07-23T10:00:00+02:00

Welcome to our meanwhile series, the ongoing story of Nomadic Labs’ amazing adventures in the Tezos blockchain space. This post is a recap of our activities in the second quarter of 2021, following on from our 2020 recap and our 2021 first-quarter Meanwhile. As always, you can find out more about us here:

Twitter @LabosNomades ~ Website ~ LinkedIn ~ Technical blog ~ GitLab repo

So here’s what we’ve been up to these past three months:

Octez
Adoption and Support
Training
Umami
Announcing: a new logo and website
Culture and growth
Announcing: PhD student, Intern, and apprentice interviews
Protocol upgrade: Florence activated, Granada proposed
The Florence upgrade
The Granada upgrade proposal
Tenderbakenet
Testing
Interviews and Papers
NL research seminars
Tezos is 3
À la prochaine

Octez

We are pleased to announce Octez. Octez is a new name for a veteran implementation of Tezos which had previously been known just by its version number and by a GitLab repo.

At time of writing, the latest version of Octez is Octez Version 9.4, and the latest release candidate is Octez Version 10.0~rc1.

Here is the Octez GitLab repo. Feel free to get the Octez Tezos implementation and join the Tezos blockchain!

Adoption and Support

Our adoption and support teams¹ have been hard at work developing relationships, and thanks to their diligence we are proud to announce that:

On 15 April 2021 Forge Capital Markets (a subsidiary of Société Générale) created a structured product on Tezos public blockchain. This five million Euro pilot scheme is designed in particular to demonstrate the legal, regulatory, and operational feasibility of issuing complex financial instruments (structured products) on a public blockchain.
On 20 April 2021 Ubisoft (a leading computer gaming company) became an institutional baker via its Strategic Innovation Lab, whose mission is to help prepare Ubisoft for the future by exploring innovative technologies. As a Tezos corporate baker, Ubisoft will experiment with the liquid proof-of-stake consensus algorithm and research the potential of blockchain technology for the future of gaming.²
Nomadic Labs has been helping Arteïa to sell artworks as non-fungible tokens on the hic et nunc platform (case study here). On May 8, a number of artworks by Benjamin Spark were “dropped” on the platform.
On 10 May, Nomadic Labs was re-elected as a board member of the association ADAN (an industry federation in the crypto-asset and blockchain sector in France and across Europe).
On 3 June 2021 Nomadic Labs joined WalChain, a Wallonian business network of blockchain startups, investment funds, clusters, research centres, and Wallonian universities. As Oana Ladret Piciorus, Managing Director of Nomadic Labs said: “We are happy to join WalChain and hope to bring to this ecosystem our extensive knowledge of the Tezos protocol and of course offer our support to companies in this region looking to explore it.”
In June 2021 the association Pour Que Marseille Vive! and Equisafe deployed the first non-fungible token (NFT) of a physical work of art (by the Marseille artist Deniz Doruk) on the Tezos public blockchain. You can view it here. Watch this space!
McLaren is working on Tezos NFTs with our adoption team. See this how-to, “NFT, it’s as easy as 1-2-3” (dated 17 June).
We have an ongoing collaboration with BlockStart, the Blockchain Partnership Programme, and we are pleased to announce that on June 17, Thibault Chessé, Alexia Mertinel, and Hadrien Zerah became blockchain mentors with BlockStart, helping to mentor blockchain start-ups during the Prototype and Pilot stages.

Training

Training is key to uptake and adoption. In May/June 2021 Nomadic Labs, in collaboration with colleagues from the SmartPy, Ligo and Archetype language teams, organised two online training sessions on smart contract programming (in French: “Développer des Smart Contrats sur Tezos”). The sessions were attended by just over thirty developers from various French companies interested in building projects on the Tezos blockchain.

The courses were well-received and we plan to continue to run training courses in both French and English. If you’re interested, please just send us an e-mail at training@nomadic-labs.com.

Umami

The Umami Team at Nomadic Labs released the Beta version of the Umami wallet in April 2021. This Tezos cryptocurrency wallet is designed as a power tool, built by OCaml developers for OCaml developers using ReasonML, and supporting all the native features of the Tezos protocol, including multiple accounts, tokens, batch transactions, and delegation — with more features in the pipeline.

You can download Umami here. For more information see:

Announcing: a new logo and website

Nomadic Labs is happy to introduce to the world its new logo and its new modern and accessible website.

Since our team formed in 2018, the Tezos project has grown considerably, and our own identity has grown along with it. While we remain primarily an OCaml programming house, we have seen our activities diversify with the addition of teams dedicated to adoption, support, and training. Our new image encapsulates our dedication to technical excellence, community and commitment, which are the core values for our activities.

(Images: our 2018 look; our new 2021 look.)

Culture and growth

We are delighted that since April 2021 we have been joined by five new hires and four interns (see next item), bringing our count of full-time employees to 67.

Announcing: PhD student, Intern, and apprentice interviews

It’s the people in a workplace that make it a good place for people to work in.

Thus we are extremely pleased at Nomadic Labs to host interns (stagiaires) and apprentices (apprentis), and to supervise some PhD students in collaboration with the local universities in Paris: our contribution to educating the next generation and to getting fresh perspectives on our own work.

Our interns, apprentices, and PhD students come from many backgrounds, and each has their own unique story to tell, so we’ve introduced a ‘people’ category to the Nomadic Labs blog to host interviews with our valued guests. Interviews so far have included:

Protocol upgrade: Florence activated, Granada proposed

On 11 May 2021 Florence was activated, and on 25 May 2021 Granada was proposed.

The Tezos protocol environment enjoys regular performance and security upgrades.³ Concretely, a self-amendment mechanism is activated to propose an upgrade to the protocol. Because Tezos is an open community, protocol upgrades are approved by community vote. This means that upgrades can only happen when you, the Tezos community, vote that it be so, which is why you’ll notice we only ever talk about us making upgrade proposals.

Most recently,

Delphi was activated on 12 November 2020 (block height 1,212,417; cycle 296; changelog; significance of the upgrade), and
Edo was activated on 13 February 2021 (block height 1,343,489; cycle 328; changelog).

For the second quarter of 2021, we are pleased to announce that:

Florence was activated on 11 May 2021 (block height 1,466,368; cycle 357; changelog).
Granada was proposed (ongoing election; changelog) — and approved as this article went to press on July 20.

The Florence upgrade

You can find out more about the Florence upgrade here:

Florence, Tezos’ Sixth Protocol Upgrade Goes Live, Bringing Further Gas Optimizations & More (on TQ Tezos).
Florence, the latest Tezos upgrade, is LIVE
Florence changelog (on Tezos Developer Resources).

Substantive Florence upgrades include:

Gas optimisations using saturation arithmetic (so you get more smart contract execution for your gas). Our benchmarks indicate a tenfold speedup of gas computation, and a 35% speedup of the execution cycle of the smart contract Michelson interpreter in Florence overall.
The maximum operation size is doubled, meaning that the maximum length of a smart contract is doubled.
The Tezos calling convention has migrated from breadth-first to depth-first order (BFS to DFS), making smart contract development more intuitive and less liable to bugs (see also a discussion of the code design).
Florence drops the Test Chain feature of the Economic Protocol, because in practice it wasn’t needed.
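The change of calling convention can be illustrated with a toy model. The model is hypothetical and is not Tezos's operation-application code. In it, operation A emits B and C, and B in turn emits D. Breadth-first order finishes A's direct children before any grandchild. Depth-first order runs everything that B triggers before moving on to C.

```ocaml
(* Hypothetical model: executing an operation may emit further operations. *)
type op = { name : string; emits : op list }

(* Breadth-first: emitted operations join the back of a queue. *)
let bfs (root : op) : string list =
  let q = Queue.create () in
  Queue.add root q;
  let order = ref [] in
  while not (Queue.is_empty q) do
    let op = Queue.pop q in
    order := op.name :: !order;
    List.iter (fun child -> Queue.add child q) op.emits
  done;
  List.rev !order

(* Depth-first: emitted operations run before anything already pending. *)
let dfs (root : op) : string list =
  let rec go pending acc =
    match pending with
    | [] -> List.rev acc
    | op :: rest -> go (op.emits @ rest) (op.name :: acc)
  in
  go [ root ] []

let d = { name = "D"; emits = [] }
let b = { name = "B"; emits = [ d ] }
let c = { name = "C"; emits = [] }
let a = { name = "A"; emits = [ b; c ] }

let () =
  print_endline (String.concat " " (bfs a)); (* A B C D *)
  print_endline (String.concat " " (dfs a)) (* A B D C *)
```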

For more reading see:

The Granada upgrade proposal

As per a detailed blogpost on the Granada upgrade proposal, Granada proposes the following changes:

A switch from Emmy+ consensus to Emmy*. Emmy* halves block time from 60 to 30 seconds (doubling transactions per second), increases the number of endorsement slots from 32 to 256 (increasing stability and participation), and provides a special fast consensus path for when the network is operating normally. All this makes Emmy* significantly faster than Emmy+.
Liquidity Baking promotes low-slippage exchange between tez and other currencies using wrapped tokens.
The improvements to gas consumption continue. Empirically, we observed gas consumption decrease by a factor of three to six in the execution of already-deployed contracts.
Regrettably, Granada also contains a non-critical bug. We expect this to be corrected in the following “H”-named upgrade proposal. In the meantime, we will provide a linting tool to help developers detect the bug-affected pattern in any new smart-contract code.

For further reading, see:

Tenderbakenet

We plan to upgrade our consensus algorithm to Tenderbake in the near future. To this end, we are pleased to announce Tenderbakenet, an experimental Tezos blockchain based on Tenderbake.

See the TzKT block explorer for Tenderbakenet (many thanks to Baking Bad), and the instructions for joining Tenderbakenet.

Further information is in an article on “Rapid Innovation: Tezos Tenderbake Testnet Spawned”.

Testing

We continue to improve our home-grown Tezt framework. One substantive new feature is the ability to run tests on remote hosts (see merge request).

How is this substantive? Tezt was originally a tool for running a suite of unit and integration tests on your local machine. The new functionality makes it convenient to run tests on remote hosts (Tezt-ing by remote control, so to speak), which opens up the possibility of running unit and integration Tezts at scale on clusters of remote nodes.

We’re also investing heavily in property-based testing (see also our previous blog post). For example:

We have completed a migration from the crowbar property-based testing framework to the more mature QCheck framework. In particular, our friends at Tweag have developed a new version, QCheck2, with integrated shrinking: QCheck2 automatically searches for minimal counterexamples⁴ without requiring the programmer to write a shrinker by hand.⁵
In March 2021 Valentin Chaboche started an internship on enabling large-scale, lightweight specification and testing using QCheck. His work consists of:
- a ppx to derive generators, and
- a ppx to transform annotation-based function specifications into property-based tests.
ppx is an OCaml metaprogramming framework, and a generator is a program that generates test cases (e.g. for QCheck or QCheck2); so a ppx to derive generators is an “OCaml metaprogram for a programmer to programmatically generate generators to test code the programmer generated”. Similarly, a ppx to transform specifications into property-based tests means more metaprogramming, for even more automatic test generation. All in all, this means lots and lots of new tests, many of them generated by the computer itself with little or no further programmer input.
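
To illustrate what a shrinker does (the job that QCheck2's integrated shrinking automates), here is a deliberately simple hand-rolled version in Python. It is not QCheck or QCheck2 code, and the property, generator, and shrinking steps are all hypothetical. A random search finds a failing list, then a greedy shrinker repeatedly tries smaller candidates (drop one element, or halve one element) and keeps any candidate that still fails.

```python
import random


def prop_sum_below_100(xs):
    """Deliberately false property: 'the sum of any list is below 100'."""
    return sum(xs) < 100


def gen_list(rng):
    """Hypothetical generator: up to 20 integers between 0 and 50."""
    return [rng.randint(0, 50) for _ in range(rng.randint(0, 20))]


def shrink_candidates(xs):
    """Smaller variants of xs: remove one element, or halve one element."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [x // 2] + xs[i + 1:]


def shrink(xs, prop):
    """Greedily move to any smaller input that still falsifies prop."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(xs):
            if not prop(candidate):
                xs, improved = candidate, True
                break
    return xs


def check(prop, gen, trials=1000, seed=0):
    """Random testing; returns a shrunk counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen(rng)
        if not prop(xs):
            return shrink(xs, prop)
    return None


print(check(prop_sum_below_100, gen_list))
```

The result is a locally minimal counterexample: every further single removal or halving makes the property pass again, so it is typically much shorter than the list the random search first found.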

To take the most effective advantage of these advances in tooling, we’re refactoring our code to expose interfaces in the codebase so that they can be tested (more details in our official “better testing through refactoring” milestone). We’re concentrating on hardening a set of critical code layers in the shell: the peer-to-peer layer, the distributed database, and the mempool.

This effort is complemented by the development of an adversarial testing tool by Functori. The tool, currently being designed, is intended to enable the specification and execution of network attack scenarios, thus enabling early detection of security issues in Octez (an implementation of the Tezos blockchain in OCaml).⁶

Finally, to better check the coverage of our test suite, we’re improving how we use test coverage reports. This happens mainly through the switch to a newer version of bisect_ppx and through forthcoming usability and automation improvements in our continuous integration environment.

Interviews and Papers

We are delighted to report that:

On 9 April our head of adoption Hadrien Zerah was interviewed in Le Point magazine (in French) about “L’irrésistible Ascension des Cryptomonnaies” (the irresistible rise of cryptocurrencies).
On 13 May, Tezos Ukraine interviewed Hadrien Zerah.
You may be interested in this interview with Vincent Botbol, Research and Development Architect at Nomadic Labs.

We commissioned a report by INRIA on “Possible evolutions of the voting system in Tezos”, which was released on April 14. See also the discussion on Tezos Agora.

We are also delighted to announce that:

Richard Bonichon et al.’s paper on “Search-Based Local Black-Box Deobfuscation: Understand, Improve, and Mitigate” (see also the authors’ PDF) was accepted to the ACM CCS 2021 conference (15-19 November 2021). Well done, Richard!
Richard Bonichon et al.’s paper on “Program Protection Through Software-Based Hardware Abstraction” (see also the authors’ PDF) was accepted and presented at Secrypt 2021 (6-8 July 2021). Well done again, Richard!
Lăcrămioara Astefănoaei, Pierre Chambart, Eugen Zălinescu, et al.’s paper on “Tenderbake - A Solution to Dynamic Repeated Consensus for Blockchains” was presented at the Fourth International Symposium on Foundations and Applications of Blockchain (FAB’21). Well done, Lacra, Pierre, and Eugen!

NL research seminars

Our series of Nomadic Labs research seminars continues apace. In Q2 we saw:

Information Extraction from Graphs and the TezQuery Tool (13 April 2021)
Helmholtz - A Verifier for Tezos Smart Contracts Based on Refinement Types (20 April 2021)
Verifiable Delay Functions and Groups of Unknown Order (27 April 2021)
On Oracles and Contract Modules (11 May 2021)
TLA+ Formal Specification of Bootstrapping (25 May 2021)
Package Tezos as a MirageOS Unikernel (08 June 2021)
Prototype of a Typical Smart Contract Agency (22 June 2021)

See the full list of talks.

Tezos is 3

Tezos is three years old! Specifically, June 30 2021 — just at the end of Q2 2021 — was the three-year anniversary of the Tezos genesis block, which was baked on June 30 2018. See the video on Twitter.

Until next time

And that’s what we’ve been up to in Quarter 2 of 2021: three months of Nomadic Labs building and testing software and extending public understanding and adoption of blockchain technology. Thanks for reading, and do check in again for the next Meanwhile for Quarter 3 of 2021.

Adoption and support are two of our teams: adoption cultivates relationships with new users; support helps existing users to implement solutions. A third team focuses on training. ↩
A baker is a block validator on the Tezos blockchain. An institutional baker is a corporate institution that sets up one or more validating nodes on the Tezos blockchain. More information on bakers here. ↩
Technically it’s a library of cryptographic primitives and other functions, packaged as an OCaml module. ↩
Inputs that break correctness assertions, i.e. inputs that suggest errors either in the program, or in the programmer’s understanding of their program’s correctness. Also see next footnote. ↩
Some context: in property-based testing, the programmer states desired correctness properties and then leverages the computer itself to generate (thousands, tens of thousands, or even millions of) test cases for their code with respect to their desired correctness properties.

In practice this is very effective, but for subsequent debugging it is important that counterexamples be small and therefore easy to understand: for a given bug, a small and precise counterexample that triggers it is far easier to trace than a large and redundant one. Thus, while a property-based testing tool may find a counterexample, it is the job of a shrinker to take a possibly large counterexample and find a small one that a programmer can quickly dispatch.

It’s suboptimal if a programmer has to write shrinkers by hand, since then we may just be replacing the problem of writing bug-free code with the problem of writing bug-free shrinkers for code! This might still be a worthwhile tradeoff, but it would be better, and also safer, if shrinking could happen automagically. QCheck2 relieves the programmer of this burden, relative to QCheck. This is possible because shrinking tends to be a heuristic process: an “integrated shrinker” is derived automatically from the generator itself, bundling heuristics which, in practice, tend to find small counterexamples from large ones. ↩
Think: a penetration testing tool, applied to give us efficient white-hat hacker testing of our releases. ↩

Three questions to Nomadic Labs interns — Antonio Locascio

2021-07-21T11:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Antonio Locascio (and a couple of questions of his mentors at Nomadic Labs).

Antonio — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Antonio

1. Please present yourself and your academic background

I graduated last year as a Licentiate (a five-year undergraduate degree) in Computer Science from the Universidad Nacional de Rosario, in Argentina (where I’m from). Since my time at university, my main academic interests have been functional programming, programming languages and type systems. For my Licentiate thesis, I worked on a bidirectional type-and-effect system for a core language with algebraic effects, under the supervision of Mauro Jaskelioff and Exequiel Rivas.

Towards the end of my studies, I became quite interested in the field of formal verification, although I didn’t have much first-hand experience in it. This was one of the reasons I was so keen on doing this internship, as it was a great opportunity to learn more about this topic while working on real-world problems.

After graduating, I was still undecided about whether to continue in academia or go straight into industry. I figured that an internship could help me with this decision, and there was hardly a better place to do one than Nomadic Labs. Here, I would get a taste of what working in an industrial setting is like while still being exposed to, and taking part in, the latest developments in the academic field.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

My mentors for my internship here at Nomadic Labs are Marco Stronati, Germán Delbianco and Victor Dumitrescu. I’ve been working on verifying OCaml code using the F* language. F* is a hybrid verification-oriented programming language. This means that it not only has the expressivity of interactive theorem provers based on dependent types, but thanks to its SMT support it can also automatically prove many properties.

The goal of my internship is to evaluate a verification workflow consisting of three steps. The first is to model a piece of OCaml code in F*, specifying and proving interesting properties that it should satisfy. When this effort is completed, the next step is to extract a certified OCaml implementation from the model, with which to replace the original implementation. The third step, perhaps the most novel, is to extract the specification itself from the model, as property-based tests for the OCaml code. We consider this last step very important, as it helps to close the verification gap, i.e. the semantic difference between the verified model and the actual OCaml code.

Concretely, I first applied this workflow to the implementation of Sapling, a protocol used in Tezos to enable privacy-preserving transactions. The main verification effort for this initial case study focused on its storage, which consists of a special type of Merkle tree. After finishing it, I moved on to the ZK-Rollup project, which proposes a Layer 2 scalability solution with minimal impact on the main chain. Before starting its verification, I have been working on a prototype OCaml implementation that is helpful for tinkering with its design. The rollup’s storage is also a Merkle tree, so the experience from the Sapling model will definitely come in handy for this new verification effort.
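
A Merkle tree stores data in its leaves and hashes pairs of nodes up to a single root, so any leaf can be shown to belong to the tree with a short path of sibling hashes. The sketch below is a minimal fixed-depth binary version in Python. SHA-256, a depth of 3, and zero-filled empty leaves are assumptions for illustration; the tree used by Tezos Sapling differs (hash function, depth, and incremental updates).

```python
import hashlib

DEPTH = 3              # assumption: tiny fixed depth, 2**3 = 8 leaf slots
EMPTY = b"\x00" * 32   # assumption: placeholder value for unused leaves


def h(left, right):
    """Hash of an internal node from its two children."""
    return hashlib.sha256(left + right).digest()


def levels(leaves):
    """Every level of the tree, from the padded leaves up to the root."""
    if len(leaves) > 2 ** DEPTH:
        raise ValueError("too many leaves for this depth")
    level = list(leaves) + [EMPTY] * (2 ** DEPTH - len(leaves))
    out = [level]
    for _ in range(DEPTH):
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        out.append(level)
    return out


def root(leaves):
    return levels(leaves)[-1][0]


def prove(leaves, index):
    """Inclusion proof for leaves[index]: sibling hashes from leaf to root."""
    path = []
    for level in levels(leaves)[:-1]:
        path.append(level[index ^ 1])  # sibling differs only in the last bit
        index //= 2
    return path


def verify(leaf, index, path, expected_root):
    """Recompute the root from a leaf and its path, and compare."""
    node = leaf
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == expected_root


leaves = [hashlib.sha256(bytes([i])).digest() for i in range(5)]
print(verify(leaves[2], 2, prove(leaves, 2), root(leaves)))  # True
```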

I’ve learned a lot during these months, from verification and testing techniques to code reviewing and good programming practices. All of these point to what has probably surprised me the most so far: the importance given to program correctness here at Nomadic Labs. Although this should be expected of a company working on critical code, it was great to see correctness emphasized at every step of the pipeline, which is sadly not very often the case.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

Although I didn’t really know much about blockchain, I was drawn to it as an emerging technology with many interesting problems still to be solved. The field is perfect for verification, as there are plenty of important properties of blockchain implementations that are worth proving. As I explained above, the appeal of Nomadic Labs is its rigorous scientific background, which guarantees an opportunity to learn about interesting new developments and ideas. After my internship, I plan to continue working as a developer on cutting-edge projects such as this one.

Questions for Antonio’s Nomadic Labs mentors Marco Stronati and Germán Delbianco:

What is your input to the work of Antonio?

As Antonio’s mentors, our role is to provide guidance based on our combined experience working on the development of the Tezos Economic Protocol, on mechanized software development and verification, and on the toolchain itself. We plant seeds, in the form of suggesting relevant academic literature or related projects; we brainstorm ideas together; we provide feedback on his contributions; and we answer his questions. Occasionally, we also warn him of known pitfalls, and of places marked “Here be dragons”.

That said, we are pleased by how quickly Antonio hit the ground running on this project, and by the independence he has shown. To implement the objectives we had devised together, he not only learned how to verify programs in F*, but also quickly realized how to improve the toolchain so that we could get the most out of this experiment. This makes the work relevant not just to us at Nomadic Labs, or to the Tezos ecosystem at large, but eventually to the F* community as well.

Why is the topic of this internship important?

Making verification techniques scale up to a large industrial, cutting edge, project is a Herculean task. In our case, one of the great hurdles is tightening the verification gap between a mechanization in F*, and the real-world Tezos implementation in OCaml. In a large and complex codebase like ours, there is inevitably a semantic distance between the abstract models we build for the verification of critical components of the Tezos Economic Protocol, such as Sapling or ZK-rollups, and the nitty-gritty of their implementation.

In this project we are indeed interested in reducing this semantic gap, by recovering specifications from the F* mechanization, to validate both the extracted OCaml code and the (different) implementations in the codebase.

Extracting the specification as property-based tests automatically from a mechanized implementation (technically, from dependently-typed signatures) is a (to the best of our knowledge) novel contribution of this internship. Usually, tools like F* (or Coq, or Agda) are designed so that specs and proofs are erased from type signatures, and thus they do not feature in the extracted code — they are lost! Antonio’s solution to recover them is quite elegant, as it relies on two existing F* features — meta-programming and tactics — to extract the definitions and the specs as QCheck tests. Hence, this approach doesn’t require modifying how the extraction mechanism in F* works, but only some tweaking of the Meta-F* framework.
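
As a rough analogy (in Python, not F*, Meta-F*, or QCheck), the idea of turning a specification into tests can be sketched as follows: a precondition and a postcondition, mirroring a refinement-typed signature, are mechanically converted into a randomized test that can be run against any implementation. The `isqrt` spec and all helper names are hypothetical.

```python
import math
import random


def spec_to_test(requires, ensures, gen, trials=500, seed=0):
    """Build a randomized test from a precondition and a postcondition."""
    def test(impl):
        rng = random.Random(seed)
        for _ in range(trials):
            x = gen(rng)
            if not requires(x):
                continue  # inputs outside the precondition are not checked
            if not ensures(x, impl(x)):
                return x  # counterexample to the specification
        return None  # no counterexample found
    return test


# Hypothetical spec, in the spirit of a refinement-typed signature such as
#   isqrt : x:int{x >= 0} -> y:int{y * y <= x /\ x < (y + 1) * (y + 1)}
isqrt_test = spec_to_test(
    requires=lambda x: x >= 0,
    ensures=lambda x, y: y * y <= x < (y + 1) * (y + 1),
    gen=lambda rng: rng.randint(-100, 10_000),
)

print(isqrt_test(math.isqrt))        # None: the spec holds on all sampled inputs
print(isqrt_test(lambda x: x // 2))  # an input on which x // 2 violates the spec
```

The same spec object validates both a trusted implementation and a buggy one, which is the point of keeping specifications alive after extraction rather than erasing them.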

In an ideal world, we would like to have a development cycle which efficiently integrates and interweaves concurrent verification and development efforts. This requires developing common specification styles and languages that can be shared by developers and verifiers¹. This internship provides a plausible approach: by extracting specs from F* and translating them directly to QCheck tests, we can ensure that the specs and definitions in both F* and OCaml — that is both in Proofs and Tests — are two correct reifications of the same concepts, increasing our confidence in the verification process.

Formal methods/verification experts, or anyone who gets a kick out of writing mechanized proofs. ↩

Three questions to Nomadic Labs interns — Tianchi Yu

2021-07-19T14:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Tianchi Yu (and a couple of questions of his mentors at Nomadic Labs).

Tianchi — so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. We’d love to hear a bit about you and your activities chez nous …

Questions for Tianchi

1. Please present yourself and your academic background

My name is Tianchi Yu. I have been studying in the Master Program on Cyber-Physical Systems at the École Polytechnique since September 2020. From March 2020 to August 2020 I studied in the Computer Science Department at EPFL, and from September 2018 to February 2020 I studied in the Ingénieur Program at ISAE-SUPAERO in Toulouse. Before that, I did the Engineering Bachelor Program at Southeast University in China.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

The Tezos blockchain has a high-level bytecode smart contract language called Michelson. Michelson programs consume gas, an abstract resource designed to bound smart contract computation and thus (among other things) incentivize efficient use of on-chain computation. My internship at Nomadic Labs is about optimizing Michelson programs to consume as little gas as possible, i.e. to do their jobs while consuming minimal on-chain resources.

Specifically, we are studying superoptimization (finding global program optimizations which might be missed by a smaller and simpler search for local optimizations). We’re aiming for an AI-based method to find the optimal bytecode in a fully black-box way (i.e. without user intervention).
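
The simplest form of superoptimization is brute-force enumeration: try every program of length 0, 1, 2, and so on, and return the first that behaves like the original. The Python sketch below shows that baseline on a toy stack language loosely inspired by Michelson. It is not Tianchi's AI-based method, and the instruction set, the unit cost per instruction, and the test inputs are all assumptions.

```python
from itertools import product

# Toy stack machine loosely inspired by Michelson. NOT real Michelson
# semantics or gas costs: every instruction is assumed to cost 1 unit.
INSTRS = ["DUP", "DROP", "SWAP", "ADD", "MUL", "PUSH1"]
MIN_DEPTH = {"DUP": 1, "DROP": 1, "SWAP": 2, "ADD": 2, "MUL": 2, "PUSH1": 0}


def run(program, stack):
    """Execute program on a copy of stack (top = last); None on underflow."""
    s = list(stack)
    for instr in program:
        if len(s) < MIN_DEPTH[instr]:
            return None
        if instr == "DUP":
            s.append(s[-1])
        elif instr == "DROP":
            s.pop()
        elif instr == "SWAP":
            s[-1], s[-2] = s[-2], s[-1]
        elif instr == "ADD":
            s.append(s.pop() + s.pop())
        elif instr == "MUL":
            s.append(s.pop() * s.pop())
        elif instr == "PUSH1":
            s.append(1)
    return s


# Equivalence is only *tested* on these inputs; a real superoptimizer
# would also need to prove the candidate equivalent to the original.
TESTS = [[x] for x in (-5, 0, 1, 2, 3, 7)]


def superoptimize(reference, max_len=4):
    """Return the shortest program agreeing with reference on every test."""
    expected = [run(reference, t) for t in TESTS]
    for length in range(max_len + 1):
        for candidate in product(INSTRS, repeat=length):
            if [run(candidate, t) for t in TESTS] == expected:
                return list(candidate)
    return list(reference)  # nothing shorter found within max_len


# A 6-instruction program computing 2x, padded with redundant stack shuffling.
slow = ["PUSH1", "SWAP", "DUP", "ADD", "SWAP", "DROP"]
print(superoptimize(slow))  # ['DUP', 'ADD']
```

The search space grows exponentially with program length, which is one reason to look for smarter ways to guide the search.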

My mentors are Richard Bonichon and Yann Régis-Gianas. They have been very dedicated and supportive.

Over the past few months I’ve gained a deeper understanding of Tezos in general, the Michelson language in particular, and I’ve learned about OCaml and advanced skills in research and development as practiced in a real day-to-day industrial setting. I am impressed by the scalable and robust structure of the Tezos ecosystem; by the efficient, continuous contributions of programmers at Nomadic Labs; and by the exceptional competence and dedication of the teams that work behind the scenes to develop Tezos. In short, for me this internship is a holistic experience: I’ve learned about programming, how to be a programmer, how to be a teacher, and how a top-quality open-source community-driven industrial project can (and should!) be run. It’s really been an inspiration for me.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I have always been curious about how blockchains work and I am eager to contribute to them. There are many interesting topics to explore: for example optimization, consensus, verification, and security. Functional programming is a very elegant and interesting approach, and this led me to Tezos and Nomadic Labs.

After this internship, I plan to complete my last year of the Ingénieur Program. After that I plan to focus on Computer Science and Optimization, hopefully as a programmer or researcher. I hope that I will have the chance to get a PhD or do more industrial research.

Questions for Tianchi’s Nomadic Labs mentors Richard Bonichon and Yann Régis-Gianas:

What is your input to the work of Tianchi?

This internship topic is quite involved and can have many ramifications, so our main goal is to guide Tianchi towards a minimal viable prototype (MVP) for an AI-driven superoptimizer. This includes weekly debrief meetings, answering his questions, and in general steering him towards programming a first prototype while listening to his input.

Why is the topic of this internship important?

Code optimization is a critical component of practical compilers. For smart contracts and Michelson bytecode, optimization saves not only time but also crypto-fees through lower gas consumption.

Normal compilers balance speed of code generation against speed of the generated code. But for smart contracts, on-chain computation is so much more valuable than off-chain computation that almost absolute priority can be given to optimizing bytecode. This seems an excellent application for superoptimization, since (to simplify only slightly) its job is to exhaustively search for the absolute most efficient code possible.

Tianchi’s work is an initial and interesting step towards achieving this in the Tezos ecosystem, with a way to optimize smart contracts a posteriori in a compiler-agnostic manner.

Three questions to Nomadic Labs interns — Corentin Calmels

2021-07-15T10:00:00+02:00

In this blogpost, we will ask three questions of one of our current interns: Corentin Calmels (and a couple of questions of his mentor at Nomadic Labs).

Corentin, so happy to have you with us! We hope you’re having a wonderful and educational period with Nomadic Labs. Please, tell us a bit about yourself and your activities chez nous …

Questions for Corentin

1. Please present yourself and your academic background

My name is Corentin Calmels. I’m 20 years old and from Paris. I speak fluent French and English, and am studying German and Japanese.

I am a student at HEC Paris business school. I completed the Bloomberg Certificate of Finance and I would like to specialize in finance in my future career. I’m doing a summer internship at Nomadic Labs, from June to early August 2021.

I discovered blockchain through newspaper headlines, but I wanted to go deeper and link my studies with blockchain, so I got a job in May with HEC Bourse (the HEC finance association) as a cryptocurrency columnist, and applied for a summer internship at Nomadic Labs. In so doing, I discovered an incredible technology.

2. Tell us more about your internship: main subject, who is your mentor, what you have learned and especially, what surprised you the most within these months?

I’m working at Nomadic Labs with the adoption team, under the supervision of Thibaut Chessé. I’m learning about blockchain and adoption techniques, including about how to strengthen relationships with potential corporate adopters, and about analysis and presentation skills to help identify and solve potential adopters’ key needs.

I also study DeFi (Decentralized Finance), which I believe will be the core of tomorrow’s finance. Blockchain itself is a key DeFi technology, and although DeFi is quite recent, I have big expectations for the field and it’s exciting to see it grow day by day.

My role is to support the adoption team. I am in daily contact with companies and public clients to discuss and develop potential applications of the Tezos blockchain. For instance, one of our recent adopters is McLaren Racing, which is developing a Formula 1 NFT (non-fungible token) platform on Tezos.

Since I began the internship on 1 June 2021, my understanding of blockchain and crypto has greatly increased. I’ve been exposed to a diverse range of blockchain fields: central bank digital currencies; euro stablecoins such as Lugh; tokenized and decentralized art marketplaces such as HicEtNunc; and even sports collectibles implemented as NFTs (blockchain-based collectible fan cards!), for example in car racing, in cooperation with Red Bull.

It is astonishing how much blockchain has already impacted our lives. Working here has opened my eyes to the industry’s potential.

3. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs? What are your plans after completing this internship?

I’m a blockchain enthusiast and I want to learn more about this state-of-the-art technology, which will soon be ubiquitous. Who better to work with than the pioneers in the field?

Nomadic Labs is one of the most important companies in the Tezos ecosystem and is a key contributor to its development. Like me, Nomadic Labs is based in Paris, and it’s wonderful to have a great blockchain company on my doorstep — all the more so because Nomadic Labs also has a richness of nationalities and cultural diversity.

Tezos itself has some unique features that I think will make it the blockchain of tomorrow — such as its unique self-amendment facility, and the on-chain governance — and this is even more reason to be involved in this project. After my internship, I plan to complete my studies, and then get professionally involved in digitalization of finance through blockchain.

Questions for Corentin’s Nomadic Labs mentor Thibaut Chessé:

What is your input to the work of Corentin?

As Corentin’s internship supervisor, I help him to understand blockchain-related business issues, and provide him with analytical methods and knowledge about blockchain and its ecosystem.

Why is the topic of this internship important?

It’s fair to say that industry accepts that blockchain is, or could be, a transformative technology. But a big part of the challenge of actually applying blockchain to capture value is understanding how to support adoption, communication, marketing, and business processes.

Think of the sailing ship: the business case for transporting cargo across oceans is clear, but you can still invent insurance, limited companies, shareholders, new kinds of advertising, and many other business innovations to get the most out of the technology.

Thus Corentin is studying business and marketing strategies for promoting the emerging market of blockchain-based business solutions, and thinking about how to systematically analyze the complexities and the opportunities that players from different industries face when developing blockchain innovations. It’s a nascent field and the scope for business innovation is effectively unlimited.

Five questions to Nomadic Labs PhDs — Paul Laforgue

2021-07-09T10:00:00+02:00

At Nomadic Labs we are proud to create next-gen software … but we are even prouder to help create the next generation of software scientists!

So it’s our great pleasure to host and work with several PhD students, who bring unique perspectives on the technology which they help us to develop, and whose interest in blockchain we are happy to nurture and inform.

In this blogpost, we will ask five questions of one of our students, Paul Laforgue (and a couple of questions of his supervisors at Nomadic Labs).

Over to you Paul … no the mic is over here … that’s right, you’re on air!

Questions for Paul

1. Please present yourself and your academic background

I’m a PhD student working with the Nomadic Labs VERIF team and at the IRIF research laboratory of the Université de Paris.

I’ve always been interested in programming language design and the use of proof assistants to check correctness. In 2015 I did my first research internship at Chalmers University, in which I was introduced to the Agda proof assistant. In 2016, I started a Master’s degree at the Université de Paris under the supervision of Yann Régis-Gianas, who was at the Université at the time (and is now a programmer at Nomadic Labs). Yann and I showed how a programming language like OCaml can be extended with codatatypes and copatterns, modern programming language constructs for defining and handling infinite data structures. Then in 2017 I did an internship at Northeastern University, in which I designed a type system for a subset of the R programming language.

In October 2018 I started my job at Nomadic Labs in what is now called the Shell team.¹ I mainly worked on the introduction of history modes, and metrics for the node. I started my PhD thesis at the Université de Paris in late 2020.

During my time as a Nomadic programmer, I became acquainted with the Tezos OCaml implementation Octez and with the challenges it has to face. This background knowledge helped us define the goals we want this PhD thesis to tackle.

2. Tell us more about the topic of your PhD: who is your mentor, and what are your research objectives?

Distributed, networked systems are now ubiquitous. By distributed system I mean one consisting of multiple participants — usually in different locations — who coordinate by exchanging messages to achieve some goal.

Blockchains are distributed of course, but so are telecommunication networks, peer-to-peer systems, and distributed databases (e.g. in medicine and transportation). Notice how all of these examples are also safety-critical!

Distributed systems tend to appear in applications that require the following features:

Reliability: a few faulty nodes won’t corrupt the system’s behaviour,
Scalability: computing resources and scheduling methods can be dynamically adjusted, and
Performance: incoming tasks can be distributed optimally amongst participants.

But there’s a catch: programming concurrent and distributed systems is notoriously hard, and in particular it is very hard to predict all possible interactions between their components.

Message-passing is a fundamental coordination mechanism used by participants in a distributed system. We are developing a framework to specify and verify message-passing protocol specifications, and to validate an implementation against a given protocol.

In particular, we are developing a framework to specify and verify multiparty message-passing protocol specifications with the aim of making them executable and usable for conformance testing. In this framework, written in Coq, protocols are defined as choreographies.

Choreographies are global descriptions of distributed systems, expressing the composition of the expected interactions between their participants. We want to specify and verify the desired properties² at the level of choreographies, and then produce certified executables by synthesising local behaviours of each participant that faithfully realise the communications given in the choreography, a mechanism known as endpoint projection.³
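
Endpoint projection can be illustrated on the simplest kind of choreography, a straight-line sequence of messages with no choices or loops. The Python sketch below (not the Coq framework; participants and message labels are hypothetical) projects the global description onto each participant, keeping only the sends and receives that involve it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comm:
    """One global interaction: sender sends message label to receiver."""
    sender: str
    receiver: str
    label: str


def project(choreography, participant):
    """Local behaviour of one participant: its sends and receives, in order."""
    local = []
    for c in choreography:
        if c.sender == participant:
            local.append(("send", c.receiver, c.label))
        elif c.receiver == participant:
            local.append(("recv", c.sender, c.label))
        # Interactions not involving this participant are skipped.
    return local


# Hypothetical choreography: Alice requests a block from Bob, then tells Carol.
CHOREO = [
    Comm("Alice", "Bob", "get_block"),
    Comm("Bob", "Alice", "block"),
    Comm("Alice", "Carol", "advertise"),
]

for p in ("Alice", "Bob", "Carol"):
    print(p, project(CHOREO, p))
```

Real choreography languages also include branching and recursion, where projection is more subtle and may be undefined for some choreographies.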

My mentors at Nomadic Labs are Zaynah Dargaye, Thomas Letan and Yann Régis-Gianas. My research directors from the IRIF research laboratory at the Université de Paris are Giovanni Bernardi and Giuseppe Castagna. All of them have theoretical and practical backgrounds in communication-based programming, which is essential for this project. I’m lucky to work with such talented and expert people.

3. What is the value added and innovative side of your PhD thesis (if applicable, in the Tezos blockchain)?

Tezos is open-source and relies on an open network, so anybody can build or profile its nodes and enter the network. This is an important feature when we care for decentralization. Today, two projects implement Tezos nodes: the OCaml implementation Octez and the Rust implementation Tezedge. Also, numerous releases of the node are currently running on mainnet. Miscommunication between the nodes may lead to critical issues,⁴ thus to guarantee system health we must ensure that the nodes communicate well.

Also, it is important to build rigorous specifications, to avoid Tezos later encountering legacy-code issues of the kind that core banking platforms are struggling with now (permalink).⁵ In our case, the fact that the specification is executable offers benefits: it stays up-to-date with the implementations and can assist programmers when modifying the node.

4. What is the benefit of preparing a PhD and spending most of your time in a private company vs. a University lab?

Both at Nomadic Labs and at the University, I am surrounded by people passionate about their fields, coming from different backgrounds and driven by curiosity. However, there is a constructive difference in emphasis:

At the University people target the publication of academic papers, whilst at Nomadic Labs the target is the improvement of the Tezos implementation. Nomadic programmers use their academic background to transform theoretical results from university research into practical solutions. These are two distinct yet interdependent approaches to computer science research. As a PhD student working at Nomadic Labs I get to study, participate in, and contribute to both approaches.

5. Why did you choose to become a part of the blockchain ecosystem and Nomadic labs?

I believe that blockchain technology is promising — for decentralised finance of course, but also in other ways which I hope may be directly socially constructive as well: e.g. e-voting, such as electis, decentralised identities, and tickets.

Tezos’ self-amendment mechanism makes it interesting, and it is also a fascinating project because it involves such a breadth of areas: whether you are interested in the peer-to-peer layer, in cryptography, in verification or consensus algorithms you will find something to dig in to and to improve the software. On a more personal level I really appreciate all the energy coming from the community’s feedback.

I want to enhance my skills in the formal verification of distributed systems using proof assistants, so that by the end of the thesis I can tackle industry challenges with the Verif team at Nomadic.

I would like to enrich my perspectives when coping with research problems, by learning from my mentors and studying their approaches to problem-solving. Each of my mentors is unique and has his or her strengths, which I look forward to learning from.

Finally, I hope that — indeed, I would be honoured if — my work could make distributed software more secure. Delivering a tool to concretely help programmers in their message-passing protocol design and implementations would be very gratifying.

Questions for Paul’s Nomadic Labs mentors Zaynah Dargaye, Thomas Letan and Yann Régis-Gianas

What is your input to Paul’s thesis?

A thesis is an initiation and apprenticeship into top-level research: we will help Paul to learn how to identify an open problem, understand the state of the art, and study how to build on that state of the art to find a solution. There are so many skills involved in doing this, and there is so much knowledge required to apply these skills: domain-specific mathematical knowledge, research skills to evaluate which research paths are likely to bear fruit and are worth pursuing, and how to kindle that spark of creative insight which can sometimes make all the difference to finding the solution to a stubborn problem.

Furthermore, this is an industrial field, so solutions also have to be practical. Paul will have to learn (and we will have to teach him) what the practical requirements and priorities currently are, and how to translate theoretical insights into solutions that are applicable to the real and pressing challenges of designing industry-grade distributed systems.

Why is this thesis important?

In early 2020 Paul brought up a concern he encountered in ensuring the correct implementation of a component in the Tezos codebase: how to specify and validate the correct implementation of a message-passing protocol. Curious, he studied some solutions in existing formalisms, but felt unsatisfied.

The problem that Paul highlighted is widespread in modern systems, whether or not they are distributed: it is not enough that each component of a software system should satisfy its specification locally; they also have to talk together correctly according to some agreed protocol. In other words, we need a language to specify and reason about how components interact, and what it means for these interactions to be correct (bug-free).

Paul’s thesis introduces a formal language to describe the execution of message-passing protocols rigorously and unambiguously. From that description, one can generate a reference implementation for each party in a protocol that is correct-by-construction: given that the implementation is formally constructed from its specification, it is guaranteed that each component responds correctly to the received messages, and sends the messages predicted by the protocol. The wonderful thing is that we can deploy this reference implementation in production, and also use it to validate other implementations — e.g. optimised ones that are faster or have better memory-consumption.
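To make "generate an implementation for each party" concrete, here is a minimal sketch of endpoint projection for a toy choreography. It is not the formalism of Paul's thesis. The global protocol is assumed to be a linear list of interactions (sender, receiver, message label) with no branching or loops. The names Interaction and project, and the two-party exchange, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Interaction:
    """One step of a global protocol: `sender` sends a message `label` to `receiver`."""
    sender: str
    receiver: str
    label: str

# A local action is ("send", peer, label) or ("recv", peer, label).
LocalAction = Tuple[str, str, str]

def project(choreography: List[Interaction], role: str) -> List[LocalAction]:
    """Endpoint projection of a linear (branch-free, loop-free) choreography.

    Each interaction becomes a send for its sender and a receive for its
    receiver; roles not involved in an interaction simply skip it.
    """
    local: List[LocalAction] = []
    for step in choreography:
        if step.sender == step.receiver:
            raise ValueError(f"a role cannot message itself: {step}")
        if role == step.sender:
            local.append(("send", step.receiver, step.label))
        elif role == step.receiver:
            local.append(("recv", step.sender, step.label))
    return local

# Hypothetical two-party exchange, used only for illustration.
handshake = [
    Interaction("A", "B", "hello"),
    Interaction("B", "A", "hello"),
    Interaction("A", "B", "ack"),
]

for r in ("A", "B"):
    print(r, project(handshake, r))
```

Both local sequences come from the same global description, so B's projection mirrors A's. Each send in one matches a receive, in the same order, in the other. This is the kind of agreement that correct-by-construction generation aims to guarantee.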

¹ Tezos is split into two parts: the user-facing (and user-upgradable) Economic Protocol, which is what you see and interact with when you run Tezos, and the hardware-facing Shell, a lower-level abstraction layer that takes care of e.g. messaging and communication. Think: Economic Protocol = OS and Shell = Kernel. If Tezos were a car, then the Shell team would be a bunch of mechanics covered in grease, making sure all the bolts are tight and no users get hot oil squirted in their faces (no offense, Paul). -ed
² Examples range from generic correctness properties, such as deadlock-freedom (a deadlock occurs when two or more processes get stuck waiting on action from one another, like two super-polite people waiting forever for the other to go through the door first!), all the way to very specific technical properties, such as that the trustworthiness of a certain percentage of the peers advertised to bootstrapping nodes by an advertiser node is correct with respect to its pool of connections.
³ For choreographies see Fabrizio Montesi’s 2013 PhD thesis “Choreographic Programming” (permalink) and “A Core Model for Choreographic Programming” by Luís Cruz-Filipe and Fabrizio Montesi (2016) (see in particular page 1, final paragraph). For endpoint projection, see “Deadlock-freedom-by-design: multiparty asynchronous global programming” by Marco Carbone and Fabrizio Montesi (2013) (permalink).
⁴ For example, if communications in a distributed system become incoherent then the network might split, meaning that a subset of the system’s nodes become isolated and evolve independently for some time, and once this has happened it may not be trivial to ‘glue’ the system back together again.
⁵ This article on the modern need for COBOL (permalink) is also a good read. COBOL is in a sense about as far from modern OCaml in the evolution of industrial programming languages as it is possible to get.

Granada comparisons bug

2021-07-08T09:00:00+02:00

As a major development center within the Tezos ecosystem, Nomadic Labs routinely reviews and analyses the Tezos protocol code. During a recent review of the Granada proposal, we identified a low-severity bug that occurs in an uncommon pattern in the handling of comparisons. We would like to raise awareness of it so that developers and bakers can be fully informed.

The bug was introduced when refactoring the Michelson interpreter. The refactoring dramatically improves gas consumption (typically by 5x or more), but one case was missed in the Michelson COMPARE function (for Michelson values). When comparing two pairs whose first elements are both None (of an option type), COMPARE concludes that the two values are equal. The correct behaviour is to go on and recursively compare the right parts of the pairs. For example, Pair None 3 and Pair None 5 would be deemed equal by the Granada COMPARE operator, although they are not, because 3 and 5 are not equal. The ability to compare option types in Michelson is fairly new (it was introduced in Edo), so few contracts were expected to rely on it. This was confirmed when we reviewed the contracts deployed on Mainnet, Florencenet, and Granadanet and found that none of them currently use the pattern that would trigger this bug. Thus, no current contracts would be affected by this error in the Granada proposal.
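The difference between correct and buggy behaviour can be sketched in Python. This is not the Octez interpreter code, which is written in OCaml. The tuple encoding of Michelson values and the function names are our own assumptions. buggy_compare only reproduces the missed case at the top level of a pair, for brevity.

```python
from typing import Any

# Hypothetical Python encoding of a few (well-typed) Michelson values:
#   int      -> Python int
#   None     -> ("None",)
#   Some v   -> ("Some", v)
#   Pair l r -> ("Pair", l, r)

def compare(a: Any, b: Any) -> int:
    """Reference comparison returning -1, 0 or 1, like Michelson COMPARE."""
    if isinstance(a, int) and isinstance(b, int):
        return (a > b) - (a < b)
    if a[0] == "Pair" and b[0] == "Pair":
        left = compare(a[1], b[1])
        # Lexicographic order: look at the right parts only if the left parts are equal.
        return left if left != 0 else compare(a[2], b[2])
    if a[0] == "None" or b[0] == "None":
        # None equals None, and None is smaller than any Some v.
        return (a[0] == "Some") - (b[0] == "Some")
    return compare(a[1], b[1])  # Some v vs Some w: compare v and w

def buggy_compare(a: Any, b: Any) -> int:
    """Mimics the missed case: pairs whose left parts are both None are
    declared equal without looking at their right parts."""
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "Pair" and b[0] == "Pair"
            and a[1] == ("None",) and b[1] == ("None",)):
        return 0  # BUG: a[2] and b[2] are never compared
    return compare(a, b)

x = ("Pair", ("None",), 3)
y = ("Pair", ("None",), 5)
print(compare(x, y), buggy_compare(x, y))  # -1 0
```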

Please note that future contracts could be affected in the following two scenarios:

comparing values that include option types, or
using values that include option types as keys in sets or maps (big maps are unaffected).

Neither use case is common, and even where they appear, they affect only the contract that uses them, not the protocol as a whole.

The Michelson interpreter is well covered by tests, and pair and option types are each tested. However, the uncommon combination that triggers this bug was not, so the test suite missed it.

To better detect such errors in future, we updated our test suite to use property-based testing. Property-based testing is an approach that automatically generates random test cases that try to break a function’s desired properties, and is better able to help surface these types of bugs caused by uncommon pattern use. We used property-based testing recently to test properties of the smart contracts used in Liquidity Baking.
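Here is a minimal sketch of the idea, using only Python's standard random module. Libraries such as QCheck (OCaml) or Hypothesis (Python) automate generation and shrinking, but this sketch does not use them. The pair encoding, the property, and all function names are hypothetical. The property checked is that compare(a, b) is 0 exactly when a equals b.

```python
import random
from typing import Optional, Tuple

# Hypothetical encoding of Michelson `pair (option nat) nat` values as Python
# tuples (left, right), where left is None or an int n standing for `Some n`.
Value = Tuple[Optional[int], int]

def cmp_int(x: int, y: int) -> int:
    return (x > y) - (x < y)

def buggy_compare(a: Value, b: Value) -> int:
    """Simplified model of the Granada COMPARE bug on such pairs."""
    (la, ra), (lb, rb) = a, b
    if la is None and lb is None:
        return 0  # BUG: should fall through to comparing ra and rb
    if la is None or lb is None:
        return -1 if la is None else 1  # None < Some n
    return cmp_int(la, lb) or cmp_int(ra, rb)

def fixed_compare(a: Value, b: Value) -> int:
    (la, ra), (lb, rb) = a, b
    if la is None and lb is None:
        return cmp_int(ra, rb)
    return buggy_compare(a, b)  # the other cases were already correct

def random_value(rng: random.Random) -> Value:
    left = None if rng.random() < 0.5 else rng.randint(0, 3)
    return (left, rng.randint(0, 3))

def find_counterexample(compare, trials: int = 1000, seed: int = 0):
    """Property: compare(a, b) == 0 exactly when a == b.
    Returns the first randomly generated pair (a, b) violating it, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = random_value(rng), random_value(rng)
        if (compare(a, b) == 0) != (a == b):
            return a, b
    return None

print(find_counterexample(buggy_compare))  # expected: ((None, r1), (None, r2)) with r1 != r2
print(find_counterexample(fixed_compare))  # None
```

The generator picks None for roughly half of the left components, so the rare combination of two None-led pairs with different right parts comes up quickly. A hand-written test suite can easily miss exactly this combination.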

If Granada is adopted in the next voting period then:

A fix for the comparison bug will be included in the subsequent “H”-named protocol. If that protocol is adopted, we would expect it to activate in October 2021.
Although developers are unlikely to use the patterns that produce incorrect behavior as described above, for safety and convenience purposes we will provide a linting tool to help detect them in code.

Introducing Mi-Cho-Coq v1.0

2021-07-02T00:00:00+02:00

It’s our great pleasure to announce Mi-Cho-Coq version 1.0: the first public release of the free and open-source Mi-Cho-Coq framework, a library for verifying the correctness of Michelson smart contracts in Coq using weakest-precondition calculus.

The Mi-Cho-Coq framework is a Coq library which models all aspects of the Michelson language: its syntax, its type system, and its semantics.

Although this is the first public release, Mi-Cho-Coq has been in internal use since 2019, including functional verification of some quite large-scale code (some of which is now live):

the spending-limit contract,
several implementations of the FA1.2 token standard, and
several versions of the Dexter decentralized exchange (Dexter v2, Liquidity Baking).

Mi-Cho-Coq also features a simple certified Michelson optimizer,¹ and it can also be used as a standalone Michelson type checker.²

For more details on Mi-Cho-Coq, see the Mi-Cho-Coq README.

¹ A Michelson optimizer is a tool for turning a Michelson script into a semantically equivalent but more efficient one (i.e. it does the same thing, but quicker). Most compilers targeting Michelson come with Michelson optimizers specialized to that compiler’s output. Mi-Cho-Coq’s Michelson optimizer was initially developed for the Albert compiler. For more details about this optimizer, see section 3.3 of our article (see also authors’ pdf).
² So a smart-contract programmer can use Mi-Cho-Coq as a lightweight Michelson smart contract type checker, without having to necessarily run tezos-client.

Announcing Octez

2021-06-21T09:00:00+02:00

Rome the city, Rome the ideal

There’s a powerful scene in the film Gladiator where the Emperor Marcus Aurelius explains that Rome is both a physical installation of bricks and mortar and the idea of Rome: an ecosystem of standards and laws by which the people lived and a city was built.

In other words: Rome is a city, and an ideal.

When you download Tezos, you are actually downloading code that runs on a machine and in so doing embodies the idea of Tezos.

In other words: Tezos is an implementation, and an ideal.

We at Nomadic Labs are proud to have played a role in the coalition of programmers that wrote a Tezos implementation which is now widely used in the community. Historically, this was the first complete implementation of Tezos, which was used to activate Tezos Mainnet (the live blockchain) back in 2018. You can download this implementation from the open source repository https://gitlab.com/tezos/tezos/, where it is actively maintained today.

However, we were so excited by this at the time that — much like the Romans — we neglected to distinguish linguistically between the ideal, and the implementation of that ideal. This may be forgivable for world-spanning preindustrial empires, but we modern software developers should be more precise. So …

The implementation is Octez; the standard is Tezos

We are happy to announce that the bundle of concrete code files maintained at https://gitlab.com/tezos/tezos/ now has a name: Octez.

‘Octez’ is a portmanteau of octopus and Tezos, a pun which draws inspiration from this big picture of Tezos.¹

‘Octez’ is also a portmanteau of OCaml and Tezos (OCaml being the main programming language used in Octez), and a pun on ‘octet’.

Octez, in more detail

Octez is an implementation of a suite of Tezos-related software. It lives in this GitLab repo: https://gitlab.com/tezos/tezos/.

Octez includes:

a Tezos node (which you may know as tezos-node);
a Tezos client for this node (tezos-client);
an implementation of the environment for the economic protocol;
daemons (baker, accuser and endorser) for protocols which are active on Mainnet;
a remote signer (tezos-signer);
and further tools, such as an encoder-decoder for Tezos data types (tezos-codec); tezos-protocol-compiler; and tezos-validator.

Everything in the big picture above is Octez, except for

the Network (underneath the legs),
the Explorer (bottom right), and
the economic protocol (the green bit with PROTOCOL written in it) — though Octez is distributed with the economic protocols of Mainnet for convenience.²

Origin of the implementation

Octez was created by a coalition of teams including Nomadic Labs, Tarides, Tocqueville Group, Obsidian Systems, Tweag, and Metastate (non-exclusive list).

The work is coordinated by the merge team (list of members here), and the name Octez was proposed to and approved by them. This reflects the decentralized nature of Tezos: Octez is a decentralized implementation, and the name itself was chosen in collaboration.

Why Choose a Name Now?

When we release a new version of https://gitlab.com/tezos/tezos/ we may say something like

version 9.2 has just been released

This invites the question: Version 9.2 of what? The answer is

Version 9.2 of the Tezos implementation that lives at https://gitlab.com/tezos/tezos/,

which is a bit of a mouthful. So henceforth we can write

Version 9.2 of Octez,

with a clear conscience, complete precision — and our SEO officer would feel better about it too, if we had one.

Thanks for reading … and keep an eye out for a future post titled “Releasing version 10.0 of Octez”!

¹ This same picture inspired the Nomadic Labs logo.
² See for example the files for the Florence economic protocol, which is what Mainnet is running at time of writing.

FA1.2 Approvable Ledger, formal verification by Nomadic Labs

2021-06-15T16:00:00+02:00

Introduction

Overview

We’d like to describe a recent verification effort at Nomadic Labs, namely:

A formalisation in Coq of the FA1.2 standard;
verification in Coq of the formal correctness of three FA1.2 smart contracts with respect to this formalisation; and
a description of what we learned from our effort, and of the changes and updates made to the standard and the implementations, following our checks.

Background, precise links, and definitions of technical terms¹ follow below. Here are the Coq files:

The GitLab Merge Request of the verification effort (also the corresponding GitLab branch).
A snapshot of this GitLab branch on Software Heritage. All relevant code links in this blog post will point here.
The FA1.2 interface
The FA1.2 specification
Per-contract proof file for the camlCase implementation
Per-contract proof file for the Edukera implementation
Per-contract proof file for the Dexter 2 implementation

Background: ledgers as smart contracts

Most blockchains have a native token. Tezos has tez; Bitcoin has bitcoin; and Ethereum has ether. These tokens can be used to store value, just like fiat currency.² Tezos has no other primitive tokens, but it has smart contracts, which give the flexibility to create further tokens and, more generally, to store value in novel ways.

In fact, Tezos can express any token you can imagine: just code a smart contract to keep a ledger of who owns your token and how much, and invoke your contract on the Tezos blockchain. Et voilà: you have a new token on the Tezos blockchain, whose ownership is represented as a ledger state in your smart contract, along with whatever other functionalities you imagined and implemented — e.g. to convert between ledger entries and USD, like USDtz.

However, to do this in a scalable and uniform manner, we need interoperability standards to which such smart contracts can adhere. Enter the FA1.2 standard:

The FA1.2 standard

What FA1.2 is

The FA1.2 standard is a standard for smart contracts implementing token ledgers on the Tezos blockchain.³ The FA1.2 standard is a plain English document (i.e. not code, but a description of what code should do), officially registered as Tezos Improvement Proposal (TZIP) number 7.⁴

An FA1.2-compliant smart contract implements a ledger that maps owners to account balances (just like a bank ledger maps account owners to account balances), and offers entrypoints (described below) to get balances, approve transfers,⁵ and so forth, satisfying certain properties as stated in the standard.

For the rest of this blog post, we may write FA1.2 contract as shorthand for “a smart contract that satisfies the FA1.2 standard”.

Entrypoints of an FA1.2 smart contract

An FA1.2-compliant smart contract has (at least) the following five entrypoints:

%transfer expects a from account, a to account, and an amount of tokens to be transferred, and updates the ledger accordingly.
%approve expects an owner, a spender, and a new allowance for the spender, and updates the approvals accordingly.
%getBalance expects an owner and returns (via a callback) the owner’s balance.
%getAllowance expects an owner, a spender, and returns (via a callback) the approved allowance for the spender.
%getTotalSupply returns (again via a callback) the total sum of all balances in the ledger.

The list above is non-exclusive: an FA1.2 contract can include other entrypoints (e.g. for burning and minting tokens, or managing an administrator), but it is the entrypoints listed above that give a smart contract the property of being “an FA1.2 smart contract”.

We will call the last three entrypoints — %getBalance, %getAllowance, and %getTotalSupply — view entrypoints; in OOP we might call them getter methods. The view entrypoints will become particularly interesting later on in this post, because thanks to our verification work we discovered an ambiguity in their specification.
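The shape of the five entrypoints can be sketched as a toy Python class. This is not Michelson and not any of the contracts discussed here. Callbacks are modelled as Python functions and failures as exceptions. The class name, method names, and error messages are hypothetical. Further rules of the standard, such as those on changing a non-zero allowance, are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Address = str
Callback = Callable[[int], None]  # stands in for the callback contract of a view

@dataclass
class ToyFA12Ledger:
    """Hypothetical, simplified model of the five FA1.2 entrypoints (not Michelson)."""
    balances: Dict[Address, int] = field(default_factory=dict)
    allowances: Dict[Tuple[Address, Address], int] = field(default_factory=dict)

    def transfer(self, sender: Address, from_: Address, to: Address, value: int) -> None:
        """%transfer: `sender` moves `value` tokens from `from_` to `to`."""
        allowed = self.allowances.get((from_, sender), 0)
        if sender != from_ and allowed < value:
            raise ValueError("not enough allowance")
        if self.balances.get(from_, 0) < value:
            raise ValueError("not enough balance")
        # All checks passed: only now mutate the storage.
        if sender != from_:
            self.allowances[(from_, sender)] = allowed - value
        self.balances[from_] = self.balances.get(from_, 0) - value
        self.balances[to] = self.balances.get(to, 0) + value

    def approve(self, owner: Address, spender: Address, value: int) -> None:
        """%approve: set the allowance of `spender` over `owner`'s tokens."""
        self.allowances[(owner, spender)] = value

    def get_balance(self, owner: Address, callback: Callback) -> None:
        callback(self.balances.get(owner, 0))

    def get_allowance(self, owner: Address, spender: Address, callback: Callback) -> None:
        callback(self.allowances.get((owner, spender), 0))

    def get_total_supply(self, callback: Callback) -> None:
        # Summed on the fly for simplicity; real contracts usually store it instead.
        callback(sum(self.balances.values()))

ledger = ToyFA12Ledger(balances={"alice": 10})
ledger.approve("alice", "bob", 4)
ledger.transfer(sender="bob", from_="alice", to="carol", value=3)
seen = []
ledger.get_balance("carol", seen.append)
ledger.get_allowance("alice", "bob", seen.append)
ledger.get_total_supply(seen.append)
print(seen)  # [3, 1, 10]
```

The three view methods never change the storage and only hand a value to a callback. This mirrors the "returns (via a callback)" wording of the list above.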

Examples of FA1.2 contracts

Examples of live FA1.2 contracts include:

ETHtz, a token wrapping Ether (ETH) (dapp)
The USD-pegged stablecoin USDtz (dapp)
Wrapped Bitcoin tzBTC (dapp)
A liquidity ledger that is part of the Dexter 2 system.

A word on Dexters 1 and 2: Dexter is a smart contract to enable trade between tez — the Tezos blockchain’s native token — and any FA1.2 token. In the current, live version of Dexter (call it Dexter 1 in this blog post⁶) the liquidity ledger is integrated in the main contract in a manner that does not conform to FA1.2. Dexter 2 is a forthcoming replacement of Dexter 1 which includes various improvements, including a modular internal architecture which includes a distinct, FA1.2-compliant, liquidity ledger. Dexter 2’s liquidity ledger is the contract that has been verified in this work.

Furthermore, two FA1.2 contracts, one by camlCase and one by Edukera, are not themselves live on the blockchain but exist as code made freely available and open-source for anyone to build on and deploy.

So several FA1.2-compliant ledgers exist for the Tezos blockchain:

ledgers in ETHtz, USDtz, and tzBTC;
an internal ledger used by the Dexter 2 smart contract; and
a pair of general-purpose ledger implementations, one by camlCase and one by Edukera.

This makes sense, because keeping a ledger is a basic functionality.

Two important questions

Two questions now arise:

1. Are the ledgers above correctly implemented: does the implementation comply with the FA1.2 standard?

The technical term for this is (formal) correctness. Just because an implementation says

“I am a smart contract implementing an FA1.2-compliant token ledger”

does not mean it is one. The implementation might contain an error.

2. Is the FA1.2 standard itself correct and unambiguous?

Just because a standard says

“I am a standard”

does not mean that it is one. The FA1.2 standard is a plain English document. This could contain an error, ambiguity, or omission.

We should answer

Question 2 (correctness of the standard) before
Question 1 (correctness of an implementation with respect to a formalisation of that standard),

since we should check that a standard makes sense before we formalise it and check that an implementation is formally correct with respect to this formalisation!⁷

The FA1.2 standard formalised

Formalising “FA1.2 standard” as “FA1.2 interface” + “FA1.2 spec”

To answer Question 2 above, we at Nomadic Labs wrote some code in the Coq proof assistant, as follows:

The FA1.2 interface.

This asserts internal (intensional) structure which the smart contract must support, along with axioms on its behaviour.

An FA1.2-compliant smart contract may support other internal structure too. And why is a file regarding intensional behaviour called an interface? Because it provides an abstract interface to the implementation’s storage and parameter types; see a note below.
The FA1.2 specification.

This specifies how FA1.2 entrypoints interact with the internal structure of the FA1.2 interface. An FA1.2-compliant smart contract may contain more entrypoints than those specified in the FA1.2 standard.
There is also an FA12 verification file, which collects lemmas proved purely from the interface and specification files. More on this later.

So: we translated the English FA1.2 standard into a pair of Coq files, plus a file of corollaries. The first concerns intensional structure, the second extensional behaviour, and the third collects their consequences when put together. We can write this as the following informal equation:

$$ \text{FA1.2 standard} \quad\stackrel{\text{Coq}}{\Longrightarrow}\quad \text{FA1.2 interface}\ \ +\ \ \text{FA1.2 specification} . $$

We now discuss the components on the right-hand side of this equation, in more detail.

We start with the FA1.2 interface which, as discussed, is a collection of Coq module types that does two things: it provides an abstract interface to an FA1.2 implementation’s storage and parameter types, and it imposes axiomatic requirements on its behaviour.

We discuss each in turn:

The FA1.2 interface (functions)

The FA1.2 interface posits the following getters and setters over an FA1.2 implementation’s storage:

a function getBalance to return an owner’s balance, or return zero if the owner does not have a balance in the ledger (getBalance is a totalising wrapper around an underlying partial function getBalanceOpt);⁸
a function getAllowance to return an owner’s allowance for a spender, or zero if the spender is not recorded as having an allowance with the owner (again, this is a wrapper for an underlying getAllowanceOpt);
a function setBalance and a function setAllowance to modify the corresponding data; and
a function getTotalSupply to obtain the total sum of all tokens in the ledger.

Here that is again as Coq code:

getBalance      : data storage_ty -> data address -> data nat (** is wrapper for ... **)
getBalanceOpt   : data storage_ty -> data address -> data (option nat)
getAllowance    : data storage_ty -> data address -> data address -> data nat
getAllowanceOpt : data storage_ty -> data address -> data address -> data (option nat)
setBalance      : data storage_ty -> data address -> data nat -> data storage_ty
setAllowance    : data storage_ty -> data address -> data address -> 
                                  data nat -> data storage_ty
getTotalSupply  : data storage_ty -> data nat

For experts: data above is a wrapper that turns a Michelson type into a Coq type. To be more precise, it is a Coq function that translates an underlying Mi-Cho-Coq representation of Michelson types, to a type data in Coq.

Note that

the functions getBalance, getAllowance, setBalance, setAllowance, and getTotalSupply might be present in the smart contract’s source code as explicit internal functions — e.g. one way to write an FA1.2-compliant smart contract is to use a smart contracts language that is a functional language and allows us to actually write these functions — but also
these functions might be implicitly definable from the basic datatypes of the implementation, but not explicitly represented as such — e.g. perhaps the smart contract is written in a low-level language that simply does not have functions.

Part of the power of the interface is that it makes this distinction not matter so much: in the rest of the verification we can operate as if getBalance exists, even if it’s only implicit.
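To illustrate the getters and setters, including getBalance as a totalising wrapper around getBalanceOpt, here is a Python sketch over a storage with separate allowance and balance maps. That layout resembles the Edukera storage shown later in this post. The Storage type and the snake_case names are our own hypothetical stand-ins for the Coq declarations above. Summing in get_total_supply is a simplification.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Address = str

class Storage(NamedTuple):
    """Hypothetical Edukera-style storage: separate allowance and balance maps."""
    allowance: Dict[Tuple[Address, Address], int]
    ledger: Dict[Address, int]

def get_balance_opt(sto: Storage, owner: Address) -> Optional[int]:
    return sto.ledger.get(owner)  # None when the owner has no entry

def get_balance(sto: Storage, owner: Address) -> int:
    """Totalising wrapper: a missing entry reads as a zero balance."""
    opt = get_balance_opt(sto, owner)
    return 0 if opt is None else opt

def get_allowance_opt(sto: Storage, owner: Address, spender: Address) -> Optional[int]:
    return sto.allowance.get((owner, spender))

def get_allowance(sto: Storage, owner: Address, spender: Address) -> int:
    opt = get_allowance_opt(sto, owner, spender)
    return 0 if opt is None else opt

def set_balance(sto: Storage, owner: Address, amount: int) -> Storage:
    # Returns a new storage, like the Coq signature storage_ty -> ... -> storage_ty.
    return sto._replace(ledger={**sto.ledger, owner: amount})

def set_allowance(sto: Storage, owner: Address, spender: Address, amount: int) -> Storage:
    return sto._replace(allowance={**sto.allowance, (owner, spender): amount})

def get_total_supply(sto: Storage) -> int:
    return sum(sto.ledger.values())  # summed here only for simplicity

sto = Storage(allowance={}, ledger={"alice": 5})
sto2 = set_allowance(set_balance(sto, "bob", 2), "alice", "bob", 1)
print(get_balance_opt(sto, "bob"), get_balance(sto, "bob"), get_balance(sto2, "bob"))  # None 0 2
```

Because the setters return a fresh storage instead of mutating their input, the original sto is still available after the calls. This matches the functional style of the Coq signatures.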

The FA1.2 interface (axioms)

The FA1.2 interface also imposes axioms on the getters and setters above. For example:

Example axiom 1

Setting a balance to amount with setBalance and then reading it with getBalance should return amount. In Coq:

Parameter getBalance_setBalance_eq : forall sto owner amount,
      getBalance (setBalance sto owner amount) owner = amount.

Example axiom 2

Setting a balance of one owner must leave the balances of other owners unchanged:

Parameter getBalance_setBalance_neq : forall sto owner owner' amount,
  owner <> owner' ->
  getBalance (setBalance sto owner amount) owner' =
  getBalance sto owner'.

For experts: Example axioms 1 and 2 state that the ledger is an abstract array.

Example axiom 3

Setting an allowance for an owner and a spender with setAllowance and then reading it back with getAllowance must return the amount that was set:

Parameter getAllowance_setAllowance_eq : forall sto owner spender amount,
  getAllowance (setAllowance sto owner spender amount) owner spender =
    amount.
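The three example axioms can be exercised on a concrete storage layout. The sketch below uses a nested layout in the style of camlCase (owner mapped to a pair of balance and approvals) to show that the same axioms make sense for a different representation. This is random testing, not proof: Coq establishes the axioms for all inputs, while the sketch only samples some. The type alias and function names are hypothetical.

```python
import random
from typing import Dict, Tuple

Address = str
# Hypothetical camlCase-style layout: owner -> (balance, that owner's approvals).
Accounts = Dict[Address, Tuple[int, Dict[Address, int]]]

def get_balance(sto: Accounts, owner: Address) -> int:
    return sto[owner][0] if owner in sto else 0

def set_balance(sto: Accounts, owner: Address, amount: int) -> Accounts:
    _, approvals = sto.get(owner, (0, {}))
    return {**sto, owner: (amount, approvals)}

def get_allowance(sto: Accounts, owner: Address, spender: Address) -> int:
    return sto[owner][1].get(spender, 0) if owner in sto else 0

def set_allowance(sto: Accounts, owner: Address, spender: Address, amount: int) -> Accounts:
    balance, approvals = sto.get(owner, (0, {}))
    return {**sto, owner: (balance, {**approvals, spender: amount})}

def check_example_axioms(trials: int = 500, seed: int = 0) -> bool:
    """Sample Example axioms 1-3 on random storages. This is testing, not proof."""
    rng = random.Random(seed)
    addrs = ["alice", "bob", "carol"]
    for _ in range(trials):
        sto: Accounts = {}
        for a in addrs:  # build a random storage through the setters
            if rng.random() < 0.7:
                sto = set_balance(sto, a, rng.randint(0, 9))
            if rng.random() < 0.3:
                sto = set_allowance(sto, a, rng.choice(addrs), rng.randint(0, 9))
        owner, other, spender = (rng.choice(addrs) for _ in range(3))
        amount = rng.randint(0, 9)
        after = set_balance(sto, owner, amount)
        if get_balance(after, owner) != amount:  # Example axiom 1
            return False
        if other != owner and get_balance(after, other) != get_balance(sto, other):
            return False  # Example axiom 2
        after = set_allowance(sto, owner, spender, amount)
        if get_allowance(after, owner, spender) != amount:  # Example axiom 3
            return False
    return True

print(check_example_axioms())  # True
```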

Summary

So: the FA1.2 interface is Coq code, not English. Like the FA1.2 standard from which it derives, the FA1.2 interface abstracts away from smart contract implementation details to allow us to carry out aspects of the verification once and for all, in an implementation-independent manner.

With this in hand, we can conveniently state the FA1.2 specification:

The FA1.2 specification

The FA1.2 specification specifies how smart contract entrypoints should invoke functions specified in the FA1.2 interface — in other words, it specifies the effect of executing FA1.2 entrypoints in terms of the abstract view of the contract given by the interface.

Excerpts from here and here:

(** Entry point: ep_getBalance *)
    Definition ep_getBalance
               (p : data parameter_ep_getBalance_ty)
               (sto : data storage_ty)
               (ret_ops :  data (list operation))
               (ret_sto :  data storage_ty) :=
      let '(owner, callback) := p in
      let balance := getBalance sto owner in
      let op := transfer_tokens env nat balance (amount env) callback in
      ret_sto = sto /\ ret_ops = [op].

Here, we specify in ep_getBalance that the behavior of the %getBalance entrypoint is equivalent to retrieving the balance of the owner (using the getBalance getter provided by the interface) and sending it to callback, through the emission of a transfer operation.

The specification looks like code but it’s not: it specifies required entrypoints and their parameters and behaviour, but treats them as opaque structures, the internal composition of which we do not examine.
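The same idea can be sketched in Python: the specification is a predicate that accepts or rejects an outcome, and an implementation is checked against it. TransferOp, ep_get_balance_spec, and candidate_impl are hypothetical. A plain dictionary stands in for the storage, and dictionary lookup plays the role of the interface's getBalance.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Address = str

@dataclass(frozen=True)
class TransferOp:
    """Stand-in for the operation built by transfer_tokens in the Coq excerpt."""
    param: int            # the balance sent to the callback
    amount: int           # tez forwarded, like `amount env`
    destination: Address  # the callback contract

def ep_get_balance_spec(p: Tuple[Address, Address], sto: Dict[Address, int], amount: int,
                        ret_ops: List[TransferOp], ret_sto: Dict[Address, int]) -> bool:
    """A relation, not a program: does this outcome satisfy the specification?"""
    owner, callback = p
    balance = sto.get(owner, 0)  # plays the role of getBalance from the interface
    op = TransferOp(param=balance, amount=amount, destination=callback)
    return ret_sto == sto and ret_ops == [op]

def candidate_impl(p: Tuple[Address, Address], sto: Dict[Address, int], amount: int):
    """One possible implementation of %getBalance, to be checked against the spec."""
    owner, callback = p
    return [TransferOp(sto.get(owner, 0), amount, callback)], dict(sto)

p, sto = ("alice", "KT1_callback"), {"alice": 42}  # hypothetical callback address
ops, new_sto = candidate_impl(p, sto, 0)
print(ep_get_balance_spec(p, sto, 0, ops, new_sto))       # True
print(ep_get_balance_spec(p, sto, 0, ops, {"alice": 0}))  # False: storage changed
```

The spec says nothing about how the implementation computes its result. It only constrains the returned operations and storage, which is what allows one specification to cover contracts with very different internals.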

Three FA1.2 smart contract implementations verified

Having written the FA1.2 interface and specification just discussed, we verified the formal correctness, with respect to them, of three FA1.2 smart contract implementations: the camlCase and Edukera general-purpose ledgers, and the Dexter 2 liquidity ledger.

Edukera had already checked their implementation using a different methodology. Both the contract and its specification are written in the Archetype language and converted by the Archetype compiler into a collection of proof obligations. These proof obligations are subsequently solved using the verification platform Why3 by means of SMT solvers.

Nevertheless, we decided to carry out the verification of the Edukera contract in Coq to put our approach to the test: if the two verification efforts yield the same result, namely a formal proof of correctness, we have a basis for comparison; if not, we investigate any discrepancies.

Examples of implementational differences

The three implementations maintain internal storage differently:

Map of approved balances / ledger of balances

In the Edukera and Dexter 2 implementations, the map of approved allowances across owners is stored separately from the ledger of balances. The Edukera storage is

storage (pair (big_map %allowance (pair address address) nat)
              (big_map %ledger address nat))

and the Dexter 2 storage is

storage (pair (big_map %tokens address nat)
              (pair (big_map %allowances (pair (address %owner) (address %spender)) nat)
        (pair (address %admin) (nat %total_supply))))

In contrast, the camlCase implementation maintains one big map mapping an owner to a pair whose first component is the owner’s balance and the second component is a map of the owner’s approved allowances:

storage (pair (big_map %accounts address
                                 (pair (nat :balance)
                                       (map :approvals address
                                                       nat)))
              (nat %fields));

(In the code above, %fields corresponds to the total supply.)

Thus, the Edukera and Dexter 2 contracts admit an account that is absent from the ledger — not even with a zero balance — yet has approved spenders in the map of allowances.

In the camlCase contract this cannot happen as there is no global map of allowances separate from the ledger; instead, each owner maintains its own map of allowances.
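One way to see this difference is to re-pack flat Edukera-style maps into the nested camlCase-style layout. An allowance granted by an owner with no ledger entry only fits if a zero balance is made up for that owner. The conversion function, the helper names, and the accounts dave and bob are hypothetical. This is an illustration in a toy model, not a claim about how either contract behaves.

```python
from typing import Dict, Optional, Tuple

Address = str
Allowances = Dict[Tuple[Address, Address], int]           # Edukera-style: flat map
Ledger = Dict[Address, int]                               # Edukera-style: flat map
Accounts = Dict[Address, Tuple[int, Dict[Address, int]]]  # camlCase-style: nested

def to_nested(allowance: Allowances, ledger: Ledger) -> Accounts:
    """Re-pack the two flat maps into one owner -> (balance, approvals) map."""
    accounts: Accounts = {owner: (bal, {}) for owner, bal in ledger.items()}
    for (owner, spender), amount in allowance.items():
        # An owner absent from the ledger can only live in a new entry,
        # so a zero balance has to be made up for it.
        balance, approvals = accounts.get(owner, (0, {}))
        accounts[owner] = (balance, {**approvals, spender: amount})
    return accounts

def balance_opt_flat(ledger: Ledger, owner: Address) -> Optional[int]:
    return ledger.get(owner)

def balance_opt_nested(accounts: Accounts, owner: Address) -> Optional[int]:
    return accounts[owner][0] if owner in accounts else None

allowance: Allowances = {("dave", "bob"): 5}  # dave approved bob but has no ledger entry
ledger: Ledger = {"alice": 10}
nested = to_nested(allowance, ledger)
print(balance_opt_flat(ledger, "dave"), balance_opt_nested(nested, "dave"))  # None 0
```

The totalising getBalance would return 0 for dave in either layout. Only the partial getBalanceOpt exposes the difference.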

Additional entrypoints

Contracts providing functionalities beyond the FA1.2 standard may have additional fields in their storage. For instance, the Dexter 2 contract has a field called %admin for an administrator account.

The admin is the only account allowed to burn and/or mint tokens in Dexter 2, using an additional %mintOrBurn entrypoint. The Edukera and camlCase implementations have no entrypoints aside from those required by the FA1.2 standard.

Guarantees provided by formal correctness

Correctness with respect to the FA1.2 interface and specification ensures that certain things are guaranteed, regardless of any implementational differences. For example:

Sum of balances is unchanged

If an implementation is correct, then regardless of

the specific form of storage, or
whether or not there are any additional entrypoints,

each of the five FA1.2 entrypoints must leave the total sum of balances unchanged.⁹

This is proved in the file fa12_verification, which consists of a number of useful lemmas and a culminating Theorem sumOfAllBalances_constant.

This theorem is checked once-and-for-all, purely from an assumption that the smart contract is FA1.2-compliant — no need to re-run the proof for each implementation; no need even to look at the implementation code (once we have proved that it is FA1.2-compliant).
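For %transfer, the one FA1.2 entrypoint that moves tokens, the arithmetic behind the theorem can be sketched as follows. The notation is ours, not the Coq file's. $b$ and $b'$ are the balance functions before and after a successful transfer of $v$ tokens from account $f$ to a distinct account $t$. We assume $b(f) \ge v$, so that subtraction on natural numbers is exact.

$$ \sum_{a} b'(a) \;=\; \sum_{a \notin \{f,t\}} b(a) \;+\; \big(b(f) - v\big) \;+\; \big(b(t) + v\big) \;=\; \sum_{a} b(a) . $$

When $f = t$, the single balance involved ends at $b(f) - v + v = b(f)$. %approve changes only allowances. The view entrypoints return the storage unchanged, as the ep_getBalance specification above requires with ret_sto = sto.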

Storage Validity

The FA1.2 standard states (in English) that the %getTotalSupply entrypoint must return the total sum of all tokens in the ledger.¹⁰

The FA1.2 interface posits a function getTotalSupply : data storage_ty -> data nat, and the FA1.2 specification requires that the value returned by the %getTotalSupply entrypoint is equal to the value we get by calling the getTotalSupply function directly on the storage.

However this in itself does not guarantee that the getTotalSupply function actually returns the sum of all tokens. This must be checked on a per-implementation basis, as we now discuss.

The FA1.2 interface defines a storageValid predicate —

Definition storageValid sto :=
    getTotalSupply sto = sumOfAllBalances sto.

But we are unlikely to see this summation implemented directly in an implementation's code, since it is impractical to recompute the total sum of all balances every time the %getTotalSupply entrypoint is invoked. Depending on the implementation language, the data structures may not even permit this iteration (i.e. they may not be foldable), so that the language abstractions make such a sum impossible even to express (e.g. over a big map).

Thus, the total supply is stored as a separate field in the storage and updated as tokens are burned or minted. The camlCase and Dexter 2 implementations do this. The Edukera implementation does too, but since it provides no options for burning or minting tokens, it just sets the total supply to a large constant number: constant totalsupply : nat = 10_000_000.

Now, to prove formal correctness, we must prove for each implementation that storageValid indeed holds.

The proofs of these properties are quite long, and this is where some real work has to happen: we have to actually read the implementation and prove in Coq that storageValid — the equality between the value returned by %getTotalSupply and the total sum of all tokens — is maintained as a dynamic invariant preserved by all of the contract’s entrypoints. This includes entrypoints that are not required by the FA1.2 standard, e.g. entrypoints for minting and burning tokens, if any.
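The following hypothetical Python sketch shows the shape of this invariant: the total supply is cached in its own field, `storage_valid` mirrors the Coq predicate, and an extra minting or burning entrypoint preserves validity only if it moves the cache by exactly the amount it changes the ledger. The names `mint`, `burn`, and `buggy_mint` are illustrative assumptions, not taken from the three contracts.

```python
from dataclasses import dataclass


@dataclass
class Storage:
    balances: dict     # address -> tokens
    total_supply: int  # cached value returned by %getTotalSupply


def sum_of_all_balances(sto: Storage) -> int:
    return sum(sto.balances.values())


def storage_valid(sto: Storage) -> bool:
    # Mirrors the Coq predicate: getTotalSupply sto = sumOfAllBalances sto.
    return sto.total_supply == sum_of_all_balances(sto)


def mint(sto: Storage, owner: str, value: int) -> Storage:
    # Hypothetical extra entrypoint: the cache moves by exactly `value`.
    balances = dict(sto.balances)
    balances[owner] = balances.get(owner, 0) + value
    return Storage(balances, sto.total_supply + value)


def burn(sto: Storage, owner: str, value: int) -> Storage:
    # Hypothetical extra entrypoint: fails rather than burning missing tokens.
    if sto.balances.get(owner, 0) < value:
        raise ValueError("cannot burn more than the owner's balance")
    balances = dict(sto.balances)
    balances[owner] = balances.get(owner, 0) - value
    return Storage(balances, sto.total_supply - value)


def buggy_mint(sto: Storage, owner: str, value: int) -> Storage:
    # Forgets to update the cache: storage validity is broken.
    balances = dict(sto.balances)
    balances[owner] = balances.get(owner, 0) + value
    return Storage(balances, sto.total_supply)
```

A random test would eventually catch `buggy_mint`; the Coq proof instead rules out such a bug for every possible call.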

In practice, most of the work was already done by the sumOfAllBalances_constant theorem discussed previously. Of the three contracts we considered, only the Dexter 2 contract has an additional entrypoint. This is handled using bespoke reasoning, but even so, that reasoning mostly just calls generic lemmas from the fa12_verification file mentioned above.

All entrypoints are present

The FA1.2 parameter identifies the entrypoint to be invoked and provides the arguments to go along with the specified entrypoint. The abstract FA1.2 interface requires an implementation to provide a method

extract_fa12_ep : data parameter_ty -> data (option fa12_parameter_ty)

that maps an entrypoint of the implementation to a corresponding entrypoint in the FA1.2 standard. It is partial (the “option” in option fa12_parameter_ty) because the implementation may have additional entrypoints that do not correspond to any FA1.2 entrypoint.

We then require that each FA1.2 entrypoint has a corresponding entrypoint in the implementation — and hence that the implementation is indeed an implementation of FA1.2:

Parameter ep_transfer_required : forall (q : data parameter_ep_transfer_ty),
  exists p, extract_fa12_ep p = Some (inl (inl q)).

Parameter ep_approve_required : forall (q : data parameter_ep_approve_ty),
  exists p, extract_fa12_ep p = Some (inl (inr q)).

Parameter ep_getAllowance_required : forall (q : data parameter_ep_getAllowance_ty),
  exists p, extract_fa12_ep p = Some (inr (inl q)).

Parameter ep_getBalance_required : forall (q : data parameter_ep_getBalance_ty),
  exists p, extract_fa12_ep p = Some (inr (inr (inl q))).

Parameter ep_getTotalSupply_required : forall (q : data parameter_ep_getTotalSupply_ty),
  exists p, extract_fa12_ep p = Some (inr (inr (inr q))).
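As an illustration of the nested sum encoding, the hypothetical Python sketch below represents a Michelson `or` value as a tagged pair (`("inl", x)` or `("inr", x)`) and maps an implementation's parameters, including an extra non-FA1.2 `mint` entrypoint, onto the five FA1.2 entrypoints with the same nesting as the Coq requirements above. The implementation-side parameter format `(name, arguments)` is our assumption.

```python
from typing import Any, Optional, Tuple

Sum = Tuple[str, Any]  # ("inl", x) or ("inr", x): a Michelson `or` value


def inl(x: Any) -> Sum:
    return ("inl", x)


def inr(x: Any) -> Sum:
    return ("inr", x)


def extract_fa12_ep(param: Tuple[str, Any]) -> Optional[Sum]:
    """Partial map from (hypothetical) implementation parameters to FA1.2 ones."""
    name, q = param
    if name == "transfer":
        return inl(inl(q))
    if name == "approve":
        return inl(inr(q))
    if name == "getAllowance":
        return inr(inl(q))
    if name == "getBalance":
        return inr(inr(inl(q)))
    if name == "getTotalSupply":
        return inr(inr(inr(q)))
    return None  # extra entrypoints such as "mint" have no FA1.2 counterpart


def fa12_name(ep: Sum) -> str:
    """Decode the nested or-value back into the FA1.2 entrypoint name."""
    side, v = ep
    if side == "inl":
        return "transfer" if v[0] == "inl" else "approve"
    side, v = v
    if side == "inl":
        return "getAllowance"
    return "getBalance" if v[0] == "inl" else "getTotalSupply"
```

Each `ep_*_required` axiom corresponds to exhibiting, for every FA1.2 argument `q`, some implementation parameter that decodes to that entrypoint with that argument.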

The `%approve` entrypoint is correct

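The specification for the %approve entrypoint guarantees that the storage returned by the contract is either the initial storage itself (an option only when the sender names itself as the spender), or is obtained from the initial storage by calling setAllowance with the sender (the owner), the spender, and the new allowance accompanying the call to %approve. In the latter case, the current allowance or the new allowance must be zero, so an existing non-zero allowance cannot be changed directly to another non-zero value.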

Here’s the Coq code:

(** Entry point: ep_approve *)
Definition ep_approve
           (p : data parameter_ep_approve_ty)
           (sto : data storage_ty)
           (ret_ops : data (list operation))
           (ret_sto : data storage_ty) :=
  let '(spender, new_allowance) := p in
  (sender = spender /\ ret_sto = sto) \/
  let current_allowance := getAllowance sto sender spender in
  (current_allowance = 0%N \/ new_allowance = 0%N) /\
  ret_sto = setAllowance sto sender spender new_allowance.
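For readers less used to Coq, here is a direct Python transcription of the `ep_approve` predicate, under the assumption (ours) that storage is a dictionary from `(owner, spender)` pairs to allowances and that `sender` is passed explicitly rather than read from the environment. Like the Coq definition, it checks a candidate result rather than computing one.

```python
def get_allowance(sto: dict, owner: str, spender: str) -> int:
    # Hypothetical storage: (owner, spender) -> allowance, missing means 0.
    return sto.get((owner, spender), 0)


def set_allowance(sto: dict, owner: str, spender: str, value: int) -> dict:
    new_sto = dict(sto)
    new_sto[(owner, spender)] = value
    return new_sto


def ep_approve_spec(p, sto, ret_ops, ret_sto, sender) -> bool:
    """True iff (ret_ops, ret_sto) is allowed by the general %approve spec.
    ret_ops is unconstrained here, exactly as in the Coq definition."""
    spender, new_allowance = p
    if sender == spender and ret_sto == sto:
        return True
    current_allowance = get_allowance(sto, sender, spender)
    return (current_allowance == 0 or new_allowance == 0) and ret_sto == set_allowance(
        sto, sender, spender, new_allowance
    )
```

For example, with `sto = {("alice", "bob"): 10}`, Alice may reset Bob's allowance to 0 but may not change it directly to 5: no returned storage satisfies the predicate, so such a call cannot succeed.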

Underspecification

The FA1.2 standard is underspecified by design. For example:

The standard imposes no restriction on the operations returned by the %transfer and %approve entrypoints — since e.g. a contract may need to invoke calls to another contract to access the contents of its ledger, if these are stored remotely.
A contract may contain entrypoints other than those mentioned in the standard (e.g. to mint and burn tokens).

Just because such details are not fully specified in FA1.2 does not mean we can ignore them in our verification effort! We saw one instance of this above: an obviously necessary requirement that every entrypoint must preserve storage validity — even if the entrypoint is not mentioned in the FA1.2 standard. Thus, the FA1.2 standard requires (explicitly or implicitly) some invariants which may apply to all of an implementation, even if — especially if — that implementation has extra bells and whistles.¹¹

So for each particular implementation, we formulate an additional specification file describing in more detail relevant behaviour of the implementation’s entrypoints:

For example: all three specification files above include a further requirement ret_ops = nil that the %transfer and %approve entrypoints return no operations.¹²

For example: in the camlCase and Edukera specification files, we require the %getBalance entrypoint to fail if the owner does not have a balance in the ledger; the Dexter 2 specification file instead requires it to succeed in this case and return zero.¹³
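The two behaviours can be sketched in Python as follows (hypothetical functions over a dictionary ledger, with an exception standing in for a Michelson failure). They agree whenever the owner has an entry, and differ only on the case that the general FA1.2 specification leaves open.

```python
class Fail(Exception):
    """Stands in for a Michelson failure: the call returns nothing."""


def get_balance_must_exist(ledger: dict, owner: str) -> int:
    # camlCase/Edukera-style: fail if the owner has no entry in the ledger.
    if owner not in ledger:
        raise Fail("owner not in ledger")
    return ledger[owner]


def get_balance_default_zero(ledger: dict, owner: str) -> int:
    # Dexter 2-style: an owner without an entry has balance 0.
    return ledger.get(owner, 0)
```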

Each such contract-tailored FA1.2 specification is designed in such a way that it:

implies the general FA1.2 specification, and
fully describes the behavior of the given contract.

The latter point implies that for each entrypoint, the returned storage and the list of operations are uniquely identified by the initial environment and storage, and the entrypoint parameter.¹⁴

As mentioned earlier, we can carry out some important verification once and for all, without recourse to a contract-tailored specification file. For instance, we can prove that in any contract satisfying the general FA1.2 specification, the total sum of all tokens remains unchanged by any of the five entrypoints. For the view entrypoints this is of course trivial; establishing this for the %approve and (especially) %transfer entrypoints requires some amount of effort that would otherwise have to be carried out for each implementation separately. The axioms we imposed earlier as part of the abstract FA1.2 interface are key here.

In contrast, the Edukera Why3 verification features similar invariants, but as one-off proof obligations; e.g., the following formal property is part of the %transfer entrypoint specification:

forall tokenholder in ledger,
   tokenholder.holder <> %from ->
   tokenholder.holder <> %to ->
   before.ledger[tokenholder.holder] = some(tokenholder)

This states that if an account is neither the sender %from nor the recipient %to, then its ledger entry is unchanged by the %transfer entrypoint.
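The same frame condition can be written as an executable Python check over two hypothetical ledger snapshots (dictionaries from holder to ledger entry). Like the Why3 formula, it quantifies over the holders in the ledger after the call and compares each entry with the one before; `before.get` returning `None` plays the role of a missing entry.

```python
def others_unchanged(before: dict, after: dict, frm: str, to: str) -> bool:
    """Every holder in the ledger after the call, other than %from and %to,
    has the same entry as before the call."""
    return all(
        before.get(holder) == entry
        for holder, entry in after.items()
        if holder not in (frm, to)
    )
```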

Problems detected, problems solved

Our verification revealed two discrepancies between how the three implementations above handled corner cases that had not been explicitly addressed in the FA1.2 standard. In other words, the implementations satisfied the FA1.2 standard, but this standard was underspecified.

Discrepancy 1 resolved (making a transfer to yourself)

When the from and to accounts in the %transfer entrypoint coincide, the operation can be treated either

as a NOOP, or
as a regular transfer.

We noted that the camlCase contract implementation treats the operation as a NOOP; whereas the Edukera and Dexter 2 contract implementations treat the operation as a regular transfer.

There is a real logical difference between these two options. In the case of a NOOP, nothing changes at all. In the case of a regular transfer, the account balances don’t change, but the spending allowance does, because it is decremented by the amount transferred (from the account to itself).
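A small Python model makes the difference concrete. It assumes, as in the usual FA1.2 reading, that the allowance is only consulted when the sender is not the owner `frm`; the function names are hypothetical. When a spender moves an owner's tokens back to that same owner, the regular-transfer reading consumes allowance, while the NOOP reading does not.

```python
def transfer_regular(balances, allowances, sender, frm, to, value):
    """Self-transfer handled as a regular transfer.
    Hypothetical model: balances maps address -> int; allowances maps
    (owner, spender) -> int and is only consulted when sender != frm."""
    balances, allowances = dict(balances), dict(allowances)
    if sender != frm:
        allowed = allowances.get((frm, sender), 0)
        if allowed < value:
            raise ValueError("not enough allowance")
        allowances[(frm, sender)] = allowed - value
    if balances.get(frm, 0) < value:
        raise ValueError("not enough balance")
    balances[frm] = balances.get(frm, 0) - value
    balances[to] = balances.get(to, 0) + value
    return balances, allowances


def transfer_noop_on_self(balances, allowances, sender, frm, to, value):
    """Self-transfer handled as a NOOP (the former camlCase behaviour)."""
    if frm == to:
        return dict(balances), dict(allowances)
    return transfer_regular(balances, allowances, sender, frm, to, value)


balances = {"alice": 10}
allowances = {("alice", "bob"): 4}
# bob, as alice's spender, transfers 3 tokens from alice to alice
print(transfer_regular(balances, allowances, "bob", "alice", "alice", 3))
# ({'alice': 10}, {('alice', 'bob'): 1})
print(transfer_noop_on_self(balances, allowances, "bob", "alice", "alice", 3))
# ({'alice': 10}, {('alice', 'bob'): 4})
```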

Following discussions:

The FA1.2 standard was updated to eliminate this ambiguity by requiring that this corner case be treated as a regular transfer (GitLab issue).
The camlCase implementation of the %transfer entrypoint was updated (GitLab issue).

Discrepancy 2 resolved (the view entrypoints)

Recall the view entrypoints %getBalance, %getAllowance, and %getTotalSupply mentioned above. The callback transactions are now required to forward to the callback all of the tez passed to the entrypoint (see the GitLab issue). The Edukera and the camlCase implementations were duly updated:

The Edukera view entrypoints were updated (GitLab commit).
The camlCase view entrypoints also needed slight adjustment (GitLab issue).
The Dexter 2 implementation needed no change. The view entrypoints are hardcoded to send zero tez to the callback, and this is compliant since every Dexter 2 FA1.2 entrypoint is designed to fail if the number of tez passed to it is greater than zero.
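The updated rule, and why the Dexter 2 approach satisfies it, can be sketched in Python. Here `amount` plays the role of the tez received by the view (Michelson's AMOUNT), and the `Transaction` record and the `"KT1_callback"` address are hypothetical simplifications of a Michelson operation.

```python
from dataclasses import dataclass


class Fail(Exception):
    """Stands in for a Michelson FAILWITH: the call is rejected."""


@dataclass
class Transaction:
    destination: str  # callback contract (hypothetical address string)
    amount: int       # mutez sent along with the callback
    argument: int     # the value reported by the view


def get_total_supply_forwarding(total_supply: int, amount: int, callback: str) -> Transaction:
    # Updated rule: all tez received by the view are forwarded to the callback.
    return Transaction(callback, amount, total_supply)


def get_total_supply_dexter2_style(total_supply: int, amount: int, callback: str) -> Transaction:
    # Dexter 2 style: reject any tez, then send a hardcoded zero to the callback.
    if amount > 0:
        raise Fail("this entrypoint does not accept tez")
    return Transaction(callback, 0, total_supply)


def forwards_all_tez(view, total_supply: int, amount: int) -> bool:
    """Whenever the view succeeds, the callback receives exactly `amount`."""
    try:
        op = view(total_supply, amount, "KT1_callback")
    except Fail:
        return True  # a failed call emits no operation, so nothing is withheld
    return op.amount == amount
```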

Conclusions

The formal verification of an FA1.2 smart contract in Coq has the following components:

Checking correctness with respect to the abstract FA1.2 interface and specification.

This includes instantiating concrete Coq types for the parameter and storage type parameters parameter_ty and storage_ty, providing methods for manipulating them, and proving that the axioms required by the interface are satisfied.

(Here’s the relevant file for our first example of the camlCase contract.)
Formulating a contract-tailored FA1.2 specification for the implementation’s behaviour — including any additional entrypoints — together with a proof that the general FA1.2 specification is satisfied.

The contract-tailored specification may reference additional data, operators, or behaviour that are not present in the FA1.2 standard: for instance, some contracts offer an option to pause some of the contract’s capabilities; if this flag is set, some (or even all) of the FA1.2 entrypoints might fail by default.
Verifying that the contract-tailored FA1.2 specification does indeed capture the behavior of the contract.

Specifically: each of the FA1.2 entrypoints successfully returns an updated storage and a list of operations if and only if these satisfy the specification. We don’t need to establish both directions of the if-and-only-if explicitly. It suffices to prove:
1. If an entrypoint successfully returns an updated storage and a list of operations, then these satisfy the specification.
2. The specification uniquely identifies the return storage and the list of operations.
3. If there exists a return storage and a list of operations that satisfy the specification, then the entrypoint succeeds (irrespective of the actual storage and list of operations returned).
Here (1) is one direction of the equivalence, and (1) together with (2) and (3) implies the full equivalence. Proving (2) and (3) is often simpler than establishing the converse of (1) directly.
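This argument can itself be checked mechanically. Below is a minimal Lean 4 sketch, not part of the authors' Coq development. `f` is a hypothetical abstraction of an entrypoint as a partial function from an input (environment, storage, parameter) to an output (storage and operations). `S` abstracts the contract-tailored specification, and the hypotheses `h1`, `h2`, `h3` are conditions (1), (2), and (3).

```lean
theorem spec_equiv {α β : Type} (f : α → Option β) (S : α → β → Prop)
    (h1 : ∀ i o, f i = some o → S i o)
    (h2 : ∀ i o₁ o₂, S i o₁ → S i o₂ → o₁ = o₂)
    (h3 : ∀ i, (∃ o, S i o) → ∃ o, f i = some o) :
    ∀ i o, f i = some o ↔ S i o := by
  intro i o
  constructor
  · -- direction (1)
    exact h1 i o
  · intro hs
    -- (3) gives some successful output o', (1) says it meets the spec,
    -- and (2) forces it to coincide with o.
    match h3 i ⟨o, hs⟩ with
    | ⟨o', ho'⟩ =>
      have heq : o' = o := h2 i o' o (h1 i o' ho') hs
      rw [← heq]
      exact ho'
```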
Verifying that all entry points (including any additional ones) preserve the validity of the storage.

For entrypoints outside of FA1.2 this has to be established explicitly. For the FA1.2 entrypoints we already know that the sum of all tokens remains unchanged; hence we only need to show that the %getTotalSupply value remains the same. In the case of the Dexter 2, Edukera, and camlCase implementations this was straightforward.

We will continue to improve and simplify the Coq code (and the Mi-Cho-Coq tool on which it depends), and future work includes:

An analogous treatment of the FA2 standard, an instance of which can be plugged into Dexter 2.
Adding native FA1.2 support to tezos-client.

— and loads of great Coq, of course — ↩
E.g. the authors of this blog post may store some value as Euro in a bank account, and some as tez on the Tezos blockchain. Tokens are what makes a blockchain more than an amusing exercise in distributed computing and databases. They put the “and” in the sentence: “Blockchain T can let you do X, Y, and Z …. and you get a token for it”. ↩
It is similar to the ERC20 standard for Ethereum. ↩
Anyone can propose an improvement to Tezos as a TZIP. See the TZIP explorer, and the TZIP GitLab repo. The TZIP improvement proposal for FA1.2 is here and the rendered markdown is here.

FA1.2 itself builds on the previous (now deprecated) FA1 abstract ledger interface. The TZIP improvement proposal for FA1 is here. ↩
In a little more detail: an owner O can issue an authorisation that some spender S may withdraw — i.e. transfer to some other account(s) — up to some amount A of tokens out of O‘s account. This can happen in a single transaction of (up to) A, or in multiple smaller transactions of total no greater than A. ↩
Royalty faces similar naming issues. E.g. King Richard the Lionheart only became King Richard I when King Richard II came along. We recently verified the functional correctness of Dexter 1, as part of a larger effort to construct a fully-verified token system. The verification of FA1.2 discussed in this blog post complements that work. ↩
Let’s emphasise this point: if we take a bad requirement (or one that has been misunderstood or applied out of context), formalise it, and write code that sanctifies this formalisation with a correct implementation, then all we have done is go from bad to worse. This sounds obvious, but surprisingly often people forget that a system operating normally and as per spec is not, in itself, evidence that the outcomes are sensible, desirable, or good. This is of course a systems phenomenon that is not restricted to code. ↩
getBalance also appears inside the annot module. This is a different namespace and can be ignored; similarly for getAllowance and getTotalSupply. ↩
As discussed, additional non-FA1.2 entrypoints might exist to change the total sum of balances, and this is fine. ↩
To be more precise it says:

%getBalance and %getTotalSupply entrypoints have the same semantics as they do in FA1

The FA1 standard tzip-5 then says:

getTotalSupply This view returns the sum of all participants’ balances.

↩
To really spell this out: choosing what invariants to impose and on what entrypoints, is a creative process that requires and expresses our understanding of the code’s intended meaning. For instance, we (presumably) want all entrypoints to preserve validity of storage, but we might want to allow some (non-FA1.2) entrypoints to mint or burn tokens. Thus, not only does designing these invariants require us to understand what the entrypoints are supposed to do — but at an even deeper level, designing invariants is a way to formally capture, express, and record this understanding for our colleagues, readers, and any continuous integration tools. This is not new: well-chosen invariants, like well-chosen tests, are how good programmers design good software. ↩
camlCase: the ret_ops = nil on lines 63 and 86; Edukera: similarly on lines 61 and 82; Dexter 2: similarly on lines 62 and 84. ↩
- For camlCase this is the condition isOwner sto owner on line 108 of the spec file, where isOwner is defined here, in the FA1.2 interface file and states that the owner must have a balance in the ledger.
- Similarly for the Edukera spec file, line 104.
- Dexter 2 has no isOwner clause; see line 123. The result of getBalance is returned, which is 0 when the owner does not have an account.
↩
Geek note. This is slightly tautological of course: all we are saying is that the smart contract’s output depends on everything the smart contract’s output depends on. This includes quantities such as:
- The AMOUNT sent to the contract.
- The contract’s BALANCE.
- The state of the blockchain (NOW and LEVEL).
- The existence or non-existence of called contracts.
- Loads of other stuff.
… all of which is rolled up conceptually into “the state of the blockchain” or the “environment of the smart contract when called”. But that’s fine; this is a stateful system so that’s what we would expect. ↩

Five questions to Nomadic Labs PhDs — Guillaume Bau

2021-06-11T15:00:00+02:00

At Nomadic Labs we are proud to create next-gen software … but we are even prouder to help create the next generation of software scientists!

In this blogpost, we will ask five questions of one of our students, Guillaume Bau (and a couple of questions of his supervisor at Nomadic Labs).

Over to you Guillaume … no the mic is over here … that’s right, you’re on air!

Questions for Guillaume

1. Please present yourself and your academic background

I’m currently a PhD student at LIP6 and Nomadic Labs. I graduated with a computer science BSc in 2007 from the Université Pierre et Marie Curie (now Sorbonne Université). I worked for some time in network analysis and cybersecurity companies, mainly as a C developer. In 2016, I joined a startup targeting floating-point manipulation errors and approximations. This gave me the opportunity to use my theoretical background: I developed an abstract interpreter for C and Ada, designed to detect floating-point arithmetic misbehaviour in avionics software.

Then, when I first joined Nomadic in October 2018, I had the opportunity to continue my academic studies: I graduated with a master’s degree and, as my Nomadic Labs internship, worked on a prototype of an abstract interpreter for Michelson.

2. Tell us more about the topic of your PhD: who is your mentor, and what are your research objectives?

The topic of my PhD is to develop a static analysis tool for the Michelson smart contract programming language, helping developers to assert correctness properties in their Michelson code and to detect errors in it.

It will use Abstract Interpretation, a static analysis technique which tends to give quick results, even for thousand-line contracts, and which does not require user intervention or programming to guarantee some properties, as are required with proof assistants.¹ Abstract interpretation works by abstracting known, unknown, and partially-known values as overapproximations, e.g. using an interval [1,10] if we know a value might be 1 or might be 10. This often allows us to deduce useful mathematical or logical properties, for example, “this value is always different from 0”, or “this piece of code cannot be reached”.

Abstract domains are responsible for managing Michelson instructions with respect to these value approximations, and a large part of the work needed to write an abstract interpreter is to develop the abstract domains handling some specific language feature like Michelson sets or maps, or some specific property, like storage changes through multiple calls to a specific smart contract entrypoint. Of course, because we are always manipulating overapproximations, an analysis can raise false alarms² if the approximation is too loose.
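A toy interval domain in Python shows both sides of this. Joining the possible values 1 and 10 gives [1,10], which proves the value is never 0. Joining -1 and 1 gives [-1,1], which contains 0 and so raises a false alarm, even though 0 never actually occurs. This is an illustrative sketch only; it is not Mopsa's interface or the domains used in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value: every concrete value lies in [lo, hi]."""
    lo: int
    hi: int

    @staticmethod
    def of_value(v: int) -> "Interval":
        return Interval(v, v)

    def join(self, other: "Interval") -> "Interval":
        # Smallest interval containing both: over-approximates the union.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def add(self, other: "Interval") -> "Interval":
        # Abstract addition: contains x + y for every x in self and y in other.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def proves_nonzero(self) -> bool:
        # True only if 0 is outside the interval; False may be a false alarm.
        return self.lo > 0 or self.hi < 0


x = Interval.of_value(1).join(Interval.of_value(10))
print(x, x.proves_nonzero())  # Interval(lo=1, hi=10) True
y = Interval.of_value(-1).join(Interval.of_value(1))
print(y, y.proves_nonzero())  # Interval(lo=-1, hi=1) False
```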

Much of my PhD work goes into developing new abstract domains that make useful trade-offs between precision and performance, to help Tezos smart contract developers guarantee properties of their work. For instance, I wrote an abstract domain of “smashed” lists, able to represent as a single abstract value the set of elements contained in a Michelson list; an improved domain will keep the abstract items of the list separate.
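To illustrate the trade-off, here is a toy smashed-list domain in Python, built on a minimal interval abstraction (both hypothetical, not Mopsa code). One interval summarises every element, which is cheap. But after pushing a 0 onto a list of positive numbers and removing it again, the domain can no longer prove the remaining elements positive; keeping items separate would avoid this false alarm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def join(self, other: "Interval") -> "Interval":
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


@dataclass(frozen=True)
class SmashedList:
    """Every element of the list lies in `summary`; None means the list is known to be empty."""
    summary: Optional[Interval]

    @staticmethod
    def abstract(values: List[int]) -> "SmashedList":
        summary: Optional[Interval] = None
        for v in values:
            point = Interval(v, v)
            summary = point if summary is None else summary.join(point)
        return SmashedList(summary)

    def cons(self, item: Interval) -> "SmashedList":
        # Abstract CONS: the new element is merged into the single summary.
        return SmashedList(item if self.summary is None else self.summary.join(item))

    def tail(self) -> "SmashedList":
        # Abstract tail: keeping the same summary is sound, but the domain
        # cannot know which values left with the head.
        return self

    def all_positive(self) -> bool:
        return self.summary is None or self.summary.lo > 0


lst = SmashedList.abstract([3, 7]).cons(Interval(0, 0))
print(lst.tail().all_positive())  # False: a false alarm, the concrete tail is [3, 7]
```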

My research director is Antoine Miné from LIP6, and my mentors are Vincent Botbol and Mehdi Bouaziz from Nomadic Labs. My work integrates with Mopsa, Antoine Miné’s modular framework for Abstract Interpretation. This is currently maintained by Antoine with the help of the Mopsa team, and allows building analysers for multiple languages (C and Python analysis are provided built-in), while providing useful primitives and a robust framework for abstraction compositions.

3. What is the value added and innovative side of your PhD thesis (if applicable, in the Tezos blockchain)?

Michelson has an interesting combination of language features. It is high-level in some respects — it’s a strictly-typed language with high-level data structures — yet low-level in others — e.g. it is stack-based. Stack-based languages are uncommon³ and there is relatively little work on static analysis for them, and furthermore, Michelson’s high-level data structures create opportunities to develop rather precise customised abstractions.

Michelson is also interesting because it’s a smart contract language, not a regular programming language, so you tend to write different kinds of programs, for which you may need to prove different kinds of correctness properties. Some of these may be new and specific to smart contract programming languages, like token-preserving invariants or resource analysis (gas, storage size evolution, and so on).

Interesting properties also emerge from the underlying blockchain platform, e.g. properties of inter-contract and transaction behaviour, and these can depend on the underlying blockchain implementation.⁴ That is: if you have a specific blockchain implementation C running smart contracts X, Y, and Z, then an effective analysis of the (full panoply of possible) interactions between smart contracts X, Y, and Z in the context of C may be key to proving safety, even if you personally only care about the correctness of contract X! As you might imagine this can get quite complicated, which is why effective automated tools can be so valuable.

Thus, from the point of view of a hands-on, real-life Tezos smart contract developer, an automatic static analysis tool would add real value. It would help find errors earlier in your development cycle, and could prevent blocked contracts or malicious exploitation. Abstract interpretation is scalable to real-life use cases and does not require much human work to obtain its guarantees of properties. I would argue that for the working programmer, this is a desirable combination of features!

4. What is the benefit of preparing a PhD and spending most of your time in a private company vs. a University lab?

University labs are fecund environments and ideas can be exchanged and relations between different topics explored — but it’s not always the case that the actual developers of the system you’re studying are working in the very same office. It would be like working on a C language analyser and being able to pop over to Brian Kernighan in the neighboring office to ask “Why did you do it like that?”

Studying at Nomadic, I’m surrounded by Tezos developers, some of whom are working on techniques for certifying smart contracts with Coq; some of whom are working on the Michelson interpreter; and we are all reviewing code from other Tezos contributors. So I’m right at the heart of Michelson’s development, with the ability to pre-empt the language’s evolution, and even participate in it.

5. Why did you choose to become a part of the blockchain ecosystem and Nomadic labs?

Blockchains are an interesting topic, being at the intersection of several important disciplines: distributed systems, networking, cryptography, language design and analysis and more. So there is much opportunity for new and useful academic research.

And, the Tezos blockchain in particular considered safety in its design right from the start: the main node is written in a memory-safe language, the cryptographic routines are formally verified, and Michelson was built without implicit failures. Nomadic engineers share a desire to apply formal verification to guarantee the inner working of the node and network behaviour. This culture from Tezos and particularly from Nomadic Labs implied working with the OCaml language, and working on Abstract Interpretation, both of which I’m very interested in.

I have a background in and am familiar with basic abstract interpretation techniques, to e.g. find numeric invariants, but I have a lot more to learn to be able to build systems capable of proving properties involving e.g. authentication and authorisation. While working with the Mopsa framework, I’m learning how real-life and industrially robust abstract interpreters are developed, with advanced feature sets, like modular analysis and abstraction compositions.

I would be very pleased to be able to make a small contribution to making smart contracts more secure — initially on the Tezos chain, but I hope the research may be transferable to others — and I look forward to studying and learning at Nomadic Labs and creating a practical project which will help real Tezos users to create value, by making things that people want and can (safely) use!

Questions for Guillaume’s Nomadic Labs mentor Vincent Botbol

What is your input to Guillaume’s thesis?

Ph.D. students are expected to absorb vast amounts of information. My role is to give Guillaume guidance, advice, and a framework to understand the material — as well as to suggest pertinent literature and references where relevant.

Blockchain is a particularly challenging and exciting space in this regard, because it is inherently interdisciplinary and because, being so new, there is less of a body of knowledge to guide us.

My background includes expertise in formal verification (in particular using abstract interpretation) and in the implementation of static analysers. I also know the Tezos codebase and its ecosystem intimately, having participated in its evolution from even before the mainnet launch back in September 2018.

My technical input is then twofold:

I will help Guillaume by proposing techniques to increase performance and precision⁵ in his static analyser design, and also
I will guide him towards useful real-life use-cases, to make his work maximally useful for the smart contract developer community.

Why is this thesis important?

Many of the most expensive errors on blockchains are related to buggy smart contracts. Formal verification can help critical software development, but it remains hard to apply: it’s great when it works, but tooling tends to be incomplete, difficult to use, or both.

Guillaume hopes to create a fully automated analyser. The user just needs to provide a specification (e.g. “only my tz1 address may remove tokens from this contract”) and the analyser will automatically check whether the specification is met.

By relying on the mathematically sound⁶ framework that is abstract interpretation we obtain a strong proven guarantee that if the analysis states that an invariant is true, then it is.⁷

Guillaume’s tool will also be able to detect coding style anti-patterns, dead code and security issues.

The abstract interpretation method Guillaume is using has two strengths. First, it’s great with the numerical (i.e. arithmetic) properties which are common in smart contract code (asset management, ERC20-like contracts, decentralised exchanges, etc.). Second, it’s fast: thousands of lines of code can be analysed in a few seconds.

I look forward to integrating this analysis into the Tezos smart contract development process and so helping users and developers in the Tezos ecosystem to catch more bugs, more quickly, in their smart contracts.

To be more precise (and in fairness to proof assistants):
- there is a highly labour-saving subset of properties for which abstract interpretation tends to not require user intervention or programming (e.g. detecting integer overflows), and
- there is another still-useful subset for which abstract interpretation is applicable, but which may require some user intervention e.g. to refine invariants (“user X has permission to transfer the contract tokens” might require us to refine our invariant with a precondition like “user X has address tz1”), and
- there is a third subset for which abstract interpretation can not provide useful invariants in general: e.g. because the properties involved are just too complex for a general abstract domain to check. In this case, you could try to build a customised abstract domain to prove a specific instance of this property — but this may just be a case of picking the right tool for the job and it may be quicker and easier to prove your invariant using other means, and then input it into the analyser as an assumption.
So abstract interpretation is not a magic wand, but it can feel like that sometimes. ↩
This is like a false positive in a medical test: not necessarily bad in and of itself, but it wastes time. ↩
May I expand on this in a footnote please? [Sure, please do -ed]

Stack-based general programming languages are uncommon, but low-level assembly and bytecode languages are frequently stack-based: e.g. Intel and ARM assembly, Java VM bytecode, Microsoft CLR bytecode, and OCaml bytecode.

So Michelson is an interesting engineering hybrid that combines features typical of a low-level compile-target language with features of high-level programming languages. It is possible to program directly in Michelson, though I wouldn’t want to myself. A key point is that verifying low-level languages requires reconstructing the programmer’s intent to some degree, which can be hard in some cases and may lead to loss of precision. However, Michelson’s high-level features ameliorate this; furthermore, being strictly typed means that specific classes of problems can never arise, which facilitates the writing of both an interpreter and an abstract interpreter. ↩
For instance, transaction evaluation ordering differs between Ethereum and Tezos, and Tezos has notions of internal vs. external operations. A good abstract interpreter for Tezos, if it is to be maximally useful, may need to know about and account for these specificities. ↩
More performance = Increased speed. More precision = the analysis raises fewer false positive alarms. ↩
Sound means something rather precise in mathematics, namely “correctness with respect to a logical model that is proven to mathematical precision” …. ↩
… and correct means “the stated invariants are maintained and there can be no exceptions, counterexamples, or corner cases”. Knowing how to translate intuitive notions of correctness into verifiable invariants is a key skill, but that’s a whole other discussion. ↩

Progress report on the verification of Liquidity Baking smart contracts

2021-06-04T12:00:00+02:00

Context

Liquidity baking is one of the main features of the Granada proposal. The feature itself has been well explained by others (initial proposal on Agora, TZIP, Presentation by Midl-dev, Presentation by XTZ.news) so we won’t present it in detail here.

An original aspect of this feature is that it depends on a smart contract that will likely hold a very high balance. For this reason, the security of this smart contract is extremely important, and we have devoted a lot of effort to checking that the contract is safe.

We have used formal verification to prove that the compiled Michelson code for the CPMM contract is valid with respect to its functional specification. As was shown in the past, this approach increases our confidence in the correctness of the implementation, but says little about the properties of its specification. In particular, the specification of Dexter was flawed. To avoid a similar pitfall, we have also proven higher-level security properties for the CPMM contract, and we have subjected the overall Liquidity Baking feature to thorough testing.

Overview: smart contracts involved in Liquidity Baking

Three smart contracts are involved in the Liquidity Baking feature: a Constant Product Market Maker (CPMM for short), one FA1.2 contract managing a pool of tzBTC, and another FA1.2 contract managing liquidity tokens.

The main contract is the CPMM; a kind of contract deeply inspired by Ethereum’s Uniswap decentralized exchange. Two kinds of users interact with this contract.

The first kind of users are called traders; they can use the contract to trade tez for a particular FA1.2 token called tzBTC (using the xtzToToken entrypoint) and vice versa they can trade tzBTC tokens for tez (using the tokenToXtz entrypoint). Traders can also use the CPMM to trade tzBTC tokens for other FA1.2 tokens (using the tokenToToken entrypoint).

In order to perform the trade without waiting for another trader wanting to do the symmetric trade, the CPMM contract holds both tez and tzBTC tokens that are provided by the second kind of users, called liquidity providers. By maintaining in its storage the number of tokens available, the CPMM is able to compute an exchange rate for the current trade (independently from the other trades).

Liquidity providers deposit tez and tzBTC tokens simultaneously in a so-called liquidity pool (using the addLiquidity entrypoint). The share of the liquidity pool that each liquidity provider owns is materialized using an FA1.2 token contract (called the liquidity token contract) administered by the CPMM. Each time addLiquidity is called, new liquidity tokens are minted and credited to the liquidity provider who called it. Liquidity providers can also withdraw an amount of tez and tzBTC tokens by burning their liquidity tokens (using the removeLiquidity entrypoint).

Finally, anyone can donate tez to the CPMM contract (using the default entrypoint). The donation goes to the tez part of the liquidity pool, but no liquidity token is minted for it, so users have no reason to call this entrypoint; the tez sent are effectively shared among the liquidity providers. The default entrypoint is used by the Tezos protocol to send the liquidity baking subsidy (2.5 freshly minted tez) to the contract in each block.

This subsidy is meant to incentivize liquidity providers. To further incentivize them, for each trade, a 0.1% fee of the traded amount is also sent to the liquidity pool.
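To make the exchange-rate computation concrete, here is a minimal sketch in OCaml, the language of choice at Nomadic Labs, of the CPMM state and of an xtzToToken trade. It rests on our own assumptions rather than on the contract source: pools are plain OCaml integers, the fee is encoded as the fraction 999/1000 (so $\gamma = 0.999$), integer division rounds down, and the full amount paid by the trader is added to the tez pool, so that the fee stays in the liquidity pool. The names `cpmm`, `tokens_bought` and `xtz_to_token` are hypothetical.

```ocaml
(* Hypothetical model of the CPMM storage. Michelson nat and mutez values
   are unbounded or 64-bit; plain OCaml ints are used here for brevity, so
   very large pools could overflow in this sketch. *)
type cpmm = {
  xtz : int;  (* X: tez held by the CPMM, in mutez *)
  tok : int;  (* T: tzBTC held by the CPMM, in its smallest unit *)
  lqt : int;  (* L: total supply of liquidity tokens *)
}

(* gamma = fee_num / fee_den = 0.999, i.e. a 0.1% fee *)
let fee_num = 999
let fee_den = 1000

(* Tokens received for [a] mutez: floor (gamma * a * T / (X + gamma * a)),
   computed with integers only by scaling numerator and denominator. *)
let tokens_bought s a =
  (a * fee_num * s.tok) / ((s.xtz * fee_den) + (a * fee_num))

(* xtzToToken: the whole amount [a] enters the tez pool (assumption), so
   the fee remains in the pool for the liquidity providers. *)
let xtz_to_token s a =
  let bought = tokens_bought s a in
  (bought, { s with xtz = s.xtz + a; tok = s.tok - bought })
```

With X = T = 1,000,000 and A = 1,000, this sketch gives the trader 998 tokens, and the product of the two pools grows slightly because the fee remains in the pool.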

Verified properties

For each property, we explain the potential risk if it did not hold, give an informal description of the checked property, describe the verification technique used to check that it holds, and provide pointers to online resources for more details.

Notations

$L$ : number of liquidity tokens accounted in the CPMM storage.
$X$ : number of xtz tokens accounted in the CPMM storage.
$T$ : number of tzBTC tokens accounted in the CPMM storage.
$\gamma$ is equal to 1 minus the fee (a constant equal to $0.999$ in the CPMM).

We write $L'$, $X'$, and $T'$ for these values after the execution of the entrypoint under focus.

$X$ and $T$ are called the supplies of the CPMM and their product $X \cdot T$ is called the product of supplies.

Verification tools

Our main verification tool is the Mi-Cho-Coq library, which allows Michelson programs to be verified with the Coq proof assistant. Coq is a very expressive system in which essentially any mathematical theorem can be stated and proved. However, Mi-Cho-Coq has some limitations; the most important one in the context of this work is that it cannot currently be used to prove properties about contract interactions.

In those situations, where Mi-Cho-Coq cannot provide a mechanically verified mathematical proof, we gain more confidence in the validity of properties by using QCheck, a framework for property-based random testing. We have extended the test framework developed for the Tezos protocol in two directions: (1) executing arbitrary sequences of Liquidity Baking contract calls on top of arbitrary contexts, and (2) simulating the execution of these contracts.

Both calling the real contracts and simulating them have benefits. Executing the real contracts makes the test results more trustworthy, but has proven to be time-consuming. Simulating the execution addresses this limitation and allows us to execute many more test cases, but the results obtained this way are only valid if the simulation is trustworthy.

To get the best of both worlds, we have validated our simulation against the contracts’ implementation by executing the same sequences of contract calls with both approaches. During this process, we check after each baked block that the simulation state is an abstraction of the blockchain context. Along the way, we have both ironed out bugs in our simulation and improved our confidence in the contract implementation.
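The validation loop described above can be sketched generically in OCaml. This is a minimal sketch under our own assumptions: `validate`, `abstract`, `step_real` and `step_sim` are hypothetical names, a block is modelled as a list of calls, and the real execution and the simulation are passed in as plain functions rather than taken from the Tezos test framework.

```ocaml
(* Hypothetical differential check: run the same blocks of calls on the real
   context and on the simulation, and after each block compare the
   simulated state with an abstraction of the real context. *)
let validate ~abstract ~step_real ~step_sim ~real ~sim blocks =
  let rec go real sim = function
    | [] -> true
    | calls :: rest ->
        (* With [and], both steps start from the states of the previous
           block and receive the same calls. *)
        let real = step_real real calls and sim = step_sim sim calls in
        (* Stop at the first block where the two sides disagree. *)
        abstract real = sim && go real sim rest
  in
  go real sim blocks
```

Stopping at the first mismatching block makes a divergence easy to locate, whether the bug lies in the simulation or in the contracts.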

Correctness of compilation

The CPMM and the FA1.2 liquidity token contract have been developed using the LIGO language and compiled to Michelson scripts. Compilers are complex programs and bugs inside them are commonly found.

Property

The semantics of the Michelson scripts matches the semantics of the source LIGO scripts.

Risk

Most compiler bugs are benign, but in the worst case a bug in the LIGO compiler could introduce a security risk if the semantics of the LIGO scripts were not preserved through compilation.

Verification technique

Using the Mi-Cho-Coq framework and the Coq proof assistant, we formally prove that the Michelson scripts are semantically equivalent to a high-level specification arguably equivalent to the source LIGO scripts. During this verification work, no bug in the LIGO compiler was found.

Resources:

Safety of execution

All the entrypoints of the CPMM, except default, involve divisions, so we must make sure that the script never divides by zero. Indeed, the contracts divide by the number of tokens in the pools (which pool depends on the entrypoint in question).

Property

$L$, $X$, and $T$ should always be strictly positive. This is an obvious consequence of the following property:

$$ 0 < L^2 \le X \cdot T $$

In plain English: the square of the total number of liquidity tokens is always positive, and the product of the numbers of xtz and tzBTC tokens is always greater than or equal to this square.
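The invariant can be stated as an executable predicate. This sketch assumes that $L$, $X$ and $T$ are non-negative OCaml integers (as natural numbers and mutez amounts are in Michelson) and ignores overflow, which a faithful check would avoid with arbitrary-precision integers; `invariant` and `pools_positive` are hypothetical names.

```ocaml
(* Safety invariant: 0 < L^2 <= X * T, for non-negative inputs. *)
let invariant ~lqt ~xtz ~tok =
  0 < lqt * lqt && lqt * lqt <= xtz * tok

(* What the invariant guarantees: every pool is strictly positive, so the
   divisions performed by the entrypoints are well defined. *)
let pools_positive ~lqt ~xtz ~tok = lqt > 0 && xtz > 0 && tok > 0
```

Checking the implication from the first predicate to the second on small values is a cheap sanity test; the Coq proof establishes it in general.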

Risk

If $L$, $X$, or $T$ ever reaches zero, some entrypoints of the CPMM cannot be called anymore so the functionality they provide is lost. In particular, some funds would be frozen and the subsidy would also be minted for nothing.

Verification technique

Using the Mi-Cho-Coq framework and the Coq proof assistant, we formally prove that all the entrypoints preserve this relationship between $L$, $X$, and $T$ and that this property is sufficient to show that the token pool never reaches 0 (hence, the contract never divides by zero and never gets stuck).

The case of removeLiquidity is interesting because it could lead to $L = 0$ in the edge case where all liquidity providers choose to remove all their liquidity. To guarantee that this never happens, a tiny initial liquidity deposit is made on behalf of the null address, so that it can never be removed. This external assumption has been taken into account in the formal proof.

Resources:

Evolution of the product of supplies

Despite its name, the CPMM does not keep the product of supplies constant. Indeed, the default entrypoint increases $X$ with no change to $T$. The fees and the rounding in divisions also prevent the product of supplies from being constant.

In fact, the product of supplies generally increases. This is true for all the entrypoints except, of course, removeLiquidity, which decreases both $X$ and $T$.

In order to take the removeLiquidity case into account, we need to track not only $X$ and $T$ but also $L$. The correct invariant is a property named “Ratio between Product and Squared Liquidity Increases” and it holds for all entrypoints:

$$ { { X \cdot T } \over L^2 } \le { { X' \cdot T' } \over L'^2 } \qquad (RPSLI) $$

This property is important because it guarantees an incentive for providing liquidity: whatever the users of the contract do, liquidity providers are guaranteed that the amounts of tez and tzBTC tokens they will be able to withdraw in the future have a product at least as large as that of the amounts they originally deposited.
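Because both sides of $RPSLI$ are fractions with strictly positive denominators, the property can be checked without any division by cross-multiplying: $X \cdot T \cdot L'^2 \le X' \cdot T' \cdot L^2$. The following OCaml sketch does exactly that; `rpsli` is a hypothetical name, and plain integers are used, so very large values could overflow.

```ocaml
(* RPSLI checked by cross-multiplication. The states are triples
   (X, T, L) before and after a call. This is equivalent to comparing the
   two ratios because L and L1 are strictly positive by the safety
   invariant. *)
let rpsli ~before:(x, t, l) ~after:(x', t', l') =
  x * t * (l' * l') <= x' * t' * (l * l)
```

For instance, removing half of the liquidity from the state (100, 100, 10) yields (50, 50, 5), and the ratio is exactly preserved.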

Property

addLiquidity of $A$ tez increases the product of supplies as follows:
$$ X' \cdot T' = (X + A) \cdot (T + {\left\lceil {{T \cdot A} \over X} \right\rceil}) \ge X \cdot T $$
and $L' = L + { { L \cdot A } \over X }$.
removeLiquidity of $A$ liquidity tokens decreases the product of supplies as follows:
$$ X' \cdot T' = (X - { { X \cdot A } \over L}) \cdot (T - { { T \cdot A } \over L}) \le X \cdot T $$
and $L' = L - A$.
xtzToToken of $A$ tez increases the product of supplies as follows:
$$ X' \cdot T' = (X + A) \cdot (T - {{ \gamma \cdot A \cdot T } \over {X + \gamma \cdot A}}) \ge X \cdot T $$
and $L' = L$.
tokenToXtz of $A$ tzBTC tokens increases the product of supplies as follows:
$$ X' \cdot T' = (X - { { A \cdot \gamma \cdot X } \over { T + A \cdot \gamma } }) \cdot (T + A) \ge X \cdot T $$
and $L' = L$.
tokenToToken of $A$ tzBTC tokens increases the product of supplies as follows:
$$ X' \cdot T' = (X - { { A \cdot \gamma \cdot X } \over { T + A \cdot \gamma } }) \cdot (T + A) \ge X \cdot T $$
and $L' = L$.
default of $A$ tez increases the product of supplies as follows:
$$ X' \cdot T' = (X + A) \cdot T \ge X \cdot T $$
and $L' = L$.
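The liquidity-related formulas above translate directly into integer arithmetic. This OCaml sketch covers addLiquidity, removeLiquidity and default only, under our own assumptions: plain division rounds down, the ceiling in addLiquidity is computed as $(T \cdot A + X - 1) / X$, and deadlines, minimum and maximum amounts, and the actual token transfers are omitted. The names `pools`, `add_liquidity`, `remove_liquidity` and `default_entrypoint` are hypothetical.

```ocaml
(* Hypothetical pool state: X (mutez), T (tzBTC units), L (liquidity). *)
type pools = { x : int; t : int; l : int }

(* Ceiling of a / b for a >= 0 and b > 0. *)
let ceil_div a b = (a + b - 1) / b

(* addLiquidity of [a] mutez: the provider deposits ceil (T * A / X)
   tokens and receives floor (L * A / X) liquidity tokens. *)
let add_liquidity p a =
  { x = p.x + a; t = p.t + ceil_div (p.t * a) p.x; l = p.l + (p.l * a / p.x) }

(* removeLiquidity of [a] liquidity tokens: the provider withdraws
   floor (X * A / L) mutez and floor (T * A / L) tokens. *)
let remove_liquidity p a =
  { x = p.x - (p.x * a / p.l); t = p.t - (p.t * a / p.l); l = p.l - a }

(* default of [a] mutez: a donation that only grows the tez pool. *)
let default_entrypoint p a = { p with x = p.x + a }

let product p = p.x * p.t
```

Rounding the deposited tokens up and the minted and withdrawn amounts down is what makes each rounding error favour the pool rather than the caller.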

Risk

A strict decrease in the product of supplies means that value is taken out of the liquidity pool. The only case where such a decrease is expected is the removeLiquidity entrypoint. A decrease in any other entrypoint means that an attacker has found a way to steal value from the liquidity pool.

Verification technique

Using the Mi-Cho-Coq framework and the Coq proof assistant, we formally prove that each entrypoint satisfies its corresponding property. We also formally prove that each of these properties implies the property $RPSLI$.

Resources:

Consistency between the contracts’ internal state

As explained in the overview, three contracts are involved in Liquidity Baking, and they must have a consistent view of the token distribution. The CPMM storage contains the CPMM’s view of both parts of its liquidity pool ($X$ and $T$) and also of the total supply $L$ of the liquidity token contract. The liquidity token contract also stores this value alongside the token balances of all the liquidity providers. The tzBTC contract stores the tzBTC balances of all tzBTC holders, in particular that of the CPMM.

Properties

$L$ should always be equal to the total supply of the liquidity token contract.
$X$ should always be equal to the tez balance of the CPMM contract.
$T$ should always be smaller or equal to the tzBTC balance of the CPMM in the tzBTC contract.

Note that $T$ could become strictly smaller than the tzBTC balance of the CPMM contract if a tzBTC holder were to call the tzBTC contract directly to transfer tokens to the CPMM (instead of calling the tokenToXtz or addLiquidity entrypoint of the CPMM); in that case, the CPMM contract can neither reject the transfer nor even detect that it happened. Users have no reason to donate tzBTC tokens to the CPMM, but when reasoning about the security of Liquidity Baking we need to consider this edge case.
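The three consistency properties, including the weaker inequality for $T$, can be combined into one predicate over a snapshot taken after each block. This is a sketch only: the `snapshot` record and its field names are hypothetical stand-ins for values read from the ledger and from the three storages, not the contracts’ actual storage layout.

```ocaml
(* Hypothetical snapshot of the values compared after each block. *)
type snapshot = {
  cpmm_lqt_total : int;      (* L, as recorded in the CPMM storage *)
  cpmm_xtz_pool : int;       (* X, as recorded in the CPMM storage *)
  cpmm_token_pool : int;     (* T, as recorded in the CPMM storage *)
  lqt_total_supply : int;    (* total supply in the liquidity token contract *)
  cpmm_tez_balance : int;    (* tez balance of the CPMM in the ledger *)
  cpmm_tzbtc_balance : int;  (* balance of the CPMM in the tzBTC contract *)
}

let consistent s =
  s.cpmm_lqt_total = s.lqt_total_supply
  && s.cpmm_xtz_pool = s.cpmm_tez_balance
  (* Only an inequality here: tzBTC sent directly to the CPMM is
     invisible to it. *)
  && s.cpmm_token_pool <= s.cpmm_tzbtc_balance
```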

Risk

An inconsistency between the states of the three contracts could lead the CPMM to incorrectly compute exchange rates and liquidity payback.

Verification technique

The test framework of the Tezos protocol has been extended to originate three contracts: the CPMM, the FA1.2 contract managing liquidity tokens, and another FA1.2 contract acting as a replacement for tzBTC (the real tzBTC contract being much more complicated to originate than a plain FA1.2 contract).

We generate sequences of calls to the three contracts involved in Liquidity Baking and apply the transactions on top of an initial context provided by the test framework. After each baked block, we check that the CPMM storage remains consistent with (1) the tez balance recorded in the ledger, (2) the storage of the liquidity token FA1.2 contract, and (3) the storage of the tzBTC replacement contract.

Resources:

The implementation of our property-based tests can be found in a dedicated branch in the Nomadic Labs repository of Tezos

No global gain

Assuming an attacker with arbitrary balances of tez, tzBTC, and liquidity tokens, we do not expect the attacker to be able to gain value through their interactions with the CPMM contract alone. Each call to an entrypoint of the CPMM requires a payment in at least one of the assets, but we could imagine, if a vulnerability were present in the CPMM code, a sequence of calls resulting in the attacker increasing some of their balances without decreasing any of them. This is what we call a global gain.

Property

For any sequence of calls from a fixed attacker to the CPMM, the global effect of the sequence of calls on the balances of the attacker in tez, tzBTC, and liquidity tokens is such that at least one of the balances decreases.

Note that we assume the attacker is the only entity interacting with the contract. Indeed, if we take the subsidy into account, a global gain is possible by adding liquidity, waiting a few blocks (during which the subsidy is received by the CPMM and shared among the liquidity providers), and finally removing the liquidity. But this global gain coming from the subsidy is not an attack on the CPMM; it is the purpose of the Liquidity Baking subsidy!

Risk

Even a very small global gain is very dangerous for the security of the contract because an attacker able to exploit a global-gain vulnerability can likely exploit it many times in a row and the total loss for the contract can be dramatic.

Verification technique

We have looked for global-gain attacks using our property-based testing framework. More precisely, the test consists of generating 100 initial contexts and, for each of these contexts, generating and executing a sequence of 100 valid contract calls performed by the same implicit account $c$. Note that the execution is done with a subsidy equal to $0$, because the objective of this test is to discover whether an attacker can become richer by exploiting the CPMM, not by collecting the subsidy as a liquidity provider. After each contract call, the balances of $c$ are checked to assert that either at least one of them (xtz, tzBTC, or lqt) has decreased, or they all remained constant.

QCheck has not been able to find a counter-example to this property.
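The per-call check performed by this test can be written as a small predicate over the attacker’s balances, extended to a whole trace of balances. This is a sketch: the `balances` record, `no_global_gain` and `check_trace` are hypothetical names, not the test code from the commit below.

```ocaml
(* Hypothetical balances of the attacking account c. *)
type balances = { xtz : int; tzbtc : int; lqt : int }

(* After a call, at least one balance has decreased, or none changed. *)
let no_global_gain ~before ~after =
  after.xtz < before.xtz
  || after.tzbtc < before.tzbtc
  || after.lqt < before.lqt
  || after = before

(* Apply the check to every consecutive pair of a trace of balances. *)
let rec check_trace = function
  | b1 :: (b2 :: _ as rest) -> no_global_gain ~before:b1 ~after:b2 && check_trace rest
  | _ -> true
```

In a property-based test, the trace would come from executing randomly generated calls; here it is simply a list supplied by the caller.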

Resources:

The commit introducing the test

Trust base

Our verification and property-based testing efforts have mostly been directed at the liquidity token FA1.2 and CPMM scripts, because these two scripts are new to the Tezos blockchain: the Granada protocol, if activated, will originate them during the migration from Florence. The third contract involved, the tzBTC FA1.2 contract, is however already used on-chain; it has been audited and has been traded on Tezos decentralized exchanges (Dexter and Quipuswap) for months.

The security of Liquidity Baking relies on the assumption that the behavior of the tzBTC contract is consistent with that of the FA1.2 contract we have used as its replacement in our property-based testing context.

We also assume that Mi-Cho-Coq correctly defines the semantics of the Michelson language. If this was not the case, we could have missed compilation bugs.

Conclusion

From the very beginning, we have been aware that Liquidity Baking is a sensitive feature, all the more so because of the mechanisms implemented to encourage traders to use it. This is why we have devoted a significant amount of effort to increasing our confidence in its implementation. The formal verification of the CPMM contract covers key functional properties. The complete architecture (the three contracts, and the changes integrated into the Granada protocol) has been challenged with unit testing and property-based testing. Finally, the proposal comes with two halting mechanisms that further reduce the risk of a malicious actor taking advantage of the CPMM contract, and let the bakers be the final arbiters of Liquidity Baking’s fate.

Overall, we feel confident that all components of the Liquidity Baking feature are sound and secure.

While the Granada election takes its course, we plan to continue our effort to assess the economic properties of Liquidity Baking. These are reminiscent of game-theoretic properties, where actors aim to maximize their gains by responding to the incentives in the contracts. We believe we now have a strong foundation for tackling this additional challenge.

Announcing Granada

2021-05-31T08:00:00+02:00

This is a joint post from Nomadic Labs, Marigold, TQ, Tarides, and DaiLambda.

We were proud to see Florence go live on the chain on 11th May 2021. In keeping with our policy of proposing upgrades on a regularly scheduled basis, we’re happy to announce our latest Tezos protocol proposal, Granada.

(As is usual, Granada’s “true name” is its hash, which is PtGRANADsDU8R9daYKAgWnQYAJ64omN1o3KMGVCykShA97vQbvV).

Granada contains several major improvements to the protocol, as well as numerous bug fixes and minor improvements. Below we discuss some of the most interesting and important changes:

Emmy*: The current Tezos consensus algorithm is Emmy+ and we propose to replace it with Emmy*. As described in this blog post, if Granada is adopted, Emmy* will generally halve the time between blocks, from 60 seconds to 30 seconds, and allow transactions to achieve significantly faster finality than under the current consensus algorithm. (We expect several significant further improvements to our consensus algorithm and reductions in block times in coming proposals.)
Liquidity Baking: The availability of low-slippage exchange of tez into other currencies and vice-versa is key to allow the widespread use of Tezos. Liquidity Baking addresses this directly by piggybacking off the liquidity and global availability of Bitcoin, and incentivizing large amounts of decentralized liquidity provision between tez and wrapped bitcoins.
Gas improvements: A number of substantial improvements to performance have been made, which in turn result in dramatic reductions in gas consumption. We have generally observed a decrease by a factor of three to six in the gas consumed in the execution of already deployed contracts. For some contracts, the improvement has been almost a factor of eight. This reduction in gas consumption, the latest in a series we began with Delphi, will enable developers to deploy richer, more complicated, and more interesting applications on Tezos at reasonable real-world cost.

Granada contains numerous other bug fixes and small improvements, and we encourage you to look at the changelog for a full description of the contents of the proposal.

We strongly encourage you to test your own Tezos-based applications to check for compatibility problems with Granada. Granada, and the configuration for its test network Granadanet, are included in version 9.2 of the Tezos node.

If Granada is adopted, the next proposal (which likely will have a name starting with the letter “H”) should be proposed and enter the Tezos amendment process this summer.

We hope that “H” will introduce a new consensus algorithm that, if adopted, will bring fast finality to Tezos.

Over the course of the coming months, our team also intends to continue to develop and propose amendments to increase performance, lower gas consumption, reduce block times, and increase Tezos’ throughput (as measured, for example, in transactions per second, or smart contract invocations per second).

Simulating Tenderbake

2021-05-21T11:00:00+02:00

If you’re impatient, you are welcome to read the guide and jump right to the simulator now. See also the implemented algorithms below.

If you find that reading about it helps you get excited before using the simulator, then read on …

Background

The consensus algorithm is a crucial part of any blockchain project. Because of the distributed nature of blockchains, different nodes can have different ideas of what the current state of the blockchain is supposed to be. The role of the consensus algorithm is to decide which of these possible states, called forks or branches, will be selected globally. In the world of distributed systems, there are two distinct families of consensus algorithms: Nakamoto-style and classical BFT-style. Most blockchain solutions use Nakamoto-style algorithms, which allow any number of forks of any length to exist but make longer forks increasingly unstable, so that they eventually collapse to a single branch. We say that these algorithms have probabilistic finality. Classical Byzantine fault tolerance (classical BFT) algorithms refer to a class of consensus algorithms which offer deterministic finality over asynchronous networks (see the two classic papers The Byzantine Generals Problem and Practical Byzantine Fault Tolerance for more details). They stipulate definite conditions that must be fulfilled for a block to become final.

Nomadic Labs intends to propose Tenderbake (blog post; paper) — a classical BFT-style algorithm — as the next consensus algorithm of Tezos. Its deterministic finality allows us to make solid claims about the period of time that should pass for a transaction to become final. In Tenderbake, a block becomes final when there are two blocks on top of it; this is the only condition. So, if the system produces one block per 15 seconds, a transaction will become final in about 30 seconds. This is the kind of performance that users expect from a successful blockchain solution.
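The finality rule and the arithmetic behind the 30-second figure fit in two lines of OCaml. This is an illustration only, assuming that block levels are consecutive integers and that a transaction is included in the next block; `is_final` and `time_to_finality` are hypothetical names, not part of the Tezos codebase.

```ocaml
(* A block is final once the head is at least two levels above it. *)
let is_final ~block_level ~head_level = head_level - block_level >= 2

(* Seconds until a freshly included transaction is final: two more blocks
   must be produced on top of the block that contains it. *)
let time_to_finality ~block_time = 2 * block_time
```

With a block time of 15 seconds, `time_to_finality ~block_time:15` gives the 30 seconds mentioned above.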

Historically, Tenderbake started as a variant of Tendermint and was subsequently adapted to fit into the existing Tezos system. However, the description in the Tenderbake paper and its nascent implementation in the Tezos codebase started to drift apart. Thus, for a person starting to work on Tenderbake, reading the paper is not enough. It is also not ideal to study the algorithm by reading the Tezos code, since the consensus algorithm is by design tightly integrated with other components of the system.

This is where the Tenderbake simulator project steps in.

The simulation framework

Nomadic Labs asked Tweag to develop a framework that would be general enough to model any consensus algorithm (be it BFT-style, Nakamoto-style, or in styles yet to be invented) in a clear way, to facilitate the onboarding of newcomers and the exploration of consensus algorithms in the future. The results of our work can be found in this repository. The language of choice at Nomadic Labs is OCaml and it made sense to use it for this project. The simulator is distributed under the MIT license, like the Tezos codebase.

Principles of operation

Here’s a (simplified) description that should give the reader a flavour of the simulation framework. The framework comes with a guide that explains in detail how the simulation framework works and how to implement a consensus algorithm in it.

The framework allows us to observe the evolution of a system that comprises a collection of nodes — independent processes that do not share memory but can exchange messages by means of asynchronous broadcast that can be fine-tuned by the user if desired.

Consensus algorithms in the framework are implemented as event handlers — functions that are called when an event occurs at a particular node. A call of an event handler is called an iteration. Event handlers have the following type signature:

type event_handler =
  Algorithm.params ->
  Time.t ->
  Event.t ->
  Signature.private_key ->
  Algorithm.node_state ->
  Effect.t list * Algorithm.node_state

Let’s go over the arguments of the function:

Algorithm.params are the parameters of the algorithm, such as the round duration in seconds.
Time.t is the current time.
Event.t is the event the node needs to react to. Currently there are two kinds of events: reception of a message and a “wake up” call that the node can schedule for itself. Message types are defined per consensus algorithm.
Signature.private_key is the private key that every node magically knows. It is used for signing messages. This is important because the framework allows us to program and use Byzantine versions of nodes, too.
Algorithm.node_state is the node state. The type is defined per consensus algorithm.

The return type of an event_handler is an effect list and an updated node state. An effect in the list can be one of the following:

Broadcast a message to all nodes.
Schedule a wake up call.
Shut down.
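To show the shape of an event handler without the framework, here is a toy, self-contained imitation for a trivial “ping” algorithm. Every type below (parameters, events, messages, effects, node state) is a simplified stand-in that we define ourselves, not one of the simulator’s actual modules; in particular, `action` plays the role of `Effect.t`, and time is a plain float.

```ocaml
type params = { round_duration : float }  (* seconds *)
type message = Ping of int                 (* algorithm-specific messages *)
type event = Receive of message | Wake_up
type action = Broadcast of message | Schedule_wake_up of float | Shut_down
type private_key = string                  (* unused placeholder *)
type node_state = { seen : int; max_seen : int }

(* One iteration: react to an event, return effects and the new state. *)
let event_handler (p : params) (now : float) (ev : event) (_key : private_key)
    (st : node_state) : action list * node_state =
  match ev with
  | Wake_up ->
      (* New round: announce a counter and sleep for one round. *)
      ( [ Broadcast (Ping (st.max_seen + 1)); Schedule_wake_up (now +. p.round_duration) ],
        st )
  | Receive (Ping n) ->
      let st = { seen = st.seen + 1; max_seen = max st.max_seen n } in
      (* Stop after ten messages, just to exercise the Shut down effect. *)
      if st.seen >= 10 then ([ Shut_down ], st) else ([], st)
```

Because the handler is a pure function from inputs to effects and a new state, it can be called directly in a test, which is what makes scripted scenarios cheap to write.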

Testing

We have written two kinds of tests: stress tests and scenario tests.

Stress tests are about letting an algorithm run for a number of iterations with a large enough network and realistic propagation of messages, including messages getting lost and messages arriving out of order. The framework allows us to specify a set of predicates that must hold at each iteration in such a test. This way we can determine if an algorithm satisfies liveness and safety properties (see also the Liveness and Safety Wikipedia articles). According to the tests, all models that we have written satisfy both Liveness and Safety.

Scenario tests are about adjusting propagation of messages and/or lifetime of nodes in order to model a situation of interest. We can then inspect execution logs and check whether the nodes behaved in the expected way. It is easy to do because simulations return typed logs that we can pattern match on.
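Both testing styles can be sketched over a hypothetical typed log. In this OCaml sketch, the `entry` type, `safe` and `decided_by` are our own illustrative names, not the framework’s API: `safe` is a stress-test style predicate (no two decisions at the same level disagree), and `decided_by` is a scenario-style query that pattern matches on log entries.

```ocaml
type entry =
  | Decided of { node : int; level : int; value : string }
  | Message_lost of int * int  (* source node, destination node *)

(* Safety predicate: all decisions taken at the same level agree. *)
let safe (log : entry list) =
  let decisions =
    List.filter_map
      (function Decided { level; value; _ } -> Some (level, value) | Message_lost _ -> None)
      log
  in
  List.for_all
    (fun (l1, v1) -> List.for_all (fun (l2, v2) -> l1 <> l2 || v1 = v2) decisions)
    decisions

(* Scenario query: did node [n] decide anything at all? *)
let decided_by n (log : entry list) =
  List.exists (function Decided { node; _ } -> node = n | Message_lost _ -> false) log
```

The quadratic comparison keeps the predicate obviously correct; a real stress test over long runs would rather index decisions by level.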

Implemented algorithms

We have implemented four consensus algorithms (listed below roughly in order of increasing complexity), and applied stress tests and scenario tests as discussed above:

Leader election, see src/leader_election.
Ouroboros (the simple BFT version), see src/ouroboros.
Emmy⁺, see src/emmy_plus; this is the current consensus algorithm used by Tezos.
Tenderbake, see src/tenderbake; this is the algorithm Nomadic Labs is planning to propose as a future amendment to the Tezos blockchain.

Every algorithm is explained in its README.md. Our focus was not only explaining how a particular algorithm works in principle, but also how it translates to code that the simulator framework can run.

Conclusion

The simulator has already proven useful. People who have tried it out report that it has helped them understand the Tenderbake algorithm and experiment with it. In the future, we can expect new consensus algorithms to be implemented and explored using this framework. Please feel free to give it a try and contribute!

Five questions to Nomadic Labs PhDs — Colin González

2021-05-12T10:00:00+02:00

At Nomadic Labs we are proud to create next-gen software … but we are even prouder to help create the next generation of software scientists!

In this blog post, we will ask five questions of one of our students, Colin González (and a couple of questions of his supervisor at Nomadic Labs).

Over to you Colin … no the mic is over here … that’s right, you’re on air!

Questions for Colin

1. Please present yourself and your academic background

Currently I’m a Ph.D. candidate in Computer Science at the Université de Paris and Nomadic Labs.¹ From 2011 to 2014 I did a BSc in Computer Science at the Université Paris Diderot, now Université de Paris. From 2014 to 2015 I took a year to be a Service Civique volunteer, serving in an NGO called Afev which focuses on non-academic education for disadvantaged children. From 2015 to 2017 I did an MSc in Algorithms and Software Engineering with a focus on Programming Languages, at the Université Diderot. During the second year of the MSc, I got a six-month internship with Yann Regis-Gianas at Université Diderot’s Computer Sciences Lab, on the semantics and implementation of a collaborative spreadsheet system. From 2018 to 2019 I worked as a critical software engineer for the Université Diderot Physics Department. There I developed my interests and skills in critical systems, using formal methods to develop embedded software for satellites. In 2019 I joined Nomadic Labs to start my Ph.D.

2. Tell us more about the topic of your PhD, who is your mentor, research objectives

The topic of my Ph.D. is building tools to develop smart contracts. My main mentor is Yann Regis-Gianas at Nomadic Labs, and I also work with Benjamin Canou from NL and with researchers from IRIF, a laboratory of the Université de Paris.

One challenge is to make these tools comfortable for end-users unfamiliar with existing smart contract programming languages like Michelson or Ligo. In particular, we see a potential user base in the spreadsheet user community.

The obvious question arises: could we use spreadsheets to design and to build safe and efficient smart contracts? Safety is critical for smart contracts, because they inherently tend to be used in safety-critical applications, and also because by design, their code cannot be modified once they are launched. We intend to use certified compilation and formal methods to make a trustworthy tool.

With respect to efficiency, we intend to use the results of the ESOP’19 publication, Incremental Lambda-Calculus in Cache Transfer Style by Paolo G. Giarrusso, Yann Regis-Gianas and their coauthors (the incremental lambda-calculus webpage includes the paper and supplementary material). The techniques of the paper could be used to automatically incrementalize smart contracts, improving their efficiency and shortening execution time.

3. What is the value added and innovative side of your PhD thesis (if applicable, in the Tezos blockchain)?

Currently, smart contracts are written in text-based programming languages. People who know programming know how to do this, but what about non-programmers?

Hopefully, a spreadsheet-based tool could make the creation of safe smart contracts more accessible to end users. This could allow the large community of users who are already familiar with spreadsheets to launch smart contracts without needing expertise in a ‘traditional’ programming language.

4. What is the benefit of preparing a PhD and spending most of your time in a private company vs. a University lab?

Both environments are full of brilliant and highly-skilled people. This is key for a young and curious Ph.D. candidate. Many engineers at NL have a Ph.D. diploma, or at least they are very keen on research-oriented topics. This culture of research is ideal to bridge between fundamental sciences and industrial applications.

Transferring scientific breakthroughs to industrial applications is an important challenge. In the context of my Ph.D. at NL, I look forward to developing scientific understanding in consideration of industrial needs. In particular, we expect the results of my Ph.D. to be immediately applicable to implementing Tezos.

5. Why did you choose to become a part of the blockchain ecosystem and Nomadic Labs?

I enjoy programming and I enjoy trying out new ideas and seeing them be implemented in a running system! I guess I’m a child of my generation.

Furthermore, direct experimentation is my preferred setup. Often, scientists purposely abstract away technical details. This makes research easier, but then integrating the results back in the real world can be complicated because many technical details remain unclear.

NL embodies the integration of modern computer science along with the best technologies: a philosophy which I find inspiring and want to be a part of.

Questions for Colin’s Nomadic Labs mentor Yann Regis-Gianas

What is your input in the thesis of Colin?

As a mentor, I try to provide two kinds of input: methodological advice and scientific expertise.

Doing a Ph.D. is learning how to solve complex problems that have not already been solved. This is a new setting for students, who are usually required to work on problems with known solutions. Thus, a Ph.D. student must learn to explore a world of uncertainty, based on a rigorous analysis of the problem in question, and must acquire a deep understanding of the state of the art in the domain.

That’s indeed the second kind of input I provide: with my experience in formal methods and programming languages, I can synthesise the state of our knowledge, typically by having my student Colin read the relevant research papers and interact with the scientific community in the field.

Why is this thesis important?

Believe it or not, spreadsheets are the most widely-used programming language!

In VLHCC’05, Christopher Scaffidi and his coauthors published Estimating the number of end users and user programmers, in which they projected that there would be about 55 million spreadsheet users in the United States of America in 2012, compared to just 13 million professional programmers. Many people with no background in programming routinely define sophisticated computations using Microsoft Excel. Many people use spreadsheets to maintain accounts, or even to automate complex accounting processes.

So: what if these spreadsheets could be turned into smart contracts and deployed on the Tezos blockchain?

Not only would this make programming smart contracts easier, but also spreadsheets are a good way to visualize the state of the contract once it is deployed on the chain: such a tool could help democratize blockchains. Perhaps one day we might see people including something like this in their daily workflow!

If spreadsheets are to be used as smart contracts, then we need to make sure their definitions are correct. Unfortunately, spreadsheets can be error-prone, and famous errors in spreadsheets have turned into losses of millions of dollars. Colin’s Ph.D. will provide reasoning tools to help in the simulation and verification of smart contracts, giving mathematically-backed guarantees of correctness before contracts are deployed on the chain (something that should be standard for any smart contract, in fact!).

With his new vision about what structured spreadsheet programming is, Colin might even have an impact on the quality of the spreadsheets written by the spreadsheet industry of the future.

My research affiliation is: Université de Paris, IRIF, INRIA, Équipe-Project π.r2. But as a student, my enrolment is at the Université. ↩

Faster finality with Emmy*

2021-05-03T12:00:00+02:00

We are happy to announce that Emmy* is set to be included in the next Tezos protocol proposal Granada,¹ replacing the current consensus algorithm Emmy⁺. If Granada is adopted, Emmy* will generally halve the time between blocks, from 60 seconds to 30 seconds, and allow transactions to achieve significantly faster finality than under Emmy⁺.

Specifically, Emmy* updates Emmy⁺ by:

a tweak in the definition of the minimal delay function, and
an increase in the number of endorsement slots per block

both of which bring faster times to finality (in other words: shorter transaction settlement times) without compromising security.³ Please see the TZIP for the specification and issue 1027 for the implementation.

Thanks to these changes in Emmy*:

a block can be produced with a delay of 30 seconds with respect to the previous block if it has priority 0 and at least 60 percent of the total endorsing power per block;
the number of endorsement slots per block is increased from 32 to 256.
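As a rough illustration, the fast-path condition in the first item above can be written as a predicate. This is a minimal OCaml sketch, not protocol code. The names fast_path_delay and endorsement_slots are hypothetical, and endorsing power is assumed to be counted in endorsement slots. The slower delay rule that applies when the condition fails is not modelled.

```ocaml
(* Sketch of the Emmy* fast-path rule described above; not protocol code.
   Assumption: 256 endorsement slots per block, and endorsing power is
   counted in slots. *)
let endorsement_slots = 256

(* The fast-path minimal delay, in seconds. *)
let fast_path_delay_s = 30

(* [Some 30] when the block has priority 0 and at least 60% of the
   endorsing power; [None] otherwise, in which case the ordinary, slower
   minimal-delay rules apply (they are not modelled here). *)
let fast_path_delay ~priority ~endorsing_power =
  (* at least 60%, checked in integer arithmetic: 5 * power >= 3 * slots *)
  if priority = 0 && 5 * endorsing_power >= 3 * endorsement_slots then
    Some fast_path_delay_s
  else None
```

For example, 154 of 256 slots (about 60.2%) qualifies, but 153 (about 59.8%) does not. Comparing 5 * p with 3 * slots keeps the test in integer arithmetic and avoids floating-point rounding.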

With these changes, on a healthy⁴ chain and for a Byzantine attacker with up to 33% stake, for instance, the number of confirmations² decreases from 6 blocks (6 minutes) to 2 blocks (1 minute): a sixfold improvement.

The following plot shows the number of confirmations (in log scale) for Emmy* vs Emmy⁺ when varying the stake fraction from 0.1 to 0.4. This plot assumes the “forks started in the past” scenario, meaning that we are interested in the finality of a block which already has a number of confirmations on top of it (and therefore, importantly, we know how healthy the chain was in the meanwhile), and we ask ourselves whether this number is sufficient. Here we assume a perfectly healthy chain. In the plot, the highest red point corresponds to 12 confirmations.

To complement the above plot, the following table presents a subset of the data in text form. Each value in the table gives the number of confirmations for a given attacker stake fraction for Emmy* and Emmy⁺ in a “forks started in the past” scenario.

Attacker stake fraction	Emmy* (confirmations)	Emmy⁺ (confirmations)
0.1	2	2
0.15	2	3
0.2	2	3
0.25	2	4
0.3	2	5
0.33	2	6
0.35	3	7
0.4	3	12

So the final line above expresses that even if an attacker controls 40% of the stake, in Emmy* we only have to wait three blocks (90 seconds, assuming a good network, because the block time is 30 seconds) to be reasonably sure that our transaction is final, and in Emmy⁺ we have to wait twelve blocks (12 minutes, because the block time is 60 seconds).
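Each waiting time quoted above is the number of confirmations multiplied by the nominal block time. The small OCaml sketch below applies that arithmetic to the rows of the table. It assumes a healthy chain, with 30-second blocks under Emmy* and 60-second blocks under Emmy⁺. The function name waiting_times is illustrative.

```ocaml
(* Rows of the table above: attacker stake fraction, then the number of
   confirmations under Emmy* and under Emmy+. *)
let confirmations =
  [ (0.10, 2, 2); (0.15, 2, 3); (0.20, 2, 3); (0.25, 2, 4);
    (0.30, 2, 5); (0.33, 2, 6); (0.35, 3, 7); (0.40, 3, 12) ]

(* Nominal block times on a healthy chain, in seconds. *)
let emmy_star_block_time = 30
let emmy_plus_block_time = 60

(* Waiting times in seconds, as a pair for Emmy* and Emmy+, for a stake
   fraction that appears in the table; [None] otherwise. *)
let waiting_times stake =
  List.find_opt (fun (s, _, _) -> s = stake) confirmations
  |> Option.map (fun (_, star, plus) ->
         (star * emmy_star_block_time, plus * emmy_plus_block_time))
```

With these definitions, waiting_times 0.40 evaluates to Some (90, 720), that is, 90 seconds versus 12 minutes. waiting_times 0.33 evaluates to Some (60, 360), which is the sixfold improvement mentioned earlier.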

Transaction finality in tezos-indexer

We are also happy to announce that tezos-indexer has a new feature: now it can tell you if your transaction is final in Emmy⁺.⁶ For instance: suppose your transaction is included in a block and that the next three blocks get baked in 190 seconds; then the indexer will tell you that your transaction is final with respect to:

the security threshold in Emmy⁺, which is 1e-8,⁵ and
an adversary with at most 20% stake.

Note that 190 is just a bit more than 3 * 60 seconds, which is the minimal time needed to bake three blocks.

Now instead suppose that the three blocks get baked in more than 227 seconds: then you will need to check again after one more block/confirmation. Say the fourth block is baked in 60 seconds, so given that 4 blocks have been baked in about 287 seconds, the indexer will tell you that now you can consider your transaction as final.
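The indexer's decision can be pictured as a lookup. For a fixed attacker stake and security threshold, each number of confirmations comes with a maximum elapsed time within which those blocks must have been baked. The OCaml sketch below is hypothetical and is not the tezos-indexer code. Only the 227-second bound for three confirmations comes from the example above; any other entries would have to come from the fork-probability analysis.

```ocaml
(* Hypothetical finality check in the spirit of the indexer feature.
   [thresholds] maps a number of confirmations to the longest time, in
   seconds, within which those blocks must have been baked for the block
   to count as final, for a fixed stake fraction and security threshold. *)
let is_final ~thresholds ~confirmations ~elapsed =
  match List.assoc_opt confirmations thresholds with
  | Some max_elapsed -> elapsed <= max_elapsed
  | None -> false (* no bound known: wait for another block, then re-check *)

(* Only the 3-confirmation entry is taken from the example in the text
   (20% stake, threshold 1e-8); other entries are not given there. *)
let example_thresholds = [ (3, 227) ]
```

Here, is_final ~thresholds:example_thresholds ~confirmations:3 ~elapsed:190 is true. With ~elapsed:230 it is false, which means waiting for one more block and checking again.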

We are proud to share a web demo for you to experiment for yourself with computing fork probabilities in both Emmy⁺ and Emmy*. It is based on our previous analysis of Emmy. If you want to dig deeper and are familiar with OCaml and Jupyter notebooks, you are welcome to follow these instructions and let your creativity run free.

Please note our previous disclaimer: Our analysis is based on a certain model of the blockchain and of an attacker. In particular, the model assumes reasonable bounds on the communication delays: that blocks and operations are diffused within 30 seconds. The attacker model is that of a baker not following the rules of the protocol: it may double bake and double endorse as it finds best in order to fork the chain. The attacker model of our analysis does not include a more powerful attacker capable of disturbing the network by blocking or delaying messages between honest participants.

Happy baking!

The Tezos mainnet is currently running the Edo protocol. The Florence protocol is currently being voted on. Assuming all goes well and the Florence upgrade is accepted, the Granada proposal will follow, and Emmy* should be part of that. Subject to this all going well — we add this qualification because this is not something that Nomadic Labs can fully control; the voting process is up to you, our dear Tezos community! — then a further upgrade from Emmy* to Tenderbake is planned for a subsequent H proposal (name not yet fixed). ↩
A confirmation refers to a block on top of the block with a transaction of interest to the user. ↩
Emmy* is a Nakamoto-style algorithm (like Bitcoin), which means that its finality is probabilistic. Probabilistic finality means that as time passes we can become “reasonably sure” that a transaction we made is indeed included in the final blockchain (and not on some fork that might die out). Here, reasonably sure means “with probability of being wrong smaller than some reasonable threshold”, which we quantify as 5 * 1e-9 (five in a billion). With one block every 30 seconds, two centuries correspond to about 2.1 * 1e8 blocks, so this puts our expectation of being wrong about a block being final at roughly once every two centuries. ↩
We call a chain healthy over a period of time when in this period blocks have priority 0 and (almost) all endorsement slots are filled. A concrete healthiness measure is the delay of the chain with respect to the ideal chain where each block has a delay of one minute with respect to the previous block. ↩
In Emmy⁺, blocks are baked about every 60 seconds, so two centuries correspond to about 1.05 * 1e8 blocks. The threshold 1e-8 therefore puts our expectation of being wrong about a block being final at the same rate of roughly once every two centuries, which corresponds to the security threshold of 5 * 1e-9 in Emmy*. That is: we use a stricter security threshold in Emmy* because it is faster, so more blocks are baked every two centuries. ↩
This new feature of tezos-indexer will be updated with respect to Emmy* once Emmy* is adopted. ↩

Announcing the report “Possible evolutions of the voting system in Tezos”

2021-04-14T14:00:00+02:00

Nomadic Labs has an ongoing research relationship with INRIA (a French national technology research agency).

In the context of this relationship, Nomadic Labs commissioned a short report to explore what a privacy-preserving amendment procedure might look like on Tezos, authored by three experts in voting protocols and cryptography: Véronique Cortier, Pierrick Gaudry and Stéphane Glondu.

There is no plan to implement the contents of the report for now, but we welcome and encourage feedback on its findings: from regular Tezos users, and from research and industry experts.

The report is here and we encourage you to leave comments in this agora post. Thank you.

Meanwhile at Nomadic Labs #11

2021-04-09T10:00:00+02:00

Welcome to our meanwhile series, the ongoing story of Nomadic Labs’ amazing adventures in the Tezos blockchain space. This post is a recap of our activities in the first quarter of 2021, following on from our 2020 recap. As always, you can find out more about us here:

Twitter @LabosNomades ~ Website ~ LinkedIn ~ Technical blog ~ GitLab repo

So here’s what we’ve been up to these past three months:

Edo protocol upgrade and Florence protocol proposal
Easier, safer, better installation of the Tezos codebase
Culture and growth
Announcing 1,000 days of Tezos mainnet
Adoption and support
Dexter
Umami
Forthcoming switch from Emmy⁺ to Emmy^★ and then Tenderbake
NL research seminars
Research and sponsorship
À la prochaine

Edo protocol upgrade and Florence protocol proposal

The Tezos blockchain contains a self-amendment mechanism to upgrade the protocol by community vote, meaning: no need for hard forks, and in-built agility and ability to adopt new ideas in the fast-evolving blockchain space. We intend to follow a regular protocol upgrade schedule.

The recent Edo protocol upgrade on 13 February 2021 went smoothly (block height 1,398,551; cycle 341; changelog).¹ Edo introduced several substantive new features to Tezos:

Zcash Sapling integration (privacy-preserving transactions; see doc and an accessible explanation).
Tickets (mechanism for smart contracts to authenticate data with respect to a Tezos address).
An extended Voting procedure (adds a two-week “Adoption” phase).
… and more. See also a detailed analysis of the Edo upgrade.

The Edo protocol upgrade also comes with a new major release of tezos-node. This release updates the protocol environment² to a version numbered “Version 1”. This is significant because until now, all protocols have used “Version 0”.

Together with Marigold, Tarides, DaiLambda, and Keefer Taylor, we proposed a successor protocol upgrade, Florence.

One standout feature of Florence is a new arithmetic system (based on saturation arithmetic) for computing gas costs in Michelson smart contracts. Our benchmarks indicate a tenfold speedup of gas computation, and an overall 35% speedup of the execution cycle of the Michelson interpreter in Florence.³ So smart contracts in Florence can be smarter, for the same gas.⁴

Another notable feature is the migration from breadth-first (BFS) to depth-first (DFS) execution order for inter-contract calls.

The main developer of this feature is ChiaChi Tsai from Marigold. Nomadic Labs contributed to the design of the BFS to DFS migration (especially its security considerations), reviewed several versions of the code, and performed a major replay test.⁵

Easier, safer, better installation of the Tezos codebase

The “How to get Tezos” installation instructions are now checked automatically as part of the build process itself. They are tested and included in the documentation automatically. As a result, they should stay correct and up to date, even though they depend on an extensive environment in constant evolution, due to

development of Tezos itself (e.g. switching to new protocols, releasing new versions of the platform, adding dependencies), and
evolution of third-party software packages (e.g. the OPAM package manager for OCaml, individual OCaml packages, Linux releases, and so forth).

Technically, this was accomplished by writing a library of executable scripts for each installation scenario (binaries, docker images, compiling from source, …) which are executed in the CI (continuous integration) process itself, and automatically copied over each time documentation is generated and published.⁶

This new system means

early and automatic detection and quick correction of installation problems, and
no more copy/paste errors or stale install scripts in documentation

— thus making installation more reliable and convenient for all Tezos users, from neophytes to experienced developers.

In summary: the Tezos installation scripts and relevant documentation are now part of the testable codebase of Tezos itself.

Culture and growth

Since January 2021 we have been delighted to welcome three new hires and three interns, bringing our count of full-time employees to 62.

Announcing 1,000 days of Tezos mainnet

As of Friday 26 March 2021, the Tezos mainnet is 1,000 days old. See this subtle poster on Reddit:

1000 Days of Tezos Mainnet from r/tezos

Specifically, the chain was launched on June 30 2018, as per the timestamp of the genesis Tezos block.⁷

(The main Tezos network wasn’t called “Tezos Mainnet” at the time but “Tezos Betanet”. It was renamed on September 17, once devs were confident in its stability. Following the precedent of a famous singer, we might say that “The blockchain formerly known as Tezos Betanet” is now 1,000 days old.)

You can check the arithmetic on a Unix system with GNU date by running

$ date -d "June 30 2018 + 1000 days"
Fri 26 Mar 00:00:00 GMT 2021

Adoption and support

Nomadic Labs has joined Infrachain, a Luxembourg-based European cross-industry effort to pool blockchain expertise and promote regulatory-compliant adoption. See the announcement (in English) and a short article in Paperjam (in French).
We are pleased to welcome two new institutional bakers on the Tezos chain:
- Wakam (a digital insurance and bespoke insurance creator) is now a Tezos baker (see the press release; also as a pdf).
- The Blockchain Group (a blockchain service and technology group), through its subsidiary The Blockchain Xdev, is now a Tezos baker (see the press release).
Nomadic Labs has helped to deploy Lugh (currency name EURL).⁸ Lugh is a Euro-backed digital asset, meaning that 1 EURL is always equal to 1€ (see press release). If you hold 1 EURL, this corresponds to 1€ in the balance of a certain bank account at Société Générale. Lugh has been reviewed by a “big four” firm (PricewaterhouseCoopers). See also this short video introducing Lugh, and the Lugh LinkedIn page.

Dexter

Dexter is a smart contract to enable trade between tez and any FA1.2-compliant token (dapp; doc; tutorials). Think: “Uniswap for Tezos”.

Examples of smart contract implementations satisfying the FA1.2 interface include:

The ETH-wrapped token ETHtz (ETHtz homepage)
The USD-pegged stablecoin USDtz (USDtz homepage)
The Wrapped Bitcoin tzBTC (tzBTC homepage)

Thus using Dexter, a user can exchange tez for ETHtz, USDtz, and tzBTC — all on the Tezos blockchain.⁹

Nomadic Labs verified that the Dexter implementation conforms to the Dexter specification — this is called verifying the functional specification — using the Coq proof assistant with the Mi-Cho-Coq framework. This was in addition to a security audit by Trail of Bits and property-based testing by camlCase.

However, we discovered in late February while working on the Florence upgrade proposal that the Dexter specification itself contained an error.

Just because something is a specification does not make it right: specifications can have mistakes just like anything else (indeed we commented on this in the initial blog post). This is why responsible ecosystems like Tezos subject complex systems like Dexter to multiple overlapping verification efforts.¹⁰ So this is just another day in The Real World.

We reported the error and a “White Knight” operation was used to remove funds from the contract — using the error itself, by the way — and to return funds to their owners.

A rewrite of Dexter is under development in a collaboration between Nomadic Labs and LIGO (see this GitLab repo). We at Nomadic Labs are carrying out a formal verification of it. So far, we’ve finished the verification of a functional specification, which brings confidence up to the level of the previous version of Dexter. Obviously we should go further, and the way to do this is to verify high-level properties of the new specification, to check that no further errors exist like the one discovered in the previous version.

In the meantime, and to avoid delays for the existing users of Dexter, the Dexter contract has been patched (patch source code) to fix the flaw, and brought back online.

Umami

Our Wallet team has been diligently at work on the new Umami wallet. Umami is a cryptocurrency wallet for Tezos giving both beginner users and power users convenient access to all features available within the Tezos protocol including: multiple accounts, tokens, batch transactions, and delegation — and eventually, though not in the initial beta: contracts and use of hardware wallets.

From April 20th you will be able to download Umami beta here. See also the Umami GitLab repo.

Forthcoming switch from Emmy⁺ to Emmy^★ and then Tenderbake

We plan to upgrade our consensus algorithm from the current Emmy⁺ to Emmy^★, and shortly thereafter to Tenderbake.

Emmy⁺ is a Nakamoto-style consensus algorithm. This means that it is similar to the Bitcoin consensus algorithm, just adapted to Tezos’ proof-of-stake system rather than the proof-of-work system used by Bitcoin. Emmy^★ refines the Emmy⁺ system in two ways (listed in decreasing order of significance):

Emmy^★ provides a special fast consensus path for when the network is operating normally, which is most of the time. This makes Emmy^★ faster than Emmy⁺ when things are going well — and no slower when they’re not.
Emmy^★ increases the number of endorsement slots from 32 to 256. This is a technical tweak designed to increase stability and participation.

How does Emmy^★ fare at scale, in a test that approximates a real-life system? This is what resilience testing is about. TQTezos, with support from Nomadic Labs, has built a resilience test network framework based on Kubernetes that lets us quickly deploy a large testnet on AWS. It takes about 30 minutes to deploy a 400-node private network and start baking on it.¹¹

After the upgrade to Emmy^★ there will follow an upgrade to Tenderbake. Tenderbake belongs to the classical BFT-style family, so the switch from the Nakamoto-style to the classical BFT-style is a big deal. We discuss this in detail in a blog post looking ahead to Tenderbake. In summary:

Tenderbake offers deterministic finality, with finality time of one minute assuming good network behaviour. This means that (assuming the network is stable and not under attack) a block becomes final in one minute (i.e. the transaction settlement time is one minute).
Tenderbake has no forks. If the network degrades, the Tenderbake consensus algorithm waits until connectivity is restored.¹²

NL research seminars

Our series of Nomadic Labs research seminars has continued at a regular pace:

Practical proofs using Juvix (27 October 2020)
Verifying smart contracts using Mi-Cho-Coq (10 November 2020)
Efficient data storage on the blockchain using Plebeia (24 November 2020)
Adding multicore programming to OCaml (08 December 2020)
zkChannels in Tezos (05 January 2021)
Towards mechanised verification of the LIGO compiler (20 January 2021)
SmartPy: the inner workings (02 February 2021)
Bringing K Powered Blockchain Security to Tezos (16 February 2021)
Implementing Checker, a Robocoin Mechanism for Tezos (02 March 2021)
High-level smart contract design & verification with Archetype (16 March 2021)

Research and sponsorship

The Tezos Foundation was a platinum sponsor of POPL 2021 in January 2021 (and of POPL 2020 and POPL 2019), and Nomadic Labs sponsored the colocated CPP 2021 (Conference on Certified Programs and Proofs) (and CPP 2020). Our engineers had a strong presence at these events, including:

On Monday 18 Jan, Arvid Jakobsson presented the formalisation of the Dexter decentralised exchange contract in the Mi-Cho-Coq framework at the CPP 2021 Lightning Talks session¹³.
On Wednesday 20 Jan, Michel Mauny, Nomadic Labs’ CEO, made a short presentation of our company at POPL’s Sponsor Reception, which also featured a contributed video presenting a general overview of Tezos by Arthur Breitman.
On Friday 22 Jan, Germán Delbianco presented new developments in the algebraic foundations of Concurrent Separation logics at POPL¹⁴.

You can read our full POPL 2021 retrospective here.

À la prochaine

So that’s it! The first three months of 2021 have been eventful and productive, and the next three months surely will be too. Thanks for reading, and do check in again next quarter for the next Meanwhile.


The previous protocol upgrade was Delphi on 12 November 2020 (block height 1,212,417; cycle 296; changelog; significance of the upgrade). ↩
The protocol environment is a library of cryptographic primitives and other useful functions (packaged as an OCaml module). ↩
The term execution cycle in the context of gas costing is the gas cost of decoding instructions, dispatching them, and of gas monitoring itself. So ‘execution cycle’ does not include the cost of actually executing instructions; it is the cost of arranging for instructions to execute … ↩
… thus, a theoretical smart contract in Florence, consisting of instructions that do zero work and cost zero gas, can do this nothing up to 35% faster (and for $1/1.35$ of the gas) than it would in Edo! ↩
A replay test is when chain history is replayed using the newer protocol, to check for breaking changes and to check that the functionality of any existing live smart contracts is not affected. See also the blog post; search for “Replays of on-chain history”. ↩
Contrast this with paper-only scripts, which exist in the documentation but must be manually tested and copy/pasted. ↩
The genesis block of a blockchain is its first block. It is numbered 0 (not 1) because programmers and computer scientists start counting at zero: thus zero is the first number, one is the second, and so on. ↩
Lugh is an Irish deity associated with craftsmanship and skill, roughly corresponding to Mercury in Roman mythology and Apollo in Greek mythology. ↩
Here’s how it works: tez is the primitive token of the Tezos blockchain. Tezos has no other primitive tokens. However, Tezos does have smart contracts, which gives huge flexibility, and in particular ETHtz, USDtz, and tzBTC are smart contracts on the Tezos blockchain which implement ledgers giving the effect of what we will call non-primitive tokens in this footnote — e.g. they include ledger-like entrypoints such as %transfer, %getBalance, and %getTotalSupply.

The FA1.2 standard is a precise specification that smart contracts intending to implement non-primitive tokens can satisfy. Dexter is then a smart contract which implements functionality to convert between tez and any non-primitive token, provided that the smart contract implementing the non-primitive token satisfies the FA1.2 specification. ↩
This means that yes, sometimes more work is required so that features ship later than we hoped. That’s not a bug, that’s a feature. And it’s not unique to blockchain. This is what it’s like to work with any complex system, be it a bank, an aircraft design, or getting your child into bed. ↩
Think: “easy test flights, for software”. You can fire up a large test system and watch it run as you tweak parameters, play with network properties, and so check the real-world conditions in which the system performs well as an engineering structure. The infrastructure toolchain is built on Helm, Kubernetes and Docker. You can also use Pulumi to automate your deployment into AWS EKS. For local deployments the toolchain utilises minikube, which enables comfortable set-up of a local network of up to 20 nodes (depending on available RAM).

The code is open-source and accessible via the tezos-k8s github repo. It is in active development, and the developers welcome external contributions in bug fixes, feature code, and requirements definition. ↩
Classical BFT-style algorithms are safe, but not live, whereas Nakamoto-style algorithms are live but not safe. This means that a classical BFT-style algorithm (like Tenderbake) will safely wait out periods of network degradation and continue once connectivity is restored; whereas a Nakamoto-style algorithm will remain live and fork during a period of network degradation, and the fork collapses once connectivity is restored. ↩
Arvid Jakobsson, Colin González, Bruno Bernardo, and Raphaël Cauderlier. Formally Verified Decentralized Exchange with Mi-Cho-Coq. Contributed Lightning Talk to CPP 2021. Thanks also to Kristina Sojakova (INRIA) and James Haver (camlCase) for their contributions to the formalisation effort. ↩
František Farka, Aleksandar Nanevski, Anindya Banerjee, Germán Andrés Delbianco, and Ignacio Fábregas. On Algebraic Abstractions for Concurrent Separation Logics. Proc. ACM Program. Lang. 5, POPL, Article 5 (January 2021), 32 pages. https://doi.org/10.1145/3434286 ↩

Sound and fast gas monitoring with saturation arithmetic

2021-04-02T13:00:00+02:00

Sound and fast gas monitoring? Let’s use saturation arithmetic!

Introduction: we got gas

In Tezos, as with most smart contract platforms, on-chain operations cost gas — a theoretical resource intended to reflect (and so limit) the on-chain computational cost of running a smart contract.

The gas model allocates gas costs to atomic computation steps. When a computation starts it receives some finite allocation of gas, from which the gas cost of each of its atomic computations is deducted as the computation runs, step by step. If the gas allocation is exhausted, the computation is deemed to have become too expensive and is aborted.¹

Three design considerations for a gas system are:

The gas model should be accurate.

We don’t want easy computations to get terminated unnecessarily for lack of gas; and conversely, we don’t want the gas model to allow an attacker to run expensive computations whose cost is undetected by the gas model, thus opening a potential DoS attack surface.

Getting this right is an art: the recent Tezos Delphi protocol upgrade, for example, fine-tuned gas costs significantly.

The gas model should be computed (reasonably) correctly.

That is, the arithmetic of gas computations should be performed correctly. This sounds basic but, as we shall see, it is subtle, because it is in tension with the third consideration:

The gas model should be computed (fairly) cheaply.

Again, this sounds basic — but computing gas costs is a computation, just like anything else; and gas computations are ubiquitous on Tezos, so there may be a trade-off between efficiency and accuracy.

Let’s talk about how we optimized the cost of gas monitoring while preserving its correctness, thanks to saturation arithmetic. This optimization is a part of the Florence protocol proposal, currently under consideration by the Tezos on-chain voting procedure.

How does gas monitoring work?

Consider a node — call it Maria — on the Tezos network, about to perform a computation $C$ on some value $V$. Before computing $C(V)$, Maria first uses a gas cost model $M$ to compute a gas quantity $G_M(C,V)$ which — if we designed our model correctly — fairly realistically anticipates the real-world cost of actually computing $C(V)$. Note that:

$G_M$ is determined by the cost model $M$.
$G_M(C,V)$ depends on the computation $C$, and also on the value (or values) passed to it. This means that $G_M(C,V)$ must be computed at runtime, when $V$ is known, and cannot be statically determined at compile-time.²

$G_M(C,V)$ is measured in mgas (milligas), and $G_M(C,V)$ mgas is subtracted from the remaining gas counter.

If the remaining gas counter falls below 0, then Maria declines to compute $C(V)$, because doing so would exceed the available gas.
If the counter remains nonnegative, then Maria proceeds to compute $C(V)$.
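The check Maria performs can be sketched in a few lines of OCaml. Everything here is illustrative and not the Tezos implementation. The following are all assumptions: the name run_with_gas, the made-up cost model charging 10 mgas per list element, and the use of a plain native int for the counter. Note that the subtraction is unchecked native-int arithmetic, which is exactly where the concerns discussed next come in.

```ocaml
(* Hypothetical sketch of the check Maria performs; all names are
   illustrative. Gas quantities are in mgas, held in native ints. *)
let run_with_gas ~remaining ~cost_model ~compute v =
  (* G_M(C, V) is computed at runtime, once V is known *)
  let remaining = remaining - cost_model v in
  if remaining < 0 then Error "out of gas" (* decline to compute C(V) *)
  else Ok (compute v, remaining) (* proceed with C(V) *)

(* Example: a made-up cost model charging 10 mgas per list element,
   for a computation C that sums a list of integers. *)
let sum_cost v = 10 * List.length v
let sum = List.fold_left ( + ) 0
```

Suppose 100 mgas are available. Then run_with_gas ~remaining:100 ~cost_model:sum_cost ~compute:sum [1; 2; 3] charges 30 mgas and returns Ok (6, 70). With only 20 mgas available, the same call returns Error "out of gas" without computing the sum.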

Both the gas model and the check for exhaustion perform arithmetic operations over gas. The usual arithmetic operations available in most CPUs implement modular arithmetic over 64-bit integers, in which only the numbers from $-2^{63}$ to $2^{63}-1$ inclusive can be represented.

The OCaml int memory representation reserves one bit as a tag, which the runtime uses to distinguish immediate integers from pointers. Thus the effective length of an OCaml integer datum is 63 bits. On a 64-bit system, the native machine instructions therefore yield effective bounds of $-2^{62}$ to $2^{62}-1$ for a programmer using a signed OCaml integer (or $0$ to $2^{63}-1$ under an unsigned interpretation).

If the result of an operation (e.g. add or multiply) exceeds these bounds, then we say the operation overflows. The special case of an overflow downwards may also be called an underflow, and we will use this terminology in this post.

For example:

$(2^{62}-1) + 1$ yields $-2^{62}$ (we overflow).
$(-2^{62}) - 1$ yields $2^{62}-1$ (we underflow).
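Both examples can be reproduced directly with OCaml's native int. The sketch below assumes a 64-bit platform, where Sys.int_size is 63, max_int is $2^{62}-1$ and min_int is $-2^{62}$:

```ocaml
(* Bounds and wraparound of OCaml's native int on a 64-bit platform,
   where ints are 63 bits wide. *)
let () =
  assert (Sys.int_size = 63);
  (* prints max_int = 4611686018427387903 and
     min_int = -4611686018427387904 *)
  Printf.printf "max_int = %d\nmin_int = %d\n" max_int min_int;
  (* (2^62 - 1) + 1 wraps around to -2^62: overflow *)
  Printf.printf "max_int + 1 = %d\n" (max_int + 1);
  (* (-2^62) - 1 wraps around to 2^62 - 1: underflow *)
  Printf.printf "min_int - 1 = %d\n" (min_int - 1)
```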

Overflow is dangerous!

The gas model is public knowledge, so an attacker knowing that $(2^{62}-1)+1 = -2^{62}$ need only request an operation whose gas cost calculation triggers this addition to make a gas profit of $2^{62}$ via overflow, thus literally conjuring gas out of nowhere and taking over the system.³
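To make the danger concrete, here is a deliberately naive gas meter in OCaml that subtracts costs with unchecked native-int arithmetic. It is a hypothetical illustration, not Tezos code, and it assumes a 64-bit platform. The two cost components stand for values an attacker might influence. Their true sum is $2^{63}-10^6$, which wraps around to $-10^6$, so subtracting this “cost” increases the remaining gas.

```ocaml
(* Hypothetical, deliberately naive gas meter over native ints; this is
   not Tezos code. It subtracts a cost without any overflow check. *)
let naive_consume ~remaining ~cost = remaining - cost

(* Two hypothetical cost components that an attacker can influence. *)
let component_a = max_int (* 2^62 - 1 *)
let component_b = max_int - 999_998 (* 2^62 - 999_999 *)

let () =
  (* The true sum is 2^63 - 1_000_000, which wraps to -1_000_000. *)
  let cost = component_a + component_b in
  (* Subtracting a negative cost adds gas: 1_000 becomes 1_001_000. *)
  let remaining = naive_consume ~remaining:1_000 ~cost in
  Printf.printf "cost = %d, remaining = %d\n" cost remaining
```

On a 64-bit system this prints cost = -1000000, remaining = 1001000.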

How is it currently implemented?

We currently use the arbitrary-precision integer ZArith OCaml library, by Antoine Miné, Xavier Leroy, Pascal Cuoq, and Christophe Troestler. Edo performs gas computations using ZArith, and the Michelson interpreter uses ZArith for its arithmetic on natural numbers, integers, and timestamps.

ZArith uses a dynamically-sized representation for numbers: any number can be represented, provided it fits in the computer memory. This is safe, but slow. It protects Tezos from overflows by making them impossible, but:

The representation is complex. A mere machine word will not suffice, so arithmetic operations are costly simply because they process a more complex data structure.
ZArith is implemented as a hybrid OCaml/C library. Calling a C function from OCaml is expensive. ZArith tries to avoid C functions when dealing with small integers, but it cannot fully escape this cost. Thus even an addition over a 63-bit integer represented in ZArith is about five times slower than a typical addition over the OCaml int type.
Inevitably, and by design, very large numbers can be represented and computed on, and the computational cost scales with their size.

$2^{62} - 1$ `mgas` ought to be enough for anybody

From the Delphi protocol onwards (including the current Edo protocol and the Florence protocol proposal), an operation cannot consume more than $1,040,000,000 \approx 2^{30}$ mgas. So representing mgas with a native 63-bit OCaml integer seems reasonable. On a 64-bit architecture, future evolutions of the protocol would hit the integer limit of $2^{62} - 1 = 4,611,686,018,427,387,903$ mgas only if we multiplied the gas limit by several billion.⁴
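The “several billion” figure can be checked directly. Divide the 63-bit bound by the current per-operation gas limit, then compare the result with the power-of-two approximation $2^{62}/2^{30}$:

```latex
\frac{2^{62}-1}{1\,040\,000\,000}
  = \frac{4\,611\,686\,018\,427\,387\,903}{1\,040\,000\,000}
  \approx 4.43 \times 10^{9},
\qquad
\frac{2^{62}}{2^{30}} = 2^{32} \approx 4.29 \times 10^{9}
```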

Once the cost model reaches $1,040,000,000$ mgas the operation will be cancelled, and that’s all we need to know. So integer precision is not an issue and we just need to consider the potential for over- or underflow.

Saturation arithmetic equips bounded integers with operations that do not overflow/underflow, but instead might saturate — they default to the biggest/smallest value that fits within the bounds (as we discuss next).

Because $2^{30}<2^{62}$, our final results cannot saturate, and while there might exist intermediate computations that could saturate, we commit to writing code that will detect this and correct as necessary.

How to implement saturation arithmetic?

Saturation arithmetic is a well-known technique to avoid underflow and overflow while computing with bounded integers. In case of an overflow or underflow, the result is replaced by the maximum or minimum value respectively. We only need nonnegative integers for gas (remember: if we drop below 0 we stop computing), so the minimum is 0 and the maximum is max_int, which in OCaml running on a 64-bit system is $2^{62} - 1$. Thus:

$(2^{62}-1)+1$ yields $2^{62}-1$.
$0-1$ yields $0$.

Saturation arithmetic operations are available in machine instruction sets like MMX or AVX2. However, we prefer to implement them in software on top of standard modular arithmetic operations, for portability (the efficiency hit is modest).

The challenge is therefore to efficiently detect when an overflow or underflow has occurred:

For addition: for $0\leq x,y<2^{62}$, we have that $x+y<0$ if and only if the computation of $x+y$ overflows (where here $+$ denotes the modular arithmetic operation). Thus to check whether an addition has overflowed, we just check whether the result is negative.
For subtraction: for $0\leq x,y<2^{62}$, we have that $x-y<0$ if and only if the computation $x-y$ underflows, that is, falls below our minimum of 0. No wraparound can occur here, since $x-y$ lies strictly between $-2^{62}$ and $2^{62}$. Thus to check whether a subtraction has underflowed, we again just check whether the result is negative.
Addition cannot underflow, and subtraction cannot overflow. Division can neither overflow nor underflow.
The interesting case is multiplication, as we now discuss.
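Before turning to multiplication, the addition and subtraction checks can be sketched in a few lines of OCaml. This is a minimal illustration, assuming a 64-bit system and arguments in [0, max_int]; it is not the protocol's saturation_repr.ml.

```ocaml
(* Illustrative saturating addition and subtraction on OCaml's native
   63-bit ints. Both arguments are assumed to lie in [0, max_int]. *)

let saturated = max_int

(* Modular x + y is negative exactly when the true sum exceeds max_int. *)
let add x y =
  let s = x + y in
  if s < 0 then saturated else s

(* x - y is negative exactly when the true difference is below 0;
   for nonnegative arguments this subtraction itself never wraps. *)
let sub x y =
  let d = x - y in
  if d < 0 then 0 else d

let () =
  Printf.printf "add max_int 1 = max_int: %b\n" (add max_int 1 = max_int);
  Printf.printf "sub 0 1 = %d\n" (sub 0 1)
```

Each operation costs one native instruction plus one sign test, which is why saturation needs no wider intermediate representation for these two cases.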

We use a bespoke OCaml module saturation_repr.ml (see lines 73 and 98):

 1 let saturated = max_int
 2
 3 let small_enough z =
 4   z land 0x7fffffff80000000 = 0
 5 
 6 let mul x y =
 7   (* assert (x >= 0 && y >= 0); *)
 8   match x with
 9   | 0 ->
10      0
11   | x ->
12      if small_enough x && small_enough y then x * y
13      else if Compare.Int.(y > saturated / x) then saturated
14      else x * y

A multiplication between two nonnegative integers overflows when y > saturated / x. We see this check on line 13 above, but it requires a slow division, so we try to avoid it on line 12 with a fast small_enough bitmask test that is sound but not complete. If both operands pass small_enough (that is, both are below $2^{31}$), their product is below $2^{62}$ and certainly cannot overflow; an operand that fails the test may still yield a representable product, so that case is re-checked the slow way. In practice, most executions run through the fast path.
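For readers who want to experiment, here is a self-contained variant of the same function. It is a sketch, not the code as shipped: OCaml's built-in comparison stands in for the protocol's Compare.Int, and a 64-bit system and nonnegative arguments are assumed.

```ocaml
(* Self-contained variant of the saturating multiplication above.
   Assumes a 64-bit system and arguments in [0, max_int]. *)

let saturated = max_int

(* On a 64-bit system this hex literal wraps to -2^31, i.e. bits 31 to 62
   set, so a nonnegative z passes exactly when z < 2^31. *)
let small_enough z = z land 0x7fffffff80000000 = 0

let mul x y =
  if x = 0 then 0
  else if small_enough x && small_enough y then
    (* Fast path: both factors are below 2^31, so the product is below 2^62. *)
    x * y
  else if y > saturated / x then
    (* Slow but exact path: x * y would exceed max_int. *)
    saturated
  else x * y
```

For instance, mul (1 lsl 31) (1 lsl 31) returns max_int, since $2^{62}$ exceeds $2^{62}-1$, whereas mul (1 lsl 40) 2 fails the fast test, passes the slow one, and returns $2^{41}$ exactly.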

An additional implementation trick: we use a phantom type to statically track the integers that are known to be safe for multiplication. Clients of the saturation arithmetic module are then offered several multiplication functions, depending on the static knowledge they have about their arguments:

(** [mul x y] behaves like multiplication between native integers as
   long as its result stays below [saturated]. Otherwise, [mul] returns
   [saturated]. *)
val mul : _ t -> _ t -> may_saturate t

(** [mul_safe x] returns a [mul_safe t] only if [x] does not trigger
    overflows when multiplied with another [mul_safe t]. *)
val mul_safe : _ t -> mul_safe t option

(** [mul_fast x y] exploits the fact that [x] and [y] are known not to
   provoke overflows during multiplication to perform a mere
   multiplication. *)
val mul_fast : mul_safe t -> mul_safe t -> may_saturate t

(** [scale_fast x y] exploits the fact that [x] is known not to
   provoke overflows during multiplication to perform a
   multiplication faster than [mul]. *)
val scale_fast : mul_safe t -> _ t -> may_saturate t
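The phantom-type idea can be sketched in a few lines. This is a simplified, hypothetical module: of_int and to_int are helpers invented for the example, and only mul_safe and mul_fast mirror the signatures above.

```ocaml
module Sat : sig
  type may_saturate
  type mul_safe
  type 'a t

  val of_int : int -> may_saturate t option (* hypothetical helper *)
  val to_int : _ t -> int                   (* hypothetical helper *)
  val mul_safe : _ t -> mul_safe t option
  val mul_fast : mul_safe t -> mul_safe t -> may_saturate t
end = struct
  type may_saturate
  type mul_safe

  (* The parameter 'a is a phantom: it never appears on the right-hand side,
     so tags cost nothing at run time. *)
  type 'a t = int

  let of_int x = if x >= 0 then Some x else None
  let to_int x = x

  let small_enough z = z land 0x7fffffff80000000 = 0

  (* Only values below 2^31 (on a 64-bit system) earn the mul_safe tag. *)
  let mul_safe x = if small_enough x then Some x else None

  (* Two mul_safe values cannot overflow, so a plain product suffices. *)
  let mul_fast x y = x * y
end
```

Passing a may_saturate t directly to mul_fast is rejected by the type checker, so the only way to reach the fast path is through the mul_safe check.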

What are the gains?

Here are the running times of a micro-benchmark which does 10,000 sequences of 10 operations over random pairs of integers:⁵

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith add     │ 2,993.12us │
│ Standard add   │   316.32us │
│ Saturation add │   411.82us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith sub     │ 2,340.94us │
│ Standard sub   │   322.81us │
│ Saturation sub │   421.24us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith mul     │ 3,285.63us │
│ Standard mul   │   321.70us │
│ Saturation mul │   479.08us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith div     │   481.98us │
│ Standard div   │   432.28us │
│ Saturation div │   436.89us │
└────────────────┴────────────┘

For addition, subtraction, and multiplication,

the ratio between ZArith and modulo arithmetic running times is roughly $7$ to $10$, while
the ratio between saturation and modulo arithmetic running times is roughly $1.3$ to $1.5$.
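These ratios can be recomputed directly from the Time/Run columns of the four tables above; the sketch below only redoes the arithmetic on those published figures.

```ocaml
(* Time/Run figures (microseconds) from the four benchmark tables. *)
type row = { op : string; zarith : float; standard : float; saturation : float }

let rows =
  [ { op = "add"; zarith = 2993.12; standard = 316.32; saturation = 411.82 };
    { op = "sub"; zarith = 2340.94; standard = 322.81; saturation = 421.24 };
    { op = "mul"; zarith = 3285.63; standard = 321.70; saturation = 479.08 };
    { op = "div"; zarith = 481.98; standard = 432.28; saturation = 436.89 } ]

let () =
  List.iter
    (fun r ->
      Printf.printf "%s: ZArith/standard %.2f, saturation/standard %.2f, ZArith/saturation %.2f\n"
        r.op (r.zarith /. r.standard) (r.saturation /. r.standard)
        (r.zarith /. r.saturation))
    rows
```

For addition, subtraction, and multiplication this gives ZArith/standard ratios of about 7.3 to 10.2, saturation/standard ratios of about 1.3 to 1.5, and ZArith/saturation ratios of about 5.6 to 7.3.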

Unsurprisingly, there is no significant performance difference for division.

We see above that

saturation arithmetic is almost as fast as standard modulo arithmetic, and
provided you steer clear of division, it is roughly six to seven times faster than ZArith.⁶

In the Michelson interpreter (see the documentation and a nice overview), every execution cycle starts with a gas monitoring operation, and using saturation arithmetic makes this operation quicker: the gas counter can be represented in a machine register (which was not the case in Edo using ZArith); the OCaml compiler can inline the saturation arithmetic operations; and the cost model execution is faster because it uses a more efficient arithmetic. These optimizations taken together reduced the real computational cost of the Michelson execution cycle, and correspondingly allowed us to reduce its gas cost by around 35% in the Florence gas model.

When the node executes the following three well-known contracts, the gas consumed by the execution cycle⁷ is measured as follows (units are milligas; mgas):

┌──────────┬────────┬──────────┐
│ Contract │ Edo    │ Florence │
├──────────┼────────┼──────────┤
│ Dexter   │ 44,867 │  28,813  │
│ FA1.2    │  9,718 │   6,238  │
│ Manager  │  3,764 │   2,301  │
└──────────┴────────┴──────────┘

We used a bespoke version of Tezos to isolate the relevant signal in the measurement above, which we are happy to make freely available for your convenience. Code pointers for the contracts above are:

The Dexter (v1) contract (script).
The FA1.2 contract (script).
For the Manager contract, see this Manager script from the Mi-Cho-Coq repo.
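The "around 35%" figure can be checked against the Edo and Florence measurements in the table above. This sketch only redoes the arithmetic on those published numbers:

```ocaml
(* Execution-cycle gas in mgas from the table: (contract, Edo, Florence). *)
let measurements =
  [ ("Dexter", 44_867, 28_813); ("FA1.2", 9_718, 6_238); ("Manager", 3_764, 2_301) ]

(* Fraction of the Edo cost saved in Florence. *)
let reduction edo florence = float_of_int (edo - florence) /. float_of_int edo

let () =
  List.iter
    (fun (name, edo, florence) ->
      Printf.printf "%s: %.1f%% less gas\n" name (100. *. reduction edo florence))
    measurements
```

This gives about 35.8% for Dexter and FA1.2, and about 38.9% for Manager.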

What are the risks?

The system is still protected from an attacker armed with a hostile operation that requires a very large amount of gas: the cost model will saturate; the gas counter will be set to 0; and the hostile operation will be cancelled. So saturation arithmetic protects the node from this type of attack just as well as ZArith.
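A toy model of this protection, assuming costs and remaining gas are nonnegative native ints. The names consume and Out_of_gas are hypothetical, not the Tezos API, and rather than letting a counter saturate to 0, the sketch simply refuses any cost larger than the gas left:

```ocaml
exception Out_of_gas

(* Plain saturating multiplication on nonnegative ints (no fast path). *)
let mul x y = if x = 0 then 0 else if y > max_int / x then max_int else x * y

(* Deduct cost from the remaining gas; a cost larger than what is left,
   including a saturated max_int cost, cancels the operation. *)
let consume ~remaining ~cost =
  if cost > remaining then raise Out_of_gas else remaining - cost

let () =
  (* A hostile size estimate: the cost model saturates instead of wrapping. *)
  let hostile_cost = mul (1 lsl 40) (1 lsl 40) in
  match consume ~remaining:1_040_000_000 ~cost:hostile_cost with
  | _ -> print_endline "executed"
  | exception Out_of_gas -> print_endline "cancelled"
```

Because the saturated cost is max_int rather than a wrapped-around small or negative number, the hostile operation is cancelled.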

However, a more subtle issue is that we lose some standard arithmetic identities, because saturation arithmetic operations treat saturated values just like any other value: max_int and 0 serve double duty, both as themselves and as overflow values, and this distinction is not recorded. For example:

$(x - y) + y = x$ is invalid in general since $x - y$ can saturate to $0$. For example, (0-1)+1 = 1.
$(x + y) - y = x$ is invalid in general because $x + y$ can saturate to max_int. For example, (1 + max_int) - max_int = 0.
We lose the ability to rebracket in general. For example, (0-1)+1=1 whereas 0-(1-1)=0.
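These counterexamples can be reproduced with a minimal saturating add and sub on nonnegative native ints (an illustration, not the protocol module):

```ocaml
(* Saturating add and sub on nonnegative native ints (illustrative). *)
let add x y = let s = x + y in if s < 0 then max_int else s
let sub x y = let d = x - y in if d < 0 then 0 else d

let () =
  (* (x - y) + y = x fails when x - y saturates to 0. *)
  Printf.printf "(0 - 1) + 1 = %d\n" (add (sub 0 1) 1);
  (* (x + y) - y = x fails when x + y saturates to max_int. *)
  Printf.printf "(1 + max_int) - max_int = %d\n" (sub (add 1 max_int) max_int);
  (* Rebracketing changes the result. *)
  Printf.printf "0 - (1 - 1) = %d\n" (sub 0 (sub 1 1))
```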

This makes saturation arithmetic potentially counterintuitive, which could in principle lead to programmer error. For instance, a programmer needs to be aware of situations where an intermediate computation might saturate (even if the final result is expected to be in bounds). However, the risk of such error seems fairly low, for one general reason and one specific reason:

The general reason is that OCaml is a strongly-typed programming language and integers equipped with saturation arithmetic are represented in OCaml by an abstract type, such that the programmer is reminded (by the type) that these are saturation arithmetic integers, and only saturation arithmetic operations can be used to manipulate their values.
The specific reason is that the cost model implementation should only need increasing functions, so the pitfalls described above should not arise as long as the functions are written to correctly propagate saturation. Aside from this, the rest of the gas monitoring subsystem is just trying to efficiently decrement a nonsaturated gas counter; if it saturates to 0, the operation is cancelled.

So in practice, saturation arithmetic is a good fit for our needs, and adds efficiency.

Conclusions and future work

So in conclusion: the 63-bit native integers of OCaml running on a 64-bit system are precise enough for gas monitoring using saturation arithmetic operations. Moving from a general-purpose arbitrary-precision C library in Edo, to a bespoke OCaml module implementing saturation arithmetic operations in Florence, allowed us to

speed up gas arithmetic by a factor of roughly six to seven,
increase overall performance⁸ of an execution cycle of the Michelson interpreter in Florence by 35%,
and thus globally improve the efficiency of the gas monitoring subsystem.

For future work:

We believe that saturation arithmetic could be further optimized by making it branch-free.
Some automatic verification would be useful to further eliminate risk, by checking that the cost model implementation only uses increasing functions (and raising an alert if non-increasing functions are introduced).
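As a hypothetical sketch of the second item, a crude check could sample a cost function at increasing inputs and report any decrease. The cost function below is invented for illustration; a real tool would need systematic inputs or a static analysis rather than a handful of samples.

```ocaml
(* Saturating helpers on nonnegative native ints (illustrative). *)
let add x y = let s = x + y in if s < 0 then max_int else s
let mul x y = if x = 0 then 0 else if y > max_int / x then max_int else x * y

(* Hypothetical cost model: a fixed overhead plus a per-byte cost. *)
let cost_of_size n = add 100 (mul n 8)

(* Return the first pair of consecutive sample points (in increasing
   order) at which f decreases, if any. *)
let find_decrease f samples =
  let rec go = function
    | a :: (b :: _ as rest) -> if f b < f a then Some (a, b) else go rest
    | _ -> None
  in
  go (List.sort compare samples)

let samples = [ 0; 1; 10; 1_000; 1 lsl 30; 1 lsl 59; 1 lsl 60; max_int ]

let () =
  match find_decrease cost_of_size samples with
  | None -> print_endline "no decrease found"
  | Some (a, b) -> Printf.printf "decrease between %d and %d\n" a b
```

The same formula written with plain modular arithmetic, fun n -> 100 + n * 8, is flagged by this check: at n = 2^59 the product wraps around to a negative number.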

You can also read more about Nomadic Labs’ work on gas costing in a previous blog post on Delphi.

So gas is a proxy for computational complexity, like money is a proxy for value. The gas model is like a pricelist, and the gas allocation is a budget. Computations that exceed their budget get shut down. Nomadic Labs is advertising an open internship on gas model validation. If interested, please e-mail careers@nomadic-labs.com. ↩
Gas costs depending on values are common for Michelson because we have built-in data structures and arbitrary-precision numbers. Contrast this with, e.g., the gas model of the Ethereum Virtual Machine (EVM), in which almost all instructions have a fixed gas cost. ↩
Early flight simulators were vulnerable to integer underflow: if you dived towards the ground and picked up enough downward speed, you could underflow the speed counter and find yourself travelling upwards at incredible velocity. It was possible using this trick to “hop” indefinitely with an empty fuel tank. Unfortunately this issue was not restricted to games. ↩
… but not on a 32-bit architecture, since $\log_2(1\,040\,000\,000) \approx 29.954$. The Tezos node only runs on 64-bit architectures, so that is not a problem. ↩
We compose several operations to reach situations where large numbers appear in ZArith. ↩
In fairness to ZArith, it is good at what it does. But we only need to count to $2^{30}$, so we just don’t need all the power that it offers. ↩
In practice, the cost of the Michelson interpreter cycle is only part of the full picture, and other costs may intervene (as for any large system), e.g. deserialisation costs. We are of course actively optimising these too (for instance, we recently optimised deserialisation costs by a factor of around 10, see this MR). ↩
The speedup per arithmetic operation from ZArith to saturation arithmetic is roughly six- to sevenfold, as observed above. The execution cycle includes other operations, typically dispatching over instructions and moving to the next instruction, so the overall reduction is 35%. ↩

Tezos calling convention migrating from Breadth-First to Depth-First Order (BFS to DFS)

2021-03-08T18:00:00+01:00

Summary: If the Florence proposal is adopted, we recommend you do not deploy new Michelson contracts that are dependent on the BFS calling convention. We do not expect this to be a problem in practice. However, those planning on deploying contracts in the near term should check that their contract’s correctness is unaffected by the change in calling convention.

The current calling convention for intercontract calls in Tezos is that they are added to a “first-in, first-out” queue, also called a “BFS” (or “breadth-first”) approach.

The proposed Florence protocol update includes a switch to a “first-in, last-out” convention, also called a “DFS” (“depth-first”) approach.
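The difference can be illustrated with a toy model in which each contract call emits a list of further calls. The call graph and contract names are hypothetical, and this is not how the protocol is implemented; it only shows that a queue yields breadth-first order and a stack yields depth-first order.

```ocaml
(* Hypothetical call graph: the calls each contract emits, in emission order. *)
let emits = function
  | "A" -> [ "B"; "C" ]
  | "B" -> [ "D" ]
  | _ -> []

(* BFS: emitted calls join the back of a first-in, first-out queue. *)
let bfs root =
  let q = Queue.create () in
  Queue.add root q;
  let order = ref [] in
  while not (Queue.is_empty q) do
    let c = Queue.pop q in
    order := c :: !order;
    List.iter (fun callee -> Queue.add callee q) (emits c)
  done;
  List.rev !order

(* DFS: emitted calls go on top of the stack, ahead of pending calls. *)
let dfs root =
  let rec go stack acc =
    match stack with
    | [] -> List.rev acc
    | c :: rest -> go (emits c @ rest) (c :: acc)
  in
  go [ root ] []

let () =
  print_endline (String.concat " " (bfs "A"));
  print_endline (String.concat " " (dfs "A"))
```

Under BFS, C runs before D even though D was triggered by B; under DFS, everything B triggers completes before C starts, which matches the intuition of an ordinary nested function call.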

Early in the history of Tezos, the decision was made to use a BFS calling convention for intercontract calls. This was motivated by theoretical work that appeared to show that BFS would be superior to DFS for this purpose.

However, experience indicates the opposite:

the BFS calling convention can confuse some developers and cause errors, and
it complicates porting contracts from other chains, where DFS calling conventions dominate.

A number of mechanisms were considered for the Florence proposal to add a DFS calling convention in addition to BFS. However, when mechanisms for backwards compatibility with legacy contracts were considered, subtle bugs were frequently found that would impact the correctness of existing contracts, and which could also render future contracts unsafe.

In practice reasoning about a mixed calling convention is hard, and accordingly the Florence proposal contains a straightforward change to switch the Tezos intercontract calling convention from BFS to DFS.

Replays of on-chain history indicate that this migration does not break the ordinary functionality of any existing live contracts. Because the BFS convention typically provides few useful guarantees, it appears that current contracts deployed on the chain are insensitive to calling order (we speculate that authors just found reasoning about BFS calling order too difficult, so avoided depending on it).

However, new contracts with a calling order dependency might get added to the chain between the injection of Florence and its adoption. Therefore, we recommend you do not add new contracts that depend on BFS calling conventions to the chain — if Florence is adopted, such contracts would break.

Note that, unless you explicitly built a contract to depend on the BFS calling convention, it probably doesn’t.

We will continue to monitor the chain for new contracts that might depend on calling convention ordering, and attempt to contact their authors.

Although this upgrade path is imperfect, it seems the best available option for improving both the smart contract developer experience and the future safety of contracts deployed on the Tezos network. The alternatives would have increased complexity and created unacceptable scope for error.

The BFS calling convention for smart-contract interactions was an unfortunate design flaw. However, Tezos can self-amend and is always evolving, so we in the Tezos community can solve this issue by adopting the Florence amendment through an on-chain vote.

Baking Accounts proposal contains unexpected breaking changes

2021-03-08T17:00:00+01:00

Summary

Ongoing testing and review of baking accounts has uncovered some important and previously undocumented breaking changes (see the section on breaking changes in the TZIP for Baking Accounts) in the baking account proposal.

These issues are significant, and affect the functionality of both existing and future smart contracts; they are detailed below. We ask bakers to consider them carefully when casting their vote.

We believe Baking Accounts should be postponed until a thorough audit of functionality is complete, or an alternative implementation produced. The version of Florence without baking accounts is a safer choice.

SOURCE and SENDER changes

Consensus keys can not be SENDER (or SOURCE)

After the migration to Florence, a baker’s consensus key can never be SENDER (or SOURCE). This means they cannot authenticate themselves in the usual way to smart contracts (through SENDER, though some contracts incorrectly use SOURCE).

This includes all current delegate keys, including those of inactive delegates. These keys will become consensus keys upon the migration.

In particular, if any tokens (FA1.2, FA2, or similar) are sent to consensus keys, or already owned by delegates upon the protocol migration, these tokens will be locked, unless an allowance or an operator was set up before the migration.

Any manager.tz contracts (like those created automatically from originated “accounts” during the Babylon migration) which are managed by delegates

will become inoperable, and
their tez will be locked,

unless emptied before the migration. Roughly 85,000 ꜩ are currently in this state.

The “bakers registry” contract would need to be updated and redeployed to account for this, since it expects a baker to use their key hash as SENDER at least once (to set a “reporter” address). A new version of this contract should use the baker hash instead.

There could be other examples. For instance: any contract with an “owner” or “admin” address set to a delegate key would become impossible to administrate upon the protocol migration.

Rotating the consensus key out by setting a different one does not solve this problem. Any consensus key is permanently unable to be SENDER or SOURCE, until a future protocol fixes the problem.

SOURCE might not be an implicit account

The SOURCE instruction normally returns the address of the account responsible for invoking the call in the first place. Since the Babylon upgrade, the invariant that SOURCE is always an implicit account has held.

If a baker consensus key is used to initiate a transaction, the SOURCE instruction in Michelson will return the baking account address, and not the address of an implicit account.

Code that relies on SOURCE necessarily being an implicit account would be broken. We are not currently aware of any contract that relies on this behavior.

SOURCE = SENDER does not imply a top-level call

The interaction with SENDER is potentially more problematic. In the case of the consensus key being used as part of a multisig call to trigger the baking contract to send out a transaction, the instruction SENDER would also return the baking account address.

A common pattern is to use SOURCE = SENDER to ensure that a transaction is happening at the top-level and cannot be interleaved with other calls. With baking accounts, it would no longer be true that SOURCE = SENDER implies a top-level call. Indeed, using the consensus key to trigger a baking account to release a series of transactions creates a non-top-level call where SOURCE = SENDER nonetheless.

However, given the change to the DFS calling convention in Florence, it’s no longer possible to use a top-level call to interleave transactions, which seemingly mitigates the issue.

A design where SOURCE represents the implicit account associated with the consensus key and SENDER the baking account would be more consistent. It could be folded in later on, so long as no application comes to depend on the semantics detailed above.

Calls to implicit accounts can fail

Calls to implicit accounts never fail when the burn limit is sufficiently high, the gas limit is sufficiently high, and the amount is positive.

In baking accounts, when a baker rotates to a new key, there is a period of time where the key is pending. During that period, transfers to that implicit account fail.

This can break contracts. For instance, the auction contract from tzcolors breaks as follows: a user could maliciously create an implicit account, bid on an item, delete the implicit account, and then register the account as a consensus key. If someone tries to outbid them, the transaction would involve a transfer back to the current winning bidder, which would fail because that address is the pending consensus key of a baking account. This would make it impossible to outbid the malicious bid.

While it might be possible for tzcolors to migrate their auction contract in the interim, the functionality could not be replicated with the baking account proposal. Instead, participants who are outbid would need to manually claim their bids back. Other versions of this contract are also deployed on the chain and would face the same issue.

CREATE_CONTRACT fails with some legacy code

The baking account proposal changes the Michelson instruction SET_DELEGATE to take an option baking_hash instead of an option key_hash. Contracts which rely on a key_hash are marked as legacy, and can continue to use the old instruction, but new contracts can only use the new version of the SET_DELEGATE instruction.

This breaks contracts that work as contract factories, if they try to automatically deploy new contracts with the legacy SET_DELEGATE instruction. This includes Kolibri and wXTZ. Both contracts are upgradeable and could adapt to the change, but the activation of the baking account proposal could disrupt their operation.

Conclusion

Due to these breaking changes, we believe that

baking accounts are not currently appropriate for on-chain use, and
it is safest to accept the proposal without them.

A future version of Baking Accounts which does not break current contracts and preserves important invariants is possible, and should be developed to take its place.

Baking accounts is a feature whose design and implementation have proven a significant challenge, because:

it requires extensive code updates; and
it changes the delegation system and thus how Tezos as a blockchain operates.

This underscores that Tezos development needs to provide more room for specifying precisely and systematically how new features integrate with the existing codebase. The Tezos Improvement Proposals (TZIPs) process is a much-needed step in this direction, and we are acting to integrate this even more tightly into our development process.

By conditioning the implementation and integration of new features on the formulation of comprehensive, community-reviewed specifications, we increase our chances of catching issues like those presented in this article. From this perspective, the Baking accounts TZIP was a missed opportunity, and we believe it is in our interest as a community to allocate more time and attention to this process.

Florence: Our Next Protocol Upgrade Proposal

2021-03-04T15:30:00+01:00

UPDATE: We believe that the baking accounts implementation is significantly flawed. See: Baking Accounts proposal contains unexpected breaking changes

This is a joint announcement from Nomadic Labs, Marigold, DaiLambda, and Tarides.

As we described in this post, several development organizations in the Tezos ecosystem are now collaborating to submit protocol upgrade proposals every few months, which is the interval permitted by the Tezos on-chain governance process. When the Edo upgrade went live on February 13, we mentioned that a new protocol proposal, codenamed “Florence”, would soon be ready. We are pleased to announce that, as expected, “Florence” is now complete.

We are offering the community two versions of the Florence proposal to choose between, one with Baking Accounts (as described below) and one without; we’ll explain the rationale for this decision later in this blog post.

The hash of the proposal with baking accounts is: PsFLorBArSaXjuy9oP76Qv1v2FRYnUs7TFtteK5GkRBC24JvbdE
The hash of the proposal without baking accounts is: PsFLorenaUUuikDWvMDr6fGBRG8kt3e3D3fHoXK1j1BFRxeSH4i

Florence has a number of bug fixes and small improvements; we encourage you to look at the change log. Below we will discuss some of the more interesting and important changes:

Increased Maximum Operation Size: Previously, the maximum size of an operation was 16kB. In Florence, we propose to increase it to 32kB. Among other things, this has the effect of slightly more than doubling the maximum size of a smart contract, which should be of interest to some developers with particularly complicated applications.

Gas Optimizations: We have again reduced gas consumption in smart contract execution by increasing the efficiency of gas computation inside the Michelson interpreter. This allows for smart contracts with more complicated functionality to operate economically on the chain. We will continue to work on further efficiency improvements in coming versions of the protocol.

Baking Accounts: Previously, token holders delegated to a baker by specifying that baker’s public key hash. This meant that bakers could never change their public keys, which was exceptionally inconvenient. The new “Baking Accounts” feature alleviates this issue. In Florence, a new account type has been added to represent accounts managed by bakers. These accounts are Michelson smart contracts running a fixed multisig script. This feature lets bakers renew and split their consensus keys without moving to a new address and asking their delegators to follow. In addition to the usual internal operations, baking accounts can also emit baking operations such as proposing and voting for protocol amendments.

(The rights granted to the baking key (baking, endorsing, voting, and spending the funds) remain unchanged. However, the system also allows a baker to vote and access their funds using multisig authentication. Not using the baking key for such tasks reduces the risk of it being exposed, and the baking key can also be rotated in a worst case scenario.)

Although we strongly believe that Baking Accounts are an important new addition to the Tezos protocol, and although we have made considerable effort to make them backwards compatible with existing code, we recognize that client libraries, wallets, indexers, and other software will require some work to fully support Baking Accounts. We are thus providing the community with the opportunity to decide for itself during the Proposal Period whether or not to include this feature in the Florence update.

Depth First Execution Order: Previously, intercontract calls were executed in a so-called “breadth first” ordering. This was believed to be the correct choice when the Tezos protocol was initially designed, but it has turned out to significantly complicate the lives of smart contract developers. If Florence is adopted, the calling convention will change to a “depth first” execution order. This will make it far easier to reason about intercontract calls.

No More Test Chain: Previously, during the voting process, a test chain would be spun up during the “testing period” which took place between the exploration and promotion voting periods. The intent was that this test chain be used to ensure that the new proposal worked correctly, but in practice, the test chain has never been used in this manner, and has caused significant operational problems for node operators. The new proposal eliminates the test chain activation; the testing period has been retained but is now named the “cooldown period”. Instead, we will continue to test the protocol using test chains that operate outside of the mainnet voting process.

This protocol amendment and the related updates to the Tezos shell were developed by programmers from Nomadic Labs, Metastate, DaiLambda, Marigold, Tarides, and an external contributor, Keefer Taylor, to whom the proposals grant an invoice of ꜩ100 to thank him for his merge request that increased the maximum operation size.

Now that Florence is code complete, we encourage you to test your own Tezos related applications to check for compatibility problems. Docker images with both proposals are now available and two testnets, one with and one without Baking Accounts, will soon be available as well.

Supplementary Information:

Gitlab Repository with Baking Accounts

Gitlab Repository without Baking Accounts

A docker image containing both proposals and the necessary code to run them may be obtained by running:

docker pull tezos/tezos:master

The test network for Florence will be called Florencenet. It will run the proposal with the new baking accounts feature, whose protocol hash is: PsFLorBArSaXjuy9oP76Qv1v2FRYnUs7TFtteK5GkRBC24JvbdE
The test network for Florence without baking accounts will be called FlorenceNoBAnet. The hash of that protocol proposal is: PsFLorenaUUuikDWvMDr6fGBRG8kt3e3D3fHoXK1j1BFRxeSH4i

A technical description of the Dexter flaw

2021-02-25T15:00:00+01:00

In this technical blog post, we detail the flaw found in the Dexter contract and the exploit used to “white-knight” the funds in those contracts.

Background

The Dexter contract contains several entrypoints allowing users to perform various operations, such as adding and removing liquidity, or converting tokens to tez back and forth. The exact interface is given by the type of the contract’s parameter:

parameter (or
            (or
              (or
                (pair %approve (address :spender)
                               (pair (nat :allowance) (nat :currentAllowance)))
                (pair %addLiquidity (pair (address :owner) (nat :minLqtMinted))
                                    (pair (nat :maxTokensDeposited)
                                          (timestamp :deadline))))
              (or
                (pair %removeLiquidity
                  (pair (address :owner) (pair (address :to) (nat :lqtBurned)))
                  (pair (mutez :minXtzWithdrawn)
                        (pair (nat :minTokensWithdrawn) (timestamp :deadline))))
                (or
                  (pair %xtzToToken (address :to)
                                    (pair (nat :minTokensBought) (timestamp :deadline)))
                  (pair %tokenToXtz (pair (address :owner) (address :to))
                                    (pair (nat :tokensSold)
                                          (pair (mutez :minXtzBought)
                                                (timestamp :deadline)))))))
            (or
              (or
                (pair %tokenToToken
                  (pair (address :outputDexterContract)
                        (pair (nat :minTokensBought) (address :owner)))
                  (pair (address :to) (pair (nat :tokensSold) (timestamp :deadline))))
                (or (key_hash %updateTokenPool) (nat %updateTokenPoolInternal)))
              (or (pair %setBaker (option key_hash) bool)
                  (or (address %setManager) (unit %default)))));

Of particular interest is the tokenToXtz entrypoint, which allows the swapping of tokens for tez. In Tezos, entrypoints are represented as different cases of a single sum type representing the expected parameter. The type for tokenToXtz is given as

(pair %tokenToXtz (pair (address :owner) (address :to))
                  (pair (nat :tokensSold)
                        (pair (mutez :minXtzBought)
                              (timestamp :deadline))))

This means that the entrypoint expects a record with the following fields:

deadline : the time until which the transaction requested is valid
minXtzBought : the minimum amount of tez requested. If the tokens were to be swapped for fewer tez, the transaction would be considered invalid, protecting the seller against excessive slippage
tokensSold : the number of tokens the seller desires to sell
to : the address where the tez proceeds are to be sent
owner : the owner of the tokens being sold

When the Dexter contract receives a tokenToXtz request, it performs a series of computations to determine how many tez it should pay out and, ultimately, emits two operations: one that sends tez to the address in the “to” field, and another, directed at the FA1.2 token contract, that requests a transfer of tokens from the owner to the Dexter contract itself. If either of these operations fails, the whole transaction fails.

An FA1.2 transfer is valid if and only if the contract calling the FA1.2 contract owns the tokens being transferred, or if it is present in the allowance list for the owner of the tokens. This is a pattern found in ERC20 as well.
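The authorization rule can be modeled in a few lines. This is a toy sketch, not the FA1.2 reference implementation: addresses are plain strings, allowances are an association list, and balance checks are omitted.

```ocaml
(* Toy model of the FA1.2 authorization rule; balances are ignored.
   allowances maps (owner, spender) pairs to approved amounts. *)
let allowance allowances ~owner ~spender =
  Option.value (List.assoc_opt (owner, spender) allowances) ~default:0

(* A transfer from owner is authorized if the caller is the owner,
   or if the owner has approved the caller for at least amount. *)
let transfer_allowed allowances ~sender ~owner ~amount =
  sender = owner || allowance allowances ~owner ~spender:sender >= amount

let allowances = [ (("alice", "dexter"), 10) ]

let () =
  (* Dexter may spend Alice's tokens within her allowance... *)
  assert (transfer_allowed allowances ~sender:"dexter" ~owner:"alice" ~amount:5);
  (* ...and may always move its own tokens, whatever the amount. *)
  assert (transfer_allowed allowances ~sender:"dexter" ~owner:"dexter" ~amount:1_000_000)
```

The second case is unremarkable on its own; it only becomes dangerous when a caller can choose the owner that Dexter passes to the token contract.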

When Dexter users sell tokens to Dexter, they typically first make a temporary allowance in the FA1.2 contract to allow Dexter to access their funds.

Flaw

The presence of an owner field in tokenToXtz raises two problems. First, it means a user is able to spend the token of anyone who has an open allowance to Dexter. While it is possible to recommend that allowances only be set temporarily and be reset to 0 after every interaction with the contract, this was not, in fact, done, and we found that several liquidity providers had small dangling allowances left after adding liquidity.

More problematic is the fact that the owner field could simply be set to the Dexter contract’s address itself. Since the caller of an FA1.2 contract is always allowed to spend its own funds, this means that the Dexter contract could be instructed to dispense tez by sending itself its own tokens. Self-transfers are normal, and even required in the FA2 standard.

Dexter implements a constant-product market maker, which attempts to keep constant (before fees) the product of the quantities of the two assets it holds. By asking the Dexter contract to send itself its entire token balance, it’s possible to collect (before fees) roughly half of its tez balance.
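To see why "selling" the pool its own entire token balance returns about half of its tez, here is a minimal constant-product calculation, ignoring fees. The function name and the integer representation are illustrative assumptions, not Dexter's code:

```ocaml
(* Constant-product swap without fees: the pool holds xtz_pool mutez and
   token_pool tokens, and keeps xtz_pool * token_pool (approximately)
   constant. Products are assumed to fit in a native int; a real contract
   would use arbitrary-precision nat. *)
let xtz_out ~xtz_pool ~token_pool ~tokens_sold =
  (* Tez paid out = xtz_pool - xtz_pool * token_pool / (token_pool + tokens_sold)
                  = xtz_pool * tokens_sold / (token_pool + tokens_sold),
     rounded down so the pool never pays more than the formula allows. *)
  xtz_pool * tokens_sold / (token_pool + tokens_sold)

let () =
  (* Selling an amount equal to the whole token pool yields half the tez. *)
  Printf.printf "%d\n" (xtz_out ~xtz_pool:1_000_000_000 ~token_pool:500 ~tokens_sold:500)
```

With tokens_sold equal to token_pool, the formula reduces to xtz_pool / 2. In the exploit the tokens never actually left Dexter, so the pool gave up half its tez for nothing.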

The presence of an owner field in tokenToXtz was unnecessary, and its only safe value would be SENDER, which renders it useless as a parameter.

Conclusion

It is important for authors of smart contracts to consider and test edge conditions of all sorts on all possible calls. This is especially true in situations where a contract can be told to transfer value to or from a third party. Creators of contracts should strongly consider avoiding such facilities entirely when they are not critical to the function of a contract. When providing such facilities, it is vital to consider whether edge cases might result in unexpected behavior.

Dexter Flaw Discovered; Funds are Safe

2021-02-20T04:00:00+01:00

TL;DR: A flaw was found in camlCase’s Dexter contract. The funds have been removed from the contract and returned to their original holders.

A high level explanation follows; technical details of the Dexter flaw will be described in a separate post to come.

As many of you know, we have been working on a new Tezos upgrade proposal. This proposal, if accepted, will change the calling convention from breadth-first ordering to depth-first ordering. In the course of reviewing potentially affected contracts, we stumbled upon an unrelated but serious flaw in Dexter permitting unauthorized withdrawal of funds.
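To make the difference between the two calling conventions concrete, here is a small sketch of both orderings on a made-up call tree: contract A emits internal operations to B and C, and B in turn emits one to D. The `emitted` mapping and the function names are hypothetical; the real protocol also deals with failures, gas, and balance updates, which are ignored here.

```python
from collections import deque

# Hypothetical call tree: each contract maps to the internal operations it emits.
emitted = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}


def breadth_first(root: str) -> list:
    """Old convention: emitted operations join the back of a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        op = queue.popleft()
        order.append(op)
        queue.extend(emitted[op])
    return order


def depth_first(root: str) -> list:
    """New convention: an operation's children run before its siblings."""
    order, stack = [], [root]
    while stack:
        op = stack.pop()
        order.append(op)
        # Push children in reverse so the first-emitted child runs first.
        stack.extend(reversed(emitted[op]))
    return order


print(breadth_first("A"))  # ['A', 'B', 'C', 'D']
print(depth_first("A"))    # ['A', 'B', 'D', 'C']
```

The only difference is where D runs: after C in breadth-first order, before C in depth-first order. Contracts that implicitly rely on one of these orders are the ones such a review has to look at.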

We provided information on the vulnerability to camlCase, and, following responsible disclosure policies, we have waited to make a public statement until public discussion of the vulnerability would no longer harm the participants in Dexter.

Since then, a so-called “White Knight” operation has been conducted in which the funds were removed from the contract using the bug itself and then returned to their rightful owners. We are informed that this operation has now concluded, and all funds are safe.

Given the apparently substantial community interest in the availability of a distributed exchange contract, a team led by Nomadic Labs has, in the interim, rewritten the contract to avoid this bug, and is in the process of proving that this particular class of bugs cannot recur. We will be releasing both the code and the specification for this contract shortly.

When we first published our blog post announcing the formal verification of the Dexter contract, we wrote some important things about what we had verified:

As with any formal verification, there are limits to our development. Some are inherent to software verification, while others stem from limitations of Mi-Cho-Coq.

We then went on to list several limitations, including this:

Formal verification of the soundness of Dexter’s specification.

By soundness we mean conformity to some “common sense” economic properties; e.g. that an attacker can’t remove the funds of a liquidity provider without permission.

This is simple garbage in, garbage out: we can check whether an implementation satisfies its specification, but what if the specification’s wrong? For instance, a specification’s author might inadvertently permit an attack by which an attacker would remove funds that are not theirs, by simple human error or by not fully understanding the domain logic of the system.

Unfortunately, our words have proven prophetic.

We again note that formal verification can only check whether a contract meets a given specification, not whether the specification itself is correct.

As we have noted, we will soon be publishing the specification of the new contract. We invite interested members of the community to examine it and to tell us whether they agree that the proper safety and correctness properties are now present.

We look forward to the deployment and adoption of this new distributed exchange contract by interested parties once the community is satisfied that it is secure.

The Protocol: from High-level Command Line to Low-level Operations

2021-02-16T18:00:00+01:00

This note explains and illustrates the flow of control in Tezos using the example of carrying out a simple transaction. We will go from the high-level command-line call to the low-level account operations. To guide you through the codebase, we give line numbers based on release 8.2 (commit 6102c808).

Where do I start?

Our scenario. Imagine that Alice and Bob have one account each, and Alice wants to send ꜩ10 from her account to Bob’s. Alice can proceed by using a transfer operation. This transaction will trigger some additional transactions, which will be handled automatically by the system:

Alice must pay baking fees (tokens paid to the baker who writes Alice’s main transfer onto the blockchain).
Alice may also have to pay for key revelation to reveal the public key of her account.¹

Here’s a picture:

The easiest way to execute our example is to run a node locally by using the sandboxed mode. In the terminal, we just type:

$ tezos-client transfer 10 from alice to bob

Inside the code

As far as the Tezos codebase is concerned, everything is a contract. This includes user-owned accounts (called implicit accounts in the codebase) and smart contracts (called originated accounts in the codebase). This is reflected in the variable and function names in the code cited below, so for consistency we may also call everything a “contract” — but note that in our example, the source and destination contracts are actually the user-owned (i.e., implicit) accounts of Alice and Bob.

Step 1: Dispatching the client command

The flow of control starts with the commands function, whose code is in client_proto_context_commands.ml (l.254). This determines the command-line syntax for the switches following an invocation of ‘tezos-client’.² In our example, we have invoked the transfer sub-command (l.893).
Control now flows to transfer_command (l.110), which distinguishes the type of the source contract (in the pattern-matched variable source, l.136); in our case it is implicit and admits a public key hash. Flow of control passes to another function, transfer (l.167).
transfer is in client_proto_context.ml (l.73). Its job is to build the transaction operation itself, using a function build_transaction_operation. In our case, the operation is a GADT Transaction of type manager_operation (discussed below). In particular, its field amount contains the amount of tez to be transferred (excluding baking fees). transfer wraps the operation in an annotated_manager_operation by function prepare_manager_operation, adding fee and gas information; then it further wraps the operation into the GADT Injection.Single_manager of type annotated_manager_operation_list; and finally it passes this all on to function inject_manager_operation, detailed below.
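The successive layers of wrapping performed by transfer can be mirrored in a rough Python model. The OCaml code uses GADTs; here plain dataclasses stand in for them, the fields are simplified, and every class and function name below is hypothetical (the comments point to the OCaml function each one loosely corresponds to). None of this is the client's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    """Plays the role of the Transaction constructor of manager_operation."""
    amount: int        # in mutez (1 tez = 1,000,000 mutez), excluding fees
    destination: str


@dataclass
class Annotated:
    """Plays the role of annotated_manager_operation: op plus fee/limits.

    None means "not given by the user; to be estimated later".
    """
    operation: Transaction
    fee: Optional[int] = None
    gas_limit: Optional[int] = None
    storage_limit: Optional[int] = None


@dataclass
class SingleManager:
    """Plays the role of Injection.Single_manager: a one-element list."""
    op: Annotated


def build_transaction(amount: int, destination: str) -> Transaction:
    # Loosely corresponds to build_transaction_operation (simplified).
    return Transaction(amount=amount, destination=destination)


def annotate(op: Transaction, fee: Optional[int] = None) -> Annotated:
    # Loosely corresponds to prepare_manager_operation (only the fee is settable).
    return Annotated(operation=op, fee=fee)


# Alice sends 10 tez to Bob, leaving fee and limits to be filled in later.
to_inject = SingleManager(annotate(build_transaction(10_000_000, "bob")))
print(to_inject.op.operation.amount)  # 10000000
```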

Manager operations are operations that compete with one another for inclusion in a block. This is in contrast to other types of operations, such as consensus-related or governance-related operations. The competition is realized through the proposed fees, which go to the baker selecting the operations to be included in the block. There are four types of manager operations, defined by the type manager_operation in operation_repr.ml using the following four constructors:

Reveal for the revelation of a public key, a one-time prerequisite to any signed operation, in order to be able to check the sender’s signature.

Transaction of some amount to some destination contract.

Origination of a contract using a smart-contract script and initially credited with the amount credit.

Delegation to some staking contract (designated by its public key hash).

Although we usually use tezos-client to inject a single operation, the economic protocol uses a built-in notion of a list of operations (also called a “packed operation”) that can be injected as a whole, for efficiency reasons. In this case the constructor Injection.Cons_manager, instead of Injection.Single_manager, would be used to build a value of type annotated_manager_operation_list.
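The two constructors behave like the cells of a non-empty linked list: Cons_manager links one operation to the rest of the batch, and Single_manager ends it. A minimal Python model, with hypothetical names and plain strings in place of annotated operations (in OCaml, non-emptiness is enforced by the type; here it is a runtime check):

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Single:
    """Stands in for Injection.Single_manager: the last (or only) operation."""
    op: str


@dataclass
class Cons:
    """Stands in for Injection.Cons_manager: one operation, then the rest."""
    op: str
    rest: "Batch"


Batch = Union[Single, Cons]


def to_batch(ops: list) -> Batch:
    """Build a batch from a non-empty list; the last element becomes a Single."""
    if not ops:
        raise ValueError("a batch needs at least one operation")
    if len(ops) == 1:
        return Single(ops[0])
    return Cons(ops[0], to_batch(ops[1:]))


def flatten(batch: Batch) -> list:
    """Walk the batch back into a plain Python list."""
    out = []
    while isinstance(batch, Cons):
        out.append(batch.op)
        batch = batch.rest
    out.append(batch.op)
    return out


single = to_batch(["transfer 10 to bob"])                        # our example
packed = to_batch(["transfer 10 to bob", "transfer 5 to carol"])  # a packed batch
print(flatten(packed))  # ['transfer 10 to bob', 'transfer 5 to carol']
```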

Step 2: Injecting the operation

The function inject_manager_operation in injection.ml (l.911) proceeds, from the previous list of annotated manager operations (of type annotated_manager_operation_list), by creating a list of contents (of type contents_list) where each content (of type contents) is a GADT Manager_operation. This mapping is performed by the local function build_contents, and for each element by the local function contents_of_manager. These functions deconstruct and reconstruct the list of annotated manager operations, providing information for the missing values (fee, gas limit, storage limit).

Then, function inject_manager_operation proceeds according to two cases:
1. If Alice’s public key has not yet been revealed, then the function adds one more manager operation to reveal her public key.
2. If Alice’s public key has been revealed, then the next function is directly called with this list of contents, which is the case in our example.
The function inject_operation (l.773) completes the final steps of injecting the operation into the node. It:
- computes estimated gas and storage (in function may_patch_limits),
- checks compliance with cap fees (in function may_patch_limits),
- performs a simulation (by calling preapply), and finally
- injects the operation (by calling Shell_services.Injection.operation in l.825).
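The decisions in Step 2 can be outlined as follows. This is a hypothetical Python outline, not the client's code: operations are plain strings, gas and storage estimation is only noted in a comment, and the `simulate` and `inject` callbacks stand in for the preapply and injection RPC calls.

```python
from typing import Callable, List


def inject_manager_operation(contents: List[str],
                             key_revealed: bool,
                             simulate: Callable[[List[str]], bool],
                             inject: Callable[[List[str]], str]) -> str:
    """Hypothetical outline of Step 2; returns whatever `inject` returns."""
    # Case 1: the source's public key is not on chain yet, so a Reveal
    # operation is added in front of the other contents.
    if not key_revealed:
        contents = ["reveal"] + contents
    # Case 2 (our example): the contents are passed on unchanged.
    # The real client would now estimate gas/storage and check fee caps
    # (may_patch_limits); this sketch skips that step.
    if not simulate(contents):          # stands in for preapply
        raise RuntimeError("simulation failed; nothing injected")
    return inject(contents)             # stands in for the injection RPC


sent = []


def fake_inject(ops: List[str]) -> str:
    sent.append(ops)
    return "op_hash_placeholder"


first = inject_manager_operation(["transaction"], key_revealed=False,
                                 simulate=lambda ops: True, inject=fake_inject)
later = inject_manager_operation(["transaction"], key_revealed=True,
                                 simulate=lambda ops: True, inject=fake_inject)
print(sent)  # [['reveal', 'transaction'], ['transaction']]
```

Simulating before injecting means a failing operation is caught locally instead of being broadcast.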

Step 3: Dispatching the operation

The calls to Alpha_block_services.Helpers.Preapply.operations in preapply for simulation and to Shell_services.Injection.operation for injection from Step 2 actually represent two RPC calls from the client to the node. Such RPCs are declared in src/lib_shell_services and implemented in src/lib_shell/ (in files whose names have the form *_directory.ml). To serve these two RPCs, the shell part of the node simply dispatches the operation execution to the relevant economic protocol, here Alpha.

Step 4: Applying the operation in the economic protocol

The main entrypoint to apply the operation which we injected in the previous section is the function apply_operation in src/proto_alpha/lib_protocol/main.ml (l.180). A variable operation (of type 'kind operation') containing all information for the operation itself is passed to the next function.

The record-type 'kind operation defined in file operation_repr.ml (l.62) contains a field protocol_data of type 'kind protocol_data (l.67). It wraps together a list of contents (of type contents_list) and a signature of the source contract for authentication purposes.

Next, the function apply_operation in src/proto_alpha/lib_protocol/apply.ml (l.1339) is called, which after some initialization, uses operation.protocol_data.contents (of type contents_list) to call the next function.
In the function apply_contents_list (l.1103), two function calls are worth mentioning:
1. On one side (left in the figure), the function precheck_manager_contents_list (l.1327) calls the function precheck_manager_contents (l.928), which in turn calls Contract.spend; this triggers the payment of baking fees (l.833).
2. On the other side (right in the figure), the function apply_manager_contents_list (l.1331) makes successive nested calls to apply_manager_contents_list_rec (l.987), then apply_manager_contents (l.836), and finally apply_manager_operation_content (l.518). There, the amount of ꜩ10 is debited from Alice by calling Contract.spend (l.552) and then credited to Bob by calling Contract.credit (l.570).
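The two branches can be summarized by a toy ledger. The sketch below is hypothetical: `spend`, `credit`, and `apply_transfer` only mimic the roles of Contract.spend, Contract.credit, and the precheck/apply split, amounts are in mutez, and the 1,000 mutez fee is an arbitrary example value. Fees are simply taken from the source; how the protocol later distributes them to the baker is not modelled. In this sketch, as in Tezos, a fee taken during the precheck is kept even if the transfer itself then fails.

```python
class InsufficientBalance(Exception):
    pass


def spend(balances: dict, contract: str, amount: int) -> None:
    """Debit `amount` mutez (a toy stand-in for Contract.spend)."""
    if balances.get(contract, 0) < amount:
        raise InsufficientBalance(contract)
    balances[contract] -= amount


def credit(balances: dict, contract: str, amount: int) -> None:
    """Credit `amount` mutez (a toy stand-in for Contract.credit)."""
    balances[contract] = balances.get(contract, 0) + amount


def apply_transfer(balances: dict, source: str, dest: str,
                   amount: int, fee: int) -> str:
    """Apply one transfer in two phases; returns "applied" or "failed"."""
    # Precheck phase: the fee is taken first. If this fails, the operation
    # is rejected outright (the exception propagates).
    spend(balances, source, fee)
    # Application phase: debit the source, then credit the destination.
    try:
        spend(balances, source, amount)
    except InsufficientBalance:
        return "failed"   # the transfer fails, but the fee stays paid
    credit(balances, dest, amount)
    return "applied"


balances = {"alice": 20_000_000, "bob": 0}
print(apply_transfer(balances, "alice", "bob", 10_000_000, fee=1_000))  # applied
print(balances)  # {'alice': 9999000, 'bob': 10000000}
```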

Summary

To show how operations are handled in Tezos, we started from the command-line level using tezos-client to trigger a token transfer between two accounts. This client process occurs in the shell, outside of the protocol. The command itself is parsed and a manager operation is prepared. This operation is further wrapped up as an annotated manager operation in a single-item list.

Furthermore, to prepare for the injection into the economic protocol, this list of annotated manager operations is transformed into a list of contents. A content is a wrapper for various kinds of operations, one of which is the manager operation. Finally, the operation is injected in injection.ml.

On the economic protocol side, the fetched operation consists of a signature and protocol data, wrapped together with a shell header. This value is then passed to apply.ml, which is in charge of calling the final low-level operations that debit and credit the funds.

¹ See this short tutorial (search for “first a revelation”). By default (to save space, and to provide a little additional security) only hashes of public keys are recorded on the blockchain. But in order for Alice to sign a transaction from her account to Bob’s, Alice has to pay a one-time cost (ꜩ0.001259, in the example linked to in the tutorial above) for the full public key of her account to be recorded on the blockchain as a special key revelation operation.
² You can fetch a list of them at the command line with tezos-client man.
Here we use a bespoke command-line argument parser called Clic, developed at Nomadic Labs. Other parts of the codebase use other parsers, including Arg and Cmdliner (used in tezos-node).

Edo, the latest Tezos upgrade, is LIVE

2021-02-13T20:00:00+01:00

Summary:

This is a joint announcement from Nomadic Labs, Marigold, and DaiLambda.

On 13 February 2021, the Tezos blockchain successfully upgraded by adopting Edo at block 1,343,489. Jointly developed by Nomadic Labs, Marigold, DaiLambda, and Metastate, Edo is the fifth Tezos upgrade in the span of two years, and follows the Delphi upgrade of three months ago.

The Tezos blockchain currently allows protocol upgrades every several months, and we intend for the foreseeable future to take advantage of every such opportunity, rapidly incorporating the best available technology into Tezos. Most cryptocurrency networks lack a mechanism to decide on the content of technical upgrades. By contrast, Tezos’ on-chain self-governance and self-amendment mechanisms allow it to evolve in a way that respects the expressed preferences of its users.

A full list of the changes in Edo can be found on this documentation page. To summarize, however, the upgrade contains some minor bug fixes, some improvements to performance and gas consumption, the addition of a new period (named the “Adoption Period”) to the upgrade process, and two important new features that we have been working on for some time: Sapling, and Tickets.

Sapling is a protocol originally developed by the Electric Coin Company. Edo allows smart contract developers to easily integrate Sapling in their smart contracts, enabling new types of applications such as voting or supporting asset transactions with selective disclosures.

Tickets are another substantial improvement in Tezos. Tickets are currently experimental and should not be used in mission-critical contracts until we have completed our ongoing audit of their implementation. Tickets are a convenient mechanism for smart contracts to grant portable permissions to other smart contracts or to issue tokens. While it’s possible to achieve this with existing programming patterns, tickets make it much easier for developers to write secure and composable contracts, and we expect to see extensive use of tickets after the feature has stabilized.

Like any other feature of the protocol, the amendment process itself can be changed by a Tezos protocol amendment. The “Adoption Period” (sometimes referred to as the “Fifth Period”) is an important improvement to the governance mechanism that we have wanted to make for some time.

Up until now, new versions of the protocol have gone live (that is, have been “activated”) one block after voting has been completed, which in practice is only sixty seconds. This has made it difficult for some Tezos bakers, indexers, and other users of the network to assure seamless upgrades of their nodes. We have also seen instances where the lack of certainty about whether an upgrade would be adopted has caused some users to delay preparations until the last moment.

Under the new system, instead of four periods of eight cycles during voting, the Tezos upgrade process now lasts five periods of five cycles. The new fifth period, the Adoption Period, will be a five cycle (approximately two weeks) gap between the acceptance of the new protocol in the Promotion Vote Period, and the time when it is activated on the Tezos network. This will aid in assuring seamless protocol transitions.
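The durations involved can be checked with a little arithmetic. The sketch assumes the mainnet constants of the time, 4096 blocks per cycle and one block every 60 seconds; since blocks can be delayed, the figures are lower bounds rather than exact calendar durations.

```python
BLOCKS_PER_CYCLE = 4096   # assumption: mainnet constant at the time of Edo
SECONDS_PER_BLOCK = 60    # assumption: minimal delay between blocks


def days(cycles: int) -> float:
    """Length of `cycles` cycles in days, if every block is on time."""
    return cycles * BLOCKS_PER_CYCLE * SECONDS_PER_BLOCK / 86_400


old_process = 4 * 8   # four periods of eight cycles
new_process = 5 * 5   # five periods of five cycles
print(old_process, round(days(old_process), 1))  # 32 91.0
print(new_process, round(days(new_process), 1))  # 25 71.1
print(round(days(5), 1))                         # 14.2 (the Adoption Period)
```

So a single five-cycle period lasts a little over two weeks, matching the estimate above, and the whole five-period process is shorter than the old four-period one.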

We are exceptionally pleased with the progress that has been made on the Tezos protocol and its software in recent months. Thank you to all the developers who worked so hard to make Edo a reality. We intend to inject a new proposal, “Florence”, with a variety of interesting new features, some small, some large, within a few weeks.

IMPORTANT: Critical Patch to Tickets in Edo

2021-02-10T23:45:00+01:00

Summary:

We have discovered a critical bug within the new Tickets functionality in Edo.

Several mechanisms were considered to mitigate this problem; none were ultimately found to be satisfactory. We have therefore taken the step of producing and releasing version 8.2 of the Tezos node that includes a patched version of the Edo protocol that differs by only a few lines of code.

Nodes running 8.2 will automatically adopt the patched version rather than the original version of Edo when it activates on February 13th, 2021, around 19:30 GMT. We ask all bakers and node operators to please update immediately to 8.2, rather than 8.1, which most are currently running. Nodes running version 8.1 or earlier will not be able to communicate with the new chain.

The hash of the new version of Edo is: PtEdo2ZkT9oKpimTah6x2embF25oss54njMuPzkJTEi5RqfdZFA

We appreciate that this is coming at a very late stage in the Edo upgrade process, but we believe that this is the best choice available. It is safer to have this update happen at a time when bakers are already updating their software and planning to pay close attention to their nodes because of the impending adoption of Edo.

We do our best to prevent bugs from slipping into our releases, but unfortunately, as with any complex software, there is always a small chance of missing something. We intend to adopt several new quality control mechanisms to reduce the probability of similar bugs going undetected in the future.

The Git tag for this release is v8.2 and the corresponding commit hash is 6102c808a21b32e732ab9bb1825761cd056f3e86.

Full changelog and update instructions are available in the version 8 release page.

A look ahead to Tenderbake

2021-02-08T12:00:00+01:00

We’re working on changing the Tezos consensus algorithm from the current Emmy⁺ algorithm to a new algorithm called Tenderbake. We’d like to discuss this development, explain why we’re considering it, and describe the advantages it will bring.

Tenderbake and Emmy⁺ belong to different algorithm families:

Emmy⁺ is a Nakamoto style algorithm, whereas
Tenderbake is a classical BFT-style algorithm.

So moving to Tenderbake would be a significant development on the Tezos network.

We made every effort to keep this blog post self-contained, but just in case, you might also find this glossary of the technical terms useful!

Why Tenderbake?

Tenderbake has quick, deterministic finality

From the point of view of the user, Tenderbake’s killer feature, relative to Emmy⁺, is that it offers deterministic finality: a block that has just been appended to the chain of some node is known to be final once it has two additional blocks on top of it, regardless of network latency.

Tenderbake is also fast, in the sense that it has a small (quick) time to finality: under typical good network conditions, and making a standard assumption of an attacker (“byzantine”) stake of at most 33% — meaning that at most 1/3 of the network is trying to undermine correct behaviour:

in Tenderbake, one would expect to wait less than 1 minute for a block to be considered final, whereas
in Emmy⁺, one would expect to wait at least 6 minutes.

How this estimate is made will become clearer below, when we introduce the notion of round duration.

We now expand on some of these claims:

Probabilistic finality (Nakamoto style)

Blockchains are decentralized, so consensus must likewise be decentralized. So suppose there exist blocks $b$, $b_1$, and $b_2$ such that some participants on the blockchain’s network believe that $b_1$ immediately follows $b$, whereas others believe that $b_2$ immediately follows $b$: there is no central authority to enforce consensus on which participants are “right” and which are “wrong”, and we say the blockchain is forked, or in a forked state.

In Emmy⁺, like in all Nakamoto-style consensus algorithms, forks can have arbitrary length. However, forked states become exponentially unstable and tend to collapse down to a single branch (assuming decent bounds on network latency).

When a fork collapses to a single branch, we say that its blocks have reached finality. So, we say that Emmy⁺ has probabilistic finality because forks of arbitrary length are possible but they collapse with probability that increases suitably rapidly with fork length.

Deterministic finality (classical BFT-style)

In Tenderbake, forks are impossible. Or to put it slightly differently: forks collapse down to finality after just two blocks, always and regardless of network latency. This is deterministic finality. How this is achieved is the topic of this post.

We need two blocks because:

The head of the blockchain is a candidate block to be agreed upon, and
its parent is a block whose non-consensus operations have been agreed, but the consensus operations might still change (on rare occasions).

Tenderbake is safe under asynchrony

From the point of view of a security analyst, Tenderbake has an advantage over Emmy⁺: no fork is possible, regardless of network delays, even during an asynchronous period.

Let’s explain. Our network model assumes partial synchrony, which splits time into two kinds of periods:

During a synchronous period, there’s a global bound on the delivery delay for any network message.
This is (hopefully) the usual state of affairs. Messages are delivered promptly (for a certain global value of “promptly”).
During an asynchronous period, network performance may be degraded: but, every asynchronous period is finite.
This is (hopefully) an exceptional state of affairs.

This is a realistic scenario because in the real world,

networks work …
… until they don’t, and then action is taken to fix the problem and service is restored in finite time.

So it is important for a consensus algorithm to

function efficiently during synchrony, and just as important to
(perhaps degrade but then) recover gracefully from asynchrony.

So back to our security analyst: Tenderbake guarantees that even if the network degrades during an asynchronous period, once synchrony is restored our blockchain will just pick up where it left off. Nakamoto-style consensus does not have this guarantee, and may emerge from an asynchronous period having developed long forks.

Nakamoto is “live”, classical BFT is “safe”.

As is so often the case in consensus, design boils down to making trade-offs: given that some asynchronous periods will eventually occur, do we prefer to be “live” or “safe” during them?

Nakamoto-style consensus algorithms favour being live: blocks are always produced, even in an asynchronous period. Classical BFT-style algorithms favour being safe: production of blocks pauses during an asynchronous period.

Our network model when designing Tenderbake assumes that any period of asynchrony is finite — we assume that somebody is going to rush to fix the broken cable, protect against the DoS attack, and/or bring the server back online. So it makes sense to be safe and (essentially) just wait for the network to come back online.

From Tendermint to Tenderbake: a journey

Tendermint

The starting point of our work was Tendermint, one of the first classical BFT-like blockchain algorithms.

Tenderbake is ‘just’ a version of Tendermint adapted for the Tezos blockchain, but the adjustments required are substantive, as we discuss below. In summary:

Tenderbake is tailored to match the Tezos architecture by using only communication primitives and network assumptions which Tezos supports.
In particular, Tenderbake makes weaker network assumptions than Tendermint, at the price of adding the extra assumption that participants have loosely synchronized clocks¹ — which is fine, because Tezos uses them.

The Tezos architecture

When we adapted Tendermint consensus to the Tezos architecture to arrive at Tenderbake, we had to account for the following structural features of Tezos:

Tezos’ self-amending feature means there is a separation between
- the shell, which handles low-level communication, and
- the economic protocol, which actually does the blockchain stuff and which can be updated by a Tezos amendment.
The Tezos shell is for our purposes immutable, so our Tenderbake implementation has to fit into the more easily-updated economic protocol. In particular, we can use only communication primitives supported by the Tezos shell.
Furthermore, Tezos distinguishes between
- a node, which is on the peer-to-peer network and whose job is to manage communication with other nodes and the validation and the storage of blocks, and
- a baker, who produces blocks and communicates with just one node.
We recall that bakers hold delegates’ keys and therefore for safety they are sandboxed — not directly exposed to the peer-to-peer network. So nodes manage communication and shield bakers from network attack, and bakers hold secrets and bake blocks into the blockchain. This arrangement affects what network assumptions we can make.

So, treating these points in turn:

Tendermint uses three types of consensus messages: proposals, prevotes, and precommits. We must map these to the following available shell communication primitives:
- an operation; and
- a block, which consists of
  - some block contents which is a list of operations, and
  - a block header which contains a hash of the block contents (for efficiency and as a checksum for data integrity).⁴
We choose as follows:
- Tenderbake maps Tendermint proposals to the native Tezos action of proposing Tezos blocks, and
- Tenderbake maps Tendermint prevotes and precommits to Tezos operations.⁵
Bakers are semi-isolated from the network for their own safety, as discussed above, so Tenderbake must weaken the network assumptions of Tendermint³ because those assumptions are too strong to be guaranteed by the Tezos peer-to-peer layer.

Tendermint assumes reliable broadcast: if a correct validator receives a message then all correct validators receive it. Notably, this means that during an asynchronous period messages may be delayed but may not be lost.

Tenderbake does not assume communication is reliable. Messages sent during an asynchronous period might never arrive. In fact, Tenderbake is designed to work without any additional assumption except partial synchrony. The technical jargon is: it assumes a best-effort broadcast primitive. Best-effort broadcast means that if a correct validator sends a message during a synchronous period, then all correct (non-byzantine) validators receive it. Best-effort broadcast is the broadcast primitive implicit in the definition of partial synchrony above.

The Tenderbake consensus algorithm in brief

Levels

Some terminology:

The level of a block is the number of blocks since the genesis block, where the genesis block is at level 0.⁷
The fundamental unit of identity in Tezos is a cryptographic key. A delegate is then a cryptographic key whose owner can participate in the consensus algorithm and in the governance process by virtue of registering (the public part of) their key on the chain, holding at least one roll (= 8,000 tez), and having been active recently.

Tenderbake is executed for each new block level by validators, which are delegates selected at random based on their stake,⁸ in the same way as endorsers are selected in Emmy⁺.

The validators’ task is to agree on which block to add next. In Tendermint, this process is

started by each validator emitting a proposal message, which is just an abstract message in the algorithm proposing a block, and
continued by validators voting on which proposal to accept, by a voting mechanism which we will describe in more detail shortly; what matters for now is that voting takes place, so voting messages must be communicated.

Tenderbake has to choose a concrete representation for this process. The natural way to propose a block is for a validator to use the native Tezos proposal mechanism to actually propose its preferred block as the next block to add to the next level in the Tezos blockchain. Validators can then vote on the proposed blocks by communicating their votes across the Tezos network as Tezos operations.⁹

So schematically, Tenderbake acts as follows:

a validator injects a candidate block (representing a proposal) and consensus operations (representing votes) into the node to which it is attached, which then
diffuses those blocks and consensus operations to other nodes of the network, and thus
communicates them to the validators attached to those nodes,
to carry out voting on which block to accept.

We now consider how the voting process works, in more detail.

Levels are composed of rounds

For each level, Tenderbake proceeds in rounds. Each round represents an attempt by the validators to agree on some block for the current level.

Each round has an associated duration. Round durations are set to increase so that for any possible message delay and/or asynchronous period (when the network may be slow or unreliable), there is a round that is longer.
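One simple way to obtain increasing round durations is a linear schedule. The sketch below is an illustration under assumptions, not Tenderbake's specified schedule: the 15 s first round is the value from the private-testnet experiment reported in this post, while the 5 s per-round increment and both function names are hypothetical. Because durations grow without bound, every finite message delay is eventually exceeded by some round.

```python
FIRST_ROUND_SECONDS = 15   # first-round duration from the testnet experiment
INCREMENT_SECONDS = 5      # hypothetical per-round increase


def round_duration(r: int) -> int:
    """Duration of round r (r = 0 is the first round); strictly increasing."""
    return FIRST_ROUND_SECONDS + r * INCREMENT_SECONDS


def first_round_longer_than(delay: float) -> int:
    """Smallest round whose duration exceeds a given message delay."""
    r = 0
    while round_duration(r) <= delay:
        r += 1
    return r


print([round_duration(r) for r in range(4)])  # [15, 20, 25, 30]
print(first_round_longer_than(42))            # 6
```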

Each round has three phases:

a block proposal phase;
a preendorsement voting phase, and
an endorsement voting phase.

In more detail:

In the block proposal phase, one of the validators is designated as the round’s proposer. The proposer’s task is to propose a candidate block (more on how this is chosen in a moment).
In the preendorsement phase, validators send a supporting vote (a preendorsement) on the candidate block’s contents. So, this is a vote in support of the non-consensus operations¹⁰ that it contains.
Concretely, a preendorsement is a Tezos operation containing a tuple $(x,bc,r)$ with intended semantics
“I, validator $x$, hereby endorse block contents $bc$ for round $r$”.
In the endorsement phase, provided validators have observed a quorum¹¹ of preendorsements for the block contents, validators send a confirmation vote (an endorsement) for the contents of the candidate block and for the preendorsement quorum.
Concretely, an endorsement has the intended semantics
“I, validator $x$, hereby affirm to have observed a quorum of preendorsements for block contents $bc$ at round $r$”.
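A quorum check over votes of the form $(x,bc,r)$ can be sketched as follows. The quorum threshold is not defined in this excerpt, so the sketch assumes the usual BFT choice of strictly more than two thirds of the total stake; `has_quorum` and the stake figures are hypothetical. Each validator's stake is counted once, however many copies of its vote arrive.

```python
from fractions import Fraction


def has_quorum(votes, stake, contents, round_):
    """True if distinct validators voting for (contents, round_) hold > 2/3 of stake.

    votes: iterable of (validator, block_contents, round) tuples, i.e. (x, bc, r).
    stake: mapping from each validator to its stake (assumed threshold: > 2/3).
    """
    supporters = {x for (x, bc, r) in votes if bc == contents and r == round_}
    weight = sum(stake[x] for x in supporters)   # duplicate votes counted once
    return Fraction(weight, sum(stake.values())) > Fraction(2, 3)


stake = {"v1": 30, "v2": 30, "v3": 25, "v4": 15}
votes = [("v1", "bc", 0), ("v2", "bc", 0), ("v2", "bc", 0), ("v4", "bc", 0)]
print(has_quorum(votes, stake, "bc", 0))      # True: 75 of 100
print(has_quorum(votes[:3], stake, "bc", 0))  # False: 60 of 100
```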

First round vs. subsequent rounds

In the first round, the proposer is free to propose some block contents taken from its node’s mempool — this being a pool of pending operations that the node has accumulated, but which have not yet been baked into a block on the blockchain. In subsequent rounds, the proposer must propose the contents of the block candidate from the previous round, provided a quorum of preendorsements was observed for the block contents proposed in the previous round; otherwise, the proposer is free to choose from its node’s mempool, as for the first round.
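The proposer's choice rule fits in a few lines. In this toy model, block contents are tuples of operation names, `contents_to_propose` is a hypothetical name, and the "free choice" from the mempool is simplified to taking every pending operation in order.

```python
from typing import Optional, Sequence, Tuple

Contents = Tuple[str, ...]


def contents_to_propose(previous: Optional[Contents],
                        previous_had_prequorum: bool,
                        mempool: Sequence[str]) -> Contents:
    """Block contents the proposer of a round must propose (toy model).

    previous: contents proposed in the previous round (None in the first round).
    """
    if previous is not None and previous_had_prequorum:
        # A preendorsement quorum was observed: re-propose the same contents.
        return previous
    # First round, or no quorum observed: free choice from the mempool
    # (this sketch simply takes every pending operation, in order).
    return tuple(mempool)


r0 = contents_to_propose(None, False, ["tx1", "tx2"])
r1 = contents_to_propose(r0, True, ["tx3"])     # bound to round 0's contents
r1b = contents_to_propose(r0, False, ["tx3"])   # free to choose again
print(r0, r1, r1b)  # ('tx1', 'tx2') ('tx1', 'tx2') ('tx3',)
```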

If a validator observes an endorsement quorum at its current round for that round’s candidate block, then the validator considers that agreement has been reached on the block candidate’s contents for level $\ell$,¹² and the validator can move to level $\ell+1$. Each block candidate at level $\ell+1$ will have to include the endorsement quorums of level $\ell$, as a proof that agreement was indeed reached at level $\ell$.¹³

If a validator does not observe an endorsement quorum at its current round for that round’s candidate block, then the validator considers that agreement has not been reached, and the validator loops back into a next round. This may be caused by:

a byzantine proposer which does not make any proposal, or proposes two candidate blocks, or
network delays: (pre)endorsements are sent but just don’t arrive in time, so that reaching a quorum times out.

A final round

Eventually there will be a round with

a correct (non-byzantine) proposer who proposes precisely one candidate block as required, and
the round is long enough so that any asynchronous periods have passed and network messages arrive in time, so that
a quorum is observed and a candidate block for that round is agreed upon,¹⁴

and our loop terminates with consensus.¹⁵

How long does this all take?

Assuming good network conditions, validators agree on a block after just one round, so each level lasts one round. Finality is reached after two blocks, that is, after two levels.¹⁶ Thus, assuming good network conditions, the estimated time to block finality is at most one minute.

In first experiments on a private Tenderbake testnet, with the duration of the first round set to 15s, we confirmed experimentally that this one-minute estimate holds.
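The arithmetic behind the estimate can be sketched as follows. The sketch assumes finality after two levels and ignores processing and propagation time. It also assumes, for illustration only, that round durations grow linearly with the round number; that schedule and the increment value used below are hypothetical, not figures from this post.

```python
def round_duration(r: int, first_round_s: float, increment_s: float) -> float:
    """Duration of round r, assuming a linear schedule (an illustrative assumption)."""
    return first_round_s + r * increment_s


def time_to_finality(decision_rounds, first_round_s: float, increment_s: float) -> float:
    """Seconds until a block is final, given the round at which each of the
    two levels needed for finality was decided (a level decided at round r
    has gone through rounds 0..r)."""
    total = 0.0
    for r in decision_rounds:
        total += sum(round_duration(i, first_round_s, increment_s) for i in range(r + 1))
    return total


# Good network conditions: both levels are decided in round 0.
print(time_to_finality([0, 0], first_round_s=15, increment_s=15))  # 30.0
# One of the two levels needs a second round.
print(time_to_finality([0, 1], first_round_s=15, increment_s=15))  # 60.0
```

With a 15s first round and no extra rounds, two levels take about 30s, comfortably inside the one-minute estimate.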

Why do we need (pre)endorsements?

Tenderbake has two voting phases: preendorsements and endorsements. Why vote twice?

These algorithms are genuinely subtle, and this particular design is typical of classical BFT-style consensus algorithms, going back for example to the seminal 1984 DLS (Dwork-Lynch-Stockmeyer) paper. We offer intuition as to why having two phases can be helpful, which — to be clear — is not intended as a definitive explanation:

Intuitively, having two phases helps to ensure agreement and progress even if the network gets partitioned or participants crash.

Suppose we had only one voting phase (so no preendorsements). Then either we allow validators to change their vote in later rounds, or we don’t:

If we allow validators to change vote in later rounds…
… then suppose only one validator $v$ sees an endorsement quorum and decides on some block $b$, but then it crashes. The other validators are not aware of $v$’s decision, so they may later vote on a different block. Thus, the agreement property may be broken.
If we don’t allow validators to change vote in later rounds…
… then suppose the proposer for a round is Byzantine and it proposes two blocks $b$ and $b'$; then some validators vote on $b$, and some vote on $b'$, and neither $b$ nor $b'$ gathers a quorum of votes. Then there will be no agreement, because validators have to stick with their votes forever. Thus, the progress property is broken.

This is solved by using two voting phases. Indeed, with two voting phases, participants only endorse a block once they are confident that a preendorsement quorum already exists for it. This adds stability: a fragmented network might delay consensus, but it cannot enable a fragmented consensus.

To quote Pierre Chambart, answering the question “Why preendorsements?”:

Ça permet de mesurer si tu es dans la majorité, avant de prendre la décision de voter pour de vrai. (J’imagine la gueule d’une vraie élection si les gens ne voulaient voter que pour le vainqueur).

This allows you to know you’re voting with the majority, before casting your final vote. (Imagine what a real election would look like, if people only wanted to vote for the winner.)

The interested reader can also watch Ittai Abraham’s excellent tutorials:

Byzantine fault tolerance, state machine replication and blockchains and
The HotStuff approach to BFT (HotStuff is a successor to Tendermint).

Where are we?

We proved on paper that Tenderbake is correct, in a sense made formal by Theorems 5 and 6 of that paper.
Then, we worked on a prototype to test our approach. This prototype took the form of a “demo” economic protocol, which includes just the consensus algorithm and a basic account system (and nothing more: for example, no rolls, delegation, or smart contracts).
We are now close to a fully-featured economic protocol: a modified version of Delphi that uses Tenderbake instead of Emmy⁺.
A private testnet is already running this modified protocol. After fixing some observed issues, we plan to make this testnet public.

Conclusions

So that’s it! Tenderbake is classical BFT-like, whereas Emmy⁺ is Nakamoto-like. Emmy⁺ has probabilistic finality, making it more live but less safe, whereas Tenderbake has deterministic finality, making it safer but less live. Tenderbake also has a noticeably shorter time to finality.

We’re testing it now: so watch this space.

Recall our network model assumes that we are always either: in a synchronous period, when there exists a global bound $\delta$ such that for every pair of nodes on the Tezos network, messages get delivered within time $\delta$; or in an asynchronous period, when they don’t, but asynchronous periods only ever last finite time.

Say a network has loosely synchronized clocks when, for every synchronous period, there exists some constant $\rho$ such that for any blockchain participant the time error (also called “clock drift”: the difference between the real time and the participant’s local clock) is bounded by $\rho$.

The values of $\delta$ and $\rho$ in the two paragraphs above are a priori unknown.²

During an asynchronous period, Tenderbake does not assume any bound on clock drift. This is reasonable: if network delays are unbounded then it is reasonable to suppose that timing messages might be arbitrarily delayed. Conversely, during a synchronous period when there is a global bound on message delays, we may assume that timing messages arrive after a bounded delay, so that clock drift is also bounded.

Technical note: There is a stability assumption in the implemented system, that the duration of the first round must be greater than the clock drift $\rho$. This guarantees that a validator can ignore all messages that do not match its current round, which ensures that message buffers remain bounded, and the validator cannot be spammed with irrelevant messages. ↩
‘Unknown’ here means that $\delta$ and $\rho$ are parameters: an external observer might be able to observe the system and calculate or deduce a value of $\delta$ or $\rho$, but no algorithm within the system itself is allowed to depend on them having any particular value. See also a similar discussion for partial synchrony. ↩
See Section II.A of the Tendermint paper. ↩
The block header contains other useful metadata, including: a hash of its corresponding block contents as mentioned above, a hash of its predecessor block, the level, the round, and a timestamp. This absolutely doesn’t matter for this blog post, but if you’re reading this footnote then you’re probably the kind of person who would want to know. You asked your parents to read the encyclopaedia to you at bedtime instead of “made-up stories”, too, didn’t you? ↩
A Tezos “operation” is just the basic unit of information in Tezos. An operation is labelled by a kind (called a “pass” in the source code) which is an integer such that $0$ indicates “this operation is a consensus message” and any other number indicates “this may be something else”. It is up to the protocol to decide how to interpret operations — an operation is first created, then injected into some node’s mempool, transmitted on the network, and finally possibly (but not necessarily) inserted in a block. So there is nothing particularly fancy about encoding prevotes and precommits: they are just data which can be encoded as operations⁶ and — maybe, maybe not — baked into a block on the blockchain. ↩
We could map Tendermint prevotes to blocks (instead of operations), but it would be inefficient. The algorithm requires a quorum of prevotes, so we’d need $2f+1$ blocks to reach a quorum, where $n=3f+1$ is the number of validators; that is, roughly two thirds of $n$. With $150$ validators, prevotes would generate about $100$ blocks just to reach consensus on the next ‘real’ block to be added.

It might be possible to map block proposals to operations. This was discussed, but it would require changing the shell: a proposal would have to be a special bundled operation that behaves almost like a block; when a node sees a proposal operation it would need to ask its neighbors to provide the operations contained in the bundle (the bundle operation would not actually contain operations but just pointers to, i.e. hashes of, operations). Changing the shell is a big deal in any case, and simply reusing the native Tezos block proposal mechanism is clearly easier, so proposals are blocks and not operations. ↩
If blocks were years then a block of level 1984 would be the year 1985 AD. Fun fact: our calendar goes from 1 BC to 1 AD. There is no year zero. ↩
The running economic protocol makes a random choice from the available delegates (weighted by their stake) to be validators for the current block level. To be quite precise, at each level, $n$ rolls are selected at random and their owners are the validators for that level. ↩
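Roll-based selection can be sketched as below. The sketch assumes rolls are drawn uniformly with replacement, so one delegate can hold several slots, and it uses Python's `random.Random` with an explicit seed purely for reproducibility; the protocol's own source of randomness is not modelled here, and `select_validators` is a hypothetical name.

```python
import random
from collections import Counter


def select_validators(rolls_by_delegate: dict, n: int, seed: int) -> list:
    """Pick n validator slots for one level by drawing n rolls at random.

    Each roll is equally likely, so a delegate's expected number of slots
    is proportional to its stake (its number of rolls).
    """
    # One list entry per roll, labelled with the delegate who owns it.
    rolls = [d for d, k in sorted(rolls_by_delegate.items()) for _ in range(k)]
    rng = random.Random(seed)
    # Draw with replacement: the same delegate may own several slots.
    return [rng.choice(rolls) for _ in range(n)]


slots = select_validators({"alice": 30, "bob": 10}, n=32, seed=2020)
# On average, alice (3/4 of the rolls) holds about 3/4 of the slots.
print(Counter(slots))
```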
Emmy⁺ uses a similar mechanism here. ↩
“Non-consensus operations” means operations other than preendorsements and endorsements. Notably, non-consensus operations include transactions, which for a consensus algorithm are just data, but which for the end user are the whole point of the exercise. By block contents we mean all operations other than preendorsements and endorsements. (Pre)endorsement operations contain the hash of the putative “block contents”, not the operations themselves. ↩
An aside on quorums and the quorum intersection property.
Suppose for simplicity that everybody has the same stake, so we can ignore stake-based weightings. Suppose there are $n=3f+1$ validators, where $f$ is the number of incorrect, Byzantine validators; the standard assumption is that fewer than one third of validators are Byzantine, so taking $n=3f+1$ is the worst case. A quorum is a set of $2f+1$ signatures. Then by the pigeonhole principle, any two quorums intersect in at least $2(2f+1)-(3f+1)=f+1$ signatures, and since at most $f$ validators are Byzantine, at least one of these is the signature of a correct validator. This is called the quorum intersection property. ↩
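The pigeonhole argument can be checked exhaustively for small $f$. This brute-force sketch assumes equal stakes, as in the footnote, and enumerates every pair of distinct quorums of size $2f+1$ among $n=3f+1$ validators.

```python
from itertools import combinations


def min_quorum_intersection(f: int) -> int:
    """Smallest overlap between two distinct quorums of 2f+1 out of n = 3f+1."""
    n, q = 3 * f + 1, 2 * f + 1
    quorums = [frozenset(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a, b in combinations(quorums, 2))


for f in range(1, 4):
    overlap = min_quorum_intersection(f)
    # With at most f Byzantine validators, overlap > f guarantees that
    # at least one validator in the overlap is correct.
    print(f, overlap, overlap > f)
```

Each line printed shows an overlap of exactly $f+1$, matching $2(2f+1)-(3f+1)$.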
By the quorum intersection property, there cannot be two endorsement quorums on two different block contents at the same round, because then a correct validator would have endorsed two different block contents at the same round, which is forbidden by the algorithm. This observation is at the core of the proof that there cannot be two different blocks agreed upon at the same level. ↩
Anyone can check that the endorsements are produced by the right delegates, that is, by the validators for the corresponding level. This is because validators are known well in advance and their public keys are available on-chain. ↩
For each validator, there is a final round, but final rounds may differ for different validators, and different validators may execute the loop a different number of times. So rounds are a per-validator quantity. ↩
The property that agreement is eventually reached is expressed precisely by a termination property — which in the context of a blockchain becomes a progress property (each level terminates, so we make progress). An attacker could try to delay the network at a rate calculated to be just a little more than the increase in round durations, so that for each round $r$ the proposer at round $r$ times out, though our network model assumes this cannot go on forever: any asynchronous period must be bounded. Following our discussion above of how classical BFT-style consensus algorithms (like Tendermint and Tenderbake) favour safety over liveness, we can draw the following chain of informal entailments: “asynchronous periods assumed bounded” $\Rightarrow$ “we can safely err on the side of safety” $\Rightarrow$ “deterministic finality”. ↩
Tendermint has immediate finality, meaning deterministic finality after just one block: once a block that was proposed in a proposal message is agreed upon, it is final. Tenderbake requires the marginally slower deterministic finality after two blocks because of its stronger timing assumptions: in order for validators to synchronize in their current round, proposal messages contain a round identifier, and in Tenderbake, proposal messages are encoded as blocks, thus blocks contain round identifiers — however, validators may take their decision at different rounds, so their decision rounds may differ! Thus, Tenderbake has more to agree on than Tendermint: its committee of validators must first agree on the “true” decision round to be able to agree on which is the “true” block. ↩

POPL 2021 retrospective

2021-02-01T20:00:00+01:00

About POPL (Symposium on Principles of Programming Languages)

POPL 2021 — the 48th ACM SIGPLAN Symposium on Principles of Programming Languages — is a premier annual conference event of the Programming Languages research community. POPL consists of a main conference, and many colocated events, which together present the latest and greatest research in (amongst other topics):

programming languages theory,
formal verification,
type systems, and
functional programming

All of these subjects are dear to the Tezos community, and they are the bread-and-butter of the daily work at Nomadic Labs of writing some of the most complex and safety-critical code in commercial practice.

POPL took place from 17th to 22nd January 2021. It was supposed to be in Copenhagen, Denmark, but due to the pandemic it was virtualised with a semi-synchronous program catering for multiple time zones. See the full symposium events page online and the schedule.

Our contributions

Nomadic Labs was actively involved throughout the week:

On Monday 18 Jan, Arvid Jakobsson, one of our experts in formal verification of smart contracts, presented the formalisation of the Dexter decentralised exchange contract in the Mi-Cho-Coq framework at the CPP 2021 Lightning Talks session³.
On Wednesday 20 Jan, Michel Mauny, Nomadic Labs’ CEO, made a short presentation of our company at POPL’s Sponsor Reception, which also featured a contributed video presenting a general overview of Tezos by Arthur Breitman.
On Friday 22 Jan, Germán Delbianco, one of our software verification experts, together with our collaborators from IMDEA Software Institute and UCM, presented an article on new developments in the algebraic foundations of Concurrent Separation Logics at POPL⁴.

We were also pleased to find further research works backed by Tezos Foundation grants in the programs: Xuanrui Qi and Jacques Garrigue from Nagoya University, in Japan, presented advances towards specifying OCaml GADTs in Coq, a work funded through the Certifiable OCaml Type Inference (COCTI) grant¹.

Sponsorship

The Tezos Foundation was a platinum sponsor of POPL 2021, for the third consecutive year.
We at Nomadic Labs are pleased and proud to have sponsored the colocated CPP 2021 (Conference on Certified Programs and Proofs), for the second consecutive year.

Highlights

Here are our selected highlights²:

POPL 2021

The POPL program ranges from theoretical foundations of programming languages (e.g. semantics and type systems), to the development and application of formal tools and techniques for creating and verifying reliable and correct systems.

Selected examples include:

formalised optimisations for quantum circuits;
programming with and reasoning about algebraic effects;
formal verification of shared memory concurrency and probabilistic programs; and
how to make cryptographic primitives secure under speculative executions.

Reliable blockchains and other distributed systems were the focus for two POPL21 invited keynotes:

CPP 2021 (Conference on Certified Programs and Proofs)

As mentioned above, we sponsored CPP 2021. CPP covers mechanised verification efforts and tools, including:

the formalisation of mathematics,
certified algorithms, and
the development of new proof techniques and frameworks for certified programming.

A highlight of this year’s program was the keynote talk by Peter Sewell (University of Cambridge) recounting past, present, and future formalisation efforts towards formally specified, mechanised hardware architectures in the context of the CHERI project, and their impact on the development of existing and future hardware architectures.

Blockchain-related topics like certified frameworks for programming correct smart contracts, or mechanised proofs of consensus algorithms, frequently appear in the program. This year, presentations included:

advances in the ConCert framework to verify, test, and extract certified smart contracts in Coq (including extraction to Liquidity and plans to target CameLIGO in the future);
ongoing efforts to certify Tendermint using TLA+;
and our own lightning-talk presentation on the verification of the Dexter contract.

CoqPL 2021 (Seventh International Workshop on Coq for Programming Languages)

Amongst the other colocated events on offer during the week (like VMCAI, or LAFI), we took time to attend CoqPL 2021, the Seventh International Workshop on Coq for Programming Languages.

The Coq proof assistant is one of Nomadic Labs’ most-used verification tools, as witnessed by the Mi-Cho-Coq framework, the coq-of-ocaml project, and its use in our contributions to this year’s CPP and POPL.

The Tezos Foundation supports the development of Coq and of its ecosystem, and the CoqPL workshop provides an opportunity to interact with the Coq development team, learn about the recently released features and what’s coming next in the pipeline, and other recent library developments.

See you next year!

POPL 2022 is planned for Philadelphia, PA. We look forward to catching up with you all again there — hopefully, in person.

You can find the COCTI code here. ↩
At time of writing, the entire program is publicly available on Clowdr and should soon be uploaded to SIGPLAN’s YouTube channel. ↩
Arvid Jakobsson, Colin González, Bruno Bernardo, and Raphaël Cauderlier. Formally Verified Decentralized Exchange with Mi-Cho-Coq. Contributed Lightning Talk to CPP 2021. Thanks also to Kristina Sojakova (INRIA) for her contributions to the formalisation effort. ↩
František Farka, Aleksandar Nanevski, Anindya Banerjee, Germán Andrés Delbianco, and Ignacio Fábregas. On Algebraic Abstractions for Concurrent Separation Logics. Proc. ACM Program. Lang. 5, POPL, Article 5 (January 2021), 32 pages. https://doi.org/10.1145/3434286 ↩

A review of Nomadic Labs in 2020

2021-01-08T08:01:00+01:00

2020 is over and in spite of the difficulties which the year presented, we at Nomadic Labs got a lot done. So here’s who we are, and what we accomplished in 2020:

Nomadic Labs in a nutshell

Nomadic Labs is an international technical company dedicated to evolving the Tezos ecosystem. Tezos is a community-driven proof-of-stake self-evolving blockchain platform that adapts and adopts new features and enables borderless global cooperation.

Let’s unpack that:

Community-driven: The Tezos community is a global community of users, researchers, and adopters (see Tezos Commons and Tezos Agora).
Proof-of-stake: The Tezos blockchain is based on a proof-of-stake principle, which is low-power, inclusive, and environmentally sustainable. Indeed, a Tezos node can run on a Raspberry Pi (and here’s a howto!).
Self-evolving: The Tezos blockchain protocol is flexible, democratic and adaptable. We mean this in a specific technical sense — thanks to a built-in voting mechanism, the Tezos community of users can vote to update the protocol, and it regularly does.

So Nomadic Labs contributes to a broad ecosystem dedicated to creating a resilient and global blockchain platform with associated tools. We aim to serve science, society — and the dignity and privacy of productive work in a new technological age.

You can find out more about us here:

Discover Nomadic Labs in 2 min (video)
This short interview of our CEO by TheCoinTribune
Twitter: @LabosNomades
Our gitlab repo
Our technical blog

Culture and growth

At the start of 2020 we had an office-based culture, based in beautifully-situated offices in the heart of Paris.

Then, like everybody else, we adapted to social distancing and, as necessary, to working from home until the pandemic is over. That this adaptation went smoothly was due to genuinely hard work by our administrative staff and to a cohesive and friendly company culture¹, and we are thankful that we were able to flourish and grow throughout 2020:

We started 2020 with 39 full-time employees,
we ended it with 59 full-time employees, and
we have plans for further growth in 2021.

It’s also worth mentioning how well the Tezos blockchain itself withstood the 2020 stress-test. There were no hiccoughs and no stalls. Nomadic Labs — and the blockchain which is its raison d’être — have been stable in a time of crisis. The technology worked, and the community continued to grow.

Seminars and events

In spite of the pandemic we were able to hold many meetings. Highlights include:

We held the Tezos Developers Day (shortly before the first lockdown on 17 March).
You can enjoy the videos here.
We co-organised a Journée Scientifique (a one-day research workshop) with INRIA on 21 September 2020.
See the full programme with slides.
We started a series of Nomadic Labs research seminars.
Topics so far have included practical proofs using Juvix, verifying smart contracts using Mi-Cho-Coq, efficient data storage on the blockchain using Plebeia, and adding multicore programming to OCaml. More to come in 2021!
We ran a four-day immersive Tezos training course in February 2020.
Topics included the Tezos blockchain architecture; consensus/privacy fundamentals; smart contract languages; indexers; and hands-on dapp (distributed application) building. Speakers from Nomadic Labs, LIGOlang, Blockwatch Data Inc., ECAD Labs Inc., and Tezos Ukraine presented to approximately fifty participants with backgrounds ranging from software development to the financial sector. Write to training@nomadic-labs.com to learn how Nomadic Labs could help train you.

Notable projects and achievements

The Mi-Cho-Coq project connects Michelson smart contracts to the Coq proof assistant.²
Writing smart contracts is safety-critical (see a detailed taxonomy of possible errors) and Mi-Cho-Coq is an important step to help bring cutting-edge mathematics and computer science to bear on assuring safe and correct behaviour of Michelson smart contracts.
We contributed to specifying and verifying the Dexter exchange (see also our recent blog post).³
This is a smart contract on the Tezos blockchain, for exchanging cryptocurrency tokens that are compliant with the FA1.2 standard — this is similar to what Uniswap does on Ethereum with the ERC20 standard. A robust on-chain exchange mechanism is an important step towards making Tezos a globally useful cryptocurrency platform.

Adoption

We now have dedicated adoption and support teams!

These teams are devoted to helping people and organisations to make the step into the Tezos ecosystem. You can watch a short and clear video on stablecoins by our adoption team lead.

We have attracted institutional bakers

A “baker” on Tezos is a stakeholder (a blockchain participant) that validates operations and adds them to the Tezos blockchain (for which it is rewarded). At time of writing, Tezos has a diverse community of more than 400 bakers worldwide.

We are proud to announce that in 2020, our adoption and support teams provided support which helped three large institutions become bakers⁴ on the Tezos blockchain:

EDF (Électricité de France; a multinational utility company with 340 billion USD in assets) has become an institutional baker, via a subsidiary Exaion which it founded in January 2020.
See the press release, and see Exaion’s official baking record.
Sword France (a technological and digital transformation group founded in 2000) has become an institutional baker. As Alain Broustail (Sword Blockchain Director) said: “we have been working with [Tezos] for over a year. [It] provides security guarantees, transparency and strong adaptability, which makes it very attractive”.
See the press release. Sword is also participating in the Tezos Digisign project.
Smart Node (a staking company) has become an institutional baker. If you hold tez, you can delegate your stake to Smart Node, who will bake using the stake you delegated and return rewards proportional to it. See the announcement, and see Smart Node’s official baking record.

We helped industrial partners to launch Tezos-based projects

Société Générale (a bank with 1.7 trillion USD in assets) selected a pool of technology providers, including Nomadic Labs, to experiment with the use of Central Bank Digital Currency (CBDC)¹¹ for interbank settlements.
Specifically, this project will explore the feasibility of financial securities being digitally settled and delivered in CBDC.
See the press release.
Sword group introduced Tezos Digisign, a free and open source tool to sign, certify and verify the authenticity of digital documents.
This tool is already in production with a client and is currently being integrated with several market ECM (Electronic Content Management) packages. The source code of Tezos Digisign is on Gitlab.
See the press release.
Logical Pictures is launching 21 Content Ventures, an investment vehicle with a 100 million Euro maximal capacity (minimum investment 100 thousand Euro) to invest in coproducing films and TV series, with an emphasis on international content (e.g. from the Cannes, Toronto, and Sundance festivals).⁵ A particularity of this investment vehicle is that it will be tokenized on the Tezos blockchain. This means that
- each title will be represented on the Tezos blockchain by a security token, and
- fundraising will take the form of a Security Token Offering (STO).
The future portfolio of films and TV series will thus be digitized, which will offer more liquidity and transparency than a traditional share of funds.
See the press release in English and in French.

Protocol upgrades

We contributed to not one but two Tezos protocol upgrades during 2020 (Carthage and Delphi) and we proposed a third (Edo):

The Tezos blockchain contains a mechanism to upgrade the protocol and thus change how the blockchain works by community vote, so each successful protocol upgrade is making history in the world of blockchain evolution.⁶

Tezos protocols are traditionally named after ancient cities. Thanks to Metastate for a nice timeline of protocol upgrades:

5 March 2020.
Carthage (block height 851,969; cycle 208; changelog; significance of the upgrade).
12 November 2020.
Delphi (block height 1,212,417; cycle 296; changelog; significance of the upgrade).

The next proposed protocol upgrade is Edo; see an accessible explanation of what is proposed (essentially: bugfixes, Zcash Sapling integration, and a ticket system for smart contracts).

Associated to the Edo protocol upgrade is also a new release candidate of the protocol environment, numbered “Version 1”. The protocol environment is the set of functions that a protocol can use — a dedicated library which includes cryptographic primitives and other useful functions (packaged as an OCaml module). This is a backendish⁷ but significant piece of work: until now all protocols have used “Version 0” of the protocol environment.

Research and development

We started a research partnership with IMDEA. The emphases are on program analysis and verification, distributed consensus, resource consumption and performance, and security and privacy.
See the press release.
We started a research partnership with INRIA. This cooperation takes the form of a grants framework to support blockchain research, the results of which will be made publicly available.
So far, four research initiatives have been funded, employing ten researchers and two engineers. The initiatives relate to changes to OCaml and its compiler, and to the semantics of the F* verification system. See the press release.

A Journée Scientifique with INRIA was held on 21 September 2020, for which a full programme with slides is available.
We advertised ten internships, which are open at the time of writing this summary. Feel free to send in an application!
We spawned several test networks in 2020. A test network (testnet, for short) is a ‘mock’ blockchain where the rules are tweaked to allow experimentation e.g. before pushing a protocol upgrade. The test networks were:
- Dalphanet.
  This was spawned during summer 2020 and is now closed. It was used to test Sapling integration and baking accounts.
- Delphinet.
  This was used to test the Delphi protocol upgrade (which was voted for and accepted on 12 November 2020). See also a full account of the significance of this upgrade.
- Edonet.
  This is a current test network being used to test the Edo protocol.
We commonly open-source the tools we use to develop our software. For example, we released mockup mode, a tool in the tezos client for experimenting with (parts of) the essential API of a Tezos node without having to run a local blockchain or maintain a consensus algorithm. This has been especially useful to our developers working on smart contracts. We extensively documented this new tool’s usage.
We considerably extended our analysis of Emmy⁺.
Emmy⁺ is the current Tezos consensus algorithm, so it is important that we understand its behaviour both in theory and empirically. We extended the initial analysis from 2019 and presented the results as follows:
- A study of malicious reorgs.
- A study of partial synchrony.
- A study of mixed forks.
We proposed Tenderbake: an adaptation of the Tendermint algorithm for Tezos, in collaboration with CEA-List.
We have a Tenderbake prototype and are integrating it in the economic protocol as a next-gen upgrade from Emmy⁺.⁸ We expect to release a Tenderbake testnet in 2021.
If you want to design your own Tezos protocol, these two blog posts might help: part 1 and part 2.
We’re developers, and developers test their programs. But how do you know your coverage is good — that your tests cover the important cases? Well, here are two posts on our use of testing tools:
- pytest and
- bisect_ppx.
There’s an open internship if you’re interested, and you can get a feel for how this work gets done in practice by looking at this merge request for applying the TZT framework to test Michelson expressions.
We formalized most of the Tezos economic protocol in Coq, using the coq-of-ocaml tool, itself developed at Nomadic Labs. This is part of a larger project aiming to formally verify the Tezos implementation itself, meaning to formally specify its correct behaviour and then check, using a computer, that the implementation satisfies that specification.
We presented Albert, an intermediate smart contract programming language compiled to Michelson.¹⁰ Albert is an imperative language with variables and records — abstracting away from Michelson’s stack-based paradigm — intended as a compilation target for higher-level smart contract programming languages. See the paper in WTSC 2020. Albert’s compiler is written in Coq, which means that we can certify it: currently, the backend optimizer is certified.⁹

Introducing mockup mode for tezos-client

2020-12-21T15:00:00+01:00

We are pleased to announce that the tezos-client binary has a new feature aimed at contract and tool developers alike: the mockup mode.

Mockup mode allows easy prototyping of Tezos applications and smart contracts locally. By local we mean:

The relevant data files sit in a directory on your computer’s local filesystem.
These files are a lightweight emulation of the internal state of a Tezos single-node network.
Thus, the networking infrastructure that a node would be wrapped in has been stripped away.
Likewise, the consensus mechanisms that would be needed for a live blockchain with a network and multiple nodes have been stripped away.
The state is directly accessible and modifiable, since it’s just files on your computer.
You don’t need (complex) setups of node-client interactions. There is no infrastructure aside from your own filesystem with a bundle of state files in a directory. It’s easy to inspect these files, modify them, and play with different configurations and protocols.
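As a toy illustration of the “state is just files” idea, the sketch below contrasts a function that only maps inputs to outputs with one that loads and saves a state file in a base directory. Everything here is hypothetical (the `state.json` file, the balances layout, the function names): it models the idea only, not tezos-client’s actual files or code.

```python
import json
import tempfile
from pathlib import Path


def transfer(balances: dict, src: str, dst: str, amount: int) -> dict:
    """Pure update: returns new balances and writes nothing to disk."""
    if balances.get(src, 0) < amount:
        raise ValueError("insufficient balance")
    new = dict(balances)
    new[src] = new.get(src, 0) - amount
    new[dst] = new.get(dst, 0) + amount
    return new


def transfer_in_dir(base_dir: Path, src: str, dst: str, amount: int) -> dict:
    """Stateful variant: load state from base_dir, update it, save it back."""
    state_file = base_dir / "state.json"  # hypothetical file name
    balances = json.loads(state_file.read_text()) if state_file.exists() else {}
    balances = transfer(balances, src, dst, amount)
    state_file.write_text(json.dumps(balances))
    return balances


with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    (base / "state.json").write_text(json.dumps({"alice": 100}))
    transfer_in_dir(base, "alice", "bob", 30)
    transfer_in_dir(base, "bob", "carol", 10)
    # The state survived between calls, and is an ordinary file you can open.
    final = json.loads((base / "state.json").read_text())
    print(final)
```

Because the state is a plain file, inspecting or editing it between calls needs nothing more than a text editor.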

If a single sandboxed node were an apricot then mockup mode would be the apricot kernel, and we could write:

    mockup mode = kernel_of(one sandboxed Tezos node)

Our motivation in building mockup mode was to give our developers, who are building and testing Tezos smart contracts, an easy local environment offering a fast development cycle which needs only lightweight local state files, and which does not require a running blockchain. Now, we’d like to share the joy of this new tool with you.

The features described below are available on the master branch, and will be included in version 8.0.¹

This post is a practical guide to mockup mode’s features (see also the documentation). Be prepared for lots of command-line snippets, which you are welcome to run for yourself!

Overview
Run a mockup client in stateless mode
- Typecheck and evaluate scripts
- Query available mockup protocols
Run a mockup client with state
Tune mockup parameters
- Context state
Running a mockup client with asynchronous state
- Use
- Differences from sandboxed mode
Conclusions

Overview

The basic command: `tezos-client`

tezos-client is the main tool for advanced user interaction with the Tezos blockchain.

tezos-client can prepare transactions; evaluate, typecheck and originate contracts; and encode/decode data when interacting with nodes. It also acts as a wallet, allowing users to sign arbitrary data, including, of course, transactions.

The mockup mode of tezos-client supports these operations (with slight limitations²), with the convenience that it does not need to be connected to a live Tezos node. All operations are local, in the sense above.

tezos-client in mockup mode does two things to compensate for not communicating with the live network:

It allows the user to specify dummy values for required initialisation parameters that would usually be gathered from a live node, and invents such values if none are specified. Examples include the head of the chain and the client’s network identifier.
It simulates activation of the protocol, and runs local implementations of the RPCs (Remote Procedure Calls).

Three modes of operation

Mockup mode can run in three ways:

Stateless mode.
In this mode, tezos-client operates on its inputs and returns a value. Nothing is written to disk, and no state is preserved between calls to the client. This is the default.
Stateful mode.
In this mode, tezos-client creates or manipulates a state on disk. The switch for this is --base-dir <directory_name>; see the example in Run a mockup client with state.
Stateful asynchronous mode.
This mode adds baking. The state is created by passing --asynchronous to create mockup, in addition to --base-dir <directory_name>; see the example in Running a mockup client with asynchronous state.

Capabilities of mockup mode

The current implementation of mockup mode can:

Typecheck, serialize, sign and evaluate a contract.
These features work in stateless mode.
Perform transactions, originations, and contract calls — mimicking sandboxed mode but without a node.
These features require a state.
Register delegates and bake blocks.
These features require an asynchronous state.

In practice we find it simplest to just use state and remember to delete it between sessions, but your needs may vary and the tool will accommodate them.

We will now consider the capabilities in more detail.

Run a mockup client in stateless mode

Typecheck and evaluate scripts

The mockup mode can typecheck and evaluate scripts. Let’s try this on a script hardlimit.tz, which you can download from the Tezos master branch or create locally as follows:

$ cat > hardlimit.tz <<EOF
parameter unit ;
storage int ;
code { # This contract stops accepting transactions after N incoming transactions
       CDR ; DUP ; PUSH int 0 ; CMPLT; IF {PUSH int -1 ; ADD} {FAIL};
       NIL operation ; PAIR} ;
EOF

Typechecking a script:

$ tezos-client --protocol ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK \
  --mode mockup typecheck script hardlimit.tz

Well typed  
Gas remaining: 1039988.27 units remaining

Evaluating a script:

$ tezos-client --protocol ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK \
  --mode mockup run script hardlimit.tz \
  on storage '2' and input 'Unit'

 storage
  1
 emitted operations

 big_map diff

Without the --protocol option, the mockup mode will choose a protocol for you.³
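For readers less used to Michelson’s stack, here is a hypothetical Python model (not part of Tezos or tezos-client) of what hardlimit.tz computes: the storage acts as a counter that is decremented on each call, and the contract fails once the counter reaches zero. The names hardlimit and ContractFailure are illustrative only.

```python
class ContractFailure(Exception):
    """Stands in for Michelson's FAIL instruction (illustrative only)."""


def hardlimit(parameter, storage: int):
    """Model of hardlimit.tz: returns (emitted operations, new storage)."""
    # CDR: drop the parameter (Unit), keep the storage counter.
    counter = storage
    # DUP ; PUSH int 0 ; CMPLT: is 0 < counter?
    if 0 < counter:
        # PUSH int -1 ; ADD: decrement the counter.
        new_storage = counter - 1
    else:
        # FAIL: the transaction is rejected.
        raise ContractFailure("transaction limit reached")
    # NIL operation ; PAIR: no emitted operations, paired with the new storage.
    return [], new_storage


# Mirrors `run script hardlimit.tz on storage '2' and input 'Unit'`.
operations, new_storage = hardlimit(None, 2)
print(new_storage)  # 1
```

The printed value matches the storage of 1 reported by run script above.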

Query available mockup protocols

We can query the list of the Tezos protocols that mockup mode supports:

$ tezos-client list mockup protocols

As this article went to print (so to speak), this command returns three protocol identifiers (ignoring any warnings):

ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK
PsDELPH1Kxsxt8f9eWbxQeRxkjfbxoqM52jvs5Y5fBxWWh4ifpo
PsCARTHAGazKbHtnKfLzQg3kms52kSRpgnDY982a9oYsSXRLQEb

A word on the protocol for naming protocols: The Tezos protocol IDs above are based on hashes, but the start of each ID hints at the release name of the corresponding protocol. The three items above correspond to protocols called alpha (a development version of the Tezos protocol), Delphi, and Carthage.

Getting these IDs matters because a Tezos blockchain requires a protocol; in particular, setting up a mockup state requires us to choose one. The list above tells us what’s available.

Run a mockup client with state

Giving the mockup client some state gives access to more of its functionality. In particular, given a state we can operate on it, including:

transferring Tez cryptocurrency tokens (ꜩ),
originating (deploying) contracts,
importing keys, and
querying balances or (more generally) making RPC queries on the chain’s current state.

A useful command alias: `mockup-client`

A shell alias will let us call tezos-client with --mode mockup and --base-dir /tmp/mockup, and so save us keystrokes later:

$ alias mockup-client='tezos-client --mode mockup --base-dir /tmp/mockup'

Our first state

Time to make a mockup session with some state!

Making the state

$ mockup-client --protocol ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK create mockup

Created mockup client base dir in /tmp/mockup
Tezos address added: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
Tezos address added: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN
Tezos address added: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
Tezos address added: tz1b7tUupMgCNw2cCLpKTkSD1NZzB5TkP2sv
Tezos address added: tz1ddb9NMYHZi5UzPdzTZMYQQZoMub195zgv

Note that:

The state is stored in /tmp/mockup because of the --base-dir option in mockup-client.
The switch --protocol ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK means that this mockup session will use protocol alpha for all subsequent commands on this state (see next point).
Mockup mode does not support protocol updates⁴, so if we want a new protocol we need to start from a new state. Thus, for this session we are stuck with our initial choice of alpha. In the near future, mockup mode will support protocol migrations to facilitate protocol switching.

The output above confirms that:

A mockup state data directory /tmp/mockup has been created. Data is
- in /tmp/mockup for non-mockup-specific elements (like accounts), and
- in /tmp/mockup/mockup for mockup-specific data (like the mempool, trashpool, and context; see Running a mockup client with asynchronous state for details).
Five accounts have been added, and their addresses are listed.

The five accounts are called bootstrap1 to bootstrap5 (see command below). The reader familiar with Tezos’ sandboxed client may recognize them as the preconfigured bootstrap1 to bootstrap5 accounts which it creates.

List known addresses

We can now use standard commands, e.g., to list known addresses.

$ mockup-client list known addresses

bootstrap5: tz1ddb9NMYHZi5UzPdzTZMYQQZoMub195zgv (unencrypted sk known)
bootstrap4: tz1b7tUupMgCNw2cCLpKTkSD1NZzB5TkP2sv (unencrypted sk known)
bootstrap3: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU (unencrypted sk known)
bootstrap2: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN (unencrypted sk known)
bootstrap1: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx (unencrypted sk known)

Transfer tokens

We can execute the canonical example of a transfer from one address to another.⁵

$ mockup-client transfer 100 from bootstrap1 to bootstrap2

Node is bootstrapped.
Estimated gas: 1427 units (will add 100 for safety)
Estimated storage: no bytes added
Operation successfully injected in the node.
Operation hash is 'ooVjVsPgUuy4grpDBbKr5QPc667JCQ6nbMeqeTjqiRzXiCiy5e9'
NOT waiting for the operation to be included.
Use command
  tezos-client wait for ooVjVsPgUuy4grpDBbKr5QPc667JCQ6nbMeqeTjqiRzXiCiy5e9 to be included --confirmations 30 --branch BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU
and/or an external block explorer to make sure that it has been included.
This sequence of operations was run:
  Manager signed operations:
    From: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
    Fee to the baker: ꜩ0.000404
    Expected counter: 1
    Gas limit: 1527
    Storage limit: 0 bytes
    Balance updates:
      tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ................ -ꜩ0.000404
      fees(the baker who will include this operation,0) ... +ꜩ0.000404
    Transaction:
      Amount: ꜩ100
      From: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
      To: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN
      This transaction was successfully applied
      Consumed gas: 1427
      Balance updates:
        tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ... -ꜩ100
        tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN ... +ꜩ100

Let’s check that the transfer has been registered:

First, we check that the sender bootstrap1 has indeed paid the transfer amount, plus fees:
```
$ mockup-client get balance for bootstrap1
```
```
3999899.999596 ꜩ
```
Second, we check that the receiver bootstrap2 indeed has an extra 100 ꜩ in its balance:
```
$ mockup-client get balance for bootstrap2
```
```
    4000100 ꜩ
```
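These two balances can be checked by hand. The sketch below does the arithmetic in integer mutez (1 ꜩ = 1,000,000 mutez) to avoid floating-point rounding, using the initial bootstrap balance of 4000000 ꜩ, the 100 ꜩ amount and the 0.000404 ꜩ fee from the output above. The helper names to_mutez and to_tez are our own, not part of tezos-client.

```python
from decimal import Decimal

MUTEZ_PER_TEZ = 1_000_000


def to_mutez(tez: str) -> int:
    """Convert a decimal tez amount such as "0.000404" to integer mutez."""
    return int(Decimal(tez) * MUTEZ_PER_TEZ)


def to_tez(mutez: int) -> str:
    """Format integer mutez as tez, dropping trailing zeros as the outputs above do."""
    whole, frac = divmod(mutez, MUTEZ_PER_TEZ)
    if frac == 0:
        return str(whole)
    return f"{whole}.{frac:06d}".rstrip("0")


initial = to_mutez("4000000")
amount = to_mutez("100")
fee = to_mutez("0.000404")

# The sender pays the amount plus the fee; the receiver gets only the amount.
print(to_tez(initial - amount - fee))  # 3999899.999596
print(to_tez(initial + amount))        # 4000100
```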

Something more advanced: interacting with contracts

We developed mockup mode as a safe environment to develop and test Michelson smart contracts.

To interact with a contract, we must first originate (deploy) it. Let’s add a dummy contract to our mockup state:

$ mockup-client originate contract dummy transferring 100 from bootstrap1 running \
  'parameter unit; storage unit; code { CAR; NIL operation; PAIR}' --burn-cap 10

Node is bootstrapped.
Estimated gas: 1589.562 units (will add 100 + 36 for safety)
Estimated storage: 295 bytes added (will add 20 for safety)
Operation successfully injected in the node.
Operation hash is 'oor3iMLau7g9K78pWTrAETx5KwrE7jTWYR7euh2MC6pqReV8aX7'
NOT waiting for the operation to be included.
Use command
  tezos-client wait for oor3iMLau7g9K78pWTrAETx5KwrE7jTWYR7euh2MC6pqReV8aX7 to be included --confirmations 30 --branch BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU
and/or an external block explorer to make sure that it has been included.
This sequence of operations was run:
  Manager signed operations:
    From: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
    Fee to the baker: ꜩ0.000441
    Expected counter: 2
    Gas limit: 1726
    Storage limit: 315 bytes
    Balance updates:
      tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ................ -ꜩ0.000441
      fees(the baker who will include this operation,0) ... +ꜩ0.000441
    Origination:
      From: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
      Credit: ꜩ100
      Script:
        { parameter unit ; storage unit ; code { CAR ; NIL operation ; PAIR } }
        Initial storage: Unit
        No delegate for this contract
        This origination was successfully applied
        Originated contracts:
          KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j
        Storage size: 38 bytes
        Paid storage size diff: 38 bytes
        Consumed gas: 1589.562
        Balance updates:
          tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ... -ꜩ0.0095
          tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ... -ꜩ0.06425
          tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx ... -ꜩ100
          KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j ... +ꜩ100

New contract KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j originated.
Contract memorized as dummy.
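Apart from the fee and the 100 ꜩ credit, bootstrap1 is debited twice for storage. The sketch below reproduces those debits under two assumptions: cost_per_byte is 250 mutez (the default shown by config show in the next section), and the 295 bytes of estimated storage split into the 38 bytes of paid contract storage plus 257 bytes charged for allocating the new contract. That split is inferred from the numbers in the output, not taken from the protocol source; burn_mutez is a hypothetical helper.

```python
COST_PER_BYTE_MUTEZ = 250        # assumption: default cost_per_byte
PAID_STORAGE_BYTES = 38          # "Paid storage size diff" in the output above
ESTIMATED_STORAGE_BYTES = 295    # "Estimated storage" in the output above
# Inferred: the remainder is charged for allocating the new contract.
ALLOCATION_BYTES = ESTIMATED_STORAGE_BYTES - PAID_STORAGE_BYTES


def burn_mutez(nbytes: int) -> int:
    """Storage burn for nbytes at the assumed cost per byte."""
    return nbytes * COST_PER_BYTE_MUTEZ


storage_burn = burn_mutez(PAID_STORAGE_BYTES)   # 9500 mutez, i.e. 0.0095 tez
allocation_burn = burn_mutez(ALLOCATION_BYTES)  # 64250 mutez, i.e. 0.06425 tez

# bootstrap1's balance (in mutez) after the earlier transfer, then after this origination.
before = 3_999_899_999_596
fee = 441            # "Fee to the baker: 0.000441"
credit = 100_000_000  # 100 tez moved to the new contract
after = before - fee - storage_burn - allocation_burn - credit
print(after)  # 3999799925405
```

The result, 3999799925405 mutez, is exactly the amount listed for bootstrap1 in the config show output in the next section.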

We can now check the result of the origination.

The contract account dummy is now listed as known along with all bootstrap accounts:

$ mockup-client list known contracts

dummy: KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j
bootstrap5: tz1ddb9NMYHZi5UzPdzTZMYQQZoMub195zgv
bootstrap4: tz1b7tUupMgCNw2cCLpKTkSD1NZzB5TkP2sv
bootstrap3: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
bootstrap2: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN
bootstrap1: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx

Contract dummy also has the expected amount of ꜩ in its balance:
```
$ mockup-client get balance for dummy
```
```
100 ꜩ
```

Let’s inspect our freshly-originated dummy contract and display part of its state through its storage:

$ mockup-client get contract storage for dummy

Unit

The contract’s storage can also be accessed in JSON format through the usual RPC mechanism:

$ mockup-client rpc get /chains/main/blocks/head/context/contracts/KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j/storage

{ "prim": "Unit" }

We can of course send some money to dummy and verify it has been added to the contract’s balance:

$ mockup-client transfer 100 from bootstrap3 to dummy

Node is bootstrapped.
Estimated gas: 2237.715 units (will add 100 for safety)
Estimated storage: no bytes added
Operation successfully injected in the node.
Operation hash is 'ooAe9HRnc1veUPTVPBtMEpfUi5isgYm4MzeD13MN8Nxfdgfa8AZ'
NOT waiting for the operation to be included.
Use command
  tezos-client wait for ooAe9HRnc1veUPTVPBtMEpfUi5isgYm4MzeD13MN8Nxfdgfa8AZ to be included --confirmations 30 --branch BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU
and/or an external block explorer to make sure that it has been included.
This sequence of operations was run:
  Manager signed operations:
    From: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
    Fee to the baker: ꜩ0.000485
    Expected counter: 1
    Gas limit: 2338
    Storage limit: 0 bytes
    Balance updates:
      tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU ................ -ꜩ0.000485
      fees(the baker who will include this operation,0) ... +ꜩ0.000485
    Transaction:
      Amount: ꜩ100
      From: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
      To: KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j
      This transaction was successfully applied
      Updated storage: Unit
      Storage size: 38 bytes
      Consumed gas: 2237.715
      Balance updates:
        tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU ... -ꜩ100
        KT1QgvWVQHXDu6ryqQ1t3GN3UciToFbLhu7j ... +ꜩ100

```
$ mockup-client get balance for bootstrap3
```
```
3999899.999515 ꜩ
```
```
$ mockup-client get balance for dummy
```
```
200 ꜩ
```

The examples so far have used mockup mode’s default settings. Some use cases need a custom setup, so mockup mode lets us configure some initial parameters, as we discuss next.

Tune mockup parameters

For simplicity, mockup mode — like sandboxed mode — uses default values for wallet and protocol parameters. These default settings can be inspected and overridden to suit your needs.

The default configuration can be inspected as follows (recall that mockup-client is a command alias for tezos-client plus some parameters):

$ mockup-client config show

Default value of --bootstrap-accounts:
[ { "name": "bootstrap1",
    "sk_uri":
      "unencrypted:edsk3gUfUPyBSfrS9CCgmCiQsTCHGkviBDusMxDJstFtojtc1zcpsh",
    "amount": "3999799925405" },
  { "name": "bootstrap2",
    "sk_uri":
      "unencrypted:edsk39qAm1fiMjgmPkw1EgQYkMzkJezLNewd7PLNHTkr6w9XA2zdfo",
    "amount": "4000100000000" },
  { "name": "bootstrap3",
    "sk_uri":
      "unencrypted:edsk4ArLQgBTLWG5FJmnGnT689VKoqhXwmDPBuGx3z4cvwU9MmrPZZ",
    "amount": "3999899999515" },
  { "name": "bootstrap4",
    "sk_uri":
      "unencrypted:edsk2uqQB9AY4FvioK2YMdfmyMrer5R8mGFyuaLLFfSRo8EoyNdht3",
    "amount": "4000000000000" },
  { "name": "bootstrap5",
    "sk_uri":
      "unencrypted:edsk4QLrcijEffxV31gGdN2HU7UpyJjA8drFoNcmnB28n89YjPNRFm",
    "amount": "4000000000000" } ]
Default value of --protocol-constants:
{ "hard_gas_limit_per_operation": "1040000",
  "hard_gas_limit_per_block": "10400000",
  "hard_storage_limit_per_operation": "60000", "cost_per_byte": "250",
  "chain_id": "NetXynUjJNZm7wi",
  "initial_timestamp": "1970-01-01T00:00:00Z" }

We can tune these values with dedicated mockup mode creation switches. Generating the JSON data by hand is a pain (and dangerously error-prone), so we suggest generating files containing the default values and editing them.

To generate the JSON files related to protocol constants and bootstrap accounts configuration, just type:

$ mockup-client config init

Written default --bootstrap-accounts file: /tmp/mockup/bootstrap-accounts.json
Written default --protocol-constants file: /tmp/mockup/protocol-constants.json

We can now edit the files bootstrap-accounts.json and protocol-constants.json to later create a tuned mockup state.

For example, we can change the chain_id field of protocol-constants.json. We will compute a new chain identifier, which will replace the initial NetXynUjJNZm7wi value.

$ tezos-client compute chain id from seed my-chain-id

NetXKQNvsbETtvZ

Let’s create a new protocol constants configuration file, using jq to manipulate the JSON data.

$ cat /tmp/mockup/protocol-constants.json | \
  jq '.chain_id = "NetXKQNvsbETtvZ"' > tuned_up_protocol_constants.json
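If jq is not at hand, the same edit takes a few lines with Python’s standard json module. This is a sketch: the function name retarget_chain_id is our own, and it assumes chain_id is a top-level field of protocol-constants.json, as in the config show output above.

```python
import json
from pathlib import Path


def retarget_chain_id(src, dst, chain_id: str) -> dict:
    """Copy a protocol-constants JSON file, replacing its chain_id field.

    Equivalent in spirit to: jq '.chain_id = "<chain_id>"' src > dst
    """
    constants = json.loads(Path(src).read_text())
    constants["chain_id"] = chain_id  # all other fields are kept unchanged
    Path(dst).write_text(json.dumps(constants, indent=2) + "\n")
    return constants
```

For example, retarget_chain_id("/tmp/mockup/protocol-constants.json", "tuned_up_protocol_constants.json", "NetXKQNvsbETtvZ") should produce the same file as the jq command.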

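Assuming you have not renamed the files, you can create a new mockup setup by feeding the JSON configuration to the create mockup command. The invocation below first moves the existing state aside to /tmp/mockup.old, so that the new state can be created in /tmp/mockup, and reuses the bootstrap accounts file from the old directory: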

$ mv /tmp/mockup /tmp/mockup.old && \
     mockup-client --protocol ProtoALphaALphaALphaALphaALphaALphaALphaALphaDdp3zK \
     create mockup \
     --protocol-constants tuned_up_protocol_constants.json \
     --bootstrap-accounts /tmp/mockup.old/bootstrap-accounts.json

Created mockup client base dir in /tmp/mockup
mockup client uses protocol overrides:
hard_gas_limit_per_operation: 1040000
hard_gas_limit_per_block: 10400000
hard_storage_limit_per_operation: 60000
cost_per_byte: 0.00025

mockup client uses custom bootstrap accounts:
name:bootstrap1
sk_uri:unencrypted:edsk3gUfUPyBSfrS9CCgmCiQsTCHGkviBDusMxDJstFtojtc1zcpsh
amount:3999799.925405;
name:bootstrap2
sk_uri:unencrypted:edsk39qAm1fiMjgmPkw1EgQYkMzkJezLNewd7PLNHTkr6w9XA2zdfo
amount:4000100;
name:bootstrap3
sk_uri:unencrypted:edsk4ArLQgBTLWG5FJmnGnT689VKoqhXwmDPBuGx3z4cvwU9MmrPZZ
amount:3999899.999515;
name:bootstrap4
sk_uri:unencrypted:edsk2uqQB9AY4FvioK2YMdfmyMrer5R8mGFyuaLLFfSRo8EoyNdht3
amount:4000000;
name:bootstrap5
sk_uri:unencrypted:edsk4QLrcijEffxV31gGdN2HU7UpyJjA8drFoNcmnB28n89YjPNRFm
amount:4000000
Tezos address added: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
Tezos address added: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN
Tezos address added: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
Tezos address added: tz1b7tUupMgCNw2cCLpKTkSD1NZzB5TkP2sv
Tezos address added: tz1ddb9NMYHZi5UzPdzTZMYQQZoMub195zgv

We can check that the chain id in our environment setup matches the chain id we obtained from the command line.

$ cat /tmp/mockup/mockup/context.json | jq .chain_id

"NetXKQNvsbETtvZ"

Context state

In addition to the two configuration files bootstrap-accounts.json and protocol-constants.json, stateful mockup mode stores state data in a single context.json file.

context.json is located under the mockup subdirectory of the base directory. In our running example, its absolute file name is /tmp/mockup/mockup/context.json. In particular, it contains the current block hash and the shell header:

$ cat /tmp/mockup/mockup/context.json | \
  jq '.context | { block_hash: .block_hash, shell_header: .shell_header}'

{
  "block_hash": "BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU",
  "shell_header": {
    "level": 0,
    "proto": 0,
    "predecessor": "BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU",
    "timestamp": "1970-01-01T00:00:00Z",
    "validation_pass": 0,
    "operations_hash": "LLoZKi1iMzbeJrfrGWPFYmkLebcsha6vGskQ4rAXt2uMwQtBfRcjL",
    "fitness": [
      "01",
      "0000000000000000"
    ],
    "context": "CoUeJrcPBj3T3iJL3PY4jZHnmZa5rRZ87VQPdSBNBcwZRMWJGh9j"
  }
}
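The jq queries above can also be scripted. The following Python sketch extracts the same fields from context.json. It assumes the layout implied by those jq filters (a top-level chain_id, with block_hash and shell_header under context), which may differ between versions; summarize_context is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_context(path) -> dict:
    """Return the chain id, head block hash, level and timestamp of a mockup state.

    Assumes the layout implied by the jq filters above: a top-level
    "chain_id", and "block_hash" / "shell_header" under "context".
    """
    data = json.loads(Path(path).read_text())
    context = data["context"]
    header = context["shell_header"]
    return {
        "chain_id": data["chain_id"],
        "block_hash": context["block_hash"],
        "level": header["level"],
        "timestamp": header["timestamp"],
    }
```

Called as summarize_context("/tmp/mockup/mockup/context.json") on the state above, it should report level 0 and the genesis block hash.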

The base directory, /tmp/mockup in our examples, is where mockup mode keeps all its data, including the context state. In particular, it is key to supporting asynchronous operations in stateful mockup mode.

Running a mockup client with asynchronous state

In Tezos, extending the blockchain is a three-step process:

An operation is emitted across the network of nodes.
It gets validated, aggregated with other operations, and included in (baked into) a block by a baker.
The (cryptographic hash of the) block gets included in the next block.

See this paper for details (search for “In order to append transactions to the ledger, all blockchains follow a similar generic algorithm …”).

Mockup mode’s stateful asynchronous mode simulates steps 2 and 3 above, that is, the two-step inclusion of operations in the Tezos chain.⁶

This mode relies on two additional files in the mockup subdirectory, which store:

operations waiting to be baked in (mempool.json) and
operations rejected (trashpool.json).

How to activate

To activate asynchronous stateful mockup mode, we reuse the initial command line invocation for state creation, with an --asynchronous flag:

$ rm -Rf /tmp/mockup && mockup-client create mockup --asynchronous

Created mockup client base dir in /tmp/mockup
creating mempool file at /tmp/mockup/mockup/mempool.json
Tezos address added: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
Tezos address added: tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN
Tezos address added: tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU
Tezos address added: tz1b7tUupMgCNw2cCLpKTkSD1NZzB5TkP2sv
Tezos address added: tz1ddb9NMYHZi5UzPdzTZMYQQZoMub195zgv

This command creates a fresh mockup directory, as for stateful mode, but adds another file to represent the mempool (mempool.json), which is initially empty.

Baking in asynchronous stateful mockup mode

Let’s add some operations to mempool.json by issuing two transfers.

$ mockup-client transfer 1 from bootstrap1 to bootstrap2
$ mockup-client transfer 2 from bootstrap2 to bootstrap3

These commands use the same syntax as for immediate stateful mockup mode; the fact that we are operating in asynchronous mode is auto-detected. Both transfer operations are now in the mempool, as we can verify:

$ cat /tmp/mockup/mockup/mempool.json

[ { "shell_header":
      { "branch": "BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU" },
    "protocol_data":
      { "contents":
          [ { "kind": "transaction",
              "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx", "fee": "403",
              "counter": "1", "gas_limit": "1527", "storage_limit": "0",
              "amount": "1000000",
              "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN" } ],
        "signature":
          "sigTXV77JT5t3xaAUnCXs4RhvwJscFaqpZvHp4Wm8tQoENKXyFz3hLyqbQkibPoo4JNeXiGHJRdeMTAK79ZJJMDTvxZGF75H" } },
  { "shell_header":
      { "branch": "BLockGenesisGenesisGenesisGenesisGenesisCCCCCeZiLHU" },
    "protocol_data":
      { "contents":
          [ { "kind": "transaction",
              "source": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN", "fee": "403",
              "counter": "1", "gas_limit": "1527", "storage_limit": "0",
              "amount": "2000000",
              "destination": "tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU" } ],
        "signature":
          "sigNDKvCy4JDjSSXNG3CBwwhkFw2S8DeVaUXamYbVddsyRwZhF7rf1RgCZUZBy8UdsYrWFUDaj9b8xH3w5Ryg3SoBfGmKJBR" } } ]

The mempool is just a JSON array listing the valid operations that are waiting to be baked in. We see our two pending transfers above, e.g. transfer 1 is first and will earn whoever bakes it into the chain 403 microtez (0.000403 ꜩ); we will return to this fee later.
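Because the mempool is plain JSON, it is easy to inspect programmatically. This hypothetical helper lists the pending transactions, converting amounts and fees from mutez strings to tez; it relies only on the fields visible in the mempool.json dump above.

```python
import json
from decimal import Decimal

MUTEZ_PER_TEZ = 1_000_000


def pending_transfers(mempool_json: str) -> list:
    """Summarize the transactions waiting in a mockup mempool.json."""
    summary = []
    for operation in json.loads(mempool_json):
        for content in operation["protocol_data"]["contents"]:
            if content["kind"] != "transaction":
                continue  # only transfers are summarized here
            summary.append({
                "source": content["source"],
                "destination": content["destination"],
                # Amounts and fees are stored as mutez strings.
                "amount_tez": Decimal(content["amount"]) / MUTEZ_PER_TEZ,
                "fee_tez": Decimal(content["fee"]) / MUTEZ_PER_TEZ,
            })
    return summary
```

Applied to the contents of /tmp/mockup/mockup/mempool.json shown above, it should report amounts of 1 and 2 ꜩ, each with a fee of 0.000403 ꜩ.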

We can check that our transfers have not yet been included in the chain (baked), so that the balances of bootstrap1 and bootstrap2 are unchanged. Both still have the initial value of 4000000 ꜩ:

$ mockup-client get balance for bootstrap1

4000000 ꜩ

$ mockup-client get balance for bootstrap2

4000000 ꜩ

For the balances to change, the transfers must be validated and baked in the chain.

$ mockup-client bake for bootstrap1 --minimal-timestamp

Nov  9 10:23:05.436 - alpha.baking.forge: found 2 valid operations (0 refused) for timestamp 1970-01-01T00:00:02.000-00:00 (fitness 01::0000000000000001)
Injected block BLzVCvEvPRsc

Note the --minimal-timestamp flag above. This will compute the baked block’s timestamp from its predecessor’s, instead of taking the current machine time.

In a local simulated environment, this also ensures that baking succeeds, since the computed timestamp is by construction far enough from its predecessor’s to respect the chain’s minimal time between blocks. Without the --minimal-timestamp flag, baking might fail because the current machine time is too close to the timestamp of the previous block.
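The reasoning behind --minimal-timestamp can be illustrated with a simplified model in which a block is acceptable only if its timestamp is at least its predecessor’s timestamp plus a fixed minimal delay. The 2-second delay is an assumption read off the bake log above (genesis at 00:00:00, first block at 00:00:02); in the real protocol the minimal delay is computed by the protocol and may depend on further parameters. Function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MINIMAL_DELAY = timedelta(seconds=2)  # assumption, read off the log above


def minimal_timestamp(predecessor: datetime) -> datetime:
    """Timestamp derived from the predecessor, as with --minimal-timestamp."""
    return predecessor + MINIMAL_DELAY


def timestamp_ok(predecessor: datetime, candidate: datetime) -> bool:
    """Simplified validity check: the candidate must respect the minimal delay."""
    return candidate >= predecessor + MINIMAL_DELAY


genesis = datetime(1970, 1, 1, tzinfo=timezone.utc)
block1 = minimal_timestamp(genesis)
print(block1.isoformat())  # 1970-01-01T00:00:02+00:00

# Baking twice in quick succession with the machine clock can violate the delay.
now = datetime.now(timezone.utc)
print(timestamp_ok(now, now + timedelta(milliseconds=500)))  # False
```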

The mempool is now empty since all operations were valid for the baking operation.⁷

The trashpool

The trashpool.json file, which we will see below, is a design feature unique to mockup mode: it does not appear in normal or sandboxed mode. First, some motivation:

The fee for transfer 1 above was a default fee which, in a real system, would have been paid to the baker including the transaction in the real chain. But suppose our transfer is urgent and we want to offer an extra fee to encourage it to be baked quickly. Mockup mode lets us offer an additional incentive to our (mock) bakers:

$ mockup-client transfer 1 from bootstrap1 to bootstrap2 --fee 1  
$ mockup-client transfer 2 from bootstrap2 to bootstrap3 --fee 0.5

Thus we have asked mockup-client to carry out two transactions: transfer 1 with a fee of 1 ꜩ, and transfer 2 with a fee of 0.5 ꜩ. We then execute a selective baking operation:

$ mockup-client bake for bootstrap1 --minimal-timestamp --minimal-fees 0.6

Nov  9 10:23:06.047 - alpha.baking.forge: found 1 valid operations (1 refused) for timestamp 1970-01-01T00:00:04.000-00:00 (fitness 01::0000000000000002)
Nov  9 10:23:06.182 - mockup.local_services: Appending 1 operation to trashpool
Injected block BLE4cu7Usm8J

transfer 1 and transfer 2 are both valid transactions with respect to the senders’ balances, but our mock baker

accepts transfer 1 and
rejects transfer 2 because of the command line option --minimal-fees 0.6.
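The selection can be pictured with a simplified Python model. This is a sketch, not the baker’s actual logic: it only compares each operation’s flat fee with the --minimal-fees threshold, whereas a real baker may apply other fee thresholds as well, and treating a fee equal to the threshold as sufficient is an assumption. Operations that pass are "baked"; the rest go to a trashpool list, mirroring the behaviour described above.

```python
from decimal import Decimal

MUTEZ_PER_TEZ = 1_000_000


def select_by_fee(pending: list, minimal_fees_tez: str):
    """Split pending operations into (baked, trashed) by a flat fee threshold.

    Each operation is a dict whose "fee" field is a mutez string, as in
    mempool.json. Accepting fees equal to the threshold is an assumption.
    """
    threshold = int(Decimal(minimal_fees_tez) * MUTEZ_PER_TEZ)
    baked, trashed = [], []
    for op in pending:
        if int(op["fee"]) >= threshold:
            baked.append(op)
        else:
            trashed.append(op)
    return baked, trashed


# The "name" field is a hypothetical label for readability.
transfer1 = {"name": "transfer 1", "fee": "1000000"}  # --fee 1
transfer2 = {"name": "transfer 2", "fee": "500000"}   # --fee 0.5
baked, trashed = select_by_fee([transfer1, transfer2], "0.6")
print([op["name"] for op in baked], [op["name"] for op in trashed])
# ['transfer 1'] ['transfer 2']
```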

mempool.json is empty after the baking operation, and the rejected transfer 2 has been pushed into a trashpool.json file, for debugging:

$ cat /tmp/mockup/mockup/mempool.json

[]

$ cat /tmp/mockup/mockup/trashpool.json

[ { "shell_header":
      { "branch": "BLzVCvEvPRscvX9jre4t9oLWcSMa5o5LfnhPtSGsiL45qUSLRcB" },
    "protocol_data":
      { "contents":
          [ { "kind": "transaction",
              "source": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
              "fee": "500000", "counter": "2", "gas_limit": "1527", 
              "storage_limit": "0", "amount": "2000000",
              "destination": "tz1faswCTDciRzE4oJ9jn2Vm2dvjeyA9fUzU" } ],
        "signature":
          "sigpzoMG3ySUrk3JvVhjWgzK86NXbUdvD4Zad84GnC3e75ZxoAtMJAUe2tPa4UtFcr1cqmruruewYC4r9nWfKmVZcLSTaPZp" } } ]

The trashpool behaves like (and indeed is) a debug log: once an operation is recorded in the trashpool, this log entry stays there forever and cannot leave. This is because mockup mode has only one head block and one chain, and we do not store information about the past beyond the head block and the block preceding it. This is consistent with an idea of mockup mode as an environment for a single user to test state transformations on a single machine.

Differences from sandboxed mode

Mockup mode is not the only way to safely and locally test a Tezos blockchain: we can also run a Tezos blockchain sandboxed on a local machine. So in particular we can use sandboxed mode to create a blockchain consisting of a single Tezos node, and then interact with that using tezos-client.

This is a more heavyweight solution, because a sandboxed blockchain is still a blockchain, albeit one that is isolated (sandboxed) from the wider internet. A sandboxed blockchain is just as complex as an unsandboxed chain: it comes with chain history,⁸ chain branching, consensus, and so forth. These are all good things, but they are not needed for every testing scenario.

Sandboxed mode is also somewhat less friendly to debugging, e.g. it has no trashpool.

So, while it is valid to run a sandboxed blockchain and examine its behavior — and this has been done in practice — it requires some effort.

In mockup mode, in contrast,

there is no live node,
there is no chain branching (there is only ever one live block: the head), and
there is almost no history (we just store two blocks).

But,

Mockup mode is lightweight, fast, and convenient, and
there is no networking overhead, and no rounds of consensus with the one available node to decide which of the one available blocks will be added to the one available chain.
Mockup mode gives direct command-line access to some critical helper functions of a full Tezos implementation, allowing us to run and test those functions acting directly on a locally stored state.

Thus the workflow is simple. The initial state, once created, is directly accessible and transformable: no need to open a terminal, run a node, log into clients, communicate with the node, let the node communicate (with itself) to reach consensus (with itself), and so forth.

Conclusions

We have presented a general overview of mockup mode in its three modes of operation:

Stateless mode gives us access to some basic but important functions.
Stateful mode gives us a state, but no baking. Every operation is acted on immediately: it is either registered in the state or rejected. There is only one live block.
Stateful asynchronous mode adds baking. There is a mempool and a trashpool; baking takes operations from the mempool and either applies them to the local state or drops them into the trashpool. There is still only one live block.

If you want to go beyond this then you can either:

set up a sandboxed Tezos network,
set up your own live Tezos network, or
connect to one that somebody else has already set up (e.g. the Tezos Mainnet).

Thus mockup mode fills in a complete menu of options for experimenting with Tezos.

We created this new version of tezos-client, with its mockup mode, to help our developers to quickly and efficiently develop and test smart contracts in a Tezos environment. They found it useful and it has improved our internal development cycle.⁹ We are happy to share this tool with the Tezos community, and we hope you will like it and find it useful too.

Mockup mode is being actively developed and will evolve best if it can benefit from your feedback. If you have a suggestion, please do not hesitate to create an issue on the tezos issue tracker.

Technical note: The current version 7.x releases have a preliminary version, with a slightly different user experience. Mockup mode has existed for Tezos protocols starting with Carthage (numbers relates to shell updates, and names to protocol updates). Edo requires a shell containing environment V1. Delphi and Carthage are both usable on 7.x releases, and shells are backward-compatible. ↩
This post covers the higher-level functionality. You can fetch a precise list of implemented functionalities with tezos-client --mode mockup rpc list. ↩
Technical documentation describes this as: mockup mode defaults to an unspecified protocol. ↩
A Tezos blockchain has a self-amendment mechanism allowing to modify the protocol, subject to community votes. These self-amendments are called protocol updates. ↩
Note that valid transfers in stateful mode are immediate, without any networking or consensus mechanisms. In contrast, a sandboxed Tezos blockchain maintains an (emulated) network and, most expensively, it adheres to the full consensus and baking mechanisms of a proper blockchain. ↩
In mockup mode, step 1 is irrelevant. This isn’t because there is no network — we might still simulate one locally. It’s because there’s no node. ↩
Note the word ‘forge’ in the code above. Wait, we can explain: it’s a term of art! Forging or minting is standard terminology in Proof-of-Stake systems (like Tezos) for the operation of creating a block, and corresponds to ‘mining’ in Proof-of-Work systems (like Bitcoin). Thus, the etymology and meaning of a phrase like “forgers forge forged transactions” is … entirely respectable. ↩
With the current Delphi protocol (announcement, changelog) it takes sixty blocks for a transaction to be removed from the mempool (this is governed by the max_operations_ttl parameter). ↩
For instance, we used mockup mode to test the bugfix for comb pairs in the Delphi protocol.
We also used mockup mode to develop the four smart contracts (ticket_builder_fungible.tz, ticket_builder_non_fungible.tz, ticket_wallet_fungible.tz and ticket_wallet_non_fungible.tz) which are examples of using the new tickets feature of Edo in a specific example implementation. Here, mockup mode tightened our developers’ development cycle in Michelson Emacs mode, which now uses mockup mode as a default engine to derive type information.
Before commit 199a1e82, Emacs integration used sandboxed mode. This required defining a tezos-client alias for use inside Emacs mode and launching a node, all before starting up Emacs. With mockup mode we just create a state and operate on it directly, for immediate feedback. ↩

Announcing the Edo Release!

2020-11-30T12:00:00+01:00

Summary:

This is a joint announcement from Nomadic Labs, Marigold, and Metastate.

A couple of weeks ago, we were proud to see the “Delphi” upgrade to the Tezos protocol go live. This week, we are proud to announce our latest protocol upgrade proposal, “Edo”. As usual, Edo’s true name is its hash, which is PtEdoTezd3RHSC31mpxxo1npxFjoWWcFgQtxapi51Z8TLu6v6Uq.

Why is Edo being proposed when Delphi has only been in place for a short while? Although Delphi went live on November 12th, it was proposed on September 3rd. In the intervening months, we’ve been hard at work on the core Tezos software, and we’ve made significant improvements that we want to share with the users of the network. In particular, we have now completed a number of improvements that were in progress at the time that the interim Delphi update was proposed.

The Tezos protocol currently provides windows for new proposals every several months; one such window is now open. As we explain in this blog post, we intend for the foreseeable future to take advantage of every such opportunity, proposing upgrades that incorporate the improvements that have been completed in the intervening months since the last proposal.

Most cryptocurrency networks cannot be updated on a regular basis; they have no mechanism that overcomes the high coordination costs associated with protocol changes. Tezos, however, possesses an on-chain self-governance mechanism, as well as a mechanism for self-amendment without forks, and so we can propose updates to the chain which, if adopted by its users, are then automatically implemented. We intend to take full advantage of that mechanism going forward to make Tezos better and better with every proposal.

As for Edo itself: a full list of the changes can be found on this documentation page. In summary, however, the proposal contains some minor bug fixes, some additional improvements to performance and gas costs (albeit not as extreme as the ones in Delphi), the addition of a so-called “adoption period” to the voting schedule, and two important new features that we have been working on for some time: Sapling, and Tickets.

Sapling is a protocol originally developed by the Electric Coin Company for the Zcash project which implements shielded transactions. Our proposal allows smart contract developers to easily integrate Sapling in their smart contracts and create privacy-conscious applications. Because Tezos can be amended, it was possible for us to add this exciting new feature directly to Tezos itself.

Since our initial announcement of Sapling, the integration with Tezos has seen extensive testing and has been enhanced in numerous ways; we have also improved the performance.

Tickets are a convenient mechanism for smart contracts to grant portable permissions to other smart contracts or to issue tokens. While it’s possible to achieve this with existing programming patterns, tickets make it much easier for developers to write secure and composable contracts.

The “adoption period” (sometimes referred to as the “fifth period”) is an important improvement we have wanted to make to the governance mechanism for some time. Just like any other feature of the protocol, Tezos protocol amendments may make changes to the amendment process itself. Up until now, new versions of the protocol have gone live (that is, have been “activated”) one block after voting has been completed, which in practice is only sixty seconds. This has made it difficult for some Tezos bakers, indexers, and other users of the network to assure seamless upgrades of their nodes. We have also seen instances where the lack of certainty about whether an upgrade would be adopted caused some users to delay preparations until the last moment.

Under the new system, instead of four periods of eight cycles during voting, we propose to have five periods of five cycles. The new fifth period, the adoption period, will be a five-cycle (approximately two weeks) gap between the adoption of the new protocol and the time when it is activated. This will aid in assuring seamless protocol transitions. (We anticipate some additional minor tweaks to the voting schedule may occur in coming protocol proposals.)
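A quick back-of-the-envelope check of these figures, as a Python sketch. The announcement only gives cycle counts; the 4,096 blocks per cycle and 60-second block time used to convert cycles into days are our assumptions about the mainnet constants of the time, and real elapsed time grows if blocks are delayed.

```python
# Back-of-the-envelope check of the Edo voting schedule.
# Assumptions (not stated in the announcement): 4096 blocks per cycle and
# one block every 60 seconds, i.e. the minimal block delay with no missed slots.
BLOCKS_PER_CYCLE = 4096
SECONDS_PER_BLOCK = 60

def cycles_to_days(cycles: int) -> float:
    """Convert a number of cycles into days at the assumed block rate."""
    return cycles * BLOCKS_PER_CYCLE * SECONDS_PER_BLOCK / 86_400

old_schedule_cycles = 4 * 8   # four periods of eight cycles
new_schedule_cycles = 5 * 5   # five periods of five cycles
adoption_days = cycles_to_days(5)

print(old_schedule_cycles, new_schedule_cycles)  # 32 25
print(f"{adoption_days:.1f} days")               # 14.2 days
```

Under these assumptions the five-cycle adoption period lasts a little over 14 days, and the whole schedule shrinks from 32 to 25 cycles.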

Some readers may notice that Baking Accounts, a feature that has been in the works for some time, is not included in Edo. Although the core Baking Accounts software is complete and reliable, we are not yet satisfied that the migration mechanisms needed to update the chain when Baking Accounts are activated are as seamless as we can make them. In the past, some migrations have caused delays to on-chain transactions occurring around the time of an upgrade, and our tests of the Baking Accounts migration indicate that it could take a considerable period of time. Going forward, as we will be upgrading the protocol quite frequently, we intend to minimize these migration times and any disruption to the network. We are thus working on optimizing our migration mechanisms.

Following our current policy of not slowing down deployment of completed features for ones that are not yet finished, we have held Baking Accounts back for the moment. We hope that Baking Accounts will be a feature of the next protocol proposal, which, if Edo is adopted, should occur in about three months.

Announcing Ebetanet, the Edo Preview Network!

2020-11-20T15:00:00+01:00

We have just spawned a test network for a beta version of the Edo protocol, which we plan to propose as the next (008) Tezos protocol upgrade.

The code running on the test network is our release candidate for Edo. We anticipate that the beta period will last only one to two weeks before our proposal is final. Please participate by testing it now!

We plan to replace this test network with Edonet, the successor of Delphinet, once we get out of beta and formally propose Edo.

If you are interested in participating in this test network, you can check out the ebetanet-release branch of the repository and build it from source:

git fetch
git checkout ebetanet-release
make build-deps
eval $(opam env)
make

Note that you need to install the Rust compiler first. To install it, use the same instructions as for the master branch.

Docker images are also available with tag ebetanet-release.

This branch is configured to join Ebetanet automatically so there is no need to run your node with the --network ebetanet command-line option.

Important note: To avoid mistakes, this branch cannot join Mainnet as there is no --network mainnet option available.

We thank Smart Chain Arena for providing an initial publicly available node.

Cortez End of Support

2020-11-16T15:00:00+01:00

As Nomadic Labs aims to concentrate on its high-value activities, we plan to refocus our efforts on projects and tools that are directly related to the heart of Tezos and its economic protocol. As a result, Nomadic Labs has decided to discontinue its support and maintenance of both the Android and iOS versions of the Tezos mobile wallet, Cortez.

After a grace period running from now to 15 February 2021, Nomadic Labs will no longer guarantee Cortez’s functionality: from 16 February 2021 we will not be responsible for keeping Cortez up-to-date with the Tezos blockchain, and users will use Cortez at their own risk.

We recommend Cortez mobile app users export their private key(s) and switch to another Tezos wallet. Both Magma and Galleon will allow you to import private keys from your mnemonic seed phrase. Alternatively, users can also make use of the Tezos command line wallet.

How to export your private key(s) out of Cortez

Cortez has an export function for you to extract the seed phrase for your accounts. This will allow you to access the same accounts/funds in other wallets. Note: store your seed phrase in a secure location!

ANDROID:

From the Dashboard, select the menu at the top right and choose “Key management”.
Select “Export your 24 words”.
Authenticate your password.
Your seed phrase will be displayed.

iOS:

From the main menu, select Settings and choose to Export the seed phrase.
Authenticate your password.
Your seed phrase will be displayed.

Note: Nomadic Labs will never ask you for your seed phrase! Keep your 24-word seed safe! You can use it to IMPORT your funds to a new wallet. Alternatively, you can also transfer your XTZ out of Cortez into a new wallet that supports Tezos.

The three-month maintenance period starts now and ends on 15 February 2021.

Cortez is open source, so it could become community-driven.

Smarter contracts thanks to Delphi (part 1/2)

2020-11-13T00:00:00+01:00

Delphi is the successor to the Carthage protocol. Delphi’s main difference from Carthage is that gas costs are lower, so that smart contracts can compute more before hitting the Delphi/Carthage per-operation gas limit of 1,040,000 gas units (gu).

In this post we quantify the difference that Delphi’s lower gas costs will make:

We start with a description and justification of the Michelson gas model; and then
we showcase the expected gains for some smart contracts chosen to illustrate the Delphi model’s advantages.

Measuring the gas gains for real-world smart contracts will be the topic of another post.

1. An overview of the Michelson gas model

To recap: gas limits guarantee that block validators only need a fraction of the time interval between blocks to validate a new block.

The exact fraction depends on the speed of the node’s hardware. Tezos tries to be inclusive, so we want even slow hardware to be able to sync the chain in real time.

As we wrote in our last gas post:

overestimating gas costs prevents developers from writing more interesting contracts, while underestimating these costs leads to possible attacks. It is important to get this right.

In fact, there are two ways to allow more computation in a block:

improve the performance of block validation and decrease the gas costs accordingly, or
refine the gas model when it overestimates gas costs.

For the Delphi proposal, we did both.

1.1 Execution steps of a smart contract call

A Tezos smart contract is a type of Tezos account consisting of:

the smart contract balance,
the smart contract code, which is executable code in the Michelson programming language, and
the smart contract storage, used to save the contract’s state between calls.

A transfer calls a smart contract by invoking it along with a parameter. The Tezos protocol then executes¹ the smart contract code, passing it two inputs:

the parameter, and
the contract’s current storage.

Outputs after execution are:

updated storage, and
a (possibly empty) list of operations to be added to the queue of pending operations.²

The Tezos protocol uses three distinct formats to represent Michelson scripts and values, listed here in increasingly higher levels of abstraction:

Raw byte sequences: this is how data is stored on disk or in operation payloads (in particular, parameters are stored as raw byte sequences).
Micheline: the Micheline format is protocol-independent (think: XML or JSON for Michelson code) and portable across protocol versions. Micheline data has a tree-like structure, so it is easier to manipulate than byte sequences, but it is still untyped.
The Michelson internal representation: this format is specific to each version of the Tezos protocol. Michelson expressions written in this format are guaranteed to be well-typed. It is the only format known to the Michelson interpreter.

When a Tezos smart contract is called — either due to some initial call to it, or as a consequence of working through a queue of accumulated operations e.g. from step 8 below — the Tezos protocol performs the following steps:

1. Read the contract’s script and storage from disk, as raw byte sequences (the parameter is already in memory).
2. Deserialise (convert) the script, storage, and parameter to Micheline.
3. Convert Micheline to the internal representation. The protocol code calls this step parsing; the command-line client calls it type-checking (since type errors might be thrown).
4. Interpret (execute) the smart contract using as inputs its storage and the parameter. At the end of this step, we obtain the updated storage, and a list of new operations.
5. Unparse updated storage — convert from the internal representation to Micheline.
6. Serialise this Micheline expression — convert it to a byte sequence.
7. Write the byte sequence to disk.
8. Queue the list of new operations to apply.
9. Loop back to step 1 until the operation queue is empty.
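To make the control flow of these nine steps concrete, here is a toy Python model. It is not how the protocol is implemented: JSON stands in for both the byte encoding and Micheline, contracts are plain Python functions, there is no type-checking and no gas, and the names Call and run_calls are hypothetical. The FIFO queue simply follows the word "queue" in step 8.

```python
import json
from collections import deque
from dataclasses import dataclass

# Toy model of the nine steps above. Hypothetical throughout: JSON stands in
# for both the byte encoding and Micheline, contracts are Python functions
# (parameter, storage) -> (new_storage, operations), and gas is ignored.

@dataclass
class Call:
    target: str        # name of the contract being called
    parameter: object  # already in memory, as in step 1

def run_calls(disk: dict, code: dict, first: Call) -> list:
    """Apply calls until the queue is empty; return the order of calls."""
    queue = deque([first])
    trace = []
    while queue:                                    # step 9: loop until empty
        call = queue.popleft()
        raw = disk[call.target]                     # step 1: read raw bytes
        storage = json.loads(raw.decode())          # steps 2-3: decode
        new_storage, ops = code[call.target](call.parameter, storage)  # step 4
        disk[call.target] = json.dumps(new_storage).encode()  # steps 5-7
        queue.extend(ops)                           # step 8: queue new calls
        trace.append(call.target)
    return trace

# Usage: "a" forwards its parameter to "b", which counts how often it is called.
code = {
    "a": lambda p, s: (s, [Call("b", p)]),
    "b": lambda p, s: (s + 1, []),
}
disk = {"a": b"null", "b": b"0"}
print(run_calls(disk, code, Call("a", 7)))  # ['a', 'b']
print(disk["b"])                            # b'1'
```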

1.2 Limiting execution, with gas

Each of the execution steps above could take arbitrary time, so to guarantee that nodes can check the validity of operations in reasonable time, the protocol imposes a hard limit on computation and considers as invalid any operation that exceeds it. This limit cannot simply be a timeout: fast nodes would then accept operations that slower nodes reject, and the network would not reach consensus.

Instead, we rely on gas: a hardware-independent abstraction of the time needed to validate operations, built into the Tezos protocol.

Here are the usual reasons we consume gas (relative weights depend on the smart contract being executed):

reading from disk,
deserialisation,
parsing,
interpretation, including a fixed per-operation gas cost,
unparsing,
serialisation, and
writing to disk.

Gas prevents overlong operations, so the chain’s security depends on good estimates of how long each task above will take:

An underestimate is a security risk because it makes it cheap to launch a denial-of-service attack on a node, using a smart contract.
An overestimate will unduly restrict the complexity of the smart contracts that can be run, which may annoy users but is not a security risk.
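The following hypothetical Python sketch shows the idea of gas as a deterministic budget. The per-step costs are invented placeholders; only the 1,040,000 gu per-operation limit comes from this post. Because the verdict depends only on the operation and not on how fast the machine is, every node reaches the same answer.

```python
# Deterministic gas accounting, as a sketch. Per-step costs are invented
# placeholders; only the 1,040,000 gu limit comes from this post.
OPERATION_GAS_LIMIT = 1_040_000

class OutOfGas(Exception):
    pass

class GasMeter:
    def __init__(self, limit: int = OPERATION_GAS_LIMIT):
        self.remaining = limit

    def consume(self, amount: int, reason: str) -> None:
        """Charge `amount` gu, failing the whole operation if it does not fit."""
        if amount > self.remaining:
            raise OutOfGas(f"out of gas while {reason}")
        self.remaining -= amount

def validate(step_costs: list) -> bool:
    """True iff every step can be charged within the per-operation limit."""
    meter = GasMeter()
    try:
        for reason, cost in step_costs:
            meter.consume(cost, reason)
    except OutOfGas:
        return False
    return True

cheap = [("reading from disk", 2_000), ("parsing", 5_000), ("interpreting", 50_000)]
expensive = cheap + [("writing to disk", 1_000_000)]
print(validate(cheap), validate(expensive))  # True False
```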

Thus, when we launched the Betanet, our gas constraints were deliberately overestimated, intended to be decreased later by protocol amendments.

1.3 Estimating gas costs (and getting it right)

But how far can we safely decrease gas costs in practice? To quantify this, we performed two kinds of benchmarks:

Macro benchmarks fill a block with operations designed to stress some particular source of gas consumption (e.g. computation, or disk accesses).
Micro benchmarks measure the time it takes to execute functions of the protocol on random inputs (of various sizes). This helps build a predictive model of the time taken by the function, so that the gas cost can precisely reflect it.
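As a hedged illustration of turning timing measurements into a cost model (this is not the Tezos benchmarking tooling), the Python sketch below fits an affine model time ≈ a + b·size by ordinary least squares to synthetic timings, then converts the prediction to gas with a hypothetical rate NS_PER_GAS.

```python
import math

# Hypothetical illustration: fit t(size) = a + b * size to (size, time in ns)
# samples by ordinary least squares, then convert predicted time to gas.
def fit_affine(samples):
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Synthetic timings, roughly 100 ns + 2 ns per unit of input size.
samples = [(10, 121.0), (100, 298.0), (1000, 2103.0), (5000, 10096.0)]
a, b = fit_affine(samples)

NS_PER_GAS = 1.0  # hypothetical time-to-gas conversion rate

def gas_cost(size: int) -> int:
    """Predicted gas for an input of `size`, rounded up to stay conservative."""
    return math.ceil((a + b * size) / NS_PER_GAS)

print(f"a = {a:.1f} ns, b = {b:.3f} ns per unit")
```

Rounding up reflects the asymmetry described above: a slight overestimate only restricts contracts, while an underestimate is a security risk.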

In the previous Athens and Babylon amendments we adjusted the gas model:

For the Athens proposals, we did macro benchmarks to refine the relative costs of disk accesses and computations. These benchmarks showed that disk accesses were too expensive compared to the other costs and we divided the relative cost of disk accesses by 2 to reflect this.

For the Babylon proposal, we micro-benchmarked the interpretation time of most instructions on random data of various sizes and fitted cost models for all the benchmarked instructions. This led to a significant decrease of the interpretation costs of most instructions.

For the current Delphi proposal:

We developed a generator of well-typed Michelson code of arbitrary size and used it to micro-benchmark serialisation and parsing functions.
We optimised the parsing functions.
We optimised the Michelson interpreter.
We updated the Babylon instruction-per-instruction benchmarks to reflect the new optimisations of the interpreter and reduced the interpretation costs accordingly.
We benchmarked disk accesses on modern hardware and revised the corresponding gas model.

The modifications of the gas model brought by the Delphi proposal are documented in the following merge requests:

Now, we will measure the changes in gas consumption for contracts that are intensive in one of the components and not the others.

2. Theoretical limits

The following Michelson scripts are not realistic; they are designed to measure the gas gains for the Delphi proposal:

2.1 Gas costs for disk access (read / write)

To avoid deserialisation and parsing costs, we read and write lots of data of type bytes. Before reaching the gas limit per operation, we hit another limit defined by the protocol: the storage increase limit.

A single transaction may increase the contract storage by no more than 60kB (1kB = 1000 bytes) — we can build larger byte sequences dynamically; see later. To write more than 60kB in a single transaction we must also remove data in the same transaction.
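The numbers behind the 32KiB chunks used by the io.tz script below, as a Python check. We assume the limit is inclusive and ignore the few extra bytes of big_map keys and encoding overhead, so this only shows the orders of magnitude.

```python
# Storage-increase arithmetic behind io.tz's 32 KiB chunks. The 60 kB limit
# (1 kB = 1000 bytes) is from the text; treating it as inclusive and ignoring
# key/encoding overhead are simplifying assumptions.
STORAGE_INCREASE_LIMIT = 60 * 1000  # bytes per transaction

def fits(n_bytes: int) -> bool:
    """Could a single transaction grow the storage by n_bytes?"""
    return n_bytes <= STORAGE_INCREASE_LIMIT

print(fits(32 * 1024), fits(64 * 1024))  # True False

# Starting from one byte, each DUP; CONCAT doubles the sequence,
# so the 15 doublings in add32K produce 2**15 bytes, i.e. 32 KiB.
print(2 ** 15 == 32 * 1024)  # True
```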

Here is our Michelson script for a smart contract to stress the disk limits:

# Script io.tz
parameter (or (unit %add32K) (or (nat %write) (unit %read)));
storage (pair (nat %counter) (big_map %tank nat bytes));
code
  {
       UNPAPAIR;
       IF_LEFT
         {

           # %add32K(_ : unit): add 32 KiB to the storage
           DROP;

           PUSH nat 1; ADD;

           PUSH bytes 0x00;
           # 15 DUP; CONCAT
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;
           DUP; CONCAT;

           SOME; SWAP; DUP; DUG 3; UPDATE; SWAP;

           PAIR;
         }
         {
           IF_LEFT
             {
               # %write(n : nat): store 2^n bytes in the big_map

               # First clear the big_map to decrease the storage size diff
               DUG 2;
               RENAME @counter;
               SWAP;
               PUSH @index nat 0;
               PUSH bool True;
               LOOP
                 {
                   DUP; DIP { NONE bytes; SWAP; UPDATE };
                   PUSH nat 1; ADD @index;
                   DUP @index; DUP @counter 4; CMPGE
                 };
               DROP; SWAP; DROP; SWAP;

               # do n DUP; CONCAT
               DIP { PUSH bytes 0x00 };
               INT;
               DUP;
               GT;
               LOOP
                 {
                   PUSH nat 1; SWAP; SUB;
                   DIP {DUP; CONCAT};
                   DUP; GT
                 };
               DROP;
               SOME;
               PUSH nat 0;
               UPDATE
             }
             {
               # read(_ : unit): read the bytes that have been stored by %write
               DROP 2;
               DUP;
               PUSH nat 0;
               GET; DROP
             };
           PUSH nat 0; PAIR
         };
       NIL operation;
       PAIR
     }

The script io.tz above declares a storage of type pair (nat %counter) (big_map %tank nat bytes). The tank big map serves as a data tank, which we fill over several transactions and then empty all at once, so that a single transaction can write a lot of data without exceeding the storage increase limit. Before write is called, tank is expected to contain 32KiB (1KiB = 1024 bytes) of zeros at each index i from 1 to counter (add32K stores its chunk at the incremented counter). This script features three entrypoints:

add32K inserts data into the tank.
It adds 32KiB of data (note that 64KiB would exceed the storage size diff limit) in the first empty slot in the big map, and increments the counter to maintain the invariant.
write n empties the tank.
It puts None in all big map entries from $0$ to the value of the counter, resets the counter to $0$, and puts a byte sequence of size $2^n$ at the first position in the big map.
read accesses the first position in the big map and does nothing with it.
This entrypoint returns the storage unchanged.

We use the script io.tz as follows:

We originate it on a default storage Pair 0 {}; and
we repeatedly call the add32K entrypoint to insert enough data in the tank; and
once the storage is big enough (experimentally, 64 calls to add32K are enough), we call write with the largest parameter that the gas limit allows; and finally
we call read.
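A hypothetical Python model of io.tz that tracks only byte counts (no Michelson encoding, no gas) shows why 64 calls to add32K suffice: together they store 64 × 32KiB = 2MiB, exactly the $2^{21}$ bytes that write 21 then stores. This reading assumes, as the text implies, that the limit applies to the net change in storage size within a transaction.

```python
# Hypothetical byte-count model of io.tz (no encoding, no gas).
CHUNK = 2 ** 15  # 32 KiB, as built by add32K

def add32k(counter: int, tank: dict):
    counter += 1
    tank[counter] = CHUNK            # stored at the incremented counter
    return counter, tank

def write(n: int, counter: int, tank: dict):
    for i in range(counter + 1):     # clear keys 0..counter
        tank.pop(i, None)
    tank[0] = 2 ** n                 # then store 2**n bytes at key 0
    return 0, tank                   # counter is reset to 0

counter, tank = 0, {}
for _ in range(64):
    counter, tank = add32k(counter, tank)
freed = sum(tank.values())
counter, tank = write(21, counter, tank)
print(freed, tank[0], tank[0] - freed)  # 2097152 2097152 0
```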

For the read entrypoint, thanks to the --trace-stack option of the client, we can compute the gas consumption of the GET instruction alone, so the gas consumption caused by reading the disk is well-isolated from other gas consumptions.

In Carthage, the largest n we can pass to the write entrypoint is n=15, which writes 32KiB ($2^{15}$B) and consumes 535741 gu (slightly more than half the per-operation gas limit of 1,040,000 gu). Reading back these 32KiB costs approximately a third of the gas limit (the GET instruction consumes 344481 gu).
In Delphi, we can reach n=21, which writes 2MiB ($2^{21}$B) and consumes 555330.160 gu (again slightly more than half the gas limit, which is the same as Carthage’s); the GET instruction of the read entrypoint then consumes almost as much gas (520045.437 gu).

In summary, comparing the gas costs in Carthage and Delphi:

Writing large pieces of data costs about 62 times less (a 98.4% saving).
Reading data costs about 42 times less (a 97.6% saving).
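Since the Carthage and Delphi runs moved different amounts of data (32KiB versus 2MiB), these ratios compare gas per byte. A short Python check, using only the measurements quoted above:

```python
# Gas per byte for the disk benchmarks, using the measurements quoted above.
KIB = 1024
carthage = {"write": (535_741, 32 * KIB), "read": (344_481, 32 * KIB)}
delphi = {"write": (555_330.160, 2048 * KIB), "read": (520_045.437, 2048 * KIB)}

def per_byte_ratio(op: str) -> float:
    """How many times cheaper one byte of `op` is in Delphi than in Carthage."""
    c_gas, c_bytes = carthage[op]
    d_gas, d_bytes = delphi[op]
    return (c_gas / c_bytes) / (d_gas / d_bytes)

for op in ("write", "read"):
    ratio = per_byte_ratio(op)
    print(f"{op}: {ratio:.1f}x cheaper per byte, {1 - 1 / ratio:.1%} saving")
# write: 61.7x cheaper per byte, 98.4% saving
# read: 42.4x cheaper per byte, 97.6% saving
```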

Quantifying the real-world impact of these cost reductions on practical Delphi usage will be the topic of a future blog post, but we can already note that this is a significant reduction.

2.2 Gas costs for parsing code and data

Parsing costs are easy to separate from the other sources of gas consumption thanks to the tezos-client typecheck data and tezos-client typecheck script commands, which report exactly the amount of gas consumed by parsing. Under the Delphi gas model, parsing typically costs about 40 times less than under Carthage’s.

To assess typechecking costs, we construct a synthetic script big_script.tz, which contains 4000 DUP and 4000 DROP. We then typecheck this script via the tezos-client typecheck script command.

This costs 248168 gas units under Carthage but only 6003.655 gas units under Delphi; so in this example the Delphi gas model costs about 41.3 times less.
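The exact layout of big_script.tz is not given in this post. The Python sketch below generates one plausible version (a guess, not the original file): 4000 DUP instructions followed by 4000 DROP instructions, which keeps the script well-typed because the stack grows and then shrinks back to the initial pair.

```python
# Generate one plausible big_script.tz (a guess at its layout): 4000 DUPs
# grow the stack, 4000 DROPs shrink it back, and the script ends by
# returning the unit storage with an empty list of operations.
def big_script(n: int = 4000) -> str:
    body = ["DUP"] * n + ["DROP"] * n + ["CDR", "NIL operation", "PAIR"]
    return (
        "parameter unit;\n"
        "storage unit;\n"
        "code { " + "; ".join(body) + " }\n"
    )

script = big_script()
# Save `script` as big_script.tz, then e.g.: tezos-client typecheck script big_script.tz
```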

2.3 Gas costs for contract interpretation

Gas costs in the Michelson interpreter include

costs for instruction interpretation as discussed — and also
overhead costs in the Michelson interpreter due to the gas accounting system itself.

In Delphi we have optimised both. To evaluate the gains due to the Michelson interpreter optimisations in Delphi, we consider the scripts of two contracts:

constructing a large data structure (big_list.tz), and
computing a factorial function (arith.tz).

2.3.1 Gas costs for constructing a large data structure

Let’s first consider a contract that builds a huge list and does nothing with it:

# Script big_list.tz
parameter nat;
storage unit;
code
  {
    CAR;
    INT;
    NIL unit;
    UNIT;
    CONS;
    SWAP;
    DUP;
    GT;
    LOOP
      {
        PUSH nat 1; SWAP; SUB;
        SWAP;
        DUP; ITER {CONS};
        SWAP; DUP; GT
      };
    DROP 2;
    UNIT; NIL operation; PAIR
  }

When the contract big_list.tz receives n as parameter, it:

Builds a list of type list unit that initially contains a single element.
Iterates n times in a loop that concatenates the list with itself (DUP; ITER{CONS}).
Empties the stack (DROP 2).

So this contract pays gas for building a list of length 2^n, and only for that.
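A Python mirror of the loop in big_list.tz, using a deque so that CONS (prepending) is cheap. Each iteration copies the list and prepends every element of the original onto the copy, which is what DUP; ITER {CONS} does, so the length doubles; with n=20 the final list has 2^20 = 1,048,576 elements. This illustrates the semantics only and says nothing about gas.

```python
from collections import deque

# Python mirror of big_list.tz's loop (semantics only, no gas).
def big_list(n: int) -> deque:
    lst = deque([None])             # NIL unit; UNIT; CONS: a one-element list
    for _ in range(n):              # the LOOP runs n times
        acc = deque(lst)            # DUP: a second copy of the list
        for x in lst:               # ITER over one copy...
            acc.appendleft(x)       # ...CONS-ing each element onto the other
        lst = acc
    return lst

print(len(big_list(10)))  # 1024
```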

In Carthage we can set at most n=20 and this consumes 996143 gas units.
In Delphi, the same contract with n=20 consumes 230406 gas units.

Thus, the cost in Delphi is 4.32 times less.

2.3.2 Gas costs for arithmetic

Arithmetic in Michelson is implemented using the Zarith arbitrary-precision library, a well-optimised OCaml wrapper of the GMP C library. Neither Zarith nor GMP has been modified in the development of Delphi, and the cost models for arithmetic functions have been quite precise since Babylon, so we expected little room for further optimisation in Delphi for smart contracts that are mostly arithmetic computation.

To measure this, we wrote a simple loop-based factorial in Michelson:

# Script arith.tz
parameter nat;
storage unit;
code
  {
    CAR;
    PUSH @acc nat 1;
    PUSH @i nat 2;
    DUP;
    DUP 4;
    CMPGE;
    LOOP
      {
        DUP;
        DIP {MUL @acc};
        PUSH nat 1; ADD @i;
        DUP;
        DUP 4;
        CMPGE;
      };
    DROP 3; UNIT;
    NIL operation; PAIR
  }

As in the big_list.tz example, the result of the computation is not stored, to isolate the cost of the computation from the cost of storing the result.
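A Python mirror of arith.tz's loop, for readers less used to stack code. Here the result is returned so it can be checked against math.factorial; the contract itself discards it. Printing the bit length shows how quickly the operand sizes grow, and larger operands make each multiplication more expensive.

```python
import math

# Python mirror of arith.tz: acc = 1, i = 2; while n >= i, acc *= i and i += 1.
def factorial_loop(n: int) -> int:
    acc, i = 1, 2
    while n >= i:
        acc *= i
        i += 1
    return acc

assert factorial_loop(20) == math.factorial(20)
# Operand sizes grow quickly: the final accumulator for n = 6400 is a
# number with tens of thousands of bits.
print(factorial_loop(6400).bit_length())
```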

The biggest parameter that can be sent to this contract within the gas limit in Carthage is exactly 6400. With this parameter,

1039686 gas units are consumed in Carthage and
389054 gas units are consumed in Delphi.

Thus, the factorial script costs 2.67 times less in Delphi.

2.4 Gas costs for inter-contract calls, in two parts

Part 1 of 2: Recursion

In Carthage, applying an operation has a minimal cost of 10 kgu (1 kgu = 1000 gu). In Delphi, the minimal cost is 1 kgu, thus ten times smaller. As we shall see below when we look at the self_recursion.tz script, this minimal cost is no longer a dominating cost for contracts that perform a lot of calls.

A simple way to test how many contract calls can be achieved in Carthage and in Delphi is to write a recursive contract that calls itself n times and then stops: it takes a number n as parameter and stops if n is 0, and calls itself on n-1 otherwise.

Here is the corresponding Michelson script:

# Script self_recursion.tz
parameter int;
storage unit;
code
  {
    UNPAIR; DUP; EQ;
    IF
      { DROP; NIL operation; PAIR }
      {
        PUSH int 1; SWAP; SUB;
        SELF; PUSH mutez 0; DIG 2; TRANSFER_TOKENS;
        NIL operation; SWAP; CONS; PAIR
      }
  }

In Carthage, we measured that each recursive call of self_recursion.tz consumes 13178 gas units, so the largest n we can send to this contract within the protocol-set per-operation gas limit (1,040,000 gu) is n=77.
In Delphi, each recursive call of this contract only consumes 2871.559 gas, so the largest n we can send to this contract within the per-operation gas limit (identical to Carthage’s) is n=361.

We see that in Carthage the gas costs of self_recursion.tz are dominated by the 10 kgu cost of operation application, which represents about three quarters of its total gas consumption. In Delphi, the 1 kgu minimal cost represents only about a third of the total.
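Under the simplifying assumption that every call costs the same (including the first one, and ignoring any fixed overhead of the initial transaction), the bounds above follow from simple arithmetic: parameter n triggers n + 1 calls (n, n-1, ..., 0). This Python check reproduces n=77 and n=361, and the share of the minimal operation cost in each call.

```python
import math

LIMIT = 1_040_000  # per-operation gas limit (gu), the same in Carthage and Delphi

def max_depth(gas_per_call: float) -> int:
    """Largest n whose n + 1 calls fit in LIMIT, if every call costs the same."""
    return math.floor(LIMIT / gas_per_call) - 1

for name, per_call, minimal in [("Carthage", 13_178, 10_000), ("Delphi", 2_871.559, 1_000)]:
    print(name, max_depth(per_call), f"{minimal / per_call:.0%}")
# Carthage 77 76%
# Delphi 361 35%
```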

Part 2 of 2: A chain of calls

Calling a contract (recursively or otherwise) incurs a non-trivial gas cost, corresponding to loading, deserialising, and type-checking the contract as well as checking that the parameter to the call has the right type (cf. discussion above).

Since typical inter-contract interactions are not recursive, we concentrate here on non-recursive calls.

To test non-recursive contract calls, we build a chain of smart contracts. Each contract calls the next until we reach the last contract in the chain, which then originates a new contract to grow the chain. More precisely:

The contracts in the chain have two possible states called forwarding and final.
In forwarding state, the contract
- stores an address,
- casts it to contract unit (the type of contracts expecting no parameter), and
- calls it.
In final state, the contract
- originates a copy of itself with an initial storage in final state,
- stores the address of the newly-originated contract, and
- switches to forwarding mode.
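The forwarding/final protocol can be modelled in a few lines of Python, abstracting away Michelson, addresses and gas (the Link class and call_head function are hypothetical). Calling the first link for the k-th time triggers k-1 forwarding transfers down the chain and one origination at its end, so each call grows the chain by one link.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Link:
    forward_to: Optional[int] = None  # None: final state; else index of next link

def call_head(chain: List[Link]) -> Tuple[int, int]:
    """Call link 0; return (internal transfers, originations) it triggers."""
    transfers, i = 0, 0
    while chain[i].forward_to is not None:   # forwarding: call the stored address
        i = chain[i].forward_to
        transfers += 1
    chain.append(Link())                     # final: originate a final-state copy,
    chain[i].forward_to = len(chain) - 1     # store its address, switch to forwarding
    return transfers, 1

chain = [Link()]
results = [call_head(chain) for _ in range(4)]
print(results)     # [(0, 1), (1, 1), (2, 1), (3, 1)]
print(len(chain))  # 5
```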

Unfortunately, there is no instruction in Michelson to originate a copy of oneself; the only way to originate a contract from Michelson is with the CREATE_CONTRACT instruction, which expects as a static argument the full script of the contract to originate. Therefore, to originate a copy of the current contract, we cannot use CREATE_CONTRACT directly in the contract script: there is no finite script satisfying the equation script = { ...; CREATE_CONTRACT script ...; ... }. We need to untie this recursive³ equation and solve it at run-time, by placing part of the script either in the storage (using a lambda) or in the script of another contract.

We choose the former option (to put it in the storage) and leave the motivated reader to implement a solution based on the latter as an exercise.

The lambda that the contract stores needs to return both the origination operation and the address of the newly created contract, so its return type is pair address operation. As input, it needs some data to initialise the storage of the freshly-originated contract, which is basically the lambda itself in serialised form.

The trick to solve the recursive equation mentioned above is to have the lambda deserialise its own code using the UNPACK instruction — so the input type of the lambda is bytes and its code starts with UNPACK (lambda bytes (pair address operation)) — followed by code to originate the contract using the CREATE_CONTRACT instruction.

The complete script of the contract used by each link in the call chain is:

# Script link_contract.tz
parameter unit;
storage (or (lambda %final bytes (pair address operation)) (address %forward));
code
  { NIL operation;
    SWAP;
    CDR;
    IF_LEFT
      {
        # in %final mode, call the stored lambda on itself
        DUP ;
        PACK;
        EXEC;
        UNPAIR;
        DIP{CONS}
      }
      {
        # in %forward mode, call the stored address
        DUP;
        CONTRACT unit;
        ASSERT_SOME;
        PUSH mutez 0;
        UNIT;
        TRANSFER_TOKENS;
        SWAP;
        DIP{CONS}
      };
    # in both cases, we end in %forward mode
    RIGHT (lambda bytes (pair address operation));
    SWAP;
    PAIR
  }

And its initial storage is:

Left { UNPACK (lambda bytes (pair address operation));
       ASSERT_SOME;
       LEFT address;
       PUSH mutez 0;
       NONE key_hash;
       CREATE_CONTRACT {<<script_of_the_link_contract>>};
       SWAP;
       PAIR}

In Carthage, the cost of each transfer in this chain of calls is 27817 gu. We can build a chain of 36 transfers and one origination before reaching the per-operation limit of 1,040,000 gu.
In Delphi, the cost for each transfer is about 4779 gu. We can build a chain of 217 transfers and one origination before reaching the limit of 1,040,000 gu.

Thus, Delphi lets us grow the chain by a factor of about 6.

Conclusion

The Delphi protocol reduces gas costs and allows more complex smart contracts to be deployed and executed. The gas gains vary depending on the source of gas consumption. We measured this using scripts tailored to reach the gas limit in different ways. The results in short are:

Gas source	Carthage consumed gas	Delphi consumed gas	Ratio (Carthage / Delphi)
Writing to disk	535741 gu / 32 KiB	555330.160 gu / 2 MiB	61.7 (per byte)
Reading from disk	344481 gu / 32 KiB	520045.437 gu / 2 MiB	42.4 (per byte)
Parsing	248168 gu	6003.655 gu	41.3
Computation	996143 gu	230406.664 gu	4.32
Arithmetic	1039686 gu	389054.329 gu	2.67
Recursion (per call)	13178 gu	2871.559 gu	4.59
Inter-contract call (per transfer)	27817 gu	4779.106 gu	5.82

So to sum up:

Delphi enjoys a significant reduction in parsing costs: thanks to the refinement of the gas model for parsing, most contracts should see their parsing gas reduce by a factor of about 40, compared to Carthage, making it more practical to deploy and operate on larger contracts.
Similarly, the cost of large disk accesses (I/Os) is reduced by a factor of about 60 for the largest disk writes.
Together, our improvements make inter-contract calls cheaper by a factor of 6 in our tests.
Improvements in contract interpretation costs exist though they are less striking because the cost model was quite precise already in Babylon (the version preceding Carthage). Yet, even here we can see quite significant improvements thanks to Delphi’s optimisations in the gas costs of the Michelson interpreter’s gas accounting system.

In a future blog post, we will measure the gas gains observed on a collection of real-world contracts that are either already popular on mainnet or about to be launched.

“Executes” here means “interprets”, in the sense of “an interpreter”. ↩
Returning a (non-empty) list of operations is sometimes also referred to as emitting or applying them. ↩
We distinguish between a recursion when a lambda calls itself, and a recursive contract call when a contract calls itself (with all the costs this entails). Here, we’re dealing just with the first kind of recursion. ↩

Delphi, the Latest Tezos upgrade, is live!

2020-11-12T16:00:00+01:00

Summary:

This is a joint post from Nomadic Labs, Metastate and Marigold.

We’re very happy to announce that the vote on the “Delphi” upgrade to the Tezos network passed a few hours ago (around 13:00 GMT on 12 November 2020). The upgrade went live immediately afterwards at block 1,212,417.

An informal blog post describing Delphi is here, and a changelog of everything that went into Delphi is here. Most prominently, Delphi makes substantial improvements to the performance of the Michelson interpreter and to the gas model, and also reduces storage costs by a factor of four to reflect improvements in the underlying storage layer. We are very pleased that Delphi has gone live; it will dramatically loosen the gas constraints currently experienced by the creators of sophisticated smart contracts operating on Tezos, enabling ever more interesting applications of the system.

As described in this other blog post, we intend to continue producing a regular cadence of upgrade proposals to the Tezos network over time; the next, which will likely be named “Edo”, should appear very soon.

We are very excited about the features being proposed in Edo, and hope to announce them in the near future.

The case of mixed forks in Emmy+

2020-10-26T12:00:00+01:00

Note: This analysis was done with the help of Bruno Blanchet (Inria). The interested reader can experiment with our code used in the analysis. As in the previous analysis, we do not present any security proofs.

This is the fourth in a series of posts on Emmy⁺:

After our initial analysis,
recently revisited and
extended to the partially synchronous network model,
we now consider so-called “mixed forks”.

So far, we assumed that malicious bakers wanted to undo a transaction. In this post, we consider instead that they want to maintain a (malicious) fork for as long as possible.

We provide experimental evidence that such scenarios do not significantly affect the previous analysis, which therefore remains robust in the presence of this kind of attack.

Mixed forks

$\newcommand\f[1]{\mathit{#1}}$ $\newcommand\edelay{\f{delay}}$ $\newcommand\edelaydelta{\edelay_{\Delta}}$ $\newcommand\ie{\f{ie}}$ $\newcommand\dpp{\f{dp}}$ $\newcommand\de{\f{de}}$ $\newcommand\dpde{\f{dpde}}$ $\newcommand\od{\tau}$ $\newcommand\emmy{\f{Emmy^+}}$ $\newcommand\hb{H}$ $\newcommand\cb{C}$ $\newcommand\pr{\mathtt{Pr}}$ $\newcommand\pprio[1]{\mathbb{P}_{prio}(#1)}$ $\newcommand\pendo[1]{\mathbb{P}_{endo}(#1)}$ $\newcommand\pdiff[1]{\mathbb{P}_{\Delta}(#1)}$ $\newcommand\difff[1]{\f{diff}_{=1}(#1)}$ $\newcommand\diff[1]{\f{diff}_{>1}(#1)}$ $\newcommand\diffl[1]{\f{diff}_{\ell}(#1)}$ $\newcommand\difffp[1]{\f{diff^{\leftarrow}}_{=1}(#1)}$ $\newcommand\diffp[1]{\f{diff^{\leftarrow}}_{>1}(#1)}$ $\newcommand\secu{\eta}$ $\newcommand\barpc{\bar{\pc}}$ $\newcommand\barph{\bar{p}^\star}$ $\newcommand\tsh{t^\star}$ $\newcommand\tsc{t}$ $\newcommand\diffg{\f{diff}}$ $\newcommand\pc{p}$ $\newcommand\maxp{\f{max\_p}}$ $\newcommand\ph{p^\star}$ $\newcommand\diffgns{\f{diff}^{\;ns}}$ $\newcommand\diffgs{\f{diff}^{\;s}}$ $\newcommand\cchain{\f{ch}_\cb}$ $\newcommand\hchain{\f{ch}_\hb}$

For readability we introduce some notation: we use $\hb$ and $\cb$ to denote an honest and a corrupt (dishonest) baker, respectively.

We can think of several dishonest bakers $\cb_i$ as being a single ‘composite’ dishonest baker $\cb$ having as stake fraction the sum of the stake fractions of $\cb_i$, and similarly for the honest bakers.

So for simplicity, we reason henceforth using a pair of an honest baker $\hb$ and a dishonest baker $\cb$ acting as an adversary.

In our previous analyses (as above) we assumed $\cb$ wants to undo a transaction, so:

$\cb$ bakes a secret chain $\cchain$ and
once $\cchain$ is faster than $\hb$’s chain $\hchain$, $\cb$ reveals it, and
honest $\hb$ adopts $\cchain$ and bakes on it thenceforth.

Now, suppose $\cb$ wants to maintain the system in a forked state¹ as long as possible. So:

$\cb$ still bakes a secret chain $\cchain$,
$\cb$ still reveals it once it is faster than $\hchain$, and
$\hb$ still adopts $\cchain$ and bakes on it — but now,
$\cb$ continues to secretly bake, this time on $\hchain$.

The roles of the two chains are thus swapped, and that’s why we talk about mixed forks: the same chain may be $\hb$’s chain at some point and $\cb$’s chain at some other point, and this swapping of roles may happen more than once.

We illustrate such a scenario in Figure 1 below, where:

Blocks (and their timestamps) baked by $\hb$ are identified by the “$^\star$” superscript.
Blocks baked by $\cb$ are in red.
The hashed red pattern denotes a block that $\cb$ bakes but does not reveal.
A block index represents the level at which the block is baked.
The number in each block denotes the block’s priority.

We assume $\cb$ has enough stake and chance to have consecutive 0 priorities for the blocks following $b_1$, so that $\cb$ can bake those blocks sooner than $\hb$.

Figure 1 shows that:

At level 1, $\cb$ starts a fork from $b^\star_0$ by baking $b_1$; $\cb$ cannot reveal $b_1$ because $b_1$ has a larger timestamp than $b^\star_1$ baked by $\hb$.
At level 2, $\cb$ double-bakes, baking $b_2$ on its own secret chain and $b'_2$ on the chain of $\hb$, and reveals $b_2$ before $\hb$ bakes at this level (the block $b^\star_2$ that $\hb$ would have baked, had it not been for $b_2$, is drawn in gray with dashed lines).
At level 3, $\cb$ bakes $b_3$ on top of $b'_2$ while $\hb$ bakes $b^\star_3$ on top of $b_2$; $\hb$ does so because $\hb$ swapped chains once $\cb$ revealed $b_2$: at that point, $\cb$’s alternative chain $b_0^\star b_1 b_2$ is longer than $\hb$’s chain $b_0^\star b_1^\star$.
At level 4, $\cb$ bakes $b_4$ on top of $b_3$ and reveals it before $\hb$ bakes at this level (note that we are in a similar situation as that at level 2).
At level 5, $\hb$ bakes $b^\star_5$ on top of $b_4$: once $\cb$ revealed $b_4$, $\hb$ swapped chains again, because the alternative chain is longer than the one on which $\hb$ was baking (the same reasoning as at level 3). Note that $\hb$ bakes $b^\star_5$ on the chain $\hb$ initially started baking on.
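To make the chain swaps concrete, the sketch below replays the scenario of Figure 1 under a deliberately simplified fork-choice rule that we assume for illustration only: $\hb$ moves to a revealed chain when it is strictly longer than its own. Real Emmy⁺ fork choice also involves priorities, endorsements and timestamps, which this toy ignores. Block names follow the figure, with `b2'` standing for $b'_2$.

```python
# Toy replay of Figure 1. Simplified fork choice (our assumption): H moves to a
# revealed chain only if it is strictly longer than its current one.

parent = {"b0*": None}  # block -> the block it was baked on
revealed = {"b0*"}      # blocks visible to the honest baker H


def chain(tip):
    """Blocks from b0* up to tip, oldest first."""
    blocks = []
    while tip is not None:
        blocks.append(tip)
        tip = parent[tip]
    return blocks[::-1]


def reveal(tip):
    """C publishes tip together with all its (possibly secret) ancestors."""
    revealed.update(chain(tip))


class HonestBaker:
    def __init__(self, tip):
        self.tip = tip
        self.swaps = 0

    def bake(self, block):
        parent[block] = self.tip
        revealed.add(block)
        self.tip = block

    def update(self):
        """Swap to the longest revealed chain if it beats the current one."""
        best = max(revealed, key=lambda b: len(chain(b)))
        if len(chain(best)) > len(chain(self.tip)):
            self.tip = best
            self.swaps += 1


h = HonestBaker("b0*")
# Level 1: H bakes b1*; C secretly bakes b1 on b0*.
h.bake("b1*")
parent["b1"] = "b0*"
h.update()
# Level 2: C double-bakes b2 (on b1) and b2' (on b1*), and reveals only b2.
parent["b2"] = "b1"
parent["b2'"] = "b1*"
reveal("b2")
h.update()            # first swap: b0* b1 b2 beats b0* b1*
# Level 3: H bakes b3* on b2; C secretly bakes b3 on b2'.
h.bake("b3*")
parent["b3"] = "b2'"
h.update()
# Level 4: C bakes b4 on b3 and reveals it, along with b2' and b3.
parent["b4"] = "b3"
reveal("b4")
h.update()            # second swap: the 5-block chain beats H's 4-block chain
# Level 5: H bakes b5* on b4.
h.bake("b5*")
h.update()

print(h.swaps)        # 2
print(chain(h.tip))   # ['b0*', 'b1*', "b2'", 'b3', 'b4', 'b5*']
```

After two swaps, $\hb$ ends up extending a chain that starts with its own $b^\star_1$, matching the observation about level 5.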

$\cb$ could in theory continue baking on both chains and force $\hb$ to swap chains repeatedly, provided $\cb$ receives the priorities and endorsement slots needed to do so. We would thus like to quantify to what extent $\cb$ can do this: how often, and for how long.

Probabilities of mixed forks

To experimentally show that the probability that $\cb$ can maintain a fork decreases (sufficiently rapidly) as fork length increases, we revisit the scenario “forks starting now”. We consider the following question:

How long can $\cb$ maintain the system in a forked state?

To answer this, we use an approach similar to the one for computing fork probabilities in the non-mixed case, whose methodology we presented in a previous post. The key aspects needing careful consideration in the mixed case are:

how to identify when a swap is feasible and advantageous for $\cb$, and
how to update the time difference between chains at the moment of a swap.

Using our updated analysis, we plot the probabilities of mixed forks and compare them with the probabilities of non-mixed forks as computed in the previous analysis. In this experiment we assumed that the network is synchronous.²

The plot suggests that, for moderate attacker stake fractions such as 0.2 or even 0.3, the expected number of confirmations in the mixed case is only slightly larger than in the non-mixed case. For instance, for an attacker stake fraction of 0.3, one needs to wait a priori for 20 confirmations instead of 16. The expected number of confirmations grows for higher attacker stakes: for 0.4, it is 85, versus 67 in the non-mixed case.

To conclude, our experiments show that, even if $\cb$ tries to repeatedly make $\hb$ swap chains, $\cb$ cannot do this for too long: the number of expected confirmations when considering “mixed forks” is only slightly higher than without them.

This post considers specifically the case that $\cb$ tries to maintain one fork. Of course we could generalise this to multiple forks, but our simulations on a simplified model suggest that one fork is already a good approximation to the more general case. ↩
We recall that $\Delta = 30$ represents the case when the network is synchronous. ↩

Regular Scheduling For Our Tezos Proposals

2020-10-21T00:00:00+02:00

The teams at Nomadic Labs, Metastate, Marigold, and DaiLambda have participated in a number of joint protocol proposals for Tezos; some of us have been working on the code since the original launch of the Tezos network, and have been involved with updates from Athens through the recent Delphi proposal. Over time, we have gained more and more experience and have learned what practices seem to work best for updates to the Tezos ecosystem.

Up until recently, we have generally focused on releasing proposals when a particular pre-determined set of features has become stable. As the community has matured, and as we have gained experience, we have come to believe that it is better to release proposals on a regular schedule instead.

The rest of this blog post describes our reasoning.

In software development, teams are often tempted to release a new version of their code only after a particular set of features is complete. The problem with this approach is that it often leads to releases occurring at longer and longer intervals.

The syndrome tends to work this way: a particular feature is on the must-have list for a release, but is delayed. A developer of another feature worries that because the gap between releases has increased, it is important that their feature be incorporated before the next release or it may not see the light of day for some time. The work to incorporate this change then delays the release further.

Then, other developers notice that releases are happening at longer and longer intervals, and they in turn become more and more worried that if they don’t get their features into the next release, it might be a very long time before users see them.

Soon, a vicious cycle is in progress where releases happen further and further apart, first being delayed by months, and then by years.

One remedy that many projects have found for this problem is straightforward: release code regularly, at scheduled intervals rather than when particular features are complete.

The reason this works is also straightforward. If trains leave a station every ten minutes, few people will be worried about missing a train; there will always be another a few minutes later. However, if trains leave without a fixed schedule and at long intervals, people become very paranoid that when the current train leaves, there might not be another for quite some time.

An important feature of Tezos is its ability to evolve without breaking its own rules. Tezos’ on-chain governance provides a neutral mechanism for stakeholders to coordinate and agree on updates to the protocol and its implementation.

Like many blockchain projects, Tezos is an open-source project with contributors from all over the world but, unlike other projects, it provides a mechanism through which anyone can credibly advance proposals without facing the typically insurmountable hurdle of coordinating a hard-fork.

Our teams, alongside many other contributors around the world, have historically collaborated on making ambitious proposals for the Tezos project. Going forward, we have decided to collaborate to submit protocol proposals every few months, which is the interval permitted by the Tezos on-chain amendment process. These will incorporate code that has been completed to our satisfaction by the scheduled date, rather than our introducing new proposals only when a predetermined set of features is finished.

The capacity to evolve is a key distinguishing feature of Tezos. With a scheduled release approach, protocol proposals will arrive at a much steadier cadence. We believe that this will, in turn, allow Tezos to gain the features developers and users have requested on a more dependable timeline.

Meanwhile at Nomadic Labs #9

2020-10-20T15:00:00+02:00

It’s been a while since we published a post in our meanwhile series, and as always we’ve been working hard behind the scenes to improve the Tezos ecosystem.

August marked a milestone: we launched Dalphanet, a dedicated test network for trying out features from all the developers involved in the larger, long-awaited protocol proposal, including Sapling and a new protocol environment, among other improvements. For more information on Dalphanet see:

On August 19th, Raphaël Cauderlier attended the Tezos Town Hall. The main topics were governance and technical processes within the Tezos ecosystem. Tezos’ governance is uniquely innovative and flexible. The discussion shifted to high gas fees, which led to the proposal of Delphi, an intermediate proposal to lower gas costs when interacting with smart contracts on the Tezos blockchain.

In conclusion, Gabriel Alfour, Metastate and Nomadic Labs announced Delphi, a smaller upgrade to the Tezos blockchain focused on lowering gas fees. Also, Delphi reduces storage costs by a factor of four, to reflect improvements in the underlying storage layer.

The public baker Stakery was the first to inject the Delphi protocol into the Tezos blockchain. We are currently in the testing period, and the promotion period will start very soon. Follow this link to Agora to track the progress in real time. Since the injection, we have also been working on finalizing a proposal for 008.

We also released Version 7.4, enabling the community to participate in Delphinet, a test network that we spawned for the recently injected Delphi proposal.¹

Nomadic Labs is pleased to support another promising project: Tezos DigiSign, an open source solution to digitally sign, store and verify documents on the Tezos blockchain, launched by Sword Group. Read more in the Tezos DigiSign press release.

In mid-September we:

published our latest press release: Tezos selected by Societe Generale/Forge for its central banking digital currency experiment;
published a new blog post on defending against malicious reorgs in Tezos proof of stake;
attended September’s Tezos Town Hall with topics ranging from Tezos core development to the technical decision-making in the Tezos ecosystem; and
co-organized a scientific workshop with Inria. As part of our collaboration we held presentations and discussions on research & tools within the Tezos ecosystem.

In October we:

published two blog posts:
- Emmy⁺ in the partial synchrony model describes Emmy⁺, going beyond the synchronous network model, and
- Dexter: Decentralized exchange for Tezos, formal verification work by Nomadic Labs goes into detail on how we verified a functional specification of Dexter’s core smart contract using Mi-Cho-Coq;
uploaded our internship catalog featuring ten open positions; and
announced a new release candidate: Version 8.0.

We look forward to the future and continued growth of users, developers & builders within the Tezos ecosystem.

Delphinet is not to be confused with Dalphanet! ↩

Dexter: Decentralized exchange for Tezos, formal verification work by Nomadic Labs

2020-10-14T10:30:00+02:00

On September 30th, camlCase released Dexter.

Dexter is a smart contract empowering decentralized exchange, performing the same function for tez as Uniswap on Ethereum does for ether; it enables trade between tez and any other token implemented by an FA1.2 contract.

To date, the efforts undertaken to ensure Dexter’s quality include:

a security audit by Trail of Bits,
extensive property-based testing by camlCase, and
a formal verification of the functional specification of Dexter’s underlying smart contract by Nomadic Labs, using the Coq proof assistant with the Mi-Cho-Coq framework — the topic of this blog post.

Checking smart contracts is never about attaining certainty; it’s about clarifying what we do and do not know. Therefore, our efforts increase our confidence in the Dexter smart contract, but do not make errors impossible. In this blog post we give:

an outline of what we have proved about Dexter, and also
what we did not prove.

What is Dexter and FA1.2?

Dexter is a smart contract, with an accompanying decentralized app enabling decentralized token exchange on the Tezos blockchain. Users call Dexter’s smart contract to trade between tez and other tokens, without the intervention of a central authority. For instance:

tez can be exchanged for tzBTC, a token that wraps bitcoin, and
tez can be exchanged for USDtez, a USD-pegged stablecoin token⁴.

More precisely, Dexter is compatible with any FA1.2 token¹. FA1.2 is a standard for smart contracts implementing tokens on the Tezos blockchain. It specifies a minimal set of entrypoints for the contract to provide and how these entrypoints must behave. It is similar to the ERC20 standard for Ethereum.
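For readers unfamiliar with FA1.2, here is a minimal in-memory Python model of its two state-changing entrypoints, `transfer` and `approve`, as we understand the standard (TZIP-7). The view entrypoints are reduced to a plain `get_balance` lookup. It is a sketch for intuition, not a reference implementation: the class and method names are our own, and the error strings are the names we recall from the standard.

```python
from collections import defaultdict


class FA12Error(Exception):
    """Raised when an entrypoint call fails; a failed call leaves the state unchanged."""


class FA12Ledger:
    """Toy model of an FA1.2 token: balances plus (owner, spender) allowances."""

    def __init__(self, balances):
        self.balances = defaultdict(int, balances)
        self.allowances = defaultdict(int)  # (owner, spender) -> remaining allowance

    def transfer(self, sender, from_, to, value):
        """Move `value` tokens from `from_` to `to`, on behalf of `sender`."""
        key = (from_, sender)
        spending_for_someone_else = sender != from_
        if spending_for_someone_else and self.allowances[key] < value:
            raise FA12Error("NotEnoughAllowance")
        if self.balances[from_] < value:
            raise FA12Error("NotEnoughBalance")
        # All checks passed: only now do we update the state.
        if spending_for_someone_else:
            self.allowances[key] -= value
        self.balances[from_] -= value
        self.balances[to] += value

    def approve(self, sender, spender, value):
        """Let `spender` transfer up to `value` of `sender`'s tokens."""
        # Going straight from one non-zero allowance to another is refused;
        # the allowance must be reset to zero first.
        if self.allowances[(sender, spender)] > 0 and value > 0:
            raise FA12Error("UnsafeAllowanceChange")
        self.allowances[(sender, spender)] = value

    def get_balance(self, owner):
        return self.balances[owner]


# A seller approves Dexter, which then pulls the sold tokens (as in a token-to-tez trade).
token = FA12Ledger({"alice": 100})
token.approve("alice", "dexter", 30)
token.transfer("dexter", "alice", "dexter", 30)
print(token.get_balance("alice"), token.get_balance("dexter"))  # 70 30
```

The approval step matters for Dexter: when a user sells tokens, it is Dexter that asks the token contract to move them, so the seller must have approved Dexter as a spender beforehand.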

Formally verifying a functional specification of Dexter

Nomadic Labs has experience of applying formal verification to smart contracts, most notably through our formal verification of the multisig and the spending limit contracts. We brought this experience to bear on Dexter as follows:

We built an informal, functional, specification (in collaboration with camlCase);
formalized this specification in Coq; and
certified that the Dexter smart contract satisfies the formal specification, using a machine-checked proof in Coq.

The informal specification clearly and comprehensively describes the Dexter smart contract’s desired functional behaviours in natural language. This does three important things:

It’s a clear, human-readable, functional specification of the contract. A functional specification defines the contract’s desired, immediate output, in terms of updated storage and emitted operations, for each input.
The act of rigorously working through the specification improved the quality of the final code in a virtuous engineering cycle which ironed out details and covered corner cases.
We could then convert this human-readable specification into a formalization suitable for machine checking in Coq.

This work, while considerable and consequential, is not a panacea or a silver bullet. The following aspects have only been checked in the audit and through testing, and have not been subject to formal study:

we cannot guarantee the soundness of the specification;
we have not formally verified Dexter’s inter-contract communication; and finally
we have not formally verified the temporal properties of Dexter.

These are the subject of future work. We discuss each of these limitations in more detail below.

But, within these limitations: we know what the contract does, we checked the design space in detail — and furthermore, we proved it.

Specified, certified code in this style is also an important long-term investment in maintainability and security. For example, verifying high-level economic properties such as non-depletion is future work, and doing so will not require looking at Dexter’s source code: we can read the human-readable specification and/or reason from the formal specification, and our certification ensures the conclusions will hold for the implementation.

Methodology

Our development followed a policy where the contract is specified, implemented, and verified by three distinct teams:

the specifiers,
the implementors, and
the verifiers.

This policy forces the teams to verify each other’s work, and thus increases overall confidence in the development.

For example, if the verification team spots a divergence between the code and the specification, they can’t just change one to suit the other: they have to communicate with the implementors and specifiers to agree on an intended behavior, and update the code and/or the specification accordingly.

For Dexter, we have had:

one specifier, who worked with camlCase on the informal specification and then on its formalization;
camlCase, who implemented the code of Dexter; and
two verifiers, who proved the contract against the formal specification.

We followed this policy for most of the entrypoints, but time constraints meant the specifier also participated in the verification of 3 out of 11 entrypoints.

Time investment

Work on the informal and formal specification of Dexter has been ongoing part-time since November 2019. In total, collaborating on the informal specification and developing the formal specification took two person-weeks on Nomadic Labs’ side. The final proof took three verification engineers four person-weeks to write.

Verification framework

The functional specification of Dexter has been verified in the Coq proof assistant using the Mi-Cho-Coq framework.

Coq is a rich environment, in which we can directly state and prove a wide variety of mathematical properties. In particular we can specify programs, program executions, and properties of program executions — including correctness with respect to a specification — all in a single theorem-proving environment.

Likewise Mi-Cho-Coq embeds Michelson smart contracts in Coq, so we can specify Michelson contracts and verify their properties, all in Coq — for instance, that a contract can’t fail or run out of tokens.

We do not reason directly on the Mi-Cho-Coq interpreter, since the details of how scripts are interpreted (think: stack manipulation) are irrelevant here. Instead, we use Mi-Cho-Coq’s weakest-precondition calculator, which translates a contract and its execution into a logical proposition. To prove properties of the contract, we just need to prove properties of the corresponding logical proposition.
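The idea behind a weakest-precondition calculator can be sketched on a toy stack language. In the Python sketch below (our own simplification, unrelated to Mi-Cho-Coq's actual code), each instruction turns a postcondition `post` on the output stack into the weakest condition on the input stack guaranteeing that the instruction does not fail and produces a stack satisfying `post`. Predicates here are Python functions evaluated on concrete stacks, whereas Mi-Cho-Coq produces a Coq proposition to be proved for all inputs. The instruction names only loosely mimic Michelson, and `CHECKED_SUB` is a hypothetical instruction that fails on underflow.

```python
from typing import Callable, List, Tuple

Stack = Tuple[int, ...]          # top of the stack is element 0; booleans encoded as ints
Pred = Callable[[Stack], bool]


def wp(program: List[Tuple], post: Pred) -> Pred:
    """Weakest precondition of a sequence: transform post from the last instruction back."""
    pre = post
    for instr in reversed(program):
        pre = wp_instr(instr, pre)
    return pre


def wp_instr(instr: Tuple, post: Pred) -> Pred:
    """Weakest precondition of a single instruction."""
    op = instr[0]
    if op == "PUSH":                 # ("PUSH", n): push the constant n
        n = instr[1]
        return lambda s: post((n,) + s)
    if op == "ADD":                  # replace the two top elements by their sum
        return lambda s: len(s) >= 2 and post((s[0] + s[1],) + s[2:])
    if op == "CHECKED_SUB":          # top minus second; fails if the result would be negative
        return lambda s: len(s) >= 2 and s[0] >= s[1] and post((s[0] - s[1],) + s[2:])
    if op == "FAILWITH":             # always fails: no input stack avoids failure
        return lambda s: False
    if op == "IF":                   # ("IF", then_branch, else_branch): pops the condition
        then_pre = wp(instr[1], post)
        else_pre = wp(instr[2], post)
        return lambda s: len(s) >= 1 and (then_pre(s[1:]) if s[0] else else_pre(s[1:]))
    raise ValueError(f"unknown instruction {op}")


# Push 10, then compute 10 - x, where x was on top of the input stack.
pre = wp([("PUSH", 10), ("CHECKED_SUB",)], lambda s: s[0] <= 10)
print(pre((3,)), pre((12,)))          # True False

# A guard: continue if the flag is set, fail otherwise.
guarded = wp([("IF", [("PUSH", 1)], [("FAILWITH",)])], lambda s: s[0] == 1)
print(guarded((1,)), guarded((0,)))   # True False
```

The point of the translation is compositionality: the precondition of a whole program is obtained by chaining the per-instruction rules, so a property of the contract reduces to a property of the resulting predicate.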

Findings

Developing a chain of formal spec / informal spec / implemented code, and checking that the three match up, catches errors. Below, we detail the errors Nomadic Labs found in the Dexter contract. These errors have been fixed in the version of Dexter running on mainnet.

Parameter verification

Each entrypoint of Dexter specifies a set of conditions on the parameters that must hold to successfully execute the entrypoint.

Sometimes the formal specification omitted parameter verifications that the informal specification said should be present.
Sometimes the code omitted parameter verifications that the informal specification said should be present (this can be a particular security issue).
Sometimes the informal specification omitted parameter verifications that the code implemented.

Operation order

Michelson contracts emit operations (e.g. token transfers) at the end of their execution. In Dexter’s case, the operations are transfers either: to the caller of the contract; to an FA1.2-compliant token contract; or to another Dexter contract. When verifying Dexter we found inconsistencies between the order of the operations emitted by Dexter and the order required by the specification.

For example: suppose a user wants to sell $X$ tokens for $Y$ tez using the tokenToXtz entrypoint. Dexter’s specification asserts that Dexter should

transfer $Y$ tez to the caller, and only then
transfer $X$ tokens from the caller to Dexter through a call to the token contract.

It turned out that the implementation emitted these operations in the reverse order: first the token transfer, then the tez transfer. This was not an exploitable security issue, but it could have been, and such a violation of a specification complicates the integration of Dexter with other contracts. The following example demonstrates why.

Consider a scenario where this error was not fixed, and where the caller is itself a smart contract. The developer of the calling contract would assume from the specification that when their contract receives the $Y$ tez from Dexter, it has not yet paid the outstanding $X$ tokens. In fact, however, the non-conforming implementation of Dexter would have transferred the $X$ tokens from the caller before sending the tez to the caller.

So the calling contract might think it had more tokens available than the actual balance, and in mistakenly attempting to spend them, there might be insufficient funds for the FA1.2 token transfer initiated by the token-to-tez trade.

This scenario demonstrates the importance of the finer details of a specification, and ensuring that the implementation satisfies it.
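A divergence like this can be caught mechanically by comparing the emitted operation list with the order the specification prescribes. The Python sketch below does this for tokenToXtz. The `Op` record, the party names and the function names are hypothetical stand-ins, not Dexter's actual Michelson types.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    kind: str          # "tez" for a plain tez transfer, "fa12" for a call to the token contract
    source: str        # who the tez or tokens leave from
    destination: str   # who receives them
    amount: int


def spec_token_to_xtz(caller: str, dexter: str, tokens: int, tez: int) -> List[Op]:
    """Order required by the specification: pay the caller first, then pull the tokens."""
    return [
        Op("tez", dexter, caller, tez),
        Op("fa12", caller, dexter, tokens),
    ]


def buggy_token_to_xtz(caller: str, dexter: str, tokens: int, tez: int) -> List[Op]:
    """Pre-fix behaviour described above: the same operations in reverse order."""
    return list(reversed(spec_token_to_xtz(caller, dexter, tokens, tez)))


def conforms(emitted: List[Op], expected: List[Op]) -> bool:
    # Emitting the same operations is not enough: their order is part of the specification.
    return emitted == expected


expected = spec_token_to_xtz("caller", "dexter", 50, 20)
emitted = buggy_token_to_xtz("caller", "dexter", 50, 20)
print(conforms(emitted, expected))            # False
print(sorted(emitted) == sorted(expected))    # True: same operations, wrong order
```

A check that only compares the set of operations would accept the buggy version; the comparison has to be order-sensitive.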

Deadline checks

Dexter’s entrypoints for adding and removing liquidity and for performing trades include a deadline check: an order that fails this check should be rejected.

We found a small divergence between the specification of deadlines and their implementation. The specification states that the deadline must be greater than the current time stamp — so an order handled exactly on the deadline should be rejected.

However, in some entrypoints the check was inclusive: an order handled exactly on the deadline was accepted. This does not seem to have been an exploitable security issue, but the process of finding and fixing the issue illustrates how formal verification traps errors that might otherwise sneak into production code.
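The divergence comes down to a single comparison operator. The sketch below treats timestamps as plain integers (an assumption for illustration; Michelson timestamps are their own type) and shows that the specified check and the inclusive check disagree only when an order is handled exactly at the deadline, the kind of boundary case a proof forces one to consider.

```python
def deadline_ok(now: int, deadline: int) -> bool:
    """Specification: the deadline must be strictly greater than the current timestamp."""
    return deadline > now


def deadline_ok_inclusive(now: int, deadline: int) -> bool:
    """The divergent check: also accepts an order handled exactly at the deadline."""
    return deadline >= now


# The two checks only disagree at the boundary now == deadline.
for now in (99, 100, 101):
    print(now, deadline_ok(now, 100), deadline_ok_inclusive(now, 100))
```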

Future work

As with any formal verification, there are limits to our development. Some are inherent to software verification, while others stem from limitations of Mi-Cho-Coq:

Formal verification of the soundness of Dexter’s specification

By soundness we mean conformity to some “common sense” economic properties; e.g. that an attacker can’t remove the funds of a liquidity provider without permission.

This is simple garbage in, garbage out: we can check whether an implementation satisfies its specification, but what if the specification’s wrong? For instance, a specification’s author might inadvertently permit an attack by which an attacker would remove funds that are not theirs, by simple human error or by not fully understanding the domain logic of the system.

Formal verification of Dexter’s inter-contract communication

Michelson contracts communicate by emitting transfer operations at the end of their execution. If the destination of the transfer is another Michelson contract, then that too will be executed, and in turn it may emit operations. Thus, a single contract’s execution may trigger a chain of multiple contract calls.

For example, consider Dexter’s entrypoint tokenToXtz, which exchanges tokens for tez. It emits a transfer of the obtained tez to the caller, and a transfer to the token contract that moves the sold tokens to Dexter. We’d like to assert that after this operation sequence, Dexter indeed gets the sold tokens.

Currently we can’t do this in Mi-Cho-Coq: we can confirm that Dexter emits the corresponding call to FA1.2, but Mi-Cho-Coq does not implement the semantics of inter-contract communication², so we cannot specify (and so cannot verify) the final effect of the complete chain of contract calls.
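To illustrate the gap, here is a toy Python model (our own, not Mi-Cho-Coq's) in which each contract is a function returning the calls it emits. `interp_one` mirrors the limitation described above: it executes a single call and stops. `run_all` keeps executing emitted calls until none remain. The FIFO worklist is an arbitrary choice for the sketch and is not meant to reproduce the exact ordering of any particular Tezos protocol version; the contract and parameter names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Call = Tuple[str, str]                  # (destination contract, parameter)
Contract = Callable[[str], List[Call]]  # a contract returns the calls it emits


def interp_one(contracts: Dict[str, Contract], call: Call) -> List[Call]:
    """Execute one contract call and return the calls it emits, then stop."""
    destination, parameter = call
    return contracts[destination](parameter)


def run_all(contracts: Dict[str, Contract], call: Call, max_calls: int = 1000) -> List[Call]:
    """Execute a call and, transitively, every call it triggers; return the trace."""
    trace: List[Call] = []
    pending = deque([call])
    while pending:
        if len(trace) >= max_calls:
            raise RuntimeError("call budget exhausted")  # stands in for running out of gas
        current = pending.popleft()
        trace.append(current)
        pending.extend(interp_one(contracts, current))
    return trace


# Hypothetical contracts echoing tokenToXtz: Dexter pays the caller and calls the token.
contracts: Dict[str, Contract] = {
    "dexter": lambda p: [("caller", "receive_tez"), ("token", "transfer")],
    "token": lambda p: [],
    "caller": lambda p: [],
}
print(interp_one(contracts, ("dexter", "tokenToXtz")))  # only Dexter's emitted calls
print(run_all(contracts, ("dexter", "tokenToXtz")))     # the whole chain of calls
```

A statement such as "Dexter gets the sold tokens" is about the outcome of `run_all`, and in this toy, `interp_one` alone cannot express it.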

Formal verification of Dexter’s temporal properties

Temporal properties concern the behavior of a system over time. For instance, we’d like to verify that a user who adds liquidity to Dexter can always remove it later.

This requires reasoning over all possible sequences of contract calls in the blockchain, which is currently impossible in Mi-Cho-Coq: first, because it requires adding the semantics of inter-contract communication, as discussed above; and second, because it also requires modeling properties of the blockchain, such as the fact that the level and timestamp of successive blocks strictly increase.

Results

The proof of the contract is available in merge request !71 of the Mi-Cho-Coq repository.

Excluding comments and blank lines³ the development is 2983 = 272 + 775 + 919 + 1017 lines of Coq code, as follows:

importing the contract into Mi-Cho-Coq takes 272 lines;
the specification is 775 lines long;
the proofs of the main theorems (see here for the proof of individual entrypoints and here for the proof of the full contract) span 919 lines total;
additionally, the proofs of the main theorems use 76 lemmas, spanning 1017 lines total.

Our lemmas are largely reusable and we’ll add them to the Mi-Cho-Coq framework to help future contract certifications, by us or by others.

Moving forward

At Nomadic Labs we are proud to have participated in the development and the verification of Dexter, the first smart contract for exchanging tokens on the Tezos blockchain.

This work:

improves the security of a key component of the Tezos blockchain,
has improved the open-source tooling for verifying smart contracts in Coq,
is a long-term investment in (and commitment to) future maintainability and security, and
defines more clearly what is assured, and what is not yet assured, of Dexter’s behaviour.

Extending Mi-Cho-Coq is active R&D at Nomadic Labs, and we see potential applications beyond functional smart-contract specifications.

For example: the Trail of Bits audit of earlier versions of Dexter revealed exploits related to inter-contract communication (which have been fixed), of a type which could not have been detected with Mi-Cho-Coq in its current form. Improving this requires a finer modeling of Tezos’ smart-contract execution model. Extending Mi-Cho-Coq in this direction, and proving such properties on contracts such as Dexter, is future work for the smart-contract verification team at Nomadic Labs.

Also, to further increase confidence in Dexter’s soundness, we envisage an equivalence proof between Dexter and Uniswap. They work on similar principles, and Uniswap has been extensively studied for security bugs, so a proof that Dexter is equivalent (in a suitable formal sense) to Uniswap would extend the trust already placed in Uniswap to Dexter.

Acknowledgments

Thanks to camlCase for their help during the verification effort, and to Trail of Bits for discussing the finer points of their audit. We are very happy to have worked with them.

Technically, Dexter can interact with any contract that satisfies the FA1.2 specification. However, security issues may arise if Dexter is integrated with a token contract that, whilst satisfying FA1.2, is malicious. Dexter’s Token integration checklist specifies a set of conditions that the integrated token must satisfy to ensure safe operation. ↩
To be more precise: when applying a block, the Tezos protocol takes as input the list of operations the block contains and acts on each in turn. Suppose the next operation is a contract call, so the protocol executes it. The result may include a list of new operations, such as further contract calls; although we’d like the protocol to execute these too, we have not implemented this in Mi-Cho-Coq. We have only implemented the function interp that is called during block application. This function only executes one contract call and then stops. Consequently, we cannot handle a contract call that triggers other contract calls. Nor do we model balance or storage updates. ↩
This count does not include the Dexter contract script, even though it is also present in the development as a Coq string. ↩
Nomadic Labs is unaffiliated with USDtez or tzBTC. ↩