Amendments at Work in Tezos

Summary:

We are now on the verge of submitting a protocol upgrade to a vote, and it seems like a good opportunity to explain in detail how the Tezos node handles amendments in practice.

Brace yourselves, this article is quite technical, as are all articles in our in-depth category. Still, as we did in the previous one on snapshots, we’ll try to explain the stakes and announcements and give a brief summary in a short foreword understandable even by non-programmers.

The original whitepaper by L.M. Goodman describes Tezos as a system capable of engulfing any blockchain protocol. A blockchain is ultimately a ledger whose state (the balances and stakes) is transformed by operations, and a consensus algorithm, which chooses among all possible alternative heads proposed by the network. They are modelled by two functions, in the mathematical sense, \(apply\) and \(fitness\). From an initial state \(S_0\), \(apply(B_n,S_{n-1})\) is called at each block of height \(n\) to produce the current state of the ledger \(S_n\) (called context in the codebase). \(fitness(S_n)\) gives a score to each alternative chain to decide which chain is the current chosen head. The plot twist is that this new state \(S_n\) can actually contain two new functions \(apply\) and \(fitness\), that will replace the current ones for the next block.
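To make the plot twist concrete, here is a minimal OCaml sketch of a state that carries its own \(apply\) and \(fitness\). Everything in it (the `block` and `state` records, the two generations of rules) is a toy invented for illustration and is not taken from the Tezos codebase.

```ocaml
(* Toy model: all names and rules here are illustrative assumptions. *)
type block = { payload : int; amend : bool }

(* The state holds ledger data AND the functions to use for the next block. *)
type state = {
  height : int;
  balance : int;
  apply : block -> state -> state;
  fitness : state -> int;
}

(* Second-generation rules: payloads count double, fitness is scaled. *)
let apply_v2 b s =
  { s with height = s.height + 1; balance = s.balance + (2 * b.payload) }

let fitness_v2 s = 10 * s.height

(* First-generation rules: a block flagged [amend] stores the
   second-generation functions in the state it produces. *)
let apply_v1 b s =
  let s' = { s with height = s.height + 1; balance = s.balance + b.payload } in
  if b.amend then { s' with apply = apply_v2; fitness = fitness_v2 } else s'

let fitness_v1 s = s.height

(* Initial state S_0. *)
let s0 = { height = 0; balance = 0; apply = apply_v1; fitness = fitness_v1 }

(* Each block is applied with the functions found in the previous state. *)
let run blocks = List.fold_left (fun s b -> s.apply b s) s0 blocks

let final =
  run
    [ { payload = 5; amend = false };
      { payload = 5; amend = true };
      { payload = 5; amend = false } ]
```

In this run the first two blocks are applied with the first-generation rules and the third with the second-generation ones, so `final.balance` is 20 and `final.fitness final` is 30.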

This neat mathematical abstraction is the very essence of Tezos. In this article, we try to demystify how this concept is turned into an actual evolving blockchain that can amend itself to remain at the state of the art and learn from its comrades’ (i.e. other protocols’) innovations and mistakes.

The software architecture of Tezos has often been illustrated by the following image, where the node is represented by a colorful octopus (the famous El Pulpo).

[Figure: visual depiction of Tezos architecture]

The octopus interacts with the rest of the network via its arms, stores the data in its belly, and uses its interchangeable brain to validate the chain and select its head.

The takeaway from this drawing is that the Tezos node is split into two zones, one containing the other. The node is mostly a shell, a generic blockchain network and storage layer, that exchanges blocks and operations with its peers, without knowing much about their content. It sees a tree of possible chains, and has to decide which one is considered the head.

To make sense of all this data coming from the network, the shell relies on protocols as black boxes. In order to choose its head, the shell will look at all its candidate heads, find the protocol associated with each one, call their \(fitness\) function, and select the one with the highest score. For instance, in the current protocol, this score is the number of endorsements and blocks in the chain. To validate blocks and operations, the protocol relies on its current state (context). For the shell, this is an append-only database storing arbitrary data. It is in the protocol’s purview to give meaning to its content.
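As a rough illustration of this head selection, the sketch below gives each candidate an opaque context together with the fitness function of its protocol. The `candidate` record, the representation of a context as a list of per-block endorsement counts, and the scoring function are assumptions made for the example, not the shell's actual data structures.

```ocaml
(* Hypothetical shell-side view of a candidate head. The context is modelled
   as the list of endorsement counts of the blocks in the chain. *)
type candidate = {
  name : string;
  context : int list;
  fitness : int list -> int;
}

(* The shell does not interpret the context; it only asks for a score. *)
let score c = c.fitness c.context

(* Keep the candidate with the highest score; on a tie the earlier one wins. *)
let select_head = function
  | [] -> None
  | c :: cs ->
      Some
        (List.fold_left
           (fun best c -> if score c > score best then c else best)
           c cs)

(* Toy fitness: number of blocks plus number of endorsements. *)
let blocks_and_endorsements ctx = List.length ctx + List.fold_left ( + ) 0 ctx

let heads =
  [ { name = "A"; context = [ 30; 32; 28 ]; fitness = blocks_and_endorsements };
    { name = "B"; context = [ 30; 32; 31; 2 ]; fitness = blocks_and_endorsements };
    { name = "C"; context = [ 30; 32 ]; fitness = blocks_and_endorsements } ]
```

With these toy values the scores are 93, 99 and 64, so `select_head heads` returns candidate "B".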

The first section of this article describes in more technical terms what these black boxes contain, the format they have to respect, and how they interact with the shell.

During a Tezos amendment, when a protocol or a user decides that it is time for a protocol change, the shell has to retrieve and switch to the new protocol. This protocol has an additional migration function \(init\) that knows how to convert the content of a previous protocol to its own format.

The next two sections explain what happens in practice during an amendment, and the final section gives a more concrete vision with a survey of past protocol upgrades. In other words, a brief history of Tezos.

The protocols: who are they? where do they live? to whom do they answer?

Practically speaking, a protocol is a series of OCaml module definitions and interfaces that must respect a set of rules to make sure that they can be loaded properly by the Tezos shell.

Protocol format

There are two reference formats for Tezos protocols. The on-disk format, input of the Tezos compiler, is a folder containing .ml (source code) and .mli (signature) files, accompanied by a TEZOS_PROTOCOL JSON description that contains the list of modules in compilation order, an optional environment version (see next section), and an optional protocol hash override for testing purposes. The packed format is a binary encoding described in src/lib_base/protocol.ml. This second format is the one used to compute the protocol hash and to transmit protocols over the peer-to-peer network.

The module list must end in a module named Main that contains the definitions of \(apply\), \(fitness\) and \(init\). In reality, they are broken down into smaller data types and functions, and are accompanied by a few utilities such as data encodings and RPC definitions. The exact signature that Main must respect is called PROTOCOL, and is defined in src/lib_protocol_environment/sigs/v1/updater.mli. This is how the shell interacts with protocols in a standardised and type-safe way, considering them as black boxes that all have the same input plugs. Basically, you can see the protocols as plug-ins that conform to a common plug-in interface.
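The plug-in idea can be sketched in a few lines of OCaml. The signature below is a deliberately simplified stand-in that we call `PROTOCOL_SKETCH`; the real PROTOCOL signature is much richer, and every name here (`parse_block`, `Toy_main`, `validate`, the association-list context) is hypothetical.

```ocaml
(* Simplified stand-in for the plug-in interface, NOT the real PROTOCOL. *)
module type PROTOCOL_SKETCH = sig
  type context = (string * int) list
  type block (* abstract: the shell cannot look inside a block *)

  val parse_block : string -> block option
  val init : context -> context
  val apply : context -> block -> (context, string) result
  val fitness : context -> int
end

(* A toy protocol playing the role of the Main module. *)
module Toy_main : PROTOCOL_SKETCH = struct
  type context = (string * int) list
  type block = { credit : string; amount : int }

  (* Blocks arrive as raw text of the form "account:amount". *)
  let parse_block raw =
    match String.split_on_char ':' raw with
    | [ credit; amount ] ->
        Option.map (fun amount -> { credit; amount }) (int_of_string_opt amount)
    | _ -> None

  let init ctxt = ("height", 0) :: ctxt
  let get key ctxt = Option.value ~default:0 (List.assoc_opt key ctxt)
  let set key v ctxt = (key, v) :: List.remove_assoc key ctxt

  (* Credit the account and bump the chain height. *)
  let apply ctxt b =
    if b.amount < 0 then Error "negative amount"
    else
      Ok
        (ctxt
        |> set b.credit (get b.credit ctxt + b.amount)
        |> set "height" (get "height" ctxt + 1))

  let fitness ctxt = get "height" ctxt
end

(* Shell side: works with any protocol that fits the signature. *)
let validate (module P : PROTOCOL_SKETCH) ctxt raw =
  match P.parse_block raw with
  | None -> Error "unparsable block"
  | Some b -> P.apply ctxt b
```

For example, `validate (module Toy_main) (Toy_main.init []) "alice:10"` goes through the toy protocol without the shell ever inspecting the block. Passing the protocol as a first-class module is only one way to model the plug-in relationship.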

The protocol environment

Continuing on this (relevant) analogy with plug-ins, protocols have access, for convenience, to a library of functions (including cryptographic primitives) and access to the state (such as the stored balances) in a way that is safe, and maintainable. In other words, they have access to a stable plug-in API.

For that, the OCaml code of a protocol is not compiled and run in the same environment as the rest of the node. Instead, it can only access the modules whose interfaces are given in src/lib_protocol_environment/sigs/v1/. These modules cover standard OCaml and Tezos-specific libraries that have been stripped down to remove any unsafe or unmaintainable function. This is a form of sandboxing achieved through the OCaml type-system.

When the shell loads a protocol, it provides an actual implementation for each module in the environment. As a concrete example, the begin_application function from the PROTOCOL interface takes a value of type Context.t, that is the database that contains the ledger’s state. The protocol only knows that this type exists and has a few associated functions. When a protocol is loaded, this type and functions are plugged to the actual on-disk storage primitives.

This abstraction is not just interesting for reducing the attack surface via sandboxing; it also opens up possibilities such as replacing the implementation of functions by newer, better ones, implementing different shells, etc. For instance, the test framework of Tezos uses a variant that runs entirely in RAM.
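Here is a small sketch of how such an abstraction can be expressed with the OCaml module system. The `CONTEXT` signature, the `Make_protocol` functor and the in-memory implementation are illustrative assumptions; the real environment exposes far more modules and a different API.

```ocaml
(* What the protocol is allowed to see: an abstract type and a few functions. *)
module type CONTEXT = sig
  type t

  val get : t -> string -> string option
  val set : t -> string -> string -> t
end

(* The protocol code is written against the signature only, so it cannot
   depend on how the storage is actually implemented. *)
module Make_protocol (Context : CONTEXT) = struct
  let balance ctxt account =
    match Context.get ctxt account with
    | None -> 0
    | Some raw -> Option.value ~default:0 (int_of_string_opt raw)

  let credit ctxt account amount =
    Context.set ctxt account (string_of_int (balance ctxt account + amount))
end

(* One possible implementation supplied by a shell: a map held in RAM. *)
module String_map = Map.Make (String)

module Memory_context = struct
  type t = string String_map.t

  let empty = String_map.empty
  let get ctxt key = String_map.find_opt key ctxt
  let set ctxt key value = String_map.add key value ctxt
end

(* "Loading" the protocol plugs the abstract context into a real one. *)
module Protocol_in_memory = Make_protocol (Memory_context)
```

A disk-backed module with the same signature could be passed to `Make_protocol` instead, without touching the protocol code.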

Different clients for different protocols

If you look at the source code of Tezos mainnet, you will see the code for all the previous protocol upgrades in folders of the form src/proto_<index>_<hash>/. Each of these folders contains a subfolder lib_protocol/src/ which is the actual code of the protocol, but also others, such as bin_baker or lib_client_commands.

For running a node, only the code in lib_protocol/src/ is necessary, and it is not even necessary to have it locally, since the shell could request it from the peer to peer network. But to use this protocol from the command line client, or to run a baker for this specific protocol, the other pieces are needed, and cannot be downloaded from the network.

Indeed, the important part, the one people agree upon when they vote, is the protocol itself. The command line client is just one client among others, and the baker, endorser and accuser are just reference implementations that we’ve provided, but others can write their own custom versions. It would not make sense either to provide generic implementations, as Tezos is flexible enough to completely change the consensus algorithm or the account system, and thus the code that builds blocks or produces transactions must be specific to each protocol.

Running a baker and running a node to participate in the system represent different levels of involvement. A baker has to understand and prepare for protocol changes, because they will have to run a new baker and, depending on the update, may have to adapt to modifications in their setup.

But if you just run a node, you can launch it once, and updates will come to you safely by themselves as they are proposed and approved by Tezos stakeholders.

Amendment blocks

To trigger a protocol change (for instance, in the current system, at the end of a successful voting cycle), the protocol has access in its environment to an Updater.activate function. This function writes the hash of the chosen protocol to a specific place in the context.

In that case, the shell, after validating the block, reads the hash and looks for the protocol. If it does not already have it, it downloads it from its peers at this point, then compiles it and loads it as a plug-in.

The next step is then to prepare the context format for the next protocol by calling its \(init\) function. This function initialises any new structure introduced. For instance, if shielded votes or transactions are introduced, they may require new structures in the context. It may also migrate existing structures if needed. For instance, the upcoming proposal will include a transition in the structure of accounts, as a first cleanup to prepare for a forthcoming overhaul of the account system. We informally call this practice stitching the context.
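The sequence described in this section (activation request, lookup of the new protocol, migration) can be sketched as follows. This is a toy model under stated assumptions: the `protocol` record, the `"next_protocol"` key, the `lookup` function standing in for download and compilation, and the fact that the shell clears the request are all hypothetical.

```ocaml
module String_map = Map.Make (String)

type context = string String_map.t

type protocol = {
  hash : string;
  init : context -> context; (* migration function *)
  apply : context -> string -> context; (* block validation *)
}

(* Hypothetical location of the activation request in the context. *)
let activation_key = "next_protocol"

(* Stand-in for Updater.activate: record the chosen hash in the context. *)
let activate ctxt hash = String_map.add activation_key hash ctxt

(* Shell side: validate with the current protocol, then check whether an
   activation was requested. [lookup] stands in for "download, compile, load". *)
let validate_block ~lookup current ctxt block =
  let ctxt = current.apply ctxt block in
  match String_map.find_opt activation_key ctxt with
  | None -> (current, ctxt)
  | Some hash ->
      let next = lookup hash in
      (* Toy assumption: the request is cleared before migrating. *)
      let ctxt = String_map.remove activation_key ctxt in
      (next, next.init ctxt)

let proto_new =
  { hash = "new";
    (* Stitching: move the old "balance" entry to its new location. *)
    init =
      (fun ctxt ->
        match String_map.find_opt "balance" ctxt with
        | None -> ctxt
        | Some v ->
            ctxt |> String_map.remove "balance"
            |> String_map.add "accounts/balance" v);
    apply = (fun ctxt _block -> ctxt) }

let proto_old =
  { hash = "old";
    init = (fun ctxt -> String_map.add "balance" "100" ctxt);
    apply =
      (fun ctxt block ->
        if block = "vote-passed" then activate ctxt "new" else ctxt) }

let lookup hash =
  if hash = "new" then proto_new else failwith "unknown protocol"
```

Validating an ordinary block with `proto_old` leaves the protocol unchanged, while validating the `"vote-passed"` block returns `proto_new` together with a context in which the balance has been moved to its new key.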

If you consult such a transition block via the node’s RPCs, you will notice that it has two fields, protocol and next_protocol. The former is the protocol that parsed and validated the block, providing a clear JSON version of the block contents (header, operations and receipts). The latter is the one that can run in the newly migrated context, defining the contextual RPCs (i.e. protocol-dependent RPCs, such as getting an account’s balance). The first example in the third part gives a more concrete view of these two fields.

Automatic vs. user-activated upgrades

Tezos has integrated support for two kinds of upgrades: automatic upgrades, and user-activated upgrades.

Automatic upgrade is the method that we just explained in detail, and that starts when the protocol decides to call activate. In the current protocol of mainnet, this activate function is called only on a protocol that passed a successful voting procedure. But to be precise, this voting procedure is a specific case of a more general procedure, and is not carved in stone at all. Using this voting mechanism, people could vote to switch to a different voting system, or to a dictatorship. To be even more dramatic, people could decide to vote for a protocol that never calls activate anymore, forcing the Tezos mainnet to use the same protocol for eternity.

User-activated protocol upgrades have different purposes. The main use case is uncontroversial bug fixes or emergency reactions to unexpected behaviour. Of course, people could also try to use them to instigate a revolution and bypass the automatic upgrade system (for instance, to correct the silly situation described at the end of the previous paragraph). You can find all of this in great detail in this article by Arthur Breitman.

A quick history of past upgrades

As a concrete example of how the amendment works in practice, let us review the history of Tezos protocol upgrades to date.

From `proto_000_Ps9mPmXa` (`"genesis"`) to `proto_001_PtCJ7pwo` (`"alpha"`)

This is what happened on June 30th of last year. Some early birds were already running nodes in the network, waiting for the activation block.

Technically, when a Tezos mainnet node boots up from an empty data directory, it forges a dummy block (the same for everyone) at level zero. This block has no data, its identifier is BLockGene...sisf79b5d1CoW2, and its next_protocol is set to proto_000_Ps9mPmXa. This means that any block at level one will be parsed and validated by this "genesis" protocol.

The genesis protocol accepts a single block (at level one) that contains a protocol hash, and data to pass to its initialisation function placed at a specific place in the context. In the case of the mainnet (betanet at the time), the activated protocol was proto_001_PtCJ7pwo, codename "alpha", and the data was a list of wallet allocations. The node will then call the initialization function of this new protocol, so that the context is prepared (initialized with the initial balances of the wallets) for protocol "alpha" to validate the subsequent blocks properly.

Let us take this example to illustrate the difference between the protocol and next_protocol fields in the RPC API explained earlier. This activation block at level 1 has its protocol field set to proto_000_Ps9mPmXa: the protocol that parses and validates the block. Its next_protocol field, however, is proto_001_PtCJ7pwo: the protocol that responds to API calls and will evaluate the next block.
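The two fields can be pictured with a tiny record. The `block_metadata` type and the helper functions are hypothetical and only mirror the description above, not the node's actual RPC schema; the protocol names are the shortened folder names used in this article.

```ocaml
(* Hypothetical, simplified view of the two fields discussed above. *)
type block_metadata = {
  level : int;
  protocol : string; (* parsed and validated this block *)
  next_protocol : string; (* answers contextual RPCs, validates the next block *)
}

(* A block is a transition block when the two fields differ. *)
let is_transition b = b.protocol <> b.next_protocol
let validated_by b = b.protocol
let answers_rpcs_for b = b.next_protocol

(* The activation block at level 1, as described in the text. *)
let activation_block =
  { level = 1;
    protocol = "proto_000_Ps9mPmXa";
    next_protocol = "proto_001_PtCJ7pwo" }

(* An ordinary block right after it: both fields name "alpha". *)
let following_block =
  { level = 2;
    protocol = "proto_001_PtCJ7pwo";
    next_protocol = "proto_001_PtCJ7pwo" }
```

Here `is_transition activation_block` holds, while `is_transition following_block` does not.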

In a sandbox, or in a custom deployment, you can push your own activation block, and launch "alpha" with your own set of wallets. Actually, the Tezos node is not even bound to start "alpha"; you could start any protocol. For instance, the test networks Alphanet and Zeronet go directly from "genesis" to the latest update when they are rebooted.

This is a first example of an automatic upgrade. Nodes will receive a block, and the protocol will decide to amend itself during its evaluation. Then, nodes will download a copy of the protocol from the network if they don’t have it built-in, and continue receiving and evaluating blocks using this new protocol.

From `proto_001_PtCJ7pwo` (`"alpha"`) to `proto_002_PsYLVpVv` (`"alpha_002"`)

This migration was a user-activated upgrade at level 28082. At this level, proto_001_PtCJ7pwo did not perform an activation. However, the development team at the time suggested migrating manually to a minor revision of the protocol, and node administrators who chose to follow this user-activated branch had to force their node into performing a protocol upgrade at this level (editing the source manually, or pulling from the repository).
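One way to picture a user-activated upgrade is as a table of forced switches that the node consults alongside what the protocol itself requests. The sketch below is an assumption about the shape of such a mechanism, not the node's actual code: the table, the function name, and the rule that a forced entry takes priority are all hypothetical. Protocol names are the shortened folder names used in this article.

```ocaml
(* Hypothetical table of forced upgrades: (level, protocol to switch to). *)
let user_activated_upgrades = [ (28082, "proto_002_PsYLVpVv") ]

(* Decide which protocol validates the blocks after [level].
   - a forced entry at this level wins (assumption of this sketch);
   - otherwise follow the activation requested by the protocol, if any;
   - otherwise keep the current protocol. *)
let next_protocol ~level ~requested_by_protocol ~current =
  match List.assoc_opt level user_activated_upgrades with
  | Some forced -> forced
  | None -> Option.value requested_by_protocol ~default:current
```

With this table, `next_protocol ~level:28082 ~requested_by_protocol:None ~current:"proto_001_PtCJ7pwo"` selects `"proto_002_PsYLVpVv"`, even though the running protocol requested nothing. A node administrator who does not adopt the table simply stays on the current protocol.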

This upgrade proposal was meant to fix a few bugs that made their way into the betanet launch:

The origination of a new smart contract from within a smart contract was spending the initial balance provision twice.
Serialisation of Michelson bytecode was not implemented, preventing the use of instruction LAMBDA.
Using a specific method, it was possible to delegate funds to an account that had not registered itself as a delegate.

This upgrade also introduced a few Michelson instructions to work on a generic bytes type, which were missing when proto_001_PtCJ7pwo was released.

From `proto_002_PsYLVpVv` (`"alpha_002"`) to `proto_003_PsddFKi3` (`"alpha_003"`)

This migration was the second user-activated upgrade, at level 204761. Here again, it was a bugfix release, and node administrators who chose to follow this user-activated branch had to activate it knowingly. However, the community at this time was already much bigger, and it required time to coordinate. Hopefully, as Tezos matures and formal methods are applied to more parts of the codebase, user-activated releases will not be necessary, and all updates can happen smoothly via self-amendment.

At that time, an upgrade proposal was already on our shelves. Its goal was to fix what needed to be fixed in the protocol to allow for the first vote. Indeed, when launching the betanet, we concentrated on having strong consensus and transaction layers in the protocol, but we gave less love to the self-amendment part. Hence, it was not unexpected that a few fixes here and there would have to be implemented using a user-activated upgrade. These fixes included adding missing RPCs and correcting the ballot counts.

Our timeline for proposing this upgrade was shuffled a bit by an increased level of noise, or spam, on the network in November, due to some entity creating a lot of addresses with very low balances, and making many useless transactions of microtez amounts between them.

This spamming was possible because address creation was free, on purpose, to encourage adoption and barrierless entry to the system. This abuse led us to apply the same storage cost policy to addresses as to originated contracts.

Another spam prevention method, popular amongst bakers, was introduced. The network was running fee-less at the time. As part of the upgrade, the shell was modified to impose fee thresholds in the mempools, and a similar mechanism was added to the baker to include only transactions that respect a given fee scheme. This latter change was not a change in the protocol itself but piggy-backed on top of the coordination achieved for the user-activated upgrade.

From `proto_003_PsddFKi3` (`"alpha_003"`) to `???` (`"Athens"`)

As we explained in the last “Meanwhile at Nomadic”, our upcoming upgrade proposal(s) will contain not only fixes, but actual protocol changes that require a vote by the Tezos community. We will detail all proposed changes and alternatives in a forthcoming post.

For now, let us describe a small but defining change in our upcoming proposal. Up to now, protocol names (quoted in the titles above, such as the current "alpha_003") have been mere identifiers, stored in a specific place of the context to identify the current version.

We want to use the first voted proposal as an opportunity to adopt a new policy, and instead use these version strings to give a proper name to each new Tezos protocol version.

Our suggestion is to use city names, anglicised and in alphabetic order. Of course, in the future other entities will propose protocol upgrades, and it will be up to them to follow the convention or not. Yet, city names provide a wide set to choose from for each letter, with even a bit of room to express things.

As a demonstration, "Athens" is a pretty obvious choice for this first voted upgrade, don’t you think?
