Nomadic Labs
Catching sneaky regressions with pytest-regtest

An in-depth look at how we use regression testing to catch bugs in Tezos.

Testing is an important complement to the formal methods that we use throughout Tezos to ensure software quality. In this blog post we discuss how we use regression testing through the pytest-regtest tool.

Regression testing

Regression testing is a coarse-grained testing method for detecting unintended changes in the behavior of the system under test. We have applied regression testing to the tezos-client, notably to smart-contract type checking and execution. Broadly, regression testing consists of two components: first, a way of running a test scenario and storing the behavior of the system; second, a way of running the same scenario again and comparing the behavior of the two runs.

This is why we call regression testing coarse-grained: a regression test passes only if the output is exactly the same as in the first, stored, run. Consequently, a regression test will detect non-functional changes, such as changes in white space output by the tested system. Such nitpicking can be overly restrictive, and we will show later in this blog post how to relax overly strict verification.

In our case, we have enabled regression testing on an existing test suite that exercises the type checker and interpreter of Michelson smart contracts. In the first run of the test suite, the output of each command passed to tezos-client by the test case is stored in a log file by the test runner. After verifying the validity of the output, we commit these log files to source control. In subsequent runs, the output of each command sent to the client is compared to the expected output stored in the corresponding log file.
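The record-then-compare cycle described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the code of pytest-regtest: the function name check_regression and its parameters are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def check_regression(log_file: Path, output: str, reset: bool = False) -> List[str]:
    """Return the diff between the stored log and `output` (empty means pass)."""
    if reset:
        # Recording run: store the current behavior as the expected one.
        log_file.parent.mkdir(parents=True, exist_ok=True)
        log_file.write_text(output)
        return []
    # Comparing run: a missing log is treated as an empty expectation,
    # so the check fails until a recording run has been made.
    expected = log_file.read_text() if log_file.exists() else ""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            output.splitlines(),
            fromfile="expected",
            tofile="current",
            lineterm="",
        )
    )
```

Because the comparison is textual, even a change in white space produces a non-empty diff, which is exactly the coarse-grained behavior discussed above.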

Regression testing using pytest-regtest

To implement regression tests, we plug pytest-regtest into the pytest-based Python Testing and Execution framework used for integration testing of Tezos. This plugin provides the regtest fixture, which is used as a file object: it can be written to, or used as a context manager. If the --regtest-reset flag is passed to pytest, then all output written to the regtest fixture during a given test case is stored in a log file. On subsequent runs (unless the --regtest-reset flag is passed again), the output written to the fixture is compared to what was recorded during that run.

Let’s write a test case and then extend it with regression testing. We start out with a classic unit test that exercises the Michelson ADD instruction. We first define a Michelson contract that takes a pair of natural numbers as parameter and, using the ADD instruction, calculates their sum and puts it in storage:

parameter (pair nat nat);
storage (option nat);
code { CAR; UNPAIR; ADD; SOME; NIL operation; PAIR; }

We store the contract in add_example.tz and write the following pytest:

class TestRegressionExample:
    def test_example_add(self, client):
        add_contract = 'add_example.tz'
        run_script_res = client.run_script(
            add_contract, 'None', 'Pair 10 15', amount=None
        )
        assert run_script_res.storage == '(Some 25)'

The test runs the add_example.tz contract with parameters 10 and 15 using the run_script method of the client fixture of the Tezos Python testing suite, and then verifies that the result is indeed 25.

As expected, this test passes:

$ pytest -rf tests_python/tests/test_contract_opcodes.py -k test_example_add
============================== test session starts ===========================================
platform linux -- Python 3.7.3, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: .../tezos/tests_python, inifile: pytest.ini
plugins: timeout-1.3.3, parallel-0.0.9
collected 302 items / 301 deselected / 1 selected

tests_python/tests/test_contract_opcodes.py::TestContractOpcodes::test_example_add PASSED

This is all very well, but say that we want to test some other aspect of the ADD instruction, such as its gas consumption. We could write a new test case, but this quickly gets tiresome as Michelson has some 80 instructions that should all be tested. Furthermore, gas constants are likely to evolve, so hard-coding them in the tests makes for brittle tests. Our test suite already contains tests for all instructions. It would be more convenient to piggyback on the existing tests, without explicitly asserting the gas cost of each instruction in them.

First, how do we inspect the gas consumption of a contract? We can track the gas consumption of a program (and by extension, its instructions) using the --trace-stack option of the run script command of the client. When this option is given, the interpreter reports a stack trace that contains the contents of the stack and the remaining gas at each program point during execution. Let’s try (the output has been abridged for brevity):

$ tezos-client run script add_example.tz on storage None and input 'Pair 10 15' --trace-stack
storage
  (Some 25)
emitted operations

big_map diff

trace
  - location: 9 (remaining gas: 799557 units remaining)
    [ (Pair (Pair 10 15) None)       ]
  - location: 10 (remaining gas: 799557 units remaining)
    [ (Pair 10 15)      @parameter ]
  - location: 13 (remaining gas: 799556 units remaining)
    [ (Pair 10 15)      @parameter
      (Pair 10 15)      @parameter ]
  - location: 14 (remaining gas: 799556 units remaining)

[...]

  - location: 11 (remaining gas: 799554 units remaining)
    [ 10
      15     ]
  - location: 18 (remaining gas: 799553 units remaining)
    [ 25     ]
  - location: 19 (remaining gas: 799553 units remaining)
    [ (Some 25)      ]
  - location: 20 (remaining gas: 799552 units remaining)
    [ {}
      (Some 25)      ]

[...]

We can deduce that the gas consumption for the ADD instruction in this execution is at most one gas unit (location 18), and that the gas consumption for the whole script is no more than five units. This is correct, and we would like future tests to signal if this changes unexpectedly. A solution is to write the whole stack trace to the regtest fixture. To implement this idea, we modify our test:

    def test_example_add_reg(self, client, regtest):
        add_contract = "add_example.tz"
        run_script_res = client.run_script(
            add_contract, 'None', 'Pair 10 15', amount=None, trace_stack=True
        )
        regtest.write(run_script_res.client_output)
        assert run_script_res.storage == '(Some 25)'

The modified test requires the regtest fixture by including it as a formal parameter. Then we instruct the client fixture to pass the --trace-stack flag to tezos-client through the trace_stack=True argument. Finally, we write the complete output from the client, as obtained from run_script_res.client_output, to regtest. If we now run test_example_add_reg, it fails due to the lack of a pre-existing log file.

$ pytest tests_python/tests/test_regression_example.py -k example_add_reg
tests_python/tests/test_regression_example.py::TestRegressionExample::test_example_add_reg FAILED [100%]

================== FAILURES ======================
____________________________ TestRegressionExample.test_example_add_reg ____________________________

regression test output differences for tests/test_regression_example.py::TestRegressionExample::test_example_add_reg:

>   --- current
>   +++ tobe
>   @@ -1,43 +1 @@
>   -storage
>   -  (Some 25)
>   -emitted operations
>
>   -big_map diff
[...]

We run the test again with the --regtest-reset flag to create the log file:

$ pytest --regtest-reset tests_python/tests/test_regression_example.py -k example_add_reg

The generated log file is found in tests_python/tests/_regtest_outputs/test_regression_example.TestRegressionExample\:\:test_example_add_reg.out, and we can verify that it contains the expected stack trace:

$ cat tests_python/tests/_regtest_outputs/test_regression_example.TestRegressionExample\:\:test_example_add_reg.out
storage
  (Some 25)
emitted operations

big_map diff

trace
  - location: 9 (remaining gas: 799557 units remaining)
    [ (Pair (Pair 10 15) None)       ]
  - location: 10 (remaining gas: 799557 units remaining)
    [ (Pair 10 15)      @parameter ]
  - location: 13 (remaining gas: 799556 units remaining)
    [ (Pair 10 15)      @parameter
      (Pair 10 15)      @parameter ]
[...]
  - location: 11 (remaining gas: 799554 units remaining)
    [ 10
      15     ]
  - location: 18 (remaining gas: 799538 units remaining)
    [ 25     ]
  - location: 19 (remaining gas: 799537 units remaining)

We can now run the regression test and verify that it passes:

$ pytest tests_python/tests/test_regression_example.py -k example_add_reg
=================================================================== test session starts ===========================

platform linux -- Python 3.7.3, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: .../tezos/tests_python, inifile: pytest.ini
plugins: timeout-1.3.3, parallel-0.0.9, regtest-1.4.2
collected 2 items / 1 deselected / 1 selected

tests_python/tests/test_regression_example.py::TestRegressionExample::test_example_add_reg PASSED                                                                                                                                                                                  [100%]

========================== 1 passed, 1 deselected in 2.29 seconds ================================================

Now if a developer inadvertently introduces a change that modifies the gas consumption of the ADD instruction or its functioning, then this regression test should break. Likewise if the developer intentionally changes the semantics of the instruction. However, this should not be taken for granted. A good practice when writing tests is to test the test itself by intentionally breaking the functionality tested and verifying that the corresponding test breaks. We follow this advice and introduce a change in the gas cost of additions.

Gas costs are (mainly) defined in src/proto_alpha/lib_protocol/michelson_v1_gas.ml. Assume a cat, well-meaning but imprudent, prances over the developer’s keyboard, introducing the following change:

diff --git a/src/proto_alpha/lib_protocol/michelson_v1_gas.ml b/src/proto_alpha/lib_protocol/michelson_v1_gas.ml
index f61e519fe..228af62c6 100644
--- a/src/proto_alpha/lib_protocol/michelson_v1_gas.ml
+++ b/src/proto_alpha/lib_protocol/michelson_v1_gas.ml
@@ -182,7 +182,7 @@ module Cost_of = struct
     let abs int = atomic_step_cost (61 + ((int_bytes int) / 70))
     let int _int = free
     let neg = abs
-    let add i1 i2 = atomic_step_cost (51 + (Compare.Int.max (int_bytes i1) (int_bytes i2) / 62))
+    let add i1 i2 = atomic_step_cost (9000 + 51 + (Compare.Int.max (int_bytes i1) (int_bytes i2) / 62))

Oops! Suddenly additions have a fixed overhead of over 9000! It would be unfortunate if this change passed our test suite. Let us verify that this is not the case. We recompile the node and run the tests again:

$ pytest tests_python/tests/test_regression_example.py -k example_add_reg
=================================================================== test session starts ===========================

platform linux -- Python 3.7.3, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: .../tezos/tests_python, inifile: pytest.ini
plugins: timeout-1.3.3, parallel-0.0.9, regtest-1.4.2
collected 2 items / 1 deselected / 1 selected

tests_python/tests/test_regression_example.py::TestRegressionExample::test_example_add_reg FAILED [100%]

================== FAILURES ======================
____________________________ TestRegressionExample.test_example_add_reg ____________________________

regression test output differences for tests/test_regression_example.py::TestRegressionExample::test_example_add_reg:

>   --- current
>   +++ tobe
>   @@ -28,16 +28,16 @@
>      - location: 11 (remaining gas: 799554 units remaining)
>        [ 10
>          15    ]
>   -  - location: 18 (remaining gas: 799413 units remaining)
>   +  - location: 18 (remaining gas: 799538 units remaining)

All is well: the change is reflected in the output and the tests do indeed break. Curiously, the difference between the previous gas consumption and the new one is not 9000, but 799538 - 799413 = 125. The reason is that atomic_step_cost works with an internal gas unit, which is later rescaled to obtain the user-facing gas unit.

An eavesdropping client

To avoid modifying every existing test case and manually specifying the output that should be recorded in each test, we wrote a new pytest fixture, client_regtest. This fixture combines the existing client fixture with the regtest fixture of pytest-regtest. In addition, it automatically stores the output of all commands passed to tezos-client. For example, we can now express the example test case as follows:

    def test_example_add_reg_client_regtest(self, client_regtest):
        client = client_regtest
        add_contract = "src/bin_client/test/contracts/opcodes/add_example.tz"
        run_script_res = client.run_script(
            add_contract, 'None', 'Pair 10 15', amount=None, trace_stack=True
        )
        assert run_script_res.storage == '(Some 25)'

Using this method, we add regression testing to the existing test cases simply by changing the fixture.
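To give an idea of how such an eavesdropping client can work, here is a minimal sketch of a wrapper that forwards every call to an underlying client and writes the output to a sink such as regtest. It is not the actual implementation of client_regtest: the class name RecordingClient is hypothetical, and the sketch assumes that results expose their raw output as client_output, as run_script results do in the examples above.

```python
class RecordingClient:
    """Hypothetical wrapper: forwards calls to a client and logs their output."""

    def __init__(self, client, sink):
        self._client = client
        self._sink = sink  # any object with a write() method, e.g. regtest

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        def recorded(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Assumption: results carry the raw client output in
            # `client_output`; otherwise fall back to their string form.
            self._sink.write(getattr(result, "client_output", str(result)))
            return result

        return recorded
```

A fixture built along these lines would simply return RecordingClient(client, regtest), so that test bodies remain unchanged.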

Scaling down the pedantry: adding output conversions

One difficulty with regression testing is dealing with output that may change from one test run to another. For instance, a contract that uses the NOW instruction will have a different stack trace on each run, as the timestamp that is pushed by NOW will differ on each execution. Like a merciless river, time flows ever forwards.

There is little we can do to change the laws of physics, but we can apply “output conversions” to the logs written by pytest to replace such data in the output with fixed markers.

We have written a client_regtest_scrubbed fixture that extends client_regtest with a series of such conversions. For example, a timestamp such as 2019-09-23T10:59:00Z is replaced by [TIMESTAMP] and operation hashes with [OPERATION_HASH].
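Such conversions can be sketched as a list of regular expressions applied to the output before it is logged. The patterns below are illustrative rather than the ones used in our fixture; in particular, the operation hash pattern assumes that such hashes are 51 base58 characters starting with "o".

```python
import re

# Each conversion is a (pattern, marker) pair, applied in order.
CONVERSIONS = [
    # ISO 8601 UTC timestamps such as 2019-09-23T10:59:00Z.
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"), "[TIMESTAMP]"),
    # Assumption: operation hashes are 51 base58 characters starting with "o".
    (re.compile(r"\bo[1-9A-HJ-NP-Za-km-z]{50}\b"), "[OPERATION_HASH]"),
]


def scrub(output: str) -> str:
    """Replace run-dependent data in `output` with fixed markers."""
    for pattern, marker in CONVERSIONS:
        output = pattern.sub(marker, output)
    return output
```

Output that contains no run-dependent data passes through unchanged, so scrubbing does not hide differences elsewhere in the trace.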

An issue with long paths in pytest-regtest

During the implementation of regression testing, we discovered an issue in the way pytest-regtest generates log files. Tests with long parameterizations would create log files whose names exceeded the operating system's path length limit. We have patched pytest-regtest to replace explicit parameters in log file names with a hash if the name would otherwise exceed the limits imposed by the operating system.
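The idea behind the patch can be sketched as follows. This is not the patch itself: the function name safe_log_name, the choice of SHA-1 and the limit of 255 characters are assumptions made for the example.

```python
import hashlib


def safe_log_name(test_id: str, max_len: int = 255) -> str:
    """Return a log file name for `test_id`, hashing parameters if too long."""
    name = test_id + ".out"
    if len(name) <= max_len or "[" not in test_id:
        return name
    # Split "test_name[param1-param2-...]" into its name and parameter part.
    base, _, params = test_id.partition("[")
    # Replace the explicit parameters with a fixed-size digest.
    digest = hashlib.sha1(params.encode("utf-8")).hexdigest()
    return base + "[" + digest + "].out"
```

Since the digest is deterministic, the same parameterization always maps to the same log file, which is what the comparison between runs requires.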

Discussion

We have followed the method outlined above to add regression testing to the Tezos integration tests, notably to the existing test suite test_contract_opcodes.py, which tests Michelson instruction semantics. We have also applied it to test macro expansion.

As a result, Michelson instructions are now covered by regression tests. This decreases the risk that a future change inadvertently alters macro expansion, the semantics or the gas cost of an instruction, as in many cases doing so will alert the developer by breaking the test suite. However, due to the dynamic nature of gas costs, it is impossible for any test suite to cover all behaviors. Think of the addition of numbers, whose cost depends on the size of the operands. Tests can only cover a finite number of operands, but since integers and naturals are unbounded in Michelson, they can never cover all of them. Consequently, a change that alters the gas cost for an untested pair of operands will not be detected. Nonetheless, regression testing provides an important safety net that helps to nip bugs in the bud. Moving forward, we would like to extend regression testing to other parts of the test suite.
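To make the point about operand sizes concrete, the following sketch mirrors the expression passed to atomic_step_cost for additions in the diff shown earlier. It works in internal units only and does not model the later rescaling to user-facing gas. The definition of int_bytes is an assumption: we take it to be the size in bytes of the operand, rounded up.

```python
def int_bytes(n: int) -> int:
    # Assumption: size in bytes of the operand's absolute value, rounded up.
    return (7 + abs(n).bit_length()) // 8


def add_cost_internal(i1: int, i2: int) -> int:
    # Mirrors 51 + (max (int_bytes i1) (int_bytes i2) / 62) from the diff,
    # using integer division; scaling to user-facing gas is not modeled.
    return 51 + max(int_bytes(i1), int_bytes(i2)) // 62


small = add_cost_internal(10, 15)      # operands of one byte each
large = add_cost_internal(2 ** 496, 1)  # one operand of 63 bytes
```

Under these assumptions, a test that adds 10 and 15 only ever exercises the constant part of the cost. A change to the divisor would go unnoticed unless some test used an operand of at least 62 bytes.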

Our current approach to output conversion is quite ad hoc. We use the same set of output conversions for all tests. For example, this results in all timestamps being removed from the output, even those that are constant between two runs. In the future, we would like to use test-specific conversions that directly target the parts of the output that are expected to change.

Finally, regression testing is coarse-grained and therefore also fragile. In addition to picking up unintended changes, regression testing also detects intended, harmless changes. For instance, in merge request cryptiumlabs/tezos!54, the UNPAIR macro is replaced by an instruction. Since UNPAIR is widely used, this change breaks a very large number of regression tests. If such a change also modifies instruction semantics or gas usage in an unintended way, then bugs may be introduced. In such cases, it is important that the developer inspects the diffs of the regression logs very carefully and keeps the change to the code minimal to ease review!

Indeed, as with any method for verification and validation, regression testing should be approached as a complementary activity to careful code review and formal methods.

