At Nomadic Labs we are proud to create next-gen software … but we are even prouder to help create the next generation of software scientists!
So it’s our great pleasure to host and work with several PhD students, who bring unique perspectives on the technology which they help us to develop, and whose interest in blockchain we are happy to nurture and inform.
In this blogpost, we will ask five questions of one of our students, Paul Laforgue (and a couple of questions of his supervisors at Nomadic Labs).
Over to you Paul … no the mic is over here … that’s right, you’re on air!
Questions for Paul
1. Please present yourself and your academic background
I’ve always been interested in programming language design and the use of proof-assistants to check correctness. In 2015 I did my first research internship at Chalmers University, in which I was introduced to the Agda proof assistant. In 2016, I started a Master’s degree at the Université de Paris under the supervision of Yann Régis-Gianas, who was at the Université at the time (and is now a programmer at Nomadic Labs). Yann and I showed how a programming language like OCaml can be extended with codatatypes and copatterns; modern programming language constructs to define and handle infinite data structures. Then in 2017 I did an internship in which I designed a type system for a subset of the R programming language at Northeastern University.
In October 2018 I started my job at Nomadic Labs in what is now called the Shell team.1 I mainly worked on the introduction of history modes, and metrics for the node. I started my PhD thesis at the Université de Paris in late 2020.
During my time as a Nomadic programmer, I became acquainted with the Tezos OCaml implementation Octez and with the challenges it has to face. This background knowledge helped us define the goals we want this PhD thesis to tackle.
2. Tell us more about the topic of your PhD: who is your mentor, and what are your research objectives?
Distributed, networked systems are now ubiquitous. By distributed system I mean one consisting of multiple participants — usually in different locations — who coordinate by exchanging messages to achieve some goal.
Blockchains are distributed of course, but so are telecommunication networks, peer-to-peer systems, and distributed databases (e.g. in medicine and transportation). Notice how all of these examples are also safety-critical!
Distributed systems tend to appear in applications that require the following features:
- Reliability: a few faulty nodes won’t corrupt the system’s behaviour,
- Scalability: computing resources and scheduling methods can be dynamically adjusted, and
- Performance: incoming tasks can be distributed optimally amongst participants.
But there’s a catch: programming concurrent and distributed systems is notoriously hard, and in particular it is very hard to predict all possible interactions between their components.
Message-passing is a fundamental coordination mechanism used by participants in a distributed system. We are developing a framework to specify and verify message-passing protocol specifications, and to validate an implementation against a given protocol.
In particular, we are developing a framework to specify and verify multiparty message-passing protocol specifications with the aim of making them executable and usable for conformance testing. In this framework, written in Coq, protocols are defined as choreographies.
Choreographies are global descriptions of distributed systems, expressing the composition of the expected interactions between their participants. We want to specify and verify the desired properties2 at the level of choreographies, and then produce certified executables by synthesising local behaviours of each participant that faithfully realise the communications given in the choreography, a mechanism known as endpoint projection.3
My mentors at Nomadic Labs are Zaynah Dargaye and Yann Régis-Gianas. My research directors from the IRIF research laboratory at Paris University are Giovanni Bernardi and Giuseppe Castagna. All have theoretical and practical backgrounds on communication-based programming, which is essential for this project. I’m lucky to work with such talented and expert people.
3. What is the value added and innovative side of your PhD thesis (if applicable, in the Tezos blockchain)?
Tezos is open-source and relies on an open network, so anybody can build or profile its nodes and enter the network. This is an important feature when we care for decentralization. Today, two projects implement Tezos nodes: the OCaml implementation Octez and the Rust implementation Tezedge. Also, numerous releases of the node are currently running on mainnet. Miscommunication between the nodes may lead to critical issues,4 thus to guarantee system health we must ensure that the nodes communicate well.
Also, it is important to build rigourous specifications, to avoid Tezos later encountering issues related with legacy code — as e.g. core banking platforms are finding themselves struggling with now (permalink).5 In our case, the fact that the specification is executable offers benefits: it stays up-to-date with the implementations and can assist programmers when modifying the node.
4. What is the benefit of preparing a PhD and spending most of your time in a private company vs. a University lab?
Both at Nomadic Labs and at the University, I am surrounded by people passionate about their fields, coming from different backgrounds and driven by curiosity. However, there is a constructive difference in emphasis:
At the University people target the publication of academic papers, whilst at Nomadic Labs the target is the improvement of the Tezos implementation. Nomadic programmers use their academic background to transform theoretical results from university research, into practical solutions. That’s two distinct yet interdependent approaches to computer science research. As a PhD student working at Nomadic Labs I get to study, participate in, and contribute to both approaches.
5. Why did you choose to become a part of the blockchain ecosystem and Nomadic labs?
I believe that blockchain technology is promising — for decentralised finance of course, but also in other ways which I hope may be directly socially constructive as well: e.g. e-voting, such as electis, decentralised identities, and tickets.
Tezos’ self-amendment mechanism makes it interesting, and it is also a fascinating project because it involves such a breadth of areas: whether you are interested in the peer-to-peer layer, in cryptography, in verification or consensus algorithms you will find something to dig in to and to improve the software. On a more personal level I really appreciate all the energy coming from the community’s feedback.
I want to enhance my skills in the formal verification of distributed systems using proof assistants, so that by the end of the thesis can tackle industry challenges with the Verif team at Nomadic.
I would like to enrich my perspectives when coping with research problems, by learning from my mentors and studying their approaches to problem-solving. Each of my mentors is unique and has his or her strengths, which I look forward to learning from.
Finally, I hope that — indeed, I would be honoured if — my work could make distributed software more secure. Delivering a tool to concretely help programmers in their message-passing protocol design and implementations would be very gratifying.
Questions for Paul’s Nomadic Labs mentors Zaynah Dargaye and Yann Régis-Gianas
What is your input to Paul’s thesis?
A thesis is an initiation and apprenticeship into top-level research: we will help Paul to learn how to identify an open problem, understand the state of the art, and study how to build on that state of the art to find a solution. There are so many skills involved in doing this, and there is so much knowledge required to apply these skills: domain-specific mathematical knowledge, research skills to evaluate which research paths are likely to bear fruit and are worth pursuing, and how to kindle that spark of creative insight which can sometimes make all the difference to finding the solution to a stubborn problem.
Furthermore, this field is an industrial field so solutions also have to be practical. Paul will have to learn (and we will have to teach him) what the practical requirements and priorities currently are, and how to translate theoretical insights into solutions that are applicable to the real and pressing challenges of designing industry-grade distributed systems.
Why is this thesis important?
In early 2020 Paul brought up a concern he encountered in ensuring the correct implementation of a component in the Tezos codebase: how to specify and validate the correct implementation of a message-passing protocol. Curious, he studied some solutions in existing formalisms, but felt unsatisfied.
The problem that Paul highlighted is widespread in modern systems, whether or not they are distributed: it is not enough that each component of a software system should satisfy its specification locally; they also have to talk together correctly according to some agreed protocol. In other words, we need a language to specify and reason about how components interact, and what it means for these interactions to be correct (bug-free).
Paul’s thesis introduces a formal language to describe the execution of message-passing protocols rigorously and unambiguously. From that description, one can generate a reference implementation for each party in a protocol that is correct-by-construction: given that the implementation is formally constructed from its specification, it is guaranteed that each component responds correctly to the received messages, and sends the messages predicted by the protocol. The wonderful thing is that we can deploy this reference implementation in production, and also use it to validate other implementations — e.g. optimised ones that are faster or have better memory-consumption.
Tezos is split into two parts: the user-facing (and user-upgradable) Economic Protocol, which is what you see and interact with when you run Tezos — and the hardware-facing Shell, which is a lower-level abstraction layer which takes care of e.g. messaging and communication. Think: Economic Protocol = OS and Shell = Kernel. If Tezos were a car, then the Shell team would be a bunch of mechanics covered in grease, making sure all the bolts are tight and no users get hot oil squirted in their faces (no offense, Paul). -ed ↩
Examples range from generic correctness properties, such as no deadlock, when two or more processes get stuck waiting on action from one another (like two super-polite people waiting, forever, for the other one to go through the door first!); all the way to very specific technical correctness properties, such as that the trustworthiness of a certain percentage of the peers advertised to bootstrapping nodes by an advertiser node, is correct with respect to its pool of connections. ↩
For choreographies see Fabrizio Montesi’s 2013 PhD thesis “Choreographic Programming” (permalink) and “A Core Model for Choreographic Programming” by Luís Cruz-Filipe and Fabrizio Montesi (2016) (see in particular page 1, final paragraph). For endpoint projection, see “Deadlock-freedom-by-design: multiparty asynchronous global programming” by Marco Carbone and Fabrizio Montesi (2013) (permalink). ↩
For example, if communications in a distributed system become incoherent then the network might split, meaning that a subset of the system’s nodes become isolated and evolve independently for some time, and once this has happened it may not be trivial to ‘glue’ the system back together again. ↩
This article on the modern need for COBOL (permalink) is also a good read. COBOL is in a sense about as far from modern OCaml in the evolution of industrial programming languages as it is possible to get. ↩