seL4 Proofs

seL4 has machine-checked mathematical proofs for the Arm, RISC-V, and Intel architectures. This page describes the high-level proof statements and how strong the proofs are.

To find out even more about the verification, also check out the FAQ's verification section.

High-level Proof Statements

The seL4 proofs span the following top-level properties.

Binary Code Semantics
Binary Code Semantics
C Code Semantics
C Code Semantics
Specification
Specification
Access Control + Information Flow
Access Control + Info...
Capability Distribution (capDL)
Capability Distributi...
Binary Code
Binary Code
C Code
C Code
Binary Correctness
Binary Correctness
Functional Correctness
Functional Correctness
Initialisation Correctness
Initialisation Cor...
Security Enforcement
(Integrity, Availability, Confidentiality)
Security Enforcement...
Invariants
Invariants
Design
Design
  • Functional correctness: the C code of all verified configurations of seL4 behaves precisely as specified in the seL4 specification and nothing more. This is the strongest assurance that the code will not have any unspecified behaviour.
  • Binary correctness: in the configurations that support this property, the binary machine code running on the processor correctly implements the behaviour of the C code assumed in the functional correctness proof. By extension, this means the binary code implements precisely the behaviour of the specification, and nothing more.
  • Security: in the configurations that support this property, the specification, and by extension the kernel code, prevents an application running on top of seL4 from modifying data without authorisation (integrity), from interfering with resource access of other applications (availability), and from learning information from other applications without explicit authorisation (confidentiality). Together these security properties enforce the isolation of components running on top of the kernel, allowing critical components to securely run alongside untrusted software.
  • System initialisation: the proofs above show that seL4 provides strong security — when used correctly. The security theorem statements provide requirements on how the system must be set up. For instance, for integrity to hold, isolated components cannot be given write access to each other, and confidentiality holds between security domains of seL4's domain scheduler, not between arbitrary threads. In the configurations that support verified system initialisation, the seL4 proof stack has further parts that help with the task of setting up systems correctly: the capDL specification language that lets developers describe access control configurations statically, and the user-level system initialiser that takes those descriptions, and automatically configures the system to provably conform to them at boot time.

Note that not all properties are available on all platforms and all configurations. See the list of verified configurations for the current status of what is available.

The properties listed above are what the proofs show directly. A side effect of the strength of these statements is that they also imply that seL4 is free from whole classes of common programming errors such as buffer overflows, null pointer dereferences, memory leaks, arithmetic exceptions or undefined behaviour. See the page on proof implications for more.

As with all proofs, there are still assumptions that must be met, and there may still be user expectations on kernel behaviour that are not captured by the properties proved so far. Nevertheless, this degree of strong evidence for security and correct functionality has never been achieved before for an OS kernel and is still unrivalled.

How Strong are the Proofs?

We claim that seL4 has world's highest level of assurance for an operating system kernel. On the research level, this is substantiated by the following facts:

  • formal verification in general is stronger than other traditional ways of gaining trust in software such as testing in that it can prove the absence of bugs, not just their presence.
  • within systems that are claimed to be formally verified, seL4's proof stack goes down to the binary code. This massively reduces the assumptions it relies on, compared to verification projects where the system model stays at a more abstract level.
  • within existing formal verification approaches, the seL4 proofs target strong properties, such as functional correctness and explicit security theorems. More traditional techniques such as static analysis often only target the absence of specific classes of programming errors, which are implied by the seL4 proofs.
  • within tools that can be used for formal machine-checked proof, the seL4 proofs use the proof assistant Isabelle/HOL, which is engineered to minimise the code that needs to trusted for the proof to be correct — see more on the assumptions page.

This means in all dimensions of measuring the strength of formal verification, detail of model, depth of property, and correctness of the prover, seL4 is on the highest levels.

All these arguments are nice, but they are abstract. There is something more concrete: we can measure how strong these proofs are in practice.

Before we turn to seL4 itself, there is a research paper about Finding and Understanding Bugs in C Compilers that makes empirical observations about the CompCert C compiler. Like seL4, CompCert also comes with a functional correctness proof, and this is what the authors find:

The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task.

Now turning towards seL4, there are test suites in addition to the formal verification: one for performance tests and one for functional tests.

Why have these if there is formal verification? Here are some of the classes that are still interesting to test:

  • Code in the verified kernel that is left as an assumption, e.g. machine interfaces such as caching. These can be tested and have a fairly high density of tests.
  • Code in unverified configurations of the kernel — things that are either not yet verified or are only for debugging or are for platforms/configuration combinations that are unverified.
  • Tests for verified code to confirm expectations: this is testing the specification. It is less about coverage, more about usability and confirming that the API is what the developers were intending, as opposed to what the specification says or what is implemented. It is also helpful when such tests break on code changes before these are verified.

In addition to the test suites, there are also integration tests with various user-level libraries and frameworks to make sure that they still work (or are updated appropriately) when we make a change to the kernel API.

This means there are a lot of tests that run frequently, hundreds for every code change. seL4 also has been in continued deployment for over a decade. This gives us some data.

In all this time, there have been no functional correctness defects in verified code since the functional correctness proof completed in 2009 — more than 15 years of tests, use, and deployment. There have also never been any integrity defects in verified code since the integrity proofs finished, and so on.

To contrast how remarkable 0 defects in 15 years are: defect density in unverified configurations of seL4 is overall lower than in normal software, but it is not zero. This means there have been defects discovered even in mature unverified configurations of the kernel (e.g. multicore SMP, or x64 VT-x virtualisation extensions), and even in verified configurations we have detected small defects in every class of unverified code in the kernel in that time — hardware assumptions, compiler assumptions, unverified boot code. As the saying goes: everything that can go wrong will eventually go wrong. All the more remarkable that in the entire 10,000 lines of verified code nothing has gone wrong in all that time. This is not surprising, we have a proof for that after all, but it can be hard to grasp.