aboutsummaryrefslogtreecommitdiffstats

Artifact for "Verified Interpreters for Dynamic Languages with Applications to the Nix Expression Language"

This is the accompanying artifact for the following paper:

Rutger Broekhoff and Robbert Krebbers. 2025. Verified Interpreters for Dynamic Languages with Applications to the Nix Expression Language. Proc. ACM Program. Lang. 9, ICFP, Article 268 (August 2025), 29 pages. https://doi.org/10.1145/3747537

This artifact primarily consists of two components: 1. A Rocq formalization of the languages presented in the paper (LambdaLang; § 2, DynLang; § 3, EvalLang; § 3.4, NixLang; § 4). This includes operational semantics and properties, an interpreter and correctness proof, plus some tests and examples. 2. An OCaml front-end for NixLang (§ 5), which elaborates Nix source files into NixLang. These are then evaluated using the NixLang interpreter, which is derived from the Rocq sources using Rocq's program extraction functionality.

N.B.: If you are using the VM image that we provide for evaluating the artifact, then all necessary dependencies will already be installed using opam. You can directly proceed with using the Dune project and Makefile or using the CLI.

For the purposes of evaluating the artifact, we suggest the following steps: 1. Building (this must certainly be done first). 2. Comparing that the mechanized semantics and interpreters of the defined languages match those presented in the paper. The mechanization of all languages follows a common structure, but we also have a table that points you directly to the relevant definitions. 3. Comparing that the mechanized theorems correspond to those presented in the paper. We have a table that lists all theorems in the paper and their corresponding version in the formalization. The paper mentions some properties/proofs in passing, we list the corresponding mechanized versions for these as well (but it may be less relevant to verify these, as they are primarily used as stepping stones for the major theorems presented in the paper). 4. Exercising the Nix test suite on our interpreter and verifying that this gives the same results as presented in the paper (§ 5). 5. Trying out some Nix programs in our REPL. The paper (§ 1, § 4.1, § 5) contains some examples that might be interesting to try out.

You may also be interested in the axioms used by the mechanization, or in reproducing the interpreter coverage number (91.77%, § 5, p. 22) and Table 1 (§ 5 p. 22). How to do this depends on the build method that you are using, see the followings section.

N.B.: We have only verified this artifact to function on Linux (Arch Linux, Debian) and macOS. It does not work on Windows.

Building

As mentioned at the start, this project consists of two parts: the Rocq mechanization and the front-end of the Nix interpreter. The front-end of the Nix interpreter, which uses the makes use of the NixLang interpreter (written in Rocq, extracted to OCaml), is built using Dune. The Dune project also takes care of checking all of the parts of the Rocq mechanization, and extracting the NixLang interpreter to OCaml. However, for convenience, we have a separate (classic) Makefile which checks only the Rocq sources (this may be more familiar to some, and will explicitly print the files being checked). (Do note that Dune will refuse to build if the .vo files resulting from the checking process are not cleaned up; this can be done using make clean.)

Both parts require dependencies. We have two main ways of managing the dependencies of the project: either using Nix or opam.

If you are familiar with Nix and have it available to you, it will likely be most easy to use. If you are just interested in using Nix to build the project (so not directly interacting with the Dune project / Makefile), look here. If you are instead interested in interacting with dune, Makefile and/or other scripts in this repository, you should probably use the Nix devshell instead. For that, look here. With the devshell set up, you can then proceed with using the Dune project and Makefile.

If you cannot or do not want to use Nix, there is always still opam. Look here to see how to install project dependencies with opam. With the dependencies installed using opam, you can then proceed with using the Dune project and Makefile.

Building with Nix

N.B.: Only do this if you are solely interested in building/testing and/or using the artifact. If you are interested in being able to manually use the Makefile and Dune project, you should use a Nix devshell instead; look here for instructions.

Run nix-build (or equivalently nix-build ./default.nix) to build and test the artifact. This will take a few minutes. The resulting CLI should then be available as ./result/bin/mininix. See how to use the CLI below. For some more details on the test suite, look here.

Run nix-build ./axioms.nix to generate the list of axioms used by the formalization. This is equivalent to running make validate. The output of coqchk should be printed directly, but will also be stored at ./result/coqchk-output. See below for more information on the axioms used.

In the paper, we report a 91.77% coverage for the interpreter code extracted from Rocq. To generate the coverage report, run nix-build ./coverage.nix. The coverage report should then be available as ./result/coverage/report-plain (a text file, look for lib/extraction/interp.ml for the interpreter coverage). A HTML version is also generated, the report for the interpreter should then be available under ./result/coverage/html/lib/extraction/interp.ml.html.

To generate the line counts for the Rocq development (for comparison with Table 1 in the article), run nix-build ./cloc.nix. The resulting report should be available as a text file ./result/formalization-loc-report.

Managing dependencies: Nix devshell

The Nix devshell gives you a shell where all required dependencies are installed for you, so you do not have to fiddle with opam, but can still use the Dune project and Makefile that we provide. There are a few ways that you can use the devshell.

Enter the devshell by running nix-shell. You can then proceed with using the Dune project and Makefile inside of the Nix devshell.

Managing dependencies: opam

First, ensure that your opam repositories are up-to-date:

opam update

For maximum flexibility and reproducibility, we recommend running the following command to create a new opam switch in which the pinned dependencies are installed (this includes the required Rocq, Flocq, Rocq-std++ versions etc.):

# Leave out the --locked flag if not on Linux!
opam switch create ./ --repos default,rocq-released=https://rocq-prover.org/opam/released --deps-only --locked

It may be necessary to activate the newly set up opam switch as follows (if you are not sure and opam instructs you to run this after creating the switch: do run it):

eval $(opam env)

The Dune project Makefile can then be used as usual.

Dune project and Makefile

The Makefile (just the formalization)

If you are solely interested in the Rocq sources, you may make use of the Makefile to check and build Rocq sources and dependencies, provided that you have the appropriate Rocq, Flocq and Rocq-std++ versions available (8.20.1, 4.2.0/4.2.1 and 1.11.0 resp.). When using a Nix devshell or an opam switch with the required dependencies installed (instructions above), this should work automatically.

Run make to check all files in the formalization. Run make validate to print the axioms used. See below for more information on the axioms used.

The Dune project (formalization + front-end)

N.B.: Dune will produce errors if .vo files generated by the Makefile are present. Remove these by running make clean. The Dune project will then build.

This step assumes that you have dependencies installed and available, either using opam or a Nix devshell. See the relevant sections above.

To check all proofs and build the CLI, use dune build -p mininix. To then run the CLI, use ./_build/default/bin/main.exe (see how to use it below). To execute the test suite, use dune test (this may take a few minutes). For some more details on the test suite, look here.

To generate the coverage report, run ./coverage.sh (this will take a few minutes). The report will then be printed directly (look for lib/extraction/interp.ml). A detailed HTML report should also be available under _coverage/html/lib/extraction/interp.ml.html.

To generate the line counts for the Rocq development (for comparison with Table 1 in the article), make sure that cloc and jq are installed (if not using the provided Nix shell). Then run ./cloc-rocq.sh to print the report.

Usage of the CLI

Depending on how you built the CLI, it will be available to you as ./_build/default/bin/main.exe (Dune) or ./result/bin/mininix (Nix). In these examples, replace <cli> with the actual path of the CLI binary.

  • As a Nix REPL: <cli> repl. Various meta-commands are available: :quit, :run, :set and :settings. Autocomplete for these meta-commands is available; it is possible to cycle through suggestions using TAB. Input can be canceled using Ctrl+C and the REPL can be quit using :quit or Ctrl+D. The REPL evaluates in deep mode by default. This can be changed by using :set eval_strategy shallow. The REPL does not handle newlines in user input (a newline is processed as a request to process input).

The expected format of user input is the Nix language (although, as mentioned in the paper, e.g. derivations are not supported). The paper (§ 1, § 4.1, § 5) contains some examples that might be interesting to try out. - For evaluating single files: <cli> eval FILENAME. This command evaluates in shallow mode by default. You can change to deep mode by using the flag -strict. An import tree definition file can be passed using the -importsdef flag. See below.

The Rocq mechanization

See how to check/build the mechanization here. This section describes the different parts of the mechanization, and how it relates to the different parts of the article.

Use of axioms

See how to list the axioms used above (look for the dependency management / build method that you are using). (In case that you are using the Makefile (so dependencies are managed using opam / the Nix devshell): run make validate. If you are using Nix but not the devshell: use nix-build ./axioms.nix (the axioms used will also be written to result/coqchk-output).)

Use of four axioms will be reported, namely:

Coq.Logic.FunctionalExtensionality.functional_extensionality_dep
Coq.Reals.ClassicalDedekindReals.sig_not_dec
Coq.Reals.ClassicalDedekindReals.sig_forall_dec
Coq.Logic.Classical_Prop.classic

These are not axioms that we directly make use of. Instead, they are marked as used because we import Flocq (so NixLang can support IEEE 754 floating point numbers, see theories/nix/floats.v), which imports the classical Reals module from the standard library. Running coqchk on the Reals module from the standard library gives the same list of axioms as shown above.

Structure of the mechanization

There are two files that are shared by all four languages: - theories/res.v: contains the res monad. See also Fig 2. on p. 6. - theories/utils.v: contains some generic lemmas that are useful for all languages. A decent amount of these are about finite maps, which we use heavily.

There is a general structure for all four languages that we formalize: - operational.v: the definition of the operational semantics. + Contains an inductive type expr for expressions. + Contains a step relation that describes the small-step op. sem. + Contains a subst function that gives parallel substitution (as used by step). - operational_props.v: properties of the operational semantics. + Contains a lemma step_det, proving that the small-step op. sem. is deterministic. - interp.v: the definition of the interpreter. + Contains the definition of values val, thunks thunk and environments env. + Contains an interp function that takes some expression, environment and amount of fuel, and returns a result (timeout, fail or some value). (In NixLang, we end up not using interp directly, but instead use interp' which wraps interp and allows specifying whether to evaluate in shallow/deep mode, cf. ⟦e⟧^{δ,E}_μ (§ 4.4, p. 19).) - interp_proofs.v: soundness and completeness of interpreter w.r.t. operational semantics, in three main theorems: + interp_sound_complete_ret_string (or interp_sound_complete_ret_lit for NixLang) proving the soundness and completeness for programs that reduce to strings (or literals in general in NixLang). + interp_sound_complete_fail proving the soundness and completeness for programs that fault. + interp_sound_complete_no_fuel proving the soundness and completeness for programs that loop. For LambdaLang, an extra condition here is that the programs that we are considering must be closed.

There is also always a generalized version of interp_sound_complete_ret_string/lit, namely interp_sound_complete_ret, which states that the interpreter is sound and complete w.r.t. the operational semantics for (in case of LambdaLang closed) programs that reduce to a value.

Specific to the different languages are the following files: - In theories/dynlang (for DynLang, § 3): + equiv.v, the equivalence of LambdaLang and DynLang for closed LambdaLang terms. - In theories/evallang (for EvalLang, § 3.4): + tests.v, some tests of the EvalLang expression parser and interpreter. - In theories/nix (for NixLang, § 4): + floats.v: our Flocq instantiation with some helper functions. + notations.v: some notations to make writing NixLang programs in Rocq easier. + tests.v: some example NixLang programs to test the functionality of the interpreter with. + wp.v: the definition of our proof-of-concept weakest precondition-based program logic, derived rules (see § 5). + wp_examples.v: examples of use of our WP-based program logic (see § 5).

Index of definitions and theories

Relevant definitions.

In paper File and line number Name
Shared result monad (p. 6) theories/res.v:4 res
LambdaLang syntax (p. 6) theories/lambda/operational.v:6 expr
LambdaLang operational semantics (p. 6) theories/lambda/operational.v:21 step
LambdaLang final expressions (p. 6) theories/lambda/operational.v:28 final
LambdaLang parallel substitution (p. 6) theories/lambda/operational.v:12 subst
LambdaLang interpreter (p. 6) theories/lambda/interp.v:34 interp
LambdaLang interpreter data structures (p. 6) theories/lambda/interp.v:7-14 thunk, env, val
DynLang syntax (p. 9) theories/dynlang/operational.v:6 expr
DynLang operational semantics (p. 9) theories/dynlang/operational.v:21 step
DynLang final expressions (p. 9) theories/dynlang/operational.v:31 final
DynLang parallel substitution (p. 9) theories/dynlang/operational.v:12 subst
DynLang interpreter (p. 9) theories/dynlang/interp.v:39 interp
DynLang interpreter data structures (p. 9) theories/dynlang/interp.v:7-13 thunk, env, val
EvalLang syntax (p. 11) theories/evallang/operational.v:7 expr
EvalLang expression parser (p. 11) theories/evallang/operational.v:103 parse
EvalLang operational semantics (p. 11) theories/evallang/operational.v:119 step
EvalLang final expressions (p. 11) theories/evallang/operational.v:130 final
EvalLang interpreter (p. 11) theories/evallang/interp.v:42 interp
EvalLang interpreter data structures (p. 11) theories/evallang/interp.v:7-13 thunk, env, val
NixLang syntax (p. 14) theories/nix/operational.v:67 expr
NixLang operational semantics (p. 14) theories/nix/operational.v:444 step
NixLang evaluation contexts (p. 14) theories/nix/operational.v:418 ctx1
NixLang final expressions (p. 14) theories/nix/operational.v:139 final
NixLang binary operator semantics (p. 15) theories/nix/operational.v:297 sem_bin_op
NixLang argument matching (p. 15) theories/nix/operational.v:398 matches
NixLang substitution (p. 15) theories/nix/operational.v:112 subst
NixLang interpreter (p. 18) theories/nix/interp.v:329 interp
NixLang interpreter data structures (p. 18) theories/nix/interp.v:5-21 val, thunk, tattr, env
NixLang interpreter variant with mode arg. (p. 19) theories/nix/interp.v:348 interp'

Corresponding theorems.

Theorem in paper File and line number Name
Theorem 2.1, Item 1 (p. 7) theories/lambda/interp_proofs.v:575 interp_sound_complete_ret_string
Theorem 2.1, Item 2 (p. 7) theories/lambda/interp_proofs.v:585 interp_sound_complete_fail
Theorem 2.1, Item 3 (p. 7) theories/lambda/interp_proofs.v:594 interp_sound_complete_no_fuel
Lemma 2.2 (p. 8) theories/lambda/interp_proofs.v:516 interp_sound_open
Lemma 2.3 (p. 8) theories/lambda/interp_proofs.v:402 interp_step
Lemma 2.4 (p. 8) theories/lambda/interp_proofs.v:297 interp_proper
Theorem 3.1, Item 1 (p. 11) theories/dynlang/equiv.v:120 interp_equiv_ret_string
Theorem 3.1, Item 2 (p. 11) theories/dynlang/equiv.v:131 interp_equiv_fail
Theorem 3.1, Item 3 (p. 11) theories/dynlang/equiv.v:142 interp_equiv_no_fuel
Theorem 4.1, Item 1 (p. 20) (*) theories/nix/interp_proofs.v:2655 interp_sound_complete_ret_lit
Theorem 4.1, Item 2 (p. 20) theories/nix/interp_proofs.v:2665 interp_sound_complete_fail
Theorem 4.1, Item 3 (p. 20) theories/nix/interp_proofs.v:2673 interp_sound_complete_no_fuel
Lemma 4.2 (p. 20) theories/nix/interp_proofs.v:2619 interp_sound_open'
Lemma 4.3 (p. 20) theories/nix/interp_proofs.v:2029 interp_step'

(*): The theorem in the formalization is stronger than the theorem presented in the paper. The latter is a trivial specialization of the former.

Other claims in the paper.

  • § 2, LambdaLang:
  • p. 7: Theorem 2.1, Item 1 generalizes to any final value. See theories/lambda/interp_proofs.v:563, interp_sound_complete_ret. This also holds for DynLang, EvalLang and NixLang. These languages have a theory interp_sound_complete_ret in their respective interp_proofs.v files as well.
  • p. 7: LambdaLang has a deterministic operational semantics. See theories/lambda/operational_props.v:19, step_det. This also holds for DynLang, EvalLang and NixLang. These languages have a theory step_det in their respective operational_props.v files as well.
  • § 3, DynLang:
  • p. 10: A variant of the main theorem of DynLang (Theorem 2.1) also holds for DynLang. The primary difference here is that the closedness conditions are not needed for DynLang. For a variant of Theorem 2.1, Item 1, see theories/dynlang/interp_proofs.v:391, interp_sound_complete_ret_string. For a variant of Theorem 2.1, Item 2, see theories/dynlang/interp_proofs.v:400, interp_sound_complete_fail. For a variant of Theorem 2.1, Item 3, see theories/dynlang/interp_proofs.v:408, interp_sound_complete_no_fuel. A variant of the generalized version of Theorem 2.1, Item 1 also holds for DynLang. See theories/dynlang/interp_proofs.v:381, interp_sound_complete_ret.
  • § 4, NixLang:
  • p. 19/20: Theorem 4.1, Item 1 generalizes to any final value.
  • p. 20: Lemma 4.2 follows from mutual induction on four properties. For these four properties and their corresponding proof, see theories/nix/interp_proofs.v:2267-2282 (interp_sound_open, interp_thunk_sound, interp_app_sound, force_deep_sound).
  • p. 20: Lemma 4.3 follows from mutual induction on two properties. For these two properties and their corresponding proof, see theories/nix/interp_proofs.v:1519, interp_step.
  • § 5, Evaluation (Program logic, p. 22):
  • We can derive structural rules for WP (the application rule is presented). See theories/nix/wp.v:55, App_wp for the presented application rule. In general, theories/nix/wp.v contains all derived rules for WP.
  • We prove the total correctness of three variants of the recursive program from § 4.1:
    • See theories/nix/wp_examples.v:84, even_rec_attr_wp' for nix rec { f = x: if x = 0 then true else !(f (x - 1)); }.f n
    • See theories/nix/wp_examples.v:132, even_rec_functor_wp' for nix { "__functor " = r: x: if x == 0 then true else !(r (x - 1)); } n
    • See theories/nix/wp_examples.v:157, even_rec_default_wp' for nix ({ f ? (x: if x == 0 then true else !(f (x - 1))) }: f) {} n
  • We prove the total correctness of the following program for any non-recursive attribute set e: nix let x = 1; in with e; with { y = 2; }; x == y See theories/nix/wp_examples.v:11, test_wp.

Import tree definitions

We do not claim to have (proper) support the import feature of Nix, but we do support it in a bare-bones fashion in order to be able to run the test suite. In the test suite, there is a file lib.nix, which a few test cases load by using with import ./lib.nix; ... (or something similar). To support this, we use a so-called import tree definition file, which pre-declares all files that may be imported. In the test suite, for example, we have created a test/testdata/importdef.sexp file, which looks like this:

(deps ./lib.nix)

This means that the file lib.nix should be available to be imported for all files in the test/testdata folder. Such importdef.sexp files are discovered automatically by walking up the directory tree from the path of the file to be evaluated (or in case of the REPL, from the current working directory). The general syntax (<forest>) is as follows: (deps <tree>), where <tree> is either <filename> or (<filename> <forest>)

In general, the primitive imports mechanism works as follows:

  • All path expressions (e.g., ./lib.nix) in the Nix source file are converted to strings representing their absolute path.
  • We load all dependencies and evaluate them to NixLang values.
  • We put all of these values in a NixLang attribute set (VAttr, see theories/nix/interp.v:14), where we associate the NixLang value of each imported file with the original filename (as an absolute path).
  • We then create a NixLang function import <filename> (using VClo, see theories/nix/interp.v:9) that takes a string and returns the value for that file (if indeed forward-declared in the import tree definition).
  • This function is then inserted into the global environment of the file that we want to evaluate.

This process is performed recursively, hence recursive imports are also supported.

Use of the Nix language tests

When exercising the official Nix language tests on our extracted interpreter, we do not exactly consider all tests in the Nix test suite. Since we are only interested in tests for the interpreter, and not so much in tests for the parser, we only consider the eval-* test files in the test/testdata folder. There are two types of tests that we have here: tests where evaluation should succeed, and tests where evaluation should fail. This is simply marked in the filename; Nix files that should fail to evaluate are named eval-fail-*.nix and files that should evaluate successfully are named eval-okay-*.nix. For the former, there are expected error outputs that the interpreter should produce, but we are not interested in reproducing these (nor would this be realistic with our setup).

For the latter, there are also expected output expressions, which can be found in matching eval-okay-*.exp. However, there are two eval-okay-*.nix files for which no such matching eval-okay-*.exp file exists: - eval-okay-tail-call-1.nix: a file eval-okay-tail-call-1.exp-disabled exists; we hence do not consider this test. - eval-okay-xml.nix: a file eval-okay-xml.exp.xml exists, but we are not interested in converting the resulting term to XML to validate the result of this test.

The total amount of eval-fail-*.nix files and pairs of eval-okay-*.nix and eval-okay-*.exp files informs our total count of interpreter tests that we consider to begin with, namely 182 (§ 5, p. 21). For these tests again, we explicitly list 74 tests that should be ignored in ./test/test_mininix.ml, since they cover functionality of Nix that is out of scope for our paper. We end up with 108 tests that we exercise our interpreter on. Our interpreter passes 103 of these tests, and times out for five tests due to its call-by-name nature. Some more details can be found in the paper, in § 5.

General project structure

  • The theories folder contains the Rocq formalization. See the structure of the mechanization.
  • The bin folder contains the entrypoint for the command-line interface.
  • The lib folder contains the Nix parser, elaborator and interpreter extraction
  • The nix subfolder contains an adapted version of the parser and pretty-printer from nixformat (licensed under ISC) by Denis Korzunov. See https://github.com/d2km/nixformat. Some improvements were made to the parser and pretty-printer. The file elaborator.ml, not part of nixformat, in this folder is concerned with desugaring attribute paths among other things; you may consider this a 'normalizing' stage, which allows the elaborator to our core language to be more concise.
  • The mininix subfolder is the most important here. It contains, among other files:
    • nix2mininix.ml, elaboration of Nix into our core language
    • builtins.nix, our implementation of builtins in Nix, in our core Nix language, where we liberally use 'core' builtins (available as binary operations) that are specific to the core language
    • conv.ml, conversion between numeric types in OCaml and Rocq/Flocq
    • import.ml, very bare-bones support for imports, used in the test suite
  • The extraction folder contains a file that extracts the core language interpreter and some auxiliary functions to OCaml
  • The explorer folder contains some utilities for converting the Rocq Mechanization to a static site.