mpri-2.4-project-2023-2024/README.md

# Generating well-typed random terms using constraint-based type inference

## Introduction

The goal of this programming project is to reuse a constraint-based
type inference engine (for a simple lamda-calculus) to implement
a random generator of well-typed term.

Contact: gabriel.scherer@inria.fr .

### Motivation

Random generation of well-typed term has been studied as a research
problem: if you want to apply random fuzzing testing techniques on the
implementation of a programming language, you need to generate a lot
of programs in this language. Generating programs that parse correctly
is not too hard, but generating well-typed programs can be hard -- and
generating ill-typed program does not test the rest of the
implementation very well.

To generate well-typed terms, a common approach is to write a random
generator that "inverts" the rules of the type-system, from the root
of the typing derivation to the leaves. If the generator wants to
generate a term at a given type, it can list the typing rule that may
produce this output type, decide to apply one of them, which
corresponds to choosing the head term-former of the generated program,
and in turn requires generating subterms whose types are given by the
chosen typing rule. For example, if you want to generate the program
(? : A -> B), choosing the typing rule for lambda-abstraction refines
it into (lambda (x : A). (? : B)) with a smaller missing hole. If a type in the derivation cannot be

An example of this brand of work is the following:

  Michał H. Pałka, Koen Claessen, Alejandro Russo, and John Hughes
  Testing an Optimising Compiler by Generating Random Lambda Terms
  2012
  https://publications.lib.chalmers.se/records/fulltext/157525.pdf

This approach (which basically corresponds to a Prolog-style
proof search) is easy for simple type systems, and rapidly becomes
difficult for advanced type systems -- for example, dealing with
polymorphism is either unsatisfying or very complex. It is frustrating
to spend a lot of effort writing a complex generator, because we
(language implementers) are already spending a lot of effort writing
complex type-checkers, and it feels like a duplication of work on
similar problems.

Is it possible to write a type-checker once, and reuse it for random
generation of well-typed programs?

### The project

In this project, you will implement a *simple* constraint-based type
inference engine, basically a miniature version of Inferno
( https://inria.hal.science/hal-01081233 ,
https://gitlab.inria.fr/fpottier/inferno ) for a small simply-typed
lambda-calculus (no ML-style polymorphism), and then turn it into
a random generator of well-typed programs.

We provide you with a skeleton for the project, with most of the
boring stuff already implemented, so that you can focus on the
interesting parts. We also provide code to help you test your
code.

### Grading

The project will be evaluated by:

- checking that the code you provided is correct, by reviewing the
  code and your testsuite results

- evaluating the quality of the code (clarity, documentation, etc.)

- evaluating the coverage of the "mandatory" tasks suggested
  by the provided skeleton

- evaluating the difficulty and originality of the extensions you
  implemented, if any

We would like you to write a short REPORT.md file at the root of the
project, which details what you did and explains any non-obvious point,
with pointers to relevant source files. There is no length requirement
for this REPORT.md file, just include the information that *you* think
is valuable and useful -- please, no boring intro or ChatGPT prose.

### Code reuse and plagiarism

Reusing third-party code is allowed, as long as (1) the code license
allows this form of reuse, and (2) you carefully indicate which parts
of the code are not yours -- mark it clearly in the code itself, and
mention it explicitly in your REPORT.md file.

In particular, please feel free to introduce dependencies on
third-party (free software) packages as long as they are explicit in
the packaging metadata of your project -- so that I can still build
the project.

Including code that comes from someone else without proper credit to
the authors is plagiarism.

### OCaml development setup

Install [Opam](https://opam.ocaml.org/doc/Install.html), the OCaml
package manager, on your system.

If you have never used Opam before, you need to initialize it (otherwise, skip this step):

```
$ opam init
```

For convenience, we setup a [local](https://opam.ocaml.org/blog/opam-local-switches/) Opam distribution, using the following commands:

```
$ opam switch create . --deps-only --with-doc --with-test
$ eval $(opam env)
```

To configure your favorite text editor, see the [Real World OCaml setup](http://dev.realworldocaml.org/install.html#editor-setup).

#### Tip on GADTs in OCaml

The type of constraints is a GADT, a Generalized (or Guarded)
Algebraic Datatype. Writing functions that operate on these datatypes
requires a bit more ceremony than everyday OCaml code, due to their
inherent use of polymorphic recursion.

If you are unfamiliar with function declarations that look like

    let rec foo : type a e . (a, e) t -> ...

then you should read the GADT chapter of the OCaml manual:
  https://v2.ocaml.org/releases/5.1/htmlman/gadts-tutorial.html

### Using a different language

We wrote useful support code in OCaml, so it is going to be much
easier to implement the project in OCaml. If you insist, you are of
course free to implement the project in a different programming
language -- any language, as long as I can install a free software
implementation of it on my Linux machine to test your code.

You would still need to read OCaml code to understand the key
ingredients of the projet -- type inference constraints with
elaboration, and random generation -- and transpose them in your
chosen implementation language.

Note: we do *not* expect you to reimplement the full project skeleton
as-is (see more details on what we provide in the "Code organization"
section below). Feel free to provide the minimum amount of support
code to demonstrate/test the interesting bits of the project --
constraint generation with elaboration to an explicitly-typed
representation, constraint solving, and then random generation of
well-typed terms.

## High-level description

This project contains a simple type inference engine in the spirit of
Inferno, with a type `Untyped.term` of untyped terms, a type
`STLC.term` of explicitly-typed terms, a type `('a, 'e) Constraint.t`
of constraints that produce elaboration witnesses of type `'a`.

#### Type inference a la inferno

The general idea is to implement a constraint generator of type

    Untyped.term -> (STLC.term, type_error) Constraint.t

and a constraint solving function of type

    val eval : ('a, 'e) Constraint.t -> ('a, 'e) result

which extracts a result (success or failure) from a normal form
constraint.

By composing these functions together, you have a type-checker for
the untyped language, that produces a "witness of well-typedness" in
the form of an explicitly-typed term -- presumably an annotation of
the original program.

(To keep the project difficulty manageable, our "simple type inference
engine" does not handle ML-style polymorphism, or in fact any sort of
polymorphism. We are implementing type inference for the simply-typed
lambda-calculus.)

#### Abstracting over an effect

But then there is a twist: we add to the language of untyped terms
*and* to the language of constraint a `Do` constructor that represents
a term (or constraint) produced by an "arbitrary effect", where the
notion of effect is given by an arbitrary functor (a parametrized type
with a `map` function). This is implemented by writing all the code
in OCaml modules `Make(T : Utils.Functor)` parametrized over `T`.

The constraint-generation function is unchanged.

    Untyped.term -> (STLC.term, type_error) Constraint.t

For constraint solving, however, new terms of the form
  `Do (p : (a, e) Constraint.t T.t)`
now have to be evaluated. We propose to extend the `eval` function to
a richer type

    val eval : ('a, 'e) Constraint.t -> ('a, 'e) normal_constraint

where a normal constraint is either a success, a failure, or an effectful
constraint computation `('a, 'e) Constraint.t T.t`.

We propose an evaluation rule of the form

    eval E[Do p] = NDo E[p]

where E is an evalution context for constraints, `p` has type
`('a, 'e) Constraint.t T.t` (it is a computation in `T` that
returns constraints), and `E[p]` is defined by lifting the
context-surrounding function
`E[_] : ('a, 'e) Constraint.t -> ('b, 'f) Constraint.t`
through the `T` functor.

#### Two or three effect instances

An obvious instantion of this `T : Functor` parameter is to use the
functor `Id.Empty` of empty (parametrized) types with no
inhabitant. This corresponds to the case where the `Do` constructor
cannot be used, the terms and constraint are pure. In this case `eval`
will simply evaluate the constraint to a result. This is what the
`minihell` test program uses.

The other case of interest for this is when the parameter
`T : Utils.Functor` is in fact a search monad `M : Utils.MonadPlus`.
Then it is possible to define a function

    val gen : depth:int -> ('a, 'e) constraint -> ('a, 'e) result M.t

on top of `eval`, that returns all the results that can be reached by
expanding `Do` nodes using `M.bind`, recursively, exactly `depth`
times. (Another natural choice would be to generate all the terms that
can be reached by expanding `Do` nodes *at most* `depth` times, but
this typically gives a worse generator.)

Finally, to get an actual program generator, you need to instantiate
this machinery with a certain choice of `M : MonadPlus` structure. We
ask you to implement two natural variants:

- `MSeq`, which simply enumerates the finite lists of possible
  results.

- `MRand`, which returns random solutions -- an infinite stream of
  independent randomly-sampled solutions.


## Implementation tasks

In short: implement the missing pieces to get the code working,
following the high-level description above. Then (optionally) think
about extending the project further -- we give some ideas in the
"Extensions" section later on.

In more details, we recommend the following progression (but you do as
you prefer).

0. Creating a version-controlled repository for your project, for
   example by running `git init` in the root directory of the project.
   Even if you program all alone, version control is a must-have for
   any moderately complex project. Whenever you have achieved
   progress, save/commit the current state. Whenever you are able to
   backtrack, erase your last hour of work and try to something
   different, save/commit the current state *before* throwing it
   away. You are welcome.

1. Look at the simply-typed lambda-calculus used as a toy programming
   language, implemented in Untyped.ml (the before-type-inference form
   without types everywhere) and STLC.ml (the explicitly-checked
   version produced by our type inference engine). These are fairly
   standard.

2. Look at the datatype of constraints used for type inference,
   defined in Constraint.ml.

3. Read the testsuite in tests.t/run.t to have an idea of the
   sort of behaviors expected once the project is complete.

   Reading the testsuite will also give you ideas on how to run your
   own tests during your work. Feel free to edit the run.t file, or
   add more test files.

   You might want to save the run.t file somewhere for future
   reference. Then we recommend "promoting" the current output
   (where nothing works because nothing is implemented yet) by running
   `dune runtest` (at the root of the project) and then
   `dune promote`. As you implement more features you should regularly
   run `dune runtest` again, and `dune promote` to save the current
   testsuite output. This will help you track your progress and
   notice behavior changes (improvements or regressions).

   Note: the testsuite outputs are descriptive, not prescriptive. If
   you implement the project differently, you may get different
   outputs that are also correct. You have to read the test output to
   tell if they make sense.

4. Implement a constraint generator in Infer.ml.

   You should be able to test your constraint generator by running
   commands such as:

       dune exec -- minihell --show-constraint tests/id_poly.test

   (Please of course feel free to write your own test files, and
    consider adding them to the testsuite in tests/run.t.)

5. Implement a constraint solver in Solver.ml. (There is also
   a missing function, Structure.merge, that you will have to
   implement to get unification working.)

   You should be able to test your constraint solver by running
   commands such as

       dune exec -- minihell --show-constraint --log-solver tests/id_poly.test

   At this point you have a complete type-checker for the simply-typed
   lambda-calculus. Congratulations!

6. Implement search monads in MSeq.ml and MRand.ml. They should obey
   the MonadPlus interface defined in Utils.ml. The intention is that:

   - MSeq should simply generate the finite sequence of all terms of
     a given size (the order does not matter), while
   - MRand should return an infinite sequence, each element being
     a random choice of term of the given size (we do not expect the
     distribution to be uniform: typically, for each syntactic node,
     each term constructor can be picked with a fixed probability).

   You can implement just one of them before going to the next step,
   or both. You only need one to test your code at first.

6. Implement a term generator in Generator.ml.

   You can test the generator instantiated with MRand by running, for
   example:

       dune exec -- minigen --depth 5 --count 3

   You can test the generator instantiated with MSeq:
   example:

       dune exec -- minigen --exhaustive --depth 5 --count 20

   (On my implementation this only generates 10 terms, as the
   generator only produces 10 different terms at this depth.)

7. At this point you are basically done with the "mandatory" part of
   this project. Please ensure that your code is clean and readable,
   we will take this into account when grading.

   You should now think of implementing extensions of your choice.
   (See the "Extensions" section below for some potential ideas.)


## Skeleton code organization

- `tests.t`: the testsuite. See tests.t/run.t for details.
   This is important.

- `bin/`: two small executable programs that are used to
   test the main logic in `src/`:

   + `minihell`: a toy type-checker

   + `minigen`: a toy generator of well-typed terms

   You should feel free to modify these programs if you wish, but you
   do not need to.

- `src/`: the bulk of the code, that place where we expect you to
   work. `src/*.{ml,mli}` are the more important modules that you will
   have to modify or use directly. `src/support/*.{ml,mli}` has the
   less interesting support code. (Again, feel free to modify the
   support code, but you do not need to.)

   + `Generator.ml,mli`: the random term generator (you only need to
     look at this late in the project, feel free to skip it at first)

   + `Infer.ml,mli`: the type-inference engine, that generates constraints
      with elaboration

   + `MRand.ml,mli`: the random-sampling monad

   + `MSeq.ml,mli`: the list-all-solutions monad

   + `Solver.ml,mli`: the constraint solver

   + `STLC.ml`: the syntax of types and well-typed terms

   + `Structure.ml`: the definition of type-formers
     in the type system. This is used in `STLC.ml`,
     but also in constraints that manipulate types
     containing inference variables.

   + `Unif.ml,mli`: the unification engine. This is fully
     implemented. You will need to understand its interface
     (`Unif.mli`) to solve equality constraints in `Solver.ml`.

   + `Untyped.ml`: the syntax of untyped terms.

   + `Utils.ml`: useful bits and pieces. Feel free
     to add your stuff there.

   + `support/`: the boring modules that help for debugging and
     testing, but you probably do not need to use or touch them
     directly. (Feel free to modify stuff there if you want.)

     * `ConstraintPrinter.ml,mli`: a pretty-printer for constraints

     * `ConstraintSimplifier.ml,mli`: a "simplifier" for constraints,
       that is used to implement the `--log-solver` option of
       `minihell`.

     * `Decode.ml,mli`: reconstructs user-facing types from
       the current state of the unification engine.

     * `Printer.ml`: support code for all pretty-printers.

     * `SatConstraint.ml`: "satisfiable constraints" are constraints
       that do not produce an output (no elaboration to
       well-typed terms). This simpler type (no GADTs in sight) is
       used under the hood for simplification and pretty-printing.

    * `STLCPrinter.ml,mli`: a pretty-printer for explicitly-typed
      terms.

    * `UntypedLexer.mll`: an `ocamllex` lexer for the untyped
      language.

    * `UntypedParser.mly`: a `menhir` grammar for the untyped
      language.

    * `UntypedPrinter.ml,mli`: a pretty-printer for the untyped
       language.

   (Again: if you decide to take risks and implement the project in
   a different language, you do not need to reimplement any of that,
   but then you are on your own for testing and debugging.)


### Extensions

After you are done with the core/mandatory part of the project, we
expect you to implement extensions of your choosing. Surprise us!

In case it can serve as inspiration, here are a few ideas for
extensions.

- You could add some extra features to the programming language
  supported. For example:

   + Base types such as integers and booleans

   + Recursive function definitions.

   + Lists, with a built-in shallow pattern-matching construction

          match .. with
          | [] -> ...
          | x :: xs -> ...

   + n-ary tuples: this sounds easy but it is in fact quite tricky,
     because the strongly-typed GADT of constraints does not make it
     easy to generate a "dynamic" number of existential
     quantifiers. This is a difficult exercise in strongly-typed
     programming.

     (We in fact wrote a short paper on how we approached this problem
     in the context of Inferno, and the difficulties are the same.
     https://inria.hal.science/hal-03145040
     Feel free to reuse our approach if you want.)

   + Patterns and pattern-matching. You get into the same difficulties
     as n-ary tuples, so working on n-ary tuples first is probably
     a good idea.

   + You could implement ML-style parametric polymorphism, with
     generalization during type inference. This is the hardest of the
     extensions mentioned here by far: we have not tried this, and we
     do not know how the interaction between enumeration/generation
     and generalization will work out. (One might need to change the
     term generation to stop going 'in and out' of sub-constraints, or
     at least to be careful in the implementation of ML-style
     generalization to use a (semi-)persistent data structure, a tree
     of generalization regions instead of a stack.)

- You could use the term generator for property-based testing of
  *something*: implement a procedure that manipulates explicitly-typed
  terms -- for example, a semantics-preserving program transformation,
  or a compilation to the programming language of your choice, etc --
  and test the property using random generator.

  (You could probably do some greybox fuzzing by providing an
  alternative search monad using Crowbar:
  https://github.com/stedolan/crowbar )

- You could improve the random-generation strategy by thinking about
  the shape of terms that you want to generate, tweaking the
  distribution, providing options to generate terms using only
  a subset of the rules, etc.

  An important improvement in practice would be to provide some form
  of shrinking.

- You could look at existing work on testing of language
  implementations by random generation, and see if they can be
  integrated in the present framework. For example,

  https://janmidtgaard.dk/papers/Midtgaard-al%3aICFP17-full.pdf

  performs random generation of programs with a custom effect system,
  that determines if a given sub-expression performs an observable
  side-effect or not -- to only generate terms with a deterministic
  evaluation order. Could one extend the inference engine to also
  check that the program is deterministic in this sense, and thus
  replace the custom-built program generator used in this work?