From 96bd0550260545398903574a3953c1fee3788a15 Mon Sep 17 00:00:00 2001 From: Gabriel Scherer Date: Thu, 21 Dec 2023 00:40:52 +0100 Subject: [PATCH] first public release of the project --- README.md | 506 +++++++++++++++++++++++++++ bin/dune | 11 + bin/minigen.ml | 55 +++ bin/minihell.ml | 165 +++++++++ constrained_generation.opam | 18 + dune | 5 + dune-project | 3 + src/Constraint.ml | 102 ++++++ src/Generator.ml | 20 ++ src/Generator.mli | 11 + src/Infer.ml | 83 +++++ src/Infer.mli | 18 + src/MRand.ml | 25 ++ src/MRand.mli | 1 + src/MSeq.ml | 25 ++ src/MSeq.mli | 1 + src/STLC.ml | 23 ++ src/Solver.ml | 62 ++++ src/Solver.mli | 40 +++ src/Structure.ml | 71 ++++ src/Structure.mli | 19 + src/Unif.ml | 133 +++++++ src/Unif.mli | 62 ++++ src/Untyped.ml | 63 ++++ src/Utils.ml | 122 +++++++ src/dune | 9 + src/support/ConstraintPrinter.ml | 54 +++ src/support/ConstraintPrinter.mli | 7 + src/support/ConstraintSimplifier.ml | 79 +++++ src/support/ConstraintSimplifier.mli | 6 + src/support/Decode.ml | 35 ++ src/support/Decode.mli | 3 + src/support/Printer.ml | 113 ++++++ src/support/STLCPrinter.ml | 65 ++++ src/support/STLCPrinter.mli | 2 + src/support/SatConstraint.ml | 43 +++ src/support/UntypedLexer.mll | 55 +++ src/support/UntypedParser.mly | 105 ++++++ src/support/UntypedPrinter.ml | 51 +++ src/support/UntypedPrinter.mli | 4 + src/support/dune | 5 + tests.t/curry.test | 1 + tests.t/error.test | 1 + tests.t/id_int.test | 1 + tests.t/id_poly.test | 1 + tests.t/run.t | 396 +++++++++++++++++++++ tests.t/selfapp.test | 2 + tests.t/uncurry.test | 3 + 48 files changed, 2685 insertions(+) create mode 100644 README.md create mode 100644 bin/dune create mode 100644 bin/minigen.ml create mode 100644 bin/minihell.ml create mode 100644 constrained_generation.opam create mode 100644 dune create mode 100644 dune-project create mode 100644 src/Constraint.ml create mode 100644 src/Generator.ml create mode 100644 src/Generator.mli create mode 100644 src/Infer.ml create mode 100644 
src/Infer.mli create mode 100644 src/MRand.ml create mode 100644 src/MRand.mli create mode 100644 src/MSeq.ml create mode 100644 src/MSeq.mli create mode 100644 src/STLC.ml create mode 100644 src/Solver.ml create mode 100644 src/Solver.mli create mode 100644 src/Structure.ml create mode 100644 src/Structure.mli create mode 100644 src/Unif.ml create mode 100644 src/Unif.mli create mode 100644 src/Untyped.ml create mode 100644 src/Utils.ml create mode 100644 src/dune create mode 100644 src/support/ConstraintPrinter.ml create mode 100644 src/support/ConstraintPrinter.mli create mode 100644 src/support/ConstraintSimplifier.ml create mode 100644 src/support/ConstraintSimplifier.mli create mode 100644 src/support/Decode.ml create mode 100644 src/support/Decode.mli create mode 100644 src/support/Printer.ml create mode 100644 src/support/STLCPrinter.ml create mode 100644 src/support/STLCPrinter.mli create mode 100644 src/support/SatConstraint.ml create mode 100644 src/support/UntypedLexer.mll create mode 100644 src/support/UntypedParser.mly create mode 100644 src/support/UntypedPrinter.ml create mode 100644 src/support/UntypedPrinter.mli create mode 100644 src/support/dune create mode 100644 tests.t/curry.test create mode 100644 tests.t/error.test create mode 100644 tests.t/id_int.test create mode 100644 tests.t/id_poly.test create mode 100644 tests.t/run.t create mode 100644 tests.t/selfapp.test create mode 100644 tests.t/uncurry.test diff --git a/README.md b/README.md new file mode 100644 index 0000000..41932c7 --- /dev/null +++ b/README.md @@ -0,0 +1,506 @@ +# Generating well-typed random terms using constraint-based type inference + +## Introduction + +The goal of this programming project is to reuse a constraint-based +type inference engine (for a simple lambda-calculus) to implement +a random generator of well-typed terms. + +Contact: gabriel.scherer@inria.fr .
+ +### Motivation + +Random generation of well-typed terms has been studied as a research +problem: if you want to apply random fuzz-testing techniques to the +implementation of a programming language, you need to generate a lot +of programs in this language. Generating programs that parse correctly +is not too hard, but generating well-typed programs can be hard -- and +generating ill-typed programs does not test the rest of the +implementation very well. + +To generate well-typed terms, a common approach is to write a random +generator that "inverts" the rules of the type-system, from the root +of the typing derivation to the leaves. If the generator wants to +generate a term at a given type, it can list the typing rules that may +produce this output type, decide to apply one of them, which +corresponds to choosing the head term-former of the generated program, +and in turn requires generating subterms whose types are given by the +chosen typing rule. For example, if you want to generate the program +(? : A -> B), choosing the typing rule for lambda-abstraction refines +it into (lambda (x : A). (? : B)) with a smaller missing hole. + +An example of this brand of work is the following: + + Michał H. Pałka, Koen Claessen, Alejandro Russo, and John Hughes + Testing an Optimising Compiler by Generating Random Lambda Terms + 2012 + https://publications.lib.chalmers.se/records/fulltext/157525.pdf + +This approach (which basically corresponds to a Prolog-style +proof search) is easy for simple type systems, and rapidly becomes +difficult for advanced type systems -- for example, dealing with +polymorphism is either unsatisfying or very complex. It is frustrating +to spend a lot of effort writing a complex generator, because we +(language implementers) are already spending a lot of effort writing +complex type-checkers, and it feels like a duplication of work on +similar problems.
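To make the rule-inversion idea above concrete, here is a small, self-contained OCaml sketch (hypothetical code, not part of the project skeleton) that enumerates simply-typed terms of a target type by inverting the variable and lambda rules. The application rule is deliberately left out: inverting it requires guessing the type of the argument, which is exactly where this naive approach starts to hurt.

```ocaml
(* Hypothetical sketch of "typing-rule inversion": enumerate terms of
   a target type, bounded by a fuel parameter. Not project code. *)
type ty = Base | Arrow of ty * ty

type term =
  | Var of string
  | Lam of string * ty * term

let fresh =
  let n = ref 0 in
  fun () -> incr n; Printf.sprintf "x%d" !n

(* All terms of type [ty] under environment [env], with derivations
   of height at most [fuel]. *)
let rec gen env fuel ty =
  if fuel = 0 then []
  else
    (* invert the variable rule: any variable of the right type *)
    let vars =
      List.filter_map
        (fun (x, tx) -> if tx = ty then Some (Var x) else None)
        env
    in
    (* invert the lambda rule: (? : A -> B) becomes
       (lambda (x : A). (? : B)) with a smaller hole *)
    let lams =
      match ty with
      | Arrow (a, b) ->
        let x = fresh () in
        List.map (fun body -> Lam (x, a, body))
          (gen ((x, a) :: env) (fuel - 1) b)
      | Base -> []
    in
    vars @ lams
```

With an application rule added, the enumeration blows up quickly and must invent argument types out of thin air, which is the kind of duplication of type-checker logic that constraint-based generation avoids.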
+ +Is it possible to write a type-checker once, and reuse it for random +generation of well-typed programs? + +### The project + +In this project, you will implement a *simple* constraint-based type +inference engine, basically a miniature version of Inferno +( https://inria.hal.science/hal-01081233 , +https://gitlab.inria.fr/fpottier/inferno ) for a small simply-typed +lambda-calculus (no ML-style polymorphism), and then turn it into +a random generator of well-typed programs. + +We provide you with a skeleton for the project, with most of the +boring stuff already implemented, so that you can focus on the +interesting parts. We also provide code to help you test your +code. + +### Grading + +The project will be evaluated by: + +- checking that the code you provided is correct, by reviewing the + code and your testsuite results + +- evaluating the quality of the code (clarity, documentation, etc.) + +- evaluating the coverage of the "mandatory" tasks suggested + by the provided skeleton + +- evaluating the difficulty and originality of the extensions you + implemented, if any + +We would like you to write a short REPORT.md file at the root of the +project, which details what you did and explains any non-obvious point, +with pointers to relevant source files. There is no length requirement +for this REPORT.md file, just include the information that *you* think +is valuable and useful -- please, no boring intro or ChatGPT prose. + +### Code reuse and plagiarism + +Reusing third-party code is allowed, as long as (1) the code license +allows this form of reuse, and (2) you carefully indicate which parts +of the code are not yours -- mark it clearly in the code itself, and +mention it explicitly in your REPORT.md file. + +In particular, please feel free to introduce dependencies on +third-party (free software) packages as long as they are explicit in +the packaging metadata of your project -- so that I can still build +the project. 
+ +Including code that comes from someone else without proper credit to +the authors is plagiarism. + +### OCaml development setup + +Install [Opam](https://opam.ocaml.org/doc/Install.html), the OCaml +package manager, on your system. + +If you have never used Opam before, you need to initialize it (otherwise, skip this step): + +``` +$ opam init +``` + +For convenience, we set up a [local](https://opam.ocaml.org/blog/opam-local-switches/) Opam distribution, using the following commands: + +``` +$ opam switch create . --deps-only --with-doc --with-test +$ eval $(opam env) +``` + +To configure your favorite text editor, see the [Real World OCaml setup](http://dev.realworldocaml.org/install.html#editor-setup). + +### Using a different language + +We wrote useful support code in OCaml, so it is going to be much +easier to implement the project in OCaml. If you insist, you are of +course free to implement the project in a different programming +language -- any language, as long as I can install a free software +implementation of it on my Linux machine to test your code. + +You would still need to read OCaml code to understand the key +ingredients of the project -- type inference constraints with +elaboration, and random generation -- and transpose them into your +chosen implementation language. + +Note: we do *not* expect you to reimplement the full project skeleton +as-is (see more details on what we provide in the "Code organization" +section below). Feel free to provide the minimum amount of support +code to demonstrate/test the interesting bits of the project -- +constraint generation with elaboration to an explicitly-typed +representation, constraint solving, and then random generation of +well-typed terms.
+ +## High-level description + +This project contains a simple type inference engine in the spirit of +Inferno, with a type `Untyped.term` of untyped terms, a type +`STLC.term` of explicitly-typed terms, and a type `('a, 'e) Constraint.t` +of constraints that produce elaboration witnesses of type `'a`. + +#### Type inference à la Inferno + +The general idea is to implement a constraint generator of type + + Untyped.term -> (STLC.term, type_error) Constraint.t + +and a constraint solving function of type + + val eval : ('a, 'e) Constraint.t -> ('a, 'e) result + +which extracts a result (success or failure) from a normal form +constraint. + +By composing these functions together, you have a type-checker for +the untyped language, which produces a "witness of well-typedness" in +the form of an explicitly-typed term -- essentially an annotated +version of the original program. + +(To keep the project difficulty manageable, our "simple type inference +engine" does not handle ML-style polymorphism, or in fact any sort of +polymorphism. We are implementing type inference for the simply-typed +lambda-calculus.) + +#### Abstracting over an effect + +But then there is a twist: we add to the language of untyped terms +*and* to the language of constraints a `Do` constructor that represents +a term (or constraint) produced by an "arbitrary effect", where the +notion of effect is given by an arbitrary functor (a parametrized type +with a `map` function). This is implemented by writing all the code +in OCaml modules `Make(T : Utils.Functor)` parametrized over `T`. + +The constraint-generation function is unchanged. + + Untyped.term -> (STLC.term, type_error) Constraint.t + +For constraint solving, however, new terms of the form + `Do (p : (a, e) Constraint.t T.t)` +now have to be evaluated.
We propose to extend the `eval` function to +a richer type + + val eval : ('a, 'e) Constraint.t -> ('a, 'e) normal_constraint + +where a normal constraint is either a success, a failure, or an effectful +constraint computation `('a, 'e) Constraint.t T.t`. + +We propose an evaluation rule of the form + + eval E[Do p] = NDo E[p] + +where E is an evaluation context for constraints, `p` has type +`('a, 'e) Constraint.t T.t` (it is a computation in `T` that +returns constraints), and `E[p]` is defined by lifting the +context-surrounding function +`E[_] : ('a, 'e) Constraint.t -> ('b, 'f) Constraint.t` +through the `T` functor. + +#### Two or three effect instances + +An obvious instantiation of this `T : Functor` parameter is to use the +functor `Utils.Empty` of empty (parametrized) types with no +inhabitant. This corresponds to the case where the `Do` constructor +cannot be used: the terms and constraints are pure. In this case `eval` +will simply evaluate the constraint to a result. This is what the +`minihell` test program uses. + +The other case of interest is when the parameter +`T : Utils.Functor` is in fact a search monad `M : Utils.MonadPlus`. +Then it is possible to define a function + + val gen : depth:int -> ('a, 'e) Constraint.t -> ('a, 'e) result M.t + +on top of `eval`, which returns all the results that can be reached by +expanding `Do` nodes using `M.bind`, recursively, exactly `depth` +times. (Another natural choice would be to generate all the terms that +can be reached by expanding `Do` nodes *at most* `depth` times, but +this typically gives a worse generator.) + +Finally, to get an actual program generator, you need to instantiate +this machinery with a certain choice of `M : MonadPlus` structure. We +ask you to implement two natural variants: + +- `MSeq`, which simply enumerates the finite list of possible + results. + +- `MRand`, which returns random solutions -- an infinite stream of + independent randomly-sampled solutions.
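As a rough illustration of what such an `M : MonadPlus` structure can look like, here is a hypothetical list-backed variant in the spirit of `MSeq`. The real interface lives in `Utils.ml` (not shown in this chunk) and may well differ; only `delay` and `run` are modeled on the operations actually used by `Generator.ml` and `bin/minigen.ml`.

```ocaml
(* Hypothetical list-backed search monad in the spirit of MSeq.
   The actual MonadPlus interface is defined in Utils.ml and may differ. *)
module ListSearch = struct
  (* computations are thunked so that [delay] can postpone work *)
  type 'a t = unit -> 'a list

  let return x = fun () -> [ x ]
  let map f m = fun () -> List.map f (m ())
  let bind m f = fun () -> List.concat_map (fun x -> f x ()) (m ())
  let fail = fun () -> []
  let sum ms = fun () -> List.concat_map (fun m -> m ()) ms
  let delay f = fun () -> f () ()
  let run m = List.to_seq (m ())
end

(* enumerate all pairs of booleans, in a deterministic order *)
let bool_ = ListSearch.(sum [ return true; return false ])
let pairs = ListSearch.(bind bool_ (fun a -> map (fun b -> (a, b)) bool_))
```

An `MRand`-style instance would keep the same interface but have `sum` pick one branch at random and `run` produce an infinite stream of independent samples.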
+ + +## Implementation tasks + +In short: implement the missing pieces to get the code working, +following the high-level description above. Then (optionally) think +about extending the project further -- we give some ideas in the +"Extensions" section later on. + +In more detail, we recommend the following progression (but you do as +you prefer). + +0. Create a version-controlled repository for your project, for + example by running `git init` in the root directory of the project. + Even if you program all alone, version control is a must-have for + any moderately complex project. Whenever you have achieved + progress, save/commit the current state. Whenever you are about to + backtrack, erasing your last hour of work to try something + different, save/commit the current state *before* throwing it + away. You are welcome. + +1. Look at the simply-typed lambda-calculus used as a toy programming + language, implemented in Untyped.ml (the before-type-inference form + without types everywhere) and STLC.ml (the explicitly-checked + version produced by our type inference engine). These are fairly + standard. + +2. Look at the datatype of constraints used for type inference, + defined in Constraint.ml. + +3. Read the testsuite in tests.t/run.t to have an idea of the + sort of behaviors expected once the project is complete. + + Reading the testsuite will also give you ideas on how to run your + own tests during your work. Feel free to edit the run.t file, or + add more test files. + + You might want to save the run.t file somewhere for future + reference. Then we recommend "promoting" the current output + (where nothing works because nothing is implemented yet) by running + `dune runtest` (at the root of the project) and then + `dune promote`. As you implement more features you should regularly + run `dune runtest` again, and `dune promote` to save the current + testsuite output.
This will help you track your progress and + notice behavior changes (improvements or regressions). + + Note: the testsuite outputs are descriptive, not prescriptive. If + you implement the project differently, you may get different + outputs that are also correct. You have to read the test output to + tell if they make sense. + +4. Implement a constraint generator in Infer.ml. + + You should be able to test your constraint generator by running + commands such as: + + dune exec -- minihell --show-constraint tests.t/id_poly.test + + (Please of course feel free to write your own test files, and + consider adding them to the testsuite in tests.t/run.t.) + +5. Implement a constraint solver in Solver.ml. (There is also + a missing function, Structure.merge, that you will have to + implement to get unification working.) + + You should be able to test your constraint solver by running + commands such as + + dune exec -- minihell --show-constraint --log-solver tests.t/id_poly.test + + At this point you have a complete type-checker for the simply-typed + lambda-calculus. Congratulations! + +6. Implement search monads in MSeq.ml and MRand.ml. They should obey + the MonadPlus interface defined in Utils.ml. The intention is that: + + - MSeq should simply generate the finite sequence of all terms of + a given size (the order does not matter), while + - MRand should return an infinite sequence, each element being + a random choice of term of the given size (we do not expect the + distribution to be uniform: typically, for each syntactic node, + each term constructor can be picked with a fixed probability). + + You can implement just one of them before going to the next step, + or both. You only need one to test your code at first. + +7. Implement a term generator in Generator.ml. + + You can test the generator instantiated with MRand by running, for + example: + + dune exec -- minigen --depth 5 --count 3 + + You can test the generator instantiated with MSeq by running, for + example: + + dune exec -- minigen --exhaustive --depth 5 --count 20 + + (On my implementation this only generates 10 terms, as the + generator only produces 10 different terms at this depth.) + +8. At this point you are basically done with the "mandatory" part of + this project. Please ensure that your code is clean and readable; + we will take this into account when grading. + + You should now think of implementing extensions of your choice. + (See the "Extensions" section below for some potential ideas.) + + +## Skeleton code organization + +- `tests.t`: the testsuite. See tests.t/run.t for details. + This is important. + +- `bin/`: two small executable programs that are used to + test the main logic in `src/`: + + + `minihell`: a toy type-checker + + + `minigen`: a toy generator of well-typed terms + + You should feel free to modify these programs if you wish, but you + do not need to. + +- `src/`: the bulk of the code, the place where we expect you to + work. `src/*.{ml,mli}` are the more important modules that you will + have to modify or use directly. `src/support/*.{ml,mli}` contains the + less interesting support code. (Again, feel free to modify the + support code, but you do not need to.) + + + `Generator.ml,mli`: the random term generator (you only need to + look at this late in the project, feel free to skip it at first) + + + `Infer.ml,mli`: the type-inference engine, which generates constraints + with elaboration + + + `MRand.ml,mli`: the random-sampling monad + + + `MSeq.ml,mli`: the list-all-solutions monad + + + `Solver.ml,mli`: the constraint solver + + + `STLC.ml`: the syntax of types and well-typed terms + + + `Structure.ml`: the definition of type-formers + in the type system.
This is used in `STLC.ml`, + but also in constraints that manipulate types + containing inference variables. + + + `Unif.ml,mli`: the unification engine. This is fully + implemented. You will need to understand its interface + (`Unif.mli`) to solve equality constraints in `Solver.ml`. + + + `Untyped.ml`: the syntax of untyped terms. + + + `Utils.ml`: useful bits and pieces. Feel free + to add your stuff there. + + + `support/`: the boring modules that help for debugging and + testing, but you probably do not need to use or touch them + directly. (Feel free to modify stuff there if you want.) + + * `ConstraintPrinter.ml,mli`: a pretty-printer for constraints + + * `ConstraintSimplifier.ml,mli`: a "simplifier" for constraints, + that is used to implement the `--log-solver` option of + `minihell`. + + * `Decode.ml,mli`: reconstructs user-facing types from + the current state of the unification engine. + + * `Printer.ml`: support code for all pretty-printers. + + * `SatConstraint.ml`: "satisfiable constraints" are constraints + that do not produce an output (no elaboration to + well-typed terms). This simpler type (no GADTs in sight) is + used under the hood for simplification and pretty-printing. + + * `STLCPrinter.ml,mli`: a pretty-printer for explicitly-typed + terms. + + * `UntypedLexer.mll`: an `ocamllex` lexer for the untyped + language. + + * `UntypedParser.mly`: a `menhir` grammar for the untyped + language. + + * `UntypedPrinter.ml,mli`: a pretty-printer for the untyped + language. + + (Again: if you decide to take risks and implement the project in + a different language, you do not need to reimplement any of that, + but then you are on your own for testing and debugging.) + + +### Extensions + +After you are done with the core/mandatory part of the project, we +expect you to implement extensions of your choosing. Surprise us! + +In case it can serve as inspiration, here are a few ideas for +extensions. 
+ +- You could add some extra features to the supported programming + language. For example: + + + Base types such as integers and booleans. + + + Recursive function definitions. + + + Lists, with a built-in shallow pattern-matching construction + + match .. with + | [] -> ... + | x :: xs -> ... + + + n-ary tuples: this sounds easy but it is in fact quite tricky, + because the strongly-typed GADT of constraints does not make it + easy to generate a "dynamic" number of existential + quantifiers. This is a difficult exercise in strongly-typed + programming. + + (We in fact wrote a short paper on how we approached this problem + in the context of Inferno, and the difficulties are the same. + https://inria.hal.science/hal-03145040 + Feel free to reuse our approach if you want.) + + + Patterns and pattern-matching. You get into the same difficulties + as n-ary tuples, so working on n-ary tuples first is probably + a good idea. + + + You could implement ML-style parametric polymorphism, with + generalization during type inference. This is the hardest of the + extensions mentioned here by far: we have not tried this, and we + do not know how the interaction between enumeration/generation + and generalization will work out. (One might need to change the + term generation to stop going 'in and out' of sub-constraints, or + at least to be careful in the implementation of ML-style + generalization to use a (semi-)persistent data structure, a tree + of generalization regions instead of a stack.) + +- You could use the term generator for property-based testing of + *something*: implement a procedure that manipulates explicitly-typed + terms -- for example, a semantics-preserving program transformation, + or a compilation to the programming language of your choice, etc -- + and test the property using the random generator.
+ + (You could probably do some greybox fuzzing by providing an + alternative search monad using Crowbar: + https://github.com/stedolan/crowbar ) + +- You could improve the random-generation strategy by thinking about + the shape of terms that you want to generate, tweaking the + distribution, providing options to generate terms using only + a subset of the rules, etc. + + An important improvement in practice would be to provide some form + of shrinking. + +- You could look at existing work on testing of language + implementations by random generation, and see if they can be + integrated in the present framework. For example, + + https://janmidtgaard.dk/papers/Midtgaard-al%3aICFP17-full.pdf + + performs random generation of programs with a custom effect system, + that determines if a given sub-expression performs an observable + side-effect or not -- to only generate terms with a deterministic + evaluation order. Could one extend the inference engine to also + check that the program is deterministic in this sense, and thus + replace the custom-built program generator used in this work? 
diff --git a/bin/dune b/bin/dune new file mode 100644 index 0000000..593b22d --- /dev/null +++ b/bin/dune @@ -0,0 +1,11 @@ +(executable + (public_name minihell) + (libraries constrained_generation menhirLib) + (modules minihell) +) + +(executable + (public_name minigen) + (libraries constrained_generation) + (modules minigen) +) diff --git a/bin/minigen.ml b/bin/minigen.ml new file mode 100644 index 0000000..0fac657 --- /dev/null +++ b/bin/minigen.ml @@ -0,0 +1,55 @@ +type config = { + exhaustive : bool; + depth : int; + count : int; + seed : int option; +} + +let config = + let exhaustive = ref false in + let depth = ref 10 in + let count = ref 1 in + let seed = ref None in + let usage = + Printf.sprintf + "Usage: %s [options]" + Sys.argv.(0) in + let spec = Arg.align [ + "--exhaustive", Arg.Set exhaustive, + " Exhaustive enumeration"; + "--depth", Arg.Set_int depth, + " Depth of generated terms"; + "--count", Arg.Set_int count, + " Number of terms to generate"; + "--seed", Arg.Int (fun s -> seed := Some s), + " Fixed seed for the random number generator"; + ] in + Arg.parse spec (fun s -> raise (Arg.Bad s)) usage; + { + exhaustive = !exhaustive; + depth = !depth; + count = !count; + seed = !seed; + } + +let () = + match config.seed with + | None -> Random.self_init () + | Some s -> Random.init s + +module RandGen = Generator.Make(MRand) +module SeqGen = Generator.Make(MSeq) + +let () = + begin + if config.exhaustive then + MSeq.run @@ SeqGen.typed ~depth:config.depth + else + MRand.run @@ RandGen.typed ~depth:config.depth + end + |> Seq.take config.count + |> Seq.map STLCPrinter.print_term + |> List.of_seq + |> PPrint.(separate (hardline ^^ hardline)) + |> Utils.string_of_doc + |> print_endline diff --git a/bin/minihell.ml b/bin/minihell.ml new file mode 100644 index 0000000..a27081a --- /dev/null +++ b/bin/minihell.ml @@ -0,0 +1,165 @@ +(* We instantiate the machinery with the Empty functor, + which forbids any use of the Do constructor. 
*) +module Untyped = Untyped.Make(Utils.Empty) +module UntypedPrinter = UntypedPrinter.Make(Utils.Empty) +module Constraint = Constraint.Make(Utils.Empty) +module Infer = Infer.Make(Utils.Empty) +module ConstraintPrinter = ConstraintPrinter.Make(Utils.Empty) +module Solver = Solver.Make(Utils.Empty) + +type config = { + show_source : bool; + show_constraint : bool; + log_solver : bool; + show_type : bool; + show_typed_term : bool; +} + +module LexUtil = MenhirLib.LexerUtil + +let print_section header doc = + doc + |> Printer.with_header header + |> Utils.string_of_doc + |> print_endline + |> print_newline + +let call_parser ~config parser_fun input_path = + let ch = open_in input_path in + Fun.protect ~finally:(fun () -> close_in ch) @@ fun () -> + let lexbuf = Lexing.from_channel ch in + let lexbuf = LexUtil.init input_path lexbuf in + match parser_fun UntypedLexer.read lexbuf with + | term -> + let term = Untyped.freshen term in + if config.show_source then + print_section "Input term" + (UntypedPrinter.print_term term); + term + | exception UntypedParser.Error -> + let open Lexing in + let start = lexbuf.lex_start_p in + let stop = lexbuf.lex_curr_p in + let loc = + let line pos = pos.pos_lnum in + let column pos = pos.pos_cnum - pos.pos_bol in + (* for an error within a single line, print line.col_start-col_stop *) + if start.pos_lnum = stop.pos_lnum then + Printf.sprintf "%d.%d-%d" + (line start) + (column start) (column stop) + else + Printf.sprintf "%d.%d-%d.%d" + (line start) (column start) + (line stop) (column stop) + in + Printf.ksprintf failwith + "%s:%s: syntax error" + (Filename.quote input_path) + loc + +let call_typer ~config (term : Untyped.term) = + let cst = + let w = Constraint.Var.fresh "final_type" in + Constraint.(Exist (w, None, + Conj(Infer.has_type Untyped.Var.Map.empty term w, + Infer.decode w))) + in + if config.show_constraint then + print_section "Generated constraint" + (ConstraintPrinter.print_constraint cst); + let logs, result = + let logs, env, nf = + Solver.eval ~log:config.log_solver Unif.Env.empty cst
+ in + let result = + match nf with + | NRet v -> Ok (v (Decode.decode env)) + | NErr e -> Error e + | NDo _ -> . + in + logs, result + in + if config.log_solver then + print_section "Constraint solving log" + PPrint.(separate hardline logs); + result + +let print_result ~config result = + match result with + | Ok (term, ty) -> + if config.show_type then + print_section "Inferred type" + (STLCPrinter.print_ty ty); + if config.show_typed_term then + print_section "Elaborated term" + (STLCPrinter.print_term term); + | Error err -> + print_section "Error" (match err with + | Infer.Clash (ty1, ty2) -> + Printer.incompatible + (STLCPrinter.print_ty ty1) + (STLCPrinter.print_ty ty2) + | Infer.Cycle (Utils.Cycle v) -> + Printer.cycle + (Printer.inference_variable + (Constraint.Var.print v)) + ) + +let process ~config input_path = + let ast = + call_parser ~config UntypedParser.term_eof input_path in + let result = + call_typer ~config ast in + let () = + print_result ~config result in + () +;; + +let parse_args () = + let show_source = ref false in + let show_constraint = ref false in + let log_solver = ref false in + let show_type = ref true in + let show_typed_term = ref true in + let inputs = Queue.create () in + let add_input path = Queue.add path inputs in + let usage = + Printf.sprintf + "Usage: %s [options] " + Sys.argv.(0) in + let spec = Arg.align [ + "--show-source", + Arg.Set show_source, + " Show input source"; + "--show-constraint", + Arg.Set show_constraint, + " Show the generated constraint"; + "--log-solver", + Arg.Set log_solver, + " Log intermediate constraints during solving"; + "--show-type", + Arg.Set show_type, + " Show the inferred type (or error)"; + "--show-typed-term", + Arg.Set show_typed_term, + " Show the elaborated explicitly-typed term"; + ] in + Arg.parse spec add_input usage; + let config = + { + show_source = !show_source; + show_constraint = !show_constraint; + log_solver = !log_solver; + show_type = !show_type; + show_typed_term =
!show_typed_term; + } + in + let input_paths = (inputs |> Queue.to_seq |> List.of_seq) in + config, input_paths + +let () = + let config, input_paths = parse_args () in + List.iteri (fun i input -> + if i > 0 then print_newline (); + process ~config input + ) input_paths diff --git a/constrained_generation.opam b/constrained_generation.opam new file mode 100644 index 0000000..b316c0a --- /dev/null +++ b/constrained_generation.opam @@ -0,0 +1,18 @@ +name: "constrained_generation" +opam-version: "2.0" +maintainer: "gabriel.scherer@inria.fr" +authors: [ + "Gabriel Scherer " +] +license: "MIT" +synopsis: "A random program generator based on inferno-style constraint generation" +build: [ + ["dune" "build" "-p" name "-j" jobs] +] +depends: [ + "ocaml" { >= "4.08" } + "dune" { >= "2.8" } + "unionFind" { >= "20220109" } + "pprint" + "menhir" +] diff --git a/dune b/dune new file mode 100644 index 0000000..5f38e0c --- /dev/null +++ b/dune @@ -0,0 +1,5 @@ +; build information for the cram tests in tests.t +; see tests.t/run.t for more details +(cram + (deps %{bin:minihell} %{bin:minigen}) +) diff --git a/dune-project b/dune-project new file mode 100644 index 0000000..d57719c --- /dev/null +++ b/dune-project @@ -0,0 +1,3 @@ +(lang dune 2.8) +(using menhir 2.1) +(cram enable) \ No newline at end of file diff --git a/src/Constraint.ml b/src/Constraint.ml new file mode 100644 index 0000000..5bf5bd9 --- /dev/null +++ b/src/Constraint.ml @@ -0,0 +1,102 @@ +(** Constraint defines the type of type-inference constraints that our + solver understands -- see Solver.ml. + + In theory this can let you perform type inference for many + different languages, as long as their typing rules can be + expressed by the constraints we have defined. In practice most + non-trivial language features will require extending the language + of constraints (and the solver) with new constructs. *) + +(* We found it convenient to include some type definitions both inside + and outside the Make functor. 
Please don't let this small quirk + distract you. *) +module Types = struct + module Var = Utils.Variables() + + type variable = Var.t + type structure = variable Structure.t + + type ty = + | Var of variable + | Constr of structure +end +include Types + +module Make (T : Utils.Functor) = struct + include Types + + type eq_error = + | Clash of STLC.ty Utils.clash + | Cycle of variable Utils.cycle + + (** A value of type [('a, 'e) t] is a constraint + whose resolution will either succeed, and produce + a witness of type ['a], or fail and produce + error information at type ['e]. + + In particular, type inference from an untyped language + can be formulated as a function of type + [untyped_term -> (typed_term, type_error) t]. + + This is a GADT (Generalized Algebraic Datatype). If + you are unfamiliar with function declarations of the + form + [let rec foo : type a e . (a, e) t -> ...] + then you should read the GADT chapter of the OCaml manual: + https://v2.ocaml.org/releases/5.1/htmlman/gadts-tutorial.html + *) + type ('ok, 'err) t = + | Ret : 'a on_sol -> ('a, 'e) t + | Err : 'e -> ('a, 'e) t + | Map : ('a, 'e) t * ('a -> 'b) -> ('b, 'e) t + | MapErr : ('a, 'e) t * ('e -> 'f) -> ('a, 'f) t + | Conj : ('a, 'e) t * ('b, 'e) t -> ('a * 'b, 'e) t + | Eq : variable * variable -> (unit, eq_error) t + | Exist : variable * structure option * ('a, 'e) t -> ('a, 'e) t + | Decode : variable -> (STLC.ty, variable Utils.cycle) t + | Do : ('a, 'e) t T.t -> ('a, 'e) t + + and 'a on_sol = (variable -> STLC.ty) -> 'a + (** A value of type [('a, 'e) t] represents a part of an + inference constraint, but the value of type ['a] that + it produces on success may depend on the solution to + the whole constraint, not just for this part of the + constraint. + + For example, consider the untyped term + [(lambda y. 42) 0] + The sub-constraint generated for [lambda y. 
42] could + be resolved first by the constraint solver and found + satisfiable, but at this point we don't know what the + final type for [y] will be; it will be + a still-undetermined inference variable [?w]. The + actual type for [?w] will only be determined when + resolving other parts of the whole constraint that + handle the application to [0]. We have to solve the + whole constraint, and then come back to elaborate an + explicitly-typed term [lambda (y : int). 42]. + + The solution to the whole constraint is represented by + a mapping from inference variables to elaborated + types. + *) + + let (let+) c f = Map(c, f) + let (and+) c1 c2 = Conj(c1, c2) + (** These are "binding operators". Usage example: + {[ + let+ ty1 = Decode w1 + and+ ty2 = Decode w2 + in Constr (Arrow (ty1, ty2)) + ]} + + After desugaring the binding operators, this is equivalent to + {[ + Map(Conj(Decode w1, Decode w2), fun (ty1, ty2) -> + Constr (Arrow (ty1, ty2))) + ]} + + For more details on binding operators, see + https://v2.ocaml.org/releases/5.1/manual/bindingops.html + *) +end diff --git a/src/Generator.ml b/src/Generator.ml new file mode 100644 index 0000000..50276fc --- /dev/null +++ b/src/Generator.ml @@ -0,0 +1,20 @@ +module Make(M : Utils.MonadPlus) = struct + module Untyped = Untyped.Make(M) + module Constraint = Constraint.Make(M) + module Infer = Infer.Make(M) + module Solver = Solver.Make(M) + + (* just in case... 
*) + module TeVarSet = Untyped.Var.Set + module TyVarSet = STLC.TyVar.Set + +let untyped : Untyped.term = + Do (M.delay (Utils.not_yet "Generator.untyped")) + +let constraint_ : (STLC.term, Infer.err) Constraint.t = + Do (M.delay (Utils.not_yet "Generator.constraint_")) + +let typed ~depth = + Utils.not_yet "Generator.typed" depth + +end diff --git a/src/Generator.mli b/src/Generator.mli new file mode 100644 index 0000000..06c47bf --- /dev/null +++ b/src/Generator.mli @@ -0,0 +1,11 @@ +module Make(M : Utils.MonadPlus) : sig + module Untyped := Untyped.Make(M) + module Constraint := Constraint.Make(M) + module Infer := Infer.Make(M) + + val untyped : Untyped.term + + val constraint_ : (STLC.term, Infer.err) Constraint.t + + val typed : depth:int -> STLC.term M.t +end diff --git a/src/Infer.ml b/src/Infer.ml new file mode 100644 index 0000000..b15f6e3 --- /dev/null +++ b/src/Infer.ml @@ -0,0 +1,83 @@ +(** Infer contains the logic to generate an inference constraint from + an untyped term, that will elaborate to an explicitly-typed term + or fail with a type error. *) + +(* You have to implement the [has_type] function below, + which is the constraint-generation function. *) + +module Make(T : Utils.Functor) = struct + module Constraint = Constraint.Make(T) + open Constraint + module Untyped = Untyped.Make(T) + + (** The "environment" of the constraint generator maps each program + variable to an inference variable representing its (monomorphic) + type. + + For example, to infer the type of the term [lambda x. t], we + will eventually call [has_type env t] with an environment + mapping [x] to a local inference variable representing its type. 
+ *) + module Env = Untyped.Var.Map + type env = variable Env.t + + type err = eq_error = + | Clash of STLC.ty Utils.clash + | Cycle of Constraint.variable Utils.cycle + + type 'a constraint_ = ('a, err) Constraint.t + + let eq v1 v2 = Eq(v1, v2) + let decode v = MapErr(Decode v, fun e -> Cycle e) + + (** This is a helper function to implement constraint generation for + the [Annot] construct. + + [bind ty k] takes a type [ty], and a constraint [k] parametrized + over a constraint variable. It creates a constraint context that + binds a new constraint variable [?w] that must be equal to [ty], + and places [k ?w] within this context. + + For example, if [ty] is the type [?v1 -> (?v2 -> ?v3)] , then + [bind ty k] could be the constraint + [∃(?w1 = ?v2 -> ?v3). ∃(?w2 = ?v1 -> ?w1). k ?w2], or equivalently + [∃?w3 ?w4. ?w3 = ?v1 -> ?w4 ∧ ?w4 = ?v2 -> ?v3 ∧ k ?w3]. + *) + let rec bind (ty : STLC.ty) (k : Constraint.variable -> ('a, 'e) t) : ('a, 'e) t = + (* Feel free to postpone implementing this function + until you implement the Annot case below. *) + Utils.not_yet "Infer.bind" (ty, k, fun () -> bind) + + (** This function generates a typing constraint from an untyped term: + [has_type env t w] generates a constraint [C] which contains [w] as + a free inference variable, such that [C] has a solution if and only + if [t] is well-typed in [env], and in that case [w] is the type of [t]. + + For example, if [t] is the term [lambda x. x], then [has_type env t w] + generates a constraint equivalent to [∃?v. ?w = (?v -> ?v)]. + + Precondition: when calling [has_type env t], [env] must map each + term variable that is free in [t] to an inference variable. 
+ *) + let rec has_type (env : env) (t : Untyped.term) (w : variable) : (STLC.term, err) t = + match t with + | Untyped.Var x -> + Utils.not_yet "Infer.has_type: Var case" (env, t, w, x) + | Untyped.App (t, u) -> + Utils.not_yet "Infer.has_type: App case" (env, t, u, fun () -> has_type) + | Untyped.Abs (x, t) -> + Utils.not_yet "Infer.has_type: Abs case" (env, x, t, fun () -> has_type) + | Untyped.Let (x, t, u) -> + Utils.not_yet "Infer.has_type: Let case" (env, x, t, u, fun () -> has_type) + | Untyped.Annot (t, ty) -> + Utils.not_yet "Infer.has_type: Let case" (env, t, ty, bind, fun () -> has_type) + | Untyped.Tuple ts -> + Utils.not_yet "Infer.has_type: Let case" (env, ts, fun () -> has_type) + | Untyped.LetTuple (xs, t, u) -> + Utils.not_yet "Infer.has_type: Let case" (env, xs, t, u, fun () -> has_type) + | Do p -> + (* Feel free to postone this until you start looking + at random generation. Getting type inference to + work on all the other cases is a good first step. *) + Utils.not_yet "Infer.has_type: Let case" (env, p, fun () -> has_type) +end diff --git a/src/Infer.mli b/src/Infer.mli new file mode 100644 index 0000000..0bbd98b --- /dev/null +++ b/src/Infer.mli @@ -0,0 +1,18 @@ +module Make(T : Utils.Functor) : sig + module Untyped := Untyped.Make(T) + module Constraint := Constraint.Make(T) + + type err = + | Clash of STLC.ty Utils.clash + | Cycle of Constraint.variable Utils.cycle + + type 'a constraint_ = ('a, err) Constraint.t + + val eq : Constraint.variable -> Constraint.variable -> unit constraint_ + val decode : Constraint.variable -> STLC.ty constraint_ + + type env = Constraint.variable Untyped.Var.Map.t + + val has_type : env -> + Untyped.term -> Constraint.variable -> STLC.term constraint_ +end diff --git a/src/MRand.ml b/src/MRand.ml new file mode 100644 index 0000000..dba83ba --- /dev/null +++ b/src/MRand.ml @@ -0,0 +1,25 @@ +type 'a t = MRand_not_implemented_yet + +let map (f : 'a -> 'b) (s : 'a t) : 'b t = + Utils.not_yet "MRand.map" (f, s) 
+ +let return (x : 'a) : 'a t = + Utils.not_yet "MRand.return" x + +let bind (sa : 'a t) (f : 'a -> 'b t) : 'b t = + Utils.not_yet "MRand.bind" (sa, f) + +let delay (f : unit -> 'a t) : 'a t = + Utils.not_yet "MRand.delay" (f ()) + +let sum (li : 'a t list) : 'a t = + Utils.not_yet "MRand.sum" li + +let fail : 'a t = + MRand_not_implemented_yet + +let one_of (vs : 'a array) : 'a t = + Utils.not_yet "MRand.one_of" vs + +let run (s : 'a t) : 'a Seq.t = + Utils.not_yet "MRand.run" s diff --git a/src/MRand.mli b/src/MRand.mli new file mode 100644 index 0000000..ef1cc7f --- /dev/null +++ b/src/MRand.mli @@ -0,0 +1 @@ +include Utils.MonadPlus diff --git a/src/MSeq.ml b/src/MSeq.ml new file mode 100644 index 0000000..7f39da0 --- /dev/null +++ b/src/MSeq.ml @@ -0,0 +1,25 @@ +type 'a t = MSeq_not_implemented_yet + +let map (f : 'a -> 'b) (s : 'a t) : 'b t = + Utils.not_yet "MSeq.map" (f, s) + +let return (x : 'a) : 'a t = + Utils.not_yet "MSeq.return" x + +let bind (sa : 'a t) (f : 'a -> 'b t) : 'b t = + Utils.not_yet "MSeq.bind" (sa, f) + +let delay (f : unit -> 'a t) : 'a t = + Utils.not_yet "MSeq.delay" (f ()) + +let sum (li : 'a t list) : 'a t = + Utils.not_yet "MSeq.sum" li + +let fail : 'a t = + MSeq_not_implemented_yet + +let one_of (vs : 'a array) : 'a t = + Utils.not_yet "MSeq.one_of" vs + +let run (s : 'a t) : 'a Seq.t = + Utils.not_yet "MSeq.run" s diff --git a/src/MSeq.mli b/src/MSeq.mli new file mode 100644 index 0000000..ef1cc7f --- /dev/null +++ b/src/MSeq.mli @@ -0,0 +1 @@ +include Utils.MonadPlus diff --git a/src/STLC.ml b/src/STLC.ml new file mode 100644 index 0000000..cb522ca --- /dev/null +++ b/src/STLC.ml @@ -0,0 +1,23 @@ +(* A type of explicitly-typed terms. 
*) + +module TyVar = Structure.TyVar + +type 'v ty_ = + | Constr of ('v, 'v ty_) Structure.t_ + +type raw_ty = string ty_ +type ty = TyVar.t ty_ + +let rec freshen_ty (Constr s) = + Constr (Structure.freshen freshen_ty s) + +module TeVar = Utils.Variables () + +type term = + | Var of TeVar.t + | App of term * term + | Abs of TeVar.t * ty * term + | Let of TeVar.t * ty * term * term + | Annot of term * ty + | Tuple of term list + | LetTuple of (TeVar.t * ty) list * term * term diff --git a/src/Solver.ml b/src/Solver.ml new file mode 100644 index 0000000..268a96f --- /dev/null +++ b/src/Solver.ml @@ -0,0 +1,62 @@ +(* + As explained in the README.md ("Abstracting over an effect"), + this module as well as other modules is parametrized over + an arbitrary effect [T : Functor]. +*) + +module Make (T : Utils.Functor) = struct + module Constraint = Constraint.Make(T) + module SatConstraint = SatConstraint.Make(T) + module ConstraintSimplifier = ConstraintSimplifier.Make(T) + module ConstraintPrinter = ConstraintPrinter.Make(T) + + type env = Unif.Env.t + type log = PPrint.document list + + let make_logger c0 = + let logs = Queue.create () in + let c0_erased = SatConstraint.erase c0 in + let add_to_log env = + let doc = + c0_erased + |> ConstraintSimplifier.simplify env + |> ConstraintPrinter.print_sat_constraint + in + Queue.add doc logs + in + let get_log () = + logs |> Queue.to_seq |> List.of_seq + in + add_to_log, get_log + + (** See [../README.md] ("High-level description") or [Solver.mli] + for a description of normal constraints and + our expectations regarding the [eval] function. 
*) + type ('a, 'e) normal_constraint = + | NRet of 'a Constraint.on_sol + | NErr of 'e + | NDo of ('a, 'e) Constraint.t T.t + + let eval (type a e) ~log (env : env) (c0 : (a, e) Constraint.t) + : log * env * (a, e) normal_constraint + = + let add_to_log, get_log = + if log then make_logger c0 + else ignore, (fun _ -> []) + in + (* We recommend calling the function [add_to_log] above + whenever you get an updated environment. Then call + [get_log] at the end to get a list of log messages. + + $ dune exec -- minihell --log-solver foo.test + + will show a log that will let you see the evolution + of your input constraint (after simplification) as + the solver progresses, which is useful for debugging. + + (You can also tweak this code temporarily to print stuff on + stderr right away if you need dirtier ways to debug.) + *) + Utils.not_yet "Solver.eval" (env, c0, add_to_log, get_log) + +end diff --git a/src/Solver.mli b/src/Solver.mli new file mode 100644 index 0000000..376b7af --- /dev/null +++ b/src/Solver.mli @@ -0,0 +1,40 @@ +module Make (T : Utils.Functor) : sig + module Constraint := Constraint.Make(T) + + type env = Unif.Env.t + type log = PPrint.document list + + (** Normal constraints are the result + of solving constraints without computing + inside [Do p] nodes. *) + type ('a, 'e) normal_constraint = + | NRet of 'a Constraint.on_sol + (** A successfully elaborated value. + (See Constraint.ml for explanations on [on_sol].) *) + + | NErr of 'e + (** A failed/false constraint. *) + + | NDo of ('a, 'e) Constraint.t T.t + (** A constraint whose evaluation encountered an effectful + constraint in a [Do p] node. + + We propose an evaluation rule of the form + [eval E[Do p] = NDo E[p]] + where a [Do (p : ('a1, 'e1) Constraint.t T.t)] node placed + inside an evaluation context [E] bubbles "all the way to the + top" in the result. 
[E[p]] is defined by using [T.map] to lift + the context-plugging operation + [E[_] : ('a1, 'e1) Constraint.t -> ('a2, 'e2) Constraint.t]. + *) + + (** If [~log:true] is passed as input, collect a list of + intermediate steps (obtained from the solver + environment and the original constraint by + constraint simplification) as the constraint-solving + progresses. Otherwise the returned [log] will not be + used and could be returned empty. *) + val eval : + log:bool -> env -> ('a, 'e) Constraint.t -> + log * env * ('a, 'e) normal_constraint +end diff --git a/src/Structure.ml b/src/Structure.ml new file mode 100644 index 0000000..0e1793d --- /dev/null +++ b/src/Structure.ml @@ -0,0 +1,71 @@ +(** Type-formers are defined explicitly as type "structures". + + Type structures ['a t] are parametric over the type of + their leaves. A typical tree-shaped representation of + types would use [ty t], a structure carrying types as + sub-expressions, but the types manipulated by the + constraint solver are so-called "shallow types" that + always use inference variables at the leaves. We cannot + write, say, [?w = α -> (β * γ)]; one has to write + [∃?w1 ?w2 ?w3 ?w4. + ?w = ?w1 -> ?w2 + ∧ ?w1 = α + ∧ ?w2 = ?w3 * ?w4 + ∧ ?w3 = β + ∧ ?w4 = γ] instead. + + (The implementation goes through a first step [('v, 'a) t_] + that is also parametrized over a notion of type variable, + just like ['v Untyped.term] -- see the documentation there.) +*) + +module TyVar = Utils.Variables() + +type ('v, 'a) t_ = + | Var of 'v + (** Note: a type variable here represents a rigid/opaque/abstract type [α, β...], + not a flexible inference variable like [?w] in constraints. + + For example, for two distinct type variables [α, β] + the term [(lambda x. x : α → α) (y : β)] is always + ill-typed. 
*) + | Arrow of 'a * 'a + | Prod of 'a list + +type 'a raw = (string, 'a) t_ +type 'a t = (TyVar.t, 'a) t_ + +let iter f = function + | Var _alpha -> () + | Arrow (t1, t2) -> f t1; f t2 + | Prod ts -> List.iter f ts + +let map f = function + | Var alpha -> Var alpha + | Arrow (t1, t2) -> Arrow (f t1, f t2) + | Prod ts -> Prod (List.map f ts) + +let merge f s1 s2 = + Utils.not_yet "Structure.merge" (f, s1, s2) + +let global_tyvar : string -> TyVar.t = + (* There are no binders for type variables, which are scoped + globally for the whole term. *) + let tenv = Hashtbl.create 5 in + fun alpha -> + match Hashtbl.find tenv alpha with + | alpha_var -> alpha_var + | exception Not_found -> + let alpha_var = TyVar.fresh alpha in + Hashtbl.add tenv alpha alpha_var; + alpha_var + +let freshen freshen = function + | Var alpha -> Var (global_tyvar alpha) + | Arrow (t1, t2) -> Arrow (freshen t1, freshen t2) + | Prod ts -> Prod (List.map freshen ts) + +let print p = function + | Var v -> TyVar.print v + | Prod ts -> Printer.product (List.map p ts) + | Arrow (t1, t2) -> Printer.arrow (p t1) (p t2) diff --git a/src/Structure.mli b/src/Structure.mli new file mode 100644 index 0000000..8fd85f1 --- /dev/null +++ b/src/Structure.mli @@ -0,0 +1,19 @@ +module TyVar : module type of Utils.Variables() + +type ('v, 'a) t_ = + | Var of 'v + | Arrow of 'a * 'a + | Prod of 'a list + +type 'a raw = (string, 'a) t_ +type 'a t = (TyVar.t, 'a) t_ + +val iter : ('a -> unit) -> ('v, 'a) t_ -> unit + +val map : ('a -> 'b) -> ('v, 'a) t_ -> ('v, 'b) t_ + +val merge : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t option + +val freshen : ('a -> 'b) -> 'a raw -> 'b t + +val print : ('a -> PPrint.document) -> 'a t -> PPrint.document diff --git a/src/Unif.ml b/src/Unif.ml new file mode 100644 index 0000000..5e2e625 --- /dev/null +++ b/src/Unif.ml @@ -0,0 +1,133 @@ +(* There is nothing that you have to implement in this file/module, + and no particular need to read its implementation. 
On the other hand, + you do want to understand the interface exposed in [Unif.mli], as it + is important for implementing a constraint solver in Solver.ml. *) + +module UF = UnionFind.Make(UnionFind.StoreMap) + +type var = Constraint.variable + +(* The internal representation in terms of union-find nodes. *) +type uvar = unode UF.rref +and unode = { + var: var; + data: uvar Structure.t option; +} + +(* The user-facing representation hides union-find nodes, + replaced by the corresponding constraint variables. *) +type repr = { + var: var; + structure: var Structure.t option; +} + +module Env : sig + type t = { + store: unode UF.store; + map: uvar Constraint.Var.Map.t; + } + + val empty : t + + val mem : var -> t -> bool + + val add : var -> Constraint.structure option -> t -> t + + val uvar : var -> t -> uvar + + val repr : var -> t -> repr +end = struct + type t = { + store: unode UF.store; + map: uvar Constraint.Var.Map.t; + } + + let empty = + let store = UF.new_store () in + let map = Constraint.Var.Map.empty in + { store; map } + + let uvar var env : uvar = + Constraint.Var.Map.find var env.map + + let mem var env = + Constraint.Var.Map.mem var env.map + + let add var structure env = + let data = Option.map (Structure.map (fun v -> uvar v env)) structure in + let uvar = UF.make env.store { var; data } in + { env with map = Constraint.Var.Map.add var uvar env.map } + + let repr var env = + let { var; data; } = UF.get env.store (uvar var env) in + let var_of_uvar uv = (UF.get env.store uv).var in + let structure = Option.map (Structure.map var_of_uvar) data in + { var; structure; } +end + +type clash = var Utils.clash +exception Clash of clash +exception Cycle of var Utils.cycle + +type err = + | Clash of clash + | Cycle of var Utils.cycle + +let check_no_cycle env v = + let open struct + type status = Visiting | Visited + end in + let table = Hashtbl.create 42 in + let rec loop v = + let n = UF.get env.Env.store v in + match Hashtbl.find table n.var with + | Visited 
-> + () + | Visiting -> + raise (Cycle (Utils.Cycle n.var)) + | exception Not_found -> + Hashtbl.replace table n.var Visiting; + Option.iter (Structure.iter loop) n.data; + Hashtbl.replace table n.var Visited; + in loop v + +let rec unify orig_env v1 v2 : (Env.t, err) result = + let env = { orig_env with Env.store = UF.copy orig_env.Env.store } in + let queue = Queue.create () in + Queue.add (Env.uvar v1 env, Env.uvar v2 env) queue; + match unify_uvars env.Env.store queue with + | exception Clash clash -> Error (Clash clash) + | () -> + match check_no_cycle env (Env.uvar v1 env) with + | exception Cycle v -> Error (Cycle v) + | () -> Ok env + +and unify_uvars store (queue : (uvar * uvar) Queue.t) = + match Queue.take_opt queue with + | None -> () + | Some (u1, u2) -> + ignore (UF.merge store (merge queue) u1 u2); + unify_uvars store queue + +and merge queue (n1 : unode) (n2 : unode) : unode = + let clash () = raise (Clash (n1.var, n2.var)) in + let data = + match n1.data, n2.data with + | None, None -> None + | None, (Some _ as d) | (Some _ as d), None -> d + | Some st1, Some st2 -> + match + Structure.merge (fun v1 v2 -> + Queue.add (v1, v2) queue; + v1 + ) st1 st2 + with + | None -> clash () + | Some d -> Some d + in + { n1 with data } + +let unifiable env v1 v2 = + match unify env v1 v2 with + | Ok _ -> true + | Error _ -> false diff --git a/src/Unif.mli b/src/Unif.mli new file mode 100644 index 0000000..30a0e18 --- /dev/null +++ b/src/Unif.mli @@ -0,0 +1,62 @@ +(** The Unif module provides unification, which is a key ingredient of + type inference. 
This is exposed as a persistent "equation + environment" [Unif.Env.t] that stores the current knowledge on + inference variables obtained by constraint evaluation: + + - which inference variables are equal to each other + - for each inference variable, its known structure (if any) +*) + +type var = Constraint.variable + +type repr = { + var: var; + structure: Constraint.structure option; +} +(** [repr] represents all the knowledge so far about an inference + variable, or rather an equivalence class of inference variables + that are equal to each other: + - [var] is a choice of canonical representative for the equivalence class + - [structure] is the known structure (if any) of these variables +*) + +module Env : sig + type t + + val empty : t + + val mem : var -> t -> bool + + val add : var -> Constraint.structure option -> t -> t + + (** [repr x env] gets the representative of [x] in [env]. + + @raise [Not_found] if [x] is not bound in [env]. *) + val repr : var -> t -> repr +end + +(** Unification errors: + - [Clash] indicates that we tried to + unify two variables with incompatible structure + -- equating them would make the context inconsistent. + It returns the pair of variables with incompatible structure. + + - [Cycle] indicates that unifying two variables + would introduce a cyclic, infinite type. + It returns one variable belonging to the prospective cycle. +*) + +type err = + | Clash of var Utils.clash + | Cycle of var Utils.cycle + +val unify : Env.t -> var -> var -> (Env.t, err) result +(** [unify env v1 v2] takes the current equation environment [env], + and tries to update it with the knowledge that [v1], [v2] must be + equal. If this equality would introduce an error, we fail with the + error report; otherwise we return the updated equation + environment. *) + +val unifiable : Env.t -> var -> var -> bool +(** [unifiable env v1 v2] tests if unifying [v1] and [v2] + in the equation environment [env] would succeed. 
*) diff --git a/src/Untyped.ml b/src/Untyped.ml new file mode 100644 index 0000000..745a98e --- /dev/null +++ b/src/Untyped.ml @@ -0,0 +1,63 @@ +(** Our syntax of untyped terms. + + As explained in the README.md ("Abstracting over an effect"), + this module as well as other modules is parametrized over + an arbitrary effect [T : Functor]. +*) + +module Make(T : Utils.Functor) = struct + module Var = STLC.TeVar + + (** ['t term_] is parametrized over the representation + of term variables. Most of the project code will + work with the non-parametrized instance [term] below. *) + type 't term_ = + | Var of 'tev + | App of 't term_ * 't term_ + | Abs of 'tev * 't term_ + | Let of 'tev * 't term_ * 't term_ + | Tuple of 't term_ list + | LetTuple of 'tev list * 't term_ * 't term_ + | Annot of 't term_ * 'tyv STLC.ty_ + | Do of 't term_ T.t + constraint 't = < tevar : 'tev; tyvar : 'tyv; > + + (** [raw_term] are terms with raw [string] for their + variables. Several binders may use the same + variable. These terms are produced by the parser. *) + type raw_term = < tevar : string; tyvar : string > term_ + + (** [term] are terms using [STLC.TeVar.t] variables, + which include a unique stamp to guarantee uniqueness + of binders. This is what most of the code manipulates. 
*) + type term = < tevar : Var.t; tyvar : Structure.TyVar.t; > term_ + + let freshen : raw_term -> term = + let module Env = Map.Make(String) in + let bind env x = + let x_var = Var.fresh x in + let env = Env.add x x_var env in + env, x_var + in + let rec freshen env = + function + | Var x -> Var (Env.find x env) + | App (t1, t2) -> App (freshen env t1, freshen env t2) + | Abs (x, t) -> + let env, x = bind env x in + Abs (x, freshen env t) + | Let (x, t1, t2) -> + let env_inner, x = bind env x in + Let (x, freshen env t1, freshen env_inner t2) + | Tuple ts -> + Tuple (List.map (freshen env) ts) + | LetTuple (xs, t1, t2) -> + let env_inner, xs = List.fold_left_map bind env xs in + LetTuple (xs, freshen env t1, freshen env_inner t2) + | Annot (t, ty) -> + Annot (freshen env t, STLC.freshen_ty ty) + | Do p -> + Do (T.map (freshen env) p) + in + fun t -> freshen Env.empty t +end diff --git a/src/Utils.ml b/src/Utils.ml new file mode 100644 index 0000000..d92104d --- /dev/null +++ b/src/Utils.ml @@ -0,0 +1,122 @@ +type 'a clash = 'a * 'a +type 'v cycle = Cycle of 'v [@@unboxed] + +let string_of_doc doc = + let buf = Buffer.create 128 in + PPrint.ToBuffer.pretty 0.9 80 buf doc; + Buffer.contents buf + +module Variables () : sig + type t = private { + name: string; + stamp: int; + } + + val compare : t -> t -> int + val eq : t -> t -> bool + + val fresh : string -> t + + val namegen : string array -> (unit -> t) + + val name : t -> string + + val print : t -> PPrint.document + + module Set : Set.S with type elt = t + module Map : Map.S with type key = t +end = struct + type t = { + name: string; + stamp: int; + } + + let name v = v.name + + let compare = Stdlib.compare + let eq n1 n2 = (compare n1 n2 = 0) + + let stamps = Hashtbl.create 42 + let fresh name = + let stamp = + match Hashtbl.find_opt stamps name with + | None -> 0 + | Some n -> n + in + Hashtbl.replace stamps name (stamp + 1); + { name; stamp; } + + let namegen names = + if names = [||] then failwith 
"namegen: empty names array"; + let counter = ref 0 in + let wrap n = n mod (Array.length names) in + fun () -> + let idx = !counter in + counter := wrap (!counter + 1); + fresh names.(idx) + + let print { name; stamp } = + if stamp = 0 then PPrint.string name + else Printf.ksprintf PPrint.string "%s/%x" name stamp + + module Key = struct + type nonrec t = t + let compare = compare + end + module Set = Set.Make(Key) + module Map = Map.Make(Key) +end + +module type Functor = sig + type 'a t + val map : ('a -> 'b) -> 'a t -> 'b t +end + +(** A signature for search monads, that represent + computations that enumerate zero, one or several + values. *) +module type MonadPlus = sig + include Functor + val return : 'a -> 'a t + val bind : 'a t -> ('a -> 'b t) -> 'b t + + val sum : 'a t list -> 'a t + val fail : 'a t + val one_of : 'a array -> 'a t + (** [fail] and [one_of] can be derived from [sum], but + they typically have simpler and more efficient + specialized implementations. *) + + val delay : (unit -> 'a t) -> 'a t + (** Many search monad implementations perform their computation + on-demand, when elements are requested, instead of forcing + computation already to produce the ['a t] value. + + In a strict language, it is easy to perform computation + too early in this case, for example + [M.sum [foo; bar]] will compute [foo] and [bar] eagerly + even though [bar] may not be needed if we only observe + the first element. + + The [delay] combinator makes this on-demand nature + explicit, for example one can write [M.delay + (fun () -> M.sum [foo; bar])] to avoid computing [foo] + and [bar] too early. 
Of course, if the underlying + implementation is in fact eager, then this may apply + the function right away.*) + + val run : 'a t -> 'a Seq.t + (** ['a Seq.t] is a type of on-demand sequences from the + OCaml standard library: + https://v2.ocaml.org/api/Seq.html + *) +end + +module Empty = struct + type 'a t = | (* the empty type *) + let map (_ : 'a -> 'b) : 'a t -> 'b t = function + | _ -> . +end +module _ = (Empty : Functor) + +let not_yet fname = fun _ -> failwith (fname ^ ": not implemented yet") diff --git a/src/dune b/src/dune new file mode 100644 index 0000000..7d0aee0 --- /dev/null +++ b/src/dune @@ -0,0 +1,9 @@ +(include_subdirs unqualified) ;; also look in subdirectories + +(library + (name constrained_generation) + (public_name constrained_generation) + (synopsis "A random program generator based on constraint solving") + (libraries unionFind pprint) + (wrapped false) +) diff --git a/src/support/ConstraintPrinter.ml b/src/support/ConstraintPrinter.ml new file mode 100644 index 0000000..a802f03 --- /dev/null +++ b/src/support/ConstraintPrinter.ml @@ -0,0 +1,54 @@ +module Make(T : Utils.Functor) = struct + open Constraint.Make(T) + open SatConstraint.Make(T) + + let print_var v = + Printer.inference_variable + (Constraint.Var.print v) + + let print_sat_constraint (c : sat_constraint) : PPrint.document = + let rec print_top = + fun c -> print_left_open c + + and print_left_open = + let _print_self = print_left_open + and print_next = print_conj in + fun ac -> + let rec peel = function + | Exist (v, s, c) -> + let binding = + (print_var v, + Option.map (Structure.print print_var) s) + in + let (bindings, body) = peel c in + (binding :: bindings, body) + | other -> ([], print_next other) + in + let (bindings, body) = peel ac in + Printer.exist bindings body + + and print_conj = + let _print_self = print_conj + and print_next = print_atom in + function + | Conj cs -> Printer.conjunction (List.map print_next cs) + | other -> print_next other + + and 
print_atom = + function + | Decode v -> Printer.decode (print_var v) + | False -> Printer.false_ + | Eq (v1, v2) -> + Printer.eq + (print_var v1) + (print_var v2) + | Do _ -> Printer.do_ + | (Exist _ | Conj _) as other -> + PPrint.parens (print_top other) + + in print_top c + + let print_constraint (type a e) (c : (a, e) Constraint.t) + : PPrint.document + = print_sat_constraint (erase c) +end diff --git a/src/support/ConstraintPrinter.mli b/src/support/ConstraintPrinter.mli new file mode 100644 index 0000000..38889ae --- /dev/null +++ b/src/support/ConstraintPrinter.mli @@ -0,0 +1,7 @@ +module Make(T : Utils.Functor) : sig + module Constraint := Constraint.Make(T) + module SatConstraint := SatConstraint.Make(T) + + val print_sat_constraint : SatConstraint.t -> PPrint.document + val print_constraint : ('a, 'e) Constraint.t -> PPrint.document +end diff --git a/src/support/ConstraintSimplifier.ml b/src/support/ConstraintSimplifier.ml new file mode 100644 index 0000000..e0dca13 --- /dev/null +++ b/src/support/ConstraintSimplifier.ml @@ -0,0 +1,79 @@ +module Make(T : Utils.Functor) = struct + open Constraint.Make(T) + open SatConstraint.Make(T) + + type env = Unif.Env.t + + let simplify (env : env) (c : sat_constraint) : sat_constraint = + let is_in_env v = Unif.Env.mem v env in + let normalize v = + match Unif.Env.repr v env with + | { var; _ } -> var + | exception Not_found -> v + in + let module VarSet = Constraint.Var.Set in + let exist v s (fvs, c) : VarSet.t * sat_constraint = + assert (Var.eq v (normalize v)); + let s = + match Unif.Env.repr v env with + | exception Not_found -> s + | { structure; _ } -> structure + in + let fvs = + let fvs = ref fvs in + Option.iter (Structure.iter (fun v -> fvs := VarSet.add v !fvs)) s; + !fvs in + VarSet.remove v fvs, + Exist (v, s, c) + in + let rec simpl (bvs : VarSet.t) (c : sat_constraint) : VarSet.t * sat_constraint = + match c with + | False -> + (* Note: we do not attempt to normalize (⊥ ∧ C) into ⊥, (∃w. 
⊥) + into ⊥, etc. If a contradiction appears in the constraint, we + think that it is clearer to see it deep in the constraint + term, in the context where the solver found it, rather than + bring it all the way to the top and erasing the rest of the + constraint in the process. *) + VarSet.empty, False + | Conj cs -> + let (fvs, cs) = + List.fold_left (fun (fvs, cs) c -> + let (fvs', c) = simpl bvs c in + let cs' = match c with + | Conj cs' -> cs' + | _ -> [c] + in + (VarSet.union fvs' fvs, cs' @ cs) + ) (VarSet.empty, []) cs in + fvs, Conj cs + | Eq (v1, v2) -> + let v1, v2 = normalize v1, normalize v2 in + if Constraint.Var.eq v1 v2 then + VarSet.empty, Conj [] (* True *) + else begin match Unif.unifiable env v1 v2 with + | false -> + VarSet.empty, False + | true | exception Not_found -> + VarSet.of_list [v1; v2], Eq (v1, v2) + end + | Exist (v, s, c) -> + let fvs, c = simpl (VarSet.add v bvs) c in + if is_in_env v then (fvs, c) + else if not (VarSet.mem v fvs) then (fvs, c) + else exist v s (fvs, c) + | Decode v -> + let v = normalize v in + VarSet.singleton v, Decode v + | Do p -> + bvs, Do p + in + let rec add_exist (fvs, c) = + match VarSet.choose_opt fvs with + | None -> c + | Some v -> + add_exist (exist v None (fvs, c)) + in + add_exist (simpl VarSet.empty c) +end + diff --git a/src/support/ConstraintSimplifier.mli b/src/support/ConstraintSimplifier.mli new file mode 100644 index 0000000..1432be7 --- /dev/null +++ b/src/support/ConstraintSimplifier.mli @@ -0,0 +1,6 @@ +module Make(T : Utils.Functor) : sig + type env = Unif.Env.t + module SatConstraint := SatConstraint.Make(T) + + val simplify : env -> SatConstraint.t -> SatConstraint.t +end diff --git a/src/support/Decode.ml b/src/support/Decode.ml new file mode 100644 index 0000000..498df2c --- /dev/null +++ b/src/support/Decode.ml @@ -0,0 +1,35 @@ +type env = Unif.Env.t + +type slot = + | Ongoing + | Done of STLC.ty + +let new_var = + STLC.TyVar.namegen [|"α"; "β"; "γ"; "δ"|] + +let table = 
Hashtbl.create 42 + +let decode (env : env) (v : Constraint.variable) : STLC.ty = + let exception Found_cycle of Constraint.variable Utils.cycle in + let rec decode (v : Constraint.variable) : STLC.ty = + let repr = Unif.Env.repr v env in + begin match Hashtbl.find table repr.var with + | Done ty -> ty + | Ongoing -> raise (Found_cycle (Utils.Cycle repr.var)) + | exception Not_found -> + Hashtbl.replace table repr.var Ongoing; + let ty = + STLC.Constr ( + match repr.structure with + | Some s -> Structure.map decode s + | None -> Var (new_var ()) + ) + in + Hashtbl.replace table repr.var (Done ty); + ty + end + in + (* Because we perform an occur-check on unification, we can assume + that we never find any cycle during decoding: + [Found_cycle] should never be raised here. *) + decode v diff --git a/src/support/Decode.mli b/src/support/Decode.mli new file mode 100644 index 0000000..7bbe514 --- /dev/null +++ b/src/support/Decode.mli @@ -0,0 +1,3 @@ +type env = Unif.Env.t + +val decode : env -> Constraint.variable -> STLC.ty diff --git a/src/support/Printer.ml b/src/support/Printer.ml new file mode 100644 index 0000000..45c255d --- /dev/null +++ b/src/support/Printer.ml @@ -0,0 +1,113 @@ +open PPrint + +(** ?w *) +let inference_variable w = + string "?" ^^ w + +(** $t -> $u *) +let arrow t u = group @@ + t ^/^ string "->" ^/^ u + +(** {$t1 * $t2 * ... $tn} *) +let product ts = group @@ + braces (separate (break 1 ^^ star ^^ space) ts) + +(** ($term : $ty) *) +let annot term ty = group @@ + surround 2 0 lparen ( + term ^/^ colon ^//^ ty + ) rparen + +(** lambda $input. $body *) +let lambda ~input ~body = group @@ + string "lambda" + ^/^ input + ^^ string "." + ^//^ body + +(** let $var = $def in $body *) +let let_ ~var ~def ~body = group @@ + string "let" + ^/^ var + ^/^ string "=" + ^/^ def + ^/^ string "in" + ^//^ body + +(** $t $u *) +let app t u = group @@ + t ^//^ u + +(** (t1, t2... 
tn) *) +let tuple p ts = group @@ + match ts with + | [] -> lparen ^^ rparen + | _ -> + surround 2 0 lparen ( + match ts with + | [t] -> + (* For arity-1 tuples we print (foo,) + instead of (foo) which would be ambiguous. *) + p t ^^ comma + | _ -> + separate_map (comma ^^ break 1) p ts + ) rparen + +(** ∃$w1 $w2 ($w3 = $s) $w4... $wn. $c *) +let exist bindings body = group @@ + let print_binding (w, s) = + match s with + | None -> w + | Some s -> + group @@ + surround 2 0 lparen ( + w + ^/^ string "=" + ^/^ s + ) rparen + in + let bindings = + group (flow_map (break 1) print_binding bindings) + in + group (utf8string "∃" ^^ ifflat empty space + ^^ nest 2 bindings + ^^ break 0 ^^ string ".") + ^^ prefix 2 1 empty body + +let true_ = utf8string "⊤" +let false_ = utf8string "⊥" + +(** $c1 ∧ $c2 ∧ .... ∧ $cn *) +let conjunction docs = group @@ + match docs with + | [] -> true_ + | docs -> separate (break 1 ^^ utf8string "∧" ^^ space) docs + +(** $v1 = $v2 *) +let eq v1 v2 = group @@ + v1 + ^/^ string "=" + ^/^ v2 + +(** decode $v *) +let decode v = group @@ + string "decode" ^^ break 1 ^^ v + +let do_ = string "do?" 
+ +(** + $ty1 +incompatible with + $ty2 +*) +let incompatible ty1 ty2 = + group (blank 2 ^^ nest 2 ty1) + ^^ hardline ^^ string "incompatible with" ^^ hardline ^^ + group (blank 2 ^^ nest 2 ty2) + +let cycle v = + string "cycle on constraint variable" ^/^ v + +let with_header header doc = + string header ^^ colon ^^ nest 2 (group (hardline ^^ doc)) + diff --git a/src/support/STLCPrinter.ml b/src/support/STLCPrinter.ml new file mode 100644 index 0000000..2554a35 --- /dev/null +++ b/src/support/STLCPrinter.ml @@ -0,0 +1,65 @@ +open STLC + +let print_ty : ty -> PPrint.document = + let rec print t = + let print_self = print + and print_next = print_atom in + match t with + | Constr (Arrow (t1, t2)) -> + Printer.arrow (print_next t1) (print_self t2) + | other -> print_next other + + and print_atom = function + | Constr (Var alpha) -> TyVar.print alpha + | Constr (Prod ts) -> Printer.product (List.map print ts) + | Constr (Arrow _) as other -> PPrint.parens (print other) + + in print + +let print_term : term -> PPrint.document = + let print_binding x tau = + Printer.annot (TeVar.print x) (print_ty tau) + in + let rec print_top t = print_left_open t + + and print_left_open t = + let print_self = print_left_open + and print_next = print_app in + PPrint.group @@ match t with + | Abs (x, tau, t) -> + Printer.lambda + ~input:(print_binding x tau) + ~body:(print_self t) + | Let (x, tau, t, u) -> + Printer.let_ + ~var:(print_binding x tau) + ~def:(print_top t) + ~body:(print_self u) + | LetTuple (xtaus, t, u) -> + Printer.let_ + ~var:(Printer.tuple (fun (x, tau) -> print_binding x tau) xtaus) + ~def:(print_top t) + ~body:(print_self u) + | other -> print_next other + + and print_app t = + let print_self = print_app + and print_next = print_atom in + PPrint.group @@ match t with + | App (t, u) -> + Printer.app (print_self t) (print_next u) + | other -> print_next other + + and print_atom t = + PPrint.group @@ match t with + | Var x -> TeVar.print x + | Annot (t, ty) -> + 
Printer.annot + (print_top t) + (print_ty ty) + | Tuple ts -> + Printer.tuple print_top ts + | (App _ | Abs _ | Let _ | LetTuple _) as other -> + PPrint.parens (print_top other) + + in print_top diff --git a/src/support/STLCPrinter.mli b/src/support/STLCPrinter.mli new file mode 100644 index 0000000..d22a3fc --- /dev/null +++ b/src/support/STLCPrinter.mli @@ -0,0 +1,2 @@ +val print_ty : STLC.ty -> PPrint.document +val print_term : STLC.term -> PPrint.document diff --git a/src/support/SatConstraint.ml b/src/support/SatConstraint.ml new file mode 100644 index 0000000..e452e7c --- /dev/null +++ b/src/support/SatConstraint.ml @@ -0,0 +1,43 @@ +module Make (T : Utils.Functor) = struct + module Constraint = Constraint.Make(T) + open Constraint + + type t = sat_constraint + and sat_constraint = + | Exist of variable * structure option * sat_constraint + | Conj of sat_constraint list (* [True] is [Conj []] *) + | Eq of variable * variable + | Decode of variable + | False + | Do of sat_constraint T.t + + let rec erase + : type a e. (a, e) Constraint.t -> sat_constraint + = function + | Exist (v, c, s) -> Exist (v, c, erase s) + | Map (c, _) -> erase c + | MapErr (c, _) -> erase c + | Ret _v -> Conj [] + | Err _e -> False + | Conj (_, _) as conj -> + let rec peel + : type a e . 
(a, e) Constraint.t -> sat_constraint list + = function + | Map (c, _) -> peel c + | MapErr (c, _) -> peel c + | Conj (c1, c2) -> peel c1 @ peel c2 + | Err _ -> [False] + | Ret _ -> [] + | Exist _ as c -> [erase c] + | Eq _ as c -> [erase c] + | Decode _ as c -> [erase c] + | Do _ as c -> [erase c] + in + begin match peel conj with + | [c] -> c + | cases -> Conj cases + end + | Eq (v1, v2) -> Eq (v1, v2) + | Decode v -> Decode v + | Do c -> Do (T.map erase c) +end diff --git a/src/support/UntypedLexer.mll b/src/support/UntypedLexer.mll new file mode 100644 index 0000000..e9d9377 --- /dev/null +++ b/src/support/UntypedLexer.mll @@ -0,0 +1,55 @@ +{ + open UntypedParser + + let keyword_table = + Hashtbl.create 17 + + let keywords = [ + "let", LET; + "in", IN; + "lambda", LAMBDA; + ] + + let _ = + List.iter + (fun (kwd, tok) -> Hashtbl.add keyword_table kwd tok) + keywords + + let new_line lexbuf = + Lexing.new_line lexbuf +} + +let identchar = ['a'-'z' 'A'-'Z' '0'-'9' '_'] +let lident = ['a'-'z'] identchar* + +let blank = [' ' '\t']+ +let newline = '\r' | '\n' | "\r\n" + +rule read = parse + | eof { EOF } + | newline { new_line lexbuf; read lexbuf } + | blank { read lexbuf } + | lident as id { try Hashtbl.find keyword_table id + with Not_found -> LIDENT id } + | "->" { ARROW } + | '(' { LPAR } + | ')' { RPAR } + | '*' { STAR } + | ',' { COMMA } + | '=' { EQ } + | ":" { COLON } + | '.' { PERIOD } + | "--" { line_comment lexbuf; read lexbuf } + | _ as c + { failwith + (Printf.sprintf + "Unexpected character during lexing: %c" c) } + +and line_comment = parse +| newline + { new_line lexbuf; () } +| eof + { failwith "Unterminated line comment: \ no newline at end of file."
} + | _ + { line_comment lexbuf } diff --git a/src/support/UntypedParser.mly b/src/support/UntypedParser.mly new file mode 100644 index 0000000..3ad71d0 --- /dev/null +++ b/src/support/UntypedParser.mly @@ -0,0 +1,105 @@ +%{ + open Untyped.Make(Utils.Empty) +%} + +%token <string> LIDENT + +%token EOF + +%token LET "let" +%token IN "in" +%token LAMBDA "lambda" + +%token ARROW "->" +%token LPAR "(" +%token RPAR ")" +%token STAR "*" +%token COMMA "," +%token EQ "=" +%token COLON ":" +%token PERIOD "." + +%type <Untyped.Make(Utils.Empty).term> term_eof + +%start term_eof + +%% + +let term_eof := + | ~ = term ; EOF ; + <> + +(***************** TERMS ***************) + +let term := + | ~ = term_abs ; <> + +let term_abs := + | "lambda" ; xs = list (tevar) ; "." ; t = term_abs ; + { List.fold_right (fun x t -> Abs (x, t)) xs t } + | (x, t1, t2) = letin(tevar) ; + { Let (x, t1, t2) } + | (xs, t1, t2) = letin(tuple(tevar)) ; + { LetTuple (xs, t1, t2) } + | ~ = term_app ; <> + +let term_app := + | t1 = term_app ; t2 = term_atom ; + { App (t1, t2) } + | ~ = term_atom ; <> + +let term_atom := + | x = tevar ; + { Var x } + | ts = tuple (term) ; + { Tuple ts } + | "(" ; t = term ; ":" ; ty = typ ; ")" ; + { Annot (t, ty) } + | "(" ; ~ = term ; ")" ; <> + +let tevar := + | ~ = LIDENT ; <> + +let letin (X) := + | LET ; x = X ; EQ ; + t1 = term ; IN ; + t2 = term_abs ; + { (x, t1, t2) } + +let tuple (X) := + | "(" ; ")" ; + { [] } + (* note: the rule below enforces that one-element lists always + end with a trailing comma *) + | "(" ; x = X ; COMMA ; xs = item_sequence(X, COMMA) ; ")"; + { x :: xs } + +(* item sequence with optional trailing separator *) +let item_sequence(X, Sep) := + | + { [] } + | x = X ; + { [x] } + | x = X ; () = Sep ; xs = item_sequence(X, Sep) ; + { x :: xs } + +(*************** TYPES ***************) + +let typ := + | ~ = typ_arrow ; <> + +let typ_arrow := + | ty1 = typ_atom ; "->" ; ty2 = typ_arrow ; + { STLC.Constr (Structure.Arrow (ty1, ty2)) } + | ~ = typ_atom ; <> + +let typ_atom := + | x = 
tyvar ; + { STLC.Constr (Structure.Var x) } + | "(" ; tys = separated_list ("*", typ) ; ")" ; + { STLC.Constr (Structure.Prod tys) } + | "(" ; ~ = typ ; ")" ; <> + +let tyvar := + | ~ = LIDENT ; <> + diff --git a/src/support/UntypedPrinter.ml b/src/support/UntypedPrinter.ml new file mode 100644 index 0000000..dc54194 --- /dev/null +++ b/src/support/UntypedPrinter.ml @@ -0,0 +1,51 @@ +module Make(T : Utils.Functor) = struct + open Untyped.Make(T) + + let print_term : term -> PPrint.document = + let rec print_top t = print_left_open t + + and print_left_open t = + let print_self = print_left_open + and print_next = print_app in + PPrint.group @@ match t with + | Abs (x, t) -> + Printer.lambda + ~input:(Var.print x) + ~body:(print_self t) + | Let (x, t, u) -> + Printer.let_ + ~var:(Var.print x) + ~def:(print_top t) + ~body:(print_self u) + | LetTuple (xs, t, u) -> + Printer.let_ + ~var:(Printer.tuple Var.print xs) + ~def:(print_top t) + ~body:(print_self u) + | other -> print_next other + + and print_app t = + let print_self = print_app + and print_next = print_atom in + PPrint.group @@ match t with + | App (t, u) -> + Printer.app (print_self t) (print_next u) + | other -> print_next other + + and print_atom t = + PPrint.group @@ match t with + | Var x -> Var.print x + | Annot (t, ty) -> + Printer.annot + (print_top t) + (STLCPrinter.print_ty ty) + | Tuple ts -> + Printer.tuple print_top ts + | (App _ | Abs _ | Let _ | LetTuple _) as other -> + PPrint.parens (print_top other) + | Do _p -> + Printer.do_ + + in print_top + +end diff --git a/src/support/UntypedPrinter.mli b/src/support/UntypedPrinter.mli new file mode 100644 index 0000000..0220006 --- /dev/null +++ b/src/support/UntypedPrinter.mli @@ -0,0 +1,4 @@ +module Make(T : Utils.Functor) : sig + module Untyped := Untyped.Make(T) + val print_term : Untyped.term -> PPrint.document +end diff --git a/src/support/dune b/src/support/dune new file mode 100644 index 0000000..5a81a3a --- /dev/null +++ b/src/support/dune @@ 
-0,0 +1,5 @@ +(ocamllex UntypedLexer) + +(menhir + (modules UntypedParser) + (flags --explain)) diff --git a/tests.t/curry.test b/tests.t/curry.test new file mode 100644 index 0000000..efa06d5 --- /dev/null +++ b/tests.t/curry.test @@ -0,0 +1 @@ +lambda f. lambda x y. f (x, y) diff --git a/tests.t/error.test b/tests.t/error.test new file mode 100644 index 0000000..637fd36 --- /dev/null +++ b/tests.t/error.test @@ -0,0 +1 @@ +(lambda x. (x : int)) (lambda y. y) diff --git a/tests.t/id_int.test b/tests.t/id_int.test new file mode 100644 index 0000000..98214dd --- /dev/null +++ b/tests.t/id_int.test @@ -0,0 +1 @@ +lambda x. (x : int) diff --git a/tests.t/id_poly.test b/tests.t/id_poly.test new file mode 100644 index 0000000..01845fb --- /dev/null +++ b/tests.t/id_poly.test @@ -0,0 +1 @@ +lambda x. x \ No newline at end of file diff --git a/tests.t/run.t b/tests.t/run.t new file mode 100644 index 0000000..3f9b00c --- /dev/null +++ b/tests.t/run.t @@ -0,0 +1,396 @@ +# TL;DR + +To run the tests, run + + dune runtest + +from the root of the project directory. If this outputs +nothing, the testsuite passes. If this outputs a diff, it +means that there is a mismatch between the recorded/reference +output and the behavior of your program. + +To *promote* the tests outputs (that is, to modify the reference +output to match the current behavior of your program), run + + dune runtest + dune promote + +When you submit your project, please check that `dune runtest` does +not produce a diff -- the recorded output should match your +program. If some outputs are wrong / not what you would expect, please +explain this in the present file. + + +# Intro + +This file is a "dune cram test" as explained at +> https://dune.readthedocs.io/en/stable/tests.html#cram-tests + +The idea is to write 2-indented command lines prefixed by +a dollar sign. The tool will run the command and check that +the output corresponds to the output recorded in the file. 
+ + $ echo example + example + +To run the tests, just run `dune runtest` at the root of the +project. This will show you a diff between the observed +output and the recorded output of the test -- we consider +that the test 'passes' if the diff is empty. +In particular, if you run `dune runtest` and you see no +output, this is good! It means there was no change in the +test output. + +If you think that the new output is better than the previous +output, run `dune promote`; dune will rewrite the recorded +outputs to match the observed outputs. (You can also modify +outputs by hand but this is more cumbersome.) + +It is totally okay to have some test outputs recorded in +your repository that are known to be broken -- because there +is a bug, or some feature is not documented yet. Feel free +to use the free-form comments in run.t to mention explicitly +that the output is broken. (But then please remember, as the +output changes in the future, to also update your comments.) + + +# The tests + +The tests below use the `minihell` program defined in +../bin/minihell.ml, called on the *.test files stored in the +present directory. If you want to add new tests, just add +new test files and then new commands below to exercise them. + +`minihell` takes untyped programs as input and will +type-check and elaborate them. It can show many things +depending on the input flags passed. By default we ask +`minihell` to repeat the source file (to make the recorded +output here more pleasant to read) and to show the generated +constraint. It will also show the result type and the +elaborated term. + + $ FLAGS="--show-source --show-constraint" + +Remark: You can call minihell from the command-line yourself +by using either +> dune exec bin/minihell.exe -- +or +> dune exec minihell -- +(The latter short form, used in the tests below, is available thanks +to the bin/dune content.) + + +## Simple tests + +`id_poly` is just the polymorphic identity. 
+ + $ minihell $FLAGS id_poly.test + Input term: + lambda x. x + + Generated constraint: + ∃?final_type. + (∃?x ?wt (?warr = ?x -> ?wt). ?final_type = ?warr ∧ decode ?x ∧ ?wt = ?x) + ∧ decode ?final_type + + Inferred type: + α -> α + + Elaborated term: + lambda (x : α). x + + +`id_int` is the monomorphic identity on the type `int`. Note +that we have not implemented support for a built-in `int` +type, this is just an abstract/rigid type variable: `Constr +(Var ...)` at type `STLC.ty`. + + $ minihell $FLAGS id_int.test + Input term: + lambda x. (x : int) + + Generated constraint: + ∃?final_type. + (∃?x ?wt (?warr = ?x -> ?wt). + ?final_type = ?warr + ∧ decode ?x + ∧ (∃(?int = int). ?int = ?x ∧ ?int = ?wt)) + ∧ decode ?final_type + + Inferred type: + int -> int + + Elaborated term: + lambda (x : int). x + + +## Logging the constraint-solving process + +You can ask `minihell` to show how the constraint evolves as +the solver progresses and accumulates more information on +the inference variables. + + $ minihell $FLAGS --log-solver id_int.test + Input term: + lambda x. (x : int) + + Generated constraint: + ∃?final_type. + (∃?x ?wt (?warr = ?x -> ?wt). + ?final_type = ?warr + ∧ decode ?x + ∧ (∃(?int = int). ?int = ?x ∧ ?int = ?wt)) + ∧ decode ?final_type + + Constraint solving log: + ∃?final_type. + decode ?final_type + ∧ (∃?x ?wt (?warr = ?x -> ?wt). + (∃(?int = int). ?int = ?wt ∧ ?int = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?final_type. + decode ?final_type + ∧ (∃?x ?wt (?warr = ?x -> ?wt). + (∃(?int = int). ?int = ?wt ∧ ?int = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?final_type. + decode ?final_type + ∧ (∃?wt (?warr = ?x -> ?wt). + (∃(?int = int). ?int = ?wt ∧ ?int = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?wt ?final_type. + decode ?final_type + ∧ (∃(?warr = ?x -> ?wt). + (∃(?int = int). ?int = ?wt ∧ ?int = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?wt (?warr = ?x -> ?wt) ?final_type. + decode ?final_type + ∧ (∃(?int = int). 
?int = ?wt ∧ ?int = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr + ∃?x ?wt (?final_type = ?x -> ?wt). + decode ?final_type ∧ (∃(?int = int). ?int = ?wt ∧ ?int = ?x) ∧ decode ?x + ∃?x ?wt (?int = int) (?final_type = ?x -> ?wt). + decode ?final_type ∧ ?int = ?wt ∧ ?int = ?x ∧ decode ?x + ∃?wt (?int = int) (?final_type = ?int -> ?wt). + decode ?final_type ∧ ?int = ?wt ∧ decode ?int + ∃(?int = int) (?final_type = ?int -> ?int). + decode ?final_type ∧ decode ?int + + Inferred type: + int -> int + + Elaborated term: + lambda (x : int). x + + +## An erroneous program + + $ minihell $FLAGS error.test + Input term: + (lambda x. (x : int)) (lambda y. y) + + Generated constraint: + ∃?final_type. + (∃?wu (?wt = ?wu -> ?final_type). + (∃?x ?wt/1 (?warr = ?x -> ?wt/1). + ?wt = ?warr ∧ decode ?x ∧ (∃(?int = int). ?int = ?x ∧ ?int = ?wt/1)) + ∧ (∃?y ?wt/2 (?warr/1 = ?y -> ?wt/2). + ?wu = ?warr/1 ∧ decode ?y ∧ ?wt/2 = ?y)) + ∧ decode ?final_type + + Error: + int + incompatible with + β -> α + + +## Examples with products + + $ minihell $FLAGS curry.test + Input term: + lambda f. lambda x. lambda y. f (x, y) + + Generated constraint: + ∃?final_type. + (∃?f ?wt (?warr = ?f -> ?wt). + ?final_type = ?warr + ∧ decode ?f + ∧ (∃?x ?wt/1 (?warr/1 = ?x -> ?wt/1). + ?wt = ?warr/1 + ∧ decode ?x + ∧ (∃?y ?wt/2 (?warr/2 = ?y -> ?wt/2). + ?wt/1 = ?warr/2 + ∧ decode ?y + ∧ (∃?wu (?wt/3 = ?wu -> ?wt/2). + ?wt/3 = ?f + ∧ (∃?w1. + ?w1 = ?x + ∧ (∃?w2. ?w2 = ?y ∧ (∃(?wprod = {?w1 * ?w2}). ?wu = ?wprod))))))) + ∧ decode ?final_type + + Inferred type: + ({γ * β} -> α) -> γ -> β -> α + + Elaborated term: + lambda (f : {γ * β} -> α). lambda (x : γ). lambda (y : β). f (x, y) + + + $ minihell $FLAGS uncurry.test + Input term: + lambda f. lambda p. let (x, y) = p in f x y + + Generated constraint: + ∃?final_type. + (∃?f ?wt (?warr = ?f -> ?wt). + ?final_type = ?warr + ∧ decode ?f + ∧ (∃?p ?wt/1 (?warr/1 = ?p -> ?wt/1). + ?wt = ?warr/1 + ∧ decode ?p + ∧ (∃?x ?y (?wt/2 = {?x * ?y}). 
+ decode ?x + ∧ decode ?y + ∧ ?wt/2 = ?p + ∧ (∃?wu (?wt/3 = ?wu -> ?wt/1). + (∃?wu/1 (?wt/4 = ?wu/1 -> ?wt/3). ?wt/4 = ?f ∧ ?wu/1 = ?x) + ∧ ?wu = ?y)))) + ∧ decode ?final_type + + Inferred type: + (β -> γ -> α) -> {β * γ} -> α + + Elaborated term: + lambda + (f : β -> γ -> α). + lambda (p : {β * γ}). let ((x : β), (y : γ)) = p in f x y + +## Cyclic types + +Unification can sometimes create cyclic types. We decide to reject +these situations with an error. (We could also accept those as they +preserve type-safety, but they have the issue, just like the +OCaml -rectypes option, that they allow to write somewhat-nonsensical +program, and our random term generator will be very good at finding +a lot of those.) + + $ minihell $FLAGS --log-solver selfapp.test + Input term: + lambda x. x x + + Generated constraint: + ∃?final_type. + (∃?x ?wt (?warr = ?x -> ?wt). + ?final_type = ?warr + ∧ decode ?x + ∧ (∃?wu (?wt/1 = ?wu -> ?wt). ?wt/1 = ?x ∧ ?wu = ?x)) + ∧ decode ?final_type + + Constraint solving log: + ∃?final_type. + decode ?final_type + ∧ (∃?x ?wt (?warr = ?x -> ?wt). + (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?final_type. + decode ?final_type + ∧ (∃?x ?wt (?warr = ?x -> ?wt). + (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?final_type. + decode ?final_type + ∧ (∃?wt (?warr = ?x -> ?wt). + (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?wt ?final_type. + decode ?final_type + ∧ (∃(?warr = ?x -> ?wt). + (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr) + ∃?x ?wt (?warr = ?x -> ?wt) ?final_type. + decode ?final_type + ∧ (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∧ ?final_type = ?warr + ∃?x ?wt (?final_type = ?x -> ?wt). + decode ?final_type + ∧ (∃?wu (?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∃?x ?wu ?wt (?final_type = ?x -> ?wt). 
+ decode ?final_type + ∧ (∃(?wt/1 = ?wu -> ?wt). ?wu = ?x ∧ ?wt/1 = ?x) + ∧ decode ?x + ∃?x ?wu ?wt (?wt/1 = ?wu -> ?wt) ?wt (?final_type = ?x -> ?wt). + decode ?final_type ∧ ?wu = ?x ∧ ?wt/1 = ?x ∧ decode ?x + ∃?wu ?wt (?wt/1 = ?wu -> ?wt) ?wt (?final_type = ?wt/1 -> ?wt). + decode ?final_type ∧ ⊥ ∧ decode ?wt/1 + + Error: + cycle on constraint variable + ?wu + +## Generator tests + +This gives example outputs for my implementation. It is completely +fine if your own implementation produces different (sensible) results. + +There are not many programs with depth 3, 4 and 5. + + $ minigen --exhaustive --depth 3 --count 100 + lambda (x/4 : α). x/4 + + $ minigen --exhaustive --depth 4 --count 100 + lambda (v/14 : β/1). lambda (u/19 : α/1). u/19 + + lambda (v/14 : α/1). lambda (u/19 : γ/1). v/14 + +An example of random sampling output at higher depth. + + $ minigen --seed 42 --depth 6 --count 10 + lambda (z/90 : β/21). (z/90, lambda (u/90 : α/21). u/90) + + lambda (v/3d4 : β/dc). (lambda (w/3d4 : β/dc). v/3d4) v/3d4 + + lambda + (x/48b : {δ/110 * γ/110}). + let + ((y/48b : δ/110), (z/48b : γ/110)) + = + x/48b + in lambda (u/48b : β/110). u/48b + + lambda + (w/568 : γ/144). + lambda + (x/569 : {β/144 * α/144}). + let ((y/569 : β/144), (z/569 : α/144)) = x/569 in z/569 + + lambda + (y/58e : α/14c). + let (z/58e : δ/14b -> δ/14b) = lambda (u/58e : δ/14b). u/58e in y/58e + + (lambda (u/5f3 : γ/165). u/5f3, lambda (v/5f3 : β/165). v/5f3) + + (lambda (y/6b2 : α/187). y/6b2, lambda (z/6b2 : δ/186). z/6b2) + + lambda + (u/722 : {δ/19c * γ/19c}). + let + ((v/722 : δ/19c), (w/722 : γ/19c)) + = + u/722 + in lambda (x/723 : β/19c). v/722 + + lambda + (x/7fd : β/1c0). + lambda (y/7fd : α/1c0). let (z/7fd : α/1c0) = y/7fd in x/7fd + + lambda (x/b58 : δ/283). (lambda (y/b58 : γ/283). y/b58, x/b58) diff --git a/tests.t/selfapp.test b/tests.t/selfapp.test new file mode 100644 index 0000000..c8e3e1d --- /dev/null +++ b/tests.t/selfapp.test @@ -0,0 +1,2 @@ +lambda x. 
x x + diff --git a/tests.t/uncurry.test b/tests.t/uncurry.test new file mode 100644 index 0000000..20f2950 --- /dev/null +++ b/tests.t/uncurry.test @@ -0,0 +1,3 @@ +lambda f. lambda p. + let (x, y) = p in + f x y
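
The "Cyclic types" section of `tests.t/run.t` above relies on the solver performing an occurs-check during unification to reject programs such as `lambda x. x x`; the comment in `src/support/Decode.ml` likewise assumes the check guarantees that decoding never meets a cycle. As a standalone illustration of the idea (a minimal, hypothetical sketch over a toy type syntax, not the project's `Unif` module), the check is a recursive search for a variable inside the structure it is about to be identified with:

```ocaml
(* Minimal occurs-check sketch over a toy first-order type syntax.
   Illustrative only: the names [ty], [Var], [Arrow] and [occurs] are
   assumptions of this sketch, not the project's actual definitions. *)

type ty =
  | Var of int          (* inference variable, identified by an int *)
  | Arrow of ty * ty    (* t1 -> t2 *)

(* [occurs v t] holds when variable [v] appears somewhere in [t];
   solving [?v = t] in that case would create a cyclic type, as in the
   self-application example rejected above. *)
let rec occurs (v : int) (t : ty) : bool =
  match t with
  | Var w -> Int.equal v w
  | Arrow (t1, t2) -> occurs v t1 || occurs v t2

let () =
  (* ?0 = ?0 -> ?1 would be cyclic: the check fires. *)
  assert (occurs 0 (Arrow (Var 0, Var 1)));
  (* ?2 = ?0 -> ?1 is acyclic: unification may proceed. *)
  assert (not (occurs 2 (Arrow (Var 0, Var 1))))
```

In the actual solver the analogous check must additionally follow variable representatives through the unification environment (aliases introduced by earlier equations), which is why the error in the `selfapp.test` transcript is reported on a constraint variable rather than on a syntactic type.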