Auto-generating Idris2 code based on Dependent State Automata (DSA) diagrams.
If you have followed the Type Driven Development with Idris book, you may have
spotted some patterns in chapter 14 in terms of modelling Dependent State
Automata in Idris. This project attempts to auto-generate the Idris2 source-code
for these based on a DOT (.gv) file.
Yes! (albeit with some limitations)
Generating Idris source code from .gv files is almost done! The
following example (from the TDD book, Ch 14) can be modelled in DOT:
digraph ATM {
Ready -> CardInserted [label="InsertCard"];
CardInserted -> Ready [label="EjectCard"];
CardInserted -> CardInserted [label="GetPIN!(PIN); CheckPIN?(Incorrect)"];
CardInserted -> Session [label="CheckPIN?(Correct)"];
Session -> Session [label="Dispense"];
Session -> Ready [label="EjectCard"]; // not entirely correct
}Which means you can generate a graph of the program and its states:
HOWEVER you can then also use the DOT-code/-model to generate Idris
through the toIdris2 function, resulting in the following output:
$ idris2 -p contrib -p dot-parse DSAGen.idr --exec atmTest -- SUCCESS!!! --
data ATMState
= Session
| CardInserted
| Ready
data CheckPINRes
= Incorrect
| Correct
data ATMCmd : (resTy : Type) -> ATMState -> (resTy -> ATMState) -> Type where
InsertCard : ATMCmd () (Ready) (const (CardInserted))
GetPIN : ATMCmd () (CardInserted) (const (CardInserted))
Dispense : ATMCmd () (Session) (const (Session))
CheckPIN : ATMCmd (CheckPINRes) (CardInserted) (\case (Incorrect) => (CardInserted); (Correct) => (Session))
EjectCard : ATMCmd () anyState (const (Ready))
Pure : (res : resTy) -> ATMCmd resTy (state_fn res) state_fn
(>>=) : ATMCmd resTy state_1 state2_fn
-> (contn : (res : resTy) -> ATMCmd cResTy (state2_fn res) state3_fn)
-> ATMCmd cResTy state_1 state3_fnIt is not the prettiest, and there is some stuff to fill in, but after that it type-checks and you would be able to program parts of Ch 14 using the result!
In a nutshell: DOT/GraphViz diagrams are annotated with special syntax (using
DOT's built-in labels), the result is then parsed and turned into a Domain
Specific Language (DSL) which describes Dependent State Automata (DSA). From
this DSA representation, the Idris2 code is then generated.
The DOT edge labels can use the following syntax to describe what the edge/command does (if anything):
:(<val>)-- specifies a value which the command takes as an argument.?(<case_val>)-- specifies a value which is a case in a dependent result (e.g. theCheckPINresult in the ATM) and indicates that the edge only connects these states when this case is hit.!(<val>)-- specifies a value which the command produces as a result.
Currently, the following types of values are supported:
- Idris names, e.g.
sn, - Integer literals,
- Data constructors (including ones with arguments!),
- Addition of two sub-values,
- Tuples of values.
The complete process for code generation (cg) is as follows:
- The special DOT/GraphViz file is parsed by the
dot-parsepackage. - The AST is then converted to a DSA
DSLv2, in part by the parsers found insrc/DSAGen/Parser/. This DSL stores:- Stores the DSA's name
dsaName, - extracts the states,
- extracts all the edges,
- filters and combines the Universal Edges from all the edges,
- splits the remaining edges using the
Splitdata-structure and theIsPlainEdgeproof, to separate the plain from the syntax-dependent edges.
- Stores the DSA's name
- The states get cg-ed to a top-level type declaration with constructors for each state name.
- The results of dependent edges get cg-ed to top-level type declarations
with constructors.
- As part of this process, the non-plain-non-dependent edges get
splitoff as well. This allows for easier cg since we don't have to worry about being handed an edge which isn't dependent. - All the dependent edges are then accumulated based on their command/edge name. This is to gather all the possible results/cases in one data-structure for easier cg.
- For each of the types of dependent edges (i.e. dep, take-dep, dep-prod, and take-dep-prod), a data declaration with constructors is then generated (note: only dep-edges at the moment; see limitations).
- As part of this process, the non-plain-non-dependent edges get
- The DSA command (cmd) data declaration gets cg-ed.
- The commands/edges get cg-ed
- First the plain edges are cg-ed.
- Then the non-plain edges are paired with their proofs of non-plainness,
- and are then
splitagain along a proof for non-plain-non-dependentness. - From this split, the non-plain-non-dependent edges get cg-ed,
- And finally, the non-plain-dependent edges get cg-ed
- Only simple dependent edges are currently supported; see limitations.
- Universal Edges get cg-ed.
- These are each a plain command producing no result, and going from
anyStateback to a single state.
- These are each a plain command producing no result, and going from
- Finally, the
Pureand(>>=)(read: 'bind') commands get cg-ed.- These have exactly the same structure for all DSAs, with the exception of the name of the command-type/DSA they belong to.
There are currently the following limitations to this project:
- The Idris2 code is generated through magic strings rather than elaborator reflection, meaning that there are no guarantees that the resulting Idris is valid. I have taken extreme care in crafting these magic strings, but there may be issues (please let me know if so!).
- As an indirect consequence of the previous limitation, the top-level
dependent state functions which are generated for dependent edges (and
their friends) are not aware of whether they should be taking any
additional arguments (other than the current state). This was not a
problem when generating them as inline lambda-
caseexpressions, since implicit arguments would then either be in scope elsewhere in the constructor, or Idris would automatically generalise them. However, that is not the case for top-level functions; an unknown argument is an error, not a compiler-inserted implicit argument (which is fair enough). However, storing the necessary information in the state and/or results will result in the function being generated correctly. Doing proper top-level function generation -- detecting the necessary arguments etc. -- would probably be quite annoying/difficult, and so this has not been implemented. - Edge names are assumed to be correct and unique (except for dependent edges, where they are expected to match). It is the programmer's/diagram-drawer's responsibility to make sure this is true.
- There is no system/syntax for specifying type information, meaning that
anything put at the type-level (e.g. in the state of the DSA) gets
code-generated as holes. If you know the type, this can be solved by
passing the output through a
sed -e 's/?<var>_hole/<type>/g'step, but this is not ideal. - The program does not generate the type of the result of a produce command,
meaning these need to be filled in manually (e.g.
PINin the ATM example). This is again not a huge problem, since these tend to be type-aliases or similar and so can be quickly defined. But like with the missing type information, it is not ideal. - Only simple dependent edges are supported. The rest will hit holes and exit with an error saying so. This is due to the general limitation described below.
After some experimentation and playing around with more complex DSAs, there seems to be a limitation to this approach in general, in that the diagrams for more complex state stop being nice and easily human-readable. One could argue that we do not need to display everything at once, and whilst that is true, partial display/hiding is unfortunately not a feature supported by DOT.
This work is licensed under GPL-2.0.