Post

From AST definition to monadic DSL

A nicer way to write code generators in Template Haskell

From AST definition to monadic DSL

Introduction

After working on 3 small-to-medium-sized projects that use Template Haskell (packed, type-machine, and reflection), I’ve come to realise that, while the idea of generating Haskell code in Haskell is fun, actually doing it isn’t.

Aside from the Q monad and the reify function, the main feature of Template Haskell’s API is the collection of AST types it provides. Basically, (almost) any code that is valid Haskell can be represented as an AST, including top-level declarations(Dec), expressions (Exp), and even pragmas, bang patterns and type annotations.

These AST types are defined in the most idiomatic-Haskell way: using algebraic data types (ADT):

1
data Exp = VarE Name | ConE Name | LitE Lit | AppE Exp Exp | ...

However, I find meta-programs (i.e. functions that build a Haskell AST) quite verbose. Let’s generate a projection function for tuples1:

1
2
3
4
5
genProj :: TupleSize -> FieldIndex -> Exp
genProj size idx = LamE 
    [TupP (replicate i WildP ++ [VarP f] ++ replicate (n - i - 1) WildP)] 
    (VarE f) 
  where f = mkName "value"

This example showcases 2 of the 5 main painful things we have to deal with when writing meta-programs:

  • Having to manage names (although here we just have f)
  • We have to build patterns and expressions separately: names cannot be used as patterns and expressions directly.
    • E.g., we cannot write let name = mkName "f" in LamE [name] name
  • In pure context (i.e. non Q computations), we can easily end up with clashing names (see mkName vs. newName)
  • Deconstructing values is a pain: we have to know in advance the number of fields of the target constructor, and build a list of Pats and insert VarP name at the correct position. This is even more frustrating when we want to access only a specific set of fields.
  • Refactoring pure computations to/from Q operations can be a pain as this usually means changing all the constructors used to build the AST (e.g. lamE <-> LamE, varP <-> VarP).

There are many other annoying points, but this is not a hate post. All this friction made me realise something: it would be so much better if we could build these ASTs with lenses and a state monad. With the wide array of operators provided by the lens library, it should be possible to make ourselves a nice little wrapper library.

This is what I have been working during the last few weeks. I came up with template-haskell-natural: a library that provides a monadic DSL for Template Haskell. It allows writing meta-programs that follow the same flow as the program to generate. Worded differently, it allows writing meta-programs naturally:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import Language.Haskell.TH.Natural.Syntax.Builder as B

genProj :: TupleSize -> FieldIndex -> Q Exp
genProj size idx idx = genExpr $ newExpr $ B.do
  tup <- arg
  value <- getTupleField size idx tup
  returns value

-- | \(a, b, ..) -> (.., b, a)
genReverse :: TupleSize -> ExpQ
genReverse n = genExpr $ newExpr $ b.do
    tup <- arg
    tupleFields <- forM [0 .. n - 1] $ \i -> getTupleField n i tup
    returns (TupE (Just <$> reverse tupFields))

The library provides utilities to build ASTs for:

  • Typeclasses and instances
  • Type Signatures
  • Datatypes and constructors
  • Expressions, binding variables, and pattern matching

The goal of this blog post is to present the ideas behind the library, as well as show some examples of meta-programs built using template-haskell-natural.

Foundations

Lenses and State Monad

Basically (and I apologise for the gross simplification), lenses are operators that allow accessing and updating the fields of an object. They are most commonly generated using Template Haskell. Let’s take the example of a simple record object and a function that increments one of its fields:

1
2
3
4
data MyData = MyData { _s :: String, _i :: Int, _c :: Char }

incI :: MyData -> MyData
incI d = d{ _i = _i d + 1 }

This is one short way to write incI. Using lenses, however, we can make this even more concise:

1
2
3
4
makeLenses 'MyData

incI :: MyData -> MyData
incI d = (d & i) +~ 1

While we can argue that the operators & and +~ require some getting used to, it definitely looks better.

The good news is that it gets even better when we interact with MyData through a State monad:

1
2
3
4
5
6
7
8
incI' :: State MyData ()
incI' = i += 1

mapS :: (Char -> Char) -> State MyData ()
mapS f = s %= (fmap f)

consCS :: State MyData String
consCS = liftA2 (:) (view c) (view s)

Lenses for Template Haskell

The imperative interface provided by lenses and the State monad could help us avoid some of the friction listed in the intro when building ASTs. We could imagine having utility functions that allow managing names, extract a given set of fields, build patterns, etc.

However, there are two issues that we need to consider:

  • Some operations, e.g. binding arguments, require ensuring that the Exp that is being built is a LamE. We need to be able to work with selected constructors (e.g. ClassD, FunD, LamE, etc.) type-safely.
  • The fields of the ADTs’ constructors are unnamed. So we cannot use the utilities from the lens library to derive them easily.

Extracting constructors

To solve the first problem, I came up with a small library extract-cons, which allow extracting all the constructors of a given type constructor into standalone datatypes:

1
2
3
4
5
6
7
8
9
import Data.Constructor.Extract.TH

extractConstructorsOf 'Exp

-- ==> Generates the following ADTs:
data VarE = MkVarE Name
data AppE = MkAppE Exp Exp
data LamE = MkLamE [Pat] Exp
-- etc.  

It also provides a way to easily to convert the generated data types back to the source datatype.

Naming ADT fields

The lens-adt package solves the second issue. Using the position function from generic-lens, we can write a simple TH utility function to generate a lens for each field of a datatype:

1
2
3
makeADTLenses 'VarE ["name"]
makeADTLenses 'AppE ["callee", "arg"]
makeADTLenses 'LamE ["pats", "body"]

Some field names can clash, so instead of defining lens functions, it generates instances of typeclasses to allow overloading:

1
2
3
4
5
6
7
8
9
makeADTLenses 'VarE ["name"]

-- ===> What actually gets generated 

class HasName s t where
  name :: Lens' s t

instance HasName VarE Name where
  name = position @1

All these lenses are generates and exposed by the template-haskell-lenses package. We now have everything we need to build ASTs imperatively:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
genExp :: StateT Q LamE () -> Q Exp

bindArg :: StateT Q LamE Exp
bingArg = do
  argName <- lift $ newName "arg"
  pats |>= VarP argName
  return (VarE argName)

returns :: Exp -> StateT Q LamE ()
returns res = body .= res

genId :: StateT Q LamE () 
genId = do
  x <- bindArg
  returns x

Note: This is not valid template-haskell-natural code, but has a similar feel to it.

Ensuring correctness of the built AST with the Builder Monad

We could just use the lenses from template-haskell-lenses along with a StateT Q monad, and build the AST using the operators presented earlier. But we can go even further: the template-haskell-natural provides another layer of safety to make sure that the produces AST is somewhat valid. For example, we need to make sure that the LamE has a body, that the last statement in a DoE is not a bind, etc.

While it is possible to use extracted constructors as a state most of the time, that’s not the case for e.g. LamE because the body will not be immediately set by the user. In such cases, we thus need to define an intermediary state datatype.

Because of this, we need something on the type-level to guarantee that our AST-building computation is ready to produce a valid AST. One way to do this is to use a graded monad:

1
2
3
4
5
6
7
8
data BuilderStep = Empty | Ready

newtype BaseBuilder m st prev curr a =
  BaseBuilder { unBB :: StateT m st a }
-- * skipping the redefinition of (>>=), (>>), etc. 

type Builder = BaseBuilder Q
type ConstBuilder st = Builder st Ready Ready

The Builder type alias is for computations for which we need to keep track of the progress (e.g. when all the fields are set). The ConstBuilder alias is for computations for which progress tracking is not necessary.

These definitions were so abstract that I could use them for all AST-building computations defined in template-haskell-natural 2. For illustration purposes, let’s consider the two main functions to build lambda expressions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Provided by the library
type ExprBuilder = Builder LamE

-- | Adds a pattern to the lambda and returns the bound name wrapped in `VarE`
arg :: ExprBuilder step step Exp
arg = do
    nextArgName <- liftB $ TH.newName "arg"
    argNames |>= nextArgName
    return $ TH.VarE nextArgName

-- | Sets the given expression as te 'body' of the lambda
returns :: GenExp e => e -> ExprBuilder step Ready ()
returns q = do
    expr <- liftB $ genExpr q
    impure (returnedExp ?= expr)

-- | The function to produce the final `Exp` AST 
newExpr :: ExprBuilder step Ready () -> Q Exp

The type parameters of the first function, arg, show that calling it does not alter the progress of the computation. On the other hand, the returns function sets the result expression of the lambda, making it ready to be reified.

Using the type-level arguments, we can guarantee at compile time that the AST to build is indeed ready and not missing any mandatory information.

GenX typeclasses

Notice how returns does not take an Exp, but any value whose type is an instance of the GenExp typeclass. With this abstraction, the programmer can actually pass anything that can be turned into an Exp, so both plain Exp values and quasi-quotations.

Thus, it is possible to do the following:

1
2
3
4
5
6
_ = B.do
  returns [|Nothing|]
  -- or
  returns (conE 'Nothing)
  -- or
  returns (ConE 'Nothing)

Thanks to this, we don’t have to worry about whether we can use pure constructors or quasi quotations to build the AST.

template-haskell-natural comes with variants of the GenExp typeclass for Decs, Types and Pats.

Examples

To test that template-haskell-natural is actually usable, I spent some time reimplementing some libraries that generate Haskell code using Template Haskell.

For the following snippets, we use the QualifiedDo language extension to be able to use the do-notation with our overloaded monadic operators.

Lambda Expressions: Reimplement tuple-th

tuple-th is a small library that provides TH functions to manipulate functions. Internally, it relies on a thin layer of abstraction around TH, allowing the functions to be reasonably small. I’ll leave you check out the original code. If the library were to be rewritten using template-haskell-natural, here is what it would look like.

1
2
3
import Language.Haskell.TH.Natural.Syntax.Builder as B
import Language.Haskell.TH.Natural.Syntax.Case
import Language.Haskell.TH.Natural.Syntax.Expr.Simple
  • mapTuple
    1
    2
    3
    4
    5
    6
    7
    
    -- (a -> b) -> (a, ..) -> (b, ..)
    mapTuple :: TupleSize -> ExpQ
    mapTuple n = genExpr $ newExpr $ B.do
      f <- arg
      tupFields <- tupleFields n =<< arg
      let transformedTupleFields = (f `AppE`) <$> tupFields
      returns $ tupE (pure <$> transformedTupleFields)
    
  • deleteAtTuple
    1
    2
    3
    4
    5
    6
    7
    
    -- Deletes the i-th element of the tuple.
    deleteAtTuple :: TupleSize -> TupleFieldIndex -> ExpQ
    deleteAtTuple n idx = genExpr $ newExpr $ B.do
      tup <- arg
      tupFields <- getTupleFields n tup
      let filteredFields = uncurry (++) $ second (drop 1) $ splitAt idx tupFields
      returns (tupE (pure <$> filteredFields))
    
  • safeTupleFromList
    1
    2
    3
    4
    5
    6
    7
    
    -- [a] -> Maybe (a, ..)
    safeTupleFromList :: TupleSize -> ExpQ
    safeTupleFromList n = genExpr $ newExpr $ B.do
      list <- arg
      returns $ case_ list $ B.do
          matchList n $ \listItems -> [|Just $(tupE $ pure <$> listItems)|]
          matchWild [|Nothing|]
    
  • catTuples
    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    -- (a1, ..) -> (b1, ..) -> (a1, .., b1, ..)
    catTuples :: TupleSize -> TupleSize -> ExpQ
    catTuples n1 n2 = genExpr $ newExpr $ B.do
      leftTup <- arg
      rightTup <- arg
      tup1 <- tupleFields n1 leftTup
      tup2 <- tupleFields n2 rightTup
      let fields = tup1 ++ tup2
      returns $ tupE (pure <$> fields)
    

We end up with mostly clear and reasonably verbose code, without having to define a wrapper layer.

Datatypes: Reimplement extract-cons

The extract-cons library presented earlier is a small library that generates standalone datatypes from data constructors (see the Extracting constructors section). If we were to reimplement the library’s core function using template-haskell-natural, we would get the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- DataAndCon is an internal type from `extract-cons` containing info about the constructor to extract
genDataDec :: DataAndCon -> ExtractOptions -> DecQ
genDataDec d opts = genDec $ newData dataName $ B.do
  -- Add each type argument in the constructor to the data 
  forM_ (conTypeArgNames d) $ \tyArg -> addTypeVar (MkVarT tyArg)

  -- Add a 'deriving' clause for the typeclasses specified in the option 
  addDeriving `mapM_` deriveClasses opts

  -- Add one constructor
  addCon $ newCon conName $ B.do
      -- Add each argument of the original constructor to the new one
      addField' `mapM_` conArgs
  where
    conName = -- defined by opts
    dataName = -- defined by opts

Thanks to the monadic nature of the computation, we can leverage existing functions like mapM and forM to build easily build the AST programmatically.

Pattern matching: Reimplement the case generator from packed-data

The packed-data library comes with a set of code generators. One of them builds a case function that simulates pattern matching on packed data (more info here).

For example, the case function generated for a Tree datatype would look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- ====>
{-# INLINE caseTree #-}
caseTree ::
    (PackedReader '[a] r b) ->
    (PackedReader '[Tree a, Tree a] r b) ->
    PackedReader '[Tree a] r b
caseTree leafCase nodeCase = mkPackedReader $ \packed l -> do
    (tag :: Tag, packed1, l1) <- runReader reader packed l
    case tag of
        0 -> runReader leafCase packed1 l1
        1 -> runReader nodeCase packed1 l1
        _ -> fail "Bad Tag"

Again, using template-haskell-natural, we could write the code generator like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
genCase :: Name -> DecsQ
genCase tyName = do
    (TyConI (DataD _ _ _ _ cs _)) <- reify tyName
    newFunc ("case" ++ nameBase tyName) $ do
        -- INLINE pragma
        inline 
        -- Sets the function's type signature
        setSignature caseSignature
        -- Sets the body of the function
        bodyFromExp $ newExpr $ Expr.do
            -- Bind n arguments, where n is the number of constructor in the datatype
            caseReaders <- replicateM (length cs) arg

            resReader <- liftB $ genExpr $ Expr.do
              -- Bind both arguments in the PackedReader
              packed <- arg
              l <- arg

              -- The expression returns a do-expression
              returns $ newDo $ Do.do
                  -- Bind (<-) the result of runPackedReader
                  tpl <- bind [|runPackedReader reader $(q packed) $(q l)|]

                  -- Deconstruct the value and get its fields
                  (tag, packed1, l1) <-
                      liftA3
                          (,,)
                          -- Adding a type annotation to the first item
                          (getField' '(,,) 0 tpl $ \n -> [p|($(q n) :: Tag)|])
                          -- Get the second and third field in the triple
                          (getField '(,,) 1 tpl)
                          (getField '(,,) 2 tpl)  

                  -- Pattern matching on the Tag
                  returns $ case_ tag $ B.do
                      forM_ ([0 ..] `zip` caseReaders) $ \(i, caseReader) ->
                          -- Matching against literals
                          matchConst (LitP $ IntegerL i) [|runPackedReader $(q caseReader) $(q packed1) $(q l1)|]
                      -- Wildcard 
                      matchWild [|Prelude.fail "Bad Tag"|]

            returns [| mkPackedReader $(q resReader) |]
  where
    caseSignature = Sig.do
        (sourceType, _) <- liftB $ resolveAppliedType tyName
        branchesTypes <- liftB $ getBranchesTyList tyName []
        -- Create 2 type variables
        r <- qEC <$> newTypeVar "r"
        b <- qEC <$> newTypeVar "b"
        forM_ branchesTypes $ \branchType -> Sig.do
            -- Add argument to the function's signature
            addParam $
                let branchTypeList = foldr (\a rest -> [t|$(q a) ': $rest|]) (conT '[]) branchType
                 in [t|PackedReader $branchTypeList $r $b|]
        -- Set the return type of the function
        setResultType [t|PackedReader '[$(q sourceType)] $r $b|]

This example might feel a bit overwhelming because of its size, but it is a clear improvement from the original code. I used a few internal functions here (e.g. resolveAppliedType and getBranchesTyList) but we could have used th-abstraction instead.

Conclusion

While this approach required some bootstrapping (through extract-cons, lens-adt, the generated lenses and Builder), I would argue that the result definitely makes it worth it.

I find the DSL quite elegant and expressive enough, while still being not too verbose.

Regarding build time, I would say it’s reasonable: it takes 39 seconds to build extract-cons, lens-adt, template-haskell-lenses and template-haskell-natural (on a modern MacBook Pro, excluding dependencies, but they depend on lens and mtl).

It’s still a WIP though: documentation is very inconsistent, and I want to add more utility functions.

This needs to be confirmed, but I am quite confident about the maintainability of the library. It only relies on the AST definition through the lenses and the Builders that rely on an intermediate state. So theoretically, only the latter need to be updated when a change in TH’s API occurs (e.g. a new field in a constructor).

I should also mention that this approach is not specifically tied to Template Haskell. I think this method could be used to derive a monadic DSL from AST definitions for other languages (?)

Thank you for reading!

  1. Inspired by the tuple-th library ↩︎

  2. With one exception, for typed expressions. ↩︎

This post is licensed under CC BY 4.0 by the author.

Trending Tags