From AST definition to monadic DSL

A nicer way to write code generators in Template Haskell

Posted Feb 25, 2026

By Arthur Jamet

17 min read

Introduction

After working on 3 small-to-medium-sized projects that use Template Haskell (packed, type-machine, and reflection), I’ve come to realise that, while the idea of generating Haskell code in Haskell is fun, actually doing it isn’t.

Aside from the Q monad and the reify function, the main feature of Template Haskell’s API is the collection of AST types it provides. Basically, (almost) any code that is valid Haskell can be represented as an AST, including top-level declarations(Dec), expressions (Exp), and even pragmas, bang patterns and type annotations.

These AST types are defined in the most idiomatic-Haskell way: using algebraic data types (ADT):

  
data Exp = VarE Name | ConE Name | LitE Lit | AppE Exp Exp | ...

However, I find meta-programs (i.e. functions that build a Haskell AST) quite verbose. Let’s generate a projection function for tuples¹:

  
genProj :: TupleSize -> FieldIndex -> Exp
genProj size idx = LamE 
    [TupP (replicate i WildP ++ [VarP f] ++ replicate (n - i - 1) WildP)] 
    (VarE f) 
  where f = mkName "value"

This example showcases 2 of the 5 main painful things we have to deal with when writing meta-programs:

Having to manage names (although here we just have f)
We have to build patterns and expressions separately: names cannot be used as patterns and expressions directly.
- E.g., we cannot write let name = mkName "f" in LamE [name] name
In pure context (i.e. non Q computations), we can easily end up with clashing names (see mkName vs. newName)
Deconstructing values is a pain: we have to know in advance the number of fields of the target constructor, and build a list of Pats and insert VarP name at the correct position. This is even more frustrating when we want to access only a specific set of fields.
Refactoring pure computations to/from Q operations can be a pain as this usually means changing all the constructors used to build the AST (e.g. lamE <-> LamE, varP <-> VarP).

There are many other annoying points, but this is not a hate post. All this friction made me realise something: it would be so much better if we could build these ASTs through a monadic DSL.

This is what I have been working on during the last few weeks. I came up with template-haskell-natural: a library that leverages lenses and the State monad to derive a monadic DSL for Template Haskell. It allows writing meta-programs that follow the same flow as the program to generate. Worded differently, it allows writing meta-programs naturally:

  
import Language.Haskell.TH.Natural.Syntax.Builder as B

-- \(.., a, ..) -> a
genProj :: TupleSize -> FieldIndex -> Q Exp
genProj size idx idx = genExpr $ newExpr $ B.do
  tup <- arg
  value <- getTupleField size idx tup
  returns value

-- | \(a, b, ..) -> (.., b, a)
genReverse :: TupleSize -> ExpQ
genReverse n = genExpr $ newExpr $ b.do
    tup <- arg
    tupleFields <- forM [0 .. n - 1] $ \i -> getTupleField n i tup
    returns (TupE (Just <$> reverse tupFields))

The library provides utilities to build ASTs for:

Typeclasses and instances
Type Signatures
Datatypes and constructors
Expressions, binding variables, and pattern matching

The goal of this blog post is to present the ideas behind the library, as well as show some examples of meta-programs built using template-haskell-natural.

Foundations

Lenses and State Monad

Basically (and I apologise for the gross simplification), lenses are operators that allow accessing and updating the fields of an object. They are most commonly generated using Template Haskell. Let’s take the example of a simple record object and a function that increments one of its fields:

  
data MyData = MyData { _s :: String, _i :: Int, _c :: Char }

incI :: MyData -> MyData
incI d = d{ _i = _i d + 1 }

This is one short way to write incI. Using lenses, however, we can make this even more concise:

  
makeLenses 'MyData -- Generates lenses: s, i, c

incI :: MyData -> MyData
incI d = (d & i) +~ 1

While we can argue that the operators & and +~ require some getting used to, it definitely looks better.

The good news is that it gets even better when we interact with MyData through a State monad:

  
incI' :: State MyData ()
incI' = i += 1

mapS :: (Char -> Char) -> State MyData ()
mapS f = s %= (fmap f)

consCS :: State MyData String
consCS = liftA2 (:) (view c) (view s)

Lenses for Template Haskell

The imperative interface provided by lenses and the State monad could help us avoid some of the friction listed in the intro. We could imagine having utility functions that allow managing names, extract a given set of fields, build patterns, etc.

However, there are two issues that we need to consider:

Some operations, e.g. binding arguments, require ensuring that the Exp that is being built is a LamE. We need to be able to work with selected constructors (e.g. ClassD, FunD, LamE, etc.) type-safely.
The fields of the ADTs’ constructors are unnamed. So we cannot use the utilities from the lens library to derive lenses easily.

Extracting constructors

To solve the first problem, I came up with a small library, extract-cons, which allow extracting all the constructors of a given type constructor into standalone datatypes:

  
import Data.Constructor.Extract.TH

extractConstructorsOf 'Dec

-- ==> Generates the following ADTs:
data ClassD = MkClassD Cxt Name [TyVarBndr BndrVis] [FunDep] [Dec]
data InstanceD = MkInstanceD (Maybe Overlap) Cxt Type [Dec]
-- etc.  

It also provides a way to easily to convert the generated data types back to the source datatype.

Naming ADT fields

The lens-adt package solves the second issue. It provides a simple TH utility function that generate a lens for each field of a datatype, using the position function from generic-lens.

  
makeADTLenses 'ClassD ["cxt", "name", "tyVarBndr", "funDep", "dec"]
makeADTLenses 'InstanceD ["overlap", "cxt", "ty", "dec"] 

Some field names can clash, so instead of defining lens functions, it generates instances of typeclasses to allow overloading.

  
makeADTLenses 'VarE ["name"]

-- ===> What actually gets generated 

class HasName s t where
  name :: Lens' s t

instance HasName VarE Name where
  name = position @1

All these lenses are generated and exposed by the template-haskell-lenses package.

Good news: We now have everything we need to build ASTs imperatively!

  
-- | Generate an instance for the given typeclass (Name) 
-- using the provided builder
genInstance :: Name -> StateT Q InstanceD () -> Q Dec
genInstance n builder = 
  execStateT builder (MkInstanceD Nothing [] (ConT n) []) 

setOverlap :: Overlap -> StateT Q InstanceD ()
setOverlap o = overlap ?= o

addInstanceArg :: Type -> StateT Q InstanceD ()
addInstanceArg tyArg = ty %= (`AppT` tyArg)

Note: This is not valid template-haskell-natural code, but has a similar feel to it.

Ensuring correctness of the built AST with the `Builder` Monad

We could just use the lenses from template-haskell-lenses along with a StateT Q monad, and build the AST using the operators presented earlier. But we can go even further: the template-haskell-natural provides another layer of safety to make sure that the produced AST is ‘valid’: for example, we need to make sure that the LamE has a body, that the last statement in a DoE is not a bind, etc.

While it is possible to use extracted constructors as a state most of the time, that’s not the case for e.g. LamE because the body will not be immediately set by the user. In such cases, we thus need to define an intermediary state datatype.

Because of this, we need something on the type-level to guarantee that our AST-building computation is ready to produce a valid AST. One way to do this is to use a graded monad:

  
data BuilderStep = Empty | Ready

newtype BaseBuilder m st prev curr a =
  BaseBuilder { unBB :: StateT m st a }
-- * skipping the redefinition of (>>=), (>>), etc. 

type Builder = BaseBuilder Q
type ConstBuilder st = Builder st Ready Ready

The Builder type alias is for computations for which we need to keep track of the progress (e.g. when all the fields are set). The ConstBuilder alias is for computations for which tracking progress is not necessary.

These definitions were so abstract that I could use them for all AST-building computations defined in template-haskell-natural ². For illustration purposes, let’s consider the two main functions to build lambda expressions:

  
-- Provided by the library
type ExprBuilder = Builder LamE

-- | Adds a pattern to the lambda and 
-- returns the bound name wrapped in `VarE`
arg :: ExprBuilder step step Exp
arg = do
    nextArgName <- liftB $ TH.newName "arg"
    argNames |>= nextArgName
    return $ TH.VarE nextArgName

-- | Sets the given expression as te 'body' of the lambda
returns :: GenExp e => e -> ExprBuilder step Ready ()
returns q = do
    expr <- liftB $ genExpr q
    impure (returnedExp ?= expr)

-- | The function to produce the final `Exp` AST 
newExpr :: ExprBuilder step Ready () -> Q Exp

The type parameters of the first function, arg, show that calling it does not alter the progress of the calling computation. On the other hand, the returns function sets the result expression of the lambda, making it ready to be reified.

Using the type-level arguments, we can guarantee at compile time that the AST to build is indeed ready and not missing any mandatory information.

`GenX` typeclasses

Notice how returns does not take an Exp, but any value whose type is an instance of the GenExp typeclass. With this abstraction, the programmer can actually pass anything that can be turned into an Exp, so both plain Exp values and quasi-quotations. Thus, it is possible to do the following:

  
_ = B.do
  returns [|Nothing|]
  -- or
  returns (conE 'Nothing)
  -- or
  returns (ConE 'Nothing)

Thanks to this, we don’t have to worry about whether we can use pure constructors or quasi quotations to build the AST.

template-haskell-natural comes with variants of the GenExp typeclass for Decs, Types and Pats.

Examples

To test that template-haskell-natural is actually usable, I spent some time reimplementing a couple of libraries that generate Haskell code using Template Haskell.

For the following snippets, we use the QualifiedDo language extension to be able to use the do-notation with our overloaded monadic operators.

Lambda Expressions: Reimplement `tuple-th`

tuple-th is a small library that provides TH functions to manipulate functions. Internally, it relies on a thin layer of abstraction around TH, allowing the functions to be reasonably small. I’ll leave you check out the original code. If the library were to be rewritten using template-haskell-natural, here is what it would look like.

  
import Language.Haskell.TH.Natural.Syntax.Builder as B
import Language.Haskell.TH.Natural.Syntax.Case
import Language.Haskell.TH.Natural.Syntax.Expr.Simple

mapTuple

  
-- (a -> b) -> (a, ..) -> (b, ..)
mapTuple :: TupleSize -> ExpQ
mapTuple n = genExpr $ newExpr $ B.do
  f <- arg
  tupFields <- tupleFields n =<< arg
  let transformedTupleFields = (f `AppE`) <$> tupFields
  returns $ tupE (pure <$> transformedTupleFields)

deleteAtTuple

  
-- Deletes the i-th element of the tuple.
deleteAtTuple :: TupleSize -> TupleFieldIndex -> ExpQ
deleteAtTuple n idx = genExpr $ newExpr $ B.do
  tup <- arg
  tupFields <- getTupleFields n tup
  let filteredFields = uncurry (++) $ second (drop 1) $ splitAt idx tupFields
  returns (tupE (pure <$> filteredFields))

safeTupleFromList

  
-- [a] -> Maybe (a, ..)
safeTupleFromList :: TupleSize -> ExpQ
safeTupleFromList n = genExpr $ newExpr $ B.do
  list <- arg
  returns $ case_ list $ B.do
      matchList n $ \listItems -> [|Just $(tupE $ pure <$> listItems)|]
      matchWild [|Nothing|]

catTuples

  
-- (a1, ..) -> (b1, ..) -> (a1, .., b1, ..)
catTuples :: TupleSize -> TupleSize -> ExpQ
catTuples n1 n2 = genExpr $ newExpr $ B.do
  tup1Fields <- tupleFields n1 =<< arg 
  tup2Fields <- tupleFields n2 =<< arg
  let fields = tup1 ++ tup2
  returns $ tupE (pure <$> fields)

We end up with mostly clear and reasonably verbose code, without having to define a wrapper layer.

Datatypes: Reimplement `extract-cons`

The extract-cons library presented earlier is a small library that generates standalone datatypes from data constructors (see the Extracting constructors section). If we were to reimplement the library’s core function using template-haskell-natural, we would get the following:

  
-- DataAndCon is an internal type from `extract-cons` containing info about the constructor to extract
genDataDec :: DataAndCon -> ExtractOptions -> DecQ
genDataDec d opts = genDec $ newData dataName $ B.do
  -- Add each type argument in the constructor to the type constructor
  forM_ (conTypeArgNames d) $ \tyArg -> addTypeVar (MkVarT tyArg)

  -- Add a 'deriving' clause for the typeclasses specified in the ExtractOptions 
  addDeriving `mapM_` deriveClasses opts

  -- Add one constructor
  addCon $ newCon conName $ B.do
      -- Add each argument of the original constructor to the new one, 
      -- preserving strictness and unpacked-ness
      addField' `mapM_` conArgs
  where
    conName = -- defined by opts
    dataName = -- defined by opts

Thanks to the monadic nature of the computation, we can leverage existing functions like mapM and forM to build easily build the AST programmatically.

Type signature, function declaration and pattern matching: Reimplement the `case` generator from `packed-data`

The packed-data library comes with a set of code generators. One of them builds a case function that simulates pattern matching on packed data (more info here).

For example, the case function generated for a Tree datatype would look like this:

  
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- ====>
{-# INLINE caseTree #-}
caseTree ::
    (PackedReader '[a] r b) ->
    (PackedReader '[Tree a, Tree a] r b) ->
    PackedReader '[Tree a] r b
caseTree leafCase nodeCase = mkPackedReader $ \packed l -> do
    (tag :: Tag, packed1, l1) <- runReader reader packed l
    case tag of
        0 -> runReader leafCase packed1 l1
        1 -> runReader nodeCase packed1 l1
        _ -> fail "Bad Tag"

Again, using template-haskell-natural, we could write the code generator like this:

  
genCase :: Name -> DecsQ
genCase tyName = do
    (TyConI (DataD _ _ _ _ cs _)) <- reify tyName
    newFunc ("case" ++ nameBase tyName) $ do
        -- INLINE pragma
        inline 
        -- Sets the function's type signature
        setSignature caseSignature
        -- Sets the body of the function
        bodyFromExp $ newExpr $ B.do
            -- Bind n arguments, where n is the number of constructor in the datatype
            caseReaders <- replicateM (length cs) arg

            resReader <- liftB $ genExpr $ B.do
              -- Bind both arguments in the PackedReader
              packed <- arg
              l <- arg

              -- The expression returns a do-expression
              returns $ newDo $ B.do
                  -- Bind (<-) the result of runPackedReader
                  tpl <- bind [|runPackedReader reader $(q packed) $(q l)|]

                  -- Deconstruct the value and get its fields
                  (tag, packed1, l1) <-
                      liftA3
                          (,,)
                          -- Adding a type annotation to the first item
                          (getField' '(,,) 0 tpl $ \n -> [p|($(q n) :: Tag)|])
                          -- Get the second and third field in the triple
                          (getField '(,,) 1 tpl)
                          (getField '(,,) 2 tpl)  

                  -- Pattern matching on the Tag
                  returns $ case_ tag $ B.do
                      forM_ ([0 ..] `zip` caseReaders) $ \(i, caseReader) ->
                          -- Matching against literals
                          matchConst (LitP $ IntegerL i) [|runPackedReader $(q caseReader) $(q packed1) $(q l1)|]
                      -- Wildcard 
                      matchWild [|Prelude.fail "Bad Tag"|]

            returns [| mkPackedReader $(q resReader) |]
  where
    caseSignature = newSignature $ B.do
        (sourceType, _) <- liftB $ resolveAppliedType tyName
        branchesTypes <- liftB $ getBranchesTyList tyName []
        -- Create 2 type variables
        r <- qEC <$> newTypeVar "r"
        b <- qEC <$> newTypeVar "b"
        forM_ branchesTypes $ \branchType -> Sig.do
            -- Add argument to the function's signature
            addParam $
                let branchTypeList = foldr (\a rest -> [t|$(q a) ': $rest|]) (conT '[]) branchType
                 in [t|PackedReader $branchTypeList $r $b|]
        -- Set the return type of the function
        setResultType [t|PackedReader '[$(q sourceType)] $r $b|]

This example might feel a bit overwhelming because of its size, but it is a clear improvement from the original code. I used a few internal functions here (e.g. resolveAppliedType and getBranchesTyList) but we could have used th-abstraction instead.

Limitations

Some functions that use Names (such as getField, to deconstruct a value using a constructor name) require them to be visible (see reify’s doc). For example, let’s say you have this bit of code:

  
f :: Q [Dec]
f = do
  myData <- genDec $ newData "MyData" $ newCon "MkData" $ addField [t|Int|]
  unpackData <- genDecs $ newFunc "unData" $ bodyFromExp $ B.do
    d <- arg
    value <- getField (mkName "MkData") 0 -- FAIL
    returns value
  return (myData : unpackData)

The call to getField will fail because the function relies on reify to get the constructor’s argument count (and generate a valid pattern). One solution is to generate the two declarations in two separate splices. Another way would be to integrate a cross-Builder tracking of the declared data types, classes, etc. to allow retrieving information from their name.

Conclusion

While this approach required some bootstrapping (through extract-cons, lens-adt, the generated lenses and Builder), I would argue that the result definitely makes it worth it.

I find the DSL quite elegant and expressive enough, while still being not too verbose.

Regarding build time, I would say it’s reasonable: it takes 39 seconds to build extract-cons, lens-adt, template-haskell-lenses and template-haskell-natural (on a modern MacBook Pro, excluding dependencies, but they depend on lens and mtl).

It’s still a WIP though: documentation is very inconsistent, and I want to add more utility functions.

This needs to be confirmed, but I am quite confident about the maintainability of the library. It only relies on the AST definition through the lenses and the Builders that rely on an intermediate state. So theoretically, only the latter need to be updated when a change in TH’s API occurs (e.g. a new field in a constructor).

I should also mention that this approach is not specifically tied to Template Haskell. I think this method could be used to derive a monadic DSL from AST definitions for other languages (?)

Thank you for reading! Feedback and impressions are appreciated :)

Inspired by the tuple-th library ↩︎
With one exception, for typed expressions. ↩︎

Haskell

This post is licensed under CC BY 4.0 by the author.