Type-machine
Using Template Haskell to derive the structure of records and simulate structural subtyping
Code available on GitHub, package available on Hackage.
Introduction
In Haskell, we usually model data using algebraic data types, like this:
1
data Maybe a = Nothing | Just a
Here we defined a Maybe
type, which has two constructors, Nothing
and Just
. The Just
constructor has one argument, while Nothing
as none.
It is common to consider these constructors’ arguments as fields, which can be mainly distinguished by their position in the constructor’s declaration. When data types have many fields, it becomes a bit of a pain to select them, for example, in functions like these:
1
2
3
4
5
6
7
8
9
10
data URL = URL
String -- ^ Scheme
String -- ^ Hostname
Maybe Int -- ^ Port
String -- ^ Path
[(String, String)] -- ^ Query Parameters
Maybe String -- ^ Fragment (#)
getPath :: URL -> String
getPath (URL _ _ _ path _ _) = path
Thankfully, in GHC 7.4.1 was introduced the record syntax1, which allows naming fields, like this:
1
2
3
4
5
6
7
8
data URL = URL {
scheme :: String,
hostname :: String,
port :: Maybe Int,
path :: String,
queryParams :: [(String, String)],
fragment :: Maybe String,
}
This lets us write the getPath
function much more concisely.
1
2
getPath :: URL -> String
getPath = path
Unfortunately, since Haskell’s type system is primarily nominal (=/= structural), we cannot set constraints that asks records to have a given set of fields. Yes, there is a HasField
typeclass, but it is too verbose for my liking, and only allows selecting a field, not updating it:
1
2
getName :: (HasField "name" a String) => a -> String
getName elem = getField @"name" elem
Lenses do provide getters and setters, but not in a name-polymorphic way, like HasField
. Alternatively, it is common to use heterogeneous lists to build records and work around the type system to simulate structural subtyping. However, we will see in the microbenchmark section why using these lists is not optimal.
Let’s consider another strongly typed language, TypeScript, which does support structural subtyping. It also provides a set of utility types, which I will call type-transformers.
I drew inspiration from TypeScript, wrote a bit of Template Haskell, and developed what became type-machine
, a Haskell library that allows deriving the structure of record types and generate constraints to simulate structural subtyping.
Here’s what it currently looks like:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
{-# LANGUAGE DuplicateRecordFields #-}
type_ "Vector2" (record ["x", "y"] [t|Int|])
-- Generates
-- data Vector2 = Vector2 {
-- x :: Int,
-- y :: Int,
-- }
type_ "Vector3" (union <::> ''Vector2 <:> record ["z"] [t|Int|])
-- Generates
-- data Vector3 = Vector3 {
-- x :: Int,
-- y :: Int,
-- z :: Int,
-- }
defineIs ''Vector2
deriveIs ''Vector2 ''Vector3
translateX :: (IsVector2 a) => Int -> a -> a
translateX n v = setX (n + getX v) v
example = translateX -1 (Vector3 1 2 3)
This blog post presents the library’s features, shows a use-case with Servant and evaluates its impact both at compile time and runtime.
Features
type_
and type-transformers
To derive a new type, we would use the type_
function:
1
type_ :: String -> TM Type -> Q [Dec]
The first argument is the name of the type to derive. The second argument to a TM
computation that produces a Type
. As any Template Haskell, it returns a Q [Dec]
.
Type
and TM
The Type
ADT models the structure of a record type. Its definition is simple and straight forward2:
1
2
3
4
5
6
7
8
data Type = Type
{ name :: Name
-- ^ Name of the data type
, fields :: Map String BangType
-- ^ Fields of the data type
, typeParams :: [(String, Maybe Kind)]
-- ^ Type parameter of the ADT
}
The TM
monad (standing for Type Machine
) is a simple type alias to WriterT [String] Q
, where [String]
represents potential error or warning messages issued during the computation.
Type-transformers
Type-transformers are functions that take a Type
and return a TM Type
.
The library comes with a small collection of such functions, inspired by TypeScript’s usability types, defined in TypeMachine.Functions
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
data A = A { a :: Int, b :: Int, c :: Int }
type_ "Picked" (pick ["a", "c"] <::> ''A)
-- Generates the following
data Picked = Picked { a :: Int, c :: Int }
type_ "Omitted" (omit ["b", "c"] <::> ''A)
-- Generates the following
data Omitted = Omitted { a :: Int }
type_ "Record" (record ["a", "b"] [t|String|])
-- Generates the following
data Record = Record { a :: String, b :: String }
type_ "Intersected" (intersection <:> pick ["c"] ''A <::> ''Record)
-- Generates the following
data Intersected = Intersected { a :: String, b :: String, c :: Int }
data B a :: B { f :: Maybe a }
type_ "Applied" (apply [t|Int|] <::> ''B)
-- Generates the following
data Applied = Applied { f :: Maybe Int }
This is not an exhaustive list, but it should give you an idea of what these transformers do and how to use them. If you have workedwith TypeScript’s type-level operations, you might find the syntax familiar.
Infix Operators
There are two main infix operators, <:>
and <::>
.
The <:>
operator
Remember that type-transformers are usually functions with type Type -> TM Type
? Well, <:>
allows chaining type-transformers. So it is like the binding operator, but with a twist. Some functions like intersection
take two Type
values as parameter, so >>=
wouldn’t work. The (somewhat ugly and unsatisfactory) workaround is to define a type of function whose parameters can be lifted in the TM
monad.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class LiftableTMFunction f where
applyTM :: forall a b. (f ~ (a -> b)) => (a -> b) -> TM a -> b
instance LiftableTMFunction (a -> TM b) where
applyTM f v = v >>= f
instance LiftableTMFunction (a -> b -> TM c) where
applyTM f ma b = do
a <- ma
f a b
instance LiftableTMFunction (a -> b -> c -> TM d) where
applyTM f ma b c = do
a <- ma
f a b c
-- Etc.
See how we have to define instances of the typeclass for each number of parameters? This is why I don’t like it. Maybe we could generate these instances using Template Haskell.
The <::>
operator
If type-transformers are functions with type Type -> TM Type
, how could we pass a Template Haskell Name
(prefixed with two single quotes) to them? This is what the toType :: Name -> TM Type
function is for. However, using this function would add verbosity to the type-transformer expressions:
1
type_ "Picked" (pick ["a", "c"] <:> toType ''A)
So, for brevity, I defined the <::>
operator:
1
t <::> n = t <:> toType n
Type-transformers aliases
Additionally, the TypeMachine.Infix
module provides additional infixes and aliases for the union
(&
) and intersection
(|
) type-transformers.
The collection of infixes includes:
<#|>
<:#|>
<#|:>
<#&>
<#&:>
<:&#>
The position of the :
in the infix indicates which side of the infix accepts a Name
, and the position of #
shows which side will have priority in case of an overlap (see union
, union'
, intersection
and intersection'
).
API
The definition of TM
, Type
and the infix operators are visible in the package’s API, meaning that you can write your own (possibly more advanced) type-transformers.
defineIs
and deriveIs
We saw how to derive types using type-transformers. Now let’s talk about how to get structural subtyping (almost) ‘for free’.
For any record types, the defineIs
function will generate a typeclass with:
- For each field, a getter and a setter function
- A function to transform a value into the target type.
The deriveIs
function generates an instance of the typeclass defined by defineIs
for the given type.
Here’s an example :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
data Id = Id { value :: Int }
defineIs ''Id
-- Generates the following
class IsId a where
getValue :: a -> Int
setValue :: Int -> a -> a
toId :: a -> Id
deriveIs ''Id ''Id
-- Generates the following
instance IsId Id where
getValue = value
setId newValue id_ = id_{value = newValue }
toId = id
data Id2 = Id2 { value2 :: Maybe Int }
deriveIs ''Id ''Id2
-- Fails, because Id2 does not have a 'value' field
The derivation algorithm tries to be smart: if a field’s type is a Monoid, it will be able to fall back when a field is missing in the source type:
1
2
3
4
5
6
7
8
defineIs ''Id2
deriveIs ''Id2 ''Id
-- Generates the following
instance IsId2 Id where
getValue2 = Nothing
setId _ id_ = id_
toId = Id Nothing
We can use these Is
typeclasses to simulate structural subtyping:
1
2
3
4
5
6
7
8
-- From the introduction
defineIs ''Vector2
deriveIs ''Vector2 ''Vector3
translateX :: (IsVector2 a) => Int -> a -> a
translateX n v = setX (n + getX v) v
example = translateX -1 (Vector3 1 2 3)
Limitations
Obviously the library has some shortcomings such as:
- Having to use the
DuplicateRecordFields
GHC extension- And having to deal with possible ambiguity when accessing a record’s field
- The
type_
function only handles record ADTs with exactly one constructor - I’m not really satisfied with having to pass a
Q Type
to therecord
type-transformer- And more generally, for advanced record manipulation, the programmer might need to be somewhat familiar with Template Haskell
- The
Is
typeclasses are useful, but are not a full replacement for theHasField
typeclass. I would like a future version oftype-machine
to provide something likedefineConstraint
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
defineConstraint "Has2DCoord" ["x", "y"] ''Vector2
-- Would generate the following constraint and getters/settings
type Has2DCoord a = (HasField "x" a Int, HasField "y" a Int)
setX :: (HasField "x" a Int) => Int -> a -> a
getX :: (HasField "x" a Int) => a -> Int
setY :: (HasField "y" a Int) => Int -> a -> a
getY :: (HasField "y" a Int) => a -> Int
translate2DCoord :: Has2DCoord a => n -> a -> a
translate2DCoord n vev = setY (+ n) (setX (+ n) vec)
Example
In web APIs, it is common to have a database model (say UserRecord
), whose structure can be used to derive user-facing models, like in responses (UserResponse
, a UserRecord
without the password) or forms (UserForm
, a UserRecord
without an ID). We can define and derive these models like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
data UserRecord = {
id :: Int,
name :: String,
password :: String
}
type_ "UserForm" (omit ["id"] <::> 'UserRecord)
type_ "UserResponse" (omit ["password"] <::> 'UserRecord)
defineIs UserResponse
deriveIs UserResponse UserRecord
deriving instance FromJSON UserForm
deriving instance ToJSON UserResponse
If we were to use Servant, we could define 2 endpoints, one to POST
a new user, and one to GET
one by ID:
1
2
3
4
5
6
7
type UserApi =
ReqBody '[JSON] UserForm :> Post '[JSON] UserResponse
:<|>
":id" :> Capture "id" Int :> Get '[JSON] UserResponse
server :: Server UserApi
server = createUser :<|> getUser
We could have a saveUserRecord
, which creates, persists and returns a UserRecord
from a UserForm
. Since UserForm
is not a subtype of UserRecord
(as it has no ID), we can’t use deriveIs
to generate the function to transform the former into the latter. However, since UserRecord
is a subtype of UserResponse
, we can use the generated toUserResponse
function to do the conversion for us:
1
2
3
4
5
6
7
8
9
10
11
12
13
createUser userForm = do
userRecord <- saveUserRecord userForm
let response =
toUserResponse userRecord
return response
saveUserRecord :: UserForm -> Db UserRecord
saveUserRecord userForm = do
newId <- getNextId
let record = UserRecord newId (name userForm) (password userForm)
save record
return record
We could do something similar when retrieving a UserRecord
:
1
2
3
4
5
6
7
getUser userId = do
userRecord <- getUserRecord userId
let response = toUserResponse userRecord
return response
getUserRecord :: Int -> Db UserRecord
getUserRecord = getByPrimaryKey
You can find other example use-cases in the repository
Microbenchmark
As mentioned in the introduction, it’s not unusual to use heterogeneous lists to model records and simulate structural subtyping.
However, because heterogeneous lists are not first-class citizens to GHC, the compiler will not be able to optimize, say, selection. Thus accessing a field is like traversing a list, which is O(n).
If we compare the time it takes to build and traverse records defined using type-machine
, extensible
and superrecord
(using Criterion), we can see that type-machine
is faster. This is not surprising, as we generate native records.
Library | type-machine | extensible | superrecord |
---|---|---|---|
Build time | 21.67ns | 24.32ns | 27.96ns |
Traversal time | 22.43ns | 168.3ns | 309.4ns |
Compilation time | 5.07s | 16.41s | 6m38 |
Note: Benchmarks were run on an Intel machine with two 986 Xeon Gold 6244 CPUs at 3.60 GHz, with 32Gb of RAM, running Ubuntu 22.04 LTS, using GHC 9.10.1 and the latest versions of the two other libraries. All code for these benchmarks is available on GitHub
Conclusion
While I am quite happy with the type-transformers, I feel like more work need to be done with the support for structural subtyping. The next step would be to write a flexible alternative to deriveIs
/defineIs
, deriveConstraint
.
Using Template Haskell/meta-programming to enhance the performance of programs is a topic I am really interested in, and type-machine
seems to be a nice example of how can leverage TH to make Haskell programs more efficient.
Feedback is welcome. Feel free to leave a comment below or open issues/pull requests on the repo!
Links
Thanks for reading! If you’re interested, check out type-machine
on GitHub and Hackage.
Note:
Kind
andBangType
are types from thetemplate-haskell
. I don’t like how tightly it couplesType
with that library, but defining custom types for them intype-machine
felt like it would’ve been redundant. ↩︎