Friday, May 31, 2013

Type-Safe Runtime Code Generation with (Typed) Template Haskell

Over the past several weeks I have implemented most of Simon Peyton Jones' proposal for a major revision to Template Haskell. This brings several new features to Template Haskell, including:

  1. Typed Template Haskell brackets and splices.
  2. Pattern splices and local declaration splices.
  3. The ability to add (and use) new top-level declarations from within top-level splices.

I will concentrate on the first new feature because it allows us to generate and compile Haskell values at run-time without sacrificing type safety. The code in this post is available on github; the github repository README contains instructions for building the th-new branch of GHC, which is where work on typed Template Haskell is being done. The GHC wiki contains more details about the current implementation status. The plan is that this work will land in HEAD before the 7.8 release of GHC.

The staged power function: untyped

The driving example I will use is the staged power function. The idea is that we want to compute \(x^n\) where \(n\) is known statically (that is, it is a constant we know ahead of time), but \(x\) may vary. John Lato wrote a blog post, Runtime Meta-programming in Haskell, that shows how to implement the staged power function while providing a veneer of type-safety on top of unsafe runtime code generation using "classic" Template Haskell. Examining his implementation is instructive, so I'll start there.

John's "safe" Template Haskell expression type is a newtype wrapper around the Template Haskell expression type, ExpQ, that adds a phantom type parameter.

newtype Meta a = Meta { unMeta :: ExpQ }

This definition means that for all α, we can construct a Meta α from any ExpQ by applying the constructor Meta. For example

e :: Meta Int
e = Meta [|'c'|]

Not very safe! However, using the Meta type, we can write combinators that are safe in the sense that if their inputs are "well-typed" Meta values, then their outputs will be "well-typed" Meta values. John defines one such safe combinator as follows

compose :: Meta (a -> b) -> Meta (b -> c) -> Meta (a -> c)
compose (Meta l) (Meta r) = Meta [| $r . $l |]

Although the compose function uses unsafe operations internally, if its two arguments really are Meta-wrapped Template Haskell expressions with the types a -> b and b -> c, respectively, then the result really will be a Meta-wrapped Template Haskell expression of type a -> c. This isn't enforced by the type system, but it's true nonetheless.

John defines the staged power function as follows

mkPow :: Int -> Meta (Int -> Int)
mkPow 0 = Meta [| const 1 |]
mkPow n = Meta [| \x -> x * $(unMeta $ mkPow (n-1)) x |]

Again, the type system doesn't enforce safety in the sense that it doesn't guarantee that the bit of Template Haskell that mkPow produces really has type Int -> Int, but we can see by inspection that it will always return a Template Haskell expression that has type Int -> Int. With classic Template Haskell, this is just about all that we can hope for.

The staged power function: typed

With the new version of Template Haskell, we can do better. Here is the definition of compose using the new Template Haskell typed brackets and typed splices (I have also changed the definition of compose so that the arguments are in the proper order :)).

compose :: TExpQ (b -> c) -> TExpQ (a -> b) -> TExpQ (a -> c)
compose f g = [|| $$f . $$g ||]

We can see a new data type, TExpQ a, and new syntax for splices and brackets. TExpQ a is a type alias for Q (TExp a), just as ExpQ is an alias for Q Exp. A TExp a is a typed Template Haskell expression—the type checker guarantees that it is the abstract syntax representation of a Haskell term of type a. This is in contrast to Exp, the existing untyped expression representation used by Template Haskell. A value of type Exp is the abstract syntax of a Haskell expression of some type, but a value of type TExp a is the abstract syntax of a Haskell expression of type a.

Values of type TExp a can only be constructed using typed brackets and typed splices. The syntax [||e||] is a typed bracket, and the syntax $$e is a typed splice. No other varieties of splice may occur in a typed bracket. In other words, [||e||] is the only introduction form for TExpQ, and $$e is the elimination form. It not the only elimination form; Template Haskell also provides a constant unType :: TExp a -> Exp that allows coercion from the type Template Haskell world to the untyped world; it simply throws away the type information

We can now write a completely safe version of the mkPow function.

mkPow :: Num a => Int -> TExpQ (a -> a)
mkPow 0 = [|| const 1 ||]
mkPow n = [|| \x -> x * $$(mkPow (n-1)) x ||]

This version is type-checked by the compiler; there is no unsafe newtype wrapping/unwrapping going on.

Under the covers

If you look at the definition of TExp in the template-haskell library, you'll see that it actually is a newtype wrapper.

newtype TExp a = TExp { unType :: Exp }

Why is this safe? Because the compiler is checking that a typed bracket is well-typed and then creating a TExp whose type index is that type. Of course you can always subvert type safety by importing Language.Haskell.TH.Syntax and mucking about with TExp's. But as long as you stick to typed brackets and splices, you will get type safety.

Run time code generation and compilation

Now that we can generate type safe Haskell expressions at run time, can we also safely compile them at run time? The answer is a qualified yes.

First, how can we compile a TExp a? John uses the plugins package, originally written by Don Stewart. I will use the GHC API directly. My implementation strategy is the same as his: print a Template Haskell expression to a file and compile and load the file. For this to work, we need two invariants to hold:

  1. The term that we pretty-print must be a faithful representation of the term that was type checked.
  2. The API that we use to compile the term we print out must assign it the same type that GHC did when it checked the term.

You might think that pretty-printing is safe, but in fact I had to fix a few bugs that caused the pretty-printed versions of some Template Haskell terms to not even parse as valid Haskell! I assume that the underlying expression value is a faithful representation of the term that GHC type checked.

The second property does not obviously hold. First of all, we need to make sure (approximately) that the type environment we use to compile the term is the same as the type environment used to type check it. Template Haskell resolves some names so that they are fully qualified. Perhaps we could walk over the term to figure out exactly which modules we need to import to compile the term, but I'm not sure that is sufficient. At the moment I've punted, so run time compilation may fail at run time. See the github repository for the details. Furthermore, even if we got all the imports right, defaulting and such might conceivably get in the way, so we slap a type signature on out term when we print it out to make sure the type of out compiled term really matches the type of the typed bracket.

Given the function compile :: forall a . Typeable a => TExp a -> IO a, which is defined in the github repository accompanying this post, we can now compile and run our staged power function. Note that the Typeable constraint is what allows us to print out the type of the term when we feed it to the GHC API.

main :: IO ()
main = do
  powTExp :: (TExp (Double -> Double)) <- runQ (mkPow 23)
  putStrLn . pprint . unType $ powTExp
  pow <- compile powTExp
  print $ pow 2.0
  print $ pow 10.0

Related work

Typed brackets bring much of the power of MetaML1 to Haskell. They are also similar to F#'s code quotations2; however, Haskell's untyped brackets are now completely untyped—the type checker doesn't check them at all. In classic Template Haskell, the compiler attempted to type check all varieties of bracket, but an expression bracket still resulted in a plain old ExpQ. F#'s untyped quotations are type checked, although their precise semantics are unclear to me. I like the untyped/typed distinction that is now made by the new Template Haskell; typed brackets really are typed, and untyped brackets are only parsed, never type checked.

Typed Template Haskell is quite different from MetaHaskell3, which provides a mechanism for quoting many different object languages while maintaining type safety. MetaHaskell also supports quotation and antiquotation using open code, which introduces a host of difficulties.

Footnotes:

1

W. Taha and T. Sheard, "Multi-stage programming with explicit annotations," in Proceedings of the 1997 ACM SIGPLAN symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '97), Amsterdam, The Netherlands, 1997, pp. 203–217.

3

G. Mainland, "Explicitly heterogeneous metaprogramming with MetaHaskell," in Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (ICFP '12), Copenhagen, Denmark, 2012, pp. 311–322.

2 comments:

  1. Could you please elaborate on the difficulties introduced by free variables? Would be interesting to compare your experience with ours in the Scala reflection framework.

    Also, don't you find typed quasiquotes to be at times limiting? When would you recommend using them instead of untyped quasiquotes?

    Finally have you looked into a possibility of mixing untyped and typed quasiquotes? Sometimes you know the type of an untyped quasiquote in advance - it's just that it's composed of parts that cannot be statically typed, so it cannot be given the type statically. In Scala we allow programmers to wrap untyped Tree's in typed Expr[T]'s. Have considered such a design?

    ReplyDelete
  2. I am really looking forward to using your improvements to Template Haskell, both for further work on metaprogramming and because I've hit some of the bugs that originally led to SPJ's proposal.

    I'm not sure of a solution to the type environment either. I think walking the AST to get the correct imports is probably good enough for at least 95% of cases. But I'm not yet convinced it always is, or that it's the best way.

    ReplyDelete