jueves, 12 de noviembre de 2015

Custom pattern-matching and advanced macrology

Hello, and welcome to the 3rd entry in the Lux Devlog, a blog dedicated to the design and implementation of the Lux Programming Language, an upcoming functional language in the Lisp tradition.

Today, I'll talk about 2 really interesting subjects that go hand in hand and that give Lux a lot of its power and expressiveness.

First: Custom pattern-matching


Before explaining the details, I'll talk a bit about the genesis of this feature...

Back in the early days of Lux's design & development, I was thinking about syntax for lists.
Some languages (such as Haskell and JavaScript) offer custom syntax for list or array data-structures since the regular syntax for building data-structures tends to be a bit cumbersome to use, while lists are very commonly used data-structures.

Just consider the difference between writings this:
 [1, 2, 3, 4]  
versus writing this:
 Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil)))  

While designing Lux, I came to the conclusion that adding custom syntax for lists was, in a way, betraying the language. Rather than come up with a quick-and-dirty fix to the problem of lists, I wanted to have consistent and general ways to deal with data-structures, while also having comfortable syntax for lists. In short, I wanted to have my cake and eat it too.

The solution for building lists was pretty easy. In a language with macros, the easiest way to fix these kinds of syntax issues was to write a macro... and that's exactly what I did:

 (@list 1 2 3 4) ## This might surprise some of you, as an earlier version didn't have the @ sign...  

There is also an alternative version, which takes a "tail" as it's last argument

 (@list& 1 2 3 4 (@list 5 6 7 8))  

OK, so that takes care of building lists, but there is still something missing...
There's no point in building data-structures if you can't tear them apart later on. The mechanism for that is pattern-matching. And here is where we encounter our problem...

Macros work well in regular code, but pattern-matching is special. Patterns are not expressions. You're not supposed to evaluate a pattern in order to get something out of it.
However, the syntax for writing patterns turns out to be identical to the syntax for writing (certain kinds of) expressions.

The answer to this question might seem obvious to a lot of you... and it's also obvious to me (in hindsight). But to a prior version of me, many months ago, it wasn't such an obvious thing. I struggled with an answer for weeks and even considered just having custom syntax for lists and give up on the subject...

And then I had my "aha!" moment. If macros give me the syntax to build lists, and that syntax is the same I need for patterns, then why not just generate patterns with macros too. Sounds obvious, right? (In hindsight...)

But there was still the matter of how do I implement it.
Do I traverse the patterns to check for all macros that show up and expand those?
That seems easy, but then I thought of something...

Macros are pretty flexible tools. Building code for making data-structures is just one of the myriad things you can achieve.
But what if my patterns could care about more than just destructuring things.
I can't just expand macros in place whenever I see them, because I'd be assuming every macro is there to generate destructuring code for me.
I need to add more control, and include macros that allow me to do more than just easily decompose data-structures.

And so, the idea for custom pattern-matching was born.

The concept is pretty simple, the pattern-matching macro (case) checks patterns to see if there is a top-level macro invocation. If so, that macro gets involved with both the pattern and the body that is to be executed for that pattern. Whatever comes out of the macro gets substituted for the pattern and the body, and the macro-expansion process is repeated until no more macros are left.

The beautiful thing is that, since the body is also included in the macro call, you can have pattern-matching macros that transform their bodies or that repeat them an arbitrary amount of times. The power unleashed is very impressive, and I have only scratched the surface of what can be achieved.

Now, without further ado, it's time for some demonstrations :D

 (def (to-pairs xs)  
   (All [a] (-> (List a) (List (, a a))))  
   (case xs  
     (\ (@list& x1 x2 xs'))  
     (@list& [x1 x2] (to-pairs xs'))  
     _  
     #;Nil))  

The \ macro has the simple task of expanding every macro it finds inside the pattern. It's the simplest of the pattern-matching macros and its use is very common (specially when working with lists).

 (deftype Weekday  
   (| #Sunday  
      #Monday  
      #Tuesday  
      #Wednesday  
      #Thursday  
      #Friday  
      #Saturday))

 (def (weekend? day)  
   (-> Weekday Bool)  
   (case day  
     (\or #Sunday #Saturday)  
     true  
     _  
     false))  

The \or macro repeats the body given to it for every pattern you give it. That way, you can reuse the body whenever you have patterns that involve returning the same result.

 ## This is an actual structure from the lux/meta/ast file:  
 (defstruct #export AST/Eq (Eq AST)  
   (def (= x y)  
     (case [x y]  
       (\template [<tag> <struct>]  
        [[[_ (<tag> x')] [_ (<tag> y')]]  
        (:: <struct> (= x' y'))])  
       ([#;BoolS   Bool/Eq]  
        [#;IntS    Int/Eq]  
        [#;RealS   Real/Eq]  
        [#;CharS   Char/Eq]  
        [#;TextS   Text/Eq]  
        [#;SymbolS Ident/Eq]  
        [#;TagS    Ident/Eq])  

       (\template [<tag>]  
        [[[_ (<tag> xs')] [_ (<tag> ys')]]  
        (and (:: Int/Eq (= (size xs') (size ys')))  
             (foldL (lambda [old [x' y']]  
                      (and old (= x' y')))  
               true  
               (zip2 xs' ys')))])  
       ([#;FormS]  
        [#;TupleS])  

       [[_ (#;RecordS xs')] [_ (#;RecordS ys')]]  
       (and (:: Int/Eq (= (size xs') (size ys')))  
            (foldL (lambda [old [[xl' xr'] [yl' yr']]]  
                     (and old (= xl' yl') (= xr' yr')))  
              true  
              (zip2 xs' ys')))  

       _  
       false)))  

\template is the pattern-matching sibling to the do-template macro. It allows you to reuse the code of the body, with some minor modifications to make it more suitable to each particular case.
After the \ macro, this is probably the most handy pattern-matching macro to have around.

 ## Please forgive the super-contrived example  
 (deftype MyRecord  
   (& #foo Int  
      #bar Int  
      #baz Text))  
   
 (def (sum rec)  
   (-> MyRecord Int)  
   (case rec  
     (\slots [#foo #bar])  
     (i:+ foo bar)))  

\slots is Lux's counterpart to Clojure's :keys destructuring syntax.

 ## Again, sorry for the contrived example...  
 (def type-1 "foo")  
 (def type-2 "bar")  
 (def type-3 "baz")  
   
 (def (process-message message)  
   (-> (, Text Data) (,))  
   (case message  
     (\~ [(~ type-1) data]) (do-something      data)  
     (\~ [(~ type-2) data]) (do-something-else data)  
     (\~ [(~ type-3) data]) (do-another-thing  data)))  

Have you ever wanted to reuse a literal value in a situation that doesn't allow you the use of variables?
That's a bit problematic, as you end up repeating the same literal value over and over again, introducing the risk of bugs, should the value every change.

The \~ macro is there for precisely this purpose. Just tell it what you need inlined and let it work it's magic.
Note: It only works with Bool, Int, Real, Char and Text

Finally, I've got a nice treat for you guys...

Lux favors eager evaluation over lazy evaluation. However, we all know that some times laziness can be useful, and there are even some data-structures that depend on it, such as streams.

Lux offers a type for doing lazy evaluation:

 (deftype #export (Lazy a)  
   (All [b]  
     (-> (-> a b) b)))  

In Lux, Lazy is just like the Cont type in Haskell, except that it's arguments are in the reverse order.
Streams are defined in terms of Lazy:

 (deftype #export (Stream a)  
   (Lazy (, a (Stream a))))  

This means that streams are actually functions.

Now... some of you might think "if streams are functions, that means I can't pattern-match against them".
Well, my friend, you're wrong!

 (def (take-s n xs)  
   (All [a] (-> Int (Stream a) (List a)))  
   (if (i:<= n 0)  
     #;Nil  
     (case stream  
       (\stream& x xs')  
       (#;Cons x (take-s (dec n) xs')))))  

The \stream& macro modifies the body so that pattern-matching on streams amounts to running the functions appropriately to extract the values.
Thanks to pattern-matching macros, we can actually pattern-match against functions ^_^ .

BTW, I have talked about what macros come by default in the Lux standard library, but I haven't shown how they're implemented.
Just so you can get an idea, here's the implementation for the \ macro:

 (defmacro' #export (\ tokens)  
   (case tokens  
     (#Cons body (#Cons pattern #Nil))  
     (do Lux/Monad  
       [pattern+ (macro-expand-all pattern)]  
       (case pattern+  
         (#Cons pattern' #Nil)  
         (wrap (@list pattern' body))  
        
         _  
         (fail "\\ can only expand to 1 pattern.")))  
     
     _  
     (fail "Wrong syntax for \\")))  

Also, I haven't mentioned something very important.
Even though those macros can only work thanks to the case macro macro-expanding the patterns, there are other macros out there which use case in their implementations, and they can also benefit from pattern-matching macros.

Some of those macros are the let macro, the lambda macro, and the do macro.
That's right, you can use custom pattern-matching against the arguments to your functions, or inside your do-notation, or even in good ol' let forms. How cool is that!?

_____________________________________________________________________________

Second: Inter-operating macros


I know I've talked a lot already, but there's one other topic I want to discuss on this post, and that is how to get macros to inter-operate.

As the custom pattern-matching examples show, you can unlock great power when your macros can work together, rather than separately, and Lux already reaps some of that power outside of the boundaries of pattern-matching.

To get macros to play together, you need to do 2 things:

  1. Some macros must perform macro-expansions on the arguments they receive, in order to process the outputs
  2. There must be some kind of "contract" between the outer macro and the inner macros

The first part is pretty obvious. If not because case does macro-expansion on the patterns it receives, none of the magic would happen.

But the second part is often missed by macro writers in other lisps. Without a common contract, communication becomes impossible and there can be no cooperation.

What's a common contract?


Consider this: whatever your pattern-matching macros generate, some rules must always stand:

  1. They must have an even number of outputs (since you're substituting both the pattern and the body).
  2. For the patterns being generated, they must either be primitive forms suitable for regular pattern-matching, or they must be macro calls to be further expanded.

If any of these 2 rules is broken, the case macro is going to complain about it.

However, this isn't the only common contract macros can have and Lux already has a few macros with their own contracts.

The defsig common contract


You might remember the defsig macro from the last blog post (if not, I advise you to go read it).
What you might not know is that you can actually use other macros inside it's body.

Here's a nice example (from the lux/control/number module):

 (defsig #export (Number n)  
   (do-template [<name>]  
     [(: (-> n n n) <name>)]  
     [+] [-] [*] [/] [%])  
   
   (do-template [<name>]  
     [(: (-> n n) <name>)]  
     [negate] [signum] [abs])  
   
   (: (-> Int n)  
     from-int)  
   )  

The Number signature provides simple math operations.
There are already implementations for Int and Real in the standard library.
And, as you can see, I make liberal use of the do-template macro to reduce boiler-plate.

The reason why this works is simple: every member of the signature must take the form:
(: <type> <name>)

Anything that generates forms of that kind is going to be welcomed. You could even implement and use your own macros in there, provided that they generate that kind of code.

defstruct also has a similar contract...

The defstruct common contract


 (defstruct #export Int/Number (Number Int)  
   (do-template [<name> <op>]  
     [(def (<name> x y) (<op> x y))]  
  
     [+ _jvm_ladd]  
     [- _jvm_lsub]  
     [* _jvm_lmul]  
     [/ _jvm_ldiv]  
     [% _jvm_lrem])  
   
   (def (from-int x)  
     x)  
   
   (def (negate x)  
     (_jvm_lmul -1 x))  
   
   (def (abs x)  
     (if (_jvm_llt x 0)  
       (_jvm_lmul -1 x)  
       x))  

   (def (signum x)  
     (cond (_jvm_leq x 0) 0  
       (_jvm_llt x 0) -1  

       ## else  
       1))  
   )  

In this case, what defstruct is looking for is forms that define things.
Note that, the def macro being used here is the very same one used to define everything else in Lux.

Pretty cool, huh?

Finally, there's one last piece of macro awesomeness I want to share before we call it quits.
I came with it fairly recently, so I haven't settled on a name yet.
For now, let's just call it let%

Before I explain how it works, I'll show the itch it's meant cure:

 (defstruct #export Json/Read (Read Json)  
   (def (read input)  
     (case (:: JsonNull/Read (read input))  
       (#;Some value)  
       (#;Some (#Null value))  
    
       #;None  
       (case (:: JsonBoolean/Read (read input))  
         (#;Some value)  
         (#;Some (#Boolean value))  
   
         #;None  
         (case (:: JsonNumber/Read (read input))  
           (#;Some value)  
           (#;Some (#Number value))  
   
           #;None  
           (case (:: JsonString/Read (read input))  
             (#;Some value)  
             (#;Some (#String value))  
   
             #;None  
             (case (:: (JsonArray/Read [read]) (read input))  
               (#;Some value)  
               (#;Some (#Array value))  
   
               #;None  
               (case (:: (JsonObject/Read [read]) (read input))  
                 (#;Some value)  
                 (#;Some (#Object value))  
   
                 #;None  
                 #;None))))))  
     ))  

Do you see that? That train-wreck? That monstrosity?

It's from a JSON library for Lux I'm working on.
The Read signature in Lux is for structures that try to parse something out of text. If they fail, you get #None

As you can see, I'm having to do some testing, to try to figure out what I'm parsing, but the code is ruled by repetitive case expressions where everything is the same, except what parser I'm using and what tag to give to the result.

Surely, there must be a better way of doing it!

First... let's flatten the structure:

 (defstruct #export Json/Read (Read Json)  
   (def (read input)  
     (|> #;None  
         (case (:: (JsonObject/Read [read]) (read input))  
           (#;Some value)  
           (#;Some (#Object value))  
   
           #;None  
           )  
         (case (:: (JsonArray/Read [read]) (read input))  
           (#;Some value)  
           (#;Some (#Array value))  
   
           #;None  
           )  
         (case (:: JsonString/Read (read input))  
           (#;Some value)  
           (#;Some (#String value))  
   
           #;None  
           )  
         (case (:: JsonNumber/Read (read input))  
           (#;Some value)  
           (#;Some (#Number value))  
   
           #;None  
           )  
         (case (:: JsonBoolean/Read (read input))  
           (#;Some value)  
           (#;Some (#Boolean value))  
   
           #;None  
           )  
         (case (:: JsonNull/Read (read input))  
           (#;Some value)  
           (#;Some (#Null value))  
   
           #;None  
           ))  
     ))  

This might not be much, but it's a start.
By using the piping macro |>, I can avoid all the nesting and keep all the tests in the same level.
Now it's even more obvious that every form in there has the same shape, minus the parser and the tag.

Man... wouldn't it be nice of we just had a macro for repeating things, while passing in parameters...

 (defstruct #export Json/Read (Read Json)  
   (def (read input)  
     (|> #;None  
         (do-template [<struct> <tag>]  
           [(case (:: <struct> (read input))  
              (#;Some value)  
              (#;Some (<tag> value))  
   
              #;None  
              )]  
       
           [(JsonObject/Read [read]) #Object]  
           [(JsonArray/Read [read]) #Array]  
           [JsonString/Read     #String]  
           [JsonNumber/Read     #Number]  
           [JsonBoolean/Read     #Boolean]  
           [JsonNull/Read      #Null]))  
         ))  

do-template seems like a pretty wise choice here, doesn't it?
The problem is that it doesn't play well with |>, as |> doesn't do any kind of macro-expansion of it's member forms prior to piping them.
Because of that, I can't combine |> with do-template; as awesome as that might be.

let% to the rescue

 (defstruct #export Json/Read (Read Json)  
   (def (read input)  
     (let% [<tests> (do-template [<struct> <tag>]  
                      [(case (:: <struct> (read input))  
                         (#;Some value)  
                         (#;Some (<tag> value))  
   
                         #;None  
                         )]  
             
                      [(JsonObject/Read [read]) #Object]  
                      [(JsonArray/Read [read])  #Array]  
                      [JsonString/Read          #String]  
                      [JsonNumber/Read          #Number]  
                      [JsonBoolean/Read         #Boolean]  
                      [JsonNull/Read            #Null])]  
       (|> #;None  
           <tests>))  
       ))  

let% is meant for those situations in which you want to expand certain parts of the bodies of macros prior to expanding outer macros.With let%, you can bind symbols to arbitrary amounts of forms produced by macro expansions, and then they're going to be spliced wherever the symbols appear, further down the line.

With this, I can do all of my parsing while only focusing on the common logic and the details that change.

Note: let% is just a temporary name until I come up with something better.
Do you have any suggestion? Leave it in the comments.
And, yes, I have already considered with-macro-expansions, but it's too damned long...

_____________________________________________________________________________

That's it for today.
I know it was a long post, but hopefully you'll now understand the amazing power that can be unlocked when you write macros that play well together.

See you next time :D

No hay comentarios:

Publicar un comentario