Adding Macro Support to Our Lisp

2024-08-17

#build-log #language-design #python #lisp

I really enjoyed writing my last post on creating a small lisp-like language using python. Since the post was already fairly long, I decided to cut one really cool feature: Metaprogramming with macros. This post adds that feature and is meant to be a continuation of the previous post.

What is Metaprogramming?

Metaprogramming is the ability for a language to write code that can modify itself while running. To put it another way: it’s a way to write code that writes code. This technique is often used to add new language features by writing code in the language itself. Another common use case is building up domain-specific languages (DSL).

If you’re curious how a few other languages do it, here are some relevant links:

Clojure (a lisp, much like our own)
Elixir
Haskell (via “Template Haskell”)
Rust

The benefits of a lisp

This feature is very common in lisp-like languages because the code is, essentially, already data. Consider the following expression:

(+ 1 2)

This adds 1 and 2 together to produce the value 3.

The form we’re seeing is a list (an “array” in other languages). What if we could alter the contents of that list before we execute it? We could change the order, add new elements to the list, or remove elements. Since expressions are just data (lists), we can operate on them just like any other piece of data in our language.

”Data is code” and “code is data” is a big selling point of lips-like languages.

Infix operations

Let’s say we wanted to add support for “infix” operations in our language. This would let us put the operator in between the operands, like most people are probably use to. Instead of doing something like + 1 2, this would let us write 1 + 2.

Here’s how we’d like that to work in our language:

(infix 1 + 2)

Feature: `macro`

The way a user creates a macro looks very similar to a function. The only difference is a new macro keyword:

(def infix
 (macro (a f b) (f a b)))

To start, we’ll need to create a new type that represents a macro. That way, when we run into it in our recursive eval function, we’ll be able to tell it apart from regular functions.

Like we did with fn, we’re going to use a dataclass (but a regular JavaScript object with a type tag would do just as well):

@dataclass
class Macro
    params: list
    body: list

Next, whenever we run into the word macro in our evaluator, we can create a new instance of the Macro class:

def eval(node, env=global_env):
    if type(node) is list:
        [fn, *args] = node
        match fn:
            case "def":
                # [...]
            case "if":
                # [...]
            case "do":
                # [...]
            case "fn":
                # [...]
            case "macro": # new
                [params, body] = args
                return Macro(params, body)
            case _:
                # [...]

So far nothing groundbreaking here. We do almost the same thing when creating functions. The only difference is that macros don’t take their own environment (macro hygiene is way out of scope for this post).

Now we need to update our case _ to handle macros. Here’s that case in full:

def eval(node, env=global_env):
    if type(node) is list:
        [fn, *args] = node
        match fn:
            case "def":
                # [...]
            case "if":
                # [...]
            case "do":
                # [...]
            case "fn":
                # [...]
            case _:
                args = [eval(a, env) for a in args]
                fn = eval(node[0], env)
                if type(fn) is Function:
                    fn_env = new_env(fn.env, fn.params, args)
                    return eval(fn.body, fn_env)
                # new
                if type(fn) is Macro:
                    return expand_macro(node, env)
                else:
                    return fn(*args)

All we’re really doing here is calling a new function (expand_macro). Let’s take a look at that:

 def expand_macro(node, env):
     [key, *args] = node
     macro = env[key]
     params_to_args = dict(zip(macro.params, args))
     body = replace(macro.body, params_to_args)
     return eval(body, env)

The first line simple destructures the list (node) into the key (infix in our example) and the arguments passed into the macro
We then look up the previously defined Macro in our environment
Next we combine all of the parameters with their corresponding argument
- For the example macro (macro infix (a f b) (f a b))
- And the macro call (infix 1 + 2)
- dict(zip(...)) would produce {"a": 1, "f": "+", "b": 2}
Given the mapping of parameters to arguments, we call a new replace function
- This is what updates the expression to be something new!
Lastly, we return the evaluation given the new body we created

The last piece of this puzzle is replace. This is what updates the list to be something new (code rewriting code):

def replace(node, mapping):
    if type(node) is str:
        if node in mapping:
            return mapping[node]
        else:
            return node
    elif type(node) is list:
        return [replace(e, mapping) for e in node]
    else:
        raise TypeError(f"invalid node type: {node}")

That’s it! This function is what rewrites code before it gets executed. Let’s break it down:

If the node we’re currently looking at is a string
- If this string is a key in our mapping dictionary (if it’s a parameter)
  - Return the corresponding value (argument)
- Otherwise, it’s just a regular “word” (something other than ( or ))
  - Return it unmodified
If the node we’re currently looking at is a list
- Call this function for every element in the list (recursion!)
- We’re using list comprehension here, but a simple loop-and-update would work
- In JavaScript, Array.prototype.map() would be perfect for this
Otherwise, raise an error
- This is simply to be defensive, we shouldn’t find ourselves in this situation

Kicking the tires

In the last post, we had a test file we wrote some code assertions in. Let’s add a new section to that:

(do
  ...

  (print-v :_____macros)
  (def infix (macro (a f b) (f a b)))
  
  (assert (= (infix 1 + 2) 3))
  (assert (= (infix 5 - 4) 1))
)

And if we run our interpreter with python main.py:

...
:_____macros
:pass
:pass

It works! We’ve added metaprogramming with only 25 more lines of code. Here’s my code for this blog post.

At this point, the sky is the limit. With the power to manipulate the language itself, we can add all sorts new language features without having to write any new python code.