I've been working on Resyntax, my automated refactoring tool for Racket, for over four years now.
It all began one evening in January 2021 when, after years of pondering the question of automated refactoring tooling for Racket, I asked myself a very useful question:
"Wait, how the hell does DrRacket's Macro Stepper actually work?"
I've revisited this question several times over the years. Each time, I've learned something frightening and powerful.
First encounters with eldritch powers
The first time I asked that question, I decided to dig around in DrRacket's codebase in search of answers. I was initially confused because if you crawl through the enormous reference documentation on Racket's macro system and syntax model, it isn't immediately obvious how one could inspect the complex intermediate states of the macro expander. You could manually step through execution repeatedly with expand-once, but this gives you pretty limited information - certainly a lot less than the Macro Stepper reveals. So if DrRacket, adhering to the Racket philosophy of not privileging external tools with extralinguistic mechanisms, is able to keep track of all this state and expose it to users... then there must be some API somewhere that the powers that be Don't Want You To Know About (because it's unstable and a maintenance headache to commit to).
And I found it. It's called current-expand-observe and it lives in the #%expobs kernel module. Here's the Macro Stepper grabbing ahold of it through dynamic shenanigans.
(define current-expand-observe
  (dynamic-require ''#%expobs 'current-expand-observe))
This parameter (in the Racket sense of a parameter, not a normal function parameter) allows one to set the current macro expansion event observation hook, which is a function that receives expansion events from the Racket macro expander every time it does something. Each event comes with a payload; usually some syntax objects and identifiers relevant to the event in question. The specific events and event payloads aren't documented and are far from stable, as they change whenever the expander's implementation details warrant. You can find all of the places where Racket emits these events by searching the Racket implementation for log-expand.
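To make the hook concrete, here is a minimal sketch of installing an observer and counting the events it receives. The helper name count-expansion-events is made up for illustration, and because the hook is undocumented, the two-argument shape of the observer (an event kind plus a payload) is an assumption based on how existing tools call it, not a stable contract.

```racket
#lang racket/base

;; Grab the hook the same way the Macro Stepper does.
(define current-expand-observe
  (dynamic-require ''#%expobs 'current-expand-observe))

;; Hypothetical helper: expand a datum and tally the observed events by kind.
;; Assumption: the observer is called with an event kind and a payload.
(define (count-expansion-events datum)
  (define counts (make-hash))
  (define (observe! kind payload)
    (hash-update! counts kind add1 0))
  (parameterize ([current-namespace (make-base-namespace)]
                 [current-expand-observe observe!])
    (expand datum))
  counts)

(define event-counts
  (count-expansion-events '(let ([x 1]) (+ x x))))
```

Printing event-counts shows which kinds of events fired for even a tiny expression. The exact kinds and counts depend on the Racket version, so treat them as something to explore rather than rely on.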
Background: two core problems for refactoring macros
With this occult tool in hand, I realized I could fairly easily put together a refactoring tool that expands code and keeps track of all of the pieces of code that the expander actually visited. This is important, because macros and quotation mean that it's pretty common to see code that looks like real code but which is actually data. Consider this code:
(define (build-square-expression x)
  `(* ,x ,x))
A naive refactoring tool - even one based on AST transformations - might lodge a complaint here, asking the user why they didn't rewrite (* ,x ,x) to (sqr ,x). But in our case, the code in question isn't code. It's quoted data. (More precisely, it's quasiquoted data.) Macros introduce similar problems via syntax templates, and ensuring a refactoring tool can handle macros is very important because Racketeers absolutely love their macros.
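Running the function shows why the complaint would be wrong. This small sketch only adds a binding named result (an illustrative name) to hold the return value:

```racket
#lang racket/base

(define (build-square-expression x)
  `(* ,x ,x))

;; The function builds a list that starts with the symbol *.
;; No multiplication happens here.
(define result (build-square-expression 3))
```

Here result is the list (* 3 3), not the number 9. Rewriting the template to use sqr would change the data the function returns, which is a behavior change rather than a refactoring.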
We can't even try to work around the problem by writing rules that try to detect when the AST node they're analyzing is underneath a quote. While direct use of quotes can be caught in this manner, as Racketeers we must reckon with the unfortunate truth that the use of quasiquotation could be hidden by a macro. For example:
(define-syntax-rule (my-quote expr)
  (quasiquote expr))
(define (build-square-expression x)
  (my-quote (* ,x ,x)))
In order to tell that (* ,x ,x) is quoted data and not actual code, we have to expand the my-quote macro. Without expansion, we cannot be certain that the code we're looking at is what it appears to be.
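A single expansion step is enough to reveal the hidden quasiquote. The sketch below defines my-quote in a fresh namespace and uses expand-once on a use of it; the names ns, before, and after are illustrative only.

```racket
#lang racket/base

;; A fresh namespace where my-quote is defined at the top level.
(define ns (make-base-namespace))
(eval '(define-syntax-rule (my-quote expr) (quasiquote expr)) ns)

;; The form as written by the programmer.
(define before '(my-quote (* ,x ,x)))

;; The same form after one macro expansion step, converted back to plain data.
(define after
  (parameterize ([current-namespace ns])
    (syntax->datum (expand-once before))))
```

The head of before is my-quote, which says nothing about quoting. The head of after is quasiquote, and only at that point can a tool know that the body is data.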
We could use the expander directly and try to analyze the fully expanded code, but that falls prey to a different problem. A refactoring tool wants to refactor code written by humans, not code generated by macros. Refactoring is meant to make code more readable to humans. If humans aren't reading the code in the first place, as is the case for macro-generated code, there's hardly any point in refactoring it. You don't see the macro-generated code, so why would you care how pretty it looks?
Furthermore, some code might only be needed during expansion. This code effectively vanishes after expansion and won't exist in the fully expanded code. This is usually because the code was used at compile-time, not runtime. If we only examine the fully expanded code, we won't be able to refactor such "disappeared uses".
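As a small illustration of vanishing code, the sketch below fully expands an expression containing a local macro definition and checks whether the transformer survives. The names and the symbols-in helper are made up for this example, and the exact shape of the expanded output may differ between Racket versions.

```racket
#lang racket/base
(require racket/list)

;; An expression whose local macro `one` exists only at compile time.
(define source
  '(let-syntax ([one (syntax-rules () [(_) 1])])
     (one)))

;; Fully expand it in a fresh namespace and convert the result to plain data.
(define expanded
  (parameterize ([current-namespace (make-base-namespace)])
    (syntax->datum (expand source))))

;; Hypothetical helper: every symbol appearing anywhere in a datum.
(define (symbols-in datum)
  (filter symbol? (flatten datum)))

(define transformer-survived?
  (and (memq 'syntax-rules (symbols-in expanded)) #t))
```

The syntax-rules transformer is present in source but absent from expanded, so a tool that only looked at the fully expanded program would never get a chance to refactor it.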
So that's our two big problems: telling the difference between code and data, and telling the difference between code written by humans and code generated by macros.
The genesis of Resyntax
With current-expand-observe, we now have a solution for both of the above problems. We can tell which code is code and which code is data by tracking which forms the expander actually expands. Data doesn't get expanded, because data doesn't do anything on its own. We can also keep track of code at all steps of the expansion, not just the final expansion result: this ensures that even though we're expanding the code, we still have visibility into code that might vanish during expansion. To tell what code is spit out by macros, we can use syntax-original? on the visited syntax objects to determine if the expanded form was produced by a macro or not.
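Putting the two pieces together might look something like the sketch below. This is not Resyntax's actual implementation: the visited-forms helper is hypothetical, and the payload shapes it handles (a bare syntax object, or a pair whose first element is one) are assumptions about an undocumented protocol.

```racket
#lang racket/base

(define current-expand-observe
  (dynamic-require ''#%expobs 'current-expand-observe))

;; Hypothetical helper: expand stx and split the syntax objects seen in event
;; payloads into those that are original source and those that are not.
(define (visited-forms stx)
  (define originals '())
  (define generated '())
  (define (record! form)
    (if (syntax-original? form)
        (set! originals (cons (syntax->datum form) originals))
        (set! generated (cons (syntax->datum form) generated))))
  ;; Assumption: payloads of interest are a syntax object or a pair led by one.
  (define (observe! kind payload)
    (cond
      [(syntax? payload) (record! payload)]
      [(and (pair? payload) (syntax? (car payload))) (record! (car payload))]
      [else (void)]))
  (parameterize ([current-namespace (make-base-namespace)]
                 [current-expand-observe observe!])
    (expand stx))
  (values (reverse originals) (reverse generated)))

;; read-syntax marks what it reads as original, unlike datum->syntax.
(define example-stx
  (read-syntax 'example (open-input-string "(let ([x 2]) `(* ,x ,x))")))

(define-values (original-forms generated-forms)
  (visited-forms example-stx))
```

A real tool would go further and index the original forms by source location, but even this rough split shows the idea: only forms that the expander visited and that syntax-original? accepts are candidates for refactoring.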
Dark magic in hand, I put current-expand-observe to use and built the initial Resyntax prototype on January 8th, 2021. I only had a tiny handful of basic refactoring rules, really just enough to verify that the system worked at all. Which it didn't. I'd forgotten a require import in the example code and had to fix that in the next commit. But then it worked, and there was much rejoicing.
Where things went
After that fateful day, the rest was history. Resyntax has grown a lot of new functionality since then. Over a hundred built-in refactoring rules, an entire DSL for testing them, an extensible API for creating rules, a command-line interface for analyzing and refactoring code, an Autofixer that now generates weekly pull requests to several Racket repositories, and more. But the core has remained unchanged: Resyntax expands your code using Racket's macro expander while listening to Racket for metadata in expansion events, and then Resyntax uses that information to apply its refactoring rules to your code in an intelligent and binding-aware manner.
It's amazing what Racket's macro expander can do.