diff --git a/website/blog/2025/06-20-2025-scala-plugins-in-chisel/index.md b/website/blog/2025/06-20-2025-scala-plugins-in-chisel/index.md new file mode 100644 index 00000000000..b8574746900 --- /dev/null +++ b/website/blog/2025/06-20-2025-scala-plugins-in-chisel/index.md @@ -0,0 +1,337 @@ +--- +authors: + - adkian-sifive +tags: [kb] +slug: scala-plugins-in-chisel +description: Scala compiler plugins for Naming in the Chisel programming language +--- + +# Introduction + +Compiler plugins are a feature of the Scala programming language which +the Chisel eDSL uses extensively. Similar to their counterparts in +Kotlin and Haskell (as GHC plugins), they allow extending the behavior +of the compiler with custom AST inspection and transformation. While +quite a powerful tool, writing and understanding compiler plugins is +often an esoteric art -- plugin code has a tendency to be exceptionally +difficult to decipher by anyone other than its author. + +My goal with this article is to motivate and demystify the subtle art +behind Chisel's naming compiler plugin, thereby providing a much-needed +introductory treatment about it. Further, since the patterns +documented here have emerged as solutions to more general problems +faced by eDSLs, it is hoped that this documentation will serve as a +template for applying them in other DSLs and eDSLs. + +No background knowledge of compiler theory or development is assumed; +however an understanding of basic Chisel and Scala constructs will be +helpful in motivating the examples in the article. Introduction points +for further study about interesting topics will be provided as +*asides*. + +# Compilers + +It seems prudent when talking about compiler plugins to start with a +brief description of *compilers*. The simplest and most general (and +indeed useless) description of a compiler -- including the Scala +compiler -- is that it's a piece of executable software that +transforms one representation of a piece of text to another. 
More +often than not, the initial representation -- the "input" -- is some +kind of human-readable program; the final representation -- the +"output" -- is a machine-executable binary. + +> **_Aside: Who compiles the compiler?_** If the compiler is +> executable software, mustn't it also be compiled? The answer is yes +> -- the code of a compiler is also compiled by some other +> compiler. Most modern compilers are in fact able to compile +> *themselves* -- specifically, version x of a compiler will compile +> version x+1 (or x+0.1) of the same compiler. See ["Wikipedia - +> Bootstrapping"](https://en.wikipedia.org/wiki/Bootstrapping_(compilers)) + +While in theory it's certainly possible for the compiler to go from +zero to hero in a single shot -- that is, transform straight from its +input to the output -- the complexity of modern compilers makes it +more practical to break compilation up into +separate chunks. These chunks are *phases*, which are in turn made up of +separate *transforms*. A phase, or a combination of phases, makes up a +*pass* -- a top-to-bottom transformation of the entire input program. + +Each transform within each phase "reads" its input program and +rewrites it in some predetermined manner before passing it to the next +phase or transform. The earlier phases, which are responsible for +reading the syntax of the program, create what is known as an +*abstract syntax tree*, or AST. This is basically a directed tree +representation of the entire program. The structure of the AST remains +the same for the most part; compiler passes add or remove nodes or +information from the AST to make progress towards the final output. + +# Compilers and eDSLs + +An "embedded domain-specific language" (or eDSL) is a fairly new term +for an old idea -- a compiler which compiles to a higher-level +language from an even higher-level, domain-specific language. The +"domains" here include numerical operations1, database +management2, or even...
hardware generation3, 4. + +General treatment of the theory of eDSLs is still fairly lacking +compared to the theory of programming languages proper, but a +pertinent concept that's often mentioned is the "depth" of embedding of +an eDSL^5^. Depending on how DSLs are embedded in their host +languages, eDSLs can either be "deep" or "shallow" embeddings. A +shallowly embedded language is little more than a library -- import +constructs from the language and you're good to go. On the other hand, +a deeply embedded language uses advanced metaprogramming or custom +compiler passes to extend the host language's parser or type +system and so provide a richer language interface. + +For our current discussion, the key thing that the +depth of embedding influences is the order in which +the host language compiler and the DSL compiler execute. For a shallowly +embedded language, the host language compiler always executes fully before +the DSL compiler kicks in. The deeper the DSL is embedded, the earlier +its compiler kicks in -- its phases can be interspersed with those of +the host language, with custom type or even syntax processing happening +before that of the host language. + +The Chisel programming language is a shallow embedding in the Scala +language -- that is, it is for the most part a Scala library that +allows users to import special constructs for hardware +generation. Barring a couple of exceptions (that happen to be the crux of +this article), the Chisel compiler runs during "Chisel-time", a time +that is always after "Scala-time". This is a key concept for +motivating the naming problem in Chisel and other eDSLs. + +# Compiler plugins + +Various languages provide varying degrees of support for *compiler +plugins* -- a feature for DSL developers that allows adding custom passes which +will be interspersed within the host language's compiler pipeline. + +In Scala, compiler plugin phases run between specified phases of the +host compiler.
These phases can have custom transforms that receive +ASTs from the previous phase, inspect and transform them, and send +them along to the next phase. Compiler plugins are written within the +host Scala compiler's namespace and context. While this provides them +with the full power of the compiler itself, it requires quite a bit of +Scala compiler knowledge to implement. + +> **_Aside: Research plugins_** Scala 3 introduces experimental +> "Research Plugins", which allow plugin developers to completely +> rewrite the ordering of all the phases of the Scala compiler +> pipeline. See ["Scala Reference - Compiler +> Plugins"](https://docs.scala-lang.org/scala3/reference/changed-features/compiler-plugins.html) + +# Naming problem in eDSLs + +Embedded DSLs tend to have a fundamental blind spot -- variable +naming. When writing code in an eDSL, a user might write something +like + +``` +val x = func(42) +``` + +thereby binding the name `x` in the current scope to the result of +invoking `func`. If this compiles fine, the host language +compiler will typically resolve the bound name to the value that the +right-hand side evaluates to. + +Depending on the DSL, the user may reasonably expect the bound name `x` +to appear in the final output of the compiler. This, however, will not happen +if the host language erases the name or changes it to some temporary +contextual name depending on its position in the call stack, as is +common in most compiled languages. The name `x` will hence be lost by +the time the host compiler finishes, and before the DSL compiler runs. + +This means that without any intervention on the part of the DSL +developer, variable names from the input user code cannot always be +expected to be syntactically or semantically preserved in the compiled +code. + +In short, variables used in a Chisel program are all native Scala +variables whose names are only available in Scala-time and not in +Chisel-time. + +Therein lies the problem.
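
To make the blind spot concrete, here is a minimal sketch of a shallowly embedded toy DSL in plain Scala. Everything in it (`ToyDsl`, `Builder`, `wire`, the `_T_n` naming scheme) is hypothetical and invented for illustration; it is not Chisel's API. The library only ever sees the call to `wire()`, so it has to invent names of its own.

```scala
// Hypothetical toy eDSL, for illustration only; this is not Chisel's API.
object ToyDsl {
  final class Signal(val name: String)

  final class Builder {
    private var counter = 0
    private val signals = scala.collection.mutable.ArrayBuffer.empty[Signal]

    // The library only sees this call, never the `val` it is assigned to,
    // so the best it can do is invent a temporary name.
    def wire(): Signal = {
      val s = new Signal(s"_T_$counter")
      counter += 1
      signals += s
      s
    }

    // "DSL-time" output: one declaration per signal created so far.
    def emit: List[String] = signals.toList.map(s => s"wire ${s.name}")
  }
}

object NamingProblemDemo {
  import ToyDsl._

  def build(): List[String] = {
    val b = new Builder
    val x = b.wire()     // the user wrote `x` ...
    val ready = b.wire() // ... and `ready` ...
    b.emit               // ... but neither name reaches the output
  }

  def main(args: Array[String]): Unit =
    build().foreach(println) // prints "wire _T_0" then "wire _T_1"
}
```

The names `x` and `ready` exist only in the Scala source. By the time `emit` runs, nothing in the running program records them, which is the gap the rest of the article is about.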
+ +# Solving the naming problem + +## Chisel naming with compiler plugin + +Predictable user-code naming is especially important in +Chisel as it compiles down to Verilog, where engineers rely on +predictable signal naming schemes from Chisel all the way down +through FIRRTL6 and to Verilog for signal tracing, debugging and +hardware verification. + +> **_Aside: Naming in Chisel_** Naming of user-defined variables has +> evolved significantly in Chisel over each major version and has +> recently stabilized, with now multiple naming schemes available based +> on user requirements -- from simple name "suggestions" to the +> compiler to more complex custom name overrides. See ["Chisel +> Explanations - +> Naming"](https://www.chisel-lang.org/docs/explanations/naming) + +To capture as raw of a user-code name as possible, Chisel runs a +custom name-grabber pass right after Scala finishes constructing a +typed AST. Just naively capturing variable names won't do, however, +since Chisel only cares about names of *Chisel types* which will in +turn become hardware names in the final Verilog. + +To make sure the plugin only grabs the names of Chisel types, the +transform methods in the custom naming phase inspect the right hand +side of every `val` definition -- sometimes even recursively, if +needed, for boxing types such as `Option`. + +> **_Aside: Chisel... types?_** I say "Chisel types" here which is a +> bit of a misnomer. Chisel actually doesn't have its own type system +> -- it relies on a system of objects to create instances of subtypes +> of the Data class. This system has its pros and cons and shall +> perhaps be a topic of a future article. For now, see ["Chisel +> Explanations - +> Datatypes"](https://www.chisel-lang.org/docs/explanations/data-types) + +In the internal representation of the syntax tree, each AST node +refers to some statement in the language; the ones we are interested +in for variable naming are of the type `val x = func(42)`. 
When Scala's +parser and typer phases have run over this statement, the internal +representation of this statement in the compiler looks something like: + +``` +ValDef( + "x", + TypeTree[TypeRef(ThisType(...))], + Apply( + Ident(func), + List(Literal(Constant(42))) + ) +) +``` + +`ValDef`, `TypeTree`, `Apply`, `Ident` and so on are internal tree types of +the Scala compiler which can be composed to create an AST +representation. + +Once a Chisel type has been detected by the naming transformation, it +inspects the syntax tree of the AST node to extract the variable name +of the current `val`. This can be done fairly easily by pattern matching on +the `ValDef` as seen above to extract the first field, which is the +variable name. The naming phase now knows that there's a `val` +definition bound to "x" with a type we're interested in. + +Next, we need to propagate this name from Scala-time to Chisel-time, +so that Chisel can sanitize it if necessary and lower it correctly to +FIRRTL. This is done using the `withName` method that's part of Chisel +`core`. The naming plugin rewrites the AST node of the `val` +definition to insert a call to the `withName` method on the RHS of the +val definition. The above statement will hence become: + +``` +val x = chisel3.withName("x")(func(42)) +``` + +The `withName` insertion into the AST node effectively *stages* the +addition of a string name until Chisel-time, when Chisel internals +process naming given the Chisel-time context of the variable +definition statement. The Chisel-time name processing is its own +beast and deserves its own article.
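
The rewrite-then-stage idea can be sketched without touching the real compiler. The block below uses toy case classes that borrow the names `ValDef`, `Apply`, `Ident` and `Literal` from the dump above, but they are hypothetical stand-ins and much simpler than the Scala compiler's actual trees. The `isHardwareType` check and the toy `withName` are likewise assumptions made for illustration, not the plugin's or Chisel's real implementation.

```scala
// Toy stand-ins for compiler trees. The names mirror the dump above, but these
// case classes are hypothetical and far simpler than scalac's real tree types.
object ToyTrees {
  sealed trait Tree
  final case class Ident(name: String) extends Tree
  final case class Literal(value: Any) extends Tree
  final case class Apply(fun: Tree, args: List[Tree]) extends Tree
  final case class ValDef(name: String, tpe: String, rhs: Tree) extends Tree

  // Hypothetical stand-in for the plugin's "is this a Chisel type?" check.
  def isHardwareType(tpe: String): Boolean = tpe == "Data"

  // Rewrite `val x = rhs` into `val x = withName("x")(rhs)` for matching types.
  def transform(tree: Tree): Tree = tree match {
    case ValDef(name, tpe, rhs) if isHardwareType(tpe) =>
      val wrapped = Apply(Apply(Ident("withName"), List(Literal(name))), List(rhs))
      ValDef(name, tpe, wrapped)
    case other => other
  }
}

// Toy runtime side: what the staged call does once the program actually runs.
object ToyRuntime {
  final class Signal { var name: Option[String] = None }

  // Evaluates the body first, then attaches the staged name to the result.
  def withName[T](name: String)(body: => T): T = {
    val result = body
    result match {
      case s: Signal => s.name = Some(name)
      case _         => ()
    }
    result
  }
}

object StagedNamingDemo {
  import ToyTrees._
  import ToyRuntime._

  def func(n: Int): Signal = new Signal

  def main(args: Array[String]): Unit = {
    // Compile-time half: the tree for `val x = func(42)` gets wrapped.
    val before = ValDef("x", "Data", Apply(Ident("func"), List(Literal(42))))
    println(transform(before))

    // Run-time half: the hand-written equivalent of the rewritten statement.
    val x = withName("x")(func(42))
    println(x.name) // prints "Some(x)"
  }
}
```

The point of the split is that the string `"x"` is captured while the name is still visible in the tree, and consumed later by ordinary library code. A `val` whose type fails the check is left untouched by `transform`.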
+ + +## Summary: Chisel naming plugin + +Here's a short top-to-bottom summary of the naming plugin: + +- User writes Chisel code and compiles it with the latest Chisel + compiler +- The Chisel compiler registers a naming phase with the Scala + compiler, and the build tool runs the Scala compiler +- During Scala compile time, after the Scala compiler's parser and + typer phases are finished running, the naming plugin receives the + AST from the typer phase and runs transformations on each `ValDef` + definition it encounters. +- For each `ValDef` AST node, it inspects the right-hand side of the + statement to check if somewhere in the leaf nodes of the thing being + defined is a Chisel type. +- If a Chisel type is found, it extracts the variable name from the + left-hand side +- The plugin rewrites the RHS of the statement with a call to + `chisel3.withName`, thereby staging the variable name of the + statement into a method that executes at Chisel-time +- During Chisel-time, Chisel executes `withName` and names the + variables based on the Chisel-time context at the statement + +## The reality of naming: Diversity of solutions + +### Chisel pre-3.4 + +Chisel's use of a Scala compiler plugin was a fairly recent +innovation. Previously, Chisel relied on Scala 2's "Macro Paradise" +plugin -- which itself is a compiler plugin -- to apply a +`suggestName` for each declared variable. This was considered fragile +for several reasons, including the fact that the macro invocation +would be handled in Chisel-time, not Scala-time. This meant that names +were harvested after Scala had completed all phases and names could +not necessarily be deterministically computed from inputs. + +### Other DSLs + +A survey of existing implementations of eDSLs in different functional +languages seems to suggest that the naming problem is somewhat +ubiquitous. In most languages, some form of compile-time reflection is +needed to capture user code naming. 
+ +Unsurprisingly, the "purest" solution to the naming problem that +falls naturally from the host language's metaprogramming features is +in Lisp. This is because of Lisp's so-called *homoiconicity* -- its +ability to treat representation of its own code as data itself. This +means that any Lisp macro defined in the DSL implementation can *see* +the symbol that user code references as data.7 + +The Clash programming language, a fellow hardware eDSL implemented in +Haskell, has a slight advantage due to its deeper embedding. Clash has +access to constructs within Haskell core including the OccName8 data +structure that contains plaintext name and namespace information for +every declared symbol in user code. Clash utilizes the plaintext names +from OccName and applies similar disambiguation and sanitization as +Chisel core. + +Most modern Haskell eDSLs whose compilers run separately from the GHC +tend to use Template Haskell, an experimental metaprogramming +facility.9 + +Interestingly enough, an older staged DSL implemented in Haskell +called Paradise came up with a solution identical to the one currently +implemented in Chisel where the GHC preprocessor inserts calls to an +annotation function coincidentally also called `withName`.10 + +# Final Remarks + +Chisel naming has come a long way, and after undergoing heavy +utilization and customization in mission-critical applications at +[SiFive](https://sifive.com), can safely be deemed to be stable. With the +ongoing work of adding support for Scala 3 in Chisel, we're hoping to +develop cleaner and more readable Scala compiler plugins. 
+ +# References and further reading +1[Typelevel - Spire ](https://typelevel.org/spire/) + +2[Apache Spark](https://spark.apache.org/) + +3[Clash Programming Language](https://clash-lang.org/) + +4[Spatial Programming Language](https://spatial-lang.org/) + +5[Folding Domain-Specific Languages: Deep and Shallow Embeddings](https://www.cs.ox.ac.uk/jeremy.gibbons/publications/embedding.pdf) + +See also [YouTube: Tiark Rompf - DSL Embedding in Scala](https://www.youtube.com/watch?v=16A1yemmx-w) + +6[The FIRRTL Spec](https://github.com/chipsalliance/firrtl-spec) + +7[Common Lisp CookBook](https://cl-cookbook.sourceforge.net/clos-tutorial/index.html) + +8[OccName in +Haskell](https://downloads.haskell.org/~ghc/6.10.2/docs/html/libraries/ghc/OccName.html) + +9[Naming with Template Haskell](https://markkarpov.com/tutorial/th?#names) + +10[Paradise: A two-stage DSL embedded in Haskell](https://urchin.earth.li/~ganesh/icfp08.pdf) diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 31d5a350567..ba9833b2e9b 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -17,3 +17,11 @@ jackkoenig: x: jackakattack github: jackkoenig linkedin: koenigjack +adkian-sifive: + name: Aditya Naik + title: Senior Engineer at SiFive + page: true + socials: + x: 0x7felf + github: adkian-sifive + linkedin: 0x7felf diff --git a/website/blog/tags.yml b/website/blog/tags.yml index 864d173a367..5e868081cf9 100644 --- a/website/blog/tags.yml +++ b/website/blog/tags.yml @@ -5,3 +5,6 @@ release: talk: label: "Talk/Publication" description: "Blog posts referring to talks or other publications" +kb: + label: "Knowledge base" + description: "Knowledge sharing about Chisel internals, features and other topics"