Commit b3c2556
Improve Oxide candidate extractor [0] (#16306)
This PR adds a new candidate[^candidate] extractor with 2 major goals in
mind:
1. It must be way easier to reason about and maintain.
2. It must have performance on par with or better than the current
candidate extractor.
### Problem
Candidate extraction is a bit of a wild west in Tailwind CSS, and it's a
critical step: it makes sure that all your classes are picked up
correctly so that your website/app looks good.
One issue we run into is that Tailwind CSS is used in many different
"host" languages and frameworks with their own syntax. It's not only
used in HTML but also in JSX/TSX, Vue, Svelte, Angular, Pug, Rust, PHP,
Rails, Clojure, .NET, … the list goes on and all of these have different
syntaxes. Introducing dedicated parsers for each of these languages
would be a huge maintenance burden because there will be new languages
and frameworks coming up all the time. The best thing we can do is make
assumptions and so far we've done a pretty good job at that.
The only certainty we have is that there is at least _some_ structure to
the possible Tailwind classes used in a file. E.g.: `abc#def` is
definitely not a valid class, `hover:flex` definitely is. In a perfect
world we would limit the characters that can be used and define a formal
grammar that each candidate must follow, but that's not really an option
right now (maybe this is something we can implement in future major
versions).
The current candidate extractor we have has grown organically over time
and required patching things here and there to make it work in various
scenarios (and edge cases due to the different languages Tailwind is
used in).
While there is definitely some structure, we essentially work in 2
phases:
1. Try to extract `0..n` candidates. (This is the hard part)
2. Validate each candidate to make sure they are valid looking classes
(by validating against the few rules we have)
Another reason the current extractor is hard to reason about is that we
need it to be fast and that comes with some trade-offs to readability
and maintainability.
Unfortunately there will always be a lot of false positives, but if we
extract more classes than necessary then that's fine. It's only when we
pass the candidates to the core engine that we will know for sure if
they are valid or not. (we have some ideas to limit the amount of false
positives but that's for another time)
### Solution
With the introduction of Tailwind CSS v4, we reworked the internals
quite a bit and we have a dedicated internal AST structure for
candidates. For example, if you take a look at this:
```html
<div class="[@media(pointer:fine)]:data-[state=pending]:hover:text-red-500/(--my-opacity)"></div>
```
<details>
<summary>This will be parsed into the following AST:</summary>
```json
[
  {
    "kind": "functional",
    "root": "text",
    "value": {
      "kind": "named",
      "value": "red-500",
      "fraction": null
    },
    "modifier": {
      "kind": "arbitrary",
      "value": "var(--my-opacity)"
    },
    "variants": [
      {
        "kind": "static",
        "root": "hover"
      },
      {
        "kind": "functional",
        "root": "data",
        "value": {
          "kind": "arbitrary",
          "value": "state=pending"
        },
        "modifier": null
      },
      {
        "kind": "arbitrary",
        "selector": "@media(pointer:fine)",
        "relative": false
      }
    ],
    "important": false,
    "raw": "[@media(pointer:fine)]:data-[state=pending]:hover:text-red-500/(--my-opacity)"
  }
]
```
</details>
We have a lot of information here and we gave these patterns a name
internally. You'll see names like `functional`, `static`, `arbitrary`,
`modifier`, `variant`, `compound`, ...
Some of these patterns will be important for the new candidate extractor
as well:

| Name | Example | Description |
| -------------------------- | ----------------- | ----------- |
| Static utility (named) | `flex` | A simple utility with no inputs whatsoever |
| Functional utility (named) | `bg-red-500` | A utility `bg` with an input that is named `red-500` |
| Arbitrary value | `bg-[#0088cc]` | A utility `bg` with an input that is arbitrary, denoted by `[…]` |
| Arbitrary variable | `bg-(--my-color)` | A utility `bg` with an input that is arbitrary and has a CSS variable shorthand, denoted by `(--…)` |
| Arbitrary property | `[color:red]` | A utility that sets a property to a value on the fly |

A similar structure exists for modifiers, where each modifier must start
with `/`:

| Name | Example | Description |
| ------------------ | --------------------------- | ----------- |
| Named modifier | bg-red-500`/20` | A named modifier |
| Arbitrary value | bg-red-500`/[20%]` | An arbitrary value, denoted by `/[…]` |
| Arbitrary variable | bg-red-500`/(--my-opacity)` | An arbitrary variable, denoted by `/(…)` |

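The modifier rules in this table boil down to looking at what follows the `/`. As an illustration only, here is a hypothetical classifier (`ModifierKind` and `classify_modifier` are made-up names, not the extractor's API) that checks the shape of a modifier without validating its contents:

```rs
#[derive(Debug, PartialEq)]
enum ModifierKind {
    Named,             // e.g. `/20`
    ArbitraryValue,    // e.g. `/[20%]`
    ArbitraryVariable, // e.g. `/(--my-opacity)`
}

fn classify_modifier(modifier: &str) -> Option<ModifierKind> {
    // Every modifier must start with a `/`.
    let rest = modifier.strip_prefix('/')?;
    match rest.as_bytes() {
        // `/[…]`: handled by the arbitrary value rules.
        [b'[', .., b']'] => Some(ModifierKind::ArbitraryValue),
        // `/(…)`: handled by the arbitrary variable rules.
        [b'(', .., b')'] => Some(ModifierKind::ArbitraryVariable),
        // Anything else must be a non-empty, simple name (simplified rule).
        bytes
            if !bytes.is_empty()
                && bytes
                    .iter()
                    .all(|b| b.is_ascii_alphanumeric() || *b == b'-' || *b == b'_') =>
        {
            Some(ModifierKind::Named)
        }
        _ => None,
    }
}
```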
Last but not least, we have variants. They have a very similar pattern
but they _must_ end in a `:`.

| Name | Example | Description |
| ------------------ | --------------------------- | ----------- |
| Named variant | `hover:` | A named variant |
| Arbitrary value | `data-[state=pending]:` | An arbitrary value, denoted by `[…]` |
| Arbitrary variable | `supports-(--my-variable):` | An arbitrary variable, denoted by `(…)` |
| Arbitrary variant | `[@media(pointer:fine)]:` | Similar to arbitrary properties, this will generate a variant on the fly |

The goal with the new extractor is to encode these separate patterns in
dedicated pieces of code (we called them "machines" because they are
mostly state machine based and because I've been watching Person of
Interest but I digress).
This will allow us to focus on each pattern separately, so if there is a
bug or some new syntax we want to support we can add it to those
machines.
One nice benefit of this is that we can encode the rules and handle
validation as we go. The moment we know that some pattern is invalid, we
can bail out early.
At the time of writing this, there are a bunch of machines:
<details>
<summary>Overview of the machines</summary>
- `ArbitraryPropertyMachine`
Extracts candidates such as `[color:red]`. Some of the rules are:
1. There must be a property name
2. There must be a `:`
3. There must be a value
There cannot be any spaces, and the brackets are included. If the property
is a CSS variable, it must be a valid CSS variable (uses the
`CssVariableMachine`).
```
[color:red]
^^^^^^^^^^^
[--my-color:red]
^^^^^^^^^^^^^^^^
```
Depends on the `StringMachine` and `CssVariableMachine`.
- `ArbitraryValueMachine`
Extracts arbitrary values for utilities and modifiers including the
brackets:
```
bg-[#0088cc]
   ^^^^^^^^^
bg-red-500/[20%]
           ^^^^^
```
Depends on the `StringMachine`.
- `ArbitraryVariableMachine`
Extracts arbitrary variables including the parentheses. The first
argument must be a valid CSS variable, the other arguments are optional
fallback arguments.
```
(--my-value)
^^^^^^^^^^^^
bg-red-500/(--my-opacity)
           ^^^^^^^^^^^^^^
```
Depends on the `StringMachine` and `CssVariableMachine`.
- `CandidateMachine`
Uses the variant machine and utility machine. It will make sure that 0
or more variants are directly touching and followed by a utility.
```
hover:focus:flex
^^^^^^^^^^^^^^^^
aria-invalid:bg-red-500/(--my-opacity)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
Depends on the `VariantMachine` and `UtilityMachine`.
- `CssVariableMachine`
Extracts CSS variables. They must start with `--`, must contain at
least one alphanumeric character, `-`, or `_`, and can contain any
escaped character (except for whitespace).
```
bg-(--my-color)
    ^^^^^^^^^^
bg-red-500/(--my-opacity)
            ^^^^^^^^^^^^
bg-(--my-color)/(--my-opacity)
    ^^^^^^^^^^   ^^^^^^^^^^^^
```
- `ModifierMachine`
Extracts modifiers including the `/`
- `/[` will delegate to the `ArbitraryValueMachine`
- `/(` will delegate to the `ArbitraryVariableMachine`
```
bg-red-500/20
          ^^^
bg-red-500/[20%]
          ^^^^^^
bg-red-500/(--my-opacity)
          ^^^^^^^^^^^^^^^
```
Depends on the `ArbitraryValueMachine` and `ArbitraryVariableMachine`.
- `NamedUtilityMachine`
Extracts named utilities regardless of whether they are functional or
static.
```
flex
^^^^
px-2.5
^^^^^^
```
This includes rules like: A `.` must be surrounded by digits.
Depends on the `ArbitraryValueMachine` and `ArbitraryVariableMachine`.
- `NamedVariantMachine`
Extracts named variants regardless of whether they are functional or
static. This is very similar to the `NamedUtilityMachine` but with
different rules. We could combine them, but splitting things up makes it
easier to reason about.
Another rule is that the `:` must be included.
```
hover:flex
^^^^^^
data-[state=pending]:flex
^^^^^^^^^^^^^^^^^^^^^
supports-(--my-variable):flex
^^^^^^^^^^^^^^^^^^^^^^^^^
```
Depends on the `ArbitraryVariableMachine`, `ArbitraryValueMachine`, and
`ModifierMachine`.
- `StringMachine`
This is a low-level machine that is used by various other machines. Its
only job is to extract strings that start with double quotes,
single quotes or backticks.
We have this because once you are inside a string, brackets, parens and
curlies don't have to be balanced. Making sure that brackets are
properly balanced is handled by the other machines.
```
content-["Hello_World!"]
         ^^^^^^^^^^^^^^
bg-[url("https://example.com")]
        ^^^^^^^^^^^^^^^^^^^^^
```
- `UtilityMachine`
Extracts utilities, it will use the lower level `NamedUtilityMachine`,
`ArbitraryPropertyMachine` and `ModifierMachine` to extract the utility.
It will also handle important markers (including the legacy important
marker).
```
flex
^^^^
bg-red-500/20
^^^^^^^^^^^^^
!bg-red-500/20 Legacy important marker
^^^^^^^^^^^^^^
bg-red-500/20! New important marker
^^^^^^^^^^^^^^
!bg-red-500/20! Both, but this is considered invalid
^^^^^^^^^^^^^^^
```
Depends on the `ArbitraryPropertyMachine`, `NamedUtilityMachine`, and
`ModifierMachine`.
- `VariantMachine`
Extracts variants, it will use the lower level `NamedVariantMachine` and
`ArbitraryValueMachine` to extract the variant.
```
hover:focus:flex
^^^^^^
      ^^^^^^
```
Depends on the `NamedVariantMachine` and `ArbitraryValueMachine`.
</details>
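To make the split between the `ArbitraryValueMachine` and the `StringMachine` more concrete, here is a minimal sketch (not the actual implementation) of finding the end of an arbitrary value: `[`, `(` and `{` must be balanced, while anything inside a quoted string is skipped. The function name is hypothetical, and rejecting whitespace is an assumption based on candidates not containing spaces. The returned index is inclusive:

```rs
/// Hypothetical sketch: given `input[start] == b'['`, return the index of the
/// matching `]`, or `None` if the brackets never balance.
fn arbitrary_value_end(input: &[u8], start: usize) -> Option<usize> {
    if input.get(start) != Some(&b'[') {
        return None;
    }
    // Stack of closing bytes we still expect to see.
    let mut expected: Vec<u8> = vec![b']'];
    let mut i = start + 1;
    while i < input.len() {
        match input[i] {
            // Strings are skipped entirely: brackets inside them don't count.
            quote @ (b'"' | b'\'' | b'`') => {
                i += 1;
                loop {
                    match input.get(i) {
                        None => return None,             // unterminated string
                        Some(b'\\') => i += 2,           // skip escaped byte
                        Some(&c) if c == quote => break, // closing quote
                        Some(_) => i += 1,
                    }
                }
            }
            b'[' => expected.push(b']'),
            b'(' => expected.push(b')'),
            b'{' => expected.push(b'}'),
            c @ (b']' | b')' | b'}') => {
                // A closer must match the most recent opener.
                if expected.pop() != Some(c) {
                    return None;
                }
                if expected.is_empty() {
                    return Some(i);
                }
            }
            // Assumption: candidates never contain whitespace.
            c if c.is_ascii_whitespace() => return None,
            _ => {}
        }
        i += 1;
    }
    None
}
```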
One important thing to know here is that each machine runs to
completion. They all implement a `Machine` trait that has a
`next(cursor)` method and returns a `MachineState`.
The `MachineState` looks like this:
```rs
enum MachineState {
    Idle,
    Done(Span),
}
```
Where a `Span` is just the location in the input where the candidate was
found.
```rs
struct Span {
    pub start: usize,
    pub end: usize,
}
```
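Here is a hypothetical sketch of how these types could fit together. It assumes a `Cursor` that only holds the input and a position (the real `Cursor` also exposes `prev`, `curr` and `next`) and a `Machine` trait with a single `next` method. The `CssVariableMachine` below follows the rules from the overview and runs to completion in one call. It is an illustration, not the code from this PR:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize, // inclusive
}

#[derive(Debug, PartialEq)]
enum MachineState {
    Idle,
    Done(Span),
}

/// Hypothetical cursor: the input plus the current position.
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

/// Hypothetical trait: a machine runs to completion in a single call.
trait Machine {
    fn next(&mut self, cursor: &mut Cursor<'_>) -> MachineState;
}

/// Sketch: `--`, then at least one alphanumeric byte, `-` or `_`;
/// a `\` escapes any following non-whitespace byte.
struct CssVariableMachine;

impl Machine for CssVariableMachine {
    fn next(&mut self, cursor: &mut Cursor<'_>) -> MachineState {
        let input = cursor.input;
        let start = cursor.pos;

        // A CSS variable must start with `--`.
        if input.get(start..start + 2) != Some(&b"--"[..]) {
            return MachineState::Idle;
        }

        let mut i = start + 2;
        while i < input.len() {
            match input[i] {
                b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'-' | b'_' => i += 1,
                // An escape must be followed by a non-whitespace byte.
                b'\\' => match input.get(i + 1) {
                    Some(c) if !c.is_ascii_whitespace() => i += 2,
                    _ => return MachineState::Idle,
                },
                // Anything else ends the variable.
                _ => break,
            }
        }

        // `--` on its own is not a valid variable.
        if i == start + 2 {
            return MachineState::Idle;
        }

        // Leave the cursor on the last byte of the span.
        cursor.pos = i - 1;
        MachineState::Done(Span { start, end: i - 1 })
    }
}
```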
#### Complexities
**Boundary characters:**
When running to completion, these machines typically don't check for
boundary characters; the wrapping `CandidateMachine` does that.
A boundary character is a character that touches the candidate but is
known not to be part of it.
```html
<div class="flex"></div>
<!--       ^    ^ -->
```
The quotes are touching the candidate `flex`, but they will not be part
of the candidate itself, so this is considered a valid candidate.
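A minimal sketch of that check, assuming for illustration that only whitespace, quotes and the start/end of the input count as boundaries (the real set is larger and is not listed here). `is_boundary` and `has_boundaries` are hypothetical names, and spans use inclusive ends:

```rs
/// Hypothetical, deliberately small set of boundary bytes: whitespace and
/// quotes. `None` stands for the start or end of the input.
fn is_boundary(byte: Option<u8>) -> bool {
    match byte {
        None => true,
        Some(b) => b.is_ascii_whitespace() || matches!(b, b'"' | b'\'' | b'`'),
    }
}

/// Accept the inclusive span `start..=end` only if the bytes directly
/// before and after it are boundaries.
fn has_boundaries(input: &[u8], start: usize, end: usize) -> bool {
    let before = if start == 0 { None } else { input.get(start - 1).copied() };
    let after = input.get(end + 1).copied();
    is_boundary(before) && is_boundary(after)
}
```

With this subset, `!filter` in `if (!filter) return true` is rejected, since `(` and `)` are not boundary characters.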
**What to pick?**
Let's imagine you are parsing this input:
```html
<div class="hover:flex"></div>
```
The `UtilityMachine` will find `hover` and `flex`. The `VariantMachine`
will find `hover:`. This means that at a certain point in the
`CandidateMachine` you will see something like this:
```rs
let variant_machine_state = variant_machine.next(cursor);
// MachineState::Done(Span { start: 12, end: 17 }) // `hover:`
let utility_machine_state = utility_machine.next(cursor);
// MachineState::Done(Span { start: 12, end: 16 }) // `hover`
```
They are both done, but which one do we pick? In this scenario we will
always pick the variant because its range will always be 1 character
longer than the utility.
Of course there is an exception to this rule and it has to do with the
fact that Tailwind CSS can be used in different languages and
frameworks. A lot of people use `clsx` for dynamically applying classes
to their React components. E.g.:
```tsx
<div
  class={clsx({
    underline: someCondition(),
  })}
></div>
```
In this scenario, we will see `underline:` as a variant and `underline`
as a utility. We pick the utility here because the next character is
whitespace, so `underline:` can never be part of a valid candidate
(variants and utilities must be touching). Another reason this is valid
is that there wasn't a variant present prior to this candidate.
E.g.:
```tsx
<div
  class={clsx({
    hover:underline: someCondition(),
  })}
></div>
```
This will be considered invalid. If you do want this, you should use
quotes.
E.g.:
```tsx
<div
  class={clsx({
    'hover:underline': someCondition(),
  })}
></div>
```
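The rules above can be condensed into a small, hypothetical helper. `Pick`, `pick` and the `seen_variant` flag are made up for this sketch and are not how the `CandidateMachine` is structured; spans use inclusive ends, as in the `hover:` example:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize, // inclusive
}

#[derive(Debug, PartialEq)]
enum Pick {
    Variant(Span),
    Utility(Span),
    Invalid,
}

/// Hypothetical helper: both spans start at the same position. `seen_variant`
/// is true if a variant was already consumed for the current candidate.
fn pick(input: &[u8], variant: Option<Span>, utility: Option<Span>, seen_variant: bool) -> Pick {
    match (variant, utility) {
        (Some(v), Some(u)) => match input.get(v.end + 1) {
            // Something touches the `:`, so keep going as a variant.
            Some(b) if !b.is_ascii_whitespace() => Pick::Variant(v),
            // `underline: …` as a JS object key: fine without earlier variants.
            _ if !seen_variant => Pick::Utility(u),
            // `hover:underline: …`: variants followed by nothing.
            _ => Pick::Invalid,
        },
        (Some(v), None) => Pick::Variant(v),
        (None, Some(u)) => Pick::Utility(u),
        (None, None) => Pick::Invalid,
    }
}
```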
**Overlapping/covered spans:**
Another complexity is that the extracted spans for candidates can and
will overlap. Let's take a look at this C# example:
```csharp
public enum StackSpacing
{
    [CssClass("gap-y-4")]
    Small,
    [CssClass("gap-y-6")]
    Medium,
    [CssClass("gap-y-8")]
    Large
}
```
In this scenario, `[CssClass("gap-y-4")]` starts with a `[` so we have a
few options here:
1. It is an arbitrary property, e.g.: `[color:red]`
2. It is an arbitrary variant, e.g.: `[@media(pointer:fine)]:`
When running the parsers, both the `VariantMachine` and the
`UtilityMachine` will run to completion but end up in a
`MachineState::Idle` state.
- It is not a valid variant, because it doesn't end with a `:`.
- It's also not a valid arbitrary property, because it doesn't include a
`:` to separate the property from the value.
Looking at the code as a human, it's very clear what this is supposed to
be, but not from the individual machines' perspective.
Obviously we want to extract the `gap-y-*` classes here.
To solve this problem, we will run over an additional slice of the
input, starting at the position before the machines started parsing
until the position where the machines stopped parsing.
That slice will be this one: `[CssClass("gap-y-6")]` (we already skipped
over the whitespace). Now, for every `[` character we see, we will start a
new `CandidateMachine` right after the `[`'s position and run the
machines over that slice. This will now eventually extract the `gap-y-6`
class.
The next question is, what if there was a `:` (e.g.:
`[CssClass("gap-y-6")]:`), then the `VariantMachine` would complete, but
the `UtilityMachine` will not, because nothing exists after it. We will apply
the same idea in this case.
Another issue is if we _do_ have actual overlapping ranges. E.g.: `let
classes = ['[color:red]'];`. This will extract both the `[color:red]`
and `color:red` classes. You have to use your imagination, but the last
one has the exact same structure as `hover:flex` (variant + utility).
In this case we will make sure to drop spans that are covered by other
spans.
The extracted `Span`s are all valid candidates, so if the outermost
candidate is valid, we can throw away the inner candidate.
```
Position: 11112222222
          67890123456
          ↓↓↓↓↓↓↓↓↓↓↓
Span { start: 17, end: 25 } // color:red
Span { start: 16, end: 26 } // [color:red]
```
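Dropping covered spans can be sketched as follows. This is a minimal version that assumes the inclusive `start`/`end` convention used above; `drop_covered_spans` is illustrative, not the extractor's code. Sorting by start, and by descending end for ties, guarantees that an outer span is seen before anything it covers, so each span only needs to be compared with the last span that was kept:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize, // inclusive
}

/// Hypothetical helper: keep only spans that are not fully covered by
/// another span.
fn drop_covered_spans(mut spans: Vec<Span>) -> Vec<Span> {
    // Sort by `start` ascending and, for equal starts, by `end` descending.
    spans.sort_by(|a, b| a.start.cmp(&b.start).then(b.end.cmp(&a.end)));

    let mut kept: Vec<Span> = Vec::with_capacity(spans.len());
    for span in spans {
        // Kept spans have strictly increasing `end`s, so the last one reaches
        // furthest: if it doesn't cover `span`, no earlier kept span does.
        let covered = kept.last().map_or(false, |last| span.end <= last.end);
        if !covered {
            kept.push(span);
        }
    }
    kept
}
```

Given the two spans from the example, only `Span { start: 16, end: 26 }` (`[color:red]`) is kept.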
#### Exceptions
**JavaScript keys as candidates:**
We already talked about the `clsx` scenario, but there are a few more
exceptions, and they have to do with different syntaxes.
**CSS class shorthand in certain templating languages:**
In Pug and Slim, you can have a syntax like this:
```pug
.flex.underline
  div Hello World
```
<details>
<summary>Generated HTML</summary>
```html
<div class="flex underline">
  <div>Hello World</div>
</div>
```
</details>
We have to make sure that in these scenarios the `.` is a valid boundary
character. For this, we introduce a pre-processing step that massages the
input a little bit to improve extraction. The pre-processing must not make
the input shorter or longer, otherwise the extracted positions would be off.
In this scenario, we could simply replace the `.` with a space. But of
course, there are scenarios in these languages where it's not safe to do
that.
If you want to use `px-2.5` with this syntax, then you'd write:
```pug
.flex.px-2.5
  div Hello World
```
But that doesn't work, because it technically means `flex`, `px-2`, and
`5` as classes.
You can use this syntax to get around that:
```pug
div(class="px-2.5")
  div Hello World
```
<details>
<summary>Generated HTML</summary>
```html
<div class="px-2.5">
  <div>Hello World</div>
</div>
```
</details>
This means that we can't simply replace every `.` with a space; we have to
parse the input. Luckily we only care about strings (and we have a
`StringMachine` for that), so we skip replacing `.` inside of strings.
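A minimal, hypothetical sketch of such a pre-processor (the function name is made up, and the real pre-processors may handle more cases). It only replaces bytes in place, so the output has the same length as the input, and it skips over quoted strings so that `px-2.5` inside `class="…"` is preserved:

```rs
/// Hypothetical sketch for Pug/Slim-style class shorthands: every `.`
/// outside of a string becomes a space. Bytes are only replaced, never
/// inserted or removed, so the output has the same length as the input.
fn replace_dots_outside_strings(input: &[u8]) -> Vec<u8> {
    let mut output = input.to_vec();
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            // Skip over a string so that e.g. `class="px-2.5"` stays intact.
            quote @ (b'"' | b'\'' | b'`') => {
                i += 1;
                while i < input.len() && input[i] != quote {
                    // A backslash escapes the next byte, including a quote.
                    if input[i] == b'\\' {
                        i += 1;
                    }
                    i += 1;
                }
            }
            b'.' => output[i] = b' ',
            _ => {}
        }
        i += 1;
    }
    output
}
```

Applied to `.flex.px-2.5`, it produces ` flex px-2 5`, which is exactly why that shorthand can't express `px-2.5`.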
**Ruby's weird string syntax:**
```ruby
%w[flex underline]
```
This is valid syntax and is shorthand for:
```ruby
["flex", "underline"]
```
Luckily this problem is solved by running the sub-machines after
each `[` character.
### Performance
**Testing:**
Each machine has a `test_…_performance` test (that is ignored by
default) that allows you to test the throughput of that machine. If you
want to run them, you can use the following command:
```sh
cargo test test_variant_machine_performance --release -- --ignored
```
This will run the test in release mode and allows you to run the ignored
test.
> [!CAUTION]
> This test **_will_** fail, but it will print some output. E.g.:
```
tailwindcss_oxide::extractor::variant_machine::VariantMachine: Throughput: 737.75 MB/s over 0.02s
tailwindcss_oxide::extractor::variant_machine::VariantMachine: Duration: 500ns
```
**Readability:**
One thing to note when looking at the code is that it's not always
written in the cleanest way but we had to make some sacrifices for
performance reasons.
The `input` is of type `&[u8]`, so we are already dealing with bytes.
Luckily, Rust has some nice ergonomics to easily write `b'['` instead of
`0x5b`.
A concrete example where we had to sacrifice readability is the state
machines where we check the `previous`, `current` and `next` character
to make decisions. For a named utility one of the rules is that a `.`
must be preceded by and followed by a digit. This can be written as:
```rs
match (cursor.prev, cursor.curr, cursor.next) {
(b'0'..=b'9', b'.', b'0'..=b'9') => { /* … */ }
_ => { /* … */ }
}
```
But this is not very fast because Rust can't optimize the match
statement very well, especially because we are dealing with tuples
containing 3 values and each value is a `u8`.
To solve this we use some nesting, once we reach `b'.'` only then will
we check for the previous and next characters. We will also early return
in most places. If the previous character is not a digit, there is no
need to check the next character.
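The nested version could look roughly like this. It is a sketch with a hypothetical `Cursor` holding plain `prev`, `curr` and `next` bytes, and the set of other allowed bytes in the second arm is simplified for illustration:

```rs
/// Hypothetical cursor exposing the previous, current and next byte.
struct Cursor {
    prev: u8,
    curr: u8,
    next: u8,
}

/// Only look at `prev`/`next` once we know `curr` is a `.`, and return
/// early as soon as a check fails.
fn is_valid_named_utility_byte(cursor: &Cursor) -> bool {
    match cursor.curr {
        b'.' => {
            // A `.` must be preceded by a digit...
            if !cursor.prev.is_ascii_digit() {
                return false;
            }
            // ...and followed by a digit.
            cursor.next.is_ascii_digit()
        }
        // Simplified set of other allowed bytes.
        b'a'..=b'z' | b'0'..=b'9' | b'-' => true,
        _ => false,
    }
}
```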
**Classification and jump tables:**
Another optimization we did is to classify the characters into a much
smaller `enum` such that Rust _can_ optimize all `match` arms and create
some jump tables behind the scenes.
E.g.:
```rs
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    /// ', ", or `
    Quote,
    /// \
    Escape,
    /// Whitespace characters
    Whitespace,
    Other,
}

const CLASS_TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];
    macro_rules! set {
        ($class:expr, $($byte:expr),+ $(,)?) => {
            $(table[$byte as usize] = $class;)+
        };
    }
    set!(Class::Quote, b'"', b'\'', b'`');
    set!(Class::Escape, b'\\');
    set!(Class::Whitespace, b' ', b'\t', b'\n', b'\r', b'\x0C');
    table
};
```
There are only 4 values in this enum, so Rust can optimize this very
well. The `CLASS_TABLE` is generated at compile time and must be exactly
256 elements long to fit all `u8` values.
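To show how such a table is consumed, here is a sketch of a string scanner built on it. It repeats the `Class` enum and `CLASS_TABLE` so it compiles on its own; `string_end` is a hypothetical name, and aborting on whitespace is an assumption. Each byte costs one table lookup followed by a `match` over only four variants:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    Quote,
    Escape,
    Whitespace,
    Other,
}

const CLASS_TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];
    macro_rules! set {
        ($class:expr, $($byte:expr),+ $(,)?) => {
            $(table[$byte as usize] = $class;)+
        };
    }
    set!(Class::Quote, b'"', b'\'', b'`');
    set!(Class::Escape, b'\\');
    set!(Class::Whitespace, b' ', b'\t', b'\n', b'\r', b'\x0C');
    table
};

/// Hypothetical helper: `input[start]` must be a quote. Returns the index of
/// the matching closing quote, or `None` if the string is unterminated or
/// contains whitespace (an assumption made for this sketch).
fn string_end(input: &[u8], start: usize) -> Option<usize> {
    let open = *input.get(start)?;
    if CLASS_TABLE[open as usize] != Class::Quote {
        return None;
    }
    let mut i = start + 1;
    while i < input.len() {
        match CLASS_TABLE[input[i] as usize] {
            // Skip the backslash and the byte it escapes.
            Class::Escape => i += 2,
            // Only the same kind of quote closes the string.
            Class::Quote if input[i] == open => return Some(i),
            Class::Whitespace => return None,
            Class::Quote | Class::Other => i += 1,
        }
    }
    None
}
```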
**Inlining:**
Last but not least, sometimes we use functions to abstract some logic.
Luckily Rust will optimize and inline most of the functions
automatically. In some scenarios, explicitly adding
`#[inline(always)]` improves performance; in others it doesn't help
at all.
You might notice that in some functions the annotation is added and in
some it's not. Every state machine was tested on its own and whenever
the performance was better with the annotation, it was added.
### Test Plan
1. Each machine has a dedicated set of tests to try and extract the
relevant part for that machine. Most machines don't even check boundary
characters or try to extract nested candidates. So keep that in mind
when adding new tests. Extracting inside of nested `[…]` is only handled
by the outermost `extractor/mod.rs`.
2. The main `extractor/mod.rs` has dedicated tests for recent bug
reports related to missing candidates.
3. You can test each machine's performance if you want to.
There is a chance that this new parser is missing candidates, even though
a lot of tests were added and existing tests have been ported.
To double check, we ran the new extractor on our own projects to make
sure we didn't miss anything obvious.
#### Tailwind UI
On Tailwind UI the diff looks like this:
<details>
<summary>diff</summary>
```diff
diff --git a/./main.css b/./pr.css
index d83b0a506..b3dd94a1d 100644
--- a/./main.css
+++ b/./pr.css
@@ -5576,9 +5576,6 @@ @layer utilities {
--tw-saturate: saturate(0%);
filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
}
- .\!filter {
- filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,) !important;
- }
.filter {
filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
}
```
</details>
`!filter` is gone because it was used like this:
```js
getProducts.js
23: if (!filter) return true
```
And right now `(` and `)` are not considered valid boundary characters
for a candidate.
#### Catalyst
On Catalyst, the diff looks like this:
<details>
<summary>diff</summary>
```diff
diff --git a/./main.css b/./pr.css
index 9f8ed129..4aec992e 100644
--- a/./main.css
+++ b/./pr.css
@@ -2105,9 +2105,6 @@
.outline-transparent {
outline-color: transparent;
}
- .filter {
- filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
- }
.backdrop-blur-\[6px\] {
--tw-backdrop-blur: blur(6px);
-webkit-backdrop-filter: var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);
@@ -7141,46 +7138,6 @@
inherits: false;
initial-value: solid;
}
-@property --tw-blur {
- syntax: "*";
- inherits: false;
-}
-@property --tw-brightness {
- syntax: "*";
- inherits: false;
-}
-@property --tw-contrast {
- syntax: "*";
- inherits: false;
-}
-@property --tw-grayscale {
- syntax: "*";
- inherits: false;
-}
-@property --tw-hue-rotate {
- syntax: "*";
- inherits: false;
-}
-@property --tw-invert {
- syntax: "*";
- inherits: false;
-}
-@property --tw-opacity {
- syntax: "*";
- inherits: false;
-}
-@property --tw-saturate {
- syntax: "*";
- inherits: false;
-}
-@property --tw-sepia {
- syntax: "*";
- inherits: false;
-}
-@property --tw-drop-shadow {
- syntax: "*";
- inherits: false;
-}
@property --tw-backdrop-blur {
syntax: "*";
inherits: false;
```
</details>
The reason for this is that `filter` was only used as a function call:
```tsx
src/app/docs/Code.tsx
31: .filter((x) => x !== null)
```
This was tested on all templates, and in each case only a very small
number of unused classes was removed.
The script to test this looks like this:
```sh
bun --bun ~/github.com/tailwindlabs/tailwindcss/packages/@tailwindcss-cli/src/index.t -- -i ./src/styles/tailwind.css -o pr.css
bun --bun ~/github.com/tailwindlabs/tailwindcss--main/packages/@tailwindcss-cli/src/index.t -- -i ./src/styles/tailwind.css -o main.css
git diff --no-index --patch ./{main,pr}.css
```
This is using git worktrees, so the `pr` branch lives in a `tailwindcss`
folder, and the `main` branch lives in a `tailwindcss--main` folder.
---
### Fixes:
- Fixes: #15616
- Fixes: #16750
- Fixes: #16790
- Fixes: #16801
- Fixes: #16880 (due to validating the arbitrary property)
---
### Ideas for the future
1. Right now each machine takes in a `Cursor` object. One potential
improvement we can make is to rely on the `input` on its own instead of
going via the wrapping `Cursor` object.
2. If you take a look at the AST, you'll notice that utilities and
variants have a "root", these are basically prefixes of each available
utility and/or variant. We can use this information to filter out
candidates and bail out early if we know that a certain candidate will
never produce a valid class.
3. Pass through the `prefix` information. Everything that doesn't start
with `tw:` can be skipped.
### Design decisions that didn't make it
Once you reach this part, you can stop reading if you want to, but this
is more like a brain dump of the things we tried and didn't work out.
We wanted to include them as a reference in case we want to look back at
this issue and know _why_ certain things are implemented the way they
are.
#### One character at a time
In an earlier implementation, the state machines were pure state
machines where the `next()` function was called on every single
character of the input. This had a lot of overhead because for every
character we had to:
1. Ask the `CandidateMachine` which state it was in.
2. Check the `cursor.curr` (and potentially the `cursor.prev` and
`cursor.next`) character.
3. If we were in a state where a nested state machine was running, we
had to check its current state as well and so on.
4. Once we did all of that we could go to the next character.
In this approach, the `MachineState` looked like this instead:
```rs
enum MachineState {
    Idle,
    Parsing,
    Done(Span),
}
```
This had its own set of problems because now it's very hard to know
whether we are done or not.
```html
<div class="hover:flex"></div>
<!-- ^ -->
```
Let's look at the current position in the example above. At this point,
it's both a valid variant and valid utility, so there was a lot of
additional state we had to track to know whether we were done or not.
#### `Span` stitching
Another approach we tried was to just collect all valid variants and
utilities and throw them in a big `Vec<Span>`. This reduced the amount
of additional state to track and we could track a span the moment we saw
a `MachineState::Done(span)`.
The next thing we had to do was to make sure that:
1. Covered spans were removed. We still do this part in the current
implementation.
2. Combine all touching variant spans (where `span_a.end + 1 ==
span_b.start`).
3. For every combined variant span, find a corresponding utility span.
- If there is no utility span, the candidate is invalid.
- If there are multiple candidate spans (this is in theory not possible
because we dropped covered spans)
- If there is a candidate _but_ it is attached to another set of spans,
then the candidate is invalid. E.g.: `flex!block`
4. All left-over utility spans are candidates without variants.
This approach was slow, and still a bit hard to reason about.
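For reference, the "combine touching variant spans" step could be sketched as follows. This is hypothetical code that assumes spans sorted by `start` with inclusive ends, and this approach was not kept:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize, // inclusive
}

/// Merge spans where `a.end + 1 == b.start`. Assumes `spans` is sorted.
fn combine_touching(spans: &[Span]) -> Vec<Span> {
    let mut combined: Vec<Span> = Vec::new();
    for &span in spans {
        if let Some(last) = combined.last_mut() {
            // Touching: extend the previous span instead of starting a new one.
            if last.end + 1 == span.start {
                last.end = span.end;
                continue;
            }
        }
        combined.push(span);
    }
    combined
}
```

For `hover:focus:flex`, the variant spans for `hover:` (0 to 5) and `focus:` (6 to 11) merge into a single span from 0 to 11.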
#### Matching on tuples
While matching against the `prev`, `curr` and `next` characters was very
readable and easy to reason about. It was not very fast. Unfortunately
had to abandon this approach in favor of a more optimized approach.
In a perfect world, we would still write it this way, but have some
compile time macro that would optimize this for us.
#### Matching against `b'…'` instead of classification and jump tables
Similar to the previous point, while this is better for readability,
it's not fast enough. The jump tables are much faster.
Luckily for us, each machine has its own set of rules and context, so
it's much easier to reason about a single problem and optimize a single
machine.
[^candidate]: A candidate is what a potential Tailwind CSS class _could_
be. It's a candidate because at this stage we don't know if it will
actually produce something but it looks like it could be a valid class.
E.g.: `hover:bg-red-500` is a candidate, but it will only produce
something if `--color-red-500` is defined in your theme.
---------
Co-authored-by: Jordan Pittman <jordan@cryptica.me>
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
1 parent 781fb73, 33 files changed, +5685 -1861 lines changed