Commit 53e92e4
authored
Reland: [clang][test] add testing for the AST matcher reference (#112168)
## Problem Statement
Previously, the examples in the AST matcher reference, which gets
generated by the Doxygen comments in `ASTMatchers.h`, were untested and
best effort.
Some of the matchers had no or wrong examples of how to use the matcher.
## Solution
This patch introduces a simple DSL around Doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The way that the documentation is tested, is by using Doxygen's alias
feature to declare custom aliases. These aliases forward to
`<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple
words). Using the Doxygen aliases is the obvious choice, because there
are (now) four consumers:
- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests
This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a Doxygen comment)
and provides diagnostics and statistics about the matchers.
The current statistics emitted by the parser are:
```text
Statistics:
doxygen_blocks : 519
missing_tests : 10
skipped_objc : 42
code_snippets : 503
matches : 820
matchers : 580
tested_matchers : 574
none_type_matchers : 6
```
The tests are generated during building, and the script will only print
something if it found an issue with the specified tests (e.g., missing
tests).
## Description
DSL for generating the tests from documentation.
TLDR:
```
\header{a.h}
\endheader <- zero or more header
\code
int a = 42;
\endcode
\compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
whole languages
\matcher{expr()} <- one or more matchers in succession
\match{42} <- one or more matches in succession
\matcher{varDecl()} <- new matcher resets the context, the above
\match will not count for this new
matcher(-group)
\match{int a = 42} <- only applies to the previous matcher (not to the
previous case)
```
The above block can be repeated inside a Doxygen command for multiple
code examples for a single matcher.
The test generation script will only look for these annotations and
ignore anything else like `\c` or the sentences where these annotations
are embedded into: `The matcher \matcher{expr()} matches the number
\match{42}.`.
### Language Grammar
[] denotes an optional, and <> denotes user-input
```
compile_args j:= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count || sub
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases
```
### Language Standard Versions
The 'std' tag and '\compile_args' support specifying a specific language
version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the
matcher. Predicates for the 'std' compiler flag are used with
disjunction between languages (e.g. 'c || c++') and conjunction for all
predicates specific to each language (e.g. 'c++11-or-later &&
c++23-or-earlier').
Examples:
- `c` all available versions of C
- `c++11` only C++11
- `c++11-or-later` C++11 or later
- `c++11-or-earlier` C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and
23 (inclusive)
- `c++11-23,c` same as above
### Tags
#### `type`:
**Match types** are used to select where the string that is used to
check if a node matches comes from.
Available: `code`, `name`, `typestr`, `typeofstr`. The default is
`code`.
- `code`: Forwards to `tooling::fixit::getText(...)` and should be the
preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of
`getNameAsString`. Useful when the matched AST node is not easy to spell
out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type
derived from `Type` (otherwise, if it is derived from `Decl`, recurses
with `Node->getTypeForDecl()`)
**Matcher types** are used to mark matchers as sub-matcher with 'sub' or
as deactivated using 'none'. Testing sub-matcher is not implemented.
#### `count`:
Specifying a 'count=n' on a match will result in a test that requires
that the specified match will be matched n times. Default is 1.
#### `std`:
A match allows specifying if it matches only in specific language
versions. This may be needed when the AST differs between language
versions.
#### `sub`:
The `sub` tag on a `\match` will indicate that the match is for a node
of a bound sub-matcher.
E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that
binds to `inner`, which is the value for the `sub` tag of the expected
match for the sub-matcher `\match{sub=inner$...}`. Currently,
sub-matchers are not tested in any way.
### What if ...?
#### ... I want to add a matcher?
Add a Doxygen comment to the matcher with a code example, corresponding
matchers and matches, that shows what the matcher is supposed to do.
Specify the compile arguments/supported languages if required, and run
`ninja check-clang-unit` to test the documentation.
#### ... the example I wrote is wrong?
The test-failure output of the generated test file will provide
information about
- where the generated test file is located
- which line in `ASTMatcher.h` the example is from
- which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the
different `type`s
- the language version and if the test ran with a windows `-target` flag
(also in failure summary)
#### ... I don't adhere to the required order of the syntax?
The script will diagnose any found issues, such as `matcher is missing
an example` with a `file:line:` prefix,
which should provide enough information about the issue.
#### ... the script diagnoses a false-positive issue with a Doxygen
comment?
It hopefully shouldn't, but if you, e.g., added some non-matcher code
and documented it with Doxygen, then the script will consider that as a
matcher documentation. As a result, the script will print that it
detected a mismatch between the actual and the expected number of
failures. If the diagnostic truly is a false-positive, change the
`expected_failure_statistics` at the top of the
`generate_ast_matcher_doc_tests.py` file.
Fixes #57607
Fixes #637481 parent 469f9d5 commit 53e92e4
File tree
8 files changed
+11406
-3942
lines changed- clang
- docs
- tools
- include/clang/ASTMatchers
- unittests/ASTMatchers
- utils
8 files changed
+11406
-3942
lines changedLarge diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
847 | 847 | | |
848 | 848 | | |
849 | 849 | | |
| 850 | + | |
| 851 | + | |
| 852 | + | |
850 | 853 | | |
851 | 854 | | |
852 | 855 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
220 | 220 | | |
221 | 221 | | |
222 | 222 | | |
223 | | - | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
224 | 231 | | |
225 | 232 | | |
226 | 233 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
103 | 125 | | |
104 | 126 | | |
105 | | - | |
106 | | - | |
107 | | - | |
108 | 127 | | |
109 | 128 | | |
110 | 129 | | |
111 | 130 | | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
112 | 169 | | |
113 | 170 | | |
114 | 171 | | |
| |||
191 | 248 | | |
192 | 249 | | |
193 | 250 | | |
194 | | - | |
195 | | - | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
196 | 254 | | |
197 | 255 | | |
198 | 256 | | |
| |||
0 commit comments