Clang JSON parser

Hi Jonathan,

I've been looking into the clang JSON output you mentioned in the [standardese issue #195](https://github.com/standardese/standardese/issues/195) for the last couple of days. Since I have no clue how deep your own research into this topic has gone, I just wanted to share what I've found so far.

## Limitations

### Preprocessor

The clang JSON output is produced after the preprocessor has run, so unfortunately we won't be able to create entities like `cpp_macro_definition`, `cpp_macro_parameter` and `cpp_include_directive`. I've tried to find a tool in the LLVM ecosystem that provides more output about the preprocessor, but was unable to find anything useful. `pp-trace` looked promising, but running `pp-trace-12 --callbacks='*' /path/to/file.hpp` did not generate anything useful.

### Nested Namespaces

Each namespace shows up as its own entity in the JSON document, so we would need to match the line numbers to see whether it is part of a nested namespace declaration.
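The line-matching idea could be sketched roughly as follows. This is only a heuristic under stated assumptions: `NamespaceNode` is a hypothetical, simplified struct that a parser would fill from the JSON (it is not a Boost.JSON, simdjson or cppast type), `children` holds only the namespaces declared directly inside the node, and `line` has already been resolved for every node. A parent with exactly one child namespace that starts on the same line is treated as one `a::b` style declaration, which would misclassify `namespace a { namespace b {} }` written on a single line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a namespace node from the JSON dump.
struct NamespaceNode {
    std::string name;
    unsigned line;                        // line on which the namespace starts
    std::vector<NamespaceNode> children;  // namespaces declared directly inside
};

// Heuristic: the parent has a single child namespace starting on the same line.
inline bool looks_nested(const NamespaceNode& parent, const NamespaceNode& child) {
    return parent.children.size() == 1 && parent.line == child.line;
}

// Follows the chain of same-line namespaces and joins the names with "::".
inline std::string qualified_declaration_name(const NamespaceNode& node) {
    std::string result = node.name;
    const NamespaceNode* current = &node;
    while (current->children.size() == 1 &&
           looks_nested(*current, current->children.front())) {
        current = &current->children.front();
        result += "::" + current->name;
    }
    return result;
}
```

For a node `a` on line 1 whose only child `b` is also on line 1, `qualified_declaration_name` yields `a::b`; a namespace `c` declared on a later line inside `b` is not folded into the name.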

### Semantic Parent

Similar to the nested namespaces above, we would need to track the semantic parent of each entity manually. I'm not familiar enough with libclang yet to tell if this is going to be a minor or major challenge for the JSON parser.
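A minimal sketch of what manual tracking might look like, using only the standard library. `AstNode` is a hypothetical struct, not a type from any of the libraries mentioned here. The sketch assumes every node carries an `id`, and that a node whose semantic context differs from its lexical one (for example an out-of-line member definition) carries the id of that context in a field such as `parentDeclContextId`; I believe the clang dump works this way, but that should be verified against real output before relying on it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified AST node filled from the JSON document.
struct AstNode {
    std::string id;                      // assumed: the node's "id" field
    std::string name;
    std::string parent_decl_context_id;  // assumed field; empty when absent
    std::vector<AstNode> inner;          // child nodes
};

// Maps a node id to the id of its semantic parent.
using ParentMap = std::unordered_map<std::string, std::string>;

// Depth-first walk: the lexical parent is the enclosing node, and an explicit
// parent context id (when present) overrides it as the semantic parent.
inline void collect_semantic_parents(const AstNode& node,
                                     const std::string& lexical_parent_id,
                                     ParentMap& out) {
    const std::string& parent = node.parent_decl_context_id.empty()
                                    ? lexical_parent_id
                                    : node.parent_decl_context_id;
    out[node.id] = parent;
    for (const AstNode& child : node.inner) {
        collect_semantic_parents(child, node.id, out);
    }
}
```

Calling `collect_semantic_parents(root, "", parents)` on the translation unit node fills `parents` in a single pass, so the entity builder can look up the semantic parent by id afterwards.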

## JSON conformance

Both Boost.JSON and simdjson follow the JSON standard exactly, and there is a default nesting limit of 32 levels. Boost.JSON has parsing options to change this limit. I haven't found the equivalent setting for simdjson yet, so I'm not sure whether it can be changed there. To successfully parse the `etl/string.hpp` header that is mentioned in the standardese issues, I had to raise the nesting limit before parsing.

I've also created a couple of test source files to see how big the resulting .json files get. When you include all STL headers and run `clang++ -Xclang -ast-dump=json -std=c++17 stl_headers.hpp > test.json`, the resulting file is **550 MB**.
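To choose a nesting limit without guessing, one option is to measure how deep a dump actually nests before handing it to the real parser. The sketch below uses only the standard library and reads from a stream, so a very large file does not have to be loaded into memory first. `max_nesting_depth` is a hypothetical helper name, and the function assumes the input is well-formed JSON: it counts brackets outside of string literals and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Returns the deepest level of nested objects/arrays found in the stream.
// Brackets inside string literals are skipped, including after escapes.
inline std::size_t max_nesting_depth(std::istream& in) {
    std::size_t depth = 0;
    std::size_t deepest = 0;
    bool in_string = false;
    bool escaped = false;
    char c = '\0';
    while (in.get(c)) {
        if (in_string) {
            if (escaped) {
                escaped = false;       // character after a backslash is literal
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {
                in_string = false;     // closing quote
            }
            continue;
        }
        switch (c) {
        case '"':
            in_string = true;
            break;
        case '{':
        case '[':
            ++depth;
            if (depth > deepest) deepest = depth;
            break;
        case '}':
        case ']':
            if (depth > 0) --depth;
            break;
        default:
            break;
        }
    }
    return deepest;
}
```

The result can then be used as the value for the parser's depth option, with some headroom added.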


## Results So Far

I've successfully parsed `cpp_enum` and `cpp_enum_value` entities using the Boost.JSON parser. I definitely need to do more tests before I can tell how much work the complete parser is going to be. Switching to simdjson once I really start working on pull requests should not be an issue, since both libraries seem to provide similar APIs. I used Boost for the testing because I've been working with it for the last couple of months. I think simdjson is a valid option as well. I don't have a strong preference, but I think the selection should be based on performance metrics first, since the files that need to be parsed can get huge.


Toby
