|
10 | 10 |
|
11 | 11 |
|
12 | 12 | def codegraph(source_code, lang = "guess", analyses = None, **kwargs): |
| 13 | + """ |
| 14 | + Transforms source code into an annotated AST. |
| 15 | +
|
| 16 | + Given source code as a string, this function transforms |
| 17 | + the code into an annotated AST. The AST is annotated with multiple |
| 18 | + (configurable) relations such as control flow and data flow. |
| 19 | + The function uses tree-sitter as a backend. Therefore, this |
| 20 | + function can in theory support most programming languages (see README). |
| 21 | + However, since control flow and data flow have to be tailored to a specific |
| 22 | + language, only Java and Python are supported at the moment. |
| 23 | +
|
| 24 | + All transformations are based on those used in |
| 25 | + 'Self-Supervised Bug Detection and Repair' (Allamanis et al., 2021). |
| 26 | + The original implementation for Python can be found here: |
| 27 | + https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair |
| 28 | + Note that interprocedural analyses (and relations) are currently not supported. |
| 29 | +
|
| 30 | +
|
| 31 | + Parameters |
| 32 | + ---------- |
| 33 | + source_code : str |
| 34 | + Source code to be parsed, given as a string. Also |
| 35 | + supports parsing of incomplete source code |
| 36 | + snippets (by deactivating the syntax checker; see syntax_error). |
| 37 | + |
| 38 | + lang : [python, java] |
| 39 | + String identifier of the programming language |
| 40 | + to be parsed. Most programming languages can be parsed, |
| 41 | + including python, java and javascript (see README). |
| 42 | + Default: guess (language guessing is not supported yet and currently throws an error) |
| 43 | + |
| 44 | + analyses : list of [ast, cfg, dataflow, subcfg] |
| 45 | + The analyses to apply while parsing the source code; they determine |
| 46 | + the relations included in the output. |
| 47 | + ast: Include relations based on the abstract syntax tree (the AST is always computed) |
| 48 | + cfg: Relations related to the control flow in the program (on a statement level) |
| 49 | + dataflow: Relations related to the data flow between variables |
| 50 | + subcfg: Relations related to the control flow (on a subexpression level) |
| 51 | + |
| 52 | + syntax_error : [raise, warn, ignore] |
| 53 | + How to react to a syntax error in the code snippet. |
| 54 | + raise: raises a syntax error |
| 55 | + warn: prints a warning to the console |
| 56 | + ignore: ignores syntax errors. Helpful for parsing code snippets. |
| 57 | + Default: raise |
| 58 | +
|
| 59 | + Returns |
| 60 | + ------- |
| 61 | + SourceCodeGraph |
| 62 | + A labelled multigraph representing the given source code |
| 63 | + """ |
| 64 | + |
13 | 65 | root_node, tokens = preprocess_code(source_code, lang, **kwargs) |
14 | 66 |
|
15 | 67 | graph_analyses = load_lang_analyses(tokens[0].config.lang) |
|
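The `syntax_error` options described in the docstring (raise, warn, ignore) can be illustrated with a small stand-alone sketch. This is not the library's implementation: `check_syntax` is a hypothetical helper, and Python's built-in `ast` module stands in for the tree-sitter backend, so it only handles Python source.

```python
import ast
import warnings


def check_syntax(source_code, syntax_error="raise"):
    """Hypothetical helper: apply a raise/warn/ignore policy to syntax errors.

    Assumption: Python's built-in `ast` parser is used here instead of
    tree-sitter, purely to keep the sketch self-contained.
    Returns True if the code parses cleanly, False otherwise.
    """
    if syntax_error not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown syntax_error option: {syntax_error!r}")

    try:
        ast.parse(source_code)
    except SyntaxError as exc:
        if syntax_error == "raise":
            raise  # propagate the error to the caller
        if syntax_error == "warn":
            warnings.warn(f"Syntax error in code snippet: {exc}")
        return False  # "warn" and "ignore" both continue without raising
    return True


# An incomplete snippet is tolerated when syntax errors are ignored.
is_valid = check_syntax("def f(:", syntax_error="ignore")
```

With `ignore`, the caller can still inspect the return value to learn whether the snippet was well-formed, which is useful when processing partial code.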
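The docstring describes the return value as a labelled multigraph, meaning the same pair of nodes may be connected by several edges that differ only in their label (for example, one from the AST and one from the control flow). A minimal sketch of that idea follows; `LabelledMultiGraph` and the relation names are hypothetical and do not reflect the actual `SourceCodeGraph` API.

```python
from collections import defaultdict


class LabelledMultiGraph:
    """Hypothetical stand-in for a labelled multigraph (not SourceCodeGraph)."""

    def __init__(self):
        # Maps a (source, target) pair to the set of edge labels between them.
        self._edges = defaultdict(set)

    def add_edge(self, source, target, label):
        self._edges[(source, target)].add(label)

    def labels(self, source, target):
        """Return all labels on edges from source to target, sorted."""
        return sorted(self._edges.get((source, target), ()))

    def edges(self, label):
        """Return all (source, target) pairs connected by the given label."""
        return sorted(pair for pair, labels in self._edges.items() if label in labels)


graph = LabelledMultiGraph()
# Two consecutive statements can be related in more than one way.
# The label names below are assumptions chosen for illustration.
graph.add_edge("stmt_1", "stmt_2", "ast_sibling")
graph.add_edge("stmt_1", "stmt_2", "controlflow")
graph.add_edge("stmt_2", "stmt_3", "controlflow")
```

Filtering by label, as in `graph.edges("controlflow")`, mirrors the idea of selecting only the relations produced by one analysis.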