Commit 4a8a036

Experiment: Reactive analysis with skip-lite CMT cache

Vendor skip-lite library and integrate reactive analysis capabilities:

- Vendor skip-lite marshal_cache and reactive_file_collection modules
- Modify C++ code to handle ReScript CMT file format (CMI+CMT headers)
- Add CmtCache module for mmap-based CMT file reading
- Add ReactiveAnalysis module for incremental file processing
- Add CLI flags: -cmt-cache, -reactive, -runs
- Add README.md with usage and benchmark instructions

Benchmark results (~5000 files):

- Standard: CMT processing 0.78s, Total 1.01s
- Reactive (warm): CMT processing 0.01s, Total 0.20s
- Speedup: 74x for CMT processing, 5x total

The reactive mode caches processed file_data and uses read_cmt_if_changed to skip unchanged files entirely on subsequent runs.
1 parent c3cbe6e commit 4a8a036

19 files changed: +2286 -24 lines changed

analysis/reanalyze/README.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# Reanalyze

Dead code analysis and other experimental analyses for ReScript.

## Analyses

- **Dead Code Elimination (DCE)** - Detect unused values, types, and modules
- **Exception Analysis** - Track potential exceptions through call chains
- **Termination Analysis** - Experimental analysis for detecting non-terminating functions

## Usage

```bash
# Run DCE analysis on current project (reads rescript.json)
rescript-editor-analysis reanalyze -config

# Run DCE analysis on specific CMT directory
rescript-editor-analysis reanalyze -dce-cmt path/to/lib/bs

# Run all analyses
rescript-editor-analysis reanalyze -all
```

## Performance Options

### Parallel Processing

Use multiple CPU cores for faster analysis:

```bash
# Use 4 parallel domains
reanalyze -config -parallel 4

# Auto-detect number of cores
reanalyze -config -parallel -1
```

### CMT Cache (Experimental)

Use memory-mapped cache for CMT file reading:

```bash
reanalyze -config -cmt-cache
```

### Reactive Mode (Experimental)

Cache processed file data and skip unchanged files on subsequent runs:

```bash
reanalyze -config -reactive
```

On the ~5000-file benchmark project, a warm reactive run cuts CMT processing time by about 74x and total time by about 5x, which helps repeated analysis (e.g., in watch mode or a long-running service):

| Mode | CMT Processing | Total | Total Speedup |
|------|----------------|-------|---------------|
| Standard | 0.78s | 1.01s | 1x |
| Reactive (warm) | 0.01s | 0.20s | 5x |

### Benchmarking

Run analysis multiple times to measure cache effectiveness:

```bash
reanalyze -config -reactive -timing -runs 3
```

## CLI Flags

| Flag | Description |
|------|-------------|
| `-config` | Read analysis mode from rescript.json |
| `-dce` | Run dead code analysis |
| `-exception` | Run exception analysis |
| `-termination` | Run termination analysis |
| `-all` | Run all analyses |
| `-parallel n` | Use n parallel domains (0=sequential, -1=auto) |
| `-cmt-cache` | Use mmap cache for CMT files |
| `-reactive` | Cache processed file_data, skip unchanged files |
| `-runs n` | Run analysis n times (for benchmarking) |
| `-timing` | Report timing of analysis phases |
| `-debug` | Print debug information |
| `-json` | Output in JSON format |
| `-ci` | Internal flag for CI mode |

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for details on the analysis pipeline.

The DCE analysis is structured as a pure pipeline:

1. **MAP** - Process each `.cmt` file independently → per-file data
2. **MERGE** - Combine all per-file data → project-wide view
3. **SOLVE** - Compute dead/live status → issues
4. **REPORT** - Output issues

This design enables order-independence, parallelization, and incremental updates.
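
The four stages can be sketched in OCaml on toy data. This is a minimal illustration, not reanalyze's actual code: the fields of `file_data`, the function names, and the rule that a declaration is dead when nothing references it are simplifying assumptions, and the real solver is more involved.

```ocaml
(* Toy model of the MAP -> MERGE -> SOLVE -> REPORT pipeline.
   All types and names here are hypothetical, not reanalyze's own. *)
module StringSet = Set.Make (String)

(* Per-file data produced by MAP: names declared and names referenced. *)
type file_data = {decls: StringSet.t; refs: StringSet.t}

(* MAP: process one file independently.
   A "file" is modelled as a pair of string lists. *)
let map_file (decls, refs) =
  {decls = StringSet.of_list decls; refs = StringSet.of_list refs}

(* MERGE: combine per-file data into a project-wide view.
   Set union is commutative and associative, so file order does not matter. *)
let merge files =
  List.fold_left
    (fun acc fd ->
      {
        decls = StringSet.union acc.decls fd.decls;
        refs = StringSet.union acc.refs fd.refs;
      })
    {decls = StringSet.empty; refs = StringSet.empty}
    files

(* SOLVE: in this toy model a declaration is dead if nothing references it. *)
let solve project = StringSet.diff project.decls project.refs

(* REPORT: turn issues into output lines (sorted, since sets are ordered). *)
let report dead =
  StringSet.elements dead
  |> List.map (fun name -> "Warning Dead Value: " ^ name)

let analyze files = files |> List.map map_file |> merge |> solve |> report
```

Because `merge` folds with set union, reordering the input files yields the same report, and `map_file` can run on each file in isolation, which is what makes parallel and incremental processing possible in this kind of design.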

## Reactive Analysis

The reactive mode (`-reactive`) uses skip-lite's `Marshal_cache` to detect file changes efficiently:

1. **First run**: All files are processed and the results are cached
2. **Subsequent runs**: Only changed files are re-processed
3. **Unchanged files**: The cached `file_data` is returned immediately, without reading or unmarshalling the file again

This is the foundation for a persistent analysis service that can respond to file changes in milliseconds.
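
The three cases above can be sketched as a small wrapper around a "read if changed" function. This is an illustrative sketch under stated assumptions: `make_reactive` is a hypothetical name, `read_if_changed` stands in for `CmtCache.read_cmt_if_changed`, and `process` stands in for the per-file MAP step. Both are passed as parameters so the example is self-contained and does not depend on skip-lite.

```ocaml
(* Hypothetical sketch of the reactive loop, not the ReactiveAnalysis module.
   [read_if_changed path] must return [Some contents] on first access or
   after a change, and [None] when the file is unchanged. *)
let make_reactive ~read_if_changed ~process =
  (* path -> processed data from the last time the file changed *)
  let cache = Hashtbl.create 16 in
  fun path ->
    match read_if_changed path with
    | Some contents ->
      (* First access or changed on disk: re-process and update the cache. *)
      let data = process contents in
      Hashtbl.replace cache path data;
      data
    | None ->
      (* Unchanged: reuse the cached result without re-processing.
         Relies on the contract that the first access returns [Some _]. *)
      Hashtbl.find cache path
```

The wrapper only stores the processed result, so an unchanged file costs one change check and one hash table lookup.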

## Development

### Testing

```bash
# Run reanalyze tests
make test-reanalyze

# Run with shuffled file order (order-independence test)
make test-reanalyze-order-independence

# Run parallel mode test
make test-reanalyze-parallel
```

### Benchmarking

The benchmark project generates ~5000 files to measure analysis performance:

```bash
cd tests/analysis_tests/tests-reanalyze/deadcode-benchmark

# Generate files, build, and run sequential vs parallel benchmark
make benchmark

# Compare CMT cache effectiveness (cold vs warm)
make time-cache

# Benchmark reactive mode (shows speedup on repeated runs)
make time-reactive
```

#### Reactive Benchmark

The `make time-reactive` target runs:

1. **Standard mode** (baseline) - Full analysis every time
2. **Reactive mode** with 3 runs - First run is cold (processes all files), subsequent runs are warm (skip unchanged files)

Example output:

```
=== Reactive mode benchmark ===

Standard (baseline):
CMT processing: 0.78s
Total: 1.01s

Reactive mode (3 runs):
=== Run 1/3 ===
CMT processing: 0.78s
Total: 1.02s
=== Run 2/3 ===
CMT processing: 0.01s <-- 74x faster
Total: 0.20s <-- 5x faster
=== Run 3/3 ===
CMT processing: 0.01s
Total: 0.20s
```

analysis/reanalyze/src/Cli.ml

Lines changed: 9 additions & 0 deletions
@@ -27,3 +27,12 @@ let parallel = ref 0

 (* timing: report internal timing of analysis phases *)
 let timing = ref false
+
+(* use mmap cache for CMT files *)
+let cmtCache = ref false
+
+(* use reactive/incremental analysis (caches processed file_data) *)
+let reactive = ref false
+
+(* number of analysis runs (for benchmarking reactive mode) *)
+let runs = ref 1
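
For illustration, refs like these could be wired to the `-cmt-cache`, `-reactive`, and `-runs` flags with the standard `Arg` module. The sketch below is an assumption about how such wiring might look; the `speclist` and `parse` names and the usage string are hypothetical and are not taken from this commit.

```ocaml
(* Refs mirroring the ones added to Cli.ml. *)
let cmtCache = ref false
let reactive = ref false
let runs = ref 1

(* Hypothetical registration of the three flags with the stdlib Arg module. *)
let speclist =
  [
    ("-cmt-cache", Arg.Set cmtCache, " Use mmap cache for CMT files");
    ( "-reactive",
      Arg.Set reactive,
      " Cache processed file_data, skip unchanged files" );
    ("-runs", Arg.Set_int runs, "n Run analysis n times (for benchmarking)");
  ]

(* Parse an explicit argv array; element 0 is the program name. *)
let parse argv =
  Arg.parse_argv ~current:(ref 0) argv speclist
    (fun anon -> raise (Arg.Bad ("unexpected argument: " ^ anon)))
    "usage: reanalyze [options]"
```

Calling `parse [|"reanalyze"; "-reactive"; "-runs"; "3"|]` sets `reactive` to `true` and `runs` to `3` while leaving `cmtCache` at its default.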

analysis/reanalyze/src/CmtCache.ml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
(** CMT file cache using Marshal_cache for efficient mmap-based reading.

    This module provides cached reading of CMT files with automatic
    invalidation when files change on disk. It's used to speed up
    repeated analysis runs by avoiding re-reading unchanged files. *)

[@@@alert "-unsafe"]

(** Read a CMT file, using the mmap cache for efficiency.
    The file is memory-mapped and the cache automatically detects
    when the file changes on disk. *)
let read_cmt path : Cmt_format.cmt_infos =
  Marshal_cache.with_unmarshalled_file path Fun.id

(** Read a CMT file only if it changed since the last access.
    Returns [Some cmt_infos] if the file changed (or first access),
    [None] if the file is unchanged.

    This is the key function for incremental analysis - unchanged
    files return [None] immediately without any unmarshalling. *)
let read_cmt_if_changed path : Cmt_format.cmt_infos option =
  Marshal_cache.with_unmarshalled_if_changed path Fun.id

(** Clear the CMT cache, unmapping all memory.
    Useful for testing or to free memory. *)
let clear () = Marshal_cache.clear ()

(** Invalidate a specific path in the cache.
    The next read will re-load the file from disk. *)
let invalidate path = Marshal_cache.invalidate path

(** Cache statistics *)
type stats = {
  entry_count: int;
  mapped_bytes: int;
}

(** Get cache statistics *)
let stats () : stats =
  let s = Marshal_cache.stats () in
  { entry_count = s.entry_count; mapped_bytes = s.mapped_bytes }
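
To make the "only if changed" contract concrete without depending on skip-lite, here is a simplified stand-in written with the OCaml standard library only. It is an assumption-laden sketch, not how `Marshal_cache` works: it detects changes by hashing the file with `Digest.file` (which reads the whole file) and unmarshals with `Marshal.from_channel`, whereas the real library memory-maps files and uses its own change detection. The names mirror the wrapper above for readability only.

```ocaml
(* Simplified stand-in for an "unmarshal only if changed" cache.
   Assumption: change detection via an MD5 digest of the file contents. *)
let digests : (string, Digest.t) Hashtbl.t = Hashtbl.create 16

(* Returns [Some value] on first access or after the file changed,
   [None] when the contents are identical to the last access.
   Like Marshal itself, this is not type-safe: the caller must know
   which type was written to the file. *)
let read_if_changed (path : string) : 'a option =
  let digest = Digest.file path in
  match Hashtbl.find_opt digests path with
  | Some previous when Digest.equal previous digest -> None
  | _ ->
    Hashtbl.replace digests path digest;
    let ic = open_in_bin path in
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> Some (Marshal.from_channel ic))

(* Forget one path, so the next read reports it as changed. *)
let invalidate path = Hashtbl.remove digests path

(* Forget every path. *)
let clear () = Hashtbl.reset digests
```

The trade-off in this sketch is that every call still reads the file to hash it; it only saves the unmarshalling step, so it illustrates the interface rather than the performance of an mmap-based cache.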

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
(** CMT file cache using Marshal_cache for efficient mmap-based reading.

    This module provides cached reading of CMT files with automatic
    invalidation when files change on disk. *)

val read_cmt : string -> Cmt_format.cmt_infos
(** Read a CMT file, using the mmap cache for efficiency. *)

val read_cmt_if_changed : string -> Cmt_format.cmt_infos option
(** Read a CMT file only if it changed since the last access.
    Returns [Some cmt_infos] if the file changed (or first access),
    [None] if the file is unchanged. *)

val clear : unit -> unit
(** Clear the CMT cache, unmapping all memory. *)

val invalidate : string -> unit
(** Invalidate a specific path in the cache. *)

type stats = {
  entry_count: int;
  mapped_bytes: int;
}
(** Cache statistics *)

val stats : unit -> stats
(** Get cache statistics *)
