| 1 | +--- |
| 2 | +oncalls: |
| 3 | + - executorch |
| 4 | +llms-gk: devmate_executorch_cadence_md |
| 5 | +apply_to_regex: ".*" |
| 6 | +--- |
| 7 | + |
| 8 | +# ExecuTorch: On-Device AI Inference Runtime |
| 9 | + |
| 10 | +## Project Description |
| 11 | + |
| 12 | +ExecuTorch is Meta's end-to-end solution for enabling on-device AI inference across mobile and edge devices including wearables, embedded systems, and microcontrollers. It extends PyTorch 2's compiler and export functionality to efficiently deploy models to edge devices with minimal memory footprint and superior performance. ExecuTorch powers Meta's on-device AI experiences across Facebook, Instagram, Meta Quest, Ray-Ban Meta Smart Glasses, and WhatsApp. |
| 13 | + |
| 14 | +**Key Value Propositions:** |
| 15 | +- **Portability**: Compatible across platforms from high-end mobile to constrained embedded systems |
| 16 | +- **Productivity**: Unified toolchain from PyTorch authoring through deployment |
| 17 | +- **Performance**: Lightweight runtime (<50kB core) with full hardware acceleration support |
| 18 | + |
| 19 | +**Supported Models**: LLMs (Llama, Qwen, Phi), Computer Vision, ASR, TTS, and multimodal models |
| 20 | + |
| 21 | +## Architecture Overview |
| 22 | + |
| 23 | +### Three-Phase Workflow |
| 24 | +1. **Program Preparation (AOT)**: Model export, quantization, memory planning, backend delegation |
| 25 | +2. **Runtime Preparation**: Program loading and initialization |
| 26 | +3. **Program Execution**: Kernel dispatch and inference |
| 27 | + |
| 28 | +### Four-Dialect Lowering Pipeline |
| 29 | +``` |
| 30 | +PyTorch Model → torch.export() |
| 31 | + ↓ ATen Dialect (PyTorch-compatible operators) |
| 32 | + ↓ Core ATen Dialect (Decomposed fundamental ops) |
| 33 | + ↓ Edge Dialect (Edge-optimized, dtype-specialized) |
| 34 | + ↓ Backend Dialect (Hardware-specific optimizations) |
| 35 | + ↓ .pte file (Serialized execution graph) |
| 36 | +``` |
| 37 | + |
| 38 | +### Core Components |
| 39 | +- **Runtime (`xplat/executorch/runtime/`)**: Core C++ execution engine, tensor data structures, operator dispatch |
| 40 | +- **EXIR (`xplat/executorch/exir/`)**: Export IR system, four-dialect compilation pipeline |
| 41 | +- **Kernels (`xplat/executorch/kernels/`)**: Portable, optimized, and quantized operator implementations |
| 42 | +- **Backends (`xplat/executorch/backends/`)**: Hardware acceleration (Apple CoreML/MPS, Qualcomm QNN, ARM Ethos-U, Vulkan, XNNPACK, etc.) |
| 43 | +- **Extensions (`xplat/executorch/extension/`)**: LLM framework, training, module wrappers, platform integration |
| 44 | +- **DevTools (`xplat/executorch/devtools/`)**: ETDump profiling, ETRecord debugging, inspector APIs |
| 45 | + |
| 46 | +## Directory Structure & Dependencies |
| 47 | + |
| 48 | +``` |
| 49 | +xplat/executorch/ |
| 50 | +├── runtime/ # Core C++ runtime (<50kB, portable) |
| 51 | +│ ├── core/ # Fundamental types (Tensor, EValue, Error) |
| 52 | +│ ├── executor/ # Program loading and execution |
| 53 | +│ ├── kernel/ # Operator registration and dispatch |
| 54 | +│ ├── backend/ # Backend delegate APIs |
| 55 | +│ └── platform/ # Platform abstraction layer (PAL) |
| 56 | +├── exir/ # Export IR compiler (4-dialect pipeline) |
| 57 | +├── kernels/ # Operator implementations |
| 58 | +│ ├── portable/ # Pure C++ reference kernels |
| 59 | +│ ├── optimized/ # Hardware-optimized versions |
| 60 | +│ └── quantized/ # Quantization kernels |
| 61 | +├── backends/ # Hardware acceleration (15+ backends) |
| 62 | +│ ├── apple/ # CoreML, MPS |
| 63 | +│ ├── qualcomm/ # QNN (HTP/NPU) |
| 64 | +│ ├── arm/ # Ethos-U, TOSA |
| 65 | +│ ├── xnnpack/ # Optimized CPU |
| 66 | +│ └── vulkan/ # Cross-platform GPU |
| 67 | +├── extension/ # Runtime extensions |
| 68 | +│ ├── llm/ # LLM tokenizers, samplers, KV cache |
| 69 | +│ ├── module/ # Simplified C++ APIs |
| 70 | +│ └── android/, apple/ # Platform integration |
| 71 | +├── devtools/ # Profiling and debugging tools |
| 72 | +├── examples/ # Model export and usage examples |
| 73 | +├── schema/ # FlatBuffer file format definitions |
| 74 | +└── tools/ # Build system integration |
| 75 | +``` |
| 76 | + |
| 77 | +**Key Dependencies:** |
| 78 | +- **Internal**: PyTorch 2.10.0 (pinned via `torch_pin.py`) |
| 79 | +- **External**: FlatBuffers, numpy>=2.0.0, pybind11, CMake>=3.29 |
| 80 | +- **Platform-Specific**: coremltools (Apple), QNN SDK (Qualcomm), ARM Compute Library |
| 81 | + |
| 82 | +**Build Systems**: |
| 83 | +- Buck2 (internal Meta builds) |
| 84 | +- CMake (OSS builds, primary) |
| 85 | +- Python setuptools (pip packages) |
| 86 | + |
| 87 | +## Build and Development Workflow |
| 88 | + |
| 89 | +### Quick Start |
| 90 | +```bash |
| 91 | +# Clone and setup |
| 92 | +git clone -b viable/strict https://github.com/pytorch/executorch.git |
| 93 | +cd executorch |
| 94 | +conda create -yn executorch python=3.10.0 && conda activate executorch |
| 95 | + |
| 96 | +# Install in development mode |
| 97 | +./install_executorch.sh --editable |
| 98 | + |
| 99 | +# Run tests |
| 100 | +pytest |
| 101 | +``` |
| 102 | + |
| 103 | +### CMake Build Options |
| 104 | +```bash |
| 105 | +# Platform-specific configure step (via presets), run from the repo root |
| 106 | +cmake -B cmake-out --preset android-arm64-v8a # Android |
| 107 | +cmake -B cmake-out --preset ios # iOS |
| 108 | +cmake -B cmake-out --preset macos # macOS |
| 109 | +cmake -B cmake-out --preset linux # Linux |
| 110 | + |
| 111 | +# Feature flags (append to the configure command above) |
| 112 | +-DEXECUTORCH_BUILD_XNNPACK=ON # XNNPACK backend |
| 113 | +-DEXECUTORCH_BUILD_COREML=ON # CoreML backend |
| 114 | +-DEXECUTORCH_BUILD_EXTENSION_LLM=ON # LLM support |
| 115 | +-DEXECUTORCH_BUILD_TESTS=ON # Build tests |
| 116 | +-DEXECUTORCH_ENABLE_LOGGING=ON # Runtime logging |
| 117 | +-DCMAKE_BUILD_TYPE=Release # Release build (required for perf) |
| 118 | + |
| 119 | +# Build and run |
| 120 | +cmake --build cmake-out -j9 |
| 121 | +cmake --build cmake-out --target test |
| 122 | +``` |
| 123 | + |
| 124 | +### Export and Run Example |
| 125 | +```bash |
| 126 | +# Export model |
| 127 | +python -m examples.portable.scripts.export --model_name="add" |
| 128 | + |
| 129 | +# Execute |
| 130 | +./cmake-out/executor_runner --model_path add.pte |
| 131 | +``` |
| 132 | + |
| 133 | +### Build Scripts |
| 134 | +- `xplat/executorch/scripts/build_android_library.sh` - Android native builds |
| 135 | +- `xplat/executorch/scripts/build_apple_frameworks.sh` - iOS/macOS frameworks |
| 136 | +- `xplat/executorch/install_executorch.sh` - Main installation script |
| 137 | + |
| 138 | +### Cleaning and Rebuilding |
| 139 | +```bash |
| 140 | +./install_executorch.sh --clean |
| 141 | +git submodule sync && git submodule update --init --recursive |
| 142 | +``` |
| 143 | + |
| 144 | +## Documentation References |
| 145 | + |
| 146 | +**Core Documentation** (`xplat/executorch/docs/`): |
| 147 | +- Architecture guides and design documents |
| 148 | +- Backend integration tutorials |
| 149 | +- Platform-specific deployment guides |
| 150 | +- API references and examples |
| 151 | + |
| 152 | +**README Files**: |
| 153 | +- `xplat/executorch/README.md` - Project overview and getting started |
| 154 | +- `xplat/executorch/CONTRIBUTING.md` - Contribution guidelines |
| 155 | +- Backend-specific READMEs in `xplat/executorch/backends/*/README.md` |
| 156 | + |
| 157 | +**Internal Resources**: |
| 158 | +- Workplace groups and internal wikis for Meta-specific deployment patterns |
| 159 | +- Oncall: `executorch` (check TARGETS files for component-specific oncalls) |
| 160 | + |
| 161 | +## Testing Strategy |
| 162 | + |
| 163 | +### Test Frameworks |
| 164 | +- **Python**: pytest with hypothesis for property-based testing |
| 165 | +- **C++**: GoogleTest for unit tests |
| 166 | +- **Backend Testing**: Modular test suite in `xplat/executorch/backends/test/` |
| 167 | + |
| 168 | +### Test Organization |
| 169 | +``` |
| 170 | +xplat/executorch/test/ # Core runtime tests |
| 171 | +xplat/executorch/backends/test/ # Backend validation suite |
| 172 | +xplat/executorch/examples/models/test/ # End-to-end model tests |
| 173 | +``` |
| 174 | + |
| 175 | +### Testing Patterns |
| 176 | +- **Unit Tests**: Individual operator and component validation |
| 177 | +- **Integration Tests**: End-to-end model export and execution |
| 178 | +- **Backend Tests**: Stage-based pipeline (Export → Quantize → ToEdge → Partition → Serialize) |
| 179 | +- **Numerical Validation**: Configurable tolerance (atol=1e-1, rtol=4e-2 for backends) |
| 180 | +- **Performance Testing**: ETDump profiling with timing and memory metrics |
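
As a rough illustration of the numerical-validation bullet above, the tolerance check can be sketched in plain Python. The helper name `within_tolerance` is hypothetical, and the formula assumes the common `|actual - expected| <= atol + rtol * |expected|` convention; the backend test suite's real comparison may differ.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-1,
    rtol: float = 4e-2,
) -> bool:
    """Return True if every element satisfies the atol/rtol bound."""
    # Mismatched lengths can never compare equal.
    if len(actual) != len(expected):
        return False
    # Element-wise check: |a - e| <= atol + rtol * |e|
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )
```

With the default backend tolerances, `within_tolerance([1.0], [1.05])` passes while `within_tolerance([1.0], [1.3])` does not, which shows how loose these bounds are compared with typical float32 checks.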
| 181 | + |
| 182 | +### Running Tests |
| 183 | +```bash |
| 184 | +# Python tests |
| 185 | +pytest # All tests |
| 186 | +pytest backends/xnnpack/test # Specific backend |
| 187 | +pytest -k "test_name" # Specific test |
| 188 | + |
| 189 | +# C++ tests |
| 190 | +cmake --build cmake-out --target test |
| 191 | + |
| 192 | +# Backend validation |
| 193 | +python -m backends.test.test_runner |
| 194 | +``` |
| 195 | + |
| 196 | +### Test Configuration |
| 197 | +- `xplat/executorch/pytest.ini` - pytest configuration with flake detection (50 reruns) |
| 198 | +- `xplat/executorch/Test.cmake` - C++ test definitions |
| 199 | +- CI/CD in `xplat/executorch/oss/.github/workflows/` - Multi-platform CI validation |
| 200 | + |
| 201 | +## Integration Points |
| 202 | + |
| 203 | +### PyTorch Integration |
| 204 | +- **torch.export()**: Primary model capture mechanism |
| 205 | +- **ATen Operators**: Compatible with PyTorch operator set |
| 206 | +- **Python Bindings**: `executorch.runtime` module for torch tensor integration |
| 207 | +- **Tensor Parsers**: Support for ATen, exec_aten, and portable tensor modes |
| 208 | + |
| 209 | +### Backend Delegation |
| 210 | +- **BackendInterface**: Standard contract for backend implementations |
| 211 | +- **Delegate API**: `executorch_call_delegate` for offloading subgraphs |
| 212 | +- **Partitioning**: Automatic or manual delegation of operations to accelerators |
| 213 | +- **External Tensors**: Support for backend-managed memory |
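
To make the partitioning idea above concrete, here is a deliberately simplified sketch. It treats a model as a flat list of operator names, which is an assumption for illustration only (real graphs are DAGs, and the actual partitioner APIs are richer). The function `partition_ops` and the operator names are hypothetical.

```python
from itertools import groupby
from typing import List, Set, Tuple


def partition_ops(
    ops: List[str], supported: Set[str]
) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into delegated or portable segments."""
    segments: List[Tuple[str, List[str]]] = []
    # groupby yields one run per stretch of ops with the same support status.
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        label = "delegate" if is_supported else "portable"
        segments.append((label, list(run)))
    return segments
```

Each `"delegate"` segment stands in for a subgraph that would be handed to an accelerator backend, while `"portable"` segments would fall back to the portable kernels.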
| 214 | + |
| 215 | +### Platform Integration |
| 216 | +- **Android**: JNI bindings in `xplat/executorch/extension/android/` |
| 217 | +- **iOS**: Swift package and frameworks in `xplat/executorch/extension/apple/` |
| 218 | +- **Embedded**: Zephyr RTOS support, bare-metal ARM builds |
| 219 | +- **Cross-Platform**: Vulkan for GPU acceleration across desktop and mobile |
| 220 | + |
| 221 | +### Data Flow |
| 222 | +``` |
| 223 | +PyTorch Model (Python) |
| 224 | + ↓ torch.export() |
| 225 | +EXIR Graph |
| 226 | + ↓ Backend Partitioning |
| 227 | +Delegated Subgraphs + Portable Ops |
| 228 | + ↓ Serialization |
| 229 | +.pte File (FlatBuffer) |
| 230 | + ↓ Runtime Loading |
| 231 | +C++ Runtime |
| 232 | + ↓ Execution |
| 233 | +CPU/GPU/NPU Hardware |
| 234 | +``` |
| 235 | + |
| 236 | +## Developer Rules |
| 237 | + |
| 238 | +### Portable C++ Programming |
| 239 | +**CRITICAL for `xplat/executorch/runtime/` code:** |
| 240 | +- **No stdlib dependencies**: No `std::vector`, `std::string`, `std::unique_ptr`, `std::shared_ptr` |
| 241 | +- **No exceptions/RTTI**: No `throw`, `try/catch`, `dynamic_cast`, `typeid` |
| 242 | +- **No standard I/O, heap allocation, or OS facilities**: No `printf()`, `cout`, `malloc()`, `new`, files, threads |
| 243 | +- **Use provided APIs**: `MemoryManager` for allocation, `ET_LOG` for logging, `Result<T>` for errors |
| 244 | +- **C++17 only**: Target C++17 standard, use `<cstdint>` types (`uint32_t`, `int64_t`) |
| 245 | + |
| 246 | +### Code Style |
| 247 | +**C++ (Google Style with modifications):** |
| 248 | +- **Column Limit**: 80 characters |
| 249 | +- **Indentation**: 2 spaces, no tabs |
| 250 | +- **Naming**: |
| 251 | + - Classes/Types: `PascalCase` (e.g., `BackendDelegate`, `EValue`) |
| 252 | + - Functions/Methods: `snake_case()` (differs from Google) |
| 253 | + - Variables: `snake_case` (private members: `trailing_underscore_`) |
| 254 | + - Files: `snake_case.cpp`, `snake_case.h` |
| 255 | +- **Includes**: Use `<angle brackets>` for all includes, sort alphabetically within groups |
| 256 | +- **Documentation**: Doxygen style (`/** ... */` for multi-line, `/// ...` for single-line) |
| 257 | + |
| 258 | +**Python (PEP 8 + Google Style):** |
| 259 | +- **Line Length**: 80 characters |
| 260 | +- **Type Hints**: Required throughout |
| 261 | +- **Naming**: |
| 262 | + - Classes: `UpperCamelCase` |
| 263 | + - Functions: `snake_case()` (private: `_leading_underscore()`) |
| 264 | + - Constants: `UPPER_SNAKE_CASE` |
| 265 | +- **Docstrings**: Google style with type annotations |
| 266 | +- **Imports**: Standard library → typing → third-party → local |
| 267 | + |
| 268 | +### Error Handling |
| 269 | +**C++**: Use `runtime::Result<T>` and `runtime::Error` patterns |
| 270 | +```cpp |
| 271 | +Result<int> foo() { |
| 272 | +  ET_CHECK_OR_RETURN_ERROR(condition, InvalidArgument, "error message"); |
| 273 | +  // ... |
| 274 | +  return value; |
| 275 | +} |
| 276 | +// Caller (must itself return Error or Result<T>): |
| 277 | +auto result = foo(); |
| 278 | +ET_CHECK_OR_RETURN_ERROR(result.ok(), InvalidState, "foo() failed"); |
| 279 | +int value = result.get(); |
| 280 | +``` |
| 281 | + |
| 282 | +**Python**: Use custom exceptions with `ExportErrorType` categorization |
| 283 | +```python |
| 284 | +from executorch.exir.error import ExportError, ExportErrorType |
| 285 | + |
| 286 | +if not valid: |
| 287 | +    raise ExportError(ExportErrorType.INVALID_INPUT_TYPE, "Description") |
| 288 | +``` |
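
The categorized-exception pattern above can also be shown in a form that runs without ExecuTorch installed. This is only a sketch: `ErrorCategory`, `CategorizedError`, and `check_shape` are hypothetical stand-ins that mirror the shape of the pattern, not the library's own classes.

```python
from enum import Enum
from typing import Tuple


class ErrorCategory(Enum):
    """Hypothetical stand-in for an error-type enum."""

    INVALID_INPUT = 1
    NOT_SUPPORTED = 2


class CategorizedError(Exception):
    """Hypothetical exception that carries a category plus a message."""

    def __init__(self, category: ErrorCategory, message: str) -> None:
        super().__init__(f"[{category.name}]: {message}")
        self.category = category


def check_shape(shape: Tuple[int, ...], max_rank: int = 4) -> None:
    """Raise a categorized error for malformed or unsupported shapes."""
    if not all(isinstance(d, int) and d > 0 for d in shape):
        raise CategorizedError(
            ErrorCategory.INVALID_INPUT, f"bad shape {shape}"
        )
    if len(shape) > max_rank:
        raise CategorizedError(
            ErrorCategory.NOT_SUPPORTED, f"rank {len(shape)} > {max_rank}"
        )
```

Carrying the category on the exception lets callers branch on `err.category` rather than parsing message strings.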
| 289 | + |
| 290 | +### Testing Requirements |
| 291 | +- All new features and bug fixes MUST include tests |
| 292 | +- Local testing: `pytest` (Python), `test/run_oss_cpp_tests.sh` (C++) |
| 293 | +- CI must pass before merge |
| 294 | +- Backend tests use relaxed tolerances (atol=1e-1, rtol=4e-2) |
| 295 | + |
| 296 | +### API Lifecycle |
| 297 | +- **Experimental**: Mark with `ET_EXPERIMENTAL` (C++) or `@experimental` (Python) - can change without notice |
| 298 | +- **Stable**: Default - breaking changes require deprecation |
| 299 | +- **Deprecated**: Mark with `ET_DEPRECATED`/`@deprecated` - remove after 2 minor releases |
| 300 | +- Document migration paths for deprecated APIs |
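
One way to picture the deprecation stage of the lifecycle above is a small decorator that warns callers and names the migration path. This is a sketch under stated assumptions: the `deprecated` decorator here is written from scratch for illustration and is not ExecuTorch's own, and `load_program` / `load_program_v2` are hypothetical function names.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(reason: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Illustrative decorator: emit a DeprecationWarning on every call."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("use load_program_v2() instead")
def load_program(path: str) -> str:
    """Hypothetical legacy API kept alive during the deprecation window."""
    return f"loaded {path}"
```

Putting the replacement API in the warning text is one lightweight way to satisfy the "document migration paths" rule.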
| 301 | + |
| 302 | +### Linting and Formatting |
| 303 | +**Setup and Run:** |
| 304 | +```bash |
| 305 | +# Setup (once) |
| 306 | +lintrunner init |
| 307 | + |
| 308 | +# Run all linters |
| 309 | +lintrunner -a |
| 310 | + |
| 311 | +# Auto-fix Python formatting |
| 312 | +ufmt format . |
| 313 | +``` |
| 314 | + |
| 315 | +**Configured Tools:** |
| 316 | +- **Python**: black, flake8, ufmt, mypy, torchfix |
| 317 | +- **C++**: clang-format (18.1.3) |
| 318 | +- **CMake**: cmakelang |
| 319 | + |
| 320 | +### Common Pitfalls |
| 321 | + |
| 322 | +**Export Issues:** |
| 323 | +- **Custom Operators**: Register with `torch.library` BEFORE export |
| 324 | +- **Dynamic Shapes**: Specify bounds with `dynamic_shapes` parameter |
| 325 | +- **Unsupported Ops**: Check operator support - not all PyTorch ops are supported |
| 326 | + |
| 327 | +**Runtime Issues:** |
| 328 | +- **Backend Linking**: Use `--whole-archive` when linking backend libraries |
| 329 | +- **Duplicate Registration**: Link only ONE `gen_operators_lib` per application |
| 330 | +- **Input Shape Mismatch**: Ensure runtime inputs match export-time example shapes |
| 331 | +- **Missing Operators**: Verify selective builds include all required ops |
| 332 | + |
| 333 | +**Performance Issues:** |
| 334 | +- **Debug Builds**: Always use `CMAKE_BUILD_TYPE=Release` for performance testing |
| 335 | +- **No Backend Delegation**: Ensure models use appropriate backends (XNNPACK, CoreML, etc.) |
| 336 | +- **Thread Count**: Tune the thread pool size (on mobile, often half the core count or 4 threads) |
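
The thread-count guidance above can be turned into a starting-point heuristic. The sketch below assumes one particular reading, "half the cores, capped at 4", which is an interpretation rather than a documented rule; `suggest_thread_count` is a hypothetical helper and the result should be validated by profiling on the target device.

```python
import os
from typing import Optional


def suggest_thread_count(cores: Optional[int] = None, cap: int = 4) -> int:
    """Return a starting thread count: half the cores, between 1 and cap."""
    if cores is None:
        # os.cpu_count() may return None on some platforms.
        cores = os.cpu_count() or 1
    return max(1, min(cap, cores // 2))
```

On an 8-core phone this suggests 4 threads, and on a 2-core device it suggests 1.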
| 337 | + |
| 338 | +**Build Issues:** |
| 339 | +- **Python Dev Packages**: Install `python-dev` or `python3-devel` |
| 340 | +- **Submodule Sync**: Run `git submodule sync && git submodule update --init --recursive` after pulling |
| 341 | +- **Clean Builds**: Use `--clean` flag if seeing unexpected build errors |
| 342 | + |
| 343 | +### Code Review Best Practices |
| 344 | +- Add clear PR title and description |
| 345 | +- Include test instructions and validation steps |
| 346 | +- Add reviewers based on CODEOWNERS or file blame |
| 347 | +- Label with appropriate release note tags |
| 348 | +- Use "Squash and merge" when ready |
| 349 | + |
| 350 | +### Performance Optimization |
| 351 | +- **Memory Planning**: Use AOT memory planning for constrained devices |
| 352 | +- **Quantization**: Apply QAT or PTQ for model size and performance |
| 353 | +- **Backend Selection**: Choose appropriate backend for target hardware |
| 354 | +- **Profiling**: Use `EXECUTORCH_SCOPE_PROF` macros and ETDump analysis |
| 355 | +- **Selective Build**: Link only required operators to minimize binary size |
| 356 | + |
| 357 | +### Additional Resources |
| 358 | +- **Contributing**: See `xplat/executorch/CONTRIBUTING.md` for detailed guidelines |
| 359 | +- **Oncall**: Primary oncall is `executorch`, component-specific in TARGETS files |
| 360 | +- **Code Ownership**: Check `xplat/executorch/CODEOWNERS` for file ownership |
| 361 | +- **CI Workflows**: See `xplat/executorch/oss/.github/workflows/` for CI configuration |