Commit 7e0c7d0

Zonglin Peng authored and meta-codesync[bot] committed
create repo context
Differential Revision: D84163926
1 parent 1a5eaec commit 7e0c7d0

.llms/rules/Executorch.md

Lines changed: 361 additions & 0 deletions

---
oncalls:
- executorch
llms-gk: devmate_executorch_cadence_md
apply_to_regex: ".*"
---

# ExecuTorch: On-Device AI Inference Runtime

## Project Description

ExecuTorch is Meta's end-to-end solution for enabling on-device AI inference across mobile and edge devices including wearables, embedded systems, and microcontrollers. It extends PyTorch 2's compiler and export functionality to efficiently deploy models to edge devices with minimal memory footprint and superior performance. ExecuTorch powers Meta's on-device AI experiences across Facebook, Instagram, Meta Quest, Ray-Ban Meta Smart Glasses, and WhatsApp.

**Key Value Propositions:**
- **Portability**: Compatible across platforms from high-end mobile to constrained embedded systems
- **Productivity**: Unified toolchain from PyTorch authoring through deployment
- **Performance**: Lightweight runtime (<50kB core) with full hardware acceleration support

**Supported Models**: LLMs (Llama, Qwen, Phi), Computer Vision, ASR, TTS, and multimodal models

## Architecture Overview

### Three-Phase Workflow
1. **Program Preparation (AOT)**: Model export, quantization, memory planning, backend delegation
2. **Runtime Preparation**: Program loading and initialization
3. **Program Execution**: Kernel dispatch and inference

### Four-Dialect Lowering Pipeline
```
PyTorch Model → torch.export()
  ↓ ATen Dialect (PyTorch-compatible operators)
  ↓ Core ATen Dialect (Decomposed fundamental ops)
  ↓ Edge Dialect (Edge-optimized, dtype-specialized)
  ↓ Backend Dialect (Hardware-specific optimizations)
  ↓ .pte file (Serialized execution graph)
```
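
The same pipeline can be driven from Python. A minimal sketch, assuming the `to_edge` API from `executorch.exir` and a toy `Add` module defined here only for illustration:

```python
import torch
from executorch.exir import to_edge

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y

# ATen Dialect: capture the model with torch.export().
exported = torch.export.export(Add(), (torch.ones(2), torch.ones(2)))

# Edge Dialect: decompose and specialize for edge deployment.
edge = to_edge(exported)

# Serialize the execution graph to a .pte file for the C++ runtime.
et_program = edge.to_executorch()
with open("add.pte", "wb") as f:
    f.write(et_program.buffer)
```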

### Core Components
- **Runtime (`xplat/executorch/runtime/`)**: Core C++ execution engine, tensor data structures, operator dispatch
- **EXIR (`xplat/executorch/exir/`)**: Export IR system, four-dialect compilation pipeline
- **Kernels (`xplat/executorch/kernels/`)**: Portable, optimized, and quantized operator implementations
- **Backends (`xplat/executorch/backends/`)**: Hardware acceleration (Apple CoreML/MPS, Qualcomm QNN, ARM Ethos-U, Vulkan, XNNPACK, etc.)
- **Extensions (`xplat/executorch/extension/`)**: LLM framework, training, module wrappers, platform integration
- **DevTools (`xplat/executorch/devtools/`)**: ETDump profiling, ETRecord debugging, inspector APIs

## Directory Structure & Dependencies

```
xplat/executorch/
├── runtime/               # Core C++ runtime (<50kB, portable)
│   ├── core/              # Fundamental types (Tensor, EValue, Error)
│   ├── executor/          # Program loading and execution
│   ├── kernel/            # Operator registration and dispatch
│   ├── backend/           # Backend delegate APIs
│   └── platform/          # Platform abstraction layer (PAL)
├── exir/                  # Export IR compiler (4-dialect pipeline)
├── kernels/               # Operator implementations
│   ├── portable/          # Pure C++ reference kernels
│   ├── optimized/         # Hardware-optimized versions
│   └── quantized/         # Quantization kernels
├── backends/              # Hardware acceleration (15+ backends)
│   ├── apple/             # CoreML, MPS
│   ├── qualcomm/          # QNN (HTP/NPU)
│   ├── arm/               # Ethos-U, TOSA
│   ├── xnnpack/           # Optimized CPU
│   └── vulkan/            # Cross-platform GPU
├── extension/             # Runtime extensions
│   ├── llm/               # LLM tokenizers, samplers, KV cache
│   ├── module/            # Simplified C++ APIs
│   └── android/ios/       # Platform integration
├── devtools/              # Profiling and debugging tools
├── examples/              # Model export and usage examples
├── schema/                # FlatBuffer file format definitions
└── tools/                 # Build system integration
```

**Key Dependencies:**
- **Internal**: PyTorch 2.10.0 (pinned via `torch_pin.py`)
- **External**: FlatBuffers, numpy>=2.0.0, pybind11, CMake>=3.29
- **Platform-Specific**: coremltools (Apple), QNN SDK (Qualcomm), ARM Compute Library

**Build Systems**:
- Buck2 (internal Meta builds)
- CMake (OSS builds, primary)
- Python setuptools (pip packages)

## Build and Development Workflow

### Quick Start
```bash
# Clone and setup
git clone -b viable/strict https://github.com/pytorch/executorch.git
cd executorch
conda create -yn executorch python=3.10.0 && conda activate executorch

# Install in development mode
./install_executorch.sh --editable

# Run tests
pytest
```

### CMake Build Options
```bash
# Platform-specific builds (via presets)
cmake .. --preset android-arm64-v8a    # Android
cmake .. --preset ios                  # iOS
cmake .. --preset macos                # macOS
cmake .. --preset linux                # Linux

# Feature flags
-DEXECUTORCH_BUILD_XNNPACK=ON          # XNNPACK backend
-DEXECUTORCH_BUILD_COREML=ON           # CoreML backend
-DEXECUTORCH_BUILD_EXTENSION_LLM=ON    # LLM support
-DEXECUTORCH_BUILD_TESTS=ON            # Build tests
-DEXECUTORCH_ENABLE_LOGGING=ON         # Runtime logging
-DCMAKE_BUILD_TYPE=Release             # Release build (required for perf)

# Build and run
cmake --build cmake-out -j9
cmake --build cmake-out --target test
```

### Export and Run Example
```bash
# Export model
python -m examples.portable.scripts.export --model_name="add"

# Execute
./cmake-out/executor_runner --model_path add.pte
```

### Build Scripts
- `xplat/executorch/scripts/build_android_library.sh` - Android native builds
- `xplat/executorch/scripts/build_apple_frameworks.sh` - iOS/macOS frameworks
- `xplat/executorch/install_executorch.sh` - Main installation script

### Cleaning and Rebuilding
```bash
./install_executorch.sh --clean
git submodule sync && git submodule update --init --recursive
```

## Documentation References

**Core Documentation** (`xplat/executorch/docs/`):
- Architecture guides and design documents
- Backend integration tutorials
- Platform-specific deployment guides
- API references and examples

**README Files**:
- `xplat/executorch/README.md` - Project overview and getting started
- `xplat/executorch/CONTRIBUTING.md` - Contribution guidelines
- Backend-specific READMEs in `xplat/executorch/backends/*/README.md`

**Internal Resources**:
- Workplace groups and internal wikis for Meta-specific deployment patterns
- Oncall: `executorch` (check TARGETS files for component-specific oncalls)

## Testing Strategy

### Test Frameworks
- **Python**: pytest with hypothesis for property-based testing
- **C++**: GoogleTest for unit tests
- **Backend Testing**: Modular test suite in `xplat/executorch/backends/test/`

### Test Organization
```
xplat/executorch/test/                    # Core runtime tests
xplat/executorch/backends/test/           # Backend validation suite
xplat/executorch/examples/models/test/    # End-to-end model tests
```

### Testing Patterns
- **Unit Tests**: Individual operator and component validation
- **Integration Tests**: End-to-end model export and execution
- **Backend Tests**: Stage-based pipeline (Export → Quantize → ToEdge → Partition → Serialize)
- **Numerical Validation**: Configurable tolerance (atol=1e-1, rtol=4e-2 for backends); see the sketch after this list
- **Performance Testing**: ETDump profiling with timing and memory metrics
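
To illustrate the numerical-validation pattern, here is a minimal pytest-style sketch that checks a runtime output against the eager PyTorch reference using the tolerances above; `run_lowered_program` is a hypothetical helper standing in for however the lowered `.pte` is executed (see Integration Points):

```python
import torch

def assert_close(actual: torch.Tensor, expected: torch.Tensor) -> None:
    # Relaxed tolerances used for backend tests (atol=1e-1, rtol=4e-2).
    assert torch.allclose(actual, expected, atol=1e-1, rtol=4e-2), (
        f"max abs diff: {(actual - expected).abs().max().item()}"
    )

def test_add_matches_eager() -> None:
    x, y = torch.randn(8), torch.randn(8)
    expected = x + y                      # Eager PyTorch reference.
    actual = run_lowered_program(x, y)    # Hypothetical helper that runs the .pte.
    assert_close(actual, expected)
```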

### Running Tests
```bash
# Python tests
pytest                          # All tests
pytest backends/xnnpack/test    # Specific backend
pytest -k "test_name"           # Specific test

# C++ tests
cmake --build cmake-out --target test

# Backend validation
python -m backends.test.test_runner
```

### Test Configuration
- `xplat/executorch/pytest.ini` - pytest configuration with flake detection (50 reruns)
- `xplat/executorch/Test.cmake` - C++ test definitions
- CI/CD in `xplat/executorch/oss/.github/workflows/` - Multi-platform CI validation

## Integration Points

### PyTorch Integration
- **torch.export()**: Primary model capture mechanism
- **ATen Operators**: Compatible with PyTorch operator set
- **Python Bindings**: `executorch.runtime` module for torch tensor integration (see the sketch after this list)
- **Tensor Parsers**: Support for ATen, exec_aten, and portable tensor modes
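
For quick validation from Python, the runtime bindings can load and run a `.pte` directly. A minimal sketch, assuming the `executorch.runtime.Runtime` API and the `add.pte` produced earlier; verify the exact method names against the installed version:

```python
import torch
from executorch.runtime import Runtime

# Load the serialized program through the Python runtime bindings.
runtime = Runtime.get()
program = runtime.load_program("add.pte")
method = program.load_method("forward")

# Execute with torch tensors; outputs come back as torch tensors.
outputs = method.execute([torch.ones(2), torch.ones(2)])
print(outputs[0])  # Expected: tensor([2., 2.]) for the toy Add model.
```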

### Backend Delegation
- **BackendInterface**: Standard contract for backend implementations
- **Delegate API**: `executorch_call_delegate` for offloading subgraphs
- **Partitioning**: Automatic or manual delegation of operations to accelerators (see the sketch after this list)
- **External Tensors**: Support for backend-managed memory
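
To make partitioning concrete, here is a minimal sketch that delegates supported operators to XNNPACK during lowering; it assumes the `XnnpackPartitioner` and `to_edge_transform_and_lower` entry points and is an outline rather than a canonical recipe:

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y

exported = torch.export.export(Add(), (torch.ones(2), torch.ones(2)))

# Partition the graph: supported subgraphs become XNNPACK delegate calls
# (executorch_call_delegate nodes); the rest stays on portable ops.
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])

et_program = edge.to_executorch()
with open("add_xnnpack.pte", "wb") as f:
    f.write(et_program.buffer)
```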

### Platform Integration
- **Android**: JNI bindings in `xplat/executorch/extension/android/`
- **iOS**: Swift package and frameworks in `xplat/executorch/extension/apple/`
- **Embedded**: Zephyr RTOS support, bare-metal ARM builds
- **Cross-Platform**: Vulkan for GPU acceleration across desktop and mobile

### Data Flow
```
PyTorch Model (Python)
  ↓ torch.export()
EXIR Graph
  ↓ Backend Partitioning
Delegated Subgraphs + Portable Ops
  ↓ Serialization
.pte File (FlatBuffer)
  ↓ Runtime Loading
C++ Runtime
  ↓ Execution
CPU/GPU/NPU Hardware
```

## Developer Rules

### Portable C++ Programming
**CRITICAL for `xplat/executorch/runtime/` code:**
- **No stdlib dependencies**: No `std::vector`, `std::string`, `std::unique_ptr`, `std::shared_ptr`
- **No exceptions/RTTI**: No `throw`, `try/catch`, `dynamic_cast`, `typeid`
- **No standard I/O or dynamic allocation**: No `printf()`, `cout`, `malloc()`, `new`, files, or threads
- **Use provided APIs**: `MemoryManager` for allocation, `ET_LOG` for logging, `Result<T>` for errors
- **C++17 only**: Target the C++17 standard; use `<cstdint>` types (`uint32_t`, `int64_t`)

### Code Style
**C++ (Google Style with modifications):**
- **Column Limit**: 80 characters
- **Indentation**: 2 spaces, no tabs
- **Naming**:
  - Classes/Types: `PascalCase` (e.g., `BackendDelegate`, `EValue`)
  - Functions/Methods: `snake_case()` (differs from Google)
  - Variables: `snake_case` (private members: `trailing_underscore_`)
  - Files: `snake_case.cpp`, `snake_case.h`
- **Includes**: Use `<angle brackets>` for all includes, sort alphabetically within groups
- **Documentation**: Doxygen style (`/** ... */` for multi-line, `/// ...` for single-line)

**Python (PEP 8 + Google Style):**
- **Line Length**: 80 characters
- **Type Hints**: Required throughout
- **Naming**:
  - Classes: `UpperCamelCase`
  - Functions: `snake_case()` (private: `_leading_underscore()`)
  - Constants: `UPPER_SNAKE_CASE`
- **Docstrings**: Google style with type annotations
- **Imports**: Standard library → typing → third-party → local

### Error Handling
**C++**: Use `runtime::Result<T>` and `runtime::Error` patterns
```cpp
Result<int> foo() {
  // Logs and returns an Error (here InvalidArgument) if the condition fails.
  ET_CHECK_OR_RETURN_ERROR(condition, InvalidArgument, "condition not met");
  // ...
  return value;
}

// Caller (itself returning a Result or Error):
auto result = foo();
ET_CHECK_OR_RETURN_ERROR(result.ok(), Internal, "foo() failed");
int value = ET_UNWRAP(result);
```

**Python**: Use custom exceptions with `ExportErrorType` categorization
```python
from executorch.exir.error import ExportError, ExportErrorType, InternalError

if not valid:
    raise ExportError(ExportErrorType.INVALID_INPUT, "Description")
```

### Testing Requirements
- All new features and bug fixes MUST include tests
- Local testing: `pytest` (Python), `test/run_oss_cpp_tests.sh` (C++)
- CI must pass before merge
- Backend tests use relaxed tolerances (atol=1e-1, rtol=4e-2)

### API Lifecycle
- **Experimental**: Mark with `ET_EXPERIMENTAL` (C++) or `@experimental` (Python) - can change without notice
- **Stable**: Default - breaking changes require deprecation
- **Deprecated**: Mark with `ET_DEPRECATED`/`@deprecated` - remove after 2 minor releases; see the generic sketch after this list
- Document migration paths for deprecated APIs
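
As a rough illustration of the deprecation stage (a generic Python pattern, not ExecuTorch's own `@deprecated` decorator), a wrapper that warns callers and names the migration path might look like this:

```python
import functools
import warnings

def deprecated(replacement: str):
    """Mark a function as deprecated and point callers at its replacement."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated; migrate to {replacement}. "
                "It will be removed after two minor releases.",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(replacement="to_edge_transform_and_lower")
def legacy_lowering(exported_program):
    ...
```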

### Linting and Formatting
**Setup and Run:**
```bash
# Setup (once)
lintrunner init

# Run all linters
lintrunner -a

# Auto-fix Python formatting
ufmt format .
```

**Configured Tools:**
- **Python**: black, flake8, ufmt, mypy, torchfix
- **C++**: clang-format (18.1.3)
- **CMake**: cmakelang

### Common Pitfalls

**Export Issues:**
- **Custom Operators**: Register with `torch.library` BEFORE export
- **Dynamic Shapes**: Specify bounds with the `dynamic_shapes` parameter (see the sketch after this list)
- **Unsupported Ops**: Check operator support - not all PyTorch ops are supported
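
For the dynamic-shapes pitfall, a minimal sketch that bounds a batch dimension with `torch.export.Dim` before lowering; the bounds and model here are illustrative:

```python
import torch
from torch.export import Dim, export

class MatVec(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(16, 8))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T

# Bound the batch dimension so the runtime accepts batches in the given range.
batch = Dim("batch", min=2, max=32)
exported = export(
    MatVec(),
    (torch.randn(4, 8),),
    dynamic_shapes={"x": {0: batch}},
)
```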

**Runtime Issues:**
- **Backend Linking**: Use `--whole-archive` when linking backend libraries
- **Duplicate Registration**: Link only ONE `gen_operators_lib` per application
- **Input Shape Mismatch**: Ensure runtime inputs match export-time example shapes
- **Missing Operators**: Verify selective builds include all required ops

**Performance Issues:**
- **Debug Builds**: Always use `CMAKE_BUILD_TYPE=Release` for performance testing
- **No Backend Delegation**: Ensure models use appropriate backends (XNNPACK, CoreML, etc.)
- **Thread Count**: Tune the thread pool size (often cores/2, or 4, on mobile)

**Build Issues:**
- **Python Dev Packages**: Install `python-dev` or `python3-devel`
- **Submodule Sync**: Run `git submodule update --init` after pulling
- **Clean Builds**: Use the `--clean` flag if you see unexpected build errors

### Code Review Best Practices
- Add a clear PR title and description
- Include test instructions and validation steps
- Add reviewers based on CODEOWNERS or file blame
- Label with appropriate release note tags
- Use "Squash and merge" when ready

### Performance Optimization
- **Memory Planning**: Use AOT memory planning for constrained devices
- **Quantization**: Apply QAT or PTQ for model size and performance
- **Backend Selection**: Choose the appropriate backend for the target hardware
- **Profiling**: Use `EXECUTORCH_SCOPE_PROF` macros and ETDump analysis (see the sketch after this list)
- **Selective Build**: Link only required operators to minimize binary size
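
To connect profiling with the devtools, here is a minimal sketch of inspecting an ETDump from an instrumented run; it assumes the `executorch.devtools.Inspector` API and placeholder file paths, so check the constructor arguments against the installed version:

```python
from executorch.devtools import Inspector

# etdump.etdp comes from a profiled run of the runtime; the ETRecord links
# runtime events back to the exported graph (both paths are placeholders).
inspector = Inspector(etdump_path="etdump.etdp", etrecord="etrecord.bin")

# Print the per-operator timing data collected during execution.
inspector.print_data_tabular()
```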

### Additional Resources
- **Contributing**: See `xplat/executorch/CONTRIBUTING.md` for detailed guidelines
- **Oncall**: Primary oncall is `executorch`; component-specific oncalls are in TARGETS files
- **Code Ownership**: Check `xplat/executorch/CODEOWNERS` for file ownership
- **CI Workflows**: See `xplat/executorch/oss/.github/workflows/` for CI configuration
