A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme.
Attribution Graphs Explorer is a framework for mechanistic interpretability of transformer models based on circuit tracing methods. The toolkit allows researchers to:
- Extract computational circuits from neural networks
- Trace linear paths of information flow
- Visualize attribution graphs
- Test causal hypotheses through perturbation
Our implementation builds on the methods described in the Attribution Graphs research, allowing for programmatic analysis of neural network internals.
The toolkit is organized around several key components:
graph TD
A[Input Tokens] --> B[Token Embeddings]
B --> C[Cross-Layer Transcoder]
C --> D[Feature Activations]
D --> E[Attribution Graph]
E --> F[Output Logits]
style C fill:#f9f,stroke:#333,stroke-width:4px
style E fill:#bbf,stroke:#333,stroke-width:4px
Cross-Layer Transcoders provide a way to bypass MLP nonlinearities, creating linear feature-to-feature interactions that can be traced through the network. The CLT modules:
- Read from the residual stream at one layer
- Contribute to the outputs of all subsequent MLP layers
- Maintain sparse feature representations
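The read-once, write-to-many behaviour described above can be illustrated with a minimal sketch on plain lists. This is not the toolkit's `make-clt` implementation: the names `feature-activation` and `feature-contributions`, the use of ReLU as the sparsifying nonlinearity, and the example layer numbers are all assumptions made for illustration.

```scheme
;; Hypothetical sketch of a single cross-layer transcoder feature.
;; Vectors are plain lists of numbers; none of these names come from the toolkit.
(use-modules (srfi srfi-1))

;; Dot product of two equal-length lists.
(define (dot xs ys)
  (fold + 0 (map * xs ys)))

;; ReLU (assumed here): negative pre-activations become 0, which keeps
;; most features inactive and the representation sparse.
(define (relu x) (max 0 x))

;; Read step: project the residual stream at the source layer onto the
;; feature's encoder vector.
(define (feature-activation encoder residual)
  (relu (dot encoder residual)))

;; Write step: scale each target layer's decoder vector by the activation.
;; `decoders` is an alist of (layer . decoder-vector).
(define (feature-contributions activation decoders)
  (map (lambda (entry)
         (cons (car entry)
               (map (lambda (w) (* activation w)) (cdr entry))))
       decoders))

;; Example: one feature with decoders for layers 6 and 7.
(define activation (feature-activation '(1 0 -1) '(3 5 1)))
(define contributions
  (feature-contributions activation '((6 . (1 0)) (7 . (0 2)))))
```

Because each contribution is just the activation times a fixed decoder vector, the write step is linear in the activation, which is what makes feature-to-feature effects traceable.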
Attribution graphs represent the computational flow as a directed graph:
- Nodes: Features, tokens, and logits
- Edges: Attribution weights between features
- Paths: Computational circuits through the network
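One way to picture this structure is with SRFI-9 records, which the toolkit lists among its dependencies. The record types, field names, and example node ids below are hypothetical and may differ from the toolkit's own definitions.

```scheme
;; Hypothetical sketch of an attribution graph as records plus edge lists.
(use-modules (srfi srfi-1) (srfi srfi-9))

;; A node is identified by a symbol and tagged with its kind:
;; 'token, 'feature, or 'logit.
(define-record-type <node>
  (make-node id kind)
  node?
  (id node-id)
  (kind node-kind))

;; A directed edge carries the attribution weight from source to target.
(define-record-type <edge>
  (make-edge source target weight)
  edge?
  (source edge-source)
  (target edge-target)
  (weight edge-weight))

(define nodes
  (list (make-node 'tok-0 'token)
        (make-node 'feat-a 'feature)
        (make-node 'logit-x 'logit)))

(define edges
  (list (make-edge 'tok-0 'feat-a 0.5)
        (make-edge 'feat-a 'logit-x 0.5)
        (make-edge 'tok-0 'logit-x 0.25)))

;; All edges leaving the node with the given id.
(define (outgoing-edges edges id)
  (filter (lambda (e) (eq? (edge-source e) id)) edges))
```

A path is then any chain of edges in which each edge's target is the next edge's source.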
The toolkit provides algorithms for finding interpretable circuits:
- Path tracing algorithms
- Circuit motif identification
- Circuit visualization
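As a rough illustration of path tracing, the sketch below enumerates every path between two nodes and scores each one by the product of its edge weights. It assumes an acyclic graph and a simple `(source target weight)` edge format; `trace-paths` and `strongest-path` are hypothetical names, not the toolkit's `find-circuits` algorithm.

```scheme
;; Hypothetical path-tracing sketch over (source target weight) edges.
(use-modules (srfi srfi-1))

(define example-edges
  '((tok feat-a 0.5)
    (tok feat-b 0.25)
    (feat-a logit 0.5)
    (feat-b logit 2.0)))

;; Return every path from `from` to `to` as (weight . node-list), where
;; weight is the product of the edge weights along the path.
;; Assumes the graph is acyclic; a cycle would recurse forever.
(define (trace-paths edges from to)
  (if (eq? from to)
      (list (cons 1 (list to)))
      (append-map
       (lambda (edge)
         (map (lambda (path)
                (cons (* (third edge) (car path))
                      (cons from (cdr path))))
              (trace-paths edges (second edge) to)))
       (filter (lambda (edge) (eq? (first edge) from)) edges))))

;; Path with the largest absolute weight, or #f if no path exists.
(define (strongest-path edges from to)
  (let ((paths (trace-paths edges from to)))
    (if (null? paths)
        #f
        (reduce (lambda (p best)
                  (if (> (abs (car p)) (abs (car best))) p best))
                #f
                paths))))
```

Exhaustive enumeration is only practical for small or pruned graphs, since the number of paths can grow exponentially with depth.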
Test hypotheses about discovered circuits:
- Perturbation experiments
- Causal validation
- Sparsity and concentration metrics
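The sketch below shows what a perturbation experiment and a concentration metric might look like on a toy linear readout. Everything here is an assumption for illustration: the `(name activation weight)` feature format, the function names, and the choice of "share of absolute weight in the top k edges" as the concentration measure.

```scheme
;; Hypothetical validation sketch: ablation effect and top-k concentration.
(use-modules (srfi srfi-1))

;; Toy linear readout: logit = sum of activation * output weight.
;; `features` is a list of (name activation weight).
(define (logit features)
  (fold + 0 (map (lambda (f) (* (second f) (third f))) features)))

;; Perturbation: set one feature's activation to 0, leave the rest unchanged.
(define (ablate features name)
  (map (lambda (f)
         (if (eq? (first f) name)
             (list (first f) 0 (third f))
             f))
       features))

;; Causal effect of a feature: change in the logit when it is ablated.
(define (ablation-effect features name)
  (- (logit features) (logit (ablate features name))))

;; Concentration: fraction of total absolute weight held by the k largest
;; weights. Values near 1 mean a few edges carry most of the attribution.
(define (top-k-concentration weights k)
  (let* ((mags (sort (map abs weights) >))
         (total (fold + 0 mags)))
    (if (zero? total)
        0
        (/ (fold + 0 (take mags (min k (length mags)))) total))))

(define example-features '((feat-a 2 3) (feat-b 1 -1)))
```

If a circuit hypothesis says `feat-a` drives the output, a large value of `(ablation-effect example-features 'feat-a)` relative to the other features would be consistent with it.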
# Clone the repository
git clone https://github.com/aygp-dr/attribution-graphs-explorer.git
cd attribution-graphs-explorer
# Configure and build
./configure
gmake
# Run tests to verify installation
gmake test
# Run examples
gmake run
This project has been developed and tested with the following environment:
| Component | Version | Notes |
|---|---|---|
| Operating System | FreeBSD 14.2 | Should work on most Unix-like systems |
| Guile | 3.0.10 | Minimum 3.0 required |
| GNU Make | 4.4.1 | gmake on FreeBSD |
| GNU Grep | 3.11 | ggrep on FreeBSD |
| GNU Awk | 5.3.2 | gawk on FreeBSD |
| Direnv | 2.35.0 | For environment management |
| Emacs | 30.1 | For org-mode processing and documentation |
Guile module dependencies:
- SRFI libraries: srfi-1, srfi-9, srfi-43
- ice-9 regex
;; Load the framework
(add-to-load-path "/path/to/attribution-graphs-explorer")
(use-modules (attribution-graphs clt transcoder)
             (attribution-graphs graph attribution)
             (attribution-graphs circuits discovery))
;; Create a cross-layer transcoder
(define my-clt (make-clt 5 '(6 7 8) 768 128 768))
;; Generate attribution graph
(define graph (compute-attribution-graph my-clt "Example input" 'last-token))
;; Find and visualize circuits
(define circuits (find-circuits graph))
(display (circuit->mermaid circuits graph))
The repository includes example applications:
Analyzes how transformer models plan rhyming in poetry:
(use-modules (attribution-graphs examples poetry-circuit))
(analyze-poetry-planning model "Roses are red\nViolets are ")
Traces factual recall with intermediate reasoning steps:
(use-modules (attribution-graphs examples reasoning-circuit))
(analyze-multihop-reasoning model "The capital of the state containing Dallas is")
This project is currently in alpha status (version 0.1.0). The core framework is implemented and functional, but several components are placeholders for demonstration purposes. The following pieces are implemented:
- Core data structures (CLT, attribution graphs, nodes, edges)
- Basic mathematical operations (activation functions, matrix operations)
- Graph construction and manipulation
- Circuit discovery algorithms
- Visualization generation (Mermaid diagrams)
- Example applications (poetry and reasoning circuits)
- Test framework
Current limitations:
- Matrix operations use simplified random generation
- Some functions are placeholders (e.g., embed-tokens, find-edge)
- No actual model integration (uses mock CLT instances)
- Limited validation framework
- No real-world model examples
Planned work:
- Integration with actual transformer models
- Improved matrix operations and numerical methods
- Enhanced visualization capabilities
- More comprehensive test suite
- Performance optimizations
- Additional circuit discovery algorithms
This toolkit builds on recent work in mechanistic interpretability of large language models:
- Attribution Graphs Methods - The core technical approach
- Attribution Graphs Biology - Application to biological knowledge
- Transformer Circuits - Broader context of circuit analysis
- Circuits: Zoom In on Neurons - Foundational work on circuit analysis in vision models
MIT License
If you use this toolkit in your research, please cite:
@software{attribution_graphs_explorer,
  author = {AYGP-DR Research Team},
  title = {Attribution Graphs Explorer: A Toolkit for Circuit Tracing in Transformer Models},
  url = {https://github.com/aygp-dr/attribution-graphs-explorer},
  year = {2025},
}