1 Attribution Graphs Explorer

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme.

1.1 Overview

Attribution Graphs Explorer is a framework for mechanistic interpretability of transformer models based on circuit tracing methods. The toolkit allows researchers to:

Extract computational circuits from neural networks
Trace linear paths of information flow
Visualize attribution graphs
Test causal hypotheses through perturbation

Our implementation builds on the methods described in the Attribution Graphs research, allowing for programmatic analysis of neural network internals.

1.2 Architecture

The toolkit is organized around several key components:

graph TD
    A[Input Tokens] --> B[Token Embeddings]
    B --> C[Cross-Layer Transcoder]
    C --> D[Feature Activations]
    D --> E[Attribution Graph]
    E --> F[Output Logits]
    
    style C fill:#f9f,stroke:#333,stroke-width:4px
    style E fill:#bbf,stroke:#333,stroke-width:4px

1.2.1 Cross-Layer Transcoders (CLT)

Cross-Layer Transcoders (CLTs) provide a way to bypass MLP nonlinearities: they stand in for the MLP layers so that feature-to-feature interactions are linear and can be traced through the network. The CLT modules:

Read from the residual stream at one layer
Contribute to all subsequent MLP layers
Maintain sparse feature representations
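
The three properties above can be illustrated with a minimal sketch. It assumes a toy CLT stored as plain lists of numbers; the record type `<toy-clt>` and the helpers `toy-clt-encode` and `toy-clt-decode` are hypothetical names chosen for this example and are not the toolkit's API.

```scheme
(use-modules (srfi srfi-1)   ; fold, multi-list map
             (srfi srfi-9))  ; define-record-type

;; Hypothetical record for illustration; the toolkit's own CLT type may differ.
(define-record-type <toy-clt>
  (make-toy-clt read-layer write-layers encoder decoders)
  toy-clt?
  (read-layer   toy-clt-read-layer)    ; layer whose residual stream is read
  (write-layers toy-clt-write-layers)  ; later layers that receive output
  (encoder      toy-clt-encoder)       ; list of rows, one row per feature
  (decoders     toy-clt-decoders))     ; alist: layer -> rows, one per feature

(define (dot u v) (fold + 0 (map * u v)))

;; Feature activations: ReLU of each encoder row dotted with the residual.
;; The ReLU zeroes out negative pre-activations, which keeps the features sparse.
(define (toy-clt-encode clt residual)
  (map (lambda (row) (max 0 (dot row residual)))
       (toy-clt-encoder clt)))

;; Contribution to one write layer: sum over features of activation * decoder row.
;; This step is linear in the activations.
(define (toy-clt-decode clt layer activations)
  (let ((rows (assv-ref (toy-clt-decoders clt) layer)))
    (fold (lambda (a row acc)
            (map (lambda (r x) (+ x (* a r))) row acc))
          (make-list (length (car rows)) 0)
          activations
          rows)))

;; Toy instance: reads layer 5, writes to layers 6 and 7,
;; three features over a two-dimensional residual stream.
(define toy-clt
  (make-toy-clt 5 '(6 7)
                '((1 0) (0 -1) (1 1))
                '((6 . ((1 0) (0 1) (1 1)))
                  (7 . ((0 2) (1 0) (0 0))))))

(define toy-acts (toy-clt-encode toy-clt '(2 3)))
(display toy-acts) (newline)                             ; (2 0 5)
(display (toy-clt-decode toy-clt 6 toy-acts)) (newline)  ; (7 5)
(display (toy-clt-decode toy-clt 7 toy-acts)) (newline)  ; (0 4)
```

The second feature is inactive for this input, so it contributes nothing to either write layer.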

1.2.2 Attribution Graphs

Attribution graphs represent the computational flow as a directed graph:

Nodes: Features, tokens, and logits
Edges: Attribution weights between features
Paths: Computational circuits through the network
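
As a rough illustration of this structure, the sketch below models nodes and edges as SRFI-9 records. The record types, node names, and weights are all made up for the example and do not reflect the toolkit's internal representation.

```scheme
(use-modules (srfi srfi-1)   ; filter, fold
             (srfi srfi-9))  ; define-record-type

;; Hypothetical node record: an id plus one of 'token, 'feature, 'logit.
(define-record-type <toy-node>
  (make-toy-node id kind)
  toy-node?
  (id   toy-node-id)
  (kind toy-node-kind))

;; Hypothetical edge record: attribution weight from one node to another.
(define-record-type <toy-edge>
  (make-toy-edge from to weight)
  toy-edge?
  (from   toy-edge-from)
  (to     toy-edge-to)
  (weight toy-edge-weight))

;; Illustrative graph; the weights are invented, not measured.
(define toy-nodes
  (list (make-toy-node 'tok-dallas   'token)
        (make-toy-node 'feat-texas   'feature)
        (make-toy-node 'feat-capital 'feature)
        (make-toy-node 'logit-austin 'logit)))

(define toy-edges
  (list (make-toy-edge 'tok-dallas   'feat-texas   0.8)
        (make-toy-edge 'tok-dallas   'feat-capital 0.3)
        (make-toy-edge 'feat-texas   'logit-austin 0.7)
        (make-toy-edge 'feat-capital 'logit-austin 0.4)))

;; All nodes of a given kind.
(define (nodes-of-kind nodes kind)
  (filter (lambda (n) (eq? (toy-node-kind n) kind)) nodes))

;; Edges that point into the node with the given id.
(define (incoming-edges edges id)
  (filter (lambda (e) (eq? (toy-edge-to e) id)) edges))

;; Sum of attribution weights flowing into a node.
(define (total-incoming edges id)
  (fold + 0 (map toy-edge-weight (incoming-edges edges id))))

(display (map toy-node-id (nodes-of-kind toy-nodes 'feature)))
(newline)
(display (total-incoming toy-edges 'logit-austin))
(newline)
```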

1.2.3 Circuit Discovery

The toolkit provides algorithms for finding interpretable circuits:

Path tracing algorithms
Circuit motif identification
Circuit visualization
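
To make the path tracing idea concrete, here is a small sketch under stated assumptions: edges are plain `(from to weight)` lists, the graph is acyclic, weak edges are pruned by an absolute-weight threshold, and a path is scored by the product of its edge weights. The function `trace-paths` and the edge data are hypothetical and may not match the toolkit's own algorithms.

```scheme
(use-modules (srfi srfi-1))  ; first, second, third, filter, append-map

;; Edges as (from to weight); invented values for illustration.
(define toy-path-edges
  '((tok-dallas  feat-texas   0.8)
    (feat-texas  feat-austin  0.5)
    (feat-texas  logit-austin 0.1)
    (feat-austin logit-austin 0.9)
    (tok-dallas  logit-austin 0.05)))

;; Depth-first enumeration of paths from source to target.
;; Edges with |weight| below threshold are skipped.
;; Returns a list of (score . path), where path is a list of node ids
;; and score is the product of the edge weights along it.
;; Assumes the graph has no cycles.
(define (trace-paths edges source target threshold)
  (let walk ((node source) (path (list source)) (score 1))
    (if (eq? node target)
        (list (cons score (reverse path)))
        (append-map
         (lambda (e)
           (walk (second e)
                 (cons (second e) path)
                 (* score (third e))))
         (filter (lambda (e)
                   (and (eq? (first e) node)
                        (>= (abs (third e)) threshold)))
                 edges)))))

;; With a threshold of 0.2 only the strongest route survives.
(define strong-paths
  (trace-paths toy-path-edges 'tok-dallas 'logit-austin 0.2))

(for-each (lambda (p)
            (display (cdr p)) (display " score ") (display (car p))
            (newline))
          strong-paths)
```

Lowering the threshold to 0 would return all three routes from `tok-dallas` to `logit-austin`, including the direct edge.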

1.2.4 Validation Framework

Test hypotheses about discovered circuits:

Perturbation experiments
Causal validation
Sparsity and concentration metrics
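
The exact definitions of the sparsity and concentration metrics are not given here, so the sketch below uses two plausible stand-ins: the fraction of edges whose magnitude falls below a threshold, and the share of total absolute attribution carried by the k strongest edges. Both function names are hypothetical.

```scheme
(use-modules (srfi srfi-1))  ; fold, take, count

;; Fraction of weights whose magnitude is below threshold.
;; Returns 0 for an empty list to avoid dividing by zero.
(define (edge-sparsity weights threshold)
  (if (null? weights)
      0
      (/ (count (lambda (w) (< (abs w) threshold)) weights)
         (length weights))))

;; Share of total absolute weight carried by the k largest-magnitude weights.
;; Returns 0 when all weights are zero.
(define (top-k-concentration weights k)
  (let* ((mags  (sort (map abs weights) >))
         (total (fold + 0 mags)))
    (if (zero? total)
        0
        (/ (fold + 0 (take mags (min k (length mags))))
           total))))

;; Exact integers keep the results exact rationals.
(define toy-weights '(6 -3 1 0 0))

(display (edge-sparsity toy-weights 1)) (newline)        ; 2/5
(display (top-k-concentration toy-weights 2)) (newline)  ; 9/10
```

A graph with high values on both metrics concentrates most of its attribution in a few edges, which is the situation where a small circuit can be read off.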

1.3 Getting Started

1.3.1 Installation

# Clone the repository
git clone https://github.com/aygp-dr/attribution-graphs-explorer.git
cd attribution-graphs-explorer

# Configure and build (gmake is GNU Make on FreeBSD; on most Linux systems use make)
./configure
gmake

# Run tests to verify installation
gmake test

# Run examples
gmake run

1.3.2 Requirements

This project has been developed and tested with the following environment:

Component	Version	Notes
Operating System	FreeBSD 14.2	Should work on most Unix-like systems
Guile	3.0.10	Minimum 3.0 required
GNU Make	4.4.1	gmake on FreeBSD
GNU Grep	3.11	ggrep on FreeBSD
GNU Awk	5.3.2	gawk on FreeBSD
Direnv	2.35.0	For environment management
Emacs	30.1	For org-mode processing and documentation

1.3.2.1 Required Guile Modules

SRFI libraries: srfi-1, srfi-9, srfi-43
ice-9 regex
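
One quick way to confirm that these modules are available is to load them in a Guile session and check the interpreter version. This is a minimal sketch, not part of the repository; the name `guile-3-or-later?` is made up for the example.

```scheme
;; Loading fails with an error if any of these modules is missing.
(use-modules (srfi srfi-1)
             (srfi srfi-9)
             (srfi srfi-43)
             (ice-9 regex))

;; (version) returns a string such as "3.0.10"; extract the major number.
(define version-match
  (string-match "^([0-9]+)\\.([0-9]+)" (version)))

(define guile-3-or-later?
  (and version-match
       (>= (string->number (match:substring version-match 1)) 3)))

(display (if guile-3-or-later?
             "Guile version OK"
             "Guile 3.0 or later is required"))
(newline)
```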

1.3.3 Basic Usage

;; Load the framework
(add-to-load-path "/path/to/attribution-graphs-explorer")
(use-modules (attribution-graphs clt transcoder)
             (attribution-graphs graph attribution)
             (attribution-graphs circuits discovery))

;; Create a cross-layer transcoder
(define my-clt (make-clt 5 '(6 7 8) 768 128 768))

;; Generate attribution graph
(define graph (compute-attribution-graph my-clt "Example input" 'last-token))

;; Find and visualize circuits
(define circuits (find-circuits graph))
(display (circuit->mermaid circuits graph))

1.4 Examples

The repository includes example applications:

1.4.1 Poetry Generation Circuit

Analyzes how transformer models plan rhyming in poetry:

(use-modules (attribution-graphs examples poetry-circuit))
(analyze-poetry-planning model "Roses are red\nViolets are ")

1.4.2 Multi-hop Reasoning Circuit

Traces factual recall with intermediate reasoning steps:

(use-modules (attribution-graphs examples reasoning-circuit))
(analyze-multihop-reasoning model "The capital of the state containing Dallas is")

1.5 Development Status

This project is in alpha (version 0.1.0). The core framework is implemented and functional, but several components are placeholders for demonstration purposes, as listed under Limitations below.

1.5.1 Implemented Features

Core data structures (CLT, attribution graphs, nodes, edges)
Basic mathematical operations (activation functions, matrix operations)
Graph construction and manipulation
Circuit discovery algorithms
Visualization generation (Mermaid diagrams)
Example applications (poetry and reasoning circuits)
Test framework

1.5.2 Limitations

Matrix operations use simplified random generation
Some functions are placeholders (e.g., embed-tokens, find-edge)
No actual model integration (uses mock CLT instances)
Limited validation framework
No real-world model examples

1.5.3 Future Development

Integration with actual transformer models
Improved matrix operations and numerical methods
Enhanced visualization capabilities
More comprehensive test suite
Performance optimizations
Additional circuit discovery algorithms

1.6 Research Context

This toolkit builds on recent work in mechanistic interpretability of large language models:

Attribution Graphs Methods - The core technical approach
Attribution Graphs Biology - Application to biological knowledge
Transformer Circuits - Broader context of circuit analysis
Circuits: Zoom In on Neurons - Foundational work on circuit analysis in vision models

1.7 License

MIT License

1.8 Citation

If you use this toolkit in your research, please cite:

@software{attribution_graphs_explorer,
  author = {AYGP-DR Research Team},
  title = {Attribution Graphs Explorer: A Toolkit for Circuit Tracing in Transformer Models},
  url = {https://github.com/aygp-dr/attribution-graphs-explorer},
  year = {2025},
}
