A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme

aygp-dr/attribution-graphs-explorer

Attribution Graphs Explorer

1 Attribution Graphs Explorer

Version: 0.1.0 | Guile: 3.0+ | License: MIT | Status: Alpha

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme.

1.1 Overview

Attribution Graphs Explorer is a framework for mechanistic interpretability of transformer models based on circuit tracing methods. The toolkit allows researchers to:

  • Extract computational circuits from neural networks
  • Trace linear paths of information flow
  • Visualize attribution graphs
  • Test causal hypotheses through perturbation

Our implementation builds on the methods described in the Attribution Graphs research, allowing for programmatic analysis of neural network internals.

[Figure: docs/images/overview.png]

1.2 Architecture

The toolkit is organized around several key components:

graph TD
    A[Input Tokens] --> B[Token Embeddings]
    B --> C[Cross-Layer Transcoder]
    C --> D[Feature Activations]
    D --> E[Attribution Graph]
    E --> F[Output Logits]
    
    style C fill:#f9f,stroke:#333,stroke-width:4px
    style E fill:#bbf,stroke:#333,stroke-width:4px

1.2.1 Cross-Layer Transcoders (CLT)

Cross-Layer Transcoders provide a way to bypass MLP nonlinearities, creating linear feature-to-feature interactions that can be traced through the network. The CLT modules:

  • Read from the residual stream at one layer
  • Contribute to the outputs of all subsequent MLP layers
  • Maintain sparse feature representations
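
The read-then-contribute pattern described above can be sketched in a few lines of Guile. This is only an illustration under stated assumptions, not the toolkit's implementation: the helper names (`dot`, `mat-vec`, `clt-encode`, `clt-decode`), the list-of-rows matrix representation, the ReLU activation, and the tiny weights are all hypothetical.

```scheme
;; Hypothetical helpers for illustration; not the toolkit's API.
;; A matrix is a list of rows; a vector is a list of numbers.
(define (dot u v) (apply + (map * u v)))
(define (mat-vec m v) (map (lambda (row) (dot row v)) m))
(define (relu x) (max 0 x))

;; Encode: read the residual stream at one layer and produce
;; non-negative (and typically sparse) feature activations.
(define (clt-encode w-enc residual)
  (map relu (mat-vec w-enc residual)))

;; Decode: one decoder matrix per later layer. Returns an alist of
;; (layer . contribution-vector), one entry per target layer.
(define (clt-decode decoders features)
  (map (lambda (entry)
         (cons (car entry) (mat-vec (cdr entry) features)))
       decoders))

;; Toy weights: 3 features, residual width 2, target layers 6 and 7.
(define w-enc '((1 -1) (0 2) (-1 0)))
(define decoders
  '((6 . ((1 0 0) (0 1 0)))
    (7 . ((0 0 1) (1 1 0)))))

(define features (clt-encode w-enc '(2 1)))
(define contributions (clt-decode decoders features))
```

Because decoding is a plain matrix-vector product, each feature's contribution to a later layer is linear in its activation, which is what makes feature-to-feature paths traceable.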

1.2.2 Attribution Graphs

Attribution graphs represent the computational flow as a directed graph:

  • Nodes: Features, tokens, and logits
  • Edges: Attribution weights between features
  • Paths: Computational circuits through the network
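
One way to picture the node and edge structure is with SRFI-9 records. The sketch below is an assumption for illustration only: the record types, field names, node ids, and weights are hypothetical and may not match the toolkit's own data structures.

```scheme
(use-modules (srfi srfi-1)   ; filter, fold
             (srfi srfi-9))  ; define-record-type

;; Hypothetical record types; the toolkit's real definitions may differ.
(define-record-type <node>
  (make-node id kind)
  node?
  (id node-id)
  (kind node-kind))          ; 'token, 'feature, or 'logit

(define-record-type <edge>
  (make-edge source target weight)
  edge?
  (source edge-source)
  (target edge-target)
  (weight edge-weight))      ; attribution weight

;; A three-node toy graph with made-up weights.
(define nodes
  (list (make-node 'tok-dallas 'token)
        (make-node 'feat-texas 'feature)
        (make-node 'logit-austin 'logit)))

(define edges
  (list (make-edge 'tok-dallas 'feat-texas 0.75)
        (make-edge 'feat-texas 'logit-austin 0.5)
        (make-edge 'tok-dallas 'logit-austin 0.25)))

;; Edges leaving the node with the given id.
(define (outgoing-edges edges id)
  (filter (lambda (e) (eq? (edge-source e) id)) edges))

;; Sum of attribution weights arriving at the node with the given id.
(define (total-attribution-into edges id)
  (fold + 0
        (map edge-weight
             (filter (lambda (e) (eq? (edge-target e) id)) edges))))
```

Keeping edges in a flat list is the simplest choice for a small example; an adjacency table keyed by node id would scale better for large graphs.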

1.2.3 Circuit Discovery

The toolkit provides algorithms for finding interpretable circuits:

  • Path tracing algorithms
  • Circuit motif identification
  • Circuit visualization
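
As a rough illustration of path tracing, the sketch below enumerates every path between two nodes of an acyclic graph, scores each path by the product of its edge weights, and keeps the paths whose absolute score meets a threshold. The edge representation `(source target weight)`, the function names, and the scoring rule are assumptions made for this example, not the toolkit's algorithm.

```scheme
(use-modules (srfi srfi-1))  ; first, second, third, append-map, filter

;; Edge representation assumed for this sketch: (source target weight).
(define (edges-from edges node)
  (filter (lambda (e) (eq? (first e) node)) edges))

;; Return a list of (score . path), where path is a list of node ids
;; and score is the product of the weights along it.
;; Assumes the graph is acyclic; a cycle would recurse forever.
(define (trace-paths edges source target)
  (if (eq? source target)
      (list (cons 1 (list target)))
      (append-map
       (lambda (e)
         (map (lambda (sp)
                (cons (* (third e) (car sp))
                      (cons source (cdr sp))))
              (trace-paths edges (second e) target)))
       (edges-from edges source))))

;; Keep paths with |score| >= threshold, strongest first.
(define (strongest-paths edges source target threshold)
  (sort (filter (lambda (sp) (>= (abs (car sp)) threshold))
                (trace-paths edges source target))
        (lambda (a b) (> (abs (car a)) (abs (car b))))))

(define example-edges
  '((tok feat-a 0.5) (tok feat-b 0.25)
    (feat-a logit 0.5) (feat-b logit 0.5)
    (tok logit 0.125)))

(define paths (strongest-paths example-edges 'tok 'logit 0.2))
```

Exhaustive enumeration is exponential in the worst case, so a practical implementation would likely prune weak partial paths early; that refinement is left out here for brevity.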

1.2.4 Validation Framework

Test hypotheses about discovered circuits:

  • Perturbation experiments
  • Causal validation
  • Sparsity and concentration metrics
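
The ideas behind a perturbation experiment and a concentration metric can be shown on a toy linear model. Everything in this sketch is hypothetical: `toy-logit` stands in for a real model, ablation is modeled as zeroing one activation, and concentration is defined here as the share of total absolute attribution held by the top k entries.

```scheme
;; Toy stand-in for a model: the logit is a weighted sum of activations.
(define (toy-logit weights activations)
  (apply + (map * weights activations)))

;; Ablate a feature by setting the activation at INDEX to zero.
(define (ablate activations index)
  (map (lambda (a i) (if (= i index) 0 a))
       activations
       (iota (length activations))))

;; Causal effect of a feature: drop in the logit when it is ablated.
(define (ablation-effect weights activations index)
  (- (toy-logit weights activations)
     (toy-logit weights (ablate activations index))))

;; Concentration: share of total absolute attribution held by the
;; K largest entries (1 means fully concentrated).
(define (top-k-concentration attributions k)
  (let* ((mags (sort (map abs attributions) >))
         (total (apply + mags)))
    (if (zero? total)
        0
        (/ (apply + (list-head mags (min k (length mags)))) total))))

(define weights '(2 -1 4))
(define activations '(3 5 1))
(define effect (ablation-effect weights activations 0))
(define concentration (top-k-concentration '(6 -1 1 0) 1))
```

In a purely linear model the ablation effect equals weight times activation, so it agrees with the attribution exactly; on a real network the two can diverge, which is why perturbation is useful as an independent check.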

1.3 Getting Started

1.3.1 Installation

# Clone the repository
git clone https://github.com/aygp-dr/attribution-graphs-explorer.git
cd attribution-graphs-explorer

# Configure and build (gmake is GNU Make on FreeBSD; on most Linux systems the command is make)
./configure
gmake

# Run tests to verify installation
gmake test

# Run examples
gmake run

1.3.2 Requirements

This project has been developed and tested with the following environment:

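| Component        | Version      | Notes                                     |
|------------------+--------------+-------------------------------------------|
| Operating System | FreeBSD 14.2 | Should work on most Unix-like systems     |
| Guile            | 3.0.10       | Minimum 3.0 required                      |
| GNU Make         | 4.4.1        | gmake on FreeBSD                          |
| GNU Grep         | 3.11         | ggrep on FreeBSD                          |
| GNU Awk          | 5.3.2        | gawk on FreeBSD                           |
| Direnv           | 2.35.0       | For environment management                |
| Emacs            | 30.1         | For org-mode processing and documentation |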

1.3.2.1 Required Guile Modules

  • SRFI libraries: (srfi srfi-1) for lists, (srfi srfi-9) for records, (srfi srfi-43) for vectors
  • (ice-9 regex) for regular expressions
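
A quick way to confirm that these modules are available is to load them and exercise one procedure from each. The snippet below is a minimal sketch; the `<probe>` record and the `modules-ok?` name are made up for the check and are not part of the toolkit.

```scheme
(use-modules (srfi srfi-1)    ; list library (fold, filter, ...)
             (srfi srfi-9)    ; define-record-type
             (srfi srfi-43)   ; vector library
             (ice-9 regex))   ; string-match and friends

;; Throwaway record type, used only to exercise SRFI-9.
(define-record-type <probe>
  (make-probe value)
  probe?
  (value probe-value))

;; Call one procedure from each module; all four must succeed.
(define modules-ok?
  (and (= (fold + 0 '(1 2 3)) 6)
       (= (probe-value (make-probe 42)) 42)
       (= (vector-index even? #(1 3 4)) 2)
       (regexp-match? (string-match "gr[ae]ph" "attribution graph"))))

(display (if modules-ok?
             "All required modules loaded.\n"
             "Module check failed.\n"))
```

If a module is missing, Guile reports an error at the `use-modules` form before any of the checks run.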

1.3.3 Basic Usage

;; Load the framework
(add-to-load-path "/path/to/attribution-graphs-explorer")
(use-modules (attribution-graphs clt transcoder)
             (attribution-graphs graph attribution)
             (attribution-graphs circuits discovery))

;; Create a cross-layer transcoder
(define my-clt (make-clt 5 '(6 7 8) 768 128 768))

;; Generate attribution graph
(define graph (compute-attribution-graph my-clt "Example input" 'last-token))

;; Find and visualize circuits
(define circuits (find-circuits graph))
(display (circuit->mermaid circuits graph))
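
To give a feel for what Mermaid output might look like, the sketch below turns a list of weighted edges into a `graph TD` diagram with edge labels. It is an assumption for illustration and does not reproduce `circuit->mermaid`: the `edges->mermaid` name, the `(source target weight)` edge format, and the node ids are hypothetical.

```scheme
(use-modules (srfi srfi-1))  ; first, second, third

;; Edge representation assumed here: (source target weight),
;; where source and target are strings usable as Mermaid node ids.
(define (edge->mermaid-line edge)
  (string-append "    " (first edge)
                 " -->|" (number->string (third edge)) "| "
                 (second edge)))

;; Build the full diagram text: a header line plus one line per edge.
(define (edges->mermaid edges)
  (string-join (cons "graph TD" (map edge->mermaid-line edges)) "\n"))

(define diagram
  (edges->mermaid '(("tok_dallas" "feat_texas" 0.75)
                    ("feat_texas" "logit_austin" 0.5))))

(display diagram)
(newline)
```

The resulting text can be pasted into any Mermaid renderer; labeling each edge with its weight makes the strongest connections easy to spot.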

1.4 Examples

The repository includes example applications:

1.4.1 Poetry Generation Circuit

Analyzes how transformer models plan rhyming in poetry:

(use-modules (attribution-graphs examples poetry-circuit))
(analyze-poetry-planning model "Roses are red\nViolets are ")

1.4.2 Multi-hop Reasoning Circuit

Traces factual recall with intermediate reasoning steps:

(use-modules (attribution-graphs examples reasoning-circuit))
(analyze-multihop-reasoning model "The capital of the state containing Dallas is")

1.5 Development Status

This project is currently in alpha status (version 0.1.0). The core framework is implemented and functional, but several components are placeholders for demonstration purposes.

1.5.1 Implemented Features

  • Core data structures (CLT, attribution graphs, nodes, edges)
  • Basic mathematical operations (activation functions, matrix operations)
  • Graph construction and manipulation
  • Circuit discovery algorithms
  • Visualization generation (Mermaid diagrams)
  • Example applications (poetry and reasoning circuits)
  • Test framework

1.5.2 Limitations

  • Matrix operations use simplified random generation
  • Some functions are placeholders (e.g., embed-tokens, find-edge)
  • No actual model integration (uses mock CLT instances)
  • Limited validation framework
  • No real-world model examples

1.5.3 Future Development

  • Integration with actual transformer models
  • Improved matrix operations and numerical methods
  • Enhanced visualization capabilities
  • More comprehensive test suite
  • Performance optimizations
  • Additional circuit discovery algorithms

1.6 Research Context

This toolkit builds on recent work in mechanistic interpretability of large language models.

1.7 License

MIT License

1.8 Citation

If you use this toolkit in your research, please cite:

@software{attribution_graphs_explorer,
  author = {AYGP-DR Research Team},
  title = {Attribution Graphs Explorer: A Toolkit for Circuit Tracing in Transformer Models},
  url = {https://github.com/aygp-dr/attribution-graphs-explorer},
  year = {2025},
}
