🔬 HexRay

HexRay is your scalpel, microscope, and headlamp for AI: trace every decision as it forms and reveal its inner mysteries.


🚀 What is HexRay?

HexRay is a low-level debugger for transformer models, purpose-built to illuminate the inner workings of AI token by token and layer by layer. Just as an X-ray reveals internal structure, HexRay reveals the computational circuitry behind each AI prediction.

Built on top of TransformerLens, HexRay supports mechanistic interpretability (MI): reverse engineering the algorithms a model has learned by analyzing its weights, activations, and attention patterns, with the aim of a granular, causal understanding of AI decision-making [1][2][3]. HexRay extends this with:

  • πŸ” Logit debugging β€” trace how specific logits emerge and which neurons or attention heads contributed most.
  • 🧠 Chain-of-Thought attribution β€” follow how reasoning unfolds across time steps and internal components.
  • πŸͺ“ Neuron and head introspection β€” pinpoint influential subcomponents behind each decision.
  • 🧬 Activation tracing β€” monitor MLP and attention activity at every token and every layer.
  • 🧰 Red team–ready utilities β€” test model robustness, adversarial triggers, and hidden circuits.

Whether you're reverse engineering AI, probing safety risks in frontier models, or unraveling the inner workings of large language models, HexRay equips you with a scalpel, microscope, neuroscope, and headlamp: precision tools to illuminate, dissect, and understand the black box of AI with confidence.


✨ Features

  • Token-by-token residual stream tracing β€” inspect the evolution of hidden states at every layer and position.
  • Logit debugging β€” analyze which neurons, heads, and paths contributed most to a model’s final prediction.
  • Chain-of-Thought (CoT) attribution β€” trace logical reasoning step-by-step through attention and MLP layers.
  • Top-k component attribution β€” identify the most influential attention heads and MLP layers for each token.
  • Layer-wise activation logging β€” visualize and record intermediate activations for any prompt.
  • CLI interface β€” simple command-line interface for selecting models, prompts, and debugging modes.
  • TransformerLens integration β€” leverages robust hooks and interpretability primitives under the hood.
  • Modular architecture β€” designed for extensibility, including upcoming support for fuzzing, visualization, and adversarial tracing.
  • Debugging modes β€” toggle --cot-debug, --logit-debug, and --top-k-attribution to tailor your inspection workflow.
  • Supports multiple model scales with easy swapping via CLI.

🔧 Basic Usage

python hexray.py --prompt "Why do bees buzz?" --top-k-attribution 10  

🟩 Output

Loading model gpt2-small...
Loaded pretrained model gpt2-small into HookedTransformer
Loaded pretrained model gpt2-small into HookedTransformer

Token: "
" (index -1)
Top Contributors to Final Logit:
- MLP  L9 : +46.16 (24.3%)
- MLP  L11: +35.61 (18.8%)
- MLP  L10: +30.14 (15.9%)
- MLP  L8 : +19.81 (10.4%)
- Attn L11: +14.21 (7.5%)
- MLP  L0 : +10.53 (5.5%)
- Attn L10: +9.93 (5.2%)
- Attn L0 : +9.62 (5.1%)
- Attn L7 : +7.67 (4.0%)
- MLP  L2 : +6.20 (3.3%)
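
The contribution table above is an instance of direct logit attribution. The following is a rough, hedged sketch of the general technique using TransformerLens cache utilities, not HexRay's internal code; double-check the exact decompose_resid and apply_ln_to_stack arguments against the TransformerLens docs.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
tokens = model.to_tokens("Why do bees buzz?")
logits, cache = model.run_with_cache(tokens)

# Split the final residual stream into per-component pieces
# (embeddings, each layer's attention output, each layer's MLP output)
components, labels = cache.decompose_resid(layer=-1, return_labels=True)

# Apply the final LayerNorm so the pieces sum to what the unembedding sees,
# then project each piece onto the logit direction of the top predicted token
components = cache.apply_ln_to_stack(components, layer=-1)
top_token = logits[0, -1].argmax()
contributions = components[:, 0, -1, :] @ model.W_U[:, top_token]

# Print the ten largest contributors, analogous to the table above
for label, value in sorted(zip(labels, contributions.tolist()), key=lambda p: -p[1])[:10]:
    print(f"{label:>12}: {value:+.2f}")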

🔧 Chain of Thought Debugging

python hexray.py --prompt "If John has 3 apples..." --cot-debug --top-k-attribution 10

🟩 Output

Loading model gpt2-small...
Loaded pretrained model gpt2-small into HookedTransformer
Loaded pretrained model gpt2-small into HookedTransformer

🧠 Chain of Thought Attribution Trace (Console):

Step 1: If John has 3 apples...
MLP  L10 █████████████████████████  23.2%
MLP  L8  █████████████              12.6%
MLP  L0  ████████████               11.5%
Attn L11 ████████████               11.3%
Attn L0  ██████████                  9.7%
MLP  L7  █████████                   9.0%
MLP  L11 ████████                    7.9%
MLP  L6  ██████                      6.3%
Attn L8  █████                       4.7%
Attn L9  ███                         3.7%
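
One plausible way to reproduce this kind of per-step view outside HexRay is to attribute the final-token logit separately for each reasoning step and normalize the positive contributions into percentages. The helper and the step-splitting rule below are illustrative assumptions, not HexRay's implementation.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

def step_attribution(prompt):
    # Attribute the final-token logit to per-layer attention/MLP components
    tokens = model.to_tokens(prompt)
    logits, cache = model.run_with_cache(tokens)
    components, labels = cache.decompose_resid(layer=-1, return_labels=True)
    components = cache.apply_ln_to_stack(components, layer=-1)
    top_token = logits[0, -1].argmax()
    scores = (components[:, 0, -1, :] @ model.W_U[:, top_token]).clamp(min=0)
    return {label: float(s / scores.sum()) for label, s in zip(labels, scores)}

# Assumed step rule: each "..."-terminated clause of the prompt is one reasoning step
steps = ["If John has 3 apples..."]
for i, step in enumerate(steps, 1):
    print(f"Step {i}: {step}")
    for label, share in sorted(step_attribution(step).items(), key=lambda p: -p[1])[:5]:
        print(f"  {label:>12} {share:6.1%}")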

🔧 Logit Debugging

python hexray.py --model gpt2-xl --prompt "Tell me about Ρhοenιx... Then if Pepsi is... the Ρhοenιx must be..." --cot-debug --top-k-attribution 32 --logit-debug --report logit_3

🟩 Output

Loading model gpt2-xl...
config.json: 100%|██████████| 689/689 [00:00<00:00, 442kB/s]
model.safetensors: 100%|██████████| 6.43G/6.43G [01:18<00:00, 82.1MB/s]
generation_config.json: 100%|██████████| 124/124 [00:00<00:00, 1.25MB/s]
tokenizer_config.json: 100%|██████████| 26.0/26.0 [00:00<00:00, 318kB/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 5.73MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 22.2MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 21.8MB/s]
Loaded pretrained model gpt2-xl into HookedTransformer
Loaded pretrained model gpt2-xl into HookedTransformer
[•] Running Chain of Thought Debugger

🧠 Chain of Thought Attribution Trace (Console):

Step 1: Tell me about Ρhοenιx...
MLP  L44 █████████████████████████   7.0%
MLP  L42 ████████████████████████    7.0%
MLP  L45 ███████████████████████     6.6%
MLP  L41 ██████████████████████      6.4%
MLP  L39 ██████████████████          5.2%
MLP  L43 ███████████████             4.4%
MLP  L36 ██████████████              4.2%
MLP  L37 ██████████████              4.0%
MLP  L40 █████████████               3.8%
MLP  L38 █████████████               3.8%
MLP  L33 ███████████                 3.2%
MLP  L46 ██████████                  3.0%
MLP  L34 ██████████                  2.9%
MLP  L29 ██████████                  2.9%
MLP  L35 ██████████                  2.8%
Attn L44 █████████                   2.7%
Attn L42 █████████                   2.6%
MLP  L32 █████████                   2.5%
Attn L43 ████████                    2.3%
Attn L46 ███████                     2.2%
Attn L33 ██████                      1.9%
MLP  L25 ██████                      1.9%
Attn L39 ██████                      1.9%
Attn L40 ██████                      1.9%
MLP  L30 ██████                      1.9%
Attn L36 ██████                      1.9%
MLP  L28 ██████                      1.8%
Attn L45 ██████                      1.8%
MLP  L23 █████                       1.6%
MLP  L27 ████                        1.3%
MLP  L0  ████                        1.3%
Attn L37 ████                        1.3%

Step 2: Then if Pepsi is... the Ρhοenιx must be...
MLP  L44 █████████████████████████   9.5%
MLP  L43 ██████████████████████      8.4%
MLP  L42 ████████████████████        7.9%
MLP  L45 ███████████████████         7.6%
MLP  L47 ███████████████████         7.3%
MLP  L46 █████████████████           6.6%
MLP  L41 ██████████                  4.0%
MLP  L39 ████████                    3.4%
Attn L45 ████████                    3.1%
Attn L42 ███████                     3.0%
Attn L44 ███████                     3.0%
Attn L43 ███████                     2.8%
Attn L39 ██████                      2.4%
Attn L37 ██████                      2.3%
MLP  L40 █████                       2.3%
MLP  L38 █████                       2.3%
MLP  L34 █████                       2.0%
Attn L40 █████                       2.0%
MLP  L29 ████                        1.9%
Attn L25 ████                        1.8%
MLP  L35 ████                        1.7%
MLP  L36 ████                        1.6%
Attn L46 ███                         1.5%
Attn L41 ███                         1.5%
Attn L33 ███                         1.4%
MLP  L30 ███                         1.3%
Attn L34 ███                         1.3%
MLP  L23 ███                         1.3%
MLP  L37 ███                         1.2%
MLP  L25 ███                         1.2%
Attn L47 ███                         1.2%
Attn L35 ███                         1.1%
[•] Running Logit Debugger
[debug] captured: ['blocks.0.hook_attn_out', 'blocks.1.hook_attn_out', 'blocks.2.hook_attn_out', 'blocks.3.hook_attn_out', 'blocks.4.hook_attn_out', 'blocks.5.hook_attn_out', 'blocks.6.hook_attn_out', 'blocks.7.hook_attn_out', 'blocks.8.hook_attn_out', 'blocks.9.hook_attn_out', 'blocks.10.hook_attn_out', 'blocks.11.hook_attn_out', 'blocks.12.hook_attn_out', 'blocks.13.hook_attn_out', 'blocks.14.hook_attn_out', 'blocks.15.hook_attn_out', 'blocks.16.hook_attn_out', 'blocks.17.hook_attn_out', 'blocks.18.hook_attn_out', 'blocks.19.hook_attn_out', 'blocks.20.hook_attn_out', 'blocks.21.hook_attn_out', 'blocks.22.hook_attn_out', 'blocks.23.hook_attn_out', 'blocks.24.hook_attn_out', 'blocks.25.hook_attn_out', 'blocks.26.hook_attn_out', 'blocks.27.hook_attn_out', 'blocks.28.hook_attn_out', 'blocks.29.hook_attn_out', 'blocks.30.hook_attn_out', 'blocks.31.hook_attn_out', 'blocks.32.hook_attn_out', 'blocks.33.hook_attn_out', 'blocks.34.hook_attn_out', 'blocks.35.hook_attn_out', 'blocks.36.hook_attn_out', 'blocks.37.hook_attn_out', 'blocks.38.hook_attn_out', 'blocks.39.hook_attn_out', 'blocks.40.hook_attn_out', 'blocks.41.hook_attn_out', 'blocks.42.hook_attn_out', 'blocks.43.hook_attn_out', 'blocks.44.hook_attn_out', 'blocks.45.hook_attn_out', 'blocks.46.hook_attn_out', 'blocks.47.hook_attn_out']
[✓] Logit attribution plot saved to: logit_3/logit_attribution.png
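
The captured hook list above comes straight from TransformerLens hook points. Below is a hedged sketch of how per-layer attention outputs can be captured with a name filter; the storage and filter are illustrative, not HexRay's capture code.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")  # gpt2-xl works the same way, only larger
tokens = model.to_tokens("Tell me about Phoenix...")

captured = {}

def save_hook(activation, hook):
    # Keep a detached copy of each attention block output, keyed by hook name
    captured[hook.name] = activation.detach().cpu()

# Attach the hook to every blocks.<layer>.hook_attn_out activation for one forward pass
model.run_with_hooks(
    tokens,
    fwd_hooks=[(lambda name: name.endswith("hook_attn_out"), save_hook)],
)

print(list(captured.keys()))  # ['blocks.0.hook_attn_out', 'blocks.1.hook_attn_out', ...]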

📚 Publications & Preprints

  • Jonathan Jaquez. HexRay: An Open-Source Neuroscope for AI - Tracing Tokens, Neurons, and Decisions for Frontier AI Research, Safety, and Security. TechRxiv. July 26, 2025. DOI: 10.36227/techrxiv.175356093.33637088/v1

📚 References


📜 License

MIT License © 2025 Jonathan Jaquez

About

HexRay is part of MIRAI (Mechanistic Interpretability for Responsible AI), an initiative created by SageXAI.
