@safety-research

Safety Research

Popular repositories

  1. circuit-tracer Public

    Python · 2.5k stars · 282 forks

  2. bloom Public

    bloom - evaluate any behavior immediately  🌸🌱

    Python · 1k stars · 125 forks

  3. petri Public

    An alignment auditing agent capable of quickly exploring alignment hypotheses

    Python · 792 stars · 105 forks

  4. persona_vectors Public

    Persona Vectors: Monitoring and Controlling Character Traits in Language Models

    Python · 326 stars · 76 forks

  5. SCONE-bench Public

    146 stars · 25 forks

  6. safety-tooling Public

    Inference API for many LLMs and other useful tools for empirical research

    Python · 94 stars · 24 forks

Repositories

Showing 10 of 35 repositories
  • petri Public

    An alignment auditing agent capable of quickly exploring alignment hypotheses

    Python · 792 stars · MIT · 105 forks · 4 open issues · 4 open pull requests · Updated Jan 6, 2026
  • circuit-tracer Public
    Python · 2,525 stars · MIT · 282 forks · 16 open issues · 11 open pull requests · Updated Jan 6, 2026
  • bloom Public

    bloom - evaluate any behavior immediately  🌸🌱

    Python · 1,042 stars · MIT · 125 forks · 2 open issues · 4 open pull requests · Updated Jan 4, 2026
  • how-ai-impacts-skill-formation Public

    Repo for measuring whether using AI tools inhibits skill formation and development

    Python · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 3, 2026
  • A3 Public
    Python · 2 stars · Apache-2.0 · 0 forks · 0 open issues · 0 open pull requests · Updated Dec 29, 2025
  • safety-tooling Public

    Inference API for many LLMs and other useful tools for empirical research

    Python · 94 stars · MIT · 24 forks · 11 open issues · 11 open pull requests · Updated Dec 24, 2025
  • inverse-scaling-ttc Public

    Inverse Scaling in Test-Time Compute

    Python · 23 stars · MIT · 2 forks · 0 open issues · 0 open pull requests · Updated Dec 3, 2025
  • impossiblebench Public

    Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"

    Python · 22 stars · MIT · 2 forks · 0 open issues · 0 open pull requests · Updated Dec 1, 2025
  • SCONE-bench Public
    146 stars · MIT · 25 forks · 3 open issues · 0 open pull requests · Updated Nov 25, 2025
  • unsupervised-truth-probes
    Python · 4 stars · 0 forks · 1 open issue · 0 open pull requests · Updated Nov 24, 2025
