This is the accompanying repository for the paper
"SENTRA: Selected-Next-Token Transformer for LLM Text Detection"
published in EMNLP Findings 2025.
SENTRA is a pre-trained encoder that can be used to train LLM detection classifiers. To use SENTRA, fine-tune it on a labeled dataset that contains both human-authored and LLM-generated text.
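Fine-tuning starts from a labeled dataset with both classes. The sketch below is only a generic illustration of assembling such a dataset. It is not taken from the training code in the sentra directory, whose expected input format is not described here. The function name `build_labeled_split` and the label convention (0 = human-authored, 1 = LLM-generated) are hypothetical assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical label convention (not necessarily the one used by the sentra code):
# 0 = human-authored, 1 = LLM-generated.
HUMAN, LLM = 0, 1


def build_labeled_split(
    human_texts: List[str],
    llm_texts: List[str],
    val_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Pair each text with its label, shuffle, and split into train/validation lists."""
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError("val_fraction must be in [0, 1)")
    # Attach the (hypothetical) class label to every example.
    examples = [(t, HUMAN) for t in human_texts] + [(t, LLM) for t in llm_texts]
    # Shuffle deterministically so the two classes are interleaved.
    rng = random.Random(seed)
    rng.shuffle(examples)
    # The first n_val shuffled examples form the validation set.
    n_val = int(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]


if __name__ == "__main__":
    train, val = build_labeled_split(
        ["a human-written essay", "a forum post"],
        ["a model-generated summary"],
        val_fraction=0.34,
    )
    print(len(train), len(val))
```

This split is a plain random split. With small or imbalanced data, a stratified split that keeps both classes in the validation set may be preferable.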
The training and inference code for SENTRA is in the sentra directory, and the code we used to run the baseline methods is in the baselines directory.
The code and methods in this repository and the accompanying paper are not present in, or part of, Firefox.