Hello, and thank you very much: your TOFU / MUSE papers and the open-unlearning repo have been a huge help for my research.
I have a question that came up while trying to reproduce the results.
In the TOFU paper, the forget quality of the finetuned model (marked as ■) is shown as roughly 1e-20 for all forget set sizes (1%, 5%, 10%).

However, when I look at the data downloaded via src/setup_data.py, in saves/eval/tofu_[backbone_model_name]_full/[forget_split]/TOFU_SUMMARY.json I see different values:
forget_split = 1% → about 1e-2
forget_split = 5% → about 1e-10
forget_split = 10% → about 1e-20
These values seem to hold regardless of the backbone model name (Llama2 7B, Llama3 1B, Llama3 8B).
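
For reference, this is roughly how I read the values. A minimal sketch: the backbone name, the forget-split directory names (forget01/forget05/forget10), and the "forget_quality" key are assumptions from my local layout, so please adjust as needed.

```python
import json
from pathlib import Path

# Assumed backbone name and split directory names; adjust to your setup.
backbone = "Llama-3.2-1B-Instruct"
for split in ["forget01", "forget05", "forget10"]:  # 1%, 5%, 10% forget splits
    summary_path = Path("saves/eval") / f"tofu_{backbone}_full" / split / "TOFU_SUMMARY.json"
    with open(summary_path) as f:
        summary = json.load(f)
    # "forget_quality" is the key I assume holds the reported value in TOFU_SUMMARY.json
    print(split, summary.get("forget_quality"))
```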
So my questions are:
- Are the models with the _full suffix different from the finetuned models in the TOFU paper?
- If not, then which values should I look at when referring to the finetune model results from the paper?
- If the _full models are indeed the same as the paper's finetuned models, is it okay to report forget quality using these per-forget-split values (as above) when writing my own paper?
Thank you in advance for any clarification!