either hack it into the `FlaggingCallback` or add a quick batch-style script we can use when logging the test/val sets