Commit 96bdf12
Handle label imbalance in binary classification tasks on text benchmark (#376)

Labels in the text benchmarks are imbalanced, and weighting the positive labels improves performance. Experiments were run on the `fake` dataset (5% positive labels) with `text_embedded` and `RoBERTa` encodings:

- `ResNet`: 91.1% -> 93.4%
- `FTTransformer`: unchanged
- `Trompt`: 95.2% -> 95.8%

The differences were even starker with distilled RoBERTa, but we aren't reporting those anywhere, so I didn't note them down. More results are pending.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 893678f · commit 96bdf12

File tree

1 file changed (+2, -1)

benchmark/data_frame_text_benchmark.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -457,7 +457,8 @@ def main_torch(
     if dataset.task_type == TaskType.BINARY_CLASSIFICATION:
         out_channels = 1
-        loss_fun = BCEWithLogitsLoss()
+        label_imbalance = sum(train_tensor_frame.y) / len(train_tensor_frame.y)
+        loss_fun = BCEWithLogitsLoss(pos_weight=1 / label_imbalance)
         metric_computer = AUROC(task='binary').to(device)
         higher_is_better = True
     elif dataset.task_type == TaskType.MULTICLASS_CLASSIFICATION:
```
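For context, the fix relies on the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`, which scales the loss contribution of positive targets. Below is a minimal, self-contained sketch of the same weighting; the toy labels are a hypothetical stand-in for `train_tensor_frame.y`:

```python
import torch
from torch.nn import BCEWithLogitsLoss

# Toy labels standing in for train_tensor_frame.y (1 positive out of 5).
y = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])

# Fraction of positive labels: 0.2 here, roughly 0.05 on the `fake` dataset.
label_imbalance = y.sum() / len(y)

# pos_weight multiplies the loss on positive targets, so rare positives
# contribute to the total loss on par with the abundant negatives.
loss_fun = BCEWithLogitsLoss(pos_weight=1 / label_imbalance)

logits = torch.zeros(5)     # raw (pre-sigmoid) model outputs
print(loss_fun(logits, y))  # weighted mean BCE over the batch
```

With 5% positives this gives `pos_weight` of about 20, so each positive example carries roughly the weight of the ~19 negatives around it; the closely related convention `pos_weight = num_negatives / num_positives` yields nearly the same value when positives are rare.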
