Description
When using the tune function to tune hyperparameters, I often end up with OOM errors, even though training models individually works fine. Perhaps related to #433.
When trying to tune hyperparameters for a span_classifier model based on eds-camembert, I get OOM errors after a few trials. I have tried adding cleanup callbacks to free the memory (using gc.collect(), torch.cuda.empty_cache() and torch.cuda.ipc_collect(); see the sketch below), but this only partially works. When monitoring the GPU memory with nvidia-smi during tuning, the memory rises to around 10 GB during a training run, then drops to about 500 MB between trials, except that sometimes (seemingly at random) the memory exceeds the 32 GB limit.
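For reference, the cleanup logic I run between trials looks roughly like the sketch below. The function body is exactly the calls mentioned above; how it gets hooked into tune is my own setup, so the callback signature (accepting arbitrary arguments) is an assumption rather than anything provided by EDS-NLP.

```python
import gc

import torch


def free_gpu_memory(*args, **kwargs):
    """Best-effort GPU memory cleanup run between trials.

    *args/**kwargs are accepted only so this can be passed as a callback
    whatever arguments the caller provides (assumption about the
    integration, not part of the report itself).
    """
    gc.collect()                  # drop unreachable Python objects still holding tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
        torch.cuda.ipc_collect()  # release CUDA IPC memory left by dead processes
```

The memory figures above were read with something like `nvidia-smi --query-gpu=memory.used --format=csv -l 1` running alongside the tuning job.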
It has never happened on the first trial, and when I train a model using the train API with the parameters that were supposed to be used during the trial that crashed, everything works fine.
Maybe this is related to #433: sometimes a trial is not terminated correctly and does not free its memory, leading to the OOM.
Your Environment
- Operating System:
- Python Version Used: 3.7.12
- spaCy Version Used: 2.2.4
- EDS-NLP Version Used: 0.17.2