Could you help fix the deserialization vulnerability caused by two risky pre-trained models used in this repo?

Hi @muhammed-abuodeh, I'd like to report that two potentially risky pretrained models are used in this project, which may pose **deserialization threats**. Please check the following code:

•    **src/utils/model_downloader.py**

```python
def download_default_models(model_path: Path) -> None:
    # check if models folder exists
    if not os.path.exists(model_path):
        os.mkdir(model_path)

    # check if default models exist
    if not os.path.exists(model_path / "CAMeLBERT-CATiB-biaffine.model"):
        print('downloading catib model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-catib-parser", filename="CAMeLBERT-CATiB-biaffine.model", local_dir=model_path)
    if not os.path.exists(model_path / "CAMeLBERT-UD-biaffine.model"):
        print('downloading ud model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-ud-parser", filename="CAMeLBERT-UD-biaffine.model", local_dir=model_path)
    print('Default models downloaded.')
```

•    **src/dependency_parser/biaff_parser.py**

```python
def parse(conll_path_or_parsed_tuples: Union[List[List[tuple]], str], parse_model:str) -> List[List[tuple]]:
    parser = Parser.load(parse_model)
    return parser.predict(conll_path_or_parsed_tuples, verbose=False, tree=True, proj=True)
```



#### **Issue Description**

As shown above, **src/utils/model_downloader.py** first downloads the models **"CAMeLBERT-CATiB-biaffine.model"** and **"CAMeLBERT-UD-biaffine.model"** with `hf_hub_download`. The selected model is then loaded and run via `Parser.load`, which depends on `torch.load`.

Moreover, before version 2.6, `torch.load` did not default to `weights_only=True`. The torch version used in this project is lower than 2.6, so the deserialization vulnerability may be triggered when a model is loaded.
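To illustrate why unrestricted unpickling is a concern, here is a minimal sketch that uses only the standard-library `pickle` module. It is not taken from this repo or from the flagged models; the `Payload` class is hypothetical and deliberately harmless. It shows that an object's `__reduce__` method can make the loader call an arbitrary callable at load time.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object embedded in a model file."""

    def __reduce__(self):
        # At load time, pickle calls the returned callable with these args.
        # A real attack could name something like os.system here; this demo
        # evaluates a harmless arithmetic expression instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs eval("6 * 7") and returns its result,
# not a Payload instance.
result = pickle.loads(blob)
print(result)  # prints 42
```

The same mechanism applies to any loader that unpickles without restrictions, which is why the `weights_only` default matters.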

The [first model](https://huggingface.co/CAMeL-Lab/camelbert-ud-parser/tree/main) and the [second model](https://huggingface.co/CAMeL-Lab/camelbert-catib-parser/tree/main) have been **flagged as risky** on the HuggingFace platform. Specifically, their `CAMeLBERT-UD-biaffine.model` and `CAMeLBERT-CATiB-biaffine.model` files are marked as malicious and may trigger deserialization threats. I think the reason might be that the executable code in the model files contains suspicious modules. Once a model is loaded, the vulnerability could be activated.

![Image](https://github.com/user-attachments/assets/638c026e-8494-47ff-b890-3cb11b365854)

![Image](https://github.com/user-attachments/assets/79e20cba-dd59-4826-ad6b-b5a6e5ac2fb3)

**Related Risk Reports:** [CAMeL-Lab/camelbert-catib-parser risk report](https://protectai.com/insights/models/CAMeL-Lab/camelbert-catib-parser/aca64d988bc0b08760929bdf2629e375e60079a7/files?blob-id=10f573bb9daf3f4ca7fb38db0ffb1e10c74ccf0e&utm_source=huggingface)
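For anyone who wants to check which modules a pickle references without loading it, the following is a rough sketch using the standard-library `pickletools` module. The helper name `referenced_globals` and the `Demo` class are hypothetical, and the `STACK_GLOBAL` handling is a simple heuristic that can miss names reused through the pickle memo. It is not the scanner HuggingFace uses. As far as I know, torch checkpoints saved in the zip-based format keep the pickle as a member inside the archive, so a real check would need to extract that member first.

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> set:
    """List the 'module.name' globals a pickle would import, without loading it."""
    found = set()
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: the argument is "module name" joined by a space.
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: heuristic, take the two most recently pushed strings.
            if len(strings) >= 2:
                found.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Demo:
    """Harmless demo object whose pickle references builtins.eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


for proto in (3, 4):
    print(proto, referenced_globals(pickle.dumps(Demo(), protocol=proto)))
```

Any referenced global outside the tensor and container types a checkpoint legitimately needs would be worth a closer look.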

 

#### Suggested Repair Methods

1. Convert the models to the safer safetensors format and re-upload them.
2. Remove the suspicious modules from the executable code of the model files and re-upload them.
3. Upgrade torch to version 2.6 or later, where `torch.load` defaults to `weights_only=True`. If the models still load without errors, the risk should be mitigated.
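As a rough illustration of the idea behind option 3, the sketch below restricts which globals an unpickler may resolve, using only the standard library. It is an analogy and not torch's actual implementation; `AllowListUnpickler`, `safe_loads`, `Untrusted`, and the `ALLOWED` set are all hypothetical names chosen for this example.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list; a real one would name only the types a
# checkpoint legitimately needs.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve a global only if it is explicitly allowed.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle data, refusing any global that is not on the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()


class Untrusted:
    """Harmless demo object that asks the loader to call eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# A plain container of primitives loads normally.
print(safe_loads(pickle.dumps(OrderedDict(a=1))))

# A pickle that references builtins.eval is rejected before anything runs.
try:
    safe_loads(pickle.dumps(Untrusted()))
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

If a model file only loads when such restrictions are turned off, that would suggest it depends on globals beyond plain weights, which is the case options 1 and 2 are meant to address.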

As this is a popular machine learning project, **every potential risk could be propagated and amplified**. Could you please address the above issues?

Thanks for your help~

Best regards,
Silverhand
