Could you help fix the deserialization vulnerability caused by two risky pre-trained models used in this repo? #8

@Slverhand

Hi @muhammed-abuodeh, I'd like to report that two potentially risky pretrained models are being used in this project, which may pose deserialization threats. Please see the following code:

src/utils/model_downloader.py

def download_default_models(model_path: Path) -> None:
    # check if models folder exists
    if not os.path.exists(model_path):
        os.mkdir(model_path)

    # check if default models exist
    if not os.path.exists(model_path / "CAMeLBERT-CATiB-biaffine.model"):
        print('downloading catib model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-catib-parser", filename="CAMeLBERT-CATiB-biaffine.model", local_dir=model_path)
    if not os.path.exists(model_path / "CAMeLBERT-UD-biaffine.model"):
        print('downloading ud model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-ud-parser", filename="CAMeLBERT-UD-biaffine.model", local_dir=model_path)
    print('Default models downloaded.')

src/dependency_parser/biaff_parser.py

def parse(conll_path_or_parsed_tuples: Union[List[List[tuple]], str], parse_model:str) -> List[List[tuple]]:
    parser = Parser.load(parse_model)
    return parser.predict(conll_path_or_parsed_tuples, verbose=False, tree=True, proj=True)

Issue Description

As shown above, in src/utils/model_downloader.py the models "CAMeLBERT-CATiB-biaffine.model" and "CAMeLBERT-UD-biaffine.model" are first downloaded with hf_hub_download. A model is then loaded and run via Parser.load, which depends on torch.load.

Moreover, before version 2.6, torch.load did not set the weights_only parameter to True by default. The torch version used in this project is lower than 2.6, so loading these models may trigger a deserialization vulnerability.
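
To make the risk concrete, here is a minimal sketch using only the standard-library pickle module, not torch itself. It shows that unpickling can invoke an arbitrary importable callable, and how an allowlist-based unpickler refuses it. This is similar in spirit to a restricted load, but it is not torch's actual implementation; `Payload`, `AllowlistUnpickler`, and `restricted_loads` are hypothetical names chosen for illustration.

```python
import io
import pickle
import platform
from collections import OrderedDict


class Payload:
    # pickle stores the callable and arguments returned here and calls
    # them at load time. A harmless stdlib function stands in for what
    # could be any importable callable.
    def __reduce__(self):
        return (platform.python_version, ())


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical restricted loader: only allowlisted globals resolve."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def restricted_loads(data: bytes):
    """Unpickle data, rejecting any global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())

# Plain loading calls platform.python_version() as a side effect of load.
print(pickle.loads(blob))

# The restricted loader rejects the same bytes before anything is called.
try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print(exc)

# Plain containers on the allowlist still load.
print(restricted_loads(pickle.dumps(OrderedDict(a=1))))
```

The point of the sketch is that the decision about what may run is made by the loader, not by the file, which is why the default value of weights_only matters.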

Both models have been flagged as risky on the Hugging Face platform. Specifically, the files CAMeLBERT-UD-biaffine.model and CAMeLBERT-CATiB-biaffine.model are marked as malicious and may trigger deserialization threats. I think the reason might be that the executable code in the model files references suspicious modules. Once a model is loaded, the vulnerability could be activated.
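
For anyone who wants to check which modules a pickle would import without actually loading it, here is a rough sketch based on the standard-library pickletools module. It is a heuristic, not the scanner Hugging Face uses: `referenced_globals` is a hypothetical helper, and its STACK_GLOBAL handling assumes the two most recent string constants are the module and the name, which can be wrong when strings are reused from the memo. For a checkpoint saved in torch's zip-based format, I'd assume the bytes to scan are the pickle member inside the archive rather than the whole file.

```python
import pickle
import pickletools
import platform


def referenced_globals(data: bytes) -> set:
    """Return 'module.name' strings a pickle would import, without loading it."""
    found = set()
    strings = []  # string constants seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name"
            found.add(arg.replace(" ", ".", 1))
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE",
                             "BINUNICODE8", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: module and name are the last two strings pushed.
            found.add(f"{strings[-2]}.{strings[-1]}")
    return found


class Demo:
    # Harmless stand-in for an object whose pickle references a callable.
    def __reduce__(self):
        return (platform.python_version, ())


print(referenced_globals(pickle.dumps(Demo(), protocol=2)))
print(referenced_globals(pickle.dumps({"a": [1, 2]})))
```

Any module in the result that is not needed to rebuild tensors or plain containers would be worth a closer look.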

Related Risk Reports: CAMeL-Lab/camelbert-ud-parser risk report

Suggested Repair Methods

  1. Convert the models to the safer safetensors format and re-upload them.
  2. Remove the suspicious modules from the executable code in the model files and re-upload them.
  3. Upgrade torch to version 2.6 or later. If the models still load without errors under the new weights_only default, the risk is mitigated.
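
As a rough sketch of the third option, the helper below checks whether a given torch version string is 2.6 or later, and a second function passes weights_only=True explicitly so the restricted behaviour does not depend on the default. Both function names are hypothetical. Because Parser.load calls torch.load internally, this would have to be applied inside or around that call. A checkpoint that stores non-tensor Python objects may fail to load this way, which is the signal to look for.

```python
import re


def defaults_to_weights_only(torch_version: str) -> bool:
    """True if this torch version is 2.6 or later (weights_only=True default)."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (2, 6)


def load_checkpoint_restricted(path):
    """Hypothetical helper: load a checkpoint with restricted unpickling."""
    import torch  # imported lazily so the version helper works without torch

    # Passing the keyword explicitly avoids relying on the installed
    # version's default.
    return torch.load(path, map_location="cpu", weights_only=True)


print(defaults_to_weights_only("2.5.1+cu121"))
print(defaults_to_weights_only("2.6.0"))
```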

As this is a popular machine learning project, any potential risk could be propagated and amplified. Could you please address the above issues?

Thanks for your help~

Best regards,
Silverhand
