Description
Hi, @muhammed-abuodeh, I'd like to report that two potentially risky pretrained models are being used in this project, which may pose deserialization threats. Please check the following code examples:
• src/utils/model_downloader.py
def download_default_models(model_path: Path) -> None:
    # check if models folder exists
    if not os.path.exists(model_path):
        os.mkdir(model_path)
    # check if default models exist
    if not os.path.exists(model_path / "CAMeLBERT-CATiB-biaffine.model"):
        print('downloading catib model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-catib-parser", filename="CAMeLBERT-CATiB-biaffine.model", local_dir=model_path)
    if not os.path.exists(model_path / "CAMeLBERT-UD-biaffine.model"):
        print('downloading ud model')
        hf_hub_download(repo_id="CAMeL-Lab/camelbert-ud-parser", filename="CAMeLBERT-UD-biaffine.model", local_dir=model_path)
    print('Default models downloaded.')
• src/dependency_parser/biaff_parser.py
def parse(conll_path_or_parsed_tuples: Union[List[List[tuple]], str], parse_model: str) -> List[List[tuple]]:
    parser = Parser.load(parse_model)
    return parser.predict(conll_path_or_parsed_tuples, verbose=False, tree=True, proj=True)
Issue Description
As shown above, src/utils/model_downloader.py first downloads the models "CAMeLBERT-CATiB-biaffine.model" and "CAMeLBERT-UD-biaffine.model" via hf_hub_download. Each model is then loaded and run via Parser.load, which depends on torch.load. Before version 2.6, torch.load did not set the weights_only parameter to True by default. The torch version used in this project is lower than 2.6, so a deserialization vulnerability may be triggered when a model is loaded.
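To illustrate why the weights_only default matters, here is a minimal sketch using only Python's standard pickle module, which torch.load builds on. It is not torch's actual implementation: the allowlist, the `RestrictedUnpickler` class, and the `Payload` class are hypothetical names made up for this illustration, and the payload calls the harmless `len` where an attacker could name something like `os.system`.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist for this illustration; torch's real allowlist differs.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize `data`, rejecting references to non-allowlisted callables."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Stand-in for a malicious object: unpickling it calls a chosen function."""

    def __reduce__(self):
        # Harmless here (builtins.len), but any importable callable could be named.
        return (len, ("executed on load",))


# A benign, weights-like container and a hostile payload, both as pickle bytes.
benign = pickle.dumps(OrderedDict(weight=[0.1, 0.2]))
hostile = pickle.dumps(Payload())
```

With plain `pickle.loads(hostile)` the named function is called during loading. With `restricted_loads(hostile)` a `pickle.UnpicklingError` is raised instead, while `restricted_loads(benign)` still returns the data. This is roughly the kind of protection a restricted loading mode is meant to give.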
Both models have been flagged as risky on the Hugging Face platform. Specifically, the files CAMeLBERT-UD-biaffine.model and CAMeLBERT-CATiB-biaffine.model are marked as malicious and may trigger deserialization threats. I think the reason might be that the executable code embedded in the model files contains suspicious modules. Once a model is loaded, the vulnerability could be activated.
Related Risk Reports: CAMeL-Lab/camelbert-ud-parser risk report
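For anyone who wants to check which modules a pickle stream references without executing it, a rough sketch with the standard pickletools module could look like the following. The function name `list_pickle_globals` is hypothetical, the string tracking is only a heuristic (it can miss names fetched from the pickle memo), and a torch checkpoint may wrap its pickle inside an archive that would need to be unpacked first.

```python
import pickle
import pickletools


def list_pickle_globals(data: bytes) -> set:
    """Return (module, name) pairs referenced by a pickle, without loading it."""
    found = set()
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as the last two strings.
            if len(strings) >= 2:
                found.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Example: a pickle that references the built-in function len.
sample = pickle.dumps({"fn": len})
referenced = list_pickle_globals(sample)
```

Any entry in the result that points at modules such as `os`, `subprocess`, or `builtins` would be worth a closer look before the file is ever loaded.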
Suggested Repair Methods
- Convert the models to the safer safetensors format and re-upload them
- Delete the suspicious modules from the executable code of the model files and re-upload them
- Upgrade torch to version 2.6 or later, where torch.load sets weights_only to True by default; if the models still load smoothly, there should be no problem
As this is a popular machine learning project, every potential risk could be propagated and amplified. Could you please address the above issues?
Thanks for your help~
Best regards,
Silverhand