mistralai+Mistral-7B-Instruct-v0.3 tokenizer / lm-eval  #66

@ggbetz

lm-eval fails to load the tokenizer for mistralai+Mistral-7B-Instruct-v0.3:

2024-10-02:09:29:16,565 INFO     [api_models.py:121] Using tokenizer huggingface
Traceback (most recent call last):
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2450, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 107, in __init__
    raise ValueError(
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/bin/lm-eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/evaluator.py", line 201, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/api/model.py", line 147, in create_from_arg_string
    return cls(**args, **args2)
           ^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 18, in __init__
    super().__init__(
  File "/scratch/slurm_tmpdir/job_2671086/lm-evaluation-harness/lm_eval/models/api_models.py", line 130, in __init__
    self.tokenizer = transformers.AutoTokenizer.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2216, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2451, in _from_pretrained
    except import_protobuf_decode_error():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/slurm_tmpdir/job_2671086/venv-cot-eval/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 87, in import_protobuf_decode_error
    raise ImportError(PROTOBUF_IMPORT_ERROR.format(error_message))
ImportError:
 requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Is there a bad tokenizer file in the repo?
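
Both exceptions in the traceback name missing packages (sentencepiece in the ValueError, protobuf in the ImportError) rather than the tokenizer file itself, so it may be worth ruling out the environment first. The following is a minimal sketch of such a check, to be run inside the same venv as lm-eval. The helper names `is_installed` and `missing_tokenizer_deps` are hypothetical and not part of lm-eval or transformers; the only assumption about the packages is that they are installed from pip as `sentencepiece` and `protobuf` and imported as `sentencepiece` and `google.protobuf`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `google`) is missing
        # or when a module is present but has no usable spec.
        return False


def missing_tokenizer_deps() -> list[str]:
    """List the pip package names from the traceback that cannot be found."""
    # import name -> pip package name
    required = {"sentencepiece": "sentencepiece", "google.protobuf": "protobuf"}
    return [pip_name for module, pip_name in required.items()
            if not is_installed(module)]


if __name__ == "__main__":
    missing = missing_tokenizer_deps()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("sentencepiece and protobuf are both importable")
```

If the script reports missing packages, installing them into the venv and rerunning lm-eval would show whether the tokenizer file is actually at fault.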

Labels: bug