Add a script to run lm-eval on OSS #91
base: main
Conversation
Signed-off-by: Huy Do <huydhn@gmail.com>
I see that we're just using what upstream vLLM is using.
Yeah, this is just following the same format that vLLM uses, with more models added.
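For context, the vLLM CI configs referenced here (linked in the PR description below) are small YAML files, roughly of this shape; the model, metric values, limit, and few-shot count are illustrative placeholders, not numbers from this PR:

```yaml
# Illustrative config in the style of vLLM's .buildkite/lm-eval-harness configs.
model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.75
  - name: "exact_match,flexible-extract"
    value: 0.75
limit: 250
num_fewshot: 5
```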
    model_name: str, tasks: List[str], tp_size: int, config: Dict[str, Any]
) -> Dict[str, Any]:
    trust_remote_code = config.get("trust_remote_code", False)
    max_model_len = config.get("max_model_len", 8192)
This will likely impact the result; ideally it's set to auto.
It looks like vLLM lm-eval doesn't like that value and ends up with this error:
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 10-10 01:22:45 [__init__.py:215] Automatically detected platform cuda.
Traceback (most recent call last):
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 186, in <module>
main()
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 182, in main
run_lm_eval(args.configs_dir, models, tasks)
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 164, in run_lm_eval
results = run(model_name, selected_tasks, tp_size, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 105, in run
return lm_eval.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 456, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 245, in simple_evaluate
lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 155, in create_from_arg_string
return cls(**args, **args2)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/vllm_causallms.py", line 170, in __init__
"max_model_len": int(self._max_length) if self._max_length else None,
^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'auto'
Let me take a closer look.
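The failure comes from lm_eval's vLLM wrapper calling int() on max_model_len, so forwarding the string "auto" blows up before vLLM is even constructed. A minimal sketch of one possible workaround, only passing max_model_len through when it is a concrete integer (the helper name and model_args keys below are illustrative, not the PR's actual code):

```python
from typing import Any, Dict


def build_model_args(model_name: str, tp_size: int, config: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical helper: assemble the model_args handed to lm_eval.simple_evaluate().
    model_args: Dict[str, Any] = {
        "pretrained": model_name,
        "tensor_parallel_size": tp_size,
        "trust_remote_code": config.get("trust_remote_code", False),
    }
    max_model_len = config.get("max_model_len", 8192)
    # lm_eval's vLLM wrapper calls int() on this value, so skip it when the
    # config asks for "auto" and let vLLM pick up the model's own context length.
    if max_model_len != "auto":
        model_args["max_model_len"] = int(max_model_len)
    return model_args
```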
The first iteration of the script is at vllm-eval-harness/run_vllm_eval_harness.py. The rest are lm-eval configurations modified from the format used by vLLM CI: https://github.com/vllm-project/vllm/tree/main/.buildkite/lm-eval-harness/configs. There are a couple of tweaks to the format (illustrated in the sketch after this list):
- Configs are grouped per device, e.g. B200, so that we can run the same task on multiple devices on CI if needed
- --tensor-parallel-size to dictate how many devices are needed to evaluate the model
- ORG/MODEL format to make it easier to find the right config
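A hypothetical layout showing the per-device grouping and ORG/MODEL naming (the actual directory and file names in the PR may differ):

```
vllm-eval-harness/configs/
└── B200/
    └── meta-llama/
        └── Meta-Llama-3.1-8B-Instruct.yaml
```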
Testing
This script can be run locally on B200 with
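A hypothetical invocation (the --configs-dir flag is inferred from args.configs_dir in the traceback above; the other flags, model, and task names are illustrative assumptions, not the PR's exact command):

```bash
# Hypothetical; actual flag names and selections may differ from the script's argparse setup.
python vllm-eval-harness/run_vllm_eval_harness.py \
    --configs-dir vllm-eval-harness/configs/B200 \
    --models meta-llama/Meta-Llama-3.1-8B-Instruct \
    --tasks gsm8k
```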
Next steps
cc @zhewenl @yeqcharlotte