I'm not well versed in Python, and I don't know where to put the downloaded llama-2-7b-chat.Q4_0.gguf file. I can get llama.cpp working easily on my laptop, but I can't seem to get this to work. I did git clone neural-speed, did the pip install ..., saved the script below as run_model.py, and ran:

python run_model.py
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Specify the GGUF repo on Hugging Face
model_name = "TheBloke/Llama-2-7B-Chat-GGUF"
# Download the specific GGUF model file from the repo above
model_file = "llama-2-7b-chat.Q4_0.gguf"
# Make sure you have been granted access to this model on Hugging Face
tokenizer_name = "meta-llama/Llama-2-7b-chat-hf"

prompt = "Once upon a time"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
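
On the "where do I put the file" question, a hedged reading of the script above: since model_name is a Hugging Face repo id and model_file names a file inside that repo, from_pretrained should fetch the GGUF from the Hub into the local Hugging Face cache on first run, so nothing has to be placed by hand. A minimal sketch to confirm where the file ends up, using the standard huggingface_hub API (only the repo id and filename come from the script; the rest is illustrative):

from huggingface_hub import hf_hub_download

# Fetch (or reuse the cached copy of) the exact GGUF file the script
# above passes to from_pretrained, and print its resolved local path.
gguf_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_0.gguf",
)
print(gguf_path)  # typically resolves to a path under ~/.cache/huggingface/hub/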
(base) root@ubuntu:/usr/local/src/neural-speed# python run_model.py
Traceback (most recent call last):
  File "/usr/local/src/neural-speed/run_model.py", line 2, in <module>
    from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
ImportError: cannot import name 'WeightOnlyQuantConfig' from 'intel_extension_for_transformers.transformers' (/root/miniconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/__init__.py)
(base) root@ubuntu:/usr/local/src/neural-speed#
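
One thing worth noting: the traceback complains about line 2 of run_model.py importing WeightOnlyQuantConfig, but the script pasted above never imports or uses that name, so the file on disk evidently differs from the snippet. Dropping the unused name from the import should clear this particular error; assuming the rest of the file matches the snippet, line 2 would become:

from intel_extension_for_transformers.transformers import AutoModelForCausalLM

(If I recall correctly, newer intel-extension-for-transformers releases removed WeightOnlyQuantConfig in favor of per-algorithm configs such as RtnConfig, which would explain why an import copied from an older example fails against a current install.)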