Description
Hi. After various attempts, and although this is hard to pin down from the release notes, I am reporting the following.
I am currently compiling AOCL, ZenDNN, and zentorch with AOCC and explicit compile flags for Zen3; I still need to validate whether that makes a difference, but I suspect it does not.
NOTE: It is hard to determine whether this originates in zentorch, AOCL, or both.
When attempting to run an AMD Quark-quantized BF16 Mistral 7B (weights quantized with or without attention; the specific model does not matter), zentorch passes execution back to the AOCL kernels, which is what should occur.
However, when this happens it defaults to the AVX512 path first and then fails/crashes on systems without AVX512 support, rather than downgrading to AVX2 emulation for BF16.
Current system specs: 64-core EPYC 7C13, 512 GB DDR4.
Could you please advise whether downgrading is expected, whether additional parameters need to be taken into consideration, or whether systems without AVX512 are simply not supported?
I assume this is at least partly a bug, given the hard crash rather than a graceful response that falls back, as a last resort, to standard PyTorch kernels.
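For reference, this is the kind of pre-flight check I run on the host before launching; a minimal sketch assuming Linux (`/proc/cpuinfo` kernel feature flags) and PyTorch's `torch.backends.cpu.get_cpu_capability()`, not any zentorch API:

```python
# Pre-flight capability check (assumes Linux; the flag names are kernel
# feature strings from /proc/cpuinfo, not a zentorch API). Zen3 parts such
# as the EPYC 7C13 report neither avx512f nor avx512_bf16.
import torch

with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()

print("torch CPU capability:", torch.backends.cpu.get_cpu_capability())
print("avx512_bf16 supported:", "avx512_bf16" in cpu_flags)
```

On this machine avx512_bf16 is absent, which is exactly the condition rejected by the check in MatmulUtils.hpp:220 in the traceback below.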
REF:
[CORE:I][0.000022] Memory created [memory]
[API:I][0.000040] Memory create
[CORE:V0][0.000030] Memory desc init by tag [memory]
[CORE:I][0.000034] Memory created [memory]
[API:I][0.000050] Memory create
[CORE:V0][0.000040] Memory desc init by tag [memory]
[CORE:I][0.000044] Memory created [memory]
[API:I][0.000001] matmul desc create - no bias
[CORE:I][0.000001] matmul desc init [matmul]
[CORE:I][0.000001] CPU Engine: primitive_cache_capacity: 1024
[CORE:V0][0.000001] zendnn_f32_matmul_t::pd_t::init()
[CORE:V0][0.000152] Memory desc init by tag [memory]
[CORE:V0][0.000156] Memory desc init by tag [memory]
[CORE:V0][0.000159] Memory desc init by tag [memory]
[CORE:V0][0.000001] ZenDNN Ref gemm_f32_matmul_t::pd_t::init()
[CORE:V0][0.000007] ZenDNN Ref gemm_f32_matmul_t::pd_t::check_and_configure_attributes
[API:I][0.000209] matmul primitive_desc create - attr
[PROF:I][0.000018] zendnn_primitive_create,cache_miss,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,0.018511,ms
[API:I][0.000262] matmul primitive create
[API:I][0.000265] CPU Stream create
[CORE:I][0.000001] CPU Stream created [stream]
[CORE:V0][0.000001] CPU Stream created [cpu/stream]
[CORE:I][0.000117] ZenDNN Ref gemm_f32_matmul_t::execute_ref
[PROF:I][0.016544] zendnn_primitive_execute,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,16.4969,ms
Traceback (most recent call last):
File "/app/startup/test_llm.py", line 62, in
output = model.generate(input_ids, **generate_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 2642, in generate
result = self._beam_search(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 4077, in _beam_search
model_outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 927, in wrapper
@wraps(func)
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1201, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
outs = compiled_fn(args)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in call
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_root/5l/c5llk25lcbpqpibrlsyezhpf4svplbb7vnrhcfymaofa5iujrhsm.py", line 32793, in call
buf24 = torch.ops.zentorch.zentorch_attn_qkv_fusion.default([buf9, buf9, buf9], [reinterpret_tensor(buf18, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf19, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf20, (32, 4096), (4096, 1), 0)], [reinterpret_tensor(buf21, (4096, 4096), (1, 4096), 0), reinterpret_tensor(buf22, (4096, 1024), (1, 4096), 0), reinterpret_tensor(buf23, (4096, 1024), (1, 4096), 0)], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0, 0, 0], [1, 1, 1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 756, in call
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: /media/nvme1/models/scripts/conversions/ZenDNN-pytorch-plugin/src/cpu/cpp/MatmulUtils.hpp:220 check_valid_dtypes_for_matmul : zentorch_matmul bf16 path needs the cpu support avx512bf16
[CORE:I][2.545049] CPU Stream deleted [stream]
[CORE:I][2.545492] CPU Engine deleted [engine]
/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py:111: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Loading model: /models/quantized_output/int8/Mistral_7b
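For completeness, here is the interim guard I am using to keep runs alive on this box; a minimal sketch that assumes a plain bf16 checkpoint can simply be upcast to fp32 so the matmuls stay on the f32 path. This is a workaround of my own, not a zentorch-recommended fallback:

```python
# Hypothetical fallback, not a zentorch API: load weights as fp32 on hosts
# without avx512_bf16 so zentorch dispatches the f32 matmul path instead of
# the bf16 path that raises in MatmulUtils.hpp.
import torch
from transformers import AutoModelForCausalLM

def preferred_dtype() -> torch.dtype:
    with open("/proc/cpuinfo") as f:
        return torch.bfloat16 if "avx512_bf16" in f.read() else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "/models/quantized_output/int8/Mistral_7b",  # path from the log above
    torch_dtype=preferred_dtype(),
)
```

This obviously gives up the bf16 memory savings; a graceful in-library downgrade (AVX2 emulation, or reverting to standard PyTorch kernels) would be the preferable behavior.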