AOCL BF16 with AVX2 always requires AVX512 #2

@meven3000

Description

Hi. After various attempts I'm reporting the following; the expected behaviour is hard to determine from the release notes.

I am currently compiling AOCL, ZenDNN, and zentorch with AOCC and explicit Zen3 compile flags. I still need to validate whether that makes a difference, but I suspect it does not.

NOTE: It is hard to determine whether this is a zentorch issue, an AOCL issue, or both.

When running an AMD Quark-quantized BF16 Mistral 7B (weights quantized, with or without attention; it could be any model), zentorch hands the work off to the AOCL kernels, which is what should occur.
However, the BF16 path then defaults to AVX512 and fails/crashes on a system without AVX512 support, rather than downgrading to AVX2 emulation for BF16.

Current system specs: 64-core EPYC 7C13 (Zen 3), 512 GB DDR4.

Could you please advise whether downgrading is expected, whether there are additional parameters that need to be set, or whether systems without AVX512 are simply not supported?
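
For reference, here is the quick check I run to see which SIMD flags the kernel reports (a minimal sketch; the flag names are those exposed in `/proc/cpuinfo` on Linux, and `avx512_bf16` corresponds to the avx512bf16 capability the zentorch check complains about):

```python
# Minimal sketch: list which relevant SIMD flags this CPU exposes.
# Flag names are as reported by the Linux kernel in /proc/cpuinfo.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for flag in ("avx2", "avx512f", "avx512_bf16"):
    print(f"{flag}: {'present' if flag in flags else 'absent'}")
```

On this 7C13 box it reports `avx2: present` and both AVX512 flags absent, which is expected given that Zen 3 does not implement AVX-512 at all.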

I assume there is at least a bug here in any case, given the hard crash rather than a graceful response that reverts to the standard PyTorch kernels as a last resort.
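
The best I can do on my side is catch the failure and rerun eagerly, roughly as below (a workaround sketch only, using the documented `torch.compile(model, backend="zentorch")` entry point; `model`, `input_ids`, and `generate_kwargs` are from my test script):

```python
import torch

# Workaround sketch: try the zentorch-compiled model first, and fall back
# to the eager PyTorch model if the bf16 matmul path rejects this CPU.
compiled_model = torch.compile(model, backend="zentorch")
try:
    output = compiled_model.generate(input_ids, **generate_kwargs)
except RuntimeError as err:
    # zentorch raises when the bf16 path finds no avx512bf16 support;
    # anything else should still propagate.
    if "avx512bf16" not in str(err):
        raise
    output = model.generate(input_ids, **generate_kwargs)
```

That only papers over the crash, though; a capability check inside zentorch that routes AVX2-only CPUs to a supported kernel (or back to the stock PyTorch ops) seems like the proper fix.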

REF:

```
[CORE:I][0.000022] Memory created [memory]
[API:I][0.000040] Memory create
[CORE:V0][0.000030] Memory desc init by tag [memory]
[CORE:I][0.000034] Memory created [memory]
[API:I][0.000050] Memory create
[CORE:V0][0.000040] Memory desc init by tag [memory]
[CORE:I][0.000044] Memory created [memory]
[API:I][0.000001] matmul desc create - no bias
[CORE:I][0.000001] matmul desc init [matmul]
[CORE:I][0.000001] CPU Engine: primitive_cache_capacity: 1024
[CORE:V0][0.000001] zendnn_f32_matmul_t::pd_t::init()
[CORE:V0][0.000152] Memory desc init by tag [memory]
[CORE:V0][0.000156] Memory desc init by tag [memory]
[CORE:V0][0.000159] Memory desc init by tag [memory]
[CORE:V0][0.000001] ZenDNN Ref gemm_f32_matmul_t::pd_t::init()
[CORE:V0][0.000007] ZenDNN Ref gemm_f32_matmul_t::pd_t::check_and_configure_attributes
[API:I][0.000209] matmul primitive_desc create - attr
[PROF:I][0.000018] zendnn_primitive_create,cache_miss,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,0.018511,ms
[API:I][0.000262] matmul primitive create
[API:I][0.000265] CPU Stream create
[CORE:I][0.000001] CPU Stream created [stream]
[CORE:V0][0.000001] CPU Stream created [cpu/stream]
[CORE:I][0.000117] ZenDNN Ref gemm_f32_matmul_t::execute_ref
[PROF:I][0.016544] zendnn_primitive_execute,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,16.4969,ms

Traceback (most recent call last):
  File "/app/startup/test_llm.py", line 62, in <module>
    output = model.generate(input_ids, **generate_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 2642, in generate
    result = self._beam_search(
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 4077, in _beam_search
    model_outputs = self(**model_inputs, return_dict=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 927, in wrapper
    @wraps(func)
  File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1201, in forward
    return compiled_fn(full_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
                            ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
    outs = compiled_fn(args)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
    return compiled_fn(runtime_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
    return self.current_callable(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/torchinductor_root/5l/c5llk25lcbpqpibrlsyezhpf4svplbb7vnrhcfymaofa5iujrhsm.py", line 32793, in call
    buf24 = torch.ops.zentorch.zentorch_attn_qkv_fusion.default([buf9, buf9, buf9], [reinterpret_tensor(buf18, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf19, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf20, (32, 4096), (4096, 1), 0)], [reinterpret_tensor(buf21, (4096, 4096), (1, 4096), 0), reinterpret_tensor(buf22, (4096, 1024), (1, 4096), 0), reinterpret_tensor(buf23, (4096, 1024), (1, 4096), 0)], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0, 0, 0], [1, 1, 1])
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 756, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: /media/nvme1/models/scripts/conversions/ZenDNN-pytorch-plugin/src/cpu/cpp/MatmulUtils.hpp:220 check_valid_dtypes_for_matmul : zentorch_matmul bf16 path needs the cpu support avx512bf16

[CORE:I][2.545049] CPU Stream deleted [stream]
[CORE:I][2.545492] CPU Engine deleted [engine]
/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py:111: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Loading model: /models/quantized_output/int8/Mistral_7b
```
