Description
Hi. After various attempts, and although this is hard to pin down from the release notes, I am reporting the following.
I am currently compiling AOCL, ZenDNN, and zentorch with AOCC and explicit compile flags for Zen3; I still need to validate whether that makes a difference, but I suspect it does not.
NOTE: It is hard to determine whether this originates in zentorch, AOCL, or both.
When attempting to run an AMD Quark-quantized BF16 Mistral 7B (weights quantized with or without attention; the specific model does not matter), zentorch passes execution back to the AOCL kernels, which is what should occur.
However, when this happens it defaults to the AVX512 path first and then fails/crashes on systems without AVX512 support, rather than downgrading to AVX2 emulation for BF16.
Current system specs: 64-core EPYC 7C13, 512 GB DDR4.
Could you please advise whether downgrading is expected, whether additional parameters need to be taken into consideration, or whether systems without AVX512 are simply not supported?
I assume this is at least partly a bug, given the hard crash rather than a graceful response that falls back, as a last resort, to standard PyTorch kernels.
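For reference, this is the kind of pre-flight check I run on the host before launching; a minimal sketch assuming Linux (`/proc/cpuinfo` kernel feature flags) and PyTorch's `torch.backends.cpu.get_cpu_capability()`, not any zentorch API:

```python
# Pre-flight capability check (assumes Linux; the flag names are kernel
# feature strings from /proc/cpuinfo, not a zentorch API). Zen3 parts such
# as the EPYC 7C13 report neither avx512f nor avx512_bf16.
import torch

with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()

print("torch CPU capability:", torch.backends.cpu.get_cpu_capability())
print("avx512_bf16 supported:", "avx512_bf16" in cpu_flags)
```

On this machine avx512_bf16 is absent, which is exactly the condition rejected by the check in MatmulUtils.hpp:220 in the traceback below.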
REF:
[CORE:I][0.000022] Memory created [memory]
[API:I][0.000040] Memory create
[CORE:V0][0.000030] Memory desc init by tag [memory]
[CORE:I][0.000034] Memory created [memory]
[API:I][0.000050] Memory create
[CORE:V0][0.000040] Memory desc init by tag [memory]
[CORE:I][0.000044] Memory created [memory]
[API:I][0.000001] matmul desc create - no bias
[CORE:I][0.000001] matmul desc init [matmul]
[CORE:I][0.000001] CPU Engine: primitive_cache_capacity: 1024
[CORE:V0][0.000001] zendnn_f32_matmul_t::pd_t::init()
[CORE:V0][0.000152] Memory desc init by tag [memory]
[CORE:V0][0.000156] Memory desc init by tag [memory]
[CORE:V0][0.000159] Memory desc init by tag [memory]
[CORE:V0][0.000001] ZenDNN Ref gemm_f32_matmul_t::pd_t::init()
[CORE:V0][0.000007] ZenDNN Ref gemm_f32_matmul_t::pd_t::check_and_configure_attributes
[API:I][0.000209] matmul primitive_desc create - attr
[PROF:I][0.000018] zendnn_primitive_create,cache_miss,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,0.018511,ms
[API:I][0.000262] matmul primitive create
[API:I][0.000265] CPU Stream create
[CORE:I][0.000001] CPU Stream created [stream]
[CORE:V0][0.000001] CPU Stream created [cpu/stream]
[CORE:I][0.000117] ZenDNN Ref gemm_f32_matmul_t::execute_ref
[PROF:I][0.016544] zendnn_primitive_execute,cpu,plugin_op:zentorch::zentorch_bmm,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,4x64x1:4x1x8:4x64x8,16.4969,ms
Traceback (most recent call last):
File "/app/startup/test_llm.py", line 62, in
output = model.generate(input_ids, **generate_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 2642, in generate
result = self._beam_search(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py", line 4077, in _beam_search
model_outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 927, in wrapper
@wraps(func)
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1201, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
outs = compiled_fn(args)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in call
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_root/5l/c5llk25lcbpqpibrlsyezhpf4svplbb7vnrhcfymaofa5iujrhsm.py", line 32793, in call
buf24 = torch.ops.zentorch.zentorch_attn_qkv_fusion.default([buf9, buf9, buf9], [reinterpret_tensor(buf18, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf19, (32, 4096), (4096, 1), 0), reinterpret_tensor(buf20, (32, 4096), (4096, 1), 0)], [reinterpret_tensor(buf21, (4096, 4096), (1, 4096), 0), reinterpret_tensor(buf22, (4096, 1024), (1, 4096), 0), reinterpret_tensor(buf23, (4096, 1024), (1, 4096), 0)], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0, 0, 0], [1, 1, 1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 756, in call
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: /media/nvme1/models/scripts/conversions/ZenDNN-pytorch-plugin/src/cpu/cpp/MatmulUtils.hpp:220 check_valid_dtypes_for_matmul : zentorch_matmul bf16 path needs the cpu support avx512bf16
[CORE:I][2.545049] CPU Stream deleted [stream]
[CORE:I][2.545492] CPU Engine deleted [engine]
/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py:111: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Loading model: /models/quantized_output/int8/Mistral_7b
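For completeness, here is the interim guard I am using to keep runs alive on this box; a minimal sketch that assumes a plain bf16 checkpoint can simply be upcast to fp32 so the matmuls stay on the f32 path. This is a workaround of my own, not a zentorch-recommended fallback:

```python
# Hypothetical fallback, not a zentorch API: load weights as fp32 on hosts
# without avx512_bf16 so zentorch dispatches the f32 matmul path instead of
# the bf16 path that raises in MatmulUtils.hpp.
import torch
from transformers import AutoModelForCausalLM

def preferred_dtype() -> torch.dtype:
    with open("/proc/cpuinfo") as f:
        return torch.bfloat16 if "avx512_bf16" in f.read() else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "/models/quantized_output/int8/Mistral_7b",  # path from the log above
    torch_dtype=preferred_dtype(),
)
```

This obviously gives up the bf16 memory savings; a graceful in-library downgrade (AVX2 emulation, or reverting to standard PyTorch kernels) would be the preferable behavior.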