
Conversation

Collaborator

@936187425 936187425 commented Apr 16, 2024

Adds support for Mixtral-8x7B-v0.1.

// dimension of each attention head
const int64_t head_dim = args.head_dim();
// number of KV heads; falls back to n_heads (MHA) when not set, fewer for MQA/GQA
const int64_t n_kv_heads = args.n_kv_heads().value_or(n_heads);
// query heads and KV heads owned by this tensor-parallel rank
const int64_t n_local_heads = n_heads / world_size;
const int64_t n_local_kv_heads = n_kv_heads / world_size;
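
For context, these lines shard the attention heads across tensor-parallel ranks. Below is a minimal, self-contained sketch of how the per-rank counts work out; the values are hypothetical (a Mixtral-8x7B-style config with 32 query heads, 8 KV heads, and 4 ranks is assumed, not taken from this PR):

#include <cstdint>
#include <iostream>

int main() {
  // Hypothetical Mixtral-8x7B-style attention config (assumed, not from the PR).
  const int64_t n_heads = 32;    // query heads
  const int64_t n_kv_heads = 8;  // KV heads (GQA)
  const int64_t world_size = 4;  // tensor-parallel ranks

  // Each rank owns an equal slice of the query heads and of the KV heads.
  const int64_t n_local_heads = n_heads / world_size;        // 8 query heads per rank
  const int64_t n_local_kv_heads = n_kv_heads / world_size;  // 2 KV heads per rank

  std::cout << "local heads: " << n_local_heads
            << ", local kv heads: " << n_local_kv_heads << std::endl;
  return 0;
}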
Collaborator

Just a heads up: I added support for MQA and GQA; please also include that support in your change. FYI dff774e

You can learn about MQA and GQA from this blog: https://iamshobhitagarwal.medium.com/navigating-the-attention-landscape-mha-mqa-and-gqa-decoded-288217d0a7d1
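
To make the MQA/GQA distinction concrete, here is a minimal, hypothetical sketch (not code from dff774e) showing that the three attention variants differ only in the ratio of query heads to KV heads:

#include <cassert>
#include <cstdint>
#include <iostream>

// Number of query heads that share each KV head:
//   MHA: n_kv_heads == n_heads     -> ratio 1
//   GQA: 1 < n_kv_heads < n_heads  -> ratio n_heads / n_kv_heads
//   MQA: n_kv_heads == 1           -> ratio n_heads
int64_t queries_per_kv_head(int64_t n_heads, int64_t n_kv_heads) {
  assert(n_kv_heads > 0 && n_heads % n_kv_heads == 0);
  return n_heads / n_kv_heads;
}

int main() {
  std::cout << queries_per_kv_head(32, 32) << std::endl;  // MHA: 1
  std::cout << queries_per_kv_head(32, 8) << std::endl;   // GQA: 4
  std::cout << queries_per_kv_head(32, 1) << std::endl;   // MQA: 32
  return 0;
}

A smaller n_kv_heads shrinks the KV cache proportionally, which is the main motivation for MQA and GQA.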

@936187425 changed the title from "[model] added support for mixtral moe model" to "[model] add support for mixtral moe model" on May 16, 2024