System Info
Qwen3Moe models compute load_balancing_loss during evaluation as well as training, which causes an error during generation on the second and later decoding steps.
The problem can be handled by modifying
    if output_router_logits:
        aux_loss = load_balancing_loss_func(
            ...
to
    if output_router_logits and self.training:
        aux_loss = load_balancing_loss_func(
            ...
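To make the intent of the proposed guard concrete, here is a minimal, library-free sketch of the gating pattern. Everything in it is hypothetical and for illustration only: `ToyMoeModel` and `toy_load_balancing_loss` are stand-ins, not the transformers classes or the real `load_balancing_loss_func`, and the `training` flag simply mirrors the attribute that `torch.nn.Module` exposes.

```python
def toy_load_balancing_loss(router_logits):
    # Hypothetical placeholder for the real auxiliary loss:
    # here it is just the mean of the squared router logits.
    flat = [x for row in router_logits for x in row]
    return sum(x * x for x in flat) / len(flat)


class ToyMoeModel:
    """Hypothetical stand-in for an MoE causal LM; not the transformers class."""

    def __init__(self):
        # Mirrors torch.nn.Module.training (True after train(), False after eval()).
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def forward(self, router_logits, output_router_logits=False):
        aux_loss = None
        # Proposed guard: only compute the auxiliary loss while training,
        # so evaluation and generation skip it entirely.
        if output_router_logits and self.training:
            aux_loss = toy_load_balancing_loss(router_logits)
        return {"aux_loss": aux_loss}
```

With this guard, a model in eval mode returns no auxiliary loss even when router logits are requested, so the generation path never reaches the loss computation.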
Who can help?
No response
Reproduction
Call model.generate() on a Qwen3MoeForCausalLM model.
Expected behavior
Generation completes successfully, without raising an error.