@serhiy-katsyuba-intel
Contributor

When at least two DP modules are running, each on a separate core, using irq_lock() can leave interrupts disabled for a very long time. When module_is_ready_to_process() always returns true, the DP task runs in a loop continuously, except when it is preempted by higher-priority threads. irq_lock() disables interrupts globally, so calling it on multiple cores can produce unbalanced double locks with no unlock in between.

Consider this case: core 1 calls irq_lock(1); this does not prevent core 2 from also calling flags = irq_lock(2); now flags holds the "interrupts disabled" state, because core 1 had already disabled interrupts globally. Then core 1 calls irq_unlock() -- interrupts are re-enabled; then core 2 calls irq_unlock(flags) to restore its saved state, which actually disables interrupts again. On the next loop iteration, core 1 calls flags = irq_lock(1) and saves the disabled state, so from that point on interrupts may stay disabled forever with only two DP threads constantly running.
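The interleaving described above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: it treats interrupt state as one global flag, as the description does, and the names model_irq_lock(), model_irq_unlock() and replay_interleaving() are hypothetical stand-ins, not the real irq_lock()/irq_unlock() implementation.

```c
#include <stdbool.h>

/* Toy model (assumption): a single global "interrupts enabled" flag
 * shared by all cores, as in the scenario described above. */
static bool irq_enabled = true;

/* Hypothetical stand-in for irq_lock(): return the previous state,
 * then disable interrupts. */
static bool model_irq_lock(void)
{
    bool key = irq_enabled;

    irq_enabled = false;
    return key;
}

/* Hypothetical stand-in for irq_unlock(): restore the saved state. */
static void model_irq_unlock(bool key)
{
    irq_enabled = key;
}

/* Replay the problematic interleaving, then let core 1 run a number of
 * further lock/unlock iterations. Returns the final interrupt state. */
static bool replay_interleaving(int extra_iterations)
{
    irq_enabled = true;

    bool key1 = model_irq_lock();   /* core 1: saves "enabled" */
    bool key2 = model_irq_lock();   /* core 2: saves "disabled" */

    model_irq_unlock(key1);         /* core 1: interrupts re-enabled */
    model_irq_unlock(key2);         /* core 2: restores "disabled" */

    for (int i = 0; i < extra_iterations; i++) {
        key1 = model_irq_lock();    /* saves "disabled" */
        model_irq_unlock(key1);     /* restores "disabled" again */
    }

    return irq_enabled;
}
```

In this model, replay_interleaving() returns false for any number of extra iterations: once core 2 has restored the stale "disabled" key, every later balanced lock/unlock pair just saves and restores that same disabled state.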

This fixes a regression in multicore DP tests. The issue was triggered by commit 4225c27, which simply allows the DP task to use all the available time without being triggered by LL on every cycle.

Signed-off-by: Serhiy Katsyuba <serhiy.katsyuba@intel.com>
Collaborator

@softwarecki left a comment

I'm glad my solution helped resolve the issue :)

@lgirdwood
Member

@serhiy-katsyuba-intel can you check the internal CI? Thanks!

@serhiy-katsyuba-intel
Contributor Author

> @serhiy-katsyuba-intel can you check the internal CI. Thanks !

The CI fails on the test_00_11_enter_d3_with_topology_stress test on NVL FPGA. That test does not use DP, so it should not be directly affected by the changes in this PR. I checked a few neighboring PRs: they all fail the same test. cc: @lrudyX, @tmleman.

@lrudyX

lrudyX commented Oct 28, 2025

test_00_11_enter_d3_with_topology_stress

The failure is related to a DUT issue. Working on solving this problem.

@serhiy-katsyuba-intel
Contributor Author

Internal Intel CI is now working and has passed successfully.

@abonislawski merged commit fe861a6 into thesofproject:main Oct 30, 2025
39 of 45 checks passed