Adding deadlock detection to ConSan.
To detect deadlocks we first need to keep track of the barrier state.
We introduce the `barrierState` aux structure, which encodes each
barrier's phase, init_count and current_count. We then instrument
barrier inits and arrivals: an init sets init_count = current_count =
count and phase = 0. An arrival decrements current_count and, if it
hits 0, flips the phase and resets current_count to init_count. We treat
TMA and TC commits as barrier arrivals with count=1.
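The init/arrive bookkeeping above can be sketched in plain Python. This is only an illustrative model, not the ConSan codegen: the function names are hypothetical stand-ins, and the bit layout (bit 0 phase, bits [1..8] initial count, bits [9..16] current count) is taken from the `barrierStates` description in the doc changes of this PR.

```python
# Illustrative model of one packed barrier state word (assumed layout:
# bit 0 = phase, bits 1..8 = initial count, bits 9..16 = current count).
INIT_SHIFT = 1
CUR_SHIFT = 9
COUNT_MASK = 0xFF


def init_barrier_state(count):
    """Return a state word with phase = 0 and init = current = count."""
    assert 0 < count <= COUNT_MASK
    return (count << INIT_SHIFT) | (count << CUR_SHIFT)


def unpack(state):
    """Return (phase, init_count, current_count)."""
    phase = state & 1
    init_count = (state >> INIT_SHIFT) & COUNT_MASK
    current_count = (state >> CUR_SHIFT) & COUNT_MASK
    return phase, init_count, current_count


def arrive(state, count=1):
    """Verify the arrive does not underflow, then apply it."""
    phase, init_count, current_count = unpack(state)
    assert count <= current_count, "barrier arrive underflow"
    current_count -= count
    if current_count == 0:
        phase ^= 1                  # flip the phase
        current_count = init_count  # reload the current count
    return phase | (init_count << INIT_SHIFT) | (current_count << CUR_SHIFT)


# A barrier expecting two arrivals: the phase flips only on the second one.
state = init_barrier_state(2)
state = arrive(state)
assert unpack(state) == (0, 2, 1)
state = arrive(state)
assert unpack(state) == (1, 2, 2)
```

In this model a TMA or TC commit would simply be `arrive(state, 1)`.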
With that information we can check whether all the threads are waiting
(which constitutes a deadlock). To do that we instrument entries to and
exits from barrier_wait. We introduce the `waiting` aux structure, which
keeps track of waiting threads and of which barrier each one is waiting
on. When issuing a wait, a thread adds itself to `waiting` and checks
whether all the other threads are also waiting. A thread that waits on a
barrier with a phase that does not match the current phase of that
barrier (info from `barrierState`) is not considered waiting.
Verified that we catch deadlocks in cases such as barrier underarrival
(the phase never flips), overarrival (the phase flips too many times),
etc.
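A minimal Python sketch of the waiting check, under stated assumptions: one `waiting` word per barrier with two bits per base thread (waiting flag, then requested phase), as described in the doc changes of this PR, and the current barrier phases passed in as a plain list. The helper names are hypothetical and only mirror the described behaviour.

```python
# Illustrative model: waiting[b] is the bitfield for barrier b, where
# bit (2 * t) is thread t's waiting flag and bit (2 * t + 1) its phase.

def set_waiting(word, thread, phase):
    """Mark `thread` as waiting on `phase` in one barrier's word."""
    shift = 2 * thread
    word &= ~(0b11 << shift)
    return word | ((1 | (phase << 1)) << shift)


def clear_waiting(word, thread):
    """Clear the waiting flag and phase bits for `thread`."""
    return word & ~(0b11 << (2 * thread))


def blocked_mask(word, current_phase, num_threads):
    """Threads whose wait is flagged and whose phase matches the barrier's."""
    mask = 0
    for t in range(num_threads):
        flag = (word >> (2 * t)) & 1
        phase = (word >> (2 * t + 1)) & 1
        if flag and phase == current_phase:
            mask |= 1 << t
    return mask


def all_active_waiting(waiting, phases, active_mask, num_threads):
    """OR-reduce blocked threads across barriers; True means deadlock."""
    blocked = 0
    for word, phase in zip(waiting, phases):
        blocked |= blocked_mask(word, phase, num_threads)
    return (blocked & active_mask) == active_mask


# Underarrival: the barrier never left phase 0 and both threads wait on it.
waiting = [0]
waiting[0] = set_waiting(waiting[0], 0, 0)
waiting[0] = set_waiting(waiting[0], 1, 0)
assert all_active_waiting(waiting, [0], 0b11, 2)
# Once the barrier phase flips to 1, the same waits no longer count.
assert not all_active_waiting(waiting, [1], 0b11, 2)
```

Overarrival fits the same model: if the phase flips twice it is back at 0, so a thread waiting on phase 0 is again counted as blocked.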
include/triton/Dialect/TritonInstrument/IR/TritonInstrument.md (+16 lines: 16 additions & 0 deletions)
@@ -31,6 +31,8 @@ All types are generated on-demand (per partition) based on:
- readVisibility (scratch, <B x 64 x i64>): Per-buffer, per-thread lanes. Each lane stores a 64-bit mask of other threads whose reads are visible to that lane’s thread
- barrierStates (scratch, <Kxi32>): Packed barrier metadata. Bit 0 stores the current phase, bits [1..8] the initial arrival count, bits [9..16] the current arrival count. The verifier checks underflow before updating, and flips the phase when the current count reaches zero.
+- waiting (scratch, <Kxi32>): Per-barrier bitfield describing waiting threads. Each base thread gets two bits: bit (2 * thread + 0) is the waiting flag, bit (2 * thread + 1) stores the phase the thread is waiting on.
- outstandingCommits (scratch, <B x 16 x i8>): Per-buffer, per-base-thread commit counters for cp.async and wgmma

## Visibility and legality rules
@@ -53,6 +55,20 @@ ConSan separates “tracking” from “visibility transfer”:
- At arrive/commit sites (e.g., tc commit, arrive on mbarrier): ConSan emits the track ops for both reads and writes.
- At waits: experimental_transfer_visible_reads / experimental_transfer_visible_writes propagates tracked visibility from the barrier back into the waiting thread’s visibility, and this transfer is repeated to peer threads (base, TMA, TC) to keep the three classes consistent.

+### Barrier phase/count tracking
+
+- experimental_init_barrier_state(barrier, count, barrierStates) initializes the per-barrier state with phase = 0 and both initial/current arrival counts = `count`.
+- experimental_verify_barrier_arrive(barrier, count, barrierStates) checks that subtracting `count` from the current arrival count would not underflow. The codegen emits an assert if it would.
+- experimental_update_barrier_state(barrier, count, barrierStates) applies the arrive: subtracts `count`, flips the phase when the count reaches zero, and reloads the current count from the initial count.
+
+### Deadlock detection
+
+ConSan records which phase each thread is waiting on:
+
+- experimental_set_waiting(barrier, baseThread, phase, barriers, waiting) sets the waiting flag for `baseThread` and stores the requested `phase`. The flag/phase bits share the waiting bitfield (two bits per base thread).
+- experimental_check_all_active_waiting(activeMask, barriers, waiting, barrierStates) filters waiting threads to those whose stored phase matches the current barrier phase. If all active threads are waiting on matching phases, it raises a deadlock assert.
+- experimental_clear_waiting(barrier, baseThread, barriers, waiting) clears the waiting bits for `baseThread`. Each wait clears its own state after the wait completes.
+
## Commit-count–based synchronization

Some hardware ops synchronize via “number of outstanding commits” rather than mbarriers.
let summary = "update the auxiliary barrier state after a verified arrive";
+let description = [{
+Apply an arrive count to the tracked barrier state, toggling the phase when the count reaches zero and reloading the current count from the initial count.
let summary = "Assert that not all active threads are waiting on matching phases";
+let description = [{
+Filter waiting threads to those whose recorded phase matches the current barrier phase, OR-reduce across barriers, and assert that (waitingMask & activeMask) != activeMask.