Commit 5898605

Polling confirmations: Expand on why directly using duration doesn't work.
1 parent 5caec1f commit 5898605

1 file changed

+39
-17
lines changed

proposals/testing/NNNN-polling-confirmations.md

Lines changed: 39 additions & 17 deletions
@@ -372,27 +372,49 @@ requires macOS 13.0, iOS 16.0, watchOS 9.0, tvOS 16.0 and visionOS 1.0.
372372

373373
### Duration and Concurrent Execution
374374

375-
It is an unfortunate side effect that directly using the `duration` to determine
376-
when to stop polling (i.e. `while duration has not elapsed { poll() }`) is
377-
unreliable in a parallel execution environment. Especially on systems that are
378-
under-resourced, under very high load, or both - such as CI systems. This is
379-
especially the case for the Testing library, which, at time of writing, submits
380-
every test at once to the concurrency system for scheduling. Under this
381-
environment, with heavily-burdened machines running test suites with a very
382-
large amount of tests, there is a very real case that a polling confirmation's
383-
`duration` might elapse before the `body` has had a chance to return even once.
375+
Directly using the `duration` to determine when to stop polling is incredibly
376+
unreliable in a
377+
parallel execution environment, like most platforms Swift Testing runs on. The
378+
fundamental issue is that if polling were to directly use a timeout to determine
379+
when to stop execution, such as:
380+
381+
```swift
382+
let end = ContinuousClock.now + timeout
383+
while ContinuousClock.now < end {
384+
if await runPollAndCheckIfShouldStop() {
385+
// alert the user!
386+
}
387+
await Task.yield()
388+
}
389+
```
390+
391+
With enough system load, the polling check might only run a handful of times, or
392+
even once, before the timeout is triggered. In this case, the component being
393+
polled might not have had time to update its status such that polling could
394+
pass. Using the `Aquarium.raiseDolphins` example from earlier: On the first time
395+
that `runPollAndCheckIfShouldStop` executes, the background task created by
396+
`raiseDolphins` might not have started executing its closure, leading the
397+
polling to continue. If the system is under sufficiently high load, which can
398+
be caused by having a very large number of tests in the test suite, then once
399+
the `Task.yield` finishes and the while condition is checked again, then it
400+
might now be past the timeout. Or the task created by `Aquarium.raiseDolphins`
401+
might have started and the closure run to completion before the next time
402+
`runPollAndCheckIfShouldStop()` is executed. Or both. This approach of using
403+
a clock to check when to stop is inherently unreliable, and becomes increasingly
404+
unreliable as the load on the system increases and as the size of the test suite
405+
increases.
384406

385407
To prevent this, the Testing library will calculate how many times to poll the
386-
`body`. This is done by dividing the `duration` by the `interval`. For example,
408+
`body`. This can be done by dividing the `duration` by the `interval`. For example,
387409
with the default 1 second duration and 1 millisecond interval, the Testing
388-
library will poll 1000 times, waiting 1 millisecond between polling attempts.
389-
This works and is immune to the issues posed by concurrent execution on
390-
heavily-burdened systems.
410+
library could poll 1000 times, waiting 1 millisecond between polling attempts.
411+
This is immune to the issues posed by concurrent execution, allowing it to
412+
scale with system load and test suite size.
391413
This is also very easy for test authors to understand and predict, even if it is
392-
not fully accurate - each poll attempt takes some amount of time, even for very
393-
fast `body` closures. Which means that the real-time duration of a polling
394-
confirmation will always be longer than the value specified in the `duration`
395-
argument.
414+
not fully accurate to wall-clock time - each poll attempt takes some amount of
415+
time, even for very fast `body` closures. Which means that the real-time
416+
duration of a polling confirmation will always be longer than the value
417+
specified in the `duration` argument.
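
The count-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the Testing library's actual implementation: the names `pollCount` and `poll(for:every:until:)` are hypothetical, and the sketch assumes a toolchain and platform where `Duration` and `Task.sleep(for:)` are available (the OS versions listed earlier in this proposal).

```swift
// Hypothetical helper: how many times to poll, derived from the duration
// and interval rather than from a clock deadline.
func pollCount(duration: Duration, interval: Duration) -> Int {
    precondition(interval > .zero, "interval must be positive")
    // Dividing one Duration by another yields a Double ratio.
    // Always poll at least once, even if duration < interval.
    return max(1, Int(duration / interval))
}

// Hypothetical polling loop: runs `body` a fixed number of times, sleeping
// for `interval` between attempts. Returns true as soon as `body` does.
func poll(
    for duration: Duration,
    every interval: Duration,
    until body: () async -> Bool
) async -> Bool {
    let attempts = pollCount(duration: duration, interval: interval)
    for attempt in 0..<attempts {
        if await body() {
            return true
        }
        // No need to sleep after the final attempt.
        if attempt < attempts - 1 {
            try? await Task.sleep(for: interval)
        }
    }
    return false
}
```

Because the loop is bounded by an attempt count instead of a deadline, a heavily loaded system makes the confirmation take longer in wall-clock time, but `body` still gets every one of its attempts. With the default 1 second duration and 1 millisecond interval, `pollCount` evaluates to 1000.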
396418

397419
### Usage
398420
