Commit 5f83f3a
[IMP] snippets.convert_html_columns: a batch processing story
TLDR: RTFM
Once upon a time, in a countryside farm in Belgium...
At first, the upgrade of databases was straightforward. But, as time
passed, the size of the databases grew, and some CPU-intensive
computations took so much time that a solution needed to be found.
Fortunately, the Python standard library has the perfect module for this
task: `concurrent.futures`.
Then, Python 3.10 appeared, and `ProcessPoolExecutor` started to hang
occasionally for no apparent reason. Soon, our hero found out he wasn't
the only one to suffer from this issue[^1].
Unfortunately, the proposed solution looked like overkill. Still, it
revealed that the issue had already been known[^2] for a few years.
Although an official patch wasn't ready to be committed, the discussion
about its legitimacy[^3] led our hero to a nicer solution.
By default, `ProcessPoolExecutor.map` submits elements one by one to the
pool. This is pretty inefficient when there are a lot of elements to
process. This can be changed by using a large value for the *chunksize*
argument.
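
A minimal sketch of the idea (the function, data, and chunk size below are
illustrative placeholders, not the actual upgrade-util code):

```python
import concurrent.futures

def convert(element):
    # stand-in for the CPU-intensive per-element conversion
    return element.upper()

def main():
    elements = [f"row-{i}" for i in range(100_000)]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Without chunksize, map() sends elements to the workers one at a
        # time; with a large chunksize they are shipped in batches, which
        # drastically reduces the inter-process communication overhead.
        results = list(executor.map(convert, elements, chunksize=1000))
    print(len(results))

if __name__ == "__main__":  # required when the start method is "spawn"
    main()
```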
Who would have thought that a bigger chunk size would solve a
performance issue?
As always, the answer was in the documentation[^4].
[^1]: https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load
[^2]: python/cpython#74028
[^3]: python/cpython#114975 (review)
[^4]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map
closes #94
Signed-off-by: Nicolas Seinlet (nse) <nse@odoo.com>
1 file changed, 1 insertion(+), 1 deletion(-)