Single-header C++17 work-stealing thread pool optimized for parallel loops and task-level parallelism.


SP::ThreadPool

tl;dr

#include "thread_pool.hpp"   // just drop the header somewhere in your include path

std::future<void> fut = SP::ThreadPool::submit_task(0, N, [](std::size_t i) {
    heavy_work(i);
});

SP::ThreadPool::wait_for_all();   // blocks until every submitted task finishes
// or call fut.get() to wait only for the task this future was returned from
  • No manual boot‑up required – the first call auto‑initialises the pool.
  • submit_task(...) has overloads for range loops, custom chunking, and per‑task thread caps.
  • set_thread_count(n) changes the pool size globally at any time.
  • shutdown() tears everything down (call once at program exit); a typical lifecycle is sketched below.
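
Putting those pieces together, a typical lifecycle looks like this minimal sketch (heavy_work is a stand‑in for your own per‑index function, not part of the library):

#include "thread_pool.hpp"
#include <cstddef>

// Placeholder for your own per-index workload (hypothetical).
void heavy_work(std::size_t) { /* your computation here */ }

int main() {
    SP::ThreadPool::set_thread_count(8);      // optional: fix the pool size up front
    auto fut = SP::ThreadPool::submit_task(0, 1'000'000,
                                           [](std::size_t i) { heavy_work(i); });
    fut.get();                                // wait for this specific task...
    SP::ThreadPool::wait_for_all();           // ...or for everything submitted
    SP::ThreadPool::shutdown();               // tear down once, at program exit
}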

Key Features

SP::ThreadPool is a single‑header, C++17 work‑stealing thread pool designed for embarrassingly parallel workloads.

Zero Dependencies

The pool uses only the C++ Standard Library.

Work Stealing

Idle threads pull tasks from heavier queues for near-perfect load balancing.

Highly Customizable Workload

A loop can be divided by chunk_count, chunk_size, or chunk_multiplier:
chunk_count: the exact number of chunks to divide the task into.
chunk_size: how many iterations each chunk should contain; the pool then computes the appropriate chunk_count.
chunk_multiplier: the number of chunks each thread should be assigned; the pool then computes the appropriate chunk_count.

Threads Cap

For lightweight tasks, either set chunk_count below the number of available threads or set a threads cap to reduce the overhead of multi-threading (see the sketch below).
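
As a minimal sketch, the chunking knobs and the cap map onto the submit_task variants documented under Public API below (N and process are placeholders):

// Assume: N is the loop bound and process(i) is your per-index function.
std::size_t N = 1'000'000;
auto process = [](std::size_t i) { /* work on element i */ };

// Exact number of chunks:
SP::ThreadPool::submit_task(0, N, /*chunk_cnt=*/32, process);

// Fixed number of iterations per chunk; the pool derives chunk_count:
SP::ThreadPool::submit_task_with_chunk_size(0, N, /*chunk_size=*/4096, process);

// chunk_count = multiplier × active threads:
SP::ThreadPool::submit_task_with_chunk_multiplier(0, N, /*multiplier=*/4, process);

// Cap active workers for a lightweight task:
SP::ThreadPool::submit_task_with_threads_cap(0, N, /*cap=*/2, process);

SP::ThreadPool::wait_for_all();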

Getting Started

  1. Add the header

    #include "thread_pool.hpp"
  2. Add compiler flag

    -pthread

That's it! The thread pool is self-contained inside that header; a complete compile line is shown after these steps.

  3. Submit work

    std::size_t N = 1'000'000;
    SP::ThreadPool::submit_task(0, N, [](std::size_t i) { compute(i); });
    SP::ThreadPool::wait_for_all();
  4. Tune if required

    SP::ThreadPool::set_thread_count(8);      // fixed pool size
    SP::ThreadPool::set_work_stealing(false); // FIFO scheduling only
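
Putting steps 1–2 together, a typical compile line with GCC or Clang looks like this (main.cpp stands in for whichever file includes the header):

g++ -std=c++17 -pthread main.cpp -o app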

How It Works (under the hood)

Per‑thread deques – Each worker owns a double‑ended queue. It pushes its own tasks at the back and pops from the front (FIFO, cache‑friendly for recursion).
Work stealing – When a worker runs dry, it scans the other queues and steals from the back of the fullest one, minimizing contention (an illustrative sketch follows this list). Stealing can be disabled at any time via set_work_stealing(bool).
Thread cap – set_threads_cap(k) (permanent) or submit_task_with_threads_cap() (temporary) limits active workers to k (useful when sharing CPUs). The pool automatically uses the first k threads.
Processor affinity – set_processor_affinity() pins each worker to a core (SetThreadAffinityMask / pthread_setaffinity_np) using sequential pinning; if there are more threads than cores, assignment wraps around so several threads share a core. Optional; call once after boot.
Auto‑teardown – When the last task finishes and you call shutdown(), all threads join cleanly. The destructor also triggers this at program exit.
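
To make the stealing policy concrete, here is an illustrative sketch, not the pool's actual code (WorkerQueue, pop_own, and steal are invented names): owners pop from the front of their own deque, while a thief scans for the fullest queue and takes from its back.

// Illustrative only – models the policy above, not SP::ThreadPool's internals.
#include <cstddef>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <vector>

struct WorkerQueue {                         // hypothetical per-thread deque
    std::deque<std::function<void()>> q;
    std::mutex m;
};

// Owner path: pop my own work from the front (FIFO).
std::optional<std::function<void()>> pop_own(WorkerQueue& wq) {
    std::lock_guard<std::mutex> lk(wq.m);
    if (wq.q.empty()) return std::nullopt;
    auto task = std::move(wq.q.front());
    wq.q.pop_front();
    return task;
}

// Thief path: scan for the fullest queue, then steal from its back.
// (The unlocked size() reads are approximate; a real pool would synchronize.)
std::optional<std::function<void()>> steal(std::vector<WorkerQueue>& all) {
    WorkerQueue* victim = nullptr;
    std::size_t most = 0;
    for (auto& wq : all)
        if (wq.q.size() > most) { most = wq.q.size(); victim = &wq; }
    if (victim == nullptr) return std::nullopt;
    std::lock_guard<std::mutex> lk(victim->m);
    if (victim->q.empty()) return std::nullopt;
    auto task = std::move(victim->q.back());
    victim->q.pop_back();
    return task;
}

int main() {
    std::vector<WorkerQueue> queues(2);
    queues[0].q.push_back([] { std::puts("ran stolen task"); });
    if (auto t = steal(queues)) (*t)();      // worker 1 steals worker 0's task
}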

Public API

All methods are static – call them via the class.

submit_task(F&& f) – Enqueue a single functor. Round‑robin assignment; returns std::future<void> immediately.
submit_task(start, end, F&& f) – Split [start, end) into threads × 4 chunks (default multiplier) and process in parallel. Non‑blocking; wait via the returned future or wait_for_all().
submit_task(start, end, chunk_cnt, F&& f) – Explicit number of chunks. Chunk count 0 is a no‑op (future resolves instantly).
submit_task_with_chunk_multiplier(start, end, mul, F&& f) – chunks = mul × active_threads. Use for coarse‑/fine‑grained tuning.
submit_task_with_chunk_size(start, end, chunk_sz, F&& f) – Fixed item count per chunk.
submit_task_with_threads_cap(start, end, cap, F&& f) – Temporarily restrict workers for this task. Important: capped tasks must finish (wait on the future) before new capped tasks are submitted, to avoid starvation.
submit_task_with_threads_cap(start, end, cap, mul, F&& f) – Cap + custom chunking. Same caution as above.
wait_for_all() – Block until every queued task completes. Safe to call multiple times.
set_thread_count(n) – Resize the pool (spawns or joins threads). Active tasks continue; the new size applies to future submissions.
get_thread_count() – Current configured pool size. Does not reflect temporary caps.
set_threads_cap(k) – Manually cap active workers. Applies to all subsequent tasks until changed again.
get_threads_cap() – Current global cap.
set_work_stealing(bool) – Enable/disable stealing. Disabling may reduce jitter for real‑time work at the cost of load balance.
set_processor_affinity() – Pin threads to cores. Call once after the pool is running; no‑op on some platforms.
shutdown() – Join all workers and free memory. Call once, usually on the main() shutdown path.
soft_boot() – Pre‑launch threads without submitting a task. Rarely needed; used in low‑latency systems.

Usage Patterns

Simple parallel loop

std::vector<int> data(1'000'000);
auto fut = SP::ThreadPool::submit_task(0, data.size(), [&](std::size_t i) {
    data[i] = heavy_compute(i);
});
fut.get(); // or SP::ThreadPool::wait_for_all();

Cap workers for an I/O bound section

auto fut = SP::ThreadPool::submit_task_with_threads_cap(
            0, files.size(), /*cap=*/4, [&](std::size_t i) {
    parse_file(files[i]);
});
fut.get();   // MUST finish before you raise the cap or submit further capped work

Custom chunk size

constexpr std::size_t CHUNK = 16;
SP::ThreadPool::submit_task_with_chunk_size(0, N, CHUNK, do_work);

Single, fire‑and‑forget task

SP::ThreadPool::submit_task([] { prewarm(); });

Best Practices

  • Always wait (future::get() or wait_for_all()) when using per‑call thread caps to avoid deadlocks.
  • Call set_processor_affinity() after the pool is up and before heavy computation phases.
  • Avoid very small chunk sizes (< 2–4 µs of work) to minimize scheduler overhead.
  • For library authors: wrap pool calls so that you can fall back to the caller's executor in the future; one possible shape for such a wrapper is sketched below.
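
A sketch of such a wrapper (parallel_for is an invented name, not part of SP::ThreadPool):

// Hypothetical wrapper – centralizing pool calls like this lets a library
// later reroute work to a caller-supplied executor without touching call sites.
#include "thread_pool.hpp"
#include <cstddef>
#include <utility>

template <typename F>
void parallel_for(std::size_t begin, std::size_t end, F&& body) {
    // Today: delegate to the pool and block until this range completes.
    auto fut = SP::ThreadPool::submit_task(begin, end, std::forward<F>(body));
    fut.get();
    // Later: branch here to the caller's executor instead of the global pool.
}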

Building / Integration

No build system magic – include the header and compile with C++17 or newer.
On Windows you may need to link against Synchronize.lib, which MSVC provides implicitly (nothing extra to do on recent toolchains).

Roadmap

  • submit_bulk() returning a future<void> per element group.
  • Integration with C++26 std::execution.
  • Optional task priorities.

Tests

(to be filled by repo owner)

Benchmark

Benchmark performed on a 3000×3000 image with max iterations of 300, using 8 threads.
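
For context, a driver along these lines could produce such runs; this is an assumption about the setup, not the repository's actual harness ("Tasks" is taken to be the chunk_cnt passed to submit_task, and compute_row stands in for the real per-row kernel):

// Hypothetical benchmark driver – assumes the image is rendered row by row,
// with a 300-iteration-max kernel per pixel inside compute_row.
#include "thread_pool.hpp"
#include <cstddef>

constexpr std::size_t HEIGHT = 3000;

void compute_row(std::size_t y) { /* per-row workload */ (void)y; }

void run_once(std::size_t tasks) {
    auto fut = SP::ThreadPool::submit_task(0, HEIGHT, /*chunk_cnt=*/tasks,
                                           [](std::size_t y) { compute_row(y); });
    fut.get();  // time this call to reproduce the tables below
}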

BS::Thread_Pool Performance

Tasks   Mean Time (ms)   Std Dev (ms)   Speed (pixels/ms)   Speedup
1       3209.35          12.24           2804.3             1.00x
2       1609.89           4.61           5590.5             1.99x
4       1501.88           5.27           5992.5             2.14x
8       1048.94           2.45           8580.1             3.06x
16       719.48          23.85          12509.1             4.46x
32       583.73          14.72          15418.1             5.50x
64       553.96          24.56          16246.7             5.79x
128      585.07          38.29          15382.7             5.49x

Maximum speedup: 5.793x using 64 tasks

SP::ThreadPool Performance

Tasks   Mean Time (ms)   Std Dev (ms)   Speed (pixels/ms)   Speedup
1       3150.02          28.70           2857.1             1.00x
2       1582.81           3.22           5686.1             1.99x
4       1483.74           4.78           6065.8             2.12x
8       1038.28          13.72           8668.2             3.03x
16       678.65          17.39          13261.7             4.64x
32       627.67           5.54          14338.8             5.02x
64       545.37          19.13          16502.6             5.78x
128      513.02           9.62          17543.3             6.14x

Maximum speedup: 6.140x using 128 tasks

Key Observations:

  • SP::ThreadPool shows slightly better performance than BS::Thread_Pool
  • Both implementations achieve significant speedups (5.79x vs 6.14x)
  • Optimal task counts differ between implementations (64 vs 128 tasks)

Contributing

  1. Fork & branch.
  2. Follow the style in the existing header (clang‑format file forthcoming).
  3. Open a PR; GitHub CI runs sanitizer + unit tests.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
