High-performance HyperLogLog with bias correction and full concurrency support. Used for accurate and space-efficient cardinality estimation.
HyperLogLogs are space efficient data structures for the "count-distinct problem", approximating the number of distinct elements in a multiset. Paper.
hyperloglockless offers a lockless concurrent HyperLogLog and a single threaded counterpart. They're simpler, faster, and more accurate than other HyperLogLog implementations:
- 🧵 Concurrent:
AtomicHyperLogLogis a drop-in replacement forRwLock<OtherHyperLogLog>: all methods take&self, so you can wrap it inArcand update it concurrently without&mut. - ⚡ Fast: Designed to be fast and simple in both single and multi-threaded scenarios.
- 🎯 Accurate: Empirically verified accuracy for trillions of elements; other implementations break down after millions.
- ✅ Tested: Rigorously tested with loom and benchmarked.
[dependencies]
hyperloglockless = "0.3.0"A HyperLogLog with precision p uses 2^p bytes of memory and has an error % of roughly 104 / sqrt(2^p).
use hyperloglockless::HyperLogLog;
let precision = hyperloglockless::precision_for_error(0.01); // 1% error
assert_eq!(precision, 14);
let mut hll = HyperLogLog::new(precision);
hll.insert(&'🦀');
hll.insert_all('a'..='z');
let count = hll.count(); // ~27
assert_eq!(hll.len(), 1 << precision); // 16384 bytesFull concurrency support: AtomicHyperLogLog is a drop-in replacement for RwLock<OtherHyperLogLog>: all methods take &self.
use hyperloglockless::AtomicHyperLogLog;
let hll = AtomicHyperLogLog::new(14);
hll.insert(&'🦀');
hll.insert_all('a'..='z');Both hyperloglockless::HyperLogLog and hyperloglockless::AtomicHyperLogLog are extremely fast for both insert and count calls.
hyperloglockless::AtomicHyperLogLog does not require any locking and therefore avoids thread contention.
hyperloglockless stays consistently accurate while other implementations break down after millions of items.
rand- Enabled by default, this has theDefaultHashersource its random state usingthread_rng()instead of hardware sources. Getting entropy from a user-space source is considerably faster, but requires additional dependencies to achieve this. Disabling this feature by usingdefault-features = falsemakesDefaultHashersource its entropy usinggetrandom, which will have a much simpler code footprint at the expense of speed.serde- HyperLogLogs implementSerializeandDeserializewhen possible.loom-AtomicHyperLogLogs use loom atomics, making it compatible with loom testing.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.



