Skip to content

jsherer/treap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Immutable Treap Data Structure

A clean, Pythonic implementation of an immutable treap (tree + heap) data structure that provides O(log n) expected operations with a natural Python API.

Overview

A treap is a randomized binary search tree that maintains both:

  • Binary search tree property: Keys are ordered (left subtree < node < right subtree)
  • Heap property: Each node has a random priority, with parent priority ≥ child priority

The combination of these properties ensures O(log n) expected time complexity for basic operations while maintaining simplicity of implementation.

This implementation is immutable - all operations return new treap instances rather than modifying existing ones, making it thread-safe and suitable for functional programming paradigms.

Features

  • Pythonic API: Dict-like access with [], in, len(), iteration
  • Any Comparable Keys: Works with integers, strings, tuples, or any comparable type
  • Set Operations: Union (|), intersection (&), difference (-)
  • Order Statistics: select(k) for k-th smallest, rank(key) for position
  • Range Queries: Efficient queries over key ranges
  • Immutability: All operations return new instances
  • Memory Efficient: Uses __slots__ and generators
  • Thread-Safe: Safe for concurrent reads due to immutability

Installation

# Clone the repository
git clone <repository-url>
cd <repository-directory>

# Install dependencies
make install

Usage

The package provides a Pythonic API that works with any comparable type:

Basic Operations

from treap import Treap, treap

# Create a new treap
t = Treap()

# Add items
t = t.set("apple", 1)
t = t.set("banana", 2)
t = t.set("cherry", 3)

# Get items (dict-like access)
print(t["apple"])  # 1
print(t.get("date", 0))  # 0 (default)

# Check membership
if "banana" in t:
    print("Found banana!")

# Delete items
t = t.delete("apple")

# Get min/max
print(t.min_key())  # "banana"
print(t.max_key())  # "cherry"

Creation Methods

# From dict
t = Treap.from_dict({"x": 10, "y": 20, "z": 30})

# From pairs
t = Treap.from_pairs([(1, "one"), (2, "two"), (3, "three")])

# Convenience function
t = treap((1, "a"), (2, "b"), (3, "c"))
t = treap(x=10, y=20, z=30)

Set Operations

t1 = treap((1, "a"), (2, "b"), (3, "c"))
t2 = treap((2, "B"), (3, "C"), (4, "D"))

# Union with | operator
t_union = t1 | t2

# Intersection with & operator
t_intersect = t1 & t2

# Difference with - operator
t_diff = t1 - t2

Iteration

t = treap((3, "three"), (1, "one"), (4, "four"), (2, "two"))

# Iterate keys in sorted order
for key in t:
    print(key)  # 1, 2, 3, 4

# Get all keys/values/items
keys = list(t.keys())     # [1, 2, 3, 4]
values = list(t.values())  # ["one", "two", "three", "four"]
items = list(t.items())    # [(1, "one"), (2, "two"), ...]

Range Queries

t = Treap()
for i in range(10):
    t = t.set(i, str(i))

# Get items in range [3, 6]
for key, value in t.range(3, 6):
    print(f"{key}: {value}")

# Exclusive range [3, 6)
for key, value in t.range(3, 6, inclusive=False):
    print(f"{key}: {value}")

Order Statistics

# Get k-th smallest item (0-indexed)
item = t.select(5)  # 6th smallest item

# Get rank of key (number of items less than it)
rank = t.rank(7)  # Number of items < 7

Testing

Run the test suite:

make test

Implementation Details

Performance Characteristics

Operation Time Complexity Notes
set, delete, get O(log n) expected Randomized structure
len() O(1) Cached at node level
min_key, max_key O(log n) Tree traversal
select(k), rank(key) O(log n) Uses size caching
range(start, end) O(k + log n) k = items in range
Union, Intersection O(m log(n/m + 1)) m ≤ n
Iteration O(n) In-order traversal

Memory Usage

  • Node overhead: ~40% less than dict due to __slots__
  • Immutability cost: O(log n) new nodes per modification
  • Total structure: O(n) nodes

Immutability Benefits

  1. Thread Safety: Multiple threads can read the same treap safely
  2. Persistence: Previous versions remain accessible
  3. Undo/Redo: Easy to implement by keeping references
  4. Debugging: State changes are explicit and traceable

Benchmarking

Compare treap performance against Python's dict:

make benchmark        # Standard benchmark (1K, 10K, 100K items)
make benchmark-small  # Quick benchmark (100, 500, 1K items)
make benchmark-large  # Large benchmark (10K, 50K, 100K items)

When to Use Treap vs Dict

Use Treap when you need:

  • Sorted iteration over keys
  • Range queries
  • Immutability/persistence
  • Set operations on large datasets
  • Order statistics (select k-th, rank)

Use Dict when you need:

  • Fastest possible O(1) lookups
  • Frequent modifications
  • No ordering requirements

Algorithm Reference

The implementation is based on the paper: "Fast Set Operations Using Treaps" by Guy E. Blelloch and Margaret Reid-Miller

License

See LICENSE

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass (make test)
  2. New features include tests
  3. Code follows existing style conventions

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published