Non-unique indexes: handle storing many (> 1 million) rows with the same key

Our current implementation of non-unique BTree indexes uses a `Vec<RowPointer>` (well, `SmallVec<[RowPointer; 1]>`, but whatever) for multiple rows at the same key. The vec is not sorted, so insertion is fast, but deletion requires a linear scan. This performs well for indexes where rows are well-distributed among the rows, but very poorly for indexes where there are many rows with the same key. Amend our BTree index implementation to be something `BTreeSet<K, SameKeyEntry>`, where `SameKeyEntry` is approximately:

```rust
enum SameKeyEntry {
    SingleRow(RowPointer),
    SmallNumber(Vec<RowPointer>),
    LargeNumber(HashSet<RowPointer>),
}
```

Choose an appropriate threshold for transitioning between `SmallNumber` and `LargeNumber`, perhaps 1024. Implementer's discretion as to whether `SingleRow` and `SmallNumber` are distinct variants, or we continue using `SmallVec` to combine these two cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Non-unique indexes: handle storing many (> 1 million) rows with the same key #3240

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Non-unique indexes: handle storing many (> 1 million) rows with the same key #3240

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions