Skip to content

Non-unique indexes: handle storing many (> 1 million) rows with the same key #3240

@gefjon

Description

@gefjon

Our current implementation of non-unique BTree indexes uses a Vec<RowPointer> (well, SmallVec<[RowPointer; 1]>, but whatever) for multiple rows at the same key. The vec is not sorted, so insertion is fast, but deletion requires a linear scan. This performs well for indexes where rows are well-distributed among the rows, but very poorly for indexes where there are many rows with the same key. Amend our BTree index implementation to be something BTreeSet<K, SameKeyEntry>, where SameKeyEntry is approximately:

enum SameKeyEntry {
    SingleRow(RowPointer),
    SmallNumber(Vec<RowPointer>),
    LargeNumber(HashSet<RowPointer>),
}

Choose an appropriate threshold for transitioning between SmallNumber and LargeNumber, perhaps 1024. Implementer's discretion as to whether SingleRow and SmallNumber are distinct variants, or we continue using SmallVec to combine these two cases.

Metadata

Metadata

Assignees

Labels

bitcraft issueActive issue for the BitCraft teamperformanceA PR/Issue related to improving performance of stdb

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions