-
Notifications
You must be signed in to change notification settings - Fork 5
Added mem-dbg across all structs of the crate as optional feature #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
ab52962
03d6a03
bb75924
4f6815a
04b0e84
20f3d0d
953cea8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ const LEN_OFFSET: usize = 56; | |
/// Mask used for accessing heap allocated data stored at the pointer in `data` field. | ||
const PTR_MASK: usize = ((1 << LEN_OFFSET) - 1) & !3; | ||
|
||
#[cfg_attr(feature = "mem_dbg", derive(mem_dbg::MemDbg, mem_dbg::MemSize))] | ||
/// Array representation container | ||
pub(crate) struct Array<'a, const P: usize, const W: usize> { | ||
/// Number of items stored in the array | ||
|
@@ -110,7 +111,18 @@ impl<'a, const P: usize, const W: usize> RepresentationTrait for Array<'a, P, W> | |
/// Return memory size of `Array` representation | ||
#[inline] | ||
fn size_of(&self) -> usize { | ||
size_of::<usize>() + size_of_val(self.arr) | ||
// The size of the len attribute, which is an usize | ||
let len_size = size_of::<usize>(); | ||
// The size of the cap attribute, which is an usize | ||
let cap_size = size_of::<usize>(); | ||
// The size of the arr attribute, which is a slice | ||
// and therefore composed of a pointer (usize) and a length (usize) | ||
let arr_size = size_of::<usize>() * 2; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure I understand the "multiply by 2" aspect of this logic, however I don't think we should measure memory usage of |
||
// The size of the values in the arr attribute, which are | ||
// u32 values | ||
let arr_size_values = size_of_val(self.arr); | ||
|
||
len_size + cap_size + arr_size + arr_size_values | ||
} | ||
|
||
/// Free memory occupied by the `Array` representation | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,8 +15,9 @@ const REPRESENTATION_ARRAY: usize = 0x0000_0000_0000_0001; | |
const REPRESENTATION_HLL: usize = 0x0000_0000_0000_0003; | ||
|
||
/// Representation types supported by `CardinalityEstimator` | ||
#[repr(u8)] | ||
#[derive(Debug, PartialEq)] | ||
// #[repr(u8)] | ||
#[derive(PartialEq)] | ||
#[cfg_attr(feature = "mem_dbg", derive(mem_dbg::MemDbg, mem_dbg::MemSize))] | ||
#[enum_dispatch] | ||
pub(crate) enum Representation<'a, const P: usize, const W: usize> { | ||
Small(Small<P, W>), | ||
|
@@ -25,7 +26,6 @@ pub(crate) enum Representation<'a, const P: usize, const W: usize> { | |
} | ||
|
||
/// Representation trait which must be implemented by all representations. | ||
#[enum_dispatch(Representation<P, W>)] | ||
pub(crate) trait RepresentationTrait { | ||
fn insert_encoded_hash(&mut self, h: u32) -> usize; | ||
fn estimate(&self) -> usize; | ||
|
@@ -37,6 +37,64 @@ pub(crate) trait RepresentationTrait { | |
} | ||
} | ||
|
||
impl<'a, const P: usize, const W: usize> core::fmt::Debug for Representation<'a, P, W> { | ||
fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> std::fmt::Result { | ||
write!(f, "{}", self.to_string()) | ||
} | ||
} | ||
|
||
impl<'a, const P: usize, const W: usize> RepresentationTrait for Representation<'a, P, W> { | ||
#[inline] | ||
fn insert_encoded_hash(&mut self, h: u32) -> usize { | ||
match self { | ||
Representation::Small(s) => s.insert_encoded_hash(h) as usize, | ||
Representation::Array(a) => a.insert_encoded_hash(h) as usize, | ||
Representation::Hll(hll) => hll.insert_encoded_hash(h) as usize, | ||
} | ||
} | ||
|
||
#[inline] | ||
fn estimate(&self) -> usize { | ||
match self { | ||
Representation::Small(s) => s.estimate(), | ||
Representation::Array(a) => a.estimate(), | ||
Representation::Hll(hll) => hll.estimate(), | ||
} | ||
} | ||
|
||
#[inline] | ||
fn size_of(&self) -> usize { | ||
// A priori, we do not know which of the child structs is the largest and therefore | ||
// composing the bulk of the memory usage by the enum. Therefore, we subtract the size of | ||
// the variants, and then we sum the size of the enum itself. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Usually There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I see, so basically what I am currently measuring would be the 'runtime peak size' in place of the storage size. I suppose the question then is which one we are more interested in measuring. Which one would you rather go for? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're primarily interested in measuring heap allocated memory usage of overall There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
let size_of_variant = match self { | ||
Representation::Small(s) => s.size_of() - size_of::<Small<P, W>>(), | ||
Representation::Array(a) => a.size_of() - size_of::<Array<'a, P, W>>(), | ||
Representation::Hll(hll) => hll.size_of() - size_of::<HyperLogLog<'a, P, W>>(), | ||
}; | ||
|
||
size_of_variant + size_of::<Self>() | ||
} | ||
|
||
#[inline] | ||
unsafe fn drop(&mut self) { | ||
match self { | ||
Representation::Small(s) => s.drop(), | ||
Representation::Array(a) => a.drop(), | ||
Representation::Hll(hll) => hll.drop(), | ||
} | ||
} | ||
|
||
#[inline] | ||
fn to_data(&self) -> usize { | ||
match self { | ||
Representation::Small(s) => s.to_data(), | ||
Representation::Array(a) => a.to_data(), | ||
Representation::Hll(hll) => hll.to_data(), | ||
} | ||
} | ||
} | ||
|
||
/// Representation error | ||
#[derive(Debug)] | ||
pub enum RepresentationError { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to enable
mem_dbg
feature by default?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if by the end of this PR we were to replace size_of with mem_dbg - the two methods are by definition very overlapping in scope. Possibly, reading the other comments, you intended size_of to only include the size of the stored object, while mem_dbg has the goal of computing the size of the object in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
mem_dbg
aims to show real heap memory usage similarly tosize_of
then it could be a good replacement / alternative to currentsize_of
implementation.