Add qc metrics #66
base: develop

Conversation
- revert n_unique_samples to number instead of fraction
- now pass t to detect_spikes rather than y
- takes sum of all points that are above the expected maximum value
- if KDE fails, always use global median (see the sketch after this list)
- now based on fixing repeated sampling of the same channel
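The KDE-with-median-fallback pattern from the fourth bullet could look roughly like this; `kde_mode_or_median` is a hypothetical name, and scipy's `gaussian_kde` is an assumption about which KDE implementation is meant:

```python
import numpy as np
from scipy import stats

def kde_mode_or_median(y: np.ndarray) -> float:
    """Estimate a central value as the mode of a Gaussian KDE; if the KDE
    cannot be fit (e.g. singular/degenerate data), fall back to the global
    median, as described in the commit message above."""
    try:
        kde = stats.gaussian_kde(y)
        grid = np.linspace(y.min(), y.max(), 512)
        return float(grid[np.argmax(kde(grid))])
    except np.linalg.LinAlgError:
        return float(np.median(y))
```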
```python
dts = np.diff(t)
dt = np.median(dts)
n_violations = np.sum(np.abs(dts - dt) > atol)
## TODO: make ibllib wrappers to convert metrics to QC vals
```
this is indeed a huge todo. The structure I see for this is a separate definition of the ranges for the metric and the corresponding label. I would split this away from the function definition, as we might change the interpretation of the metric while its definition stays standalone; to be decided. I also want to study the QC system for ephys first, to see what works well there and what doesn't. A rough sketch of that split is below.
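One possible shape for that split (all names, ranges, and labels here are illustrative placeholders, nothing agreed in this PR): the metric function stays standalone, and the interpretation lives in a separate range-to-label table.

```python
import numpy as np

# Placeholder ranges/labels, to be decided as discussed above.
QC_RANGES = {
    "n_dt_violations": [
        (0, 0, "PASS"),        # (lower, upper, label), bounds inclusive
        (1, 10, "WARNING"),
        (11, np.inf, "FAIL"),
    ],
}

def metric_to_qc(name: str, value: float) -> str:
    """Map a raw metric value to a QC label via the range table."""
    for lo, hi, label in QC_RANGES[name]:
        if lo <= value <= hi:
            return label
    return "NOT_SET"
```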
kick dt violations
```python
    # return bool(even_check & odd_check)


def n_early_samples(A: pd.DataFrame | pd.Series, dt_tol: float = 0.001) -> int:
```
please add a docstring and a definition of what an early sample is; one possible reading is sketched below
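A possible docstring, inferred from `find_early_samples` further down in the diff (`return dt - A.index.diff() > dt_tol`), so the definition below is an interpretation, not the author's wording:

```python
def n_early_samples(A: pd.DataFrame | pd.Series, dt_tol: float = 0.001) -> int:
    """Count early samples in A.

    A sample is considered *early* when the interval since the previous
    sample is shorter than the nominal sampling period dt (the median of
    the index differences) by more than dt_tol seconds.
    """
    return find_early_samples(A, dt_tol=dt_tol).sum()
```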
move to processing
```python
    return find_early_samples(A, dt_tol=dt_tol).sum()


def n_repeated_samples(
```
please add docstring
```python
    return P[1] - P[0]


def deviance(
```
add docstring
unused argument `w_len`; it also looks like you are using `w_len` in samples. In sliding operations, I followed a syntax like this:

```python
y, t = F.values, F.index.values
fs = 1 / np.median(np.diff(t)) if fs is None else fs
w_size = int(w_len * fs)
```
```python
    return np.median(np.abs(a - np.median(a)) / np.median(a))


def sliding_deviance(
```
If possible I would argue for not having "sliding" variants of metrics, but instead using a framework defined elsewhere (such as sliding operations) for evaluating metrics in a sliding manner; see the sketch below.
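A minimal sketch of what such a framework could look like, reusing the `w_len`-in-seconds convention from the comment on `deviance` above and the window-center indexing visible further down in the diff; the name `eval_sliding` and its signature are assumptions, not this repo's API:

```python
import numpy as np
import pandas as pd

def eval_sliding(metric, F: pd.Series, w_len: float, fs: float | None = None) -> pd.Series:
    """Evaluate a scalar metric over non-overlapping windows of w_len seconds."""
    y, t = F.values, F.index.values
    fs = 1 / np.median(np.diff(t)) if fs is None else fs
    w_size = int(w_len * fs)  # window length in samples
    starts = np.arange(0, len(y) - w_size, w_size)
    vals = [metric(y[i:i + w_size]) for i in starts]
    # timestamp each value at the window center, as in the diff further down
    return pd.Series(vals, index=t[starts + w_size // 2])

# usage sketch: eval_sliding(np.median, F, w_len=10.0)
```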
kick it
```python
    return np.mean(x) + np.std(x) * np.sqrt(2 * np.log(len(x)))


def n_expmax_violations(A: pd.Series | np.ndarray) -> int:
```
please docstring
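To pair the docstring with the definitions visible in the diff (`np.mean(x) + np.std(x) * np.sqrt(2 * np.log(len(x)))` and `sum(np.abs(a) > exp_max)`), something like the following could work; the helper name `expected_max` is an assumption:

```python
def n_expmax_violations(A: pd.Series | np.ndarray) -> int:
    """Count samples whose magnitude exceeds the expected maximum.

    The expected maximum is estimated under a Gaussian assumption as
    mean(x) + std(x) * sqrt(2 * ln(n)), the asymptotic expected maximum
    of n i.i.d. normal draws.
    """
    a = A.values if isinstance(A, pd.Series) else A
    exp_max = expected_max(a)  # assumed helper name
    return int(np.sum(np.abs(a) > exp_max))
```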
merge to expmax_violations
```python
    return sum(np.abs(a) > exp_max)


def expmax_violation(A: pd.Series | np.ndarray) -> float:
```
please docstring
```python
    return reg.popt[1]


def bleaching_amp(A: pd.Series | np.ndarray) -> float:
```
I am not sure this is a meaningful metric, as we don't do a proper quantitative measurement of photon count or similar
kick it
```python
def low_freq_power_ratio(A: pd.Series, f_cutoff: float = 3.18) -> float:
```

```python
def response_variability_ratio(
```
docstring
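A sketch of the requested docstring, read off the return expression below; the `responses` shape (n_events, n_timepoints) and the simplified signature are assumptions, since the diff truncates the real one:

```python
def response_variability_ratio(responses: np.ndarray) -> float:
    """Consistency of the event-evoked response.

    Variance of the mean response (averaged across events) divided by the
    mean per-timepoint variance across events; larger values indicate a
    more stereotyped response.
    """
    return responses.mean(axis=0).var() / responses.var(axis=0).mean()
```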
```python
    return responses.mean(axis=0).var() / responses.var(axis=0).mean()


def response_magnitude(A: pd.Series, events: np.ndarray, window: tuple = (0, 1)):
```
docstring
leftovers from pynapple, fixme
kick it
```python
    return dt - A.index.diff() > dt_tol


def _fill_missing_channel_names(A: np.ndarray) -> np.ndarray:
```
ultimately I would argue these functions should better live in preprocessing. We still need to define where preprocessing ends and processing starts, but fixing signal issues that come from FP3002 bugs feels like it is in the domain of preprocessing.
```python
    return pd.Series(y, index=t)


def find_repeated_samples(
```
docstring me pls
code review this
```python
    return repeated_sample_mask


def fix_repeated_sampling(
```
docstring me pls
```python
    return pd.Series(m, index=t[inds + int(w_size / 2)])


def _eval_metric_sliding(
```
For this we should probably sit down together so you can walk me through the proposed changes in this function.
left some comments; we should sit together to discuss some of your proposed changes and see how we structure the QC going forward.
some new metrics added, some updated, now with ruff format fixes