More stable algorithm for variance, standard deviation #456
base: main
Changes from 4 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
__version__ = "0.1.dev657+g619a390.d20250606" | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,6 +46,7 @@ | |
_initialize_aggregation, | ||
generic_aggregate, | ||
quantile_new_dims_func, | ||
var_chunk, | ||
) | ||
from .cache import memoize | ||
from .lib import ArrayLayer, dask_array_type, sparse_array_type | ||
|
@@ -1251,7 +1252,8 @@ def chunk_reduce( | |
# optimize that out. | ||
previous_reduction: T_Func = "" | ||
for reduction, fv, kw, dt in zip(funcs, fill_values, kwargss, dtypes): | ||
if empty: | ||
# UGLY! but this is because the `var` breaks our design assumptions | ||
if empty and reduction is not var_chunk: | ||
Comment: This code path is an "optimization" for chunks that don't contain any valid groups, so … The next issue will be that `fill_value` is a scalar like … The other place this will matter is in … This bit is hairy and ill-defined. Let me know if you want me to work through it.
Comment: I'm partway through implementing something to work here.
Comment: Thinking some more, I may have misinterpreted what `fill_value` is used for. When is it needed for intermediates? |
||
result = np.full(shape=final_array_shape, fill_value=fv, like=array) | ||
elif is_nanlen(reduction) and is_nanlen(previous_reduction): | ||
result = results["intermediates"][-1] | ||
|
@@ -1260,6 +1262,10 @@ def chunk_reduce( | |
kw_func = dict(size=size, dtype=dt, fill_value=fv) | ||
kw_func.update(kw) | ||
|
||
# UGLY! but this is because the `var` breaks our design assumptions | ||
if reduction is var_chunk: | ||
kw_func.update(engine=engine) | ||
|
||
if callable(reduction): | ||
# passing a custom reduction for npg to apply per-group is really slow! | ||
# So this `reduction` has to do the groupby-aggregation | ||
|
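The body of `var_chunk` is not part of the diff shown here, so the following is only a sketch of one common way to compute a variance stably across chunks, not the PR's implementation. It assumes each chunk is reduced to a triple of intermediates (count, mean, sum of squared deviations) and that triples are merged with the standard pairwise update usually attributed to Chan et al. The names `chunk_stats`, `combine`, and `chunked_var` are hypothetical. Note that under this assumption an empty chunk contributes the triple `(0, 0.0, 0.0)` rather than a single scalar, which may be one reason a scalar `fill_value` fits awkwardly with multi-part intermediates.

```python
import numpy as np


def chunk_stats(x):
    """Reduce one chunk to (count, mean, M2), where M2 = sum((x - mean)**2)."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    if n == 0:
        # Identity element: an empty chunk contributes nothing when combined.
        return 0, 0.0, 0.0
    mean = float(x.mean())
    m2 = float(((x - mean) ** 2).sum())
    return n, mean, m2


def combine(a, b):
    """Merge two (count, mean, M2) triples using the pairwise update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta**2 * n_a * n_b / n
    return n, mean, m2


def chunked_var(chunks, ddof=0):
    """Variance over all chunks, computed only from per-chunk intermediates."""
    total = (0, 0.0, 0.0)
    for chunk in chunks:
        total = combine(total, chunk_stats(chunk))
    n, _, m2 = total
    return m2 / (n - ddof)


# Values with a large common offset, where sum(x**2) - sum(x)**2 / n
# would lose most of its significant digits.
data = np.array([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])
chunks = [data[:2], data[:0], data[2:]]  # the middle chunk is empty
print(chunked_var(chunks), np.var(data))
```

Because each chunk's squared deviations are taken about that chunk's own mean, the large offset cancels before anything is squared, which is what keeps the combined result close to `np.var` on the full array.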