[pydap backend] enables downloading multiple dim arrays within single http request #10629
Merged (+133 −24)
32 commits
- 3677dec update PydapArrayWrapper to support backend batching (Mikejmnez)
- 4187644 rebase (Mikejmnez)
- 729dc49 pydap-server it not necessary (Mikejmnez)
- 1fd9e18 set `batch=False` as default (Mikejmnez)
- f6a78b0 set `batch=False` as default in datatree (Mikejmnez)
- 326d925 set `batch=False` as default in open groups as dict (Mikejmnez)
- 0f0dede for flaky, install pydap from repo for now (Mikejmnez)
- a35efa5 initial tests - quantify cached url (Mikejmnez)
- fcb2eae adds tests to datatree backend to assert multiple dimensions download… (Mikejmnez)
- 677e3de update testing to show number of download urls (Mikejmnez)
- 7f05a6a simplified logic (Mikejmnez)
- e360560 specify cached session debug name to actually cache urls (Mikejmnez)
- c6ed8bf fix for mypy (Mikejmnez)
- 54f6f8d user visible changes on `whats-new.rst` (Mikejmnez)
- 419b25e impose sorted to `get_dimensions` method (Mikejmnez)
- 747fcc7 reformat `whats-new.rst` (Mikejmnez)
- 381c499 revert to install pydap from conda and not from repo (Mikejmnez)
- 5f5c4e1 expose checksum as user kwarg (Mikejmnez)
- e15f8cb include `checksums` optional argument in `whats-new` (Mikejmnez)
- 0a2730c update to newest release of pydap via pip until conda install is avai… (Mikejmnez)
- a5d2b0f use requests_cache session with retry-params when 500 errors occur (Mikejmnez)
- 9a88316 update env yml file to use new pydap release via conda (Mikejmnez)
- d2835ab turn on testing on datatree from test.opendap.org (Mikejmnez)
- b60adb5 rebase with main (Mikejmnez)
- 578b31a update what`s new (Mikejmnez)
- 25b08cd removes batch as arg - acts always but only on dimension data arrays (Mikejmnez)
- 0e1ff6c updates tests (Mikejmnez)
- f4f253a update `whats new` (Mikejmnez)
- b4c7dda minor code changes (Mikejmnez)
- ced359f fix `whats new` changes (Mikejmnez)
- e789324 formatting (Mikejmnez)
- 7bcbd7c Merge branch 'main' into pydap4_scale (dcherian)
***
As an aside, a `with var.dataset.batch_mode():` context manager would be a nice API for this.
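For illustration, such a context manager could be built with `contextlib.contextmanager`. This is a sketch under assumptions: `RemoteDataset` and its `batch` attribute are hypothetical stand-ins, and `batch_mode` is only the name proposed in this comment, not an existing pydap method.

```python
from contextlib import contextmanager


class RemoteDataset:
    """Hypothetical stand-in for a pydap dataset; not pydap's real class."""

    def __init__(self):
        # Whether variable requests are grouped into a single DAP response.
        self.batch = False

    @contextmanager
    def batch_mode(self):
        """Enable batching inside a ``with`` block, then restore the previous state."""
        previous = self.batch
        self.batch = True
        try:
            yield self
        finally:
            # Runs even if the body raises, so batching never leaks into later loads.
            self.batch = previous


ds = RemoteDataset()
with ds.batch_mode():
    assert ds.batch is True   # e.g. fetch all dimension arrays here
assert ds.batch is False      # batching is switched off again automatically
```

The `finally` clause is the design point: batching is switched off again even if a download raises, so no separate "turn it off" call is needed after the block.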
***
Thanks @dcherian, that is a really good suggestion. Currently the `enable_batch...` method does not support the context manager protocol (it was never meant to be turned on and off). I totally see what you mean. I'll set it up and come back to this in a later PR.
***
So why are we turning it off here, then?
***
Short answer: it doesn't need it, but it is there to make clear that batching is only used for DAP4 dimensions.

Long answer: the scope of this PR was originally broader, and batching worked beyond dimensions. Originally I added an optional parameter, `batch=None` (the default). While dimensions were always downloaded within a single DAP response (in DAP4), there was also the option to download the other, non-dimension variables together in a separate DAP response (say, when executing `ds.load()`). Pure pydap makes no distinction between dimension and non-dimension variables, but xarray eagerly loads dimensions into memory, so I split the logic this way.

I slimmed the PR down to handle only dimensions, and the performance gain with `xr.open_mfdataset` is already large enough that I would be pleased to see this merged. I'd more than gladly restore the `batch=None | Iterable` behavior, which further enables non-dimension variables to be "batch downloaded" together, for additional performance gains. The need for an optional parameter (as opposed to making it the default for DAP4) comes from best/safe practice when the remote URL points to a virtually aggregated dataset (for example, `.ncml`). In that scenario, batch downloading should probably be avoided, so this behavior needs to be optional and explicitly requested by the user.
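To make the request pattern concrete, here is a toy model of it. It is only an illustration under stated assumptions: `plan_requests` and its `batch` semantics are hypothetical, loosely modeled on the `batch=None | Iterable` option described in this comment, and one "request" stands for one DAP response.

```python
def plan_requests(variables, dimensions, batch=None):
    """Group variable names into DAP requests (hypothetical sketch, not pydap/xarray code).

    variables:  names of all variables in the remote dataset
    dimensions: names of the dimension variables, which xarray loads eagerly
    batch:      None, or an iterable of non-dimension names to fetch together
    """
    dim_set = set(dimensions)
    # All dimension variables share one request (sorted so the grouping is deterministic).
    requests = [sorted(v for v in variables if v in dim_set)]
    others = [v for v in variables if v not in dim_set]
    batched = set(batch or ())
    # Opt-in: user-selected non-dimension variables go into one extra request.
    group = [v for v in others if v in batched]
    if group:
        requests.append(group)
    # Every remaining variable is fetched in its own request.
    requests.extend([v] for v in others if v not in batched)
    # Drop the empty dimension group if the dataset has no dimension variables.
    return [r for r in requests if r]


names = ["time", "lat", "lon", "sst", "ice"]
dims = ["time", "lat", "lon"]
print(plan_requests(names, dims))
# [['lat', 'lon', 'time'], ['sst'], ['ice']]  -> 3 requests instead of 5
print(plan_requests(names, dims, batch=["sst", "ice"]))
# [['lat', 'lon', 'time'], ['sst', 'ice']]    -> 2 requests
```

In this toy model, batching only the dimensions (what this PR keeps) saves two round trips for five variables, and opting the remaining variables in saves one more. Keeping that second step opt-in leaves room to avoid it for virtually aggregated (`.ncml`) URLs.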
***
I stand corrected: I ran some tests with `ds.load()`, and with the current behavior it does need to be disabled. I like the idea of using the context manager protocol, but it is not currently implemented. Apologies, I have looked at this for so long that I am starting to confuse the different iterations of this PR.

So this PR would either:

- a) Stay as is: download only dimensions within a single DAP URL.
- b) Incorporate the more general behavior (also enable non-dimension variables, the original purpose of this PR).

I have no urgency on this, and my preference would be b), if that is OK with you. I could implement the context manager protocol to improve the API and include it in pydap's next release.
***
Happy to defer to your judgement and preferences here :)
The context manager is just a nice-to-have, not a blocker. It does sound like there's already a nice improvement. I'm happy to merge as-is at the moment.
***
Generally I recommend merging smaller, incremental changes whenever feasible. They are easier to review and get improvements out into the world faster.
(There is a separate question of whether the pydap backend should be split out of Xarray, given its growing complexity, but I'm also happy to defer that to another day.)
***
Yeah, let's merge as is. This is ready, and it will be nice to get something out into the world right now. It's been waaay too long...
I can see the "growing complexity" argument as it affects maintainers and developers. Definitely a question for another day, and I would be happy to be part of the conversation.