Draft
55 commits
f47377e
Fix import
abarciauskas-bgse Jun 21, 2025
872fb37
Fix some dates and calling write_to_icechunk
abarciauskas-bgse Jun 21, 2025
2983fef
Add testing notebook
abarciauskas-bgse Jun 21, 2025
ea999eb
Switch to scheduled run
abarciauskas-bgse Jun 28, 2025
34c7d77
Add testing notebook
abarciauskas-bgse Jun 28, 2025
b0ebcc6
Fix merge conflicts
abarciauskas-bgse Jun 28, 2025
859ae54
testing
abarciauskas-bgse Jul 3, 2025
6edf747
merge updates
abarciauskas-bgse Jul 3, 2025
5cc5b59
Bump memory and timeout
abarciauskas-bgse Jul 3, 2025
91e1ee0
Bunch of stuff that does not work
Jul 10, 2025
e358ad8
Added branching + tests
Jul 11, 2025
1a79b0f
testing all timesteps
Jul 11, 2025
97c3ded
reactivated secrets injection + set dry run globally + reduce memory
Jul 16, 2025
935ec16
Pin ic<1.0.0 and vz <2.0.0
Jul 16, 2025
554993a
pin vz 1.3.2 and add more flexibility for local testing re secrets
Jul 17, 2025
0130edd
stricter requirments + fix for the local testing logic
Jul 17, 2025
1b8d5a4
Revert local dev if statement
Jul 17, 2025
a9c6c14
More logging
Jul 17, 2025
4918dfd
fix requirements
Jul 17, 2025
8f1d6bb
Add prints + load data before try except
Jul 17, 2025
e2b1fed
switch direct links to external
Jul 17, 2025
4810008
add h5netcdf to requirements
Jul 17, 2025
1b04dc6
remove global dry_run env var
Jul 17, 2025
838f36f
mergecommit
Jul 18, 2025
f7357bf
Improved print messages
Jul 18, 2025
4b4a6b5
Migrated to uv + post-commit hook to sync lambda reqs
Jul 21, 2025
df0d6e3
clean requirements.txt output
Jul 21, 2025
8144153
Add instructions for notebook testing + when I encountered error desc…
Jul 21, 2025
1970025
Updated ic and vz versions
Jul 21, 2025
f2d6745
Add README instructions + pytest deps and config + use official uv pr…
Aug 1, 2025
3979309
bunch of removals + GH action CI + passing tests locally
Aug 11, 2025
7d69ddd
Merge remote-tracking branch 'upstream/main' into refactor-to-vz2-and…
Aug 11, 2025
db22a90
inject EDL creds as env variables
Aug 11, 2025
19843fa
fix uv sync pre-commit hook
Aug 11, 2025
1ca9820
Use external data links for gh action tests
Aug 11, 2025
254121a
set in_region to false for external
Aug 11, 2025
3d83a96
Built new v2-p2 store, tests pass locally, updated docs
Aug 14, 2025
7761e9b
fix editable export with uv
Aug 14, 2025
73aa83e
Fix faulty sync script and omit project in dependencies
Aug 14, 2025
a725b9e
Lighten core dependencies and dont export dev deps
Aug 14, 2025
a2e7d63
rename tests ci
Aug 14, 2025
a37dcbd
Fix error in deployment env to current production store
Aug 14, 2025
24a8263
Added usage snippet and ran production test as dry run
Aug 15, 2025
68da6c4
More detailed logging + increased RAM
Aug 15, 2025
9d01322
forgot what this is
Aug 15, 2025
904dbe2
Merge branch 'refactor-to-vz2-and-ic1' of https://github.com/developm…
Aug 15, 2025
588b788
try except catch in handler + logging
Aug 15, 2025
491f538
Merge branch 'refactor-to-vz2-and-ic1' of https://github.com/developm…
Aug 15, 2025
f0c856b
another logging change
Aug 15, 2025
334ec5c
Local testing instructions
Aug 15, 2025
afe3ee7
add gitignore for local dev
Aug 16, 2025
4dfa521
Add aws dependencies to uv + README update
Aug 16, 2025
d6df309
updated cdk workflow in README to uv
Aug 16, 2025
5f819fd
README handoff
Aug 16, 2025
ae50058
set in_region for ea.open() + requirements sync for cdk
Aug 16, 2025
2 changes: 2 additions & 0 deletions .env
@@ -0,0 +1,2 @@
ICECHUNK_STORE_DIRECT="s3://nasa-veda-scratch/jbusecke/icechunk/mursst-testing/MUR_test_deployed"
ICECHUNK_STORE_PROD="s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
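Both entries are plain `KEY="value"` pairs. If you want to read them without pulling in a dependency like `python-dotenv`, a minimal loader can be sketched as follows (the function name and parsing rules are illustrative, not part of this repo, which may use a different mechanism):

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Minimal .env parser: KEY="value" lines; comments and blanks are ignored."""
    loaded = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"')
        # Do not clobber variables already set in the real environment
        os.environ.setdefault(key.strip(), loaded[key.strip()])
    return loaded
```

With the `.env` above, `load_env_file()` would expose `ICECHUNK_STORE_DIRECT` and `ICECHUNK_STORE_PROD` both in the returned dict and via `os.environ`.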
17 changes: 17 additions & 0 deletions .github/workflows/check-requirements.yml
@@ -0,0 +1,17 @@
name: Check requirements sync
on: [push, pull_request]

jobs:
check-requirements:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v3
- name: Check requirements.txt is in sync
run: |
uv export --format=requirements.txt --no-hashes --no-annotate --no-header > temp-requirements.txt
diff temp-requirements.txt cdk/lambda/requirements.txt || (
echo "requirements.txt is out of sync with pyproject.toml"
echo "Run: uv export --format=requirements.txt --no-hashes --no-annotate --no-header > cdk/lambda/requirements.txt or use pre-commit"
exit 1
)
28 changes: 28 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,28 @@
name: Integration Tests

on:
push:
branches: [main]
pull_request:

jobs:
test:
runs-on: ubuntu-latest
env: # Set them here for all steps in the job
EARTHDATA_USERNAME: ${{ secrets.EARTHDATA_USERNAME }}
EARTHDATA_PASSWORD: ${{ secrets.EARTHDATA_PASSWORD }}
OUT_OF_REGION: true
steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Install uv
run: pip install uv

- name: Run pytest
run: uv run pytest --ignore=test_integration_in_region.py
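The `OUT_OF_REGION` variable set at the top of this workflow is how the test suite can tell whether direct in-region S3 access is available. A hypothetical helper for that switch (the function name is illustrative; the actual tests may read the variable differently) could look like:

```python
import os

def out_of_region() -> bool:
    # CI sets OUT_OF_REGION=true; anything else means we assume
    # in-region (us-west-2) access to the data buckets.
    return os.environ.get("OUT_OF_REGION", "false").strip().lower() == "true"
```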
6 changes: 6 additions & 0 deletions .gitignore
@@ -13,3 +13,9 @@ cdk.out
queue_arn.txt
config.py
*.ipynb_checkpoints/
cdk/lambda/temp_requirements
temp-requirements.txt
cdk/env.json
.gitignore
cdk/events/event.json
test_export
27 changes: 27 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,27 @@
default_install_hook_types: [pre-commit, post-commit]
repos:
- repo: local
hooks:
- id: sync-requirements
name: Sync requirements.txt with uv before commit
entry: ./scripts/sync_requirements.sh
language: system
always_run: false
pass_filenames: false
stages: [pre-commit]
- repo: https://github.com/astral-sh/uv-pre-commit
# uv version.
rev: 0.8.4
hooks:
- id: uv-lock
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.12.7
hooks:
# Run the linter.
- id: ruff
types_or: [ python, pyi ]
args: [ --fix ]
# Run the formatter.
- id: ruff-format
types_or: [ python, pyi ]
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
118 changes: 95 additions & 23 deletions README.md
@@ -2,6 +2,70 @@

Code for writing to an Icechunk store using CMR subscriptions and other AWS services.

## Example

This snippet shows how to open the store and make a first plot:

```python
import icechunk as ic
from icechunk.credentials import S3StaticCredentials
from datetime import datetime
from urllib.parse import urlparse
import earthaccess
import xarray as xr

store_url = "s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
store_url_parsed = urlparse(store_url)

storage = ic.s3_storage(
    bucket=store_url_parsed.netloc,
    prefix=store_url_parsed.path,
    from_env=True,
)

def get_icechunk_creds(daac: str | None = None) -> S3StaticCredentials:
    if daac is None:
        # TODO: Might want to change this for a more general version;
        # https://github.com/nsidc/earthaccess/discussions/1051 could help here.
        daac = "PODAAC"
    # Assumes that username and password are available in the environment.
    # TODO: accommodate an rc file?
    auth = earthaccess.login(strategy="environment")
    if not auth.authenticated:
        raise PermissionError("Could not authenticate using environment variables")
    creds = auth.get_s3_credentials(daac=daac)
    return S3StaticCredentials(
        access_key_id=creds["accessKeyId"],
        secret_access_key=creds["secretAccessKey"],
        expires_after=datetime.fromisoformat(creds["expiration"]),
        session_token=creds["sessionToken"],
    )


# TODO: Is there a way to avoid double opening? Maybe not super important.
# Open once to discover the virtual chunk containers...
repo = ic.Repository.open(
    storage=storage,
)
# ...then reopen with credentials authorized for each container.
repo = ic.Repository.open(
    storage=storage,
    authorize_virtual_chunk_access=ic.containers_credentials(
        {
            k: ic.s3_refreshable_credentials(get_credentials=get_icechunk_creds)
            for k in repo.config.virtual_chunk_containers.keys()
        }
    ),
)

session = repo.readonly_session('main')
ds = xr.open_zarr(session.store, zarr_format=3, consolidated=False)
ds['analysed_sst'].isel(time=0, lon=slice(10000, 12000), lat=slice(10000, 12000)).plot()
```

> So far this has been tested only on the NASA VEDA Hub.
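One detail worth knowing about the snippet above: `urlparse` keeps the leading `/` on `.path`, so the prefix handed to `ic.s3_storage` is `/icechunk/...` rather than `icechunk/...`. If the repository cannot be found at the expected location, stripping that slash is the first thing to try:

```python
from urllib.parse import urlparse

store_url = "s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
parsed = urlparse(store_url)

bucket = parsed.netloc            # "nasa-eodc-public"
prefix = parsed.path.lstrip("/")  # drop the leading "/" that urlparse keeps
print(bucket, prefix)
```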


## Background

https://wiki.earthdata.nasa.gov/display/CMR/CMR+Ingest+Subscriptions
@@ -20,41 +84,49 @@
 source .venv/bin/activate
 uv pip install -r requirements.txt
 ```

-## Set configuration variables
-
-Make sure the settings `config.py` are appropriate for your needs.
+## Deploying the lambda for processing notifications
+
+See `cdk/README.md`.
+
+## Using uv with JupyterLab
+```
+uv sync
+uv run bash
+python -m ipykernel install --user --name=mursstvenv --display-name="MURSST-VENV"
+```
+After refreshing your browser window, you should be able to select the "MURSST-VENV" kernel from the upper-right corner of the JupyterLab notebook interface.

-Add your configuration:
-
-```sh
-cp config.py.example config.py
-```
+## Rebuilding the store from scratch
+
+To build the store in a new location, run
+
+```
+uv run python scripts/rebuild_store.py
+```
+
+**Note that this script uses the store URL from the environment variable `ICECHUNK_STORE_DIRECT`. For local execution, define it in the `.env` file.**

-## Creating an SQS Queue to receive CMR notifications and creating a subscription for that queue.
-
-`create_queue.py` will create an SQS queue with the necessary policy to receive SNS notifications. This script requires AWS credentials are configured for the target AWS environment.
-
-`subscribe.py` creates a subscription for the queue identified by `queue_arn.txt` (created by `create_queue.py`) to receive CMR granule notifications for the `COLLECTION_CONCEPT_ID` in `config.py`. Note this script uses [`earthaccess`](https://earthaccess.readthedocs.io) to create a bearer token to pass in the subscription request, so you will need to have earthdata login credentials in ~/.netrc or be ready type them when prompted.
-
-Setting up the queue and associated subscription are one-time operations so there is no reason to manage them in the infrastructure lifecycle of say, a CDK app (deleting the stack would delete the queue, for example).
-
-```sh
-# Ensure proper AWS credentials are set
-# Create a queue
-python ./create_queue.py
-# Create a subscription for the queue to receive notifications about new collection granules
-python ./subscribe.py
-```
-
-## Looking up your subscription in CMR
-
-1. Get a bearer token from https://urs.earthdata.nasa.gov/users/aimeeb/user_tokens
-2. Use the bearer token in an Authorization header when making a request to https://cmr.earthdata.nasa.gov/search/subscriptions
-3. Use the bearer token to make a request to the URL wrapped in the `<location>` tag in the response from (2).
-
-Note also that the `name` of the subscription will be `<SUBSCRIBER_ID>-<COLLECTION_CONCEPT_ID>-subscription` using the values set in config.py.
-
-## Deploying the lambda for processing notifications
-
-See `cdk/README.md`.
+## Testing strategy
+
+The tests in `tests/test_integration_in_region.py` only work when run in AWS region us-west-2 and with the correct permissions.
+
+Because of this restriction, we currently recommend running all tests locally, for example on the NASA VEDA Hub. (The majority of tests are disabled in GitHub CI.)
+
+Make sure the machine has sufficient RAM; the smallest server instances have caused issues in the past.
+
+To run the complete set of tests:
+
+```
+uv run pytest
+```
+
+### Rebuilding the store as part of testing
+
+In some cases, you may want to rebuild the entire store and then run the appending logic without affecting the production store. To do this:
+
+1. **Check and update** the store name in `.env`. We recommend using the `s3://nasa-veda-scratch/` bucket for local testing so that data is regularly purged.
+
+2. **Adjust the stop date** in `scripts/rebuild_store.py` based on how much data you want to be "missing" before running the appending logic.
+
+3. **Run** through `notebooks/hub_test.ipynb` to test the appending logic. Ensure you follow the instructions above so that the uv dependencies are correctly respected within the notebook.
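The stop date mentioned above determines which daily MUR granules the rebuild writes and which timesteps are left for the appending logic to pick up. A sketch of that idea (the constant and function are hypothetical; the real cutoff lives in `scripts/rebuild_store.py`):

```python
from datetime import date, timedelta

# Hypothetical cutoff: the rebuild writes data up to (but not including) this
# date, leaving later timesteps "missing" for the append logic to fill in.
STOP_DATE = date(2025, 6, 1)

def dates_to_rebuild(start: date, stop: date = STOP_DATE) -> list[date]:
    # MUR SST is a daily product: one timestep per day in [start, stop)
    return [start + timedelta(days=i) for i in range((stop - start).days)]
```

Moving `STOP_DATE` earlier leaves more data "missing" and therefore exercises more of the appending logic.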