What
In the report analysis notebooks, change all functions that read data from local paths to read from S3 instead (it's okay to keep cache files local; the scope is the raw input data):
- Read from the S3 bucket `s3://data.sbinus-west-2`
- Time how long these calls take to run from a laptop (see the timing sketch below)
- Time how long they take from a devcontainer in the us-west-2 AWS region
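A minimal timing sketch of what these measurements could look like, assuming the notebooks can use pyarrow; the object key under the bucket is a hypothetical placeholder:

```python
# Minimal sketch: time a single parquet read from the bucket.
# Assumes pyarrow; the key layout under the bucket is a guess.
import time
import pyarrow.parquet as pq

S3_PATH = "s3://data.sbinus-west-2/ny_aeba_grid/nyiso_hourly_load.parquet"  # hypothetical key

start = time.perf_counter()
table = pq.read_table(S3_PATH)  # pyarrow resolves s3:// URIs via pyarrow.fs.S3FileSystem
elapsed = time.perf_counter() - start
print(f"read {table.num_rows:,} rows in {elapsed:.1f} s")
```

Running the same snippet from a laptop and from a us-west-2 devcontainer gives the two timings we want to compare.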
Why
To finally migrate the repository over to reading data from our data lake rather than local filesystems.
How
- Time how long each report takes to render locally
- Move the following datasets to the bucket (see the upload sketch after this list):
  - `ny_aeba_grid`: `nyiso_hourly_load.parquet`
  - `ct_hp_rates`: bsf metadata and monthly loads (need to download first)
  - `ri_hp_rates`: bsf metadata and monthly loads
  - `il_lea`: `il_lea_data.csv` (though this should probably be read from Google Sheets)
  - `il_npa`: ??
  - (and any others that I missed that the reports need to run)
- Update the data read calls to use arrow to read from S3 paths
- Time how long each report takes to render when reading from S3
- Time a couple of individual read functions, especially ones reading large parquet files
- Repeat the report and function timings in a devcontainer
- The idea is ultimately to learn whether the read times are tolerable when working locally and when working in a devcontainer in AWS
- We ultimately want to work in cloud devcontainers anyway, so maybe it's intolerably slow locally but fast enough on AWS, in which case we prioritize getting onto AWS
- Or maybe it's too slow even on AWS, in which case we can open another issue for a lightweight caching solution: if you've already downloaded the data before, rerunning the read function should load from disk, not over the network (see the caching sketch after this list)
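For the dataset move, a rough sketch of one way to push the files up with boto3; the local paths and key prefixes are placeholders, and the bucket name is taken as written above:

```python
# Rough sketch: upload the local dataset files to the bucket with boto3.
# The local paths and key prefixes below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
uploads = {
    "data/nyiso_hourly_load.parquet": "ny_aeba_grid/nyiso_hourly_load.parquet",
    "data/il_lea_data.csv": "il_lea/il_lea_data.csv",
}
for local_path, key in uploads.items():
    s3.upload_file(local_path, "data.sbinus-west-2", key)
    print(f"uploaded {local_path} -> s3://data.sbinus-west-2/{key}")
```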
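And a rough sketch of the lightweight caching idea, in case even the AWS read times are too slow; the function name, cache directory, and region are assumptions:

```python
# Sketch of a disk cache around an S3 parquet read: the first call pulls the
# bytes from S3, later calls reuse the local copy. Names here are hypothetical.
from pathlib import Path

import pyarrow.fs
import pyarrow.parquet as pq

def read_parquet_cached(s3_path: str, cache_dir: str = ".s3_cache"):
    """Read a parquet file from S3, caching the raw bytes on local disk."""
    bucket_and_key = s3_path.removeprefix("s3://")
    local_path = Path(cache_dir) / bucket_and_key
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        s3 = pyarrow.fs.S3FileSystem(region="us-west-2")  # assumed region
        with s3.open_input_stream(bucket_and_key) as src, open(local_path, "wb") as dst:
            dst.write(src.read())
    return pq.read_table(local_path)
```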
Deliverables
- Data in S3
- PR with functioning read-from-S3 calls
- Comments in this thread documenting the file and function read times