Description
Metadata storage for bigflow jobs/workflows
There are several use cases for a simple document/key-value storage:

1. Save (append) information about executed workflows/jobs:
   ID, run-time, docker hash, execution time, cost estimate, result, etc.
   Basically some sort of structured logs, which may be used to
   inspect execution history and do some (manual) cost estimation.
   See the sketch after this list.
2. Query workflows/jobs and their status (history and/or currently running workflows):
   `bigflow history -w workflow_id`
   Such a CLI API might be a first step towards an "airflow-free" solution
   (aka the ability to replace Airflow with a custom cron-like service).
3. Communicate between tasks/workflows.
   In some rare cases one workflow might want to check the status of another.
   A workflow might also check whether another instance is currently running.
   This is especially important for dev-like environments, where
   workflows are executed locally (via `bigflow run`).
4. Persist some information between tasks/jobs,
   like 'last-processed-id' (for incremental processing),
   last time-per-batch (to auto-adjust batch size), etc.
   Also covered by the sketch after this list.
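
A minimal sketch of what the above could look like from Python, using an in-memory stand-in for the real backend. All names here (`JobRunRecord`, `InMemoryMetadataStore`, the field names) are illustrative assumptions, not an existing or proposed bigflow API:

```python
# Illustrative sketch only -- names and fields are assumptions, not a bigflow API.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobRunRecord:
    """One appended 'structured log' entry per executed job (use case 1)."""
    run_id: str
    workflow_id: str
    job_id: str
    docker_image_hash: str
    started_at: datetime
    execution_time_seconds: float
    cost_estimate_usd: Optional[float]
    result: str  # e.g. "SUCCESS" / "FAILED" / "RUNNING"


class InMemoryMetadataStore:
    """Toy in-memory stand-in for the real backend (BigQuery / SQL-like DB)."""

    def __init__(self) -> None:
        self._runs: list[JobRunRecord] = []
        self._kv: dict[str, str] = {}

    def append_run(self, record: JobRunRecord) -> None:
        # Use case 1: append-only execution log.
        self._runs.append(record)

    def history(self, workflow_id: str) -> list[JobRunRecord]:
        # Use case 2: back-end for something like `bigflow history -w workflow_id`.
        return [r for r in self._runs if r.workflow_id == workflow_id]

    def is_running(self, workflow_id: str) -> bool:
        # Use case 3: check whether another instance is currently running.
        return any(r.result == "RUNNING" for r in self.history(workflow_id))

    def set_value(self, key: str, value: str) -> None:
        # Use case 4: persist small values (e.g. 'last-processed-id') between jobs.
        self._kv[key] = value

    def get_value(self, key: str) -> Optional[str]:
        return self._kv.get(key)
```

In a real implementation these calls would translate into inserts/queries against the chosen backend (see the database note below); the in-memory version only makes the data shapes concrete.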
Database - anything would do for use case 1; BigQuery or any SQL-like DB would cover use cases 1/2/3/4.
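
For the BigQuery-backed variant, use case 1 could boil down to streaming one row per executed job into a table. A hedged sketch with an assumed table name and schema, using the standard `google-cloud-bigquery` client:

```python
# Assumed table name and schema -- only to illustrate the BigQuery-backed variant.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.bigflow_metadata.job_runs"  # hypothetical dataset/table

schema = [
    bigquery.SchemaField("run_id", "STRING"),
    bigquery.SchemaField("workflow_id", "STRING"),
    bigquery.SchemaField("job_id", "STRING"),
    bigquery.SchemaField("docker_image_hash", "STRING"),
    bigquery.SchemaField("started_at", "TIMESTAMP"),
    bigquery.SchemaField("execution_time_seconds", "FLOAT"),
    bigquery.SchemaField("cost_estimate_usd", "FLOAT"),
    bigquery.SchemaField("result", "STRING"),
]
client.create_table(bigquery.Table(table_id, schema=schema), exists_ok=True)

# Use case 1: append one record per executed job.
errors = client.insert_rows_json(table_id, [{
    "run_id": "2024-01-01T120000_abc123",
    "workflow_id": "my_workflow",
    "job_id": "my_job",
    "docker_image_hash": "sha256:deadbeef",
    "started_at": "2024-01-01T12:00:00Z",
    "execution_time_seconds": 42.5,
    "cost_estimate_usd": 0.03,
    "result": "SUCCESS",
}])
assert not errors, errors

# Use case 2: query execution history for a single workflow.
history = client.query(
    f"SELECT * FROM `{table_id}` "
    "WHERE workflow_id = 'my_workflow' ORDER BY started_at DESC"
).result()
```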
Client-visible API - TBD.