
Metadata storage for jobs/workflows #240

@anjensan

Description

Metadata storage for bigflow jobs/workflows

There are several use cases for a simple document/key-value storage:

  1. Save (append) information about executed workflows/jobs:
    ID, run time, Docker image hash, execution time, cost estimate, result, etc.
    Essentially structured logs, which can be used to inspect execution
    history and do some (manual) cost estimation. (A minimal sketch of
    such a record follows this list.)

  2. Query for running workflows/jobs and their status (history and/or currently running workflows):

    bigflow history -w workflow_id

    Such a CLI API might be a first step towards an "airflow-free" solution
    (i.e. the ability to replace Airflow with a custom cron-like service).

  3. Communicate between tasks/workflows.
    In some rare cases one workflow might want to check the status of another.
    A workflow might also check whether another instance of itself is currently running.
    This is especially important for dev-like environments, where
    workflows are executed locally (via bigflow run).

  4. Persist some information between tasks/jobs,
    like 'last-processed-id' (for incremental processing),
    last time-per-batch (to auto-adjust batch size), etc.
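
To make the use cases concrete, here is a minimal sketch of such a store. Everything in it (RunRecord, MetadataStore, all field and method names) is hypothetical, since the client-visible API is still TBD; an in-memory dict stands in for the real backend.

    import datetime
    from dataclasses import dataclass
    from typing import Dict, List, Optional


    @dataclass
    class RunRecord:
        """One append-only log entry per executed workflow/job (use case 1)."""
        workflow_id: str
        run_id: str
        docker_image_hash: str
        started_at: datetime.datetime
        execution_time_s: float
        cost_estimate_usd: Optional[float] = None
        result: str = "unknown"  # e.g. "success" / "failed" / "running"


    class MetadataStore:
        """Hypothetical client-visible API; in-memory stand-in for a real DB."""

        def __init__(self) -> None:
            self._runs: List[RunRecord] = []
            self._kv: Dict[str, str] = {}

        def append_run(self, record: RunRecord) -> None:
            # Use case 1: append structured execution logs.
            self._runs.append(record)

        def history(self, workflow_id: str) -> List[RunRecord]:
            # Use case 2: would back `bigflow history -w workflow_id`.
            return [r for r in self._runs if r.workflow_id == workflow_id]

        def is_running(self, workflow_id: str) -> bool:
            # Use case 3: check whether another instance is currently running.
            return any(r.result == "running" for r in self.history(workflow_id))

        def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
            # Use case 4: persist small state like 'last-processed-id'.
            return self._kv.get(key, default)

        def set(self, key: str, value: str) -> None:
            self._kv[key] = value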

Database: anything will do for use case 1; BigQuery or any SQL-like DB would cover use cases 1-4.
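
For the BigQuery option, appending a run record can be a plain streaming insert; a sketch assuming the standard google-cloud-bigquery client and a pre-created table (the table name and row shape are assumptions, nothing is decided):

    from google.cloud import bigquery

    # Hypothetical project/dataset/table for the run log.
    TABLE_ID = "my-project.bigflow_metadata.job_runs"


    def append_run_row(client: bigquery.Client, row: dict) -> None:
        # Use case 1: stream one execution-log row into BigQuery.
        errors = client.insert_rows_json(TABLE_ID, [row])
        if errors:
            raise RuntimeError(f"Failed to append run record: {errors}")

Querying history (use case 2) then becomes an ordinary SELECT over this table.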

Client-visible API: TBD.
