Analytics engineering and DataOps
This project demonstrates an end-to-end data pipeline leveraging:
- Python scripts to ingest data from Google Sheets into PostgreSQL.
- Incremental loading from PostgreSQL to BigQuery (Silver dataset).
- dbt Cloud for transformations, testing, and loading into the Gold dataset.
- Terraform for infrastructure provisioning.
- GitHub Actions for CI/CD.
The pipeline optimizes data transfer to BigQuery by loading only records added since the last successful load.
How it works:
- Get last load timestamp: the script queries the target BigQuery table for the latest timestamp in a specified column.
- Extract only new data: PostgreSQL is queried for rows with a timestamp greater than the last load.
- Append to BigQuery: new rows are appended to the existing table without overwriting previously loaded data.
- Fallback to full load: if the target table doesn't exist or is empty, a full load is performed (see the sketch below).
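Below is a minimal Python sketch of that flow, assuming the `google-cloud-bigquery`, `pandas`, `SQLAlchemy`, and `psycopg2` packages; the table names, timestamp column, connection string, and project ID are illustrative placeholders rather than the project's actual configuration.

```python
"""Minimal sketch of the incremental PostgreSQL -> BigQuery load."""
import pandas as pd
from google.cloud import bigquery
from google.api_core.exceptions import NotFound
from sqlalchemy import create_engine, text

bq = bigquery.Client()
pg = create_engine("postgresql+psycopg2://user:password@localhost:5432/market_data")  # placeholder DSN

BQ_TABLE = "my-project.silver.ticker_history"  # placeholder target table
TS_COLUMN = "loaded_at"                        # placeholder timestamp column


def get_last_load_timestamp():
    """Return the latest timestamp in the target table, or None if it is missing or empty."""
    try:
        bq.get_table(BQ_TABLE)
    except NotFound:
        return None
    row = next(iter(bq.query(f"SELECT MAX({TS_COLUMN}) AS ts FROM `{BQ_TABLE}`").result()))
    return row.ts  # None when the table exists but has no rows


def extract_new_rows(last_ts):
    """Pull only rows newer than the last load; fall back to a full extract."""
    if last_ts is None:
        return pd.read_sql(text("SELECT * FROM ticker_history"), pg)
    query = text(f"SELECT * FROM ticker_history WHERE {TS_COLUMN} > :ts")
    return pd.read_sql(query, pg, params={"ts": last_ts})


def append_to_bigquery(df):
    """Append new rows without overwriting previously loaded data."""
    job_config = bigquery.LoadJobConfig(write_disposition=bigquery.WriteDisposition.WRITE_APPEND)
    bq.load_table_from_dataframe(df, BQ_TABLE, job_config=job_config).result()


if __name__ == "__main__":
    last_ts = get_last_load_timestamp()
    new_rows = extract_new_rows(last_ts)
    if not new_rows.empty:
        append_to_bigquery(new_rows)
```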
Why?
- Reduces processing time.
- Minimizes API calls and data transfer.
- Ensures no duplication of previously loaded data.
Google Sheets → PostgreSQL
- Extract data from a ticker list sheet and historical ticker sheets.
- Store in PostgreSQL for quick local analysis.
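A minimal sketch of this ingestion step, assuming `gspread` for the Sheets API and `pandas`/`SQLAlchemy` for the database write; the spreadsheet name, worksheet layout, credentials file, and table names are illustrative guesses, not the project's actual setup.

```python
"""Minimal sketch of the Google Sheets -> PostgreSQL ingestion."""
import gspread
import pandas as pd
from sqlalchemy import create_engine

gc = gspread.service_account(filename="service_account.json")  # placeholder credentials file
pg = create_engine("postgresql+psycopg2://user:password@localhost:5432/market_data")  # placeholder DSN

spreadsheet = gc.open("ticker_data")  # placeholder spreadsheet name

# Load the ticker list worksheet into its own table.
tickers = pd.DataFrame(spreadsheet.worksheet("ticker_list").get_all_records())
tickers.to_sql("ticker_list", pg, if_exists="replace", index=False)

# Load each historical ticker worksheet into a single history table.
history_frames = []
for ws in spreadsheet.worksheets():
    if ws.title == "ticker_list":
        continue
    frame = pd.DataFrame(ws.get_all_records())
    frame["ticker"] = ws.title  # record which sheet each row came from
    history_frames.append(frame)

if history_frames:
    pd.concat(history_frames, ignore_index=True).to_sql(
        "ticker_history", pg, if_exists="replace", index=False
    )
```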
PostgreSQL → BigQuery (Silver)
- Incrementally load tables based on their timestamp column.
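One way to drive this per table is a small configuration mapping that feeds the incremental routine sketched under "How it works"; the table and column names below are hypothetical and only illustrate the shape of such a config.

```python
# Hypothetical per-table configuration; the real project may name its tables
# and timestamp columns differently.
INCREMENTAL_TABLES = {
    "ticker_list": "updated_at",
    "ticker_history": "loaded_at",
}

# Each entry supplies the WHERE-clause column for the incremental extract,
# and the table name doubles as the destination in the Silver dataset.
```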
dbt Cloud → BigQuery (Gold)
- Transform and test data before making it analytics-ready. (Section below is a placeholder)