
DLT-META

Documentation | Release Notes | Examples




Project Overview

DLT-META is a metadata-driven framework designed to work with Lakeflow Declarative Pipelines. This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.

In practice, a single generic pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
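For orientation, the sketch below shows roughly what a single onboarding entry could look like, written out as a shell heredoc. The field names are illustrative only and are not the authoritative Dataflowspec schema; refer to the Documentation link above for the full onboarding file reference.

    # Illustrative sketch only: field names approximate the onboarding JSON
    # and are not the authoritative Dataflowspec schema.
    cat > onboarding_example.json <<'EOF'
    [
      {
        "data_flow_id": "100",
        "data_flow_group": "A1",
        "source_format": "cloudFiles",
        "source_details": {"source_path_dev": "tests/resources/data/customers"},
        "bronze_table": "customers",
        "silver_table": "customers_clean"
      }
    ]
    EOF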

Components:

Metadata Interface

Generic Lakeflow Declarative Pipeline

  • Applies appropriate readers based on input metadata
  • Applies data quality rules using Lakeflow Declarative Pipeline expectations
  • Applies CDC apply changes if specified in metadata
  • Builds the Lakeflow Declarative Pipeline graph based on input/output metadata
  • Launches the Lakeflow Declarative Pipeline

High-Level Process Flow:

DLT-META High-Level Process Flow

Steps

DLT-META Stages

DLT-META Lakeflow Declarative Pipeline Feature Support

| Features | DLT-META Support |
|----------|------------------|
| Input data sources | Autoloader, Delta, Eventhub, Kafka, Snapshot |
| Medallion architecture layers | Bronze, Silver |
| Custom transformations | Bronze, Silver layers accept custom functions |
| Data Quality Expectations support | Bronze, Silver layers |
| Quarantine table support | Bronze layer |
| create_auto_cdc_flow API support | Bronze, Silver layers |
| create_auto_cdc_from_snapshot_flow API support | Bronze layer |
| append_flow API support | Bronze layer |
| Liquid clustering support | Bronze, Bronze Quarantine, Silver tables |
| DLT-META CLI | databricks labs dlt-meta onboard, databricks labs dlt-meta deploy |
| Bronze and Silver pipeline chaining | Deploy dlt-meta pipeline with the layer=bronze_silver option using default publishing mode |
| create_sink API support | Supported formats: external Delta table, Kafka; Bronze, Silver layers |
| Databricks Asset Bundles | Supported |
| DLT-META UI | Uses Databricks Lakehouse DLT-META App |

Getting Started

Refer to the Getting Started guide.

The Databricks Labs DLT-META CLI lets you run onboard and deploy from an interactive Python terminal.

Pre-requisites:

  • Python 3.8.0+

  • Databricks CLI v0.213 or later. See instructions

  • Install Databricks CLI on macOS: macos_install_databricks

  • Install Databricks CLI on Windows: windows_install_databricks.png

Once you install the Databricks CLI, authenticate your current machine to a Databricks Workspace:

databricks auth login --host WORKSPACE_HOST
To enable debug logs, add the `--debug` flag to any command.
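For example, with a placeholder workspace host and debug logging turned on (substitute your own workspace URL):

    # The host below is a placeholder; replace it with your workspace URL.
    databricks auth login --host https://<your-workspace>.cloud.databricks.com --debug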

Installing dlt-meta:

  • Install dlt-meta via Databricks CLI:
    databricks labs install dlt-meta
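  • Optionally, confirm the installation by listing the Labs projects currently installed through the CLI:

    # Shows Databricks Labs projects installed via the CLI; dlt-meta should appear here
    databricks labs installed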

Onboard using dlt-meta CLI:

If you want to run the existing demo files, follow these steps before running the onboard command:

  1. Clone dlt-meta:

    git clone https://github.com/databrickslabs/dlt-meta.git
    
  2. Navigate to project directory:

    cd dlt-meta
    
  3. Create Python virtual environment:

    python -m venv .venv
    
  4. Activate virtual environment:

    source .venv/bin/activate
    
  5. Install required packages:

    # Core requirements
    pip install "PyYAML>=6.0" setuptools databricks-sdk
    
    # Development requirements
    pip install delta-spark==3.0.0 pyspark==3.5.5 "pytest>=7.0.0" "coverage>=7.0.0"
    
    # Integration test requirements
    pip install "typer[all]==0.6.1"
    
  6. Set environment variables:

    dlt_meta_home=$(pwd)
    export PYTHONPATH=$dlt_meta_home
    

onboardingDLTMeta.gif

  1. Run onboarding command:
    databricks labs dlt-meta onboard
    

The command will prompt you to provide onboarding details. If you have cloned the dlt-meta repository, you can accept the default values, which use the configuration from the demo folder.

onboardingDLTMeta_2.gif

The above onboard CLI command will:

  1. Push code and data to your Databricks workspace
  2. Create an onboarding job
  3. Display a success message: Job created successfully. job_id={job_id}, url=https://{databricks workspace url}/jobs/{job_id}
  4. Open the job URL automatically in your default browser.
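If you prefer to check the job from the terminal instead of the browser, the Databricks CLI can fetch it by ID. The job ID below is a placeholder taken from the success message above:

    # 123456789 is a placeholder; use the job_id printed by the onboard command
    databricks jobs get 123456789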

Deploy using dlt-meta CLI:

  • Once the onboarding job has finished, deploy the Lakeflow Declarative Pipeline using the command below:

       databricks labs dlt-meta deploy
    

The command will prompt you to provide pipeline configuration details.

deployingDLTMeta_bronze_silver.gif

The above deploy CLI command will:

  1. Deploy a Lakeflow Declarative Pipeline to your Databricks workspace with the dlt-meta configuration (layer, group, dataflowSpec table details, etc.)
  2. Display message: dlt-meta pipeline={pipeline_id} created and launched with update_id={pipeline_update_id}, url=https://{databricks workspace url}/#joblist/pipelines/{pipeline_id}
  3. Open the pipeline URL automatically in your default browser.
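Similarly, the deployed pipeline can be inspected from the terminal; the pipeline ID below is a placeholder taken from the message above:

    # The ID is a placeholder; use the pipeline_id printed by the deploy command
    databricks pipelines get <pipeline_id>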

More questions

Refer to the FAQ and DLT-META documentation

Project Support

Please note that all projects released under Databricks Labs are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as issues on the GitHub repo.
They will be reviewed as time permits, but there are no formal SLAs for support.