02 Concepts
A typical Azure Modern Data Platform implementation is shown below and will be used to explain the basic concepts of the ELT Framework.
Definition: In the ELT framework, a definition refers to a one-time metadata configuration, such as IngestDefinition for ingestion, L1TransformDefinition for Level 1 transformation, and L2TransformDefinition for Level 2 transformation.
Instance: Each execution of a definition generates an instance (e.g., IngestInstance, L1TransformInstance, L2TransformInstance) used for tracking, auditing, data lineage, and re-runs.
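To make the relationship between a definition and its instances concrete, the sketch below models it in Python. The class and field names are illustrative assumptions, not the framework's actual metadata schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass
class IngestDefinition:
    """One-time metadata configuration for ingesting a stream from a source system (illustrative fields)."""
    definition_id: int
    source_system: str        # e.g. "ERP"
    stream: str               # e.g. the "SalesOrders" table
    raw_container: str        # Raw zone destination container

@dataclass
class IngestInstance:
    """Created for every execution of a definition; used for tracking, auditing,
    data lineage, and re-runs (illustrative fields)."""
    definition_id: int
    instance_id: str = field(default_factory=lambda: str(uuid4()))
    run_start: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: str = "InProgress"   # updated to Succeeded/Failed when the run ends
```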
IngestDefinition configures data loading from various source systems into the Raw/Landing zone of a data lake. Sources can be cloud-based, on-premises, or manually uploaded (e.g., third-party data) and include databases, flat files, REST APIs, XML/FetchXML, JSON, and other batch data sources. Data first lands in the Raw zone, such as Azure Data Lake Storage Gen2 (ADLS Gen2), maintaining its original granularity and format. This zone serves as cost-effective storage that lets processing pipelines work without directly accessing the source system, which is useful for re-runs, decommissioned sources, and separating transformations from ingestion. Folders in the Raw zone are typically partitioned by Source/Entity/Year/Month/Day. File formats from APIs and external files are preserved (e.g., JSON, XML), while data from databases is stored in Parquet format when possible. Raw data can then be transformed for downstream use, including Machine Learning (ML) workloads.
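As a rough illustration of the Source/Entity/Year/Month/Day partitioning and format rules described above, a small helper might build Raw zone paths like this (the function and its format logic are assumptions for illustration, not part of the framework):

```python
from datetime import date

def raw_zone_path(source: str, entity: str, run_date: date, source_type: str) -> str:
    """Build a Raw zone path partitioned by Source/Entity/Year/Month/Day.

    Database extracts are stored as Parquet where possible; API and file
    sources keep their original format (JSON, XML, CSV, ...).
    """
    extension = "parquet" if source_type == "database" else "json"
    return (
        f"{source}/{entity}/"
        f"{run_date.year:04d}/{run_date.month:02d}/{run_date.day:02d}/"
        f"{entity}.{extension}"
    )

# Example: a database table ingested on 2024-03-15
print(raw_zone_path("ERP", "SalesOrders", date(2024, 3, 15), "database"))
# -> ERP/SalesOrders/2024/03/15/SalesOrders.parquet
```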
Within the context of the ELT Framework, a Source System refers to any input data source, e.g., ERP, Fleet Management, Historian, Enrolment System, etc.
Within the context of the ELT Framework, a Stream refers to an entity within the source system, e.g., a table/view, a REST API endpoint, or a flat file within the source system.
An execution of the Ingestion Pipeline using the IngestDefinition will create an Ingest Instance record. The Ingest Instance record will have the following data points (see the sketch after this list):
- Date/Number range of Ingestion
- Reload flag
- Ingestion status
- Destination details of Raw file (container, folder and file)
- Duration for which the pipeline ran
- Audit data points
- Lineage data points
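A sketch of what such an Ingest Instance record might contain, using a plain dictionary with assumed field names (the framework's actual column names may differ):

```python
# Illustrative Ingest Instance record; all field names are assumptions.
ingest_instance = {
    "ingest_definition_id": 42,
    "watermark_start": "2024-03-14T00:00:00Z",   # date/number range of ingestion
    "watermark_end": "2024-03-15T00:00:00Z",
    "reload_flag": False,
    "status": "Succeeded",                        # ingestion status
    "raw_container": "raw",                       # destination details of the Raw file
    "raw_folder": "ERP/SalesOrders/2024/03/15",
    "raw_file": "SalesOrders.parquet",
    "duration_seconds": 187,                      # duration for which the pipeline ran
    "audit": {"created_by": "adf-pipeline", "created_on": "2024-03-15T01:03:07Z"},
    "lineage": {"source_system": "ERP", "stream": "SalesOrders"},
}
```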
Once ingested, data is available in the Raw zone of the data lake. From there, it can be transformed using compute resources such as Spark notebooks, with results stored in the Trusted zone. The L1TransformDefinition specifies the input path in the Raw zone, the destination path in the Trusted zone, the destination in the DW (if applicable), and the transformation notebook used. The Trusted/Structured zone enriches data from the Raw zone, maintaining the same granularity and storing it in Parquet format. This layer can also be implemented as Delta Lake. Examples of data enrichment include (see the sketch after this list):
- De-duplication
- Removing leading/trailing spaces from strings
- Merging/upserting existing data with newer versions
- Converting UTC dates to local time zones
- Standardizing timestamp formats (e.g., Julian dates)
- Flattening JSON and XML files
- Adding headers to files without headers
- Translating column names to English (e.g., SAP columns in German)
- Removing system columns from source data
- Implementing SCD patterns
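A minimal sketch of an L1 transformation notebook covering a few of the enrichments above, assuming PySpark; the paths and column names are illustrative and would normally come from the L1TransformDefinition metadata:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("l1-transform-sketch").getOrCreate()

# Illustrative paths; in the framework these come from the L1TransformDefinition.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/ERP/SalesOrders/2024/03/15/"
trusted_path = "abfss://trusted@<storage-account>.dfs.core.windows.net/ERP/SalesOrders/"

df = spark.read.parquet(raw_path)

enriched = (
    df.dropDuplicates()                                                  # de-duplication
      .withColumn("customer_name", F.trim(F.col("customer_name")))       # strip leading/trailing spaces
      .withColumn("order_ts_local",
                  F.from_utc_timestamp(F.col("order_ts_utc"),
                                       "Australia/Sydney"))              # UTC -> local time zone
)

# Same granularity as the Raw zone, stored as Parquet (or Delta) in the Trusted zone.
enriched.write.mode("overwrite").parquet(trusted_path)
```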
An execution of the L1 Transformation Pipeline using the L1TransformDefinition will create an L1 Transform Instance record. The L1 Transform Instance record will have the following data points:
- Reload flag
- Transformation status
- Details of Raw and Transformed file (container, folder and file)
- DW Table or Delta Lake Table where the transformed output is available
- Duration for which the pipeline ran
- Audit data points
- Lineage data points
Level 2 transformation is where the granularity of data changes through the application of specific business rules. L2 Transformations can be defined to use the Raw Zone, the Trusted Zone, or a table in the DW as their source. Typical transformations in this layer are (see the sketch after this list):
- Aggregation
- Pivot/Un-Pivot
- Redaction
- Consolidation
- Data mash-up from different source systems
- Snapshots
- Post processing
- Fact and Dimension table loads
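As an example of a granularity-changing Level 2 transformation, the sketch below aggregates a Trusted zone dataset into a daily summary using PySpark; the paths and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("l2-transform-sketch").getOrCreate()

# The source can be the Raw zone, the Trusted zone, or a DW table; here, the Trusted zone.
orders = spark.read.parquet(
    "abfss://trusted@<storage-account>.dfs.core.windows.net/ERP/SalesOrders/"
)

# Aggregation changes the granularity from one row per order
# to one row per customer per day.
daily_sales = (
    orders.groupBy("customer_id",
                   F.to_date(F.col("order_ts_local")).alias("order_date"))
          .agg(F.sum("order_amount").alias("total_amount"),
               F.count("*").alias("order_count"))
)

daily_sales.write.mode("overwrite").parquet(
    "abfss://curated@<storage-account>.dfs.core.windows.net/Sales/DailySalesByCustomer/"
)
```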
An execution of the L2 Transformation Pipeline using the L2TransformDefinition will create an L2 Transform Instance record. The L2 Transform Instance record will have the following data points:
- Reload flag
- Transformation status
- Details of Raw and Transformed file (container, folder and file)
- DW Table or Delta Lake Table where the transformed output is available
- Duration for which the pipeline ran
- Audit data points
- Lineage data points