Conversation

@bcdurak bcdurak commented Oct 30, 2025

This is a large PR, so I would advise you to read the whole description before starting the review.

What do we capture now?

Previously, ZenML added a new handler to the root logger. This handler captured all the logs that go through the root logger and stored them in the artifact store. Additionally, we wrapped the built-in print function so that printed messages were stored as well. However, this approach missed a couple of sources, such as messages from loggers that do not propagate to the root logger and anything written to stdout/stderr outside of log messages and print statements.

Now, we do the following:

  • stdout and stderr are now wrapped. We keep references to the original stdout and stderr.
  • Everything that goes through the new wrapped stdout/stderr still goes through the original stdout/stderr.
  • Additionally, every message is also directed to a classmethod called LoggingContext.emit(...) (explained in the following section).
  • We still add a handler to the root logger. With this, the root logger ends up having two handlers: the console_handler and the zenml_handler.
  • While the console_handler formats and writes messages to the console, the zenml_handler is responsible for routing all incoming log messages to LoggingContext.emit(...) as well.
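The stdout/stderr wrapping described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual ZenML implementation; the real wrapper forwards to LoggingContext.emit(...) rather than a plain callback:

```python
import io
import sys

class _TeeStream(io.TextIOBase):
    """Hypothetical sketch: forwards writes to the original stream while also
    routing each message to a logging callback (e.g. LoggingContext.emit)."""

    def __init__(self, original, emit):
        self.original = original  # the unwrapped stdout/stderr is kept around
        self._emit = emit         # stand-in for LoggingContext.emit(...)

    def write(self, message):
        n = self.original.write(message)  # everything still reaches the console
        if message.strip():               # skip bare newlines
            self._emit(message)           # ...and is also routed to the log store
        return n

    def flush(self):
        self.original.flush()

# Usage sketch: capture everything printed while the wrapper is installed.
captured = []
original_stdout = sys.stdout
sys.stdout = _TeeStream(original_stdout, captured.append)
try:
    print("hello from the pipeline")
finally:
    sys.stdout = original_stdout  # always restore the original stream
```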

The new LoggingContext class

We have a new LoggingContext class now that replaces the old PipelineLogsContext. It's still a context manager, but operates a bit differently.

When you __init__ this class, it stores a reference to the log store within your active stack. Every time the __enter__ method gets called, it checks a context variable called active_logging_context; if one is set, it stores it and replaces the context variable with itself. Similarly, when __exit__ gets called, it removes itself from the context variable and puts the old value back.

One of the most critical parts is that a LogsResponse is now required to initiate a LoggingContext. So, when we ultimately call the emit(...) classmethod, it passes the message and the active logging context (along with the corresponding LogsResponse) to the emit(...) method of the log store.
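The enter/exit bookkeeping described above can be sketched with contextvars. This is a simplified illustration; the real class also carries the log store reference and the LogsResponse rather than a plain name:

```python
from contextvars import ContextVar

# Hypothetical sketch of the save/replace/restore dance described above.
active_logging_context: ContextVar = ContextVar("active_logging_context", default=None)

class LoggingContext:
    def __init__(self, name):
        self.name = name      # stands in for the log store + LogsResponse
        self._previous = None

    def __enter__(self):
        self._previous = active_logging_context.get()  # remember the old value
        active_logging_context.set(self)               # install ourselves
        return self

    def __exit__(self, *exc):
        active_logging_context.set(self._previous)     # put the old value back
        return False

    @classmethod
    def emit(cls, message):
        ctx = active_logging_context.get()
        if ctx is not None:
            return (ctx.name, message)  # would call the log store's emit(...)
        return None

# Nested contexts restore each other correctly on exit.
with LoggingContext("outer"):
    with LoggingContext("inner"):
        routed_inner = LoggingContext.emit("hi")
    routed_outer = LoggingContext.emit("hi again")
```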

The new LogStore component

We have a new type of stack component called a LogStore. It handles log collection and retrieval. Different implementations can plug into this interface to provide different storage backends without changing how logs are captured or accessed.

This PR also introduces three layers of implementation:

Layer 1: BaseLogStore

This layer introduces the main abstraction for the new stack component. Main abstract methods include:

  • emit(...): receives log records and sends them to a specific backend
  • fetch(...): retrieves stored logs for the dashboard and API based on time filters and limits
  • finalize(...): finalizes the stream of logs associated with a specific log response
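The interface above can be sketched as an abstract base class. Signatures here are assumptions for illustration; the real interface takes richer record and response types:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the three abstract methods named above.
class BaseLogStore(ABC):
    @abstractmethod
    def emit(self, logs_response_id, record):
        """Receive a log record and send it to the backend."""

    @abstractmethod
    def fetch(self, logs_response_id, limit=None):
        """Retrieve stored logs, optionally limited."""

    @abstractmethod
    def finalize(self, logs_response_id):
        """Finalize the stream of logs for a given logs response."""

class InMemoryLogStore(BaseLogStore):
    """Toy backend showing how an implementation plugs into the interface."""

    def __init__(self):
        self._logs = {}
        self.finalized = set()

    def emit(self, logs_response_id, record):
        self._logs.setdefault(logs_response_id, []).append(record)

    def fetch(self, logs_response_id, limit=None):
        records = self._logs.get(logs_response_id, [])
        return records[:limit] if limit else records

    def finalize(self, logs_response_id):
        self.finalized.add(logs_response_id)

store = InMemoryLogStore()
store.emit("run-1", "step started")
store.emit("run-1", "step finished")
store.finalize("run-1")
```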

Layer 2: OtelLogStore

This is an intermediate abstraction built on the base log store that implements the core OTEL infrastructure:

  • emit(...): Activates the log store if not yet activated, translates log record objects into OTEL format with ZenML-specific attributes (e.g., zenml.log_id, zenml.log_uri, zenml.log_store_id), and emits them through the OTEL logger
  • activate(...): Sets up the OpenTelemetry pipeline including the LoggerProvider, BatchLogRecordProcessor, and LoggingHandler
  • deactivate(...): Flushes pending logs and shuts down the processor and its background thread

Moreover, it introduces configuration options for the OTEL-standardized logs, including: service_name, service_version, max_queue_size, schedule_delay_millis, and max_export_batch_size.

The following abstract methods are exposed and must be implemented by subclasses:

  • get_exporter(...): Returns the specific LogExporter instance for the backend
  • fetch(...): Backend-specific log retrieval (since each backend has different query mechanisms)
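The lazy activation and attribute translation described above can be sketched as follows. This is a simplified stand-in: the real implementation builds an OpenTelemetry pipeline (LoggerProvider, BatchLogRecordProcessor, LoggingHandler) instead of appending to a list, and the class name here is hypothetical:

```python
# Hypothetical sketch; only the attribute names (zenml.log_id, zenml.log_uri,
# zenml.log_store_id) come from the PR description.
class OtelLikeLogStore:
    def __init__(self):
        self.activated = False
        self.emitted = []

    def activate(self):
        # The real code would set up the OTEL provider, batch processor,
        # and logging handler here.
        self.activated = True

    def emit(self, log_id, log_uri, log_store_id, message):
        if not self.activated:  # activate lazily on the first emit
            self.activate()
        # Translate the record into OTEL-style attributes plus a body.
        self.emitted.append({
            "zenml.log_id": log_id,
            "zenml.log_uri": log_uri,
            "zenml.log_store_id": log_store_id,
            "body": message,
        })

otel_store = OtelLikeLogStore()
otel_store.emit("log-123", "s3://bucket/logs/log-123", "store-1", "hello")
```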

Layer 3: Concrete Implementations

ArtifactLogStore

The artifact log store writes logs directly to the artifact store, providing a zero-configuration logging solution that works out of the box:

  • Uses a custom ArtifactLogExporter that writes LogEntry objects to the artifact store (compatible with our previous approach)
  • Handles both mutable filesystems (single file with append) and immutable filesystems like GCS (directory with timestamped files that get merged on finalization)
  • Automatically chunks large messages (>5KB) to prevent storage issues with UTF-8 boundary handling
  • Implements fetch(...) by streaming log files line-by-line from the artifact store
  • Can be created automatically from an existing artifact store via from_artifact_store(...) class method
  • Supports log finalization via an END_OF_STREAM_MESSAGE marker that triggers file merging on immutable filesystems and version removal on others
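The UTF-8-safe chunking mentioned above can be sketched like this. Function name and exact cut-point logic are assumptions; only the >5KB threshold and the UTF-8 boundary handling come from the description:

```python
# Hypothetical sketch: split a large message into chunks of at most max_bytes
# encoded bytes, never cutting inside a multi-byte UTF-8 character.
def chunk_message(message: str, max_bytes: int = 5 * 1024):
    data = message.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up while the cut lands on a UTF-8 continuation byte (0b10xxxxxx),
        # so every chunk decodes cleanly on its own.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks
```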

DatadogLogStore

The Datadog log store exports logs to Datadog's HTTP intake API using OTLP:

  • Uses an OTLPLogExporter configured with Datadog's OTLP endpoint
  • Requires an api_key for log ingestion and an application_key for log retrieval
  • Implements fetch(...) using Datadog's Logs Search API (/api/v2/logs/events/search) with query filtering by service and zenml.log_id
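A fetch(...) implementation along these lines might build a search request body like the following. The endpoint path and the query filters (service, zenml.log_id) come from the description; the exact payload shape is an assumption modeled on Datadog's Logs Search API:

```python
# Hypothetical sketch of the request body sent to
# /api/v2/logs/events/search; time range and sort are illustrative defaults.
def build_search_payload(service, log_id, limit=1000):
    return {
        "filter": {
            "query": f"service:{service} @zenml.log_id:{log_id}",
            "from": "now-1d",
            "to": "now",
        },
        "page": {"limit": limit},
        "sort": "timestamp",
    }

payload = build_search_payload("zenml", "log-123")
```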

Interaction of the stack with log stores

Similar to the image builders, if you don't have a log store within your active stack, an ArtifactLogStore flavor will be used instead. Since our default approach requires the opentelemetry-sdk, it has been added to the pyproject.toml as an additional dependency of the base package.
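The fallback behavior can be sketched as follows. Class and function names here are illustrative stand-ins, except from_artifact_store(...), which the description names:

```python
# Hypothetical sketch: when the active stack has no log store configured,
# derive one from the artifact store via the from_artifact_store(...) classmethod.
class ArtifactLogStore:
    def __init__(self, artifact_store):
        self.artifact_store = artifact_store

    @classmethod
    def from_artifact_store(cls, artifact_store):
        return cls(artifact_store)

def resolve_log_store(stack):
    if stack.get("log_store") is not None:
        return stack["log_store"]  # use the explicitly configured log store
    return ArtifactLogStore.from_artifact_store(stack["artifact_store"])

default_store = resolve_log_store({"log_store": None, "artifact_store": "local"})
```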

Various other changes

  • There was a context variable called redirected, which defaulted to False and was never used afterwards. It has now been removed.
  • We changed the way we prepare the logs URI for the artifact log store.

Notes

  • Recheck the flush/deactivate solution.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • IMPORTANT: I made sure that my changes are reflected properly in the following resources:
    • ZenML Docs
    • Dashboard: Needs to be communicated to the frontend team.
    • Templates: Might need adjustments (that are not reflected in the template tests) in case of non-breaking changes and deprecations.
    • Projects: Depending on the version dependencies, different projects might get affected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@bcdurak bcdurak changed the base branch from main to develop October 30, 2025 10:03
@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels Oct 30, 2025
@bcdurak bcdurak added the release-notes Release notes will be attached and used publicly for this PR. label Dec 3, 2025
@bcdurak bcdurak linked an issue Dec 4, 2025 that may be closed by this pull request
@bcdurak bcdurak merged commit 3989d92 into develop Dec 6, 2025
109 of 119 checks passed
@bcdurak bcdurak deleted the feature/log-store branch December 6, 2025 12:50

Labels

enhancement New feature or request internal To filter out internal PRs and issues release-notes Release notes will be attached and used publicly for this PR. run-slow-ci Tag that is used to trigger the slow-ci


Development

Successfully merging this pull request may close these issues.

Improve the logging experience

3 participants