Feat!: Dev-only VDE mode #5087

Merged
merged 27 commits into from
Aug 12, 2025
Changes from 20 commits
6161f36
Feat: Add support for the non-virtual prod mode
izeigerman Jul 29, 2025
b14b33b
add tests for the plan stage builder
izeigerman Jul 30, 2025
19b9a31
add tests for the snapshot definition
izeigerman Jul 30, 2025
281e3bd
add docs
izeigerman Jul 30, 2025
854b732
move virtual_environment_model attribute into model meta
izeigerman Jul 31, 2025
8144762
update docs
izeigerman Jul 31, 2025
2d836a0
compute preview in dev
izeigerman Jul 31, 2025
8be9fd5
add an integration test
izeigerman Jul 31, 2025
01e6876
fix dbt support
izeigerman Jul 31, 2025
273cb34
use_finalized_state can't be used with non-full vde
izeigerman Jul 31, 2025
2c25c9d
minor root config fix
izeigerman Jul 31, 2025
52eb476
add a migration script
izeigerman Jul 31, 2025
99c8b0b
fix build
izeigerman Aug 1, 2025
325bf76
fixes after rebase
izeigerman Aug 8, 2025
096f139
drop data objects of different types
izeigerman Aug 9, 2025
224cff5
extend unrestorable criteria
izeigerman Aug 11, 2025
1d7c7de
fix tests
izeigerman Aug 11, 2025
2ac5969
improve tests
izeigerman Aug 11, 2025
24e10dd
fix warning
izeigerman Aug 11, 2025
9e36f95
fix typo
izeigerman Aug 11, 2025
dfa0e35
remove obsolete plan checks
izeigerman Aug 11, 2025
06e2541
fix manual categorization for the model kind change
izeigerman Aug 11, 2025
8fe9b65
cosmetic
izeigerman Aug 11, 2025
a90932d
address comments
izeigerman Aug 12, 2025
08d80bf
address doc comments
izeigerman Aug 12, 2025
2670119
adjust intervals based on force rebuild at runtime
izeigerman Aug 12, 2025
282cfe3
fix model kind change edge case
izeigerman Aug 12, 2025
38 changes: 38 additions & 0 deletions docs/guides/configuration.md
@@ -538,6 +538,44 @@ sqlmesh_md5__d3b07384d113edec49eaa6238ad5ff00__dev

The downside is that it becomes much harder to tell which table corresponds to which model just by looking at the database with a SQL client. However, the table names have a predictable length, so there are no longer any surprises with identifiers exceeding the maximum length at the physical layer.

#### Virtual Data Environment modes

By default, Virtual Data Environments (VDE) are applied across both development and production environments. This allows SQLMesh to reuse physical tables when appropriate, even when promoting from development to production.

However, users may sometimes prefer their production environment to be non-virtual. Reasons may include, among others:

- Integration with third-party tools and platforms, such as data catalogs, may not work well with the virtual view layer that SQLMesh imposes by default
- A desire to rely on time travel features provided by cloud data warehouses such as BigQuery, Snowflake, and Databricks

To mitigate this, SQLMesh offers an alternative 'dev-only' mode for using VDE. It can be enabled in the project configuration like so:

=== "YAML"

```yaml linenums="1"
virtual_environment_mode: dev_only
```

=== "Python"

```python linenums="1"
from sqlmesh.core.config import Config

config = Config(
virtual_environment_mode="dev_only",
)
```

As the name suggests, 'dev-only' mode means that VDE is applied only in development environments, while in production, model tables and views are updated directly, bypassing the virtual layer. This also means that physical tables in production will be created using the original, unversioned model names. Users will still benefit from VDE and data reuse across development environments.

Please note the following tradeoffs when enabling this mode:

- All data inserted in development environments is used only for [preview](../concepts/plans.md#data-preview-for-forward-only-changes) and will **not** be reused in production
- Reverting a model to a previous version takes effect only going forward and may require an explicit data restatement

!!! warning
Switching the mode for an existing project will result in a complete rebuild of all models in the project. Refer to the [Table Migration Guide](./table_migration.md) to migrate existing tables without rebuilding them from scratch.


#### Environment view catalogs

By default, SQLMesh creates an environment view in the same [catalog](../concepts/glossary.md#catalog) as the physical table the view points to. The physical table's catalog is determined by either the catalog specified in the model name or the default catalog defined in the connection.
1 change: 1 addition & 0 deletions docs/reference/configuration.md
@@ -46,6 +46,7 @@ Configuration options for how SQLMesh manages environment creation and promotion
| `environment_suffix_target` | Whether SQLMesh views should append their environment name to the `schema`, `table` or `catalog` - [additional details](../guides/configuration.md#view-schema-override). (Default: `schema`) | string | N |
| `gateway_managed_virtual_layer` | Whether SQLMesh views of the virtual layer will be created by the default gateway or model specified gateways - [additional details](../guides/multi_engine.md#gateway-managed-virtual-layer). (Default: False) | boolean | N |
| `environment_catalog_mapping` | A mapping from regular expressions to catalog names. The catalog name is used to determine the target catalog for a given environment. | dict[string, string] | N |
| `virtual_environment_mode` | Determines the Virtual Data Environment (VDE) mode. If set to `full`, VDE is used in both production and development environments. The `dev_only` option enables VDE only in development environments, while in production, no virtual layer is used and models are materialized directly using their original names (i.e., no versioned physical tables). (Default: `full`) | string | N |

### Models

11 changes: 11 additions & 0 deletions examples/sushi/config.py
@@ -1,5 +1,6 @@
import os

from sqlmesh.core.config.common import VirtualEnvironmentMode
from sqlmesh.core.config import (
AutoCategorizationMode,
BigQueryConnectionConfig,
@@ -76,6 +77,16 @@
model_defaults=model_defaults,
)

# A configuration used for SQLMesh tests with virtual environment mode set to DEV_ONLY.
test_config_virtual_environment_mode_dev_only = test_config.copy(
update={
"virtual_environment_mode": VirtualEnvironmentMode.DEV_ONLY,
"plan": PlanConfig(
auto_categorize_changes=CategorizerConfig.all_full(),
),
}
)

# A DuckDB config with a physical schema map.
map_config = Config(
default_connection=DuckDBConnectionConfig(),
29 changes: 29 additions & 0 deletions sqlmesh/core/config/common.py
@@ -49,6 +49,35 @@ def __repr__(self) -> str:
return str(self)


class VirtualEnvironmentMode(str, Enum):
"""Mode for virtual environment behavior.

FULL: Use full virtual environment functionality with versioned table names and virtual layer updates.
DEV_ONLY: Bypass virtual environments in production, using original unversioned model names.
"""

FULL = "full"
Contributor

Minor: Is the term "FULL" being used because of "FULL" in auto-categorization? "ALL" feels better to me but I'm wondering if there is some intention behind using "FULL" that I am missing.

DEV_ONLY = "dev_only"

@property
def is_full(self) -> bool:
return self == VirtualEnvironmentMode.FULL

@property
def is_dev_only(self) -> bool:
return self == VirtualEnvironmentMode.DEV_ONLY

@classproperty
def default(cls) -> VirtualEnvironmentMode:
return VirtualEnvironmentMode.FULL

def __str__(self) -> str:
return self.name

def __repr__(self) -> str:
return str(self)


class TableNamingConvention(str, Enum):
# Causes table names at the physical layer to follow the convention:
# <schema-name>__<table-name>__<fingerprint>
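
For readers following the review, here is a minimal, self-contained sketch of how a `str`-based enum like `VirtualEnvironmentMode` behaves when a config value such as `dev_only` is parsed. It mirrors the members and properties from the diff above but omits the `default` classproperty (which depends on a SQLMesh helper), so treat it as an illustration rather than the actual class.

```python
from enum import Enum


class VirtualEnvironmentMode(str, Enum):
    """Simplified stand-in for the enum in the diff (no `default` classproperty)."""

    FULL = "full"
    DEV_ONLY = "dev_only"

    @property
    def is_full(self) -> bool:
        return self == VirtualEnvironmentMode.FULL

    @property
    def is_dev_only(self) -> bool:
        return self == VirtualEnvironmentMode.DEV_ONLY

    def __str__(self) -> str:
        # Same choice as in the diff: the string form is the member name.
        return self.name


# Lookup by value: the string written in the project config maps to a member.
mode = VirtualEnvironmentMode("dev_only")

# Because of the `str` mixin, a member also compares equal to its raw value.
same_as_raw_value = mode == "dev_only"

# An unknown value is rejected by the enum itself.
try:
    VirtualEnvironmentMode("prod_only")
    rejected = False
except ValueError:
    rejected = True
```

One consequence worth noting: `str(mode)` yields the upper-case member name, while `mode.value` yields the lower-case string used in configuration files.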
21 changes: 14 additions & 7 deletions sqlmesh/core/config/root.py
@@ -14,7 +14,11 @@
from sqlmesh.cicd.config import CICDBotConfig
from sqlmesh.core import constants as c
from sqlmesh.core.console import get_console
- from sqlmesh.core.config import EnvironmentSuffixTarget, TableNamingConvention
+ from sqlmesh.core.config.common import (
+ EnvironmentSuffixTarget,
+ TableNamingConvention,
+ VirtualEnvironmentMode,
+ )
from sqlmesh.core.config.base import BaseConfig, UpdateStrategy
from sqlmesh.core.config.common import variables_validator, compile_regex_mapping
from sqlmesh.core.config.connection import (
@@ -110,6 +114,7 @@ class Config(BaseConfig):
physical_schema_mapping: A mapping from regular expressions to names of schemas in which physical tables for corresponding models will be placed.
environment_suffix_target: Indicates whether to append the environment name to the schema or table name.
physical_table_naming_convention: Indicates how tables should be named at the physical layer
virtual_environment_mode: Indicates whether Virtual Data Environments (VDE) are used in all environments (`full`) or only in development environments (`dev_only`).
gateway_managed_virtual_layer: Whether the models' views in the virtual layer are created by the model-specific gateway rather than the default gateway.
infer_python_dependencies: Whether to statically analyze Python code to automatically infer Python package requirements.
environment_catalog_mapping: A mapping from regular expressions to catalog names. The catalog name is used to determine the target catalog for a given environment.
@@ -148,12 +153,9 @@ class Config(BaseConfig):
env_vars: t.Dict[str, str] = {}
username: str = ""
physical_schema_mapping: RegexKeyDict = {}
- environment_suffix_target: EnvironmentSuffixTarget = Field(
- default=EnvironmentSuffixTarget.default
- )
- physical_table_naming_convention: TableNamingConvention = Field(
- default=TableNamingConvention.default
- )
+ environment_suffix_target: EnvironmentSuffixTarget = EnvironmentSuffixTarget.default
+ physical_table_naming_convention: TableNamingConvention = TableNamingConvention.default
+ virtual_environment_mode: VirtualEnvironmentMode = VirtualEnvironmentMode.default
gateway_managed_virtual_layer: bool = False
infer_python_dependencies: bool = True
environment_catalog_mapping: RegexKeyDict = {}
@@ -260,6 +262,11 @@ def _normalize_identifiers(key: str) -> None:
"Please specify one or the other"
)

if self.plan.use_finalized_state and not self.virtual_environment_mode.is_full:
raise ConfigError(
"Using the finalized state is only supported when `virtual_environment_mode` is set to `full`."
)
Comment on lines +265 to +268
Contributor

Is this because a "partial" state doesn't make sense if you aren't using VDE in prod? If so, what would be the negative in allowing this? Just doesn't have much meaning?

Member Author

Actually, since dev-only VDE always assumes a forward-only plan, this check is no longer relevant. Will remove it, thanks

Member Author

I thought about this more and concluded that indeed this configuration is meaningless when used with dev-only VDEs since there are no previous table versions to go back to.


if self.environment_catalog_mapping:
_normalize_identifiers("environment_catalog_mapping")
if self.physical_schema_mapping:
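
The configuration check discussed in the thread above can be sketched in isolation. This is a minimal illustration, assuming a flattened `ToyConfig` dataclass and a local `ConfigError` class, both hypothetical stand-ins for the real `Config`/`PlanConfig` pair and SQLMesh's own exception. It only shows the rule that `use_finalized_state` is rejected unless the mode is `full`.

```python
from dataclasses import dataclass


class ConfigError(Exception):
    """Hypothetical stand-in for SQLMesh's ConfigError."""


@dataclass
class ToyConfig:
    # Hypothetical, flattened stand-in: in the diff, `use_finalized_state`
    # lives on the nested plan config, not on the root config.
    virtual_environment_mode: str = "full"
    use_finalized_state: bool = False

    def validate(self) -> None:
        # Same condition as in the diff: finalized state requires the full VDE mode.
        if self.use_finalized_state and self.virtual_environment_mode != "full":
            raise ConfigError(
                "Using the finalized state is only supported when "
                "`virtual_environment_mode` is set to `full`."
            )


def is_valid(config: ToyConfig) -> bool:
    """Returns True if validation passes, False if it raises ConfigError."""
    try:
        config.validate()
        return True
    except ConfigError:
        return False
```

Under these assumptions, only the combination of `dev_only` with `use_finalized_state=True` fails validation; every other combination passes.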
11 changes: 7 additions & 4 deletions sqlmesh/core/context.py
@@ -1616,6 +1616,11 @@ def plan_builder(
max_interval_end_per_model,
)

if not self.config.virtual_environment_mode.is_full:
forward_only = True
elif forward_only is None:
forward_only = self.config.plan.forward_only

return self.PLAN_BUILDER_TYPE(
context_diff=context_diff,
start=start,
@@ -1628,9 +1633,7 @@ def plan_builder(
skip_backfill=skip_backfill,
empty_backfill=empty_backfill,
is_dev=is_dev,
- forward_only=(
- forward_only if forward_only is not None else self.config.plan.forward_only
- ),
+ forward_only=forward_only,
allow_destructive_models=expanded_destructive_models,
environment_ttl=environment_ttl,
environment_suffix_target=self.config.environment_suffix_target,
@@ -2936,7 +2939,7 @@ def _node_or_snapshot_to_fqn(self, node_or_snapshot: NodeOrSnapshot) -> str:
def _plan_preview_enabled(self) -> bool:
if self.config.plan.enable_preview is not None:
return self.config.plan.enable_preview
- # It is dangerous to enable preview by default for dbt projects that rely on engines that dont support cloning.
+ # It is dangerous to enable preview by default for dbt projects that rely on engines that don't support cloning.
# Enabling previews in such cases can result in unintended full refreshes because dbt incremental models rely on
# the maximum timestamp value in the target table.
return self._project_type == c.NATIVE or self.engine_adapter.SUPPORTS_CLONING
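
The `forward_only` resolution added to `plan_builder` in this file can be restated as a small pure function. The helper name `resolve_forward_only` and its parameters are hypothetical; the branching follows the diff: a non-full mode forces forward-only, otherwise an explicit argument wins, and the plan config supplies the fallback.

```python
from typing import Optional


def resolve_forward_only(
    mode_is_full: bool,
    forward_only: Optional[bool],
    config_forward_only: bool,
) -> bool:
    """Hypothetical restatement of the branching added to `plan_builder`."""
    if not mode_is_full:
        # Dev-only VDE: the plan is always forward-only, whatever the caller passed.
        return True
    if forward_only is None:
        # No explicit argument: fall back to the value from the plan config.
        return config_forward_only
    return forward_only
```

Note that in dev-only mode an explicit `forward_only=False` is overridden rather than rejected, at least as far as this hunk shows.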
63 changes: 62 additions & 1 deletion sqlmesh/core/engine_adapter/base.py
@@ -32,6 +32,7 @@
CommentCreationTable,
CommentCreationView,
DataObject,
DataObjectType,
EngineRunMode,
InsertOverwriteStrategy,
SourceQuery,
@@ -369,6 +370,9 @@ def replace_query(
kwargs: Optional create table properties.
"""
target_table = exp.to_table(table_name)

table_exists = self._drop_data_object_on_type_mismatch(target_table, DataObjectType.TABLE)

source_queries, columns_to_types = self._get_source_queries_and_columns_to_types(
query_or_df, columns_to_types, target_table=target_table
)
@@ -390,7 +394,7 @@ def replace_query(
)
# All engines support `CREATE TABLE AS` so we use that if the table doesn't already exist and we
# use `CREATE OR REPLACE TABLE AS` if the engine supports it
- if self.SUPPORTS_REPLACE_TABLE or not self.table_exists(target_table):
+ if self.SUPPORTS_REPLACE_TABLE or not table_exists:
return self._create_table_from_source_queries(
target_table,
source_queries,
@@ -930,6 +934,28 @@ def clone_table(
)
)

def drop_data_object(self, data_object: DataObject, ignore_if_not_exists: bool = True) -> None:
"""Drops a data object of arbitrary type.

Args:
data_object: The data object to drop.
ignore_if_not_exists: If True, no error will be raised if the data object does not exist.
"""
if data_object.type.is_view:
self.drop_view(data_object.to_table(), ignore_if_not_exists=ignore_if_not_exists)
elif data_object.type.is_materialized_view:
self.drop_view(
data_object.to_table(), ignore_if_not_exists=ignore_if_not_exists, materialized=True
)
elif data_object.type.is_table:
self.drop_table(data_object.to_table(), exists=ignore_if_not_exists)
elif data_object.type.is_managed_table:
self.drop_managed_table(data_object.to_table(), exists=ignore_if_not_exists)
else:
raise SQLMeshError(
f"Can't drop data object '{data_object.to_table().sql(dialect=self.dialect)}' of type '{data_object.type.value}'"
)

def drop_table(self, table_name: TableName, exists: bool = True) -> None:
"""Drops a table.

@@ -1118,6 +1144,12 @@ def create_view(
if properties.expressions:
create_kwargs["properties"] = properties

if replace:
self._drop_data_object_on_type_mismatch(
view_name,
DataObjectType.VIEW if not materialized else DataObjectType.MATERIALIZED_VIEW,
)

with source_queries[0] as query:
self.execute(
exp.Create(
@@ -2483,6 +2515,35 @@ def _truncate_table(self, table_name: TableName) -> None:
table = exp.to_table(table_name)
self.execute(f"TRUNCATE TABLE {table.sql(dialect=self.dialect, identify=True)}")

def _drop_data_object_on_type_mismatch(
self, target_name: TableName, expected_type: DataObjectType
) -> bool:
"""Drops a data object if it exists and is not of the expected type.

Args:
target_name: The name of the data object to check.
expected_type: The expected type of the data object.

Returns:
True if the data object exists and is of the expected type, False otherwise.
"""
target_table = exp.to_table(target_name)
existing_data_objects = self.get_data_objects(
schema_(target_table.db, target_table.catalog), {target_table.name}
)
if existing_data_objects:
if existing_data_objects[0].type == expected_type:
return True

logger.warning(
"Target data object '%s' is a %s and not a %s, dropping it",
target_table.sql(dialect=self.dialect),
existing_data_objects[0].type.value,
expected_type.value,
)
self.drop_data_object(existing_data_objects[0])
return False

def _replace_by_key(
self,
target_table: TableName,
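
The contract of `_drop_data_object_on_type_mismatch` is easy to misread, so here is a toy model of it. It assumes a plain dictionary (`catalog`, mapping object names to type strings) in place of a real engine, and the function name `drop_on_type_mismatch` is hypothetical. What it preserves from the diff is the return value: `True` only when an object of the expected type already exists, and `False` both when nothing exists and when a mismatched object was dropped.

```python
from typing import Dict


def drop_on_type_mismatch(catalog: Dict[str, str], name: str, expected_type: str) -> bool:
    """Toy version of the helper in the diff, backed by an in-memory dict."""
    existing_type = catalog.get(name)
    if existing_type is None:
        # Nothing exists under this name, so there is nothing to drop.
        return False
    if existing_type == expected_type:
        # The object can be reused or replaced in place.
        return True
    # Type mismatch (for example, a view where a table is expected): drop it.
    del catalog[name]
    return False


# Hypothetical object names, for illustration only.
catalog = {"db.orders": "view", "db.customers": "table"}

orders_exists = drop_on_type_mismatch(catalog, "db.orders", "table")
customers_exists = drop_on_type_mismatch(catalog, "db.customers", "table")
missing_exists = drop_on_type_mismatch(catalog, "db.missing", "table")
```

This is why `replace_query` can use the returned flag in place of a separate `table_exists` call: after the helper runs, the flag already reflects whether a usable table is present.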
4 changes: 3 additions & 1 deletion sqlmesh/core/engine_adapter/redshift.py
@@ -262,7 +262,9 @@ def replace_query(
"""
import pandas as pd

- if not isinstance(query_or_df, pd.DataFrame) or not self.table_exists(table_name):
+ table_exists = self._drop_data_object_on_type_mismatch(table_name, DataObjectType.TABLE)
+
+ if not isinstance(query_or_df, pd.DataFrame) or not table_exists:
return super().replace_query(
table_name,
query_or_df,
3 changes: 3 additions & 0 deletions sqlmesh/core/engine_adapter/shared.py
@@ -171,6 +171,9 @@ class DataObject(PydanticModel):
def is_clustered(self) -> bool:
return bool(self.clustering_key)

def to_table(self) -> exp.Table:
return exp.table_(self.name, db=self.schema_name, catalog=self.catalog, quoted=True)


class CatalogSupport(Enum):
# The engine has no concept of catalogs
2 changes: 2 additions & 0 deletions sqlmesh/core/loader.py
@@ -603,6 +603,7 @@ def _load_sql_models(
infer_names=self.config.model_naming.infer_names,
signal_definitions=signals,
default_catalog_per_gateway=self.context.default_catalog_per_gateway,
virtual_environment_mode=self.config.virtual_environment_mode,
**loading_default_kwargs or {},
)

@@ -683,6 +684,7 @@ def _load_python_models(
audit_definitions=audits,
signal_definitions=signals,
default_catalog_per_gateway=self.context.default_catalog_per_gateway,
virtual_environment_mode=self.config.virtual_environment_mode,
):
if model.enabled:
models[model.fqn] = model
3 changes: 3 additions & 0 deletions sqlmesh/core/model/decorator.py
@@ -8,6 +8,7 @@
from sqlglot import exp
from sqlglot.dialects.dialect import DialectType

from sqlmesh.core.config.common import VirtualEnvironmentMode
from sqlmesh.core.macros import MacroRegistry
from sqlmesh.core.signal import SignalRegistry
from sqlmesh.utils.jinja import JinjaMacroRegistry
@@ -154,6 +155,7 @@ def model(
variables: t.Optional[t.Dict[str, t.Any]] = None,
infer_names: t.Optional[bool] = False,
blueprint_variables: t.Optional[t.Dict[str, t.Any]] = None,
virtual_environment_mode: VirtualEnvironmentMode = VirtualEnvironmentMode.default,
) -> Model:
"""Get the model registered by this function."""
env: t.Dict[str, t.Tuple[t.Any, t.Optional[bool]]] = {}
@@ -228,6 +230,7 @@ def model(
"audit_definitions": audit_definitions,
"signal_definitions": signal_definitions,
"blueprint_variables": blueprint_variables,
"virtual_environment_mode": virtual_environment_mode,
**rendered_fields,
}

2 changes: 2 additions & 0 deletions sqlmesh/core/model/definition.py
@@ -1062,6 +1062,7 @@ def _data_hash_values(self) -> t.List[str]:
self.gateway,
self.interval_unit.value if self.interval_unit is not None else None,
str(self.optimize_query) if self.optimize_query is not None else None,
self.virtual_environment_mode.value,
]

for column_name, column_type in (self.columns_to_types_ or {}).items():
@@ -2950,6 +2951,7 @@ def render_expression(
)
),
"formatting": str,
"virtual_environment_mode": lambda value: exp.Literal.string(value.value),
}
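
Adding `virtual_environment_mode` to `_data_hash_values` means the mode becomes part of each model's data fingerprint, which is presumably why the docs warn that switching modes rebuilds every model. The sketch below illustrates the effect under loud assumptions: the `data_hash` helper, the `|` separator, and the use of MD5 are all hypothetical and are not SQLMesh's actual hashing scheme.

```python
import hashlib
from typing import List, Optional


def data_hash(values: List[Optional[str]]) -> str:
    """Hypothetical fingerprint: joins the values and hashes the result."""
    joined = "|".join("" if value is None else value for value in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# The same model definition, fingerprinted under each mode.
query = "SELECT 1 AS id"
full_hash = data_hash([query, "full"])
dev_only_hash = data_hash([query, "dev_only"])

# The fingerprints differ even though the query itself is unchanged.
mode_changes_fingerprint = full_hash != dev_only_hash
```

Since every model carries the mode, a project-wide switch changes every fingerprint at once, with no per-model opt-out visible in this diff.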


2 changes: 2 additions & 0 deletions sqlmesh/core/model/meta.py
@@ -10,6 +10,7 @@
from sqlglot.optimizer.normalize_identifiers import normalize_identifiers

from sqlmesh.core import dialect as d
from sqlmesh.core.config.common import VirtualEnvironmentMode
from sqlmesh.core.config.linter import LinterConfig
from sqlmesh.core.dialect import normalize_model_name
from sqlmesh.core.model.common import (
@@ -83,6 +84,7 @@ class ModelMeta(_Node):
default=None, exclude=True, alias="ignored_rules"
)
formatting: t.Optional[bool] = Field(default=None, exclude=True)
virtual_environment_mode: VirtualEnvironmentMode = VirtualEnvironmentMode.default

_bool_validator = bool_validator
_model_kind_validator = model_kind_validator