Cross catalog query support #49

@MatBielGrapeUp

Description

I would like to use databricks-sqlalchemy to query tables located in different schemas and catalogs (or even JOIN across them) with the same engine. The current documentation presents only a single configuration, where both schema and catalog are provided as part of the URL:
databricks://token:{token}@{hostname}?http_path={http_path}&schema={schema}&catalog={catalog}
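For reference, assembling such a URL looks like this (all the values below are placeholders, not real credentials):

```python
# Sketch: building the documented single-catalog connection URL.
# token, hostname, http_path, schema, catalog are placeholder values.
token = "dapi-example-token"
hostname = "example.cloud.databricks.com"
http_path = "/sql/1.0/warehouses/abc123"
schema = "my_schema"
catalog = "my_catalog"

url = (
    f"databricks://token:{token}@{hostname}"
    f"?http_path={http_path}&schema={schema}&catalog={catalog}"
)
```

Note that this pins the whole engine to one catalog and one schema, which is exactly the limitation described in this issue.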

I discovered (it is not documented) that I can remove both parameters from the URL and instead provide them via __table_args__ for every table:

from sqlalchemy import Column, String
from sqlalchemy.sql.elements import quoted_name

class SomeTable(Base):
    __tablename__ = "some_table"
    __table_args__ = (
        {"schema": quoted_name("my_catalog.my_schema", quote=False)},
    )

    some_field = Column(String(6), nullable=False)

This feels like a hack, because the schema property value is actually catalog + schema combined, and I need quoted_name to avoid quoting errors.
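A rough illustration of why quote=False matters (pure Python, not SQLAlchemy's actual compiler logic): Databricks SQL quotes identifiers with backticks, so a quoted schema value would be wrapped as one single identifier, and the dot inside it would lose its catalog/schema meaning.

```python
# Illustrative only -- not SQLAlchemy's real identifier preparer.
# Databricks SQL quotes identifiers with backticks.
def render_schema(value: str, quote: bool) -> str:
    return f"`{value}`" if quote else value

quoted = render_schema("my_catalog.my_schema", quote=True)
plain = render_schema("my_catalog.my_schema", quote=False)
# quoted == "`my_catalog.my_schema`"  (one identifier -- wrong)
# plain  == "my_catalog.my_schema"    (catalog.schema -- what we want)
```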

Would it be possible to support a dialect-specific property, for example databricks_catalog, to define the catalog as part of __table_args__? It would look like this:

__table_args__ = (
    {"schema": "my_schema", "databricks_catalog": "my_catalog"},
)

A plain catalog key is not allowed here, because it leads to an error:
TypeError: Additional arguments should be named <dialectname>_<argument>, got 'catalog'
But a dialect-specific key should be fine.
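That error comes from SQLAlchemy's naming convention for extra __table_args__ keys: each must be prefixed with the dialect name. A simplified sketch of the rule (this is not the real SQLAlchemy implementation, just an illustration of the convention):

```python
# Simplified sketch of SQLAlchemy's dialect-kwarg naming rule:
# every extra __table_args__ key must look like "<dialect>_<arg>".
def split_dialect_kwarg(key: str) -> tuple[str, str]:
    dialect, _, arg = key.partition("_")
    if not arg:
        raise TypeError(
            f"Additional arguments should be named "
            f"<dialectname>_<argument>, got {key!r}"
        )
    return dialect, arg

# "databricks_catalog" splits cleanly; a bare "catalog" raises the
# TypeError quoted above.
```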

Having such a feature would make it easier for us to use the library for our use case (we want to join Delta tables with LakeBase tables added to Unity Catalog).

Thanks
