---
sidebar_position: 2
---

# Databricks

`intugle` integrates with Databricks, allowing you to read data from your tables and deploy your `SemanticModel` by setting constraints and comments directly in your Databricks account.

## Installation

To use `intugle` with Databricks, you must install the optional dependencies:

```bash
pip install "intugle[databricks]"
```

This installs the `pyspark`, `sqlglot`, and `databricks-sql-connector` libraries.

## Configuration

The Databricks adapter can connect using credentials from a `profiles.yml` file, or it can automatically use the active Spark session when running inside a Databricks notebook.

### Connecting from an External Environment

When running `intugle` outside of a Databricks notebook, you must provide full connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `databricks:` key.

**Example `profiles.yml`:**

```yaml
databricks:
  host: <your_databricks_host>
  http_path: <your_sql_warehouse_http_path>
  token: <your_personal_access_token>
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```
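If you are unsure what values to supply: `host` is your workspace hostname (typically without the `https://` scheme), and `http_path` is the HTTP path of your SQL warehouse; both appear on the warehouse's **Connection details** tab in the Databricks UI. The `token` is a personal access token, which you can generate under **User Settings** in your workspace.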
### Connecting from a Databricks Notebook

When your code runs inside a Databricks notebook, the adapter automatically detects and uses the notebook's active Spark session for execution. However, it still requires a `profiles.yml` file to determine the target `schema` and `catalog` for your operations.

**Example `profiles.yml` for Notebooks:**

```yaml
databricks:
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```
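Session detection follows the standard PySpark pattern of asking for the currently active session. The sketch below illustrates that general pattern (it is not `intugle`'s exact implementation) and can help you confirm whether notebook mode will apply:

```python
from pyspark.sql import SparkSession

# Inside a Databricks notebook this returns the same session as the
# built-in `spark` variable; outside a notebook it returns None.
spark = SparkSession.getActiveSession()

if spark is not None:
    print(f"Active Spark session found (version {spark.version}).")
else:
    print("No active session; full credentials in profiles.yml are required.")
```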
## Usage

### Reading Data from Databricks

To include a Databricks table in your `SemanticModel`, define it in your input dictionary with `type: "databricks"` and use the `identifier` key to specify the table name.

:::caution Important
The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table name specified in the `identifier`.
:::

```python
from intugle import SemanticModel

datasets = {
    "CUSTOMERS": {
        "identifier": "CUSTOMERS",  # Must match the key above
        "type": "databricks"
    },
    "ORDERS": {
        "identifier": "ORDERS",  # Must match the key above
        "type": "databricks"
    }
}

# Initialize the semantic model
sm = SemanticModel(datasets, domain="E-commerce")

# Build the model as usual
sm.build()
```
### Materializing Data Products

When you use the `DataProduct` class with a Databricks connection, the resulting data product will be materialized as a new **view** directly within your target schema.
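For example, here is a minimal sketch of building one. The ETL spec below (`name`, `fields`, and the `table.column` id format) is illustrative; consult the `DataProduct` documentation for the authoritative schema:

```python
from intugle import DataProduct

dp = DataProduct()

# Illustrative spec: select two columns across the linked tables.
etl_config = {
    "name": "customer_orders",
    "fields": [
        {"id": "CUSTOMERS.customer_id", "name": "customer_id"},
        {"id": "ORDERS.order_total", "name": "order_total"},
    ],
}

result = dp.build(etl_config)
```

Run against a Databricks connection, this would create `customer_orders` as a view in the `catalog` and `schema` configured in your `profiles.yml`.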
### Deploying the Semantic Model

Once your semantic model is built, you can deploy it to Databricks using the `deploy()` method. This process syncs your model's metadata to your physical tables by:

1. **Syncing Metadata:** It updates the comments on your physical Databricks tables and columns with the business glossary from your `intugle` model. You can also sync tags.
2. **Setting Constraints:** It sets `PRIMARY KEY` and `FOREIGN KEY` constraints on your tables based on the relationships discovered in the model.

```python
# Deploy the model to Databricks
sm.deploy(target="databricks")

# You can also control which parts of the deployment to run
sm.deploy(
    target="databricks",
    sync_glossary=True,
    sync_tags=True,
    set_primary_keys=True,
    set_foreign_keys=True
)
```
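Under the hood, these operations correspond to standard Unity Catalog DDL. The sketch below shows equivalent statements issued manually through `databricks-sql-connector`; the table, column, and constraint names are illustrative, and this is not what `deploy()` literally executes:

```python
from databricks import sql

# Connection values mirror the external-environment profiles.yml above.
with sql.connect(
    server_hostname="<your_databricks_host>",
    http_path="<your_sql_warehouse_http_path>",
    access_token="<your_personal_access_token>",
) as conn:
    with conn.cursor() as cursor:
        # Glossary sync: write a column description as a comment.
        cursor.execute(
            "ALTER TABLE my_catalog.my_schema.CUSTOMERS "
            "ALTER COLUMN customer_id COMMENT 'Unique customer identifier'"
        )
        # Constraint sync: a primary key, then a foreign key referencing it.
        cursor.execute(
            "ALTER TABLE my_catalog.my_schema.CUSTOMERS "
            "ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id)"
        )
        cursor.execute(
            "ALTER TABLE my_catalog.my_schema.ORDERS "
            "ADD CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id) "
            "REFERENCES my_catalog.my_schema.CUSTOMERS (customer_id)"
        )
```

Note that Unity Catalog `PRIMARY KEY` and `FOREIGN KEY` constraints are informational rather than enforced, and a primary key column must be declared `NOT NULL`.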