Commit 2950740

Merge pull request #60 from Intugle/features/databricks-integration
Features/databricks integration
2 parents: f8104cc + 01b229d

18 files changed: +3178 −113 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -104,7 +104,8 @@ For a detailed, hands-on introduction to the project, please see our quickstart
 | **Sports Media** | [`quickstart_sports_media.ipynb`](notebooks/quickstart_sports_media.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_sports_media.ipynb) |
 | **Databricks Unity Catalog [Health Care]** | [`quickstart_healthcare_databricks.ipynb`](notebooks/quickstart_healthcare_databricks.ipynb) | Databricks Notebook Only |
 | **Snowflake Horizon Catalog [ FMCG ]** | [`quickstart_fmcg_snowflake.ipynb`](notebooks/quickstart_fmcg_snowflake.ipynb) | Snowflake Notebook Only |
-| **Native Snowflake with Cortex Analyst [ Tech Manufacturing ]** | [`quickstart_native_snowflake.ipynb`](notebooks/quickstart_native_snowflake.ipynb) | Snowflake Notebook Only |
+| **Native Snowflake with Cortex Analyst [ Tech Manufacturing ]** | [`quickstart_native_snowflake.ipynb`](notebooks/quickstart_native_snowflake.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_native_snowflake.ipynb) |
+| **Native Databricks with AI/BI Genie [ Tech Manufacturing ]** | [`quickstart_native_databricks.ipynb`](notebooks/quickstart_native_databricks.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_native_databricks.ipynb) |

 These datasets will take you through the following steps:

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
sidebar_position: 2
---

# Databricks

`intugle` integrates with Databricks, allowing you to read data from your tables and deploy your `SemanticModel` by setting constraints and comments directly in your Databricks account.

## Installation

To use `intugle` with Databricks, you must install the optional dependencies:

```bash
pip install "intugle[databricks]"
```

This installs the `pyspark`, `sqlglot`, and `databricks-sql-connector` libraries.

## Configuration

The Databricks adapter can connect using credentials from a `profiles.yml` file or automatically use an active session when running inside a Databricks notebook.

### Connecting from an External Environment

When running `intugle` outside of a Databricks notebook, you must provide full connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `databricks:` key.

**Example `profiles.yml`:**

```yaml
databricks:
  host: <your_databricks_host>
  http_path: <your_sql_warehouse_http_path>
  token: <your_personal_access_token>
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```
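
If the connection fails at build time, it can help to verify the same credentials directly with `databricks-sql-connector` (installed by the `databricks` extra). A minimal sketch; substitute the placeholder values from your `profiles.yml`:

```python
from databricks import sql

# Verify the SQL warehouse credentials used in profiles.yml.
# The placeholders are the same values you would put in that file.
with sql.connect(
    server_hostname="<your_databricks_host>",
    http_path="<your_sql_warehouse_http_path>",
    access_token="<your_personal_access_token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchone())
```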

### Connecting from a Databricks Notebook

When your code is executed within a Databricks notebook, the adapter automatically detects and uses the notebook's active Spark session for execution. However, it still requires a `profiles.yml` file to determine the target `schema` and `catalog` for your operations.

**Example `profiles.yml` for Notebooks:**

```yaml
databricks:
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```
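
Before running a build from a notebook, you can confirm which catalog and schema the active session points at, so they match your `profiles.yml`. A minimal sketch; `spark` is the session object Databricks predefines in notebooks:

```python
# `spark` is predefined in Databricks notebooks; no import needed.
row = spark.sql("SELECT current_catalog(), current_schema()").first()
print(f"Active catalog: {row[0]}, active schema: {row[1]}")
```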

## Usage

### Reading Data from Databricks

To include a Databricks table in your `SemanticModel`, define it in your input dictionary with `type: "databricks"` and use the `identifier` key to specify the table name.

:::caution Important
The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table name specified in the `identifier`.
:::

```python
from intugle import SemanticModel

datasets = {
    "CUSTOMERS": {
        "identifier": "CUSTOMERS",  # Must match the key above
        "type": "databricks"
    },
    "ORDERS": {
        "identifier": "ORDERS",  # Must match the key above
        "type": "databricks"
    }
}

# Initialize the semantic model
sm = SemanticModel(datasets, domain="E-commerce")

# Build the model as usual
sm.build()
```

### Materializing Data Products

When you use the `DataProduct` class with a Databricks connection, the resulting data product will be materialized as a new **view** directly within your target schema.
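
As a rough illustration of the flow, here is a sketch of building a data product against the model above. The `etl_config` shape (`name`, plus `fields` with dotted `id`s) follows the pattern in the quickstart notebooks, but treat these names as illustrative assumptions rather than a definitive API reference:

```python
from intugle import DataProduct

# Illustrative spec; the exact config schema is an assumption and may
# differ in your intugle version.
etl_config = {
    "name": "customer_orders",  # becomes the name of the materialized view
    "fields": [
        {"id": "CUSTOMERS.customer_id", "name": "customer_id"},
        {"id": "ORDERS.order_id", "name": "order_id"},
    ],
}

dp = DataProduct()
result = dp.build(etl_config)  # creates a view in the target schema
```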

### Deploying the Semantic Model

Once your semantic model is built, you can deploy it to Databricks using the `deploy()` method. This process syncs your model's intelligence to your physical tables by:

1. **Syncing Metadata:** It updates the comments on your physical Databricks tables and columns with the business glossaries from your `intugle` model. You can also sync tags.
2. **Setting Constraints:** It sets `PRIMARY KEY` and `FOREIGN KEY` constraints on your tables based on the relationships discovered in the model.

```python
# Deploy the model to Databricks
sm.deploy(target="databricks")

# You can also control which parts of the deployment to run
sm.deploy(
    target="databricks",
    sync_glossary=True,
    sync_tags=True,
    set_primary_keys=True,
    set_foreign_keys=True
)
```
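
To spot-check a deployment from a notebook, you can inspect the synced comments and constraints afterwards. A minimal sketch, assuming a Unity Catalog workspace where the predefined `spark` session and `system.information_schema` are available:

```python
# Table and column comments synced by deploy() appear in the output.
spark.sql("DESCRIBE TABLE EXTENDED CUSTOMERS").show(truncate=False)

# PRIMARY KEY / FOREIGN KEY constraints set by deploy() are listed in
# Unity Catalog's information schema (object names are stored lowercased).
spark.sql(
    """
    SELECT constraint_name, constraint_type
    FROM system.information_schema.table_constraints
    WHERE table_name = 'customers'
    """
).show()
```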

docsite/docs/examples.md

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@ For a detailed, hands-on introduction to the project, please see our quickstart
 | **Sports Media** | [`quickstart_sports_media.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_sports_media.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_sports_media.ipynb) |
 | **Databricks Unity Catalog [Health Care]** | [`quickstart_healthcare_databricks.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_healthcare_databricks.ipynb) | Databricks Notebook Only |
 | **Snowflake Horizon Catalog [ FMCG ]** | [`quickstart_fmcg_snowflake.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_fmcg_snowflake.ipynb) | Snowflake Notebook Only |
-| **Native Snowflake with Cortex Analyst [ Tech Manufacturing ]** | [`quickstart_native_snowflake.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_native_snowflake.ipynb) | Snowflake Notebook Only |
+| **Native Snowflake with Cortex Analyst [ Tech Manufacturing ]** | [`quickstart_native_snowflake.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_native_snowflake.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_native_snowflake.ipynb) |
+| **Native Databricks with AI/BI Genie [ Tech Manufacturing ]** | [`quickstart_native_databricks.ipynb`](https://github.com/Intugle/data-tools/blob/main/notebooks/quickstart_native_databricks.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Intugle/data-tools/blob/main/notebooks/quickstart_native_databricks.ipynb) |

 These datasets will take you through the following steps:

0 commit comments
