
Commit c689c60

add duckdb support (#1398)
1 parent 10d53b4 commit c689c60


18 files changed (+678 -817 lines)


doc/assets/diagram.png

2.55 KB

doc/assets/diagram.svg

Lines changed: 367 additions & 807 deletions

doc/index.md

Lines changed: 19 additions & 0 deletions
@@ -101,6 +101,7 @@ alt: Works with GeoPandas
 align: center
 ---
 :::
+
 :::{tab-item} Polars
 ```python
 import polars
@@ -116,6 +117,24 @@ align: center
 ---
 :::
 
+:::{tab-item} DuckDB
+```python
+import duckdb
+import hvplot.duckdb
+from bokeh.sampledata.autompg import autompg_clean as df
+
+df_duckdb = duckdb.from_df(df)
+table = df_duckdb.groupby(['origin', 'mfr'])['mpg'].mean().sort_values().tail(5)
+table.hvplot.barh('mfr', 'mpg', by='origin', stacked=True)
+```
+```{image} ./_static/home/pandas.gif
+---
+alt: Works with DuckDB
+align: center
+---
+```
+
+:::
 :::{tab-item} Intake
 ```python
 import hvplot.intake

doc/user_guide/Integrations.ipynb

Lines changed: 108 additions & 9 deletions
@@ -254,19 +254,13 @@
 },
 {
 "cell_type": "markdown",
-"id": "a46e377e-729a-4f99-b5d3-83b0736cb8a3",
+"id": "7474a792-2cfd-4139-a1cd-872f913fa07b",
 "metadata": {},
 "source": [
 ":::{note}\n",
 "Added in version `0.9.0`.\n",
-":::"
-]
-},
-{
-"cell_type": "markdown",
-"id": "7474a792-2cfd-4139-a1cd-872f913fa07b",
-"metadata": {},
-"source": [
+":::\n",
+"\n",
 ":::{important}\n",
 "While other data sources like `Pandas` or `Dask` have built-in support in HoloViews, as of version 1.17.1 this is not yet the case for `Polars`. You can track this [issue](https://github.com/holoviz/holoviews/issues/5939) to follow the evolution of this feature in HoloViews. Internally hvPlot simply selects the columns that contribute to the plot and casts them to a Pandas object using Polars' `.to_pandas()` method.\n",
 ":::"
@@ -327,6 +321,111 @@
 "df_polars['A'].hvplot.line(height=150)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "efc2f45e",
+"metadata": {},
+"source": [
+"#### DuckDB"
+]
+},
+{
+"cell_type": "markdown",
+"id": "db91860c",
+"metadata": {},
+"source": [
+":::{note}\n",
+"Added in version `0.11.0`.\n",
+":::"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "0d6460d0",
+"metadata": {},
+"outputs": [],
+"source": [
+"import numpy as np\n",
+"import pandas as pd\n",
+"\n",
+"df_pandas = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD')).cumsum()\n",
+"df_pandas.head(2)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "21638d45",
+"metadata": {},
+"outputs": [],
+"source": [
+"import hvplot.duckdb  # noqa\n",
+"import duckdb\n",
+"\n",
+"connection = duckdb.connect(':memory:')\n",
+"relation = duckdb.from_df(df_pandas, connection=connection)\n",
+"relation.to_view(\"example_view\");"
+]
+},
+{
+"cell_type": "markdown",
+"id": "40b56f16",
+"metadata": {},
+"source": [
+"`.hvplot()` supports [DuckDB](https://duckdb.org/docs/api/python/overview.html) `DuckDBPyRelation` and `DuckDBPyConnection` objects."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "f588e3fe",
+"metadata": {},
+"outputs": [],
+"source": [
+"relation.hvplot.line(y=['A', 'B', 'C', 'D'], height=150)"
+]
+},
+{
+"cell_type": "markdown",
+"id": "68a47856",
+"metadata": {},
+"source": [
+"`DuckDBPyRelation` is more efficient because the columns needed for the plot are selected inside DuckDB before the data is converted to a `pd.DataFrame`.\n",
+"\n",
+"When possible, use the `connection.sql()` method, which returns a `DuckDBPyRelation`, rather than `connection.execute()`, which returns a `DuckDBPyConnection`."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "214c60ee",
+"metadata": {},
+"outputs": [],
+"source": [
+"sql_expr = \"SELECT * FROM example_view WHERE A > 0 AND B > 0\"\n",
+"connection.sql(sql_expr).hvplot.line(y=['A', 'B'], hover_cols=[\"C\"], height=150) # subsets A, B, C"
+]
+},
+{
+"cell_type": "markdown",
+"id": "2a2f61d4",
+"metadata": {},
+"source": [
+"Alternatively, select only the desired columns directly in the SQL expression."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "5ce25c3d",
+"metadata": {},
+"outputs": [],
+"source": [
+"sql_expr = \"SELECT A, B, C FROM example_view WHERE A > 0 AND B > 0\"\n",
+"connection.execute(sql_expr).hvplot.line(y=['A', 'B'], hover_cols=[\"C\"], height=150)"
+]
+},
 {
 "cell_type": "markdown",
 "id": "25a6e724-6a84-4bff-9108-ac71dcfa9116",

doc/user_guide/Introduction.ipynb

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 "\n",
 "* [Pandas](https://pandas.pydata.org): DataFrame, Series (columnar/tabular data)\n",
 "* [Rapids cuDF](https://docs.rapids.ai/api/cudf/stable/): GPU DataFrame, Series (columnar/tabular data)\n",
+"* [DuckDB](https://www.duckdb.org/): DuckDB is a fast in-process analytical database\n",
 "* [Polars](https://www.pola.rs/): Polars is a fast DataFrame library/in-memory query engine (columnar/tabular data)\n",
 "* [Dask](https://www.dask.org): DataFrame, Series (distributed/out of core arrays and columnar data)\n",
 "* [XArray](https://xarray.pydata.org): Dataset, DataArray (labelled multidimensional arrays)\n",

envs/py3.10-tests.yaml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ dependencies:
 - dask
 - dask>=2021.3.0
 - datashader>=0.6.5
+- duckdb
 - fiona
 - fugue
 - fugue-sql-antlr>=0.2.0

envs/py3.11-docs.yaml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ dependencies:
 - colorcet>=2
 - dask>=2021.3.0
 - datashader>=0.6.5
+- duckdb
 - fiona
 - fugue
 - fugue-sql-antlr>=0.2.0

envs/py3.11-tests.yaml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ dependencies:
 - dask
 - dask>=2021.3.0
 - datashader>=0.6.5
+- duckdb
 - fiona
 - fugue
 - fugue-sql-antlr>=0.2.0

envs/py3.12-tests.yaml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ dependencies:
 - dask
 - dask>=2021.3.0
 - datashader>=0.6.5
+- duckdb
 - fiona
 - fugue
 - fugue-sql-antlr>=0.2.0

envs/py3.9-tests.yaml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ dependencies:
 - dask
 - dask>=2021.3.0
 - datashader>=0.6.5
+- duckdb
 - fiona
 - fugue
 - fugue-sql-antlr>=0.2.0
