
Commit b7f7170

docs: Added databricks notebook
1 parent a2ab608

File tree

2 files changed: +126 -1 lines changed

.gitignore

Lines changed: 49 additions & 1 deletion
@@ -1 +1,49 @@
-/.project
+target/
+!.mvn/wrapper/maven-wrapper.jar
+!**/src/main/**/target/
+!**/src/test/**/target/
+
+### IntelliJ IDEA ###
+*.iws
+*.iml
+*.ipr
+
+### Eclipse ###
+.apt_generated
+.classpath
+.factorypath
+.project
+.settings
+.springBeans
+.sts4-cache
+
+### NetBeans ###
+/nbproject/private/
+/nbbuild/
+/dist/
+/nbdist/
+/.nb-gradle/
+!**/src/main/**/build/
+!**/src/test/**/build/
+
+### VS Code ###
+.vscode/
+
+### Mac OS ###
+.DS_Store
+pom.xml.tag
+pom.xml.releaseBackup
+pom.xml.versionsBackup
+pom.xml.next
+release.properties
+dependency-reduced-pom.xml
+buildNumber.properties
+.mvn/timing.properties
+# https://github.com/takari/maven-wrapper#usage-without-binary-jar
+.idea
+bin
+build
+.gradle
+out
+dbdata
+/docker-compose-local.yml

redis-spark-notebook.py

Lines changed: 77 additions & 0 deletions
# Databricks notebook source
# MAGIC %md # Connect Spark with Redis Cloud
# MAGIC Spark and Redis together unlock powerful capabilities for data professionals. This guide demonstrates how to integrate these technologies for enhanced analytics, real-time processing, and machine learning applications.
# MAGIC
# MAGIC In this hands-on notebook, you'll learn how to make efficient use of Redis data structures alongside Spark's distributed computing framework. You'll see firsthand how to extract data from Redis, process it in Spark, and write results back to Redis for application use. Key topics include:
# MAGIC 1. Setting up the Spark-Redis connector in Databricks
# MAGIC 2. Writing data to Redis from Spark
# MAGIC 3. Reading data from Redis for application access
# MAGIC
# MAGIC ## Databricks Cluster Setup with Redis Connector
# MAGIC
# MAGIC 1. Set up a new Databricks cluster
# MAGIC 1. Go to the cluster's **Libraries** section
# MAGIC 1. Select **Install New**
# MAGIC 1. Choose **Maven** as your source and click **Search Packages**
# MAGIC 1. Enter `redis-spark-connector` and select `com.redis:redis-spark-connector:x.y.z`
# MAGIC 1. Finalize by clicking **Install** <br/>
# MAGIC Want to explore the connector's full capabilities? Check the [detailed documentation](https://redis-field-engineering.github.io/redis-spark)
# MAGIC
# MAGIC ## Loading Test Data into Spark
# MAGIC
# MAGIC In this step, you import a CSV file into your Unity Catalog volume. This is a shortened version of the [Import and visualize CSV data](https://docs.databricks.com/aws/en/getting-started/import-visualize-data) notebook.
# MAGIC 1. Replace `<catalog_name>`, `<schema_name>`, and `<volume_name>` with the catalog, schema, and volume names for a Unity Catalog volume. Optionally replace the `table_name` value with a table name of your choice.

# COMMAND ----------

catalog = "<catalog_name>"
schema = "<schema_name>"
volume = "<volume_name>"
download_url = "https://health.data.ny.gov/api/views/jxy9-yhdk/rows.csv"
file_name = "baby_names.csv"
table_name = "baby_names"

# Copy the CSV into the Unity Catalog volume, then parse it into a DataFrame,
# inferring column types from the data.
path_volume = f"/Volumes/{catalog}/{schema}/{volume}"
path_table = f"{catalog}.{schema}"
dbutils.fs.cp(download_url, f"{path_volume}/{file_name}")
df = spark.read.csv(f"{path_volume}/{file_name}", header=True, inferSchema=True, sep=",")
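
# COMMAND ----------

# MAGIC %md Optional sanity check: confirm the CSV parsed with the expected columns before writing anything to Redis.

# COMMAND ----------

# Print the inferred schema and preview the first rows in a Databricks results table.
df.printSchema()
display(df)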

# COMMAND ----------

# MAGIC %md ## Setting Up Redis Cloud Environment
# MAGIC
# MAGIC Redis Cloud offers a fully managed Redis service that is ideal for this integration. Follow these steps:
# MAGIC
# MAGIC 1. Register for a [Redis Cloud account](https://redis.io/cloud/)
# MAGIC 1. Follow the [quickstart guide](https://redis.io/docs/latest/operate/rc/rc-quickstart/) to create a free-tier database
# MAGIC
# MAGIC ## Configuring Spark with Redis Connection Details
# MAGIC
# MAGIC 1. From your Redis Cloud database dashboard, find your connection endpoint under **Connect**. The string follows this pattern: `redis://<user>:<pass>@<host>:<port>`
# MAGIC 1. In Databricks, open your cluster settings and locate **Advanced Options**. Under **Spark**, in the **Spark config** text area, add your Redis connection string as both the `spark.redis.read.connection.uri` and `spark.redis.write.connection.uri` parameters, as sketched below. This configuration applies to all notebooks that use the cluster.
# MAGIC
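# MAGIC For example, the two lines in **Spark config** might look like this (a sketch; `default`, `<password>`, `<host>`, and `<port>` are placeholders for your own database's credentials):
# MAGIC ```
# MAGIC spark.redis.read.connection.uri redis://default:<password>@<host>:<port>
# MAGIC spark.redis.write.connection.uri redis://default:<password>@<host>:<port>
# MAGIC ```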
# MAGIC
# MAGIC ## Writing Data to Redis
# MAGIC
# MAGIC Let's use the `df` test data we imported earlier and write it to Redis.
# MAGIC

# COMMAND ----------

# Write each row as a Redis hash in the "baby" keyspace, keyed by the "First Name" column.
(df.write.format("redis")
    .mode("overwrite")
    .option("type", "hash")
    .option("keyspace", "baby")
    .option("key", "First Name")
    .save())
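
# COMMAND ----------

# MAGIC %md Each row should now be stored as a Redis hash. Assuming the connector uses a `keyspace:key` naming scheme (check the [connector documentation](https://redis-field-engineering.github.io/redis-spark) for the exact convention), a row for the name EMMA would land under a key like `baby:EMMA`, which you can inspect with the standard Redis command `HGETALL baby:EMMA`.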

# COMMAND ----------

# MAGIC %md ## Exploring Your Redis Data
# MAGIC Examine the keys and values you've created using **Redis Insight**, Redis' visual data browser. From your Redis Cloud database dashboard, click **Redis Insight** and explore the data imported from Spark.
# MAGIC
# MAGIC ## Reading from Redis
# MAGIC
# MAGIC We can now read the data back from Redis with the following cell.
# MAGIC

# COMMAND ----------

# Read the data back through the connector, using the cluster-level connection settings.
redisDF = spark.read.format("redis").load()
display(redisDF)
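
# COMMAND ----------

# MAGIC %md Once loaded, `redisDF` is an ordinary DataFrame. As a sketch of downstream use, assuming the source CSV's `Count` column came back from Redis (hash fields return as strings, so cast before comparing), filter for the more common names:

# COMMAND ----------

from pyspark.sql import functions as F

# Keep rows where the (string) Count field, cast to int, exceeds 100.
popular = redisDF.filter(F.col("Count").cast("int") > 100)
display(popular)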

# COMMAND ----------
