Commit b8c8a27

fix: update tutorial 36 with correct pip install command
1 parent 80a4b99 commit b8c8a27

3 files changed

+68
-70
lines changed

tutorials/036 - Distributing Calls with Glue Interactive Sessions on Ray.ipynb

Lines changed: 68 additions & 70 deletions
@@ -2,83 +2,89 @@
 "cells": [
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"editable": true,
+"trusted": true
+},
 "source": [
 "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)\n",
 "\n",
 "# 36 - Distributing Calls on Glue Interactive sessions\n",
 "\n",
-"AWS SDK for pandas is pre-loaded into [AWS Glue interactive sessions](https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions-overview.html) with Ray kernel, making it by far the easiest way to experiment with the library at scale."
+"AWS SDK for pandas is pre-loaded into [AWS Glue interactive sessions](https://docs.aws.amazon.com/glue/latest/dg/is-using-ray.html) with Ray kernel, making it by far the easiest way to experiment with the library at scale."
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"editable": true,
+"trusted": true
+},
 "source": [
-"In AWS Glue Studio, choose `Jupyter Notebook` to create an AWS Glue interactive session:\n",
+"In AWS Glue Studio, choose `Notebook` to create an AWS Glue interactive session:\n",
 "\n",
 "![](_static/glue_is_create.png)\n",
 "\n",
 "Then select `Ray` as the kernel. The IAM role must trust the AWS Glue service principal.\n",
 "\n",
 "![](_static/glue_is_setup.png)\n",
 "\n",
-"Once the notebook is up and running you can import the library. Since we are running on AWS Glue with Ray, AWS SDK for pandas will automatically use the existing Ray cluster with no extra configuration needed."
-]
-},
-{
-"attachments": {},
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Install the library"
+"Once the notebook is up and running you can import the library. You can install `awswrangler` and `modin` as additional dependencies."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"execution_count": 16,
+"metadata": {
+"tags": [],
+"trusted": true,
+"vscode": {
+"languageId": "python"
+}
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Additional python modules to be included:\n",
+"awswrangler\n",
+"modin\n"
+]
+}
+],
 "source": [
-"!pip install \"awswrangler[modin]\""
+"%additional_python_modules awswrangler,modin"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 1,
 "metadata": {
-"editable": true,
-"trusted": true
+"tags": [],
+"trusted": true,
+"vscode": {
+"languageId": "python"
+}
 },
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Welcome to the Glue Interactive Sessions Kernel\n",
-"For more information on available magic commands, please type %help in any new cell.\n",
-"\n",
-"Please view our Getting Started page to access the most up-to-date information on the Interactive Sessions kernel: https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html\n",
-"Installed kernel version: 0.37.0 \n",
-"Authenticating with environment variables and user-defined glue_role_arn: arn:aws:iam::977422593089:role/AWSGlueMantaTests\n",
+"Authenticating with environment variables and user-defined glue_role_arn: arn:aws:iam::463623607974:role/service-role/AmazonSageMakerServiceCatalogProductsGlueRole\n",
 "Trying to create a Glue session for the kernel.\n",
 "Worker Type: Z.2X\n",
 "Number of Workers: 5\n",
-"Session ID: 309824f0-bad7-49d0-a2b4-e1b8c7368c5f\n",
+"Session ID: 32566e82-34d2-4db7-adac-cbee573e20bf\n",
 "Job Type: glueray\n",
 "Applying the following default arguments:\n",
-"--glue_kernel_version 0.37.0\n",
+"--glue_kernel_version 0.38.1\n",
 "--enable-glue-datacatalog true\n",
-"Waiting for session 309824f0-bad7-49d0-a2b4-e1b8c7368c5f to get into ready status...\n",
-"Session 309824f0-bad7-49d0-a2b4-e1b8c7368c5f has been created.\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"2022-11-21 16:24:03,136\tINFO worker.py:1329 -- Connecting to existing Ray cluster at address: 2600:1f10:4674:6822:5b63:3324:984:3152:6379...\n",
-"2022-11-21 16:24:03,144\tINFO worker.py:1511 -- Connected to Ray cluster. View the dashboard at \u001b[1m\u001b[32m127.0.0.1:8265 \u001b[39m\u001b[22m\n"
+"--auto-scaling-ray-min-workers 1\n",
+"--additional-python-modules awswrangler,modin\n",
+"Waiting for session 32566e82-34d2-4db7-adac-cbee573e20bf to get into ready status...\n",
+"Session 32566e82-34d2-4db7-adac-cbee573e20bf has been created.\n"
 ]
 }
 ],
@@ -88,43 +94,42 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 8,
 "metadata": {
-"trusted": true
-},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"Read progress: 100%|##########| 9/9 [00:10<00:00, 1.15s/it]\n",
-"UserWarning: When using a pre-initialized Ray cluster, please ensure that the runtime env sets environment variable __MODIN_AUTOIMPORT_PANDAS__ to 1\n"
-]
+"tags": [],
+"trusted": true,
+"vscode": {
+"languageId": "python"
 }
-],
+},
+"outputs": [],
 "source": [
-"df = wr.s3.read_csv(path=\"s3://nyc-tlc/csv_backup/yellow_tripdata_2021-0*.csv\")"
+"df = wr.s3.read_parquet(path=\"s3://ursa-labs-taxi-data/2017/\")"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 9,
 "metadata": {
-"trusted": true
+"tags": [],
+"trusted": true,
+"vscode": {
+"languageId": "python"
+}
 },
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" VendorID tpep_pickup_datetime ... total_amount congestion_surcharge\n",
-"0 1.0 2021-01-01 00:30:10 ... 11.80 2.5\n",
-"1 1.0 2021-01-01 00:51:20 ... 4.30 0.0\n",
-"2 1.0 2021-01-01 00:43:30 ... 51.95 0.0\n",
-"3 1.0 2021-01-01 00:15:48 ... 36.35 0.0\n",
-"4 2.0 2021-01-01 00:31:49 ... 24.36 2.5\n",
+" vendor_id pickup_at ... improvement_surcharge total_amount\n",
+"0 1 2017-01-09 11:13:28 ... 0.3 15.300000\n",
+"1 1 2017-01-09 11:32:27 ... 0.3 7.250000\n",
+"2 1 2017-01-09 11:38:20 ... 0.3 7.300000\n",
+"3 1 2017-01-09 11:52:13 ... 0.3 8.500000\n",
+"4 2 2017-01-01 00:00:00 ... 0.3 52.799999\n",
 "\n",
-"[5 rows x 18 columns]\n"
+"[5 rows x 17 columns]\n"
 ]
 }
 ],
@@ -133,7 +138,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -145,9 +149,9 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "awswrangler-v9JnknIF-py3.8",
+"display_name": "Glue PySpark",
 "language": "python",
-"name": "python3"
+"name": "glue_pyspark"
 },
 "language_info": {
 "codemirror_mode": {
@@ -156,14 +160,8 @@
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
-"name": "python",
-"pygments_lexer": "python3",
-"version": "3.8.5"
-},
-"vscode": {
-"interpreter": {
-"hash": "83297b058d59ee0acd247586c837429190a8258f15c0eea6234359f5557dde51"
-}
+"name": "Python_Glue_Session",
+"pygments_lexer": "python3"
 }
 },
 "nbformat": 4,
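
Review note: this commit replaces a `!pip install` cell with the `%additional_python_modules` magic. A reviewer who wants to confirm that no other code cell still shells out to pip could run a check along the following lines. This is a minimal sketch using only the standard library; the helper name `find_pip_install_cells` and the two in-memory notebooks are hypothetical and only mirror the cell changed above, they are not part of the repository.

```python
def find_pip_install_cells(notebook):
    """Return the indexes of code cells whose source has a `!pip install` line."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat stores source either as one string or as a list of lines.
        lines = source.splitlines() if isinstance(source, str) else source
        if any(line.lstrip().startswith("!pip install") for line in lines):
            hits.append(index)
    return hits


# Hypothetical, trimmed-down notebooks mirroring the cell before and after this commit.
old_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Install the library"]},
        {"cell_type": "code", "source": ['!pip install "awswrangler[modin]"']},
    ]
}
new_notebook = {
    "cells": [
        {"cell_type": "code", "source": ["%additional_python_modules awswrangler,modin"]},
    ]
}

old_hits = find_pip_install_cells(old_notebook)
new_hits = find_pip_install_cells(new_notebook)
print(old_hits, new_hits)
```

To check the real file, the notebook would be loaded with `json.load` and passed to the same helper.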

tutorials/_static/glue_is_create.png

-40.1 KB

tutorials/_static/glue_is_setup.png

-65 KB
