@@ -134,6 +134,10 @@ Below are instructions for using each of the migration workflows described above
 you created to point to the Hive metastore. It is used to extract the Hive JDBC
 connection information using the native Spark library.
 
+- `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+  You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+  If not provided, `us-east-1` is used as the default.
+
 - `--database-prefix` (optional) set to a string prefix that is applied to the
   database name created in AWS Glue Data Catalog. You can use it as a way
   to track the origin of the metadata, and avoid naming conflicts. The default
@@ -164,7 +168,8 @@ If the above solutions don't apply to your situation, you can choose to first
 migrate your Hive metastore to Amazon S3 objects as a staging area, then run an ETL
 job to import the metadata from S3 to the AWS Glue Data Catalog. To do this, you need to
 have a Spark 2.1.x cluster that can connect to your Hive metastore and export
-metadata to plain files on S3.
+metadata to plain files on S3. The Hive metastore to S3 migration can also run
+as a Glue ETL job, if AWS Glue can directly connect to your Hive metastore.
 
 1. Make the MySQL connector jar available to the Spark cluster on the master and
    all worker nodes. Include the jar in the Spark driver class path as well
@@ -229,9 +234,12 @@ metadata to plain files on S3.
    Add the following parameters.
 
    - `--mode` set to `from-s3`
-   - `--database-input-path` set to the S3 path containing only databases.
-   - `--table-input-path` set to the S3 path containing only tables.
-   - `--partition-input-path` set to the S3 path containing only partitions.
+   - `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+     You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+     If not provided, `us-east-1` is used as the default.
+   - `--database-input-path` set to the S3 path containing only databases. For example: `s3://someBucket/output_path_from_previous_job/databases`
+   - `--table-input-path` set to the S3 path containing only tables. For example: `s3://someBucket/output_path_from_previous_job/tables`
+   - `--partition-input-path` set to the S3 path containing only partitions. For example: `s3://someBucket/output_path_from_previous_job/partitions`
 
    Also, because there is no need to connect to any JDBC source, the job doesn't
    require any connections.
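As an illustration of how the three input paths relate to the previous job's output path, the `from-s3` parameters could be derived like this (the bucket name and this helper are hypothetical, not part of the migration scripts):

```python
# Sketch: derive the from-s3 job parameters from the export job's output root.
# The bucket and path below are placeholders.
def from_s3_args(output_root, region="us-east-1"):
    """Build parameters for --mode from-s3 given the previous job's output path."""
    root = output_root.rstrip("/")
    return {
        "--mode": "from-s3",
        "--region": region,
        "--database-input-path": root + "/databases",
        "--table-input-path": root + "/tables",
        "--partition-input-path": root + "/partitions",
    }

print(from_s3_args("s3://someBucket/output_path_from_previous_job/"))
```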
@@ -315,6 +323,9 @@ metadata to plain files on S3.
    directly to a jdbc Hive Metastore
    - `--connection-name` set to the name of the AWS Glue connection
      you created to point to the Hive metastore. It is the destination of the migration.
+   - `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+     You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+     If not provided, `us-east-1` is used as the default.
    - `--database-names` set to a semi-colon(;) separated list of
      database names to export from Data Catalog.
@@ -333,7 +344,10 @@ metadata to plain files on S3.
    instructions above. Since the destination is now an S3 bucket instead of a Hive metastore,
    no connections are required. In the job, add the following parameters:
 
-   - `--mode` set to `to-S3`, which means the migration is to S3.
+   - `--mode` set to `to-s3`, which means the migration is to S3.
+   - `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+     You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+     If not provided, `us-east-1` is used as the default.
    - `--database-names` set to a semi-colon(;) separated list of
      database names to export from Data Catalog.
    - `--output-path` set to the S3 destination path.
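For illustration, the `to-s3` export parameters can be assembled like this (the database names, bucket, and helper function are hypothetical; the point is that `--database-names` is passed as a single semicolon-separated string):

```python
# Sketch: parameters for exporting Data Catalog metadata to S3 (--mode to-s3).
# Database names and the output bucket are placeholders.
def to_s3_args(database_names, output_path, region="us-east-1"):
    """Build parameters for --mode to-s3; database_names is a list of names."""
    return {
        "--mode": "to-s3",
        "--region": region,
        # Multiple databases are passed as one semicolon-separated value.
        "--database-names": ";".join(database_names),
        "--output-path": output_path,
    }

print(to_s3_args(["sales_db", "hr_db"], "s3://someBucket/export_output/"))
```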
@@ -365,8 +379,7 @@ metadata to plain files on S3.
 
 #### AWS Glue Data Catalog to another AWS Glue Data Catalog
 
-Currently, you cannot access an AWS Glue Data Catalog in another account.
-However, you can migrate (copy) metadata from the Data Catalog in one account to another. The steps are:
+You can migrate (copy) metadata from the Data Catalog in one account to another. The steps are:
 
 1. Enable cross-account access for an S3 bucket so that both source and target accounts can access it. See
    [the Amazon S3 documentation](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html#example-bucket-policies-use-case-1)
@@ -379,7 +392,7 @@ However, you can migrate (copy) metadata from the Data Catalog in one account to
 
 3. Upload the following scripts to an S3 bucket accessible from the target AWS account to be updated:
 
-       export_from_datacatalog.py
+       import_into_datacatalog.py
        hive_metastore_migration.py
 
 4. In the source AWS account, create a job on the AWS Glue console to extract metadata from the AWS Glue Data Catalog to S3.
@@ -391,7 +404,10 @@ However, you can migrate (copy) metadata from the Data Catalog in one account to
 
    Add the following parameters:
 
-   - `--mode` set to `to-S3`, which means the migration is to S3.
+   - `--mode` set to `to-s3`, which means the migration is to S3.
+   - `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+     You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+     If not provided, `us-east-1` is used as the default.
    - `--database-names` set to a semi-colon(;) separated list of
      database names to export from Data Catalog.
    - `--output-path` set to the S3 destination path that you configured with **cross-account access**.
@@ -407,10 +423,12 @@ However, you can migrate (copy) metadata from the Data Catalog in one account to
    Add the following parameters.
 
    - `--mode` set to `from-s3`
+   - `--region` the AWS region for the Glue Data Catalog, for example, `us-east-1`.
+     You can find a list of Glue supported regions here: http://docs.aws.amazon.com/general/latest/gr/rande.html#glue_region.
+     If not provided, `us-east-1` is used as the default.
    - `--database-input-path` set to the S3 path containing only databases.
    - `--table-input-path` set to the S3 path containing only tables.
    - `--partition-input-path` set to the S3 path containing only partitions.
 
 6. (Optional) Manually delete the temporary files generated in the S3 folder. Also, remember to revoke the
    cross-account access if it's not needed anymore.
-