
Deployment Errors - Master Issue #32

@neebs12

Overview

Errors and Resolution

  • lambda-role was already made: it had to be deleted and the deployment rerun
    • Note: this happens when the initial home deployment fails and a destroy is not run
  • In the init process of the tester engine - IF the S3 bucket cannot be reached to get the script, the tester immediately executes - ie: it skips the synchronization process.
    • Requirement: the tester engine contains a "default" testing script, so synchronization still needs to occur even when the fetch fails - see the sketch after this list.
  • test-completion message is sent but containers remain running
    • Note: line 45 of aws/lambda/orchestrator/index.js is commented out while we are still developing - uncomment it if you want to test
  • difficult to know whether services are left running after a destroy: resource group tags? (probably future work)
    • CDK S3 buckets and the CloudFormation stack remain from the bootstrapping process
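
A minimal sketch of the required init behavior, assuming the AWS SDK v3 S3 client; the bucket/key names, the default-script path, and the syncWithPeers/runTest helpers are hypothetical placeholders, not the repo's actual identifiers:

```ts
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { readFileSync } from "node:fs";

const s3 = new S3Client({});

// Placeholder stubs for the engine's real synchronization and run steps.
async function syncWithPeers(): Promise<void> {}
function runTest(script: string): void {}

// Fetch the test script from S3; on failure, fall back to the bundled
// default script instead of skipping ahead.
async function loadScript(bucket: string, key: string): Promise<string> {
  try {
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    return await Body!.transformToString();
  } catch {
    return readFileSync("./default-script.js", "utf8"); // assumed default path
  }
}

async function init() {
  const script = await loadScript("tester-scripts-bucket", "test-script.js"); // placeholder names
  await syncWithPeers(); // synchronization must still occur, even on the fallback path
  runTest(script);
}
```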

Resolved

  • failed to create 1 of 4 Timestream tables: RateLimitExceeded error
    • see PR @local - 33
    • Solution: used a for loop instead of .forEach so the writes run sequentially, and added a further 500ms delay before each table-creation write to ensure RateLimitExceeded is not hit - see the serialized-creation sketch after this list. Implemented in a soon-to-be-made PR; the specific file is /scripts/utils/clearTimestream.js. @A-Thresher feel free to test again. I have tested with 4 regions (thus 8 tables in the current schema) and it works as required.
    • Note: we attempted to solve the problem by initializing the tables in the CDK. However, we cannot guarantee that the database is created BEFORE the tables within the CDK without significantly changing the current deployment workflow. This version of the solution was tried but errored out - thus it was not implemented.
  • tester count is hardcoded: it should be determined from VU numbers
    • see PR @local - 33
    • This can be solved using a .ts script with a function that specifies the max VUs per container - see the solution in the comments and the VU-calculation sketch after this list.
  • failed to delete Timestream tables in parallel mode
    • see PR @local - 33
    • Note: this may be because there are now "many" tables, and await timestreamWriteClient.send(new DeleteTableCommand({...})) in /scripts/utils/clearTimestream.js is not guaranteed to delete a table - it only requests the deletion. Consider a recursive function whose base case is no tables remaining, with a sleep of 500 to 1000ms to also avoid any potential rate-limit errors from queries to the client - see the polling sketch after this list.
    • Note: implemented a solution in a soon-to-be-made PR - the specific file is /scripts/utils/clearTimestream.js. @A-Thresher feel free to test again. I have tested with 4 regions (thus 8 tables in the current schema) and it works as required.
  • Failed to create an S3 bucket @home even though the home deployment ran normally.
    • see PR @local - 33. Added resiliency to S3 bucket creation and teardown - tested E2E (2 remote regions) with the S3 bucket deliberately commented out of the CDK, although S3 bucket deployment is still included in the CDK in the final production version
    • Note: destroyed home and re-ran again. No issue - the error did not come up again.
  • Failed to build one S3 bucket: removed CDKToolkit in CloudFormation and reran.
    • Note: This S3 bucket was the home region's bootstrap bucket. Remote region buckets were created fine.
    • Solution: this has been resolved for the moment. A workaround is in place where the S3 bucket is created if the CDK bootstrapping process fails to make it - see the bucket-fallback sketch after this list. However, this relies on the bucket having a consistent name (potentially tied to the CDK version used).
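
For the RateLimitExceeded item above, a minimal sketch of the serialized table creation, assuming the AWS SDK v3 Timestream Write client; createTablesSerially and the surrounding names are illustrative, not the repo's actual code:

```ts
import {
  TimestreamWriteClient,
  CreateTableCommand,
} from "@aws-sdk/client-timestream-write";

const client = new TimestreamWriteClient({});
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// A for..of loop awaits each write before starting the next; .forEach would
// fire all the requests at once and trip the rate limit.
async function createTablesSerially(databaseName: string, tableNames: string[]) {
  for (const tableName of tableNames) {
    await client.send(
      new CreateTableCommand({ DatabaseName: databaseName, TableName: tableName })
    );
    await sleep(500); // extra 500ms buffer so RateLimitExceeded is not hit
  }
}
```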
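
For the hardcoded tester count, a sketch of the kind of .ts helper described above; MAX_VU_PER_CONTAINER is an assumed capacity figure, not a value from the repo:

```ts
// Assumed per-container capacity - tune to what one tester container can sustain.
const MAX_VU_PER_CONTAINER = 200;

// One container per full (or partial) slice of MAX_VU_PER_CONTAINER virtual users.
export function testerCountFor(totalVUs: number): number {
  return Math.max(1, Math.ceil(totalVUs / MAX_VU_PER_CONTAINER));
}

// e.g. testerCountFor(500) === 3 when each container handles up to 200 VUs
```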
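
For the parallel-delete failure, a sketch of the recursive delete-and-poll idea from the note above; this is a sketch under those assumptions, not the actual /scripts/utils/clearTimestream.js implementation:

```ts
import {
  TimestreamWriteClient,
  DeleteTableCommand,
  ListTablesCommand,
} from "@aws-sdk/client-timestream-write";

const client = new TimestreamWriteClient({});
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Recursively request deletion of every remaining table, then re-check; the
// base case is an empty table list. Sleeps keep the calls under the rate limit.
async function deleteAllTables(databaseName: string): Promise<void> {
  const { Tables } = await client.send(
    new ListTablesCommand({ DatabaseName: databaseName })
  );
  if (!Tables || Tables.length === 0) return; // base case: nothing left

  for (const table of Tables) {
    try {
      await client.send(
        new DeleteTableCommand({ DatabaseName: databaseName, TableName: table.TableName })
      );
    } catch {
      // table may already be mid-deletion; ignore and re-check on the next pass
    }
    await sleep(500); // 500-1000ms between requests
  }
  await sleep(1000);
  return deleteAllTables(databaseName);
}
```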
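
For the bootstrap-bucket workaround, a sketch of creating the bucket only when the CDK bootstrap failed to; ensureBootstrapBucket is hypothetical, and as the note above says, the real bucket name may be tied to the CDK version:

```ts
import {
  S3Client,
  HeadBucketCommand,
  CreateBucketCommand,
  BucketLocationConstraint,
} from "@aws-sdk/client-s3";

// Create the bootstrap bucket only if it is missing; the name must match what
// the CDK expects, which may depend on the CDK version in use.
async function ensureBootstrapBucket(bucketName: string, region: string) {
  const s3 = new S3Client({ region });
  try {
    await s3.send(new HeadBucketCommand({ Bucket: bucketName }));
    return; // bucket already exists - nothing to do
  } catch {
    // missing (or inaccessible): fall through and try to create it
  }
  await s3.send(
    new CreateBucketCommand({
      Bucket: bucketName,
      // us-east-1 rejects an explicit LocationConstraint
      ...(region !== "us-east-1" && {
        CreateBucketConfiguration: {
          LocationConstraint: region as BucketLocationConstraint,
        },
      }),
    })
  );
}
```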
