-
Notifications
You must be signed in to change notification settings - Fork 1
added retry logic and made sure only errors on last retry #1986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds retry-aware logging middleware to Faktory job processing, ensuring that job failures only log as errors on the final retry attempt, while earlier failures log as warnings.
Key changes:
- Implemented
RetryAwareLoggingmiddleware that distinguishes between retriable failures (warnings) and final failures (errors) - Added comprehensive test coverage for the retry-aware logging behavior across different retry scenarios
- Integrated the middleware into the worker manager and added a minor logging improvement to the digest cron job
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/worker/worker.go | Added default max retries constant and registered the retry-aware logging middleware |
| pkg/worker/job_utilities.go | Implemented the RetryAwareLogging middleware function with retry count logic |
| pkg/worker/job_utilities_test.go | Added comprehensive tests for retry-aware logging behavior including edge cases |
| pkg/worker/digest_cron_job.go | Minor refactoring to add day parameter to logging for better observability |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
Thanks Peyton, still looking, but I'm running into some issues testing I wanted to alert you to while I continue review. Something is panicking, it looks like it's from the new logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Peyton232 , I don't think this will address the current issue with translated audit logging an error, will it? the error is logged currently from the translated audit package. We pass a logger, but we are calling the error method directly there. Example from recent error message. I think this just adds additional logging, but doesn't conditionally silence logged errors
{"level":"error","ts":1760992321.4104438,"caller":"translatedaudit/translated_audit.go:66","msg":"issue saving translated audit.","app_section":"faktory","JID":"0Il7U53WmIU5wD-K","job_type":"TranslateAuditJob","traceID":"85ab5c78-d9cc-41af-aade-4e5ae5004226","change_id":12230,"translated_audit_queue_id":"20d79530-9201-4b61-99ad-52e3a1355f01","error":"issue creating new TranslatedAuditChange object: dbErr: message: issue executing named statement, error pq: insert or update on table \"translated_audit\" violates foreign key constraint \"translated_audit_model_plan_id_fkey\"","stacktrace":"[github.com/cms-enterprise/mint-app/pkg/translatedaudit.TranslateAudit](http://github.com/cms-enterprise/mint-app/pkg/translatedaudit.TranslateAudit)\n\t/mint/pkg/translatedaudit/translated_audit.go:66\[ngithub.com/cms-enterprise/mint-app/pkg/translatedaudit.TranslateAuditJobByID](http://ngithub.com/cms-enterprise/mint-app/pkg/translatedaudit.TranslateAuditJobByID)\n\t/mint/pkg/translatedaudit/translated_audit_job_utitilities.go:35\[ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).TranslateAuditJob](http://ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).TranslateAuditJob)\n\t/mint/pkg/worker/translate_audit_job.go:57\[ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).Work.JobWithPanicProtection.func1](http://ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).Work.JobWithPanicProtection.func1)\n\t/mint/pkg/worker/job_utilities.go:19\[ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).Work.(*Manager).Register.func2](http://ngithub.com/cms-enterprise/mint-app/pkg/worker.(*Worker).Work.(*Manager).Register.func2)\n\t/go/pkg/mod/github.com/contribsys/faktory_worker_go@v1.7.0/manager.go:49\[ngithub.com/contribsys/faktory_worker_go.dispatch](http://ngithub.com/contribsys/faktory_worker_go.dispatch)\n\t/go/pkg/mod/github.com/contribsys/faktory_worker_go@v1.7.0/middleware.go:19\[ngithub.com/contribsys/faktory_worker_go.(*Manager).dispatch](http://ngithub.com/contribsys/faktory_worker_go.(*Manager).dispatch)\n\t/go/pkg/mod/github.com/contribsys/faktory_worker_go@v1.7.0/manager.go:66\[ngithub.com/contribsys/faktory_worker_go.processOne](http://ngithub.com/contribsys/faktory_worker_go.processOne)\n\t/go/pkg/mod/github.com/contribsys/faktory_worker_go@v1.7.0/runner.go:159\[ngithub.com/contribsys/faktory_worker_go.process](http://ngithub.com/contribsys/faktory_worker_go.process)\n\t/go/pkg/mod/github.com/contribsys/faktory_worker_go@v1.7.0/runner.go:101"}
MINT-3448
Description
This change involves add retry logic to only error on the final retry of a faktory job and instead throw warnings up until the final retry
How to test this change
This does take a little bit since this is on cron job and also Faktory has exponential backoff on jobs
PR Author Checklist
PR Reviewer Guidelines