fix(sentry apps): Self-heal/regenerate broken ServiceHooks #96712
base: master
Conversation
```python
def regenerate_service_hook_for_installation(
    installation_id: int, servicehook_events: list[str] | None
) -> None:
```
This function would be used whenever we encounter the `SentryAppWebhookFailureReason.MISSING_SERVICEHOOK` or `SentryAppWebhookFailureReason.EVENT_NOT_IN_SERVCEHOOK` failure cases. In both cases we should end up with an installation that has a working ServiceHook, i.e. we use the Sentry app's event list as the source of truth for all installations.

This logic was put into a separate task because I didn't want a logically separate operation to potentially pollute the results/info of another task. It also lets us get better data on when this operation is run.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #96712      +/-   ##
==========================================
- Coverage   80.63%   80.15%   -0.49%
==========================================
  Files        8475     8483       +8
  Lines      373193   379104    +5911
  Branches    24205    24205
==========================================
+ Hits       300929   303857    +2928
- Misses      71891    74874    +2983
  Partials      373      373
```
```python
assert installation is not None, "Installation must exist to regenerate service hooks"
app_events = installation.sentry_app.events

if servicehook_events is None or set(servicehook_events) != set(app_events):
```
Thinking about this more, I don't know if we should even have these checks; I think we can assume the caller will have done the validation/check that the ServiceHook is in a bad state.
I agree with your intuition: I think this task should only be called if this if statement is true.
Can we add a metric to the task so we can monitor how often it triggers, and create a Redash dashboard/query to show us the burndown rate of the broken ServiceHooks?
```python
@@ -799,6 +799,38 @@ def send_webhooks(installation: RpcSentryAppInstallation, event: str, **kwargs:
    )


@instrumented_task(
    "sentry.sentry_apps.tasks.sentry_apps.regenerate_service_hook_for_installation",
    taskworker_config=TaskworkerConfig(
```
Just making sure we don't have to add anything extra from `TASK_OPTIONS` in the taskworker config.
Er, I'm not sure if we need to add anything else. Oh, I did bork the namespace though; I highkey just copied this from the other tasks.
```python
    installation_id: int, servicehook_events: list[str] | None
) -> None:
    installation = app_service.installation_by_id(id=installation_id)
    assert installation is not None, "Installation must exist to regenerate service hooks"
```
Nit: maybe we raise some integrity exception and add more context, i.e. installation_id, etc. Another idea is to add a `logger.info` with this context (it becomes a breadcrumb in the Sentry error).
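A minimal sketch of the `logger.info` breadcrumb idea. The `lookup_installation` helper and the dict-based store are hypothetical stand-ins for `app_service.installation_by_id`; the point is only that the context is logged before the assertion, so it survives as a breadcrumb if the assertion fires.

```python
import logging

logger = logging.getLogger("sentry.sentry_apps.service_hooks")


def lookup_installation(installation_id: int, installations: dict):
    # Log the context first: if the assertion below fires, the
    # installation_id is already attached as a breadcrumb on the
    # resulting Sentry error.
    logger.info(
        "regenerate_service_hook.lookup",
        extra={"installation_id": installation_id},
    )
    installation = installations.get(installation_id)
    assert installation is not None, (
        f"Installation {installation_id} must exist to regenerate service hooks"
    )
    return installation
```

Putting the id in the assertion message as well covers the case where the error is read outside Sentry (e.g. raw task logs).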
I think we can piggyback off the general task metric to see how often the task is triggered. As for a stat/visualization of the burndown, I was thinking we could track it via the open Sentry issues in the PR description: if those go down, we can correlate that to burning down the broken ServiceHooks.
Finally got around to doing this... In the past some Sentry Apps were updated but their ServiceHooks (read: webhooks) weren't updated properly for various reasons, leaving people with broken webhooks. That also caused these Sentry errors to surface to us.

This PR adds a task that will gradually regenerate the broken webhooks by re-running the `create_or_update_service_hooks_for_installation` function. That function is what should've been run during the initial webhook update but, for various reasons (timeouts, errors, etc.), didn't complete.
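The reason re-running it is safe can be sketched as an idempotent "create or update" over an event set. The in-memory store and `create_or_update_hooks` below are illustrative stand-ins, not Sentry's actual `create_or_update_service_hooks_for_installation`:

```python
# Hypothetical in-memory hook store: installation_id -> subscribed events.
hooks: dict = {}


def create_or_update_hooks(installation_id: int, app_events: list) -> set:
    # Create the hook if it never existed, otherwise overwrite its event
    # list. Running this again after a past timeout or partial failure
    # converges on the same end state, which is what makes the self-heal
    # task safe to trigger from the webhook failure paths.
    hooks[installation_id] = set(app_events)
    return hooks[installation_id]


# The original update failed in the past: the hook is simply missing.
create_or_update_hooks(1, ["issue.created", "issue.resolved"])
# Re-running (the self-heal task) converges to the same state.
create_or_update_hooks(1, ["issue.created", "issue.resolved"])
```

Because the operation overwrites rather than appends, a hook with a stale event list and a hook that is missing entirely both end up in the same correct state.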