3 files changed: +28 -0 lines changed
 ## master
 * [ENHANCEMENT] Add bigger tenants and configure default compactor tenant shards
+* [ENHANCEMENT] Add alert `CortexCompactorWriteVisitMarkerIsFailing` to monitor compactors

 ## 1.17.1 / 2024-10-23
 * [CHANGE] Use cortex v1.17.1

               ||| % $._config,
             },
           },
+          {
+            // Alert if compactors are not able to update the visit marker.
+            alert: 'CortexCompactorBlockVisitMarkerIsFailing',
+            'for': '2h',
+            expr: |||
+              sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0
+            ||| % $._config.job_names,
+            labels: {
+              severity: 'critical',
+            },
+            annotations: {
+              message: |||
+                Cortex compactors are not able to update the visit marker. Double-check the logs to see what is happening.
+              |||,
+            },
+          },
         ],
       },
     ],
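
The `expr` above is a Jsonnet text block formatted with `% $._config.job_names`. Jsonnet's `%` operator follows Python-style string formatting, so the substitution can be sketched in Python. This is a minimal illustration, assuming the compactor job name is `compactor`; the real value of `$._config.job_names.compactor` is defined elsewhere in the mixin and is not shown in this diff.

```python
# Sketch of how the %(compactor)s placeholder in the alert expression is filled in.
# ASSUMPTION: the job name "compactor" is hypothetical; the actual value comes
# from $._config.job_names in the mixin configuration.
job_names = {"compactor": "compactor"}

# Same expression as in the alert rule, written as a plain template string.
expr_template = (
    "sum(increase(cortex_compactor_block_visit_marker_write_failed"
    '{job=~".+/%(compactor)s"}[2h]))>0'
)


def render_expr(template: str, names: dict) -> str:
    """Apply Python-style named % formatting, mirroring Jsonnet's `%` operator."""
    return template % names


rendered = render_expr(expr_template, job_names)
print(rendered)
```

The rendered query matches any job whose name ends in `/compactor` (for example a namespace-prefixed job), which is why the selector uses the regex matcher `=~` rather than an exact match.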

@@ -379,6 +379,17 @@ How to **investigate**:
 - Ensure ingesters are successfully shipping blocks to the storage
 - Look for any error in the compactor logs
 
+### CortexCompactorWriteVisitMarkerIsFailing
+
+This alert only applies to compactors when shuffle sharding is in use.
+It fires if the compactor is not able to update the visit marker (across all tenants).
+The marker file is a very small JSON file, so updating it should not normally fail.
+
+How to **investigate**:
+- Check the compactor logs; they should show the exact reason for the failure
+- If you see `context canceled` or other timeout errors in the logs,
+  consider increasing `-compactor.compaction-visit-marker-timeout` and `-compactor.compaction-visit-marker-file-update-interval`.
+
 ### CortexCompactorHasNotSuccessfullyRunCompaction
 
 This alert fires if the compactor is not able to successfully compact all discovered compactable blocks (across all tenants).