Commit 94c55d8

Add startupProbe to otel-agent container for graceful startup
The otel-agent container can take extra time to bind to port 13133, especially when the cluster is under heavy load or resources are constrained. Without a startup probe, the kubelet may restart the container prematurely if a liveness probe fails during this slow startup, which can lead to CrashLoopBackOff. Failing readiness probes do not trigger restarts, but they keep the pod marked as not ready. Adding a startupProbe gives the otel-agent a dedicated grace period of up to 5 minutes (30 failures at 10-second intervals) to initialize and bind to its health check port. Liveness and readiness checks are held back until the startup probe succeeds, so the container is not restarted because of transient startup delays.
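To make the grace period concrete, here is a minimal sketch of how the three probes could sit together on the otel-agent container. The startupProbe values come from this commit. The image reference, the livenessProbe block, and the readiness and liveness timing values are illustrative assumptions and are not taken from the manifests.

```yaml
# Sketch only: startupProbe values are from this commit; the image and the
# readiness/liveness timings are assumptions for illustration.
containers:
- name: otel-agent
  image: example.invalid/otel-agent:latest  # hypothetical image reference
  ports:
  - containerPort: 13133  # health check port probed below
  startupProbe:
    httpGet:
      path: /
      port: 13133
    # Startup budget is roughly
    #   initialDelaySeconds + failureThreshold * periodSeconds
    #   = 0 + 30 * 10s = 300s (about 5 minutes).
    failureThreshold: 30
    periodSeconds: 10
    initialDelaySeconds: 0
  # Readiness and liveness probes are held back until the startup probe
  # succeeds once; after that the kubelet runs them on their own schedule.
  readinessProbe:
    httpGet:
      path: /
      port: 13133
    periodSeconds: 10      # assumed value
  livenessProbe:
    httpGet:
      path: /
      port: 13133
    periodSeconds: 10      # assumed value
    failureThreshold: 3    # assumed value
```

If the startup probe never succeeds within the budget, the kubelet kills the container and applies the pod's restartPolicy, so the budget should comfortably exceed the slowest expected startup.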
1 parent f9b4977 commit 94c55d8

3 files changed: 21 additions, 0 deletions

manifests/templates/reconciler-manager-configmap.yaml

Lines changed: 7 additions & 0 deletions
@@ -195,6 +195,13 @@ data:
   volumeMounts:
   - name: otel-agent-config-reconciler-vol
     mountPath: /conf
+  startupProbe:
+    httpGet:
+      path: /
+      port: 13133
+    failureThreshold: 30 # Allow up to 5 minutes (30 * 10s)
+    periodSeconds: 10
+    initialDelaySeconds: 0
   readinessProbe:
     httpGet:
       path: /

manifests/templates/reconciler-manager.yaml

Lines changed: 7 additions & 0 deletions
@@ -91,6 +91,13 @@ spec:
   volumeMounts:
   - name: otel-agent-config-vol
     mountPath: /conf
+  startupProbe:
+    httpGet:
+      path: /
+      port: 13133
+    failureThreshold: 30 # Allow up to 5 minutes (30 * 10s)
+    periodSeconds: 10
+    initialDelaySeconds: 0
   readinessProbe:
     httpGet:
       path: /

manifests/templates/resourcegroup-manifest.yaml

Lines changed: 7 additions & 0 deletions
@@ -275,6 +275,13 @@ spec:
   - containerPort: 55678
   - containerPort: 8888
   - containerPort: 13133
+  startupProbe:
+    httpGet:
+      path: /
+      port: 13133
+    failureThreshold: 30 # Allow up to 5 minutes (30 * 10s)
+    periodSeconds: 10
+    initialDelaySeconds: 0
   readinessProbe:
     httpGet:
       path: /
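
The same five-minute budget can be reached with other period and threshold combinations. As a hedged sketch that is not part of this commit, halving periodSeconds and doubling failureThreshold keeps the 300s ceiling while letting the kubelet notice a successful start sooner, at the cost of probing more often.

```yaml
# Hypothetical alternative, not in this commit: same 300s budget, finer polling.
startupProbe:
  httpGet:
    path: /
    port: 13133
  failureThreshold: 60  # 60 * 5s = 300s
  periodSeconds: 5
```

Whether the extra probe traffic is worthwhile depends on how quickly the otel-agent usually binds to its health check port.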
