From 45f14e69cac8b0e0fed0b7c9f7bf8c3182f33ac9 Mon Sep 17 00:00:00 2001
From: Sedky
Date: Mon, 18 Aug 2025 15:42:10 -0400
Subject: [PATCH] Update health check documentation to include readiness checks and improve clarity on usage and configuration. Renamed section titles and expanded content to cover both liveness and readiness health checks for the Tyk Gateway.

---
 .../ensure-high-availability/health-check.md | 280 +++++++++---------
 1 file changed, 147 insertions(+), 133 deletions(-)

diff --git a/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md b/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
index bfcb14281f..a09690aff3 100644
--- a/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
+++ b/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
@@ -1,33 +1,53 @@
 ---
-title: "Liveness Health Checks"
+title: "Health Checks"
 date: 2025-02-10
-keywords: ["health check", "liveness health check", "Tyk Gateway", "Tyk Dashboard", "MDCB", "load balancer", "Kubernetes liveness probe"]
-description: "How to set up liveness health checks for the Tyk Gateway to ensure high availability and monitor the status of components like Redis, Dashboard, and RPC."
+keywords: ["health check", "liveness health check", "readiness health check", "Tyk Gateway", "Tyk Dashboard", "MDCB", "load balancer", "Kubernetes liveness probe", "Kubernetes readiness probe"]
+description: "How to set up liveness and readiness health checks for the Tyk Gateway to ensure high availability and monitor the status of components like Redis, Dashboard, and RPC."
 aliases:
   - /tyk-rest-api/health-checking
 ---
 
-## Set Up Liveness Health Checks
+## Overview
 
-Health checks are extremely important in determining the status of an
-application - in this instance, the Tyk Gateway. Without them, it can be hard to
-know the actual state of the Gateway.
+Tyk Gateway provides two health check endpoints to help you monitor and manage your API gateway:
 
-Depending on your configuration, the Gateway could be using a few components:
+### Quick Reference
 
-- The Tyk Dashboard.
-- RPC
-- Redis (compulsory).
+| Endpoint | Purpose | When to Use | HTTP Response |
+|----------|---------|-------------|---------------|
+| `/hello` | **Liveness check** | Load balancers, basic monitoring | Always 200 OK |
+| `/ready` | **Readiness check** | Kubernetes, traffic routing decisions | 200 OK when ready, 503 when not |
 
-Any of these components could go down at any given point and it is useful to
-know if the Gateway is currently usable or not. A good usage of the health
-check endpoint is for the configuration of a load balancer to multiple instances of the Gateway or
-as a Kubernetes liveness probe.
+### Which endpoint should I use?
 
-The following component status will not be returned:
+- **Use `/hello` for**: Load balancers, basic uptime monitoring, general health checks
+- **Use `/ready` for**: Kubernetes readiness probes, deciding when to route traffic to a new Gateway instance
+
+## What Gets Monitored
+
+The health check endpoints monitor these critical Gateway dependencies:
+
+✅ **Monitored Components:**
+- **Redis** (required) - Data storage and caching
+- **Tyk Dashboard** (if configured) - API management interface
+- **RPC connection** (for MDCB setups) - Multi-data center communication
+
+
+### Kubernetes Deployments
+```yaml
+# Liveness probe - restarts pod if Gateway process is dead
+livenessProbe:
+  httpGet:
+    path: /hello
+    port: 8080
+
+# Readiness probe - removes from service when not ready
+readinessProbe:
+  httpGet:
+    path: /ready
+    port: 8080
+```
 
-* MongDB or SQL
-* Tyk Pump
 
 {{< note success >}}
 **Note**
@@ -73,173 +93,167 @@ The following status levels can be returned in the JSON response.
 
 - **fail**: Indicates that Redis AND the Tyk Dashboard are unavailable, and can and indicate other failures. The impact is high (i.e. no configuration changes are available for API/policies/keys, no quotas are applied, and no analytics).
 
-## Configure health check
+## The `/ready` Endpoint (Readiness Check)
+
+Use this endpoint when you need to know if the Gateway is **actually ready** to handle API traffic.
+
+### What it checks
+- ✅ Redis is connected and working
+- ✅ APIs have been loaded successfully at least once
 
-By default, the liveness health check runs on the `/hello` path. But
-it can be configured to run on any path you want to set. For example:
+### How it responds
+- **Gateway is ready**: Returns `HTTP 200 OK`
+- **Gateway is NOT ready**: Returns `HTTP 503 Service Unavailable`
 
+### When to use `/ready`
+- **Kubernetes readiness probes** - Removes pod from service when not ready
+- **Graceful termination** - Removes pod from service when Gateway is shutting down
+- **New deployments** - Wait for 200 response before routing traffic
+- **Automated scaling** - Verify new instances are ready before adding to pool
+
+### Configuration
+The endpoint runs on `/ready` by default. To change it:
 ```yaml
-health_check_endpoint_name: "status"
+readiness_check_endpoint_name: "status-ready"
 ```
 
-This configures the health check to run on `/status` instead of `/hello`.
+[Config Ref](https://tyk.io/docs/tyk-oss-gateway/configuration/#readiness_check_endpoint_name)
 
-**Refresh Interval**
+## The `/hello` Endpoint (Liveness Check)
 
-The Health check endpoint will refresh every 10 seconds.
+Use this endpoint for basic health monitoring and load balancer health checks. This check returns 200 once the Gateway has started, whether it has already reached a stable state or is still working toward one.
 
-**HTTP error code**
 
-The Health check endpoint will always return a `HTTP 200 OK` response if the polled health check endpoint is available on your Tyk Gateway. If `HTTP 200 OK` is not returned, your Tyk Gateway is in an error state.
+### How it responds
+- **Always returns `HTTP 200 OK`** (even when components are failing).
+- **Check the response body** to see which components are healthy or failing
 
+### When to use `/hello`
+- **Load balancers** - Route traffic to instances that respond
+- **Basic monitoring** - Simple uptime checks
+- **MDCB setups** - Monitor both Management and Worker Gateways
 
-For MDCB installations the `/hello` endpoint can be polled in either your Management or Worker Gateways. It is recommended to use the `/hello` endpoint behind a load balancer for HA purposes.
+### Configuration
+The endpoint runs on `/hello` by default. To change it:
 
-## Health check examples
+```yaml
+health_check_endpoint_name: "status"
+```
 
-The following examples show how the Health check endpoint returns
+[Config Ref](https://tyk.io/docs/tyk-oss-gateway/configuration/#health_check_endpoint_name)
 
+### Important Notes
+- **Updates every 10 seconds** - Health status is cached and refreshed automatically
+- **Always responds with 200** - Even when Redis or Dashboard are down (check response body for details)
+- **Use for load balancers** - Perfect for HAProxy, NGINX, AWS ALB health checks
 
-### Pass Status
+## Testing the Health Check Endpoints
 
-The following is returned for a `pass` status level for the Open Source Gateway:
+### Quick Health Check
+```bash
+# Check if Gateway is alive (always returns 200)
+curl http://localhost:8080/hello
 
+# Check if Gateway is ready to serve traffic
+curl http://localhost:8080/ready
 ```
-$ http :8080/hello
+
+### `/ready` Endpoint Examples
+
+**✅ Gateway is ready** (returns `HTTP 200 OK`):
+```bash
+$ curl -i http://localhost:8080/ready
 HTTP/1.1 200 OK
-Content-Length: 156
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:36:09 GMT
 
 {
+    "status": "pass",
     "description": "Tyk GW",
     "details": {
-        "redis": {
-            "componentType": "datastore",
-            "status": "pass",
-            "time": "2021-04-14T17:36:03Z"
-        }
-    },
-    "status": "pass",
-    "version": "v3.1.1"
+        "redis": { "status": "pass" }
+    }
 }
 ```
 
-### Redis outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 303
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 14:58:06 GMT
+**❌ Gateway is NOT ready** (returns `HTTP 503 Service Unavailable`):
+```bash
+$ curl -i http://localhost:8080/ready
+HTTP/1.1 503 Service Unavailable
 
 {
-    "description": "Tyk GW",
+  "status": "fail",
+  "description": "Tyk GW",
     "details": {
-        "dashboard": {
-            "componentType": "system",
-            "status": "pass",
-            "time": "2021-04-14T14:58:03Z"
-        },
-        "redis": {
-            "componentType": "datastore",
-            "output": "storage: Redis is either down or was not configured",
+        "redis": {
             "status": "fail",
-            "time": "2021-04-14T14:58:03Z"
+            "output": "Redis is down or not configured"
         }
-    },
-    "status": "warn",
-    "version": "v3.1.2"
+    }
 }
 ```
 
-### Dashboard outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 292
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 15:52:47 GMT
+### `/hello` Endpoint Examples
 
+**✅ All systems healthy** (always returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 {
+    "status": "pass",
     "description": "Tyk GW",
     "details": {
-        "dashboard": {
-            "componentType": "system",
-            "output": "dashboard is down? Heartbeat is failing",
-            "status": "fail",
-            "time": "2021-04-14T15:52:43Z"
-        },
-        "redis": {
-            "componentType": "datastore",
-            "status": "pass",
-            "time": "2021-04-14T15:52:43Z"
-        }
-    },
-    "status": "warn",
-    "version": "v3.1.2"
+        "redis": { "status": "pass" },
+        "dashboard": { "status": "pass" }
+    }
 }
 ```
 
-### Dashboard and Redis outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 354
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:53:33 GMT
 
+**⚠️ Redis is down** (still returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 {
+    "status": "warn",
     "description": "Tyk GW",
     "details": {
-        "dashboard": {
-            "componentType": "system",
-            "output": "dashboard is down? Heartbeat is failing",
+        "redis": {
             "status": "fail",
-            "time": "2021-04-14T17:53:33Z"
+            "output": "Redis is down or not configured"
         },
-        "redis": {
-            "componentType": "datastore",
-            "output": "storage: Redis is either down or was not configured",
-            "status": "fail",
-            "time": "2021-04-14T17:53:33Z"
-        }
-    },
-    "status": "fail",
-    "version": "v3.1.2"
+        "dashboard": { "status": "pass" }
+    }
 }
 ```
-
-### MDCB Worker Gateway RPC outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 333
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:21:24 GMT
-
+**❌ Multiple components down** (still returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 {
+    "status": "fail",
     "description": "Tyk GW",
     "details": {
-        "redis": {
-            "componentType": "datastore",
-            "output": "storage: Redis is either down or was not configured",
-            "status": "fail",
-            "time": "2021-04-14T17:21:16Z"
-        },
-        "rpc": {
-            "componentType": "system",
-            "output": "Could not connect to RPC",
-            "status": "fail",
-            "time": "2021-04-14T17:21:16Z"
-        }
-    },
-    "status": "fail",
-    "version": "v3.1.2"
+        "redis": { "status": "fail" },
+        "dashboard": { "status": "fail" }
+    }
 }
 ```
 
+## Troubleshooting with Health Checks
+
+### Understanding Status Levels
+
+| Status | Meaning | What to do |
+|--------|---------|------------|
+| `pass` | All components healthy | ✅ Gateway is working normally |
+| `warn` | Some components down | ⚠️ Gateway works but with reduced functionality |
+| `fail` | Critical components down | ❌ Gateway may not work properly |
+
+### Common Issues
+
+**Redis connection failed**:
+- Check Redis is running: `redis-cli ping`
+- Verify connection settings in Gateway config
+- Check network connectivity to Redis
+
+**Dashboard connection failed**:
+- Verify Dashboard is running and accessible
+- Check Dashboard URL in Gateway config
+- Test connectivity: `curl http://dashboard:3000/hello`
+
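
Because `/hello` answers `HTTP 200 OK` even when components are failing, a monitor that wants more than process liveness has to read the JSON body. The following is a minimal sketch in Python, assuming the response shape shown in the `/hello` examples in this patch (a top-level `status` plus `details.<component>.status`). The function names, base URL, and timeout are illustrative assumptions and are not part of Tyk.

```python
import json
import urllib.request
from typing import List, Tuple


def summarize_health(body: str) -> Tuple[str, List[str]]:
    """Return (overall status, sorted names of components not reporting 'pass').

    Hypothetical helper: only the "status" and "details" fields are taken
    from the example responses; everything else here is an assumption.
    """
    doc = json.loads(body)
    overall = doc.get("status", "fail")  # treat a missing status as a failure
    details = doc.get("details", {})
    failing = sorted(
        name for name, info in details.items() if info.get("status") != "pass"
    )
    return overall, failing


def check_gateway(base_url: str = "http://localhost:8080", timeout: float = 2.0):
    """Fetch /hello and summarize it. The URL and timeout are assumed defaults."""
    with urllib.request.urlopen(f"{base_url}/hello", timeout=timeout) as resp:
        return summarize_health(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Body copied from the "Redis is down" example; no network access needed.
    sample = (
        '{"status": "warn", "description": "Tyk GW", "details": {'
        '"redis": {"status": "fail", "output": "Redis is down or not configured"},'
        '"dashboard": {"status": "pass"}}}'
    )
    print(summarize_health(sample))  # ('warn', ['redis'])
```

A caller could alert on `warn` and page on `fail`, while traffic routing decisions stay with `/ready`, which signals readiness through the status code alone.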