diff --git a/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md b/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
index bfcb14281f..a09690aff3 100644
--- a/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
+++ b/tyk-docs/content/planning-for-production/ensure-high-availability/health-check.md
@@ -1,33 +1,53 @@
 ---
-title: "Liveness Health Checks"
+title: "Health Checks"
 date: 2025-02-10
-keywords: ["health check", "liveness health check", "Tyk Gateway", "Tyk Dashboard", "MDCB", "load balancer", "Kubernetes liveness probe"]
-description: "How to set up liveness health checks for the Tyk Gateway to ensure high availability and monitor the status of components like Redis, Dashboard, and RPC."
+keywords: ["health check", "liveness health check", "readiness health check", "Tyk Gateway", "Tyk Dashboard", "MDCB", "load balancer", "Kubernetes liveness probe", "Kubernetes readiness probe"]
+description: "How to set up liveness and readiness health checks for the Tyk Gateway to ensure high availability and monitor the status of components like Redis, Dashboard, and RPC."
 aliases:
   - /tyk-rest-api/health-checking
 ---
 
-## Set Up Liveness Health Checks
+## Overview
 
-Health checks are extremely important in determining the status of an
-application - in this instance, the Tyk Gateway. Without them, it can be hard to
-know the actual state of the Gateway.
+Tyk Gateway provides two health check endpoints to help you monitor and manage your API gateway:
 
-Depending on your configuration, the Gateway could be using a few components:
+### Quick Reference
 
-- The Tyk Dashboard.
-- RPC
-- Redis (compulsory).
+| Endpoint | Purpose | When to Use | HTTP Response |
+|----------|---------|-------------|---------------|
+| `/hello` | **Liveness check** | Load balancers, basic monitoring | Always 200 OK |
+| `/ready` | **Readiness check** | Kubernetes, traffic routing decisions | 200 OK when ready, 503 when not |
 
-Any of these components could go down at any given point and it is useful to
-know if the Gateway is currently usable or not. A good usage of the health
-check endpoint is for the configuration of a load balancer to multiple instances of the Gateway or
-as a Kubernetes liveness probe.
+### Which endpoint should I use?
 
-The following component status will not be returned:
+- **Use `/hello` for**: Load balancers, basic uptime monitoring, general health checks
+- **Use `/ready` for**: Kubernetes readiness probes, deciding when to route traffic to a new Gateway instance
+
+## What Gets Monitored
+
+The health check endpoints monitor these critical Gateway dependencies:
+
+✅ **Monitored Components:**
+- **Redis** (required) - Data storage and caching
+- **Tyk Dashboard** (if configured) - API management interface
+- **RPC connection** (for MDCB setups) - Multi-data center communication
+
+
+### Kubernetes Deployments
+```yaml
+# Liveness probe - restarts pod if Gateway process is dead
+livenessProbe:
+  httpGet:
+    path: /hello
+    port: 8080
+
+# Readiness probe - removes from service when not ready
+readinessProbe:
+  httpGet:
+    path: /ready
+    port: 8080
+```
 
-* MongDB or SQL
-* Tyk Pump
 
 {{< note success >}}
 **Note**
@@ -73,173 +93,167 @@ The following status levels can be returned in the JSON response.
 - **fail**: Indicates that Redis AND the Tyk Dashboard are unavailable, and can and indicate other failures. The impact is high (i.e. no configuration changes are available for API/policies/keys, no quotas are applied, and no analytics).
 
 
-## Configure health check
+## The `/ready` Endpoint (Readiness Check)
+
+Use this endpoint when you need to know whether the Gateway is **actually ready** to handle API traffic.
+
+### What it checks
+- ✅ Redis is connected and working
+- ✅ APIs have been loaded successfully at least once
 
-By default, the liveness health check runs on the `/hello` path. But
-it can be configured to run on any path you want to set. For example:
+### How it responds
+- **Gateway is ready**: Returns `HTTP 200 OK`
+- **Gateway is NOT ready**: Returns `HTTP 503 Service Unavailable`
 
+### When to use `/ready`
+- **Kubernetes readiness probes** - Removes the pod from service when it is not ready
+- **Graceful termination** - Removes the pod from service while the Gateway is shutting down
+- **New deployments** - Wait for a 200 response before routing traffic
+- **Automated scaling** - Verify new instances are ready before adding them to the pool
+
+### Configuration
+The endpoint runs on `/ready` by default. To change it:
 ```yaml
-health_check_endpoint_name: "status"
+readiness_check_endpoint_name: "status-ready"
 ```
 
-This configures the health check to run on `/status` instead of `/hello`.
+[Configuration reference](https://tyk.io/docs/tyk-oss-gateway/configuration/#readiness_check_endpoint_name)
 
-**Refresh Interval**
+## The `/hello` Endpoint (Liveness Check)
 
-The Health check endpoint will refresh every 10 seconds.
+Use this endpoint for basic health monitoring and load balancer health checks. This check returns 200 once the Gateway has started, whether it is still working towards a stable state or has already reached one.
 
-**HTTP error code**
-The Health check endpoint will always return a `HTTP 200 OK` response if the polled health check endpoint is available on your Tyk Gateway. If `HTTP 200 OK` is not returned, your Tyk Gateway is in an error state.
+### How it responds
+- **Always returns `HTTP 200 OK`** (even when components are failing).
+- **Check the response body** to see which components are healthy or failing
 
+### When to use `/hello`
+- **Load balancers** - Route traffic to instances that respond
+- **Basic monitoring** - Simple uptime checks
+- **MDCB setups** - Monitor both Management and Worker Gateways
 
-For MDCB installations the `/hello` endpoint can be polled in either your Management or Worker Gateways. It is recommended to use the `/hello` endpoint behind a load balancer for HA purposes.
+### Configuration
+The endpoint runs on `/hello` by default. To change it:
 
-## Health check examples
+```yaml
+health_check_endpoint_name: "status"
+```
 
-The following examples show how the Health check endpoint returns
+[Configuration reference](https://tyk.io/docs/tyk-oss-gateway/configuration/#health_check_endpoint_name)
+### Important Notes
+- **Updates every 10 seconds** - Health status is cached and refreshed automatically
+- **Always responds with 200** - Even when Redis or the Dashboard is down (check the response body for details)
+- **Use for load balancers** - Suitable for HAProxy, NGINX, and AWS ALB health checks
 
-### Pass Status
+## Testing the Health Check Endpoints
 
-The following is returned for a `pass` status level for the Open Source Gateway:
+### Quick Health Check
+```bash
+# Check if Gateway is alive (always returns 200)
+curl http://localhost:8080/hello
 
+# Check if Gateway is ready to serve traffic
+curl http://localhost:8080/ready
 ```
-$ http :8080/hello
+
+### `/ready` Endpoint Examples
+
+**✅ Gateway is ready** (returns `HTTP 200 OK`):
+```bash
+$ curl -i http://localhost:8080/ready
 HTTP/1.1 200 OK
-Content-Length: 156
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:36:09 GMT
 
 {
+  "status": "pass",
   "description": "Tyk GW",
   "details": {
-    "redis": {
-      "componentType": "datastore",
-      "status": "pass",
-      "time": "2021-04-14T17:36:03Z"
-    }
-  },
-  "status": "pass",
-  "version": "v3.1.1"
+    "redis": { "status": "pass" }
+  }
 }
 ```
 
-### Redis outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 303
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 14:58:06 GMT
+**❌ Gateway is NOT ready** (returns `HTTP 503 Service Unavailable`):
+```bash
+$ curl -i http://localhost:8080/ready
+HTTP/1.1 503 Service Unavailable
 
 {
-  "description": "Tyk GW",
+  "status": "fail",
+  "description": "Tyk GW",
   "details": {
-    "dashboard": {
-      "componentType": "system",
-      "status": "pass",
-      "time": "2021-04-14T14:58:03Z"
-    },
-    "redis": {
-      "componentType": "datastore",
-      "output": "storage: Redis is either down or was not configured",
+    "redis": {
       "status": "fail",
-      "time": "2021-04-14T14:58:03Z"
+      "output": "Redis is down or not configured"
     }
-  },
-  "status": "warn",
-  "version": "v3.1.2"
+  }
 }
 ```
 
-### Dashboard outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 292
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 15:52:47 GMT
+### `/hello` Endpoint Examples
 
+**✅ All systems healthy** (always returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 {
+  "status": "pass",
   "description": "Tyk GW",
   "details": {
-    "dashboard": {
-      "componentType": "system",
-      "output": "dashboard is down? Heartbeat is failing",
-      "status": "fail",
-      "time": "2021-04-14T15:52:43Z"
-    },
-    "redis": {
-      "componentType": "datastore",
-      "status": "pass",
-      "time": "2021-04-14T15:52:43Z"
-    }
-  },
-  "status": "warn",
-  "version": "v3.1.2"
+    "redis": { "status": "pass" },
+    "dashboard": { "status": "pass" }
+  }
 }
 ```
 
-### Dashboard and Redis outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 354
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:53:33 GMT
+**⚠️ Redis is down** (still returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 
 {
+  "status": "warn",
   "description": "Tyk GW",
   "details": {
-    "dashboard": {
-      "componentType": "system",
-      "output": "dashboard is down? Heartbeat is failing",
+    "redis": {
       "status": "fail",
-      "time": "2021-04-14T17:53:33Z"
+      "output": "Redis is down or not configured"
     },
-    "redis": {
-      "componentType": "datastore",
-      "output": "storage: Redis is either down or was not configured",
-      "status": "fail",
-      "time": "2021-04-14T17:53:33Z"
-    }
-  },
-  "status": "fail",
-  "version": "v3.1.2"
+    "dashboard": { "status": "pass" }
+  }
 }
 ```
-
-### MDCB Worker Gateway RPC outage
-
-```
-$ http :8080/hello
-HTTP/1.1 200 OK
-Content-Length: 333
-Content-Type: application/json
-Date: Wed, 14 Apr 2021 17:21:24 GMT
-
+**❌ Multiple components down** (still returns `HTTP 200 OK`):
+```bash
+$ curl http://localhost:8080/hello
 {
+  "status": "fail",
   "description": "Tyk GW",
   "details": {
-    "redis": {
-      "componentType": "datastore",
-      "output": "storage: Redis is either down or was not configured",
-      "status": "fail",
-      "time": "2021-04-14T17:21:16Z"
-    },
-    "rpc": {
-      "componentType": "system",
-      "output": "Could not connect to RPC",
-      "status": "fail",
-      "time": "2021-04-14T17:21:16Z"
-    }
-  },
-  "status": "fail",
-  "version": "v3.1.2"
+    "redis": { "status": "fail" },
+    "dashboard": { "status": "fail" }
+  }
 }
 ```
+## Troubleshooting with Health Checks
+
+### Understanding Status Levels
+
+| Status | Meaning | What to do |
+|--------|---------|------------|
+| `pass` | All components healthy | ✅ Gateway is working normally |
+| `warn` | Some components down | ⚠️ Gateway works but with reduced functionality |
+| `fail` | Critical components down | ❌ Gateway may not work properly |
+
+### Common Issues
+
+**Redis connection failed**:
+- Check Redis is running: `redis-cli ping`
+- Verify connection settings in Gateway config
+- Check network connectivity to Redis
+
+**Dashboard connection failed**:
+- Verify Dashboard is running and accessible
+- Check Dashboard URL in Gateway config
+- Test connectivity: `curl http://dashboard:3000/hello`
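The practical difference between the two endpoints is that `/ready` signals its state through the HTTP status code, while `/hello` always answers 200 and reports component health only in its JSON body. The following is a minimal Python sketch of a deploy-time check built on that distinction, using only the standard library. It assumes the default `/ready` and `/hello` paths on `localhost:8080` as in the examples above, and a response body shaped like those examples (a top-level `status` plus a `details` map of components). `GATEWAY_URL`, `summarize_hello`, `probe_ready`, and `wait_until_ready` are hypothetical names, not part of any Tyk tooling.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: default endpoint paths and the port used in the examples.
GATEWAY_URL = "http://localhost:8080"


def summarize_hello(body: str) -> dict:
    """Summarize a /hello response body.

    /hello always answers HTTP 200, so component health has to be read from
    the JSON body. Assumes the shape shown in the examples: a top-level
    "status" plus a "details" map of component name -> {"status": ...}.
    """
    doc = json.loads(body)
    details = doc.get("details", {})
    failing = sorted(
        name for name, component in details.items()
        if component.get("status") == "fail"
    )
    return {"overall": doc.get("status"), "failing": failing}


def is_ready(status_code: int) -> bool:
    """/ready reports readiness through the HTTP status: 200 ready, 503 not ready."""
    return status_code == 200


def probe_ready(base_url: str = GATEWAY_URL, timeout: float = 2.0) -> bool:
    """Make a single request to /ready; connection errors count as not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return is_ready(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx codes such as 503.
        return is_ready(err.code)
    except OSError:
        # URLError, refused connections and socket timeouts all subclass OSError.
        return False


def wait_until_ready(base_url: str = GATEWAY_URL,
                     deadline_s: float = 60.0,
                     interval_s: float = 2.0) -> bool:
    """Poll /ready until it returns 200 or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if probe_ready(base_url):
            return True
        time.sleep(interval_s)
    return False


# Example usage against a running Gateway:
#   if wait_until_ready():
#       with urllib.request.urlopen(f"{GATEWAY_URL}/hello", timeout=2.0) as resp:
#           print(summarize_hello(resp.read().decode()))
```

Gating on `/ready` rather than `/hello` is the point of this design: a loop that polled `/hello` would succeed immediately even while Redis is unreachable, whereas the parsed `/hello` body remains useful afterwards for reporting which component is degraded.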