
Commit c7aec86

docs: Add load balancer zone redundancy documentation
Add comprehensive documentation for the zone-redundant load balancer feature:

- Explain Azure zone redundancy concepts for load balancers
- Provide configuration examples for all load balancer types:
  - Internal load balancers (API server)
  - Public load balancers
  - Node outbound load balancers
  - Control plane outbound load balancers
- Include complete highly available cluster example
- Document important considerations:
  - Immutability of zones after creation
  - Region support requirements
  - Standard SKU requirement
  - Backend pool placement best practices
- Provide migration guidance for existing clusters
- Add troubleshooting section
- Document best practices
1 parent c7bef13 commit c7aec86

2 files changed: +295 -0 lines changed

docs/book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@
 - [Externally managed Azure infrastructure](./self-managed/externally-managed-azure-infrastructure.md)
 - [Failure Domains](./self-managed/failure-domains.md)
 - [Flatcar](./self-managed/flatcar.md)
+- [Load Balancer Zone Redundancy](./self-managed/load-balancer-zone-redundancy.md)
 - [GPU-enabled Clusters](./self-managed/gpu.md)
 - [IPv6](./self-managed/ipv6.md)
 - [Machine Pools (VMSS)](./self-managed/machinepools.md)
Lines changed: 294 additions & 0 deletions
@@ -0,0 +1,294 @@
# Load Balancer Zone Redundancy

## Zone Redundancy for Load Balancers in Azure

Azure Load Balancers can be configured as zone-redundant to ensure high availability across multiple availability zones within a region. A zone-redundant load balancer serves its frontend from multiple zones at once, so traffic keeps flowing if a single zone fails.

**Key concepts:**

- Zone redundancy for load balancers is configured through the **frontend IP configuration**
- For **internal load balancers**, zones are set directly on the frontend IP configuration
- For **public load balancers**, zones are inherited from the zone configuration of the public IP address
- **Zones are immutable**: once a frontend is created, its zones cannot be changed, added, or removed

Full details can be found in the [Azure Load Balancer reliability documentation](https://learn.microsoft.com/azure/reliability/reliability-load-balancer).

## Configuring Zone-Redundant Load Balancers

CAPZ exposes the `availabilityZones` field on load balancer specifications to enable zone redundancy.

### Internal Load Balancers

For internal load balancers (such as a private API server), you can configure availability zones directly on the load balancer spec:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Internal
      availabilityZones:
        - "1"
        - "2"
        - "3"
```

This configuration creates a zone-redundant internal load balancer whose frontend IP configuration spans zones 1, 2, and 3.

### Public Load Balancers

For public load balancers, zone redundancy is primarily controlled by the public IP addresses. However, you can still set `availabilityZones` on the load balancer for consistency:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Public
      availabilityZones:
        - "1"
        - "2"
        - "3"
```

> **Note**: For public load balancers, ensure that the associated public IP addresses are also zone-redundant for complete zone redundancy.
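
To confirm that a public IP behind a public load balancer is itself zone-redundant, one option is to query its `zones` property with the Azure CLI. The sketch below wraps that query in a small helper. The function name `public_ip_zones` and the resource names in the usage line are hypothetical placeholders, and the helper assumes the Azure CLI is installed and logged in.

```bash
# Print the availability zones assigned to a public IP address.
# Hypothetical helper; assumes an authenticated Azure CLI session.
public_ip_zones() {
  local resource_group="$1" public_ip_name="$2"
  az network public-ip show \
    --resource-group "$resource_group" \
    --name "$public_ip_name" \
    --query zones \
    --output tsv
}

# Usage (placeholder names):
#   public_ip_zones my-cluster-rg my-cluster-publicip
```

An empty result would suggest the address was created without zones, in which case the load balancer frontend that uses it is not zone-redundant either.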

### Node Outbound Load Balancer

You can also configure zone redundancy for node outbound load balancers:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: westus2
  networkSpec:
    nodeOutboundLB:
      type: Public
      availabilityZones:
        - "1"
        - "2"
        - "3"
      frontendIPs:
        - name: node-outbound-ip
          publicIP:
            name: node-outbound-publicip
```

### Control Plane Outbound Load Balancer

For clusters with private API servers, you can also make the control plane outbound load balancer zone-redundant:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Internal
      availabilityZones:
        - "1"
        - "2"
        - "3"
    controlPlaneOutboundLB:
      availabilityZones:
        - "1"
        - "2"
        - "3"
      frontendIPs:
        - name: controlplane-outbound-ip
          publicIP:
            name: controlplane-outbound-publicip
```

## Complete Example: Highly Available Cluster

Here's a complete example of a highly available cluster with zone-redundant load balancers:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: ha-cluster
  namespace: default
spec:
  location: eastus
  resourceGroup: ha-cluster-rg
  networkSpec:
    # Zone-redundant internal API server load balancer
    apiServerLB:
      type: Internal
      name: ha-cluster-internal-lb
      availabilityZones:
        - "1"
        - "2"
        - "3"
      frontendIPs:
        - name: api-server-internal-ip
          privateIP: "10.0.0.100"

    # Zone-redundant control plane outbound load balancer
    controlPlaneOutboundLB:
      name: ha-cluster-cp-outbound-lb
      availabilityZones:
        - "1"
        - "2"
        - "3"
      frontendIPs:
        - name: cp-outbound-ip
          publicIP:
            name: cp-outbound-publicip

    # Zone-redundant node outbound load balancer
    nodeOutboundLB:
      name: ha-cluster-node-outbound-lb
      availabilityZones:
        - "1"
        - "2"
        - "3"
      frontendIPs:
        - name: node-outbound-ip
          publicIP:
            name: node-outbound-publicip

    # Custom VNet configuration
    vnet:
      name: ha-cluster-vnet
      cidrBlocks:
        - "10.0.0.0/16"

    subnets:
      - name: control-plane-subnet
        role: control-plane
        cidrBlocks:
          - "10.0.0.0/24"
      - name: node-subnet
        role: node
        cidrBlocks:
          - "10.0.1.0/24"
```

## Important Considerations

### Immutability

Once a load balancer is created with availability zones, the zone configuration **cannot be changed**. This is an Azure platform limitation. To change zones, you must:

1. Delete the load balancer
2. Recreate it with the new zone configuration

> **Warning**: Changing load balancer zones requires recreating the cluster's load balancers, which will cause service interruption.

### Region Support

Not all Azure regions support availability zones. Before configuring zone-redundant load balancers, verify that your target region supports zones:

```bash
az vm list-skus -l <location> --zone -o table
```
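
The table output can be long. As a rough sketch, the same command can be narrowed to just the distinct zone numbers a region advertises for virtual machine SKUs. The function name `region_zones` is hypothetical, and the JMESPath query assumes the `locationInfo[].zones` shape of the `az vm list-skus` JSON output.

```bash
# Print the distinct availability zones reported for VM SKUs in a region.
# Hypothetical helper; assumes an authenticated Azure CLI session.
region_zones() {
  local location="$1"
  az vm list-skus \
    --location "$location" \
    --zone \
    --resource-type virtualMachines \
    --query '[].locationInfo[].zones[]' \
    --output tsv | sort -u
}

# Usage:
#   region_zones eastus
```

No output would indicate that no zonal VM SKUs were found in that region for your subscription.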

### Standard SKU Requirement

Zone-redundant load balancers require the **Standard SKU**. CAPZ uses Standard SKU by default, so no additional configuration is needed.

### Backend Pool Placement

For optimal high availability:

- Spread your control plane nodes across all availability zones
- Spread your worker nodes across all availability zones
- Ensure backend pool members exist in the same zones as the load balancer

See the [Failure Domains](failure-domains.md) documentation for details on distributing VMs across zones.
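
One way to check how machines are currently spread is to list each Cluster API `Machine` alongside its `spec.failureDomain`. This is only a sketch: the function name `machine_zones` is hypothetical, and it assumes `kubectl` is pointed at the management cluster.

```bash
# List Cluster API Machines with the failure domain (zone) each one targets.
# Hypothetical helper; the namespace defaults to "default".
machine_zones() {
  local namespace="${1:-default}"
  kubectl get machines \
    --namespace "$namespace" \
    --output custom-columns='NAME:.metadata.name,ZONE:.spec.failureDomain'
}

# Usage:
#   machine_zones default
```

If every machine reports the same zone, the backend pool is effectively zonal even when the load balancer frontend is zone-redundant.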

## Migration from Non-Zone-Redundant Load Balancers

If you have an existing cluster without zone-redundant load balancers, migration requires careful planning.

### For New Clusters

When creating a new cluster, simply include the `availabilityZones` field in your `AzureCluster` specification from the start.

### For Existing Clusters

**Migration is not straightforward** because:

1. Azure does not allow modifying zones on existing load balancers
2. CAPZ's webhook validation prevents zone changes to enforce this immutability
3. Load balancer recreation requires cluster downtime

**Recommended approach for existing clusters:**

1. Create a new cluster with zone-redundant configuration
2. Migrate workloads to the new cluster
3. Decommission the old cluster

**Alternative for development/test clusters:**

1. Delete the `AzureCluster` resource (this will delete the infrastructure)
2. Recreate the `AzureCluster` with `availabilityZones` configured
3. Reconcile the cluster

> **Important**: The alternative approach causes significant downtime and should only be used in non-production environments.

## Troubleshooting

### Load Balancer Not Zone-Redundant

If your load balancer is not zone-redundant despite the configuration:

1. **Verify the zones are set in the spec:**

   ```bash
   kubectl get azurecluster <cluster-name> -o jsonpath='{.spec.networkSpec.apiServerLB.availabilityZones}'
   ```

2. **Check the Azure load balancer frontend configuration:**

   ```bash
   az network lb frontend-ip show \
     --lb-name <lb-name> \
     --name <frontend-name> \
     --resource-group <rg-name> \
     --query zones
   ```

3. **Verify the region supports zones:**

   ```bash
   az vm list-skus -l <location> --zone -o table
   ```
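
When a load balancer has several frontends, it may be quicker to list them all at once rather than querying each by name. A minimal sketch, with a hypothetical function name `lb_frontend_zones` and placeholder resource names:

```bash
# Show every frontend IP configuration on a load balancer with its zones.
# Hypothetical helper; assumes an authenticated Azure CLI session.
lb_frontend_zones() {
  local resource_group="$1" lb_name="$2"
  az network lb frontend-ip list \
    --resource-group "$resource_group" \
    --lb-name "$lb_name" \
    --query '[].{name:name, zones:zones}' \
    --output json
}

# Usage (placeholder names):
#   lb_frontend_zones my-cluster-rg my-cluster-internal-lb
```

For a public load balancer the frontend's own `zones` value may be empty, because the zones come from the associated public IP address; in that case check the public IP instead.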

### Validation Errors

If you encounter validation errors when updating `availabilityZones`:

```
field is immutable
```

This is expected behavior. Zones cannot be modified after creation. You must recreate the load balancer with the desired configuration.

## Best Practices

1. **Enable zone redundancy from the start** when creating new clusters in zone-capable regions
2. **Use all available zones** in the region (typically 3 zones) for maximum resilience
3. **Spread backend pools** across all zones configured on the load balancer
4. **Monitor zone health** and be prepared to handle zone failures
5. **Test failover scenarios** to ensure your cluster can survive zone outages
6. **Document your zone configuration** for disaster recovery procedures

## Related Documentation

- [Failure Domains](failure-domains.md) - Configure VMs across availability zones
- [API Server Endpoint](api-server-endpoint.md) - API server load balancer configuration
- [Azure Load Balancer Reliability](https://learn.microsoft.com/azure/reliability/reliability-load-balancer) - Azure official documentation
