HIVE-3020: add MachinePool adoption documentation. #2815

huangmingxia · 2025-12-16T04:55:57Z

Add MachinePool adoption documentation.

Changes

Add documentation for automatic metadata.json generation via retrofit code
Add MachinePool adoption procedure documentation

openshift-ci-robot · 2025-12-16T04:56:01Z

@huangmingxia: This pull request references HIVE-3020 which is a valid jira issue.

Details

In response to this:

Add MachinePool adoption documentation.

Changes

Add documentation for automatic metadata.json generation via retrofit code

Add MachinePool adoption procedure documentation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-12-16T04:56:25Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huangmingxia
Once this PR has been reviewed and has the lgtm label, please assign dlom for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

huangmingxia · 2025-12-17T08:11:27Z

/hold

huangmingxia · 2025-12-19T01:54:05Z

/unhold

huangmingxia · 2025-12-19T11:18:14Z

/test periodic-images

huangmingxia · 2025-12-19T11:27:54Z

/test periodic-images

huangmingxia · 2025-12-19T11:54:12Z

/test periodic-images

huangmingxia · 2025-12-19T12:23:23Z

/test periodic-images

huangmingxia · 2025-12-19T14:24:00Z

/test periodic-images

openshift-ci · 2025-12-19T14:42:14Z

@huangmingxia: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

huangmingxia · 2025-12-19T15:10:02Z

Hi @2uasimojo @suhanime @dlom Could you please help review this doc update when you have time? Thank you!

2uasimojo

This is great work @huangmingxia. We've still got a few gaps to close -- see inline.

2uasimojo · 2026-01-02T20:49:37Z

docs/using-hive.md

+If you want to adopt additional MachineSets for a cluster that is already managed by Hive, you can do so by creating MachinePools that match the existing MachineSets.
+
+Steps:  
+1. Label the existing MachineSets with `hive.openshift.io/machine-pool=<pool-name>`


Confusing terminology here: "pool name" could refer to the MachinePool.metadata.name or the MachinePool.spec.name. I don't have any great ideas for unambiguous terms for these, do you? You've done a good job being specific below, but it might benefit the reader if we introduced precise terminology early and used it consistently throughout.

2uasimojo · 2026-01-02T20:55:45Z

docs/using-hive.md

+   - Current replica count
+   - Any platform-specific configurations (root volume settings, etc.)
+
+2. **Label the existing MachineSets** with the `hive.openshift.io/machine-pool` label. The label value must match the `spec.name` you will use in the MachinePool:


Naming deep dive:

This webhook check forces the MachinePool name to be composed of the CD name and the pool.spec.name.

This restriction should be noted here so the reader doesn't have to find out about it by bouncing off the webhook :)

Are we allowed to pick any arbitrary pool.spec.name regardless of the names of the existing MachineSets? I know we'll use the spec.name to compose the names of new msets when we're creating them; and I know we used to match existing msets by name but don't anymore (right??). If you didn't test this scenario explicitly, please do. LMK if you want to discuss specifics.

(LATER)
3. Oh, there's also HIVE-2199, which I think indicates that we could end up deleting msets whose names match the naming convention accidentally? This may warrant a warning in the documentation, or maybe even a much more prescriptive procedure.

2uasimojo · 2026-01-02T20:59:27Z

docs/using-hive.md

+   # Check MachinePool status, MachineSets are listed in status
+   oc get machinepool mycluster-worker -n mynamespace -o yaml
+   
+   # Check that existing MachineSets were not recreated
+   oc get machinesets -n openshift-machine-api


It's not clear that these two commands are run on different clusters (the first on the hub, the second on the spoke). Above you called the latter "the cluster that you want to manage". Let's establish the terms "hub" and "spoke" early in this section and then be explicit using those terms any time oc is being invoked.

2uasimojo · 2026-01-02T21:01:58Z

docs/using-hive.md

+- Hive will scale `us-east-1a` from 1 to 2 replicas → 1 new Machine created
+- Hive will scale `us-east-1f` from 2 to 1 replica → 1 Machine deleted
+
+Special case: If the total number of replicas equals the number of zones, zone order does not matter (each zone gets exactly 1 replica).


To be annoyingly precise, this extends to whenever the number of replicas is an exact multiple of the number of zones. It may not be worth trying to explain that here -- in fact, I'm not sure it's necessary to mention the subject at all. WDYT?

2uasimojo · 2026-01-02T21:43:40Z

docs/using-hive.md

+
+3. **Create a MachinePool** with specifications that exactly match the existing MachineSets:
+   - The `spec.name` must match the label value you applied in step 2
+   - The `spec.platform` configuration (instance type, zones, etc.) must exactly match the existing MachineSets


Annoyingly, we also rely on the CD.spec.platform for certain pieces of MachineSet configuration.

AWS: Region, UserTags

Azure: Region

GCP: Region

IBMCloud: Region (to discover zones when not specified).

Nutanix: PrismEndpoint, FailureDomains

OpenStack: Cloud

VSphere: Lots of things :(

I say "annoyingly" because it makes our life hard here, but the reasoning behind it is that these are knobs that we expect to be the same for all mpools in a given cluster. That being the case, I don't know that it's even necessary to mention this quirk in the documentation... unless we can come up with counterexamples where it matters. Off the top of my head:

For AWS, if existing msets have one set of UserTags and {cd.spec.platform.aws.userTags U pool.spec.platform.aws.userTags} has a different set, the delta might only be realized under unusual circumstances.

New msets would get the hive-defined tags... but I can't think of a scenario where we would be creating a new mset after an adoption like this.

Existing msets would only get updated if this annotation is used... and such annotations aren't part of the official API, so we really shouldn't document them.

Even if the mset gets updated, MAPI on the spoke will ignore the changes except when creating new Machines, e.g. as a result of scaling.

For VSphere, it may be technically possible in OCP to put different msets into different datacenters/datastores/folders/clusters/networks. I'm not sure how that will play with zonal support, but for now we simply don't support putting msets in any FD other than the singular one defined in cd.spec.platform.vsphere... so if existing msets are not in that FD, we can't adopt them. (Cc @dlom)

2uasimojo · 2026-01-02T21:45:47Z

docs/using-hive.md

+         type: n1-standard-4
+         zones:
+           - us-central1-a
+           # - us-central1-b


It seems odd that this is commented out. The reader might assume that it's because we discovered the existing msets weren't in this AZ... but -d and -e are also missing?

2uasimojo · 2026-01-02T21:49:49Z

docs/using-hive.md

-1. If the referenced Secret is available -- e.g. if the cluster was previously managed by hive -- simply copy it in.
-1. If you have the original metadata.json file -- e.g. if the cluster was provisioned directly via openshift-install -- create the Secret from it: `oc create secret generic my-gcp-cluster-metadata-json -n mynamespace --from-file=metadata.json=/tmp/metadata.json`
-1. Otherwise, you may need to compose the file by hand. See the samples below.
+- The `metadataJSONSecretRef` file is optional for cluster adoption. If you do not specify `metadataJSONSecretRef` in the ClusterDeployment, Hive will automatically generate the metadata.json content from the ClusterDeployment fields and create a secret named `{cluster-name}-metadata-json` (see [retrofitMetadataJSON](https://github.com/openshift/hive/blob/master/pkg/controller/clusterdeployment/clusterdeployment_controller.go#L1110)). The ClusterDeployment will be automatically updated with the `metadataJSONSecretRef` after the secret is created. You only need to manually provide a metadata.json secret if you have specific metadata that cannot be derived from the ClusterDeployment fields.


This section almost looks like it was awaiting a rebase on #2805... but some of your content goes beyond what merged there. I don't mind if you want to beef it up, but it should be done via a separate PR to keep things crisp. (Cc @jianping-shu)

Thanks! @2uasimojo A separate PR #2824 has already been submitted. This document will be updated accordingly.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 16, 2025

openshift-ci bot requested review from 2uasimojo and suhanime December 16, 2025 04:56

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 17, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2025

huangmingxia force-pushed the HIVE-3020 branch from 3adb8ee to a883e19 Compare December 19, 2025 10:20

HIVE-3020: docs: add MachinePool adoption documentation improvements

93989e2

huangmingxia force-pushed the HIVE-3020 branch from a883e19 to 93989e2 Compare December 19, 2025 10:28

2uasimojo reviewed Jan 2, 2026

View reviewed changes

HIVE-3020: add MachinePool adoption documentation. #2815

Are you sure you want to change the base?

HIVE-3020: add MachinePool adoption documentation. #2815

Uh oh!

Conversation

huangmingxia commented Dec 16, 2025

Changes

Uh oh!

openshift-ci-robot commented Dec 16, 2025 • edited by openshift-ci bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

openshift-ci bot commented Dec 16, 2025

Uh oh!

huangmingxia commented Dec 17, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

openshift-ci bot commented Dec 19, 2025

Uh oh!

huangmingxia commented Dec 19, 2025

Uh oh!

2uasimojo left a comment

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

huangmingxia Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

openshift-ci-robot commented Dec 16, 2025 •

edited by openshift-ci bot

Loading

2uasimojo Jan 2, 2026 •

edited

Loading