Conversation

@richardcase
Member

What this PR does / why we need it:

This adds documentation that details the contract for providers when implementing an infrastructure machine pool.

This has been created retrospectively from looking at a number of providers and the MachinePool controller.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #12799

/area machinepool

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/machinepool Issues or PRs related to machinepools labels Nov 7, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 7, 2025
@k8s-ci-robot k8s-ci-robot requested a review from elmiko November 7, 2025 14:35
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 7, 2025
@k8s-ci-robot k8s-ci-robot requested a review from sivchari November 7, 2025 14:35
@richardcase richardcase force-pushed the machinepool_contract_doc branch from d1d60f8 to 271c8df Compare November 7, 2025 14:40
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 7, 2025
@richardcase richardcase force-pushed the machinepool_contract_doc branch from 271c8df to 5cc7517 Compare November 7, 2025 15:00
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 7, 2025
@richardcase richardcase force-pushed the machinepool_contract_doc branch from 5cc7517 to 53b1241 Compare November 7, 2025 15:02
@richardcase richardcase changed the title [WIP] 📖 docs: machinepool contract spec 📖 docs: machinepool contract spec Nov 7, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 7, 2025
Member

@fabriziopandini fabriziopandini left a comment

Thanks for this PR!
This really helps folks like me to step up knowledge about MachinePools!

@sbueringer
Member

/assign

Would like to review after Fabrizio lgtm and before merge

@richardcase
Member Author

Thanks for your feedback @fabriziopandini. I will make updates based on this.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sbueringer. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@richardcase richardcase force-pushed the machinepool_contract_doc branch from cbf9bc7 to 22f729e Compare November 10, 2025 15:44
@richardcase
Member Author

@fabriziopandini - I have updated the doc based on your feedback.

@bnallapeta
Contributor

@richardcase are we not going to document anything on the upgrades? My understanding is that the status would change based on the update strategy. For example:

Atomic updates:

  • status.replicas temporarily drops to 0
  • All providerIDs disappear then reappear
  • InfrastructureReady may flip to False

Rolling updates:

  • status.replicas stays >= desired (with surge) or slightly below
  • providerIDs change gradually (old removed, new added)
  • InfrastructureReady stays True

Should providers declare their update strategy in the contract so CAPI can adapt behavior accordingly?

@richardcase
Member Author

> Should providers declare their update strategy in the contract so CAPI can adapt behavior accordingly?

@bnallapeta - I think we should follow up on this separately, as here we are trying to document the current state of the contract. Providers declaring an "update strategy" would be a change to the current contract.

We are going to need to update the MachinePool controller document following the introduction of this contract doc. Perhaps we should cover this kind of update behaviour when we do that?

@bnallapeta
Contributor

@richardcase quoting from #10496,

> Furthermore I don't know if there is a documented contract as of today for MachinePools how BootstrapConfigs are supposed to be rolled out. I also don't know if MachinePools behave the same across providers today.

I think we should talk about this in the contract. A few questions on this:

Should providers watch for bootstrap config changes and trigger updates? If yes, what's the signal? Hash of the secret data? ConfigRef version?

@richardcase
Member Author

Note for the maintainers, I will squash the commits when we are all happy with the doc.

@richardcase
Member Author

> Should providers watch for bootstrap config changes and trigger updates? If yes, what's the signal? Hash of the secret data? ConfigRef version?

I agree, I do think we need to document this in the MachinePool documentation. However, I don't think this sits in the contract document. Let's add this elsewhere, perhaps where I suggested earlier: https://cluster-api.sigs.k8s.io/developer/core/controllers/machine-pool when we make changes to that.

@chrischdi
Member

> Note for the maintainers, I will squash the commits when we are all happy with the doc.

no need to

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 11, 2025
Member

@fabriziopandini fabriziopandini left a comment

Changes look good to me, but I think we should take a bolder step with regard to moving to the new contract; otherwise it will be confusing for users and complicated for us to handle.


A provider can opt in to MachinePool Machines (MPM). With MPM, all the replicas in a MachinePool are represented by a Machine and an InfraMachine. This enables core CAPI to perform common operations on single machines (and their Nodes), such as draining a node before scale down, integrating with Cluster Autoscaler, and supporting [MachineHealthChecks].

If you want to adopt MPM then you MUST have a `status.infrastructureMachineKind` field and the field must contain the resource kind that represents the replicas in the pool. This is usually named InfraMachine if machine pool machines are representable like regular machines, or InfraMachinePoolMachine in other cases. For example, for the AWS provider the value would be set to `AWSMachine`.
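
To make the opt-in concrete, here is a minimal Go sketch of a provider status type that carries the field. `FooMachinePoolStatus`, `FooMachine` and `optsIntoMPM` are hypothetical names invented for illustration; only the JSON field name `infrastructureMachineKind` comes from the text above. Treating a non-empty value as "opted in" is an assumption of this sketch, not a statement about how the core controller detects it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FooMachinePoolStatus is a hypothetical status type for an infrastructure
// machine pool. It is not taken from any real provider.
type FooMachinePoolStatus struct {
	// InfrastructureMachineKind names the resource kind that represents
	// each replica in the pool (for example "FooMachine").
	InfrastructureMachineKind string `json:"infrastructureMachineKind,omitempty"`
}

// optsIntoMPM reports whether the status declares a kind for its replicas.
// Assumption: an empty value means the provider has not opted in.
func optsIntoMPM(s FooMachinePoolStatus) bool {
	return s.InfrastructureMachineKind != ""
}

func main() {
	s := FooMachinePoolStatus{InfrastructureMachineKind: "FooMachine"}
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	// Print the serialized status and whether it opts in to MPM.
	fmt.Println(string(out), optsIntoMPM(s))
}
```
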
Member

I'm not entirely sure I grasp the meaning of

> This is usually named InfraMachine if machine pool machines are representable like regular machines, or InfraMachinePoolMachine in other cases.

Member Author

I have reworded this, hopefully it makes more sense now.

Comment on lines 307 to 309
Currently this is done by setting `status.ready` to **true**. The value returned here is stored in the MachinePool's `status.infrastructureReady` field.

Additionally, providers should set `initialization.provisioned` to **true**. This value isn't currently used by core CAPI for MachinePools. However, MachinePools will start to use it instead, and `status.ready` will be deprecated. Setting both fields will make the future migration easier.
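
As a rough illustration of the draft wording quoted here (which the review comments that follow go on to question), a provider setting both fields might look like the sketch below. All type and function names are hypothetical; only the field names `ready` and `initialization.provisioned` come from the quoted text, and using a pointer for `provisioned` is an assumption of this sketch.

```go
package main

import "fmt"

// FooMachinePoolInitializationStatus is a hypothetical type holding the
// initialization details of an infrastructure machine pool.
type FooMachinePoolInitializationStatus struct {
	// Provisioned corresponds to initialization.provisioned.
	Provisioned *bool `json:"provisioned,omitempty"`
}

// FooMachinePoolStatus is a hypothetical status type carrying both signals.
type FooMachinePoolStatus struct {
	// Ready corresponds to status.ready.
	Ready bool `json:"ready"`
	// Initialization corresponds to the initialization block.
	Initialization FooMachinePoolInitializationStatus `json:"initialization,omitempty"`
}

// markProvisioned sets both readiness signals together, as the quoted
// draft text describes.
func markProvisioned(s *FooMachinePoolStatus) {
	provisioned := true
	s.Ready = true
	s.Initialization.Provisioned = &provisioned
}

func main() {
	var s FooMachinePoolStatus
	markProvisioned(&s)
	fmt.Println(s.Ready, *s.Initialization.Provisioned)
}
```
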
Member

I'm not sure that asking providers to have both provides a benefit; e.g. for other contracts we ask for only one, depending on the contract version the provider declares it supports.

Also, do you have a timeline in mind for this to happen?

(For the sake of simplicity, I would prefer that there is only one way to transition from one contract version to another; similarly, it will be easier to manage if we can align the timeline with the v1beta1 removal.)

Member Author

Updated this section as we discussed.

@sbueringer
Member

@richardcase Do you have time to address the final findings so we can merge this PR? Thx!

@richardcase
Member Author

> Do you have time to address the final findings so we can merge this PR? Thx!

I do. Sorry for the delay, I kind of got consumed by some downstream work. Chatting with Fabrizio in a bit to clarify a point, and then I'll get the last edits done today.

This adds documentation that details the contract for providers when
implementing an infrastructure machine pool.

This has been created retrospectively from looking at a number of
providers and the MachinePool controller.

Signed-off-by: Richard Case <richard.case@outlook.com>
Changes after the first review by Fabrizio.

Signed-off-by: Richard Case <richard.case@outlook.com>
Changes after the first review by Fabrizio.

Signed-off-by: Richard Case <richard.case@outlook.com>
Some updates after an additional review by Andreas.

Signed-off-by: Richard Case <richard.case@outlook.com>
Various updates after review by Fabrizio and Andreas. Specifically:

- Making it clearer what is preferred state vs current deprecated
- Rewording some parts to be clearer.

Signed-off-by: Richard Case <richard.case@outlook.com>
@richardcase richardcase force-pushed the machinepool_contract_doc branch from 3e1ec90 to ed15af9 Compare November 28, 2025 16:48
@richardcase
Member Author

This should be good to go now.

Contributor

@AndiDog AndiDog left a comment

LGTM


Once `status.initialization.provisioned` is set the MachinePool "core" controller will bubble this info in the MachinePool's `status.initialization.infrastructureProvisioned`; also InfraMachinePools’s `spec.providerIDList` and `status.replicas` will be surfaced on MachinePool’s corresponding fields at the same time.
Contributor

Suggested change
Once `status.initialization.provisioned` is set the MachinePool "core" controller will bubble this info in the MachinePool's `status.initialization.infrastructureProvisioned`; also InfraMachinePools’s `spec.providerIDList` and `status.replicas` will be surfaced on MachinePool’s corresponding fields at the same time.
Once `status.initialization.provisioned` is set, the MachinePool "core" controller will bubble this info in the MachinePool's `status.initialization.infrastructureProvisioned`; also InfraMachinePools’s `spec.providerIDList` and `status.replicas` will be surfaced on MachinePool’s corresponding fields at the same time.
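
The surfacing behaviour described in that sentence can be sketched roughly as follows. `InfraPool`, `MachinePool` and `surface` are simplified, hypothetical stand-ins and not the real Cluster API types or controller code. Copying nothing until `provisioned` is true is an assumption this sketch makes based on the "once ... is set" wording.

```go
package main

import "fmt"

// InfraPool is a hypothetical, flattened view of an InfraMachinePool.
type InfraPool struct {
	ProviderIDList []string // spec.providerIDList
	Replicas       int32    // status.replicas
	Provisioned    bool     // status.initialization.provisioned
}

// MachinePool is a hypothetical, flattened view of the core MachinePool.
type MachinePool struct {
	ProviderIDList            []string // corresponding provider ID list
	Replicas                  int32    // corresponding replica count
	InfrastructureProvisioned bool     // status.initialization.infrastructureProvisioned
}

// surface copies the provisioned flag, provider IDs and replica count onto
// the MachinePool in one step. It returns false and changes nothing if the
// infrastructure pool is not yet provisioned (an assumption of this sketch).
func surface(infra InfraPool, mp *MachinePool) bool {
	if !infra.Provisioned {
		return false
	}
	mp.InfrastructureProvisioned = true
	// Copy the slice so later changes to infra do not alias mp.
	mp.ProviderIDList = append([]string(nil), infra.ProviderIDList...)
	mp.Replicas = infra.Replicas
	return true
}

func main() {
	infra := InfraPool{
		ProviderIDList: []string{"foo://instance-1", "foo://instance-2"},
		Replicas:       2,
		Provisioned:    true,
	}
	var mp MachinePool
	surface(infra, &mp)
	fmt.Println(mp.InfrastructureProvisioned, mp.Replicas, mp.ProviderIDList)
}
```
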

Labels

area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document contract for machine pools

7 participants