47 changes: 10 additions & 37 deletions deployments/gpu-operator/templates/nodefeaturerules.yaml
@msanft msanft Dec 15, 2025

We recently tweaked the NodeFeatureRules to work with a B200 cluster we received (the chip should be GB100, IIRC). Its device ID is 0x2901, which the config in this PR would not match. Any chance this can be changed?
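
To make the gap concrete, here is a minimal sketch that checks device IDs against the two patterns in this PR. The patterns are copied from the diff; Python's `re` module is only a stand-in for whatever matcher NFD applies to `InRegexp`, and the names `FAMILY_PATTERNS` and `matching_families` are hypothetical.

```python
import re

# Device-ID patterns copied from the diff in this PR.
# Python's `re` is an assumption here; it stands in for NFD's InRegexp matcher.
FAMILY_PATTERNS = {
    "hopper": re.compile(r"^23[0-9a-f]{2}$"),
    "blackwell": re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$"),
}


def matching_families(device_id: str) -> list[str]:
    """Return the families whose pattern matches a lowercase hex device ID."""
    return [name for name, pattern in FAMILY_PATTERNS.items()
            if pattern.match(device_id)]


print(matching_families("2339"))  # ['hopper']
print(matching_families("2901"))  # [] (the reported B200 ID matches neither rule)
```

Under these assumptions, a node whose only GPU reports 0x2901 would get no `nvidia.com/gpu.family` label from either rule.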

@@ -44,64 +44,37 @@ spec:
matchExpressions:
sev.enabled:
op: Exists
- name: "NVIDIA H100"
- name: "NVIDIA Hopper GPU"
labels:
"nvidia.com/gpu.H100": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2339"]}
- name: "NVIDIA H100 PCIe"
# H100/H800 (0x2300-0x23ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
device: {op: InRegexp, value: ["^23[0-9a-f]{2}$"]}
- name: "NVIDIA Blackwell GPU"
labels:
"nvidia.com/gpu.H100.pcie": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2331"]}
- name: "NVIDIA H100 80GB HBM3"
labels:
"nvidia.com/gpu.H100.HBM3": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2330"]}
- name: "NVIDIA H800"
labels:
"nvidia.com/gpu.H800": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2324"]}
- name: "NVIDIA H800 PCIE"
labels:
"nvidia.com/gpu.H800.pcie": "true"
"nvidia.com/gpu.family": "hopper"
"nvidia.com/gpu.family": "blackwell"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2322"]}
# GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
Copilot AI Dec 8, 2025

The comment mentions specific chip models "GB202-GB207, GB110-GB120" but the regex pattern ^(2[b-f]|3[0-3])[0-9a-f]{2}$ matches a much broader range (0x2b00-0x33ff). This includes values that don't correspond to the listed chip models. Consider either:

  1. Making the comment more generic (e.g., "Blackwell family (0x2b00-0x33ff)")
  2. Narrowing the regex pattern to match only the specific chip ranges if that's the intent

The current discrepancy between the comment and the pattern could be confusing for future maintainers.

Suggested change
# GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
# Blackwell family (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
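
The range quoted in the comment can be checked by brute force. This is a rough sketch, assuming device IDs are compared as four lowercase hex digits and using Python's `re` in place of NFD's matcher; the variable names are hypothetical.

```python
import re

# Blackwell pattern copied from the diff in this PR.
BLACKWELL = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")

# Try every 16-bit device ID, formatted as four lowercase hex digits.
matched = [i for i in range(0x10000) if BLACKWELL.match(f"{i:04x}")]

# True when the matched IDs form one unbroken range.
contiguous = matched == list(range(matched[0], matched[-1] + 1))

print(hex(matched[0]), hex(matched[-1]), len(matched), contiguous)
# 0x2b00 0x33ff 2304 True
```

So the pattern covers exactly the contiguous block 0x2b00-0x33ff (2304 IDs); whether every ID in that block is a Blackwell part is a separate question that this check does not answer.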
device: {op: InRegexp, value: ["^(2[b-f]|3[0-3])[0-9a-f]{2}$"]}
- name: "NVIDIA CC Enabled"
labels:
"nvidia.com/cc.capable": "true"
matchAny: # TDX/SEV + Hopper GPU
matchAny: # TDX/SEV + Hopper/Blackwell GPU
- matchFeatures:
- feature: rule.matched
matchExpressions:
nvidia.com/gpu.family: {op: In, value: ["hopper"]}
nvidia.com/gpu.family: {op: In, value: ["hopper", "blackwell"]}
sev.snp.enabled: {op: IsTrue}
- matchFeatures:
- feature: rule.matched
matchExpressions:
nvidia.com/gpu.family: {op: In, value: ["hopper"]}
nvidia.com/gpu.family: {op: In, value: ["hopper", "blackwell"]}
tdx.enabled: {op: IsTrue}
{{- end }}
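
As I read the updated "NVIDIA CC Enabled" rule, `matchAny` treats its two entries as alternatives, and each entry requires both of its expressions to hold. A small sketch of that logic follows; the function `cc_capable` and its parameters are hypothetical and only model the rule, they are not part of NFD.

```python
# GPU families accepted by the updated rule in this PR.
CC_FAMILIES = {"hopper", "blackwell"}


def cc_capable(gpu_family: str, sev_snp_enabled: bool, tdx_enabled: bool) -> bool:
    """Model of the rule: (family AND SEV-SNP) OR (family AND TDX)."""
    family_ok = gpu_family in CC_FAMILIES
    via_sev_snp = family_ok and sev_snp_enabled
    via_tdx = family_ok and tdx_enabled
    return via_sev_snp or via_tdx


print(cc_capable("blackwell", sev_snp_enabled=False, tdx_enabled=True))  # True
print(cc_capable("hopper", sev_snp_enabled=False, tdx_enabled=False))    # False
```

Because the rule keys on the `nvidia.com/gpu.family` label, a GPU that gets no family label can never be marked `nvidia.com/cc.capable`, regardless of the CPU features.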