
@zvonkok zvonkok commented Dec 8, 2025

Hopper and all later (Hopper+) architectures support Confidential Computing (CC).

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Copilot AI review requested due to automatic review settings December 8, 2025 21:35

copy-pr-bot bot commented Dec 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot AI left a comment

Pull request overview

This PR consolidates GPU node labeling rules by replacing specific H100/H800 GPU model rules with broader family-based rules for Hopper and Blackwell architectures. The changes use PCI ID range matching via regex patterns instead of individual device IDs, and extend Confidential Computing (CC) capability support to include both Hopper and Blackwell GPU families.

Key changes:

  • Consolidated 5 specific Hopper-based rules (H100, H100 PCIe, H100 80GB HBM3, H800, H800 PCIE) into a single "NVIDIA Hopper GPU" rule that uses a regex pattern for the PCI ID range 0x2300-0x23ff
  • Added new "NVIDIA Blackwell GPU" rule covering PCI ID range 0x2b00-0x33ff
  • Updated CC capability rules to recognize both "hopper" and "blackwell" GPU families with TDX/SEV-SNP support
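
The range-based matching summarized above can be illustrated with a minimal Python sketch. The Hopper pattern below is an assumption written to cover 0x2300-0x23ff; the PR's actual Hopper regex is not shown in this conversation. Device IDs are assumed to be compared as four-digit lowercase hex strings, as in the `device` values quoted from the rule file.

```python
import re

# Hypothetical pattern for the Hopper range 0x2300-0x23ff.
# The regex actually used in the PR is not shown in this thread.
HOPPER_PATTERN = re.compile(r"^23[0-9a-f]{2}$")


def matched_ids(pattern):
    """Return every 16-bit device ID whose 4-digit lowercase hex form matches."""
    return [i for i in range(0x10000) if pattern.match(f"{i:04x}")]


hopper_ids = matched_ids(HOPPER_PATTERN)
# Print the lowest and highest matched ID and how many IDs matched.
print(f"{hopper_ids[0]:#06x}-{hopper_ids[-1]:#06x}, {len(hopper_ids)} IDs")
```

Enumerating all 65536 possible IDs is cheap and avoids reasoning about the regex by hand; a single pattern of this shape would also cover the previously listed device `2322`.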

matchExpressions:
  vendor: {op: In, value: ["10de"]}
  device: {op: In, value: ["2322"]}
# GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
Copilot AI Dec 8, 2025

The comment mentions specific chip models "GB202-GB207, GB110-GB120" but the regex pattern ^(2[b-f]|3[0-3])[0-9a-f]{2}$ matches a much broader range (0x2b00-0x33ff). This includes values that don't correspond to the listed chip models. Consider either:

  1. Making the comment more generic (e.g., "Blackwell family (0x2b00-0x33ff)")
  2. Narrowing the regex pattern to match only the specific chip ranges if that's the intent

The current discrepancy between the comment and the pattern could be confusing for future maintainers.

Suggested change
- # GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
+ # Blackwell family (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
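
The range stated in the review comment can be checked mechanically. The sketch below takes the regex quoted in the comment and enumerates every 16-bit device ID, again assuming IDs are matched as four-digit lowercase hex strings. It only confirms which IDs the pattern accepts, not which chips those IDs belong to.

```python
import re

# Regex quoted in the review comment for the "NVIDIA Blackwell GPU" rule.
BLACKWELL_PATTERN = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")

# Collect every 16-bit device ID the pattern accepts.
matched = [i for i in range(0x10000) if BLACKWELL_PATTERN.match(f"{i:04x}")]

# True when the matched IDs form one unbroken range with no gaps.
is_contiguous = matched == list(range(matched[0], matched[-1] + 1))

print(f"{matched[0]:#06x}-{matched[-1]:#06x}, "
      f"{len(matched)} IDs, contiguous={is_contiguous}")
```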

@msanft msanft Dec 15, 2025

We recently tweaked the NodeFeatureRules to work with a B200 (should be GB100, iirc) cluster we received. Its device ID is 0x2901, which wouldn't be matched by the config in this PR. Any chance this can be changed?
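
The mismatch reported here can be reproduced with a short sketch. `PR_PATTERN` is the regex quoted earlier in this review. `WIDENED_PATTERN` is purely hypothetical: it assumes the whole 0x2900-0x29ff block should be treated as Blackwell, which this thread does not confirm, and it is not a change proposed in the PR.

```python
import re

# Regex quoted in the review for the Blackwell rule.
PR_PATTERN = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")

# Hypothetical widening that also accepts 0x2900-0x29ff (an assumption,
# not taken from the PR).
WIDENED_PATTERN = re.compile(r"^(29|2[b-f]|3[0-3])[0-9a-f]{2}$")


def matches(pattern, device_id):
    """Check a numeric device ID against a pattern as 4-digit lowercase hex."""
    return pattern.match(f"{device_id:04x}") is not None


b200_id = 0x2901  # device ID reported in the comment above
print("PR pattern matches:", matches(PR_PATTERN, b200_id))
print("Widened pattern matches:", matches(WIDENED_PATTERN, b200_id))
```

Listing the extra prefix as a separate alternative keeps 0x2a00-0x2aff excluded, so the widening does not silently pull in IDs nobody asked for.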
