From c60143a808108bad115ae174f00a30adfdd75149 Mon Sep 17 00:00:00 2001
From: Peter Colledge
Date: Wed, 20 Dec 2023 11:39:24 +0000
Subject: [PATCH] cleanup: Documentation on NFD

---
 README.md   |  4 ++++
 docs/nfd.md | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 docs/nfd.md

diff --git a/README.md b/README.md
index 71634b7..ce7917d 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ Use OpenOnload® or EnterpriseOnload® to accelerate your workloads in Kubernete
 * [AMD Solarflare](https://www.solarflare.com) hardware (`sfc`)
 * OpenShift Container Platform (OCP) 4.10+ with
   * [Kernel Module Management (KMM) Operator](https://kmm.sigs.k8s.io/) 1.1 ([OpenShift documentation](https://docs.openshift.com/container-platform/4.14/hardware_enablement/kmm-kernel-module-management.html))
+  * [Node Feature Discovery (NFD)](docs/nfd.md) Operator (optional)
 * Both restricted network or internet-connected clusters

 Deployment can also be performed on Kubernetes 1.23+ but full implementation details are not currently provided.
@@ -164,6 +165,9 @@ this recommended overlay further, see the variant steps below.

 The above overlay configures KMM to `modprobe onload` but `modprobe sfc` is also required. Please see
 [Out-of-tree `sfc` module](#out-of-tree-sfc-kernel-module) for options.

+The above overlay selects **all `worker` role nodes** in the cluster. To filter based on node hardware, you may wish
+to use the [recommended Node Feature Discovery configuration](docs/nfd.md).
+
 > [!IMPORTANT]
 > Due to Kubernetes limitations on label lengths, the combined length of the Name and Namespace of the Onload CR must be less than 32 characters.
diff --git a/docs/nfd.md b/docs/nfd.md
new file mode 100644
index 0000000..d759e4a
--- /dev/null
+++ b/docs/nfd.md
@@ -0,0 +1,54 @@
+
+# Selecting Nodes with AMD Solarflare hardware using Node Feature Discovery (NFD)
+
+## Cluster configuration
+
+[Node Feature Discovery (NFD)](https://kubernetes-sigs.github.io/node-feature-discovery)
+([Red Hat documentation](https://docs.openshift.com/container-platform/4.14/hardware_enablement/psap-node-feature-discovery-operator.html#create-cd-cli_node-feature-discovery-operator))
+enables the selection of nodes based on hardware features and system configuration.
+NFD-Worker runs on each node to detect changes which are then used to label the node.
+
+A `NodeFeatureDiscovery` CR enables the detections you require. A full example is provided in the above documentation
+if you do not already have one configured.
+
+To enable detection of AMD Solarflare cards, identified by the PCIe Subsystem Vendor ID '1924',
+add the following configuration to your CR's `configData` section:
+
+```yaml
+kind: NodeFeatureDiscovery
+...
+spec:
+  ...
+  workerConfig:
+    configData: |
+      sources:
+        pci:
+          deviceClassWhitelist:
+            - "1924"
+          deviceLabelFields:
+            - "subsystem_vendor"
+```
+
+After NFD is deployed, configured, and its daemons have performed detections, verify with:
+
+```sh
+kubectl get nodes -l feature.node.kubernetes.io/pci-1924.present=true
+```
+
+## Onload Custom Resource (CR) & workload configuration
+
+Now that the above is configured, automated build and loading of the out-of-tree `sfc` driver on all AMD Solarflare
+hardware nodes can be easily achieved through the addition of the following node label selector in
+your Onload CR and/or workloads:
+
+```yaml
+  selector:
+    feature.node.kubernetes.io/pci-1924.present: "true"
+```
+
+## Footnotes
+
+```yaml
+SPDX-License-Identifier: MIT
+SPDX-FileCopyrightText: (c) Copyright 2023 Advanced Micro Devices, Inc.
+```