
@shreeya-patel98
Collaborator

Update process (this kernel uses the CentOS base for 4.18.0-553)

  • Kernel History Rebuild Process for all src.rpms hosted by RESF
  • Create sig-cloud-8/4.18.0-553.80.1.el8_10 branch
  • Check if any maintained code is included in the new el release.
  • Cherry-pick all code from the previous branch into the new branch (skipping unneeded code)
    • Fix conflicts as they arise
  • Build and Test

Removed Commits

None

Rebase Results

1635dd14f0aa (HEAD -> shreeya_sig-cloud-8/4.18.0-553.80.1.el8_10, origin/shreeya_sig-cloud-8/4.18.0-553.80.1.el8_10) RDMA/mana_ib: use the correct page table index based on hardware page size
2d06885844f7 RDMA/mana_ib: use the correct page size for mapping user-mode doorbell page
8450d53b7276 RDMA/mana_ib: Fix bug in creation of dma regions
c67c2dd5bfdd net: mana: Add support for page sizes other than 4KB on ARM64
262d69572086 net: mana: Enable MANA driver on ARM64 with 4K page size
a5b8d71c783b x86/cpu: Provide default cache line size if not enumerated
a5956399d254 x86/cpu: Get rid of an unnecessary local variable in get_cpu_address_sizes()
9d9631e735e5 x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
55afbde3958d x86/boot: Move x86_cache_alignment initialization to correct spot
b3bd11934be6 x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
9646b4b50868 (tag: resf_kernel-4.18.0-553.80.1.el8_10, origin/sig-cloud-8/4.18.0-553.80.1.el8_10, origin/rocky8_10, sig-cloud-8/4.18.0-553.80.1.el8_10, rocky8_10) Rebuild rocky8_10 with kernel-4.18.0-553.80.1.el8_10

Build

/mnt/scratch/workspace/sig-cloud-8/kernel-src-tree
Skipping make mrproper
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0"
Making olddefconfig
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  UPD     include/generated/uapi/linux/version.h
  DESCEND objtool
  UPD     include/config/kernel.release
  DESCEND bpf/resolve_btfids
  UPD     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTLD  scripts/mod/modpost
  CHK     include/generated/compile.h
  CC      arch/x86/hyperv/hv_init.o
  CC      init/version.o
  AS      arch/x86/entry/vdso/vdso-note.o
  AS      arch/x86/entry/vdso/vdso32/note.o
  VDSO    arch/x86/entry/vdso/vdso64.so.dbg
  VDSO    arch/x86/entry/vdso/vdso32.so.dbg
  OBJCOPY arch/x86/entry/vdso/vdso64.so
  OBJCOPY arch/x86/entry/vdso/vdso32.so
  VDSO2C  arch/x86/entry/vdso/vdso-image-64.c
  VDSO2C  arch/x86/entry/vdso/vdso-image-32.c
  CC      arch/x86/entry/vdso/vdso-image-64.o
  AR      init/built-in.a
  CC      arch/x86/entry/vdso/vdso-image-32.o
--
  INSTALL sound/usb/snd-usb-audio.ko
  INSTALL sound/usb/snd-usbmidi-lib.ko
  INSTALL sound/usb/usx2y/snd-usb-us122l.ko
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+
[TIMER]{MODULES}: 10s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 20s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+ and Index to 0
The default is /boot/loader/entries/753870ec7b134d7582d817dd9bf35992-4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+.conf with index 0 and kernel /boot/vmlinuz-4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+
The default is /boot/loader/entries/753870ec7b134d7582d817dd9bf35992-4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+.conf with index 0 and kernel /boot/vmlinuz-4.18.0-shreeya_sig-cloud-8_4.18.0-553.80.1.el8_10-1635dd14f0+
Generating grub configuration file ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 453s
[TIMER]{MODULES}: 10s
[TIMER]{INSTALL}: 20s
[TIMER]{TOTAL} 488s
Rebooting in 10 seconds

kernel-build.log

Kselftests

shreeya@spatel-dev-bom ~/c/w/sig-cloud-8> grep -a ^ok kselftest-after.log | wc -l
217
shreeya@spatel-dev-bom ~/c/w/sig-cloud-8> grep -a ^ok kselftest-before.log | wc -l
217

kselftest-after.log
kselftest-before.log

ciq-sahlberg and others added 10 commits October 30, 2025 10:48
…tead of a two-phase approach

jira roc-2673
commit fbf6449

Instead of setting x86_virt_bits to a possibly-correct value and then
correcting it later, do all the necessary checks before setting it.

At this point, the #VC handler references boot_cpu_data.x86_virt_bits,
and in the previous version, it would be triggered by the CPUIDs between
the point at which it is set to 48 and when it is set to the correct
value.

    Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Adam Dunlap <acdunlap@google.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Jacob Xu <jacobhxu@google.com>
    Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com

Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira roc-2673
commit 3e32552

c->x86_cache_alignment is initialized from c->x86_clflush_size.
However, commit fbf6449 moved c->x86_clflush_size initialization
to later in boot without moving the c->x86_cache_alignment assignment:

  fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")

This presumably left c->x86_cache_alignment set to zero for longer
than it should be.

The result was an oops on 32-bit kernels while accessing a pointer
at 0x20.  The 0x20 came from accessing a structure member at offset
0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10.  kmalloc() can
evidently return ZERO_SIZE_PTR when it's given 0 as its alignment
requirement.
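The address arithmetic in the paragraph above can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: SKETCH_ALIGN copies the round-up shape of the kernel's ALIGN(), SKETCH_ZERO_SIZE_PTR uses the 0x10 value quoted above, and struct sketch_buffer is a hypothetical layout that only reproduces the 0x10 offset of cpumask. It shows that an alignment of 0 collapses the computed allocation size to 0 (a zero-size kmalloc() request returns ZERO_SIZE_PTR), and that reading the member at 0x10 through that pointer touches 0x20.

```c
#include <assert.h> /* assert() is used by the usage checks */
#include <stddef.h> /* offsetof, size_t */
#include <stdint.h> /* uintptr_t, uint64_t */

/* Same round-up shape as the kernel's ALIGN(x, a); a is meant to be a
 * power of two. With a == 0 the mask (a - 1) wraps to all ones, so the
 * result is ((x - 1) & 0) == 0. */
#define SKETCH_ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* The kernel's ZERO_SIZE_PTR is ((void *)16); kept as an integer here. */
#define SKETCH_ZERO_SIZE_PTR ((uintptr_t)0x10)

/* Hypothetical layout: only the 0x10 offset of cpumask matters. */
struct sketch_buffer {
    uint64_t a;
    uint64_t b;
    void *cpumask; /* offset 0x10 */
};

/* Allocation size as computed by the kzalloc(ALIGN(...)) pattern. */
static size_t sketch_alloc_size(size_t obj_size, size_t cache_line)
{
    return SKETCH_ALIGN(obj_size, cache_line);
}

/* Address touched when ->cpumask is read through ZERO_SIZE_PTR. */
static uintptr_t sketch_faulting_address(void)
{
    return SKETCH_ZERO_SIZE_PTR + offsetof(struct sketch_buffer, cpumask);
}
```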

Move the c->x86_cache_alignment initialization to be after
c->x86_clflush_size has an actual value.

    Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/r/20231002220045.1014760-1-dave.hansen@linux.intel.com
    (cherry picked from commit 3e32552)
Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>

Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-2183
bug-fix x86/sev-es: Set x86_virt_bits
commit-author Paolo Bonzini <pbonzini@redhat.com>
commit 9a45819

In commit fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct
value straight away, instead of a two-phase approach"), the initialization
of c->x86_phys_bits was moved after this_cpu->c_early_init(c).  This is
incorrect because early_init_amd() expected to be able to reduce the
value according to the contents of CPUID leaf 0x8000001f.

Fortunately, the bug was negated by init_amd()'s call to early_init_amd(),
which does reduce x86_phys_bits in the end.  However, this is very
late in the boot process and, most notably, the wrong value is used for
x86_phys_bits when setting up MTRRs.

To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is
set/cleared, and c->extended_cpuid_level is retrieved.
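Why the call order matters can be shown with a toy model. The struct and function names below are hypothetical; the bit fields follow CPUID 0x80000008 EAX[7:0] (physical address width) and CPUID 0x8000001F EBX[11:6] (the number of physical address bits AMD reports as lost when memory encryption is enabled). Running the reduction before the address sizes are read lets the later read overwrite it, which is the ordering problem described above.

```c
#include <assert.h> /* assert() is used by the usage checks */
#include <stdint.h>

/* Hypothetical stand-in for the relevant cpuinfo_x86 field. */
struct sketch_cpu {
    unsigned int phys_bits;
};

/* Models get_cpu_address_sizes(): physical width from 0x80000008 EAX[7:0]. */
static void sketch_get_address_sizes(struct sketch_cpu *c, uint32_t eax_80000008)
{
    c->phys_bits = eax_80000008 & 0xff;
}

/* Models the AMD adjustment: subtract the reduction in 0x8000001F EBX[11:6]. */
static void sketch_early_init_amd(struct sketch_cpu *c, uint32_t ebx_8000001f)
{
    c->phys_bits -= (ebx_8000001f >> 6) & 0x3f;
}

/* Buggy order: the reduction is applied first, then overwritten. */
static unsigned int sketch_phys_bits_buggy(uint32_t eax, uint32_t ebx)
{
    struct sketch_cpu c = { 0 };
    sketch_early_init_amd(&c, ebx);
    sketch_get_address_sizes(&c, eax);
    return c.phys_bits;
}

/* Fixed order: read the address sizes first, then reduce them. */
static unsigned int sketch_phys_bits_fixed(uint32_t eax, uint32_t ebx)
{
    struct sketch_cpu c = { 0 };
    sketch_get_address_sizes(&c, eax);
    sketch_early_init_amd(&c, ebx);
    return c.phys_bits;
}
```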

Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
(cherry picked from commit 9a45819)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
…sizes()

jira LE-2183
bug-fix-prereq x86/sev-es: Set x86_virt_bits
commit-author Borislav Petkov (AMD) <bp@alien8.de>
commit 95bfb35

Drop 'vp_bits_from_cpuid' as it is not really needed.

No functional changes.

	Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240316120706.4352-1-bp@alien8.de
(cherry picked from commit 95bfb35)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-2183
bug-fix x86/sev-es: Set x86_virt_bits
commit-author Dave Hansen <dave.hansen@linux.intel.com>
commit 2a38e4c

tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.

Long Story:

The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size()) is 0.  Normally, this value is populated from
c->x86_clflush_size.

Right now the code is set up to get c->x86_clflush_size from two
places.  First, modern CPUs get it from CPUID.  Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().

The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID.  But there are oddballs.

Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:

	cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
	if (cap0 & (1<<19))
		c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;

So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size.  That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:

        buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
                         GFP_KERNEL);

To fix this, always provide a sane value for ->x86_clflush_size.
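The decode in the quoted snippet and the idea behind the fix can be modelled in isolation. This is a sketch, not the patch: function names are hypothetical, misc and cap0 follow the snippet above (CPUID leaf 1 EBX and EDX), and the fallback line size is a caller-supplied placeholder because the exact default chosen upstream is not restated in this message.

```c
#include <assert.h> /* assert() is used by the usage checks */
#include <stdint.h>

#define SKETCH_CLFSH_BIT (1u << 19) /* CPUID.01H:EDX[CLFSH] */

/* Mirrors the quoted snippet: EBX[15:8] of leaf 1 is the CLFLUSH line
 * size in 8-byte units, valid only when CLFSH is set. */
static unsigned int sketch_clflush_size(uint32_t misc, uint32_t cap0)
{
    if (cap0 & SKETCH_CLFSH_BIT)
        return ((misc >> 8) & 0xff) * 8;
    return 0; /* not enumerated: the Quark/QEMU case described above */
}

/* The idea of the fix: never let the cache alignment stay 0.
 * fallback is a placeholder, not the value the patch uses. */
static unsigned int sketch_cache_alignment(uint32_t misc, uint32_t cap0,
                                           unsigned int fallback)
{
    unsigned int size = sketch_clflush_size(misc, cap0);
    return size ? size : fallback;
}
```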

Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs.  It would not, for instance, have worked on
the QEMU config.

1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel/GenuineIntel0000590_Clanton_03_CPUID.txt
2. You can also get this behavior if you use "-cpu 486,+clzero"
   in QEMU.

[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
	   because bpetkov brutally murdered it recently. ]

Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
	Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
	Tested-by: Jörn Heusipp <osmanx@heusipp.de>
	Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linux.intel.com/
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.de/
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.intel.com
(cherry picked from commit 2a38e4c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-3812
commit-author Haiyang Zhang <haiyangz@microsoft.com>
commit 40a1d11

Change the Kconfig dependency, so this driver can be built and run on ARM64
with 4K page size.
16/64K page sizes are not supported yet.

	Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1715632141-8089-1-git-send-email-haiyangz@microsoft.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 40a1d11)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-3812
commit-author Haiyang Zhang <haiyangz@microsoft.com>
commit 382d174

As defined by the MANA Hardware spec, the queue size for DMA is 4KB
minimal, and power of 2. And, the HWC queue size has to be exactly
4KB.

To support page sizes other than 4KB on ARM64, define the minimal
queue size as a macro separately from the PAGE_SIZE, which we had always
assumed to be 4KB before supporting ARM64.

Also, add MANA specific macros and update code related to size
alignment, DMA region calculations, etc.
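The size rules stated above can be expressed as a few checks. The macro and function names are hypothetical, modelled on the commit text rather than copied from the driver; the only fixed inputs are the 4KB hardware page, the power-of-2 rule for DMA queues, and the exact-4KB rule for the HWC queue.

```c
#include <assert.h>  /* assert() is used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hardware page is always 4KB, independent of the kernel's PAGE_SIZE. */
#define SKETCH_MANA_PAGE_SHIFT 12
#define SKETCH_MANA_PAGE_SIZE ((size_t)1 << SKETCH_MANA_PAGE_SHIFT)
#define SKETCH_MANA_MIN_QSIZE SKETCH_MANA_PAGE_SIZE

static bool sketch_is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* DMA queues: at least 4KB and a power of 2. */
static bool sketch_dma_queue_size_ok(size_t qsize)
{
    return qsize >= SKETCH_MANA_MIN_QSIZE && sketch_is_power_of_2(qsize);
}

/* HWC queue: exactly 4KB, whatever the system page size. */
static bool sketch_hwc_queue_size_ok(size_t qsize)
{
    return qsize == SKETCH_MANA_PAGE_SIZE;
}

/* Round a length up to whole hardware pages, not system pages. */
static size_t sketch_mana_page_align(size_t len)
{
    return (len + SKETCH_MANA_PAGE_SIZE - 1) & ~(SKETCH_MANA_PAGE_SIZE - 1);
}
```

On an ARM64 kernel with 64KB pages, rounding a 5000-byte region to PAGE_SIZE would give 65536 bytes, whereas rounding to the 4KB hardware page gives 8192 bytes.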

	Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
	Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 382d174)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-3812
commit-author Konstantin Taranov <kotaranov@microsoft.com>
commit e02497f

Use ib_umem_dma_offset() helper to calculate correct dma offset.

Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
	Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://lore.kernel.org/r/1709560361-26393-2-git-send-email-kotaranov@linux.microsoft.com
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit e02497f)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
…l page

jira LE-3812
commit-author Long Li <longli@microsoft.com>
commit 4a3b99b

When mapping doorbell page from user-mode, the driver should use the system
page size as this memory is allocated via mmap() from user-mode.
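The driver change itself is not reproduced here. As a hedged illustration of why the system page size matters: a user-space mmap() request is rounded up to whole system pages, so a mapping length expressed in 4KB hardware units matches the VMA on a 4KB-page kernel but not on a 64KB-page one. Both function names and the equality check are hypothetical.

```c
#include <assert.h>  /* assert() is used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* mmap() works in whole system pages, so any request yields a VMA
 * rounded up to a multiple of the system page size. */
static size_t sketch_vma_len(size_t user_len, size_t sys_page_size)
{
    return (user_len + sys_page_size - 1) / sys_page_size * sys_page_size;
}

/* Hypothetical check: a mapping covers the user's VMA only if the
 * two lengths agree. */
static bool sketch_mapping_covers_vma(size_t map_len, size_t vma_len)
{
    return map_len == vma_len;
}
```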

	Cc: stable@vger.kernel.org
Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
	Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1725030993-16213-2-git-send-email-longli@linuxonhyperv.com
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 4a3b99b)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
… size

jira LE-3812
commit-author Long Li <longli@microsoft.com>
commit 9e517a8

MANA hardware uses 4k page size. When calculating the page table index,
it should use the hardware page size, not the system page size.
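A minimal sketch of the index calculation, assuming a byte offset into a region described by a table of 4KB hardware pages; the names are hypothetical. On a 4KB-page kernel both shifts agree, so the bug only appears with larger system pages, such as 64KB pages on ARM64, which the earlier MANA commits in this series add support for.

```c
#include <assert.h> /* assert() is used by the usage checks */
#include <stdint.h>

#define SKETCH_HW_PAGE_SHIFT 12 /* MANA hardware pages are 4KB */

/* Index of the hardware page-table entry covering byte offset off. */
static uint64_t sketch_hw_pt_index(uint64_t off)
{
    return off >> SKETCH_HW_PAGE_SHIFT;
}

/* The mistake described above: indexing with the system page shift. */
static uint64_t sketch_sys_pt_index(uint64_t off, unsigned int sys_page_shift)
{
    return off >> sys_page_shift;
}
```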

	Cc: stable@vger.kernel.org
Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
	Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1725030993-16213-1-git-send-email-longli@linuxonhyperv.com
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 9e517a8)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
@shreeya-patel98 shreeya-patel98 requested a review from a team October 30, 2025 12:03
@shreeya-patel98 shreeya-patel98 self-assigned this Oct 30, 2025
@shreeya-patel98
Collaborator Author

Sorry, for some reason I lost the logs for the following command.

python3 rolling-release-update.py --repo ../kernel-src-tree/ \
--new-base-branch rocky8_10 \
--old-rolling-branch sig-cloud-8/4.18.0-553.79.1.el8_10

I've added the output of git log --oneline, which shows the commits that were added by running the above command.

Collaborator

@bmastbergen bmastbergen left a comment

🥌

Collaborator

@PlaidCat PlaidCat left a comment

:shipit:

@shreeya-patel98 shreeya-patel98 merged commit ee9f1c2 into sig-cloud-8/4.18.0-553.80.1.el8_10 Oct 31, 2025
2 checks passed
@shreeya-patel98 shreeya-patel98 deleted the shreeya_sig-cloud-8/4.18.0-553.80.1.el8_10 branch October 31, 2025 10:07
Labels

None yet

5 participants