Skip to content

Conversation

@sempervictus
Copy link
Contributor

@sempervictus sempervictus commented Nov 23, 2025

Motivation and Context

fix #15668 and proxy upstreaming per discussion therein

Description

Upstream @bspengler-oss memory fixes for ZFS permitting operation under the much more rigorous memory and execution controls of grsecurity/pax kernels.

How Has This Been Tested?

2.3.4 no longer blows up with KERNSEAL+PRIVKSTACK+RAP+KERNEXEC+UDEREF and so forth

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

HIGHMEM kmap interfaces operate on only a single page at a time, yet ZFS hadn't accounted for this, resulting in crashes and potential memory corruption on HIGHMEM (typically 32-bit) systems.  This was caught by PaX's KERNSEAL feature as it makes use of HIGHMEM functionality on x64.
    
On typical 64-bit systems, this issue wouldn't have been observed, as the map interfaces simply fall back to returning an address in lowmem where the contiguous pages can be accessed directly.
    
Joint work with the PaX Team, tested by Mark van Dijk

Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
ZFS typically preserves proper LIFO ordering regarding map/unmap operations that wrap the Linux kernel's kmap interfaces that require such ordering, but one instance in abd_raidz_gen_iterate() did not.
    
Similar issues have been fixed in the Linux kernel in the past, see for instance CVE-2025-39899 for userfaultfd.

Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Fix another instance where ZFS assumes multiple pages can be mapped at once via zfs_kmap_local(), resulting in crashes and potential memory corruption on HIGHMEM-enabled (typically 32-bit) systems.

Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
@sempervictus
Copy link
Contributor Author

Ping @behlendorf for continuity from the prior conversation

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work running these highmem cases down. Unfortunately, we don't regularly test on 32-bit systems to catch this kind of thing. The fixes themselves look right to me.

@sempervictus can you fix the cstyle issues by updating the patches to wrap at 80 columns, and the commit comments at 72. Then rebase this again on the latest master.

paddr = zfs_kmap_local(sg_page(aiter->iter_sg));
page = sg_page(aiter->iter_sg);
if (PageHighMem(page)) {
page = nth_page(page, offset / PAGE_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to offset >> PAGE_SHIFT to avoid depending on the compiler to optimize this.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 5, 2025

page = sg_page(aiter->iter_sg);
if (PageHighMem(page))
offset &= ~PAGE_MASK;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I wanted to mention, which had been pointed out to me by the PaX Team, is you may (depending on style preferences) want to use PAGE_SIZE - 1 here and at the other use on line 923 above rather than ~PAGE_MASK, that is, both lines would be turned into:

offset &= PAGE_SIZE - 1;

Both evaluate to the same value on Linux, but on FreeBSD PAGE_MASK has a different interpretation/value which may cause some confusion for developers working on both sides of the codebase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I didn't realize things were different on FreeBSD in this regard. Then yes, let's update it to PAGE_SIZE - 1 to avoid any ambiguity here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, FreeBSD uses PAGE_MASK to mean the offset within the page (therefore has a 0xfff value), whereas Linux uses PAGE_MASK to mean the address aligned to page boundary (~0xfff). Exact opposite meaning and value, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Code Review Needed Ready for review and testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HIGHMEM violation by kmap in zfs_uiomove_bvec_impl

3 participants