@roxanan1996 roxanan1996 commented Dec 10, 2025

DESCRIPTION

Commit 12f147d ("do_change_type(): refuse to operate on unmounted/not ours mounts") is the fix.
Clean cherry-pick.

But it has a follow-up bugfix (bf): commit cffd044 ("use uniform permission checks for all mount propagation changes"),
which was not a clean cherry-pick. Similar to what we needed for lts9.4 and 9.2,
I cherry-picked the dependency
9ffb14e ("move_mount: allow to add a mount into an existing group").
That in turn required its own fix, d8cc036 ("fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)").

That's how we ended up with 4 commits instead of only 2.

COMMITS

do_change_type(): refuse to operate on unmounted/not ours mounts

jira VULN-98605
cve CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit 12f147ddd6de7382dad54812e65f3f08d05809fc

move_mount: allow to add a mount into an existing group

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
commit 9ffb14ef61bab83fa818736bf3e7e6b6e182e8e2

fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit d8cc0362f918d020ca1340d7694f07062dc30f36

use uniform permission checks for all mount propagation changes

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit cffd0441872e7f6b1fce5e78fb1c99187a291330

TESTING

BUILD

> grep -E -B 5 -A 5 '\[TIMER\]|^Starting Build' /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kernel-build-after.log
/home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kernel-src-tree
Running make mrproper...
[TIMER]{MRPROPER}: 4s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1439s
Making Modules
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+
[TIMER]{MODULES}: 9s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 31s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+ and Index to 2
The default is /boot/loader/entries/1c00816342e14fbeabc332089f863e3e-4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+.conf with index 2 and kernel /boot/vmlinuz-4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+
The default is /boot/loader/entries/1c00816342e14fbeabc332089f863e3e-4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+.conf with index 2 and kernel /boot/vmlinuz-4.18.0-rnicolescu_ciqlts8_6_CVE-2025-38498-6aaf7af89f8fa+
Generating grub configuration file ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 4s
[TIMER]{BUILD}: 1439s
[TIMER]{MODULES}: 9s
[TIMER]{INSTALL}: 31s
[TIMER]{TOTAL} 1487s
Rebooting in 10 seconds

Kselftests

> /home/rnicolescu/ciq/kernel-tools/kselftest-diff.sh /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498
/home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kselftest-before.log
212
/home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kselftest-after.log
212
Before: /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kselftest-before.log
After: /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kselftest-after.log
Diff:
No differences found.

Check_kernel_commits

> python3 /home/rnicolescu/ciq/kernel-src-tree-tools/check_kernel_commits.py --repo /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kernel-src-tree --pr_branch {rnicolescu}_ciqlts8_6_CVE-2025-38498 --base_branch origin/ciqlts8_6 --check-cves
All referenced commits exist upstream and have no Fixes: tags.

Run interdiff

> python3 /home/rnicolescu/ciq/kernel-src-tree-tools/run_interdiff.py --repo /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kernel-src-tree --pr_branch {rnicolescu}_ciqlts8_6_CVE-2025-38498 --base_branch origin/ciqlts8_6
[DIFF] PR commit 13b682ac19640 (move_mount: allow to add a mount into an existing group) → upstream 9ffb14ef61ba
Differences found:

  diff -u b/fs/namespace.c b/fs/namespace.c
  --- b/fs/namespace.c
  +++ b/fs/namespace.c
  @@ -2708,4 +2708,4 @@
   
   static int do_move_mount(struct path *old_path, struct path *new_path)
   {
  -	struct mnt_namespace *ns;
  +	struct path parent_path = {.mnt = NULL, .dentry = NULL};

This difference is due to the missing commit 2763d11 ("get rid of detach_mnt()"), which is not needed for this backport.

[DIFF] PR commit 6aaf7af89f8fa (use uniform permission checks for all mount propagation changes) → upstream cffd0441872e
Differences found:

  diff -u b/fs/namespace.c b/fs/namespace.c
  --- b/fs/namespace.c
  +++ b/fs/namespace.c
  @@ -2251,4 +2251,4 @@
  -	return attach_recursive_mnt(mnt, p, mp);
  +	return attach_recursive_mnt(mnt, p, mp, NULL);
   }
   
   static int may_change_propagation(const struct mount *m)

This difference is due to the missing commit 86b1da9 ("attach_recursive_mnt(): get rid of flags entirely"), which is not relevant here.

Run jira_pr_check

> python3 /home/rnicolescu/ciq/kernel-src-tree-tools/jira_pr_check.py --kernel-src-tree /home/rnicolescu/ciq/kernels/lts-8.6_CVE-2025-38498/kernel-src-tree --merge-target {rnicolescu}_ciqlts8_6_CVE-2025-38498 --pr-branch origin/ciqlts8_6

## JIRA PR Check Results

✅ **No issues found!**


---
**Summary:** Checked 0 commit(s) total.

jira VULN-98605
cve CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit 12f147d

Ensure that propagation settings can only be changed for mounts located
in the caller's mount namespace. This change aligns permission checking
with the rest of mount(2).

	Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 07b2088 ("beginning of the shared-subtree proper")
	Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 12f147d)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
commit 9ffb14e

Previously a sharing group (shared and master ids pair) can be only
inherited when mount is created via bindmount. This patch adds an
ability to add an existing private mount into an existing sharing group.

With this functionality one can first create the desired mount tree from
only private mounts (without the need to care about undesired mount
propagation or mount creation order implied by sharing group
dependencies), and next then setup any desired mount sharing between
those mounts in tree as needed.

This allows CRIU to restore any set of mount namespaces, mount trees and
sharing group trees for a container.

We have many issues with restoring mounts in CRIU related to sharing
groups and propagation:
- reverse sharing groups vs mount tree order requires complex mounts
  reordering which mostly implies also using some temporary mounts
(please see https://lkml.org/lkml/2021/3/23/569 for more info)

- mount() syscall creates tons of mounts due to propagation
- mount re-parenting due to propagation
- "Mount Trap" due to propagation
- "Non Uniform" propagation, meaning that with different tricks with
  mount order and temporary children-"lock" mounts one can create mount
  trees which can't be restored without those tricks
(see https://www.linuxplumbersconf.org/event/7/contributions/640/)

With this new functionality we can resolve all the problems with
propagation at once.

Link: https://lore.kernel.org/r/20210715100714.120228-1-ptikhomirov@virtuozzo.com
	Cc: Eric W. Biederman <ebiederm@xmission.com>
	Cc: Alexander Viro <viro@zeniv.linux.org.uk>
	Cc: Christian Brauner <christian.brauner@ubuntu.com>
	Cc: Mattias Nissler <mnissler@chromium.org>
	Cc: Aleksa Sarai <cyphar@cyphar.com>
	Cc: Andrei Vagin <avagin@gmail.com>
	Cc: linux-fsdevel@vger.kernel.org
	Cc: linux-api@vger.kernel.org
	Cc: lkml <linux-kernel@vger.kernel.org>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
	Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
	Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Signed-off-by: Andrei Vagin <avagin@gmail.com>
	Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
(cherry picked from commit 9ffb14e)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
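
For reviewers who want to exercise the new flag from userspace, the following is a minimal sketch of how the commit message above could translate into a call. The helper name `set_sharing_group` is hypothetical, and the fallback values for `SYS_move_mount` (429, as on x86_64) and `MOVE_MOUNT_SET_GROUP` (0x00000100) are assumptions for build hosts whose headers predate the flag. As I understand it, the mount at `to` receives the sharing group of the mount at `from`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed values, see lead-in). */
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef MOVE_MOUNT_SET_GROUP
#define MOVE_MOUNT_SET_GROUP 0x00000100
#endif

/*
 * Hypothetical helper: ask the kernel to add the mount at `to` to the
 * sharing group of the mount at `from`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_sharing_group(const char *from, const char *to)
{
	assert(from != NULL && to != NULL);
	return (int)syscall(SYS_move_mount, AT_FDCWD, from, AT_FDCWD, to,
			    MOVE_MOUNT_SET_GROUP);
}
```

A call such as `set_sharing_group("/mnt/a", "/mnt/b")` is expected to fail with -1 on kernels without this backport, for callers lacking the required privileges, or when the paths do not exist.
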

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit d8cc036

9ffb14e "move_mount: allow to add a mount into an existing group"
breaks assertions on ->mnt_share/->mnt_slave.  For once, the data structures
in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
        All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of slave list.

do_set_group() puts a mount into the same place in propagation graph
as the old one.  As the result, if old mount gets events from somewhere
and is not a pure event sink, new one needs to be placed next to the
old one in the slave list the old one's on.  If it is a pure event
sink, we only need to make sure the new one doesn't end up in the
middle of some peer group.

"move_mount: allow to add a mount into an existing group" ends up putting
the new one in the beginning of list; that's definitely not going to be
in the middle of anything, so that's fine for case when old is not marked
shared.  In case when old one _is_ marked shared (i.e. is not a pure event
sink), that breaks the assumptions of propagation graph iterators.

Put the new mount next to the old one on the list - that does the right thing
in "old is marked shared" case and is just as correct as the current behaviour
if old is not marked shared (kudos to Pavel for pointing that out - my original
suggested fix changed behaviour in the "nor marked" case, which complicated
things for no good reason).

	Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 9ffb14e ("move_mount: allow to add a mount into an existing group")
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit d8cc036)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
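
The placement argument in the commit message above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's data structures: the slave list is an array of peer-group ids (0 meaning "not shared"), and the names `slave_list`, `add_at_head`, `add_next_to` and `peer_groups_contiguous` are all made up for illustration. It only shows why inserting at the head can split a peer group while inserting next to the old mount cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLAVES 16

/* Toy slave list: one peer-group id per mount, 0 = not shared. */
struct slave_list {
	int group[MAX_SLAVES];
	size_t len;
};

/* Insert group_id at index pos, shifting later entries to the right. */
static void insert_at(struct slave_list *l, size_t pos, int group_id)
{
	assert(l->len < MAX_SLAVES && pos <= l->len);
	memmove(&l->group[pos + 1], &l->group[pos],
		(l->len - pos) * sizeof(l->group[0]));
	l->group[pos] = group_id;
	l->len++;
}

/* Model of the behaviour being fixed: new mount goes to the list head. */
void add_at_head(struct slave_list *l, int group_id)
{
	insert_at(l, 0, group_id);
}

/* Model of the fix: new mount goes right after the old one. */
void add_next_to(struct slave_list *l, size_t old_pos, int group_id)
{
	insert_at(l, old_pos + 1, group_id);
}

/* Invariant: members of one peer group form a contiguous segment. */
bool peer_groups_contiguous(const struct slave_list *l)
{
	for (size_t i = 0; i < l->len; i++) {
		if (l->group[i] == 0)
			continue;
		for (size_t j = i + 1; j < l->len; j++) {
			/* Same group reappearing after a gap breaks the invariant. */
			if (l->group[j] == l->group[i] &&
			    l->group[j - 1] != l->group[i])
				return false;
		}
	}
	return true;
}
```

With a list `{7, 7, 3}` and an old mount in group 3 at index 2, `add_at_head` yields `{3, 7, 7, 3}`, which splits group 3, while `add_next_to` yields `{7, 7, 3, 3}`, which keeps it contiguous. For an unshared old mount (group 0), both placements keep the invariant.
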

jira VULN-98605
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit cffd044

do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph.  The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for.  The former is a mess - originally it
didn't even check that mount *is* mounted.  That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
	* only touch mounts that are mounted.  That's a must-have
constraint - data corruption happens if it get violated.
	* don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().

Fixes: 12f147d "do_change_type(): refuse to operate on unmounted/not ours mounts"
	Acked-by: Andrei Vagin <avagin@gmail.com>
	Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Reviewed-by: Christian Brauner <brauner@kernel.org>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit cffd044)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
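
To make the difference between the two checks concrete, here is a small userspace model. It is only a sketch: the struct and function names (`mount_model`, `strict_check`, `uniform_check`) are invented for illustration, the capability test is reduced to a boolean, and the specific error codes are assumptions rather than something stated in the commit messages. `strict_check` models "the mount must be in the caller's namespace"; `uniform_check` models the two rules listed above (mounted, and CAP_SYS_ADMIN in the userns of the mount's namespace).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names are illustrative. */
struct user_ns_model { bool caller_has_cap_sys_admin; };
struct mnt_ns_model  { struct user_ns_model *user_ns; };
struct mount_model   { struct mnt_ns_model *mnt_ns; /* NULL = not mounted */ };

/* Strict rule: the mount must be mounted in the caller's own namespace. */
int strict_check(const struct mount_model *m,
		 const struct mnt_ns_model *caller_ns)
{
	assert(m != NULL);
	if (m->mnt_ns == NULL || m->mnt_ns != caller_ns)
		return -EINVAL;	/* error code is an assumption */
	return 0;
}

/* Uniform rule: mounted somewhere, and caller is admin over that namespace. */
int uniform_check(const struct mount_model *m)
{
	assert(m != NULL);
	if (m->mnt_ns == NULL)
		return -EINVAL;	/* never touch unmounted mounts */
	if (!m->mnt_ns->user_ns->caller_has_cap_sys_admin)
		return -EPERM;	/* error code is an assumption */
	return 0;
}
```

The case that motivates the follow-up is a mount in a different namespace over which the caller does hold CAP_SYS_ADMIN: the strict rule rejects it, the uniform rule accepts it, and both still reject unmounted mounts.
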
@roxanan1996 roxanan1996 marked this pull request as draft December 10, 2025 14:16
@github-actions

🔍 Interdiff Analysis

  • ⚠️ PR commit 13b682ac1964 (move_mount: allow to add a mount into an existing group) → upstream 9ffb14ef61ba
    Differences found:
diff -u b/fs/namespace.c b/fs/namespace.c
--- b/fs/namespace.c
+++ b/fs/namespace.c
@@ -2708,4 +2708,4 @@
 
 static int do_move_mount(struct path *old_path, struct path *new_path)
 {
-	struct mnt_namespace *ns;
+	struct path parent_path = {.mnt = NULL, .dentry = NULL};
  • ⚠️ PR commit 6aaf7af89f8f (use uniform permission checks for all mount propagation changes) → upstream cffd0441872e
    Differences found:
diff -u b/fs/namespace.c b/fs/namespace.c
--- b/fs/namespace.c
+++ b/fs/namespace.c
@@ -2251,4 +2251,4 @@
-	return attach_recursive_mnt(mnt, p, mp);
+	return attach_recursive_mnt(mnt, p, mp, NULL);
 }
 
 static int may_change_propagation(const struct mount *m)

This is an automated interdiff check for backported commits.

@roxanan1996 roxanan1996 marked this pull request as ready for review December 12, 2025 15:47
@roxanan1996 roxanan1996 requested a review from a team December 12, 2025 15:47

@bmastbergen bmastbergen left a comment

🥌
