2 changes: 1 addition & 1 deletion resources/hiding_ci/build_and_install_kernel.sh
@@ -79,7 +79,7 @@ confirm() {
apply_patch_file() {
echo "Applying patch:" $(basename $1)

-git apply $1
+git apply --allow-empty $1
}

apply_patch_or_series() {
@@ -0,0 +1,70 @@
From 363385a3c2cd4f7fe445ed71329e55d190cb14d5 Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Tue, 2 Dec 2025 12:15:49 +0000
Subject: [RFC PATCH 0/2] KVM: pfncache: Support guest_memfd without direct map

[ based on kvm/next with [1] ]

Recent work on guest_memfd [1] is introducing support for removing guest
memory from the kernel direct map (that work has not been merged yet,
which is why this series is labelled RFC). The feature is useful for
non-CoCo VMs as a general safety improvement: it prevents the host
kernel from accidentally or speculatively accessing guest memory.
Pages for guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP have
their direct-map PTEs explicitly disabled, and thus cannot rely on the
direct map.
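
For illustration only, a VMM would request such a guest_memfd roughly as
below. GUEST_MEMFD_FLAG_NO_DIRECT_MAP is defined by the series in [1] and
is not upstream, and vm_fd/guest_memory_size are placeholders, so treat
this as a sketch against that series rather than a stable ABI:

  struct kvm_create_guest_memfd gmem = {
          .size  = guest_memory_size,
          .flags = GUEST_MEMFD_FLAG_NO_DIRECT_MAP,
  };
  int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

  /* Bind the fd to a memslot via KVM_SET_USER_MEMORY_REGION2 with
   * KVM_MEM_GUEST_MEMFD, as with any other guest_memfd. */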

This breaks the facilities that use gfn_to_pfn_cache, including
kvm-clock. gfn_to_pfn_cache caches the pfn and kernel host virtual
address (khva) for a given gfn so that KVM can repeatedly read or write
the corresponding guest page. The cached khva may be later dereferenced
from atomic contexts in some cases. Such contexts cannot tolerate
sleeping or page faults, and therefore cannot use the userspace mapping
(uhva), as those mappings may fault at any time. As a result,
gfn_to_pfn_cache requires a stable, fault-free kernel virtual address
for the backing pages, independent of the userspace mapping.
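
For example, a consumer such as kvm-clock dereferences the cached khva
with the gpc lock held and interrupts disabled, roughly like the
simplified, illustrative snippet below (data and len are placeholders);
sleeping in kvm_gpc_refresh() is only possible after dropping the lock:

  read_lock_irqsave(&gpc->lock, flags);
  while (!kvm_gpc_check(gpc, len)) {
          read_unlock_irqrestore(&gpc->lock, flags);
          if (kvm_gpc_refresh(gpc, len))          /* may sleep/fault */
                  return;
          read_lock_irqsave(&gpc->lock, flags);
  }
  memcpy(gpc->khva, data, len);                   /* must not fault */
  read_unlock_irqrestore(&gpc->lock, flags);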

This small patch series enables gfn_to_pfn_cache to work correctly when
a memslot is backed by guest_memfd with GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
The first patch teaches gfn_to_pfn_cache to obtain the pfn for
guest_memfd-backed memslots via kvm_gmem_get_pfn() instead of GUP
(hva_to_pfn()). The second patch makes gfn_to_pfn_cache use
vmap()/vunmap() to create a fault-free kernel address for such pages.
We believe that establishing such a mapping for paravirtual guest/host
communication is acceptable since these pages do not contain sensitive
data.

Another idea we considered was to use memremap() instead of vmap(),
since gpc_map() already falls back to memremap() if pfn_valid() is
false. However, vmap() was chosen for the following reason: memremap() with
MEMREMAP_WB first attempts to use the direct map via try_ram_remap(),
and then falls back to arch_memremap_wb(), which explicitly refuses to
map system RAM. It would be possible to relax this restriction, but the
side effects are unclear because memremap() is widely used throughout
the kernel. Changing memremap() to support system RAM without the
direct map solely for gfn_to_pfn_cache feels disproportionate. If
additional users appear that need to map system RAM without the direct
map, revisiting and generalizing memremap() might make sense. For now,
vmap()/vunmap() provides a contained and predictable solution.
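
Concretely, for such pages patch 2 maps a single page with

  khva = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
  ...
  vunmap(khva);

whereas the rejected alternative would have been to route them through
gpc_map()'s existing fallback

  khva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);

which, as described above, either reuses the direct map or refuses to
map system RAM.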

A possible future approach is to use the "ephmap" (or proclocal)
mechanism proposed in [2], but it is not yet clear when that work will
be merged. In contrast, the changes in this patch series are small and
self-contained, yet immediately allow gfn_to_pfn_cache (including
kvm-clock) to operate correctly with direct map-removed guest_memfd.
Once ephmap is eventually merged, gfn_to_pfn_cache can be updated to
make use of it as appropriate.

[1]: https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de/
[2]: https://lore.kernel.org/all/20250812173109.295750-1-jackmanb@google.com/

Takahiro Itazuri (2):
KVM: pfncache: Use kvm_gmem_get_pfn() for guest_memfd-backed memslots
KVM: pfncache: Use vmap() for guest_memfd pages without direct map

include/linux/kvm_host.h | 7 ++++++
virt/kvm/pfncache.c | 52 +++++++++++++++++++++++++++++-----------
2 files changed, 45 insertions(+), 14 deletions(-)

--
2.50.1

@@ -0,0 +1,99 @@
From bebfd5914aae9d2eaec6b90b6408875f2aa40610 Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Mon, 1 Dec 2025 14:58:44 +0000
Subject: [RFC PATCH 1/2] KVM: pfncache: Use kvm_gmem_get_pfn() for guest_memfd-backed memslots

gfn_to_pfn_cache currently relies on hva_to_pfn(), which resolves PFNs
through GUP. GUP assumes that the page has a valid direct-map PTE,
which is not true for a guest_memfd created with
GUEST_MEMFD_FLAG_NO_DIRECT_MAP, whose direct-map PTEs are explicitly
removed via set_direct_map_valid_noflush().

Introduce a helper function, gpc_to_pfn(), that routes PFN lookup to
kvm_gmem_get_pfn() for guest_memfd-backed memslots (regardless of
whether GUEST_MEMFD_FLAG_NO_DIRECT_MAP is set), and otherwise falls
back to the existing hva_to_pfn() path. Rename hva_to_pfn_retry() to
gpc_to_pfn_retry() accordingly.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
virt/kvm/pfncache.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 728d2c1b488a..bf8d6090e283 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -152,22 +152,34 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	return kvm->mmu_invalidate_seq != mmu_seq;
 }

-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+static kvm_pfn_t gpc_to_pfn(struct gfn_to_pfn_cache *gpc, struct page **page)
 {
-	/* Note, the new page offset may be different than the old! */
-	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
-	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
-	void *new_khva = NULL;
-	unsigned long mmu_seq;
-	struct page *page;
+	if (kvm_slot_has_gmem(gpc->memslot)) {
+		kvm_pfn_t pfn;
+
+		kvm_gmem_get_pfn(gpc->kvm, gpc->memslot, gpa_to_gfn(gpc->gpa),
+				 &pfn, page, NULL);
+		return pfn;
+	}

 	struct kvm_follow_pfn kfp = {
 		.slot = gpc->memslot,
 		.gfn = gpa_to_gfn(gpc->gpa),
 		.flags = FOLL_WRITE,
 		.hva = gpc->uhva,
-		.refcounted_page = &page,
+		.refcounted_page = page,
 	};
+	return hva_to_pfn(&kfp);
+}
+
+static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+{
+	/* Note, the new page offset may be different than the old! */
+	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
+	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+	void *new_khva = NULL;
+	unsigned long mmu_seq;
+	struct page *page;

 	lockdep_assert_held(&gpc->refresh_lock);

@@ -206,7 +218,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			cond_resched();
 		}

-		new_pfn = hva_to_pfn(&kfp);
+		new_pfn = gpc_to_pfn(gpc, &page);
 		if (is_error_noslot_pfn(new_pfn))
 			goto out_error;

@@ -319,7 +331,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		}
 	}

-	/* Note: the offset must be correct before calling hva_to_pfn_retry() */
+	/* Note: the offset must be correct before calling gpc_to_pfn_retry() */
 	gpc->uhva += page_offset;

/*
@@ -327,7 +339,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!gpc->valid || hva_change) {
-		ret = hva_to_pfn_retry(gpc);
+		ret = gpc_to_pfn_retry(gpc);
 	} else {
 		/*
 		 * If the HVA→PFN mapping was already valid, don't unmap it.
--
2.50.1

This file was deleted.

@@ -0,0 +1,115 @@
From 6c90c75b15ba48cc5e0a1e74224c041c3c45668b Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Mon, 1 Dec 2025 16:47:05 +0000
Subject: [RFC PATCH 2/2] KVM: pfncache: Use vmap() for guest_memfd pages without direct map

gfn_to_pfn_cache currently maps RAM PFNs with kmap(), which relies on
the direct map. A guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP
has its direct-map PTEs disabled via set_direct_map_valid_noflush(), so
the linear address returned by kmap()/page_address() will fault if
dereferenced.

In some cases, gfn_to_pfn_cache dereferences the cached kernel address
(khva) from atomic contexts where page faults cannot be tolerated.
Therefore khva must always refer to a fault-free kernel mapping. Since
mapping and unmapping happen exclusively in the refresh path, which may
sleep, using vmap()/vunmap() for these pages is safe and sufficient.

Introduce kvm_slot_no_direct_map() to detect guest_memfd slots without
the direct map, and make gpc_map()/gpc_unmap() use vmap()/vunmap() for
such pages.

This allows the facilities based on gfn_to_pfn_cache (e.g. kvm-clock) to
work correctly with guest_memfd regardless of whether its direct-map
PTEs are valid.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
include/linux/kvm_host.h | 7 +++++++
virt/kvm/pfncache.c | 26 ++++++++++++++++++++------
2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 70e6a5210ceb..793d98f97928 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -15,6 +15,7 @@
#include <linux/minmax.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
+#include <linux/pagemap.h>
#include <linux/preempt.h>
#include <linux/msi.h>
#include <linux/slab.h>
@@ -628,6 +629,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl
 	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
 }

+static inline bool kvm_slot_no_direct_map(const struct kvm_memory_slot *slot)
+{
+	return slot && kvm_slot_has_gmem(slot) &&
+	       mapping_no_direct_map(slot->gmem.file->f_mapping);
+}
+
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
 {
 	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index bf8d6090e283..ae6d8699e536 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -96,10 +96,16 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	return true;
 }

-static void *gpc_map(kvm_pfn_t pfn)
+static void *gpc_map(struct gfn_to_pfn_cache *gpc, kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return kmap(pfn_to_page(pfn));
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (kvm_slot_no_direct_map(gpc->memslot))
+			return vmap(&page, 1, VM_MAP, PAGE_KERNEL);
+
+		return kmap(page);
+	}

 #ifdef CONFIG_HAS_IOMEM
 	return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
@@ -115,6 +121,11 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 		return;

 	if (pfn_valid(pfn)) {
+		if (is_vmalloc_addr(khva)) {
+			vunmap(khva);
+			return;
+		}
+
 		kunmap(pfn_to_page(pfn));
 		return;
 	}
@@ -224,13 +235,16 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

 		/*
 		 * Obtain a new kernel mapping if KVM itself will access the
-		 * pfn. Note, kmap() and memremap() can both sleep, so this
-		 * too must be done outside of gpc->lock!
+		 * pfn. Note, kmap(), vmap() and memremap() can sleep, so this
+		 * too must be done outside of gpc->lock! Note that even though
+		 * the rwlock is dropped, it's still fine to access gpc->pfn and
+		 * other fields because the gpc->refresh_lock mutex prevents
+		 * those from being changed.
 		 */
 		if (new_pfn == gpc->pfn)
 			new_khva = old_khva;
 		else
-			new_khva = gpc_map(new_pfn);
+			new_khva = gpc_map(gpc, new_pfn);

 		if (!new_khva) {
 			kvm_release_page_unused(page);
--
2.50.1