
Commit f0886ca

secret-hiding: Update kernel patches for kvm-clock
We used a very ad-hoc solution for kvm-clock. The new kernel patches make gfn_to_pfn_cache (which kvm-clock is based on) work for guest_memfd without the direct map.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
1 parent 3b0a2ef commit f0886ca

4 files changed (+284 additions, -103 deletions)
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
From 363385a3c2cd4f7fe445ed71329e55d190cb14d5 Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Tue, 2 Dec 2025 12:15:49 +0000
Subject: [RFC PATCH 0/2] KVM: pfncache: Support guest_memfd without direct map

[ based on kvm/next with [1] ]

Recent work on guest_memfd [1] is introducing support for removing guest
memory from the kernel direct map (it has not been merged yet, which is
why this patch series is labelled RFC). The feature is useful for
non-CoCo VMs to prevent the host kernel from accidentally or
speculatively accessing guest memory, as a general safety improvement.
Pages of a guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP have
their direct-map PTEs explicitly disabled, and thus cannot rely on the
direct map.

This breaks the facilities that use gfn_to_pfn_cache, including
kvm-clock. gfn_to_pfn_cache caches the pfn and kernel host virtual
address (khva) for a given gfn so that KVM can repeatedly read or write
the corresponding guest page. The cached khva may later be dereferenced
from atomic contexts in some cases. Such contexts cannot tolerate
sleeping or page faults, and therefore cannot use the userspace mapping
(uhva), as that mapping may fault at any time. As a result,
gfn_to_pfn_cache requires a stable, fault-free kernel virtual address
for the backing pages, independent of the userspace mapping.

This small patch series enables gfn_to_pfn_cache to work correctly when
a memslot is backed by guest_memfd with GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
The first patch teaches gfn_to_pfn_cache to obtain the pfn for
guest_memfd-backed memslots via kvm_gmem_get_pfn() instead of GUP
(hva_to_pfn()). The second patch makes gfn_to_pfn_cache use
vmap()/vunmap() to create a fault-free kernel address for such pages.
We believe that establishing such a mapping for paravirtual guest/host
communication is acceptable since such pages do not contain sensitive
data.

Another idea considered was to use memremap() instead of vmap(), since
gpc_map() already falls back to memremap() if pfn_valid() is false.
However, vmap() was chosen for the following reason: memremap() with
MEMREMAP_WB first attempts to use the direct map via try_ram_remap(),
and then falls back to arch_memremap_wb(), which explicitly refuses to
map system RAM. It would be possible to relax this restriction, but the
side effects are unclear because memremap() is widely used throughout
the kernel. Changing memremap() to support system RAM without the
direct map solely for gfn_to_pfn_cache feels disproportionate. If
additional users appear that need to map system RAM without the direct
map, revisiting and generalizing memremap() might make sense. For now,
vmap()/vunmap() provides a contained and predictable solution.

A possible future approach is to use the "ephmap" (or proclocal)
mechanism proposed in [2], but it is not yet clear when that work will
be merged. In contrast, the changes in this patch series are small and
self-contained, yet immediately allow gfn_to_pfn_cache (including
kvm-clock) to operate correctly with direct-map-removed guest_memfd.
Once ephmap is eventually merged, gfn_to_pfn_cache can be updated to
make use of it as appropriate.

[1]: https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de/
[2]: https://lore.kernel.org/all/20250812173109.295750-1-jackmanb@google.com/

Takahiro Itazuri (2):
  KVM: pfncache: Use kvm_gmem_get_pfn() for guest_memfd-backed memslots
  KVM: pfncache: Use vmap() for guest_memfd pages without direct map

 include/linux/kvm_host.h |  7 ++++++
 virt/kvm/pfncache.c      | 52 +++++++++++++++++++++++++++++-----------
 2 files changed, 45 insertions(+), 14 deletions(-)

--
2.50.1
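To make the mapping choice in the cover letter concrete, here is a minimal, self-contained sketch (not part of the series; the helper names are made up for illustration) of how a single direct-map-removed page can still be given a fault-free kernel address via vmap(), where kmap()/page_address() would hand back the now-invalid direct-map address:

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Illustrative only: map one page through vmalloc space so the resulting
 * address does not depend on the page having a valid direct-map PTE.
 */
static void *map_page_without_direct_map(struct page *page)
{
	/* VM_MAP + PAGE_KERNEL yields an ordinary cacheable kernel mapping. */
	return vmap(&page, 1, VM_MAP, PAGE_KERNEL);
}

static void unmap_page_without_direct_map(void *khva)
{
	vunmap(khva);
}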
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
From bebfd5914aae9d2eaec6b90b6408875f2aa40610 Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Mon, 1 Dec 2025 14:58:44 +0000
Subject: [RFC PATCH 1/2] KVM: pfncache: Use kvm_gmem_get_pfn() for guest_memfd-backed memslots

gfn_to_pfn_cache currently relies on hva_to_pfn(), which resolves PFNs
through GUP. GUP assumes that the page has a valid direct-map PTE, which
is not true for pages of a guest_memfd created with
GUEST_MEMFD_FLAG_NO_DIRECT_MAP, because their direct-map PTEs are
explicitly removed via set_direct_map_valid_noflush().

Introduce a helper function, gpc_to_pfn(), that routes PFN lookup to
kvm_gmem_get_pfn() for guest_memfd-backed memslots (regardless of
whether GUEST_MEMFD_FLAG_NO_DIRECT_MAP is set), and otherwise falls
back to the existing hva_to_pfn() path. Rename hva_to_pfn_retry() to
gpc_to_pfn_retry() accordingly.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 virt/kvm/pfncache.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 728d2c1b488a..bf8d6090e283 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -152,22 +152,34 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	return kvm->mmu_invalidate_seq != mmu_seq;
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+static kvm_pfn_t gpc_to_pfn(struct gfn_to_pfn_cache *gpc, struct page **page)
 {
-	/* Note, the new page offset may be different than the old! */
-	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
-	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
-	void *new_khva = NULL;
-	unsigned long mmu_seq;
-	struct page *page;
+	if (kvm_slot_has_gmem(gpc->memslot)) {
+		kvm_pfn_t pfn;
+
+		kvm_gmem_get_pfn(gpc->kvm, gpc->memslot, gpa_to_gfn(gpc->gpa),
+				 &pfn, page, NULL);
+		return pfn;
+	}
 
 	struct kvm_follow_pfn kfp = {
 		.slot = gpc->memslot,
 		.gfn = gpa_to_gfn(gpc->gpa),
 		.flags = FOLL_WRITE,
 		.hva = gpc->uhva,
-		.refcounted_page = &page,
+		.refcounted_page = page,
 	};
+	return hva_to_pfn(&kfp);
+}
+
+static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+{
+	/* Note, the new page offset may be different than the old! */
+	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
+	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+	void *new_khva = NULL;
+	unsigned long mmu_seq;
+	struct page *page;
 
 	lockdep_assert_held(&gpc->refresh_lock);
 
@@ -206,7 +218,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 		cond_resched();
 	}
 
-	new_pfn = hva_to_pfn(&kfp);
+	new_pfn = gpc_to_pfn(gpc, &page);
 	if (is_error_noslot_pfn(new_pfn))
 		goto out_error;
 
@@ -319,7 +331,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		}
 	}
 
-	/* Note: the offset must be correct before calling hva_to_pfn_retry() */
+	/* Note: the offset must be correct before calling gpc_to_pfn_retry() */
 	gpc->uhva += page_offset;
 
 	/*
@@ -327,7 +339,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!gpc->valid || hva_change) {
-		ret = hva_to_pfn_retry(gpc);
+		ret = gpc_to_pfn_retry(gpc);
 	} else {
 		/*
 		 * If the HVA→PFN mapping was already valid, don't unmap it.
--
2.50.1
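As a side note (not part of the patch above): kvm_gmem_get_pfn() returns a negative errno on failure, which the gmem branch in gpc_to_pfn() currently ignores. A hypothetical, more defensive variant, reusing only names that appear in the diff, might look like this:

/*
 * Hypothetical sketch only: translate kvm_gmem_get_pfn() failures into
 * KVM_PFN_ERR_FAULT so the existing is_error_noslot_pfn() check in
 * gpc_to_pfn_retry() handles them.
 */
static kvm_pfn_t gpc_to_pfn_checked(struct gfn_to_pfn_cache *gpc,
				    struct page **page)
{
	if (kvm_slot_has_gmem(gpc->memslot)) {
		kvm_pfn_t pfn;
		int r;

		r = kvm_gmem_get_pfn(gpc->kvm, gpc->memslot,
				     gpa_to_gfn(gpc->gpa), &pfn, page, NULL);
		if (r)
			return KVM_PFN_ERR_FAULT;
		return pfn;
	}

	/* Non-gmem slots keep using GUP via hva_to_pfn(), as in the patch. */
	struct kvm_follow_pfn kfp = {
		.slot = gpc->memslot,
		.gfn = gpa_to_gfn(gpc->gpa),
		.flags = FOLL_WRITE,
		.hva = gpc->uhva,
		.refcounted_page = page,
	};
	return hva_to_pfn(&kfp);
}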

resources/hiding_ci/linux_patches/11-kvm-clock/0001-KVM-x86-use-uhva-for-kvm-clock-if-kvm_gpc_refresh-fa.patch

Lines changed: 0 additions & 103 deletions
This file was deleted.
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
From 6c90c75b15ba48cc5e0a1e74224c041c3c45668b Mon Sep 17 00:00:00 2001
From: Takahiro Itazuri <itazur@amazon.com>
Date: Mon, 1 Dec 2025 16:47:05 +0000
Subject: [RFC PATCH 2/2] KVM: pfncache: Use vmap() for guest_memfd pages without direct map

gfn_to_pfn_cache currently maps RAM PFNs with kmap(), which relies on
the direct map. A guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP
disables its direct-map PTEs via set_direct_map_valid_noflush(), so the
linear address returned by kmap()/page_address() will fault if
dereferenced.

In some cases, gfn_to_pfn_cache dereferences the cached kernel address
(khva) from atomic contexts where page faults cannot be tolerated.
Therefore, the khva must always refer to a fault-free kernel mapping.
Since mapping and unmapping happen exclusively in the refresh path,
which may sleep, using vmap()/vunmap() for these pages is safe and
sufficient.

Introduce kvm_slot_no_direct_map() to detect guest_memfd slots without
the direct map, and make gpc_map()/gpc_unmap() use vmap()/vunmap() for
such pages.

This allows the facilities based on gfn_to_pfn_cache (e.g. kvm-clock) to
work correctly with guest_memfd regardless of whether its direct-map
PTEs are valid.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 include/linux/kvm_host.h |  7 +++++++
 virt/kvm/pfncache.c      | 26 ++++++++++++++++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 70e6a5210ceb..793d98f97928 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -15,6 +15,7 @@
 #include <linux/minmax.h>
 #include <linux/mm.h>
 #include <linux/mmu_notifier.h>
+#include <linux/pagemap.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
 #include <linux/slab.h>
@@ -628,6 +629,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl
 	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
 }
 
+static inline bool kvm_slot_no_direct_map(const struct kvm_memory_slot *slot)
+{
+	return slot && kvm_slot_has_gmem(slot) &&
+	       mapping_no_direct_map(slot->gmem.file->f_mapping);
+}
+
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
 {
 	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index bf8d6090e283..ae6d8699e536 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -96,10 +96,16 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	return true;
 }
 
-static void *gpc_map(kvm_pfn_t pfn)
+static void *gpc_map(struct gfn_to_pfn_cache *gpc, kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return kmap(pfn_to_page(pfn));
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (kvm_slot_no_direct_map(gpc->memslot))
+			return vmap(&page, 1, VM_MAP, PAGE_KERNEL);
+
+		return kmap(page);
+	}
 
 #ifdef CONFIG_HAS_IOMEM
 	return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
@@ -115,6 +121,11 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 		return;
 
 	if (pfn_valid(pfn)) {
+		if (is_vmalloc_addr(khva)) {
+			vunmap(khva);
+			return;
+		}
+
 		kunmap(pfn_to_page(pfn));
 		return;
 	}
@@ -224,13 +235,16 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 	/*
 	 * Obtain a new kernel mapping if KVM itself will access the
-	 * pfn. Note, kmap() and memremap() can both sleep, so this
-	 * too must be done outside of gpc->lock!
+	 * pfn. Note, kmap(), vmap() and memremap() can sleep, so this
+	 * too must be done outside of gpc->lock! Note that even though
+	 * the rwlock is dropped, it's still fine to access gpc->pfn and
+	 * other fields because the gpc->refresh_lock mutex prevents those
+	 * from being changed.
 	 */
 	if (new_pfn == gpc->pfn)
 		new_khva = old_khva;
 	else
-		new_khva = gpc_map(new_pfn);
+		new_khva = gpc_map(gpc, new_pfn);
 
 	if (!new_khva) {
 		kvm_release_page_unused(page);
--
2.50.1
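For context, here is a sketch of the usual gfn_to_pfn_cache consumer pattern that these patches keep working when the backing guest_memfd has no direct map. This is illustrative only: the exact kvm_gpc_* signatures differ between kernel versions, and a real user such as kvm-clock would refresh the cache and retry instead of bailing out.

/* Sketch: read a u64 from guest memory through a gfn_to_pfn_cache. */
static int read_guest_u64(struct kvm *kvm, gpa_t gpa, u64 *val)
{
	struct gfn_to_pfn_cache gpc;
	unsigned long flags;
	int ret;

	kvm_gpc_init(&gpc, kvm);
	ret = kvm_gpc_activate(&gpc, gpa, sizeof(*val));
	if (ret)
		return ret;

	read_lock_irqsave(&gpc.lock, flags);
	if (!kvm_gpc_check(&gpc, sizeof(*val))) {
		/* A real user would call kvm_gpc_refresh() and retry here. */
		read_unlock_irqrestore(&gpc.lock, flags);
		kvm_gpc_deactivate(&gpc);
		return -EFAULT;
	}
	/*
	 * gpc.khva is a fault-free kernel address: vmap()ed when the slot is
	 * guest_memfd without the direct map, kmap()ed/memremap()ed otherwise,
	 * so this dereference is safe even from atomic context.
	 */
	*val = *(u64 *)gpc.khva;
	read_unlock_irqrestore(&gpc.lock, flags);

	kvm_gpc_deactivate(&gpc);
	return 0;
}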
