Contributor

@epilys epilys commented Jul 27, 2025

Follow the arm64 Linux boot protocol (https://docs.kernel.org/arch/arm64/booting.html) to allow building a bootable Linux-like image.

This would be ideal for a compile-once, run-"anywhere" use case.

I've been able to create Linux-bootable images with this diff.

file(1) recognizes the image as "Linux kernel ARM64 boot executable Image, little-endian, 4K pages", and QEMU can boot it with direct kernel boot. It also boots with Xen under QEMU when the image is loaded as dom0 with the guest-loader device.

Finally, I added the UEFI MZ magic as the first instruction but I haven't had time to look into writing a proper stub.

TODOs:

  1. make the header configurable with a cargo feature instead of embedding it in entry.S
  2. configurable page sizes, endianness
  3. document it in README
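
For reference, here is a minimal sketch of the 64-byte header that the booting document describes, written as a plain byte builder rather than the assembly in entry.S. The function and type names (`image_header`, `header_flags`, `PageSize`) are hypothetical and not part of this PR. The field offsets and flag bits follow the linked kernel documentation.

```rust
/// Little-endian value of the magic at byte offset 56: the bytes "ARM\x64".
const ARM64_IMAGE_MAGIC: u32 = 0x644d_5241;

/// Page-size encoding used in bits 1-2 of the header `flags` field.
#[derive(Clone, Copy)]
pub enum PageSize {
    Unspecified = 0,
    Size4K = 1,
    Size16K = 2,
    Size64K = 3,
}

/// Packs the `flags` field: bit 0 = big-endian kernel, bits 1-2 = page size,
/// bit 3 = image may be placed anywhere in physical memory.
pub fn header_flags(big_endian: bool, page_size: PageSize, place_anywhere: bool) -> u64 {
    (big_endian as u64) | ((page_size as u64) << 1) | ((place_anywhere as u64) << 3)
}

/// Builds the 64-byte header. `code0` and `code1` are the first two
/// instruction words of the image (hypothetical parameters for this sketch).
pub fn image_header(
    code0: u32,
    code1: u32,
    text_offset: u64,
    image_size: u64,
    flags: u64,
) -> [u8; 64] {
    let mut h = [0u8; 64];
    h[0..4].copy_from_slice(&code0.to_le_bytes());
    h[4..8].copy_from_slice(&code1.to_le_bytes());
    h[8..16].copy_from_slice(&text_offset.to_le_bytes());
    h[16..24].copy_from_slice(&image_size.to_le_bytes());
    h[24..32].copy_from_slice(&flags.to_le_bytes());
    // Bytes 32..56 are reserved and stay zero.
    h[56..60].copy_from_slice(&ARM64_IMAGE_MAGIC.to_le_bytes());
    // Bytes 60..64 are reserved (used for the PE/COFF offset) and stay zero.
    h
}
```

With `code0 = 0xfa40_5a4d` (which I believe is the encoding of the `ccmp x18, #0, #0xd, pl` no-op that Linux uses), the first two bytes of the header read "MZ". `header_flags(false, PageSize::Size4K, true)` gives the little-endian, 4K-page, place-anywhere combination.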

google-cla bot commented Jul 27, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@qwandor qwandor self-requested a review July 27, 2025 18:05
Collaborator

@qwandor qwandor left a comment

Thanks for the PR! I like the idea of supporting the Linux boot protocol (so far I've only used this for binaries loaded at a fixed address, like a bootloader), but I'm not sure that copying to a fixed address will always work. You'll also need to sign the CLA before I can accept anything.

@epilys epilys force-pushed the linux-bootable branch 2 times, most recently from cb2893c to 36296ef Compare August 8, 2025 16:38
Contributor Author

epilys commented Aug 8, 2025

Was busy with work but had time to look into this just now.

I got the relocate solution working, with the following caveats:

  • It requires -z notext and -pie linker args
  • Uses RELA relocations, not RELR
  • Image ORIGIN in the linker script must be 0. An alternative is to calculate the origin with an adrp+add pair and then compute its difference from the load address.

Also fixed the CLA check, and guarded the relocate logic behind a relocate cargo feature.
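
To make the relocation step concrete, here is a minimal sketch in safe Rust that works on byte slices. The PR itself does this before the Rust entry point, using raw pointers and assembly. The sketch assumes the image is linked at ORIGIN = 0, so `r_offset` indexes the image directly and the load bias equals the load address. `apply_relative_relocs` and `read_u64` are hypothetical names.

```rust
/// ELF relocation type whose result is "load bias + addend" on AArch64.
const R_AARCH64_RELATIVE: u64 = 1027;
/// Size in bytes of one Elf64_Rela entry: r_offset, r_info, r_addend.
const RELA_ENTRY_SIZE: usize = 24;

/// Reads a little-endian u64 from `bytes` at byte offset `at`.
fn read_u64(bytes: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(b)
}

/// Applies every R_AARCH64_RELATIVE entry in `rela` to `image` and returns
/// how many were applied. Entries of other types are skipped, mirroring the
/// type check in the reviewed code.
pub fn apply_relative_relocs(image: &mut [u8], rela: &[u8], load_addr: u64) -> usize {
    let mut applied = 0;
    for entry in rela.chunks_exact(RELA_ENTRY_SIZE) {
        let r_offset = read_u64(entry, 0) as usize;
        let r_info = read_u64(entry, 8);
        let r_addend = read_u64(entry, 16);
        // ELF64_R_TYPE is the low 32 bits of r_info.
        if r_info & 0xffff_ffff == R_AARCH64_RELATIVE {
            // With ORIGIN = 0 the load bias is the load address itself.
            let value = load_addr.wrapping_add(r_addend);
            image[r_offset..r_offset + 8].copy_from_slice(&value.to_le_bytes());
            applied += 1;
        }
    }
    applied
}
```

If ORIGIN were nonzero, the bias would instead be the runtime address minus the link-time address, and `r_offset` would need the same adjustment.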

EDIT: I made an example by copying the qemu_el1.rs example, not for merging but to aid review: https://github.com/epilys/aarch64-rt/tree/linux-bootable-ex (see the ex subdirectory).

@epilys epilys requested a review from qwandor August 8, 2025 16:40
@epilys epilys changed the title RFC: Implement Linux bootable image support Implement Linux bootable image support Aug 8, 2025
@epilys epilys marked this pull request as ready for review August 8, 2025 16:41
@epilys epilys force-pushed the linux-bootable branch 2 times, most recently from 3cae637 to 60e7a0f Compare August 8, 2025 17:28
Contributor Author

epilys commented Aug 8, 2025

One thing missing is that the initial_pagetables macro cannot be configured to depend on discovered memory layouts e.g. /memory DT node or EFI memory map.

Contributor Author

epilys commented Aug 19, 2025

Rebased against main to include the naked functions refactoring. Now everything is guarded behind a cargo feature, and no unnecessary code is added when it's disabled.

Contributor Author

epilys commented Aug 22, 2025

I am still not sure if this is the correct way to go. It definitely works, but forcing the emission of relocation entries seems hacky. Maybe we should emit only the boot header instead, and require users of the library to map their actual physical load address to their expected virtual load address. I think that's what the Linux kernel does (?). That way, no relocations are needed. I'm not super familiar with MMU programming and page tables, though.

@qwandor WDYT?

Contributor Author

epilys commented Aug 22, 2025

Yes, I got confirmation from a kernel developer.

It maps its code to a virtual address with an offset from the physical address.

Then, for MMIO/devices it maps them to dynamic virtual addresses such as ones created with vmalloc.

So with that in mind, while self-relocation seems like a good solution to avoid all that, it seems it should be separate from the linux boot header stuff.

Collaborator

qwandor commented Aug 28, 2025

> Yes, I got confirmation from a kernel developer.
>
> It maps its code to a virtual address with an offset from the physical address.
>
> Then, for MMIO/devices it maps them to dynamic virtual addresses such as ones created with vmalloc.
>
> So with that in mind, while self-relocation seems like a good solution to avoid all that, it seems it should be separate from the linux boot header stuff.

This is an interesting idea. So I guess you'd need to reserve space for an initial pagetable, but fill in the entries before enabling it based on the load address?
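
As a rough illustration of that idea, the sketch below fills a reserved level-2 table (4 KiB granule, 2 MiB blocks) from the load address before the MMU is enabled. It is not the crate's `initial-pagetable` API. `fill_image_blocks` is a hypothetical name, and `attrs` stands in for whatever memory-type, shareability and permission bits the caller chooses.

```rust
/// Size of one level-2 block with a 4 KiB translation granule.
const BLOCK_SIZE: u64 = 2 * 1024 * 1024;
/// Descriptor bits [1:0] = 0b01: valid block entry.
const DESC_BLOCK: u64 = 0b01;
/// Access flag (bit 10), set so the first access does not fault.
const DESC_AF: u64 = 1 << 10;

/// Fills `table` so that virtual block `i` maps to physical address
/// `load_addr + i * 2 MiB`, covering `image_size` bytes.
/// Returns the number of entries written, or `None` if `load_addr` is not
/// 2 MiB aligned or the image needs more than 512 blocks.
pub fn fill_image_blocks(
    table: &mut [u64; 512],
    load_addr: u64,
    image_size: u64,
    attrs: u64,
) -> Option<usize> {
    if load_addr % BLOCK_SIZE != 0 {
        return None;
    }
    // Round the image size up to a whole number of blocks.
    let blocks = ((image_size + BLOCK_SIZE - 1) / BLOCK_SIZE) as usize;
    if blocks > table.len() {
        return None;
    }
    for (i, entry) in table.iter_mut().take(blocks).enumerate() {
        let pa = load_addr + i as u64 * BLOCK_SIZE;
        *entry = pa | attrs | DESC_AF | DESC_BLOCK;
    }
    Some(blocks)
}
```

The virtual base that this table covers is decided by where the table is installed in the next level up, which is outside this sketch.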

src/relocate.rs Outdated
r_info,
r_addend,
} = unsafe { *rela };
if elf64_r_type!(r_info) == R_AARCH64_RELATIVE {
Collaborator

How do you know it is safe to ignore relocation entries of other types?

Contributor Author

The alternative is RELR relocations, which are opt-in via a linker option since they are a fairly recent feature.

Collaborator

I don't mean RELA vs. RELR, I mean R_AARCH64_RELATIVE vs. other entries. I'm not very familiar with the details of how relocation works, but as far as I can see there are a whole lot of different types, e.g. R_AARCH64_ABS64, R_AARCH64_MOVW_UABS_G0 and so on. How do we know that we can ignore all the rest?

Contributor Author

I think these are not emitted for PIE executables.
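
One way to check that assumption instead of relying on it is to scan the raw `.rela.dyn` bytes, for example at build time, and fail on the first entry that is not R_AARCH64_RELATIVE. The sketch below assumes little-endian 24-byte Elf64_Rela entries. `check_only_relative` is a hypothetical helper, and how the section bytes are extracted is left out.

```rust
/// The only relocation type the runtime relocation loop handles.
const R_AARCH64_RELATIVE: u32 = 1027;
/// Size in bytes of one Elf64_Rela entry.
const RELA_ENTRY_SIZE: usize = 24;

/// Returns `Ok(count)` if every entry is R_AARCH64_RELATIVE, or
/// `Err((index, type))` for the first entry of any other type.
pub fn check_only_relative(rela: &[u8]) -> Result<usize, (usize, u32)> {
    let mut count = 0;
    for (index, entry) in rela.chunks_exact(RELA_ENTRY_SIZE).enumerate() {
        // r_info is the second 8-byte field of the entry.
        let mut info = [0u8; 8];
        info.copy_from_slice(&entry[8..16]);
        // ELF64_R_TYPE is the low 32 bits of r_info.
        let r_type = (u64::from_le_bytes(info) & 0xffff_ffff) as u32;
        if r_type != R_AARCH64_RELATIVE {
            return Err((index, r_type));
        }
        count += 1;
    }
    Ok(count)
}
```

A failing result names the offending entry and type (for instance 257, R_AARCH64_ABS64), so an unexpected relocation shows up as a build error rather than as silent misbehaviour at boot.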

image : ORIGIN = 0x0, LENGTH = 2M
}
```

Collaborator

Document that the relocate feature is incompatible with the initial-pagetable feature.

Contributor Author

Not necessarily: you can use initial-pagetable to map everything as normal memory and then, after relocating, set up new page tables.

Collaborator

Ah okay, let's document that then.

Collaborator

Is that actually valid? Talking to some folks at Arm, it sounds like this might not work, because if a device is incorrectly mapped as normal memory then the CPU might speculatively access it, which can cause the device to perform arbitrary unwanted operations.

Contributor Author

Ack, will write it as incompatible then. Thanks!

@epilys epilys force-pushed the linux-bootable branch 2 times, most recently from 930e8ff to 991a2f9 Compare August 29, 2025 10:39
Contributor Author

epilys commented Sep 8, 2025

Cc @ardbiesheuvel this is the PR I mentioned to you.

Follow the https://docs.kernel.org/arch/arm64/booting.html protocol to
allow building a bootable linux-like image.

To allow for any load address, perform relocations before jumping to the
rust entrypoint.

Guard relocation logic behind "relocation" Cargo feature.

Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
"ldr x10, [x9, #16]",
// let r_offset = unsafe { *rela }.r_offset;
"ldr x11, [x9]",
// new_ptr += offset;

You can simplify this a bit, and remove the need for the jump at the start of the loop, by doing something like

ldp x10, x11, [x9], #24 
ldr x12, [x9, #-8]

Note that R_AARCH64_RELATIVE is the only RELA type that can reasonably be expected to occur here, so performing the load of the offset and addend fields unconditionally will not make any difference.
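
The two suggested loads rely on the Elf64_Rela layout: three 8-byte fields, 24 bytes in total. The sketch below models them on a byte slice to show which field lands in which register and where the cursor ends up. `load_entry` is a hypothetical name used only for illustration, and the register names in the comments follow the suggestion above.

```rust
/// Layout of one RELA entry as defined by the ELF64 format.
#[repr(C)]
pub struct Elf64Rela {
    pub r_offset: u64,
    pub r_info: u64,
    pub r_addend: i64,
}

// Compile-time check that the entry size matches the #24 post-increment.
const _: () = assert!(core::mem::size_of::<Elf64Rela>() == 24);

/// Reads a little-endian u64 from `bytes` at byte offset `at`.
fn read_u64(bytes: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(b)
}

/// Models the two loads and returns (x10, x11, x12, advanced x9).
pub fn load_entry(rela: &[u8], x9: usize) -> (u64, u64, u64, usize) {
    // ldp x10, x11, [x9], #24: load r_offset and r_info, then x9 += 24.
    let x10 = read_u64(rela, x9);
    let x11 = read_u64(rela, x9 + 8);
    let x9 = x9 + 24;
    // ldr x12, [x9, #-8]: r_addend sits 8 bytes before the advanced cursor.
    let x12 = read_u64(rela, x9 - 8);
    (x10, x11, x12, x9)
}
```

Because the cursor already points at the next entry after the first load, the loop needs only a compare against the end of the table and no separate increment.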

Collaborator

How do we know that no other RELA types can occur?

@ardbiesheuvel

One general question about this code: who is supposed to perform the cache invalidation? No I-cache maintenance should be required, as R_AARCH64_RELATIVE relocations can only refer to data, not code. However, under KVM, where the VMM is in charge of ensuring that the entire bootable image is clean to the point of coherency (PoC), some extra work is needed when the relocation code itself happens to execute before the MMU and caches are enabled. Otherwise, the updated memory contents might be shadowed by stale, clean cachelines, resulting in odd behavior when the code enables the MMU.

Contributor Author

epilys commented Sep 8, 2025

@ardbiesheuvel What is your suggestion? If we want to ensure correct use of this API, would it make sense to do the cache flush ourselves after performing the relocation? I.e. flush_icache_range(load_addr, load_addr + image_length).

@ardbiesheuvel

Flushing the I-cache should not be needed, only the D-cache needs maintenance.

The issue here is that it depends on

  • whether or not the MMU and caches are already enabled when this code runs
  • whether any of the relocated quantities may be observable by secondaries before they enable their MMUs

Assuming that this code only executes with the MMU off, it should be sufficient to perform a D-cache invalidate for either the whole region, or simply for every store - the latter is much simpler but performs redundant work if there are many adjacent locations being relocated.

If the code may execute with the MMU on as well, the invalidate should be gated on this condition, and a clean performed instead (unless the code never executes with the MMU off, even on secondaries).
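
The decision described above can be sketched as pure logic, separate from the actual maintenance instructions. The sketch assumes the caller has already read SCTLR for the current EL and CTR_EL0 into plain integers. All names are hypothetical. The real code would issue `dc ivac` or `dc cvac` for each returned address, followed by a barrier.

```rust
/// SCTLR_ELx.M (bit 0): stage 1 MMU enable.
const SCTLR_M: u64 = 1 << 0;

/// Which by-VA D-cache operation to perform on relocated locations.
#[derive(Debug, PartialEq, Eq)]
pub enum DcacheOp {
    /// MMU off: stores bypass the cache, so drop stale clean lines.
    Invalidate,
    /// MMU on: stores may sit in the cache, so write them back instead.
    Clean,
}

/// Chooses the operation from the SCTLR value of the current EL.
pub fn dcache_op_for(sctlr: u64) -> DcacheOp {
    if sctlr & SCTLR_M == 0 {
        DcacheOp::Invalidate
    } else {
        DcacheOp::Clean
    }
}

/// Smallest D-cache line size in bytes, from CTR_EL0.DminLine
/// (bits 19:16, log2 of the number of 4-byte words).
pub fn dcache_line_size(ctr_el0: u64) -> u64 {
    4 << ((ctr_el0 >> 16) & 0xf)
}

/// Yields the line-aligned addresses covering `[start, start + len)`.
/// `line` must be a power of two.
pub fn cache_lines(start: u64, len: u64, line: u64) -> impl Iterator<Item = u64> {
    let first = start & !(line - 1);
    let end = start + len;
    (first..end).step_by(line as usize)
}
```

For the simpler "every store" variant, `cache_lines(addr, 8, line)` covers a single relocated 64-bit location. Passing the whole image range performs the maintenance once per line instead.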

rustflags = [
"-C", "relocation-model=pie",
"-C", "link-args=-z notext",
]
Collaborator

@qwandor qwandor Sep 12, 2025

Higher up in this README we recommend passing the linker script in build.rs via cargo:rustc-link-arg=, so let's suggest the same thing here for these flags if that's possible.
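
A possible shape for this is sketched below, assuming the linker accepts `-z` and `notext` as two separate arguments, which is how `link-args=-z notext` is split. `emit_link_args` is a hypothetical helper meant to be called from `fn main()` in build.rs with `std::io::stdout()`. Note that `relocation-model=pie` is a codegen option rather than a linker argument, so it would most likely still have to stay in rustflags.

```rust
use std::io::Write;

/// Writes the Cargo build-script directives that pass `-z notext` to the
/// linker, one directive per argument.
pub fn emit_link_args(out: &mut impl Write) -> std::io::Result<()> {
    for arg in ["-z", "notext"] {
        writeln!(out, "cargo:rustc-link-arg={arg}")?;
    }
    Ok(())
}
```

Writing to a generic `Write` rather than calling `println!` directly keeps the helper easy to check without running Cargo.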

Collaborator

qwandor commented Nov 3, 2025

@epilys Did you have a chance to look at the outstanding comments on this? I think the main things remaining are flushing D-cache, Ard's suggested simplification, and updating the README.

Contributor Author

epilys commented Nov 10, 2025

It's on my TODO list. IIUC, after all relocations are performed I need to check whether the MMU is enabled for the current EL, then invalidate the D-cache if it is off, or clean and invalidate if it is on.
