-
Notifications
You must be signed in to change notification settings - Fork 243
implement floor
and ceil
in assembly on i586
#976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b202474
to
159feb2
Compare
159feb2
to
e04e38d
Compare
This is awesome, thank you for implementing it! There is one problem, our |
Fix for the red CI in #979 btw |
Well, I don't know this target and its assembly that well. So using I think the only downside is that the symbols might be visible from the outside? Is that a problem? |
actually we can prevent even that with a trick that Björn showed me recently using |
e04e38d
to
1680fd8
Compare
from what I can find this is a 2024 edition thing
but rust 1.63 won't compile |
I think it may be, also conflicts (though we could make it weak). But this should be pretty easy to move to inline asm; I think something like this would work https://rust.godbolt.org/z/P98rKEW7P, which is reasonably close to the original (not sure if there's a way to turn pointers to locals into stack offsets, to avoid the |
1680fd8
to
1766d1e
Compare
That's probably fine, we can fix that properly once naked functions are available we now get errors like this?
|
You're clobbering the saved control word on the stack with the modified one, so it isn't restored correctly. I think there's an argument for using a fixed control word for the |
libm/src/math/arch/i586.rs
Outdated
"movw %dx, ({cw_ptr})", // Apply cw | ||
"fldcw ({cw_ptr})", // ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yeah, as @quaternic mentioned this needs a second stack slot. As a microopt this could be let mut cw_stash = [0u16; 2];
and then these accesses can be -4({stash_ptr})
so we save a register vs. two different u16
locals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually even better, let mut cw_stash = MaybeUninit::<[u16; 2]>::uninit();
to save the zero init instruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this to work
https://rust.godbolt.org/z/44Tjcx3bb
the -4
stuff didn't work (also, for u16
, why -4
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, took that from the original asm without checking :)
1766d1e
to
8dd28c9
Compare
8dd28c9
to
8f43b0b
Compare
Somehow these changes introduce a panic somewhere??
The culprit is |
That's bizarre. The failure didn't show up with I'll try to look on Monday but it would be good to trace back where that panic is coming from, maybe an |
After doing a bit of testing, it appears that any inline ASM block that doesn't have the |
Is this bug in no_panic or is does codegen for non- If it's the first case, I think it might be okay to work around this by updating |
I think it's a codegen/optimization issue. When I move the https://godbolt.org/z/aad7TYaWf (with On the other hand in the As far as I can tell there is no actual panic there (see e.g. the godbolt) |
So, what I had in mind there was that while the implementation from NetBSD is surely more correct in supporting unmasked exceptions, we really don't have that support anywhere in the language (except maybe naked functions) so a chunk of it is unnecessary. I believe this should work just as well: extern "C" fn floor(mut x: f64) -> f64 {
unsafe {
core::arch::asm!(
"fld qword ptr [{x}]",
"fstcw [{x}]",
"mov word ptr [{x} + 2], 0x077f",
"fldcw [{x} + 2]",
"frndint",
"fldcw [{x}]",
"fstp qword ptr [{x}]",
x = in(reg) &mut x,
out("st(0)") _, out("st(1)") _,
out("st(2)") _, out("st(3)") _,
out("st(4)") _, out("st(5)") _,
out("st(6)") _, out("st(7)") _,
options(nostack),
);
}
x
} https://godbolt.org/z/8b6a8Yxn1 |
That implementation has the right behavior, but still runs into the panic issue. |
Yeah, I didn't mean to imply it would do anything for that, and there's currently no way to make x87 inline assembly As for the panic, it seems that all the errors trace back to the same source: Since that call looks to be replaceable with |
Does making the functions If that doesn't work, I think the change in #982 is fine if it gets us out of this issue.
Do you mean putting the definitions in a separate object file but still checking the panic in Lines 46 to 51 in 2cdde03
|
8f43b0b
to
678b58b
Compare
Locally at least, for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small nits but it looks good to me, @quaternic could you also review?
I do think it is okay to use #982 as well so we can allow floor
to be inlined
libm/src/math/arch/i586.rs
Outdated
let mut cw_stash = core::mem::MaybeUninit::<u16>::uninit(); | ||
let mut cw_tmp = core::mem::MaybeUninit::<u16>::uninit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be imported since it is used a few times?
libm/src/math/arch/i586.rs
Outdated
} else { | ||
return x; | ||
// NOTE: this `inline(never)` is load-bearing for making functions that use it panic-free. | ||
// Without it `rem_pio2_large` (and any function that uses it) will contain panics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do they actually contain panics? Or is it a false positive that the pattern used by no_panic
doesn't handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to actually pinpoint where the reported panic really is. e.g. https://godbolt.org/z/TcEGeYos9 doesn't show any panics. So there is a chance that it's a limitation in no_panic
.
515623f
to
37b4f27
Compare
I did some more testing, and I think this might be an issue in rustc (well, more of a feature request). Functions containing inline asm with an ABI that allows unwinding are considered potentially unwinding (so don't get the https://godbolt.org/z/x9cYPovMz So my understanding is that in this case:
So, to fix this specific case while still allowing inlining in the general case, the simplest fix that I could think of is to wrap the call with I've had some doubts that substituting Other that that, this looks all good by me. Thanks for writing proper comments for the asm! |
Thank you for looking into this more. Would you mind writing up a rust-lang/rust issue with your findings? That would be good to link from the fixme in #982. |
Shouldn't I'm wondering what a solution would look like here. Are you thinking something like a |
Created issue at rust-lang/rust#144518, there's some more context there.
|
Possible workaround for #976 (comment) Inline assembly in the body of a function currently causes the compiler to consider that function possibly unwinding, even if said asm originated from inlining an `extern "C"` function. This patch wraps the problematic callsite with `#[inline(never)]`.
libm/src/math/arch/i586.rs
Outdated
|
||
use super::super::fabs; | ||
// FIXME: when the MSRV allows, use naked functions instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there actually any reason to do this now? Being able to inline this assembly should be advantageous, and from a quick look at godbolt it looks like this function is only 13 instructions (compared to 14 from the NetBSD implementations).
Okay! I think this just needs to rebase to drop the first commit, and drop the |
37b4f27
to
5d7b542
Compare
Cool, I've made those changes and CI is happy. |
Thanks for making this happen! Cc @programmerjake since this was your suggestion a long time ago |
The output, for reference: .section .text.ceil,"ax",@progbits
.hidden ceil
.weak ceil
.p2align 4
.type ceil,@function
ceil:
.cfi_startproc
sub esp, 12
.cfi_def_cfa_offset 16
fld qword ptr [esp + 16]
fstp qword ptr [esp]
mov eax, esp
#APP
fld qword ptr [eax]
wait
fnstcw word ptr [eax]
mov word ptr [eax + 2], 2943
fldcw word ptr [eax + 2]
frndint
fldcw word ptr [eax]
fstp qword ptr [eax]
#NO_APP
fld qword ptr [esp]
add esp, 12
.cfi_def_cfa_offset 4
ret
.section .text.floor,"ax",@progbits
.hidden floor
.weak floor
.p2align 4
.type floor,@function
floor:
.cfi_startproc
sub esp, 12
.cfi_def_cfa_offset 16
fld qword ptr [esp + 16]
fstp qword ptr [esp]
mov eax, esp
#APP
fld qword ptr [eax]
wait
fnstcw word ptr [eax]
mov word ptr [eax + 2], 1919
fldcw word ptr [eax + 2]
frndint
fldcw word ptr [eax]
fstp qword ptr [eax]
#NO_APP
fld qword ptr [esp]
add esp, 12
.cfi_def_cfa_offset 4
ret |
fixes #837
The assembly is based on
Which both state
Which I believe means we're good in terms of licensing.