Skip to content

Conversation

@credo-quia-absurdum
Copy link
Contributor

@credo-quia-absurdum credo-quia-absurdum commented Dec 10, 2025

Summary

This PR makes three improvements to the comments in emitLoadImmediate:

  1. Clarifies that a previous speculation in the comment is incorrect, while the heuristic still works in most cases.
  2. Fixes a mismatch between the comment and the actual implementation (the formula ordering).
  3. Updates the comment style to follow the repository's convention (// instead of /* */).

@clamp03 @tomeksowi @SkyShield, @namu-lee
part of #84834, cc @dotnet/samsung

Details

1. Speculation in the comment proven incorrect

The previous comment included the following speculative statement.

The smaller offset should yield the least instruction. (is this correct?)

I investigated this and confirmed that the statement is not always true.
In most cases, a smaller offset with subtract mode produces fewer instructions, but not always.

For example, for the immediate value:

0x739B'8000'FC80'05F4

The add form with the larger offset requires only 5 instructions,
while the subtract form with the smaller offset requires 8 instructions.

Addition Mode

Offset : 0xFC80'05F4

# Instruction Immediate Register Value After
1 lui 0x7'39B8 0x0000'0000'739B'8000
2 slli 11 0x0000'039C'DC00'0000
3 addi 0x7E4 0x0000'039C'DC00'07E4
4 slli 21 0x739B'8000'FC80'0000
5 addi 0x5F4 0x739B'8000'FC80'05F4

Subtract Mode

Offset : 0x037F'FA0C

# Instruction Immediate Register Value After
1 lui 0x7'39B8 0x0000'0000'739B'8000
2 addiw 0x001 0x0000'0000'739B'8001
3 slli 17 0x0000'E737'0002'0000
4 addi 0x901 0x0000'E737'0001'F901
5 slli 11 0x0739'B800'0FC8'0800
6 addi 0x860 0x0739'B800'0FC8'0060
7 slli 4 0x739B'8000'FC80'0600
8 addi 0xFF4 0x739B'8000'FC80'05F4

I experimented with an optimization that computes the exact instruction count for each mode and selects the cheaper one. However, the overall improvement was minimal.

Uniformly sampling the 64-bit space showed that only about one case per million improved compared to the existing heuristic, so I decided to drop this approach.

Although there are more cases where the larger add offset yields a better sequence, the practical impact remains small because the code generator falls back to emitDataConst whenever the required instruction count exceeds five.

2. Mismatch between comment and implementation

In the original comment, the third condition for enabling the left-shift form was written as:

(b - a) - (y - x) >= 11

However, the actual implementation evaluates the expression in the opposite order:

bool cond3 = ((y - x) - (b - a)) >= 11;

This PR corrects the comment to match the implementation and also formats the conditions in the comment to clarify the combined condition (1) && ( (2) || (3) ).

Copilot AI review requested due to automatic review settings December 10, 2025 12:17
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 10, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 10, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR corrects and improves comments in the emitLoadImmediate function for RISC-V code generation. The changes clarify a previously speculative comment that has been proven incorrect through investigation, fix a mismatch between comment and implementation for formula ordering, and modernize comment style to follow repository conventions.

Key changes:

  • Clarifies that smaller offsets do not always yield fewer instructions, providing a concrete counterexample
  • Corrects the formula in condition (3) from (b - a) - (y - x) >= 11 to (y - x) - (b - a) >= 11 to match the actual implementation
  • Converts all block comments from /* */ style to // style per repository conventions

Copy link
Member

@tomeksowi tomeksowi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, cc @fuad1502

@am11 am11 added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 10, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@fuad1502
Copy link
Contributor

Hi @credo-quia-absurdum, thank you for looking into this 🙏 I always wondered how the heuristic compared with an optimal solution. GCC, for example, if I recall correctly, uses backtracking to find the optimal solution. If you have the analysis publicly available, I would love to see it 🙏

If you want to look into optimizing this further as mentioned by @tomeksowi, you can refer to a draft PR I've made before here. I forgot why I didn't follow through with it, but maybe there's something there that can help 🙏

Copy link
Contributor

@fuad1502 fuad1502 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you 🙏

@credo-quia-absurdum
Copy link
Contributor Author

credo-quia-absurdum commented Dec 11, 2025

Hi @credo-quia-absurdum, thank you for looking into this 🙏 I always wondered how the heuristic compared with an optimal solution. GCC, for example, if I recall correctly, uses backtracking to find the optimal solution. If you have the analysis publicly available, I would love to see it 🙏

If you want to look into optimizing this further as mentioned by @tomeksowi, you can refer to a draft PR I've made before here. I forgot why I didn't follow through with it, but maybe there's something there that can help 🙏

@fuad1502 Here is the analysis I performed.

I reproduced the emitter logic in a small C++ program with a simple register emulator and uniformly sampled across 64-bit space. In the text files linked below, Add Only refers to the instruction count when the emitter always uses the add-offset form, while Sub Only refers to the count when it always uses the subtract-offset form. Each entry also includes the instruction count produced by the current heuristic (without the fallback to emitDataConst), along with the corresponding 64-bit immediate value in binary form.

Validating the "Smaller Offset Is Better" Assumption

First, I examined whether the assumption "the smaller offset always yields fewer instructions" holds. The results, available here, show that this assumption does not hold in a noticeable portion. In fact, 9,617 out of 1 million samples (~1%) were counterexamples where the larger offset produced a shorter instruction sequence.

Potential Improvement Over the Current Heuristic with Fallback

Next, I evaluated how much improvement would be possible if the heuristic were perfect, under the constraint that we fall back to emitDataConst whenever the instruction count exceeds five. The results are here.

For this analysis, I only tracked cases where:

  • the offset selected by the current heuristic yields more instructions than the alternative, and
  • the alternative form generates five or fewer instructions (i.e., would not trigger the fallback).

Out of 1,000,000,000 sampled immediates, only 1,389 cases (~1 per 0.7M) showed any opportunity for improvement over the current implementation. This extremely small gain is primarily due to the fallback rule: once the instruction count exceeds five, the emitter switches to emitDataConst, nullifying potential benefits.

@clamp03
Copy link
Member

clamp03 commented Dec 12, 2025

@credo-quia-absurdum Please check jit format error and remove trailing whitespace.

@credo-quia-absurdum
Copy link
Contributor Author

@clamp03 Thanks for pointing it out. I've updated the PR and removed the trailing whitespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants