@weihanglo (Member) commented Nov 15, 2025

Add `original_source_len` field to track the byte length
of source files before normalization.

For imported/decoded source files
where the original content is unavailable,
the normalized length is used as a fallback.

`original_source_len` is for writing the correct file length
to dep-info for `-Zchecksum-hash-algorithm`.

Fixes #148934
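
To make the distinction concrete, here is a minimal sketch of why the two lengths can differ. It assumes normalization strips a leading UTF-8 BOM and converts CRLF line endings to LF; `normalize`, `SourceLens`, and `measure` are hypothetical names used for illustration, not the compiler's actual API.

```rust
/// Hypothetical stand-in for source normalization (an assumption for this
/// sketch): strip a leading UTF-8 BOM and convert CRLF line endings to LF.
fn normalize(src: &str) -> String {
    let without_bom = src.strip_prefix('\u{feff}').unwrap_or(src);
    without_bom.replace("\r\n", "\n")
}

/// Hypothetical pair of lengths mirroring the two fields discussed in this PR.
struct SourceLens {
    /// Byte length after normalization.
    source_len: u32,
    /// Byte length of the file as it exists on disk.
    original_source_len: u32,
}

/// Record both lengths for the given on-disk contents.
fn measure(on_disk: &str) -> SourceLens {
    let normalized = normalize(on_disk);
    SourceLens {
        source_len: normalized.len() as u32,
        original_source_len: on_disk.len() as u32,
    }
}

fn main() {
    // A file saved with a BOM and Windows line endings.
    let lens = measure("\u{feff}fn main() {}\r\n");
    println!(
        "normalized: {} bytes, on disk: {} bytes",
        lens.source_len, lens.original_source_len
    );
    // The on-disk file is longer, so a length taken after normalization
    // would not match the size a build tool sees on disk.
    assert!(lens.original_source_len > lens.source_len);
}
```

In this sketch the BOM accounts for three bytes and the CRLF for one, so the normalized length understates the on-disk size by four bytes.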

@rustbot added labels Nov 15, 2025:

  • A-query-system: Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html)
  • S-waiting-on-review: Status: Awaiting review from the assignee but also interested parties.
  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

@rustbot (Collaborator) commented Nov 15, 2025

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

self.checksum_hash.encode(s);
// Do not encode `start_pos` as it's global state for this session.
self.source_len.encode(s);
self.original_source_len.encode(s);

@weihanglo (Member, Author) commented

Not sure if this is really needed.

@jieyouxu (Member) commented Nov 15, 2025

I would say that if the normalized source length is encoded, then the unnormalized source length should be encoded as well. I would imagine that if the source file changed, then it did materially change.

(
escape_dep_filename(&fmap.name.prefer_local().to_string()),
fmap.source_len.0 as u64,
fmap.original_source_len as u64,

@weihanglo (Member, Author) commented

This line is the goal of the entire PR.

@jieyouxu (Member) commented

Suggestion: can you please add a comment here to explain this, for locality? The change from `source_len` to `original_source_len` is very subtle.
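
The subtlety can be illustrated from the consumer's side. The sketch below assumes a build tool compares the length recorded in dep-info against the on-disk size as a cheap check before hashing; `looks_unchanged` is a hypothetical helper, not an API of rustc or Cargo.

```rust
/// Hypothetical consumer-side check (an assumption for illustration):
/// treat a file as possibly unchanged only if its on-disk size matches
/// the length recorded in dep-info.
fn looks_unchanged(recorded_len: u64, on_disk: &[u8]) -> bool {
    recorded_len == on_disk.len() as u64
}

fn main() {
    // File contents exactly as stored on disk, with a CRLF line ending.
    let on_disk = b"fn main() {}\r\n";
    let original_len = on_disk.len() as u64;
    // After normalization the CRLF becomes a single LF, one byte shorter.
    let normalized_len = original_len - 1;

    // Recording the original length matches what is on disk.
    assert!(looks_unchanged(original_len, on_disk));
    // Recording the normalized length makes an untouched file look modified.
    assert!(!looks_unchanged(normalized_len, on_disk));
}
```

Under that assumption, recording the normalized length would make any file with a BOM or CRLF line endings appear modified even when it was never touched.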

@jieyouxu (Member) left a comment

I'm not an expert in this part of the compiler, but the changes make sense to me. A few "cosmetic" nits.

Comment on lines +1728 to +1729
/// The byte length of this source before normalization.
pub original_source_len: u32,

@jieyouxu (Member) commented

Suggestion: I'm not sure how invasive the change is, but we could rename this pair of source lengths (better names welcome):

  • `source_len` -> `normalized_source_len`
  • `original_source_len` -> `unnormalized_source_len`

When reading this diff, I find the difference between `source_len` and `original_source_len` really not obvious. I can appreciate it if the diff is intentionally kept small to make review easier, but I would prefer that we also rename the `source_len` field for consistency.

@jieyouxu (Member) commented

@rustbot author

@rustbot changed labels Nov 15, 2025:

  • added S-waiting-on-author: Status: This is awaiting some action (such as code changes or more information) from the author.
  • removed S-waiting-on-review: Status: Awaiting review from the assignee but also interested parties.

Development

Successfully merging this pull request may close these issues.

-Zchecksum-hash-algorithm used normalized file size in dep-info
