@sylvestre
Contributor

should fix tests/misc/unexpand.pl

@codspeed-hq

codspeed-hq bot commented Nov 13, 2025

Merging this PR will degrade performance by 15.47%

Summary

❌ 2 regressed benchmarks
✅ 81 untouched benchmarks
⏩ 94 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
unexpand_large_file[10] 529 ms 625.8 ms -15.47%
unexpand_many_lines[100000] 252.3 ms 298.5 ms -15.47%

Comparing sylvestre:unexpand (c4f6dab) with main (20a5c3a)

Footnotes

  1. 94 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/misc/unexpand is no longer failing!

Comment on lines 44 to 46
  if num == 0 {
      return Err(ParseError::TabSizeCannotBeZero);
  }
Contributor

I would handle this case in its own arm: it makes the match more compact and allows you to get rid of the return.
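
A minimal sketch of what this suggestion could look like. The helper name `parse_tab_size` and the simplified error handling are assumptions for illustration, not the PR's actual code:

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    TabSizeCannotBeZero,
    InvalidCharacter(String),
}

// Hypothetical helper: zero gets its own match arm, so the nested `if`
// and the early `return` are no longer needed.
fn parse_tab_size(word: &str) -> Result<usize, ParseError> {
    match word.parse::<usize>() {
        Ok(0) => Err(ParseError::TabSizeCannotBeZero),
        Ok(num) => Ok(num),
        // Simplified for the sketch: the whole word is reported as invalid.
        Err(_) => Err(ParseError::InvalidCharacter(word.to_string())),
    }
}
```

Since arms are tried top to bottom, `Ok(0)` has to come before the general `Ok(num)` arm.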

Comment on lines 38 to 41
  fn parse_increment_syntax(word: &str, nums: &[usize]) -> Result<usize, ParseError> {
      if nums.is_empty() {
          return Err(ParseError::InvalidCharacter("+".to_string()));
      }
Contributor

I would remove the nums param; its values are never used. And the check for emptiness should probably be done where this function is called.

Comment on lines 58 to 66
  fn parse_extend_syntax(word: &str, nums: &[usize]) -> Result<usize, ParseError> {
      if nums.is_empty() {
          return Err(ParseError::InvalidCharacter("/".to_string()));
      }
      match word.parse::<usize>() {
          Ok(num) => {
              if num == 0 {
                  return Err(ParseError::TabSizeCannotBeZero);
              }
Contributor

My two previous comments about parse_increment_syntax also apply to this function.

Contributor

With nums removed, you can also remove parse_increment_syntax or parse_extend_syntax because both functions do the same thing. And rename the remaining function.
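
One way the merged helper and its call site might look, as a rough sketch. `parse_modifier` is a hypothetical caller invented for this example, and the error details are simplified; the emptiness check sits at the call site as suggested earlier in this thread:

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    TabSizeCannotBeZero,
    InvalidCharacter(String),
}

// Single helper standing in for both parse_increment_syntax and
// parse_extend_syntax; it no longer takes `nums`.
fn parse_tab_num(word: &str) -> Result<usize, ParseError> {
    match word.parse::<usize>() {
        Ok(0) => Err(ParseError::TabSizeCannotBeZero),
        Ok(num) => Ok(num),
        Err(_) => Err(ParseError::InvalidCharacter(word.to_string())),
    }
}

// Hypothetical call site: strips the `+` or `/` prefix, performs the
// emptiness check itself, then delegates to the shared helper.
fn parse_modifier(word: &str, nums: &[usize]) -> Result<usize, ParseError> {
    let (prefix, rest) = if let Some(rest) = word.strip_prefix('+') {
        ("+", rest)
    } else if let Some(rest) = word.strip_prefix('/') {
        ("/", rest)
    } else {
        return Err(ParseError::InvalidCharacter(word.to_string()));
    };
    if nums.is_empty() {
        return Err(ParseError::InvalidCharacter(prefix.to_string()));
    }
    parse_tab_num(rest)
}
```

With this shape, the only difference between the `+` and `/` paths is the prefix reported in the error.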

}
}

fn tabstops_parse(s: &str) -> Result<TabConfig, ParseError> {
Contributor

I would name this function parse_tabstops to be consistent with the other function names that start with parse.

Comment on lines 108 to 118
  match word.parse::<usize>() {
      Ok(num) => nums.push(num),
      Err(e) => {
          return match e.kind() {
              IntErrorKind::PosOverflow => Err(ParseError::TabSizeTooLarge),
              _ => Err(ParseError::InvalidCharacter(
                  word.trim_start_matches(char::is_numeric).to_string(),
              )),
          };
      }
  }
Contributor

This functionality already exists in the function you get when applying the refactorings I mentioned previously. You can then push the result of that function to nums.
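
As a sketch of that idea: the error mapping from the snippet above moves into the shared helper, and the loop only pushes its result. `parse_plain_stops` is a hypothetical stand-in for the relevant part of the parsing loop, not the PR's code:

```rust
use std::num::IntErrorKind;

#[derive(Debug, PartialEq)]
enum ParseError {
    TabSizeCannotBeZero,
    TabSizeTooLarge,
    InvalidCharacter(String),
}

// Shared helper: owns the zero check and the overflow/invalid-character mapping.
fn parse_tab_num(word: &str) -> Result<usize, ParseError> {
    match word.parse::<usize>() {
        Ok(0) => Err(ParseError::TabSizeCannotBeZero),
        Ok(num) => Ok(num),
        Err(e) => match e.kind() {
            IntErrorKind::PosOverflow => Err(ParseError::TabSizeTooLarge),
            _ => Err(ParseError::InvalidCharacter(
                word.trim_start_matches(char::is_numeric).to_string(),
            )),
        },
    }
}

// Hypothetical caller: pushes each parsed value and propagates errors with `?`.
fn parse_plain_stops(words: &[&str]) -> Result<Vec<usize>, ParseError> {
    let mut nums = Vec::new();
    for &word in words {
        nums.push(parse_tab_num(word)?);
    }
    Ok(nums)
}
```

Because the helper already rejects zero, a later `nums.contains(&0)` check would have nothing left to catch.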

Comment on lines 130 to 132
  if nums.contains(&0) {
      return Err(ParseError::TabSizeCannotBeZero);
  }
Contributor

With my previous suggestions this snippet is no longer necessary as the condition can't become true.

Comment on lines 136 to 141
      if nums.is_empty() {
          return Err(ParseError::InvalidCharacter("+".to_string()));
      }
      let last = *nums.last().unwrap();
      nums.push(last + inc);
  } else if extend_size.is_some() && nums.is_empty() {
Contributor

I think nums.is_empty() is always false in those two conditions and thus the returns are unreachable.

Comment on lines +341 to +343
  } else if tab_config.increment_size.is_some() {
      // +N: increment handled in tabstops_parse, so we should have the tab
      None
Contributor

I would remove the condition: it is not relevant as the if and else cases both return the same value.

Suggested change
  } else if tab_config.increment_size.is_some() {
      // +N: increment handled in tabstops_parse, so we should have the tab
      None

Collaborator

Not sure what used to be here for this comment:

but I would suggest to do:

  } else if let Some(inc) = tab_config.increment_size {
      let last = *tabstops.last().unwrap();
      Some(inc - ((col - last) % inc))
  }

Because it causes this difference in behaviour:

  # GNU (correct):
  $ printf "                    " | unexpand -t '3,+6' | xxd
  00000000: 0909 0920 2020 2020    # 3 tabs + 5 spaces

  # uutils (incorrect):
  $ printf "                    " | unexpand -t '3,+6' | xxd
  00000000: 0909 2020 2020 2020 2020 2020 20   # 2 tabs + 11 spaces
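
To make the arithmetic concrete, here is a small model of the suggested formula. Both function names are hypothetical, and the model only handles a run of leading spaces; it is not how unexpand is structured:

```rust
// Distance from `col` to the next tab stop, given explicit stops followed by
// a `+inc` modifier. Returns None when no stop can be derived.
fn distance_to_next_stop(tabstops: &[usize], inc: usize, col: usize) -> Option<usize> {
    // An explicit stop beyond the current column takes precedence.
    if let Some(&stop) = tabstops.iter().find(|&&t| t > col) {
        return Some(stop - col);
    }
    let last = *tabstops.last()?;
    if inc == 0 {
        return None;
    }
    // Past the last explicit stop, stops repeat every `inc` columns.
    Some(inc - ((col - last) % inc))
}

// Counts how many tabs replace a run of `n` leading spaces, and how many
// spaces are left over after the last tab stop that fits.
fn compress_leading_spaces(n: usize, tabstops: &[usize], inc: usize) -> (usize, usize) {
    let (mut col, mut tabs) = (0, 0);
    while let Some(dist) = distance_to_next_stop(tabstops, inc, col) {
        if col + dist > n {
            break;
        }
        col += dist;
        tabs += 1;
    }
    (tabs, n - col)
}
```

For 20 spaces with `-t '3,+6'`, the stops fall at columns 3, 9, 15 and 21, so `compress_leading_spaces(20, &[3], 6)` gives `(3, 5)`, matching the GNU output of 3 tabs and 5 spaces.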

  match word.parse::<usize>() {
      Ok(num) => {
          if num == 0 {
              return Err(ParseError::TabSizeCannotBeZero);
Contributor

/0 and +0 are actually accepted by GNU unexpand

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/misc/unexpand is no longer failing!

@sylvestre requested a review from cakebaker December 26, 2025 23:33

  impl UError for ParseError {}

  fn tabstops_parse(s: &str) -> Result<Vec<usize>, ParseError> {
  fn parse_tab_num(word: &str) -> Result<usize, ParseError> {
Collaborator

There's a bit of deviation from GNU here:

  # GNU (accepts):
  $ printf "   test" | unexpand -t '3,/0'
          test

  # uutils (rejects):
  $ printf "   test" | unexpand -t '3,/0'
  unexpand: tab size cannot be 0

This is because line 40 unconditionally rejects a tab size of 0.

  if increment_size.is_some() || extend_size.is_some() {
      return Err(ParseError::InvalidCharacter("+".to_string()));
  }
  if nums.is_empty() {
Collaborator

This check can be removed entirely, same as the one on line 78: GNU accepts /N and +N alone, treating them as tab stops at multiples of N.

Example test cases:

  $ printf "         " | unexpand -t '/9' | xxd
  # GNU:    00000000: 09                                      
  # uutils: unexpand: tab size contains invalid character(s): '/'

  $ printf "                  " | unexpand -t '/9' | xxd
  # GNU:    00000000: 0909                                    
  # uutils: unexpand: tab size contains invalid character(s): '/'

  $ printf "      " | unexpand -t '+6' | xxd
  # GNU:    00000000: 09                                      
  # uutils: unexpand: tab size contains invalid character(s): '+'

  $ printf "            " | unexpand -t '+6' | xxd
  # GNU:    00000000: 0909                                    
  # uutils: unexpand: tab size contains invalid character(s): '+'

and


  $ printf "          " | unexpand -t '3,/0' | xxd
  # GNU:    00000000: 0909 0920                               
  # uutils: unexpand: tab size cannot be 0

  $ printf "          " | unexpand -t '3,+0' | xxd
  # GNU:    00000000: 0909 0920                               
  # uutils: unexpand: tab size cannot be 0

  $ printf "          " | unexpand -t '3' | xxd
  # GNU:    00000000: 0909 0920                               
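
The cases above can be summarised with a small model. This is only a sketch of the behaviour the examples show, not GNU's implementation; `uniform_tab_size` and its signature are hypothetical:

```rust
// Returns Some(n) when the tab list reduces to "a stop every n columns",
// and None when explicit stops and modifiers have to be handled together.
fn uniform_tab_size(
    stops: &[usize],
    extend: Option<usize>,
    increment: Option<usize>,
) -> Option<usize> {
    // Assumption drawn from the examples: `/0` and `+0` behave as if the
    // modifier were absent.
    let extend = extend.filter(|&n| n != 0);
    let increment = increment.filter(|&n| n != 0);
    match (stops, extend, increment) {
        // `/N` or `+N` alone: stops at multiples of N.
        ([], Some(n), _) | ([], None, Some(n)) => Some(n),
        // A single explicit stop with no effective modifier repeats.
        ([n], None, None) => Some(*n),
        // Everything else (including an empty list, whose default is not
        // covered here) is out of scope for this sketch.
        _ => None,
    }
}
```

Under this model `-t '/9'` and `-t '+6'` yield 9 and 6, while `-t '3,/0'`, `-t '3,+0'` and `-t '3'` all yield 3, which lines up with the identical GNU output in the last three cases.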

@ChrisDryden
Collaborator

Given how busy you are with reviews, any chance you'd be interested in me finishing this one up and fixing the bugs? I'm going to add the edge cases that failed to the main GNU test suite.

@github-actions

github-actions bot commented Jan 7, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/unexpand is no longer failing!

Development

Successfully merging this pull request may close these issues.

Unexpand does not support tabstop modifiers

4 participants