@msteveb msteveb commented Aug 9, 2025

Add -lineanchor and -linestop support
A few other changes to be more similar to Tcl
Better support for POSIX regex

@msteveb msteveb requested a review from Copilot August 9, 2025 11:30

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances regex functionality in Jim Tcl by adding -lineanchor and -linestop support, improving Tcl compatibility, and fixing several POSIX regex behaviors.

  • Adds -lineanchor and -linestop options to regexp and regsub commands for finer newline handling control
  • Updates error messages to use "option" instead of "switch" for better Tcl compatibility
  • Enables previously commented-out UTF-8 and regex tests, and fixes edge cases in empty-pattern matching
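
To make the two newline behaviours concrete, here is a minimal C sketch that uses the system POSIX `regcomp`/`regexec` rather than Jim's builtin engine. POSIX only offers the combined `REG_NEWLINE` flag, which enables both behaviours at once: `^` and `$` match at line boundaries (what `-lineanchor` selects) and `.` and `[^...]` stop at newlines (what `-linestop` selects). The helper name `posix_matches` is made up for this illustration and is not part of the PR.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches anywhere in
 * `text`, 0 if it does not, and -1 if the pattern fails to compile.
 * `extra_cflags` is OR-ed into the regcomp flags (0 or REG_NEWLINE). */
int posix_matches(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the subject `"a\nb"`, the pattern `^b` should match only when `REG_NEWLINE` is passed (line anchoring), while `a.b` should match only when it is not (without the flag, `.` is allowed to cross the newline). Splitting the flag, as this PR does for the builtin engine, lets each of those behaviours be selected separately.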

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/regexp2.test | Uncommented tests for -lineanchor/-linestop options and updated error message expectations |
| tests/regexp.test | Uncommented UTF-8 tests, updated error messages, and added new empty pattern tests |
| tests/interactive.test | Changed expect patterns from braces to double quotes for proper expansion |
| test-bootstrap-jim | Added argument forwarding to make-bootstrap-jim script |
| make-bootstrap-jim | Added --no-regexp option to exclude regexp extension from bootstrap build |
| jimregexp.h | Split REG_NEWLINE into separate REG_NEWLINE_ANCHOR and REG_NEWLINE_STOP flags |
| jimregexp.c | Updated regex engine to use new separate newline flags |
| jim.c | Modified string sorting to place -- option last for Tcl 9.0 compatibility |
| jim-regexp.c | Added -lineanchor/-linestop support, fixed empty pattern handling, updated error messages |


 # And finally the core source code
-for i in jim.c jim-subcmd.c utf8.c jim-format.c jimregexp.c jimiocompat.c jim-win32compat.c jim-nosignal.c; do
+for i in jim.c jim-subcmd.c utf8.c jim-format.c jim-subcmd.h $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do

Copilot AI Aug 9, 2025

This line lists jim-subcmd.h, a header file, in the loop over source files. It should likely be removed, since header files are already processed in the earlier loop at line 144.

Suggested change
-for i in jim.c jim-subcmd.c utf8.c jim-format.c jim-subcmd.h $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do
+for i in jim.c jim-subcmd.c utf8.c jim-format.c $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do

Copilot uses AI. Check for mistakes.

jim-regexp.c Outdated
 n = source_len - offset;
 p = source_str + offset;
-do {
+while (1) {

Copilot AI Aug 9, 2025

[nitpick] The do...while loop has been changed to while (1) with a break condition inside. Consider using a more explicit loop condition or adding a comment explaining why this infinite loop pattern is preferred over the previous do...while(n) structure.

Suggested change
-while (1) {
+while (n > 0) {
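
As a general C point that bears on this suggestion: `while (n > 0)` tests its condition before the first iteration, so with `n == 0` the body never runs, whereas `do { ... } while (n)` and `while (1)` with an inner `break` both run the body at least once. Whether that matters here depends on the loop body, which is not shown in this thread. The sketch below only counts iterations of the three loop shapes; the function names and bodies are invented for illustration and are not taken from jim-regexp.c.

```c
#include <assert.h>

/* Each function returns how many times its loop body runs when `n`
 * bytes remain. The bodies are placeholders, not the real match loop. */

int runs_do_while(int n)
{
    int runs = 0;
    assert(n >= 0);
    do {
        runs++;              /* body always runs once, even for n == 0 */
        if (n > 0) n--;
    } while (n);
    return runs;
}

int runs_while_true(int n)
{
    int runs = 0;
    assert(n >= 0);
    while (1) {
        runs++;              /* body runs once more after n reaches 0 */
        if (n == 0) break;
        n--;
    }
    return runs;
}

int runs_while_positive(int n)
{
    int runs = 0;
    assert(n >= 0);
    while (n > 0) {          /* body is skipped entirely for n == 0 */
        runs++;
        n--;
    }
    return runs;
}
```

For `n == 0` the first two shapes run the body once and the third runs it zero times, so the suggested condition is not a drop-in replacement if the body is expected to try a match against an empty remainder.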

msteveb added 8 commits August 9, 2025 21:36
Although "" and "x*" both match the empty string, the former
correctly exits, while the latter looped forever.

Match Tcl here by advancing by one char in both cases, but
in the latter case end of string is matched, while in the former
it is not.

Also prevent both cases from slicing a utf-8 char into bytes.

Fixes: #353

Signed-off-by: Steve Bennett <steveb@workware.net.au>
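
The empty-match handling described in this commit message can be sketched roughly as follows. This is a standalone illustration built on POSIX `regcomp`/`regexec`, not the code from jim-regexp.c: the names `utf8_step` and `count_all_matches` are assumptions, and it does not try to reproduce the exact distinction between `""` and `x*` at end of string. It shows only the two ideas named above: always advance after an empty match so the loop terminates, and advance by a whole UTF-8 character rather than a single byte.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Bytes in the UTF-8 sequence starting at s. Falls back to a shorter
 * step if the sequence is truncated or malformed, so it never steps
 * past the terminating NUL. */
static size_t utf8_step(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    size_t len = 1, i;

    if (c >= 0xF0) len = 4;
    else if (c >= 0xE0) len = 3;
    else if (c >= 0xC0) len = 2;
    for (i = 1; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return i;
    }
    return len;
}

/* Count successive matches of the ERE `pattern` in `text`, in the style
 * of an -all loop. Returns -1 if the pattern fails to compile. */
int count_all_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    int count = 0;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (1) {
        /* After the first attempt, p is no longer at beginning of line. */
        int eflags = (p == text) ? 0 : REG_NOTBOL;
        if (regexec(&re, p, 1, &m, eflags) != 0)
            break;
        count++;
        p += m.rm_eo;                 /* continue after the match */
        if (m.rm_so == m.rm_eo) {     /* empty match: force progress */
            if (*p == '\0')
                break;                /* nothing left to step over */
            p += utf8_step(p);        /* whole character, not one byte */
        }
    }
    regfree(&re);
    return count;
}
```

In this sketch `x*` against `abc` is counted four times (an empty match before each character plus one at end of string), and against the two-byte string `"\xc3\xa9"` it is counted twice, because the step covers the whole character instead of splitting it.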

Signed-off-by: Steve Bennett <steveb@workware.net.au>

This matches Tcl 9.0

Signed-off-by: Steve Bennett <steveb@workware.net.au>

Signed-off-by: Steve Bennett <steveb@workware.net.au>

Signed-off-by: Steve Bennett <steveb@workware.net.au>

If using POSIX regex instead of the builtin jim regex,
\r and \n character escapes are not supported.
Use literal \r and \n instead.

Signed-off-by: Steve Bennett <steveb@workware.net.au>
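
A rough C analogue of the point in this commit message, assuming the system POSIX `regcomp`/`regexec`: POSIX does not define what a backslash followed by `r` or `n` means inside an extended regular expression, so implementations differ, whereas literal CR and LF bytes in the pattern are ordinary characters everywhere. In the sketch below the C compiler turns `"\r\n"` into those literal bytes before `regcomp` ever sees the pattern. The helper name `ere_match` is invented for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches anywhere
 * in `text`, 0 if not, -1 if the pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Portable: the pattern contains real CR and LF bytes. */
int matches_crlf_literal(const char *text)
{
    return ere_match("a\r\nb", text);
}
```

A pattern written as `"a\\r\\nb"` would instead hand the two-character sequences backslash-r and backslash-n to `regcomp`, and the result then depends on the regex library in use, which is why tests that must pass with either engine are better off with the literal form.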

This means jim builtin regex, not posix regex

Signed-off-by: Steve Bennett <steveb@workware.net.au>

To build without the jim builtin regex

Signed-off-by: Steve Bennett <steveb@workware.net.au>

@msteveb msteveb force-pushed the regex-improvements branch from 2311f5b to e00c921 Compare August 9, 2025 11:41
@msteveb msteveb merged commit ca9fd7a into master Aug 12, 2025
5 checks passed
@msteveb msteveb deleted the regex-improvements branch August 13, 2025 04:25