regex improvements #354
Pull Request Overview
This PR enhances regex functionality in Jim Tcl by adding -lineanchor and -linestop support, improving Tcl compatibility, and fixing several POSIX regex behaviors.
- Adds -lineanchor and -linestop options to the regexp and regsub commands for finer control over newline handling
- Updates error messages to use "option" instead of "switch" for better Tcl compatibility
- Enables previously commented-out UTF-8 and regex tests, and fixes edge cases in empty pattern matching
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
tests/regexp2.test | Uncommented tests for -lineanchor/-linestop options and updated error message expectations |
tests/regexp.test | Uncommented UTF-8 tests, updated error messages, and added new empty pattern tests |
tests/interactive.test | Changed expect patterns from braces to double quotes for proper expansion |
test-bootstrap-jim | Added argument forwarding to make-bootstrap-jim script |
make-bootstrap-jim | Added --no-regexp option to exclude regexp extension from bootstrap build |
jimregexp.h | Split REG_NEWLINE into separate REG_NEWLINE_ANCHOR and REG_NEWLINE_STOP flags |
jimregexp.c | Updated regex engine to use new separate newline flags |
jim.c | Modified string sorting to place -- option last for Tcl 9.0 compatibility |
jim-regexp.c | Added -lineanchor/-linestop support, fixed empty pattern handling, updated error messages |
make-bootstrap-jim (outdated)
  # And finally the core source code
- for i in jim.c jim-subcmd.c utf8.c jim-format.c jimregexp.c jimiocompat.c jim-win32compat.c jim-nosignal.c; do
+ for i in jim.c jim-subcmd.c utf8.c jim-format.c jim-subcmd.h $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do
Copilot AI, Aug 9, 2025
The source files loop includes jim-subcmd.h, which is a header file. It should likely be removed, since header files are already processed in the earlier loop at line 144.
Suggested change:
- for i in jim.c jim-subcmd.c utf8.c jim-format.c jim-subcmd.h $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do
+ for i in jim.c jim-subcmd.c utf8.c jim-format.c $REGEXP_SOURCE jimiocompat.c jim-win32compat.c jim-nosignal.c; do
jim-regexp.c (outdated)
  n = source_len - offset;
  p = source_str + offset;
- do {
+ while (1) {
Copilot AI, Aug 9, 2025
[nitpick] The do...while loop has been changed to while (1) with a break condition inside. Consider using a more explicit loop condition, or add a comment explaining why this infinite loop pattern is preferred over the previous do...while(n) structure.
Suggested change:
- while (1) {
+ while (n > 0) {
Although "" and "x*" both match the empty string, the former correctly exited, while the latter looped forever. Match Tcl here by advancing by one char in both cases; in the latter case end of string is matched, while in the former it is not. Also prevent both cases from slicing a UTF-8 char into bytes.
Fixes: #353
Signed-off-by: Steve Bennett <steveb@workware.net.au>

Signed-off-by: Steve Bennett <steveb@workware.net.au>

This matches Tcl 9.0.
Signed-off-by: Steve Bennett <steveb@workware.net.au>

Signed-off-by: Steve Bennett <steveb@workware.net.au>

Signed-off-by: Steve Bennett <steveb@workware.net.au>

If using POSIX regex instead of the builtin Jim regex, the \r and \n character escapes are not supported. Use literal carriage return and newline characters instead.
Signed-off-by: Steve Bennett <steveb@workware.net.au>

This means the Jim builtin regex, not POSIX regex.
Signed-off-by: Steve Bennett <steveb@workware.net.au>

To build without the Jim builtin regex.
Signed-off-by: Steve Bennett <steveb@workware.net.au>
2311f5b to e00c921
Add -lineanchor and -linestop support
A few other changes to be more similar to Tcl
Better support for POSIX regex