This project is a Rust implementation of the grep command-line tool, built as part of the Codecrafters "Build your own grep" challenge. This implementation provides a hands-on understanding of regular expression parsing, pattern matching algorithms, and how text search utilities work under the hood.
This grep implementation covers the fundamental pattern matching operations that grep performs:
- Character Class Matching: Support for
\d(digits) and\w(word characters) patterns - Character Groups: Both positive
[abc]and negative[^abc]character sets - Anchors: Start-of-line
^and end-of-line$matching - Quantifiers: One-or-more
+and zero-or-one?repetition operators - Wildcard Matching: Single character wildcard
.support - Alternation: Support for
(pattern1|pattern2)alternative matching - Literal Character Matching: Direct character matching for simple patterns
- Complex Pattern Sequences: Combining multiple pattern elements in sequence
- Modular Architecture: Clean separation between parsing (
parser.rs) and matching (matcher.rs) - Recursive Pattern Matching: Efficient pattern matching with proper backtracking
- Comprehensive Pattern Support: From simple literal matches to complex regex sequences with alternation
- Memory Safe: Leveraging Rust's ownership system for safe string processing
The grep implementation processes patterns through several key components:
- Converts string patterns into an enum-based AST representation
- Handles nested patterns like alternation groups
(a|b|c) - Supports escape sequences and special characters
- Recursive parsing for complex pattern structures
match_herefunction serves as the core matching engine- Handles different pattern types through pattern matching
- Implements backtracking for quantifiers and alternation
- Optimized character-by-character matching
Literal(char): Direct character matchingDigit: Matches\dpatternsAlphanumeric: Matches\wpatterns (including underscore)Group(bool, String): Character groups with positive/negative matchingWildcard: Single character wildcard.OneOrMore/ZeroOrOne: Quantifier supportAlternation(Vec<Vec<Pattern>>): Alternative pattern matchingEndAnchor: End-of-line$matching
- Rust 1.80 or later (as specified in
Cargo.toml)
-
Clone the repository:
git clone https://github.com/Md-Talim/codecrafters-grep-rust cd codecrafters-grep-rust -
Build the project:
cargo build --release
General Usage:
./your_program.sh -E <pattern>The program reads from stdin and exits with code 0 if the pattern matches, 1 if no match is found.
Pattern Examples:
| Pattern Type | Example | Description | Matches |
|---|---|---|---|
| Character Classes | \d |
Any digit | 123, a1b |
\w |
Any word character | hello, test_123 |
|
| Character Groups | [abc] |
Any of a, b, or c | apple, bat |
[^xyz] |
Any char except x, y, z | apple, dog |
|
| Anchors | ^hello |
Start of line | hello world |
world$ |
End of line | hello world |
|
| Quantifiers | ca+t |
One or more 'a' | cat, caaat |
ca?t |
Zero or one 'a' | ct, cat |
|
| Wildcard | c.t |
Any character between c and t | cat, cut, c1t |
| Alternation | (cat|dog) |
Match either "cat" or "dog" | cat, dog, doggy |
-
Match digits:
echo "hello123" | ./your_program.sh -E "\d" # Exit code: 0 (match found)
-
Match alternation:
echo "I have a dog" | ./your_program.sh -E "(cat|dog)" # Exit code: 0 (dog found)
-
Match with quantifiers:
echo "caaat" | ./your_program.sh -E "ca+t" # Exit code: 0 (one or more 'a' found)
-
Anchor matching:
echo "hello world" | ./your_program.sh -E "^hello" # Exit code: 0 (line starts with "hello")
- Pattern Matching Algorithms: Understanding how regex engines parse and execute patterns
- Recursive Descent Parsing: Building an AST from pattern strings
- Backtracking: Implementing efficient backtracking for quantifiers and alternation
- Alternation Logic: Handling multiple alternative patterns within groups
- Ownership in String Processing: Managing string slices and character iterators safely
- Pattern Matching: Using Rust's powerful enum pattern matching for different regex patterns
- Iterator Patterns: Efficient character-by-character processing with Rust iterators
- Memory Safety: Leveraging Rust's ownership system for safe regex processing
- Alternation Parsing: Implementing proper parsing of
(pattern1|pattern2|pattern3)syntax with nested groups - Backtracking Complexity: Ensuring efficient backtracking without infinite loops in quantifiers
- String Slice Management: Properly managing string slices and character indices in Rust
- Pattern Precedence: Handling operator precedence correctly in complex patterns
- The Codecrafters "Build your own grep" challenge provided the structure and test cases