
Conversation


@kanishka1804 kanishka1804 commented Oct 19, 2025

Closes #515

Summary of Changes

  1. Feature Implementation (Tests Added/Updated):

    • src/parse.spec.ts: Added a new describe('Line Ending Compatibility...') block with four end-to-end tests covering:
      1. Identical parsing for LF and CRLF inputs.
      2. Correct graceful handling of mixed LF/CRLF line endings.
      3. Preservation of LF/CRLF characters within quoted fields.
      4. Correct handling of single (ignored) vs. double (empty record created) trailing EOLs.
    • src/Lexer.test.ts: Added a specific unit test to confirm the Lexer correctly tokenizes mixed LF and CRLF record delimiters, including location data.
    • src/Lexer.spec.ts: Added a property-based test using fast-check to verify the Lexer's behavior across random EOL combinations.
  2. Bug Fixes (Required for Tests to Pass):

    • Lexer Trimming Bug Fix (src/Lexer.ts): Removed the incorrect logic in the *#tokens() method that was prematurely trimming the final EOL from the buffer. This fix ensures the Lexer properly produces a RecordDelimiter token for every EOL found, allowing the parser to correctly generate empty records for double-trailing EOLs (which was failing before); see the behavior sketch after this summary.
    • Test Maintenance: Updated expectations in several original tests (src/Lexer.test.ts, src/Lexer.spec.ts) that were incorrectly expecting the final EOL to be trimmed.
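
A minimal sketch (not taken from the diff) of the trailing-EOL behavior these tests pin down, using the library's async-iterable parse API; the toRows helper is hypothetical:

import { parse } from "web-csv-toolbox";

async function toRows(csv: string): Promise<Record<string, string>[]> {
  const rows: Record<string, string>[] = [];
  for await (const row of parse(csv)) rows.push(row);
  return rows;
}

// A single trailing EOL (LF or CRLF) is ignored:
await toRows("name,age\r\nAlice,42\r\n"); // [{ name: "Alice", age: "42" }]

// Mixed endings parse the same as consistent ones:
await toRows("name,age\r\nAlice,42\nBob,24\r\n"); // two records, no spurious rows

// A double trailing EOL now yields an empty trailing record (the previously failing case):
await toRows("name,age\nAlice,42\n\n"); // [{ name: "Alice", age: "42" }, { name: "", age: "" }]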

Summary by CodeRabbit

  • Bug Fixes

    • Final line endings (LF, CRLF, and mixed) are now consistently emitted and tokenized so trailing delimiters and token positions are correct.
    • Delimiter boundary handling improved to preserve accurate token locations, including when fields are empty or quoted.
    • Lone end-of-line now produces a record delimiter token where appropriate.
  • Tests

    • Added property and example tests for LF/CRLF/mixed endings and single/double trailing endings.
    • Extended expectations to include location metadata; added a helper to collect parse results from multiple input types.


changeset-bot bot commented Oct 19, 2025

⚠️ No Changeset found

Latest commit: 80e0b50

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR
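
For context, a changeset is a small markdown file committed under the .changeset/ directory that names the affected package and a bump level. A minimal sketch for this fix (the patch bump level is an assumption):

---
"web-csv-toolbox": patch
---

Fix the Lexer so a trailing CRLF/LF is tokenized as a RecordDelimiter instead of being trimmed.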


coderabbitai bot commented Oct 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The lexer no longer trims trailing CRLF/LF on flush and now slices buffers by the full delimiter length when emitting delimiters; tests and parse helpers were updated to expect RecordDelimiter tokens and consistent LOCATION_SHAPE metadata across LF, CRLF, and mixed line endings.
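
For orientation, LOCATION_SHAPE is a test-side matcher for token positions. A plausible sketch using Vitest's asymmetric matchers (the actual helper in the repo may differ; start, end, and rowNumber are visible in the review diffs below, while the line field is an assumption):

const LOCATION_SHAPE = {
  start: { line: expect.any(Number), column: expect.any(Number), offset: expect.any(Number) },
  end: { line: expect.any(Number), column: expect.any(Number), offset: expect.any(Number) },
  rowNumber: expect.any(Number),
};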

Changes

  • Lexer implementation (src/Lexer.ts): Stop pre-emit trimming of trailing CRLF/LF; use the full delimiter length when slicing/advancing the cursor so delimiters are emitted with correct locations; adjust buffering logic/comments around deferring emission when matches consume the buffer.
  • Lexer tests, unit & property (src/Lexer.spec.ts, src/Lexer.test.ts): Add a property-based test for record delimiters across EOL styles (see the sketch below); add a unit test for mixed CRLF/LF; update many expectations to include RecordDelimiter tokens and LOCATION_SHAPE on Field and delimiter tokens, including the final delimiter when EOF requires it.
  • Lexer transformer test (src/LexerTransformer.spec.ts): Update the expectation so a lone end-of-line produces a RecordDelimiter token with LOCATION_SHAPE (previously expected no output).
  • Parse integration & helpers (src/parse.spec.ts): Add a toArray helper to collect parse outputs from various input types and new "Line Ending Compatibility (CRLF vs LF)" tests covering LF vs CRLF equivalence, mixed endings, quoted-field line breaks, and single/double trailing EOLs (including empty trailing rows).
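
To illustrate the property-based approach referenced above, a minimal sketch assuming the Lexer constructor and lex() iterator that appear in the review diffs below (the import paths are assumptions):

import fc from "fast-check";
import { Lexer } from "./Lexer";
import { RecordDelimiter } from "./common/constants";

fc.assert(
  fc.property(fc.constantFrom("\n", "\r\n"), (eol) => {
    const lexer = new Lexer({ delimiter: ",", quotation: '"' });
    const tokens = [...lexer.lex(`a,b${eol}c,d${eol}`)];
    // Post-fix, every EOL (including the trailing one) should surface
    // as a RecordDelimiter token carrying the exact EOL it matched.
    const delimiters = tokens.filter((t) => t.type === RecordDelimiter);
    return delimiters.length === 2 && delimiters.every((t) => t.value === eol);
  }),
);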

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Parser/Reader
    participant Lexer as Lexer
    participant TG as TokenGenerator

    rect rgba(243,247,255,0.9)
    note right of Lexer: Old flow (pre-change)
    Caller->>Lexer: provide buffer "field\r\n"
    Lexer->>TG: flush(buffer)
    TG->>TG: trim trailing CRLF/LF (removed)
    TG->>Caller: emit Field(token)  -- RecordDelimiter omitted
    end

    rect rgba(246,255,243,0.9)
    note right of Lexer: New flow (post-change)
    Caller->>Lexer: provide buffer "field\r\n"
    Lexer->>TG: flush(buffer)
    TG->>TG: slice buffer by full delimiter length
    TG->>Caller: emit Field(token with LOCATION_SHAPE)
    TG->>Caller: emit RecordDelimiter(token with LOCATION_SHAPE)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Poem

🐇 I nibble endings, CRLF and LF alike,
I mark each token with a tidy little strike.
Delimiters stay, locations all shown,
Rows line up neat — no mystery left unknown.
A hop, a test, a parse: CSV delight. ✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title "bug(csv): Add comprehensive test cases and fix bug for CRLF/LF line ending compatibility" is directly aligned with the changeset. The pull request adds multiple test suites across four test files (parse.spec.ts, Lexer.test.ts, Lexer.spec.ts, LexerTransformer.spec.ts) and includes a necessary bug fix in Lexer.ts that removes incorrect trimming of trailing CRLF/LF characters. The title clearly conveys both the testing additions and the underlying bug fix related to line ending compatibility, making it specific, concise, and accurate.
  • Linked Issues Check: ✅ Passed. The PR fully satisfies all acceptance criteria from linked issue #515. It adds four comprehensive end-to-end tests in src/parse.spec.ts covering identical parsing for LF vs CRLF [test 1], graceful handling of mixed line endings [test 2], preservation of line breaks inside quoted fields [test 3], and correct handling of trailing EOLs [test 4]. Additionally, it provides a property-based test using fast-check in src/Lexer.spec.ts for random EOL combinations [test 5], a unit test for mixed LF/CRLF tokenization in src/Lexer.test.ts, and a necessary bug fix in src/Lexer.ts that removes premature trimming of trailing CRLF/LF to enable the parser to generate empty records for double-trailing EOLs. All requested test locations and objectives are met.
  • Out of Scope Changes Check: ✅ Passed. All changes in the pull request are directly scoped to issue #515. The test additions in src/parse.spec.ts, src/Lexer.test.ts, and src/Lexer.spec.ts are explicitly requested objectives. The bug fix in src/Lexer.ts (removing premature trimming and fixing delimiter handling) is a necessary code change to enable the new test cases to pass and support proper CRLF/LF handling. The updates to src/LexerTransformer.spec.ts reflect the corrected behavior resulting from the Lexer.ts fix. No unrelated refactoring, style changes, or feature additions outside the scope of line ending compatibility are present.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist

Summary of Changes

Hello @kanishka1804, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of the CSV parsing library by addressing critical compatibility issues with different line ending conventions (CRLF and LF). It introduces extensive testing to ensure consistent and correct behavior across various EOL scenarios, including mixed formats and EOLs embedded within fields. A key bug in the Lexer's EOL trimming logic has been rectified, allowing for accurate parsing of trailing empty records. The overall impact is a more reliable and standard-compliant CSV parser.

Highlights

  • Comprehensive Line Ending Tests: Added a new suite of end-to-end tests in src/parse.spec.ts to validate the parser's compatibility with various line ending scenarios, including identical parsing for LF/CRLF, graceful handling of mixed endings, preservation of EOLs within quoted fields, and correct behavior with single/double trailing EOLs.
  • Lexer Bug Fix: Resolved a bug in src/Lexer.ts where the Lexer was prematurely trimming the final End-Of-Line (EOL) character from the buffer. This fix ensures that all record delimiters are correctly tokenized, which is crucial for handling scenarios like double trailing EOLs that result in empty records.
  • Enhanced Lexer Unit & Property Tests: Introduced a specific unit test in src/Lexer.test.ts to confirm correct tokenization of mixed LF and CRLF record delimiters, including location data. Additionally, a property-based test using fast-check was added to src/Lexer.spec.ts to thoroughly verify the Lexer's EOL handling across random combinations.
  • Test Maintenance: Updated existing test expectations in src/Lexer.test.ts and src/Lexer.spec.ts to align with the corrected Lexer behavior, specifically regarding the emission of RecordDelimiter tokens that were previously being trimmed.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement to the CSV parser by ensuring compatibility with both LF and CRLF line endings. The changes include a crucial bug fix in the Lexer to correctly handle trailing EOLs, which is well-supported by a comprehensive new suite of tests, including unit, end-to-end, and property-based tests. The added tests greatly enhance confidence in the parser's robustness. My review includes a couple of suggestions to further improve the completeness and correctness of the new test cases. Overall, this is a high-quality contribution.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/Lexer.ts (2)

159-173: Fix delimiter slicing for multi-character delimiters.

Using slice(1) corrupts tokenization when delimiter length > 1 (e.g., "::"), causing the remaining delimiter chars to be treated as data. Use #fieldDelimiterLength consistently.

     if (this.#buffer.startsWith(this.#delimiter)) {
-      this.#buffer = this.#buffer.slice(1);
+      this.#buffer = this.#buffer.slice(this.#fieldDelimiterLength);
       const start: Position = { ...this.#cursor };
       this.#cursor.column += this.#fieldDelimiterLength;
       this.#cursor.offset += this.#fieldDelimiterLength;
       return {
         type: FieldDelimiter,
         value: this.#delimiter,
         location: {
           start,
           end: { ...this.#cursor },
           rowNumber: this.#rowNumber,
         },
       };
     }

Also consider adding a unit/property test with multi-char delimiters to prevent regression (see suggestion in Lexer.spec.ts comment).


277-281: Comment contradicts the condition (buffering vs flushing).

The code defers emission when NOT flushing and the match fully consumes the buffer. Update the comment for clarity.

-      // If we're flushing and the match doesn't consume the entire buffer,
-      // then return null
+      // When buffering (not flushing) and the match fully consumes the buffer,
+      // defer emission to wait for the next chunk (field may be followed by delimiter/quote)
       if (this.#flush === false && match[0].length === this.#buffer.length) {
         return null;
       }
🧹 Nitpick comments (3)
src/Lexer.test.ts (1)

164-164: Minor test description typos.

  • Rename “RCLF” → “CRLF”.
  • Fix “thorw” → “throw”.
-  test("should parse a field with escaped and delimiter and record delimiter and EOF(RCLF)", () => {
+  test("should parse a field with escaped and delimiter and record delimiter and EOF(CRLF)", () => {
@@
-  test("should thorw an error if the field is not closed", () => {
+  test("should throw an error if the field is not closed", () => {
@@
-    test("should thorw DOMException named AbortError if the signal is aborted", () => {
+    test("should throw DOMException named AbortError if the signal is aborted", () => {

Also applies to: 280-281, 296-296

src/parse.spec.ts (2)

371-381: Tighten test typing to avoid any[].

Optional: define a Row alias for readability.

-async function toArray(csv: string | ReadableStream<string>): Promise<any[]>;
-async function toArray(csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response): Promise<any[]>;
+type Row = Record<string, string>;
+async function toArray(csv: string | ReadableStream<string>): Promise<Row[]>;
+async function toArray(csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response): Promise<Row[]>;
 async function toArray(
     csv: string | Uint8Array | ArrayBuffer | ReadableStream<any> | Response
-): Promise<any[]> {
+): Promise<Row[]> {

423-435: Add a CRLF+CRLF double-trailing case for completeness.

You already test LF+LF. Add CRLF+CRLF to assert an empty trailing record is created too.

   it('should ignore a single trailing line ending (LF or CRLF) but handle double trailing EOL', async () => {
@@
     const expectedWithEmpty = [...expected, { name: '', age: '' }];
     expect(await toArray(csvWithDoubleTrailing)).toEqual(expectedWithEmpty);
+
+    // CRLF + CRLF should also produce an empty trailing record
+    const csvWithDoubleTrailingCRLF = 'name,age\r\nAlice,42\r\n\r\n';
+    expect(await toArray(csvWithDoubleTrailingCRLF)).toEqual(expectedWithEmpty);
   });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc4fc57 and 7c015d4.

📒 Files selected for processing (4)
  • src/Lexer.spec.ts (3 hunks)
  • src/Lexer.test.ts (3 hunks)
  • src/Lexer.ts (1 hunks)
  • src/parse.spec.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.

Applied to files:

  • src/parse.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.

Applied to files:

  • src/Lexer.ts
🧬 Code graph analysis (3)
src/parse.spec.ts (1)
src/parse.ts (1)
  • parse (186-210)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
  • eol (219-221)
  • row (228-237)
  • field (141-150)
src/escapeField.ts (1)
  • escapeField (23-65)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
src/Lexer.ts (1)
  • Lexer (19-301)
src/Lexer.test.ts (1)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
🔇 Additional comments (5)
src/Lexer.test.ts (3)

11-33: Mixed CRLF/LF tokenization test looks correct.

Offsets, line/column, and rowNumber transitions are consistent. Good coverage.


152-160: EOF(LF): expecting a final RecordDelimiter is correct.

Matches the updated Lexer semantics when flushing.


194-202: EOF(CRLF): final RecordDelimiter expectation is correct.

Consistent with CRLF handling and location updates.

src/Lexer.spec.ts (2)

25-79: Property test for EOL tokenization looks solid.

Good use of toMatchObject with LOCATION_SHAPE and quoted fields to isolate RecordDelimiter behavior.


295-326: EOF final RecordDelimiter expectation is appropriate.

This matches the “no trimming on flush” change.

To guard against regressions, consider adding a targeted test for multi-character delimiters (would have caught a slicing bug in Lexer):

+  it("should tokenize correctly with multi-character delimiters", () => {
+    const options = { delimiter: "::" as const, quotation: '"' as const };
+    const lexer = new Lexer(options);
+    const csv = '"a"::"b"\r\n"c"::"d"';
+    const tokens = [...lexer.lex(csv)];
+    expect(tokens).toMatchObject([
+      { type: Field, value: "a", location: LOCATION_SHAPE },
+      { type: FieldDelimiter, value: "::", location: LOCATION_SHAPE },
+      { type: Field, value: "b", location: LOCATION_SHAPE },
+      { type: RecordDelimiter, value: "\r\n", location: LOCATION_SHAPE },
+      { type: Field, value: "c", location: LOCATION_SHAPE },
+      { type: FieldDelimiter, value: "::", location: LOCATION_SHAPE },
+      { type: Field, value: "d", location: LOCATION_SHAPE },
+    ]);
+  });


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/Lexer.spec.ts (1)

25-79: Assert location on Field tokens for consistency and stronger checks.

Other tests validate location on Field tokens; this new property test doesn’t. Add location: LOCATION_SHAPE to Field entries to ensure position metadata is consistently validated.

-                { type: Field, value: field },
+                { type: Field, value: field, location: LOCATION_SHAPE },
🧹 Nitpick comments (1)
src/LexerTransformer.spec.ts (1)

164-177: Great regression example; add a CRLF twin for completeness.

The lone LF example is perfect to lock in the fix. Consider adding an explicit CRLF example too to harden cross‑platform coverage.

       {
         options: { delimiter: ",", quotation: '"' },
         chunks: ["\n"],
         expected: [
           {
             type: RecordDelimiter,
             value: "\n",
             location: LOCATION_SHAPE, 
           },
         ],
       },
+      {
+        options: { delimiter: ",", quotation: '"' },
+        chunks: ["\r\n"],
+        expected: [
+          {
+            type: RecordDelimiter,
+            value: "\r\n",
+            location: LOCATION_SHAPE,
+          },
+        ],
+      },
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c015d4 and 980f932.

📒 Files selected for processing (4)
  • src/Lexer.spec.ts (3 hunks)
  • src/Lexer.ts (2 hunks)
  • src/LexerTransformer.spec.ts (1 hunks)
  • src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Lexer.ts
  • src/parse.spec.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.

Applied to files:

  • src/LexerTransformer.spec.ts
🧬 Code graph analysis (2)
src/LexerTransformer.spec.ts (1)
src/common/constants.ts (1)
  • RecordDelimiter (10-10)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
  • eol (219-221)
  • row (228-237)
  • field (141-150)
src/escapeField.ts (1)
  • escapeField (23-65)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
src/Lexer.ts (1)
  • Lexer (19-301)
🔇 Additional comments (2)
src/Lexer.spec.ts (2)

292-298: LGTM — location added to Field tokens.

Adding location: LOCATION_SHAPE here aligns expectations with the lexer’s position tracking.


321-327: LGTM — explicit final RecordDelimiter at EOF.

Pushing a trailing RecordDelimiter when EOF is true correctly mirrors the lexer’s behavior post‑fix.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/Lexer.spec.ts (1)

25-79: Well-structured property test for EOL tokenization.

The test correctly validates that the Lexer emits proper tokens (Field, FieldDelimiter, RecordDelimiter) with location metadata for CSVs using either LF or CRLF. The test data constraints (minLength: 1) ensure non-empty inputs, and the expected token array logic properly accounts for fields, delimiters, and record boundaries.

Optional: Consider clarifying the test scope in the name.

The test name mentions "for various EOL" which could imply mixed line endings within a single CSV. However, each generated CSV uses a consistent EOL (const eol = g(FC.eol)), and the property test exercises different EOL types across multiple runs. If mixed EOLs are tested elsewhere (e.g., in Lexer.test.ts as noted in the PR summary), consider renaming to "for LF and CRLF delimiters" to avoid ambiguity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 980f932 and eac6a50.

📒 Files selected for processing (2)
  • src/Lexer.spec.ts (3 hunks)
  • src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/parse.spec.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
  • eol (219-221)
  • row (228-237)
  • field (141-150)
src/escapeField.ts (1)
  • escapeField (23-65)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
src/Lexer.ts (1)
  • Lexer (19-301)
🔇 Additional comments (2)
src/Lexer.spec.ts (2)

296-296: LGTM: Location metadata now consistent.

Adding location: LOCATION_SHAPE aligns this expectation with other tests and ensures location data is validated for Field tokens when fields are quoted or non-empty.


321-327: Correct handling of trailing EOL.

When EOF is true, the CSV ends with an EOL character, and the Lexer should emit a corresponding RecordDelimiter token. The data.length > 0 guard is defensive (the test constraint ensures minLength: 1) but harmless and makes the intent explicit.

@kamiazya
Owner

@kanishka1804 Please check the CI failure; it seems there's a formatting issue.

@kamiazya changed the title from "feat(csv): Add comprehensive test cases and fix bug for CRLF/LF line ending compatibility" to "bug(csv): Add comprehensive test cases and fix bug for CRLF/LF line ending compatibility" on Oct 21, 2025
@kanishka1804
Author

Hello @kamiazya,
I apologize for the CI failures. I have resolved the format issue.
Thank you so much for your time and guidance on this issue. It should now be ready for your final review.


codecov bot commented Oct 23, 2025

Bundle Report

Changes will decrease total bundle size by 1.35kB (-0.17%) ⬇️. This is within the configured threshold ✅

Detailed changes
  • web-csv-toolbox-CSV-esm: 229.18kB total (-1.35kB, -0.59% ⬇️)

Affected Assets, Files, and Routes:


Assets Changed:

  • _virtual/web_csv_toolbox_wasm_bg.wasm.js: -592 bytes (total 111.65kB, -0.53%)
  • web_csv_toolbox_wasm_bg.wasm: -443 bytes (total 83.64kB, -0.53%)
  • Lexer.js: -315 bytes (total 5.94kB, -5.04%)

Files in Lexer.js:

  • ./src/Lexer.ts → Total Size: 5.55kB


codspeed-hq bot commented Oct 23, 2025

CodSpeed Performance Report

Merging #536 will not alter performance

Comparing kanishka1804:feat/csv-line-ending-tests (80e0b50) with main (009c762)

Summary

✅ 7 untouched

@kamiazya force-pushed the feat/csv-line-ending-tests branch from e45035d to 80e0b50 on October 23, 2025 at 11:15
@kamiazya added the bug (Something isn't working) label on Oct 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/parse.spec.ts (1)

385-463: Line-ending compatibility suite is comprehensive and matches #515 acceptance criteria.

LF vs CRLF parity, mixed endings, quoted EOL preservation, and all four double‑EOL combinations are covered. Good job.

🧹 Nitpick comments (3)
src/LexerTransformer.spec.ts (1)

165-175: Cover single CRLF example as well for symmetry.

Add a matching example for "\r\n" so both LF and CRLF single‑EOL cases are locked in.

           {
             options: { delimiter: ",", quotation: '"' },
             chunks: ["\n"],
             expected: [
               {
                 type: RecordDelimiter,
                 value: "\n",
                 location: LOCATION_SHAPE,
               },
             ],
           },
+          {
+            options: { delimiter: ",", quotation: '"' },
+            chunks: ["\r\n"],
+            expected: [
+              {
+                type: RecordDelimiter,
+                value: "\r\n",
+                location: LOCATION_SHAPE,
+              },
+            ],
+          },
src/Lexer.test.ts (1)

282-290: Nit: fix test title typo “RCLF” → “CRLF”.

Purely cosmetic; avoids confusion.

-  test("should parse a field with escaped and delimiter and record delimiter and EOF(RCLF)", () => {
+  test("should parse a field with escaped and delimiter and record delimiter and EOF(CRLF)", () => {
src/parse.spec.ts (1)

371-384: Type the helper’s return to improve checks.

Returning a structured type helps catch expectation mistakes in tests.

-async function toArray(csv: string | ReadableStream<string>): Promise<any[]>;
+async function toArray(csv: string | ReadableStream<string>): Promise<Record<string, string>[]>;
 async function toArray(
   csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response,
-): Promise<any[]>;
+): Promise<Record<string, string>[]>;
 async function toArray(
   csv: string | Uint8Array | ArrayBuffer | ReadableStream<any> | Response,
-): Promise<any[]> {
+): Promise<Record<string, string>[]> {
   const records = [];
   for await (const row of parse(csv as any)) {
     records.push(row);
   }
   return records;
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5185c0 and 80e0b50.

📒 Files selected for processing (5)
  • src/Lexer.spec.ts (3 hunks)
  • src/Lexer.test.ts (3 hunks)
  • src/Lexer.ts (2 hunks)
  • src/LexerTransformer.spec.ts (1 hunks)
  • src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Lexer.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.

Applied to files:

  • src/parse.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.

Applied to files:

  • src/LexerTransformer.spec.ts
🧬 Code graph analysis (4)
src/parse.spec.ts (1)
src/parse.ts (1)
  • parse (186-210)
src/Lexer.test.ts (1)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
  • eol (219-221)
  • row (228-237)
  • field (141-150)
src/escapeField.ts (1)
  • escapeField (23-65)
src/common/constants.ts (3)
  • Field (15-15)
  • FieldDelimiter (5-5)
  • RecordDelimiter (10-10)
src/Lexer.ts (1)
  • Lexer (19-302)
src/LexerTransformer.spec.ts (1)
src/common/constants.ts (1)
  • RecordDelimiter (10-10)
🔇 Additional comments (4)
src/Lexer.test.ts (2)

11-121: Mixed CRLF/LF tokenization test reads correctly.

Locations/offsets and delimiter values align with the Lexer behavior. Nice addition.


240-248: EOF(LF) RecordDelimiter assertion matches new no-trim behavior.

Expectation is correct post-fix.

src/Lexer.spec.ts (2)

24-77: Property test for EOL tokenization is solid.

Good coverage across options/data with quoted fields; expected tokens and locations look right.


294-325: EOF RecordDelimiter expectation is correct with the trailing‑EOL fix.

This guards against regressions in end‑of‑file handling.

@kamiazya
Owner

@kanishka1804 Sorry, I haven't had a chance to look into it in detail either, but it seems highly likely that there's some problem with the automated tests. The GitHub Actions job timed out.

It's probable that the overall behavior has become inconsistent due to the bug fix.

The original issue was just to add tests, so if you were only adding tests, I think it could be merged. Do you want to continue development with the current scope, which includes this bug fix?

Of course, I would appreciate it if you continue as is. However, I suspect you became interested in this project through Hacktoberfest, so changing this to a smaller-scoped PR might be a good idea for you.

@kamiazya
Owner

@kanishka1804 Hi, just a reminder, since it's the last day of Hacktoberfest.


Labels

bug (Something isn't working), hacktoberfest-accepted


Development

Successfully merging this pull request may close these issues.

Add test cases for CSV with different line endings (CRLF vs LF)
