bug(csv): Add comprehensive test cases and fix bug for CRLF/LF line ending compatibility #536
base: main
Conversation
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
The lexer no longer trims trailing CRLF/LF on flush and now slices buffers by the full delimiter length when emitting delimiters; tests and parse helpers were updated to expect RecordDelimiter tokens and consistent LOCATION_SHAPE metadata across LF, CRLF, and mixed line endings.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as Parser/Reader
    participant Lexer as Lexer
    participant TG as TokenGenerator
    rect rgba(243,247,255,0.9)
        note right of Lexer: Old flow (pre-change)
        Caller->>Lexer: provide buffer "field\r\n"
        Lexer->>TG: flush(buffer)
        TG->>TG: trim trailing CRLF/LF (removed)
        TG->>Caller: emit Field(token) -- RecordDelimiter omitted
    end
    rect rgba(246,255,243,0.9)
        note right of Lexer: New flow (post-change)
        Caller->>Lexer: provide buffer "field\r\n"
        Lexer->>TG: flush(buffer)
        TG->>TG: slice buffer by full delimiter length
        TG->>Caller: emit Field(token with LOCATION_SHAPE)
        TG->>Caller: emit RecordDelimiter(token with LOCATION_SHAPE)
    end
```
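To make the post-change flow concrete, here is a minimal sketch of the expected token stream for a buffer ending in CRLF. It uses the Lexer API as it appears in this PR's tests (`new Lexer(options)` and spreading `lexer.lex(csv)`); the location values in the comments are illustrative assumptions, not verified output.

```ts
import { Lexer } from "./Lexer";
import { Field, RecordDelimiter } from "./common/constants";

// After the fix, the trailing CRLF is no longer trimmed on flush,
// so a final RecordDelimiter token is emitted after the field.
const lexer = new Lexer({ delimiter: ",", quotation: '"' });
const tokens = [...lexer.lex("field\r\n")];

// Expected shape (location details abbreviated and assumed):
// [
//   { type: Field,           value: "field", location: { rowNumber: 1, ... } },
//   { type: RecordDelimiter, value: "\r\n",  location: { rowNumber: 1, ... } },
// ]
```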
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Summary of Changes
Hello @kanishka1804, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of the CSV parsing library by addressing critical compatibility issues with different line ending conventions (CRLF and LF). It introduces extensive testing to ensure consistent and correct behavior across various EOL scenarios, including mixed formats and EOLs embedded within fields. A key bug in the Lexer's EOL trimming logic has been rectified, allowing for accurate parsing of trailing empty records. The overall impact is a more reliable and standard-compliant CSV parser.

Highlights
Code Review
This pull request introduces a significant improvement to the CSV parser by ensuring compatibility with both LF and CRLF line endings. The changes include a crucial bug fix in the Lexer to correctly handle trailing EOLs, which is well-supported by a comprehensive new suite of tests, including unit, end-to-end, and property-based tests. The added tests greatly enhance confidence in the parser's robustness. My review includes a couple of suggestions to further improve the completeness and correctness of the new test cases. Overall, this is a high-quality contribution.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/Lexer.ts (2)
159-173: Fix delimiter slicing for multi-character delimiters.
Using slice(1) corrupts tokenization when delimiter length > 1 (e.g., "::"), causing the remaining delimiter chars to be treated as data. Use #fieldDelimiterLength consistently.
```diff
-    if (this.#buffer.startsWith(this.#delimiter)) {
-      this.#buffer = this.#buffer.slice(1);
+    if (this.#buffer.startsWith(this.#delimiter)) {
+      this.#buffer = this.#buffer.slice(this.#fieldDelimiterLength);
       const start: Position = { ...this.#cursor };
-      this.#cursor.column += this.#fieldDelimiterLength;
-      this.#cursor.offset += this.#fieldDelimiterLength;
+      this.#cursor.column += this.#fieldDelimiterLength;
+      this.#cursor.offset += this.#fieldDelimiterLength;
       return {
         type: FieldDelimiter,
         value: this.#delimiter,
         location: {
           start,
           end: { ...this.#cursor },
           rowNumber: this.#rowNumber,
         },
       };
     }
```

Also consider adding a unit/property test with multi-char delimiters to prevent regression (see suggestion in Lexer.spec.ts comment).
277-281: Comment contradicts the condition (buffering vs flushing).
The code defers emission when NOT flushing and the match fully consumes the buffer. Update the comment for clarity.
```diff
-    // If we're flushing and the match doesn't consume the entire buffer,
-    // then return null
+    // When buffering (not flushing) and the match fully consumes the buffer,
+    // defer emission to wait for the next chunk (field may be followed by delimiter/quote)
     if (this.#flush === false && match[0].length === this.#buffer.length) {
       return null;
     }
```
🧹 Nitpick comments (3)
src/Lexer.test.ts (1)
164-164: Minor test description typos.
- Rename “RCLF” → “CRLF”.
- Fix “thorw” → “throw”.
- test("should parse a field with escaped and delimiter and record delimiter and EOF(RCLF)", () => { + test("should parse a field with escaped and delimiter and record delimiter and EOF(CRLF)", () => { @@ - test("should thorw an error if the field is not closed", () => { + test("should throw an error if the field is not closed", () => { @@ - test("should thorw DOMException named AbortError if the signal is aborted", () => { + test("should throw DOMException named AbortError if the signal is aborted", () => {Also applies to: 280-281, 296-296
src/parse.spec.ts (2)
371-381: Tighten test typing to avoid any[].
Optional: define a Row alias for readability.
```diff
-async function toArray(csv: string | ReadableStream<string>): Promise<any[]>;
-async function toArray(csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response): Promise<any[]>;
+type Row = Record<string, string>;
+async function toArray(csv: string | ReadableStream<string>): Promise<Row[]>;
+async function toArray(csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response): Promise<Row[]>;
 async function toArray(
-  csv: string | Uint8Array | ArrayBuffer | ReadableStream<any> | Response
-): Promise<any[]> {
+  csv: string | Uint8Array | ArrayBuffer | ReadableStream<any> | Response
+): Promise<Row[]> {
```
423-435: Add a CRLF+CRLF double-trailing case for completeness.
You already test LF+LF. Add CRLF+CRLF to assert an empty trailing record is created too.
```diff
 it('should ignore a single trailing line ending (LF or CRLF) but handle double trailing EOL', async () => {
@@
   const expectedWithEmpty = [...expected, { name: '', age: '' }];
   expect(await toArray(csvWithDoubleTrailing)).toEqual(expectedWithEmpty);
+
+  // CRLF + CRLF should also produce an empty trailing record
+  const csvWithDoubleTrailingCRLF = 'name,age\r\nAlice,42\r\n\r\n';
+  expect(await toArray(csvWithDoubleTrailingCRLF)).toEqual(expectedWithEmpty);
 });
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/Lexer.spec.ts (3 hunks)
- src/Lexer.test.ts (3 hunks)
- src/Lexer.ts (1 hunks)
- src/parse.spec.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
src/parse.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/Lexer.ts
🧬 Code graph analysis (3)
src/parse.spec.ts (1)
src/parse.ts (1)
parse(186-210)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
eol (219-221), row (228-237), field (141-150)
src/escapeField.ts (1)
escapeField (23-65)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
src/Lexer.ts (1)
Lexer (19-301)
src/Lexer.test.ts (1)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
🔇 Additional comments (5)
src/Lexer.test.ts (3)
11-33: Mixed CRLF/LF tokenization test looks correct.
Offsets, line/column, and rowNumber transitions are consistent. Good coverage.
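As a condensed illustration of what these consistent transitions look like, a hedged sketch follows; the token order is grounded in the diff under review, while the exact rowNumber placement on each token is an assumption.

```ts
import { Lexer } from "./Lexer";

// Mixed CRLF/LF input: each RecordDelimiter carries its own EOL verbatim,
// and the row number advances at every record boundary.
const tokens = [...new Lexer({ delimiter: ",", quotation: '"' }).lex("a\r\nb\nc")];
// [
//   { type: Field,           value: "a",    location: { rowNumber: 1, ... } },
//   { type: RecordDelimiter, value: "\r\n", location: { rowNumber: 1, ... } },
//   { type: Field,           value: "b",    location: { rowNumber: 2, ... } },
//   { type: RecordDelimiter, value: "\n",   location: { rowNumber: 2, ... } },
//   { type: Field,           value: "c",    location: { rowNumber: 3, ... } },
// ]
```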
152-160: EOF(LF): expecting a final RecordDelimiter is correct.
Matches the updated Lexer semantics when flushing.
194-202: EOF(CRLF): final RecordDelimiter expectation is correct.
Consistent with CRLF handling and location updates.
src/Lexer.spec.ts (2)
25-79: Property test for EOL tokenization looks solid.
Good use of toMatchObject with LOCATION_SHAPE and quoted fields to isolate RecordDelimiter behavior.
295-326: EOF final RecordDelimiter expectation is appropriate.
This matches the “no trimming on flush” change.
To guard against regressions, consider adding a targeted test for multi-character delimiters (would have caught a slicing bug in Lexer):
+ it("should tokenize correctly with multi-character delimiters", () => { + const options = { delimiter: "::" as const, quotation: '"' as const }; + const lexer = new Lexer(options); + const csv = '"a"::"b"\r\n"c"::"d"'; + const tokens = [...lexer.lex(csv)]; + expect(tokens).toMatchObject([ + { type: Field, value: "a", location: LOCATION_SHAPE }, + { type: FieldDelimiter, value: "::", location: LOCATION_SHAPE }, + { type: Field, value: "b", location: LOCATION_SHAPE }, + { type: RecordDelimiter, value: "\r\n", location: LOCATION_SHAPE }, + { type: Field, value: "c", location: LOCATION_SHAPE }, + { type: FieldDelimiter, value: "::", location: LOCATION_SHAPE }, + { type: Field, value: "d", location: LOCATION_SHAPE }, + ]); + });
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/Lexer.spec.ts (1)
25-79: Assert location on Field tokens for consistency and stronger checks.
Other tests validate location on Field tokens; this new property test doesn’t. Add location: LOCATION_SHAPE to Field entries to ensure position metadata is consistently validated.

```diff
-        { type: Field, value: field },
+        { type: Field, value: field, location: LOCATION_SHAPE },
```
🧹 Nitpick comments (1)
src/LexerTransformer.spec.ts (1)
164-177: Great regression example; add a CRLF twin for completeness.
The lone LF example is perfect to lock in the fix. Consider adding an explicit CRLF example too to harden cross‑platform coverage.
```diff
   {
     options: { delimiter: ",", quotation: '"' },
     chunks: ["\n"],
     expected: [
       {
         type: RecordDelimiter,
         value: "\n",
         location: LOCATION_SHAPE,
       },
     ],
   },
+  {
+    options: { delimiter: ",", quotation: '"' },
+    chunks: ["\r\n"],
+    expected: [
+      {
+        type: RecordDelimiter,
+        value: "\r\n",
+        location: LOCATION_SHAPE,
+      },
+    ],
+  },
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/Lexer.spec.ts (3 hunks)
- src/Lexer.ts (2 hunks)
- src/LexerTransformer.spec.ts (1 hunks)
- src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/Lexer.ts
- src/parse.spec.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/LexerTransformer.spec.ts
🧬 Code graph analysis (2)
src/LexerTransformer.spec.ts (1)
src/common/constants.ts (1)
RecordDelimiter (10-10)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
eol (219-221), row (228-237), field (141-150)
src/escapeField.ts (1)
escapeField (23-65)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
src/Lexer.ts (1)
Lexer (19-301)
🔇 Additional comments (2)
src/Lexer.spec.ts (2)
292-298: LGTM — location added to Field tokens.
Adding location: LOCATION_SHAPE here aligns expectations with the lexer’s position tracking.
321-327: LGTM — explicit final RecordDelimiter at EOF.
Pushing a trailing RecordDelimiter when EOF is true correctly mirrors the lexer’s behavior post‑fix.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/Lexer.spec.ts (1)
25-79: Well-structured property test for EOL tokenization.
The test correctly validates that the Lexer emits proper tokens (Field, FieldDelimiter, RecordDelimiter) with location metadata for CSVs using either LF or CRLF. The test data constraints (minLength: 1) ensure non-empty inputs, and the expected token array logic properly accounts for fields, delimiters, and record boundaries.

Optional: Consider clarifying the test scope in the name.
The test name mentions "for various EOL", which could imply mixed line endings within a single CSV. However, each generated CSV uses a consistent EOL (const eol = g(FC.eol)), and the property test exercises different EOL types across multiple runs. If mixed EOLs are tested elsewhere (e.g., in Lexer.test.ts as noted in the PR summary), consider renaming to "for LF and CRLF delimiters" to avoid ambiguity.
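For readers unfamiliar with the spec's helpers, the property under discussion can be approximated with plain fast-check as below. This is a sketch, not the repo's test: the FC helpers (eol, row, field) are not reproduced, and the field values are simplified stand-ins.

```ts
import fc from "fast-check";
import { Lexer } from "./Lexer";
import { RecordDelimiter } from "./common/constants";

fc.assert(
  fc.property(
    // One consistent EOL per generated CSV, varied across runs.
    fc.constantFrom("\n", "\r\n"),
    fc.array(fc.constantFrom("alpha", "beta", "gamma"), { minLength: 1 }),
    (eol, fields) => {
      // One single-field, quoted record per line, with a trailing EOL.
      const csv = fields.map((f) => `"${f}"`).join(eol) + eol;
      const tokens = [...new Lexer({ delimiter: ",", quotation: '"' }).lex(csv)];
      // Post-fix: every EOL, including the trailing one, yields a RecordDelimiter.
      const delims = tokens.filter((t) => t.type === RecordDelimiter);
      return delims.length === fields.length && delims.every((t) => t.value === eol);
    },
  ),
);
```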
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/Lexer.spec.ts (3 hunks)
- src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/parse.spec.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
eol (219-221), row (228-237), field (141-150)
src/escapeField.ts (1)
escapeField (23-65)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
src/Lexer.ts (1)
Lexer (19-301)
🔇 Additional comments (2)
src/Lexer.spec.ts (2)
296-296: LGTM: Location metadata now consistent.
Adding location: LOCATION_SHAPE aligns this expectation with other tests and ensures location data is validated for Field tokens when fields are quoted or non-empty.
321-327: Correct handling of trailing EOL.
When EOF is true, the CSV ends with an EOL character, and the Lexer should emit a corresponding RecordDelimiter token. The data.length > 0 guard is defensive (the test constraint ensures minLength: 1) but harmless and makes the intent explicit.
@kanishka1804 Please check the CI failure; it seems like there are some formatting issues.
Hello @kamiazya,
Bundle Report
Changes will decrease total bundle size by 1.35kB (-0.17%) ⬇️. This is within the configured threshold ✅

Detailed changes for bundle web-csv-toolbox-CSV-esm (Assets Changed):
CodSpeed Performance Report
Merging #536 will not alter performance.
…ending compatibility: force-pushed from e45035d to 80e0b50
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/parse.spec.ts (1)
385-463: Line-ending compatibility suite is comprehensive and matches #515 acceptance criteria.
LF vs CRLF parity, mixed endings, quoted EOL preservation, and all four double‑EOL combinations are covered. Good job.
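At the parse level, the behavior this suite locks in can be summarized with a short sketch. It assumes the public parse API behaves exactly as these specs assert; the package name is taken from this repo, and toRows is a hypothetical helper mirroring the spec's toArray.

```ts
import { parse } from "web-csv-toolbox";

async function toRows(csv: string): Promise<Record<string, string>[]> {
  const rows: Record<string, string>[] = [];
  for await (const row of parse(csv)) rows.push(row as Record<string, string>);
  return rows;
}

// LF and CRLF inputs parse identically:
await toRows("name,age\nAlice,42\n");     // [{ name: "Alice", age: "42" }]
await toRows("name,age\r\nAlice,42\r\n"); // same result

// A double trailing EOL yields one extra empty record:
await toRows("name,age\r\nAlice,42\r\n\r\n");
// [{ name: "Alice", age: "42" }, { name: "", age: "" }]
```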
🧹 Nitpick comments (3)
src/LexerTransformer.spec.ts (1)
165-175: Cover single CRLF example as well for symmetry.
Add a matching example for "\r\n" so both LF and CRLF single‑EOL cases are locked in.
```diff
   {
     options: { delimiter: ",", quotation: '"' },
     chunks: ["\n"],
     expected: [
       {
         type: RecordDelimiter,
         value: "\n",
         location: LOCATION_SHAPE,
       },
     ],
   },
+  {
+    options: { delimiter: ",", quotation: '"' },
+    chunks: ["\r\n"],
+    expected: [
+      {
+        type: RecordDelimiter,
+        value: "\r\n",
+        location: LOCATION_SHAPE,
+      },
+    ],
+  },
```

src/Lexer.test.ts (1)
282-290: Nit: fix test title typo “RCLF” → “CRLF”.
Purely cosmetic; avoids confusion.
- test("should parse a field with escaped and delimiter and record delimiter and EOF(RCLF)", () => { + test("should parse a field with escaped and delimiter and record delimiter and EOF(CRLF)", () => {src/parse.spec.ts (1)
371-384: Type the helper’s return to improve checks.
Returning a structured type helps catch expectation mistakes in tests.
```diff
-async function toArray(csv: string | ReadableStream<string>): Promise<any[]>;
+async function toArray(csv: string | ReadableStream<string>): Promise<Record<string, string>[]>;
 async function toArray(
   csv: Uint8Array | ArrayBuffer | ReadableStream<Uint8Array> | Response,
-): Promise<any[]>;
+): Promise<Record<string, string>[]>;
 async function toArray(
   csv: string | Uint8Array | ArrayBuffer | ReadableStream<any> | Response,
-): Promise<any[]> {
+): Promise<Record<string, string>[]> {
   const records = [];
   for await (const row of parse(csv as any)) {
     records.push(row);
   }
   return records;
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/Lexer.spec.ts (3 hunks)
- src/Lexer.test.ts (3 hunks)
- src/Lexer.ts (2 hunks)
- src/LexerTransformer.spec.ts (1 hunks)
- src/parse.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Lexer.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
src/parse.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
PR: kamiazya/web-csv-toolbox#29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/LexerTransformer.spec.ts
🧬 Code graph analysis (4)
src/parse.spec.ts (1)
src/parse.ts (1)
parse(186-210)
src/Lexer.test.ts (1)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
src/Lexer.spec.ts (4)
src/__tests__/helper.ts (3)
eol (219-221), row (228-237), field (141-150)
src/escapeField.ts (1)
escapeField (23-65)
src/common/constants.ts (3)
Field (15-15), FieldDelimiter (5-5), RecordDelimiter (10-10)
src/Lexer.ts (1)
Lexer (19-302)
src/LexerTransformer.spec.ts (1)
src/common/constants.ts (1)
RecordDelimiter (10-10)
🔇 Additional comments (4)
src/Lexer.test.ts (2)
11-121: Mixed CRLF/LF tokenization test reads correctly.
Locations/offsets and delimiter values align with the Lexer behavior. Nice addition.
240-248: EOF(LF) RecordDelimiter assertion matches new no-trim behavior.
Expectation is correct post-fix.
src/Lexer.spec.ts (2)
24-77: Property test for EOL tokenization is solid.
Good coverage across options/data with quoted fields; expected tokens and locations look right.
294-325: EOF RecordDelimiter expectation is correct with the trailing‑EOL fix.
This guards against regressions in end‑of‑file handling.
@kanishka1804 Sorry, I haven't had a chance to look into it in detail either, but it seems highly likely that there's some problem with the automated tests. The GitHub Action job timed out. It's probable that the overall behavior has become inconsistent due to the bug fix. The original issue was just to add tests, so if you were only adding tests, I think it could be merged. Do you want to continue development with the current scope, which includes this bug fix? Of course, I would appreciate it if you continue as is. However, I suspect you became interested in this project through Hacktoberfest, so changing this to a smaller-scoped PR might be a good idea for you.
@kanishka1804 Hi, just a reminder, since it's the last day of Hacktoberfest.
Closes #515
Summary of Changes
Feature Implementation (Tests Added/Updated):
- Added a describe('Line Ending Compatibility...') block with four end-to-end tests covering LF vs CRLF parity, mixed line endings, quoted EOL preservation, and double-trailing EOLs.
- Added property-based tests using fast-check to verify the Lexer's behavior across random EOL combinations.

Bug Fixes (Required for Tests to Pass):
- Fixed the #tokens() method that was prematurely trimming the final EOL from the buffer. This fix ensures the Lexer properly produces a RecordDelimiter token for every EOL found, allowing the parser to correctly generate empty records for double-trailing EOLs (which was failing before); see the sketch after this list.
- Updated existing tests (src/Lexer.test.ts, src/Lexer.spec.ts) that were incorrectly expecting the final EOL to be trimmed.
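To illustrate why that trimming bug mattered downstream, here is a deliberately simplified, hypothetical record assembler (not the library's actual internals) consuming the token stream described above:

```ts
// Hypothetical sketch: assemble rows from lexer tokens. The real library's
// assembler differs; this only demonstrates the trailing-EOL semantics.
type Token = { type: "Field" | "FieldDelimiter" | "RecordDelimiter"; value: string };

function assemble(tokens: Token[], header: string[]): Record<string, string>[] {
  const rows: string[][] = [];
  let current: string[] = [];
  for (const t of tokens) {
    if (t.type === "Field") current.push(t.value);
    if (t.type === "RecordDelimiter") {
      rows.push(current); // every EOL closes a record...
      current = [];
    }
  }
  if (current.length > 0) rows.push(current); // ...so a lone trailing EOL adds nothing extra
  return rows.map((r) =>
    Object.fromEntries(header.map((h, i) => [h, r[i] ?? ""] as [string, string])),
  );
}

// Pre-fix, the lexer trimmed the final EOL, so "Alice,42\n\n" lost its second
// RecordDelimiter and the trailing empty record ({ name: "", age: "" }) vanished.
// Post-fix, both EOLs produce RecordDelimiter tokens and the empty record survives.
```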
Summary by CodeRabbit
- Bug Fixes
- Tests