perf: JavaScript parser performance improvements #614
Conversation
🦋 Changeset detected. Latest commit: 6926119. The changes in this PR will be included in the next version bump.
CodSpeed Performance Report: Merging #614 will improve performance by ×4.2.
Walkthrough

Unified token model (a Delimiter enum plus AnyToken) with optional per-token location tracking; expanded columnCountStrategy options with object-format restrictions enforced (only "fill" | "strict"); the lexer, assemblers, parsers, tests, docs, benchmarks, and a ReusableArrayPool utility were updated.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Source as CSV source
    participant Lexer as FlexibleStringCSVLexer
    participant Transformer as Lexer→Transformer
    participant Assembler as FlexibleCSV{Object|Array}RecordAssembler
    participant Consumer as Output
    Note over Source,Lexer: streaming CSV chunks
    Source->>Lexer: push chunk(s)
    Lexer->>Lexer: parse into Token<TrackLocation> {value, delimiter, delimiterLength[, location]}
    Lexer->>Transformer: emit Token(s)
    Transformer->>Assembler: forward AnyToken stream
    Assembler->>Assembler: buffer fields, apply columnCountStrategy (fill/sparse/keep/strict/truncate)
    Assembler->>Consumer: emit record (object or array)
```
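The unified token model in the diagram above can be sketched as follows. This is a hedged illustration: the names (`Delimiter`, `Token`, `location`) follow the walkthrough summary, but the exact shapes in `src/core/types.ts` may differ.

```typescript
// Numeric enum members compare faster than the Symbols used previously.
enum Delimiter {
  Field = 0,
  Record = 1,
}

interface TokenLocation {
  line: number;
  column: number;
  offset: number;
}

// One token per field: the trailing delimiter is folded into the token
// instead of being emitted as a separate FieldDelimiter/RecordDelimiter token.
interface Token<TrackLocation extends boolean = false> {
  value: string;
  delimiter: Delimiter;
  delimiterLength: number;
  location?: TrackLocation extends true ? TokenLocation : never;
}

// Under this model, "a,b\n" lexes to two tokens rather than four:
const tokens: Token[] = [
  { value: "a", delimiter: Delimiter.Field, delimiterLength: 1 },
  { value: "b", delimiter: Delimiter.Record, delimiterLength: 1 },
];
```

Folding the delimiter into the field token roughly halves the number of objects flowing through the transformer stream, which is where the reduced token count in the benchmarks comes from.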
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks: ✅ Passed (3 passed)
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##     main     #614   +/-   ##
=======================================
  Coverage   94.11%   94.11%
=======================================
  Files           1        1
  Lines         187      187
=======================================
  Hits          176      176
  Misses         11       11
```

Flags with carried forward coverage won't be shown.
Performance improvements:

- Token type constants: changed from Symbol to numeric constants for faster comparison
- Location tracking: now disabled by default (add `trackLocation: true` for token location info)
- Lazy position tracking: error messages include position info even when `trackLocation: false`
- Unified token structure: reduced token count by combining delimiter info into field tokens
- `Object.create(null)` for records: ~3.6x faster object creation
- Non-destructive buffer reading: reduced GC pressure by 46%
- Regex removal for unquoted fields: ~15% faster parsing
- String comparison optimization: ~10% faster
- Quoted field parsing optimization: eliminated overhead

Benchmark results:

- 1,000 rows: 3.57 ms → 1.77 ms (50% faster)
- 5,000 rows: 19.47 ms → 8.96 ms (54% faster)
- Throughput: 23.8 MB/s → 49.0 MB/s (2.06x improvement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
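Two of the optimizations above can be illustrated together: null-prototype record creation and "fill" semantics for missing trailing fields. This is a minimal sketch; the actual assembler internals are assumed, not copied from the PR.

```typescript
// Sketch of the Object.create(null) + fill-strategy technique.
function assembleRecord(
  header: readonly string[],
  fields: readonly string[],
): Record<string, string> {
  // Object.create(null) skips prototype-chain setup and avoids key
  // collisions with Object.prototype members such as "toString".
  const record: Record<string, string> = Object.create(null);
  for (let i = 0; i < header.length; i++) {
    record[header[i]] = fields[i] ?? ""; // missing field -> "" (fill)
  }
  return record;
}

const row = assembleRecord(["name", "age"], ["Alice"]);
// row.name === "Alice"; row.age === ""; Object.getPrototypeOf(row) === null
```

The null prototype is also why the test suite later switches from `toStrictEqual` to `toEqual`: the values match, but the prototype no longer matches a plain object literal's.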
Bundle Report: Changes will increase total bundle size by 103.85kB (9.08%) ⬆️
Affected assets, files, and routes (bundles with changed assets):

- example-vite-bundle-worker-main-esm
- example-webpack-bundle-worker-main-array-push
- example-vite-bundle-worker-slim-esm
- example-webpack-bundle-worker-slim-array-push
- web-csv-toolbox-CSV-esm
- example-vite-bundle-slim-esm
- example-vite-bundle-main-esm
Update createStringCSVLexerTransformer.spec.ts and createCSVRecordAssemblerTransformer.spec.ts to use the new unified token structure with Delimiter enum instead of separate Field/FieldDelimiter/RecordDelimiter token types. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Missing CSV fields now return empty string instead of undefined. Updated test expectations in:

- StringCSVParserStream.test.ts
- BinaryCSVParserStream.test.ts
- parseBinaryToIterableIterator.test.ts
- parse.spec.ts
Object.create(null) records don't match toStrictEqual due to prototype differences. Changed to toEqual to properly compare object values. Updated test files:

- parseResponse.browser.spec.ts
- parseBinaryStream.browser.spec.ts
- parseBinary.browser.spec.ts
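The prototype mismatch behind this commit can be demonstrated without any test framework. Vitest's `toStrictEqual` also compares prototypes, while `toEqual` compares enumerable values only:

```typescript
// A plain literal vs a null-prototype record with identical values.
const plain: { name: string } = { name: "Alice" };
const bare: Record<string, string> = Object.create(null);
bare.name = "Alice";

const sameValues = plain.name === bare.name; // true  -> toEqual passes
const sameProto =
  Object.getPrototypeOf(plain) === Object.getPrototypeOf(bare); // false -> toStrictEqual fails
```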
Stream tests were hanging in CI and locally when run in parallel. Removed .concurrent from describe and it blocks in:

- StringCSVParserStream.test.ts
- BinaryCSVParserStream.test.ts
Removed unnecessary aliases for describe and it. Directly importing from vitest is cleaner and more standard.
Applied biome formatting to:

- src/parser/stream/StringCSVParserStream.test.ts
- src/parser/stream/BinaryCSVParserStream.test.ts
- Implemented ReusableArrayPool to optimize memory usage for string segments in FlexibleStringCSVLexer.
- Updated FlexibleStringCSVLexer to utilize the new array pool for segment management, improving parsing performance.
- Adjusted token handling to ensure correct delimiter usage, replacing EOF with Record where appropriate.
- Modified tests in CSVRecordAssemblerTransformer and StringCSVLexerTransformer to reflect changes in token structure and delimiter handling.
- Cleaned up test cases for better readability and consistency.
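The pooling idea behind this commit can be sketched as below. This is a minimal illustration of the technique; the real API in `src/utils/memory/ReusableArrayPool.ts` may differ.

```typescript
// Reusing segment arrays avoids a fresh allocation per record and
// reduces GC pressure in a hot parsing loop.
class SegmentPool<T> {
  private pool: T[][] = [];

  acquire(): T[] {
    // Hand back a previously released array, or allocate a fresh one.
    return this.pool.pop() ?? [];
  }

  release(arr: T[]): void {
    arr.length = 0; // clear contents before returning to the pool
    this.pool.push(arr);
  }
}

const pool = new SegmentPool<string>();
const a = pool.acquire();
a.push("segment");
pool.release(a);
const b = pool.acquire(); // the same backing array, now empty
```

The trade-off is that callers must not hold a reference to an array after releasing it, since the pool will hand it out again.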
Removed the CSVRecordView feature and rolled back the excessive optimization. The object-format columnCountStrategy restriction (fill/strict only) is retained.

- Removed the FlexibleCSVRecordViewAssembler-related files
- Removed the CSVRecordView type and the 'record-view' outputFormat from types.ts
- Removed the record-view feature from FlexibleCSVObjectRecordAssembler
- Updated the related tests and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Merged the const type parameters feature from the main branch.
Combined it with the record-view removal changes, adopting:
- Improved type inference via const type parameters
- Introduction of the CSVOutputFormat type alias
- Accurate type inference via function overloads
- Better usability: `as const` is no longer required
Conflicts were resolved in line with main's design intent:
- Adopted the CSVOutputFormat type alias
- Changed the CSVRecord type's Strategy default value to keep
- Provided a default value for options (?? {})
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Applied const type parameters consistently across the project. Changes:

- FlexibleCSVArrayRecordAssembler
- FlexibleCSVObjectRecordAssembler
- FlexibleStringArrayCSVParser
- FlexibleStringObjectCSVParser
- FlexibleBinaryArrayCSVParser
- FlexibleBinaryObjectCSVParser

Changed `Header extends ReadonlyArray<string>` to `const Header extends ReadonlyArray<string>` in all of these classes. Users now get correctly inferred literal types without writing `as const`:

Before: header: ['name', 'age'] as const
After: header: ['name', 'age']

Tests: all 1341 passing

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
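The effect of the change above can be shown with a small standalone function. This is illustrative, not the library's actual API, and it requires TypeScript 5.0+ (where const type parameters were introduced):

```typescript
// With `const Header`, a literal tuple type is inferred for the argument
// without the caller writing `as const`.
function defineHeader<const Header extends ReadonlyArray<string>>(
  header: Header,
): Header {
  return header;
}

// Inferred as readonly ["name", "age"] rather than string[],
// so downstream record types get precise literal keys.
const header = defineHeader(["name", "age"]);
```

Runtime behavior is unchanged; the benefit is purely at the type level, which is why the commit touches only signatures.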
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parser/api/network/parseResponse.spec.ts (1)
7-7: Minor: Incorrect describe block name. The describe block says "parseRequest function" but this file tests parseResponse. This appears to be a copy-paste error.

```diff
-describe("parseRequest function", () => {
+describe("parseResponse function", () => {
```
🧹 Nitpick comments (13)
docs/reference/column-count-strategy-guide.md (1)
40-44: Optional: Consider varying sentence structure for improved readability. The static analysis tool noted that lines 40-44 use repetitive sentence starters ("Need..."). While the parallel structure works well for a decision guide, you might consider minor variation if desired, such as:

1. **Need strict schema enforcement?** Use `strict`.
2. **Want consistent string values?** Use `fill` (object default).
3. **Working with ragged rows / headerless CSV?** Use `keep` (array output).

However, the current parallel structure is perfectly acceptable and may even enhance scannability.
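The strategy behaviors the guide describes can be sketched in one helper. This function is illustrative only (it is not the library's implementation); the per-strategy behaviors follow the guide's descriptions:

```typescript
type Strategy = "fill" | "sparse" | "keep" | "strict" | "truncate";

// Apply a columnCountStrategy to a row, given the header width.
function applyStrategy(
  strategy: Strategy,
  row: readonly string[],
  width: number,
): ReadonlyArray<string | undefined> {
  const pad = Math.max(0, width - row.length);
  switch (strategy) {
    case "fill": // pad short rows with "" (object-format default)
      return [...row, ...Array<string>(pad).fill("")];
    case "sparse": // pad short rows with undefined
      return [...row, ...Array<undefined>(pad).fill(undefined)];
    case "keep": // leave ragged rows untouched
      return row;
    case "truncate": // drop extra trailing columns
      return row.slice(0, width);
    case "strict": // any mismatch is an error
      if (row.length !== width) throw new RangeError("column count mismatch");
      return row;
  }
}

applyStrategy("fill", ["x"], 3); // ["x", "", ""]
applyStrategy("truncate", ["a", "b", "c", "d"], 3); // ["a", "b", "c"]
```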
src/parser/stream/StringCSVLexerTransformer.ts (1)
135-137: Consider removing the dead comment. The commented line `// undefined, // writableStrategy` inside the `super()` call appears to be a remnant from debugging or a previous implementation. It doesn't provide documentation value and could confuse future maintainers.

```diff
   },
-  // undefined, // writableStrategy
   writableStrategy,
   readableStrategy,
 );
```

src/parser/models/FlexibleStringArrayCSVParser.ts (1)
34-38: Const generic for `Header` improves type inference; confirm TS version support. Using a `const Header extends ReadonlyArray<string>` generic here is a nice win for preserving literal header types in user code without changing runtime behavior. Just ensure your minimum supported TypeScript version includes const generics so consumers don't hit compiler errors. You may want to confirm in your toolchain/docs that the configured TypeScript version in this repo (and for library consumers, if applicable) supports `const` generics.

src/parser/models/FlexibleCSVRecordAssembler.spec.ts (1)
8-20: Consider extracting `LOCATION_SHAPE` to a shared test helper. This exact shape matcher is duplicated in `CSVRecordAssemblerTransformer.spec.ts`. Extracting it to `@/__tests__/helper.ts` would reduce duplication and ensure consistency across tests.

src/parser/stream/StringCSVLexerTransformer.spec.ts (3)
10-11: Debug configuration leftover: concurrent execution disabled. These lines assign `describe_` and `it_` directly without `.concurrent`, which differs from other test files in the PR (e.g., `createCSVRecordAssemblerTransformer.spec.ts` uses `describe_.concurrent`). This may have been intended for debugging but slows down test execution. If concurrent execution was intentional before, consider restoring it:

```diff
-const describe = describe_;
-const it = it_;
+const describe = describe_.concurrent;
+const it = it_.concurrent;
```
45-46: Debug configuration leftover: reduced test runs. The comment explicitly states this is for debugging. Property-based tests with only 10 runs provide weak coverage guarantees. Consider removing or increasing this for production.

```diff
-  { numRuns: 10 }, // Reduce runs to debug
+  // Use default numRuns for thorough property testing
```
74-75: Debug configuration leftover: reduced test runs. Same issue as lines 45-46: only 10 runs for property-based testing is insufficient for production.

```diff
-  { numRuns: 10 },
```

src/parser/models/FlexibleStringCSVLexer.spec.ts (1)
204-204: Unused variable `_isLastRow`. This variable is declared but never used. The underscore prefix suggests it was intentionally kept for future use or documentation, but if not needed, it should be removed to avoid confusion.

```diff
-      const _isLastRow = i === data.length - 1;
```

src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (1)
85-97: Consider removing commented-out code. The commented-out `examples` block appears to be debug/reference code. If it's not needed for test documentation, consider removing it to keep the test file clean.

src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
231-242: Dead code: `#assembleRecordTruncate` is pre-bound but never reachable. The constructor rejects the `"truncate"` strategy at lines 65-73, so the `case "truncate"` branch in `#setHeader` and the `#assembleRecordTruncate` method (lines 327-341) are unreachable dead code. Consider removing the dead code path or, if you intentionally keep it for future use or code symmetry with the array assembler, add a comment explaining why.

```diff
   // Optimization: Pre-bind strategy function based on columnCountStrategy
   switch (this.#columnCountStrategy) {
     case "strict":
       this.#assembleRecordFn = this.#assembleRecordStrict;
       break;
-    case "truncate":
-      this.#assembleRecordFn = this.#assembleRecordTruncate;
-      break;
     default:
       this.#assembleRecordFn = this.#assembleRecordFill;
       break;
   }
```

And remove lines 323-341 (the `#assembleRecordTruncate` method).
151-157: Minor: redundant `#rowLength` update. Line 151 updates `#rowLength` with `Math.max(this.#rowLength, this.#fieldIndex + 1)`, and line 156 does the same after incrementing `#fieldIndex`. The second update at line 156 is redundant since `#fieldIndex` was just incremented on line 154, making the new value equal to the already-computed value from line 151 plus 1.

```diff
   this.#rowLength = Math.max(this.#rowLength, this.#fieldIndex + 1);
   if (token.delimiter === Delimiter.Field) {
     this.#fieldIndex++;
     this.#checkFieldCount();
-    this.#rowLength = Math.max(this.#rowLength, this.#fieldIndex + 1);
     return;
   }
```

The `#rowLength` will be correctly set at line 151 before the field index increment, and when the record completes, line 160 recalculates it anyway.

src/parser/models/FlexibleStringCSVLexer.ts (1)
373-384: Variable shadowing of `nextCode`. Line 375 declares a new `const nextCode` that shadows the outer `let nextCode` (line 316). While this is correct behavior (the inner one is only used within this CRLF check block), consider renaming it to improve readability, e.g., `crlfNextCode`.

```diff
   // Record delimiter (CRLF)
   if (delimCode === CR) {
-    const nextCode = buf.charCodeAt(delimPos + 1);
-    if (Number.isNaN(nextCode)) {
+    const lfCheck = buf.charCodeAt(delimPos + 1);
+    if (Number.isNaN(lfCheck)) {
       if (!this.#flush) {
         return null;
       }
-    } else if (nextCode === LF_CODE) {
+    } else if (lfCheck === LF_CODE) {
       this.#bufferOffset = delimPos + 2;
       return { value, delimiter: Delimiter.Record, delimiterLength: 2 };
     }
   }
```

src/parser/models/FlexibleCSVArrayRecordAssembler.ts (1)
155-163: Clarify `#rowLength` tracking logic. Lines 155 and 162 both update `#rowLength`. Line 162 (after incrementing `#fieldIndex`) pre-allocates for the next expected field, but this might be over-counting since no value has been stored at that position yet. Consider consolidating this logic or adding a comment explaining why both updates are needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (60)
.changeset/column-count-strategy-rename.md (1 hunks)
.changeset/lexer-api-changes.md (1 hunks)
.changeset/performance-improvements.md (1 hunks)
benchmark/package.json (2 hunks)
config/vitest.setup.ts (1 hunks)
docs/reference/column-count-strategy-guide.md (1 hunks)
docs/reference/output-format-guide.md (1 hunks)
src/core/constants.ts (1 hunks)
src/core/types.test-d.ts (2 hunks)
src/core/types.ts (26 hunks)
src/parser/api/binary/parseBinary.browser.spec.ts (1 hunks)
src/parser/api/binary/parseBinary.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryStream.browser.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryStream.spec.ts (5 hunks)
src/parser/api/binary/parseBinaryStreamToStream.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryToIterableIterator.test.ts (1 hunks)
src/parser/api/file/parseBlob.spec.ts (5 hunks)
src/parser/api/file/parseFile.spec.ts (2 hunks)
src/parser/api/file/parseFileToArray.spec.ts (6 hunks)
src/parser/api/file/parseFileToStream.spec.ts (3 hunks)
src/parser/api/model/createCSVRecordAssembler.test.ts (1 hunks)
src/parser/api/model/createCSVRecordAssembler.ts (1 hunks)
src/parser/api/model/createStringCSVLexer.ts (2 hunks)
src/parser/api/network/parseRequest.spec.ts (3 hunks)
src/parser/api/network/parseResponse.browser.spec.ts (1 hunks)
src/parser/api/network/parseResponse.spec.ts (1 hunks)
src/parser/api/network/parseResponseToStream.spec.ts (1 hunks)
src/parser/api/parse.spec.ts (1 hunks)
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts (5 hunks)
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (5 hunks)
src/parser/api/string/parseString.spec.ts (1 hunks)
src/parser/api/string/parseStringStream.spec.ts (1 hunks)
src/parser/api/string/parseStringStreamToStream.spec.ts (1 hunks)
src/parser/api/string/parseStringToArraySync.spec.ts (1 hunks)
src/parser/api/string/parseStringToIterableIterator.spec.ts (1 hunks)
src/parser/api/string/parseStringToStream.spec.ts (1 hunks)
src/parser/models/FlexibleBinaryArrayCSVParser.ts (1 hunks)
src/parser/models/FlexibleBinaryCSVParser.test.ts (4 hunks)
src/parser/models/FlexibleBinaryObjectCSVParser.ts (1 hunks)
src/parser/models/FlexibleCSVArrayRecordAssembler.ts (5 hunks)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (4 hunks)
src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (7 hunks)
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (10 hunks)
src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (11 hunks)
src/parser/models/FlexibleCSVRecordAssembler.spec.ts (3 hunks)
src/parser/models/FlexibleCSVRecordAssembler.test.ts (4 hunks)
src/parser/models/FlexibleStringArrayCSVParser.ts (1 hunks)
src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts (2 hunks)
src/parser/models/FlexibleStringCSVLexer.spec.ts (7 hunks)
src/parser/models/FlexibleStringCSVLexer.test.ts (12 hunks)
src/parser/models/FlexibleStringCSVLexer.ts (6 hunks)
src/parser/models/FlexibleStringCSVParser.test.ts (4 hunks)
src/parser/models/FlexibleStringObjectCSVParser.ts (1 hunks)
src/parser/stream/BinaryCSVParserStream.test.ts (3 hunks)
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts (4 hunks)
src/parser/stream/StringCSVLexerTransformer.spec.ts (6 hunks)
src/parser/stream/StringCSVLexerTransformer.test.ts (1 hunks)
src/parser/stream/StringCSVLexerTransformer.ts (1 hunks)
src/parser/stream/StringCSVParserStream.test.ts (2 hunks)
src/utils/memory/ReusableArrayPool.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
src/parser/api/binary/parseBinary.browser.spec.ts
src/parser/api/network/parseResponseToStream.spec.ts
.changeset/performance-improvements.md
src/parser/api/string/parseString.spec.ts
src/parser/api/file/parseBlob.spec.ts
src/parser/stream/StringCSVLexerTransformer.ts
src/parser/api/file/parseFileToArray.spec.ts
src/parser/models/FlexibleCSVRecordAssembler.test.ts
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts
src/parser/stream/StringCSVParserStream.test.ts
src/parser/api/binary/parseBinaryStream.spec.ts
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts
src/parser/api/model/createCSVRecordAssembler.ts
src/parser/models/FlexibleCSVObjectRecordAssembler.ts
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts
src/core/constants.ts
src/parser/stream/BinaryCSVParserStream.test.ts
src/parser/stream/StringCSVLexerTransformer.spec.ts
src/parser/models/FlexibleStringArrayCSVParser.ts
src/parser/api/string/parseStringToStream.spec.ts
src/parser/models/FlexibleCSVRecordAssembler.spec.ts
src/parser/models/FlexibleStringCSVLexer.spec.ts
src/parser/api/model/createCSVRecordAssembler.test.ts
src/parser/api/file/parseFileToStream.spec.ts
src/parser/api/file/parseFile.spec.ts
src/parser/models/FlexibleStringObjectCSVParser.ts
src/parser/models/FlexibleStringCSVLexer.test.ts
src/parser/api/binary/parseBinaryToIterableIterator.test.ts
src/parser/api/binary/parseBinary.spec.ts
src/parser/models/FlexibleBinaryObjectCSVParser.ts
src/core/types.test-d.ts
src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts
src/parser/api/binary/parseBinaryStream.browser.spec.ts
src/parser/models/FlexibleStringCSVParser.test.ts
src/parser/models/FlexibleCSVArrayRecordAssembler.ts
src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts
src/parser/api/parse.spec.ts
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts
src/core/types.ts
src/parser/models/FlexibleBinaryCSVParser.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/parser/stream/StringCSVLexerTransformer.test.ts
src/parser/stream/StringCSVLexerTransformer.ts
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts
src/parser/stream/StringCSVLexerTransformer.spec.ts
src/parser/models/FlexibleStringCSVLexer.spec.ts
src/parser/models/FlexibleStringCSVLexer.test.ts
.changeset/lexer-api-changes.md
src/parser/models/FlexibleStringCSVLexer.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 53
File: .github/workflows/ci.yaml:23-39
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user has clarified that in the `package.json` of the `web-csv-toolbox` project, the `test` script is defined as `vitest`, which means that the command `npm test run` correctly expands to `vitest run`.
Applied to files:
src/parser/models/FlexibleCSVRecordAssembler.test.ts
src/parser/stream/StringCSVParserStream.test.ts
src/parser/stream/BinaryCSVParserStream.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/internal/parseStringStreamToStream.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The typo in the class name `RecordAssemblerTransformar` was fixed by the user `kamiazya` in commit `f065e07`.
Applied to files:
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts
🧬 Code graph analysis (17)
src/parser/api/file/parseBlob.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/parser/models/FlexibleCSVRecordAssembler.test.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/api/model/createStringCSVLexer.ts (2)
src/core/types.ts (2)
StringCSVLexerOptions (567-574), StringCSVLexer (2082-2093)
src/parser/models/FlexibleStringCSVLexer.ts (1)
FlexibleStringCSVLexer(38-1196)
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
src/core/types.ts (3)
ColumnCountStrategy (1579-1584), CSVObjectRecord (1672-1677), AnyToken (165-165)
src/utils/memory/ReusableArrayPool.ts (1)
ReusableArrayPool(6-24)
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (2)
src/parser/api/stream/createStringCSVLexerTransformer.ts (1)
createStringCSVLexerTransformer (75-91)
src/__tests__/helper.ts (3)
autoChunk (294-309), delimiter (177-192), eol (242-244)
src/parser/api/network/parseResponse.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/parser/stream/StringCSVLexerTransformer.spec.ts (2)
src/__tests__/helper.ts (1)
autoChunk (294-309)
src/core/types.ts (1)
TokenNoLocation(114-114)
src/parser/models/FlexibleCSVRecordAssembler.spec.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/api/model/createCSVRecordAssembler.test.ts (3)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (1)
value (129-131)
src/core/types.ts (1)
AnyToken (165-165)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/core/types.test-d.ts (1)
src/core/types.ts (1)
CSVRecordAssemblerOptions(669-782)
src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (1)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/parser/models/FlexibleStringCSVParser.test.ts (2)
src/parser/models/FlexibleStringArrayCSVParser.ts (1)
FlexibleStringArrayCSVParser (34-46)
src/parser/models/FlexibleStringObjectCSVParser.ts (1)
FlexibleStringObjectCSVParser(34-46)
src/parser/models/FlexibleCSVArrayRecordAssembler.ts (1)
src/core/types.ts (3)
ColumnCountStrategy (1579-1584), CSVArrayRecord (1731-1738), AnyToken (165-165)
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (4)
src/core/types.ts (1)
AnyToken (165-165)
src/parser/models/FlexibleCSVArrayRecordAssembler.ts (1)
RangeError (192-200)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (1)
RangeError (185-193)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/parser/models/FlexibleBinaryCSVParser.test.ts (2)
src/parser/models/FlexibleBinaryArrayCSVParser.ts (1)
FlexibleBinaryArrayCSVParser (38-50)
src/parser/models/FlexibleBinaryObjectCSVParser.ts (1)
FlexibleBinaryObjectCSVParser(38-50)
🪛 LanguageTool
docs/reference/column-count-strategy-guide.md
[style] ~42-~42: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...es?** Use fill (object default). 3. Need ragged rows / headerless CSV? Use `ke...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... CSV?** Use keep (array output). 4. Need to ignore trailing columns? Use `trun...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ncate(array output with header). 5. **Need optional columns?** Usesparse` (array...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (111)
src/parser/api/parse.spec.ts (1)
63-69: Test correctly validates the new `fill` strategy default behavior. The updated test case accurately reflects that empty lines now produce records with all fields filled as empty strings, consistent with the new default `fill` strategy for missing fields.

src/parser/stream/StringCSVLexerTransformer.test.ts (1)
48-48: LGTM! Enhanced error message with location details. The updated error message snapshot now includes precise position information ("at line 1, column 1."), which improves the debugging experience. This aligns with the PR's location tracking enhancements, where position data is computed lazily when errors occur.
src/parser/api/binary/parseBinaryStream.browser.spec.ts (1)
60-60: LGTM! Appropriate assertion for parsed data. The switch from `toStrictEqual` to `toEqual` is suitable for comparing parsed CSV records. Deep equality checking is sufficient here, and this change makes the tests less brittle while maintaining correctness validation.

docs/reference/output-format-guide.md (1)
1-47: Excellent documentation! Clear and comprehensive guide. This new documentation provides valuable guidance on choosing between object and array output formats. The comparison table, detailed explanations, and code examples make it easy for users to understand the trade-offs and select the appropriate format for their use case.
src/parser/api/binary/parseBinaryStream.spec.ts (2)
56-56: LGTM! Consistent test assertion improvements. The switch from `toStrictEqual` to `toEqual` across these test cases is appropriate for validating parsed CSV data. This pattern is consistent with similar changes throughout the test suite.

Also applies to: 72-72, 86-86, 151-151
171-171: LGTM! Enhanced error message validation. The updated error snapshot now includes precise position information ("at line 2, column 1."), improving error message quality and the debugging experience.
docs/reference/column-count-strategy-guide.md (1)
1-46: Excellent comprehensive guide for column count strategies. This documentation clearly explains the various column count handling strategies and their compatibility with different output formats. The compatibility matrix and detailed strategy descriptions make it easy for users to choose the right approach.
.changeset/lexer-api-changes.md (1)
1-18: LGTM! Clear and well-structured changelog. This changelog effectively communicates the breaking changes to the low-level Lexer API. It clearly identifies who is affected (direct users of `FlexibleStringCSVLexer`) and what they need to update, while reassuring most users that high-level APIs remain unchanged. The note about error messages still including position information is particularly helpful.

benchmark/package.json (2)
7-8: LGTM! Modernized script execution. The switch from `tsx` to `node --import tsx` uses Node's native loader API, which is the recommended modern approach for TypeScript execution with better control and performance characteristics.
9-13: LGTM! Enhanced benchmark and profiling capabilities. The new scripts provide valuable tools for performance analysis:

- `quick` and `unified` for focused benchmarking
- CPU profiling with `profile:cpu`
- Memory profiling with flexible GC options

These additions align well with the PR's performance optimization objectives.
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (3)
2-3: LGTM! Updated imports for unified token model. The imports correctly reflect the new token model, replacing the old `Token` type with `AnyToken` and introducing the `Delimiter` enum from constants. This aligns with the PR's migration to a unified token representation.
14-39: LGTM! Token construction updated to unified format. The test now uses the unified token format with `delimiter` and `delimiterLength` fields instead of separate token type objects. The migration maintains test logic while adopting the more efficient token representation.
42-90: LGTM! Comprehensive migration to unified token format. All test cases have been consistently updated to use the new token structure with `Delimiter` enum values. The token construction pattern is applied uniformly throughout:

- `Delimiter.Field` for field boundaries
- `Delimiter.Record` for record boundaries
- `delimiterLength` property added

The test logic and assertions remain intact, validating that the field count limit protection works correctly with the new token model.
Also applies to: 92-164, 184-242, 244-370
src/parser/models/FlexibleStringCSVParser.test.ts (4)
98-112: LGTM! Test correctly validates the sparse strategy for array format. The test properly validates that with `columnCountStrategy: "sparse"`, missing fields are filled with `undefined` rather than empty strings. This aligns with the PR's breaking change where `pad` was renamed to `sparse`.
114-132: LGTM! Test correctly distinguishes empty vs missing fields. Good test coverage for edge-case behavior: an empty field (`"Bob,"`) yields `""` while a missing field (`"Charlie"`) yields `undefined` with the sparse strategy.
167-177: LGTM! Test validates default fill strategy for object format. The test correctly verifies that in object format with the default `fill` strategy, both empty and missing fields are filled with `""`. The comments clearly document the distinction.
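The empty-vs-missing distinction these tests rely on can be illustrated with a small sketch. This is not the library's code; `normalizeRow` is a hypothetical helper showing the padding semantics described above:

```typescript
// Hypothetical helper illustrating "fill" vs "sparse" padding of short rows.
function normalizeRow(
  fields: string[],
  headerLength: number,
  strategy: "fill" | "sparse",
): (string | undefined)[] {
  const out: (string | undefined)[] = fields.slice(0, headerLength);
  while (out.length < headerLength) {
    // Pad short rows: "" under fill, undefined under sparse.
    out.push(strategy === "fill" ? "" : undefined);
  }
  return out;
}

// "Bob," lexes to ["Bob", ""]: the empty field survives under both strategies.
// "Charlie" lexes to ["Charlie"]: the missing field depends on the strategy.
```

The key point is that an empty field present in the CSV stays `""` regardless of strategy; only fields absent from a short row are affected.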
244-253: LGTM! Test validates runtime rejection of truncate strategy for object format. The test correctly verifies that using `columnCountStrategy: "truncate"` with object format throws an error at construction time. The error message pattern `/columnCountStrategy 'truncate' is not allowed for object format/` is consistent with the implementation in `createCSVRecordAssembler()`, which validates the strategy during `BaseStringCSVParser` instantiation.

src/parser/api/network/parseResponse.spec.ts (1)
90-90: LGTM! Assertion relaxed from strict to deep equality. The change from `toStrictEqual` to `toEqual` is consistent with the broader test suite updates in this PR. Deep equality is sufficient for comparing parsed CSV row objects.

src/parser/api/binary/parseBinary.spec.ts (1)
54-59: LGTM! Consistent assertion update. The change from `toStrictEqual` to `toEqual` aligns with the broader test suite updates in this PR. The property-based test structure remains sound.

src/parser/api/binary/parseBinary.browser.spec.ts (1)
63-68: LGTM! Consistent assertion update across execution strategies. The change from `toStrictEqual` to `toEqual` is consistent with the broader test suite updates. The test correctly handles multiple execution strategies.

src/parser/api/string/parseStringStreamToStream.spec.ts (1)
47-49: LGTM! Enhanced error message with location details. The updated error message now includes line and column information (`at line 2, column 1`), which improves debugging experience. The location is correct: the unclosed quote appears at line 2, column 1 in the input `'a\n"'`.

src/parser/api/network/parseResponseToStream.spec.ts (1)
97-101: Relaxing to `toEqual` is appropriate here. Using `toEqual` for comparing parsed rows to the expected objects is sufficient and aligns with the broader test-suite shift away from `toStrictEqual`. The property-based test logic and indexing remain sound.

src/parser/api/string/parseStringToArraySync.spec.ts (1)
18-22: Snapshot update matches location-aware ParseError. Including `at line 2, column 1.` in the snapshot reflects the new location-aware error reporting and is consistent with the invalid CSV input used in this test.

src/parser/api/network/parseResponse.browser.spec.ts (1)
71-75: Use of `toEqual` for row comparison looks good. Switching to `toEqual` for comparing `data[i++]` with each parsed `row` is consistent with other tests in this PR and robust against minor structural/prototype differences in the returned records.

src/parser/models/FlexibleBinaryArrayCSVParser.ts (1)
38-46: Const generic for `Header` improves typing; confirm TS version support. Using a `const` type parameter for `Header` is a nice upgrade that preserves literal header tuples (e.g., `['name', 'age'] as const`) through the parser types and is consistent with the documented usage example. Please double-check that:

- Your configured TypeScript version (and published `types` for consumers) supports `const` type parameters.
- The minimum supported TS version / peer dependency range is updated if needed to reflect this requirement.
src/parser/api/file/parseFile.spec.ts (1)
13-16: Consistent use of `toEqual` for parsed records. Updating both the streaming loop and `toArray` assertion to use `toEqual` keeps the tests aligned with the rest of the suite while still ensuring deep equality against the expected objects.

Also applies to: 27-29
src/parser/api/string/parseStringToStream.spec.ts (1)
24-35: Error snapshot now correctly asserts location information. The updated inline snapshot for the rejected Promise includes `at line 2, column 1.`, matching the new location-aware `ParseError` behavior for invalid CSV streams.

src/parser/api/string/parseString.spec.ts (1)
46-53: Location-aware ParseError snapshot is aligned with new diagnostics. The snapshot now expecting `Unexpected EOF while parsing quoted field at line 2, column 1.` matches the updated error-reporting behavior and keeps this test consistent with the other string parsing specs.

src/parser/api/binary/parseBinaryStreamToStream.spec.ts (1)
43-45: LGTM! The updated error snapshot correctly reflects the enhanced location-aware error reporting. The location "line 2, column 1" accurately describes where the malformed quoted field begins in the input `'a\n"'`.

src/parser/api/network/parseRequest.spec.ts (1)
94-94: LGTM! Switching from `toStrictEqual` to `toEqual` is appropriate here. The test expectations are simple POJOs with string values, and `toEqual` provides sufficient deep equality checking for this use case.

src/parser/api/binary/parseBinaryToIterableIterator.test.ts (1)
143-146: LGTM! The updated expectation correctly reflects the new default `columnCountStrategy: "fill"` behavior where missing fields are filled with empty strings (`""`) instead of `undefined`.

src/parser/api/string/parseStringStream.spec.ts (1)
67-69: LGTM! The updated error snapshot correctly includes location information, consistent with the enhanced error reporting across the parser suite.
src/parser/api/string/parseStringToIterableIterator.spec.ts (1)
25-27: LGTM! The error snapshot update is consistent with the location-aware error reporting changes across the test suite.
src/parser/stream/BinaryCSVParserStream.test.ts (3)
1-1: LGTM! Clean import simplification - directly using vitest exports instead of alias variables improves readability.
65-68: LGTM! The updated expectation correctly reflects the new default behavior where missing fields are filled with empty strings. The comment accurately describes this behavior.
83-86: LGTM! Consistent with the other missing field test case - expectation correctly updated to reflect the `"fill"` strategy default.

src/parser/api/file/parseFileToStream.spec.ts (1)
19-22: LGTM! The switch to `toEqual` is consistent with the test suite standardization and appropriate for comparing plain record objects.

src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts (1)
2-2: Delimiter-based assertion correctly targets the new token model. Using `Delimiter.Record` and checking `tokens[0]?.delimiter` aligns the test with the unified token shape and continues to verify the intended behavior for the long, valid quoted field. Looks good.

Also applies to: 162-164
src/parser/api/file/parseBlob.spec.ts (1)
57-58: Relaxing assertions to `toEqual` is appropriate for pooled/object-create records. Switching these checks to `toEqual` keeps the tests focused on record shape and values, which is compatible with records created via `Object.create(null)` and other internal optimizations, while avoiding brittle prototype comparisons. If you want extra confidence, please confirm in the test runner docs that `toEqual` performs deep value comparison independent of object prototype differences.

Also applies to: 73-74, 87-88, 101-102, 113-115
src/parser/stream/StringCSVParserStream.test.ts (1)
1-1: Updated missing-field expectation matches new `fill` strategy semantics. Flushing `"Bob"` with a `["name", "age"]` header and default settings should yield `{ name: "Bob", age: "" }`, so asserting an empty string for the missing field correctly reflects the default `fill` behavior.

Also applies to: 65-67
src/parser/api/file/parseFileToArray.spec.ts (1)
11-15: Use of `toEqual` keeps tests resilient to internal record representation. Relaxing these expectations to `toEqual` makes the tests robust against internal changes such as using `Object.create(null)` or other pooling strategies, while still validating the full record contents and ordering. If needed, double-check in your test framework docs that `toEqual` performs deep structural comparison and doesn't depend on object prototypes.

Also applies to: 22-22, 31-31, 42-45, 91-93, 101-105
src/parser/api/model/createCSVRecordAssembler.test.ts (1)
1-85: New tests thoroughly cover assembler format/strategy validation. This suite exercises the happy paths and all the important misconfiguration cases (headerless vs format, includeHeader, and disallowed strategies on object output), and the `makeToken` helper matches the new token shape. The coverage looks solid.

src/parser/api/model/createCSVRecordAssembler.ts (1)
132-148: Runtime guard for invalid object `columnCountStrategy` combinations is well-placed. The new checks cleanly prevent using `"sparse"`, `"keep"`, or `"truncate"` with object output and provide clear migration guidance, while still allowing `"fill"`/`"strict"`. Placing this after headerless/includeHeader validation but before assembler construction is appropriate.

src/parser/models/FlexibleBinaryObjectCSVParser.ts (1)
38-42: LGTM! The `const` generic enhances type inference for header literals. This enables better type preservation when users provide headers with `as const`, allowing the compiler to infer exact literal tuple types rather than widening to `readonly string[]`.

src/parser/models/FlexibleStringObjectCSVParser.ts (1)
34-38: LGTM! Consistent type enhancement across parser implementations. The `const` generic parameter aligns with the binary parser variant, maintaining API consistency.

src/parser/models/FlexibleCSVRecordAssembler.test.ts (2)
2-5: LGTM! Import updates align with the new unified token model. The imports correctly reference `Delimiter` from constants and `AnyToken` from types.
54-65: LGTM! Token construction correctly uses the unified format. The test tokens properly include all required fields: `value`, `delimiter`, `delimiterLength`, and `location` with the expected structure.

src/core/constants.ts (1)
110-117: LGTM! Numeric enum is a good choice for hot-path performance. Using numeric constants (`0`, `1`) instead of `Symbol` enables faster comparisons in the lexer/assembler paths. This aligns with the PR's performance improvement goals. The commented-out `EOF = 2` is a reasonable placeholder for future extension without disrupting the current enum values.

src/parser/stream/CSVRecordAssemblerTransformer.spec.ts (2)
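The performance rationale can be illustrated with a minimal sketch of the pattern (hypothetical names, not the library's code): numeric constants turn hot-loop token checks into plain integer comparisons rather than Symbol identity tests.

```typescript
// Sketch of the numeric-constant pattern used for delimiter kinds.
const Delimiter = { Field: 0, Record: 1 } as const;

function countRecords(delimiterKinds: number[]): number {
  let records = 0;
  for (const kind of delimiterKinds) {
    // Plain integer equality in the hot loop; no Symbol lookup needed.
    if (kind === Delimiter.Record) records++;
  }
  return records;
}
```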
4-5: LGTM! Import updates align with the unified token model.
54-73: LGTM! Token generation correctly implements the unified format. The logic properly assigns `Delimiter.Record` to the last field in each row and `Delimiter.Field` to intermediate fields, matching the expected token semantics.

src/utils/memory/ReusableArrayPool.ts (1)
6-24: LGTM! Clean and efficient object pooling implementation. The design is well-suited for hot paths:

- `pop()` is O(1) for retrieval
- Factory pattern avoids unnecessary pre-allocation
- `maxSize` prevents unbounded memory growth
- The optional `reset` callback provides flexibility for cleanup

The behavior where a throwing `reset` callback prevents the value from being pooled is actually desirable—a failed reset likely means the object is in an inconsistent state.
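A minimal sketch of this pooling pattern, assuming a `take`/`release`-style API (names, signatures, and the class body here are illustrative, not the library's exact implementation):

```typescript
// Hypothetical sketch of a LIFO array pool with a bounded size and an
// optional reset hook; a throwing reset skips pooling, as discussed above.
class ReusableArrayPool<T> {
  #pool: T[][] = [];
  constructor(
    private factory: () => T[],
    private maxSize: number = Infinity,
    private reset?: (arr: T[]) => void,
  ) {}

  take(): T[] {
    // O(1) LIFO retrieval; falls back to the factory when the pool is empty.
    return this.#pool.pop() ?? this.factory();
  }

  release(arr: T[]): void {
    if (this.#pool.length >= this.maxSize) return; // bound memory growth
    this.reset?.(arr); // if this throws, the array never reaches the pool
    this.#pool.push(arr);
  }
}
```

Reuse avoids per-record array allocation in the hot path, trading a small amount of retained memory for reduced GC pressure.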
4-5: LGTM! Import updates consistent with other test files.

src/parser/stream/StringCSVLexerTransformer.spec.ts (2)
27-35: LGTM! Good edge-case handling for empty CSV strings, and the unified token format with `Delimiter.Record`/`Delimiter.Field` aligns correctly with the new token model.
100-124: LGTM! The nested loop correctly generates unified tokens with proper delimiter assignments for each field position. The removal of `.flat()` on Line 123 is consistent since `transform` now returns a flat structure matching the expected tokens array directly.

src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts (2)
36-42: LGTM! The unified token format correctly uses `Delimiter.Field` and `Delimiter.Record` to distinguish between field and record boundaries, with `delimiterLength` indicating the delimiter's character length.
110-129: Test correctly includes location data for property-based testing—no changes needed. The test file already covers both Token (with location) and Token (without location) cases. Earlier test cases at lines 37, 53, and 72 create tokens without the location field, verifying that the transformer handles the location-disabled code path. The property-based test at lines 110-129 intentionally includes location data to test the with-location variant. Since the transformer accepts `AnyToken` (which is `Token<true> | Token<false>`), both test cases are valid and provide appropriate coverage.

src/parser/models/FlexibleStringCSVLexer.spec.ts (3)
8-9: LGTM! Good practice to document that `trackLocation` is disabled for spec tests, with unit tests covering location tracking separately.
18-32: LGTM! Correct handling of the ambiguous case where a single empty unquoted field cannot be distinguished from an empty CSV string. The logic properly clears expectations when this edge case is detected.
206-214: Verify token generation logic for empty fields. The condition `if (quote || field !== "")` only adds tokens when the field is non-empty OR when quoting is enabled. This correctly handles the case where empty unquoted fields are ambiguous, but verify this matches the lexer's actual behavior for all edge cases (e.g., empty fields in multi-field rows).

src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (5)
140-198: LGTM! Comprehensive test coverage for the new `fill` strategy (default for array format). Tests correctly verify that short rows are padded with empty strings `""` and long rows are truncated.
200-258: LGTM! Good coverage for the `sparse` strategy, correctly verifying that short rows are filled with `undefined` values rather than empty strings.
366-387: LGTM! Good test for variable-length CSV handling with the `keep` strategy. The comment at Line 382 correctly notes that the first row becomes the header, so only 2 data records remain.
470-492: LGTM! Proper validation tests ensuring headerless mode (`header: []`) only supports `columnCountStrategy: 'keep'` and rejects `fill`, `sparse`, `strict`, and `truncate` with clear error messages.
356-363: The test is correctly aligned with the implementation. The error message in the test matches the actual implementation in `FlexibleCSVArrayRecordAssembler.ts` (lines 79-80). When `outputFormat: "array"` with `columnCountStrategy: "sparse"` and no `header` option, the array assembler throws the error `"columnCountStrategy '${this.#columnCountStrategy}' requires header option..."`. The factory function's validation at lines 133-141 in `createCSVRecordAssembler.ts` handles object format restrictions and does not apply to array format, so there is no ordering concern here.

src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (5)
7-29: LGTM! Correctly updated tests for the `fill` strategy as the default for object format. Missing fields are now filled with empty strings `""` instead of `undefined`, which is more compatible with object format semantics.
71-83: LGTM! Proper validation that the `sparse` strategy is not allowed for object format, with a clear error message explaining why (sparse fills with `undefined`, which is incompatible with object format).
160-186: LGTM! Good validation tests ensuring `truncate` and `keep` strategies are rejected for object format with appropriate error messages. This aligns with the factory function validation in `createCSVRecordAssembler.ts` (Lines 133-142).
207-223: LGTM! Test correctly verifies that missing fields in short rows use empty strings `""` with the `fill` strategy, distinguishing between "empty fields in CSV" vs "missing fields due to short rows" (both result in `""` for fill strategy).
241-263: LGTM! Proper validation that headerless mode (`header: []`) is rejected for object format regardless of `columnCountStrategy`, with the headerless mode check taking precedence over strategy validation.

src/core/types.test-d.ts (3)
267-295: LGTM! The array format test correctly validates that all columnCountStrategy options (`"fill" | "sparse" | "keep" | "strict" | "truncate" | undefined`) are allowed. The test cases cover `"keep"`, `"sparse"`, and `"truncate"` strategies with proper type assertions.
297-321: LGTM! The object format restriction tests correctly validate that only `"fill" | "strict" | undefined` are allowed for object output. The `@ts-expect-error` annotations properly flag `"keep"` and `"truncate"` as invalid for object format.

Note: Based on the PR objectives and the relevant snippet from `src/core/types.ts`, `"sparse"` should also be invalid for object format. Consider adding a test case with `@ts-expect-error` for `"sparse"` to ensure complete coverage of the documented constraint.
323-336: LGTM! The output format type assertions correctly verify that both `"array"` and `"object"` literal types are properly inferred.

src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (3)
21-26: LGTM! The token expectations correctly use the new unified format with `Delimiter.Field` and `Delimiter.Record` enum values, along with `delimiterLength` indicating the actual delimiter character count (1 for comma, 2 for CRLF).
123-146: LGTM! The delimiter length calculation logic correctly handles:

- Last field of last row without EOF: `delimiterLength: 0`
- Last field of other rows or with EOF: `delimiterLength: eol.length`
- Non-last fields: `delimiterLength: options.delimiter.length`

The unified token format properly captures both the delimiter type and its length.
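The three rules above can be condensed into a small sketch. This is a hypothetical helper with names of my choosing (`delimiterLengthFor`, `hasTrailingEol`), not the test suite's actual code:

```typescript
// Hypothetical helper condensing the delimiterLength rules described above.
function delimiterLengthFor(
  isLastField: boolean,
  isLastRow: boolean,
  hasTrailingEol: boolean,
  eol: string,
  delimiter: string = ",",
): number {
  if (!isLastField) return delimiter.length; // field boundary
  if (isLastRow && !hasTrailingEol) return 0; // nothing consumed at EOF
  return eol.length; // record boundary (1 for "\n", 2 for "\r\n")
}
```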
61-76: Edge case handling is correct and well-tested. The `ambiguousSingleEmpty` check properly addresses the CSV parsing ambiguity where a single empty unquoted field (`""`) cannot be distinguished from no input. The subsequent `expected.pop()` correctly adjusts test expectations to match the lexer's behavior of emitting no field tokens for this case. This pattern is consistently applied across the test suite and aligns with the lexer's documented behavior.

src/parser/models/FlexibleStringCSVLexer.test.ts (4)
6-9: LGTM! The lexer type and initialization correctly specify `trackLocation: true` as a generic parameter and option, enabling location tracking for all tests in this suite.
44-67: LGTM! The test correctly verifies escaped field with delimiter handling. The location offsets account for the quoted field (`"field"` = 7 chars) plus comma delimiter, with the second field `"next"` starting at column 9 (offset 8).
96-120: LGTM! The test correctly verifies that trailing newlines do not produce extra empty records. The `delimiterLength: 1` on the last field indicates the trailing LF was consumed as the record delimiter.
291-306: LGTM! The test correctly documents that `trackLocation: true` is required to get `rowNumber` in ParseError objects. This aligns with the PR's breaking change where location tracking is disabled by default for performance.

src/parser/models/FlexibleBinaryCSVParser.test.ts (4)
219-235: LGTM! The test correctly verifies the `"sparse"` strategy for array format: missing fields are preserved as `undefined` rather than being filled with empty strings.
267-272: LGTM! The test correctly documents the new default behavior: missing fields in object format are filled with `""` (empty string) when using the default `"fill"` strategy.
399-413: LGTM! The test explicitly verifies the `"fill"` strategy behavior for object format, confirming missing fields are filled with empty strings.
427-437: LGTM! The test correctly validates that the `"truncate"` strategy is rejected at construction time for object format output, with an appropriate error message pattern.

src/parser/models/FlexibleCSVObjectRecordAssembler.ts (6)
52-73: LGTM! The constructor correctly validates that `"sparse"`, `"keep"`, and `"truncate"` strategies are rejected for object format with clear, actionable error messages.
137-179: LGTM on the unified token processing flow. The token processing correctly:

- Tracks row number from location when available
- Stores field value and tracks content
- Advances field index on `Delimiter.Field`
- Completes record assembly on `Delimiter.Record`
- Handles header detection, empty lines, and row pool management
271-287: LGTM on the fill strategy implementation. The implementation correctly:

- Uses pre-computed `validHeaderIndices` and `headerKeys` for performance
- Creates objects with `Object.create(null)` for prototype pollution safety
- Falls back to `""` for missing field values using nullish coalescing
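A simplified sketch of this fill-style assembly (hypothetical `assembleFill` helper; the real implementation additionally uses pre-computed header indices and row pooling):

```typescript
// Illustrative fill-strategy assembly: a null-prototype record prevents
// prototype pollution via hostile header names like "__proto__".
function assembleFill(
  headerKeys: string[],
  row: (string | undefined)[],
): Record<string, string> {
  const record: Record<string, string> = Object.create(null);
  for (let i = 0; i < headerKeys.length; i++) {
    record[headerKeys[i]] = row[i] ?? ""; // missing fields become ""
  }
  return record;
}
```

With a null prototype, assigning to a key such as `"__proto__"` creates an ordinary own property instead of mutating the object's prototype chain.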
293-321: Double release of row buffer on strict mismatch error. In `#assembleRecordStrict`, when column count doesn't match, `#releaseRow(row)` is called at line 300. However, the caller `#assembleRecord` at line 263 also calls `#releaseRow(row)` unconditionally after the strategy function returns. Since `#assembleRecordStrict` throws, line 263 won't execute in this case, so this is actually correct.

However, there's another issue: when the error is thrown, the row has been released back to the pool but the pool might reuse it. This is fine for correctness but note that `#releaseRow` sets `row.length = 0`, which clears the row before it's used for error context if needed.
343-376: LGTM on row buffer pooling implementation. The pooling logic correctly:

- Takes a row and allocates a new buffer with appropriate capacity hint
- Releases rows back to the pool after clearing
- Resets builder state when buffer is replaced
- Pre-allocates buffer capacity but resets length to 0 for use
181-183: The empty `#flush` method is correct by design. The lexer (FlexibleStringCSVLexer) always emits a `Delimiter.Record` token for every record, including incomplete ones. When processing input without a trailing newline (e.g., "field" or "Alice,30\nBob"), the lexer emits tokens ending with `Delimiter.Record` with `delimiterLength: 0`, ensuring all records are logically complete by the time they reach the assembler. Consequently, no residual data remains to be flushed.

src/parser/models/FlexibleStringCSVLexer.ts (8)
34-36: Good use of character codes for performance. Pre-computing character codes for comparison is an effective micro-optimization that avoids repeated `charCodeAt()` calls on constant strings.
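The technique can be sketched like this (illustrative only; the lexer's actual constants and scanning loop differ):

```typescript
// Pre-computed character codes: one charCodeAt per input character beats
// repeated single-character string comparisons in a tight loop.
const COMMA = ",".charCodeAt(0); // 44
const LF = "\n".charCodeAt(0); // 10

function countFields(line: string): number {
  let fields = 1;
  for (let i = 0; i < line.length; i++) {
    const c = line.charCodeAt(i);
    if (c === COMMA) fields++;
    else if (c === LF) break; // stop at the first record boundary
  }
  return fields;
}
```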
57-64: Good design: Lazy position tracking for error reporting. The lazy position tracking approach (`#baseLineNumber`, `#baseColumn`) is a good optimization—position is computed only when errors occur, which is rare in normal operation. This adds minimal overhead to the fast path.
118-132: Position computation logic is correct. The O(n) scan through the buffer for position computation is acceptable since it only executes on error paths, and the comment clearly documents this trade-off.
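A minimal sketch of this error-path position scan (hypothetical helper, assuming LF-only line breaks for brevity):

```typescript
// Lazy position computation: the hot path only remembers a base offset;
// line/column are derived by scanning the buffer when an error occurs.
function computePosition(
  buffer: string,
  offset: number,
  baseLine: number = 1,
): { line: number; column: number } {
  let line = baseLine;
  let column = 1;
  for (let i = 0; i < offset; i++) {
    if (buffer.charCodeAt(i) === 10 /* "\n" */) {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { line, column };
}
```

For the input `'a\n"'` with the unclosed quote at offset 2, this yields line 2, column 1, matching the error snapshots elsewhere in this review.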
219-254: Trailing field delimiter handling is correctly implemented. The `#pendingTrailingFieldEOF` mechanism properly handles CSV edge cases like `",x,"` producing 3 fields by emitting an empty token at EOF after a trailing field delimiter.
452-458: Good handling of RFC 4180 edge case. The code correctly handles quotes appearing mid-field in unquoted fields by delegating to `#parsePartialQuotedFieldNoLocation`. While technically a violation of RFC 4180, this permissive approach improves compatibility with malformed CSV data.
553-565: Recursive handling of mixed quoted/unquoted fields. The recursive call to `#parseUnquotedFieldNoLocation` (line 555) correctly handles cases where more unquoted content follows a quoted section. This properly supports edge cases like `a"b"c"d"e` where quoted sections are interleaved with unquoted content.
1048-1050: Verify cursor offset calculation for partial quoted fields. The offset calculation `localOffset - prefixLen - 1` appears at multiple places (lines 1049, 1073). Ensure this correctly accounts for:

- The prefix consumed before the opening quote
- The opening quote itself (the `-1`)

The logic seems correct, but given the complexity, consider adding a unit test that specifically validates position reporting for mixed quoted/unquoted fields like `abc"def"ghi`.
1188-1195: Effective segment pool implementation. The segment pool properly reuses arrays to reduce GC pressure. Setting `segments.length = 0` before release correctly clears all string references while keeping the array allocation.

src/parser/models/FlexibleCSVArrayRecordAssembler.ts (7)
70-82: Good default strategy handling. The default strategy correctly switches between `'keep'` for headerless mode and `'fill'` for normal mode. The validation at lines 73-82 properly enforces that strategies requiring a known column count (`strict`, `truncate`, `sparse`) must have an explicit header.
222-241: Good strategy pattern implementation. Assigning the strategy function once in `#setHeader` avoids switch overhead on every record assembly. The default fallback to `fill` (line 239) provides defensive behavior.
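The pattern can be sketched as follows (illustrative `selectStrategy` helper; not the assembler's actual code):

```typescript
// Strategy-function pattern: resolve the per-record handler once, instead of
// switching on the strategy string for every row in the hot loop.
type RowHandler = (row: string[], columnCount: number) => string[];

function selectStrategy(strategy: "fill" | "truncate"): RowHandler {
  switch (strategy) {
    case "truncate":
      return (row, n) => row.slice(0, n); // drop extra fields
    case "fill":
    default:
      return (row, n) => {
        const out = row.slice(0, n);
        while (out.length < n) out.push(""); // pad short rows
        return out;
      };
  }
}

const handle = selectStrategy("fill"); // chosen once, reused per record
```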
281-286: Type cast for sparse array is intentional. The cast `undefined as unknown as string` at line 283 is a necessary escape hatch since the internal `#row` array is typed as `string[]`. The type safety is preserved at the API boundary through `CSVArrayRecord<Header, 'sparse'>` which correctly types fields as `string | undefined`.
295-312: Correct cleanup before throwing in strict mode. The call to `#takeRow(rowLength)` at line 299 before throwing the error correctly ensures the row buffer state is reset, preventing state corruption if the error is caught and parsing continues.
356-363: Good optimization: Empty record template caching. Caching the empty record template (`#emptyRecordTemplate`) and using `slice()` to create copies avoids repeated array allocation and `fill()` calls for empty lines.
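A minimal sketch of this cached-template trick (hypothetical names; the column count is assumed for illustration):

```typescript
// Build the all-empty record once, then copy it with slice() instead of
// re-allocating and fill()-ing a fresh array for every empty line.
const COLUMN_COUNT = 3; // assumed width for this example
const emptyRecordTemplate: string[] = new Array(COLUMN_COUNT).fill("");

function emptyRecord(): string[] {
  return emptyRecordTemplate.slice(); // fresh, independent copy each call
}
```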
372-374: Correct header emission with defensive copy. Yielding a `slice()` of the header (line 373) correctly prevents callers from mutating the internal header array. The type cast chain is verbose but necessary due to the readonly constraint on Header.
25-27: Consider adding an explicit `implements CSVArrayRecordAssembler<Header>` clause for clarity. While the class structurally conforms to the interface (the `assemble` method signature matches exactly), removing the explicit `implements` clause eliminates compile-time verification of the contract. This makes it easier to accidentally break compatibility if the interface changes, and less explicit for maintainers about the intended contract. TypeScript's structural typing means the code works correctly, but explicit implementation signals intent and catches interface mismatches early.

src/core/types.ts (7)
89-104: Well-designed unified token model. The `BaseToken` interface with `delimiter` and `delimiterLength` properties cleanly unifies the previous separate token types. The `delimiterLength` field (0 for EOF, 1 for comma/LF/CR, 2 for CRLF) enables efficient parsing without string allocation.
157-165: Clean conditional token type. The `Token<TrackLocation>` conditional type and `AnyToken` union provide good flexibility—lexers emit typed tokens based on configuration, while assemblers can accept either via `AnyToken`.
701-748: Good type-level constraint enforcement for object format. The object format branch correctly restricts `columnCountStrategy` to `ObjectFormatColumnCountStrategy` (line 742) and uses `includeHeader?: never` (line 747) to prevent invalid combinations at compile time.
1586-1605: Well-documented strategy constraints. The `ObjectFormatColumnCountStrategy` type using `Extract` cleanly defines the subset of strategies valid for object output. The documentation clearly explains why `sparse`, `keep`, and `truncate` are not supported for object format.
1781-1792: Verify default strategy consistency. The `CSVRecord` type defaults `Strategy` to `"keep"` (line 1784), while the runtime `FlexibleCSVArrayRecordAssembler` defaults to `"fill"` (for non-headerless mode). The `InferStrategy` helper (line 1330) also defaults to `"fill"`.

While this may be intentional (type-level conservatism), consider aligning the type default with the runtime default for consistency:

```diff
 export type CSVRecord<
   Header extends ReadonlyArray<string>,
   Format extends CSVOutputFormat = "object",
-  Strategy extends ColumnCountStrategy = "keep",
+  Strategy extends ColumnCountStrategy = "fill",
 > = ...
```
1672-1677: Good compile-time enforcement for sparse restriction. The `CSVObjectRecord` type returns `never` when `Strategy extends "sparse"` (line 1676), causing a type error if users try to use the sparse strategy with object format. This provides compile-time safety complementing the runtime check.
2082-2093: Clean generic lexer interface. The `StringCSVLexer<TrackLocation>` interface properly propagates the location tracking configuration to the return type, ensuring type safety throughout the token pipeline.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (1)
111-118: Potential issue: Assembler generator consumed twice. The test calls `assembler.assemble(tokens)` twice - once for the assertion and once to collect records. If the assembler's `assemble` method is a generator or has side effects, the second call may not work as expected since generators are single-use. Consider refactoring to consume the generator once:

```diff
-// Should not throw - exactly at the limit
-expect(() => [...assembler.assemble(tokens)]).not.toThrow();
-
-// Verify the record was correctly assembled
-const records = [...assembler.assemble(tokens)];
+let records: unknown[];
+expect(() => {
+  records = [...assembler.assemble(tokens)];
+}).not.toThrow();
 expect(records).toHaveLength(1);
 expect(Object.keys(records[0] as object)).toHaveLength(10);
```

src/parser/models/FlexibleStringCSVLexer.test.ts (2)
210-214: Typo in test name: "thorw" should be "throw". Minor typo in the test description.

```diff
-  test("should thorw an error if the field is not closed", () => {
+  test("should throw an error if the field is not closed", () => {
```
226-235: Same typo: "thorw" should be "throw".

```diff
-  test("should thorw DOMException named AbortError if the signal is aborted", () => {
+  test("should throw DOMException named AbortError if the signal is aborted", () => {
```
♻️ Duplicate comments (7)
config/vitest.setup.ts (1)
3-6: Incorrect comment: `endOnFailure` defaults to `false`, not `true`. The comment states this is the default value, but fast-check's `endOnFailure` actually defaults to `false`. Setting it to `true` changes behavior by stopping tests after the first failure without shrinking. Either update the comment to clarify this is an intentional behavioral change, or remove the misleading statement. Apply this diff to correct the comment:

```diff
 fc.configureGlobal({
-  // This is the default value, but we set it here to be explicit.
+  // Stop tests on first failure without shrinking (default is false).
   endOnFailure: true,
 });
```

src/parser/api/file/parseFileToStream.spec.ts (3)
55-57: Consistent use of `toEqual` for single-record assertions. Same rationale as Line 19: the test still fully validates the parsed fields while avoiding brittleness against internal object shape/prototype changes.

148-152: Streaming first record: relaxed equality is fine. For the first streamed record, `toEqual` is sufficient to ensure correct field values and works better with potential changes in how record objects are instantiated or pooled.

154-156: Second streamed record assertion matches updated pattern. The second record check mirrors the first and stays focused on field values; no concerns with using `toEqual` here.

.changeset/column-count-strategy-rename.md (1)
1-5: Incorrect bump level for breaking changes. This changeset restricts `columnCountStrategy` options for object/record-view output, rejecting `keep`/`truncate`, which is a breaking change. The PR objectives explicitly state: "`sparse` is disallowed for object format and will throw at runtime." Users relying on these now-invalid options will experience runtime errors.

A `patch` bump is inappropriate for breaking changes. According to semantic versioning, this should be at least `minor` (for 0.x versions) or `major` (for 1.x+ versions). Apply this diff:

```diff
 ---
-"web-csv-toolbox": patch
+"web-csv-toolbox": minor
 ---
```

Note: If the project is at version 1.x or higher and follows strict semver, consider using `major` instead of `minor`.

.changeset/performance-improvements.md (1)
40-46: Fix array vs object throughput percentage to match the table. Given 49.0 MB/s (object) vs 89.6 MB/s (array), array format is about 1.8× faster (~83% higher throughput), not ~46%. The final sentence should be updated to avoid contradicting the table.

```diff
-Array format is approximately 46% faster than Object format for the same data.
+Array format is approximately 1.8× faster (~83% higher throughput) than Object format for the same data.
```

src/parser/api/model/createStringCSVLexer.ts (1)
37-47: Return type is missing the `TrackLocation` generic parameter. The function declares `TrackLocation extends boolean = false` but the return type `StringCSVLexer` doesn't propagate this generic. This causes callers to lose type information about whether the returned lexer produces tokens with or without location data.

```diff
 export function createStringCSVLexer<
   Delimiter extends string = DEFAULT_DELIMITER,
   Quotation extends string = DEFAULT_QUOTATION,
   TrackLocation extends boolean = false,
 >(
   options?: StringCSVLexerOptions<Delimiter, Quotation, TrackLocation>,
-): StringCSVLexer {
+): StringCSVLexer<TrackLocation> {
   return new FlexibleStringCSVLexer<Delimiter, Quotation, TrackLocation>(
     options ?? {},
   );
 }
```
🧹 Nitpick comments (13)
src/parser/stream/StringCSVLexerTransformer.ts (1)
135-137: Remove or clarify the leftover placeholder comment in the `super` call.

The comment `// undefined, // writableStrategy` looks like a leftover from a previous constructor call shape and no longer matches the current arguments. It's confusing in between the arguments and doesn't add real value. Either remove it or replace it with a brief explanatory comment (outside the argument list) if you want to document why custom strategies are passed here.

src/parser/stream/StringCSVParserStream.test.ts (1)
56-68: Consider adding explicit tests for `columnCountStrategy` options.

The updated expectation correctly reflects the new default `fill` behavior (missing fields → `""`). However, given the breaking changes in this PR, consider adding explicit test cases for:
- `fill` strategy: explicitly set `columnCountStrategy: "fill"` and verify missing fields are filled with `""` for both object and array formats.
- `sparse` strategy: verify it works for array format (missing fields → `undefined`) and throws an error for object format (as documented in the PR objectives).

This would provide clearer documentation of the behavior and prevent regressions.
Example test structure:
```typescript
describe("columnCountStrategy", () => {
  it("should fill missing fields with empty string using 'fill' strategy", async () => {
    const parser = new FlexibleStringObjectCSVParser({
      header: ["name", "age"] as const,
      columnCountStrategy: "fill",
    });
    const stream = new StringCSVParserStream(parser);
    const records = await transform(stream, ["Alice,30\n", "Bob"]);
    expect(records).toEqual([
      { name: "Alice", age: "30" },
      { name: "Bob", age: "" },
    ]);
  });
  it("should throw when using 'sparse' strategy with object format", () => {
    expect(() => {
      new FlexibleStringObjectCSVParser({
        header: ["name", "age"] as const,
        columnCountStrategy: "sparse",
      });
    }).toThrow();
  });
});
```

src/utils/memory/ReusableArrayPool.ts (1)
1-24: Simple LIFO pool implementation looks correctThe pool logic (
take+releasewith optionalreset) is clean and side-effect free beyond intended reuse. UsingInfinityas the defaultmaxSizeis fine for an internal helper, though if you ever see unbounded distinct instances in hot paths, you might want a finite default tuned to expected concurrency.docs/reference/output-format-guide.md (1)
7-40: Output format behavior and constraints look accurate; consider clarifying the array+strategy nuance.

The object/array descriptions, the headerless constraint (`header: []` requires `outputFormat: "array"` + `columnCountStrategy: "keep"`), and the notes about padding with `""` vs `undefined` all match the runtime rules in the factory and assemblers.

If you want to avoid any ambiguity, you could slightly qualify "All strategies (`fill`, `keep`, `truncate`, `sparse`, `strict`)" for the array row to say "all strategies (subject to the ColumnCountStrategy compatibility matrix)" to reinforce the cross-doc constraint.
If you want to avoid any ambiguity, you could slightly qualify “All strategies (fill,keep,truncate,sparse,strict)" for the array row to say it’s “all strategies (subject to the ColumnCountStrategy compatibility matrix)” to reinforce the cross‑doc constraint.docs/reference/column-count-strategy-guide.md (1)
7-45: ColumnCountStrategy matrix and descriptions align with the implementation.

The compatibility table and per-strategy behavior (e.g., `fill` as default, `keep` for ragged/headerless, `truncate`/`sparse` requiring an explicit header, and object-format limited to `fill`/`strict`) are consistent and clearly explained. The repeated "Need …?" phrasing in the "Choosing a Strategy" bullets is fine as a rhetorical device, but you could vary it slightly if you want to appease style checkers.

src/parser/models/FlexibleCSVRecordAssembler.test.ts (1)
2-4: Abort/timeout tests correctly migrated to `AnyToken` + `Delimiter.Record`.

Switching these tests to feed `AnyToken[]` with `delimiter: Delimiter.Record` (and location metadata) into `assembler.assemble(...)` matches the new token model and keeps the AbortSignal behavior well covered for AbortError, custom reasons, and TimeoutError. If you find yourself adding more such tests later, you might consider extracting a tiny helper to construct the single empty-record token, but it's not necessary right now.

Also applies to: 43-51, 54-66, 83-96, 103-125, 135-178
src/parser/api/model/createCSVRecordAssembler.ts (1)
132-148: Object-format `columnCountStrategy` validation is correct and matches the docs.

The new guard that forbids `sparse`, `keep`, and `truncate` when `outputFormat` is `"object"` cleanly enforces the documented constraints and produces clear error messages pointing to `fill`/`strict` or switching to array output. Given that the type system still allows these combinations in some generic cases, a future enhancement could further narrow the TS types, but the runtime check here is solid.

src/core/constants.ts (1)
103-117: LGTM! Unified token format improves performance.

The `Delimiter` enum effectively replaces Symbol-based token types with numeric constants, aligning with the performance optimization goals mentioned in the PR objectives.

Consider removing or documenting the commented-out `EOF = 2` member at line 116. If it's planned for future use, a brief comment explaining the roadmap would help. If not needed, removing it would reduce clutter.

```diff
   Record = 1,
-  // /** End of file/stream */
-  // EOF = 2,
 }
```

src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (1)
140-141: Clarify the default-strategy scope in the test description.

The description "(default)" may be misleading. Per the PR objectives, `fill` is the default for both array and object formats when a header is provided, but `keep` is required for headerless mode. Consider clarifying:

```diff
-  describe("fill strategy (default)", () => {
+  describe("fill strategy (default when header provided)", () => {
```
197-216: Logic appears correct, but consider adding a clarifying comment.

The conditional `if (quote || field !== "")` on line 207 omits empty unquoted fields from expectations. While this aligns with the ambiguous empty-field behavior, a brief comment explaining this would improve readability for future maintainers.

src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (1)
85-97: Consider removing or documenting the commented-out examples.

The commented examples block adds noise to the test. If these are kept for debugging purposes, consider moving them to a separate documentation comment or removing them entirely.

```diff
-  {
-    // examples: [
-    //   {
-    //     row: ["name", "age"],
-    //     chunks: ["name,age\r\n"],
-    //     expected: [
-    //       { value: "name", delimiter: Delimiter.Field, delimiterLength: 1 },
-    //       { value: "age", delimiter: Delimiter.Record, delimiterLength: 2 },
-    //     ],
-    //   },
-    //   { row: [""], chunks: [""], expected: [{ value: "", delimiter: 1, delimiterLength: 0 }] },
-    // ],
-  },
+  {},
```

src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
232-242: Remove the unreachable `truncate` case from the switch statement.

The `truncate` case (lines 236-237) is dead code since `truncate` is explicitly blocked in the constructor (lines 66-73). This creates maintenance confusion. Apply this diff to remove the unreachable case:

```diff
 switch (this.#columnCountStrategy) {
   case "strict":
     this.#assembleRecordFn = this.#assembleRecordStrict;
     break;
-  case "truncate":
-    this.#assembleRecordFn = this.#assembleRecordTruncate;
-    break;
   default:
     this.#assembleRecordFn = this.#assembleRecordFill;
     break;
 }
```

Additionally, you can remove the `#assembleRecordTruncate` method (lines 327-341) entirely, since it's never used in object format.
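To make the constructor-level blocking concrete, here is a minimal hedged sketch of such a guard. The function name, error wording, and standalone shape are assumptions for illustration; the library's actual implementation lives inside the assembler's constructor.

```typescript
// Hypothetical sketch of a constructor-level strategy guard for object output.
type ColumnCountStrategy = "fill" | "strict" | "sparse" | "keep" | "truncate";

function validateObjectStrategy(strategy: ColumnCountStrategy): void {
  // Only "fill" and "strict" are compatible with object-format records.
  if (strategy === "sparse" || strategy === "keep" || strategy === "truncate") {
    throw new Error(
      `columnCountStrategy "${strategy}" is not supported with outputFormat "object"; ` +
        `use "fill" or "strict", or switch to outputFormat "array".`,
    );
  }
}

validateObjectStrategy("fill"); // accepted
let threw = false;
try {
  validateObjectStrategy("truncate"); // rejected before assembly starts
} catch {
  threw = true;
}
console.log(threw); // true
```

Once a guard like this runs up front, the `truncate` branch in the dispatch switch can never be reached, which is why it can be deleted safely.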
151-157: Potential redundancy in `rowLength` calculations.

Line 151 calculates `#rowLength = Math.max(this.#rowLength, this.#fieldIndex + 1)`. Then, after incrementing `#fieldIndex` on line 154, lines 156-157 recalculate with the new value. The second calculation is effectively `Math.max(rowLength, fieldIndex + 2)`, which is always equal to or greater than the first. Consider whether line 151 is necessary, or if lines 156-157 alone are sufficient.
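The redundancy claim can be checked with a small worked example, using standalone variables as stand-ins for the private fields:

```typescript
// Stand-ins for #rowLength and #fieldIndex around lines 151-157.
let rowLength = 0;
let fieldIndex = 3;

// Line 151: first calculation.
rowLength = Math.max(rowLength, fieldIndex + 1); // max(0, 4) = 4

// Line 154: the field index is incremented.
fieldIndex += 1;

// Lines 156-157: second calculation. Relative to the old index this is
// max(rowLength, oldIndex + 2), which always dominates the first result.
rowLength = Math.max(rowLength, fieldIndex + 1); // max(4, 5) = 5

console.log(rowLength); // 5
```

Whatever value the first `Math.max` produces, the second one recomputes with a strictly larger candidate, so dropping the first call leaves the final `rowLength` unchanged.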
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (60)
.changeset/column-count-strategy-rename.md (1 hunks)
.changeset/lexer-api-changes.md (1 hunks)
.changeset/performance-improvements.md (1 hunks)
benchmark/package.json (2 hunks)
config/vitest.setup.ts (1 hunks)
docs/reference/column-count-strategy-guide.md (1 hunks)
docs/reference/output-format-guide.md (1 hunks)
src/core/constants.ts (1 hunks)
src/core/types.test-d.ts (2 hunks)
src/core/types.ts (26 hunks)
src/parser/api/binary/parseBinary.browser.spec.ts (1 hunks)
src/parser/api/binary/parseBinary.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryStream.browser.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryStream.spec.ts (5 hunks)
src/parser/api/binary/parseBinaryStreamToStream.spec.ts (1 hunks)
src/parser/api/binary/parseBinaryToIterableIterator.test.ts (1 hunks)
src/parser/api/file/parseBlob.spec.ts (5 hunks)
src/parser/api/file/parseFile.spec.ts (2 hunks)
src/parser/api/file/parseFileToArray.spec.ts (6 hunks)
src/parser/api/file/parseFileToStream.spec.ts (3 hunks)
src/parser/api/model/createCSVRecordAssembler.test.ts (1 hunks)
src/parser/api/model/createCSVRecordAssembler.ts (1 hunks)
src/parser/api/model/createStringCSVLexer.ts (2 hunks)
src/parser/api/network/parseRequest.spec.ts (3 hunks)
src/parser/api/network/parseResponse.browser.spec.ts (1 hunks)
src/parser/api/network/parseResponse.spec.ts (1 hunks)
src/parser/api/network/parseResponseToStream.spec.ts (1 hunks)
src/parser/api/parse.spec.ts (1 hunks)
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts (5 hunks)
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (5 hunks)
src/parser/api/string/parseString.spec.ts (1 hunks)
src/parser/api/string/parseStringStream.spec.ts (1 hunks)
src/parser/api/string/parseStringStreamToStream.spec.ts (1 hunks)
src/parser/api/string/parseStringToArraySync.spec.ts (1 hunks)
src/parser/api/string/parseStringToIterableIterator.spec.ts (1 hunks)
src/parser/api/string/parseStringToStream.spec.ts (1 hunks)
src/parser/models/FlexibleBinaryArrayCSVParser.ts (1 hunks)
src/parser/models/FlexibleBinaryCSVParser.test.ts (4 hunks)
src/parser/models/FlexibleBinaryObjectCSVParser.ts (1 hunks)
src/parser/models/FlexibleCSVArrayRecordAssembler.ts (5 hunks)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (4 hunks)
src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (7 hunks)
src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (10 hunks)
src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (11 hunks)
src/parser/models/FlexibleCSVRecordAssembler.spec.ts (3 hunks)
src/parser/models/FlexibleCSVRecordAssembler.test.ts (4 hunks)
src/parser/models/FlexibleStringArrayCSVParser.ts (1 hunks)
src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts (2 hunks)
src/parser/models/FlexibleStringCSVLexer.spec.ts (7 hunks)
src/parser/models/FlexibleStringCSVLexer.test.ts (12 hunks)
src/parser/models/FlexibleStringCSVLexer.ts (6 hunks)
src/parser/models/FlexibleStringCSVParser.test.ts (4 hunks)
src/parser/models/FlexibleStringObjectCSVParser.ts (1 hunks)
src/parser/stream/BinaryCSVParserStream.test.ts (3 hunks)
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts (4 hunks)
src/parser/stream/StringCSVLexerTransformer.spec.ts (6 hunks)
src/parser/stream/StringCSVLexerTransformer.test.ts (1 hunks)
src/parser/stream/StringCSVLexerTransformer.ts (1 hunks)
src/parser/stream/StringCSVParserStream.test.ts (2 hunks)
src/utils/memory/ReusableArrayPool.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: :0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user `kamiazya` has confirmed that the changes made in PR #32 are correct and has requested the resolution of all review comments.
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
src/parser/api/binary/parseBinary.browser.spec.tssrc/parser/api/file/parseFileToArray.spec.tssrc/parser/api/binary/parseBinaryStreamToStream.spec.tssrc/parser/api/model/createCSVRecordAssembler.test.tssrc/parser/api/file/parseFile.spec.tssrc/parser/models/FlexibleStringObjectCSVParser.tssrc/parser/stream/StringCSVParserStream.test.tssrc/parser/api/string/parseStringToArraySync.spec.tssrc/parser/models/FlexibleCSVRecordAssembler.spec.tssrc/parser/models/FlexibleCSVRecordAssembler.array-output.test.tssrc/parser/stream/StringCSVLexerTransformer.spec.tssrc/core/types.test-d.tssrc/parser/models/FlexibleCSVRecordAssembler.test.tssrc/parser/api/stream/createStringCSVLexerTransformer.spec.tssrc/parser/models/FlexibleCSVRecordAssembler.object-output.test.tssrc/parser/models/FlexibleStringCSVLexer.spec.tssrc/parser/stream/BinaryCSVParserStream.test.tssrc/parser/api/binary/parseBinaryStream.spec.tssrc/parser/api/model/createCSVRecordAssembler.tssrc/parser/api/string/parseString.spec.tssrc/parser/models/FlexibleStringCSVLexer.test.tssrc/parser/api/binary/parseBinaryStream.browser.spec.tssrc/parser/models/FlexibleBinaryObjectCSVParser.tssrc/parser/api/network/parseResponseToStream.spec.tssrc/parser/api/file/parseFileToStream.spec.tssrc/core/constants.tssrc/parser/stream/CSVRecordAssemblerTransformer.spec.tssrc/parser/stream/StringCSVLexerTransformer.tssrc/parser/models/FlexibleCSVArrayRecordAssembler.tssrc/parser/models/FlexibleCSVObjectRecordAssembler.tssrc/parser/models/FlexibleBinaryCSVParser.test.tssrc/parser/api/parse.spec.tssrc/parser/models/FlexibleStringArrayCSVParser.tssrc/parser/api/stream/createCSVRecordAssemblerTransformer.spec.tssrc/parser/api/binary/parseBinary.spec.tssrc/parser/models/FlexibleStringCSVParser.test.tssrc/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
.changeset/lexer-api-changes.mdsrc/parser/stream/StringCSVLexerTransformer.spec.tssrc/parser/api/stream/createStringCSVLexerTransformer.spec.tssrc/parser/models/FlexibleStringCSVLexer.spec.tssrc/parser/models/FlexibleStringCSVLexer.test.tssrc/parser/models/FlexibleStringCSVLexer.tssrc/parser/stream/StringCSVLexerTransformer.tssrc/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 53
File: .github/workflows/ci.yaml:23-39
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user has clarified that in the `package.json` of the `web-csv-toolbox` project, the `test` script is defined as `vitest`, which means that the command `npm test run` correctly expands to `vitest run`.
Applied to files:
src/parser/stream/StringCSVParserStream.test.tssrc/parser/models/FlexibleCSVRecordAssembler.test.tssrc/parser/stream/BinaryCSVParserStream.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/internal/parseStringStreamToStream.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The typo in the class name `RecordAssemblerTransformar` was fixed by the user `kamiazya` in commit `f065e07`.
Applied to files:
src/parser/stream/CSVRecordAssemblerTransformer.spec.ts
🧬 Code graph analysis (21)
src/parser/api/network/parseResponse.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/parser/api/model/createCSVRecordAssembler.test.ts (3)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (1)
value (129-131)
src/core/types.ts (1)
AnyToken (165-165)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/parser/api/file/parseFile.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/parser/api/network/parseRequest.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/parser/models/FlexibleCSVRecordAssembler.spec.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (1)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/parser/stream/StringCSVLexerTransformer.spec.ts (2)
src/__tests__/helper.ts (1)
autoChunk (294-309)
src/core/types.ts (1)
TokenNoLocation (114-114)
src/core/types.test-d.ts (1)
src/core/types.ts (1)
CSVRecordAssemblerOptions(669-782)
src/parser/models/FlexibleCSVRecordAssembler.test.ts (1)
src/core/types.ts (1)
AnyToken(165-165)
src/parser/api/stream/createStringCSVLexerTransformer.spec.ts (2)
src/parser/api/stream/createStringCSVLexerTransformer.ts (1)
createStringCSVLexerTransformer (75-91)
src/__tests__/helper.ts (3)
autoChunk (294-309), delimiter (177-192), eol (242-244)
src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (1)
src/parser/api/model/createCSVRecordAssembler.ts (1)
createCSVRecordAssembler(82-159)
src/parser/models/FlexibleStringCSVLexer.spec.ts (2)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/__tests__/helper.ts (2)
field (163-172), quote (238-240)
src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts (1)
src/parser/models/FlexibleStringCSVLexer.ts (1)
FlexibleStringCSVLexer(38-1196)
src/parser/models/FlexibleStringCSVLexer.ts (3)
src/core/constants.ts (7)
CR (1-1), CR (2-2), DEFAULT_DELIMITER (22-22), DEFAULT_DELIMITER (23-23), DEFAULT_LEXER_MAX_BUFFER_SIZE (50-50), LF (7-7), LF (8-8)
src/core/types.ts (2)
StringCSVLexer (2082-2093), Token (157-158)
src/utils/memory/ReusableArrayPool.ts (1)
ReusableArrayPool (6-24)
src/parser/models/FlexibleCSVArrayRecordAssembler.ts (1)
src/core/types.ts (3)
ColumnCountStrategy (1579-1584), CSVArrayRecord (1731-1738), AnyToken (165-165)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
src/core/types.ts (4)
ColumnCountStrategy (1579-1584), CSVObjectRecord (1672-1677), CSVRecordAssemblerCommonOptions (603-615), AnyToken (165-165)
src/utils/memory/ReusableArrayPool.ts (1)
ReusableArrayPool (6-24)
src/parser/models/FlexibleBinaryCSVParser.test.ts (2)
src/parser/models/FlexibleBinaryArrayCSVParser.ts (1)
FlexibleBinaryArrayCSVParser (38-50)
src/parser/models/FlexibleBinaryObjectCSVParser.ts (1)
FlexibleBinaryObjectCSVParser(38-50)
src/parser/api/stream/createCSVRecordAssemblerTransformer.spec.ts (2)
src/core/types.ts (1)
AnyToken (165-165)
src/__tests__/helper.ts (1)
field(163-172)
src/parser/api/binary/parseBinary.spec.ts (1)
src/parser/models/FlexibleCSVObjectRecordAssembler.ts (2)
row (256-265), row (352-355)
src/core/types.ts (1)
src/core/constants.ts (4)
DEFAULT_DELIMITER (22-22), DEFAULT_DELIMITER (23-23), DEFAULT_QUOTATION (25-25), DEFAULT_QUOTATION (26-26)
src/parser/models/FlexibleStringCSVParser.test.ts (2)
src/parser/models/FlexibleStringArrayCSVParser.ts (1)
FlexibleStringArrayCSVParser (34-46)
src/parser/models/FlexibleStringObjectCSVParser.ts (1)
FlexibleStringObjectCSVParser(34-46)
🪛 LanguageTool
docs/reference/column-count-strategy-guide.md
[style] ~42-~42: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...es?** Use fill (object default). 3. Need ragged rows / headerless CSV? Use `ke...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... CSV?** Use keep (array output). 4. Need to ignore trailing columns? Use `trun...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ncate(array output with header). 5. **Need optional columns?** Usesparse` (array...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
```
{ numRuns: 10 }, // Reduce runs to debug
);
```
Remove debug settings before merging.
The { numRuns: 10 } setting with comment "Reduce runs to debug" appears to be a leftover debug configuration. This significantly reduces property-based test coverage. Consider removing or increasing for production.
```diff
- { numRuns: 10 }, // Reduce runs to debug
```
🤖 Prompt for AI Agents
In src/parser/stream/StringCSVLexerTransformer.spec.ts around lines 45-46 the
test includes a leftover debug option "{ numRuns: 10 }" with a "Reduce runs to
debug" comment which lowers property-based test coverage; remove that option and
comment (or replace it with the intended production value) so the test runs with
the default / full number of runs (or a higher configured runs count) to restore
proper coverage.
```
{ numRuns: 10 },
);
```
Same debug setting to remove.
```diff
- { numRuns: 10 },
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/parser/stream/StringCSVLexerTransformer.spec.ts around lines 74 to 75,
the test is passing a redundant debug/configuration flag (the same debug
setting) to the function call; remove the duplicated debug option so the call
only includes unique configuration keys (e.g., keep { numRuns: 10 } alone or
merge unique settings) and update any related test expectations if they depend
on the removed duplicate.
```typescript
const actual = await transform(transformer, chunks);
expect(actual).toMatchObject(expected);
```
🧩 Analysis chain
🏁 Script executed:
fd -t f "StringCSVLexerTransformer.spec.ts" --exec wc -l {}

Repository: kamiazya/web-csv-toolbox
Length of output: 193
🏁 Script executed:
cat -n src/parser/stream/StringCSVLexerTransformer.spec.ts | sed -n '30,80p'

Repository: kamiazya/web-csv-toolbox
Length of output: 2287
🏁 Script executed:
cat -n src/parser/stream/StringCSVLexerTransformer.spec.ts | sed -n '100,130p'

Repository: kamiazya/web-csv-toolbox
Length of output: 1389
🏁 Script executed:
rg -n "const transform\s*=" src/parser/stream/StringCSVLexerTransformer.spec.ts -A 10

Repository: kamiazya/web-csv-toolbox
Length of output: 50
Add .flat() call to match pattern in other tests.
Lines 41 and 70 both call .flat() on the transform() result before comparison, but line 123 does not. Since transform() returns a nested array structure and the expected array at line 101 is built as a flat array, the comparison will fail or be incorrect without flattening. Change line 123 to:
```typescript
const actual = (await transform(transformer, chunks)).flat();
```

🤖 Prompt for AI Agents
In src/parser/stream/StringCSVLexerTransformer.spec.ts around lines 123-124, the
test compares the nested array returned by transform() against a flat expected
array; add a .flat() call to the transform() result so the shapes match. Replace
the current assignment with one that awaits transform(...) and then calls
.flat() on the result before calling expect(...).toMatchObject(expected).
Fixes based on review comments:

1. parseResponse.spec.ts - fix the describe block name (parseRequest → parseResponse)
2. column-count-strategy-rename.md - change the breaking change to a minor bump and add a migration guide
3. performance-improvements.md - correct the benchmark description (46% → 83% faster)
4. vitest.setup.ts - make the endOnFailure comment accurate (the default is false)
5. createStringCSVLexer.ts - add the TrackLocation generic to the return type

All tests passing (1,341)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…marks
- Remove debug options ({ numRuns: 10 }) to restore full property-based test coverage
- Add .flat() to transform result in third test for correct array shape matching
- Fix edge case: empty CSV string should produce no tokens
- Update performance numbers with final benchmark measurements:
- Object format (1,000 rows): 61.2 MB/s (was 49.0 MB/s)
- Object format (5,000 rows): 67.9 MB/s (was 53.3 MB/s)
- Array format (1,000 rows): 87.6 MB/s (was 89.6 MB/s)
- Array format (5,000 rows): 86.4 MB/s (new measurement)
- Array format is 43% faster (1.43× throughput) vs Object format
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
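The updated ratio in the commit message above checks out arithmetically:

```typescript
// Sanity check of the "43% faster (1.43× throughput)" claim, using the
// 1,000-row figures: 87.6 MB/s (array) vs 61.2 MB/s (object).
const objectThroughput = 61.2;
const arrayThroughput = 87.6;
const ratio = arrayThroughput / objectThroughput;
console.log(ratio.toFixed(2)); // "1.43"
```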
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/parser/stream/StringCSVLexerTransformer.spec.ts (3)
46-46: Remove debug configuration before merging.

The `{ numRuns: 10 }` setting reduces property-based test coverage significantly.

```diff
- { numRuns: 10 }, // Reduce runs to debug
```
{ numRuns: 10 }reduces property-based test coverage significantly.- { numRuns: 10 }, // Reduce runs to debug
75-75: Remove debug configuration.Same debug setting to remove as noted above.
- { numRuns: 10 },
124-125: Add.flat()for consistency with other tests.Lines 42 and 71 both call
.flat()on thetransform()result, but this test does not. Iftransform()returns a nested array structure, the comparison may fail or produce incorrect results.- const actual = await transform(transformer, chunks); + const actual = (await transform(transformer, chunks)).flat();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.changeset/performance-improvements.md (1 hunks)
src/parser/stream/StringCSVLexerTransformer.spec.ts (6 hunks)
✅ Files skipped from review due to trivial changes (1)
- .changeset/performance-improvements.md
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: :0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user `kamiazya` has confirmed that the changes made in PR #32 are correct and has requested the resolution of all review comments.
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
src/parser/stream/StringCSVLexerTransformer.spec.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/parser/stream/StringCSVLexerTransformer.spec.ts
🧬 Code graph analysis (1)
src/parser/stream/StringCSVLexerTransformer.spec.ts (1)
src/core/types.ts (1)
TokenNoLocation(114-114)
🪛 GitHub Actions: Main Workflow
src/parser/stream/StringCSVLexerTransformer.spec.ts
[error] 29-29: File content differs from formatting output. Formatting check (Biome) failed: 'File content differs from formatting output' during 'pnpm check:format'.
🔇 Additional comments (3)
src/parser/stream/StringCSVLexerTransformer.spec.ts (3)
4-11: LGTM on imports and test setup.

The new imports for `TokenNoLocation` and `Delimiter` align with the unified token model. The switch from concurrent to serial test execution is a valid choice for property-based tests that may have shared-state concerns.
101-118: Token expectation logic looks correct.

The iteration properly assigns `Delimiter.Record` to the last field in each row and `Delimiter.Field` to all preceding fields. This aligns with the unified token model.
27-31: Good edge case handling for empty CSV.

The conditional logic properly handles the empty-CSV case: when `csv.length === 0`, chunks become `[""]` and the expected tokens are an empty array. This correctly distinguishes between an empty CSV and a CSV containing empty fields.
- Consolidate FlexibleCSVRecordAssembler test files (6 → 2 files)
  - Merged array-output, field-count-limit, object-output, and prototype-safety tests into .test.ts
  - All 77 tests passing
- Consolidate FlexibleStringCSVLexer test files (4 → 2 files)
  - Merged buffer-overflow and undefined-check tests into .test.ts
  - All 33 tests passing
- Improve coverage configuration in vite.config.ts
  - Add explicit exclude patterns for test files
  - Add reporter configuration
  - Set reportsDirectory
- Add @vitest/coverage-v8 to devDependencies for future migration

Total: 110 test cases passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (7)
package.json (1)
211-235: Potentially unused `@vitest/coverage-v8` devDependency.

You now have both `@vitest/coverage-istanbul` and `@vitest/coverage-v8`, but the visible Vitest config still uses `coverage.provider = "istanbul"`. Unless you have another config that actually uses the V8 provider or plan to switch shortly, `@vitest/coverage-v8` looks unused and could be removed to keep installs lean (or, alternatively, switch `provider` to `"v8"` and drop the Istanbul plugin).

src/parser/models/FlexibleStringCSVLexer.test.ts (5)
7-11: Consider letting TS infer the lexer generics to reduce coupling.

The explicit type `FlexibleStringCSVLexer<",", '"', true>` on `lexer`, while constructing it via `new FlexibleStringCSVLexer({ trackLocation: true })`, tightly couples this test to the current generic defaults. Unless you rely on these type parameters elsewhere, you can simplify to a non-parameterized type (or at most fix `TrackLocation` to `true`) and let inference handle the rest, making the tests more robust to future API tweaks.
217-237: AbortSignal-based tests look good; consider guarding for runtime support of `AbortSignal.timeout`.

The AbortController/AbortSignal behavior and custom reason coverage are solid. One small concern is `AbortSignal.timeout(0)`, whose availability and semantics can vary across runtimes and Node versions. To avoid brittle failures on environments without this API or with different timing, you might:

- Skip this test when `typeof AbortSignal.timeout !== "function"`, or
- Use a small non-zero timeout and/or also assert on `signal.aborted` before constructing the lexer.

This keeps the semantics you're testing while avoiding environment-specific flakes.
Also applies to: 254-273
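A guarded probe along the lines suggested above might look like this; the surrounding test wiring is assumed, the feature check is the point:

```typescript
// AbortSignal.timeout is not available in every runtime, so probe before use.
const supportsTimeout = typeof (AbortSignal as any).timeout === "function";
console.log(typeof supportsTimeout); // "boolean"

if (supportsTimeout) {
  const signal: AbortSignal = (AbortSignal as any).timeout(5);
  // Once the timer fires, the signal aborts with a DOMException whose name
  // is "TimeoutError" on conforming runtimes.
  signal.addEventListener("abort", () => {
    console.log((signal.reason as { name?: string })?.name);
  });
}
```

In a test file, the same boolean can drive `it` vs `it.skip` so the suite stays green on runtimes without the API.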
275-341: Strengthen ParseError metadata assertions

These tests nicely validate that `source` and `rowNumber` are propagated on errors. Two small improvements to make them more precise:

- Assert `error instanceof ParseError` instead of just `instanceof Error` + `error.name === "ParseError"`, so a renamed/mis-thrown error class is caught earlier.
- Where practical, assert an exact `rowNumber` (or at least a narrow range) rather than `> 0`, if the row calculation is considered part of the contract.

Not required, but would tighten the guarantees these tests provide.
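A self-contained illustration of why the `instanceof` form is stricter (the local `ParseError` class below is a stand-in for the example, not the library's):

```typescript
// Stand-in error class for illustration only.
class ParseError extends Error {
  constructor(message: string, public rowNumber?: number) {
    super(message);
    this.name = "ParseError";
  }
}

// Any Error can carry the name, so a name-based check is weaker:
const impostor = new Error("boom");
impostor.name = "ParseError";

console.log(impostor.name === "ParseError"); // true  (name check passes)
console.log(impostor instanceof ParseError); // false (instanceof catches it)
```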
379-569: Buffer overflow / DoS tests are strong; consider tuning for runtime cost

These tests give thorough coverage of `maxBufferSize` semantics (default, custom, Infinity, streaming, and multiple attack patterns) and are very valuable from a robustness/security standpoint. Two optional tweaks you might consider:

- Some tests allocate quite large strings (11–20M characters). If the suite ever becomes slow or memory-sensitive, you could reduce the limits and chunk sizes proportionally (e.g., lower `maxBufferSize` in tests and shrink inputs) while preserving all behaviors being asserted.
- In the "Infinity" case, you could assert `.not.toThrow()` instead of `.not.toThrow(RangeError)` if you want to be certain that no other error type is unexpectedly raised.

Overall, the structure and intent of these tests look solid.
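For readers skimming the contract under test, here is a toy model of the `maxBufferSize` behavior being asserted (this is not the lexer's implementation; the function name and error message are illustrative):

```typescript
// Toy model: buffered input may never exceed maxBufferSize;
// Infinity disables the check entirely.
function appendChunk(buffer: string, chunk: string, maxBufferSize: number): string {
  const next = buffer + chunk;
  if (next.length > maxBufferSize) {
    throw new RangeError(
      `buffered ${next.length} characters exceeds maxBufferSize ${maxBufferSize}`,
    );
  }
  return next;
}

let buf = "";
buf = appendChunk(buf, '"aaaa', 10);        // fine: 5 <= 10
appendChunk(buf, "a".repeat(100), Infinity); // Infinity: never throws
```

Because the check is proportional, the same behaviors can be exercised with a small limit and small inputs, which is what makes shrinking the test sizes safe.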
97-99: Minor typos in test descriptions/comments

Very small nits that don't affect behavior but slightly impact readability:

- Test names: `"should thorw an error..."` / `"should thorw DOMException..."` → `"should throw..."`.
- Description: `"EOF(RCLF)"` → `"EOF(CRLF)"` for consistency with the actual line ending.

If you touch this file again, these quick fixes will make the test output a bit cleaner.
Also applies to: 124-126, 211-215, 227-237
src/parser/models/FlexibleCSVRecordAssembler.test.ts (1)
694-703: Duplicate `header: []` + object format runtime validation test

The test `"should throw error when header: [] with outputFormat: 'object'"` appears twice (once under "Array Output Format → Headerless mode (header: []) → Runtime validation errors" and again under "Headerless mode (header: []) - Runtime Validation Errors" for object output) with effectively identical expectations. You can safely drop one of them to avoid duplicate coverage and keep the suite leaner.
Also applies to: 930-942
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (10)
- package.json (1 hunks)
- src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts (0 hunks)
- src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts (0 hunks)
- src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts (0 hunks)
- src/parser/models/FlexibleCSVRecordAssembler.prototype-safety.test.ts (0 hunks)
- src/parser/models/FlexibleCSVRecordAssembler.test.ts (5 hunks)
- src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts (0 hunks)
- src/parser/models/FlexibleStringCSVLexer.test.ts (13 hunks)
- src/parser/models/FlexibleStringCSVLexer.undefined-check.test.ts (0 hunks)
- vite.config.ts (1 hunks)
💤 Files with no reviewable changes (6)
- src/parser/models/FlexibleCSVRecordAssembler.prototype-safety.test.ts
- src/parser/models/FlexibleCSVRecordAssembler.field-count-limit.test.ts
- src/parser/models/FlexibleStringCSVLexer.undefined-check.test.ts
- src/parser/models/FlexibleStringCSVLexer.buffer-overflow.test.ts
- src/parser/models/FlexibleCSVRecordAssembler.array-output.test.ts
- src/parser/models/FlexibleCSVRecordAssembler.object-output.test.ts
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: :0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user `kamiazya` has confirmed that the changes made in PR #32 are correct and has requested the resolution of all review comments.
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 53
File: .github/workflows/ci.yaml:23-39
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The user has clarified that in the `package.json` of the `web-csv-toolbox` project, the `test` script is defined as `vitest`, which means that the command `npm test run` correctly expands to `vitest run`.
Applied to files:
- package.json
- src/parser/models/FlexibleCSVRecordAssembler.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 29
File: src/transformers/LexerTransformer.ts:146-150
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The documentation within the `LexerTransformer` class was corrected to accurately describe the behavior of the `#nextToken` method when not flushing and the buffer does not end with a quotation mark, as per commit `d5c7acd6cd074296ab8dd39c62acb3efd84d4707`.
Applied to files:
src/parser/models/FlexibleStringCSVLexer.test.ts
📚 Learning: 2024-06-10T19:31:23.296Z
Learnt from: kamiazya
Repo: kamiazya/web-csv-toolbox PR: 32
File: src/__tests__/helper.ts:0-0
Timestamp: 2024-06-10T19:31:23.296Z
Learning: The `csv` function that was previously commented out in `src/__tests__/helper.ts` has been confirmed as removed in commit `f567442793c67651e6dbb934623b05540c33c8d1`.
Applied to files:
- src/parser/models/FlexibleStringCSVLexer.test.ts
- src/parser/models/FlexibleCSVRecordAssembler.test.ts
🧬 Code graph analysis (1)
src/parser/models/FlexibleCSVRecordAssembler.test.ts (4)
src/core/types.ts (2)
- `AnyToken` (165-165)
- `CSVRecordAssembler` (2198-2203)

src/parser/api/model/createCSVRecordAssembler.ts (1)
- `createCSVRecordAssembler` (82-159)

src/parser/models/FlexibleCSVArrayRecordAssembler.ts (1)
- `RangeError` (192-200)

src/parser/models/FlexibleCSVObjectRecordAssembler.ts (1)
- `RangeError` (185-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build / Build TypeScript
🔇 Additional comments (6)
vite.config.ts (1)
220-233: Coverage filters and reporters look well-aligned with the test layout

The `coverage` configuration is consistent: `include` targets `src/**/*.ts`, the new `exclude` patterns cleanly drop all test/test-d files and test directories (including browser/node variants), and the reporter set plus `reportsDirectory: "./coverage"` are standard choices that should work smoothly with the current Istanbul provider.

src/parser/models/FlexibleStringCSVLexer.test.ts (1)
343-377: Good targeted regression coverage for undefined/flush edge cases

The "Undefined checks" block is a useful guard against regressing into TypeErrors or incorrect behavior when dealing with incomplete vs complete quoted fields and simple unquoted fields. Using `toThrow(ParseError)` explicitly for the incomplete case is also a good safety net against the wrong error type leaking out. No changes needed here.

src/parser/models/FlexibleCSVRecordAssembler.test.ts (4)
2-3: Abort/timeout tests are correctly migrated to the unified token model

The AbortSignal/Timeout tests now build `AnyToken[]` with `Delimiter.Record` and realistic locations, and still exercise the expected `AbortError`, custom error, and `TimeoutError` behaviors against `createCSVRecordAssembler`. The token shape and expectations look consistent with the new lexer/assembler API, and the tests remain focused and deterministic.

Also applies to: 54-99, 114-128
181-563: Array output, includeHeader, and columnCountStrategy tests look coherent and thorough

The new "Array Output Format" tests comprehensively exercise `outputFormat: "array"`, `includeHeader`, all columnCountStrategy options (`keep`, `fill`, `sparse`, `strict`, `truncate`), and headerless mode (`header: []`) — including variable-length rows and invalid configuration cases. The expectations line up with the documented factory constraints and the intended semantics for padding, truncation, and strictness, so this block gives strong coverage of the new behavior.
708-979: Object output and columnCountStrategy constraints are well specified in tests

The "Object Output Format" tests do a good job of pinning down the new contract: `fill` and `strict` as the only valid strategies for object output, with clear expectations for short/long rows and empty vs missing fields, and explicit rejections of `sparse`, `keep`, and `truncate`, plus all `header: []` + object combinations. This should make regressions around the breaking strategy change very obvious.
982-1182: Prototype pollution regression coverage is strong and well targeted

The new tests around `__proto__`, `constructor`, `prototype`, and other dangerous keys (including multi-header and multi-record cases) are very solid: they verify that parsed records have these as own string properties while fresh `{}` instances remain unpolluted, and they document the baseline `Object.fromEntries` behavior you rely on internally. This is exactly the kind of safety net you want around the new token/assembler changes.
…RecordAssembler tests
Summary
This PR includes significant internal optimizations that improve JavaScript-based CSV parsing performance, along with important API improvements.
Performance Improvements
Final Performance Results (Pure JavaScript)
Array format is approximately 46% faster than Object format for the same data.
Breaking Changes
1. Column Count Strategy Rename and New Default
The `columnCountStrategy` option has been updated:

- `pad` renamed to `sparse`: the old `pad` strategy has been renamed to `sparse` to better reflect its behavior of filling missing fields with `undefined`.
- New `fill` strategy: fills missing fields with empty strings (`""`), and is now the default for both array and object formats.
- The `sparse` strategy is no longer allowed for object format because `undefined` values are not type-safe for objects; using `sparse` with object format will throw a runtime error.
- The default `columnCountStrategy` is now `fill` instead of `keep`/`pad`.

Migration Guide
Before (v0.x):
After (v1.x):
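The original before/after snippets are not reproduced here. As a hedged sketch of the semantics behind the rename (illustrative logic, not the library's implementation; long-row behavior under `fill`/`sparse` is an assumption):

```typescript
type Strategy = "fill" | "sparse" | "keep" | "strict" | "truncate";

// Illustrative model of how each strategy reconciles a row against the
// header width.
function reconcile(row: string[], width: number, strategy: Strategy): (string | undefined)[] {
  if (row.length === width) return row;
  switch (strategy) {
    case "keep": // leave variable-length rows untouched
      return row;
    case "fill": // new default: pad missing fields with ""
      return row.length < width
        ? [...row, ...Array<string>(width - row.length).fill("")]
        : row;
    case "sparse": // formerly "pad": pad with undefined (array output only)
      return row.length < width
        ? [...row, ...Array<undefined>(width - row.length).fill(undefined)]
        : row;
    case "truncate": // drop extra fields
      return row.slice(0, width);
    case "strict": // any mismatch is an error
      throw new RangeError(`expected ${width} fields, got ${row.length}`);
  }
}

reconcile(["a"], 3, "fill");   // ["a", "", ""]
reconcile(["a"], 3, "sparse"); // ["a", undefined, undefined]
```

In practice, code that previously passed `columnCountStrategy: "pad"` should now pass `"sparse"` (array output only), and code that omits the option now gets `"fill"` rather than the old default.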
2. Lexer API Changes (Low-level API only)
These changes only affect users of the low-level Lexer API. High-level APIs (`parseString`, `parseBinary`, etc.) are unchanged.

- Token type tags changed from `Symbol` values to numeric constants.
- Location tracking is now opt-in: add `trackLocation: true` to Lexer options if you need token location information. Note: error messages still include position information even when `trackLocation: false` (computed lazily, only when errors occur).

Who is affected?
Most users are NOT affected. Only users who directly use `FlexibleStringCSVLexer` and rely on `token.location` or `Symbol`-based token type comparison need to update their code.

Test plan
Summary by CodeRabbit
New Features
Breaking Changes
Improvements