
Conversation

@kamiazya
Owner

@kamiazya kamiazya commented Nov 4, 2025

Summary

Optimizes CSVRecordAssembler by replacing chained array methods with an efficient single-loop implementation.

Performance Optimization

  • Before: .map().filter().map() - 3 array passes
  • After: Single for loop - 1 pass
  • Applied to: 3 critical code paths
    • RecordDelimiter handler (processing non-empty records)
    • Empty line handler (skipEmptyLines=false case)
    • Flush handler (buffered data)
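
As a standalone sketch of the before/after pattern (the field names `header` and `row` here are illustrative stand-ins for CSVRecordAssembler's private `#header`/`#row` fields):

```typescript
// Illustrative sketch only; not the actual CSVRecordAssembler source.

// Before: three passes over the header plus two intermediate arrays.
function assembleChained(
  header: readonly string[],
  row: readonly (string | undefined)[],
): Record<string, string | undefined> {
  const entries = header
    .map((h, i): [string, string | undefined] => [h, row[i]])
    .filter(([h]) => Boolean(h))
    .map(([h, v]): [string, string | undefined] => [h, v]);
  return Object.fromEntries(entries);
}

// After: one pass, pushing directly into the output entries array,
// with no intermediate arrays allocated.
function assembleSingleLoop(
  header: readonly string[],
  row: readonly (string | undefined)[],
): Record<string, string | undefined> {
  const entries: [string, string | undefined][] = [];
  for (let i = 0; i < header.length; i++) {
    const h = header[i];
    if (h) {
      entries.push([h, row[i]]);
    }
  }
  return Object.fromEntries(entries);
}
```

Both variants produce the same object for the same input, e.g. `assembleSingleLoop(["name", "age"], ["Alice", "30"])` yields `{ name: "Alice", age: "30" }`.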

Benefits

  • Reduces CPU cycles during record assembly
  • Eliminates intermediate array allocations
  • Particularly beneficial for CSVs with many columns
  • Complements the earlier CSVLexer buffer pointer optimization

Testing

  • ✅ All 460 tests pass
  • ✅ Benchmark suite runs successfully
  • ✅ Functionally identical output behavior

Related

🤖 Generated with Claude Code

Replace chained array methods with efficient single-loop implementation
to reduce array iterations from 3 passes to 1 pass.

Changes:
- RecordDelimiter handler: map().filter().map() → single loop
- Empty line handler: filter().map() → single loop
- Flush handler: map().filter().map() → single loop

This optimization reduces CPU cycles during record assembly,
particularly beneficial for CSVs with many columns.

All 460 tests pass. Complements CSVLexer buffer pointer optimization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
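
A quick, hypothetical way to sanity-check the effect of the pattern locally (an illustrative harness only, not the project's CodSpeed benchmark suite; assumes a runtime with the global `performance` API, such as Node 16+):

```typescript
// Hypothetical micro-benchmark sketch; not the project's benchmark suite.
// 100 columns, with every 10th header empty so the filter branch is exercised.
const header = Array.from({ length: 100 }, (_, i) =>
  i % 10 === 0 ? "" : `col${i}`,
);
const row = Array.from({ length: 100 }, (_, i) => String(i));

// Three passes: map -> filter -> map.
function chained(): Record<string, string> {
  return Object.fromEntries(
    header
      .map((h, i): [string, string] => [h, row[i]])
      .filter(([h]) => Boolean(h))
      .map(([h, v]): [string, string] => [h, v]),
  );
}

// One pass: single loop with a direct push.
function singleLoop(): Record<string, string> {
  const entries: [string, string][] = [];
  for (let i = 0; i < header.length; i++) {
    const h = header[i];
    if (h) entries.push([h, row[i]]);
  }
  return Object.fromEntries(entries);
}

for (const [name, fn] of [
  ["chained", chained],
  ["singleLoop", singleLoop],
] as const) {
  const start = performance.now();
  for (let n = 0; n < 50_000; n++) fn();
  console.log(`${name}: ${(performance.now() - start).toFixed(1)} ms`);
}
```

Absolute timings vary by runtime and JIT warm-up; the point is only to compare the two shapes on identical input.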
@changeset-bot

changeset-bot bot commented Nov 4, 2025

🦋 Changeset detected

Latest commit: 30ac954

The changes in this PR will be included in the next version bump.


@coderabbitai
Contributor

coderabbitai bot commented Nov 4, 2025

Warning

Rate limit exceeded

@kamiazya has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between f8d7ffb and 30ac954.

📒 Files selected for processing (2)
  • .changeset/perf-csv-record-assembler.md (1 hunks)
  • src/CSVRecordAssembler.ts (2 hunks)

@gemini-code-assist
Contributor

Summary of Changes

Hello @kamiazya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of the CSVRecordAssembler by refactoring array processing logic. It replaces inefficient chained array methods with optimized single-loop implementations across key data handling paths, aiming to reduce CPU load and memory footprint, especially when processing large CSV files. The changes maintain identical output behavior, with all existing tests passing.

Highlights

  • Performance Optimization: Replaced chained array methods (e.g., .map().filter().map()) with a single for loop in CSVRecordAssembler to reduce array passes from three to one.
  • Scope of Optimization: The optimization was applied to three critical code paths: the RecordDelimiter handler, the empty line handler (when skipEmptyLines=false), and the flush handler.
  • Performance Benefits: This change reduces CPU cycles during record assembly, eliminates intermediate array allocations, and is particularly beneficial for CSVs with many columns, complementing previous CSVLexer optimizations.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces significant performance optimizations to the CSVRecordAssembler by replacing chained array methods with single-loop implementations. This change reduces CPU cycles and eliminates intermediate array allocations, which is particularly beneficial for CSVs with many columns. The changes are well-documented in the new .changeset file and the code comments. All tests pass, and benchmarks run successfully, indicating that the functional behavior remains identical while improving performance. The approach aligns with best practices for optimizing array processing in JavaScript/TypeScript.

Comment on lines +113 to +121
```typescript
// Optimize: single loop instead of map().filter().map()
const entries: [string, string | undefined][] = [];
for (let i = 0; i < this.#header.length; i++) {
  const header = this.#header[i];
  if (header) {
    entries.push([header, this.#row[i]]);
  }
}
yield Object.fromEntries(entries) as unknown as CSVRecord<Header>;
```
Severity: medium

The optimization from chained map().filter().map() to a single for loop is a significant improvement for performance, especially when dealing with large datasets. This change reduces array iterations and avoids intermediate array allocations, which aligns with the PR's goal of optimizing CPU cycles.

Comment on lines +124 to +132
```typescript
// Optimize: single loop instead of filter().map()
const entries: [string, string][] = [];
for (let i = 0; i < this.#header.length; i++) {
  const header = this.#header[i];
  if (header) {
    entries.push([header, ""]);
  }
}
yield Object.fromEntries(entries) as CSVRecord<Header>;
```
Severity: medium

Similar to the previous change, replacing filter().map() with a single for loop here also contributes to better performance by reducing array passes and memory allocations. This consistency in optimization is good.

Comment on lines +154 to +162
```typescript
// Optimize: single loop instead of map().filter().map()
const entries: [string, string | undefined][] = [];
for (let i = 0; i < this.#header.length; i++) {
  const header = this.#header[i];
  if (header) {
    entries.push([header, this.#row[i]]);
  }
}
yield Object.fromEntries(entries) as unknown as CSVRecord<Header>;
```
Severity: medium

Applying the same single-loop optimization to the #flush method ensures that this critical path also benefits from the performance improvements. This thoroughness in applying the optimization across all relevant code paths is commendable.

@codecov

codecov bot commented Nov 4, 2025

Bundle Report

Changes will increase total bundle size by 324 bytes (0.04%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| web-csv-toolbox-CSV-esm | 292.06kB | 324 bytes (0.11%) ⬆️ |

Affected Assets, Files, and Routes:


Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| CSVRecordAssembler.js | 324 bytes | 4.31kB | 8.13% ⚠️ |

Files in CSVRecordAssembler.js:

  • ./src/CSVRecordAssembler.ts → Total Size: 4.11kB

@codspeed-hq

codspeed-hq bot commented Nov 4, 2025

CodSpeed Performance Report

Merging #571 will not alter performance

Comparing perf/optimize-csv-record-assembler (30ac954) with main (f8d7ffb)

Summary

✅ 57 untouched
