Skip to content

Conversation

@mitchellhenke
Copy link
Contributor

@mitchellhenke mitchellhenke commented Oct 25, 2025

Thanks for making and maintaining this package, it's really excellent! I have been running into the issue mentioned in #259 (comment) where a name of "\" causes the process to fail with:

ERROR:  extra data after last expected column
CONTEXT:  COPY __ways17535, line 1: " 112       8720801 50      50      0       UNKNOWN 0       0.032828469505751391    -83.6166172     42.2295873      -83.63609       42.2560168   6247..."

I noticed that in the past there was escaping of backslashes (a120fc2), but I wasn't able to track down the removal, so I've taken a similar approach here. I've included a minimal test OSM file below that fails without the patch and succeeds with it. It's based on a much larger file that fails and succeeds in the same way.

Test File
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
	<bounds minlat="42.187151" minlon="-83.699906" maxlat="42.300306" maxlon="-83.553524"/>
	<node id="62472359" lat="42.2295873" lon="-83.6166172" version="7" timestamp="2023-08-17T19:02:05Z" changeset="0"/>
	<node id="62472289" lat="42.2559569" lon="-83.6361373" version="4" timestamp="2025-02-15T13:25:01Z" changeset="0"/>
	<node id="4714375469" lat="42.2560168" lon="-83.63609" version="2" timestamp="2025-02-15T13:25:01Z" changeset="0"/>
	<node id="2169047981" lat="42.2299107" lon="-83.6163898" version="1" timestamp="2013-02-23T16:32:23Z" changeset="0"/>
	<way id="8720801" version="4" timestamp="2025-02-15T13:25:01Z" changeset="0">
		<nd ref="62472289"/>
		<nd ref="4714375469"/>
		<tag k="name" v="\"/>
		<tag k="highway" v="service"/>
	</way>
	<way id="8720804" version="7" timestamp="2018-09-06T20:44:47Z" changeset="0">
		<nd ref="62472359"/>
		<nd ref="2169047981"/>
		<tag k="name" v="a\b"/>
		<tag k="highway" v="motorway_link"/>
	</way>
</osm>

Changes proposed in this pull request:

  • Escape backslashes when building tab-separated row

@pgRouting/admins

Summary by CodeRabbit

  • Bug Fixes
    • Fixed improper handling of backslash characters in tab-separated data output to ensure backslashes are correctly escaped during data export, preventing formatting issues and data loss.

@coderabbitai
Copy link

coderabbitai bot commented Oct 25, 2025

Walkthrough

A Boost header for string replacement was added to src/utilities/utilities.cpp. The tab_separated function now escapes backslashes by replacing \ with \\ in non-empty columns before appending output. No public interfaces were modified.

Changes

Cohort / File(s) Summary
Backslash escaping in tab-separated output
src/utilities/utilities.cpp
Added Boost string replace header inclusion; modified tab_separated function to escape backslashes in non-empty columns by replacing \ with \\ before result appending

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify backslash escaping is applied at the correct stage in the pipeline (before appending to result)
  • Confirm the Boost header inclusion is necessary and correctly referenced
  • Check for potential edge cases with special character combinations or empty columns

Poem

🐰 A backslash hops through code today,
With double form, it's here to stay!
One becomes two, a simple feat,
Tab-separated rows now neat!
Boost strings join this woodland dance,
Where escaped chars take their chance. 🌿

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "Escape backslashes when building tab-separated row" accurately and specifically describes the main change in the changeset. The title directly corresponds to the implementation in src/utilities/utilities.cpp, where backslashes are escaped by replacing "\" with "\\" in the tab_separated function before appending to the result. The title is concise, clear, and uses specific language that allows a teammate scanning the repository history to immediately understand the primary purpose of the change without ambiguity.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utilities/utilities.cpp (1)

37-48: Add escaping for tabs, newlines, and carriage returns in tab_separated().

PostgreSQL's COPY text format requires that backslashes, newlines, carriage returns, and the delimiter character must be preceded by a backslash if they appear as part of a column value. The current implementation at src/utilities/utilities.cpp:37-48 only escapes backslashes but omits the three other required escape sequences. Since tab_separated() uses tab as the delimiter, actual tab characters in the data would corrupt the format.

Add the following escape operations before line 44 (after the backslash escape):

boost::replace_all(column, "\n", "\\n");
boost::replace_all(column, "\r", "\\r");
boost::replace_all(column, "\t", "\\t");

This fix applies to all calls to tab_separated() in the codebase (lines 466 in src/database/Export2DB.cpp, line 79 in src/configuration/tag_key.cpp, and line 97 in include/database/Export2DB.h), particularly for way names and other OSM-sourced data that may contain these characters.

🧹 Nitpick comments (2)
src/utilities/utilities.cpp (2)

40-40: Simplify the empty check.

The condition column.empty() || column == "" is redundant since empty() already checks for zero length.

Apply this diff:

-        if (column.empty() || column == "") {
+        if (column.empty()) {

39-39: Optional: Consider optimizing to avoid copying strings without backslashes.

The loop uses pass-by-value (auto column), which copies every string even when no escaping is needed. For better performance when backslashes are rare, consider using const auto& and only copying when replacement is necessary.

Example optimization:

-    for (auto column: columns) {
+    for (const auto& column: columns) {
         if (column.empty()) {
             result += "\\N\t";
+        } else if (column.find('\\') != std::string::npos) {
+            auto escaped = column;
+            boost::replace_all(escaped, "\\", "\\\\");
+            result += escaped + "\t";
         } else {
-            boost::replace_all(column, "\\", "\\\\");
             result += column + "\t";
         }

Note: This adds complexity, so only worthwhile if profiling shows string copies are a bottleneck.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b79a39 and 798d8ec.

📒 Files selected for processing (1)
  • src/utilities/utilities.cpp (2 hunks)
🔇 Additional comments (1)
src/utilities/utilities.cpp (1)

23-23: LGTM: Header addition is correct.

The Boost string replace header is necessary for the boost::replace_all function used below.

@cvvergara
Copy link
Member

All tests are failing, Create the #318 and merged your changes

@cvvergara cvvergara closed this Oct 28, 2025
@mitchellhenke
Copy link
Contributor Author

Thank you for the quick review and merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants