Added PO handling for new lines + tests #3282

pasotee · 2025-10-20T11:12:43Z

Summary by CodeRabbit

Bug Fixes
- Improved parsing of multiline translations in PO files, including correct handling of LF and CRLF line breaks.
- Enhanced interpretation of escape sequences inside quoted strings so escaped characters are correctly rendered.

Tests
- Updated test fixtures and expectations to reflect additional parsed translations and multiline handling.

coderabbitai · 2025-10-20T11:12:58Z

Walkthrough

PO parsing now splits header metadata lines on raw newline characters and decodes common escape sequences inside quoted strings (e.g., `\n`, `\r`, `\t`, `\"`, `\\`) into the characters they represent. Tests and the example PO resource were updated to cover multiline translations (LF and CRLF).
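As a rough illustration of the escape handling described above, the following Kotlin sketch decodes the body of a quoted PO string. The function name `decodePoString` and the handling of a trailing lone backslash are assumptions made for this example; the real `PoParser` processes characters with its own state (`currentEscaped`) and may differ in detail.

```kotlin
// Hypothetical helper, not the actual PoParser code: decodes the text found
// between the double quotes of a PO string.
fun decodePoString(raw: String): String {
    val out = StringBuilder()
    var escaped = false // true when the previous char was an unconsumed backslash
    for (c in raw) {
        if (!escaped) {
            if (c == '\\') escaped = true else out.append(c)
            continue
        }
        escaped = false
        when (c) {
            'n' -> out.append('\n')
            'r' -> out.append('\r')
            't' -> out.append('\t')
            '"' -> out.append('"')
            '\\' -> out.append('\\')
            else -> out.append('\\').append(c) // unknown escape: keep backslash + char
        }
    }
    if (escaped) out.append('\\') // assumption: a trailing lone backslash is kept as-is
    return out.toString()
}
```

With this sketch, the two-character sequence backslash + `n` in the file becomes a single LF, while an unrecognized sequence such as backslash + `q` passes through unchanged.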

Changes

| Cohort / File(s) | Summary |
|---|---|
| Parser logic `backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt` | `processHeader` now splits header text on raw `\n` characters. Character handling for escaped sequences was changed: when inside quotes and `currentEscaped` is true, common escapes (`n`, `r`, `t`, `"`, `\`) are converted to their corresponding characters; unknown escapes preserve the backslash plus character. |
| Unit tests `backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt`, `backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt` | Added assertions in PoParserTest validating multiline `msgstr` values (LF and CRLF). Updated PoFileProcessorTest expected translation count from 9 to 11. |
| Test resources `backend/data/src/test/resources/import/po/example.po` | Added two PO entries demonstrating multiline translations: one using LF (`\n`) and one using CRLF (`\r\n`) line breaks in `msgstr`. |

Sequence Diagram(s)

sequenceDiagram
  participant FS as FileSystem
  participant Parser as PoParser
  participant Test as TestRunner

  FS->>Parser: load example.po (header + entries)
  note right of Parser #DDEEFF: Header processing
  Parser->>Parser: processHeader(split by raw "\n")
  note right of Parser #F6F2DD: Entry parsing
  Parser->>Parser: parse msgid/msgstr chars
  alt escaped char inside quotes
    Parser->>Parser: translate '\n','\r','\t','"','\\' → actual char
  else unknown escape
    Parser->>Parser: keep backslash + char
  end
  Parser->>Test: produce translations (11)
  Test->>FS: compare against updated example.po expectations

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled backslashes late at night,

Turned \n to breath and \" to light.
Two new lines hopped into view,
CRLF and LF—both true.
Hooray, the parser sings anew!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Added PO handling for new lines + tests" directly and accurately reflects the main changes in the pull request. The changeset focuses on improving how the PO file parser handles newline characters (both LF and CRLF) in multiline translations, with corresponding parser logic updates in PoParser.kt and multiple test additions and updates to validate this behavior. The title is specific enough to convey the primary intent without being overly verbose, and it clearly indicates what functionality was added and that tests were included. |


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32c24b6 and bc273ef.

📒 Files selected for processing (1)

backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2)
54-68: Header split on LF aligns with new decoding; add safety and locale fixes.

Good change. Two follow-ups:

- Guard malformed header lines (no colon) to avoid IndexOutOfBounds.
- Use `Locale.ROOT` for case-insensitive keys; avoid default-locale surprises (e.g., Turkish i).

Minor: use `lineSequence()`/`forEach` instead of map-for-side-effects.

Apply:
-      it.msgstr.split("\n").map { metaLine ->
-        val trimmed = metaLine.trim()
-        if (trimmed.isBlank()) {
-          return@map
-        }
-        val colonPosition = trimmed.indexOf(":")
-        val key = trimmed.substring(0 until colonPosition)
-        val value = trimmed.substring(colonPosition + 1).trim()
-        when (key.lowercase(Locale.getDefault())) {
+      it.msgstr.lineSequence().forEach { metaLine ->
+        val trimmed = metaLine.trim()
+        if (trimmed.isEmpty()) return@forEach
+        val colonPosition = trimmed.indexOf(':')
+        if (colonPosition <= 0) return@forEach
+        val key = trimmed.substring(0, colonPosition).lowercase(Locale.ROOT)
+        val value = trimmed.substring(colonPosition + 1).trim()
+        when (key) {
           "project-id-version" -> result.projectIdVersion = value
           "language" -> result.language = value
           "plural-forms" -> result.pluralForms = value
           else -> result.other[key] = value
         }
       }

Please add a small test asserting that header metadata (e.g., language/plural-forms) still parses when a line lacks a colon or has trailing spaces.
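To make the suggestion above concrete, here is a self-contained Kotlin sketch of the guarded header parsing. `PoHeaderMeta` and `parseHeaderMeta` are hypothetical names standing in for the parser's real types, which are not shown in this review; only the loop body mirrors the proposed diff.

```kotlin
import java.util.Locale

// Hypothetical stand-in for the parser's header metadata holder.
class PoHeaderMeta {
    var projectIdVersion: String? = null
    var language: String? = null
    var pluralForms: String? = null
    val other = mutableMapOf<String, String>()
}

// Parses "Key: value" lines from an already decoded header msgstr.
fun parseHeaderMeta(msgstr: String): PoHeaderMeta {
    val result = PoHeaderMeta()
    msgstr.lineSequence().forEach { metaLine ->
        val trimmed = metaLine.trim()
        if (trimmed.isEmpty()) return@forEach
        val colonPosition = trimmed.indexOf(':')
        // Skip lines with no colon or an empty key instead of throwing.
        if (colonPosition <= 0) return@forEach
        // Locale.ROOT keeps key matching independent of the JVM default locale.
        val key = trimmed.substring(0, colonPosition).lowercase(Locale.ROOT)
        val value = trimmed.substring(colonPosition + 1).trim()
        when (key) {
            "project-id-version" -> result.projectIdVersion = value
            "language" -> result.language = value
            "plural-forms" -> result.pluralForms = value
            else -> result.other[key] = value
        }
    }
    return result
}
```

The locale point matters because lowercasing under a Turkish locale maps `I` to a dotless `ı`, so a key such as `PROJECT-ID-VERSION` would no longer match `"project-id-version"`.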

129-137: Escape decoding inside quotes looks right; consider completing the table.

The current set covers `n`, `r`, `t`, `"`, and `\`. Consider adding the common C escapes `b` (backspace), `f` (form feed), and `v` (vertical tab). Unknown escapes remain literal (good).
       val specialEscape: Char? = if (quoted) when (this) {
         'n'  -> '\n'
         'r'  -> '\r'
         't'  -> '\t'
+        'b'  -> '\b'
+        'f'  -> '\u000C'
+        'v'  -> '\u000B'
         '"'  -> '"'
         '\\' -> '\\'
         else -> null
       } else null

Add one assertion that unknown escapes (e.g., `"\q"`) are preserved as two characters and that `"\b"` maps to backspace when enabled.

Also applies to: 138-143
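
The requested assertions are easier to write if the escape table is viewed as a pure function. The sketch below is an assumption about how that could look; `specialEscapeFor` is a hypothetical name, and the real code computes `specialEscape` inline on the current character.

```kotlin
// Hypothetical pure-function form of the extended escape table.
// Returns null for unknown escapes so the caller can keep backslash + char.
fun specialEscapeFor(c: Char): Char? = when (c) {
    'n' -> '\n'
    'r' -> '\r'
    't' -> '\t'
    'b' -> '\b'      // backspace (U+0008)
    'f' -> '\u000C'  // form feed
    'v' -> '\u000B'  // vertical tab
    '"' -> '"'
    '\\' -> '\\'
    else -> null
}
```

A test can then assert that `specialEscapeFor('q')` is `null` and that `specialEscapeFor('b')` is the backspace character.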
backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt (1)
30-31: Avoid index fragility; assert by msgid.

Order can shift; look up entries by msgid for robustness.
-    assertThat(result.translations[10].msgstr.toString()).isEqualTo("This\nis\na\nmultiline\nstring")
-    assertThat(result.translations[11].msgstr.toString()).isEqualTo("This\r\nis\r\na\r\nmultiline\r\nstring")
+    val lf = result.translations.first { it.msgid.toString() == "Multiline message with \\n" }
+    assertThat(lf.msgstr.toString()).isEqualTo("This\nis\na\nmultiline\nstring")
+    val crlf = result.translations.first { it.msgid.toString() == "Multiline message with \\n\\r" }
+    assertThat(crlf.msgstr.toString()).isEqualTo("This\r\nis\r\na\r\nmultiline\r\nstring")
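
A small helper can make msgid-based lookups fail with a readable message rather than a bare `NoSuchElementException`. This is only a sketch: `PoEntry` is a simplified, hypothetical stand-in for the parser's translation type, and `byMsgid` does not exist in the codebase.

```kotlin
// Hypothetical, simplified stand-in for a parsed PO entry.
data class PoEntry(val msgid: String, val msgstr: String)

// Finds an entry by msgid regardless of its position in the list,
// failing with a descriptive message if it is absent.
fun List<PoEntry>.byMsgid(msgid: String): PoEntry =
    firstOrNull { it.msgid == msgid } ?: error("No PO entry with msgid '$msgid'")
```

Because the lookup ignores position, adding or reordering fixture entries would not break assertions written this way.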
backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt (1)
32-32: Consider asserting presence, not only count.

Exact size can fluctuate as the fixture evolves. Optionally assert keys exist to make intent explicit.
 assertThat(mockUtil.fileProcessorContext.translations).hasSize(11)
+assertThat(mockUtil.fileProcessorContext.translations.keys)
+  .contains("Multiline message with \\n", "Multiline message with \\n\\r")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99cbd12 and 32c24b6.

📒 Files selected for processing (4)

backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2 hunks)
backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt (1 hunks)
backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt (1 hunks)
backend/data/src/test/resources/import/po/example.po (1 hunks)

🔇 Additional comments (1)

backend/data/src/test/resources/import/po/example.po (1)

53-57: Test data additions LGTM.

The LF/CRLF cases are well chosen and align with parser changes.

Added PO handling for new lines + tests

32c24b6

coderabbitai bot reviewed Oct 20, 2025


JanCizmar requested a review from Anty0 October 20, 2025 16:32

fixed linting

bc273ef
