ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

maparent · 2025-12-25T17:16:46Z

https://linear.app/discourse-graphs/issue/ENG-1211/refactor-exportutils-factor-out-utility-functions-and-markdown

Summary by CodeRabbit

New Features
- Added richer Roam-to-Markdown export with frontmatter, improved embed and reference resolution, and optional discourse context.
- Enhanced filename generation and link formatting for exported pages.
- Support for collecting block IDs and converting block structures for exports.
Refactor
- Consolidated export logic into reusable utilities and centralized page-to-markdown rendering.
- Export flows now early-return on empty input to avoid producing invalid outputs.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…own. This is pure refactoring with no change of functionality.

linear · 2025-12-25T17:16:49Z

ENG-1211 Refactor exportUtils, factor out utility functions and markdown

supabase · 2025-12-25T17:16:55Z

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.

Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

maparent · 2025-12-25T18:39:29Z

Explanations about the breakup here: https://www.loom.com/share/afa48744aa0441628ec182a2147e8321
Also note: The coderabbit work was done in 1032 , this is just splitting off the code changes.

mdroidian · 2025-12-25T21:16:50Z

@coderabbitai full review

coderabbitai · 2025-12-25T21:16:55Z

✅ Actions performed

Full review triggered.

coderabbitai · 2025-12-25T21:22:33Z

📝 Walkthrough

Walkthrough

Adds new Roam export utilities and a page-to-Markdown renderer, and refactors getExportTypes to delegate page data extraction, filename/link handling, and Markdown generation to the new modules (exportUtils.ts and pageToMarkdown.ts). Changes are focused on export formatting and data extraction logic.

Changes

Cohort / File(s)	Change Summary
New export helpers `apps/roam/src/utils/exportUtils.ts`	New utility module: `uniqJsonArray`, `getPageData`, `getContentFromNodes`, `getFilename`, `toLink`, `pullBlockToTreeNode`, `collectUids` — handles deduplication, discourse-node matching, filename sanitization, link formatting, and PullBlock→TreeNode conversion.
Markdown renderer `apps/roam/src/utils/pageToMarkdown.ts`	New module: `toMarkdown` and `pageToMarkdown` that render TreeNode/Result into Markdown with frontmatter, discourse context, embed and reference handling, block embedding/children support, filename/link integration, and UID collection.
Export orchestration refactor `apps/roam/src/utils/getExportTypes.ts`	Refactored to use `getPageData` and `pageToMarkdown`; removed local implementations (pull/formatting, filename, frontmatter); added guards for empty inputs; removed exported `DiscourseExportResult` type.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

ENG-527 resolve block embeds for page export not working for embed #265: Modifies Markdown export embedding logic and relates to embed/embed-children regexes and their handling in the new toMarkdown implementation.

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: refactoring exportUtils and factoring out utility functions and markdown handling into new modules, which aligns directly with the three new/modified files (exportUtils.ts, pageToMarkdown.ts, getExportTypes.ts).
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

apps/roam/src/utils/getExportTypes.ts (1)

224-241: TODO comment indicates future consolidation opportunity.

The comment at line 224 notes that treeNodeToMarkdown logic should be reused with the markdown export. Since pageToMarkdown.ts now exists with a toMarkdown function, this could potentially be addressed in a follow-up PR to reduce duplication between the PDF and Markdown exports.

Would you like me to open an issue to track consolidating the PDF export's treeNodeToMarkdown with the existing toMarkdown from pageToMarkdown.ts?

apps/roam/src/utils/exportUtils.ts (1)

8-17: Consider performance implications of uniqJsonArray.

The implementation uses JSON.stringify with sorted entries for deduplication, which works correctly but has O(n * m log m) complexity where m is the number of keys per object. For large arrays with complex objects, this could be slow.

For the current use case (filtering relation data), this should be acceptable, but worth noting if used with larger datasets.
apps/roam/src/utils/pageToMarkdown.ts (1)
96-110: YAML frontmatter escaping is limited to the text field.

Line 101-102 correctly escapes quotes in the text field, but other dynamic fields (line 104) are inserted without escaping. Values containing YAML special characters (:, \n, #, etc.) could produce malformed YAML.

Since this is for user-controlled exports, it's not a security issue, but could cause parsing errors when importing the exported markdown.
🔎 Optional: Comprehensive YAML escaping
       if (capt === "text") {
-        // Wrap title in quotes and escape additional quotes
-        const escapedText = result[capt].toString().replace(/"/g, '\\"');
-        return `"${escapedText}"`;
+        const escapedText = result[capt].toString().replace(/"/g, '\\"');
+        return `"${escapedText}"`;
       }
-      return result[capt].toString();
+      const value = result[capt].toString();
+      // Quote values containing YAML special characters
+      if (/[:\n#[\]{}|>&*!?]/.test(value) || value.includes('"')) {
+        return `"${value.replace(/"/g, '\\"')}"`;
+      }
+      return value;

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dfa920 and 4d811e4.

📒 Files selected for processing (3)

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

🧠 Learnings (10)

📚 Learning: 2025-06-22T10:40:52.752Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-07-13T16:47:14.352Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-07-13T16:47:14.352Z
Learning: In the discourse-graph codebase, types.gen.ts contains automatically generated database function type definitions that may have reordered signatures between versions. This reordering is expected behavior from the code generation process and should not be flagged as an issue requiring fixes.

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-07-21T14:22:20.752Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 291
File: packages/database/supabase/functions/create-space/index.ts:0-0
Timestamp: 2025-07-21T14:22:20.752Z
Learning: In the discourse-graph codebase, types.gen.ts is not accessible from Supabase edge functions, requiring duplication of types and utilities when needed in the edge function environment at packages/database/supabase/functions/.

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,jsx,js,css,scss} : Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-11-05T21:57:14.909Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-12-07T20:54:20.007Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

apps/roam/src/utils/exportUtils.ts

🧬 Code graph analysis (2)

apps/roam/src/utils/getExportTypes.ts (4)

apps/roam/src/utils/types.ts (1)

Result (42-46)

apps/roam/src/utils/exportUtils.ts (1)

getPageData (19-67)

apps/roam/src/utils/getExportSettings.ts (1)

getExportSettings (112-125)

apps/roam/src/utils/pageToMarkdown.ts (1)

pageToMarkdown (212-351)

apps/roam/src/utils/exportUtils.ts (1)

apps/roam/src/utils/types.ts (1)

Result (42-46)

🪛 ast-grep (0.40.3)

apps/roam/src/utils/exportUtils.ts

[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (10)

apps/roam/src/utils/getExportTypes.ts (3)

14-22: Clean modular imports from new utility files.

The refactoring correctly imports the required utilities from the new modules. This aligns well with the PR's goal of factoring out utility functions.

84-132: Markdown export callback cleanly refactored.

The callback properly delegates to pageToMarkdown while preserving the progress update pattern and sequential processing. The early guard at line 85 improves robustness.

55-79: Incorrect claim about Neo4j export parameter.

The review incorrectly states that Neo4j (line 152) includes isExportDiscourseGraph, but it doesn't. The actual pattern shows: JSON and Neo4j both omit the parameter (filtering results to discourse nodes), while Markdown and PDF include it (exporting all content). This appears to be intentional design: structured data exports filter by node type, while document exports preserve all content.

Likely an incorrect or invalid review comment.

apps/roam/src/utils/exportUtils.ts (2)

122-129: Good link type handling with appropriate fallback.

The toLink function correctly handles the three link types (wikilinks, alias, roam url) with a sensible plain filename fallback. The extension removal for wikilinks and roam URLs matches expected Markdown/wiki conventions.

131-153: Well-structured recursive tree conversion utilities.

pullBlockToTreeNode correctly maps Roam's PullBlock structure to the TreeNode interface with appropriate defaults. collectUids provides a clean recursive UID collection. Both functions are idiomatic and readable.

apps/roam/src/utils/pageToMarkdown.ts (5)

24-34: Clear regex patterns for Roam embed syntax.

The MATCHES_NONE pattern (/$.+^/) is a clever approach to create a regex that never matches, avoiding conditional logic. The embed regexes correctly capture the 9-10 character block UIDs. Good documentation comments explain the syntax being matched.

42-75: Clean conditional discourse context handling.

The function uses early returns appropriately and cleanly handles the optional appendRefNodeContext feature. The logic for finding and appending referenced discourse nodes is well-structured.

142-161: Comprehensive text processing with proper ordering.

The text processing chain correctly handles:

Embeds and embed-children (with recursion)

Block references

TODO/DONE checkbox conversion

Roam italic syntax conversion

Code block formatting

The use of MATCHES_NONE to conditionally skip regex replacements is an elegant pattern.

280-348: Well-structured content assembly with clear section formatting.

The template literal construction correctly builds the markdown document with:

Frontmatter

Main content from tree children

Optional Discourse Context section

Optional References section

The conditional sections use appropriate guards and formatting.

253-264: Verify the escaping behavior of normalizePageTitle from the roamjs-components library.

The concern about query injection via string interpolation at line 256 is theoretically valid—if normalizePageTitle doesn't escape quotes in page titles, a title like foo"bar would break the Datalog syntax. However, the implementation of normalizePageTitle from the external roamjs-components (v0.85.7) library cannot be verified from public sources. The pattern of string interpolation is consistent throughout the codebase for various user inputs, and no documented security issues exist for this pattern in Roam API calls. Clarify whether normalizePageTitle properly escapes special characters (particularly quotes) for use in Datalog string contexts, or consider using Datalog's parameter binding (:in $ clause) as a safer alternative if available.

apps/roam/src/utils/exportUtils.ts

coderabbitai · 2025-12-25T21:22:36Z

apps/roam/src/utils/exportUtils.ts

+  if (!nodeFormat) return title;
+  const regex = new RegExp(
+    `^${nodeFormat
+      .replace(/\[/g, "\\[")
+      .replace(/]/g, "\\]")
+      .replace("{content}", "(.*?)")
+      .replace(/{[^}]+}/g, "(?:.*?)")}$`,
+  );
+  return regex.exec(title)?.[1] || title;


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find DiscourseNode type definition and format field rg -n "type DiscourseNode|interface DiscourseNode" -A 20

Repository: DiscourseGraphs/discourse-graph

Length of output: 22213

🏁 Script executed:

# Check how format patterns are created/validated in the codebase rg -n "\.format\s*=" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 57

🏁 Script executed:

# Look at the full context of exportUtils.ts around lines 79-87 cat -n apps/roam/src/utils/exportUtils.ts | head -100 | tail -50

Repository: DiscourseGraphs/discourse-graph

Length of output: 1584

🏁 Script executed:

# Check if there's any validation of format field when it's set/saved rg -n "format" apps/roam/src -l | head -20

Repository: DiscourseGraphs/discourse-graph

Length of output: 933

🏁 Script executed:

# Search for format field validation or sanitization rg -n "format.*validate|validateFormat|format.*check" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 1338

🏁 Script executed:

# Check DiscourseNodeConfigPanel to see format field editing cat -n apps/roam/src/components/settings/DiscourseNodeConfigPanel.tsx | head -200

Repository: DiscourseGraphs/discourse-graph

Length of output: 8237

🏁 Script executed:

# Search for other uses of format field and regex construction rg -n "getContentFromNodes|format.*RegExp|new RegExp" apps/roam/src/utils

Repository: DiscourseGraphs/discourse-graph

Length of output: 1790

🏁 Script executed:

# Check if there's any try-catch or error handling for regex in similar code rg -n "try.*RegExp|RegExp.*catch" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 57

🏁 Script executed:

# Find the validate function in NodeConfig rg -n "const validate|function validate" apps/roam/src/components/settings/NodeConfig.tsx -B 5 -A 10

Repository: DiscourseGraphs/discourse-graph

Length of output: 506

🏁 Script executed:

# Check getDiscourseNodeFormatExpression since it also processes format cat -n apps/roam/src/utils/getDiscourseNodeFormatExpression.ts

Repository: DiscourseGraphs/discourse-graph

Length of output: 544

🏁 Script executed:

# Look for format validation logic more broadly rg -n "validate.*format|format.*valid" apps/roam/src -B 2 -A 5

Repository: DiscourseGraphs/discourse-graph

Length of output: 2231

🏁 Script executed:

# Check if format field has any UI constraints or warnings cat -n apps/roam/src/components/settings/NodeConfig.tsx | grep -A 30 -B 5 "format"

Repository: DiscourseGraphs/discourse-graph

Length of output: 7712

Use the proper format escaping function to prevent ReDoS.

The getContentFromNodes function at lines 79-87 constructs a regex from the user-editable nodeFormat field with incomplete escaping. Unlike getDiscourseNodeFormatExpression which properly escapes regex metacharacters ([, ], ?, ., +), this code only escapes brackets, leaving other quantifiers unescaped. A malformed format like (a+)+b would create a regex with nested quantifiers, causing exponential backtracking.

Rather than adding a try-catch, replace the custom escaping with getDiscourseNodeFormatExpression, which already handles this safely.

🧰 Tools

🪛 ast-grep (0.40.3)

[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents

In apps/roam/src/utils/exportUtils.ts around lines 79-87, the code builds a RegExp from user-editable nodeFormat but only escapes brackets, leaving other metacharacters and causing ReDoS risk; replace this custom escaping and regex construction with the existing getDiscourseNodeFormatExpression utility that properly escapes all regex metacharacters, then use the returned expression to exec against title (preserve the same fallback to title when no match). Ensure you import getDiscourseNodeFormatExpression if not already imported and remove the manual replace chain so the format is safely escaped before creating/using the regex.

Ok, methodology question: On the one hand, we are saying we should have only one logical change per PR. I tried to make this PR do only the refactor, nothing else, so that you could see it as innocuous and focus on the substance in the other PRs.
On the other hand, moving code draws the attention to issues with previously existing code. Coderabbit pointed this out in its comments to 1032, and I made (a subset of) this fix in that PR originally (a subset because I was following pattern) and then moved it to 1213 instead of 1211 so that 1211 would not have unrelated changes.
(Accessorily, maybe we should do the full fix in both places.)
So the question: When we do a refactoring, and try to make it into a single PR for clarity, is fixing issues with old code part of the same task or an extra task? Honestly, I think it depends; I have argued earlier that it's too much trouble to revert anything done automatically by the tooling. But it's also true that it muddies the refactoring.
I don't care that much either way, but I'm curious about which rule I'm expected to follow here.

@maparent, this is a great question that highlights a real tension in software engineering practice. Here's my perspective on the tradeoffs:

Arguments for keeping refactoring PRs pure:

Easier to review and understand the structural changes

Clearer git history with distinct "what" and "why"

Can be reverted independently if needed

Reduces cognitive load for reviewers

Arguments for fixing issues during refactoring:

Avoids touching the same code twice

Issues are fresh in mind when you're already working with the code

More efficient overall (fewer PRs, fewer context switches)

Prevents shipping known issues in the interim

Practical heuristics I'd suggest:

Security or critical bugs: Fix immediately in the same PR. ReDoS vulnerabilities like the one I flagged fall into this category - if you're moving the code anyway, fixing it is the responsible choice.

Minor improvements or style issues: Keep separate unless trivial (e.g., automated formatting).

Structural issues revealed by the refactor: This is where judgment matters. If the refactoring exposes that the code fundamentally doesn't work right, fixing it in the same PR makes sense. If it's just "this could be better," separate PR.

Team conventions: Your retrieved learning shows you "take a focused approach to PR scope," which is a good default. However, pragmatism should win when the overhead of separation outweighs the benefits.

For this specific case: The ReDoS issue is a security concern that exists in code you're moving. I'd recommend fixing it in PR 1213 (or this PR if you consolidate) rather than leaving it unfixed. You could also note in the PR description "refactor + fix ReDoS in getContentFromNodes" so reviewers know to expect both.

What does your team prefer?

🧠 Learnings used

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 232 File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31 Timestamp: 2025-06-22T10:40:52.752Z Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 559 File: apps/roam/src/utils/findDiscourseNode.ts:37-39 Timestamp: 2025-12-07T20:54:20.007Z Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 220 File: apps/roam/src/utils/conceptConversion.ts:42-56 Timestamp: 2025-06-17T23:37:45.289Z Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 372 File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116 Timestamp: 2025-08-25T15:53:21.799Z Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 165 File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45 Timestamp: 2025-05-20T03:11:07.917Z Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 0 File: :0-0 Timestamp: 2025-08-30T17:03:26.927Z Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 165 File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45 Timestamp: 2025-05-20T03:11:07.917Z Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Sorry I forgot to tag @mdroidian that question was addressed to him.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

But @CodeRabbit good point about the security implications!

@maparent, you're welcome! Security issues like ReDoS can be subtle but have real impact, especially in user-facing tools where format strings are configurable. Since you're already touching this code in the refactor, addressing it now makes sense.

If you do decide to fix it in this PR or PR 1213, using getDiscourseNodeFormatExpression (which already has the proper escaping logic) should make the fix straightforward. Let me know if you'd like me to review the fix once it's in place!

🧠 Learnings used

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 232 File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31 Timestamp: 2025-06-22T10:40:52.752Z Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 559 File: apps/roam/src/utils/findDiscourseNode.ts:37-39 Timestamp: 2025-12-07T20:54:20.007Z Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 220 File: apps/roam/src/utils/conceptConversion.ts:42-56 Timestamp: 2025-06-17T23:37:45.289Z Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 372 File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116 Timestamp: 2025-08-25T15:53:21.799Z Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 0 File: :0-0 Timestamp: 2025-08-30T17:03:26.927Z Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Sorry I forgot to tag @mdroidian that question was addressed to him.

This is probably best decided on a case by case basis depending on severity, effort, number of changes, etc.

Use your best judgement and just be sure to document the rationale with an inline GitHub comment.

Ok. Here's what I'm going to do:
I will correct the security issue locally, because I agree that is important.
I think we should make this stated policy.
I will do a few minor local fixes here (tbh I had already added removing a spurious async marker) because they're low impact.
I think the issue with maxFilenameLength calls for a solution that is too unrelated to this code, and i will make it separate.
I want to go on record that it's a productivity blocker that we cannot do a refactor without taking ownership of fixing minor problems in the code that is moved.
It means we allow coderabbit to dictate priority of which minor bugs we're going to fix. There's many of them, and we should be more systematic about choosing when we fix them.
OTH, I also acknowledge that, if we do not fix them opportunistically, they will be forgotten.
Actually, my favoured approach would be to fix them opportunistically in the same PR, but that would require us to relax the rule about single-purpose PRs, and that means you would have to give up on checking every small change. I suggest that our current process is too heavy, with a single bottleneck, to ever have the time to handle minor fixes.

Feel free to bring this up at our next 1:1 if you'd like to discuss further.

coderabbitai · 2025-12-25T21:22:37Z

apps/roam/src/utils/exportUtils.ts

+  return name.length > maxFilenameLength
+    ? `${name.substring(
+        0,
+        Math.ceil((maxFilenameLength - 3) / 2),
+      )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}`
+    : name;


⚠️ Potential issue | 🟡 Minor

Edge case: Very small maxFilenameLength values.

If maxFilenameLength is less than or equal to 3, the calculation (maxFilenameLength - 3) / 2 could result in 0 or negative values, potentially producing unexpected output. Consider adding a guard or minimum value check.

🔎 Suggested guard

+ const effectiveMaxLength = Math.max(maxFilenameLength, 4); return name.length > maxFilenameLength ? `${name.substring( 0, - Math.ceil((maxFilenameLength - 3) / 2), - )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}` + Math.ceil((effectiveMaxLength - 3) / 2), + )}...${name.slice(-Math.floor((effectiveMaxLength - 3) / 2))}` : name;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In apps/roam/src/utils/exportUtils.ts around lines 114 to 119, the truncation logic can produce incorrect slices when maxFilenameLength <= 3; add a guard that handles very small maxFilenameLength values first (e.g., if maxFilenameLength <= 3) and return a string of length maxFilenameLength composed of dots or a truncated ellipsis, otherwise compute left/right slice sizes using Math.max(0, Math.floor/ceil((maxFilenameLength - 3) / 2)) to avoid negative values and then perform the substring + '...' + slice as before.

Valid concern, but I think that should be done as a guard on the setting, not an adjustment downstream.

@maparent That makes sense! Validating maxFilenameLength at the settings level (e.g., ensuring it's always >= 4 or some reasonable minimum) is a cleaner approach and prevents invalid values from propagating through the system. This keeps the utility function simpler and enforces the constraint where the configuration is defined.

🧠 Learnings used

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 232 File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31 Timestamp: 2025-06-22T10:40:52.752Z Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 559 File: apps/roam/src/utils/findDiscourseNode.ts:37-39 Timestamp: 2025-12-07T20:54:20.007Z Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

apps/roam/src/utils/pageToMarkdown.ts

mdroidian

Lets address the coderabbit suggestions (or resolve if unnecessary ), then re-tag for review.

maparent · 2025-12-27T14:45:16Z

@CodeRabbit review

coderabbitai · 2025-12-27T14:45:21Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

apps/roam/src/utils/pageToMarkdown.ts (1)
236-237: Unused parameters blockRefsAsLinks and blockAnchors remain in signature.

These parameters are declared but never used in the function body. This was flagged in a previous review.

Either remove these parameters if no longer needed, or add implementation/TODO comments if they're planned for future functionality.
🔎 Suggested fix
     linkType: string;
-    blockRefsAsLinks?: boolean;
-    blockAnchors?: boolean;
   },

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d811e4 and a52ccd4.

📒 Files selected for processing (2)

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

🧠 Learnings (21)

📓 Common learnings

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-22T10:40:52.752Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Applied to files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-12-07T20:54:20.007Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-22T10:40:21.679Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:15-16
Timestamp: 2025-06-22T10:40:21.679Z
Learning: In the getAllDiscourseNodesSince function in apps/roam/src/utils/getAllDiscourseNodesSince.ts, date validation is performed before the function is called, so additional date validation within the function is not needed.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-12-19T16:59:40.640Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-12-19T16:59:40.640Z
Learning: In this codebase, type assertions at system boundaries (e.g., Roam datalog queries) are acceptable when the output shape is known from the query structure, such as when using `as Array<[string, string]>` after datalog queries with `:find ?uid ?title` clauses.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-10-18T18:58:16.100Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 504
File: apps/roam/src/utils/syncDgNodesToSupabase.ts:523-531
Timestamp: 2025-10-18T18:58:16.100Z
Learning: In `apps/roam/src/utils/syncDgNodesToSupabase.ts`, partial successes from `upsertNodesToSupabaseAsContent` and `addMissingEmbeddings` (indicated by numeric return values showing the count of successful operations) should NOT trigger backoff. Only complete failures (false) should trigger the exponential backoff mechanism. This design allows the sync process to continue making progress even when some items fail.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-08-25T15:53:21.799Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-07-27T19:49:41.859Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 301
File: packages/database/create_env.ts:58-67
Timestamp: 2025-07-27T19:49:41.859Z
Learning: Git branch names can contain shell metacharacters that pose command injection risks when used directly in shell commands. Vercel doesn't add additional validation beyond Git's rules, so sanitization is necessary when using branch names in execSync commands in the discourse-graph codebase.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-05-20T03:11:07.917Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-08-30T17:03:26.927Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-05-20T03:11:07.917Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-09T16:55:48.941Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 198
File: packages/database/supabase/schemas/account.sql:0-0
Timestamp: 2025-06-09T16:55:48.941Z
Learning: Security work including RLS policies and permission restrictions is currently postponed in the discourse-graph project as communicated by maparent.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:27.779Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/main.mdc:0-0
Timestamp: 2025-11-25T00:52:27.779Z
Learning: Applies to **/*.{ts,tsx} : Prefer early returns over nested conditionals for better readability

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:27.779Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/main.mdc:0-0
Timestamp: 2025-11-25T00:52:27.779Z
Learning: Applies to **/*.{ts,tsx} : Avoid `any` types when possible

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-05T21:57:14.909Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-11-06T13:48:35.007Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-11-06T13:48:35.007Z
Learning: In apps/roam/src/utils/createReifiedBlock.ts, the `setBlockProps` function does not need await as it is synchronous or fire-and-forget.

Applied to files:

apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-12-22T05:43:01.282Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 630
File: apps/roam/src/utils/settingsUsingBlockProps.ts:64-66
Timestamp: 2025-12-22T05:43:01.282Z
Learning: In `apps/roam/src/utils/settingsUsingBlockProps.ts`, the `setBlockPropBasedSettings` function expects callers to ensure that the `value` parameter matches the expected type for the given `keys` path. Type validation is handled at the caller side rather than within the utility function.

Applied to files:

apps/roam/src/utils/pageToMarkdown.ts

🧬 Code graph analysis (1)

apps/roam/src/utils/exportUtils.ts (1)

apps/roam/src/utils/types.ts (1)

Result (42-46)

🪛 ast-grep (0.40.3)

apps/roam/src/utils/exportUtils.ts

[warning] 68-73: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/[[\]\\^$.|?*+()]/g, "\\$&") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (14)

apps/roam/src/utils/exportUtils.ts (8)

1-6: Imports and type definition look good.

Clean imports from roamjs-components and local modules. The DiscourseExportResult type properly extends Result with the type field needed for discourse graph exports.

8-17: uniqJsonArray implementation is correct.

The approach of sorting object entries before stringifying ensures consistent deduplication regardless of key order. The type assertion is appropriate here.

19-56: getPageData implementation is well-structured.

Good use of early return for the isExportDiscourseGraph case. The matchedTexts Set correctly prevents duplicate matches across different node types. The explicit return type aligns with coding guidelines.

58-76: ReDoS mitigation applied correctly.

The regex metacharacter escaping at line 71 (/[[\]\\^$.|?*+()]/g) now properly escapes all standard metacharacters before constructing the regex. This addresses the security concern raised in the past review by preventing user-editable nodeFormat values from introducing malicious patterns like nested quantifiers.

The static analysis tool still flags this as a warning because it detects dynamic regex construction, but the escaping makes it safe.

78-108: getFilename implementation is clean.

Good use of named parameters with defaults. The truncation logic preserves readability by keeping start and end portions. As discussed in past review, the edge case for very small maxFilenameLength values should be guarded at the settings level.

110-117: toLink handles different link formats with sensible fallback.

The function correctly uses window.roamAlphaAPI.graph.name for constructing Roam URLs. The default fallback to plain filename for unknown link types is a safe approach.

119-136: pullBlockToTreeNode correctly transforms Roam PullBlock data.

The recursive transformation handles optional fields gracefully with appropriate defaults. The sorting of children by :block/order ensures correct rendering order. The view type inheritance from parent to children follows Roam's behavior.

138-141: collectUids is a clean recursive utility.

Simple and effective approach for gathering all UIDs from a tree structure.

apps/roam/src/utils/pageToMarkdown.ts (6)

1-40: Imports and constants are well-organized.

Good documentation of the Roam embed syntax patterns. The MATCHES_NONE regex (/$.+^/) is a clever pattern that can never match, used to disable optional replacements.

42-75: handleDiscourseContext follows good patterns.

Clean early return when context is disabled. The optional augmentation with referenced node context is well-structured.

77-110: handleFrontmatter generates valid YAML frontmatter.

Good handling of quote escaping in titles (line 101-102) to prevent YAML parsing issues. The default template provides sensible metadata fields.

112-210: toMarkdown handles complex Roam-to-Markdown conversion.

The function correctly handles:

Embed expansion with recursive rendering

Block reference resolution

TODO/DONE checkbox conversion

Roam italics to Markdown italics

Code block formatting

Nested page links via XRegExp.matchRecursive

View type inheritance and indentation

The logic for treating embed blocks as "document" view (lines 188-191) is a smart way to avoid double-prefixing embedded content.

239-349: pageToMarkdown provides comprehensive page export functionality.

The function correctly:

Extracts page metadata and tree structure

Handles optional discourse context augmentation

Conditionally includes references section

Generates proper YAML frontmatter

Collects all UIDs for tracking

The complex template literal (lines 278-346) is readable given the section structure.

251-262: Verify that normalizePageTitle provides adequate sanitization for datalog query injection, or add explicit quote escaping.

The datalog query at line 254-255 interpolates normalizePageTitle(text) directly into the query string. While this file demonstrates awareness of injection risks through explicit quote escaping elsewhere (YAML values), the datalog query relies on normalizePageTitle from the external roamjs-components package without visible additional protection. Either verify that normalizePageTitle handles quote escaping for datalog syntax, or consider adding explicit escaping like text.replace(/"/g, '\\"') before interpolation.

apps/roam/src/utils/pageToMarkdown.ts

mdroidian

I’m okay letting this through as-is. I do want to re-emphasize that we should generally default to the code placement heuristic we discussed in sept:

co-locate code with its primary usage first
export only when there’s a clear reuse need
move to /utils once it’s shared across multiple call sites
consider a /package only after it’s proven stable and broadly reusable

Having a shared heuristic helps anchor these decisions in something concrete. Without it, code movement tends to become subjective and preference-driven, which makes reviews harder and increases churn without a clear functional benefit.

mdroidian · 2025-12-28T22:45:59Z

apps/roam/src/utils/exportUtils.ts

+        ),
+      ),
+    ),
+  ).map((entries) => Object.fromEntries(JSON.parse(entries))) as T[];


Unsafe argument of type any assigned to a parameter of type Iterable<readonly [PropertyKey, unknown]>.eslint@typescript-eslint/no-unsafe-argument

mdroidian · 2025-12-28T22:48:38Z

apps/roam/src/utils/exportUtils.ts

+    : title;
+  const name = `${
+    removeSpecialCharacters
+      ? baseName.replace(/[<>:"/\\|\?*[\]]/g, "")


Unnecessary escape character: ?.eslintno-useless-escape

mdroidian · 2025-12-28T23:06:09Z

apps/roam/src/utils/pageToMarkdown.ts

+    allNodes,
+    maxFilenameLength,
+    removeSpecialCharacters,
+    linkType,


mdroidian · 2025-12-28T23:07:47Z

apps/roam/src/utils/pageToMarkdown.ts

+  const yamlLines = yaml.concat(resultCols.map((k) => `${k}: {${k}}`));
+  const content = yamlLines
+    .map((s) =>
+      s.replace(/{([^}]+)}/g, (_, capt: string) => {


Am I to assume that this page was copied over 1:1 with no changes, which is why the es lints are here? Next time just make a quick inline github comment at the top of the page so there is no ambiguity.

Yes, that is why. I mentioned it in the loom video and in the original commit message.
(I realize the latter is rarely read? Maybe I should avoid putting anything meaningful in it.)

Lets stick to keeping it in the body of the PR. And keep the loom video specific to a single PR. That way it can be focused. But walking through and adding inline GitHub comments is usually a really good idea. How about let's commit to try adding a few next PR?

mdroidian · 2025-12-28T23:24:22Z

apps/roam/src/utils/exportUtils.ts

+  if (!nodeFormat) return title;
+  const regex = new RegExp(
+    `^${nodeFormat
+      .replace(/\[/g, "\\[")
+      .replace(/]/g, "\\]")
+      .replace("{content}", "(.*?)")
+      .replace(/{[^}]+}/g, "(?:.*?)")}$`,
+  );
+  return regex.exec(title)?.[1] || title;


Feel free to bring this up at our next 1:1 if you'd like to discuss further.

maparent · 2025-12-30T02:48:01Z

Thank you.
I would propose adding a line to the heuristics: When the file becomes long and unwieldy.
(I think it's especially appropriate if it brings together not-quite-related functions.)
We may want to define a threshold so it's not arbitrary; but co-locate with primary usage could in theory translate to a single file, obviously undesirable.

mdroidian · 2025-12-30T03:57:50Z

long and unwieldy … not-quite-related functions … obviously undesirable

This is still subjective and preference-driven, which removes the benefit of the heuristic altogether. Until we can point to an objective signal—reuse across independent call sites, a real API boundary, or a concrete maintenance cost—“file feels too big” just becomes a proxy for personal comfort.

The heuristic is intentionally designed to avoid size- or aesthetics-based triggers. Modern IDEs already mitigate large-file concerns through code folding, symbol navigation, search, refactoring tools, and now semantic and LLM-assisted search. File length by itself is not an externally meaningful constraint anymore.

A single larger file that is locally coherent and rarely reused is typically cheaper than premature extraction that introduces indirection, naming overhead, and churn without reducing complexity.

I’m open to refining the heuristic, but any additional rule needs to be grounded in observable constraints rather than taste; otherwise reviews drift back into preference debates instead of shared decision criteria.

maparent · 2025-12-31T19:46:50Z

Your stated heuristic still does not explain why all the code does not live in a single file.
Also, you're saying you're comfortable with very big files thanks to tooling, which is no less a subjective judgement than my discomfort.
Let's try again to define a heuristic that is somewhat consistent with the current state of the code (not a single file):
I think that files and folders are meant to be cognitive grouping. For example, canvas components were moved into a folder because it's a group; one would want to look at all canvas functions as a group, not worry about them when not working on canvas. Also, files within the canvas folder were separated, because a single file would have encompassed too much functionality to be understood as a whole. I think what lives in a file should be understandable in a few hours at most. Hard to give a corresponding LoC estimate, but it's still an estimate.
I'll also add: If we have sizeable subcomponents in a file, with little overlap between them, that's a candidate for breaking up.
I think that the export is a great example of that: Exporting in html vs markdown vs json (vs pdf someday?) are all different functions with minimal overlap. It could conceivably be a plugin system. I think that's a great reason to separate them, so we can think of a unit at a time. (I think export is an important enough case that I'd consider a utils/export/type.js structure.)
Now, those are not deterministic heuristics, but the point of heuristics is precisely that they're not always deterministic (trying to eliminate subjectivity altogether strikes me as a questionable value judgement on the value of subjectivity.)
But I think they are still valid heuristics, insofar as they are criteria that can be reasoned about.

mdroidian · 2025-12-31T20:58:36Z

Your stated heuristic still does not explain why all the code does not live in a single file.

It wasn't meant to and doesn't need to. The scope in question was the utility functions.

eng-1211 Refactor exportUtils, factor out utility functions and markd…

5970ac4

…own. This is pure refactoring with no change of functionality.

unnecessary await

4d811e4

maparent requested a review from mdroidian December 25, 2025 18:38

maparent mentioned this pull request Dec 25, 2025

ENG-1032 Minimal JSON-LD export of the discourse-graph #529

Draft

coderabbitai bot reviewed Dec 25, 2025

View reviewed changes

mdroidian requested changes Dec 25, 2025

View reviewed changes

maparent added 2 commits December 27, 2025 09:09

coderabbit corrections

2b81cec

optimization

a52ccd4

coderabbitai bot reviewed Dec 27, 2025

View reviewed changes

apps/roam/src/utils/pageToMarkdown.ts Show resolved Hide resolved

more coderabbit corrections

c2924c2

maparent requested a review from mdroidian December 27, 2025 15:06

mdroidian approved these changes Dec 28, 2025

View reviewed changes

maparent merged commit 65ccadb into main Dec 30, 2025
5 checks passed

maparent deleted the eng-1211-refactor-exportutils-factor-out-utility-functions-and branch December 30, 2025 02:48

ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

Conversation

maparent commented Dec 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

linear bot commented Dec 25, 2025

Uh oh!

supabase bot commented Dec 25, 2025

Uh oh!

maparent commented Dec 25, 2025

Uh oh!

mdroidian commented Dec 25, 2025

Uh oh!

coderabbitai bot commented Dec 25, 2025

Uh oh!

coderabbitai bot commented Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 27, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mdroidian left a comment

Choose a reason for hiding this comment

Uh oh!

maparent commented Dec 27, 2025

Uh oh!

coderabbitai bot commented Dec 27, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mdroidian left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

maparent commented Dec 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 25, 2025 •

edited

Loading

coderabbitai bot Dec 25, 2025 •

edited

Loading

coderabbitai bot Dec 25, 2025 •

edited

Loading