-
Notifications
You must be signed in to change notification settings - Fork 4
ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647
Conversation
…own. This is pure refactoring with no change of functionality.
|
This pull request has been ignored for the connected project Preview Branches by Supabase. |
|
Explanations about the breakup here: https://www.loom.com/share/afa48744aa0441628ec182a2147e8321 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughAdds new Roam export utilities and a page-to-Markdown renderer, and refactors getExportTypes to delegate page data extraction, filename/link handling, and Markdown generation to the new modules (exportUtils.ts and pageToMarkdown.ts). Changes are focused on export formatting and data extraction logic. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
apps/roam/src/utils/getExportTypes.ts (1)
224-241: TODO comment indicates future consolidation opportunity.The comment at line 224 notes that
treeNodeToMarkdownlogic should be reused with the markdown export. SincepageToMarkdown.tsnow exists with atoMarkdownfunction, this could potentially be addressed in a follow-up PR to reduce duplication between the PDF and Markdown exports.Would you like me to open an issue to track consolidating the PDF export's
treeNodeToMarkdownwith the existingtoMarkdownfrompageToMarkdown.ts?apps/roam/src/utils/exportUtils.ts (1)
8-17: Consider performance implications ofuniqJsonArray.The implementation uses
JSON.stringifywith sorted entries for deduplication, which works correctly but has O(n * m log m) complexity where m is the number of keys per object. For large arrays with complex objects, this could be slow.For the current use case (filtering relation data), this should be acceptable, but worth noting if used with larger datasets.
apps/roam/src/utils/pageToMarkdown.ts (1)
96-110: YAML frontmatter escaping is limited to the text field.Line 101-102 correctly escapes quotes in the
textfield, but other dynamic fields (line 104) are inserted without escaping. Values containing YAML special characters (:,\n,#, etc.) could produce malformed YAML.Since this is for user-controlled exports, it's not a security issue, but could cause parsing errors when importing the exported markdown.
🔎 Optional: Comprehensive YAML escaping
if (capt === "text") { - // Wrap title in quotes and escape additional quotes - const escapedText = result[capt].toString().replace(/"/g, '\\"'); - return `"${escapedText}"`; + const escapedText = result[capt].toString().replace(/"/g, '\\"'); + return `"${escapedText}"`; } - return result[capt].toString(); + const value = result[capt].toString(); + // Quote values containing YAML special characters + if (/[:\n#[\]{}|>&*!?]/.test(value) || value.includes('"')) { + return `"${value.replace(/"/g, '\\"')}"`; + } + return value;
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.ts
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/main.mdc)
**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefertypeoverinterfacein TypeScript
Use explicit return types for functions
Avoidanytypes when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Prefer existing dependencies from package.json when working on the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Implement the Discourse Graph protocol in the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
🧠 Learnings (10)
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-07-13T16:47:14.352Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-07-13T16:47:14.352Z
Learning: In the discourse-graph codebase, types.gen.ts contains automatically generated database function type definitions that may have reordered signatures between versions. This reordering is expected behavior from the code generation process and should not be flagged as an issue requiring fixes.
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-07-21T14:22:20.752Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 291
File: packages/database/supabase/functions/create-space/index.ts:0-0
Timestamp: 2025-07-21T14:22:20.752Z
Learning: In the discourse-graph codebase, types.gen.ts is not accessible from Supabase edge functions, requiring duplication of types and utilities when needed in the edge function environment at packages/database/supabase/functions/.
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,jsx,js,css,scss} : Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-12-07T20:54:20.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.
Applied to files:
apps/roam/src/utils/exportUtils.ts
🧬 Code graph analysis (2)
apps/roam/src/utils/getExportTypes.ts (4)
apps/roam/src/utils/types.ts (1)
Result(42-46)apps/roam/src/utils/exportUtils.ts (1)
getPageData(19-67)apps/roam/src/utils/getExportSettings.ts (1)
getExportSettings(112-125)apps/roam/src/utils/pageToMarkdown.ts (1)
pageToMarkdown(212-351)
apps/roam/src/utils/exportUtils.ts (1)
apps/roam/src/utils/types.ts (1)
Result(42-46)
🪛 ast-grep (0.40.3)
apps/roam/src/utils/exportUtils.ts
[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (10)
apps/roam/src/utils/getExportTypes.ts (3)
14-22: Clean modular imports from new utility files.The refactoring correctly imports the required utilities from the new modules. This aligns well with the PR's goal of factoring out utility functions.
84-132: Markdown export callback cleanly refactored.The callback properly delegates to
pageToMarkdownwhile preserving the progress update pattern and sequential processing. The early guard at line 85 improves robustness.
55-79: Incorrect claim about Neo4j export parameter.The review incorrectly states that Neo4j (line 152) includes
isExportDiscourseGraph, but it doesn't. The actual pattern shows: JSON and Neo4j both omit the parameter (filtering results to discourse nodes), while Markdown and PDF include it (exporting all content). This appears to be intentional design: structured data exports filter by node type, while document exports preserve all content.Likely an incorrect or invalid review comment.
apps/roam/src/utils/exportUtils.ts (2)
122-129: Good link type handling with appropriate fallback.The
toLinkfunction correctly handles the three link types (wikilinks, alias, roam url) with a sensible plain filename fallback. The extension removal for wikilinks and roam URLs matches expected Markdown/wiki conventions.
131-153: Well-structured recursive tree conversion utilities.
pullBlockToTreeNodecorrectly maps Roam's PullBlock structure to the TreeNode interface with appropriate defaults.collectUidsprovides a clean recursive UID collection. Both functions are idiomatic and readable.apps/roam/src/utils/pageToMarkdown.ts (5)
24-34: Clear regex patterns for Roam embed syntax.The
MATCHES_NONEpattern (/$.+^/) is a clever approach to create a regex that never matches, avoiding conditional logic. The embed regexes correctly capture the 9-10 character block UIDs. Good documentation comments explain the syntax being matched.
42-75: Clean conditional discourse context handling.The function uses early returns appropriately and cleanly handles the optional
appendRefNodeContextfeature. The logic for finding and appending referenced discourse nodes is well-structured.
142-161: Comprehensive text processing with proper ordering.The text processing chain correctly handles:
- Embeds and embed-children (with recursion)
- Block references
- TODO/DONE checkbox conversion
- Roam italic syntax conversion
- Code block formatting
The use of
MATCHES_NONEto conditionally skip regex replacements is an elegant pattern.
280-348: Well-structured content assembly with clear section formatting.The template literal construction correctly builds the markdown document with:
- Frontmatter
- Main content from tree children
- Optional Discourse Context section
- Optional References section
The conditional sections use appropriate guards and formatting.
253-264: Verify the escaping behavior ofnormalizePageTitlefrom the roamjs-components library.The concern about query injection via string interpolation at line 256 is theoretically valid—if
normalizePageTitledoesn't escape quotes in page titles, a title likefoo"barwould break the Datalog syntax. However, the implementation ofnormalizePageTitlefrom the externalroamjs-components(v0.85.7) library cannot be verified from public sources. The pattern of string interpolation is consistent throughout the codebase for various user inputs, and no documented security issues exist for this pattern in Roam API calls. Clarify whethernormalizePageTitleproperly escapes special characters (particularly quotes) for use in Datalog string contexts, or consider using Datalog's parameter binding (:in $clause) as a safer alternative if available.
| if (!nodeFormat) return title; | ||
| const regex = new RegExp( | ||
| `^${nodeFormat | ||
| .replace(/\[/g, "\\[") | ||
| .replace(/]/g, "\\]") | ||
| .replace("{content}", "(.*?)") | ||
| .replace(/{[^}]+}/g, "(?:.*?)")}$`, | ||
| ); | ||
| return regex.exec(title)?.[1] || title; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find DiscourseNode type definition and format field
rg -n "type DiscourseNode|interface DiscourseNode" -A 20Repository: DiscourseGraphs/discourse-graph
Length of output: 22213
🏁 Script executed:
# Check how format patterns are created/validated in the codebase
rg -n "\.format\s*=" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 57
🏁 Script executed:
# Look at the full context of exportUtils.ts around lines 79-87
cat -n apps/roam/src/utils/exportUtils.ts | head -100 | tail -50Repository: DiscourseGraphs/discourse-graph
Length of output: 1584
🏁 Script executed:
# Check if there's any validation of format field when it's set/saved
rg -n "format" apps/roam/src -l | head -20Repository: DiscourseGraphs/discourse-graph
Length of output: 933
🏁 Script executed:
# Search for format field validation or sanitization
rg -n "format.*validate|validateFormat|format.*check" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 1338
🏁 Script executed:
# Check DiscourseNodeConfigPanel to see format field editing
cat -n apps/roam/src/components/settings/DiscourseNodeConfigPanel.tsx | head -200Repository: DiscourseGraphs/discourse-graph
Length of output: 8237
🏁 Script executed:
# Search for other uses of format field and regex construction
rg -n "getContentFromNodes|format.*RegExp|new RegExp" apps/roam/src/utilsRepository: DiscourseGraphs/discourse-graph
Length of output: 1790
🏁 Script executed:
# Check if there's any try-catch or error handling for regex in similar code
rg -n "try.*RegExp|RegExp.*catch" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 57
🏁 Script executed:
# Find the validate function in NodeConfig
rg -n "const validate|function validate" apps/roam/src/components/settings/NodeConfig.tsx -B 5 -A 10Repository: DiscourseGraphs/discourse-graph
Length of output: 506
🏁 Script executed:
# Check getDiscourseNodeFormatExpression since it also processes format
cat -n apps/roam/src/utils/getDiscourseNodeFormatExpression.tsRepository: DiscourseGraphs/discourse-graph
Length of output: 544
🏁 Script executed:
# Look for format validation logic more broadly
rg -n "validate.*format|format.*valid" apps/roam/src -B 2 -A 5Repository: DiscourseGraphs/discourse-graph
Length of output: 2231
🏁 Script executed:
# Check if format field has any UI constraints or warnings
cat -n apps/roam/src/components/settings/NodeConfig.tsx | grep -A 30 -B 5 "format"Repository: DiscourseGraphs/discourse-graph
Length of output: 7712
Use the proper format escaping function to prevent ReDoS.
The getContentFromNodes function at lines 79-87 constructs a regex from the user-editable nodeFormat field with incomplete escaping. Unlike getDiscourseNodeFormatExpression which properly escapes regex metacharacters ([, ], ?, ., +), this code only escapes brackets, leaving other quantifiers unescaped. A malformed format like (a+)+b would create a regex with nested quantifiers, causing exponential backtracking.
Rather than adding a try-catch, replace the custom escaping with getDiscourseNodeFormatExpression, which already handles this safely.
🧰 Tools
🪛 ast-grep (0.40.3)
[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
In apps/roam/src/utils/exportUtils.ts around lines 79-87, the code builds a
RegExp from user-editable nodeFormat but only escapes brackets, leaving other
metacharacters and causing ReDoS risk; replace this custom escaping and regex
construction with the existing getDiscourseNodeFormatExpression utility that
properly escapes all regex metacharacters, then use the returned expression to
exec against title (preserve the same fallback to title when no match). Ensure
you import getDiscourseNodeFormatExpression if not already imported and remove
the manual replace chain so the format is safely escaped before creating/using
the regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, methodology question: On the one hand, we are saying we should have only one logical change per PR. I tried to make this PR do only the refactor, nothing else, so that you could see it as innocuous and focus on the substance in the other PRs.
On the other hand, moving code draws the attention to issues with previously existing code. Coderabbit pointed this out in its comments to 1032, and I made (a subset of) this fix in that PR originally (a subset because I was following pattern) and then moved it to 1213 instead of 1211 so that 1211 would not have unrelated changes.
(Accessorily, maybe we should do the full fix in both places.)
So the question: When we do a refactoring, and try to make it into a single PR for clarity, is fixing issues with old code part of the same task or an extra task? Honestly, I think it depends; I have argued earlier that it's too much trouble to revert anything done automatically by the tooling. But it's also true that it muddies the refactoring.
I don't care that much either way, but I'm curious about which rule I'm expected to follow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maparent, this is a great question that highlights a real tension in software engineering practice. Here's my perspective on the tradeoffs:
Arguments for keeping refactoring PRs pure:
- Easier to review and understand the structural changes
- Clearer git history with distinct "what" and "why"
- Can be reverted independently if needed
- Reduces cognitive load for reviewers
Arguments for fixing issues during refactoring:
- Avoids touching the same code twice
- Issues are fresh in mind when you're already working with the code
- More efficient overall (fewer PRs, fewer context switches)
- Prevents shipping known issues in the interim
Practical heuristics I'd suggest:
-
Security or critical bugs: Fix immediately in the same PR. ReDoS vulnerabilities like the one I flagged fall into this category - if you're moving the code anyway, fixing it is the responsible choice.
-
Minor improvements or style issues: Keep separate unless trivial (e.g., automated formatting).
-
Structural issues revealed by the refactor: This is where judgment matters. If the refactoring exposes that the code fundamentally doesn't work right, fixing it in the same PR makes sense. If it's just "this could be better," separate PR.
-
Team conventions: Your retrieved learning shows you "take a focused approach to PR scope," which is a good default. However, pragmatism should win when the overhead of separation outweighs the benefits.
For this specific case: The ReDoS issue is a security concern that exists in code you're moving. I'd recommend fixing it in PR 1213 (or this PR if you consolidate) rather than leaving it unfixed. You could also note in the PR description "refactor + fix ReDoS in getContentFromNodes" so reviewers know to expect both.
What does your team prefer?
🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I forgot to tag @mdroidian that question was addressed to him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But @CodeRabbit good point about the security implications!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maparent, you're welcome! Security issues like ReDoS can be subtle but have real impact, especially in user-facing tools where format strings are configurable. Since you're already touching this code in the refactor, addressing it now makes sense.
If you do decide to fix it in this PR or PR 1213, using getDiscourseNodeFormatExpression (which already has the proper escaping logic) should make the fix straightforward. Let me know if you'd like me to review the fix once it's in place!
🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I forgot to tag @mdroidian that question was addressed to him.
This is probably best decided on a case by case basis depending on severity, effort, number of changes, etc.
Use your best judgement and just be sure to document the rationale with an inline GitHub comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Here's what I'm going to do:
I will correct the security issue locally, because I agree that is important.
I think we should make this stated policy.
I will do a few minor local fixes here (tbh I had already added removing a spurious async marker) because they're low impact.
I think the issue with maxFilenameLength calls for a solution that is too unrelated to this code, and i will make it separate.
I want to go on record that it's a productivity blocker that we cannot do a refactor without taking ownership of fixing minor problems in the code that is moved.
It means we allow coderabbit to dictate priority of which minor bugs we're going to fix. There's many of them, and we should be more systematic about choosing when we fix them.
OTH, I also acknowledge that, if we do not fix them opportunistically, they will be forgotten.
Actually, my favoured approach would be to fix them opportunistically in the same PR, but that would require us to relax the rule about single-purpose PRs, and that means you would have to give up on checking every small change. I suggest that our current process is too heavy, with a single bottleneck, to ever have the time to handle minor fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to bring this up at our next 1:1 if you'd like to discuss further.
| return name.length > maxFilenameLength | ||
| ? `${name.substring( | ||
| 0, | ||
| Math.ceil((maxFilenameLength - 3) / 2), | ||
| )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}` | ||
| : name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge case: Very small maxFilenameLength values.
If maxFilenameLength is less than or equal to 3, the calculation (maxFilenameLength - 3) / 2 could result in 0 or negative values, potentially producing unexpected output. Consider adding a guard or minimum value check.
🔎 Suggested guard
+ const effectiveMaxLength = Math.max(maxFilenameLength, 4);
return name.length > maxFilenameLength
? `${name.substring(
0,
- Math.ceil((maxFilenameLength - 3) / 2),
- )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}`
+ Math.ceil((effectiveMaxLength - 3) / 2),
+ )}...${name.slice(-Math.floor((effectiveMaxLength - 3) / 2))}`
: name;Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/roam/src/utils/exportUtils.ts around lines 114 to 119, the truncation
logic can produce incorrect slices when maxFilenameLength <= 3; add a guard that
handles very small maxFilenameLength values first (e.g., if maxFilenameLength <=
3) and return a string of length maxFilenameLength composed of dots or a
truncated ellipsis, otherwise compute left/right slice sizes using Math.max(0,
Math.floor/ceil((maxFilenameLength - 3) / 2)) to avoid negative values and then
perform the substring + '...' + slice as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid concern, but I think that should be done as a guard on the setting, not an adjustment downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maparent That makes sense! Validating maxFilenameLength at the settings level (e.g., ensuring it's always >= 4 or some reasonable minimum) is a cleaner approach and prevents invalid values from propagating through the system. This keeps the utility function simpler and enforces the constraint where the configuration is defined.
🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
mdroidian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets address the coderabbit suggestions (or resolve if unnecessary ), then re-tag for review.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
apps/roam/src/utils/pageToMarkdown.ts (1)
236-237: Unused parametersblockRefsAsLinksandblockAnchorsremain in signature.These parameters are declared but never used in the function body. This was flagged in a previous review.
Either remove these parameters if no longer needed, or add implementation/TODO comments if they're planned for future functionality.
🔎 Suggested fix
linkType: string; - blockRefsAsLinks?: boolean; - blockAnchors?: boolean; },
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/main.mdc)
**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefertypeoverinterfacein TypeScript
Use explicit return types for functions
Avoidanytypes when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability
Files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Prefer existing dependencies from package.json when working on the Roam Research extension
Files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension
Files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
apps/roam/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
apps/roam/**
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Implement the Discourse Graph protocol in the Roam Research extension
Files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
🧠 Learnings (21)
📓 Common learnings
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Applied to files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Applied to files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-12-07T20:54:20.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-22T10:40:21.679Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:15-16
Timestamp: 2025-06-22T10:40:21.679Z
Learning: In the getAllDiscourseNodesSince function in apps/roam/src/utils/getAllDiscourseNodesSince.ts, date validation is performed before the function is called, so additional date validation within the function is not needed.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-12-19T16:59:40.640Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-12-19T16:59:40.640Z
Learning: In this codebase, type assertions at system boundaries (e.g., Roam datalog queries) are acceptable when the output shape is known from the query structure, such as when using `as Array<[string, string]>` after datalog queries with `:find ?uid ?title` clauses.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-10-18T18:58:16.100Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 504
File: apps/roam/src/utils/syncDgNodesToSupabase.ts:523-531
Timestamp: 2025-10-18T18:58:16.100Z
Learning: In `apps/roam/src/utils/syncDgNodesToSupabase.ts`, partial successes from `upsertNodesToSupabaseAsContent` and `addMissingEmbeddings` (indicated by numeric return values showing the count of successful operations) should NOT trigger backoff. Only complete failures (false) should trigger the exponential backoff mechanism. This design allows the sync process to continue making progress even when some items fail.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-08-25T15:53:21.799Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-07-27T19:49:41.859Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 301
File: packages/database/create_env.ts:58-67
Timestamp: 2025-07-27T19:49:41.859Z
Learning: Git branch names can contain shell metacharacters that pose command injection risks when used directly in shell commands. Vercel doesn't add additional validation beyond Git's rules, so sanitization is necessary when using branch names in execSync commands in the discourse-graph codebase.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-05-20T03:11:07.917Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-08-30T17:03:26.927Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-05-20T03:11:07.917Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-09T16:55:48.941Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 198
File: packages/database/supabase/schemas/account.sql:0-0
Timestamp: 2025-06-09T16:55:48.941Z
Learning: Security work including RLS policies and permission restrictions is currently postponed in the discourse-graph project as communicated by maparent.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:27.779Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/main.mdc:0-0
Timestamp: 2025-11-25T00:52:27.779Z
Learning: Applies to **/*.{ts,tsx} : Prefer early returns over nested conditionals for better readability
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:27.779Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/main.mdc:0-0
Timestamp: 2025-11-25T00:52:27.779Z
Learning: Applies to **/*.{ts,tsx} : Avoid `any` types when possible
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.
Applied to files:
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-11-06T13:48:35.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-11-06T13:48:35.007Z
Learning: In apps/roam/src/utils/createReifiedBlock.ts, the `setBlockProps` function does not need await as it is synchronous or fire-and-forget.
Applied to files:
apps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-12-22T05:43:01.282Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 630
File: apps/roam/src/utils/settingsUsingBlockProps.ts:64-66
Timestamp: 2025-12-22T05:43:01.282Z
Learning: In `apps/roam/src/utils/settingsUsingBlockProps.ts`, the `setBlockPropBasedSettings` function expects callers to ensure that the `value` parameter matches the expected type for the given `keys` path. Type validation is handled at the caller side rather than within the utility function.
Applied to files:
apps/roam/src/utils/pageToMarkdown.ts
🧬 Code graph analysis (1)
apps/roam/src/utils/exportUtils.ts (1)
apps/roam/src/utils/types.ts (1)
Result(42-46)
🪛 ast-grep (0.40.3)
apps/roam/src/utils/exportUtils.ts
[warning] 68-73: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/[[\]\\^$.|?*+()]/g, "\\$&") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (14)
apps/roam/src/utils/exportUtils.ts (8)
1-6: Imports and type definition look good.Clean imports from roamjs-components and local modules. The
DiscourseExportResulttype properly extendsResultwith thetypefield needed for discourse graph exports.
8-17:uniqJsonArrayimplementation is correct.The approach of sorting object entries before stringifying ensures consistent deduplication regardless of key order. The type assertion is appropriate here.
19-56:getPageDataimplementation is well-structured.Good use of early return for the
isExportDiscourseGraphcase. ThematchedTextsSet correctly prevents duplicate matches across different node types. The explicit return type aligns with coding guidelines.
58-76: ReDoS mitigation applied correctly.The regex metacharacter escaping at line 71 (
/[[\]\\^$.|?*+()]/g) now properly escapes all standard metacharacters before constructing the regex. This addresses the security concern raised in the past review by preventing user-editablenodeFormatvalues from introducing malicious patterns like nested quantifiers.The static analysis tool still flags this as a warning because it detects dynamic regex construction, but the escaping makes it safe.
78-108:getFilenameimplementation is clean.Good use of named parameters with defaults. The truncation logic preserves readability by keeping start and end portions. As discussed in past review, the edge case for very small
maxFilenameLengthvalues should be guarded at the settings level.
110-117:toLinkhandles different link formats with sensible fallback.The function correctly uses
window.roamAlphaAPI.graph.namefor constructing Roam URLs. The default fallback to plain filename for unknown link types is a safe approach.
119-136:pullBlockToTreeNodecorrectly transforms Roam PullBlock data.The recursive transformation handles optional fields gracefully with appropriate defaults. The sorting of children by
:block/orderensures correct rendering order. The view type inheritance from parent to children follows Roam's behavior.
138-141:collectUidsis a clean recursive utility.Simple and effective approach for gathering all UIDs from a tree structure.
apps/roam/src/utils/pageToMarkdown.ts (6)
1-40: Imports and constants are well-organized.Good documentation of the Roam embed syntax patterns. The
MATCHES_NONEregex (/$.+^/) is a clever pattern that can never match, used to disable optional replacements.
42-75:handleDiscourseContextfollows good patterns.Clean early return when context is disabled. The optional augmentation with referenced node context is well-structured.
77-110:handleFrontmattergenerates valid YAML frontmatter.Good handling of quote escaping in titles (line 101-102) to prevent YAML parsing issues. The default template provides sensible metadata fields.
112-210:toMarkdownhandles complex Roam-to-Markdown conversion.The function correctly handles:
- Embed expansion with recursive rendering
- Block reference resolution
- TODO/DONE checkbox conversion
- Roam italics to Markdown italics
- Code block formatting
- Nested page links via
XRegExp.matchRecursive- View type inheritance and indentation
The logic for treating embed blocks as "document" view (lines 188-191) is a smart way to avoid double-prefixing embedded content.
239-349:pageToMarkdownprovides comprehensive page export functionality.The function correctly:
- Extracts page metadata and tree structure
- Handles optional discourse context augmentation
- Conditionally includes references section
- Generates proper YAML frontmatter
- Collects all UIDs for tracking
The complex template literal (lines 278-346) is readable given the section structure.
251-262: Verify thatnormalizePageTitleprovides adequate sanitization for datalog query injection, or add explicit quote escaping.The datalog query at line 254-255 interpolates
normalizePageTitle(text)directly into the query string. While this file demonstrates awareness of injection risks through explicit quote escaping elsewhere (YAML values), the datalog query relies onnormalizePageTitlefrom the externalroamjs-componentspackage without visible additional protection. Either verify thatnormalizePageTitlehandles quote escaping for datalog syntax, or consider adding explicit escaping liketext.replace(/"/g, '\\"')before interpolation.
mdroidian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m okay letting this through as-is. I do want to re-emphasize that we should generally default to the code placement heuristic we discussed in sept:
- co-locate code with its primary usage first
- export only when there’s a clear reuse need
- move to
/utilsonce it’s shared across multiple call sites - consider a
/packageonly after it’s proven stable and broadly reusable
Having a shared heuristic helps anchor these decisions in something concrete. Without it, code movement tends to become subjective and preference-driven, which makes reviews harder and increases churn without a clear functional benefit.
| ), | ||
| ), | ||
| ), | ||
| ).map((entries) => Object.fromEntries(JSON.parse(entries))) as T[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe argument of type any assigned to a parameter of type Iterable<readonly [PropertyKey, unknown]>.eslint@typescript-eslint/no-unsafe-argument
| : title; | ||
| const name = `${ | ||
| removeSpecialCharacters | ||
| ? baseName.replace(/[<>:"/\\|\?*[\]]/g, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary escape character: ?.eslintno-useless-escape
| allNodes, | ||
| maxFilenameLength, | ||
| removeSpecialCharacters, | ||
| linkType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
| const yamlLines = yaml.concat(resultCols.map((k) => `${k}: {${k}}`)); | ||
| const content = yamlLines | ||
| .map((s) => | ||
| s.replace(/{([^}]+)}/g, (_, capt: string) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I to assume that this page was copied over 1:1 with no changes, which is why the es lints are here? Next time just make a quick inline github comment at the top of the page so there is no ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is why. I mentioned it in the loom video and in the original commit message.
(I realize the latter is rarely read? Maybe I should avoid putting anything meaningful in it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets stick to keeping it in the body of the PR. And keep the loom video specific to a single PR. That way it can be focused. But walking through and adding inline GitHub comments is usually a really good idea. How about let's commit to try adding a few next PR?
| if (!nodeFormat) return title; | ||
| const regex = new RegExp( | ||
| `^${nodeFormat | ||
| .replace(/\[/g, "\\[") | ||
| .replace(/]/g, "\\]") | ||
| .replace("{content}", "(.*?)") | ||
| .replace(/{[^}]+}/g, "(?:.*?)")}$`, | ||
| ); | ||
| return regex.exec(title)?.[1] || title; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to bring this up at our next 1:1 if you'd like to discuss further.
|
Thank you. |
This is still subjective and preference-driven, which removes the benefit of the heuristic altogether. Until we can point to an objective signal—reuse across independent call sites, a real API boundary, or a concrete maintenance cost—“file feels too big” just becomes a proxy for personal comfort. The heuristic is intentionally designed to avoid size- or aesthetics-based triggers. Modern IDEs already mitigate large-file concerns through code folding, symbol navigation, search, refactoring tools, and now semantic and LLM-assisted search. File length by itself is not an externally meaningful constraint anymore. A single larger file that is locally coherent and rarely reused is typically cheaper than premature extraction that introduces indirection, naming overhead, and churn without reducing complexity. I’m open to refining the heuristic, but any additional rule needs to be grounded in observable constraints rather than taste; otherwise reviews drift back into preference debates instead of shared decision criteria. |
|
Your stated heuristic still does not explain why all the code does not live in a single file. |
It wasn't meant to and doesn't need to. The scope in question was the utility functions. |
https://linear.app/discourse-graphs/issue/ENG-1211/refactor-exportutils-factor-out-utility-functions-and-markdown
Summary by CodeRabbit
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.