Conversation

@maparent
Collaborator

@maparent maparent commented Dec 31, 2025

https://linear.app/discourse-graphs/issue/ENG-1230/backlinks-in-json-ld-export
Add backlink information to the JSON-LD export.
Create a backlink relationship based on common elements (dcterms), with reference to the document structure ontology (which allows describing Roam block containment).
Incidentally: added ontology info to dg-core and consolidated the per-page calculations into a single function for better progress tracking.

Loom: https://www.loom.com/share/b9fc610d5d9349e0aff6af16dc801636

Summary by CodeRabbit

Release Notes

  • New Features
    • Added backlink functionality to display all pages that reference the current page.
    • Enhanced vocabulary schema to better support semantic relationships between content.
    • Improved JSON-LD export with comprehensive relationship and backlink tracking.
    • Refined progress tracking during data export operations.

@linear

linear bot commented Dec 31, 2025

@supabase

supabase bot commented Dec 31, 2025

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


@maparent
Collaborator Author

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai bot commented Dec 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Dec 31, 2025

📝 Walkthrough

The changes introduce a backlink mechanism to the discourse graph export system. The JSON-LD export now queries Roam for backlinks, constructs nodes asynchronously with backlink data, and integrates the new dg_core ontology schema. The dg_base schema removes the textRefersToNode property, while dg_core adds containment and backlink vocabulary terms.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| JSON-LD Export & Context (apps/roam/src/utils/jsonld.ts) | Added dg_core CURIE namespace; introduced async node mapping with backlink queries from Roam; enhanced node construction to include @id, @type, title, content, modified, created, creator, and backlinks array; added error handling for unknown node types; integrated progress tracking. |
| Core Ontology Vocabulary (apps/website/public/schema/dg_core.ttl) | Added dg_core ontology header with metadata; introduced dg:containsRec as TransitiveProperty; defined po:contains and dct:hasPart as subClasses of dg:containsRec; added dg:containsRef as ObjectProperty with property chain axiom; added dg:backlink as ObjectProperty with inverseOf dg:containsRef. Removed vs and sioc namespace prefixes; added po namespace prefix. |
| Base Ontology Schema (apps/website/public/schema/dg_base.ttl) | Removed dgb:textRefersToNode ObjectProperty definition including its domain, range, and rdfs:comment. |

Sequence Diagram(s)

sequenceDiagram
    participant Exp as Export Process
    participant Roam as Roam Database
    participant Node as Node Constructor
    participant Graph as JSON-LD Graph

    Exp->>Roam: Query pages for export
    Roam-->>Exp: Return page data
    
    loop For each page
        Exp->>Exp: Convert page to Markdown (pageToMarkdown)
        Exp->>Roam: Query backlinks for page UID
        Roam-->>Exp: Return backlink page UIDs
        Exp->>Node: Construct node with metadata
        Node->>Node: Resolve node type via nodeSchemaUriByName
        Note over Node: Add fields: `@id`, `@type`, title,<br/>content, modified, created, creator
        alt Has backlinks
            Node->>Node: Format backlinks as pages:uid array
        end
        Node-->>Exp: Return constructed node
        Exp->>Exp: Update export progress
    end
    
    Exp->>Exp: Query relation data (getRelationData)
    Exp->>Graph: Assemble final JSON-LD with nodes & relations
    Graph-->>Exp: Complete graph

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • eng-1212 Base json-ld export #648: Modifies JSON-LD export functionality in apps/roam/src/utils/jsonld.ts with context and schema updates; shares overlapping changes to CURIE definitions and node construction logic.

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'ENG-1230 Backlinks in JSON-LD export' directly matches the primary changes: adding backlink functionality to the JSON-LD export system. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/roam/src/utils/jsonld.ts (1)

109-111: Fix type mismatch: Return type doesn't allow string arrays.

The return type on line 110 specifies Record<string, string>[] for the @graph field, but the backlinks property (line 179) is typed as string[]. This causes the compilation error reported by the pipeline.

🔎 Proposed fix for return type
 }): Promise<
-  Record<string, string | Record<string, string> | Record<string, string>[]>
+  Record<string, string | string[] | Record<string, string> | Record<string, string | string[]>[]>
 > => {

This allows @graph entries to have both string and string[] values, which aligns with the backlinks implementation.

Also applies to: 169-169, 203-203
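
The widened return type is easier to read if its parts are named. A minimal sketch, assuming hypothetical alias names (`JsonLdValue`, `JsonLdNode`, `JsonLdDocument`) that do not appear in the PR:

```typescript
// Hypothetical aliases; the PR itself inlines the Record types.
type JsonLdValue = string | string[];
type JsonLdNode = Record<string, JsonLdValue>;
type JsonLdDocument = Record<
  string,
  string | Record<string, string> | JsonLdNode[]
>;

// A node whose `backlinks` value is a string[] now type-checks inside "@graph".
const exampleDocument: JsonLdDocument = {
  "@id": "pages:example",
  "@graph": [
    { "@id": "pages:abc", title: "Page A", backlinks: ["pages:def"] },
  ],
};
```

Naming the node type also gives the relation entries in `@graph` a single type to be checked against, which may make future CI errors like the one above shorter to read.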

apps/website/public/schema/dg_core.ttl (1)

1-11: Add missing dct: prefix declaration.

Lines 103 and 106 reference dct:hasPart and dct:references (Dublin Core Terms), but the dct: prefix is not declared. This will cause RDF parsing errors.

🔎 Proposed fix

Add the Dublin Core Terms prefix declaration after line 4:

 @prefix dc: <http://purl.org/dc/elements/1.1/> .
+@prefix dct: <http://purl.org/dc/terms/> .
 @prefix owl: <http://www.w3.org/2002/07/owl#> .

Also applies to: 103-103, 106-106

🧹 Nitpick comments (1)
apps/roam/src/utils/jsonld.ts (1)

150-154: Remove unreachable fallback operator.

The nullish coalescing operator ?? "nodeSchema" on line 171 is unreachable because internalError throws an exception when nodeType is falsy (lines 150-154). The fallback value will never be used.

🔎 Proposed fix
-      "@type": nodeType ?? "nodeSchema", // eslint-disable-line @typescript-eslint/naming-convention
+      "@type": nodeType, // eslint-disable-line @typescript-eslint/naming-convention

Also applies to: 171-171
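
Why the fallback is dead code can be shown with a `never`-returning helper. This is a sketch only: the `internalError` signature and the shape of `nodeSchemaUriByName` below are assumptions for illustration, not the definitions in the codebase.

```typescript
// Hypothetical stand-in for the codebase's `internalError`; the real signature may differ.
const internalError = (message: string): never => {
  throw new Error(message);
};

// Hypothetical lookup table from node type name to schema URI.
const nodeSchemaUriByName: Record<string, string> = {
  Claim: "nodeSchema:claim",
};

const resolveNodeType = (name: string): string => {
  const nodeType: string | undefined = nodeSchemaUriByName[name];
  if (!nodeType) {
    // Returning the `never` call ends this branch for the type checker.
    return internalError(`Unknown node type: ${name}`);
  }
  // `nodeType` is narrowed to `string` here, so `?? "nodeSchema"` could never apply.
  return nodeType;
};
```

The narrowing only holds if the helper is typed as returning `never` (or the branch returns or throws explicitly); otherwise TypeScript would still see `string | undefined` after the check.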

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0734588 and c2b00ba.

📒 Files selected for processing (3)
  • apps/roam/src/utils/jsonld.ts
  • apps/website/public/schema/dg_base.ttl
  • apps/website/public/schema/dg_core.ttl
💤 Files with no reviewable changes (1)
  • apps/website/public/schema/dg_base.ttl
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

  • apps/roam/src/utils/jsonld.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

  • apps/roam/src/utils/jsonld.ts
apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
apps/website/**

📄 CodeRabbit inference engine (.cursor/rules/website.mdc)

Prefer existing dependencies from apps/website/package.json when adding dependencies to the website application

Files:

  • apps/website/public/schema/dg_core.ttl
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/** : Implement the Discourse Graph protocol in the Roam Research extension
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 189
File: packages/database/supabase/migrations/20250603144146_account_centric.sql:50-63
Timestamp: 2025-06-04T11:41:34.951Z
Learning: In the discourse-graph database, all accounts currently stored are Roam platform accounts, making platform-specific migration logic safe for global operations.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:11-40
Timestamp: 2025-06-23T11:49:45.457Z
Learning: In the DiscourseGraphs/discourse-graph codebase, direct access to `window.roamAlphaAPI` is the established pattern throughout the codebase. The team prefers to maintain this pattern consistently rather than making piecemeal changes, and plans to address dependency injection as a global refactor when scheduled.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/schema.yaml:116-121
Timestamp: 2025-05-20T03:06:16.600Z
Learning: In the discourse-graph project's LinkML schema (packages/database/schema.yaml), attributes and slots are equivalent constructs. Items can be defined either as slots or attributes without needing to duplicate them in both sections.
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

  • apps/roam/src/utils/jsonld.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

  • apps/roam/src/utils/jsonld.ts
🧬 Code graph analysis (1)
apps/roam/src/utils/jsonld.ts (3)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
apps/roam/src/utils/pageToMarkdown.ts (1)
  • pageToMarkdown (212-347)
apps/roam/src/utils/getExportTypes.ts (1)
  • updateExportProgress (25-33)
🪛 GitHub Actions: CI
apps/roam/src/utils/jsonld.ts

[error] 203-203: Type '(Record<string, string | string[]> | { "@type": string; predicate: string; source: string; destination: string; })[]' is not assignable to type 'string | Record<string, string> | Record<string, string>[]'.

🔇 Additional comments (5)
apps/roam/src/utils/jsonld.ts (3)

20-20: LGTM: Context extensions for backlinks.

The new CURIE mappings for dg (core schema namespace) and backlink property are correctly structured and align with the ontology additions in dg_core.ttl.

Also applies to: 36-36
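
For readers unfamiliar with CURIE mappings, a context of this kind might look like the sketch below. The namespace IRIs are placeholders on `example.org` and the `backlinks` term definition is an assumption; the actual values in `jsonld.ts` are not shown in this thread.

```typescript
// Assumed context shape; the IRIs are placeholders, not the project's real namespaces.
const context: Record<string, string | Record<string, string>> = {
  pages: "https://example.org/roam/pages/",
  dg: "https://example.org/schema/dg_core#",
  // "@type": "@id" marks the values as IRIs rather than plain string literals.
  backlinks: { "@id": "dg:backlink", "@type": "@id" },
};

// Expand a CURIE such as "dg:backlink" against the prefix table above.
// Unknown prefixes (including "https") are returned unchanged.
const expandCurie = (curie: string): string => {
  const separator = curie.indexOf(":");
  if (separator < 0) return curie;
  const base = context[curie.slice(0, separator)];
  return typeof base === "string" ? base + curie.slice(separator + 1) : curie;
};
```

With these placeholder values, `expandCurie("pages:abc")` yields `https://example.org/roam/pages/abc`.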


138-185: LGTM: Async node construction with backlinks.

The refactor to async node construction is well-structured:

  • Proper use of Promise.all for concurrent processing
  • Correct integration with pageToMarkdown per the codebase pattern
  • Conditional inclusion of backlinks only when present
  • Progress tracking maintained throughout
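
The pattern praised above can be sketched roughly as follows. All names here (`PageInfo`, `NodeDeps`, `buildNodes`) are hypothetical, and the dependencies are injected so the sketch does not rely on the real `pageToMarkdown` or `updateExportProgress` signatures.

```typescript
type JsonLdNode = Record<string, string | string[]>;
type PageInfo = { uid: string; title: string };

// Injected dependencies; stand-ins for the markdown conversion, the backlink
// query, and the progress callback used by the export.
type NodeDeps = {
  toMarkdown: (page: PageInfo) => Promise<string>;
  getBacklinks: (uid: string) => Promise<string[]>;
  onProgress: (fraction: number) => void;
};

const buildNodes = async ({
  pages,
  deps,
}: {
  pages: PageInfo[];
  deps: NodeDeps;
}): Promise<JsonLdNode[]> => {
  let done = 0;
  return Promise.all(
    pages.map(async (page) => {
      const [content, backlinks] = await Promise.all([
        deps.toMarkdown(page),
        deps.getBacklinks(page.uid),
      ]);
      const node: JsonLdNode = {
        "@id": `pages:${page.uid}`,
        title: page.title,
        content,
      };
      // Only emit the key when there is something to report.
      if (backlinks.length > 0) {
        node.backlinks = backlinks.map((uid) => `pages:${uid}`);
      }
      done += 1;
      deps.onProgress(done / pages.length);
      return node;
    }),
  );
};
```

`Promise.all` preserves input order in its result, so the nodes come back in the same order as `pages` even though the per-page work may finish out of order.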

155-168: The backlinks query logic is correct.

The Datalog query properly finds all pages that reference the current page:

  1. It locates the target page and retrieves all blocks on it (including nested blocks)
  2. It finds blocks with :block/refs relationships to either those blocks or the page itself
  3. It extracts the UID of the pages containing those references
  4. It filters results to only pages in the export

This implementation handles both page-level and block-level references correctly and is consistent with similar query patterns elsewhere in the codebase. No changes needed.
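
The four steps above can be illustrated with a query of roughly this shape. This is not the query from the PR, and it has not been run against a Roam graph; the query runner is passed in as a parameter (in Roam that role would be played by the `roamAlphaAPI` query function), and the names `RunQuery` and `getBacklinkUids` are hypothetical.

```typescript
// Assumed shape of a Datalog runner that returns one tuple per result row.
type RunQuery = (query: string, ...args: string[]) => Promise<string[][]>;

// Illustrative query: pages whose blocks reference the target page
// or any block that lives on it.
const BACKLINK_QUERY = `[:find ?sourceUid
  :in $ ?targetUid
  :where
  [?target :block/uid ?targetUid]
  (or-join [?target ?referring]
    [?referring :block/refs ?target]
    (and [?child :block/page ?target]
         [?referring :block/refs ?child]))
  [?referring :block/page ?source]
  [?source :block/uid ?sourceUid]]`;

const getBacklinkUids = async ({
  runQuery,
  targetUid,
  exportedUids,
}: {
  runQuery: RunQuery;
  targetUid: string;
  exportedUids: Set<string>;
}): Promise<string[]> => {
  const rows = await runQuery(BACKLINK_QUERY, targetUid);
  // Keep only referring pages that are part of the export.
  return rows
    .map((row) => row[0])
    .filter((uid): uid is string => uid !== undefined && exportedUids.has(uid));
};
```

Filtering against the exported set after the query keeps the Datalog simple; doing it inside the query would require passing the UID collection as an input.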

apps/website/public/schema/dg_core.ttl (2)

12-17: LGTM: Ontology metadata.

The ontology header properly declares the core vocabulary with appropriate metadata (date, version, labels). The tentative version marker correctly signals the experimental status.


101-101: LGTM: Backlink property semantics (pending fixes).

The ontology design is sound:

  • dg:containsRec as a transitive property correctly models hierarchical containment
  • The property chain axiom (dg:containsRec dct:references) logically derives that if A transitively contains B and B references C, then A contains a reference to C
  • dg:backlink as the inverse of dg:containsRef correctly models bidirectional reference relationships

This aligns well with the JSON-LD implementation in jsonld.ts.
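
To make the chain reasoning concrete, here is a small sketch that derives backlinks from direct containment and reference edges. The `Edge` type and function names are hypothetical. The sketch follows the chain literally (containsRec then references), so a reference made by the container itself, rather than by something it contains, is not counted.

```typescript
type Edge = { from: string; to: string };

// All nodes reachable from `start` by following `contains` edges one or more times.
const descendants = (start: string, contains: Edge[]): Set<string> => {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const edge of contains) {
      if (edge.from === current && !seen.has(edge.to)) {
        seen.add(edge.to);
        stack.push(edge.to);
      }
    }
  }
  return seen;
};

// If A transitively contains B and B references C, emit backlink C -> A.
const deriveBacklinks = ({
  containers,
  contains,
  references,
}: {
  containers: string[];
  contains: Edge[];
  references: Edge[];
}): Edge[] => {
  const emitted = new Set<string>();
  const backlinks: Edge[] = [];
  for (const container of containers) {
    const inside = descendants(container, contains);
    for (const ref of references) {
      const key = `${ref.to}|${container}`;
      if (inside.has(ref.from) && !emitted.has(key)) {
        emitted.add(key);
        backlinks.push({ from: ref.to, to: container });
      }
    }
  }
  return backlinks;
};
```

For example, with `pageA` containing `block1`, `block1` containing `block2`, and `block2` referencing `pageC`, the only derived backlink is from `pageC` to `pageA`.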

Note: This approval is contingent on fixing the critical issues identified in previous comments (missing dct: prefix and rdfs:subPropertyOf).

Also applies to: 105-111

@maparent
Collaborator Author

@maparent maparent requested a review from mdroidian December 31, 2025 18:57
Contributor

@mdroidian mdroidian left a comment

I'm going to be more stringent with adherence to the PR workflows going forward. Please take a minute to re-read the handbook if necessary.

Please add the loom video to the body of the PR then re-tag me for review.

@maparent maparent requested a review from mdroidian December 31, 2025 19:48

3 participants