feat!: allow for multiple gene_lists #164

dlaehnemann · 2025-10-08T11:23:59Z

This will generate one heatmap per list, and bootstrap plots for each of the genes in the union of the lists.

Summary by CodeRabbit

New Features
- Support multiple named gene lists — one heatmap and one bootstrap plot per list; list name included in filenames, logs, and outputs.
- Ability to exclude samples via a YAML list.
Refactor
- Modernized per-list plotting and bootstrap flow; heatmap generation moved to a tidyverse/ggplot-based pipeline; per-transcript bootstrap handling updated.
Documentation
- Clarified multi-list usage, per-list plotting, sample-exclude format, and Ensembl/Pfam guidance.
Chores
- Default bootstrap FDR raised to 0.5; updated Ensembl (115) and Pfam (37.1) and adjusted R environment pins.

…plots for union of genes (untested)

coderabbitai · 2025-10-08T11:24:19Z

Walkthrough

Replaces single-string diffexp.genes_of_interest.genelist with a map diffexp.genes_of_interest.gene_lists across configs and schema; updates Snakemake rules and wildcards to iterate by {gene_list}; adapts R scripts and conda envs to produce per-gene-list heatmaps and per-transcript bootstraps; bumps Ensembl and Pfam refs and adjusts test configs.

Changes

Cohort / File(s)	Summary
Configs (main and tests) `config/config.yaml`, `.test/config/config.yaml`, `.test/three_prime/config/config.yaml`	Replace `diffexp.genes_of_interest.genelist` (string) with `diffexp.genes_of_interest.gene_lists` (map of named file paths, e.g. `gene_list_1`, `gene_list_2`); update `experiment.resources.ref.release` 114→115 and `pfam` 37.0→37.1; add/clarify `diffexp.exclude` examples.
Schema `workflow/schemas/config.schema.yaml`	Replace `genelist` (string) with `gene_lists` (object) using `patternProperties` allowing arbitrary named string entries; update required fields to expect `gene_lists`.
Snakemake rules `workflow/rules/common.smk`, `workflow/rules/diffexp.smk`	Remove `input_genelist(predef_genelist)` helper; lookup `diffexp/genes_of_interest/gene_lists`; switch wildcard/iteration from `mode` to `{gene_list}`; include `{gene_list}` in inputs/outputs/logs/labels and pass per-list paths to bootstrap/heatmap rules.
R scripts `workflow/scripts/plot-bootstrap.R`, `workflow/scripts/plot_diffexp_heatmap.R`	`plot-bootstrap.R`: read named TSV gene-lists when activated, derive transcripts_of_interest from observed data, and output per-transcript bootstrap PDFs including gene/transcript/model in filenames. `plot_diffexp_heatmap.R`: use `{gene_list}` wildcard, support `topn` or predefined per `{gene_list}`, select genes, and render heatmaps with tidyverse/ggalign ggplot pipeline (replacing pheatmap-based code).
Conda envs `workflow/envs/heatmap.yaml`, `workflow/envs/sleuth.yaml`	`heatmap.yaml`: replace `r-pheatmap`, `r-dplyr`, `r-tidyr` with `r-base`, `r-tidyverse`, `r-ggalign`. `sleuth.yaml`: loosen `r-base` pin from `=4` to `>=4.1`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CFG as Config (diffexp.gene_lists)
  participant SMK as Snakemake
  participant HEAT as plot_diffexp_heatmap.R
  participant BOOT as plot-bootstrap.R

  CFG->>SMK: provide map of named gene-list paths
  rect rgb(240,248,255)
    note right of SMK: Snakemake iterates per `gene_list` key
    SMK->>SMK: expand `{gene_list}` wildcard
    SMK->>HEAT: run heatmap job with {model, gene_list, predef_gene_list}
    HEAT-->>SMK: produce `{model}.{gene_list}.diffexp-heatmap.{gene_list}.pdf` + log
  end

  rect rgb(245,255,240)
    note right of SMK: bootstrap reads all enabled gene_lists
    SMK->>BOOT: pass gene_lists map (paths + activate flags)
    BOOT->>BOOT: read TSVs, derive transcripts_of_interest from observed counts
    BOOT-->>SMK: generate bootstrap PDFs per transcript (filenames include gene, transcript, model)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

docs: add explanations and sources to config/config.yaml #162 — touches diffexp.genes_of_interest handling and config/schema edits (closely related).

Suggested reviewers

johanneskoester
FelixMoelder

Poem

I hop through configs, tidy and neat,
Named lists now gather each gene's heartbeat.
Heatmaps and bootstraps stamped with their name,
Per-list wildcards frame each plotting game. 🐰

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly highlights the breaking feature of supporting multiple gene lists which aligns exactly with the PR objectives of enabling multiple gene_list inputs and updating downstream plotting behavior. It is concise, uses conventional commit syntax to denote a breaking change, and clearly conveys the primary change without extraneous details.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/allow-multiple-gene-set-lists-heatmap-bootstrap-plots

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e015b3 and 8dbfca8.

📒 Files selected for processing (1)

config/config.yaml (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

config/config.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run-rna-workflow
GitHub Check: run-three-prime-rna-workflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

workflow/rules/diffexp.smk (1)
258-279: Consider adding wildcard constraint for {gene_list}.

The rule correctly implements per-gene-list heatmap generation. However, consider adding a wildcard constraint for {gene_list} to prevent potential issues with special characters in gene list names and maintain consistency with existing patterns (see line 35: {model,[^.]+}).

For example, add a constraint in the output definition:
-            "results/plots/diffexp-heatmap/{model}.{gene_list}.diffexp-heatmap.{mode}.pdf",
+            "results/plots/diffexp-heatmap/{model}.{gene_list,[^./]+}.diffexp-heatmap.{mode}.pdf",
This prevents gene list names containing dots or slashes from causing wildcard ambiguity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e580493 and a6eaf38.

⛔ Files ignored due to path filters (2)

.test/resources/gene_list.tsv is excluded by !**/*.tsv
.test/resources/gene_list_2.tsv is excluded by !**/*.tsv

📒 Files selected for processing (8)

.test/config/config.yaml (1 hunks)
.test/three_prime/config/config.yaml (1 hunks)
config/config.yaml (1 hunks)
workflow/rules/common.smk (1 hunks)
workflow/rules/diffexp.smk (2 hunks)
workflow/schemas/config.schema.yaml (1 hunks)
workflow/scripts/plot-bootstrap.R (1 hunks)
workflow/scripts/plot_diffexp_heatmap.R (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Tests

workflow/rules/common.smk

[error] 336-336: SyntaxError: positional argument follows keyword argument.

🔇 Additional comments (1)

workflow/rules/diffexp.smk (1)

149-149: LGTM! Correct implementation for multi-list bootstrap aggregation.

The change correctly retrieves the entire gene_lists map from the config, enabling the bootstrap plot script to aggregate genes across all provided lists as intended.

workflow/rules/common.smk

workflow/scripts/plot-bootstrap.R

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

workflow/rules/common.smk (1)
329-340: Syntax error from previous review has been addressed.

The critical issues flagged in the past review (missing dpath= keyword and incorrect path) have been resolved. The current implementation correctly uses lookup(within=config, dpath="diffexp/genes_of_interest/gene_lists").

For consistency with line 326 (which uses gene_list=["topn"]), consider explicitly converting the lookup result to a list:
             gene_list=lookup(
                 within=config, dpath="diffexp/genes_of_interest/gene_lists"
             ),
+            gene_list=list(
+                lookup(
+                    within=config, dpath="diffexp/genes_of_interest/gene_lists"
+                )
+            ),
This makes the type explicit and aligns with the default gene_list specification on line 326.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a67b6a and dca8aef.

📒 Files selected for processing (2)

workflow/rules/common.smk (1 hunks)
workflow/rules/diffexp.smk (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

workflow/rules/diffexp.smk

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run-three-prime-rna-workflow
GitHub Check: run-rna-workflow

🔇 Additional comments (1)

workflow/rules/common.smk (1)

323-328: LGTM! Default heatmap generation for top N genes is well-structured.

The addition of the {gene_list} wildcard and the default gene_list=["topn"] ensures that a heatmap for the top differentially expressed genes is always generated, which maintains backward compatibility while supporting the new multi-list feature.

…t release

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

workflow/scripts/plot-bootstrap.R (2)
23-30: Well-implemented gene list aggregation.

The logic correctly handles multiple gene lists:

character() initialization prevents crashes when gene_lists is empty (addresses previous review)

union() automatically deduplicates genes across lists

Pipe operators and transformations are appropriate

Consider adding error handling around file reads to provide clearer feedback if a gene list file is missing or malformed:
   for (gene_list in unname(snakemake@params[["genes_of_interest"]][["gene_lists"]])) {
-    new_genes <- read_tsv(
-      gene_list,
-      col_names = FALSE
-    )
+    new_genes <- tryCatch(
+      read_tsv(gene_list, col_names = FALSE, show_col_types = FALSE),
+      error = function(e) {
+        stop(sprintf("Failed to read gene list '%s': %s", gene_list, e$message))
+      }
+    )
     genes <- union(genes, new_genes |> unlist() |> unname())
   }
36-39: Update outdated comment.

The comment refers to an old implementation detail. Consider updating it to reflect the current logic:
 } else {
-    # "genes" is null, if the list provided in config.yaml is empty
+    # Create empty tibble when genes_of_interest is not activated
     genes_of_interest <- tibble(ext_gene = character())
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4fb68b and 84aca9e.

📒 Files selected for processing (1)

workflow/scripts/plot-bootstrap.R (1 hunks)

🔇 Additional comments (2)

workflow/scripts/plot-bootstrap.R (2)

22-22: LGTM! Parameter access updated correctly.

The parameter path change from genes to genes_of_interest aligns with the PR's shift to multiple gene lists.

31-35: LGTM! Tibble creation is correct.

enframe() appropriately converts the gene vector to a tibble with the expected ext_gene column name.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

workflow/scripts/plot-bootstrap.R (1)
22-35: Add error handling for missing or malformed gene list files.

The loop reads TSV files without error handling. If a file path is invalid or the file is malformed, the script will crash with potentially unclear errors. Consider wrapping the file read operations in tryCatch() to provide informative error messages.

Additionally, the native pipe operator |> requires R >= 4.1.0, which may break on older R versions if not properly documented or enforced.

Apply this diff to add error handling:
 if (snakemake@params[["genes_of_interest"]][["activate"]] == TRUE) {
   genes = character()
   for (gene_list in unname(snakemake@params[["genes_of_interest"]][["gene_lists"]])) {
-    new_genes <- read_tsv(
-      gene_list,
-      col_names = FALSE
-    )
-    genes <- union(genes, new_genes |> unlist() |> unname())
+    tryCatch({
+      new_genes <- read_tsv(
+        gene_list,
+        col_names = FALSE,
+        show_col_types = FALSE
+      )
+      genes <- union(genes, new_genes |> unlist() |> unname())
+    }, error = function(e) {
+      stop(sprintf("Failed to read gene list file '%s': %s", gene_list, e$message))
+    })
   }
   genes_of_interest <- enframe(
     genes,
     name = NULL,
     value = "ext_gene"
   )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84aca9e and f6709e7.

📒 Files selected for processing (1)

workflow/scripts/plot-bootstrap.R (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run-three-prime-rna-workflow
GitHub Check: run-rna-workflow

🔇 Additional comments (1)

workflow/scripts/plot-bootstrap.R (1)

37-38: LGTM!

The empty tibble initialization is consistent with the structure created when genes_of_interest is activated, ensuring downstream compatibility.

dlaehnemann · 2025-10-10T07:42:18Z

Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/scripts/plot-bootstrap.R (1)> 22-35: Add error handling for missing or malformed gene list files.
The loop reads TSV files without error handling. If a file path is invalid or the file is malformed, the script will crash with potentially unclear errors. Consider wrapping the file read operations in tryCatch() to provide informative error messages.
Additionally, the native pipe operator |> requires R >= 4.1.0, which may break on older R versions if not properly documented or enforced.
Apply this diff to add error handling:
 if (snakemake@params[["genes_of_interest"]][["activate"]] == TRUE) {
   genes = character()
   for (gene_list in unname(snakemake@params[["genes_of_interest"]][["gene_lists"]])) {
-    new_genes <- read_tsv(
-      gene_list,
-      col_names = FALSE
-    )
-    genes <- union(genes, new_genes |> unlist() |> unname())
+    tryCatch({
+      new_genes <- read_tsv(
+        gene_list,
+        col_names = FALSE,
+        show_col_types = FALSE
+      )
+      genes <- union(genes, new_genes |> unlist() |> unname())
+    }, error = function(e) {
+      stop(sprintf("Failed to read gene list file '%s': %s", gene_list, e$message))
+    })
   }
   genes_of_interest <- enframe(
     genes,
     name = NULL,
     value = "ext_gene"
   )
📜 Review details

@coderabbitai read_tsv() is a function from a tidyverse package (readr) and these provide good error handling and clear error messages. Thus, for tidyverse package functions, no manual error handling for common cases is necessary.

coderabbitai · 2025-10-10T07:43:05Z

Tip

For best results, initiate chat on the files or code changes.

@dlaehnemann, thank you for the clarification! You're absolutely right that read_tsv() and other tidyverse functions provide robust error handling with clear, informative error messages out of the box. I'll keep this in mind for future reviews.

✏️ Learnings added

Learnt from: dlaehnemann
URL: https://github.com/snakemake-workflows/rna-seq-kallisto-sleuth/pull/164

Timestamp: 2025-10-10T07:43:04.524Z
Learning: In this Snakemake workflow project, tidyverse package functions (such as `read_tsv()` from `readr`) provide good error handling and clear error messages. Manual error handling with `tryCatch()` for common cases is not necessary when using tidyverse functions.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

fxwiegand

I don't quite understand the rotation of the genes into the horizontal list. I would not say that this helps readability either. Also previous lists would need to be adapted again. Couldnt you simply rotate the list where it is needed in the code?

Also one thought that came to my mind was that this is a breaking change correct? One could make this non-breaking by simply allowing to add a second gene set via another column.

However all this is probably nitpicky stuff - looks good in general i would say!

dlaehnemann · 2025-10-13T09:26:08Z

Good points, thanks. The explanatory comment in the config/config.yaml and the actual example gene list for the tests were previously disagreeing, because I added a wrong explanatory comment there (I guess I misunderstood the explicit specification of the \t separator in previous code, which would not be necessary). I made them agree in this way, but possibly the other way around makes the code a bit clearer, being able to directly parse in a column. I'll adjust this.

Also, this is a breaking change, yes. So we should reflect this in the version bump. I also like the idea of having only a single file with multiple columns, but this would also require a header line with column names, to name each list. So I guess this would also be breaking. And I am actually unsure whether users would rather maintain a single file with multiple columns, or multiple files with a single column each. The former would mean that genes arbitrarily line up in rows, even though they don't relate in any way. So I guess I would stick with one file per gene list?

fxwiegand · 2025-10-13T09:45:24Z

Good points, thanks. The explanatory comment in the config/config.yaml and the actual example gene list for the tests were previously disagreeing, because I added a wrong explanatory comment there (I guess I misunderstood the explicit specification of the \t separator in previous code, which would not be necessary). I made them agree in this way, but possibly the other way around makes the code a bit clearer, being able to directly parse in a column. I'll adjust this.

Yeah i get that! The .tsv extension is quite confusing when theres only a single column and it somehow also is a line break delimited file then.

Also, this is a breaking change, yes. So we should reflect this in the version bump. I also like the idea of having only a single file with multiple columns, but this would also require a header line with column names, to name each list. So I guess this would also be breaking. And I am actually unsure whether users would rather maintain a single file with multiple columns, or multiple files with a single column each. The former would mean that genes arbitrarily line up in rows, even though they don't relate in any way. So I guess I would stick with one file per gene list?

Ah yes right the columns would at least need a header for naming the gene set then so yeah the change will be breaking anyway. In that case i also don't really have a strong opinion on single file vs. multiple files. I agree that the single file solution file could be confusing when looking at a row of that table.

Addimator · 2025-10-13T10:32:05Z

For the single file multiple columns discussion I want to add that I personally like single files (especially since the cluster is so damn slow a lot of times) to not have to open multiple files. But since this is a really specific scenario and there should not be too many files anyways I do not think this is an important reason to switch to single file multiple cols.

config/config.yaml

Addimator · 2025-10-13T10:34:41Z

I just finished the review. Again some nitpicking stuff and nothing major to add. Great PR from my side too!

dlaehnemann · 2025-10-13T17:57:24Z

OK, I went back to line-separated gene lists. But then I couldn't bear the current status of the code and changed it all to proper tidyverse syntax and hopefully made it all a bit more readable. Also fixed the bootstrap plot filtering in the process, which was omitting quite a few requested genes. And switched the heatmap plotting from the very specific pheatmap() to the ggplot2-based ggalign(), which should be much easier to adjust and improve.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

config/config.yaml (2)

144-156: Clarify gene-list docs: avoid “infix”, add period, and state “no header”

Tighten wording and remove ambiguity about headers.

Apply:

-  # If you want a heatmap and bootstrap plots for particular sets of genes,
+  # If you want a heatmap and bootstrap plots for particular sets of genes,
   # you can set `activate: true` and provide one or more gene list files. In
-  # those files, list all your HGNC gene symbols of interest, one gene per line.
+  # those files, list all your HGNC gene symbols of interest, one gene per line (no header).
@@
-    gene_lists:
-      # use a descriptive gene list name, as this will be used as an infix in
-      # the filename of the resulting heatmap plot
+    gene_lists:
+      # Use a descriptive gene list name; it will be used in the filename of the resulting heatmap plot.
       gene_list_1: "config/gene_list.tsv"

42-42: Consider updating default Ensembl release (duplicate of prior nit)

Default is "114". If you want “latest working” by default, consider bumping to current.

What is the latest Ensembl release as of today, and is release 114 still recommended as a default for new analyses?

🧹 Nitpick comments (5)

workflow/scripts/plot_diffexp_heatmap.R (2)

35-38: Robust gene-list read: handle headerless files explicitly

If lists are single-column without header (as docs suggest), read as headerless and rename to avoid accidental header rows being treated as genes.

Apply:

-    predef_gene_list <-
-        read_tsv(
-            snakemake@input[["predef_gene_list"]],
-            col_names = c("gene")
-        )
+    predef_gene_list <-
+        read_tsv(
+            snakemake@input[["predef_gene_list"]],
+            col_names = FALSE,
+            show_col_types = FALSE
+        ) |>
+        rename(gene = 1)

47-70: Ensure proper matrix type for ggheatmap and qualify unit()

Coerce to a numeric matrix so ggheatmap has the expected input.
Use grid::unit to avoid missing import.

Apply:

-matrix <- selected_genes |>
+matrix <- selected_genes |>
     mutate(transcript = str_pad(transcript, 18, "right")) |>
     replace_na(list(gene = "   ")) |>
     mutate(t = str_c(gene, " - ", transcript)) |>
     select(-gene, -transcript) |>
-    column_to_rownames(var = "t")
+    column_to_rownames(var = "t") |>
+    as.matrix()
@@
-    anno_top(size = unit(3, "cm")) +
+    anno_top(size = grid::unit(3, "cm")) +
@@
-    anno_right(size = unit(3, "cm")) +
+    anno_right(size = grid::unit(3, "cm")) +

workflow/scripts/plot-bootstrap.R (3)

14-19: Prefer slice_min over superseded top_n and avoid negative n

Cleaner and more explicit.

Apply:
-top_genes <- transcripts |>
-    filter(qval <= snakemake@params[["fdr"]]) |>
-    top_n(top_n, qval) |>
+top_genes <- transcripts |>
+    filter(qval <= snakemake@params[["fdr"]]) |>
+    slice_min(qval, n = abs(top_n)) |>
     dplyr::select(ext_gene) |>
     drop_na() |>
     distinct(ext_gene)
35-35: Create output dir recursively and silence warnings

Avoid warnings if the directory exists and handle nested dirs.

Apply:
-dir.create(snakemake@output[[1]])
+dir.create(snakemake@output[[1]], showWarnings = FALSE, recursive = TRUE)
12-12: Optional: suppress type messages for read_tsv

Keeps logs clean.

Apply:
-transcripts <- read_tsv(snakemake@input[["transcripts"]])
+transcripts <- read_tsv(snakemake@input[["transcripts"]], show_col_types = FALSE)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6709e7 and a4b56c6.

⛔ Files ignored due to path filters (2)

.test/resources/gene_list.tsv is excluded by !**/*.tsv
.test/resources/gene_list_2.tsv is excluded by !**/*.tsv

📒 Files selected for processing (5)

.test/config/config.yaml (2 hunks)
config/config.yaml (3 hunks)
workflow/envs/heatmap.yaml (1 hunks)
workflow/scripts/plot-bootstrap.R (1 hunks)
workflow/scripts/plot_diffexp_heatmap.R (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.test/config/config.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: formatting

workflow/envs/heatmap.yaml

workflow/scripts/plot-bootstrap.R

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

config/config.yaml (1)

153-156: Clarify wording (“infix”), add period, and note “no header”.

Make wording clearer and guide users to single-column, headerless files to match the readers.

-      # use a descriptive gene list name, as this will be used as an infix in
-      # the filename of the resulting heatmap plot
+      # Use a descriptive gene list name; it will be part of the resulting heatmap
+      # filename.
+      # Each gene list file must be single-column, no header; one HGNC symbol per line.
       gene_list_1: "config/gene_list.tsv"

🧹 Nitpick comments (5)

workflow/scripts/plot_diffexp_heatmap.R (2)
47-53: Ensure a true numeric matrix for ggheatmap.

Coerce to matrix after setting row names.
 matrix <- selected_genes |>
     mutate(transcript = str_pad(transcript, 18, "right")) |>
     replace_na(list(gene = "   ")) |>
     mutate(t = str_c(gene, " - ", transcript)) |>
     select(-gene, -transcript) |>
-    column_to_rownames(var = "t")
+    column_to_rownames(var = "t") |>
+    as.matrix()
57-65: Namespace unit() to avoid runtime resolution issues.

Explicitly use grid::unit.
-    anno_top(size = unit(3, "cm")) +
+    anno_top(size = grid::unit(3, "cm")) +
@@
-    anno_right(size = unit(3, "cm")) +
+    anno_right(size = grid::unit(3, "cm")) +
workflow/scripts/plot-bootstrap.R (3)
14-19: Use slice_min instead of negative top_n.

Cleaner and future-proof.
-top_genes <- transcripts |>
-    filter(qval <= snakemake@params[["fdr"]]) |>
-    top_n(top_n, qval) |>
+top_genes <- transcripts |>
+    filter(qval <= snakemake@params[["fdr"]]) |>
+    slice_min(qval, n = as.integer(snakemake@params[["top_n"]]), with_ties = FALSE) |>
     dplyr::select(ext_gene) |>
     drop_na() |>
     distinct(ext_gene)
35-35: Make dir.create idempotent.

Avoid warnings when dir exists and ensure parents are created.
-dir.create(snakemake@output[[1]])
+dir.create(snakemake@output[[1]], recursive = TRUE, showWarnings = FALSE)
22-26: Multi-file read_tsv: confirm behavior.

read_tsv() (via vroom) can read a vector of files and row-bind them. Confirm that your readr version guarantees this across environments; otherwise, loop and bind_rows.
# Alternative (if needed):
genes_of_interest <- purrr::map_df(
  unname(snakemake@params[["genes_of_interest"]][["gene_lists"]]),
  ~ read_tsv(.x, col_names = "ext_gene")
) |> distinct()

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6709e7 and a4b56c6.

⛔ Files ignored due to path filters (2)

.test/resources/gene_list.tsv is excluded by !**/*.tsv
.test/resources/gene_list_2.tsv is excluded by !**/*.tsv

📒 Files selected for processing (5)

.test/config/config.yaml (2 hunks)
config/config.yaml (3 hunks)
workflow/envs/heatmap.yaml (1 hunks)
workflow/scripts/plot-bootstrap.R (1 hunks)
workflow/scripts/plot_diffexp_heatmap.R (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.test/config/config.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run-rna-workflow
GitHub Check: run-three-prime-rna-workflow

workflow/envs/heatmap.yaml

workflow/scripts/plot_diffexp_heatmap.R

workflow/scripts/plot-bootstrap.R

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (8)

workflow/scripts/plot_diffexp_heatmap.R (4)
10-12: Optional: quiet readr parsing output

Consider suppressing column-type messages to keep logs clean.
-log_counts <- read_tsv(
-    snakemake@input[["logcountmatrix_file"]]
-)
+log_counts <- readr::read_tsv(
+  snakemake@input[["logcountmatrix_file"]],
+  show_col_types = FALSE
+)
47-53: Guard empty selections to avoid ggheatmap crash

If no rows are selected (empty gene list or no matches), ggheatmap likely errors. Short‑circuit with a friendly message and create an empty placeholder plot or skip saving.
-matrix <- selected_genes |>
+if (nrow(selected_genes) == 0) {
+  message("No genes selected for heatmap; skipping plot.")
+  quit(save = "no", status = 0)
+}
+
+matrix <- selected_genes |>
54-70: Namespace grid::unit for clarity

Be explicit to avoid masked imports.
-    anno_top(size = unit(3, "cm")) +
+    anno_top(size = grid::unit(3, "cm")) +
...
-    anno_right(size = unit(3, "cm")) +
+    anno_right(size = grid::unit(3, "cm")) +
34-38: Gene list read: document “no header” assumption

You assign col_names = "gene", implying a headerless single-column file. Consider a quick check to detect and warn on accidental headers.
 predef_gene_list <-
-    read_tsv(
+    readr::read_tsv(
         snakemake@input[["predef_gene_list"]],
-        col_names = c("gene")
+        col_names = "gene",
+        show_col_types = FALSE
     )
config/config.yaml (2)
45-51: Avoid hardcoding “release-114” in examples

Use a placeholder or the current release to prevent confusion.
-# https://ftp.ensembl.org/pub/release-114/species_EnsemblVertebrates.txt
+# https://ftp.ensembl.org/pub/release-<release>/species_EnsemblVertebrates.txt
...
-# https://ftp.ensembl.org/pub/release-114/fasta/homo_sapiens/cdna/
+# https://ftp.ensembl.org/pub/release-<release>/fasta/homo_sapiens/cdna/
144-151: Clarify gene list file format: no header

Explicitly state that files must be single-column, headerless TSV/CSV with one HGNC symbol per line.
-# you can set `activate: true` and provide one or more gene list files. In
-# those files, list all your HGNC gene symbols of interest, one gene per line.
+# you can set `activate: true` and provide one or more gene list files.
+# Each file must be a single-column, headerless TSV/CSV listing HGNC gene
+# symbols of interest, one gene per line.
workflow/scripts/plot-bootstrap.R (2)
35-35: Optional: make dir.create idempotent

Avoid warnings if the directory already exists.
-dir.create(snakemake@output[[1]])
+dir.create(snakemake@output[[1]], recursive = TRUE, showWarnings = FALSE)
14-19: Prefer slice_min over deprecated top_n

top_n() is superseded; slice_min expresses intent directly.
-top_genes <- transcripts |>
-    filter(qval <= snakemake@params[["fdr"]]) |>
-    top_n(top_n, qval) |>
+top_genes <- transcripts |>
+    filter(qval <= snakemake@params[["fdr"]]) |>
+    dplyr::slice_min(order_by = qval, n = as.integer(snakemake@params[["top_n"]])) |>
     dplyr::select(ext_gene) |>
     drop_na() |>
     distinct(ext_gene)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6709e7 and 4c4536c.

⛔ Files ignored due to path filters (2)

.test/resources/gene_list.tsv is excluded by !**/*.tsv
.test/resources/gene_list_2.tsv is excluded by !**/*.tsv

📒 Files selected for processing (5)

.test/config/config.yaml (3 hunks)
config/config.yaml (4 hunks)
workflow/envs/heatmap.yaml (1 hunks)
workflow/scripts/plot-bootstrap.R (1 hunks)
workflow/scripts/plot_diffexp_heatmap.R (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: run-three-prime-rna-workflow
GitHub Check: run-rna-workflow

🔇 Additional comments (2)

.test/config/config.yaml (1)

85-87: FDR=0.5 in tests: confirm intent

A 0.5 FDR will yield many positives (more bootstrap plots). If this is to stress plotting in CI, fine; otherwise consider a tighter threshold.

workflow/envs/heatmap.yaml (1)

6-7: r-ggalign includes ggheatmap(), anno_top(), anno_right(), and align_dendro(); no new dependencies required.

.test/config/config.yaml

workflow/scripts/plot-bootstrap.R

Addimator

Another round, but again only two small things. The R improvements were particularly interesting to me since I originally wrote the pheatmap stuff. It was my first time looking into R after a long time. So rule of thumb: Whenever possible use tidyverse and ggplot for doing stuff in R, I will remember that :)

config/config.yaml

Addimator · 2025-10-14T08:03:04Z

workflow/schemas/config.schema.yaml

+                type: string
        required:
-          - genelist
+          - gene_lists


Why is gene_lists required? When the user sets activate to FALSE, then we do not need gene_lists?

I guess this is debatable. It was in there previously and I kept it, because even if we set it to FALSE, we usually keep the placeholder to have the expected structure in place. Also, the bootstrap rule doesn't actually specify the respective files under input, so we would not get a missing input exception, but rather some lookup error where I wouldn't be sure if it is clearly interpretable. On the other hand, if someone really doesn't care about these lists, but wants other updates to this workflow, they will be forced to update their config. So we could consider dropping this.
Maybe @fxwiegand has another opinion?

Co-authored-by: Adrian Prinz <44083468+Addimator@users.noreply.github.com>

dlaehnemann · 2025-10-14T11:10:49Z

Another round, but again only two small things. The R improvements were particularly interesting to me since I originally wrote the pheatmap stuff. It was my first time looking into R after a long time. So rule of thumb: Whenever possible use tidyverse and ggplot for doing stuff in R, I will remember that :)

I think this was actually re-implementing the sleuth-internal heatmap, right? Because that already wasn't configurable in the way we wanted it to be? But yeah, I also wanted to try this package, which seems to be very systematic and allows for all kinds of complex multi-panel plots with nice alignment among them:
https://yunuuuu.github.io/ggalign-gallery/

feat: allow for multiple gene_lists, one heatmap per list, bootstrap …

a6eaf38

…plots for union of genes (untested)

coderabbitai bot reviewed Oct 8, 2025

View reviewed changes

workflow/rules/common.smk Outdated Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Outdated Show resolved Hide resolved

dlaehnemann added 4 commits October 8, 2025 13:49

fix: lookup syntax

896e4bc

chore: snakefmt

c48cb17

fix: typo

1a67b6a

fix: restore proper handling of topn in heatmap plotting rule

dca8aef

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

dlaehnemann added 10 commits October 9, 2025 10:58

docs: add linkout to ensembl blog release category to determine lates…

6043b72

…t release

fix: handle topn case in lookup

c9b1afb

fix: provide dummy file path for topn case in heatmap rule

5594d41

fix: use correct wildcard in heatmap plotting script

b4f88d1

docs: better explain diffexp/exclude in config.yaml

fa97e67

fix: correctly parse params in plot_bootstrap

3e97347

fix: also test diffexp/exclude in config.yaml of CI cases

c4fb68b

fix: more defensive coding strategy, as suggested by @coderabbitai

206d244

dummy commit to trigger GitHub

09914fb

another dummy commit to trigger GitHub Actions

84aca9e

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

docs: update comment in plot-bootstrap.R

f6709e7

coderabbitai bot reviewed Oct 10, 2025

View reviewed changes

fxwiegand reviewed Oct 13, 2025

View reviewed changes

dlaehnemann changed the title ~~feat: allow for multiple gene_lists~~ feat!: allow for multiple gene_lists Oct 13, 2025

Addimator reviewed Oct 13, 2025

View reviewed changes

config/config.yaml Outdated Show resolved Hide resolved

config/config.yaml Outdated Show resolved Hide resolved

config/config.yaml Outdated Show resolved Hide resolved

dlaehnemann added 7 commits October 13, 2025 19:47

fix: switch back to line-separated gene set files for plots

511cae8

fix: make boostrap plotting more modern

e036131

fix: clean up and overhaul heatmap plotting, switch to ggalign

107d503

fix: main test config

a4b56c6

chore: update release versions

778f099

docs: adjust wording to not use "infix"

b338f0e

docs: write a proper sentence

4c4536c

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

workflow/envs/heatmap.yaml Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Outdated Show resolved Hide resolved

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

workflow/envs/heatmap.yaml Show resolved Hide resolved

workflow/scripts/plot_diffexp_heatmap.R Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Outdated Show resolved Hide resolved

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

.test/config/config.yaml Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Show resolved Hide resolved

workflow/scripts/plot-bootstrap.R Outdated Show resolved Hide resolved

dlaehnemann added 3 commits October 13, 2025 20:08

fix: length test for transcripts of interest

5f1c7fa

fix: syntax

626f306

fix: r-base to >=4.1 for scripts using native R pipe

4e015b3

Addimator reviewed Oct 14, 2025

View reviewed changes

dlaehnemann and others added 2 commits October 14, 2025 13:04

docs: update config/config.yaml to ensembl release 114

846b420

Co-authored-by: Adrian Prinz <44083468+Addimator@users.noreply.github.com>

docs: update more 114 -> 115

8dbfca8

feat!: allow for multiple gene_lists #164

Are you sure you want to change the base?

feat!: allow for multiple gene_lists #164

Uh oh!

Conversation

dlaehnemann commented Oct 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

dlaehnemann commented Oct 10, 2025

Uh oh!

coderabbitai bot commented Oct 10, 2025

Uh oh!

fxwiegand left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dlaehnemann commented Oct 13, 2025

Uh oh!

fxwiegand commented Oct 13, 2025

Uh oh!

Addimator commented Oct 13, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Addimator commented Oct 13, 2025

Uh oh!

dlaehnemann commented Oct 13, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Addimator left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Addimator Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

dlaehnemann commented Oct 8, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 8, 2025 •

edited

Loading

fxwiegand left a comment •

edited

Loading