-
Notifications
You must be signed in to change notification settings - Fork 45
feat!: allow for multiple gene_lists #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat!: allow for multiple gene_lists #164
Conversation
…plots for union of genes (untested)
WalkthroughReplaces single-string diffexp.genes_of_interest.genelist with a map diffexp.genes_of_interest.gene_lists across configs and schema; updates Snakemake rules and wildcards to iterate by Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CFG as Config (diffexp.gene_lists)
participant SMK as Snakemake
participant HEAT as plot_diffexp_heatmap.R
participant BOOT as plot-bootstrap.R
CFG->>SMK: provide map of named gene-list paths
rect rgb(240,248,255)
note right of SMK: Snakemake iterates per `gene_list` key
SMK->>SMK: expand `{gene_list}` wildcard
SMK->>HEAT: run heatmap job with {model, gene_list, predef_gene_list}
HEAT-->>SMK: produce `{model}.{gene_list}.diffexp-heatmap.{gene_list}.pdf` + log
end
rect rgb(245,255,240)
note right of SMK: bootstrap reads all enabled gene_lists
SMK->>BOOT: pass gene_lists map (paths + activate flags)
BOOT->>BOOT: read TSVs, derive transcripts_of_interest from observed counts
BOOT-->>SMK: generate bootstrap PDFs per transcript (filenames include gene, transcript, model)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
workflow/rules/diffexp.smk (1)
258-279
: Consider adding wildcard constraint for{gene_list}
.The rule correctly implements per-gene-list heatmap generation. However, consider adding a wildcard constraint for
{gene_list}
to prevent potential issues with special characters in gene list names and maintain consistency with existing patterns (see line 35:{model,[^.]+}
).For example, add a constraint in the output definition:
- "results/plots/diffexp-heatmap/{model}.{gene_list}.diffexp-heatmap.{mode}.pdf", + "results/plots/diffexp-heatmap/{model}.{gene_list,[^./]+}.diffexp-heatmap.{mode}.pdf",This prevents gene list names containing dots or slashes from causing wildcard ambiguity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.test/resources/gene_list.tsv
is excluded by!**/*.tsv
.test/resources/gene_list_2.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (8)
.test/config/config.yaml
(1 hunks).test/three_prime/config/config.yaml
(1 hunks)config/config.yaml
(1 hunks)workflow/rules/common.smk
(1 hunks)workflow/rules/diffexp.smk
(2 hunks)workflow/schemas/config.schema.yaml
(1 hunks)workflow/scripts/plot-bootstrap.R
(1 hunks)workflow/scripts/plot_diffexp_heatmap.R
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Tests
workflow/rules/common.smk
[error] 336-336: SyntaxError: positional argument follows keyword argument.
🔇 Additional comments (1)
workflow/rules/diffexp.smk (1)
149-149
: LGTM! Correct implementation for multi-list bootstrap aggregation.The change correctly retrieves the entire
gene_lists
map from the config, enabling the bootstrap plot script to aggregate genes across all provided lists as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/rules/common.smk (1)
329-340
: Syntax error from previous review has been addressed.The critical issues flagged in the past review (missing
dpath=
keyword and incorrect path) have been resolved. The current implementation correctly useslookup(within=config, dpath="diffexp/genes_of_interest/gene_lists")
.For consistency with line 326 (which uses
gene_list=["topn"]
), consider explicitly converting the lookup result to a list:gene_list=lookup( within=config, dpath="diffexp/genes_of_interest/gene_lists" ), + gene_list=list( + lookup( + within=config, dpath="diffexp/genes_of_interest/gene_lists" + ) + ),This makes the type explicit and aligns with the default gene_list specification on line 326.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk
(1 hunks)workflow/rules/diffexp.smk
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/diffexp.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-three-prime-rna-workflow
- GitHub Check: run-rna-workflow
🔇 Additional comments (1)
workflow/rules/common.smk (1)
323-328
: LGTM! Default heatmap generation for top N genes is well-structured.The addition of the
{gene_list}
wildcard and the defaultgene_list=["topn"]
ensures that a heatmap for the top differentially expressed genes is always generated, which maintains backward compatibility while supporting the new multi-list feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
workflow/scripts/plot-bootstrap.R (2)
23-30
: Well-implemented gene list aggregation.The logic correctly handles multiple gene lists:
character()
initialization prevents crashes whengene_lists
is empty (addresses previous review)union()
automatically deduplicates genes across lists- Pipe operators and transformations are appropriate
Consider adding error handling around file reads to provide clearer feedback if a gene list file is missing or malformed:
for (gene_list in unname(snakemake@params[["genes_of_interest"]][["gene_lists"]])) { - new_genes <- read_tsv( - gene_list, - col_names = FALSE - ) + new_genes <- tryCatch( + read_tsv(gene_list, col_names = FALSE, show_col_types = FALSE), + error = function(e) { + stop(sprintf("Failed to read gene list '%s': %s", gene_list, e$message)) + } + ) genes <- union(genes, new_genes |> unlist() |> unname()) }
36-39
: Update outdated comment.The comment refers to an old implementation detail. Consider updating it to reflect the current logic:
} else { - # "genes" is null, if the list provided in config.yaml is empty + # Create empty tibble when genes_of_interest is not activated genes_of_interest <- tibble(ext_gene = character()) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/scripts/plot-bootstrap.R
(1 hunks)
🔇 Additional comments (2)
workflow/scripts/plot-bootstrap.R (2)
22-22
: LGTM! Parameter access updated correctly.The parameter path change from
genes
togenes_of_interest
aligns with the PR's shift to multiple gene lists.
31-35
: LGTM! Tibble creation is correct.
enframe()
appropriately converts the gene vector to a tibble with the expectedext_gene
column name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/scripts/plot-bootstrap.R (1)
22-35
: Add error handling for missing or malformed gene list files.The loop reads TSV files without error handling. If a file path is invalid or the file is malformed, the script will crash with potentially unclear errors. Consider wrapping the file read operations in
tryCatch()
to provide informative error messages.Additionally, the native pipe operator
|>
requires R >= 4.1.0, which may break on older R versions if not properly documented or enforced.Apply this diff to add error handling:
if (snakemake@params[["genes_of_interest"]][["activate"]] == TRUE) { genes = character() for (gene_list in unname(snakemake@params[["genes_of_interest"]][["gene_lists"]])) { - new_genes <- read_tsv( - gene_list, - col_names = FALSE - ) - genes <- union(genes, new_genes |> unlist() |> unname()) + tryCatch({ + new_genes <- read_tsv( + gene_list, + col_names = FALSE, + show_col_types = FALSE + ) + genes <- union(genes, new_genes |> unlist() |> unname()) + }, error = function(e) { + stop(sprintf("Failed to read gene list file '%s': %s", gene_list, e$message)) + }) } genes_of_interest <- enframe( genes, name = NULL, value = "ext_gene" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/scripts/plot-bootstrap.R
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-three-prime-rna-workflow
- GitHub Check: run-rna-workflow
🔇 Additional comments (1)
workflow/scripts/plot-bootstrap.R (1)
37-38
: LGTM!The empty tibble initialization is consistent with the structure created when genes_of_interest is activated, ensuring downstream compatibility.
@coderabbitai |
Tip For best results, initiate chat on the files or code changes. @dlaehnemann, thank you for the clarification! You're absolutely right that ✏️ Learnings added
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the rotation of the genes into the horizontal list. I would not say that this helps readability either. Also previous lists would need to be adapted again. Couldnt you simply rotate the list where it is needed in the code?
Also one thought that came to my mind was that this is a breaking change correct? One could make this non-breaking by simply allowing to add a second gene set via another column.
However all this is probably nitpicky stuff - looks good in general i would say!
Good points, thanks. The explanatory comment in the Also, this is a breaking change, yes. So we should reflect this in the version bump. I also like the idea of having only a single file with multiple columns, but this would also require a header line with column names, to name each list. So I guess this would also be breaking. And I am actually unsure whether users would rather maintain a single file with multiple columns, or multiple files with a single column each. The former would mean that genes arbitrarily line up in rows, even though they don't relate in any way. So I guess I would stick with one file per gene list? |
Yeah i get that! The
Ah yes right the columns would at least need a header for naming the gene set then so yeah the change will be breaking anyway. In that case i also don't really have a strong opinion on single file vs. multiple files. I agree that the single file solution file could be confusing when looking at a row of that table. |
For the single file multiple columns discussion I want to add that I personally like single files (especially since the cluster is so damn slow a lot of times) to not have to open multiple files. But since this is a really specific scenario and there should not be too many files anyways I do not think this is an important reason to switch to single file multiple cols. |
I just finished the review. Again some nitpicking stuff and nothing major to add. Great PR from my side too! |
OK, I went back to line-separated gene lists. But then I couldn't bear the current status of the code and changed it all to proper tidyverse syntax and hopefully made it all a bit more readable. Also fixed the bootstrap plot filtering in the process, which was omitting quite a few requested genes. And switched the heatmap plotting from the very specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
config/config.yaml (2)
144-156
: Clarify gene-list docs: avoid “infix”, add period, and state “no header”Tighten wording and remove ambiguity about headers.
Apply:
- # If you want a heatmap and bootstrap plots for particular sets of genes, + # If you want a heatmap and bootstrap plots for particular sets of genes, # you can set `activate: true` and provide one or more gene list files. In - # those files, list all your HGNC gene symbols of interest, one gene per line. + # those files, list all your HGNC gene symbols of interest, one gene per line (no header). @@ - gene_lists: - # use a descriptive gene list name, as this will be used as an infix in - # the filename of the resulting heatmap plot + gene_lists: + # Use a descriptive gene list name; it will be used in the filename of the resulting heatmap plot. gene_list_1: "config/gene_list.tsv"
42-42
: Consider updating default Ensembl release (duplicate of prior nit)Default is "114". If you want “latest working” by default, consider bumping to current.
What is the latest Ensembl release as of today, and is release 114 still recommended as a default for new analyses?
🧹 Nitpick comments (5)
workflow/scripts/plot_diffexp_heatmap.R (2)
35-38
: Robust gene-list read: handle headerless files explicitlyIf lists are single-column without header (as docs suggest), read as headerless and rename to avoid accidental header rows being treated as genes.
Apply:
- predef_gene_list <- - read_tsv( - snakemake@input[["predef_gene_list"]], - col_names = c("gene") - ) + predef_gene_list <- + read_tsv( + snakemake@input[["predef_gene_list"]], + col_names = FALSE, + show_col_types = FALSE + ) |> + rename(gene = 1)
47-70
: Ensure proper matrix type for ggheatmap and qualify unit()
- Coerce to a numeric matrix so ggheatmap has the expected input.
- Use grid::unit to avoid missing import.
Apply:
-matrix <- selected_genes |> +matrix <- selected_genes |> mutate(transcript = str_pad(transcript, 18, "right")) |> replace_na(list(gene = " ")) |> mutate(t = str_c(gene, " - ", transcript)) |> select(-gene, -transcript) |> - column_to_rownames(var = "t") + column_to_rownames(var = "t") |> + as.matrix() @@ - anno_top(size = unit(3, "cm")) + + anno_top(size = grid::unit(3, "cm")) + @@ - anno_right(size = unit(3, "cm")) + + anno_right(size = grid::unit(3, "cm")) +workflow/scripts/plot-bootstrap.R (3)
14-19
: Prefer slice_min over superseded top_n and avoid negative nCleaner and more explicit.
Apply:
-top_genes <- transcripts |> - filter(qval <= snakemake@params[["fdr"]]) |> - top_n(top_n, qval) |> +top_genes <- transcripts |> + filter(qval <= snakemake@params[["fdr"]]) |> + slice_min(qval, n = abs(top_n)) |> dplyr::select(ext_gene) |> drop_na() |> distinct(ext_gene)
35-35
: Create output dir recursively and silence warningsAvoid warnings if the directory exists and handle nested dirs.
Apply:
-dir.create(snakemake@output[[1]]) +dir.create(snakemake@output[[1]], showWarnings = FALSE, recursive = TRUE)
12-12
: Optional: suppress type messages for read_tsvKeeps logs clean.
Apply:
-transcripts <- read_tsv(snakemake@input[["transcripts"]]) +transcripts <- read_tsv(snakemake@input[["transcripts"]], show_col_types = FALSE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.test/resources/gene_list.tsv
is excluded by!**/*.tsv
.test/resources/gene_list_2.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (5)
.test/config/config.yaml
(2 hunks)config/config.yaml
(3 hunks)workflow/envs/heatmap.yaml
(1 hunks)workflow/scripts/plot-bootstrap.R
(1 hunks)workflow/scripts/plot_diffexp_heatmap.R
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .test/config/config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
config/config.yaml (1)
153-156
: Clarify wording (“infix”), add period, and note “no header”.Make wording clearer and guide users to single-column, headerless files to match the readers.
- # use a descriptive gene list name, as this will be used as an infix in - # the filename of the resulting heatmap plot + # Use a descriptive gene list name; it will be part of the resulting heatmap + # filename. + # Each gene list file must be single-column, no header; one HGNC symbol per line. gene_list_1: "config/gene_list.tsv"
🧹 Nitpick comments (5)
workflow/scripts/plot_diffexp_heatmap.R (2)
47-53
: Ensure a true numeric matrix for ggheatmap.Coerce to matrix after setting row names.
matrix <- selected_genes |> mutate(transcript = str_pad(transcript, 18, "right")) |> replace_na(list(gene = " ")) |> mutate(t = str_c(gene, " - ", transcript)) |> select(-gene, -transcript) |> - column_to_rownames(var = "t") + column_to_rownames(var = "t") |> + as.matrix()
57-65
: Namespace unit() to avoid runtime resolution issues.Explicitly use grid::unit.
- anno_top(size = unit(3, "cm")) + + anno_top(size = grid::unit(3, "cm")) + @@ - anno_right(size = unit(3, "cm")) + + anno_right(size = grid::unit(3, "cm")) +workflow/scripts/plot-bootstrap.R (3)
14-19
: Use slice_min instead of negative top_n.Cleaner and future-proof.
-top_genes <- transcripts |> - filter(qval <= snakemake@params[["fdr"]]) |> - top_n(top_n, qval) |> +top_genes <- transcripts |> + filter(qval <= snakemake@params[["fdr"]]) |> + slice_min(qval, n = as.integer(snakemake@params[["top_n"]]), with_ties = FALSE) |> dplyr::select(ext_gene) |> drop_na() |> distinct(ext_gene)
35-35
: Make dir.create idempotent.Avoid warnings when dir exists and ensure parents are created.
-dir.create(snakemake@output[[1]]) +dir.create(snakemake@output[[1]], recursive = TRUE, showWarnings = FALSE)
22-26
: Multi-file read_tsv: confirm behavior.
read_tsv()
(via vroom) can read a vector of files and row-bind them. Confirm that your readr version guarantees this across environments; otherwise, loop and bind_rows.# Alternative (if needed): genes_of_interest <- purrr::map_df( unname(snakemake@params[["genes_of_interest"]][["gene_lists"]]), ~ read_tsv(.x, col_names = "ext_gene") ) |> distinct()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.test/resources/gene_list.tsv
is excluded by!**/*.tsv
.test/resources/gene_list_2.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (5)
.test/config/config.yaml
(2 hunks)config/config.yaml
(3 hunks)workflow/envs/heatmap.yaml
(1 hunks)workflow/scripts/plot-bootstrap.R
(1 hunks)workflow/scripts/plot_diffexp_heatmap.R
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .test/config/config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-rna-workflow
- GitHub Check: run-three-prime-rna-workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (8)
workflow/scripts/plot_diffexp_heatmap.R (4)
10-12
: Optional: quiet readr parsing outputConsider suppressing column-type messages to keep logs clean.
-log_counts <- read_tsv( - snakemake@input[["logcountmatrix_file"]] -) +log_counts <- readr::read_tsv( + snakemake@input[["logcountmatrix_file"]], + show_col_types = FALSE +)
47-53
: Guard empty selections to avoid ggheatmap crashIf no rows are selected (empty gene list or no matches), ggheatmap likely errors. Short‑circuit with a friendly message and create an empty placeholder plot or skip saving.
-matrix <- selected_genes |> +if (nrow(selected_genes) == 0) { + message("No genes selected for heatmap; skipping plot.") + quit(save = "no", status = 0) +} + +matrix <- selected_genes |>
54-70
: Namespace grid::unit for clarityBe explicit to avoid masked imports.
- anno_top(size = unit(3, "cm")) + + anno_top(size = grid::unit(3, "cm")) + ... - anno_right(size = unit(3, "cm")) + + anno_right(size = grid::unit(3, "cm")) +
34-38
: Gene list read: document “no header” assumptionYou assign col_names = "gene", implying a headerless single-column file. Consider a quick check to detect and warn on accidental headers.
predef_gene_list <- - read_tsv( + readr::read_tsv( snakemake@input[["predef_gene_list"]], - col_names = c("gene") + col_names = "gene", + show_col_types = FALSE )config/config.yaml (2)
45-51
: Avoid hardcoding “release-114” in examplesUse a placeholder or the current release to prevent confusion.
-# https://ftp.ensembl.org/pub/release-114/species_EnsemblVertebrates.txt +# https://ftp.ensembl.org/pub/release-<release>/species_EnsemblVertebrates.txt ... -# https://ftp.ensembl.org/pub/release-114/fasta/homo_sapiens/cdna/ +# https://ftp.ensembl.org/pub/release-<release>/fasta/homo_sapiens/cdna/
144-151
: Clarify gene list file format: no headerExplicitly state that files must be single-column, headerless TSV/CSV with one HGNC symbol per line.
-# you can set `activate: true` and provide one or more gene list files. In -# those files, list all your HGNC gene symbols of interest, one gene per line. +# you can set `activate: true` and provide one or more gene list files. +# Each file must be a single-column, headerless TSV/CSV listing HGNC gene +# symbols of interest, one gene per line.workflow/scripts/plot-bootstrap.R (2)
35-35
: Optional: make dir.create idempotentAvoid warnings if the directory already exists.
-dir.create(snakemake@output[[1]]) +dir.create(snakemake@output[[1]], recursive = TRUE, showWarnings = FALSE)
14-19
: Prefer slice_min over deprecated top_ntop_n() is superseded; slice_min expresses intent directly.
-top_genes <- transcripts |> - filter(qval <= snakemake@params[["fdr"]]) |> - top_n(top_n, qval) |> +top_genes <- transcripts |> + filter(qval <= snakemake@params[["fdr"]]) |> + dplyr::slice_min(order_by = qval, n = as.integer(snakemake@params[["top_n"]])) |> dplyr::select(ext_gene) |> drop_na() |> distinct(ext_gene)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.test/resources/gene_list.tsv
is excluded by!**/*.tsv
.test/resources/gene_list_2.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (5)
.test/config/config.yaml
(3 hunks)config/config.yaml
(4 hunks)workflow/envs/heatmap.yaml
(1 hunks)workflow/scripts/plot-bootstrap.R
(1 hunks)workflow/scripts/plot_diffexp_heatmap.R
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-three-prime-rna-workflow
- GitHub Check: run-rna-workflow
🔇 Additional comments (2)
.test/config/config.yaml (1)
85-87
: FDR=0.5 in tests: confirm intentA 0.5 FDR will yield many positives (more bootstrap plots). If this is to stress plotting in CI, fine; otherwise consider a tighter threshold.
workflow/envs/heatmap.yaml (1)
6-7
: r-ggalign includes ggheatmap(), anno_top(), anno_right(), and align_dendro(); no new dependencies required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round, but again only two small things. The R improvements were particularly interesting to me since I originally wrote the pheatmap stuff. It was my first time looking into R after a long time. So rule of thumb: Whenever possible use tidyverse and ggplot for doing stuff in R, I will remember that :)
type: string | ||
required: | ||
- genelist | ||
- gene_lists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is gene_lists required? When the user sets activate to FALSE, then we do not need gene_lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is debatable. It was in there previously and I kept it, because even if we set it to FALSE, we usually keep the placeholder to have the expected structure in place. Also, the bootstrap rule doesn't actually specify the respective files under input, so we would not get a missing input exception, but rather some lookup error where I wouldn't be sure if it is clearly interpretable. On the other hand, if someone really doesn't care about these lists, but wants other updates to this workflow, they will be forced to update their config. So we could consider dropping this.
Maybe @fxwiegand has another opinion?
Co-authored-by: Adrian Prinz <44083468+Addimator@users.noreply.github.com>
I think this was actually re-implementing the sleuth-internal heatmap, right? Because that already wasn't configurable in the way we wanted it to be? But yeah, I also wanted to try this package, which seems to be very systematic and allows for all kinds of complex multi-panel plots with nice alignment among them: |
This will generate one heatmap per list, and bootstrap plots for each of the genes in the union of the lists.
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores