Skip to content

Conversation

KurayiChawatama
Copy link

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scdownstream branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Description of Changes

This PR resolves several environment-related issues that were affecting -profile conda functionality and module portability within the nf-core/scdownstream pipeline. The fixes apply to both nf-core and local modules and focus on improving compatibility, completeness, and reproducibility.

Summary of Fixes

nf-core Modules (PR to be opened there soon)

  • Fixed scvitools/solo conda environment creation failure by downgrading from Python 3.12.7 to Python 3.10.
  • Fixed doubletdetection environment creation crash by adding missing libraries: anndata, psutil, phenograph, pillow, matplotlib, pyparsing.

local Modules

  • Enforced highly_variable column data type as boolean to fix crashes during adata/extend.
  • Added missing dependencies r2py and SingleCellExperiment to the readrds module's environment.
  • Used importlib.metadata to dynamically retrieve the rpy2 version, preventing crashes during rds reading.
  • Fixed shebang lines in upsetplot.py and doublet_removal.py to use /usr/bin/env python3 for portability.
  • Added missing pandas and matplotlib to the doublet removal environment.
  • Explicitly included scrublet in the scrublet module's environment to resolve missing imports.

Other

  • Ran nf-core lint --fix files_unchanged to sync assets/nf-core-scdownstream_logo_light.png with the nf-core template.

Validation

Tested on two different machines using the command:

nextflow run ./scdownstream \
  -profile conda,test \
  --ambient_removal none \
  --celltypist_model Adult_Human_Skin \
  --integration_methods harmony \
  --integration_hvgs 500 \
  --doublet_detection scrublet,solo \
  --skip_liana true

with the celldex_reference parameter commented out in the test.config to skip singleR annotation, which is not supported with the conda run

Note: The celldex_reference parameter was commented out in test.config to skip singleR annotation, which is not currently supported in Conda.

All affected environments now resolve and run successfully with no missing dependencies or execution errors.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes Conda dependency issues across multiple modules in the nf-core/scdownstream pipeline to ensure proper environment resolution and execution with the -profile conda configuration.

  • Fixed missing dependencies in several conda environment files
  • Updated Python shebang lines for better portability across systems
  • Added data type enforcement for the 'highly_variable' column to prevent runtime crashes

Reviewed Changes

Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
modules/nf-core/doubletdetection/environment.yml Reordered dependencies (pyyaml moved after pip)
modules/local/scanpy/scrublet/environment.yml Added missing scrublet dependency
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py Updated shebang to use portable env python
modules/local/doublet_detection/doublet_removal/environment.yml Added missing pandas and matplotlib dependencies
modules/local/adata/upsetgenes/templates/upsetplot.py Updated shebang to use portable env python
modules/local/adata/readrds/templates/readrds.py Used importlib.metadata for rpy2 version and fixed process name
modules/local/adata/readrds/environment.yml Added missing SingleCellExperiment and rpy2 dependencies
modules/local/adata/extend/templates/extend.py Added boolean type enforcement for highly_variable column

@KurayiChawatama KurayiChawatama added bug Something isn't working good first issue Good for newcomers labels Aug 2, 2025
@KurayiChawatama KurayiChawatama self-assigned this Aug 2, 2025
Copy link
Collaborator

@nictru nictru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, but some changes are needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what issue you are trying to fix with this, but I need to think about if this is the best way of doing it

@nictru
Copy link
Collaborator

nictru commented Aug 3, 2025

I just realized I did not attach my comments to the code sections properly, but I think you can make the connections (in the "Files changed" tab it is more clear)

@KurayiChawatama
Copy link
Author

@nictru I should be able to get to working on these in the next few days and provide justifications where you need them :) Thanks!

@KurayiChawatama
Copy link
Author

@nictru some changes willl need to be made to the nf-core modules. I will open PRs for those on that repo soon

@KurayiChawatama KurayiChawatama requested a review from nictru August 19, 2025 20:23
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes the error that happened because the doublet_removal module tried to load a pickle (sample_scrublet.pkl) that was created with NumPy ≥2.0, which uses the internal module numpy._core. The environment for doublet_removal was pinned to NumPy 1.23.5, where numpy._core does not exist, so unpickling failed with ModuleNotFoundError. Updating the environment to a NumPy-2–compatible stack resolved the mismatch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, updating to newer versions is generally fine, just make sure to use the full accessions (e.g. conda-forge::anndata=0.11.1 here as well, not just anndata=0.11.1)

@nictru
Copy link
Collaborator

nictru commented Aug 20, 2025

Hey, just to let you know, I am on vacation right now and will only be able to look at this another time some time next week. But thanks for the effort!

Copy link
Collaborator

@nictru nictru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite some improvements, but there are still some things that bother me

Comment on lines +41 to +44
# Ensure 'highly_variable' is boolean if present
if "highly_variable" in adata.var:
adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Ensure 'highly_variable' is boolean if present
if "highly_variable" in adata.var:
adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool)

I explained the issue that you are trying to fix in #215 and implemented a different fix in the meantime. That fix is not optimal, but for me preferable over this. So please revert this for now

I am not happy with the situation in this aspect, but I have not yet come up with a really clean way of handling this

Comment on lines -56 to +58
"${task.process}": {
"NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:LOAD_H5AD:ADATA_READRDS": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this please, unless you have a very good explanation for why this is needed

"anndata": ad.__version__,
"anndata2ri": anndata2ri.__version__,
"rpy2": rpy2.__version__,
"rpy2": importlib.metadata.version("rpy2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the following to validate your concern:

  1. docker run -it community.wave.seqera.io/library/anndata2ri_bioconductor-singlecellexperiment_anndata_r-seurat_rpy2:5e785d9c16504ed6
  2. python3
  3. import rpy2
  4. rpy2.__version__

Which correctly printed a version string. So if you get an error and I do not, I think the following are possible explanations:

  • You have a different version of python in the conda env (in docker it's 3.12.11)
  • You have a different version of rpy2 in the conda env (in docker it's 3.5.11) - but since the rpy2 version is encoded in the environment.yml, this should not be possible
  • The problem is related to the host computer, outside of conda

The reason I come to these explanations is that the docker is basically a minimal ubuntu, with micromamba installed and all the specified packages installed in the base environment. So if it works in this setting, but not in yours, there is not many places one needs to check.

Anyways, I don't want to accept this change until we have a better understanding of the problem. The __version__ access is used in many places throughout the pipeline (and all of nf-core) and if there is a fundamental problem with it, we need to make a lot more changes than this single one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, updating to newer versions is generally fine, just make sure to use the full accessions (e.g. conda-forge::anndata=0.11.1 here as well, not just anndata=0.11.1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working good first issue Good for newcomers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants