Fixed Conda dependency issues in environment.yml #211

KurayiChawatama · 2025-08-02T20:47:44Z

PR checklist

Description of Changes

This PR resolves several environment-related issues that were affecting -profile conda functionality and module portability within the nf-core/scdownstream pipeline. The fixes apply to both nf-core and local modules and focus on improving compatibility, completeness, and reproducibility.

Summary of Fixes

nf-core Modules (PR to be opened there soon)

Fixed scvitools/solo conda environment creation failure by downgrading from Python 3.12.7 to Python 3.10.
Fixed doubletdetection environment creation crash by adding missing libraries: anndata, psutil, phenograph, pillow, matplotlib, pyparsing.

local Modules

Enforced highly_variable column data type as boolean to fix crashes during adata/extend.
Added missing dependencies rpy2 and SingleCellExperiment to the readrds module's environment.
Used importlib.metadata to dynamically retrieve the rpy2 version, preventing crashes during rds reading.
Fixed shebang lines in upsetplot.py and doublet_removal.py to use /usr/bin/env python3 for portability.
Added missing pandas and matplotlib to the doublet removal environment.
Explicitly included scrublet in the scrublet module's environment to resolve missing imports.
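
The shebang fix relies on `env` looking up `python3` on `PATH`, so the interpreter of the activated Conda environment is used instead of a fixed system path. A minimal sketch of that lookup using only the standard library; `resolve_like_env` is a hypothetical helper for illustration and is not part of the pipeline:

```python
import shutil
import sys
from typing import Optional


def resolve_like_env(name: str = "python3", path: Optional[str] = None) -> Optional[str]:
    """Return the executable that `/usr/bin/env <name>` would be expected to pick.

    Like `env`, shutil.which walks the PATH entries in order and returns the
    first executable match, or None if nothing matches. `path` overrides the
    PATH environment variable, which is convenient for testing.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    print("interpreter running this script:", sys.executable)
    print("interpreter found first on PATH:", resolve_like_env())
```

If the two printed paths differ inside an activated environment, a hard-coded shebang would likely run the template with the wrong interpreter.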

Other

Ran nf-core lint --fix files_unchanged to sync assets/nf-core-scdownstream_logo_light.png with the nf-core template.

Validation

Tested on two different machines using the command:

nextflow run ./scdownstream \
  -profile conda,test \
  --ambient_removal none \
  --celltypist_model Adult_Human_Skin \
  --integration_methods harmony \
  --integration_hvgs 500 \
  --doublet_detection scrublet,solo \
  --skip_liana true

Note: The celldex_reference parameter was commented out in test.config to skip singleR annotation, which is not currently supported with the Conda profile.

All affected environments now resolve and run successfully with no missing dependencies or execution errors.

Copilot

Pull Request Overview

This PR fixes Conda dependency issues across multiple modules in the nf-core/scdownstream pipeline to ensure proper environment resolution and execution with the -profile conda configuration.

Fixed missing dependencies in several conda environment files
Updated Python shebang lines for better portability across systems
Added data type enforcement for the 'highly_variable' column to prevent runtime crashes

Reviewed Changes

Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.

File	Description
modules/nf-core/doubletdetection/environment.yml	Reordered dependencies (pyyaml moved after pip)
modules/local/scanpy/scrublet/environment.yml	Added missing scrublet dependency
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py	Updated shebang to use portable env python
modules/local/doublet_detection/doublet_removal/environment.yml	Added missing pandas and matplotlib dependencies
modules/local/adata/upsetgenes/templates/upsetplot.py	Updated shebang to use portable env python
modules/local/adata/readrds/templates/readrds.py	Used importlib.metadata for rpy2 version and fixed process name
modules/local/adata/readrds/environment.yml	Added missing SingleCellExperiment and rpy2 dependencies
modules/local/adata/extend/templates/extend.py	Added boolean type enforcement for highly_variable column

modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py

modules/local/adata/upsetgenes/templates/upsetplot.py

modules/local/doublet_detection/doublet_removal/environment.yml

modules/local/adata/readrds/environment.yml

nictru

Thanks for the PR, but some changes are needed

assets/nf-core-scdownstream_logo_light.png

conf/base.config

nictru · 2025-08-03T12:01:32Z

modules/local/adata/extend/templates/extend.py

I know what issue you are trying to fix with this, but I need to think about if this is the best way of doing it

modules/local/adata/readrds/environment.yml

modules/local/adata/readrds/templates/readrds.py

modules/local/adata/upsetgenes/templates/upsetplot.py

modules/local/doublet_detection/doublet_removal/environment.yml

modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py

modules/local/liana/rankaggregate/environment.yml

modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py

nictru · 2025-08-03T12:24:53Z

I just realized I did not attach my comments to the code sections properly, but I think you can make the connections (it is clearer in the "Files changed" tab).

KurayiChawatama · 2025-08-04T08:16:53Z

@nictru I should be able to get to working on these in the next few days and provide justifications where you need them :) Thanks!

KurayiChawatama · 2025-08-19T20:21:53Z

@nictru some changes will need to be made to the nf-core modules. I will open PRs for those on that repo soon.

KurayiChawatama · 2025-08-20T13:02:22Z

modules/local/doublet_detection/doublet_removal/main.nf

Fixes the error that happened because the doublet_removal module tried to load a pickle (sample_scrublet.pkl) that was created with NumPy ≥2.0, which uses the internal module numpy._core. The environment for doublet_removal was pinned to NumPy 1.23.5, where numpy._core does not exist, so unpickling failed with ModuleNotFoundError. Updating the environment to a NumPy-2–compatible stack resolved the mismatch.
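
A pickle stores classes and functions by import path, so loading one imports every module it references; that is why a file written under one NumPy major version can fail to load under another. The sketch below is one possible way to surface such a mismatch with a readable message. It uses only the standard library, and `explain_unpickle_failure` is a hypothetical helper, not code from this PR:

```python
import pickle
from typing import Optional


def explain_unpickle_failure(payload: bytes) -> Optional[str]:
    """Try to unpickle `payload`; return None on success, else a hint.

    Only ModuleNotFoundError is handled here, since that is the failure
    described above. Other unpickling errors propagate unchanged.
    """
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        # exc.name holds the module that could not be imported.
        return (
            f"pickle references module {exc.name!r}, which is not importable "
            "here; the writing and reading environments likely differ"
        )
    return None
```

As with any use of `pickle.loads`, this should only be run on files from a trusted source.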

nictru · 2025-08-20T16:06:04Z

Hey, just to let you know, I am on vacation right now and will only be able to look at this again sometime next week. But thanks for the effort!

nictru

Quite some improvements, but there are still some things that bother me

nictru · 2025-08-31T17:27:46Z

modules/local/adata/extend/templates/extend.py

+# Ensure 'highly_variable' is boolean if present
+if "highly_variable" in adata.var:
+    adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool)
+


Suggested change

# Ensure 'highly_variable' is boolean if present
if "highly_variable" in adata.var:
    adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool)

I explained the issue that you are trying to fix in #215 and implemented a different fix in the meantime. That fix is not optimal, but I prefer it over this one, so please revert this for now.

I am not happy with the situation in this aspect, but I have not yet come up with a really clean way of handling this
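
For context on why a plain cast can be fragile: Python truthiness treats any non-empty string, including "False", as true, so a flag column that arrives as strings could be converted incorrectly. Whether that is the situation described in #215 is not established in this thread. The following is only a sketch of a stricter conversion; `to_bool` is a hypothetical helper and handles plain Python values only, not NumPy scalar types:

```python
def to_bool(value: object) -> bool:
    """Convert common encodings of a boolean flag to bool, strictly."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
    # Anything else (NaN, None, unexpected text) is rejected, not guessed.
    raise ValueError(f"cannot interpret {value!r} as a boolean")


# Plain truthiness gets the string case wrong; the strict helper does not.
assert bool("False") is True
assert to_bool("False") is False
```

Raising on unknown values trades convenience for visibility: a malformed column stops the run instead of silently marking every gene as highly variable.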

nictru · 2025-08-31T17:28:42Z

modules/local/adata/readrds/templates/readrds.py

-    "${task.process}": {
+    "NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:LOAD_H5AD:ADATA_READRDS": {


Revert this please, unless you have a very good explanation for why this is needed

nictru · 2025-08-31T18:08:53Z

modules/local/adata/readrds/templates/readrds.py

        "anndata": ad.__version__,
        "anndata2ri": anndata2ri.__version__,
-        "rpy2": rpy2.__version__,
+        "rpy2": importlib.metadata.version("rpy2"),


I did the following to validate your concern:

docker run -it community.wave.seqera.io/library/anndata2ri_bioconductor-singlecellexperiment_anndata_r-seurat_rpy2:5e785d9c16504ed6

python3

import rpy2

rpy2.__version__

Which correctly printed a version string. So if you get an error and I do not, I think the following are possible explanations:

You have a different version of python in the conda env (in docker it's 3.12.11)

You have a different version of rpy2 in the conda env (in docker it's 3.5.11) - but since the rpy2 version is encoded in the environment.yml, this should not be possible

The problem is related to the host computer, outside of conda

The reason I arrive at these explanations is that the Docker image is basically a minimal Ubuntu with micromamba and all the specified packages installed in the base environment. So if it works in this setting but not in yours, there are not many places one needs to check.

Anyways, I don't want to accept this change until we have a better understanding of the problem. The __version__ access is used in many places throughout the pipeline (and all of nf-core) and if there is a fundamental problem with it, we need to make a lot more changes than this single one
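
To help narrow down the cause, the two lookups discussed here can be compared side by side in the failing environment. This is a diagnostic sketch, not a proposed change to the pipeline; `version_sources` is a hypothetical helper built on the standard library's `importlib.metadata`:

```python
import importlib.metadata
from types import ModuleType
from typing import Dict, Optional


def version_sources(module: ModuleType, dist_name: str) -> Dict[str, Optional[str]]:
    """Report the version as seen by each lookup; None means unavailable."""
    # Lookup 1: the conventional module attribute.
    attribute = getattr(module, "__version__", None)
    if not isinstance(attribute, str):
        attribute = None
    # Lookup 2: the metadata of the installed distribution.
    try:
        metadata: Optional[str] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        metadata = None
    return {"attribute": attribute, "metadata": metadata}
```

In the affected Conda environment this could be called as `version_sources(rpy2, "rpy2")` after `import rpy2`; a result where only one of the two entries is set would indicate which lookup is failing there.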

nictru · 2025-08-31T18:10:53Z

modules/local/doublet_detection/doublet_removal/main.nf

Okay, updating to newer versions is generally fine, just make sure to use the full accessions (e.g. conda-forge::anndata=0.11.1 here as well, not just anndata=0.11.1)
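
A check along these lines could flag specs that lack the channel prefix. It is a rough sketch that assumes the `dependencies` list of an environment.yml has already been parsed into Python objects; `specs_missing_channel` is a hypothetical helper, and nested entries such as a `pip:` section are skipped:

```python
from typing import Any, List


def specs_missing_channel(dependencies: List[Any]) -> List[str]:
    """Return string specs that have no `channel::` prefix."""
    return [
        spec
        for spec in dependencies
        if isinstance(spec, str) and "::" not in spec
    ]


deps = ["conda-forge::anndata=0.11.1", "anndata=0.11.1"]
print(specs_missing_channel(deps))  # ['anndata=0.11.1']
```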

KurayiChawatama requested review from Copilot and nictru August 2, 2025 20:47

Copilot AI reviewed Aug 2, 2025

KurayiChawatama added the bug and good first issue labels Aug 2, 2025

KurayiChawatama self-assigned this Aug 2, 2025

KurayiChawatama added this to scdownstream Aug 2, 2025

KurayiChawatama removed this from scdownstream Aug 2, 2025

nictru requested changes Aug 3, 2025

KurayiChawatama added 19 commits August 19, 2025 20:55

changed base.config process_medium settings for local runs

c1875d2

fixed conda profile highly_variable column not boolean error

4a0e1ca

added missing singlecellexperiment and r2py libs to environment.yml

0fa1ee6

fixed rpy2 version fetch crash

eda483e

fixed upset plot script python fetch fail crash

8f1f0ca

added missing libs to doublet removal env yml

e74e1ad

fixed doublet removal script python fetch fail crash

2a02909

added evironment yml to scds to allow conda based running

b22c1a0

updated liana env yml to include missing conda packadges

f05a981

added missing scrublet lib to env yml

e596e27

added missing libs to doublet detection env yml

95f8c8e

added missing libs to solo env yml

93a62e9

untested attempted addition of SingleR support to the pipeline

0e711ff

Revert "untested attempted addition of SingleR support to the pipeline"

4edf537

This reverts commit d67d647.

reset base.config to it's original state

9ab7a8b

reset the scds module to it's original condaless state

8e9ce72

added correct conda dependencies for liana rankaggregate to run

8363758

reset liana rankagreggate env to it's original state

784d8e6

reverted nf-core solo env.yml to original state

5b01930

KurayiChawatama added 2 commits August 19, 2025 20:58

restored nf-core doubletdetection env.yml to original version

8dfca4f

Fix: Sync logo to match nf-core template (lint fix)

fc9c88c

KurayiChawatama force-pushed the fix/conda-improvements branch from 5748730 to fc9c88c August 19, 2025 19:01

KurayiChawatama added 3 commits August 19, 2025 21:21

resolved the issues raised in the PR

270f77b

added the container images to match env yaml changes, removed changes…

bee50ca

… to nf-core modules

matched liana py to upstream, removed scrublet environment yaml

089c0b0

KurayiChawatama requested a review from nictru August 19, 2025 20:23

KurayiChawatama added 2 commits August 20, 2025 14:49

fix(doublet_removal): update environment to NumPy 2–compatible stack

72dfd31

updated doublet removal contianers to match changes to env yml

119efb0

KurayiChawatama commented Aug 20, 2025

nictru requested changes Aug 31, 2025
