-
Notifications
You must be signed in to change notification settings - Fork 41
Fixed Conda dependency issues in environment.yml #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Fixed Conda dependency issues in environment.yml #211
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes Conda dependency issues across multiple modules in the nf-core/scdownstream pipeline to ensure proper environment resolution and execution with the -profile conda
configuration.
- Fixed missing dependencies in several conda environment files
- Updated Python shebang lines for better portability across systems
- Added data type enforcement for the 'highly_variable' column to prevent runtime crashes
Reviewed Changes
Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
modules/nf-core/doubletdetection/environment.yml | Reordered dependencies (pyyaml moved after pip) |
modules/local/scanpy/scrublet/environment.yml | Added missing scrublet dependency |
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py | Updated shebang to use portable env python |
modules/local/doublet_detection/doublet_removal/environment.yml | Added missing pandas and matplotlib dependencies |
modules/local/adata/upsetgenes/templates/upsetplot.py | Updated shebang to use portable env python |
modules/local/adata/readrds/templates/readrds.py | Used importlib.metadata for rpy2 version and fixed process name |
modules/local/adata/readrds/environment.yml | Added missing SingleCellExperiment and rpy2 dependencies |
modules/local/adata/extend/templates/extend.py | Added boolean type enforcement for highly_variable column |
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py
Outdated
Show resolved
Hide resolved
modules/local/doublet_detection/doublet_removal/environment.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, but some changes are needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what issue you are trying to fix with this, but I need to think about if this is the best way of doing it
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py
Show resolved
Hide resolved
modules/local/doublet_detection/doublet_removal/templates/doublet_removal.py
Outdated
Show resolved
Hide resolved
I just realized I did not attach my comments to the code sections properly, but I think you can make the connections (in the "Files changed" tab it is more clear) |
@nictru I should be able to get to working on these in the next few days and provide justifications where you need them :) Thanks! |
This reverts commit d67d647.
5748730
to
fc9c88c
Compare
@nictru some changes willl need to be made to the nf-core modules. I will open PRs for those on that repo soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes the error that happened because the doublet_removal module tried to load a pickle (sample_scrublet.pkl) that was created with NumPy ≥2.0, which uses the internal module numpy._core. The environment for doublet_removal was pinned to NumPy 1.23.5, where numpy._core does not exist, so unpickling failed with ModuleNotFoundError. Updating the environment to a NumPy-2–compatible stack resolved the mismatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, updating to newer versions is generally fine, just make sure to use the full accessions (e.g. conda-forge::anndata=0.11.1
here as well, not just anndata=0.11.1
)
Hey, just to let you know, I am on vacation right now and will only be able to look at this another time some time next week. But thanks for the effort! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite some improvements, but there are still some things that bother me
# Ensure 'highly_variable' is boolean if present | ||
if "highly_variable" in adata.var: | ||
adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Ensure 'highly_variable' is boolean if present | |
if "highly_variable" in adata.var: | |
adata.var["highly_variable"] = adata.var["highly_variable"].astype(bool) |
I explained the issue that you are trying to fix in #215 and implemented a different fix in the meantime. That fix is not optimal, but for me preferable over this. So please revert this for now
I am not happy with the situation in this aspect, but I have not yet come up with a really clean way of handling this
"${task.process}": { | ||
"NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:LOAD_H5AD:ADATA_READRDS": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this please, unless you have a very good explanation for why this is needed
"anndata": ad.__version__, | ||
"anndata2ri": anndata2ri.__version__, | ||
"rpy2": rpy2.__version__, | ||
"rpy2": importlib.metadata.version("rpy2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the following to validate your concern:
docker run -it community.wave.seqera.io/library/anndata2ri_bioconductor-singlecellexperiment_anndata_r-seurat_rpy2:5e785d9c16504ed6
python3
import rpy2
rpy2.__version__
Which correctly printed a version string. So if you get an error and I do not, I think the following are possible explanations:
- You have a different version of python in the conda env (in docker it's 3.12.11)
- You have a different version of rpy2 in the conda env (in docker it's 3.5.11) - but since the rpy2 version is encoded in the
environment.yml
, this should not be possible - The problem is related to the host computer, outside of conda
The reason I come to these explanations is that the docker is basically a minimal ubuntu, with micromamba installed and all the specified packages installed in the base
environment. So if it works in this setting, but not in yours, there is not many places one needs to check.
Anyways, I don't want to accept this change until we have a better understanding of the problem. The __version__
access is used in many places throughout the pipeline (and all of nf-core) and if there is a fundamental problem with it, we need to make a lot more changes than this single one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, updating to newer versions is generally fine, just make sure to use the full accessions (e.g. conda-forge::anndata=0.11.1
here as well, not just anndata=0.11.1
)
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).Description of Changes
This PR resolves several environment-related issues that were affecting
-profile conda
functionality and module portability within thenf-core/scdownstream
pipeline. The fixes apply to bothnf-core
andlocal
modules and focus on improving compatibility, completeness, and reproducibility.Summary of Fixes
nf-core Modules (PR to be opened there soon)
scvitools/solo
conda environment creation failure by downgrading from Python 3.12.7 to Python 3.10.doubletdetection
environment creation crash by adding missing libraries:anndata
,psutil
,phenograph
,pillow
,matplotlib
,pyparsing
.local Modules
highly_variable
column data type as boolean to fix crashes duringadata/extend
.r2py
andSingleCellExperiment
to thereadrds
module's environment.importlib.metadata
to dynamically retrieve therpy2
version, preventing crashes duringrds
reading.upsetplot.py
anddoublet_removal.py
to use/usr/bin/env python3
for portability.pandas
andmatplotlib
to the doublet removal environment.scrublet
in thescrublet
module's environment to resolve missing imports.Other
nf-core lint --fix files_unchanged
to syncassets/nf-core-scdownstream_logo_light.png
with the nf-core template.Validation
Tested on two different machines using the command:
nextflow run ./scdownstream \ -profile conda,test \ --ambient_removal none \ --celltypist_model Adult_Human_Skin \ --integration_methods harmony \ --integration_hvgs 500 \ --doublet_detection scrublet,solo \ --skip_liana true
with the celldex_reference parameter commented out in the test.config to skip singleR annotation, which is not supported with the conda run
All affected environments now resolve and run successfully with no missing dependencies or execution errors.