Description
What container were you trying to use, and how were you attempting to use it?
The docker image for the most current version of SeqSero2 (v1.3.1) contains SalmID 0.11.
Image name: staphb/seqsero2:1.3.1 and the :latest tag.
I have found multiple samples where this docker image misclassifies the subspecies, resulting in discrepancies between its results and the results from the PulseNet 2.0 platform (which I think uses SeqSero2S and some kind of lookup table).
For example, with one sample I expected subspecies "diarizonae IIIb", but this docker image predicts "arizonae IIIa"
And vice-versa: with another sample I expected subspecies "arizonae IIIa", but this docker image predicts "diarizonae IIIb"
Another example: I expected subspecies "houtenae IV", but this docker image did not predict a subspecies at all, just -
However, when running the same data through the biocontainers docker image quay.io/biocontainers/seqsero2:1.3.1--pyhdfd78af_1, all predictions are correct and as expected.
I've concluded that the biocontainers docker image predicts correctly because its SalmID is v0.1.23 (the most recent version) instead of the older 0.11 version.
So, I'm working on revamping the SeqSero2 v1.3.1 dockerfile to install via the bioconda recipe instead of installing dependencies manually. SeqSero2 is old and unmaintained python code, and it has so many dependencies that installing everything independently gets messy. I ran into many issues installing the various python modules and external deps like SalmID, which requires poetry (yet another python package manager????) for manual installation.
I'm also planning on adding tests for the various subspecies to ensure it can predict them accurately.
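As a rough idea of what such a check could look like, here is a minimal sketch that compares expected subspecies calls against what a container reported. The function name find_mismatches, the sample IDs, and the dict-based input are all hypothetical placeholders; this is not the actual test code for the PR, and it does not parse real SeqSero2 output. The "-" value mirrors the empty subspecies call described above.

```python
from typing import Dict, List, Tuple

# Value reported when no subspecies is predicted (as seen in the issue above).
NO_CALL = "-"


def find_mismatches(
    expected: Dict[str, str], predicted: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (sample, expected, predicted) for every sample whose call differs.

    A sample missing from `predicted` is treated as a no-call.
    """
    mismatches = []
    for sample, want in sorted(expected.items()):
        got = predicted.get(sample, NO_CALL)
        if got != want:
            mismatches.append((sample, want, got))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample IDs; the subspecies values mirror the cases in this issue.
    expected = {
        "sample_A": "diarizonae IIIb",
        "sample_B": "arizonae IIIa",
        "sample_C": "houtenae IV",
    }
    predicted = {
        "sample_A": "arizonae IIIa",
        "sample_B": "diarizonae IIIb",
        "sample_C": NO_CALL,
    }
    for sample, want, got in find_mismatches(expected, predicted):
        print(f"{sample}: expected {want!r}, got {got!r}")
```

In a real test the predicted dict would be filled from each container's SeqSero2 output for a known set of genomes, and the test would fail if the returned list is non-empty.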
I will open a draft PR soon.