Repository to translate spectra to queries.
Here is what you minimally need:
- A file containing MS/MS spectra with associated skeleton information (or any other relevant chemical classification) provided as metadata. This structural information, stored in the metadata field “skeleton”, allows queries specific to a given skeleton to be generated by extracting recurring skeleton-specific fragmentation patterns. The MIADB (Monoterpene Indole Alkaloids Database) file is provided as an example.
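As a rough illustration of this requirement, the base R sketch below writes a tiny MGF file and checks which spectra carry a skeleton annotation. The `SKELETON` field name, the titles, the skeleton label and all m/z values are illustrative assumptions, not taken from the MIADB file, and `get_field()` is a hypothetical helper, not part of the package.

```r
# Write a tiny two-spectrum MGF file to a temporary location.
# Field names and values are made up for illustration only.
mgf_lines <- c(
  "BEGIN IONS",
  "TITLE=example_1",
  "PEPMASS=353.1860",
  "SKELETON=skeleton_A",
  "144.0808 100",
  "158.0964 35",
  "END IONS",
  "",
  "BEGIN IONS",
  "TITLE=example_2",
  "PEPMASS=337.1911",
  "144.0808 80",
  "END IONS"
)
mgf_path <- tempfile(fileext = ".mgf")
writeLines(mgf_lines, mgf_path)

# Split the file into one block of lines per spectrum.
lines <- readLines(mgf_path)
blocks <- split(lines, cumsum(lines == "BEGIN IONS"))

# Hypothetical helper: return the value of a KEY=value field, or NA if absent.
get_field <- function(block, field) {
  pattern <- paste0("^", field, "=")
  hit <- grep(pattern, block, value = TRUE)
  if (length(hit) == 0L) NA_character_ else sub(pattern, "", hit[1])
}

titles <- vapply(blocks, get_field, character(1), field = "TITLE")
skeletons <- vapply(blocks, get_field, character(1), field = "SKELETON")

# Spectra with a missing skeleton cannot contribute to skeleton-specific queries.
annotation <- data.frame(title = titles, skeleton = skeletons, row.names = NULL)
print(annotation)
```

Here the second spectrum has no skeleton annotation, so it would need to be annotated or left out before generating queries.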
As the package is not (yet) available on CRAN, install it from R-universe with:
install.packages(
  "SpectraToQueries",
  repos = c(
    "https://spectra-to-knowledge.r-universe.dev",
    "https://bioc.r-universe.dev",
    "https://cloud.r-project.org"
  )
)
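A quick way to confirm that the installation worked, using only base R functions, could look like this minimal sketch:

```r
# Check that the package is installed before calling it.
installed <- requireNamespace("SpectraToQueries", quietly = TRUE)
if (installed) {
  version <- as.character(utils::packageVersion("SpectraToQueries"))
  message("SpectraToQueries ", version, " is installed.")
} else {
  message("SpectraToQueries is not installed; run install.packages() as shown above.")
}
```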
To reproduce the default example, which uses the Monoterpene Indole Alkaloids Database (MIADB) .mgf file annotated with spectral skeletons:
SpectraToQueries::spectra_to_queries()
To reproduce the “grouped” example, which uses the MIADB file with an expert-based annotation of spectral “super skeletons” (combinations of skeletons exhibiting a high structural similarity):
SpectraToQueries::spectra_to_queries(
  spectra = system.file(
    "extdata",
    "spectra_grouped.rds",
    package = "SpectraToQueries"
  ),
  export = "data/interim/queries-grouped.tsv"
)
To generate diagnostic ion queries from your own spectra:
SpectraToQueries::spectra_to_queries(
  spectra = "yourAwesomeSpectra.mgf",
  export = "path/yourEvenBetterResults.tsv"
)
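Once the call has finished, the exported file can be inspected in base R. The sketch below assumes, based only on the `.tsv` extension, that the export is tab-separated with a header row; the column layout is not documented here, so the code just prints whatever structure it finds. `read_queries()` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: read an exported query table if it exists.
read_queries <- function(path) {
  if (!file.exists(path)) {
    message("No file found at: ", path)
    return(NULL)
  }
  # Assumes a tab-separated file with a header row.
  utils::read.delim(path, stringsAsFactors = FALSE)
}

queries <- read_queries("path/yourEvenBetterResults.tsv")
if (!is.null(queries)) {
  # Show the column names, types and first values.
  str(queries)
}
```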
Showing all parameters:
SpectraToQueries::spectra_to_queries(
  spectra = NULL,
  export = "data/interim/queries.tsv",
  beta_1 = 1.0,
  beta_2 = 0.5,
  dalton = 0.01,
  decimals = 4L,
  intensity_min = 0.0,
  ions_max = 10L,
  n_skel_min = 5L,
  n_spec_min = 3L,
  ppm = 30.0,
  fscore_min = 0.0,
  precision_min = 0.0,
  recall_min = 0.0,
  zero_val = 0.0
)
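The names `beta_1`, `beta_2`, `fscore_min`, `precision_min` and `recall_min` suggest that candidate queries are evaluated with precision, recall and an F-score. How the package actually combines these values is not described here, so the sketch below only illustrates the standard F-beta definition with made-up counts. The function `f_beta()` and the counts are hypothetical and not part of the package.

```r
# Standard F-beta score: beta > 1 weights recall more, beta < 1 weights precision more.
# Hypothetical helper, not the package's implementation.
f_beta <- function(precision, recall, beta = 1) {
  denom <- beta^2 * precision + recall
  if (denom == 0) {
    return(0)
  }
  (1 + beta^2) * precision * recall / denom
}

# Made-up counts for one candidate query against one skeleton.
tp <- 8 # spectra of the skeleton matched by the query
fp <- 2 # spectra of other skeletons matched by the query
fn <- 8 # spectra of the skeleton missed by the query

precision <- tp / (tp + fp) # 0.8
recall <- tp / (tp + fn)    # 0.5

f1 <- f_beta(precision, recall, beta = 1.0)
f05 <- f_beta(precision, recall, beta = 0.5)
print(c(f1 = f1, f05 = f05))
```

With these counts the score at `beta = 0.5` is higher than at `beta = 1.0`, because the query is more precise than it is sensitive.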
Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids: https://doi.org/10.1186/s13321-025-01009-0
Package | Version | Citation |
---|---|---|
base | 4.5.1 | R Core Team (2025) |
BiocGenerics | 0.55.1 | Huber et al. (2015) |
BiocManager | 1.30.26 | Morgan and Ramos (2025) |
BiocParallel | 1.43.4 | Wang et al. (2025) |
BiocVersion | 3.22.0 | Morgan (2025) |
knitr | 1.50 | Xie (2014); Xie (2015); Xie (2025) |
MsBackendMgf | 1.17.0 | Gatto, Rainer, and Gibb (2025) |
pkgload | 1.4.0 | Wickham et al. (2024) |
rmarkdown | 2.29 | Xie, Allaire, and Grolemund (2018); Xie, Dervieux, and Riederer (2020); Allaire et al. (2024) |
Spectra | 1.19.4 | Rainer et al. (2022) |
testthat | 3.2.3 | Wickham (2011) |
tidytable | 0.11.2 | Fairbanks (2024) |
tidyverse | 2.0.0 | Wickham et al. (2019) |
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2024. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Fairbanks, Mark. 2024. tidytable: Tidy Interface to “data.table”. https://doi.org/10.32614/CRAN.package.tidytable.
Gatto, Laurent, Johannes Rainer, and Sebastian Gibb. 2025. MsBackendMgf: Mass Spectrometry Data Backend for Mascot Generic Format (Mgf) Files. https://doi.org/10.18129/B9.bioc.MsBackendMgf.
Huber, W., V. J. Carey, R. Gentleman, S. Anders, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.
Morgan, Martin. 2025. BiocVersion: Set the Appropriate Version of Bioconductor Packages. https://doi.org/10.18129/B9.bioc.BiocVersion.
Morgan, Martin, and Marcel Ramos. 2025. BiocManager: Access the Bioconductor Project Package Repository. https://doi.org/10.32614/CRAN.package.BiocManager.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12: 173. https://doi.org/10.3390/metabo12020173.
Wang, Jiefei, Martin Morgan, Valerie Obenchain, Michel Lang, Ryan Thompson, and Nitesh Turaga. 2025. BiocParallel: Bioconductor Facilities for Parallel Evaluation. https://doi.org/10.18129/B9.bioc.BiocParallel.
Wickham, Hadley. 2011. “testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Jim Hester, and Lionel Henry. 2024. pkgload: Simulate Package Installation and Attach. https://doi.org/10.32614/CRAN.package.pkgload.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.