Submitting LBDiscover package #725

@chaoliu-cl

Submitting Author Name: Chao Liu
Submitting Author Github Handle: @chaoliu-cl
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/chaoliu-cl/LBDiscover
Version submitted:
Submission type: Standard
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: LBDiscover
Title: Literature-Based Discovery Tools for Biomedical Research
Version: 0.1.0
Date: 2025-05-14
Authors@R: 
    person("Chao Liu", email = "chaoliu@cedarville.edu", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-9979-8272"))
Description: A suite of tools for literature-based discovery in biomedical research. 
    Provides functions for retrieving scientific articles from PubMed and 
    other NCBI databases, extracting biomedical entities (diseases, drugs, genes, etc.), 
    building co-occurrence networks, and applying various discovery models 
    including ABC, AnC, LSI, and BITOLA. The package also includes 
    visualization tools for exploring discovered connections.
License: GPL-3
URL: https://github.com/chaoliu-cl/LBDiscover, http://liu-chao.site/LBDiscover/, https://liu-chao.site/LBDiscover/
BugReports: https://github.com/chaoliu-cl/LBDiscover/issues
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Depends: 
    R (>= 4.0.0)
Imports: 
    httr (>= 1.4.0),
    xml2 (>= 1.3.0),
    igraph (>= 1.2.0),
    Matrix (>= 1.3.0),
    utils,
    stats,
    grDevices,
    graphics,
    tools,
    rentrez (>= 1.2.0),
    jsonlite (>= 1.7.0)
Suggests:
    openxlsx (>= 4.2.0),
    SnowballC (>= 0.7.0),
    visNetwork (>= 2.1.0),
    spacyr (>= 1.2.0),
    parallel,
    digest (>= 0.6.0),
    irlba (>= 2.3.0),
    knitr,
    rmarkdown,
    base64enc,
    reticulate,
    testthat (>= 3.0.0),
    mockery,
    covr,
    htmltools
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • translation
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    Data retrieval: The package provides functions for retrieving scientific articles from PubMed and other NCBI databases. It is a tool for systematically accessing biomedical literature from major research repositories.
    Data extraction: It extracts biomedical entities (diseases, drugs, genes, etc.) from retrieved literature, performing information extraction from scientific texts.
    Citation management and bibliometrics: The package builds co-occurrence networks from literature and applies discovery models (ABC, AnC, LSI, BITOLA) to find hidden connections between concepts, which represents bibliometric analysis for literature-based discovery research.

  • Who is the target audience and what are scientific applications of this package?
    Target Audience: LBDiscover is designed for biomedical researchers, bioinformaticians, and data scientists working in literature-based discovery (LBD). The primary users include:

  • Biomedical researchers seeking hidden connections between diseases, drugs, and genes

  • Pharmaceutical researchers exploring drug repurposing opportunities

  • Bioinformaticians building knowledge networks from literature

  • Graduate students and academics studying computational approaches to hypothesis generation

Scientific Applications:
The package supports several key research applications:

  1. Drug Discovery and Repurposing: LBD has been used extensively in drug development and repurposing, as well as in predicting adverse drug reactions
  2. Disease-Gene Association Discovery: Identifying candidate disease genes from the literature
  3. Biomarker Identification: LBD has been explored as a tool for identifying diagnostic and prognostic biomarkers for diseases
  4. Hypothesis Generation: Creating testable scientific hypotheses by connecting disparate pieces of literature
  5. Knowledge Network Construction: Building co-occurrence networks to visualize research landscapes
  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
    There are several R packages that overlap with LBDiscover's functionality, but none provide the same comprehensive approach to literature-based discovery:
    Similar Packages and Key Differences:
  1. pubmed.mineR
    Overlap: PubMed text mining with functions for data visualization and biomedical entity extraction
    Difference: Focuses on general text mining and clustering rather than implementing specific LBD models like ABC, AnC, LSI, and BITOLA

  2. bibliometrix
    Overlap: Comprehensive science mapping analysis with network analysis capabilities and bibliometric workflows
    Difference: Designed for general scientometric analysis across all disciplines, not specifically for biomedical literature-based discovery or implementing LBD-specific algorithms

  3. Data Retrieval Packages (rentrez, easyPubMed, RISmed)
    Overlap: All provide interfaces to NCBI/PubMed for retrieving biomedical literature
    Difference: These focus solely on data retrieval and don't perform LBD analysis, entity extraction, or hypothesis generation

How LBDiscover Meets Best-in-Category Criteria:

  1. Unique Functionality: LBDiscover is the first R package to specifically implement established LBD models:
  • ABC Model: The most basic and widespread LBD model, which links two concepts A and C that are not directly connected in the literature through a shared intermediate concept B
  • BITOLA: An interactive literature-based biomedical discovery support system using semantic prediction
  • LSI (Latent Semantic Indexing): A statistical technique for improving information retrieval effectiveness used to assist in literature-based discoveries
  • AnC Model: An extension of the ABC model that connects A and C through several intermediate concepts rather than a single B
  2. Integrated Workflow: Unlike other packages that handle only one aspect (retrieval OR analysis OR visualization), LBDiscover provides a complete workflow from data retrieval through entity extraction to discovery model application and network visualization.
  3. Biomedical Specialization: While bibliometrix serves general scientometrics and pubmed.mineR does general text mining, LBDiscover is specifically designed for biomedical literature-based discovery with domain-specific entity recognition (diseases, drugs, genes).
  4. Modern Implementation: Recent LBD research has focused on integrating large language models into the discovery process. LBDiscover is positioned to incorporate such advances while keeping its established methodological foundations.
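
For readers unfamiliar with the ABC model named above, the idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not LBDiscover's API: the function name `abc_candidates`, the toy corpus, and the scoring by number of linking B terms are all hypothetical choices made for this example.

```r
# Toy corpus (made-up data): each document is reduced to the entities it mentions.
docs <- list(
  c("fish oil", "blood viscosity"),
  c("fish oil", "platelet aggregation"),
  c("blood viscosity", "raynaud disease"),
  c("platelet aggregation", "raynaud disease"),
  c("fish oil", "blood viscosity")
)

terms <- sort(unique(unlist(docs)))

# Document-by-term incidence matrix: 1 if the term occurs in the document.
incidence <- t(vapply(docs,
                      function(d) as.integer(terms %in% d),
                      integer(length(terms))))
colnames(incidence) <- terms

# Term-by-term co-occurrence counts; self co-occurrence is zeroed out.
cooc <- crossprod(incidence)
diag(cooc) <- 0

# Hypothetical helper: find C terms reachable from A only through a B term.
abc_candidates <- function(cooc, a_term) {
  # B terms co-occur directly with A.
  b_terms <- names(which(cooc[a_term, ] > 0))
  # C terms co-occur with at least one B term ...
  linked <- colnames(cooc)[colSums(cooc[b_terms, , drop = FALSE]) > 0]
  c_terms <- setdiff(linked, c(a_term, b_terms))
  # ... but never directly with A.
  c_terms <- c_terms[cooc[a_term, c_terms] == 0]
  # Score each C term by the number of distinct B terms that link it to A.
  n_links <- vapply(c_terms,
                    function(c_term) sum(cooc[b_terms, c_term] > 0),
                    integer(1))
  out <- data.frame(c_term = c_terms,
                    linking_b = unname(n_links),
                    stringsAsFactors = FALSE)
  out[order(-out$linking_b), , drop = FALSE]
}

result <- abc_candidates(cooc, "fish oil")
print(result)
```

In this toy corpus the only candidate for "fish oil" is "raynaud disease", reached through two intermediate terms. Real LBD pipelines typically replace the raw link count with a statistical association measure and filter B and C terms by entity type.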

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct
