[RFC] Enabling Cython-based PDB parser backend for speed improvements #139

@a-r-j

Describe the workflow you want to enable

Currently, the pure-Python implementation of PDB parsing in BioPandas is quite slow, certainly too slow for high-throughput structural bioinformatics or ML.

Describe your proposed solution

I have written a Cython-based implementation (CPDB), which is considerably faster, and would like to set it as the default parsing backend. As it stands, I believe this to be one of the fastest (if not the fastest) available PDB parsers for Python.

[Screenshot: Performance comparison]

However, given BioPandas' widespread usage, I am unsure whether distributing it with a Cython component will lead to dependency problems for users.
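
One way to limit that risk might be to keep the compiled parser optional and fall back to the existing pure-Python path when it is not installed. The following is only a minimal sketch of that idea: it assumes the Cython parser is importable as a module named `cpdb`, and `select_backend` and the backend name `"python"` are hypothetical, not part of the BioPandas API.

```python
import importlib.util


def select_backend(preferred: str = "cpdb", fallback: str = "python") -> str:
    """Return the name of the parsing backend to use.

    `preferred` is the import name of an optional compiled module
    (assumed here to be "cpdb"); `fallback` is a hypothetical label
    for the existing pure-Python parser.
    """
    # find_spec returns None when a top-level module is not installed,
    # so this check does not import (or fail on) the optional dependency.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Resolves to "cpdb" only if that module is installed in the environment.
backend = select_backend()
```

With this kind of check, users without a working compiled wheel would still get the current behaviour, while users who install the optional dependency would pick up the faster parser.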

Describe alternatives you've considered, if relevant

Speeding up the passage of time

Additional context
