Skip to content

CellProfiling/SubCellPortable

Repository files navigation

SubCellPortable

Efficient inference wrapper for the SubCell subcellular protein localization foundation model

Python License DOI

SubCellPortable provides a streamlined interface for running the SubCell model on IF microscopy images. Generate single-cell embeddings that encode cell morphology or protein localization and predict protein subcellular localization from multi-channel fluorescence microscopy images.

πŸ“„ Preprint: SubCell: Subcellular protein localization foundation model (Gupta et al., 2024)


πŸš€ Quick Start

Installation

# Clone repository
git clone https://github.com/yourusername/SubCellPortable.git
cd SubCellPortable

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Basic Usage

  1. Prepare your input CSV (path_list.csv):
r_image,y_image,b_image,g_image,output_prefix
images/cell_1_mt.png,,images/cell_1_nuc.png,images/cell_1_prot.png,cell1_
images/cell_2_mt.png,,images/cell_2_nuc.png,images/cell_2_prot.png,cell2_

Channel mapping:

  • r = microtubules (red)
  • y = ER (yellow)
  • b = nuclei (blue/DAPI)
  • g = protein of interest (green)

Leave channels empty if not available (e.g., use rbg for 3-channel images)

  1. Configure settings (config.yaml):
model_channels: "rybg"      # Channel configuration
output_dir: "./results"     # Output directory
batch_size: 128             # Batch size (adjust for GPU memory)
gpu: 0                      # GPU device ID (-1 for CPU)
output_format: "combined"   # "combined" (h5ad) or "individual" (npy)
  1. Run inference:
python process.py

πŸ“– Usage Guide

Command-Line Interface

# Basic run with config file
python process.py

# Specify parameters via CLI
python process.py --output_dir ./results --batch_size 256 --gpu 0

# Custom config and input files
python process.py --config experiment_config.yaml --path_list experiment_data.csv -o ./results

# Embeddings only (faster, no classification)
python process.py -o ./results --embeddings_only

# Get help
python process.py --help

Input CSV Formats

Recommended Format:

r_image,y_image,b_image,g_image,output_prefix
path/to/image1_mt.png,,path/to/image1_nuc.png,path/to/image1_prot.png,sample_1
path/to/image2_mt.png,,path/to/image2_nuc.png,path/to/image2_prot.png,batch_A/sample_2
  • Skip rows by prefixing with #
  • Create subfolders in the output folder by them to output_prefix like: /subfolder_1/sample_1

Legacy Format (deprecated but supported):

r_image,y_image,b_image,g_image,output_folder,output_prefix
...

βš™οΈ Configuration Parameters

Basic Parameters

Parameter Description Default Example
--config Path to configuration YAML file config.yaml experiment.yaml
--path_list Path to input CSV file path_list.csv data.csv
--output_dir -o Output directory for all results - ./results
--model_channels -c Channel configuration rybg rbg, ybg, bg
--model_type -m Model architecture mae_contrast_supcon_model vit_supcon_model
--output_format Output format combined individual
--num_workers -w Data loading workers 4 8
--gpu -g GPU device ID (-1 = CPU) -1 0
--batch_size -b Batch size 128 256
--embeddings_only Skip classification False -

Advanced Parameters

Parameter Description Default
--update_model -u Download/update models False
--prefetch_factor -p Prefetch batches 2
--create_csv Generate combined CSV False
--save_attention_maps Save attention visualizations False
--async_saving Async file saving (individual only) False
--quiet -q Suppress verbose logging False

πŸ“¦ Output Formats

Default: Combined H5AD Format

File: embeddings.h5ad (AnnData-compatible)

import anndata as ad

# Load results
adata = ad.read_h5ad("results/embeddings.h5ad")

# Access data
embeddings = adata.X                    # (n_samples, 1536)
probabilities = adata.obsm['probabilities']  # (n_samples, 31)
sample_ids = adata.obs_names            # Image identifiers

Compatible with: scanpy and other single-cell tools

Individual Format

Files per image:

  • {output_prefix}_embedding.npy - 1536D embedding vector
  • {output_prefix}_probabilities.npy - 31-class probability distribution
  • {output_prefix}_attention_map.png - Attention visualization (optional)
import numpy as np

embedding = np.load("results/cell1_embedding.npy")      # Shape: (1536,)
probs = np.load("results/cell1_probabilities.npy")      # Shape: (31,)

Optional: Combined CSV

File: result.csv

Column Description
id Sample identifier
top_class_name Top predicted location
top_class Top class index
top_3_classes_names Top 3 predictions (comma-separated)
top_3_classes Top 3 indices
prob00 - prob30 Full probability distribution
feat0000 - feat1535 Full embedding vector

🎯 Subcellular Location Classes

The model predicts 31 subcellular locations:

View all 31 classes
  1. Actin filaments
  2. Aggresome
  3. Cell Junctions
  4. Centriolar satellite
  5. Centrosome
  6. Cytokinetic bridge
  7. Cytoplasmic bodies
  8. Cytosol
  9. Endoplasmic reticulum
  10. Endosomes
  11. Focal adhesion sites
  12. Golgi apparatus
  13. Intermediate filaments
  14. Lipid droplets
  15. Lysosomes
  16. Microtubules
  17. Midbody
  18. Mitochondria
  19. Mitotic chromosome
  20. Mitotic spindle
  21. Nuclear bodies
  22. Nuclear membrane
  23. Nuclear speckles
  24. Nucleoli
  25. Nucleoli fibrillar center
  26. Nucleoli rim
  27. Nucleoplasm
  28. Peroxisomes
  29. Plasma membrane
  30. Vesicles
  31. Negative

Class names and visualization colors available in inference.py (CLASS2NAME, CLASS2COLOR dictionaries).


πŸ”§ Model Setup

Using Default Models

Models are automatically downloaded on first run with -u/--update_model:

python process.py -u --output_dir ./results

Custom Models

Edit models_urls.yaml to specify custom model URLs:

rybg:  # 4-channel configuration
  mae_contrast_supcon_model:
    encoder: "s3://bucket/path/to/encoder.pth"
    classifier_s0: "https://url/to/classifier.pth"

🀝 Citation

If you use SubCellPortable in your research, please cite:

@article{gupta2024subcell,
  title={SubCell: Subcellular protein localization foundation model},
  author={Gupta, Ankit and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.12.06.627299}
}

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


Support


SubCellPortable wrapper maintained with ❀️ by the Lundberg Lab for the computational biology community.

About

This is a convenient code wrapper to run Lundberg lab SubCell model in inference with your own images

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages