Efficient inference wrapper for the SubCell subcellular protein localization foundation model
SubCellPortable provides a streamlined interface for running the SubCell model on immunofluorescence (IF) microscopy images. It generates single-cell embeddings that encode cell morphology or protein localization, and predicts protein subcellular localization from multi-channel fluorescence microscopy images.

Preprint: *SubCell: Subcellular protein localization foundation model* (Gupta et al., 2024)
```bash
# Clone repository
git clone https://github.com/yourusername/SubCellPortable.git
cd SubCellPortable

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

- Prepare your input CSV (`path_list.csv`):

```csv
r_image,y_image,b_image,g_image,output_prefix
images/cell_1_mt.png,,images/cell_1_nuc.png,images/cell_1_prot.png,cell1_
images/cell_2_mt.png,,images/cell_2_nuc.png,images/cell_2_prot.png,cell2_
```
Channel mapping:

- `r` = microtubules (red)
- `y` = ER (yellow)
- `b` = nuclei (blue/DAPI)
- `g` = protein of interest (green)

Leave channels empty if they are not available (e.g., use `rbg` for 3-channel images). A short sketch for generating this CSV programmatically is shown below.
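As an illustration, the input CSV can be generated programmatically. The snippet below is a minimal sketch, assuming your images follow a `{sample}_mt.png` / `{sample}_nuc.png` / `{sample}_prot.png` naming scheme (these file names are placeholders, not a requirement of SubCellPortable):

```python
import csv
from pathlib import Path

# Hypothetical naming scheme: <sample>_mt.png, <sample>_nuc.png, <sample>_prot.png
image_dir = Path("images")
samples = sorted({p.name.rsplit("_", 1)[0] for p in image_dir.glob("*_prot.png")})

with open("path_list.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["r_image", "y_image", "b_image", "g_image", "output_prefix"])
    for sample in samples:
        writer.writerow([
            str(image_dir / f"{sample}_mt.png"),    # r: microtubules
            "",                                     # y: ER channel left empty here
            str(image_dir / f"{sample}_nuc.png"),   # b: nuclei (DAPI)
            str(image_dir / f"{sample}_prot.png"),  # g: protein of interest
            f"{sample}_",                           # output prefix
        ])
```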
- Configure settings (`config.yaml`):

```yaml
model_channels: "rybg"      # Channel configuration
output_dir: "./results"     # Output directory
batch_size: 128             # Batch size (adjust for GPU memory)
gpu: 0                      # GPU device ID (-1 for CPU)
output_format: "combined"   # "combined" (h5ad) or "individual" (npy)
```

- Run inference:

```bash
python process.py
```

```bash
# Basic run with config file
python process.py

# Specify parameters via CLI
python process.py --output_dir ./results --batch_size 256 --gpu 0

# Custom config and input files
python process.py --config experiment_config.yaml --path_list experiment_data.csv -o ./results

# Embeddings only (faster, no classification)
python process.py -o ./results --embeddings_only

# Get help
python process.py --help
```

Recommended Format:
```csv
r_image,y_image,b_image,g_image,output_prefix
path/to/image1_mt.png,,path/to/image1_nuc.png,path/to/image1_prot.png,sample_1
path/to/image2_mt.png,,path/to/image2_nuc.png,path/to/image2_prot.png,batch_A/sample_2
```
- Skip rows by prefixing them with `#`
- Create subfolders in the output folder by adding them to `output_prefix`, e.g. `/subfolder_1/sample_1`
Legacy Format (deprecated but still supported):

```csv
r_image,y_image,b_image,g_image,output_folder,output_prefix
...
```
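Before launching a long run, it can help to verify that every path referenced in the input CSV actually exists. The following is a minimal sketch (not part of SubCellPortable itself) that checks the recommended-format columns:

```python
import csv
from pathlib import Path

missing = []
with open("path_list.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Skip commented-out rows (here assumed to start with '#' in the first column)
        if (row.get("r_image") or "").startswith("#"):
            continue
        for channel in ("r_image", "y_image", "b_image", "g_image"):
            path = (row.get(channel) or "").strip()
            if path and not Path(path).exists():  # empty = channel not available
                missing.append(path)

if missing:
    print(f"{len(missing)} missing image file(s):")
    for path in missing:
        print(" ", path)
else:
    print("All referenced image files exist.")
```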
| Parameter | Description | Default | Example |
|---|---|---|---|
| `--config` | Path to configuration YAML file | `config.yaml` | `experiment.yaml` |
| `--path_list` | Path to input CSV file | `path_list.csv` | `data.csv` |
| `--output_dir`, `-o` | Output directory for all results | - | `./results` |
| `--model_channels`, `-c` | Channel configuration | `rybg` | `rbg`, `ybg`, `bg` |
| `--model_type`, `-m` | Model architecture | `mae_contrast_supcon_model` | `vit_supcon_model` |
| `--output_format` | Output format | `combined` | `individual` |
| `--num_workers`, `-w` | Data loading workers | `4` | `8` |
| `--gpu`, `-g` | GPU device ID (`-1` = CPU) | `-1` | `0` |
| `--batch_size`, `-b` | Batch size | `128` | `256` |
| `--embeddings_only` | Skip classification | `False` | - |
| Parameter | Description | Default |
|---|---|---|
| `--update_model`, `-u` | Download/update models | `False` |
| `--prefetch_factor`, `-p` | Prefetch batches | `2` |
| `--create_csv` | Generate combined CSV | `False` |
| `--save_attention_maps` | Save attention visualizations | `False` |
| `--async_saving` | Async file saving (individual output only) | `False` |
| `--quiet`, `-q` | Suppress verbose logging | `False` |
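For scripted or batch runs it can be convenient to drive the CLI from Python. This is only a convenience sketch combining flags from the two tables above; the flag names come from this README, while the surrounding script (and the assumption that the boolean flags are plain switches) is illustrative:

```python
import subprocess

# Combine main and advanced flags from the tables above (values are examples)
cmd = [
    "python", "process.py",
    "--path_list", "path_list.csv",
    "--output_dir", "./results",
    "--batch_size", "256",
    "--gpu", "0",
    "--update_model",   # download/update model weights on first run
    "--create_csv",     # also write the combined result.csv
    "--quiet",          # suppress verbose logging
]
subprocess.run(cmd, check=True)
```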
File: `embeddings.h5ad` (AnnData-compatible)

```python
import anndata as ad

# Load results
adata = ad.read_h5ad("results/embeddings.h5ad")

# Access data
embeddings = adata.X                          # (n_samples, 1536)
probabilities = adata.obsm['probabilities']   # (n_samples, 31)
sample_ids = adata.obs_names                  # Image identifiers
```

Compatible with scanpy and other single-cell tools.
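As a downstream example, the embeddings can be fed directly into a standard scanpy workflow. The snippet below is a minimal sketch, assuming scanpy is installed; the `top_class` label is derived here from the stored probabilities rather than read from the file:

```python
import numpy as np
import scanpy as sc
import anndata as ad

adata = ad.read_h5ad("results/embeddings.h5ad")

# Label each cell with its most probable class index (class names live in inference.py)
adata.obs["top_class"] = np.argmax(adata.obsm["probabilities"], axis=1).astype(str)

# Neighborhood graph + UMAP on the 1536-D SubCell embeddings
sc.pp.neighbors(adata, use_rep="X")
sc.tl.umap(adata)
sc.pl.umap(adata, color="top_class")
```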
Files per image:

- `{output_prefix}_embedding.npy` - 1536-D embedding vector
- `{output_prefix}_probabilities.npy` - 31-class probability distribution
- `{output_prefix}_attention_map.png` - Attention visualization (optional)
```python
import numpy as np

embedding = np.load("results/cell1_embedding.npy")   # Shape: (1536,)
probs = np.load("results/cell1_probabilities.npy")   # Shape: (31,)
```
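To turn a probability vector into readable predictions, the class indices can be ranked with NumPy; mapping indices to names uses the `CLASS2NAME` dictionary in `inference.py` (see the class list further below), assuming that module is importable from the repository root and keyed by integer class index:

```python
import numpy as np
from inference import CLASS2NAME  # assumed importable; assumed to map class index -> name

probs = np.load("results/cell1_probabilities.npy")
top3 = np.argsort(probs)[::-1][:3]  # indices of the 3 highest probabilities
for idx in top3:
    print(f"{CLASS2NAME[int(idx)]}: {probs[idx]:.3f}")
```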
File: `result.csv`

| Column | Description |
|---|---|
| `id` | Sample identifier |
| `top_class_name` | Top predicted location |
| `top_class` | Top class index |
| `top_3_classes_names` | Top 3 predictions (comma-separated) |
| `top_3_classes` | Top 3 indices |
| `prob00` - `prob30` | Full probability distribution |
| `feat0000` - `feat1535` | Full embedding vector |
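The combined CSV can be inspected with pandas. A minimal sketch, assuming the column layout in the table above and the default `./results` output directory:

```python
import pandas as pd

df = pd.read_csv("results/result.csv")

# Summary of predicted localizations
print(df["top_class_name"].value_counts())

# Pull out the embedding matrix (feat0000 ... feat1535) as a NumPy array
feature_cols = [c for c in df.columns if c.startswith("feat")]
embeddings = df[feature_cols].to_numpy()  # shape: (n_samples, 1536)
```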
The model predicts 31 subcellular locations:

<details>
<summary>View all 31 classes</summary>

- Actin filaments
- Aggresome
- Cell Junctions
- Centriolar satellite
- Centrosome
- Cytokinetic bridge
- Cytoplasmic bodies
- Cytosol
- Endoplasmic reticulum
- Endosomes
- Focal adhesion sites
- Golgi apparatus
- Intermediate filaments
- Lipid droplets
- Lysosomes
- Microtubules
- Midbody
- Mitochondria
- Mitotic chromosome
- Mitotic spindle
- Nuclear bodies
- Nuclear membrane
- Nuclear speckles
- Nucleoli
- Nucleoli fibrillar center
- Nucleoli rim
- Nucleoplasm
- Peroxisomes
- Plasma membrane
- Vesicles
- Negative
</details>

Class names and visualization colors are available in `inference.py` (`CLASS2NAME`, `CLASS2COLOR` dictionaries).
Models are automatically downloaded on the first run with `-u`/`--update_model`:

```bash
python process.py -u --output_dir ./results
```

Edit `models_urls.yaml` to specify custom model URLs:
```yaml
rybg:                            # 4-channel configuration
  mae_contrast_supcon_model:
    encoder: "s3://bucket/path/to/encoder.pth"
    classifier_s0: "https://url/to/classifier.pth"
```

If you use SubCellPortable in your research, please cite:
```bibtex
@article{gupta2024subcell,
  title={SubCell: Subcellular protein localization foundation model},
  author={Gupta, Ankit and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.12.06.627299}
}
```

This project is licensed under the MIT License. See the LICENSE file for details.
- Issues: GitHub Issues
The SubCellPortable wrapper is maintained with ❤️ by the Lundberg Lab for the computational biology community.