HALO Implementation: HALO_MIMIC3Dataset & HALO Model Classes #528
Adapted from: https://github.com/btheodorou99/HALO_Inpatient/tree/main
I've been using the following files to test this code via the NCSA cluster (they aren't included in the PR/push). Both files need to live at the root of `PyHealth/` (**not** `PyHealth/pyhealth/`).

Testing script (`halo_testing_script.py`):

```python
print("BEGIN: Testing")
import os
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", "."])
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy", "--force-reinstall"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "pandas", "--force-reinstall"])
print("Success on pip install -e .")

from pyhealth.models.generators.halo import HALO
from pyhealth.models.generators.halo_resources.halo_model import HALOModel
from pyhealth.models.generators.halo_resources.halo_config import HALOConfig
from pyhealth.datasets.halo_mimic3 import HALO_MIMIC3Dataset
print("Success on imports")

print(f"Operating in dir: {os.getcwd()}")
halo_config = HALOConfig()
halo_dataset = HALO_MIMIC3Dataset(
    mimic3_dir="../../../../scratch_old/ethanmr3/mimic3/physionet.org/files/mimiciii/1.4/",
    pkl_data_dir="../../halo_pkl/",
    gzip=True,
)
model = HALO(dataset=halo_dataset, config=halo_config, save_dir="../../halo_save/", train_on_init=False)
print("Success on model setup")

model.train()
print("Success on model train")

model.test(testing_results_dir="../../halo_results/")
print("Success on model test")

model.synthesize_dataset(pkl_save_dir="../../halo_results/")
print("Success on dataset synthesis")
print("END: Testing success!!!")
```
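After `synthesize_dataset` finishes, the synthetic records should be readable back from `pkl_save_dir` with `pickle`. A minimal round-trip sketch of that pattern (the nested list-of-visits schema and the filename here are assumptions for illustration, not the actual format HALO writes):

```python
import os
import pickle
import tempfile

# Hypothetical record shape: a list of patients, each a list of visits,
# each visit a list of medical codes. The real schema written by
# HALO.synthesize_dataset may differ; inspect the actual pickle.
synthetic = [
    [["401.9", "250.00"], ["414.01"]],  # patient 1: two visits
    [["038.9"]],                        # patient 2: one visit
]

path = os.path.join(tempfile.mkdtemp(), "synthetic_dataset.pkl")
with open(path, "wb") as f:
    pickle.dump(synthetic, f)

# Loading the synthesized dataset back is a plain pickle read:
with open(path, "rb") as f:
    loaded = pickle.load(f)

assert loaded == synthetic
print(f"Loaded {len(loaded)} synthetic patients")
```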
Slurm job (`test_halo_model.slurm`):

```bash
#!/bin/bash
#SBATCH --account=ethanmr3-ic
#SBATCH --job-name=pyhealth-halo-testing
#SBATCH --output=halo-testing-logs/halo_test_%j.out
#SBATCH --error=halo-testing-logs/halo_test_%j.err
#SBATCH --partition=IllinoisComputes-GPU # Change to appropriate partition
#SBATCH --gres=gpu:1 # Request 1 GPU
#SBATCH --cpus-per-task=4
#SBATCH --mem=64G
#SBATCH --time=48:00:00

# Change to the directory where you submitted the job
cd "$SLURM_SUBMIT_DIR"

# Print useful Slurm environment variables for debugging
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"
echo "SLURM_CPUS_ON_NODE: $SLURM_CPUS_ON_NODE"
echo "SLURM_GPUS_ON_NODE: $SLURM_GPUS_ON_NODE"
echo "SLURM_GPUS: $SLURM_GPUS"
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"

# Optional: check which GPU(s) are actually visible
echo "Running nvidia-smi to confirm GPU availability:"
nvidia-smi

# Load modules or activate environment
#module load python/3.10
#module load cuda/11.7
#conda activate your-env

# Run the Python testing script
python /u/ethanmr3/halo/PyHealth/halo_testing_script.py
```
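One submission detail worth noting: Slurm opens the `--output`/`--error` files itself and does not create missing parent directories, so `halo-testing-logs/` has to exist before the job is submitted:

```shell
# Create the log directory before submitting; Slurm will not create it,
# and the job's stdout/stderr would otherwise be lost.
mkdir -p halo-testing-logs
# Then submit from the PyHealth/ root:
# sbatch test_halo_model.slurm
```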