Inconsistent drug indexing in loo_label.csv, expr_index.txt, and --drug_index argument #48

Can you provide more information about what each row and index in loo_label.csv and expr_index.txt represents? I believe loo_label.csv holds the label of each drug perturbation, because each of its rows corresponds to a row in pert.csv and expr.csv, but I cannot tell what the numeric indices in loo_label.csv represent.

From the paper, 12 drugs are tested, so I take the `--drug_index` argument to refer to the drug that is left out during training. For example, when I run `python scripts/main.py -config=configs/Example.leave_one_out.json --drug_index 12`, I would expect all the rows in pert.csv that belong to the drug at index 12 (as indicated in loo_label.csv) to be left out of the training set. On closer inspection, however, `testidx` (defined in `dataset.py`) contains indices that point to rows in loo_label.csv with the number 9. Similarly, setting `--drug_index 11` points to rows with the number 8, and so on. Setting `--drug_index` to any value from 0 to 7, by contrast, points correctly to the rows in loo_label.csv that have that number.
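
The check described above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: it assumes loo_label.csv reduces to one integer label per row, and the function name `labels_hit_by_indices` and the toy data are hypothetical stand-ins for the real file and for whatever `testidx` contains.

```python
from collections import Counter


def labels_hit_by_indices(labels, test_idx):
    """Count which label values appear at the given row indices."""
    return Counter(labels[i] for i in test_idx)


# Toy stand-in for loo_label.csv (assumed format: one integer label per row).
labels = [0, 0, 1, 9, 9, 8, 2, 9]

# Hypothetical held-out row indices, standing in for `testidx`.
test_idx = [3, 4, 7]

hit = labels_hit_by_indices(labels, test_idx)

# If the held-out rows all share one label, `hit` has a single key,
# which can then be compared against the value passed as --drug_index.
print(dict(hit))
```

Running the same tally for each `--drug_index` value would give the full mapping between the argument and the label values in loo_label.csv.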

Can you confirm whether this is the expected behaviour? It matters because I am testing my PyTorch dataloader to confirm that it fetches the same rows in pert.csv as the current TensorFlow dataloader.
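
For the dataloader comparison, the test I have in mind looks something like the sketch below. It assumes both loaders can expose the row indices they hold out as plain integer sequences; `compare_row_indices` and the example index lists are hypothetical, not part of either loader.

```python
def compare_row_indices(reference_idx, candidate_idx):
    """Return (missing, extra): rows only in the reference, rows only in the candidate."""
    ref, cand = set(reference_idx), set(candidate_idx)
    return sorted(ref - cand), sorted(cand - ref)


# Hypothetical held-out rows from the existing loader and from the new one.
reference_idx = [3, 4, 7]
candidate_idx = [3, 4, 8]

missing, extra = compare_row_indices(reference_idx, candidate_idx)

# Both lists are empty when the two loaders select exactly the same rows.
print("missing from candidate:", missing)
print("extra in candidate:", extra)
```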
