1212
1313<!-- [](https://papers.nips.cc/paper/2020) -->
1414
15- [![Data DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14660031.svg)](https://doi.org/10.5281/zenodo.14660031)
15+ [![Data DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15066450.svg)](https://doi.org/10.5281/zenodo.15066450)
1616
1717<img src="./img/FlowDock.png" width="600">
1818
@@ -76,6 +76,7 @@ cd FlowDock
7676mamba env create -f environments/flowdock_environment.yaml
7777conda activate FlowDock # NOTE: one still needs to use `conda` to (de)activate environments
7878pip3 install -e . # install local project as package
79+ pip3 install prody==2.4.1 --no-dependencies # install ProDy without NumPy dependency
7980```
8081
8182Download checkpoints
9192
9293``` bash
9394# pretrained FlowDock weights
94- wget https://zenodo.org/records/14660031/files/flowdock_checkpoints.tar.gz
95+ wget https://zenodo.org/records/15066450/files/flowdock_checkpoints.tar.gz
9596tar -xzf flowdock_checkpoints.tar.gz
9697rm flowdock_checkpoints.tar.gz
9798```
@@ -105,19 +106,19 @@ tar -xzf flowdock_data_cache.tar.gz
105106rm flowdock_data_cache.tar.gz
106107
107108# cached data for PDBBind, Binding MOAD, DockGen, and the PDB-based van der Mers (vdM) dataset
108- wget https://zenodo.org/records/14660031/files/flowdock_pdbbind_data.tar.gz
109+ wget https://zenodo.org/records/15066450/files/flowdock_pdbbind_data.tar.gz
109110tar -xzf flowdock_pdbbind_data.tar.gz
110111rm flowdock_pdbbind_data.tar.gz
111112
112- wget https://zenodo.org/records/14660031/files/flowdock_moad_data.tar.gz
113+ wget https://zenodo.org/records/15066450/files/flowdock_moad_data.tar.gz
113114tar -xzf flowdock_moad_data.tar.gz
114115rm flowdock_moad_data.tar.gz
115116
116- wget https://zenodo.org/records/14660031/files/flowdock_dockgen_data.tar.gz
117+ wget https://zenodo.org/records/15066450/files/flowdock_dockgen_data.tar.gz
117118tar -xzf flowdock_dockgen_data.tar.gz
118119rm flowdock_dockgen_data.tar.gz
119120
120- wget https://zenodo.org/records/14660031/files/flowdock_pdbsidechain_data.tar.gz
121+ wget https://zenodo.org/records/15066450/files/flowdock_pdbsidechain_data.tar.gz
121122tar -xzf flowdock_pdbsidechain_data.tar.gz
122123rm flowdock_pdbsidechain_data.tar.gz
123124```
@@ -129,7 +130,7 @@ rm flowdock_pdbsidechain_data.tar.gz
129130<details >
130131
131132**NOTE:** The following steps (besides downloading PDBBind and Binding MOAD's PDB files) are only necessary if one wants to fully process each of the following datasets manually.
132- Otherwise, preprocessed versions of each dataset can be found on [Zenodo](https://zenodo.org/records/14660031).
133+ Otherwise, preprocessed versions of each dataset can be found on [Zenodo](https://zenodo.org/records/15066450).
133134
134135Download data
135136
@@ -159,6 +160,16 @@ mv pdb_2021aug02/ pdbsidechain/
159160cd ../
160161```
161162
163+ Lastly, to finetune ` FlowDock ` using the ` PLINDER ` dataset, one must first prepare this data for training
164+
165+ ``` bash
166+ # fetch PLINDER data (NOTE: requires ~1 hour to download and ~750G of storage)
167+ export PLINDER_MOUNT="$(pwd)/data/PLINDER"
168+ mkdir -p "$PLINDER_MOUNT" # create the directory if it doesn't exist
169+
170+ plinder_download -y
171+ ```
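
Because the download needs roughly 750G, it can help to check free space before starting. The following is a minimal sketch, not part of the project: `has_free_space` and `download_plinder_if_space` are hypothetical helper names, and the 750 figure is treated as GiB, which is only an approximation of the estimate quoted above.

```bash
# Return success if the filesystem holding directory $1 has at least $2 GiB free.
has_free_space() {
  dir="$1"
  required_gb="$2"
  # df -P prints POSIX-format output; -k reports sizes in 1024-byte blocks
  available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$available_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Hypothetical wrapper: only start the PLINDER download when enough space is free.
download_plinder_if_space() {
  export PLINDER_MOUNT="${PLINDER_MOUNT:-$(pwd)/data/PLINDER}"
  mkdir -p "$PLINDER_MOUNT"
  if has_free_space "$PLINDER_MOUNT" 750; then
    plinder_download -y
  else
    echo "Less than ~750G free under $PLINDER_MOUNT; skipping download" >&2
    return 1
  fi
}
```

Calling `download_plinder_if_space` from the repository root would then replace the three commands above; nothing is downloaded until the function is invoked.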
172+
162173### Generating ESM2 embeddings for each protein (optional, cached input data available on SharePoint)
163174
164175To generate the ESM2 embeddings for the protein inputs,
@@ -260,10 +271,10 @@ python flowdock/train.py experiment=flowdock_fm
260271python flowdock/train.py experiment=flowdock_fm trainer.max_epochs=20 data.batch_size=8
261272```
262273
263- For example, override parameters to finetune `FlowDock`'s pretrained weights using a new dataset
274+ For example, override parameters to finetune `FlowDock`'s pretrained weights using a new dataset such as [PLINDER](https://www.plinder.sh/)
264275
265276``` bash
266- python flowdock/train.py experiment=flowdock_fm data=my_new_datamodule ckpt_path=checkpoints/esmfold_prior_paper_weights.ckpt
277+ python flowdock/train.py experiment=flowdock_fm data=plinder ckpt_path=checkpoints/esmfold_prior_paper_weights.ckpt
267278```
268279
269280</details >
@@ -277,7 +288,7 @@ To reproduce `FlowDock`'s evaluation results for structure prediction, please re
277288To reproduce `FlowDock`'s evaluation results for binding affinity prediction using the PDBBind dataset
278289
279290``` bash
280- python flowdock/eval.py data.test_datasets=[pdbbind] ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt trainer=gpu
291+ python flowdock/eval.py data.test_datasets=[pdbbind] ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt trainer=gpu
281292... # re-run two more times to gather triplicate results
282293```
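
The three repeated runs can also be scripted. This is a rough sketch under stated assumptions: `run_triplicate`, the `DRY_RUN` switch, and the `eval_run_<n>.log` file names are hypothetical and not defined by the project; the evaluation command itself is copied from above with the `-EMA` checkpoint name.

```bash
# Checkpoint path and overrides copied from the command above.
EVAL_CMD='python flowdock/eval.py data.test_datasets=[pdbbind] ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt trainer=gpu'

# Run the evaluation three times, or only print the commands when DRY_RUN=1.
run_triplicate() {
  for run in 1 2 3; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "[run ${run}] ${EVAL_CMD}"
    else
      # eval_run_<n>.log is a hypothetical log name chosen for this sketch
      ${EVAL_CMD} 2>&1 | tee "eval_run_${run}.log"
    fi
  done
}
```

Running `DRY_RUN=1 run_triplicate` prints the three commands without executing them, which is a cheap way to check the overrides before committing GPU time.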
283294
@@ -291,47 +302,55 @@ Download baseline method predictions and results
291302
292303``` bash
293304# cached predictions and evaluation metrics for reproducing structure prediction paper results
294- wget https://zenodo.org/records/14660031/files/alphafold3_baseline_method_predictions.tar.gz
305+ wget https://zenodo.org/records/15066450/files/alphafold3_baseline_method_predictions.tar.gz
295306tar -xzf alphafold3_baseline_method_predictions.tar.gz
296307rm alphafold3_baseline_method_predictions.tar.gz
297308
298- wget https://zenodo.org/records/14660031/files/chai_baseline_method_predictions.tar.gz
309+ wget https://zenodo.org/records/15066450/files/chai_baseline_method_predictions.tar.gz
299310tar -xzf chai_baseline_method_predictions.tar.gz
300311rm chai_baseline_method_predictions.tar.gz
301312
302- wget https://zenodo.org/records/14660031/files/diffdock_baseline_method_predictions.tar.gz
313+ wget https://zenodo.org/records/15066450/files/diffdock_baseline_method_predictions.tar.gz
303314tar -xzf diffdock_baseline_method_predictions.tar.gz
304315rm diffdock_baseline_method_predictions.tar.gz
305316
306- wget https://zenodo.org/records/14660031/files/dynamicbind_baseline_method_predictions.tar.gz
317+ wget https://zenodo.org/records/15066450/files/dynamicbind_baseline_method_predictions.tar.gz
307318tar -xzf dynamicbind_baseline_method_predictions.tar.gz
308319rm dynamicbind_baseline_method_predictions.tar.gz
309320
310- wget https://zenodo.org/records/14660031/files/flowdock_baseline_method_predictions.tar.gz
321+ wget https://zenodo.org/records/15066450/files/flowdock_baseline_method_predictions.tar.gz
311322tar -xzf flowdock_baseline_method_predictions.tar.gz
312323rm flowdock_baseline_method_predictions.tar.gz
313324
314- wget https://zenodo.org/records/14660031/files/flowdock_aft_baseline_method_predictions.tar.gz
325+ wget https://zenodo.org/records/15066450/files/flowdock_aft_baseline_method_predictions.tar.gz
315326tar -xzf flowdock_aft_baseline_method_predictions.tar.gz
316327rm flowdock_aft_baseline_method_predictions.tar.gz
317328
318- wget https://zenodo.org/records/14660031/files/flowdock_esmfold_baseline_method_predictions.tar.gz
329+ wget https://zenodo.org/records/15066450/files/flowdock_pft_baseline_method_predictions.tar.gz
330+ tar -xzf flowdock_pft_baseline_method_predictions.tar.gz
331+ rm flowdock_pft_baseline_method_predictions.tar.gz
332+
333+ wget https://zenodo.org/records/15066450/files/flowdock_esmfold_baseline_method_predictions.tar.gz
319334tar -xzf flowdock_esmfold_baseline_method_predictions.tar.gz
320335rm flowdock_esmfold_baseline_method_predictions.tar.gz
321336
322- wget https://zenodo.org/records/14660031/files/flowdock_hp_baseline_method_predictions.tar.gz
337+ wget https://zenodo.org/records/15066450/files/flowdock_chai_baseline_method_predictions.tar.gz
338+ tar -xzf flowdock_chai_baseline_method_predictions.tar.gz
339+ rm flowdock_chai_baseline_method_predictions.tar.gz
340+
341+ wget https://zenodo.org/records/15066450/files/flowdock_hp_baseline_method_predictions.tar.gz
323342tar -xzf flowdock_hp_baseline_method_predictions.tar.gz
324343rm flowdock_hp_baseline_method_predictions.tar.gz
325344
326- wget https://zenodo.org/records/14660031/files/neuralplexer_baseline_method_predictions.tar.gz
345+ wget https://zenodo.org/records/15066450/files/neuralplexer_baseline_method_predictions.tar.gz
327346tar -xzf neuralplexer_baseline_method_predictions.tar.gz
328347rm neuralplexer_baseline_method_predictions.tar.gz
329348
330- wget https://zenodo.org/records/14660031/files/vina_p2rank_baseline_method_predictions.tar.gz
349+ wget https://zenodo.org/records/15066450/files/vina_p2rank_baseline_method_predictions.tar.gz
331350tar -xzf vina_p2rank_baseline_method_predictions.tar.gz
332351rm vina_p2rank_baseline_method_predictions.tar.gz
333352
334- wget https://zenodo.org/records/14660031/files/rfaa_baseline_method_predictions.tar.gz
353+ wget https://zenodo.org/records/15066450/files/rfaa_baseline_method_predictions.tar.gz
335354tar -xzf rfaa_baseline_method_predictions.tar.gz
336355rm rfaa_baseline_method_predictions.tar.gz
337356```
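
Since every archive above follows the same download, extract, delete pattern, the block can be condensed into a loop. The sketch below is one possible way to do that, assuming the updated Zenodo record `15066450`; the helper names `zenodo_url`, `fetch_archive`, and `fetch_all_baselines` are hypothetical and not part of the repository.

```bash
# Zenodo record ID taken from the updated URLs above.
ZENODO_RECORD="15066450"

# Method prefixes taken from the archive names listed above.
BASELINE_METHODS="alphafold3 chai diffdock dynamicbind flowdock flowdock_aft flowdock_pft flowdock_esmfold flowdock_chai flowdock_hp neuralplexer vina_p2rank rfaa"

# Print the download URL for one archive file name.
zenodo_url() {
  printf 'https://zenodo.org/records/%s/files/%s\n' "$ZENODO_RECORD" "$1"
}

# Download, extract, and delete one archive; a failed step skips the rest.
fetch_archive() {
  archive="$1"
  wget "$(zenodo_url "$archive")" &&
    tar -xzf "$archive" &&
    rm "$archive"
}

# Fetch every baseline archive, stopping at the first failure.
fetch_all_baselines() {
  for method in $BASELINE_METHODS; do
    fetch_archive "${method}_baseline_method_predictions.tar.gz" || return 1
  done
}
```

Nothing is downloaded until `fetch_all_baselines` is called. Chaining the steps with `&&` means a failed download is never followed by an extraction attempt on a missing or partial file.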
@@ -353,13 +372,13 @@ jupyter notebook notebooks/casp16_binding_affinity_prediction_results_plotting.i
353372For example, generate new protein-ligand complexes for a pair of protein sequence and ligand SMILES strings such as those of the PDBBind 2020 test target ` 6i67 `
354373
355374``` bash
356- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='YNKIVHLLVAEPEKIYAMPDPTVPDSDIKALTTLCDLADRELVVIIGWAKHIPGFSTLSLADQMSLLQSAWMEILILGVVYRSLFEDELVYADDYIMDEDQSKLAGLLDLNNAILQLVKKYKSMKLEKEEFVTLKAIALANSDSMHIEDVEAVQKLQDVLHEALQDYEAGQHMEDPRRAGKMLMTLPLLRQTSTKAVQHFYNKLEGKVPMHKLFLEMLEAKV' input_ligand='"c1cc2c(cc1O)CCCC2"' input_template=data/pdbbind/pdbbind_holo_aligned_esmfold_structures/6i67_holo_aligned_esmfold_protein.pdb sample_id='6i67' out_path='./6i67_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
375+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='YNKIVHLLVAEPEKIYAMPDPTVPDSDIKALTTLCDLADRELVVIIGWAKHIPGFSTLSLADQMSLLQSAWMEILILGVVYRSLFEDELVYADDYIMDEDQSKLAGLLDLNNAILQLVKKYKSMKLEKEEFVTLKAIALANSDSMHIEDVEAVQKLQDVLHEALQDYEAGQHMEDPRRAGKMLMTLPLLRQTSTKAVQHFYNKLEGKVPMHKLFLEMLEAKV' input_ligand='"c1cc2c(cc1O)CCCC2"' input_template=data/pdbbind/pdbbind_holo_aligned_esmfold_structures/6i67_holo_aligned_esmfold_protein.pdb sample_id='6i67' out_path='./6i67_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
357376```
358377
359378Or, for example, generate new protein-ligand complexes for pairs of protein sequences and (multi-)ligand SMILES strings (delimited via ` | ` ) such as those of the CASP15 target ` T1152 `
360379
361380``` bash
362- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPN' input_ligand='"CC(=O)NC1C(O)OC(CO)C(OC2OC(CO)C(OC3OC(CO)C(O)C(O)C3NC(C)=O)C(O)C2NC(C)=O)C1O"' input_template=data/test_cases/predicted_structures/T1152.pdb sample_id='T1152' out_path='./T1152_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
381+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPN' input_ligand='"CC(=O)NC1C(O)OC(CO)C(OC2OC(CO)C(OC3OC(CO)C(O)C(O)C3NC(C)=O)C(O)C2NC(C)=O)C1O"' input_template=data/test_cases/predicted_structures/T1152.pdb sample_id='T1152' out_path='./T1152_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
363382```
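
When the chains or ligands live in separate shell variables, the `|`-delimited argument can be assembled programmatically. This is an illustrative sketch only: `join_with_pipe` is a hypothetical helper, and the two SMILES strings (ethanol and benzene) are placeholders rather than inputs from any benchmark.

```bash
# Join all arguments with '|' (runs in a subshell so IFS is not changed for the caller).
join_with_pipe() (
  IFS='|'
  printf '%s\n' "$*"
)

# Placeholder SMILES strings, used here only to show the delimiter handling.
ligands=$(join_with_pipe 'CCO' 'c1ccccc1')

# Wrap the joined string in inner double quotes, as in the commands above.
echo "input_ligand='\"${ligands}\"'"
# prints: input_ligand='"CCO|c1ccccc1"'
```

The same helper applies to multi-chain `input_receptor` values, which the `T1152` example above also separates with `|`.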
364383
365384If you do not already have a template protein structure available for your target of interest, set ` input_template=null ` to instead have the sampling script predict the ESMFold structure of your provided ` input_receptor ` sequence before running the sampling pipeline. For more information regarding the input arguments available for sampling, please refer to the config at ` configs/sample.yaml ` .
@@ -369,7 +388,7 @@ If you do not already have a template protein structure available for your targe
369388For instance, one can perform batched prediction as follows:
370389
371390``` bash
372- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling csv_path='./data/test_cases/prediction_inputs/flowdock_batched_inputs.csv' out_path='./T1152_batch_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=false auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
391+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling csv_path='./data/test_cases/prediction_inputs/flowdock_batched_inputs.csv' out_path='./T1152_batch_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=false auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
373392```
374393
375394</details >