4 changes: 2 additions & 2 deletions docs/doc.rst
@@ -1,7 +1,7 @@
Documentation
**************

The prototypes of all classes and methods of iScore are specified here.

.. toctree::
   :maxdepth: 4
@@ -30,4 +30,4 @@ SVM

.. automodule:: iScore.rank
   :members:
   :undoc-members:
12 changes: 6 additions & 6 deletions docs/graph.rst
@@ -6,9 +6,9 @@ Generating the Graphs :
----------------------------


The first step in iScore is to generate the bipartite graph of the interface of a decoy model. In this graph, each node represents an interface residue and is labeled with the PSSM profile of that residue. Two nodes are connected if the corresponding residues form a contact pair between the two proteins of the decoy model.
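
The edge rule can be illustrated with a minimal sketch in plain Python. It is not iScore's implementation: for illustration only, it assumes that each residue is reduced to a single 3D coordinate (real contact detection usually considers all atoms), and it uses the 6 Å cutoff mentioned in the introduction. The function name ``contact_edges`` and its input format are hypothetical. Because residues of the same chain are never compared, every edge links a residue of one chain to a residue of the other, which is what makes the graph bipartite.

```python
from math import dist


def contact_edges(chain_a, chain_b, cutoff=6.0):
    """Return the edges of a bipartite contact graph.

    chain_a and chain_b map a residue id to an (x, y, z) coordinate in
    angstrom (hypothetical input format). Only pairs made of one residue of
    each chain are tested, so no edge ever joins two residues of one chain.
    """
    edges = []
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            if dist(xyz_a, xyz_b) <= cutoff:
                edges.append((res_a, res_b))
    return edges


# Toy coordinates, one point per residue
chain_a = {"A:10": (0.0, 0.0, 0.0), "A:11": (20.0, 0.0, 0.0)}
chain_b = {"B:5": (3.0, 4.0, 0.0), "B:6": (0.0, 0.0, 7.0)}

edges = contact_edges(chain_a, chain_b)
# The nodes of the graph are the contact residues, i.e. residues in some edge
nodes = sorted({res for edge in edges for res in edge})
print(edges)  # [('A:10', 'B:5')]
print(nodes)  # ['A:10', 'B:5']
```

Residues without any contact (here ``A:11`` and ``B:6``, which is 7 Å away from ``A:10``) do not appear in the graph.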

To create the graph, you need a PDB file of the interface and two PSSM files (one for each protein chain) created by the PSSMGen tool. To generate the graph, simply use:

>>> from iScore.graph import GenGraph, Graph
>>>
@@ -19,14 +19,14 @@ To create the graph one needs the PDB file of the interface and the two PSSM fil
>>> g.construct_graph()
>>> g.export_graph('name.pkl')

This simple example constructs the bipartite graph and exports it to a pickle file. A working example can be found in ``example/graph/create_graph.py``.

The function ``iscore_graph()`` makes it easy to generate the graphs of a large number of conformations. By default, it creates the graphs of all conformations stored in the subfolder ``./pdb/``, using the PSSM files stored in the subfolder ``./pssm/``, and stores the resulting graphs in the subfolder ``./graph/``.
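
The default folder layout can be pictured with the following sketch, which pairs each PDB file in ``./pdb/`` with the PSSM files in ``./pssm/``. The naming rule it relies on (PSSM file names starting with the stem of the PDB file, followed by a dot) is a hypothetical assumption made for illustration, not iScore's documented convention, and the function name ``find_inputs`` is made up.

```python
from pathlib import Path


def find_inputs(pdb_dir="./pdb", pssm_dir="./pssm"):
    """Map each PDB file stem to the PSSM files whose names start with it.

    Hypothetical naming rule, for illustration only:
    pdb/1AK4.pdb  ->  pssm/1AK4.A.pssm and pssm/1AK4.B.pssm
    """
    pssm_files = sorted(Path(pssm_dir).glob("*.pssm"))
    inputs = {}
    for pdb_file in sorted(Path(pdb_dir).glob("*.pdb")):
        prefix = pdb_file.stem + "."
        inputs[pdb_file.stem] = [p for p in pssm_files if p.name.startswith(prefix)]
    return inputs


if __name__ == "__main__":
    # One PSSM file per protein chain is expected for each conformation
    for case, pssms in find_inputs().items():
        status = "ok" if len(pssms) == 2 else f"expected 2 PSSM files, found {len(pssms)}"
        print(case, status)
```

Run from the folder that contains ``pdb/`` and ``pssm/``, the script reports the conformations whose PSSM files seem to be missing under that assumed naming rule.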

Generating the Graph Kernels :
-------------------------------------

Once the graphs of the conformations have been computed, iScore can compute the kernels of all pairs of graphs. An example can be found in ``example/kernel/create_kernel.py``.

>>> from iScore.graph import Graph, iscore_graph
>>> from iScore.kernel import Kernel
@@ -41,4 +41,4 @@ Once we have calculated the graphs of multiple conformation we can simply comput
>>> # run the calculations
>>> ker.run(lamb=1.0, walk=4, check=checkfile)

The kernels between pairs of graphs are computed by the ``Kernel`` class. By default, the method ``Kernel.import_from_mat()`` imports all the graphs stored in the subfolder ``graph/``. To compute all pairwise kernels of the loaded graphs, call the method ``Kernel.run()`` on the ``Kernel`` instance, as done with ``ker.run()`` in the example above. Its arguments ``lamb`` and ``walk`` set the value of lambda and the length of the random walk, respectively.
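
To make the roles of lambda and of the walk length concrete, here is a minimal sketch of a generic random walk graph kernel on the direct product graph, in plain Python. It is not necessarily the exact formulation used by iScore (see the article cited in the introduction for that): the toy graphs use discrete node labels, two nodes are matched only if their labels are equal, and the function name ``rw_kernel`` is hypothetical. The kernel counts the walks shared by both graphs, weighting walks of length k by ``lamb**k`` for k from 0 to ``walk``.

```python
def rw_kernel(g1, g2, lamb=1.0, walk=4):
    """Generic random walk graph kernel (toy version with discrete node labels).

    A graph is a pair (labels, edges): labels[i] is the label of node i and
    edges is a set of undirected (i, j) node-index pairs. The result is
    sum_{k=0..walk} lamb**k * (number of walks of length k in the direct
    product graph), i.e. the weighted number of common walks.
    """
    def adjacency(size, edges):
        adj = [set() for _ in range(size)]
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    (labels1, edges1), (labels2, edges2) = g1, g2
    adj1 = adjacency(len(labels1), edges1)
    adj2 = adjacency(len(labels2), edges2)

    # Nodes of the direct product graph: pairs of nodes with identical labels
    pairs = [(a, b) for a in range(len(labels1)) for b in range(len(labels2))
             if labels1[a] == labels2[b]]
    index = {pair: n for n, pair in enumerate(pairs)}

    # counts[n] = number of product-graph walks of the current length from pairs[n]
    counts = [1.0] * len(pairs)
    kernel = sum(counts)  # walks of length 0
    for length in range(1, walk + 1):
        new_counts = []
        for a, b in pairs:
            total = 0.0
            # (a, b) and (na, nb) are adjacent in the product graph when a-na
            # is an edge of g1, b-nb is an edge of g2 and the labels match
            for na in adj1[a]:
                for nb in adj2[b]:
                    n = index.get((na, nb))
                    if n is not None:
                        total += counts[n]
            new_counts.append(total)
        counts = new_counts
        kernel += lamb ** length * sum(counts)
    return kernel


# Two identical toy graphs: a single edge between nodes labeled "A" and "B"
g = (["A", "B"], {(0, 1)})
print(rw_kernel(g, g, lamb=1.0, walk=4))  # 10.0
```

Values of ``lamb`` below 1 down-weight long walks, while ``walk`` caps their length. In iScore the node labels are PSSM profiles rather than discrete symbols, so the label equality used here would be replaced by a similarity between profiles; that part is not shown.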
13 changes: 7 additions & 6 deletions docs/install.rst
@@ -17,19 +17,20 @@ Test the installation
To test the module, go to the test folder (``cd ./test``) and run ``pytest``.

These tests are automatically run on Travis CI at each new push.
If the Travis CI build badge of the repository displays "passing", the tests should also pass on your installation.

Required Dependencies
------------------------

The code is written in Python 3. Several packages, most of them standard, are required to run it. The main dependencies are:

* NumPy

* Biopython

* libsvm (https://github.com/cjlin1/libsvm/tree/master/python)

* mpi4py

* pdb2sql (https://github.com/DeepRank/pdb2sql)
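
A quick way to check that these dependencies can be imported, before running the tests, is the short standard-library script below. The import names are assumptions: in particular, Biopython is imported as ``Bio``, and the libsvm Python bindings are assumed to be importable as ``libsvm``, which depends on how they were installed.

```python
import importlib.util

# Import name assumed for each dependency (assumption: Biopython is imported
# as "Bio" and the libsvm Python bindings as "libsvm")
DEPENDENCIES = {
    "Numpy": "numpy",
    "Biopython": "Bio",
    "libsvm": "libsvm",
    "mpi4py": "mpi4py",
    "pdb2sql": "pdb2sql",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the names of the packages whose top-level module cannot be found."""
    return [name for name, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

``find_spec`` only locates the modules without importing them, so the check is fast and does not start MPI.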
7 changes: 4 additions & 3 deletions docs/intro.rst
@@ -3,16 +3,17 @@
Introduction
=============================

**iScore: an MPI-supported software for ranking protein-protein docking models based on a random walk graph kernel and support vector machines**

The software accompanies the following publication:

C. Geng *et al.*, *iScore: A novel graph kernel-based function for scoring protein-protein docking models*, bioRxiv 2018, https://doi.org/10.1101/498584


iScore uses a support vector machine (SVM) approach to rank protein-protein docking models based on their interface information. Each interface is represented as a bipartite graph, in which each node represents a contact residue and each edge indicates that the two residues it connects, which belong to different protein chains, are close to each other in 3D space (the current distance cutoff is 6 Å). Edges are currently unlabeled, and each node is labeled with a 20-element vector taken from the Position Specific Scoring Matrix (PSSM) of the corresponding residue.

To measure the similarity between two graphs, iScore uses a random walk graph kernel (RWGK). The graph kernel matrix of all graph pairs is then used as input to the SVM, either to train a model on a training set or to rank new protein-protein docking models with a pretrained model.
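
The kernel matrix is the table of kernel values for all pairs of graphs. The sketch below, in plain Python, builds it for any kernel function: a symmetric matrix over the training graphs, and a rectangular matrix between test graphs and training graphs, which is what a pretrained model needs to rank new docking models. The function names are hypothetical, and the edge-counting kernel is only a placeholder standing in for the random walk graph kernel.

```python
def kernel_matrix(graphs_a, graphs_b, kernel):
    """Rectangular matrix K with K[i][j] = kernel(graphs_a[i], graphs_b[j])."""
    return [[kernel(ga, gb) for gb in graphs_b] for ga in graphs_a]


def training_kernel_matrix(graphs, kernel):
    """Symmetric matrix of the training set; each pair is evaluated only once."""
    n = len(graphs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = kernel(graphs[i], graphs[j])
    return K


def shared_edges(g1, g2):
    """Placeholder graph kernel: number of edges the two graphs share."""
    return float(len(g1 & g2))


# Toy "graphs" given as sets of edges
train = [{(1, 2), (2, 3)}, {(1, 2)}, {(3, 4)}]
test = [{(1, 2), (3, 4)}]
K_train = training_kernel_matrix(train, shared_edges)  # 3 x 3, to train the SVM
K_test = kernel_matrix(test, train, shared_edges)      # 1 x 3, to score test graphs
print(K_train)  # [[2.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(K_test)   # [[1.0, 1.0, 1.0]]
```

Exploiting the symmetry halves the number of kernel evaluations on the training set, which matters because each graph kernel evaluation is costly.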

.. image :: comp.png

10 changes: 5 additions & 5 deletions docs/pssm.rst
@@ -1,15 +1,15 @@
Computing PSSM files
=============================

As a preprocessing step, users must compute the PSSM files corresponding to the PDB files in the training/testing dataset. This can be achieved with PSI-BLAST (https://ncbiinsights.ncbi.nlm.nih.gov/2017/10/27/blast-2-7-1-now-available/). The Biopython package makes this tool easy to use.


iScore contains a wrapper that computes the PSSM data, maps them to the PDB files and formats them for further processing. The only input needed is the PDB file of the decoy. To compute the PSSM files, simply use:


>>> from iScore.pssm.pssm import PSSM
>>>
>>> gen = PSSM(caseID='1AK4', pdb_dir='1AK4/pdb')
>>>
>>> # generates the FASTA query
>>> gen.get_fasta()
@@ -21,4 +21,4 @@ iScore contains wrapper that allows to compute the PSSM data, map them to the PD
>>> gen.get_pssm()
>>>
>>> # map the pssm to the pdb
>>> gen.map_pssm()
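
Each node of the iScore graphs carries the PSSM row of its residue, i.e. 20 scores, one per amino acid. As an illustration, and not as a description of the iScore wrapper or of the files it writes, the sketch below extracts these 20-element rows from a PSI-BLAST ASCII PSSM, as written by ``psiblast`` with the option ``-out_ascii_pssm``. The assumed line layout is stated in the docstring, and the function name ``read_ascii_pssm`` is hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of PSI-BLAST ASCII PSSMs


def read_ascii_pssm(lines):
    """Return (position, residue, scores) for every data line.

    Assumed data-line layout: position, one-letter residue, then 20 log-odds
    scores (in AMINO_ACIDS order) followed by further columns that are ignored.
    Header and footer lines do not start with an integer and are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 22 and fields[0].isdigit() and fields[1].isalpha():
            scores = [int(value) for value in fields[2:22]]
            rows.append((int(fields[0]), fields[1], scores))
    return rows


sample = [
    "Last position-specific scoring matrix computed",
    "            A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V",
    "    1 M    -1  -2  -2  -3  -2  -1  -2  -3  -2   1   2  -2   6   0  -3  -2  -1  -2  -1   1  0.47 0.12",
]
position, residue, scores = read_ascii_pssm(sample)[0]
print(position, residue, dict(zip(AMINO_ACIDS, scores))["M"])  # 1 M 6
```

To read a real file, pass the open file object, e.g. ``with open(path) as handle: rows = read_ascii_pssm(handle)``.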
10 changes: 5 additions & 5 deletions docs/viz.rst
@@ -1,20 +1,20 @@
Visualizing the connection graphs
======================================

iScore makes it easy to visualize the bipartite graphs using PyMOL and the HDF5 browser provided with the software. First, the bipartite graphs must be stored in an HDF5 file. To do so, generate the graphs as follows:


>>> from iScore.graphrank.graph import iscore_graph
>>> iscore_graph(pdb_path=<pdb_path>,
...              pssm_path=<pssm_path>,
...              export_hdf5=True)

where ``pdb_path`` and ``pssm_path`` specify the folders containing the PDB files and the PSSM files, respectively. By default, these are ``./pdb/`` and ``./pssm/``. The script above creates an HDF5 file containing the graphs.

The generated HDF5 file can be explored with the dedicated HDF5 browser. To start the browser, go to the ``./h5x/`` folder and type:

``./h5x.py``

This opens the HDF5 browser. You can open an HDF5 file by clicking on the file icon at the bottom left of the browser. Once the file is opened, its content is displayed in the browser. Right-click on the name of a conformation and choose ``3D Plot``. This opens PyMOL and lets you visualize the bipartite graph.

.. image :: h5x_iscore.png
19 changes: 11 additions & 8 deletions docs/workflow.rst
@@ -1,23 +1,26 @@
iScore Workflow
========================

One of the main features of iScore is its set of serial and MPI binaries, which fully automate the workflow and can be used directly from the command line. To illustrate their use, go to the folder ``iScore/example/training_set/``. This folder contains the subfolders ``pdb/`` and ``pssm/``, which hold the PDB and PSSM files of our training set, as well as a ``test/`` subfolder. The binary class of each of these PDBs is specified in the file ``caseID.lst``.

Training
------------------------

Training a model with iScore can be done in a single line using the MPI binaries:

``$ cd iScore/example/training_set/train``

``$ mpiexec -n 2 iScore.train.mpi``

This command first generates the graphs of the conformations stored in ``pdb/``, using the corresponding PSSM files in ``pssm/`` as features. These graphs are stored as pickle files in ``graph/``. The command then computes the pairwise kernels of these graphs and stores the kernel files in ``kernel/``. Finally, it trains an SVM model using the kernel files and the ``caseID.lst`` file, which contains the binary class of each conformation.

The calculated graphs and the SVM model are stored in a single tar file, here called ``training_set.tar.gz``. This file contains all the information needed to predict the binary classes of decoy models in a test set using the trained model.
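
Prediction with such a model only needs the kernels between a test graph and the training graphs. The sketch below shows the textbook SVM decision function for a precomputed kernel, f(x) = sum_i alpha_i y_i K(x_i, x) + b, in plain Python. The coefficients are made-up toy numbers standing in for the trained model stored in the archive; mapping a positive decision value to class 1 is an assumed convention, and libsvm's own label ordering may flip the sign. Ranking the conformations then amounts to sorting them by decision value.

```python
def decision_value(kernel_row, dual_coefs, bias):
    """SVM decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    kernel_row[i] is K(x_i, x) between training graph i and the test graph;
    dual_coefs[i] is alpha_i * y_i (0 for graphs that are not support vectors).
    """
    return sum(c * k for c, k in zip(dual_coefs, kernel_row)) + bias


def predict(kernel_rows, dual_coefs, bias):
    """Return (predicted class, decision value) for each test graph.

    Assumed convention: a positive decision value means class 1, otherwise 0.
    """
    results = []
    for row in kernel_rows:
        value = decision_value(row, dual_coefs, bias)
        results.append((1 if value > 0 else 0, value))
    return results


# Hypothetical toy model with 3 training graphs, applied to 2 test graphs
dual_coefs = [0.5, -0.25, 0.0]
bias = -0.1
kernel_rows = [[1.0, 0.2, 0.3], [0.1, 1.0, 0.4]]
results = predict(kernel_rows, dual_coefs, bias)

# Rank the test graphs from the highest to the lowest decision value
ranked = sorted(results, key=lambda r: r[1], reverse=True)
for rank, (pred, value) in enumerate(ranked, start=1):
    print(rank, pred, round(value, 3))
```
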

Testing
------------------------

To predict the binary classes (and decision values) of new conformations, go to the subfolder ``test/``. It contains 5 conformations, specified by the PDB and PSSM files stored in ``pdb/`` and ``pssm/``, that we use as a test set. These conformations can be ranked with a single command:

``$ mpiexec -n 2 iScore.predict.mpi --archive ../train/training_set.tar.gz``

This command first computes the graphs of the conformations in the test set and stores them in ``graph/``. It then computes the pairwise kernels between each graph of the test set and all the graphs of the training set stored in the tar file. These kernels are stored in ``kernel/``. Finally, the binary uses the trained SVM model contained in the tar file to predict the binary classes and decision values of the conformations in the test set. The results are stored in a pickle file, ``iScorePredict.pkl``, and a text file, ``iScorePredict.txt``. Opening the text file, you will see:

+--------+--------+---------+-------------------+
|Name | label| pred| decision_value|
@@ -40,7 +43,7 @@ The ground truth label are here all None because they were not provided in the t
Serial Binaries
------------------------

Serial binaries, ``iscore.train`` and ``iscore.predict``, are also provided. They are used in the same way as the MPI binaries, except that they run as a single process and are therefore called directly, without ``mpiexec``.