Introduction

The characterization of protein-protein association mechanisms is crucial to understand how biological processes occur. Previous work in this field has shown that the initial formation of non-specific encounter complexes enhances the formation of the stereospecific complex by reducing the dimensionality of the search process. The rate at which a binary complex forms can play a relevant role in cell biology, and it can be affected by the diffusion that brings the two proteins together. Predicting the binding free energy of proteins provides new opportunities to modulate and control protein-protein interactions. However, existing methods rely on the structure of the complex to make predictions, which severely limits their applicability to the few interactions whose complex structure is known. Here, we offer a new prediction method based on the decoys obtained with a docking approach. The strategy implemented in this work allows us to predict binding affinities from the unbound tertiary structures. We have tested the approach on a set of globular soluble proteins from the newest affinity benchmark, obtaining a prediction accuracy comparable to that of other state-of-the-art methods.

The method

a) Docking sampling

The first step consists of an exhaustive sampling of the conformational space using PatchDock. Although the final goal of docking is to find near-native conformations among a large variety of different orientations, the analysis of the whole set of poses gives insight into the encounter complex and the association process.

b) Docking scoring

We score each docking pose with the ES3DC statistical potential. Then we obtain a global score for the interaction by averaging the scores of all poses.
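A minimal sketch of this averaging step, assuming the per-pose ES3DC scores have already been computed (ES3DC itself is not reimplemented here). The function name `global_score` and the example scores are hypothetical.

```python
from statistics import fmean


def global_score(pose_scores: list[float]) -> float:
    """Average the per-pose scores of one docking run into a single global score.

    `pose_scores` holds one ES3DC score per docking pose; the values used
    below are hypothetical placeholders.
    """
    if not pose_scores:
        raise ValueError("at least one docking pose is required")
    return fmean(pose_scores)


# Hypothetical ES3DC scores for four docking poses.
print(global_score([-12.0, -8.5, -10.0, -9.5]))  # -10.0
```

Averaging over every pose, rather than keeping only the top-ranked one, is what lets the global score reflect the whole set of sampled orientations described in the docking sampling step.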

c) Binding energy Prediction

We use a linear regression model to predict the binding affinity (ΔG) from the averaged score.
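The fitted coefficients are not reported here, so the following is only a sketch of an ordinary least-squares fit of ΔG = a + b·S, where S is the averaged docking score of a complex. The helper names `fit_linear` and `predict_dg` and the training values are hypothetical.

```python
def fit_linear(scores: list[float], dg: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of dG = intercept + slope * score."""
    n = len(scores)
    if n != len(dg) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(scores) / n
    mean_y = sum(dg) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("scores must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, dg))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_dg(score: float, intercept: float, slope: float) -> float:
    """Predicted binding free energy (kcal/mol) for one averaged score."""
    return intercept + slope * score


# Hypothetical training data: averaged scores and experimental dG (kcal/mol).
a, b = fit_linear([-10.0, -8.0, -6.0], [-12.0, -10.0, -8.0])
print(a, b)                      # -2.0 1.0
print(predict_dg(-9.0, a, b))    # -11.0
```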

Validation

We used the Binding Affinity Benchmark 2 to train and test a prediction model based on the poses resulting from a docking experiment. This benchmark consists of 179 non-redundant, high-quality structures of protein complexes classified by biological function: complexes involving enzymes, antibody-antigen complexes, and others (involving membrane-bound receptors, G-proteins and a set of miscellaneous protein types and functions). In addition, the interface RMSD (I-RMSD) is reported for each entry. This measure estimates the degree of conformational change upon binding for a particular protein in a particular complex, which allows the dataset to be split into rigid (I-RMSD < 1 Å) and flexible (I-RMSD ≥ 1 Å) interactions. We restrict our dataset to globular soluble proteins by omitting the categories of membrane-bound receptors and G-proteins. We also omit antibody-antigen complexes, as we consider them a particular case of protein-protein interaction whose recognition and binding mechanisms may be more intricate. The trimmed dataset is referred to here as AB2 (from Affinity Benchmark 2).
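The rigid/flexible split described above reduces to a threshold on the interface RMSD. A minimal sketch, where the entry identifiers and I-RMSD values are hypothetical placeholders rather than benchmark data:

```python
def flexibility_class(i_rmsd: float) -> str:
    """Rigid if interface RMSD < 1 Å, flexible otherwise (threshold from the text)."""
    return "rigid" if i_rmsd < 1.0 else "flexible"


def split_by_flexibility(entries: dict[str, float]) -> dict[str, list[str]]:
    """Group entries (id -> I-RMSD in Å) into rigid and flexible sets."""
    groups: dict[str, list[str]] = {"rigid": [], "flexible": []}
    for entry_id, i_rmsd in entries.items():
        groups[flexibility_class(i_rmsd)].append(entry_id)
    return groups


# Hypothetical entries; "cplx2" sits exactly on the 1 Å boundary.
print(split_by_flexibility({"cplx1": 0.45, "cplx2": 1.0, "cplx3": 2.3}))
# {'rigid': ['cplx1'], 'flexible': ['cplx2', 'cplx3']}
```

An I-RMSD of exactly 1 Å falls in the flexible set, matching the ≥ 1 Å boundary.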

The linear regression model obtained with the whole set of decoys from a docking search and the ES3DC statistical potential shows a significant average Pearson correlation (r = 0.36) between the experimental and predicted values of ΔG, with an average root-mean-square error (RMSE) of 2.84 kcal/mol in the ten-fold cross-validation. Splitting the data into rigid and flexible cases yields average correlations of r = 0.40 (RMSE = 3.03 kcal/mol) for rigid cases and r = 0.27 (RMSE = 2.49 kcal/mol) for flexible cases.
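For reference, the two reported metrics can be computed as follows. This is a plain-Python sketch; the function names are illustrative, and the inputs are any paired lists of experimental and predicted ΔG values.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(experimental: list[float], predicted: list[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (kcal/mol)."""
    n = len(experimental)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(experimental, predicted)) / n)


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.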


Density plot between experimental and predicted ΔG using the ES3DC statistical potential and all docking poses

Predictions are obtained from the test sets of 1000 random ten-fold cross-validation runs, using the ES3DC averaged scores of all docking poses in the AB2 dataset. Blue lines show the density of predicted vs. experimental ΔG for the rigid cases of AB2, and red lines for the flexible cases.
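The exact fold assignment and averaging procedure are not described here, so the following sketch assumes one common reading: in each repetition the complexes are shuffled and split into ten folds, a linear model is fitted on nine folds and used to predict the held-out fold, and Pearson's r and the RMSE are computed over the pooled held-out predictions; these values are then averaged over all repetitions. All names are hypothetical.

```python
import math
import random


def _fit(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def repeated_kfold(scores, dg, k=10, repeats=1000, seed=0):
    """Average Pearson r and RMSE over `repeats` random k-fold cross-validations."""
    rng = random.Random(seed)
    n = len(scores)
    r_values, rmse_values = [], []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        predicted = [0.0] * n
        for fold in range(k):
            test_idx = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            a, b = _fit([scores[i] for i in train_idx], [dg[i] for i in train_idx])
            for i in test_idx:
                predicted[i] = a + b * scores[i]
        # Metrics over the pooled held-out predictions of this repetition.
        r_values.append(_pearson(dg, predicted))
        rmse_values.append(math.sqrt(sum((e - p) ** 2 for e, p in zip(dg, predicted)) / n))
    return sum(r_values) / repeats, sum(rmse_values) / repeats
```

The sketch assumes the training scores in every fold are not all identical; otherwise the slope is undefined and `_fit` raises `ZeroDivisionError`.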


Submission

Mandatory input:

  • PDB of protein A: the structure (in PDB format) of one of the proteins. Protein A is used as the receptor in the docking process, so we recommend uploading the larger protein as protein A.
  • PDB of protein B: the structure (in PDB format) of the partner protein. Protein B is used as the ligand in the docking process, so we recommend uploading the smaller protein as protein B.
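If it is unclear which partner is larger, a rough check is to count residues before uploading. The sketch below is a hypothetical client-side helper, not part of the server: it counts C-alpha atoms in ATOM records using the fixed-column PDB layout (record name in columns 1-6, atom name in columns 13-16) and orders the two structures so that the larger one becomes protein A.

```python
def count_residues(pdb_text: str) -> int:
    """Approximate residue count: number of C-alpha atoms in ATOM records.

    Only ATOM records are read, so calcium ions (HETATM "CA") are ignored.
    Alternate locations of the same C-alpha are counted separately, so this
    is only a rough size estimate.
    """
    count = 0
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name (0-based slice [12:16]).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            count += 1
    return count


def assign_roles(pdb_1: str, pdb_2: str) -> tuple[str, str]:
    """Return (protein_A, protein_B), putting the larger structure first."""
    if count_residues(pdb_2) > count_residues(pdb_1):
        return pdb_2, pdb_1
    return pdb_1, pdb_2
```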

After submitting a job, the user is given a code to retrieve the results, which are linked to a specific web address.

Results

To retrieve the results the user can:

  • Enter the code provided after submission in the Results tab
  • Save or bookmark the link assigned to the provided code

While the prediction is still running, the Results page keeps reloading.

Once the computations are completed, the predicted binding energy (kcal/mol) is shown.

Results for each submission are stored for 7 days before being completely erased. For any questions or suggestions about the BADock Server, please contact us.

Input: troubleshooting errors

Why do I get a “The Upload PDB of protein A/B input is empty” error?

This error is shown when a mandatory input (PDB of protein A or B) is empty. Upload both PDB files to solve this error.

Why do I get an “Allowed file size exceeded. (Max. 10 MB)” error?

To maintain good server performance, PDB files cannot exceed 10 MB.
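To avoid this error, the file size can be checked locally before uploading. A minimal sketch: whether the server's 10 MB limit means 10,000,000 or 10 × 1024 × 1024 bytes is not stated, so the constant below assumes the stricter 10,000,000-byte reading, and the helper name is hypothetical.

```python
import os

# Assumption: "10 MB" taken as 10,000,000 bytes (the stricter reading).
MAX_UPLOAD_BYTES = 10_000_000


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the file at `path` is larger than `limit` bytes."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the upload limit is {limit} bytes")
```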

Input: restrictions

Do the query PDBs need to be single-chain?

No, PDBs can be single-chain or multi-chain. However, the binding energy will be computed for the interface formed by PDB A and PDB B.

Do the query proteins need to be soluble globular proteins?

Yes, the predictor has only been validated on globular soluble proteins.