### abstract ###
AIMX A new theoretical survey of proteins' resistance to constant speed stretching is performed for a set of 17 134 proteins as described by a structure-based model.
OWNX The proteins selected have no gaps in their structure determination and consist of no more than 250 amino acids.
MISC Our previous studies have dealt with 7510 proteins of no more than 150 amino acids.
OWNX The proteins are ranked according to the strength of the resistance.
OWNX Most of the predicted top-strength proteins have not yet been studied experimentally.
OWNX Architectures and folds which are likely to yield large forces are identified.
OWNX New types of potent force clamps are discovered.
OWNX They involve disulphide bridges and, in particular, cysteine slipknots.
OWNX An effective energy parameter of the model is estimated by comparing the theoretical data on characteristic forces to the corresponding experimental values combined with an extrapolation of the theoretical data to the experimental pulling speeds.
OWNX These studies provide guidance for future experiments on single molecule manipulation and should lead to selection of proteins for applications.
OWNX A new class of proteins, involving cystein slipknots, is identified as one that is expected to lead to the strongest force clamps known.
OWNX This class is characterized through molecular dynamics simulations.
### introduction ###
MISC Atomic force microscopy, optical tweezers, and other tools of nanotechnology have enabled induction and monitoring of large conformational changes in biomolecules.
MISC Such studies are performed to assess structure of the biomolecules, their elastic properties, and ability to act as nanomachines in a cell.
MISC Stretching studies of proteins CITATION are of a particular current interest and they have been performed for under a hundred of systems.
MISC Interpretation of some of these experiments has been helped by all-atom simulations, such as reported in refs. CITATION, CITATION.
CONT They are limited by of order 100 ns time scales and thus require using unrealistically large constant pulling speeds.
MISC However, they often elucidate the nature of the force clamp the region responsible for the largest force of resistance to pulling, FORMULA.
CONT All of the experimental and all-atom simulational studies address merely a tiny fraction of proteins that are stored in the Protein Data Bank CITATION.
MISC Thus it appears worthwhile to consider a large set of proteins and determine their FORMULA within an approximate model that allows for fast and yet reasonably accurate calculations.
MISC Structure-based models of proteins, as pioneered by Go and his collaborators CITATION and used in several implementations CITATION CITATION, seem to be suited to this task especially well since they are defined in terms of the native structures away from which stretching is imposed.
MISC There are many ways, all phenomenological, to construct a structure-based model of a protein.
MISC 504 of possible variants are enumerated and 62 are studied in details in ref. CITATION.
MISC The variants differ by the choice of effective potentials, nature of the local backbone stiffness, energy-related parameters, and of the coarse-grained degrees of freedom.
MISC The most crucial choice relates to making a decision about which interactions between amino acids count as native contacts.
MISC Comparing FORMULA to the corresponding experimental values in 36 available cases selects several optimal models CITATION.
MISC Among them, there is one which is very simple and which describes a protein in terms of its FORMULA atoms, as labeled by the sequential index FORMULA.
MISC This model is denoted by FORMULA which stands for, respectively, the Lennard-Jones native contact potentials, local backbone stiffness represented by harmonic terms that favor the native values of local chiralities, the contact map in which there are no FORMULA contacts, and the amplitude of the Lennard-Jones potential, FORMULA, is uniform.
MISC The contact map is determined by assigning the van der Waals spheres to the heavy atoms and by checking whether spheres belonging to different amino acids overlap in the native state CITATION, CITATION.
MISC If they do, a contact is declared as native.
MISC Non-native contacts are considered repulsive.
MISC Application of this criterion frequently selects the FORMULA contacts as native.
MISC If the contact map includes these contacts the resulting model will be denoted here as FORMULA.
MISC On average, it performs worse than FORMULA because the FORMULA contacts usually correspond to the weak van der Waals couplings as can be demonstrated in a sample of proteins by using a software CITATION which analyses atomic configurations from the chemical perspective on molecular bonds.
MISC Thus the FORMULA couplings should better be removed from the contact map .
MISC The survey to determine FORMULA in 7510 model proteins with the number of amino acids, FORMULA, not exceeding 150 and 239 longer proteins has been accomplished twice.
MISC First within the FORMULA model CITATION and soon afterwords within the FORMULA model CITATION.
MISC The first survey also comes with many details of the methodology whereas the second just presents the outcomes.
MISC The two surveys are compared in more details in refs. CITATION, CITATION.
MISC The results differ, particularly when it comes to ranking of the proteins according to the value of FORMULA, but they mutually provide the error bars on the findings.
MISC They both agree, however, on predicting that there are many proteins whose strength should be considerably larger than the frequently studied benchmark the sarcomere protein titin.
MISC Near the top of the list, there is the scaffoldin protein c7A which has been recently measured to have FORMULA of about 480 pN CITATION.
MISC Other findings include establishing correlations with the CATH hierarchical classification scheme CITATION, CITATION, such as that there are no strong FORMULA proteins, and identification of several types of the force clamps.
MISC The large forces most commonly originate in parallel FORMULA that are sheared CITATION.
MISC However, there are also clamps with antiparallel FORMULA, unstructured strands, and other kinds.
MISC The two surveys have been based on the structure download made on July 26, 2005 when the PDB comprised 29 385 entries.
MISC Many of them correspond to nucleic acids, complexes with nucleic acids and with other proteins, carbohydrates, or come with incomplete files and hence the much smaller number of proteins that could be used in the molecular dynamics studies.
AIMX Here, we present results of still another survey which is based on a download of December 18, 2008 which contains 54 807 structure files and leads to 17 134 acceptable structures with FORMULA not exceeding 250.
OWNX These structures are then analyzed through simulations based on the FORMULA model.
OWNX The numerical code has been improved to allow for acceleration of calculations by a factor of 2.
OWNX The 190 structures with the top values of FORMULA in units of FORMULA are shown in Table 1 and Table S1 of the SI, together with the values of titin and ubiquitin to provide a scale.
OWNX As argued in the Materials and Methods section section, the unit of force, FORMULA, is now estimated to be of order 110 pN.
OWNX All of the corresponding proteins are predicted to be much stronger than titin and none but two of them have been studied experimentally yet.
OWNX In addition to the types of force clamps identified before, we have discovered two new mechanisms of sturdiness.
OWNX One of them involves a cysteine slipknot and is found to be operational in all of the 13 top strength proteins.
OWNX In this motif, a slip-loop is pulled out of a cysteine knot-loop.
OWNX Another involves dragging of a single fragment of the main chain across a cysteine knot-loop.
OWNX The two mechanisms are similar in spirit since both involve dragging of the backbone.
OWNX However, in the CSK case, two fragments of the backbone are participating.
OWNX We make a more systematic identification of the CATH-classified architectures that are linked to mechanical strength and then analyze correlations of the data to the SCOP-based grouping CITATION CITATION.
CONT The previous surveys did not relate to the SCOP scheme.
OWNX We identify the CATH-based architectures and SCOP-based folds that are associated with the occurrence of a strong resistance to pulling.
OWNX A general observation, however, is that each such group of structures may also include examples of proteins that unravel easily.
OWNX The dynamics of a protein are very sensitive to mechanical details that are largely captured by the contact map and not just by the appearance of a structure.
OWNX On the other hand, if one were to look for mechanically strong proteins then the architectures and folds identified by us should provide a good starting point.
OWNX We also study the dependence of FORMULA on the pulling velocity and characterize the dependence on FORMULA through distributions of the forces.
MISC The current third survey has been performed within the same FORMULA model as the second survey CITATION.
BASE However, we reuse and extend it here because the editors of Biophysical Journal retracted the second survey CITATION.
OWNX All of the values of FORMULA are deposited at the website LINK and can by accessed by through the PDB structure code.
