Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns
Faculty of Sciences. Mathematics and Computer Science
, 15 p.
University of Antwerp
Background The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. Results Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. Conclusions The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains. Electronic supplementary material The online version of this article (doi:10.1186/s13040-015-0038-4) contains supplementary material, which is available to authorized users.