Laevis will lead to a better understanding of the biological functions, expression regulation mechanisms, pathogenicity and other important aspects of GOLPH2.
The protein universe refers to the collection of all proteins across all organisms in nature. In 1992, there were only 887 protein structures in the Protein Data Bank (PDB), which could be categorized into 120 different tertiary folds. Chothia noticed that about 1/4 of the entries in the EMBL/SwissProt sequence databank were homologous to the 120 folds, and that about 1/3 of the genome sequences were represented in the sequence databank. He therefore suggested that the number of protein tertiary folds in the protein universe should be limited, at around 1500. Remarkably, this simple estimate has stood the test of time and lies near the center of the range of subsequent estimates obtained with more elaborate methods and much larger datasets.
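The extrapolation attributed to Chothia above can be reproduced with a few lines of arithmetic. The sketch below assumes that the two fractions simply multiply, so that the 120 known folds cover about 1/4 × 1/3 of the protein universe; the text gives only the fractions and the rounded result of around 1500, so this reading is an assumption. The function name `extrapolate_fold_count` is hypothetical and used only for illustration.

```python
from fractions import Fraction


def extrapolate_fold_count(known_folds, frac_databank_covered, frac_genome_in_databank):
    """Scale a known fold count up to the whole protein universe.

    Assumption (not stated explicitly in the text): the fraction of the
    universe covered by the known folds is the product of the two fractions.
    """
    coverage = Fraction(frac_databank_covered) * Fraction(frac_genome_in_databank)
    return known_folds / coverage


# Values quoted in the text: 120 folds, 1/4 of databank entries homologous
# to those folds, 1/3 of genome sequences represented in the databank.
estimate = extrapolate_fold_count(120, Fraction(1, 4), Fraction(1, 3))
print(int(estimate))  # 120 * 4 * 3 = 1440, which the text rounds to about 1500
```

Under this reading the estimate is 1440 folds, in line with the quoted figure of around 1500 and of the same order as the 1,195 folds later catalogued by SCOP.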
At present, the PDB holds over 70,000 structures, a set that has been argued to be structurally complete. These structures were categorized into 1,195 folds by SCOP in its 2009 release, consistent with Chothia's original estimate. In contrast to the extensive studies of protein tertiary structure space, the quaternary structure space of protein-protein interactions is relatively unexplored. For example, whether the number of unique protein-protein complex structures is limited and, if so, how many there are, remain largely unanswered questions. Since most proteins perform their physiological functions by interacting with other protein molecules, the answers have practical applications in understanding protein-protein interaction specificity and protein-protein networks. Meanwhile, template-based methods have recently shown promise in protein complex structure modeling; the completeness of the quaternary structure space therefore has important implications for protein-protein docking and structure prediction, and for the forthcoming structural genomics of protein-protein interactions. Exploration of the quaternary structure space has been hampered mainly by the relative dearth of protein-protein complex structures in the PDB, by the lack of an unambiguous definition of protein quaternary structural folds, and by the lack of efficient methods to compare and categorize protein-protein complex structures. Among the few attempts, Aloy and Russell used protein-protein interaction data from high-throughput genomic studies to estimate, on the assumption that homologous proteins participate in similar interactions, that the number of unique protein-protein interactions is around 10,000.
Although this estimate could be meaningful for homologous families of complexes, it is often observed that proteins with different sequences form similar complex structures and interface interactions. Thus, the Aloy-Russell calculation may overestimate the size of the protein-protein interaction space if interactions are counted at the structural level. Here, we present a systematic study of a representative set of protein-protein complex structures in the PDB.