The number and arrangement of subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to how it interacts with other molecules in a biological system. With the explosion of protein sequences generated in the Post-Genomic Age, it is vital to develop an automated method to deal with this challenge. To explore the problem, we adopted an approach in which a protein sample is represented by the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor is advantageous in that it combines evolutionary information with sequence-correlated information. However, incorporating all these effects into one descriptor may cause a 'high dimension disaster'. To overcome this problem, Chou and Shen adopted a fusion approach. Here a completely different approach is taken: principal component analysis (PCA), a linear dimensionality reduction algorithm, is introduced to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. The jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
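The dimensionality reduction step described in the abstract above can be illustrated with a minimal sketch. It assumes NumPy is available and uses random data as a hypothetical stand-in for Pse-PSSM descriptors; the function name `pca_reduce`, the matrix sizes, and the number of components are illustrative choices, not values from the paper.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each kept component.
    explained = (S[:n_components] ** 2) / (S ** 2).sum()
    return Xc @ components.T, components, explained

# Hypothetical stand-in for descriptors: 50 proteins x 40 features that
# really vary along only 3 hidden directions, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 40))
X = latent @ mixing + 0.01 * rng.normal(size=(50, 40))

Z, components, explained = pca_reduce(X, 3)
print(Z.shape)          # reduced descriptors: one short vector per protein
print(explained.sum())  # share of variance retained by 3 components
```

In practice the reduced vectors `Z` would be passed to whatever classifier is used downstream; how many components to keep is a tuning decision that this sketch does not address.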
The electronic structure of protein chains L and M in the photosynthetic reaction center (PRC) of Rhodobacter sphaeroides (Van Niel) Imhoff, Truper et Pfennig was studied using the Overlapping Dimer Approximation method and the Extended Negative Factor Counter method at the ab initio level. The results indicate that: (1) The amino acid residues whose molecular orbitals make up the main components of the frontier orbitals are located in the random coil areas of chain L but in the alpha helix areas of chain M. Since a random coil is flexible and changes its conformation more easily during electron transfer, thereby reducing the energy of the system, whereas the alpha helix structure is relatively stable, this difference might be one of the reasons why electron transfer in the PRC takes place only along the L branch. (2) The His residues axially coordinated to the 'special pair' P and to the accessory chlorophyll molecules (ABChls) are essential for the E-LUMO levels of P and ABChl. However, the corresponding molecular orbitals of these His residues do not appear in the composition of the frontier orbitals of the protein chains. This means that the interaction between the pigment molecules and the protein chains does not explicitly influence the composition of the frontier orbitals of the protein chains, but does influence the corresponding E-LUMO levels significantly.
A distance measure intended to indicate the evolutionary relationships among protein structures has been developed based on the spatial preference factors of residues. The spatial preference factor reflects the environment of residues in the tertiary structure. Comparing the phyletic relationships derived from sequence homologies with those derived from three-dimensional structures, we find that the two lines of evolution are similar in general. This approach is applied here to a group of glins.
In this work, we investigate the orientation preferences between amino acids, with orientation defined from the local geometry of the amino acids concerned. We find common orientation preferences, (70°, 30°, 140°) and (110°, 340°, 100°), for various pairs of amino acids. Different side chains may strengthen or weaken these common preferences, which is related to the effect of packing. Some amino acids with specific local flexibility may show additional orientation preferences besides the common ones, such as (10°, 280°, 210°). A further analysis of pairs of amino acids with different secondary-structure preferences shows that directional interactions may affect the distribution of orientations more strongly than packing or local flexibility. Together, these results provide some insight into the organization of amino acids in proteins and its relation to the interactions involved.
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate when the sequence identity between the sequences to be aligned is less than 30%. This is because, in such sequences, different residues may play similar structural roles and are then aligned incorrectly when a substitution matrix over all 20 residue types is used. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets reduces the complexity of protein sequences while retaining the key information encoded in them. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for grouping residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are then applied to the recognition of structurally conserved or similar regions in proteins by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N around 9.
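The idea of a reduced alphabet in the abstract above can be made concrete with a small sketch. The nine-group partition below is a hypothetical grouping by rough physicochemical similarity, chosen only for illustration; it is not the clustering derived from DAPS in the paper, and the two sequences are invented.

```python
# Hypothetical 9-group reduced alphabet (illustration only, NOT the
# grouping obtained in the paper). Each group is represented by its
# first letter.
GROUPS = ["AG", "C", "DE", "FWY", "HKR", "ILMV", "NQ", "P", "ST"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())

def identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Two invented, already-aligned fragments with no identical residues,
# but with similar residue types at every position.
seq1 = "MKVLDESTF"
seq2 = "LRIIEDTSY"

print(identity(seq1, seq2))                                    # 0.0
print(identity(reduce_sequence(seq1), reduce_sequence(seq2)))  # 1.0
```

The example only shows how grouping can expose similarity that the 20-letter alphabet hides; a real application would also need substitution matrices built for the reduced alphabet, as the abstract describes.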
Funding (quaternary structure prediction study): Supported by the National Natural Science Foundation of China (Grant No. 60704047).
Funding (amino acid orientation preference study): Project supported by the National Natural Science Foundation of China (Grant Nos. 10204013, 90103031, 10074030, 10474041, 90403120 and 10021001) and the Nonlinear Project (973) of the National Science Ministry, China.
Funding (reduced alphabet sequence alignment study): Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM.