A new chaos game representation (CGR) of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). In the present paper, a CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into the protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data, and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
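To make the idea of turning CGR coordinates into a time series concrete, here is a minimal Python sketch. It assumes the usual four-class reading of the detailed HP model (nonpolar, negatively charged polar, uncharged polar, positively charged polar) and the standard chaos game rule of moving halfway toward the corner of the current residue's class. The corner assignment, the starting point, and the names `cgr_points` and `x_series` are illustrative assumptions, not taken from the abstract.

```python
# Four residue classes of the detailed HP model (one-letter amino acid codes).
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}

# Assumption: which class sits on which corner of the unit square is arbitrary here.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}

RESIDUE_TO_CORNER = {
    aa: CORNERS[name] for name, letters in CLASSES.items() for aa in letters
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the CGR point for each residue: the midpoint between the
    previous point and the corner assigned to the residue's class."""
    x, y = start
    points = []
    for aa in sequence.upper():
        corner = RESIDUE_TO_CORNER.get(aa)
        if corner is None:  # skip ambiguous codes such as X or B
            continue
        x, y = (x + corner[0]) / 2.0, (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def x_series(points):
    """One simple way to read the CGR points as a time series: keep the x coordinates."""
    return [p[0] for p in points]
```

A series such as `x_series(cgr_points(seq))` is the kind of input a long-memory model like ARFIMA(p, d, q) would then be fitted to; the exact walk construction used in the paper is not given in the abstract.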
Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the post-genome era has arrived and proteomics technology is emerging. This paper studies protein molecules from an algebraic point of view. The algebraic system (∑, +, *) is introduced, where ∑ is the set of 64 codons. According to the characteristics of (∑, +, *), a novel quasi-amino acid code classification method is introduced, and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there are very close correlations between the properties of the quasi-amino acids and the codons. These correlations may play an important part in establishing the logical relationship between codons and the quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acid code are very difficult to observe. The present paper shows that (ZU, +, ×) is a field. Furthermore, the operational results show that the codon tga has different properties from the other stop codons. In fact, in the human and ox mitochondrial genetic codes, tga codes for tryptophan rather than acting as a stop codon as in other genetic codes, as reported by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable features of the code for the 20 amino acids; in other words, it addresses the problem that 'the 64 codon assignments of mRNA to amino acids is probably completely wrong', proposed by Yang (2006 Progress in Modern Biomedicine 6 3).
Biological sequence comparison is a fundamental task in computational biology. According to the hydropathy profile of amino acids, a protein sequence is treated as a string over three letters. Three curves of the new protein sequence were defined to describe the protein sequence. A new method to analyze the similarity/dissimilarity of protein sequences was proposed based on the conditional probability of the protein sequence. Finally, the ND6 (NADH dehydrogenase subunit 6) protein sequences of eight species were taken as an example to illustrate the new approach. The results demonstrated that the method is convenient and efficient.
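The reduction to a three-letter string and a conditional-probability description can be sketched as follows. This is only one plausible reading: the abstract gives neither the grouping of the 20 amino acids nor the exact probability used, so the three hydropathy classes below (hydrophobic `H`, neutral `N`, hydrophilic `P`) and the choice of "probability that letter b follows letter a" are assumptions for illustration, as are the function names.

```python
from collections import Counter

# Assumption: an illustrative three-class hydropathy grouping, not the paper's own.
HYDROPATHY_CLASS = {}
for letters, label in (("ACILMFWV", "H"), ("GHPSTY", "N"), ("RNDQEK", "P")):
    for aa in letters:
        HYDROPATHY_CLASS[aa] = label


def to_three_letters(sequence):
    """Map a protein sequence onto the three-letter alphabet, skipping unknown codes."""
    return "".join(
        HYDROPATHY_CLASS[aa] for aa in sequence.upper() if aa in HYDROPATHY_CLASS
    )


def conditional_probabilities(reduced):
    """Estimate P(next letter = b | current letter = a) from adjacent pairs."""
    pair_counts = Counter(zip(reduced, reduced[1:]))
    first_counts = Counter(reduced[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}
```

Two sequences could then be compared through the distance between their conditional-probability tables; the specific distance used in the paper is not stated in the abstract.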
The alignment of many protein or DNA sequences in bioinformatics applications is very complex. Existing Multiple Sequence Alignment (MSA) techniques trade off their objectives: techniques concerned with speed ignore accuracy, whereas techniques concerned with accuracy ignore speed. Alignment means finding the similarity between different sequences with high accuracy. The growing number of sequences makes the problem very complex. Because of the emergence of, rapid development of, and dependence on gene sequencing, sequence alignment has become important in every analysis of biological relationships. Calculating the number of similar amino acids is the primary method for showing that two sequences are related. Time is a main issue in any alignment technique. In this paper, a more effective MSA method is proposed for aligning massive sets of protein sequences while maintaining the highest accuracy with less time consumption. The proposed method depends on the Artificial Fish Swarm (AFS) algorithm, which can overcome most of the challenges of MSA problems. The AFS is exploited to obtain high accuracy in adequate time. AFS has become increasingly popular in applications such as artificial intelligence, computer vision, machine learning, and data-intensive applications. It mimics the behavior of fish searching for food in nature. The AFS mechanisms of preying, swarming, following, moving, and leaping help to increase accuracy and to improve speed by decreasing execution time. The sense organs that help the artificial fish collect information and vision from the environment contribute to accuracy. These features make the alignment operation more efficient and especially suitable for large-scale data. The implementation and experimental results place the proposed AFS as a first choice for alignment compared with well-known multiple sequence alignment algorithms.
To carry out statistical sequence analysis on the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole-genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, the Zipf power law and Shannon entropy, which were developed for the analysis of natural languages and symbolic sequences. The natural genome proteins and artificial genome proteins were compared with each other, and some statistical features of n-grams were discovered. The results show that: the n-grams of whole-genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple uni-gram model can distinguish different organisms; and there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis of the n-grams of whole-genome protein sequences will result in a powerful model for mapping the relationship between protein sequence, structure, and function.
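The two tests named above are easy to state in code. The following sketch counts overlapping n-grams across a set of protein sequences, computes the Shannon entropy of the resulting n-gram distribution in bits, and builds the rank/frequency table that a Zipf plot would be drawn from. The function names are hypothetical, and the abstract does not say whether the authors used overlapping n-grams or base-2 logarithms; both are assumptions here.

```python
from collections import Counter
from math import log2


def ngram_counts(sequences, n):
    """Count overlapping n-grams over all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def zipf_table(counts):
    """Return (rank, frequency) pairs, most frequent n-gram first.
    Under Zipf's law, log(frequency) is roughly linear in log(rank)."""
    return [(rank, c) for rank, (_, c) in enumerate(counts.most_common(), start=1)]
```

Comparing `shannon_entropy(ngram_counts(natural, n))` with the same quantity for shuffled or artificial sequences mirrors the natural-versus-artificial comparison described in the abstract.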
Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) plays an indispensable role in analyzing protein covalent structures. The reliable identification of amino acid residues and modifications relies on mass accuracy, which is highly dependent on calibration. However, the accuracy provided by currently available calibrants still needs improvement in terms of compatibility with multiple tandem MS modes or ion polarity modes, calibratable range, and minimizing suppression of and interference with analyte signals. Here, aiming to develop a versatile calibrant that solves these problems, we designed a synthetic peptide calibrant format R_x(GDP_n)_m (referred to as "Gly-Asp-Pro, GDP") according to the chemical nature of amino acids and polypeptide fragmentation rules in tandem MS. With four types of amino acid residues selected and arranged through rational design, a GDP peptide produces highly regulated fragments that give rise to evenly spaced signals in each tandem MS mode and is compatible with both positive and negative ion modes. In internal calibration, its regulated fragmentation pattern minimizes interference with analyte signals, and using a single peptide as the input minimizes suppression of the analyte signals. As demonstrated by analyses of proteins including a monoclonal antibody and Aβ-42, these features allowed a significant increase in mass accuracy and precision, which improved sequence coverage and sequence resolution in sequence analyses (including de novo sequencing). This rational design strategy may also inspire further development of synthetic calibrants that benefit structural analysis of biomolecules.
Mitotic catastrophe (MC) is a form of programmed cell death induced by mitotic process disorders, which is very important in tumor prevention, development, and drug resistance. Because rapidly increasing MC data are vigorously promoting tumor-related biomedical and clinical studies, a professional and comprehensive database to curate MC-related data is urgently needed. The Mitotic Catastrophe Database (MCDB) consists of 1214 genes/proteins and 5014 compounds collected and organized from more than 8000 research articles. MCDB also defines the confidence level, classification criteria, and uniform naming rules for MC-related data, which greatly improves data reliability and retrieval convenience. Moreover, MCDB provides protein sequence alignment and target prediction functions. The former can be used to predict new potential MC-related genes and proteins, and the latter can facilitate the identification of potential target proteins of unknown MC-related compounds. In short, MCDB is a proprietary, standard, and comprehensive database for MC-related data that will facilitate the exploration of MC by researchers from chemists to biologists in the fields of medicinal chemistry, molecular biology, bioinformatics, oncology, and so on. MCDB is available at http://www.combio-lezhang.online/MCDB/indexhtml/.
Protein sequences, as special heterogeneous sequences, are rare in the amino acid sequence space. The specific sequential order of the amino acids of a protein is essential to its 3D structure. On the whole, however, the correlation between the sequence and the structure of a protein is not so strong. How well does a protein sequence encode its structural information? How does a sequence determine its native structure? Keeping globular proteins in mind, we discuss several problems on the path from sequence to structure.
Tumor initiation and progression are highly intricate biological processes, and mutation-driven tumorigenesis is a primary underlying cause. Personalized cancer vaccines have been developed to exploit these specific mutations, particularly in the form of tumor neoantigens, to induce immune responses, particularly the activation of CD8+ T cells, which can attack malignant cells. Since tumor mutations result in protein sequence alterations distinct from those in normal tissues, therapies that precisely target these alterations could, in principle, confer effective tumor control while minimizing off-target effects.
In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, including, first of all, certain N-terminally located segments. Consequently, we look here for sequences 1) composing human and mouse proteins different from antigen receptors, 2) identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments, and 3) identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites, and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments achieving limits for several predictions or phylogenic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing, and evolution. Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
Machine intelligence is the intelligence exhibited by artificial systems, and it is usually realized on ordinary computers. Rough sets and information granules are widely used in uncertainty management, soft computing, and granular computing, with applications in many fields such as protein sequence analysis and biobasis determination, TSM, and Web service classification.
Gelatin, a natural biopeptide, is obtained through partial hydrolysis and thermal denaturation of collagen. It has irreplaceable biological functions and has been widely applied in biomedical science and tissue engineering. The amino acid sequence of a recombinant human-like gelatin was constructed from a newly designed hexamer composed of six protein monomer sequences in series, with the minimum repeating unit being the characteristic Gly-X-Y sequence found in the human type III collagen α1 chain. The nucleotide sequence was subsequently inserted into the genome of Pichia pastoris to enable soluble secretory expression of the recombinant gelatin. At the shake-flask fermentation level, the yield of recombinant gelatin was up to 0.057 g/L, and its purity reached up to 95% after affinity purification. Molecular weight determination and amino acid analysis confirmed that the amino acid composition of the obtained recombinant gelatin is identical to that of the theoretical design. Furthermore, scanning electron microscopy revealed that the freeze-dried recombinant gelatin hydrogel exhibited a porous structure. After cells were cultured continuously within these gelatin microspheres for two days, followed by fluorescence staining and observation by confocal laser scanning microscopy, the cells were observed to cluster together within the gelatin matrix, exhibiting three-dimensional growth characteristics while maintaining good viability. This research presents promising prospects for developing recombinant gelatin as a biomedical material.
Graphical representation is a very efficient tool for visual analysis of protein sequences. In this paper, a novel 2D graphical representation scheme is proposed on the basis of a newly introduced concept, the characteristic model of protein sequences. After obtaining the 2D graphs of the protein sequences, two numerical characterizations of them are designed as descriptors to analyze nine ND5 protein sequences. Simulation and analysis results show that, compared with existing methods, our method is not only visible, intuitive, and simple, but also has no circuit or degeneracy. Even more importantly, since the storage space required by our method is constant and independent of the length of the protein sequences, it supports excellent visual inspection of long protein sequences.
Sequence clustering software is essential in bioinformatics. However, selecting the appropriate tool can be challenging because of the diversity of algorithms and targeted applications. This paper analyzes and evaluates eight representative software tools (algorithms) in terms of precision, sensitivity, speed, scaling of running time, and memory consumption. Furthermore, this paper examines the effects of sequence count, sequence length, identity, thread count, and GPU use on these aspects. Sequence length and identity significantly impact clustering efficiency (speed and memory consumption), with fluctuation amplitudes exceeding an order of magnitude and non-monotonic effects observed. The evaluation results are analyzed and summarized in tables for users' reference.
Background Farnesoid X receptor (FXR) regulates tumorigenesis, but its clinical significance in gallbladder cancer (GBC) remains unclear. This study investigated its clinical and prognostic significance in GBC patients, as well as its association with the anti-apoptotic protein myeloid cell leukemia sequence 1 (MCL1). Methods FXR and MCL1 expression in 42 primary GBC and 15 normal gallbladder tissues was analyzed by immunohistochemistry. The patients and samples were collected at Ren Ji Hospital from January 2005 to December 2010. The association of FXR and MCL1 expression with clinicopathologic factors and prognosis, as well as the correlation between FXR and MCL1 protein expression, was assessed by statistical analyses. Results Compared with normal gallbladder tissues, FXR expression was decreased and MCL1 expression was increased in GBC during progression of tumor node metastasis (TNM) stage. Kaplan-Meier survival analysis showed that FXR low expression and MCL1 overexpression were significantly associated with poor overall survival. Furthermore, multivariate analysis showed that FXR and MCL1 are both prognostic factors for GBC patients. FXR low expression was significantly correlated with MCL1 overexpression. Conclusion FXR might be a new molecular marker to predict the prognosis of patients with GBC and a novel therapeutic target. Chin Med J 2014;127 (14): 2637-2642
BACKGROUND Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY We report the case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disabilities. He had short stature and irregular dentition, but no other abnormal clinical findings. Genetic analysis detected a de novo heterozygous nonsense point mutation in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2); neither of his parents carried the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the genotype spectrum of the disease. According to the reviewed literature, the heterogeneous manifestations may arise from the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION Global developmental delay is usually the initial presentation, and the diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verifying the genes affected by SATB2 mutations in heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
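The variant notation c.687C>A (p.Y229X) can be checked with simple arithmetic: coding position 687 equals 3 × 229, so it is the third base of codon 229. The short Python sketch below works through that step. The reference codon TAC is inferred from the notation itself (tyrosine is encoded by TAT or TAC, and the reference base at the third position is C); it is not read from NM_001172509.2, and the function names are illustrative.

```python
TYR_CODONS = {"TAT", "TAC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_for_cds_position(pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon_number = (pos - 1) // 3 + 1
    offset = (pos - 1) % 3 + 1
    return codon_number, offset


def apply_substitution(codon, offset, ref, alt):
    """Replace the base at the 1-based offset, checking the reference base first."""
    if codon[offset - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:offset - 1] + alt + codon[offset:]


codon_number, offset = codon_for_cds_position(687)   # third base of codon 229
mutant = apply_substitution("TAC", offset, "C", "A")  # inferred Tyr codon TAC becomes TAA
```

Because TAA is a stop codon, the substitution truncates the protein at residue 229, which is what the nonsense annotation p.Y229X expresses.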
The nucleotide sequence deduced from the amino acid sequence of the scorpion insectotoxin AaIT was chemically synthesized and expressed in Escherichia coli. The authenticity of this in vitro expressed peptide was confirmed by N-terminal peptide sequencing. Two groups of bioassays, an artificial diet incorporation assay and a contact insecticidal effect assay, were carried out separately to verify the toxicity of this recombinant toxin. At the end of a 24 h experimental period, more than 60% of the tested diamondback moth (Plutella xylostella) larvae were killed in both groups, with LC50 values of 18.4 μM and 0.70 μM, respectively. A cytotoxicity assay using cultured Sf9 insect cells and MCF-7 human cells demonstrated that the toxin AaIT had specific toxicity against insect cells but not human cells. Only 0.13 μM recombinant toxin was needed to kill 50% of cultured insect cells, while as much as 1.3 μM toxin had no effect on human cells. Insect cells produced obvious intrusions from their plasma membrane before breaking up. We infer that toxin AaIT binds to a putative sodium channel in these insect cells and opens the channel persistently, which would result in Na+ influx and finally cause destruction of the insect cells.
MCM10 protein is an essential replication factor involved in the initiation of DNA replication. A mcm10 mutant (mcm10-1) of budding yeast shows growth arrest at 37 degrees C. In the present work, we have isolated a mcm10-1 suppressor strain, which grows at 37 degrees C. Interestingly, this mcm10-1 suppressor undergoes cell cycle arrest at 14 degrees C. A novel gene, YLR003c, was identified by high-copy complementation of this suppressor. We named it Cms1 (Complementation of Mcm 10 Suppressor). Furthermore, transformation experiments show that mcm10-1 suppressor cells carrying a high-copy plasmid, but not a low-copy plasmid, grow at 14 degrees C, indicating that overexpression of Cms1 can rescue the growth arrest of this mcm10 suppressor at the non-permissive temperature. These results suggest that CMS1 protein may functionally interact with MCM10 protein and play a role in the regulation of DNA replication and cell cycle control.
Funding (CGR-walk/ARFIMA paper): Project supported by the National Natural Science Foundation of China (Grant No 60575038), the Natural Science Foundation of Jiangnan University, China (Grant No 20070365), and the Program for Innovative Research Team of Jiangnan University, China.
Funding (codon algebra paper): Project supported in part by the International Technology Collaboration Research Program of China (Grant No 2007DFA706700).
Funding (hydropathy-based sequence comparison paper): Project (No. Z111020834) supported by the 08 Special Talent Fund of Northwest A&F University, China.
Funding (AFS multiple sequence alignment paper): The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research Grant No (DSR2020–01–414).
Funding (n-gram analysis paper): Sponsored by the National Natural Science Foundation of China (Grant No. 60435020).
Funding: Supported by grants from the National Natural Science Foundation of China (No. 21974069) and the Open Fund Programs of Shenzhen Bay Laboratory (No. SZBL2020090501001).
Abstract: Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) plays an indispensable role in analyzing protein covalent structures. The reliable identification of amino acid residues and modifications relies on mass accuracy, which is highly dependent on calibration. However, the accuracy provided by currently available calibrants still needs improvement in terms of compatibility with multiple tandem MS modes or ion polarity modes, calibratable range, and minimizing suppression of and interference with analyte signals. Here, aiming to develop a versatile calibrant that solves these problems, we designed a synthetic peptide calibrant format, R_x(GDP_n)_m (referred to as “Gly-Asp-Pro, GDP”), according to the chemical natures of amino acids and the polypeptide fragmentation rules in tandem MS. With four types of amino acid residues selected and arranged through rational design, a GDP peptide produces highly regulated fragments that give rise to evenly spaced signals in each tandem MS mode and is compatible with both positive and negative ion modes. In internal calibration, its regulated fragmentation pattern minimizes interference with analyte signals, and using a single peptide as the input minimizes suppression of the analyte signals. As demonstrated by analyses of proteins including a monoclonal antibody and Aβ-42, these features allowed a significant increase in mass accuracy and precision, which improved sequence coverage and sequence resolution in sequence analyses (including de novo sequencing). This rational design strategy may also inspire further development of synthetic calibrants that benefit the structural analysis of biomolecules.
Funding: Supported by grants from the National Natural Science Foundation of China (Grant Nos. 81803755 and 81922064), the National Science and Technology Major Project (Grant No. 2018ZX10201002, China), the China Postdoctoral Science Foundation (2018M640926 and 2020M673221), and the Sichuan University Postdoctoral Research and Development Foundation (2020SCU12062 and 2020SCU12056, China).
Abstract: Mitotic catastrophe (MC) is a form of programmed cell death induced by mitotic process disorders, and it is very important in tumor prevention, development, and drug resistance. Because rapidly growing MC data are vigorously promoting tumor-related biomedical and clinical studies, a professional and comprehensive database for curating MC-related data is urgently needed. The Mitotic Catastrophe Database (MCDB) consists of 1214 genes/proteins and 5014 compounds collected and organized from more than 8000 research articles. MCDB also defines the confidence level, classification criteria, and uniform naming rules for MC-related data, which greatly improves data reliability and retrieval convenience. Moreover, MCDB provides protein sequence alignment and target prediction functions. The former can be used to predict new potential MC-related genes and proteins, and the latter can facilitate the identification of potential target proteins of unknown MC-related compounds. In short, MCDB is a proprietary, standardized, and comprehensive database for MC-related data that will facilitate the exploration of MC by researchers ranging from chemists to biologists in the fields of medicinal chemistry, molecular biology, bioinformatics, oncology, and so on. The MCDB is distributed at http://www.combio-lezhang.online/MCDB/indexhtml/.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11175224 and 11121403)
Abstract: Protein sequences, as special heterogeneous sequences, are rare in the amino acid sequence space. The specific sequential order of amino acids in a protein is essential to its 3D structure. On the whole, however, the correlation between the sequence and the structure of a protein is not so strong. How well does a protein sequence encode its structural information? How does a sequence determine its native structure? Keeping globular proteins in mind, we discuss several problems on the path from sequence to structure.
Funding: Supported by the National Natural Science Foundation of China (82341042 and 32270993) and the PhD program of the Interdisciplinary Research Center, Sun Yat-sen University.
Abstract: Tumor initiation and progression are highly intricate biological processes, and mutation-driven tumorigenesis is a primary underlying cause. Personalized cancer vaccines have been developed to exploit these specific mutations, particularly in the form of tumor neoantigens, to induce immune responses, particularly the activation of CD8+ T cells, which can attack malignant cells. Since tumor mutations result in protein sequence alterations distinct from those in normal tissues, therapies that precisely target these alterations could, in principle, confer effective tumor control while minimizing off-target effects.
Abstract: In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, above all in certain N-terminally located segments. Consequently, we look here for sequences that 1) compose human and mouse proteins other than antigen receptors, 2) are identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments, and 3) are identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites, and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments reaching the limits set for several predictions or phylogenic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in the regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing, and evolution.
Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
Abstract: Machine intelligence is the intelligence exhibited by artificial systems, and it is usually realized on ordinary computers. Rough sets and information granules, as used in uncertainty management, soft computing, and granular computing, are widely applied in many fields, such as protein sequence analysis and biobasis determination, TSM, and Web service classification.
Funding: Financially supported by the Anhui Provincial Natural Science Foundation (Nos. 2022AH052316, GXXT-2022-002, KJ2021A1273).
Abstract: Gelatin is a product obtained through partial hydrolysis and thermal denaturation of collagen and belongs to the natural biopeptides. With irreplaceable biological functions in biomedical science and tissue engineering, it has been widely applied. The amino acid sequence of recombinant human-like gelatin was constructed from a newly designed hexamer composed of six protein monomer sequences in series, with the minimum repeating unit being the characteristic Gly-X-Y sequence found in the human type III collagen α1 chain. The nucleotide sequence was subsequently inserted into the genome of Pichia pastoris to enable soluble secretory expression of recombinant gelatin. At the shake-flask fermentation level, the yield of recombinant gelatin is up to 0.057 g/L, and its purity can reach 95% through affinity purification. Molecular weight determination and amino acid analysis confirmed that the amino acid composition of the obtained recombinant gelatin is identical to that of the theoretical design. Furthermore, scanning electron microscopy revealed that the freeze-dried recombinant gelatin hydrogel exhibited a porous structure. After cells were cultured continuously within these gelatin microspheres for two days, followed by fluorescence staining and observation through confocal laser scanning microscopy, the cells were observed to cluster together within the gelatin matrix, exhibiting three-dimensional growth characteristics while maintaining good viability. This research presents promising prospects for developing recombinant gelatin as a biomedical material.
Funding: Acknowledgments: The authors thank the anonymous referees for suggestions that helped to improve the paper substantially. The project is partly sponsored by the Colleges and Universities Open Innovation Platform Fund of Hunan Province (No. 13K041), the Hunan Provincial Natural Science Foundation of China (No. 14JJ2070), the construct program of the key discipline in Hunan province, the State Education Ministry Scientific Research Foundation for the Returned Overseas Chinese Scholars, and the Introduced Talent Start-up Fund Project of Xiangtan University (No. 11QDZ45).
Abstract: Graphical representation is a very efficient tool for the visual analysis of protein sequences. In this paper, a novel 2D graphical representation scheme is proposed on the basis of a newly introduced concept, named the characteristic model of protein sequences. After the 2D graphics of the protein sequences are obtained, two numerical characterizations of them are designed as descriptors to analyze the nine ND5 protein sequences. Simulation and analysis results show that, compared with existing methods, our method is not only visible, intuitive, and simple, but also has no circuit or degeneracy. Even more importantly, since the storage space required by our method is constant and independent of the length of the protein sequences, it preserves excellent visual inspection for long protein sequences.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFF1206103), the National Natural Science Foundation of China (No. 62272449), the Shenzhen Basic Research Fund (Nos. KQTD20200820113106007 and ZDSYS20220422103800001), and the Key Laboratory of Quantitative Synthetic Biology, Chinese Academy of Sciences (No. CKL075).
Abstract: Sequence clustering software is essential in bioinformatics. However, selecting the appropriate tool can be challenging because of the diversity of algorithms and targeted applications. This paper analyzes and evaluates eight representative software tools (algorithms) in terms of precision, sensitivity, speed, scaling of running time, and memory consumption. Furthermore, this paper examines the effects of sequence count, sequence length, identity, thread count, and GPU on these aspects. Sequence length and identity significantly impact clustering efficiency (speed and memory consumption), with fluctuation amplitudes exceeding an order of magnitude and non-monotonic effects observed. The evaluation results are analyzed and summarized in tables for users' reference.
Abstract: Background: Farnesoid X receptor (FXR) regulates tumorigenesis, but its clinical significance in gallbladder cancer (GBC) remains unclear. This study investigated its clinical and prognostic significance in GBC patients, as well as its association with the anti-apoptotic protein myeloid cell leukemia sequence 1 (MCL1). Methods: FXR and MCL1 expression in 42 primary GBC and 15 normal gallbladder tissues was analyzed by immunohistochemistry. The patients and samples were collected at Ren Ji Hospital from January 2005 to December 2010. The association of FXR and MCL1 expression with clinicopathologic factors and prognosis, as well as the correlation between FXR and MCL1 protein expression, were assessed by statistical analyses. Results: Compared with normal gallbladder tissues, FXR expression was decreased and MCL1 expression was increased in GBC during progression of tumor node metastasis (TNM) stage. Kaplan-Meier survival analysis showed that FXR low expression and MCL1 overexpression were significantly associated with poor overall survival. Furthermore, multivariate analysis showed that FXR and MCL1 are both prognostic factors for GBC patients. FXR low expression was significantly correlated with MCL1 overexpression. Conclusion: FXR might be a new molecular marker for predicting the prognosis of patients with GBC and a novel therapeutic target. Chin Med J 2014;127(14):2637-2642
Abstract: BACKGROUND: Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY: We report the case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disabilities. He had short stature and irregular dentition, but no other abnormal clinical findings. Genetic analysis detected a de novo heterozygous nonsense point mutation in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2), and neither of his parents carried the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the genotype spectrum of the disease. According to the reviewed literature, the heterogeneous manifestations can be induced by the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION: Global developmental delays are usually the initial presentation, and diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verification of the genes affected by SATB2 mutations in heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
Funding: This work was supported by a grant from the 863 High Technology Program, Chinese Ministry of Science and Technology
Abstract: The nucleotide sequence deduced from the amino acid sequence of the scorpion insectotoxin AaIT was chemically synthesized and expressed in Escherichia coli. The authenticity of this in vitro expressed peptide was confirmed by N-terminal peptide sequencing. Two groups of bioassays, an artificial diet incorporation assay and a contact insecticidal effect assay, were carried out separately to verify the toxicity of this recombinant toxin. At the end of a 24 h experimental period, more than 60% of the tested diamondback moth (Plutella xylostella) larvae were killed in both groups, with LC50 values of 18.4 μM and 0.70 μM, respectively. A cytotoxicity assay using cultured Sf9 insect cells and MCF-7 human cells demonstrated that the toxin AaIT had specific toxicity against insect cells but not human cells. Only 0.13 μM recombinant toxin was needed to kill 50% of cultured insect cells, while as much as 1.3 μM toxin had absolutely no effect on human cells. Insect cells produced obvious intrusions from their plasma membrane before breaking up. We infer that the toxin AaIT binds to a putative sodium channel in these insect cells and opens the channel persistently, which would result in Na+ influx and finally cause destruction of the insect cells.
Abstract: MCM10 protein is an essential replication factor involved in the initiation of DNA replication. An mcm10 mutant (mcm10-1) of budding yeast shows growth arrest at 37 °C. In the present work, we isolated an mcm10-1 suppressor strain that grows at 37 °C. Interestingly, this mcm10-1 suppressor undergoes cell cycle arrest at 14 °C. A novel gene, YLR003c, was identified by high-copy complementation of this suppressor. We named it Cms1 (Complementation of Mcm10 Suppressor). Furthermore, transformation experiments show that cells of the mcm10-1 suppressor carrying a high-copy plasmid, but not a low-copy plasmid, grow at 14 °C, indicating that overexpression of Cms1 can rescue the growth arrest of this mcm10 suppressor at the non-permissive temperature. These results suggest that CMS1 protein may functionally interact with MCM10 protein and play a role in the regulation of DNA replication and cell cycle control.