Single-molecule protein sequencing would have a tremendous impact on proteomics and human biology, and it would promote the development of novel diagnostic and therapeutic approaches. However, its technological realization can so far only be envisioned, and huge challenges need to be overcome. Major difficulties are inherent to the structure of proteins, which are composed of several different amino acids. Despite long-standing efforts, only a few complex techniques, such as Edman degradation, liquid chromatography and mass spectrometry, make protein sequencing possible. Unfortunately, these techniques have significant limitations in terms of the amount of sample required and the dynamic range of measurement. It is known that proteins can distinguish closely similar molecules. Moreover, several proteins can work as biological nanopores to perform single-molecule detection and sequencing. Unfortunately, while DNA sequencing by means of nanopores has been demonstrated, very few examples of nanopores able to perform reliable protein sequencing have been reported so far. Here, we investigate, by means of molecular dynamics simulations, how a re-engineered protein acting as a biological nanopore can be used to recognize the sequence of a translocating peptide by sensing the "shape" of individual amino acids. In our simulations we demonstrate that it is possible to discriminate, with high fidelity, 9 different amino acids in a short peptide translocating through the engineered construct. The method, shown here for fluorescence-based sequencing, does not require any labelling of the peptide analyte. These results can pave the way for a new and highly sensitive method of sequencing.
Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) plays an indispensable role in analyzing protein covalent structures. The reliable identification of amino acid residues and modifications relies on mass accuracy, which is highly dependent on calibration. However, the accuracy provided by currently available calibrants still needs improvement in terms of compatibility with multiple tandem MS modes or ion polarity modes, calibratable range, and minimizing suppression of and interference with analyte signals. Here, aiming to develop a versatile calibrant that solves these problems, we designed a synthetic peptide calibrant format, R_x(GDP_n)_m (referred to as "Gly-Asp-Pro, GDP"), according to the chemical nature of amino acids and the rules of polypeptide fragmentation in tandem MS. With four types of amino acid residues selected and arranged through rational design, a GDP peptide produces highly regular fragments that give rise to evenly spaced signals in each tandem MS mode, and it is compatible with both positive and negative ion modes. In internal calibration, its regular fragmentation pattern minimizes interference with analyte signals, and using a single peptide as the input minimizes suppression of the analyte signals. As demonstrated by analyses of proteins including a monoclonal antibody and Aβ-42, these features allowed a significant increase in mass accuracy and precision, which improved sequence coverage and sequence resolution in sequence analyses (including de novo sequencing). This rational design strategy may also inspire further development of synthetic calibrants that benefit the structural analysis of biomolecules.
A new chaos game representation (CGR) of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). In the present paper, a CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data, and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
Protein sequences, as special heterogeneous sequences, are rare in the amino acid sequence space. The specific sequential order of the amino acids of a protein is essential to its 3D structure. On the whole, however, the correlation between the sequence and the structure of a protein is not so strong. How well does a protein sequence encode its structural information? How does a sequence determine its native structure? Keeping globular proteins in mind, we discuss several problems on the path from sequence to structure.
Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the post-genome era has arrived, in which proteomics technology is emerging. This paper studies the protein molecule from an algebraic point of view. The algebraic system (∑, +, *) is introduced, where ∑ is the set of 64 codons. According to the characteristics of (∑, +, *), a novel quasi-amino acid code classification method is introduced, and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codons. All these correlations may play an important part in establishing the logical relationship between codons and the quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acid code are very difficult to observe. The present paper shows that (ZU, +, ×) is a field. Furthermore, the operational results show that the codon tga has different properties from the other stop codons. In fact, in the human and ox mitochondrial genetic code, tga codes for tryptophan rather than acting as a stop codon as in other genetic codes, as in the case reported by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable features of the code of the 20 kinds of amino acids; in other words, it addresses the problem that 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).
Biological sequence comparison is a fundamental task in computational biology. According to the hydropathy profile of amino acids, a protein sequence is treated as a string over three letters. Three curves of the reduced protein sequence are defined to describe it. A new method for analyzing the similarity/dissimilarity of protein sequences is proposed, based on the conditional probability of the protein sequence. Finally, the sequences of the ND6 (NADH dehydrogenase subunit 6) protein of eight species are taken as an example to illustrate the new approach. The results demonstrate that the method is convenient and efficient.
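The three-letter reduction and conditional-probability comparison described in the abstract above can be sketched roughly as follows. The abstract does not give the grouping, the exact probability definition, or the distance measure, so everything here is an assumption for illustration: the internal/external/ambivalent grouping is one commonly used hydropathy classification, the conditional probability is taken over adjacent letter pairs, and the Euclidean distance and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# ASSUMPTION: a three-class hydropathy grouping; the abstract does not
# state which grouping the authors used.
GROUPS = {
    "I": "FILMV",      # internal (hydrophobic)
    "E": "DEHKNQR",    # external (hydrophilic)
    "A": "ACGPSTWY",   # ambivalent
}
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(protein):
    """Map a protein sequence onto the three-letter alphabet I/E/A."""
    return "".join(CLASS_OF[aa] for aa in protein.upper())


def conditional_probabilities(reduced):
    """Estimate P(next letter = b | current letter = a) from adjacent pairs."""
    pair_counts = Counter(zip(reduced, reduced[1:]))
    first_counts = Counter(reduced[:-1])
    return {
        (a, b): (pair_counts[(a, b)] / first_counts[a]) if first_counts[a] else 0.0
        for a, b in product("IEA", repeat=2)
    }


def dissimilarity(seq1, seq2):
    """Hypothetical measure: Euclidean distance between the two 3x3 tables."""
    p = conditional_probabilities(reduce_sequence(seq1))
    q = conditional_probabilities(reduce_sequence(seq2))
    return math.sqrt(sum((p[k] - q[k]) ** 2 for k in p))
```

A value of zero means the two sequences have identical adjacent-letter statistics under this reduction; it does not imply that the sequences themselves are identical.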
Aligning many protein or DNA sequences is a very complex operation in bioinformatics applications. Existing Multiple Sequence Alignment (MSA) techniques face a trade-off between objectives: techniques that prioritize speed sacrifice accuracy, whereas techniques that prioritize accuracy sacrifice speed. Alignment means identifying the similarity between different sequences with high accuracy, and the growing number of sequences makes the problem increasingly complex. Because of the emergence and rapid development of gene sequencing, and the growing dependence on it, sequence alignment has become important in every analysis of biological relationships. Counting the number of similar amino acids is the primary method for showing that two sequences are related. Time is a main issue in any alignment technique. In this paper, a more effective MSA method is proposed for aligning massive sets of protein sequences, maintaining the highest accuracy with less time consumption. The proposed method relies on the Artificial Fish Swarm (AFS) algorithm, which can overcome most of the challenges of the MSA problem. AFS is exploited to obtain high accuracy in adequate time. AFS has become increasingly popular in applications such as artificial intelligence, computer vision, machine learning, and data-intensive applications. It essentially mimics the behavior of fish searching for food in nature. The mechanisms of AFS, such as preying, swarming, following, moving, and leaping, help increase accuracy and improve speed by decreasing execution time. The sense organs that help the artificial fish collect information and vision from the environment contribute to accuracy. These features make the alignment operation more efficient and are especially suitable for large-scale data. The implementation and experimental results place the proposed AFS as a first choice among alignment methods when compared with well-known MSA algorithms.
To perform statistical sequence analysis on the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole-genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, the Zipf power law and Shannon entropy, which were developed for the analysis of natural languages and symbolic sequences. The natural genome proteins and artificial genome proteins were compared with each other, and some statistical features of n-grams were discovered. The results show that: the n-grams of whole-genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple uni-gram model can distinguish different organisms; and there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis of the n-grams of whole-genome protein sequences will result in a powerful model for mapping the relationship between protein sequence, structure and function.
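The two tests named in the abstract above, Shannon n-gram entropy and a Zipf rank-frequency fit, can be illustrated with a minimal sketch. The function names are hypothetical, and the least-squares fit on the log-log rank-frequency curve is only one common way to check for Zipf-like behavior; the abstract does not say how the authors performed the fit.

```python
from collections import Counter
import math


def ngram_counts(sequence, n):
    """Count all overlapping n-grams in a protein sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 is the classic Zipf signature; this is an
    illustrative check, not the authors' procedure.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single distinct n-gram: no slope to estimate
        return 0.0
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
```

In practice the counts would be pooled over every protein in a genome before the entropy or slope is computed, since a single short sequence gives too few n-grams for a stable estimate.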
Tumor initiation and progression are highly intricate biological processes, and mutation-driven tumorigenesis is a primary underlying cause. Personalized cancer vaccines have been developed to exploit these specific mutations, particularly in the form of tumor neoantigens, to induce immune responses, notably the activation of CD8+ T cells, which can attack malignant cells. Since tumor mutations result in protein sequence alterations distinct from those in normal tissues, therapies that precisely target these alterations could, in principle, confer effective tumor control while minimizing off-target effects.
In de novo protein sequencing, we can often obtain only an incomplete protein sequence, namely a scaffold, from top-down and bottom-up tandem mass spectrometry. While most sections of a protein can be inferred from its homologous sequences, some specific sections are always missing, and it is hard to predict the missing amino acids in the gaps of the scaffold. Thus, in this paper we focus only on predicting the gaps, using a probabilistic algorithm and machine learning models, instead of predicting the complete protein sequence with generative AI models. We study two versions of the protein scaffold filling problem, with known gap size and with known gap mass, respectively. For the known gap size version, we develop several machine learning models based on random forest, k-nearest neighbors, decision tree, and a fully connected neural network. For the known gap mass version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. Experimental results on both real and simulated data show that the proposed algorithms achieve promising accuracies of 100% and close to 100%, respectively.
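To make the known-gap-mass version of the problem concrete, the sketch below enumerates amino acid compositions whose residue masses sum to a given gap mass. This is a plain brute-force enumeration used as a stand-in, not the probabilistic algorithm of the abstract, whose details are not given there. The function name, the tolerance, and the length cap are hypothetical; the masses are standard monoisotopic residue masses.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compositions_for_gap(gap_mass, tol=0.01, max_len=4):
    """List residue multisets (as mass-sorted strings) matching gap_mass.

    ASSUMPTION: tol and max_len are illustrative defaults, not values
    taken from the abstract. Order within the gap is not resolved here.
    """
    items = sorted(RESIDUE_MASS.items(), key=lambda kv: kv[1])
    results = []

    def extend(start, remaining, chosen):
        if chosen and abs(remaining) <= tol:
            results.append("".join(chosen))
            return
        if len(chosen) == max_len:
            return
        for i in range(start, len(items)):
            residue, mass = items[i]
            if mass > remaining + tol:
                break  # items are sorted, so heavier residues cannot fit
            extend(i, remaining - mass, chosen + [residue])

    extend(0, gap_mass, [])
    return results
```

Even a small gap mass can match several compositions (for example, Gly plus Ala has almost exactly the mass of Gln), and leucine and isoleucine are indistinguishable by mass, which is why mass alone leaves ambiguity that a scoring or probabilistic step has to resolve.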
BACKGROUND Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY We report the case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disability. He had short stature and irregular dentition, but no other abnormal clinical findings. Genetic analysis detected a de novo heterozygous nonsense point mutation in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2); neither of his parents carried the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the genotype spectrum of the disease. According to the reviewed literature, the heterogeneous manifestations can be induced by the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of the truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION Global developmental delay is usually the initial presentation, and diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verifying the genes affected by SATB2 mutations in relation to the heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, primarily in certain N-terminally located segments. Consequently, we look here for sequences that 1) compose human and mouse proteins other than antigen receptors, 2) are identical with or highly similar to nucleotide sequences representing conserved variable immunoglobulin segments, and 3) are identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments reaching the limits set for several predictions or phylogenetic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in the regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing and evolution. Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
The nucleotide sequence deduced from the amino acid sequence of the scorpion insectotoxin AaIT was chemically synthesized and expressed in Escherichia coli. The authenticity of this in vitro expressed peptide was confirmed by N-terminal peptide sequencing. Two groups of bioassays, an artificial diet incorporation assay and a contact insecticidal effect assay, were carried out separately to verify the toxicity of the recombinant toxin. At the end of a 24 h experimental period, more than 60% of the tested diamondback moth (Plutella xylostella) larvae were killed in both groups, with LC50 values of 18.4 µM and 0.70 µM, respectively. A cytotoxicity assay using cultured Sf9 insect cells and MCF-7 human cells demonstrated that the toxin AaIT is specifically toxic to insect cells but not to human cells. Only 0.13 µM recombinant toxin was needed to kill 50% of the cultured insect cells, while as much as 1.3 µM toxin had no effect on human cells. Insect cells produced obvious protrusions from their plasma membrane before breaking up. We infer that the toxin AaIT binds to a putative sodium channel in these insect cells and opens the channel persistently, which results in Na+ influx and finally causes destruction of the insect cells.
MCM10 protein is an essential replication factor involved in the initiation of DNA replication. An mcm10 mutant (mcm10-1) of budding yeast shows growth arrest at 37 °C. In the present work, we isolated an mcm10-1 suppressor strain that grows at 37 °C. Interestingly, this mcm10-1 suppressor undergoes cell cycle arrest at 14 °C. A novel gene, YLR003c, was identified by high-copy complementation of this suppressor. We named it Cms1 (Complementation of Mcm10 Suppressor). Furthermore, transformation experiments show that cells of the mcm10-1 suppressor carrying the high-copy plasmid, but not the low-copy plasmid, grow at 14 °C, indicating that overexpression of Cms1 can rescue the growth arrest of this mcm10 suppressor at the non-permissive temperature. These results suggest that the CMS1 protein may functionally interact with the MCM10 protein and play a role in the regulation of DNA replication and cell cycle control.
Machine intelligence is the intelligence exhibited by artificial systems. It is usually realized on ordinary computers. Rough sets and information granules are widely used for uncertainty management in soft computing and granular computing in many fields, such as protein sequence analysis and biobasis determination, TSM, and Web service classification.
Baicalein has been shown to have anti-cancer activity in vitro and in vivo, including inhibition of the malignant proliferation, migration, adhesion and invasion of many kinds of cancer cells. The special AT-rich sequence binding protein 1 (SATB1) is a nuclear matrix-binding protein with tissue-specific expression and is reported to be a breast cancer "gene group organizer". Previous studies have shown that SATB1 is involved in the growth, metastasis and prognosis of breast cancer. The present study aimed to investigate whether baicalein inhibits the proliferation and migration of MDA-MB-231 human breast cancer cells through down-regulation of SATB1 expression. Methods: MDA-MB-231 cells were treated for 24 h, 48 h and 72 h with various concentrations of baicalein (0, 5, 10, 20, 40 and 80 pM). The proliferation and migration of MDA-MB-231 cells following treatment with baicalein were then determined using the colorimetric 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay and wound healing assays. Thereafter, western blot analysis was performed to detect changes in SATB1 protein expression in MDA-MB-231 cells. Results: With longer treatment time and increasing drug concentration, the inhibitory effect of baicalein on the proliferation and migration of MDA-MB-231 cells gradually increased, in a time- and dose-dependent manner (P < 0.05). Meanwhile, after treatment with different concentrations of baicalein for 48 h, the level of SATB1 protein expression in MDA-MB-231 cells decreased markedly, in a dose-dependent manner (P < 0.05). Conclusion: Baicalein inhibits breast cancer cell proliferation and suppresses invasion and metastasis by reducing cell migration, possibly through down-regulation of SATB1 protein expression, indicating that baicalein is a potential therapeutic agent for human breast cancer.
A study on nontuberculous mycobacteria (NTM) was carried out at the wildlife-livestock interface of the Katavi Rukwa ecosystem (KRE). Tissues from 328 livestock and 178 wild animals were cultured; wild animals were sampled opportunistically during professional hunting and game cropping operations in the KRE protected areas. The objective of the study was to generate data on the epidemiology of NTM at the wildlife-livestock interface of the KRE. The methods used to identify NTM were culture and isolation, polymerase chain reaction, 65-kilodalton heat shock protein (hsp65) analysis, and sequencing. Mycobacteria were detected in 25.9% and 11.9% of livestock and wildlife tissue cultures, respectively. The NTM most frequently isolated were M. kansasii (30%), M. gastri (30%), M. fortuitum (1%), M. intracellulare (4%), M. indicus pranii (4%), M. nonchromogenicum (6%) and M. lentiflavum (6%). Other NTM found in smaller percentages were M. hiberniae, M. engbaekii, M. septicum, M. arupense and M. godii. Because of the rise of NTM infection in both humans and animals, it is recommended that awareness and laboratory facilities be improved to curb underreporting, especially in TB-endemic countries. For species-specific identification, a network of national and regional laboratories is recommended.
Gelatin is a natural biopeptide product obtained through partial hydrolysis and thermal denaturation of collagen. With irreplaceable biological functions, it has been widely applied in biomedical science and tissue engineering. The amino acid sequence of a recombinant human-like gelatin was constructed as a newly designed hexamer composed of six protein monomer sequences in series, with the minimum repeating unit being the characteristic Gly-X-Y sequence found in the human type III collagen α1 chain. The nucleotide sequence was subsequently inserted into the genome of Pichia pastoris to enable soluble secretory expression of the recombinant gelatin. At the shake-flask fermentation level, the yield of recombinant gelatin reached 0.057 g/L, and its purity rose to 95% after affinity purification. Molecular weight determination and amino acid analysis confirmed that the amino acid composition of the obtained recombinant gelatin is identical to the theoretical design. Furthermore, scanning electron microscopy revealed that the freeze-dried recombinant gelatin hydrogel has a porous structure. After cells were cultured continuously within these gelatin microspheres for two days, followed by fluorescence staining and observation by confocal laser scanning microscopy, the cells were seen to cluster together within the gelatin matrix, exhibiting three-dimensional growth while maintaining good viability. This research presents promising prospects for developing recombinant gelatin as a biomedical material.
Objective: To develop a vaccine against Chinese Schistosomiasis japonica, we attempted to prepare the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain using gene cloning techniques. Methods: A pair of primers, P1 and P2, was synthesized according to the DNA sequence of the 23 kDa membrane protein of Schistosoma mansoni; a BamHI site was added at the 5' end of primer P1 and a SalI site was added at the 5' end of primer P2. The DNA fragment of the gene for the 23 kDa membrane protein (SjC23) of Schistosoma japonicum was amplified by PCR from a cDNA library of Schistosoma japonicum. The purified target DNA fragment was inserted into the vector pUC18/19 to form the recombinant, which was sequenced at Liverpool University, UK and Fudan University, China, respectively. The DNA sequence was analyzed with DNASIS software, and the amino acid sequence was deduced with SWISS-PROT software. Results: The DNA of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain (SjC23) was 657 bp, the same size as that of Sm23 and of Sj23 of the Philippine strain. The DNA sequence of Sj23 of the Chinese strain (SjC23) showed 100% homology with Sj23 of the Philippine strain and 79.5% homology with Sm23. The deduced amino acid sequence of SjC23 showed 84% homology with Sm23 and 100% homology with the Philippine strain. There were two hydrophilic domains in SjC23, one located at the N terminal (amino acids 36-56) and the other at the C terminal (amino acids 108-183). Conclusions: The gene of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain has been cloned, and this work lays the foundation for the development of a vaccine against the Schistosoma japonicum Chinese strain.
Funding (single-molecule protein sequencing study): the Horizon 2020 Program, FET-Open: PROSEQO, Grant Agreement no. 687089. We acknowledge PRACE for awarding us access to Marconi at CINECA, Italy.
Funding (MALDI calibrant study): supported by grants from the National Natural Science Foundation of China (No. 21974069) and the Open Fund Programs of Shenzhen Bay Laboratory (No. SZBL2020090501001).
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038), the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365), and the Program for Innovative Research Team of Jiangnan University, China.
Abstract: A new chaos game representation (CGR) of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). In the present paper, a CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into the protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data, and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
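The chaos game step described in this abstract can be illustrated with a minimal sketch. The four-class grouping of amino acids and the assignment of classes to the corners of the unit square are assumptions made for illustration, not values taken from the abstract; the conversion of the coordinates into a CGR-walk time series and the ARFIMA fit are not shown.

```python
# Assumed four-class grouping for a detailed HP model (illustrative only).
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}
# Hypothetical corner assignment on the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def cgr_points(seq):
    """Return the chaos game points for a protein sequence.

    Starting from the centre of the square, each residue moves the
    current point halfway towards the corner of its class.
    """
    x, y = 0.5, 0.5
    points = []
    for aa in seq.upper():
        cx, cy = CORNERS[CLASS_OF[aa]]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ADKS"):
        print(point)
```

Each residue contributes one point, so a sequence of length n yields n coordinates that could then be turned into a one-dimensional series for time-series modelling.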
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11175224 and 11121403).
Abstract: Protein sequences, as special heterogeneous sequences, are rare in the amino acid sequence space. The specific sequential order of the amino acids of a protein is essential to its 3D structure. On the whole, the correlation between the sequence and the structure of a protein is not so strong. How well does a protein sequence contain its structural information? How does a sequence determine its native structure? Keeping globular proteins in mind, we discuss several problems from sequence to structure.
Funding: Project supported in part by the International Technology Collaboration Research Program of China (Grant No. 2007DFA706700).
Abstract: Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the post-genome era has arrived and proteomics technology is emerging. This paper studies the protein molecule from the algebraic point of view. The algebraic system (∑, +, *) is introduced, where ∑ is the set of 64 codons. According to the characteristics of (∑, +, *), a novel quasi-amino acid code classification method is introduced, and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codons. All these correlations may play an important part in establishing the logical relationship between codons and the quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acid code are very difficult to observe. The present paper shows that (ZU, +, ×) is a field. Furthermore, the operational results show that the codon tga has different properties from the other stop codons. In fact, in the mitochondrial genomes of human and ox, tga codes for tryptophan and is not a stop codon as in other genetic codes, as reported by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable features of the code of the 20 kinds of amino acids; in other words, it addresses the problem that 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).
Funding: Project (No. Z111020834) supported by the 08 Special Talent Fund of Northwest A&F University, China.
Abstract: Biological sequence comparison is a fundamental task in computational biology. According to the hydropathy profile of amino acids, a protein sequence is taken as a string over three letters. Three curves of the new protein sequence were defined to describe the protein sequence. A new method to analyze the similarity/dissimilarity of protein sequences was proposed based on the conditional probability of the protein sequence. Finally, the protein sequences of the ND6 (NADH dehydrogenase subunit 6) protein of eight species were taken as an example to illustrate the new approach. The results demonstrated that the method is convenient and efficient.
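The reduction to a three-letter string and the use of conditional probabilities can be sketched as follows. The grouping into internal (I), external (E) and ambivalent (A) residues is an assumption chosen for illustration, and the first-order definition P(b | a) over adjacent letters is one plausible reading; the abstract does not give the exact grouping, the three curves, or the similarity measure.

```python
from collections import Counter

# Assumed hydropathy grouping (illustrative): internal, external, ambivalent.
HYDROPATHY_GROUPS = {"I": "FILMV", "E": "DEHKNQR", "A": "ACGPSTWY"}
LETTER_OF = {aa: g for g, members in HYDROPATHY_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a 20-letter protein sequence to the three-letter alphabet."""
    return "".join(LETTER_OF[aa] for aa in seq.upper())


def conditional_probabilities(reduced):
    """Estimate P(b | a) for adjacent letters a, b in the reduced string."""
    pairs = Counter(zip(reduced, reduced[1:]))
    firsts = Counter(reduced[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}


if __name__ == "__main__":
    reduced = reduce_sequence("MLKAVDEG")
    print(reduced)
    print(conditional_probabilities(reduced))
```

Two sequences could then be compared through some distance between their conditional probability tables; which distance to use is left open here.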
Funding: The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research Grant No. DSR2020–01–414.
Abstract: The alignment of many protein or DNA sequences in bioinformatics applications is very complex. Existing Multiple Sequence Alignment (MSA) techniques involve a trade-off between objectives: techniques concerned with speed ignore accuracy, whereas techniques concerned with accuracy ignore speed. The term alignment means finding the similarity between different sequences with high accuracy. The growing number of sequences leads to a very complex and complicated problem. Because of the emergence, rapid development, and dependence on gene sequencing, sequence alignment has become important in every biological relationship analysis process. Calculating the number of similar amino acids is the primary method for proving that there is a relationship between two sequences. Time is a main issue in any alignment technique. In this paper, a more effective MSA method is proposed for handling massive multiple protein sequence alignment while maintaining the highest accuracy with less time consumption. The proposed method depends on the Artificial Fish Swarm (AFS) algorithm, which can overcome most of the challenges of MSA problems. AFS is exploited to obtain high accuracy in adequate time. AFS has become increasingly popular in various applications such as artificial intelligence, computer vision, machine learning, and data-intensive applications. It basically mimics the behavior of fish trying to find food in nature. The proposed mechanisms of AFS, such as preying, swarming, following, moving, and leaping, help to increase accuracy and improve speed by decreasing execution time. The sense organs that help the artificial fish collect information and vision from the environment contribute to accuracy. These features of the proposed AFS make the alignment operation more efficient and are especially suitable for large-scale data. The implementation and experimental results place the proposed AFS as a first choice for alignment compared with the well-known algorithms in multiple sequence alignment.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 60435020).
Abstract: To carry out a statistical sequence analysis of the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole-genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, the Zipf power law and Shannon entropy, developed for the analysis of natural languages and symbolic sequences. The natural genome proteins and the artificial genome proteins were compared with each other, and some statistical features of n-grams were discovered. The results show that: the n-grams of whole-genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple uni-gram model can distinguish different organisms; there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis of the n-grams of whole-genome protein sequences will result in a powerful model for mapping the relationship of protein sequence, structure and function.
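The two tests named in this abstract can be sketched for a single sequence. This is a minimal illustration, assuming overlapping n-grams and base-2 entropy over the observed n-gram frequencies; the function names are hypothetical, and the study's exact normalisation and its construction of artificial proteins are not reproduced.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the n-gram frequency distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def rank_frequency(counts):
    """Return (rank, count) pairs, most frequent first, for a Zipf plot."""
    ordered = counts.most_common()
    return [(rank, count) for rank, (_, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    counts = ngram_counts("MKVLAAGIVGLKVLAAG", 2)
    print(shannon_entropy(counts))
    print(rank_frequency(counts)[:5])
```

Plotting the logarithm of count against the logarithm of rank would show whether the distribution is roughly linear, which is the usual informal check for Zipf-like behaviour.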
Funding: Supported by the National Natural Science Foundation of China (82341042 and 32270993) and the PhD program of the Interdisciplinary Research Center, Sun Yat-sen University.
Abstract: Tumor initiation and progression are highly intricate biological processes, and mutation-driven tumorigenesis is a primary underlying cause. Personalized cancer vaccines have been developed to exploit these specific mutations, particularly in the form of tumor neoantigens, to induce immune responses, particularly the activation of CD8+ T cells, which can attack malignant cells. Since tumor mutations result in protein sequence alterations distinct from those in normal tissues, therapies that precisely target these alterations could, in principle, confer effective tumor control while minimizing off-target effects.
Funding: Supported by the USA National Science Foundation under Grant Nos. 2307571, 2307572, and 2307573.
Abstract: In de novo protein sequencing, we can often obtain only an incomplete protein sequence, namely a scaffold, from top-down and bottom-up tandem mass spectrometry. While most sections of a protein can be inferred from its homologous sequences, some specific sections are always missing, and it is hard to predict the missing amino acids in the gaps of the scaffolds. Thus, in this paper we focus only on predicting the gaps, based on a probabilistic algorithm and a machine learning model, instead of predicting the complete protein sequence using generative AI models. We study two versions of the protein scaffold filling problem, with known gap size and known gap mass, respectively. For the known gap size version, we develop several machine learning models based on random forest, k-nearest neighbors, decision tree, and fully connected neural network. For the known gap mass version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. The experimental results on both real and simulated data show that our proposed algorithms achieve promising results of 100% and close to 100% accuracy, respectively.
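To make the known-gap-mass version concrete, the sketch below enumerates amino acid compositions whose nominal (integer) residue masses add up to a given gap mass. It is not the probabilistic algorithm of the paper, which the abstract does not describe; it only shows the candidate space such an algorithm would have to rank. Isoleucine and glutamine are omitted because they share nominal masses with leucine and lysine, and the function name is hypothetical.

```python
# Nominal monoisotopic residue masses in daltons (I and Q omitted:
# they coincide with L = 113 and K = 128 at integer resolution).
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "C": 103, "L": 113, "N": 114, "D": 115, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def gap_compositions(gap_mass, masses=NOMINAL_MASS):
    """List residue compositions (as sorted strings) matching gap_mass."""
    items = sorted(masses.items(), key=lambda kv: kv[1])
    results = []

    def extend(start, remaining, chosen):
        if remaining == 0:
            results.append("".join(chosen))
            return
        for i in range(start, len(items)):
            aa, mass = items[i]
            if mass > remaining:
                break  # all later residues are at least as heavy
            extend(i, remaining - mass, chosen + [aa])

    extend(0, gap_mass, [])
    return results


if __name__ == "__main__":
    print(gap_compositions(114))
    print(gap_compositions(128))
```

Even at this coarse resolution a single mass can match several compositions (for example, two glycines and one asparagine both sum to 114), which is why a scoring or probabilistic step is needed to choose among candidates.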
Abstract: BACKGROUND Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY We report the case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disabilities. He had short stature and irregular dentition, but no other abnormal clinical findings. A de novo heterozygous nonsense point mutation in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2), was detected by genetic analysis, and neither of his parents had the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the genotype spectrum of the disease. According to the reviewed literature, the heterogeneous manifestations can be induced by the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION Global developmental delays are usually the initial presentations, and diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verification of the genes affected by SATB2 mutations that underlie the heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
Abstract: In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, first of all in certain N-terminally located segments. Consequently, we look here for sequences 1) composing human and mouse proteins different from antigen receptors, 2) identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments and 3) identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments reaching the limits for several prediction or phylogenetic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in the regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing and evolution. Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
Funding: This work was supported by a grant from the 863 High Technology Program, Chinese Ministry of Science and Technology.
Abstract: The nucleotide sequence deduced from the amino acid sequence of the scorpion insectotoxin AaIT was chemically synthesized and expressed in Escherichia coli. The authenticity of this in vitro expressed peptide was confirmed by N-terminal peptide sequencing. Two groups of bioassays, an artificial diet incorporation assay and a contact insecticidal effect assay, were carried out separately to verify the toxicity of this recombinant toxin. At the end of a 24 h experimental period, more than 60% of the tested diamondback moth (Plutella xylostella) larvae were killed in both groups, with LC50 values of 18.4 microM and 0.70 microM, respectively. A cytotoxicity assay using cultured Sf9 insect cells and MCF-7 human cells demonstrated that the toxin AaIT had specific toxicity against insect cells but not human cells. Only 0.13 microM recombinant toxin was needed to kill 50% of cultured insect cells, while as much as 1.3 microM toxin had absolutely no effect on human cells. Insect cells produced obvious intrusions from their plasma membrane before breaking up. We infer that the toxin AaIT binds to a putative sodium channel in these insect cells and opens the channel persistently, which would result in Na+ influx and finally cause destruction of the insect cells.
Abstract: MCM10 protein is an essential replication factor involved in the initiation of DNA replication. An mcm10 mutant (mcm10-1) of budding yeast shows growth arrest at 37 degrees C. In the present work, we have isolated an mcm10-1 suppressor strain, which grows at 37 degrees C. Interestingly, this mcm10-1 suppressor undergoes cell cycle arrest at 14 degrees C. A novel gene, YLR003c, was identified by high-copy complementation of this suppressor. We named it Cms1 (Complementation of Mcm 10 Suppressor). Furthermore, transformation experiments show that cells of the mcm10-1 suppressor carrying a high-copy plasmid, but not a low-copy plasmid, grow at 14 degrees C, indicating that overexpression of Cms1 can rescue the growth arrest of this mcm10 suppressor at the non-permissive temperature. These results suggest that CMS1 protein may functionally interact with MCM10 protein and play a role in the regulation of DNA replication and cell cycle control.
Abstract: Machine intelligence is the intelligence exhibited by artificial systems, and it is usually realized on ordinary computers. Rough sets and information granules are widely used in uncertainty management, soft computing and granular computing, with applications in many fields, such as protein sequence analysis and biobasis determination, TSM, and Web service classification.
Funding: Supported by grants from the National Natural Science Foundation of China (No. 81274136), Xi'an Jiaotong University's Cross Project Funds (No. Xjj2012141), and the Talent Funds of the Second Affiliated Hospital of Xi'an Jiaotong University (No. RCCGG201105).
Abstract: Baicalein has been shown to have anti-cancer activity in vitro and in vivo, including the inhibition of malignant proliferation, migration, adhesion and invasion of many kinds of cancer cells. The special AT-rich sequence binding protein 1 (SATB1) is a tissue-specifically expressed nuclear matrix-binding protein and is reported to be a breast cancer "gene group organizer". Previous studies have shown that SATB1 is involved in the growth, metastasis and prognosis of breast cancer. The present study aimed to investigate whether baicalein inhibits the proliferation and migration of MDA-MB-231 human breast cancer cells through down-regulation of SATB1 expression. Methods: MDA-MB-231 cells were treated for 24 h, 48 h and 72 h with various concentrations of baicalein (0, 5, 10, 20, 40 and 80 pM). Then, the proliferation and migration of MDA-MB-231 cells following treatment with baicalein were determined using colorimetric 3-(4, 5-dimethylthiazol-2-yl) 2, 5-diphenyltetrazolium bromide (MTT) and wound healing assays. Thereafter, western blot analysis was performed to detect the changes in SATB1 protein expression in MDA-MB-231 cells. Results: With increasing treatment time and drug concentration, the inhibitory effect of baicalein on the proliferation and migration of MDA-MB-231 cells gradually increased, in a time- and dose-dependent manner (P < 0.05). Meanwhile, after treatment with baicalein at different concentrations for 48 h, the level of SATB1 protein expression in MDA-MB-231 cells decreased markedly, in a dose-dependent manner (P < 0.05). Conclusion: Baicalein inhibits breast cancer cell proliferation and suppresses invasion and metastasis by reducing cell migration, possibly through down-regulation of SATB1 protein expression, indicating that baicalein is a potential therapeutic agent for human breast cancer.
Abstract: A study on nontuberculous mycobacteria (NTM) was carried out in the wildlife-livestock interface of the Katavi Rukwa ecosystem (KRE). In total, 328 livestock tissues and samples from 178 wild animals were cultured; wild animals were sampled opportunistically during professional hunting and game cropping operations in the KRE protected areas. The objective of the study was to generate data on the epidemiology of NTM in the wildlife-livestock interface of the KRE. The methods used to identify the NTM were culture and isolation, polymerase chain reaction, 65-kilodalton heat shock protein (hsp65) analysis and sequencing. Mycobacteria were detected in 25.9% and 11.9% of livestock and wildlife tissue cultures, respectively. The NTM most frequently isolated were M. kansasii (30%), M. gastri (30%), M. fortuitum (1%), M. intracellulare (4%), M. indicus pranii (4%), M. nonchromogenicum (6%) and M. lentiflavum (6%). Other NTM found in smaller percentages were M. hibernae, M. engbaekii, M. septicum, M. arupense and M. godii. Due to the rise of NTM infection in both humans and animals, it is recommended that awareness and laboratory facilities be improved to curb underreporting, especially in TB-endemic countries. For species-specific identification, a network of national and regional laboratories is encouraged.
Funding: Financially supported by the Anhui Provincial Natural Science Foundation (Nos. 2022AH052316, GXXT-2022-002, KJ2021A1273).
Abstract: Gelatin is a product obtained through partial hydrolysis and thermal denaturation of collagen and belongs to the natural biopeptides. With irreplaceable biological functions, it has been widely applied in biomedical science and tissue engineering. The amino acid sequence of recombinant human-like gelatin was constructed from a newly designed hexamer composed of six protein monomer sequences in series, with the minimum repeating unit being the characteristic Gly-X-Y sequence found in the human type III collagen α1 chain. The nucleotide sequence was subsequently inserted into the genome of Pichia pastoris to enable soluble secretory expression of recombinant gelatin. At the shake flask fermentation level, the yield of recombinant gelatin reached 0.057 g/L, and its purity reached 95% after affinity purification. Molecular weight determination and amino acid analysis confirmed that the amino acid composition of the obtained recombinant gelatin is identical to the theoretical design. Furthermore, scanning electron microscopy revealed that the freeze-dried recombinant gelatin hydrogel exhibited a porous structure. After cells were cultured continuously within these gelatin microspheres for two days, followed by fluorescence staining and observation by confocal laser scanning microscopy, the cells were seen to cluster together within the gelatin matrix, exhibiting three-dimensional growth characteristics while maintaining good viability. This research presents promising prospects for developing recombinant gelatin as a biomedical material.
Abstract: Objective To develop a vaccine against schistosomiasis japonica in China, we prepared the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain by gene cloning techniques. Methods A pair of primers, P1 and P2, was synthesized according to the DNA sequence of the 23 kDa membrane protein of Schistosoma mansoni; a BamHI site was added at the 5' end of primer P1 and a SalI site was added at the 5' end of primer P2. The DNA fragment of the gene for the 23 kDa membrane protein (SjC23) of Schistosoma japonicum was amplified from a cDNA library of Schistosoma japonicum by PCR. The purified target DNA fragment was inserted into the vector pUC18/19 to form the recombinant, which was sequenced at Liverpool University, UK and Fudan University, China, respectively. The DNA sequence was analyzed with Dnasis software, and the amino acid sequence was deduced with SWISS-PROT software. Results The DNA of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain (SjC23) was 657 bp in size, the same as that of Sm23 and Sj23 of the Philippine strain. The DNA sequence of SjC23 showed 100% homology with Sj23 of the Philippine strain and 79.5% homology with Sm23. The deduced amino acid sequence of SjC23 showed 84% homology with Sm23 and 100% homology with the Philippine strain. There were two hydrophilic domains in SjC23, one located at the N terminus (amino acids 36-56) and the other at the C terminus (amino acids 108-183). Conclusions The gene of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain has been cloned, and this work has laid the foundation for the development of a vaccine against the Schistosoma japonicum Chinese strain.