Funding: Supported by grants from the National Natural Science Foundation of China (12361104), Yunnan Fundamental Research Projects (202301AT070016, 202401AT070036), the Youth Talent Program of Xingdian Talent Support Plan (XDYC-QNRC-2022-0514), the Yunnan Province International Joint Laboratory for Intelligent Integration and Application of Ethnic Multilingualism (202403AP140014), and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University (SMDAYB2023004).
Abstract:
Objective: N6-methyladenosine (m6A), the most prevalent epigenetic modification in eukaryotic RNA, plays a pivotal role in regulating cellular differentiation and developmental processes, and its dysregulation is implicated in diverse pathological conditions. Accurate prediction of m6A sites is critical for elucidating their regulatory mechanisms and informing drug development. However, traditional experimental methods are time-consuming and costly. Although various computational approaches have been proposed, challenges remain in feature learning, predictive accuracy, and generalization. Here, we present m6A-PSRA, a dual-branch residual-network-based predictor that fully exploits RNA sequence information to enhance prediction performance and model generalization.
Methods: m6A-PSRA adopts a parallel dual-branch network architecture that extracts RNA sequence features via two independent pathways. The first branch applies one-hot encoding to transform the RNA sequence into a numerical matrix while strictly preserving positional information and sequence continuity, so that the biological context conveyed by nucleotide order is retained. A bidirectional long short-term memory network (BiLSTM) then processes the encoded matrix, capturing both forward and backward dependencies between bases to resolve contextual correlations. The second branch employs a k-mer tokenization strategy (k=3), decomposing the sequence into overlapping 3-mer subsequences to capture local sequence patterns. A pre-trained Doc2vec model maps these subsequences into fixed-dimensional vectors, reducing feature dimensionality while extracting latent global semantic information via context learning. Both branches integrate residual networks (ResNet) and a self-attention mechanism: ResNet mitigates vanishing gradients through skip connections, preserving feature integrity, while self-attention adaptively assigns weights to focus on the sequence regions most relevant to methylation prediction. This synergy enhances both feature learning and generalization capability.
Results: Across 11 tissues from humans, mice, and rats, m6A-PSRA consistently outperformed existing methods in accuracy (ACC) and area under the curve (AUC), achieving ACC above 90% and AUC above 95% in every tissue tested, indicating strong cross-species and cross-tissue adaptability. Validation on independent datasets, including three human cell lines (MOLM1, HEK293, A549) and a long-sequence dataset (m6A_IND, 1001 nt), confirmed stable performance across varied biological contexts and sequence lengths. Ablation studies demonstrated that the dual-branch architecture, residual network, and self-attention mechanism each contribute critically to performance, with their combination reducing interference between pathways. Motif analysis revealed that m6A sites are enriched in guanine (G) and cytosine (C), consistent with known regulatory patterns, supporting the model’s biological plausibility.
Conclusion: m6A-PSRA effectively captures RNA sequence features, achieving high prediction accuracy and robust generalization across tissues and species, and provides an efficient computational tool for m6A methylation site prediction.
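The two input encodings described in the Methods of the abstract above (position-preserving one-hot encoding for the first branch, overlapping 3-mer tokenization for the second) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the A/C/G/U column order, the T-to-U normalization, the all-zero row for unknown bases, and the example fragment are all assumptions made for illustration. The abstract does not specify these details, nor which Doc2vec implementation consumes the 3-mer tokens.

```python
from typing import List

# Column order of the one-hot matrix; A, C, G, U is an assumed convention.
NUCLEOTIDES = "ACGU"


def normalize(seq: str) -> str:
    """Upper-case the sequence and map DNA-style T to U (an assumption)."""
    return seq.upper().replace("T", "U")


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return one row per base, in sequence order, with a single 1 per row.

    Row i corresponds to position i, so nucleotide order is preserved.
    Bases outside A/C/G/U (for example N) become an all-zero row.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in normalize(seq)]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers using a stride of 1.

    A sequence shorter than k yields an empty list.
    """
    seq = normalize(seq)
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    example = "GGACU"  # hypothetical 5-nt fragment, not from the paper's data
    for base, row in zip(example, one_hot_encode(example)):
        print(base, row)
    print(kmer_tokens(example))
```

In a pipeline of the kind the abstract describes, the one-hot matrix would feed a BiLSTM and the 3-mer token list would be passed to a document-embedding model; both downstream steps are omitted here because the abstract gives no layer sizes or training settings.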
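Abstract: To systematically review the development, shifting hotspots, and frontier directions of dissolved organic carbon research from 2004 to 2023, 3003 relevant publications from that period were retrieved and screened from the Web of Science and CNKI databases. Bibliometric tools including VOSviewer and CiteSpace were used to build visual knowledge maps covering publishing countries, institutions, keywords, and other dimensions. The results show that the number of papers on dissolved organic carbon has increased year by year, and that the Chinese Academy of Sciences is the institution with the most publications in the field. The United States and China dominate global research output (748 papers from the United States and 556 from China) and have formed a close international collaboration network. Research hotspots center on agricultural soils and environmental protection; the scope of research has expanded from soil environments to aquatic systems such as rivers, lakes, and oceans, with growing attention to key issues such as climate change impacts and bioavailability. This study systematically identifies the core contributors, the spatiotemporal evolution of hot topics, and the key literature in dissolved organic carbon research, providing a bibliometric basis and practical reference for clarifying the field's research trajectory and identifying future directions.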