Abstract
Codon optimization enhances heterologous gene expression by modulating synonymous codon usage, a critical task in genetic engineering and synthetic biology. Achieving optimal expression requires balancing multiple interdependent factors, such as host codon bias, GC content and mRNA secondary structure, which turns optimization into a challenging multiobjective problem. Here, we introduce DeepCodon, a novel deep learning tool focused on preserving functionally important rare codon clusters, which previous methods often overlook. Using Escherichia coli as the host species for gene expression, a protein-to-CDS translation model was first trained on 1.5 million natural Enterobacteriaceae sequences and then fine-tuned on highly expressed genes. To protect functionally important rare codon clusters, we integrated a conditional probability strategy that preserves conserved rare codons. Compared with conventional approaches, DeepCodon generates sequences that better match host preferences, achieves superior in silico metrics and maintains critical rare codons. Experimental validation of seven low-yield P450s and thirteen AI-designed G3PDHs in E. coli revealed that DeepCodon outperformed traditional methods in nine cases. These results demonstrate DeepCodon's potential as a practical solution for codon optimization.
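To make the idea of preserving rare codons concrete, the following is a minimal sketch and not DeepCodon's model: it replaces the learned translation model with a simple most-frequent-codon rule and keeps the native codon at positions flagged for preservation. The usage table `TOY_USAGE`, the function names, and the `preserve` argument are all hypothetical, and the frequencies are illustrative rather than measured E. coli values.

```python
# Toy codon usage table: hypothetical frequencies, NOT real E. coli data.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "CTA": 0.04},
    "R": {"CGT": 0.38, "AGG": 0.02},
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def optimize_preserving(protein: str, native_cds: str, preserve: set) -> str:
    """Pick the most frequent codon per residue, except at preserved positions.

    `preserve` holds 0-based residue indices whose native codon is kept
    unchanged (an assumption standing in for conserved rare codons).
    """
    codons = [native_cds[i:i + 3] for i in range(0, len(native_cds), 3)]
    if len(codons) != len(protein):
        raise ValueError("CDS length must be three times the protein length")
    out = []
    for i, (aa, native) in enumerate(zip(protein, codons)):
        if i in preserve:
            out.append(native)  # keep the native (possibly rare) codon
        else:
            table = TOY_USAGE[aa]
            out.append(max(table, key=table.get))  # most frequent codon
    return "".join(out)


if __name__ == "__main__":
    cds = optimize_preserving("MKLR", "ATGAAGCTAAGG", preserve={2})
    print(cds, round(gc_content(cds), 3))
```

In this toy case the rare leucine codon CTA at index 2 is retained while the other positions switch to the most frequent synonymous codon.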
Funding
Strategic Priority Research Program of the Chinese Academy of Sciences (XDC0120200)
National Natural Science Foundation of China (32371499, 12326611)
COMSATS Joint Center for Industrial Biotechnology (No. TSBICIP-IJCP-001)
Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (Nos. TSBICIP-IJCP-002, TSBICIP-CYFH-011, TSBICIP-KJGG-009-02, TSBICIP-KJGG-008-02, TSBICIP-KJGG-018, TSBICIP-PTJJ-012)
Major research projects of the Haihe Laboratory of Synthetic Biology (Nos. 22HHSWSS00005 and 22HHSWSS00004)