Funding: the National Key Research and Development Program of China (2020YFA0309600); the National Natural Science Foundation of China (NSFC; 61888102, 11834017, and 12074413); the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS; XDB30000000 and XDB33000000); Gen Long acknowledges support from NSFC (12104330); support from the Elemental Strategy Initiative conducted by MEXT, Japan (JPMXP0112101001); JSPS KAKENHI (19H05790, 20H00354, and 21H05233); A3 Foresight by JSPS.
Funding: supported by the National Science and Technology Major Project (Grant No. 2023ZD0120702); the Basic Research Program of Jiangsu (BK20231215); the National Natural Science Foundation of China (Grant No. 82401075); the Natural Science Foundation of Jiangsu Province Major Project (BK20232012).
Abstract: Nuclear magnetic resonance (NMR) spectroscopy is a key method for molecular structure elucidation. However, interpreting NMR spectra to deduce molecular structures remains challenging because of the complexity of spectral data and the vastness of chemical space. Here we introduce DiffNMR, a novel end-to-end framework that uses a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra. DiffNMR refines molecular graphs iteratively through a diffusion-based generative process, which ensures global consistency and mitigates the error accumulation inherent in autoregressive methods. The framework integrates a two-stage pretraining strategy that aligns spectral and molecular representations via a diffusion autoencoder and contrastive learning. It also incorporates retrieval initialization and similarity filtering during inference. Our experimental results demonstrate that DiffNMR achieves competitive performance for NMR-based structure elucidation and, in particular, outperforms autoregressive models in domain generalization and robustness, thereby offering an efficient and robust solution for automated molecular analysis.
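The abstract names retrieval initialization and similarity filtering but does not say how they are computed. The sketch below is only an illustration of the general idea, not DiffNMR's implementation. It assumes that spectra and molecules have already been embedded in a shared vector space (the kind of alignment contrastive learning aims for) and that closeness is measured by cosine similarity. The function names `retrieve_init` and `similarity_filter`, the SMILES keys, and all embedding values are hypothetical, and the diffusion refinement between the two steps is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_init(spectrum_emb, library):
    """Hypothetical retrieval initialization: pick the library molecule whose
    embedding is closest to the spectrum embedding as a starting point."""
    return max(library, key=lambda name: cosine(spectrum_emb, library[name]))


def similarity_filter(spectrum_emb, candidates, top_k=1):
    """Hypothetical similarity filtering: rank generated candidates by cosine
    similarity to the spectrum embedding and keep the top_k."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(spectrum_emb, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy, made-up embeddings; a real system would obtain them from trained encoders.
spectrum = [1.0, 0.0, 1.0]
library = {"CCO": [0.9, 0.1, 0.8], "c1ccccc1": [0.0, 1.0, 0.0]}
start = retrieve_init(spectrum, library)

# Candidates that a generative model might propose (assumed, not real outputs).
candidates = {
    "CCO": [0.9, 0.1, 0.8],
    "CCN": [0.5, 0.5, 0.5],
    "CC=O": [1.0, 0.0, 0.9],
}
best = similarity_filter(spectrum, candidates, top_k=2)
print(start, best)
```

In this toy setup the retrieved starting point is the library entry nearest to the spectrum, and the filter returns candidates in descending order of similarity. Whether DiffNMR uses cosine similarity, a top-k cutoff, or a threshold is not stated in the abstract.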