Abstract
To advance the digitization of chemical molecular structure formulas, this work uses deep learning to recognize images of hand-drawn chemical structures as SMILES strings. First, a dataset of chemical structure images containing carbon, hydrogen, oxygen, and halogen atoms is built from both synthetic and real hand-drawn images. A combined EfficientNet and Transformer network is then trained in comparative experiments across different dataset sizes and different mixing ratios of synthetic to real hand-drawn images. The model trained on 100K images with a 9:1 ratio of synthetic to real hand-drawn images performs best, reaching a Tanimoto coefficient of 75.3% and a match rate of 60.7%.
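The two reported metrics can be illustrated with a minimal sketch. The abstract does not say how they were computed, so this makes two assumptions: the Tanimoto coefficient is taken over sets of "on" fingerprint bits (in practice a cheminformatics toolkit would produce these), and the match rate is the fraction of predicted SMILES strings that exactly equal the reference after both have been canonicalized. The function names `tanimoto` and `match_rate` are hypothetical and are not taken from the paper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits.

    Assumption: each molecule has already been converted to a collection of
    integer bit indices; the fingerprint type is not given in the abstract.
    """
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(union)


def match_rate(predicted, reference):
    """Fraction of predictions that exactly equal their reference string.

    Assumption: both lists hold canonical SMILES, so string equality
    stands in for molecular identity.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Toy, hand-written bit sets: 2 shared bits out of 4 distinct bits.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    # One of the two predictions matches its reference exactly.
    print(match_rate(["CCO", "CC"], ["CCO", "CO"]))
```

Under these assumptions, a Tanimoto coefficient averaged over a test set gives partial credit for near-miss predictions, while the match rate only counts fully correct structures. That is consistent with the reported Tanimoto value being higher than the match rate.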
Authors
罗泱鸿
张成朋
耿舒琪
张婉佳
周佳钰
陶家俊
刘伟
LUO Yanghong; ZHANG Chengpeng; GENG Shuqi; ZHANG Wanjia; ZHOU Jiayu; TAO Jiajun; LIU Wei (Hunan University of Chinese Medicine, Changsha, Hunan 410208, China)
Source
《信息与电脑》
2024, No. 22, pp. 73-76 (4 pages)
Information & Computer
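Funding
Hunan Provincial Undergraduate Innovation and Entrepreneurship Training Program (Project No. 2023-2365)
General Program of the Hunan Provincial Natural Science Foundation (Project No. 2022JJ30438).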