Funding: Lanzhou Talent Innovation and Entrepreneurship Project (No. 2020-RC-14).
Abstract: Single nucleotide polymorphism (SNP) is an important factor in the study of genetic variation in human families and in animal and plant strains, and it is therefore widely used in population genetics and in the study of disease-related genes. In pharmacogenomics research, identifying associations between SNP sites and drugs is key to precision medication in the clinic. A predictive model of SNP site and drug association based on a denoising variational auto-encoder (DVAE-SVM) is therefore proposed. First, the k-mer algorithm is used to construct the initial feature vector of each SNP site, while MACCS molecular fingerprints are used to generate the feature vector of the drug module. Then, the DVAE extracts effective features from the initial SNP site feature vector. Finally, the effective SNP site feature vector and the drug module feature vector are fused and fed to a support vector machine (SVM) to predict the relationship between SNP site and drug module. Five-fold cross-validation experiments indicate that the proposed algorithm performs better than random forest (RF) and logistic regression (LR) classifiers. Further experiments show that it also gives better prediction results than the feature extraction algorithms principal component analysis (PCA), denoising auto-encoder (DAE), and variational auto-encoder (VAE).
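The k-mer step named in this abstract can be illustrated with a minimal sketch. The abstract does not give the value of k, the normalization, or the length of the sequence window around each SNP site, so the function name `kmer_vector`, the default `k=3`, the DNA alphabet, and the frequency normalization below are all assumptions for illustration, not the authors' implementation. The DVAE, MACCS fingerprint, and SVM stages are not shown.

```python
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Assumption: overlapping windows with stride 1; windows that contain
    symbols outside the alphabet (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical flanking sequence around an SNP site.
    vec = kmer_vector("ACGTTGCAACGT", k=3)
    print(len(vec))  # 4**3 = 64 features
```

With a four-letter alphabet the vector has 4^k entries, so the choice of k directly sets the input dimension that a downstream feature extractor would have to compress.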
Funding: Supported by the National Natural Science Foundation of China (No. 52272390), the Natural Science Foundation of Heilongjiang Province of China (No. YQ2022A009), and the Shanghai Sailing Program, China (No. 20YF1417300).
Abstract: Real-time 6 Degree-of-Freedom (DoF) pose estimation is of paramount importance for various on-orbit tasks. Benefiting from the development of deep learning, Convolutional Neural Networks (CNNs) have yielded impressive achievements in feature extraction for spacecraft pose estimation. To improve the robustness and interpretability of CNNs, this paper proposes a Pose Estimation approach based on a Variational Auto-Encoder structure (PE-VAE) and a Feature-Aided pose estimation approach based on a Variational Auto-Encoder structure (FA-VAE), both of which aim to accurately estimate the 6-DoF pose of a target spacecraft. Both methods treat the pose vector as latent variables and employ an encoder-decoder network with a Variational Auto-Encoder (VAE) structure. To enhance the precision of pose estimation, PE-VAE uses the VAE structure to introduce a reconstruction mechanism over the whole image. Furthermore, FA-VAE enforces feature shape constraints by reconstructing only the segment of the target spacecraft with the desired shape. Comparative evaluation against leading methods on public datasets reveals similar accuracy with a threefold improvement in processing speed, showcasing the significant contribution of VAE structures to accuracy enhancement and the additional benefit of incorporating global shape prior features.
Abstract: The influenza virus changes its antigenicity frequently due to rapid mutations, leading to immune escape and vaccine failure. Rapid determination of influenza antigenicity could help identify antigenic variants in time. Here, we built a stacked auto-encoder (SAE) model for predicting antigenic variants of human influenza A(H3N2) viruses based on hemagglutinin (HA) protein sequences. The model achieved an accuracy of 0.95 in five-fold cross-validation, outperforming a logistic regression model. Further analysis of the model shows that most of the active nodes in the hidden layer reflect the combined contribution of multiple residues to antigenic variation. In addition, some features (residues on the HA protein) in the input layer were observed to take part in multiple active nodes, such as residues 189, 145, and 156, which have also been reported to largely determine the antigenic variation of influenza A(H3N2) viruses. Overall, this work is not only useful for rapidly identifying antigenic variants in influenza prevention, but also an interesting attempt at inferring the mechanisms of a biological process through analysis of an SAE model, which may give some insights into interpretation of the deep learning
Funding: Supported by the National Key Research and Development Program of China (No. 2024YFD1201500), the Key Research and Development Program of Jiangsu Province, China (Nos. BE2022337, BE2023302, and BE2023315), and the National Innovation Center for Digital Seed Industry, Beijing, China, 100097.
Abstract: Plant breeding is a cornerstone of agricultural productivity and food security. Genomic Selection opens a new era in breeding by harnessing whole-genome variation for genomic prediction, an approach that does not require prior knowledge of the genes associated with specific traits. Nonetheless, the vast dimensionality of genomic data, set against the relatively limited number of phenotypic samples, often leads to the "curse of dimensionality", where traditional statistical, machine learning, and deep learning methods are prone to overfitting and suboptimal predictive performance. To overcome this challenge, we introduce a unified Variational auto-encoder based Multi-task Genomic Prediction model (VMGP) that integrates self-supervised genomic compression and reconstruction with multiple prediction tasks. The resulting predictive framework has been validated across public datasets for wheat, rice, and maize. Our model performs well in multi-phenotype and multi-environment genomic prediction and handles the complexities of cross-population genomic selection. Furthermore, by integrating VMGP with model interpretability, we can effectively triage relevant single nucleotide polymorphisms, thereby enhancing prediction performance and proposing potential cost-effective genotyping solutions. The VMGP framework, with its simplicity, stable predictive performance, and open-source code, is well suited for broad dissemination within plant breeding programs. It is particularly advantageous for breeders who prioritize phenotype prediction yet may not possess extensive knowledge of deep learning or proficiency in parameter tuning.
Funding: Sponsored by the Science and Technology Program of Sichuan Province (2024ZDZX0035 and 2024ZHCG0072).
Abstract: The Proton Exchange Membrane Fuel Cell (PEMFC) converts the chemical energy of hydrogen fuel directly into electrical energy and has broad application prospects. Understanding how current density is distributed in PEMFC systems is crucial, as it is a key factor influencing system performance. However, direct modeling of the current distribution may run into the curse of dimensionality owing to the high dimensionality of the data. This paper uses a high-resolution segmented measurement device with 396 points to conduct experimental tests on the current distribution of a PEMFC with a reactive area of 406 cm² during a stepwise increase in load current. The current distribution is modeled based on the test results to learn the mapping between the experimental parameters and the current distribution. The proposed model uses a Conditional Variational Auto-Encoder (CVAE) to generate current distributions. The Mean-Square Error (MSE) of the trained CVAE model reaches 9.2×10⁻⁵, and the comparison results show that the current distribution at 222.9 A has the largest error, with an MSE of 6.36×10⁻⁴ and a Kullback-Leibler (KL) divergence of 9.55×10⁻⁴, both of which are low. The model enables direct determination of the current distribution from the experimental parameters, thereby establishing a technical foundation for investigating the impact of experimental conditions on fuel cells. It is also of great significance for research on fuel cell system control strategies and fault diagnosis.
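The two error measures reported in this abstract can be computed as in the following minimal sketch. The abstract does not say how the measured and generated current maps were scaled before comparison, so treating each map as a non-negative vector that is normalized to sum to 1 for the KL divergence is an assumption, as are the function names `mse` and `kl_divergence` and the small clamp `eps`.

```python
import math


def mse(pred, target):
    """Mean-square error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two non-negative vectors.

    Assumption: both vectors are normalized to sum to 1 first, so a
    segmented current map is read as a discrete distribution over segments.
    """
    sum_p, sum_q = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a, b = a / sum_p, b / sum_q
        if a > 0:
            # Clamp q away from zero to avoid division by zero.
            total += a * math.log(a / max(b, eps))
    return total


if __name__ == "__main__":
    # Hypothetical 4-segment maps; the device in the abstract has 396 points.
    measured = [0.9, 1.1, 1.0, 1.0]
    generated = [1.0, 1.0, 1.0, 1.0]
    print(mse(generated, measured), kl_divergence(measured, generated))
```

MSE compares the maps point by point in their original units, whereas the KL divergence compares only the shape of the distribution once the overall scale is normalized away, which is why reporting both is informative.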
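Abstract: To address the poor model generalization and low diagnostic accuracy caused by imbalanced sample distributions in rolling bearing fault diagnosis, this study works on two fronts: (1) fault sample augmentation, proposing a VAE-GAN sample augmentation model that combines a variational auto-encoder (VAE) with a generative adversarial network (GAN); and (2) an improved classification algorithm, proposing FLCNN (focal loss and convolutional neural network), a sample classification model based on focal loss (FL) and a convolutional neural network (CNN). On this basis, VAE-GAN and FLCNN are combined into a VAE-GAN+FLCNN bearing fault diagnosis model. First, the fault classes with few samples are fed into the VAE-GAN model, which learns the data distribution of real fault samples by alternately training the encoder, generator, and discriminator networks, thereby augmenting the fault samples. The augmented samples are then used to train the FLCNN classification model, which performs bearing fault identification. Comparative experiments show that the proposed method effectively improves bearing fault diagnosis under imbalanced sample conditions, achieving higher recall and F1-score values.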
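The focal loss used by the FLCNN classifier can be sketched for a single sample as follows. This is the standard focal loss form, FL(p_t) = -α (1 - p_t)^γ log(p_t). The abstract does not state the values of γ or α used in the paper, so the defaults below and the function name `focal_loss` are assumptions for illustration only.

```python
import math


def focal_loss(probs, label, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    probs: predicted class probabilities (for example a softmax output).
    label: index of the true class.
    gamma, alpha: focusing and weighting parameters (assumed values).
    """
    p_t = max(probs[label], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [0.9, 0.1]  # confidently correct prediction for class 0
    hard = [0.3, 0.7]  # misclassified sample whose true class is 0
    print(focal_loss(easy, 0), focal_loss(hard, 0))
```

With γ = 0 the expression reduces to ordinary cross-entropy. A larger γ shrinks the loss of well-classified samples much more than that of hard ones, which is what makes the loss a natural fit for imbalanced fault classes.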
Abstract: Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of explainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on them. This paper contributes a systematic examination of the impact that different combinations of variational auto-encoder models (measureVAE and adversarialVAE), configurations of latent space in the AI model (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date, there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that measureVAE has better reconstruction performance than adversarialVAE, which in turn has better musical attribute independence. Results demonstrate that measureVAE was able to generate music across music genres with interpretable musical dimensions of control, and that it performs best with low-complexity music such as pop and rock. We recommend that a 32 or 64 latent dimensional space is optimal for 4 regularised dimensions when using measureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music and can be used to help select and configure AI models, musical features, and datasets for more understandable generation of music.