Accurately predicting geomagnetic field is of great significance for space environment monitoring and space weather forecasting worldwide.This paper proposes a vision Transformer(ViT)hybrid model that leverages aurora...Accurately predicting geomagnetic field is of great significance for space environment monitoring and space weather forecasting worldwide.This paper proposes a vision Transformer(ViT)hybrid model that leverages aurora images to predict local geomagnetic station component,breaking the spatial limitations of geomagnetic stations.Our method utilizes the ViT backbone model in combination with convolutional networks to capture both the large-scale spatial correlation and distinct local feature correlation between aurora images and geomagnetic station data.Essentially,the model comprises a visual geometry group(VGG)image feature extraction network,a ViT-based encoder network,and a regression prediction network.Our experimental findings indicate that global features of aurora images play a more substantial role in predicting geomagnetic data than local features.Specifically,the hybrid model achieves a 39.1%reduction in root mean square error compared to the VGG model,a 29.5%reduction compared to the ViT model and a 35.3%reduction relative to the residual network(ResNet)model.Moreover,the fitting accuracy of the model surpasses that of the VGG,ViT,and ResNet models by 2.14%1.58%,and 4.1%,respectively.展开更多
针对服装风格人工分类受主观性、地域等因素影响而造成的分类错误问题,研究了一种基于人工智能的服装风格图像分类方法。首先,在FashionStyle14数据集基础上筛除重复或无效图像,构建服装风格图像数据集;然后,采用迁移学习方法,对Efficie...针对服装风格人工分类受主观性、地域等因素影响而造成的分类错误问题,研究了一种基于人工智能的服装风格图像分类方法。首先,在FashionStyle14数据集基础上筛除重复或无效图像,构建服装风格图像数据集;然后,采用迁移学习方法,对EfficientNet V2、RegNet Y 16GF和ViT Large 16等模型进行微调训练,生成新模型,实现基于单个深度学习的服装风格图像分类;最后,为进一步提高图像分类的准确性、可靠性和鲁棒性,分别采用基于投票、加权平均和堆叠的集成学习方法对上述单个模型进行组合预测。迁移学习实验结果表明,基于ViT Large 16的深度学习模型在测试集上表现最佳,平均准确率为77.024%;集成学习方法实验结果显示,基于投票的集成学习方法在相同测试集上平均准确率可达78.833%。研究结果为解决服装风格分类问题提供了新的思路。展开更多
在HBIM(Historic Building Information Modeling)数据库中进行信息查询面临三个问题:一是没有普适性的规则判断建筑之间的相似性;二是未考虑建筑本身所包含的历史文化信息;三是查询文本多基于关键词,难以检索到关键词未包含的信息。针...在HBIM(Historic Building Information Modeling)数据库中进行信息查询面临三个问题:一是没有普适性的规则判断建筑之间的相似性;二是未考虑建筑本身所包含的历史文化信息;三是查询文本多基于关键词,难以检索到关键词未包含的信息。针对以上问题,提出了一种面向历史建筑的多模态检索方法,用户能通过输入图像或自然语言文本数据,检索到与输入特征相符的建筑,并以列表形式进行排序。在以图像检索建筑时,利用“dino_vit16”模型对图像进行特征提取,所提出的图像-建筑检索方法检索精度达90.08%;在文本检索建筑时则基于CLIP(Contrastive Language-Image Pre-training)模型建立图像和文本的关联,研究了图文相似度和文本相似度权重的取值,选择m=0.6,n=0.4作为权重的最佳配置。实验证明所提出的文本-建筑检索算法对于包含某种外观特征查询语句的检索效果最好,对于描述某种功能和建筑风格的查询语句检索效果最差,而当查询语句中包含4个以上的混合特征,能够描述出建筑的基本面貌时,可以准确地检索到符合条件的建筑。展开更多
基金supported by the National Natural Science Foundation of China(No.41471381)the General Project of Jiangsu Natural Science Foundation(No.BK20171410)the Major Scientific and Technological Achievements Cultivation Fund of Nanjing University of Aeronautics and Astronautics(No.1011-XBD23002)。
文摘Accurately predicting geomagnetic field is of great significance for space environment monitoring and space weather forecasting worldwide.This paper proposes a vision Transformer(ViT)hybrid model that leverages aurora images to predict local geomagnetic station component,breaking the spatial limitations of geomagnetic stations.Our method utilizes the ViT backbone model in combination with convolutional networks to capture both the large-scale spatial correlation and distinct local feature correlation between aurora images and geomagnetic station data.Essentially,the model comprises a visual geometry group(VGG)image feature extraction network,a ViT-based encoder network,and a regression prediction network.Our experimental findings indicate that global features of aurora images play a more substantial role in predicting geomagnetic data than local features.Specifically,the hybrid model achieves a 39.1%reduction in root mean square error compared to the VGG model,a 29.5%reduction compared to the ViT model and a 35.3%reduction relative to the residual network(ResNet)model.Moreover,the fitting accuracy of the model surpasses that of the VGG,ViT,and ResNet models by 2.14%1.58%,and 4.1%,respectively.
文摘针对服装风格人工分类受主观性、地域等因素影响而造成的分类错误问题,研究了一种基于人工智能的服装风格图像分类方法。首先,在FashionStyle14数据集基础上筛除重复或无效图像,构建服装风格图像数据集;然后,采用迁移学习方法,对EfficientNet V2、RegNet Y 16GF和ViT Large 16等模型进行微调训练,生成新模型,实现基于单个深度学习的服装风格图像分类;最后,为进一步提高图像分类的准确性、可靠性和鲁棒性,分别采用基于投票、加权平均和堆叠的集成学习方法对上述单个模型进行组合预测。迁移学习实验结果表明,基于ViT Large 16的深度学习模型在测试集上表现最佳,平均准确率为77.024%;集成学习方法实验结果显示,基于投票的集成学习方法在相同测试集上平均准确率可达78.833%。研究结果为解决服装风格分类问题提供了新的思路。
文摘在HBIM(Historic Building Information Modeling)数据库中进行信息查询面临三个问题:一是没有普适性的规则判断建筑之间的相似性;二是未考虑建筑本身所包含的历史文化信息;三是查询文本多基于关键词,难以检索到关键词未包含的信息。针对以上问题,提出了一种面向历史建筑的多模态检索方法,用户能通过输入图像或自然语言文本数据,检索到与输入特征相符的建筑,并以列表形式进行排序。在以图像检索建筑时,利用“dino_vit16”模型对图像进行特征提取,所提出的图像-建筑检索方法检索精度达90.08%;在文本检索建筑时则基于CLIP(Contrastive Language-Image Pre-training)模型建立图像和文本的关联,研究了图文相似度和文本相似度权重的取值,选择m=0.6,n=0.4作为权重的最佳配置。实验证明所提出的文本-建筑检索算法对于包含某种外观特征查询语句的检索效果最好,对于描述某种功能和建筑风格的查询语句检索效果最差,而当查询语句中包含4个以上的混合特征,能够描述出建筑的基本面貌时,可以准确地检索到符合条件的建筑。