The integration of genotypic and environmental data can enhance genomic prediction accuracy for crop field traits. Existing genomic prediction methods fail to consider environmental factors and the real growth environments of crops, resulting in low genomic prediction accuracy. In this work, we developed GEFormer, a genotype-environment interaction genomic prediction method that integrates gating multilayer perceptron (gMLP) and linear attention mechanisms. First, GEFormer uses gMLP to extract local and global features among SNPs. Then, Omni-dimensional Dynamic Convolution is used to extract the dynamic and comprehensive features of multiple environmental factors within each day, taking into consideration the real growth pattern of crops. A linear attention mechanism is used to capture the temporal features of environmental changes. Finally, GEFormer uses a gating mechanism to effectively fuse the genomic and environmental features. We examined the accuracy of GEFormer for predicting important agronomic traits of maize, rice, and wheat under three experimental scenarios: untested genotypes in tested environments, tested genotypes in untested environments, and untested genotypes in untested environments. The results showed that GEFormer outperforms six cutting-edge statistical learning methods and four machine learning methods, with especially large advantages in the scenario of untested genotypes in untested environments. In addition, we used GEFormer for three real-world breeding applications: phenotype prediction in unknown environments, hybrid phenotype prediction using an inbred population, and cross-population phenotype prediction. The results showed that GEFormer had better prediction performance in actual breeding scenarios and could be used to assist in crop breeding.
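The three evaluation scenarios named in the GEFormer abstract depend only on whether a held-out record's genotype and environment were seen during training. The sketch below shows one way to label held-out records accordingly. The function name, the string labels, and the example identifiers ("G1", "E1", and so on) are hypothetical and are not taken from the paper.

```python
def scenario(genotype, environment, train_genotypes, train_environments):
    """Label a held-out (genotype, environment) record by evaluation scenario.

    The label strings are illustrative; the abstract names only the three
    scenarios in which at least one of the two factors is untested.
    """
    genotype_seen = genotype in train_genotypes
    environment_seen = environment in train_environments
    if not genotype_seen and environment_seen:
        return "untested genotype, tested environment"
    if genotype_seen and not environment_seen:
        return "tested genotype, untested environment"
    if not genotype_seen and not environment_seen:
        return "untested genotype, untested environment"
    # Both factors were seen in training: not one of the three scenarios.
    return "tested genotype, tested environment"


# Hypothetical training sets and held-out records.
train_genotypes = {"G1", "G2"}
train_environments = {"E1"}
held_out = [("G3", "E1"), ("G1", "E2"), ("G3", "E2")]
labels = [scenario(g, e, train_genotypes, train_environments) for g, e in held_out]
```

Grouping held-out records this way keeps the hardest case, where both the genotype and the environment are new, separate from the two easier ones when accuracy is reported.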
In this paper, we introduce TianXing, a transformer-based data-driven model designed with physical augmentation for skillful and efficient global weather forecasting. Previous data-driven transformer models such as Pangu-Weather, FengWu, and FuXi have emerged as promising alternatives to numerical weather prediction. However, these models consume substantial computational resources during training and incorporate little explicit physical guidance in their modeling frameworks. In contrast, TianXing applies a linear complexity mechanism that scales proportionally with input data size while significantly reducing GPU resource demands, with only a marginal compromise in accuracy. Furthermore, TianXing proposes an explicit attention decay mechanism in the linear attention, derived from physical insights, to enhance its forecasting skill. The mechanism reweights attention based on Earth's spherical distances and learned sparse multivariate coupling relationships, prompting TianXing to prioritize dynamically relevant neighboring features. Finally, to enhance its performance in medium-range forecasting, TianXing employs a stacked autoregressive forecast algorithm. The model's architecture is validated using ERA5 reanalysis data at a 5.625° latitude-longitude resolution, while a high-resolution dataset at 0.25° is used to train the actual forecasting model. Notably, TianXing exhibits excellent performance, particularly in the Z500 (geopotential height) and T850 (temperature) fields, surpassing previous data-driven models and operational full-resolution models such as NCEP GFS and ECMWF IFS, as evidenced by latitude-weighted RMSE and ACC metrics. Moreover, TianXing has demonstrated remarkable capabilities in predicting extreme weather events, such as typhoons.
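The latitude-weighted RMSE mentioned in the TianXing abstract can be illustrated with a short sketch. It assumes the convention common in data-driven weather benchmarks, where each grid row is weighted by the cosine of its latitude, normalised so the weights average to one. The abstract does not spell out the formula, so treat the weighting below as an assumption. The function names and the tiny grid are hypothetical.

```python
import math


def latitude_weights(lats_deg):
    """cos(latitude) weights, normalised so that their mean is 1."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a lat x lon grid, with each row weighted by its latitude.

    forecast and truth are lists of rows; row i lies at latitude lats_deg[i].
    """
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for row_f, row_t, w in zip(forecast, truth, weights):
        for f, t in zip(row_f, row_t):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Hypothetical 3 x 2 grid in which the forecast is off by 2 units everywhere.
lats = [-60.0, 0.0, 60.0]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
forecast = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
score = lat_weighted_rmse(forecast, truth, lats)
```

Because the weights average to one, a uniform error is reported unchanged. Errors near the poles, where cells on a regular latitude-longitude grid cover less area, count for less than the same errors near the equator.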
Radiation dose reduction in computed tomography (CT) can be achieved by decreasing the number of projections. However, CT images reconstructed from sparse-view projections with the filtered back projection algorithm often contain severe streak artifacts, which affect clinical diagnosis. To address this issue, this paper proposes TransitNet, an iterative unrolling deep neural network that combines model-driven data consistency, a physical a priori constraint, with deep learning's feature extraction capabilities. TransitNet employs a novel iterative architecture that implements flexible physical constraints through learnable data consistency operations, uses the Transformer's self-attention mechanism to model long-range dependencies in image features, and introduces linear attention mechanisms to reduce the computational complexity of self-attention from quadratic to linear. Extensive experiments demonstrate that this method exhibits significant advantages in both reconstruction quality and computational efficiency, effectively suppressing streak artifacts while preserving the structures and details of images.
The cross-view matching of local image features is a fundamental task in visual localization and 3D reconstruction. This study proposes FilterGNN, a transformer-based graph neural network (GNN), aiming to improve the matching efficiency and accuracy of visual descriptors. Based on high matching sparseness and coarse-to-fine covisible area detection, FilterGNN utilizes cascaded optimal graph-matching filter modules to dynamically reject outlier matches. Moreover, we successfully adapted linear attention in FilterGNN with post-instance normalization support, which significantly reduces the complexity of complete graph learning from O(N^2) to O(N). Experiments show that FilterGNN requires only 6% of the time cost and 33.3% of the memory cost compared with SuperGlue under a large-scale input size and achieves a competitive performance in various tasks, such as pose estimation, visual localization, and sparse 3D reconstruction.
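The drop from O(N^2) to O(N) that several of these abstracts attribute to linear attention can be seen in a small sketch. It assumes the generic kernelised formulation with the feature map phi(x) = elu(x) + 1, which is not necessarily the variant used in FilterGNN or any other paper listed here. All function names are hypothetical. The quadratic form scores every query against every key. The linear form summarises keys and values once in a d x d_v matrix and reuses that summary for each query.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed, commonly used choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    """Reference form: computes all N x N similarity scores explicitly."""
    out = []
    for q in Q:
        fq = phi(q)
        scores = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same output, but the work grows linearly with the number of tokens."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]   # sum over j of phi(k_j) v_j^T
    ksum = [0.0] * d                      # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d):
            ksum[i] += fk[i]
            for c in range(dv):
                kv[i][c] += fk[i] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        norm = sum(fq[i] * ksum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][c] for i in range(d)) / norm
                    for c in range(dv)])
    return out


# Hypothetical toy inputs: 2 queries, 3 keys/values, feature size 2.
Q = [[0.5, -1.0], [1.5, 0.2]]
K = [[0.1, 0.3], [-0.7, 0.9], [1.2, -0.4]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

The two functions agree up to floating-point rounding because matrix multiplication is associative: phi(Q) (phi(K)^T V) equals (phi(Q) phi(K)^T) V. The saving comes from never forming the N x N score matrix.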
As image manipulation technology advances rapidly, the malicious use of image tampering has escalated alarmingly, posing a significant threat to social stability. In image tampering localization, accurately localizing tampered regions given limited samples, multiple tampering types, and various region sizes remains challenging. These issues impede the model's universality and generalization capability and degrade its performance. To tackle these issues, we propose FL-MobileViT, an improved MobileViT model devised for image tampering localization. Our proposed model uses a dual-stream architecture that independently processes the RGB and noise domains and captures richer traces of tampering through dual-stream integration. Meanwhile, the model incorporates the Focused Linear Attention mechanism within the lightweight network (MobileViT). This substitution significantly reduces computational complexity and resolves the homogeneity problems associated with traditional Transformer attention mechanisms, enhancing feature extraction diversity and improving the model's localization performance. To comprehensively fuse the results generated by both feature extractors, we introduce the ASPP architecture for multi-scale feature fusion. This facilitates more precise localization of tampered regions of various sizes. Furthermore, to bolster the model's generalization ability, we adopt a contrastive learning method and devise a joint optimization training strategy that leverages fused features and captures the disparities in feature distribution in tampered images. This strategy enables the learning of contrastive loss at various stages of the feature extractor and employs it as an additional constraint in conjunction with cross-entropy loss. As a result, overfitting is effectively alleviated, and the differentiation between tampered and untampered regions is enhanced. Experimental evaluations on five benchmark datasets (IMD-20, CASIA, NIST-16, Columbia, and Coverage) validate the effectiveness of our proposed model. The meticulously calibrated FL-MobileViT model consistently outperforms numerous existing general models in localization accuracy across diverse datasets, demonstrating superior adaptability.
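In streaming speech recognition, chunk-based recognition breaks parallelism and consumes considerable resources, while recognition that restricts the context of the self-attention mechanism struggles to capture all the information. This paper therefore proposes a lightweight end-to-end acoustic architecture, CFLASH-Transducer. To capture fine-grained local features, the lightweight FLASH (Fast Linear Attention with a Single Head) is combined with convolutional neural network blocks. The convolution block adopts the Inception V2 network to extract multi-scale local features of the speech signal. A Coordinate Attention mechanism then captures the positional information of the features and the correlations among channels. In addition, depthwise separable convolution is used for feature enhancement and smooth transitions between layers. To enable streaming audio processing, the RNN-T (Recurrent Neural Network Transducer) architecture is used for training and decoding. The global attention already computed for the current chunk is passed as a hidden variable to subsequent chunks, linking information across chunks; this preserves the parallelism and correlation of training, and the computational cost does not grow with sequence length. Trained and tested on the open-source dataset THCHS30, CFLASH-Transducer achieves a high recognition rate, and the accuracy loss of streaming recognition relative to offline recognition does not exceed 1%.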
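The idea of passing already-computed global attention from one chunk to the next can be sketched with a generic linear-attention state. This is not the FLASH layer or the CFLASH-Transducer implementation. It assumes a kernelised linear attention with the feature map elu(x) + 1, and the class and method names are hypothetical. The carried state has a fixed size (a d x d_v matrix plus a length-d vector), so the cost of each chunk does not depend on how much audio came before it.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class StreamingLinearAttention:
    """Keeps a fixed-size summary of all keys and values seen so far."""

    def __init__(self, d, dv):
        self.kv = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
        self.ksum = [0.0] * d                     # running sum of phi(k)

    def process_chunk(self, Q, K, V):
        """Fold one chunk into the state, then attend with that chunk's queries."""
        for k, v in zip(K, V):
            fk = phi(k)
            for i, fki in enumerate(fk):
                self.ksum[i] += fki
                for c, vc in enumerate(v):
                    self.kv[i][c] += fki * vc
        out = []
        for q in Q:
            fq = phi(q)
            norm = sum(a * b for a, b in zip(fq, self.ksum))
            out.append([sum(fq[i] * self.kv[i][c] for i in range(len(fq))) / norm
                        for c in range(len(self.kv[0]))])
        return out


# Hypothetical toy stream: 4 frames split into two chunks of 2 frames each.
K = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2], [0.0, 1.0]]
V = [[1.0], [2.0], [3.0], [4.0]]
Q = [[0.2, 0.1], [-0.3, 0.4]]

stream = StreamingLinearAttention(d=2, dv=1)
stream.process_chunk(Q, K[:2], V[:2])
streamed = stream.process_chunk(Q, K[2:], V[2:])

# Reference: the same queries attending to all four frames in one pass.
full = StreamingLinearAttention(d=2, dv=1).process_chunk(Q, K, V)
```

In this sketch the second chunk's output matches the single-pass result, which is the property that lets a streaming model keep earlier context without recomputing it.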
Hyperspectral Image (HSI) classification based on deep learning has been an attractive area in recent years. However, as data-driven algorithms, deep learning methods usually require substantial computational resources and high-quality labelled datasets, and both high-performance computing and data annotation are expensive. In this paper, to reduce the dependence on massive computation and labelled samples, we propose a deep Double-Channel dense network (DDCD) for hyperspectral image classification. Specifically, we design a 3D Double-Channel dense layer to capture the local and global features of the input. We also propose a Linear Attention Mechanism that approximates dot-product attention with much lower memory and computational costs. The number of parameters and the amount of computation are noticeably lower than those of competing deep learning methods, which means DDCD has a simpler architecture and higher efficiency. A series of quantitative experiments on 6 widely used hyperspectral datasets shows that the proposed DDCD obtains state-of-the-art performance, even when labelled samples are severely lacking.
Funding (GEFormer): supported by the Biological Breeding-National Science and Technology Major Project (2023ZD04076), the Hubei Provincial Natural Science Foundation (2023AFB832), the Natural Science Foundation of Guizhou Province Science and Technology Agency (ZK[2025]096), the Major Project of Hubei Hongshan Laboratory (2022HSZD031), and the Yingzi Tech & Huazhong Agricultural University Intelligent Research Institute of Food Health (IRIFH202209).
Funding (TianXing): supported in part by the Meteorological Joint Funds of the National Natural Science Foundation of China under Grant U2142211, in part by the National Natural Science Foundation of China under Grants 42075141 and 42341202, in part by the National Key Research and Development Program of China under Grant 2020YFA0608000, in part by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), and in part by the Fundamental Research Funds for the Central Universities.
Funding (TransitNet): National Natural Science Foundation of China under grant 62071281; Local Science and Technology Development Fund Project Guided by the Central Government under grant YDZJSX2021A003.
Funding (FilterGNN): supported by the National Natural Science Foundation of China (Grant No. 62220106003) and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
Funding (FL-MobileViT): This study was funded by the Science and Technology Project in Xi'an (No. 22GXFW0123) and supported by the Special Fund Construction Project of Key Disciplines in Ordinary Colleges and Universities in Shaanxi Province. The authors thank the anonymous reviewers for their helpful comments and suggestions.
Funding (DDCD): National Natural Science Foundation of China (41671452); China Postdoctoral Science Foundation Funded Project (2017M612510).