Abstract: Remote sensing data captured by satellites is easily degraded by suspended particles during imaging, which produces haze and severely reduces image clarity; remote sensing image dehazing (RSID) is therefore essential. Inspired by the recently emerged state space model (SSM), which offers excellent performance in modeling long-range dependencies at linear complexity, we design a remote sensing image dehazing technique based on the CSC-Mamba (Cross-Shaped Convolutional Mamba) vision model. Building on the SSM, an RSMamba module exploits this linear complexity to perform global context encoding, greatly reducing model complexity. Meanwhile, a CSwin module built from a convolutional neural network (CNN) and self-attention aggregates features along different directional domains to effectively perceive the spatially varying distribution of haze. In this way, CSC-Mamba extracts haze features more effectively and removes the influence of haze on remote sensing images. Experiments on the public SateHaze1K dataset show that the CSC-Mamba dehazing technique is both lightweight and highly effective at removing haze.
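As a rough illustration of the directional aggregation described above, the sketch below approximates a cross-shaped receptive field with depthwise 1 x k and k x 1 strip convolutions in PyTorch. The module name CrossStripSketch and its parameters are hypothetical, and the paper's CSwin module uses self-attention within cross-shaped windows rather than plain convolutions; this is only a minimal stand-in for the idea.

    import torch
    import torch.nn as nn

    class CrossStripSketch(nn.Module):
        """Cheap convolutional stand-in for cross-shaped window attention:
        aggregate context along horizontal and vertical strips."""
        def __init__(self, ch, k=7):
            super().__init__()
            self.h_strip = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2), groups=ch)
            self.v_strip = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0), groups=ch)
            self.mix = nn.Conv2d(ch, ch, 1)  # fuse the two directional domains

        def forward(self, x):
            # Summing the strip responses covers a cross-shaped neighborhood
            # around every pixel, tracking direction-dependent haze cues.
            return x + self.mix(self.h_strip(x) + self.v_strip(x))

For example, CrossStripSketch(32)(torch.randn(1, 32, 64, 64)) preserves the input shape while mixing horizontal and vertical context.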
Abstract: To address the low accuracy and poor timeliness of detecting small pedestrian targets against complex backgrounds, an improved Mamba-based method for small-pedestrian detection is proposed. First, the standard convolutions in the backbone are replaced with receptive-field attention convolution (RFAConv), whose dynamic receptive field improves the model's ability to capture multi-scale features while also improving computational efficiency. Second, an attention mechanism is integrated into the visual state space model (VSSM) to extract multi-scale features of small pedestrian targets. Finally, a feature enhancement module (FEM) and a bidirectional pyramid network in the neck perform multi-scale feature fusion. Experiments on the HIT-UAV dataset show that the improved Mamba model achieves 81.25% accuracy (measured by mAP@0.5), more than 15% higher than existing large models such as YOLOv5, YOLOv8, and YOLOv11.
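For reference, here is a minimal PyTorch sketch of the receptive-field attention idea, assuming the commonly published RFAConv formulation (softmax attention over each k x k receptive field, followed by a stride-k convolution over the re-tiled fields); it is illustrative, not the authors' implementation, and works for odd k.

    import torch
    import torch.nn as nn

    class RFAConvSketch(nn.Module):
        """Weight each k x k receptive field by a learned attention map,
        then fuse the weighted fields with a stride-k convolution."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.k = k
            # Gather the k*k receptive field around every pixel.
            self.unfold = nn.Unfold(kernel_size=k, padding=k // 2)
            # One attention logit per channel per field position.
            self.attn = nn.Conv2d(in_ch, in_ch * k * k, 1, groups=in_ch)
            # Stride-k conv consumes one re-tiled k x k patch per position.
            self.fuse = nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=k)

        def forward(self, x):
            b, c, h, w = x.shape
            k = self.k
            # Receptive-field features: (b, c, k*k, h, w).
            feat = self.unfold(x).view(b, c, k * k, h, w)
            # Softmax over the k*k positions of each field.
            attn = self.attn(x).view(b, c, k * k, h, w).softmax(dim=2)
            weighted = (feat * attn).view(b, c, k, k, h, w)
            # Re-tile each weighted field into a k x k patch: (b, c, h*k, w*k).
            weighted = weighted.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)
            return self.fuse(weighted)

With in_ch=64, out_ch=128, an input of shape (1, 64, 32, 32) yields (1, 128, 32, 32), so the layer drops in where a stride-1 standard convolution was used.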
Funding: supported by the National Natural Science Foundation of China (Nos. 42371449, 41801386).
Abstract: Change detection (CD) plays a crucial role in numerous fields, where both convolutional neural networks (CNNs) and Transformers have demonstrated exceptional performance in CD tasks. However, CNNs suffer from limited receptive fields, hindering their ability to capture global features, while Transformers are constrained by high computational complexity. Recently, the Mamba architecture, which is based on state space models (SSMs), has shown powerful global modeling capabilities while achieving linear computational complexity. Although some researchers have incorporated Mamba into CD tasks, existing Mamba-based remote sensing CD methods struggle to perceive the inherent locality of changed regions when flattening and scanning remote sensing images, which limits their extraction of change features. To address these issues, we propose a novel Mamba-based CD method, termed the difference feature fusion Mamba model (DFFMamba), which mitigates the loss of feature locality caused by traditional Mamba-style scanning. Specifically, two distinct difference feature extraction modules are designed: difference Mamba (DMamba) and local difference Mamba (LDMamba). DMamba extracts difference features by computing the difference between the coefficient matrices of the state-space equations of the bi-temporal features. Building upon DMamba, LDMamba adds a locally adaptive state-space scanning (LASS) strategy that enhances feature locality so as to extract difference features accurately. Additionally, a fusion Mamba (FMamba) module is proposed, which employs a spatial-channel token modeling SSM (SCTMS) unit to integrate multi-dimensional spatio-temporal interactions of change features, thereby capturing their dependencies across both the spatial and channel dimensions. To verify the effectiveness of the proposed DFFMamba, extensive experiments are conducted on three datasets: WHU-CD, LEVIR-CD, and CLCD. The results demonstrate that DFFMamba significantly outperforms state-of-the-art CD methods, achieving intersection over union (IoU) scores of 90.67%, 85.04%, and 66.56% on the three datasets, respectively.
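To make the coefficient-difference idea concrete, here is a deliberately naive sketch (a sequential scan with a scalar-gated state) of how differencing the input-dependent state-space coefficients of two temporal branches could drive a recurrence. All names (DMambaSketch, to_B, to_C) and the parameterization are hypothetical simplifications; the paper's DMamba builds on the full selective-scan formulation rather than this toy loop.

    import torch
    import torch.nn as nn

    class DMambaSketch(nn.Module):
        """Toy difference-Mamba: the scan is driven by the difference of the
        input-dependent coefficients computed from the two temporal inputs."""
        def __init__(self, dim, state=16):
            super().__init__()
            self.state = state
            self.to_B = nn.Linear(dim, state)              # input coefficients
            self.to_C = nn.Linear(dim, state)              # output coefficients
            self.log_A = nn.Parameter(torch.zeros(state))  # shared decay

        def scan(self, u, B, C):
            # u: (b, L, d) tokens; B, C: (b, L, n) per-token coefficients.
            A = torch.sigmoid(self.log_A)                  # decay kept in (0, 1)
            x = u.new_zeros(u.shape[0], self.state)        # hidden state (b, n)
            ys = []
            for t in range(u.shape[1]):
                # x_t = A * x_{t-1} + B_t * u_t ;  y_t = <C_t, x_t>
                x = A * x + B[:, t] * u[:, t].mean(dim=-1, keepdim=True)
                ys.append((C[:, t] * x).sum(dim=-1, keepdim=True))
            return torch.cat(ys, dim=1)                    # (b, L) change response

        def forward(self, t1, t2):
            # Coefficient matrices are differenced across the two time phases,
            # so the recurrence itself encodes bi-temporal change.
            dB = self.to_B(t1) - self.to_B(t2)
            dC = self.to_C(t1) - self.to_C(t2)
            return self.scan(t1 - t2, dB, dC)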
Funding: supported by the Outstanding Youth Science and Technology Innovation Team Project of Colleges and Universities in Hubei Province (Grant No. T201923), the Key Science and Technology Project of Jingmen (Grant Nos. 2021ZDYF024, 2022ZDYF019), and the Cultivation Project of Jingchu University of Technology (Grant No. PY201904).
Abstract: Brain tumors, among the most lethal diseases with low survival rates, require early detection and accurate diagnosis to enable effective treatment planning. While deep learning architectures, particularly convolutional neural networks (CNNs), have shown significant performance improvements over traditional methods, they struggle to capture the subtle pathological variations between different brain tumor types. Recent attention-based models have attempted to address this by focusing on global features, but they come with high computational costs. To address these challenges, this paper introduces a novel parallel architecture, ParMamba, which integrates convolutional attention patch embedding (CAPE) with a ConvMamba block that combines a CNN, Mamba, and a channel enhancement module. The design of the ConvMamba block enhances the model's ability to capture both local features and long-range dependencies, improving the detection of subtle differences between tumor types, while the channel enhancement module refines feature interactions across channels. Additionally, CAPE serves as a downsampling layer that extracts both local and global features, further improving classification accuracy. Experimental results on two publicly available brain tumor datasets demonstrate that ParMamba achieves classification accuracies of 99.62% and 99.35%, outperforming existing methods. Notably, ParMamba surpasses the vision transformer (ViT) by 1.37% in accuracy, with a throughput improvement of over 30%. These results demonstrate that ParMamba delivers superior performance while running faster than traditional attention-based methods.
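A minimal sketch of the parallel local/global topology with SE-style channel gating follows. The global branch here is a GRU stand-in purely to keep the example self-contained and runnable; ParMamba uses a Mamba SSM in that position, and the module name and layer choices are hypothetical.

    import torch
    import torch.nn as nn

    class ConvMambaBlockSketch(nn.Module):
        """Parallel local (conv) and global (sequence) branches plus SE-style
        channel gating. The GRU is a stand-in for the paper's Mamba SSM."""
        def __init__(self, ch):
            super().__init__()
            self.local = nn.Sequential(                      # local texture branch
                nn.Conv2d(ch, ch, 3, padding=1, groups=ch),
                nn.Conv2d(ch, ch, 1),
                nn.GELU())
            self.global_seq = nn.GRU(ch, ch, batch_first=True)  # long-range branch
            self.ce = nn.Sequential(                         # channel enhancement
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // 4, 1), nn.GELU(),
                nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())

        def forward(self, x):
            b, c, h, w = x.shape
            loc = self.local(x)
            # Flatten the feature map into a token sequence for the global branch.
            seq, _ = self.global_seq(x.flatten(2).transpose(1, 2))  # (b, h*w, c)
            glo = seq.transpose(1, 2).view(b, c, h, w)
            fused = loc + glo                                # merge both branches
            return x + fused * self.ce(fused)                # gated residual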
Abstract: Dense video captioning aims to extract multiple key events from a video and generate coherent textual descriptions for them, with broad applications in automatic narration, human-computer interaction, video retrieval, and daily assistance for visually impaired people. Existing methods extract the features of multi-scale events (both short- and long-duration) insufficiently and carry redundant features from repeated or similar frames, so the captions they generate miss details and suffer in coherence and accuracy. To address this, a dense video captioning model based on Mamba multi-scale feature extraction (MMFE) is proposed. First, a Mamba multi-scale feature extraction module uses Mamba to strengthen long-range dependency modeling and, through multi-level feature extraction and fusion, resolves the insufficient extraction of short- and long-duration event features. Second, trend-aware attention focuses on key frames with significant semantic changes, eliminating the redundancy of repeated or similar frames and improving the accuracy of the feature representation. Third, an event difference loss drives the model to attend to feature differences between events with different content in long videos, improving its ability to distinguish and predict diverse events. Finally, skip connections in the captioning head selectively fuse previously generated caption text into the current decoding step, supplementing contextual information from the overall video narrative and improving the model's understanding of global information. Experiments on the ActivityNet Captions dataset show that, for localizing events of different durations, MMFE achieves a recall, precision, and F1 score of 59.85%, 60.45%, and 60.15%, exceeding the second-best method, PDVC, by 4.43%, 2.38%, and 3.44%. For captioning diverse events, MMFE achieves BLEU-4, CIDEr, and METEOR scores of 2.67%, 37.78%, and 8.79%, exceeding PDVC by 0.71%, 9.19%, and 0.71%. These results indicate that MMFE generates more accurate video captions and offers an effective tool for improving the efficiency of online information dissemination, strengthening information security supervision, and supporting the construction of an intelligent society.
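The trend-aware attention idea (up-weighting frames whose features change markedly from the previous frame, so near-duplicate frames contribute less) can be sketched as follows; the module name TrendAwareSketch and the linear scoring head are hypothetical simplifications, not the MMFE implementation.

    import torch
    import torch.nn as nn

    class TrendAwareSketch(nn.Module):
        """Amplify frames with large frame-to-frame semantic change so that
        repeated or near-duplicate frames contribute less."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)    # change signal -> saliency logit

        def forward(self, frames):                        # frames: (b, T, d)
            # Previous-frame features; the first frame is its own predecessor.
            prev = torch.cat([frames[:, :1], frames[:, :-1]], dim=1)
            delta = frames - prev                         # per-frame change
            w = torch.softmax(self.score(delta), dim=1)   # attention over time
            return frames * (1.0 + w)                     # boost high-change frames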