Funding: Supported by the National Natural Science Foundation of China (Grant No. 62005049), the Natural Science Foundation of Fujian Province (Grant Nos. 2020J01451 and 2022J05113), and the Education and Scientific Research Program for Young and Middle-aged Teachers in Fujian Province (Grant No. JAT210035).
Abstract: Camouflaged people are highly adept at concealing themselves by exploiting cover and the surrounding environment. Despite advances in optical detection through imaging systems, including spectral, polarization, and infrared technologies, there is still a lack of efficient real-time methods for accurately detecting small camouflaged people in complex real-world scenes. This study proposes a snapshot multispectral image-based camouflage detection model, multispectral YOLO (MS-YOLO), which uses SPD-Conv and SimAM modules to represent targets effectively and suppress background interference by exploiting spatial-spectral target information. In addition, the study constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which covers diverse scenes, target scales, and postures. To minimize information redundancy, MS-YOLO selects as input an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation. In experiments on the MSCPD, MS-YOLO achieves a mean average precision of 94.31% and real-time detection at 65 frames per second, confirming the effectiveness and efficiency of the method for detecting camouflaged people in typical desert and forest scenes. The approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles when detecting enemy forces and rescuing personnel on the battlefield.
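The abstract does not state how the 12-band subset is chosen, so the sketch below is only a hedged illustration of the stated criterion (strong feature representation, minimal inter-band correlation): a greedy selection over a NumPy cube of shape (bands, H, W). The variance proxy for "feature strength" and the greedy penalty are our assumptions, not the authors' procedure.

```python
import numpy as np

def select_bands(cube: np.ndarray, k: int = 12) -> list[int]:
    """Greedy band selection: favor high-variance bands with low
    correlation to bands already chosen (illustrative only)."""
    n_bands = cube.shape[0]
    flat = cube.reshape(n_bands, -1).astype(np.float64)
    corr = np.abs(np.corrcoef(flat))       # |inter-band correlation|
    variance = flat.var(axis=1)            # proxy for feature strength

    selected = [int(np.argmax(variance))]  # start from the strongest band
    while len(selected) < k:
        # penalize candidates that correlate with the chosen set
        penalty = corr[:, selected].max(axis=1)
        score = variance / variance.max() - penalty
        score[selected] = -np.inf          # exclude already-chosen bands
        selected.append(int(np.argmax(score)))
    return sorted(selected)

# Example: 25-band snapshot cube, pick 12 bands as detector input
cube = np.random.rand(25, 256, 256)
print(select_bands(cube, k=12))
```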
Funding: Supported by the National Key Research and Development Program of China (No. 2018YFB1305005).
Abstract: Passive optical motion capture technology is an effective means of high-precision pose estimation for mobile robots in small scenes; nevertheless, when the scene contains a complex background and stray-light interference, target adhesion and environmental reflections prevent this technology from estimating the pose accurately. A passive binocular optical motion capture technology for complex illumination, based on a binocular camera and fixed retroreflective marker balls, is proposed. Multiple hemispherical retroreflective marker balls are fixed on a rigid base, and the binocular camera performs depth estimation to obtain the fixed positional relationship between the feature points. Unsupervised state estimation, requiring no manual operation, overcomes the influence of reflection spots in the background. Meanwhile, contour extraction and least-squares ellipse fitting are used to extract marker balls with incomplete shapes as feature points, solving the problem of target adhesion in the scene. A FANUC m10i-a robot moving with 6-DOF was used to verify these methods in the complex lighting environment of a welding laboratory. The results show that the average absolute position error is 5.793 mm, the average absolute rotation error is 1.997°, the average relative position error is 0.972 mm, and the average relative rotation error is 0.002°. This technology therefore meets the requirements of high-precision measurement of a 6-DOF mobile robot in a complex lighting environment and has significant application prospects in complex scenes.
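As a minimal sketch of the marker-extraction step (contour extraction plus least-squares ellipse fitting), assuming a grayscale image in which retroreflective markers appear as bright, possibly clipped blobs; the threshold and area limit are placeholder values, not the authors'. Fitting an ellipse lets a partially occluded or adhered marker still yield a usable center.

```python
import cv2
import numpy as np

def extract_marker_centers(gray: np.ndarray,
                           min_area: float = 30.0) -> list[tuple[float, float]]:
    """Find bright blobs and fit ellipses so that partially occluded
    (adhered or clipped) markers still yield a usable center point."""
    _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    centers = []
    for c in contours:
        # fitEllipse needs at least 5 points; skip tiny noise blobs
        if len(c) < 5 or cv2.contourArea(c) < min_area:
            continue
        (cx, cy), (major, minor), _angle = cv2.fitEllipse(c)
        centers.append((cx, cy))
    return centers
```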
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62271302 and the Shanghai Municipal Natural Science Foundation under Grant 20ZR1423500.
Abstract: Infrared images typically exhibit diverse backgrounds, each potentially containing noise and target-like interference. In complex backgrounds, infrared small targets are prone to being submerged by background noise because of their low pixel proportion and limited available features, leading to detection failure. To address this problem, this paper proposes an Attention Shift-Invariant Cross-Evolutionary Feature Fusion Network (ASCFNet) tailored to the detection of weak and small infrared targets. The network first introduces a Multidimensional Lightweight Pixel-level Attention Module (MLPA), which alleviates the suppression of small-target features during deep network propagation by combining channel reshaping, multi-scale parallel subnet architectures, and local cross-channel interactions. A Multidimensional Shift-Invariant Recall Module (MSIR) is then designed so that the network remains unaffected by minor input perturbations when processing infrared images, by focusing on the model's shift invariance. Finally, a Cross-Evolutionary Feature Fusion structure (CEFF) enables flexible and efficient integration of multidimensional feature information from different network hierarchies, achieving complementarity and mutual enhancement among features. Experimental results on three public datasets, SIRST, NUDT-SIRST, and IRST640, show that the proposed network outperforms advanced algorithms in the field. On the NUDT-SIRST dataset, the mAP50, mAP50-95, and a third reported metric reached 99.26%, 85.22%, and 99.31%, respectively. Visual evaluations of detection results in diverse scenarios indicate an increased detection rate and a reduced false alarm rate. The method balances accuracy and real-time performance, achieving efficient and stable detection of weak and small infrared targets.
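The MLPA module itself is not reproduced here; the PyTorch sketch below illustrates only one of its named ingredients, local cross-channel interaction, in the ECA style (a small 1-D convolution over pooled channel descriptors, so each channel weight depends only on its neighbors). The kernel size and layout are assumptions, not the ASCFNet design.

```python
import torch
import torch.nn as nn

class LocalCrossChannelAttention(nn.Module):
    """ECA-style channel attention: squeeze to (B, 1, C), then a 1-D conv
    so each channel's weight depends only on its k neighboring channels."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)        # (B, 1, C) channel descriptor
        w = torch.sigmoid(self.conv(y))       # local cross-channel mixing
        return x * w.view(b, c, 1, 1)         # reweight the feature maps

x = torch.randn(2, 64, 32, 32)
print(LocalCrossChannelAttention()(x).shape)  # torch.Size([2, 64, 32, 32])
```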
Abstract: To improve the detection accuracy of vehicle object detection models in complex traffic scenes, YOLOv8n (you only look once version 8 nano) is taken as the baseline model, and a Neck-ARW neck structure with a composite backbone (comprising an auxiliary detection branch, RepBlock modules, and weighted skip feature connections) is designed to reduce the information loss along the network depth caused by information bottlenecks. The RepBlock structural re-parameterization module is introduced, using a multi-branch structure during training to improve feature extraction; a P2 detection layer is added to capture more fine-grained features of small objects and enrich the small-object feature information flow within the network; a Dynamic Head self-attention detection head is adopted, fusing scale-aware, spatial-aware, and task-aware self-attention into a unified framework to improve detection performance; and the layer-adaptive magnitude-based pruning (LAMP) algorithm is applied to remove redundant parameters, yielding the YOLO-NPDL (Neck-ARW, P2, Dynamic Head, LAMP) vehicle detection model. Using the UA-DETRAC (University at Albany DEtection and TRACking) dataset as the experimental dataset, experiments on RepBlock embedding positions, comparisons of different neck structures, pruning experiments, ablation studies, and model performance comparisons were conducted to verify the mean average precision of YOLO-NPDL. The results show that embedding RepBlock modules in both the auxiliary detection branch and the neck backbone yields better multi-scale feature extraction and retains more detail during training, at the cost of more parameters and computation. With the Neck-ARW neck structure, mAP50 and mAP50-95 improve by 1.1% and 1.7%, respectively, while the parameter count decreases by about 17.9%, giving a better structure. At a pruning rate of 1.3, the parameter count and computation decrease by about 38.0% and 24.0%, respectively, leaving few redundant channels and a compact structure. Compared with YOLOv8n, with essentially the same number of parameters, YOLO-NPDL increases recall by 2.7%, mAP50 by 2.7% to 94.7%, and mAP50-95 by 6.4% to 79.7%. Compared with widely used YOLO-series models, YOLO-NPDL achieves higher detection accuracy with fewer parameters. In real complex traffic scenarios such as distant targets, rainy weather, and night scenes, YOLO-NPDL shows no obvious false or missed detections, detects more distant small vehicles, and delivers better overall detection results.
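The abstract names LAMP without detailing it. The sketch below computes per-weight LAMP scores following our reading of the original LAMP paper (Lee et al., ICLR 2021), where each weight's squared magnitude is divided by the sum of squared magnitudes of all weights at least as large within the same layer; pruning then removes the globally lowest scores. This is an illustration of the published formula, not the YOLO-NPDL code.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score per weight: w^2 divided by the sum of squared weights
    that are >= it in magnitude within the same layer (Lee et al., 2021)."""
    w2 = weight.detach().flatten() ** 2
    sorted_w2, order = torch.sort(w2)             # ascending by magnitude
    # suffix sums: for sorted position u, the sum of w2[u:] (itself included)
    suffix = torch.flip(torch.cumsum(torch.flip(sorted_w2, [0]), 0), [0])
    scores_sorted = sorted_w2 / suffix
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted                 # undo the sort
    return scores.view_as(weight)

# Usage: prune the 50% of weights with the lowest LAMP scores in one layer
w = torch.randn(128, 64)
s = lamp_scores(w)
thresh = s.flatten().kthvalue(w.numel() // 2).values
mask = s >= thresh   # keep-mask for the surviving weights
```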