Funding: supported by the National Natural Science Foundation of China (Nos. 62276204 and 62203343), the Fundamental Research Funds for the Central Universities (No. YJSJ24011), the Natural Science Basic Research Program of Shanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Drone-based small object detection is of great significance in practical applications such as military operations, disaster rescue, and transportation. However, the severe scale differences among objects captured by drones and the lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields affects polynomial fitting results. Based on the conclusions obtained, we then propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP), and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers; and DSH reconstructs the original prediction head using a set of high-resolution and ultrahigh-resolution features. In addition, a corresponding dual-scale loss function is designed to train HRFNet. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most notably, HRFNet achieves an mAP of 51.0 on VisDrone-DET with 29.3 M parameters, outperforming existing state-of-the-art detectors. HRFNet also performs well in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
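As a rough illustration of why parallel dilated kernels yield different receptive fields, the sketch below computes the input span covered by a dilated kernel and the receptive field of a small layer stack. It is not HRFNet code: the function names, the 3x3 kernel size, and the dilation rates 1, 2, and 3 are assumptions chosen for the example, since the abstract does not give HFA's actual configuration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a sequential stack of (kernel_size, stride, dilation) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical parallel branches: 3x3 kernels with dilation rates 1, 2 and 3.
branches = {d: effective_kernel_size(3, d) for d in (1, 2, 3)}
print(branches)  # {1: 3, 2: 5, 3: 7}

# Two stacked 3x3 convolutions (stride 1, no dilation) see a 5x5 input window.
print(receptive_field([(3, 1, 1), (3, 1, 1)]))  # 5
```

Each branch keeps nine weights per channel, yet the branches cover 3, 5, and 7 input pixels per side, which is the general mechanism by which dilation trades sampling density for context.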
Abstract: The study was performed on neurons with direction-selective (DS) receptive fields (RFs) in the primary visual cortex of the cat. The preferred directions (PDs) of these cells were compared for a single light spot and for a system of two identical light spots moving across the RF with a given angle between them. Directional interactions appeared when the angle between the directions of the two moving spots was 30° or 60°. For 56% of the cells, the PD coincided with the bisector of these angles. These cells responded to the combination of the two moving stimuli as if only one stimulus moved in the RF in an intermediate direction. This direction coincided with the PD of the DS neuron for a single spot. The investigation also revealed that DS neurons responded to stimuli moving at angles such as 180° (in the preferred and opposite directions simultaneously). In a further experiment, we investigated the responses of DS cells in the primary visual cortex to two spots moving across the RF with an angle of 60° between their directions. These cells responded to the combination of the two moving stimuli as if only one stimulus moved in the RF in an intermediate direction. The greater the relative luminance of one spot in the pair, the closer the intermediate direction came to the direction of that spot.
Funding: Scientific and Technological Innovation Project of Colleges and Universities in Shanxi Province, Grant/Award Number: 2020L0294; Shanxi Province Science Foundation for Youths, Grant/Award Number: 201901D211249.
Abstract: Rapid coal-rock identification is one of the key technologies for intelligent and unmanned coal mining. Existing image recognition algorithms cannot satisfy practical needs in terms of recognition speed and accuracy. In view of the evident differences between coal and rock in visual attributes such as color, gloss, and texture, the complete local binary pattern (CLBP) image feature descriptor is introduced for coal and rock image recognition. Because the original algorithm oversimplifies local texture features by ignoring imaging information from higher-order pixels and the concave and convex areas between adjacent sampling points, this paper proposes a higher-order differential median CLBP image feature descriptor that replaces the original CLBP center pixel gray value with a local gray median and replaces the binary differential with a second-order differential. Meanwhile, to deal with the high dimensionality and feature redundancy of the CLBP descriptor histogram, deep learning receptive field theory is introduced to achieve nonlinear dimensionality reduction and deep feature extraction. The experiments support the following conclusions: (1) compared with the original CLBP, the recognition accuracy of the improved CLBP algorithm is greatly improved and stabilizes above 94.3% under strong noise interference; (2) compared with the original CLBP model, the single-image recognition time of the coal-rock image recognition model that fuses the improved CLBP with receptive field theory is 0.0035 s, a reduction of 71.0%; compared with the improved CLBP model without receptive field theory, the recognition time is shortened by 97.0%, while the accuracy remains above 98.5%. The method offers a valuable technical reference for the fields of mineral development and deep mining.
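The effect of thresholding against a local median instead of the center pixel can be sketched on a single 3x3 patch. This is only one plausible reading of the median step described above and covers just the sign component of a local binary pattern; the second-order differential is not reproduced because the abstract does not define it. The function name `lbp_code`, the neighbor ordering, and the sample patch are assumptions made for illustration.

```python
from statistics import median

# Clockwise offsets of the 8 neighbours in a 3x3 patch, starting at the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(patch, use_median=False):
    """Sign-component LBP code of a 3x3 patch given as a list of three rows."""
    neighbours = [patch[1 + dr][1 + dc] for dr, dc in OFFSETS]
    if use_median:
        # Assumed variant: threshold against the median of all nine gray values.
        threshold = median(value for row in patch for value in row)
    else:
        # Classic LBP: threshold against the center pixel.
        threshold = patch[1][1]
    return sum(1 << i for i, value in enumerate(neighbours) if value >= threshold)


# A smooth gradient whose center pixel has been corrupted by an outlier (200).
noisy_patch = [[10, 20, 30],
               [40, 200, 60],
               [70, 80, 90]]

print(lbp_code(noisy_patch))                   # 0
print(lbp_code(noisy_patch, use_median=True))  # 120
```

With the corrupted center as threshold, every neighbor falls below it and the code collapses to 0; the median threshold (60 here) still encodes the underlying gradient, which is consistent with the noise robustness the abstract reports.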
Funding: supported by the research team of Xi'an Traffic Engineering Institute and the Young and Middle-aged Fund Project of Xi'an Traffic Engineering Institute (2022KY-02).
Abstract: Mining more discriminative temporal features to enrich temporal context representation is considered the key to fine-grained action recognition. Previous action recognition methods use a fixed spatiotemporal window to learn local video representations. However, these methods fail to capture complex motion patterns because of their limited receptive field. To solve this problem, this paper proposes a lightweight Temporal Pyramid Excitation (TPE) module to capture short-, medium-, and long-term temporal context. In this method, the Temporal Pyramid (TP) module expands the temporal receptive field of the network through multi-temporal kernel decomposition without significantly increasing the computational cost. In addition, the Multi Excitation module emphasizes temporal importance to enhance temporal feature representation learning. TPE can be integrated into ResNet50 to build a compact video learning framework, TPENet. Extensive validation experiments on several challenging benchmark datasets (Something-Something V1, Something-Something V2, UCF-101, and HMDB51) demonstrate that our method achieves a favorable balance between computation and accuracy.
Funding: supported by the National Science and Technology Innovation 2030 Major Program (2022ZD0204802, 2022ZD0204804), the National Natural Science Foundation of China (31930053, 32171039), and the Beijing Academy of Artificial Intelligence (BAAI).
Abstract: The concept of the receptive field (RF) is central to sensory neuroscience. Neuronal RF properties have been studied extensively in animals, while those in humans remain nearly unexplored. Here, we measured neuronal RFs with intracranial local field potentials (LFPs) and spiking activity in human visual cortex (V1/V2/V3). We recorded LFPs via macro-contacts and found that RF sizes estimated from low-frequency activity (LFA, 0.5–30 Hz) were larger than those estimated from low-gamma activity (LGA, 30–60 Hz) and high-gamma activity (HGA, 60–150 Hz). We then took a rare opportunity to record LFPs and spiking activity simultaneously via microwires in V1. We found that RF sizes and temporal profiles measured from LGA and HGA closely matched those from spiking activity. In sum, this study reveals that the spiking activity of neurons in human visual cortex can be well approximated by LGA and HGA in RF estimation and temporal profile measurement, implying pivotal functions of LGA and HGA in early visual information processing.
Abstract: Spatiotemporal structures of receptive fields (RFs) have been studied for simple cells in area 18 of the cat by measuring the temporal transfer function (TTF) at different locations (subregions) within the RF. The temporal characteristics of different subregions differed from each other in the absolute phase shift (APS) to visual stimuli. Two types of relationship were seen: (i) the APS varied continuously from one subregion to the next; (ii) a 180° phase jump occurred as the stimulus position changed somewhere within the receptive field. Spatiotemporal receptive field profiles were determined by applying inverse Fourier analysis to responses in the frequency domain. For the continuous type, spatial and temporal characteristics cannot be dissociated (space-time inseparable) and the spatiotemporal structure is oriented. In contrast, the spatial and temporal characteristics of the jumping type can be dissociated (space-time separable) and the structure is not oriented in the space-time plane. Based on the APSs measured at different subregions, the optimal direction of motion and the optimal spatial frequency of neurons can be predicted.
Funding: supported by the National Natural Science Foundation of China (Nos. U1536203, 61572493), the Cutting Edge Technology Research Program of the Institute of Information Engineering, CAS (No. Y7Z0241102), the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of the Ministry of Education (No. Y6Z0021102), and Nanjing University of Science and Technology (No. JYB201702).
Abstract: In this paper, we introduce a novel approach to automatically regulate receptive fields in deep image parsing networks. Unlike previous work, which placed much importance on obtaining better receptive fields using manually selected dilated convolutional kernels, our approach uses two affine transformation layers in the network's backbone and operates on feature maps. Feature maps are inflated or shrunk by the new layers, thereby changing the receptive fields in the following layers. Through end-to-end training, the whole framework is data-driven and requires no laborious manual intervention. The proposed method is generic across datasets and tasks. We have conducted extensive experiments on both general image parsing tasks and face parsing tasks as concrete examples to demonstrate that the method regulates receptive fields better than manual designs.
Funding: supported in part by the Natural Science Foundation of China (60973059, 81171407), the Specialized Research Fund for the Doctoral Program of Higher Education (20121101110035), and the Specialized Fund for Joint Building Program of Beijing Municipal Education Commission.
Abstract: The L0-norm constraint in sparse coding has the advantage of producing the same diversity of receptive field shapes as physiological data, but it is difficult to analyze. It remains a challenging issue to understand how the diverse shapes of V1 simple cell receptive fields emerge in visual cortex. This paper presents a biologically plausible learning algorithm, named Hebbian-based mean shift, for this problem. The L0-norm constraint optimizes the number of basis functions rather than their coefficients. We report that the optimization procedure is essentially a 0–1 programming problem over the selection of basis functions. By assuming that the basis functions are independently selected from a basis set, we find that the spatial distribution of input samples containing a particular basis function has a star shape and peaks at this basis function. Thus, learning the basis functions for sparse coding with the L0-norm can be interpreted as mode detection, where the basis functions are the modes of the kernel density estimate. We employ mean shift to detect modes and prove that the updating rule for the mean shift is Hebbian. The simulation results demonstrate the robustness of the proposed algorithm in producing both Gabor-like and blob-like basis functions.
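The idea of treating learned basis functions as modes of a kernel density estimate can be illustrated with the generic mean shift iteration in one dimension. The sketch below is the textbook Gaussian-kernel mean shift, not the Hebbian formulation derived in the paper; the function name, the bandwidth, and the toy samples are assumptions made for the example.

```python
import math


def mean_shift_mode(samples, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Climb from `start` to a nearby mode of a Gaussian kernel density estimate."""
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-((s - x) ** 2) / (2 * bandwidth ** 2)) for s in samples]
        # Mean shift update: move to the kernel-weighted mean of the samples.
        new_x = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Toy data with two well-separated clusters, around 1.0 and 5.0.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]

low_mode = mean_shift_mode(samples, start=0.5, bandwidth=0.5)
high_mode = mean_shift_mode(samples, start=5.5, bandwidth=0.5)
print(round(low_mode, 3), round(high_mode, 3))
```

Different starting points converge to different modes (close to 1.0 and 5.0 here), which mirrors how several distinct basis functions could be recovered from one sample distribution.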
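Abstract: Object detection on UAV-borne platforms has important application value in both military and civilian fields. However, existing detection methods usually focus on multi-scale object detection, lack optimization for small objects, and are too complex to deploy on resource-constrained airborne platforms. To address this, this paper proposes YOLOH (You Only Look One Head), a lightweight small object detection algorithm for UAV-borne platforms. First, the baseline network is optimized for small objects: deep features are removed to reduce the number of model parameters, and shallow features are added to capture small-object information. Second, NAM attention is added to the feature fusion stage to enhance the perception of small objects. Next, a multi-receptive-field focusing module (MRFF) is designed to mine the receptive field information of feature maps and strengthen the model's multi-scale detection ability. Finally, the LAMP algorithm is used to prune the model, removing redundant neurons to compress it. Experimental results show that, compared with YOLOv8s, YOLOH reduces the number of parameters and the amount of computation by 92% and 35%, respectively, and increases FPS by 57%. On the VisDrone2019 and CARPK datasets, AP_S improves by 3.3% and 3.7%, respectively. Compared with other lightweight models, the proposed YOLOH offers the best overall performance while balancing model size, accuracy, and inference speed, providing an effective solution for object detection on UAV-borne platforms.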
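Abstract: Objective: Remote sensing image (RSI) detection involves small, densely packed objects with large scale variation, and missed and false detections are especially common against complex backgrounds. To address this, a remote sensing image detection method guided by contextual information and multi-scale feature sequences is proposed to improve detection accuracy. Methods: First, an adaptive large receptive field (ALRF) mechanism is designed for feature extraction. It cascades depthwise convolutions with different dilation rates for hierarchical feature extraction, and uses channel and spatial attention to weight the extracted features by channel and fuse them spatially, so that the model can adaptively adjust its receptive field size and make effective use of contextual information in remote sensing images. Second, to address the loss of small-object semantic information during feature fusion in the neck network, a multi-scale feature sequence fusion architecture (multi-scale feature fusion, MFF) is designed. It builds multi-scale feature sequences and combines them with shallow semantic features to fuse multi-scale global information effectively against complex backgrounds, thereby reducing the effect of feature ambiguity in deep layers on capturing the local details of small objects. Finally, because the conventional intersection over union (IoU) is overly sensitive to position deviations of small objects, the normalized Wasserstein distance (NWD) is introduced. NWD models bounding boxes as two-dimensional Gaussian distributions and measures box similarity by the Wasserstein distance between these distributions, which reduces sensitivity to position deviations of small objects. Results: In a comprehensive comparison with 10 methods on the NWPU VHR-10 (Northwestern Polytechnical University very high resolution 10) and DIOR (dataset for object detection in aerial images) datasets, the proposed method outperforms the compared methods, reaching an average precision (AP) of 93.15% and 80.89%, respectively. Relative to the baseline model YOLOv8n (you only look once version 8 nano), this is an improvement of 5.48% and 2.97%, with a 6.96% reduction in the number of parameters. Conclusion: The proposed method, guided by contextual information and multi-scale feature sequences, improves object localization and reduces missed and false detections in remote sensing images with complex backgrounds.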
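The contrast between IoU and NWD for small boxes can be sketched as follows. The code assumes the commonly used closed form in which a box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), so that the squared 2-Wasserstein distance reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). The abstract does not state the exact formulation used, and the normalizing constant `c` is dataset dependent; the value 12.8 below is only a placeholder.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent normalizing constant (placeholder value here).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


# A 6x6 box and the same box shifted 3 pixels to the right.
small_a = (10.0, 10.0, 6.0, 6.0)
small_b = (13.0, 10.0, 6.0, 6.0)

print(iou(small_a, small_b))  # one third: a 3-pixel shift removes most of the overlap
print(nwd(small_a, small_b))  # decays smoothly with the shift instead
```

For the shifted pair, IoU drops to one third while NWD stays well above it under this choice of `c`, and NWD remains non-zero even when two boxes do not overlap at all, which is the property that makes it less sensitive to small position deviations.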