44 articles found
1. EHDC-YOLO: Enhancing Object Detection for UAV Imagery via Multi-Scale Edge and Detail Capture
Authors: Zhiyong Deng, Yanchen Ye, Jiangling Guo. 《Computers, Materials & Continua》, 2026, No. 1, pp. 1665-1682 (18 pages)
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Keywords: UAV imagery; object detection; multi-scale feature fusion; edge enhancement; detail preservation; YOLO; feature pyramid network; attention mechanism
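The abstract above reports gains in absolute percentage points. As a reading aid, the sketch below separates absolute from relative change using only the figures quoted in the abstract; the helper name `gains` is hypothetical and is not taken from the paper.

```python
def gains(baseline, improved, ndigits=1):
    """Return (absolute change, relative change in %) between two values.

    Hypothetical helper for illustration only; it is not part of EHDC-YOLO.
    """
    absolute = improved - baseline
    relative = absolute / baseline * 100.0
    return round(absolute, ndigits), round(relative, ndigits)


# Figures quoted in the abstract (YOLOv11n baseline vs. EHDC-YOLO).
map50 = gains(33.2, 46.1)              # mAP@0.5, in percent
map50_95 = gains(19.5, 28.0)           # mAP@0.5:0.95, in percent
params = gains(2.58, 2.81, ndigits=2)  # parameter count, in millions

print("mAP@0.5      :", map50)
print("mAP@0.5:0.95 :", map50_95)
print("parameters   :", params)
```

The first element of each tuple matches the percentage-point figures in the abstract; the second shows the same change relative to the baseline, which is a different and larger number.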
2. Pyramid–MixNet: Integrate Attention into Encoder-Decoder Transformer Framework for Automatic Railway Surface Damage Segmentation
Authors: Hui Luo, Wenqing Li, Wei Zeng. 《Computers, Materials & Continua》, 2025, No. 7, pp. 1567-1580 (14 pages)
Rail surface damage is a critical component of high-speed railway infrastructure, directly affecting train operational stability and safety. Existing methods face limitations in accuracy and speed for small-sample, multi-category, and multi-scale target segmentation tasks. To address these challenges, this paper proposes Pyramid-MixNet, an intelligent segmentation model for high-speed rail surface damage, leveraging dataset construction and expansion alongside a feature pyramid-based encoder-decoder network with multi-attention mechanisms. The encoding network integrates Spatial Reduction Masked Multi-Head Attention (SRMMHA) to enhance global feature extraction while reducing trainable parameters. The decoding network incorporates Mix-Attention (MA), enabling multi-scale structural understanding and cross-scale token group correlation learning. Experimental results demonstrate that the proposed method achieves 62.17% average segmentation accuracy, 80.28% Damage Dice Coefficient, and 56.83 FPS, meeting real-time detection requirements. The model's high accuracy and scene adaptability significantly improve the detection of small-scale and complex multi-scale rail damage, offering practical value for real-time monitoring in high-speed railway maintenance systems.
Keywords: pyramid vision transformer; encoder–decoder architecture; railway damage segmentation; masked multi-head attention; mix-attention
3. Optimized Convolutional Neural Networks with Multi-Scale Pyramid Feature Integration for Efficient Traffic Light Detection in Intelligent Transportation Systems
Authors: Yahia Said, Yahya Alassaf, Refka Ghodhbani, Taoufik Saidani, Olfa Ben Rhaiem. 《Computers, Materials & Continua》, 2025, No. 2, pp. 3005-3018 (14 pages)
Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportation systems (ITS) and Advanced Driver Assistance Systems (ADAS), the development of efficient and reliable traffic light detection mechanisms is crucial for enhancing road safety and traffic management. This paper presents an optimized convolutional neural network (CNN) framework designed to detect traffic lights in real-time within complex urban environments. Leveraging multi-scale pyramid feature maps, the proposed model addresses key challenges such as the detection of small, occluded, and low-resolution traffic lights amidst complex backgrounds. The integration of dilated convolutions, Region of Interest (ROI) alignment, and Soft Non-Maximum Suppression (Soft-NMS) further improves detection accuracy and reduces false positives. By optimizing computational efficiency and parameter complexity, the framework is designed to operate seamlessly on embedded systems, ensuring robust performance in real-world applications. Extensive experiments using real-world datasets demonstrate that our model significantly outperforms existing methods, providing a scalable solution for ITS and ADAS applications. This research contributes to the advancement of Artificial Intelligence-driven (AI-driven) pattern recognition in transportation systems and offers a mathematical approach to improving efficiency and safety in logistics and transportation networks.
Keywords: Intelligent transportation systems (ITS); traffic light detection; multi-scale pyramid feature maps; advanced driver assistance systems (ADAS); real-time detection; AI in transportation
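The abstract above names Soft-NMS without describing it. The sketch below shows the commonly used Gaussian variant in plain Python, as a generic illustration rather than the paper's implementation. The function names, the `(x1, y1, x2, y2)` box format, and the `sigma` and `score_thresh` defaults are all assumptions made for this example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (illustrative defaults, not the paper's settings).

    Instead of deleting boxes that overlap the current best box, their scores
    are decayed by exp(-IoU^2 / sigma). Returns (index, score) pairs in the
    order the boxes were selected.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda p: remaining[p][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay the scores of the boxes that are left; drop negligible ones.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((idx, score))
        remaining = decayed
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for idx, score in soft_nms(boxes, scores):
    print(idx, round(score, 3))
```

In this toy input the second box overlaps the first heavily, so it is kept with a reduced score instead of being removed outright, which is the behaviour that helps with closely spaced or partially occluded objects.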
4. DDFNet: real-time salient object detection with dual-branch decoding fusion for steel plate surface defects
Authors: Tao Wang, Wang-zhe Du, Xu-wei Li, Hua-xin Liu, Yuan-ming Liu, Xiao-miao Niu, Ya-xing Liu, Tao Wang. 《Journal of Iron and Steel Research International》, 2025, No. 8, pp. 2421-2433 (13 pages)
A novel dual-branch decoding fusion convolutional neural network model (DDFNet) specifically designed for real-time salient object detection (SOD) on steel surfaces is proposed. DDFNet is based on a standard encoder–decoder architecture and integrates three key innovations. First, we introduce a novel, lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details, enabling efficient salient feature extraction. Then, we propose an innovative dual-branch decoding fusion structure, comprising the refined defect representation branch and the enhanced defect representation branch, which enhance accuracy in defect region identification and feature representation. Additionally, to further improve the detection of small and complex defects, we incorporate a multi-scale attention fusion module. Experimental results on the public ESDIs-SOD dataset show that DDFNet, with only 3.69 million parameters, achieves detection performance comparable to current state-of-the-art models, demonstrating its potential for real-time industrial applications. Furthermore, our DDFNet-L variant consistently outperforms leading methods in detection performance. The code is available at https://github.com/13140W/DDFNet.
Keywords: Steel plate surface defect; Real-time detection; Salient object detection; Dual-branch decoder; multi-scale attention fusion; multi-scale residual fusion
5. Bidirectional parallel multi-branch convolution feature pyramid network for target detection in aerial images of swarm UAVs (cited by: 4)
Authors: Lei Fu, Wen-bin Gu, Wei Li, Liang Chen, Yong-bao Ai, Hua-lei Wang. 《Defence Technology(防务技术)》 (SCIE, EI, CAS, CSCD), 2021, No. 4, pp. 1531-1541 (11 pages)
In this paper, based on a bidirectional parallel multi-branch feature pyramid network (BPMFPN), a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swarm unmanned aerial vehicles (UAVs). First, the bidirectional parallel multi-branch convolution modules are used to construct the feature pyramid to enhance the feature expression abilities of different scale feature layers. Next, the feature pyramid is integrated into the single-stage object detection framework to ensure real-time performance. In order to validate the effectiveness of the proposed algorithm, experiments are conducted on four datasets. For the PASCAL VOC dataset, the proposed algorithm achieves a mean average precision (mAP) of 85.4 on the VOC 2007 test set. On the detection in optical remote sensing (DIOR) dataset, the proposed algorithm achieves 73.9 mAP. For the vehicle detection in aerial imagery (VEDAI) dataset, the detection accuracy of small land vehicle (slv) targets reaches 97.4 mAP. For the unmanned aerial vehicle detection and tracking (UAVDT) dataset, the proposed BPMFPN Det achieves an mAP of 48.75. Compared with the previous state-of-the-art methods, the results obtained by the proposed algorithm are more competitive. The experimental results demonstrate that the proposed algorithm can effectively solve the problem of real-time detection of ground multi-scale targets in aerial images of swarm UAVs.
Keywords: Aerial images; Object detection; Feature pyramid networks; multi-scale feature fusion; Swarm UAVs
6. Neighborhood fusion-based hierarchical parallel feature pyramid network for object detection (cited by: 3)
Authors: Mo Lingfei, Hu Shuming. 《Journal of Southeast University(English Edition)》 (EI, CAS), 2020, No. 3, pp. 252-263 (12 pages)
In order to improve the detection accuracy of small objects, a neighborhood fusion-based hierarchical parallel feature pyramid network (NFPN) is proposed. Unlike the layer-by-layer structure adopted in the feature pyramid network (FPN) and deconvolutional single shot detector (DSSD), where the bottom layer of the feature pyramid network relies on the top layer, NFPN builds the feature pyramid network with no connections between the upper and lower layers. That is, it only fuses shallow features on similar scales. NFPN is highly portable and can be embedded in many models to further boost performance. Extensive experiments on PASCAL VOC 2007, 2012, and COCO datasets demonstrate that the NFPN-based SSD without intricate tricks can exceed the DSSD model in terms of detection accuracy and inference speed, especially for small objects, e.g., 4% to 5% higher mAP (mean average precision) than SSD, and 2% to 3% higher mAP than DSSD. On the VOC 2007 test set, the NFPN-based SSD with 300×300 input reaches 79.4% mAP at 34.6 frame/s, and the mAP can rise to 82.9% after using the multi-scale testing strategy.
Keywords: computer vision; deep convolutional neural network; object detection; hierarchical parallel feature pyramid network; multi-scale feature fusion
7. IMTNet: Improved Multi-Task Copy-Move Forgery Detection Network with Feature Decoupling and Multi-Feature Pyramid
Authors: Huan Wang, Hong Wang, Zhongyuan Jiang, Qing Qian, Yong Long. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 9, pp. 4603-4620 (18 pages)
Copy-Move Forgery Detection (CMFD) is a technique that is designed to identify image tampering and locate suspicious areas. However, the practicality of the CMFD is impeded by the scarcity of datasets, inadequate quality and quantity, and a narrow range of applicable tasks. These limitations significantly restrict the capacity and applicability of CMFD. To overcome the limitations of existing methods, a novel solution called IMTNet is proposed for CMFD by employing a feature decoupling approach. Firstly, this study formulates the objective task and network relationship as an optimization problem using transfer learning. Furthermore, it thoroughly discusses and analyzes the relationship between CMFD and deep network architecture by employing ResNet-50 during the optimization solving phase. Secondly, a quantitative comparison between fine-tuning and feature decoupling is conducted to evaluate the degree of similarity between the image classification and CMFD domains by the enhanced ResNet-50. Finally, suspicious regions are localized using a feature pyramid network with bottom-up path augmentation. Experimental results demonstrate that IMTNet achieves faster convergence, shorter training times, and favorable generalization performance compared to existing methods. Moreover, it is shown that IMTNet significantly outperforms fine-tuning based approaches in terms of accuracy and F1.
Keywords: Image copy-move detection; feature decoupling; multi-scale feature pyramids; passive forensics
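8. Short-term power load forecasting based on time-frequency dual-domain feature fusion and a dynamic interaction mechanism
Authors: 王东风, 张浩, 胡怡然, 崔玉雷, 黄宇. 《电力科学与工程》, 2025, No. 12, pp. 57-64 (8 pages)
To address the difficulties that the temporal dynamics, multi-scale features, and complex periodic patterns of power load series pose for forecasting, a short-term power load forecasting method based on time-frequency dual-domain feature fusion and a dynamic interaction mechanism is proposed; its core architecture is a dual-spectrum network (双谱网). First, to handle the non-stationary and nonlinear characteristics of short-term load data, a variational mode decomposition algorithm improved by the alpha evolution algorithm decomposes the load data into several intrinsic mode functions. Second, a frequency-domain feature enhancement mechanism is designed: spectral attention dynamically fuses the amplitude and phase spectra, a time-frequency cross-attention network embeds frequency-domain priors, and cross-dimensional gating performs feature calibration. Finally, a multi-scale pyramid decoder adaptively fuses spatio-temporal features to generate the forecast. Validation on a power load dataset from a certain city and comparison with mainstream models show that the proposed method achieves better forecasting performance.
Keywords: time-frequency dual domain; dynamic interaction; dual-spectrum network; frequency-domain feature enhancement; multi-scale pyramid decoder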
9. Hybrid receptive field network for small object detection on drone view (cited by: 1)
Authors: Zhaodong CHEN, Hongbing JI, Yongquan ZHANG, Wenke LIU, Zhigang ZHU. 《Chinese Journal of Aeronautics》, 2025, No. 2, pp. 322-338 (17 pages)
Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
Keywords: Drone remote sensing; Object detection on drone view; Small object detector; Hybrid receptive field; Feature pyramid network; Feature augmentation; multi-scale object detection
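10. Cascaded feature fusion polyp segmentation network based on efficient additive attention
Authors: 李萌, 张孙杰. 《中国生物医学工程学报》 (PKU Core), 2025, No. 4, pp. 447-456 (10 pages)
To address the insufficient interaction between local and global information and the weak correlation between features at different depths in adjacent layers found in most polyp segmentation methods, this study proposes a network model based on the Pyramid Vision Transformer and a cascaded decoder with self-attention (PVT-SMCD). First, PVTv2 serves as the backbone to extract image features, and efficient additive attention is used to obtain key information and capture long-range dependencies. Second, a multi-kernel convolution enhancement block is introduced to locate the high-level semantic features of polyps, which are fed into the cascaded decoder to enable interaction between local and global information. Finally, a feature fusion module progressively fuses the features of adjacent layers from top to bottom to narrow the information gap between high-dimensional and low-dimensional features. The proposed model was compared with eight other medical image segmentation networks on five polyp segmentation datasets. On Kvasir and CVC-ClinicDB, mDice is 92.3% and 94.5%, mIoU is 87.1% and 89.9%, and MAE is 0.021 and 0.006, respectively; on CVC-300, mDice and mIoU reach 90% and 83.3%, with an MAE of 0.007; on CVC-ColonDB, mDice is 81.5%, mIoU is 73.5%, and MAE is 0.028; on ETIS, mDice is 78.9%, mIoU is 71.3%, and MAE is 0.019. The results show that PVT-SMCD outperforms the compared models on the vast majority of evaluation metrics, exhibits better learning ability and generalization, and achieves more accurate polyp segmentation.
Keywords: polyp segmentation; Pyramid Vision Transformer; cascaded decoder; efficient additive attention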
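The abstract above reports mDice, mIoU, and MAE. For readers unfamiliar with these metrics, the sketch below computes them for flat binary masks using the standard definitions. The function names and the toy masks are hypothetical, and the paper's exact evaluation protocol (thresholding, per-image averaging) is not given in the listing.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - inter
    # By convention here, two empty masks count as a perfect match.
    dice = 2.0 * inter / (pred_area + target_area) if pred_area + target_area else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def mean_absolute_error(pred_prob, target):
    """Mean absolute difference between predicted probabilities and 0/1 labels."""
    return sum(abs(p - t) for p, t in zip(pred_prob, target)) / len(target)


# Toy example: two "images", each flattened to four pixels.
preds = [[1, 1, 0, 0], [1, 1, 1, 0]]
targets = [[1, 0, 1, 0], [1, 1, 1, 0]]
pairs = [dice_and_iou(p, t) for p, t in zip(preds, targets)]
m_dice = sum(d for d, _ in pairs) / len(pairs)
m_iou = sum(i for _, i in pairs) / len(pairs)
print(m_dice, m_iou)
```

For a single mask pair the two overlap metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction; this is consistent with mDice exceeding mIoU on every dataset in the abstract.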
11. Visual Perception and Adaptive Scene Analysis with Autonomous Panoptic Segmentation
Authors: Darthy Rabecka V, Britto Pari J, Man-Fai Leung. 《Computers, Materials & Continua》, 2025, No. 10, pp. 827-853 (27 pages)
Techniques in deep learning have significantly boosted the accuracy and productivity of computer vision segmentation tasks. This article offers an intriguing architecture for semantic, instance, and panoptic segmentation using EfficientNet-B7 and Bidirectional Feature Pyramid Networks (Bi-FPN). When implemented in place of the EfficientNet-B5 backbone, EfficientNet-B7 strengthens the model's feature extraction capabilities and is far more appropriate for real-world applications. By ensuring superior multi-scale feature fusion, Bi-FPN integration enhances the segmentation of complex objects across various urban environments. The design suggested is examined on rigorous datasets, encompassing Cityscapes, Common Objects in Context (COCO), Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI), and the Indian Driving Dataset (IDD), which replicate numerous real-world driving conditions. During extensive training, validation, and testing, the model showcases major gains in segmentation accuracy and surpasses state-of-the-art performance in semantic, instance, and panoptic segmentation tasks. Outperforming present methods, the recommended approach generates noteworthy gains in Panoptic Quality: +0.4% on Cityscapes, +0.2% on COCO, +1.7% on KITTI, and +0.4% on IDD. These changes show just how efficient it is in various driving circumstances and datasets. This study emphasizes the potential of EfficientNet-B7 and Bi-FPN to provide dependable, high-precision segmentation in computer vision applications, primarily autonomous driving. The research results suggest that this framework efficiently tackles the constraints of practical situations while delivering a robust solution for high-performance tasks involving segmentation.
Keywords: Panoptic segmentation; multi-scale features; EfficientNet-B7; Feature pyramid Network
12. MFF-YOLO: An Improved YOLO Algorithm Based on Multi-Scale Semantic Feature Fusion
Authors: Junsan Zhang, Chenyang Xu, Shigen Shen, Jie Zhu, Peiying Zhang. 《Tsinghua Science and Technology》, 2025, No. 5, pp. 2097-2113 (17 pages)
The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep learning based methods increase the difficulty of real-time object detection on edge devices. To address this issue, we propose a smaller, less computationally intensive, and more accurate algorithm for object detection. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on top of the YOLOv5s framework, but it contains substantial improvements to YOLOv5s. First, we design the MFF module to improve the feature propagation path in the feature pyramid, which further integrates the semantic information from different paths of feature layers. Then, a large convolution-kernel module is used in the bottleneck. The structure enlarges the receptive field and preserves shallow semantic information, which overcomes the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by the increased model complexity. The experimental results on PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%. The mAP@0.5 has improved by 3.7% and 5.5%, and the mAP@0.5:0.95 has improved by 6.5% and 6.2%, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO has achieved better results on multiple indicators.
Keywords: object detection; YOLOv5; Feature pyramid Networks (FPN); feature fusion; Deformable Convolutional Networks (DCN); multi-scale Feature Fusion (MFF)
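13. Portrait semantic-aware automatic matting network with dual-pyramid encoding
Authors: 程艳, 严志航. 《计算机系统应用》, 2025, No. 7, pp. 261-271 (11 pages)
Portrait matting is one of the important tasks in image processing. To address the coarse portrait extraction caused by the diverse scales of portrait foregrounds in existing image data, a portrait semantic-aware automatic matting network with dual-pyramid encoding is proposed. The dual-pyramid encoder comprises an input pyramid and a feature pyramid: in the input pyramid, the input image is proportionally downsampled and fed into the network to preserve original image detail, while the feature pyramid combines strip convolution groups with five levels of encoding blocks to fully capture image features at different levels. In the dual-branch decoding structure, a field-of-view expansion module on the global segmentation decoding branch enlarges the network's receptive range and further strengthens the capture of global context; on the local detail branch, a detail-aware module fuses encoder features with decoder outputs to guide the network to focus on portrait contours. In comparative experiments against six automatic portrait matting methods on three datasets, the proposed method outperforms all compared methods, verifying that it improves the fineness of portrait extraction and the robustness of portrait matting on complex image data.
Keywords: portrait matting; semantic awareness; dual pyramid; dual-branch decoding; global context information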
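14. Encoder-decoder multi-scale convolutional neural network for crowd counting (cited by: 9)
Authors: 孟月波, 纪拓, 刘光辉, 徐胜军, 李彤月. 《西安交通大学学报》 (EI, CAS, CSCD, PKU Core), 2020, No. 5, pp. 149-157 (9 pages)
To address the loss of multi-scale feature information, poor fusion, and low density-map quality in crowd counting methods based on multi-column convolutional neural networks, a crowd counting method using a multi-scale convolutional neural network with an encoder-decoder structure is proposed. The encoder uses multi-column convolutions to capture multi-scale features and atrous spatial pyramid pooling to enlarge the receptive field and reduce the number of parameters, preserving scale features and image context. The decoder upsamples the encoder output and fuses high-level semantic information with low-level features from the front of the encoder, improving the quality of the output density map. To make the network more sensitive to counts, a new loss function is proposed that adds a counting error term to the conventional pixel-space loss. Comparative experiments on the ShanghaiTech, Mall, and a self-built dataset show that, relative to the previous best method, the proposed method reduces mean absolute error and mean squared error by 8.3% and 21.3% on ShanghaiTech Part_A, by 12.9% and 12.0% on Part_B, by 15.1% and 23.8% on Mall, and by 13.5% and 7.1% on the self-built dataset. Across different crowd scenes, its counting accuracy and robustness are better than those of the other compared methods.
Keywords: crowd counting; encoder-decoder structure; multi-scale; atrous spatial pyramid pooling; counting error; loss function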
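15. An improved DeepLabV3+ network method for road extraction from high-resolution remote sensing imagery (cited by: 19)
Authors: 葛小三, 曹伟. 《遥感信息》 (CSCD, PKU Core), 2022, No. 1, pp. 40-46 (7 pages)
Road network extraction is one of the difficult problems in applying high-resolution remote sensing imagery. Because existing road extraction methods generally emphasize region accuracy while neglecting boundary quality, a deep learning road extraction method based on the DeepLabV3+ semantic segmentation network is proposed. The model combines an encoder-decoder network with atrous spatial pyramid pooling (ASPP), which improves the delineation of road boundaries. Road network extraction experiments were conducted on the Massachusetts Roads dataset. The analysis shows that the method's extraction accuracy exceeds that of network models such as U-Net, with an F1 score of 87.27%; compared with other methods, it extracts roads from remote sensing images more effectively and completely.
Keywords: encoder-decoder; atrous spatial pyramid pooling; road extraction; DeepLabV3+; deep learning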
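The abstract above relies on atrous (dilated) convolution inside ASPP. As a minimal sketch of why dilation enlarges the receptive field without adding weights, the snippet below computes the effective kernel size of a dilated convolution. The function names are hypothetical, and the rates 6, 12, and 18 are only illustrative values often associated with ASPP, not settings confirmed by this paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size, dilation):
    """Padding that keeps the spatial size for a stride-1 conv with an odd kernel."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


# A 3x3 kernel always has 9 weights per channel pair, whatever the dilation.
for rate in (1, 6, 12, 18):
    extent = effective_kernel_size(3, rate)
    print(f"rate={rate:2d}  extent={extent}x{extent}  padding={same_padding(3, rate)}")
```

Running parallel branches with different rates over the same feature map, as ASPP does, therefore samples context at several scales for the parameter cost of ordinary 3×3 convolutions.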
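16. Image semantic segmentation based on residual attention and pyramid upsampling (cited by: 4)
Authors: 高军礼, 周华, 宋海涛, 郭靖, 张慧. 《信阳师范学院学报(自然科学版)》 (CAS, PKU Core), 2022, No. 1, pp. 134-140 (7 pages)
To address the loss of detail and the blurred, coarse class boundaries in image semantic segmentation, a residual attention module is designed by combining residual modules and an attention mechanism on top of an encoder-decoder structure. The attention mechanism strengthens the relationships between feature-map channels to improve the fineness of segmentation. To improve recognition of multi-scale objects, a pyramid upsampling module is designed based on the pyramid model; it uses the feature maps of different scales produced during encoding to extract semantic information at different scales, strengthening the model's scene recognition ability. In experiments, compared with methods such as FCN-8s, SegNet, Deeplab-v2, and PSPNet, mean intersection over union (mIoU) and mean pixel accuracy (mPA) improve by up to 15.9% and 3.57% on VOC 2012, and by 17.8% and 13.3% on the Cityscape dataset, a clear improvement in segmentation quality.
Keywords: residual attention; pyramid model; upsampling; encoder-decoder; convolutional neural network; image semantic segmentation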
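17. Monocular depth estimation network with a lightweight pyramid decoding structure (cited by: 2)
Authors: 贾瑞明, 李彤, 李阳, 王一丁. 《计算机应用研究》 (CSCD, PKU Core), 2021, No. 1, pp. 293-297 (5 pages)
To address the large parameter count and computational cost of monocular depth estimation networks, a monocular depth estimation network with a lightweight pyramid decoding structure is proposed, which reduces model complexity and running time while maintaining estimation accuracy. Based on an encoder-decoder structure, the network estimates the depth map of a monocular image end to end. The encoder uses the ResNet50 architecture. On the decoder side, a lightweight pyramid decoding module is proposed that uses depthwise dilated separable convolutions and grouped convolutions to enlarge the receptive field while reducing parameters, and uses a pyramid structure to fuse feature maps from different receptive fields to improve decoding performance. In addition, skip connections between decoding modules enable knowledge sharing and improve estimation accuracy. Experiments on the NYUD v2 dataset show that, compared with the structured attention guided network, the proposed network reduces RMS error by about 11.0% and improves computational efficiency by about 84.6%.
Keywords: monocular depth estimation; convolutional neural network; encoder-decoder structure; lightweight pyramid decoding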
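18. Change detection of newly added construction land based on an attention densely connected pyramid network (cited by: 4)
Authors: 潘建平, 李鑫, 孙博文, 胡勇, 李明明. 《测绘通报》 (CSCD, PKU Core), 2022, No. 3, pp. 41-46, 59 (7 pages)
Rapid, frequent change and complex scenes in newly added urban construction land cause under-segmentation or over-segmentation in change detection results. To address this, this paper proposes a densely connected pyramid network with an attention mechanism for detecting changes in newly added urban construction land. In the encoding stage, a convolutional attention model increases the focus on change information and highlights important features; a densely connected atrous spatial pyramid pooling module extracts and fuses multi-scale features, improving feature utilization and propagation efficiency. In the decoding stage, the extracted feature maps are upsampled to restore the spatial scale of the image. Experimental results show that the method effectively alleviates under-segmentation and over-segmentation and yields better change detection.
Keywords: attention mechanism; densely connected pyramid; encoder-decoder; newly added construction land; change detection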
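19. A semantic segmentation algorithm based on an improved MobileNetV2 network (cited by: 32)
Authors: 孟琭, 徐磊, 郭嘉阳. 《电子学报》 (EI, CAS, CSCD, PKU Core), 2020, No. 9, pp. 1769-1776 (8 pages)
Semantic segmentation algorithms based on pyramid convolutional neural networks are highly accurate, but they consume large computing resources, run slowly, and cannot meet real-time requirements. To solve this problem, this paper makes the following improvements: (1) replacing the original network structure with MobileNet, which reduces running time and memory overhead; (2) introducing an encoder-decoder structure to raise the resolution of the output image and further refine the segmentation; (3) designing a multi-level image input method to reduce the time needed for inference on high-resolution images. The method was tested on the VOC 2012 and Cityscapes datasets and compared with semantic segmentation models including FCN, SegNet, DeepLab, PSPNet, and DFN. The results show that the proposed algorithm reaches 76.1% mIoU on VOC 2012 and 74.1% mIoU on Cityscapes, slightly below traditional semantic segmentation algorithms; processing a 1024×512 image takes 18 ms, less than traditional algorithms, which meets real-time requirements and balances accuracy against computing cost.
Keywords: semantic segmentation; convolutional neural network; pyramid network; fast semantic segmentation; MobileNet; encoder-decoder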
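20. 3D medical image segmentation based on an inverted pyramid deep learning network (cited by: 7)
Authors: 张相芬, 刘艳, 袁非牛. 《计算机工程》 (CAS, CSCD, PKU Core), 2022, No. 12, pp. 304-311 (8 pages)
Deep-learning-based medical image segmentation is important for medical research and clinical diagnosis. However, existing 3D brain image segmentation networks rely on single-modality information, and the features of the last network layer are not expressed accurately, which lowers segmentation accuracy. By introducing an attention mechanism, a deep-learning-based multi-modal cross-reconstruction inverted pyramid network, MCRAIP-Net, is proposed. Taking multi-modal magnetic resonance images as input, three independent encoders extract the features of each modality, and the extracted features are initially fused at the same resolution level. A dual-channel cross-reconstruction attention module refines and fuses the multi-modal features. On this basis, an inverted pyramid decoder integrates features of different resolutions from each decoder stage to segment brain tissue. Experiments on the MRBrainS13 and IBSR18 datasets show that, compared with networks such as 3D U-Net, MMAN, and SW-3DUnet, MCRAIP-Net makes full use of the complementary information in multi-modal images, obtains more accurate and richer detail features, and achieves better segmentation accuracy, with Dice coefficients of 91.67%, 88.95%, and 84.79% for white matter, gray matter, and cerebrospinal fluid, respectively.
Keywords: multi-modal fusion; cross-reconstruction attention; inverted pyramid decoder; medical image segmentation; deep learning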