Journal articles
888 articles found
Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification
1
Authors: Omar Alqahtani, Mohamed Ghouse, Asfia Sabahath, Omer Bin Hussain, Arshiya Begum. Computers, Materials & Continua, 2025, No. 5, pp. 2221-2244 (24 pages).
This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on the ISIC-2018 dataset, our method achieves an F1-score improvement of +4.84% compared with the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On the ChestX-ray14 dataset, the method delivers an F1-score improvement of +5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
Keywords: medical image retrieval; vision transformer; multi-scale encoding; multi-loss function; ISIC-2018; ChestX-ray14
Multi-Scale Fusion Network Using Time-Division Fourier Transform for Rolling Bearing Fault Diagnosis
2
Authors: Ronghua Wang, Shibao Sun, Pengcheng Zhao, Xianglan Yang, Xingjia Wei, Changyang Hu. Computers, Materials & Continua, 2025, No. 8, pp. 3519-3539 (21 pages).
The ability to diagnose faults in rolling bearings is of significant practical importance for ensuring the normal operation of equipment. Frequency-domain features can effectively enhance the identification of fault modes. However, existing methods often suffer from insufficient frequency-domain representation in practical applications, which greatly affects diagnostic performance. Therefore, this paper proposes a rolling bearing fault diagnosis method based on a Multi-Scale Fusion Network (MSFN) using the Time-Division Fourier Transform (TDFT). The method constructs multi-scale channels to extract time-domain and frequency-domain features of the signal in parallel. A multi-level, multi-scale filter-based approach is designed to extract frequency-domain features in a segmented manner. A cross-attention mechanism is introduced to fuse the extracted time-domain and frequency-domain features. The performance of the proposed method is validated on the CWRU and Ottawa datasets. The results show that the average accuracy of MSFN under complex noisy signals is 97.75% and 94.41%, and the average accuracy under variable load conditions is 98.68%. This demonstrates its significant application potential compared with existing methods.
Keywords: rolling bearing fault diagnosis; time-division Fourier transform; cross-attention; multi-scale feature fusion
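The abstract does not define the time-division Fourier transform (TDFT) precisely. The sketch below assumes it means cutting the vibration signal into equal, non-overlapping time segments and taking a magnitude spectrum of each one, which amounts to a rectangular-window short-time Fourier transform. The function names and the direct O(N^2) DFT are illustrative choices, not the authors' implementation.

```python
import cmath
import math
from typing import List


def dft_magnitude(segment: List[float]) -> List[float]:
    """Magnitude spectrum (bins 0..N//2) of a real segment via a direct O(N^2) DFT."""
    n = len(segment)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def time_division_spectra(signal: List[float], seg_len: int) -> List[List[float]]:
    """Split the signal into non-overlapping segments and return one spectrum per segment.

    Trailing samples that do not fill a whole segment are dropped (an assumption).
    """
    n_segs = len(signal) // seg_len
    return [dft_magnitude(signal[i * seg_len:(i + 1) * seg_len]) for i in range(n_segs)]


if __name__ == "__main__":
    # A sinusoid with exactly 4 cycles per 32-sample segment.
    sig = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
    spectra = time_division_spectra(sig, 32)
    peaks = [max(range(len(s)), key=s.__getitem__) for s in spectra]
    print(peaks)  # [4, 4, 4, 4]
```

Each segment yields its own spectrum, so frequency content can be tracked over time. A real pipeline would use an FFT and a tapered window instead of the direct sum.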
CT-MFENet: Context Transformer and Multi-Scale Feature Extraction Network via Global-Local Features Fusion for Retinal Vessels Segmentation
3
Authors: SHAO Dangguo, YANG Yuanbiao, MA Lei, YI Sanli. Journal of Shanghai Jiaotong University (Science), 2025, No. 4, pp. 668-682 (15 pages).
Segmentation of retinal vessels in the fundus is crucial for diagnosing ocular diseases. Retinal vessel images often suffer from class imbalance and large scale variations, which ultimately result in incomplete vessel segmentation and poor continuity. In this study, we propose CT-MFENet to address these issues. First, a context transformer (CT) integrates contextual feature information, which helps establish connections between pixels and addresses incomplete vessel continuity. Second, multi-scale dense residual networks are used instead of a traditional CNN to address inadequate local feature extraction when the model encounters vessels at multiple scales. In the decoding stage, we introduce a local-global fusion module that enhances the localization of vascular information and reduces the semantic gap between high- and low-level features. To address class imbalance in retinal images, we propose a hybrid loss function that enhances the model's ability to segment topological structures. We conducted experiments on the publicly available DRIVE, CHASEDB1, STARE, and IOSTAR datasets. The experimental results show that CT-MFENet performs better than most existing methods, including the baseline U-Net.
Keywords: retinal vessel segmentation; context transformer (CT); multi-scale dense residual; hybrid loss function; global-local fusion
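The abstract does not specify the composition of the hybrid loss. A common way to counter vessel/background imbalance is to mix a region-overlap term (soft Dice) with a class-weighted pixel-wise term. The sketch below does exactly that on flattened probability maps. The 50/50 mix (`alpha`) and the automatic positive weight are assumptions, and the topology-aware behavior the authors mention is not modeled here.

```python
import math
from typing import Optional, Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - 2|P*T| / (|P| + |T|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def weighted_bce(pred: Sequence[float], target: Sequence[float],
                 pos_weight: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with the positive (vessel) term scaled by pos_weight."""
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss -= pos_weight * t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss / len(pred)


def hybrid_loss(pred: Sequence[float], target: Sequence[float],
                alpha: float = 0.5, pos_weight: Optional[float] = None) -> float:
    """alpha * Dice + (1 - alpha) * weighted BCE (a hypothetical mix)."""
    if pos_weight is None:
        n_pos = sum(target)
        # Background-to-vessel ratio, so rare vessel pixels count more.
        pos_weight = (len(target) - n_pos) / max(n_pos, 1.0)
    return alpha * dice_loss(pred, target) + (1.0 - alpha) * weighted_bce(pred, target, pos_weight)
```

The Dice term is insensitive to how many background pixels there are. The weighted BCE term still gives every pixel a gradient, which is why the two are often combined.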
3D medical image segmentation using the serial-parallel convolutional neural network and transformer based on cross-window self-attention
4
Authors: Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang. CAAI Transactions on Intelligence Technology, 2025, No. 2, pp. 337-348 (12 pages).
Convolutional neural networks (CNNs) with an encoder-decoder structure are popular in medical image segmentation because of their excellent local feature extraction ability, but they are limited in capturing global features. The transformer can extract global information well, but adapting it to small medical datasets is challenging, and its computational complexity can be heavy. In this work, a serial-parallel network is proposed for accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method are the cross-window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing voxel attention between different branch features and uses convolution to strengthen dense local information. Experiments on prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models, both in qualitative evaluation and in quantitative indices such as the Dice similarity coefficient, Intersection over Union, 95% Hausdorff distance, and average symmetric surface distance.
Keywords: convolution neural network; cross-window self-attention; medical image segmentation; transformer
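To make "sparse global attention between windows" concrete, the toy sketch below does three things:

- It splits a 3D volume into non-overlapping cubic windows.
- It pools each window into one token.
- It lets every window token attend to all the others.

The attention matrix is therefore windows x windows instead of voxels x voxels. This is an assumed simplification. Scalar features, mean pooling, and identity query/key/value projections stand in for the learned components of the CWST module, which the abstract does not detail.

```python
import math
from typing import List

Volume = List[List[List[float]]]  # indexed as vol[z][y][x]


def partition_windows(vol: Volume, ws: int) -> List[List[float]]:
    """Split a D x H x W volume into non-overlapping ws^3 windows (dims assumed divisible by ws)."""
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    windows = []
    for z in range(0, d, ws):
        for y in range(0, h, ws):
            for x in range(0, w, ws):
                windows.append([vol[z + i][y + j][x + k]
                                for i in range(ws) for j in range(ws) for k in range(ws)])
    return windows


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_window_attention(windows: List[List[float]]) -> List[float]:
    """Attend between windows using one pooled (mean) token per window.

    Cost grows with the number of windows squared, not the number of voxels squared.
    """
    tokens = [sum(w) / len(w) for w in windows]
    out = []
    for q in tokens:
        attn = softmax([q * k for k in tokens])
        out.append(sum(a * v for a, v in zip(attn, tokens)))
    return out
```

For a 4x4x4 volume with 2x2x2 windows there are 8 tokens and an 8x8 attention matrix, compared with 64x64 for full voxel attention.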
LSGAN-Swin Transformer diagnosis method for bearing faults under imbalanced samples
5
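Authors: Liu Jie, Tan Yutao, Gu Yanling, Yang Na (刘杰, 谭玉涛, 谷艳玲, 杨娜). Journal of Vibration Engineering (振动工程学报) [PKU Core], 2025, No. 8, pp. 1775-1787 (13 pages).
Bearings operating in complex environments pose two related problems:

- Fault data are difficult to collect in large quantities.
- The resulting severe imbalance between normal and fault data leaves deep models insufficiently trained and lowers diagnostic accuracy.

To address these problems, a bearing fault diagnosis method based on LSGAN-Swin Transformer is proposed. A least squares generative adversarial network (LSGAN) augments imbalanced or insufficient bearing datasets, and a window-based self-attention network identifies bearing fault states. The method is validated on two datasets, and comparisons with SGAN and WGAN show that models trained on LSGAN-generated data achieve higher accuracy. With LSGAN trained under small-sample conditions, the proposed Swin Transformer (Swin-T) model is compared with CNN, AlexNet, and SqueezeNet, improving diagnostic accuracy by 34.85%, 13.45%, and 12.95%, respectively. The classification performance of the model is evaluated with t-SNE visualization. The results show that the LSGAN-Swin-T model still meets fault diagnosis requirements well when few training samples are available, offering a direction for research on bearing fault diagnosis under imbalanced data.
Keywords: fault diagnosis; rolling bearing; imbalanced samples; least squares generative adversarial network; Swin Transformer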
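For reference, the standard LSGAN objective replaces the sigmoid cross-entropy of an ordinary GAN with squared errors on the discriminator outputs. The sketch uses the common label choice a = 0 (fake), b = 1 (real), and c = 1 (the target the generator wants for its fakes). It shows only the two loss functions on lists of discriminator scores. The generator and discriminator networks and the authors' hyperparameters are not given in the abstract and are omitted.

```python
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def lsgan_d_loss(d_real: Sequence[float], d_fake: Sequence[float],
                 a: float = 0.0, b: float = 1.0) -> float:
    """Discriminator loss: push D(real) toward b and D(fake) toward a."""
    return 0.5 * _mean([(d - b) ** 2 for d in d_real]) + 0.5 * _mean([(d - a) ** 2 for d in d_fake])


def lsgan_g_loss(d_fake: Sequence[float], c: float = 1.0) -> float:
    """Generator loss: push D(fake) toward c."""
    return 0.5 * _mean([(d - c) ** 2 for d in d_fake])
```

The squared error keeps penalizing fake samples that sit on the "real" side of the decision boundary but far from the real data. As a result, the generator's gradients do not vanish as quickly as with the cross-entropy loss, which is the usual argument for LSGAN's more stable training.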
Enhanced dual-stream Transformer model for remaining useful life prediction of diesel engines (Cited by: 1)
6
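Authors: Zhang Xi, Yang Ying, Chen Chaojun, Wang Chunfeng, Yang Lei (张曦, 杨颖, 陈超君, 王春风, 杨磊). Automotive Engineering (汽车工程) [PKU Core], 2025, No. 2, pp. 292-300, 325 (10 pages).
Transformer-based models have made significant progress in remaining useful life (RUL) prediction. However, existing Transformer models have two shortcomings: they are weak at extracting local features, and they do not simultaneously account for the importance of input features at different time steps and in different spatial dimensions. To address these problems, an enhanced dual-stream Transformer model is proposed, strengthened by a local feature extraction module and an interactive fusion module. First, the local feature extraction module extracts local features in the temporal stream and the spatial stream separately, compensating for the Transformer's weakness in local feature extraction. Then, the dual-stream Transformer captures long-term dependencies along the temporal and spatial dimensions, respectively, enhancing complementary learning between the two branches. Finally, an interactive fusion module captures stream-level interactions through bilinear fusion, further improving prediction. Experiments with multiple models on two real-world datasets from a diesel engine manufacturer show that the proposed model reduces the RMSE and Score metrics by at least 3.23% and 5.89%, respectively.
Keywords: remaining useful life prediction; Transformer encoder; convolutional neural network; feature fusion; sliding window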
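The abstract names bilinear fusion for combining the two streams but gives no formulation, for example whether it uses a learned weight tensor or a low-rank factorization. The sketch below assumes plain outer-product bilinear pooling of one temporal and one spatial feature vector. It is followed by the signed-square-root and L2 normalization often applied to bilinear features; that post-step is also an assumption here.

```python
import math
from typing import List


def bilinear_fusion(temporal: List[float], spatial: List[float]) -> List[float]:
    """Flattened outer product: every temporal feature multiplies every spatial feature."""
    return [t * s for t in temporal for s in spatial]


def signed_sqrt_l2(v: List[float]) -> List[float]:
    """Signed square root followed by L2 normalization (zero vectors are returned unchanged)."""
    v = [math.copysign(math.sqrt(abs(x)), x) for x in v]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```

Because every pair of features gets its own product term, the fused vector has length `len(temporal) * len(spatial)`. This pairwise coverage is what lets bilinear fusion model stream-level interactions, and it is also why practical versions usually compress it.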
Denoising of seismic data via multi-scale ridgelet transform (Cited by: 4)
7
Authors: Henglei Zhang, Tianyou Liu, Yuncui Zhang. Earthquake Science [CSCD], 2009, No. 5, pp. 493-498 (6 pages).
Noise has traditionally been suppressed or eliminated in seismic data sets with Fourier filters and, to a lesser degree, nonlinear statistical filters. Although these methods are quite useful under specific conditions, they may produce undesirable effects on data with a low signal-to-noise ratio. In this paper, a new method, the multi-scale ridgelet transform, is applied on the basis of ridgelet transform theory. We use the wavelet transform to perform sub-band decomposition of the signals and then apply nonlinear thresholding in the ridgelet domain to every block. The method relies on partitioning: at a sufficiently fine scale, a curved singularity looks straight, so the ridgelet transform works well in such cases. Applications to both synthetic data and actual seismic data from the Sichuan basin, South China, show that the new method removes the noise portion of the signal more efficiently and retains more geologic information than other methods. The quality and continuity of seismic events, as well as the overall quality of the section, are clearly improved.
Keywords: ridgelet transform; multi-scale; random noise; sub-band decomposition; complex Morlet wavelet
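A full multi-scale ridgelet transform is too long for a short example. It involves wavelet sub-bands, block partitioning, a Radon transform per block, and a 1-D wavelet along each projection. The sketch below shows only the shared core idea of transform-domain denoising: a one-level Haar decomposition, soft thresholding of the detail coefficients, then reconstruction. It stands in to illustrate nonlinear thresholding and is not the authors' ridgelet method. The Haar basis and the threshold value are assumptions.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform (input length must be even)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(c: float, thr: float) -> float:
    """Shrink a coefficient toward zero by thr; coefficients smaller than thr become 0."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def denoise(x: Sequence[float], thr: float) -> List[float]:
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, thr) for d in detail])
```

Small detail coefficients, assumed to be mostly noise, are zeroed, while large ones shrink but survive. With `thr = 0` the input is reconstructed exactly.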
MSD-Net: Pneumonia Classification Model Based on Multi-Scale Directional Feature Enhancement
8
Authors: Tao Zhou, Yujie Guo, Caiyue Peng, Yuxia Niu, Yunfeng Pan, Huiling Lu. Computers, Materials & Continua [SCIE, EI], 2024, No. 6, pp. 4863-4882 (20 pages).
Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot. However, features of different sizes and orientations are not extracted sufficiently from lung X-ray images. This paper proposes MSD-Net, a pneumonia classification model based on multi-scale directional feature enhancement. The main innovations are as follows. First, a Multi-scale Residual Feature Extraction Module (MRFEM) is designed to extract multi-scale features effectively; it uses dilated convolutions with different dilation rates to enlarge the receptive field. Second, a Multi-scale Directional Feature Perception Module (MDFPM) is designed, which uses a three-branch structure with convolutions of different sizes to propagate directional features layer by layer and focuses on the target region to enhance feature information. Third, an Axial Compression Former Module (ACFM) is designed to perform global computation and enhance the perception of global features in different directions. Comparative and ablation experiments are carried out to verify the effectiveness of MSD-Net. On the COVID-19 RADIOGRAPHY DATABASE, the Accuracy, Recall, Precision, F1 Score, and Specificity of MSD-Net are 97.76%, 95.57%, 95.52%, 95.52%, and 98.51%, respectively. On the chest X-ray dataset, they are 97.78%, 95.22%, 96.49%, 95.58%, and 98.11%, respectively. The model effectively improves the accuracy of lung image recognition and provides an important clinical reference for computer-aided diagnosis of pneumonia.
Keywords: pneumonia; X-ray image; ResNet; multi-scale feature; direction feature; Transformer
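The receptive-field argument for dilated convolutions can be checked numerically. A k-tap kernel with dilation rate d spans k + (k - 1)(d - 1) input samples, so parallel branches at different rates see different context sizes with the same number of weights. The sketch is 1-D for brevity (the paper works on 2-D images). The rates (1, 2, 3) and all names are hypothetical, since the abstract does not list MRFEM's actual rates.

```python
from typing import Dict, List, Sequence


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as in most deep-learning frameworks)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [sum(kernel[j] * x[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(x) - span + 1)]


def multi_rate_branches(x: Sequence[float], kernel: Sequence[float],
                        rates: Sequence[int] = (1, 2, 3)) -> Dict[int, List[float]]:
    """Run the same kernel at several dilation rates, one branch per rate."""
    return {r: dilated_conv1d(x, kernel, r) for r in rates}
```

With a 3-tap kernel, rates 1, 2, and 3 give spans of 3, 5, and 7 samples, while each branch still has only three weights.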
Sub-Regional Infrared-Visible Image Fusion Using Multi-Scale Transformation (Cited by: 2)
9
Authors: Yexin Liu, Ben Xu, Mengmeng Zhang, Wei Li, Ran Tao. Journal of Beijing Institute of Technology [EI, CAS], 2022, No. 6, pp. 535-550 (16 pages).
Infrared-visible image fusion plays an important role in multi-source data fusion because it integrates useful information from multiple sensors. However, challenges remain in target enhancement and visual quality. To address these problems, a sub-regional infrared-visible image fusion method (SRF) is proposed. First, morphological processing and threshold segmentation are applied to extract targets of interest from infrared images. Second, the infrared background is reconstructed based on the extracted targets and the visible image. Finally, the target and background regions are fused using a multi-scale transform. Experiments on public data for comparison and evaluation demonstrate that the proposed SRF has potential benefits over other methods.
Keywords: image fusion; infrared image; visible image; multi-scale transform
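The region-splitting idea can be illustrated with a fixed-threshold mask. The sketch deliberately omits the morphological processing, the infrared background reconstruction, and the multi-scale transform fusion that the paper describes. The fixed threshold and the copy-by-region rule (infrared inside targets, visible elsewhere) are assumptions that only show how target and background regions are handled separately.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def threshold_mask(ir: Image, thr: float) -> Mask:
    """Mark pixels brighter than thr in the infrared image as target pixels."""
    return [[v > thr for v in row] for row in ir]


def subregion_fuse(ir: Image, vis: Image, mask: Mask) -> Image:
    """Keep infrared intensities inside target regions and visible intensities elsewhere."""
    return [[i if m else v for i, v, m in zip(ir_row, vis_row, m_row)]
            for ir_row, vis_row, m_row in zip(ir, vis, mask)]
```

In the actual method, each region would be fused with its own multi-scale rule instead of being copied from a single source.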
Multi-scale phase average waveform of electroencephalogram signals in childhood absence epilepsy using wavelet transformation (Cited by: 1)
10
Authors: Meiyun Zhang, Benshu Zhang, Fenglou Wang, Ying Chen, Nan Jiang. Neural Regeneration Research [SCIE, CAS, CSCD], 2010, No. 10, pp. 774-780 (7 pages).
BACKGROUND: Recent studies have focused on various wavelet transformation methods for electroencephalogram (EEG) signals. However, very few studies have reported the characteristics of multi-scale phase waves during epileptic discharge.
OBJECTIVE: To extract multi-scale phase average waveforms from childhood absence epilepsy EEG signals in the time and frequency domains using wavelet transformation. A further aim was to compare EEG signals during absence seizures with pre-seizure signals and with those of normal children, and to quantify these multi-scale phase average waveforms.
DESIGN, TIME AND SETTING: This case-comparative experiment was performed at the Department of Neuroelectrophysiology, Tianjin Medical University, from August 2002 to May 2005.
PARTICIPANTS: A total of 15 patients with childhood absence epilepsy from the General Hospital of Tianjin Medical University were enrolled. The patients had not received anti-epileptic drugs or sedatives before EEG testing. In addition, 12 healthy age- and gender-matched children were enrolled.
METHODS: EEG signals were recorded from the 15 patients and the 12 normal children. Epileptic discharge signals during clinical and subclinical seizures were collected 10 and 20 times, respectively. The EEG signals were processed with wavelet transformation, and multi-scale characteristics during absence seizures were extracted with a conditional sampling method. Multi-scale phase average waveforms were obtained with a conditional phase averaging technique. The first half-cycle amplitudes of the phase average waveforms were compared and statistically analyzed across EEG signals from epileptic seizures, subclinical epileptic discharges, and normal children.
MAIN OUTCOME MEASURES: Multi-scale wavelet coefficients and the evolution of EEG signals during childhood absence seizures were observed using wavelet transformation. Multi-scale phase average waveforms of EEG signals were observed using the conditional sampling method and the phase averaging technique.
RESULTS: The multi-scale characteristics of the EEG signals showed that 12-scale (3 Hz) rhythmic activity was significantly enhanced during childhood absence seizures and co-existed with background structure (<1 Hz, low-frequency discharge). The phase average wave exhibited an opposed-phase abnormal rhythm at 3 Hz. Before childhood absence seizures, the analysis detected an opposed abnormal α rhythm and a 3 Hz component, neither of which was detected with traditional EEG. Compared with EEG signals from normal children, epileptic discharges from clinical and subclinical childhood absence seizures were positive, and their amplitudes were significantly greater (P<0.05).
CONCLUSION: Wavelet transformation was used to analyze EEG signals from childhood absence epilepsy to obtain multi-scale quantitative characteristics and phase average waveforms. Multi-scale wavelet coefficients of EEG signals correlated with childhood absence seizures, and multi-scale waveforms before a seizure were similar to those during the onset period. Compared with those of normal children, EEG signals during seizures exhibited an opposed-phase pattern.
Keywords: EEG; multi-scale; absence epilepsy; wavelet transform; phase average waveform; neuroelectrophysiology; neural regeneration
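Conditional sampling with phase averaging works in three steps:

1. Detect events that satisfy a condition.
2. Cut a fixed window around each event.
3. Average the windows sample by sample.

Activity locked to the event adds up, while unrelated fluctuations tend to cancel. In the study the condition was applied to wavelet coefficients at a chosen scale. The sketch below assumes a simpler condition, a local maximum above a threshold in the raw sequence. The names and parameters are hypothetical.

```python
from typing import List, Sequence


def find_events(signal: Sequence[float], threshold: float, half_window: int) -> List[int]:
    """Indices of local maxima above threshold that have a full window on both sides."""
    return [i for i in range(half_window, len(signal) - half_window)
            if signal[i] > threshold and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]]


def conditional_phase_average(signal: Sequence[float], threshold: float,
                              half_window: int) -> List[float]:
    """Average the windows centred on each detected event (empty list if there are none)."""
    events = find_events(signal, threshold, half_window)
    if not events:
        return []
    width = 2 * half_window + 1
    return [sum(signal[e - half_window + k] for e in events) / len(events) for k in range(width)]
```

On a periodic triangle wave, every detected window is identical, so the phase average reproduces one cycle around the peak. On real EEG data, the average would suppress activity that is not time-locked to the detected events.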
An infrared and visible image fusion method based upon multi-scale and top-hat transforms (Cited by: 1)
11
Authors: Gui-Qing He, Qi-Qi Zhang, Hai-Xi Zhang, Jia-Qi Ji, Dan-Dan Dong, Jun Wang. Chinese Physics B [SCIE, EI, CAS, CSCD], 2018, No. 11, pp. 340-348 (9 pages).
The high-frequency components in traditional multi-scale transform methods are approximately sparse and can represent different detail information. In the low-frequency component, however, very few coefficients are near zero, so low-frequency image information cannot be represented sparsely. The low-frequency component contains the main energy of the image and depicts its profile, so fusing it directly is not conducive to a highly accurate fusion result. Therefore, this paper presents an infrared and visible image fusion method that combines the multi-scale and top-hat transforms. On the one hand, the new top-hat transform can effectively extract the salient features of the low-frequency component. On the other hand, the multi-scale transform can extract high-frequency detail information at multiple scales and in diverse directions. Combining the two methods captures more characteristics and yields more accurate fusion results. Specifically, for the low-frequency component, a new type of top-hat transform extracts low-frequency features, and different fusion rules are then applied to fuse the low-frequency features and the low-frequency background. For the high-frequency components, the product-of-characteristics method integrates the detail information. Experimental results show that the proposed algorithm obtains more detailed information and clearer infrared targets than traditional multi-scale transform methods. Compared with state-of-the-art fusion methods based on sparse representation, the proposed algorithm is simple and effective, and its time consumption is significantly reduced.
Keywords: infrared and visible image fusion; multi-scale transform; mathematical morphology; top-hat transform
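The paper uses "a new type of top-hat transform" whose definition the abstract does not give. For orientation, the sketch below implements the classic white top-hat, the signal minus its morphological opening, with a flat structuring element in 1-D. It shows how top-hat filtering isolates bright details narrower than the structuring element, which is the property the method relies on to extract salient low-frequency features.

```python
from typing import List, Sequence


def erode(f: Sequence[float], r: int) -> List[float]:
    """Grayscale erosion with a flat structuring element of half-width r (clipped at the edges)."""
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f: Sequence[float], r: int) -> List[float]:
    """Grayscale dilation with the same flat structuring element."""
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def white_top_hat(f: Sequence[float], r: int) -> List[float]:
    """f minus its opening (erosion then dilation): keeps bright details narrower than the element."""
    opened = dilate(erode(f, r), r)
    return [a - b for a, b in zip(f, opened)]
```

A one-sample spike survives the top-hat, while a plateau wider than the structuring element is removed. In 2-D, the same min/max filters run over a square or disk neighborhood.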
Image inpainting algorithm fusing multiple features and a global-local Transformer (Cited by: 1)
12
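Authors: Teng Shiyu, He Lijun (滕诗宇, 何丽君). Electronic Measurement Technology (电子测量技术) [PKU Core], 2025, No. 6, pp. 121-129 (9 pages).
Current image inpainting methods suffer from high computational complexity and struggle to generate structurally plausible, detail-rich images. To address this, an inpainting model is proposed that fuses multi-scale hierarchical features with a global-local collaborative Transformer. First, a multi-scale hierarchical feature fusion module effectively fuses the details of deep and shallow features, enlarging the receptive field while reducing the loss of key information. Second, a global-local collaborative Transformer module performs global reasoning. It integrates a rectangular window attention mechanism with a local feed-forward neural network, which has three effects:

- It reduces computational complexity.
- It improves the model's understanding of global context and its capture of local detail features.
- It enhances the overall consistency of the image.

Experiments were conducted on the CelebA-HQ and Places2 datasets with 40%-50% masks. Compared with commonly used inpainting methods, the proposed method improves PSNR by 0.26-6.25 dB and SSIM by 1.4%-19% on average, and reduces L1 by 0.2%-5.66% on average. The results show that images restored by the proposed method look more realistic and natural, further verifying its effectiveness.
Keywords: deep learning; image inpainting; multi-scale hierarchical feature fusion; global-local collaborative Transformer; rectangular window attention mechanism; local feed-forward neural network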
Object detection method for multi-source UAV imagery combining Swin Transformer and MobileNetv3 (Cited by: 3)
13
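Authors: Wang Xinguang, Li Hui (王新广, 李辉). Urban Geotechnical Investigation & Surveying (城市勘测), 2025, No. 1, pp. 27-32 (6 pages).
UAV payload hardware has limited computing power, while current lightweight models detect targets in UAV imagery with poor accuracy. To address this, a lightweight object detection model based on MobileNetv3 is designed, with the following components:

- A channel-shuffle attention mechanism is introduced into the feature extraction layers to capture attention features in the channel and spatial dimensions simultaneously.
- A shifted-window vision Transformer (Swin Transformer) module is added at the end of the network to compute global contextual semantic features.
- Multi-scale features are fused with learned weights through a bidirectional weighted feature pyramid. Depthwise separable convolution kernels and dynamic upsampling layers in the pyramid reduce the computational cost of the fusion stage.
- Following the loss structure of YOLOv7, the focal efficient IoU (Focal-EIoU) function computes the bounding-box regression loss, and the gradient harmonizing function computes the classification loss.

Experiments on optical and thermal infrared image datasets show that, compared with the original model, the proposed model increases detection speed by 3.52% and 3.83% and mAP0.5 by 7.11% and 6.85%, respectively. Compared with mainstream lightweight detection models in the control group, the proposed model has advantages in detection accuracy, speed, and model size. It is therefore suitable for deployment on UAV payload hardware for all-weather, real-time object detection in complex ground scenes.
Keywords: multi-source UAV imagery; lightweight object detection; MobileNetv3; shifted-window vision Transformer; dynamic upsampling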
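For reference, the EIoU loss adds three penalties to 1 - IoU, each normalized by the smallest enclosing box:

- the squared center distance over the squared enclosing diagonal;
- the squared width difference over the squared enclosing width;
- the squared height difference over the squared enclosing height.

Focal-EIoU multiplies this by IoU^γ to up-weight high-overlap (high-quality) boxes. The sketch follows that published definition for axis-aligned (x1, y1, x2, y2) boxes. γ = 0.5 and the `eps` guard are assumptions, and the abstract does not describe the authors' exact implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> Tuple[float, float]:
    """Return (EIoU loss, IoU) for two axis-aligned boxes."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter = max(0.0, min(px2, gx2) - max(px1, gx1)) * max(0.0, min(py2, gy2) - max(py1, gy1))
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # Squared distance between the two box centers.
    center_d2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    loss = (1.0 - iou
            + center_d2 / (cw ** 2 + ch ** 2 + eps)
            + (pw - gw) ** 2 / (cw ** 2 + eps)
            + (ph - gh) ** 2 / (ch ** 2 + eps))
    return loss, iou


def focal_eiou_loss(pred: Box, gt: Box, gamma: float = 0.5) -> float:
    """Scale the EIoU loss by IoU**gamma."""
    loss, iou = eiou_loss(pred, gt)
    return (iou ** gamma) * loss
```

Take a 2x2 box shifted one unit to the right of its ground truth. Its IoU is 1/3, and the center term is 1/13 (enclosing box 3x2). The width and height terms are zero because the sizes match.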
LSD-DETR: a Lightweight Real-Time Detection Transformer for SAR Ship Detection
14
Authors: GAO Gui, LINGHU Wenya. Journal of Geodesy and Geoinformation Science, 2025, No. 1, pp. 47-70 (24 pages).
Recently, deep learning has been widely applied to object detection in Synthetic Aperture Radar (SAR) images. Current algorithms based on Convolutional Neural Networks (CNNs) often achieve good accuracy at the cost of complex model structures and very large numbers of parameters. This makes real-time, accurate detection of multi-scale targets very difficult. To address these problems, this study proposes a lightweight real-time SAR ship detector based on the detection transformer (LSD-DETR). First, a lightweight backbone network, LCNet, containing a stem module and an inverted residual structure, is constructed to balance the model's inference speed and detection accuracy. Second, we design a transformer encoder with Cascaded Group Attention (CGA Encoder) to enrich the feature information of small targets in SAR images, making the detection of small ships more precise. Third, an efficient cross-scale feature fusion pyramid module (C3Het-FPN) is proposed, built from lightweight units (C3Het) and the weighted bidirectional feature pyramid (BiFPN) structure, which achieves adaptive fusion of multi-scale features with fewer parameters. Ablation and comparative experiments demonstrate the effectiveness of LSD-DETR. LSD-DETR has 8.8 M parameters (only 20.6% of DETR) and runs at 43.1 FPS. Its average detection accuracy (mAP50) reaches 97.3% on SSDD and 93.4% on HRSID. Compared with advanced methods, LSD-DETR attains superior precision with fewer parameters, enabling accurate real-time detection of multi-scale ships in SAR images.
Keywords: detection transformer; Synthetic Aperture Radar (SAR); lightweight; multi-scale ship detection; deep learning
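The abstract states that C3Het-FPN adopts the weighted bidirectional feature pyramid (BiFPN) structure. BiFPN fuses same-resolution inputs with learnable non-negative weights using "fast normalized fusion": O = Σ w_i·I_i / (ε + Σ w_j). The sketch shows only that fusion rule, applied to flattened feature maps that are assumed to be already resized to a common resolution. The C3Het units themselves are not reproduced.

```python
from typing import List, Sequence


def fast_normalized_fusion(inputs: Sequence[Sequence[float]], weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """O = sum_i w_i * I_i / (eps + sum_j w_j), with weights clipped at zero (ReLU)."""
    w = [max(0.0, x) for x in weights]
    total = sum(w) + eps
    return [sum(wi * feat[k] for wi, feat in zip(w, inputs)) / total
            for k in range(len(inputs[0]))]
```

Clipping the weights at zero and dividing by their sum keeps each output a bounded weighted average of the inputs. A negative learned weight simply switches its input off.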
Coupling the Power of YOLOv9 with Transformer for Small Object Detection in Remote-Sensing Images
15
Authors: Mohammad Barr. Computer Modeling in Engineering & Sciences, 2025, No. 4, pp. 593-616 (24 pages).
Recent years have seen a surge of interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges such as small object detection, scale variation, and closely packed objects hinder accurate detection in these images, and motion blur further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model adds a prediction head for detecting objects of varying sizes and replaces the original prediction heads with transformer heads to leverage self-attention. We further improve YOLOv9-TH with several strategies, including data augmentation, multi-scale testing, multi-model integration, and an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by making better use of feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, which improves mAP over the best existing methods. YOLOv9-TH-e achieved 54.2% mAP50 on the VisDrone2021 dataset and 92.3% mAP on the DIOR dataset. The results confirm the model's robustness and suitability for real-world applications, particularly small object detection in remote sensing images.
Keywords: remote sensing images; YOLOv9-TH; multi-scale object detection; transformer heads; VisDrone2021 dataset
Fast Face Detection with Multi-Scale Window Search Free from Image Resizing Using SGI Features
16
Authors: Masayuki Miyama. Journal of Computer and Communications, 2016, No. 10, pp. 22-29 (8 pages).
Face detection is applied to many tasks, such as autofocus control, surveillance, user interfaces, and face recognition, and its processing speed and detection accuracy have improved continuously. This paper describes a novel fast face detection method that performs a multi-scale window search without image resizing. We adopt statistics of gradient images (SGI) as image features and append an overlapping cell array to improve detection accuracy. The SGI feature is scale invariant and insensitive to small differences in pixel values. These characteristics enable multi-scale window search without image resizing. Experimental results show that our method is 3.66 times faster than a conventional method that combines HOG features with an SVM classifier, with no loss of accuracy.
Keywords: face detection; multi-scale window search; resizing free; SGI feature
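The abstract does not define SGI beyond "statistics of gradient images". Suppose those statistics are per-window means and variances of a gradient-magnitude image. Summed-area tables (integral images) then show why no image pyramid is needed: once the tables are built, any window at any scale is evaluated in constant time. The gradient operator, the choice of mean and variance, and all names below are illustrative assumptions. The overlapping cell array and the classifier are omitted.

```python
from typing import List, Tuple

Image = List[List[float]]


def gradient_magnitude(img: Image) -> Image:
    """|dx| + |dy| with forward differences and a replicated border (a simple stand-in)."""
    h, w = len(img), len(img[0])
    return [[abs(img[y][min(x + 1, w - 1)] - img[y][x]) + abs(img[min(y + 1, h - 1)][x] - img[y][x])
             for x in range(w)] for y in range(h)]


def integral_image(img: Image) -> Image:
    """Summed-area table with a zero first row and column: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Sum over the w x h window whose top-left corner is (x, y); O(1) for any window size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def window_stats(ii: Image, ii_sq: Image, x: int, y: int, w: int, h: int) -> Tuple[float, float]:
    """Mean and variance inside a window, from tables of values and of squared values."""
    n = w * h
    mean = window_sum(ii, x, y, w, h) / n
    var = window_sum(ii_sq, x, y, w, h) / n - mean * mean
    return mean, max(var, 0.0)
```

In use, compute `grad = gradient_magnitude(img)` once and build `integral_image(grad)` and `integral_image` of its squares. Then call `window_stats` with 24x24, 48x48, or any other window size on the same tables, without resampling the image.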
Electrocardiogram Signal Denoising Using Optimized Adaptive Hybrid Filter with Empirical Wavelet Transform
17
Authors: BALASUBRAMANIAN S, NARUKA Mahaveer Singh, TEWARI Gaurav. Journal of Shanghai Jiaotong University (Science), 2025, No. 1, pp. 66-80 (15 pages).
Cardiovascular diseases are the world's leading cause of death, so the health of the human heart has been a topic of great interest for decades. The electrocardiogram (ECG) is a comprehensive, non-invasive means of assessing cardiac health, and health practitioners use it to obtain critical information about the heart. In this article, swarm intelligence approaches are applied in biomedical signal processing to enhance adaptive hybrid filters and empirical wavelet transforms (EWTs). First, white Gaussian noise is added to the input ECG signal, which is then passed to the EWT. The ECG signals are denoised by the proposed adaptive hybrid filter. The honey badger optimization (HBO) algorithm optimizes the EWT window function and the weight parameters of the adaptive hybrid filter. The approach is simulated in MATLAB 2018a on the MIT-BIH dataset with white Gaussian, electromyogram, and electrode motion artifact noise. To show the efficiency of the proposed adaptive hybrid filter, the HBO approach is compared with a recursive least squares-based adaptive filter, multichannel least mean squares, and discrete wavelet transform methods. The experimental results show that the HBO approach, supported by the EWT and the adaptive hybrid filter, can be employed efficiently for cardiovascular signal denoising.
Keywords: electrocardiogram (ECG) signal denoising; empirical wavelet transform (EWT); honey badger optimization (HBO); adaptive hybrid filter; window function
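The abstract does not detail the proposed filter's structure or its HBO-tuned weights. As background only, the sketch below implements the classic least-mean-squares (LMS) adaptive FIR filter, the building block behind the multichannel LMS baseline mentioned above. Its weights follow the stochastic-gradient update w ← w + μ·e·x. It is not the authors' hybrid filter, and neither the EWT nor HBO is included. The tap count and step size μ are arbitrary illustrative values.

```python
import random
from typing import List, Sequence, Tuple


def lms_filter(x: Sequence[float], d: Sequence[float], n_taps: int,
               mu: float) -> Tuple[List[float], List[float]]:
    """Least-mean-squares adaptive FIR filter.

    x: reference input, d: desired signal. Returns the final weights and the error e = d - y.
    """
    w = [0.0] * n_taps
    errors = []
    for n in range(len(x)):
        taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]  # x[n], x[n-1], ...
        y = sum(wk * xk for wk, xk in zip(w, taps))
        e = d[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, taps)]  # stochastic-gradient step
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    # Known 2-tap system to identify: d[n] = 0.5 x[n] + 0.25 x[n-1]
    d = [0.5 * x[n] + (0.25 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    w, _ = lms_filter(x, d, n_taps=2, mu=0.1)
    print([round(v, 3) for v in w])  # [0.5, 0.25]
```

In a noise-cancelling set-up, `x` would be a noise reference, `d` the noisy ECG, and the error `e` the cleaned estimate. The usage above instead identifies a known system so that convergence can be checked.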
Semantic segmentation method for UAV imagery based on window attention aggregation Swin Transformer (Cited by: 2)
18
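Authors: Li Junjie, Yi Shi, He Runhua, Liu Qian (李俊杰, 易诗, 何润华, 刘茜). Computer Engineering and Applications (计算机工程与应用) [CSCD, PKU Core], 2024, No. 15, pp. 198-210 (13 pages).
Land-cover classification from UAV remote sensing imagery faces two difficulties: small ground objects are not salient enough, and complex backgrounds make ground-object information hard to distinguish. As a result, current classic semantic segmentation methods struggle to achieve satisfactory classification results. Building on the Swin Transformer network, this study proposes a semantic segmentation network based on window attention aggregation Swin Transformer (WAA SwinT). Multi-window attention aggregation is used to compute attention more precisely, improving the classification accuracy and quality for small ground objects in UAV remote sensing imagery. In addition, drawing on the idea of embedding connections, a multi-level feature embedding connection decoder improves the network structure, yielding finer segmentation of UAV remote sensing imagery. To verify the method on UAV imagery, experiments were conducted on the urban UAV remote sensing datasets UAVid and UDD, with comparisons against current classic semantic segmentation methods. The results show that the proposed method achieves the best semantic segmentation performance on both UAVid and UDD. It also significantly improves the precise segmentation of small ground objects in UAV imagery.
Keywords: UAV imagery; semantic segmentation; Swin Transformer; window attention aggregation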
Transformer network for stereo matching of weakly textured objects (Cited by: 1)
19
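Authors: Jia Di, Cai Peng, Wu Si, Wang Qian, Song Huilun (贾迪, 蔡鹏, 吴思, 王骞, 宋慧伦). Journal of Image and Graphics (中国图象图形学报) [CSCD, PKU Core], 2024, No. 8, pp. 2413-2425 (13 pages).
Objective: In recent years, using neural networks for stereo matching has become a research hotspot in computer vision, but existing methods lack global representations of weakly textured objects. To address this, this paper proposes a dense feature extraction network based on the Transformer architecture.
Method: First, a spatial pooling window strategy lets the Transformer layers capture broad contextual representations while keeping linear computational complexity, compensating for the lack of features in locally weak-textured regions. Second, overlapping patch embedding is implemented with convolution and transposed convolution so that every feature point captures as many neighboring features as possible, which facilitates fine-grained matching. Third, a skip-query strategy is applied to the feature fusion between the encoder and decoder for efficient information transfer. Finally, to handle occlusion in stereo pairs, the matching probabilities within a fixed region are summed with truncation to produce a more reasonable occlusion confidence.
Results: Ablation experiments on the Scene Flow dataset show an absolute pixel distance of 0.33, an outlier pixel ratio of 0.92%, and an occlusion prediction intersection over union of 98%. Supplementary comparison experiments on the KITTI-2015 dataset, run to verify the model in real road scenes, give an average outlier percentage of 1.78%. All of these metrics are better than those of mainstream methods such as STTR (stereo Transformer). In addition, tests on the KITTI-2015, MPI-Sintel (Max Planck Institute Sintel), and Middlebury-2014 datasets show that the model generalizes well.
Conclusion: This paper proposes a dense feature extractor based purely on the Transformer architecture. A spatial pooling window strategy reduces the spatial scale of attention computation, and a skip-query strategy effectively fuses encoder and decoder features. Together they improve feature extraction performance under the Transformer architecture.
Keywords: stereo matching; weakly textured objects; Transformer; spatial pooling window; skip query; truncated summation; Scene Flow; KITTI-2015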
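The sketch below assumes one plausible reading of "truncated summation" for occlusion confidence. Take the disparity with the highest matching probability and sum the probabilities in a fixed window around it. A small sum, meaning the mass is spread out or missing as for an occluded pixel, signals low confidence. The window radius, the weighted-disparity refinement, and how the mass maps to an occlusion label are illustrative assumptions; the abstract does not give the paper's exact rule.

```python
from typing import Sequence, Tuple


def truncated_sum(prob: Sequence[float], radius: int) -> Tuple[int, float, float]:
    """Process one pixel's matching distribution over candidate disparities.

    Returns (best disparity index, probability mass within +/- radius of it,
    probability-weighted disparity inside that window).
    """
    best = max(range(len(prob)), key=prob.__getitem__)
    lo, hi = max(0, best - radius), min(len(prob), best + radius + 1)
    mass = sum(prob[lo:hi])
    disparity = sum(d * prob[d] for d in range(lo, hi)) / mass if mass > 0 else float(best)
    return best, mass, disparity
```

A sharply peaked distribution keeps almost all its mass inside the window, giving high confidence. A flat distribution keeps only a fraction, which can be thresholded as "likely occluded".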
Improved object tracking algorithm based on Swin-Transformer (Cited by: 1)
20
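Authors: Liu Shi, Zhu Ming (刘时, 朱明). Chinese Journal of Liquid Crystals and Displays (液晶与显示) [CAS, CSCD, PKU Core], 2024, No. 11, pp. 1569-1580 (12 pages).
The STARK object tracking method uses ResNet as its backbone, whose feature extraction capability is insufficient, resulting in poor tracking performance. To address this, this paper proposes an improved object tracking algorithm based on the Swin-Transformer network. First, the window attention mechanism in Swin-Transformer is extended to multiple scales through a multi-scale window module, MW-MSA. This module extracts richer local detail information, which is combined with global context information to form multi-scale discriminative features. Next, the Transformer encoder-decoder structure serves as the feature fusion network, and an optimized multilayer perceptron serves as the update-score network, together forming a state-aware module. Finally, to handle targets that disappear and reappear, a multi-tracker fusion method is proposed. It fuses the multi-scale improved tracker with the SuperDiMP tracker through a disappearance-state judgment module, which jointly considers the confidence scores of both trackers and the estimated likelihood that the target lies near the predicted box. Compared with STARK, the proposed algorithm improves the average overlap (AO) on GOT-10K by 2.7% and the success rate SR0.5 by 3.3%. On L-LaSOT, it improves the success rate (AUC) by 0.8% over STARK, and by 1% under the target disappearance-and-reappearance challenge.
Keywords: object tracking; multi-scale window; Swin-Transformer; template update; multi-model fusion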