3,123 articles found
A Lightweight Multimodal Deep Fusion Network for Face Antispoofing with Cross-Axial Attention and Deep Reinforcement Learning Technique
1
Authors: Diyar Wirya Omar Ameenulhakeem, Osman Nuri Uçan 《Computers, Materials & Continua》 2025, No. 12, pp. 5671-5702, 32 pages
Face antispoofing has received considerable attention because it strengthens the security of face recognition systems. Face recognition is commonly used for authentication in surveillance applications. However, attackers try to compromise these systems with spoofing techniques, such as presenting photos or videos of users, to gain access to services or information. Many existing face antispoofing methods struggle with new scenarios, especially when background, lighting, and other environmental factors vary. Recent deep learning methods that use multiple modalities have proven effective for face antispoofing, surpassing single-modal methods. However, these approaches often generate many features, which can lead to data dimensionality problems. In this study, we introduce a multimodal deep fusion network for face antispoofing that incorporates cross-axial attention and deep reinforcement learning. The network operates at three patch levels and analyzes images from three modalities (RGB, IR, and depth). First, an axial attention network (XANet) extracts deeply hidden features from the multimodal images. Next, a bidirectional fusion technique that attends in both directions combines the features from each modality. Feature optimization is further improved with the Enhanced Pity Beetle Optimization (EPBO) algorithm, which selects features to address the dimensionality problem. Moreover, the proposed model employs a hybrid federated reinforcement learning (FDDRL) approach to detect and classify spoofing, achieving a better tradeoff between detection rate and false positive rate. We evaluated the approach on publicly available datasets, including CASIA-SURF and GREATFASD-S, and achieved 98.985% and 97.956% classification accuracy, respectively. In addition, the method outperforms other state-of-the-art methods in precision, recall, and F-measure. Overall, the methodology improves the detection of various types of spoofing attempts.
Keywords: face antispoofing; lightweight; multimodal; deep feature fusion; feature extraction; feature optimization
An Improved Deep Fusion CNN for Image Recognition (Cited by: 7)
2
Authors: Rongyu Chen, Lili Pan, Cong Li, Yan Zhou, Aibin Chen, Eric Beckman 《Computers, Materials & Continua》 SCIE EI 2020, No. 11, pp. 1691-1706, 16 pages
With the development of Deep Convolutional Neural Networks (DCNNs), the features extracted for image recognition tasks have shifted from low-level features to the high-level semantic features of DCNNs. Previous studies have shown that the deeper the network, the more abstract the features. However, the recognition ability of deep features is limited when training samples are insufficient. To address this problem, this paper proposes an improved Deep Fusion Convolutional Neural Network (DF-Net) that makes full use of the differences and complementarities that arise during network learning and enhances feature expression on limited datasets. Specifically, DF-Net uses two identical subnets to extract features from the input image in parallel, and a fusion module in the deep layers of DF-Net fuses the subnets' features at multiple scales. This creates more complex mappings and yields richer, more accurate fused features, improving recognition accuracy. A corresponding training strategy is also proposed to speed up convergence and reduce the computational overhead of training. Finally, DF-Nets based on the well-known ResNet, DenseNet, and MobileNetV2 are evaluated on CIFAR100, Stanford Dogs, and UECFOOD-100. Theoretical analysis and experimental results show that DF-Net enhances the performance of DCNNs and increases image recognition accuracy.
Keywords: deep convolutional neural networks; deep features; image recognition; deep fusion; feature fusion
DMF: A Deep Multimodal Fusion-Based Network Traffic Classification Model
3
Authors: Xiangbin Wang, Qingjun Yuan, Weina Niu, Qianwei Meng, Yongjuan Wang, Chunxiang Gu 《Computers, Materials & Continua》 2025, No. 5, pp. 2267-2285, 19 pages
With the rise of encrypted traffic, traditional network analysis methods have become less effective, leading to a shift towards deep learning-based approaches. Among these, multimodal learning-based classification methods have gained attention because they can leverage diverse feature sets from encrypted traffic to improve classification accuracy. However, existing research relies predominantly on late fusion, which hinders full use of the deep features within the data. To address this limitation, we propose a novel multimodal encrypted traffic classification model that synchronizes modality fusion with multiscale feature extraction. Specifically, our approach fuses the modalities at each stage of feature extraction, enhancing the feature representation at each level and preserving inter-level correlations for more effective learning. This continuous fusion strategy improves the model's ability to detect subtle variations in encrypted traffic while boosting its robustness and adaptability to evolving network conditions. Experimental results on two real-world encrypted traffic datasets show that our method achieves classification accuracies of 98.23% and 97.63%, outperforming existing multimodal learning-based methods.
Keywords: deep fusion; intrusion detection; multimodal learning; network traffic classification
Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems
4
Authors: Sarah M. Kamel, Mai A. Fadel, Lamiaa Elrefaei, Shimaa I. Hassan 《Computer Modeling in Engineering & Sciences》 2025, No. 4, pp. 373-411, 39 pages
Visual question answering (VQA) is a multimodal task that involves a deep understanding of the image scene and the question's meaning, and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the multimodal bilinear pooling fusion technique in the trade-off between model complexity and overall model performance. Some fusion techniques can significantly increase model complexity, which seriously limits their applicability to VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques in terms of their ability to reduce model complexity and improve model performance for this kind of VQA system. Experiments indicate that these multimodal bilinear pooling fusion techniques improve the VQA model's performance, reaching a best performance of 89.25%. Further, the experiments show that the number of answers in the developed VQA system is a critical factor affecting how well these techniques achieve their main objective of reducing model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique shows the best balance between model complexity and performance for VQA systems designed to answer yes/no questions.
Keywords: Arabic-VQA; deep learning-based VQA; deep multimodal information fusion; multimodal representation learning; VQA of yes/no questions; VQA model complexity; VQA model performance; performance-complexity trade-off
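The complexity trade-off discussed in the entry above can be illustrated with a rough parameter count. The sketch below is an illustration only, not the paper's implementation: it compares a full bilinear map with a generic low-rank factorization. The feature sizes (2048 for the image vector, 1024 for the question vector), the rank of 1000, and all function names are assumptions chosen for the example; bias terms are ignored.

```python
def outer_product(v, q):
    """Bilinear interaction of two feature vectors: one product per (i, j) pair."""
    return [[vi * qj for qj in q] for vi in v]


def full_bilinear_params(d_v, d_q, d_out):
    """Weights needed when every output unit sees the whole d_v x d_q outer product."""
    return d_v * d_q * d_out


def low_rank_bilinear_params(d_v, d_q, rank, d_out):
    """Weights for a factorized form: project each input to `rank` dims,
    multiply element-wise, then map the result to d_out outputs."""
    return d_v * rank + d_q * rank + rank * d_out


if __name__ == "__main__":
    # Assumed sizes: 2048-d image features, 1024-d question features, rank 1000.
    for n_answers in (2, 1000):
        full = full_bilinear_params(2048, 1024, n_answers)
        low = low_rank_bilinear_params(2048, 1024, 1000, n_answers)
        print(n_answers, full, low)
```

Under these assumed sizes, the factorized form saves little when there are only two answers and a great deal when there are a thousand, which is one plausible reading of why the number of answers matters for complexity reduction.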
Seismic velocity inversion based on CNN-LSTM fusion deep neural network (Cited by: 9)
5
Authors: Cao Wei, Guo Xue-Bao, Tian Feng, Shi Ying, Wang Wei-Hong, Sun Hong-Ri, Ke Xuan 《Applied Geophysics》 SCIE CSCD 2021, No. 4, pp. 499-514, 593, 17 pages
Based on the CNN-LSTM fusion deep neural network, this paper proposes a seismic velocity model building method that simultaneously estimates the root mean square (RMS) velocity and the interval velocity from the common-midpoint (CMP) gather. In the proposed method, a convolutional neural network (CNN) encoder and two long short-term memory networks (LSTMs) extract spatial and temporal features from seismic signals, respectively, and a CNN decoder recovers the RMS velocity and interval velocity of underground media from the resulting feature vectors. To address unstable gradients and the tendency to fall into local minima during deep neural network training, we propose using Kaiming normal initialization with zero negative slope for the rectified units and adjusting the learning process by optimizing the mean square error (MSE) loss function with a freezing factor. Experiments on the testing dataset show that the CNN-LSTM fusion deep neural network predicts RMS velocity and interval velocity more accurately, and its inversion accuracy is superior to that of single neural network models. The predictions on complex structures and the Marmousi model are consistent with the true velocity variation trends, and the predictions on field data effectively correct the phase axis and improve its lateral continuity and the quality of the stack section, indicating the effectiveness and good generalization capability of the proposed method.
Keywords: velocity inversion; CNN-LSTM; fusion deep neural network; weight initialization; training strategy
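The initialization named in the entry above can be sketched briefly. This is a minimal pure-Python illustration of Kaiming (He) normal initialization with a negative slope of zero, plus a plain MSE loss; it assumes the standard formula std = sqrt(2 / ((1 + a^2) * fan_in)). The function names are hypothetical, and the paper's freezing factor is not modeled because the abstract does not define it.

```python
import math
import random


def kaiming_std(fan_in, negative_slope=0.0):
    """Standard deviation for Kaiming normal init: gain / sqrt(fan_in)."""
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    return gain / math.sqrt(fan_in)


def kaiming_normal(fan_out, fan_in, negative_slope=0.0, seed=0):
    """Return a fan_out x fan_in weight matrix sampled from N(0, std^2)."""
    rng = random.Random(seed)
    std = kaiming_std(fan_in, negative_slope)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    weights = kaiming_normal(fan_out=4, fan_in=8)
    print(kaiming_std(8), len(weights), len(weights[0]))
    print(mse([1.0, 2.0], [1.0, 4.0]))
```

Scaling the variance by the number of inputs keeps activation magnitudes roughly stable from layer to layer under rectified units, which is the usual motivation for this scheme when gradients are unstable.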
A deep multimodal fusion and multitasking trajectory prediction model for typhoon trajectory prediction to reduce flight scheduling cancellation (Cited by: 1)
6
Authors: TANG Jun, QIN Wanting, PAN Qingtao, LAO Songyang 《Journal of Systems Engineering and Electronics》 SCIE CSCD 2024, No. 3, pp. 666-678, 13 pages
Natural events have a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with them. When typhoon season, one of the most disruptive weather periods, arrives and persists, airlines operating in threatened areas and passengers with travel plans during that period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that improves the reliability of typhoon trajectory prediction and reduces the number of flight cancellations. The deep multimodal fusion module is formed by deeply fusing the features output by multiple sub-modal fusion modules, and the multitask generation module treats longitude and latitude as two related tasks that are predicted simultaneously. With more dependable data, problems can be analysed more rapidly and efficiently, enabling better, proactive rather than reactive decision-making. When multiple modalities coexist, features can be extracted from them simultaneously so that they supplement each other's information. A case study of Typhoon Lekima, which swept China in 2019, demonstrates that the algorithm effectively reduces the number of unnecessary flight cancellations compared with existing flight scheduling and can assist the new generation of flight scheduling systems under extreme weather.
Keywords: flight scheduling optimization; deep multimodal fusion; multitasking trajectory prediction; typhoon weather; flight cancellation; prediction reliability
Deep Bimodal Fusion Approach for Apparent Personality Analysis
7
Authors: Saman Riaz, Ali Arshad, Shahab S. Band, Amir Mosavi 《Computers, Materials & Continua》 SCIE EI 2023, No. 4, pp. 2301-2312, 12 pages
Personality distinguishes individuals' patterns of feeling, thinking, and behaving. Predicting personality from short video sequences is an exciting research area in computer vision. Most existing research reports only preliminary results on extracting knowledge from the visual and audio (sound) modalities. To overcome this deficiency, we propose the Deep Bimodal Fusion (DBF) approach to predict five personality traits: agreeableness, extraversion, openness, conscientiousness, and neuroticism. In the proposed framework, modified convolutional neural networks (CNN), more specifically the Descriptor Aggregator Model (DAN), are used to obtain significant features from the visual modality. For the audio modality, the model extracts audio representations that are used to build a long short-term memory (LSTM) network. Moreover, employing modality-specific neural networks allows the framework to determine the traits independently before combining them by weighted fusion to reach a final prediction. The proposed approach attains a mean accuracy score of 0.9183, averaged over the five personality traits, which is better than previously proposed frameworks.
Keywords: apparent personality analysis; deep bimodal fusion; convolutional neural network; long short-term memory; bimodal information fusion approach
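The weighted fusion step described in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the fusion weight of 0.6, the function names, and the sample scores are invented, and the accuracy definition (one minus the mean absolute error per trait, a common convention in apparent-personality benchmarks) is assumed because the abstract does not give the formula.

```python
TRAITS = ["agreeableness", "extraversion", "openness",
          "conscientiousness", "neuroticism"]


def weighted_fusion(visual, audio, w_visual=0.6):
    """Blend per-trait scores from two modality-specific predictors.

    w_visual is a hypothetical weight; the audio weight is its complement.
    """
    w_audio = 1.0 - w_visual
    return {t: w_visual * visual[t] + w_audio * audio[t] for t in TRAITS}


def mean_accuracy(pred, truth):
    """Average over traits of 1 - |prediction - ground truth| (scores in [0, 1])."""
    return sum(1.0 - abs(pred[t] - truth[t]) for t in TRAITS) / len(TRAITS)


if __name__ == "__main__":
    visual = {t: 0.5 for t in TRAITS}   # made-up visual-branch scores
    audio = {t: 1.0 for t in TRAITS}    # made-up audio-branch scores
    truth = {t: 0.8 for t in TRAITS}    # made-up labels
    fused = weighted_fusion(visual, audio)
    print(fused["openness"], mean_accuracy(fused, truth))
```

Fusing at the score level keeps the two branches independent, so either one can be retrained or replaced without touching the other.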
TGNet: Intelligent Identification of Thunderstorm Wind Gusts Using Multimodal Fusion (Cited by: 4)
8
Authors: Xiaowen ZHANG, Yongguang ZHENG, Hengde ZHANG, Jie SHENG, Bingjian LU, Shuo FENG 《Advances in Atmospheric Sciences》 2025, No. 1, pp. 146-164, 19 pages
Thunderstorm wind gusts are small in scale, typically occurring within a range of a few kilometers. It is extremely challenging to monitor and forecast them using only automatic weather stations. Therefore, thunderstorm wind gust identification techniques based on multisource high-resolution observations are needed. This paper introduces a new algorithm called the thunderstorm wind gust identification network (TGNet). It uses multimodal feature fusion to combine the temporal and spatial features of thunderstorm wind gust events. The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations, with the aim of distinguishing thunderstorm wind gusts from gusts caused by synoptic-scale systems or typhoons. Then an encoder, built on the U-shaped network (U-Net) and incorporating recurrent residual convolutional blocks (R2U-Net), extracts the corresponding spatial convective characteristics from satellite, radar, and lightning observations. Finally, a multimodal deep fusion module based on multi-head cross-attention incorporates the temporal wind speed features at each automatic weather station into the spatial features to classify thunderstorm wind gusts every 10 minutes. TGNet products have high accuracy, with a critical success index of 0.77. Compared with U-Net and R2U-Net, the false alarm rate of TGNet products is lower by 31.28% and 24.15%, respectively. The new algorithm provides gridded thunderstorm wind gust products with a spatial resolution of 0.01°, updated every 10 minutes. The results are finer and more accurate, helping to improve operational warnings for thunderstorm wind gusts.
Keywords: thunderstorm wind gusts; shapelet transform; multimodal deep feature fusion
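For readers unfamiliar with the verification scores quoted in the entry above, the sketch below computes the critical success index and the false alarm ratio from contingency-table counts, using their standard forecast-verification definitions. The counts are invented for illustration, and the abstract does not state whether its "false alarm rate" is this ratio or the false-positive rate, so treat the second function as an assumption.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms); correct negatives are ignored."""
    return hits / (hits + misses + false_alarms)


def false_alarm_ratio(hits, false_alarms):
    """FAR = false alarms / (hits + false alarms): share of warnings that were wrong."""
    return false_alarms / (hits + false_alarms)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(critical_success_index(hits=77, misses=13, false_alarms=10))
    print(false_alarm_ratio(hits=77, false_alarms=10))
```

Because CSI leaves out correct negatives, it is not inflated by the many grid cells where no gust occurs and none is predicted, which is why it is often preferred for rare events.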
Application of Attributes Fusion Technology in Prediction of Deep Reservoirs in Paleogene of Bohai Sea
9
Authors: ZHANG Daxiang, YIN Taiju, SUN Shaochuan, SHI Qian 《Acta Geologica Sinica(English Edition)》 SCIE CAS CSCD 2017, No. S1, pp. 148-149, 2 pages
1 Introduction. The Paleogene strata (at depths of more than 2500 m) in the Bohai Sea are complex (Xu Changgui, 2006), the reservoirs are deeply buried, reservoir prediction is difficult (LAI Weicheng, XU Changgui, 2012), and more
Keywords: In; DATA; Application of Attributes Fusion Technology in Prediction of Deep Reservoirs in Paleogene of Bohai Sea; RGB
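Landslide Identification in Remote Sensing Imagery Based on an Improved DeepLabV3+ Algorithm (Cited by: 2)
10
Authors: 李旺平, 尉文博, 刘晓杰, 柴成富, 张雪莹, 周兆叶, 张秀霞, 郝君明, 魏玉明 《地球信息科学学报》 PKU Core 2025, No. 6, pp. 1448-1461, 14 pages
[Objective] Deep learning methods can markedly improve the efficiency of ground-object recognition by automatically extracting complex terrain features. Among them, the DeepLabV3+ algorithm effectively captures multi-pixel features and is widely used for segmenting and recognizing remote sensing imagery. However, its ability to handle fine detail in landslide identification is limited, which tends to blur target boundaries and cause misidentification. In addition, because the model relies on convolution operations that capture local information, it struggles to establish long-range dependencies. [Methods] This paper proposes an improved model based on DeepLabV3+. First, a Coordinate Attention (CA) mechanism is introduced to strengthen feature representation. Second, the original Atrous Spatial Pyramid Pooling (ASPP) module is replaced with a Dense Atrous Spatial Pyramid Pooling (DenseASPP) module, which improves multi-scale feature extraction and addresses the problem of atrous convolutions becoming inefficient or ineffective; at the same time, a Strip Pooling (SP) branch is added in parallel to improve the backbone's ability to model long-range dependencies. Finally, a Cascade Feature Fusion (CFF) module is introduced to integrate feature information from different levels and further optimize segmentation performance. [Results] Experiments on the Bijie landslide dataset show that the improved model raises MIoU by 2.2% and the F1 score by 1.2% relative to the original model; compared with other mainstream deep learning models, it shows an advantage in extraction accuracy. In terms of segmentation quality, the model markedly improves the overall accuracy of landslide-region identification; its segmentation results remain highly consistent with the original landslide shapes, with fewer misclassifications and omissions and more precise landslide boundaries. [Conclusions] Tests on the validation dataset and in practical applications show that the proposed method identifies landslides well in imagery of different scenes and complexity levels, and is especially stable against complex backgrounds such as vegetation-covered areas and areas adjacent to rivers, indicating strong generalization and broad applicability.
Keywords: landslide identification; remote sensing imagery; deep learning; semantic segmentation; DeepLabV3+; attention mechanism; DenseASPP; feature fusion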
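The two segmentation scores reported in the entry above can be computed from pixel labels as sketched below. This is a generic illustration for a two-class (landslide vs. background) mask using the standard definitions of IoU and F1; the function names and the tiny example masks are assumptions, and the code expects both classes to be present so that no denominator is zero.

```python
def confusion_counts(pred, truth, positive=1):
    """Count true/false positives and negatives over flattened pixel labels."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    tn = sum(1 for p, t in pairs if p != positive and t != positive)
    return tp, fp, fn, tn


def iou(intersection, extra_pred, extra_truth):
    """Intersection over union for one class."""
    return intersection / (intersection + extra_pred + extra_truth)


def mean_iou(pred, truth):
    """Average of the landslide-class IoU and the background-class IoU."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    # For the background class the roles swap: tn is the intersection,
    # fn are pixels wrongly predicted as background, fp are missed background.
    return (iou(tp, fp, fn) + iou(tn, fn, fp)) / 2


def f1_score(pred, truth):
    """Harmonic mean of precision and recall for the landslide class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]    # made-up predicted mask, flattened
    truth = [1, 0, 0, 0]   # made-up ground-truth mask, flattened
    print(mean_iou(pred, truth), f1_score(pred, truth))
```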
Deep Convolutional Feature Fusion Model for Multispectral Maritime Imagery Ship Recognition
11
Authors: Xiaohua Qiu, Min Li, Liqiong Zhang, Rui Zhao 《Journal of Computer and Communications》 2020, No. 11, pp. 23-43, 21 pages
Combining visible and infrared object information, multispectral data is a promising data source for automatic maritime ship recognition. In this paper, to take advantage of deep convolutional neural networks and multispectral data, we model the multispectral ship recognition task as a convolutional feature fusion problem and propose a feature fusion architecture called Hybrid Fusion. We fine-tune the VGG-16 model pre-trained on ImageNet on three-channel single-spectral images and four-channel multispectral images, and use existing regularization techniques to avoid over-fitting. Hybrid Fusion and three other feature fusion architectures are investigated. Each fusion architecture consists of visible-image and infrared-image feature extraction branches, in which the pre-trained and fine-tuned VGG-16 models serve as feature extractors. In each architecture, the image features of the two branches are first extracted from the same layer or from different layers of the VGG-16 model. The features from the two branches are then flattened and concatenated into a multispectral feature vector, which is finally fed into a classifier to perform ship recognition. Furthermore, based on these fusion architectures, we evaluate the recognition performance of a feature vector normalization method and three combinations of feature extractors. Experimental results on the visible and infrared ship (VAIS) dataset show that the best Hybrid Fusion achieves 89.6% mean per-class recognition accuracy on daytime paired images and 64.9% on nighttime infrared images, outperforming the state-of-the-art method by 1.4% and 3.9%, respectively.
Keywords: deep convolutional neural network; feature fusion; multispectral data; object recognition
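The flatten-and-concatenate step and the mean per-class accuracy metric mentioned in the entry above can be sketched in a few lines. This is a toy illustration with nested lists standing in for feature maps; the function names, the tiny feature maps, and the class labels are hypothetical, and no real VGG-16 features are involved.

```python
def flatten(feature_map):
    """Flatten an arbitrarily nested list (e.g. channels x height x width)."""
    if not isinstance(feature_map, list):
        return [feature_map]
    flat = []
    for item in feature_map:
        flat.extend(flatten(item))
    return flat


def fuse(visible_map, infrared_map):
    """Concatenate the flattened features of the two branches into one vector."""
    return flatten(visible_map) + flatten(infrared_map)


def mean_per_class_accuracy(pred, truth):
    """Average the recognition accuracy of each class, so rare classes count equally."""
    per_class = []
    for cls in sorted(set(truth)):
        idx = [i for i, t in enumerate(truth) if t == cls]
        per_class.append(sum(1 for i in idx if pred[i] == cls) / len(idx))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    print(fuse([[1, 2], [3, 4]], [[5], [6]]))
    truth = ["cargo", "cargo", "cargo", "tug"]   # made-up labels
    pred = ["cargo", "cargo", "tug", "tug"]      # made-up predictions
    print(mean_per_class_accuracy(pred, truth))
```

Averaging per class rather than per image prevents a frequent ship type from dominating the score, which matters when a dataset is imbalanced.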
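Semantic Segmentation of Apple Leaf Lesions Based on an Improved DeepLabV3+
12
Authors: 郑瑜煌, 陈丙三, 章文水, 卢敏瑞, 张腾健 《中国农机化学报》 PKU Core 2025, No. 8, pp. 75-82, 8 pages
Taking apple leaf lesions as the research object, and addressing the low segmentation accuracy and large parameter counts of existing models, a semantic segmentation method for apple leaf lesions based on an improved DeepLabV3+ is proposed. MobileNetV2 is used as the backbone feature extraction network of DeepLabV3+ to reduce the number of model parameters. An MP-DenseASPP module is proposed that adds atrous convolution layers to ASPP with dense connections and combines them with a mixed pooling module, enlarging the model's receptive field and improving its robustness. A multi-scale shallow feature layer is designed to strengthen the segmentation of multi-scale targets. The AFF module is improved into an ECAFF module that fuses the levels of the multi-scale shallow feature layer and strengthens feature fusion between layers. The results show that the improved DeepLabV3+ model reaches a mean intersection over union, mean pixel accuracy, and F1 score of 72.22%, 88.77%, and 83.44% on the ATLDSD dataset, improvements of 1.10%, 4.73%, and 1.02% over the original model. Its floating-point operations and parameter count are 58.5% and 77.1% lower than the original model's, while detection speed is 6.67 frames/s higher. The method substantially reduces the parameter count and meets real-time requirements while maintaining lesion detection accuracy, laying a foundation for online leaf lesion detection based on semantic segmentation.
Keywords: apple leaves; disease segmentation; attention mechanism; feature fusion; deep learning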
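An Improved DeepLabV3+ Method for Building Segmentation in SAR Images (Cited by: 1)
13
Authors: 张文武, 龙伟军, 陈虹廷, 陈逸飞 《无线电工程》 2025, No. 3, pp. 475-483, 9 pages
Compared with optical images, Synthetic Aperture Radar (SAR) images offer advantages such as a degree of penetration capability and all-weather continuous monitoring, making them suitable for more application scenarios. Building segmentation maps play an important role in urban planning, environmental monitoring, and disaster assessment. To address the insufficient feature extraction capability and low segmentation accuracy of building segmentation algorithms for SAR images, an improved DeepLabV3+ semantic segmentation model, CFNet, is proposed. CFNet replaces the Xception backbone of the conventional DeepLabV3+ with a MobileNetV2 backbone to reduce the total number of parameters and increase computation speed. It proposes a new cross-attention mechanism that combines channel attention and spatial attention to extract shallow and deep features. It also improves how the shallow and deep features extracted in the network are fused, introducing each in turn as an auxiliary input when fusing the two, which makes the fullest use of both and improves the algorithm's feature extraction capability. Experimental results on the SARBuD 1.0 dataset show that CFNet achieves a mean Intersection over Union (mIoU) of 80.69%, a precision of 87.99%, a recall of 92.05%, and an F1 score of 89.86%, a modest improvement in SAR building segmentation accuracy over several other segmentation networks.
Keywords: DeepLabV3+ model; synthetic aperture radar images; deep learning; semantic segmentation; feature fusion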
The Fusion of Temporal Sequence with Scene Priori Information in Deep Learning Object Recognition
14
Authors: Yongkang Cao, Fengjun Liu, Xian Wang, Wenyun Wang, Zhaoxin Peng 《Open Journal of Applied Sciences》 2024, No. 9, pp. 2610-2627, 18 pages
In some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are related to one another, and the scenes have stable prior features. Yet existing technologies do not take full advantage of this information. To push object recognition in these applications beyond existing algorithms, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first uses YOLOv3 as the base algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images taken at different moments, and finally applies the confidence fusion method and temporal boundary processing method designed in this work to fuse temporal sequence information with scene prior information at the decision level. Experiments on public datasets and self-built industrial scene datasets show that, because the information sources are expanded, the quality of single-frame images has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing information across multiple classes. Any object recognition algorithm that outputs object class, location information, and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
Keywords: computer vision; object recognition; deep learning; consecutive scene; information fusion
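Application of Multimodal Fusion Technology in Countering Deepfakes
15
Authors: 高思童 《信息与电脑》 2025, No. 2, pp. 185-187, 3 pages
The rapid development of deepfake technology in visual, audio, and text generation poses serious challenges to information security and authenticity. Most existing forgery detection methods rely on a single modality and struggle to cope with the complexity of high-quality forged samples. To address this problem, the article proposes a deepfake detection framework based on multimodal fusion that exploits cross-modal inconsistencies among visual, audio, and text features to significantly improve detection performance. Experiments on public datasets such as FaceForensics++ and DF-TIMIT validate the method's effectiveness. The results show that the proposed method achieves higher detection accuracy than existing methods across different forgery types, with significantly improved generalization and robustness.
Keywords: multimodal fusion; deepfake; authenticity; detection capability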
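Object Detection Algorithm Based on a Multi-Directional Perception Deep Fusion Detection Head
16
Authors: 包晓安, 彭书友, 张娜, 涂小妹, 张庆琪, 吴彪 《浙江大学学报(工学版)》 PKU Core 2026, No. 1, pp. 32-42, 11 pages
To address the difficulty conventional object detection heads have in capturing global information, an object detection algorithm based on a multi-directional perception deep fusion detection head is proposed. An efficient dual-axis window attention encoder (EDWE) module is designed in the detection head so that the network can deeply fuse the global and local information it captures. A re-parameterized large-kernel convolution (RLK) module is used after the feature pyramid structure to reduce spatial differences in the features coming from the backbone and improve the network's adaptability to small and medium-sized datasets. An encoder selective retention module (ESM) is introduced to selectively accumulate outputs from the EDWE module and optimize backpropagation. Experimental results show that on the larger MS-COCO2017 dataset, applying the proposed algorithm to the common models RetinaNet, FCOS, and ATSS raises AP by 2.9, 2.6, and 3.4 percentage points, respectively; on the smaller PASCAL VOC2007 dataset, it raises the AP of the three models by 1.3, 1.0, and 1.1 percentage points. Through the combined effect of the EDWE, RLK, and ESM modules, the proposed algorithm effectively improves detection accuracy and shows clear performance advantages on datasets of different sizes.
Keywords: detection head; object detection; Transformer encoder; deep fusion; large-kernel convolution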
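Application of Multimodal Information Fusion Technology to the Diagnosis of Vocal Cord Lesions and Report Generation
17
Authors: 陈晓丽, 卜志纯, 杨立, 廖阔, 张萍, 窦艳玲, 方丽, 雷峥, 刘涛 《中国眼耳鼻喉科杂志》 2026, No. 1, pp. 1-6, 6 pages
Objective: To explore the value of a deep learning-based multimodal information fusion technique (the MIFRL model) for diagnosing vocal cord lesions and automatically generating reports. Methods: Electronic laryngoscopy data (images and corresponding diagnostic reports) from 1867 eligible cases at our hospital and Guang'an People's Hospital between January 2019 and December 2022 were collected retrospectively, covering four categories: normal, leukoplakia, polyp, and cancerous lesion. A multimodal information fusion recognition model (the MIFRL model) combining image information with textual description information was built, trained, and then validated on a test set. Its accuracy and effectiveness were evaluated by comparing its multi-attribute classification ability with that of other deep learning recognition models and by comparing its results with the readings of junior resident physicians. Results: Across the four categories, the MIFRL model's mean precision, sensitivity, specificity, and prediction accuracy were 90.3%, 85.3%, 95.2%, and 85.6%, respectively. Compared with other deep learning models, it showed better multi-attribute classification ability and could generate templated text reports. Compared with the junior residents' readings, its prediction accuracy was higher in every disease group, and the differences in the leukoplakia and cancerous lesion groups were statistically significant (P<0.05). Conclusion: The MIFRL model achieves high accuracy in diagnosing vocal cord lesions, provides objective lesion identification results and attribute descriptions, and has potential for clinical application.
Keywords: vocal cord lesions; laryngoscopic images; multimodal; information fusion; deep learning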
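Global Feature Fusion Underwater Image Enhancement Network Based on an Improved U-Net
18
Authors: 高绍姝, 焦广森, 李广峰, 刘宗恩 《光学精密工程》 PKU Core 2026, No. 2, pp. 322-335, 14 pages
To address the color casts and blurred details that scattering and attenuation cause when light propagates underwater, a global feature fusion underwater image enhancement network based on an improved U-Net is proposed. First, multi-residual convolution modules are designed in the encoder and decoder to fuse feature information hierarchically and reduce the loss of detail. Second, a channel attention module is introduced in the decoder to weight the channels, mitigating the differing degrees of degradation across channels. Finally, a convolution-permuted self-attention module is designed in the decoder to fuse global information and help the network guide image reconstruction. On the UIEB dataset, the proposed method achieves PSNR, SSIM, and LPIPS scores of 23.42, 0.9005, and 0.1385, respectively; on the LSUI dataset, it achieves 29.35, 0.9382, and 0.0880. The experimental results show that the method performs well at correcting color casts and reducing detail blur, demonstrating its effectiveness and feasibility.
Keywords: underwater image enhancement; deep learning; feature fusion; attention mechanism; convolutional neural network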
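Of the three image-quality scores quoted in the entry above, PSNR is simple enough to sketch directly. The code below uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), on flattened pixel lists; the function name, the 8-bit peak value of 255, and the sample pixels are assumptions for illustration. SSIM and LPIPS are not shown because they need windowed statistics and a pretrained network, respectively.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [0.0, 0.0, 0.0, 0.0]       # made-up reference pixels
    enhanced = [25.5, 25.5, 25.5, 25.5]    # made-up enhanced pixels
    print(psnr(reference, enhanced))
```

Higher PSNR means the enhanced image is closer to the reference pixel by pixel, whereas for LPIPS lower values are better.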
Method of Multi-Mode Sensor Data Fusion with an Adaptive Deep Coupling Convolutional Auto-Encoder
19
Authors: Xiaoxiong Feng, Jianhua Liu 《Journal of Sensor Technology》 2023, No. 4, pp. 69-85, 17 pages
To address the difficulties in fusing multi-mode sensor data for complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method is proposed. First, the multi-mode features extracted synchronously by the CCAE are stacked and fed to multi-channel convolution layers for fusion. Then the fused data are passed to fully connected layers for compression and fed to the Softmax module for classification. Finally, the coupling loss function coefficients and the network parameters are optimized adaptively using the gray wolf optimization (GWO) algorithm. Experimental comparisons show that the proposed ADCCAE fusion model is superior to existing models for multi-mode data fusion.
Keywords: multi-mode data fusion; coupling convolutional auto-encoder; adaptive optimization; deep learning
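Identification of Composite Power Quality Disturbances Based on Combined Time-Frequency Images and Time-Series Features
20
Authors: 毕贵红, 刘大卫, 陈仕龙, 张维, 陈世轲 《电气技术》 2026, No. 1, pp. 9-19, 11 pages
To address the difficulty of identifying power quality disturbances (PQDs), this paper proposes a lightweight dual-branch multimodal fusion recognition model based on LIRC-BiLSTM. The model first applies the S-transform to the raw PQD signal to generate a time-frequency image, which is fed to the convolutional block attention module (CBAM) branch; in parallel, the raw one-dimensional PQD time-series vector is fed to the bidirectional long short-term memory (BiLSTM) branch. In the CBAM branch, a multi-scale feature extraction module extracts image features at different resolutions, and CBAM then adaptively enhances channel and spatial attention to focus on the key patterns and overall trends of the time-frequency image. In the BiLSTM branch, the time-series matrix first undergoes lightweight convolutional preprocessing and is then passed to the BiLSTM, with a self-attention mechanism reinforcing the temporal features. Finally, the outputs of the two branches are fused, combining time-frequency and time-series features, to classify the PQD type. Simulation experiments show that the proposed LIRC-BiLSTM model effectively fuses time-frequency image information with time-series detail and significantly improves recognition accuracy and noise robustness across multiple classes of power quality disturbances.
Keywords: power quality disturbances; S-transform; multimodal feature fusion; deep learning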