411 articles found
Joint Feature Encoding and Task Alignment Mechanism for Emotion-Cause Pair Extraction
1
Authors: Shi Li, Didi Sun. 《Computers, Materials & Continua》 (SCIE, EI), 2025, No. 1, pp. 1069-1086 (18 pages)
With the rapid expansion of social media, analyzing emotions and their causes in texts has gained significant importance. Emotion-cause pair extraction enables the identification of causal relationships between emotions and their triggers within a text, facilitating a deeper understanding of expressed sentiments and their underlying reasons. This comprehension is crucial for making informed strategic decisions in various business and societal contexts. However, recent research approaches employing multi-task learning frameworks for modeling often face challenges such as the inability to simultaneously model extracted features and their interactions, or inconsistencies in label prediction between emotion-cause pair extraction and independent assistant tasks like emotion and cause extraction. To address these issues, this study proposes an emotion-cause pair extraction methodology that incorporates joint feature encoding and task alignment mechanisms. The model consists of two primary components. First, joint feature encoding simultaneously generates features for emotion-cause pairs and clauses, enhancing feature interactions between emotion clauses, cause clauses, and emotion-cause pairs. Second, the task alignment technique is applied to reduce the labeling distance between emotion-cause pair extraction and the two assistant tasks, capturing deep semantic information interactions among tasks. The proposed method is evaluated on a Chinese benchmark corpus using 10-fold cross-validation, assessing key performance metrics such as precision, recall, and F1 score. Experimental results demonstrate that the model achieves an F1 score of 76.05%, surpassing the state-of-the-art by 1.03%. The proposed model exhibits significant improvements in emotion-cause pair extraction (ECPE) and cause extraction (CE) compared to existing methods, validating its effectiveness. This research introduces a novel approach based on joint feature encoding and task alignment mechanisms, contributing to advancements in emotion-cause pair extraction. However, the study's limitation lies in the data sources, potentially restricting the generalizability of the findings.
Keywords: emotion-cause pair extraction; interactive information enhancement; joint feature encoding; label consistency; task alignment mechanisms
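The abstract above reports precision, recall, and F1 for extracted emotion-cause pairs. As a minimal sketch of how such pair-level metrics are commonly computed, the snippet below scores a set of predicted pairs against a gold set. The function name `pair_prf` and the representation of a pair as an `(emotion_clause, cause_clause)` index tuple are assumptions made for illustration and are not taken from the paper.

```python
def pair_prf(predicted, gold):
    """Pair-level precision, recall and F1 (illustrative helper, not the paper's code).

    Each pair is assumed to be an (emotion_clause_index, cause_clause_index) tuple.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three predicted pairs, four gold pairs, two in common.
predicted_pairs = [(1, 1), (2, 1), (3, 2)]
gold_pairs = [(1, 1), (3, 2), (4, 4), (5, 4)]
precision, recall, f1 = pair_prf(predicted_pairs, gold_pairs)
```

Here two of three predictions are correct and two of four gold pairs are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.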
Self-FAGCFN: Graph-Convolution Fusion Network Based on Feature Fusion and Self-Supervised Feature Alignment for Pneumonia and Tuberculosis Diagnosis
2
Authors: Junding Sun, Wenhao Tang, Lei Zhao, Chaosheng Tang, Xiaosheng Wu, Zhaozhao Xu, Bin Pu, Yudong Zhang. 《Journal of Bionic Engineering》, 2025, No. 4, pp. 2012-2029 (18 pages)
Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources. Recently, Deep Learning (DL) has been widely used in pulmonary disease diagnosis, such as pneumonia and tuberculosis. However, traditional feature fusion methods often suffer from feature disparity, information loss, redundancy, and increased complexity, hindering the further extension of DL algorithms. To solve this problem, we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment (Self-FAGCFN) to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis. The network integrates Convolutional Neural Networks (CNNs) for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks (GCNs) within a Graph Neural Network branch to capture features based on graph structure, focusing on significant node representations. Additionally, an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs. To ensure effective feature alignment between pre- and post-fusion stages, we introduce a feature alignment loss that minimizes disparities. Moreover, to address the limitations of proposed methods, such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset, we develop a Feature-Centroid Fusion (FCF) strategy and a Multi-Level Feature-Centroid Update (MLFCU) algorithm, respectively. Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis, highlighting its potential for practical medical applications.
Keywords: feature fusion; self-supervised feature alignment; convolutional neural networks; graph convolutional networks; class imbalance; feature-centroid fusion
Hierarchical Optimization Method for Federated Learning with Feature Alignment and Decision Fusion
3
Authors: Ke Li, Xiaofeng Wang, Hu Wang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 10, pp. 1391-1407 (17 pages)
In the realm of data privacy protection, federated learning aims to collaboratively train a global model. However, heterogeneous data between clients presents challenges, often resulting in slow convergence and inadequate accuracy of the global model. Utilizing shared feature representations alongside customized classifiers for individual clients emerges as a promising personalized solution. Nonetheless, previous research has frequently neglected the integration of global knowledge into local representation learning and the synergy between global and local classifiers, thereby limiting model performance. To tackle these issues, this study proposes a hierarchical optimization method for federated learning with feature alignment and the fusion of classification decisions (FedFCD). FedFCD regularizes the relationship between global and local feature representations to achieve alignment and incorporates decision information from the global classifier, facilitating the late fusion of decision outputs from both global and local classifiers. Additionally, FedFCD employs a hierarchical optimization strategy to flexibly optimize model parameters. Through experiments on the Fashion-MNIST, CIFAR-10 and CIFAR-100 datasets, we demonstrate the effectiveness and superiority of FedFCD. For instance, on the CIFAR-100 dataset, FedFCD exhibited a significant improvement in average test accuracy by 6.83% compared to four outstanding personalized federated learning approaches. Furthermore, extended experiments confirm the robustness of FedFCD across various hyperparameter values.
Keywords: federated learning; data heterogeneity; feature alignment; decision fusion; hierarchical optimization
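The abstract above names two ingredients: a regularizer that aligns local and global feature representations, and late fusion of the global and local classifier decisions. The snippet below is only a rough sketch of those two ideas under stated assumptions. The abstract does not give the actual regularizer or fusion rule used by FedFCD, so the squared-distance penalty, the fixed mixing weight, and all function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_penalty(global_feature, local_feature):
    """Assumed regularizer: squared L2 distance between the two feature vectors."""
    return sum((g - l) ** 2 for g, l in zip(global_feature, local_feature))


def late_fusion(global_logits, local_logits, weight=0.5):
    """Assumed fusion rule: convex combination of the two classifiers' probabilities."""
    global_probs = softmax(global_logits)
    local_probs = softmax(local_logits)
    return [weight * g + (1.0 - weight) * l
            for g, l in zip(global_probs, local_probs)]


# Hypothetical three-class example for one client sample.
fused = late_fusion([2.0, 1.0, 0.1], [0.5, 2.5, 0.3], weight=0.5)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

A fixed `weight` is the simplest choice for illustration; a learned or per-client weight would be a natural variation.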
Feature pyramid attention network for audio-visual scene classification (cited by: 1)
4
Authors: Liguang Zhou, Yuhongze Zhou, Xiaonan Qi, Junjie Hu, Tin Lun Lam, Yangsheng Xu. 《CAAI Transactions on Intelligence Technology》, 2025, No. 2, pp. 359-374 (16 pages)
Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Keywords: dimension alignment; feature pyramid attention network; pyramid channel attention; pyramid spatial attention; semantic relevant regions
Multi-Modal Pre-Synergistic Fusion Entity Alignment Based on Mutual Information Strategy Optimization
5
Authors: Huayu Li, Xinxin Chen, Lizhuang Tan, Konstantin I. Kostromitin, Athanasios V. Vasilakos, Peiying Zhang. 《Computers, Materials & Continua》, 2025, No. 11, pp. 4133-4153 (21 pages)
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
Keywords: knowledge graph; multi-modal; entity alignment; feature fusion; pre-synergistic fusion
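The results above are reported as Hits@1 and MRR, two standard ranking metrics for entity alignment. The sketch below shows how they are typically computed from the 1-based rank at which each source entity's true counterpart appears in the candidate list. The function names and the sample ranks are illustrative assumptions, not values from the paper.

```python
def hits_at_k(ranks, k=1):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical ranks of the true counterpart for four source entities.
ranks = [1, 2, 1, 4]
print(hits_at_k(ranks, k=1))        # 0.5
print(mean_reciprocal_rank(ranks))  # 0.6875
```

Hits@1 only rewards exact top-1 matches, while MRR also gives partial credit to near misses, which is why papers usually report both.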
A Dual Stream Multimodal Alignment and Fusion Network for Classifying Short Videos
6
Authors: ZHOU Ming, WANG Tong. 《Journal of Donghua University(English Edition)》, 2025, No. 1, pp. 88-95 (8 pages)
Video classification is an important task in video understanding and plays a pivotal role in intelligent monitoring of information content. Most existing methods do not consider the multimodal nature of the video, and the modality fusion approach tends to be too simple, often neglecting modality alignment before fusion. This research introduces a novel dual stream multimodal alignment and fusion network named DMAFNet for classifying short videos. The network uses two unimodal encoder modules to extract features within modalities and exploits a multimodal encoder module to learn interaction between modalities. To solve the modality alignment problem, contrastive learning is introduced between two unimodal encoder modules. Additionally, masked language modeling (MLM) and video text matching (VTM) auxiliary tasks are introduced to improve the interaction between video frames and text modalities through backpropagation of loss functions. Diverse experiments prove the efficiency of DMAFNet in multimodal video classification tasks. Compared with two other mainstream baselines, DMAFNet achieves the best results on the 2022 WeChat Big Data Challenge dataset.
Keywords: video classification; multimodal fusion; feature alignment
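The abstract above says contrastive learning is applied between the two unimodal encoders to align modalities before fusion. The snippet below sketches one common form of such an objective, a symmetric InfoNCE loss over cosine similarities, in plain Python. The abstract does not state which contrastive loss DMAFNet uses, so the loss form, the temperature value, and the function names are assumptions. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.07):
    """Mean cross-entropy of matching anchor i to candidate i among all candidates."""
    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(anchors)


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Average of the video-to-text and text-to-video InfoNCE terms."""
    return 0.5 * (info_nce(video_embs, text_embs, temperature)
                  + info_nce(text_embs, video_embs, temperature))


# Hypothetical batch of two video/text pairs with 2-D embeddings.
video = [[1.0, 0.0], [0.0, 1.0]]
text_aligned = [[1.0, 0.0], [0.0, 1.0]]
text_swapped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = symmetric_contrastive_loss(video, text_aligned)
swapped_loss = symmetric_contrastive_loss(video, text_swapped)
```

With matching pairs the loss is close to zero, and with the text embeddings swapped it is much larger, which is the behaviour an alignment objective of this kind is meant to have.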
Advancing Sports Image Classification and Analysis: Effective Data Augmentation and Feature Alignment Strategies
7
Authors: Ping Liu, Chao Zhao, Bin Zang, Sifeng Wang, Shigen Shen. 《Tsinghua Science and Technology》, 2026, No. 1, pp. 577-589 (13 pages)
Sport plays a crucial role in society, influencing physical health, entertainment, and community engagement. As artificial intelligence advances, the ability to classify sport images accurately becomes increasingly crucial. Effective sport image classification enhances applications, such as performance analysis, athlete tracking, and fan engagement. Despite its significance, current methods face challenges due to limited labeled datasets and issues with feature misalignment. This paper introduces a novel Contrastive Language-Image Pre-training (CLIP) based framework specifically designed for sport image classification. By incorporating data augmentation techniques, the approach addresses data sparsity and enriches the diversity of image-text pairings, reducing the need for extensive manual annotation. Additionally, feature alignment strategies tackle text-image misalignment issues that affect classification accuracy. This approach fills a significant research gap and offers practical solutions to improve classification performance in sport image analysis. The results of extensive experiments validate the effectiveness of the framework, demonstrating its potential to advance sports analytics and contribute to more precise and scalable solutions in sport image classification.
Keywords: sport image classification; sports analytics; data augmentation; feature alignment
Feature Extraction of Kernel Regress Reconstruction for Fault Diagnosis Based on Self-organizing Manifold Learning (cited by: 3)
8
Authors: CHEN Xiaoguang, LIANG Lin, XU Guanghua, LIU Dan. 《Chinese Journal of Mechanical Engineering》 (SCIE, EI, CAS, CSCD), 2013, No. 5, pp. 1041-1049 (9 pages)
The feature space extracted from vibration signals with various faults is often nonlinear and of high dimension. Currently, nonlinear dimensionality reduction methods such as manifold learning are available for extracting low-dimensional embeddings. However, these methods all rely on manual intervention, which leaves them with shortcomings in stability and in suppressing disturbance noise. To extract features automatically, a manifold learning method with self-organization mapping is introduced for the first time. Under the non-uniform sample distribution reconstructed by the phase space, the expectation maximization (EM) iteration algorithm is used to divide the local neighborhoods adaptively without manual intervention. After that, the local tangent space alignment (LTSA) algorithm is adopted to compress the high-dimensional phase space into a more truthful low-dimensional representation. Finally, the signal is reconstructed by kernel regression. Several typical cases, including the Lorenz system, an engine fault with a piston pin defect, and a bearing fault with an outer-race defect, are analyzed. Compared with LTSA and the continuous wavelet transform, the results show that the background noise can be fully restrained and the entire periodic repetition of impact components is well separated and identified. A new way to automatically and precisely extract the impulsive components from mechanical signals is proposed.
Keywords: feature extraction; manifold learning; self-organize mapping; kernel regression; local tangent space alignment
Class conditional distribution alignment for domain adaptation (cited by: 2)
9
Authors: Kai CAO, Zhipeng TU, Yang MING. 《Control Theory and Technology》 (EI, CSCD), 2020, No. 1, pp. 72-80 (9 pages)
In this paper, we study the problem of domain adaptation, which is a crucial ingredient in transfer learning with two domains, that is, the source domain with labeled data and the target domain with none or few labels. Domain adaptation aims to extract knowledge from the source domain to improve the performance of the learning task in the target domain. A popular approach to handle this problem is via adversarial training, which is explained by the HΔH-distance theory. However, traditional adversarial network architectures only align the marginal feature distribution in the feature space. The alignment of the class-conditional distribution is not guaranteed. Therefore, we propose a novel method based on pseudo labels and the cluster assumption to avoid incorrect class alignment in the feature space. The experiments demonstrate that our framework improves the accuracy on typical transfer learning tasks.
Keywords: domain adaptation; distribution alignment; feature cluster
A Power Data Anomaly Detection Model Based on Deep Learning with Adaptive Feature Fusion
10
Authors: Xiu Liu, Liang Gu, Xin Gong, Long An, Xurui Gao, Juying Wu. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 6, pp. 4045-4061 (17 pages)
With the popularisation of intelligent power, power devices have different shapes, numbers and specifications. This means that the power data has distributional variability, so the model learning process cannot achieve sufficient extraction of data features, which seriously affects the accuracy and performance of anomaly detection. Therefore, this paper proposes a deep learning-based anomaly detection model for power data, which integrates a data alignment enhancement technique based on random sampling and an adaptive feature fusion method leveraging dimension reduction. Aiming at the distribution variability of power data, this paper develops a sliding window-based data adjustment method for this model, which solves the problem of high-dimensional feature noise and low-dimensional missing data. To address the problem of insufficient feature fusion, an adaptive feature fusion method based on feature dimension reduction and dictionary learning is proposed to improve the anomaly data detection accuracy of the model. In order to verify the effectiveness of the proposed method, we conducted effectiveness comparisons through ablation experiments. The experimental results show that, compared with the traditional anomaly detection methods, the method proposed in this paper not only has an advantage in model accuracy, but also reduces the amount of parameter calculation of the model in the process of feature matching and improves the detection speed.
Keywords: data alignment; dimension reduction; feature fusion; data anomaly detection; deep learning
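A Point Cloud Object Detection Method Based on Feature Flow
11
Authors: 陆军, 邹康成, 李杨. 《智能系统学报》 (PKU Core), 2026, No. 1, pp. 146-155 (10 pages)
To address the loss of scene information and the missed detections caused by point cloud sparsity in existing LiDAR point cloud 3D object detection methods, this paper proposes a single-stage 3D object detection algorithm based on feature flow, which improves detection performance through multi-frame spatiotemporal feature fusion and a dynamic alignment mechanism. First, a multi-frame fusion framework driven by a gating network is constructed, in which a deformable attention mechanism works together with a spatiotemporal feature extraction module to align cross-frame features dynamically and suppress false detections caused by fusing unaligned features. Second, a deformable attention mechanism guided by spatiotemporal features is designed, which predicts feature offsets and weights from object motion information to improve feature matching accuracy on sparse point clouds. Finally, a hierarchical feature flow extraction module is designed that combines multi-scale feature extraction with a progressive fusion strategy to strengthen scene representation. Experimental results show that the proposed algorithm achieves a mean average precision of 63.73% on the NuScenes validation set, 4.51% higher than the voxel-based baseline, with detection accuracy for small objects such as motorcycles and bicycles improving by more than 14%. Ablation results show that the multi-frame complementary mechanism raises recall for distant objects (>50 m) by 16.2% and lowers the missed-detection rate in occluded scenes by 11.8%. This study provides an effective solution for 3D detection on sparse point clouds in autonomous driving.
Keywords: LiDAR point cloud; object detection; feature flow; feature alignment; temporal feature fusion; deformable attention mechanism; bird's-eye-view representation; multi-frame point cloud fusion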
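A Lightweight Remote Sensing Multimodal Large Language Model Based on Knowledge Distillation
12
Authors: 张馨月, 冯世阳, 王斌. 《红外与毫米波学报》 (PKU Core), 2026, No. 1, pp. 103-115 (13 pages)
Remote sensing multimodal large language models integrate rich visual and linguistic information and show great potential in remote sensing image analysis and interpretation. However, existing knowledge distillation methods mostly focus on compressing unimodal large language models and neglect feature alignment across modalities, which hinders the performance of large language models on cross-modal tasks. To address this, a knowledge-distillation-based method for making remote sensing multimodal large language models lightweight is proposed. It aligns the outputs of each modality at the feature level to achieve effective alignment of multimodal information, and it introduces the reverse Kullback-Leibler divergence as the loss function, combined with teacher-mixed sampling and single-step decomposition as optimization strategies, to further improve the generalization and stability of the student model. Experimental results show that the method achieves higher accuracy and efficiency on four downstream remote sensing tasks (scene classification, visual question answering, visual grounding, and image captioning) while significantly reducing the number of model parameters and the demand for computing resources, offering a new solution for the efficient application of multimodal large language models in remote sensing.
Keywords: remote sensing images; multimodal large language model; knowledge distillation; reverse Kullback-Leibler divergence; feature alignment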
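The abstract above uses the reverse Kullback-Leibler divergence as the distillation loss. As a small illustration of what "reverse" means here, the snippet below computes both directions for a pair of discrete distributions: forward KL is KL(teacher || student), and reverse KL is KL(student || teacher). The distributions, the `eps` smoothing constant, and the function names are assumptions for illustration only and do not reproduce the paper's training setup.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher, student):
    """KL(teacher || student): penalizes the student for missing teacher modes."""
    return kl_divergence(teacher, student)


def reverse_kl(teacher, student):
    """KL(student || teacher): penalizes student mass where the teacher has little."""
    return kl_divergence(student, teacher)


# Hypothetical next-token distributions over three tokens.
teacher = [0.49, 0.49, 0.02]   # two roughly equal modes
student = [0.98, 0.01, 0.01]   # student commits to a single mode
fkl = forward_kl(teacher, student)
rkl = reverse_kl(teacher, student)
```

In this example the reverse KL is smaller than the forward KL, because a student that concentrates on one teacher mode is penalized less by the reverse direction. That mode-seeking behaviour is one commonly cited reason for preferring reverse KL when distilling into a smaller model.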
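Text-to-Image Generation Guided by Implicit Feature Maps: Three-Way Attention Fusion
13
Authors: 马栋林, 马晓珍, 赵宏. 《计算机技术与发展》, 2026, No. 2, pp. 101-108 (8 pages)
To address the error accumulation caused by explicit intermediate images in multi-stage generative models, a two-stage decoupled framework based on implicit feature maps and three-way attention fusion is proposed. In the implicit feature map generation stage, recursive attention iteratively generates 64×64 non-visualized feature maps that replace the explicit intermediate images of models such as AttnGAN, avoiding the propagation of multi-stage visualization errors. In the three-way attention enhancement stage, Triplet Attention is extended into a channel-spatial-text interaction mechanism to achieve fine-grained pixel-level control. Experiments show that on the CUB dataset, which has the strictest fine-grained requirements, the CLIP score reaches 0.82, exceeding the baseline AttnGAN (0.62), MediaPipe (0.71), and other models, a 32.3% improvement over the baseline. The IS score rises to 5.05, a 15.8% improvement over the baseline, and inference is 20% faster than multi-stage generation methods such as StackGAN++. On an NVIDIA RTX 4090 (24 GB of memory), the generation time for a single image drops to 0.96 seconds. The framework effectively addresses multi-stage error propagation and fine-grained alignment.
Keywords: text-to-image generation; three-way attention; implicit feature maps; fine-grained alignment; two-stage generation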
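A Multimodal Sentiment Analysis Model Based on Unified Alignment and a Multi-Stage Fusion Mechanism
14
Authors: 冯广, 刘馨婷, 林忆宝, 赵志文, 肖俊鸿, 周科栋, 黄俊辉. 《计算机应用研究》 (PKU Core), 2026, No. 2, pp. 342-352 (11 pages)
To address modality heterogeneity, dynamically varying modality contributions, and insufficient semantic abstraction in multimodal sentiment analysis, a three-stage closed-loop fusion model, MIFA, is proposed, following the path "unified alignment, dynamic fusion regulation, high-order semantic abstraction". The method first uses unified semantic alignment to give heterogeneous modalities a consistent representation in a shared space. It then jointly estimates modality and channel weights through context gating and channel modulation. Finally, hierarchical residual semantic enhancement provides high-order abstraction and stronger discrimination. Experiments on the CMU-MOSI and CMU-MOSEI datasets show binary classification Acc2 and F1 of 86.43%/86.03% and 86.42%/85.81% respectively, seven-class Acc7 of 45.04%/50.41%, and regression MAE of 0.689/0.532, outperforming mainstream models overall. The results verify that the method can stably align modalities and adaptively regulate information flow, improving sentiment classification and intensity regression, and that it has potential for complex cross-modal scenarios.
Keywords: multimodal sentiment analysis; cross-modal feature fusion; unified semantic alignment; dynamic fusion regulation; hierarchical residual mechanism; cross-modal robustness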
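Attention Region Optimization with Dual Filtering and Dynamic Completion for Image-Text Cross-Modal Retrieval
15
Authors: 孟凡奇, 田凯迪, 田研. 《现代信息科技》, 2026, No. 1, pp. 41-46 (6 pages)
Image-text cross-modal retrieval currently faces two main bottlenecks: traditional attention mechanisms often include many redundant regions, which introduces irrelevant semantic noise, while over-aggressive filtering leaves too few valid regions and loses key visual information. Both cases significantly reduce the matching accuracy and robustness of the model. To address this, a dual optimization strategy is proposed: a dual filtering mechanism adaptively retains high-response regions and suppresses redundant noise, and a Top-K dynamic completion method automatically supplements key semantic regions when features are detected to be insufficient. Experiments show that the method maintains feature selection precision while avoiding the loss of key information, significantly improving cross-modal matching performance in complex scenes.
Keywords: cross-modal retrieval; image-text retrieval; feature alignment; threshold filtering; attention optimization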
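The abstract above combines threshold-based filtering of attention regions with a Top-K fallback that restores regions when too few survive. The snippet below is a minimal sketch of that control flow under stated assumptions: the abstract does not define the two filters, so an absolute threshold plus a threshold relative to the strongest response stands in for "dual filtering", and the parameter names and default values are hypothetical. The score list is assumed to be non-empty.

```python
def select_regions(scores, abs_threshold=0.3, rel_threshold=0.5, min_regions=3):
    """Return sorted indices of regions kept after filtering and Top-K completion.

    scores: non-empty list of attention responses, one per image region.
    """
    strongest = max(scores)
    # Assumed dual filter: pass an absolute cut and a cut relative to the maximum.
    kept = [i for i, s in enumerate(scores)
            if s >= abs_threshold and s >= rel_threshold * strongest]

    # Top-K completion: if too few regions survive, add the best remaining ones.
    if len(kept) < min_regions:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for i in ranked:
            if len(kept) >= min_regions:
                break
            if i not in kept:
                kept.append(i)
    return sorted(kept)


# Hypothetical attention responses for five regions.
scores = [0.9, 0.1, 0.8, 0.2, 0.05]
print(select_regions(scores))                 # [0, 2, 3]
print(select_regions(scores, min_regions=1))  # [0, 2]
```

In the first call only regions 0 and 2 pass both filters, so the fallback adds region 3, the best of the rest. In the second call the filter output already meets the minimum and nothing is added.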
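Hierarchical Pose Estimation with Multimodal Fusion and Spherical Sampling
16
Authors: 李欣, 况立群, 赵融, 韩慧妍, 杨晓文. 《计算机应用研究》 (PKU Core), 2026, No. 2, pp. 617-625 (9 pages)
Six-degree-of-freedom (6-DoF) pose estimation is a key technology for 3D spatial perception and is widely used in robotic grasping, autonomous driving, and intelligent manufacturing. For 6-DoF estimation from RGB-D images, deeply fusing the semantic information of the RGB image with the geometric information of the depth image remains a major challenge. Most existing methods use two-stream networks or feature concatenation, processing each modality independently before fusion. Because they lack a cross-modal feature alignment mechanism, semantic and geometric information is insufficiently coupled, which limits estimation accuracy. To this end, SpherePose, a pose estimation framework based on multimodal fusion and spherical sampling, is proposed. The method first samples uniformly on an Icosphere to generate diverse initial pose hypotheses. It then introduces a pose optimization module based on multimodal feature fusion that iteratively refines the initial hypotheses, combining the RGB image, depth map, and 3D coordinate map to improve accuracy. Finally, a pose scoring network with dual attention and a hierarchical ranking mechanism selects the best pose. On the BOP benchmark, the method reaches 99.8% and 97.7% on the ADD(s)-0.1d and ADD-s AUC metrics for the LineMOD and YCB-Video datasets respectively, significantly outperforming the compared methods and showing higher accuracy and robustness. Overall, the framework offers an efficient and reliable solution for multimodal fusion and pose selection and improves 6D pose estimation accuracy.
Keywords: six degrees of freedom; pose estimation; multimodal fusion; cross-modal feature alignment; attention mechanism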
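Multimodal Gait Recognition Based on SMPL Modality Decomposition and Embedding Fusion
17
Authors: 吴越, 梁铮, 高巍, 杨茂达, 赵培森, 邓红霞, 常媛媛. 《浙江大学学报(工学版)》 (PKU Core), 2026, No. 1, pp. 52-60 (9 pages)
To address the limited recognition performance in real-world scenes caused by insufficient mining of gait information and inadequate cross-modal feature alignment in existing gait recognition research, a multimodal gait recognition method based on skinned multi-person linear (SMPL) modality decomposition and embedding fusion is proposed. The SMPL model is decomposed into a shape branch and a pose branch to extract static body shape features and dynamic motion features comprehensively. An adaptive frame-joint attention module is constructed to focus adaptively on key frames and important joints, strengthening pose feature representation. A modality embedding fusion module is designed to project features of different modalities into a unified semantic space, and a modality consistency loss is constructed to optimize cross-modal feature alignment and improve fusion. Experiments on the Gait3D dataset show that, compared with six silhouette-based methods, two skeleton-based methods, and five multimodal methods based on silhouettes combined with skeletons or SMPL models, the proposed method achieves a Rank-1 accuracy of 70.4% and is more robust in complex real-world scenes, verifying its effectiveness in modality feature extraction and cross-modal feature alignment.
Keywords: gait recognition; SMPL model; adaptive attention; feature alignment; modality fusion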
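Analysis of Animal Models of Retinitis Pigmentosa Based on the Clinical Disease and Syndrome Characteristics of Traditional Chinese and Western Medicine
18
Authors: 李晓宇, 梁丽娜, 陈结凤, 朱晓晓, 齐依娜. 《中国实验方剂学杂志》 (PKU Core), 2026, No. 3, pp. 198-203 (6 pages)
Retinitis pigmentosa (RP) is the most common hereditary blinding eye disease seen clinically. Patients show progressive apoptosis of retinal photoreceptor cells accompanied by degeneration of retinal pigment epithelium (RPE) cells, and the pathogenesis is not yet clear. Western medicine currently relies mainly on gene-based approaches, stem cell transplantation, and similar methods, with limited efficacy, whereas traditional Chinese medicine (TCM) has shown some efficacy in clinical observation. Establishing RP animal models that reflect the disease and syndrome characteristics of both Chinese and Western medicine would help combine the strengths of the two and broaden treatment options for RP. This study reviews and summarizes the classification, types, inheritance patterns, and clinical concordance of existing RP animal models. Current RP models come mainly from natural animal models such as RD mice and RCS rats, transgenic animal models such as RPE-65 knockout mice and rhodopsin knockout mice, and chemically induced models such as monochromatic light irradiation and N-ethyl-N-nitrosourea (ENU). These three classes of models focus mainly on histopathological, molecular biological, and cellular immunological indicators of RP, offer limited observation of disease characteristics, and almost entirely lack observation of TCM syndromes. Although RP is a congenital hereditary disease, its progression is still affected by acquired factors such as environment, constitution, emotional state, and care, and existing models do not fully capture the characteristics of the disease. Establishing RP animal models based on the disease and syndrome characteristics of Chinese and Western medicine will therefore benefit future experimental and clinical research.
Keywords: retinitis pigmentosa; disease and syndrome characteristics of traditional Chinese and Western medicine; animal models; clinical concordance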
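Structured Style-Enhanced Learning for Multimodal-Guided Dress Image Generation
19
Authors: 马嘉妮, 刘骊, 付晓东, 刘利军, 彭玮. 《中国图象图形学报》 (PKU Core), 2026, No. 3, pp. 862-879 (18 pages)
Objective: To address redundant and conflicting multi-view text annotations, limited cross-region style transfer, and the difficulty of fine-grained joint control of semantics and style in multimodal-guided dress image generation, a structured style-enhanced learning method is proposed. Method: Taking text descriptions as input, a dynamic attribute template generation strategy tailored to dresses intelligently extracts and reconstructs seven key dress attributes to build structured text prompts free of redundancy and conflict. A textual inversion semantic fusion mechanism converts dress image features into pseudo-word embeddings through textual inversion and fuses them with the structured prompts to form a semantically rich text representation. A cross-domain image feature alignment module introduces skip cross-attention to fuse sketch structure with style images selectively and to establish cross-region style associations. A dual-condition collaborative fusion framework injects the enhanced text representation and the cross-domain style representation into a latent diffusion model layer by layer, giving fine-grained control over semantics and style when generating dress images. Results: Experiments on the dress subset of the DressCode Multimodal dataset compare the method with five recent methods. The proposed method improves Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS) by 2.131 and 0.193 over the compared methods, and improves contrastive language-image pre-training score (CLIPScore) and texture score (TS) by 17.57% and 8.29% respectively, indicating better generation quality. Conclusion: The proposed method effectively captures the deep association between semantic content and style structure and generates high-quality dress images while maintaining multimodal consistency.
Keywords: dress image generation; structured text prompts; textual inversion semantic fusion; cross-domain image feature alignment; diffusion model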
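Multimodal Visual Question Answering Based on Fine-Grained Feature Enhancement
20
Authors: 王志伟, 陆振宇. 《南京信息工程大学学报》 (PKU Core), 2026, No. 1, pp. 35-47 (13 pages)
Existing multimodal visual question answering (VQA) models overlook the fine-grained interaction between locally salient information in images and key local words in text, and the semantic correlation between image and text needs improvement. To this end, this paper proposes a multimodal VQA method based on fine-grained feature enhancement. First, a fine-grained feature extraction method is added for both the visual and textual branches so that the semantic features of images and questions are extracted more completely and accurately. Then, to exploit alignment information between modalities at different levels, an alignment-guided self-attention module is proposed that aligns fine-grained features with global semantic features within a single modality (visual or textual) and fuses unimodal information from different levels in a unified way. Finally, experiments on the VQA v2.0 and VQA-CP v2 datasets show that the proposed method outperforms existing models on the VQA evaluation metrics.
Keywords: visual question answering; multimodal; fine-grained; feature enhancement; entity alignment; feature fusion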