392 articles found
Joint Feature Encoding and Task Alignment Mechanism for Emotion-Cause Pair Extraction
1
Authors: Shi Li, Didi Sun. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 1069-1086, 18 pages
With the rapid expansion of social media, analyzing emotions and their causes in texts has gained significant importance. Emotion-cause pair extraction enables the identification of causal relationships between emotions and their triggers within a text, facilitating a deeper understanding of expressed sentiments and their underlying reasons. This comprehension is crucial for making informed strategic decisions in various business and societal contexts. However, recent research approaches employing multi-task learning frameworks for modeling often face challenges such as the inability to simultaneously model extracted features and their interactions, or inconsistencies in label prediction between emotion-cause pair extraction and independent assistant tasks like emotion and cause extraction. To address these issues, this study proposes an emotion-cause pair extraction methodology that incorporates joint feature encoding and task alignment mechanisms. The model consists of two primary components. First, joint feature encoding simultaneously generates features for emotion-cause pairs and clauses, enhancing feature interactions between emotion clauses, cause clauses, and emotion-cause pairs. Second, the task alignment technique is applied to reduce the labeling distance between emotion-cause pair extraction and the two assistant tasks, capturing deep semantic information interactions among tasks. The proposed method is evaluated on a Chinese benchmark corpus using 10-fold cross-validation, assessing key performance metrics such as precision, recall, and F1 score. Experimental results demonstrate that the model achieves an F1 score of 76.05%, surpassing the state-of-the-art by 1.03%. The proposed model exhibits significant improvements in emotion-cause pair extraction (ECPE) and cause extraction (CE) compared to existing methods, validating its effectiveness. This research introduces a novel approach based on joint feature encoding and task alignment mechanisms, contributing to advancements in emotion-cause pair extraction. However, the study's limitation lies in the data sources, potentially restricting the generalizability of the findings.
Keywords: emotion-cause pair extraction; interactive information enhancement; joint feature encoding; label consistency; task alignment mechanisms
Self-FAGCFN: Graph-Convolution Fusion Network Based on Feature Fusion and Self-Supervised Feature Alignment for Pneumonia and Tuberculosis Diagnosis
2
Authors: Junding Sun, Wenhao Tang, Lei Zhao, Chaosheng Tang, Xiaosheng Wu, Zhaozhao Xu, Bin Pu, Yudong Zhang. Journal of Bionic Engineering, 2025, No. 4, pp. 2012-2029, 18 pages
Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources. Recently, Deep Learning (DL) has been widely used in pulmonary disease diagnosis, such as pneumonia and tuberculosis. However, traditional feature fusion methods often suffer from feature disparity, information loss, redundancy, and increased complexity, hindering the further extension of DL algorithms. To solve this problem, we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment (Self-FAGCFN) to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis. The network integrates Convolutional Neural Networks (CNNs) for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks (GCNs) within a Graph Neural Network branch to capture features based on graph structure, focusing on significant node representations. Additionally, an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs. To ensure effective feature alignment between pre- and post-fusion stages, we introduce a feature alignment loss that minimizes disparities. Moreover, to address the limitations of proposed methods, such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset, we develop a Feature-Centroid Fusion (FCF) strategy and a Multi-Level Feature-Centroid Update (MLFCU) algorithm, respectively. Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis, highlighting its potential for practical medical applications.
Keywords: feature fusion; self-supervised feature alignment; convolutional neural networks; graph convolutional networks; class imbalance; feature-centroid fusion
Hierarchical Optimization Method for Federated Learning with Feature Alignment and Decision Fusion
3
Authors: Ke Li, Xiaofeng Wang, Hu Wang. Computers, Materials & Continua (SCIE, EI), 2024, No. 10, pp. 1391-1407, 17 pages
In the realm of data privacy protection, federated learning aims to collaboratively train a global model. However, heterogeneous data between clients presents challenges, often resulting in slow convergence and inadequate accuracy of the global model. Utilizing shared feature representations alongside customized classifiers for individual clients emerges as a promising personalized solution. Nonetheless, previous research has frequently neglected the integration of global knowledge into local representation learning and the synergy between global and local classifiers, thereby limiting model performance. To tackle these issues, this study proposes a hierarchical optimization method for federated learning with feature alignment and the fusion of classification decisions (FedFCD). FedFCD regularizes the relationship between global and local feature representations to achieve alignment and incorporates decision information from the global classifier, facilitating the late fusion of decision outputs from both global and local classifiers. Additionally, FedFCD employs a hierarchical optimization strategy to flexibly optimize model parameters. Through experiments on the Fashion-MNIST, CIFAR-10 and CIFAR-100 datasets, we demonstrate the effectiveness and superiority of FedFCD. For instance, on the CIFAR-100 dataset, FedFCD exhibited a significant improvement in average test accuracy by 6.83% compared to four outstanding personalized federated learning approaches. Furthermore, extended experiments confirm the robustness of FedFCD across various hyperparameter values.
Keywords: federated learning; data heterogeneity; feature alignment; decision fusion; hierarchical optimization
Feature pyramid attention network for audio-visual scene classification (cited by: 1)
4
Authors: Liguang Zhou, Yuhongze Zhou, Xiaonan Qi, Junjie Hu, Tin Lun Lam, Yangsheng Xu. CAAI Transactions on Intelligence Technology, 2025, No. 2, pp. 359-374, 16 pages
Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Keywords: dimension alignment; feature pyramid attention network; pyramid channel attention; pyramid spatial attention; semantic relevant regions
Multi-Modal Pre-Synergistic Fusion Entity Alignment Based on Mutual Information Strategy Optimization
5
Authors: Huayu Li, Xinxin Chen, Lizhuang Tan, Konstantin I. Kostromitin, Athanasios V. Vasilakos, Peiying Zhang. Computers, Materials & Continua, 2025, No. 11, pp. 4133-4153, 21 pages
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
Keywords: knowledge graph; multi-modal; entity alignment; feature fusion; pre-synergistic fusion
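The Hits@1 and MRR figures quoted in this abstract are standard ranking metrics for entity alignment. As a minimal sketch (the function names and the sample ranks are illustrative assumptions, not taken from the paper), they can be computed from the 1-based rank at which each test entity's true counterpart appears:

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k.

    `ranks` holds 1-based ranks of the correct counterpart, one per query.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four test entities.
example_ranks = [1, 3, 1, 2]
hits1 = hits_at_k(example_ranks, 1)
mrr = mean_reciprocal_rank(example_ranks)
print(f"Hits@1 = {hits1:.3f}, MRR = {mrr:.3f}")
```

Both metrics lie in [0, 1], so a reported gain such as "7% in Hits@1" is usually read as percentage points on this scale, although the abstract does not say so explicitly.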
A Dual Stream Multimodal Alignment and Fusion Network for Classifying Short Videos
6
Authors: ZHOU Ming, WANG Tong. Journal of Donghua University (English Edition), 2025, No. 1, pp. 88-95, 8 pages
Video classification is an important task in video understanding and plays a pivotal role in intelligent monitoring of information content. Most existing methods do not consider the multimodal nature of the video, and the modality fusion approach tends to be too simple, often neglecting modality alignment before fusion. This research introduces a novel dual stream multimodal alignment and fusion network named DMAFNet for classifying short videos. The network uses two unimodal encoder modules to extract features within modalities and exploits a multimodal encoder module to learn interaction between modalities. To solve the modality alignment problem, contrastive learning is introduced between the two unimodal encoder modules. Additionally, masked language modeling (MLM) and video text matching (VTM) auxiliary tasks are introduced to improve the interaction between video frames and text modalities through backpropagation of loss functions. Diverse experiments prove the efficiency of DMAFNet in multimodal video classification tasks. Compared with two other mainstream baselines, DMAFNet achieves the best results on the 2022 WeChat Big Data Challenge dataset.
Keywords: video classification; multimodal fusion; feature alignment
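The abstract does not state which contrastive objective DMAFNet uses between its two unimodal encoders. A common choice for this kind of alignment is a symmetric InfoNCE loss, sketched below in plain Python under that assumption; the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[target]


def symmetric_info_nce(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: the i-th video and i-th text form the positive pair,
    every other pairing in the batch is treated as a negative."""
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
            for j in range(n)] for i in range(n)]
    video_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_video = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (video_to_text + text_to_video)


# Toy batch: matched pairs give a small loss, swapped pairs a large one.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this loss pulls each video embedding toward its paired text embedding and away from the other texts in the batch, which is the sense in which contrastive learning aligns the two modalities before fusion.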
Advancing Sports Image Classification and Analysis: Effective Data Augmentation and Feature Alignment Strategies
7
Authors: Ping Liu, Chao Zhao, Bin Zang, Sifeng Wang, Shigen Shen. Tsinghua Science and Technology, 2026, No. 1, pp. 577-589, 13 pages
Sport plays a crucial role in society, influencing physical health, entertainment, and community engagement. As artificial intelligence advances, the ability to classify sport images accurately becomes increasingly crucial. Effective sport image classification enhances applications such as performance analysis, athlete tracking, and fan engagement. Despite its significance, current methods face challenges due to limited labeled datasets and issues with feature misalignment. This paper introduces a novel Contrastive Language-Image Pre-training (CLIP) based framework specifically designed for sport image classification. By incorporating data augmentation techniques, the approach addresses data sparsity and enriches the diversity of image-text pairings, reducing the need for extensive manual annotation. Additionally, feature alignment strategies tackle text-image misalignment issues that affect classification accuracy. This approach fills a significant research gap and offers practical solutions to improve classification performance in sport image analysis. The results of extensive experiments validate the effectiveness of the framework, demonstrating its potential to advance sports analytics and contribute to more precise and scalable solutions in sport image classification.
Keywords: sport image classification; sports analytics; data augmentation; feature alignment
Feature Extraction of Kernel Regress Reconstruction for Fault Diagnosis Based on Self-organizing Manifold Learning (cited by: 3)
8
Authors: CHEN Xiaoguang, LIANG Lin, XU Guanghua, LIU Dan. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2013, No. 5, pp. 1041-1049, 9 pages
The feature space extracted from vibration signals with various faults is often nonlinear and of high dimension. Currently, nonlinear dimensionality reduction methods such as manifold learning are available for extracting low-dimensional embeddings. However, these methods all rely on manual intervention, which leads to shortcomings in stability and in suppressing disturbance noise. To extract features automatically, a manifold learning method with self-organization mapping is introduced for the first time. Under the non-uniform sample distribution reconstructed by the phase space, the expectation maximization (EM) iteration algorithm is used to divide the local neighborhoods adaptively without manual intervention. After that, the local tangent space alignment (LTSA) algorithm is adopted to compress the high-dimensional phase space into a more truthful low-dimensional representation. Finally, the signal is reconstructed by kernel regression. Several typical states, including the Lorenz system, an engine fault with a piston pin defect, and a bearing fault with an outer-race defect, are analyzed. Compared with LTSA and the continuous wavelet transform, the results show that the background noise can be fully restrained and the entire periodic repetition of impact components is well separated and identified. A new way to automatically and precisely extract the impulsive components from mechanical signals is proposed.
Keywords: feature extraction; manifold learning; self-organize mapping; kernel regression; local tangent space alignment
Class conditional distribution alignment for domain adaptation (cited by: 2)
9
Authors: Kai CAO, Zhipeng TU, Yang MING. Control Theory and Technology (EI, CSCD), 2020, No. 1, pp. 72-80, 9 pages
In this paper, we study the problem of domain adaptation, which is a crucial ingredient in transfer learning with two domains, that is, the source domain with labeled data and the target domain with none or few labels. Domain adaptation aims to extract knowledge from the source domain to improve the performance of the learning task in the target domain. A popular approach to handle this problem is via adversarial training, which is explained by the HΔH-distance theory. However, traditional adversarial network architectures just align the marginal feature distribution in the feature space. The alignment of the class-conditional distribution is not guaranteed. Therefore, we propose a novel method based on pseudo labels and the cluster assumption to avoid incorrect class alignment in the feature space. The experiments demonstrate that our framework improves the accuracy on typical transfer learning tasks.
Keywords: domain adaptation; distribution alignment; feature cluster
A Power Data Anomaly Detection Model Based on Deep Learning with Adaptive Feature Fusion
10
Authors: Xiu Liu, Liang Gu, Xin Gong, Long An, Xurui Gao, Juying Wu. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 4045-4061, 17 pages
With the popularisation of intelligent power, power devices have different shapes, numbers and specifications. This means that the power data has distributional variability, so the model learning process cannot achieve sufficient extraction of data features, which seriously affects the accuracy and performance of anomaly detection. Therefore, this paper proposes a deep learning-based anomaly detection model for power data, which integrates a data alignment enhancement technique based on random sampling and an adaptive feature fusion method leveraging dimension reduction. Aiming at the distribution variability of power data, this paper develops a sliding window-based data adjustment method for this model, which solves the problem of high-dimensional feature noise and low-dimensional missing data. To address the problem of insufficient feature fusion, an adaptive feature fusion method based on feature dimension reduction and dictionary learning is proposed to improve the anomaly data detection accuracy of the model. To verify the effectiveness of the proposed method, we conducted comparisons through ablation experiments. The experimental results show that, compared with traditional anomaly detection methods, the proposed method not only has an advantage in model accuracy, but also reduces the amount of parameter calculation in the process of feature matching and improves the detection speed.
Keywords: data alignment; dimension reduction; feature fusion; data anomaly detection; deep learning
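Implicit Feature Map-Guided Text-to-Image Generation: Three-Way Attention Fusion
11
Authors: 马栋林, 马晓珍, 赵宏. 《计算机技术与发展》, 2026, No. 2, pp. 101-108, 8 pages
To address the error accumulation caused by explicit intermediate images in multi-stage generative models, a two-stage decoupled framework based on implicit feature maps and three-way attention fusion is proposed. In the implicit feature map generation stage, recursive attention iteratively generates 64×64 non-visualized feature maps that replace the explicit intermediate images of models such as AttnGAN, effectively avoiding the propagation of multi-stage visualization errors. In the three-way attention enhancement stage, Triplet Attention is extended into a channel-spatial-text interaction mechanism to achieve pixel-level fine-grained control. Experiments show that on the CUB dataset, which has the highest fine-grained requirements, the CLIP score reaches 0.82, better than the baseline AttnGAN (0.62), MediaPipe (0.71), and other models, a 32.3% improvement over the baseline; the IS score rises to 5.05, a 15.8% improvement over the baseline; and inference is 20% faster than multi-stage generation methods such as StackGAN++. On an NVIDIA RTX 4090 (24 GB of memory), the generation time for a single image drops to 0.96 s. The framework effectively resolves the problems of multi-stage error propagation and fine-grained alignment.
Keywords: text-to-image generation; three-way attention; implicit feature map; fine-grained alignment; two-stage generation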
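Research on an Attention Region Optimization Method with Dual Filtering and Dynamic Completion for Image-Text Cross-Modal Retrieval
12
Authors: 孟凡奇, 田凯迪, 田研. 《现代信息科技》, 2026, No. 1, pp. 41-46, 6 pages
Current image-text cross-modal retrieval has two main bottlenecks: traditional attention mechanisms often include many redundant regions, introducing irrelevant semantic noise, while excessive filtering leaves too few effective regions, losing key visual information. Both cases significantly reduce the matching accuracy and robustness of the model. To address this, a dual optimization strategy is proposed. First, a dual filtering mechanism adaptively retains high-response regions, effectively suppressing redundant noise. In addition, a Top-K dynamic completion method is introduced that automatically supplements key semantic regions when insufficient features are detected. Experiments show that the method preserves feature selection precision while avoiding the loss of key information, significantly improving cross-modal matching performance in complex scenes.
Keywords: cross-modal retrieval; image-text retrieval; feature alignment; threshold filtering; attention optimization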
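Multimodal Gait Recognition Based on SMPL Modality Decomposition and Embedding Fusion
13
Authors: 吴越, 梁铮, 高巍, 杨茂达, 赵培森, 邓红霞, 常媛媛. 《浙江大学学报(工学版)》 (PKU Core), 2026, No. 1, pp. 52-60, 9 pages
To address the limited recognition performance in real-world scenes caused by insufficient mining of gait information and inadequate cross-modal feature alignment in existing gait recognition research, a multimodal gait recognition method based on skinned multi-person linear (SMPL) modality decomposition and embedding fusion is proposed. The SMPL model is decomposed into a shape branch and a pose branch to comprehensively extract static body-shape features and dynamic motion features. An adaptive frame-joint attention module is built to adaptively focus on key frames and important joints, strengthening the representation of pose features. A modality embedding fusion module is designed to project features of different modalities into a unified semantic space, and a modality consistency loss function is constructed to optimize cross-modal feature alignment and improve fusion. Experimental results on the Gait3D dataset show that, compared with 6 silhouette-based methods, 2 skeleton-based methods, and 5 multimodal methods based on silhouettes and skeletons or SMPL models, the proposed method achieves a Rank-1 accuracy of 70.4% and shows higher robustness in complex real-world scenes, verifying its effectiveness in modality feature extraction and cross-modal feature alignment.
Keywords: gait recognition; SMPL model; adaptive attention; feature alignment; modality fusion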
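Analysis of Animal Models of Retinitis Pigmentosa Based on the Clinical Disease and Syndrome Characteristics of Traditional Chinese and Western Medicine
14
Authors: 李晓宇, 梁丽娜, 陈结凤, 朱晓晓, 齐依娜. 《中国实验方剂学杂志》 (PKU Core), 2026, No. 3, pp. 198-203, 6 pages
Retinitis pigmentosa (RP) is the most common hereditary blinding eye disease in clinical practice. Patients show progressive apoptosis of retinal photoreceptor cells accompanied by degeneration of retinal pigment epithelium (RPE) cells, and the pathogenesis remains unclear. Current Western medical treatment relies mainly on gene therapy, stem cell transplantation, and similar methods, but efficacy is limited, whereas traditional Chinese medicine (TCM) has shown some efficacy in clinical observation. Establishing RP animal models that match the disease and syndrome characteristics of both TCM and Western medicine would help combine the strengths of both and broaden RP treatment options. This study reviews and summarizes the classification, types, inheritance patterns, and clinical concordance of existing RP animal models. It finds that current RP models come mainly from natural animal models such as RD mice and RCS rats; transgenic animal models such as RPE-65 knockout mice and rhodopsin knockout mice; and chemical induction models such as monochromatic light irradiation and N-ethyl-N-nitrosourea (ENU). These three categories of models focus mainly on histopathological, molecular biological, and cellular immunological indicators of RP, offer limited observation of disease characteristics, and largely lack observation of TCM syndromes. Although RP is a congenital hereditary disease, its progression is still influenced by acquired factors such as environment, constitution, emotions, and care, and existing models fail to capture the disease characteristics fully. Establishing RP animal models based on the disease and syndrome characteristics of TCM and Western medicine will therefore benefit future experimental and clinical research.
Keywords: retinitis pigmentosa; disease and syndrome characteristics of TCM and Western medicine; animal model; clinical concordance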
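Multimodal Visual Question Answering Based on Fine-Grained Feature Enhancement
15
Authors: 王志伟, 陆振宇. 《南京信息工程大学学报》 (PKU Core), 2026, No. 1, pp. 35-47, 13 pages
Existing multimodal visual question answering (VQA) models ignore the fine-grained interaction between locally salient information in images and locally essential words in text, and the semantic correlation between image and text needs improvement. This paper therefore proposes a multimodal VQA method based on fine-grained feature enhancement. First, a fine-grained feature extraction method is added for both vision and text to extract the semantic features of images and questions more completely and accurately. Then, to exploit alignment information across modalities at different levels, an alignment-guided self-attention module is proposed to align the correspondence between fine-grained features and global semantic features within a single modality (vision or text) and to fuse unimodal information from different levels in a unified way. Finally, experiments on the VQA v2.0 and VQA-CP v2 datasets show that the proposed method outperforms existing models on all VQA evaluation metrics.
Keywords: visual question answering; multimodal; fine-grained; feature enhancement; entity alignment; feature fusion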
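A Text-to-3D Shape Generation Method Based on CLIP and Dynamic Semantic Optimization
16
Authors: 袁康, 王旭智, 万旺根, 孙学涛, 张振. 《工业控制计算机》, 2026, No. 1, pp. 47-48, 54, 3 pages
Text-to-3D shape generation offers a promising natural-language interaction mode for fields such as virtual reality, 3D printing, and animation design. However, because of the significant modality gap between text and 3D shapes, and the challenges of semantic consistency and diversity in high-quality 3D shape generation, current methods often struggle to balance generation quality and text consistency. A text-to-3D shape generation method based on CLIP and dynamic semantic optimization is proposed. The method builds a dynamic semantic optimization module that decomposes and adjusts the semantic weights of text features in real time so that the generated 3D shapes better match the input text description. Embedding dynamic semantic optimization into an existing two-stage feature-space alignment framework significantly improves the accuracy and quality of text-to-3D shape generation. Experimental results show that, compared with existing methods, the method improves generation quality and consistency.
Keywords: text-to-3D shape generation; CLIP; dynamic semantic optimization; feature-space alignment; generation consistency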
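An Entity Alignment Method Based on Embedding Features and Sparse Matrices
17
Authors: 冯超文, 耿程晨, 刘英莉. 《浙江大学学报(工学版)》 (PKU Core), 2026, No. 2, pp. 379-387, 454, 10 pages
Entity alignment for multilingual knowledge fusion faces the challenges of insufficiently fine-grained feature modeling and limited use of structural information. To address this, an entity alignment method that fuses multi-level embedding features with a sparse matrix propagation mechanism is proposed. Character features, word-vector features, and neighborhood relation features are combined to build a unified multi-dimensional entity representation, strengthening local semantic expression and structural association modeling for entities. A sparse adjacency matrix is built from relation embeddings and combined with a feature-normalized propagation mechanism to achieve stable expansion and effective transmission of information in the knowledge graph. To further improve the global consistency of entity matching, Sinkhorn regularization is introduced to optimize the similarity matrix, and the Hungarian algorithm is used to perform the optimal entity alignment. The proposed method performs stably on the hit rate and mean reciprocal rank metrics across multiple cross-lingual knowledge graph datasets and is more competitive than representative methods such as SNGA and EAMI. These results verify the accuracy and robustness of the proposed method.
Keywords: knowledge graph; entity alignment; multi-level feature modeling; sparse matrix propagation; Sinkhorn regularization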
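The final matching step named in this abstract, Sinkhorn normalization of a similarity matrix followed by an optimal one-to-one assignment, can be illustrated on a toy example. This is a sketch under stated assumptions rather than the paper's implementation: the temperature, the iteration count, and the 3×3 matrix are invented, and the optimal assignment is found by brute force over permutations, which is only feasible for tiny inputs. The Hungarian algorithm returns the same optimum in polynomial time (for example via `scipy.optimize.linear_sum_assignment`).

```python
import itertools
import math


def sinkhorn(sim, n_iters=20, temperature=0.1):
    """Alternately normalize rows and columns of exp(sim / temperature) so the
    square matrix approaches a doubly stochastic one."""
    n = len(sim)
    mat = [[math.exp(s / temperature) for s in row] for row in sim]
    for _ in range(n_iters):
        mat = [[x / sum(row) for x in row] for row in mat]  # row step
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat


def best_assignment(score):
    """Exact one-to-one assignment maximizing the total score.

    Brute force stand-in for the Hungarian algorithm (tiny inputs only).
    Returns a tuple p where source entity i is matched to target p[i].
    """
    n = len(score)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(score[i][perm[i]] for i in range(n)),
    )


# Hypothetical similarities: rows 0 and 1 both prefer target 0, so a greedy
# per-row argmax would produce a conflicting, non one-to-one match.
similarity = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
alignment = best_assignment(sinkhorn(similarity))
print(alignment)
```

The design point is that a per-row argmax ignores competition between source entities for the same target, whereas the normalized matrix plus a global assignment resolves such conflicts consistently.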
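A Rotation-Adaptive Deep Matching Method for SAR and Visible-Light Images
18
Authors: 周双倩, 杨子骏, 马蔼彤, 周慧婕, 牛轶峰. 《海军航空大学学报》, 2026, No. 1, pp. 197-206, 10 pages
To solve the feature mismatch caused by differences in imaging mechanisms and viewpoint changes in cross-modal matching, an adaptive matching method for heterogeneous SAR and visible-light images is proposed. To avoid hard-to-design handcrafted features and costly data and annotation, the LightGlue deep matcher is transferred to the SAR-visible matching scenario for the first time. To handle cross-modal differences in feature representation, a cascaded feature-alignment preprocessing pipeline is designed that improves feature consistency through non-local means denoising, Gaussian filtering, and CLAHE enhancement in Lab space. To handle cross-view rotation, a two-stage dynamic step-size search algorithm based on multi-metric constraints is proposed, using a dynamic exponential decay strategy to optimize angular precision and a two-stage parallel strategy to optimize computational efficiency. Experiments on high-resolution airborne images show that the method achieves a matching success rate of 84.21% and an angle estimation accuracy of ±1.875°, and reduces matching time to 15.33 s, providing a new technical paradigm for heterogeneous image registration.
Keywords: SAR-visible matching; cross-modal feature alignment; dynamic angle search; deep matching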
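The abstract does not give the step schedule or the matching metrics behind its two-stage rotation search, so the following is only a generic coarse-to-fine sketch of the idea: scan the full circle at a coarse step, then refine around the best angle while the step decays exponentially. The 30° starting step, the halving factor, the 1.875° floor, and the synthetic `score` function are all assumptions chosen for illustration.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def coarse_to_fine_angle_search(score, coarse_step=30.0, min_step=1.875, decay=0.5):
    """Return the angle (degrees) that maximizes `score`.

    Stage 1 scans 0..360 at `coarse_step`. Stage 2 repeatedly compares the
    current best with its two neighbours at a step that shrinks by `decay`
    each round, stopping once the step would fall below `min_step`.
    """
    n_coarse = int(360.0 / coarse_step)
    best = max((i * coarse_step for i in range(n_coarse)), key=score)
    step = coarse_step * decay
    while step >= min_step:
        candidates = [(best - step) % 360.0, best, (best + step) % 360.0]
        best = max(candidates, key=score)
        step *= decay
    return best


# Synthetic stand-in for a match-quality score that peaks at 47 degrees.
# In practice this would be something like the number of verified matches
# after rotating one image by the candidate angle.
def toy_score(angle, true_rotation=47.0):
    return -angular_distance(angle, true_rotation)


estimate = coarse_to_fine_angle_search(toy_score)
print(estimate)
```

With these assumed settings the search evaluates 12 coarse angles plus two new candidates in each of 4 refinement rounds, far fewer than a uniform scan at the finest step would need, which is the efficiency argument behind a decaying step size.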
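Dual-Perspective Multi-Level Cross-Modal Recommendation Based on Large Language Models
19
Authors: 李佚名, 于亚新, 于之晟, 司一廷, 叶育松. 《计算机研究与发展》 (PKU Core), 2026, No. 1, pp. 147-161, 15 pages
Multimodal recommender systems aim to provide more accurate and personalized recommendation services. However, existing research still has the following problems: 1) Feature distortion. Because input embeddings are all processed by models such as small pre-trained language models and deep convolutional neural networks, the resulting feature representations are inaccurate. 2) Single encoding perspective. The multimodal encoding layers of current models encode from only a single perspective, either memorization or expansion, causing information loss. 3) Poor multimodal alignment. Embeddings of different modalities lie in different spaces and must be mapped into a common space to be aligned, but existing methods rely on a simple product of behavioral information, which cannot capture the complex relationships between modalities, so the modalities cannot be aligned precisely. To address these problems, a novel model, DPRec, is proposed. The model considers dual-perspective encoding of both memorization and expansion and introduces hypergraphs for precise multi-level cross-modal alignment. Extensive experiments on 3 real-world datasets verify the effectiveness of the proposed model.
Keywords: multimodal recommendation; feature representation; encoding perspective; hypergraph; cross-modal alignment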
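SMWRec: A Multimodal Personalized We-Map Recommendation Method Driven by Self-Supervised Contrastive Learning
20
Authors: 马文骏, 闫浩文, 李精忠, 王小龙, 王卓, 余懿韬. 《地球信息科学学报》 (PKU Core), 2026, No. 1, pp. 105-119, 15 pages
[Objective] Existing we-map recommender systems rely mainly on historical interactions between users and content and ignore the collaborative features among multimodal information such as images and text, leading to weak expressiveness and poor modality fusion in user preference modeling and content understanding. [Methods] To address this problem, this paper proposes SMWRec, a multimodal personalized recommendation framework that incorporates a self-supervision mechanism. The method uses a graph neural network as its backbone and jointly builds a main supervised task and three kinds of self-supervised contrastive learning tasks. At the feature level, two modality-agnostic data augmentation strategies, random feature dropout and feature masking, are designed to strengthen robustness to incomplete and perturbed information. At the modality level, a modality alignment mechanism is introduced that imposes a consistency constraint on the image-text semantic space before fusion. By maximizing representation consistency between different views of the same item and minimizing interference between different items, the method effectively improves cross-modal coordination and discriminative power. [Results] Experiments were conducted on four multimodal datasets containing image and text information (Movielens, TikTok, Kwai, and Wemaps), with Recall@K and NDCG@K as evaluation metrics. The results show that SMWRec generally outperforms strong baselines on all four datasets; on Wemaps, Recall@10 and NDCG@10 improve over the best baseline by 31.48% and 33.86%, respectively. [Conclusions] Ablation and missing-modality experiments show that "align before fusing" and feature augmentation are the main sources of the performance gain and that ranking quality remains high when some modalities are missing. Overall, SMWRec effectively mitigates the representation degradation caused by sparsity and missing data, combines accuracy with robustness, and provides a reproducible, extensible multimodal paradigm for we-map recommendation.
Keywords: we-map recommendation; multimodal; graph neural network; self-supervised learning; contrastive learning; random feature dropout; feature masking; modality alignment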
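Recall@K and NDCG@K, the two metrics this abstract reports, have standard definitions for top-K recommendation. The sketch below uses binary relevance (an item is either in the user's held-out set or not); the function names and the sample ranking are illustrative assumptions, and the paper's exact evaluation protocol is not described in the abstract.

```python
import math


def recall_at_k(ranked_items, relevant, k):
    """Share of the user's relevant items that appear in the top k."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG: discounted gain of the top k divided by the
    gain of an ideal ranking that places all relevant items first."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg


# Hypothetical ranking for one user whose held-out items are "a" and "c".
ranking = ["a", "b", "c", "d"]
held_out = {"a", "c"}
print(recall_at_k(ranking, held_out, 2), ndcg_at_k(ranking, held_out, 2))
```

Recall@K only counts whether relevant items make the cut, while NDCG@K also rewards placing them nearer the top, which is why the two are usually reported together.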