Human-Object Interaction Detection Based on Multi-Dimensional Frequency-Domain Feature Fusion
1
Authors: 樊跃波, 陈明轩, 汤显, 高永彬, 李文超. 《计算机应用》 (Journal of Computer Applications), PKU Core, 2026, No. 2, pp. 580-586 (7 pages)
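Human-object interaction (HOI) detection aims to detect all interactions between humans and objects in an image. Most current work trains an encoder-decoder architecture end to end, relies on absolute position encoding (APE), and performs poorly in complex multi-object interaction scenes. Two problems are targeted: reliance on APE makes it hard to capture the relative spatial relationships between humans and objects, and local and global information are insufficiently integrated in complex multi-object scenes. To address them, an HOI detection model that combines cross-dimensional interaction feature extraction with frequency-domain feature fusion is proposed. First, the conventional Transformer encoder is improved by additionally introducing a relative position encoding (RPE); fusing RPE with APE lets the model represent the relative relationships between humans and objects. Second, a new feature extraction module is introduced to strengthen the integration of image information: cross-dimensional interaction captures interaction features across the channel, spatial, and feature dimensions of the image to improve representational power, while the discrete cosine transform (DCT) extracts frequency-domain features, capturing richer local and global information. Finally, the Wise-IoU loss function is used to improve detection accuracy and class discrimination, so that the model handles targets of different classes more flexibly. Experiments on two public datasets, HICO-DET and V-COCO, show that, compared with the GEN-VLKT (Guided Embedding Network Visual-Linguistic Knowledge Transfer) model, the proposed model improves mean average precision (mAP) over all categories of HICO-DET by 0.95 percentage points and AP on Scenario 1 of V-COCO by 0.9 percentage points.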
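Keywords: human-object interaction detection; object detection; relative position encoding; frequency-domain features; discrete cosine transform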
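The abstract above says DCT is used to extract frequency-domain features that carry both global and local information. As a rough illustration of that idea only (not the paper's module), the sketch below computes an unnormalized 1-D DCT-II in plain Python. The function name `dct2` and the toy inputs are assumptions made for this example.

```python
import math

def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(signal)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]

# Toy inputs (hypothetical): a constant row has only a global (DC) component,
# while an alternating row has no DC component and peaks at the highest frequency.
flat = [1.0, 1.0, 1.0, 1.0]
edge = [1.0, -1.0, 1.0, -1.0]

flat_coeffs = dct2(flat)
edge_coeffs = dct2(edge)
```

Low-index coefficients summarize slowly varying (global) structure and high-index coefficients summarize rapid (local) variation, which is one way to read the abstract's "local and global information".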
PCATNet: Position-Class Awareness Transformer for Image Captioning
2
Authors: Ziwei Tang, Yaohua Yi, Changhui Yu, Aiguo Yin. Computers, Materials & Continua, SCIE, EI, 2023, No. 6, pp. 6007-6022 (16 pages)
Existing image captioning models usually generate captions by relating visual information to words, and so lack spatial information and object classes. To address this issue, we propose a novel Position-Class Awareness Transformer (PCAT) network, which serves as a bridge between visual features and captions by embedding spatial information and awareness of object classes. We construct the PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE maps the regions of objects to grids, calculates the relative distance among objects, and quantizes it. We also adapt self-attention to GMPE. Then, we propose a Classes Semantic Quantization strategy to extract semantic information from the object classes, which is used to facilitate embedding features and to refine the encoder-decoder framework. To capture the interaction between multi-modal features, we propose Object Classes Awareness (OCA) to refine the encoder and the decoder, named OCAE and OCAD respectively. Finally, we combine GMPE, OCAE and OCAD in various ways to complete the entire PCAT. We evaluate the method on the MSCOCO dataset. The results demonstrate that PCAT outperforms other competitive methods.
Keywords: image captioning; relative position encoding; object classes awareness
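The abstract above describes GMPE as three steps: map object regions to grids, compute relative distance among objects, and quantize. The sketch below is one possible reading of those steps, not the authors' implementation. The use of box centers, the grid size, and the function names `box_to_grid` and `quantized_relative_position` are all assumptions made for illustration.

```python
def box_to_grid(box, image_size, grid_size):
    """Map a box (x1, y1, x2, y2) to the (row, col) grid cell holding its center."""
    x1, y1, x2, y2 = box
    width, height = image_size
    center_x = (x1 + x2) / 2
    center_y = (y1 + y2) / 2
    # Clamp so that a center on the right or bottom edge stays inside the grid.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col

def quantized_relative_position(box_a, box_b, image_size, grid_size):
    """Return one integer index for the grid offset from box_a to box_b.

    Offsets lie in [-(G-1), G-1] per axis, so the index lies in [0, (2G-1)**2)
    and could select a row of a learned embedding table (an assumption).
    """
    row_a, col_a = box_to_grid(box_a, image_size, grid_size)
    row_b, col_b = box_to_grid(box_b, image_size, grid_size)
    shift = grid_size - 1
    return (row_b - row_a + shift) * (2 * grid_size - 1) + (col_b - col_a + shift)

# Hypothetical 100x100 image with a 4x4 grid.
top_left = (0, 0, 20, 20)
bottom_right = (80, 80, 100, 100)
index_far = quantized_relative_position(top_left, bottom_right, (100, 100), 4)
index_same = quantized_relative_position(top_left, top_left, (100, 100), 4)
```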
Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences (cited by: 2)
3
Authors: Le QI, Yu ZHANG, Ting LIU. Frontiers of Computer Science, SCIE, EI, CSCD, 2023, No. 1, pp. 63-71 (9 pages)
Transformers have been widely studied in many natural language processing (NLP) tasks. They capture dependencies across the whole sentence with high parallelizability, thanks to multi-head attention and the position-wise feed-forward network. However, these two components are position-independent, which makes transformers weak at modeling sentence structure. Existing studies commonly use positional encoding or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of transformers to model the linear structure of sentences in three respects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding with the mask strategy. We model the relative distance between tokens together with the absolute position of tokens through a novel absolute-position aware relative position encoding. Meanwhile, we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification and machine translation tasks show that BiAR-Transformer outperforms other strong baselines.
Keywords: Transformer; relative position encoding; bidirectional mask strategy; sentence encoder
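The abstract above names two ingredients: the relative distance between tokens and a bidirectional mask that encodes direction. The sketch below shows a generic form of both, not the BiAR-Transformer formulation itself. Clipping the distance at `max_distance` and the function names are assumptions made for this example.

```python
def clipped_relative_distances(length, max_distance):
    """Matrix of |i - j| between token positions, clipped to max_distance (assumed)."""
    return [[min(abs(i - j), max_distance) for j in range(length)]
            for i in range(length)]

def direction_masks(length):
    """Return (forward, backward) boolean masks; True means attention is allowed.

    forward[i][j] lets token i attend to itself and to earlier tokens;
    backward[i][j] lets token i attend to itself and to later tokens.
    """
    forward = [[j <= i for j in range(length)] for i in range(length)]
    backward = [[j >= i for j in range(length)] for i in range(length)]
    return forward, backward

distances = clipped_relative_distances(4, max_distance=2)
forward_mask, backward_mask = direction_masks(4)
```

Running attention once under each mask gives two views of the same sentence, one per direction, which is the general idea behind a bidirectional mask strategy.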
A Relation Extraction Method Based on Position Encoding and Entity Interaction Information (cited by: 2)
4
Authors: 厉晓妍, 张德平. 《计算机系统应用》 (Computer Systems & Applications), 2022, No. 6, pp. 238-244 (7 pages)
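Relation extraction is an important research topic in information extraction. Its main goal is to extract the semantic relation between a pair of marked entities in a sentence, and it plays an important role in sentence understanding and knowledge base construction. Existing extraction methods fail to make full use of word position information and of the interaction information between entities, which causes important features to be lost. To address this problem, this work proposes a relation extraction method based on position encoding and entity interaction information (BPI-BERT). First, a new position encoding is fused into the word vectors generated by the BERT pre-trained language model, and average pooling is then applied to obtain entity and sentence vectors. Next, the Hadamard product is used to construct entity interaction information. Finally, the entity vectors, the sentence vector, and the interaction vector are concatenated into a relation vector, which is fed to a Softmax classifier for relation classification. Experimental results show that BPI-BERT improves precision and F1 over existing methods, demonstrating its effectiveness.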
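Keywords: position encoding; entity interaction; pre-trained language model; relation extraction; supervised learning; deep learning; feature fusion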
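The pipeline in the abstract above (average pooling, Hadamard product, concatenation, Softmax) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the BPI-BERT code. The toy token vectors stand in for BERT outputs, and the entity spans, concatenation order, and function names are hypothetical.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors element by element."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relation_vector(token_vectors, head_span, tail_span):
    """Concatenate entity, sentence, and interaction vectors (order is assumed)."""
    sentence = mean_pool(token_vectors)
    head = mean_pool(token_vectors[head_span[0]:head_span[1]])
    tail = mean_pool(token_vectors[tail_span[0]:tail_span[1]])
    interaction = hadamard(head, tail)
    return head + tail + sentence + interaction

# Toy 2-dimensional "word vectors" for a 3-token sentence (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
relation = relation_vector(tokens, head_span=(0, 1), tail_span=(1, 3))
probabilities = softmax([0.0, 0.0])  # placeholder scores for two relation classes
```

In a full model, a learned linear layer would map `relation` to one score per relation class before the softmax; that layer is omitted here.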