Journal Articles
4 articles found
1. UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model
Authors: Jiakang Sun, Ke Chen, Xinyang He, Xu Liu, Ke Li, Cheng Peng. Computers, Materials & Continua, 2025, No. 4, pp. 219-238 (20 pages)
With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks demonstrate that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters. Additionally, it achieves superior performance compared to fully fine-tuned models on certain benchmarks. (An illustrative sketch of the vector-based adaptation idea follows this record.)
Keywords: parameter-efficient transfer learning; multimodal alignment; image captioning; image-text retrieval; visual question answering
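The abstract does not spell out how Vector-based Cross-modal Random Matrix Adaptation is implemented, so the following PyTorch sketch only illustrates the general idea behind vector-based random matrix adaptation (in the spirit of VeRA): a frozen pair of random projection matrices that can be shared across layers, with two small trainable scaling vectors per adapted layer. The class and parameter names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class VectorRandomMatrixAdapter(nn.Module):
    """Illustrative VeRA-style adapter: frozen random matrices A and B;
    only the scaling vectors d and b are trained (names are assumptions)."""

    def __init__(self, base_linear: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pre-trained weight frozen

        in_dim, out_dim = base_linear.in_features, base_linear.out_features
        # Frozen random projections (in VeRA these are shared across layers).
        self.register_buffer("A", torch.randn(rank, in_dim) / in_dim ** 0.5)
        self.register_buffer("B", torch.randn(out_dim, rank) / rank ** 0.5)
        # Trainable scaling vectors: the only new parameters (rank + out_dim values).
        self.d = nn.Parameter(torch.ones(rank))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + b * (B (d * (A x)))  -- low-rank update controlled by d and b
        delta = (x @ self.A.t()) * self.d      # (batch, rank)
        delta = delta @ self.B.t() * self.b    # (batch, out_dim)
        return self.base(x) + delta

# Example: wrap one projection of a frozen backbone with the adapter.
layer = VectorRandomMatrixAdapter(nn.Linear(768, 768), rank=16)
```

Wrapping, for example, the attention projections of a frozen multimodal backbone in this way keeps the number of trainable values per layer in the low thousands, which is the extreme low-parameter regime the abstract targets.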
2. Video action recognition meets vision-language models exploring human factors in scene interaction: a review
Authors: GUO Yuping, GAO Hongwei, YU Jiahui, GE Jinchao, HAN Meng, JU Zhaojie. Optoelectronics Letters, 2025, No. 10, pp. 626-640 (15 pages)
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level approaches based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of "Factor" to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions. (A sketch of the basic VLM-based recognition pattern the review surveys follows this record.)
Keywords: human factors; video action recognition; vision-language models; spatiotemporal granularity; multimodal alignment; scene interaction
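The review is a survey rather than a single algorithm, but the basic pattern it discusses, scoring video content against natural-language action descriptions with a vision-language model, can be sketched with an off-the-shelf CLIP checkpoint from Hugging Face Transformers. The prompt template and the mean-pooling of frame features below are simplifying assumptions for illustration only, not the methods reviewed in the paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_action(frames, action_labels):
    """Score a list of PIL frames against textual action descriptions.

    Frame features are mean-pooled into one clip-level embedding, then
    matched against prompted label embeddings by cosine similarity.
    """
    # Prompt template is an assumption, not from the reviewed literature.
    prompts = [f"a video of a person {a}" for a in action_labels]
    inputs = processor(text=prompts, images=frames, return_tensors="pt", padding=True)

    with torch.no_grad():
        img_feats = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_feats = model.get_text_features(input_ids=inputs["input_ids"],
                                            attention_mask=inputs["attention_mask"])

    video_feat = img_feats.mean(dim=0, keepdim=True)                # naive temporal pooling
    video_feat = video_feat / video_feat.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    scores = (video_feat @ txt_feats.t()).squeeze(0)                # cosine similarity per label
    return action_labels[scores.argmax().item()], scores
```

The mean-pooling step is exactly the kind of coarse temporal treatment that motion-, event-, and story-level methods in the review try to improve on.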
3. Sep-NMS: Unlocking the Aptitude of Two-Stage Referring Expression Comprehension
Authors: Jing Wang, Zhikang Wang, Xiaojie Wang, Fangxiang Feng, Bo Yang. CAAI Transactions on Intelligence Technology, 2025, No. 4, pp. 1049-1061 (13 pages)
Referring expression comprehension (REC) aims to locate a specific region in an image described by a natural language expression. Existing two-stage methods generate multiple candidate proposals in the first stage, then select one of these proposals as the grounding result in the second stage. Nevertheless, the number of candidate proposals generated in the first stage significantly exceeds the number of ground-truth objects, and the recall of critical objects is inadequate, which greatly limits overall network performance. To address these issues, the authors propose an innovative method termed Separate Non-Maximum Suppression (Sep-NMS) for two-stage REC. Sep-NMS models information from the two stages independently and collaboratively, improving comprehension and identification of the target objects. Specifically, the authors propose a Ref-Relatedness module that rigorously filters referent proposals, decreasing their redundancy. A CLIP-Relatedness module built on robust multimodal pre-trained encoders precisely assesses the relevance between the language expression and the proposals to improve the recall of critical objects. Notably, the authors are the first to use a multimodal pre-training model for proposal filtering in the first stage. Moreover, an Information Fusion module is designed to effectively amalgamate the multimodal information across the two stages, ensuring maximum utilisation of the available information. Extensive experiments demonstrate that the approach achieves competitive performance with previous state-of-the-art methods. The datasets used are publicly available: RefCOCO and RefCOCO+ (https://doi.org/10.1007/978-3-319-46475-6_5) and RefCOCOg (https://doi.org/10.1109/CVPR.2016.9). (A sketch of the greedy non-maximum suppression primitive underlying this pipeline follows this record.)
Keywords: candidate proposal generation; multimodal alignment; non-maximum suppression; object identification; referring expression comprehension
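Sep-NMS separates detector-side filtering (Ref-Relatedness) from language-conditioned scoring (CLIP-Relatedness); those modules are paper-specific and not reproduced here. As background, the sketch below shows the greedy IoU-based non-maximum suppression primitive that such proposal filtering builds on; the threshold value and the (x1, y1, x2, y2) box format are conventional assumptions.

```python
import torch

def iou(boxes: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
    """IoU between a set of boxes (N, 4) and one box (4,), boxes as (x1, y1, x2, y2)."""
    x1 = torch.maximum(boxes[:, 0], box[0])
    y1 = torch.maximum(boxes[:, 1], box[1])
    x2 = torch.minimum(boxes[:, 2], box[2])
    y2 = torch.minimum(boxes[:, 3], box[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_a + area_b - inter + 1e-6)

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.5):
    """Greedy non-maximum suppression; returns indices of kept proposals."""
    order = scores.argsort(descending=True)   # highest-scoring proposal first
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        overlaps = iou(boxes[rest], boxes[i])
        order = rest[overlaps <= iou_thresh]  # drop proposals overlapping the kept one
    return keep
```

In a Sep-NMS-style pipeline, the `scores` fed into suppression would come from separate detector-confidence and language-relatedness branches rather than a single fused score; that separation is the paper's contribution and is only gestured at here.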
4. An aligned mixture probabilistic principal component analysis for fault detection of multimode chemical processes (cited 5 times)
Authors: 杨雅伟, 马玉鑫, 宋冰, 侍洪波. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2015, No. 8, pp. 1357-1363 (7 pages)
A novel approach named aligned mixture probabilistic principal component analysis (AMPPCA) is proposed in this study for fault detection of multimode chemical processes. In order to exploit within-mode correlations, the AMPPCA algorithm first estimates a statistical description for each operating mode by applying mixture probabilistic principal component analysis (MPPCA). As a comparison, a combined MPPCA is employed, in which monitoring results are softly integrated according to the posterior probabilities of the test sample in each local model. To exploit cross-mode correlations, which may be useful but are inadvertently neglected by separately maintained monitoring models, a global monitoring model is constructed by aligning all local models together. In this way, both within-mode and cross-mode correlations are preserved in the integrated space. Finally, the utility and feasibility of AMPPCA are demonstrated on a non-isothermal continuous stirred tank reactor and the TE benchmark process. (A rough sketch of per-mode mixture monitoring follows this record.)
Keywords: multimode process monitoring; mixture probabilistic principal component analysis; model alignment; fault detection
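The alignment step that distinguishes AMPPCA is not detailed in the abstract, so the sketch below only illustrates the combined-MPPCA style baseline it is compared against: fit a local model per operating mode, compute a Hotelling T^2 statistic in each, and blend the statistics with the posterior mode probabilities of the test samples. Standard PCA stands in for probabilistic PCA, and the control limits are simple empirical quantiles; both are simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

class MixturePCAMonitor:
    """Per-mode PCA monitoring with posterior-weighted T^2, as a rough
    stand-in for the combined-MPPCA baseline described in the abstract."""

    def __init__(self, n_modes: int = 2, n_components: int = 3):
        self.gmm = GaussianMixture(n_components=n_modes, random_state=0)
        self.n_components = n_components
        self.pcas, self.limits = [], []

    def fit(self, X: np.ndarray, quantile: float = 0.99):
        modes = self.gmm.fit_predict(X)                    # assign training samples to modes
        for k in range(self.gmm.n_components):
            Xk = X[modes == k]
            pca = PCA(n_components=self.n_components).fit(Xk)
            scores = pca.transform(Xk)
            t2 = np.sum(scores ** 2 / pca.explained_variance_, axis=1)  # Hotelling T^2
            self.pcas.append(pca)
            self.limits.append(np.quantile(t2, quantile))  # empirical control limit
        return self

    def t2(self, X: np.ndarray) -> np.ndarray:
        """Posterior-weighted, limit-normalised T^2; values > 1 flag a fault."""
        post = self.gmm.predict_proba(X)                   # responsibility of each mode
        stats = np.zeros(len(X))
        for k, (pca, lim) in enumerate(zip(self.pcas, self.limits)):
            scores = pca.transform(X)
            t2_k = np.sum(scores ** 2 / pca.explained_variance_, axis=1)
            stats += post[:, k] * (t2_k / lim)
        return stats
```

AMPPCA goes further by aligning the local models into one global latent space so that cross-mode correlations are retained; that alignment is not captured by this sketch.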