Abstract: With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks demonstrate that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters. Additionally, it achieves superior performance compared to fully fine-tuned models on certain benchmarks.
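The abstract names Vector-based Cross-modal Random Matrix Adaptation without detailing it. As a hedged illustration (not UniTrans's actual implementation), the general recipe shared by random-matrix methods such as VeRA is to freeze a pair of random projection matrices and train only small per-dimension scaling vectors; all shapes below are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in))      # frozen random projection (shared, never trained)
B = rng.normal(size=(d_out, r))     # frozen random projection (shared, never trained)
b = np.zeros(r)                     # trainable scaling vector
d = np.zeros(d_out)                 # trainable scaling vector

def adapted_forward(x):
    # y = W x + diag(d) B diag(b) A x ; only b and d are updated during fine-tuning
    return W @ x + d * (B @ (b * (A @ x)))

x = rng.normal(size=d_in)
# With b and d zero-initialized, the adapted output equals the frozen output.
assert np.allclose(adapted_forward(x), W @ x)

trainable = b.size + d.size   # 20 parameters
frozen = W.size + A.size + B.size  # 384 parameters
```

Because only the two vectors train, the per-layer overhead scales with the layer width rather than with the product of its dimensions, which is how such methods reach very small trainable-parameter fractions.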
Funding: Supported by the Philosophy and Social Sciences Planning Project of Guangdong Province of China (GD23XGL099), the Guangdong General Universities Young Innovative Talents Project (2023KQNCX247), and the Research Project of Shanwei Institute of Technology (SWKT22-019).
Abstract: Laboratory safety is a critical area of broad societal concern, particularly in the detection of abnormal actions. To enhance the efficiency and accuracy of detecting such actions, this paper introduces a novel method called TubeRAPT (Tubelet Transformer based on Adapter and Prefix Training Module). This method primarily comprises three key components: the TubeR network, an adaptive clustering attention mechanism, and a prefix training module. These components work in synergy to address the challenge of knowledge preservation in models pretrained on large datasets while maintaining training efficiency. The TubeR network serves as the backbone for spatio-temporal feature extraction, while the adaptive clustering attention mechanism refines the focus on relevant information. The prefix training module facilitates efficient fine-tuning and knowledge transfer. Experimental results demonstrate the effectiveness of TubeRAPT, achieving a 68.44% mean Average Precision (mAP) on the CLA (Crazy Lab Activity) small-scale dataset, marking a significant improvement of 1.53% over the previous TubeR method. This research not only showcases the potential applications of TubeRAPT in the field of abnormal action detection but also offers innovative ideas and technical support for the future development of laboratory safety monitoring technologies. The proposed method has implications for improving safety management systems in various laboratory environments, potentially reducing accidents and enhancing overall workplace safety.
Funding: Supported in part by the National Natural Science Foundation of China (Key Program 71931003, 72061147004, 72171206, 72192805, and 42105145), and in part by the Shenzhen Institute of Artificial Intelligence and Robotics for Society.
Abstract: Modern smart grids face significant challenges in short-term load forecasting (STLF) due to increasing complexity across transmission, distribution, and consumer levels. While recent studies have explored large language models for load forecasting, existing methods are limited by computational overhead, voltage-level specificity, and inadequate cross-domain generalization. This paper introduces the Multi-Voltage Load Forecasting Large Model (MVLFLM), a unified Transformer-based framework that addresses multi-voltage STLF through parameter-efficient fine-tuning of a Llama 2-7B foundation model. Unlike previous LLM-based forecasting methods that focus on single voltage levels or require extensive retraining, MVLFLM employs selective layer freezing to preserve pre-trained knowledge while adapting only essential parameters for load pattern recognition. Comprehensive evaluation across four real-world datasets spanning high (transmission), medium (distribution), and low (consumer) voltage levels demonstrates MVLFLM's superior performance, achieving higher performance than benchmark methods. Most significantly, MVLFLM exhibits exceptional zero-shot generalization, with only 9.07% average performance degradation when applied to unseen grid entities, substantially outperforming existing methods. These results establish MVLFLM as a unified, computationally efficient solution for multi-voltage load forecasting that maintains forecasting accuracy while enabling seamless deployment across heterogeneous smart grid infrastructures.
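MVLFLM's selective layer freezing is described only at a high level. A minimal sketch of the general pattern — marking a chosen subset of layers trainable and freezing the rest — might look as follows; the toy parameter names and the choice of which layers to unfreeze are assumptions, not the paper's configuration:

```python
import numpy as np

# Toy stand-in for a transformer: parameter name -> weight array.
# Real Llama 2-7B has 32 layers; the 4 layers and names here are illustrative only.
params = {f"layers.{i}.{part}": np.zeros((8, 8))
          for i in range(4) for part in ("attn.weight", "mlp.weight")}

def select_trainable(params, unfrozen_layers):
    """Freeze everything except parameters in the listed layer indices."""
    return {name: any(name.startswith(f"layers.{i}.") for i in unfrozen_layers)
            for name in params}

# Unfreeze only the last layer; everything else keeps its pre-trained values.
trainable = select_trainable(params, unfrozen_layers={3})
n_train = sum(trainable.values())   # 2 of 8 parameter tensors train
n_total = len(trainable)
```

In a real framework the same selection would be expressed by toggling each tensor's gradient flag (e.g. `requires_grad` in PyTorch) rather than a boolean map, but the budgeting logic is identical.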
Funding: Supported by the National Key R&D Program of China (No. 2021YFB0301200) and the National Natural Science Foundation of China (No. 62025208).
Abstract: Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, shrinking the trainable parameters in a larger layer preserves fine-tuning accuracy better than doing so in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
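The "larger layers tolerate smaller adapter ranks" finding can be made concrete with a parameter count. The sketch below uses illustrative LoRA shapes loosely modeled on a 4096-wide transformer block; the exact dimensions and ranks are assumptions, not the study's settings:

```python
def lora_param_count(d_in, d_out, r):
    # A LoRA adapter on a (d_out x d_in) weight adds A (r x d_in) and B (d_out x r).
    return r * d_in + d_out * r

# Illustrative dimensions: attention projections are square and smaller,
# the MLP projection is wider (larger layer).
attn_shape = (4096, 4096)    # self-attention projection (smaller layer)
mlp_shape = (4096, 11008)    # MLP projection (larger layer)

# Option A follows the finding: keep rank on attention, shrink it on the MLP.
budget_a = lora_param_count(*attn_shape, r=16) + lora_param_count(*mlp_shape, r=4)
# Option B does the opposite: shrink rank on attention, keep it on the MLP.
budget_b = lora_param_count(*attn_shape, r=4) + lora_param_count(*mlp_shape, r=16)

# Same pair of ranks, very different totals: assigning the smaller rank to the
# larger layer yields the smaller overall adapter budget.
assert budget_a < budget_b   # 191,488 vs. 274,432 parameters
```

Since LoRA's cost grows linearly in both rank and layer width, cutting rank where the width is greatest buys the largest savings, which is exactly where the study reports accuracy is most robust to the cut.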
Funding: Supported by the National Key R&D Program of China (No. 2023YFB3308601), the Sichuan Science and Technology Program (2024NSFJQ0035, 2024NSFSC0004), and the Talents Program of the Sichuan Provincial Party Committee Organization Department.
Abstract: In natural language processing (NLP), managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models, leading to practical inefficiencies. To address this challenge, we introduce AdaptForever, a novel approach that enables continuous mastery of NLP tasks through the integration of elastic and mutual learning strategies with a stochastic expert mechanism. Our method freezes the pre-trained model weights while incorporating adapters enhanced with mutual learning capabilities, facilitating effective knowledge transfer from previous tasks to new ones. By combining Elastic Weight Consolidation (EWC) for knowledge preservation with specialized regularization terms, AdaptForever successfully maintains performance on earlier tasks while acquiring new capabilities. Experimental results demonstrate that AdaptForever achieves superior performance across a continuous sequence of NLP tasks compared to existing parameter-efficient methods, while effectively preventing catastrophic forgetting and enabling positive knowledge transfer between tasks.
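EWC, which AdaptForever uses for knowledge preservation, penalizes movement of weights in proportion to their estimated importance (the diagonal of the Fisher information). A minimal numeric sketch, with invented weights and Fisher values:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    # L_EWC = (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])   # weights after the previous task
fisher = np.array([10.0, 0.1, 1.0])      # Fisher diagonal: importance per weight
theta = np.array([1.1, -1.0, 0.5])       # current weights while learning the new task

# Moving the important weight (F=10) by 0.1 costs as much as moving the
# unimportant one (F=0.1) by a full 1.0, so training steers around old knowledge.
pen = ewc_penalty(theta, theta_old, fisher)   # 0.5 * (10*0.01 + 0.1*1.0 + 0) = 0.1
```

In practice this term is added to the new task's loss, so gradient descent trades off new-task fit against disturbing weights that mattered for earlier tasks.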
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62266054), the Major Science and Technology Project of Yunnan Province (Grant No. 202402AD080002), and the Scientific Research Fund of the Yunnan Provincial Department of Education (Grant No. 2025Y0302).
Abstract: End-to-end Temporal Action Detection (TAD) has achieved remarkable progress in recent years, driven by innovations in model architectures and the emergence of Video Foundation Models (VFMs). However, existing TAD methods that perform full fine-tuning of pretrained video models often incur substantial computational costs, which become particularly pronounced when processing long video sequences. Moreover, the need for precise temporal boundary annotations makes data labeling extremely expensive. In low-resource settings where annotated samples are scarce, direct fine-tuning tends to cause overfitting. To address these challenges, we introduce the Dynamic Low-Rank Adapter (DyLoRA), a lightweight fine-tuning framework tailored specifically for the TAD task. Built upon the Low-Rank Adaptation (LoRA) architecture, DyLoRA adapts only the key layers of the pretrained model via low-rank decomposition, reducing the number of trainable parameters to less than 5% of that of full fine-tuning. This significantly lowers memory consumption and mitigates overfitting in low-resource settings. Notably, DyLoRA enhances the temporal modeling capability of pretrained models by optimizing temporal dimension weights, thereby alleviating the representation misalignment of temporal features. Experimental results demonstrate that DyLoRA-TAD achieves impressive performance, with 73.9% mAP on THUMOS14, 39.52% on ActivityNet-1.3, and 28.2% on Charades, substantially surpassing the best traditional feature-based methods.
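The LoRA mechanism DyLoRA builds on can be sketched in a few lines: the frozen weight is augmented with a trainable low-rank product, zero-initialized so training starts from exactly the pretrained behavior. The shapes and scaling factor below are illustrative, not DyLoRA's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4   # layer width and adapter rank (illustrative)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor, small random init
B = np.zeros((d, r))                 # trainable factor, zero init => no change at step 0

def lora_forward(x, alpha=8):
    # y = W x + (alpha / r) * B A x ; only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
assert np.allclose(lora_forward(x), W @ x)   # zero-init B preserves pretrained output

# Trainable fraction for this layer: 2*r*d out of d*d + 2*r*d parameters.
frac = (A.size + B.size) / (W.size + A.size + B.size)
```

With r much smaller than d, the fraction shrinks toward 2r/d, which is how adapting only key layers at low rank reaches single-digit percentages of full fine-tuning.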
Funding: Supported by the NSFC (62122013, U2001211) and the Innovative Development Joint Fund Key Projects of Shandong NSF (ZR2022LZH007).
Abstract: Continual learning, characterized by the sequential acquisition of multiple tasks, has emerged as a prominent challenge in deep learning. During continual learning, deep neural networks experience a phenomenon known as catastrophic forgetting, wherein networks lose the knowledge acquired on previous tasks when training on new tasks. Recently, parameter-efficient fine-tuning (PEFT) methods have gained prominence in tackling catastrophic forgetting. However, within the realm of domain incremental learning, a characteristic type of continual learning, there exists an additional, overlooked inductive bias that warrants attention beyond existing approaches. In this paper, we propose a novel PEFT method called Domain Correlation Low-Rank Adaptation for domain incremental learning. Our approach puts forward a domain-correlated loss, which encourages the weights of the LoRA modules for adjacent tasks to become more similar, thereby leveraging the correlation between different task domains. Furthermore, we consolidate the classifiers of different task domains to improve prediction performance by capitalizing on the knowledge acquired from diverse tasks. To validate the effectiveness of our method, we conduct comparative experiments and ablation studies on publicly available domain incremental learning benchmark datasets. The experimental results demonstrate that our method outperforms state-of-the-art approaches.
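The domain-correlated loss is described only qualitatively; below is a hedged sketch of one plausible form — an L2 penalty pulling adjacent tasks' LoRA weights together. The function name, penalty form, and coefficient are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def domain_correlated_loss(lora_weights, beta=0.1):
    """Hypothetical penalty: sum of squared differences between the LoRA
    weights of each pair of adjacent task domains."""
    loss = 0.0
    for w_prev, w_next in zip(lora_weights, lora_weights[1:]):
        loss += beta * np.sum((w_next - w_prev) ** 2)
    return loss

w1 = np.ones((2, 2))
w2 = np.ones((2, 2)) * 1.5
w3 = np.ones((2, 2)) * 1.5

# A smooth sequence of domain adapters is penalized far less than one
# where a task's adapter drifts away from its neighbors.
similar = domain_correlated_loss([w1, w2, w3])          # 0.1
dissimilar = domain_correlated_loss([w1, w2 * 3, w3])   # 8.5
assert similar < dissimilar
```

Minimizing such a term during training biases adjacent domains' adapters toward shared structure, which is the correlation the abstract describes exploiting.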
Funding: Supported by the National Key R&D Program of China (No. 2023YFB4704900), the National Natural Science Foundation of China (Nos. 62422312, 62203134, and 62503337), the National Natural Science Funds for Distinguished Young Scholar (No. 62325307), the Natural Science Foundation of Guangdong Province (No. 023B1515120038), the Shenzhen Science and Technology Innovation Commission (Nos. 20220809141216003 and KJZD20230923113801004), and the Scientific Instrument Developing Project of Shenzhen University (No. 2023YQ019).
Abstract: Industrial fault diagnosis is crucial for ensuring the safety and efficiency of modern production systems. Industrial big data, particularly large-scale tabular data capturing multivariate time-series processes, offer valuable operational insights. Existing methods face significant challenges due to extreme label scarcity and massive unlabeled data volumes. Large Language Models (LLMs) hold great potential to address these issues due to their strong heterogeneous and few-shot learning capabilities. However, the application of LLMs to fault diagnosis with industrial big data, especially tabular data, remains unexplored. In view of this, we propose a novel semi-supervised prefix tuning of LLMs for fault diagnosis with industrial big data. We first generate auxiliary prediction tasks based on the unlabeled data as semi-supervised training material for the LLMs. We then design a prefix-based soft embedding layer to fine-tune the LLMs, so that the model is able to learn task-specific information in a parameter-efficient way. To make the model applicable to industrial big data, we also employ Sparse Gaussian Processes (SGP) to filter the most informative samples and reduce the computational cost. Finally, we design a hybrid prompt template to effectively combine the hard and soft prompts and formulate the final prediction prompt for the industrial diagnosis tasks. The experiments have proven the superiority of the proposed method.
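Prefix-based soft embeddings can be sketched concisely: a small matrix of trainable vectors is prepended to the frozen token embeddings, so only the prefix is updated during fine-tuning. Dimensions below are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prefix, seq_len = 32, 5, 10

# Trainable soft-prompt embeddings, prepended to the frozen embedding output.
prefix = rng.normal(size=(n_prefix, d_model)) * 0.02   # trainable
tokens = rng.normal(size=(seq_len, d_model))           # frozen token embeddings

def with_prefix(token_embeds):
    # The model then attends over prefix + tokens as one sequence.
    return np.concatenate([prefix, token_embeds], axis=0)

inputs = with_prefix(tokens)
assert inputs.shape == (n_prefix + seq_len, d_model)
# Only the prefix trains: 5 x 32 = 160 parameters, independent of model size.
```

A hybrid prompt, as the abstract describes, would combine such trainable soft vectors with the embeddings of a fixed hard-text template in the same concatenated sequence.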
Funding: Supported by the RCA funding of A*STAR and DSO National Laboratory (Nos. 2208-526-RCA-CFAR and SC23/22-3204FA).
Abstract: Stance detection identifies the view expressed towards a specific target in a given context (e.g., tweets or commercial reviews). Target-related knowledge is often needed to help stance detection models understand the target well and make correct predictions. However, prevailing works on knowledge-infused stance detection predominantly incorporate target knowledge from a single source, without knowledge verification and with limited domain coverage. Low-resource training data further increase the challenge for data-driven large models in this task. To address these challenges, we propose a collaborative knowledge infusion approach for low-resource stance detection, employing a combination of aligned knowledge enhancement and efficient parameter learning techniques. Specifically, our stance detection approach leverages target background knowledge collaboratively from different knowledge sources with the help of knowledge alignment. Additionally, we introduce a parameter-efficient collaborative adaptor with a staged optimization algorithm, which collaboratively addresses the challenges of low-resource stance detection from both network structure and learning perspectives. To assess the effectiveness of our method, we conduct extensive experiments on three public stance detection datasets, including low-resource and cross-target settings. The results demonstrate significant performance improvements over existing stance detection approaches.
Abstract: Effective water management and flood prevention are critical challenges for both urban and rural areas, necessitating precise and prompt monitoring of waterbodies. As a fundamental step in the monitoring process, waterbody segmentation involves precisely delineating waterbody boundaries from imagery. Previous research using satellite images often lacks the resolution and contextual detail needed for local-scale analysis. In response, this study leverages common natural images, which are more easily accessible and provide higher resolution and more contextual information than satellite images. However, segmenting waterbodies from ordinary images faces several obstacles, including variations in lighting, occlusions from objects like trees and buildings, and reflections on the water surface, all of which can mislead algorithms. Additionally, the diverse shapes and textures of waterbodies, alongside complex backgrounds, further complicate the task. While large-scale vision models pre-trained on large datasets have typically been leveraged for their generalizability across various downstream tasks, their application to waterbody segmentation from ground-level images remains underexplored. Hence, this research proposes the Visual Aquatic Generalist (VAGen) as a countermeasure. A lightweight model for waterbody segmentation inspired by visual In-Context Learning (ICL) and Visual Prompting (VP), VAGen refines large visual models by innovatively adding learnable perturbations to enhance the quality of prompts in ICL. Experimental results show that VAGen achieves a significant increase in the mean Intersection over Union (mIoU) metric, a 22.38% improvement over the baseline model without learnable prompts. Moreover, VAGen surpasses current state-of-the-art (SOTA) task-specific models designed for waterbody segmentation by 6.20%. The performance evaluation and analysis of VAGen indicate its capacity to substantially reduce the number of trainable parameters and computational overhead, proving its feasibility for deployment on cost-limited devices, including unmanned aerial vehicles (UAVs) and mobile computing platforms. This study thereby makes a valuable contribution to the field of computer vision, offering practical solutions for engineering applications related to urban flood monitoring, agricultural water resource management, and environmental conservation.
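The learnable-perturbation idea behind visual prompting can be sketched as adding a small trainable tensor to the in-context prompt image while keeping the result a valid image. The clipping bound and zero initialization below are assumptions for the sketch, not VAGen's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 32, 32, 3

prompt_image = rng.uniform(size=(H, W, C))   # in-context example fed to the frozen model
delta = np.zeros((H, W, C))                  # learnable perturbation, zero-initialized

def perturbed_prompt(img, delta, eps=0.1):
    # Bound the perturbation and keep pixel values in the valid [0, 1] range.
    return np.clip(img + np.clip(delta, -eps, eps), 0.0, 1.0)

out = perturbed_prompt(prompt_image, delta)
assert np.allclose(out, prompt_image)        # zero delta leaves the prompt unchanged
```

During training only `delta` would receive gradients, so the trainable parameter count is the size of the perturbation tensor rather than any part of the frozen vision model, consistent with the parameter-efficiency claims above.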