In natural language processing(NLP),managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models,leading to practical inefficiencies.To address thi...In natural language processing(NLP),managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models,leading to practical inefficiencies.To address this challenge,we introduce AdaptForever,a novel approach that enables continuous mastery of NLP tasks through the integration of elastic and mutual learning strategies with a stochastic expert mechanism.Our method freezes the pre-trained model weights while incorporating adapters enhanced with mutual learning capabilities,facilitating effective knowledge transfer from previous tasks to new ones.By combining Elastic Weight Consolidation(EWC)for knowledge preservation with specialized regularization terms,AdaptForever successfully maintains performance on earlier tasks while acquiring new capabilities.Experimental results demonstrate that AdaptForever achieves superior performance across a continuous sequence of NLP tasks compared to existing parameter-efficient methods,while effectively preventing catastrophic forgetting and enabling positive knowledge transfer between tasks.展开更多
基金supported by the National Key R&D Program of China(No.2023YFB3308601)Sichuan Science and Technology Program(2024NSFJQ0035,2024NSFSC0004)the Talents by Sichuan provincial Party Committee Organization Department.
文摘In natural language processing(NLP),managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models,leading to practical inefficiencies.To address this challenge,we introduce AdaptForever,a novel approach that enables continuous mastery of NLP tasks through the integration of elastic and mutual learning strategies with a stochastic expert mechanism.Our method freezes the pre-trained model weights while incorporating adapters enhanced with mutual learning capabilities,facilitating effective knowledge transfer from previous tasks to new ones.By combining Elastic Weight Consolidation(EWC)for knowledge preservation with specialized regularization terms,AdaptForever successfully maintains performance on earlier tasks while acquiring new capabilities.Experimental results demonstrate that AdaptForever achieves superior performance across a continuous sequence of NLP tasks compared to existing parameter-efficient methods,while effectively preventing catastrophic forgetting and enabling positive knowledge transfer between tasks.