This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework.Addressing challenges such as the complexity ofmedical terminology,the difficulty of constructingmedical knowledge...This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework.Addressing challenges such as the complexity ofmedical terminology,the difficulty of constructingmedical knowledge graphs,and the scarcity of medical data,the method retrieves structured knowledge from clinical cases via external knowledge graphs.The method retrieves structured knowledge from external knowledge graphs related to clinical cases,encodes it,and injects it into the prompt templates to enhance the language model’s understanding and reasoning capabilities for the task.We conducted experiments on three public datasets:CHIP-CTC,IMCS-V2-NER,and KUAKE-QTR.The results indicate that the proposedmethod significantly outperforms existing models acrossmultiple evaluation metrics.Additionally,ablation studies confirmed the critical role of the knowledge injection module,as the removal of this module resulted in a significant drop in F1 score.The experimental results demonstrate that the proposed method not only effectively improves the accuracy of disease diagnosis but also enhances the interpretability of the predictions,providing more reliable support and evidence for clinical diagnosis.展开更多
The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively han...The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively handle modal disparities,resulting in visual degradation of the details and prominent targets of the fused images.To address these challenges,we introduce Prompt Fusion,a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts.Firstly,to better characterize the features of different modalities,a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities,thereby improving the extraction of fine details and textures.We also introduce a prompt learning mechanism using positive and negative prompts,leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images,leading to improved performance in downstream tasks.Furthermore,we employ bi-level asymptotic convergence optimization.This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent.Our approach advances the state-of-the-art,delivering superior fusion quality and boosting the performance of related downstream tasks.Project page:https://github.com/hey-it-s-me/PromptFusion.展开更多
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research as they contain rich data and textual information.With the rapid development of science ...Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research as they contain rich data and textual information.With the rapid development of science and technology,a large number of textual reports have accumulated in the field of geology.However,many non-hot topics and non-English speaking regions are neglected in mainstream geoscience databases for geological information mining,making it more challenging for some researchers to extract necessary information from these texts.Natural Language Processing(NLP)has obvious advantages in processing large amounts of textual data.The objective of this paper is to identify geological named entities from Chinese geological texts using NLP techniques.We propose the RoBERTa-Prompt-Tuning-NER method,which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations.The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors.Finally,we conducted experiments on the constructed Geological Named Entity Recognition(GNER)dataset.Our experimental results show that the proposed model achieves the highest F1 score of 80.64%among the four baseline algorithms,demonstrating the reliability and robustness of using the model for Named Entity Recognition of geological texts.展开更多
With recent advancements in robotic surgery,notable strides have been made in visual question answering(VQA).Existing VQA systems typically generate textual answers to questions but fail to indicate the location of th...With recent advancements in robotic surgery,notable strides have been made in visual question answering(VQA).Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image.This limitation restricts the interpretative capacity of the VQA models and their abil-ity to explore specific image regions.To address this issue,this study proposes a grounded VQA model for robotic surgery,capable of localizing a specific region during answer prediction.Drawing inspiration from prompt learning in language models,a dual-modality prompt model was developed to enhance precise multimodal information interactions.Specifically,two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model.A visual complementary prompter merges visual prompt knowl-edge with visual information features to guide accurate localization.The textual complementary prompter aligns vis-ual information with textual prompt knowledge and textual information,guiding textual information towards a more accurate inference of the answer.Additionally,a multiple iterative fusion strategy was adopted for comprehensive answer reasoning,to ensure high-quality generation of textual and grounded answers.The experimental results vali-date the effectiveness of the model,demonstrating its superiority over existing methods on the EndoVis-18 and End-oVis-17 datasets.展开更多
This survey paper investigates how personalized learning offered by Large Language Models (LLMs) could transform educational experiences. We explore Knowledge Editing Techniques (KME), which guarantee that LLMs mainta...This survey paper investigates how personalized learning offered by Large Language Models (LLMs) could transform educational experiences. We explore Knowledge Editing Techniques (KME), which guarantee that LLMs maintain current knowledge and are essential for providing accurate and up-to-date information. The datasets analyzed in this article are intended to evaluate LLM performance on educational tasks, such as error correction and question answering. We acknowledge the limitations of LLMs while highlighting their fundamental educational capabilities in writing, math, programming, and reasoning. We also explore two promising system architectures: a Mixture-of-Experts (MoE) framework and a unified LLM approach, for LLM-based education. The MoE approach makes use of specialized LLMs under the direction of a central controller for various subjects. We also discuss the use of LLMs for individualized feedback and their possibility in content creation, including the creation of videos, quizzes, and plans. In our final section, we discuss the difficulties and potential solutions for incorporating LLMs into educational systems, highlighting the importance of factual accuracy, reducing bias, and fostering critical thinking abilities. The purpose of this survey is to show the promise of LLMs as well as the issues that still need to be resolved in order to facilitate their responsible and successful integration into the educational ecosystem.展开更多
Large Language Models(LLMs)have significantly advanced human-computer interaction by improving natural language understanding and generation.However,their vulnerability to adversarial prompts–carefully designed input...Large Language Models(LLMs)have significantly advanced human-computer interaction by improving natural language understanding and generation.However,their vulnerability to adversarial prompts–carefully designed inputs that manipulate model outputs–presents substantial challenges.This paper introduces a classification-based approach to detect adversarial prompts by utilizing both prompt features and prompt response features.Elevenmachine learning models were evaluated based on key metrics such as accuracy,precision,recall,and F1-score.The results show that the Convolutional Neural Network–Long Short-Term Memory(CNN-LSTM)cascade model delivers the best performance,especially when using prompt features,achieving an accuracy of over 97%in all adversarial scenarios.Furthermore,the Support Vector Machine(SVM)model performed best with prompt response features,particularly excelling in prompt type classification tasks.Classification results revealed that certain types of adversarial attacks,such as“Word Level”and“Adversarial Prefix”,were particularly difficult to detect,as indicated by their low recall and F1-scores.These findings suggest that more subtle manipulations can evade detection mechanisms.In contrast,attacks like“Sentence Level”and“Adversarial Insertion”were easier to identify,due to the model’s effectiveness in recognizing inserted content.Natural Language Processing(NLP)techniques played a critical role by enabling the extraction of semantic and syntactic features from both prompts and their corresponding responses.These insights highlight the importance of combining traditional and deep learning approaches,along with advanced NLP techniques,to build more reliable adversarial prompt detection systems for LLMs.展开更多
Large language models(LLMs)have demonstrated remarkable generalization abilities across multiple tasks in natural language processing(NLP).For multi-step reasoning tasks,chain-of-thought(CoT)prompting facilitates step...Large language models(LLMs)have demonstrated remarkable generalization abilities across multiple tasks in natural language processing(NLP).For multi-step reasoning tasks,chain-of-thought(CoT)prompting facilitates step-by-step thinking,leading to improved performance.However,despite significant advancements in LLMs,current CoT prompting performs suboptimally on smaller-scale models that have fewer parameters.Additionally,the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations,with performance contingent on the quality of these annotations and varying with task-specific requirements.To address these limitations,we propose a select-and-answer prompting method(SAP)to enhance language model performance on reasoning tasks without the need for manual demonstrations.This method comprises two primary steps:guiding the model to conduct preliminary analysis and generate several candidate answers based on the prompting;allowing the model to provide final answers derived from these candidate answers.The proposed prompting strategy is evaluated across two language models of varying sizes and six datasets.On ChatGLM-6B,SAP consistently outperforms few-shot CoT across all datasets.For GPT-3.5,SAP achieves comparable performance to few-shot CoT and outperforms zero-shot CoT in most cases.These experimental results indicate that SAP can significantly improve the accuracy of language models in reasoning tasks.展开更多
Effective news recommendation is crucial for alleviating users’information overload.While recent prompt-based news recommendation methods have shown promising performance by reformulating the recommendation task as a...Effective news recommendation is crucial for alleviating users’information overload.While recent prompt-based news recommendation methods have shown promising performance by reformulating the recommendation task as a masked prediction problem,we note that this paradigm still faces several major limitations including inadequate multi-interest representation,limited global interaction modeling,and historical interaction truncation.To address these problems,this paper proposes PCRec,a prompt-guided cross-view contrastive learning framework for multi-interest news recommendation.PCRec first introduces feature-level prompts to overcome the input constraints inherent in text-level prompts.Moreover,a two-stage user modeling module is designed to capture users’multi-interests.Finally,to model global user-news relationships,PCRec implements a cross-view contrastive learning strategy.This approach groups similar users,enabling learning from multiple perspectives and breaking down isolated relationships among users,news categories,and news subcategories.Extensive experiments on two real-world news recommendation datasets validate the superiority of our proposed PCRec compared with various state-of-the-art baselines.展开更多
Synonym discovery is important in a wide variety of concept-related tasks,such as entity/concept mining and industrial knowledge graph(KG)construction.It intends to determine whether two terms refer to the same concep...Synonym discovery is important in a wide variety of concept-related tasks,such as entity/concept mining and industrial knowledge graph(KG)construction.It intends to determine whether two terms refer to the same concept in semantics.Existing methods rely on contexts or KGs.However,these methods are often impractical in some cases where contexts or KGs are not available.Therefore,this paper proposes a context-free prompt learning based synonym discovery method called ProSyno,which takes the world’s largest freely available dictionary Wiktionary as a semantic source.Based on a pre-trained language model(PLM),we employ a prompt learning method to generalize to other datasets without any fine-tuning.Thus,our model is more appropriate for context-free situation and can be easily transferred to other fields.Experimental results demonstrate its superiority comparing with state-of-the-art methods.展开更多
基金supported by the National Natural Science Foundation of China(Grant No.62162014).
文摘This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework.Addressing challenges such as the complexity ofmedical terminology,the difficulty of constructingmedical knowledge graphs,and the scarcity of medical data,the method retrieves structured knowledge from clinical cases via external knowledge graphs.The method retrieves structured knowledge from external knowledge graphs related to clinical cases,encodes it,and injects it into the prompt templates to enhance the language model’s understanding and reasoning capabilities for the task.We conducted experiments on three public datasets:CHIP-CTC,IMCS-V2-NER,and KUAKE-QTR.The results indicate that the proposedmethod significantly outperforms existing models acrossmultiple evaluation metrics.Additionally,ablation studies confirmed the critical role of the knowledge injection module,as the removal of this module resulted in a significant drop in F1 score.The experimental results demonstrate that the proposed method not only effectively improves the accuracy of disease diagnosis but also enhances the interpretability of the predictions,providing more reliable support and evidence for clinical diagnosis.
基金partially supported by China Postdoctoral Science Foundation(2023M730741)the National Natural Science Foundation of China(U22B2052,52102432,52202452,62372080,62302078)
文摘The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively handle modal disparities,resulting in visual degradation of the details and prominent targets of the fused images.To address these challenges,we introduce Prompt Fusion,a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts.Firstly,to better characterize the features of different modalities,a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities,thereby improving the extraction of fine details and textures.We also introduce a prompt learning mechanism using positive and negative prompts,leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images,leading to improved performance in downstream tasks.Furthermore,we employ bi-level asymptotic convergence optimization.This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent.Our approach advances the state-of-the-art,delivering superior fusion quality and boosting the performance of related downstream tasks.Project page:https://github.com/hey-it-s-me/PromptFusion.
基金supported by the National Natural Science Foundation of China(Nos.42488201,42172137,42050104,and 42050102)the National Key R&D Program of China(No.2023YFF0804000)Sichuan Provincial Youth Science&Technology Innovative Research Group Fund(No.2022JDTD0004)
文摘Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research as they contain rich data and textual information.With the rapid development of science and technology,a large number of textual reports have accumulated in the field of geology.However,many non-hot topics and non-English speaking regions are neglected in mainstream geoscience databases for geological information mining,making it more challenging for some researchers to extract necessary information from these texts.Natural Language Processing(NLP)has obvious advantages in processing large amounts of textual data.The objective of this paper is to identify geological named entities from Chinese geological texts using NLP techniques.We propose the RoBERTa-Prompt-Tuning-NER method,which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations.The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors.Finally,we conducted experiments on the constructed Geological Named Entity Recognition(GNER)dataset.Our experimental results show that the proposed model achieves the highest F1 score of 80.64%among the four baseline algorithms,demonstrating the reliability and robustness of using the model for Named Entity Recognition of geological texts.
基金supported in part by the National Key Research and Development Program of China,No.2021ZD0112400National Natural Science Foundation of China,No.U1908214+5 种基金Program for Innovative Research Team at the University of Liaoning Province,No.LT2020015the Support Plan for Key Field Innovation Team of Dalian,No.2021RT06the Support Plan for Leading Innovation Team of Dalian University,No.XLJ202010Program for the Liaoning Province Doctoral Research Starting Fund,No.2022-BS-336Key Laboratory of Advanced Design and Intelligent Computing(Dalian University),and Ministry of Education,No.ADIC2022003Interdisciplinary Project of Dalian University,No.DLUXK-2023-QN-015.
文摘With recent advancements in robotic surgery,notable strides have been made in visual question answering(VQA).Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image.This limitation restricts the interpretative capacity of the VQA models and their abil-ity to explore specific image regions.To address this issue,this study proposes a grounded VQA model for robotic surgery,capable of localizing a specific region during answer prediction.Drawing inspiration from prompt learning in language models,a dual-modality prompt model was developed to enhance precise multimodal information interactions.Specifically,two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model.A visual complementary prompter merges visual prompt knowl-edge with visual information features to guide accurate localization.The textual complementary prompter aligns vis-ual information with textual prompt knowledge and textual information,guiding textual information towards a more accurate inference of the answer.Additionally,a multiple iterative fusion strategy was adopted for comprehensive answer reasoning,to ensure high-quality generation of textual and grounded answers.The experimental results vali-date the effectiveness of the model,demonstrating its superiority over existing methods on the EndoVis-18 and End-oVis-17 datasets.
文摘This survey paper investigates how personalized learning offered by Large Language Models (LLMs) could transform educational experiences. We explore Knowledge Editing Techniques (KME), which guarantee that LLMs maintain current knowledge and are essential for providing accurate and up-to-date information. The datasets analyzed in this article are intended to evaluate LLM performance on educational tasks, such as error correction and question answering. We acknowledge the limitations of LLMs while highlighting their fundamental educational capabilities in writing, math, programming, and reasoning. We also explore two promising system architectures: a Mixture-of-Experts (MoE) framework and a unified LLM approach, for LLM-based education. The MoE approach makes use of specialized LLMs under the direction of a central controller for various subjects. We also discuss the use of LLMs for individualized feedback and their possibility in content creation, including the creation of videos, quizzes, and plans. In our final section, we discuss the difficulties and potential solutions for incorporating LLMs into educational systems, highlighting the importance of factual accuracy, reducing bias, and fostering critical thinking abilities. The purpose of this survey is to show the promise of LLMs as well as the issues that still need to be resolved in order to facilitate their responsible and successful integration into the educational ecosystem.
文摘Large Language Models(LLMs)have significantly advanced human-computer interaction by improving natural language understanding and generation.However,their vulnerability to adversarial prompts–carefully designed inputs that manipulate model outputs–presents substantial challenges.This paper introduces a classification-based approach to detect adversarial prompts by utilizing both prompt features and prompt response features.Elevenmachine learning models were evaluated based on key metrics such as accuracy,precision,recall,and F1-score.The results show that the Convolutional Neural Network–Long Short-Term Memory(CNN-LSTM)cascade model delivers the best performance,especially when using prompt features,achieving an accuracy of over 97%in all adversarial scenarios.Furthermore,the Support Vector Machine(SVM)model performed best with prompt response features,particularly excelling in prompt type classification tasks.Classification results revealed that certain types of adversarial attacks,such as“Word Level”and“Adversarial Prefix”,were particularly difficult to detect,as indicated by their low recall and F1-scores.These findings suggest that more subtle manipulations can evade detection mechanisms.In contrast,attacks like“Sentence Level”and“Adversarial Insertion”were easier to identify,due to the model’s effectiveness in recognizing inserted content.Natural Language Processing(NLP)techniques played a critical role by enabling the extraction of semantic and syntactic features from both prompts and their corresponding responses.These insights highlight the importance of combining traditional and deep learning approaches,along with advanced NLP techniques,to build more reliable adversarial prompt detection systems for LLMs.
基金National Natural Science Foundation of China(No.62176052)。
文摘Large language models(LLMs)have demonstrated remarkable generalization abilities across multiple tasks in natural language processing(NLP).For multi-step reasoning tasks,chain-of-thought(CoT)prompting facilitates step-by-step thinking,leading to improved performance.However,despite significant advancements in LLMs,current CoT prompting performs suboptimally on smaller-scale models that have fewer parameters.Additionally,the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations,with performance contingent on the quality of these annotations and varying with task-specific requirements.To address these limitations,we propose a select-and-answer prompting method(SAP)to enhance language model performance on reasoning tasks without the need for manual demonstrations.This method comprises two primary steps:guiding the model to conduct preliminary analysis and generate several candidate answers based on the prompting;allowing the model to provide final answers derived from these candidate answers.The proposed prompting strategy is evaluated across two language models of varying sizes and six datasets.On ChatGLM-6B,SAP consistently outperforms few-shot CoT across all datasets.For GPT-3.5,SAP achieves comparable performance to few-shot CoT and outperforms zero-shot CoT in most cases.These experimental results indicate that SAP can significantly improve the accuracy of language models in reasoning tasks.
基金supported by the National Key Research and Development Program of China under Grant No.2024YFF0729003the National Natural Science Foundation of China under Grant Nos.62176014 and 62276015the Fundamental Research Funds for the Central Universities of China.
文摘Effective news recommendation is crucial for alleviating users’information overload.While recent prompt-based news recommendation methods have shown promising performance by reformulating the recommendation task as a masked prediction problem,we note that this paradigm still faces several major limitations including inadequate multi-interest representation,limited global interaction modeling,and historical interaction truncation.To address these problems,this paper proposes PCRec,a prompt-guided cross-view contrastive learning framework for multi-interest news recommendation.PCRec first introduces feature-level prompts to overcome the input constraints inherent in text-level prompts.Moreover,a two-stage user modeling module is designed to capture users’multi-interests.Finally,to model global user-news relationships,PCRec implements a cross-view contrastive learning strategy.This approach groups similar users,enabling learning from multiple perspectives and breaking down isolated relationships among users,news categories,and news subcategories.Extensive experiments on two real-world news recommendation datasets validate the superiority of our proposed PCRec compared with various state-of-the-art baselines.
基金supported by the National Key R&D Program of China(2023YFC3304104)the National Natural Science Foundation of China(Grant No.62172094).
文摘Synonym discovery is important in a wide variety of concept-related tasks,such as entity/concept mining and industrial knowledge graph(KG)construction.It intends to determine whether two terms refer to the same concept in semantics.Existing methods rely on contexts or KGs.However,these methods are often impractical in some cases where contexts or KGs are not available.Therefore,this paper proposes a context-free prompt learning based synonym discovery method called ProSyno,which takes the world’s largest freely available dictionary Wiktionary as a semantic source.Based on a pre-trained language model(PLM),we employ a prompt learning method to generalize to other datasets without any fine-tuning.Thus,our model is more appropriate for context-free situation and can be easily transferred to other fields.Experimental results demonstrate its superiority comparing with state-of-the-art methods.