Abstract: In-context learning (ICL), which teaches a large language model (LLM) to perform a task with few-shot demonstrations rather than adjusting the model parameters, has emerged as a strong paradigm for using LLMs. While early studies primarily used a fixed or random set of demonstrations for all test queries, recent research suggests that retrieving demonstrations semantically similar to the input from a pool of available demonstrations yields better performance. This work expands the applicability of retrieval-based ICL approaches along several dimensions. We extend the success of retrieval-based ICL to instruction-finetuned LLMs as well as Chain-of-Thought (CoT) prompting. While prior work utilizes general LLMs such as GPT-3, we find that retrieved demonstrations also enhance instruction-finetuned LLMs. This insight implies that training data, despite being exposed during the fine-tuning phase, can still be used effectively through retrieval and in-context demonstrations at test time, yielding superior outcomes compared to using no demonstrations or selecting them at random. For CoT, when the demonstrations contain reasoning chains, we obtain improvements by retrieving based on such chains. Finally, we train a task-specific demonstration retriever that outperforms off-the-shelf retrievers.
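The retrieval step this abstract describes can be sketched as a minimal toy implementation. Bag-of-words cosine similarity stands in for the paper's dense retriever, and all names (`retrieve_demonstrations`, `build_prompt`, the toy pool) are illustrative, not the authors' API:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words embedding; a real retriever would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demonstrations(query, pool, k=2):
    """Pick the k demonstrations most similar to the test query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda d: cosine(q, embed(d["input"])), reverse=True)
    return ranked[:k]

def build_prompt(query, demos):
    """Assemble the retrieved demonstrations into a few-shot prompt."""
    shots = "\n".join(f"Input: {d['input']}\nOutput: {d['output']}" for d in demos)
    return f"{shots}\nInput: {query}\nOutput:"

pool = [
    {"input": "the movie was wonderful", "output": "positive"},
    {"input": "the plot was dull and slow", "output": "negative"},
    {"input": "quarterly revenue rose sharply", "output": "positive"},
]
demos = retrieve_demonstrations("the movie was dull", pool, k=2)
prompt = build_prompt("the movie was dull", demos)
```

For CoT, the same retrieval could be applied to the reasoning-chain text of each demonstration instead of its input, which is the variant the abstract reports gains from.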
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232021A-10); Shanghai Pujiang Program, China (No. 22PJ1423400).
Abstract: Visual entailment (VE) is a prototypical task in multimodal visual reasoning, where current methods frequently utilize large language models (LLMs) as the knowledge base to assist in answering questions. These methods rely heavily on the textual modality, which inherently cannot capture the full extent of the information contained within images. We propose a context-aware visual entailment (CAVE) model, which introduces a novel aggregation module designed to extract high-level semantic features from images. This module integrates lower-level semantic image features into high-level visual tokens, formatting them similarly to text tokens so that they can serve as inputs for LLMs. The CAVE model compensates for the loss of image information and integrates it more effectively with textual comprehension. Additionally, the CAVE model adopts a new input format and training methodology rooted in instruction tuning and in-context learning techniques. The objective of this research is to maximize the inherent logical reasoning capabilities of LLMs. Experimental results on the e-SNLI-VE dataset show that the proposed CAVE model achieves outstanding performance.
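The aggregation idea (merging low-level patch features into fewer high-level visual tokens, then projecting them to the LLM's text-token width) can be illustrated with a toy sketch. Mean pooling stands in for the paper's learned aggregation module, and all function names and dimensions are hypothetical:

```python
def aggregate_visual_tokens(patch_features, group_size=2):
    """Merge groups of low-level patch features into high-level visual tokens
    by mean pooling (a stand-in for a learned aggregation module)."""
    tokens = []
    for i in range(0, len(patch_features), group_size):
        group = patch_features[i:i + group_size]
        dim = len(group[0])
        tokens.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return tokens

def project(tokens, weight):
    """Linear projection so visual tokens match the LLM's text-token width."""
    return [[sum(t[i] * weight[i][j] for i in range(len(t)))
             for j in range(len(weight[0]))] for t in tokens]

patches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 patches, dim 2
vis_tokens = aggregate_visual_tokens(patches, group_size=2)  # -> 2 tokens
W = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]                       # 2 -> 3 projection
llm_inputs = project(vis_tokens, W)                           # text-token width
```

Once projected, such visual tokens can be interleaved with text tokens in the LLM input sequence, which is the formatting the abstract alludes to.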
Abstract: Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
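The template-plus-similar-examples scheme can be sketched minimally. Jaccard word overlap stands in for the paper's semantic-similarity measure, and the template wording, pool, and helper names are illustrative assumptions, not the authors' actual prompt:

```python
def jaccard(a, b):
    """Crude word-overlap similarity; a real system would use embeddings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

TEMPLATE = ("Extract named entities as (text, type) pairs.\n"
            "{examples}\nSentence: {sentence}\nEntities:")

def build_ner_prompt(sentence, pool, k=1):
    """Fill the template with the k context examples most similar to the input."""
    ranked = sorted(pool, key=lambda ex: jaccard(sentence, ex["sentence"]),
                    reverse=True)
    examples = "\n".join(f"Sentence: {ex['sentence']}\nEntities: {ex['entities']}"
                         for ex in ranked[:k])
    return TEMPLATE.format(examples=examples, sentence=sentence)

pool = [
    {"sentence": "Barack Obama visited Paris.",
     "entities": "(Barack Obama, PER); (Paris, LOC)"},
    {"sentence": "Prices rose in the last quarter.", "entities": "(none)"},
]
prompt = build_ner_prompt("Angela Merkel visited Berlin.", pool, k=1)
```

The completed prompt would then be sent to the PLM, whose completion after "Entities:" is parsed back into entity spans.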
Funding: Supported by the National Natural Science Foundation of China (91959205, U22A20327, 82203881, 12090022, 11831002, and 81801778); Beijing Natural Science Foundation (7222021); Beijing Hospitals Authority Youth Programme (QML20231115); Clinical Medicine Plus X-Young Scholars Project of Peking University (PKU2023LCXQ041); Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Cancer (2020B121201004).
Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning. This has sparked significant interest in applying LLMs to enhance various aspects of healthcare, ranging from medical education to clinical decision support. However, medicine involves multifaceted data modalities and nuanced reasoning skills, presenting challenges for integrating LLMs. This review introduces the fundamental applications of general-purpose and specialized LLMs, demonstrating their utility in knowledge retrieval, research support, clinical workflow automation, and diagnostic assistance. Recognizing the inherent multimodality of medicine, the review emphasizes multimodal LLMs and discusses their ability to process diverse data types, such as medical imaging and electronic health records, to augment diagnostic accuracy. To address LLMs' limitations regarding personalization and complex clinical reasoning, the review further explores the emerging development of LLM-powered autonomous agents for healthcare. Moreover, it summarizes evaluation methodologies for assessing LLMs' reliability and safety in medical contexts. LLMs have transformative potential in medicine; however, continuous optimization and ethical oversight are needed before these models can be effectively integrated into clinical practice.
Abstract: Effective water management and flood prevention are critical challenges for both urban and rural areas, necessitating precise and prompt monitoring of waterbodies. As a fundamental step in the monitoring process, waterbody segmentation involves precisely delineating waterbody boundaries from imagery. Previous research using satellite images often lacks the resolution and contextual detail needed for local-scale analysis. This study addresses these challenges by leveraging common natural images, which are more easily accessible and provide higher resolution and richer context than satellite images. However, segmenting waterbodies from ordinary images faces several obstacles, including variations in lighting, occlusions from objects such as trees and buildings, and reflections on the water surface, all of which can mislead algorithms. Additionally, the diverse shapes and textures of waterbodies, alongside complex backgrounds, further complicate the task. While large-scale vision models pre-trained on large datasets have typically been leveraged for their generalizability across various downstream tasks, their application to waterbody segmentation from ground-level images remains underexplored. Hence, this research proposes the Visual Aquatic Generalist (VAGen), a lightweight model for waterbody segmentation inspired by visual In-Context Learning (ICL) and Visual Prompting (VP). VAGen refines large visual models by adding learnable perturbations that enhance the quality of prompts in ICL. In experiments, VAGen achieved a 22.38% improvement in the mean Intersection over Union (mIoU) metric over a baseline without learnable prompts, and surpassed the current state-of-the-art (SOTA) task-specific models for waterbody segmentation by 6.20%. The performance evaluation and analysis of VAGen indicate that it substantially reduces the number of trainable parameters and computational overhead, making it feasible to deploy on cost-limited devices such as unmanned aerial vehicles (UAVs) and mobile computing platforms. This study thereby makes a valuable contribution to the field of computer vision, offering practical solutions for engineering applications in urban flood monitoring, agricultural water resource management, and environmental conservation.
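The core mechanic (keeping the large model and its prompt frozen while optimizing only a small additive perturbation) can be shown with a toy gradient-descent sketch. The scalar features, squared-error loss, and function names are illustrative stand-ins for VAGen's actual image prompts and segmentation objective:

```python
def squared_error(prompt, target):
    """Toy loss between the (perturbed) prompt features and a target."""
    return sum((p - t) ** 2 for p, t in zip(prompt, target))

def train_perturbation(base_prompt, target, steps=100, lr=0.1):
    """Learn an additive perturbation for a frozen prompt, in the spirit of
    visual prompting: only the perturbation is updated, never the prompt."""
    delta = [0.0] * len(base_prompt)
    for _ in range(steps):
        perturbed = [b + d for b, d in zip(base_prompt, delta)]
        # analytic gradient of the squared error w.r.t. each delta component
        grad = [2 * (p - t) for p, t in zip(perturbed, target)]
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta

base = [0.2, 0.5, 0.9]    # frozen in-context prompt features (toy values)
target = [0.0, 1.0, 1.0]  # features that minimize the toy task loss
delta = train_perturbation(base, target)
final = [b + d for b, d in zip(base, delta)]  # converges toward the target
```

Because only `delta` is trained, the parameter count is tiny compared with the backbone, which mirrors the efficiency argument the abstract makes for edge deployment.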