Fund: Supported by Sichuan Science and Technology Program (2021YFQ0003, 2023YFSY0026, 2023YFH0004).
Abstract: In the field of natural language processing (NLP), various pre-trained language models have emerged in recent years, with question-answering systems gaining significant attention. However, as algorithms, data, and computing power advance, models have grown ever larger, with an ever-increasing number of parameters; consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of training while reducing model size, this paper proposes PAL-BERT, a first-order pruning model based on the ALBERT model and tailored to the characteristics of question-answering (QA) systems and language models. First, a first-order network pruning method based on the ALBERT model is designed, yielding the PAL-BERT model. Then, a parameter optimization strategy for PAL-BERT is formulated, and the Mish function is used as the activation function in place of ReLU to improve performance. Finally, comparison experiments with the traditional deep learning models TextCNN and BiLSTM confirm that PAL-BERT is a pruning-based model compression method that can significantly reduce training time and improve training efficiency. Compared with traditional models, PAL-BERT significantly improves performance on NLP tasks.
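The abstract above combines a first-order pruning criterion with the Mish activation. The following is a minimal pure-Python sketch of both ideas, not the paper's implementation: the per-weight importance score |w · ∂L/∂w| is the standard first-order Taylor criterion and is assumed here, since the abstract does not spell out the exact scoring rule.

```python
import math

def mish(x):
    # Mish activation: x * tanh(softplus(x)); used in PAL-BERT in place of ReLU.
    return x * math.tanh(math.log1p(math.exp(x)))

def first_order_prune(weights, grads, sparsity):
    # First-order importance: |w * dL/dw|, a first-order Taylor estimate of
    # the loss change if a weight is removed. Weights whose score falls in
    # the lowest `sparsity` fraction are zeroed out.
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    k = int(len(weights) * sparsity)
    threshold = sorted(scores)[k] if k < len(scores) else float("inf")
    return [0.0 if s < threshold else w for w, s in zip(weights, scores)]
```

In a real pruning pipeline the scores would be computed per parameter tensor over a calibration batch, and pruning would be interleaved with fine-tuning.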
Fund: Supported by Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004).
Abstract: Recent advancements in natural language processing have given rise to numerous pre-trained language models for question-answering systems. However, with the constant evolution of algorithms, data, and computing power, the increasing size and complexity of these models have led to higher training costs and reduced efficiency. This study aims to minimize the inference time of such models while maintaining computational performance. It proposes DPAL-BERT, a novel distillation model for PAL-BERT that employs knowledge distillation, using the PAL-BERT model as the teacher to train two student models: DPAL-BERT-Bi and DPAL-BERT-C. The research enhances the dataset through techniques such as masking, replacement, and n-gram sampling to optimize knowledge transfer. Experimental results show that the distilled models greatly outperform models trained from scratch. In addition, although the distilled models exhibit a slight decrease in performance compared to PAL-BERT, they reduce inference time to just 0.25% of the original. This demonstrates the effectiveness of the proposed approach in balancing model performance and efficiency.
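The knowledge-distillation objective underlying a teacher-student setup like the one above can be sketched as follows. This is a generic illustration of the standard temperature-softened KL loss, not the paper's exact objective, which is not given in the abstract.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between the temperature-softened teacher distribution
    # (soft targets) and the student distribution, scaled by T^2 as in
    # standard knowledge distillation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

In practice this term is combined with the ordinary cross-entropy loss on hard labels, weighted by a mixing coefficient.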
Fund: Microsoft Research Asia Internet Services in Academic Research Fund (No. FY07-RES-OPP-116) and the Science and Technology Development Program of Tianjin (No. 06YFGZGX05900).
Abstract: To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a pre-defined dictionary and grammatical analysis, the question classifier incorporates both semantic and grammatical information into information retrieval and machine learning methods as training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun. The QA query results are then re-ranked using question-class information. Experiments show that questions in real-world web data sets can be accurately classified by the classifier, and that the re-ranked QA results are markedly improved. This demonstrates that, with both semantic and grammatical information, applications such as QA built upon real-world web data sets can achieve better performance.
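The feature-extraction and re-ranking steps above can be sketched as follows. This is a deliberately toy version: the naive main-verb heuristic and the candidate dictionary layout are illustrative assumptions, whereas the real system uses dependency parsing, hypernym lookup, and trained classifiers.

```python
def question_features(tokens):
    # Toy subset of the paper's feature set: the question word and a
    # naively chosen main verb (second token). The real system also uses
    # the dependency structure, auxiliary-verb position, the main noun,
    # and its top hypernym.
    wh = next((t for t in tokens if t.lower() in
               {"who", "what", "when", "where", "why", "how"}), None)
    verb = tokens[1] if len(tokens) > 1 else None
    return {"question_word": wh, "main_verb": verb}

def rerank(candidates, expected_class):
    # Re-rank retrieved answers: candidates whose answer class matches the
    # predicted question class are promoted ahead of non-matching ones,
    # then sorted by retrieval score within each group.
    return sorted(candidates,
                  key=lambda c: (c["class"] == expected_class, c["score"]),
                  reverse=True)
```

For example, a "who" question maps to a person-type answer class, so person-typed candidates outrank higher-scored but mistyped ones.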
Fund: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2020R1G1A1100493).
Abstract: Recently, pre-trained language representation models such as bidirectional encoder representations from transformers (BERT) have performed well in commonsense question answering (CSQA). However, these models do not directly use explicit information from knowledge sources existing outside the model. To address this, methods such as the knowledge-aware graph network (KagNet) and the multi-hop graph relation network (MHGRN) have been proposed. In this study, we propose using the pre-trained language model a lite bidirectional encoder representations from transformers (ALBERT) together with a knowledge-graph information extraction technique. We also propose applying a novel method, schema graph expansion, to recent language models. We then analyze the effect of applying knowledge-graph-based knowledge extraction techniques to recent pre-trained language models and confirm that schema graph expansion is effective to some extent. Furthermore, we show that our proposed model achieves better performance than the existing KagNet and MHGRN models on the CommonsenseQA dataset.
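One plausible reading of schema graph expansion is growing the subgraph of question/answer concepts with their knowledge-graph neighbors before reasoning. The sketch below is a hypothetical illustration of that reading, not the paper's algorithm; the edge-list representation and hop count are assumptions.

```python
def expand_schema_graph(seed_nodes, kg_edges, hops=1):
    # Starting from concepts mentioned in the question and answer choices
    # (seed nodes), repeatedly add their neighbors from the knowledge
    # graph so the model can reason over a richer schema subgraph.
    nodes = set(seed_nodes)
    for _ in range(hops):
        nodes |= {v for u, v in kg_edges if u in nodes}
        nodes |= {u for u, v in kg_edges if v in nodes}
    return nodes
```

The resulting node set would then be encoded (e.g., as relation paths or a graph network input) alongside the ALBERT representation of the question.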
Abstract: In the context of power generation companies, vast amounts of specialized data and expert knowledge have been accumulated. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning of Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. The proposed DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
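The interaction between DKBUM and WCAS described above can be sketched as weighted retrieval: each knowledge item carries a dynamically maintained weight that scales its embedding similarity to the query. This is a hypothetical minimal reading of the abstract, not the paper's implementation; the item schema and the multiplicative weighting are assumptions.

```python
import math

def cosine(u, v):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, knowledge_base, top_k=1):
    # Weighted context-aware retrieval sketch: each item's "weight" stands
    # in for the recency/relevance score maintained by the dynamic
    # updating mechanism, and it scales the query-item similarity.
    scored = [(item["weight"] * cosine(query_vec, item["vec"]), item)
              for item in knowledge_base]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [item for _, item in scored[:top_k]]
```

Under this scheme, a stale item can be demoted by lowering its weight without deleting it from the knowledge base.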
Abstract: Knowledge-based visual question answering (KB-VQA), which requires external world knowledge beyond the image for reasoning, is more challenging than traditional visual question answering. Recent works have demonstrated the effectiveness of using a large (vision) language model as an implicit knowledge source to acquire the necessary information. However, the knowledge stored in large models (LMs) is often coarse-grained and inaccurate, causing questions that require finer-grained information to be answered incorrectly. In this work, we propose a variational expectation-maximization (EM) framework that bootstraps the VQA performance of LMs with their own answers. In contrast to previous VQA pipelines, we treat the outside knowledge as a latent variable. In the E-step, we approximate the posterior with two components: first, a rough answer, e.g., a general description of the image, which is usually a strength of LMs; and second, a multi-modal neural retriever that retrieves question-specific knowledge from an external knowledge base. In the M-step, the training objective optimizes the ability of the original LMs to generate rough answers as well as refined answers based on the retrieved information. Extensive experiments show that our proposed framework, BootLM, has strong retrieval ability and achieves state-of-the-art performance on knowledge-based VQA tasks.
Fund: Supported by the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR, the Global STEM Professorship Scheme (P0046113), and the Henry G. Leong Endowed Professorship in Elderly Vision Health.
Abstract: Purpose: To develop a bilingual multimodal visual question answering (VQA) benchmark for evaluating vision-language models (VLMs) in ophthalmology. Methods: In this cross-sectional study, ophthalmic image posts and associated captions published between Jan 1, 2016, and Dec 31, 2024, were collected from WeChat Official Accounts. Based on these captions, bilingual question-answer (QA) pairs in Chinese and English were generated using GPT-4o-mini. QA pairs were categorized into six subsets by question type and language: binary (Binary_CN, Binary_EN), single-choice (Single-choice_CN, Single-choice_EN), and open-ended (Open-ended_CN, Open-ended_EN). The benchmark was used to evaluate six VLMs: GPT-4o, Gemini 2.0 Flash, Qwen2.5-VL-72B-Instruct, Janus-Pro-7B, InternVL3-8B, and HealthGPT-L14. The primary outcome was overall accuracy; secondary outcomes included subset-, subspeciality-, and modality-specific accuracy. Performance on open-ended questions was also quantified using language-based metrics, including AlignScore, BARTScore, BERTScore, BLEU, CIDEr, METEOR, and ROUGE_L. Error types in open-ended responses were manually analyzed through stratified sampling. Results: OphthalWeChat included 3469 images and 30120 QA pairs covering 9 ophthalmic subspecialties, 548 conditions, 29 imaging modalities, and 68 modality combinations. Gemini 2.0 Flash achieved the highest overall accuracy (0.555), significantly outperforming GPT-4o (0.527), Qwen2.5-VL-72B-Instruct (0.520), HealthGPT-L14 (0.502), InternVL3-8B (0.453), and Janus-Pro-7B (0.333) (all P<0.001). It also led in both the Chinese (0.551) and English (0.559) subsets. By subset, Gemini 2.0 Flash excelled in Binary_CN (0.687) and Single-choice_CN (0.666); HealthGPT-L14 performed best in Single-choice_EN (0.739); while GPT-4o ranked highest in Binary_EN (0.717), Open-ended_CN (0.254), and Open-ended_EN (0.271). Language-based metrics showed inconsistent rankings relative to accuracy in the open-ended subsets. Performance varied across subspecialties and modalities, with Gemini 2.0 Flash leading in 6 of 9 subspecialties and 11 of the top 15 imaging modalities. Error-type analysis revealed lesion/diagnosis errors as the most frequent (35.6%-50.6%), followed by anatomical-location errors (28.3%-37.5%). Conclusions: This study presents the first bilingual VQA benchmark for ophthalmology, distinguished by its real-world context and inclusion of multiple examinations per patient. The dataset enables quantitative evaluation of VLMs, supporting the development of accurate and specialized AI systems for eye care.
Fund: Supported by the National Natural Science Foundation of China (No. 62271125).
Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly treated only as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Existing work demonstrates that the performance of in-context learning strongly depends on the quality and order of the demonstrations in prompts. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy identifies effective demonstrations and determines their optimal order, thereby prompting the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, demonstrations are selected and arranged based on their semantic similarity and quality for each question, without requiring additional annotations. Furthermore, a series of experiments is conducted on the A-OKVQA and OKVQA datasets. The results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
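The demonstration selection and ordering step above can be sketched as follows. This is a simplified illustration under assumptions: `sims[i]` stands in for a precomputed semantic-similarity score (e.g., from sentence embeddings), and placing the most similar demonstration closest to the question is one common ordering heuristic, not necessarily the paper's exact rule.

```python
def build_prompt(question, pool, sims, k=2):
    # Rank candidate demonstrations by similarity to the question, keep
    # the top k, and place the most similar one last (adjacent to the
    # question), since in-context learning is sensitive to demonstration
    # order. The frozen LLM is then prompted to generate a knowledge
    # statement after "Knowledge:".
    ranked = sorted(zip(sims, pool), reverse=True)[:k]
    demos = [d for _, d in reversed(ranked)]  # least similar first
    return "\n".join(demos + [f"Q: {question}\nKnowledge:"])
```

The generated knowledge statement would then be prepended to the VQA input, doubling as an interpretable rationale for the final answer.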
Fund: Sponsored by the National Natural Science Foundation of China (Grant No. 60305009) and the Ph.D. Degree Teacher Foundation of North China Electric Power University (Grant No. H0585).
Abstract: An Automatic Question Answer System (QAS) is a high-powered, Internet-based software system. Its key technologies rest on natural language understanding, including the construction of a knowledge base and corpus; word segmentation and POS tagging of text; and grammatical and semantic analysis of sentences. This thesis mainly discusses the semantic-network-based representation of knowledge in a QAS, a stochastic syntax-parsing model of knowledge named LSF, and the structure and constitution of a QAS. The LSF model's parameters were trained, demonstrating their feasibility. Moreover, through a limited-domain QAS that we developed for banks, these technologies were shown to be effective and generalizable.
Abstract: As an important data carrier, tables can hold large amounts of high-value information in a compact form and are widely used in fields such as economics, finance, and scientific research. Table Question Answering (TableQA) aims to automatically reason over tabular data and generate answers to questions posed in natural language, and it is an important research direction at the intersection of natural language processing and data analysis. Compared with traditional text QA and knowledge-base QA, table QA must not only understand natural language but also parse the two-dimensional structure of tables and handle numerical computation and complex logical reasoning, and it therefore faces greater challenges. In recent years, with the continued construction of diverse datasets, table QA technology has made steady progress. Its research paradigm has evolved from rule- and template-based methods, through statistical learning and neural network models, to the introduction of pre-trained language models, with overall performance improving throughout. In particular, the recent rise of Large Language Models (LLMs) has pushed table QA into a new stage of development. With their outstanding cross-task generalization and reasoning abilities, LLMs have accelerated the formation and development of new research paradigms and provided strong support for methodological innovation. This paper systematically reviews the evolution of table QA and its representative methods, summarizes the latest progress driven by LLMs, outlines the key challenges facing current research, and discusses future development trends.
Abstract: In the era of large models, automatic question answering systems exhibit many new characteristics. Through a literature review, this paper summarizes the characteristics of automatic QA systems and their evaluation frameworks. Across the stages of QA model training and inference, including training data, pre-training frameworks, model post-processing, and parameter-efficient fine-tuning, it contrasts the early large-model training approach of pursuing ever-larger data and parameter scales with today's emphasis on data and model efficiency, and systematically analyzes the new characteristics of QA systems based on large models. It also surveys the current evaluation frameworks for large QA models of various types and examines in detail the datasets, evaluation metrics, and quantitative computation methods of the automated evaluation framework HELM (holistic evaluation of language models) on QA tasks. Future research on large-model-based automatic QA systems will further expand and deepen along several directions: multimodal fusion, high security, high interpretability, low resource consumption, and comprehensive evaluation frameworks that combine large models with automation.
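Automated QA evaluation of the kind aggregated by suites such as HELM typically rests on simple string-match metrics. The sketch below shows two standard ones, exact match and token-overlap F1; it is a generic illustration, not HELM's exact normalization pipeline, which additionally strips punctuation and articles.

```python
def exact_match(pred, gold):
    # 1 if the prediction equals the reference after trivial normalization.
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # Token-overlap F1: harmonic mean of precision and recall over the
    # multiset of whitespace tokens shared by prediction and reference.
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if not common:
        return 0.0
    precision = common / len(p)
    recall = common / len(g)
    return 2 * precision * recall / (precision + recall)
```

Per-example scores are then averaged over a dataset, giving the accuracy-style numbers that evaluation frameworks report for QA tasks.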