With the continuous expansion of demand in China for the integration of medical care and elderly care, more social capital will be directed into this field. Although answers to the question "What is happiness?" may vary among young people, for most senior citizens the answer is by and large the same: to be looked after properly.
Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to associate relevant visual regions with input questions and thereby answer them effectively. Detection-based features extracted by an object detection network capture the visual attention distribution over predetermined detection frames and provide object-level insights, answering questions about foreground objects more effectively. However, they cannot answer questions about background regions that lack detection boxes, because they miss fine-grained detail, which is where grid-based features excel. In this paper, we propose a Dual-Level Feature Embedding (DLFE) network, which integrates grid-based and detection-based image features in a unified architecture to realize the complementary advantages of both. Specifically, in DLFE, a novel Dual-Level Self-Attention (DLSA) module is first proposed to mine the intrinsic properties of the two feature types, where Positional Relation Attention (PRA) is designed to model position information. Then, we propose Feature Fusion Attention (FFA) to address the semantic noise caused by fusing the two features, and construct an alignment graph to enhance and align the grid and detection features. Finally, we use co-attention to learn interactive features of the image and question and answer questions more accurately. Our method improves significantly over the baseline, increasing accuracy from 66.01% to 70.63% on the test-std set of VQA 1.0 and from 66.24% to 70.91% on the test-std set of VQA 2.0.
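The dual-level fusion described above can be illustrated with a minimal sketch: a pooled question vector attends separately over grid-level and object-level features, and the two context vectors are concatenated. This is a generic NumPy illustration with toy shapes, not DLFE's actual architecture; all names and dimensions are invented.

```python
import numpy as np

def scaled_dot_attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
grid = rng.normal(size=(49, 64))     # grid-based features (e.g. a 7x7 map)
regions = rng.normal(size=(10, 64))  # detection-based region features
question = rng.normal(size=(1, 64))  # pooled question embedding

# The question attends over each feature level; the two context
# vectors are then concatenated into a dual-level representation.
grid_ctx = scaled_dot_attention(question, grid, grid)
region_ctx = scaled_dot_attention(question, regions, regions)
fused = np.concatenate([grid_ctx, region_ctx], axis=-1)
print(fused.shape)  # (1, 128)
```

In the full model this concatenation would be replaced by the learned FFA module, but the complementary-views idea is the same.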
Prepare well before you go to a job interview. First, understand the company's goals; then it will be easier to answer questions about them. Second, learn about the job. Third, practice answering common interview questions. Fourth, wear nice clothes and arrive at your interview on time. And after the interview.
Funding: Supported by the Major Program of the National Social Science Foundation of China (No. 23&ZD240).
Abstract: It has long been noticed that focus can influence the truth-conditions of counterfactual conditionals: stressing different parts of a counterfactual leads to distinct interpretations. However, existing theories, such as those by von Fintel and Rooth, fail to adequately account for this phenomenon. In this paper, I lay out the drawbacks of these theories and then propose a novel account, i.e., the Good Question-Answer (GQA) view. The GQA account posits that focus triggers question-answer pairs, and that pragmatic pressures concerning the adequacy of such question-answer pairs in context can affect the truth-conditions of counterfactuals. I also argue for the GQA account by appeal to its theoretical virtues.
Funding: Supported by the Program for Liaoning Excellent Talents in University (No. LR15045) and the Liaoning Provincial Science and Technology Department Applied Basic Research Plan (No. 101300243).
Abstract: Medical visual question answering (MedVQA) faces unique challenges due to the high precision required for images and the specialized nature of the questions. These include insufficient feature-extraction capability, a lack of textual priors, and incomplete information fusion and interaction. This paper proposes an enhanced bootstrapping language-image pre-training (BLIP) model for MedVQA based on multimodal feature augmentation and triple-path collaborative attention (FCA-BLIP) to address these issues. First, FCA-BLIP employs a unified bootstrapped multimodal architecture that integrates ResNet and bidirectional encoder representations from Transformers (BERT) to strengthen feature extraction, enabling more precise analysis of the details in images and questions. Next, the pre-trained BLIP model extracts features from image-text sample pairs, allowing the model to capture the semantic relationships and shared information between images and text. Finally, a novel attention structure fuses the multimodal feature vectors, improving alignment accuracy between modalities. Experimental results demonstrate that the proposed method performs well on clinical visual question-answering tasks. For the MedVQA task of staging diabetic macular edema in fundus imaging, it outperforms existing major models on several performance metrics.
Abstract: With advances in parameter-efficient transfer learning, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost, low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream adaptation. This paper introduces UniTrans, a framework for efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components, the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks show that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters, and it even outperforms fully fine-tuned models on certain benchmarks.
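Vector-based random matrix adaptation (the family of techniques UniTrans's adapter name points to, in the spirit of VeRA) freezes a pair of shared, randomly initialized projection matrices and trains only two small scaling vectors per layer. The NumPy sketch below uses assumed toy shapes and illustrates the general technique, not UniTrans's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r = 64, 64, 8

W = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight (4096 params)
A = rng.normal(size=(r, d_in))      # frozen, randomly initialized, shared
B = rng.normal(size=(d_out, r))     # frozen, randomly initialized, shared
d = np.full(r, 0.1)                 # trainable scaling vector (rank dim)
b = np.zeros(d_out)                 # trainable scaling vector (output dim)

def adapted_forward(x):
    # y = (W + diag(b) @ B @ diag(d) @ A) @ x; only d and b are trained.
    delta = (b[:, None] * B) @ (d[:, None] * A)
    return (W + delta) @ x

x = rng.normal(size=d_in)
y = adapted_forward(x)
assert np.allclose(y, W @ x)  # b starts at zero, so the adapter is a no-op
print(d.size + b.size)  # 72 trainable params, vs. 4096 in W itself
```

The parameter count is why such adapters scale to the "5% of trainable parameters" regime: the per-layer trainable state is two vectors rather than a full (or even low-rank) matrix.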
Funding: Supported by the Hunan Province Traditional Chinese Medicine Research Project (No. B2023043), the Hunan Provincial Department of Education Scientific Research Project (No. 22B0386), the Research Project of Hunan Provincial Health Commission (No. 20256982), and the Hunan University of Traditional Chinese Medicine Campus-Level Research Fund Project (No. 2022XJZKC004).
Abstract: AIM: To develop a traditional Chinese medicine (TCM) knowledge graph (KG) for diabetic retinopathy (DR) diagnosis and treatment by integrating literature and medical records, thereby enhancing TCM knowledge accessibility and providing innovative approaches for TCM inheritance and DR management. METHODS: First, a KG framework was established with a schema-layer design. Second, high-quality literature and electronic medical records served as data sources; named entity recognition was performed using the ALBERT-BiLSTM-CRF model, and semantic relationships were curated by domain experts. Third, knowledge fusion was achieved mainly through an alias library. The data layer was then mapped to the schema layer to refine the KG, and the knowledge was stored in Neo4j. Finally, exploratory work on intelligent question answering was conducted on the constructed KG. RESULTS: A KG for TCM diagnosis and treatment was constructed in Neo4j, incorporating 6 label types, 5 relationship types, 5 attribute types, 822 nodes, and 1,318 relationship instances. This KG supports logical reasoning and intelligent question answering; the question-answering model achieved 95% precision, 95% recall, and a 95% weighted F1-score. CONCLUSION: This study proposes a semi-automatic knowledge-mapping scheme that balances integration efficiency and accuracy. Clinical data-driven entity and relationship construction enables digital dialectical reasoning, and exploratory applications show the KG's potential in intelligent question answering, providing new insights for TCM health management.
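At its core, template-based question answering over such a KG amounts to matching a question to an (entity, relation) pair and reading off the linked objects. A toy in-memory sketch follows; the entities and relations are invented placeholders, not the paper's schema, and a real deployment would issue the equivalent Cypher query against Neo4j.

```python
# Toy in-memory triple store standing in for the Neo4j graph.
# Entity and relation names are invented placeholders, not the paper's schema.
triples = [
    ("diabetic retinopathy", "has_symptom", "blurred vision"),
    ("diabetic retinopathy", "treated_by", "formula A"),
    ("formula A", "contains", "herb X"),
]

def answer(entity, relation):
    """Template-style QA: list every object linked to `entity` by `relation`.
    Against Neo4j this would be a Cypher pattern match along the lines of:
    MATCH (s {name: $entity})-[r]->(o) WHERE type(r) = $relation RETURN o.name
    """
    return [o for s, r, o in triples if s == entity and r == relation]

print(answer("diabetic retinopathy", "treated_by"))  # ['formula A']
```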
Funding: Supported by the National Natural Science Foundation of China (No. 62001313), the Liaoning Professional Talent Project (No. XLYC2203046), and the Shenyang Municipal Medical Engineering Cross Research Foundation of China (No. 22-321-32-09).
Abstract: Medical visual question answering (MedVQA) aims to enhance diagnostic confidence and deepen patients' understanding of their health conditions. While the Transformer architecture is widely used in multimodal fields, its application to MedVQA requires further enhancement. A critical limitation of contemporary MedVQA systems is their inability to integrate lifelong knowledge with specific patient data to generate human-like responses; existing Transformer-based MedVQA models need stronger capabilities for interpreting answers through medical image knowledge. The medical knowledge graph visual language transformer (MKGViLT), designed for joint medical knowledge graphs (KGs), addresses this challenge. MKGViLT incorporates an enhanced Transformer structure to effectively extract features and combine modalities for MedVQA tasks, delivering answers grounded in richer background knowledge and thereby enhancing performance. The efficacy of MKGViLT is evaluated on the SLAKE and P-VQA datasets; experimental results show that it surpasses the most advanced methods on the SLAKE dataset.
Abstract: The medical education of the Song dynasty constitutes a pivotal aspect within the broader framework of ancient Chinese medical education. The advent of the imperial examination system coincided with the emergence of a medical examination system, which served as the cornerstone for the subsequent evolution of medical education. According to historical records, the Song government established dedicated medical departments, along with comprehensive systems encompassing medical professors, students, and examinations. By examining extant medical historical documents, such as Tai Yi Ju Zhu Ke Cheng Wen Ge (《太医局诸科程文格》, Examination Answers and Standards of the Imperial Medical Bureau), researchers and readers can obtain a comprehensive understanding of the medical system that prevailed in the Song dynasty. While the intricate details of medical education during this era are not explicitly documented in historical records, modern researchers can recover a full view of medical education, particularly the medical examination system, through rigorous analysis of these extant historical medical documents. Such studies offer valuable insights into the developmental trajectory of the ancient Chinese medical examination system and provide crucial references for contemporary medical education. Through in-depth literature research and analysis of Tai Yi Ju Zhu Ke Cheng Wen Ge, this study endeavors to reconstruct the authentic scenario of medical examinations in the Song dynasty, as presented in the document, for the benefit of modern readers and researchers.
Funding: Supported in part by the National Key Research and Development Program of China (No. 2021ZD0112400); the National Natural Science Foundation of China (No. U1908214); the Program for Innovative Research Team at the University of Liaoning Province (No. LT2020015); the Support Plan for Key Field Innovation Team of Dalian (No. 2021RT06); the Support Plan for Leading Innovation Team of Dalian University (No. XLJ202010); the Program for the Liaoning Province Doctoral Research Starting Fund (No. 2022-BS-336); the Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education (No. ADIC2022003); and the Interdisciplinary Project of Dalian University (No. DLUXK-2023-QN-015).
Abstract: With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers but fail to indicate the location of the relevant content within the image, which restricts the interpretative capacity of VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enable precise multimodal information interactions. Specifically, two complementary prompters were introduced to integrate visual and textual prompts into the model's encoding process: a visual complementary prompter merges visual prompt knowledge with visual features to guide accurate localization, while a textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding the model towards a more accurate inference of the answer. Additionally, a multiple-iteration fusion strategy was adopted for comprehensive answer reasoning, ensuring high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Funding: Supported by the Sichuan Science and Technology Program (Nos. 2021YFQ0003, 2023YFSY0026, 2023YFH0004).
Abstract: In the field of natural language processing (NLP), various pre-trained language models have appeared in recent years, with question-answering systems gaining significant attention. However, as algorithms, data, and computing power advance, models have grown ever larger, with ever more parameters, making training more costly and less efficient. To enhance the efficiency and accuracy of training while reducing model volume, this paper proposes PAL-BERT, a first-order pruning model based on ALBERT and tailored to the characteristics of question-answering (QA) systems and language models. First, a first-order network pruning method based on the ALBERT model is designed, yielding the PAL-BERT model. Then, a parameter optimization strategy for PAL-BERT is formulated, with the Mish function used as the activation function in place of ReLU to improve performance. Finally, comparison experiments with the traditional deep learning models TextCNN and BiLSTM confirm that PAL-BERT is a pruning-based model compression method that significantly reduces training time and optimizes training efficiency, while markedly improving performance on NLP tasks compared with traditional models.
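The two ingredients named above can be sketched briefly: the Mish activation, and a first-order (gradient-aware) importance score for deciding which weights to prune. The scoring rule below is a generic stand-in for first-order pruning, not PAL-BERT's exact criterion.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)); smooth, non-monotonic near 0."""
    return x * np.tanh(np.log1p(np.exp(x)))

def first_order_prune(weights, grads, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest
    first-order importance |w * dL/dw| (a generic gradient-aware score)."""
    importance = np.abs(weights * grads)
    k = int(weights.size * sparsity)
    threshold = np.partition(importance.ravel(), k - 1)[k - 1]
    mask = importance > threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8))
g = rng.normal(size=(8, 8))   # gradients from a backward pass
pruned, mask = first_order_prune(w, g, sparsity=0.5)
print(mask.mean())  # 0.5 of the weights survive (ties aside)
```

Unlike magnitude pruning, the score here keeps small weights whose gradients indicate they still matter to the loss, which is the motivation for first-order criteria.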
Funding: Supported by the Sichuan Science and Technology Program (Nos. 2023YFSY0026, 2023YFH0004).
Abstract: Recent advancements in natural language processing have given rise to numerous pre-trained language models for question-answering systems. However, with the constant evolution of algorithms, data, and computing power, the increasing size and complexity of these models have raised training costs and reduced efficiency. This study aims to minimize the inference time of such models while maintaining computational performance. It proposes DPAL-BERT, a distillation model for PAL-BERT: using knowledge distillation with PAL-BERT as the teacher, two student models are trained, DPAL-BERT-Bi and DPAL-BERT-C. The dataset is enhanced through techniques such as masking, replacement, and n-gram sampling to optimize knowledge transfer. Experimental results show that the distilled models greatly outperform models trained from scratch. Moreover, although the distilled models exhibit a slight decrease in performance compared with PAL-BERT, they reduce inference time to just 0.25% of the original, demonstrating the effectiveness of the proposed approach in balancing model performance and efficiency.
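Knowledge distillation of this kind typically trains the student on temperature-softened teacher probabilities. A minimal NumPy sketch of the standard soft-target loss follows; the temperature value is an assumed hyperparameter, and DPAL-BERT's exact objective may differ.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target loss: cross-entropy between the temperature-softened
    teacher and student distributions, scaled by T^2 as is conventional."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -T * T * np.sum(p_t * np.log(p_s + 1e-12))

teacher = [4.0, 1.0, 0.5]
close_student = [3.9, 1.1, 0.4]   # mimics the teacher's distribution
far_student = [0.5, 4.0, 1.0]     # disagrees with the teacher
assert distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher)
```

In practice this soft-target term is combined with an ordinary hard-label loss on the ground-truth answers, weighted by another hyperparameter.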
Abstract: Weapon and equipment operational requirement analysis (WEORA) is a necessary condition for winning a future war, and acquiring knowledge about weapons and equipment is a major challenge within it. The main difficulty is that existing weapons and equipment data lack structured knowledge representation, and knowledge navigation based on natural language cannot efficiently support the WEORA. To solve this problem, this research proposes a method based on question answering (QA) over a weapons and equipment knowledge graph (WEKG) to construct and navigate the knowledge related to weapons and equipment in the WEORA. The method first constructs the WEKG, then builds a neural-network-based QA system over it by means of semantic parsing for knowledge navigation. Finally, the method is evaluated, and a chatbot built on the QA system is developed for the WEORA. The proposed method performs well in the accuracy and efficiency of retrieving target knowledge and can effectively assist the WEORA.
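The pipeline of semantic parsing followed by a knowledge-graph lookup can be shown with a toy example. The entities, relations, and question template below are invented for illustration; the paper's system uses a learned neural semantic parser, which this pattern-matching stand-in only approximates.

```python
import re

# toy knowledge graph: (head entity, relation) -> tail entity
KG = {
    ("TankX", "max_speed"): "70 km/h",
    ("TankX", "manufacturer"): "FactoryY",
}

# a semantic parser maps a question to a (entity, relation) query;
# here a single regex template plays that role
PATTERN = re.compile(r"what is the (\w+) of (\w+)\??", re.I)

def answer(question):
    """Parse the question into a KG query and look up the answer."""
    m = PATTERN.match(question)
    if not m:
        return "unparsed"
    relation, entity = m.group(1), m.group(2)
    return KG.get((entity, relation), "unknown")

print(answer("what is the max_speed of TankX?"))  # 70 km/h
```

A chatbot front end, as in the abstract, is then just a loop that reads user questions and calls `answer`.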
Funding: Supported by the National Natural Science Foundation of China (61976160, 61906137, 61976158, 62076184, 62076182) and the Shanghai Science and Technology Plan Project (21DZ1204800).
Abstract: Background: External knowledge representations play an essential role in knowledge-based visual question answering for understanding complex scenarios in the open world. Recent entity-relation embedding approaches are deficient at representing some complex relations, resulting in a lack of topic-related knowledge and redundant topic-irrelevant information. Methods: To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane are comparable, and to filter out topic-irrelevant information, two losses are proposed to learn the triplet representations from complementary views: a range loss and an orthogonal loss. To quantify the ability to extract topic-related knowledge, we introduce the Topic Similarity (TS) between topics and entity relations. Results: Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperformed state-of-the-art methods by 2.12% and 3.24% on two challenging knowledge-request datasets, OK-VQA and KRVQA, respectively. Conclusions: The clear advantage of our model in TS shows that using hyperplane embedding to represent multimodal knowledge improves the ability to extract topic-related knowledge.
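The core geometric operation behind hyperplane embedding can be sketched as a TransH-style projection: each entity embedding is projected onto a relation-specific hyperplane before triplets are compared. The vectors below are arbitrary, and the range/orthogonal losses of MKEAH are only hinted at in the comments, not implemented.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

def project_to_hyperplane(e, w):
    """Project embedding e onto the hyperplane with unit normal w:
    e_perp = e - (w . e) w  (TransH-style)."""
    c = dot(w, e)
    return [ei - c * wi for ei, wi in zip(e, w)]

w = normalize([1.0, 1.0, 0.0])  # hypothetical relation hyperplane normal
h = project_to_hyperplane([2.0, 0.0, 1.0], w)  # projected head entity
t = project_to_hyperplane([1.0, 1.0, 3.0], w)  # projected tail entity

# both projections lie on the hyperplane, i.e. orthogonal to its normal;
# a range loss would additionally push |h| and |t| to comparable lengths
print(dot(h, w))  # ~0, up to float error
```

Filtering happens because any component of an embedding parallel to the normal (topic-irrelevant, in MKEAH's reading) is discarded by the projection.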
Abstract: With the continuous expansion of demand in China for integrated medical and elderly care, more social capital will be directed into this field. Although answers to the question "What is happiness?" may vary among young people, for most senior citizens the answer is by and large the same: to be looked after properly.
Abstract: Visual question answering (VQA), a crucial task at the intersection of vision and language, has attracted widespread interest. VQA systems primarily use attention mechanisms to associate relevant visual regions with the input question. Detection-based features extracted by an object detection network capture the visual attention distribution over predetermined detection boxes and provide object-level cues, so they answer questions about foreground objects effectively. However, they cannot answer questions about background regions that lack detection boxes, because they miss fine-grained details; this is precisely the strength of grid-based features. In this paper, we propose a Dual-Level Feature Embedding (DLFE) network that integrates grid-based and detection-based image features in a unified architecture to realize the complementary advantages of both. Specifically, DLFE first applies a novel Dual-Level Self-Attention (DLSA) module to mine the intrinsic properties of the two feature types, with Positional Relation Attention (PRA) designed to model positional information. We then propose Feature Fusion Attention (FFA) to address the semantic noise caused by fusing the two features and construct an alignment graph to enhance and align the grid and detection features. Finally, we use co-attention to learn interactive image-question features and answer questions more accurately. Our method improves significantly over the baseline, increasing accuracy from 66.01% to 70.63% on the test-std set of VQA 1.0 and from 66.24% to 70.91% on the test-std set of VQA 2.0.
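The co-attention step at the end of this pipeline can be illustrated with plain scaled dot-product attention: a question vector attends over a pooled set of image features and returns their weighted sum. The vectors are toy stand-ins for the question embedding and the grid/detection features, not the paper's actual DLSA or FFA modules.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention: weight each value by how well its
    key matches the query, then return the weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)  # subtract max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

question = [1.0, 0.0]                # toy question embedding
keys = [[1.0, 0.0], [1.0, 0.0]]      # two identical keys -> equal weights
values = [[2.0, 2.0], [4.0, 4.0]]    # toy grid/detection feature values
print(attend(question, keys, values))  # [3.0, 3.0], the mean of the values
```

In DLFE the same mechanism runs in both directions (question attends to image regions and vice versa), which is what "co-attention" refers to.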
Abstract: Prepare well before you go to a job interview. First, understand the company's goals; it will then be easier to answer questions about them. Second, learn about the job. Third, practice answering common interview questions. Fourth, wear nice clothes and arrive at your interview on time. And after the interview.