Background: External knowledge representations play an essential role in knowledge-based visual question answering to better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in representing some complex relations, resulting in a lack of topic-related knowledge and redundancy in topic-irrelevant information. Methods: To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane compare equally and to filter out sufficient topic-irrelevant information, two losses are proposed to learn the triplet representations from the complementary views: range loss and orthogonal loss. To interpret the capability of extracting topic-related knowledge, we present the Topic Similarity (TS) between topic and entity-relations. Results: Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperformed state-of-the-art methods by 2.12% and 3.24% on two challenging knowledge-request datasets: OK-VQA and KRVQA, respectively. Conclusions: The obvious advantages of our model in TS show that using hyperplane embedding to represent multimodal knowledge can improve its ability to extract topic-related knowledge.
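The hyperplane projection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a TransH-style projection, in which a vector v is projected onto a hyperplane with unit normal w as v - (w·v)w. The abstract does not give MKEAH's actual formulation or its range and orthogonal losses, so the function names and toy vectors here are illustrative only.

```python
import math


def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    # Scale w to unit length so it can serve as a hyperplane normal.
    norm = math.sqrt(dot(w, w))
    return [c / norm for c in w]


def project_onto_hyperplane(v, w):
    # Remove the component of v along the unit normal: v - (w.v) w.
    # (Assumed TransH-style projection, not taken from the paper.)
    w = normalize(w)
    along = dot(v, w)
    return [a - along * b for a, b in zip(v, w)]


# Toy example: the hyperplane is the x-y plane (normal along z).
projected = project_onto_hyperplane([1.0, 2.0, 3.0], [0.0, 0.0, 2.0])
print(projected)  # [1.0, 2.0, 0.0]
```

After projection the vector has no component along the normal, which is the property an orthogonality-style constraint would typically check.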
In the field of natural language processing (NLP), there have been various pre-training language models in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, the issue of increasingly larger models and a growing number of parameters has surfaced. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model, PAL-BERT, based on the ALBERT model according to the characteristics of question-answering (QA) systems and language models. Firstly, a first-order network pruning method based on the ALBERT model is designed, and the PAL-BERT model is formed. Then, the parameter optimization strategy of the PAL-BERT model is formulated, and the Mish function is used as the activation function instead of ReLU to improve performance. Finally, comparison experiments with the traditional deep learning models TextCNN and BiLSTM confirm that PAL-BERT is a pruning-based model compression method that can significantly reduce training time and optimize training efficiency. Compared with traditional models, PAL-BERT significantly improves performance on the NLP task.
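For readers unfamiliar with the activation swap mentioned above, the following is a minimal sketch of the standard Mish definition, Mish(x) = x · tanh(softplus(x)), next to ReLU. It is a scalar illustration only and does not reproduce how PAL-BERT applies the function inside the network.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)); smooth, and slightly negative
    # for negative inputs instead of being clipped to zero.
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    # ReLU(x) = max(x, 0).
    return max(x, 0.0)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

Unlike ReLU, Mish passes a small negative signal for negative inputs and is differentiable everywhere, which is the usual motivation given for preferring it.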
The weapon and equipment operational requirement analysis (WEORA) is a necessary condition for winning a future war, in which the acquisition of knowledge about weapons and equipment is a great challenge. The main challenge is that the existing weapons and equipment data lacks a structured knowledge representation, and knowledge navigation based on natural language cannot efficiently support the WEORA. To solve the above problem, this research proposes a method based on question answering (QA) over a weapons and equipment knowledge graph (WEKG) to construct and navigate the knowledge related to weapons and equipment in the WEORA. The method first constructs the WEKG and then builds a neural network-based QA system over the WEKG by means of semantic parsing for knowledge navigation. Finally, the method is evaluated and a chatbot on the QA system is developed for the WEORA. The proposed method performs well in the accuracy and efficiency of searching for target knowledge and can effectively assist the WEORA.
Abstract: Editors: Yang Wang, Xi'an Jiaotong University; Dongbo Shi, Shanghai Jiaotong University; Ye Sun, University College London; Zhesi Shen, National Science Library, CAS. Topic of the Special Issue: What are the top questions towards better science and innovation and the required data to answer these questions?
Funding: Supported by the Program for Liaoning Excellent Talents in University (No. LR15045) and the Liaoning Provincial Science and Technology Department Applied Basic Research Plan (No. 101300243).
Abstract: Medical visual question answering (MedVQA) faces unique challenges due to the high precision required for images and the specialized nature of the questions. These challenges include insufficient feature extraction capabilities, a lack of textual priors, and incomplete information fusion and interaction. This paper proposes an enhanced bootstrapping language-image pre-training (BLIP) model for MedVQA based on multimodal feature augmentation and triple-path collaborative attention (FCA-BLIP) to address these issues. First, FCA-BLIP employs a unified bootstrap multimodal model architecture that integrates ResNet and bidirectional encoder representations from Transformers (BERT) models to enhance feature extraction capabilities. It enables a more precise analysis of the details in images and questions. Next, the pre-trained BLIP model is used to extract features from image-text sample pairs. The model can understand the semantic relationships and shared information between images and text. Finally, a novel attention structure is developed to fuse the multimodal feature vectors, thereby improving the alignment accuracy between modalities. Experimental results demonstrate that the proposed method performs well in clinical visual question-answering tasks. For the MedVQA task of staging diabetic macular edema in fundus imaging, the proposed method outperforms the existing major models in several performance metrics.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62433005, 62272036, 62132003, and 62173167).
Abstract: The consultation intention of emergency decision-makers in urban rail transit (URT) is input into the emergency knowledge base in the form of domain questions to obtain emergency decision support services. This approach facilitates the rapid collection of complete knowledge and rules to form effective decisions. However, the degree of structuring of the URT emergency knowledge base remains low, and the domain questions lack labeled datasets, resulting in a large deviation between the consultation outcomes and the intended objectives. To address this issue, this paper proposes a question intention recognition model for the URT emergency domain, leveraging knowledge graph (KG) and data enhancement technology. First, structured storage of emergency cases and emergency plans is realized based on the KG. Subsequently, a comprehensive question template is developed, and the labeled dataset of emergency domain questions in URT is generated through the KG. Lastly, data enhancement is applied through prompt learning and the NLP Chinese Data Augmentation (NLPCDA) tool, and an intention recognition model combining Generalized Autoregressive Pretraining for Language Understanding (XLNet) and Recurrent Convolutional Neural Network for Text Classification (TextRCNN) is constructed. Word embeddings are generated by XLNet, context information is further captured using a Bidirectional Long Short-Term Memory neural network (BiLSTM), and salient features are extracted with a Convolutional Neural Network (CNN). Experimental results demonstrate that the proposed model can enhance the clarity of classification and the identification of domain questions, thereby providing supportive knowledge for emergency decision-making in URT.
Abstract: Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the utilized multimodal bilinear pooling fusion technique in the trade-off between the model complexity and the overall model performance. Some fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance in this case of VQA systems. Experiments indicate that these multimodal bilinear pooling fusion techniques improved the VQA model’s performance, reaching a best performance of 89.25%. Further, experiments show that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between the model complexity and its performance for VQA systems designed to answer yes/no questions.
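The link between the number of answers and the complexity savings can be made concrete with a rough parameter count. The sketch below compares a full bilinear layer with a generic low-rank factorization (two projections fused by an element-wise product, then a classifier). It is not necessarily any of the eight techniques compared in the paper. The 2048-dimensional image vector matches ResNet-152's pooled feature size, while the 1024-dimensional question vector and the rank of 1000 are assumed values; biases are ignored.

```python
def full_bilinear_params(d_img: int, d_q: int, n_answers: int) -> int:
    # One d_img x d_q weight matrix per output unit (answer).
    return d_img * d_q * n_answers


def low_rank_bilinear_params(d_img: int, d_q: int, rank: int, n_answers: int) -> int:
    # Two projections into a shared rank-dimensional space, fused by
    # element-wise product, then a linear classifier over the answers.
    return d_img * rank + d_q * rank + rank * n_answers


D_IMG, D_Q, RANK = 2048, 1024, 1000  # D_Q and RANK are assumptions

for n_answers in (2, 3000):
    full = full_bilinear_params(D_IMG, D_Q, n_answers)
    low = low_rank_bilinear_params(D_IMG, D_Q, RANK, n_answers)
    print(f"answers={n_answers}: full={full:,} low_rank={low:,} ratio={full / low:.1f}x")
```

Under these assumptions the factorization saves roughly three orders of magnitude with 3000 answers but only about a third with two answers, which is consistent with the abstract's observation that the answer count governs how much complexity these techniques can remove.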
Abstract: In the context of power generation companies, vast amounts of specialized data and expert knowledge have been accumulated. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning for Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. Additionally, the proposed DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
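The idea of combining per-item knowledge weights with query similarity during retrieval can be sketched as follows. The abstract does not define how DKBUM or WCAS are computed, so the scoring rule here (cosine similarity multiplied by a stored weight), the `retrieve` function, and the toy vectors are all hypothetical stand-ins rather than the paper's method.

```python
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, items, top_k=2):
    # Hypothetical scoring: similarity scaled by a per-item weight that
    # an updating mechanism could raise for recent or relevant entries.
    scored = [(cosine(query_vec, it["vec"]) * it["weight"], it["text"]) for it in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy knowledge items with made-up embeddings and weights.
knowledge = [
    {"text": "A", "vec": [1.0, 0.0], "weight": 0.5},  # exact match, stale
    {"text": "B", "vec": [0.8, 0.6], "weight": 1.0},  # close match, fresh
    {"text": "C", "vec": [0.0, 1.0], "weight": 1.0},  # unrelated
]
print(retrieve([1.0, 0.0], knowledge))  # ['B', 'A']
```

In this toy case the fresher item B outranks the exact but down-weighted match A, which shows how a weighting step can change what context is handed to the language model.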
Funding: Innovative and Entrepreneurial Team Program of Jiangsu Province, China, Grant/Award Number: JSSCTD202140; Innovative and Entrepreneurial Doctor Program of Jiangsu Province, Grant/Award Number: KYCX20_0114; National Natural Science Foundation of China, Grant/Award Numbers: 41831281, 52104121.
Abstract: Microwave fracturing of rocks before mechanical breakage could improve the performance of mechanical excavators and reduce environmental impacts. Previous research focused on the microwave fracturing of intact rock blocks. By using an open-ended antenna, this paper investigates the effect of pre-existing joints on the microwave fracturing of Singapore Bukit Timah granite blocks. The results show that the specimens are weakened by cracking, spalling, melting, or a combination of these. The crack number and the total crack length produced by microwave treatment of jointed rock blocks are slightly smaller than those in the intact rock blocks. The interaction between joints and microwave-induced cracks can be summarized into the following four patterns: (1) microwave-induced cracks become arrested so that the crack propagation is terminated; (2) microwave-induced cracks penetrate the joints and continue to propagate; (3) microwave-induced cracks become deflected along the joints; and (4) microwave-induced cracks propagate forward following the joints. The smaller the approach angle between the microwave-induced crack and the pre-existing joint is, the more microwave-induced cracks tend to be arrested at the joint. Increasing the approach angle between the microwave-induced crack and the joint can increase the chance of a microwave-induced crack penetrating the joint. The results also show that the smaller the distance is between the microwave radiation point and the joint, the easier it is for microwave-induced cracks to penetrate the joints; otherwise, the microwave-induced crack is more likely to be arrested at the pre-existing joint.
Abstract: During the installation of a pipe pile, the soil around the pile is squeezed out. This paper deals with this squeezing effect of open-ended pipe piles using the cylindrical cavity expansion theory. The characteristics of soil with different tension and compression moduli and dilation are accounted for by applying the elastic theory with different moduli and logarithmic strain. Closed-form solutions are obtained for the radius of the plastic region, the displacement of the boundary between the plastic region and the elastic region, and the expansion pressure on the external surface of the pipe piles. In obtaining these solutions, the soil plug in the open-ended pipe pile is considered by employing an incremental filling ratio to quantify the degree of soil plugging. Moreover, the effects of the ratio of tension and compression moduli, angle of dilation, and incremental filling ratio on the radius of the plastic region and the expansion pressure on the external surface of the pipe pile are investigated. The parametric analyses show that it is necessary to consider the difference between the tension modulus and compression modulus, the dilation angle, and the incremental filling ratio when studying the squeezing effect of open-ended pipe pile installation. It is concluded that the analytical solutions presented in this paper are suitable for studying the squeezing effect of open-ended pipe piles.
Funding: Supported by the Guangdong Province Graduate Education Innovation Program (2021JGXM103) and the 2020 “Research on Talents” Project of the Guangdong Planning Office of Philosophy and Social Science.
Abstract: Flawed engineering practice is considered the main factor affecting the development quality of engineering postgraduates. Taking Foshan Base as an example, this paper analyzes the operational pattern, practice teaching model, and internal governance system of the open-ended base as a new system for engineering practice, and proposes several suggestions for reforming the training of engineering postgraduates based on the construction effect.
Funding: Supported in part by the National Key Research and Development Program of China, No. 2021ZD0112400; National Natural Science Foundation of China, No. U1908214; Program for Innovative Research Team at the University of Liaoning Province, No. LT2020015; the Support Plan for Key Field Innovation Team of Dalian, No. 2021RT06; the Support Plan for Leading Innovation Team of Dalian University, No. XLJ202010; Program for the Liaoning Province Doctoral Research Starting Fund, No. 2022-BS-336; Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), and Ministry of Education, No. ADIC2022003; Interdisciplinary Project of Dalian University, No. DLUXK-2023-QN-015.
Abstract: With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery that can localize a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to integrate visual and textual prompts into the encoding process of the model. The visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding the textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Funding: Supported by National Key Research and Development Program: 2017YFC1703606; Shenzhen Science and Technology Program: JCYJ20210324120804012; 2021 Luohu Soft Science Research Program Project: LX20210102; 2022 Luohu Soft Science Research Program Project: LX202202128.
Abstract: Determining clinical questions is fundamental to the development of clinical practice guidelines (CPGs), bridging the initial phases and the final recommendations. It is essential for evidence retrieval and the formulation of recommendations. The scientific rigor and precision with which clinical questions are determined directly influence the future implementation and applicability of guidelines. In 2020, the World Federation of Acupuncture-Moxibustion Societies initiated a project to develop a clinical practice guideline on acupuncture and moxibustion for adult major depressive disorder (mild to moderate), to address clinical and medical decision-making issues in acupuncture treatment for this condition. This CPG provides systematic recommendations based on clinical evidence, patient values, and other factors, aiding decision-makers, clinicians, and patients in selecting appropriate interventions. This paper discusses and analyzes the process of determining clinical questions and the related issues that arose during the development of this guideline, aiming to provide a reference for determining clinical questions and developing CPGs in the field of acupuncture, and to explore more scientific tools and methods for determining clinical questions in future CPGs.
Funding: Supported by the National Natural Science Foundation of China (61976160, 61906137, 61976158, 62076184, 62076182) and the Shanghai Science and Technology Plan Project (21DZ1204800).
Abstract: Background: External knowledge representations play an essential role in knowledge-based visual question answering, helping models better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in representing some complex relations, resulting in a lack of topic-related knowledge and redundancy in topic-irrelevant information. Methods: To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane are comparable and to filter out sufficient topic-irrelevant information, two losses are proposed to learn the triplet representations from complementary views: range loss and orthogonal loss. To interpret the capability of extracting topic-related knowledge, we present the Topic Similarity (TS) between topic and entity-relations. Results: Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperformed state-of-the-art methods by 2.12% and 3.24% on two challenging knowledge-request datasets, OK-VQA and KRVQA, respectively. Conclusions: The clear advantage of our model in TS shows that using hyperplane embedding to represent multimodal knowledge can improve its ability to extract topic-related knowledge.
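The abstract above does not give the MKEAH formulation, so the following is only a generic illustration of what "projecting a feature vector onto a hyperplane" means, in the style of TransH-type knowledge embeddings. The function names and the example vectors are assumptions made for this sketch; it does not implement the paper's range loss or orthogonal loss.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    """Scale w to unit length; a hyperplane normal must be non-zero."""
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("hyperplane normal must be non-zero")
    return [x / norm for x in w]


def project_onto_hyperplane(v, normal):
    """Remove from v its component along the normal of the hyperplane.

    With unit normal w, the projection is v - (w . v) * w, which is
    orthogonal to w by construction.
    """
    w = normalize(normal)
    scale = dot(v, w)
    return [vi - scale * wi for vi, wi in zip(v, w)]


if __name__ == "__main__":
    # Hypothetical 3-d entity vector and hyperplane normal.
    entity = [1.0, 2.0, 3.0]
    normal = [0.0, 0.0, 2.0]
    print(project_onto_hyperplane(entity, normal))
```

The projected vector keeps only the part of the embedding that lies in the hyperplane, which is the geometric idea behind using a hyperplane to separate relation-relevant from relation-irrelevant components.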
Funding: Supported by the Sichuan Science and Technology Program (2021YFQ0003, 2023YFSY0026, 2023YFH0004).
Abstract: In the field of natural language processing (NLP), various pre-trained language models have emerged in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, models have grown increasingly large, with a growing number of parameters. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model, PAL-BERT, based on the ALBERT model and tailored to the characteristics of question-answering (QA) systems and language models. First, a first-order network pruning method based on the ALBERT model is designed, forming the PAL-BERT model. Then, the parameter optimization strategy of the PAL-BERT model is formulated, and the Mish function is used as the activation function instead of ReLU to improve performance. Finally, comparison experiments with the traditional deep learning models TextCNN and BiLSTM confirm that PAL-BERT is a pruning-based model compression method that can significantly reduce training time and optimize training efficiency. Compared with traditional models, PAL-BERT significantly improves performance on the NLP task.
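For readers unfamiliar with the activation swap mentioned above, here is a minimal sketch of the standard Mish function, mish(x) = x · tanh(softplus(x)), next to ReLU. It uses only the Python standard library and says nothing about how PAL-BERT wires the activation into its layers; the helper names are chosen for this illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

Unlike ReLU, Mish is smooth and lets small negative values pass through, while approaching the identity for large positive inputs.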
Abstract: Weapon and equipment operational requirement analysis (WEORA) is a necessary condition for winning a future war, and acquiring knowledge about weapons and equipment is one of its major challenges. The main difficulty is that existing weapons and equipment data lacks a structured knowledge representation, and knowledge navigation based on natural language cannot efficiently support the WEORA. To solve this problem, this research proposes a method based on question answering (QA) over a weapons and equipment knowledge graph (WEKG) to construct and navigate the knowledge related to weapons and equipment in the WEORA. The method first constructs the WEKG and then builds a neural network-based QA system over the WEKG by means of semantic parsing for knowledge navigation. Finally, the method is evaluated and a chatbot based on the QA system is developed for the WEORA. The proposed method performs well in both the accuracy and the efficiency of searching for target knowledge, and can effectively assist the WEORA.