Abstract: This paper reports the results of a study on "questionable" practices and behaviors in academic production by researchers and postgraduate students in the social sciences and humanities, in programs recognized by the National Program of Quality Postgraduate Studies (PNPC, by its acronym in Spanish) in Mexico. Using a qualitative methodology, the authors interpret some of the arguments that explain and/or justify certain practices relating to duplicated production, authorship, and coauthorship of academic products. The authors present and analyze the results obtained from reviewing documents produced by professors and students of six postgraduate programs taught at two Mexican public universities. They also examine some of the practices that take place within these programs in response to institutional demands to improve the efficiency with which studies are completed. One hypothesis of this work is that the demands imposed on professors and their programs by external evaluations of academic processes, which require reaching desirable rates in order to maintain or increase productivity levels, give rise to certain practices that must be analyzed. The theoretical framework draws on contributions from career sociology and professional ethics.
Abstract: Editors: Yang Wang, Xi'an Jiaotong University; Dongbo Shi, Shanghai Jiaotong University; Ye Sun, University College London; Zhesi Shen, National Science Library, CAS. Topic of the Special Issue: What are the top questions towards better science and innovation, and what data are required to answer these questions?
Abstract: Activity 1. Think about the following questions and write down your answers before reading the text. 1. What are some common factors that usually cause damage to trees when they are struck by lightning? 2. How might the unique characteristics of a tree contribute to its ability to survive a lightning strike?
Abstract: Editors: Yang Wang, Xi'an Jiaotong University; Dongbo Shi, Shanghai Jiaotong University; Ye Sun, University College London; Zhesi Shen, National Science Library, CAS. Topic of the Special Issue: What are the top questions towards better science and innovation, and what data are required to answer these questions?
Funding: Supported by the Major Program of the National Social Science Foundation of China (No. 23&ZD240).
Abstract: It has long been noticed that focus can influence the truth-conditions of counterfactual conditionals: stressing different parts of a counterfactual leads to distinct interpretations. However, existing theories, such as those by von Fintel and Rooth, fail to adequately account for this phenomenon. In this paper, I lay out the drawbacks of these theories and then propose a novel account, i.e., the Good Question-Answer (GQA) view. The GQA account posits that focus triggers question-answer pairs, and that pragmatic pressures concerning the adequacy of such question-answer pairs in context can affect the truth-conditions of counterfactuals. I also argue for the GQA account by appeal to its theoretical virtues.
Funding: Supported by the Program for Liaoning Excellent Talents in University (No. LR15045) and the Liaoning Provincial Science and Technology Department Applied Basic Research Plan (No. 101300243).
Abstract: Medical visual question answering (MedVQA) faces unique challenges due to the high precision required for images and the specialized nature of the questions. These challenges include insufficient feature extraction capabilities, a lack of textual priors, and incomplete information fusion and interaction. This paper proposes an enhanced bootstrapping language-image pre-training (BLIP) model for MedVQA based on multimodal feature augmentation and triple-path collaborative attention (FCA-BLIP) to address these issues. First, FCA-BLIP employs a unified bootstrap multimodal model architecture that integrates ResNet and Bidirectional Encoder Representations from Transformers (BERT) models to enhance feature extraction capabilities. It enables a more precise analysis of the details in images and questions. Next, the pre-trained BLIP model is used to extract features from image-text sample pairs, so that the model can capture the semantic relationships and shared information between images and text. Finally, a novel attention structure is developed to fuse the multimodal feature vectors, thereby improving the alignment accuracy between modalities. Experimental results demonstrate that the proposed method performs well in clinical visual question-answering tasks. For the MedVQA task of staging diabetic macular edema in fundus imaging, the proposed method outperforms the existing major models on several performance metrics.
Abstract: In maths, some calculations are just as hard as word problems. Actually, they are word problems. Problem-solving strategies must be applied while you are doing such calculations. Now, let's discuss a few questions as examples.
Funding: Supported by the Hunan Province Traditional Chinese Medicine Research Project (No. B2023043), the Hunan Provincial Department of Education Scientific Research Project (No. 22B0386), the Research Project of Hunan Provincial Health Commission (No. 20256982), and the Hunan University of Traditional Chinese Medicine Campus Level Research Fund Project (No. 2022XJZKC004).
Abstract: AIM: To develop a traditional Chinese medicine (TCM) knowledge graph (KG) for diabetic retinopathy (DR) diagnosis and treatment by integrating literature and medical records, thereby enhancing TCM knowledge accessibility and providing innovative approaches for TCM inheritance and DR management. METHODS: First, a KG framework was established with a schema-layer design. Second, high-quality literature and electronic medical records served as data sources. Named entity recognition was performed using the ALBERT-BiLSTM-CRF model, and semantic relationships were curated by domain experts. Third, knowledge fusion was mainly achieved through an alias library. Subsequently, the data layer was mapped to the schema layer to refine the KG, and knowledge was stored in Neo4j. Finally, exploratory work on intelligent question answering was conducted based on the constructed KG. RESULTS: In Neo4j, a KG for TCM diagnosis and treatment was constructed, incorporating 6 types of labels, 5 types of relationships, 5 types of attributes, 822 nodes, and 1,318 relationship instances. This systematic KG supports logical reasoning and intelligent question answering. The question answering model achieved a precision of 95%, a recall of 95%, and a weighted F1-score of 95%. CONCLUSION: This study proposes a semi-automatic knowledge-mapping scheme to balance integration efficiency and accuracy. Clinical data-driven entity and relationship construction enables digital dialectical reasoning. Exploratory applications show the KG’s potential in intelligent question answering, providing new insights for TCM health management.
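The abstract above reports a weighted F1-score alongside precision and recall. As a reminder of what that metric measures, the following is a minimal sketch of the standard support-weighted F1 computation. The function name `weighted_f1` and the toy labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (standard definition)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Each class contributes in proportion to its share of true labels.
        score += support[label] / total * f1
    return score


# Hypothetical question-type labels, for illustration only.
truth = ["herb", "herb", "syndrome", "syndrome"]
preds = ["herb", "syndrome", "syndrome", "syndrome"]
print(round(weighted_f1(truth, preds), 4))
```

Because each class is weighted by its support, frequent question types dominate the score, which is worth keeping in mind when classes are imbalanced.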
Abstract: With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks demonstrate that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters. Additionally, it achieves superior performance compared to fully fine-tuned models on certain benchmarks.
Funding: Supported by the National Social Science Foundation of China 2020 (Grant No. 20CSS011).
Abstract: When it comes to the Taiwan question, people tend to focus on American factors. However, when the question is traced to its source, the Japanese factors may be more profound than the American ones. After World War II, a group of intellectuals from Taiwan represented by Thomas W. I. Liao established various “Taiwan independence” organizations in Japan and engaged in “Taiwan independence” activities, which can be regarded as the origin of the post-war “Taiwan independence” movement. Their journey to “Taiwan independence” was also influenced by the political situation at home and abroad at that time, and their experiences showed that the “Taiwan independence” elements were just victims of international political transactions. The Taiwan question, which continues to this day, has been influenced by the complicated international political situation from the very beginning.
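Abstract: In primary school English teaching, whole-unit teaching has gradually become an effective way to improve teaching quality. Taking Unit 3, School rules, from the Yilin edition English textbook for Grade 3 (Volume 2) as an example, this paper explores in depth the design principles and implementation strategies of question-led whole-unit teaching. By designing questions that are thought-provoking, integrative, student-centered, and oriented toward practice and innovation, the paper aims to build an efficient, interactive, inquiry-based English learning environment. The results show that this teaching model can significantly increase students' interest in learning English and their overall competence, and can promote students' understanding of and compliance with learning rules.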
Funding: Supported by the National Social Science Foundation of China’s Art Program, “Study on the Transformation of Tang and Song Calligraphy through the Perspective of Media Change and Engraved Plate Printing” (Project No. 2020BF00876).
Abstract: There are various views on the origin of engraved printing, among which the “early Tang Dynasty” view has had the most far-reaching impact. According to media evolution theory, engraved printing, as a new media technology, was first used in secular culture, folk entertainment, and other fields. Only after a long period of development was the new medium accepted by the ruling class and mainstream society, at which point its use began to intervene in real society. This reflects that society's choice of a new medium depends on the medium itself and is intrinsically linked to the whole social communication environment. Therefore, from the point of view of media evolution, the view that engraved printing originated in the early Tang Dynasty is not in line with the logic of media evolution and is unreasonable.
Abstract: Activity 1. Think about the following questions and write down your answers before reading the text. 1. Suppose you are planning a trip to Antarctica and want to visit Don Juan Pond. What special preparations would you need to make compared to a normal trip? 2. In your opinion, how could the unique features of Don Juan Pond be used to develop educational programs for high school students?
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62433005, 62272036, 62132003, and 62173167).
Abstract: The consultation intention of emergency decision-makers in urban rail transit (URT) is input into the emergency knowledge base in the form of domain questions to obtain emergency decision support services. This approach facilitates the rapid collection of complete knowledge and rules to form effective decisions. However, the degree of structuring of the current URT emergency knowledge base remains low, and the domain questions lack labeled datasets, resulting in a large deviation between the consultation outcomes and the intended objectives. To address this issue, this paper proposes a question intention recognition model for the URT emergency domain, leveraging knowledge graph (KG) and data enhancement technology. First, structured storage of emergency cases and emergency plans is realized based on the KG. Subsequently, a comprehensive question template is developed, and the labeled dataset of emergency domain questions in URT is generated through the KG. Lastly, data enhancement is applied through prompt learning and the NLP Chinese Data Augmentation (NLPCDA) tool, and an intention recognition model is constructed that combines Generalized Autoregressive Pretraining for Language Understanding (XLNet) and Recurrent Convolutional Neural Network for Text Classification (TextRCNN). Word embeddings are generated by XLNet, context information is further captured using a Bidirectional Long Short-Term Memory Neural Network (BiLSTM), and salient features are extracted with a Convolutional Neural Network (CNN). Experimental results demonstrate that the proposed model can enhance the clarity of classification and the identification of domain questions, thereby providing supportive knowledge for emergency decision-making in URT.
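The abstract above describes generating a labeled question dataset from question templates and a KG. The following is a minimal sketch of that general idea, assuming the KG is available as (head, relation, tail) triples and that the relation doubles as the intent label. The triples, templates, and the function name `generate_labeled_questions` are hypothetical and do not come from the paper.

```python
def generate_labeled_questions(triples, templates):
    """Fill relation-specific templates with KG entities.

    Returns (question, intent_label) pairs, where the intent label is
    assumed here to be the relation name.
    """
    samples = []
    for head, relation, _tail in triples:
        for template in templates.get(relation, []):
            samples.append((template.format(head=head), relation))
    return samples


# Hypothetical KG triples and templates, for illustration only.
triples = [
    ("signal failure", "response_measure", "switch to manual dispatching"),
    ("platform fire", "responsible_unit", "station control room"),
]
templates = {
    "response_measure": [
        "What measures should be taken for {head}?",
        "How should {head} be handled?",
    ],
    "responsible_unit": ["Which unit is responsible for {head}?"],
}

for question, intent in generate_labeled_questions(triples, templates):
    print(intent, "|", question)
```

Writing several paraphrased templates per relation gives some surface variety before any further augmentation step is applied.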
Funding: Supported by the National Natural Science Foundation of China (No. 62001313), the Liaoning Professional Talent Project (No. XLYC2203046), and the Shenyang Municipal Medical Engineering Cross Research Foundation of China (No. 22-321-32-09).
Abstract: Medical visual question answering (MedVQA) aims to enhance diagnostic confidence and deepen patients' understanding of their health conditions. While the Transformer architecture is widely used in multimodal fields, its application in MedVQA requires further enhancement. A critical limitation of contemporary MedVQA systems lies in their inability to integrate lifelong knowledge with specific patient data to generate human-like responses. Existing Transformer-based MedVQA models need stronger capabilities for interpreting answers by applying medical image knowledge. The medical knowledge graph visual language transformer (MKGViLT), designed for joint medical knowledge graphs (KGs), addresses this challenge. MKGViLT incorporates an enhanced Transformer structure to effectively extract features and combine modalities for MedVQA tasks. The MKGViLT model delivers answers based on richer background knowledge, thereby enhancing performance. The efficacy of MKGViLT is evaluated using the SLAKE and P-VQA datasets. Experimental results show that MKGViLT surpasses the most advanced methods on the SLAKE dataset.
Abstract: Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) Using deep neural networks to semantically represent the given image and question in a fine-grained manner, namely ResNet-152 and Gated Recurrent Units (GRU). (2) Studying the role of the utilized multimodal bilinear pooling fusion technique in the trade-off between the model complexity and the overall model performance. Some fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance in this case of VQA systems. Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance, until reaching the best performance of 89.25%. Further, experiments have proven that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between the model complexity and its performance, for VQA systems designed to answer yes/no questions.
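To make the complexity trade-off mentioned above concrete, the following is a minimal sketch of a generic low-rank bilinear fusion, in which both feature vectors are projected to a shared rank-k space, multiplied element-wise, and projected to the output. This is not the MLPB technique or any of the eight methods compared in the paper. The function names, matrices, and dimensions are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def low_rank_bilinear_fusion(image_feat, question_feat, U, V, P):
    """Fuse two feature vectors with a factorized bilinear form.

    U: k x d_img projection, V: k x d_q projection, P: d_out x k projection.
    """
    img_proj = matvec(U, image_feat)
    q_proj = matvec(V, question_feat)
    joint = [a * b for a, b in zip(img_proj, q_proj)]  # element-wise product
    return matvec(P, joint)


def bilinear_param_counts(d_img, d_q, d_out, k):
    """Weight counts for a full bilinear tensor vs. the rank-k factorization."""
    full = d_img * d_q * d_out
    low_rank = k * (d_img + d_q + d_out)
    return full, low_rank


# Toy inputs, for illustration only.
identity = [[1, 0], [0, 1]]
print(low_rank_bilinear_fusion([1, 2], [3, 4], identity, identity, [[1, 1]]))
print(bilinear_param_counts(d_img=2048, d_q=1024, d_out=2, k=256))
```

In this factorized form the weight count grows with the sum of the dimensions rather than their product, which is one reason such fusion schemes are studied as a way to control model complexity.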