Purpose: This paper aims to better understand a large number of papers in the medical domain of Alzheimer's disease (AD) and related diseases using the machine reading approach. Design/methodology/approach: The study uses the topic modeling method to obtain an overview of the field, and employs open information extraction to further comprehend the field at a specific fact level. Findings: Several topics within the AD research field are identified, such as the Human Immunodeficiency Virus (HIV)/Acquired Immune Deficiency Syndrome (AIDS), which can help answer the question of how AIDS/HIV and AD are very different yet related diseases. Research limitations: Some manual data cleaning could improve the study, such as removing incorrect facts found by open information extraction. Practical implications: This study uses the literature to answer specific questions on a scientific domain, which can help domain experts find interesting and meaningful relations among entities in a similar manner, such as to discover relations between AD and AIDS/HIV. Originality/value: Both the overview and specific information from the literature are obtained using two distinct methods in a complementary manner. This combination is novel because previous work has only focused on one of them, and thus provides a better way to understand an important scientific field using data-driven methods.
Improving the metacognitive abilities of models is a hot topic in Machine Reading Comprehension (MRC), and a dataset with unanswerable questions is an effective way to test these abilities. There are many datasets with unanswerable questions for MRC in Chinese and English, but related research in low-resource languages such as Tibetan is still at an early stage. TibetanQA is an extractive dataset for MRC in Tibetan that contains 20,000 question-and-answer pairs and 1,513 articles. However, it focuses mainly on answerable questions, so models trained on it may produce less accurate predictions when they encounter questions that have no precise answer in the text. To address this weakness, this paper constructs the Dataset with Unanswerable Questions for Tibetan Machine Reading Comprehension (TibetanQA2.0). The dataset was constructed by crowd workers and contains 505 passages and 1,347 unanswerable question-and-answer pairs. The passages cover six topics: geography, biology, history, nature, culture, and medicine. The dataset's quality was ensured by unifying the use of interrogative particles, correcting spelling and grammar errors, and verifying its compatibility with the Tibetan context.
Opinion question machine reading comprehension (MRC) requires a machine to answer questions by analyzing the corresponding passages. Compared with traditional MRC tasks, where the answer to every question is a segment of text in the corresponding passage, opinion question MRC is more challenging because the answer to an opinion question may not appear in the passage and must instead be deduced from multiple sentences. In this study, a novel framework based on neural networks is proposed to address such problems, in which a new hybrid embedding training method combining text features is used. Furthermore, extra attention and output layers that generate auxiliary losses are introduced to jointly train the stacked recurrent neural networks. To deal with the imbalance of the dataset, the irrelevancy of question and passage is used for data augmentation. Experimental results show that the proposed method achieves state-of-the-art performance. We were the biweekly champion in the opinion question MRC task in Artificial Intelligence Challenger 2018 (AIC2018).
Machine reading comprehension (MRC) has been a research focus in natural language processing and intelligence engineering. However, there is a lack of models and datasets for MRC tasks in the anti-terrorism domain. Moreover, current research lacks the ability to embed accurate background knowledge and provide precise answers. To address these two problems, this paper first builds a text corpus and testbed that focuses on the anti-terrorism domain in a semi-automatic manner. Then, it proposes a knowledge-based machine reading comprehension model that fuses domain-related triples from a large-scale encyclopedic knowledge base to enhance the semantics of the text. To eliminate knowledge noise that could lead to semantic deviation, this paper uses a mixed mutual attention mechanism among questions, passages, and knowledge triples to select the most relevant triples before embedding their semantics into the sentences. Experimental results indicate that the proposed approach can achieve a 70.70% EM value and an 87.91% F1 score, with a 4.23% and 3.35% improvement over existing methods, respectively.
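The abstract reports EM and F1 without defining them. As a minimal sketch, the following shows how exact match and token-level F1 are commonly computed for extractive MRC. The whitespace tokenization, the lowercasing, and the function names `normalize`, `exact_match`, and `token_f1` are assumptions made for illustration; the paper's own normalization and scoring script are not given in the abstract and may differ.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores such as those quoted above are usually the mean of these per-question values, expressed as percentages.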
Parallel sorting techniques on different architectures have been studied for many years. Given today's need for faster systems, parallelism can be used to accelerate applications. Parallel operations are now used to solve computing problems such as sorting and searching at reasonable speed. Sorting is one of the most important operations in computing. Researchers try to optimize it in several respects, the foremost of which is speedup. In this paper, the authors present a sort with O(log n) time complexity on the PRAM EREW (Parallel Random Access Machine, Exclusive Read Exclusive Write) model. The algorithm is designed to balance the tradeoff between the number of processing elements in the architecture and the execution time. Simulation of the algorithm confirms its theoretical analysis. The results of this research can be used to develop faster embedded systems. The Sorting on Centralized Diamond (SOCD) algorithm is presented on the novel Centralized Diamond architecture, which takes advantage of the Single Instruction Multiple Data (SIMD) architecture. This architecture and the sort on it are intuitive and optimal.
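The abstract does not describe how SOCD works, so the sketch below is not that algorithm. It is a sequential simulation of textbook enumeration (rank) sort, included only to illustrate how a logarithmic number of parallel rounds can arise when many processing elements are available. It assumes n × n virtual processors; the function name `rank_sort` and the round counter are hypothetical. On a true EREW machine the inputs would first have to be replicated so that no memory cell is read concurrently, which also takes O(log n) rounds by doubling and is omitted here.

```python
def rank_sort(values):
    """Simulate enumeration sort; return (sorted list, number of reduction rounds)."""
    n = len(values)
    if n == 0:
        return [], 0

    # Step 1 (one parallel step): virtual processor (i, j) records whether
    # values[j] must precede values[i]. Ties are broken by index so that
    # every element receives a unique rank.
    wins = [
        [1 if (values[j], j) < (values[i], i) else 0 for j in range(n)]
        for i in range(n)
    ]

    # Step 2: sum each row with a pairwise tree reduction. Each pass of the
    # while loop stands for one parallel round, so ceil(log2(n)) rounds run.
    rounds = 0
    width = n
    while width > 1:
        half = (width + 1) // 2
        for row in wins:
            for k in range(width // 2):
                row[k] += row[half + k]
        width = half
        rounds += 1

    # Step 3 (one parallel step): row i now holds the rank of values[i] in
    # position 0, so each element is written directly to its final slot.
    out = [None] * n
    for i in range(n):
        out[wins[i][0]] = values[i]
    return out, rounds
```

The simulation makes the tradeoff mentioned in the abstract concrete: the round count grows only logarithmically, but at the cost of a quadratic number of processing elements.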
Funding (TibetanQA2.0 paper): supported by the National Natural Science Foundation (61972436), the National Social Science Foundation (22&ZD035), and the Minzu University of China Foundation (GRSCP202316, 2023QNYL22, 2024GJYY43).
Funding (opinion question MRC paper): Project supported by the China Knowledge Centre for Engineering Sciences and Technology (No. CKCEST-2019-1-12) and the National Natural Science Foundation of China (No. 61572434).
Funding (anti-terrorism MRC paper): National Key Research and Development Program (2020AAA0108500); National Natural Science Foundation of China Project (No. U1836118); Key Laboratory of Rich Media Digital Publishing, Content Organization and Knowledge Service (No. ZD2022-10/05).