Funding: Science and Technology Development Project of Tianjin (No. 06FZRJGX02400); National Natural Science Foundation of China (No. 60603027)
Abstract: This paper addresses document clustering with a clustering algorithm based on a DEnsity Tree (CABDET), with the aim of improving clustering accuracy. CABDET constructs a density-based tree structure for every potential cluster, dynamically adjusting the neighborhood radius according to local density. It thereby avoids the global density parameters of density-based spatial clustering of applications with noise (DBSCAN) and reduces the number of input parameters to one. Experiments on real documents show that CABDET clusters more accurately than DBSCAN: its maximum F-measure is 0.347, obtained with a root-node neighborhood radius of 0.80, compared with 0.332 for DBSCAN with a neighborhood radius of 0.65 and a minimum number of objects of 6.
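To make the "global density parameters" concrete, the following is a minimal sketch of textbook DBSCAN in plain Python. It is not the CABDET algorithm and not the authors' code; the function names, the toy 2-D points, and the parameter values are assumptions chosen only for illustration. It shows that a single `eps` and `min_pts` apply to every region of the data, so a radius tuned for a dense region labels a sparser region as noise.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    eps and min_pts are global: the same density threshold is used everywhere.
    """
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

# Toy data (hypothetical): one dense group and one sparse group.
dense = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
sparse = [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
points = dense + sparse

print(dbscan(points, eps=0.15, min_pts=3))  # sparse group becomes noise
print(dbscan(points, eps=1.05, min_pts=3))  # both groups are found
```

With the small radius the sparse group is discarded entirely, which is the kind of sensitivity that a locally adjusted radius is meant to reduce.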
Funding: China Postdoctoral Science Foundation under Grant 2023M732022; Qufu Normal University under Grant 167-602801.
Abstract: Drug-drug interaction (DDI) prediction is a crucial issue in molecular biology. Traditional methods of observing drug-drug interactions through medical experiments require significant resources and labour. The authors present a Medical Knowledge Graph Question Answering (MedKGQA) model that predicts DDI by applying machine reading comprehension (MRC) to closed-domain literature and by constructing a knowledge graph of "drug-protein" triplets from open-domain documents. The model vectorises the drug-protein target attributes in the graph using entity embeddings and establishes directed connections between drug and protein entities based on the metabolic interaction pathways of protein targets in the human body. This aligns multiple sources of external knowledge and uses them to train the graph neural network. Without bells and whistles, the proposed model achieved a 4.5% improvement in DDI prediction accuracy over previous state-of-the-art models on the QAngaroo MedHop dataset. Experimental results demonstrate the efficiency and effectiveness of the model and verify the feasibility of integrating external knowledge in MRC tasks.
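As a rough illustration of what a graph of "drug-protein" triplets with directed edges might look like as a data structure, here is a minimal sketch. It does not reproduce MedKGQA: there are no embeddings, no reading comprehension, and no graph neural network. The triplets, the relation name `targets`, and both function names are hypothetical placeholders, not data or identifiers from the paper.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triplets; placeholder names, not real data.
TRIPLETS = [
    ("drug_A", "targets", "protein_X"),
    ("drug_B", "targets", "protein_X"),
    ("drug_B", "targets", "protein_Y"),
    ("drug_C", "targets", "protein_Z"),
]

def build_graph(triplets):
    """Directed adjacency map: head entity -> set of (relation, tail)."""
    graph = defaultdict(set)
    for head, relation, tail in triplets:
        graph[head].add((relation, tail))
    return graph

def shared_targets(graph, drug_1, drug_2):
    """Proteins reachable from both drugs along a 'targets' edge."""
    t1 = {tail for rel, tail in graph.get(drug_1, ()) if rel == "targets"}
    t2 = {tail for rel, tail in graph.get(drug_2, ()) if rel == "targets"}
    return t1 & t2

graph = build_graph(TRIPLETS)
print(shared_targets(graph, "drug_A", "drug_B"))
print(shared_targets(graph, "drug_A", "drug_C"))
```

A shared protein target is at most a candidate signal here; in the model described above, the graph is an input to a learned predictor rather than a lookup table.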
Funding: Supported by the Natural Science Foundation Research Project of Shaanxi Province of China (2015JM6318) and the Humanities and Social Sciences Research Youth Fund Project of the Ministry of Education of China (13YJCZH251)
Abstract: In hard competitive learning, each point modifies only the winning cluster centroid. In soft competitive learning, each point modifies not only the winning centroid but also many other centroids near the point. A soft competitive learning method is proposed and adopted by three clustering algorithms: centroid all rank distance (CARD), CARDx, and centroid all rank distance batch K-means (CARDBK). In these algorithms, the extent to which a point affects a cluster centroid depends on the distances from the point to the other, nearer centroids, rather than only on the rank of its distance to that centroid among its distances to all centroids. Validation experiments compare CARD, CARDx, and CARDBK with several hard competitive learning algorithms and with the neural gas (NG) algorithm on five data sets from different sources. Judged by five performance indexes, the proposed soft competitive learning method gives better clustering quality and efficiency and scales linearly.
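The difference between hard and soft updates can be sketched with two standard baselines: a winner-take-all step and the rank-based neural gas step. This is an illustrative sketch only. The abstract does not give the distance-based weighting used by CARD, CARDx, or CARDBK, so it is not implemented here; the learning rate `lr`, the decay constant `lam`, and the toy centroids are assumptions.

```python
from math import dist, exp

def hard_update(centroids, x, lr):
    """Winner-take-all: only the centroid nearest to x moves toward x."""
    winner = min(range(len(centroids)), key=lambda k: dist(centroids[k], x))
    new = [list(c) for c in centroids]
    new[winner] = [c + lr * (xi - c) for c, xi in zip(centroids[winner], x)]
    return new

def neural_gas_update(centroids, x, lr, lam):
    """Rank-based soft update: every centroid moves, weighted by exp(-rank / lam)."""
    order = sorted(range(len(centroids)), key=lambda k: dist(centroids[k], x))
    new = [list(c) for c in centroids]
    for rank, k in enumerate(order):
        weight = exp(-rank / lam)  # rank 0 (the winner) gets weight 1
        new[k] = [c + lr * weight * (xi - c) for c, xi in zip(centroids[k], x)]
    return new

# Toy 1-D example (hypothetical values).
centroids = [[0.0], [4.0], [10.0]]
x = [1.0]

hard = hard_update(centroids, x, lr=0.5)
soft = neural_gas_update(centroids, x, lr=0.5, lam=1.0)
print(hard)  # only the first centroid changes
print(soft)  # all three centroids move toward x, by decreasing amounts
```

In the neural gas step the weight depends only on the rank, so the centroid at 4.0 would receive the same weight whether it sat at 1.5 or at 9.0. That rank-only dependence is what the distance-based weighting described above is meant to replace.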