We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because the cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries, our framework jointly considers the relevance, diversity, informativeness, and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain relevance scores and diversity scores of sentences simultaneously. Our framework achieves the best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094, and a ROUGE-SU4 score of 0.143, outperforming a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Traditional clustering algorithms often struggle to produce satisfactory results on datasets with uneven density. They also incur substantial computational costs on high-dimensional data because they must calculate similarity matrices. To alleviate these issues, we employ a KD-Tree to partition the dataset and compute the K-nearest neighbors (KNN) density of each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections: each data point is treated as a voter and casts a vote for the point with the highest density among its KNN. Using the vote counts of each point, we develop a strategy for classifying noise points and potential cluster centers, which allows the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of “adhesive points” between two clusters to merge adjacent clusters that have similar densities. This process helps identify the optimal number of clusters automatically. Experimental results indicate that our algorithm improves both the efficiency and the accuracy of clustering.
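The voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps the KD-Tree for a brute-force neighbor search, defines KNN density as k divided by the summed distance to the k nearest neighbors, and lets each point count itself as a candidate. All three choices, and the function names, are assumptions made for illustration. The noise-point and "adhesive point" rules are not specified in the abstract and are left out.

```python
import math
from collections import Counter


def knn(points, k):
    """Brute-force k nearest neighbours of every point as (distance, index) pairs.

    Assumption: a KD-Tree would replace this search in the method described above.
    """
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])
    return neighbours


def vote_counts(points, k):
    """Each point votes for the densest point among itself and its k neighbours."""
    nbrs = knn(points, k)
    # Assumed density definition: k over the summed distance to the k neighbours.
    density = [k / max(sum(d for d, _ in nb), 1e-12) for nb in nbrs]
    votes = Counter()
    for i, nb in enumerate(nbrs):
        candidates = [i] + [j for _, j in nb]
        winner = max(candidates, key=lambda j: density[j])
        votes[winner] += 1
    return votes


# Two small square-shaped groups, each with one point in the middle.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
    (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5),
]
counts = vote_counts(points, k=3)
# The two middle points (indices 4 and 9) collect every vote in their group.
print(counts.most_common(2))
```

Points that collect many votes are natural candidates for cluster centers, while points that receive none sit on the low-density fringe; how the paper turns these counts into noise and center labels is not stated in the abstract.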
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize the extracted information across dissertation abstracts. This framework has a hierarchical structure and focuses on the research concepts and relationships found in sociology dissertation abstracts. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy that supports the summarization process. An example shows how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.
As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, known as Graph & Keywordp. Traditional graph methods for multi-document summarization only consider the influence of sentences and words across all documents rather than within individual documents. Therefore, we construct multiple word graphs and extract the right keywords in each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because word importance differs across documents, we propose using keyword density so that summaries provide rich content with a small number of words. The experimental results show that the Graph & Keywordp method outperforms state-of-the-art systems when tested on the DUC2004 data set. Key words: multi-document, graph algorithm, keyword density, Graph & Keywordp, DUC2004
In contrast to the traditional approach of building a multi-document summary by adding sentences, a two-stage sentence selection approach is proposed that generates the summary by deleting sentences from a candidate sentence set. The two stages are the acquisition of a candidate sentence set and the optimum selection of sentences. In the first stage, the candidate sentence set is obtained by a redundancy-based sentence selection approach. In the second stage, sentences are deleted from the candidate sentence set according to their contribution to the whole set until the specified summary length is reached. On a test corpus, the ROUGE values of the summaries produced by the proposed approach demonstrate its validity in comparison with the traditional method of sentence selection. The influence of the token chosen in the two-stage sentence selection approach on the quality of the generated summaries is also analyzed.
A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, uses Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then computes sentence similarity. The sentences are clustered according to their similarity, and the centroid sentence is selected from each class. Finally, the selected sentences are ordered to generate the summary. The evaluation and results are presented, which show that the proposed method is efficient.
This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated from how extensively the sentences cover the main content of the text, and summaries are created by extracting the highest-scored sentences from the original document. The model is formalized as a multiobjective integer programming problem. An advantage of this model is that it can cover the main content of the source(s) while reducing redundancy in the generated summaries. To extract sentences that form a summary with extensive coverage of the main content and low redundancy, the model uses the similarity of sentences to the original document and the similarity between sentences. Performance is evaluated by comparing the summarization outputs with the manual summaries of the DUC2004 dataset. Experiments show that the proposed approach outperforms the related methods.
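Abstract: Objective: To build a prediction model for the anti-inflammatory effect of Chinese medicinals based on a Voting ensemble algorithm, using medicinal properties as feature variables, and to analyze through visualization how different property features influence the anti-inflammatory effect. Methods: A total of 1247 Chinese medicinals from 《中药学》 (Chinese Materia Medica) and the SymMap database were taken as research objects; after primary and secondary screening, a standardized database containing features such as nature, flavor, and meridian tropism was established. A Voting ensemble model was built on six base models, including decision tree, support vector machine, and light gradient boosting machine, and model performance was improved with seven-fold cross-validation and hyperparameter optimization using a tree-structured Bayesian optimization algorithm. The SHAP (SHapley Additive exPlanations) explainer was used to visualize the key property features. Results: After screening, 522 anti-inflammatory Chinese medicinals were included to build the database. The Voting ensemble model had the best overall performance, with an F1 score of 0.797 and an AUC of 0.77, an average improvement of 7.4% over the single models. SHAP analysis showed that the important features driving an anti-inflammatory effect were "spleen meridian", "sweet flavor", "tonifying", and others, whereas the important features associated with the absence of an anti-inflammatory effect were "warm or neutral nature" and "toxicity". Conclusion: This is the first time an ensemble algorithm has been used to build a well-performing prediction model for the anti-inflammatory effect of Chinese medicinals, offering a new approach for research that combines traditional Chinese medicine with machine learning.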
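The seven-fold cross-validation mentioned in the Methods can be sketched as follows. This is only an illustration of how samples are partitioned into seven train/validation splits; the function names are hypothetical, the folds are contiguous and unshuffled for simplicity, and 1247 is used merely as an example sample count taken from the abstract, not as the confirmed size of the modelling dataset.

```python
def k_fold_indices(n_samples, k=7):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_samples, k=7):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Illustrative size only; every sample is used for validation exactly once.
splits = list(train_validation_splits(1247, k=7))
for train, validation in splits:
    print(len(train), len(validation))
```

In practice the samples would usually be shuffled, and often stratified by class label, before being assigned to folds; the abstract does not say which variant was used.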
Abstract: Small-drone technology has opened a range of new applications for aerial transportation. These drones leverage the Internet of Things (IoT) to offer cross-location services for navigation. However, they are susceptible to security and privacy threats due to hardware and architectural issues. Although small drones hold promise for expansion in both the civil and defense sectors, they face safety, security, and privacy threats. Addressing these challenges is crucial to maintaining the security and uninterrupted operation of these drones. In this regard, this study investigates security and preservation concerning both drones and the Internet of Drones (IoD), emphasizing the importance of creating drone networks that are secure and can robustly withstand interceptions and intrusions. The proposed framework incorporates a weighted voting ensemble model comprising three convolutional neural network (CNN) models to enhance intrusion detection within the network. The employed CNNs are customized 1D models optimized for better performance. The outputs of these CNNs are combined by weighted voting, with weights of 0.4, 0.3, and 0.3 for the three CNNs, respectively. Experiments use multiple benchmark datasets and achieve an accuracy of up to 99.89% on drone data. The proposed model also shows promising precision, recall, and F1 values of 99.92%, 99.98%, and 99.97%, respectively. Furthermore, cross-validation and a performance comparison with existing works are carried out. The findings indicate that the proposed approach offers a prospective solution for detecting security threats to aerial systems and satellite systems with high accuracy.
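The weighted voting with a 0.4, 0.3, and 0.3 ratio can be illustrated with a short Python sketch. It assumes that each CNN outputs a class-probability vector and that the vote is a weighted sum of those vectors (soft voting); the abstract does not say whether probabilities or hard labels are combined, so this is one plausible reading rather than the paper's method. The function name and the example probabilities are hypothetical.

```python
def weighted_soft_vote(prob_vectors, weights=(0.4, 0.3, 0.3)):
    """Combine per-model class probabilities with fixed weights.

    Returns (predicted_class_index, combined_scores).
    Assumption: models emit probabilities over the same ordered set of classes.
    """
    if len(prob_vectors) != len(weights):
        raise ValueError("one weight is required per model")
    n_classes = len(prob_vectors[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(prob_vectors, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for cls, p in enumerate(probs):
            combined[cls] += weight * p
    predicted = max(range(n_classes), key=combined.__getitem__)
    return predicted, combined


# Hypothetical outputs of three CNNs for classes (0 = benign, 1 = intrusion).
confident_first = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
label, scores = weighted_soft_vote(confident_first)
print(label, scores)
```

In this example two of the three models lean towards class 1, yet the combined score favors class 0 because the most heavily weighted model is very confident. That is the practical difference between weighted soft voting and a simple majority of labels.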
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant 61931005.
Abstract: Blockchain-based user-centric access network (UCAN) fails in dynamic access point (AP) management, as it lacks an incentive mechanism to promote virtuous behavior. Furthermore, the low throughput of the blockchain has been a bottleneck to the widespread adoption of UCAN in 6G. In this paper, we propose Overlap Shard, a blockchain framework based on a novel reputation voting (RV) scheme, to dynamically manage the APs in UCAN. AP nodes in UCAN are distributed across multiple shards based on the RV scheme. That is, nodes with good reputation (virtuous behavior) are likely to be selected in the overlap shard. The RV mechanism ensures the security of UCAN because most APs adopt virtuous behaviors. Furthermore, to improve the efficiency of the Overlap Shard, we reduce cross-shard transactions by introducing core nodes. Specifically, a few nodes are overlapped in different shards and can directly process the transactions in two shards instead of issuing cross-shard transactions. This greatly increases the speed of transactions between shards and thus the throughput of the overlap shard. The experiments show that the throughput of the overlap shard is about 2.5 times that of the non-sharded blockchain.
Funding: Funded by the Heilongjiang Provincial Natural Science Foundation of China (Grant Number LH2022F030).
Abstract: 3D model classification has emerged as a significant research focus in computer vision. However, traditional convolutional neural networks (CNNs) often struggle to capture global dependencies across both height and width dimensions simultaneously, leading to limited feature representation capabilities when handling complex visual tasks. To address this challenge, we propose a novel 3D model classification network named ViT-GE (Vision Transformer with Global and Efficient Attention), which integrates Global Grouped Coordinate Attention (GGCA) and Efficient Channel Attention (ECA) mechanisms. Specifically, the Vision Transformer (ViT) is employed to extract comprehensive global features from multi-view inputs using its self-attention mechanism, effectively capturing 3D shape characteristics. To further enhance spatial feature modeling, the GGCA module introduces a grouping strategy and global context interactions. Concurrently, the ECA module strengthens inter-channel information flow, enabling the network to adaptively emphasize key features and improve feature fusion. Finally, a voting mechanism is adopted to enhance classification accuracy, robustness, and stability. Experimental results on the ModelNet10 dataset demonstrate that our method achieves a classification accuracy of 93.50%, validating its effectiveness and superior performance.
Abstract: Background: In genetic diagnostics, DNA sequencing is an important tool because the depth and complexity of this field have major implications for understanding the genetic architectures of diseases and identifying risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis, taking advantage of the power of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. The study then combines a collection of machine learning classifiers through a strict voting mechanism, which joins the predictions of multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested in an empirical analysis on a variants dataset of DNA sequences taken from patients affected by breast cancer, compared against a control group of healthy people. The integration of CNNs with a voting-based ensemble of classifiers returned strong outcomes, with accuracy, precision, recall, and F1-score reaching 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the potential added value that integrating deep learning techniques with ensemble machine learning might provide for genetic diagnostics and prognostics. The results of this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and point to future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
Abstract: Breast cancer is among the leading causes of cancer mortality globally, and its diagnosis through histopathological image analysis is often prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, necessitating more robust classification models. This study presents a hybrid model that combines deep feature extraction using deep learning (DL), an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) hyperparameter optimization to improve breast cancer histopathology (BCH) image classification. A dataset of 804 Hematoxylin and Eosin (H&E) stained images classified into Benign, in situ, Invasive, and Normal categories (ICIAR2018_BACH_Challenge) was used. ResNet50 was used for feature extraction, while Support Vector Machines (SVM), Random Forests (RF), XGBoost (XGB), Decision Trees (DT), and AdaBoost (ADB) were used for classification. BSO was used for hyperparameter optimization in a soft voting ensemble approach. Accuracy, precision, recall, specificity, F1-score, Receiver Operating Characteristic (ROC), and Precision-Recall (PR) curves were used as model performance metrics. The ensemble model outperformed the individual classifiers, with higher accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). The robustness of the model was verified by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for Benign, Invasive, and in situ instances, respectively. This ensemble model delivers a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
Abstract: This paper presents 3RVAV (Three-Round Voting with Advanced Validation), a novel Byzantine Fault Tolerant consensus protocol combining Proof-of-Stake with a multi-phase voting mechanism. The protocol introduces three layers of randomized committee voting with distinct participant roles (Validators, Delegators, and Users), achieving (4/5)-threshold approval per round through a verifiable random function (VRF)-based selection process. Our security analysis demonstrates that 3RVAV provides 1−(1−s/n)^(3k) resistance to Sybil attacks with n participants and stake s, while maintaining O(kn log n) communication complexity. Experimental simulations show 3247 TPS throughput with 4-s finality, representing a 5.8× improvement over Algorand’s committee-based approach. The proposed protocol achieves approximately 4.2-s finality, demonstrating low latency while maintaining strong consistency and resilience. The protocol introduces a novel punishment matrix incorporating both stake slashing and probabilistic blacklisting, proving a Nash equilibrium for honest participation under rational actor assumptions.
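The two quantitative rules quoted above, the expression 1−(1−s/n)^(3k) and the (4/5) approval threshold applied in each of three rounds, can be evaluated with a small Python sketch. The abstract does not define k or state whether the threshold is inclusive, so the code treats k as a free parameter and assumes "at least 4/5"; the function names and example numbers are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def sybil_resistance(s, n, k):
    """Evaluate 1 - (1 - s/n)**(3k) exactly as the expression is quoted.

    s: stake, n: number of participants, k: parameter left undefined
    in the abstract (treated here as a non-negative integer).
    """
    if n <= 0 or not 0 <= s <= n or k < 0:
        raise ValueError("expected n > 0, 0 <= s <= n, k >= 0")
    return 1.0 - (1.0 - s / n) ** (3 * k)


def round_approved(yes_votes, committee_size):
    """True when yes votes reach the 4/5 threshold (assumed inclusive)."""
    return Fraction(yes_votes, committee_size) >= Fraction(4, 5)


def three_round_decision(yes_votes_per_round, committee_size):
    """Accept only if each of the three rounds independently passes."""
    if len(yes_votes_per_round) != 3:
        raise ValueError("the protocol described above uses three rounds")
    return all(round_approved(y, committee_size) for y in yes_votes_per_round)


print(sybil_resistance(s=10, n=100, k=1))
print(three_round_decision([85, 80, 92], committee_size=100))
print(three_round_decision([85, 79, 92], committee_size=100))
```

Using `Fraction` for the threshold avoids floating-point edge cases when the yes-vote share is exactly four fifths.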
Abstract: According to the Charter of the United Nations, the United Nations Security Council adopts the authorized voting system of a “collective security system”, which has prominent drawbacks such as difficulty in fully reflecting the will of all Member States. Combining interdisciplinary, qualitative, and quantitative research methods, and in response to the dilemma of Security Council voting reform, this article suggests retaining the Security Council voting system and recommends a simplified “basic and weighted half” model for voting allocation. This model not only inherits the authorized voting system of the collective security system but also follows the allocation system of sovereign equality in the Charter. It can also help Member States “draw on advantages and avoid disadvantages” in international development, promote the transformation of the “absolute equality” of overall consistency into “real fairness” relative to individual contributions, and further promote the development of international law in the United Nations voting system.
Abstract: This paper presents a new approach to determining whether a personal name of interest refers to the same entity across documents. First, three vectors are formed for each text: the personal-name Boolean vector, denoting whether a personal name occurs in the text; the biographical-word Boolean vector, representing title, occupation, and so forth; and the feature vector with real values. Then, by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors, it seeks to resolve multi-document personal name coreference. Experimental results on the "Wang Gang" corpus show that this approach achieves good performance.
Abstract: Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy, and passage selection are critical in the formation of useful summaries. Since the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can be useful for studying information on the web. Hence we propose a new approach to summarization based on the machine learning meta-learner algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 pieces of news downloaded from different medical websites. We then compare our results with some existing approaches.
Abstract: We present a novel unsupervised integrated score framework to generate generic extractive multi-document summaries by ranking sentences based on a dynamic programming (DP) strategy. Considering that cluster-based methods proposed by other researchers tend to ignore the informativeness of words when they generate summaries, our proposed framework takes the relevance, diversity, informativeness, and length constraint of sentences into consideration comprehensively. We apply Density Peaks Clustering (DPC) to get relevance scores and diversity scores of sentences simultaneously. Our framework produces the best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094, and a ROUGE-SU4 score of 0.143, outperforming a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Funding: National Natural Science Foundation of China (Nos. 61962054 and 62372353).
Abstract: Traditional clustering algorithms often struggle to produce satisfactory results when dealing with datasets with uneven density. Additionally, they incur substantial computational costs when applied to high-dimensional data due to calculating similarity matrices. To alleviate these issues, we employ the KD-Tree to partition the dataset and compute the K-nearest neighbors (KNN) density for each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections, treating each data point as a voter that casts a vote for the point with the highest density among its KNN. By utilizing the vote counts of each point, we develop a strategy for classifying noise points and potential cluster centers, allowing the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of “adhesive points” between two clusters to merge adjacent clusters that have similar densities. This process helps us identify the optimal number of clusters automatically. Experimental results indicate that our algorithm not only improves the efficiency of clustering but also increases its accuracy.
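The density-voting idea in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's algorithm: neighbors are found by brute force rather than with a KD-Tree, KNN density is assumed to be the inverse of the mean distance to the k nearest neighbors, each point is allowed to vote for itself, and the noise and "adhesive point" rules are omitted. The name `knn_vote_counts` is hypothetical.

```python
import math


def knn_vote_counts(points, k):
    """Return (density, votes) for a list of coordinate tuples."""
    n = len(points)
    neighbors, density = [], []
    for i, p in enumerate(points):
        # Brute-force neighbor search; the abstract uses a KD-Tree instead.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        mean_distance = sum(d for d, _ in nearest) / len(nearest)
        # Assumed density definition: inverse mean KNN distance.
        density.append(1.0 / (mean_distance + 1e-12))
    votes = [0] * n
    for i in range(n):
        # Each point votes for the densest point among its KNN and itself.
        candidates = neighbors[i] + [i]
        winner = max(candidates, key=lambda j: density[j])
        votes[winner] += 1
    return density, votes


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.2), (10.0, 10.0)]
density, votes = knn_vote_counts(points, k=2)
print(votes)
```

In this toy data the point (0.2, 0.2) collects the votes of the tight group and would be a candidate cluster center, while the isolated point (10, 10) receives no votes, which is the kind of signal the abstract describes using to separate centers from noise.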
Abstract: This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework focuses more on research concepts and their research relationships found in sociology dissertation abstracts and has a hierarchical structure. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.
Abstract: As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, known as Graph & Keywordp. Traditional graph methods for multi-document summarization only consider the influence of sentences and words across all documents rather than within individual documents. Therefore, we construct multiple word graphs and extract the right keywords in each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because word importance differs across documents, we propose to use keyword density so that summaries provide rich content while using a small number of words. The experimental results show that the Graph & Keywordp method outperforms state-of-the-art systems when tested on the DUC2004 data set. Key words: multi-document, graph algorithm, keyword density, Graph & Keywordp, DUC2004
Funding: the National Natural Science Foundation of China (No. 60575041) and the High Technology Research and Development Program of China (No. 2006AA01Z150).
Abstract: In contrast to the traditional method of adding sentences to build a summary in multi-document summarization, a two-stage sentence selection approach is proposed that generates the summary by deleting sentences from a candidate sentence set. The two stages are the acquisition of a candidate sentence set and the optimum selection of sentences. At the first stage, the candidate sentence set is obtained by a redundancy-based sentence selection approach. At the second stage, optimum selection deletes sentences from the candidate sentence set according to their contribution to the whole set until the specified summary length is reached. On a test corpus, the ROUGE value of summaries produced by the proposed approach proves its validity compared with the traditional method of sentence selection. The influence of the token chosen in the two-stage sentence selection approach on the quality of the generated summaries is analyzed.
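The second stage described above (deleting sentences until the target length is reached) can be sketched as follows. The abstract does not say how a sentence's contribution to the set is measured, so this sketch assumes a stand-in measure: the number of distinct words lost when the sentence is removed. The function names and the example sentences are hypothetical.

```python
def coverage(sentences):
    """Number of distinct lowercase words across the sentences."""
    return len({w for s in sentences for w in s.lower().split()})


def total_words(sentences):
    return sum(len(s.split()) for s in sentences)


def select_by_deletion(candidates, max_words):
    """Delete the least-contributing sentence until the length fits."""
    kept = list(candidates)
    while len(kept) > 1 and total_words(kept) > max_words:
        full = coverage(kept)

        def loss(i):
            # Assumed contribution: coverage lost if sentence i is removed.
            return full - coverage(kept[:i] + kept[i + 1:])

        drop = min(range(len(kept)), key=loss)
        del kept[drop]
    return kept


candidates = [
    "the cat sat on the mat",
    "the cat sat",
    "dogs bark loudly at night",
]
summary = select_by_deletion(candidates, max_words=11)
print(summary)
```

Here the redundant second sentence contributes no new words, so it is the first to be deleted, which mirrors the abstract's idea of pruning a candidate set rather than growing a summary sentence by sentence.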
Abstract: A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, uses Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then computes sentence similarity. The sentences are clustered according to their similarity, and the centroid sentence is selected from each class. Finally, the selected sentences are ordered to generate the summary. The evaluation and results are presented, which prove that the proposed method is efficient.
Abstract: This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated from how extensively a sentence covers the main content of the text, and summaries are created by extracting the highest-scored sentences from the original document. The model is formalized as a multiobjective integer programming problem. An advantage of this model is that it can cover the main content of the source(s) and provide less redundancy in the generated summaries. To extract sentences that form a summary with extensive coverage of the main content and less redundancy, the model uses the similarity of sentences to the original document and the similarity between sentences. Performance evaluation is conducted by comparing summarization outputs with manual summaries of the DUC2004 dataset. Experiments showed that the proposed approach outperforms the related methods.