The development of various applications based on social network text is in full swing.Studying text features and classifications is of great value to extract important information.This paper mainly introduces the comm...The development of various applications based on social network text is in full swing.Studying text features and classifications is of great value to extract important information.This paper mainly introduces the common feature selection algorithms and feature representation methods,and introduces the basic principles,advantages and disadvantages of SVM and KNN,and the evaluation indexes of classification algorithms.In the aspect of mutual information feature selection function,it describes its processing flow,shortcomings and optimization improvements.In view of its weakness in not balancing the positive and negative correlation characteristics,a balance weight attribute factor and feature difference factor are introduced to make up for its deficiency.The experimental stage mainly describes the specific process:the word segmentation processing,to disuse words,using various feature selection algorithms,including optimized mutual information,and weighted with TF-IDF.Under the two classification algorithms of SVM and KNN,we compare the merits and demerits of all the feature selection algorithms according to the evaluation index.Experiments show that the optimized mutual information feature selection has good performance and is better than KNN under the SVM classification algorithm.This experiment proves its validity.展开更多
In this paper, the role of rare or infrequent terms in enhancing the accuracy of English Text Categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms in enhancing the accuracy...In this paper, the role of rare or infrequent terms in enhancing the accuracy of English Text Categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms in enhancing the accuracy of PNs-based text categorization, different term reduction criteria as well as different term weighting schemes were experimented on the Reuters Corpus using PNs. Each term weighting scheme on each reduced term set was tested once keeping the rare terms and another time removing them. All the experiments conducted in this research show that keeping rare terms substantially improves the performance of Polynomial Networks in Text Categorization, regardless of the term reduction method, the number of terms used in classification, or the term weighting scheme adopted.展开更多
Reliability parameter selection is very important in the period of equipment project design and demonstration. In this paper, the problem in selecting the reliability parameters and their number is proposed. In order ...Reliability parameter selection is very important in the period of equipment project design and demonstration. In this paper, the problem in selecting the reliability parameters and their number is proposed. In order to solve this problem, the thought of text mining is used to extract the feature and curtail feature sets from text data firstly, and frequent pattern tree (FPT) of the text data is constructed to reason frequent item-set between the key factors by frequent patter growth (FPC) algorithm. Then on the basis of fuzzy Bayesian network (FBN) and sample distribution, this paper fuzzifies the key attributes, which forms associated relationship in frequent item-sets and their main parameters, eliminates the subjective influence factors and obtains condition mutual information and maximum weight directed tree among all the attribute variables. Furthermore, the hybrid model is established by reason fuzzy prior probability and contingent probability and concluding parameter learning method. Finally, the example indicates the model is believable and effective.展开更多
近年来场景文本检测技术飞速发展,提出一种可适用于任意形状文本检测的新颖算法Mask Text Detector.该算法在Mask R-CNN的基础上,用anchor-free的方法替代了原本的RPN层生成建议框,减少了超参、模型参数和计算量.还提出LQCS(Localizatio...近年来场景文本检测技术飞速发展,提出一种可适用于任意形状文本检测的新颖算法Mask Text Detector.该算法在Mask R-CNN的基础上,用anchor-free的方法替代了原本的RPN层生成建议框,减少了超参、模型参数和计算量.还提出LQCS(Localization Quality and Classification Score)joint regression,能够将坐标质量和类别分数关联到一起,消除预测阶段不一致的问题.为了让网络区分复杂样本,结合传统的边缘检测算法提出Socle-Mask分支生成分割掩码.该模块在水平和垂直方向上分区别提取纹理特征,并加入通道自注意力机制,让网络自主选择通道特征.我们在三个具有挑战性的数据集(Total-Text、CTW1500和ICDAR2015)中进行了广泛的实验,验证了该算法具有很好的文本检测性能.展开更多
文摘The development of various applications based on social network text is in full swing.Studying text features and classifications is of great value to extract important information.This paper mainly introduces the common feature selection algorithms and feature representation methods,and introduces the basic principles,advantages and disadvantages of SVM and KNN,and the evaluation indexes of classification algorithms.In the aspect of mutual information feature selection function,it describes its processing flow,shortcomings and optimization improvements.In view of its weakness in not balancing the positive and negative correlation characteristics,a balance weight attribute factor and feature difference factor are introduced to make up for its deficiency.The experimental stage mainly describes the specific process:the word segmentation processing,to disuse words,using various feature selection algorithms,including optimized mutual information,and weighted with TF-IDF.Under the two classification algorithms of SVM and KNN,we compare the merits and demerits of all the feature selection algorithms according to the evaluation index.Experiments show that the optimized mutual information feature selection has good performance and is better than KNN under the SVM classification algorithm.This experiment proves its validity.
文摘In this paper, the role of rare or infrequent terms in enhancing the accuracy of English Text Categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms in enhancing the accuracy of PNs-based text categorization, different term reduction criteria as well as different term weighting schemes were experimented on the Reuters Corpus using PNs. Each term weighting scheme on each reduced term set was tested once keeping the rare terms and another time removing them. All the experiments conducted in this research show that keeping rare terms substantially improves the performance of Polynomial Networks in Text Categorization, regardless of the term reduction method, the number of terms used in classification, or the term weighting scheme adopted.
基金the Weapon Equipment Beforehand Research Foundation of China(No.9140A19030314JB35275)the Army Technology Element Foundation of China(No.A157167)
文摘Reliability parameter selection is very important in the period of equipment project design and demonstration. In this paper, the problem in selecting the reliability parameters and their number is proposed. In order to solve this problem, the thought of text mining is used to extract the feature and curtail feature sets from text data firstly, and frequent pattern tree (FPT) of the text data is constructed to reason frequent item-set between the key factors by frequent patter growth (FPC) algorithm. Then on the basis of fuzzy Bayesian network (FBN) and sample distribution, this paper fuzzifies the key attributes, which forms associated relationship in frequent item-sets and their main parameters, eliminates the subjective influence factors and obtains condition mutual information and maximum weight directed tree among all the attribute variables. Furthermore, the hybrid model is established by reason fuzzy prior probability and contingent probability and concluding parameter learning method. Finally, the example indicates the model is believable and effective.
文摘近年来场景文本检测技术飞速发展,提出一种可适用于任意形状文本检测的新颖算法Mask Text Detector.该算法在Mask R-CNN的基础上,用anchor-free的方法替代了原本的RPN层生成建议框,减少了超参、模型参数和计算量.还提出LQCS(Localization Quality and Classification Score)joint regression,能够将坐标质量和类别分数关联到一起,消除预测阶段不一致的问题.为了让网络区分复杂样本,结合传统的边缘检测算法提出Socle-Mask分支生成分割掩码.该模块在水平和垂直方向上分区别提取纹理特征,并加入通道自注意力机制,让网络自主选择通道特征.我们在三个具有挑战性的数据集(Total-Text、CTW1500和ICDAR2015)中进行了广泛的实验,验证了该算法具有很好的文本检测性能.