Journal Articles
630 articles found
Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches (Cited by 2)
1
Authors: Koraljka Golub, Johan Hagelback, Anders Ardo. Journal of Data and Information Science, CSCD, 2020, No. 1, pp. 18-38 (21 pages)
Purpose: With more and more digital collections of various information resources becoming available, also increasing is the challenge of assigning subject index terms and classes from quality knowledge organization systems. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that Support Vector Machine with linear kernel outperforms the other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than Support Vector Machine, but reach close results, with the benefit of a smaller representation size. Impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often less than 100 per class) on which to train are available, and these hold only for the top three hierarchical levels (803 instead of 14,413 classes). Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC because of the lack of training data for all classes skews the results so that they work in experimental conditions but barely for end users in operational retrieval systems. Practical implications: In conclusion, for operative information retrieval systems applying purely automatic DDC does not work, either using machine learning (because of the lack of training data for the large number of DDC classes) or using the string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms in order to enhance the accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. In order for quality information services to reach the objective of highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way for the future. Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to the lack of sufficient training data across the entire set of classes, an approach complementing machine learning, that of string matching, was applied. This combination should be explored further since it provides the potential for real-life applications with large target classification systems.
Keywords: LIBRIS; Dewey Decimal Classification; automatic classification; machine learning; Support Vector Machine; Multinomial Naive Bayes; simple linear network; standard neural network; 1D convolutional neural network; recurrent neural network; word embeddings; string matching
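The string-matching alternative this entry evaluates can be illustrated with a minimal sketch: a record's text is matched against the caption vocabulary of each DDC class and the best-scoring class wins. The three captions below are invented stand-ins for the real DDC schedule, and the whitespace tokenization is deliberately naive.

```python
# Minimal sketch of caption-based string matching for DDC class
# assignment. The captions are illustrative, NOT the real DDC schedule.
DDC_CAPTIONS = {
    "004": {"computer", "science", "data", "processing"},
    "599": {"mammals", "zoology"},
    "781": {"music", "theory", "principles"},
}

def match_ddc(text):
    """Score each class by how many of its caption terms occur in the text."""
    tokens = set(text.lower().split())
    scores = {cls: len(terms & tokens) for cls, terms in DDC_CAPTIONS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no match -> no class assigned

print(match_ddc("Parallel data processing in computer clusters"))  # prints: 004
```

As the abstract notes, this style of matching works well only for classes whose captions happen to describe their literature, which is why it complements rather than replaces machine learning.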
Correct Classification Rates in Multi-Category Discriminant Analysis of Spatial Gaussian Data (Cited by 1)
2
Authors: Lina Dreiziene, Kestutis Ducinskas, Laura Paulioniene. Open Journal of Statistics, 2015, No. 7... No. 1, pp. 21-26 (6 pages)
This paper discusses the problem of classifying a multivariate Gaussian random field observation into one of several categories specified by different parametric mean models. The investigation concerns a classifier based on the plug-in Bayes classification rule (PBCR), formed by replacing the unknown parameters in the Bayes classification rule (BCR) with category parameter estimators. This extends previous work from the two-category case to the multi-category case. Novel closed-form expressions for the Bayes classification probability and the actual correct classification rate associated with the PBCR are derived. These correct classification rates are suggested as performance measures for the classification procedure. An empirical study has been carried out to analyze the dependence of the derived classification rates on category parameters.
Keywords: Gaussian random field; Bayes classification rule; pairwise discriminant function; actual correct classification rate
Parallel naive Bayes algorithm for large-scale Chinese text classification based on Spark (Cited by 22)
3
Authors: LIU Peng, ZHAO Hui-han, TENG Jia-yu, YANG Yan-yan, LIU Ya-feng, ZHU Zong-wei. Journal of Central South University, SCIE EI CAS CSCD, 2019, No. 1, pp. 1-12 (12 pages)
The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the processing time of classification on these data. In order to solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the resilient distributed dataset (RDD) programming model. For comparison, a PNBA based on Hadoop is also implemented. Test results show that in the same computing environment and on the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability. Therefore, Spark-based parallel algorithms can better meet the requirements of large-scale Chinese text data mining.
Keywords: Chinese text classification; naive Bayes; Spark; Hadoop; resilient distributed dataset; parallelization
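The parallelization pattern this entry describes, per-partition term counting followed by a reduceByKey-style merge of (label, term) counts, can be sketched in plain Python. The `partitions` data is invented, and stdlib `map`/`reduce` stand in for the actual Spark RDD operations.

```python
from collections import defaultdict
from functools import reduce

def map_partition(docs):
    """Map step: count (label, term) pairs within one data partition."""
    counts = defaultdict(int)
    for label, text in docs:
        for term in text.split():
            counts[(label, term)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step: merge two partial count dictionaries (reduceByKey analogue)."""
    merged = defaultdict(int, a)
    for key, n in b.items():
        merged[key] += n
    return merged

# Two toy "partitions" of labeled documents.
partitions = [
    [("sports", "ball game"), ("tech", "spark spark")],
    [("sports", "ball team")],
]
total = reduce(reduce_counts, map(map_partition, partitions))
print(total[("sports", "ball")])  # prints: 2
```

The merged counts are exactly what a multinomial naive Bayes trainer needs to estimate per-class term likelihoods, which is why the training phase parallelizes so cleanly.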
Automatically Constructing an Effective Domain Ontology for Document Classification (Cited by 2)
4
Author: Yi-Hsing Chang. Computer Technology and Application, 2011, No. 3, pp. 182-189 (8 pages)
A method for automatically constructing an effective domain ontology is proposed in this paper. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the basis for a Naive Bayes classifier, in order to verify the ontology's effectiveness for document classification. 1,752 documents divided into 10 categories are used to assess the ontology, with 1,252 training documents and 500 testing documents. The F1-measure is used as the assessment criterion, and three results are obtained. The average recall of the Naive Bayes classifier is 0.94; in terms of recall, its performance based on the automatically constructed ontology is therefore excellent. The average precision is 0.81; in terms of precision, the performance is good. The average F1-measure over the 10 categories is 0.86, so the classifier is also effective in terms of F1-measure. Thus, the automatically constructed domain ontology can indeed serve as the set of document categories and achieves effective document classification.
Keywords: Naive Bayes classifier; ontology; formal concept analysis; document classification
Roman Urdu News Headline Classification Empowered with Machine Learning (Cited by 2)
5
Authors: Rizwan Ali Naqvi, Muhammad Adnan Khan, Nauman Malik, Shazia Saqib, Tahir Alyas, Dildar Hussain. Computers, Materials & Continua, SCIE EI, 2020, No. 11, pp. 1221-1236 (16 pages)
Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak Subcontinent. People from the subcontinent may speak the same Urdu language but use different scripts for writing. Communication using Roman characters to write Urdu on social media has become the most typical standard of communication in the region, which makes it a rich information supply. English text classification is a solved problem, but there have been only a few efforts to examine the rich information supply of Roman Urdu in the past, due to the numerous complexities involved in processing Roman Urdu data. These complexities include the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and on social media websites like Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier, which helps classify news into relevant categories on which further analysis and modeling can be done. This research aims to develop a Roman Urdu news classifier that classifies news into five categories (health, business, technology, sports, international). First, the news dataset is built using scraping tools; after preprocessing, the results of different machine learning algorithms are compared: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). A phonetic algorithm is then used to control lexical variation and to test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data. Among the algorithms applied, the Multinomial Naïve Bayes classifier gives the best accuracy, 90.17%, owing to its handling of noise and lexical variation.
Keywords: Roman Urdu news headline classification; long short-term memory; recurrent neural network; logistic regression; multinomial naïve Bayes; random forest; k-nearest neighbor; gradient boosting classifier
Naïve Bayes Algorithm for Large Scale Text Classification
6
Authors: Pirunthavi SIVAKUMAR, Jayalath EKANAYAKE. Instrumentation, 2021, No. 4, pp. 55-62 (8 pages)
This paper proposes an improved Naïve Bayes classifier for sentiment analysis on a large-scale dataset such as YouTube. YouTube contains large numbers of unstructured and unorganized comments and reactions, which carry important information. Organizing such large amounts of data and extracting useful information is a challenging task; the extracted information can be considered new knowledge and can be used for decision-making. We extract comments on YouTube videos, categorize them by domain, and then apply the Naïve Bayes classifier with the improved techniques. Our method achieves a decent 80% accuracy in classifying those comments. This experiment shows that the proposed method adapts well to large-scale text classification.
Keywords: Naïve Bayes; text classification; YouTube; sentiment analysis
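A minimal multinomial Naïve Bayes with Laplace smoothing, of the kind this entry applies to comment classification, can be sketched as follows. The training comments are invented examples, not the paper's YouTube data, and the paper's specific improvements are not reproduced.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Count class priors and per-class word frequencies from (label, text) pairs."""
    prior, words, vocab = Counter(), defaultdict(Counter), set()
    for label, text in docs:
        prior[label] += 1
        for w in text.split():
            words[label][w] += 1
            vocab.add(w)
    return prior, words, vocab

def classify(model, text):
    """Pick the class maximizing log prior + sum of smoothed log likelihoods."""
    prior, words, vocab = model
    n = sum(prior.values())
    def score(label):
        total = sum(words[label].values())
        s = math.log(prior[label] / n)
        for w in text.split():
            s += math.log((words[label][w] + 1) / (total + len(vocab)))  # Laplace
        return s
    return max(prior, key=score)

model = train([("pos", "great video love it"),
               ("pos", "love this great"),
               ("neg", "bad boring video"),
               ("neg", "terrible bad")])
print(classify(model, "love this video"))  # prints: pos
```

Laplace smoothing (the `+ 1`) keeps unseen words from zeroing out a class's probability, which matters for noisy comment text.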
An IoT-Cloud Based Intelligent Computer-Aided Diagnosis of Diabetic Retinopathy Stage Classification Using Deep Learning Approach (Cited by 9)
7
Authors: K. Shankar, Eswaran Perumal, Mohamed Elhoseny, Phong Thanh Nguyen. Computers, Materials & Continua, SCIE EI, 2021, No. 2, pp. 1665-1680 (16 pages)
Diabetic retinopathy (DR) is a disease with increasing prevalence and the major reason for blindness among the working-age population. The possibility of severe vision loss can be greatly reduced by timely diagnosis and treatment. Automated screening for DR has been identified as an effective method for early DR detection, which can decrease the workload associated with manual grading as well as save diagnosis costs and time. Several studies have been carried out to develop automated detection and classification models for DR. This paper presents a new IoT- and cloud-based deep learning model for healthcare diagnosis of DR. The proposed model incorporates data collection, preprocessing, segmentation, feature extraction, and classification. First, in the IoT-based data collection process, the patient wears a head-mounted camera to capture the retinal fundus image and send it to a cloud server. The contrast of the input DR image is then increased in the preprocessing stage using the Contrast Limited Adaptive Histogram Equalization (CLAHE) model. Next, the preprocessed image is segmented using the Adaptive Spatial Kernel distance measure-based Fuzzy C-Means clustering (ASKFCM) model. Afterwards, a deep Convolutional Neural Network (CNN) based on the Inception v4 model is applied as a feature extractor, and the resulting feature vectors are classified with a Gaussian Naive Bayes (GNB) model. The proposed model was tested on the benchmark MESSIDOR DR image dataset, and the results showed superior performance over the other models compared in the study.
Keywords: deep learning; classification; Gaussian Naive Bayes; feature extraction; diabetic retinopathy
Selecting and applying quantification models for ecosystem services to forest ecosystems in South Korea (Cited by 1)
8
Authors: Hyun-Ah Choi, Woo-Kyun Lee, Cholho Song, Nicklas Forsell, Seongwoo Jeon, Joon Soon Kim, So Ra Kim. Journal of Forestry Research, SCIE CAS CSCD, 2016, No. 6, pp. 1373-1384 (12 pages)
There is growing interest in using ecosystem services to aid development of management strategies that target sustainability and enhance ecosystem support to humans. Challenges remain in the search for methods and indicators that can quantify ecosystem services using metrics that are meaningful in light of their high priorities. We developed a framework to link ecosystems to human wellbeing based on a stepwise approach. We evaluated prospective models in terms of their capacity to quantify national ecosystem services of forests, and the most applicable models were subsequently used to quantify those services. The Korea Forest Research Institute model satisfied all criteria in its first practical use. A total of 12 key ecosystem services were identified. For our case study, we quantified four ecosystem functions, viz. water storage capacity in forest soil for the water storage service, reduced suspended sediment for the water purification service, reduced soil erosion for the landslide prevention service, and reduced sediment yield for the sediment regulation service. Water storage capacity in forest soil was estimated at 2,142 t/ha, and reduced suspended sediment at 608 kg/ha. Reduced soil erosion was estimated at 77 m^3/ha, and reduced sediment yield at 285 m^3/ha. These results were similar to those reported by previous studies. Mapped results revealed hotspots of ecosystem services around protected areas that were particularly rich in biodiversity. In addition, the proposed framework illustrated that quantification of ecosystem services could be supported by the spatial flow of ecosystem services. However, our approach did not address challenges faced when quantifying connections between ecosystem indicators and the actual benefits of the services described.
Keywords: classification; ecosystem services; quantification; stepwise approach
Statistical Classification Using the Maximum Function (Cited by 1)
9
Authors: T. Pham-Gia, Nguyen D. Nhat, Nguyen V. Phong. Open Journal of Statistics, 2015, No. 7, pp. 665-679 (15 pages)
The maximum of k numerical functions defined on a common domain is used here in statistical classification. Previously, it has been used in statistical discrimination [1] and in clustering [2]. We first present some theoretical results on this function, and then its application in classification using a computer program we have developed. This approach leads to clear decisions, even in cases where the extension to several classes of Fisher's linear discriminant function fails to be effective.
Keywords: maximum; discriminant function; pattern classification; normal distribution; Bayes error; L1-norm; linear; quadratic; space curves
Decision Tree and Naive Bayes Algorithm for Classification and Generation of Actionable Knowledge for Direct Marketing
10
Authors: Masud Karim, Rashedur M. Rahman. Journal of Software Engineering and Applications, 2013, No. 4, pp. 196-206 (11 pages)
Many companies, such as credit card, insurance, banking, and retail businesses, require direct marketing, and data mining can help those institutions set marketing goals: data mining techniques improve the targeting of audiences and the likelihood of response. In this work we investigate two data mining techniques, the Naive Bayes and the C4.5 decision tree algorithms. The goal is to predict whether a client will subscribe to a term deposit, and we also make a comparative study of the performance of the two algorithms. Publicly available UCI data is used to train and test them. In addition, we extract actionable knowledge from the decision tree, aimed at supporting interesting and important business decisions.
Keywords: CRM; actionable knowledge; data mining; C4.5; Naive Bayes; ROC; classification
DDoS Attack Detection Using Heuristics Clustering Algorithm and Naive Bayes Classification
11
作者 Sharmila Bista Roshan Chitrakar 《Journal of Information Security》 2018年第1期33-44,共12页
In recent times, among the multitude of attacks present in network systems, DDoS attacks have emerged as those with the most devastating effects. The main objective of this paper is to propose a system that effectively detects DDoS attacks in any networked system using the clustering technique of data mining followed by classification. The method uses a Heuristics Clustering Algorithm (HCA) to cluster the available data and Naïve Bayes (NB) classification to classify the data and detect the attacks based on network attributes of the data packets. The clustering algorithm is based on an unsupervised learning technique and is sometimes unable to detect some attack instances and a few normal instances; classification techniques are therefore used along with clustering to overcome this problem and to enhance accuracy. Naïve Bayes classifiers rest on very strong independence assumptions, with a fairly simple construction for deriving the conditional probability of each relationship. A series of experiments was performed using "The CAIDA UCSD DDoS Attack 2007 Dataset" and "DARPA 2000 Dataset", and the efficiency of the proposed system was tested on the following performance parameters: accuracy, detection rate, and false positive rate. The proposed system was found to have enhanced accuracy and detection rate with a low false positive rate.
Keywords: DDoS attacks; heuristic clustering algorithm; Naive Bayes classification; CAIDA UCSD; DARPA 2000
Comparative Analysis of Machine Learning Algorithms for Optimal Land Use and Land Cover Classification: Guiding Method Selection for Resource-Limited Settings in Tiaty, Baringo County, Kenya
12
Authors: John Kapoi Kipterer, Mark K. Boitt, Charles N. Mundia. Journal of Geoscience and Environment Protection, 2025, No. 4, pp. 393-414 (22 pages)
Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Remote sensing advancements increase the potential for classifying land cover, which requires assessing algorithm accuracy and efficiency for fragile environments. This research identifies the best algorithms for LULC monitoring and develops adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 facilitated land use class identification. Preprocessing in Google Earth Engine applied spectral indices such as NDVI, NDWI, BSI, and NDBI. Supervised classification used random forest (RF), support vector machine (SVM), classification and regression trees (CART), gradient boosting trees (GBT), and naïve Bayes. An accuracy assessment determined the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a 0.85 weighted F1 score, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68), indicating overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is optimal for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
Keywords: Support Vector Machine; Random Forest; Classification and Regression Trees; Gradient Boosting Trees; Naïve Bayes; semiarid; weighted F1 score; land use and land cover
A Bayesian Network Classifier with Spreading Activation
13
Authors: 董飒, 刘杰, 刘大有, 李婷婷, 徐海啸, 吴旗, 欧阳若川. Journal of Jilin University (Information Science Edition), 2025, No. 2, pp. 317-326 (10 pages)
Relational classifiers for networked data classification rest on the homophily assumption, and the simplification based on the first-order Markov assumption has certain limitations. To address this, a local graph-ranking spreading-activation method is introduced into the Bayesian network classifier to replace the original direct-neighborhood acquisition. By setting an initial energy value and a minimum energy threshold, the neighborhood of the node to be classified is appropriately enlarged, which increases node homophily. Combined with collective inference via relaxation labeling, the proposed spreading-activation classifier ASNBC (Activation Spreading Network Bayes Classifier) improves the classification accuracy of networked data to a certain extent. Comparative experiments against four network classifiers show accuracy improvements, to varying degrees, on six network datasets.
Keywords: artificial intelligence; networked data classification; spreading activation; Bayesian network classifier; collective inference
An Empirical Study of Intelligent Classification of Chinese Books Based on Machine Learning
14
Author: 夏丹. New Century Library, 2025, No. 2, pp. 45-52 (8 pages)
To improve the efficiency of book classification checking, this paper uses the Chinese bibliographic records of a university library as the data source, with content summaries, subject terms, and titles as sources of feature words. Feature words are weighted according to their source position and their frequencies are counted, and a book/feature-word sparse matrix is constructed. Naive Bayes computation is applied, with a proportional split, to the sparse matrix carrying book call numbers to find the most probable class for each book, and the trained classification model is evaluated. Experimental results show that the intelligent book classification model based on the naive Bayes algorithm with weighted, curated feature words is practical, and is an effective, feasible way to make the work of the acquisition and cataloging department more intelligent and efficient.
Keywords: machine learning; naive Bayes; intelligent book classification; Chinese books
Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification (Cited by 3)
15
Authors: Mitat Poyraz, Zeynep Hilal Kilimci, Murat Can Ganiz. Journal of Computer Science & Technology, SCIE EI CSCD, 2014, No. 3, pp. 376-391 (16 pages)
It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents; higher-order relations in LSI capture "latent semantics". These findings inspired a novel Bayesian framework for classification named Higher-Order Naive Bayes (HONB), introduced previously, which can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows semantics in higher-order paths to be exploited. We take the concept one step further in HOS and exploit the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when dealing with insufficient labeled data. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
Keywords: Naive Bayes; semantic smoothing; higher-order Naive Bayes; higher-order smoothing; text classification
A new feature selection method for handling redundant information in text classification (Cited by 3)
16
Authors: You-wei WANG, Li-zhou FENG. Frontiers of Information Technology & Electronic Engineering, SCIE EI CSCD, 2018, No. 2, pp. 221-234 (14 pages)
Feature selection is an important approach to dimensionality reduction in the field of text classification. Because of the difficulty of handling the problem that selected features always contain redundant information, we propose a new, simple feature selection method that can effectively filter out redundant features. First, to calculate the relationship between two words, definitions of word-frequency-based relevance and correlative redundancy are introduced. Furthermore, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, to improve execution speed, the redundant features in FS1 are filtered using a predetermined threshold, and the filtered features are memorized in linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578), with support vector machines and naïve Bayes as classifiers. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and the OFS methods. Moreover, the proposed method runs faster than typical mutual-information-based methods (improved and normalized mutual-information-based feature selection, and multilabel feature selection based on maximum dependency and minimum redundancy) while ensuring classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
Keywords: feature selection; dimensionality reduction; text classification; redundant features; support vector machine; naïve Bayes; mutual information
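The select-then-filter idea in this entry (rank features by a relevance score, then drop candidates that are redundant with an already-selected feature) can be sketched as follows. Jaccard overlap of document occurrences stands in for the paper's word-frequency-based redundancy measure, and all data is invented.

```python
def select_features(doc_sets, relevance, threshold=0.8):
    """Greedy selection: take features in decreasing relevance order,
    skipping any whose document-occurrence overlap with an already
    selected feature exceeds the redundancy threshold.

    doc_sets:  feature -> set of document ids containing it
    relevance: feature -> relevance score (higher is better)
    """
    selected = []
    for f in sorted(relevance, key=relevance.get, reverse=True):
        redundant = any(
            len(doc_sets[f] & doc_sets[g]) / len(doc_sets[f] | doc_sets[g]) > threshold
            for g in selected)
        if not redundant:
            selected.append(f)
    return selected

# "spark" and "hadoop" co-occur in exactly the same documents, so the
# lower-scoring one is filtered out as redundant.
doc_sets = {"spark": {1, 2, 3}, "hadoop": {1, 2, 3}, "bayes": {4, 5}}
relevance = {"spark": 0.9, "hadoop": 0.8, "bayes": 0.5}
print(select_features(doc_sets, relevance))  # prints: ['spark', 'bayes']
```

The greedy pass keeps the selection step linear in the number of candidates per already-selected feature, mirroring the speed motivation in the abstract.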
Predictive Validity of the Brief Visuospatial Memory Test-Revised for Psychosis Conversion in the Clinical High-Risk Population
17
Authors: 熊凌川, 崔慧茹, 徐丽华, 魏燕燕, 张丹, 钱禛颖, 唐莺莹, 张天宏, 王继军. Chinese Journal of Nervous and Mental Diseases, PKU Core, 2025, No. 9, pp. 528-534 (7 pages)
Objective: To explore the role of the Brief Visuospatial Memory Test-Revised (BVMT-R) in predicting conversion to psychosis among individuals at clinical high risk for psychosis (CHR-P). Methods: 217 CHR-P participants were recruited and assessed with the BVMT-R at baseline, then followed up for 3 years to determine whether conversion to psychosis occurred. A generalized additive model was used to analyze the relationship between BVMT-R total score and the probability of conversion, and the maximally selected rank statistics method was used to compute cut-off values of the BVMT-R total score for predicting conversion. The total score was divided into intervals according to these cut-offs, and the positive likelihood ratio of each interval and the conversion rates at different time points were calculated. Results: 168 CHR-P participants completed the 3-year follow-up. The generalized additive model showed that the relationship between BVMT-R total score and conversion probability followed a piecewise-function pattern. The maximally selected rank statistics method identified cut-offs of 18 and 29, dividing the total score into three intervals: 0-18, 19-29, and 30-36. The positive likelihood ratios of the three intervals for predicting conversion differed significantly between each pair of intervals (all P<0.01), as did the conversion rates of the three intervals at different follow-up time points (all P<0.01). Conclusion: The BVMT-R total score can be divided into three intervals, each with a distinct positive likelihood ratio for predicting conversion; on this basis, the BVMT-R total score can preliminarily predict the probability of conversion to psychosis in the CHR-P population.
Keywords: psychosis; clinical high-risk syndrome; Brief Visuospatial Memory Test-Revised; conversion; generalized additive model; maximally selected rank statistics; Bayes classification; prediction
A Misclassification-Corrected Naive Bayes Classifier and Its Application to Industry Classification of Government Hotline Records
18
Authors: 官国宇, 杨皓翔, 王运豪, 郝立柱. Application of Statistics and Management, PKU Core, 2025, No. 1, pp. 179-190 (12 pages)
Traditional statistical classification methods exhibit a degree of systematic bias when applied to industry text classification of government hotline records. To correct this bias and thereby reduce the extra labor and time cost caused by misclassification, this paper takes the naive Bayes model as the baseline classifier, introduces correction coefficients into the maximum a posteriori decision rule, and learns the coefficients from misclassification results on a validation set, applying the method to industry text classification of government hotline records. Empirical results show that the corrected classifier improves classification precision over the baseline by at least 1 percentage point and reduces the number of misclassified samples by 4 percentage points. Given the huge volume of hotline text tickets, the method has practical value for improving administrative service efficiency and lowering labor costs.
Keywords: naive Bayes; government hotline; text classification; correction coefficient
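The corrected decision rule this entry describes, a class-wise correction coefficient multiplied into the posterior before the argmax, can be sketched as below. The coefficient update shown (shrink over-predicted classes, boost the true class on validation errors) is an illustrative stand-in for the paper's learning procedure, and all data is invented.

```python
def corrected_argmax(posteriors, coeffs):
    """MAP decision with a per-class correction coefficient."""
    return max(posteriors, key=lambda c: posteriors[c] * coeffs[c])

def learn_coeffs(val_posteriors, val_labels, classes, lr=0.1, epochs=20):
    """Nudge coefficients on each validation misclassification."""
    coeffs = {c: 1.0 for c in classes}
    for _ in range(epochs):
        for post, truth in zip(val_posteriors, val_labels):
            pred = corrected_argmax(post, coeffs)
            if pred != truth:  # over-predicted class shrinks, true class grows
                coeffs[pred] -= lr * coeffs[pred]
                coeffs[truth] += lr * coeffs[truth]
    return coeffs

# A baseline NB that systematically over-predicts class "a":
val_posteriors = [{"a": 0.6, "b": 0.4}, {"a": 0.55, "b": 0.45}]
val_labels = ["b", "b"]
coeffs = learn_coeffs(val_posteriors, val_labels, ["a", "b"])
print(corrected_argmax({"a": 0.6, "b": 0.4}, coeffs))  # prints: b
```

The key point matches the abstract: the base posteriors are left untouched, and only the decision rule is reweighted using validation-set errors.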
Text Classification of Enterprise-Support Policies Based on Multinomial Naive Bayes
19
Authors: 马建淮, 喻金平. Computer and Information Technology, 2025, No. 2, pp. 27-30 (4 pages)
Multinomial naive Bayes is a statistical classification technique commonly used in machine learning. Applied to text classification, the algorithm predicts a text's label from the frequencies of feature words appearing in the text. For the problem of classifying enterprise-support policy texts, i.e., automatically assigning large numbers of policy documents to their subject categories, a method combining the TF-IDF algorithm with multinomial naive Bayes is proposed. First, texts are segmented with the jieba tokenizer, and a stop-word list is loaded to filter out unimportant common words. Then TF-IDF is used to extract feature words, and the TF-IDF values combined with chi-square statistics form the feature vectors fed into the multinomial naive Bayes classifier. Experimental results show that the model reaches a classification accuracy of 0.88 on enterprise-support policy texts; across the 10 text categories, per-category accuracy ranges from a low of 0.82 to a high of 0.95, a satisfactory classification result.
Keywords: multinomial naive Bayes; TF-IDF; enterprise-support policies; text classification
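The TF-IDF weighting stage this entry feeds into multinomial naive Bayes can be sketched as follows. The tokenized "policy" documents are invented, and the paper's jieba segmentation, stop-word filtering, and chi-square combination are omitted.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists -> list of {term: tf-idf weight} per document."""
    n = len(docs)
    # document frequency: number of documents each term appears in
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = [["tax", "relief", "policy"],
        ["loan", "subsidy", "policy"],
        ["tax", "credit", "policy"]]
vecs = tfidf(docs)
print(round(vecs[0]["tax"], 3))  # tf = 1/3, idf = log(3/2): prints 0.135
print(vecs[0]["policy"])         # appears in every doc, idf = 0: prints 0.0
```

A term like "policy" that occurs in every document gets zero weight, which is exactly why TF-IDF helps downweight domain-wide filler before the naive Bayes step.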
Accurate Classification of Multi-PLC Communication Data Based on Weighted Naive Bayes
20
Authors: 李瑞平, 楚贝贝, 曹威. Telecom Power Technology, 2025, No. 3, pp. 19-21 (3 pages)
To address the problem of accurately classifying communication data from multiple programmable logic controllers (PLCs), this study introduces weighted naive Bayes. Data preprocessing, including cleaning and standardization, lays a solid foundation for subsequent analysis. Time-series fuzzy segmentation divides the complex communication data into fuzzy segments with clear characteristics, effectively extracting the data's internal structure. On this basis, the weighted naive Bayes algorithm integrates and classifies the communication data by jointly considering the importance and weight of each feature. Comparative experiments show that the method achieves markedly higher classification precision than existing classification algorithms, offering a new approach to the intelligent analysis and processing of PLC communication data. The work not only improves the classification precision of multi-PLC communication data but also provides solid support for research and applications in related fields.
Keywords: weighted naive Bayes; communication; accurate classification; data; programmable logic controller (PLC)
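The weighted naive Bayes decision rule this entry relies on, scaling each feature's log-likelihood contribution by a per-feature weight so that informative attributes dominate, can be sketched as below. The weights, priors, and likelihood tables are invented for illustration; the paper derives its weights from feature importance on PLC traffic.

```python
import math

def weighted_nb(sample, likelihoods, priors, weights):
    """Weighted NB: argmax over log prior + sum of weight * log P(f=v | c).

    likelihoods[c][f][v] = P(feature f takes value v | class c)
    """
    def score(c):
        s = math.log(priors[c])
        for f, v in sample.items():
            s += weights[f] * math.log(likelihoods[c][f][v])
        return s
    return max(priors, key=score)

priors = {"normal": 0.7, "fault": 0.3}
likelihoods = {
    "normal": {"latency": {"low": 0.8, "high": 0.2},
               "errors":  {"few": 0.9, "many": 0.1}},
    "fault":  {"latency": {"low": 0.3, "high": 0.7},
               "errors":  {"few": 0.2, "many": 0.8}},
}
weights = {"latency": 0.5, "errors": 2.0}  # "errors" deemed more informative
print(weighted_nb({"latency": "low", "errors": "many"},
                  likelihoods, priors, weights))  # prints: fault
```

Setting all weights to 1.0 recovers plain naive Bayes, so the weighting is a strict generalization of the baseline classifier.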