603 articles found
Multimodal Deep Neural Networks for Digitized Document Classification
1
Authors: Aigerim Baimakhanova, Ainur Zhumadillayeva, Bigul Mukhametzhanova, Natalya Glazyrina, Rozamgul Niyazova, Nurseit Zhunissov, Aizhan Sambetbayeva. Computer Systems Science & Engineering, 2024, No. 3, pp. 793-811 (19 pages)
As digital technologies have advanced, the number of paper documents converted into digital format has increased exponentially. To respond to the urgent need to categorize the growing number of digitized documents, real-time classification of digitized documents is the primary goal of our study. Paper classification is the first stage in automating document control and efficient knowledge discovery with little or no human involvement. Artificial intelligence methods such as deep learning are now combined with segmentation to study and interpret traits in ways that were not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes may be predicted. The segmentation process divides the input image into separate segments for a more thorough image study. This study proposes a deep learning-enabled framework for automated document classification, which can be implemented in higher education. To further this goal, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. Subsequently, a deep learning model based on Conv2D layers is proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine-learning techniques. The results demonstrate that the proposed deep learning model outperforms the other machine learning models in document categorization, reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. The achieved results show that the proposed deep model is acceptable for use in practice as an assistant to an office worker.
Keywords: document categorization; deep learning; machine learning; classification; digitization
Document classification approach by rough-set-based corner classification neural network (cited by: 1)
2
Authors: Zhang Weifeng (张卫丰), Xu Baowen (徐宝文), Cui Zifeng (崔自峰), Xu Junling (徐峻岭). Journal of Southeast University (English Edition) [EI, CAS], 2006, No. 3, pp. 439-444 (6 pages)
A rough-set-based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as document representation for different document sizes, document feature selection, and document feature encoding. In the Rough-CC4, documents are described by the equivalence classes of approximate words. This reduces the dimensions representing the documents, which solves the precision problems caused by different document sizes and also blurs the differences caused by approximate words. The Rough-CC4 also introduces a binary encoding method, through which the importance of documents relative to each equivalence class is encoded. With this encoding, the precision of the Rough-CC4 improves greatly and its space complexity is reduced. The Rough-CC4 can be used in automatic classification of documents.
Keywords: document classification; neural network; rough set; meta search engine
Study on Multi-Label Classification of Medical Dispute Documents (cited by: 2)
3
Authors: Baili Zhang, Shan Zhou, Le Yang, Jianhua Lv, Mingjun Zhong. Computers, Materials & Continua [SCIE, EI], 2020, No. 12, pp. 1975-1986 (12 pages)
The Internet of Medical Things (IoMT) will come to be of great importance in the mediation of medical disputes, as it is emerging as the core of intelligent medical treatment. First, IoMT can track the entire medical treatment process in order to provide detailed trace data in medical dispute resolution. Second, IoMT can infiltrate the ongoing treatment and provide timely intelligent decision support to medical staff. This information includes recommendation of similar historical cases, guidance for medical treatment, alerts about hired dispute profiteers, etc. The multi-label classification of medical dispute documents (MDDs) plays an important role as a front-end process for intelligent decision support, especially in the recommendation of similar historical cases. However, MDDs usually appear as long texts containing a large amount of redundant information, and there is a serious distribution imbalance in the dataset, which directly leads to weaker classification performance. Accordingly, in this paper, a multi-label classification method based on key sentence extraction is proposed for MDDs. The method is divided into two parts. First, the attention-based hierarchical bi-directional long short-term memory (BiLSTM) model is used to extract key sentences from documents; second, random comprehensive sampling Bagging (RCS-Bagging), an ensemble multi-label classification model, is employed to classify MDDs based on key sentence sets. The use of this approach greatly improves the classification performance. Experiments show that the performance of the two models proposed in this paper is remarkably better than that of the baseline methods.
Keywords: Internet of Medical Things (IoMT); medical disputes; medical dispute document (MDD); multi-label classification (MLC); key sentence extraction; class imbalance
Automatically Constructing an Effective Domain Ontology for Document Classification (cited by: 2)
4
Author: Yi-Hsing Chang. Computer Technology and Application, 2011, No. 3, pp. 182-189 (8 pages)
This paper proposes an effective, automatically constructed domain ontology. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the base for a Naive Bayes classifier in order to prove the effectiveness of the domain ontology for document classification. A total of 1752 documents divided into 10 categories are used to assess the effectiveness of the ontology, with 1252 training documents and 500 testing documents. The F1-measure serves as the assessment criterion, and the following three results are obtained. The average recall of the Naive Bayes classifier is 0.94, so in terms of recall the classifier performs excellently on the automatically constructed ontology. The average precision of the Naive Bayes classifier is 0.81, so in terms of precision the classifier performs well on the automatically constructed ontology. The average F1-measure over the 10 categories is 0.86, so in terms of F1-measure the classifier is effective on the automatically constructed ontology. Thus, the automatically constructed domain ontology can indeed serve as the document categories and is effective for document classification.
Keywords: Naive Bayes classifier; ontology; formal concept analysis; document classification
Integrating Intra- and Inter-document Evidences for Improving Sentence Sentiment Classification (cited by: 6)
5
Authors: ZHAO Yan-Yan, QIN Bing, LIU Ting. Acta Automatica Sinica (自动化学报) [EI, CSCD, PKU Core], 2010, No. 10, pp. 1417-1425 (9 pages)
Keywords: digital camera; pixel; Fuji; optical zoom
Automatic Arabic Document Classification via kNN
6
Author: HANI M. O. Iwidat. Computer Aided Drafting, Design and Manufacturing, 2008, No. 2, pp. 65-73 (9 pages)
Many algorithms have been implemented for the problem of document categorization. The majority of work in this area has been done for English text, while very few approaches have been introduced for Arabic text. The nature of Arabic text is different from that of English text, and preprocessing Arabic text is more challenging. This is because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to solve the keyword extraction and reduction problems by using the Document Frequency (DF) threshold method. The results indicate that kNN handles Arabic text better than the other existing systems. The proposed system reached a micro-recall score of 0.95 with 850 Arabic texts in 6 different categories.
Keywords: Arabic documents classification; kNN; vector model; keywords extraction
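The pipeline this abstract describes (a Document Frequency threshold to prune the vocabulary, followed by kNN over a vector model) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the `min_df` and `k` values, the use of raw term counts with cosine similarity, and the toy English tokens are all assumptions made for this example, and no Arabic preprocessing is shown.

```python
import math
from collections import Counter

def select_terms(docs, min_df):
    """Keep terms that occur in at least min_df training documents (DF threshold)."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, n in df.items() if n >= min_df}

def vectorize(doc, vocab):
    """Term-count vector restricted to the selected vocabulary."""
    return Counter(term for term in doc if term in vocab)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train_docs, train_labels, query, k=3, min_df=2):
    """Label the query by majority vote among its k most similar training docs."""
    vocab = select_terms(train_docs, min_df)
    vectors = [vectorize(d, vocab) for d in train_docs]
    q = vectorize(query, vocab)
    ranked = sorted(zip(vectors, train_labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-tokenized toy corpus (placeholder tokens only).
train_docs = [
    ["match", "goal", "team"],
    ["team", "goal", "league"],
    ["bank", "market", "stock"],
    ["market", "stock", "price"],
]
train_labels = ["sports", "sports", "economy", "economy"]
prediction = knn_predict(train_docs, train_labels, ["goal", "team", "coach"])
```

With `min_df=2`, terms seen in only one training document (such as "match" or "price") are dropped before any similarity is computed, which is the dimensionality reduction the DF threshold is meant to provide.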
An improved TF-IDF approach for text classification (cited by: 6)
7
Authors: Zhang Yuntao (张云涛), Gong Ling (龚玲), Wang Yongcheng (王永成). Journal of Zhejiang University-Science A (Applied Physics & Engineering) [SCIE, EI, CAS, CSCD], 2005, No. 1, pp. 49-55 (7 pages)
This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach which uses confidence, support, and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are processed in the improved TF-IDF approach. We discuss and analyze in detail the relationship among confidence, recall, and precision. Experiments on science and technology texts gave promising results: the new TF-IDF approach improves the precision and recall of text classification compared with the conventional TF-IDF approach.
Keywords: Term frequency/inverse document frequency (TF-IDF); text classification; confidence; support; characteristic words
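For orientation, the conventional TF-IDF weighting that this paper uses as its baseline can be sketched as below. The abstract does not specify how confidence, support, or characteristic words enter the improved scheme, so none of that is modeled here; the function name `tf_idf`, the length-normalized term frequency, the natural-log IDF, and the toy corpus are assumptions for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Conventional weighting only: tf(t, d) * log(N / df(t)),
    with tf normalized by document length.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-tokenized toy corpus.
docs = [
    ["laser", "optics", "laser"],
    ["optics", "lens"],
    ["protein", "cell"],
]
weights = tf_idf(docs)
```

In this toy corpus "laser" is both frequent in the first document and absent elsewhere, so it receives a higher weight there than "optics", which also appears in the second document.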
WordNet-based lexical semantic classification for text corpus analysis
8
Authors: Long Jun (龙军), Wang Luda (王鲁达), Li Zude (李祖德), Zhang Zuping (张祖平), Yang Liu (杨柳). Journal of Central South University [SCIE, EI, CAS, CSCD], 2015, No. 5, pp. 1833-1840 (8 pages)
Many text classification methods depend on statistical term measures to implement document representation. Such document representations ignore the lexical semantic contents of terms and the distilled mutual information, leading to text classification errors. This work proposes a document representation method, the WordNet-based lexical semantic VSM, to solve the problem. Using WordNet, this method constructs a data structure of semantic-element information to characterize lexical semantic contents, and adjusts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector for document representation is built by calculating the weight of each synset, and applied to the widely recognized NWKNN algorithm. On the Reuters-21578 text corpus and a version of it adjusted by lexical replacement, the experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in both F1 measure and dimension scale. The formation of document representation eigenvectors gives the method wide prospects for classification applications in text corpus analysis.
Keywords: document representation; lexical semantic content; classification; eigenvector
Stemming Algorithm to Classify Arabic Documents (cited by: 1)
9
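Authors: Marwan Ali H. Omer, Shilong Ma. Journal of Communication and Computer (通讯和计算机(中英文版)), 2010, No. 9, pp. 1-5 (5 pages)
Keywords: Arabic language; confidential documents; text classification; algorithm; classification system; document classification; Arabic script; experimental data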
Meaningful String Extraction Based on Clustering for Improving Webpage Classification
10
Authors: Chen Jie, Tan Jianlong, Liao Hao, Zhou Yanquan. China Communications [SCIE, CSCD], 2012, No. 3, pp. 68-77 (10 pages)
Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which makes it harder to obtain effective features. To cope with this problem, we propose two scenarios to extract meaningful strings based on document clustering and term clustering with multi-strategies to optimize a Vector Space Model (VSM) in order to improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, a better overall performance is obtained by spectral clustering with document clustering. Moreover, because images appear in the same webpage as document content, the proposed method is also applied to extract meaningful terms for images, and experimental results also show its effectiveness in improving webpage classification.
Keywords: webpage classification; meaningful string extraction; document clustering; term clustering; K-means; spectral clustering
Blockchain Technology Based Information Classification Management Service
11
Authors: Gi-Wan Hong, Jeong-Wook Kim, Hangbae Chang. Computers, Materials & Continua [SCIE, EI], 2021, No. 5, pp. 1489-1501 (13 pages)
Hyper-connectivity in Industry 4.0 has resulted in not only a rapid increase in the amount of information, but also the expansion of areas and assets to be protected. In terms of information security, it has led to an enormous economic cost due to the many and varied security solutions used to protect the increased assets. It has also caused difficulties in managing those issues, for reasons such as mutual interference and countless security events and log data. Within this security environment, an organization should identify and classify assets based on the value of data and their security perspective, and then apply appropriate protection measures according to the assets' security classification for effective security management. But there are still difficulties stemming from the need to manage numerous security solutions in order to protect the classified assets. In this paper, we propose an information classification management service based on blockchain, which presents and uses a model of the value of data and the security perspective. It records transactions of classifying assets and managing assets by each class in a distributed blockchain ledger. The proposed service reduces the assets to be protected and the security solutions to be applied, and provides security measures at the platform level rather than through individual security solutions, by using blockchain. In the rapidly changing security environment of Industry 4.0, the proposed service enables economic security, provides a new integrated security platform, and demonstrates service value.
Keywords: information classification; data integrity; document security; blockchain; CIA
Incrementally Exploiting Sentential Association for Email Classification
12
Authors: Li Qu (李曲), He Yu (何玉), Feng Jianlin (冯剑琳), Feng Yucai (冯玉才). Journal of Southwest Jiaotong University (English Edition), 2006, No. 2, pp. 129-134 (6 pages)
A novel association-based algorithm, EmailInClass, is proposed for incremental email classification. Because the basic semantic unit in an email is actually a sentence, and the words within the same sentence are typically more semantically related than words that merely appear in the same email, EmailInClass views a sentence rather than an email as a transaction. Extensive experiments conducted on the Enron benchmark corpus reveal that the effectiveness of EmailInClass is superior to that of non-incremental alternatives such as NaiveBayes and SAT-MOD. In addition, the classification rules generated by EmailInClass are human readable and revisable.
Keywords: document frequent itemset; category frequent itemset; MODFIT heuristic; category prefix-tree; incremental classification
Chinese Sentiment Classification Using Extended Word2Vec
13
Authors: Zhang Sheng (张胜), Zhang Xin (张鑫), Cheng Jiajun (程佳军), Wang Hui (王晖). Journal of Donghua University (English Edition) [EI, CAS], 2016, No. 5, pp. 823-826 (4 pages)
Sentiment analysis is increasingly important in modern natural language processing, and sentiment classification is one of its most popular applications. The crucial part of sentiment classification is feature extraction. In this paper, two methods for feature extraction, feature selection and feature embedding, are compared. Word2Vec is then used as an embedding method. In this experiment, Chinese documents are used as the corpus, and three methods are used to obtain the features of a document: average word vectors, Doc2Vec, and weighted average word vectors. These samples are then fed to three machine learning algorithms for classification, of which the support vector machine (SVM) gives the best result. Finally, the parameters of the random forest are analyzed.
Keywords: embedding; document; segmentation; dimensionality; suffers; projection; latter; classify; preprocessing; probabilistic
Least Squares One-Class Support Tensor Machine
14
Authors: Kaiwen Zhao, Yali Fan. Journal of Computer and Communications, 2024, No. 4, pp. 186-200 (15 pages)
One-class classification has become a popular problem in many fields, with a wide range of applications in anomaly detection, fault diagnosis, and face recognition. We investigate the one-class classification problem for second-order tensor data. Traditional vector-based one-class classification methods such as the one-class support vector machine (OCSVM) and the least squares one-class support vector machine (LSOCSVM) have limitations when tensors are used as input data, so we propose a new tensor one-class classification method, LSOCSTM, which directly uses tensors as input data. On one hand, using tensors as input not only makes it possible to classify tensor data; for vector data, classifying it after lifting it into a higher-dimensional tensor also improves classification accuracy and overcomes the over-fitting problem. On the other hand, unlike the one-class support tensor machine (OCSTM), we use squared loss instead of the original loss function, so that we solve a series of linear equations instead of quadratic programming problems. Therefore, we use the distance to the hyperplane as the metric for classification, and the proposed method is more accurate and faster than existing methods. The experimental results show the high efficiency of the proposed method compared with several state-of-the-art methods.
Keywords: Least Square One-Class Support Tensor Machine; one-class classification; upscale; Least Square One-Class Support Vector Machine; One-Class Support Tensor Machine
On the New Sanshi (Three-Lineage) Medicine (新三世医学) from the Perspective of Excavated Medical Texts
15
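Authors: He Xiantong (何仙童), Liu Changhua (柳长华), Gao Feng (高锋). China Journal of Traditional Chinese Medicine and Pharmacy (中华中医药杂志) [PKU Core], 2025, No. 1, pp. 118-123 (6 pages)
Chinese medicine was first systematically compiled in the two Han dynasties. Against the background of the flourishing and integration of traditional Chinese culture, China's distinctive system of medical theory took shape. From high antiquity come the legends of Fuxi devising the nine needles, Shennong tasting the hundred herbs, and the Yellow Emperor discoursing on the channels. By the Western Han, the "Yiwenzhi" (Treatise on Literature) of the Hanshu divided medicine into four categories: medical classics (yijing), classical formulas (jingfang), bedchamber arts (fangzhong), and immortality arts (shenxian). The transmitted Suwen chapter "Yifa Fangyi Lun" divides medicine by region into five types: stone needles, potent drugs, moxibustion, the nine needles, and daoyin with massage (anqiao). Based on research into excavated and transmitted medical texts, and drawing on the understanding of China's regional cultures among archaeologists and historians, Professor Liu Changhua proposes that a new sanshi medicine has in fact existed in China from the Han dynasties to the present, namely channel (jingmai) medicine, decoction (tangye) medicine, and daoyin medicine. The Huangdi Neijing is already a work that merges the three lineages. This article uses excavated medical texts and transmitted texts, together with the characteristics of China's regional cultures, to trace in detail the classification and formation of the new sanshi medicine.
Keywords: excavated texts; traditional Chinese medicine; classification; sanshi medicine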
A Study of the Effectiveness of Artificial Intelligence in Library Document Classification and Indexing
16
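Authors: Liu Caiyu (刘彩玉), Jiang Yunli (姜云莉). Journal of Library and Information Science (图书情报导刊), 2025, No. 7, pp. 17-22 (6 pages)
Focusing on the application of artificial intelligence in library document classification and indexing, this paper outlines the relevant background and significance, analyzes the current state, effects, and problems of AI applications, proposes quality control strategies, and discusses the impact on library work and future development, with the aim of providing a comprehensive reference for research and practice in library science.
Keywords: artificial intelligence; library; document classification and indexing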
A Bimodal Classification Method for Chinese Scientific and Technical Literature Based on NLP and Image Classification Models
17
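Authors: Wang Zheng (王峥), Ding Yi (丁熠), Chen Haiming (陈海明), Chen Ying (陈盈). Journal of Nanjing Normal University (Natural Science Edition) (南京师大学报(自然科学版)) [PKU Core], 2025, No. 3, pp. 84-92 (9 pages)
As requirements for managing and organizing scientific and technical literature rise sharply, so does the demand for more scalable, accurate, and automated classification. To address the difficulty of analyzing massive volumes of such literature, a multimodal literature analysis engine is proposed that fuses the YOLOv7 image classification model with a natural language processing (NLP) model. The architecture fully exploits three key kinds of information in a document: the natural-language text, the descriptive images, and the intrinsic relations between the two. An integrated training pipeline combines the deep learning networks for the different modalities and achieves better classification accuracy than single-modality methods. The proposed method is also applied to a dataset of Chinese scientific and technical literature, with classification training based on Chinese Library Classification (CLC) numbers. The results show that the proposed bimodal method achieves higher classification accuracy and can help enterprises, public institutions, and research organizations manage data and knowledge more efficiently.
Keywords: scientific and technical literature classification; image classification; multimodal features; natural language processing; deep learning; YOLOv7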
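A Naive Bayes Classifier with Misclassification Correction and Its Application to Industry Classification for Government Service Hotlines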
18
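Authors: Guan Guoyu (官国宇), Yang Haoxiang (杨皓翔), Wang Yunhao (王运豪), Hao Lizhu (郝立柱). Journal of Applied Statistics and Management (数理统计与管理) [PKU Core], 2025, No. 1, pp. 179-190 (12 pages)
Traditional statistical classification methods show a certain systematic bias when applied to industry text classification for government service hotlines. To correct this bias, and thereby reduce the extra labor and time costs caused by misclassification, this paper takes the naive Bayes model as the baseline classifier, introduces correction coefficients into the maximum a posteriori decision rule, learns the coefficients from misclassification results on a validation set, and applies the corrected classifier to industry text classification for government hotlines. Empirical results show that the corrected classifier improves classification precision by at least 1 percentage point over the baseline classifier and reduces the number of misclassified samples by 4 percentage points. Because the volume of hotline text work orders is very large, the method is of practical value for improving administrative service efficiency and lowering human resource costs.
Keywords: naive Bayes; government service hotline; text classification; correction coefficient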
A Study of Document Classification Based on DA_FASTTEXT
19
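Authors: Wang Dongping (王栋平), Mu Ning (穆宁), Wang Zheng (王峥), Zhang Xiaoyan (张晓燕). Value Engineering (价值工程), 2025, No. 6, pp. 145-147 (3 pages)
Traditional document classification systems are based on the word attributes of documents and rely on large dictionaries and complex word segmentation, which makes it hard to balance classification accuracy and classification speed. This paper studies Chinese document classification based on the FASTTEXT algorithm, which preserves classification accuracy while reducing time cost. It also uses the Dragonfly Algorithm (DA) to optimize FASTTEXT parameters, addressing the problem that FASTTEXT has many parameters and that model quality depends on how their values are set. The paper proposes the DA_FASTTEXT classification method and implements a Chinese document classification system based on it. Test results show that it offers better combined performance in classification accuracy and classification speed.
Keywords: document classification; Dragonfly Algorithm (DA); parameter optimization; FASTTEXT algorithm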