Abstract: This paper develops and tests an optimized content extraction algorithm for learning documents in a distance learning system database. It uses NLP methods, TF-IDF for term weighting, the vector space model (VSM) for information retrieval, and cosine similarity for measuring document similarity. The tests covered the following: 1) parsing word structure in the distance learning system database documents and in the Cyrillic Mongolian language documents in the section, and forming new documents with a word stem identification algorithm; 2) testing optimized content extraction from text material based on e-test results (keyword, correct answer, base form with affix, and new form built from the word stem without affix) in the distance learning system, and searching for keywords selected automatically by a word extraction algorithm; 3) testing Boolean and probabilistic retrieval methods through an extended vector space retrieval method. This chapter covers the document content extraction and retrieval algorithm and proposes query recommendations based on word stems, independent of word position, based on the distinctive features of Cyrillic Mongolian language documents.
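The TF-IDF weighting and cosine ranking named in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the whitespace tokenizer, the smoothed IDF formula, and the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`) are assumptions made for illustration, and the stemming of Cyrillic Mongolian words is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. A real system would apply
    # a stemmer here so that affixed forms map to the same word stem.
    return text.lower().split()


def build_idf(docs_tokens):
    # Document frequency: in how many documents each term occurs.
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant) so every known term has weight > 0.
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector: term -> raw term frequency * IDF. Unknown terms are dropped.
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf_vector(t, idf) for t in docs_tokens]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(doc_vecs)]
    return sorted(scored, reverse=True)
```

Because the vectors are bags of words, the ranking does not depend on word position, which matches the position-independent querying the abstract describes.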
Abstract: Automatic and accurate classification of Magnetic Resonance Imaging (MRI) radiology reports is essential for the analysis and interpretation of epilepsy and non-epilepsy cases. Since the majority of MRI radiology reports are unstructured, manual information extraction is time-consuming and requires specific expertise. In this paper, a comprehensive method is proposed to automatically classify real brain MRI radiology text reports as epilepsy or non-epilepsy. The method combines Natural Language Processing techniques with statistical Machine Learning methods. A total of 122 real MRI radiology text reports (97 epilepsy, 25 non-epilepsy) are studied with the proposed method, which consists of the following steps: (i) for a given text report, the system first cleans HTML/XML tags, tokenizes, erases punctuation, and normalizes the text; (ii) it then converts the MRI text reports into numeric sequences using index-based word encoding; (iii) deep learning models, namely a unidirectional long short-term memory (LSTM) network, a bidirectional long short-term memory (BiLSTM) network, and a convolutional neural network (CNN), are applied to compare classification performance on the data; (iv) finally, 70% of the observations are used for training, 15% for validation, and 15% for testing. Unlike previous methods, this study has the following objectives: (a) to extract significant text features from radiology reports of epilepsy; (b) to ensure successful classification accuracy in order to enhance epilepsy data attributes. The study is therefore a comprehensive comparison of deep learning models applied to an epilepsy dataset represented as numeric sequences obtained by index-based word encoding. The traditional method of numeric sequences obtained by index-based word encoding, applied here for the first time in the literature, is a successful feature descriptor for the epilepsy dataset. The BiLSTM network has shown promising performance in terms of accuracy. We show that larger medical text reports can be analyzed with the proposed method.
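Steps (i), (ii), and (iv) of this abstract can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the regular expression for tags, the use of index 0 for both padding and unknown words, the rounding rule for the split, and all function names are assumptions.

```python
import random
import re
import string


def preprocess(report):
    # Step (i): strip HTML/XML tags, lowercase, erase punctuation, tokenize.
    text = re.sub(r"<[^>]+>", " ", report)
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def build_vocab(token_lists):
    # Step (ii): assign each distinct word an integer index, starting at 1.
    # Index 0 is reserved (assumption) for padding and out-of-vocabulary words.
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, length):
    # Map tokens to indices, then truncate or right-pad to a fixed length.
    ids = [vocab.get(t, 0) for t in tokens][:length]
    return ids + [0] * (length - len(ids))


def split_indices(n, seed=0):
    # Step (iv): shuffled 70% / 15% / 15% split of n observations.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With the 122 reports mentioned in the abstract, this rounding gives 85 training, 18 validation, and 19 test reports; the authors' exact counts are not stated. The fixed-length integer sequences are the form of input that LSTM, BiLSTM, and CNN models typically consume through an embedding layer.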
Abstract: Deep neural networks (DNNs) have achieved great success in tasks such as image classification, speech recognition, and natural language processing. However, they are susceptible to false predictions caused by adversarial examples, which are normal inputs with imperceptible perturbations. Adversarial samples have been widely studied in image classification, but not as much in text classification. Current textual attack methods often rely on low-success-rate heuristic replacement strategies at the character or word level, which cannot search for the best solution while maintaining semantic consistency and linguistic fluency. Our framework, FastAttacker, generates natural adversarial text efficiently and effectively by constructing different semantic perturbation functions. It optimizes perturbations constrained in generic semantic spaces, such as the typo space, knowledge space, contextualized semantic space, or a combination. As a result, the generated adversarial texts are semantically close to the original inputs. Experiments show that FastAttacker generates adversarial texts under different levels of spatial constraints, turning the problem of finding synonyms into an optimization problem. Our approach is robust not only in terms of attack generation, but also in terms of adversarial defense. Experiments have shown that state-of-the-art language models and defense strategies are still vulnerable to FastAttacker attacks.
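To make the notion of a "typo space" concrete, the sketch below enumerates character-level perturbations of a single word. The abstract does not define the space, so the choice of operations (adjacent-character swaps and single-character deletions) and the function name `typo_candidates` are assumptions; this only shows one plausible candidate set, not FastAttacker's perturbation functions or its optimization procedure.

```python
def typo_candidates(word):
    """Return the sorted set of words one simple typo away from `word`.

    Assumed typo operations: swapping two adjacent characters and
    deleting a single character.
    """
    candidates = set()
    # Adjacent transpositions, e.g. "cat" -> "act", "cta".
    for i in range(len(word) - 1):
        candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Single-character deletions, e.g. "cat" -> "at", "ct", "ca".
    for i in range(len(word)):
        candidates.add(word[:i] + word[i + 1:])
    # Drop no-op results (e.g. swapping equal letters) and the empty string.
    candidates.discard(word)
    candidates.discard("")
    return sorted(candidates)
```

An attack constrained to this space would choose among such candidates for each word, which keeps the perturbed text close to the original at the character level.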
Abstract: Purpose: The objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods. Design/methodology/approach: This study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications. Findings: The framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas. Originality/value: This research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.