Evaluating the adversarial robustness of classification algorithms is a crucial problem in machine learning. However, current methods lack measurable and interpretable metrics. To address this issue, this paper introduces a visual evaluation index named the confidence centroid skewing quadrilateral, which is based on a classification-confidence-based confusion matrix. The index offers a quantitative and visual comparison of the adversarial robustness of different classification algorithms and makes attack impacts more intuitive and interpretable. We first conduct a validity test and a sensitivity analysis of the method. We then demonstrate its effectiveness through experiments on five classification algorithms, namely artificial neural network (ANN), logistic regression (LR), support vector machine (SVM), convolutional neural network (CNN), and transformer, against three adversarial attacks: fast gradient sign method (FGSM), DeepFool, and projected gradient descent (PGD).
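The abstract above does not define the classification-confidence-based confusion matrix or the quadrilateral built from it. The following is only a minimal sketch of one plausible reading, in which each cell accumulates the classifier's confidence in its predicted class instead of a raw count. The function name, the input layout, and the toy numbers are assumptions made for illustration, not the authors' method.

```python
def confidence_confusion_matrix(y_true, y_pred, confidence, n_classes):
    """Accumulate predicted-class confidence into cell [true][predicted].

    Assumption: `confidence[i]` is the probability the model assigned to
    the class it predicted for sample i. The paper may weight differently.
    """
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p, c in zip(y_true, y_pred, confidence):
        matrix[t][p] += c
    return matrix


# Hypothetical toy data: the same four samples before and after an attack.
y_true = [0, 0, 1, 1]
clean = confidence_confusion_matrix(y_true, [0, 0, 1, 1], [0.9, 0.8, 0.7, 0.9], 2)
attacked = confidence_confusion_matrix(y_true, [0, 1, 0, 1], [0.6, 0.7, 0.8, 0.5], 2)
```

Comparing `clean` with `attacked` shows confidence mass moving from the diagonal to the off-diagonal cells, which is the kind of shift a confidence-based index could summarize.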
Including more potentially correct words in the candidate sets is important for improving the accuracy of Large Vocabulary Continuous Speech Recognition (LVCSR). A candidate expansion algorithm based on the Weighted Syllable Confusion Matrix (WSCM) is proposed. First, the WSCM is derived from a confusion network. Then, the recognised candidates in the confusion network are used to conjecture the most likely correct words based on the WSCM, after which the conjectured words are combined with the recognised candidates to produce an expanded candidate set. Finally, a combined model comprising mutual information and a trigram language model is used to rerank the candidates. Experiments on Mandarin film data show an improvement of 9.57% in the character correction rate over the initial recognition performance on lightly erroneous utterances.
With western Jilin Province as the study region, spectral characteristics and texture features of remote sensing images were taken as the classification basis to construct a Decision Tree Model and extract information about settlements in the region, and the manually extracted information about settlements in western Jilin Province was evaluated using a confusion matrix. The results showed that the Decision Tree Model was convenient for extracting settlement information by integrating spectral and texture features, and that its accuracy was higher than that of the traditional Maximum Likelihood Method. In addition, the calculation methods for extracting settlement information by this means were summarized.
Rockburst is a common geological disaster in underground engineering that seriously threatens the safety of personnel, equipment, and property. Using machine learning models to evaluate rockburst risk is gradually becoming a trend. In this study, ensemble algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through a random search grid and five-fold cross-validation. Afterwards, the optimal hyperparameter configuration was used to fit the evaluation models, and the models were analyzed on the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected for analysis and comparison with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production work. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
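The accuracy, precision, recall, and F1-score named in the abstract above can all be read off a multi-class confusion matrix. The sketch below shows the arithmetic under two assumptions that the abstract does not state: rows are true classes and columns are predicted classes, and per-class scores are combined by unweighted (macro) averaging. The function name and the example matrix are illustrative only.

```python
def macro_metrics(cm):
    """Compute accuracy and macro-averaged precision, recall and F1.

    Assumption: cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical two-class example, not data from the study.
example = macro_metrics([[8, 2], [1, 9]])
```

Macro averaging treats every intensity class equally, which matters when classes are imbalanced; a weighted average would be an equally reasonable choice.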
Facial emotion recognition is an essential aspect of human-machine interaction. Past research on facial emotion recognition has focused on the laboratory environment. However, it faces many challenges in real-world conditions, such as illumination changes, large pose variations, and partial or full occlusions. These challenges leave different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth areas are available and the emotion classifier predicts happiness, the question is how to judge the authenticity of that prediction. This problem can be converted into the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: nose areas, mouth areas, eyes areas, nose-to-mouth areas, nose-to-eyes areas, and mouth-to-eyes areas. To obtain more convincing results, our experiments are conducted on three different databases: facial expression recognition+ (FER+), the real-world affective faces database (RAF-DB), and the expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix, and the class activation map (CAM), we can establish convincing results. To sum up, the contributions of this paper lie in two areas: 1) we visualize the areas of human faces attended to in emotion recognition; 2) we analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
Productivity and quality in the turning process can be improved by using the predicted performance of the cutting tools. This research combines condition monitoring of a non-carbide tool insert through vibration analysis with machine learning and a fuzzy logic approach. A non-carbide tool insert is used for cutting operations in a semi-automatic lathe, where the condition of the tool is monitored using vibration characteristics. The vibration signals for the healthy, damaged, thermal, and flank conditions were acquired with a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals using feature extraction techniques. The extracted statistical features were then selected through the J48 decision tree algorithm. The selected features were classified using the J48 decision tree and fuzzy logic to develop a fault diagnosis model for improved predictive analysis. The decision tree model achieved a classification accuracy of 94.78% with five selected features. The developed fuzzy model achieved a classification accuracy of 94.02% with five membership functions. Hence, the decision tree is proposed as a suitable fault diagnosis model for predicting the health condition of the tool insert under different fault conditions.
The demand for fault diagnosis in the automotive field is growing day by day. The reliability of human operators for fault diagnosis is uncertain. Brakes are among the most critical components in automobiles and require close, active observation. This research work demonstrates a fault diagnosis technique for monitoring the hydraulic brake system using vibration analysis. Vibration signals of a rotating element contain dynamic information about its health condition. Hence, vibration signals were used for the brake fault diagnosis study. The study was carried out on a brake fault diagnosis experimental setup. The vibration signals under different fault conditions were acquired from the setup using an accelerometer. The vibration signals were then processed for condition monitoring of the hydraulic brake system using a machine learning approach. The machine learning approach has three phases, namely feature extraction, feature selection, and feature classification. Histogram features were extracted from the vibration signals. The prominent features were selected using the decision tree. The selected features were classified using a fuzzy classifier. The combination of histogram features and the fuzzy classifier produced higher classification accuracy than the statistical features.
The random forest model is universal and easy to understand, and it is often used for classification and prediction. However, it uses non-selective integration and the majority rule to decide the final result, so the differences between the decision trees in the model are ignored and the prediction accuracy of the model is reduced. To address these defects, an improved random forest model based on the confusion matrix (CM-RF) is proposed. The decision tree cluster is selectively constructed using a similarity measure during model construction, and the result is output using a dynamic weighted voting fusion method in the final voting stage. Experiments show that the proposed CM-RF can reduce the impact of low-performance decision trees on the output, thus improving the accuracy and generalization ability of the random forest model.
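The contrast between majority voting and weighted voting described above can be illustrated in a few lines. This is a simplified sketch, not the CM-RF procedure: it assumes each tree carries a fixed weight (for example its validation accuracy derived from a confusion matrix), whereas the abstract describes dynamic weights and a similarity-based tree selection step whose details are not given. All names and numbers are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per tree.
    weights: one non-negative weight per tree (assumed here to be a
    per-tree validation accuracy; the paper's weighting may differ).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# One strong tree disagrees with two weak trees.
votes = ["a", "b", "b"]
majority = weighted_vote(votes, [1.0, 1.0, 1.0])   # plain majority rule
weighted = weighted_vote(votes, [0.9, 0.3, 0.3])   # accuracy-weighted rule
```

With equal weights the two weak trees win; with accuracy weights the single strong tree decides, which is how weighting reduces the influence of low-performance trees.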
Rockbursts have become a significant hazard in underground mining, underscoring the need for a robust early warning model to ensure safety management. This study presents a novel approach for rockburst prediction, integrating the Mann-Kendall trend test (MKT) and multi-indices fusion to enable real-time and quantitative assessment of rockburst hazards. The methodology involves the development of a comprehensive precursory index library for rockbursts. The MKT is then applied to analyze the real-time trend of each index, with adherence to rockburst characterization laws serving as the warning criterion. By employing a confusion matrix, the warning effectiveness of each index is assessed, enabling index preference determination. Ultimately, the integrated rockburst hazard index Q is derived through data fusion. The results demonstrate that the proposed model achieves a warning effectiveness of 0.563 for Q, surpassing the performance of any individual index. Moreover, the model's adaptability and scalability are enhanced through periodic updates driven by actual field monitoring data, making it suitable for complex underground working environments. By providing an efficient and accurate basis for decision-making, the proposed model holds great potential for the prevention and control of rockbursts. It offers a valuable tool for enhancing safety measures in underground mining operations.
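The Mann-Kendall trend test mentioned above is a standard non-parametric test, and a minimal sketch of its statistic may help make the "real-time trend of each index" concrete. The code computes the S statistic and the normal-approximation Z score with the usual tie correction. How the study windows the monitoring data and which threshold it applies are not stated in the abstract, so the example series and the 1.96 threshold in the note below are assumptions.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test on a numeric sequence."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    # Tie correction: groups of size 1 contribute zero.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical index readings, not field data from the study.
rising = mann_kendall([1, 2, 3, 4, 5])
falling = mann_kendall([5, 4, 3, 2, 1])
flat = mann_kendall([3, 3, 3])
```

Under the normal approximation, |Z| above roughly 1.96 indicates a trend at the 5% significance level; the sign of Z gives its direction.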
IIF (Indirect Immunofluorescence) has gained much attention recently due to its importance in medical sciences. The primary purpose of this work is to present a step-by-step methodology for detecting autoimmune diseases. The use of IIF for detecting autoimmune diseases is widespread in different medical areas. Nearly 80 different types of autoimmune diseases exist in various body parts. IIF has been used for image classification both manually and with Computer-Aided Detection (CAD) systems. Prior research using automatic CAD systems has reported low accuracy. Diseases in the human body can be detected with the help of Transfer Learning (TL), an advanced Convolutional Neural Network (CNN) approach. The baseline paper applied manual classification to the MIVIA dataset of Human Epithelial (HEP) type II cells and used the Subclass Discriminant Analysis (SDA) technique to detect autoimmune diseases. The technique yielded an accuracy of up to 90.03%, which was not reliable for detecting autoimmune disease in the mitotic cells of the body. In the current research, the work has been performed on the MIVIA dataset of HEP type II cells using four well-known TL models. Data augmentation and normalization have been applied to the dataset to overcome overfitting and to improve the performance of the TL models. These models are Inception V3, DenseNet-121, VGG-16, and MobileNet, and their performance is calculated through parameters of the confusion matrix (accuracy, precision, recall, and F1 measure). The results show an accuracy of 78.00% for VGG-16, 92.00% for Inception V3, 95.00% for DenseNet-121, and 88.00% for MobileNet. Therefore, DenseNet-121 shows the highest performance with suitable analysis of autoimmune diseases. The overall performance indicates that TL is a suitable and enhanced technique compared to its counterparts. The proposed technique can also detect autoimmune diseases with a minimal margin of error.
Fraud detection for credit/debit cards, loan defaulters, and similar cases is achievable with the assistance of Machine Learning (ML) algorithms, as they are capable of learning from previous fraud trends or historical data and spotting them in current or future transactions. Fraudulent cases are scarce compared with non-fraudulent observations in almost all datasets. In such cases, detecting fraudulent transactions is quite difficult. The most effective way to prevent loan default is to identify non-performing loans as soon as possible. Machine learning algorithms are emerging as adept at handling such data given sufficient computing power. In this paper, the performance of different machine learning algorithms, namely Decision Tree, Random Forest, linear regression, and Gradient Boosting, is compared for the detection and prediction of fraud cases using loan fraud data. Model performance is further assessed with the confusion matrix and the calculation of accuracy, precision, recall, and F1-score, along with Receiver Operating Characteristic (ROC) curves.
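Because fraud cases are rare, the ROC curve mentioned above is often summarized by its area (AUC), which does not depend on a single decision threshold. The sketch below computes AUC through its rank interpretation: the probability that a randomly chosen fraudulent case receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The function name and the toy scores are illustrative assumptions, and the abstract does not say that the authors report AUC in this form.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (O(n_pos * n_neg)).

    labels: 1 for the positive (fraud) class, 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores, not results from the paper.
separable = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
overlapping = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the sample size, so it is suited to illustration rather than to large transaction logs.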
Social media are interactive computer-mediated technologies that facilitate the sharing of information via virtual communities and networks, and Twitter is one of the most popular social media platforms for social interaction and microblogging. This paper introduces an improved system model to analyze Twitter data and detect terrorist attack events. In this model, a ternary search is used to find the weights of predefined keywords, and the Aho-Corasick algorithm is applied to perform pattern matching and assign the weights, which is the main contribution of this paper. Weights are grouped into three categories (Terror Attack, Severe Terror Attack, and Normal Data) and are used as attributes for classification. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are the two machine learning algorithms used to predict whether a terror attack has happened. We compare the predictions with our actual data using a confusion matrix to measure whether each result is right or wrong, and the results show that the proposed model performs better.
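The pattern-matching step described above can be sketched with a compact Aho-Corasick automaton, which finds every occurrence of every keyword in a single pass over the text. The keyword weights below are hypothetical placeholders; in the paper they are obtained with a ternary search that is not reproduced here, and summing the weights into one score is an assumption about how they become classification attributes.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, failure and output tables for a set of keywords."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # Breadth-first pass: depth-1 nodes keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target (suffix matches).
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Return every keyword occurrence in text, in order of end position."""
    goto, fail, out = automaton
    node, matches = 0, []
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        matches.extend(out[node])
    return matches


def tweet_score(text, weights):
    """Sum the (hypothetical) weights of all keywords found in a tweet."""
    automaton = build_automaton(weights)  # iterating a dict yields its keys
    return sum(weights[w] for w in find_keywords(text.lower(), automaton))


# Illustrative weights only; the study derives its own.
score = tweet_score("Bomb attack reported", {"attack": 2.0, "bomb": 3.0, "tack": 0.5})
```

Note that overlapping keywords are all reported, so "tack" is counted inside "attack"; a real keyword list would need to decide whether that is desirable.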
Data clustering plays a vital role in object identification. In real life, the concept is mainly used in biometric identification and object detection. In this paper, we use Fuzzy Weighted Rules, a Fuzzy Inference System (FIS), Fuzzy C-Means clustering (FCM), a Support Vector Machine (SVM), and an Artificial Neural Network (ANN) to distinguish three types of Iris data: Iris-Setosa, Iris-Versicolor, and Iris-Virginica. Each class in the data table is identified by a four-dimensional vector whose components are used as the input variables: Sepal Length (SL), Sepal Width (SW), Petal Length (PL), and Petal Width (PW). The combination of the five machine learning methods provides above 98% accuracy in class identification.
In this work, a consistency detection method is proposed to overcome the inconsistencies in the use of large-scale lead-carbon energy storage batteries (LCESBs) and the difficulties of large-scale detection for LCESBs. Based on the chemical materials and physical mechanisms of LCESBs, the internal and external factors that affect consistency and their characterization parameters are analyzed. The inconsistency characterization parameters, such as voltage, temperature, and resistance, are used to construct a high-dimensional random matrix and calculate the matrix eigenvalues. The single ring theorem and the average spectral radius are then employed to carry out preliminary consistency detection. Next, short-term discharge experiments are conducted on the individual batteries flagged as inconsistent in the initial screening. The voltage and temperature data are collected, and the sequential overlapping derivative (SOD) transformation is performed to extract the characteristics of voltage and temperature changes. The consistency of individual cells is quantitatively characterized using the Wasserstein distance. Finally, the reliability of the consistency detection method is evaluated with a confusion matrix. A large amount of actual measurement data shows that the algorithm has a false negative rate of 0 and an accuracy of 99.94%. This study shows that using random matrix theory for preliminary detection is suitable for processing the high-dimensional data of large-scale energy storage power plants. Using SOD for precise detection can amplify the voltage, temperature, and resistance differences of inconsistent batteries, making the consistency detection more accurate.
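To make the Wasserstein distance step above concrete, the sketch below computes the order-1 Wasserstein distance between two one-dimensional samples of equal size, which reduces to the mean absolute difference between their sorted values. The abstract does not state the order of the distance, how the SOD features are arranged, or what threshold separates consistent from inconsistent cells, so those aspects are left out and the sample values are hypothetical.

```python
def wasserstein_1d(a, b):
    """Order-1 Wasserstein distance between two equal-size 1-D samples.

    Assumption: both samples are empirical distributions with uniform
    weights and the same number of points.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical voltage-derivative features for a reference cell and two others.
reference = [1, 2, 3]
similar_cell = wasserstein_1d(reference, [3, 1, 2])
shifted_cell = wasserstein_1d(reference, [2, 4, 3])
```

A cell whose feature distribution matches the reference yields a distance of zero regardless of sample order, while a uniformly shifted distribution yields the size of the shift.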
Lung cancer is one of the malignancies with the highest incidence and mortality rates worldwide. Accurate detection of early lung nodules is crucial for improving patient survival. However, manual reading of medical images carries risks of missed or incorrect diagnoses, highlighting the urgent need for efficient and precise automated detection methods. This study constructs and optimizes a lung nodule detection model based on You Only Look Once version 8 (YOLOv8). Model adjustments include tuning learning rate strategies, weight decay, and color augmentation parameters. Experiments were conducted on the standard LIDC-IDRI dataset. Model performance was evaluated using recall, Mean Average Precision (mAP), and the confusion matrix. The results indicate a recall rate of 0.87 and an mAP of 0.776, with a correct lung nodule recognition rate of 0.76 in the confusion matrix. The study demonstrates that rational parameter optimization can effectively reduce missed detections and misjudgments while significantly improving detection accuracy and classification stability. This provides a feasible technical pathway for computer-aided early lung cancer diagnosis. Future work may further enhance the model's application value in intelligent medical imaging analysis by integrating data augmentation with parameter optimization.
Microfinance institutions in Kenya play a unique role in promoting financial inclusion and providing loans and savings, especially to low-income individuals and small-scale entrepreneurs. However, despite their benefits, most of their products and programs in Machakos County have been shrinking due to repayment challenges, threatening their financial ability to extend further credit. This could be attributed to ineffective credit scoring models, which are unable to capture the nuanced non-linear repayment behavior and patterns of loan applicants. The research objective was to enhance credit risk scoring for microfinance institutions in Machakos County using supervised machine learning algorithms. The study adopted a mixed research design under a supervised machine learning approach. It randomly sampled 6771 loan application account records and repayment histories. RStudio and the Python programming language were used for data pre-processing and analysis. The logistic regression algorithm, XGBoost, and the random forest ensemble method were used. The evaluation metrics included accuracy, Area Under the Curve, and F1-score. Based on the study findings, XGBoost was the best performer, with 83.3% accuracy and a Brier score of 0.202. The development of a legal framework to govern the ethical and open use of machine learning assessment was recommended. Similar research using different machine learning algorithms, locations, and institutions was recommended to ascertain the validity, reliability, and generalizability of the study findings.
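The Brier score reported above measures how well predicted default probabilities are calibrated: it is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better and a constant prediction of 0.5 scores 0.25. A minimal sketch follows; the function name and sample values are illustrative and are not taken from the study's data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(probabilities)


# Hypothetical predictions for two loans: one repaid (0), one defaulted (1).
perfect = brier_score([1.0, 0.0], [1, 0])
uninformed = brier_score([0.5, 0.5], [1, 0])
```

Unlike accuracy, the score penalizes confident wrong predictions more heavily than hesitant ones, which is why it is commonly reported alongside accuracy for credit scoring models.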
The performance of semisupervised clustering on unlabeled data is often superior to that of unsupervised learning, which indicates that semantic information attached to clusters can significantly improve feature representation capability. In a graph convolutional network (GCN), each node contains information about itself and its neighbors, which is beneficial for capturing common and unique features among samples. Combining these findings, we propose a deep clustering method based on GCN and semantic feature guidance (GFDC), in which a deep convolutional network is used as a feature generator and a GCN with a softmax layer performs clustering assignment. First, the diversity and amount of input information are enhanced to generate highly useful representations for downstream tasks. Subsequently, a topological graph is constructed to express the spatial relationship of features. For a pair of datasets, feature correspondence constraints are used to regularize the clustering loss, and clustering outputs are iteratively optimized. Three external evaluation indicators, i.e., clustering accuracy, normalized mutual information, and the adjusted Rand index, and an internal indicator, i.e., the Davies-Bouldin index (DBI), are employed to evaluate clustering performance. Experimental results on eight public datasets show that the GFDC algorithm is significantly better than the majority of competitive clustering methods; for example, its clustering accuracy is 20% higher than that of the best competing method on the United States Postal Service dataset. The GFDC algorithm also has the highest accuracy on the smaller Amazon and Caltech datasets. Moreover, the DBI indicates the dispersion of the cluster distribution and the compactness within each cluster.
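The internal indicator named above, the Davies-Bouldin index, can be computed from the features and cluster labels alone. The sketch below uses the usual definition with Euclidean distance: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance, then average over clusters. It assumes at least two clusters with distinct centroids; the point coordinates are hypothetical, and the feature space the authors evaluate in is not specified in the abstract.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better).

    Assumption: at least two clusters, and no two centroids coincide.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Two hypothetical clusters, first loose and then tighter.
loose = davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1])
tight = davies_bouldin([(0, 0.5), (0, 1.5), (10, 0.5), (10, 1.5)], [0, 0, 1, 1])
```

Tightening each cluster while keeping the centroids fixed halves the index in this example, matching the reading that a lower DBI means more compact and better separated clusters.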
Various diseases seriously affect the quality and yield of tomatoes. Fast and accurate identification of disease types is of great significance for the development of smart agriculture. Many Convolutional Neural Network (CNN) models have been applied to the identification of tomato leaf diseases and have achieved good results. However, some of these come at the cost of long computation time and large storage space. This study proposes a lightweight CNN model named MFRCNN, which is built on a multi-scale and feature reuse structure rather than simply stacking convolutions layer by layer. To examine model performance, two types of tomato leaf disease datasets were collected. One is a laboratory-based dataset, including one healthy class and nine disease classes, and the other is a field-based dataset, including five kinds of diseases. Afterward, the proposed MFRCNN and some popular CNN models (AlexNet, SqueezeNet, VGG16, ResNet18, and GoogLeNet) were tested on the two datasets. The results showed that, compared to the traditional models, the MFRCNN achieved the best performance, with accuracies of 99.01% and 98.75% on the laboratory and field datasets, respectively. The MFRCNN not only had the highest accuracy but also required relatively little computing time and few training parameters. In terms of storage space in particular, the MFRCNN model needs only 2.7 MB. Therefore, this work provides a novel solution for plant disease diagnosis, which is of great importance for the development of plant disease diagnosis systems on low-performance terminals.
Lean combustion is environmentally friendly, with low NOx emissions, and provides better fuel efficiency in a combustion system. However, approaching lean combustion can make engines more susceptible to an undesirable phenomenon called lean blowout (LBO), which can cause flame extinction leading to sudden loss of power. During the design stage, it is quite challenging to accurately determine the optimal operating limits that avoid sudden LBO occurrences. Therefore, it is crucial to develop accurate and computationally tractable frameworks for online LBO prediction in low-NOx emission engines. To the best of our knowledge, we propose for the first time a deep learning approach to detect the transition to LBO in combustion systems. In this work, we use a laboratory-scale swirl-stabilized combustor to collect acoustic data for different protocols. For each protocol, starting far from LBO, we gradually move towards the LBO regime, capturing a quasi-static time series dataset at different conditions. Using one of the protocols in our dataset as the reference protocol, we find a transition state metric for our trained deep learning model to detect imminent LBO in other test protocols. We find that our proposed approach is more precise and computationally faster than other baseline models in detecting the transition to LBO. Therefore, we endorse this technique for monitoring the operation of lean combustion engines in real time.
Broiler flock welfare is usually assessed through mortality, physiology, behavior, and walking ability. The possibility of assessing broiler chicken lameness from the birds' walking ability was investigated using a machine learning approach for the first time. Data on broiler walking speed and acceleration, genetic strain, and sex were recorded and entered into a dataset. Broilers were classified according to the 6-point gait score (GS0 is a sound bird, and GS5 is a severely lame bird). Decision trees were initially built using all datasets. The confusion matrix of each developed model was analyzed. A pruning technique was used to remove from the dataset the variables that did not contribute to the classification results. We reorganized the dataset and re-arranged the data by grouping the intermediate target classes of the gait score using the Borda Count method. After re-processing the data, we obtained a new set of decision trees. Using the 3-point gait score (GS0 is a sound bird, and GS2 is a lame bird), we obtained a new model with better accuracy (78%); however, the model had a lower accuracy for classifying lame broilers (GS2, 5%). The final decision tree was selected to classify broilers as either sound or lame according to their walking speed. The developed model presented good accuracy (91%), and it correctly classified sound (86%) and lame birds (92%). The novel model might be used to assess broiler lameness on-farm by recording bird displacement velocity. Further developments using the model might allow automatic detection of flock lameness.
Funding (LVCSR candidate expansion study): supported by the National Natural Science Foundation of China under Grants No. 61005004, No. 61175011, and No. 61171193; the Next-Generation Broadband Wireless Mobile Communications Network Technology Key Project under Grant No. 2011ZX03002-005-01; the 111 Project under Grant No. B08004; the Key Project of the Ministry of Science and Technology of China under Grant No. 2012ZX03002019-002; and the National High Technical Research and Development Program of China (863 Program) under Grant No. 2011AA01A205.
Abstract: Including more potentially correct words in the candidate sets is important for improving the accuracy of Large Vocabulary Continuous Speech Recognition (LVCSR). A candidate expansion algorithm based on the Weighted Syllable Confusion Matrix (WSCM) is proposed. First, the WSCM is derived from a confusion network. Then, the recognised candidates in the confusion network are used to conjecture the most likely correct words based on the WSCM, after which the conjectured words are combined with the recognised candidates to produce an expanded candidate set. Finally, a combined model consisting of mutual information and a trigram language model is used to rerank the candidates. Experiments on Mandarin film data show that an improvement of 9.57% in the character correction rate is obtained over the initial recognition performance on lightly erroneous utterances.
Funding: Supported by financial support from the China Geological Survey (1212010916048) and the Fundamental Research Funds for the Central Universities (200903046).
Abstract: With western Jilin Province as the study region, spectral characteristics and texture features of remote sensing images were taken as the classification basis to construct a Decision Tree Model and extract information about settlements in western Jilin Province, and the manually extracted information about settlements in western Jilin Province was evaluated by a confusion matrix. The results showed that the Decision Tree Model was convenient for extracting settlement information by integrating spectral and texture features, and that the accuracy of this method was higher than that of the traditional Maximum Likelihood Method. In addition, the calculation methods for extracting settlement information by this means were summarized.
Funding: Project (52161135301) supported by the International Cooperation and Exchange of the National Natural Science Foundation of China; Project (202306370296) supported by the China Scholarship Council.
Abstract: Rockburst is a common geological disaster in underground engineering, which seriously threatens the safety of personnel, equipment, and property. Utilizing machine learning models to evaluate the risk of rockburst is gradually becoming a trend. In this study, the integrated algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through random grid search and five-fold cross-validation. Afterwards, the optimal hyperparameter configuration was used to fit the evaluation models, which were then analyzed on the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected to analyze the models and compare them with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production work. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
Funding: Supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472), and the Major Program for the 325 National Social Science Fund of China (No. 13&ZD189).
Abstract: Facial emotion recognition is an essential aspect of the field of human-machine interaction. Past research on facial emotion recognition focuses on the laboratory environment. However, it faces many challenges in real-world conditions, such as illumination changes, large pose variations, and partial or full occlusions. Those challenges lead to different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth areas are available and the emotion classifier predicts happiness, then there is a question of how to judge the authenticity of the prediction. This problem can be converted into the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: nose areas, mouth areas, eyes areas, nose to mouth areas, nose to eyes areas, and mouth to eyes areas. To obtain more convincing results, our experiments are conducted on three different databases: facial expression recognition+ (FER+), real-world affective faces database (RAF-DB), and expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix, and the class activation map (CAM), we can establish convincing results. To sum up, the contributions of this paper lie in two areas: 1) We visualize the areas of human faces that matter in emotion recognition; 2) We analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
Abstract: The productivity and quality of the turning process can be improved by utilizing the predicted performance of the cutting tools. This research incorporates condition monitoring of a non-carbide tool insert using vibration analysis along with machine learning and a fuzzy logic approach. A non-carbide tool insert is considered for the cutting operation in a semi-automatic lathe, where the condition of the tool is monitored using vibration characteristics. The vibration signals for conditions such as healthy, damaged, thermal, and flank were acquired with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals using feature extraction techniques. The extracted statistical features were selected through the J48 decision tree algorithm. The selected features were classified using the J48 decision tree and fuzzy logic to develop the fault diagnosis model for improved predictive analysis. The decision tree model achieved a classification accuracy of 94.78% with five selected features. The developed fuzzy model achieved a classification accuracy of 94.02% with five membership functions. Hence, the decision tree has been proposed as a suitable fault diagnosis model for predicting the tool insert health condition under different fault conditions.
Abstract: The requirement for fault diagnosis in the field of automobiles is growing day by day. The reliability of human resources for fault diagnosis is uncertain. Brakes are one of the most critical components in automobiles and require closer and active observation. This research work demonstrates a fault diagnosis technique for monitoring the hydraulic brake system using vibration analysis. Vibration signals of a rotating element contain dynamic information about its health condition. Hence, the vibration signals were used for the brake fault diagnosis study. The study was carried out on a brake fault diagnosis experimental setup. The vibration signals under different fault conditions were acquired from the setup using an accelerometer. The vibration signals were processed for condition monitoring of the hydraulic brake system using a machine learning approach. The machine learning approach has three phases, namely, feature extraction, feature selection, and feature classification. Histogram features were extracted from the vibration signals. The prominent features were selected using the decision tree. The selected features were classified using a fuzzy classifier. The combination of histogram features and the fuzzy classifier produced higher classification accuracy than the statistical features.
Funding: Science Research Project of Gansu Provincial Transportation Department (No. 2017-012)
Abstract: The random forest model is universal and easy to understand, and is often used for classification and prediction. However, it uses non-selective integration and the majority rule to judge the final result, so the difference between the decision trees in the model is ignored and the prediction accuracy of the model is reduced. Taking these defects into consideration, an improved random forest model based on the confusion matrix (CM-RF) is proposed. The decision tree cluster is selectively constructed by a similarity measure in the process of constructing the model, and the result is output by using a dynamic weighted voting fusion method in the final voting session. Experiments show that the proposed CM-RF can reduce the impact of low-performance decision trees on the output result, thus improving the accuracy and generalization ability of the random forest model.
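The contrast between majority voting and weighted voting described above can be illustrated with a minimal sketch. The abstract does not say how CM-RF derives its dynamic weights, so the weights below are hypothetical placeholders (for example, a per-tree validation accuracy), and `weighted_vote` is an illustrative name rather than the authors' method.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per decision tree.
    weights: one non-negative weight per tree (hypothetical here,
             e.g. each tree's accuracy on a validation set).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The label with the highest accumulated weight wins.
    return max(scores, key=scores.get)


# Two weak trees say "b", one strong tree says "a".
# A plain majority vote would return "b"; the weighted vote favours "a".
winner = weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4])
```

With equal weights the function reduces to ordinary majority voting, which is the baseline behaviour the abstract criticises.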
Funding: The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China (Grant Nos. 52011530037 and 51904019), the Fundamental Research Funds for the Central Universities, and the Youth Teacher International Exchange & Growth Program (Grant No. QNXM20210004). We also greatly appreciate the assistance provided by Kuangou coal mine, China Energy Group Xinjiang Energy Co., Ltd.
Abstract: Rockbursts have become a significant hazard in underground mining, underscoring the need for a robust early warning model to ensure safety management. This study presents a novel approach for rockburst prediction, integrating the Mann-Kendall trend test (MKT) and multi-indices fusion to enable real-time and quantitative assessment of rockburst hazards. The methodology involves the development of a comprehensive precursory index library for rockbursts. The MKT is then applied to analyze the real-time trend of each index, with adherence to rockburst characterization laws serving as the warning criterion. A confusion matrix is used to assess the warning effectiveness of each index, enabling index preference determination. Ultimately, the integrated rockburst hazard index Q is derived through data fusion. The results demonstrate that the proposed model achieves a warning effectiveness of 0.563 for Q, surpassing the performance of any individual index. Moreover, the model's adaptability and scalability are enhanced through periodic updates driven by actual field monitoring data, making it suitable for complex underground working environments. By providing an efficient and accurate basis for decision-making, the proposed model holds great potential for the prevention and control of rockbursts. It offers a valuable tool for enhancing safety measures in underground mining operations.
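As a rough illustration of the trend test named above, the following sketch computes the standard Mann-Kendall S statistic and its normal approximation Z for a single index series. It assumes no tied values (the tie correction to the variance is omitted), and it is a generic textbook version, not the windowing or warning criterion used in the study.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test.

    S counts concordant minus discordant pairs; Z is the
    continuity-corrected normal approximation. Assumes no ties.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# A strictly increasing series gives the maximum S for its length.
s_up, z_up = mann_kendall([1, 2, 3, 4, 5])
```

A value of |Z| above 1.96 is conventionally read as a significant trend at the 5% level; the sign of Z gives the direction.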
Funding: Supported by the EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.
Abstract: IIF (Indirect Immune Fluorescence) has gained much attention recently due to its importance in medical sciences. The primary purpose of this work is to highlight a step-by-step methodology for detecting autoimmune diseases. The use of IIF for detecting autoimmune diseases is widespread in different medical areas. Nearly 80 different types of autoimmune diseases exist in various body parts. IIF has been used for image classification in both ways, manually and by using a Computer-Aided Detection (CAD) system. Data scientists have conducted various research works using automatic CAD systems with low accuracy. Diseases in the human body can be detected with the help of Transfer Learning (TL), an advanced Convolutional Neural Network (CNN) approach. The baseline paper applied manual classification to the MIVIA dataset of Human Epithelial (HEP) type II cells and used the Subclass Discriminant Analysis (SDA) technique to detect autoimmune diseases. The technique yielded an accuracy of up to 90.03%, which was not reliable for detecting autoimmune disease in the mitotic cells of the body. In the current research, the work has been performed on the MIVIA dataset of HEP type II cells by using four well-known TL models. Data augmentation and normalization have been applied to the dataset to overcome the problem of overfitting and to improve the performance of the TL models. These models are Inception V3, DenseNet-121, VGG-16, and MobileNet, and their performance is measured through parameters of the confusion matrix (accuracy, precision, recall, and F1 measure). The results show accuracy values of 78.00% for VGG-16, 92.00% for Inception V3, 95.00% for DenseNet-121, and 88.00% for MobileNet. Therefore, DenseNet-121 shows the highest performance with suitable analysis of autoimmune diseases. The overall performance highlights that TL is a suitable and enhanced technique compared to its counterparts. Also, the proposed technique is used to detect autoimmune diseases with a minimal margin of errors and flaws.
Abstract: Fraud detection for credit/debit cards, loan defaulters, and similar cases is achievable with the assistance of Machine Learning (ML) algorithms, as they are well capable of learning from previous fraud trends or historical data and spotting them in current or future transactions. Fraudulent cases are scant in comparison with non-fraudulent observations in almost all datasets. In such cases, detecting fraudulent transactions is quite difficult. The most effective way to prevent loan default is to identify non-performing loans as soon as possible. Machine learning algorithms are emerging as adept at handling such data given enough computing power. In this paper, the performance of different machine learning algorithms such as Decision Tree, Random Forest, linear regression, and the Gradient Boosting method is compared for the detection and prediction of fraud cases using loan fraud manifestations. Further, model accuracy has been evaluated with the confusion matrix and the calculation of accuracy, precision, recall, and F1 score, along with Receiver Operating Characteristic (ROC) curves.
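The confusion-matrix metrics listed above follow directly from the four cell counts. The sketch below assumes a binary task in which the fraudulent class is labelled 1; the function name and the toy labels are illustrative only and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced example: 3 fraud cases among 10 transactions.
metrics = binary_metrics([1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
                         [1, 0, 0, 0, 0, 0, 1, 0, 0, 1])
```

On imbalanced data such as fraud records, accuracy alone can look high even when many fraud cases are missed, which is why precision, recall, and F1 are reported alongside it.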
Abstract: Social media are interactive computer-mediated technologies that facilitate the sharing of information via virtual communities and networks. Twitter is one of the most popular social media platforms for social interaction and microblogging. This paper introduces an improved system model to analyze Twitter data and detect terrorist attack events. In this model, a ternary search is used to find the weights of predefined keywords, and the Aho-Corasick algorithm is applied to perform pattern matching and assign the weights, which is the main contribution of this paper. Weights are grouped into three categories (Terror Attack, Severe Terror Attack, and Normal Data) and are used as attributes for classification. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are the two machine learning algorithms used to predict whether a terror attack happened or not. We compare the predictions with our actual data by using a confusion matrix to measure whether each result is right or wrong, and the achieved results show that the proposed model performs better.
Abstract: Data clustering plays a vital role in object identification. In real life, the concept is mainly used in biometric identification and object detection. In this paper, we use Fuzzy Weighted Rules, Fuzzy Inference System (FIS), Fuzzy C-Means clustering (FCM), Support Vector Machine (SVM), and Artificial Neural Network (ANN) to distinguish three classes of Iris data: Iris-Setosa, Iris-Versicolor, and Iris-Virginica. Each sample in the data table is described by a four-dimensional vector whose components are used as the input variables: Sepal Length (SL), Sepal Width (SW), Petal Length (PL), and Petal Width (PW). The combination of the five machine learning methods provides above 98% accuracy in class identification.
Funding: Supported in part by the National Natural Science Foundation of China (No. 52037003) and the Major Science and Technology Projects in Yunnan Province (No. 202402AG050006).
Abstract: In this work, a consistency detection method is proposed to overcome the inconsistencies in the use of large-scale lead-carbon energy storage batteries (LCESBs) and the difficulties of large-scale detection for LCESBs. Based on the chemical materials and physical mechanisms of LCESBs, the internal and external factors that affect consistency and their characterization parameters are analyzed. The inconsistency characterization parameters, such as voltage, temperature, and resistance, are used to construct a high-dimensional random matrix and calculate the matrix eigenvalues. The single ring theorem and the average spectral radius are then employed to carry out preliminary consistency detection. Next, short-term discharge experiments are conducted on the individual batteries flagged as inconsistent in the initial screening. The voltage and temperature data are collected, and the sequential overlapping derivative (SOD) transformation is performed to extract the characteristics of voltage and temperature changes. The consistency of individual cells is quantitatively characterized using the Wasserstein distance. Finally, the reliability of the consistency detection method is evaluated by the confusion matrix. Results on a large amount of actual measurement data show a false negative rate of 0 and an accuracy of 99.94% for the algorithm. This study shows that using random matrix theory for preliminary detection is suitable for processing high-dimensional data of large-scale energy storage power plants. Using SOD for precise detection can amplify the voltage, temperature, and resistance differences of inconsistent batteries, making the consistency detection more accurate.
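To make the distance measure mentioned above concrete, here is a minimal sketch of the one-dimensional first-order Wasserstein distance between two equally sized empirical samples, which reduces to the mean absolute difference of the sorted values. How the study applies the distance to SOD-transformed voltage and temperature features is not stated in the abstract, so the inputs here are hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """First-order Wasserstein distance between two 1-D samples.

    Assumes both samples have the same number of points, in which case
    the optimal transport plan pairs the sorted values one to one.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch requires equally sized samples")
    if not sample_a:
        raise ValueError("samples must not be empty")
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    total = sum(abs(x - y) for x, y in zip(a_sorted, b_sorted))
    return total / len(a_sorted)


# A sample shifted by one unit is exactly one unit away.
shift_distance = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

A cell whose feature distribution lies far from that of a reference cell under this distance would be a candidate for being inconsistent; the threshold for that decision is not given in the abstract.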
Funding: The Undergraduate Innovation and Entrepreneurship Project, Intelligent Medical Assistance System for Lung Nodule Detection Based on Deep Learning (Project No. S202510656067); the Special Fund for Basic Scientific Research of Central Universities at Southwest Minzu University (Approval No. ZYN2024069).
Abstract: Lung cancer is one of the malignancies with the highest incidence and mortality rates worldwide. Accurate detection of early lung nodules is crucial for improving patient survival. However, manual reading of medical images carries risks of missed or incorrect diagnoses, highlighting the urgent need for efficient and precise automated detection methods. This study constructs and optimizes a lung nodule detection model based on You Only Look Once version 8 (YOLOv8). Model adjustments include tuning learning rate strategies, weight decay, and color augmentation parameters. Experiments were conducted on the standard LIDC-IDRI dataset. Model performance was evaluated using recall, Mean Average Precision (mAP), and the confusion matrix. The results indicate a recall rate of 0.87 and an mAP of 0.776, with a correct lung nodule recognition rate of 0.76 in the confusion matrix. The study demonstrates that rational parameter optimization can effectively reduce missed detections and misjudgments while significantly improving detection accuracy and classification stability. This provides a feasible technical pathway for computer-aided early lung cancer diagnosis. Future work may further enhance the model's application value in intelligent medical imaging analysis by integrating data augmentation with parameter optimization.
Abstract: Microfinance institutions in Kenya play a unique role in promoting financial inclusion and in providing loans and savings, especially to low-income individuals and small-scale entrepreneurs. However, despite their benefits, most of their products and programs in Machakos County have been shrinking due to repayment challenges, threatening their financial ability to extend further credit. This could be attributed to ineffective credit scoring models that are not able to capture the nuanced non-linear repayment behavior and patterns of loan applicants. The research objective was to enhance credit risk scoring for microfinance institutions in Machakos County using supervised machine learning algorithms. The study adopted a mixed research design under a supervised machine learning approach. It randomly sampled 6771 loan application account records and repayment histories. RStudio and the Python programming language were deployed for data pre-processing and analysis. The logistic regression algorithm, XGBoost, and the random forest ensemble method were used. The evaluation metrics included accuracy, Area Under the Curve, and F1-score. Based on the study findings, XGBoost was the best performer, with 83.3% accuracy and a Brier score of 0.202. The development of a legal framework to govern the ethical and open use of machine learning assessment was recommended. Similar research using different machine learning algorithms, locations, and institutions was recommended to ascertain the validity, reliability, and generalizability of the study findings.
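The Brier score reported above is the mean squared difference between the predicted probability of default and the observed outcome (0 or 1); lower is better, and 0 is a perfect score. The sketch below uses made-up probabilities purely for illustration and does not reproduce the study's data or its 0.202 result.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    if not probabilities:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted default probabilities for four applicants.
score = brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])
```

Unlike accuracy, the Brier score rewards well-calibrated probabilities, which matters when a lender sets credit limits from the predicted risk rather than from a hard approve or reject label.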
Funding: Supported by the Hebei Province Introduction of Studying Abroad Talent Funded Project (No. C20200302), the Opening Fund of Hebei Key Laboratory of Machine Learning and Computational Intelligence (Nos. 2019-2021-A and ZZ201909-202109-1), the National Natural Science Foundation of China (No. 61976141), and the Social Science Foundation of Hebei Province (No. HB20TQ005).
Abstract: The performance of semisupervised clustering for unlabeled data is often superior to that of unsupervised learning, which indicates that semantic information attached to clusters can significantly improve feature representation capability. In a graph convolutional network (GCN), each node contains information about itself and its neighbors, which is beneficial for capturing common and unique features among samples. Combining these findings, we propose a deep clustering method based on GCN and semantic feature guidance (GFDC), in which a deep convolutional network is used as a feature generator and a GCN with a softmax layer performs clustering assignment. First, the diversity and amount of input information are enhanced to generate highly useful representations for downstream tasks. Subsequently, the topological graph is constructed to express the spatial relationship of features. For a pair of datasets, feature correspondence constraints are used to regularize the clustering loss, and clustering outputs are iteratively optimized. Three external evaluation indicators, i.e., clustering accuracy, normalized mutual information, and the adjusted Rand index, and an internal indicator, i.e., the Davies-Bouldin index (DBI), are employed to evaluate clustering performance. Experimental results on eight public datasets show that the GFDC algorithm is significantly better than the majority of competitive clustering methods; for example, its clustering accuracy is 20% higher than that of the best competing clustering method on the United States Postal Service dataset. The GFDC algorithm also has the highest accuracy on the smaller Amazon and Caltech datasets. Moreover, DBI indicates the dispersion of the cluster distribution and the compactness within each cluster.
Funding: Supported by the China Agriculture Research System (CARS-170404), the Qingyuan Science and Technology Plan (Grant No. 2022KJJH063), and the Guangzhou Science and Technology Plan (Grant No. 201903010063).
Abstract: Various diseases seriously affect the quality and yield of tomatoes. Fast and accurate identification of disease types is of great significance for the development of smart agriculture. Many Convolutional Neural Network (CNN) models have been applied to the identification of tomato leaf diseases and have achieved good results. However, some of these come at the cost of long computation time and large storage space. This study proposes a lightweight CNN model named MFRCNN, which is built on a multi-scale and feature reuse structure rather than by simply stacking convolutional layers. To examine the model's performance, two types of tomato leaf disease datasets were collected. One is a laboratory-based dataset, including one healthy class and nine disease classes, and the other is a field-based dataset, including five kinds of diseases. Afterward, the proposed MFRCNN and some popular CNN models (AlexNet, SqueezeNet, VGG16, ResNet18, and GoogLeNet) were tested on the two datasets. The results showed that, compared to the traditional models, the MFRCNN achieved the best performance, with accuracies of 99.01% and 98.75% on the laboratory and field datasets, respectively. The MFRCNN not only had the highest accuracy but also required relatively less computing time and fewer training parameters. In terms of storage space in particular, the MFRCNN model needs only 2.7 MB. Therefore, this work provides a novel solution for plant disease diagnosis, which is of great importance for the development of plant disease diagnosis systems on low-performance terminals.
Funding: Supported in part by National Science Foundation, USA grants CNS 1954556 and CNS 1932033.
Abstract: Lean combustion is environmentally friendly, with low NOx emissions, and provides better fuel efficiency in a combustion system. However, approaching lean combustion can make engines more susceptible to an undesirable phenomenon called lean blowout (LBO), which can cause flame extinction leading to a sudden loss of power. During the design stage, it is quite challenging for scientists to accurately determine the optimal operating limits to avoid sudden LBO occurrences. Therefore, it is crucial to develop accurate and computationally tractable frameworks for online LBO prediction in low NOx emission engines. To the best of our knowledge, we are the first to propose a deep learning approach to detect the transition to LBO in combustion systems. In this work, we utilize a laboratory-scale swirl-stabilized combustor to collect acoustic data for different protocols. For each protocol, starting far from LBO, we gradually move towards the LBO regime, capturing a quasi-static time series dataset at different conditions. Using one of the protocols in our dataset as the reference protocol, we find a transition state metric for our trained deep learning model to detect imminent LBO in the other test protocols. We find that our proposed approach is more precise and computationally faster than the baseline models in detecting the transition to LBO. Therefore, we endorse this technique for monitoring the operation of lean combustion engines in real time.
Abstract: Broiler flock welfare is usually assessed through mortality, physiology, behavior, and walking ability. The possibility of assessing broiler chicken lameness from the bird's walking ability was investigated using a machine learning approach for the first time. Data on broiler walking speed and acceleration, genetic strain, and sex were recorded and entered in a dataset. Broilers were classified according to the 6-point gait score (GS0 is a sound bird, and GS5 is a severely lame bird). Decision trees were initially built using all datasets. The confusion matrix of each developed model was analyzed. The pruning technique was used, removing from the dataset the variables that did not influence the classification results. We reorganized the dataset and rearranged the data by grouping the intermediate target classes of the gait score using the Borda Count method. After reprocessing the data, we obtained a new set of decision trees. Using the 3-point gait score (GS0 is a sound bird, and GS2 is a lame bird), we obtained a new model with better accuracy (78%); however, the model had a lower accuracy for classifying lame broilers (GS2, 5%). The final decision tree was selected for classifying broilers as either sound or lame according to their walking speed. The developed model presented good accuracy (91%), and it correctly classified sound (86%) and lame birds (92%). The novel model might be used to assess broiler lameness on-farm by registering the bird displacement velocity. Further developments using the model might allow automatic detection of flock lameness.