The naïve Bayes classifier is one of the commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption makes naïve Bayes a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate violations of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some come at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights in the conditional probability. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate its performance, we used public benchmark datasets and compared our method with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves classification performance over both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the Chi-square test of independence.
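The idea of placing attribute weights in the conditional probability can be illustrated with a small sketch. It assumes the common formulation in which each weight w_i acts as an exponent on P(x_i | c), so the score becomes log P(c) + sum_i w_i log P(x_i | c). The abstract does not say how the authors derive their weights, so the weight values, the toy data, and the function names `train` and `predict` below are purely hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-attribute value frequencies (categorical data)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value -> count
    domains = defaultdict(set)           # attribute index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row, weights):
    """Return the class maximizing log P(c) + sum_i w_i * log P(x_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(x_i | c)
            p = (value_counts[(i, c)][value] + 1) / (n_c + len(domains[i]))
            # The attribute weight scales the log-likelihood (hypothetical weights)
            score += weights[i] * math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data: (outlook, temperature) -> label
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)

# Equal weights reproduce standard naive Bayes; a zero weight ignores an attribute.
print(predict(model, ("rainy", "hot"), weights=[1.0, 1.0]))
print(predict(model, ("rainy", "hot"), weights=[0.0, 1.0]))
```

With all weights equal to 1 the sketch reduces to standard naïve Bayes, which is one way to see that this style of weighting keeps the model's mathematical structure.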
As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Because it is more economical to combine commodity hardware (medium servers or PCs) with a virtual environment, use it as a single server resource, and filter malicious email with machine learning techniques, we used the Hadoop MapReduce framework and Naïve Bayes for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using Naïve Bayes, a supervised machine learning method, in a performance-optimized Hadoop framework; and with a Python program using Naïve Bayes on a bare-metal server without Hadoop. A comparison of the accuracy and prediction error rates of the two methods shows that the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 compared with the non-Hadoop Python Naïve Bayes method.
Intrusion detection is the process of examining information about system activities or data to detect malicious behavior or unauthorized activity. Most intrusion detection systems (IDS) implement K-means clustering because of its linear complexity and fast computation. Nonetheless, its naïve use of the mean data value as the cluster core is a major drawback. Two circular clusters with different radii can be centered at the same mean, and K-means cannot handle this case because the mean values of the clusters are very close together. K-means also fails when the clusters are not spherical. To overcome this issue, a new hybrid model that integrates expectation-maximization (EM) clustering using a Gaussian mixture model (GMM) with a naïve Bayes classifier has been proposed. In this model, GMMs give more flexibility than K-means in terms of cluster covariance. They also use probability functions and soft clustering, so a single data point can belong to multiple clusters. In a GMM, the cluster shape is defined by two parameters: the mean and the standard deviation. With these two parameters, a cluster can take any kind of elliptical shape. EM-GMM will be used to cluster data into the corresponding category based on data activity.
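The soft-clustering point can be sketched with the E-step of EM for a mixture that K-means cannot separate: two components with the same mean but different spreads. This is a simplified one-dimensional illustration with fixed, made-up parameters (the mixing weights, means, and standard deviations are assumptions, not values from the paper), and it shows only the responsibility computation, not the full EM fit.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, components):
    """E-step for one point: posterior probability of each mixture component.

    components is a list of (mixing weight, mean, standard deviation) tuples.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in components]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical clusters sharing the mean 0.0 but with different spreads.
components = [(0.5, 0.0, 1.0), (0.5, 0.0, 5.0)]

for x in (0.5, 3.0, 8.0):
    # Each point receives a soft membership in both clusters.
    print(x, [round(r, 3) for r in responsibilities(x, components)])
```

Points near the shared mean are assigned mostly to the narrow component and distant points mostly to the wide one, a distinction that a centroid-only method cannot make when both centers coincide.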
This paper proposes an improved Naïve Bayes classifier for sentiment analysis of large-scale datasets such as YouTube. YouTube contains large volumes of unstructured and unorganized comments and reactions, which carry important information. Organizing large amounts of data and extracting useful information is a challenging task. The extracted information can be considered new knowledge and can be used for decision-making. We extract comments on YouTube videos, categorize them by domain, and then apply the Naïve Bayes classifier with improved techniques. Our method provided a decent 80% accuracy in classifying those comments. This experiment shows that the proposed method provides excellent adaptability for large-scale text classification.
AIM: To investigate the efficacy and safety of repeated dexamethasone implants with real-life data in eyes with treatment-naive retinal vein occlusion (RVO) with macular edema (ME) at a minimum of 60 mo follow-up. METHODS: In this retrospective cohort study, data on best corrected visual acuity (BCVA), central macular thickness (CMT), serous macular detachment (SMD), hard exudate, hyperreflective foci (HRF), cystoid degeneration, pearl necklace sign, epiretinal membrane (ERM), disorganization of retinal inner layers (DRIL), ellipsoid zone and external limiting membrane (EZ-ELM) integrity, intraocular pressure (IOP), and lens condition were recorded. RESULTS: Thirty-eight eyes of 38 patients were included in the study. Thirteen patients presented with central RVO (CRVO) and 25 with branch RVO (BRVO). The mean follow-up time was 69.9±15.8 mo, and the mean number of injections was 7.9±4.0. The mean BCVA gain was 25.0±36 letters, and this difference was statistically significant (P=0.021). The BCVA gain was 19.4±20.4 letters in the CRVO group and 26.5±38.6 letters in the BRVO group (P=0.763). In addition, 21 (55.2%) of the patients achieved an improvement of ≥15 letters. At the end of the follow-up period, SMD was not observed in any of the patients (P=0.016). Hard exudate and HRF number decreased, while DRIL, ERM, and EZ-ELM defects increased, but not significantly. CONCLUSION: Intravitreal dexamethasone monotherapy is an effective and safe treatment option for treatment-naive RVO-ME patients in long-term follow-up.
N6-methyladenosine (m6A) plays crucial roles in development and cellular reprogramming. During embryonic development, pluripotency transitions from a naïve to a primed state, and modeling the reverse primed-to-naïve transition (PNT) provides a valuable framework for investigating pluripotency regulation. Here, we show that inhibiting METTL3 significantly promotes PNT in an m6A-dependent manner. Mechanistically, we found that suppressing METTL3 and YTHDF2 prolongs the lifetimes of pluripotency-associated mRNAs, such as Nanog and Sox2, during PNT. In addition, Gstp1 was identified as a downstream target of METTL3 inhibition and YTHDF2 knockout. Gstp1 overexpression enhances PNT, whereas its inhibition impedes the transition. Overall, our findings suggest that YTHDF2 facilitates the removal of pluripotency gene transcripts and Gstp1, thereby promoting PNT reprogramming through m6A-mediated posttranscriptional control.
The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption or even causes the total stoppage of the machine. It is therefore essential to correctly diagnose the fault caused by the bearing, hence the importance of determining an effective feature extraction method that best describes the fault. The aim of this paper is to merge feature selection methods in order to define the most relevant features in the texture of the vibration signal images. In this study, the Gray Level Co-occurrence Matrix (GLCM) in texture analysis is applied to the vibration signal represented as images. Feature selection based on merging the PCA (Principal Component Analysis) method and the SFE (Sequential Features Extraction) method is performed to obtain the most relevant features. The multiclass Naïve Bayes classifier is used to test the proposed approach. The success rate of this classification is 98.27%. The relevant features obtained give promising results and are more efficient than the methods observed in the literature.
The Washington, DC crash statistic report for the period from 2013 to 2015 shows that the city recorded about 41789 crashes at unsignalized intersections, which resulted in 14168 injuries and 51 fatalities. The economic cost of these fatalities has been estimated to be in the millions of dollars. It is therefore necessary to investigate the predictability of the occurrence of these crashes, based on pertinent factors, in order to provide mitigating measures. This research focused on developing models to predict the injury severity of crashes using support vector machines (SVMs) and Gaussian naïve Bayes classifiers (GNBCs). The models were developed based on 3307 crashes that occurred from 2008 to 2015. Eight SVM models and one GNBC model were developed. The most accurate model was the SVM with a radial basis kernel function, which predicted the severity of an injury sustained in a crash with an accuracy of approximately 83.2%. The GNBC produced the worst-performing model, with an accuracy of 48.5%. These models will enable transport officials to identify crash-prone unsignalized intersections and provide the necessary countermeasures beforehand.
In recent years, machine learning (ML) and deep learning (DL) have significantly advanced intrusion detection systems, effectively addressing potential malicious attacks across networks. This paper introduces a robust method for detecting and categorizing attacks within the Internet of Things (IoT) environment, leveraging the NSL-KDD dataset. To achieve high accuracy, the authors used a feature extraction technique in combination with an autoencoder, integrated with a gated recurrent unit (GRU). The accurate features are then selected using the cuckoo search algorithm integrated with particle swarm optimization (PSO), and PSO is employed for training the features. The final classification of features is carried out using the proposed RF-GNB classifier, which combines random forest with Gaussian Naïve Bayes. The proposed model has been evaluated with standard metrics such as precision, accuracy rate, recall, and F1-score, and has been compared with different existing models. The results, with approximately 99.87% of intrusions detected within IoT environments, demonstrated the high performance of the proposed method and affirmed its efficacy in increasing the accuracy of intrusion detection within IoT network systems.
Classification models have received great attention in many domains of research and are a reliable tool for medical disease diagnosis. Classification models are used in disease diagnosis, disease prediction, bioinformatics, crime prediction, and so on. However, an efficient disease diagnosis model can compromise disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of the disease being diagnosed. The RS-RMC involves two steps. First, a Rough Set model is used for feature selection, aiming to minimize the execution time for obtaining the disease feature set. Second, a Multitude Classifier model is presented for the detection of heart disease and for efficient classification. The Naïve Bayes classifier algorithm is designed for efficient identification of classes, measuring the relationship between disease features and improving the disease prediction rate. Experimental analysis shows that RS-RMC reduces the execution time for extracting the disease features with a minimum false positive rate compared to state-of-the-art works.
We used two probabilistic methods, Gaussian Naïve Bayes and Logistic Regression, to predict the genotypes of the offspring of two maize strains, the BLC and the JNE genotypes, based on the phenotypic traits of the parents. We determined the prediction performance of the two models with the overall accuracy and the area under the receiver operating characteristic curve (AUC). The overall accuracy for both models ranged between 82% and 87%. The AUC values were 0.90 or higher for Logistic Regression models and 0.85 or higher for Gaussian Naïve Bayes models. These statistics indicated that the two models were very effective in predicting the genotypes of the offspring. Furthermore, both models predicted the BLC genotype with higher accuracy than the JNE genotype. The BLC genotype appeared more homogeneous and more predictable. A Chi-square test for the homogeneity of the confusion matrices showed that in all cases the two models produced similar prediction results. That finding was in line with the assertion by Mitchell (2010), who showed theoretically that the two models are essentially the same. With logistic regression, each subset of the original data or its corresponding principal components produced exactly the same prediction results. The AUC value may be viewed as a criterion for parent-offspring resemblance for each set of phenotypic traits considered in the analysis.
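The theoretical link between the two models can be written out under one standard assumption that the abstract does not state explicitly: each feature x_i is Gaussian given the class, with class-specific means and a variance shared across the two classes. The symbols below (\pi for the prior of class 1, \mu_{i0} and \mu_{i1} for the class means, \sigma_i for the shared standard deviation) are notation introduced here for illustration.

```latex
% Gaussian naive Bayes with class-independent variances has a logistic posterior.
% Assumption: x_i | Y=k ~ N(\mu_{ik}, \sigma_i^2), features independent given Y, P(Y=1) = \pi.
\begin{align}
P(Y=1 \mid x)
  &= \frac{\pi \prod_i p(x_i \mid Y=1)}
          {\pi \prod_i p(x_i \mid Y=1) + (1-\pi) \prod_i p(x_i \mid Y=0)}
   = \frac{1}{1 + \exp\!\bigl(-(b + \sum_i w_i x_i)\bigr)}, \\
w_i &= \frac{\mu_{i1} - \mu_{i0}}{\sigma_i^{2}}, \qquad
b = \ln\frac{\pi}{1-\pi} + \sum_i \frac{\mu_{i0}^{2} - \mu_{i1}^{2}}{2\sigma_i^{2}} .
\end{align}
```

Under this assumption both models produce a linear decision boundary and differ only in how the coefficients are estimated, which is consistent with the similar confusion matrices reported above.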
Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Remote sensing advancements increase the potential for classifying land cover, which requires assessing algorithm accuracy and efficiency for fragile environments. This research identifies the best algorithms for LULC monitoring and for developing adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 facilitated land use class identification. Preprocessing in the Google Earth Engine applied spectral indices such as the NDVI, NDWI, BSI, and NDBI. Supervised classification used random forest (RF), support vector machine (SVM), classification and regression trees (CARTs), gradient boosting trees (GBTs), and naïve Bayes. An accuracy assessment was used to determine the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a 0.85 weighted F1 score, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68), indicating the presence of overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is optimal for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
In the era of rapid digital transformation, social networks generate huge amounts of textual data every day, making sentiment analysis an essential tool for understanding public opinion. This study focuses on the application of probabilistic and statistical methods to sentiment analysis in social networks, highlighting their effectiveness in dealing with uncertainty and modeling the distribution of emotions. The main objective is to evaluate the role of Naïve Bayes (NB), Hidden Markov Models (HMMs), and Bayesian networks in emotion classification, emotion propagation, and dynamic emotion tracking. Through literature review and comparative analysis, this study examines the existing research, computational efficiency, and real-world applications of probabilistic classification models. The results show that Naïve Bayes is computationally efficient and effective for large-scale emotion classification, while HMMs and Bayesian networks excel in sequential emotion prediction and user behavior modeling. The study highlights the advantages of probabilistic methods in sentiment analysis, while acknowledging their limitations, such as their reliance on probabilistic assumptions and the challenges of capturing deep contextual semantics. Future research should explore hybrid approaches that combine probabilistic models with deep learning techniques to improve the predictive performance and scalability of real-time sentiment analysis.
Syndrome differentiation is the core diagnosis method of Traditional Chinese Medicine (TCM). We propose a method that simulates syndrome differentiation through deductive reasoning on a knowledge graph to achieve automated diagnosis in TCM. We analyze the reasoning path patterns from symptoms to syndromes on the knowledge graph. There are two kinds of path patterns in the knowledge graph: one-hop and two-hop. The one-hop path pattern maps a symptom to syndromes directly. The two-hop path pattern maps a symptom to syndromes through the nature of disease, etiology, and pathomechanism to support the diagnostic reasoning. Considering the different support strengths of the knowledge paths in reasoning, we design a dynamic weight mechanism. We use Naïve Bayes and TF-IDF to implement the reasoning method and the weighted score calculation. The proposed method infers the syndrome results by calculating their likelihood from the weighted scores of the paths in the knowledge graph, based on the reasoning path patterns. We evaluate the method with clinical records and clinical practice in hospitals. The preliminary results suggest that the method achieves high performance and can help TCM doctors make better diagnosis decisions in practice. The method is also robust and explainable under the guidance of the knowledge graph. It could help TCM physicians, especially primary physicians in rural areas, and provide clinical decision support in clinical practice.
Social media data are rapidly increasing and constitute a source of user opinions and tips on a wide range of products and services. The increasing availability of such big data on biased reviews and blogs creates challenges for customers and businesses in reviewing all content in their decision-making process. Extracting suggestions from opinionated text is a possible solution to this challenge. In this study, the characteristics of suggestions are analyzed, and a suggestion mining extraction process is presented for classifying suggestive sentences from online customers' reviews. Classification with a word-embedding approach is performed using the XGBoost classifier. The two datasets used in this experiment relate to online hotel reviews and Microsoft Windows App Studio discussion reviews. F1, precision, recall, and accuracy scores are calculated. The results demonstrated that the XGBoost classifier outperforms the compared classifiers, with an accuracy of more than 80%. Moreover, the results revealed that suggestion keywords and phrases are the predominant features for suggestion extraction. Thus, this study contributes to knowledge and practice by comparing feature extraction classifiers and identifying XGBoost as a better suggestion mining process for identifying online reviews.
Crimes are expected to rise with an increase in population and the rising gap between society's income levels. Crimes contribute to a significant portion of the socioeconomic loss to any society, not only through their indirect damage to the social fabric and peace but also through more direct negative impacts on the economy, social parameters, and reputation of a nation. Policing and other preventive resources are limited and have to be used efficiently. Conventional methods are being superseded by more modern machine learning algorithms capable of making predictions where the relationships between the features and the outcomes are complex, which makes it possible for such algorithms to indicate specific areas that may become criminal hot-spots. These predictions can be used by policymakers and police personnel alike to make effective and informed strategies that can curtail criminal activities and contribute to the nation's development. This paper aims to predict the factors that most affected crimes in Saudi Arabia by developing a machine learning model to predict an acceptable output value. Our results show that FAMD as a feature selection method gave machine learning classifiers higher accuracy than the PCA method. The naïve Bayes classifier performs better than the other classifiers with both feature selection methods, with an accuracy of 97.53% for FAMD and 97.10% for PCA.
The rapid progress of the Internet has exposed networks to an increased number of threats. Intrusion detection technology can effectively protect network security against malicious attacks. In this paper, we propose a ReliefF-P-Naive Bayes and softmax regression (RP-NBSR) model based on machine learning for network attack detection, to reduce the false detection rate and improve the F1 score for unknown intrusion behavior. In the proposed model, the Pearson correlation coefficient is introduced to compensate for deficiencies in the correlation analysis between features by the ReliefF feature selection algorithm, and a ReliefF-Pearson correlation coefficient (ReliefF-P) algorithm is proposed. Then, the ReliefF-P algorithm is used to preprocess the UNSW-NB15 dataset to remove irrelevant features and obtain a new feature subset. Finally, a naïve Bayes and softmax regression (NBSR) classifier is constructed by cascading the naïve Bayes classifier and the softmax regression classifier, and an attack detection model based on RP-NBSR is established. The experimental results on the UNSW-NB15 dataset show that the attack detection model based on RP-NBSR has a lower false detection rate and higher F1 score than other detection models.
Social media networks are becoming essential to our daily activities, and many issues arise from this great involvement in our lives. Cyberbullying is a social media network issue and a global crisis affecting the victims and society as a whole. It results from a misunderstanding regarding freedom of speech. In this work, we propose a methodology for detecting such behaviors (bullying, harassment, and hate-related texts) using supervised machine learning algorithms (SVM, Naïve Bayes, logistic regression, and random forest) and for predicting a topic associated with these text data using unsupervised natural language processing, such as latent Dirichlet allocation. In addition, we used accuracy, precision, recall, and F1 score to assess these classifiers. Results show that logistic regression, support vector machine, random forest, and Naïve Bayes achieve 95%, 94.97%, 94.66%, and 93.1% accuracy, respectively.
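Several of the abstracts above report accuracy, precision, recall, and F1. As a reminder of how these four figures relate, the following minimal sketch computes them for a binary task from lists of true and predicted labels. The function name `binary_metrics` and the toy labels are illustrative assumptions and are not taken from any of the papers.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 1 = bullying text, 0 = benign text (made-up labels).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Accuracy alone can look strong on imbalanced data, which is why precision, recall, and F1 are usually reported alongside it.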
Funding: Supported by the National Key R&D Program of China (2021YFA1102200, 2024YFA1802300 and 2024YFA1107000); the National Natural Science Foundation of China (32225012); the Major Project of Guangzhou National Laboratory (GZNL2023A02005); the Guangdong Basic and Applied Basic Research Foundation (2025A1515012426, 2023A1515010420); Science and Technology Projects in Guangzhou (2023A04J0724); the Science and Technology Planning Project of Guangdong Province, China (2023B1212060050, 2023B1212120009); and the Health@InnoHK Program launched by the Innovation Technology Commission of the Hong Kong SAR, P.R. China.
Abstract: N6-methyladenosine (m⁶A) plays crucial roles in development and cellular reprogramming. During embryonic development, pluripotency transitions from a naïve to a primed state, and modeling the reverse primed-to-naïve transition (PNT) provides a valuable framework for investigating pluripotency regulation. Here, we show that inhibiting METTL3 significantly promotes PNT in an m⁶A-dependent manner. Mechanistically, we found that suppressing METTL3 and YTHDF2 prolongs the lifetimes of pluripotency-associated mRNAs, such as Nanog and Sox2, during PNT. In addition, Gstp1 was identified as a downstream target of METTL3 inhibition and YTHDF2 knockout. Gstp1 overexpression enhances PNT, whereas its inhibition impedes the transition. Overall, our findings suggest that YTHDF2 facilitates the removal of pluripotency gene transcripts and Gstp1, thereby promoting PNT reprogramming through m⁶A-mediated posttranscriptional control.
Abstract: The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption, or can even cause total stoppage of the machine. It is therefore essential to correctly diagnose the fault caused by the bearing, and hence important to determine an effective feature extraction method that best describes the fault. The aim of this paper is to merge feature selection methods in order to define the most relevant features in the texture of the vibration signal images. In this study, the Gray Level Co-occurrence Matrix (GLCM) used in texture analysis is applied to the vibration signal represented as images. Feature selection based on merging the PCA (Principal Component Analysis) method and the SFE (Sequential Features Extraction) method is performed to obtain the most relevant features. The multiclass Naïve Bayes classifier is used to test the proposed approach. The success rate of this classification is 98.27%. The relevant features obtained give promising results and are more efficient than the methods observed in the literature.
Abstract: The Washington, DC crash statistics report for the period from 2013 to 2015 shows that the city recorded about 41789 crashes at unsignalized intersections, which resulted in 14168 injuries and 51 fatalities. The economic cost of these fatalities has been estimated to be in the millions of dollars. It is therefore necessary to investigate the predictability of the occurrence of these crashes, based on pertinent factors, in order to provide mitigating measures. This research focused on the development of models to predict the injury severity of crashes using support vector machines (SVMs) and Gaussian naïve Bayes classifiers (GNBCs). The models were developed based on 3307 crashes that occurred from 2008 to 2015. Eight SVM models and a GNBC model were developed. The most accurate model was the SVM with a radial basis kernel function. This model predicted the severity of an injury sustained in a crash with an accuracy of approximately 83.2%. The GNBC produced the worst-performing model, with an accuracy of 48.5%. These models will enable transport officials to identify crash-prone unsignalized intersections and provide the necessary countermeasures beforehand.
Funding: The Deanship of Scientific Research at Shaqra University funded this research work through project number SU-ANN-2023051.
Abstract: In recent years, machine learning (ML) and deep learning (DL) have significantly advanced intrusion detection systems, effectively addressing potential malicious attacks across networks. This paper introduces a robust method for detecting and categorizing attacks within the Internet of Things (IoT) environment, leveraging the NSL-KDD dataset. To achieve high accuracy, the authors used a feature extraction technique in combination with an autoencoder, integrated with a gated recurrent unit (GRU). The relevant features are then selected using the cuckoo search algorithm integrated with particle swarm optimization (PSO), and PSO has been employed for training the features. The final classification of features has been carried out using the proposed RF-GNB, a random forest with the Gaussian Naïve Bayes classifier. The proposed model has been evaluated with standard metrics such as precision, accuracy rate, recall, and F1-score, and has been compared with different existing models. The results, in which approximately 99.87% of intrusions within the IoT environments were detected, demonstrate the high performance of the proposed method and affirm its efficacy in increasing the accuracy of intrusion detection within IoT network systems.
Abstract: Classification models have received great attention in many domains of research and are also a reliable tool for medical disease diagnosis. Classification models are used in disease diagnosis, disease prediction, bioinformatics, crime prediction and so on. However, efficient disease diagnosis models have compromised disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of the disease being diagnosed. The RS-RMC involves two steps. First, a Rough Set model is used for feature selection, aiming to minimize the execution time for obtaining the disease feature set. Second, a Multitude Classifier model is presented for detection of heart disease and for efficient classification. The Naïve Bayes classifier algorithm is designed for efficient identification of classes, measuring the relationship between disease features and improving the disease prediction rate. Experimental analysis shows that RS-RMC reduces the execution time for extracting the disease features, with a minimum false positive rate, compared to state-of-the-art works.
Abstract: We used two probabilistic methods, Gaussian Naïve Bayes and Logistic Regression, to predict the genotypes of the offspring of two maize strains, the BLC and the JNE genotypes, based on the phenotypic traits of the parents. We determined the prediction performance of the two models with the overall accuracy and the area under the receiver operating curve (AUC). The overall accuracy for both models ranged between 82% and 87%. The values of the area under the receiver operating curve were 0.90 or higher for Logistic Regression models, and 0.85 or higher for Gaussian Naïve Bayes models. These statistics indicated that the two models were very effective in predicting the genotypes of the offspring. Furthermore, both models predicted the BLC genotype with higher accuracy than they did the JNE genotype. The BLC genotype appeared more homogeneous and more predictable. A Chi-square test for the homogeneity of the confusion matrices showed that in all cases the two models produced similar prediction results. That finding was in line with the assertion by Mitchell (2010), who theoretically showed that the two models are essentially the same. With logistic regression, each subset of the original data or its corresponding principal components produced exactly the same prediction results. The AUC value may be viewed as a criterion for parent-offspring resemblance for each set of phenotypic traits considered in the analysis.
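As a rough illustration of the Gaussian Naïve Bayes model named in the abstract above, the following minimal sketch fits per-class means and variances and predicts the class with the highest log posterior. It is not the authors' implementation: the function names, the variance floor, and the toy measurements are all assumptions made for this example, and only the class labels BLC and JNE are borrowed from the abstract.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior, feature means, and feature variances for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # var_floor (an assumption) avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class whose log prior plus Gaussian log likelihoods is largest."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


# Toy data (made up for illustration): two trait measurements per plant.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.2], [3.1, 3.9], [2.9, 4.0]]
y = ["BLC", "BLC", "BLC", "JNE", "JNE", "JNE"]
model = fit_gaussian_nb(X, y)
```

Calling `predict(model, row)` on a new measurement returns one of the labels seen in training. The independence assumption shows up as the plain sum over features inside `log_posterior`.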
Abstract: Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Remote sensing advancements increase the potential for classifying land cover, which requires assessing algorithm accuracy and efficiency for fragile environments. This research identifies the best algorithms for LULC monitoring and for developing adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 facilitated land use class identification. Preprocessing in the Google Earth Engine applied spectral indices such as the NDVI, NDWI, BSI, and NDBI. Supervised classification used random forest (RF), support vector machine (SVM), classification and regression trees (CARTs), gradient boosting trees (GBTs), and naïve Bayes. An accuracy assessment was used to determine the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a 0.85 weighted F1 score, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68), indicating the presence of overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is optimal for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
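The spectral indices listed in the abstract above are simple band ratios, which the following sketch makes concrete for a single pixel. The abstract does not say which variant of each index was used, so the formulas here are the commonly cited ones (NDWI in its green/NIR form, BSI with SWIR1, red, NIR and blue) and should be read as assumptions; the function names and reflectance values are likewise illustrative.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form, assumed here)."""
    return normalized_difference(green, nir)


def ndbi(swir1, nir):
    """Normalized Difference Built-up Index."""
    return normalized_difference(swir1, nir)


def bsi(swir1, red, nir, blue):
    """Bare Soil Index in its common four-band form (assumed here)."""
    return normalized_difference(swir1 + red, nir + blue)


# Made-up surface reflectances for one vegetated pixel.
pixel = {"blue": 0.05, "green": 0.10, "red": 0.10, "nir": 0.50, "swir1": 0.30}
features = [
    ndvi(pixel["nir"], pixel["red"]),
    ndwi(pixel["green"], pixel["nir"]),
    ndbi(pixel["swir1"], pixel["nir"]),
    bsi(pixel["swir1"], pixel["red"], pixel["nir"], pixel["blue"]),
]
```

A list such as `features` could be appended to the raw band values before a pixel is passed to any of the classifiers the abstract compares.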
Abstract: In the era of rapid digital transformation, social networks generate huge amounts of textual data every day, making sentiment analysis an essential tool for understanding public opinion. This study focuses on the application of probabilistic and statistical methods to sentiment analysis in social networks, highlighting their effectiveness in dealing with uncertainty and modeling the distribution of emotions. The main objective is to evaluate the role of Naïve Bayes (NB), Hidden Markov Models (HMMs), and Bayesian networks in emotion classification, emotion propagation, and dynamic emotion tracking. Through literature review and comparative analysis, this study examines the existing research, computational efficiency, and real-world applications of probabilistic classification models. The results show that Naïve Bayes is computationally efficient and effective for large-scale emotion classification, while HMMs and Bayesian networks excel in sequential emotion prediction and user behavior modeling. The study highlights the advantages of probabilistic methods in sentiment analysis, while acknowledging their limitations, such as their reliance on probabilistic assumptions and the challenges of capturing deep contextual semantics. Future research should explore hybrid approaches that combine probabilistic models with deep learning techniques to improve the predictive performance and scalability of real-time sentiment analysis.
Funding: This work is supported by the National Key Research and Development Program of China under Grant 2017YFB1002304 and the China Scholarship Council under Grant 201906465021.
Abstract: Syndrome differentiation is the core diagnosis method of Traditional Chinese Medicine (TCM). We propose a method that simulates syndrome differentiation through deductive reasoning on a knowledge graph to achieve automated diagnosis in TCM. We analyze the reasoning path patterns from symptom to syndromes on the knowledge graph. There are two kinds of path patterns in the knowledge graph: one-hop and two-hop. The one-hop path pattern maps the symptom to syndromes directly. The two-hop path pattern maps the symptom to syndromes through the nature of disease, etiology, and pathomechanism to support the diagnostic reasoning. Considering the different support strengths of the knowledge paths in reasoning, we design a dynamic weight mechanism. We utilize Naïve Bayes and TF-IDF to implement the reasoning method and the weighted score calculation. The proposed method infers the syndrome results by calculating the possibility according to the weighted score of the path in the knowledge graph, based on the reasoning path patterns. We evaluate the method with clinical records and clinical practice in hospitals. The preliminary results suggest that the method achieves high performance and can help TCM doctors make better diagnosis decisions in practice. Meanwhile, the method is robust and explainable under the guidance of the knowledge graph. It could help TCM physicians, especially primary physicians in rural areas, and provide clinical decision support in clinical practice.
Funding: This research is funded by Taif University, TURSP-2020/115.
Abstract: Social media data are rapidly increasing and constitute a source of user opinions and tips on a wide range of products and services. The increasing availability of such big data on biased reviews and blogs creates challenges for customers and businesses in reviewing all content in their decision-making process. To overcome this challenge, extracting suggestions from opinionated text is a possible solution. In this study, the characteristics of suggestions are analyzed and a suggestion mining extraction process is presented for classifying suggestive sentences from online customers' reviews. Classification is performed with the XGBoost classifier using a word-embedding approach. The two datasets used in this experiment relate to online hotel reviews and Microsoft Windows App Studio discussion reviews. F1, precision, recall, and accuracy scores are calculated. The results demonstrated that the XGBoost classifier outperforms the other classifiers compared, with an accuracy of more than 80%. Moreover, the results revealed that suggestion keywords and phrases are the predominant features for suggestion extraction. Thus, this study contributes to knowledge and practice by comparing feature extraction classifiers and identifying XGBoost as a better suggestion mining process for identifying online reviews.
Abstract: Crimes are expected to rise with an increase in population and the rising gap between society's income levels. Crimes contribute to a significant portion of the socioeconomic loss to any society, not only through indirect damage to the social fabric and peace but also through more direct negative impacts on the economy, social parameters, and reputation of a nation. Policing and other preventive resources are limited and have to be utilized efficiently. Conventional methods are being superseded by more modern machine learning algorithms capable of making predictions where the relationships between the features and the outcomes are complex, making it possible for such algorithms to provide indicators of specific areas that may become criminal hot-spots. These predictions can be used by policymakers and police personnel alike to make effective and informed strategies that can curtail criminal activities and contribute to the nation's development. This paper aims to predict the factors that most affected crimes in Saudi Arabia by developing a machine learning model to predict an acceptable output value. Our results show that FAMD as a feature selection method yielded higher accuracy across machine learning classifiers than the PCA method. The naïve Bayes classifier performs better than the other classifiers with both feature selection methods, with an accuracy of 97.53% for FAMD and 97.10% for PCA.
Funding: Supported by the National Natural Science Foundation of China (61300216, Wang, H, www.nsfc.gov.cn).
Abstract: The rapid progress of the Internet has exposed networks to an increased number of threats. Intrusion detection technology can effectively protect network security against malicious attacks. In this paper, we propose a ReliefF-P-Naive Bayes and softmax regression (RP-NBSR) model based on machine learning for network attack detection to improve the false detection rate and F1 score for unknown intrusion behavior. In the proposed model, the Pearson correlation coefficient is introduced to compensate for deficiencies in the correlation analysis between features performed by the ReliefF feature selection algorithm, and a ReliefF-Pearson correlation coefficient (ReliefF-P) algorithm is proposed. Then, the ReliefF-P algorithm is used to preprocess the UNSW-NB15 dataset to remove irrelevant features and obtain a new feature subset. Finally, a naïve Bayes and softmax regression (NBSR) classifier is constructed by cascading the naïve Bayes classifier and the softmax regression classifier, and an attack detection model based on RP-NBSR is established. The experimental results on the UNSW-NB15 dataset show that the attack detection model based on RP-NBSR has a lower false detection rate and higher F1 score than other detection models.
Abstract: Social media networks are becoming essential to our daily activities, and many issues arise from this great involvement in our lives. Cyberbullying is a social media network issue, a global crisis affecting the victims and society as a whole. It results from a misunderstanding regarding freedom of speech. In this work, we proposed a methodology for detecting such behaviors (bullying, harassment, and hate-related texts) using supervised machine learning algorithms (SVM, Naïve Bayes, logistic regression, and random forest) and for predicting a topic associated with these text data using unsupervised natural language processing, such as latent Dirichlet allocation. In addition, we used accuracy, precision, recall, and F1 score to assess these classifiers. Results show that logistic regression, support vector machine, random forest, and Naïve Bayes achieve 95%, 94.97%, 94.66%, and 93.1% accuracy, respectively.
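The four evaluation measures named in the abstract above (accuracy, precision, recall, and F1 score) can all be derived from binary confusion counts, as the following minimal sketch shows. The function name and the counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 40 bullying texts caught, 10 false alarms,
# 5 bullying texts missed, 45 harmless texts correctly passed.
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Accuracy alone can look strong on imbalanced data, which is one reason precision, recall, and F1 are often reported alongside it.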