Funding: Supported by the National Natural Science Foundation of China (No. 82074524) and the Harbin Medical University Graduate Research and Practice Innovation Project (No. YJSCX2023-50HYD).
Abstract: AIM: To develop different machine learning models to train and test diplopia images and data generated by the computerized diplopia test. METHODS: Diplopia images and data generated by computerized diplopia tests, along with patient medical records, were retrospectively collected from 3244 cases. Diagnostic models were constructed using logistic regression (LR), decision tree (DT), support vector machine (SVM), extreme gradient boosting (XGBoost), and deep learning (DL) algorithms. A total of 2757 diplopia images were randomly selected as training data, while the test dataset contained 487 diplopia images. The optimal diagnostic model was evaluated using test set accuracy, the confusion matrix, and the precision-recall curve (P-R curve). RESULTS: The test set accuracy of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.762, 0.811, 0.818, 0.812, 0.858, and 0.858, respectively. The accuracy in the training set was 0.785, 0.815, 0.998, 0.965, 0.968, and 0.967, respectively. The weighted precision of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.74, 0.77, 0.83, 0.80, 0.85, and 0.85, respectively; weighted recall was 0.76, 0.81, 0.82, 0.81, 0.86, and 0.86, respectively; and the weighted F1 score was 0.74, 0.79, 0.82, 0.80, 0.85, and 0.85, respectively. CONCLUSION: In this study, the machine learning algorithms evaluated all achieved automatic diagnosis of extraocular muscle palsy. The DL (64 categories) and DL (6 binary classifications) algorithms have a significant advantage over the other machine learning algorithms in diagnostic accuracy on the test set, with a high level of consistency with clinical diagnoses made by physicians. They can therefore be used as a reference for diagnosis.
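The sketch below illustrates the kind of multi-model comparison described in this abstract, using scikit-learn and XGBoost with weighted precision, recall, and F1 as in the reported results. The feature matrix and labels are synthetic placeholders, not the diplopia-test data, and the deep-learning variants are omitted.

```python
# Hedged sketch: compare several classifiers on a generic feature matrix and report
# test accuracy plus weighted precision/recall/F1, mirroring the metrics in the abstract.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder data standing in for features derived from the computerized diplopia test.
X, y = make_classification(n_samples=3244, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=487, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    p, r, f1, _ = precision_recall_fscore_support(y_test, pred, average="weighted")
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```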
Funding: Financially supported by the National Key Research and Development Program of China (No. 2017YFB0304900).
Abstract: The gradient boosting decision tree (GBDT) machine learning (ML) method was adopted for the first time to automatically recognize and conduct quantitative statistical analysis of boundaries in bainitic microstructure using electron back-scatter diffraction (EBSD) data. Despite the lack of large sets of EBSD data, we succeeded in achieving the desired accuracy and accomplishing the objective of recognizing the boundaries. Compared with a low model accuracy of <50% when using Euler angles or the axis-angle pair as characteristic features, the accuracy of the model was significantly enhanced to about 88% when the Euler angle was converted to the overall misorientation angle (OMA) and specific misorientation angle (SMA) and these were treated as important features. In this model, the recall score of the prior austenite grain (PAG) boundary was ~93%, of the high-angle packet boundary (OMA > 40°) was ~97%, and of the block boundary was ~96%. The derived outcomes of ML were used to obtain insights into ductile-to-brittle transition temperature (DBTT) behavior. Interestingly, the ML modeling approach suggested that DBTT was not determined by the density of high-angle grain boundaries, but was significantly influenced by the density of PAG and packet boundaries. The study underscores that ML has great potential for detailed recognition of complex multi-hierarchical microstructures such as bainite and martensite and for relating them to material performance.
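A minimal sketch of the boundary-classification idea, assuming misorientation-angle features (OMA, SMA) and three boundary labels; the values below are random placeholders rather than real EBSD measurements, and scikit-learn's gradient boosting stands in for the paper's GBDT implementation.

```python
# Gradient-boosted trees on misorientation-angle features, with per-class recall
# reported for the PAG, packet, and block boundary labels named in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 62.8, size=(5000, 2))   # columns: [OMA, SMA] in degrees (placeholder)
y = rng.integers(0, 3, size=5000)            # 0=PAG, 1=packet, 2=block (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["PAG boundary", "packet boundary", "block boundary"]))
```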
Funding: This work was supported by the GRRC program of Gyeonggi Province [GRRC-Gachon2020(B04), Development of AI-based Healthcare Devices].
Abstract: Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize these signals. One of the most commonly used methods for feature extraction is Mel Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs are successful in processing the voice signal with high accuracy. MFCCs represent a sequence of voice-signal-specific features. This experimental analysis is proposed to distinguish Turkish speakers by extracting the MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step in the MFCC method we converted the obtained log filterbanks into decibel (dB) feature-based spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created by converting the spectrograms into 2-D arrays. Several learning algorithms were implemented with a 10-fold cross-validation method to detect the speaker. The highest accuracy of 90.2% was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.
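The following sketch illustrates the dB-spectrogram pipeline described above (mel filterbank energies converted to decibels, with the DCT skipped so no MFCCs are formed) feeding an MLP with tanh activation under 10-fold cross-validation. The audio clips and speaker labels are synthetic placeholders; real use would load WAV recordings with librosa.load.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def db_spectrogram(signal, sr=16000, n_mels=26, max_frames=100):
    """Mel filterbank energies converted to dB; the DCT (and hence MFCC) step is skipped."""
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
    db = librosa.power_to_db(mel)
    db = librosa.util.fix_length(db, size=max_frames, axis=1)  # pad/trim to a fixed width
    return db.flatten()

# Synthetic one-second "recordings" so the sketch runs end to end; real use would
# call librosa.load() on WAV files and attach true speaker labels.
rng = np.random.default_rng(0)
X = np.vstack([db_spectrogram(rng.standard_normal(16000).astype(np.float32)) for _ in range(40)])
y = np.repeat(np.arange(4), 10)       # four hypothetical speakers, ten clips each

mlp = MLPClassifier(activation="tanh", max_iter=500, random_state=0)
print("10-fold CV accuracy:", cross_val_score(mlp, X, y, cv=10).mean())
```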
Abstract: With the rapid development of the social economy, society has entered a new stage of development. Against the background of the rapid development of new media in particular, news and information have become more important than ever. To further distinguish positive from negative news, machine learning methods should be fully employed to classify news automatically on the basis of sentiment and thereby improve the efficiency of news classification. This article therefore first outlines the basics of news sentiment classification, then analyzes in depth the specific approaches to automatic classification of news sentiment, and on this basis puts forward concrete measures for automatic news sentiment classification using machine learning.
Funding: Supported by the Auckland Medical Research Foundation, No. 1117017 (to CPU).
Abstract: Perinatal hypoxic-ischemic-encephalopathy significantly contributes to neonatal death and life-long disability such as cerebral palsy. Advances in signal processing and machine learning have provided the research community with an opportunity to develop automated real-time identification techniques to detect the signs of hypoxic-ischemic-encephalopathy in larger electroencephalography/amplitude-integrated electroencephalography data sets more easily. This review details the recent achievements, performed by a number of prominent research groups across the world, in the automatic identification and classification of hypoxic-ischemic epileptiform neonatal seizures using advanced signal processing and machine learning techniques. This review also addresses the clinical challenges that current automated techniques face in order to be fully utilized by clinicians, and highlights the importance of upgrading the current clinical bedside sampling frequencies to higher sampling rates in order to provide better hypoxic-ischemic biomarker detection frameworks. Additionally, the article highlights that current clinical automated epileptiform detection strategies for human neonates have been concerned only with seizure detection after the therapeutic latent phase of injury. Recent animal studies, however, have demonstrated that the latent phase of opportunity is critically important for early diagnosis of hypoxic-ischemic-encephalopathy electroencephalography biomarkers; although difficult, detection strategies could also utilize biomarkers in the latent phase to predict the onset of future seizures.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51978517, 52090082, and 52108381), the Innovation Program of Shanghai Municipal Education Commission (Grant No. 2019-01-07-00-07-456 E00051), and the Shanghai Science and Technology Committee Program (Grant Nos. 21DZ1200601 and 20DZ1201404).
Abstract: The influence of a deep excavation on existing shield tunnels nearby is a vital issue in tunnelling engineering. However, robust methods to predict excavation-induced tunnel displacements are lacking. In this study, an automated machine learning (AutoML)-based approach is proposed to solve the issue precisely. Seven input parameters are considered in the database, covering two physical aspects, namely soil properties and the spatial characteristics of the deep excavation. The 10-fold cross-validation method is employed to overcome the scarcity of data and promote the model's robustness. Six genetic algorithm (GA)-ML models are established as well for comparison. The results indicate that the proposed AutoML model is a comprehensive model that integrates efficiency and robustness. Importance analysis reveals that the ratio of the average shear strength to the vertical effective stress E_(ur)/σ′_(v), the excavation depth H, and the excavation width B are the most influential variables for the displacements. Finally, the AutoML model is further validated on a practical engineering project. The prediction results are in good agreement with monitoring data, signifying that our model can be applied in real projects.
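A hedged sketch of the workflow: 10-fold cross-validation on a small tabular dataset followed by a feature-importance ranking. A random-forest regressor stands in for the AutoML pipeline, and only Eur/σ'v, H, and B are taken from the abstract; the remaining input names and all values are placeholders.

```python
# 10-fold CV for small-data robustness, then importance ranking of the input parameters.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
cols = ["Eur_over_sigma_v", "H", "B", "x4", "x5", "x6", "x7"]   # placeholder parameter names
X = pd.DataFrame(rng.random((120, 7)), columns=cols)
y = rng.random(120)                                             # tunnel displacement (placeholder)

model = RandomForestRegressor(n_estimators=300, random_state=0)
print("10-fold R^2:", cross_val_score(model, X, y, cv=10).mean())

model.fit(X, y)
for name, imp in sorted(zip(cols, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")   # importance ranking of the input parameters
```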
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number RGP 2/209/42.
Abstract: Agriculture 4.0, as the future of farming technology, comprises numerous key enabling technologies for sustainable agriculture. The use of state-of-the-art technologies, such as the Internet of Things (IoT), transforms traditional cultivation practices, like irrigation, into modern solutions of precision agriculture. To achieve effective water resource usage and automated irrigation in precision agriculture, recent technologies like machine learning (ML) can be employed. With this motivation, this paper designs an IoT- and ML-enabled smart irrigation system (IoTML-SIS) for precision agriculture. The proposed IoTML-SIS technique senses the parameters of the farmland and makes appropriate irrigation decisions. The proposed IoTML-SIS model involves different IoT-based sensors for soil moisture, humidity, temperature, and light. Besides, the sensed data are transmitted to the cloud server for processing and decision making. Moreover, the artificial algae algorithm (AAA) with the least squares support vector machine (LS-SVM) model is employed for the classification process to determine the need for irrigation. Furthermore, the AAA is applied to optimally tune the parameters involved in the LS-SVM model, thereby significantly increasing classification efficiency. The performance validation of the proposed IoTML-SIS technique ensured better performance over the compared methods, with a maximum accuracy of 0.975.
Abstract: The discovery of useful forecasting rules from observational weather data is a topic of outstanding interest. The traditional methods of acquiring forecasting knowledge are manual analysis and investigation performed by human scientists. This paper presents the experimental results of an automatic machine learning system which derives forecasting rules from real observational data. We tested the system on two large real data sets from the areas of central China and Victoria, Australia. The experimental results show that the forecasting rules discovered by the system are very competitive with those of human experts. The forecasting accuracy rates are 86.4% and 78% on the two data sets, respectively.
Funding: Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Abstract: In recent years, the multimedia annotation problem has been attracting significant research attention in the multimedia and computer vision areas, especially for automatic image annotation, whose purpose is to provide an efficient and effective searching environment for users to query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Furthermore, different image features with different magnitudes result in different performance for automatic image annotation. To this end, a Gaussian normalization method is utilized to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve the performance of traditional PLSA for the task of automatic image annotation.
Abstract: Airplanes are a social necessity for the movement of people, goods, and more. They are generally safe modes of transportation; however, incidents and accidents occasionally occur. To prevent aviation accidents, it is necessary to develop a machine-learning model that detects and predicts abnormal commercial flight behavior from automatic dependent surveillance-broadcast data. This study combined data-quality detection, anomaly detection, and abnormality-classification-model development. The research methodology involved the following stages: problem statement, data selection and labeling, prediction-model development, deployment, and testing. The data labeling process was based on the rules framed by the International Civil Aviation Organization for commercial, jet-engine flights and was validated by expert commercial pilots. The results showed that the best prediction model, quadratic discriminant analysis, was 93% accurate, indicating a "good fit". Moreover, the model's area-under-the-curve results for abnormal and normal detection were 0.97 and 0.96, respectively, thus confirming its "good fit".
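A minimal sketch of the abnormality classifier named above: quadratic discriminant analysis on a two-class problem with accuracy and area under the curve reported. The features and normal/abnormal labels are synthetic stand-ins for ADS-B-derived data.

```python
# Quadratic discriminant analysis with accuracy and ROC AUC, as in the reported evaluation.
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Imbalanced placeholder data: most flights normal (0), few abnormal (1).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, n_redundant=0,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

qda = QuadraticDiscriminantAnalysis()
qda.fit(X_tr, y_tr)
proba = qda.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, qda.predict(X_te)))
print("AUC (abnormal class):", roc_auc_score(y_te, proba))
```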
Funding: Supported cooperatively by the Second Tibetan Plateau Scientific Expedition and Research Program (2019QZKK0805), the National Natural Science Foundation of China (U20A2088), the Innovation Team Foundation of the Qinghai Office of Science and Technology (2022-ZJ-903), and the CITIC Top 10 Technological Innovation Project "Comprehensive development and utilization of salt lake resources" (2023ZXKYA05100).
Abstract: The significance of land use classification has garnered attention due to its implications for climate and ecosystems. This paper establishes a connection by introducing and applying automatic machine learning (AutoML) techniques to a salt lake landscape, with a specific focus on the Qarhan Salt Lake area. Utilizing Landsat-5 Thematic Mapper (TM) and Landsat-8 Operational Land Imager (OLI) imagery, six machine learning algorithms were employed to classify eight land use types from 2000 to 2020. Results show that XGBLD performed optimally with 77% accuracy. Over two decades, salt fields, construction land, and water areas increased due to transformations of saline land and salt flats. The exposed lake area exhibited a rise followed by a decline, mainly transforming into salt flats. Agricultural land area increased slightly, influenced by both human activities and climate. Our analysis reveals a strong correlation between salt fields and precipitation, while exposed lakes demonstrate a significant negative correlation with evaporation and temperature, highlighting their vulnerability to climate change. Additionally, human water usage was identified as a significant factor impacting land use change, emphasizing the dual influence of anthropogenic activities and natural factors. This paper addresses the void in the application of AutoML in salt lake environments and provides valuable insights into the dynamic evolution of land use types in the Qarhan Salt Lake region.
Funding: Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (2014 ZX03001027).
Abstract: There are various heterogeneous networks for terminals to deliver a better quality of service. Signal system recognition and classification contribute a lot to the process. However, in low signal-to-noise ratio (SNR) circumstances or under time-varying multipath channels, the majority of the existing algorithms for signal recognition are already facing limitations. In this series, we present a robust signal recognition method based upon the original and latest updated version of the extreme learning machine (ELM) to help users switch between networks. The ELM utilizes signal characteristics to distinguish systems. The superiority of this algorithm lies in the random choices of hidden nodes and in the fact that it determines the output weights analytically, which results in lower complexity. Theoretically, the algorithm tends to offer good generalization performance at an extremely fast speed of learning. Moreover, we implement the GSM/WCDMA/LTE models in the Matlab environment by using the Simulink tools. The simulations reveal that the signals can be recognized successfully to achieve 95% accuracy in a low SNR (0 dB) environment in the time-varying multipath Rayleigh fading channel.
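The sketch below shows the core of an extreme learning machine: hidden-layer weights are drawn at random and only the output weights are solved analytically by least squares, which is what keeps training cost low. The signal features and GSM/WCDMA/LTE labels are placeholders, not the Simulink-generated data.

```python
import numpy as np

class ELMClassifier:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))   # random input weights
        self.b = self.rng.standard_normal(self.n_hidden)                 # random biases
        H = np.tanh(X @ self.W + self.b)                                 # hidden-layer output
        T = np.eye(n_classes)[y]                                         # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                                # analytic output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

rng = np.random.default_rng(1)
X = rng.standard_normal((600, 8))   # placeholder per-signal features
y = rng.integers(0, 3, 600)         # 0=GSM, 1=WCDMA, 2=LTE (placeholder labels)
clf = ELMClassifier().fit(X[:500], y[:500])
print("held-out accuracy:", (clf.predict(X[500:]) == y[500:]).mean())
```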
Abstract: In the field of radiocommunication, modulation type identification is one of the most important tasks in signal processing. This study aims to implement a modulation recognition system based on two machine learning approaches, K-Nearest Neighbors (KNN) and Artificial Neural Networks (ANN). From a statistical and spectral analysis of signals, nine key differentiating features are extracted and used as input vectors for each trained model. The feature extraction is performed using the Hilbert transform and the forward and inverse Fourier transforms. The experiments with the AMC Master dataset classify ten (10) types of analog and digital modulations: AM_DSB_FC, AM_DSB_SC, AM_USB, AM_LSB, FM, MPSK, 2PSK, MASK, 2ASK, and MQAM. For the simulation of the chosen model, signals are polluted by Additive White Gaussian Noise (AWGN). The simulation results show that the best identification rate is achieved by the MLP neural method, with 90.5% accuracy above a 10 dB signal-to-noise ratio, a margin of more than 15% over the k-nearest neighbors algorithm.
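A sketch of the feature-extraction-plus-KNN route: the analytic signal from the Hilbert transform yields amplitude, phase, and instantaneous-frequency statistics that feed a k-nearest-neighbors classifier. The AM/FM signals below are simple stand-ins for the AMC Master dataset, and the feature set is illustrative rather than the paper's nine features.

```python
import numpy as np
from scipy.signal import hilbert
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def features(x, fs=10_000.0):
    z = hilbert(x)                                  # analytic signal via Hilbert transform
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_f = np.diff(phase) * fs / (2 * np.pi)      # instantaneous frequency
    return [amp.std() / amp.mean(), phase.std(), inst_f.std(), np.abs(np.fft.rfft(x)).max()]

rng = np.random.default_rng(0)
t = np.arange(0, 0.05, 1e-4)
X, y = [], []
for _ in range(200):
    m = np.sin(2 * np.pi * rng.uniform(50, 150) * t)      # random message tone
    am = (1 + 0.5 * m) * np.sin(2 * np.pi * 1000 * t)     # AM signal
    fm = np.sin(2 * np.pi * 1000 * t + 5 * m)             # FM signal
    for sig, label in ((am, 0), (fm, 1)):
        sig = sig + 0.05 * rng.standard_normal(t.size)    # AWGN-like noise
        X.append(features(sig)); y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("KNN accuracy:", knn.score(X_te, y_te))
```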
Abstract: Objective: To develop and evaluate an automated system for digitizing audiograms and classifying hearing loss levels, and to compare its performance with traditional methods and otolaryngologists' interpretations. Design and Methods: We conducted a retrospective diagnostic study using 1,959 audiogram images from patients aged 7 years and older at the Faculty of Medicine, Vajira Hospital, Navamindradhiraj University. We employed an object detection approach to digitize audiograms and developed multiple machine learning models to classify six hearing loss levels. The dataset was split into 70% training (1,407 images) and 30% testing (352 images) sets. We compared our model's performance with classifications based on manually extracted audiogram values and otolaryngologists' interpretations. Results: Our object-detection-based model achieved an F1-score of 94.72% in classifying hearing loss levels, comparable to the 96.43% F1-score obtained using manually extracted values. The Light Gradient Boosting Machine (LGBM) model was used as the classifier for the manually extracted data and achieved top performance with 94.72% accuracy, 94.72% F1-score, 94.72% recall, and 94.72% precision. In the object-detection-based model, the Random Forest Classifier (RFC) showed the highest accuracy of 96.43% in predicting hearing loss level, with an F1-score of 96.43%, recall of 96.43%, and precision of 96.45%. Conclusion: Our proposed automated approach to audiogram digitization and hearing loss classification performs comparably to traditional methods and otolaryngologists' interpretations. This system can potentially assist otolaryngologists in providing more timely and effective treatment by quickly and accurately classifying hearing loss.
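A minimal sketch of the classification stage, assuming threshold values have already been read off a digitized audiogram: a LightGBM classifier predicts one of six hearing-loss levels. The frequencies, thresholds, and level cut-offs are placeholders, not the study's extracted data.

```python
# LightGBM classifier on placeholder audiogram thresholds, evaluated with weighted F1.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
thresholds = rng.uniform(0, 110, size=(1959, 6))   # dB HL at six frequencies (placeholder)
pta = thresholds[:, 1:5].mean(axis=1)              # rough pure-tone average
levels = np.digitize(pta, [26, 41, 56, 71, 91])    # six placeholder hearing-loss levels

X_tr, X_te, y_tr, y_te = train_test_split(thresholds, levels, test_size=0.3, random_state=0)
clf = LGBMClassifier(n_estimators=200).fit(X_tr, y_tr)
print("weighted F1:", f1_score(y_te, clf.predict(X_te), average="weighted"))
```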
Funding: This work was developed with the support of the H2020 RISIS 2 Project (No. 824091) and of the "Sapienza" Research Awards No. RM1161550376E40E of 2016 and RM11916B8853C925 of 2019. This article is a largely extended version of Bianchi et al. (2019), presented at the ISSI 2019 Conference held in Rome, 2–5 September 2019.
Abstract: Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The information automatically extracted can potentially be updated with a frequency higher than once per year, and be safe from manipulations or misinterpretations. Moreover, this approach allows flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions to allow new insights for "profiling" the analyzed universities. Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information to compute our indicators has been extracted from the universities' websites by using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semi-structured form to allow information to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the Web has been combined with the university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clusterization of 79 Italian universities based on structural and digital indicators. Findings: The main findings of this study concern the evaluation of the potential for digitalization of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features using clustering techniques working with the above indicators. Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad. Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems. Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition and nontrivial text mining operations (Bruni & Bianchi, 2020).
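A toy illustration of the web-content-mining step: fetch a page, then compute a few simple indicators (text length, internal links, keyword hits). The URL and keyword list are hypothetical examples, not the indicators actually used in the study.

```python
# Fetch a page and derive simple website indicators from its content and link structure.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def page_indicators(url, keywords=("research", "students", "admission")):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ", strip=True).lower()
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    internal = [l for l in links if urlparse(l).netloc == urlparse(url).netloc]
    return {
        "text_length": len(text),
        "internal_links": len(internal),
        "keyword_hits": {k: text.count(k) for k in keywords},
    }

print(page_indicators("https://www.example-university.it"))  # hypothetical URL
```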
Abstract: AIM: To investigate and compare the efficacy of two machine-learning technologies, deep learning (DL) and the support vector machine (SVM), for the detection of branch retinal vein occlusion (BRVO) using ultrawide-field fundus images. METHODS: This study included 237 images from 236 patients with BRVO with a mean±standard deviation age of 66.3±10.6 years, and 229 images from 176 non-BRVO healthy subjects with a mean age of 64.9±9.4 years. Training was conducted with a deep convolutional neural network on ultrawide-field fundus images to construct the DL model. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC) were calculated to compare the diagnostic abilities of the DL and SVM models. RESULTS: For the DL model, the sensitivity, specificity, PPV, NPV, and AUC for diagnosing BRVO were 94.0% (95%CI: 93.8%-98.8%), 97.0% (95%CI: 89.7%-96.4%), 96.5% (95%CI: 94.3%-98.7%), 93.2% (95%CI: 90.5%-96.0%), and 0.976 (95%CI: 0.960-0.993), respectively. In contrast, for the SVM model, these values were 80.5% (95%CI: 77.8%-87.9%), 84.3% (95%CI: 75.8%-86.1%), 83.5% (95%CI: 78.4%-88.6%), 75.2% (95%CI: 72.1%-78.3%), and 0.857 (95%CI: 0.811-0.903), respectively. The DL model outperformed the SVM model on all the aforementioned parameters (P<0.001). CONCLUSION: These results indicate that the combination of the DL model and ultrawide-field fundus ophthalmoscopy may distinguish between healthy and BRVO eyes with a high level of accuracy. The proposed combination may be used for automatically diagnosing BRVO in patients residing in remote areas lacking access to an ophthalmic medical center.
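The reported metrics can be reproduced from per-image predictions as in the sketch below: sensitivity, specificity, PPV, and NPV from the confusion matrix, plus AUC from continuous scores. The labels and scores here are random placeholders rather than model output.

```python
# Derive sensitivity, specificity, PPV, NPV, and AUC from binary predictions and scores.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 466)                                 # 1 = BRVO, 0 = healthy (placeholder)
scores = np.clip(0.3 * y_true + rng.random(466) * 0.7, 0, 1)     # placeholder model scores
y_pred = (scores >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
print("NPV:", tn / (tn + fn))
print("AUC:", roc_auc_score(y_true, scores))
```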
Abstract: Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that the Support Vector Machine with a linear kernel outperforms the other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than the Support Vector Machine, but reached close results, with the benefit of a smaller representation size. The impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often less than 100 per class) are available on which to train; and these hold only for the top three hierarchical levels (803 instead of 14,413 classes). Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC because of the lack of training data for all classes skews the results, so that they work in experimental conditions but barely for end users in operational retrieval systems. Practical implications: In conclusion, for operative information retrieval systems, applying purely automatic DDC does not work, either using machine learning (because of the lack of training data for the large number of DDC classes) or using the string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms in order to enhance the accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. In order for quality information services to reach the objective of the highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way forward. Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to the lack of sufficient training data across the entire set of classes, an approach complementing machine learning, that of string matching, was applied. This combination should be explored further since it provides the potential for real-life applications with large target classification systems.
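A hedged sketch of the best-performing configuration described above: titles combined with keywords as input text, TF-IDF features, and a linear-kernel SVM predicting DDC classes truncated to the top three hierarchical levels. The example records are invented, not the Swedish collection.

```python
# Titles + keywords -> TF-IDF -> linear SVM over truncated (top-three-level) DDC classes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

records = [
    {"title": "Introduction to marine biology", "keywords": "oceans fish ecosystems", "ddc": "578"},
    {"title": "A history of medieval Europe", "keywords": "middle ages kingdoms", "ddc": "940"},
    {"title": "Coral reef ecology", "keywords": "reefs marine life", "ddc": "578"},
    {"title": "European political history", "keywords": "europe politics history", "ddc": "940"},
]
texts = [r["title"] + " " + r["keywords"] for r in records]   # titles combined with keywords
labels = [r["ddc"][:3] for r in records]                      # truncate to the top three levels

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["Fish populations of tropical reefs"]))
```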