Abstract: N-11-azaartemisinins potentially active against Plasmodium falciparum are designed by combining molecular electrostatic potential (MEP), ligand-receptor interaction, and models built with supervised machine learning methods (PCA, HCA, KNN, SIMCA, and SDA). The optimization of molecular structures was performed using the B3LYP/6-31G* approach. MEP maps and ligand-receptor interactions were used to investigate key structural features required for biological activity and likely interactions between N-11-azaartemisinins and heme, respectively. The supervised machine learning methods allowed the separation of the investigated compounds into two classes, cha and cla, with the properties εLUMO+1 (energy of the level one above the lowest unoccupied molecular orbital), d(C6-C5) (distance between the C6 and C5 atoms in the ligands), and TSA (total surface area) responsible for the classification. The insights extracted from this investigation, together with chemical intuition, enabled the design of sixteen new N-11-azaartemisinins (prediction set), and the models built with the supervised machine learning methods were then applied to this prediction set. The result identified twelve promising new N-11-azaartemisinins for synthesis and biological evaluation.
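As an illustration of the descriptor-based classification described above, the sketch below applies PCA followed by a KNN classifier to the three named descriptors. It is only a minimal sketch: the descriptor values, class labels, number of neighbours, and the PCA/KNN pairing are assumptions for illustration, not the authors' workflow.

```python
# Minimal sketch (not the authors' workflow): classify compounds from the three named
# descriptors -- eps(LUMO+1), d(C6-C5), TSA -- with PCA followed by KNN.
# All descriptor values and class labels below are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical descriptor matrix: columns = [eps(LUMO+1) in eV, d(C6-C5) in angstrom, TSA in A^2]
X = rng.normal(loc=[-0.5, 1.52, 320.0], scale=[0.1, 0.02, 15.0], size=(40, 3))
y = rng.integers(0, 2, size=40)  # 0/1 stand-ins for the two classes (cha / cla)

model = make_pipeline(StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict(X[:5]))  # predicted classes for the first five compounds
```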
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2005AA121620, 2006AA01Z232), the Zhejiang Provincial Natural Science Foundation of China (No. Y1080935), and the Research Innovation Program for Graduate Students in Jiangsu Province (No. CX07B_110zF).
Abstract: Internet traffic classification is vital to the areas of network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult as newly emerged applications (e.g., Peer-to-Peer) use dynamic port numbers, masquerading techniques, and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions to a variety of network activities and provides a platform for performance evaluation of the classifiers. The impact of dataset size, feature selection, number of application types, and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) Genetic-algorithm-based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy; in particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) A larger dataset and fewer application types result in better classification accuracy. Finally, early detection using only the first few packets of a flow is proposed for real-time network activity and is shown to be feasible according to the preliminary results.
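A rough sketch of the flow-classification step is given below, assuming scikit-learn: a univariate mutual-information filter stands in for the genetic-algorithm feature selection, and an entropy-based decision tree stands in for C4.5/REPTree (which are Weka implementations). The flow features and application labels are synthetic placeholders.

```python
# Rough analogue of the flow-feature classification step: a univariate filter stands in
# for the genetic-algorithm feature selection, and an entropy-based decision tree stands
# in for C4.5/REPTree. Flow features and labels below are synthetic placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((500, 40))         # 40 per-flow features (packet sizes, inter-arrival times, ...)
y = rng.integers(0, 5, size=500)  # 5 hypothetical application classes

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),       # keep the 10 most informative features
    DecisionTreeClassifier(criterion="entropy"),  # information-gain splits, as in C4.5
)
print(cross_val_score(clf, X, y, cv=5).mean())    # mean classification accuracy
```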
Abstract: As the fundamental infrastructure of the Internet, the optical network carries a great amount of Internet traffic, so faults can cause great financial losses. Therefore, fault location is very important for the operation and maintenance of optical networks. Due to the complex relationships among network elements at the topology level, boards at the network element level, and components at the board level, concrete fault location is hard for traditional methods. In recent years, machine learning, especially deep learning, has been applied to many complex problems, because machine learning can find potential non-linear mappings from inputs to outputs. In this paper, we introduce supervised machine learning and propose a complete process for fault location. First, we use data preprocessing, data annotation, and data augmentation to turn the originally collected data into a high-quality dataset. Then, two machine learning algorithms (convolutional neural networks and deep neural networks) are applied to the dataset. The evaluation on commercial optical networks shows that this process helps improve the quality of the dataset and that both algorithms perform well on fault location.
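The sketch below illustrates only the augment-then-train idea from the process above, under assumptions: Gaussian-noise copies stand in for the paper's data augmentation, and a fully connected scikit-learn network stands in for the deep neural network (the CNN variant and the real alarm/performance data are not reproduced).

```python
# Sketch of the augment-then-train idea only: noisy copies enlarge a small labelled set,
# then a fully connected network (stand-in for the paper's DNN) learns fault locations.
# Alarm features and fault labels are synthetic placeholders, not real network data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.random((200, 30))         # 30 features per fault sample (alarms, performance counters)
y = rng.integers(0, 4, size=200)  # 4 hypothetical fault locations

# Data augmentation: add Gaussian-noise copies of every labelled sample.
X_aug = np.vstack([X, X + rng.normal(scale=0.01, size=X.shape)])
y_aug = np.concatenate([y, y])

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y_aug, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))      # fault-location accuracy on the held-out split
```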
Abstract: The DNA sequences of an organism strongly influence its transcription and translation processes, thus affecting its protein production and growth rate. Due to the complexity of DNA, it has been extremely difficult to predict the macroscopic characteristics of organisms. However, with the rapid development of machine learning in recent years, it has become possible to use powerful machine learning algorithms to process and analyze biological data. Based on the synthetic DNA sequences of a specific microbe, E. coli, I designed a process to predict its protein production and growth rate. By observing the properties of a data set constructed in previous work, I chose to use supervised learning regressors with encoded DNA sequences as input features to perform the predictions. After comparing different encoders and algorithms, I selected three encoders to encode the DNA sequences as inputs and trained seven different regressors to predict the outputs. The hyper-parameters were optimized for the three regressors with the best potential prediction performance. Finally, I successfully predicted the protein production and growth rates, with best R² scores of 0.55 and 0.77, respectively, by using encoders to capture the latent features of the DNA sequences.
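A minimal sketch of the encode-then-regress workflow is shown below, assuming one-hot encoding (one of several possible encoders) and a random-forest regressor; the sequences, targets, and model choice are illustrative placeholders, not the study's data or its tuned regressors.

```python
# Sketch of the encode-then-regress idea: one-hot encode DNA sequences and fit a
# regressor, scoring with R^2. Sequences and growth-rate targets are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
BASES = "ACGT"
seqs = ["".join(rng.choice(list(BASES), size=50)) for _ in range(300)]  # toy 50-bp sequences
y = rng.random(300)                                                     # toy growth rates

def one_hot(seq):
    """Flatten a sequence into a len(seq) x 4 one-hot vector."""
    return np.array([[b == base for base in BASES] for b in seq], dtype=float).ravel()

X = np.array([one_hot(s) for s in seqs])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(r2_score(y_te, reg.predict(X_te)))  # held-out R^2
```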
Abstract: The technique of enhanced gas recovery by CO2 injection (CO2-EGR) into shale reservoirs has attracted increasing attention in the recent decade. CO2-EGR is a complex geophysical process that is controlled by several parameters of shale properties and engineering design. Nevertheless, more challenges arise when simulating and predicting CO2/CH4 displacement within the complex pore systems of shales. Therefore, the petroleum industry needs a cost-effective tool/approach to evaluate the potential of applying CO2 injection to shale reservoirs. In recent years, machine learning applications have gained enormous interest due to their high-speed performance in handling complex data and efficiently solving practical problems. Thus, this work proposes a solution by developing a supervised machine learning (ML) based model to preliminarily evaluate CO2-EGR efficiency. Data used for this work were drawn from a wide range of simulation sensitivity studies and experimental investigations. Linear regression and artificial neural network (ANN) implementations were considered for predicting the incremental enhanced CH4. Based on the model performance on the training and validation sets, our accuracy comparison showed that the ANN algorithms gave 15% higher accuracy in predicting the enhanced CH4 than the linear regression model. To make the model more generalizable, the hidden layer size of the ANNs was adjusted to improve the generalization ability of the ANN model. Among the ANN models presented, the ANN with a hidden layer size of 100 gave the best predictive performance, with a coefficient of determination (R²) of 0.78, compared to the linear regression model with an R² of 0.68. Our developed ML-based model provides a powerful, reliable, and cost-effective tool that can accurately predict the incremental enhanced CH4 from CO2 injection in shale gas reservoirs.
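The model comparison can be sketched as below, assuming scikit-learn and interpreting "hidden layer size of 100" as a single hidden layer with 100 neurons; the reservoir and engineering features are synthetic stand-ins for the simulation and experimental data.

```python
# Sketch of the model comparison only: linear regression vs. a neural network with one
# hidden layer of 100 neurons, scored by R^2. Reservoir/engineering features are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.random((400, 8))                                 # 8 hypothetical shale/design parameters
y = X @ rng.random(8) + 0.1 * rng.standard_normal(400)   # toy incremental-CH4 target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = [
    ("linear", LinearRegression()),
    ("ann-100", make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000))),
]
for name, model in models:
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))       # R^2 on the validation split
```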
Abstract: With the rapid growth of internet usage, a new situation has been created that enables bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to identify research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement for the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness on similar problems. Finally, the scheme was evaluated with well-known performance measures such as accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state of the art.
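A hedged sketch of the classification step follows, assuming TF-IDF features and the xgboost package; the two toy texts, their labels, and the preprocessing are placeholders rather than the study's Arabic Twitter/X dataset or its tuned models.

```python
# Sketch of the text-classification step: TF-IDF features feeding an XGBoost classifier,
# evaluated with accuracy, precision, recall, and F1. The toy corpus and labels below
# are placeholders, not the study's Twitter/X dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

texts = ["example friendly tweet", "example offensive tweet"] * 50  # placeholder corpus
labels = [0, 1] * 50                                                # 1 = bullying

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0, stratify=labels)

clf = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_tr, y_tr)
pred = clf.predict(X_te)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
print(accuracy_score(y_te, pred), prec, rec, f1)
```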
Funding: Supported by the National Research Foundation of Korea, No. NRF-2019S1A5A8034211 and No. NRF-2018R1D1A1B07041091.
Abstract: BACKGROUND: It is important to diagnose depression in Parkinson's disease (DPD) as soon as possible and to identify the predictors of depression in order to improve quality of life in Parkinson's disease (PD) patients. AIM: To develop a model for predicting DPD based on the support vector machine, considering sociodemographic factors, health habits, Parkinson's symptoms, sleep behavior disorders, and neuropsychiatric indicators as predictors, and to provide baseline data for identifying DPD. METHODS: This study analyzed 223 of 335 patients who were 60 years or older with PD. Depression was measured using the 30-item Geriatric Depression Scale, and the explanatory variables included PD-related motor signs, rapid eye movement sleep behavior disorders, and neuropsychological tests. The support vector machine was used to develop a DPD prediction model. RESULTS: When the effects of PD motor symptoms were compared using "functional weight", late motor complications (occurrence of levodopa-induced dyskinesia) were the most influential risk factors for Parkinson's symptoms. CONCLUSION: It is necessary to develop customized screening tests that can detect DPD at an early stage and to continuously monitor high-risk groups based on the DPD-related factors derived from this predictive model, in order to maintain the emotional health of PD patients.
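A minimal sketch of the SVM prediction step is given below; the twelve predictor columns and the depression labels are synthetic placeholders, and the RBF kernel and cross-validation setup are assumptions rather than the study's tuned model.

```python
# Sketch of the SVM prediction step only: a support vector classifier on tabular
# predictors (demographics, motor signs, sleep/neuropsychological scores), all synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.random((223, 12))          # 223 patients x 12 hypothetical predictors
y = rng.integers(0, 2, size=223)   # 1 = depression in Parkinson's disease (DPD)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(svm, X, y, cv=5).mean())  # cross-validated accuracy
```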
Funding: Supported by the Monash-IITB Academy Scholarship and the Australian Research Council (DP190103592).
Abstract: Machine learning (ML) models provide great opportunities to accelerate novel material development, offering a virtual alternative to laborious and resource-intensive empirical methods. In this work, the second of a two-part study, an ML approach is presented that offers accelerated digital design of Mg alloys. A systematic evaluation of four ML regression algorithms was explored to rationalise the complex relationships in Mg-alloy data and to capture the composition-processing-property patterns. Cross-validation and hold-out set validation techniques were utilised for unbiased estimation of model performance. Using atomic and thermodynamic properties of the alloys, feature augmentation was examined to define the most descriptive representation spaces for the alloy data. Additionally, a graphical user interface (GUI) webtool was developed to facilitate the use of the proposed models in predicting the mechanical properties of new Mg alloys. The results demonstrate that the random forest regression model and the neural network are robust models for predicting the ultimate tensile strength and ductility of Mg alloys, with accuracies of ~80% and 70%, respectively. The models developed in this work are a step towards high-throughput screening of novel candidates for target mechanical properties and provide ML-guided alloy design.
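One of the regression models can be sketched as below with scikit-learn: a random forest predicting ultimate tensile strength with cross-validated R². The composition/processing features and strength values are synthetic placeholders, not the Mg-alloy dataset or its feature-augmented representation.

```python
# Sketch of one regression model: a random forest predicting ultimate tensile strength
# from composition/processing features, with 5-fold cross-validated R^2.
# Features and targets are synthetic placeholders, not the Mg-alloy dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.random((300, 10))                                  # e.g. element fractions + processing parameters
y = 200 + 150 * X[:, 0] + 20 * rng.standard_normal(300)   # toy UTS values in MPa

rf = RandomForestRegressor(n_estimators=300, random_state=0)
print(cross_val_score(rf, X, y, cv=5, scoring="r2").mean())
```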
Abstract: Recently, machine learning (ML) has been considered a powerful technological element in different areas of society. To turn the computer into a decision maker, several sophisticated methods and algorithms are constantly created and analyzed. In geophysics, both supervised and unsupervised ML methods have dramatically contributed to the development of seismic and well-log data interpretation. In well-logging, ML algorithms are well suited for lithologic reconstruction problems, since there are no analytical expressions for computing the well-log data produced by a particular rock unit. Additionally, supervised ML methods depend strongly on an accurately labeled training data-set, which is not a simple task to achieve due to data absence or corruption. Once adequate supervision is performed, the classification outputs tend to be more accurate than those of unsupervised methods. This work presents a supervised version of a Self-Organizing Map, named SSOM, to solve a lithologic reconstruction problem from well-log data. Firstly, we address a more controlled problem and simulate well-log data directly from an interpreted geologic cross-section. We then define two specific training data-sets composed of density (RHOB), sonic (DT), spontaneous potential (SP), and gamma-ray (GR) logs, all simulated through a Gaussian distribution function per lithology. Once the training data-set is created, we simulate a particular pseudo-well, referred to as the classification well, for defining controlled tests. The first test comprises a training data-set with no labeled log data for the simulated fault zone. In the second test, we intentionally improve the training data-set with the fault. To assess the results obtained for each test, we analyze confusion matrices, log plots, accuracy, and precision. Apart from very thin layer misclassifications, the SSOM provides reasonable lithologic reconstructions, especially when the improved training data-set is considered for supervision. The set of numerical experiments shows that our SSOM is extremely well suited for supervised lithologic reconstruction, especially for recovering lithotypes that are weakly sampled in the training log data. On the other hand, some misclassifications are also observed when the cortex could not group the slightly different lithologies.
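The supervised workflow can be illustrated with the simplified stand-in below: well logs simulated with one Gaussian per lithology and a nearest-centroid classifier evaluated with a confusion matrix. This is not an SOM; it only mirrors the data-simulation and evaluation steps described, and all log values are invented.

```python
# Simplified stand-in for the supervised workflow (not an SOM): well logs (RHOB, DT, SP, GR)
# are simulated with one Gaussian per lithology, a nearest-centroid classifier is trained,
# and the result is summarised with a confusion matrix. All numbers are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(7)
lithologies = {                             # hypothetical means for [RHOB, DT, SP, GR]
    "shale":     [2.5, 90.0, -20.0, 120.0],
    "sandstone": [2.3, 70.0, -60.0, 40.0],
    "limestone": [2.7, 55.0, -30.0, 20.0],
}
X, y = [], []
for label, mean in lithologies.items():
    X.append(rng.normal(loc=mean, scale=[0.05, 3.0, 5.0, 8.0], size=(200, 4)))
    y += [label] * 200
X = np.vstack(X)

clf = NearestCentroid().fit(X, y)
pred = clf.predict(X)
print(confusion_matrix(y, pred, labels=list(lithologies)))
```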
Funding: Supported by the Major International (Regional) Joint Research Project of the National Natural Science Foundation of China (52020105009).
Abstract: Offshore anchor handling tug supply (AHTS) vessels, a type of offshore support vessel, are critical for operations related to handling the anchors of offshore structures, oil rigs, and wind turbines, towing them to remote deep-sea locations, and securing them in place. Amid growing concerns regarding the environmental footprints of carbon-based fuels and impending carbon taxation, the International Maritime Organization, policymakers, classification societies, shipping firms, and stakeholders are seeking cleaner alternatives. LNG (liquefied natural gas) and green ammonia as energy vectors are considered among the top contenders for future clean alternatives for offshore vessels. This study evaluated the environmental performance of newly built AHTS vessels powered by LNG and green ammonia as marine fuels designed for offshore operations. This environmental impact assessment uses the IPCC and Environmental Footprint methodologies. The broad impact groups considered include global warming, human toxicity, eutrophication, ecotoxicity, and atmosphere-related impacts, and the process impacts were analyzed. This study uses supervised machine learning algorithms, namely random forest, decision tree, and XGBoost models, for environmental performance evaluation and prediction. The study reveals that the recently manufactured AHTS vessel, utilizing conventional fuels such as heavy fuel oil, marine diesel oil, and LNG, exhibits significantly increased GTP 100 and GWP 100 emission levels per tonne-kilometer compared to green ammonia, with rises of 44% and 10.6%, respectively, for heavy fuel oil. Furthermore, the XGBoost regression model outperformed the random forest and decision tree models in predictive accuracy for GWP 100. The analysis suggests that effectively managing unsustainable processes would minimize environmental footprints and reduce carbon, nitrogen oxide, LNG, and ammonia-based emissions.
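The regressor comparison can be sketched as below, assuming scikit-learn and the xgboost package; the life-cycle features and GWP 100 targets are synthetic placeholders, not the assessment data.

```python
# Sketch of the model comparison only: three tree-based regressors scored by R^2 on
# synthetic life-cycle features (fuel mix, consumption, distance), not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(8)
X = rng.random((500, 6))                                  # hypothetical voyage / fuel-mix descriptors
y = X @ rng.random(6) + 0.05 * rng.standard_normal(500)   # toy GWP 100 per tonne-km

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for name, reg in [("random forest", RandomForestRegressor(random_state=0)),
                  ("decision tree", DecisionTreeRegressor(random_state=0)),
                  ("xgboost", XGBRegressor(n_estimators=300))]:
    reg.fit(X_tr, y_tr)
    print(name, round(reg.score(X_te, y_te), 3))
```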
Abstract: Microfinance institutions in Kenya play a unique role in promoting financial inclusion and providing loans and savings, especially to low-income individuals and small-scale entrepreneurs. However, despite their benefits, most of their products and programs in Machakos County have been shrinking due to repayment challenges, threatening their financial ability to extend further credit. This could be attributed to ineffective credit scoring models that are not able to capture the nuanced non-linear repayment behavior and patterns of loan applicants. The research objective was to enhance credit risk scoring for microfinance institutions in Machakos County using supervised machine learning algorithms. The study adopted a mixed research design under a supervised machine learning approach. It randomly sampled 6771 loan application account records and repayment histories. RStudio and Python were used for data pre-processing and analysis. The logistic regression algorithm, XGBoost, and the random forest ensemble method were used. Evaluation metrics included accuracy, area under the curve (AUC), and F1-score. Based on the study findings, XGBoost was the best performer, with 83.3% accuracy and a Brier score of 0.202. Development of a legal framework to govern the ethical and open use of machine learning assessment was recommended. Similar research using different machine learning algorithms, locations, and institutions was recommended to ascertain the validity, reliability, and generalizability of the study findings.
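A sketch of the scoring-and-evaluation step follows, using logistic regression as the baseline scorer and reporting accuracy, AUC, F1, and the Brier score; the borrower features and default labels are synthetic, and the 0.5 decision threshold is an assumption.

```python
# Sketch of the scoring-and-evaluation step: a classifier produces default probabilities,
# evaluated with accuracy, AUC, F1, and the Brier score. Loan records are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.random((6771, 9))                                             # 9 hypothetical borrower/loan features
y = (X[:, 0] + 0.3 * rng.standard_normal(6771) > 0.7).astype(int)    # 1 = default

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
print("accuracy", accuracy_score(y_te, pred))
print("AUC     ", roc_auc_score(y_te, proba))
print("F1      ", f1_score(y_te, pred))
print("Brier   ", brier_score_loss(y_te, proba))
```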
Funding: Supported by the National Research Foundation, Singapore, under its National Satellite of Excellence Programme "Design Science and Technology for Secure Critical Infrastructure" (Award Number: NSoE DeST-SCI2019-0004).
Abstract: Cyber-physical systems (CPSs) in critical infrastructure face serious threats of attack, motivating research into a wide variety of defence mechanisms, such as those that monitor for violations of invariants, i.e. logical properties over sensor and actuator states that should always be true. Many approaches for identifying invariants attempt to do so automatically, typically using data logs, but these can miss valid system properties if relevant behaviours are not well represented in the data. Furthermore, as the CPS is already built, resolving any design flaws or weak points identified through this process is costly. In this paper, we propose a systematic method for deriving invariants from an analysis of a CPS design, based on principles of the axiomatic design methodology from design science. Our method iteratively decomposes a high-level CPS design to identify sets of dependent design parameters (i.e. sensors and actuators), allowing invariants and invariant checkers to be derived in parallel with the implementation of the system. We apply our method to the designs of two CPS testbeds, SWaT and WADI, deriving a suite of invariant checkers that are able to detect a variety of single- and multi-stage attacks without any false positives. Finally, we reflect on the strengths and weaknesses of our approach, how it can be complemented by other defence mechanisms, and how it could help engineers to identify and resolve weak points in a design before the controllers of a CPS are implemented.
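The notion of an invariant checker can be illustrated with the minimal sketch below; the tank/pump property is hypothetical and is not one of the SWaT/WADI invariants derived by the paper's method.

```python
# Minimal illustration of an invariant checker: a logical property over sensor and
# actuator states that must always hold. The tank/pump invariant below is hypothetical,
# not one of the SWaT/WADI invariants derived in the paper.
from dataclasses import dataclass

@dataclass
class State:
    tank_level_mm: float   # sensor reading
    inlet_pump_on: bool    # actuator state

def invariant_high_level_pump_off(s: State) -> bool:
    """If the tank is above its high mark, the inlet pump must be off."""
    return not (s.tank_level_mm > 800.0 and s.inlet_pump_on)

# Checking a stream of states reports the first violation.
trace = [State(650.0, True), State(820.0, True), State(700.0, False)]
for i, s in enumerate(trace):
    if not invariant_high_level_pump_off(s):
        print(f"invariant violated at step {i}: {s}")
```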
Funding: Supported by Universiti Malaysia Pahang (UMP) under Project Numbers RDU223001 and PGRS2003189.
Abstract: Nonlinear photovoltaic (PV) output is greatly affected by the nonuniform distribution of daily irradiance, preventing conventional protection devices from reliably detecting faults. Smart fault diagnosis and good maintenance systems are essential for optimizing the overall productivity of a PV system and improving its life cycle. Hence, a multiscale smart fault diagnosis model for improved PV system maintenance strategies is proposed. This study focuses on diagnosing permanent faults (open-circuit faults, ground faults, and line-line faults) and temporary faults (partial shading) in PV arrays, using the random forest algorithm to conduct time-series analysis with waveform length and autoregression (RF-WLAR) as the main features, with 10-fold cross-validation in Matlab/Simulink. Actual irradiance data at 5.86°N and 102.03°E were used as inputs to produce simulated data that closely matched the on-site PV output data. Fault data from the maintenance database of a 2 MW PV power plant in Pasir Mas, Kelantan, Malaysia, were used for field testing to verify the developed model. The RF-WLAR model achieved an average fault-type classification accuracy of 98%, with 100% accuracy in classifying partial shading and line-line faults.
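The RF-WLAR feature idea can be sketched as below: per-window waveform length plus least-squares autoregression coefficients feeding a random forest with 10-fold cross-validation. The waveforms, window length, AR order, and fault labels are all assumptions for illustration, not the plant or Simulink data.

```python
# Sketch of the RF-WLAR idea: per-window waveform length and autoregression coefficients
# as features for a random forest, scored with 10-fold cross-validation. The PV waveforms
# and fault labels are synthetic placeholders, not the plant or Simulink data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)

def wl_ar_features(window, p=4):
    """Waveform length plus AR(p) coefficients fitted by least squares."""
    wl = np.sum(np.abs(np.diff(window)))                                  # waveform length
    X = np.column_stack([window[i:len(window) - p + i] for i in range(p)])
    y = window[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)                        # AR coefficients
    return np.concatenate([[wl], coeffs])

windows = rng.standard_normal((400, 128))   # toy PV current windows
labels = rng.integers(0, 4, size=400)       # 4 hypothetical fault classes
features = np.array([wl_ar_features(w) for w in windows])

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(rf, features, labels, cv=10).mean())
```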