Layered pavements usually exhibit complicated mechanical behavior because of their complex material properties and the external environment. In some cases, such as missile or rocket launches, layered pavements must bear large impulse loads. However, traditional methods cannot detect the internal structure of pavements quickly and non-destructively. Accurate and fast prediction of the mechanical properties of layered pavements is therefore of great importance. In recent years, machine learning has shown great superiority in solving nonlinear problems. In this work, we present a method for predicting the maximum deflection and damage factor of layered pavements under instantaneous large impact, based on random forest regression with the deflection basin parameters obtained from falling weight deflectometer testing. The coefficient of determination R² on the testing datasets is above 0.94 when predicting the elastic moduli of the structural layers and the mechanical responses, which indicates that the predictions agree closely with finite element simulation results. This paper provides a novel method for fast and accurate prediction of pavement mechanical responses under instantaneous large impact load using partial structural parameters of pavements, and it has application potential in non-destructive evaluation of pavement structure.
Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach involving the application of general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions. Machine learning was applied to 80% of the samples for training, while simulation was conducted on the remaining 20% of samples based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE; the model's performance was evaluated using the mean squared error (MSE), yielding satisfactory results. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
Modeling implied volatility (IV) is important for option pricing, hedging, and risk management. Previous studies of deterministic implied volatility functions (DIVFs) propose two parameters, moneyness and time to maturity, to estimate implied volatility. Recent DIVF models have included factors such as a moving average ratio and relative bid-ask spread but fail to enhance modeling accuracy. The current study offers a generalized DIVF model by including a momentum indicator for the underlying asset, using a relative strength index (RSI) covering multiple time resolutions as a factor, because momentum is often used by investors and speculators in their trading decisions and, in contrast to volatility, RSI can distinguish between bull and bear markets. To the best of our knowledge, prior studies have not included RSI as a predictive factor in modeling IV. Instead of using a simple linear regression as in previous studies, we use a machine learning regression algorithm, namely random forest, to model a nonlinear IV. Previous studies apply DIVF modeling to options on traditional financial assets, such as stock and foreign exchange markets. Here, we study options on the largest cryptocurrency, Bitcoin, which poses greater modeling challenges due to its extreme volatility and the fact that it is not as well studied as traditional financial assets. Recent Bitcoin option chain data were collected from a leading cryptocurrency option exchange over a four-month period for model development and validation. Our dataset includes short-maturity options with expiry in less than six days, as well as a full range of moneyness, both of which are often excluded in existing studies because prices for options with these characteristics are often highly volatile and pose challenges to model building. Our in-sample and out-of-sample results indicate that including our proposed momentum indicator significantly enhances the model's accuracy in pricing options. The nonlinear machine learning random forest algorithm also performed better than a simple linear regression. Compared to prevailing option pricing models that employ stochastic variables, our DIVF model does not include stochastic factors but exhibits reasonably good performance. It is also easy to compute due to the availability of real-time RSIs. Our findings indicate that our enhanced DIVF model offers significant improvements and may be an excellent alternative to existing option pricing models that are primarily stochastic in nature.
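The momentum factor in the abstract above is an RSI computed at several time resolutions. As a rough illustration only, the sketch below computes a simple-average RSI (not Wilder's smoothed variant) over several look-back lengths. The function names, the default periods, and the use of look-back lengths as a stand-in for "time resolutions" are all assumptions; the abstract does not specify how the authors compute or resample the indicator.

```python
def rsi(prices, period=14):
    """Simple-average RSI (0..100) over the last `period` price changes.

    Assumption: plain averages of gains and losses, not Wilder smoothing.
    """
    if period < 1 or len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    window = prices[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # flat prices: treated as neutral (a convention chosen here)
    if losses == 0:
        return 100.0  # only gains in the window
    rs = gains / losses  # equals average gain / average loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi_features(prices, periods=(7, 14, 28)):
    """RSI at several look-back lengths; lengths with too little data are skipped."""
    return {p: rsi(prices, p) for p in periods if len(prices) >= p + 1}
```

Each value returned by `rsi_features` could then be appended to moneyness and time to maturity as an extra input column for a regressor.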
Accurate, reliable, and regularly updated information is necessary for targeted management of forest stands. This information is usually obtained from sample-based field inventory data. Because forest inventory is time-consuming and costly, it is imperative to generate and use the resulting data optimally. Integrating field inventory information with remote sensing data increases the value of field approaches, such as national forest inventories. This study investigated the optimal integration of forest inventory data with aerial image-based canopy height models (CHM) for forest growing stock estimation. For this purpose, fixed-area and angle-count plots from a forest area in Austria were used to assess which type of inventory system is more suitable when the field data are integrated with aerial image analysis. Although a higher correlation was observed between remotely predicted growing stocks and field inventory values for fixed-area plots, the paired t-test results revealed no statistical difference between the two methods. Using fixed-area plots, R² increased by 0.08 and the RMSE decreased by 7.7 percentage points (24.8 m³·ha⁻¹). Since tree height is the most critical variable for modeling forest growing stock using aerial images, we also compared the tree heights obtained from the CHM to those from the typical field inventory approach. The result shows a high correlation (R² = 0.781) between the tree heights extracted from the CHM and those measured in the field. However, the correlation decreased by 0.113 and the RMSE increased by 4.2 percentage points (1.04 m) when the allometrically derived tree heights were analyzed. Moreover, the paired t-test revealed no significant statistical difference between the tree heights extracted from the CHM and those measured in the field, but a significant statistical difference when the CHM-derived and the allometrically derived heights were compared. This indicates that image-based CHM can obtain more accurate tree height information than field inventory estimations. Overall, the results of this study demonstrate that image-based CHM can be integrated into forest inventory data at large scales and provide reliable information on forest growing stock. The produced maps reflect the variability of growth conditions and developmental stages of different forest stands. This information is required to characterize status and changes, e.g., in forest structure diversity and parameters for volume, and can be used for forest aboveground biomass estimation, which plays an important role in managing and controlling forest resources in mid-term forest management. This is of particular interest to forest managers and forest ecologists.
The spatial distribution of overburden layer thickness (OLT) is crucial for landslide susceptibility prediction and slope stability analysis. Because of the spatial heterogeneity of OLT in hillslope regions, combined with the difficulty and time cost of collecting OLT samples, accurately predicting OLT distribution remains challenging. To address this, a novel framework has been developed. First, OLT samples are collected through field surveys, remote sensing, and geological drilling. Next, the heterogeneity of the spatial distribution of OLT is analyzed using the probability distribution of OLT samples and their horizontal and vertical distributions. The OLT samples are categorized, and the small sample categories are expanded using the synthetic minority over-sampling technique (SMOTE). The slope position is selected as a key conditioning factor. Subsequently, 16 conditioning factors are applied to construct an OLT prediction model using the random forest regression algorithm. Weights are assigned to each OLT sample category to balance the uneven distribution of sample sizes. Finally, the Pearson correlation coefficient, mean absolute error (MAE), root mean square error (RMSE), and Lin's concordance correlation coefficient (Lin's CCC) are employed to validate the OLT prediction results. Huangtan town serves as the case study. Results show: (1) heterogeneity analysis, the SMOTE-based OLT sample expansion strategy, and slope position selection can significantly mitigate the effect of spatial heterogeneity on OLT prediction. (2) The Pearson correlation coefficient, RMSE, MAE and Lin's CCC values are 0.84, 1.173, 1.378 and 0.804, respectively, indicating excellent prediction performance. This research provides an effective solution for predicting OLT distribution in hillslope regions.
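The four validation metrics named in the abstract above have standard textbook definitions, sketched below in plain Python. The function names are illustrative, and the population (1/n) form of variance and covariance is assumed for Lin's CCC; the abstract does not say which estimator the authors used.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def mae(obs, pred):
    """Mean absolute error."""
    return _mean([abs(o - p) for o, p in zip(obs, pred)])


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def _moments(obs, pred):
    # Population (1/n) means, variances and covariance.
    mo, mp = _mean(obs), _mean(pred)
    var_o = _mean([(o - mo) ** 2 for o in obs])
    var_p = _mean([(p - mp) ** 2 for p in pred])
    cov = _mean([(o - mo) * (p - mp) for o, p in zip(obs, pred)])
    return mo, mp, var_o, var_p, cov


def pearson_r(obs, pred):
    """Pearson correlation: measures linear association only."""
    _, _, var_o, var_p, cov = _moments(obs, pred)
    return cov / math.sqrt(var_o * var_p)


def lin_ccc(obs, pred):
    """Lin's concordance correlation: also penalizes shifts away from the 1:1 line."""
    mo, mp, var_o, var_p, cov = _moments(obs, pred)
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)
```

A prediction that is offset from the observations by a constant keeps a Pearson coefficient of 1 but gets a lower CCC, which is why the two are often reported together.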
The Arctic region is experiencing accelerated sea ice melt and increased iceberg detachment from glaciers due to climate change. These drifting icebergs present a risk and engineering challenge for subsea installations traversing shallow waters, where iceberg keels may reach the seabed, potentially damaging subsea structures. Consequently, costly and time-intensive iceberg management operations, such as towing and rerouting, are undertaken to safeguard subsea and offshore infrastructure. This study, therefore, explores the application of extra tree regression (ETR) as a robust solution for estimating iceberg draft, particularly in the preliminary phases of decision-making for iceberg management projects. Nine ETR models were developed using parameters influencing iceberg draft. Subsequent analyses identified the most effective models and significant input variables. Uncertainty analysis revealed that the superior ETR model tended to overestimate iceberg drafts; however, it achieved the highest precision, correlation, and simplicity in estimation. Comparison with decision tree regression, random forest regression, and empirical methods confirmed the superior performance of ETR in predicting iceberg drafts.
CO₂ flooding for enhanced oil recovery (EOR) not only enables underground carbon storage but also plays a critical role in tertiary oil recovery. However, its displacement efficiency is constrained by whether CO₂ and crude oil achieve miscibility, necessitating precise prediction of the minimum miscibility pressure (MMP) for CO₂-oil systems. Traditional methods, such as experimental measurements and empirical correlations, face challenges including time-consuming procedures and limited applicability. In contrast, artificial intelligence (AI) algorithms have emerged as superior alternatives due to their efficiency, broad applicability, and high prediction accuracy. This study employs four AI algorithms to establish predictive models for CO₂-oil MMP: Random Forest Regression (RFR), Genetic Algorithm Based Back Propagation Artificial Neural Network (GA-BPNN), Support Vector Regression (SVR), and Gaussian Process Regression (GPR). A comprehensive database comprising 151 data entries was utilized for model development. The performance of these models was rigorously evaluated using five distinct statistical metrics and visualized comparisons. Validation results confirm their accuracy. Field applications demonstrate that all four models are effective for predicting MMP in ultra-deep reservoirs (burial depth > 5000 m) with complex crude oil compositions. Among them, the RFR and GA-BPNN models outperform SVR and GPR, achieving root mean square errors (RMSE) of 0.33% and 2.23%, and average absolute percentage relative errors (AAPRE) of 0.01% and 0.04%, respectively. Sensitivity analysis of MMP-influencing factors reveals that reservoir temperature (T_R) exerts the most significant impact on MMP, while Xint (mole fraction of intermediate oil components, including C₂-C₄, CO₂, and H₂S) exhibits the least influence.
Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. To study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a method combining Monte Carlo and Random Forest Regression (MC-RFR) is developed to construct the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling was added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR and the future landscape pattern modeled by a combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a linear regression model. By 2030, the overall SUHI effect of Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of sustainability.
Due to the coarse scale of soil moisture products retrieved from passive microwave observations (SM_PMW), several downscaling methods have been developed to enable regional-scale applications. However, it can be challenging for users to access final data products and algorithms, and to manage different data sources and formats, various data processing methods, and the complexity of the workflows from raw data to information products. Here, the Google Earth Engine (GEE), which has recently begun to offer SM_PMW, is used to implement a workflow for retrieving 1 km soil moisture (SM) at a depth of 0-5 cm using MODIS optical/thermal measurements, the coarse-scale SM_PMW product, and a random forest regression. The proposed method was implemented on the African continent to estimate weekly SM maps. The results of this study were evaluated against in-situ measurements from three validation networks. Overall, in comparison to the original SM_PMW product, which was limited to a spatial resolution of 9 km, this method is able to estimate SM at 1 km spatial resolution with acceptable accuracy (an average correlation coefficient of 0.64 and an unbiased root mean square difference, ubRMSD, of 0.069 m³/m³). The results show that the proposed method in GEE provides a precise estimation of SM with a higher spatial resolution across the entire continent.
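The ubRMSD reported above is the root mean square difference after each series has had its own mean removed, so a constant bias between satellite estimates and in-situ values does not count against the product. A minimal sketch of the standard definition follows; the function and argument names are illustrative and not taken from the paper.

```python
import math


def ubrmsd(estimates, observations):
    """Unbiased RMSD: RMSD computed after removing each series' own mean."""
    if len(estimates) != len(observations) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    squared = [((e - mean_e) - (o - mean_o)) ** 2
               for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / n)
```

The ordinary RMSD, the mean bias, and the ubRMSD are linked by RMSD² = bias² + ubRMSD², so reporting ubRMSD separates random error from systematic offset.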
This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on diaphragm wall deflections in braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). The results of the ELMs were then compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
Estimating spatial variation in crop transpiration coefficients (CTc) and aboveground biomass (AGB) rapidly and accurately by remote sensing can facilitate precision irrigation management in semiarid regions. This study developed and assessed a novel machine learning (ML) method for estimating CTc and AGB using time-series unmanned aerial vehicle (UAV)-based multispectral vegetation indices (VIs) of maize under several irrigation treatments at the field scale. Four ML regression methods, multiple linear regression (MLR), support vector regression (SVR), random forest regression (RFR), and adaptive boosting regression (ABR), were used to address the complex relationship between CTc and VIs. AGB was then estimated from the optimal CTc estimation model using exponential, logistic, sigmoid, and linear equations, chosen for their clear mathematical formulations. The UAV VIs-derived CTc using the RFR estimation model yielded the highest accuracy (R² = 0.91, RMSE = 0.0526, and nRMSE = 9.07%). The normalized difference red-edge index, transformed chlorophyll absorption in reflectance index, and simple ratio contributed significantly to the RFR-based CTc model. The accuracy of AGB estimation using nonlinear methods was higher than that using the linear method. The exponential method yielded the highest accuracy (R² = 0.76, RMSE = 282.8 g m⁻², and nRMSE = 39.24%) in both the 2018 and 2019 growing seasons. The study confirms that AGB estimation models based on cumulative CTc performed well under several irrigation treatments using high-resolution time-series UAV multispectral VIs and can support irrigation management with high spatial precision at a field scale.
Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods: This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed. Results: We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the indicators omitted from F2 on GDM prediction, the F3 data set was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%. Conclusions: In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.
Due to unforeseen climate change, complicated chronic diseases, and mutation of viruses, a top challenge for hospital administration is knowing the length of stay (LOS) of patients with different diseases. Hospital management does not know exactly when an existing patient will leave the hospital; this information could be crucial, because it would allow them to admit more patients. As a result, hospitals face many problems in managing available resources, and new patients face problems in being admitted for prompt treatment. Therefore, a robust model needs to be designed to help hospital administration predict patients' LOS and resolve these issues. For this purpose, a very large data set (more than 2.3 million patient records) from New York hospitals, containing information about a wide range of conditions including bone marrow, tuberculosis, intestinal transplant, mental illness, leukaemia, spinal cord injury, trauma, rehabilitation, kidney and alcoholic patients, HIV patients, malignant breast disorder, asthma, respiratory distress syndrome, etc., has been analyzed to predict the LOS. We selected six machine learning (ML) models: multiple linear regression (MLR), Lasso regression (LR), Ridge regression (RR), decision tree regression (DTR), extreme gradient boosting regression (XGBR), and random forest regression (RFR). The predictive performance of the selected models was checked using R² and mean square error (MSE) as the performance evaluation criteria. Our results revealed the superior predictive performance of the RFR model among all selected models, in terms of both R² score (92%) and MSE score (5). From exploratory data analysis (EDA), we conclude that most stays were between 0 and 5 days, with a mean stay of 5.3 days per patient, and that patients older than 50 years spent more days in the hospital. Based on the average LOS, results revealed that patients with diagnoses related to birth complications spent more days in the hospital than patients with other diseases. This finding could help predict the future length of hospital stay of new patients, which will help the hospital administration estimate and manage their resources efficiently.
Artificial intelligence (AI) and machine learning (ML) help in making predictions and help businesses make key decisions that benefit them. In the online shopping business, it is very important to find trends in the data and to learn which features help drive the success of the business. In this research, a dataset of 12,330 records of customers who visited an online shopping website over a period of one year has been analyzed. The main objective of this research is to find features that are relevant for correctly predicting the purchasing decisions made by visiting customers and to build ML models that could make correct predictions on unseen data in the future. The permutation feature importance approach has been used to obtain the importance of features with respect to the output variable (Revenue). Five ML models, i.e., decision tree (DT), random forest (RF), extra tree (ET) classifier, neural networks (NN), and logistic regression (LR), have been used to make predictions on unseen data. The performance of each model has been discussed in detail using performance measures such as accuracy score, precision, recall, F1 score, and ROC-AUC curve. The RF model is the best of the five, with an accuracy score of 90% and an F1 score of 79%, followed by the extra tree classifier. Hence, our study indicates that the RF model can be used by online retailing businesses for predicting consumer buying behaviour. Our research also reveals the importance of page value as a key feature for capturing online purchasing trends. This may give a clue to future businesses, which can focus on this specific feature and find the key factors behind page value success, which in turn will help the online shopping business.
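Permutation feature importance, as used in the study above, scores a feature by how much a fitted model's performance drops when that feature's column is shuffled. The sketch below is a library-free illustration under stated assumptions: the model is any callable that maps one row (a list) to a label, accuracy is the score, and all names are hypothetical. It is not the authors' pipeline.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn.

    X is a list of rows (lists); the model itself is never refitted.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, while shuffling a feature the model relies on lowers accuracy; ranking features by this drop is what singles out a dominant feature such as page value.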
Major fields such as military applications, medicine, weather forecasting, and environmental applications use wireless sensor networks for major computing processes. Sensors play a vital role in emerging technologies of the 20th century. Placing sensors in the locations where they are needed is a serious problem. The environment is home to every living being in the world, and the growth of industries after the industrial revolution increased pollution across it. Owing to recent uncontrolled growth and development, sensors are needed to measure pollution levels across industries and their surroundings. Choosing where to place the sensors is an interesting and challenging task. Many meta-heuristic techniques have been introduced for node localization, and swarm intelligence algorithms have proven their efficiency in many studies on localization problems. In this article, we introduce an industrial-centric approach to solve the problem of node localization in the sensor network. First, our work aims at selecting industrial areas in the sensed location. We use random forest regression to select the polluted area. Then, the elephant herding algorithm is used for sensor node localization. These two algorithms are combined to produce the best result in localizing the sensor nodes. To check the performance of the proposed approach, experiments are conducted with data from the KDD Cup 2018, which contain the names of 35 stations with concentrations of air pollutants such as PM, SO₂, CO, NO₂, and O₃. These data are normalized and tested with the algorithms. The results are compared with swarm intelligence algorithms such as the elephant herding algorithm and particle swarm optimization, and with machine learning algorithms such as decision tree regression and multi-layer perceptron. Results indicate that our proposed algorithm can suggest more meaningful locations for localizing the sensors in the topology. Our proposed method achieves a lower root mean square error of 0.06 to 0.08 for localizing with Stations 1 to 5.
Despite exploration and production success in the Niger Delta, several failed wells have been encountered due to overpressures. Hence, it is essential to understand the spatial distribution of pore pressure and its generating mechanisms in order to mitigate the pitfalls that might arise during drilling. This research provides estimates of pore pressure along three offshore wells using Eaton's transit time method, multi-layer perceptron artificial neural network (MLP-ANN), and random forest regression (RFR) algorithms. Our results show that there are three pressure magnitude regimes: a normal pressure zone (hydrostatic pressure), a transition pressure zone (slightly above hydrostatic pressure), and an overpressured zone (significantly above hydrostatic pressure). The top of the geopressured zone (2873 mbRT or 9425.853 ft) on average marks the onset of overpressure, with the excess pore pressure above hydrostatic pressure (P∗) varying on average along the three wells between 1.06 and 24.75 MPa. The results from the three methods are self-consistent, with strong correlation between Eaton's method and the two machine learning models. The models have high accuracy (above about 97%), low mean absolute percentage error (MAPE < 3%), and high coefficient of determination (R² > 0.98). Our results also show that the principal generating mechanisms responsible for high pore pressure in the offshore Niger Delta are disequilibrium compaction, unloading (fluid expansion), and shale diagenesis.
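Eaton's transit time method, named in the abstract above, estimates pore pressure from how far the observed sonic transit time departs from the normal compaction trend. The sketch below uses the commonly quoted form P = S - (S - P_h)(Δt_n/Δt_o)^n with n = 3. The exponent, the parameter names, and the choice of units are assumptions; the abstract does not give the authors' calibration.

```python
def eaton_pore_pressure(overburden, hydrostatic, dt_normal, dt_observed,
                        exponent=3.0):
    """Pore pressure from sonic transit time, in the units of the inputs.

    overburden  : overburden (vertical) stress S at the depth of interest
    hydrostatic : normal (hydrostatic) pore pressure P_h at that depth
    dt_normal   : transit time expected from the normal compaction trend
    dt_observed : transit time actually logged
    exponent    : Eaton exponent; 3.0 is the commonly quoted sonic value
                  (assumed here, normally calibrated per basin)
    """
    if dt_normal <= 0 or dt_observed <= 0:
        raise ValueError("transit times must be positive")
    ratio = dt_normal / dt_observed
    return overburden - (overburden - hydrostatic) * ratio ** exponent
```

When the log follows the normal trend the ratio is 1 and the estimate equals hydrostatic pressure; slower-than-normal transit times, typical of undercompacted shale, push the estimate toward the overburden stress.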
The aviation industry has seen significant advancements in safety procedures over the past few decades, resulting in a steady decline in aviation deaths worldwide. However, the safety standards in General Aviation (GA) are still lower compared to those in commercial aviation. With the anticipated growth in air travel, there is an imminent need to improve operational safety in GA. One way to improve aircraft and operational safety is through trajectory prediction. Trajectory prediction plays a key role in optimizing air traffic control and improving overall flight safety. This paper proposes a meta-learning approach to predict short- to mid-term trajectories of aircraft using historical real flight data collected from multiple GA aircraft. The proposed solution brings together multiple models to improve prediction accuracy. In this paper, we are combining two models, Random Forest Regression (RFR) and Long Short-term Memory (LSTM), using k-Nearest Neighbors (k-NN), to output the final prediction based on the combined output of the individual models. This approach gives our model an edge over single-model predictions. We present the results of our meta-learner and evaluate its performance against individual models using the Mean Absolute Error (MAE), Absolute Altitude Error (AAE), and Root Mean Squared Error (RMSE) evaluation metrics. The proposed methodology for aircraft trajectory forecasting is discussed in detail, accompanied by a literature review and an overview of the data preprocessing techniques used. The results demonstrate that the proposed meta-learner outperforms individual models in terms of accuracy, providing a more robust and proactive approach to improve operational safety in GA.
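The combination step described above, where a k-NN model produces the final prediction from the outputs of two base models, can be read as a form of stacking. The sketch below shows one plausible version: each training example for the meta-learner is the pair (base prediction A, base prediction B) with the true value as target, and a new pair is scored by averaging the targets of its k nearest neighbours. The names, the Euclidean distance, and the unweighted average are assumptions, not the paper's implementation.

```python
import math


def knn_meta_predict(meta_X, meta_y, query, k=3):
    """Average the targets of the k stored base-prediction pairs closest to `query`.

    meta_X : list of tuples, e.g. (rfr_prediction, lstm_prediction) on held-out data
    meta_y : true values matching meta_X
    query  : tuple of base-model predictions for a new sample
    """
    if not meta_X or len(meta_X) != len(meta_y):
        raise ValueError("meta_X and meta_y must be non-empty and equal length")
    k = min(k, len(meta_X))
    ranked = sorted(range(len(meta_X)),
                    key=lambda i: math.dist(meta_X[i], query))
    nearest = ranked[:k]
    return sum(meta_y[i] for i in nearest) / k
```

In practice the pairs in `meta_X` would come from base-model predictions on data the base models were not trained on, so that the meta-learner sees realistic base-model errors.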
The commercialization of perovskite solar cells (PSCs) is hindered by the instability of organic components and the resource-intensive nature of experimental optimization. Machine learning (ML) is revolutionizing the discovery and optimization of photovoltaic devices by reducing reliance on conventional trial-and-error approaches. This study aims to optimize the performance of CsPbI₃-based all-inorganic PSCs using a combined SCAPS-1D and ML approach. We generated 56,390 unique device configurations via SCAPS-1D simulations, varying layer thicknesses and defect densities. Five ML models were trained, with XGBoost achieving the highest accuracy (R^(2) = 0.999). Feature importance was analyzed using SHAP. Optimization increased the PCE from 15.15% to 19.16%, with the perovskite layer thickness (2 μm) and defect density (<10^(15) cm^(-3)) identified as critical parameters. This study highlights the potential of ML-driven optimization in perovskite solar cells, offering a systematic and data-driven approach to enhancing device efficiency and accelerating the development of next-generation photovoltaics.
Nitrogen (N) is a pivotal factor influencing the growth, development, and yield of maize. Rapid, non-destructive, real-time monitoring of maize N status based on unmanned aerial vehicle (UAV) remote sensing technology is valuable for fertilization management in agriculture. In this study, hyperspectral images were acquired by UAV, and the leaf nitrogen content (LNC) and leaf nitrogen accumulation (LNA) were measured to estimate the N nutrition status of maize. Twenty-four vegetation indices (VIs) were constructed from the hyperspectral images, and four prediction models were used to estimate the LNC and LNA of maize: a single linear regression model, a multivariable linear regression (MLR) model, a random forest regression (RFR) model, and a support vector regression (SVR) model. The model with the highest prediction accuracy was then applied to invert the LNC and LNA of maize in breeding fields. The results of the single linear regression model with the 24 VIs showed that normalized difference chlorophyll (NDchl) had the highest prediction accuracy for LNC (R^(2), RMSE, and RE were 0.72, 0.21, and 12.19%, respectively) and LNA (R^(2), RMSE, and RE were 0.77, 0.26, and 14.34%, respectively). The 24 VIs were then divided into 13 important VIs and 11 unimportant VIs. Three prediction models for LNC and LNA were constructed using the 13 important VIs. The results showed that the RFR and SVR models significantly enhanced the prediction accuracy of LNC and LNA compared with the multivariable linear regression model, and that the RFR model had the highest prediction accuracy on the validation dataset for LNC (R^(2), RMSE, and RE were 0.78, 0.16, and 8.83%, respectively) and LNA (R^(2), RMSE, and RE were 0.85, 0.19, and 9.88%, respectively). This study provides a theoretical basis for N diagnosis and precise management of crop production based on hyperspectral remote sensing in precision agriculture.
Early crop yield prediction provides critical information for Precision Agriculture (PA) procedures, policymaking, and food security. The availability of Remote Sensing (RS) datasets and Machine Learning (ML) approaches has improved the prediction of sugarcane crop yield on the local and global scales, but additional effort on plot-scale prediction is required. Challenges for plot-level prediction include the high ratooning capacity of the sugarcane crop, the lack of high spatial resolution data during the critical growth stages, and the non-linear complexity of yield data. The principal objective of the study is to analyse the potential of a time series of high-resolution multispectral Unmanned Aerial Vehicle (UAV) imagery along with three advanced ML techniques, namely Random Forest Regression (RFR), Support Vector Regression (SVR), and Nonlinear Autoregressive Exogenous Artificial Neural Network (NARX ANN), as a solution to plot-level sugarcane yield prediction. An experimental sugarcane field containing 48 plots was selected, and UAV imagery was collected during the early and middle crop growth stages of three consecutive cropping seasons. The dataset for each growth stage was analyzed separately to predict the sugarcane crop yield, in an attempt to discover how early the pre-harvest yield can be predicted. The datasets of the first two cropping seasons were used to train and test the three ML techniques, utilizing 10-fold cross-validation to avoid overfitting. The third cropping season dataset was then used to evaluate the reliability of the developed prediction models. The results show that the correlation of Vegetation Indices (VIs) with crop yield in the middle stage outperforms the early stage in all three ML models. Moreover, comparing these models indicates that the NARX ANN method outperformed the others in the middle stage, with the highest coefficient of determination (R^(2)) of 0.96 and the lowest Root Mean Square Error (RMSE) of 4.92 t/ha. It was followed by the SVR (R^(2) = 0.52, RMSE = 14.85 t/ha), which performed similarly to the RFR method (R^(2) = 0.48, RMSE = 11.20 t/ha). In conclusion, the best-suited model for predicting sugarcane yields during the middle growth stage is a NARX ANN model employing the Normalized Difference RedEdge (NDRE) index, which demonstrates the feasibility of ML approaches for predicting plot-level sugarcane yield at a specific period of growth, as they are less sensitive to the inconsistency of data collection times.
Funding: Project supported in part by the National Natural Science Foundation of China (Grant No. 12075168) and the Fund from the Science and Technology Commission of Shanghai Municipality (Grant No. 21JC1405600).
Abstract: Layered pavements usually exhibit complicated mechanical behavior because of their complex material properties and the external environment. In some cases, such as the launching of missiles or rockets, layered pavements are required to bear large impulse loads. However, traditional methods cannot non-destructively and quickly detect the internal structure of pavements. Thus, accurate and fast prediction of the mechanical properties of layered pavements is of great importance. In recent years, machine learning has shown great superiority in solving nonlinear problems. In this work, we present a method of predicting the maximum deflection and damage factor of layered pavements under instantaneous large impact, based on random forest regression with the deflection basin parameters obtained from falling weight deflectometer testing. The R^(2) values on the testing datasets are above 0.94 when predicting the elastic moduli of structural layers and the mechanical responses, which indicates that the predictions are highly consistent with finite element simulation results. This paper provides a novel method for fast and accurate prediction of pavement mechanical responses under instantaneous large impact load using partial structural parameters of pavements, and has application potential in non-destructive evaluation of pavement structure.
Abstract: Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach involving the application of general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions. Machine learning was applied to 80% of the samples for training, while simulation was conducted on the remaining 20% of samples based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model's predicted values are close to the actual quantities of generated WEEE; the model's performance was evaluated using the mean squared error (MSE), yielding satisfactory results. Based on this model, both companies and stakeholders can set realistic objectives for managing companies' WEEE, fostering sustainable socio-environmental practices.
Abstract: Modeling implied volatility (IV) is important for option pricing, hedging, and risk management. Previous studies of deterministic implied volatility functions (DIVFs) propose two parameters, moneyness and time to maturity, to estimate implied volatility. Recent DIVF models have included factors such as a moving average ratio and relative bid-ask spread but fail to enhance modeling accuracy. The current study offers a generalized DIVF model by including a momentum indicator for the underlying asset, using a relative strength index (RSI) covering multiple time resolutions as a factor, as momentum is often used by investors and speculators in their trading decisions and, in contrast to volatility, RSI can distinguish between bull and bear markets. To the best of our knowledge, prior studies have not included RSI as a predictive factor in modeling IV. Instead of using a simple linear regression as in previous studies, we use a machine learning regression algorithm, namely random forest, to model a nonlinear IV. Previous studies apply DIVF modeling to options on traditional financial assets, such as stock and foreign exchange markets. Here, we study options on the largest cryptocurrency, Bitcoin, which poses greater modeling challenges due to its extreme volatility and the fact that it is not as well studied as traditional financial assets. Recent Bitcoin option chain data were collected from a leading cryptocurrency option exchange over a four-month period for model development and validation. Our dataset includes short-maturity options with expiry in less than six days, as well as a full range of moneyness, both of which are often excluded in existing studies because prices for options with these characteristics are often highly volatile and pose challenges to model building. Our in-sample and out-of-sample results indicate that including our proposed momentum indicator significantly enhances the model's accuracy in pricing options. The nonlinear machine learning random forest algorithm also performed better than a simple linear regression. Compared to prevailing option pricing models that employ stochastic variables, our DIVF model does not include stochastic factors but exhibits reasonably good performance. It is also easy to compute due to the availability of real-time RSIs. Our findings indicate that our enhanced DIVF model offers significant improvements and may be an excellent alternative to existing option pricing models that are primarily stochastic in nature.
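To make the momentum factor concrete, here is a minimal sketch of a relative strength index computed over several look-back windows, as one might do before feeding the values to a regression model. It uses the simple-average variant, RSI = 100 - 100/(1 + RS) with RS = average gain / average loss; the abstract does not say which smoothing or which windows the authors used, so the function names, the window lengths, and the price series are all hypothetical.

```python
def rsi(prices, period=14):
    """Simple-average RSI over the last `period` price changes (returns 0..100)."""
    if period < 1 or len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    window = prices[-(period + 1):]
    changes = [window[i + 1] - window[i] for i in range(period)]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: treated as neutral (a convention assumed here)
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi_features(prices, periods=(2, 4)):
    """RSI at several look-back windows, keyed by window length (hypothetical helper)."""
    return {p: rsi(prices, p) for p in periods}


# Hypothetical price series: alternating up and down moves of equal size.
features = rsi_features([1.0, 2.0, 1.0, 2.0, 1.0], periods=(2, 4))
```

Values above 50 indicate that gains dominated the window and values below 50 that losses did, which is the property the abstract relies on to separate bull from bear conditions.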
Funding: Supported by grants provided within the research project »EO4Forest: Use of multi-temporal Sentinel-2 and VHR Pleiades stereo data for sustainable forest monitoring and management« funded by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK) within the FFG Austrian Space Applications Program ASAP 12 (grant agreement number 854027).
Abstract: Accurate, reliable, and regularly updated information is necessary for targeted management of forest stands. This information is usually obtained from sample-based field inventory data. Due to the time-consuming and costly procedure of forest inventory, it is imperative to generate and use the resulting data optimally. Integrating field inventory information with remote sensing data increases the value of field approaches, such as national forest inventories. This study investigated the optimal integration of forest inventory data with aerial image-based canopy height models (CHM) for forest growing stock estimation. For this purpose, fixed-area and angle-count plots from a forest area in Austria were used to assess which type of inventory system is more suitable when the field data is integrated with aerial image analysis. Although a higher correlation was observed between remotely predicted growing stocks and field inventory values for fixed-area plots, the paired t-test results revealed no statistical difference between the two methods. The R^(2) increased by 0.08 points and the RMSE decreased by 7.7 percentage points (24.8 m^(3)·ha^(−1)) using fixed-area plots. Since tree height is the most critical variable for modeling forest growing stock using aerial images, we also compared the tree heights obtained from CHM to those from the typical field inventory approach. The result shows a high correlation (R^(2) = 0.781) between the tree heights extracted from the CHM and those measured in the field. However, the correlation decreased by 0.113 points and the RMSE increased by 4.2 percentage points (1.04 m) when the allometrically derived tree heights were analyzed. Moreover, the results of the paired t-test revealed no significant statistical difference between the tree heights extracted from CHM and those measured in the field, but a significant statistical difference when the CHM-derived and the allometrically derived heights were compared. This proved that image-based CHM can obtain more accurate tree height information than field inventory estimations. Overall, the results of this study demonstrated that image-based CHM can be integrated into forest inventory data at large scales and provide reliable information on forest growing stock. The produced maps reflect the variability of growth conditions and developmental stages of different forest stands. This information is required to characterize the status and changes, e.g., in forest structure diversity and parameters for volume, and can be used for forest aboveground biomass estimation, which plays an important role in managing and controlling forest resources in mid-term forest management. This is of particular interest to forest managers and forest ecologists.
Funding: Funded by the Natural Science Foundation of China (Nos. 42407241, 42272326 and 52222905) and the Jiangxi Provincial Natural Science Foundation (Nos. 20242BAB20241, 20242BAB23052 and 20242BAB24001).
Abstract: The spatial distribution of overburden layer thickness (OLT) is crucial for landslide susceptibility prediction and slope stability analysis. Because of the spatial heterogeneity of OLT in hillslope regions, combined with the difficulty and time cost of collecting OLT samples, accurately predicting OLT distribution remains challenging. To address this, a novel framework has been developed. First, OLT samples are collected through field surveys, remote sensing, and geological drilling. Next, the heterogeneity of OLT's spatial distribution is analyzed using the probability distribution of OLT samples and their horizontal and vertical distributions. The OLT samples are categorized, and the small sample categories are expanded using the synthetic minority over-sampling technique (SMOTE). The slope position is selected as a key conditioning factor. Subsequently, 16 conditioning factors are applied to construct an OLT prediction model using the random forest regression algorithm. Weights are assigned to each OLT sample category to balance the uneven distribution of sample sizes. Finally, the Pearson correlation coefficient, mean absolute error (MAE), root mean square error (RMSE), and Lin's concordance correlation coefficient (Lin's CCC) are employed to validate the OLT prediction results. Huangtan town serves as the case study. Results show: (1) heterogeneity analysis, the SMOTE-based OLT sample expansion strategy, and slope position selection can significantly mitigate the effect of spatial heterogeneity on OLT prediction. (2) The Pearson correlation coefficient, RMSE, MAE and Lin's CCC values are 0.84, 1.173, 1.378 and 0.804, respectively, indicating excellent prediction performance. This research provides an effective solution for predicting OLT distribution in hillslope regions.
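Because Lin's CCC is less familiar than the Pearson coefficient, the sketch below computes both from the standard definitions, using population (1/n) moments: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))²). The function names and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def _moments(x, y):
    """Return means, population variances and population covariance of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def pearson_r(x, y):
    """Pearson correlation: measures linear association only."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / sqrt(vx * vy)


def lins_ccc(x, y):
    """Lin's concordance correlation: also penalizes offset and scale differences."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical thickness values: predictions are perfectly correlated but biased by +1.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
r = pearson_r(observed, predicted)
ccc = lins_ccc(observed, predicted)
```

In this example the Pearson coefficient is 1 while the CCC is 4/7, which illustrates why the two are reported together: only the CCC drops when predictions are systematically offset from the observations.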
Abstract: The Arctic region is experiencing accelerated sea ice melt and increased iceberg detachment from glaciers due to climate change. These drifting icebergs present a risk and engineering challenge for subsea installations traversing shallow waters, where iceberg keels may reach the seabed, potentially damaging subsea structures. Consequently, costly and time-intensive iceberg management operations, such as towing and rerouting, are undertaken to safeguard subsea and offshore infrastructure. This study, therefore, explores the application of extra tree regression (ETR) as a robust solution for estimating iceberg draft, particularly in the preliminary phases of decision-making for iceberg management projects. Nine ETR models were developed using parameters influencing iceberg draft. Subsequent analyses identified the most effective models and significant input variables. Uncertainty analysis revealed that the superior ETR model tended to overestimate iceberg drafts; however, it achieved the highest precision, correlation, and simplicity in estimation. Comparison with decision tree regression, random forest regression, and empirical methods confirmed the superior performance of ETR in predicting iceberg drafts.
Abstract: CO_(2) flooding for enhanced oil recovery (EOR) not only enables underground carbon storage but also plays a critical role in tertiary oil recovery. However, its displacement efficiency is constrained by whether CO_(2) and crude oil achieve miscibility, necessitating precise prediction of the minimum miscibility pressure (MMP) for CO_(2)-oil systems. Traditional methods, such as experimental measurements and empirical correlations, face challenges including time-consuming procedures and limited applicability. In contrast, artificial intelligence (AI) algorithms have emerged as superior alternatives due to their efficiency, broad applicability, and high prediction accuracy. This study employs four AI algorithms, namely Random Forest Regression (RFR), Genetic Algorithm Based Back Propagation Artificial Neural Network (GA-BPNN), Support Vector Regression (SVR), and Gaussian Process Regression (GPR), to establish predictive models for CO_(2)-oil MMP. A comprehensive database comprising 151 data entries was utilized for model development. The performance of these models was rigorously evaluated using five distinct statistical metrics and visualized comparisons. Validation results confirm their accuracy. Field applications demonstrate that all four models are effective for predicting MMP in ultra-deep reservoirs (burial depth > 5000 m) with complex crude oil compositions. Among them, the RFR and GA-BPNN models outperform SVR and GPR, achieving root mean square errors (RMSE) of 0.33% and 2.23%, and average absolute percentage relative errors (AAPRE) of 0.01% and 0.04%, respectively. Sensitivity analysis of MMP-influencing factors reveals that reservoir temperature (T_(R)) exerts the most significant impact on MMP, while X_(int) (mole fraction of intermediate oil components, including C_(2)-C_(4), CO_(2), and H_(2)S) exhibits the least influence.
Funding: Under the auspices of the National Natural Science Foundation of China (Nos. 41977411, 41771383) and the Technology Research Project of the Education Department of Jilin Province (No. JJKH20210445KJ).
Abstract: Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. In order to study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a combined Monte Carlo and Random Forest Regression (MC-RFR) method is developed to construct the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling is added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR and the future landscape pattern modeled by the combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the impact of the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than a single RFR or a Linear Regression model. By 2030, the overall SUHI effect of Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers should concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of their sustainability.
Funding: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - SFB 1502/1-2022 - project number: 450058266.
Abstract: Due to the coarse scale of soil moisture products retrieved from passive microwave observations (SM_(PMW)), several downscaling methods have been developed to enable regional scale applications. However, it can be challenging for users to access final data products and algorithms, and to manage different data sources and formats, various data processing methods, and the complexity of the workflows from raw data to information products. Here, the Google Earth Engine (GEE), which has recently begun offering SM_(PMW), is used to implement a workflow for retrieving 1 km SM at a depth of 0-5 cm using MODIS optical/thermal measurements, the coarse-scale SM_(PMW) product, and a random forest regression. The proposed method was implemented on the African continent to estimate weekly SM maps. The results of this study were evaluated against in-situ measurements from three validation networks. Overall, in comparison to the original SM_(PMW) product, which was limited to a spatial resolution of only 9 km, this method is able to estimate SM at 1 km spatial resolution with acceptable accuracy (an average correlation coefficient of 0.64 and a ubRMSD of 0.069 m^(3)/m^(3)). The results show that the proposed method in GEE provides a precise estimation of SM with a higher spatial resolution across the entire continent.
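The ubRMSD quoted above is the root mean square difference after the mean bias between estimates and in-situ observations has been removed, so that ubRMSD² = RMSD² - bias². The sketch below follows that standard definition; the function name and the soil moisture values are hypothetical and are not from the study.

```python
from math import sqrt


def error_metrics(estimated, observed):
    """Return (bias, rmsd, ubrmsd) for paired estimates and observations."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimated)
    diffs = [e - o for e, o in zip(estimated, observed)]
    bias = sum(diffs) / n
    rmsd = sqrt(sum(d * d for d in diffs) / n)
    # Unbiased RMSD: spread of the differences around their mean.
    ubrmsd = sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    return bias, rmsd, ubrmsd


# Hypothetical volumetric soil moisture values in m^3/m^3.
in_situ = [0.10, 0.20, 0.30]
offset_only = [0.15, 0.25, 0.35]   # constant +0.05 bias, no random error
bias_a, rmsd_a, ubrmsd_a = error_metrics(offset_only, in_situ)

flat_estimate = [0.20, 0.20, 0.20]  # zero mean bias, but misses the dynamics
bias_b, rmsd_b, ubrmsd_b = error_metrics(flat_estimate, in_situ)
```

A constant offset leaves the ubRMSD near zero even though the RMSD is 0.05, whereas an estimate with no mean bias but the wrong dynamics has identical RMSD and ubRMSD; this is why the metric is commonly paired with a correlation coefficient.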
Funding: Supported by the High-end Foreign Expert Introduction program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019-0045), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K201900102). The financial support is gratefully acknowledged.
Abstract: This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on diaphragm wall deflections in braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). The results of the ELMs were then compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
Funding: Funded by the National Natural Science Foundation of China (51979233) and the Natural Science Basic Research Plan in Shaanxi Province of China (2022JQ-363).
Abstract: Estimating spatial variation in crop transpiration coefficients (CTc) and aboveground biomass (AGB) rapidly and accurately by remote sensing can facilitate precision irrigation management in semiarid regions. This study developed and assessed a novel machine learning (ML) method for estimating CTc and AGB using time-series unmanned aerial vehicle (UAV)-based multispectral vegetation indices (VIs) of maize under several irrigation treatments at the field scale. Four ML regression methods, namely multiple linear regression (MLR), support vector regression (SVR), random forest regression (RFR), and adaptive boosting regression (ABR), were used to address the complex relationship between CTc and VIs. AGB was then estimated from the optimal CTc estimation model using exponential, logistic, sigmoid, and linear equations, chosen for their clear mathematical formulations. The UAV VIs-derived CTc using the RFR estimation model yielded the highest accuracy (R^(2) = 0.91, RMSE = 0.0526, and nRMSE = 9.07%). The normalized difference red-edge index, transformed chlorophyll absorption in reflectance index, and simple ratio contributed significantly to the RFR-based CTc model. The accuracy of AGB estimation using nonlinear methods was higher than that using the linear method. The exponential method yielded the highest accuracy (R^(2) = 0.76, RMSE = 282.8 g m, and nRMSE = 39.24%) in both the 2018 and 2019 growing seasons. The study confirms that AGB estimation models based on cumulative CTc performed well under several irrigation treatments using high-resolution time-series UAV multispectral VIs and can support irrigation management with high spatial precision at a field scale.
Funding: Supported by the Qingdao Municipal Bureau of Science and Technology (No. 19-6-1-55-nsh).
Abstract: Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods: This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed. Results: We obtained 4806 analyzable samples from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the omitted indicators on GDM prediction, the F3 data set was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%. Conclusions: In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.
Abstract: Due to unforeseen climate change, complicated chronic diseases, and the mutation of viruses, a top challenge for hospital administration is to know the length of stay (LOS) of patients with different diseases. Hospital management does not know exactly when an existing patient will leave the hospital; this information could be crucial, as it would allow them to admit more patients. As a result, hospitals face many problems in managing available resources and in admitting new patients for prompt treatment. Therefore, a robust model needs to be designed to help hospital administration predict patients' LOS and resolve these issues. For this purpose, a very large data set (more than 2.3 million patients' records) from New York hospitals, containing information about a wide range of diseases including Bone-Marrow, Tuberculosis, Intestinal Transplant, Mental illness, Leukaemia, Spinal cord injury, Trauma, Rehabilitation, Kidney and Alcoholic Patients, HIV Patients, Malignant Breast disorder, Asthma, Respiratory distress syndrome, etc., has been analyzed to predict the LOS. We selected six machine learning (ML) models: Multiple linear regression (MLR), Lasso regression (LR), Ridge regression (RR), Decision tree regression (DTR), Extreme gradient boosting regression (XGBR), and Random Forest regression (RFR). The selected models' predictive performance was checked using R square and Mean square error (MSE) as the performance evaluation criteria. Our results revealed the superior predictive performance of the RFR model, both in terms of R-square score (92%) and MSE score (5), among all selected models. From exploratory data analysis (EDA), we conclude that most stays were between 0 and 5 days, with a mean stay of 5.3 days per patient, and that patients more than 50 years old spent more days in the hospital. Based on the average LOS, the results revealed that patients with diagnoses related to birth complications spent more days in the hospital than patients with other diseases. This finding could help predict the future length of hospital stay of new patients, which will help the hospital administration estimate and manage their resources efficiently.
Abstract: Artificial intelligence (AI) and machine learning (ML) help in making predictions and help businesses make key decisions that are beneficial for them. In the online shopping business, it is very important to find trends in the data and to learn which features help drive the success of the business. In this research, a dataset of 12,330 records of customers who visited an online shopping website over a period of one year has been analyzed. The main objective of this research is to find features that are relevant for correctly predicting the purchasing decisions made by visiting customers and to build ML models that can make correct predictions on unseen data in the future. The permutation feature importance approach has been used to rank the importance of features with respect to the output variable (Revenue). Five ML models, i.e., decision tree (DT), random forest (RF), extra tree (ET) classifier, neural networks (NN), and logistic regression (LR), have been used to make predictions on unseen data. The performance of each model has been discussed in detail using performance measures such as accuracy score, precision, recall, F1 score, and ROC-AUC curve. The RF model is the best of the five, with an accuracy score of 90% and an F1 score of 79%, followed by the extra tree classifier. Hence, our study indicates that the RF model can be used by online retailing businesses for predicting consumer buying behaviour. Our research also reveals the importance of page value as a key feature for capturing online purchasing trends. This may give a clue to future businesses, which can focus on this specific feature and find the key factors behind page value success, which in turn will help the online shopping business.
Abstract: Major fields such as military applications, medical fields, weather forecasting, and environmental applications use wireless sensor networks for major computing processes. Sensors play a vital role in emerging technologies of the 20th century. Localization of sensors in needed locations is a very serious problem. The environment is home to every living being in the world. The growth of industries after the industrial revolution increased pollution across the environment. Owing to recent uncontrolled growth and development, sensors to measure pollution levels across industries and surroundings are needed. An interesting and challenging task is choosing where to place the sensors. Many meta-heuristic techniques have been introduced for node localization. Swarm intelligence algorithms have proven their efficiency in many studies on localization problems. In this article, we introduce an industrial-centric approach to solve the problem of node localization in the sensor network. First, our work aims at selecting industrial areas in the sensed location. We use the random forest regression methodology to select the polluted area. Then, the elephant herding algorithm is used for sensor node localization. These two algorithms are combined to produce the best standard result in localizing the sensor nodes. To evaluate the proposed method, experiments are conducted with data from the KDD Cup 2018, which contain the names of 35 stations with concentrations of air pollutants such as PM, SO_(2), CO, NO_(2), and O_(3). These data are normalized and tested with the algorithms. The results are compared with other swarm intelligence algorithms, such as the elephant herding algorithm and particle swarm optimization, and with machine learning algorithms such as decision tree regression and multi-layer perceptron. The results indicate that our proposed algorithm can suggest more meaningful locations for localizing the sensors in the topology. Our proposed method achieves a lower root mean square value of 0.06 to 0.08 when localizing with Stations 1 to 5.
Abstract: Despite exploration and production success in the Niger Delta, several failed wells have been encountered due to overpressures. Hence, it is essential to understand the spatial distribution of pore pressure and its generating mechanisms in order to mitigate the pitfalls that might arise during drilling. This research provides estimates of pore pressure along three offshore wells using Eaton's transit time method, multi-layer perceptron artificial neural network (MLP-ANN), and random forest regression (RFR) algorithms. Our results show that there are three pressure magnitude regimes: a normal pressure zone (hydrostatic pressure), a transition pressure zone (slightly above hydrostatic pressure), and an overpressured zone (significantly above hydrostatic pressure). The top of the geopressured zone (2873 mbRT or 9425.853 ft) on average marks the onset of overpressure, with the excess pore pressure above hydrostatic pressure (P*) varying on average along the three wells between 1.06 and 24.75 MPa. The results from the three methods are self-consistent, with strong correlation between Eaton's method and the two machine learning models. The models have high accuracy (>97%), low mean absolute percentage error (MAPE < 3%), and a high coefficient of determination (R² > 0.98). Our results also show that the principal generating mechanisms responsible for high pore pressure in the offshore Niger Delta are disequilibrium compaction, unloading (fluid expansion), and shale diagenesis.
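As a rough illustration of the transit time approach named in this abstract, the sketch below uses the commonly cited form of Eaton's sonic relation, in which pore pressure is the overburden stress minus the normal effective stress scaled by a transit time ratio. The exponent of 3.0, the function name, and the sample values are assumptions chosen for illustration; the abstract does not state the parameters the authors used.

```python
def eaton_pore_pressure(overburden, hydrostatic, dt_normal, dt_observed, exponent=3.0):
    """Estimate pore pressure with the commonly cited Eaton sonic relation.

    overburden, hydrostatic: stresses/pressures in the same unit (e.g. MPa).
    dt_normal: transit time expected on the normal compaction trend.
    dt_observed: transit time measured by the sonic log at that depth.
    exponent: assumed Eaton exponent (3.0 is a common default, not taken
              from the abstract).
    """
    if dt_normal <= 0 or dt_observed <= 0:
        raise ValueError("transit times must be positive")
    ratio = dt_normal / dt_observed
    # A slower-than-normal formation (dt_observed > dt_normal) gives ratio < 1,
    # which reduces the effective stress term and raises the pore pressure.
    return overburden - (overburden - hydrostatic) * ratio ** exponent


# Hypothetical values, not data from the study.
normal_case = eaton_pore_pressure(60.0, 28.0, dt_normal=80.0, dt_observed=80.0)
overpressured_case = eaton_pore_pressure(60.0, 28.0, dt_normal=80.0, dt_observed=100.0)
excess_pressure = overpressured_case - 28.0  # pressure above hydrostatic
```

When the observed transit time sits on the normal trend, the estimate reduces to hydrostatic pressure, which corresponds to the normal pressure zone described above.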
Abstract: The aviation industry has seen significant advancements in safety procedures over the past few decades, resulting in a steady decline in aviation deaths worldwide. However, safety standards in General Aviation (GA) are still lower than those in commercial aviation. With the anticipated growth in air travel, there is an imminent need to improve operational safety in GA. One way to improve aircraft and operational safety is through trajectory prediction, which plays a key role in optimizing air traffic control and improving overall flight safety. This paper proposes a meta-learning approach to predict short- to mid-term aircraft trajectories using historical real flight data collected from multiple GA aircraft. The proposed solution brings together multiple models to improve prediction accuracy. We combine two models, Random Forest Regression (RFR) and Long Short-term Memory (LSTM), using k-Nearest Neighbors (k-NN) to output the final prediction based on the combined output of the individual models. This approach gives our model an edge over single-model predictions. We present the results of our meta-learner and evaluate its performance against the individual models using the Mean Absolute Error (MAE), Absolute Altitude Error (AAE), and Root Mean Squared Error (RMSE) evaluation metrics. The proposed methodology for aircraft trajectory forecasting is discussed in detail, accompanied by a literature review and an overview of the data preprocessing techniques used. The results demonstrate that the proposed meta-learner outperforms the individual models in terms of accuracy, providing a more robust and proactive approach to improving operational safety in GA.
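One possible reading of the k-NN combination step is sketched below: the two base model outputs form a feature pair, and the final prediction is the mean target of the k closest held-out pairs. This is only an assumed interpretation, not the authors' implementation. The function names, the held-out rows, and the value of k are hypothetical, and the base model predictions are supplied as plain numbers instead of being produced by trained RFR and LSTM models.

```python
import math


def knn_meta_predict(meta_features, meta_targets, query, k=3):
    """Average the targets of the k stored rows closest to `query`.

    meta_features: list of (rfr_prediction, lstm_prediction) pairs from a
                   held-out set (hypothetical values in this sketch).
    meta_targets:  true values observed for those rows.
    """
    ranked = sorted(
        range(len(meta_features)),
        key=lambda i: math.dist(meta_features[i], query),
    )
    nearest = ranked[:k]
    return sum(meta_targets[i] for i in nearest) / len(nearest)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical held-out altitude predictions from the two base models.
meta_features = [(1000.0, 1010.0), (1100.0, 1090.0), (1200.0, 1215.0), (1300.0, 1295.0)]
meta_targets = [1004.0, 1096.0, 1208.0, 1299.0]

# Combine a new pair of base predictions into one final estimate.
final_prediction = knn_meta_predict(meta_features, meta_targets, (1105.0, 1095.0), k=2)
```

The same `mae` and `rmse` helpers can score the base models and the combined output on a common test set, which is how a comparison like the one reported in the abstract would typically be set up.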
Funding: Supported by the EU Horizon 2020 Project Marketplace, No. 760173.
Abstract: The commercialization of perovskite solar cells (PSCs) is hindered by the instability of organic components and the resource-intensive nature of experimental optimization. Machine learning (ML) is revolutionizing the discovery and optimization of photovoltaic devices by reducing reliance on conventional trial-and-error approaches. This study aims to optimize the performance of CsPbI₃-based all-inorganic PSCs using a combined SCAPS-1D and ML approach. We generated 56,390 unique device configurations via SCAPS-1D simulations, varying layer thicknesses and defect densities. Five ML models were trained, with XGBoost achieving the highest accuracy (R² = 0.999). Feature importance was analyzed using SHAP. Optimization increased the power conversion efficiency (PCE) from 15.15% to 19.16%, with the perovskite layer thickness (2 μm) and defect density (<10¹⁵ cm⁻³) identified as critical parameters. This study highlights the potential of ML-driven optimization in perovskite solar cells, offering a systematic and data-driven approach to enhancing device efficiency and accelerating the development of next-generation photovoltaics.
Funding: Financially supported by the Hainan Province Science and Technology Special Fund (Grant No. ZDYF2021GXJS038 and Grant No. ZDYF2024XDNY196), the Hainan Provincial Natural Science Foundation of China (Grant No. 320RC486), and the National Natural Science Foundation of China (Grant No. 42167011).
Abstract: Nitrogen (N) is a pivotal factor influencing the growth, development, and yield of maize. Rapid, non-destructive, real-time monitoring of maize N status based on unmanned aerial vehicle (UAV) remote sensing is valuable for agricultural fertilization management. In this study, hyperspectral images were acquired by UAV, and the leaf nitrogen content (LNC) and leaf nitrogen accumulation (LNA) were measured to estimate the N nutrition status of maize. Twenty-four vegetation indices (VIs) were constructed from the hyperspectral images, and four prediction models were used to estimate the LNC and LNA of maize: a single linear regression model, a multivariable linear regression (MLR) model, a random forest regression (RFR) model, and a support vector regression (SVR) model. Moreover, the model with the highest prediction accuracy was applied to invert the LNC and LNA of maize in breeding fields. The single linear regression results for the 24 VIs showed that normalized difference chlorophyll (NDchl) had the highest prediction accuracy for LNC (R², RMSE, and RE of 0.72, 0.21, and 12.19%, respectively) and LNA (R², RMSE, and RE of 0.77, 0.26, and 14.34%, respectively). The 24 VIs were then divided into 13 important VIs and 11 unimportant VIs. Three prediction models for LNC and LNA were constructed using the 13 important VIs, and the results showed that the RFR and SVR models significantly enhanced the prediction accuracy of LNC and LNA compared with the MLR model. Among them, the RFR model had the highest prediction accuracy on the validation dataset for LNC (R², RMSE, and RE of 0.78, 0.16, and 8.83%, respectively) and LNA (R², RMSE, and RE of 0.85, 0.19, and 9.88%, respectively). This study provides a theoretical basis for N diagnosis and precise management of crop production based on hyperspectral remote sensing in precision agriculture.
Abstract: Early crop yield prediction provides critical information for Precision Agriculture (PA) procedures, policymaking, and food security. The availability of Remote Sensing (RS) datasets and Machine Learning (ML) approaches has improved the prediction of sugarcane crop yield on local and global scales, but additional effort on plot-scale prediction is required. Challenges for plot-level prediction include the high ratooning capacity of the sugarcane crop, the lack of high spatial resolution data during the critical growth stages, and the non-linear complexity of yield data. The principal objective of the study is to analyse the potential of a time series of high-resolution multispectral Unmanned Aerial Vehicle (UAV) imagery, along with three advanced ML techniques, namely Random Forest Regression (RFR), Support Vector Regression (SVR), and Nonlinear Autoregressive Exogenous Artificial Neural Network (NARX ANN), as a solution to plot-level sugarcane yield prediction. An experimental sugarcane field containing 48 plots was selected, and UAV imagery was collected during the early and middle crop growth stages of three consecutive cropping seasons. Each dataset per growth stage was analyzed separately to predict the sugarcane crop yield in an attempt to discover how early the pre-harvest yield can be predicted. The datasets of the first two cropping seasons were used to train and test the three ML techniques, utilizing 10-fold cross-validation to avoid overfitting. The third cropping season dataset was then used to evaluate the reliability of the developed prediction models. The results show that the correlation of Vegetation Indices (VIs) with crop yield in the middle stage outperforms that in the early stage for all three ML models. Moreover, comparing these models indicates that the NARX ANN method outperformed the others in the middle stage, with the highest coefficient of determination (R²) of 0.96 and the lowest Root Mean Square Error (RMSE) of 4.92 t/ha. It was followed by the SVR (R² = 0.52, RMSE = 14.85 t/ha), which performed similarly to the RFR method (R² = 0.48, RMSE = 11.20 t/ha). In conclusion, the best-suited model for predicting sugarcane yields during the middle growth stage is a NARX ANN model employing the Normalized Difference RedEdge (NDRE) index, which demonstrates the feasibility of ML approaches for predicting plot-level sugarcane yield at a specific period of growth, as they are less sensitive to inconsistency in data collection times.
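For readers unfamiliar with the index named in this abstract, the sketch below computes NDRE from near-infrared and red-edge reflectance using its standard normalized difference form, (NIR - RedEdge) / (NIR + RedEdge). The function name, the zero-denominator handling, and the sample reflectance values are assumptions for illustration and do not come from the study.

```python
def ndre(nir, red_edge):
    """Normalized Difference RedEdge index for one pixel or plot mean.

    nir, red_edge: reflectance values in the near-infrared and red-edge bands.
    Returns 0.0 when both bands are zero (an assumed convention for no-data
    pixels, chosen here only to avoid division by zero).
    """
    denominator = nir + red_edge
    if denominator == 0:
        return 0.0
    return (nir - red_edge) / denominator


# Hypothetical plot-level mean reflectances for three plots.
plots = [(0.50, 0.30), (0.42, 0.35), (0.60, 0.20)]
ndre_values = [ndre(nir, red_edge) for nir, red_edge in plots]
```

Values computed this way per plot and per acquisition date would form the kind of VI time series that a yield model takes as input.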