Stand age plays a crucial role in forest biomass estimation and carbon cycle modeling. Assessing the uncertainty of stand age prediction models and identifying the key driving factors in the modeling process have become major challenges in forestry research. In this study, we selected the Shaanxi-Gansu-Ningxia region of Northwest China as the research area and used multi-source datasets from the summer of 2019 to extract information on spectral, textural, climatic, water balance, and stand characteristics. By integrating the Random Forest (RF) model with Monte Carlo (MC) simulation, we constructed six regression models based on different combinations of features and evaluated the uncertainty of each model. Furthermore, we investigated the driving factors influencing stand age modeling by analyzing the effects of different types of features on age inversion. Model performance and accuracy were assessed using the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R^2), while the relative root mean square error (rRMSE) was employed to quantify model uncertainty. The results indicate that the scenarios with the clearest improvement in accuracy and reduction in uncertainty were Scenario 3, which added climate and water balance information (RMSE = 25.54 yr, MAE = 18.03 yr, R^2 = 0.51, rRMSE = 19.17%), and Scenario 5, which added stand characteristic information (RMSE = 18.47 yr, MAE = 13.05 yr, R^2 = 0.74, rRMSE = 16.99%). Scenario 6, incorporating all feature types, achieved the highest accuracy (RMSE = 17.60 yr, MAE = 12.06 yr, R^2 = 0.77, rRMSE = 14.19%). Elevation, minimum temperature, and diameter at breast height (DBH) emerged as the key drivers of stand-age modeling. The proposed method can be used to identify drivers and to quantify uncertainty in stand-age estimation, providing a useful reference for improving model accuracy and uncertainty assessment.
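The four scores named in the stand-age abstract can be sketched in a few lines. This is only an illustration: the abstract does not give its formula for rRMSE, so the sketch assumes the common definition (RMSE divided by the mean observed value, in percent), and the function name `regression_metrics` and the toy ages are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, R^2 and rRMSE (percent) for paired values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: rRMSE is RMSE normalised by the mean observed value.
    rrmse = 100.0 * rmse / mean_obs
    return {"rmse": rmse, "mae": mae, "r2": r2, "rrmse": rrmse}


# Toy stand ages in years (made-up values, not data from the study).
metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Because rRMSE is scaled by the mean observation, it can be compared across scenarios whose targets have different magnitudes, which may be why it is used here as the uncertainty measure.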
Slope units are divided according to the real topography and have clear geological characteristics, making them ideal units for evaluating susceptibility to geological disasters. Based on the results of automatic slope unit division and manually corrected hydrological slope unit division, Longhua District, Shenzhen City, Guangdong Province, was selected as the study area. A total of 15 influencing factors, namely fluctuation, slope, slope aspect, curvature, topographic wetness index (TWI), stream power index (SPI), topographic roughness index (TRI), annual average rainfall, distance to water system, engineering rock group, distance to fault, land use, normalized difference vegetation index (NDVI), nighttime light, and distance to road, were selected as evaluation indicators. The information volume (IV) model and random points were used to select non-geological disaster units, and then the random forest (RF) model was used to evaluate the susceptibility to geological disasters. The automatic slope unit and the hydrological slope unit were compared in the RF and IV-RF models. The results show that the area under the curve (AUC) values of the automatic slope unit evaluation results are 0.931 for the IV-RF model and 0.716 for the RF model, which are 0.6% (IV-RF model) and 1.9% (RF model) higher than those for the hydrological slope unit. A comparison of the evaluation methods based on the two types of slope units shows that the hydrological slope unit method based on manual correction is highly subjective, complicated to operate, and less accurate, whereas the method based on automatic slope unit division is efficient and accurate, is suitable for large-scale geological disaster evaluation, and can better deal with the problem of geological disaster susceptibility evaluation.
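The AUC values quoted for the slope-unit models can be read as the probability that a randomly chosen disaster unit receives a higher susceptibility score than a randomly chosen non-disaster unit. A minimal sketch of that pairwise definition follows; the function name `auc_from_scores` and the toy labels and scores are hypothetical and unrelated to the Longhua data.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative unit")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 1 = disaster unit, 0 = non-disaster unit (toy values).
labels = [0, 0, 1, 0, 1]
scores = [0.2, 0.3, 0.6, 0.7, 0.9]
auc = auc_from_scores(labels, scores)
```

The double loop is quadratic in the number of units; for large inventories a rank-based computation gives the same value more cheaply.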
Evaluation of water richness in sandstone is an important research topic in the prevention and control of mine water disasters, and the water richness of sandstone is closely related to its porosity. Reflection seismic exploration data have high-density spatial sampling, which provides an important data basis for predicting sandstone porosity in coal seam roofs. First, the basic principles of the variational mode decomposition (VMD) method and the random forest method are introduced. Then, a geological model of coal seam roof sandstone is constructed, seismic forward modeling is conducted, and random noise is added. The decomposition effects of the empirical mode decomposition (EMD) method and the VMD method on noisy signals are compared and analyzed. The test results show that the first-order intrinsic mode function (IMF1) and IMF2 decomposed by the VMD method contain the main effective components of the seismic signals. A prediction process for sandstone porosity in coal seam roofs based on the combination of VMD and the random forest method is proposed. The feasibility and effectiveness of the method are verified by trial calculation on model data. Taking actual coalfield reflection seismic data as an example, the sandstone porosity of the No. 8 coal seam roof is predicted. The application results show the potential application value of the proposed porosity prediction method. This method has important theoretical guiding significance for evaluating water richness in coal seam roof sandstone and for the prevention and control of mine water disasters.
To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. In the theoretical section, the model introduction and estimation algorithms are provided. In the empirical analysis section, global air quality data from 2022 to 2024 are used, and the proposed method is applied. Specifically, principal component analysis (PCA) is first conducted, and then VAR and Random Forest methods are used for prediction on the reduced-dimensional data. The results show that the RMSE of the hybrid model is 45.27, significantly lower than the 49.11 of the VAR model alone, verifying its superiority. The stability and predictive performance of the model are effectively enhanced.
One of the core tasks in analyzing Electrochemical Impedance Spectroscopy (EIS) data is to select an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this process often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It can automatically select the most suitable equivalent circuit model based on the characteristics and patterns of EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel, which includes five different corrosion scenarios, is constructed. This dataset was used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) a method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm is proposed; (2) using authentic EIS data collected from metal atmospheric corrosion, a dataset encompassing five categories of metal corrosion scenarios is established; (3) the superiority of the proposed method is validated on the established authentic EIS dataset. The experimental results demonstrate that, in terms of equivalent circuit matching, this method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and its security risk assessment methods have garnered increasing attention from the industry. Current agricultural IoT security risk assessment methods primarily rely on expert judgment, introducing subjective factors that reduce the credibility of the assessment results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports. A PCARF algorithm, built on random forest principles, was proposed, incorporating ensemble learning strategies to enhance prediction accuracy. Compared to the second-best model, the proposed model demonstrated a 2.7% increase in accuracy, a 3.4% improvement in recall rate, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% boost in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms others in prediction accuracy and robustness.
Mountainous rangelands play a pivotal role in providing forage resources for livestock, particularly in summer, and in maintaining ecological balance. This study aimed to identify the environmental variables affecting the distribution of range plant species, to analyze the ecological relationship between these variables and plant distribution, and to model and map plant habitat suitability using the Random Forest Method (RFM) in the rangelands of Taftan Mountain, Sistan and Baluchestan Province, southeastern Iran. To determine the environmental variables and estimate the potential distribution of plant species, plant presence points were recorded using a systematic random sampling method (90 presence points), and soils were sampled randomly in 5 habitats at depths of 0–30 and 30–60 cm. The layers of environmental variables were prepared using the Kriging interpolation method and Geographic Information System facilities. The distribution of the plant habitats was then modelled and mapped by the RFM. Continuous habitat suitability maps were converted to binary maps using the Youden index in order to evaluate the accuracy of the RFM in estimating the potential habitat distribution of the species. Based on the area under the curve (AUC) statistics, the accuracy of the predictive models was good for all habitats. The agreement between the predicted map generated by each model and the actual map generated from field-measured data was high for all habitats except the Amygdalus scoparia habitat. This study concluded that the RFM is a robust model for analyzing the relationships between the distribution of plant species and environmental variables, and for preparing potential distribution maps of plant habitats that are of higher priority for conservation at the local scale in arid mountainous rangelands.
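The conversion of continuous suitability maps to binary maps with the Youden index can be illustrated as follows. The sketch assumes the standard definition J = sensitivity + specificity - 1 and picks the cut-off that maximises J. The names `youden_threshold` and `to_binary` and the toy suitability scores are hypothetical, since the abstract does not describe the implementation.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A cell is predicted as suitable habitat when its score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_t, best_j = t, j
    return best_t, best_j


def to_binary(scores, threshold):
    """Turn continuous suitability scores into presence (1) / absence (0)."""
    return [1 if s >= threshold else 0 for s in scores]


# 1 = species present, 0 = absent; scores are toy suitability values.
labels = [0, 0, 1, 0, 1]
scores = [0.2, 0.3, 0.6, 0.7, 0.9]
threshold, j_stat = youden_threshold(labels, scores)
binary_map = to_binary(scores, threshold)
```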
The automatic recognition of landforms is regarded as one of the most important procedures for classifying landforms and deepening the understanding of the morphology of the earth. However, landform types are rather complex and often change gradually, which makes automatic recognition and classification difficult. In this study, small-scale watersheds, which are regarded as natural geomorphological elements, were extracted from SRTM DEM data and selected as the basic analysis and recognition units. In addition, datasets combining terrain derivatives (e.g., average slope gradient and elevation range) and texture derivatives (e.g., slope gradient contrast and elevation variance) were constructed to quantify the topographical characteristics of the watersheds. Finally, the Random Forest (RF) method was employed to automatically select features and classify landforms based on their topographical characteristics. The proposed method was applied and validated in seven case areas in the Northern Shaanxi Loess Plateau, chosen for its complex and gradually changing landforms. Experimental results show that the highest recognition accuracy based on the selected derivatives is 92.06%. During the recognition procedure, the contributions of terrain derivatives were higher than those of texture derivatives within the selected derivative datasets. Loess terrace and loess mid-mountain obtained the highest accuracy among the seven typical loess landforms, whereas the recognition precision of loess hill, loess hill–ridge, and loess sloping ridge was relatively low. The experiment also shows that the watershed-based strategy achieves better results than the object-based strategy, and that the RF method can effectively extract and recognize landform features.
As one of the most widely used non-ferrous structural metals in industry and production, aluminum (Al) alloy is of great value to the national economy and industrial manufacturing. Classifying Al alloys rapidly and accurately is therefore a significant task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it must be combined with a suitable algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a major role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloys based on LIBS and the RF algorithm. The results show that the best accuracy reached when classifying Al alloy samples with this method is 98.59%, and the average accuracy is 98.45%. The paper also describes how the accuracy varies with the number of trees in the RF and the size of the training sample set. Based on these relationships, researchers can find optimized parameters for the RF algorithm in order to achieve good results. These results show that LIBS combined with the RF algorithm can classify Al alloys rapidly and with high accuracy, which has significant practical value.
Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper studies three problems in intelligent lithology identification: difficult feature extraction, low precision in thin-layer identification, and limited model applicability. The authors seek to improve the comprehensive performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging while drilling (LWD) parameters that most significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in applications to 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model has a higher lithology identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model supports efficient real-time intelligent identification of lithologic information in closed-loop drilling and has broad applicability, which makes it a promising tool for logging interpretation.
Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data are an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment of acid production with flue gas is complex and involves much equipment. The data obtained by the detection equipment are seriously polluted and prone to abnormal phenomena such as data loss and outliers. Therefore, to solve the problem of abnormal data in the process of acid production with flue gas, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm is used to optimize the hyperparameters of the random forest regression model. The optimal parameter combination is then found in the search space and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the data missing after the abnormal data are eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for the abnormal data in the process of acid production with flue gas. The method improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for the subsequent temperature control. The conversion rate of SO2 can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
As a complex and much-studied problem in the financial field, stock trend forecasting uses a large amount of data and many related indicators; hence it is difficult to obtain sustainable and effective results by relying on empirical analysis alone. Researchers in the field of machine learning have shown that random forest can form better judgements on this kind of problem, and that it has an auxiliary role in the prediction of stock trends. This study uses historical trading data of four listed companies in the US stock market, and its purpose is to improve the performance of the random forest model in medium- and long-term stock trend prediction. This study applies the exponential smoothing method to process the initial data, calculates the relevant technical indicators as candidate features, and proposes the D-RF-RS method to optimize random forest. As random forest is an ensemble learning model closely related to the decision tree, the D-RF-RS method uses a decision tree to screen features by importance and obtains an effective strong feature set as the model input. Then, the parameter combination of the model is optimized through random parameter search. The experimental results show that the average accuracy of random forest is increased by 0.17 after this optimization, which is 0.18 higher than the average accuracy of the light gradient boosting machine model. Together with the ROC curve and Precision-Recall curve results, the stability of the model is also confirmed, which further demonstrates the advantages of random forest in medium- and long-term trend prediction of the stock market.
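The exponential smoothing step applied to the raw price series can be sketched as below. This assumes simple (single) exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with the first value used as the seed. The abstract does not state which variant or smoothing factor was used, so `alpha` and the toy prices are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing seeded with the first observation."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        # New value is a weighted blend of the observation and the last state.
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Toy closing prices (made-up values).
smoothed_prices = exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
```

A smaller `alpha` damps day-to-day noise more strongly, which is one plausible reason to smooth before computing technical indicators for medium- and long-term trends.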
The proliferation of robot accounts on social media platforms has had a significant negative impact, necessitating robust measures to counter network anomalies and safeguard content integrity. Social robot detection has emerged as a pivotal yet intricate task, aimed at mitigating the dissemination of misleading information. While graph-based approaches have attained remarkable performance in this realm, they grapple with a fundamental limitation: the homogeneity assumption in graph convolution allows social robots to evade detection by mingling with genuine human profiles. To address this challenge and counter such camouflage tactics, this work proposes a social robot detection framework based on enhanced HOmogeneity and Random Forest (HORFBot). At the core of HORFBot lies a homogeneous graph enhancement strategy, combined with edge-removal techniques, that splits the graph into multiple subgraphs. Subsequently, using contrastive learning, the proposed methodology trains multiple graph convolutional networks, each tuned to one of these tailored subgraphs. The final stage fuses these feature-rich base classifiers, aggregating their outputs to produce a comprehensive detection outcome. Extensive experiments on three social robot detection datasets show that this method effectively improves the accuracy of social robot detection and outperforms comparative methods.
With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate and rapid reading and classification of the electricity consumption of residential users can provide a more in-depth perception of the actual power consumption of residents, which is essential for ensuring the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for classifying residential electricity consumption. It uses the out-of-bag error of random forest together with the Drosophila algorithm to optimize the internal parameters of the random forest, thereby improving the performance of the random forest algorithm. This method uses MapReduce to train the improved random forest model on the cloud computing platform, then uses the trained model to analyze the residential electricity consumption dataset, divides all residents into 5 categories, and verifies the effectiveness and feasibility of the model through experiments.
Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. In order to study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a combined Monte Carlo and Random Forest Regression (MC-RFR) method is developed to construct the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling was added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR and the future landscape pattern modeled by a combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a linear regression model. By 2030, the overall SUHI effect of Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers should concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of sustainability.
Massive Open Online Courses (MOOCs) have become a popular way of online learning used across the world by millions of people. Meanwhile, a vast amount of information has been collected from MOOC learners and institutions. Based on these educational data, many studies have investigated the prediction of MOOC learners' final grades. However, two problems remain in this research field. The first is how to select the most suitable features to improve prediction accuracy, and the second is how to use or modify data mining algorithms for a better analysis of MOOC data. To solve these two problems, an improved random forests method is proposed in this paper. First, a hybrid indicator is defined to measure the importance of the features, and a rule is further established for feature selection; then, a Clustering-Synthetic Minority Over-sampling Technique (SMOTE) is embedded into the traditional random forests algorithm to solve the class imbalance problem. In the experimental part, we verify the performance of the proposed method using the Canvas Network Person-Course (CNPC) dataset. Furthermore, four well-known prediction methods are applied for comparison, and the results demonstrate the superiority of our method.
The moisture control of materials in the silk reeling technology of tobacco is regarded as the key factor influencing the inner quality of cigarettes. Based on statistical data from the silk reeling production line of Yunyan (Ruanzhen brand) at the Qujing cigarette factory from June 2013 to May 2014, this paper shows that it is feasible to apply the random forest regression model to study the moisture control problem. In the perfuming stage of silk reeling, a random forest regression model is established to describe the change in moisture content of finished cut tobacco at the export link of the perfuming stage, caused by several factors including incoming water content and different environments. According to the model, good moisture control at the export link of the perfuming stage (in accordance with the technological standards) can be realized by adjusting the regulating reference value of incoming moisture under specific workshop environments. In the drying stage of silk reeling, analysis of the correlation coefficients among the variables that influence the moisture content of cut tobacco at the export link of the drying stage, followed by a second random forest regression model, shows that the most effective method of moisture control is to adjust the cylinder wall temperature. This method is consistent with traditional production experience. In conclusion, these methods provide a strong theoretical basis for stable moisture control in the export link of perfuming stage.
The discretization of random fields is the first and most important step in the stochastic analysis of engineering structures with spatially dependent random parameters. The essential step of discretization is solving the Fredholm integral equation to obtain the eigenvalues and eigenfunctions of the covariance functions of the random fields. The collocation method, which has fewer integral operations, accomplishes this task more efficiently than the time-consuming Galerkin method, and it is more suitable for engineering applications with complex geometries and a large number of elements. With the help of isogeometric analysis, which preserves accurate geometry in analysis, the isogeometric collocation method can efficiently achieve results with sufficient accuracy. An adaptive moment abscissa is proposed to calculate the coordinates of the collocation points to further improve the accuracy of the collocation method. The adaptive moment abscissae led to more accurate results than the classical Greville abscissae when using the moment parameter optimized with intelligent algorithms. Numerical and engineering examples illustrate the advantages of the proposed isogeometric collocation method based on the adaptive moment abscissae over existing methods in terms of accuracy and efficiency.
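For orientation, the eigenproblem referred to in the random-field abstract is usually written as a Fredholm integral equation of the second kind, and its solutions feed a truncated Karhunen-Loève expansion. The abstract does not name the expansion or give notation, so the symbols below (covariance C, domain D, truncation order M, basis functions N_j, collocation points x_k) are assumptions chosen for illustration.

```latex
% Eigenproblem for the covariance function C of the random field
\int_{D} C(\mathbf{x}, \mathbf{x}')\,\phi_i(\mathbf{x}')\,\mathrm{d}\mathbf{x}'
  = \lambda_i\,\phi_i(\mathbf{x}), \qquad \mathbf{x} \in D

% Truncated Karhunen-Loeve expansion built from the eigenpairs
H(\mathbf{x}, \theta) \approx \mu(\mathbf{x})
  + \sum_{i=1}^{M} \sqrt{\lambda_i}\,\xi_i(\theta)\,\phi_i(\mathbf{x})

% Collocation: expand \phi_i in basis functions N_j and enforce the
% eigenproblem exactly at the collocation points \mathbf{x}_k
\phi_i(\mathbf{x}) \approx \sum_{j=1}^{n} d_{ij}\,N_j(\mathbf{x}), \qquad
\sum_{j=1}^{n} d_{ij} \int_{D} C(\mathbf{x}_k, \mathbf{x}')\,N_j(\mathbf{x}')\,\mathrm{d}\mathbf{x}'
  = \lambda_i \sum_{j=1}^{n} d_{ij}\,N_j(\mathbf{x}_k), \quad k = 1, \dots, n
```

Collocation needs only a single integral per matrix entry, whereas a Galerkin projection needs a double integral, which is consistent with the efficiency argument in the abstract.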
This paper explores the synergistic effect of a model combining Elastic Net and Random Forest in online fraud detection. The study selects a public network dataset containing 1781 data records, divides the dataset into 70% for training and 30% for validation, and analyses the correlation between features using a correlation matrix. The experimental results show that the Elastic Net feature selection method generally outperforms PCA in all models, especially when combined with the Random Forest and XGBoost models. The Elastic Net + Random Forest model achieves the highest accuracy of 0.968 and an AUC value of 0.983, while the Kappa and MCC reach 0.839 and 0.844 respectively, showing extremely high consistency and correlation. This indicates that combining Elastic Net feature selection with the Random Forest model has significant performance advantages in online fraud detection.
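The Kappa and MCC scores reported alongside accuracy and AUC can both be computed from a binary confusion matrix. The sketch below uses the standard definitions of Cohen's kappa and the Matthews correlation coefficient; the function name `kappa_and_mcc` and the counts are hypothetical and do not come from the fraud dataset.

```python
import math


def kappa_and_mcc(tp, fp, fn, tn):
    """Cohen's kappa and Matthews correlation coefficient from 2x2 counts."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # plain accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return kappa, mcc


# Toy confusion matrix: 40 TP, 10 FP, 5 FN, 45 TN (made-up counts).
kappa, mcc = kappa_and_mcc(tp=40, fp=10, fn=5, tn=45)
```

Both scores correct for class imbalance in ways plain accuracy does not, which matters for fraud data where positive cases are typically rare.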
Accurate spatial prediction of soil organic carbon(SOC)and soil inorganic carbon(SIC)is vital for land management decisions.This study targets SOC/SIC mapping challenges at the watershed scale in central Iran by addre...Accurate spatial prediction of soil organic carbon(SOC)and soil inorganic carbon(SIC)is vital for land management decisions.This study targets SOC/SIC mapping challenges at the watershed scale in central Iran by addressing environmental heterogeneity through a random forest(RF)model combined with bootstrapping to assess prediction uncertainty.Thirty-eight environmental variables-categorized into climatic,soil physicochemical,topographic,geomorphic,and remote sensing(RS)-based factors-were considered.Variable importance analysis(via)and partial dependence plots(PDP)identified land use,RS indices,and topography as key predictors of SOC.For SIC,soil reflectance(Bands 5 and 7,ETM+),topography,and geomorphic units were most influential.Climatic factors showed minimal impact in the studied semi-arid watershed.The RF model achieved moderate prediction accuracy(SOC:R^(2)=0.43±0.13,nRMSE=0.28;SIC:R^(2)=0.47±0.11,nRMSE=0.37).Via and PDP analyses enhanced model interpretability by clarifying environmental influences on SOC/SIC spatial distribution.展开更多
Funding: Under the auspices of the Natural Science Foundation of China (No. 32371875, 32001249).
Abstract: Stand age plays a crucial role in forest biomass estimation and carbon cycle modeling. Assessing the uncertainty of stand age prediction models and identifying the key driving factors in the modeling process have become major challenges in forestry research. In this study, we selected the Shaanxi-Gansu-Ningxia region of Northwest China as the research area and used multi-source datasets from the summer of 2019 to extract spectral, textural, climatic, water balance, and stand characteristic information. By integrating the Random Forest (RF) model with Monte Carlo (MC) simulation, we constructed six regression models based on different combinations of features and evaluated the uncertainty of each model. Furthermore, we investigated the driving factors influencing stand age modeling by analyzing the effects of different types of features on age inversion. Model performance and accuracy were assessed using the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R^2), while the relative root mean square error (rRMSE) was used to quantify model uncertainty. The results indicate that the scenarios with the clearest improvement in accuracy and reduction in uncertainty were Scenario 3, which added climate and water balance information (RMSE = 25.54 yr, MAE = 18.03 yr, R^2 = 0.51, rRMSE = 19.17%), and Scenario 5, which added stand characteristic information (RMSE = 18.47 yr, MAE = 13.05 yr, R^2 = 0.74, rRMSE = 16.99%). Scenario 6, incorporating all feature types, achieved the highest accuracy (RMSE = 17.60 yr, MAE = 12.06 yr, R^2 = 0.77, rRMSE = 14.19%). Elevation, minimum temperature, and diameter at breast height (DBH) emerged as the key drivers of stand age modeling. The proposed method can be used to identify drivers and to quantify uncertainty in stand age estimation, providing a useful reference for improving model accuracy and uncertainty assessment.
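The four scores named in this abstract can be computed as in the following minimal Python sketch. It is an illustration only, not the authors' code: the function name `regression_metrics`, the toy age values, and in particular the rRMSE definition (RMSE divided by the mean observed value, expressed as a percentage) are assumptions, since the abstract does not state which normalization was used.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, R^2 and rRMSE (%) for paired observations and predictions."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0 or mean_obs == 0:
        raise ValueError("observations must vary and have a non-zero mean")

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: rRMSE is RMSE normalised by the mean observed value.
    rrmse = 100.0 * rmse / mean_obs
    return {"rmse": rmse, "mae": mae, "r2": r2, "rrmse": rrmse}


# Hypothetical stand ages (years) and model predictions, for illustration only.
ages = [20.0, 40.0, 60.0, 80.0]
preds = [25.0, 35.0, 65.0, 75.0]
metrics = regression_metrics(ages, preds)
```

In a Monte Carlo setting, one plausible use is to call such a function once per simulated run and summarize the spread of the resulting scores; whether the study did exactly this is not stated in the abstract.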
Abstract: Slope units are divided according to the real topography and have clear geological characteristics, making them ideal units for evaluating susceptibility to geological disasters. Longhua District, Shenzhen City, Guangdong Province, was selected as the study area, and slope units were obtained by both automatic division and manually corrected hydrological division. A total of 15 influencing factors were selected as evaluation indicators: terrain fluctuation, slope, slope aspect, curvature, topographic wetness index (TWI), stream power index (SPI), topographic roughness index (TRI), annual average rainfall, distance to water system, engineering rock group, distance to fault, land use, normalized difference vegetation index (NDVI), nighttime light, and distance to road. The information volume (IV) model and random points were used to select non-geological disaster units, and the random forest (RF) model was then used to evaluate the susceptibility to geological disasters. The automatic slope unit and the hydrological slope unit were compared and analyzed in the RF and IV-RF models. The results show that the area under the curve (AUC) values of the automatic slope unit evaluation results are 0.931 for the IV-RF model and 0.716 for the RF model, which are 0.6% (IV-RF model) and 1.9% (RF model) higher than those for the hydrological slope unit. Comparing the evaluation methods based on the two types of slope units, the hydrological slope unit method based on manual correction is highly subjective, complicated to operate, and less accurate, whereas the method based on automatic slope unit division is efficient and accurate, is suitable for large-scale geological disaster evaluation, and handles geological disaster susceptibility evaluation better.
Funding: National Natural Science Foundation of China (Grant No. 42274180); National Key Research and Development Program of China (2021YFC2902003).
Abstract: Evaluation of water richness in sandstone is an important research topic in the prevention and control of mine water disasters, and the water richness of sandstone is closely related to its porosity. Reflection seismic exploration data provide high-density spatial sampling, which offers an important data basis for predicting sandstone porosity in coal seam roofs. First, the basic principles of the variational mode decomposition (VMD) method and the random forest method are introduced. Then, a geological model of coal seam roof sandstone is constructed, seismic forward modeling is conducted, and random noise is added. The decomposition effects of the empirical mode decomposition (EMD) method and the VMD method on noisy signals are compared and analyzed. The test results show that the first-order and second-order intrinsic mode functions (IMF1 and IMF2) decomposed by the VMD method contain the main effective components of the seismic signals. A prediction process for sandstone porosity in coal seam roofs based on the combination of VMD and the random forest method is proposed. The feasibility and effectiveness of the method are verified by trial calculation of porosity prediction on model data. Taking actual coalfield reflection seismic data as an example, the sandstone porosity of the roof of coal seam 8 is predicted. The application results show the potential value of the proposed porosity prediction method, which offers theoretical guidance for evaluating water richness in coal seam roof sandstone and for the prevention and control of mine water disasters.
Abstract: To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the models and their estimation algorithms. The empirical section applies the proposed method to global air quality data from 2022 to 2024. Specifically, principal component analysis (PCA) is conducted first, and the VAR and Random Forest methods are then used for prediction on the dimension-reduced data. The results show that the RMSE of the hybrid model is 45.27, lower than the 49.11 of the VAR model alone, which supports its superiority. The stability and predictive performance of the model are both improved.
Funding: Supported by the National Key R&D Program of China project "Research and Application of Sensing System for Cross-regional Complex Oil & Gas Pipeline Network Safe and Efficiency Operational Status Monitoring" (Grant No. 2022YFB3207603).
Abstract: A core task in analyzing Electrochemical Impedance Spectroscopy (EIS) data is selecting an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this selection often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It automatically selects the most suitable equivalent circuit model based on the characteristics and patterns of the EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel covering five different corrosion scenarios is constructed and used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) a method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm is proposed; (2) using authentic EIS data collected from metal atmospheric corrosion, a dataset encompassing five categories of metal corrosion scenarios is established; (3) the superiority of the proposed method is validated on this dataset. The experimental results demonstrate that, in equivalent circuit matching, the method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
Abstract: The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and its security risk assessment methods have attracted increasing attention from the industry. Current agricultural IoT security risk assessment methods rely primarily on expert judgment, which introduces subjective factors that reduce the credibility of the assessment results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports. A PCARF algorithm, built on random forest principles, was proposed, incorporating ensemble learning strategies to enhance prediction accuracy. Compared to the second-best model, the proposed model demonstrated a 2.7% increase in accuracy, a 3.4% improvement in recall, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% increase in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms the others in prediction accuracy and robustness.
Funding: University of Zabol, Iran (Grant No. UOZ-GR-9517-24); Vice Chancellery for Research and Technology, University of Zabol.
Abstract: Mountainous rangelands play a pivotal role in providing forage resources for livestock, particularly in summer, and in maintaining ecological balance. This study aimed to identify the environmental variables affecting the distribution of range plant species, to analyze the ecological relationship between these variables and plant distribution, and to model and map plant habitat suitability with the Random Forest Method (RFM) in the rangelands of Taftan Mountain, Sistan and Baluchestan Province, southeastern Iran. To determine the environmental variables and estimate the potential distribution of plant species, the presence points of plants were recorded using systematic random sampling (90 presence points), and soils were sampled randomly in 5 habitats at depths of 0–30 and 30–60 cm. The layers of environmental variables were prepared using Kriging interpolation and Geographic Information System facilities. The distribution of the plant habitats was then modelled and mapped by the RFM. Continuous habitat suitability maps were converted to binary maps using the Youden index in order to evaluate the accuracy of the RFM in estimating the potential habitat of each species. Based on the area under the curve (AUC) statistics, the accuracy of the predictive models was good for all habitats. The agreement between the predicted map generated by each model and the actual map generated from field-measured data was high for all habitats except the Amygdalus scoparia habitat. This study concluded that the RFM is a robust model for analyzing the relationships between the distribution of plant species and environmental variables, as well as for preparing potential distribution maps of plant habitats that are of higher priority for conservation at the local scale in arid mountainous rangelands.
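The conversion from a continuous suitability map to a binary map by the Youden index can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the function name `youden_threshold`, the toy scores, and the availability of absence (or pseudo-absence) labels alongside the presence points are all hypothetical. The Youden statistic is J = sensitivity + specificity - 1, and the threshold that maximizes J is used as the cut-off.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    scores: continuous suitability values, one per validation point.
    labels: 1 for presence, 0 for absence (or pseudo-absence).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both presence and absence points")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A point is predicted suitable when its score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical suitability scores and presence/absence labels.
suitability = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
presence = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, j_stat = youden_threshold(suitability, presence)
binary_map = [1 if s >= threshold else 0 for s in suitability]
```

Scanning only the observed score values keeps the search exhaustive for this criterion, because J can change only when the threshold crosses one of those values.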
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41601411, 41571398, 41671389) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) (Grant No. 164320H101).
Abstract: The automatic recognition of landforms is regarded as one of the most important procedures for classifying landforms and deepening the understanding of the morphology of the earth. However, landform types are complex and often change gradually, which increases the difficulty of recognizing and classifying them automatically. In this study, small-scale watersheds, which are regarded as natural geomorphological elements, were extracted from SRTM DEM data and selected as the basic analysis and recognition units. In addition, datasets integrating terrain derivatives (e.g., average slope gradient and elevation range) and texture derivatives (e.g., slope gradient contrast and elevation variance) were constructed to quantify the topographical characteristics of the watersheds. Finally, the Random Forest (RF) method was employed to automatically select features and classify landforms based on their topographical characteristics. The proposed method was applied and validated in seven case areas in the Northern Shaanxi Loess Plateau, chosen for their complex and gradually changing landforms. Experimental results show that the highest recognition accuracy based on the selected derivatives is 92.06%. During the recognition procedure, the contributions of terrain derivatives were higher than those of texture derivatives within the selected derivative datasets. Loess terrace and loess mid-mountain obtained the highest accuracy among the seven typical loess landforms, whereas the recognition precision of loess hill, loess hill–ridge, and loess sloping ridge was relatively low. The experiment also shows that the watershed-based strategy achieves better results than the object-based strategy, and that the RF method can effectively extract and recognize landform features.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2013AA102402).
Abstract: As one of the non-ferrous metal structural materials most widely used in industry and production, aluminum (Al) alloy is of great value to the national economy and industrial manufacturing. Classifying Al alloy rapidly and accurately is therefore a significant task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it needs to be combined with an algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a major role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloy based on LIBS and the RF algorithm. The results show that the best accuracy reached when classifying Al alloy samples with this method is 98.59%, and the average accuracy is 98.45%. The paper also reports how the accuracy varies with the number of trees in the RF and with the size of the training sample set. Using these relationships, researchers can find optimized parameters for the RF algorithm. These results indicate that LIBS combined with the RF algorithm can classify Al alloy rapidly and with high accuracy, which has significant practical value.
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174001); the National Natural Science Foundation of China (No. 52004064); the Hainan Province Science and Technology Special Fund "Research on Real-time Intelligent Sensing Technology for Closed-loop Drilling of Oil and Gas Reservoirs in Deepwater Drilling" (ZDYF2023GXJS012); and the Heilongjiang Provincial Government and Daqing Oilfield's first batch of scientific and technological key projects, "Research on the Construction Technology of Gulong Shale Oil Big Data Analysis System" (DQYT-2022-JS-750).
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper studies three problems in intelligent lithology identification: difficult feature extraction, low precision of thin-layer identification, and limited applicability of the model. The authors seek to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging while drilling (LWD) parameters that most significantly influence lithology identification. The overall performance of the DFW-RF lithology identification model has been verified on 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model achieves a higher lithology identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. It supports efficient real-time intelligent identification of lithologic information in closed-loop drilling, has broad applicability, and merits wide use in logging interpretation.
Funding: Supported by the National Natural Science Foundation of China (61873006) and the Beijing Natural Science Foundation (4204087, 4212040).
Abstract: Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data are an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment is complex and involves a large amount of equipment, so the data obtained by the detection equipment are heavily contaminated and prone to abnormalities such as data loss and outliers. To solve the problem of abnormal data in acid production with flue gas, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm optimizes the hyperparameters of the random forest regression model; the optimal parameter combination is found in the search space, and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the data missing after the abnormal data are eliminated, completing the data cleaning. Results show that the proposed method can accurately eliminate and compensate for abnormal data in acid production with flue gas, and that it improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for subsequent temperature control. The conversion rate of SO_2 can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
Funding: National Natural Science Foundation of China (Grant/Award Number: 61673084); Fundamental Research Foundation for Universities of Heilongjiang Province (Grant/Award Number: LGYC2018JC017).
Abstract: Stock trend forecasting is a complex and widely studied problem in finance that involves a large amount of data and many related indicators; hence it is difficult to obtain sustainable and effective results by relying on empirical analysis alone. Machine learning research has shown that random forest can form better judgements on this kind of problem and can play an auxiliary role in stock trend prediction. This study uses historical trading data of four companies listed on the US stock market, and its purpose is to improve the performance of the random forest model in medium- and long-term stock trend prediction. The study applies exponential smoothing to process the initial data, calculates the relevant technical indicators as candidate features, and proposes the D-RF-RS method to optimize random forest. Because random forest is an ensemble learning model closely related to the decision tree, the D-RF-RS method uses a decision tree to screen features by importance and obtains an effective strong feature set as model input. The parameter combination of the model is then optimized through random parameter search. The experimental results show that the average accuracy of random forest increases by 0.17 after this optimization, and is 0.18 higher than the average accuracy of the light gradient boosting machine model. Together with the ROC and Precision–Recall curves, the results indicate that the stability of the model is also maintained, which further demonstrates the advantages of random forest in medium- and long-term trend prediction of the stock market.
Funding: Funds for the Central Universities (grant number CUC24SG018).
Abstract: The proliferation of robot accounts on social media platforms has had a significant negative impact, necessitating robust measures to counter network anomalies and safeguard content integrity. Social robot detection has emerged as a pivotal yet difficult task aimed at mitigating the dissemination of misleading information. While graph-based approaches have achieved strong performance in this area, they face a fundamental limitation: the homogeneity assumption in graph convolution allows social robots to evade detection by mingling with genuine human profiles. To address this challenge and counter such camouflage tactics, this work proposes a social robot detection framework based on enhanced HOmogeneity and Random Forest (HORFBot). At the core of HORFBot lies a homogeneous graph enhancement strategy that uses edge-removal techniques to split the graph into multiple subgraphs. Using contrastive learning, the method then trains multiple graph convolutional networks, one on each of these subgraphs. In the final stage, these base classifiers are fused, and their outputs are aggregated to produce a comprehensive detection outcome. Extensive experiments on three social robot detection datasets show that the method improves the accuracy of social robot detection and outperforms the comparison methods.
Funding: This work was partially supported by the National Natural Science Foundation of China (61876089).
Abstract: With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate and fast reading and classification of the electricity consumption of residential users can provide a more in-depth picture of residents' actual power consumption, which is essential for the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for residential electricity classification. It uses the out-of-bag error of random forest together with the Drosophila algorithm to optimize the internal parameters of the random forest, thereby improving the performance of the random forest algorithm. The method uses MapReduce to train the improved random forest model on a cloud computing platform, then uses the trained model to analyze the residential electricity consumption dataset, divides all residents into 5 categories, and verifies the effectiveness and feasibility of the model through experiments.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41977411, 41771383) and the Technology Research Project of the Education Department of Jilin Province (No. JJKH20210445KJ).
Abstract: Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. To study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a method combining Monte Carlo and Random Forest Regression (MC-RFR) is developed to model the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling is added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR and the future landscape pattern modeled by a combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a linear regression model. By 2030, the overall SUHI effect of Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of their sustainability.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61801222, in part by the Fundamental Research Funds for the Central Universities under Grant No. 30919011230, and in part by the Jiangsu Provincial Department of Education Degree and Graduate Education Research Fund under Grant No. JGZD18_012.
Abstract: The Massive Open Online Course (MOOC) has become a popular form of online learning used by millions of people across the world. Meanwhile, a vast amount of information has been collected from MOOC learners and institutions. Based on these educational data, many studies have investigated the prediction of MOOC learners' final grades. However, two problems remain in this research field. The first is how to select the most appropriate features to improve prediction accuracy, and the second is how to use or modify data mining algorithms for better analysis of MOOC data. To solve these two problems, an improved random forests method is proposed in this paper. First, a hybrid indicator is defined to measure the importance of the features, and a rule is established for feature selection; then, a Clustering-Synthetic Minority Over-sampling Technique (SMOTE) is embedded into the traditional random forests algorithm to solve the class imbalance problem. In the experimental part, the performance of the proposed method is verified on the Canvas Network Person-Course (CNPC) dataset. Furthermore, four well-known prediction methods are applied for comparison, and the proposed method outperforms them.
Abstract: The moisture control of materials in the silk reeling technology of tobacco is regarded as the key factor influencing the inner quality of cigarettes. In this paper, statistical data from the silk reeling production line of Yunyan (Ruanzhen brand) at the Qujing cigarette factory from June 2013 to May 2014 are used to show that applying the random forest regression model to the problem of moisture control is theoretically feasible. In the perfuming stage of silk reeling, a random forest regression model is established to describe the change in moisture content of finished cut tobacco in the export link of the perfuming stage, which is driven by several factors including incoming water content and environmental conditions. According to the model, good moisture control in the export link of the perfuming stage (in accordance with the technological standards) can be realized by adjusting the regulating reference value of incoming moisture under specific workshop environments. In the drying stage of silk reeling, the correlation coefficients among the variables that influence the moisture content of cut tobacco in the export link of the drying stage are analyzed and another random forest regression model is established; the results indicate that the most effective method of moisture control is to adjust the cylinder wall temperature. This finding is consistent with traditional production experience. In conclusion, the methods described above provide a strong theoretical basis for stable moisture control in the export link of the perfuming stage.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U22A6001 and 52375273), the Major Project of Science and Technology Innovation 2030 (Grant No. 2021ZD0113100), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ24E050005).
Abstract: The discretization of random fields is the first and most important step in the stochastic analysis of engineering structures with spatially dependent random parameters. The essential step of discretization is solving the Fredholm integral equation to obtain the eigenvalues and eigenfunctions of the covariance functions of the random fields. The collocation method, which requires fewer integral operations, accomplishes this task more efficiently than the time-consuming Galerkin method, and it is more suitable for engineering applications with complex geometries and a large number of elements. With the help of isogeometric analysis, which preserves accurate geometry in analysis, the isogeometric collocation method can efficiently obtain results with sufficient accuracy. An adaptive moment abscissa is proposed for calculating the coordinates of the collocation points to further improve the accuracy of the collocation method. The adaptive moment abscissae lead to more accurate results than the classical Greville abscissae when the moment parameter is optimized with intelligent algorithms. Numerical and engineering examples illustrate the advantages of the proposed isogeometric collocation method based on the adaptive moment abscissae over existing methods in terms of accuracy and efficiency.
Funding: Guangdong Innovation and Entrepreneurship Training Programme for Undergraduates, "Automatic Classification and Identification of Fraudulent Websites Based on Machine Learning" (Project No. DC2023125).
Abstract: This paper explores the synergistic effect of a model combining Elastic Net and Random Forest in online fraud detection. The study selects a public network dataset containing 1781 data records, splits it into 70% for training and 30% for validation, and analyses the correlation between features using a correlation matrix. The experimental results show that the Elastic Net feature selection method generally outperforms PCA across all models, especially when combined with the Random Forest and XGBoost models. The Elastic Net + Random Forest model achieves the highest accuracy of 0.968 and an AUC value of 0.983, while the Kappa and MCC reach 0.839 and 0.844 respectively, showing high consistency and correlation. This indicates that combining Elastic Net feature selection with the Random Forest model has significant performance advantages in online fraud detection.
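For readers unfamiliar with the two agreement scores reported here, the sketch below shows how Cohen's Kappa and the Matthews Correlation Coefficient (MCC) follow from a binary confusion matrix. It is illustrative only: the function name `kappa_and_mcc` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def kappa_and_mcc(tp, fp, fn, tn):
    """Compute Cohen's Kappa and MCC from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Kappa: observed agreement corrected for agreement expected by chance.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected) if expected != 1.0 else 0.0

    # MCC: correlation between predicted and true binary labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return kappa, mcc


# Hypothetical counts for a fraud / non-fraud validation set.
kappa, mcc = kappa_and_mcc(tp=40, fp=10, fn=5, tn=45)
```

Both scores correct for class imbalance in a way plain accuracy does not, which is presumably why the abstract reports them alongside accuracy and AUC.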
Funding: The Iranian National Science Foundation (INSF) provided financial support for this research under Project Number 4004169. The authors would like to thank Isfahan University of Technology and the University of Isfahan for their valuable contributions.
Abstract: Accurate spatial prediction of soil organic carbon (SOC) and soil inorganic carbon (SIC) is vital for land management decisions. This study targets SOC/SIC mapping challenges at the watershed scale in central Iran by addressing environmental heterogeneity through a random forest (RF) model combined with bootstrapping to assess prediction uncertainty. Thirty-eight environmental variables, categorized into climatic, soil physicochemical, topographic, geomorphic, and remote sensing (RS)-based factors, were considered. Variable importance analysis (VIA) and partial dependence plots (PDP) identified land use, RS indices, and topography as key predictors of SOC. For SIC, soil reflectance (Bands 5 and 7, ETM+), topography, and geomorphic units were most influential. Climatic factors showed minimal impact in the studied semi-arid watershed. The RF model achieved moderate prediction accuracy (SOC: R^2 = 0.43 ± 0.13, nRMSE = 0.28; SIC: R^2 = 0.47 ± 0.11, nRMSE = 0.37). VIA and PDP analyses enhanced model interpretability by clarifying environmental influences on the spatial distribution of SOC and SIC.