Funding: Supported by the project "Loess Plateau Region-Watershed-Slope Geological Hazard Multi-Scale Collaborative Intelligent Early Warning System" of the National Key R&D Program of China (2022YFC3003404), a project of the Shaanxi Youth Science and Technology Star (2021KJXX-87), and public welfare geological survey projects of the Shaanxi Institute of Geologic Survey (20180301, 201918, 202103, and 202413).
Abstract: This study investigated the impacts of random negative training datasets (NTDs) on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau, northern Shaanxi Province, China. Based on 40 randomly generated NTDs, the study developed models for geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performance using the area under the receiver operating characteristic curve (AUC). Specifically, the means and standard deviations of the AUC values from all models were used to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment, as well as the uncertainty introduced by the NTDs. A risk and return methodology was then employed to quantify and mitigate the uncertainty, with log odds ratios used to characterize the susceptibility assessment levels. The risk and return values were calculated from the standard deviations and means, respectively, of the log odds ratios at each location. After the mean log odds ratios were converted into probability values, the final susceptibility map was plotted, which accounts for the uncertainty induced by random NTDs. The results indicate that the AUC values of the models ranged from 0.810 to 0.963, with an average of 0.852 and a standard deviation of 0.035, indicating good predictive performance together with a degree of uncertainty. The risk and return analysis reveals that low-risk, high-return areas are those with lower standard deviations and higher means across the multiple model-derived assessments. Overall, this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models, aimed at improving their robustness and reliability. Additionally, by identifying low-risk, high-return areas, resource allocation for geologic hazard prevention and control can be optimized, ensuring that limited resources are directed toward the most effective prevention and control measures.
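The risk and return calculation described in this abstract can be illustrated with a minimal sketch. It assumes that each model outputs a susceptibility probability for a location, that "return" is the mean and "risk" the standard deviation of the log odds across models, and that the population standard deviation is used; the function names, the clipping constant `eps`, and the choice of population rather than sample deviation are illustrative assumptions, not details taken from the study.

```python
import math
from statistics import mean, pstdev


def log_odds(p, eps=1e-6):
    """Log odds of a probability, clipped away from 0 and 1 (eps is an assumed guard)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def risk_return(probs_per_model):
    """Return (return_value, risk_value, probability) for one location.

    probs_per_model: susceptibility probabilities predicted for the same
    location by models trained on different negative training datasets.
    """
    z = [log_odds(p) for p in probs_per_model]
    ret = mean(z)        # "return": mean log odds across models
    risk = pstdev(z)     # "risk": spread of log odds across models (assumed population SD)
    prob = 1.0 / (1.0 + math.exp(-ret))  # mean log odds mapped back to a probability
    return ret, risk, prob


# Hypothetical predictions for two locations from three models
stable_site = risk_return([0.90, 0.91, 0.89])   # models agree: low risk
unstable_site = risk_return([0.50, 0.95, 0.70])  # models disagree: higher risk
```

In this sketch a location where the models agree on a high probability has a high return and a low risk, which is the kind of area the abstract singles out for prioritized prevention and control.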
Abstract: With recent trends in urban agriculture and climate change, there is an emerging need for alternative plant culture techniques that eliminate dependence on soil. Hydroponic and aquaponic growth techniques have proven to be viable alternatives, but the lack of efficient and optimal practices for irrigation and nutrient supply limits their application on a large commercial scale. The main purpose of this research was to develop statistical methods and machine learning algorithms that regulate nutrient concentrations in aquaponic irrigation water based on plant needs, in order to achieve optimal plant growth and promote broader adoption of aquaponic culture on a commercial scale. One of the key challenges in developing these algorithms is the sparsity of data, which requires the use of bolstered error estimation approaches. In this paper, several linear and non-linear algorithms trained on relatively small datasets were evaluated using bolstered error estimation techniques, in order to select the best method for making decisions on the regulation of nutrients in hydroponic environments. After repeated tests on the dataset, the Semi-Bolstered Resubstitution Error estimation technique was found to work best in this case, using a Linear Support Vector Machine as the classifier with the penalty parameter set to one. A set of recommended rules has been prescribed as a Decision Support System using the output of the machine learning algorithm, and has been tested against the results of the baseline model. Further, the positive impact of the recommended nutrient concentrations on plant growth in aquaponic environments is discussed in detail.
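Bolstered resubstitution, as named in this abstract, can be sketched for an already fitted linear classifier. The sketch assumes a spherical Gaussian bolstering kernel of a given width `sigma` centred on each training point, so the kernel mass on the wrong side of the hyperplane has a closed form. In the semi-bolstered variant, as commonly defined, misclassified points receive no kernel and count as a full error. The weights `w`, bias `b`, and `sigma` are treated here as hypothetical inputs; published estimators derive the kernel width from the data, and none of these values come from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bolstered_resub_error(points, labels, w, b, sigma, semi=False):
    """Bolstered resubstitution error of the linear rule sign(w.x + b).

    points: list of feature tuples; labels: +1 or -1 per point.
    sigma: assumed standard deviation of the spherical Gaussian kernel.
    semi: if True, misclassified points contribute a full error of 1.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        signed_dist = y * score / norm_w  # positive when correctly classified
        if semi and signed_dist <= 0:
            total += 1.0  # no bolstering for misclassified (or boundary) points
        else:
            # Gaussian mass that falls on the wrong side of the hyperplane
            total += normal_cdf(-signed_dist / sigma)
    return total / len(points)


# Hypothetical two-point example with the hyperplane x1 = 0
pts = [(1.0, 0.0), (-1.0, 0.0)]
err = bolstered_resub_error(pts, [1, -1], w=(1.0, 0.0), b=0.0, sigma=1.0)
```

Because each correctly classified point still contributes the kernel mass that crosses the boundary, the estimate is nonzero even when plain resubstitution would report zero error, which is why such estimators are attractive for small datasets.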
Funding: Funded by the US National Science Foundation (NSF). Award title: Elements: A Convergent Physics-based and Data-driven Computing Platform for Building Modeling (#2311685).
Abstract: The substantial progress in machine learning (ML) techniques and the growing availability of building data have created significant opportunities for rapid and precise building energy modeling. However, despite the notable capabilities of ML algorithms, their performance can degrade severely when the available training dataset is limited, undermining the trustworthiness and effectiveness of model application in practice. To address this challenge, this study develops the seasonal naïve-neural-ordinary differential equations (SN-NODE) model to predict the cooling and heating loads of buildings, especially in scenarios with severe data scarcity. By incorporating a physics-informed structure into SN-NODE, the model aligns predictions with the underlying physical principles governed by resistance–capacitance (RC) models, enhancing both accuracy and reliability. The resulting predictions for hourly and sub-hourly cooling and heating loads achieved a coefficient of variation of root mean square error (CVRMSE) of approximately 0.3 and 0.2, respectively, demonstrating the model's strong potential for accurate building load prediction. The physics-informed structure further improved prediction accuracy over the original SN-NODE when trained with the hourly dataset, ensuring physically consistent and interpretable results. Moreover, a robustness index (RI) function was proposed to evaluate model robustness in a nonlinear manner, showcasing the superior performance of the SN-NODE model with limited training data compared to conventional data-driven models, including long short-term memory (LSTM) and support vector machine (SVM). Notably, the SN-NODE model maintained high prediction accuracy even with only two weeks of training data, whereas the performance of LSTM decreased dramatically (CVRMSE increased from approximately 0.3 to 0.5) under similar conditions. Finally, the SN-NODE model exhibited robust performance across different time resolutions and forecasting horizons, achieving CVRMSE values ranging from approximately 0.15 to 0.3 in building energy use prediction.
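For readers unfamiliar with the metric quoted throughout this abstract, CVRMSE is the root mean square error normalized by the mean of the observed values. The sketch below uses the plain form with n in the denominator; some guidelines use a degrees-of-freedom correction instead, and the abstract does not say which variant the authors used, so treat this as an assumption. The function name and sample numbers are illustrative.

```python
import math


def cvrmse(observed, predicted):
    """Coefficient of variation of RMSE: RMSE divided by the mean observation.

    Uses the uncorrected form (divides the squared-error sum by n).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


# Hypothetical hourly loads (same units): errors of 3 around a mean load of 10
value = cvrmse([10.0, 10.0], [7.0, 13.0])
```

A CVRMSE of 0.3 therefore means that the typical prediction error is about 30% of the average load.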
Funding: Supported by the Science and Technology on Space Intelligent Control Laboratory for National Defense (KGJZDSYS-2018-08).
Abstract: Least squares support vector regression (LSSVR) is a method for function approximation whose solutions are typically non-sparse, which limits its application, especially where fast prediction is required. In this paper, a sparse adaptive pruning LSSVR algorithm based on global representative point ranking (GRPR-AP-LSSVR) is proposed. First, the global representative point ranking (GRPR) algorithm is given, and a data analysis experiment is carried out that depicts the importance ranking of the data points. Furthermore, a pruning strategy that removes two samples in the decremental learning procedure is designed to accelerate training and ensure sparsity. The removed data points are used to test the temporary learning model, which ensures regression accuracy. Finally, the proposed algorithm is verified on artificial datasets and UCI regression datasets, and the experimental results indicate that, compared with several benchmark algorithms, the GRPR-AP-LSSVR algorithm has excellent sparsity and prediction speed without impairing generalization performance.
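The non-sparsity mentioned in this abstract follows from the standard LSSVR formulation, restated here as background rather than taken from the paper. The symbols (regularization constant \gamma, kernel matrix K, feature map \varphi) follow common textbook usage and are assumptions about notation.

```latex
% Primal problem: squared-error loss with equality constraints
\min_{w,\,b,\,e}\ \frac{1}{2}\|w\|^{2} + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2}
\quad \text{s.t.} \quad y_i = w^{\top}\varphi(x_i) + b + e_i,\qquad i = 1,\dots,N

% Optimality conditions reduce to one linear system in (b, \alpha)
\begin{bmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \alpha_i = \gamma\, e_i,
\qquad f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b
```

Since \alpha_i = \gamma e_i, every training sample with a nonzero residual keeps a nonzero coefficient, so prediction cost grows with the full training set. Pruning schemes such as the one proposed aim to drop samples whose removal does not harm accuracy.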
Funding: This research is funded by the National Natural Science Foundation of China (41807285, 41762020, 51879127 and 51769014E) and the Natural Science Foundation of Hebei Province (D2022202005).
Abstract: This study aims to reveal the impacts of three important uncertainty issues in landslide susceptibility prediction (LSP), namely the spatial resolution, the proportion of model training and testing datasets, and the selection of machine learning models. Taking Yanchang County of China as an example, the landslide inventory and 12 important conditioning factors were acquired. The frequency ratios of each conditioning factor were calculated under five spatial resolutions (15, 30, 60, 90 and 120 m). Landslide and non-landslide samples obtained under each spatial resolution were further divided into five proportions of training and testing datasets (9:1, 8:2, 7:3, 6:4 and 5:5), and four typical machine learning models were applied for LSP modeling. The results demonstrated that different spatial resolutions and different training and testing dataset proportions have broadly similar influences on the modeling uncertainty. As the spatial resolution coarsened from 15 m to 120 m and the proportions of the training and testing datasets changed from 9:1 to 5:5, the modeling accuracy gradually decreased, while the mean values of the predicted landslide susceptibility indexes increased and their standard deviations decreased. The sensitivities of LSP modeling to the three uncertainty issues were, in order, the spatial resolution, the choice of machine learning model and the proportions of training/testing datasets.
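The frequency ratio of a conditioning-factor class, as used in this abstract, is commonly defined as the share of landslide cells falling in the class divided by the share of all cells falling in the class. The sketch below assumes that definition and uses made-up cell counts; the function and variable names are illustrative, and the paper's exact procedure may differ.

```python
def frequency_ratios(class_cells, class_landslide_cells):
    """Frequency ratio per class of one conditioning factor.

    class_cells: dict mapping class name to total number of grid cells.
    class_landslide_cells: dict mapping class name to landslide cell count.
    FR = (landslides in class / all landslides) / (cells in class / all cells)
    """
    total_cells = sum(class_cells.values())
    total_landslides = sum(class_landslide_cells.values())
    ratios = {}
    for name, cells in class_cells.items():
        landslide_share = class_landslide_cells.get(name, 0) / total_landslides
        area_share = cells / total_cells
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical slope classes at one spatial resolution
fr = frequency_ratios(
    {"gentle": 300, "steep": 100},
    {"gentle": 10, "steep": 10},
)
```

A ratio above 1 indicates that landslides are over-represented in that class relative to its area. Recomputing the counts at each grid size is what makes the ratios depend on spatial resolution.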
Funding: Supported by the Faraday Institution's Industrial Fellowship (FIIF-013), the Innovate UK Battery Advanced for Future Transport Applications (BAFTA) project (104428), the EPSRC Faraday Institution's Multi-Scale Modelling Project (EP/S003053/1, grant number FIRG003), the EPSRC Joint UK-India Clean Energy centre (JUICE) (EP/P003605/1), and the EPSRC Integrated Development of Low-Carbon Energy Systems (IDLES) project (EP/R045518/1).
Abstract: Diagnosing lithium-ion battery degradation is challenging due to the complex, nonlinear, and path-dependent nature of the problem. Here, we develop a generalised and rapid degradation diagnostic method with a deep learning-convolutional neural network that quantifies degradation modes of batteries aged under various conditions in 0.012 s without feature engineering. Rather than performing extensive aging experiments, synthetic aging datasets are generated for network training. This dramatically lowers training cost and time, and because these datasets cover almost all aging paths, it enables a generalised degradation diagnostic framework. We show that the five thermodynamic degradation modes are correlated, and systematically elucidate their correlations. We thus propose a non-invasive comprehensive evaluation method and find the degradation diagnostic errors to be less than 1.22% for three leading commercial battery chemistries. Comparison with traditional diagnostic methods confirms the high accuracy and speed of the proposed approach. Quantification of degradation modes from partial discharge/charge data using the proposed diagnostic framework validates the real-world feasibility of this approach. This work therefore enables online identification of battery degradation and efficient analysis of large datasets, unlocking potential for long-lifetime energy storage systems.