Funding: Support provided by the Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable. Irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
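As an illustration of the simplest of the four techniques named above, the following is a minimal sketch of a Pearson correlation filter, not the study's actual pipeline. The feature names, the toy values and the 0.5 threshold are all hypothetical; the abstract does not list the 14 input features or the cut-off that was used.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(features, target, threshold=0.5):
    """Keep the features whose |r| with the target reaches the threshold.

    The threshold is an assumed value chosen for this sketch.
    """
    scores = {name: pearson_r(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical toy data, invented for illustration (not from the paper).
features = {
    "cover_depth": [8.0, 10.0, 12.0, 14.0, 16.0, 18.0],
    "face_pressure": [220.0, 200.0, 190.0, 170.0, 150.0, 140.0],
    "noise": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],
}
settlement = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # stand-in for S_max

kept, scores = select_by_correlation(features, settlement)
for name, r in scores.items():
    print(f"{name}: r = {r:+.3f}")
print("kept:", kept)
```

A filter like this scores each feature against the target in isolation, so it cannot detect non-linear relationships or interactions between features. Wrapper methods such as SFS, SBS and Boruta instead evaluate features through a fitted model.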
Abstract: Background: The signal-to-noise ratio (SNR) is recognized as an index of measurement reproducibility. We derive the maximum likelihood estimators of the SNR and discuss confidence interval construction for the difference between two correlated SNRs when the readings are from bivariate normal and bivariate lognormal distributions. We use the Pearson system of curves to approximate the distribution of the difference between the two estimates and use bootstrap methods to validate the approximate distributions of the statistic of interest. Methods: The paper uses the delta method to find the first four central moments, and hence the skewness and kurtosis, which are important in determining the parameters of the Pearson distribution. Results: The approach is illustrated with two examples: one from veterinary microbiology and food safety data and the other from clinical medicine data. We derived the first four central moments of the target statistics and used them, together with the bootstrap method, to evaluate the parameters of the Pearson distribution. The fitted Pearson curves of Types I and II were recommended based on the available data. The R code is also provided for readers to use.
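The bootstrap check mentioned above can be sketched as follows. This is only a paired percentile bootstrap, not the paper's Pearson-curve approximation or its R code. It assumes the SNR is defined as mean divided by standard deviation, which the abstract does not state, and the readings are hypothetical.

```python
import random
import statistics


def snr(values):
    """SNR taken here as mean / sample standard deviation (assumed definition)."""
    return statistics.mean(values) / statistics.stdev(values)


def bootstrap_snr_difference(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for snr(x) - snr(y) on paired readings.

    Pairs (x[i], y[i]) are resampled together so that the correlation
    between the two sets of readings is preserved in every replicate.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired readings of equal length")
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        # Skip degenerate resamples in which a standard deviation is zero.
        if statistics.stdev(xb) == 0 or statistics.stdev(yb) == 0:
            continue
        diffs.append(snr(xb) - snr(yb))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return snr(x) - snr(y), (lower, upper)


# Hypothetical paired readings from two methods on the same ten subjects.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 12.4, 10.7, 11.0]
method_b = [10.0, 12.3, 8.9, 13.0, 10.1, 11.9, 8.7, 13.2, 10.4, 11.6]

point, (lower, upper) = bootstrap_snr_difference(method_a, method_b)
print(f"SNR difference: {point:.3f}, 95% interval: ({lower:.3f}, {upper:.3f})")
```

Resampling subjects rather than individual readings is the design choice that matters here: drawing the two columns independently would discard their correlation and misstate the variability of the difference.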