This study uses an empirical analysis to quantify the downstream effects of data pre-processing choices. Bootstrap data simulation is used to measure the bias-variance decomposition of an empirical risk function, the mean squared error (MSE). The results of this decomposition are used to measure the effects of model development choices on model bias, variance, and irreducible error. Measurements of bias and variance are then applied as diagnostic procedures for model pre-processing and development. The best-performing combinations of model, normalization, and data structure are identified to illustrate the downstream effects of these model development choices. In addition, the simulation results were verified and extended to additional data characteristics (imbalanced, sparse) by testing on benchmark datasets from the UCI Machine Learning Repository. Normalization results on the benchmark data were consistent with those from the simulations, and also showed that more complex and/or non-linear models perform better on datasets with additional complexities. Finally, applying the simulation findings to previously tested applications led to equivalent or improved results with less model development overhead and processing time.
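The bootstrap measurement described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the `fit_predict` interface, the polynomial learners, the `sin(3x)` target, and the noise level. The sketch refits a learner on bootstrap resamples of one simulated training set and splits the MSE against the noise-free target into squared bias and variance.

```python
import numpy as np


def bootstrap_bias_variance(fit_predict, x_train, y_train, x_test, f_test,
                            n_boot=200, seed=0):
    """Estimate squared bias and variance of a learner by bootstrap resampling.

    fit_predict(x, y, x_new) -> predictions at x_new (hypothetical interface).
    f_test holds noise-free targets at x_test, known only in a simulation.
    """
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        preds[b] = fit_predict(x_train[idx], y_train[idx], x_test)
    mean_pred = preds.mean(axis=0)                # average model over resamples
    bias_sq = np.mean((mean_pred - f_test) ** 2)  # squared bias, averaged over x_test
    variance = np.mean(preds.var(axis=0))         # spread of the refitted models
    mse = np.mean((preds - f_test) ** 2)          # MSE against the noise-free target
    return bias_sq, variance, mse


def poly_learner(degree):
    """Return a fit_predict function for a least-squares polynomial (illustrative)."""
    def fit_predict(x, y, x_new):
        coefs = np.polyfit(x, y, degree)
        return np.polyval(coefs, x_new)
    return fit_predict


# Simulated data: assumed target sin(3x) with Gaussian noise of standard deviation 0.3.
rng = np.random.default_rng(42)
noise_sd = 0.3
x_train = rng.uniform(-1, 1, 60)
y_train = np.sin(3 * x_train) + rng.normal(0, noise_sd, 60)
x_test = np.linspace(-0.9, 0.9, 50)
f_test = np.sin(3 * x_test)

results = {
    d: bootstrap_bias_variance(poly_learner(d), x_train, y_train, x_test, f_test)
    for d in (1, 3, 7)
}
for d, (b, v, m) in results.items():
    print(f"degree={d}: bias^2={b:.4f} variance={v:.4f} mse={m:.4f}")
```

Because the error is measured against the noise-free target, `mse` equals `bias_sq + variance` up to rounding. Against noisy targets, the irreducible error (`noise_sd ** 2` in this simulation) would be added on top. A pre-processing step such as normalization could be compared by wrapping it inside `fit_predict`.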
This paper introduces a supervised learning model and surveys related research. The paper is organised as follows. A supervised learning model is first described. The bias-variance trade-off is then discussed for this model. Based on the bias-variance trade-off, both single neural network approaches and neural network ensemble approaches are reviewed, and problems with the existing approaches are identified. Finally, the paper concludes by outlining potential directions for future research.
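For reference, the bias-variance trade-off mentioned in this abstract is usually stated as follows. The notation is an assumption and is not taken from the paper: targets are $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ that is independent of the training set, $\hat{f}$ is the trained model, and expectations are over training sets and noise. The second identity is the standard extension to an ensemble that averages $M$ members.

```latex
% Single model: expected squared error at a point x
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}

% Ensemble average \bar{f}(x) = \frac{1}{M}\sum_{i=1}^{M} \hat{f}_i(x)
\mathbb{E}\big[(\bar{f}(x) - f(x))^2\big]
  = \overline{\text{bias}}^{\,2}
  + \frac{1}{M}\,\overline{\text{var}}
  + \Big(1 - \frac{1}{M}\Big)\,\overline{\text{covar}}
```

Here $\overline{\text{bias}}$, $\overline{\text{var}}$, and $\overline{\text{covar}}$ denote the average bias, variance, and pairwise covariance of the ensemble members. The covariance term is why ensemble methods generally aim for members whose errors are weakly or negatively correlated.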
Funding: Supported by the National Natural Science Foundation of China (60133010)