Abstract
[Objective] The cost of distribution network engineering is influenced by multidimensional factors such as scale and capacity, equipment and material costs, and geographical conditions. Traditional statistical methods (e.g., linear regression) struggle to handle high-dimensional nonlinear data effectively, while existing machine learning approaches, despite incorporating feature reduction techniques, still have limitations. For instance, principal component analysis (PCA) reduces dimensionality at the expense of prediction accuracy, and grey relational analysis (GRA) ignores feature interactions. A prediction method is therefore needed that retains critical feature information while accounting for complex inter-feature relationships. This study integrates recursive feature elimination (RFE) with the random forest (RF) algorithm to develop an RFE-RF prediction model, aiming to resolve feature redundancy and nonlinear modeling challenges.
[Methods] A technical framework of "feature selection, model construction, experimental validation" was adopted. For feature selection, RFE was employed: the model is trained iteratively, the features contributing least to prediction are eliminated step by step, and the optimal feature subset is retained. For model construction, the RF algorithm was used. Based on ensemble learning principles, RF builds multiple decision trees and averages their outputs, which effectively mitigates overfitting and improves model robustness. RF is insensitive to noisy data and can quantify feature importance, so it provides a reliable feature ranking for RFE. RFE can therefore be embedded in the RF training process to form a closed-loop optimization workflow.
[Results] Experimental validation used data from 190 distribution network engineering projects provided by a power grid company, covering 21 initial features such as voltage level, line length, and equipment price. Categorical features were numerically encoded while preserving their original distribution characteristics. Through five-fold cross-validation and root mean square error (RMSE) optimization, an optimal subset of 12 features was identified, including key factors such as line length, comprehensive cable price, and voltage level. Compared with the traditional linear regression (LR), RF, and mutual information-based RF (MI-RF) algorithms, the RFE-RF algorithm achieved a mean absolute error (MAE) of 8.6579 and a mean absolute percentage error (MAPE) of 6.97% on the test set, significantly outperforming the other algorithms. The MAE of RFE-RF on the test set was only about 4.5% higher than on the training set, indicating a lower overfitting risk than the other algorithms and showing that feature selection can effectively improve model stability.
[Conclusion] Feature selection is pivotal for improving the accuracy of distribution network cost prediction. RFE eliminates redundant features through dynamic iteration, substantially reducing data dimensionality and noise interference. The RFE-RF model combines high precision with strong interpretability: its MAE is considerably lower than that of traditional models, and it clearly quantifies the weight of each feature's influence on cost. Applying the combination of RFE and RF to distribution network cost prediction addresses the challenges of feature interaction and redundancy filtering and provides a new paradigm for data modeling in complex engineering systems. The model can serve power grid enterprises as a precise cost prediction tool, aiding investment decisions and cost control and advancing the intelligent and refined construction of distribution networks. By revealing how feature selection affects the generalization capability of machine learning models, it also offers a practical reference for feature optimization in high-dimensional nonlinear data.
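The RFE-RF workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses its `RFECV` and `RandomForestRegressor` classes as stand-ins for the paper's procedure, and replaces the 190-project dataset with synthetic data of the same shape (190 samples, 21 features). The hyperparameters, the 80/20 split, and the random seeds are illustrative assumptions, so the printed numbers will not match the reported results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the project data (assumption): 190 samples, 21 features.
X, y = make_regression(n_samples=190, n_features=21, n_informative=8,
                       noise=10.0, random_state=0)
y = y - y.min() + 100.0  # shift targets to be positive so MAPE is well defined

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# RF supplies the feature importances; RFECV removes the least important
# feature each round and scores every subset size by five-fold
# cross-validated RMSE (negated, because scikit-learn maximizes scores).
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
selector.fit(X_train, y_train)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# selector.predict reduces X to the retained features and uses the RF
# that was refit on that subset.
pred_train = selector.predict(X_train)
pred_test = selector.predict(X_test)
mae_train = mean_absolute_error(y_train, pred_train)
mae_test = mean_absolute_error(y_test, pred_test)
mape_test = mape(y_test, pred_test)

print("retained features:", selector.n_features_)
print("train MAE:", mae_train)
print("test MAE:", mae_test)
print("test MAPE (%):", mape_test)
```

Comparing `mae_train` with `mae_test` mirrors the abstract's check of how much the error grows from the training set to the test set; `selector.support_` gives the boolean mask of retained features, and `selector.ranking_` gives the elimination order.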
Authors
XU Ning (徐宁)
LI Weijia (李维嘉)
ZHOU Bo (周波)
LIU Yun (刘云)
LI Jie (李洁)
XU Ning; LI Weijia; ZHOU Bo; LIU Yun; LI Jie (School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, Hebei, China; School of Energy, Power and Mechanical Engineering, North China Electric Power University, Baoding 071003, Hebei, China; Economic and Technological Research Institute, State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050001, Hebei, China; School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, Hubei, China; Hebei SECPT Computer Consulting Service Co., Ltd., Shijiazhuang 050081, Hebei, China)
Source
Journal of Shenyang University of Technology (《沈阳工业大学学报》)
Peking University Core Journal (北大核心)
2025, Issue 5, pp. 558-565 (8 pages)
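Funding
Key Project of the Natural Science Foundation of Hebei Province (E2018210044)
Science and Technology Project of the Hebei Provincial Department of Education (QN16214510D)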
Keywords
distribution network engineering
cost prediction
feature dimension
non-linearity
data redundancy
feature selection
recursive feature elimination
machine learning