Coupling Analysis of Causative Factors for Severe Traffic Accidents Considering Sample Imbalance
Abstract Road traffic accidents occur frequently, yet datasets labeled by conventional accident-severity categories are often imbalanced. To explore how the coupling of multidimensional factors affects severe traffic accidents when the sample is imbalanced, this study proposes an analytical framework that integrates the adaptive synthetic sampling (ADASYN) algorithm, a Stacking ensemble learning model, and the Apriori algorithm. Using data from the U.S. Department of Transportation's Fatality Analysis Reporting System (FARS) from 2017 to 2021, fifteen candidate feature variables are selected across four dimensions (human, vehicle, road, and environment) to analyze the effect of multidimensional factor coupling on severe accidents. The ADASYN algorithm is used to address sample imbalance. Four classical machine learning models, namely random forest (RF), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), are selected as base learners. Five meta-learners, namely logistic regression, Gaussian naive Bayes, support vector machine (SVM), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP), are compared to identify the Stacking ensemble that generalizes best when combined with these base learners. Feature importance rankings from the optimal model are then used to select the key factors, and the Apriori algorithm is applied to analyze the multidimensional coupling of these features, exploring how five-dimensional factor coupling affects the severe accident rate. The results indicate that: (1) the Stacking ensemble with logistic regression as the meta-learner and RF, CatBoost, XGBoost, and GBDT as base learners performs best, with a recall of 0.80; (2) five factors (road type, season, collision type, lighting conditions at the time of the collision, and driver alcohol consumption) account for 53.2% of the total importance of all factors, substantially more than the other variables. Among them, the collision type "collision with trees or other pole-like objects" has the highest severe accident rate, at 86.2%, and the severe accident rate under lit conditions is 13.5% higher than under unlit conditions; (3) the multidimensional coupling analysis shows that the probability of a severe accident is highest when the following factors coincide: municipal roads, a driver who had not been drinking, a transition between unlit and lit conditions at the time of the collision, and the autumn season. For this combination the confidence reaches 89.0%, which challenges the conventional perception that the absence of drinking is a low-risk factor.
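The confidence reported in result (3) is the standard association-rule quantity: among the records that contain every antecedent factor, the share that are also severe. The sketch below is a minimal, standard-library illustration of level-wise Apriori mining and rule confidence, not the authors' implementation. The six records, the item labels such as `road=municipal`, and the `min_support` threshold are all hypothetical values chosen for the example; none of them are FARS data.

```python
from itertools import combinations

# Hypothetical toy records (NOT FARS data): each record is a set of
# factor labels plus a severity label.
records = [
    {"road=municipal", "alcohol=no", "season=autumn", "severe"},
    {"road=municipal", "alcohol=no", "season=autumn", "severe"},
    {"road=municipal", "alcohol=no", "season=autumn", "not_severe"},
    {"road=highway", "alcohol=yes", "season=summer", "severe"},
    {"road=highway", "alcohol=no", "season=autumn", "not_severe"},
    {"road=municipal", "alcohol=yes", "season=winter", "not_severe"},
]
transactions = [frozenset(r) for r in records]


def apriori(transactions, min_support, max_len):
    """Level-wise Apriori: return {itemset: support} for frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    frequent = {}
    k = 1
    while current and k <= max_len:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for cand in current:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        ]
        k += 1
    return frequent


def confidence(transactions, antecedent, consequent):
    """confidence(A -> C) = count(A and C) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(a <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    return n_both / n_a if n_a else 0.0


freq = apriori(transactions, min_support=0.3, max_len=4)
antecedent = {"road=municipal", "alcohol=no", "season=autumn"}
rule_conf = confidence(transactions, antecedent, {"severe"})
print(f"confidence({sorted(antecedent)} -> severe) = {rule_conf:.3f}")
```

In this toy data, three records match the antecedent and two of them are severe, so the rule confidence is 2/3. A confidence of 89.0% in the abstract is read the same way, computed over the records matching the reported factor combination.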
Authors WANG Jianyu; DONG Yue; CHEN Xiantian; ZHAO Pengfei; ZHOU Bei; NA Bo (School of Civil and Transportation Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China; School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China; School of Transportation Engineering, Chang'an University, Xi'an 710064, China)
Source Journal of Transport Information and Safety (交通信息与安全), Peking University Core Journal, 2025, No. 5, pp. 70-78 (9 pages)
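Funding Supported by the Young Scientists Fund of the National Natural Science Foundation of China (52102404), the Beijing Natural Science Foundation (9234025), the 2023 Cultivation Project of Beijing University of Civil Engineering and Architecture, Young Teachers Program (X23044), and the Education and Teaching Research Project of the China Association of Construction Education (2023108).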
Keywords traffic safety; causal mechanism of severe accidents; sample imbalance; adaptive synthetic sampling; Stacking ensemble learning; association rules