Funding: This work is supported in part by the National Science Foundation of China (61672392, 61373038) and in part by the National Key Research and Development Program of China (No. 2016YFC1202204).
Abstract: SaaS software that provides services through cloud platforms is now widely used. However, running SaaS software can suffer performance faults due to factors such as its structural design or complex environments. Diagnosing the software quickly and accurately when a performance fault occurs is a major challenge. To address it, we propose a novel performance fault diagnosis method for SaaS software based on the GBDT (Gradient Boosting Decision Tree) algorithm. In particular, we use monitoring to obtain the performance log and warning log while the SaaS software system runs, establish the set of performance fault types, and determine the performance log features. We also annotate the performance log with performance fault types, using the analysis results of the warning log. Moreover, we handle incomplete performance logs by filling missing values with the mean of the same fault type, and handle the imbalance among fault types by combining SMOTE (Synthetic Minority Oversampling Technique) with undersampling. Finally, we conduct an empirical study on a disaster reduction system deployed on the cloud platform, which demonstrates that the proposed method diagnoses performance faults with high efficiency and accuracy while the SaaS software system runs.
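The two preprocessing steps named in this abstract (mean filling within the same fault type, then SMOTE combined with undersampling) could look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function names, the use of `None` for missing values, the neighbour count `k`, the per-class `target` size, and the 0.0 fallback are all assumptions, and the interpolation step is a simplified SMOTE-style variant written with the standard library.

```python
import random
from collections import defaultdict


def fill_missing_by_class(rows, labels):
    """Replace None entries with the mean of that feature within the same class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                sums[label][j] += value
                counts[label][j] += 1
    filled = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                n = counts[label][j]
                # Assumption: fall back to 0.0 if the class never observed this feature.
                value = sums[label][j] / n if n else 0.0
            new_row.append(value)
        filled.append(new_row)
    return filled


def smote_like(samples, n_new, k=2, rng=None):
    """Create n_new synthetic points by interpolating between a sample and
    one of its k nearest neighbours (needs at least two samples)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def rebalance(rows, labels, target, rng=None):
    """Bring every class to `target` samples: undersample large classes,
    oversample small ones with smote_like."""
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = [], []
    for label, samples in by_class.items():
        if len(samples) > target:
            samples = rng.sample(samples, target)  # random undersampling
        elif len(samples) < target:
            samples = samples + smote_like(samples, target - len(samples), rng=rng)
        out_rows.extend(samples)
        out_labels.extend([label] * len(samples))
    return out_rows, out_labels


# Hypothetical performance-log rows: two features, "cpu" is the minority fault type.
rows = [[1.0, None], [3.0, 4.0], [10.0, 10.0], [11.0, 12.0], [12.0, 11.0], [13.0, 13.0]]
labels = ["cpu", "cpu", "normal", "normal", "normal", "normal"]
filled = fill_missing_by_class(rows, labels)
balanced_rows, balanced_labels = rebalance(filled, labels, target=3)
```

The balanced rows and labels would then be passed to a GBDT classifier; which library and hyperparameters the authors used is not stated in the abstract.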
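Abstract: The fluid catalytic cracking (FCC) unit is a highly nonlinear, strongly coupled multivariable system, and analysis based on data mining is a powerful tool for optimizing the process. Using real-time industrial production data from the distributed control system (DCS) and the laboratory information management system (LIMS) of a petrochemical enterprise, the authors selected 182 key influencing parameters according to the positive and negative correlation of each indicator with gasoline yield, industrial experience, and model-importance screening, and built a prediction model for FCC gasoline yield with the gradient boosting decision tree (GBDT) algorithm. A P-GBDT model was then built on the GBDT ensemble learning framework. It introduces feature perturbation and feature weights and increases the weights of empirically controllable parameters, which addresses the problem that an ordinary GBDT model has no preference among features and gives empirically controllable parameters small weights. The results show that the gasoline yield prediction model built with the P-GBDT algorithm is clearly better than the baseline GBDT model in accuracy, R^2, root mean square error, and other metrics, fits the actual yield more closely, and offers better guidance for optimizing the operating conditions of units that can actually be controlled.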
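The abstract does not say how P-GBDT applies feature perturbation and feature weights. One possible reading, shown here purely as an assumption, is that each boosting round sees a random subset of features drawn with probability proportional to a per-feature weight, so that controllable parameters are considered more often. The sketch uses depth-1 regression trees (stumps), squared-error loss, and hypothetical names and settings (`weighted_feature_subset`, `fit_boosted_stumps`, `lr`, `k`); it is not the authors' algorithm.

```python
import random


def weighted_feature_subset(weights, k, rng):
    """Draw k distinct feature indices, favouring features with larger weights."""
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        picked = rng.choices(candidates, weights=[weights[j] for j in candidates])[0]
        chosen.append(picked)
        candidates.remove(picked)
    return chosen


def fit_stump(X, residuals, features):
    """Find the one-threshold split over `features` that minimises squared error."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            error = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, j, threshold, lmean, rmean)
    return best  # None if no valid split exists


def fit_boosted_stumps(X, y, weights, n_rounds=20, lr=0.5, k=1, seed=0):
    """Gradient boosting on squared error; each round uses a weighted feature subset."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals, weighted_feature_subset(weights, k, rng))
        if stump is None:
            continue
        _, j, threshold, lmean, rmean = stump
        stumps.append((j, threshold, lmean, rmean))
        preds = [p + lr * (lmean if row[j] <= threshold else rmean) for row, p in zip(X, preds)]
    return base, stumps, lr


def predict(model, row):
    base, stumps, lr = model
    return base + sum(lr * (l if row[j] <= t else r) for j, t, l, r in stumps)


# Hypothetical data: feature 0 stands in for a controllable parameter and gets a larger weight.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0], [5.0, 0.0]]
y = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
model = fit_boosted_stumps(X, y, weights=[9.0, 1.0])
mse = sum((predict(model, row) - t) ** 2 for row, t in zip(X, y)) / len(y)
```

Raising a feature's weight only changes how often it is offered to a round; whether it is actually split on still depends on how much it reduces the residual error.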