Missing values are prevalent in real-world datasets and may reduce the predictive performance of a learning algorithm. Dissolved Gas Analysis (DGA), one of the most deployable methods for detecting and predicting incipient faults in power transformers, is one of the casualties. This paper therefore proposes filling in the missing values found in a DGA dataset using the k-nearest neighbor imputation method with two different distance metrics: Euclidean and Cityblock. Using these imputed datasets as inputs, the study then applies Support Vector Machine (SVM) to build models that classify transformer faults. Experimental results are provided to show the effectiveness of the proposed approach.
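The imputation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the use of `None` for a missing value, averaging the neighbours' values, and dividing distances by the number of co-observed features are all assumptions made for illustration. The SVM classification stage is not shown.

```python
import math


def knn_impute(rows, k=3, metric="euclidean"):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumptions (not from the abstract): missing values are marked as None,
    and distances use only the features observed in both rows, averaged over
    the number of such features so that rows with different overlap are
    comparable.
    """

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        if metric == "euclidean":
            return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))
        if metric == "cityblock":
            return sum(abs(x - y) for x, y in shared) / len(shared)
        raise ValueError(f"unknown metric: {metric}")

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has feature j observed.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [v for d, v in candidates[:k] if d < math.inf]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
    return imputed
```

Calling `knn_impute(rows, k=2, metric="cityblock")` on a list of equal-length rows returns a copy in which each `None` has been replaced where at least one comparable donor row exists; the original rows are left unchanged.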
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms as their main statistical tools were reviewed: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths and shortcomings of each algorithm were examined, and a conclusion was drawn as to which performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand, but it becomes significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning, its high computational complexity, the numerous NN architectures to choose from and the large number of training algorithms, most researchers recommend SVM and RF as easier and widely used methods that repeatedly achieve high accuracy and are often faster to implement.
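The two KNN drawbacks noted in this abstract (slowdown with data size and the difficulty of choosing K) can be seen in a minimal brute-force sketch. The function name `knn_predict` and the toy data in the usage note are illustrative assumptions, not taken from the reviewed literature.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Brute-force KNN: majority label among the k training points nearest to query."""
    # One distance per training point, so the cost of each prediction
    # grows linearly with the size of the training set.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With training points `(0, 0)`, `(1, 0)`, `(0, 1)` labelled "A" and `(0.4, 0.4)` labelled "B", the query `(0.5, 0.5)` is classified "B" for `k=1` but "A" for `k=3`, which shows how strongly the outcome can depend on K.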
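To address inaccurate photovoltaic (PV) power forecasting, a PV output forecasting method based on adaptive k-means and support vector regression (SVR) is proposed. First, k-means clustering and its improved variants are analyzed, the basic principles and application workflow of SVR are given, and the convex optimization model of SVR with a radial basis function is introduced. Then, combining adaptive k-means with SVR and drawing on the basic characteristics of PV output, the forecasting workflow and the statistical metrics used to evaluate the forecasts are analyzed. Finally, using irradiance data from Kunming, Yunnan as a practical case, the structure of the forecasting model is determined and forecasts are produced with three methods: k-means and SVR, ARMA, and ANN. The forecast statistics obtained with different clustering results and different algorithms are compared, verifying the effectiveness of the proposed method and providing an approach to PV output forecasting.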
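The clustering stage of such a pipeline can be sketched with plain k-means (Lloyd's algorithm). This is only an illustration under stated assumptions: the abstract does not say how the adaptive variant chooses or adjusts k, so k is fixed here; initializing centroids from the first k points is a simplification; and the function name `kmeans` is hypothetical. The per-cluster SVR fitting is not shown.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means on equal-length numeric vectors.

    Assumption: centroids start at the first k points, a naive deterministic
    choice used only to keep the sketch reproducible.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment
```

In a forecasting setting, each point might be one day's irradiance profile, so that days with similar weather fall into the same cluster and a separate regression model can then be trained on each cluster.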