Funding: Financial support from the National Key R&D Program of China (Grant No. 2020YFB1711100).
Abstract: To identify the primary causes of energy consumption fluctuations in the converter unit of an iron and steel plant, and to quantify the influence of individual factors, this study focuses on the critical components of the material and heat balance. An analysis of the interactions between these components and energy consumption identified six key factors: raw material composition, steel type, steel temperature, slag temperature, recycling practices, and operational parameters. Based on an equivalent energy consumption model, an integrated intelligent diagnostic model incorporating these factors was developed as a comprehensive assessment tool for converter energy consumption. The K-means clustering algorithm was applied to historical operating data from the converter to determine baseline values for key variables such as energy consumption and recovery rates. On this data-driven foundation, an online system for the intelligent diagnosis of converter energy consumption was built and deployed to improve the precision and efficiency of energy management. Applied to 2023 energy consumption data from a steel plant, the system's diagnostic analysis revealed significant differences in energy use between converter units. The most significant factor influencing the variation in energy consumption for both furnaces was the steel grade, with contributions of −0.550 and 0.379, respectively.
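The abstract does not say how baseline values are read off the clusters. The following is a minimal sketch, assuming one-dimensional per-heat energy-consumption values, plain Lloyd-style K-means with a deterministic spread initialization, and a hypothetical rule that the center of the lowest-consumption cluster serves as the baseline; none of these choices is taken from the paper.

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd-style K-means on scalar values; returns the cluster centers."""
    vs = sorted(values)
    # Deterministic initialization (an assumption): centers spread evenly over the sorted data.
    step = (len(vs) - 1) / max(k - 1, 1)
    centers = [vs[round(i * step)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Update step: mean of each group; an empty group keeps its old center.
        new_centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def baseline_value(values, k=3):
    """Hypothetical baseline rule: the center of the lowest-consumption cluster."""
    return min(kmeans_1d(values, k))


# Illustrative, made-up per-heat energy figures (not plant data).
print(baseline_value([10, 11, 12, 20, 21, 22, 30, 31, 32]))  # prints 11.0
```

Taking the lowest cluster center treats the best historically achieved operating regime as the target; a plant could equally pick a median cluster, which is a policy choice rather than part of K-means.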
Abstract: Cluster analysis is one of the major data analysis methods and is widely used in many practical applications in emerging areas of data mining. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends in the available data and to put those data to real-world use. This work evaluates the performance of two of the most widely used partition-based clustering algorithms, k-Means and k-Medoids. A state-of-the-art analysis of the two algorithms is carried out, and their performance is assessed in terms of clustering quality, execution time, and other components. The source data are telecommunication data: connection-oriented broadband data are used as input to evaluate the clustering quality of the algorithms, and the distance between server locations and their connections is used for clustering. The execution time of each algorithm is measured and the results are compared. The results of the comparison study are satisfactory for the chosen application.
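The comparison can be reproduced in miniature. The sketch below assumes 2-D points given as tuples, Euclidean distance, random initialization with a fixed seed, and the simple alternating (Voronoi-iteration) variant of k-Medoids rather than the full PAM swap search; the paper does not state which variants it used.

```python
import math
import random
import time


def _assign(points, centers):
    """Index of the nearest center (Euclidean) for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def k_means(points, k, iters=50, seed=0):
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = _assign(points, centers)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Center = coordinate-wise mean; an empty cluster keeps its old center.
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else centers[c])
        if new == centers:
            break
        centers = new
    return centers, labels


def k_medoids(points, k, iters=50, seed=0):
    """Alternating (Voronoi-iteration) k-Medoids; centers are always data points."""
    medoids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = _assign(points, medoids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c] or [medoids[c]]
            # Medoid = member with the smallest total distance to the others (O(n^2) per cluster).
            new.append(min(members, key=lambda m: sum(math.dist(m, q) for q in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, labels


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Usage: `(centers, labels), secs = timed(k_means, pts, 2)`. The quadratic medoid update is the usual reason k-Medoids runs slower than k-Means, in exchange for centers that are real data points and less sensitivity to outliers.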
Abstract: The goal of this study was to optimize the constitutive parameters of foundation soils using k-means clustering analysis. A database was compiled from unconfined compression tests, Proctor tests, and grain-size distribution tests on soils taken from three types of foundation pits: raft foundations, partial raft foundations, and strip foundations. The k-means clustering algorithm was then applied to determine the most appropriate foundation type given the unconfined compressive strengths and other parameters of the different soils.
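k-means groups samples by Euclidean distance, so when features carry different units (for example a strength in kPa next to a moisture content in %), the larger-valued feature dominates the clustering. A common precaution, assumed here rather than stated in the abstract, is to z-score each feature before clustering:

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize every feature column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    # A constant column gets a divisor of 1.0 so it maps to zeros instead of failing.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


# Hypothetical soil records: [unconfined compressive strength (kPa), optimum moisture content (%)].
soils = [[100, 10], [200, 20], [300, 30]]
print(zscore_columns(soils))
```

After standardization both columns contribute on the same scale, so the clusters reflect the pattern of soil properties rather than the unit of the largest one.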
Abstract: As the variety of application software in meteorological satellite ground systems grows, allocating hardware resources sensibly and improving software efficiency have received increasing attention. This paper proposes a software classification method based on software operating characteristics, which are described by the software's run-time resource consumption. First, principal component analysis (PCA) is used to reduce the dimensionality of the software running-feature data and to interpret the software characteristic information. Then, a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the classification is combined with the PCA results to interpret the integrated operating characteristics of each software class. The result serves as the basis for optimizing the allocation of hardware resources to software and improving the efficiency of software operation.
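A minimal sketch of the dimensionality-reduction step, assuming numeric per-program resource features (hypothetical examples: CPU time, memory, I/O volume) and keeping only the first principal component, found by power iteration on the sample covariance matrix. The resulting scores would then be passed to a clustering step; the paper's modified K-means is not reproduced here.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (unit direction, scores) of the first principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # center the data
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Scores: projection of each centered sample onto the component.
    scores = [sum(x[j] * v[j] for j in range(d)) for x in X]
    return v, scores
```

Power iteration keeps the example dependency-free; with NumPy available, an eigendecomposition or SVD of the centered data would normally be used instead and would return all components at once.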
Funding: National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).
Abstract: To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced, and the optimal number of clusters K_opt for a data set with unknown class information is determined by computing the silhouette coefficients of the objects in the clusters under different values of K. Then, the distribution of the data set is obtained through hierarchical clustering, and the initial cluster centers are determined from it. Finally, the clustering is completed with the traditional k-means algorithm. Theoretical analysis shows that the improved algorithm has reasonable computational complexity. Experimental results on the IRIS test data set show that the algorithm distinguishes the clusters reasonably, identifies outliers efficiently, and yields lower entropy.
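The K-selection step can be sketched as follows, assuming Euclidean distance and the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the rest of the point's own cluster and b(i) the smallest mean distance to another cluster, with singletons scoring 0. The clustering routine is passed in as a parameter (cluster_fn, a hypothetical name); the paper's hierarchical initialization is not shown.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (singletons contribute 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 for a singleton cluster
        # a(i): mean distance to the other members (p's distance to itself is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def best_k(points, cluster_fn, candidate_ks):
    """Return the K whose labels from cluster_fn(points, K) maximize the mean silhouette."""
    return max(candidate_ks, key=lambda k: silhouette(points, cluster_fn(points, k)))
```

Because the silhouette is undefined for K = 1, the candidate range has to start at 2; in practice `cluster_fn` would run the k-means variant with its chosen initial centers for each K.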
Funding: National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: To mine production and safety information from safety supervision data in coal mines, and to support safe production and decision-making, a clustering analysis algorithm for such data based on semantic description is studied. First, a hybrid semantic and numerical description method for coal mine safety supervision data is described. Second, similarity measures for semantic data and for numerical data are given separately, and a weighted hybrid similarity measure for semantically described safety supervision data is presented. Third, using this hybrid similarity measure as the distance criterion and drawing on grid-based methods, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a coal mine safety supervision data set validate the efficiency of the algorithm.
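The abstract does not give the similarity formulas. As a sketch only, the code below assumes range-normalized absolute differences for the numerical attributes, Jaccard similarity over sets of semantic labels for the descriptive attributes, and a single hypothetical weight w_numeric to combine them; 1 minus this similarity could then serve as the distance in a CURE-style clusterer.

```python
def numeric_similarity(x, y, ranges):
    """1 minus the mean range-normalized absolute difference (assumes values lie within ranges)."""
    diffs = [abs(a - b) / r if r else 0.0 for a, b, r in zip(x, y, ranges)]
    return 1.0 - sum(diffs) / len(diffs)


def semantic_similarity(tags_x, tags_y):
    """Jaccard similarity between sets of semantic labels (an assumed stand-in)."""
    if not tags_x and not tags_y:
        return 1.0
    return len(tags_x & tags_y) / len(tags_x | tags_y)


def hybrid_similarity(rec_x, rec_y, ranges, w_numeric=0.5):
    """Weighted sum of numeric and semantic similarity; w_numeric is a hypothetical weight."""
    (num_x, sem_x), (num_y, sem_y) = rec_x, rec_y
    return (w_numeric * numeric_similarity(num_x, num_y, ranges)
            + (1 - w_numeric) * semantic_similarity(sem_x, sem_y))


# Hypothetical records: ([gas concentration, airflow], {semantic labels}).
a = ([0.5, 100.0], {"gas", "high"})
b = ([0.7, 100.0], {"gas", "normal"})
print(hybrid_similarity(a, b, ranges=[1.0, 50.0]))
```

Both component similarities lie in [0, 1], so the weight directly expresses how much the numeric readings should count relative to the descriptive labels.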
Funding: State Grid Science and Technology Project "Research on Key Technologies for Prediction and Early Warning of Large-Scale Offshore Wind Power Ramp Events Based on Meteorological Data Enhancement" (4000-202318098A-1-1-ZN).
Abstract: Wind power clusters have grown in both scale and coverage, and the impact of weather fluctuations on changes in cluster output has become increasingly complex. Accurately identifying forward-looking information from key wind farms in a cluster under different weather conditions is an effective way to improve the accuracy of ultra-short-term cluster power forecasting. To this end, this paper proposes a refined modeling method for ultra-short-term wind power cluster forecasting based on the convergent cross-mapping algorithm. From a causal perspective, key meteorological forecasting factors are screened for different cluster power fluctuation processes, and a separate refined model is trained for each type of process. First, a wind process description index system and a classification model are established at the wind power cluster level to classify typical fluctuation processes. Next, a meteorology-to-cluster-power causality evaluation model based on convergent cross-mapping is proposed to screen meteorological forecasting factors for multiple types of typical fluctuation processes. Finally, a refined modeling method is proposed for the different typical fluctuation processes, in which the meteorological forecasting factors with strong causal influence in each scenario are used as inputs to achieve high-precision ultra-short-term forecasting of wind cluster power. A case study shows that the short-term wind power cluster forecasting accuracy of the proposed method reaches 88.55%, which is 1.57-7.32% higher than that of traditional methods.
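A compact sketch of convergent cross mapping (CCM) as introduced by Sugihara et al. (2012): embed the effect series y with time delays, find the E+1 nearest neighbours of each point on that shadow manifold, and use exponentially weighted neighbour values to estimate the candidate cause x. A high correlation between estimates and observations (the cross-map skill) that grows with series length is read as evidence that x causally influences y. The embedding dimension E and lag tau are illustrative defaults, not values from the paper, and the convergence check over increasing library lengths is omitted.

```python
import math


def embed(series, E, tau):
    """Time-delay embedding: the vector for time t is (s[t], s[t-tau], ..., s[t-(E-1)tau])."""
    start = (E - 1) * tau
    return [tuple(series[t - j * tau] for j in range(E)) for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def ccm_skill(x, y, E=2, tau=1):
    """Cross-map skill: how well the shadow manifold of y reconstructs x.

    High skill is read as evidence that x drives y, because x leaves
    its signature in the dynamics of y.
    """
    M = embed(y, E, tau)
    x_target = x[(E - 1) * tau:]  # x values aligned with the rows of M
    estimates = []
    for i, p in enumerate(M):
        # The E + 1 nearest neighbours of p on the manifold, excluding p itself.
        nbrs = sorted((math.dist(p, q), j) for j, q in enumerate(M) if j != i)[: E + 1]
        d1 = nbrs[0][0] or 1e-12  # guard against a zero nearest distance
        weights = [math.exp(-d / d1) for d, _ in nbrs]
        estimates.append(sum(w * x_target[j] for w, (_, j) in zip(weights, nbrs)) / sum(weights))
    return pearson(estimates, x_target)
```

In a screening setting like the one described, x would be a meteorological forecast series and y the cluster power series, and factors would be ranked by their cross-map skill; the brute-force neighbour search is O(n^2) and is only meant for short series.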
Funding: National Defense Basic Research Program (Grant No. JCKY2019411B001); National Key Research and Development Program (Grant No. 2022YFC3601305); Key R&D Projects of the Jilin Provincial Science and Technology Department (Grant No. 20210203218SF).
Abstract: The Coordinate Descent Method for K-means (CDKM) is an improved K-means algorithm that finds better locally optimal solutions than the original K-means algorithm, that is, solutions with smaller objective function values. However, CDKM is sensitive to initialization, which can leave the K-means objective function value larger than necessary. Since suitable initial centers cannot always be selected, this paper proposes a novel algorithm that modifies the CDKM procedure. The proposed algorithm first obtains the partition matrix with CDKM and then refines it using a split-merge criterion designed to reduce the objective function value further. The split-merge criterion reduces the objective function value as much as possible while keeping the number of clusters unchanged. Because all operations are performed on the partition matrix alone, the algorithm avoids the distance calculations of the traditional K-means algorithm. Experiments on ten UCI datasets show that the solution accuracy of the proposed algorithm, measured by the E value, is 11.29% better than that of CDKM, and that the algorithm retains its efficiency advantage on high-dimensional datasets. Compared with the other tested improved K-means algorithms, the proposed algorithm finds better locally optimal solutions in less run time.
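Optimizing K-means from the partition matrix alone rests on a standard identity: with cluster sums s_j and sizes n_j, the objective equals the total squared norm of the data minus the sum of ||s_j||^2 / n_j. The sketch below evaluates it from a label vector (a compact encoding of the partition matrix) without forming any centers; it illustrates the bookkeeping only and is not the paper's split-merge criterion.

```python
def kmeans_objective_from_partition(X, labels, k):
    """Sum of squared distances to cluster means, computed from the partition only.

    Identity used: sum_i ||x_i||^2 - sum_j ||s_j||^2 / n_j, where s_j is the
    coordinate-wise sum and n_j the size of cluster j. No center or
    point-to-center distance is ever formed.
    """
    d = len(X[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x, lab in zip(X, labels):
        counts[lab] += 1
        for t in range(d):
            sums[lab][t] += x[t]
    total_sq = sum(v * v for x in X for v in x)
    return total_sq - sum(sum(v * v for v in s) / n for s, n in zip(sums, counts) if n)


X = [[0, 0], [2, 0], [10, 10], [12, 10]]
print(kmeans_objective_from_partition(X, [0, 0, 1, 1], 2))  # 4.0
print(kmeans_objective_from_partition(X, [0, 1, 0, 1], 2))  # 200.0
```

Because only s_j and n_j appear, the effect of moving one point between clusters (a coordinate-descent step) or of splitting and merging clusters can be computed by updating a few sums instead of recomputing distances.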
Abstract: Efficient, iterative unsupervised machine learning based on probabilistic clustering with the expectation-maximization (EM) algorithm is applied to classify reservoir facies using latent and observable well-log variables from a clastic reservoir in the Majnoon oilfield, southern Iraq. The observable well-log variables consist of conventional open-hole well-log data and computer-processed interpretations of gamma ray, bulk density, neutron porosity, compressional sonic, deep resistivity, shale volume, total porosity, and water saturation from three wells in the Nahr Umr reservoir. The latent variables include shale volume and water saturation. The EM algorithm characterizes electrofacies through iterative machine learning that identifies local maximum likelihood estimates (MLE) of the observable and latent variables in the dataset. The optimized EM model successfully predicts the core-derived facies classification in two of the studied wells. It clusters the data into three distinct reservoir electrofacies (F1, F2, and F3). F1 is a gas-bearing electrofacies with low shale volume (Vsh) and water saturation (Sw) and high porosity and permeability, which makes it an attractive reservoir target. The EM model results are validated with nuclear magnetic resonance (NMR) data from the third well, from which no cores were recovered; the NMR results confirm the effectiveness and accuracy of the EM model in predicting electrofacies. Using the EM algorithm for electrofacies classification and cluster analysis is innovative: the clusters it establishes are less rigidly constrained than those derived from the more commonly used K-means clustering method. The EM methodology generates dependable electrofacies estimates in reservoir intervals where core samples are not available. Therefore, once calibrated with core data in some wells, the model is suitable for application to other wells that lack core data.
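For readers unfamiliar with EM clustering, a minimal one-dimensional Gaussian-mixture sketch follows. A single feature, a fixed component count k, quantile-based initialization, and a small variance floor are simplifying assumptions; the study itself works with several logs jointly. Unlike K-means, each sample receives soft membership probabilities, which is the sense in which EM clusters are less rigidly constrained.

```python
import math


def gmm_em_1d(xs, k=2, iters=100):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances, resp)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialization (an assumption): means at spread quantiles, one shared variance.
    means = [xs_sorted[int((i + 0.5) * n / k)] for i in range(k)]
    mu0 = sum(xs) / n
    variances = [sum((x - mu0) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - mu) ** 2 / (2 * v)
                    for w, mu, v in zip(weights, means, variances)]
            top = max(logs)
            dens = [math.exp(l - top) for l in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            # Variance floor (an assumption) keeps a component from collapsing.
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances, resp
```

A hard electrofacies label can be taken as the component with the largest responsibility, while the responsibilities themselves indicate how confidently a depth sample belongs to that facies.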
Funding: Fundamental Research Funds for the Central Universities (22120240094); Humanities and Social Science Fund of the Ministry of Education of China (22YJA630082).
Abstract: Failure mode and effect analysis (FMEA) is a preventative risk evaluation method used to evaluate and eliminate failure modes within a system. However, the traditional FMEA method has many deficiencies that pose challenges in practical applications. Many modified FMEA models have been proposed to improve on it, but most of them inadequately address consensus issues and focus on producing a complete ranking of failure modes. In this research, we propose a new FMEA approach that integrates a two-stage consensus reaching model and a density peak clustering algorithm to assess and cluster failure modes. First, interval 2-tuple linguistic variables (I2TLVs) are used to express the uncertain risk evaluations provided by FMEA experts. Then, a two-stage consensus reaching model is adopted to enable the FMEA experts to reach consensus. Next, failure modes are grouped into several risk clusters using a density peak clustering algorithm. Finally, the proposed FMEA is illustrated by a case study of load-bearing guidance devices in subway systems. The results show that the I2TLVs make it easier for the proposed model to describe uncertain risk information about failure modes; that introducing an endogenous and an exogenous feedback mechanism accelerates consensus reaching; and that density peak clustering of failure modes improves the practical applicability of FMEA.
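Density peak clustering (Rodriguez and Laio, 2014) ranks points by local density rho and by delta, the distance to the nearest point of higher density; points with a large rho times delta become cluster centers, and every other point inherits the label of its nearest higher-density neighbour. The minimal sketch below works on numeric points and assumes a cutoff kernel with a user-chosen d_c and a fixed number of clusters; the paper applies the method to failure modes expressed as I2TLVs, which would first need a suitable distance.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Density peak clustering; returns (labels, rho, delta)."""
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: neighbours closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and D[i][j] < d_c) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density, ties by index
    delta, nearest_higher = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(D[i])  # convention for the global density peak
            continue
        # Nearest point ranked ahead of i in density (ties broken by the ordering).
        j = min(order[:rank], key=lambda m: D[i][m])
        delta[i], nearest_higher[i] = D[i][j], j
    # Centers: the global peak (it has no higher-density neighbour) plus the
    # remaining points with the largest gamma = rho * delta.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Assign the rest in decreasing density, so each neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

In the original method the number of centers is usually picked by inspecting a decision graph of rho against delta; fixing n_clusters here keeps the sketch non-interactive.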
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement and requires only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that the scheme can substantially speed up the k-means algorithm, reducing both the total number of distance calculations and the overall computation time.
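One way to realize the information kept between iterations is to store each point's distance to its assigned center: after the centers move, if that distance has not increased, the point keeps its cluster and the distances to the other k - 1 centers are skipped. The sketch below assumes this rule and counts distance evaluations; it is a heuristic, because a point that stays put is not guaranteed to be nearest to its own center.

```python
import math


def enhanced_kmeans(points, init_centers, iters=100):
    """Returns (centers, labels, number_of_distance_evaluations)."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    evals = 0
    labels, best = [], []
    for p in points:  # initial assignment: full search over all k centers
        ds = [math.dist(p, c) for c in centers]
        evals += k
        j = min(range(k), key=ds.__getitem__)
        labels.append(j)
        best.append(ds[j])  # remembered for the next iteration
    for _ in range(iters):
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
        for i, p in enumerate(points):
            d_own = math.dist(p, centers[labels[i]])
            evals += 1
            if d_own <= best[i]:
                best[i] = d_own  # own center did not move away: skip the other k - 1
                continue
            ds = [math.dist(p, c) for c in centers]
            evals += k
            labels[i] = min(range(k), key=ds.__getitem__)
            best[i] = ds[labels[i]]
    return centers, labels, evals
```

On well-separated data most points pass the one-distance check after the first few iterations, which is where the savings in distance calculations come from.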