To address the challenge of identifying the primary causes of energy consumption fluctuations and accurately assessing the influence of various factors in the converter unit of an iron and steel plant, the focus is placed on the critical components of material and heat balance. Through an analysis of the interactions between these components and energy consumption, six pivotal factors have been identified: raw material composition, steel type, steel temperature, slag temperature, recycling practices, and operational parameters. Using a framework based on an equivalent energy consumption model, an integrated intelligent diagnostic model has been developed that encapsulates these factors, providing a comprehensive assessment tool for converter energy consumption. Using the K-means clustering algorithm, historical operational data from the converter have been analyzed to determine baseline values for essential variables such as energy consumption and recovery rates. Building on this data-driven foundation, an online system for the intelligent diagnosis of converter energy consumption has been developed and implemented, enhancing the precision and efficiency of energy management. When applied to energy consumption data from a steel plant in 2023, the diagnostic analysis performed by the system exposed significant variations in energy usage across different converter units. The analysis revealed that the most significant factor influencing the variation in energy consumption for both furnaces was the steel grade, with contributions of −0.550 and 0.379.
To enhance the rationality of the layout of electric vehicle charging stations, meet the actual needs of users, and optimise the service range and coverage efficiency of charging stations, this paper proposes an optimisation strategy for the layout of electric vehicle charging stations that integrates Mini Batch K-Means and simulated annealing algorithms. By constructing a circle-like service area model with the charging station as the centre and a certain distance as the radius, the maximum coverage of electric vehicle charging stations in the region and the influence of different regional environments on charging demand are considered. Based on real data from electric vehicle charging stations in Nanjing, Jiangsu Province, the proposed model is used to optimise the layout of charging stations in the study area. The results show that the optimisation strategy incorporating Mini Batch K-Means and simulated annealing algorithms outperforms the existing charging station layouts in terms of coverage and the number of stations served. Compared to the original layouts, the optimised charging station layouts have flatter Lorentzian curves and are closer to the average distribution. The proposed optimisation strategy not only improves the service efficiency and user satisfaction of EV (Electric Vehicle) charging stations but also provides a reference for the layout optimisation of EV charging stations in other cities, which has important practical value and promotion potential.
To address two drawbacks of the k-means algorithm, namely that the number of clusters in a data set must be known in advance and that the result is sensitive to the choice of initial clustering centers, an improved k-means clustering algorithm is proposed. First, the concept of a silhouette coefficient is introduced, and the optimal clustering number Kopt of a data set with unknown class information is determined by calculating the silhouette coefficient of objects in clusters under different K values. Then the distribution of the data set is obtained through hierarchical clustering and the initial clustering centers are determined. Finally, the clustering is completed by traditional k-means clustering. Theoretical analysis shows that the improved k-means clustering algorithm has proper computational complexity. Experimental results on the IRIS testing data set show that the algorithm can distinguish different clusters reasonably and recognize outliers efficiently, and that the entropy generated by the algorithm is lower.
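The three steps described in this abstract (score candidate K values by silhouette coefficient, seed the centers from a hierarchical clustering, then run ordinary k-means) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the centroid-linkage merge rule, the function names, and the toy data are all assumptions, since the abstract does not specify them.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def hierarchical_centers(points, k):
    """Agglomerative merging down to k clusters (centroid linkage is an assumption)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(centroid(clusters[ij[0]]),
                                              centroid(clusters[ij[1]])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations from the given initial centers; returns labels."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mean_silhouette(points, labels):
    """Average silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return the K with the highest mean silhouette, plus all scores."""
    scores = {}
    for k in k_values:
        labels = kmeans(points, hierarchical_centers(points, k))
        scores[k] = mean_silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of three points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
          (10.0, 0.0), (10.1, 0.3), (9.8, 0.1)]
k_opt, scores = choose_k(points, range(2, 6))
print(k_opt, scores)
```

The pairwise merge loop is cubic in the number of points, which is acceptable for a small illustration but would need a proper linkage implementation for real data sets.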
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next iteration. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, in terms of both the total number of distance calculations and the overall computation time.
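The objective and the idea of carrying information between iterations can be illustrated with a short sketch. The abstract does not say what is stored, so the rule used here is an assumption: each point caches the squared distance to its assigned center, and if the updated center is no farther than the cached value, the point keeps its cluster without a full search over all k centers. All names are hypothetical. Note that this shortcut is a heuristic, so it may not reproduce exactly the assignments of a full Lloyd iteration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_sq_error(points, centers, labels):
    """The k-means objective: mean squared distance to the assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)

def update_centers(points, labels, centers):
    """Recompute each center as the mean of its members (empty clusters keep theirs)."""
    new = []
    for j, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)
    return new

def kmeans_cached(points, centers, max_iter=100):
    """k-means with a per-point cache of the last distance (assumed shortcut rule)."""
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    cached = [math.inf] * len(points)  # squared distance kept from the previous iteration
    n_dist = 0                         # number of distance evaluations performed
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if labels[i] is not None:
                d = sq_dist(p, centers[labels[i]])
                n_dist += 1
                if d <= cached[i]:
                    cached[i] = d      # center moved closer: keep the assignment
                    continue
            dists = [sq_dist(p, c) for c in centers]
            n_dist += len(centers)
            best = min(range(len(centers)), key=dists.__getitem__)
            if best != labels[i]:
                changed = True
            labels[i], cached[i] = best, dists[best]
        new_centers = update_centers(points, labels, centers)
        if not changed and new_centers == centers:
            break
        centers = new_centers
    return centers, labels, n_dist

# Toy data (hypothetical): two groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, n_dist = kmeans_cached(points, [(0, 0), (10, 10)])
print(labels, mean_sq_error(points, centers, labels), n_dist)
```

The counter n_dist makes the saving visible: points that keep their cluster cost one distance evaluation in an iteration instead of k.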
Fractional vegetation cover (FVC) is an important parameter to measure crop growth. In studies of crop growth monitoring, it is very important to extract FVC quickly and accurately. As the most widely used FVC extraction method, the photographic method has the advantages of simple operation and high extraction accuracy. However, when soil moisture and acquisition times vary, the extraction results are less accurate. To accommodate various conditions of FVC extraction, this study proposes a new FVC extraction method that extracts FVC from a normalized difference vegetation index (NDVI) greyscale image of wheat by using a density peak k-means (DPK-means) algorithm. In this study, Yangfumai 4 (YF4) planted in pots and Yangmai 16 (Y16) planted in the field were used as the research materials. With a hyperspectral imaging camera mounted on a tripod, ground hyperspectral images of winter wheat under different soil conditions (dry and wet) were collected at 1 m above the potted wheat canopy. Unmanned aerial vehicle (UAV) hyperspectral images of winter wheat at various stages were collected at 50 m above the field wheat canopy by a UAV equipped with a hyperspectral camera. The pixel dichotomy method and the DPK-means algorithm were used to classify vegetation pixels and non-vegetation pixels in NDVI greyscale images of wheat, and the extraction effects of the two methods were compared and analysed. The results showed that extraction by pixel dichotomy was influenced by the acquisition conditions and its error distribution was relatively scattered, while the extraction effect of the DPK-means algorithm was less affected by the acquisition conditions and its error distribution was concentrated. The absolute values of error were 0.042 and 0.044, the root mean square errors (RMSE) were 0.028 and 0.030, and the fitting accuracy R^2 of the FVC was 0.87 and 0.93, under dry and wet soil conditions and under various time conditions, respectively. This study found that the DPK-means algorithm was capable of achieving more accurate results than the pixel dichotomy method in various soil and time conditions and was an accurate and robust method for FVC extraction.
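The basic idea of separating vegetation from non-vegetation pixels by clustering NDVI values can be sketched in a few lines. This is only an illustration under stated assumptions: NDVI is computed with the standard formula (NIR - Red) / (NIR + Red), the clustering is an ordinary one-dimensional two-cluster k-means seeded at the minimum and maximum NDVI, and that seeding stands in for the density-peak initialization of DPK-means, which the abstract does not detail. Function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def two_means_threshold(values, iters=100):
    """1-D k-means with k=2; returns the midpoint between the two cluster means."""
    lo, hi = min(values), max(values)  # stand-in for density-peak initial centers
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def fvc(nir_band, red_band):
    """Fraction of pixels whose NDVI falls in the higher (vegetation) cluster."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    threshold = two_means_threshold(values)
    return sum(v > threshold for v in values) / len(values)

# Hypothetical reflectances: three vegetation-like pixels, two soil-like pixels.
nir = [0.80, 0.70, 0.75, 0.30, 0.25]
red = [0.10, 0.10, 0.15, 0.25, 0.20]
print(fvc(nir, red))
```

A real image would be flattened to per-pixel band values before calling fvc; the clustering itself does not depend on the image layout.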
A high-precision nominal flight profile that incorporates controllers' intentions is critical for 4D trajectory estimation in modern automatic air traffic control systems. We proposed a novel method to effectively improve the accuracy of the nominal flight profile, including the nominal altitude profile and the speed profile. First, considering the characteristics of trajectory data, we developed an improved K-means algorithm. The approach measures the similarity between different altitude profiles by integrating the space warp edit distance algorithm, thereby yielding several fitted nominal flight altitude profiles. This approach breaks the constraints of traditional K-means algorithms. Second, to eliminate the influence of meteorological factors, we introduced historical gridded binary data to determine the en-route wind speed and temperature via inverse distance weighted interpolation. Finally, we used the true airspeed determined by speed triangle relationships and the calibrated airspeed determined by the aircraft data model to extract a more accurate nominal speed profile from each cluster, so that the airspeed profiles above and below the airspeed transition altitude could be described separately. Our experimental results showed that the proposed method could obtain a highly accurate nominal flight profile, which reflects the actual aircraft flight status.
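Inverse distance weighted interpolation, the step used here to estimate en-route wind speed and temperature from gridded data, can be sketched as below. This is a generic two-dimensional version under assumptions: planar coordinates, a power parameter of 2, and hypothetical sample values. The abstract does not state the exponent, the neighbourhood size, or the coordinate handling the authors used.

```python
import math

def idw(samples, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    samples: list of ((x, y), value) pairs, e.g. grid nodes with wind speed.
    query:   (x, y) location where the value is wanted.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value          # query coincides with a sample point
        w = 1.0 / d ** power      # closer samples receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical grid nodes with wind speeds (units arbitrary).
grid = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw(grid, (1.0, 0.0)))
```

Raising the power makes the estimate follow the nearest sample more closely, while a lower power gives a smoother blend of all samples.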
With the increasing variety of application software in meteorological satellite ground systems, how to provide reasonable hardware resources and improve software efficiency is receiving more and more attention. In this paper, a software classification method based on software operating characteristics is proposed. The method uses run-time resource consumption to describe the running characteristics of the software. First, principal component analysis (PCA) is used to reduce the dimension of the software running feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the classification is combined with the results of the principal component analysis to explain the significance of the operating characteristics of each software class. The result is used as the basis for optimizing the allocation of hardware resources to software and improving the efficiency of software operation.
The K-means method is one of the most widely used clustering methods and has been implemented in many fields of science and technology. One of the major problems of the k-means algorithm is that it may produce empty clusters depending on the initial center vectors. Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary principles of natural selection and genetics. This paper presents a hybrid version of the k-means algorithm with GAs that efficiently eliminates this empty cluster problem. Results of simulation experiments using several data sets support this claim.
Classification systems such as Slope Mass Rating (SMR) are currently used to undertake slope stability analysis. In the SMR classification system, data are allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in determining the class boundaries developed by the SMR system, the system's classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in the SMR classification system, no prior experience-based judgment was made on the number of extracted classes, and only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm effectively avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation of clustering results caused by the selection of initial points, and thus a better clustering quality can be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
Several pests feed on leaves, stems, bases, and the entire plant, causing plant illnesses. As a result, it is vital to identify and eliminate the disease before it causes any damage to plants. Manually detecting and treating plant disease is challenging. Because manual detection requires much effort and a long processing time, image processing is employed to detect plant disease. The main goal of this study is to identify the disease that affects the plants by creating an image processing system that can recognize and classify four different forms of plant disease: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify the plant disease through steps including image acquisition, pre-processing, segmentation, feature extraction, and classification. The gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique can correctly detect and diagnose plant disease with 97.2 percent accuracy.
The dimensionality of data is increasing very rapidly, which creates challenges for most of the current mining and learning algorithms, such as large memory requirements and high computational costs. The literature includes much research on feature selection for supervised learning. However, feature selection for unsupervised learning has only recently been studied. Finding the subset of features in unsupervised learning that enhances the performance is challenging since the clusters are indeterminate. This work proposes a hybrid technique for unsupervised feature selection called GAk-MEANS, which combines the genetic algorithm (GA) approach with the classical k-Means algorithm. In the proposed algorithm, a new fitness function is designed in addition to new smart crossover and mutation operators. The effectiveness of this algorithm is demonstrated on various datasets. Furthermore, the performance of GAk-MEANS has been compared with other genetic algorithms, such as the genetic algorithm using the Sammon Error Function and the genetic algorithm using the Sum of Squared Error Function. Additionally, the performance of GAk-MEANS is compared with the state-of-the-art statistical unsupervised feature selection techniques. Experimental results show that GAk-MEANS consistently selects subsets of features that result in better classification accuracy compared to others. In particular, GAk-MEANS is able to significantly reduce the size of the subset of selected features by an average of 86.35% (72%–96.14%), which leads to an increase of the accuracy by an average of 3.78% (1.05%–6.32%) compared to using all features. When compared with the genetic algorithm using the Sammon Error Function, GAk-MEANS is able to reduce the size of the subset of selected features by 41.29% on average, improve the accuracy by 5.37%, and reduce the time by 70.71%. When compared with the genetic algorithm using the Sum of Squared Error Function, GAk-MEANS on average is able to reduce the size of the subset of selected features by 15.91% and improve the accuracy by 9.81%, but the time is increased by a factor of 3. When compared with the machine-learning based methods, we observed that GAk-MEANS is able to increase the accuracy by 13.67% on average with an 88.76% average increase in time.
The K-means algorithm is widely known for its simplicity and speed in text clustering. However, the selection of the initial clustering center in the traditional K-means algorithm is somewhat random, and the clustering results therefore fluctuate and are unstable, being strongly affected by the initial clustering center. This paper proposes an algorithm for selecting the initial clustering center that eliminates the uncertainty of central point selection. The experimental results show that the improved K-means clustering algorithm is superior to the traditional algorithm.
Cluster analysis is one of the major data analysis methods widely used for many practical applications in emerging areas of data mining. A good clustering method will produce high quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends of available data and its uses for the real world. This research work is carried out to assess the performance of two of the most representative partition-based clustering algorithms, namely k-Means and k-Medoids. A state-of-the-art analysis of these two algorithms is implemented, and performance is analyzed based on the quality of their clustering results, their execution time, and other components. Telecommunication data is the source data for this analysis. Connection-oriented broadband data is given as input to assess the clustering quality of the algorithms. The distance between the server locations and their connections is considered for clustering. The execution time of each algorithm is analyzed and the results are compared with one another. The results of the comparison study are satisfactory for the chosen application.
The Bat algorithm, a metaheuristic optimization technique inspired by the foraging behaviour of bats, has been employed to tackle optimization problems. Known for its ease of implementation, parameter tunability, and strong global search capabilities, this algorithm finds application across diverse optimization problem domains. However, in the face of increasingly complex optimization challenges, the Bat algorithm encounters certain limitations, such as slow convergence and sensitivity to initial solutions. To tackle these challenges, the present study incorporates a range of optimization components into the Bat algorithm, thereby proposing a variant called PKEBA. A projection screening strategy is implemented to mitigate its sensitivity to initial solutions, thereby enhancing the quality of the initial solution set. A kinetic adaptation strategy reforms exploration patterns, while an elite communication strategy enhances group interaction, to keep the algorithm from becoming trapped in local optima. Subsequently, the effectiveness of the proposed PKEBA is rigorously evaluated. Testing encompasses 30 benchmark functions from IEEE CEC2014, featuring ablation experiments and comparative assessments against classical algorithms and their variants. Moreover, real-world engineering problems are employed as further validation. The results demonstrate that PKEBA exhibits superior convergence and precision compared to existing algorithms.
In wireless sensor networks, a cluster architecture is useful because of its inherent suitability for data fusion. In this paper we present a new approach called Multiple Parameter based Clustering (MPC), embedded in the traditional k-means algorithm, which takes different parameters (node energy level, Euclidean distance from the base station, RSSI, and latency of data to reach the base station) into consideration to form clusters. The effectiveness of the clusters is then evaluated based on the uniformity of the node distribution, the node range per cluster, the intra- and inter-cluster distances, and the required energy level of each centroid. Our results show that by varying multiple parameters we can create clusters with more uniformly distributed nodes, minimize the intra-cluster distance, maximize the inter-cluster distance, and elect a less power-consuming centroid.
The bat algorithm (BA) is a metaheuristic algorithm for global optimisation that simulates the echolocation behaviour of bats with varying pulse rates of emission and loudness, which can be used to find the globally optimal solutions for various optimisation problems. Despite recent criticism of the originality of its equations, the principle of BA is concise and easy to implement, and its mathematical structure can be seen as a hybrid of particle swarm optimisation and simulated annealing. In this research, the authors focus on the performance optimisation of BA as a solver rather than discussing its originality issues. In terms of operation effect, BA has an acceptable convergence speed. However, due to the low proportion of time used to explore the search space, it is easy to converge prematurely and fall into local optima. The authors propose an adaptive multi-stage bat algorithm (AMSBA). By tuning the algorithm's focus at three different stages of the search process, AMSBA can achieve a better balance between exploration and exploitation and improve its exploration ability by enhancing its performance in escaping local optima while maintaining a certain convergence speed. Therefore, AMSBA can achieve solutions of better quality. A convergence analysis was conducted to demonstrate the global convergence of AMSBA. The authors also perform simulation experiments on 30 benchmark functions from IEEE CEC 2017 as the objective functions and compare AMSBA with some original and improved swarm-based algorithms. The results verify the effectiveness and superiority of AMSBA. AMSBA is also compared with eight representative optimisation algorithms on 10 benchmark functions derived from IEEE CEC 2020, with this experiment carried out on five different dimensions of the objective functions. A balance and diversity analysis was performed on AMSBA to demonstrate its improvement over the original BA in terms of balance. AMSBA was also applied to the multi-threshold image segmentation of Citrus Macular disease, which is a bacterial infection that causes lesions on citrus trees. The segmentation results were analysed by comparing each comparative algorithm's peak signal-to-noise ratio, structural similarity index and feature similarity index. The results show that the proposed BA-based algorithm has apparent advantages, and it can effectively segment the disease spots from citrus leaves when the segmentation threshold is at a low level. Based on a comprehensive study, the authors think the proposed optimiser has mitigated the main drawbacks of the BA, and it can be utilised as an effective optimisation tool.
Variable-fidelity (VF) surrogate models have received increasing attention in engineering design optimization as they can approximate expensive high-fidelity (HF) simulations with reduced computational power. A key challenge in building a VF model is devising an adaptive model updating strategy that jointly selects additional low-fidelity (LF) and/or HF samples. The additional samples must enhance the model accuracy while maximizing the computational efficiency. We propose ISMA-VFEEI, a global optimization framework that integrates an Improved Slime-Mould Algorithm (ISMA) and a Variable-Fidelity Expected Extension Improvement (VFEEI) learning function to construct a VF surrogate model efficiently. First, a cost-aware VFEEI function guides the adaptive LF/HF sampling by explicitly incorporating evaluation cost and existing sample proximity. Second, ISMA is employed to solve the resulting non-convex optimization problem and identify global optimal infill points for model enhancement. The efficacy of ISMA-VFEEI is demonstrated through six numerical benchmarks and one real-world engineering case study. In the engineering case study of a high-speed railway Electric Multiple Unit (EMU), the optimization objective of a sanding device attained a minimum value of 1.546 using only 20 HF evaluations, outperforming all the compared methods.
Solar radio burst (SRB) is one of the main natural interference sources of Global Positioning System (GPS) signals and can reduce the signal-to-noise ratio (SNR), directly affecting the tracking performance of GPS receivers. In this paper, a tracking algorithm based on the adaptive Kalman filter (AKF) with carrier-to-noise ratio estimation is proposed and compared with the conventional second-order phase-locked loop tracking algorithms and the improved Sage-Husa adaptive Kalman filter (SHAKF) algorithm. It is found that when SRBs occur, the improved SHAKF and the AKF with carrier-to-noise ratio estimation enable stable tracking of loop signals, whereas the conventional second-order phase-locked loop tracking algorithms fail to track the receiver signal. The standard deviation of the carrier phase error of the AKF with carrier-to-noise ratio estimation outperforms that of the improved SHAKF algorithm by 50.51%, showing less fluctuation and better stability. The proposed algorithm is shown to have better adaptability in the severe environment caused by SRB occurrence and better tracking performance.
Funding (converter energy consumption diagnosis study): financial support from the National Key R&D Program of China (Grant No. 2020YFB1711100).
Funding (electric vehicle charging station layout study): supported by the Jiangsu Provincial College Students Innovation and Entrepreneurship Training Plan Project (grant number 202411276037Z) and the Nanjing Institute of Technology Fund for Research Startup Projects of Introduced Talents (grant number TB202406012).
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).
Abstract: To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the concept of a silhouette coefficient is introduced, and the optimal number of clusters Kopt for a data set with unknown class information is determined by calculating the silhouette coefficient of objects in clusters under different K values. Then the distribution of the data set is obtained through hierarchical clustering and the initial cluster centers are determined. Finally, the clustering is completed by traditional k-means clustering. Theoretical analysis shows that the improved k-means clustering algorithm has reasonable computational complexity. Experimental results on the Iris test data set show that the algorithm can distinguish different clusters reasonably and recognize outliers efficiently, and that the entropy generated by the algorithm is lower.
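The silhouette-based choice of K described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, and a farthest-point initialisation stands in for the hierarchical-clustering initialisation that the abstract describes.

```python
import math

def kmeans(points, k, iters=50):
    # Farthest-point initialisation: a deterministic stand-in (assumption),
    # NOT the hierarchical-clustering step used in the abstract.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_k(points, k_values):
    # Pick the K whose clustering has the highest mean silhouette.
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))

# Hypothetical data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
chosen_k = best_k(points, range(2, 6))
```

On well-separated data like this the silhouette peaks at the true number of groups; on real data the curve is often flatter and several K values may score similarly.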
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, reducing both the total number of distance calculations and the overall computation time.
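The objective stated in the abstract above, the mean squared distance from each point to its nearest center, can be written down directly. The sketch below shows that objective together with one iteration of standard (Lloyd-style) k-means; it does not reproduce the enhanced variant, whose bookkeeping is not specified in the abstract. All names and sample values are illustrative.

```python
import math

def mean_squared_distance(points, centers):
    """k-means objective: mean squared distance to the nearest center."""
    return sum(
        min(math.dist(p, c) ** 2 for c in centers) for p in points
    ) / len(points)

def lloyd_step(points, centers):
    """One standard k-means iteration: assign points, then recompute centers."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    # A center with no assigned points is left where it was.
    return [
        tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
        for g, c in zip(groups, centers)
    ]

# Hypothetical data: four corners of a rectangle, two poorly placed centers.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
centers = [(0.0, 0.0), (1.0, 0.0)]
updated = lloyd_step(points, centers)
before = mean_squared_distance(points, centers)
after = mean_squared_distance(points, updated)
```

Each such step cannot increase the objective, which is why the iteration converges; speed-ups of the kind the abstract describes aim to avoid recomputing distances that cannot change the assignment.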
Funding: Supported by the Beijing Natural Science Foundation, China (4202066); the Central Public-interest Scientific Institution Basal Research Fund, China (JBYWAII-2020-29 and JBYW-AII-2020-31); the Key Research and Development Program of Hebei Province, China (19227407D); and the Technology Innovation Project Fund of Chinese Academy of Agricultural Sciences (CAAS-ASTIP2020-All).
Abstract: Fractional vegetation cover (FVC) is an important parameter for measuring crop growth. In studies of crop growth monitoring, it is very important to extract FVC quickly and accurately. As the most widely used FVC extraction method, the photographic method has the advantages of simple operation and high extraction accuracy. However, when soil moisture and acquisition times vary, the extraction results are less accurate. To accommodate various conditions of FVC extraction, this study proposes a new method that extracts FVC from a normalized difference vegetation index (NDVI) greyscale image of wheat by using a density peak k-means (DPK-means) algorithm. In this study, Yangfumai 4 (YF4) planted in pots and Yangmai 16 (Y16) planted in the field were used as the research materials. With a hyperspectral imaging camera mounted on a tripod, ground hyperspectral images of winter wheat under different soil conditions (dry and wet) were collected at 1 m above the potted wheat canopy. Unmanned aerial vehicle (UAV) hyperspectral images of winter wheat at various stages were collected at 50 m above the field wheat canopy by a UAV equipped with a hyperspectral camera. The pixel dichotomy method and the DPK-means algorithm were used to classify vegetation pixels and non-vegetation pixels in NDVI greyscale images of wheat, and the extraction effects of the two methods were compared and analysed. The results showed that extraction by pixel dichotomy was influenced by the acquisition conditions and its error distribution was relatively scattered, while the extraction effect of the DPK-means algorithm was less affected by the acquisition conditions and its error distribution was concentrated. The absolute values of error were 0.042 and 0.044, the root mean square errors (RMSE) were 0.028 and 0.030, and the fitting accuracy R² of the FVC was 0.87 and 0.93, under dry and wet soil conditions and under various time conditions, respectively. This study found that the DPK-means algorithm was capable of achieving more accurate results than the pixel dichotomy method in various soil and time conditions and was an accurate and robust method for FVC extraction.
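The pixel-classification step described above can be sketched as a two-cluster split of NDVI values, with FVC taken as the share of pixels in the higher-NDVI cluster. This is an assumption-laden illustration: it uses plain two-means seeded at the minimum and maximum NDVI rather than the density-peak initialisation of DPK-means, and the reflectance values are made up.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def fvc_two_means(values, iters=100):
    """Fraction of pixels assigned to the higher-NDVI cluster.

    Plain 1-D two-means seeded at min/max (an assumption); the DPK-means
    method in the abstract picks its initial centers from density peaks.
    """
    lo, hi = min(values), max(values)
    veg = []
    for _ in range(iters):
        veg = [v for v in values if abs(v - hi) < abs(v - lo)]
        soil = [v for v in values if abs(v - hi) >= abs(v - lo)]
        new_lo = sum(soil) / len(soil) if soil else lo
        new_hi = sum(veg) / len(veg) if veg else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return len(veg) / len(values)

# Hypothetical (NIR, red) reflectance pairs: three plant pixels, two soil pixels.
pixels = [(0.50, 0.08), (0.45, 0.10), (0.40, 0.09), (0.20, 0.15), (0.18, 0.16)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
fvc = fvc_two_means(ndvi_values)
```

Because the split adapts to the values in each image rather than using a fixed threshold, it is less tied to one set of acquisition conditions, which is the property the abstract reports for the clustering approach.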
Funding: Supported by the National Natural Science Foundation of China (Nos. 61174180, U1433125), the Jiangsu Province Science Foundation (No. BK20141413), and the Chinese Postdoctoral Science Foundation (No. 2014M550291).
Abstract: A high-precision nominal flight profile that reflects controllers' intentions is critical for 4D trajectory estimation in modern automatic air traffic control systems. We propose a novel method to improve the accuracy of the nominal flight profile, including the nominal altitude profile and the speed profile. First, considering the characteristics of trajectory data, we developed an improved K-means algorithm. The approach measures the similarity between different altitude profiles by integrating the space warp edit distance algorithm, thereby obtaining several fitted nominal flight altitude profiles. This approach overcomes the constraints of traditional K-means algorithms. Second, to eliminate the influence of meteorological factors, we introduced historical gridded binary data to determine the en-route wind speed and temperature via inverse distance weighted interpolation. Finally, we used the true airspeed determined by speed triangle relationships and the calibrated airspeed determined by the aircraft data model to extract a more accurate nominal speed profile from each cluster, so that the airspeed profiles above and below the airspeed transition altitude could be described separately. Our experimental results showed that the proposed method could obtain a highly accurate nominal flight profile that reflects the actual aircraft flight status.
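Inverse distance weighted interpolation, used above to estimate en-route wind speed and temperature, can be sketched in a few lines. The function name, the planar distance, the power of 2, and the sample values are assumptions for illustration; reading gridded binary files and handling geographic or vertical distance are not shown.

```python
import math

def idw(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    `samples` is a list of ((x, y), value) pairs. Planar Euclidean distance
    and power=2 are illustrative choices, not taken from the abstract.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for position, value in samples:
        d = math.dist(position, target)
        if d == 0.0:
            return value  # target coincides with a sample point
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical grid-point wind speeds (arbitrary units).
grid = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint_estimate = idw(grid, (1.0, 0.0))
on_node_estimate = idw(grid, (0.0, 0.0))
```

Nearer grid points dominate the estimate, so a point midway between two samples receives their average while a point on a grid node takes that node's value.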
Abstract: With the increasing variety of application software in meteorological satellite ground systems, more and more attention is being paid to how to provide reasonable hardware resources and improve software efficiency. In this paper, a software classification method based on software operating characteristics is proposed. The method uses run-time resource consumption to describe the running characteristics of the software. First, principal component analysis (PCA) is used to reduce the dimension of the software running feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the classification is combined with the results of the principal component analysis to explain the significance of the operating characteristics of each type of software. The result serves as a basis for optimizing the allocation of hardware resources to software and improving the efficiency of software operation.
Abstract: The k-means method is one of the most widely used clustering methods and has been implemented in many fields of science and technology. One of the major problems of the k-means algorithm is that it may produce empty clusters, depending on the initial center vectors. Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary principles of natural selection and genetics. This paper presents a hybrid version of the k-means algorithm with GAs that efficiently eliminates the empty cluster problem. Results of simulation experiments on several data sets support this claim.
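The empty cluster problem mentioned above is easy to reproduce: if an initial center lies far from all the data, no point is assigned to it. The sketch below only detects the condition; it does not implement the GA hybrid from the abstract, and the function name and data are hypothetical.

```python
import math

def empty_clusters(points, centers):
    """Indices of centers that receive no points in the assignment step."""
    counts = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        counts[nearest] += 1
    return [j for j, n in enumerate(counts) if n == 0]

# Hypothetical case: the second initial center is far from every point.
points = [(0, 0), (1, 0), (0, 1)]
bad_init = [(0, 0), (100, 100)]
good_init = [(0, 0), (1, 0)]
empty_with_bad_init = empty_clusters(points, bad_init)
empty_with_good_init = empty_clusters(points, good_init)
```

An empty cluster has no members to average, so the standard center update is undefined for it, which is why the choice of initial centers matters.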
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. In order to eliminate the linguistic criteria resulting from experience-based judgments and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied in the SMR classification system, no experience-based judgment was made in advance about the number of classes, and only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by the choice of initial points, yielding better clustering quality. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and the clustering results are more stable.
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Several pests feed on leaves, stems, bases, and the entire plant, causing plant illnesses. As a result, it is vital to identify and eliminate a disease before it damages the plants. Detecting and treating plant disease manually is challenging because it requires much effort and a long processing time, so image processing is employed to detect plant disease instead. The main goal of this study is to identify the disease affecting a plant by creating an image processing system that can recognize and classify four different forms of plant disease: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify the plant disease through the steps of image acquisition, pre-processing, segmentation, feature extraction, and classification. The gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique can correctly detect and diagnose plant disease with 97.2 percent accuracy.
Abstract: The dimensionality of data is increasing very rapidly, which creates challenges for most current mining and learning algorithms, such as large memory requirements and high computational costs. The literature includes much research on feature selection for supervised learning. However, feature selection for unsupervised learning has only recently been studied. Finding the subset of features that enhances performance in unsupervised learning is challenging since the clusters are indeterminate. This work proposes a hybrid technique for unsupervised feature selection called GAk-MEANS, which combines the genetic algorithm (GA) approach with the classical k-Means algorithm. In the proposed algorithm, a new fitness function is designed in addition to new smart crossover and mutation operators. The effectiveness of this algorithm is demonstrated on various datasets. Furthermore, the performance of GAk-MEANS has been compared with other genetic algorithms, such as the genetic algorithm using the Sammon Error Function and the genetic algorithm using the Sum of Squared Error Function. Additionally, the performance of GAk-MEANS is compared with state-of-the-art statistical unsupervised feature selection techniques. Experimental results show that GAk-MEANS consistently selects subsets of features that result in better classification accuracy than the others. In particular, GAk-MEANS reduces the size of the subset of selected features by an average of 86.35% (72%–96.14%), which leads to an average increase in accuracy of 3.78% (1.05%–6.32%) compared with using all features. Compared with the genetic algorithm using the Sammon Error Function, GAk-MEANS reduces the size of the subset of selected features by 41.29% on average, improves the accuracy by 5.37%, and reduces the time by 70.71%. Compared with the genetic algorithm using the Sum of Squared Error Function, GAk-MEANS on average reduces the size of the subset of selected features by 15.91% and improves the accuracy by 9.81%, but the time is increased by a factor of 3. Compared with the machine-learning based methods, GAk-MEANS increases the accuracy by 13.67% on average, with an 88.76% average increase in time.
Abstract: The K-means algorithm is widely known for its simplicity and speed in text clustering. However, the traditional K-means algorithm selects the initial clustering centers somewhat randomly, so the clustering results fluctuate and their stability is strongly affected by the initial clustering centers. This paper proposes an algorithm for selecting the initial clustering centers that eliminates the uncertainty of central point selection. The experimental results show that the improved K-means clustering algorithm is superior to the traditional algorithm.
Abstract: Cluster analysis is one of the major data analysis methods and is widely used in many practical applications in emerging areas of data mining. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends in the available data and its real-world uses. This research evaluates the performance of two of the most representative partition-based clustering algorithms, namely k-Means and k-Medoids. A state-of-the-art analysis of these two algorithms is implemented, and their performance is assessed in terms of clustering result quality, execution time, and other components. Telecommunication data is the source data for this analysis. Connection-oriented broadband data is given as input to assess the clustering quality of the algorithms. The distance between the server locations and their connections is considered for clustering. The execution time of each algorithm is analyzed and the results are compared with one another. The results of the comparison study are satisfactory for the chosen application.
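The core difference between the two algorithms compared above lies in how a cluster's representative is chosen: k-Means uses the coordinate-wise mean, while k-Medoids picks an actual member of the cluster. The sketch below contrasts the two on a single hypothetical cluster with one outlier; it is not the study's implementation, and the helper names are illustrative.

```python
import math

def mean_center(cluster):
    """k-Means representative: the coordinate-wise mean (may not be a data point)."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def medoid(cluster):
    """k-Medoids representative: the member with the smallest total distance to the rest."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))

# Hypothetical cluster with one extreme point.
cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0)]
center_mean = mean_center(cluster)
center_medoid = medoid(cluster)
```

The mean is dragged toward the outlier while the medoid stays among the bulk of the points. The medoid search compares every pair of members, which is one reason execution time differs between the two approaches.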
Funding: Partially supported by MRC (MC_PC_17171); Royal Society (RP202G0230); BHF (AA/18/3/34220); Hope Foundation for Cancer Research (RM60G0680); GCRF (20P2PF11); Sino-UK Industrial Fund (RP202G0289); LIAS (20P2ED10, 20P2RE969); Data Science Enhancement Fund (20P2RE237); Fight for Sight (24NN201); Sino-UK Education Fund (OP202006); BBSRC (RM32G0178B8).
Abstract: The Bat algorithm, a metaheuristic optimization technique inspired by the foraging behaviour of bats, has been employed to tackle optimization problems. Known for its ease of implementation, parameter tunability, and strong global search capabilities, this algorithm finds application across diverse optimization problem domains. However, in the face of increasingly complex optimization challenges, the Bat algorithm encounters certain limitations, such as slow convergence and sensitivity to initial solutions. To tackle these challenges, the present study incorporates a range of optimization components into the Bat algorithm, thereby proposing a variant called PKEBA. A projection screening strategy is implemented to mitigate its sensitivity to initial solutions, thereby enhancing the quality of the initial solution set. A kinetic adaptation strategy reforms exploration patterns, while an elite communication strategy enhances group interaction, to keep the algorithm from becoming trapped in local optima. Subsequently, the effectiveness of the proposed PKEBA is rigorously evaluated. Testing encompasses 30 benchmark functions from IEEE CEC2014, featuring ablation experiments and comparative assessments against classical algorithms and their variants. Moreover, real-world engineering problems are employed as further validation. The results demonstrate that PKEBA exhibits superior convergence and precision compared to existing algorithms.
Abstract: In wireless sensor networks, a cluster architecture is useful because of its inherent suitability for data fusion. In this paper we present a new approach called Multiple Parameter based Clustering (MPC), embedded in the traditional k-means algorithm, which takes several parameters (node energy level, Euclidean distance from the base station, RSSI, and latency of data reaching the base station) into consideration when forming clusters. The effectiveness of the clusters is then evaluated based on the uniformity of the node distribution, the node range per cluster, the intra- and inter-cluster distances, and the required energy level of each centroid. Our results show that by varying multiple parameters we can create clusters with more uniformly distributed nodes, minimize intra-cluster distance and maximize inter-cluster distance, and elect centroids that consume less power.
Funding: BBSRC, Grant/Award Number: RM32G0178B8; National Natural Science Foundation of China, Grant/Award Numbers: U19A2061, U1809209, 62076185; Science and Technology Development Project of Jilin Province, Grant/Award Number: 20190301024NY; Jilin Provincial Industrial Innovation Special Fund Project, Grant/Award Number: 2018C039-3; MRC, Grant/Award Number: MC_PC_17171; Royal Society, Grant/Award Number: RP202G0230; BHF, Grant/Award Number: AA/18/3/34220; Hope Foundation for Cancer Research, Grant/Award Number: RM60G0680; GCRF, Grant/Award Number: P202PF11; Sino-UK Industrial Fund, Grant/Award Number: RP202G0289; LIAS, Grant/Award Numbers: P202ED10, P202RE969; Data Science Enhancement Fund, Grant/Award Number: P202RE237; Fight for Sight, Grant/Award Number: 24NN201; Sino-UK Education Fund, Grant/Award Number: OP202006.
Abstract: The bat algorithm (BA) is a metaheuristic algorithm for global optimisation that simulates the echolocation behaviour of bats with varying pulse rates of emission and loudness, and it can be used to find globally optimal solutions for various optimisation problems. Notwithstanding recent criticism of the originality of its equations, the principle of BA is concise and easy to implement, and its mathematical structure can be seen as a hybrid of particle swarm optimisation and simulated annealing. In this research, the authors focus on the performance optimisation of BA as a solver rather than discussing its originality issues. In terms of operation effect, BA has an acceptable convergence speed. However, because a low proportion of time is spent exploring the search space, it tends to converge prematurely and fall into local optima. The authors propose an adaptive multi-stage bat algorithm (AMSBA). By tuning the algorithm's focus at three different stages of the search process, AMSBA achieves a better balance between exploration and exploitation and improves its exploration ability by enhancing its performance in escaping local optima while maintaining a certain convergence speed. Therefore, AMSBA can achieve solutions of better quality. A convergence analysis was conducted to demonstrate the global convergence of AMSBA. The authors also perform simulation experiments using 30 benchmark functions from IEEE CEC 2017 as the objective functions and compare AMSBA with some original and improved swarm-based algorithms. The results verify the effectiveness and superiority of AMSBA. AMSBA is also compared with eight representative optimisation algorithms on 10 benchmark functions derived from IEEE CEC 2020, with this experiment carried out on five different dimensions of the objective functions. A balance and diversity analysis was performed on AMSBA to demonstrate its improvement over the original BA in terms of balance. AMSBA was also applied to the multi-threshold image segmentation of Citrus Macular disease, which is a bacterial infection that causes lesions on citrus trees. The segmentation results were analysed by comparing each algorithm's peak signal-to-noise ratio, structural similarity index, and feature similarity index. The results show that the proposed BA-based algorithm has clear advantages and can effectively segment the disease spots from citrus leaves when the segmentation threshold is at a low level. Based on a comprehensive study, the authors conclude that the proposed optimiser mitigates the main drawbacks of the BA and can be used as an effective optimisation tool.
Funding: Funded by the National Natural Science Foundation of China (grant No. 52405255); the Special Program of Huzhou (grant No. 2023GZ05); the Projects of Huzhou Science and Technology Correspondent (grant No. 2023KT76); and the Guangdong Basic and Applied Basic Research Foundation (grant No. 2025A1515010487).
Abstract: Variable-fidelity (VF) surrogate models have received increasing attention in engineering design optimization because they can approximate expensive high-fidelity (HF) simulations at reduced computational cost. A key challenge in building a VF model is devising an adaptive model updating strategy that jointly selects additional low-fidelity (LF) and/or HF samples. The additional samples must enhance the model accuracy while maximizing the computational efficiency. We propose ISMA-VFEEI, a global optimization framework that integrates an Improved Slime-Mould Algorithm (ISMA) and a Variable-Fidelity Expected Extension Improvement (VFEEI) learning function to construct a VF surrogate model efficiently. First, a cost-aware VFEEI function guides the adaptive LF/HF sampling by explicitly incorporating evaluation cost and proximity to existing samples. Second, ISMA is employed to solve the resulting non-convex optimization problem and identify globally optimal infill points for model enhancement. The efficacy of ISMA-VFEEI is demonstrated through six numerical benchmarks and one real-world engineering case study. In the engineering case study of a high-speed railway Electric Multiple Unit (EMU), the optimization objective of a sanding device attained a minimum value of 1.546 using only 20 HF evaluations, outperforming all the compared methods.
Funding: Supported by the Foundation of Key Laboratory of Micro-inertial Instrument and Advanced Navigation Technology, Ministry of Education, China, and the National Natural Science Foundation of China (61873064).
Abstract: Solar radio bursts (SRBs) are one of the main natural interference sources for Global Positioning System (GPS) signals and can reduce the signal-to-noise ratio (SNR), directly affecting the tracking performance of GPS receivers. In this paper, a tracking algorithm based on the adaptive Kalman filter (AKF) with carrier-to-noise ratio estimation is proposed and compared with the conventional second-order phase-locked loop tracking algorithms and the improved Sage-Husa adaptive Kalman filter (SHAKF) algorithm. It is found that when SRBs occur, the improved SHAKF and the AKF with carrier-to-noise ratio estimation enable stable tracking of loop signals, whereas the conventional second-order phase-locked loop tracking algorithms fail to track the receiver signal. The standard deviation of the carrier phase error of the AKF with carrier-to-noise ratio estimation is 50.51% better than that of the improved SHAKF algorithm, showing less fluctuation and better stability. The proposed algorithm is shown to adapt better to the severe environment caused by SRB occurrence and to have better tracking performance.
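The role of measurement noise in a Kalman-filter tracking loop can be illustrated with a scalar measurement update. This is a generic textbook update, not the proposed AKF: the abstract does not say how the carrier-to-noise ratio estimate is mapped to the measurement noise variance, so that variance is simply passed in as the hypothetical parameter `r`.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update.

    x, p: prior state estimate and its variance
    z:    measurement
    r:    measurement noise variance (in an adaptive filter this would be
          adjusted online, e.g. from a carrier-to-noise estimate; that
          mapping is an assumption and is not shown here)
    """
    gain = p / (p + r)
    x_new = x + gain * (z - x)
    p_new = (1.0 - gain) * p
    return x_new, p_new

# Same prior and measurement, two noise levels (hypothetical values).
clean = kalman_update(x=0.0, p=1.0, z=1.0, r=1.0)
noisy = kalman_update(x=0.0, p=1.0, z=1.0, r=3.0)
```

With the larger `r` the filter moves less toward the measurement, which is the mechanism an adaptive filter relies on when interference degrades signal quality.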