To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced, and the optimal number of clusters Kopt for a data set with unknown class information is determined by computing the silhouette coefficient of the objects in each cluster under different values of K. Next, the distribution of the data set is obtained through hierarchical clustering, and the initial cluster centers are determined from it. Finally, the clustering is completed with traditional k-means. Theoretical analysis shows that the improved algorithm has reasonable computational complexity. Experimental results on the IRIS test data set show that the algorithm separates the clusters reasonably, identifies outliers efficiently, and produces lower entropy.
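The selection of Kopt described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`kmeans`, `mean_silhouette`, `choose_k`) are hypothetical, the farthest-first seeding is an assumption standing in for the hierarchical-clustering initialization, and the toy points are made up for illustration.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations; farthest-first seeding is an assumption,
    used here only to keep the sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); singleton clusters contribute 0.
    Requires at least two distinct labels."""
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return the K whose clustering has the highest mean silhouette."""
    return max(k_values, key=lambda k: mean_silhouette(points, kmeans(points, k)))

if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    print(choose_k(toy, range(2, 6)))
```

The sketch scans K from 2 upward because the silhouette coefficient is undefined for a single cluster.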
Water quality assessment of lakes is important for determining functional zones of water use. To account for the fuzziness involved in partitioning lake water quality in an arid area, a multiplex model of fuzzy clustering with pattern recognition was developed by integrating the transitive closure method, the ISODATA algorithm in fuzzy clustering, and fuzzy pattern recognition. The model was applied to partition Ulansuhai Lake, a typical shallow lake in the arid climate zone of western Inner Mongolia, China, and to grade the water quality of the resulting divisions. The results showed that the partition matched the actual conditions of the lake well and that the method was accurate in this application.
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement: it requires only a simple data structure that keeps some information from each iteration for use in the next one. Our experimental results demonstrate that the scheme speeds up the k-means algorithm by a large margin, in terms of both the total number of distance calculations and the overall computation time.
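The objective stated in this abstract, the mean squared distance from each point to its nearest center, can be written down directly. The following is a minimal sketch with a hypothetical function name and made-up sample points; it evaluates the objective only and does not reproduce the enhanced k-means algorithm itself.

```python
def mean_squared_distance(points, centers):
    """k-means objective: average over all points of the squared
    Euclidean distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers)
    return total / len(points)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    print(mean_squared_distance(pts, [(1.0, 0.0), (10.0, 0.0)]))
```

A set of centers is better than another, in the k-means sense, exactly when this value is smaller.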
Existing support vector machine methods suffer from high computational complexity and a low recognition rate at low SNR when applied to larger recognition problems. In this paper, the characteristic parameters of the signal are extracted and optimized with a clustering algorithm, and the support vector machine is trained with a grading algorithm in order to speed up convergence, improve recognition performance at low SNR, and perform modulation recognition based on the constellation diagram of the modulation scheme. Simulation results show that, at low SNR, the average recognition rate of this algorithm is more than 30% higher than that of methods using a clustering algorithm or a support vector machine alone. The average recognition rate reaches 90% at an SNR of 5 dB, and the method is easy to implement, so it has broad application prospects in modulation recognition.
A graph can hold a huge amount of data. Graphs are heavily used for pattern recognition and matching tasks such as symbol recognition, information retrieval, and data mining. In all these applications, the objects or underlying data are represented as graphs and graph-based matching is performed. Conventional graph matching algorithms have high complexity, because most applications involve a large number of subgraphs and matching these subgraphs is computationally expensive. In this paper, we propose a novel graph-based algorithm for fingerprint recognition. We perform graph-based clustering, which greatly reduces the computational complexity, and we exploit structural features of the fingerprint for K-means clustering of the database. The proposed algorithm is evaluated on a real-time fingerprint database, and the simulation results show that it outperforms the existing algorithm for the same task.
The K-means algorithm is widely known for its simplicity and speed in text clustering. However, the traditional K-means algorithm selects the initial clustering centers somewhat randomly, so the clustering results fluctuate and are unstable, depending strongly on the initial centers. This paper proposes an algorithm for selecting the initial clustering centers that eliminates the uncertainty of center selection. The experimental results show that the improved K-means clustering algorithm is superior to the traditional algorithm.
To improve the recognition rate of clustering-based signal modulation recognition methods at low SNR, a modulation recognition method is proposed. The characteristic parameters of the signal are extracted with a clustering algorithm, and the neural network is trained with the variable gradient correction (Polak-Ribiere) algorithm in order to speed up convergence, improve recognition performance at low SNR, and perform modulation recognition based on the constellation diagram of the modulation scheme. Simulation results show that, at low SNR, the recognition rate of this algorithm is more than 30% higher than that of methods using a clustering algorithm alone or a neural network trained with the back-propagation algorithm alone. The recognition rate reaches 90% at an SNR of 4 dB, and the method is easy to implement, so it has broad application prospects in modulation recognition.
Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate the linguistic criteria that result from experience-based judgments, and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, K-means and fuzzy c-means (FCM), for ratings obtained via continuous and discrete functions. With clustering applied to the SMR classification system, no experience-based judgment was made in advance about the number of extracted classes; only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study show that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system with ratings calculated via continuous and discrete functions.
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids random selection of the initial center points and thus provides effective initial points for the clustering process. This reduces the fluctuation in clustering results caused by the choice of initial points, so that better clustering quality can be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
Several pests feed on leaves, stems, bases, and the entire plant, causing plant illnesses. It is therefore vital to identify and eliminate a disease before it damages the plants. Detecting and treating plant disease manually is quite challenging, since it requires much effort and a long processing time, so image processing is employed to detect plant disease instead. The main goal of this study is to identify the disease affecting a plant by creating an image processing system that can recognize and classify four different plant diseases: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify the plant disease through the steps of image acquisition, pre-processing, segmentation, feature extraction, and classification. Gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique detects and diagnoses plant disease with 97.2 percent accuracy.
Cluster analysis is one of the major data analysis methods and is widely used in many practical applications in emerging areas of data mining. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends in the available data and their real-world uses. This research work evaluates the performance of two of the most representative partition-based clustering algorithms, k-Means and k-Medoids. A state-of-the-art analysis of the two algorithms is implemented, and their performance is assessed in terms of clustering result quality, execution time, and other components. Telecommunication data is the source data for this analysis: connection-oriented broadband data is given as input to measure the clustering quality of the algorithms, and the distance between the server locations and their connections is used for clustering. The execution time of each algorithm is analyzed and the results are compared with one another. The results of the comparison study are satisfactory for the chosen application.
To deal with the nonlinearly separable problem, the generalized noise clustering (GNC) algorithm is extended to a kernel generalized noise clustering (KGNC) model. Unlike the fuzzy c-means (FCM) model and the GNC model, which are based on Euclidean distance, the presented model is based on a kernel-induced distance obtained with the kernel method. With the kernel method, the input data are nonlinearly and implicitly mapped into a high-dimensional feature space, where the nonlinear pattern appears linear and the GNC algorithm is performed. No computation in the high-dimensional feature space is needed, because the kernel function can be evaluated in the input space. The effectiveness of the proposed algorithm is verified by experiments on three data sets. It is concluded that the KGNC algorithm achieves better clustering accuracy than FCM and GNC on data sets containing noisy data.
The K-means method is one of the most widely used clustering methods and has been applied in many fields of science and technology. One of the major problems of the k-means algorithm is that it may produce empty clusters, depending on the initial center vectors. Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary principles of natural selection and genetics. This paper presents a hybrid version of the k-means algorithm with GAs that efficiently eliminates the empty cluster problem. Results of simulation experiments on several data sets support this claim.
The transversal distribution of steel strip thickness in the entry section of the cold rolling mill seriously affects the flatness and transversal thickness precision of the final products. A pattern clustering method is introduced into the steel rolling field and used for pattern recognition of the transversal distribution of steel strip thickness. The well-known k-means clustering algorithm is easy to implement but still has some drawbacks. An improved k-means clustering algorithm is presented, with two main improvements: (1) the initial clustering points are preselected according to the density queue of the data objects; and (2) Mahalanobis distance is used instead of Euclidean distance in the actual application. Compared with the patterns obtained from the common k-means algorithm, the patterns identified by the improved algorithm show that it is well suited to pattern recognition of the transversal distribution of steel strip thickness and will be useful in online quality control systems.
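The second improvement in this abstract replaces Euclidean distance with Mahalanobis distance. A minimal sketch of that distance follows, restricted to two dimensions so the covariance matrix can be inverted by hand. The function name and the sample covariance are assumptions for illustration, not values from the paper.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean)) for 2-D data.
    `cov` is a 2x2 covariance matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

if __name__ == "__main__":
    cov = ((4.0, 0.0), (0.0, 1.0))  # variance 4 along x, variance 1 along y
    # Both points are at Euclidean distance 2 from the origin, but the one
    # along the high-variance axis is closer in the Mahalanobis sense.
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov))
    print(mahalanobis_2d((0.0, 2.0), (0.0, 0.0), cov))
```

Unlike Euclidean distance, this measure rescales each direction by the spread of the data, so a deviation along a high-variance direction counts for less.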
The K-means algorithm is one of the most popular clustering techniques. Nevertheless, its performance depends highly on the initial cluster centers, and it converges to local minima. This paper proposes a hybrid clustering algorithm based on evolutionary programming, called PSO-SA, which combines particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution with SA and to increase the information exchange among particles with a mutation operator in order to escape local optima. Three data sets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, are used to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only gives a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, so does the pressure on delivery companies to organise their transportation plans to achieve profits and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), in which a selected set of pickup and delivery customers must be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms combined with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space to reduce its dimensionality and then use GRASP to build routes for each cluster. We compare our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that the proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
This study aims to improve the intermodal service at Qingdao Jiaodong Airport by addressing operational challenges such as fuzzy layering of passenger demand and insufficient cross-modal coordination, and by solving the core issues of supply-demand mismatch and a single pricing mechanism in the air-rail intermodal ticketing system. To this end, it proposes a personalized ticketing optimization strategy based on user profiling. First, using extensive survey data, the study analyzes the personal attributes and travel characteristics of the surveyed passengers. Then the K-means clustering algorithm is used to cluster the passengers' multidimensional features, with the optimal number of clusters determined by the elbow method and the silhouette coefficient method. This leads to three differentiated user labels: economy-class passengers, business-class passengers, and leisure-class passengers. The market segmentation research shows that these three groups perceive the bottlenecks of intermodal services differently and are clearly layered in the key dimensions of time sensitivity and price sensitivity. The results provide a comparative scheme for improving the air-rail intermodal ticketing service at Qingdao Jiaodong International Airport, with differentiated service strategies for each passenger group. Through demand-responsive service and resource optimization, this study has significant practical implications for enhancing passenger experience and strengthening the market competitiveness of the service.
Text clustering is an important research topic in clustering. It aims to group texts by similar characteristics or similar expressions, so that texts in the same cluster have the greatest similarity and texts in different clusters have the greatest dissimilarity. Compared with other scripts, Mongolian has many distinctive characteristics in its structure and writing mode. By combining K-means with the clone immune algorithm, we propose a novel clustering technique called ICKM. Numerical experiments on four element sets illustrate the validity of our method for clustering Mongolian text.
Most clustering algorithms need a predefined distance function to describe the similarity of objects. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects the clustering results, and that comparing the different results can be used to detect the outliers of a cluster and to obtain information about the shape of the clusters. In practice, it is suggested to apply different distance functions separately, compare the clustering results, and pick out the "swing points", since such points may reveal more information to data analysts.
Data clustering is crucial in data processing and analytics. The new clustering method overcomes the challenge of evaluating and extracting data from big data. Both numerical and categorical data can be grouped, but existing clustering methods favor numerical data and neglect categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then cluster it with existing numeric clustering methods. However, these algorithms could not use the concept of categorical data for clustering. Subsequently, suggestions were made for extending traditional methods to handle categorical data, and several new clustering methods and extensions have been proposed in recent years. ROCK is an adaptable and straightforward algorithm that calculates the similarity between data sets in order to cluster them. This paper aims to modify the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is named the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible by using user-defined parameters. A detailed hypothesis was developed and later validated with experimental results on real-world data sets using the proposed P-ROCK algorithm. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%. The proposed P-ROCK algorithm has an improved runtime and is more flexible and scalable.
Funding (improved k-means with silhouette coefficient and hierarchical initialization): National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).
Funding (Ulansuhai Lake water quality partitioning): National Natural Science Foundation of China (No. 50269001, 50569002, 50669004); Natural Science Foundation of Inner Mongolia (No. 200208020512, 200711020604); Key Scientific and Technologic Project of the 10th Five-Year Plan of Inner Mongolia (No. 20010103).
Funding (modulation recognition with clustering and support vector machine): supported in part by the National Natural Science Foundation of China under Grant No. 61871129 and No. 61301179; Projects of Science and Technology Plan Guangdong Province under Grant No. 2014A010101284.
Funding (modulation recognition with clustering and neural network): supported by the National Natural Science Foundation of China (61072070, 61301179); National Science and Technology Major Project (2010ZX03006-002-04).
文摘To improve the recognition rate of signal modulation recognition methods based on the clustering algorithm under the low SNR, a modulation recognition method is proposed. The characteristic parameter of the signal is extracted by using a clustering algorithm, the neural network is trained by using the algorithm of variable gradient correction (Polak-Ribiere) so as to enhance the rate of convergence, improve the performance of recognition under the low SNR and realize modulation recognition of the signal based on the modulation system of the constellation diagram. Simulation results show that the recognition rate based on this algorithm is enhanced over 30% compared with the methods that adopt clustering algorithm or neural network based on the back propagation algorithm alone under the low SNR. The recognition rate can reach 90% when the SNR is 4 dB, and the method is easy to be achieved so that it has a broad application prospect in the modulating recognition.
文摘Classification systems such as Slope Mass Rating(SMR) are currently being used to undertake slope stability analysis. In SMR classification system, data is allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulted from experience-based judgments and account for uncertainties in determining class boundaries developed by SMR system,the system classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means(FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in SMR classification system, no in-advance experience-based judgment was made on the number of extracted classes in this system, and it was only after all steps of the clustering algorithms were accomplished that new classification scheme was proposed for SMR system under different failure modes based on the ratings obtained via continuous and discrete functions. The results of this study showed that, engineers can achieve more reliable and objective evaluations over slope stability by using SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by the choice of initial points, and thus a better clustering quality can be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Several pests feed on leaves, stems, bases, and the entire plant, causing plant illnesses. As a result, it is vital to identify and eliminate the disease before it causes any damage to plants. Manually detecting and treating plant disease is challenging because it requires much effort and a long processing time, so image processing is employed to detect plant disease instead. The main goal of this study is to identify the disease that affects the plants by creating an image processing system that can recognize and classify four different plant diseases: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. This work uses a support vector machine (SVM) classifier to detect and classify the plant disease through the steps of image acquisition, pre-processing, segmentation, feature extraction, and classification. Gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique detects and diagnoses plant disease with 97.2 percent accuracy.
Abstract: Cluster analysis is one of the major data analysis methods and is widely used in many practical applications in emerging areas of data mining. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends in available data and their real-world uses. This research work evaluates the performance of two of the most representative partition-based clustering algorithms, namely k-Means and k-Medoids. A state-of-the-art analysis of these two algorithms is implemented, and their performance is analyzed in terms of clustering result quality, execution time, and other components. Telecommunication data is the source data for this analysis. Connection-oriented broadband data is given as input to assess the clustering quality of the algorithms. The distance between the server locations and their connections is considered for clustering. The execution time of each algorithm is analyzed and the results are compared with one another. The results of the comparison study are satisfactory for the chosen application.
Funding: The 15th Plan National Defence Preventive Research Project (No. 413030201)
Abstract: To deal with the nonlinearly separable problem, the generalized noise clustering (GNC) algorithm is extended to a kernel generalized noise clustering (KGNC) model. Unlike the fuzzy c-means (FCM) model and the GNC model, which are based on Euclidean distance, the presented model is based on a kernel-induced distance obtained with the kernel method. With the kernel method, the input data are nonlinearly and implicitly mapped into a high-dimensional feature space, where the nonlinear pattern appears linear and the GNC algorithm is performed. It is unnecessary to compute in the high-dimensional feature space because the kernel function can be evaluated in the input space. The effectiveness of the proposed algorithm is verified by experiments on three data sets. It is concluded that the KGNC algorithm has better clustering accuracy than FCM and GNC on data sets containing noisy data.
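The kernel-induced distance mentioned in this abstract can be illustrated with the standard identity ||phi(x) - phi(v)||^2 = K(x, x) - 2K(x, v) + K(v, v), which needs only kernel evaluations in the input space. The following is a minimal sketch, not the KGNC implementation: the Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions chosen for illustration, since the abstract does not state which kernel is used.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel. The kernel choice and sigma are illustrative assumptions."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, kernel=rbf_kernel):
    """Squared distance between phi(x) and phi(v) in feature space,
    computed only from kernel evaluations in the input space."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)

if __name__ == "__main__":
    # Identical points are at distance zero; distant points approach 2 for the RBF kernel.
    print(kernel_distance_sq([0.0, 0.0], [0.0, 0.0]))
    print(kernel_distance_sq([0.0, 0.0], [5.0, 5.0]))
```

With a linear kernel (a plain dot product) the same function reduces to the squared Euclidean distance, which is one way to check the identity.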
Abstract: The k-means method is one of the most widely used clustering methods and has been applied in many fields of science and technology. One of the major problems of the k-means algorithm is that it may produce empty clusters, depending on the initial center vectors. Genetic algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary principles of natural selection and genetics. This paper presents a hybrid version of the k-means algorithm with GAs that efficiently eliminates the empty cluster problem. Results of simulation experiments on several data sets support this claim.
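The empty cluster problem described in this abstract can be reproduced with the assignment step of k-means alone. The sketch below only demonstrates the problem on made-up one-dimensional data; it does not implement the GA hybrid from the paper, and the function names and sample values are hypothetical.

```python
def assign_to_nearest(points, centers):
    """k-means assignment step for 1-D data: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: abs(p - centers[k])) for p in points]

def empty_clusters(labels, k):
    """Indices of clusters that received no points in the assignment step."""
    used = set(labels)
    return [c for c in range(k) if c not in used]

if __name__ == "__main__":
    points = [0.0, 1.0, 2.0, 10.0, 11.0]  # illustrative data
    # A poorly placed initial center (100.0) attracts no points.
    print(empty_clusters(assign_to_nearest(points, [1.0, 10.5, 100.0]), 3))
    # Centers placed near the data leave no cluster empty.
    print(empty_clusters(assign_to_nearest(points, [0.5, 2.0, 10.5]), 3))
```

An empty cluster has no members to average, so the usual centroid update is undefined for it, which is why the choice of initial centers matters.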
Funding: Sponsored by National Natural Science Foundation of China (50705057)
Abstract: The transversal distribution of steel strip thickness in the entry section of the cold rolling mill seriously affects the flatness and transversal thickness precision of the final products. A pattern clustering method is introduced into the steel rolling field and used for pattern recognition of the transversal distribution of steel strip thickness. The well-known k-means clustering algorithm has the advantage of being easy to implement, but still has some drawbacks. An improved k-means clustering algorithm is presented, and the main improvements include: (1) the initial clustering points are preselected according to the density queue of data objects; and (2) Mahalanobis distance is applied instead of Euclidean distance in the actual application. Compared with the patterns obtained from the common k-means algorithm, the patterns identified by the improved algorithm show that it is well suited to pattern recognition of the transversal distribution of steel strip thickness and will be useful in online quality control systems.
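To make improvement (2) concrete, the sketch below computes a Mahalanobis distance, which rescales each direction by the covariance of the data instead of treating all directions equally as Euclidean distance does. It is restricted to two features so that the covariance inverse can be written in closed form; the function names and sample values are hypothetical and are not taken from the paper.

```python
import math

def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(x, center, cov):
    """Mahalanobis distance sqrt(d^T S^-1 d) for a 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # The inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))

if __name__ == "__main__":
    # With variance 4 along the first axis, a gap of 2 on that axis counts as 1.
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), (4.0, 0.0, 1.0)))
```

With an identity covariance the function returns the ordinary Euclidean distance, so it can replace the distance call in a k-means assignment step without other changes.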
Abstract: The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on the initial cluster centers, and it may converge to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, that combines particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution with SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, are used to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only gives a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
Funding: Deanship of Scientific Research, for funding and supporting this research through the initiative of DSR Graduate Students Research Support (GSR).
Abstract: Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies to organise their transportation plans for profit and customer satisfaction also increases. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers needs to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space to reduce its dimensionality, then use GRASP to build routes for each cluster. We compare our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
Abstract: To improve the intermodal service at Qingdao Jiaodong Airport, address operational challenges such as fuzzy passenger demand layering and insufficient cross-modal coordination, and solve the core issues of supply-demand mismatch and a single pricing mechanism in the air-rail intermodal ticketing system, this study proposes a personalized ticketing optimization strategy based on user profiling. First, using extensive survey data, the study analyzes the personal attributes and travel characteristics of the surveyed passengers. Then, using the K-means clustering algorithm, the study clusters the passengers' multidimensional features and determines the optimal number of clusters with the elbow method and the silhouette coefficient method. This leads to differentiated user labels: economy-class passengers, business-class passengers, and leisure-class passengers. The market segmentation of passenger groups shows that these three distinct groups perceive the bottlenecks of intermodal services differently, with significant layering in the key dimensions of time sensitivity and price sensitivity. The results provide a comparative scheme for improving the air-rail intermodal ticketing service at Qingdao Jiaodong International Airport, offering differentiated service strategies for each passenger group. Through responsive demand and resource optimization, this study has significant practical implications for enhancing passenger experience and strengthening the market competitiveness of the service.
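The silhouette coefficient method named in this abstract scores a clustering by comparing, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), giving s = (b - a) / max(a, b). The sketch below is a minimal illustration under assumptions: it scores labelings that are supplied by hand rather than produced by K-means, the data points are made up, and the function name is hypothetical. In practice one would compute this score for the K-means result at each candidate K and prefer the K with the highest mean value.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labeling, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(points)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # illustrative data
    print(mean_silhouette(pts, [0, 0, 1, 1]))  # compact, well-separated grouping
    print(mean_silhouette(pts, [0, 1, 0, 1]))  # grouping that mixes the two blobs
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.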
Funding: Supported by the Project of Inner Mongolia University for Nationalities (Grant No. NMDYB18009), the National Natural Science Foundation of China (Grant Nos. 61473328, 11401076), and the Fundamental Research Funds for the Central Universities (Grant No. DUT18JC02)
Abstract: Text clustering is an important research issue in clustering. It aims to group texts by similar characteristics or similar expressions so that texts in the same cluster have the greatest similarity and those in different clusters have the greatest dissimilarity. Compared with other kinds of characters, Mongolian has many distinctive features in its structure and writing mode. By combining K-means and the clone immune algorithm, we propose a novel clustering technique called ICKM. Numerical experiments on four elements sets illustrate the validity of our method in the clustering task for Mongolian.
Abstract: Most clustering algorithms need to describe the similarity of objects with a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to obtain information about the shape of clusters. In practice, it is suggested to apply different distance functions separately, compare the clustering results, and pick out the "swing points". Such points may reveal more information to data analysts.
Funding: Supporting project number (RSP2022R498), King Saud University, Riyadh, Saudi Arabia.
Abstract: Data clustering is crucial in data processing and analytics. New clustering methods overcome the challenge of evaluating and extracting data from big data. Both numerical and categorical data can be grouped, but existing clustering methods favor numerical data and ignore categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then cluster it with existing numeric clustering methods. However, these algorithms could not use the concept of categorical data for clustering. Subsequently, suggestions for extending traditional categorical data processing methods were made. In addition to these extensions, several new clustering methods have been proposed in recent years. ROCK is an adaptable and straightforward algorithm that calculates the similarity between data sets in order to cluster them. This paper aims to modify the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is named the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible by using user-defined parameters. A detailed hypothesis was developed and later validated with experimental results on real-world datasets using the proposed P-ROCK algorithm. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%. The proposed P-ROCK algorithm improves the runtime and is more flexible and scalable.