In recent years, there has been a concerted effort to improve anomaly detection techniques, particularly in the context of high-dimensional, distributed clinical data. Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy, personalising treatment plans, and optimising resource allocation to enhance clinical outcomes. Nonetheless, this domain faces unique challenges, such as irregular data collection, inconsistent data quality, and patient-specific structural variations. To address these challenges, this paper proposes a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data. The strategy first applies HPO-based optimal Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster patient exercise data, facilitating efficient anomaly identification. Subsequently, a stochastic method based on the Interquartile Range (IQR) filters unreliable data points, ensuring that medical tools and professionals receive only the most pertinent and accurate information. The primary objective of this study is to equip healthcare professionals and researchers with a robust tool for managing extensive, high-dimensional clinical datasets, enabling effective isolation and removal of aberrant data points. Furthermore, a regression model has been developed using Automated Machine Learning (AutoML) to assess the impact of the ensemble abnormal pattern detection approach. Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML. Experimental results show that applying this hybrid model to patient rehabilitation data leads to a notable enhancement in AutoML performance, with an average improvement of 0.041 in the R² score, surpassing the effectiveness of traditional regression models.
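The Interquartile Range filtering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `iqr_filter`, the fence multiplier `k=1.5` (the conventional Tukey value), and the sample readings are all assumptions made for illustration, and the abstract does not say how the filter is combined with the DBSCAN clusters.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Split values into (kept, rejected) using IQR fences.

    k=1.5 is the conventional Tukey multiplier, assumed here;
    the abstract does not state which multiplier the authors use.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    rejected = [v for v in values if v < low or v > high]
    return kept, rejected

# Hypothetical readings from one cluster of exercise sessions.
readings = [10, 11, 12, 13, 12, 11, 10, 95]
kept, rejected = iqr_filter(readings)
```

In a pipeline like the one described, such a filter would plausibly be applied per cluster after the density-based step, so that each group of similar sessions gets its own fences.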
Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms suffer from the need for manual determination of parameter values, low accuracy, and inconsistent performance across data sizes and structures. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. The FADBC method consists of two stages: parameter selection and cluster extraction. In the first stage, the method extracts optimal parameters for the dataset, including the epsilon size and the minimum-number-of-points threshold. In the second stage, these parameters are used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that the FADBC method outperforms well-known clustering methods such as the agglomerative hierarchical method, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering methods. The authors report that it handles datasets of any kind well.
Clustering of evolving data streams must be performed in limited time and with reasonable quality. Existing micro-clustering-based methods do not consider the distribution of data points inside the micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm using leader clustering. The algorithm is based on two-phase clustering. The online phase selects the proper mini-micro or micro-cluster leaders based on the distribution of data points in the micro-clusters. The leader centers are then sent to the offline phase to form the final clusters. In LeaDen-Stream, by carefully choosing between the two kinds of micro leaders, we decrease the time complexity of the clustering while maintaining cluster quality. A pruning strategy is also used to separate real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
Encephalitis is an inflammatory disease of the brain. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, so proper diagnosis at an early stage is crucial. Therefore, in this paper, we propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid to define the cluster configuration, but this single point does not contain adequate information. To extract accurate inner structural data, this paper defines and employs a multiple-centroid approach, which describes the cluster configuration by allocating weights to each state in the cluster. The multi-view EEG fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy density-based clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real-state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving-object database along with the following problem: how can we provide customers with high-quality service, that is, how can we answer so many enquiries in as little time as possible? Because of the large volume of data, the gap between CPU speed and the size of main memory has increased considerably. One way to reduce the time needed to handle enquiries is to reduce the number of I/O operations between the buffer and secondary storage. An effective clustering of the objects can minimize the I/O cost between them. In this paper, according to the characteristics of the moving-object database, we analyze the objects in the buffer according to their mappings in two-dimensional coordinates, and then develop a density-based clustering method to effectively reorganize the clusters. This new mechanism leads to a lower I/O cost and a more efficient response to enquiries.
Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding the noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the minimum number of objects within epsilon). However, it cannot handle clusters of various densities because it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method that can discover clusters of varied densities while reducing the required number of input parameters to one. The only user input in the proposed method is MinPts; epsilon is computed automatically from statistical information about the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon equal to the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects number less than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the ability of the proposed method to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% for the datasets tested.
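The iterative epsilon update described in this abstract can be sketched as follows. This is a reading of the abstract, not the authors' code. It assumes Euclidean distance, and it assumes the core distance of an object is the distance to its MinPts-th nearest object counting the object itself. The function names and the toy data are hypothetical.

```python
import math

def core_distances(points, idx, min_pts):
    """Core distance of each point in idx, measured within idx only."""
    out = {}
    for i in idx:
        d = sorted(math.dist(points[i], points[j]) for j in idx)
        out[i] = d[min_pts - 1]  # d[0] is the point itself (distance 0)
    return out

def dbscan(points, idx, eps, min_pts):
    """Plain DBSCAN over the indices in idx; returns a list of index sets."""
    nbrs = {i: [j for j in idx if math.dist(points[i], points[j]) <= eps]
            for i in idx}
    core = {i for i in idx if len(nbrs[i]) >= min_pts}
    seen, clusters = set(), []
    for i in idx:
        if i not in core or i in seen:
            continue
        cluster, stack = set(), [i]
        seen.add(i)
        while stack:
            p = stack.pop()
            cluster.add(p)
            if p in core:  # only core points expand the cluster
                for q in nbrs[p]:
                    if q not in seen:
                        seen.add(q)
                        stack.append(q)
        clusters.append(cluster)
    return clusters

def adaptive_dbscan(points, min_pts, stop_fraction=0.006):
    """Recompute eps from the unclustered points on every pass."""
    labels = [-1] * len(points)  # -1 means "not yet clustered"
    remaining = list(range(len(points)))
    next_label = 0
    while (len(remaining) >= min_pts
           and len(remaining) >= stop_fraction * len(points)):
        cd = core_distances(points, remaining, min_pts)
        eps = sum(cd.values()) / len(cd)  # average core distance
        clusters = dbscan(points, remaining, eps, min_pts)
        if not clusters:  # safety guard against a pass with no progress
            break
        for cluster in clusters:
            for i in cluster:
                labels[i] = next_label
            next_label += 1
        remaining = [i for i in remaining if labels[i] == -1]
    return labels

# Toy data: a dense 3x3 grid (spacing 0.1) and a sparse 3x3 grid (spacing 1).
dense = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
sparse = [(10 + x, 10 + y) for x in range(3) for y in range(3)]
labels = adaptive_dbscan(dense + sparse, min_pts=3)
```

On the toy data, the first pass uses an epsilon that only the dense grid satisfies, and the second pass recomputes epsilon from the sparse grid alone, which is the behaviour a single global epsilon cannot provide.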
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters while cutting off noise and outliers. In the second phase, CDC employs the concept of K-means clustering to select a larger cluster as the center. The larger cluster then merges smaller clusters that satisfy certain constraint rules. Because clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the number of calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with that of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Density-based clustering is an important category of clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation and other techniques for handling missing values are not suitable for density-based clustering and decrease the quality of the clustering result. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performance of the proposed algorithms is evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.
Ball milling is widely used in industry to mill particulate material. The primary purpose of this process is to attain an appropriate product size with the least possible energy consumption. The process is also extensively utilised in pharmaceuticals for the comminution of excipients or drugs. Surprisingly, little is known about the mechanism of size reduction in ball mills. Traditional prediction approaches are not deemed useful for providing significant insights into the operation or facilitating radical step changes in performance. Therefore, the discrete element method (DEM) has been used in this paper as a computational modelling approach. In previous research, DEM has been applied to simulate breaking behaviour using the impact energy of all ball collisions as the driving force for fracturing. However, the nature of pharmaceutical material fragmentation during ball milling is more complex. The functional equations that link broken media and applied energy do not consider collisions between particulate media of different shapes, or collisions of particulate media (such as granules) with the balls and the rotating mill drum, which could have a significant impact on fragmentation. Therefore, this paper aimed to investigate the fragmentation of bounded particles into DEM granules of different shapes and sizes during the ball milling process. A systematic study was undertaken to explore the effect of milling speed on breakage behaviour. In addition, a combination of a density-based clustering method and the discrete element method was employed to numerically investigate the number and size of the fragments generated during the ball milling process over time. It was found that the number of ball collisions increased proportionally with rotation speed until the critical rotation speed was reached. The results also show that mill power increased correspondingly with rotation speed. The cataracting motion of mill material together with balls was identified as the most effective regime for fragmentation, and fewer breakage events occurred under centrifugal motion. Increased milling speed produced higher quantities of fines in each batch and smaller quantities of grain fragments. Moreover, the relationship between the number of fragments produced and the milling speed at the end of the process exhibited a linear tendency.
As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and "others") across a range of astronomical research over the past decade. The review focuses on six key application areas: stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. It provides an in-depth analysis of the performance and results of each method, considering their respective suitability for different data types. Additionally, it presents strategies for selecting a clustering algorithm based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Multichannel signals are characterized by both information diversity and information consistency. To better explore and utilize the affinity relationships within multichannel signals, a new graph learning technique based on low-rank tensor approximation is proposed for processing and utilizing multichannel monitoring signals. First, the affinity relationships of the multichannel signals are acquired from the clustering results of each channel signal, and an affinity tensor is constructed to integrate the diverse and consistent clustering information across channels. Second, a low-rank tensor optimization model is built, and the joint affinity matrix is optimized with the assistance of a strong-confidence affinity matrix. Solving the optimization model yields the fused affinity relationship graph of the multichannel signals. Finally, the fused multichannel clustering results are obtained through the updated joint affinity relationship graph. Examples of multichannel signal utilization in health state assessment with public datasets and in microwave detection with actual echoes verify the advantages and effectiveness of the proposed method.
This paper proposes an equivalent modeling method for photovoltaic (PV) power stations based on a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering, in order to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude-frequency curves for different filter parameters are analyzed. Based on the results, a grouping parameter set characterizing the external filter characteristics is established, and these parameters are then defined as clustering parameters. A single PV inverter model is established as a prerequisite. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, effectively overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-the-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to effectively utilize a small number of labeled samples, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments conducted on several benchmark satellite datasets demonstrate that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860. These results demonstrate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
An Intrusion Detection System (IDS) is a security mechanism developed to observe network traffic and recognize suspicious or malicious activities. Clustering algorithms are often incorporated into IDSs; however, conventional clustering-based methods face notable drawbacks, including poor scalability on high-dimensional datasets and a strong dependence of outcomes on initial conditions. To overcome the performance limitations of existing methods, this study proposes a novel quantum-inspired clustering algorithm that relies on a similarity coefficient-based quantum genetic algorithm (SC-QGA) and an improved quantum artificial bee colony algorithm hybridized with K-means (IQABC-K). First, the SC-QGA algorithm is constructed based on quantum computing and integrates similarity coefficient theory to strengthen genetic diversity and feature extraction capabilities. For the subsequent clustering phase, the IQABC-K algorithm is enhanced with adaptive rotation gate and movement exploitation strategies to balance the exploration capability of global search against the exploitation capability of local search. At the same time, a global optimum bootstrap strategy and a linear population reduction strategy accelerate convergence toward the global optimum and reduce computational complexity. In an experimental evaluation against multiple algorithms and with diverse performance metrics, the proposed algorithm achieves accuracies of 98.57%, 98.81%, and 98.32% on three datasets, KDD CUP99, NSL_KDD, and UNSW_NB15, respectively. These results affirm its potential as an effective solution for practical clustering applications.
Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges of complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a deformation prediction model, a multipoint deformation prediction model for extra-high arch dams based on clustering partition, namely CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which effectively filters out noise and improves the signal-to-noise ratio of the data, providing high-quality input for the subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and ensure the consistency of deformation trends at measurement points within each partition. Finally, CEEMDAN is used to decompose the monitoring data; each component is predicted and analyzed by combining Kernel Principal Component Analysis (KPCA) with a Kernel Extreme Learning Machine (KELM) optimized by the Global Search Whale Optimization Algorithm (GSWOA); and the component predictions are integrated via reconstruction methods to predict the overall deformation trend of the dam. An extra-high arch dam project is taken as an example, and the model is validated via a comparative analysis of multiple models. The results show that the proposed multipoint deformation prediction model can combine data from different measurement points, achieve a comprehensive and precise prediction of the deformation of extra-high arch dams, and provide strong technical support for their safe operation.
AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who had undergone at least 10 24-2 VF tests were included in this study. Using 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and of pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20 ± 14.38 years; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88 ± 15.22 years; 43 males and 38 females). All subjects were diagnosed with POAG. The 52 VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG when VF data are limited.
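The regression-and-error step in this abstract can be illustrated with a small sketch. It makes a strong assumption: each clustered region is summarised by its mean TDV per test, and a least-squares line over the first n tests is extrapolated to test 10. The abstract does not specify how the regression is applied within a region. The function names and the sample values are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_tenth(region_means):
    """Extrapolate a region's mean TDV to test 10 from tests 1..n (n >= 2)."""
    xs = list(range(1, len(region_means) + 1))
    slope, intercept = fit_line(xs, region_means)
    return slope * 10 + intercept

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    )

# Hypothetical mean TDVs (dB) of one region over the first four tests.
prediction = predict_tenth([-1.0, -1.5, -2.0, -2.5])
```

Pooling points into regions before fitting reduces the noise in each fitted series, which is one plausible reason a clustered model could beat pointwise regression when only a few tests are available.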
Various factors, including weak tie-lines in electric power system (EPS) networks, can lead to low-frequency oscillations (LFOs), which are not considered an immediate threat but are slow-acting and harmful. To address this challenge, this article proposes a clustering-based machine learning (ML) framework that enhances the stability of EPS networks by suppressing LFOs through real-time tuning of key power system stabilizer (PSS) parameters. To validate the proposed strategy, two distinct EPS networks are selected: the single-machine infinite-bus (SMIB) system with a single-stage PSS, and the unified power flow controller (UPFC) coordinated SMIB system with a double-stage PSS. To generate data under various loading conditions for both networks, an efficient but offline meta-heuristic algorithm, the grey wolf optimizer (GWO), is used, with the loading conditions as inputs and the key PSS parameters as outputs. The generated loading conditions are then clustered using the fuzzy k-means (FKM) clustering method. Finally, group method of data handling (GMDH) and long short-term memory (LSTM) ML models are developed on the clustered data to predict the key PSS parameters in real time for any loading condition. Several well-known statistical performance indices (SPI) are used to validate the training and testing of the developed FKM-GMDH and FKM-LSTM models and to assess their robustness in predicting the PSS parameters. The performance of the ML models is also evaluated using three stability measures (minimum damping ratio, eigenvalues, and time-domain simulations) after the PSS is tuned with the parameters estimated in real time under changing operating conditions. In addition, the outputs of the offline (GWO-based) metaheuristic model, the proposed real-time (FKM-GMDH and FKM-LSTM) machine learning models, and models previously reported in the literature are compared. According to the results, the proposed methodology outperforms the others in enhancing the stability of the selected EPS networks by damping out the unwanted LFOs under various loading conditions.
Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure onto the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated on several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Overlapping community detection in a network is a challenging issue that has attracted much attention in recent years. The notion of a hesitant node (HN) is proposed. An HN is in contact with multiple communities, but the communications are weak or even accidental, so the HN holds an implicit community structure. HNs are not rare in real-world networks. It is important to identify them because they can be efficient hubs that form the overlapping portions of communities, or simply nodes attached to some communities. Current approaches have difficulty identifying and clustering HNs. A density-based rough set model (DBRSM) is proposed that combines the advantages of density-based algorithms and rough set models. It incorporates the macro perspective of the community structure of the whole network and the micro perspective of the local information held by HNs, which facilitates the further "growth" of HNs within a community. We offer theoretical support for this model in terms of the strength of the trust path. Experiments on real-world and synthetic datasets show the practical significance of analyzing and clustering HNs based on DBRSM. In addition, clustering based on DBRSM promotes modularity optimization.
Addressing the issue that flight plans between Chinese city pairs typically rely on a single route,lacking alternative paths and posing challenges in responding to emergencies,this study employs the“quantile-inflecti...Addressing the issue that flight plans between Chinese city pairs typically rely on a single route,lacking alternative paths and posing challenges in responding to emergencies,this study employs the“quantile-inflection point method”to analyze specific deviation trajectories,determine deviation thresholds,and identify commonly used deviation paths.By combining multiple similarity metrics,including Euclidean distance,Hausdorff distance,and sector edit distance,with the density-based spatial clustering of applications with noise(DBSCAN)algorithm,the study clusters deviation trajectories to construct a multi-option trajectory set for city pairs.A case study of 23578 flight trajectories between the Guangzhou airport cluster and the Shanghai airport cluster demonstrates the effectiveness of the proposed framework.Experimental results show that sector edit distance achieves superior clustering performance compared to Euclidean and Hausdorff distances,with higher silhouette coefficients and lower Davies⁃Bouldin indices,ensuring better intra-cluster compactness and inter-cluster separation.Based on clustering results,19 representative trajectory options are identified,covering both nominal and deviation paths,which significantly enhance route diversity and reflect actual flight practices.This provides a practical basis for optimizing flight paths and scheduling,enhancing the flexibility of route selection for flights between city pairs.展开更多
Abstract: In recent years, there has been a concerted effort to improve anomaly detection techniques, particularly in the context of high-dimensional, distributed clinical data. Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy, personalising treatment plans, and optimising resource allocation to enhance clinical outcomes. Nonetheless, this domain faces unique challenges, such as irregular data collection, inconsistent data quality, and patient-specific structural variations. To address these challenges, this paper proposes a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data. The strategy uses HPO-based optimal Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster patient exercise data, facilitating efficient anomaly identification. Subsequently, a stochastic method based on the Interquartile Range filters unreliable data points, ensuring that medical tools and professionals receive only the most pertinent and accurate information. The primary objective of this study is to equip healthcare professionals and researchers with a robust tool for managing extensive, high-dimensional clinical datasets, enabling effective isolation and removal of aberrant data points. Furthermore, a regression model has been developed using Automated Machine Learning (AutoML) to assess the impact of the ensemble abnormal pattern detection approach. Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML. Experimental results show that applying the hybrid model to patient rehabilitation data notably improves AutoML performance, with an average improvement of 0.041 in the R² score, surpassing the effectiveness of traditional regression models.
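As a rough illustration of the Interquartile Range filtering step mentioned in the abstract above, the sketch below drops values that fall outside the usual Tukey fences. The function name `iqr_filter`, the multiplier 1.5, and the inclusive quantile method are assumptions made for this example; the abstract does not state the exact rule the paper uses.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common Tukey convention, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical readings: one value is far from the rest.
readings = [1, 2, 3, 4, 5, 100]
print(iqr_filter(readings))
```

In a pipeline like the one described, such a filter would presumably run on the points that remain after clustering, but that ordering is inferred from the abstract only.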
Funding: The Deanship of Scientific Research at Umm Al-Qura University, Grant Code: 23UQU4361009DSR001.
Abstract: Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms suffer from the need for manual determination of parameter values, low accuracy, and inconsistent performance with respect to data size and structure. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. The FADBC method consists of two stages: parameter selection and cluster extraction. In the first stage, a proposed method extracts optimal parameters for the dataset, including the epsilon size and the minimum-number-of-points threshold. These parameters are then used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that the FADBC method outperforms well-known clustering methods such as the agglomerative hierarchical method, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering methods. The authors report that it handles a wide variety of datasets well.
Abstract: Evolving data streams must be clustered within a limited time and with reasonable quality. Existing micro-clustering-based methods do not consider the distribution of data points inside the micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm that uses leader clustering. The algorithm is based on two-phase clustering. The online phase selects the proper mini-micro or micro-cluster leaders based on the distribution of data points in the micro-clusters. Then, the leader centers are sent to the offline phase to form the final clusters. In LeaDen-Stream, by carefully choosing between the two kinds of micro leaders, we decrease the time complexity of the clustering while maintaining the cluster quality. A pruning strategy is also used to separate real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R113), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Encephalitis is an inflammatory disease of the brain. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, so proper diagnosis at an early stage is crucial. Therefore, in this paper, we propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid to define the cluster configuration, but this single point does not contain adequate information. To extract the inner structural data accurately, a multiple-centroids approach is defined and employed in this paper, which defines the cluster configuration by allocating weights to each state in the cluster. The multiple-EEG-view fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy density-based clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
Funding: This work is supported by the University IT Research Center Project in Korea.
Abstract: With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving-object database and the problems that come with it. How can we provide customers with high-quality service, that is, how can we handle so many enquiries in as little time as possible? Because of the large amount of data, the gap between CPU speed and the size of main memory has increased considerably. One way to reduce the time needed to handle enquiries is to reduce the number of I/O operations between the buffer and secondary storage. An effective clustering of the objects can minimize the I/O cost between them. In this paper, based on the characteristics of the moving-object database, we analyze the objects in the buffer according to their mappings in two-dimensional coordinates, and then develop a density-based clustering method to reorganize the clusters effectively. This new mechanism lowers the cost of I/O operations and gives a more efficient response to enquiries.
Funding: The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through the project number (IFPSAU-2021/01/17758).
Abstract: Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding the noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the lowest number of objects within epsilon). However, it cannot handle clusters of various densities, since it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method that can discover clusters of varied densities while reducing the required number of input parameters to one. The only user input in the proposed method is MinPts. Epsilon, on the other hand, is computed automatically from statistical information about the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon equal to the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects make up less than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the ability of the proposed method to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% for the datasets tested.
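The first step described in the abstract above, computing each object's core distance and averaging those distances to obtain the initial epsilon, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the core distance is taken as the distance to the MinPts-th nearest other point (whether the point itself is counted is an assumption), and the later rounds on unclustered objects are omitted.

```python
from math import dist


def core_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        # Sorted distances to every other point (the point itself is excluded).
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[min_pts - 1])
    return result


def average_core_distance(points, min_pts):
    """Candidate epsilon: the mean of all core distances."""
    d = core_distances(points, min_pts)
    return sum(d) / len(d)


# Hypothetical toy data: the four corners of a unit square.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(average_core_distance(square, min_pts=2))
```

The brute-force neighbor search is quadratic in the number of points, which is acceptable for a sketch; a spatial index would be the usual choice for larger datasets.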
Abstract: We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off noise and outliers. In the second phase, CDC employs the concept of K-means clustering to select a larger cluster as the center. The larger cluster then merges with smaller clusters that satisfy certain constraint rules. Because clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the number of calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with that of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Funding: Supported by the National Natural Science Foundation of China (Nos. U1866602 and 71773025) and the National Key Research and Development Program of China (No. 2020YFB1006104).
Abstract: Density-based clustering is an important category among clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation technologies and other techniques for handling missing values are not suitable for density-based clustering and decrease the quality of the clustering result. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performance of the proposed algorithms is evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.
Funding: Supported by the Career-FIT Fellowships, funded through the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 713654, and supported by ACCORD (ITMS project code: 313021X329), funded through the European Regional Development Fund.
Abstract: Ball milling is widely used in industry to mill particulate material. The primary purpose of this process is to attain an appropriate product size with the least possible energy consumption. The process is also extensively used in pharmaceuticals for the comminution of excipients or drugs. Surprisingly, little is known about the mechanism of size reduction in ball mills. Traditional prediction approaches are not deemed useful for providing significant insights into the operation or for facilitating radical step changes in performance. Therefore, the discrete element method (DEM) has been used in this paper as a computational modelling approach. In previous research, DEM has been applied to simulate breaking behaviour using the impact energy of all ball collisions as the driving force for fracturing. However, the nature of pharmaceutical material fragmentation during ball milling is more complex. Suitable functional equations that link broken media and applied energy do not consider the collision of particulate media of different shapes, or collisions of particulate media (such as granules) with the balls and the rotating mill drum. This could have a significant impact on fragmentation. Therefore, this paper aimed to investigate the fragmentation of bounded particles into DEM granules of different shape and size during the ball milling process. A systematic study was undertaken to explore the effect of milling speed on breakage behaviour. In addition, a combination of a density-based clustering method and the discrete element method was employed to numerically investigate the number and size of the fragments generated during the ball milling process over time. It was found that the number of ball collisions increased proportionally with rotation speed until the critical rotation speed was reached. Correspondingly, the results show that the mill power increased with rotation speed. The cataracting motion of the mill material together with the balls was identified as the most effective regime for fragmentation, and fewer breakage events occurred for centrifugal motion. Increased milling speed produced larger quantities of fines in each batch and smaller quantities of grain fragments. Moreover, the relationship between the number of fragments produced and the milling speed at the end of the process exhibited a linear tendency.
Funding: Supported by the National Natural Science Foundation of China (12473105 and 12473106), the central government guides local funds for science and technology development (YDZJSX2024D049), and the Graduate Student Practice and Innovation Program of Shanxi Province (2024SJ313).
Abstract: As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and "others") across a range of astronomical research over the past decade. This review focuses on six key application areas: stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. The paper provides an in-depth analysis of the performance and results of each method, considering their respective suitability for different data types. Additionally, it presents clustering algorithm selection strategies based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Funding: Supported by the Shanghai Aerospace Science and Technology Innovation Foundation (SAST2023-075).
Abstract: Multichannel signals have the characteristics of information diversity and information consistency. To better explore and utilize the affinity relationship within multichannel signals, a new graph learning technique based on low-rank tensor approximation is proposed for multichannel monitoring signal processing and utilization. Firstly, the affinity relationship of multichannel signals is acquired from the clustering results of each channel signal. An affinity tensor is then constructed to integrate the diverse and consistent clustering information among the multichannel signals. Secondly, a low-rank tensor optimization model is built, and the joint affinity matrix is optimized with the assistance of the strong-confidence affinity matrix. By solving the optimization model, the fused affinity relationship graph of the multichannel signals is obtained. Finally, the multichannel fused clustering results are acquired through the updated joint affinity relationship graph. Multichannel signal utilization examples in health state assessment with public datasets and in microwave detection with actual echoes verify the advantages and effectiveness of the proposed method.
Funding: Supported by the Research Project of China Southern Power Grid (No. 056200KK52222031).
Abstract: This paper proposes an equivalent modeling method for photovoltaic (PV) power stations via a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering, to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude-frequency curves of different filter parameters are analyzed. Based on the results, a grouping parameter set for characterizing the external filter characteristics is established. These parameters are further defined as clustering parameters. A single PV inverter model is then established as a prerequisite foundation. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, effectively overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
Funding: Funded by the Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to effectively utilize a small number of labeled samples, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments conducted on several benchmark satellite datasets demonstrate that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860, highlighting its robustness and efficiency. These results demonstrate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
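For readers unfamiliar with the Purity score quoted in the abstract above, the sketch below computes it under its standard definition: each cluster is credited with its most frequent ground-truth label, and the credited counts are divided by the total number of samples. The function name and the toy labels are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster[cluster][label] += 1
    # Credit each cluster with the count of its most common true label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical toy example: cluster 1 mixes two classes.
print(purity([0, 0, 1, 1], ["a", "a", "a", "b"]))
```

Purity rewards many small clusters, which is one reason it is usually reported alongside a second measure such as accuracy.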
Funding: Supported by the NSFC (Grant Nos. 62176273, 62271070, 62441212), the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant SKLNST-2024-1-06, and the 2025 Major Project of the Natural Science Foundation of Inner Mongolia (2025ZD008).
Abstract: The Intrusion Detection System (IDS) is a security mechanism developed to observe network traffic and recognize suspicious or malicious activities. Clustering algorithms are often incorporated into IDS; however, conventional clustering-based methods face notable drawbacks, including poor scalability in handling high-dimensional datasets and a strong dependence of outcomes on initial conditions. To overcome the performance limitations of existing methods, this study proposes a novel quantum-inspired clustering algorithm that relies on a similarity coefficient-based quantum genetic algorithm (SC-QGA) and an improved quantum artificial bee colony algorithm hybrid K-means (IQABC-K). First, the SC-QGA algorithm is constructed based on quantum computing and integrates similarity coefficient theory to strengthen genetic diversity and feature extraction capabilities. For the subsequent clustering phase, the IQABC-K algorithm is enhanced with adaptive rotation gate and movement exploitation strategies, which balance the exploration capability of global search against the exploitation capability of local search. In addition, a global optimum bootstrap strategy and a linear population reduction strategy accelerate convergence toward the global optimum and reduce computational complexity. In an experimental evaluation against multiple algorithms using diverse performance metrics, the proposed algorithm shows reliable accuracy on three datasets, KDD CUP99, NSL_KDD, and UNSW_NB15, achieving accuracies of 98.57%, 98.81%, and 98.32%, respectively. These results affirm its potential as an effective solution for practical clustering applications.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52069029, 52369026), the Belt and Road Special Foundation of National Key Laboratory of Water Disaster Prevention (Grant No. 2023490411), the Yunnan Agricultural Basic Research Joint Special General Project (Grant Nos. 202501BD070001-060, 202401BD070001-071), the Construction Project of the Yunnan Key Laboratory of Water Security (No. 20254916CE340051), and the Youth Talent Project of the "Xingdian Talent Support Plan" in Yunnan Province (Grant No. XDYC-QNRC-2023-0412).
Abstract: Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges of complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a deformation prediction model, a multipoint extra-high arch dam deformation prediction model based on clustering partitions, namely CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which filters out noise and improves the signal-to-noise ratio of the data, providing high-quality input for the subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and ensure that the deformation trends at the measurement points within each partition are consistent. Finally, CEEMDAN is used to separate the monitoring data into components; each component is predicted and analyzed by combining KPCA (Kernel Principal Component Analysis) with the KELM (Kernel Extreme Learning Machine) optimized by the GSWOA (Global Search Whale Optimization Algorithm); and the component predictions are integrated via reconstruction methods to predict the overall trend of extra-high arch dam deformation. An extra-high arch dam project is taken as an example and validated via a comparative analysis of multiple models. The results show that the multipoint deformation prediction model in this paper can combine data from different measurement points, achieve a comprehensive and precise prediction of the deformation of extra-high arch dams, and provide strong technical support for safe operation.
Funding: Supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health & Welfare, Republic of Korea (No. RS-2020-KH088726); the Patient-Centered Clinical Research Coordinating Center (PACEN), the Ministry of Health and Welfare, Republic of Korea (No. HC19C0276); and the National Research Foundation of Korea (NRF), the Korea Government (MSIT) (No. RS-2023-00247504).
Abstract: AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who had undergone at least 10 24-2 VF tests were included in this study. Using 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using the hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and of pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20±14.38 y; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88±15.22 y; 43 males and 38 females). All subjects were diagnosed with POAG. Fifty-two VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG when VF data are limited.
Funding: Supported by the Deanship of Research at the King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia, under Project No. EC241001.
Abstract: Various factors, including weak tie-lines in electric power system (EPS) networks, can lead to low-frequency oscillations (LFOs), which are considered not an immediate threat but slow-acting and harmful. To address this challenge, this article proposes a clustering-based machine learning (ML) framework to enhance the stability of EPS networks by suppressing LFOs through real-time tuning of key power system stabilizer (PSS) parameters. To validate the proposed strategy, two distinct EPS networks are selected: the single-machine infinite-bus (SMIB) with a single-stage PSS, and the unified power flow controller (UPFC) coordinated SMIB with a double-stage PSS. To generate data under various loading conditions for both networks, an efficient but offline meta-heuristic algorithm, the grey wolf optimizer (GWO), is used, with the loading conditions as inputs and the key PSS parameters as outputs. The generated loading conditions are then clustered using the fuzzy k-means (FKM) clustering method. Finally, the group method of data handling (GMDH) and long short-term memory (LSTM) ML models are developed on the clustered data to predict the key PSS parameters in real time for any loading condition. Several well-known statistical performance indices (SPI) are used to validate the training and testing procedure of the developed FKM-GMDH and FKM-LSTM models and to assess its robustness, based on the prediction of PSS parameters. The performance of the ML models is also evaluated using three stability indices (minimum damping ratio, eigenvalues, and time-domain simulations) after the PSS has been optimally tuned with real-time estimated parameters under changing operating conditions. In addition, the outputs of the offline (GWO-based) metaheuristic model, the proposed real-time (FKM-GMDH and FKM-LSTM) machine learning models, and previously reported models from the literature are compared. According to the results, the proposed methodology outperforms the others in enhancing the stability of the selected EPS networks by damping out the observed unwanted LFOs under various loading conditions.
Abstract: Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure onto the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated against several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Funding: Supported by the National Natural Science Foundation of China (71271018).
Abstract: Overlapping community detection in a network is a challenging issue that has attracted much attention in recent years. The notion of a hesitant node (HN) is proposed. An HN is in contact with multiple communities, but the communications are not strong, or are even accidental, so the HN holds an implicit community structure. HNs are not rare in real-world networks. It is important to identify them because they can be efficient hubs that form the overlapping portions of communities, or simple nodes attached to some communities. Current approaches have difficulties in identifying and clustering HNs. A density-based rough set model (DBRSM) is proposed by combining the virtues of density-based algorithms and rough set models. It incorporates the macro perspective of the community structure of the whole network and the micro perspective of the local information held by HNs, which facilitates the further "growth" of HNs in a community. We offer theoretical support for this model from the standpoint of the strength of the trust path. Experiments on real-world and synthetic datasets show the practical significance of analyzing and clustering HNs based on DBRSM. In addition, clustering based on DBRSM promotes modularity optimization.
Funding: Supported in part by Boeing Company and Nanjing University of Aeronautics and Astronautics (NUAA) through the Research on Decision Support Technology of Air Traffic Operation Management in Convective Weather under Project 2022-GT-129, and in part by the Postgraduate Research and Practice Innovation Program of NUAA (No. xcxjh20240709).
Abstract: Flight plans between Chinese city pairs typically rely on a single route, lacking alternative paths and posing challenges in responding to emergencies. To address this issue, this study employs the "quantile-inflection point method" to analyze specific deviation trajectories, determine deviation thresholds, and identify commonly used deviation paths. By combining multiple similarity metrics, including Euclidean distance, Hausdorff distance, and sector edit distance, with the density-based spatial clustering of applications with noise (DBSCAN) algorithm, the study clusters deviation trajectories to construct a multi-option trajectory set for city pairs. A case study of 23,578 flight trajectories between the Guangzhou airport cluster and the Shanghai airport cluster demonstrates the effectiveness of the proposed framework. Experimental results show that sector edit distance achieves superior clustering performance compared to Euclidean and Hausdorff distances, with higher silhouette coefficients and lower Davies-Bouldin indices, ensuring better intra-cluster compactness and inter-cluster separation. Based on the clustering results, 19 representative trajectory options are identified, covering both nominal and deviation paths, which significantly enhance route diversity and reflect actual flight practices. This provides a practical basis for optimizing flight paths and scheduling, enhancing the flexibility of route selection for flights between city pairs.
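Of the similarity metrics named in the abstract above, the Hausdorff distance has a compact standard definition that is easy to show in code. The sketch below treats each trajectory as a list of 2-D points and is only an illustration: the function names and toy trajectories are hypothetical, and the study's own preprocessing (for example resampling or map projection) is not described in the abstract.

```python
from math import dist


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical toy trajectories: two parallel segments one unit apart.
track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0)]
print(hausdorff(track_a, track_b))
```

A matrix of such pairwise distances could then be passed to a DBSCAN implementation that accepts precomputed distances, which is one plausible reading of how the metric and the clustering step are combined.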