2,932 articles found
A Hybrid Feature Selection and Clustering-Based Ensemble Learning Approach for Real-Time Fraud Detection in Financial Transactions
1
Authors: Naif Almusallam, Junaid Qayyum
Source: Computers, Materials & Continua, 2025, Issue 11, pp. 3653-3687 (35 pages)
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model’s interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
Keywords: Fraud detection; financial transactions; economic impact; feature selection; clustering; ensemble learning
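The voting layer described in this abstract can be illustrated with a minimal sketch. The abstract does not state whether hard or soft voting is used, so majority (hard) voting is an assumption here, and the three prediction lists are made-up stand-ins for the outputs of the SVM, RF, and GBC classifiers.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Combine per-model label lists by majority vote, sample by sample."""
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the highest count;
        # equal counts resolve to the label encountered first.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions (1 = fraud, 0 = legitimate), not data from the paper.
svm_pred = [1, 0, 0, 1, 0]
rf_pred = [1, 0, 1, 1, 0]
gbc_pred = [0, 0, 1, 1, 1]

ensemble_pred = majority_vote(svm_pred, rf_pred, gbc_pred)
print(ensemble_pred)  # [1, 0, 1, 1, 0]
```

With an odd number of binary classifiers no ties can occur, which is one common reason to combine three models in this way.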
Enhancing BERTopic with Pre-Clustered Knowledge: Reducing Feature Sparsity in Short Text Topic Modeling
2
Authors: Qian Wang, Biao Ma
Source: Journal of Data Analysis and Information Processing, 2024, Issue 4, pp. 597-611 (15 pages)
Modeling topics in short texts presents significant challenges due to feature sparsity, particularly when analyzing content generated by large-scale online users. This sparsity can substantially impair semantic capture accuracy. We propose a novel approach that incorporates pre-clustered knowledge into the BERTopic model while reducing the L2 norm for low-frequency words. Our method effectively mitigates feature sparsity during cluster mapping. Empirical evaluation on the StackOverflow dataset demonstrates that our approach outperforms baseline models, achieving superior Macro-F1 scores. These results validate the effectiveness of our proposed feature sparsity reduction technique for short-text topic modeling.
Keywords: Topic Model; BERTopic; Short Text; Feature Sparsity; Cluster
PhytoCluster: a generative deep learning model for clustering plant single-cell RNA-seq data
3
Authors: Hao Wang, Xiangzheng Fu, Lijia Liu, Yi Wang, Jingpeng Hong, Bintao Pan, Yaning Cao, Yanqing Chen, Yongsheng Cao, Xiaoding Ma, Wei Fang, Shen Yan
Source: aBIOTECH, 2025, Issue 2, pp. 189-201 (13 pages)
Single-cell RNA sequencing (scRNA-seq) technology enables a deep understanding of cellular differentiation during plant development and reveals heterogeneity among the cells of a given tissue. However, the computational characterization of such cellular heterogeneity is complicated by the high dimensionality, sparsity, and biological noise inherent to the raw data. Here, we introduce PhytoCluster, an unsupervised deep learning algorithm, to cluster scRNA-seq data by extracting latent features. We benchmarked PhytoCluster on four simulated datasets and five real scRNA-seq datasets with varying protocols and data quality levels. A comprehensive evaluation indicated that PhytoCluster outperforms other methods in clustering accuracy, noise removal, and signal retention. Additionally, we evaluated the performance of the latent features extracted by PhytoCluster across four machine learning models. The computational results highlight the ability of PhytoCluster to extract meaningful information from plant scRNA-seq data, with machine learning models achieving accuracy comparable to that of raw features. We believe that PhytoCluster will be a valuable tool for disentangling complex cellular heterogeneity based on scRNA-seq data.
Keywords: scRNA-seq; Deep learning; Cellular heterogeneity; Latent features; Clustering
Relative-Density-Viewpoint-Based Weighted Kernel Fuzzy Clustering
4
Authors: Yuhan Xia, Xu Li, Ye Liu, Wenbo Zhou, Yiming Tang
Source: Computers, Materials & Continua, 2025, Issue 7, pp. 625-651 (27 pages)
Applying domain knowledge in fuzzy clustering algorithms continuously promotes the development of clustering technology. The combination of domain knowledge and fuzzy clustering algorithms has some problems, such as initialization sensitivity and information granule weight optimization. Therefore, we propose a weighted kernel fuzzy clustering algorithm based on a relative density viewpoint (RDVWKFC). Compared with traditional density-based methods, RDVWKFC can capture the intrinsic structure of the data more accurately, thus improving the initial quality of the clustering. By introducing a Relative Density based Knowledge Extraction Method (RDKM) and an adaptive weight optimization mechanism, we effectively address the limitations of viewpoint initialization and information granule weight optimization. RDKM can accurately identify high-density regions and optimize the initialization process. The adaptive weight mechanism can reduce the interference of noise and outliers in the initial cluster centre selection by dynamically allocating weights. Experimental results on 14 benchmark datasets show that the proposed algorithm is superior to existing algorithms in terms of clustering accuracy, stability, and convergence speed. It shows adaptability and robustness, especially when dealing with different data distributions and noise interference. Moreover, RDVWKFC also shows significant advantages when dealing with data with complex structures and high-dimensional features. These advancements provide versatile tools for real-world applications such as bioinformatics, image segmentation, and anomaly detection.
Keywords: Fuzzy clustering; fuzzy c-means; feature weighting; information granule
Auto-Weighted Neutrosophic Fuzzy Clustering for Multi-View Data
5
Authors: Zhe Liu, Jiahao Shi, Dania Santina, Yulong Huang, Nabil Mlaiki
Source: Computer Modeling in Engineering & Sciences, 2025, Issue 9, pp. 3531-3555 (25 pages)
The increasing prevalence of multi-view data has made multi-view clustering a crucial technique for discovering latent structures from heterogeneous representations. However, traditional fuzzy clustering algorithms show limitations with the inherent uncertainty and imprecision of such data, as they rely on a single-dimensional membership value. To overcome these limitations, we propose an auto-weighted multi-view neutrosophic fuzzy clustering (AW-MVNFC) algorithm. Our method leverages the neutrosophic framework, an extension of fuzzy sets, to explicitly model imprecision and ambiguity through three membership degrees. The core novelty of AW-MVNFC lies in a hierarchical weighting strategy that adaptively learns both the contributions of individual data views and the importance of each feature within a view. Through a unified objective function, AW-MVNFC jointly optimizes the neutrosophic membership assignments, cluster centers, and the distributions of view and feature weights. Comprehensive experiments conducted on synthetic and real-world datasets show that our algorithm achieves more accurate and stable clustering than existing methods, demonstrating its effectiveness in handling the complexities of multi-view data.
Keywords: Multi-view data; neutrosophic fuzzy clustering; view weight; feature weight; uncertainty
A systematic data-driven modelling framework for nonlinear distillation processes incorporating data intervals clustering and new integrated learning algorithm
6
Authors: Zhe Wang, Renchu He, Jian Long
Source: Chinese Journal of Chemical Engineering, 2025, Issue 5, pp. 182-199 (18 pages)
The distillation process is an important chemical process, and the application of a data-driven modelling approach has the potential to reduce model complexity compared to mechanistic modelling, thus improving the efficiency of process optimization or monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which brings challenges to accurate data-driven modelling of distillation processes. This paper proposes a systematic data-driven modelling framework to solve these problems. Firstly, data segment variance was introduced into the K-means algorithm to form K-means data interval (KMDI) clustering in order to cluster the data into perturbed and steady-state intervals for steady-state data extraction. Secondly, the maximal information coefficient (MIC) was employed to calculate the nonlinear correlation between variables for removing redundant features. Finally, extreme gradient boosting (XGBoost) was integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight update strategy, to construct the new integrated learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial process of propylene distillation.
Keywords: Integrated learning algorithm; Data intervals clustering; Feature selection; Application of artificial intelligence in distillation industry; Data-driven modelling
Identification of Visibility Level for Enhanced Road Safety under Different Visibility Conditions: A Hierarchical Clustering-Based Learning Model
7
Authors: Asmat Ullah, Yar Muhammad, Bakht Zada, Korhan Cengiz, Nikola Ivkovic, Mario Konecki, Abid Yahya
Source: Computers, Materials & Continua, 2025, Issue 11, pp. 3767-3786 (20 pages)
Low visibility conditions, particularly those caused by fog, significantly affect road safety and reduce drivers’ ability to see ahead clearly. The conventional approaches used to address this problem primarily rely on instrument-based and fixed-threshold-based theoretical frameworks, which face challenges in adaptability and demonstrate lower performance under varying environmental conditions. To overcome these challenges, we propose a real-time visibility estimation model that leverages roadside CCTV cameras to monitor and identify visibility levels under different weather conditions. The proposed method begins by identifying specific regions of interest (ROI) in the CCTV images and focuses on extracting specific features such as the number of lines and contours detected within these regions. These features are then provided as input to the proposed hierarchical clustering model, which classifies them into different visibility levels without the need for predefined rules and threshold values. In the proposed approach, we used two different distance similarity metrics, namely dynamic time warping (DTW) and Euclidean distance, alongside the proposed hierarchical clustering model and noted its performance in terms of numerous evaluation measures. The proposed model achieved an average accuracy of 97.81%, precision of 91.31%, recall of 91.25%, and F1-score of 91.27% using the DTW distance metric. We also conducted experiments with other deep learning (DL)-based models used in the literature and compared their performance with that of the proposed model. The experimental results demonstrate that the proposed model is more adaptable and consistent than the methods used in the literature. The proposed method provides drivers with real-time and accurate visibility information and enhances road safety during low visibility conditions.
Keywords: CCTV images; road safety and security; visibility level estimation; hierarchical clustering learning; feature extraction; safe and secure transportation
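Dynamic time warping, one of the two distance metrics named in this abstract, can be sketched with the textbook dynamic-programming recurrence. This is a generic illustration rather than the authors' implementation; the absolute-difference local cost and the example sequences (imagined per-frame line counts) are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (no window constraint)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical feature sequences: the second repeats one value,
# which warping absorbs at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Unlike Euclidean distance, DTW accepts sequences of different lengths, which is why it is often preferred when the same pattern unfolds at different speeds.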
New Shadowed C-Means Clustering with Feature Weights
8
Authors: 王丽娜, 王建东, 姜坚
Source: Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2012, Issue 3, pp. 273-283 (11 pages); cited by 2
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions, generated by shadowed sets-based clustering, have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The selection of the weight exponent is crucial for a good result, and the weights are updated iteratively with each partition of clusters. The convergence of the weighted algorithms is given, and feasible cluster validity indices from data mining applications are utilized. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm is better than the unweighted algorithms.
Keywords: fuzzy C-means; shadowed sets; shadowed C-means; feature weights; cluster validity index
Heuristic feature selection method for clustering
9
Authors: 徐峻岭, 徐宝文, 张卫丰, 崔自峰
Source: Journal of Southeast University (English Edition) (EI, CAS), 2006, Issue 2, pp. 169-175 (7 pages)
In order to enable clustering to be done in a lower dimension, a new feature selection method for clustering is proposed. This method has three steps, all carried out in a wrapper framework. First, all the original features are ranked according to their importance. An evaluation function E(f) used to evaluate the importance of a feature is introduced. Second, the set of important features is selected sequentially. Finally, the possibly redundant features are removed from the important feature subset. Because the features are selected sequentially, it is not necessary to search through the large feature subset space, so efficiency is improved. Experimental results show that the set of important features for clustering can be found, and that unimportant features or features that may hinder the clustering task are discarded by this method.
Keywords: feature selection; clustering; unsupervised learning
Improved method for the feature extraction of laser scanner using genetic clustering
10
Authors: Yu Jinxia, Cai Zixing, Duan Zhuohua
Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, Issue 2, pp. 280-285 (6 pages); cited by 6
Feature extraction from range images provided by a ranging sensor is a key issue in pattern recognition. To automatically extract the environmental features sensed by a 2D ranging sensor (a laser scanner), an improved method based on genetic clustering (VGA-clustering) is presented. By integrating the spatial neighbouring information of range data into a fuzzy clustering algorithm, a weighted fuzzy clustering algorithm (WFCA) is introduced in place of the standard clustering algorithm to realize feature extraction for the laser scanner. Because the number of clusters is unknown in advance, several validation index functions are used to estimate the validity of different clustering algorithms, and one validation index is selected as the fitness function of the genetic algorithm so as to determine the accurate number of clusters automatically. At the same time, an improved genetic algorithm, IVGA, based on VGA is proposed to address the local optimum problem of the clustering algorithm. It increases the population diversity and improves the genetic operators of the elitist rule to enhance the local search capacity and speed up convergence. Comparison with other algorithms demonstrates the effectiveness of the proposed algorithm.
Keywords: laser scanner; feature extraction; weighted fuzzy clustering; validation index; genetic algorithm
A New Feature Selection Method for Text Clustering
11
Authors: XU Junling, XU Baowen, ZHANG Weifeng, CUI Zifeng, ZHANG Wei
Source: Wuhan University Journal of Natural Sciences (CAS), 2007, Issue 5, pp. 912-916 (5 pages); cited by 3
Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering in order to select features for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
Keywords: feature selection; text clustering; unsupervised learning; data preprocessing
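The Davies-Bouldin index used in this abstract to score intermediate feature subsets can be sketched in plain Python. The code below follows the standard definition (mean over clusters of the worst ratio of summed within-cluster scatter to centroid separation); the two toy partitions are hypothetical, and how the paper embeds the index in its selection loop is not reproduced here.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters; lower means better separation."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance from members to their own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k

# Hypothetical partitions: same within-cluster spread, different separation.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
poorly_separated = [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]
print(davies_bouldin(well_separated))    # 0.2
print(davies_bouldin(poorly_separated))  # 1.0
```

Because the index needs no class labels, it can rank candidate feature subsets by the quality of the clustering each one produces.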
Internal Defects Detection Method of the Railway Track Based on Generalization Features Cluster Under Ultrasonic Images
12
Authors: Fupei Wu, Xiaoyang Xie, Jiahua Guo, Qinghua Li
Source: Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2022, Issue 5, pp. 364-381 (18 pages); cited by 4
There may be several internal defects in railway track work that have different shapes and distribution rules, and these defects affect the safety of high-speed trains. Establishing reliable detection models and methods for these internal defects remains a challenging task. To address this challenge, in this study, an intelligent detection method based on a generalization feature cluster is proposed for internal defects of railway tracks. First, the defects are classified and counted according to their shape and location features. Then, generalized features of the internal defects are extracted and formulated based on the maximum difference between different types of defects and the maximum tolerance among defects of the same type. Finally, the extracted generalized features are expressed by function constraints and formulated as generalization feature clusters to classify and identify internal defects in the railway track. Furthermore, to improve the detection reliability and speed, a dimension-reduction method for the generalization feature clusters is presented in this paper. Based on this reduced-dimension feature and strongly constrained generalized features, the K-means clustering algorithm is developed for defect clustering, and good clustering results are achieved. For defects in the rail head region, the clustering accuracy is over 95% and the Davies-Bouldin index (DBI) is negligible, which validates the proposed generalization features with strong constraints. Experimental results show that the accuracy of the proposed method based on generalization feature clusters is up to 97.55% and the average detection time is 0.12 s/frame, which indicates that it performs well in adaptability, accuracy, and detection speed under complex working environments. The proposed algorithm can effectively detect internal defects in railway tracks using an established generalization feature cluster model.
Keywords: Railway track; Generalization features cluster; Defects classification; Ultrasonic image; Defects detection
Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering
13
Authors: SU Ya-ru, WANG Ru-jing, CHEN Peng, WEI Yuan-yuan, LI Chuan-xi, HU Yi-min
Source: Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2012, Issue 5, pp. 752-759 (8 pages); cited by 4
Feature optimization is important to agricultural text mining. Usually, the vector space model is used to represent text documents. However, this basic approach still suffers from two drawbacks: the curse of dimensionality and the lack of semantic information. In this paper, a novel ontology-based feature optimization method for agricultural text is proposed. First, terms of the vector space model are mapped to concepts of the agricultural ontology, whose concept frequency weights are computed statistically from term frequency weights; second, weights of concept similarity are assigned to the concept features according to the structure of the agricultural ontology. By combining feature frequency weights and feature similarity weights based on the agricultural ontology, the dimensionality of the feature space can be reduced drastically. Moreover, semantic information can be incorporated into this method. The results show that feature optimization with this method yields a significant improvement in agricultural text clustering.
Keywords: agricultural ontology; feature optimization; agricultural text clustering
Stress-induced trend: the clustering feature of coal mine disasters and earthquakes in China
14
Authors: Bo Chen
Source: International Journal of Coal Science & Technology (EI, CAS), 2020, Issue 4, pp. 676-692 (17 pages); cited by 4
Nearly half of coal mine disasters in China have been found to occur in clusters or to be accompanied by nearby earthquakes, and all disaster types are involved. Stress disturbances seem to exist among mining areas and to be responsible for the observed clustering. The earthquakes accompanied by coal mine disasters may be vital geophysical evidence for tectonic stress disturbances around mining areas. This paper analyzes all the possible causative factors to demonstrate the authenticity and reliability of the observed phenomena. A quantitative study was performed on the degree of clustering, and space-time distribution curves were obtained. Under a threshold of 100 km, 47% of disasters are involved in cluster series, and 372 coal mine disasters are accompanied by earthquakes. The majority of cluster series, lasting 1-2 days, correspond well with nearby earthquakes and are speculated to be related to local stress disturbance, while the minority, lasting longer than 4 days, correspond well with fatal earthquakes and are speculated to be related to regional stress disturbance. The cluster series have multiple properties, such as area, distance, and related disasters; when these are compared with the energy and magnitude of earthquakes, good correspondences are obtained. This indicates that the cluster series of coal mine disasters and earthquakes are linked with fatal earthquakes and may serve as footprints of regional stress disturbance. Speculations relating to the geological model are made, and five disaster-causing models are examined. The findings are suggested to have broad scientific significance for earthquake research and disaster prevention.
Keywords: Earthquake; Coal mine disaster; Cluster feature; Stress disturbance
Application of Self-Organizing Feature Map Neural Network Based on K-means Clustering in Network Intrusion Detection
15
Authors: Ling Tan, Chong Li, Jingming Xia, Jun Cao
Source: Computers, Materials & Continua (SCIE, EI), 2019, Issue 7, pp. 275-288 (14 pages); cited by 5
Due to the widespread use of the Internet, customer information is vulnerable to attacks on computer systems, which creates an urgent need for intrusion detection technology. Recently, network intrusion detection has become one of the most important technologies in network security. Network intrusion detection has reached high accuracy so far. However, existing methods have very low efficiency, even the most popular SOM neural network method. In this paper, an efficient and fast network intrusion detection method is proposed. Firstly, the fundamentals of the two methods are introduced. Then, the self-organizing feature map neural network based on K-means clustering (KSOM) algorithm is presented to improve the efficiency of network intrusion detection. Finally, the NSL-KDD data set is used as the network intrusion data set to demonstrate that the KSOM method needs significantly fewer clustering iterations than the SOM method without substantially affecting the clustering results, and that its accuracy is much higher than that of the K-means method. The experimental results show that our method can improve the accuracy of network intrusion detection and significantly reduce the number of clustering iterations.
Keywords: K-means clustering; self-organizing feature map neural network; network security; intrusion detection; NSL-KDD data set
Different Feature Selection of Soil Attributes Influenced Clustering Performance on Soil Datasets
16
Authors: Jiaogen Zhou, Yang Wang
Source: International Journal of Geosciences, 2019, Issue 10, pp. 919-929 (11 pages); cited by 1
Feature selection is very important for obtaining meaningful and interpretable clustering results from a clustering analysis. In the application of soil data clustering, there is a lack of good understanding of how clustering performance responds to different feature subsets. In the present paper, we analyzed the performance differences between k-means, fuzzy c-means, and spectral clustering algorithms under different feature subsets of soil data sets. The experimental results demonstrated that the spectral clustering algorithm generally performed better than k-means and fuzzy c-means across the different feature subsets. The feature subsets containing environmental attributes improved clustering performance more than those containing spatial attributes and produced more accurate and meaningful clustering results. Our results demonstrated that combining the spectral clustering algorithm with feature subsets containing environmental attributes rather than spatial attributes may be a better choice in applications of soil data clustering.
Keywords: Feature Selection; K-Means Clustering; Fuzzy C-Means Clustering; Spectral Clustering; Soil Attributes
Massive Power Device Condition Monitoring Data Feature Extraction and Clustering Analysis using MapReduce and Graph Model
17
Authors: Hongtao Shen, Peng Tao, Pei Zhao, Hao Ma
Source: CES Transactions on Electrical Machines and Systems (CSCD), 2019, Issue 2, pp. 221-230 (10 pages); cited by 4
The effective storage, processing, and analysis of power device condition monitoring data face enormous challenges. A framework based on the Aliyun DTplus platform is proposed that supports both MapReduce and Graph at the same time for massive monitoring data analysis. First, power device condition monitoring data storage based on MaxCompute tables and parallel permutation entropy feature extraction based on MaxCompute MapReduce are designed and implemented on the DTplus platform. Then, a Graph-based k-means algorithm is implemented and used for clustering analysis of massive condition monitoring data. Finally, performance tests are performed to compare the execution time of the serial and parallel programs. Performance is analyzed in terms of CPU core consumption, memory utilization, and parallel granularity. Experimental results show that the designed framework and parallel algorithms can efficiently process massive power device condition monitoring data.
Keywords: clustering analysis; Graph; feature extraction; MapReduce; MaxCompute; power device condition monitoring
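The permutation entropy feature mentioned in this abstract can be illustrated with a serial, single-machine sketch. It follows the usual ordinal-pattern definition and normalizes by log2(order!); the order of 3, the unit time delay, and the sample series are assumptions, and the paper's MapReduce parallelization is not reproduced.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized permutation entropy: 0 for a fully regular series, up to 1."""
    patterns = Counter()
    for start in range(len(series) - order + 1):
        window = series[start:start + order]
        # Ordinal pattern: window positions sorted by value
        # (ties keep their original order because sorted() is stable).
        pattern = tuple(sorted(range(order), key=lambda i: window[i]))
        patterns[pattern] += 1
    total = sum(patterns.values())
    entropy = sum(-(c / total) * math.log2(c / total) for c in patterns.values())
    return entropy / math.log2(math.factorial(order))

# Hypothetical signals: a monotone ramp yields a single ordinal pattern,
# an alternating signal yields two equally frequent patterns.
print(permutation_entropy(list(range(10))))          # 0.0
print(permutation_entropy([0, 1, 0, 1, 0, 1, 0, 1]))
```

Each window is processed independently, which is what makes the pattern-counting step a natural fit for a map phase followed by a counting reduce phase.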
Stable Label-Specific Features Generation for Multi-Label Learning via Mixture-Based Clustering Ensemble 被引量:1
18
作者 Yi-Bo Wang Jun-Yi Hang Min-Ling Zhang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2022年第7期1248-1261,共14页
Multi-label learning deals with objects associated with multiple class labels,and aims to induce a predictive model which can assign a set of relevant class labels for an unseen instance.Since each class might possess... Multi-label learning deals with objects associated with multiple class labels,and aims to induce a predictive model which can assign a set of relevant class labels for an unseen instance.Since each class might possess its own characteristics,the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning,where the predictive model is induced based on tailored features specific to each class label instead of the identical instance representations.As a representative approach,LIFT generates label-specific features by conducting clustering analysis.However,its performance may be degraded due to the inherent instability of the single clustering algorithm.To improve this,a novel multi-label learning approach named SENCE(stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble)is proposed,which stabilizes the generation process of label-specific features via clustering ensemble techniques.Specifically,more stable clustering results are obtained by firstly augmenting the original instance repre-sentation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization(EM)algorithm.Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms. 展开更多
Keywords: clustering ensemble; expectation-maximization algorithm; label-specific features; multi-label learning
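The abstract says only that LIFT builds label-specific features through clustering analysis. The sketch below illustrates one common reading of that idea for a single label: cluster the positive and the negative instances separately with k-means, then describe every instance by its distances to the resulting centers. This is an assumption-laden illustration, not the SENCE method (which replaces the single clustering with a mixture-based ensemble fitted by EM); the helper names, the deterministic initialization, and the fixed `k` are all hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny k-means with a deterministic start (first k points as centers)."""
    if not points:
        raise ValueError("cannot cluster an empty set of points")
    k = min(k, len(points))
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def label_specific_features(X, y, k=1):
    """Map each instance to its distances from per-label cluster centers.

    X: list of feature vectors; y: 0/1 relevance of ONE label per instance.
    """
    positives = [x for x, t in zip(X, y) if t == 1]
    negatives = [x for x, t in zip(X, y) if t == 0]
    centers = kmeans(positives, k) + kmeans(negatives, k)
    features = [[math.dist(x, c) for c in centers] for x in X]
    return features, centers


if __name__ == "__main__":
    X = [[0, 0], [0, 2], [10, 0], [10, 2]]
    y = [1, 1, 0, 0]
    feats, centers = label_specific_features(X, y, k=1)
    print(centers)
    print(feats[0])
```

Because the centers depend on how k-means happens to converge, rerunning with a different initialization can change the features, which is the instability the abstract attributes to a single clustering algorithm.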
Feature Selection for Cluster Analysis in Spectroscopy (Cited by: 1)
19
Authors: Simon Crase, Benjamin Hall, Suresh N. Thennadil. Computers, Materials & Continua (SCIE, EI), 2022, No. 5, pp. 2435-2458 (24 pages)
Cluster analysis in spectroscopy presents some unique challenges due to the specific characteristics of spectroscopic data, namely high dimensionality and small sample size. To improve cluster analysis outcomes, feature selection can be used to remove redundant or irrelevant features and reduce the dimensionality. However, for cluster analysis, this must be done in an unsupervised manner without the benefit of data labels. This paper presents a novel feature selection approach for cluster analysis that uses clusterability metrics to remove the features that contribute least to a dataset's tendency to cluster. Two versions are presented and evaluated: the Hopkins clusterability filter, which uses the Hopkins test for spatial randomness, and the Dip clusterability filter, which uses the Dip test for unimodality. These new techniques, along with a range of existing filter and wrapper feature selection techniques, were evaluated on eleven real-world spectroscopy datasets using internal and external clustering indices. The newly proposed Hopkins clusterability filter performed the best of the six filter techniques evaluated. However, results varied greatly between techniques depending on the specifics of the dataset and the number of features selected, with significant instability observed for most techniques at low numbers of features. The genetic algorithm wrapper technique avoided this instability, performed consistently across all datasets, and gave better results on average than using all the features in the spectra.
Keywords: cluster analysis; spectroscopy; unsupervised learning; feature selection; wavenumber selection
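The abstract does not spell out how the Hopkins test is turned into a filter, so the sketch below covers only the underlying statistic, in one common formulation: compare nearest-neighbor distances from sampled data points with those from points drawn uniformly inside the data's bounding box. Conventions vary between references (some omit the power `d`, some invert the ratio); here values near 0.5 suggest spatial randomness and values near 1 suggest cluster structure. The function name, `m`, and `seed` are illustrative assumptions.

```python
import math
import random


def hopkins(data, m=10, seed=0):
    """Hopkins statistic for a list of equal-length numeric vectors."""
    if m > len(data):
        raise ValueError("m must not exceed the number of data points")
    rng = random.Random(seed)
    d = len(data[0])
    lows = [min(p[j] for p in data) for j in range(d)]
    highs = [max(p[j] for p in data) for j in range(d)]

    # w: distance from each sampled data point to its nearest OTHER data point.
    sampled = rng.sample(range(len(data)), m)
    w = [
        min(math.dist(data[i], q) for k, q in enumerate(data) if k != i)
        for i in sampled
    ]

    # u: distance from each uniformly drawn point to its nearest data point.
    u = []
    for _ in range(m):
        probe = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        u.append(min(math.dist(probe, q) for q in data))

    sum_u = sum(x ** d for x in u)
    sum_w = sum(x ** d for x in w)
    return sum_u / (sum_u + sum_w)


if __name__ == "__main__":
    # Two tight, well-separated groups: a strongly clustered toy dataset.
    group_a = [(i * 0.001, i * 0.001) for i in range(20)]
    group_b = [(10 + i * 0.001, 10 + i * 0.001) for i in range(20)]
    print(hopkins(group_a + group_b))
```

A filter in the spirit of the abstract could, for example, recompute this statistic with each feature left out and discard the feature whose removal lowers it least; that selection loop is a guess at the general idea and is not taken from the paper.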
RESEARCH FOR CLUSTERING OF FEATURE MANUFACTURING-ORIENTED
20
Authors: Liu Xuan, Shen Xiaohong, Nie Xuejun, Chen Shan (College of Mechanical Engineering and Automation, Beijing University of Industry and Commerce, Beijing 100037, China). Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2002, No. 1, pp. 11-14 (4 pages)
The following questions are discussed: feature cluster, the feature cluster concept, and the reasoning formula. The defects of clustering based on approach direction and feed direction are analyzed. The concept of feature tool axis direction and its definition method are presented. The features of a practical part are also clustered by tool axis direction.
Keywords: feature; clustering of feature; tool axis direction