Abstract: Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are protected with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
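Two inner-layer ingredients named above, the graph Laplacian of a similarity graph and the L2,1-norm penalty, have standard definitions that can be sketched in a few lines of NumPy. This is only an illustration of those definitions, not the paper's implementation; the similarity matrix and feature-weight matrix below are toy examples:

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a similarity graph,
    where D is the diagonal degree matrix of S."""
    return np.diag(S.sum(axis=1)) - S

def l21_norm(W):
    """L2,1-norm: the sum of the L2 norms of W's rows. Penalizing it
    drives whole rows of the feature-weight matrix to zero, i.e.
    discards the corresponding features."""
    return np.sum(np.linalg.norm(W, axis=1))

# Toy similarity graph over 3 samples and a 3x2 feature-weight matrix.
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
L = graph_laplacian(S)          # every row of L sums to zero
W = np.array([[3.0, 4.0],       # row norm 5.0 -> feature kept
              [0.0, 0.0],       # row norm 0.0 -> feature pruned
              [0.6, 0.8]])      # row norm 1.0
```

Manifold-regularized objectives typically use such a Laplacian in a trace term that keeps projections of similar samples (or labels) close; the abstract does not state the exact objective, so the snippet only shows the building blocks.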
Funding: This work was partially supported by the National Natural Science Foundation of China (61876089, 61876185, 61902281, 61375121), the Opening Project of the Jiangsu Key Laboratory of Data Science and Smart Software (No. 2019DS301), the Engineering Research Center of Digital Forensics, Ministry of Education, the Key Research and Development Program of Jiangsu Province (BE2020633), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: One of the main problems in machine learning and data mining is to develop a compact model with few features, in order to reduce the computational complexity of classification algorithms. Feature selection is therefore of essential importance in the classification process: it minimizes computational time, decreases data size, and increases the precision and effectiveness of specific machine learning tasks. Owing to their superiority over conventional optimization methods, several metaheuristics have been used to solve FS problems, and hybrid metaheuristics further increase the search and convergence rate of these algorithms. In this paper, a hybrid selection algorithm combining two algorithms, the genetic algorithm (GA) and particle swarm optimization (PSO), is developed to enhance search capability. The efficacy of the proposed method is illustrated in a series of simulation experiments using benchmark datasets from the UCI machine learning repository.
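The abstract does not give the hybridization details. As a rough sketch of how a GA step can be interleaved with a binary PSO for feature selection, the fragment below alternates a sigmoid-transfer PSO update with one-point crossover and bit-flip mutation; the fitness function, constants, and worst-particle replacement policy are illustrative assumptions, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, SWARM, ITERS = 10, 8, 20

def fitness(mask):
    # Stand-in objective: distance from a hidden "relevant" pattern.
    # A real wrapper would use cross-validated classifier error here.
    relevant = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0])
    return -np.sum(mask == relevant)  # lower is better

pos = rng.integers(0, 2, (SWARM, N_FEATURES))
vel = rng.normal(0.0, 1.0, (SWARM, N_FEATURES))
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(ITERS):
    # PSO step: velocity update + sigmoid transfer to binary positions.
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
    # GA step: one-point crossover of two random particles plus bit-flip
    # mutation; the child replaces the worst particle in the swarm.
    i, j = rng.integers(0, SWARM, 2)
    cut = rng.integers(1, N_FEATURES)
    child = np.concatenate([pos[i][:cut], pos[j][cut:]])
    child ^= (rng.random(N_FEATURES) < 0.05).astype(int)
    pos[np.array([fitness(p) for p in pos]).argmax()] = child
    # Bookkeeping of personal and global bests.
    f = np.array([fitness(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()
```

The final `gbest` is the best binary feature mask found; in a real pipeline the stand-in `fitness` would be replaced by a classifier evaluated on the selected columns.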
Funding: Supported by the Natural Science Foundation of Liaoning Province under Grant 2021-MS-272 and the Educational Committee Project of Liaoning Province under Grant LJKQZ2021088.
Abstract: Feature selection (FS) is an important preprocessing step in data mining, used to remove redundant or unrelated features from high-dimensional data. Most optimization algorithms for FS problems do not balance their search well. A hybrid algorithm called the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA) is proposed in this paper to address this problem. In the proposed method, a new position-updating strategy combining the position changes of the whale and grasshopper populations is introduced, which improves the diversity of the search in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and a COVID-19 symptom dataset are used to validate the proposed method. NL-BGWOA is shown to perform well across most of the high-dimensional datasets, achieving an accuracy of up to 0.9895. Furthermore, experimental results on the medical datasets demonstrate the advantages of the proposed method on practical FS problems, with best values of 0.913 for accuracy, 5.7 for feature subset size, and 0.0873 for fitness. The results reveal that the proposed NL-BGWOA has comprehensive superiority in solving FS problems on high-dimensional data.
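The abstract does not give NL-BGWOA's update equations, so the fragment below is only a generic illustration of the idea of blending a WOA-style spiral move with a GOA-style social attraction term and then binarizing the result with a sigmoid transfer function. The 50/50 blend and all constants are assumptions, not the paper's formulas:

```python
import numpy as np

rng = np.random.default_rng(1)

def spiral_move(x, best, b=1.0):
    # WOA-style logarithmic-spiral move around the current best solution.
    l = rng.uniform(-1.0, 1.0, x.shape)
    return np.abs(best - x) * np.exp(b * l) * np.cos(2 * np.pi * l) + best

def swarm_attraction(x, others, c=0.5):
    # GOA-style social term: mean pull toward the rest of the swarm.
    return c * np.mean(others - x, axis=0)

def binarize(x):
    # Sigmoid transfer turns a continuous position into a 0/1 feature mask.
    return (rng.random(x.shape) < 1.0 / (1.0 + np.exp(-x))).astype(int)

best = np.array([1.0, -1.0, 0.5])      # current best continuous position
x = np.array([0.2, 0.3, -0.4])         # one search agent, 3 features
others = rng.normal(size=(4, 3))       # the rest of the population
new_x = 0.5 * spiral_move(x, best) + 0.5 * (x + swarm_attraction(x, others))
mask = binarize(new_x)                 # 1 = feature selected
```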
Funding: This project is supported by the Scientific Research Foundation of National Defence of China (No. 41319040202).
Abstract: To address the deficiencies of filter and wrapper feature selection methods, a new method based on a composite of the filter and wrapper approaches is proposed. The method first filters the original features to form a feature subset that meets the classification accuracy requirement, then applies a wrapper feature selection method to select the optimal feature subset. The genetic algorithm (GA), a successful technique for solving optimization problems, is applied to the optimal feature selection problem. In data simulation and in an experiment on bearing fault feature selection, the composite method requires several times less computing time than the wrapper method while maintaining classification accuracy. The method therefore possesses excellent optimization properties, saves selection time, and achieves high accuracy and efficiency.
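A minimal sketch of the filter-then-wrapper pipeline described above, on synthetic data: a correlation filter prescreens the features, and a simple nearest-centroid score stands in for the classifier. The GA of the paper would replace the exhaustive subset loop on larger problems; the data, filter criterion, and evaluator are all illustrative assumptions:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Filter stage: rank features by |correlation| with the label and keep
# the top k, shrinking the wrapper's search space.
scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top_k = np.argsort(scores)[::-1][:4]

def cv_score(cols):
    # Stand-in evaluator: nearest-centroid training accuracy on the
    # selected columns (a real wrapper would cross-validate a classifier).
    Xc = X[:, list(cols)]
    mu0, mu1 = Xc[y == 0].mean(0), Xc[y == 1].mean(0)
    pred = np.linalg.norm(Xc - mu1, axis=1) < np.linalg.norm(Xc - mu0, axis=1)
    return (pred == y).mean()

# Wrapper stage: search subsets of the pre-filtered features.
best = max((c for r in range(1, 5) for c in combinations(top_k, r)),
           key=cv_score)
```

The saving the abstract reports comes from the filter stage: the wrapper only ever evaluates subsets of the prescreened features instead of all 2^8 subsets.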
Abstract: CC (Cloud Computing) networks are distributed and dynamic, as signals appear, disappear, or lose significance. MLTs (Machine Learning Techniques) are trained on datasets that are sometimes inadequate, in terms of samples, for inferring information. DevMLOps (Development Machine Learning Operations), a dynamic strategy used for the automatic selection and tuning of MLTs, results in significant performance differences, but the scheme has many disadvantages, including the need for continuous training, more samples, longer training time during feature selection, and increased classification execution times. RFE (Recursive Feature Elimination) is computationally very expensive, as it traverses each feature without considering the correlations between them. This problem can be overcome by wrappers, which select better features by accounting for the test and training datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed AKFA (Adaptive Kernel Firefly Algorithm) selects features for CNM (Cloud Network Monitoring) operations. The AKFA methodology is demonstrated on the CNSD (Cloud Network Security Dataset), with satisfactory results on the performance metrics used: precision, recall, F-measure, and accuracy.
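The abstract does not describe the adaptive-kernel modification, so the sketch below shows only the standard firefly move that AKFA builds on: each firefly drifts toward every brighter one with attractiveness beta0 * exp(-gamma * r^2), plus a small random walk. All constants and the toy swarm are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def firefly_step(i, swarm, brightness, beta0=1.0, gamma=1.0, alpha=0.1):
    """One update of firefly i: move toward every brighter firefly with
    attractiveness beta0 * exp(-gamma * r^2), then add a random walk."""
    x = swarm[i].copy()
    for j in range(len(swarm)):
        if brightness[j] > brightness[i]:
            r2 = np.sum((swarm[j] - x) ** 2)
            beta = beta0 * np.exp(-gamma * r2)
            x += beta * (swarm[j] - x)
    return x + alpha * rng.normal(size=x.shape)

swarm = rng.normal(size=(5, 3))   # 5 fireflies in a 3-D search space
brightness = rng.random(5)        # higher = better objective value
new_pos = firefly_step(0, swarm, brightness)
```

For feature selection the continuous position would be binarized (e.g. with a transfer function) before scoring, as in other binary swarm methods.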
Funding: Sanming University Introduced High-Level Talents Scientific Research Start-up Funding Project (20YG14, 20YG01); Guiding Science and Technology Projects in Sanming City (2020-G-61, 2020-S-39); Educational Research Projects of Young and Middle-aged Teachers in Fujian Province (JAT200618, JAT200638); and the Scientific Research and Development Fund of Sanming University (B202009, B202029).
Abstract: In this paper, a hybrid model based on the sooty tern optimization algorithm (STOA) is proposed to optimize the parameters of the support vector machine (SVM) and identify the best feature sets simultaneously. Feature selection is an essential step in data preprocessing that aims to find the most relevant subset of features; in recent years it has been applied in many practical domains of intelligent systems. The application of SVM in many fields has proved its effectiveness in classification tasks of various types, and its performance is mainly determined by the kernel type and its parameters. Feature selection, which intends to select effective and representative features, is one of the most challenging processes in machine learning. The main disadvantages of feature selection based on classical optimization algorithms are stagnation in local optima and slow convergence. Therefore, the hybrid model proposed in this paper merges STOA with differential evolution (DE) to improve the search efficiency and convergence rate. A series of experiments are conducted on 12 datasets from the UCI repository to comprehensively and objectively evaluate the performance of the proposed method. Its superiority is illustrated from different aspects, such as classification accuracy, convergence performance, reduced feature dimensionality, standard deviation (STD), and computation time.
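The abstract does not detail how STOA and DE are merged, so the fragment below sketches only the DE/rand/1/bin step that such hybrids typically add for search diversity, on a toy 2-D quadratic standing in for cross-validated SVM error over (log C, log gamma). A real run would train an SVM at every objective evaluation; the bowl minimized at (1, -2) is purely an assumption for the demo:

```python
import numpy as np

rng = np.random.default_rng(3)

def objective(p):
    # Stand-in for cross-validated SVM error at (log C, log gamma);
    # minimized at (1, -2) by construction.
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

POP, DIM, F, CR = 10, 2, 0.8, 0.9
pop = rng.uniform(-4.0, 4.0, (POP, DIM))
for _ in range(100):
    for i in range(POP):
        # DE/rand/1 mutation from three distinct other individuals.
        idx = rng.choice([j for j in range(POP) if j != i], 3, replace=False)
        a, b, c = pop[idx]
        mutant = a + F * (b - c)
        # Binomial crossover, then greedy one-to-one selection.
        trial = np.where(rng.random(DIM) < CR, mutant, pop[i])
        if objective(trial) < objective(pop[i]):
            pop[i] = trial

best = min(pop, key=objective)
```

The greedy selection guarantees the population never gets worse, which is one reason DE is a popular partner for swarm algorithms that stagnate.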
Abstract: In a competitive digital age where data volumes increase over time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques, and to make decisions based on the extracted knowledge, is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016), the publicly available data source for this study, contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling, covering both model performance evaluation and feature selection, to overcome the feature selection challenges and select the best among the available DM and ML models. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model: its higher score of HMM (m, r) = 0.47 identifies an overall significant model that satisfies almost all of the user's requirements, unlike classical metrics that rely on a single criterion to select the most appropriate model. On the other hand, the ANNs were found to be the most computationally intensive for our prediction task. Moreover, the type of data and the class balance of the dataset (unbalanced data) have a significant impact on the efficiency of the model, especially on the computational cost, and can hamper the interpretability of the model's parameters. Finally, the efficiency of the predictive model could be improved with other feature selection algorithms (especially hybrid metrics) developed in consultation with domain experts, as understanding of the business domain has a significant impact.
Abstract: To address the strong noise interference in photovoltaic (PV) time series and the low accuracy and poor generalization of single models in PV power forecasting, a short-term PV power forecasting method is proposed based on feature optimization and a bi-directional long short-term memory (BiLSTM) network optimized by a hybrid improved grey wolf algorithm. First, a mutual information algorithm is applied to the input data for variable selection, eliminating redundant variables. Second, the filtered data are reconstructed via complementary ensemble empirical mode decomposition and an improved wavelet threshold algorithm, reducing the noise in the data and completing the feature optimization of the input variables. Then, the standard grey wolf optimizer is hybridized with an improved Tent chaotic map, a nonlinear decreasing factor, a dynamic weight strategy, and differential evolution to determine the optimal hyperparameter combination of the BiLSTM network, and an attention mechanism is introduced to mine the key temporal information in the data, yielding a new short-term PV power forecasting model. Simulation experiments show that, compared with the least squares support vector machine, the long short-term memory network, and the bi-directional long short-term memory network, the proposed model reduces the root mean square error by an average of 12.45%, 7.95%, and 5.37%, respectively, under sunny, cloudy, overcast, and rainy conditions, demonstrating excellent forecasting performance, good generalization ability, and potential engineering application value.
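The improved wavelet threshold rule is not specified in the abstract; for illustration, the classical soft-threshold rule that such improvements typically start from shrinks each detail coefficient toward zero by a fixed threshold t and zeroes out coefficients smaller than t in magnitude:

```python
import numpy as np

def soft_threshold(coeffs, t):
    # Classical soft thresholding: shrink each coefficient's magnitude
    # by t; coefficients with |c| < t are set to zero (treated as noise).
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

c = np.array([3.0, -0.2, 0.5, -2.5])   # toy wavelet detail coefficients
denoised = soft_threshold(c, 0.4)       # approx [2.6, 0.0, 0.1, -2.1]
```

Improved rules in the literature usually smooth the hard/soft trade-off or adapt t per decomposition level; the paper's exact variant would replace `soft_threshold` in the denoising stage before the BiLSTM sees the data.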