The main aim of data stream subspace clustering is to find clusters in subspaces accurately and within reasonable time. Existing data stream subspace clustering algorithms are highly sensitive to their parameters. To address these flaws, we propose SCRP, a new data stream subspace clustering algorithm. SCRP clusters quickly and is insensitive to outliers. When the data stream changes, the changes are recorded in a data structure named Region-tree, and the corresponding statistical information is updated, so SCRP can adjust its clustering results promptly. Experiments on real and synthetic datasets show that SCRP outperforms existing data stream subspace clustering algorithms in both clustering precision and clustering speed, and that it scales well with the number of clusters and dimensions.
Federated learning has become a popular tool in the big data era. It trains a centralized model on data from different clients while keeping the data decentralized. In this paper, we propose, for the first time, a federated sparse sliced inverse regression algorithm. Our method simultaneously estimates the central dimension reduction subspace and performs variable selection in a federated setting. We transform the federated high-dimensional sparse sliced inverse regression problem into a convex optimization problem by constructing the covariance matrix safely and losslessly. We then use a linearized alternating direction method of multipliers algorithm to estimate the central subspace. We also provide Bayesian information criterion and holdout validation approaches for determining the dimension of the central subspace and the hyperparameter of the algorithm. We establish an upper bound on the statistical error rate of our estimator under the heterogeneous setting. We demonstrate the effectiveness of our method through simulations and real-world applications.
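The lossless covariance construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes each client shares only three summary statistics (sample count, column sums, and the sum of outer products); the function names `client_summary` and `pooled_covariance` are hypothetical, and the abstract does not say that this is the authors' exact protocol or how its safety guarantee is obtained.

```python
import numpy as np

def client_summary(X):
    """Summary statistics a client could share instead of raw rows (assumed protocol)."""
    X = np.asarray(X, dtype=float)
    # sample count, column sums, sum of outer products x x^T
    return X.shape[0], X.sum(axis=0), X.T @ X

def pooled_covariance(summaries):
    """Combine client summaries into the covariance of the pooled data (divisor n)."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    second_moment = sum(s[2] for s in summaries)
    mean = total / n
    return second_moment / n - np.outer(mean, mean)

# Synthetic data for three clients of different sizes (illustrative only).
rng = np.random.default_rng(0)
clients = [rng.normal(size=(m, 4)) for m in (30, 50, 20)]

sigma_fed = pooled_covariance([client_summary(X) for X in clients])
# Reference: covariance computed centrally on the stacked data.
sigma_central = np.cov(np.vstack(clients), rowvar=False, bias=True)
```

In this sketch the aggregated matrix matches the centrally computed covariance up to floating-point error, which is one sense in which such a construction can be lossless.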
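Wind speed fluctuations are random and uncertain, which limits the accuracy of wind speed prediction. Accurate wind speed prediction is of great significance for optimizing wind power operation strategies and improving generation efficiency. The maximal information coefficient (MIC) is used to analyze the correlation of variables in wind turbine SCADA data; the original variables are ranked by MIC value, and a subset of 7 variables is used as the input to a deep belief network (DBN), yielding the MIC-DBN wind speed prediction model. Using actual wind farm data, the MIC-DBN model is tested against a BP neural network model and a GA-BP model. The experimental results show that the MIC-DBN wind speed prediction model achieves good prediction accuracy and generalization performance.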
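The rank-and-select step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the absolute Pearson correlation is used here as a simple stand-in for the MIC score, the data are synthetic rather than SCADA measurements, and the function name `top_k_by_score` is hypothetical. The DBN itself is not shown.

```python
import numpy as np

def top_k_by_score(X, y, k=7):
    """Rank the columns of X by a dependence score with y and keep the top k.

    The score is |Pearson correlation|, a stand-in for MIC (assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]  # column indices, highest score first
    return order[:k], scores

# Synthetic stand-in for 12 candidate variables and a wind speed target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = 3.0 * X[:, 2] + 2.0 * X[:, 5] + rng.normal(scale=0.1, size=500)

selected, scores = top_k_by_score(X, y, k=7)
X_subset = X[:, selected]  # the 7-variable subset that would feed the model
```

Replacing the score function with an MIC estimator would leave the ranking and selection logic unchanged.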