Journal Articles (14 results found)
1. A Model Distributed Training Method Based on Improved Split Learning in the Internet of Things
Authors: Hui Cai, Meifeng Zhu, Shiji Tao, Yuning Zhou, Jian Zhou, Biyun Sheng, Lu Zhao, Xin He. Data Intelligence, 2026, Issue 1, pp. 164-182.
With the rapid development of the Internet of Things (IoT), a large amount of data has been brought to distributed nodes in edge environments, raising concerns about data privacy leakage and data transmission latency. Split learning (SL) divides the model among distributed nodes, addressing the high storage and computational demands of storing the entire model on a single node in federated learning. However, SL suffers from scalability limitations, leading to decreased training effectiveness when the number of IoT nodes changes. Furthermore, the heterogeneity of data across IoT nodes also degrades training effectiveness due to the generalization limitations of SL. Therefore, we propose a model distributed training method based on improved SL in IoT. First, we leverage Mixup, a data augmentation method, to blend the "smashed data" generated by client models, thereby creating more diverse samples with smooth transitions in the feature space. This improves robustness against fluctuations in the number of nodes. Second, a hypernetwork is employed to allocate weights to model parameters, obtaining personalized training parameters for each client model by evaluating the importance of different clients, thus mitigating the impact of heterogeneous node data on model generalization. Experimental results show that the proposed method outperforms other distributed training methods on various datasets in terms of model performance.
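The Mixup blending of "smashed data" described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the Beta(0.2, 0.2) mixing distribution, and applying the same coefficient to labels are assumptions borrowed from the original Mixup technique.

```python
import numpy as np

def mixup_smashed(x_a, x_b, y_a, y_b, alpha=0.2, rng=None):
    """Blend two batches of 'smashed data' (client-side activations) and
    their labels using a coefficient drawn from a Beta distribution."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x_a + (1.0 - lam) * x_b     # convex combination of activations
    y = lam * y_a + (1.0 - lam) * y_b     # same combination of (soft) labels
    return x, y, lam
```

Blending activations rather than raw inputs keeps client data local while still producing smoother, more diverse samples at the split point.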
Keywords: deep echo state network; distributed training; split learning
2. DSparse: A Distributed Training Method for Edge Clusters Based on Sparse Update (Cited by 1)
Authors: Xiao-Hui Peng, Yi-Xuan Sun, Zheng-Hui Zhang, Yi-Fan Wang. Journal of Computer Science & Technology, 2025, Issue 3, pp. 637-653.
Edge machine learning creates a new computational paradigm by enabling the deployment of intelligent applications at the network edge. It enhances application efficiency and responsiveness by performing inference and training tasks closer to data sources. However, it encounters several challenges in practice. The variance in hardware specifications and performance across different devices presents a major issue for training and inference tasks. Additionally, edge devices typically possess limited network bandwidth and computing resources compared with data centers. Moreover, existing distributed training architectures often fail to consider the constraints of resources and communication efficiency in edge environments. In this paper, we propose DSparse, a method for distributed training based on sparse update in edge clusters with various memory capacities. It aims at maximizing the utilization of memory resources across all devices within a cluster. To reduce memory consumption during the training process, we adopt sparse update to prioritize the updating of selected layers on the devices in the cluster, which not only lowers memory usage but also reduces the data volume of parameters and the time required for parameter aggregation. Furthermore, DSparse utilizes a parameter aggregation mechanism based on multi-process groups, subdividing the aggregation tasks into AllReduce and Broadcast types, thereby further reducing the communication frequency for parameter aggregation. Experimental results using the MobileNetV2 model on the CIFAR-10 dataset demonstrate that DSparse reduces memory consumption by an average of 59.6% across seven devices, with a 75.4% reduction in parameter aggregation time, while maintaining model precision.
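The core idea of sparse update, updating only selected layers on memory-limited devices, can be sketched as below. The greedy largest-layers-first selection rule is hypothetical; DSparse's actual per-device criterion is not reproduced in the abstract.

```python
def select_update_layers(layer_param_counts, memory_budget):
    """Greedily choose which layers a device will update so that the
    trainable-parameter footprint stays within its memory budget.
    (Hypothetical rule: largest layers first.)"""
    chosen, used = [], 0
    for name, count in sorted(layer_param_counts.items(), key=lambda kv: -kv[1]):
        if used + count <= memory_budget:
            chosen.append(name)
            used += count
    return chosen, used
```

Layers left out of `chosen` would be frozen on that device, which also shrinks the parameter volume exchanged during aggregation.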
Keywords: distributed training; edge computing; edge machine learning; sparse update; edge cluster
原文传递
3. A Secured and Continuously Developing Methodology for Breast Cancer Image Segmentation via U-Net Based Architecture and Distributed Data Training
Authors: Rifat Sarker Aoyon, Ismail Hossain, M. Abdullah-Al-Wadud, Jia Uddin. Computer Modeling in Engineering & Sciences, 2025, Issue 3, pp. 2617-2640.
This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture. However, the computational demand for image processing is very high. Therefore, we have conducted this research to build a system that enables image segmentation training with low-power machines. To accomplish this, all data are divided into several segments, each being trained separately. For prediction, an initial output is produced by each trained model for an input, and the ultimate output is selected by pixel-wise majority voting over the predicted outputs, which also preserves data privacy. In addition, this kind of distributed training system allows different computers to be used simultaneously, so the training process takes comparatively less time than typical training approaches. Even after training is complete, the proposed prediction system allows a newly trained model to be added to the system, so prediction accuracy improves continuously. We evaluated the effectiveness of the ultimate output using four performance metrics: average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy. The experimental results show scores of 0.9216, 0.0687, 0.9477, and 0.8674, respectively. In addition, the proposed method was compared with four other state-of-the-art models in terms of total training time and usage of computational resources, and it outperformed all of them in these aspects.
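The pixel-wise majority voting step can be sketched directly. A minimal NumPy version, assuming binary segmentation masks and a strict majority rule (ties round down to background); the paper's exact tie-breaking is not stated.

```python
import numpy as np

def pixel_majority_vote(masks):
    """Combine binary segmentation masks from independently trained models
    by strict pixel-wise majority voting."""
    stacked = np.stack(masks).astype(np.int32)   # shape: (n_models, H, W)
    votes = stacked.sum(axis=0)                  # per-pixel count of 1-votes
    return (2 * votes > len(masks)).astype(np.uint8)
```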
Keywords: breast cancer; U-Net; distributed training; data privacy; low-power machines
4. Rationing bandwidth resources for mitigating network resource contention in distributed DNN training clusters
Authors: Qiang Qi, Fei Xu, Li Chen, Zhi Zhou. CCF Transactions on High Performance Computing, 2021, Issue 2, pp. 171-185.
Distributed deep neural network (DDNN) training becomes increasingly compelling as the DNN model gets complex and the dataset grows large. Through an in-depth analysis of the latest Microsoft GPU cluster trace, we show that the co-located Parameter Server (PS) configuration is not uncommon in production DDNN training clusters, which inevitably causes intense network resource contention among the co-located PS and worker tasks. Our motivation experiments on Amazon EC2 further show that such network resource contention brings severe performance variation to DDNN training jobs. While existing works largely mitigate the inter-job network resource contention, the intra-job (i.e., task-level) network resource contention among the co-located PS and worker tasks has received comparably little attention. To tackle such performance issues, in this paper, we design and implement Nebula, a Network bandwidth resource allocation strategy for DDNN training tasks, in order to mitigate the network resource contention and alleviate the performance variation of DDNN training jobs. Nebula monitors the weights of co-located PS and workers and rations the network bandwidth resources for the two tasks by comparing the corresponding task weights. We implement a prototype of Nebula and conduct extensive prototype experiments with representative DNN models trained on Amazon EC2. Our experiment results demonstrate that Nebula can reduce the iteration time of a DDNN training job by up to 25% and improve the cluster resource utilization by up to 30% in comparison to MXNet, yet with practically acceptable runtime overhead.
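The abstract says Nebula rations link bandwidth between the co-located PS and worker tasks "by comparing the corresponding task weights". A proportional split is one natural reading of that rule; the sketch below assumes it (Nebula's actual allocation policy may differ).

```python
def ration_bandwidth(total_mbps, ps_weight, worker_weight):
    """Split a host's link bandwidth between a co-located PS task and a
    worker task in proportion to their monitored weights
    (proportional rule assumed for illustration)."""
    total = ps_weight + worker_weight
    if total == 0:
        return total_mbps / 2.0, total_mbps / 2.0  # no signal: split evenly
    ps_share = total_mbps * ps_weight / total
    return ps_share, total_mbps - ps_share
```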
Keywords: distributed DNN training; bandwidth allocation; network resource contention
5. BAFT: bubble-aware fault-tolerant framework for distributed DNN training with hybrid parallelism
Authors: Runzhe CHEN, Guandong LU, Yakai WANG, Rui ZHANG, Zheng HU, Yanming MIAO, Zhifang CAI, Jingwen LENG, Minyi GUO. Frontiers of Computer Science, 2025, Issue 1, pp. 29-39.
As deep neural networks (DNNs) have been successfully adopted in various domains, the training of these large-scale models becomes increasingly difficult and is often deployed on compute clusters composed of many devices like GPUs. However, as the size of the cluster increases, so does the possibility of failures during training. Currently, faults are mainly handled by recording checkpoints and recovering, but this approach causes large overhead and affects training efficiency even when no error occurs: a low checkpointing frequency leads to a large loss of training time on recovery, while a high recording frequency hurts training efficiency. To solve this contradiction, we propose BAFT, a bubble-aware fault-tolerant framework for hybrid parallel distributed training. BAFT can automatically analyze parallel strategies, profile the runtime information, and schedule checkpointing tasks at the granularity of pipeline stages depending on the bubble distribution in the training. It supports higher checkpoint efficiency and introduces less than 1% time overhead, which allows checkpoints to be recorded at high frequency, thereby reducing the time lost in error recovery and avoiding the impact of fault tolerance on training.
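The idea of hiding checkpoint work inside pipeline bubbles can be sketched as a first-fit scheduling problem. This is a simplified illustration only; BAFT's real scheduler uses profiled runtime information and the actual parallel strategy, none of which appear here.

```python
def schedule_checkpoints(bubbles, ckpt_costs):
    """Assign each pipeline stage's checkpoint task to the first idle
    'bubble' long enough to hide it (first-fit sketch).

    bubbles    : list of (start_time, duration) idle intervals
    ckpt_costs : per-stage checkpoint durations
    Returns    : {stage: scheduled_start_time} for stages that fit
    """
    remaining = [list(b) for b in bubbles]
    plan = {}
    for stage, cost in enumerate(ckpt_costs):
        for b in remaining:
            if b[1] >= cost:
                plan[stage] = b[0]   # checkpoint starts at the bubble start
                b[0] += cost         # shrink the bubble by the time used
                b[1] -= cost
                break
    return plan
```

Checkpoints placed this way overlap with time the pipeline would have spent idle, which is why the overhead can stay small even at high recording frequency.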
Keywords: distributed training; fault tolerance; checkpoint; pipeline parallelism; error recovery
6. A Novel Agricultural Data Sharing Mode Based on Rice Disease Identification
Authors: Mengmeng ZHANG, Xiujuan WANG, Mengzhen KANG, Jing HUA, Haoyu WANG, Feiyue WANG. Plant Diseases and Pests, 2024, Issue 2, pp. 9-16.
In this paper, a variety of classical convolutional neural networks are trained on two different datasets using a transfer learning method. We demonstrate that the training dataset has a significant impact on the training results, in addition to the optimization achieved through the model structure. However, the lack of open-source agricultural data, combined with the absence of a comprehensive open-source data sharing platform, remains a substantial obstacle. This issue is closely related to the difficulty and high cost of obtaining high-quality agricultural data, the low level of education of most employees, underdeveloped distributed training systems, and weak data security. To address these challenges, this paper proposes the novel idea of constructing an agricultural data sharing platform based on a federated learning (FL) framework, aiming to overcome the deficiency of high-quality training data in the agricultural field.
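The federated learning aggregation at the heart of such a sharing platform can be illustrated with the standard FedAvg rule: each farm or institution trains locally, and only model parameters, weighted by local sample counts, are averaged centrally. This is the textbook FedAvg formula, shown for illustration, not the paper's implementation.

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """Aggregate client model parameters by sample-count-weighted averaging
    (standard FedAvg): clients with more local data get more influence."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_params, client_sizes))
```

Because only parameters travel, raw field images never leave each participant, which is exactly what makes the scheme attractive where data sharing is costly or sensitive.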
Keywords: rice disease and pest identification; convolutional neural networks; distributed training; federated learning (FL); open-source data sharing platform
7. Multi-User MmWave Beam Tracking via Multi-Agent Deep Q-Learning (Cited by 2)
Authors: MENG Fan, HUANG Yongming, LU Zhaohua, XIAO Huahua. ZTE Communications, 2023, Issue 2, pp. 53-60.
Beamforming is significant for millimeter wave multi-user massive multi-input multi-output systems. Meanwhile, the overhead cost of channel state information and beam training is considerable, especially in dynamic environments. To reduce the overhead cost, we propose a multi-user beam tracking algorithm using a distributed deep Q-learning method. With online learning of users' moving trajectories, the proposed algorithm learns to scan a beam subspace to maximize the average effective sum rate. Considering practical implementation, we model the continuous beam tracking problem as a non-Markov decision process and thus develop a simplified training scheme of deep Q-learning to reduce the training complexity. Furthermore, we propose a scalable state-action-reward design for scenarios with different user and antenna numbers. Simulation results verify the effectiveness of the designed method.
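The Q-learning loop behind such beam tracking can be illustrated in tabular form, with beams as actions and an epsilon-greedy policy over the scanned beam subspace. The paper uses a deep network and a non-Markov formulation; this tabular stand-in only shows the update rule, and all names and constants here are illustrative.

```python
import numpy as np

def q_update(Q, state, beam, reward, next_state, lr=0.1, gamma=0.9):
    """One tabular Q-learning update: Q[s, a] moves toward the observed
    reward (e.g., effective rate) plus the discounted best next value."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, beam] += lr * (td_target - Q[state, beam])
    return Q

def choose_beam(Q, state, epsilon, rng):
    """Epsilon-greedy beam selection: mostly exploit the best-known beam,
    occasionally explore another one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(Q[state].argmax())
```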
Keywords: multi-agent deep Q-learning; centralized training and distributed execution; mmWave communication; beam tracking; scalability
8. Crosswind stability of high-speed trains in special cuts (Cited by 3)
Authors: 张洁, 高广军, 刘堂红, 李志伟. Journal of Central South University (SCIE/EI/CAS/CSCD), 2015, Issue 7, pp. 2849-2856.
Analysis of the aerodynamic performance of high-speed trains in special cuts would provide references for the critical overturning velocity and complement operation safety management under strong winds. This work investigated the flow structure around trains under different cut depths and slope angles using computational fluid dynamics (CFD). The high-speed train was modeled with bogies and inter-carriage gaps, and the accuracy of the numerical method was validated against experimental data from wind tunnel tests. Then, the variations of aerodynamic forces and surface pressure distribution of the train were analyzed. The results show that the surroundings of cuts along the railway line have a great effect on the crosswind stability of trains. As the slope angle and depth of the cut increase, the coefficients of aerodynamic forces tend to decrease. An angle of 75° was chosen as the optimum for the follow-up research. Under different depth conditions, the reasonable cut depth for high-speed trains to run safely is 3 m lower than that of the conventional cut, whose slope ratio is 1:1.5. Furthermore, the windward slope angle is more important than the leeward one for the train's aerodynamic performance. Due to the shielding of appropriate cuts, the train body sits in a minor positive pressure environment. Thus, designing a suitable cut can contribute to improving the operation safety of high-speed trains.
Keywords: high-speed train; crosswind stability; cut; pressure distribution; numerical simulation
9. The Characteristics and Distribution Pattern of Seafloor Sinuous Pockmark Train in the Niger Delta Basin, West Africa (Cited by 2)
Authors: JIANG Li, WU Shenghe, HU Guangyi, ZHANG Jiajia. Acta Geologica Sinica (English Edition) (SCIE/CAS/CSCD), 2016, Issue 3, pp. 1057-1058.
Objective: The term "pockmark" was introduced by King and MacLean (1970) to describe small circular depressions on echosounder records in Nova Scotia. Pockmarks are usually circular or elongated depressions, generally 10-400 m in diameter and 30-50 m deep. They are normally regarded as manifestations of fluid escape through the seabed, are valuable features on the seafloor, and are useful in constraining the hydrodynamics of sedimentary basins. Since then, pockmarks have been recognized in many areas around the world. They occur predominantly in fine-grained siliciclastic depositional settings, although a few case studies have been reported in carbonate settings. In this paper we illustrate a suite of fluid escape features discovered during the course of petroleum exploration on the West African continental margin (Fig. 1). They are particularly of interest to the oil and gas industry because they could be potential indicators of deeply buried hydrocarbon reservoirs, and fluid flow phenomena in deep water oilfields are important for the safe and efficient exploration, development, and production of hydrocarbons in the area.
Keywords: The Characteristics and Distribution Pattern of Seafloor Sinuous Pockmark Train in the Niger Delta Basin, West Africa
10. Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs (Cited by 2)
Authors: Yu Tang, Zhigang Kan, Lujia Yin, Zhiquan Lai, Zhaoning Zhang, Linbo Qiao, Dongsheng Li. Tsinghua Science and Technology (SCIE/EI/CAS/CSCD), 2022, Issue 1, pp. 114-126.
In distributed training, increasing the batch size can improve parallelism, but it can also bring many difficulties to the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on CIFAR-10 using Stochastic Gradient Descent (SGD) and Adaptive moment estimation (Adam) while keeping the total batch size in the parameter server constant and lowering the batch size on each Graphics Processing Unit (GPU). A new method that considers momentum to eliminate training errors in distributed training is proposed. We define a Momentum-like Factor (MF) to represent the influence of former gradients on parameter updates in each iteration. Then, we modify the MF values and conduct experiments to explore how different MF values influence the training performance based on SGD, Adam, and Nesterov accelerated gradient. Experimental results reveal that increasing MFs is a reliable method for reducing training errors in distributed training. An analysis of convergence conditions in distributed training, with consideration of a large batch size and multiple GPUs, is also presented.
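The role of a momentum-like factor can be seen in a single SGD-with-momentum step, where MF scales how much accumulated past gradients carry into the current update. This mirrors the classical momentum formulation; the paper's precise MF definition for Adam and Nesterov variants is not reproduced here.

```python
def sgd_mf_step(param, grad, velocity, lr=0.01, mf=0.9):
    """One SGD step where the momentum-like factor (MF) controls how much
    former gradients influence the current parameter update."""
    velocity = mf * velocity + grad   # larger MF => past gradients matter more
    param = param - lr * velocity
    return param, velocity
```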
Keywords: multiple Graphics Processing Units (GPUs); batch size; training error; distributed training; momentum-like factors
11. Personalized Privacy-Preserving Data Utilization Approach Powered by Distributed-GAN
Authors: Shuo Wang, Chao Wang, Tianshuo Dong, Yunhua He, Ke Xiao. Big Data Mining and Analytics (CSCD), 2024, Issue 4, pp. 1098-1113.
Recently, Machine Learning (ML) has made great achievements in a wide range of fields with the increasing amount of data available. However, the development of ML faces the challenge of data silos due to privacy considerations. With the emergence of distributed ML architectures, training tasks can be completed without sharing data. In particular, a distributed Generative Adversarial Network (GAN), as a generative model, can rebuild datasets for downstream tasks and does not need to re-access users' private datasets even if the target task changes. However, the training of distributed GANs still faces serious privacy threats. Existing work usually adopts a uniform level of privacy protection, which does not meet the requirement of personalized privacy protection for each user. In this paper, we propose a privacy-preserving training framework for distributed GANs, which uses differential privacy to provide users with personalized privacy protection. Further, a privacy-driven incentive strategy is proposed, which designs a series of smart contracts to provide customized payments to data owners with different privacy preferences as compensation for their privacy cost. Finally, we conduct privacy analysis and experimental validation, which demonstrate that our approach can optimize the generative model with lower privacy cost and generate higher-quality data for downstream tasks.
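Personalized differential privacy can be illustrated by scaling the noise added to each user's contribution to that user's own privacy budget epsilon. The Laplace mechanism below is a standard textbook choice, not necessarily the mechanism used in the paper; sensitivity and function names are illustrative.

```python
import numpy as np

def personalized_dp_noise(update, epsilon, sensitivity=1.0, rng=None):
    """Perturb one user's model update with Laplace noise scaled to that
    user's privacy budget: smaller epsilon => stronger protection =>
    larger noise (Laplace mechanism, scale = sensitivity / epsilon)."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return update + rng.laplace(0.0, scale, size=update.shape)
```

Users who demand tighter privacy (small epsilon) inject more noise, degrade the model more, and, under the paper's incentive strategy, would receive correspondingly adjusted payment.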
Keywords: Generative Adversarial Network (GAN); differential privacy; distributed training
12. Adaptive sparse ternary gradient compression for distributed DNN training in edge computing
Authors: Yingchi Mao, Jun Wu, Xuesong Xu, Longbao Wang. CCF Transactions on High Performance Computing, 2022, Issue 2, pp. 120-134.
In edge computing, distributed training of Deep Neural Networks (DNNs) requires exchanging massive gradients between parameter servers and working nodes, and the high communication cost constrains the training speed. To break this limitation, gradient compression algorithms pursue extreme compression ratios at the expense of the accuracy of the trained model. Therefore, new gradient compression techniques are necessary to ensure both communication efficiency and model accuracy. This paper introduces a novel technique, an Adaptive Sparse Ternary Gradient Compression (ASTC) scheme, which relies on the number of gradients in model layers to compress gradients. ASTC establishes a compression selection criterion based on the number of gradients per layer, compresses the network layers that meet this criterion, evaluates gradient importance based on entropy to adaptively perform sparse compression, and finally applies ternary quantization compression and a lossless coding scheme to the sparse gradients. Using public datasets (MNIST, CIFAR-10, Tiny ImageNet) and deep learning models (CNN, LeNet5, ResNet18) for experimental evaluation, we show that the training efficiency of ASTC is about 1.6 times, 1.37 times, and 1.1 times higher than that of Top-1, AdaComp, and SBC, respectively. Furthermore, ASTC improves training accuracy by an average of about 1.9% compared with the above approaches.
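The sparse-plus-ternary step can be sketched as follows: keep only the largest-magnitude gradients, then map survivors to {-s, 0, +s} with s the mean surviving magnitude. This is a common sparse-ternary scheme shown for illustration; ASTC's entropy-based importance evaluation and per-layer selection are not reproduced, and the top-k thresholding rule here is an assumption.

```python
import numpy as np

def sparse_ternary_quantize(grad, sparsity=0.9):
    """Sparsify a gradient tensor (drop all but the top (1 - sparsity)
    fraction by magnitude), then ternarize the survivors to {-s, 0, +s}
    where s is the mean magnitude of the kept entries."""
    k = max(1, int(grad.size * (1.0 - sparsity)))     # number of entries kept
    threshold = np.sort(np.abs(grad).ravel())[-k]     # k-th largest magnitude
    mask = np.abs(grad) >= threshold
    scale = np.abs(grad[mask]).mean()
    return np.sign(grad) * mask * scale
```

A ternary tensor needs only a sign bit per kept entry plus one shared scale, which is what makes the follow-up lossless coding so effective.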
Keywords: edge computing; distributed DNN training; gradient compression; communication cost
13. νGNN: Non-Uniformly partitioned full-graph GNN training on mixed GPUs
Authors: Hemeng Wang, Wenqing Lin, Qingxiao Sun, Weifeng Liu. CCF Transactions on High Performance Computing, 2025, Issue 4, pp. 305-322.
Graph neural networks (GNNs) can be adapted to GPUs with high computing capability due to their massive arithmetic operations. Compared with mini-batch training, full-graph training does not require sampling of the input graph and halo regions, avoiding potential accuracy losses. Current deep learning frameworks evenly partition large graphs to scale GNN training to distributed multi-GPU platforms. On the other hand, the rapid evolution of hardware requires technology companies and research institutions to frequently update their equipment to cope with the latest tasks. This results in large-scale clusters with a mixture of GPUs of various computational capabilities and hardware specifications. However, existing works fail to consider sub-graphs adapted to different GPU generations, leading to inefficient resource utilization and degraded training efficiency. Therefore, we propose νGNN, a Non-Uniformly partitioned full-graph GNN training framework for heterogeneous distributed platforms. νGNN first models the GNN processing ability of hardware based on various theoretical parameters. Then, νGNN automatically obtains a reasonable task partitioning scheme by combining hardware, model, and graph dataset information. Finally, νGNN implements an irregular graph partitioning mechanism that allows GNN training tasks to execute efficiently on distributed heterogeneous systems. The experimental results show that in real-world scenarios with a mixture of GPU generations, νGNN can outperform other static partitioning schemes based on hardware specifications.
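The contrast with even partitioning can be shown with a toy proportional split: give each GPU a share of graph nodes matching its estimated processing ability. This is a deliberately simple sketch; νGNN's model also combines memory, bandwidth, model, and dataset information.

```python
def partition_by_capability(num_nodes, gpu_scores):
    """Split a graph's nodes across GPUs in proportion to each GPU's
    estimated processing ability (e.g., relative FLOPS), instead of evenly.
    Any rounding remainder goes to the last GPU."""
    total = sum(gpu_scores)
    sizes = [int(num_nodes * s / total) for s in gpu_scores]
    sizes[-1] += num_nodes - sum(sizes)
    return sizes
```

With a mixed cluster, e.g., two older GPUs and one twice as fast, the faster device gets half the nodes rather than a third, so no generation sits idle waiting for stragglers.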
Keywords: graph neural network; distributed training; graph partitioning; GPU
14. xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning (Cited by 2)
Authors: Adam Weingram, Yuke Li, Hao Qi, Darren Ng, Liuyao Dai, Xiaoyi Lu. Journal of Computer Science & Technology (SCIE/EI/CSCD), 2023, Issue 1, pp. 166-195.
Machine learning techniques have become ubiquitous in both industry and academic applications. Increasing model sizes and training data volumes necessitate fast and efficient distributed training approaches. Collective communications greatly simplify inter- and intra-node data transfer and are an essential part of the distributed training process, as information such as gradients must be shared between processing nodes. In this paper, we survey the current state-of-the-art collective communication libraries (namely xCCL, including NCCL, oneCCL, RCCL, MSCCL, ACCL, and Gloo), with a focus on the industry-led ones for deep learning workloads. We investigate the design features of these xCCLs, discuss their use cases in industry deep learning workloads, compare their performance with industry-made benchmarks (i.e., NCCL Tests and PARAM), and discuss key takeaways and interesting observations. We believe our survey sheds light on potential research directions for future xCCL designs.
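The workhorse collective in gradient sharing is AllReduce: after the call, every rank holds the element-wise sum of all ranks' buffers. The naive sketch below shows only the semantics; libraries like NCCL implement the same result with bandwidth-efficient ring or tree algorithms over GPU interconnects.

```python
def allreduce_sum(rank_buffers):
    """Naive AllReduce (sum): every rank ends up with the element-wise sum
    of all ranks' buffers. Semantics only; real xCCLs use ring/tree
    algorithms to avoid this all-to-one pattern."""
    reduced = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]
```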
Keywords: collective; deep learning; distributed training; GPUDirect; RDMA (remote direct memory access)