Distributed Quantum Computing (DQC) provides a means for scaling available quantum computation by interconnecting multiple quantum processor units (QPUs). A key challenge in this domain is efficiently allocating logical qubits from quantum circuits to the physical qubits within QPUs, a task known to be NP-hard. Traditional approaches, primarily focused on graph partitioning strategies, have sought to reduce the number of required Bell pairs for executing non-local CNOT operations, a form of gate teleportation. However, these methods have limitations in terms of efficiency and scalability. Addressing this, our work jointly considers gate and qubit teleportations, introducing a novel meta-heuristic algorithm to minimise the network cost of executing a quantum circuit. By allowing dynamic reallocation of qubits along with gate teleportations during circuit execution, our method significantly enhances the overall efficacy and potential scalability of DQC frameworks. In our numerical analysis, we demonstrate that integrating qubit teleportations into our genetic algorithm for optimizing circuit blocking reduces the required resources, specifically the number of EPR pairs, compared to traditional graph partitioning methods. Our results, derived from both benchmark and randomly generated circuits, show that as circuit complexity increases, demanding more qubit teleportations, our approach effectively optimises these teleportations throughout the execution, thereby enhancing performance through strategic circuit partitioning. This is a step forward in the pursuit of a global quantum compiler which will ultimately enable the efficient use of a ‘quantum data center’ in the future.
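As context for the cost being minimised, the static-partition baseline the paper improves on can be sketched in a few lines: every two-qubit gate whose operands sit on different QPUs consumes one EPR (Bell) pair via gate teleportation. This is an illustrative sketch only, not the paper's algorithm; the gate-list and assignment encodings are assumptions.

```python
def epr_cost(two_qubit_gates, qpu_of):
    """Baseline EPR-pair count for a static partition: one Bell pair per
    two-qubit gate that crosses a QPU boundary (gate teleportation).
    `two_qubit_gates` is a list of (control, target) logical-qubit pairs;
    `qpu_of` maps each logical qubit to its assigned QPU (both hypothetical
    encodings, assumed for illustration)."""
    return sum(1 for a, b in two_qubit_gates if qpu_of[a] != qpu_of[b])
```

Dynamic qubit teleportation, as considered in the paper, amounts to letting the `qpu_of` mapping change mid-circuit, trading teleportation cost against fewer boundary-crossing gates.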
With the rapid development of generative artificial intelligence (GenAI), the task of story visualization, which transforms natural language narratives into coherent and consistent image sequences, has attracted growing research attention. However, existing methods still face limitations in balancing multi-frame character consistency and generation efficiency, which restricts their feasibility for large-scale practical applications. To address this issue, this study proposes a modular cloud-based distributed system built on Stable Diffusion. By separating the character generation and story generation processes, and integrating multi-feature control techniques, a caching mechanism, and an asynchronous task queue architecture, the system enhances generation efficiency and scalability. The experimental design includes both automated and human evaluations of character consistency, performance testing, and multi-node simulation. The results show that the proposed system outperforms the baseline model StoryGen in both CLIP-I and human evaluation metrics. In terms of performance, under the experimental environment of this study, dual-node deployment reduces average waiting time by approximately 19%, while the four-node simulation further reduces it by up to 65%. Overall, this study demonstrates the advantages of cloud-distributed GenAI in maintaining character consistency and reducing generation latency, highlighting its potential value in multi-user collaborative story visualization applications.
Federated learning often experiences slow and unstable convergence due to edge-side data heterogeneity. This problem becomes more severe when the edge participation rate is low, as the information collected from different edge devices varies significantly. As a result, communication overhead increases, which further slows down the convergence process. To address this challenge, we propose a simple yet effective federated learning framework that improves consistency among edge devices. The core idea is to cluster the lookahead gradients collected from edge devices on the cloud server to obtain personalized momentum for steering local updates. In parallel, a global momentum is applied during model aggregation, enabling faster convergence while preserving personalization. This strategy enables efficient propagation of the estimated global update direction to all participating edge devices and maintains alignment in local training, without introducing extra memory or communication overhead. We conduct extensive experiments on benchmark datasets such as CIFAR-100 and Tiny-ImageNet. The results confirm the effectiveness of our framework. On CIFAR-100, our method reaches 55% accuracy in 37 fewer rounds and achieves a competitive final accuracy of 65.46%. Even under extreme non-IID scenarios, it delivers significant improvements in both accuracy and communication efficiency. The implementation is publicly available at https://github.com/sjmp525/CollaborativeComputing/tree/FedCCM (accessed on 20 October 2025).
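The clustering step can be sketched as follows: the server groups device lookahead gradients (here with a minimal k-means) and maintains one momentum vector per cluster. This is a toy illustration under assumed details (k-means, exponential momentum with factor `beta`), not the paper's exact FedCCM rule.

```python
import numpy as np

def cluster_momentum(lookahead_grads, n_clusters=2, beta=0.9,
                     prev_momenta=None, seed=0):
    """Illustrative server-side step: cluster per-device lookahead gradients
    with a minimal k-means, then update one momentum vector per cluster.
    Hypothetical simplification of the paper's personalized-momentum idea."""
    grads = np.stack(lookahead_grads)            # shape: (n_devices, dim)
    rng = np.random.default_rng(seed)
    centers = grads[rng.choice(len(grads), n_clusters, replace=False)]
    for _ in range(10):                          # a few Lloyd iterations
        dists = ((grads[:, None] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = grads[labels == k].mean(axis=0)
    if prev_momenta is None:
        prev_momenta = np.zeros_like(centers)
    momenta = beta * prev_momenta + (1 - beta) * centers
    return labels, momenta
```

Each device would then steer its local update with the momentum of its assigned cluster, while a separate global momentum is applied at aggregation time.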
With the continuous use of cloud and distributed computing, the threats associated with data and information technology (IT) in such an environment have also increased. Thus, data security and data leakage prevention have become important in a distributed environment. In this aspect, mobile agent-based systems are one of the latest mechanisms to identify and prevent the intrusion and leakage of data across the network. Thus, to tackle one or more of the several challenges of Mobile Agent-Based Information Leakage Prevention, this paper aims to provide a comprehensive, detailed, and systematic study of the Distribution Model for Mobile Agent-Based Information Leakage Prevention. This paper reviews papers selected from journals published between 2009 and 2019. A critical review is presented for the distributed mobile agent-based intrusion detection systems in terms of their design analysis, techniques, and shortcomings. Initially, eighty-five papers were identified, but a paper-selection process reduced the number to thirteen papers for in-depth review.
Federated Learning (FL) has become a popular training paradigm in recent years. However, stragglers are critical bottlenecks in an Internet of Things (IoT) network while training. These nodes produce stale updates to the server, which slow down the convergence. In this paper, we studied the impact of stale updates on the global model, which is observed to be significant. To address this, we propose a weighted averaging scheme, FedStrag, that optimizes the training with stale updates. The work is focused on training a model in an IoT network that has multiple challenges, such as resource constraints, stragglers, network issues, and device heterogeneity. To this end, we developed a time-bounded asynchronous FL paradigm that can train a model on the continuous inflow of data in the edge-fog-cloud continuum. To test the FedStrag approach, a model is trained under multiple straggler scenarios on both Independent and Identically Distributed (IID) and non-IID datasets on Raspberry Pis. The experimental results suggest that FedStrag outperforms the baseline FedAvg in all possible cases.
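One plausible form of such a weighted average is shown below: an update that arrives `s` rounds stale is discounted geometrically before aggregation. The discount rule `alpha ** s` is a hypothetical stand-in, not FedStrag's published weighting.

```python
import numpy as np

def staleness_weighted_avg(updates, staleness, alpha=0.5):
    """Illustrative staleness-discounted averaging: update i, which is
    staleness[i] rounds old, gets weight alpha**staleness[i] (normalized).
    With all-fresh updates this reduces to plain FedAvg-style averaging.
    The specific discount rule is an assumption for illustration."""
    w = np.array([alpha ** s for s in staleness], dtype=float)
    w /= w.sum()
    return np.tensordot(w, np.stack(updates), axes=1)
```

A time-bounded asynchronous round would collect whatever updates arrive before the deadline, record each one's staleness, and aggregate with a rule of this shape.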
Spark performs excellently in large-scale data-parallel computing and iterative processing. However, with the increase in data size and program complexity, the default scheduling strategy has difficulty meeting the demands of resource utilization and performance optimization. Scheduling strategy optimization, as a key direction for improving Spark's execution efficiency, has attracted widespread attention. This paper first introduces the basic theories of Spark, compares several default scheduling strategies, and discusses common scheduling performance evaluation indicators and factors affecting scheduling efficiency. Subsequently, existing scheduling optimization schemes are summarized based on three scheduling modes: load characteristics, cluster characteristics, and the matching of both; representative algorithms are analyzed in terms of performance indicators and applicable scenarios, comparing the advantages and disadvantages of different scheduling modes. The article also explores in detail the integration of Spark scheduling strategies with specific application scenarios and the challenges in production environments. Finally, the limitations of the existing schemes are analyzed, and future prospects are outlined.
Distributed computing is an important topic in the field of wireless communications and networking, and its high efficiency in handling large amounts of data is particularly noteworthy. Although distributed computing benefits from its ability to process data in parallel, it incurs a communication burden between different servers, which delays the computation process. Recent research has applied coding in distributed computing to reduce the communication burden, where repetitive computation is utilized to enable multicast opportunities so that the same coded information can be reused across different servers. To handle the computation tasks in practical heterogeneous systems, we propose a novel coding scheme to effectively mitigate the "straggling effect" in distributed computing. We assume that there are two types of servers in the system and the only difference between them is their computational capabilities; the servers with lower computational capabilities are called stragglers. Given any ratio of fast servers to slow servers and any gap in computational capability between them, we achieve approximately the same computation time for both fast and slow servers by assigning different amounts of computation tasks to them, thus reducing the overall computation time. Furthermore, we investigate the information-theoretic lower bound of the inter-communication load and show that the lower bound is within a constant multiplicative gap of the upper bound achieved by our scheme. Various simulations also validate the effectiveness of the proposed scheme.
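The load-balancing idea behind equal finishing times can be sketched directly: give each server work in proportion to its speed. The parameterization (`speed_ratio` = fast speed over slow speed) is assumed here for illustration and is not taken from the paper.

```python
def split_tasks(total_tasks, n_fast, n_slow, speed_ratio):
    """Sketch of proportional task assignment: a fast server does
    `speed_ratio` times the work of a slow server, so all servers finish
    at (approximately) the same time. Returns (tasks per fast server,
    tasks per slow server) as real numbers; a real scheme would round
    and encode the pieces."""
    slow_units = n_fast * speed_ratio + n_slow   # total capacity in slow-server units
    per_slow = total_tasks / slow_units
    per_fast = speed_ratio * per_slow
    return per_fast, per_slow
```

With this split, `per_fast / speed_ratio == per_slow`, i.e. the fast servers' completion time matches the slow servers', which is exactly the condition that removes the straggling effect in the two-type model.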
In the current noisy intermediate-scale quantum (NISQ) era, a single quantum processing unit (QPU) is insufficient to implement large-scale quantum algorithms; this has driven extensive research into distributed quantum computing (DQC). DQC involves the cooperative operation of multiple QPUs but is concurrently challenged by excessive communication complexity. To address this issue, this paper proposes a quantum circuit partitioning method based on spectral clustering. The approach transforms quantum circuits into weighted graphs and, through computation of the Laplacian matrix and clustering techniques, identifies candidate partition schemes that minimize the total weight of the cut. Additionally, a global gate search tree strategy is introduced to meticulously explore opportunities for the merged transfer of global gates, thereby minimizing the transmission cost of distributed quantum circuits and selecting the optimal partition scheme from the candidates. Finally, the proposed method is evaluated through various comparative experiments. The experimental results demonstrate that spectral clustering-based partitioning exhibits robust stability and runtime efficiency on quantum circuits of different scales. In experiments involving the quantum Fourier transform algorithm and RevLib quantum circuits, the transmission cost achieved by the global gate search tree strategy is significantly optimized.
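The spectral step can be illustrated with the classic Fiedler-vector bipartition, a two-block simplification of the Laplacian-plus-clustering pipeline the paper describes: build the graph Laplacian of the weighted interaction graph and split vertices by the sign of its second eigenvector.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a weighted qubit-interaction graph (symmetric adjacency matrix W,
    edge weight = number of two-qubit gates between the qubits) into two
    blocks by the sign of the Fiedler vector, i.e. the eigenvector of the
    second-smallest Laplacian eigenvalue. A small cut weight corresponds to
    few global gates between QPUs. Two-way sketch of the k-way method."""
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian D - W
    _, vecs = np.linalg.eigh(L)         # eigh: eigenvalues in ascending order
    return vecs[:, 1] >= 0              # boolean block label per qubit
```

On a graph made of two densely connected groups joined by a light edge, the sign pattern of the Fiedler vector recovers the two groups, which is the low-cut candidate partition the paper's method starts from.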
[Objective] This study aims to address the inefficiency of AI-for-Science tasks caused by the design and implementation challenges of applying distributed parallel computing strategies to deep learning models, as well as their inefficient execution. [Methods] We propose an automatic distributed parallelization method for AI-for-Science tasks, called FlowAware. Based on the AI-for-Science framework JAX, this approach thoroughly analyzes task characteristics, operator structures, and data flow properties of deep learning models. By incorporating cluster topology information, it constructs a search space for distributed parallel computing strategies. Guided by load balancing and communication optimization objectives, FlowAware automatically identifies optimal distributed parallel computing strategies for AI models. [Results] Comparative experiments conducted on both GPU-like accelerator clusters and GPU clusters demonstrated that FlowAware achieves a throughput improvement of up to 7.8× compared to Alpa. [Conclusions] FlowAware effectively enhances the search efficiency of distributed parallel computing strategies for AI models in scientific computing tasks and significantly improves their computational performance.
An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applicable to netted radar systems, onboard distributed detection systems, and advanced C3I systems. Two architectures are provided and verified: one is based on the pure TCP/IP protocol and the C/S model, implemented with Winsock; the other is based on CORBA (Common Object Request Broker Architecture). The performance of the data fusion simulation system, i.e., its reliability, flexibility, and scalability, is improved and enhanced by the two models. Their study provides a valuable exploration of incorporating distributed computation concepts into radar system simulation techniques.
The shortage of computation methods and storage devices has largely limited the development of multi-objective optimization in industrial processes. To improve the operational levels of the process industries, we propose a multi-objective optimization framework based on cloud services and a cloud distribution system. Real-time data from manufacturing procedures are first temporarily stored in a local database, and then transferred to the relational database in the cloud. Next, a distribution system with elastic compute power is set up for the optimization framework. Finally, a multi-objective optimization model based on deep learning and an evolutionary algorithm is proposed to optimize several conflicting goals of the blast furnace ironmaking process. With the application of this optimization service in a cloud factory, iron production was found to increase by 83.91 t∙d^(-1), the coke ratio decreased by 13.50 kg∙t^(-1), and the silicon content decreased by an average of 0.047%.
Protein-protein interactions are of great significance for humans to understand the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of its limitations is conducted from the perspectives of efficiency and memory consumption when applying it to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge sequence information of proteins. After that, its procedure is modified to follow the MapReduce framework so that the prediction task is performed distributively. A series of extensive experiments has been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework can considerably improve computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
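The execution model being ported onto can be shown with a minimal single-process MapReduce skeleton. This is illustrative only: the real framework shards the map and reduce phases across a cluster, and the mapper/reducer bodies for CoFex-style feature extraction are not reproduced here.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-process MapReduce skeleton: map emits (key, value) pairs,
    the shuffle groups values by key, and reduce folds each group.
    A cluster framework runs the same three phases distributed over nodes."""
    shuffle = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            shuffle[key].append(value)      # shuffle: group by key
    return {k: reducer(k, vs) for k, vs in shuffle.items()}  # reduce phase
```

Counting sequence features across proteins, for example, maps naturally onto this pattern: the mapper emits one pair per feature occurrence and the reducer sums the counts.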
The ability to recognise mobile devices accurately and scalably is critically important for mobile network operators and ISPs to understand their customers' behaviours and enhance their user experience. In this paper, we propose a novel method for mobile device model recognition using statistical information derived from large amounts of mobile network traffic data. Specifically, we create a Jaccard-based coefficient measure to identify a proper keyword representing each mobile device model from massive unstructured textual HTTP access logs. To handle the large amount of traffic data generated by large mobile networks, this method is designed as a set of parallel algorithms and is implemented through the MapReduce framework, a distributed parallel programming model with proven low-cost and high-efficiency features. Evaluations using real data sets show that our method can accurately recognise mobile client models while meeting the scalability and producer-independency requirements of large mobile network operators. Results show that a 91.5% accuracy rate is achieved for recognising mobile client models from 2 billion records, which is dramatically higher than existing solutions.
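The underlying set-overlap measure is the Jaccard coefficient, sketched below. The paper builds a more elaborate coefficient on top of it over HTTP-log token sets; the plain form here is a simplification for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two sets, the basic
    overlap measure behind keyword selection: a good keyword for a device
    model is one whose set of matching log lines overlaps strongly with
    that model's traffic (inputs here are plain Python sets, assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Ranking candidate keywords by this score against each model's log-line set, and keeping the top scorer, is the shape of the keyword-identification step.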
In LEO (Low Earth Orbit) satellite communication systems, the satellite network is made up of a large number of satellites, and the dynamically changing network environment affects the results of distributed computing. In order to improve the fault tolerance rate, a novel public blockchain consensus mechanism that applies a distributed computing architecture in a public network is proposed. Redundant calculation in the blockchain ensures the credibility of the results, and the transactions carrying a task's calculation results are stored distributedly, in sequence, in Directed Acyclic Graphs (DAGs). The transactions issued by nodes are connected to form a net, which can quickly provide node reputation evaluation that does not rely on third parties. Simulations show that our proposed blockchain has the following advantages: 1. The task processing speed of the blockchain can be close to that of the fastest node in the entire blockchain; 2. When the tasks' arrival time intervals and the number of demanded working nodes (WNs) meet certain conditions, the network can tolerate more than 50% of malicious devices; 3. Whether the number of nodes in the blockchain is increased or reduced, the network can maintain robustness by adjusting the task's arrival time interval and demanded WNs.
A dynamic multi-beam resource allocation algorithm for large low Earth orbit (LEO) constellations based on on-board distributed computing is proposed in this paper. The allocation is a combinatorial optimization process under a series of complex constraints, which is important for enhancing the match between resources and requirements. A complex algorithm is not practical because the LEO on-board resources are limited. The proposed genetic algorithm (GA), based on a two-dimensional individual model and an uncorrelated single paternal inheritance method, is designed to support distributed computation and enhance the feasibility of on-board application. A distributed system composed of eight embedded devices is built to verify the algorithm. A typical scenario is built in the system to evaluate the resource allocation process, the algorithm's mathematical model, the trigger strategy, and the distributed computation architecture. According to the simulation and measurement results, the proposed algorithm can provide an allocation result for more than 1500 tasks in 14 s with a success rate of more than 91% in a typical scene. The response time is decreased by 40% compared with the conventional GA.
A computational fluid dynamics (CFD) approach is used to study the respiratory airflow dynamics within a human upper airway. The airway model, which consists of the airway from the nasal cavity, pharynx, larynx and trachea to the triple bifurcation, is built based on the CT images of a healthy volunteer and the Weibel model. The flow characteristics of the whole upper airway are quantitatively described at any time level of the respiratory cycle. Simulation results of respiratory flow show good agreement with the clinical measures, experimental and computational results in the literature. The air mainly passes through the floor of the nasal cavity in the common, middle and inferior nasal meatus. The higher airway resistance and wall shear stresses are distributed on the posterior nasal valve. Although the airways of the pharynx, larynx and bronchi experience low shear stresses, it is notable that relatively high shear stresses are distributed on the wall of the epiglottis and bronchial bifurcations. Besides, two-dimensional fluid-structure interaction models of normal and abnormal airways are built to discuss the flow-induced deformation in various anatomical models. The result shows that the wall deformation in the normal airway is relatively small.
To securely support large-scale intelligent applications, distributed machine learning based on blockchain is an intuitive solution. However, distributed machine learning is difficult to train because the corresponding optimization solver algorithms converge slowly and place high demands on computing and memory resources. To overcome these challenges, we propose a distributed computing framework for the L-BFGS optimization algorithm based on the variance reduction method, which is a lightweight, low-additional-cost, and parallelized scheme for the model training process. To validate the claims, we have conducted several experiments on multiple classical datasets. Results show that our proposed computing framework can steadily accelerate the training process of the solver in either local mode or distributed mode.
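For context, the core of the solver being accelerated is the standard L-BFGS two-loop recursion, shown below; this is textbook material, not the authors' code, and the variance-reduction and distribution layers the paper adds are omitted.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: approximate -H^{-1} grad from the
    last m curvature pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k
    (lists ordered oldest to newest). With empty memory it returns -grad."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)          # first loop: newest pair first
        alphas.append(a)
        q -= a * y
    if s_list:                          # initial Hessian scaling gamma*I
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, q)          # second loop: oldest pair first
        q += (a - b) * s
    return -q
```

The memory and compute cost of forming and applying these (s, y) pairs across workers is exactly what motivates the lightweight distributed scheme described above.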
Landslide inventories play an important role in recording landslide events and showing their temporal-spatial distribution. This paper describes the development, visualization, and analysis of a China's Landslide Inventory Database (CsLID) by utilizing Google's public cloud computing platform. Firstly, the CsLID compiles a total of 1221 historical landslide events spanning the years 1949-2011 from relevant data sources. Secondly, the CsLID is further broken down into six zones for characterizing landslide cause-effect, spatiotemporal distribution, fatalities, and socioeconomic impacts based on the geological environment and terrain. The results show that among all six zones, zone V, located in the Qinba and Southwest Mountainous Area, is the most active landslide hotspot with the highest landslide hazard in China. Additionally, the Google public cloud computing platform enables the CsLID to be easily accessible and visually interactive, with the capability of allowing new data input to dynamically augment the database. This work developed a cyber-landslide inventory and used it to analyze the landslide temporal-spatial distribution in China.
In this paper, we develop a distributed solver for a group of strict (non-strict) linear matrix inequalities over a multi-agent network, where each agent only knows one inequality, and all agents cooperate to reach a consensus solution in the intersection of all the feasible regions. The formulation is transformed into a distributed optimization problem by introducing slack variables and consensus constraints. Then, by primal–dual methods, a distributed algorithm is proposed with the help of projection operators and derivative feedback. Finally, the convergence of the algorithm is analyzed, followed by illustrative simulations.
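A toy analogue of the distributed feasibility problem can be sketched with projections alone: each agent knows one feasible set (via its projection operator), mixes its estimate with the network average, and projects back, so the estimates drift into the common feasible region. This alternating-projection sketch is far simpler than the paper's primal–dual algorithm with slack variables and derivative feedback; scalar interval constraints stand in for LMIs.

```python
import numpy as np

def projected_consensus(projections, x0, rounds=200, step=0.5):
    """Toy consensus-plus-projection iteration: each agent averages with the
    network mean, then projects onto its own feasible set. For intersecting
    convex sets the agents agree on a point in the intersection.
    Illustrative stand-in for a distributed feasibility solver."""
    xs = [np.array(x0, dtype=float) for _ in projections]
    for _ in range(rounds):
        avg = sum(xs) / len(xs)                    # consensus averaging
        xs = [proj((1 - step) * x + step * avg)    # local projection
              for proj, x in zip(projections, xs)]
    return xs
```

For LMIs, each `proj` would be the projection onto one agent's spectrahedron instead of a scalar interval; the consensus structure is the same.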
Mobile agents provide a new method for distributed computation. This paper presents the advantages of using mobile agents in a distributed virtual environment (DVE) system, and describes the architecture of the heterogeneous computers' distributed virtual environment system (HCWES), designed to host mobile agents as well as stationary agents. Finally, the paper introduces how communication over a heterogeneous computer network is to be realized.
Funding: supported by the National Natural Science Foundation of China (62462040), the Yunnan Fundamental Research Projects (202501AT070345), and the Major Science and Technology Projects in Yunnan Province (202202AD080013).
文摘Federated learning often experiences slow and unstable convergence due to edge-side data heterogeneity.This problem becomes more severe when edge participation rate is low,as the information collected from different edge devices varies significantly.As a result,communication overhead increases,which further slows down the convergence process.To address this challenge,we propose a simple yet effective federated learning framework that improves consistency among edge devices.The core idea is clusters the lookahead gradients collected from edge devices on the cloud server to obtain personalized momentum for steering local updates.In parallel,a global momentum is applied during model aggregation,enabling faster convergence while preserving personalization.This strategy enables efficient propagation of the estimated global update direction to all participating edge devices and maintains alignment in local training,without introducing extra memory or communication overhead.We conduct extensive experiments on benchmark datasets such as Cifar100 and Tiny-ImageNet.The results confirm the effectiveness of our framework.On CIFAR-100,our method reaches 55%accuracy with 37 fewer rounds and achieves a competitive final accuracy of 65.46%.Even under extreme non-IID scenarios,it delivers significant improvements in both accuracy and communication efficiency.The implementation is publicly available at https://github.com/sjmp525/CollaborativeComputing/tree/FedCCM(accessed on 20 October 2025).
Abstract: With the continuous use of cloud and distributed computing, the threats associated with data and information technology (IT) in such environments have also increased. Thus, data security and data leakage prevention have become important in a distributed environment. In this respect, mobile agent-based systems are one of the latest mechanisms to identify and prevent the intrusion and leakage of data across the network. To tackle one or more of the several challenges in mobile agent-based information leakage prevention, this paper aims to provide a comprehensive, detailed, and systematic study of the distribution model for mobile agent-based information leakage prevention. The paper reviews papers selected from journals published between 2009 and 2019. A critical review of distributed mobile agent-based intrusion detection systems is presented in terms of their design analysis, techniques, and shortcomings. Initially, eighty-five papers were identified, but a paper selection process reduced the number to thirteen important reviews.
Funding: supported by SERB, India, through grant CRG/2021/003888, and by financial support to UoH-IoE from MHRD, India (F11/9/2019-U3(A)).
Abstract: Federated Learning (FL) has become a popular training paradigm in recent years. However, stragglers are critical bottlenecks in an Internet of Things (IoT) network during training. These nodes produce stale updates to the server, which slow down the convergence. In this paper, we studied the impact of stale updates on the global model, which is observed to be significant. To address this, we propose a weighted averaging scheme, FedStrag, that optimizes training with stale updates. The work is focused on training a model in an IoT network that has multiple challenges, such as resource constraints, stragglers, network issues, and device heterogeneity. To this end, we developed a time-bounded asynchronous FL paradigm that can train a model on the continuous flow of data in the edge-fog-cloud continuum. To test the FedStrag approach, a model is trained under multiple straggler scenarios on both Independent and Identically Distributed (IID) and non-IID datasets on Raspberry Pis. The experimental results suggest that FedStrag outperforms the baseline FedAvg in all possible cases.
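Staleness-weighted aggregation of the kind described above can be sketched as below. The inverse-staleness weight 1/(1 + s) is an illustrative assumption, not the paper's exact FedStrag rule:

```python
# Toy staleness-aware aggregation in the spirit of FedStrag: stale updates
# from stragglers still contribute to the global model, but with
# down-weighted influence. The 1/(1 + staleness) weighting is an assumed
# stand-in for the paper's actual formula.

def fedstrag_average(updates):
    """updates: list of (model_vector, staleness) pairs, where staleness
    counts how many global rounds old an update is.
    Returns the staleness-weighted average of the model vectors."""
    weights = [1.0 / (1 + s) for _, s in updates]
    total = sum(weights)
    dim = len(updates[0][0])
    return [
        sum(w * vec[k] for (vec, _), w in zip(updates, weights)) / total
        for k in range(dim)
    ]
```

With a fresh update [1, 1] (staleness 0) and a stale update [4, 4] (staleness 2), the stale contribution is weighted by 1/3, pulling the aggregate toward the fresh update rather than discarding the straggler entirely.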
Funding: supported in part by the Key Research and Development Program of Shaanxi under Grant 2023-ZDLGY-34.
Abstract: Spark performs excellently in large-scale data-parallel computing and iterative processing. However, with the increase in data size and program complexity, the default scheduling strategy has difficulty meeting the demands of resource utilization and performance optimization. Scheduling strategy optimization, as a key direction for improving Spark's execution efficiency, has attracted widespread attention. This paper first introduces the basic theories of Spark, compares several default scheduling strategies, and discusses common scheduling performance evaluation indicators and factors affecting scheduling efficiency. Subsequently, existing scheduling optimization schemes are summarized based on three scheduling modes (load characteristics, cluster characteristics, and the matching of both), and representative algorithms are analyzed in terms of performance indicators and applicable scenarios, comparing the advantages and disadvantages of different scheduling modes. The article also explores in detail the integration of Spark scheduling strategies with specific application scenarios and the challenges in production environments. Finally, the limitations of the existing schemes are analyzed, and prospects are envisioned.
Funding: supported by NSF China (No. T2421002, 62061146002, 62020106005).
Abstract: Distributed computing is an important topic in the field of wireless communications and networking, and its high efficiency in handling large amounts of data is particularly noteworthy. Although distributed computing benefits from its ability to process data in parallel, a communication burden between different servers is incurred, thereby delaying the computation process. Recent research has applied coding to distributed computing to reduce the communication burden, where repetitive computation is utilized to enable multicast opportunities so that the same coded information can be reused across different servers. To handle the computation tasks in practical heterogeneous systems, we propose a novel coding scheme to effectively mitigate the "straggling effect" in distributed computing. We assume that there are two types of servers in the system and the only difference between them is their computational capabilities; the servers with lower computational capabilities are called stragglers. Given any ratio of fast servers to slow servers and any gap in computational capabilities between them, we achieve approximately the same computation time for both fast and slow servers by assigning different amounts of computation tasks to them, thus reducing the overall computation time. Furthermore, we investigate the information-theoretic lower bound of the inter-communication load and show that the lower bound is within a constant multiplicative gap of the upper bound achieved by our scheme. Various simulations also validate the effectiveness of the proposed scheme.
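The load-balancing principle in this abstract, assigning work in proportion to speed so that both server types finish at roughly the same time, can be sketched with a hypothetical helper (this illustrates only the task split, not the coding scheme itself):

```python
def split_tasks(total, n_fast, n_slow, speed_ratio):
    """Assign tasks so fast and slow servers finish in ~the same time.

    speed_ratio = fast_speed / slow_speed (> 1 when stragglers are slower).
    Solving n_fast * r * x + n_slow * x = total for the per-slow share x
    gives equal completion times total_per_server / speed for both types.
    Returns (tasks_per_fast_server, tasks_per_slow_server)."""
    per_slow = total / (n_fast * speed_ratio + n_slow)
    return per_slow * speed_ratio, per_slow
```

For example, with 100 tasks, 2 fast servers, 2 stragglers, and a 4x speed gap, each fast server receives 40 tasks and each straggler 10, so both types finish in the same 10 time units.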
Funding: supported by the National Natural Science Foundation of China (Grant No. 62072259), in part by the Natural Science Foundation of Jiangsu Province (Grant No. BK20221411), the PhD Start-up Fund of Nantong University (Grant No. 23B03), and the Postgraduate Research & Practice Innovation Program of the School of Information Science and Technology, Nantong University (Grant No. NTUSISTPR2405).
Abstract: In the current noisy intermediate-scale quantum (NISQ) era, a single quantum processing unit (QPU) is insufficient to implement large-scale quantum algorithms; this has driven extensive research into distributed quantum computing (DQC). DQC involves the cooperative operation of multiple QPUs but is concurrently challenged by excessive communication complexity. To address this issue, this paper proposes a quantum circuit partitioning method based on spectral clustering. The approach transforms quantum circuits into weighted graphs and, through computation of the Laplacian matrix and clustering techniques, identifies candidate partition schemes that minimize the total weight of the cut. Additionally, a global gate search tree strategy is introduced to meticulously explore opportunities for the merged transfer of global gates, thereby minimizing the transmission cost of distributed quantum circuits and selecting the optimal partition scheme from the candidates. Finally, the proposed method is evaluated through various comparative experiments. The experimental results demonstrate that spectral clustering-based partitioning exhibits robust stability and runtime efficiency on quantum circuits of different scales. In experiments involving the quantum Fourier transform algorithm and RevLib quantum circuits, the transmission cost achieved by the global gate search tree strategy is significantly optimized.
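The graph side of this method can be sketched as follows: the circuit becomes a weighted graph whose edge weights count the two-qubit gates between each pair of qubits, and a candidate partition is scored by its cut weight. The Laplacian construction below is the standard L = D - W; the partition-scoring helper is an illustrative simplification of the paper's procedure, not its actual code.

```python
def laplacian(n, edges):
    """Build the graph Laplacian L = D - W of an n-qubit circuit graph.
    edges: {(i, j): weight}, where weight counts two-qubit gates between
    qubits i and j. Returns L as a nested list."""
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        L[i][j] -= w          # off-diagonal: -W
        L[j][i] -= w
        L[i][i] += w          # diagonal: degree
        L[j][j] += w
    return L

def cut_weight(edges, part):
    """Total weight of edges crossing the cut defined by the qubit set
    'part'; for a circuit graph this is a proxy for the transmission cost
    between the two QPUs."""
    return sum(w for (i, j), w in edges.items() if (i in part) != (j in part))
```

On a 4-qubit chain with heavy interaction at both ends (weights 3, 1, 3), cutting the middle edge costs 1, which is the partition spectral clustering would favor.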
Funding: supported by the National Key Research and Development Program of China (2023YFB3001501), the National Natural Science Foundation of China (NSFC) (62302133), the Key Research and Development Program of Zhejiang Province (2024C01026), the Yangtze River Delta Project (2023ZY1068), the Hangzhou Key Research Plan Project (2024SZD1A02), and GHfund A (202302019816).
Abstract: [Objective] This study aims to address the inefficiency of AI-for-Science tasks caused by the design and implementation challenges of applying distributed parallel computing strategies to deep learning models, as well as their inefficient execution. [Methods] We propose an automatic distributed parallelization method for AI-for-Science tasks, called FlowAware. Based on the AI-for-Science framework JAX, this approach thoroughly analyzes task characteristics, operator structures, and data flow properties of deep learning models. By incorporating cluster topology information, it constructs a search space for distributed parallel computing strategies. Guided by load balancing and communication optimization objectives, FlowAware automatically identifies optimal distributed parallel computing strategies for AI models. [Results] Comparative experiments conducted on both GPU-like accelerator clusters and GPU clusters demonstrated that FlowAware achieves a throughput improvement of up to 7.8x compared to Alpa. [Conclusions] FlowAware effectively enhances the search efficiency of distributed parallel computing strategies for AI models in scientific computing tasks and significantly improves their computational performance.
Abstract: An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems, and advanced C3I systems. Two architectures are provided and verified: one is based on the pure TCP/IP protocol and the C/S model and is implemented with Winsock; the other is based on CORBA (Common Object Request Broker Architecture). The performance of the data fusion simulation system, i.e., its reliability, flexibility, and scalability, is improved and enhanced by the two models. The study of them is a valuable exploration of incorporating distributed computation concepts into radar system simulation techniques.
Funding: This work was supported in part by the National Natural Science Foundation of China (61933015).
Abstract: The shortage of computation methods and storage devices has largely limited the development of multi-objective optimization in industrial processes. To improve the operational levels of the process industries, we propose a multi-objective optimization framework based on cloud services and a cloud distribution system. Real-time data from manufacturing procedures are first temporarily stored in a local database and then transferred to the relational database in the cloud. Next, a distribution system with elastic compute power is set up for the optimization framework. Finally, a multi-objective optimization model based on deep learning and an evolutionary algorithm is proposed to optimize several conflicting goals of the blast furnace ironmaking process. With the application of this optimization service in a cloud factory, iron production was found to increase by 83.91 t∙d^(-1), the coke ratio decreased by 13.50 kg∙t^(-1), and the silicon content decreased by an average of 0.047%.
Funding: This work was supported in part by the National Natural Science Foundation of China (61772493), the CAAI-Huawei MindSpore Open Fund (CAAIXSJLJJ-2020-004B), the Natural Science Foundation of Chongqing, China (cstc2019jcyjjqX0013), the Chongqing Research Program of Technology Innovation and Application (cstc2019jscx-fxydX0024, cstc2019jscx-fxydX0027, cstc2018jszx-cyzdX0041), the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019), the Pioneer Hundred Talents Program of the Chinese Academy of Sciences, and the Deanship of Scientific Research (DSR) at King Abdulaziz University (G-21-135-38).
Abstract: Protein-protein interactions are of great significance for humans to understand the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework by reimplementing one of the state-of-the-art algorithms, i.e., CoFex, using MapReduce. To do so, an in-depth analysis of its limitations is conducted from the perspectives of efficiency and memory consumption when applying it to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge sequence information of proteins. After that, its procedure is modified following the MapReduce framework to perform the prediction task distributively. A series of extensive experiments has been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework can considerably improve computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
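The MapReduce restructuring can be illustrated with a toy single-process skeleton. The k-mer mapper below is a hypothetical stand-in for CoFex's actual feature extraction, chosen only because counting sequence k-mers is a natural map/reduce workload over protein data:

```python
# Minimal in-process MapReduce skeleton: apply a mapper to each record,
# group the emitted (key, value) pairs by key, then reduce each group
# independently. In the real framework each phase runs on many machines;
# here everything runs locally to show the programming contract.
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):
            groups[key].append(val)
    return {key: reducer(key, vals) for key, vals in groups.items()}

def kmer_mapper(seq, k=2):
    """Emit (k-mer, 1) for every k-length substring of a protein sequence
    (a hypothetical feature extractor, not CoFex's)."""
    return [(seq[i:i + k], 1) for i in range(len(seq) - k + 1)]

def count_reducer(key, vals):
    return sum(vals)
```

Because each mapper call touches only one record and each reducer call only one key group, both phases parallelize trivially across workers, which is the property the paper exploits.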
Funding: supported in part by the National Natural Science Foundation of China under Grant No. 61072061, the National Science and Technology Major Projects under Grant No. 2012ZX03002008, and the Fundamental Research Funds for the Central Universities under Grant No. 2012RC0121.
Abstract: The ability to recognise mobile devices accurately and scalably is critically important for mobile network operators and ISPs to understand their customers' behaviours and enhance their user experience. In this paper, we propose a novel method for mobile device model recognition that uses statistical information derived from large amounts of mobile network traffic data. Specifically, we create a Jaccard-based coefficient measure method to identify a proper keyword representing each mobile device model from massive unstructured textual HTTP access logs. To handle the large amount of traffic data generated by large mobile networks, this method is designed as a set of parallel algorithms and is implemented on the MapReduce framework, a distributed parallel programming model with proven low-cost and high-efficiency features. Evaluations using real data sets show that our method can accurately recognise mobile client models while meeting the scalability and producer-independency requirements of large mobile network operators. Results show that a 91.5% accuracy rate is achieved for recognising mobile client models from 2 billion records, which is dramatically higher than existing solutions.
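One plausible reading of the Jaccard-based keyword selection is sketched below: a candidate token scores highly when the set of logs containing it closely matches the set of logs produced by the target device model. The scoring and the tiny log format are illustrative assumptions, not the paper's actual coefficient definition:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def keyword_score(keyword, logs, labels, model):
    """Similarity between 'logs containing the keyword' and 'logs generated
    by the target device model' (an assumed reading of the paper's measure)."""
    has_kw = {i for i, log in enumerate(logs) if keyword in log}
    is_model = {i for i, m in enumerate(labels) if m == model}
    return jaccard(has_kw, is_model)

def best_keyword(candidates, logs, labels, model):
    """Pick the candidate token most specific to the target model."""
    return max(candidates, key=lambda kw: keyword_score(kw, logs, labels, model))
```

A token like "ua" that appears in every log scores poorly against any single model, while a token unique to one model's logs scores 1.0; this is why a set-overlap measure can single out model-identifying keywords from unstructured access logs.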
Funding: funded in part by the National Natural Science Foundation of China (Grant Nos. 61772352, 62172061, 61871422), the National Key Research and Development Project (Grant Nos. 2020YFB1711800 and 2020YFB1707900), the Science and Technology Project of Sichuan Province (Grant Nos. 2021YFG0152, 2021YFG0025, 2020YFG0479, 2020YFG0322, 2020GFW035, 2020GFW033, 2020YFH0071), the R&D Project of Chengdu City (Grant No. 2019-YF05-01790-GX), and the Central Universities of Southwest Minzu University (Grant No. ZYN2022032).
Abstract: In LEO (Low Earth Orbit) satellite communication systems, the satellite network is made up of a large number of satellites, and the dynamically changing network environment affects the results of distributed computing. To improve the fault tolerance rate, a novel public blockchain consensus mechanism that applies a distributed computing architecture in a public network is proposed. Redundant calculation on the blockchain ensures the credibility of the results, and the transactions carrying the calculation results of a task are stored distributively, in sequence, in a Directed Acyclic Graph (DAG). The transactions issued by nodes are connected to form a net, which can quickly provide node reputation evaluation that does not rely on third parties. Simulations show that our proposed blockchain has the following advantages: (1) the task processing speed of the blockchain can be close to that of the fastest node in the entire blockchain; (2) when the tasks' arrival time intervals and demanded working nodes (WNs) meet certain conditions, the network can tolerate more than 50% of devices being malicious; (3) whether the number of nodes in the blockchain is increased or reduced, the network can maintain robustness by adjusting the tasks' arrival time interval and demanded WNs.
Funding: This work was supported by the National Key Research and Development Program of China (2021YFB2900603) and the National Natural Science Foundation of China (61831008).
Abstract: A dynamic multi-beam resource allocation algorithm for a large low Earth orbit (LEO) constellation based on on-board distributed computing is proposed in this paper. The allocation is a combinatorial optimization process under a series of complex constraints, which is important for enhancing the matching between resources and requirements. A complex algorithm is not feasible because the LEO on-board resources are limited. The proposed genetic algorithm (GA), based on a two-dimensional individual model and an uncorrelated single paternal inheritance method, is designed to support distributed computation to enhance the feasibility of on-board application. A distributed system composed of eight embedded devices is built to verify the algorithm. A typical scenario is built in the system to evaluate the resource allocation process, the algorithm's mathematical model, the trigger strategy, and the distributed computation architecture. According to the simulation and measurement results, the proposed algorithm can provide an allocation result for more than 1500 tasks in 14 s, and the success rate is more than 91% in a typical scene. The response time is decreased by 40% compared with the conventional GA.
Funding: supported by the National Natural Science Foundation of China (10472025, 10672036, 10872043) and the Natural Science Foundation of Liaoning Province, China (20032109).
Abstract: A computational fluid dynamics (CFD) approach is used to study the respiratory airflow dynamics within a human upper airway. The airway model, which consists of the airway from the nasal cavity, pharynx, larynx, and trachea to the triple bifurcation, is built based on the CT images of a healthy volunteer and the Weibel model. The flow characteristics of the whole upper airway are quantitatively described at any time level of the respiratory cycle. Simulation results of the respiratory flow show good agreement with the clinical measures and the experimental and computational results in the literature. The air mainly passes through the floor of the nasal cavity in the common, middle, and inferior nasal meatus. The higher airway resistance and wall shear stresses are distributed on the posterior nasal valve. Although the airways of the pharynx, larynx, and bronchi experience low shear stresses, it is notable that relatively high shear stresses are distributed on the wall of the epiglottis and the bronchial bifurcations. In addition, two-dimensional fluid-structure interaction models of normal and abnormal airways are built to discuss flow-induced deformation in various anatomical models. The results show that the wall deformation in the normal airway is relatively small.
Funding: partly supported by the National Key Basic Research Program of China (2016YFB1000100) and the National Natural Science Foundation of China (No. 61402490).
Abstract: To securely support large-scale intelligent applications, distributed machine learning based on blockchain is an intuitive solution. However, distributed machine learning is difficult to train because the corresponding optimization solver algorithms converge slowly and place high demands on computing and memory resources. To overcome these challenges, we propose a distributed computing framework for the L-BFGS optimization algorithm based on the variance reduction method, which is a lightweight, low-additional-cost, and parallelized scheme for the model training process. To validate the claims, we have conducted several experiments on multiple classical datasets. Results show that our proposed computing framework can steadily accelerate the training process of the solver in either local mode or distributed mode.
Funding: funded by the National Natural Science Foundation (Grant Nos. 41501458 and 41201380), the National Basic Research Program of China (Grant No. 2013CB733204), the Key Laboratory of Mining Spatial Information Technology of NASMG (KLM201309), the Science Program of Shanghai Normal University (SK201525), the Shanghai Gaofeng & Gaoyuan Project for University Academic Program Development (projects 2013LASW-A09 and SKHL1310), and the Center of Spatial Information Science and Sustainable Development Applications, Tongji University, Shanghai, China.
Abstract: Landslide inventories play an important role in recording landslide events and showing their temporal-spatial distribution. This paper describes the development, visualization, and analysis of China's Landslide Inventory Database (CsLID) by utilizing Google's public cloud computing platform. Firstly, the CsLID compiles a total of 1221 historical landslide events spanning the years 1949-2011 from relevant data sources. Secondly, the CsLID is further broken down into six zones for characterizing landslide cause-effect, spatiotemporal distribution, fatalities, and socioeconomic impacts based on the geological environment and terrain. The results show that among all six zones, zone V, located in the Qinba and Southwest Mountainous Area, is the most active landslide hotspot with the highest landslide hazard in China. Additionally, the Google public cloud computing platform enables the CsLID to be easily accessible and visually interactive, with the capability of allowing new data input to dynamically augment the database. This work developed a cyber-landslide inventory and used it to analyze the landslide temporal-spatial distribution in China.
Funding: This work was supported by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100) and the National Natural Science Foundation of China (Nos. 61733018, 62073035).
Abstract: In this paper, we develop a distributed solver for a group of strict (non-strict) linear matrix inequalities over a multi-agent network, where each agent only knows one inequality, and all agents cooperate to reach a consensus solution in the intersection of all the feasible regions. The formulation is transformed into a distributed optimization problem by introducing slack variables and consensus constraints. Then, using primal–dual methods, a distributed algorithm is proposed with the help of projection operators and derivative feedback. Finally, the convergence of the algorithm is analyzed, followed by illustrative simulations.
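The consensus reformulation described in this abstract can be written out in standard notation as a hedged sketch; the paper's exact slack-variable form may differ, and the indicator-function formulation below is one common way to express the same feasibility problem:

```latex
% Each agent i knows one LMI and hence one feasible region
%   \Omega_i = \{\, x : F_i(x) = F_{i0} + \textstyle\sum_{j} x_j F_{ij} \preceq 0 \,\}.
% Finding a point in the intersection of all \Omega_i is recast as
% distributed consensus optimization over the network graph:
\begin{aligned}
\min_{x_1,\dots,x_N} \quad & \sum_{i=1}^{N} I_{\Omega_i}(x_i) \\
\text{s.t.} \quad & x_i = x_j, \qquad \forall (i,j) \in \mathcal{E},
\end{aligned}
```

where $I_{\Omega_i}$ is the indicator function of agent $i$'s feasible region and $\mathcal{E}$ is the edge set of the communication graph. Primal–dual dynamics then keep each local copy $x_i$ near $\Omega_i$ via projection while the consensus constraints $x_i = x_j$ are enforced through dual variables, so all copies converge to one point in $\bigcap_i \Omega_i$.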
Abstract: Mobile agents provide a new method for distributed computation. This paper presents the advantages of using mobile agents in a distributed virtual environment (DVE) system and describes the architecture of the heterogeneous computers' distributed virtual environment system (HCWES), which is designed to host mobile agents as well as stationary agents. Finally, the paper introduces how heterogeneous computer network communication is to be realized.