In non-independent and identically distributed(non-IID)data environments,model performance often degrades significantly.To address this issue,two improvement methods are proposed:FedReg and FedReg^(*).FedReg is a meth...In non-independent and identically distributed(non-IID)data environments,model performance often degrades significantly.To address this issue,two improvement methods are proposed:FedReg and FedReg^(*).FedReg is a method based on hybrid regularization aimed at enhancing federated learning in non-IID scenarios.It introduces hybrid regularization to replace traditional L2 regularization,combining the advantages of L1 and L2 regularization to enable feature selection while preventing overfitting.This method better adapts to the diverse data distributions of different clients,improving the overall model performance.FedReg^(*)combines hybrid regularization with weighted model aggregation.In addition to the benefits of hybrid regularization,FedReg^(*)applies a weighted averaging method in the model aggregation process,calculating weights based on the cosine similarity between each client gradient and the global gradient to more reasonably distribute client contributions.By considering variations in data quality and quantity among clients,FedReg^(*)highlights the importance of key clients and enhances the model’s generalization performance.These improvement methods enhance model accuracy and communication efficiency.展开更多
Small angle x-ray scattering(SAXS)is an advanced technique for characterizing the particle size distribution(PSD)of nanoparticles.However,the ill-posed nature of inverse problems in SAXS data analysis often reduces th...Small angle x-ray scattering(SAXS)is an advanced technique for characterizing the particle size distribution(PSD)of nanoparticles.However,the ill-posed nature of inverse problems in SAXS data analysis often reduces the accuracy of conventional methods.This article proposes a user-friendly software for PSD analysis,GranuSAS,which employs an algorithm that integrates truncated singular value decomposition(TSVD)with the Chahine method.This approach employs TSVD for data preprocessing,generating a set of initial solutions with noise suppression.A high-quality initial solution is subsequently selected via the L-curve method.This selected candidate solution is then iteratively refined by the Chahine algorithm,enforcing constraints such as non-negativity and improving physical interpretability.Most importantly,GranuSAS employs a parallel architecture that simultaneously yields inversion results from multiple shape models and,by evaluating the accuracy of each model's reconstructed scattering curve,offers a suggestion for model selection in material systems.To systematically validate the accuracy and efficiency of the software,verification was performed using both simulated and experimental datasets.The results demonstrate that the proposed software delivers both satisfactory accuracy and reliable computational efficiency.It provides an easy-to-use and reliable tool for researchers in materials science,helping them fully exploit the potential of SAXS in nanoparticle characterization.展开更多
This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture.However,the computational demand for image processing is very high.Therefore,we have conducted this resear...This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture.However,the computational demand for image processing is very high.Therefore,we have conducted this research to build a system that enables image segmentation training with low-power machines.To accomplish this,all data are divided into several segments,each being trained separately.In the case of prediction,the initial output is predicted from each trained model for an input,where the ultimate output is selected based on the pixel-wise majority voting of the expected outputs,which also ensures data privacy.In addition,this kind of distributed training system allows different computers to be used simultaneously.That is how the training process takes comparatively less time than typical training approaches.Even after completing the training,the proposed prediction system allows a newly trained model to be included in the system.Thus,the prediction is consistently more accurate.We evaluated the effectiveness of the ultimate output based on four performance matrices:average pixel accuracy,mean absolute error,average specificity,and average balanced accuracy.The experimental results show that the scores of average pixel accuracy,mean absolute error,average specificity,and average balanced accuracy are 0.9216,0.0687,0.9477,and 0.8674,respectively.In addition,the proposed method was compared with four other state-of-the-art models in terms of total training time and usage of computational resources.And it outperformed all of them in these aspects.展开更多
An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems and advan...An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems and advanced C3I systems. Two architectures are provided and verified: one is based on pure TCP/IP protocol and C/S model, and implemented with Winsock, the other is based on CORBA (common object request broker architecture). The performance of data fusion simulation system, i.e. reliability, flexibility and scalability, is improved and enhanced by two models. The study of them makes valuable explore on incorporating the distributed computation concepts into radar system simulation techniques.展开更多
This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key de...This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key design parameters including casing dimensions and detonation positions.The paper details the finite element analysis for fragmentation,the characterizations of the dynamic hardening and fracture models,the generation of comprehensive datasets,and the training of the ANN model.The results show the influence of casing dimensions on fragment velocity distributions,with the tendencies indicating increased resultant velocity with reduced thickness,increased length and diameter.The model's predictive capability is demonstrated through the accurate predictions for both training and testing datasets,showing its potential for the real-time prediction of fragmentation performance.展开更多
This paper presents a data fusion method in distributed multi-sensor system including GPS and INS sensors’ data processing. First, a residual χ 2 \|test strategy with the corresponding algorithm is designed. Then a ...This paper presents a data fusion method in distributed multi-sensor system including GPS and INS sensors’ data processing. First, a residual χ 2 \|test strategy with the corresponding algorithm is designed. Then a coefficient matrices calculation method of the information sharing principle is derived. Finally, the federated Kalman filter is used to combine these independent, parallel, real\|time data. A pseudolite (PL) simulation example is given.展开更多
It is difficult to parallelize a subsistent sequential algorithm. Through analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel alg...It is difficult to parallelize a subsistent sequential algorithm. Through analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parailelization by extracting the analysis boxes and model grid point Iatitude rows with leaped steps, and by distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on a MPP computer arc also presented.展开更多
There are two key issues in distributed intrusion detection system,that is,maintaining load balance of system and protecting data integrity.To address these issues,this paper proposes a new distributed intrusion detec...There are two key issues in distributed intrusion detection system,that is,maintaining load balance of system and protecting data integrity.To address these issues,this paper proposes a new distributed intrusion detection model for big data based on nondestructive partitioning and balanced allocation.A data allocation strategy based on capacity and workload is introduced to achieve local load balance,and a dynamic load adjustment strategy is adopted to maintain global load balance of cluster.Moreover,data integrity is protected by using session reassemble and session partitioning.The simulation results show that the new model enjoys favorable advantages such as good load balance,higher detection rate and detection efficiency.展开更多
Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed s...Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a light-weight anonymous data perturbation method for efficient privacy preserving in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experiment results show that our approach achieves a high security level and is very efficient in a large scale distributed environment.展开更多
With the increasing popularity of blockchain applications, the security of data sources on the blockchain is gradually receiving attention. Providing reliable data for the blockchain safely and efficiently has become ...With the increasing popularity of blockchain applications, the security of data sources on the blockchain is gradually receiving attention. Providing reliable data for the blockchain safely and efficiently has become a research hotspot, and the security of the oracle responsible for providing reliable data has attracted much attention. The most widely used centralized oracles in blockchain, such as Provable and Town Crier, all rely on a single oracle to obtain data, which suffers from a single point of failure and limits the large-scale development of blockchain. To this end, the distributed oracle scheme is put forward, but the existing distributed oracle schemes such as Chainlink and Augur generally have low execution efficiency and high communication overhead, which leads to their poor applicability. To solve the above problems, this paper proposes a trusted distributed oracle scheme based on a share recovery threshold signature. First, a data verification method of distributed oracles is designed based on threshold signature. By aggregating the signatures of oracles, data from different data sources can be mutually verified, leading to a more efficient data verification and aggregation process. Then, a credibility-based cluster head election algorithm is designed, which reduces the communication overhead by clarifying the function distribution and building a hierarchical structure. Considering the good performance of the BLS threshold signature in large-scale applications, this paper combines it with distributed oracle technology and proposes a BLS threshold signature algorithm that supports share recovery in distributed oracles. The share recovery mechanism enables the proposed scheme to solve the key loss issue, and the setting of the threshold value enables the proposed scheme to complete signature aggregation with only a threshold number of oracles, making the scheme more robust. Finally, experimental results indicate that, by using the threshold signature technology and the cluster head election algorithm, our scheme effectively improves the execution efficiency of oracles and solves the problem of a single point of failure, leading to higher scalability and robustness.展开更多
On 21 May 2021(UTC),an MW 7.4 earthquake jolted the east Bayan Har block in the Tibetan Plateau.The earthquake received widespread attention as it is the largest event in the Tibetan Plateau and its surroundings since...On 21 May 2021(UTC),an MW 7.4 earthquake jolted the east Bayan Har block in the Tibetan Plateau.The earthquake received widespread attention as it is the largest event in the Tibetan Plateau and its surroundings since the 2008 Wenchuan earthquake,and especially in proximity to the seismic gaps on the east Kunlun fault.Here we use satellite interferometric synthetic aperture radar data and subpixel offset observations along the range directions to characterize the coseismic deformation of the earthquake.Range offset displacements depict clear surface ruptures with a total length of~170 km involving two possible activated fault segments in the earthquake.Coseismic modeling results indicate that the earthquake was dominated by left-lateral strike-slip motions of up to 7 m within the top 12 km of the crust.The well-resolved slip variations are characterized by five major slip patches along strike and 64%of shallow slip deficit,suggesting a young seismogenic structure.Spatial-temporal changes of the postseismic deformation are mapped from early 6-day and 24-day InSAR observations,and are well explained by time-dependent afterslip models.Analysis of Global Navigation Satellite System(GNSS)velocity profiles and strain rates suggests that the eastward extrusion of plateau is diffusely distributed across the east Bayan Har block,but exhibits significant lateral heterogeneities,as evidenced by magnetotelluric observations.The block-wide distributed deformation of the east Bayan Har block along with the significant co-and post-seismic stress loadings from the Madoi earthquake imply high seismic risks along regional faults,especially the Tuosuo Lake and Maqên-Maqu segments of the Kunlun fault that are known as seismic gaps.展开更多
This paper describes the architecture of global distributed storage system for data grid. It focue on the management and the capability for the maximum users and maximum resources on the Internet, as well as performan...This paper describes the architecture of global distributed storage system for data grid. It focue on the management and the capability for the maximum users and maximum resources on the Internet, as well as performance and other issues.展开更多
Due to the limited scenes that synthetic aperture radar(SAR)satellites can detect,the full-track utilization rate is not high.Because of the computing and storage limitation of one satellite,it is difficult to process...Due to the limited scenes that synthetic aperture radar(SAR)satellites can detect,the full-track utilization rate is not high.Because of the computing and storage limitation of one satellite,it is difficult to process large amounts of data of spaceborne synthetic aperture radars.It is proposed to use a new method of networked satellite data processing for improving the efficiency of data processing.A multi-satellite distributed SAR real-time processing method based on Chirp Scaling(CS)imaging algorithm is studied in this paper,and a distributed data processing system is built with field programmable gate array(FPGA)chips as the kernel.Different from the traditional CS algorithm processing,the system divides data processing into three stages.The computing tasks are reasonably allocated to different data processing units(i.e.,satellites)in each stage.The method effectively saves computing and storage resources of satellites,improves the utilization rate of a single satellite,and shortens the data processing time.Gaofen-3(GF-3)satellite SAR raw data is processed by the system,with the performance of the method verified.展开更多
Simultaneous localization and mapping(SLAM)is widely used in many robot applications to acquire the unknown environment's map and the robots location.Graph-based SLAM is demonstrated to be effective in large-scale...Simultaneous localization and mapping(SLAM)is widely used in many robot applications to acquire the unknown environment's map and the robots location.Graph-based SLAM is demonstrated to be effective in large-scale scenarios,and it intuitively performs the SLAM as a pose graph.But because of the high data overlap rate,traditional graph-based SLAM is not efficient in some respects,such as real time performance and memory usage.To reduce1 data overlap rate,a graph-based SLAM with distributed submap strategy(DSS)is presented.In its front-end,submap based scan matching is processed and loop closing detection is conducted.Moreover in its back-end,pose graph is updated for global optimization and submap merging.From a series of experiments,it is demonstrated that graph-based SLAM with DSS reduces 51.79%data overlap rate,decreases 39.70%runtime and 24.60%memory usage.The advantages over other low overlap rate method is also proved in runtime,memory usage,accuracy and robustness performance.展开更多
The purpose of data fusion is to produce an improved model or estimate of a system from a set of independent data sources. Various multisensor data fusion approaches exist, in which Kalman filtering is important. In t...The purpose of data fusion is to produce an improved model or estimate of a system from a set of independent data sources. Various multisensor data fusion approaches exist, in which Kalman filtering is important. In this paper, a fusion algorithm based on multisensor systems is discussed and a distributed multisensor data fusion algorithm based on Kalman filtering presented. The algorithm has been implemented on cluster-based high performance computers. Experimental results show that the method produces precise estimation in considerably reduced execution time.展开更多
To make business policy, market analysis, corporate decision, fraud detection, etc., we have to analyze and work with huge amount of data. Generally, such data are taken from different sources. Researchers are using d...To make business policy, market analysis, corporate decision, fraud detection, etc., we have to analyze and work with huge amount of data. Generally, such data are taken from different sources. Researchers are using data mining to perform such tasks. Data mining techniques are used to find hidden information from large data source. Data mining is using for various fields: Artificial intelligence, Bank, health and medical, corruption, legal issues, corporate business, marketing, etc. Special interest is given to associate rules, data mining algorithms, decision tree and distributed approach. Data is becoming larger and spreading geographically. So it is difficult to find better result from only a central data source. For knowledge discovery, we have to work with distributed database. On the other hand, security and privacy considerations are also another factor for de-motivation of working with centralized data. For this reason, distributed database is essential for future processing. In this paper, we have proposed a framework to study data mining in distributed environment. The paper presents a framework to bring out actionable knowledge. We have shown some level by which we can generate actionable knowledge. Possible tools and technique for these levels are discussed.展开更多
Remote data auditing becomes critical to ensure the storage reliability in distributed cloud storage.Recently,Le et al proposed an efficient private data auditing scheme NC-Audit designed for regenerating codes,which ...Remote data auditing becomes critical to ensure the storage reliability in distributed cloud storage.Recently,Le et al proposed an efficient private data auditing scheme NC-Audit designed for regenerating codes,which claimed that NC-Audit can effectively realize privacy-preserving data auditing for distributed storage systems.However,our analysis shows that NC-Audit is not secure for that the adversarial cloud can forge some illegal blocks to cheat the auditor successfully with a high probability even without storing the user’s whole data,when the coding field is large enough.展开更多
Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In exist...Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes.展开更多
Adaptive packet scheduling can efficiently enhance the performance of multipath Data Transmission.However,realizing precise packet scheduling is challenging due to the nature of high dynamics and unpredictability of n...Adaptive packet scheduling can efficiently enhance the performance of multipath Data Transmission.However,realizing precise packet scheduling is challenging due to the nature of high dynamics and unpredictability of network link states.To this end,this paper proposes a distributed asynchronous deep reinforcement learning framework to intensify the dynamics and prediction of adaptive packet scheduling.Our framework contains two parts:local asynchronous packet scheduling and distributed cooperative control center.In local asynchronous packet scheduling,an asynchronous prioritized replay double deep Q-learning packets scheduling algorithm is proposed for dynamic adaptive packet scheduling learning,which makes a combination of prioritized replay double deep Q-learning network(P-DDQN)to make the fitting analysis.In distributed cooperative control center,a distributed scheduling learning and neural fitting acceleration algorithm to adaptively update neural network parameters of P-DDQN for more precise packet scheduling.Experimental results show that our solution has a better performance than Random weight algorithm and Round-Robin algorithm in throughput and loss ratio.Further,our solution has 1.32 times and 1.54 times better than Random weight algorithm and Round-Robin algorithm on the stability of multipath data transmission,respectively.展开更多
Distributed Data Mining is expected to discover preciously unknown, implicit and valuable information from massive data set inherently distributed over a network. In recent years several approaches to distributed data...Distributed Data Mining is expected to discover preciously unknown, implicit and valuable information from massive data set inherently distributed over a network. In recent years several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper provides the reason for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such environment. Based on the advantages of both the CS model and agent-based model, the system is being able to address the specific concern of increasing scalability and enhancing performance.展开更多
文摘In non-independent and identically distributed(non-IID)data environments,model performance often degrades significantly.To address this issue,two improvement methods are proposed:FedReg and FedReg^(*).FedReg is a method based on hybrid regularization aimed at enhancing federated learning in non-IID scenarios.It introduces hybrid regularization to replace traditional L2 regularization,combining the advantages of L1 and L2 regularization to enable feature selection while preventing overfitting.This method better adapts to the diverse data distributions of different clients,improving the overall model performance.FedReg^(*)combines hybrid regularization with weighted model aggregation.In addition to the benefits of hybrid regularization,FedReg^(*)applies a weighted averaging method in the model aggregation process,calculating weights based on the cosine similarity between each client gradient and the global gradient to more reasonably distribute client contributions.By considering variations in data quality and quantity among clients,FedReg^(*)highlights the importance of key clients and enhances the model’s generalization performance.These improvement methods enhance model accuracy and communication efficiency.
基金Project supported by the Project of the Anhui Provincial Natural Science Foundation(Grant No.2308085MA19)Strategic Priority Research Program of the Chinese Academy of Sciences(Grant No.XDA0410401)+2 种基金the National Natural Science Foundation of China(Grant No.52202120)the National Key Research and Development Program of China(Grant No.2023YFA1609800)USTC Research Funds of the Double First-Class Initiative(Grant No.YD2310002013)。
文摘Small angle x-ray scattering(SAXS)is an advanced technique for characterizing the particle size distribution(PSD)of nanoparticles.However,the ill-posed nature of inverse problems in SAXS data analysis often reduces the accuracy of conventional methods.This article proposes a user-friendly software for PSD analysis,GranuSAS,which employs an algorithm that integrates truncated singular value decomposition(TSVD)with the Chahine method.This approach employs TSVD for data preprocessing,generating a set of initial solutions with noise suppression.A high-quality initial solution is subsequently selected via the L-curve method.This selected candidate solution is then iteratively refined by the Chahine algorithm,enforcing constraints such as non-negativity and improving physical interpretability.Most importantly,GranuSAS employs a parallel architecture that simultaneously yields inversion results from multiple shape models and,by evaluating the accuracy of each model's reconstructed scattering curve,offers a suggestion for model selection in material systems.To systematically validate the accuracy and efficiency of the software,verification was performed using both simulated and experimental datasets.The results demonstrate that the proposed software delivers both satisfactory accuracy and reliable computational efficiency.It provides an easy-to-use and reliable tool for researchers in materials science,helping them fully exploit the potential of SAXS in nanoparticle characterization.
基金the Researchers Supporting Project,King Saud University,Saudi Arabia,for funding this research work through Project No.RSPD2025R951.
文摘This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture.However,the computational demand for image processing is very high.Therefore,we have conducted this research to build a system that enables image segmentation training with low-power machines.To accomplish this,all data are divided into several segments,each being trained separately.In the case of prediction,the initial output is predicted from each trained model for an input,where the ultimate output is selected based on the pixel-wise majority voting of the expected outputs,which also ensures data privacy.In addition,this kind of distributed training system allows different computers to be used simultaneously.That is how the training process takes comparatively less time than typical training approaches.Even after completing the training,the proposed prediction system allows a newly trained model to be included in the system.Thus,the prediction is consistently more accurate.We evaluated the effectiveness of the ultimate output based on four performance matrices:average pixel accuracy,mean absolute error,average specificity,and average balanced accuracy.The experimental results show that the scores of average pixel accuracy,mean absolute error,average specificity,and average balanced accuracy are 0.9216,0.0687,0.9477,and 0.8674,respectively.In addition,the proposed method was compared with four other state-of-the-art models in terms of total training time and usage of computational resources.And it outperformed all of them in these aspects.
文摘An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems and advanced C3I systems. Two architectures are provided and verified: one is based on pure TCP/IP protocol and C/S model, and implemented with Winsock, the other is based on CORBA (common object request broker architecture). The performance of data fusion simulation system, i.e. reliability, flexibility and scalability, is improved and enhanced by two models. The study of them makes valuable explore on incorporating the distributed computation concepts into radar system simulation techniques.
基金supported by Poongsan-KAIST Future Research Center Projectthe fund support provided by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(Grant No.2023R1A2C2005661)。
文摘This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key design parameters including casing dimensions and detonation positions.The paper details the finite element analysis for fragmentation,the characterizations of the dynamic hardening and fracture models,the generation of comprehensive datasets,and the training of the ANN model.The results show the influence of casing dimensions on fragment velocity distributions,with the tendencies indicating increased resultant velocity with reduced thickness,increased length and diameter.The model's predictive capability is demonstrated through the accurate predictions for both training and testing datasets,showing its potential for the real-time prediction of fragmentation performance.
文摘This paper presents a data fusion method in distributed multi-sensor system including GPS and INS sensors’ data processing. First, a residual χ 2 \|test strategy with the corresponding algorithm is designed. Then a coefficient matrices calculation method of the information sharing principle is derived. Finally, the federated Kalman filter is used to combine these independent, parallel, real\|time data. A pseudolite (PL) simulation example is given.
文摘It is difficult to parallelize a subsistent sequential algorithm. Through analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parailelization by extracting the analysis boxes and model grid point Iatitude rows with leaped steps, and by distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on a MPP computer arc also presented.
文摘There are two key issues in distributed intrusion detection system,that is,maintaining load balance of system and protecting data integrity.To address these issues,this paper proposes a new distributed intrusion detection model for big data based on nondestructive partitioning and balanced allocation.A data allocation strategy based on capacity and workload is introduced to achieve local load balance,and a dynamic load adjustment strategy is adopted to maintain global load balance of cluster.Moreover,data integrity is protected by using session reassemble and session partitioning.The simulation results show that the new model enjoys favorable advantages such as good load balance,higher detection rate and detection efficiency.
基金Project supported by the National Natural Science Foundation of China (Nos. 60772098 and 60672068)the New Century Excel-lent Talents in University of China (No. NCET-06-0393)
文摘Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a light-weight anonymous data perturbation method for efficient privacy preserving in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experiment results show that our approach achieves a high security level and is very efficient in a large scale distributed environment.
基金supported by the National Natural Science Foundation of China(Grant No.62102449)the Central Plains Talent Program under Grant No.224200510003.
文摘With the increasing popularity of blockchain applications, the security of data sources on the blockchain is gradually receiving attention. Providing reliable data for the blockchain safely and efficiently has become a research hotspot, and the security of the oracle responsible for providing reliable data has attracted much attention. The most widely used centralized oracles in blockchain, such as Provable and Town Crier, all rely on a single oracle to obtain data, which suffers from a single point of failure and limits the large-scale development of blockchain. To this end, the distributed oracle scheme is put forward, but the existing distributed oracle schemes such as Chainlink and Augur generally have low execution efficiency and high communication overhead, which leads to their poor applicability. To solve the above problems, this paper proposes a trusted distributed oracle scheme based on a share recovery threshold signature. First, a data verification method of distributed oracles is designed based on threshold signature. By aggregating the signatures of oracles, data from different data sources can be mutually verified, leading to a more efficient data verification and aggregation process. Then, a credibility-based cluster head election algorithm is designed, which reduces the communication overhead by clarifying the function distribution and building a hierarchical structure. Considering the good performance of the BLS threshold signature in large-scale applications, this paper combines it with distributed oracle technology and proposes a BLS threshold signature algorithm that supports share recovery in distributed oracles. The share recovery mechanism enables the proposed scheme to solve the key loss issue, and the setting of the threshold value enables the proposed scheme to complete signature aggregation with only a threshold number of oracles, making the scheme more robust. Finally, experimental results indicate that, by using the threshold signature technology and the cluster head election algorithm, our scheme effectively improves the execution efficiency of oracles and solves the problem of a single point of failure, leading to higher scalability and robustness.
基金supported by the Natural Science Foundation of Jiangsu Province(Grant No.SBK2020043202)by Key Laboratory of Geospace Environment and Geodesy,Ministry of Education,Wuhan University(No.19-01-08).
文摘On 21 May 2021(UTC),an MW 7.4 earthquake jolted the east Bayan Har block in the Tibetan Plateau.The earthquake received widespread attention as it is the largest event in the Tibetan Plateau and its surroundings since the 2008 Wenchuan earthquake,and especially in proximity to the seismic gaps on the east Kunlun fault.Here we use satellite interferometric synthetic aperture radar data and subpixel offset observations along the range directions to characterize the coseismic deformation of the earthquake.Range offset displacements depict clear surface ruptures with a total length of~170 km involving two possible activated fault segments in the earthquake.Coseismic modeling results indicate that the earthquake was dominated by left-lateral strike-slip motions of up to 7 m within the top 12 km of the crust.The well-resolved slip variations are characterized by five major slip patches along strike and 64%of shallow slip deficit,suggesting a young seismogenic structure.Spatial-temporal changes of the postseismic deformation are mapped from early 6-day and 24-day InSAR observations,and are well explained by time-dependent afterslip models.Analysis of Global Navigation Satellite System(GNSS)velocity profiles and strain rates suggests that the eastward extrusion of plateau is diffusely distributed across the east Bayan Har block,but exhibits significant lateral heterogeneities,as evidenced by magnetotelluric observations.The block-wide distributed deformation of the east Bayan Har block along with the significant co-and post-seismic stress loadings from the Madoi earthquake imply high seismic risks along regional faults,especially the Tuosuo Lake and Maqên-Maqu segments of the Kunlun fault that are known as seismic gaps.
文摘This paper describes the architecture of global distributed storage system for data grid. It focue on the management and the capability for the maximum users and maximum resources on the Internet, as well as performance and other issues.
基金Project(2017YFC1405600)supported by the National Key R&D Program of ChinaProject(18JK05032)supported by the Scientific Research Project of Education Department of Shaanxi Province,China。
文摘Due to the limited scenes that synthetic aperture radar(SAR)satellites can detect,the full-track utilization rate is not high.Because of the computing and storage limitation of one satellite,it is difficult to process large amounts of data of spaceborne synthetic aperture radars.It is proposed to use a new method of networked satellite data processing for improving the efficiency of data processing.A multi-satellite distributed SAR real-time processing method based on Chirp Scaling(CS)imaging algorithm is studied in this paper,and a distributed data processing system is built with field programmable gate array(FPGA)chips as the kernel.Different from the traditional CS algorithm processing,the system divides data processing into three stages.The computing tasks are reasonably allocated to different data processing units(i.e.,satellites)in each stage.The method effectively saves computing and storage resources of satellites,improves the utilization rate of a single satellite,and shortens the data processing time.Gaofen-3(GF-3)satellite SAR raw data is processed by the system,with the performance of the method verified.
基金the Project Fund for Key Discipline of the Shanghai Municipal Education Commission(No.J50104)the Major State Basic Research Development Program of China(No.2017YFB0403500)。
文摘Simultaneous localization and mapping(SLAM)is widely used in many robot applications to acquire the unknown environment's map and the robots location.Graph-based SLAM is demonstrated to be effective in large-scale scenarios,and it intuitively performs the SLAM as a pose graph.But because of the high data overlap rate,traditional graph-based SLAM is not efficient in some respects,such as real time performance and memory usage.To reduce1 data overlap rate,a graph-based SLAM with distributed submap strategy(DSS)is presented.In its front-end,submap based scan matching is processed and loop closing detection is conducted.Moreover in its back-end,pose graph is updated for global optimization and submap merging.From a series of experiments,it is demonstrated that graph-based SLAM with DSS reduces 51.79%data overlap rate,decreases 39.70%runtime and 24.60%memory usage.The advantages over other low overlap rate method is also proved in runtime,memory usage,accuracy and robustness performance.
文摘The purpose of data fusion is to produce an improved model or estimate of a system from a set of independent data sources. Various multisensor data fusion approaches exist, in which Kalman filtering is important. In this paper, a fusion algorithm based on multisensor systems is discussed and a distributed multisensor data fusion algorithm based on Kalman filtering presented. The algorithm has been implemented on cluster-based high performance computers. Experimental results show that the method produces precise estimation in considerably reduced execution time.
文摘To make business policy, market analysis, corporate decision, fraud detection, etc., we have to analyze and work with huge amount of data. Generally, such data are taken from different sources. Researchers are using data mining to perform such tasks. Data mining techniques are used to find hidden information from large data source. Data mining is using for various fields: Artificial intelligence, Bank, health and medical, corruption, legal issues, corporate business, marketing, etc. Special interest is given to associate rules, data mining algorithms, decision tree and distributed approach. Data is becoming larger and spreading geographically. So it is difficult to find better result from only a central data source. For knowledge discovery, we have to work with distributed database. On the other hand, security and privacy considerations are also another factor for de-motivation of working with centralized data. For this reason, distributed database is essential for future processing. In this paper, we have proposed a framework to study data mining in distributed environment. The paper presents a framework to bring out actionable knowledge. We have shown some level by which we can generate actionable knowledge. Possible tools and technique for these levels are discussed.
基金Supported by the National Natural Science Foundation of China(61872088)the Science and Technology Plan Project of Xi’an(2020KJWL02,2017CGWL35)the China National Study Abroad Fund。
文摘Remote data auditing becomes critical to ensure the storage reliability in distributed cloud storage.Recently,Le et al proposed an efficient private data auditing scheme NC-Audit designed for regenerating codes,which claimed that NC-Audit can effectively realize privacy-preserving data auditing for distributed storage systems.However,our analysis shows that NC-Audit is not secure for that the adversarial cloud can forge some illegal blocks to cheat the auditor successfully with a high probability even without storing the user’s whole data,when the coding field is large enough.
基金supported by National Natural Sciences Foundation of China(No.62271165,62027802,62201307)the Guangdong Basic and Applied Basic Research Foundation(No.2023A1515030297)+2 种基金the Shenzhen Science and Technology Program ZDSYS20210623091808025Stable Support Plan Program GXWD20231129102638002the Major Key Project of PCL(No.PCL2024A01)。
文摘Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes.
基金the National Key Research and Development Program of China under Grant No.2018YFE0206800the National Natural Science Foundation of Beijing,China,under Grant No.4212010the National Natural Science Foundation of China,under Grant No.61971028。
文摘Adaptive packet scheduling can efficiently enhance the performance of multipath Data Transmission.However,realizing precise packet scheduling is challenging due to the nature of high dynamics and unpredictability of network link states.To this end,this paper proposes a distributed asynchronous deep reinforcement learning framework to intensify the dynamics and prediction of adaptive packet scheduling.Our framework contains two parts:local asynchronous packet scheduling and distributed cooperative control center.In local asynchronous packet scheduling,an asynchronous prioritized replay double deep Q-learning packets scheduling algorithm is proposed for dynamic adaptive packet scheduling learning,which makes a combination of prioritized replay double deep Q-learning network(P-DDQN)to make the fitting analysis.In distributed cooperative control center,a distributed scheduling learning and neural fitting acceleration algorithm to adaptively update neural network parameters of P-DDQN for more precise packet scheduling.Experimental results show that our solution has a better performance than Random weight algorithm and Round-Robin algorithm in throughput and loss ratio.Further,our solution has 1.32 times and 1.54 times better than Random weight algorithm and Round-Robin algorithm on the stability of multipath data transmission,respectively.
文摘Distributed Data Mining is expected to discover preciously unknown, implicit and valuable information from massive data set inherently distributed over a network. In recent years several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper provides the reason for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such environment. Based on the advantages of both the CS model and agent-based model, the system is being able to address the specific concern of increasing scalability and enhancing performance.