This research introduces a unique approach to segmenting breast cancer images using a U-Net-based architecture. However, the computational demand for image processing is very high. Therefore, we conducted this research to build a system that enables image-segmentation training on low-power machines. To accomplish this, all data are divided into several segments, each trained separately. For prediction, an initial output is produced by each trained model for a given input, and the ultimate output is selected by pixel-wise majority voting over these predicted outputs, which also preserves data privacy. In addition, this kind of distributed training allows different computers to be used simultaneously, so the training process takes considerably less time than typical training approaches. Even after training is complete, the proposed prediction system allows a newly trained model to be added, so prediction accuracy improves consistently. We evaluated the effectiveness of the ultimate output using four performance metrics: average pixel accuracy, mean absolute error, average specificity, and average balanced accuracy. The experimental results show scores of 0.9216, 0.0687, 0.9477, and 0.8674, respectively. In addition, the proposed method was compared with four other state-of-the-art models in terms of total training time and computational resource usage, and it outperformed all of them in these aspects.
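To make the pixel-wise majority voting concrete, here is a minimal Python sketch, assuming each trained model emits a binary (H, W) mask; the ensemble size and shapes are illustrative, not the authors' implementation.

```python
# A minimal sketch of pixel-wise majority voting over binary segmentation
# masks, assuming each model returns a (H, W) array of 0/1 predictions.
import numpy as np

def majority_vote(masks: list) -> np.ndarray:
    """Return the pixel-wise strict-majority label across predicted masks."""
    stacked = np.stack(masks, axis=0)          # (n_models, H, W)
    votes = stacked.sum(axis=0)                # number of models voting 1
    return (votes * 2 > len(masks)).astype(np.uint8)

# Example: three hypothetical model outputs for one input image.
preds = [np.random.randint(0, 2, (256, 256)) for _ in range(3)]
final_mask = majority_vote(preds)
```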
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people's attitudes. With the advent of distributed data collection and annotation, such multimodal data can easily be obtained and shared. However, due to professional discrepancies among annotators and lax quality control, noisy labels may be introduced. Recent research suggests that deep neural networks (DNNs) overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse the different modalities and improve the quality of the multimodal features used for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
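A rough sketch of what a two-layer fusion net could look like, assuming pre-extracted text/audio/visual feature vectors; the layer sizes, feature dimensions, and project-then-fuse layout are assumptions, not the MRML architecture itself.

```python
# A minimal sketch of a two-stage multimodal fusion module: per-modality
# projections into a shared space, then a joint fusion network. Dimensions
# are illustrative placeholders.
import torch
import torch.nn as nn

class TwoLayerFusion(nn.Module):
    def __init__(self, d_text=768, d_audio=74, d_vis=35, d_hidden=128, n_classes=3):
        super().__init__()
        # First fusion layer: project each modality into a shared space.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(d_text, d_hidden),
            "audio": nn.Linear(d_audio, d_hidden),
            "vis": nn.Linear(d_vis, d_hidden),
        })
        # Second fusion layer: deeply fuse the concatenated projections.
        self.fuse = nn.Sequential(
            nn.Linear(3 * d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, n_classes),
        )

    def forward(self, text, audio, vis):
        h = [torch.relu(p(x)) for p, x in
             zip(self.proj.values(), (text, audio, vis))]
        return self.fuse(torch.cat(h, dim=-1))

logits = TwoLayerFusion()(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape)  # torch.Size([4, 3])
```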
Due to the limited scenes that synthetic aperture radar (SAR) satellites can detect, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process the large amounts of data produced by spaceborne SAR. A new networked satellite data-processing method is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data-processing system is built with field programmable gate array (FPGA) chips as the kernel. Unlike traditional CS-algorithm processing, the system divides data processing into three stages, and the computing tasks are reasonably allocated to different data-processing units (i.e., satellites) in each stage. The method effectively saves the computing and storage resources of the satellites, improves the utilization rate of a single satellite, and shortens the data-processing time. Gaofen-3 (GF-3) satellite SAR raw data were processed by the system, verifying the performance of the method.
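The abstract does not specify how the Chirp Scaling pipeline is cut into three stages, so the following sketch is only one plausible assignment, with placeholder phase functions standing in for the geometry-dependent terms.

```python
# A high-level sketch of splitting a CS-style pipeline across three
# processing units (satellites). The phase functions are placeholders
# (ones), since their exact forms depend on the radar geometry; only
# the stage boundaries and FFT axes are shown.
import numpy as np

def stage1(raw, phi1):                 # unit 1: range-Doppler domain
    az = np.fft.fft(raw, axis=0)       # azimuth FFT
    return az * phi1                   # chirp-scaling phase multiply

def stage2(data, phi2):                # unit 2: 2-D frequency domain
    rd = np.fft.fft(data, axis=1)      # range FFT
    rd *= phi2                         # range compression + bulk RCMC
    return np.fft.ifft(rd, axis=1)     # back to range-Doppler domain

def stage3(data, phi3):                # unit 3: final image formation
    data *= phi3                       # azimuth compression + phase correction
    return np.fft.ifft(data, axis=0)

raw = np.random.randn(256, 512) + 1j * np.random.randn(256, 512)
phi1 = phi2 = phi3 = np.ones_like(raw)  # placeholder phase functions
image = stage3(stage2(stage1(raw, phi1), phi2), phi3)
```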
Distributed Data Mining is expected to discover previously unknown, implicit, and valuable information from massive data sets inherently distributed over a network. In recent years several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper provides the rationale for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such environments. Based on the advantages of both the client/server (CS) model and the agent-based model, the system is able to address the specific concerns of increasing scalability and enhancing performance.
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been performed on the machine since 1994, and many satisfactory results have been achieved in the fusion research field on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments becomes more and more important for improving experimental efficiency and extending research results. This paper describes a Remote Control System (RCS), a combined Browser/Server and Client/Server model based on the Internet, for the HT-7 distributed data acquisition system (HT7DAS). By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. The whole process of design and implementation of the system, along with some key items, is discussed in detail. The system was successfully operated during the 2002 HT-7 experimental campaign.
Integrating heterogeneous data sources is a precondition for sharing data across enterprises. Highly efficient data updating can both save system expense and offer real-time data, and rapid data modification in the pre-processing area of the data warehouse is one of the hot issues. An extract-transform-load (ETL) design is proposed based on a new data algorithm called Diff-Match, which is developed by utilizing schema matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out differing data sets. Its efficiency has been proved by its successful application in an electric-apparatus group enterprise.
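A minimal sketch of the diff step such a Diff-Match design implies: comparing a keyed source snapshot against the warehouse copy to find rows to insert, update, or delete. The table layout and keys are invented for illustration.

```python
# A minimal sketch of change detection between a source snapshot and the
# warehouse copy, both keyed by primary key. Rows are plain tuples here.
def diff_match(source: dict, warehouse: dict):
    """source/warehouse map primary key -> row tuple."""
    inserts = {k: v for k, v in source.items() if k not in warehouse}
    deletes = [k for k in warehouse if k not in source]
    updates = {k: v for k, v in source.items()
               if k in warehouse and warehouse[k] != v}
    return inserts, updates, deletes

src = {1: ("pump", 40), 2: ("valve", 12), 4: ("relay", 7)}
dwh = {1: ("pump", 40), 2: ("valve", 10), 3: ("fuse", 3)}
print(diff_match(src, dwh))
# ({4: ('relay', 7)}, {2: ('valve', 12)}, [3])
```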
Since the early 1990s, significant progress in database technology has provided a new platform for emerging dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases. These models have had a deep impact on evolving decision-support systems, but they suffer a variety of practical problems while accessing real-world data sources. Specifically, a type of data storage model based on data distribution theory has been increasingly used in recent years by large-scale enterprises, yet it is not compatible with existing decision-support models. This storage model keeps data at the geographical sites where they are most regularly accessed. This leads to considerably less inter-site data transfer, which can reduce data security issues in some circumstances and also significantly improve the speed of data manipulation transactions. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed across different sites in fragmented form. At the same time, it provides a very high level of intelligent management decision support by making smart use of the data collections through new methods for synthesizing useful knowledge. The results of an empirical study evaluating the model are provided.
With the reform of the rural network enterprise system, the transfer of property rights in rural power enterprises is accelerating. Evaluating the operation and development status of rural power enterprises bears directly on their future development and investment direction. At present, the evaluation of the production and operation of rural network enterprises and of power-network development relies solely on the experience of the evaluators, who set reference indices and form evaluation results through manual scoring. Because such results are highly subjective, their practical guiding value is weak. Therefore, a distributed data mining method for evaluating the status of rural power enterprises was proposed; such methods have already been applied in many fields, such as food science, economics, and the chemical industry. A distributed mathematical model was established using principal component analysis (PCA) and regression analysis. By screening various technical indicators and determining their relevance, the reference value of the evaluation results was improved. Combined with the Statistical Package for the Social Sciences (SPSS) data analysis software, the operation status of rural network enterprises was evaluated, and the rationality, effectiveness, and economy of the evaluation were verified through comparison with current evaluation results and calculation examples using actual grid operation data.
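A minimal sketch of the PCA-plus-regression pipeline on synthetic indicator data; the indicator count, component count, and status score are illustrative assumptions, not the paper's SPSS workflow.

```python
# A minimal sketch: reduce correlated technical indicators with PCA, then
# regress an enterprise status score on the principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))            # 60 enterprises, 8 technical indicators
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=60)  # status score

pca = PCA(n_components=3).fit(X)        # screen indicators by shared variance
Z = pca.transform(X)
print("explained variance:", pca.explained_variance_ratio_.round(3))

model = LinearRegression().fit(Z, y)    # regress status on the components
print("R^2 on components:", round(model.score(Z, y), 3))
```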
Increasing global competition forces manufacturers of products in all technical fields to guarantee high product quality over a long period of time. At the same time, it is necessary to minimize production costs. In order to meet all these requirements, on-line data acquisition and processing are of increasing importance in distributed automation systems. A software bus operating on industrial Ethernet can minimize operating costs by offering easy installation, scalability, a high degree of reliability, and remote monitoring and control.
This report presents the design and implementation of a Distributed Data Acquisition, Monitoring and Processing System (DDAMAP). It is assumed that the operations of a factory are organized into two levels: client machines at the plant level collect real-time raw data from sensors and measurement instrumentation and transfer them to a central processor over Ethernet, while the central processor handles real-time data processing and monitoring. The system utilizes the computational power of an Intel T2300 dual-core processor and parallel computation supported by multi-threading techniques. Our experiments show that these techniques can significantly improve system performance and are viable solutions for real-time high-speed data processing.
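A minimal sketch of the two-level organization using Python's standard threading and queue modules: worker threads stand in for plant-level clients, the main thread for the central processor; the sensor payloads are fabricated.

```python
# A minimal producer/consumer sketch: client threads push samples into a
# shared queue, and the main thread performs the central processing step.
import queue
import threading

q = queue.Queue()

def client(sensor_id: int, n: int = 5) -> None:
    for i in range(n):
        q.put(sensor_id * 100.0 + i)    # stand-in for a real measurement

threads = [threading.Thread(target=client, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()

total, count = 0.0, 0
while not q.empty():                     # central processing step
    total += q.get(); count += 1
print(f"processed {count} samples, mean = {total / count:.1f}")
```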
This study investigates a consistent fusion algorithm for distributed multi-rate multi-sensor systems operating in feedback-memory configurations, where each sensor's sampling period is uniform and an integer multiple of the state update period. The focus is on scenarios where the correlations among Measurement Noises (MNs) from different sensors are unknown. Firstly, a non-augmented local estimator that applies to such sampling cases is designed to provide unbiased Local Estimates (LEs) at the fusion points. Subsequently, a measurement-equivalent approach is developed to parameterize the correlation structure between LEs and reformulate the LEs into a unified form, thereby constraining the correlations arising from the MNs to an admissible range. Simultaneously, a family of upper bounds on the joint error covariance matrix of the LEs is derived based on the constrained correlations, avoiding the need to calculate the exact error cross-covariance matrix of the LEs. Finally, a sequential fusion estimator is proposed in the sense of Weighted Minimum Mean Square Error (WMMSE), and it is proven to be unbiased, consistent, and more accurate than the well-known covariance intersection method. Simulation results illustrate the effectiveness of the proposed algorithm by highlighting improvements in consistency and accuracy.
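For reference, a minimal sketch of the classical covariance intersection rule that the proposed estimator is compared against; the fixed weight omega is illustrative (in practice it is optimized, e.g., to minimize the trace of the fused covariance).

```python
# Classical covariance intersection (CI): fuse two local estimates whose
# cross-correlation is unknown, using a convex combination of the inverse
# covariances.
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega=0.5):
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(omega * P1i + (1 - omega) * P2i)
    x = P @ (omega * P1i @ x1 + (1 - omega) * P2i @ x2)
    return x, P

x1, P1 = np.array([1.0, 0.0]), np.diag([1.0, 2.0])
x2, P2 = np.array([1.2, -0.1]), np.diag([2.0, 1.0])
x, P = covariance_intersection(x1, P1, x2, P2)
print(x, np.diag(P))
```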
Due to the restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services urgently need onboard distributed processing (OBDP). In existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs, with their high-dynamic nodes and long-distance links, cannot provide such conditions, which makes the performance of OBDP hard to measure intuitively. To bridge this gap, a multidimensional simulation platform is indispensable: one that can simulate the network environment of LMCNs and run BDAs inside it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We elaborate the architecture and mechanism of the simulation platform and take Starlink and Hadoop as realistic examples for simulation. The results indicate that LMCNs exhibit dynamic end-to-end latency that fluctuates periodically with the constellation's movement. Compared with ground data center networks (GDCNs), LMCNs degrade computing and storage job throughput, which can be alleviated by the use of erasure codes and data-flow scheduling among worker nodes.
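A toy illustration of the erasure-coding idea mentioned at the end: a single XOR parity block lets k data blocks survive the loss of any one of them, so a job need not stall on one lost satellite link. Real systems use Reed-Solomon codes; this one-erasure code is only a sketch.

```python
# A toy XOR erasure code: parity = XOR of all data blocks, so any single
# missing block is the XOR of the surviving blocks and the parity.
from functools import reduce

def encode(blocks) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(blocks, parity):
    """blocks: list with exactly one None entry (the lost block)."""
    missing = blocks.index(None)
    known = [b for b in blocks if b is not None] + [parity]
    blocks[missing] = encode(known)      # XOR of the rest restores it
    return blocks

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)
print(recover([b"AAAA", None, b"CCCC"], parity))  # [b'AAAA', b'BBBB', b'CCCC']
```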
The security of Federated Learning (FL) / Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy the usability of the model by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Facing the problem that existing data sanitization methods are hard to apply to real-time applications because of their tedious process and heavy computation, we propose a new supervised batch detection method for poison that can swiftly sanitize the training dataset before local model training. We design a training-dataset generation method that helps to enhance accuracy, and we use data complexity features to train a detection model that is then used in an efficient batch hierarchical detection process. Our model stockpiles knowledge about poison, which can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as other online or offline scenarios.
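A loose sketch of supervised batch detection, assuming label-flipping poison and two hand-picked data-complexity features; the feature set, batch generator, and classifier are stand-ins for the paper's method.

```python
# Fabricate clean and label-flipped batches, summarize each with simple
# data-complexity features (class-centroid separation, spread), and train
# a detector that flags poisoned batches.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def make_batch(poisoned, n=100):
    X = np.r_[rng.normal(0, 1, (n, 2)), rng.normal(3, 1, (n, 2))]
    y = np.r_[np.zeros(n), np.ones(n)]
    if poisoned:                      # flip 30% of labels (availability attack)
        idx = rng.choice(2 * n, int(0.6 * n), replace=False)
        y[idx] = 1 - y[idx]
    return X, y

def complexity(X, y):
    c0, c1 = X[y == 0].mean(0), X[y == 1].mean(0)
    sep = np.linalg.norm(c0 - c1)     # centroid separation shrinks under flips
    spread = X[y == 0].std() + X[y == 1].std()
    return [sep, spread, sep / spread]

batches = [(complexity(*make_batch(p)), int(p))
           for p in rng.random(200) < 0.5]
F, t = map(np.array, zip(*batches))
clf = RandomForestClassifier(random_state=0).fit(F[:150], t[:150])
print("held-out accuracy:", clf.score(F[150:], t[150:]))
```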
The scale and complexity of big data are growing continuously, posing severe challenges to traditional data processing methods, especially in the field of clustering analysis. To address this issue, this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update (BDTMCDIncreUpdate), which combines distributed computing, storage technology, and incremental update techniques to provide an efficient and effective means of clustering analysis. First, the original dataset is divided into multiple sub-blocks, and distributed computing resources are utilized to process the sub-blocks in parallel, enhancing efficiency. Then, initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results. When new data arrive, incremental update technology is employed to update the core tensor and factor matrix, ensuring that the clustering model can adapt to changes in the data. Finally, by combining the updated core tensor and factor matrix with historical computational results, refined clustering results are obtained, achieving real-time adaptation to dynamic data. In experimental simulation on the Aminer dataset, the BDTMCDIncreUpdate method demonstrated outstanding performance in terms of accuracy (ACC) and normalized mutual information (NMI), achieving an accuracy rate of 90% and an NMI score of 0.85, outperforming existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios. The BDTMCDIncreUpdate method therefore offers an innovative solution for big data analysis, integrating distributed computing, incremental updates, and tensor-based multi-clustering: it improves efficiency and scalability in processing large-scale high-dimensional datasets, and its effectiveness and accuracy have been validated experimentally. The method shows great potential in real-world applications where dynamic data growth is common and is of significant importance for advancing the development of data analysis technology.
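As a simplified analogue of the divide-then-update workflow (not the tensor decomposition itself), the sketch below splits data into sub-blocks, clusters them incrementally, and folds in newly arrived data.

```python
# MiniBatchKMeans stands in for the paper's tensor-based multi-clustering:
# fit block by block, then absorb new data with another partial_fit call.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(2)
blocks = np.array_split(rng.normal(size=(3000, 16)), 6)  # distributed sub-blocks

km = MiniBatchKMeans(n_clusters=4, random_state=0, n_init=3)
for block in blocks:                 # initial clustering, block by block
    km.partial_fit(block)

new_data = rng.normal(size=(500, 16))
km.partial_fit(new_data)             # incremental update on arrival
print("updated centers shape:", km.cluster_centers_.shape)
```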
This study presents a machine learning-based method for predicting the fragment velocity distribution in warhead fragmentation under explosive loading conditions. The fragment resultant velocities are correlated with key design parameters, including casing dimensions and detonation positions. The paper details the finite element analysis of fragmentation, the characterization of the dynamic hardening and fracture models, the generation of comprehensive datasets, and the training of the ANN model. The results show the influence of casing dimensions on fragment velocity distributions, with resultant velocity tending to increase with reduced thickness and with increased length and diameter. The model's predictive capability is demonstrated through accurate predictions for both the training and testing datasets, showing its potential for real-time prediction of fragmentation performance.
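A minimal sketch of the surrogate-modeling idea: a small neural network mapping casing design parameters to a resultant velocity. The synthetic target encodes only the reported tendencies (velocity up as thickness drops and length/diameter grow); it is not the paper's finite-element dataset.

```python
# Train a small MLP surrogate on fabricated design-parameter data:
# columns are thickness, length, diameter, detonation position.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform([5, 100, 50, 0], [15, 300, 120, 1], size=(500, 4))
v = 1800 - 40 * X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2] + rng.normal(0, 20, 500)

ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                   random_state=0).fit(X[:400], v[:400])
print("test R^2:", round(ann.score(X[400:], v[400:]), 3))
```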
With the increasing popularity of blockchain applications, the security of data sources on the blockchain is gradually receiving attention. Providing reliable data for the blockchain safely and efficiently has become a research hotspot, and the security of the oracle responsible for providing such data has attracted much attention. The most widely used centralized oracles in blockchain, such as Provable and Town Crier, all rely on a single oracle to obtain data, which suffers from a single point of failure and limits the large-scale development of blockchain. To this end, distributed oracle schemes have been put forward, but existing schemes such as Chainlink and Augur generally have low execution efficiency and high communication overhead, which limits their applicability. To solve these problems, this paper proposes a trusted distributed oracle scheme based on a share-recovery threshold signature. First, a data verification method for distributed oracles is designed based on the threshold signature: by aggregating the oracles' signatures, data from different data sources can be mutually verified, making the data verification and aggregation process more efficient. Then, a credibility-based cluster head election algorithm is designed, which reduces communication overhead by clarifying the distribution of functions and building a hierarchical structure. Considering the good performance of the BLS threshold signature in large-scale applications, this paper combines it with distributed oracle technology and proposes a BLS threshold signature algorithm that supports share recovery in distributed oracles. The share-recovery mechanism enables the proposed scheme to solve the key-loss issue, and the threshold setting enables signature aggregation to complete with only a threshold number of oracles, making the scheme more robust. Finally, experimental results indicate that, by using threshold signature technology and the cluster head election algorithm, our scheme effectively improves the execution efficiency of oracles and eliminates the single point of failure, leading to higher scalability and robustness.
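To illustrate the share-recovery mechanism (though not the BLS construction itself), here is a self-contained Shamir secret-sharing sketch over a prime field: any t shares reconstruct the sharing polynomial, so a lost oracle's share can be recomputed by the others.

```python
# Shamir secret sharing: shares are points on a degree-(t-1) polynomial;
# Lagrange interpolation at x=0 yields the secret, and at any other x
# recovers that node's lost share.
P = 2**61 - 1  # a Mersenne prime as the field modulus

def make_shares(secret, t, n, coeffs):
    poly = [secret] + coeffs[: t - 1]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(poly)) % P)
            for x in range(1, n + 1)]

def lagrange_at(x0, shares):
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # modular inverse
    return total

shares = make_shares(secret=123456789, t=3, n=5, coeffs=[11, 42])
print(lagrange_at(0, shares[:3]))                # recover the secret
print(lagrange_at(4, shares[:3]), shares[3][1])  # recover node 4's lost share
```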
In the era of big data, the growing number of real-time data streams often contains a great deal of sensitive privacy information. Releasing or sharing these data directly, without processing, leads to serious leakage of private information, which poses a great challenge to conventional privacy protection mechanisms (CPPM). Existing data partitioning methods ignore the number of data replications and information exchanges, resulting in complex distance calculations and inefficient indexing for high-dimensional data. CPPM therefore often fail to meet stringent efficiency and reliability requirements, especially in dynamic spatiotemporal environments. Addressing this concern, we propose the Principal Component Enhanced Vantage-point tree (PEV-Tree), an enhanced data structure based on the idea of dimension reduction, and construct a Distributed Spatio-Temporal Privacy Preservation Mechanism (DST-PPM) on top of it. In this work, principal component analysis and the vantage-point tree are used to establish the PEV-Tree. In addition, we designed three distributed anonymization algorithms for data streams, named CK-AA, CL-DA, and CT-CA, which fulfill the anonymization rules of K-Anonymity, L-Diversity, and T-Closeness, respectively, and which have different computational complexities and reliabilities: the higher the complexity, the lower the risk of privacy leakage. DST-PPM can reduce the dimension of high-dimensional information while preserving data characteristics, dividing the data space around vantage points based on distance. It effectively enhances the data processing workflow and increases algorithm efficiency. To verify the validity of the method, we conducted empirical tests of CK-AA, CL-DA, and CT-CA on conventional data structures and on the PEV-Tree, respectively. Against the big-data background of the Internet of Vehicles, we conducted experiments using artificially simulated on-board network data. The results demonstrated that the operational efficiency of CK-AA, CL-DA, and CT-CA is enhanced by 15.12%, 24.55%, and 52.74%, respectively, when deployed on the PEV-Tree. Simultaneously, under homogeneity attacks, the probabilities of information leakage were reduced by 2.31%, 1.76%, and 0.19%, respectively. Furthermore, these algorithms showed superior utility (scalability) when executed across PEV-Trees of varying scales compared with their performance on conventional data structures. This indicates that DST-PPM offers marked advantages over CPPM in terms of efficiency, reliability, and scalability.
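A minimal sketch of the PEV-Tree construction idea: PCA for dimension reduction, followed by a vantage-point tree that recursively splits points on the median distance to a vantage point. The leaf size and vantage choice are illustrative, and the anonymization algorithms themselves are not shown.

```python
# PCA-reduce high-dimensional records, then partition the low-dimensional
# space with a simple vantage-point tree (median-distance splits).
import numpy as np
from sklearn.decomposition import PCA

def vp_tree(points, leaf=8):
    if len(points) <= leaf:
        return {"leaf": points}
    vantage = points[0]
    d = np.linalg.norm(points - vantage, axis=1)
    mu = np.median(d)
    return {"vantage": vantage, "mu": mu,
            "inner": vp_tree(points[d <= mu][1:], leaf),  # drop the vantage itself
            "outer": vp_tree(points[d > mu], leaf)}

rng = np.random.default_rng(4)
raw = rng.normal(size=(400, 20))           # high-dimensional records
low = PCA(n_components=3).fit_transform(raw)
tree = vp_tree(low)
print("root median radius:", round(tree["mu"], 3))
```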
Given the rapid development of advanced information systems, microgrids (MGs) suffer from more potential attacks that affect their operational performance. Conventional distributed secondary control with a small, fixed sampling period inevitably wastes communication resources. This paper proposes a self-triggered secondary control scheme under perturbations from false data injection (FDI) attacks. We design a linear clock for each DG to trigger its controller at aperiodic, intermittent instants. Subsequently, a hash-based defense mechanism (HDM) is designed for detecting and eliminating malicious data infiltrating the MGs. With the aid of the HDM, the self-triggered control scheme achieves the secondary control objectives even in the presence of FDI attacks. Rigorous theoretical analysis and simulation results indicate that the introduced secondary control scheme significantly reduces communication costs and enhances the resilience of MGs under FDI attacks.
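A minimal sketch of a hash-based defense in the spirit of the HDM, using Python's standard hmac module: receivers discard packets whose tag fails verification. The key handling, message layout, and shared-secret assumption are illustrative.

```python
# Each DG appends an HMAC tag to its measurement packet; a receiver
# recomputes the tag and rejects packets whose payload was altered.
import hmac, hashlib, json

KEY = b"per-link shared secret"        # illustrative key management

def sign(msg: dict) -> bytes:
    blob = json.dumps(msg, sort_keys=True).encode()
    return hmac.new(KEY, blob, hashlib.sha256).digest()

def verify(msg: dict, tag: bytes) -> bool:
    return hmac.compare_digest(sign(msg), tag)

packet = {"dg": 2, "t": 17, "freq": 49.98}
tag = sign(packet)
packet_fdi = {**packet, "freq": 52.00}               # attacker alters payload
print(verify(packet, tag), verify(packet_fdi, tag))  # True False
```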
Parametric survival models are essential for analyzing time-to-event data in fields such as engineering and biomedicine. While the log-logistic distribution is popular for its simplicity and closed-form expressions, it often lacks the flexibility needed to capture complex hazard patterns. In this article, we propose a novel extension of the classical log-logistic distribution, termed the new exponential log-logistic (NExLL) distribution, designed to provide enhanced flexibility in modeling time-to-event data with complex failure behaviors. The NExLL model incorporates a new exponential generator to expand the shape adaptability of the baseline log-logistic distribution, allowing it to capture a wide range of hazard-rate shapes, including increasing, decreasing, J-shaped, reversed-J-shaped, modified-bathtub, and unimodal forms. A key feature of the NExLL distribution is its formulation as a mixture of log-logistic densities, offering both symmetric and asymmetric patterns suitable for diverse real-world reliability scenarios. We establish several theoretical properties of the model, including closed-form expressions for its probability density function, cumulative distribution function, moments, hazard rate function, and quantiles. Parameter estimation is performed using seven classical estimation techniques, with extensive Monte Carlo simulations used to evaluate and compare their performance under various conditions. The practical utility and flexibility of the proposed model are illustrated using two real-world datasets from reliability and engineering applications, where the NExLL model demonstrates superior fit and predictive performance compared with existing log-logistic-based models. This contribution advances the toolbox of parametric survival models, offering a robust alternative for modeling complex aging and failure patterns in reliability, engineering, and other applied domains.
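For context, a minimal sketch of the baseline the NExLL model extends: the log-logistic distribution (available in SciPy as the Fisk distribution) and its hazard h(t) = f(t)/S(t), which is monotone decreasing for shape c ≤ 1 and unimodal for c > 1. The NExLL generator itself is not reproduced here.

```python
# Evaluate the log-logistic (Fisk) hazard rate for two shape parameters
# to show the monotone-decreasing vs. unimodal regimes.
import numpy as np
from scipy.stats import fisk

t = np.linspace(0.05, 5, 5)
for c in (0.8, 2.5):                       # shape parameter of the baseline
    h = fisk.pdf(t, c) / fisk.sf(t, c)     # hazard rate h(t) = f(t) / S(t)
    print(f"c={c}: hazard at t={t.round(2).tolist()} -> {h.round(3).tolist()}")
```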
Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach to privacy preservation, but it performs poorly in large-scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient yet are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for perturbation-based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and to protect the data statistics and the randomization process against collusion attacks: an adaptive privacy-preserving summary protocol and an anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in large-scale distributed environments.
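A minimal sketch of the perturbation idea: each party adds zero-mean noise locally, masking individual records while aggregate statistics remain approximately recoverable. The noise scale and mean-only statistic are simplifications of the paper's summary and exchange protocols.

```python
# Each party perturbs its local records with zero-mean noise before
# release; the pooled mean stays close to the true mean in aggregate.
import numpy as np

rng = np.random.default_rng(5)
parties = [rng.normal(50, 10, size=300) for _ in range(4)]  # local datasets

perturbed = [x + rng.normal(0, 25, size=x.shape) for x in parties]
true_mean = np.mean(np.concatenate(parties))
released_mean = np.mean(np.concatenate(perturbed))
print(round(true_mean, 2), round(released_mean, 2))  # close in aggregate
```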