The Internet of Things (IoT) has revolutionized how we interact with and gather data from our surrounding environment. The rapid proliferation of IoT devices, equipped with many sensors and actuators, has ushered in an era of unprecedented data generation and connectivity, and the vast volumes of data they continuously produce can be harnessed to derive valuable insights. However, the conventional approach of transmitting all this data to a centralized cloud infrastructure for processing and analysis can be inefficient and impractical due to bandwidth limitations, network latency, and scalability issues. This paper proposes a Self-Learning Internet Traffic Fuzzy Classifier (SLItFC) for traffic data analysis. The proposed technique uses clustering and classification procedures to improve classification accuracy in analyzing network traffic data. SLItFC addresses the task of efficiently managing and analyzing IoT data traffic at the edge. It combines fuzzy clustering and self-learning techniques, allowing it to adapt and improve its classification accuracy over time. This adaptability is a crucial feature, given the dynamic nature of IoT environments where data patterns and traffic characteristics can evolve rapidly. With the fuzzy classifier, the accuracy of the clustering process improves while computational time is reduced, so SLItFC maintains high classification accuracy at lower computational cost. This efficiency is paramount in edge computing, where resource constraints demand streamlined data processing. Additionally, SLItFC’s performance advantages make it a compelling choice for organizations seeking to harness the potential of IoT data for real-time insights and decision-making. With the self-learning process, the SLItFC model monitors the network traffic data acquired from the IoT devices. The Sugeno fuzzy model is implemented within the edge computing environment for improved classification accuracy. Simulation analysis shows that the proposed SLItFC achieves 94.5% classification accuracy with reduced classification time.
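The Sugeno fuzzy model named in the abstract above can be illustrated with a minimal zero-order sketch. The abstract does not state SLItFC's input features, membership functions, or rule base, so the two inputs (packet rate and mean packet size), every breakpoint, and every rule output below are hypothetical, and the self-learning step is not modeled.

```python
def falling(x, lo, hi):
    """Membership that is 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Complement of `falling`: 0 at or below lo, 1 at or above hi."""
    return 1.0 - falling(x, lo, hi)


# Hypothetical fuzzy sets (breakpoints are illustrative, not from the paper).
def low_rate(r):
    return falling(r, 10, 50)      # packets per second


def high_rate(r):
    return rising(r, 10, 50)


def small_size(s):
    return falling(s, 200, 1000)   # bytes


def large_size(s):
    return rising(s, 200, 1000)


# Zero-order Sugeno rules: (rate membership, size membership, constant output).
# The output is a score in [0, 1]; higher means "more like bulk traffic".
RULES = [
    (low_rate, small_size, 0.0),
    (low_rate, large_size, 0.5),
    (high_rate, small_size, 0.5),
    (high_rate, large_size, 1.0),
]


def sugeno_score(rate, size):
    """Weighted average of rule outputs; firing strength uses min as fuzzy AND."""
    num = 0.0
    den = 0.0
    for mu_rate, mu_size, output in RULES:
        weight = min(mu_rate(rate), mu_size(size))
        num += weight * output
        den += weight
    return num / den if den > 0 else None


telemetry = sugeno_score(5, 100)     # slow, small packets
mixed = sugeno_score(30, 600)        # halfway between the fuzzy sets
bulk = sugeno_score(80, 1400)        # fast, large packets
```

A crisp class label could then be obtained by thresholding the score; in a self-learning variant the rule constants would be updated from observed traffic rather than fixed by hand.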
In this study, we delve into the realm of efficient Big Data Engineering and Extract, Transform, Load (ETL) processes within the healthcare sector, leveraging the robust foundation provided by the MIMIC-III Clinical Database. Our investigation entails a comprehensive exploration of various methodologies aimed at enhancing the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through meticulous experimentation utilizing a representative dataset, we shed light on the advantages associated with the incorporation of PySpark and Docker containerized applications. Our research illuminates significant advancements in time efficiency, process streamlining, and resource optimization attained through the utilization of PySpark for distributed computing within Big Data Engineering workflows. Additionally, we underscore the strategic integration of Docker containers, delineating their pivotal role in augmenting scalability and reproducibility within the ETL pipeline. This paper encapsulates the pivotal insights gleaned from our experimental journey, accentuating the practical implications and benefits entailed in the adoption of PySpark and Docker. By streamlining Big Data Engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
To solve the lag problem of traditional storage technology in mass data storage and management, an application platform for big data is designed and built on a Hadoop and data warehouse integration platform, which makes the data convenient to manage and use. To break through the bottleneck of the master node, a storage system with better performance is designed by introducing cloud computing technology. It adopts a master-slave distribution pattern in which network access follows the proximity (nearest-node) principle, which reduces the access burden on the single master node. A file block update strategy and a fault recovery mechanism are also provided to solve the management bottleneck of traditional storage systems in data update and fault recovery, offering feasible technical solutions for big data storage management.
With the development of Internet technology and human computing, the computing environment has changed dramatically over the last three decades. Cloud computing has emerged as a paradigm of Internet computing in which dynamic, scalable, and often virtualized resources are provided as services. With virtualization technology, cloud computing offers diverse services (such as virtual computing, virtual storage, and virtual bandwidth) to the public in a multi-tenancy mode. Although users enjoy the super-computing and mass storage capabilities supplied by cloud computing, cloud security remains a pressing problem, which is in essence one of trust management between data owners and storage service providers. In this paper, we propose a data coloring method based on cloud watermarking to recognize and ensure mutual reputations. The experimental results show that the robustness of the reverse cloud generator can guarantee users' embedded social reputation identifications. Hence, our work provides a reference solution to the critical problem of cloud security.
Advanced cloud computing technology provides cost savings and flexible services for users. With the explosion of multimedia data, more and more data owners outsource their personal multimedia data to the cloud. In the meantime, some computationally expensive tasks are also undertaken by cloud servers. However, the outsourced multimedia data and its applications may reveal the data owner’s private information, because data owners lose control of their data. Recently, this concern has aroused new research interest in privacy-preserving reversible data hiding over outsourced multimedia data. In this paper, two reversible data hiding schemes are proposed for encrypted image data in cloud computing: reversible data hiding by homomorphic encryption and reversible data hiding in the encrypted domain. In the former, the additional bits are extracted after decryption; in the latter, they are extracted before decryption. A combined scheme is also designed. This paper proposes a privacy-preserving outsourcing scheme for reversible data hiding over encrypted image data in cloud computing, which not only ensures multimedia data security without relying on the trustworthiness of cloud servers, but also guarantees that reversible data hiding can be performed over encrypted images at different stages. Theoretical analysis confirms the correctness of the proposed encryption model and justifies the security of the proposed scheme. The computation cost of the proposed scheme is acceptable and can be adjusted to different security levels.
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified against radiosonde data, including GPS/MET observations in the analysis yields an overall improvement in the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4,000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis, and just one iteration of minimization takes more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurements, suitable for massively parallel processor architectures, is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles of the code's design and examine its performance on state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Industrial big data integration and sharing (IBDIS) is of great significance in managing and providing data for big data analysis in manufacturing systems. A novel fog-computing-based IBDIS approach called Fog-IBDIS is proposed in order to integrate and share industrial big data with high raw data security and low network traffic loads by moving the integration task from the cloud to the edge of networks. First, a task flow graph (TFG) is designed to model the data analysis process. The TFG is composed of several tasks, which are executed by the data owners through the Fog-IBDIS platform in order to protect raw data privacy. Second, the function of Fog-IBDIS to enable data integration and sharing is presented in five modules: TFG management, compilation and running control, the data integration model, the basic algorithm library, and the management component. Finally, a case study is presented to illustrate the implementation of Fog-IBDIS, which ensures raw data security by deploying the analysis tasks executed by the data generators, and eases the network traffic load by greatly reducing the volume of transmitted data.
Efficient and effective data acquisition is of theoretical and practical importance in WSN applications because the data measured and collected by a WSN are often unreliable: they may be accompanied by noise and error, missing values, or inconsistencies. Motivated by fog computing, which focuses on how to effectively offload computation-intensive tasks from resource-constrained devices, this paper proposes a simple yet effective data acquisition approach that filters abnormal data and meets real-time requirements. Our method uses a cooperation mechanism that leverages both an architectural and an algorithmic approach. Firstly, the sensor node, with its limited computing resources, only detects and marks suspicious data using a lightweight algorithm. Secondly, the cluster head evaluates suspicious data by referring to the data from the other sensor nodes in the same cluster and discards the abnormal data directly. Thirdly, the sink node fills in the discarded data with an approximate value using a nearest-neighbor data supplement method. Through this architecture, each node consumes only a few computational resources, and the heavy computing load is distributed across several nodes. Simulation results show that our data acquisition method is effective in terms of real-time outlier filtering and computing overhead.
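The three-tier cooperation described in the abstract above (sensor marks, cluster head discards, sink fills) can be sketched as follows. The abstract does not specify the lightweight detection rule, the cluster-head test, or the exact nearest-neighbor fill, so the median-deviation checks, the `threshold` and `window` parameters, and the nearest-in-time fill are all assumptions made for illustration.

```python
from statistics import median


def sensor_mark(readings, window=3, threshold=5.0):
    """Sensor tier (assumed rule): flag a reading that deviates from the
    median of the previous `window` readings by more than `threshold`."""
    flags = []
    for i, value in enumerate(readings):
        history = readings[max(0, i - window):i]
        flags.append(bool(history) and abs(value - median(history)) > threshold)
    return flags


def cluster_head_filter(values_by_node, flags_by_node, threshold=5.0):
    """Cluster-head tier for one time step: a flagged reading is discarded
    (set to None) only if it also disagrees with the median of its peers."""
    cleaned = {}
    for node, value in values_by_node.items():
        peers = [v for n, v in values_by_node.items() if n != node]
        abnormal = (flags_by_node[node] and bool(peers)
                    and abs(value - median(peers)) > threshold)
        cleaned[node] = None if abnormal else value
    return cleaned


def sink_fill(series):
    """Sink tier (assumed rule): replace each None with the nearest known
    value in time; ties go to the earlier sample."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return list(series)
    return [v if v is not None
            else series[min(known, key=lambda k: (abs(k - i), k))]
            for i, v in enumerate(series)]


def acquire(node_series):
    """Run the three tiers over equally long per-node series."""
    flags = {n: sensor_mark(s) for n, s in node_series.items()}
    steps = len(next(iter(node_series.values())))
    cleaned = {n: [] for n in node_series}
    for t in range(steps):
        step = cluster_head_filter({n: s[t] for n, s in node_series.items()},
                                   {n: f[t] for n, f in flags.items()})
        for n, v in step.items():
            cleaned[n].append(v)
    return {n: sink_fill(s) for n, s in cleaned.items()}


# Hypothetical temperature readings; node "a" has one spike at t = 2.
demo = acquire({
    "a": [20.0, 20.5, 35.0, 20.8, 21.0],
    "b": [20.2, 20.4, 20.6, 20.9, 21.1],
    "c": [19.9, 20.3, 20.5, 20.7, 20.9],
})
```

Each tier does a small amount of work, which mirrors the abstract's point that the computing load is spread across sensor nodes, cluster heads, and the sink.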
Edge-computing-enabled smart greenhouses are a representative application of Internet of Things (IoT) technology, which can monitor environmental information in real time and use that information to support intelligent decision-making. In this process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms, originally designed for static data, do not properly consider the inherent characteristics of the data stream produced by wireless sensors, such as infiniteness, correlations, and concept drift, which may pose a considerable challenge to anomaly detection based on data streams and lead to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that it is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times will run out of memory space. Second, there exist correlations among different data streams, and traditional algorithms hardly consider these correlations. Third, the underlying data generation process or distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effectiveness. Considering these issues, a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and the time window technique is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
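To make the combination of Locality-Sensitive Hashing and a time window more concrete, here is a deliberately simplified sketch. It is not DLSHiForest: the abstract gives no algorithmic detail, so this example swaps in a plain bucket-frequency score over a sliding window, using Euclidean (p-stable) LSH. The class name and all parameters (`n_tables`, `n_hashes`, `width`, `window`) are hypothetical.

```python
import math
import random
from collections import Counter, deque


class WindowedLSHDetector:
    """Scores a point by how rarely recent points share its LSH buckets."""

    def __init__(self, dim, n_tables=8, n_hashes=2, width=8.0, window=50, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each table holds n_hashes projections h(x) = floor((a.x + b) / width).
        self.tables = [
            [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
             for _ in range(n_hashes)]
            for _ in range(n_tables)
        ]
        self.counts = [Counter() for _ in range(n_tables)]
        self.window = deque()          # bucket keys of the most recent points
        self.max_window = window

    def _keys(self, x):
        return [
            tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
                  for a, b in table)
            for table in self.tables
        ]

    def score_and_update(self, x):
        """Return a score in [0, 1] (higher = more anomalous), then add x
        to the window and evict the oldest point if the window is full."""
        keys = self._keys(x)
        n = len(self.window)
        if n == 0:
            score = 0.0
        else:
            shared = sum(c[k] for c, k in zip(self.counts, keys))
            score = 1.0 - shared / (n * len(self.tables))
        self.window.append(keys)
        for c, k in zip(self.counts, keys):
            c[k] += 1
        if len(self.window) > self.max_window:
            for c, k in zip(self.counts, self.window.popleft()):
                c[k] -= 1
                if c[k] == 0:
                    del c[k]
        return score


# Hypothetical stream: (temperature, humidity) readings, then one outlier.
data_rng = random.Random(1)
detector = WindowedLSHDetector(dim=2)
normal_scores = []
for _ in range(200):
    point = (20.0 + data_rng.uniform(-0.2, 0.2), 60.0 + data_rng.uniform(-0.2, 0.2))
    normal_scores.append(detector.score_and_update(point))
anomaly_score = detector.score_and_update((80.0, 5.0))
```

The bounded window keeps memory constant on an unbounded stream and lets old data age out, which is one simple way to cope with concept drift; correlations between separate streams are handled here only by hashing the readings jointly as one vector.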
Cyberattacks are difficult to prevent because the targeted companies and organizations often rely on new and fundamentally insecure cloud-based technologies, such as the Internet of Things. With increasing industry adoption and migration of traditional computing services to the cloud, one of the main challenges in cybersecurity is to provide mechanisms to secure these technologies. This work proposes a Data Security Framework for cloud computing services (CCS) that evaluates and improves CCS data security from a software engineering perspective by evaluating the levels of security within the cloud computing paradigm using engineering methods and techniques applied to CCS. This framework is developed by means of a methodology based on a heuristic theory that incorporates knowledge generated by existing works as well as the experience of their implementation. The paper presents the design details of the framework, which consists of three stages: identification of data security requirements, management of data security risks, and evaluation of data security performance in CCS.
Abstract: This article explores the characteristics of data resources from the perspective of production factors, analyzes the demand for trustworthy circulation technology, and designs a fusion architecture and related solutions, including multi-party data intersection calculation and distributed machine learning. It also compares performance differences, conducts formal verification, points out the value and limitations of the architecture innovation, and looks forward to future opportunities.
Abstract: To address the problems of a single encryption algorithm, such as low encryption efficiency and unreliable metadata, in static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms, and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated. To accelerate the encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure of metadata, performs well in terms of metadata reliability, and can realize the fault tolerance of the server. The improved encryption algorithm integrates the dual-channel storage mode, and the encryption storage efficiency improves by 27.6% on average.
Abstract: The rapid adoption of machine learning in sensitive domains, such as healthcare, finance, and government services, has heightened the need for robust, privacy-preserving techniques. Traditional machine learning approaches lack built-in privacy mechanisms, exposing sensitive data to risks, which motivates the development of Privacy-Preserving Machine Learning (PPML) methods. Despite significant advances in PPML, a comprehensive and focused exploration of Secure Multi-Party Computing (SMPC) within this context remains underdeveloped. This review aims to bridge this knowledge gap by systematically analyzing the role of SMPC in PPML, offering a structured overview of current techniques, challenges, and future directions. Using a semi-systematic mapping study methodology, this paper surveys recent literature spanning SMPC protocols, PPML frameworks, implementation approaches, threat models, and performance metrics. Emphasis is placed on identifying trends, technical limitations, and comparative strengths of leading SMPC-based methods. Our findings reveal that while SMPC offers strong cryptographic guarantees for privacy, challenges such as computational overhead, communication costs, and scalability persist. The paper also discusses critical vulnerabilities, practical deployment issues, and variations in protocol efficiency across use cases.
Funding: Supported by the Natural Science Foundation of Hunan Province, China (Grant Nos. 2023JJ50178 and 2023JJ50194) and the Excellent Youth Project of Hunan Provincial Department of Education (Grant No. 23B0542).
Abstract: Rack-level loop thermosyphons have been widely adopted as a solution to data centers’ growing energy demands. While numerous studies have highlighted the heat transfer performance and energy-saving benefits of this system, its economic feasibility, water usage effectiveness (WUE), and carbon usage effectiveness (CUE) remain underexplored. This study introduces a comprehensive evaluation index designed to assess the applicability of the rack-level loop thermosyphon system across various computing hub nodes. The air wet-bulb temperature Ta,w was identified as the most significant factor influencing the variability in the combination of PUE, CUE, and WUE values. The results indicate that the rack-level loop thermosyphon system achieves the highest score in Lanzhou (94.485) and the lowest in Beijing (89.261) based on the comprehensive evaluation index. The overall ranking of cities according to the comprehensive evaluation score is as follows: Gansu hub (Lanzhou) > Inner Mongolia hub (Hohhot) > Ningxia hub (Yinchuan) > Yangtze River Delta hub (Shanghai) > Chengdu-Chongqing hub (Chongqing) > Guangdong-Hong Kong-Macao Greater Bay Area hub (Guangzhou) > Guizhou hub (Guiyang) > Beijing-Tianjin-Hebei hub (Beijing). Furthermore, Hohhot, Lanzhou, and Yinchuan consistently rank among the top three cities for comprehensive scores across all load rates, while Guiyang (at a 25% load rate), Guangzhou (at a 50% load rate), and Beijing (at 75% and 100% load rates) exhibit the lowest comprehensive scores.
Funding: Sponsored by the National Natural Science Foundation of China (Nos. 62172353, 62302114, U20B2046, and 62172115); the Innovation Fund Program of the Engineering Research Center for Integration and Application of Digital Learning Technology of the Ministry of Education (Nos. 1331007 and 1311022); the Natural Science Foundation of the Jiangsu Higher Education Institutions (Grant No. 17KJB520044); and the Six Talent Peaks Project in Jiangsu Province (No. XYDXX-108).
Abstract: With the rapid development of information technology, IoT devices play a major role in physiological health data detection. The exponential growth of medical data requires a reasonable allocation of storage space between cloud servers and edge nodes. The storage capacity of edge nodes close to users is limited, so hotspot data should be stored in edge nodes as much as possible to ensure response timeliness and access hit rate. However, current schemes cannot guarantee that every sub-message of a complete data item stored by an edge node meets the requirements of hot data. How to detect and delete redundant data in edge nodes while protecting user privacy and the dynamic integrity of the data has become a challenging problem. Our paper proposes a redundant data detection method that meets privacy protection requirements. By scanning the ciphertext, it determines whether each sub-message of the data in the edge node meets the requirements of hot data. It has the same effect as a zero-knowledge proof and does not reveal the privacy of users. In addition, for redundant sub-data that does not meet the requirements of hot data, our paper proposes a redundant data deletion scheme that preserves the dynamic integrity of the data. We use Content Extraction Signature (CES) to generate the signature of the remaining hot data after the redundant data is deleted. The feasibility of the scheme is demonstrated through security analysis and efficiency analysis.
Abstract: The education field is currently experiencing innovation driven by big data and cloud technologies, and these advanced technologies play a central role in the construction of smart campuses. Big data technology has a wide range of applications in student learning behavior analysis, teaching resource management, campus safety monitoring, and decision support, which improves the quality of education and management efficiency. Cloud computing technology supports the integration, distribution, and optimal use of educational resources through cloud resource sharing, virtual classrooms, intelligent campus management systems, and Infrastructure-as-a-Service (IaaS) models, which reduce costs and increase flexibility. This paper comprehensively discusses the practical application of big data and cloud computing technologies in smart campuses, showing how these technologies can contribute to the development of smart campuses and lay the foundation for future innovation in education models.
Funding: This research is funded by the 2023 Henan Province Science and Technology Research Projects: Key Technology of Rapid Urban Flood Forecasting Based on Water Level Feature Analysis and Spatio-Temporal Deep Learning (No. 232102320015); and the Henan Provincial Higher Education Key Research Project Program (Project No. 23B520024): A Multi-Sensor-Based Indoor Environmental Parameters Monitoring and Control System.
Abstract: The Internet of Things (IoT) has revolutionized how we interact with and gather data from our surrounding environment. The rapid proliferation of IoT devices, equipped with many sensors and actuators, has ushered in an era of unprecedented data generation and connectivity, and these devices continuously produce vast volumes of data that can be harnessed to derive valuable insights. However, the conventional approach of transmitting all this data to a centralized cloud infrastructure for processing and analysis can be inefficient and impractical due to bandwidth limitations, network latency, and scalability issues. This paper proposes a Self-Learning Internet Traffic Fuzzy Classifier (SLItFC) for traffic data analysis. The proposed technique uses clustering and classification procedures to improve classification accuracy in analyzing network traffic data. SLItFC addresses the intricate task of efficiently managing and analyzing IoT data traffic at the edge. It combines fuzzy clustering with self-learning techniques, allowing it to adapt and improve its classification accuracy over time. This adaptability is a crucial feature, given the dynamic nature of IoT environments, where data patterns and traffic characteristics can evolve rapidly. With the fuzzy classifier, the accuracy of the clustering process is improved while the computational time is reduced, so SLItFC maintains high classification accuracy at lower computational cost. This efficiency is paramount in edge computing, where resource constraints demand streamlined data processing. Additionally, SLItFC’s performance advantages make it a compelling choice for organizations seeking to harness the potential of IoT data for real-time insights and decision-making. Through the self-learning process, the SLItFC model monitors the network traffic data acquired from the IoT devices. The Sugeno fuzzy model is implemented within the edge computing environment for improved classification accuracy. Simulation analysis shows that the proposed SLItFC achieves 94.5% classification accuracy with reduced classification time.
Abstract: In this study, we delve into the realm of efficient Big Data Engineering and Extract, Transform, Load (ETL) processes within the healthcare sector, leveraging the robust foundation provided by the MIMIC-III Clinical Database. Our investigation entails a comprehensive exploration of various methodologies aimed at enhancing the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through meticulous experimentation utilizing a representative dataset, we shed light on the advantages associated with the incorporation of PySpark and Docker containerized applications. Our research illuminates significant advancements in time efficiency, process streamlining, and resource optimization attained through the utilization of PySpark for distributed computing within Big Data Engineering workflows. Additionally, we underscore the strategic integration of Docker containers, delineating their pivotal role in augmenting scalability and reproducibility within the ETL pipeline. This paper encapsulates the pivotal insights gleaned from our experimental journey, accentuating the practical implications and benefits entailed in the adoption of PySpark and Docker. By streamlining Big Data Engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
Abstract: To address the lag of traditional storage technology in mass data storage and management, an application platform for big data is designed and built on an integrated Hadoop and data warehouse platform, which makes the data convenient to manage and use. To break through the bottleneck of the master node, a storage system with better performance is designed by introducing cloud computing technology. It adopts a master-slave distribution pattern in which network access follows the principle of proximity, which reduces the load of single-point access to the master node. A file block update strategy and a fault recovery mechanism are also provided to resolve the management bottleneck of traditional storage systems in data update and fault recovery, offering a feasible technical solution for big data storage management.
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2007CB310800) and the China Postdoctoral Science Foundation (No. 20090460107 and No. 201003794).
Abstract: With the development of Internet technology and human computing, the computing environment has changed dramatically over the last three decades. Cloud computing has emerged as a paradigm of Internet computing in which dynamic, scalable, and often virtualized resources are provided as services. With virtualization technology, cloud computing offers diverse services (such as virtual computing, virtual storage, and virtual bandwidth) to the public in a multi-tenancy mode. Although users enjoy the super-computing and mass storage capabilities supplied by cloud computing, cloud security remains a hot-spot problem, which is in essence the trust management between data owners and storage service providers. In this paper, we propose a data coloring method based on cloud watermarking to recognize and ensure mutual reputations. The experimental results show that the robustness of the reverse cloud generator can guarantee users' embedded social reputation identifications. Hence, our work provides a reference solution to the critical problem of cloud security.
Funding: This work was supported by the National Natural Science Foundation of China (No. 61702276), the Startup Foundation for Introducing Talent of Nanjing University of Information Science and Technology under Grant 2016r055, and the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions. The authors are grateful to the anonymous reviewers for their constructive comments and improvements.
Abstract: Advanced cloud computing technology provides cost savings and flexible services for users. With the explosion of multimedia data, more and more data owners outsource their personal multimedia data to the cloud. In the meantime, some computationally expensive tasks are also undertaken by cloud servers. However, the outsourced multimedia data and its applications may reveal the data owner’s private information, because data owners lose control of their data. Recently, this concern has aroused new research interest in privacy-preserving reversible data hiding over outsourced multimedia data. In this paper, two reversible data hiding schemes are proposed for encrypted image data in cloud computing: reversible data hiding by homomorphic encryption and reversible data hiding in the encrypted domain. In the former, the additional bits are extracted after decryption; in the latter, they are extracted before decryption. A combined scheme is also designed. The proposed privacy-preserving outsourcing scheme for reversible data hiding over encrypted image data not only ensures multimedia data security without relying on the trustworthiness of cloud servers, but also guarantees that reversible data hiding can be performed over encrypted images at different stages. Theoretical analysis confirms the correctness of the proposed encryption model and justifies the security of the proposed scheme. The computation cost of the proposed scheme is acceptable and adjusts to different security levels.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 40221503 and the China National Key Programme for Development of Basic Sciences (abbreviation: 973 Project, Grant No. G1999032801).
Abstract: The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations in the analysis yields an overall improvement in the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis, and just one iteration of minimization takes more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurements, suitable for massively parallel processor architectures, is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine its performance on state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Funding: This work was supported in part by the National Natural Science Foundation of China (51435009), the Shanghai Sailing Program (19YF1401500), and the Fundamental Research Funds for the Central Universities (2232019D3-34).
Abstract: Industrial big data integration and sharing (IBDIS) is of great significance in managing and providing data for big data analysis in manufacturing systems. A novel fog-computing-based IBDIS approach called Fog-IBDIS is proposed in order to integrate and share industrial big data with high raw data security and low network traffic loads by moving the integration task from the cloud to the edge of networks. First, a task flow graph (TFG) is designed to model the data analysis process. The TFG is composed of several tasks, which are executed by the data owners through the Fog-IBDIS platform in order to protect raw data privacy. Second, the function of Fog-IBDIS to enable data integration and sharing is presented in five modules: TFG management, compilation and running control, the data integration model, the basic algorithm library, and the management component. Finally, a case study is presented to illustrate the implementation of Fog-IBDIS, which ensures raw data security by deploying the analysis tasks executed by the data generators, and eases the network traffic load by greatly reducing the volume of transmitted data.
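The abstract does not specify how a task flow graph is represented, so the following is only a minimal sketch under stated assumptions: the graph is a dependency dictionary, each task is a plain function, and the function `run_task_flow` and the task names are hypothetical. It illustrates the general idea that tasks run where the raw data lives, in dependency order, and only derived results move downstream.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_task_flow(tasks, dependencies, local_data):
    """Run tasks in dependency order on the data owner's side.

    tasks:        task name -> function(local_data, upstream_results)
    dependencies: task name -> list of task names it depends on
    Returns a dict of task name -> result. Only these derived results,
    not local_data itself, would need to leave the data owner.
    """
    results = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](local_data, upstream)
    return results


# Hypothetical four-task graph: clean the raw readings, then summarize them.
tasks = {
    "clean": lambda data, up: [x for x in data if x is not None],
    "mean": lambda data, up: sum(up["clean"]) / len(up["clean"]),
    "count": lambda data, up: len(up["clean"]),
    "report": lambda data, up: {"mean": up["mean"], "count": up["count"]},
}
dependencies = {
    "clean": [],
    "mean": ["clean"],
    "count": ["clean"],
    "report": ["mean", "count"],
}

results = run_task_flow(tasks, dependencies, [1.0, None, 2.0, 3.0])
print(results["report"])
```

In this sketch only the small `report` dictionary would be transmitted, which is the sense in which running analysis at the data owner can reduce both raw data exposure and traffic volume.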
Funding: Supported by the National Natural Science Foundation of China, "Research on Accurate and Fair Service Recommendation Approach in Mobile Internet Environment" (No. 61571066).
Abstract: Efficient and effective data acquisition is of theoretical and practical importance in WSN applications, because data measured and collected by a WSN is often unreliable: it may be accompanied by noise and error, missing values, or inconsistent data. Motivated by fog computing, which focuses on how to effectively offload computation-intensive tasks from resource-constrained devices, this paper proposes a simple yet effective data acquisition approach that can filter abnormal data and meet real-time requirements. The method uses a cooperation mechanism that combines an architectural and an algorithmic approach. First, the sensor node, which has limited computing resources, only detects and marks suspicious data using a lightweight algorithm. Second, the cluster head evaluates suspicious data by referring to the data from the other sensor nodes in the same cluster and discards the abnormal data directly. Third, the sink node fills in the discarded data with an approximate value using a nearest-neighbor data supplement method. Through this architecture, each node consumes only a few computational resources, and the heavy computing load is distributed across several nodes. Simulation results show that the data acquisition method is effective in terms of real-time outlier filtering and computing overhead.
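The three cooperating steps can be sketched as follows. The abstract does not name the paper's actual algorithms, so this is a sketch under assumptions of my own: a median-deviation check stands in for the lightweight detector, a peer-median comparison stands in for the cluster-head evaluation, and the nearest kept reading in time stands in for the nearest-neighbor supplement. All function names and thresholds are hypothetical.

```python
from statistics import median


def mark_suspicious(readings, window=5, threshold=3.0):
    """Sensor-node step (assumed detector): flag a reading that deviates
    from the median of the preceding `window` readings by more than
    `threshold`. Cheap enough for a resource-constrained node."""
    flags = []
    for i, value in enumerate(readings):
        recent = readings[max(0, i - window):i]
        if len(recent) < 2:
            flags.append(False)  # not enough history to judge
        else:
            flags.append(abs(value - median(recent)) > threshold)
    return flags


def confirm_abnormal(value, peer_values, threshold=3.0):
    """Cluster-head step (assumed rule): a suspicious reading is abnormal
    only if it also disagrees with the other nodes in the cluster."""
    return abs(value - median(peer_values)) > threshold


def fill_discarded(readings, discarded):
    """Sink step (assumed rule): replace each discarded reading with the
    kept reading nearest to it in time."""
    kept = [i for i in range(len(readings)) if i not in discarded]
    filled = list(readings)
    for i in discarded:
        nearest = min(kept, key=lambda k: abs(k - i))
        filled[i] = readings[nearest]
    return filled


# Made-up temperature readings from one node and two peers in its cluster.
node = [20.0, 20.5, 20.2, 35.0, 20.4, 20.3]
peers = [[20.1, 20.4, 20.3, 20.5, 20.2, 20.4],
         [19.9, 20.6, 20.1, 20.3, 20.5, 20.2]]

flags = mark_suspicious(node)
discarded = {i for i, flagged in enumerate(flags)
             if flagged and confirm_abnormal(node[i], [p[i] for p in peers])}
print(fill_discarded(node, discarded))
```

Splitting the work this way keeps the per-node cost low: the sensor only compares against a short history, and the more expensive cross-node comparison happens once per suspicious reading at the cluster head.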
Funding: Supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 30919011282.
Abstract: Edge-computing-enabled smart greenhouses are a representative application of Internet of Things (IoT) technology, which can monitor environmental information in real time and use it to support intelligent decision-making. In this process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms, originally designed for static data, do not properly consider the inherent characteristics of the data stream produced by wireless sensors, such as infiniteness, correlations, and concept drift. This poses a considerable challenge to anomaly detection on data streams and leads to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that it is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times will run out of memory. Second, there exist correlations among different data streams, and traditional algorithms hardly consider these correlations. Third, the underlying data generation process or distribution may change over time, so traditional anomaly detection algorithms with no model update will lose their effectiveness. Considering these issues, a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and the time window technique is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of the approach. Experimental results show that the proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
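The time window idea alone can be illustrated with a small sketch. This is not DLSHiForest and uses no Locality-Sensitive Hashing; it is a plain sliding-window z-score detector, with the class name `WindowedDetector` and its parameters chosen here for illustration. It shows how a fixed-size window keeps memory bounded on an unbounded stream and lets the reference statistics follow the data as its distribution drifts.

```python
from collections import deque
from statistics import mean, pstdev


class WindowedDetector:
    """Sliding-window z-score detector (illustrative stand-in only)."""

    def __init__(self, window_size=50, threshold=3.0):
        # deque(maxlen=...) drops the oldest value automatically, so memory
        # stays bounded no matter how long the stream runs.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, value):
        """Score `value` against the current window, then add it.

        Returns True if the value is flagged as anomalous. No value is
        flagged until the window is full (warm-up period).
        """
        is_anomaly = False
        if len(self.window) == self.window.maxlen:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Every value enters the window, including flagged ones; a more
        # careful design might exclude confirmed anomalies.
        self.window.append(value)
        return is_anomaly


# Made-up humidity-like stream with one spike.
detector = WindowedDetector(window_size=5, threshold=3.0)
stream = [10.0, 10.2, 9.8, 10.1, 9.9, 25.0, 10.0]
print([detector.update(v) for v in stream])
```

Because the statistics are recomputed from the most recent values only, a gradual shift in the readings moves the baseline with it instead of being reported as a long run of anomalies.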
Abstract: Cyberattacks are difficult to prevent because the targeted companies and organizations often rely on new and fundamentally insecure cloud-based technologies, such as the Internet of Things. With increasing industry adoption and the migration of traditional computing services to the cloud, one of the main challenges in cybersecurity is to provide mechanisms to secure these technologies. This work proposes a Data Security Framework for cloud computing services (CCS) that evaluates and improves CCS data security from a software engineering perspective, assessing the levels of security within the cloud computing paradigm using engineering methods and techniques applied to CCS. The framework is developed by means of a methodology based on a heuristic theory that incorporates knowledge generated by existing works as well as the experience of their implementation. The paper presents the design details of the framework, which consists of three stages: identification of data security requirements, management of data security risks, and evaluation of data security performance in CCS.