In the security and privacy fields,Access Control(AC)systems are viewed as the fundamental aspects of networking security mechanisms.Enforcing AC becomes even more challenging when researchers and data analysts have t...In the security and privacy fields,Access Control(AC)systems are viewed as the fundamental aspects of networking security mechanisms.Enforcing AC becomes even more challenging when researchers and data analysts have to analyze complex and distributed Big Data(BD)processing cluster frameworks,which are adopted to manage yottabyte of unstructured sensitive data.For instance,Big Data systems’privacy and security restrictions are most likely to failure due to the malformed AC policy configurations.Furthermore,BD systems were initially developed toped to take care of some of the DB issues to address BD challenges and many of these dealt with the“three Vs”(Velocity,Volume,and Variety)attributes,without planning security consideration,which are considered to be patch work.Some of the BD“three Vs”characteristics,such as distributed computing,fragment,redundant data and node-to node communication,each with its own security challenges,complicate even more the applicability of AC in BD.This paper gives an overview of the latest security and privacy challenges in BD AC systems.Furthermore,it analyzes and compares some of the latest AC research frameworks to reduce privacy and security issues in distributed BD systems,which very few enforce AC in a cost-effective and in a timely manner.Moreover,this work discusses some of the future research methodologies and improvements for BD AC systems.This study is valuable asset for Artificial Intelligence(AI)researchers,DB developers and DB analysts who need the latest AC security and privacy research perspective before using and/or improving a current BD AC framework.展开更多
Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system...Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system in cloud environment and to verify which outsourced service leads to the problem. Similarly, the cloud service provider cannot simply trust the data computation applications. At last,the verification data itself may also leak the sensitive information from the cloud service provider and data owner. We propose a new three-level definition of the verification, threat model, corresponding trusted policies based on different roles for outsourced big data system in cloud. We also provide two policy enforcement methods for building trusted data computation environment by measuring both the Map Reduce application and its behaviors based on trusted computing and aspect-oriented programming. To prevent sensitive information leakage from verification process,we provide a privacy-preserved verification method. Finally, we implement the TPTVer, a Trusted third Party based Trusted Verifier as a proof of concept system. Our evaluation and analysis show that TPTVer can provide trusted verification for multi-layered outsourced big data system in the cloud with low overhead.展开更多
Purpose:We propose In Par Ten2,a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework.The proposed method reduces re-decomposition cost and can han...Purpose:We propose In Par Ten2,a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework.The proposed method reduces re-decomposition cost and can handle large tensors.Design/methodology/approach:Considering that tensor addition increases the size of a given tensor along all axes,the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors.Additionally,In Par Ten2 avoids the calculation of Khari–Rao products and minimizes shuffling by using the Apache Spark platform.Findings:The performance of In Par Ten2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets.The results confirm that In Par Ten2 can process large tensors and reduce the re-calculation cost of tensor decomposition.Consequently,the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.Research limitations:There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods.However,the former require longer iteration time,and therefore their execution time cannot be compared with that of Spark-based algorithms,whereas the latter run on a single machine,thus limiting their ability to handle large data.Practical implications:The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor by decomposing them based on existing decomposition results without re-decomposing the entire tensor.Originality/value:The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark.Moreover,In Par Ten2 can handle static as well as incremental tensor decomposition.展开更多
With the growth of distributed computing systems, the modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they are in early contact with Big D...With the growth of distributed computing systems, the modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they are in early contact with Big Data platforms. In this paper, we discussed the design principles and research directions of modern Big Data platforms by presenting research in modern Big Data products. We provided a detailed review and comparison of several state-ofthe-art frameworks and concluded into a typical structure with five horizontal and one vertical. According to this structure, this paper presents the components and modern optimization technologies developed for Big Data, which helps to choose the most suitable components and architecture from various Big Data technologies based on requirements.展开更多
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models ...This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.展开更多
As Internet ofThings(IoT)technologies continue to evolve at an unprecedented pace,intelligent big data control and information systems have become critical enablers for organizational digital transformation,facilitati...As Internet ofThings(IoT)technologies continue to evolve at an unprecedented pace,intelligent big data control and information systems have become critical enablers for organizational digital transformation,facilitating data-driven decision making,fostering innovation ecosystems,and maintaining operational stability.In this study,we propose an advanced deployment algorithm for Service Function Chaining(SFC)that leverages an enhanced Practical Byzantine Fault Tolerance(PBFT)mechanism.The main goal is to tackle the issues of security and resource efficiency in SFC implementation across diverse network settings.By integrating blockchain technology and Deep Reinforcement Learning(DRL),our algorithm not only optimizes resource utilization and quality of service but also ensures robust security during SFC deployment.Specifically,the enhanced PBFT consensus mechanism(VRPBFT)significantly reduces consensus latency and improves Byzantine node detection through the introduction of a Verifiable Random Function(VRF)and a node reputation grading model.Experimental results demonstrate that compared to traditional PBFT,the proposed VRPBFT algorithm reduces consensus latency by approximately 30%and decreases the proportion of Byzantine nodes by 40%after 100 rounds of consensus.Furthermore,the DRL-based SFC deployment algorithm(SDRL)exhibits rapid convergence during training,with improvements in long-term average revenue,request acceptance rate,and revenue/cost ratio of 17%,14.49%,and 20.35%,respectively,over existing algorithms.Additionally,the CPU resource utilization of the SDRL algorithmreaches up to 42%,which is 27.96%higher than other algorithms.These findings indicate that the proposed algorithm substantially enhances resource utilization efficiency,service quality,and security in SFC deployment.展开更多
With the advancement of the rural revitalization strategy,preventing poverty recurrence among previously impoverished populations has become a crucial social concern.The application of big data technology in poverty r...With the advancement of the rural revitalization strategy,preventing poverty recurrence among previously impoverished populations has become a crucial social concern.The application of big data technology in poverty recurrence monitoring and agricultural product sales systems can effectively enhance precise identification and early warning capabilities,promoting the sustainable development of rural economies.This paper explores the application of big data technology in poverty recurrence monitoring,analyzes its innovative integration with agricultural product sales systems,and proposes an intelligent monitoring and sales platform model based on big data,aiming to provide a reference for relevant policy formulation.展开更多
In the contemporary era,characterized by the Internet and digitalization as fundamental features,the operation and application of digital currency have gradually developed into a comprehensive structural system.This s...In the contemporary era,characterized by the Internet and digitalization as fundamental features,the operation and application of digital currency have gradually developed into a comprehensive structural system.This system restores the essential characteristics of currency while providing auxiliary services related to the formation,circulation,storage,application,and promotion of digital currency.Compared to traditional currency management technologies,big data analysis technology,which is primarily embedded in digital currency systems,enables the rapid acquisition of information.This facilitates the identification of standard associations within currency data and provides technical support for the operational framework of digital currency.展开更多
With tremendous growing interests in Big Data, the performance improvement of Big Data systems becomes more and more important. Among many steps, the first one is to analyze and diagnose performance bottlenecks of the...With tremendous growing interests in Big Data, the performance improvement of Big Data systems becomes more and more important. Among many steps, the first one is to analyze and diagnose performance bottlenecks of the Big Data systems. Currently, there are two major solutions. One is the pure data-driven diagnosis approach, which may be very time-consuming;the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications like Spark workloads, we observe that the tasks in the same stages normally execute the same or similar codes on each data partition. On basis of the stage similarity and distributed characteristics of Big Data systems, we analyze the behaviors of the Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool, named HybridTune, and measure the overhead and anomaly detection effectiveness of HybridTune using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of outlier detection algorithm reaches up to 93%. Finally, we report several use cases diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrates the potential use of HybridTune.展开更多
The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This...The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This era integrates biotechnology,artificial intelligence(AI),and big data information technology.In contrast,China is still in a transition period between stages 2.0 and 3.0,which primarily relies on conventional selection and molecular breeding.In the context of increasingly complex international situations,accurately identifying core issues in China's seed industry innovation and seizing the frontier of international seed technology are strategically important.These efforts are essential for ensuring food security and revitalizing the seed industry.This paper systematically analyzes the characteristics of crop breeding data from artificial selection to intelligent design breeding.It explores the applications and development trends of AI and big data in modern crop breeding from several key perspectives.These include highthroughput phenotype acquisition and analysis,multiomics big data database and management system construction,AI-based multiomics integrated analysis,and the development of intelligent breeding software tools based on biological big data and AI technology.Based on an in-depth analysis of the current status and challenges of China's seed industry technology development,we propose strategic goals and key tasks for China's new generation of AI and big data-driven intelligent design breeding.These suggestions aim to accelerate the development of an intelligent-driven crop breeding engineering system that features large-scale gene mining,efficient gene manipulation,engineered variety design,and systematized biobreeding.This study provides a theoretical basis and practical guidance for the development of China's seed industry technology.展开更多
Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning fr...Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning from the molecular mechanisms within cells to large-scale epidemiological patterns,has surpassed the capabilities of traditional analytical methods.In the era of artificial intelligence(AI)and big data,there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information.Despite the rapid accumulation of data associated with viral infections,the lack of a comprehensive framework for integrating,selecting,and analyzing these datasets has left numerous researchers uncertain about which data to select,how to access it,and how to utilize it most effectively in their research.This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels,from the molecular details of pathogens to broad epidemiological trends.The scope extends from the micro-scale to the macro-scale,encompassing pathogens,hosts,and vectors.In addition to data summarization,this review thoroughly investigates various dataset sources.It also traces the historical evolution of data collection in the field of viral infectious diseases,highlighting the progress achieved over time.Simultaneously,it evaluates the current limitations that impede data utilization.Furthermore,we propose strategies to surmount these challenges,focusing on the development and application of advanced computational techniques,AI-driven models,and enhanced data integration practices.By providing a comprehensive synthesis of existing knowledge,this review is designed to guide future research and contribute to more informed approaches in the surveillance,prevention,and control of viral infectious diseases,particularly within the context of the expanding big-data landscape.展开更多
This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering,implementing a hybrid teaching reform using the Bosi Smart Learning Platform.The proposed hybrid model...This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering,implementing a hybrid teaching reform using the Bosi Smart Learning Platform.The proposed hybrid model follows a“three-stage”and“two-subject”framework,incorporating a structured design for teaching content and assessment methods before,during,and after class.Practical results indicate that this approach significantly enhances teaching effectiveness and improves students’learning autonomy.展开更多
Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tecto...Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tectonic activities.In the big data era,the establishment of new data platforms and the application of big data methods have become a focus for metamorphic rocks.Significant progress has been made in creating specialized databases,compiling comprehensive datasets,and utilizing data analytics to address complex scientific questions.However,many existing databases are inadequate in meeting the specific requirements of metamorphic research,resulting from a substantial amount of valuable data remaining uncollected.Therefore,constructing new databases that can cope with the development of the data era is necessary.This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this.Accordingly,several crucial factors that need to be taken into consideration in the establishment of specialized metamorphic databases are identified,aiming to leverage data-driven applications to achieve broader scientific objectives in metamorphic research.展开更多
Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of th...Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of these data has not been well stored,managed and mined.With the development of cloud computing technology,it provides a rare development opportunity for logging big data private cloud.The traditional petrophysical evaluation and interpretation model has encountered great challenges in the face of new evaluation objects.The solution research of logging big data distributed storage,processing and learning functions integrated in logging big data private cloud has not been carried out yet.To establish a distributed logging big-data private cloud platform centered on a unifi ed learning model,which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via the unifi ed logging learning model integrating physical simulation and data models in a large-scale functional space,thus resolving the geo-engineering evaluation problem of geothermal fi elds.Based on the research idea of“logging big data cloud platform-unifi ed logging learning model-large function space-knowledge learning&discovery-application”,the theoretical foundation of unified learning model,cloud platform architecture,data storage and learning algorithm,arithmetic power allocation and platform monitoring,platform stability,data security,etc.have been carried on analysis.The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms.The feasibility of constructing a well logging big data cloud platform based on a unifi ed learning model of physics and data is analyzed in terms of the structure,ecology,management and security of the cloud platform.The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method,data software and results sharing,accuracy,speed and complexity.展开更多
On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th Nation...On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th National Congress of the Communist Party of China,China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies.The National Health Commission has prioritized the development of health and medical big data,issuing policies to promote standardized applica-tions and foster innovation in"Internet+Healthcare."Biomedical data has significantly contributed to preci-sion medicine,personalized health management,drug development,disease diagnosis,public health monitor-ing,and epidemic prediction capabilities.展开更多
The advent of healthcare information management systems(HIMSs)continues to produce large volumes of healthcare data for patient care and compliance and regulatory requirements at a global scale.Analysis of this big da...The advent of healthcare information management systems(HIMSs)continues to produce large volumes of healthcare data for patient care and compliance and regulatory requirements at a global scale.Analysis of this big data allows for boundless potential outcomes for discovering knowledge.Big data analytics(BDA)in healthcare can,for instance,help determine causes of diseases,generate effective diagnoses,enhance Qo S guarantees by increasing efficiency of the healthcare delivery and effectiveness and viability of treatments,generate accurate predictions of readmissions,enhance clinical care,and pinpoint opportunities for cost savings.However,BDA implementations in any domain are generally complicated and resource-intensive with a high failure rate and no roadmap or success strategies to guide the practitioners.In this paper,we present a comprehensive roadmap to derive insights from BDA in the healthcare(patient care)domain,based on the results of a systematic literature review.We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research focusing particularly on No SQL databases.We also identify the limitations and challenges of these applications and justify the potential of No SQL databases to address these challenges and further enhance BDA healthcare research.We then propose and describe a state-of-the-art BDA architecture called Med-BDA for healthcare domain which solves all current BDA challenges and is based on the latest zeta big data paradigm.We also present success strategies to ensure the working of Med-BDA along with outlining the major benefits of BDA applications to healthcare.Finally,we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work.The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators,practitioners and professionals to successfully implement BDA initiatives in their organizations.展开更多
The system of hot metal quality monitoring was established based on big data and machine learning using the real-time production data of a steel enterprise in China.A working method that combines big data technology w...The system of hot metal quality monitoring was established based on big data and machine learning using the real-time production data of a steel enterprise in China.A working method that combines big data technology with process theory was proposed for the characteristics of blast furnace production data.After the data have been comprehensively processed,the independent variables that affect the target parameters are selected by using the method of multivariate feature selection.The use of this method not only ensures the interpretability of the input variables,but also improves the accuracy of the machine learning process and is more easily accepted by enterprises.For timely guidance on production,specific evaluation rules are established for the key quality that affects the quality of hot metal on the basis of completed predictions work and uses computer technology to build a quality monitoring system for hot metal.The online results show that the hot metal quality monitoring system established by relying on big data and machine learning operates stably on site,and has good guiding significance for production.展开更多
"Big data" which admittedly means many things to many people is no longer confined to the realm of technology. Today it is a business imperative and is providing solutions to long-standing business challenge..."Big data" which admittedly means many things to many people is no longer confined to the realm of technology. Today it is a business imperative and is providing solutions to long-standing business challenges for banking and financial markets companies around the world. Financial services firms are leveraging big data to transform their processes, their organizations and the entire industry. Since 2012, the term "big data" has frequently been mentioned and used to describe and define the huge amount of data in the information explosive era and to name related technological development and innovation. As to the police work, the coming of big data era is not only a challenge but also an opportunity. Police agencies should go with the tide of development to start with such aspects as work thinking, top design, public information sharing and application and talent provision so as to promote the new development and progress of police work. This paper expounds the practical effect and significance of police big data application by cases happened in some areas.展开更多
This work surveys and illustrates multiple open challenges in the field of industrial Internet of Things(IoT)-based big data management and analysis in cloud environments.Challenges arising from the fields of machine ...This work surveys and illustrates multiple open challenges in the field of industrial Internet of Things(IoT)-based big data management and analysis in cloud environments.Challenges arising from the fields of machine learning in cloud infrastructures,artificial intelligence techniques for big data analytics in cloud environments,and federated learning cloud systems are elucidated.Additionally,reinforcement learning,which is a novel technique that allows large cloud-based data centers,to allocate more energy-efficient resources is examined.Moreover,we propose an architecture that attempts to combine the features offered by several cloud providers to achieve an energy-efficient industrial IoT-based big data management framework(EEIBDM)established outside of every user in the cloud.IoT data can be integrated with techniques such as reinforcement and federated learning to achieve a digital twin scenario for the virtual representation of industrial IoT-based big data of machines and room tem-peratures.Furthermore,we propose an algorithm for determining the energy consumption of the infrastructure by evaluating the EEIBDM framework.Finally,future directions for the expansion of this research are discussed.展开更多
This study developed a new methodology for analyzing the risk level of marine spill accidents from two perspectives,namely,marine traffic density and sensitive resources.Through a case study conducted in Busan,South K...This study developed a new methodology for analyzing the risk level of marine spill accidents from two perspectives,namely,marine traffic density and sensitive resources.Through a case study conducted in Busan,South Korea,detailed procedures of the methodology were proposed and its scalability was confirmed.To analyze the risk from a more detailed and microscopic viewpoint,vessel routes as hazard sources were delineated on the basis of automated identification system(AIS)big data.The outliers and errors of AIS big data were removed using the density-based spatial clustering of applications with noise algorithm,and a marine traffic density map was evaluated by combining all of the gridded routes.Vulnerability of marine environment was identified on the basis of the sensitive resource map constructed by the Korea Coast Guard in a similar manner to the National Oceanic and Atmospheric Administration environmental sensitivity index approach.In this study,aquaculture sites,water intake facilities of power plants,and beach/resort areas were selected as representative indicators for each category.The vulnerability values of neighboring cells decreased according to the Euclidean distance from the resource cells.Two resulting maps were aggregated to construct a final sensitive resource and traffic density(SRTD)risk analysis map of the Busan–Ulsan sea areas.We confirmed the effectiveness of SRTD risk analysis by comparing it with the actual marine spill accident records.Results show that all of the marine spill accidents in 2018 occurred within 2 km of high-risk cells(level 6 and above).Thus,if accident management and monitoring capabilities are concentrated on high-risk cells,which account for only 6.45%of the total study area,then it is expected that it will be possible to cope with most marine spill accidents effectively.展开更多
文摘In the security and privacy fields,Access Control(AC)systems are viewed as the fundamental aspects of networking security mechanisms.Enforcing AC becomes even more challenging when researchers and data analysts have to analyze complex and distributed Big Data(BD)processing cluster frameworks,which are adopted to manage yottabyte of unstructured sensitive data.For instance,Big Data systems’privacy and security restrictions are most likely to failure due to the malformed AC policy configurations.Furthermore,BD systems were initially developed toped to take care of some of the DB issues to address BD challenges and many of these dealt with the“three Vs”(Velocity,Volume,and Variety)attributes,without planning security consideration,which are considered to be patch work.Some of the BD“three Vs”characteristics,such as distributed computing,fragment,redundant data and node-to node communication,each with its own security challenges,complicate even more the applicability of AC in BD.This paper gives an overview of the latest security and privacy challenges in BD AC systems.Furthermore,it analyzes and compares some of the latest AC research frameworks to reduce privacy and security issues in distributed BD systems,which very few enforce AC in a cost-effective and in a timely manner.Moreover,this work discusses some of the future research methodologies and improvements for BD AC systems.This study is valuable asset for Artificial Intelligence(AI)researchers,DB developers and DB analysts who need the latest AC security and privacy research perspective before using and/or improving a current BD AC framework.
基金partially supported by grants from the China 863 High-tech Program (Grant No. 2015AA016002)the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20131103120001)+2 种基金the National Key Research and Development Program of China (Grant No. 2016YFB0800204)the National Science Foundation of China (No. 61502017)the Scientific Research Common Program of Beijing Municipal Commission of Education (KM201710005024)
文摘Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system in cloud environment and to verify which outsourced service leads to the problem. Similarly, the cloud service provider cannot simply trust the data computation applications. At last,the verification data itself may also leak the sensitive information from the cloud service provider and data owner. We propose a new three-level definition of the verification, threat model, corresponding trusted policies based on different roles for outsourced big data system in cloud. We also provide two policy enforcement methods for building trusted data computation environment by measuring both the Map Reduce application and its behaviors based on trusted computing and aspect-oriented programming. To prevent sensitive information leakage from verification process,we provide a privacy-preserved verification method. Finally, we implement the TPTVer, a Trusted third Party based Trusted Verifier as a proof of concept system. Our evaluation and analysis show that TPTVer can provide trusted verification for multi-layered outsourced big data system in the cloud with low overhead.
基金supported by Basic Science Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(NRF-2016R1D1A1B03931529)。
文摘Purpose:We propose In Par Ten2,a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework.The proposed method reduces re-decomposition cost and can handle large tensors.Design/methodology/approach:Considering that tensor addition increases the size of a given tensor along all axes,the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors.Additionally,In Par Ten2 avoids the calculation of Khari–Rao products and minimizes shuffling by using the Apache Spark platform.Findings:The performance of In Par Ten2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets.The results confirm that In Par Ten2 can process large tensors and reduce the re-calculation cost of tensor decomposition.Consequently,the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.Research limitations:There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods.However,the former require longer iteration time,and therefore their execution time cannot be compared with that of Spark-based algorithms,whereas the latter run on a single machine,thus limiting their ability to handle large data.Practical implications:The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor by decomposing them based on existing decomposition results without re-decomposing the entire tensor.Originality/value:The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark.Moreover,In Par Ten2 can handle static as well as incremental tensor decomposition.
基金supported by the Research Fund of Tencent Computer System Co.Ltd.under Grant No.170125
文摘With the growth of distributed computing systems, the modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they are in early contact with Big Data platforms. In this paper, we discussed the design principles and research directions of modern Big Data platforms by presenting research in modern Big Data products. We provided a detailed review and comparison of several state-ofthe-art frameworks and concluded into a typical structure with five horizontal and one vertical. According to this structure, this paper presents the components and modern optimization technologies developed for Big Data, which helps to choose the most suitable components and architecture from various Big Data technologies based on requirements.
基金sponsored by the U.S.Department of Housing and Urban Development(Grant No.NJLTS0027-22)The opinions expressed in this study are the authors alone,and do not represent the U.S.Depart-ment of HUD’s opinions.
文摘This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
基金supported by the National Natural Science Foundation of China under Grant 62471493 and 62402257partially supported by the Natural Science Foundation of Shandong Province under Grant ZR2023LZH017,ZR2024MF066 and 2023QF025+2 种基金partially supported by the Open Research Subject of State Key Laboratory of Intelligent Game(No.ZBKF-24-12)partially supported by the Foundation of Key Laboratory of Education Informatization for Nationalities(Yunnan Normal University),the Ministry of Education(No.EIN2024C006)partially supported by the Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE(No.202306).
文摘As Internet ofThings(IoT)technologies continue to evolve at an unprecedented pace,intelligent big data control and information systems have become critical enablers for organizational digital transformation,facilitating data-driven decision making,fostering innovation ecosystems,and maintaining operational stability.In this study,we propose an advanced deployment algorithm for Service Function Chaining(SFC)that leverages an enhanced Practical Byzantine Fault Tolerance(PBFT)mechanism.The main goal is to tackle the issues of security and resource efficiency in SFC implementation across diverse network settings.By integrating blockchain technology and Deep Reinforcement Learning(DRL),our algorithm not only optimizes resource utilization and quality of service but also ensures robust security during SFC deployment.Specifically,the enhanced PBFT consensus mechanism(VRPBFT)significantly reduces consensus latency and improves Byzantine node detection through the introduction of a Verifiable Random Function(VRF)and a node reputation grading model.Experimental results demonstrate that compared to traditional PBFT,the proposed VRPBFT algorithm reduces consensus latency by approximately 30%and decreases the proportion of Byzantine nodes by 40%after 100 rounds of consensus.Furthermore,the DRL-based SFC deployment algorithm(SDRL)exhibits rapid convergence during training,with improvements in long-term average revenue,request acceptance rate,and revenue/cost ratio of 17%,14.49%,and 20.35%,respectively,over existing algorithms.Additionally,the CPU resource utilization of the SDRL algorithmreaches up to 42%,which is 27.96%higher than other algorithms.These findings indicate that the proposed algorithm substantially enhances resource utilization efficiency,service quality,and security in SFC deployment.
基金2025 College Students’Innovation Training Program“Return to Poverty Monitoring and Agricultural Products Sales System”2024 College Students’Innovation Training Program“Promoting Straw Recycling to Accelerate the Sustainable Development of Agriculture”(202413207010)。
文摘With the advancement of the rural revitalization strategy,preventing poverty recurrence among previously impoverished populations has become a crucial social concern.The application of big data technology in poverty recurrence monitoring and agricultural product sales systems can effectively enhance precise identification and early warning capabilities,promoting the sustainable development of rural economies.This paper explores the application of big data technology in poverty recurrence monitoring,analyzes its innovative integration with agricultural product sales systems,and proposes an intelligent monitoring and sales platform model based on big data,aiming to provide a reference for relevant policy formulation.
文摘In the contemporary era,characterized by the Internet and digitalization as fundamental features,the operation and application of digital currency have gradually developed into a comprehensive structural system.This system restores the essential characteristics of currency while providing auxiliary services related to the formation,circulation,storage,application,and promotion of digital currency.Compared to traditional currency management technologies,big data analysis technology,which is primarily embedded in digital currency systems,enables the rapid acquisition of information.This facilitates the identification of standard associations within currency data and provides technical support for the operational framework of digital currency.
基金supported by the National Key Research and Development Program of China under Grant No.2016YFB1000601
文摘With tremendous growing interests in Big Data, the performance improvement of Big Data systems becomes more and more important. Among many steps, the first one is to analyze and diagnose performance bottlenecks of the Big Data systems. Currently, there are two major solutions. One is the pure data-driven diagnosis approach, which may be very time-consuming;the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications like Spark workloads, we observe that the tasks in the same stages normally execute the same or similar codes on each data partition. On basis of the stage similarity and distributed characteristics of Big Data systems, we analyze the behaviors of the Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool, named HybridTune, and measure the overhead and anomaly detection effectiveness of HybridTune using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of outlier detection algorithm reaches up to 93%. Finally, we report several use cases diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrates the potential use of HybridTune.
基金partially supported by the Construction of Collaborative Innovation Center of Beijing Academy of Agricultural and Forestry Sciences(KJCX20240406)the Beijing Natural Science Foundation(JQ24037)+1 种基金the National Natural Science Foundation of China(32330075)the Earmarked Fund for China Agriculture Research System(CARS-02 and CARS-54)。
文摘The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This era integrates biotechnology,artificial intelligence(AI),and big data information technology.In contrast,China is still in a transition period between stages 2.0 and 3.0,which primarily relies on conventional selection and molecular breeding.In the context of increasingly complex international situations,accurately identifying core issues in China's seed industry innovation and seizing the frontier of international seed technology are strategically important.These efforts are essential for ensuring food security and revitalizing the seed industry.This paper systematically analyzes the characteristics of crop breeding data from artificial selection to intelligent design breeding.It explores the applications and development trends of AI and big data in modern crop breeding from several key perspectives.These include highthroughput phenotype acquisition and analysis,multiomics big data database and management system construction,AI-based multiomics integrated analysis,and the development of intelligent breeding software tools based on biological big data and AI technology.Based on an in-depth analysis of the current status and challenges of China's seed industry technology development,we propose strategic goals and key tasks for China's new generation of AI and big data-driven intelligent design breeding.These suggestions aim to accelerate the development of an intelligent-driven crop breeding engineering system that features large-scale gene mining,efficient gene manipulation,engineered variety design,and systematized biobreeding.This study provides a theoretical basis and practical guidance for the development of China's seed industry technology.
基金supported by the National Natural Science Foundation of China(32370703)the CAMS Innovation Fund for Medical Sciences(CIFMS)(2022-I2M-1-021,2021-I2M-1-061)the Major Project of Guangzhou National Labora-tory(GZNL2024A01015).
文摘Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning from the molecular mechanisms within cells to large-scale epidemiological patterns,has surpassed the capabilities of traditional analytical methods.In the era of artificial intelligence(AI)and big data,there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information.Despite the rapid accumulation of data associated with viral infections,the lack of a comprehensive framework for integrating,selecting,and analyzing these datasets has left numerous researchers uncertain about which data to select,how to access it,and how to utilize it most effectively in their research.This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels,from the molecular details of pathogens to broad epidemiological trends.The scope extends from the micro-scale to the macro-scale,encompassing pathogens,hosts,and vectors.In addition to data summarization,this review thoroughly investigates various dataset sources.It also traces the historical evolution of data collection in the field of viral infectious diseases,highlighting the progress achieved over time.Simultaneously,it evaluates the current limitations that impede data utilization.Furthermore,we propose strategies to surmount these challenges,focusing on the development and application of advanced computational techniques,AI-driven models,and enhanced data integration practices.By providing a comprehensive synthesis of existing knowledge,this review is designed to guide future research and contribute to more informed approaches in the surveillance,prevention,and control of viral infectious diseases,particularly within the context of the expanding big-data landscape.
基金2024 Anqing Normal University University-Level Key Project(ZK2024062D)。
文摘This study examines the Big Data Collection and Preprocessing course at Anhui Institute of Information Engineering,implementing a hybrid teaching reform using the Bosi Smart Learning Platform.The proposed hybrid model follows a“three-stage”and“two-subject”framework,incorporating a structured design for teaching content and assessment methods before,during,and after class.Practical results indicate that this approach significantly enhances teaching effectiveness and improves students’learning autonomy.
基金funded by the National Natural Science Foundation of China(No.42220104008)。
文摘Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tectonic activities.In the big data era,the establishment of new data platforms and the application of big data methods have become a focus for metamorphic rocks.Significant progress has been made in creating specialized databases,compiling comprehensive datasets,and utilizing data analytics to address complex scientific questions.However,many existing databases are inadequate in meeting the specific requirements of metamorphic research,resulting from a substantial amount of valuable data remaining uncollected.Therefore,constructing new databases that can cope with the development of the data era is necessary.This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this.Accordingly,several crucial factors that need to be taken into consideration in the establishment of specialized metamorphic databases are identified,aiming to leverage data-driven applications to achieve broader scientific objectives in metamorphic research.
基金supported By Grant (PLN2022-14) of State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation (Southwest Petroleum University)。
文摘Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of these data has not been well stored,managed and mined.With the development of cloud computing technology,it provides a rare development opportunity for logging big data private cloud.The traditional petrophysical evaluation and interpretation model has encountered great challenges in the face of new evaluation objects.The solution research of logging big data distributed storage,processing and learning functions integrated in logging big data private cloud has not been carried out yet.To establish a distributed logging big-data private cloud platform centered on a unifi ed learning model,which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via the unifi ed logging learning model integrating physical simulation and data models in a large-scale functional space,thus resolving the geo-engineering evaluation problem of geothermal fi elds.Based on the research idea of“logging big data cloud platform-unifi ed logging learning model-large function space-knowledge learning&discovery-application”,the theoretical foundation of unified learning model,cloud platform architecture,data storage and learning algorithm,arithmetic power allocation and platform monitoring,platform stability,data security,etc.have been carried on analysis.The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms.The feasibility of constructing a well logging big data cloud platform based on a unifi ed learning model of physics and data is analyzed in terms of the structure,ecology,management and security of the cloud platform.The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method,data software and results sharing,accuracy,speed and complexity.
文摘On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th National Congress of the Communist Party of China,China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies.The National Health Commission has prioritized the development of health and medical big data,issuing policies to promote standardized applica-tions and foster innovation in"Internet+Healthcare."Biomedical data has significantly contributed to preci-sion medicine,personalized health management,drug development,disease diagnosis,public health monitor-ing,and epidemic prediction capabilities.
基金supported by two research grants provided by the Karachi Institute of Economics and Technology(KIET)the Big Data Analytics Laboratory at the Insitute of Business Administration(IBAKarachi)。
文摘The advent of healthcare information management systems(HIMSs)continues to produce large volumes of healthcare data for patient care and compliance and regulatory requirements at a global scale.Analysis of this big data allows for boundless potential outcomes for discovering knowledge.Big data analytics(BDA)in healthcare can,for instance,help determine causes of diseases,generate effective diagnoses,enhance Qo S guarantees by increasing efficiency of the healthcare delivery and effectiveness and viability of treatments,generate accurate predictions of readmissions,enhance clinical care,and pinpoint opportunities for cost savings.However,BDA implementations in any domain are generally complicated and resource-intensive with a high failure rate and no roadmap or success strategies to guide the practitioners.In this paper,we present a comprehensive roadmap to derive insights from BDA in the healthcare(patient care)domain,based on the results of a systematic literature review.We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research focusing particularly on No SQL databases.We also identify the limitations and challenges of these applications and justify the potential of No SQL databases to address these challenges and further enhance BDA healthcare research.We then propose and describe a state-of-the-art BDA architecture called Med-BDA for healthcare domain which solves all current BDA challenges and is based on the latest zeta big data paradigm.We also present success strategies to ensure the working of Med-BDA along with outlining the major benefits of BDA applications to healthcare.Finally,we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work.The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators,practitioners and professionals to successfully implement BDA initiatives in their organizations.
基金the financial supports from the Basic Research Program of National Natural Science Foundation of China(52004096)the Natural Science Foundation of Hebei Province(E2020209208).
文摘The system of hot metal quality monitoring was established based on big data and machine learning using the real-time production data of a steel enterprise in China.A working method that combines big data technology with process theory was proposed for the characteristics of blast furnace production data.After the data have been comprehensively processed,the independent variables that affect the target parameters are selected by using the method of multivariate feature selection.The use of this method not only ensures the interpretability of the input variables,but also improves the accuracy of the machine learning process and is more easily accepted by enterprises.For timely guidance on production,specific evaluation rules are established for the key quality that affects the quality of hot metal on the basis of completed predictions work and uses computer technology to build a quality monitoring system for hot metal.The online results show that the hot metal quality monitoring system established by relying on big data and machine learning operates stably on site,and has good guiding significance for production.
基金the Major Program of National Social Science Foundation of China(No.13&ZD148)
文摘"Big data" which admittedly means many things to many people is no longer confined to the realm of technology. Today it is a business imperative and is providing solutions to long-standing business challenges for banking and financial markets companies around the world. Financial services firms are leveraging big data to transform their processes, their organizations and the entire industry. Since 2012, the term "big data" has frequently been mentioned and used to describe and define the huge amount of data in the information explosive era and to name related technological development and innovation. As to the police work, the coming of big data era is not only a challenge but also an opportunity. Police agencies should go with the tide of development to start with such aspects as work thinking, top design, public information sharing and application and talent provision so as to promote the new development and progress of police work. This paper expounds the practical effect and significance of police big data application by cases happened in some areas.
文摘This work surveys and illustrates multiple open challenges in the field of industrial Internet of Things(IoT)-based big data management and analysis in cloud environments.Challenges arising from the fields of machine learning in cloud infrastructures,artificial intelligence techniques for big data analytics in cloud environments,and federated learning cloud systems are elucidated.Additionally,reinforcement learning,which is a novel technique that allows large cloud-based data centers,to allocate more energy-efficient resources is examined.Moreover,we propose an architecture that attempts to combine the features offered by several cloud providers to achieve an energy-efficient industrial IoT-based big data management framework(EEIBDM)established outside of every user in the cloud.IoT data can be integrated with techniques such as reinforcement and federated learning to achieve a digital twin scenario for the virtual representation of industrial IoT-based big data of machines and room tem-peratures.Furthermore,we propose an algorithm for determining the energy consumption of the infrastructure by evaluating the EEIBDM framework.Finally,future directions for the expansion of this research are discussed.
基金This research was supported by a grant[KCG-01-2017-01]through the Disaster and Safety Management Institute funded by the Ministry of Public Safety and Securitythe National Research Foundation of Korea(NRF)grant[No.2018R1D1A1B07050208]funded by the Ministry of Science and ICT of Korea Government.
文摘This study developed a new methodology for analyzing the risk level of marine spill accidents from two perspectives,namely,marine traffic density and sensitive resources.Through a case study conducted in Busan,South Korea,detailed procedures of the methodology were proposed and its scalability was confirmed.To analyze the risk from a more detailed and microscopic viewpoint,vessel routes as hazard sources were delineated on the basis of automated identification system(AIS)big data.The outliers and errors of AIS big data were removed using the density-based spatial clustering of applications with noise algorithm,and a marine traffic density map was evaluated by combining all of the gridded routes.Vulnerability of marine environment was identified on the basis of the sensitive resource map constructed by the Korea Coast Guard in a similar manner to the National Oceanic and Atmospheric Administration environmental sensitivity index approach.In this study,aquaculture sites,water intake facilities of power plants,and beach/resort areas were selected as representative indicators for each category.The vulnerability values of neighboring cells decreased according to the Euclidean distance from the resource cells.Two resulting maps were aggregated to construct a final sensitive resource and traffic density(SRTD)risk analysis map of the Busan–Ulsan sea areas.We confirmed the effectiveness of SRTD risk analysis by comparing it with the actual marine spill accident records.Results show that all of the marine spill accidents in 2018 occurred within 2 km of high-risk cells(level 6 and above).Thus,if accident management and monitoring capabilities are concentrated on high-risk cells,which account for only 6.45%of the total study area,then it is expected that it will be possible to cope with most marine spill accidents effectively.