Funding: Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Abstract: Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many fields of application, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms: one introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
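The two SDBSCAN variants are only named in the abstract, so the sketch below is a rough illustration of the "sampling outside DBSCAN" idea: cluster a random sample, then propagate labels to the full data set by nearest neighbor. The use of scikit-learn and every parameter value are our assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming scikit-learn; eps/min_samples are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((10_000, 2))          # stand-in for a large spatial table

sample_idx = rng.choice(len(X), size=1_000, replace=False)
S = X[sample_idx]                    # cluster only the sample

labels_s = DBSCAN(eps=0.03, min_samples=5).fit(S).labels_

# Propagate sample labels to all points via their nearest sampled neighbor.
nn = NearestNeighbors(n_neighbors=1).fit(S)
_, nearest = nn.kneighbors(X)
labels = labels_s[nearest.ravel()]
print("clusters found:", len(set(labels) - {-1}))
```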
Abstract: GeoStar is the registered trademark of GIS software made by WTUSM in China. By means of GeoStar, multi-scale images, DEMs, graphics, and attributes can be integrated into very large seamless databases, and multi-dimensional dynamic visualization and information extraction are also available. This paper describes the fundamental characteristics of such huge integrated databases, for instance the data models, database structures, and spatial index strategies. Finally, typical applications of GeoStar in a few pilot projects, such as the Shanghai CyberCity and the Guangdong provincial spatial data infrastructure (SDI), are illustrated and several concluding remarks are stressed.
Abstract: Since the early 1990s, significant progress in database technology has provided a new platform for emerging dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases, and these models have had a deep impact on evolving decision-support systems, but they suffer from a variety of practical problems when accessing real-world data sources. Specifically, a type of data storage model based on data distribution theory has been increasingly adopted in recent years by large-scale enterprises, yet it is not compatible with existing decision-support models. This storage model keeps data at the geographical sites where they are most regularly accessed, which leads to considerably less inter-site data transfer; this can reduce data security issues in some circumstances and also significantly improve the speed of data manipulation transactions. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed across different sites in fragmented form. At the same time, it provides a very high level of intelligent management decision support by making smart use of the data collections through new methods of synthesizing useful knowledge. The results of an empirical study evaluating the model are provided.
Abstract: As more and more application systems related to big data are developed, NoSQL (Not Only SQL) database systems are becoming increasingly popular. Many scholars have tried different techniques to add transaction features to NoSQL database systems; unfortunately, there is a lack of research on Redis's transactions in the existing literature. This paper proposes a transaction model for key-value NoSQL databases, including Redis, that allows users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way; the model is vividly called the surfing concurrence transaction model. Its architecture, important features, and implementation principle are described in detail, the key algorithms are given as pseudocode, and its performance is evaluated. With the proposed model, transactions on key-value NoSQL databases can be performed in a lock-free and MVCC (Multi-Version Concurrency Control)-free manner. This work fills a gap that has been overlooked in the field and makes a contribution to the further development of NoSQL technology.
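The abstract does not spell out the surfing concurrence model, so the following is only a generic sketch of lock-free, validate-at-commit transactions on an in-memory key-value store, to make the ACID-without-locks idea concrete. All names and the single-threaded setting are our assumptions.

```python
# Toy optimistic transactions on a dict-backed key-value store: record
# the versions read, validate them at commit, retry on conflict.
import itertools

class Store:
    def __init__(self):
        self.data = {}                 # key -> (version, value)
        self.clock = itertools.count(1)

class Txn:
    def __init__(self, store):
        self.store, self.reads, self.writes = store, {}, {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.reads[key] = version      # remember what we observed
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: every key we read must still be at the version we saw.
        for key, seen in self.reads.items():
            current, _ = self.store.data.get(key, (0, None))
            if current != seen:
                return False           # conflict: caller retries
        stamp = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.data[key] = (stamp, value)
        return True

s = Store()
t = Txn(s); t.put("x", 1); assert t.commit()
t = Txn(s); print(t.get("x"))          # -> 1
```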
Funding: Supported by the National Social Science Foundation of China (No. 16BGL183).
Abstract: Many high-quality studies have emerged from public databases, such as the Surveillance, Epidemiology, and End Results (SEER) database, the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, and irregularity, among other characteristics, so their value is not fully exploited. Data-mining technology has become a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making when building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially with large-scale medical public databases. This article introduces the main medical public databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of the application of data-mining technology to clinical big data, in order to promote research results that benefit doctors and patients.
Abstract: Until recently, many computational materials scientists have shown little interest in materials databases. This is now changing because the amount of computational data is rapidly increasing and the potential for data mining provides unique opportunities for discovery and optimization. Here, a few examples of such opportunities are discussed, relating to structural analysis and classification, the discovery of correlations between materials properties, and the discovery of unsuspected compounds.
Funding: Supported by the Taiwan Ministry of Economic Affairs and the Institute for Information Industry under the project titled "Fundamental Industrial Technology Development Program (1/4)".
Abstract: For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of DBDM is to keep together the data used in a single transaction, thus avoiding massive transmission of records in join operations. The proposed system has been implemented successfully, and preliminary experiments show that DBDM performs database partition and migration effectively. The DBDM system is also modularly designed to adapt to different database management systems (DBMSs) or different partition algorithms.
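As a rough illustration of DBDM's stated strategy, keeping together the data used in a single transaction, the sketch below groups tables by transaction co-access with a union-find. The real system's partition algorithms are not given in the abstract; the transaction traces and the heuristic itself are invented for illustration.

```python
# Toy co-access partitioning: tables touched by the same transaction
# end up in the same partition group.
def partition_by_coaccess(transactions):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for tables in transactions:             # each txn = set of tables it touches
        first, *rest = list(tables)
        for t in rest:
            union(first, t)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

traces = [{"orders", "order_items"}, {"order_items", "products"}, {"users"}]
print(partition_by_coaccess(traces))
# e.g. [['orders', 'order_items', 'products'], ['users']]
```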
Funding: Supported by the Fundamental Research Funds of the State Key Laboratory of Ophthalmology (Grant No. 2015QN01); the Young Teacher Top-Support Project of Sun Yat-sen University (Grant No. 2015ykzd11); the Cultivation Projects for Young Teaching Staff of Sun Yat-sen University (Grant No. 12ykpy61) from the Fundamental Research Funds for the Central Universities; the Pearl River Science and Technology New Star Project of Guangzhou City (Grant No. 2014J2200060); the Guangdong Provincial Natural Science Foundation for Distinguished Young Scholars of China (Grant No. 2014A030306030); the Youth Science and Technology Innovation Talents Funds in the Special Support Plan for High Level Talents in Guangdong Province (Grant No. 2014TQ01R573); and the Key Research Plan for the National Natural Science Foundation of China in Cultivation Project (No. 91546101).
Abstract: Widely used in clinical research, the database is a new type of data management automation technology and the most efficient tool for data management. In this article, we first explain some basic concepts, such as the definition, classification, and establishment of databases. Afterward, the workflow for establishing databases, inputting data, verifying data, and managing databases is presented. Meanwhile, by discussing the application of databases in clinical research, we illuminate the important role they play in clinical research practice. Lastly, we introduce the reanalysis of randomized controlled trials (RCTs) and cloud computing techniques, showing the most recent advancements of databases in clinical research.
Funding: Supported by the Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Abstract: Data transformation is the core process in migrating a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from relational to NoSQL databases. A number of schema transformation techniques have been proposed that improve the data transformation process and result in better query processing time compared to the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume large, unnecessary storage and yield high query processing time due to redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from relational to column-oriented databases is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns, and a multiple-nested schema. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database, and a benchmark transformation technique is also performed, with query processing time and storage size compared. Based on the experimental results, the proposed transformation technique shows significant improvement in query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
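To make the denormalization step concrete, here is a minimal sketch that folds child rows into their parent row as a nested document, the shape a document store such as MongoDB would ingest. The table and column names are invented; the paper's full technique also uses access patterns and multiple nesting levels not shown here.

```python
# Toy relational rows to be merged into nested documents.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Lin"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 12.5},
    {"id": 12, "customer_id": 2, "total": 40.0},
]

def denormalize(parents, children, fk, embed_as):
    """Embed each child row under its parent, matched on the foreign key."""
    by_parent = {}
    for row in children:
        child = {k: v for k, v in row.items() if k != fk}
        by_parent.setdefault(row[fk], []).append(child)
    return [dict(p, **{embed_as: by_parent.get(p["id"], [])})
            for p in parents]

docs = denormalize(customers, orders, fk="customer_id", embed_as="orders")
print(docs[0])
# {'id': 1, 'name': 'Ada', 'orders': [{'id': 10, 'total': 25.0}, ...]}
```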
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (No. 2014GB103000).
Abstract: A disruption database and a disruption warning database for the EAST tokamak have been established by a disruption research group. The disruption database, based on Structured Query Language (SQL), comprises 41 disruption parameters, including current quench characteristics, EFIT equilibrium characteristics, kinetic parameters, halo currents, and vertical motion. Presently, most disruption databases are based on plasma experiments on non-superconducting tokamak devices. The purposes of the EAST database are to find disruption characteristics and disruption statistics for the fully superconducting tokamak EAST, to elucidate the physics underlying tokamak disruptions, to explore the influence of disruptions on superconducting magnets, and to extrapolate toward future burning plasma devices. In order to quantitatively assess the usefulness of various plasma parameters for predicting disruptions, an SQL database similar to that of Alcator C-Mod has been created for EAST by compiling values for a number of proposed disruption-relevant parameters sampled from all plasma discharges in the 2015 campaign. Detailed statistical results and analysis of the two databases on the EAST tokamak are presented.
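The sketch below shows the kind of query such a per-discharge SQL database supports. The schema and column names are invented stand-ins (the abstract only says the database holds 41 parameters per discharge), and sqlite3 substitutes for whatever SQL server EAST actually uses.

```python
# Hypothetical disruption table: one row per discharge (shot).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE disruptions (
    shot INTEGER PRIMARY KEY,
    ip_quench_rate REAL,     -- current quench rate (MA/s)
    halo_fraction REAL,      -- peak halo current / pre-disruption Ip
    z_velocity REAL          -- vertical drift velocity (m/s)
)""")
con.executemany("INSERT INTO disruptions VALUES (?,?,?,?)",
                [(60001, 85.0, 0.21, 1.4),
                 (60002, 40.0, 0.08, 0.3),
                 (60003, 120.0, 0.35, 2.1)])

# e.g. discharges combining a fast current quench with a large halo current
rows = con.execute("""SELECT shot FROM disruptions
                      WHERE ip_quench_rate > 80 AND halo_fraction > 0.2""")
print([r[0] for r in rows])   # -> [60001, 60003]
```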
Funding: This work is supported by the National High Technology Research and Development Program of China (2002AA135230) and the Major Project of the National Natural Science Foundation of Beijing (4011002).
Abstract: Recently, attention has been focused on spatial query languages, which are used to query spatial databases. This paper presents a design of a spatial query language obtained by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of applying conventional database query languages. The design is based on an extended spatial data model, including spatial data types and the spatial operators on them. The processing and optimization of spatial queries are also discussed. Finally, an implementation of the design in a spatial query subsystem is given.
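The abstract does not list the paper's operator set, so the sketch below only illustrates the kind of spatial operator such an SQL extension would expose, e.g. a hypothetical "WHERE INSIDE(location, region)" clause, implemented here as plain ray-casting point-in-polygon. The names and the query syntax are assumptions.

```python
# Even-odd ray casting: the predicate behind a hypothetical INSIDE operator.
def inside(point, polygon):
    """True if point (x, y) lies inside the polygon (list of vertices)."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge spans the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit

region = [(0, 0), (4, 0), (4, 3), (0, 3)]              # a simple rectangle
print(inside((1, 1), region), inside((5, 1), region))  # True False
```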
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61070164, 61272415), the Science and Technology Planning Project of Guangdong Province, China (2010B010600025), and the Natural Science Foundation of Guangdong Province, China (S2012010008767, 81510632010000022).
Abstract: To manage dynamic access control and deter pirate attacks on outsourced databases, a dynamic access control scheme with tracing is proposed. In our scheme, we introduce the traitor tracing idea into outsourced databases, and employ a polynomial function and a filter function as the basic means of constructing the encryption and decryption procedures, reducing computation, communication, and storage overheads. Compared to previous access control schemes for outsourced databases, our scheme can not only protect sensitive data from leaking and perform scalable encryption at the server side, without shipping the outsourced data back to the data owner when group membership changes, but also provide trace-and-revoke features: when malicious users clone and sell their decryption keys for profit, our scheme can trace the decryption keys back to the malicious users and revoke them. Furthermore, our scheme avoids massive message exchanges for establishing the decryption key between the data owner and the user. Compared to previously proposed public-key traitor tracing schemes, our scheme simultaneously achieves full collusion resistance, full recoverability, full revocation, and black-box traceability. The proof of security and analysis of performance show that our scheme is secure and efficient.
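As an analogy for why polynomial-based keys support tracing, the Shamir-style toy below gives each user a distinct share (i, f(i)) of a secret polynomial, so a leaked or cloned key identifies its owner by its evaluation point. This is only an illustration of the idea; the paper's actual encryption and tracing construction is far more involved and is not given in the abstract.

```python
# Per-user polynomial shares over a prime field; a pirated share is
# traced by matching its evaluation point.
P = 2**127 - 1                      # a Mersenne prime field

def f(x, coeffs):
    """Evaluate the secret polynomial at x (Horner's rule, mod P)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

secret = 123456789                  # f(0): the group decryption secret
coeffs = [secret, 987654321, 555]   # degree-2 polynomial; random in practice

user_keys = {uid: (uid, f(uid, coeffs)) for uid in (1, 2, 3)}

def trace(leaked_key):
    """Identify the owner of a leaked key by its evaluation point."""
    for uid, key in user_keys.items():
        if key == leaked_key:
            return uid
    return None

print(trace(user_keys[2]))          # -> 2: revoke user 2's share
```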
Funding: Supported in part by the National Natural Science Foundation of China (62125306), the Zhejiang Key Research and Development Project (2024C01163), and the State Key Laboratory of Industrial Control Technology, China (ICT2024A06).
Abstract: In recent decades, control performance monitoring (CPM) has made remarkable progress in research and industrial applications. While CPM research has been investigated using various benchmarks, the historical data benchmark (HIS) has garnered the most attention due to its practicality and effectiveness. However, existing CPM reviews usually focus on the theoretical benchmark, and an in-depth review that thoroughly explores HIS-based methods is lacking. In this article, a comprehensive overview of HIS-based CPM is provided. First, we provide a novel static-dynamic perspective on the data-level manifestations of control performance underlying typical controller capacities, including regulation and servo: the static property portrays time-independent variability in the system output, while the dynamic property describes temporal behavior driven by closed-loop feedback. Accordingly, existing HIS-based CPM approaches and their intrinsic motivations are classified and analyzed from these two perspectives. Specifically, two mainstream solutions for CPM methods are summarized, static analysis and dynamic analysis, which match data-driven techniques with actual control behavior. Furthermore, this paper points out various opportunities and challenges faced by CPM in modern industry and provides promising directions in the context of artificial intelligence to inspire future research.
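To ground the static-dynamic distinction on historical loop data, here is a toy sketch of one descriptor of each kind: a static one (variability of the output about its setpoint) and a dynamic one (how slowly the error autocorrelation decays, reflecting closed-loop temporal behavior). These indices are our own illustration, not a method from the surveyed literature.

```python
import numpy as np

rng = np.random.default_rng(1)
setpoint = 10.0
# Simulated closed-loop record: AR(1) error around the setpoint.
e = np.zeros(2000)
for k in range(1, len(e)):
    e[k] = 0.8 * e[k - 1] + rng.normal(scale=0.2)
y = setpoint + e

static_index = np.var(y - setpoint)            # time-independent variability

err = y - setpoint
acf1 = np.corrcoef(err[:-1], err[1:])[0, 1]    # lag-1 autocorrelation
print(f"static variance: {static_index:.3f}, lag-1 ACF: {acf1:.2f}")
# A large lag-1 ACF signals sluggish dynamic behavior left in the loop.
```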
Funding: Supported by the Fifth Batch of Innovation Teams of Wuzhou Meteorological Bureau ("Wuzhou Innovation Team for Enhancing the Comprehensive Meteorological Observation Ability through Digitization and Intelligence") and the Wuzhou Science and Technology Planning Project (202402122, 202402119).
Abstract: [Objective] In response to the insufficient integrity of hourly routine meteorological element data files, this paper aims to improve the availability and reliability of data files and provide high-quality data file support for meteorological forecasting and services. [Method] An efficient and accurate method for data file quality control and fusion processing is developed. By locating the times of missing measurements, data are extracted from the "AWZ.db" database and the minute routine meteorological element data file, and merged into the hourly routine meteorological element data file. [Result] Data processing efficiency and accuracy are significantly improved, and the problem of incomplete hourly routine meteorological element data files is solved. The method also emphasizes the importance of ensuring the accuracy of the files used and of carefully checking and verifying the fusion results, and proposes strategies to improve data quality. [Conclusion] This method provides convenience for observation personnel and effectively improves the integrity and accuracy of data files. In the future, it is expected to provide more reliable data support for meteorological forecasting and services.
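The locate-extract-merge flow can be sketched as follows: find hourly timestamps missing from the hourly file, then pull replacements from the database. The table and column names are invented (the abstract does not give AWZ.db's schema), and an in-memory sqlite3 database stands in for the real file.

```python
# Gap-filling sketch: detect missing hours, fetch them, merge them in.
import sqlite3
from datetime import datetime, timedelta

hourly = {                      # stand-in for the hourly element file
    datetime(2024, 5, 1, 0): {"t": 21.3},
    datetime(2024, 5, 1, 2): {"t": 20.9},   # 01:00 is missing
}

start, end = datetime(2024, 5, 1, 0), datetime(2024, 5, 1, 2)
expected = [start + timedelta(hours=h)
            for h in range(int((end - start).total_seconds() // 3600) + 1)]
missing = [ts for ts in expected if ts not in hourly]

con = sqlite3.connect(":memory:")           # stands in for AWZ.db
con.execute("CREATE TABLE obs (ts TEXT PRIMARY KEY, t REAL)")
con.execute("INSERT INTO obs VALUES ('2024-05-01 01:00:00', 21.1)")

for ts in missing:
    row = con.execute("SELECT t FROM obs WHERE ts = ?",
                      (ts.strftime("%Y-%m-%d %H:%M:%S"),)).fetchone()
    if row is not None:
        hourly[ts] = {"t": row[0]}          # merge into the hourly file

print(sorted(hourly))                       # all three hours now present
```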
Funding: Funded by the Beijige Fund of the Nanjing Joint Institute for Atmospheric Sciences (BJG202501); the Joint Research Project for Meteorological Capacity Improvement (22NLTSY009); the Key Scientific Research Projects of the Jiangsu Provincial Meteorological Bureau (KZ202203); China Meteorological Administration projects (CMAJBGS202316); and the Guiding Research Projects of the Jiangsu Provincial Meteorological Bureau (ZD202404, ZD202419).
Abstract: In order to further enhance the numerical application of weather radar radial velocity, this paper proposes a quality control scheme for weather radar radial velocity from the perspective of data assimilation. The proposed scheme is based on the WRFDA (Weather Research and Forecasting Data Assimilation) system and utilizes the biweight algorithm to perform quality control on weather radar radial velocity data. A series of quality control tests conducted over the course of one month demonstrate that the scheme can be seamlessly integrated into the data assimilation process. The scheme is characterized by its simplicity, fast implementation, and ease of maintenance. With an appropriate threshold for quality control, the percentage of outliers identified by the scheme remains highly stable over time. Moreover, the mean errors and standard deviations of the O-B (observation-minus-background) values are significantly reduced, improving the overall data quality, while the main information and spatial distribution features of the data are preserved effectively. After quality control, the distribution of the O-B probability density function is brought closer to a Gaussian distribution, which is beneficial for the subsequent data assimilation process and contributes to more accurate numerical weather predictions. Thus, the proposed quality control scheme provides a valuable tool for improving weather radar data quality and enhancing numerical forecasting performance.
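The biweight algorithm itself is standard, so a minimal sketch is possible: compute robust location and scale with Tukey's biweight, then flag values whose standardized distance exceeds a threshold. The tuning constant (c = 7.5) and the rejection threshold are conventional choices, not necessarily the scheme's, and the real WRFDA procedure operates on O-B departures rather than raw velocities.

```python
import numpy as np

def biweight_stats(x, c=7.5):
    """Tukey biweight estimates of location and scale."""
    m = np.median(x)
    mad = np.median(np.abs(x - m)) or 1e-12
    u = (x - m) / (c * mad)
    mask = np.abs(u) < 1.0                   # points inside the tuning radius
    xm, um = x[mask] - m, u[mask]
    loc = m + np.sum(xm * (1 - um**2) ** 2) / np.sum((1 - um**2) ** 2)
    num = len(x) * np.sum(xm**2 * (1 - um**2) ** 4)
    den = np.sum((1 - um**2) * (1 - 5 * um**2)) ** 2
    return loc, np.sqrt(num / den)

rng = np.random.default_rng(0)
vr = rng.normal(5.0, 2.0, 500)               # simulated radial velocities
vr[:5] = [40.0, -35.0, 38.0, -42.0, 39.0]    # injected outliers

loc, scale = biweight_stats(vr)
z = np.abs(vr - loc) / scale
flagged = np.where(z > 5.0)[0]               # QC rejection threshold
print(flagged)                               # -> indices 0..4
```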
Funding: Project (No. 2006AA01Z430) supported by the National High-Tech Research and Development Program (863) of China.
Abstract: Fine-grained access control (FGAC) must be supported by relational databases to satisfy the requirements of privacy preserving and Internet-based applications. Though much work on FGAC models has been conducted, there are still a number of open problems. We propose a new FGAC model that supports the specification of open access control policies as well as closed access control policies in relational databases. Negative authorization is supported, which allows the security administrator to specify what data should not be accessed by certain users. Moreover, multiple policies defined to jointly regulate user access are also supported, and the definition and combination algorithm of multiple policies are provided. Finally, we implement the proposed FGAC model as a component of the database management system (DBMS) and evaluate its performance. The performance results show that the proposed model is feasible.
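The paper's combination algorithm is not given in the abstract, so the sketch below shows one common way such pieces fit together: explicit denials override grants (negative authorization), and the policy's default, open versus closed, decides anything left unmatched. Names and the deny-overrides rule are our assumptions.

```python
# Toy column-level policy evaluation with negative authorization.
GRANT, DENY = "grant", "deny"

def evaluate(policies, user, column, default="closed"):
    """Each policy: (user, column, GRANT|DENY). Deny overrides grant."""
    decisions = [effect for (u, col, effect) in policies
                 if u == user and col == column]
    if DENY in decisions:
        return False                 # negative authorization wins
    if GRANT in decisions:
        return True
    return default == "open"         # open policy permits by default

policies = [
    ("alice", "salary", GRANT),
    ("alice", "salary", DENY),       # security administrator's denial
    ("bob",   "salary", GRANT),
]
print(evaluate(policies, "alice", "salary"))               # False
print(evaluate(policies, "bob", "salary"))                 # True
print(evaluate(policies, "bob", "ssn", default="closed"))  # False
```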
Abstract: This article takes current computer monitoring and control as its research direction, studying the application strategies of artificial intelligence and big data technology in this field. It includes an introduction to artificial intelligence and big data technology, the application strategies of these technologies in computer hardware, software, and network monitoring, as well as their application strategies in computer process, access, and network control. This analysis aims to serve as a reference for the application of artificial intelligence and big data technology in computer monitoring and control, ultimately enhancing the security of computer systems.
Funding: Supported by the National Science Foundation of China (62263020); the Key Project of the Natural Science Foundation of Gansu Province (25JRRA061); the Key R&D Program of Gansu Province (23YFGA0061); and the Scientific Research Initiation Fund of Lanzhou University of Technology (061602).
Abstract: In the production processes of modern industry, accurate assessment of a system's health state and the traceability of non-optimal factors are key to ensuring "safe, stable, long-term, full-load and optimal" operation of the production process. The benzene-to-ethylene ratio control system is a complex system based on an MPC-PID double-layer architecture. Taking into consideration the interaction between levels, the coupling between loops, and conditions of incomplete operation data, this paper proposes a health assessment method for the double-layer control system that comprehensively utilizes deep learning technology. First, according to the results of a pre-assessment of the system's layers and loops by multivariate statistical methods, seven characteristic parameters that have a significant impact on the health state of the system are identified. Next, to address the incompleteness of the assessment data set caused by the uneven distribution of actual operating health states, the original unbalanced dataset is augmented using a Wasserstein generative adversarial network with a gradient penalty term, yielding a complete dataset that characterizes all the health states of the system. On this basis, a new deep-learning-based health assessment framework for the benzene-to-ethylene ratio control system is constructed on top of traditional multivariate statistical assessment. This framework overcomes the shortcomings of linear weighted fusion related to the coupling and nonlinearity of subsystem health states at different layers, and reduces the dependence on prior knowledge. Furthermore, by introducing a dynamic attention mechanism (AM) into the convolutional neural network (CNN), an assessment model integrating both assessment and traceability is constructed, which can achieve health assessment and trace the non-optimal factors of complex control systems with the double-layer architecture. Finally, the effectiveness and superiority of the proposed method are verified on the benzene-to-ethylene ratio control system of the alkylation process unit in a styrene plant.
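The WGAN-GP augmentation step has a standard core, the gradient penalty, which the sketch below shows assuming PyTorch. The critic network, its sizes, the penalty weight, and the 7-feature samples (matching the seven characteristic parameters) are illustrative stand-ins, not the paper's architecture.

```python
# WGAN-GP gradient penalty on interpolates between real and fake samples.
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize the critic's gradient norm deviating from 1 on x_hat."""
    eps = torch.rand(real.size(0), 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(x_hat)
    grads, = torch.autograd.grad(score.sum(), x_hat, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(
    torch.nn.Linear(7, 32), torch.nn.LeakyReLU(), torch.nn.Linear(32, 1))

real = torch.randn(16, 7)       # 7 characteristic parameters per sample
fake = torch.randn(16, 7)       # generator output stand-in
loss = (critic(fake).mean() - critic(real).mean()
        + gradient_penalty(critic, real, fake))
loss.backward()                 # one critic update step would follow
print(float(loss))
```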
Abstract: This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria for each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests based on a social network dataset show that the linked data ranking is effective and easier for people to understand. This approach benefits from utilizing relationships deduced from mapping rules based on table schemas and from distinguishing the relationship types, which results in better ranking and visualization of the linked data.
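The two-stage idea can be sketched directly: group edges by relationship type, rank the types (here simply by frequency), then rank instances within each type (here by in-degree). The paper's actual criteria also weigh mapping rules and graph structure, which this toy omits.

```python
# Type-then-instance ranking over a toy linked-data edge list.
from collections import Counter, defaultdict

edges = [  # (subject, relationship type, object)
    ("ann", "follows", "bob"), ("cid", "follows", "bob"),
    ("ann", "follows", "cid"), ("bob", "worksWith", "cid"),
]

by_type = defaultdict(list)
for s, rel, o in edges:
    by_type[rel].append((s, o))

# Stage 1: rank relationship types by how often they occur.
type_rank = sorted(by_type, key=lambda r: len(by_type[r]), reverse=True)

# Stage 2: within each type, rank instances by in-degree.
for rel in type_rank:
    indeg = Counter(o for _, o in by_type[rel])
    print(rel, indeg.most_common())
# follows [('bob', 2), ('cid', 1)]
# worksWith [('cid', 1)]
```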
Funding: Project supported by the National Natural Science Foundation of China (No. 62103103) and the Natural Science Foundation of Jiangsu Province, China (No. BK20210223).
Abstract: The main aim of this work is to design a non-fragile sampled-data control (NFSDC) scheme for the asymptotic synchronization of interconnected coupled circuit systems (multi-agent systems, MASs). NFSDC is used to conduct the synchronization analysis of the considered MASs in the presence of time-varying delays. By constructing suitable Lyapunov functions, sufficient conditions are derived in terms of linear matrix inequalities (LMIs) to ensure synchronization between the MAS leader and follower systems. Finally, two numerical examples are given to show the effectiveness of the proposed control scheme and the reduced conservatism of the proposed Lyapunov functions.
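To show the LMI machinery behind such synchronization criteria, here is a minimal feasibility check: verify a toy error system x' = Ax is asymptotically stable by finding P > 0 with AᵀP + PA < 0, using cvxpy. The paper's actual LMIs involve delays, sampling, and non-fragile gain perturbations and are substantially larger; the matrix A and tolerance below are illustrative.

```python
# Lyapunov LMI feasibility via semidefinite programming (cvxpy).
import cvxpy as cp
import numpy as np

A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])         # illustrative error dynamics

n = A.shape[0]
P = cp.Variable((n, n), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(n),                  # P positive definite
               A.T @ P + P @ A << -eps * np.eye(n)]   # decrease condition
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()

print(prob.status)                  # 'optimal' -> the LMI is feasible
print(np.round(P.value, 3))         # a Lyapunov certificate
```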