With the rapid development of lnternet technology, the volume of data has increased exponentially. As the large amounts of data are no longer easy to be managed and secured by the owners, big data security and privacy...With the rapid development of lnternet technology, the volume of data has increased exponentially. As the large amounts of data are no longer easy to be managed and secured by the owners, big data security and privacy has become a hot issue. One of the most popular research fields for solving the data security and data privacy is within the scope of big data governance and security, In this paper, we introduce the basic concepts of data governance and security. Then, all the state-of-the-art open source frameworks for data governance and security, including Apache Falcon, Apache Atlas, Apache Ranger, Apache Sentry and Kerberos, are detailed and discussed with descriptions of their implementation principles and possible applications.展开更多
As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, ...As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.展开更多
This article applies open source data of public facilities through data mining, not only to evaluate the public facilities from an objective dimension, but also to reflect the sensory opinions of the group factually, ...This article applies open source data of public facilities through data mining, not only to evaluate the public facilities from an objective dimension, but also to reflect the sensory opinions of the group factually, eventually realizing the evaluation measurement of urban public facilities. The research takes Shenzhen city as an empirical case and chooses typical public facilities to mine data, resolve address and weight to explore the application of public facilities evaluation under dimension reduction of open source data. The empirical study consists of three parts. first, as the objective evaluation, we estimate the density distribution and per capita of public facility through data mining and address resolution. Second, as the subjective evaluation, we carry on the location analysis to high-score public facility through attention and satisfaction data of Internet evaluation. finally, as mentioned above, we calculate the weight of objective and subjective evaluation of public facility, eventually formatting the comprehensive evaluation of public facilities.展开更多
The bug tracking system is well known as the project support tool of open source software. There are many categorical data sets recorded on the bug tracking system. In the past, many reliability assessment methods hav...The bug tracking system is well known as the project support tool of open source software. There are many categorical data sets recorded on the bug tracking system. In the past, many reliability assessment methods have been proposed in the research area of software reliability. Also, there are several software project analyses based on the software effort data such as the earned value management. In particular, the software reliability growth models can </span><span style="font-family:Verdana;">apply to the system testing phase of software development. On the other</span><span style="font-family:Verdana;"> hand, the software effort analysis can apply to all development phase, because the fault data is only recorded on the testing phase. We focus on the big fault data and effort data of open source software. Then, it is difficult to assess by using the typical statistical assessment method, because the data recorded on the bug tracking system is large scale. Also, we discuss the jump diffusion process model based on the estimation method of jump parameters by using the discriminant analysis. Moreover, we analyze actual big fault data to show numerical examples of software effort assessment considering many categorical data set.展开更多
基于关系型数据库的空间数据存储与处理是地理信息系统(geographic information system,GIS)领域的主流模式,但伴随着物联网、移动互联网、云计算及空间数据采集技术的发展,空间数据已从海量特征转变为大数据特征,对空间数据的存储和管...基于关系型数据库的空间数据存储与处理是地理信息系统(geographic information system,GIS)领域的主流模式,但伴随着物联网、移动互联网、云计算及空间数据采集技术的发展,空间数据已从海量特征转变为大数据特征,对空间数据的存储和管理在数据量和处理模式上提出了新的挑战。首先分析了基于传统的集中式存储与管理模式在处理和应用大数据方面的局限性,包括存储对象的适应性、存储能力的可扩展性及高并发处理能力要求;然后在分析当前几大主流NoSQL数据库特点的基础上,指出了空间大数据基于NoSQL数据库的单一存储模式在数据操作方式、查询方式和数据高效管理方面存在的局限性;最后结合GIS领域空间大数据存储对数据库存储能力的可扩展性及数据处理和访问的高并发要求,提出基于内存数据库和NoSQL数据库的空间大数据分布式存储与综合处理策略,并开发了原型系统对提出的存储策略进行可行性和有效性进行了验证。展开更多
传统关系数据库能够很好地支持结构化数据的存储和管理,且具有完备的数学理论、完善的事务管理机制和高效的查询处理引擎,因此得到了广泛应用。但随着大数据时代的到来,传统关系数据库无法满足各种类型的非结构化数据的大规模存储和高...传统关系数据库能够很好地支持结构化数据的存储和管理,且具有完备的数学理论、完善的事务管理机制和高效的查询处理引擎,因此得到了广泛应用。但随着大数据时代的到来,传统关系数据库无法满足各种类型的非结构化数据的大规模存储和高效处理需求,因此出现了NoSQL(Not only SQL)数据库。本文首先对二者进行了介绍,然后又从多个方面进行了比较和分析。展开更多
There is a great thrust in industry toward the development of more feasible and viable tools for storing fast-growing volume, velocity, and diversity of data, termed 'big data'. The structural shift of the storage m...There is a great thrust in industry toward the development of more feasible and viable tools for storing fast-growing volume, velocity, and diversity of data, termed 'big data'. The structural shift of the storage mechanism from traditional data management systems to NoSQL technology is due to the intention of fulfilling big data storage requirements. However, the available big data storage technologies are inefficient to provide consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the preliminary process of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. So far, Amazon, Google, and Apache are some of the industry standards in providing big data storage solutions, yet the literature does not report an in-depth survey of storage technologies available for big data, investigating the performance and magnitude gains of these technologies. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to assist data analysts and researchers in understanding and selecting a storage mecha- nism that better fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the ex- isling approaches using Brewer's CAP theorem. The significance and applications of storage technologies and support to other categories are discussed. Several future research challenges are highlighted with the intention to expedite the deployment of a reliable and scalable storage system.展开更多
文摘With the rapid development of lnternet technology, the volume of data has increased exponentially. As the large amounts of data are no longer easy to be managed and secured by the owners, big data security and privacy has become a hot issue. One of the most popular research fields for solving the data security and data privacy is within the scope of big data governance and security, In this paper, we introduce the basic concepts of data governance and security. Then, all the state-of-the-art open source frameworks for data governance and security, including Apache Falcon, Apache Atlas, Apache Ranger, Apache Sentry and Kerberos, are detailed and discussed with descriptions of their implementation principles and possible applications.
文摘As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.
基金This work was funded by National Natural Science Foundation of China(grant numbers. 51478189 and 51308220)%Natural Science Foundation of Guangdong(grant number 2014A03031326)%Fundamental Research Funds for the Central Universities(grant number 2015ZZ0022)
文摘This article applies open source data of public facilities through data mining, not only to evaluate the public facilities from an objective dimension, but also to reflect the sensory opinions of the group factually, eventually realizing the evaluation measurement of urban public facilities. The research takes Shenzhen city as an empirical case and chooses typical public facilities to mine data, resolve address and weight to explore the application of public facilities evaluation under dimension reduction of open source data. The empirical study consists of three parts. first, as the objective evaluation, we estimate the density distribution and per capita of public facility through data mining and address resolution. Second, as the subjective evaluation, we carry on the location analysis to high-score public facility through attention and satisfaction data of Internet evaluation. finally, as mentioned above, we calculate the weight of objective and subjective evaluation of public facility, eventually formatting the comprehensive evaluation of public facilities.
文摘The bug tracking system is well known as the project support tool of open source software. There are many categorical data sets recorded on the bug tracking system. In the past, many reliability assessment methods have been proposed in the research area of software reliability. Also, there are several software project analyses based on the software effort data such as the earned value management. In particular, the software reliability growth models can </span><span style="font-family:Verdana;">apply to the system testing phase of software development. On the other</span><span style="font-family:Verdana;"> hand, the software effort analysis can apply to all development phase, because the fault data is only recorded on the testing phase. We focus on the big fault data and effort data of open source software. Then, it is difficult to assess by using the typical statistical assessment method, because the data recorded on the bug tracking system is large scale. Also, we discuss the jump diffusion process model based on the estimation method of jump parameters by using the discriminant analysis. Moreover, we analyze actual big fault data to show numerical examples of software effort assessment considering many categorical data set.
文摘基于关系型数据库的空间数据存储与处理是地理信息系统(geographic information system,GIS)领域的主流模式,但伴随着物联网、移动互联网、云计算及空间数据采集技术的发展,空间数据已从海量特征转变为大数据特征,对空间数据的存储和管理在数据量和处理模式上提出了新的挑战。首先分析了基于传统的集中式存储与管理模式在处理和应用大数据方面的局限性,包括存储对象的适应性、存储能力的可扩展性及高并发处理能力要求;然后在分析当前几大主流NoSQL数据库特点的基础上,指出了空间大数据基于NoSQL数据库的单一存储模式在数据操作方式、查询方式和数据高效管理方面存在的局限性;最后结合GIS领域空间大数据存储对数据库存储能力的可扩展性及数据处理和访问的高并发要求,提出基于内存数据库和NoSQL数据库的空间大数据分布式存储与综合处理策略,并开发了原型系统对提出的存储策略进行可行性和有效性进行了验证。
文摘传统关系数据库能够很好地支持结构化数据的存储和管理,且具有完备的数学理论、完善的事务管理机制和高效的查询处理引擎,因此得到了广泛应用。但随着大数据时代的到来,传统关系数据库无法满足各种类型的非结构化数据的大规模存储和高效处理需求,因此出现了NoSQL(Not only SQL)数据库。本文首先对二者进行了介绍,然后又从多个方面进行了比较和分析。
文摘There is a great thrust in industry toward the development of more feasible and viable tools for storing fast-growing volume, velocity, and diversity of data, termed 'big data'. The structural shift of the storage mechanism from traditional data management systems to NoSQL technology is due to the intention of fulfilling big data storage requirements. However, the available big data storage technologies are inefficient to provide consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the preliminary process of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. So far, Amazon, Google, and Apache are some of the industry standards in providing big data storage solutions, yet the literature does not report an in-depth survey of storage technologies available for big data, investigating the performance and magnitude gains of these technologies. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to assist data analysts and researchers in understanding and selecting a storage mecha- nism that better fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the ex- isling approaches using Brewer's CAP theorem. The significance and applications of storage technologies and support to other categories are discussed. Several future research challenges are highlighted with the intention to expedite the deployment of a reliable and scalable storage system.