Funding: Projects (61363021, 61540061, 61663047) supported by the National Natural Science Foundation of China; Project (2017SE206) supported by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, China
Abstract: Due to the increasing number of cloud applications, the amount of data in the cloud is growing faster than ever before. Cloud computing therefore requires data processing systems that can handle huge volumes of data with high performance. However, most cloud storage systems currently adopt a hash-like approach to data retrieval that supports only simple keyword-based enquiries and lacks richer forms of information search. A scalable and efficient indexing scheme is therefore required. In this paper, we present a skip list-based cloud index, called SLC-index, a novel and scalable indexing scheme for cloud data processing. The SLC-index offers a two-layered architecture that extends the indexing scope and improves throughput. Dynamic load balancing is achieved by online migration of index nodes between servers. The system is also flexible, because servers can be added and removed dynamically. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.
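To make the point-query and range-query claim concrete, here is a minimal sketch of an ordinary single-machine skip list in Python. It is not the SLC-index itself: the two-layered architecture and the migration of index nodes between servers are not shown, and the names `SkipList`, `get`, and `range` are illustrative assumptions rather than anything defined in the paper.

```python
import random


class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        # forward[i] is the next node on level i (level 0 is the full list).
        self.forward = [None] * level


class SkipList:
    """Plain in-memory skip list (illustrative; not the SLC-index)."""

    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.level = 1
        self.head = Node(None, None, self.MAX_LEVEL)

    def _random_level(self):
        # Each extra level is granted with probability p.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.p:
            level += 1
        return level

    def _find_predecessors(self, key):
        # For every level, the last node whose key is strictly below `key`.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key):
        # Point query: descend the levels, then check the level-0 successor.
        node = self._find_predecessors(key)[0].forward[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def range(self, low, high):
        # Range query: locate `low`, then walk level 0 until past `high`.
        node = self._find_predecessors(low)[0].forward[0]
        result = []
        while node is not None and node.key <= high:
            result.append((node.key, node.value))
            node = node.forward[0]
        return result
```

Because the bottom level is a sorted linked list, a range query costs one search plus a linear walk over the matching keys, which is the property a hash-based lookup lacks.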
Funding: National Basic Research Program of China (973 Program) (2005CD312904)
Abstract: To address the problem of workflow data consistency in a distributed environment, an invalidation strategy based on a timely updated record list is put forward. By maintaining an update record list and a recovery mechanism for update messages, the strategy improves on the classical invalidation strategy. When the request cycle of a replica is too long, the strategy uses the record list to pause the sending of update messages to that replica; when the long-cycle replica is requested again, the recovery mechanism resumes the update messages. The strategy not only ensures the consistency of workflow data but also reduces unnecessary network traffic. A theoretical comparison with common strategies shows that the unnecessary network traffic of this strategy is lower and more stable. Simulation results validate this conclusion.
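The pause-and-resume behaviour described above can be sketched roughly as follows. This is one possible reading of the abstract, not the authors' algorithm: the class names, the `max_idle` threshold, and the use of a version counter are all assumptions introduced for illustration.

```python
class ReplicaRecord:
    """One entry of the record list (hypothetical structure)."""

    def __init__(self, last_request):
        self.last_request = last_request
        self.paused = False


class InvalidationServer:
    def __init__(self, max_idle):
        self.max_idle = max_idle  # assumed threshold for a "too long" request cycle
        self.version = 0
        self.records = {}         # replica id -> ReplicaRecord
        self.sent = []            # log of (replica id, version) messages sent

    def request(self, replica_id, now):
        # A request refreshes the record and resumes messages if they were paused.
        record = self.records.get(replica_id)
        if record is None:
            self.records[replica_id] = ReplicaRecord(now)
        else:
            record.last_request = now
            record.paused = False
        return self.version  # the requester always receives the current version

    def update(self, now):
        # On a data update, notify only replicas that requested recently.
        self.version += 1
        for replica_id, record in self.records.items():
            if now - record.last_request > self.max_idle:
                record.paused = True  # stop sending until it asks again
                continue
            self.sent.append((replica_id, self.version))
```

In this sketch a paused replica misses intermediate messages but obtains the current version on its next request, which is how consistency is preserved while idle replicas generate no traffic.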
Abstract: A new data access method is introduced that effectively addresses the problem of high-speed, real-time reading of nuclear instrument data in a small storage space. The method applies the "linked list" data storage mode to a Micro Control Unit (MCU) system and realizes pointer access to nuclear data within the small storage space of the MCU. Experimental results show that the method solves some problems of the traditional data storage method and offers simple program design, stable performance, accurate data, strong repeatability, and savings in storage space.
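As a rough illustration of a linked list held in a small fixed storage area, the sketch below keeps all nodes in preallocated arrays and uses integer slot indices in place of pointers. It is written in Python only to show the idea; real MCU firmware would typically be written in C, and the class `PooledLinkedList` and its methods are hypothetical, not taken from the paper.

```python
NIL = -1  # plays the role of a null pointer


class PooledLinkedList:
    """FIFO linked list stored in a fixed pool of slots (illustrative)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.data = [0] * capacity
        # Initially every slot is on the free list: 0 -> 1 -> ... -> NIL.
        self.next = list(range(1, capacity)) + [NIL]
        self.free = 0
        self.head = NIL
        self.tail = NIL

    def append(self, value):
        # Take a slot from the free list; fail if the pool is exhausted.
        if self.free == NIL:
            return False
        slot = self.free
        self.free = self.next[slot]
        self.data[slot] = value
        self.next[slot] = NIL
        if self.tail == NIL:
            self.head = slot
        else:
            self.next[self.tail] = slot
        self.tail = slot
        return True

    def pop_front(self):
        # Remove the oldest value and return its slot to the free list.
        if self.head == NIL:
            return None
        slot = self.head
        value = self.data[slot]
        self.head = self.next[slot]
        if self.head == NIL:
            self.tail = NIL
        self.next[slot] = self.free
        self.free = slot
        return value

    def __iter__(self):
        slot = self.head
        while slot != NIL:
            yield self.data[slot]
            slot = self.next[slot]
```

No memory is allocated after construction, so the storage footprint is fixed in advance, and freed slots are reused immediately by later appends.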
Abstract: Several studies have shown that species lists generated from distribution occurrence records in the Global Biodiversity Information Facility (GBIF) are generally very incomplete and are therefore not appropriate for ecological and biogeographic studies that require high sampling completeness. Nevertheless, Suissa et al. (2021) generated fern species lists from GBIF data for 100 km × 100 km grid cells across the world and used them to determine fern diversity hotspots and species richness-climate relationships. We evaluate the completeness of fern species lists derived from GBIF at the grid-cell scale and at a larger spatial scale, and determine whether fern data derived from GBIF are appropriate for studies on the relations of species composition and richness with climatic variables. We show that the species sampling completeness of GBIF is low (<40%) for most of the grid cells examined, and that such low sampling completeness can substantially bias the investigation of geographic and ecological patterns of species diversity and the identification of diversity hotspots. We conclude that fern species lists derived from GBIF are generally very incomplete across a wide range of spatial scales and are not appropriate for studies that require highly complete species lists. We present a map showing global patterns of fern species diversity based on complete or nearly complete regional fern species lists.
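One simple way to quantify sampling completeness for a region is the share of a reference species list that also appears in the GBIF-derived list. The abstract does not state the exact formula used, so the function below is an assumed definition, and the species names in the usage note are placeholders rather than real data.

```python
def sampling_completeness(gbif_species, reference_species):
    """Fraction of the reference species list recovered in the GBIF-derived list.

    Assumed definition for illustration; the paper may compute it differently.
    """
    reference = set(reference_species)
    if not reference:
        raise ValueError("reference species list must not be empty")
    recovered = set(gbif_species) & reference
    return len(recovered) / len(reference)
```

For example, a grid cell whose GBIF list contains two of the five species in its reference list (`{"sp1", "sp2"}` against `{"sp1", "sp2", "sp3", "sp4", "sp5"}`) has a completeness of 2/5.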
Abstract: In longitudinal studies, some subjects often complete their follow-up visits while others miss visits for various reasons. Some of those who miss follow-up visits may learn, when they come back, that the event of interest has already happened. In this case, not only are their event times interval-censored, but their time-dependent measurements are also incomplete. This problem was motivated by data from a national longitudinal survey of youth. A maximum likelihood estimation (MLE) method based on the expectation-maximization (EM) algorithm is used for parameter estimation. The missing information principle is then applied to estimate the variance-covariance matrix of the MLEs. Simulation studies demonstrate that the proposed method works well in terms of bias, standard error, and power for samples of moderate size. Data from the National Longitudinal Survey of Youth 1997 (NLSY97) are analyzed for illustration.