Journal articles
2 articles found
Density estimation-based method to determine sample size for random sample partition of big data
1
Authors: Yulin HE, Jiaqi CHEN, Jiaxing SHEN, Philippe FOURNIER-VIGER, Joshua Zhexue HUANG. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2024, No. 5, pp. 57-70 (14 pages)
Random sample partition (RSP) is a newly developed big data representation and management model for big data approximate computation problems. Academic research and practical applications have confirmed that RSP is an efficient solution for big data processing and analysis. However, a challenge in implementing RSP is determining an appropriate sample size for RSP data blocks: a large sample size increases the burden of big data computation, while a small one leaves RSP data blocks with insufficient distribution information. To address this problem, this paper presents a novel density estimation-based method (DEM) to determine the optimal sample size for RSP data blocks. First, a theoretical sample size is calculated from the multivariate Dvoretzky-Kiefer-Wolfowitz (DKW) inequality using the fixed-point iteration (FPI) method. Second, a practical sample size is determined by minimizing the validation error of a kernel density estimator (KDE) constructed on RSP data blocks as the sample size increases. Finally, a series of experiments is conducted to validate the feasibility, rationality, and effectiveness of DEM. Experimental results show that (1) the iteration function of the FPI method converges when calculating the theoretical sample size from the multivariate DKW inequality; (2) a KDE constructed on RSP data blocks whose sample size is determined by DEM yields a good approximation of the probability density function (p.d.f.); and (3) DEM provides more accurate sample sizes than existing sample size determination methods from the perspective of p.d.f. estimation. This demonstrates that DEM is a viable approach to the sample size determination problem in big data RSP implementation.
Keywords: random sample partition; big data; sample size; Dvoretzky-Kiefer-Wolfowitz inequality; kernel density estimator; probability density function
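The abstract describes DEM's two steps without giving formulas. The sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation. For the theoretical step it assumes the multivariate DKW inequality takes the explicit form P(sup|F_n - F| > ε) ≤ d(n+1)e^(-2nε²) for d-dimensional data; with n both inside and outside the exponent, the bound has no elementary closed-form solution for n, which is where a fixed-point iteration n = ln(d(n+1)/δ)/(2ε²) fits. For the practical step it assumes the "validation error" is the mean negative log-likelihood of held-out points under a 1-D Gaussian KDE with a normal-reference bandwidth. All function names (`dkw_bound`, `dkw_sample_size`, `practical_sample_size`, and so on) and parameters (`eps`, `delta`, `candidates`) are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def dkw_bound(n: float, d: int, eps: float) -> float:
    """ASSUMED multivariate DKW tail bound: P(sup|F_n - F| > eps) <= d(n+1)exp(-2n eps^2)."""
    return d * (n + 1) * math.exp(-2.0 * n * eps * eps)


def dkw_sample_size(d: int, eps: float, delta: float,
                    tol: float = 1e-9, max_iter: int = 1000) -> int:
    """Theoretical sample size: smallest integer n with dkw_bound(n) <= delta
    (up to the iteration tolerance).

    Setting the bound equal to delta and isolating n in the exponent gives the
    fixed-point equation n = g(n) = ln(d(n+1)/delta) / (2 eps^2), iterated from n = 1.
    Assumes d >= 1 and 0 < delta < 1.
    """
    n = 1.0
    for _ in range(max_iter):
        n_next = math.log(d * (n + 1.0) / delta) / (2.0 * eps * eps)
        if abs(n_next - n) < tol:
            break
        n = n_next
    return math.ceil(n_next)


def gaussian_kde_1d(sample: Sequence[float]) -> Callable[[float], float]:
    """1-D Gaussian KDE with the normal-reference bandwidth h = 1.06 * sd * n^(-1/5)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    h = 1.06 * sd * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

    return pdf


def validation_nll(pdf: Callable[[float], float], validation: Sequence[float]) -> float:
    """Mean negative log-likelihood of held-out points (ASSUMED validation error)."""
    return -sum(math.log(max(pdf(v), 1e-300)) for v in validation) / len(validation)


def practical_sample_size(data: Sequence[float], validation: Sequence[float],
                          candidates: Sequence[int]) -> Tuple[int, List[float]]:
    """Fit a KDE on the first n points for each candidate n; keep the n with lowest error.

    A prefix is a valid random sample only because `data` is assumed to be a
    random sample in random order, as an RSP data block is assumed to be here.
    """
    if max(candidates) > len(data):
        raise ValueError("candidate sample size exceeds the data block")
    errors = [validation_nll(gaussian_kde_1d(data[:n]), validation) for n in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors


# Usage on synthetic 1-D standard-normal data, purely for illustration.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1600)]
validation = [rng.gauss(0.0, 1.0) for _ in range(300)]
n_theory = dkw_sample_size(d=1, eps=0.05, delta=0.05)
n_best, errs = practical_sample_size(data, validation, [25, 100, 400, 1600])
print(n_theory, n_best, [round(e, 3) for e in errs])
```

Under the assumed bound, the iteration map g is increasing and lies above the identity until its unique fixed point, so starting from n = 1 the iterates rise monotonically toward it. This loosely parallels the convergence result the abstract reports, though the paper's actual iteration function may differ.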
A grid-based clustering algorithm for wild bird distribution (cited by 4)
2
Authors: Yuwei WANG, Yuanchun ZHOU, Ying LIU, Ze LUO, Danhuai GUO, Jing SHAO, Fei TAN, Liang WU, Jianhui LI, Baoping YAN. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2013, No. 4, pp. 475-485 (11 pages)
Advanced satellite tracking technologies provide biologists with long-term location sequence data for understanding the movement of wild birds and, in turn, for finding explicit correlations between the dynamics of migratory birds and the spread of avian influenza. In this paper, we propose a hierarchical clustering algorithm based on recursive grid partitioning and kernel density estimation (KDE) that hierarchically identifies wild bird habitats with different densities. The GPS data are clustered hierarchically to take two observations into account: 1) habitats vary across a range of geospatial scales; and 2) the activity patterns of birds vary spatially across the different stages of the migration cycle. In addition, we measure the site fidelity of wild birds based on the clustering. To assess effectiveness, we evaluated our system on a large-scale GPS dataset collected from 59 birds over three years. The results show that our approach identifies the hierarchical habitats and distribution of wild birds more efficiently than several commonly used algorithms, such as DBSCAN and DENCLUE.
Keywords: hierarchical clustering; bird migration; kernel density estimation; grid partition
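The abstract names recursive grid partitioning and KDE but gives no algorithmic detail. As a rough, hypothetical illustration of the grid-based idea only, the sketch below bins 2-D points into square cells, joins adjacent dense cells into clusters, and then re-clusters each cluster on a grid with half the cell size. It substitutes raw cell counts for the paper's KDE-based density and is not the authors' algorithm; `grid_clusters`, `hierarchical_grid_clusters`, `cell_size`, `min_count`, and `depth` are assumed names, and the points are synthetic planar coordinates rather than GPS latitude/longitude.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_clusters(points: List[Point], cell_size: float, min_count: int) -> List[List[Point]]:
    """One level of grid-based density clustering (hypothetical sketch).

    Points are binned into square cells of side `cell_size`; a cell is dense if it
    holds at least `min_count` points (a stand-in for a KDE density threshold), and
    dense cells that touch, including diagonally, form one cluster. Points in
    sparse cells are treated as noise.
    """
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    dense = {c for c, pts in cells.items() if len(pts) >= min_count}

    clusters: List[List[Point]] = []
    seen: Set[Cell] = set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members: List[Point] = []
        while queue:  # breadth-first search over adjacent dense cells
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(members)
    return clusters


def hierarchical_grid_clusters(points: List[Point], cell_size: float,
                               min_count: int, depth: int) -> List[dict]:
    """Recursively re-cluster every cluster on a grid with half the cell size."""
    tree = []
    for members in grid_clusters(points, cell_size, min_count):
        children = (hierarchical_grid_clusters(members, cell_size / 2, min_count, depth - 1)
                    if depth > 1 else [])
        tree.append({"size": len(members), "children": children})
    return tree


# Two tight groups near the origin plus one distant outlier (synthetic data).
group_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
group_b = [(1.5 + 0.1 * i, 1.5 + 0.1 * j) for i in range(5) for j in range(5)]
points = group_a + group_b + [(9.0, 9.0)]
tree = hierarchical_grid_clusters(points, cell_size=2.0, min_count=3, depth=3)
print(tree)
```

On this synthetic data the two coarser grids see a single 50-point cluster and discard the outlier as noise, while the finest grid separates it into two 25-point groups. This loosely mirrors the abstract's point that habitat structure depends on the geospatial scale at which it is examined.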