Abstract: We read with great interest Deng et al.'s study [1] comparing sextant (6-core) and 12-core systematic biopsy in the MRI-targeted era, which valuably challenges the "more cores = higher accuracy" dogma by proposing a precision sampling strategy based on prostate cancer's spatial distribution, aligning with personalized diagnosis trends.
Funding: the Ministry of Agriculture and Forestry key project "Puuta liikkeelle ja uusia tuotteita metsästä" ("Wood on the move and new products from forest") and the Academy of Finland (project numbers 295100, 306875).
Abstract: Background: The local pivotal method (LPM), which utilizes auxiliary data in sample selection, has recently been proposed as a sampling method for national forest inventories (NFIs). Its performance compared to simple random sampling (SRS) and LPM with geographical coordinates has produced promising results in simulation studies. In this simulation study we compared all these sampling methods to systematic sampling. The LPM samples were selected solely using the coordinates (LPMxy) or, in addition to that, auxiliary remote sensing-based forest variables (RS variables). We utilized field measurement data (NFI-field) and Multi-Source NFI (MS-NFI) maps as target data, and independent MS-NFI maps as auxiliary data. The designs were compared using relative efficiency (RE), the ratio of the mean squared error of the reference sampling design to that of the studied design. Applying a method in an NFI also requires a proven estimator for the variance. Therefore, three different variance estimators were evaluated against the empirical variance of replications: 1) an estimator corresponding to SRS; 2) a Grafström-Schelin estimator repurposed for LPM; and 3) a Matérn estimator applied in the Finnish NFI for the systematic sampling design. Results: The LPMxy was nearly comparable with the systematic design for most target variables. The REs of the LPM designs utilizing auxiliary data compared to the systematic design varied between 0.74 and 1.18, depending on the target variable. The SRS estimator for variance was, as expected, the most biased and conservative estimator. Similarly, the Grafström-Schelin estimator gave overestimates in the case of LPMxy. When the RS variables were utilized as auxiliary data, the Grafström-Schelin estimates tended to underestimate the empirical variance. In systematic sampling the Matérn and Grafström-Schelin estimators performed equally for practical purposes. Conclusions: LPM optimized for a specific variable tended to be more efficient than systematic sampling, but all of the considered LPM designs were less efficient than the systematic sampling design for some target variables. The Grafström-Schelin estimator could be used as such with LPMxy or instead of the Matérn estimator in systematic sampling. Further studies of the variance estimators are needed if other auxiliary variables are to be used in LPM.
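The relative efficiency used in the abstract above can be illustrated with a small simulation. The sketch below is not the authors' code: the population, the sample size, and the helper names `relative_efficiency` and `systematic_sample` are hypothetical, and SRS is taken as the reference design purely for illustration.

```python
import random
import statistics

def relative_efficiency(reference_estimates, studied_estimates, true_value):
    """RE = MSE(reference design) / MSE(studied design).

    RE > 1 means the studied design had the smaller mean squared error.
    """
    def mse(estimates):
        return statistics.fmean((e - true_value) ** 2 for e in estimates)
    return mse(reference_estimates) / mse(studied_estimates)

def systematic_sample(population, n, rng):
    """1-in-step systematic sample with a random start (len(population) divisible by n)."""
    step = len(population) // n
    start = rng.randrange(step)
    return population[start::step][:n]

rng = random.Random(0)
# Hypothetical population with a strong trend along the ordering.
population = [i + rng.gauss(0, 5) for i in range(1000)]
true_mean = statistics.fmean(population)

# Replicate both designs and collect the sample means.
srs_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(500)]
sys_means = [statistics.fmean(systematic_sample(population, 50, rng)) for _ in range(500)]

re = relative_efficiency(srs_means, sys_means, true_mean)
print(f"RE of systematic sampling relative to SRS: {re:.1f}")
```

On a trended population such as this one, systematic sampling spreads the sample evenly along the trend, so the ratio comes out well above 1. The value depends entirely on the assumed population and says nothing about the REs reported in the study.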
Abstract: Conflicting views have greeted the use of systematic sampling for sample selection and estimation in stratified sampling, in terms of the precision of the population mean based on the inherent characteristics of the population. These conflicting views were analyzed using Cochran's data (1977, p. 211) [1]. When the population units are ordered, the variance of systematic sampling over all possible systematic samples provides equal, non-negative and most precise estimates for all the variance functions considered, unlike when a single systematic sample is used and when the variance of simple random sampling is used to estimate selected systematic samples.
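The idea of taking the variance over all possible systematic samples can be sketched as follows. This is an illustrative example on a made-up ordered population, not Cochran's data, and the function names are hypothetical.

```python
import statistics

def systematic_means(values, k):
    """Means of all k possible 1-in-k systematic samples (len(values) a multiple of k)."""
    return [statistics.fmean(values[start::k]) for start in range(k)]

def systematic_variance(values, k):
    """Exact design variance of the systematic-sample mean over the k possible samples."""
    mu = statistics.fmean(values)
    return statistics.fmean((m - mu) ** 2 for m in systematic_means(values, k))

def srs_variance(values, n):
    """Variance of the sample mean under SRS without replacement, sample size n."""
    N = len(values)
    return (1 - n / N) * statistics.variance(values) / n

ordered = list(range(1, 21))  # hypothetical ordered population, N = 20
k = 4                         # sampling interval, so each sample has n = 5 units
print(systematic_variance(ordered, k))      # variance over all 4 systematic samples
print(srs_variance(ordered, len(ordered) // k))
```

For this ordered population the systematic variance (1.25) is smaller than the SRS variance (5.25), which is the usual behaviour when the ordering follows a trend. With a single systematic sample the quantity computed by `systematic_variance` is not available, because only one of the k sample means is observed.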
Funding: supported by the Fundamental Research Funds for the Central Universities (Grant No. 2021ZY04), the National Natural Science Foundation of China (Grant No. 32001252), and the International Center for Bamboo and Rattan (Grant No. 1632020029).
Abstract: There are two distinct types of domains, design domains and cross-classes domains, with the former extensively studied under the topic of small-area estimation. In natural resource inventory, however, most classes listed in the condition tables of national inventory programs are cross-classes domains, such as vegetation type, productivity class, and age class. To date, inventorying cross-classes domains remains challenging because their sampling frame and spatial distribution are usually unknown, so inference relies on population-level rather than domain-level sampling. Several challenges are noteworthy: (1) efficient sampling strategies are difficult to develop because little a priori information is available about the target domain; (2) domain inference relies on a sample designed for the population, so within-domain sample sizes may be too small to support precise estimation; and (3) increasing the sample size for the population does not ensure an increase for the domain, so the actual sample size for a target domain remains highly uncertain, particularly for small domains. In this paper, we introduce a design-based generalized systematic adaptive cluster sampling (GSACS) for inventorying cross-classes domains. Design-unbiased Hansen-Hurwitz and Horvitz-Thompson estimators are derived for domain totals and compared within GSACS and with systematic sampling (SYS). Comprehensive Monte Carlo simulations show that (1) the GSACS Hansen-Hurwitz and Horvitz-Thompson estimators are unbiased and equally efficient, whereas the latter outperforms the former in supporting a sample of size one; (2) SYS is a special case of GSACS, while the latter outperforms the former in terms of increased efficiency and reduced intensity; (3) the GSACS Horvitz-Thompson variance estimator is design-unbiased for a single SYS sample; and (4) rules of thumb summarized with respect to sampling design and spatial effect improve precision. Because inventorying a mini domain is analogous to inventorying a rare variable, alternative network sampling procedures are also readily available for inventorying cross-classes domains.
Abstract: The main aim of this study was to evaluate methods for fixed area and distance sampling in the Zagros open forest area in western Iran. Basic forest management and planning require appropriate quantitative and qualitative information. Two sampling methods were compared on the basis of the actual means of characteristics derived from the 100% survey. In total, 37 sampling plots were systematically installed on a grid of 100 m × 100 m in the study area. Density, crown canopy, and basal area of the stands were measured. The 100% survey showed that the density of trees above 12.5 cm diameter at breast height was 68.04 stems ha⁻¹, basal area was 15.16 m² ha⁻¹ and crown canopy percentage was 35.71% ha⁻¹. The values for the traits determined by the two sampling methods differed significantly (P = 0.05). When the time required for the methods was compared, transect sampling required less than systematic-random sampling. Therefore, the transect sampling method was the more economical method for the Zagros open forests. The transect sampling method was statistically defensible and practical for quantifying characteristics of the Zagros open forests.
Abstract: Direct measurement of snow water equivalent (SWE) in snow-dominated mountainous areas is difficult, so its prediction is essential for water resources management in such areas. In addition, because of the nonlinear trend of snow spatial distribution and the multiple factors influencing the SWE spatial distribution, statistical models usually cannot produce acceptable results. Therefore, methods that are able to predict nonlinear trends are necessary. In this research, the Sohrevard Watershed in northwest Iran was selected as the case study for predicting SWE. A database was collected, and the required maps were derived. Snow depth (SD) was measured at 150 points with two sampling patterns, systematic random sampling and Latin hypercube sampling (LHS), snow density was measured at 18 randomly selected points, and SWE was then calculated. SWE was predicted using artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS) and regression methods. The results showed that the ANN and ANFIS models performed better than the regression method under both sampling patterns. Moreover, based on most of the efficiency criteria, the efficiency of the ANN, ANFIS and regression methods was higher under the LHS pattern than under the systematic random sampling pattern. However, there were no significant differences between ANN and ANFIS in SWE prediction. Data from both sampling patterns had the highest sensitivity to elevation. In addition, the LHS and the systematic random sampling patterns had the least sensitivity to profile curvature and plan curvature, respectively.
Abstract: Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling efforts. Many recent efforts to improve phylogenetic estimates have focused instead on increasing sequence length or the number of overall characters in the analysis, and this often does have a beneficial effect on the accuracy of phylogenetic analyses. However, phylogenetic analyses of few taxa (but each represented by many characters) can be subject to strong systematic biases, which in turn produce high measures of repeatability (such as bootstrap proportions) in support of incorrect or misleading phylogenetic results. Thus, it is important for phylogeneticists to consider both the sampling of taxa and the sampling of characters in designing phylogenetic studies. Taxon sampling also improves estimates of evolutionary parameters derived from phylogenetic trees, and is thus important for improved applications of phylogenetic analyses. Analysis of sensitivity to taxon inclusion, the possible effects of long-branch attraction, and sensitivity of parameter estimation for model-based methods should be a part of any careful and thorough phylogenetic analysis. Furthermore, recent improvements in phylogenetic algorithms and in computational power have removed many constraints on analyzing large, thoroughly sampled data sets. Thorough taxon sampling is thus one of the most practical ways to improve the accuracy of phylogenetic estimates, as well as the accuracy of biological inferences that are based on these phylogenetic trees.
Abstract: Background: Immunization averts a large number of child deaths each year. The burden of vaccine-preventable diseases remains high in developing countries compared to developed countries. To overcome this burden, different types of immunization programs have been implemented. For better immunization coverage in developing countries, considerable progress is still to be made in improving knowledge and awareness of the importance of vaccines. In this study, immunization coverage is compared under two sampling methods. Methods: The variance and design effect of the proportion of children vaccinated with different vaccines (BCG, OPV, DPT, Hepatitis B, Hib, Measles and MMR) are estimated under two-stage (30 × 30) cluster sampling and systematic sampling in order to compare these two survey sampling methods. The homogeneity of clusters has also been tested using the chi-square test. Results: BCG, OPV and DPT vaccination coverage is more than 90%, whereas Hepatitis B, Measles, Hib and MMR vaccination coverage is only between 50% and 64%. Systematic random sampling is more complicated than two-stage (30 × 30) cluster sampling. The results also show that the clusters are homogeneous with respect to the proportion of children vaccinated. Conclusion: There is no significant difference between the two survey methodologies regarding the point estimates of vaccination coverage, but the estimated variances of vaccination coverage are smaller in two-stage (30 × 30) cluster sampling than in systematic sampling. The clusters are also homogeneous. Very little improvement has been observed in full vaccination coverage compared with the previous study. From the study it can be said that two-stage (30 × 30) cluster sampling is to be preferred to systematic sampling and simple random sampling.
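The design effect mentioned in the Methods can be sketched as the ratio of the variance under the actual design to the variance a simple random sample of the same size would give. The sketch below assumes equal-sized clusters, ignores finite population corrections, and uses made-up cluster proportions. The function names are hypothetical and this is not the estimator code used in the study.

```python
def cluster_variance_of_proportion(cluster_props):
    """Between-cluster variance estimate of an overall proportion (equal-sized clusters)."""
    m = len(cluster_props)
    p = sum(cluster_props) / m
    return sum((p_i - p) ** 2 for p_i in cluster_props) / (m * (m - 1))

def srs_variance_of_proportion(p, n):
    """Binomial approximation to the variance of a proportion under SRS of size n."""
    return p * (1 - p) / n

def design_effect(var_design, var_srs):
    """Design effect: variance under the studied design divided by the SRS variance."""
    return var_design / var_srs

# Hypothetical coverage proportions in 6 clusters of 30 children each.
props = [0.90, 0.93, 0.87, 0.97, 0.90, 0.83]
p_hat = sum(props) / len(props)
deff = design_effect(
    cluster_variance_of_proportion(props),
    srs_variance_of_proportion(p_hat, n=30 * len(props)),
)
print(f"estimated design effect: {deff:.2f}")
```

A design effect near 1 indicates that clustering costs little precision relative to SRS, which is what one would expect when clusters are homogeneous with respect to coverage.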
Funding: supported by the Qilu Youth Scholar Project of Shandong University, the National Natural Science Foundation of China (Grant No. 11531008), the Ministry of Education of China (Grant No. IRT16R43), and the Taishan Scholar Project of Shandong Province.
Abstract: Statistical machine learning models should be evaluated and validated before being put to work. The conventional k-fold Monte Carlo cross-validation (MCCV) procedure uses a pseudo-random sequence to partition instances into k subsets, which usually causes subsampling bias, inflates generalization errors and jeopardizes the reliability and effectiveness of cross-validation. Based on ordered systematic sampling theory in statistics and low-discrepancy sequence theory in number theory, we propose a new k-fold cross-validation procedure that replaces the pseudo-random sequence with a best-discrepancy sequence, which ensures low subsampling bias and leads to more precise expected-prediction-error (EPE) estimates. Experiments with 156 benchmark datasets and three classifiers (logistic regression, decision tree and naïve Bayes) show that, in general, our cross-validation procedure can remove subsampling bias from the MCCV, lowering the EPE by around 7.18% and the variances by around 26.73%. In comparison, the stratified MCCV can reduce the EPE and variances of the MCCV by around 1.58% and 11.85%, respectively. Leave-one-out (LOO) can lower the EPE by around 2.50%, but its variances are much higher than those of any other cross-validation (CV) procedure. The computational time of our cross-validation procedure is just 8.64% of that of the MCCV, 8.67% of that of the stratified MCCV and 16.72% of that of LOO. Experiments also show that our approach is more beneficial for datasets characterized by relatively small size and large aspect ratio. This makes our approach particularly pertinent when solving bioscience classification problems. Our proposed systematic subsampling technique could be generalized to other machine learning algorithms that involve a random subsampling mechanism.
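The general idea of replacing a pseudo-random permutation with a deterministic low-discrepancy ordering can be sketched as below. The abstract does not specify the sequence or the assignment rule, so this example assumes a golden-ratio Kronecker sequence and a round-robin assignment by rank. It should be read as an illustration of the concept, not as the authors' procedure, and `low_discrepancy_folds` is a hypothetical name.

```python
import math

def low_discrepancy_folds(n_instances, k):
    """Partition instance indices into k folds using a golden-ratio sequence.

    Each index i gets the key frac((i + 1) * phi), where phi = (sqrt(5) - 1) / 2.
    Indices are ranked by key and dealt round-robin into the k folds.
    """
    golden = (math.sqrt(5) - 1) / 2
    keys = [((i + 1) * golden) % 1.0 for i in range(n_instances)]
    order = sorted(range(n_instances), key=keys.__getitem__)
    folds = [[] for _ in range(k)]
    for rank, idx in enumerate(order):
        folds[rank % k].append(idx)
    return folds

folds = low_discrepancy_folds(n_instances=10, k=3)
for fold_id, test_indices in enumerate(folds):
    print(fold_id, sorted(test_indices))
```

Because the ordering is deterministic, repeated runs produce the same partition, so no averaging over random splits is needed. Whether such a partition reduces subsampling bias in practice depends on the data and on the sequence chosen.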
Abstract: The research was carried out on the territory of the Karelian Isthmus of the Leningrad Region using Sentinel-2B images and data from a network of ground sample plots. The ground sample plots are located in the studied territory mainly in a regular pattern, and were laid out and surveyed according to the ICP-Forests methodology with some additions. The total area of the sample plots is a small part of the entire study area. One of the objectives of the study was to determine the possibility of using the k-NN (k-nearest neighbor) method to assess the state of forests throughout the whole studied territory by joint statistical processing of data from ground sample plots and Sentinel-2B imagery. The data from the ground sample plots were divided into two equal parts, one for applying the k-NN method and the other for checking the results. The systematic error in determining the mean damage class of the tree stands on sample plots by the k-NN method turned out to be zero, and the random error was equal to one point. These results make it possible to determine the state of the forest in the entire study area. The second objective of the study was to examine the possibility of using the short-wave vegetation index (SWVI) to assess the state of forests. A close, statistically reliable dependence between the average score of the state of plantations and the value of the SWVI was established, which makes it possible to use this relationship to determine the state of forests throughout the studied territory. The joint use and statistical processing of remotely sensed data and ground-based test areas by the two studied methods make it possible to assess the state of forests throughout the large studied area within the image. The results obtained can be used to monitor the state of forests in large areas and to design appropriate forest protection measures.