Abstract: In this study, the statistical power of the Kolmogorov-Smirnov two-sample (KS-2) and Wald-Wolfowitz (WW) tests, non-parametric tests for data from two independent samples, is compared under fixed skewness and fixed kurtosis by means of Monte Carlo simulation. The comparison is made for a variance ratio of two, with both equal and unequal sample sizes, for large samples. The sample sizes used in the study are: (25, 25), (25, 50), (25, 75), (25, 100), (50, 25), (50, 50), (50, 75), (50, 100), (75, 25), (75, 50), (75, 75), (75, 100), (100, 25), (100, 50), (100, 75), and (100, 100). According to the results, the statistical power of both tests decreases when the coefficient of kurtosis is held fixed and the coefficient of skewness is reduced, whereas it increases when the coefficient of skewness is held fixed and the coefficient of kurtosis is reduced. When skewness is reduced with kurtosis fixed, the WW test is more powerful for sample sizes (25, 25), (25, 50), (25, 75), (25, 100), (50, 75), and (50, 100), while the KS-2 test is more powerful for the other sample sizes. When kurtosis is reduced with skewness fixed, the WW test is more powerful for sample sizes (25, 25), (25, 75), (25, 100), and (75, 25), while the KS-2 test is more powerful for the other sample sizes.
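As a rough illustration of how a Monte Carlo power estimate of this kind can be set up, the sketch below estimates the power of the KS-2 test when the two populations differ only in variance. It is not the study's code. The normal populations, the function names, the number of replications, and the use of the large-sample critical value are all assumptions made for the example; the study itself used distributions with controlled skewness and kurtosis.

```python
import math
import random


def ks_2samp_statistic(x, y):
    """Largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past the current value v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def estimate_power(n, m, sd_ratio, reps=2000, alpha=0.05, seed=0):
    """Share of replications in which KS-2 rejects equality of distributions.

    Assumption: both samples are normal with mean 0; the second sample has
    standard deviation sd_ratio (variance ratio = sd_ratio ** 2).
    """
    rng = random.Random(seed)
    crit = ks_critical_value(n, m, alpha)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, sd_ratio) for _ in range(m)]
        if ks_2samp_statistic(x, y) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Variance ratio of two corresponds to sd_ratio = sqrt(2).
    for n, m in [(25, 25), (50, 100), (100, 100)]:
        print((n, m), estimate_power(n, m, math.sqrt(2.0)))
```

Running the same loop with a second test statistic in place of `ks_2samp_statistic` would give the side-by-side comparison the abstract describes.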
Abstract: This paper presents a new class of test procedures for the two-sample location problem based on subsample quantiles. The class includes the Mann-Whitney test as a special case. The asymptotic normality of the proposed class of tests is established. The asymptotic relative performance of the proposed class with respect to the optimal member of Xie and Priebe (2000) is studied in terms of Pitman efficiency for various underlying distributions.
Abstract: The key technology and main difficulty for optical fiber intrusion pre-warning systems (OFIPS) is the extraction of harmful-intrusion signals. After processing by a phase-sensitive optical time-domain reflectometer (Φ-OTDR), vibration signals can be preliminarily extracted. Generally, these include noise and intrusions, and intrusions can be divided into harmful and harmless ones. Based on a close study of signal characteristics, an effective method for extracting harmful intrusions is proposed in this paper. First, in the background reconstruction stage, all intrusion signals are detected by a constant false alarm rate (CFAR) detector. We then reconstruct the background by extracting two pieces of information from the alarm points, time and amplitude. This ensures that the detection background consists of intrusion signals. Second, in the two-dimensional Kolmogorov-Smirnov (K-S) test stage, we design a separation method to extract the harmful intrusions from all extracted intrusions. It is based on the signal characteristics of harmful intrusions, namely shorter time intervals and higher amplitudes. In an actual OFIPS, the detection method is applied to typical scenes that include many harmless intrusions, for example construction sites and busy roads. Results show that harmful intrusions can be extracted effectively.
Abstract: A saddlepoint approximation for a two-sample permutation test was obtained by Robinson [7]. Although the approximation is very accurate, the formula is very complicated and difficult to apply. In this paper, we revisit the same problem from a different angle. We first turn the problem into a conditional probability and then apply a Lugannani-Rice type formula to it, which was developed by Skovgaard [8] for the mean of i.i.d. samples and by Jing and Robinson [5] for smooth functions of vector means. Both the Lugannani-Rice type formula and Robinson's formula achieve the same relative error of order O(n^(-3/2)), but the former is very compact and much easier to use in practice. Some numerical results are presented to compare the two formulas.
Abstract: Neon flying squid Ommastrephes bartramii is widely distributed in the North Pacific Ocean and has been the main target species of Chinese squid jigging fleets since 1993. Many authors have studied its fishing grounds and their environmental conditions. In 2004, the squid catch per fishing vessel reached its highest level, about 550 t. In this paper, the catch and its distribution in 2004 are compared with those of the previous year. Based on catch data from Chinese squid jigging vessels and sea surface temperature (SST) data on a 1° latitude by 1° longitude grid from May to November 2004, distribution maps were drawn with Marine Explorer 4.0. The results show that production in the waters east of 160°E was low during May and July. During October and November, production in the waters from 150°E to 160°E was relatively higher, accounting for 62.5 percent of the total catch. During November, production in the waters west of 150°E was also low. The highest CPUE area was located in the waters west of 150°E, followed by the area from 150°E to 160°E, and the lowest CPUE area was located in the waters east of 160°E. The SST in the fishing ground appears to change seasonally. The suitable SST for each month is as follows: 12-14 ℃ in May, 15-16 ℃ in June, 14-16 ℃ in July, 18-19 ℃ in August, 16-17 ℃ in September, 15-16 ℃ in October, and 12-13 ℃ in November. The K-S test results indicate that these monthly suitable SST ranges can serve as an indicator for locating the main fishing ground.
Abstract: Chronic disease is an important factor that affects the health of elderly people. We analyzed the 2006 and 2010 data from the Chinese Urban and Rural Elderly Population Surveys, which are nationally representative surveys of elderly people aged 60 years and above. We found a typical power-law distribution for the rates of different numbers of chronic diseases among elderly Chinese people. A Kolmogorov-Smirnov test indicated that the result was robust, and the power exponents were approximately -2.5. In addition, a paired t-test demonstrated that the rates of different numbers of chronic diseases did not differ significantly between urban and rural areas, between survey years, or between genders.
Abstract: With the rapid development of big data technology, the personal credit evaluation industry has entered a new stage. Within this field, the evaluation of personal credit based on mobile telecommunications data is one of the hotspots of current research. However, because personal credit evaluation variables are complex and diverse, the dimension of the input variables needs to be reduced in order to lower model complexity and improve prediction accuracy. Using data provided by a mobile telecommunications operator, this paper divides the data into a training set and a verification set. We perform correlation analysis on each indicator in the training set, calculate the corresponding IV value based on the WOE value of each selected indicator, and then bin the data with SPSS Modeler. The selected variables are modeled using a logistic regression algorithm. To make the regression results more practical, we extract scoring rules from the logistic regression results, convert them into scorecard form, and finally verify the validity of the model.
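The WOE (weight of evidence) and IV (information value) quantities mentioned above can be sketched as follows. This is only an illustration of the commonly used definitions and not the paper's procedure, which relied on SPSS Modeler. The function name, the input layout, and the good-over-bad sign convention are assumptions; some texts use the opposite sign for WOE, which leaves IV unchanged.

```python
import math


def woe_iv(bins):
    """Compute per-bin WOE and the total IV for one binned variable.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumption: every count is positive, so no smoothing is needed.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        p_good = good / total_good  # share of all goods falling in this bin
        p_bad = bad / total_bad     # share of all bads falling in this bin
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv


if __name__ == "__main__":
    # Hypothetical counts for a variable split into three bins.
    woes, iv = woe_iv([(400, 20), (300, 30), (100, 50)])
    print(woes, iv)
```

Variables with a larger IV separate the two classes more strongly, which is why IV is often used to screen inputs before fitting a logistic regression.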
Funding: Project supported by the National Natural Science Foundation of China.
Abstract: Let X<sub>1</sub>,…,X<sub>m</sub> and Y<sub>1</sub>,…,Y<sub>n</sub> be two independent simple random samples drawn from F and G respectively, which are unknown continuous distributions on R. Consider the hypothesis testing problem:
Funding: This project is supported by the Beijing Natural Science Foundation and by the Chinese Natural Science Foundation.
Abstract: In this paper, a new statistic for testing whether two samples come from the same population is derived from a simple linear model with an artificial parameter. Its limit distribution is a chi-squared distribution with 2 degrees of freedom under the null hypothesis and a noncentral chi-squared distribution with 2 degrees of freedom under a certain sequence of alternative hypotheses. Finally, we compare its power with that of other two-sample tests, in particular the Smirnov statistic.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12101335 and 12271271), the Natural Science Foundation of Tianjin (Grant No. 21JCQNJC00020), and the Fundamental Research Funds for the Central Universities, Nankai University (Grant Nos. 63211088 and 63221050); supported by the National Natural Science Foundation of China (Grant No. 12101332); supported by Shenzhen Wukong Investment Company, the Fundamental Research Funds for the Central Universities (Grant No. ZB22000105), the China National Key R&D Program (Grant Nos. 2019YFC1908502, 2022YFA1003703, 2022YFA1003802, 2022YFA1003803), and the National Natural Science Foundation of China (Grant Nos. 12271271, 11925106, 12231011, 11931001 and 11971247).
Abstract: This paper establishes the asymptotic independence between the quadratic form z^(T)Az and the maximum max_{1≤i≤p}|z_i| of a sequence of independent sub-Gaussian random variables z=(z_1,…,z_p)^(T). Based on this theoretical result, we find the asymptotic joint distribution of the quadratic form and the maximum, which can be applied to high-dimensional testing problems. By combining the sum-type test and the max-type test, we propose Fisher's combination tests for the one-sample mean test and the two-sample mean test. Under this novel general framework, several strong assumptions in the existing literature are relaxed. Monte Carlo simulations show that the proposed tests are strongly robust to both sparse and dense data.
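To make the combination step concrete, the sketch below applies Fisher's method to two p-values, one from a sum-type test and one from a max-type test. It assumes the two p-values are independent under the null hypothesis, which is what the asymptotic independence result is meant to justify in the limit. The function and argument names are hypothetical, and the paper's actual test statistics are not reproduced here.

```python
import math


def fisher_combine_two(p_sum, p_max):
    """Fisher's combination of two independent p-values.

    Under the null, T = -2 * (ln p_sum + ln p_max) follows a chi-squared
    distribution with 4 degrees of freedom, whose survival function has
    the closed form exp(-t / 2) * (1 + t / 2).
    """
    t = -2.0 * (math.log(p_sum) + math.log(p_max))
    p_combined = math.exp(-t / 2.0) * (1.0 + t / 2.0)
    return t, p_combined


if __name__ == "__main__":
    # Hypothetical p-values: the max-type test reacts to a sparse signal,
    # while the sum-type test does not.
    print(fisher_combine_two(0.40, 0.001))
```

Because the combined statistic is driven by whichever p-value is small, the resulting test can respond to both dense alternatives (picked up by the sum-type test) and sparse alternatives (picked up by the max-type test).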