Abstract: This article constructs statistical selection procedures for exponential populations that may differ only in their threshold parameters. The scale parameters of the populations are assumed to be common and known, and the independent samples drawn from the populations are of equal size. The best population is defined as the one with the largest threshold parameter. If more than one population shares the largest threshold, one of them is tagged at random and designated the best. Two procedures are developed for choosing a subset of the populations such that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minima, and the other is based on the sample means. An “Indifference Zone” (IZ) selection procedure is also developed based on the sample minima. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold exceeds the remaining thresholds by at least a prescribed amount. Numerical examples are given, and the R code for all calculations is provided in the Appendices.
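The subset selection idea based on sample minima can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's own procedure or code: it assumes a Gupta-type rule that retains population i when its sample minimum Y_i satisfies Y_i >= max_j Y_j - d. With a common known scale sigma and common sample size n, n(Y_i - theta_i)/sigma is standard exponential, so when all k thresholds are equal the probability of a correct selection is (1 - (1 - a)^k) / (k a), where a = exp(-n d / sigma). The function names `pcs_equal_thresholds`, `solve_d`, and `select_subset` are hypothetical.

```python
import math

def pcs_equal_thresholds(k, c):
    """P(correct selection) when all k thresholds are equal.

    c = n * d / sigma is the standardized selection constant. The formula
    comes from integrating (1 - exp(-(x + c)))**(k - 1) * exp(-x) over x >= 0.
    """
    a = math.exp(-c)
    if a >= 1.0:          # c <= 0: every population is equally likely to win
        return 1.0 / k
    if a == 0.0:          # c so large that every population is retained
        return 1.0
    # -expm1(k * log1p(-a)) equals 1 - (1 - a)**k, computed stably for small a
    return -math.expm1(k * math.log1p(-a)) / (k * a)

def solve_d(k, n, sigma, p_star, tol=1e-10):
    """Smallest d (found by bisection) with pcs_equal_thresholds >= p_star."""
    if not 1.0 / k < p_star < 1.0:
        raise ValueError("p_star must lie strictly between 1/k and 1")
    lo, hi = 0.0, 1.0
    while pcs_equal_thresholds(k, hi) < p_star:   # expand until bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equal_thresholds(k, mid) < p_star:
            lo = mid
        else:
            hi = mid
    return hi * sigma / n   # convert the standardized constant back to d

def select_subset(sample_minima, d):
    """Indices of populations whose sample minimum is within d of the largest."""
    top = max(sample_minima)
    return [i for i, y in enumerate(sample_minima) if y >= top - d]
```

For example, `solve_d(k, n, sigma, p_star)` returns a constant that could then be passed to `select_subset` together with the observed sample minima. The equal-thresholds configuration is used here because it is the usual least favorable case for location-type rules of this form; whether the article uses the same configuration is not stated in the abstract.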
Abstract: This article addresses the computation of the constant required to implement a specific nonparametric subset selection procedure based on ranks of data arising in a randomized block experimental design. A model of three populations and two blocks is used to compute the probability distribution of the relevant statistic, the maximum of the population rank sums minus the rank sum of the “best” population. Calculations are done for populations following a normal distribution and for populations following a bi-uniform distribution. The least favorable configuration in these cases is shown to arise when all three populations follow identical distributions. The bi-uniform distribution leads to an asymptotic counterexample to the conjecture that the least favorable configuration, i.e., the configuration minimizing the probability of a correct selection, occurs when all populations are identically distributed. These results are consistent with other large-scale simulation studies. All relevant R code is provided in the appendices.
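For the configuration in which all populations are identically distributed, the distribution of this statistic can be obtained by direct enumeration, since every ranking within a block is then equally likely. The sketch below is an illustration under that assumption only (it does not cover the normal or bi-uniform shift cases studied in the article), with ranks 1 to 3 assigned within each block and an arbitrarily tagged “best” population. The function name `rank_sum_statistic_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def rank_sum_statistic_distribution(n_pops=3, n_blocks=2, best=0):
    """Exact distribution of max_j R_j - R_best under identical populations.

    Within each block every ranking of the populations is equally likely,
    so all (n_pops!)**n_blocks rank assignments are enumerated.
    """
    rankings = list(permutations(range(1, n_pops + 1)))
    counts = Counter()
    for assignment in product(rankings, repeat=n_blocks):
        # Rank sum of each population across the blocks
        sums = [sum(col) for col in zip(*assignment)]
        counts[max(sums) - sums[best]] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}

dist = rank_sum_statistic_distribution()
for value, prob in dist.items():
    print(value, prob)
```

A selection constant could then be read off the cumulative distribution as the smallest value whose cumulative probability reaches the prescribed level, although the exact form of the rule in the article may differ.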
Abstract: Nonparametric and parametric subset selection procedures are used to analyze state homicide rates (SHRs) for the year 2005 and the years 2014-2020, in order to identify subsets of states that contain the “best” (lowest SHR) and “worst” (highest SHR) rates with a prescribed probability. A new Bayesian model is developed and applied to the SHR data, and the results are contrasted with those obtained with the subset selection procedures. All analyses are carried out within the context of a two-way block design.
Abstract: This article compares the sizes of the subsets selected by nonparametric subset selection rules under two different scoring rules for the observations. The scoring rules are based on the expected values of order statistics of the uniform distribution (yielding rank values) and of the normal distribution (yielding normal score values). The comparison is made using state motor vehicle traffic fatality rates, published in a 2016 article, for fifty-one states (counting DC as a state) over a nineteen-year period (1994 through 2012). The earlier study considered four block design selection rules: two for choosing a subset to contain the “best” population (i.e., the state with the lowest mean fatality rate) and two for the “worst” population (i.e., the state with the highest mean rate), with the probability of correct selection set to 0.90. Two selection rules based on normal scores resulted in selected subsets substantially smaller than those of the corresponding rules based on ranks (7 vs. 16 and 3 vs. 12). For the two other selection rules, the subsets chosen were very close in size (within one). A comparison is also made using state homicide rates, published in a 2022 article, for fifty states over eight years. The results are qualitatively the same as those obtained with the motor vehicle traffic fatality rates.
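The two scoring rules can be illustrated with a short sketch. Expected uniform order statistics are exactly i/(n + 1), which is proportional to the rank i. For the normal scores, the sketch below substitutes Blom's approximation, the standard normal quantile at (i - 0.375)/(n + 0.25), for the exact expected order statistics; this substitution is an assumption made for brevity and may not match the article's computation. The function names are hypothetical, and ties are assumed absent.

```python
from statistics import NormalDist

def uniform_scores(n):
    """Expected uniform(0, 1) order statistics, E[U_(i)] = i / (n + 1)."""
    return [i / (n + 1) for i in range(1, n + 1)]

def approx_normal_scores(n):
    """Blom's approximation to expected standard normal order statistics."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def score_block(values, scores):
    """Replace each observation in a block by the score of its rank (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scored = [0.0] * len(values)
    for rank_index, original_index in enumerate(order):
        scored[original_index] = scores[rank_index]
    return scored
```

Applying `score_block` to each block (for example, each year) with either score vector and then summing the scores per population gives the statistics on which rank-based and normal-score-based selection rules would operate.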
Abstract: Intelligent machines are knowledge systems with a unique knowledge structure and function. In this paper, we discuss the characteristics and forms of machine knowledge, the relationship between knowledge and human cognition, and approaches to acquiring machine knowledge. These issues are of great significance to the development of artificial intelligence.
Funding: Li’s work was partially supported by the National Medical Research Council in Singapore and AcRF R-155-000-174-114. NNSF [grant number 11371142].
Abstract: We provide a detailed review of the statistical analysis of diagnostic accuracy in multi-category classification tasks. For qualitative response variables with more than two categories, many traditional accuracy measures, such as sensitivity, specificity, and the area under the ROC curve, are no longer applicable. New diagnostic accuracy measures have been introduced in the recent medical research literature. In this paper, important statistical concepts for multi-category classification accuracy are reviewed, and their utility is demonstrated with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect such analysis tools to become more familiar to practitioners and to find broader application in biostatistics. Our program can be adapted to many classifiers, among which logistic regression may be the most popular. We therefore base our discussion and illustrations entirely on logistic regression.
Abstract: In certain cases, noise can improve signal transmission or signal processing. This phenomenon is known as stochastic resonance. In this paper, we first present two theorems showing that the noisy threshold neuron exhibits stochastic resonance in terms of the probability of correct reception. Second, we analyze stochastic resonance effects and give the probability-optimal noise levels for four representative noise types. Finally, we discuss the stochastic gradient ascent learning law, which can be used to find the probability-optimal noise levels. We also present simulation results for the four representative noise types. These results indicate that stochastic resonance is beneficial both in biological neurons and in signal processing.
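The effect can be illustrated with a toy model, which is an assumption made here and not the paper's exact setup: equiprobable signals -A and +A with A below the threshold, additive zero-mean Gaussian noise with standard deviation sigma, and a neuron that fires when signal plus noise exceeds the threshold (firing is decoded as +A, silence as -A). Without noise the neuron never fires and the probability of correct reception is 0.5; a moderate amount of noise raises it. Setting the derivative with respect to sigma to zero in this model gives sigma^2 = 2 * threshold * A / ln((threshold + A) / (threshold - A)). All names and default values below are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def prob_correct(sigma, amplitude=0.5, threshold=1.0):
    """P(correct reception) for equiprobable signals -A and +A.

    The neuron fires when signal + noise exceeds the threshold; firing is
    decoded as +A and silence as -A. Noise is N(0, sigma^2) (an assumption).
    """
    p_fire_given_plus = 1.0 - PHI((threshold - amplitude) / sigma)
    p_silent_given_minus = PHI((threshold + amplitude) / sigma)
    return 0.5 * (p_fire_given_plus + p_silent_given_minus)

def optimal_sigma(amplitude=0.5, threshold=1.0):
    """Closed-form maximizer of prob_correct for this Gaussian toy model."""
    ratio = (threshold + amplitude) / (threshold - amplitude)
    return math.sqrt(2.0 * threshold * amplitude / math.log(ratio))

# Coarse grid search as a cross-check on the closed form
grid = [0.05 * i for i in range(1, 101)]
best_on_grid = max(grid, key=prob_correct)
```

Comparing `best_on_grid` with `optimal_sigma()` gives a quick consistency check. The learning law mentioned in the abstract would instead estimate the optimum iteratively from noisy samples; that is not attempted in this sketch.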