Funding: This research was financially supported by FDCT No. 005/2018/A1, and also by the Guangdong Provincial Innovation and Entrepreneurship Training Program (Project No. 201713719017) and the College Students Innovation Training Program held by Guangdong University of Science and Technology (Nos. 1711034, 1711080, and 1711088).
Abstract: Based on a two-dimensional relation table, this paper studies the missing values in sample data on land prices in Shunde District, Foshan City. GeoDa software was used to eliminate insignificant factors by stepwise regression analysis; NORM software was adopted to construct the multiple imputation models; and the EM algorithm and the data augmentation algorithm were applied to fit multiple linear regression equations, producing five different imputed datasets. Statistical analysis was performed on the imputed datasets to calculate the mean and variance of each dataset, and weights were determined according to the differences among them. Finally, a comprehensive integration was carried out to obtain the imputation expression for the missing values. The results show that, in the three missing-data cases examined (the PRICE variable missing at a 5% rate, the PRICE variable missing at a 10% rate, and both the PRICE and CBD variables missing), the new method produced estimates closer to the true values than the traditional multiple imputation method at ratios of 75% to 25%, 62.5% to 37.5%, and 100% to 0%, respectively. The new method is therefore clearly better than the traditional multiple imputation methods, and the missing values estimated by the new method have reference value.
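As a rough illustration of the final integration step, the Python sketch below combines the values imputed for one missing cell across several imputed datasets using weights inversely proportional to each dataset's deviation from the cross-dataset mean. The abstract does not state the exact weighting rule, so the inverse-deviation scheme and the numbers are assumptions for illustration only, not the paper's method.

```python
import numpy as np

# Hypothetical imputed values for a single missing PRICE cell,
# one value from each of five imputed datasets.
imputed_values = np.array([5120.0, 5340.0, 5075.0, 5290.0, 5180.0])

# Deviation of each dataset's value from the cross-dataset mean.
deviations = np.abs(imputed_values - imputed_values.mean())

# Assumed rule: weights inversely proportional to the deviation
# (small constant avoids division by zero); the paper's actual rule may differ.
weights = 1.0 / (deviations + 1e-6)
weights = weights / weights.sum()

# "Comprehensive integration": weighted combination gives the final imputed value.
final_value = float(np.dot(weights, imputed_values))
print(round(final_value, 1))
```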
Abstract: Multiple imputation compensates for missing data and produces multiple datasets via a regression model, and it is considered the solution to the old problem of univariate imputation. Univariate imputation fills in data only from the specific column in which the cell is missing. Multivariate imputation works simultaneously with all variables in all columns, whether missing or observed, and has emerged as a principal method for solving missing-data problems. All incomplete datasets analyzed before Multiple Imputation by Chained Equations (MICE) was introduced were misdiagnosed; the results obtained were invalid and should not be counted on to yield reasonable conclusions. This article highlights why multiple imputation is needed and how MICE works, with a particular focus on a cyber-security dataset. Removing missing data in any dataset and replacing it is imperative for analyzing the data and building prediction models. A good imputation technique should therefore recover the missingness, which involves extracting the good features. However, the widely used univariate imputation method does not impute missingness reasonably when the values are too large and may thus lead to bias. We therefore aim to propose an alternative imputation method that is efficient and removes potential bias after removing the missingness.
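For readers unfamiliar with MICE, the minimal Python sketch below shows the chained-equations idea using scikit-learn's IterativeImputer, which implements a MICE-style scheme. The data, missingness rate, and parameter choices are placeholders and not the pipeline used in the article.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Placeholder numeric matrix standing in for a cyber-security dataset with gaps.
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.10] = np.nan     # roughly 10% of cells set missing

# Chained equations: each column with missing values is regressed on the other
# columns, cycling until the imputations stabilize. sample_posterior=True draws
# stochastic imputations, so different random seeds yield multiple imputed datasets.
imputed_datasets = [
    IterativeImputer(max_iter=10, sample_posterior=True, random_state=seed)
    .fit_transform(X)
    for seed in range(5)
]
print(len(imputed_datasets), imputed_datasets[0].shape)
```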
Funding: Supported by the Stiftung Rheinland-Pfalz für Innovation (959).
Abstract: Currently, a growing number of programs for multiple imputation of missing values are becoming available in statistical software. Among others, two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100, or 200 cases and four or eleven variables. A varying proportion of data (3% - 63%) was set as missing completely at random and subsequently substituted using multiple imputation by chained equations. In a logistic regression model, four coefficients, i.e. non-zero and zero main effects as well as non-zero and zero interaction effects, were examined. Estimates of all main and interaction effects were unbiased. There was considerable variance in the estimates, increasing with the proportion of missing data and decreasing with sample size. The imputation of missing data by chained equations is a useful tool for imputing small to moderate proportions of missing data. The method has its limits, however: in small samples, there are considerable random errors for all effects.
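The "missing completely at random" mechanism used in such simulations can be reproduced generically: every cell is deleted with the same probability regardless of its value. A minimal sketch with hypothetical settings (100 cases, 4 variables, 30% missingness) is shown below.

```python
import numpy as np

rng = np.random.default_rng(42)

n_cases, n_vars, prop_missing = 100, 4, 0.30   # hypothetical settings
data = rng.normal(size=(n_cases, n_vars))

# MCAR: each cell has the same deletion probability, independent of any values.
mcar_mask = rng.random(data.shape) < prop_missing
data_with_gaps = data.copy()
data_with_gaps[mcar_mask] = np.nan

print(f"realized missing proportion: {np.isnan(data_with_gaps).mean():.2%}")
```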
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144, and 11801359; the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100; the Fundamental Research Funds for the Central Universities; and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Abstract: Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the dimension of the covariate is not low, the authors propose a two-stage estimation procedure that uses dimension-reduced kernel estimators in conjunction with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI) methods. The authors show that the resulting estimator achieves consistency and asymptotic normality. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV CD4 dataset is also presented.
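For context, the standard AIPW-type unbiased estimating function underlying such procedures has the generic form below, where δ is the response indicator, π(X) the response probability, and g the estimating function of Qin and Lawless; the paper replaces π and the conditional expectation with dimension-reduced kernel estimators, which this generic display does not show.

\[
g_{\mathrm{AIPW}}(\theta) \;=\; \frac{\delta}{\pi(X)}\, g(X, Y, \theta) \;+\; \Bigl(1 - \frac{\delta}{\pi(X)}\Bigr)\, \mathrm{E}\bigl\{ g(X, Y, \theta) \mid X \bigr\}
\]

This function has mean zero at the true θ whenever either π or the conditional expectation is correctly specified, which is the usual double-robustness property.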
Funding: Supported by the National Natural Science Foundation of China (Grant No. 22076129), the Sichuan Key R&D Project (Grant No. 2020YFS0055), and the Chengdu Major Technology Application and Demonstration Project (Grant No. 2020-YF09-00031-SN).
Abstract: As most air quality monitoring sites worldwide are in urban areas, machine learning models may produce substantial estimation bias in rural areas when deriving spatiotemporal distributions of air pollutants. The bias stems from the problem of dataset shift, as the density distributions of predictor variables differ greatly between urban and rural areas. We propose a data-augmentation approach based on multiple imputation by chained equations (MICE-DA) to remedy the dataset shift problem. Compared with the benchmark models, MICE-DA exhibits superior predictive performance in deriving the spatiotemporal distributions of hourly PM2.5 in the megacity (Chengdu) at the foot of the Tibetan Plateau, especially for correcting the estimation bias, with the mean bias decreasing from -3.4 µg/m³ to -1.6 µg/m³. As a complement to the holdout validation, the semi-variance results show that MICE-DA decently preserves the spatial autocorrelation pattern of PM2.5 over the study area. The essence of MICE-DA is strengthening the correlation between PM2.5 and aerosol optical depth (AOD) during the data augmentation. Consequently, the importance of AOD is largely enhanced for predicting PM2.5, and the summed relative importance of the two satellite-retrieved AOD variables increases from 5.5% to 18.4%. This study resolves the puzzle that AOD has exhibited relatively low importance in local or regional studies. The results can advance the utilization of satellite remote sensing in air quality modeling while drawing more attention to the common dataset shift problem in data-driven environmental research.
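The semi-variance check referenced above is typically based on the empirical semivariogram; the standard estimator is shown below, where N(h) is the set of location pairs separated by (approximately) lag h and z(s) is the PM2.5 value at location s. This is the textbook definition, not a formula reproduced from the paper.

\[
\hat{\gamma}(h) \;=\; \frac{1}{2\,|N(h)|} \sum_{(i,j)\in N(h)} \bigl( z(s_i) - z(s_j) \bigr)^{2}
\]

A model that preserves the spatial autocorrelation pattern should produce a semivariogram for the predicted surface that is close to the one computed from the observations.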
Funding: Supported by the National Institute for Health Research (grant number NIHR200134), using Official Development Assistance (ODA) funding.
Abstract: Attrition is a common challenge in statistical analysis for longitudinal or multi-stage cross-sectional studies. While strategies to reduce attrition should ideally be implemented during the study design phase, attrition remains common in real-world research, necessitating statistical methods to address it. Traditional approaches like multiple imputation (MI) and inverse probability weighting (IPW) rely on the assumption that data are missing at random (MAR), which is not always plausible. Recent developments in machine learning (ML) based methods offer promising alternatives because of their ability to capture complex patterns in data and handle non-linear relationships more effectively. This study examines four ML-based imputation methods to account for attrition and compares them with conventional MI and IPW in a two-stage epilepsy population-based prevalence survey involving 56,425 participants. Simulated attrition levels from 5% to 50% were applied following the MAR mechanism to assess the performance of the different methods. This was replicated 100 times using different random seeds. Results showed that bias increased as attrition levels increased. Complete case analysis had the largest bias in all scenarios. k-nearest neighbor (KNN) and sequential KNN (sKNN) performed similarly to MI under MAR but exhibited less bias than MI and IPW when data were MNAR. While IPW performed similarly to MI under MAR, it had greater bias under MNAR. Both missForest and the MI implemented using random forests were outperformed by sKNN and KNN. We have demonstrated that even a small attrition proportion of 5% can significantly bias estimates if not properly addressed. While MI remains the most preferred approach for missing data assuming MAR, ML methods, particularly sKNN and KNN, demonstrated potential for addressing attrition when data are MNAR. Choosing the appropriate method to address missing data should be preceded by an evaluation of the different available methods that could be suitable for the data being analysed. Future research should explore ML methods in various study designs and consider integrating ML into the very robust MI framework to improve prediction accuracy for missing data due to attrition.
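As a point of reference for the KNN-based imputers discussed above, the snippet below shows a minimal k-nearest-neighbor imputation with scikit-learn. It is a generic illustration, not the KNN/sKNN implementation evaluated in the study, and the data and neighbor count are placeholders.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)

# Placeholder survey-like matrix with simulated attrition (missing cells).
X = rng.normal(size=(500, 6))
X[rng.random(X.shape) < 0.20] = np.nan

# Each missing entry is filled with the distance-weighted average of the
# corresponding feature values from the k most similar records.
imputer = KNNImputer(n_neighbors=5, weights="distance")
X_imputed = imputer.fit_transform(X)

print(np.isnan(X_imputed).sum())   # 0: all gaps filled
```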
Abstract: In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(-1/2), where m is the number of imputations, leading to the conclusion that a small m (≤ 5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging in order of magnitude from 10^(-6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ from δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
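To make the formula concrete, the relative efficiency is evaluated below for two illustrative (γ, m) pairs; these are arithmetic examples, not values taken from the NAMCS analysis.

\[
\mathrm{RE} = \left(1 + \frac{\gamma}{m}\right)^{-1/2}; \qquad
\gamma = 0.3,\; m = 5:\ \mathrm{RE} = (1.06)^{-1/2} \approx 0.971; \qquad
\gamma = 0.01,\; m = 5:\ \mathrm{RE} = (1.002)^{-1/2} \approx 0.999 .
\]

With γ as small as reported here, RE is essentially 1 for any m, which is exactly why the RE criterion on its own suggests that very few imputations suffice.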
Abstract: This study computes the durability of Return on Assets (ROA) in small and medium enterprises from different sample datasets. Utilizing information from the Financial Statements Statistics of Corporations by Industry, it verifies the precision of correlation coefficients using Non-iterative Bayesian-based Imputation (NIBAS) and the multiple imputation method for all combinations of common variables with auxiliary files. This paper has three important findings. First, statistical matching estimates of higher precision can be obtained using key variable sets with higher canonical correlation coefficients. Second, even if the key variable sets have high canonical correlation coefficients, key variables that are extremely strongly correlated with the target variables and have high kurtosis should not be used. Finally, using auxiliary files can improve the precision of statistical matching estimates. Accordingly, the durability of ROA in small and medium enterprises is computed. The author finds that the series of ROA correlations fluctuates more for smaller enterprises than for larger ones, and thus the vulnerability of ROA in small and medium enterprises can be clarified via statistical matching.
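Canonical correlations between candidate key-variable sets can be screened with standard tools; the sketch below uses scikit-learn's CCA on placeholder matrices and is meant only to illustrate the screening idea, not the NIBAS matching procedure itself.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(7)

# Placeholder blocks: candidate common variables in the recipient file (X)
# and in the donor/auxiliary file (Y).
X = rng.normal(size=(300, 4))
Y = 0.6 * X[:, :3] + rng.normal(scale=0.8, size=(300, 3))

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)

# Canonical correlation coefficients for the first two variate pairs:
# higher values suggest a more informative key-variable set for matching.
canon_corr = [np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1] for k in range(2)]
print([round(c, 3) for c in canon_corr])
```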
Funding: Supported by the NSF (National Science Foundation) [Grant Number DMS-1811245], the NIA (National Institute on Aging) [Grant Number 1R01AG066883], and NIEHS [Grant Number 1R01ES031651].
Abstract: Missing data are unavoidable in longitudinal clinical trials, and outcomes are not always normally distributed. In the presence of outliers or heavy-tailed distributions, conventional multiple imputation combined with a mixed-model-with-repeated-measures analysis of the average treatment effect (ATE), based on the multivariate normal assumption, may produce bias and power loss. Control-based imputation (CBI) is an approach for evaluating the treatment effect under the assumption that participants in both the test and control groups with missing outcome data have a similar outcome profile to those with an identical history in the control group. We develop a robust framework to handle non-normal outcomes under CBI without imposing any parametric modeling assumptions. Under the proposed framework, sequential weighted robust regressions are applied to protect the constructed imputation model against non-normality in the covariates and the response variables. Accompanied by the subsequent mean imputation and robust model analysis, the resulting ATE estimator has good theoretical properties in terms of consistency and asymptotic normality. Moreover, our proposed method guarantees the analysis-model robustness of the ATE estimation in the sense that its asymptotic results remain intact even when the analysis model is misspecified. The superiority of the proposed robust method is demonstrated by comprehensive simulation studies and an application to AIDS clinical trial data.
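The robust-regression building block can be illustrated with statsmodels' M-estimation; the snippet below fits a Huber-type robust regression on placeholder heavy-tailed data and is only a sketch of one ingredient, not the authors' sequential weighted CBI framework.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Placeholder covariates and a heavy-tailed outcome (Student-t errors).
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.standard_t(df=3, size=200)

# M-estimation with a Huber loss down-weights outlying residuals, protecting
# the fitted regression (and hence the imputation model) against non-normality.
model = sm.RLM(y, sm.add_constant(X), M=sm.robust.norms.HuberT())
result = model.fit()
print(result.params.round(3))
```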
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015, 11926341, 11901128, and 11601097), the Key Laboratory of RCSDS, CAS (Grant No. 2008DP173182), and the Natural Science Foundation of Guangdong Province of China (Grant No. 2018A030310068).
Abstract: In some situations, the failure time of interest is defined as the gap time between two related events, and the observations on both event times can suffer either right or interval censoring. Such data are usually referred to as doubly censored data and are frequently encountered in many clinical and observational studies. Additionally, there may also exist a cured subgroup in the population, meaning that not every individual under study will eventually experience the failure time of interest. In this paper, we consider regression analysis of doubly censored data with a cured subgroup under a wide class of flexible transformation cure models. Specifically, we consider marginal likelihood estimation and develop a two-step approach that combines multiple imputation with a new expectation-maximization (EM) algorithm for its implementation. The resulting estimators are shown to be consistent and asymptotically normal. The finite-sample performance of the proposed method is investigated through simulation studies. The proposed method is also applied to a real dataset arising from an AIDS cohort study for illustration.
Funding: Supported by the Graduate School at Washington State University and by the National Science Foundation (No. SMA-1620462) to the Santa Fe Institute and Washington State University.
Abstract: Although Gregory Johnson's models have influenced social theory in archaeology, few have applied or built upon these models to predict aspects of social organization, group size, or fissioning, and the exceptions have been limited to small case studies. Recently, the relationship between a society's scale and its information-processing capacities has been explored using the Seshat Databank. Here, I apply multiple linear regression analysis to the Seshat data using Turchin and colleagues' nine "complexity characteristics" (CCs) to further examine the relationship between the hierarchy CC and the remaining eight CCs, which include both aspects of a polity's scale and aspects of what Kohler et al. call "collective computation". The results support Johnson's ideas that stratification will generally increase with increases in a polity's scale (population, territory); however, stratification is also higher when polities increase their developments in information-processing variables such as texts.
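The regression setup described above can be sketched generically with ordinary least squares; the column names below are hypothetical stand-ins for the hierarchy CC and the remaining complexity characteristics, not the actual Seshat variable names, and the simulated data are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(11)

# Hypothetical polity-level data: eight predictor CCs and a hierarchy outcome.
n = 150
predictors = [f"cc_{k}" for k in range(1, 9)]   # e.g. population, territory, texts, ...
df = pd.DataFrame(rng.normal(size=(n, 8)), columns=predictors)
df["hierarchy"] = 0.8 * df["cc_1"] + 0.5 * df["cc_2"] + rng.normal(scale=0.5, size=n)

# Multiple linear regression of the hierarchy CC on the remaining eight CCs.
X = sm.add_constant(df[predictors])
ols = sm.OLS(df["hierarchy"], X).fit()
print(ols.summary().tables[1])
```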