Multiple imputation compensates for missing data by producing multiple completed datasets from regression models, and it is widely regarded as the solution to the long-standing shortcomings of univariate imputation. Univariate imputation fills values using only the specific column in which a cell is missing, whereas multivariate imputation works with all variables in all columns simultaneously, whether missing or observed, and has emerged as a principal method for solving missing-data problems. Incomplete datasets analyzed before Multiple Imputation by Chained Equations (MICE) was available were often misdiagnosed; the results obtained were invalid and should not be counted on to yield reasonable conclusions. This article highlights why multiple imputation is needed and how MICE works, with a particular focus on a cyber-security dataset. Replacing missing data in any dataset is imperative for analyzing the data and creating prediction models. A good imputation technique should therefore recover the missingness, which involves extracting the informative features. However, the widely used univariate imputation method does not impute missing values reasonably when they are large and may thus lead to bias. We therefore aim to propose an alternative imputation method that is efficient and removes potential bias.
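The chained-equations idea behind MICE can be sketched in a few lines: impute each incomplete column from the others, cycling until the values stabilize. The two-column toy below uses made-up data and simple least-squares regressions rather than the paper's models; it illustrates only the mechanism.

```python
# A minimal chained-equations (MICE-style) sketch for two numeric
# columns: start from mean imputation, then alternately regress each
# column on the other and refresh only the originally missing cells.
# Data and the single-predictor regressions are illustrative assumptions.

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def mice_two_columns(col_x, col_y, iterations=10):
    miss_x = [i for i, v in enumerate(col_x) if v is None]
    miss_y = [i for i, v in enumerate(col_y) if v is None]
    obs_x = [v for v in col_x if v is not None]
    obs_y = [v for v in col_y if v is not None]
    x = [v if v is not None else sum(obs_x) / len(obs_x) for v in col_x]
    y = [v if v is not None else sum(obs_y) / len(obs_y) for v in col_y]
    for _ in range(iterations):
        # Impute y from x, fitting only on rows where y was observed.
        rows = [i for i in range(len(y)) if i not in miss_y]
        a, b = fit_line([x[i] for i in rows], [y[i] for i in rows])
        for i in miss_y:
            y[i] = a + b * x[i]
        # Then impute x from y, fitting on rows where x was observed.
        rows = [i for i in range(len(x)) if i not in miss_x]
        c, d = fit_line([y[i] for i in rows], [x[i] for i in rows])
        for i in miss_x:
            x[i] = c + d * y[i]
    return x, y
```

Real MICE implementations add predictive-mean matching or posterior draws to preserve imputation uncertainty; this sketch keeps only the cycling structure.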
This research was an effort to select the best imputation method for missing upper-air temperature data over 24 standard pressure levels. We implemented four imputation techniques: inverse distance weighting, bilinear, natural, and nearest-neighbor interpolation. The performance indicators adopted in this research were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient, and coefficient of determination (R²). We randomly withheld 30% of the total samples (324 in all) and predicted them from the remaining 70%. Although all four interpolation methods performed well (RMSE and AME < 1), the bilinear method was the most accurate, with the smallest errors. Its RMSE remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6, and its AME was below 0.1 at all levels. A very strong correlation (>0.99) was found between actual and predicted air temperatures through this method, and the high coefficient of determination (0.99) indicates the best fit to the surface. We found similar results for natural interpolation, but after inspecting scatter plots for each month, its imputations appeared slightly less accurate than the bilinear method in certain months.
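For reference, bilinear interpolation, the method the study found most accurate, fills a point inside a grid cell from its four corner values, and RMSE is the headline error metric. A minimal sketch with made-up corner values:

```python
# Bilinear interpolation within one grid cell, plus the RMSE metric
# used to score imputations. Corner values below are illustrative.

def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Interpolate at (x, y) inside the cell [x0, x1] x [y0, y1];
    q00 is the value at (x0, y0), q10 at (x1, y0), q01 at (x0, y1),
    q11 at (x1, y1)."""
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = q00 * (1 - tx) + q10 * tx   # interpolate along y = y0
    top = q01 * (1 - tx) + q11 * tx      # interpolate along y = y1
    return bottom * (1 - ty) + top * ty  # then blend between the rows

def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted))
            / len(actual)) ** 0.5
```

At the center of a cell the result is simply the average of the four corners, which is a quick sanity check on any implementation.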
How many imputations are sufficient in multiple imputation? The answers given by different researchers vary from as few as 2 - 3 to as many as hundreds; perhaps no single number of imputations fits all situations. In this study, η, the minimally sufficient number of imputations, was determined from the relationship between m, the number of imputations, and ω, the standard error of imputation variances, using the 2012 National Ambulatory Medical Care Survey (NAMCS) Physician Workflow mail survey. Five variables of various value ranges, variances, and missing-data percentages were tested. For all variables tested, ω decreased as m increased. The m value above which the cost of further increasing m would outweigh the benefit of reducing ω was taken as η. This method has the potential to be used by anyone to determine the η that fits his or her own data situation.
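The abstract's central relationship, ω shrinking as m grows, can be reproduced with a seeded toy simulation. The normal distribution and its parameters below are our assumptions, not the NAMCS data:

```python
# Toy illustration: the spread (standard error) of between-imputation
# variance estimates shrinks as the number of imputations m grows.
import random
import statistics

def omega(m, reps=200, seed=0):
    """SD of the sample variance of m simulated imputed values,
    estimated over `reps` repetitions (distribution is assumed)."""
    rng = random.Random(seed)
    variances = [statistics.variance([rng.gauss(50.0, 5.0)
                                      for _ in range(m)])
                 for _ in range(reps)]
    return statistics.stdev(variances)
```

Comparing, say, `omega(3)` with `omega(30)` shows the monotone decrease the paper exploits to locate the point of diminishing returns, η.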
In analyzing data from clinical trials and longitudinal studies, missing values are a fundamental challenge, since missing data can introduce bias and lead to erroneous statistical inferences. To deal with this challenge, several imputation methods have been developed in the literature, the most commonly used being the complete-case method, mean imputation, last observation carried forward (LOCF), and multiple imputation (MI). In this paper, we conduct a simulation study to investigate the efficiency of these four typical imputation methods in a longitudinal data setting under missing completely at random (MCAR). We consider three levels of missingness, from a lower percentage of 5% to higher percentages of 30% and 50%. From this simulation study, we conclude that the LOCF method is more biased than the other three methods in most situations, while the MI method has the least bias and the best coverage probability. Thus, MI is the most effective imputation method in our MCAR simulation study.
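Two of the compared methods, LOCF and mean imputation, are simple enough to state directly; the series below is illustrative:

```python
# Two of the four compared imputation methods, on a toy series where
# None marks a missing observation.

def locf(series):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def mean_impute(series):
    """Replace every gap with the mean of the observed values."""
    obs = [v for v in series if v is not None]
    m = sum(obs) / len(obs)
    return [v if v is not None else m for v in series]
```

LOCF freezes each subject at the last seen value, which is exactly why it tends to bias trajectories that are still changing, the behavior the simulation study penalizes.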
To fully leverage "smart" transportation infrastructure data-stream investments, applications that provide real-time, meaningful, and actionable corridor performance metrics are needed. However, the presence of gaps in data streams can lead to significant application implementation challenges. To demonstrate and help address these challenges, a digital-twin smart-corridor application case study is presented with two primary research objectives: (1) explore the characteristics of volume data gaps on the case-study corridor, and (2) investigate the feasibility of prioritizing data streams for data imputation to drive the real-time application. For the first objective, a K-means clustering analysis is used to identify similarities and differences among data-gap patterns. The clustering analysis successfully identifies eight different data-loss patterns. Patterns vary in both the continuity and the density of data-gap occurrences, with time-dependent losses in several clusters. For the second objective, a temporal-neighboring interpolation approach for volume data imputation is explored. When investigating the use of temporal-neighboring interpolation imputations in the digital-twin application, performance is, in part, dependent on the combination of intersection approaches experiencing data loss, demand relative to capacity at individual locations, and the location of the loss along the corridor. The results indicate that these insights could be used to prioritize intersection approaches suitable for data imputation and to identify locations that require a more sensitive imputation methodology or improved maintenance and monitoring.
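The continuity and density characteristics that distinguish the eight data-loss patterns can be captured by simple features of the missing-value mask, for example gap run lengths and overall gap density, which a K-means step could then cluster. A sketch, with the mask encoding as our assumption:

```python
# Gap-pattern features from a boolean/0-1 missing mask: run lengths of
# consecutive gaps (continuity) and the overall missing fraction
# (density). These are plausible clustering inputs, not the paper's
# exact feature set.

def gap_features(missing_mask):
    runs, length = [], 0
    for is_missing in missing_mask:
        if is_missing:
            length += 1          # extend the current gap run
        elif length:
            runs.append(length)  # a gap run just ended
            length = 0
    if length:
        runs.append(length)      # trailing gap run
    return {"runs": runs,
            "longest_run": max(runs, default=0),
            "density": sum(missing_mask) / len(missing_mask)}
```

Feature vectors like `(longest_run, density)` per sensor-day would then feed a standard K-means routine to separate sporadic losses from block losses.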
Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning (ML) models. In this study, a regression-based missing-data imputation method using a light gradient boosting machine (LGBM) algorithm was employed to impute more than 60% of the missing data, establishing a radionuclide diffusion dataset containing 16 input features and 813 instances. The effective diffusion coefficient (D_e) was predicted using ten ML models. The predictive accuracy of the ensemble meta-models, namely LGBM-extreme gradient boosting (XGB) and LGBM-categorical boosting (CatB), surpassed that of the other ML models, with R² values of 0.94. The models were applied to predict the D_e values of EuEDTA⁻ and HCrO₄⁻ in saturated compacted bentonites at compactions ranging from 1200 to 1800 kg/m³, which were measured using a through-diffusion method. The generalization ability of the LGBM-XGB model surpassed that of LGBM-CatB in predicting the D_e of HCrO₄⁻. Shapley additive explanations identified total porosity as the most significant influencing factor. Additionally, the partial dependence plot analysis technique yielded clearer results than the univariate correlation analysis. This study provides a regression imputation technique to refine radionuclide diffusion datasets, offering deeper insights into the diffusion mechanism of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste.
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time-series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature-extraction modules: one to capture the implicit dependencies hidden in the raw data at multiple time scales, and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time-series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
Outcrop analogue studies play an important role in advancing our comprehension of reservoir architectures, offering insights into hidden reservoir rocks prior to drilling in a cost-effective manner. These studies contribute to the delineation of the three-dimensional geometry of geological structures, the characterization of petro- and thermo-physical properties, and the structural geological aspects of reservoir rocks. Nevertheless, several challenges, including inaccessible sampling sites, limited resources, and the dimensional constraints of different laboratories, hinder the acquisition of comprehensive datasets. In this study, we employ machine learning techniques to estimate missing data in a petrophysical dataset of fractured Variscan granites from the Cornubian Batholith in Southwest UK. The utilization of mean, k-nearest neighbors, and random forest imputation methods addresses the challenge of missing data, revealing the effectiveness of random forest imputation in providing realistic estimates. Subsequently, supervised classification models are trained to classify samples according to their pluton origins, with promising accuracy achieved by models trained with imputed values. Variable-importance ranking of the models showed that the choice of imputation method influences the inferred importance of specific petrophysical properties. While porosity (POR) and grain density (GD) were among the important variables, variables with a high missingness ratio were not among the top variables. This study demonstrates the value of machine learning in enhancing petrophysical datasets, while emphasizing the importance of careful method selection and model validation for reliable results. The findings contribute to a more informed decision-making process in geothermal exploration and reservoir characterization efforts, thereby demonstrating the potential of machine learning in advancing subsurface characterization techniques.
Landslide dam failures can cause significant damage to both society and ecosystems. Predicting the failure of these dams in advance enables early preventive measures, thereby minimizing potential harm. This paper aims to propose a fast and accurate model for predicting the longevity of landslide dams while also addressing the issue of missing data. Given the wide variation in the survival times of landslide dams, from mere minutes to several thousand years, predicting their longevity presents a considerable challenge. The study develops predictive models by considering key factors such as dam geometry, hydrodynamic conditions, materials, and triggering parameters. A dataset of 1045 landslide dam cases is analyzed, categorizing their longevity into three distinct groups: C1 (<1 month), C2 (1 month to 1 year), and C3 (>1 year). Multiple imputation and k-nearest-neighbor algorithms are used to handle missing data on geometric size, hydrodynamic conditions, materials, and triggers. Based on the imputed data, two predictive models are developed: a classification model for dam longevity categories and a regression model for precise longevity predictions. The classification model achieves an accuracy of 88.38%, while the regression model outperforms existing models with an R² value of 0.966. Two real-life landslide dam cases are used to validate the models, which show correct classification and small prediction errors. The longevity of landslide dams is jointly influenced by factors such as geometric size, hydrodynamic conditions, materials, and triggering events. Among these, geometric size has the greatest impact, followed by hydrodynamic conditions, materials, and triggers, as confirmed by variable importance in the model development.
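The k-nearest-neighbor imputation used for missing dam attributes can be sketched from scratch: find the k complete cases nearest in the observed columns and average their values in the missing column. Toy numbers; Euclidean distance is our assumed metric:

```python
# Minimal KNN imputation: fill one missing attribute of a record from
# the mean of the k nearest complete records, measuring distance on
# the remaining (observed) columns only.

def knn_impute(rows, target_idx, missing_row, k=2):
    """rows: complete records; missing_row has None at target_idx."""
    def dist(a, b):
        # Distance over all columns except the one being imputed.
        return sum((a[i] - b[i]) ** 2
                   for i in range(len(a)) if i != target_idx) ** 0.5
    nearest = sorted(rows, key=lambda r: dist(r, missing_row))[:k]
    filled = list(missing_row)
    filled[target_idx] = sum(r[target_idx] for r in nearest) / k
    return filled
```

In practice features would be scaled first so that, say, dam height in metres does not dominate a porosity fraction in the distance.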
Accurate lithofacies classification in low-permeability sandstone reservoirs remains challenging due to class imbalance in well-log data and the difficulty of modeling vertical lithological dependencies. Traditional core-based interpretation introduces subjectivity, while conventional deep learning models often fail to capture stratigraphic sequences effectively. To address these limitations, we propose a hybrid CNN-GRU framework that integrates spatial feature extraction and sequential modeling. Heat Kernel Imputation is applied to reconstruct missing log data, and Borderline SMOTE (BSMOTE) improves class balance by augmenting boundary-case minority samples. The CNN component extracts localized petrophysical features, and the GRU component captures depth-wise lithological transitions, enabling spatial-sequential feature fusion. Experiments on real-well datasets from tight sandstone reservoirs show that the proposed model achieves an average accuracy of 93.3% and a macro F1-score of 0.934. It outperforms baseline models, including RF (87.8%), GBDT (81.8%), CNN-only (87.5%), and GRU-only (86.1%). Leave-one-well-out validation further confirms strong generalization ability. These results demonstrate that the proposed approach effectively addresses data imbalance and enhances classification robustness, offering a scalable and automated solution for lithofacies interpretation under complex geological conditions.
The accurate prediction and analysis of emergencies in Urban Rail Transit Systems (URTS) are essential for the development of effective early-warning and prevention mechanisms. This study presents an integrated perception model designed to predict emergencies and analyze their causes based on historical unstructured emergency data. To address issues related to data structuredness and missing values, we employed label encoding and an Elastic Net Regularization-based Generative Adversarial Interpolation Network (ER-GAIN) for data structuring and imputation. Additionally, to mitigate the impact of imbalanced data on predictive performance, we introduced an Adaptive Boosting ensemble model (AdaBoost) to forecast the key features of emergencies, including event types and levels. We also utilized Information Gain (IG) to analyze and rank the causes of various significant emergencies. Experimental results indicate that, compared to baseline data imputation models, ER-GAIN improved the prediction accuracy of the key emergency features by 3.67% and 3.78%, respectively. Furthermore, AdaBoost enhanced accuracy by over 4.34% and 3.25% compared to baseline predictive models. Through causation analysis, we identified the critical causes of train-operation and fire incidents. The findings of this research will contribute to the establishment of early-warning and prevention mechanisms for emergencies in URTS, potentially leading to safer and more reliable URTS operations.
Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network, specifically a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including the Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV) critical-care patient databases and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared it with the Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively, an improvement of approximately 92% over LL-CNN and 91% over U-Net. These results show that the DRes-CNN-based imputation method outperforms current state-of-the-art models and establish it as a reliable solution for addressing missing data.
Accurate traffic flow prediction (TFP) is vital for efficient and sustainable transportation management and the development of intelligent traffic systems. However, missing data in real-world traffic datasets poses a significant challenge to maintaining prediction precision. This study introduces REPTF-TMDI, a novel method that combines a Reduced Error Pruning Tree Forest (REPTree Forest) with a newly proposed Time-based Missing Data Imputation (TMDI) approach. The REPTree Forest, an ensemble learning approach, is tailored for time-related traffic data to enhance predictive accuracy and support the evolution of sustainable urban mobility solutions. Meanwhile, the TMDI approach exploits temporal patterns to estimate missing values reliably whenever empty fields are encountered. The proposed method was evaluated using hourly traffic flow data from a major U.S. roadway spanning 2012-2018, incorporating temporal features (e.g., hour, day, month, year, weekday), a holiday indicator, and weather conditions (temperature, rain, snow, and cloud coverage). Experimental results demonstrated that the REPTF-TMDI method outperformed conventional imputation techniques across various missing-data ratios, achieving an average 11.76% improvement in correlation coefficient (R). Furthermore, the REPTree Forest achieved improvements of 68.62% in RMSE and 70.52% in MAE compared to existing state-of-the-art models. These findings highlight the method's ability to significantly boost traffic flow prediction accuracy, even in the presence of missing data, thereby contributing to the broader objectives of sustainable urban transportation systems.
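The abstract does not spell out the TMDI rules, but a time-based imputation in the same spirit might replace a missing hourly count with the mean of observed values at the same hour of day. A hypothetical sketch, not the authors' algorithm:

```python
# Hour-of-day imputation: a missing value is filled with the mean of
# all observed values sharing its hour. Assumes each hour that has a
# gap also has at least one observed value.

def impute_by_hour(records):
    """records: list of (hour, value-or-None); returns filled values
    in the original order."""
    by_hour = {}
    for hour, value in records:
        if value is not None:
            by_hour.setdefault(hour, []).append(value)
    return [value if value is not None
            else sum(by_hour[hour]) / len(by_hour[hour])
            for hour, value in records]
```

A fuller version would condition on weekday vs. weekend and holidays, mirroring the temporal features the study feeds its forest.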
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; as the missing rate increases, the performance of all methods decreases; and block missingness is the hardest to interpolate, followed by mixed missingness, while sporadic missingness is the easiest. On-site application results validate the proposed strategy's capability to achieve robust missing-value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing-data processing, offering more effective support for large-scale TBM monitoring datasets.
Some studies have suggested that early surgical treatment can effectively improve the prognosis of cervical spinal cord injury without radiological abnormality, but no research has focused on developing a prognostic model for this condition. This retrospective analysis included 43 patients with cervical spinal cord injury without radiological abnormality. Seven potential factors were assessed: age, sex, external force strength causing damage, duration of disease, degree of cervical spinal stenosis, Japanese Orthopaedic Association score, and physiological cervical curvature. A model was established using multiple binary logistic regression analysis and evaluated by concordant profiling and the area under the receiver operating characteristic curve; bootstrapping was used for internal validation. The prognostic model was as follows: logit(P) = -25.4545 + 21.2576 VALUE + 1.2160 SCORE - 3.4224 TIME, where VALUE refers to the Pavlov ratio indicating the extent of cervical spinal stenosis, SCORE refers to the Japanese Orthopaedic Association score (0-17) after the operation, and TIME refers to the disease duration (from injury to operation). The area under the receiver operating characteristic curve for all patients was 0.8941 (95% confidence interval, 0.7930-0.9952). Three factors in the predictive model were associated with worse patient outcomes: a great extent of cervical stenosis, a poor preoperative neurological status, and a long disease duration. Moreover, the prognosis was considered good when logit(P) ≥ -2.5105. Overall, the model displayed a certain clinical value.
This study was approved by the Biomedical Ethics Committee of the Second Affiliated Hospital of Xi'an Jiaotong University, China (approval number: 2018063) on May 8, 2018.
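The reported model can be coded directly from the abstract, using its coefficients and the -2.5105 cut-off; the conversion from logit to probability via the logistic function is standard, though not stated in the abstract, and the input units are as reported there.

```python
# The fitted prognostic model as published: coefficients and the
# -2.5105 decision threshold come from the abstract.
import math

def logit_p(value, score, time):
    """value: Pavlov ratio; score: postoperative JOA score (0-17);
    time: disease duration from injury to operation (paper's units)."""
    return -25.4545 + 21.2576 * value + 1.2160 * score - 3.4224 * time

def prognosis(value, score, time):
    lp = logit_p(value, score, time)
    return {"logit": lp,
            "probability": 1.0 / (1.0 + math.exp(-lp)),  # logistic link
            "good": lp >= -2.5105}                       # paper's cut-off
```

For example, a wide canal (Pavlov ratio 0.8), a postoperative JOA score of 14, and a short duration give a logit of about 5.15, well above the cut-off for a good prognosis.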
Based on a two-dimensional relation table, this paper studies the missing values in land-price sample data from Shunde District, Foshan City. GeoDa software was used to eliminate insignificant factors through stepwise regression analysis; NORM software was adopted to construct the multiple imputation models; and the EM algorithm and the data augmentation algorithm were applied to fit multiple linear regression equations, constructing five different filled datasets. Statistical analysis was performed on the imputed datasets to calculate the mean and variance of each, and weights were determined according to the differences. Finally, comprehensive integration was implemented to obtain the imputation expression for the missing values. Three missingness scenarios were examined: PRICE missing at a 5% deletion rate, PRICE missing at a 10% deletion rate, and both PRICE and CBD missing. In these scenarios, the new method produced values closer to the truth than the traditional multiple imputation methods at ratios of 75% to 25%, 62.5% to 37.5%, and 100% to 0%, respectively. The new method is therefore clearly better than the traditional multiple imputation methods, and the missing values it estimates bear certain reference value.
In the current study, an attempt is made to fill missing geopotential-height data over Pakistan and to identify the optimum interpolation method. Over the last thirteen years, geopotential-height values were missing over Pakistan; we try to fill these gaps with interpolation techniques. The techniques included bilinear interpolation (BI), nearest neighbor (NN), natural neighbor (NI), and inverse distance weighting (IDW). The imputations were judged on performance parameters including root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (Corr), and coefficient of determination (R²). The NN and IDW imputations were not precise or accurate, whereas the natural-neighbor and bilinear interpolations fitted the dataset immaculately. A good correlation was found for natural-neighbor imputations, which fit the geopotential-height surface well. The maximum and minimum root mean square error values ranged from ±5.10 to ±2.28 m, respectively, while the mean absolute error was near 1. Validation revealed that natural-neighbor interpolation produced more accurate results than BI. It can be concluded that natural-neighbor interpolation was the best-suited technique for filling missing geopotential-height datasets from the AQUA satellite.
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(−1/2), where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging from the order of 10^(−6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ from δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
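Rubin's relative-efficiency formula from the abstract is one line of code, which makes it easy to see why a small γ makes a handful of imputations look sufficient:

```python
# RE = (1 + gamma/m)^(-1/2), as stated in the abstract: gamma is the
# fraction of missing information, m the number of imputations.

def relative_efficiency(gamma, m):
    return (1.0 + gamma / m) ** -0.5
```

Even with γ = 0.3 (30% missing information), five imputations give RE ≈ 0.97, i.e. about 97% of the efficiency of infinitely many, which is exactly the calculation behind the traditional "m ≤ 5 suffices" advice the paper questions.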
Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. Accuracy of imputation from putative 60K and 600K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing cost from 24X to 144X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequencing depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in higher imputation accuracy than using randomly selected individuals as a reference population for re-sequencing. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed the highest accuracy at six-fold coverage for the scenarios used in this study. Conclusions: We comprehensively investigated the impacts of several key factors on genotype imputation. Generally, a higher sequencing cost gave higher imputation accuracy, but with a fixed sequencing cost an optimal imputation strategy can enhance the performance of whole-genome prediction (WGP) and GWAS. An optimal imputation strategy should comprehensively consider the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
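Imputation accuracy in genotype studies is commonly reported as the correlation between imputed and true genotypes; the abstract does not state its exact metric, so treat this plain Pearson correlation as an assumed illustration:

```python
# Pearson correlation between imputed and true genotype dosages
# (0/1/2 allele counts), a common accuracy metric; whether this is the
# paper's exact metric is an assumption.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

Values like the paper's 0.620 vs. 0.914 would come from applying such a metric per variant and averaging across the genome.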
Abstract: Multiple imputation compensates for missing data by producing multiple completed datasets via regression models, and is regarded as the solution to the long-standing shortcoming of univariate imputation. Univariate imputation fills values using only the specific column in which a cell is missing; multivariate imputation instead works simultaneously with all variables in all columns, whether missing or observed, and has emerged as a principal method for solving missing-data problems. Incomplete datasets analyzed before Multiple Imputation by Chained Equations (MICE) was available risked misdiagnosis; the results obtained were invalid and could not be counted on to yield reasonable conclusions. This article highlights why multiple imputation is needed and how MICE works, with a particular focus on a cyber-security dataset. Removing missing data from a dataset and replacing it is imperative for analyzing the data and building prediction models. A good imputation technique should therefore recover the missingness, which involves extracting good features. However, the widely used univariate imputation method does not impute missingness reasonably when values are too large and may thus lead to bias. We therefore propose an alternative imputation method that is efficient and removes the potential bias left after handling the missingness.
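The chained-equations idea described above can be sketched with scikit-learn's `IterativeImputer`, which implements a MICE-style procedure (one regression model per column, iterated). The toy data below are hypothetical, not the cyber-security dataset from the abstract.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Three correlated columns with ~10% values missing completely at random.
x1 = rng.normal(0, 1, 100)
x2 = 2 * x1 + rng.normal(0, 0.1, 100)
x3 = x1 - x2 + rng.normal(0, 0.1, 100)
X = np.column_stack([x1, x2, x3])
X[rng.random(X.shape) < 0.1] = np.nan

# sample_posterior=True makes each run a random draw from the predictive
# distribution, so different seeds give the multiple completed datasets
# that MICE pools for inference.
datasets = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
            for s in range(5)]
assert all(not np.isnan(d).any() for d in datasets)
```

Each of the five arrays is a complete dataset; downstream analyses are run on each and the results pooled with Rubin's rules.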
Abstract: This research set out to select the best imputation method for missing upper-air temperature data over 24 standard pressure levels. We implemented four imputation techniques: inverse distance weighting, bilinear, natural, and nearest-neighbor interpolation. Performance indicators were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient, and coefficient of determination (R²). We randomly withheld 30% of the 324 total samples and predicted them from the remaining 70%. Although all four interpolation methods performed well for air-temperature imputation (RMSE and AME below 1), the bilinear method was the most accurate, with the smallest errors. RMSE for the bilinear method remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6. AME was low (below 0.1) at all pressure levels with bilinear imputation. A very strong correlation (above 0.99) was found between actual and predicted air temperature with this method, and the high coefficient of determination (0.99) indicates the best fit to the surface. Natural interpolation produced similar results, but after inspecting scatter plots for each month, its imputations appeared slightly less accurate than the bilinear method in certain months.
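A bilinear fill of held-out grid points, in the spirit of the evaluation above, can be sketched with SciPy's `RegularGridInterpolator`. The temperature field, grid sizes, and hold-out fraction below are synthetic illustrations, not the study's data.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(0)

# Synthetic temperature field on a (pressure level x time) grid.
levels = np.linspace(10, 1000, 24)   # hPa, 24 standard levels
times = np.arange(30, dtype=float)   # e.g. 30 soundings

def field(p, t):
    return 288 - 0.05 * (1000 - p) + 0.1 * np.sin(t)

T = field(levels[:, None], times[None, :])
interp = RegularGridInterpolator((levels, times), T, method="linear")

# Treat 100 random interior points as "missing" and impute them bilinearly.
pts = np.column_stack([rng.uniform(10, 1000, 100), rng.uniform(0, 29, 100)])
pred = interp(pts)
truth = field(pts[:, 0], pts[:, 1])

rmse = float(np.sqrt(np.mean((pred - truth) ** 2)))
mae = float(np.mean(np.abs(pred - truth)))
assert rmse < 0.05 and mae < 0.05  # near-exact recovery on this smooth field
```

Comparing such RMSE/MAE values across interpolators (nearest, natural, IDW) is exactly the model-selection exercise the abstract describes.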
Abstract: How many imputations are sufficient in multiple imputation? The answer given by different researchers varies from as few as 2 - 3 to as many as hundreds. Perhaps no single number of imputations would fit all situations. In this study, η, the minimally sufficient number of imputations, was determined from the relationship between m, the number of imputations, and ω, the standard error of the imputation variances, using the 2012 National Ambulatory Medical Care Survey (NAMCS) Physician Workflow mail survey. Five variables with various value ranges, variances, and missing-data percentages were tested. For all variables tested, ω decreased as m increased. The m value above which the cost of a further increase in m would outweigh the benefit of reducing ω was taken as η. This method can potentially be used by anyone to determine the η that fits his or her own data situation.
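The monotone ω-versus-m relationship can be illustrated with a simplified Monte Carlo sketch: treat each imputed estimate as a random draw, estimate the between-imputation variance from m draws, and measure how the variability of that variance estimate shrinks as m grows. This is an idealized stand-in for the paper's ω, not its exact estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def omega(m, reps=2000):
    # Each replicate: the sample variance across m imputed estimates.
    # omega is the spread (standard deviation) of that variance estimate.
    draws = rng.normal(0.0, 1.0, size=(reps, m))
    return float(draws.var(axis=1, ddof=1).std())

ms = [2, 5, 10, 20, 50]
ws = [omega(m) for m in ms]

# omega falls as m rises; eta is the m where further gains stop paying off.
assert all(a > b for a, b in zip(ws, ws[1:]))
```

For a normal draw, the standard deviation of the sample variance is √(2/(m−1)), so the decline flattens quickly, which is the diminishing-returns behavior used to pick η.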
Abstract: In analyzing data from clinical trials and longitudinal studies, missing values are a fundamental challenge, since missing data can introduce bias and lead to erroneous statistical inferences. To deal with this challenge, several imputation methods have been developed in the literature, the most commonly used being the complete-case method, mean imputation, last observation carried forward (LOCF), and multiple imputation (MI). In this paper, we conduct a simulation study to investigate the efficiency of these four typical imputation methods in a longitudinal-data setting under missing completely at random (MCAR). We consider three missingness levels, from a lower percentage of 5% up to higher percentages of 30% and 50%. From this simulation study we conclude that the LOCF method is more biased than the other three methods in most situations, while MI has the least bias and the best coverage probability. We therefore conclude that MI is the most effective imputation method in our MCAR simulation study.
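The LOCF bias the simulation reports is easy to reproduce on toy longitudinal data: when the outcome trends upward, carrying the previous visit forward systematically underestimates the final-visit mean, while mean imputation stays unbiased under MCAR. Data below are synthetic, not the paper's simulation design.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# 200 subjects, 5 visits, outcome rising by 2 units per visit.
visits = np.arange(5)
y = 10 + 2 * visits + rng.normal(0, 1, size=(200, 5))
df = pd.DataFrame(y, columns=[f"v{t}" for t in visits])

# 30% MCAR missingness at the last visit.
mask = rng.random(200) < 0.3
obs = df.loc[~mask, "v4"]
true_mean = 10 + 2 * 4  # = 18

locf = df["v4"].where(~mask, df["v3"])        # carry visit 3 forward
mean_imp = df["v4"].where(~mask, obs.mean())  # fill with observed mean

locf_bias = abs(locf.mean() - true_mean)
mean_bias = abs(mean_imp.mean() - true_mean)
assert locf_bias > mean_bias  # LOCF drags the mean toward the earlier visit
```

Here LOCF's bias is roughly 0.3 × 2 = 0.6 units (missing fraction times the per-visit trend), matching the intuition behind the paper's conclusion.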
Funding: Supported in part by the City of Atlanta (CoA) under Research Project FC-9930, Smart Cities Traffic Congestion Mitigation Program, and in part by the National Center of Sustainable Transportation (NCST) under the NCST Dissertation Fund.
Abstract: To fully leverage "smart" transportation infrastructure data-stream investments, applications are needed that provide real-time, meaningful, and actionable corridor-performance metrics. However, the presence of gaps in data streams can lead to significant implementation challenges. To demonstrate and help address these challenges, a digital-twin smart-corridor application case study is presented with two primary research objectives: (1) explore the characteristics of volume data gaps on the case-study corridor, and (2) investigate the feasibility of prioritizing data streams for data imputation to drive the real-time application. For the first objective, a K-means clustering analysis is used to identify similarities and differences among data-gap patterns. The clustering analysis successfully identifies eight different data-loss patterns. Patterns vary in both the continuity and the density of data-gap occurrences, with time-dependent losses in several clusters. For the second objective, a temporal-neighboring interpolation approach for volume data imputation is explored. When temporal-neighboring interpolation is used in the digital-twin application, performance depends in part on the combination of intersection approaches experiencing data loss, the demand relative to capacity at individual locations, and the location of the loss along the corridor. The results indicate that these insights could be used to prioritize intersection approaches suitable for data imputation and to identify locations that require a more sensitive imputation methodology or improved maintenance and monitoring.
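Clustering gap patterns as described above can be sketched with scikit-learn's `KMeans` on per-day "fraction missing per hour" profiles. The two planted loss patterns (continuous overnight outages vs. sporadic daytime dropouts) and all sizes are hypothetical, not the corridor data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# 60 detector-days x 24 hours: fraction of missing volume records per hour.
hours = np.arange(24)
overnight = np.clip(rng.normal(0.8, 0.1, (30, 24)), 0, 1) * (hours < 6)
sporadic = rng.random((30, 24)) * 0.2
profiles = np.vstack([overnight, sporadic])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
a, b = set(km.labels_[:30]), set(km.labels_[30:])
assert len(a) == 1 and len(b) == 1 and a != b  # the two patterns separate
```

With real data, the number of clusters would be chosen by inspection or an index such as silhouette score; the abstract reports eight distinct loss patterns.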
Funding: Supported by the National Natural Science Foundation of China (Nos. 12475340 and 12375350), the Special Branch Project of South Taihu Lake, and the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y202456326).
Abstract: Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning (ML) models. In this study, a regression-based missing-data imputation method using a light gradient boosting machine (LGBM) algorithm was employed to impute more than 60% of the missing data, establishing a radionuclide diffusion dataset containing 16 input features and 813 instances. The effective diffusion coefficient (D_e) was predicted using ten ML models. The predictive accuracy of the ensemble meta-models, namely LGBM-extreme gradient boosting (XGB) and LGBM-categorical boosting (CatB), surpassed that of the other ML models, with R² values of 0.94. The models were applied to predict the D_e values of EuEDTA⁻ and HCrO₄⁻ in saturated compacted bentonites at compactions ranging from 1200 to 1800 kg/m³, measured using a through-diffusion method. The generalization ability of the LGBM-XGB model surpassed that of LGBM-CatB in predicting the D_e of HCrO₄⁻. Shapley additive explanations identified total porosity as the most significant influencing factor. Additionally, partial dependence plot analysis yielded clearer results in the univariate correlation analysis. This study provides a regression imputation technique to refine radionuclide diffusion datasets, offering deeper insights into the diffusion mechanism of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste.
Funding: Partially supported by the National Natural Science Foundation of China (62271485) and the SDHS Science and Technology Project (HS2023B044).
Abstract: Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time-series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and the explicit features of the data simultaneously. More specifically, we design two types of feature-extraction modules: one to capture the implicit dependencies hidden in the raw data at multiple time scales, and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time-series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
Abstract: Outcrop analogue studies play an important role in advancing our comprehension of reservoir architectures, offering insights into hidden reservoir rocks prior to drilling in a cost-effective manner. These studies contribute to the delineation of the three-dimensional geometry of geological structures, the characterization of petro- and thermo-physical properties, and the structural geological aspects of reservoir rocks. Nevertheless, several challenges, including inaccessible sampling sites, limited resources, and the dimensional constraints of different laboratories, hinder the acquisition of comprehensive datasets. In this study, we employ machine learning techniques to estimate missing data in a petrophysical dataset of fractured Variscan granites from the Cornubian Batholith in Southwest UK. The use of mean, k-nearest neighbors, and random forest imputation addresses the challenge of missing data and reveals the effectiveness of random forest imputation in providing realistic estimates. Subsequently, supervised classification models are trained to classify samples according to their pluton origins, with promising accuracy achieved by models trained on imputed values. Variable-importance ranking showed that the choice of imputation method influences the inferred importance of specific petrophysical properties. While porosity (POR) and grain density (GD) were among the important variables, variables with a high missingness ratio were not among the top variables. This study demonstrates the value of machine learning in enhancing petrophysical datasets, while emphasizing the importance of careful method selection and model validation for reliable results. The findings contribute to a more informed decision-making process in geothermal exploration and reservoir characterization efforts, demonstrating the potential of machine learning in advancing subsurface characterization techniques.
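The mean-versus-neighbor comparison above can be sketched with scikit-learn's `SimpleImputer` and `KNNImputer`: when two columns are correlated (as porosity and grain density typically are), neighbor-based imputation exploits the correlation that mean imputation ignores. The values and column names below are illustrative, not the Cornubian dataset.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(4)

# Hypothetical petrophysical table: porosity (%) and a grain-density proxy
# (g/cm^3) that decreases slightly with porosity.
por = rng.uniform(1, 10, 200)
gd = 2.65 - 0.005 * por + rng.normal(0, 0.01, 200)
X = np.column_stack([por, gd])

X_miss = X.copy()
idx = rng.choice(200, 40, replace=False)
X_miss[idx, 1] = np.nan  # knock out 20% of the grain-density column

knn = KNNImputer(n_neighbors=5).fit_transform(X_miss)
mean = SimpleImputer(strategy="mean").fit_transform(X_miss)

def holdout_rmse(a):
    return float(np.sqrt(np.mean((a[idx, 1] - X[idx, 1]) ** 2)))

knn_rmse, mean_rmse = holdout_rmse(knn), holdout_rmse(mean)
assert knn_rmse < mean_rmse  # neighbors exploit the porosity correlation
```

A random forest imputer (e.g. `IterativeImputer` with a forest estimator) extends the same hold-out comparison, which is how the study ranks the three methods.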
Funding: Supported by the National Natural Science Foundation of China (U42107189, 20A20111).
Abstract: Landslide dam failures can cause significant damage to both society and ecosystems. Predicting the failure of these dams in advance enables early preventive measures, thereby minimizing potential harm. This paper proposes a fast and accurate model for predicting the longevity of landslide dams while also addressing the issue of missing data. Given the wide variation in the survival times of landslide dams, from mere minutes to several thousand years, predicting their longevity presents a considerable challenge. The study develops predictive models by considering key factors such as dam geometry, hydrodynamic conditions, materials, and triggering parameters. A dataset of 1045 landslide dam cases is analyzed, with longevity categorized into three distinct groups: C1 (<1 month), C2 (1 month to 1 year), and C3 (>1 year). Multiple imputation and k-nearest neighbor algorithms are used to handle missing data on geometric size, hydrodynamic conditions, materials, and triggers. Based on the imputed data, two predictive models are developed: a classification model for dam-longevity categories and a regression model for precise longevity predictions. The classification model achieves an accuracy of 88.38%, while the regression model outperforms existing models with an R² value of 0.966. Two real-life landslide dam cases are used to validate the models, which show correct classification and small prediction errors. The longevity of landslide dams is jointly influenced by geometric size, hydrodynamic conditions, materials, and triggering events. Among these, geometric size has the greatest impact, followed by hydrodynamic conditions, materials, and triggers, as confirmed by variable importance during model development.
Funding: Supported by the Langfang Science and Technology Program with self-raised funds under the project "Application of Deep Learning-Based Joint Well-Seismic Analysis in Lithology Prediction" (Project No. 2024011013), and by the Science and Technology Innovation Program for Postgraduate Students in IDP, subsidized by the Fundamental Research Funds for the Central Universities, under the project "Research on CNN Algorithm Enhanced by Physical Information for Lithofacies Prediction in Tight Sandstone Reservoirs" (Project No. ZY20250328).
Abstract: Accurate lithofacies classification in low-permeability sandstone reservoirs remains challenging due to class imbalance in well-log data and the difficulty of modeling vertical lithological dependencies. Traditional core-based interpretation introduces subjectivity, while conventional deep learning models often fail to capture stratigraphic sequences effectively. To address these limitations, we propose a hybrid CNN-GRU framework that integrates spatial feature extraction and sequential modeling. Heat Kernel Imputation is applied to reconstruct missing log data, and Borderline SMOTE (BSMOTE) improves class balance by augmenting boundary-case minority samples. The CNN component extracts localized petrophysical features, and the GRU component captures depth-wise lithological transitions, enabling spatial-sequential feature fusion. Experiments on real-well datasets from tight sandstone reservoirs show that the proposed model achieves an average accuracy of 93.3% and a macro F1-score of 0.934. It outperforms baseline models, including RF (87.8%), GBDT (81.8%), CNN-only (87.5%), and GRU-only (86.1%). Leave-one-well-out validation further confirms strong generalization ability. These results demonstrate that the proposed approach effectively addresses data imbalance and enhances classification robustness, offering a scalable and automated solution for lithofacies interpretation under complex geological conditions.
Funding: Supported by the Fundamental Research Funds for the Central Universities (Grant No. 2024YJS096) and the National Natural Science Foundation of China (Grant Nos. 62433005, 62272036, 62173167).
Abstract: The accurate prediction and analysis of emergencies in urban rail transit systems (URTS) are essential for the development of effective early-warning and prevention mechanisms. This study presents an integrated perception model designed to predict emergencies and analyze their causes based on historical unstructured emergency data. To address issues related to data structuredness and missing values, we employed label encoding and an Elastic Net Regularization-based Generative Adversarial Interpolation Network (ER-GAIN) for data structuring and imputation. Additionally, to mitigate the impact of imbalanced data on predictive performance, we introduced an Adaptive Boosting ensemble model (AdaBoost) to forecast the key features of emergencies, including event types and levels. We also utilized information gain (IG) to analyze and rank the causes of various significant emergencies. Experimental results indicate that, compared to baseline data-imputation models, ER-GAIN improved the prediction accuracy of key emergency features by 3.67% and 3.78%, respectively. Furthermore, AdaBoost improved accuracy by over 4.34% and 3.25% compared to baseline predictive models. Through causation analysis, we identified the critical causes of train-operation and fire incidents. The findings of this research will contribute to the establishment of early-warning and prevention mechanisms for emergencies in URTS, potentially leading to safer and more reliable URTS operations.
Funding: Supported by the Intelligent System Research Group (ISysRG), Universitas Sriwijaya, funded by Competitive Research 2024.
Abstract: Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network: a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including the Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV) critical-care datasets and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared it with the Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively, representing improvements of approximately 92% over LL-CNN and 91% over U-Net. The results show that this DRes-CNN-based imputation method outperforms current state-of-the-art models and establish DRes-CNN as a reliable solution for addressing missing data.
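The quoted improvement percentages follow directly from the reported RMSE values; a quick arithmetic check:

```python
# Relative RMSE reduction of DRes-CNN (0.00006) over the two baselines
# reported in the abstract: LL-CNN (0.00075) and U-Net (0.00073).
def improvement(baseline, proposed):
    return 100 * (baseline - proposed) / baseline

print(round(improvement(0.00075, 0.00006), 1))  # 92.0 (vs LL-CNN)
print(round(improvement(0.00073, 0.00006), 1))  # 91.8 (vs U-Net)
```

These match the "approximately 92% and 91%" figures stated above.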
Abstract: Accurate traffic flow prediction (TFP) is vital for efficient and sustainable transportation management and the development of intelligent traffic systems. However, missing data in real-world traffic datasets poses a significant challenge to maintaining prediction precision. This study introduces REPTF-TMDI, a novel method that combines a Reduced Error Pruning Tree Forest (REPTree Forest) with a newly proposed Time-based Missing Data Imputation (TMDI) approach. The REPTree Forest, an ensemble learning approach, is tailored to time-related traffic data to enhance predictive accuracy and support the evolution of sustainable urban mobility solutions. Meanwhile, the TMDI approach exploits temporal patterns to estimate missing values reliably whenever empty fields are encountered. The proposed method was evaluated using hourly traffic flow data from a major U.S. roadway spanning 2012-2018, incorporating temporal features (e.g., hour, day, month, year, weekday), a holiday indicator, and weather conditions (temperature, rain, snow, and cloud coverage). Experimental results demonstrated that the REPTF-TMDI method outperformed conventional imputation techniques across various missing-data ratios, achieving an average 11.76% improvement in the correlation coefficient (R). Furthermore, REPTree Forest achieved improvements of 68.62% in RMSE and 70.52% in MAE compared to existing state-of-the-art models. These findings highlight the method's ability to significantly boost traffic flow prediction accuracy, even in the presence of missing data, thereby contributing to the broader objectives of sustainable urban transportation systems.
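A minimal sketch of a time-based fill in the spirit of TMDI (the exact TMDI algorithm is not specified here): replace a missing hourly volume with the mean of observed volumes sharing its hour-of-day and weekday. The series below is synthetic, not the U.S. roadway data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Four weeks of hourly volumes with a strong daily cycle.
idx = pd.date_range("2017-01-02", periods=24 * 28, freq="h")
true = 200 + 150 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 10, len(idx))
s = pd.Series(true, index=idx)

s_miss = s.copy()
holes = rng.choice(len(s), 60, replace=False)
s_miss.iloc[holes] = np.nan

# Group mean over (hour-of-day, weekday), with a global-mean fallback in
# case some (hour, weekday) cell has no observed value at all.
grp = s_miss.groupby([s_miss.index.hour, s_miss.index.weekday]).transform("mean")
filled = s_miss.fillna(grp).fillna(s_miss.mean())

err = float(np.sqrt(np.mean((filled.iloc[holes] - s.iloc[holes]) ** 2)))
assert not filled.isna().any() and err < 30
```

Because the fill respects the daily cycle, the error at the holes stays near the noise level rather than the full diurnal swing.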
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52409151), the Programme of the Shenzhen Key Laboratory of Green, Efficient and Intelligent Construction of Underground Metro Stations (Programme No. ZDSYS20200923105200001), and the Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0003G).
Abstract: Substantial advancements have been achieved in tunnel boring machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analyses and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored to TBM engineering sites. The results show that the K-nearest neighbors (KNN) and random forest (RF) algorithms achieve good interpolation results; that the interpolation performance of all methods decreases as the missing rate increases; and that block missingness is the hardest to interpolate, followed by mixed missingness, with sporadic missingness the easiest. On-site application results validate the proposed strategy's capability to achieve robust missing-value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing-data processing, offering more effective support for large-scale TBM monitoring datasets.
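The pattern effect reported above (block gaps harder than sporadic ones) is easy to demonstrate: linear interpolation over a smooth signal recovers 50 scattered gaps far better than one 50-point block, even though the missing rate is identical. The signal is synthetic, not a TBM monitoring log.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)

t = np.arange(500)
s = pd.Series(np.sin(t / 25.0))  # smooth stand-in for a monitoring channel

sporadic = s.copy()
sporadic.iloc[rng.choice(np.arange(1, 499), 50, replace=False)] = np.nan
block = s.copy()
block.iloc[200:250] = np.nan  # one contiguous outage of the same size

def fill_rmse(series):
    holes = series.isna()
    return float(np.sqrt(np.mean((series.interpolate()[holes] - s[holes]) ** 2)))

sp_rmse, bl_rmse = fill_rmse(sporadic), fill_rmse(block)
assert sp_rmse < bl_rmse  # sporadic gaps interpolate almost exactly
```

Inside the block there are no nearby anchors, so the interpolant cuts straight across the curve; this is why block-missing segments call for model-based methods (KNN, RF) or better sensor maintenance.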
Funding: Supported by the National Natural Science Foundation of China, No. 30672136 (to HPL).
Abstract: Some studies have suggested that early surgical treatment can effectively improve the prognosis of cervical spinal cord injury without radiological abnormality, but no research has focused on developing a prognostic model for this condition. This retrospective analysis included 43 patients with cervical spinal cord injury without radiological abnormality. Seven potential factors were assessed: age, sex, external force strength causing damage, duration of disease, degree of cervical spinal stenosis, Japanese Orthopaedic Association score, and physiological cervical curvature. A model was established using multiple binary logistic regression analysis and evaluated by concordant profiling and the area under the receiver operating characteristic curve; bootstrapping was used for internal validation. The prognostic model was: logit(P) = −25.4545 + 21.2576 × VALUE + 1.2160 × SCORE − 3.4224 × TIME, where VALUE is the Pavlov ratio indicating the extent of cervical spinal stenosis, SCORE is the Japanese Orthopaedic Association score (0-17) after the operation, and TIME is the disease duration (from injury to operation). The area under the receiver operating characteristic curve for all patients was 0.8941 (95% confidence interval, 0.7930-0.9952). Three factors in the model were associated with patient outcomes: a greater extent of cervical stenosis, a poorer preoperative neurological status, and a longer disease duration all worsened outcomes. Moreover, the prognosis was considered good when logit(P) ≥ −2.5105. Overall, the model displayed definite clinical value. This study was approved by the Biomedical Ethics Committee of the Second Affiliated Hospital of Xi'an Jiaotong University, China (approval number: 2018063) on May 8, 2018.
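A worked example of applying the published model: plug a patient's values into the logit, compare against the −2.5105 cut-off, and convert to a probability. The coefficients and threshold come from the abstract; the patient values below are hypothetical, with units as defined in the original study.

```python
import math

def logit_p(value, score, time):
    # logit(P) = -25.4545 + 21.2576*VALUE + 1.2160*SCORE - 3.4224*TIME
    return -25.4545 + 21.2576 * value + 1.2160 * score - 3.4224 * time

def probability(lp):
    # invert the logit: P = 1 / (1 + e^(-logit))
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical patient: Pavlov ratio 0.9, postoperative JOA score 14,
# disease duration 1 (study units).
lp = logit_p(0.9, 14, 1)            # about 7.28
good_prognosis = lp >= -2.5105      # the abstract's threshold
assert good_prognosis and probability(lp) > 0.99
```

Higher Pavlov ratio (less stenosis) and higher JOA score push the logit up, while longer disease duration pushes it down, matching the three risk factors the study identifies.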
Funding: This research was financially supported by FDCT No. 005/2018/A1, and also by the Guangdong Provincial Innovation and Entrepreneurship Training Program (Project No. 201713719017) and the College Students Innovation Training Program of Guangdong University of Science and Technology (Nos. 1711034, 1711080, and 1711088).
Abstract: Based on a two-dimensional relational table, this paper studies the missing values in sample data on land prices in Shunde District of Foshan City. GeoDa software was used to eliminate insignificant factors through stepwise regression analysis; NORM software was adopted to construct the multiple-imputation models; and the EM algorithm and the data-augmentation algorithm were applied to fit multiple linear regression equations, constructing five different filled datasets. Statistical analysis was performed on each imputed dataset to calculate its mean and variance, and weights were determined according to the differences. Finally, comprehensive integration was implemented to obtain the imputation expression for the missing values. Three missing-data cases were examined: the PRICE variable missing with a deletion rate of 5%, the PRICE variable missing with a deletion rate of 10%, and both the PRICE and CBD variables missing. In these cases, the new method was closer to the true values than the traditional multiple-imputation methods at ratios of 75% to 25%, 62.5% to 37.5%, and 100% to 0%, respectively. The new method is therefore clearly better than the traditional multiple-imputation methods, and the missing values estimated by the new method have definite reference value.
Abstract: In the current study, an attempt was made to fill missing geopotential-height data over Pakistan and to identify the optimum interpolation method. Over the last thirteen years, geopotential-height values were missing over Pakistan; these gaps were filled using interpolation techniques, including bilinear interpolation (BI), nearest neighbor (NN), natural neighbor (NI), and inverse distance weighting (IDW). The imputations were judged on performance parameters including root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (Corr), and coefficient of determination (R²). The NN and IDW imputations were neither precise nor accurate, whereas the natural-neighbor and bilinear interpolations fitted the dataset immaculately. A good correlation was found for the natural-neighbor imputations, which fit the geopotential-height surface well. The root mean square error ranged from ±2.28 to ±5.10 m, and the mean absolute error was near 1. Validation of the imputations revealed that natural-neighbor interpolation produced more accurate results than BI. It can be concluded that natural-neighbor interpolation was the best-suited technique for filling missing geopotential-height data from the AQUA satellite.
Abstract: In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^(−1/2), where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging on the order of 10^(−6) to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominating effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ using δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
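Rubin's relative efficiency formula is easy to evaluate directly, which shows why small γ makes even tiny m look sufficient:

```python
# RE in standard-error units: RE = (1 + gamma/m)**(-1/2).
def re(gamma, m):
    return (1 + gamma / m) ** -0.5

# With gamma = 0.5 (half the information missing), m = 5 already gives ~95%.
print(round(re(0.5, 5), 3))   # 0.953
# With the tiny gammas found in this study (1e-6 to 0.01), m = 2 is ~1.
print(round(re(0.01, 2), 4))  # 0.9975
```

Since RE is nearly 1 for any m when γ is tiny, the RE criterion cannot discriminate between small and large m, which is the paper's point about its suitability.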
Funding: Supported by the National Natural Science Foundation of China (31772556), the China Agricultural Research System (CARS-41-G03), the Science Innovation Project of Guangdong (2015A020209159), and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (second phase) under Grant No. U1501501, with technical support from the National Supercomputer Center in Guangzhou.
Abstract: Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive, less expensive approach to obtaining WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. The accuracy of imputation from putative 60K and 600K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing investment from 24X to 144X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in higher imputation accuracy than using randomly selected individuals as the reference population for re-sequencing. With a fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed the highest accuracy at six-fold coverage in the scenarios used in this study. Conclusions: We comprehensively investigated the impacts of several key factors on genotype imputation. Generally, a larger sequencing budget gave higher imputation accuracy, but with a fixed sequencing cost, an optimal imputation strategy can still enhance the performance of whole-genome prediction (WGP) and GWAS. An optimal imputation strategy should comprehensively consider the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.