期刊文献+
共找到77篇文章
< 1 2 4 >
每页显示 20 50 100
Prediction of radionuclide diffusion enabled by missing data imputation and ensemble machine learning
1
作者 Jun-Lei Tian Jia-Xing Feng +4 位作者 Jia-Cong Shen Lei Yao Jing-Yan Wang Tao Wu Yao-Lin Zhao 《Nuclear Science and Techniques》 2025年第10期47-61,共15页
Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of the machine learning(ML)models.In this study,regression-based missing data imputation method using a light grad... Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of the machine learning(ML)models.In this study,regression-based missing data imputation method using a light gradient boosting machine(LGBM)algorithm was employed to impute more than 60%of the missing data,establishing a radionuclide diffusion dataset containing 16 input features and 813 instances.The effective diffusion coefficient(D_(e))was predicted using ten ML models.The predictive accuracy of the ensemble meta-models,namely LGBM-extreme gradient boosting(XGB)and LGBM-categorical boosting(CatB),surpassed that of the other ML models,with R^(2)values of 0.94.The models were applied to predict the D_(e)values of EuEDTA^(−)and HCrO_(4)^(−)in saturated compacted bentonites at compactions ranging from 1200 to 1800 kg/m^(3),which were measured using a through-diffusion method.The generalization ability of the LGBM-XGB model surpassed that of LGB-CatB in predicting the D_(e)of HCrO_(4)^(−).Shapley additive explanations identified total porosity as the most significant influencing factor.Additionally,the partial dependence plot analysis technique yielded clearer results in the univariate correlation analysis.This study provides a regression imputation technique to refine radionuclide diffusion datasets,offering deeper insights into analyzing the diffusion mechanism of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste. 展开更多
关键词 Machine learning Radionuclide diffusion BENTONITE Regression imputation missing data Diffusion experiments
在线阅读 下载PDF
A spatiotemporal recurrent neural network for missing data imputation in tunnel monitoring
2
作者 Junchen Ye Yuhao Mao +3 位作者 Ke Cheng Xuyan Tan Bowen Du Weizhong Chen 《Journal of Rock Mechanics and Geotechnical Engineering》 2025年第8期4815-4826,共12页
Given the swift proliferation of structural health monitoring(SHM)technology within tunnel engineering,there is a demand on proficiently and precisely imputing the missing monitoring data to uphold the precision of di... Given the swift proliferation of structural health monitoring(SHM)technology within tunnel engineering,there is a demand on proficiently and precisely imputing the missing monitoring data to uphold the precision of disaster prediction.In contrast to other SHM datasets,the monitoring data specific to tunnel engineering exhibits pronounced spatiotemporal correlations.Nevertheless,most methodologies fail to adequately combine these types of correlations.Hence,the objective of this study is to develop spatiotemporal recurrent neural network(ST-RNN)model,which exploits spatiotemporal information to effectively impute missing data within tunnel monitoring systems.ST-RNN consists of two moduli:a temporal module employing recurrent neural network(RNN)to capture temporal dependencies,and a spatial module employing multilayer perceptron(MLP)to capture spatial correlations.To confirm the efficacy of the model,several commonly utilized methods are chosen as baselines for conducting comparative analyses.Furthermore,parametric validity experiments are conducted to illustrate the efficacy of the parameter selection process.The experimentation is conducted using original raw datasets wherein various degrees of continuous missing data are deliberately introduced.The experimental findings indicate that the ST-RNN model,incorporating both spatiotemporal modules,exhibits superior interpolation performance compared to other baseline methods across varying degrees of missing data.This affirms the reliability of the proposed model. 展开更多
关键词 MONITORING TUNNEL Machine learning INTERPOLATION missing data
在线阅读 下载PDF
A Novel Reduced Error Pruning Tree Forest with Time-Based Missing Data Imputation(REPTF-TMDI)for Traffic Flow Prediction
3
作者 Yunus Dogan Goksu Tuysuzoglu +4 位作者 Elife Ozturk Kiyak Bita Ghasemkhani Kokten Ulas Birant Semih Utku Derya Birant 《Computer Modeling in Engineering & Sciences》 2025年第8期1677-1715,共39页
Accurate traffic flow prediction(TFP)is vital for efficient and sustainable transportation management and the development of intelligent traffic systems.However,missing data in real-world traffic datasets poses a sign... Accurate traffic flow prediction(TFP)is vital for efficient and sustainable transportation management and the development of intelligent traffic systems.However,missing data in real-world traffic datasets poses a significant challenge to maintaining prediction precision.This study introduces REPTF-TMDI,a novel method that combines a Reduced Error Pruning Tree Forest(REPTree Forest)with a newly proposed Time-based Missing Data Imputation(TMDI)approach.The REP Tree Forest,an ensemble learning approach,is tailored for time-related traffic data to enhance predictive accuracy and support the evolution of sustainable urbanmobility solutions.Meanwhile,the TMDI approach exploits temporal patterns to estimate missing values reliably whenever empty fields are encountered.The proposed method was evaluated using hourly traffic flow data from a major U.S.roadway spanning 2012-2018,incorporating temporal features(e.g.,hour,day,month,year,weekday),holiday indicator,and weather conditions(temperature,rain,snow,and cloud coverage).Experimental results demonstrated that the REPTF-TMDI method outperformed conventional imputation techniques across various missing data ratios by achieving an average 11.76%improvement in terms of correlation coefficient(R).Furthermore,REPTree Forest achieved improvements of 68.62%in RMSE and 70.52%in MAE compared to existing state-of-the-art models.These findings highlight the method’s ability to significantly boost traffic flow prediction accuracy,even in the presence of missing data,thereby contributing to the broader objectives of sustainable urban transportation systems. 展开更多
关键词 Machine learning traffic flow prediction missing data imputation reduced error pruning tree(REPTree) sustainable transportation systems traffic management artificial intelligence
在线阅读 下载PDF
A Modified Deep Residual-Convolutional Neural Network for Accurate Imputation of Missing Data
4
作者 Firdaus Firdaus Siti Nurmaini +8 位作者 Anggun Islami Annisa Darmawahyuni Ade Iriani Sapitri Muhammad Naufal Rachmatullah Bambang Tutuko Akhiar Wista Arum Muhammad Irfan Karim Yultrien Yultrien Ramadhana Noor Salassa Wandya 《Computers, Materials & Continua》 2025年第2期3419-3441,共23页
Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attentio... Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network, specifically, a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV), which contain critical care patient data, and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared with Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively. This represented an improvement of approximately 92% over LL-CNN and 91% over U-Net. The results showed that this DRes-CNN-based imputation method outperforms current state-of-the-art models. These results established DRes-CNN as a reliable solution for addressing missing data. 展开更多
关键词 data imputation missing data deep learning deep residual convolutional neural network
在线阅读 下载PDF
Dynamic Relative Advantage-Driven Multi-Fault Synergistic Diagnosis Method for Motors under Imbalanced Missing Data Rates
5
作者 Zhenpeng Teng Xiaojian Yi Biao Wang 《Journal of Dynamics, Monitoring and Diagnostics》 2025年第2期111-120,共10页
Missing data handling is vital for multi-sensor information fusion fault diagnosis of motors to prevent the accuracy decay or even model failure,and some promising results have been gained in several current studies.T... Missing data handling is vital for multi-sensor information fusion fault diagnosis of motors to prevent the accuracy decay or even model failure,and some promising results have been gained in several current studies.These studies,however,have the following limitations:1)effective supervision is neglected for missing data across different fault types and 2)imbalance in missing rates among fault types results in inadequate learning during model training.To overcome the above limitations,this paper proposes a dynamic relative advantagedriven multi-fault synergistic diagnosis method to accomplish accurate fault diagnosis of motors under imbalanced missing data rates.Firstly,a cross-fault-type generalized synergistic diagnostic strategy is established based on variational information bottleneck theory,which is able to ensure sufficient supervision in handling missing data.Then,a dynamic relative advantage assessment technique is designed to reduce diagnostic accuracy decay caused by imbalanced missing data rates.The proposed method is validated using multi-sensor data from motor fault simulation experiments,and experimental results demonstrate its effectiveness and superiority in improving diagnostic accuracy and generalization under imbalanced missing data rates. 展开更多
关键词 data missing motor fault relative advantage synergistic diagnosis
在线阅读 下载PDF
A missing data processing method for dam deformation monitoring data using spatiotemporal clustering and support vector machine model 被引量:1
6
作者 Yan-tao Zhu Chong-shi Gu Mihai A.Diaconeasa 《Water Science and Engineering》 CSCD 2024年第4期417-424,共8页
Deformation monitoring is a critical measure for intuitively reflecting the operational behavior of a dam.However,the deformation monitoring data are often incomplete due to environmental changes,monitoring instrument... Deformation monitoring is a critical measure for intuitively reflecting the operational behavior of a dam.However,the deformation monitoring data are often incomplete due to environmental changes,monitoring instrument faults,and human operational errors,thereby often hindering the accurate assessment of actual deformation patterns.This study proposed a method for quantifying deformation similarity between measurement points by recognizing the spatiotemporal characteristics of concrete dam deformation monitoring data.It introduces a spatiotemporal clustering analysis of the concrete dam deformation behavior and employs the support vector machine model to address the missing data in concrete dam deformation monitoring.The proposed method was validated in a concrete dam project,with the model error maintaining within 5%,demonstrating its effectiveness in processing missing deformation data.This approach enhances the capability of early-warning systems and contributes to enhanced dam safety management. 展开更多
关键词 missing data recovery Concrete dam Deformation monitoring Spatiotemporal clustering Support vector machine model
在线阅读 下载PDF
Missing Data Imputation: A Comprehensive Review
7
作者 Majed Alwateer El-Sayed Atlam +2 位作者 Mahmoud Mohammed Abd El-Raouf Osama A. Ghoneim Ibrahim Gad 《Journal of Computer and Communications》 2024年第11期53-75,共23页
Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techn... Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined. Key considerations for selecting appropriate methods, based on data characteristics and research objectives, are discussed. The importance of evaluating imputation’s impact on subsequent analyses is emphasized. This synthesis of recent advancements and best practices provides researchers with a robust framework for effectively handling missing data, thereby improving the reliability of empirical findings across diverse disciplines. 展开更多
关键词 missing data Machine Learning PREDICTION Deep Learning IMPUTATION
在线阅读 下载PDF
Data-driven fault diagnosis of control valve with missing data based on modeling and deep residual shrinkage network 被引量:3
8
作者 Feng SUN He XU +1 位作者 Yu-han ZHAO Yu-dong ZHANG 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2022年第4期303-313,共11页
A control valve is one of the most widely used machines in hydraulic systems.However,it often works in harsh environments and failure occurs from time to time.An intelligent and robust control valve fault diagnosis is... A control valve is one of the most widely used machines in hydraulic systems.However,it often works in harsh environments and failure occurs from time to time.An intelligent and robust control valve fault diagnosis is therefore important for operation of the system.In this study,a fault diagnosis based on the mathematical model(MM)imputation and the modified deep residual shrinkage network(MDRSN)is proposed to solve the problem that data-driven models for control valves are susceptible to changing operating conditions and missing data.The multiple fault time-series samples of the control valve at different openings are collected for fault diagnosis to verify the effectiveness of the proposed method.The effects of the proposed method in missing data imputation and fault diagnosis are analyzed.Compared with random and k-nearest neighbor(KNN)imputation,the accuracies of MM-based imputation are improved by 17.87%and 21.18%,in the circumstances of a20.00%data missing rate at valve opening from 10%to 28%.Furthermore,the results show that the proposed MDRSN can maintain high fault diagnosis accuracy with missing data. 展开更多
关键词 Control valve missing data Fault diagnosis Mathematical model(MM) Deep residual shrinkage network(DRSN)
原文传递
Comparison of Missing Data Imputation Methods in Time Series Forecasting 被引量:3
9
作者 Hyun Ahn Kyunghee Sun Kwanghoon Pio Kim 《Computers, Materials & Continua》 SCIE EI 2022年第1期767-779,共13页
Time series forecasting has become an important aspect of data analysis and has many real-world applications.However,undesirable missing values are often encountered,which may adversely affect many forecasting tasks.I... Time series forecasting has become an important aspect of data analysis and has many real-world applications.However,undesirable missing values are often encountered,which may adversely affect many forecasting tasks.In this study,we evaluate and compare the effects of imputationmethods for estimating missing values in a time series.Our approach does not include a simulation to generate pseudo-missing data,but instead perform imputation on actual missing data and measure the performance of the forecasting model created therefrom.In an experiment,therefore,several time series forecasting models are trained using different training datasets prepared using each imputation method.Subsequently,the performance of the imputation methods is evaluated by comparing the accuracy of the forecasting models.The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective in reconstructing missing data and contributes positively to time series forecasting compared with other imputation methods. 展开更多
关键词 missing data imputation method time series forecasting LSTM
在线阅读 下载PDF
Generalized unscented Kalman filtering based radial basis function neural network for the prediction of ground radioactivity time series with missing data 被引量:2
10
作者 伍雪冬 王耀南 +1 位作者 刘维亭 朱志宇 《Chinese Physics B》 SCIE EI CAS CSCD 2011年第6期546-551,共6页
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random in... On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we firstly generalize two kinds of nonlinear filtering methods with random interruption failures in the observation based on the extended Kalman filtering (EKF) and the unscented Kalman filtering (UKF), which were shortened as GEKF and CUKF in this paper, respectively. Then the nonlinear filtering model is established by using the radial basis function neural network (RBFNN) prototypes and the network weights as state equation and the output of RBFNN to present the observation equation. Finally, we take the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing data to be zero without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict the ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of GUKF-based RBFNN accord well with the real ground radioactivity time series while the prediction results of GEKF-based RBFNN are divergent. 展开更多
关键词 prediction of time series with missing data random interruption failures in the observation neural network approximation
原文传递
Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite 被引量:4
11
作者 Muhammad Usman Saleem Sajid Rashid Ahmed 《Journal of Data Analysis and Information Processing》 2016年第3期132-146,共16页
This research was an effort to select best imputation method for missing upper air temperature data over 24 standard pressure levels. We have implemented four imputation techniques like inverse distance weighting, Bil... This research was an effort to select best imputation method for missing upper air temperature data over 24 standard pressure levels. We have implemented four imputation techniques like inverse distance weighting, Bilinear, Natural and Nearest interpolation for missing data imputations. Performance indicators for these techniques were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient and coefficient of determination ( R<sup>2</sup> ) adopted in this research. We randomly make 30% of total samples (total samples was 324) predictable from 70% remaining data. Although four interpolation methods seem good (producing <1 RMSE, AME) for imputations of air temperature data, but bilinear method was the most accurate with least errors for missing data imputations. RMSE for bilinear method remains <0.01 on all pressure levels except 1000 hPa where this value was 0.6. The low value of AME (<0.1) came at all pressure levels through bilinear imputations. Very strong correlation (>0.99) found between actual and predicted air temperature data through this method. The high value of the coefficient of determination (0.99) through bilinear interpolation method, tells us best fit to the surface. We have also found similar results for imputation with natural interpolation method in this research, but after investigating scatter plots over each month, imputations with this method seem to little obtuse in certain months than bilinear method. 展开更多
关键词 missing data Imputations Spatial Interpolation AQUA Satellite Upper Level Air Temperature AIRX3STML
在线阅读 下载PDF
Study on the Missing Data Mechanisms and Imputation Methods 被引量:1
12
作者 Abdullah Z. Alruhaymi Charles J. Kim 《Open Journal of Statistics》 2021年第4期477-492,共16页
The absence of some data values in any observed dataset has been a real hindrance to achieving valid results in statistical research. This paper</span></span><span><span><span style="fo... The absence of some data values in any observed dataset has been a real hindrance to achieving valid results in statistical research. This paper</span></span><span><span><span style="font-family:""> </span></span></span><span style="font-family:Verdana;"><span style="font-family:Verdana;"><span style="font-family:Verdana;">aim</span></span></span><span style="font-family:Verdana;"><span style="font-family:Verdana;"><span style="font-family:Verdana;">ed</span></span></span><span style="font-family:Verdana;"><span style="font-family:Verdana;"><span style="font-family:Verdana;"> at the missing data widespread problem faced by analysts and statisticians in academia and professional environments. Some data-driven methods were studied to obtain accurate data. Projects that highly rely on data face this missing data problem. And since machine learning models are only as good as the data used to train them, the missing data problem has a real impact on the solutions developed for real-world problems. Therefore, in this dissertation, there is an attempt to solve this problem using different mechanisms. This is done by testing the effectiveness of both traditional and modern data imputation techniques by determining the loss of statistical power when these different approaches are used to tackle the missing data problem. At the end of this research dissertation, it should be easy to establish which methods are the best when handling the research problem. It is recommended that using Multivariate Imputation by Chained Equations (MICE) for MAR missingness is the best approach </span></span></span><span style="font-family:Verdana;"><span style="font-family:Verdana;"><span style="font-family:Verdana;">to</span></span></span><span style="font-family:Verdana;"><span style="font-family:Verdana;"><span style="font-family:Verdana;"> dealing with missing data. 展开更多
关键词 missing data MECHANISMS Imputation Techniques MODELS
在线阅读 下载PDF
Improved interpolation method based on singular spectrum analysis iteration and its application to missing data recovery
13
作者 王辉赞 张韧 +2 位作者 刘巍 王桂华 金宝刚 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI 2008年第10期1351-1361,共11页
A novel interval quartering algorithm (IQA) is proposed to overcome insufficiency of the conventional singular spectrum analysis (SSA) iterative interpolation for selecting parameters including the number of the p... A novel interval quartering algorithm (IQA) is proposed to overcome insufficiency of the conventional singular spectrum analysis (SSA) iterative interpolation for selecting parameters including the number of the principal components and the embedding dimension. Based on the improved SSA iterative interpolation, interpolated test and comparative analysis are carried out to the outgoing longwave radiation daily data. The results show that IQA can find globally optimal parameters to the error curve with local oscillation, and has advantage of fast computing speed. The improved interpolation method is effective in the interpolation of missing data. 展开更多
关键词 singular spectrum analysis outgoing longwave radiation interpolation of missing data interval quartering algorithm
在线阅读 下载PDF
Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
14
作者 Joshua O. Mwaro Linda Chaba Collins Odhiambo 《Journal of Data Analysis and Information Processing》 2020年第3期110-133,共24页
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objec... In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in TB/HIV Co-infection setting. We employ both empirical data analysis and extensive simulation study to examine the effects of missing data, the accuracy, sensitivity, specificity and train and test error for different approaches. The novelty of this work hinges on the use of modern statistical learning algorithm when treating missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data imputations were performed, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (Complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. Results show complete cases only had a co-infection rate (95% Confidence Interval band) of 29% (25%, 33%), weighted method 27% (23%, 31%), likelihood-based approach 26% (24%, 28%) and multiple imputation approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data and failure to apply it, results to overestimation of HIV/TB co-infection rate by 8%. 展开更多
关键词 missing data HIV/TB Co-Infection IMPUTATION missing at Random Count data
暂未订购
Fraction of Missing Information (γ) at Different Missing Data Fractions in the 2012 NAMCS Physician Workflow Mail Survey
15
作者 Qiyuan Pan Rong Wei 《Applied Mathematics》 2016年第10期1057-1067,共11页
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)?1/2, where m is the number of imputations, lead... In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)?1/2, where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE be actually too small? The answer may lie with γ. In this research, γ was determined at the fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging in the order of 10?6 to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominating effects on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ using δ and that it may not be appropriate to use the γ-based RE to determine sufficient m. 展开更多
关键词 Multiple Imputation Fraction of missing Information (γ) Sufficient Number of Imputations missing data NAMCS
在线阅读 下载PDF
Improving Disease Prevalence Estimates Using Missing Data Techniques
16
作者 Elhadji Moustapha Seck Ngesa Owino Oscar Abdou Ka Diongue 《Open Journal of Statistics》 2016年第6期1110-1122,共14页
The prevalence of a disease in a population is defined as the proportion of people who are infected. Selection bias in disease prevalence estimates occurs if non-participation in testing is correlated with disease sta... The prevalence of a disease in a population is defined as the proportion of people who are infected. Selection bias in disease prevalence estimates occurs if non-participation in testing is correlated with disease status. Missing data are commonly encountered in most medical research. Unfortunately, they are often neglected or not properly handled during analytic procedures, and this may substantially bias the results of the study, reduce the study power, and lead to invalid conclusions. The goal of this study is to illustrate how to estimate prevalence in the presence of missing data. We consider a case where the variable of interest (response variable) is binary and some of the observations are missing and assume that all the covariates are fully observed. In most cases, the statistic of interest, when faced with binary data is the prevalence. We develop a two stage approach to improve the prevalence estimates;in the first stage, we use the logistic regression model to predict the missing binary observations and then in the second stage we recalculate the prevalence using the observed data and the imputed missing data. Such a model would be of great interest in research studies involving HIV/AIDS in which people usually refuse to donate blood for testing yet they are willing to provide other covariates. The prevalence estimation method is illustrated using simulated data and applied to HIV/AIDS data from the Kenya AIDS Indicator Survey, 2007. 展开更多
关键词 Disease Prevalence missing data Non-Participant Logistic Regression Model Prevalence Estimates HIV/AIDS
暂未订购
An Immediate Mortality Prediction Score That is Robust to Missing Data
17
作者 Tara M. Westover Marta B. Fernandes +1 位作者 M. Brandon Westover Sahar F. Zafar 《Open Journal of Statistics》 2025年第1期73-80,共8页
Objective: To develop an illness severity score that predicts short-term mortality, based on a small number of readily available measurements, and overcomes limitations of the SOFA score, for use in research involving... Objective: To develop an illness severity score that predicts short-term mortality, based on a small number of readily available measurements, and overcomes limitations of the SOFA score, for use in research involving large-scale electronic health records. Design: Retrospective analysis of electronic records for 37,739 adult inpatients. Setting: A single tertiary care hospital system from 2016-2022. Patients: 37,739 adult ICU patients. Interventions: IMPS was developed using logistic regression with the 6 SOFA components, age, sex and missingness indicators as predictors, and 10-day mortality as the outcome. This was compared with SOFA with median imputation. Measurements and Main Results: Discrimination was evaluated by AUROC, calibration by comparing predicted and observed mortality. IMPS showed excellent discrimination (AUROC 0.80) and calibration. It outperformed SOFA alone (AUROC 0.70) and with age/sex (0.74). Conclusions: By retaining continuous data, adding age, allowing for missingness, and optimizing weights based on empirical mortality association, IMPS achieved substantially better mortality prediction than the original SOFA. 展开更多
关键词 Critical Care missing data Electronic Health Records Illness Severity MORTALITY
在线阅读 下载PDF
Handling missing data in large-scale TBM datasets:Methods,strategies,and applications
18
作者 Haohan Xiao Ruilang Cao +5 位作者 Zuyu Chen Chengyu Hong Jun Wang Min Yao Litao Fan Teng Luo 《Intelligent Geoengineering》 2025年第3期109-125,共17页
Substantial advancements have been achieved in Tunnel Boring Machine(TBM)technology and monitoring systems,yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results.This s... Substantial advancements have been achieved in Tunnel Boring Machine(TBM)technology and monitoring systems,yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results.This study aims to investigate the issue of missing data in extensive TBM datasets.Through a comprehensive literature review,we analyze the mechanism of missing TBM data and compare different imputation methods,including statistical analysis and machine learning algorithms.We also examine the impact of various missing patterns and rates on the efficacy of these methods.Finally,we propose a dynamic interpolation strategy tailored for TBM engineering sites.The research results show that K-Nearest Neighbors(KNN)and Random Forest(RF)algorithms can achieve good interpolation results;As the missing rate increases,the interpolation effect of different methods will decrease;The interpolation effect of block missing is poor,followed by mixed missing,and the interpolation effect of sporadic missing is the best.On-site application results validate the proposed interpolation strategy's capability to achieve robust missing value interpolation effects,applicable in ML scenarios such as parameter optimization,attitude warning,and pressure prediction.These findings contribute to enhancing the efficiency of TBM missing data processing,offering more effective support for large-scale TBM monitoring datasets. 展开更多
关键词 Tunnel boring machine(TBM) missing data imputation Machine learning(ML) Time series interpolation data preprocessing Real-time data stream
在线阅读 下载PDF
A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation
19
作者 Thierry Mugenzi Cahit Perkgoz 《Computers, Materials & Continua》 2026年第1期1985-2005,共21页
Missing data presents a crucial challenge in data analysis,especially in high-dimensional datasets,where missing data often leads to biased conclusions and degraded model performance.In this study,we present a novel a... Missing data presents a crucial challenge in data analysis,especially in high-dimensional datasets,where missing data often leads to biased conclusions and degraded model performance.In this study,we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision.The proposed loss combines(i)a guided,masked mean squared error focusing on missing entries;(ii)a noise-aware regularization term to improve resilience against data corruption;and(iii)a variance penalty to encourage expressive yet stable reconstructions.We evaluate the proposed model across four missingness mechanisms,such as Missing Completely at Random,Missing at Random,Missing Not at Random,and Missing Not at Random with quantile censorship,under systematically varied feature counts,sample sizes,and missingness ratios ranging from 5%to 60%.Four publicly available real-world datasets(Stroke Prediction,Pima Indians Diabetes,Cardiovascular Disease,and Framingham Heart Study)were used,and the obtained results show that our proposed model consistently outperforms baseline methods,including traditional and deep learning-based techniques.An ablation study reveals the additive value of each component in the loss function.Additionally,we assessed the downstream utility of imputed data through classification tasks,where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios.The model demonstrates strong scalability and robustness,improving performance with larger datasets and higher feature counts.These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations,making it a promising solution for robust data recovery in clinical applications. 展开更多
关键词 missing data imputation autoencoder deep learning missing mechanisms
在线阅读 下载PDF
Random Subspace Sampling for Classification with Missing Data
20
作者 曹云浩 吴建鑫 《Journal of Computer Science & Technology》 SCIE EI CSCD 2024年第2期472-486,共15页
Many real-world datasets suffer from the unavoidable issue of missing values,and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large err... Many real-world datasets suffer from the unavoidable issue of missing values,and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large errors.In this paper,we propose a random subspace sampling method,RSS,by sampling missing items from the corresponding feature histogram distributions in random subspaces,which is effective and efficient at different levels of missing data.Unlike most established approaches,RSS does not train on fixed imputed datasets.Instead,we design a dynamic training strategy where the filled values change dynamically by resampling during training.Moreover,thanks to the sampling strategy,we design an ensemble testing strategy where we combine the results of multiple runs of a single model,which is more efficient and resource-saving than previous ensemble methods.Finally,we combine these two strategies with the random subspace method,which makes our estimations more robust and accurate.The effectiveness of the proposed RSS method is well validated by experimental studies. 展开更多
关键词 missing data random subspace neural network ensemble learning
原文传递
上一页 1 2 4 下一页 到第
使用帮助 返回顶部