77 articles found
Dynamic Relative Advantage-Driven Multi-Fault Synergistic Diagnosis Method for Motors under Imbalanced Missing Data Rates
1
Authors: Zhenpeng Teng Xiaojian Yi Biao Wang 《Journal of Dynamics, Monitoring and Diagnostics》 2025, Issue 2, pp. 111-120 (10 pages)
Missing data handling is vital for multi-sensor information fusion fault diagnosis of motors to prevent accuracy decay or even model failure, and some promising results have been obtained in several recent studies. These studies, however, have the following limitations: 1) effective supervision is neglected for missing data across different fault types, and 2) imbalance in missing rates among fault types results in inadequate learning during model training. To overcome these limitations, this paper proposes a dynamic relative advantage-driven multi-fault synergistic diagnosis method to accomplish accurate fault diagnosis of motors under imbalanced missing data rates. Firstly, a cross-fault-type generalized synergistic diagnostic strategy is established based on variational information bottleneck theory, which ensures sufficient supervision in handling missing data. Then, a dynamic relative advantage assessment technique is designed to reduce diagnostic accuracy decay caused by imbalanced missing data rates. The proposed method is validated using multi-sensor data from motor fault simulation experiments, and experimental results demonstrate its effectiveness and superiority in improving diagnostic accuracy and generalization under imbalanced missing data rates.
Keywords: data missing motor fault relative advantage synergistic diagnosis
A spatiotemporal recurrent neural network for missing data imputation in tunnel monitoring
2
Authors: Junchen Ye Yuhao Mao Ke Cheng Xuyan Tan Bowen Du Weizhong Chen 《Journal of Rock Mechanics and Geotechnical Engineering》 2025, Issue 8, pp. 4815-4826 (12 pages)
Given the swift proliferation of structural health monitoring (SHM) technology within tunnel engineering, there is a demand for proficiently and precisely imputing the missing monitoring data to uphold the precision of disaster prediction. In contrast to other SHM datasets, the monitoring data specific to tunnel engineering exhibits pronounced spatiotemporal correlations. Nevertheless, most methodologies fail to adequately combine these types of correlations. Hence, the objective of this study is to develop a spatiotemporal recurrent neural network (ST-RNN) model, which exploits spatiotemporal information to effectively impute missing data within tunnel monitoring systems. ST-RNN consists of two modules: a temporal module employing a recurrent neural network (RNN) to capture temporal dependencies, and a spatial module employing a multilayer perceptron (MLP) to capture spatial correlations. To confirm the efficacy of the model, several commonly utilized methods are chosen as baselines for conducting comparative analyses. Furthermore, parametric validity experiments are conducted to illustrate the efficacy of the parameter selection process. The experimentation is conducted using original raw datasets wherein various degrees of continuous missing data are deliberately introduced. The experimental findings indicate that the ST-RNN model, incorporating both spatiotemporal modules, exhibits superior interpolation performance compared to other baseline methods across varying degrees of missing data. This affirms the reliability of the proposed model.
Keywords: MONITORING TUNNEL Machine learning INTERPOLATION missing data
Prediction of radionuclide diffusion enabled by missing data imputation and ensemble machine learning
3
Authors: Jun-Lei Tian Jia-Xing Feng Jia-Cong Shen Lei Yao Jing-Yan Wang Tao Wu Yao-Lin Zhao 《Nuclear Science and Techniques》 2025, Issue 10, pp. 47-61 (15 pages)
Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning (ML) models. In this study, a regression-based missing data imputation method using a light gradient boosting machine (LGBM) algorithm was employed to impute more than 60% of the missing data, establishing a radionuclide diffusion dataset containing 16 input features and 813 instances. The effective diffusion coefficient (Dₑ) was predicted using ten ML models. The predictive accuracy of the ensemble meta-models, namely LGBM-extreme gradient boosting (XGB) and LGBM-categorical boosting (CatB), surpassed that of the other ML models, with R² values of 0.94. The models were applied to predict the Dₑ values of EuEDTA⁻ and HCrO₄⁻ in saturated compacted bentonites at compactions ranging from 1200 to 1800 kg/m³, which were measured using a through-diffusion method. The generalization ability of the LGBM-XGB model surpassed that of LGBM-CatB in predicting the Dₑ of HCrO₄⁻. Shapley additive explanations identified total porosity as the most significant influencing factor. Additionally, the partial dependence plot analysis technique yielded clearer results in the univariate correlation analysis. This study provides a regression imputation technique to refine radionuclide diffusion datasets, offering deeper insights into analyzing the diffusion mechanism of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste.
Keywords: Machine learning Radionuclide diffusion BENTONITE Regression imputation missing data Diffusion experiments
A Modified Deep Residual-Convolutional Neural Network for Accurate Imputation of Missing Data
4
Authors: Firdaus Firdaus Siti Nurmaini Anggun Islami Annisa Darmawahyuni Ade Iriani Sapitri Muhammad Naufal Rachmatullah Bambang Tutuko Akhiar Wista Arum Muhammad Irfan Karim Yultrien Yultrien Ramadhana Noor Salassa Wandya 《Computers, Materials & Continua》 2025, Issue 2, pp. 3419-3441 (23 pages)
Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network, specifically, a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV), which contain critical care patient data, and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared it with the Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively. This represents an improvement of approximately 92% over LL-CNN and 91% over U-Net. The results show that the DRes-CNN-based imputation method outperforms current state-of-the-art models and establish DRes-CNN as a reliable solution for addressing missing data.
Keywords: data imputation missing data deep learning deep residual convolutional neural network
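The improvement percentages quoted in this abstract can be checked from its three RMSE values. The sketch below assumes that "improvement" means relative RMSE reduction, (baseline - proposed) / baseline; that definition and the function name are assumptions, not taken from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which `proposed` lowers an error metric relative to `baseline`."""
    return (baseline - proposed) / baseline


# RMSE values as quoted in the abstract.
rmse_dres_cnn = 0.00006
rmse_ll_cnn = 0.00075
rmse_u_net = 0.00073

vs_ll_cnn = relative_reduction(rmse_ll_cnn, rmse_dres_cnn)
vs_u_net = relative_reduction(rmse_u_net, rmse_dres_cnn)

print(f"Reduction vs LL-CNN: {vs_ll_cnn:.1%}")
print(f"Reduction vs U-Net:  {vs_u_net:.1%}")
```

Under this definition the reduction is 92% against LL-CNN and just under 92% against U-Net, which matches the quoted 91% only if that figure was truncated rather than rounded.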
A Novel Reduced Error Pruning Tree Forest with Time-Based Missing Data Imputation(REPTF-TMDI)for Traffic Flow Prediction
5
Authors: Yunus Dogan Goksu Tuysuzoglu Elife Ozturk Kiyak Bita Ghasemkhani Kokten Ulas Birant Semih Utku Derya Birant 《Computer Modeling in Engineering & Sciences》 2025, Issue 8, pp. 1677-1715 (39 pages)
Accurate traffic flow prediction (TFP) is vital for efficient and sustainable transportation management and the development of intelligent traffic systems. However, missing data in real-world traffic datasets poses a significant challenge to maintaining prediction precision. This study introduces REPTF-TMDI, a novel method that combines a Reduced Error Pruning Tree Forest (REPTree Forest) with a newly proposed Time-based Missing Data Imputation (TMDI) approach. The REPTree Forest, an ensemble learning approach, is tailored for time-related traffic data to enhance predictive accuracy and support the evolution of sustainable urban mobility solutions. Meanwhile, the TMDI approach exploits temporal patterns to estimate missing values reliably whenever empty fields are encountered. The proposed method was evaluated using hourly traffic flow data from a major U.S. roadway spanning 2012-2018, incorporating temporal features (e.g., hour, day, month, year, weekday), a holiday indicator, and weather conditions (temperature, rain, snow, and cloud coverage). Experimental results demonstrated that the REPTF-TMDI method outperformed conventional imputation techniques across various missing data ratios, achieving an average 11.76% improvement in terms of correlation coefficient (R). Furthermore, REPTree Forest achieved improvements of 68.62% in RMSE and 70.52% in MAE compared to existing state-of-the-art models. These findings highlight the method's ability to significantly boost traffic flow prediction accuracy, even in the presence of missing data, thereby contributing to the broader objectives of sustainable urban transportation systems.
Keywords: Machine learning traffic flow prediction missing data imputation reduced error pruning tree(REPTree) sustainable transportation systems traffic management artificial intelligence
Cooperative Iteration Matching Method for Aligning Samples from Heterogeneous Industrial Datasets
6
Authors: LI Han SHI Guohong LIU Zhao ZHU Ping 《Journal of Shanghai Jiaotong University (Science)》 2025, Issue 2, pp. 375-384 (10 pages)
Industrial data mining usually deals with data from different sources. These heterogeneous datasets describe the same object from different views. However, samples from some of the datasets may be lost, so that the remaining samples no longer correspond one-to-one. Mismatched datasets caused by missing samples make the industrial data unusable for further machine learning. In order to align the mismatched samples, this article presents a cooperative iteration matching method (CIMM) based on a modified dynamic time warping (DTW). The proposed method regards the sequentially accumulated industrial data as time series. Mismatched samples are aligned by the DTW. In addition, dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient. Then a series of models are trained iteratively with the accumulated samples. Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to demonstrate the effectiveness and applicability of the proposed method.
Keywords: dynamic time warping mismatched samples sample alignment industrial data data missing
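For readers unfamiliar with the DTW step this abstract builds on, the following is a minimal sketch of classic dynamic time warping with a fixed band constraint. The fixed `window` is only a stand-in for the paper's dynamic constraints, whose details the abstract does not give, and the function name and sample data are illustrative.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW distance between two numeric sequences.

    `window` limits how far the warping path may drift from the diagonal
    (a fixed band; an assumption standing in for the paper's constraints).
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no path reaches the corner.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


reference = [1.0, 2.0, 3.0, 4.0, 5.0]
with_gap = [1.0, 2.0, 4.0, 5.0]  # the sample 3.0 was lost
print(dtw_distance(reference, with_gap, window=1))
```

Here the only unavoidable cost comes from the reference sample 3.0, which has no counterpart and must be matched to a neighbour one unit away.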
A missing data processing method for dam deformation monitoring data using spatiotemporal clustering and support vector machine model (Cited by: 1)
7
Authors: Yan-tao Zhu Chong-shi Gu Mihai A. Diaconeasa 《Water Science and Engineering》 CSCD 2024, Issue 4, pp. 417-424 (8 pages)
Deformation monitoring is a critical measure for intuitively reflecting the operational behavior of a dam. However, the deformation monitoring data are often incomplete due to environmental changes, monitoring instrument faults, and human operational errors, which often hinders the accurate assessment of actual deformation patterns. This study proposes a method for quantifying deformation similarity between measurement points by recognizing the spatiotemporal characteristics of concrete dam deformation monitoring data. It introduces a spatiotemporal clustering analysis of concrete dam deformation behavior and employs the support vector machine model to address the missing data in concrete dam deformation monitoring. The proposed method was validated in a concrete dam project, with the model error remaining within 5%, demonstrating its effectiveness in processing missing deformation data. This approach enhances the capability of early-warning systems and contributes to improved dam safety management.
Keywords: missing data recovery Concrete dam Deformation monitoring Spatiotemporal clustering Support vector machine model
A Practical Approach for Missing Wireless Sensor Networks Data Recovery
8
Authors: Song Xiaoxiang Guo Yan Li Ning Ren Bing 《China Communications》 SCIE CSCD 2024, Issue 5, pp. 202-217 (16 pages)
In wireless sensor networks (WSNs), the performance of related applications is highly dependent on the quality of the data collected. Unfortunately, missing data is almost inevitable in the process of data acquisition and transmission. Existing methods often rely on prior information such as low-rank characteristics or spatiotemporal correlation when recovering missing WSN data. However, in realistic application scenarios, it is very difficult to obtain this prior information from incomplete data sets. Therefore, we aim to recover the missing WSN data effectively without depending on prior information. By designing a measurement matrix that can capture the position of missing data, together with a sparse representation matrix, a compressive sensing (CS) based missing data recovery model is established. Then, we design a comparison standard to select the best sparse representation basis and introduce average cross-correlation to examine the rationality of the established model. Furthermore, an improved fast matching pursuit algorithm is proposed to solve the model. Simulation results show that the proposed method can effectively recover the missing WSN data.
Keywords: average cross correlation matching pursuit missing data wireless sensor networks
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
9
Authors: Meiyin Wang Wanzhou Ye 《Advances in Pure Mathematics》 2024, Issue 4, pp. 214-227 (14 pages)
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
Keywords: High-Dimensional Covariance Matrix missing data Sub-Gaussian Noise Optimal Estimation
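The general idea of hard thresholding a covariance estimate built from incomplete samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: it uses a plain pairwise-complete covariance, applies no noise correction, keeps the diagonal unthresholded, and takes a hand-picked threshold `tau`. It also assumes every pair of variables is observed together in at least one row.

```python
def pairwise_covariance(rows):
    """Covariance matrix from rows that use None for missing entries.

    Each entry (j, k) is averaged over the rows where both variables are observed.
    """
    p = len(rows[0])
    means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            pairs = [(r[j], r[k]) for r in rows
                     if r[j] is not None and r[k] is not None]
            cov[j][k] = sum((x - means[j]) * (y - means[k]) for x, y in pairs) / len(pairs)
    return cov


def hard_threshold(cov, tau):
    """Set off-diagonal entries with |value| <= tau to zero (diagonal kept by choice)."""
    p = len(cov)
    return [[cov[j][k] if j == k or abs(cov[j][k]) > tau else 0.0
             for k in range(p)] for j in range(p)]


samples = [
    [1.0, 1.1, 0.5],
    [2.0, 1.9, None],
    [3.0, None, 0.4],
    [4.0, 4.2, 0.6],
    [5.0, 5.0, 0.5],
]
estimate = hard_threshold(pairwise_covariance(samples), tau=0.1)
for row in estimate:
    print([round(v, 3) for v in row])
```

In this toy data the first two variables are strongly related and survive thresholding, while the weak associations with the third variable are set to zero, which is the behaviour that makes thresholding attractive when the true covariance matrix is sparse.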
Missing Data Imputation: A Comprehensive Review
10
Authors: Majed Alwateer El-Sayed Atlam Mahmoud Mohammed Abd El-Raouf Osama A. Ghoneim Ibrahim Gad 《Journal of Computer and Communications》 2024, Issue 11, pp. 53-75 (23 pages)
Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined. Key considerations for selecting appropriate methods, based on data characteristics and research objectives, are discussed. The importance of evaluating imputation's impact on subsequent analyses is emphasized. This synthesis of recent advancements and best practices provides researchers with a robust framework for effectively handling missing data, thereby improving the reliability of empirical findings across diverse disciplines.
Keywords: missing data Machine Learning PREDICTION Deep Learning IMPUTATION
Comparison of two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials: a simulation study
11
Authors: Jing-Bo Zhai Tian-Ci Guo Wei-Jie Yu 《Medical Data Mining》 2024, Issue 1, pp. 10-15 (6 pages)
Background: Missing data frequently occur in clinical studies. Due to the development of precision medicine, there is an increased interest in N-of-1 trials. Bayesian models are one of the main statistical methods for analyzing the data of N-of-1 trials. This simulation study aimed to compare two statistical methods for handling missing values of quantitative data in Bayesian N-of-1 trials. Methods: The simulated data of N-of-1 trials with different coefficients of autocorrelation, effect sizes and missing ratios were obtained with the SAS 9.1 system. The missing values were filled with mean filling and regression filling, respectively, under different coefficients of autocorrelation, effect sizes and missing ratios using SPSS 25.0 software. Bayesian models were built to estimate the posterior means with WinBUGS 14 software. Results: When the missing ratio is relatively small, e.g. 5%, missing values have relatively little effect on the results. Therapeutic effects may be underestimated when the coefficient of autocorrelation increases and no filling is used. However, they may be overestimated when mean or regression filling is used, and the results after mean filling are closer to the actual effect than those after regression filling. In the case of a moderate missing ratio, the estimated effect after mean filling is closer to the actual effect compared to regression filling. When a large missing ratio (20%) occurs, missing data can lead to significant underestimation of the effect. In this case, the estimated effect after regression filling is closer to the actual effect compared to mean filling. Conclusion: Missing data can affect the estimated therapeutic effects using Bayesian models in N-of-1 trials. The present study suggests that mean filling can be used when the missing ratio is ≤10%. Otherwise, regression filling may be preferable.
Keywords: N-of-1 trial BAYESIAN missing data simulation study
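The two filling strategies compared in this study can be illustrated with a small sketch. It assumes that regression filling means predicting the missing outcome from a simple least-squares line on a single predictor (here a time index); the abstract does not state which predictors were used in SPSS, so the predictor, function names and data are all illustrative.

```python
def mean_fill(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def regression_fill(x, y):
    """Replace None entries in y with predictions from a least-squares line of y on x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


time_index = [1, 2, 3, 4, 5]
outcome = [2.0, 4.0, 6.0, None, 10.0]  # the fourth measurement is missing

print(mean_fill(outcome))
print(regression_fill(time_index, outcome))
```

With a trending series such as this one, mean filling pulls the missing point toward the overall average (5.5) while regression filling follows the trend (8.0), which shows why the two methods can bias an estimated effect in different directions.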
Comparison of Missing Data Imputation Methods in Time Series Forecasting (Cited by: 3)
12
Authors: Hyun Ahn Kyunghee Sun Kwanghoon Pio Kim 《Computers, Materials & Continua》 SCIE EI 2022, Issue 1, pp. 767-779 (13 pages)
Time series forecasting has become an important aspect of data analysis and has many real-world applications. However, undesirable missing values are often encountered, which may adversely affect many forecasting tasks. In this study, we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Our approach does not include a simulation to generate pseudo-missing data, but instead performs imputation on actual missing data and measures the performance of the forecasting model created therefrom. In an experiment, therefore, several time series forecasting models are trained using different training datasets prepared using each imputation method. Subsequently, the performance of the imputation methods is evaluated by comparing the accuracy of the forecasting models. The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective in reconstructing missing data and contributes positively to time series forecasting compared with other imputation methods.
Keywords: missing data imputation method time series forecasting LSTM
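As an illustration of the k-nearest neighbor imputation that this study found most effective, here is a minimal pure-Python sketch. It is a generic KNN imputer over feature rows, not the authors' pipeline: distances are root-mean-square differences over the columns two rows share, and the value of `k` and the sample data are assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other in rows:
                if other is row or other[j] is None:
                    continue  # a donor must have the target column observed
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda item: item[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


readings = [
    [1.0, 10.0],
    [2.0, 20.0],
    [2.1, None],  # second feature missing
    [9.0, 90.0],
]
print(knn_impute(readings, k=2)[2])
```

For a time series, each row would typically hold a timestamp's features (for example lagged values or readings from related sensors), so that the nearest rows are the most similar time steps.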
Data-driven fault diagnosis of control valve with missing data based on modeling and deep residual shrinkage network (Cited by: 3)
13
Authors: Feng SUN He XU Yu-han ZHAO Yu-dong ZHANG 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 SCIE EI CAS CSCD 2022, Issue 4, pp. 303-313 (11 pages)
A control valve is one of the most widely used machines in hydraulic systems. However, it often works in harsh environments and failure occurs from time to time. An intelligent and robust control valve fault diagnosis is therefore important for operation of the system. In this study, a fault diagnosis based on mathematical model (MM) imputation and the modified deep residual shrinkage network (MDRSN) is proposed to solve the problem that data-driven models for control valves are susceptible to changing operating conditions and missing data. Multiple fault time-series samples of the control valve at different openings are collected for fault diagnosis to verify the effectiveness of the proposed method. The effects of the proposed method on missing data imputation and fault diagnosis are analyzed. Compared with random and k-nearest neighbor (KNN) imputation, the accuracies of MM-based imputation are improved by 17.87% and 21.18%, respectively, at a 20.00% data missing rate with valve openings from 10% to 28%. Furthermore, the results show that the proposed MDRSN can maintain high fault diagnosis accuracy with missing data.
Keywords: Control valve missing data Fault diagnosis Mathematical model(MM) Deep residual shrinkage network(DRSN)
Generalized unscented Kalman filtering based radial basis function neural network for the prediction of ground radioactivity time series with missing data (Cited by: 2)
14
Authors: 伍雪冬 王耀南 刘维亭 朱志宇 《Chinese Physics B》 SCIE EI CAS CSCD 2011, Issue 6, pp. 546-551 (6 pages)
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods to handle random interruption failures in the observation, based on the extended Kalman filtering (EKF) and the unscented Kalman filtering (UKF), abbreviated in this paper as GEKF and GUKF, respectively. Then the nonlinear filtering model is established by using the radial basis function neural network (RBFNN) prototypes and the network weights as the state equation and the output of the RBFNN as the observation equation. Finally, we treat the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict the ground radioactivity time series with missing data. Experimental results demonstrate that the predictions of the GUKF-based RBFNN accord well with the real ground radioactivity time series, while the predictions of the GEKF-based RBFNN diverge.
Keywords: prediction of time series with missing data random interruption failures in the observation neural network approximation
Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite (Cited by: 4)
15
Authors: Muhammad Usman Saleem Sajid Rashid Ahmed 《Journal of Data Analysis and Information Processing》 2016, Issue 3, pp. 132-146 (16 pages)
This research was an effort to select the best imputation method for missing upper air temperature data over 24 standard pressure levels. We implemented four imputation techniques for missing data: inverse distance weighting, bilinear, natural and nearest interpolation. The performance indicators adopted for these techniques were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient and coefficient of determination (R²). We randomly withheld 30% of the samples (324 samples in total) and predicted them from the remaining 70% of the data. Although all four interpolation methods appear suitable (producing RMSE and AME < 1) for imputation of air temperature data, the bilinear method was the most accurate, with the lowest errors. RMSE for the bilinear method remained < 0.01 on all pressure levels except 1000 hPa, where it was 0.6. AME was low (< 0.1) at all pressure levels with bilinear imputation. A very strong correlation (> 0.99) was found between actual and predicted air temperature data with this method. The high coefficient of determination (0.99) obtained with bilinear interpolation indicates the best fit to the surface. We also found similar results for imputation with the natural interpolation method, but after investigating scatter plots for each month, imputations with this method seem a little obtuse in certain months compared with the bilinear method.
Keywords: missing data Imputations Spatial Interpolation AQUA Satellite Upper Level Air Temperature AIRX3STML
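One of the four techniques named in this abstract, inverse distance weighting, together with the RMSE indicator, can be sketched as below. The power parameter of 2, the planar coordinates and the sample values are assumptions for illustration; the abstract does not give the study's actual settings.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` from ((x, y), value) pairs."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a known point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
estimate = idw(stations, (1.0, 0.0))
print(estimate)
print(rmse([1.0, 2.0], [1.0, 4.0]))
```

In a hold-out evaluation like the one described, `idw` would be called for each withheld sample using only the retained samples as `known`, and `rmse` would then compare the estimates with the withheld true values.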
Study on the Missing Data Mechanisms and Imputation Methods (Cited by: 1)
16
Authors: Abdullah Z. Alruhaymi Charles J. Kim 《Open Journal of Statistics》 2021, Issue 4, pp. 477-492 (16 pages)
The absence of some data values in any observed dataset has been a real hindrance to achieving valid results in statistical research. This paper addresses the widespread missing data problem faced by analysts and statisticians in academia and professional environments. Some data-driven methods were studied to obtain accurate data. Projects that rely heavily on data face this missing data problem. Since machine learning models are only as good as the data used to train them, the missing data problem has a real impact on the solutions developed for real-world problems. Therefore, in this dissertation, there is an attempt to solve this problem using different mechanisms. This is done by testing the effectiveness of both traditional and modern data imputation techniques, determining the loss of statistical power when these different approaches are used to tackle the missing data problem. At the end of this research dissertation, it should be easy to establish which methods are best for handling the research problem. Multivariate Imputation by Chained Equations (MICE) is recommended as the best approach to dealing with missing data under MAR missingness.
Keywords: missing data MECHANISMS Imputation Techniques MODELS
RAD-seq data reveals robust phylogeny and morphological evolutionary history of Rhododendron (Cited by: 1)
17
作者 Yuanting Shen Gang Yao +6 位作者 Yunfei Li Xiaoling Tian Shiming Li Nian Wang Chengjun Zhang Fei Wang Yongpeng Ma 《Horticultural Plant Journal》 SCIE CAS CSCD 2024年第3期866-878,共13页
Rhododendron is famous for its high ornamental value. However, the genus is taxonomically difficult and the relationships within Rhododendron remain unresolved. In addition, the origin of key morphological characters with high horticultural value needs to be explored. Both problems largely hinder the utilization of germplasm resources. Most studies have attempted to disentangle the phylogeny of Rhododendron but used only a few genomic markers and lacked large-scale sampling, resulting in low clade support and contradictory phylogenetic signals. Here, we used restriction-site associated DNA sequencing (RAD-seq) data and morphological traits for 144 species of Rhododendron, representing all subgenera and most sections and subsections of this species-rich genus, to decipher its intricate evolutionary history and reconstruct ancestral states. Our results revealed high resolution at the subgenus and section levels of Rhododendron based on RAD-seq data. Both the optimal phylogenetic tree and the split tree recovered five lineages within Rhododendron. Subg. Therorhodion (clade Ⅰ) formed the basal lineage. Subg. Tsutsusi and Azaleastrum formed clade Ⅱ and were sister to each other. Clade Ⅲ included all scaly rhododendron species. Subg. Pentanthera (clade Ⅳ) formed a sister group to Subg. Hymenanthes (clade Ⅴ). The ancestral state reconstruction showed that the Rhododendron ancestor was a deciduous woody plant with a terminal inflorescence, ten stamens, leaf blades without scales, and a broadly funnelform corolla that was pink or purple. This study shows that RAD-seq data can resolve the evolutionary history of Rhododendron, given the high clade support of the resulting phylogenetic tree. It also provides an example of resolving discordant signals in phylogenetic trees and demonstrates the feasibility of applying RAD-seq with large amounts of missing data to decipher intricate evolutionary relationships. Additionally, the reconstructed ancestral states of six important characters provide insights into the innovation of key characters in Rhododendron.
Keywords: Rhododendron; RAD-seq; missing data; Quartet sampling (QS); Ancestral state reconstruction
Improved interpolation method based on singular spectrum analysis iteration and its application to missing data recovery
18
Authors: 王辉赞, 张韧, 刘巍, 王桂华, 金宝刚. Applied Mathematics and Mechanics (English Edition) (SCIE, EI), 2008, No. 10, pp. 1351-1361 (11 pages)
A novel interval quartering algorithm (IQA) is proposed to overcome the shortcomings of conventional singular spectrum analysis (SSA) iterative interpolation in selecting parameters, including the number of principal components and the embedding dimension. Based on the improved SSA iterative interpolation, interpolation tests and comparative analyses are carried out on daily outgoing longwave radiation data. The results show that IQA can find globally optimal parameters on an error curve with local oscillations and has the advantage of fast computation. The improved interpolation method is effective in the interpolation of missing data.
Keywords: singular spectrum analysis; outgoing longwave radiation; interpolation of missing data; interval quartering algorithm
Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
19
Authors: Joshua O. Mwaro, Linda Chaba, Collins Odhiambo. Journal of Data Analysis and Information Processing, 2020, No. 3, pp. 110-133 (24 pages)
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in the TB/HIV co-infection setting. We employ both empirical data analysis and an extensive simulation study to examine the effects of missing data and the accuracy, sensitivity, specificity, and training and test error of the different approaches. The novelty of this work hinges on the use of modern statistical learning algorithms when treating missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data imputations were performed, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. Results show that the complete-case analysis gave a co-infection rate (95% confidence interval) of 29% (25%, 33%), the weighted method 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%), and the multiple imputation (MI) approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data, and failure to apply it results in overestimation of the HIV/TB co-infection rate by 8%.
Keywords: missing data; HIV/TB Co-Infection; Imputation; Missing at Random; Count data
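The simulation design described in the abstract above (drawing a fixed share of values at random and replacing them with missing entries) can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `mask_at_random`, the fixed seed, and the toy data are assumptions, and the sketch masks a single list of values completely at random.

```python
import random


def mask_at_random(values, rate, seed=0):
    """Return a copy of `values` with a fraction `rate` of entries set to None.

    Hypothetical helper: positions are chosen uniformly at random,
    so the missingness does not depend on the values themselves.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


# Toy data standing in for one study variable (assumption for illustration).
data = list(range(100))

# Missing-data rates taken from the abstract: 0%, 10%, 30%, 50%, 80%.
for rate in (0.0, 0.1, 0.3, 0.5, 0.8):
    masked = mask_at_random(data, rate)
    n_none = sum(v is None for v in masked)
    print(f"rate={rate:.0%}: {n_none} of {len(masked)} values missing")
```

Each masked copy could then be passed to the imputation methods under comparison, with the original `data` serving as ground truth for error measurement.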
Fraction of Missing Information (γ) at Different Missing Data Fractions in the 2012 NAMCS Physician Workflow Mail Survey
20
Authors: Qiyuan Pan, Rong Wei. Applied Mathematics, 2016, No. 10, pp. 1057-1067 (11 pages)
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as $\mathrm{RE} = (1 + \gamma/m)^{-1/2}$, where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging in order of magnitude from $10^{-6}$ to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominant effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ using δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
Keywords: Multiple Imputation; Fraction of Missing Information (γ); Sufficient Number of Imputations; missing data; NAMCS
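To make the relative-efficiency formula quoted in the abstract above concrete, the short sketch below evaluates RE = (1 + γ/m)^(-1/2) for a few values of γ and m. The function name `relative_efficiency` and the chosen γ and m values are illustrative assumptions; only the formula itself comes from the abstract.

```python
def relative_efficiency(gamma, m):
    """Relative efficiency of MI with m imputations, as stated in the abstract:
    RE = (1 + gamma / m) ** (-1/2), where gamma is the fraction of missing information."""
    if m <= 0:
        raise ValueError("m must be a positive number of imputations")
    return (1.0 + gamma / m) ** -0.5


# Illustrative values only: the abstract reports gamma on the order of 1e-6 to 0.01;
# 0.5 is added here as a contrasting large value.
for gamma in (1e-6, 0.01, 0.5):
    for m in (5, 20, 100):
        print(f"gamma={gamma:g}, m={m}: RE={relative_efficiency(gamma, m):.6f}")
```

With γ as small as the values reported in the abstract, RE stays very close to 1 even for m = 5, which illustrates why an RE-based rule would suggest that few imputations suffice.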