8 articles found
1. Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite (Cited by: 4)
Authors: Muhammad Usman Saleem, Sajid Rashid Ahmed. Journal of Data Analysis and Information Processing, 2016, Issue 3, pp. 132-146 (16 pages)
This research set out to select the best imputation method for missing upper air temperature data over 24 standard pressure levels. We implemented four imputation techniques: inverse distance weighting, bilinear, natural, and nearest interpolation. The performance indicators adopted were the root mean square error (RMSE), absolute mean error (AME), correlation coefficient, and coefficient of determination (R²). We randomly withheld 30% of the 324 total samples and predicted them from the remaining 70%. Although all four interpolation methods performed well (RMSE and AME below 1), the bilinear method was the most accurate, with the lowest errors. Its RMSE remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6, and its AME was low (<0.1) at all pressure levels. The correlation between actual and predicted air temperature was very strong (>0.99) with this method, and its high coefficient of determination (0.99) indicates a very good fit to the surface. Natural interpolation gave similar results, but scatter plots for each month showed its imputations to be slightly less precise than those of the bilinear method in certain months.
Keywords: missing data imputations; spatial interpolation; Aqua satellite; upper level air temperature; AIRX3STML
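The hold-out evaluation described in this abstract (withhold part of the samples, predict them from the rest, then score with RMSE and mean absolute error) can be sketched as follows. This is a minimal illustration using inverse distance weighting only, not the authors' code; the function names, the `power` parameter, and the `(x, y, value)` point layout are assumptions made for the example.

```python
import math
import random


def idw_predict(known, query, power=2.0):
    """Inverse distance weighting: weighted mean of known values, weight = 1 / d**power.

    `known` is a list of (x, y, value) tuples; `query` is an (x, y) pair.
    """
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value  # query coincides with a known point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_abs_error(actual, predicted):
    """Mean of the absolute differences between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def holdout_scores(points, frac=0.3, seed=0):
    """Withhold `frac` of the points, predict them from the rest, return (RMSE, MAE)."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(round(frac * len(shuffled))))
    test, train = shuffled[:n_test], shuffled[n_test:]
    actual = [v for _, _, v in test]
    predicted = [idw_predict(train, (x, y)) for x, y, _ in test]
    return rmse(actual, predicted), mean_abs_error(actual, predicted)
```

Calling `holdout_scores(points)` on a list of gridded temperature samples would give one (RMSE, MAE) pair per pressure level; swapping `idw_predict` for another interpolator allows the kind of side-by-side comparison the abstract reports.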
2. A Novel Reduced Error Pruning Tree Forest with Time-Based Missing Data Imputation (REPTF-TMDI) for Traffic Flow Prediction
Authors: Yunus Dogan, Goksu Tuysuzoglu, Elife Ozturk Kiyak, Bita Ghasemkhani, Kokten Ulas Birant, Semih Utku, Derya Birant. Computer Modeling in Engineering & Sciences, 2025, Issue 8, pp. 1677-1715 (39 pages)
Accurate traffic flow prediction (TFP) is vital for efficient and sustainable transportation management and the development of intelligent traffic systems. However, missing data in real-world traffic datasets poses a significant challenge to maintaining prediction precision. This study introduces REPTF-TMDI, a novel method that combines a Reduced Error Pruning Tree Forest (REPTree Forest) with a newly proposed Time-based Missing Data Imputation (TMDI) approach. The REPTree Forest, an ensemble learning approach, is tailored for time-related traffic data to enhance predictive accuracy and support the evolution of sustainable urban mobility solutions. Meanwhile, the TMDI approach exploits temporal patterns to estimate missing values reliably whenever empty fields are encountered. The proposed method was evaluated using hourly traffic flow data from a major U.S. roadway spanning 2012-2018, incorporating temporal features (e.g., hour, day, month, year, weekday), a holiday indicator, and weather conditions (temperature, rain, snow, and cloud coverage). Experimental results demonstrated that the REPTF-TMDI method outperformed conventional imputation techniques across various missing data ratios, achieving an average 11.76% improvement in terms of correlation coefficient (R). Furthermore, REPTree Forest achieved improvements of 68.62% in RMSE and 70.52% in MAE compared to existing state-of-the-art models. These findings highlight the method’s ability to significantly boost traffic flow prediction accuracy, even in the presence of missing data, thereby contributing to the broader objectives of sustainable urban transportation systems.
Keywords: machine learning; traffic flow prediction; missing data imputation; reduced error pruning tree (REPTree); sustainable transportation systems; traffic management; artificial intelligence
3. Handling missing data in large-scale TBM datasets: Methods, strategies, and applications
Authors: Haohan Xiao, Ruilang Cao, Zuyu Chen, Chengyu Hong, Jun Wang, Min Yao, Litao Fan, Teng Luo. Intelligent Geoengineering, 2025, Issue 3, pp. 109-125 (17 pages)
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanism of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; that the interpolation performance of all methods declines as the missing rate increases; and that block missing is the hardest pattern to interpolate, followed by mixed missing, with sporadic missing the easiest. On-site application results validate that the proposed strategy achieves robust missing-value interpolation and is applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
Keywords: tunnel boring machine (TBM); missing data imputation; machine learning (ML); time series interpolation; data preprocessing; real-time data stream
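As a rough illustration of the KNN imputation this abstract mentions, the sketch below fills each missing entry with the mean of that column over the `k` most similar rows. It is a generic textbook-style version, not the paper's implementation; the function name, the use of `None` for missing values, and the root-mean-square distance over jointly observed columns are all assumptions for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries using the k nearest rows that observe the missing column.

    Distance between two rows is the root mean squared difference over the
    columns both rows observe. Rows with no shared observed column are skipped.
    """
    filled = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled
```

A sketch like this also hints at why block missingness is harder than sporadic missingness: when a long run of consecutive records loses the same columns, fewer rows share observed columns with the incomplete ones, so the neighbor search has less to work with.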
4. A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation
Authors: Thierry Mugenzi, Cahit Perkgoz. Computers, Materials & Continua, 2026, Issue 1, pp. 1985-2005 (21 pages)
Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where missing data often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms (Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship) under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the obtained results show that our proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component in the loss function. Additionally, we assessed the downstream utility of imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
Keywords: missing data imputation; autoencoder; deep learning; missing mechanisms
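Of the three loss components listed in this abstract, only the masked mean squared error has a widely shared generic form, so that is the only one sketched here. The abstract does not give the exact formulation, the weighting between terms, or the definitions of the noise-aware and variance terms, so those are left out rather than guessed. The function name and the 0/1 mask convention (1 marks an entry that was artificially hidden and should be scored) are assumptions.

```python
def masked_mse(target, reconstruction, mask):
    """Mean squared error over only the entries where mask is truthy.

    All three arguments are equally shaped lists of rows. Entries with a
    falsy mask value contribute nothing to the loss.
    """
    total = 0.0
    count = 0
    for t_row, r_row, m_row in zip(target, reconstruction, mask):
        for t, r, m in zip(t_row, r_row, m_row):
            if m:
                total += (t - r) ** 2
                count += 1
    # With an empty mask there is nothing to score, so report zero loss.
    return total / count if count else 0.0
```

In an autoencoder training loop the same idea would normally be written with tensor operations in a deep learning framework; the plain-Python version is only meant to make the masking explicit.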
5. ResiDualNet: A novel electric vehicle charging data imputation technique to enhance load forecasting accuracy
Authors: Belal Mahmud Fahim, Mohammad Kaosain Akbar, Manar Amayri. Building Simulation, 2025, Issue 4, pp. 897-922 (26 pages)
Electric vehicles (EVs) are a sustainable mode of transportation, significantly reducing greenhouse gas emissions. The development of EV charging stations is crucial for supporting the growing number of EVs and integrating them into smart grid infrastructure. Efficient use of these stations requires optimized energy management and accurate forecasting of EV charging behaviors. However, forecasting accuracy is often hindered by missing data due to connectivity issues and equipment failures. To address these challenges, this study introduces a novel data imputation method, ResiDualNet (Residual Dual BiLSTM-CNN Path Network), which is a residual sequence-to-sequence technique for imputing missing EV charging data. This model effectively captures underlying temporal and long-term dependencies, demonstrating strong performance across various scenarios. We compare our proposed model with two commonly used imputation methods, KNN and Mean Imputation, and one generative model, Generative Adversarial Network (GAN), across four different EV charging datasets. Experimental results demonstrate that our model significantly outperforms the others, showing an average improvement of 82% in terms of root mean squared error (RMSE) across all datasets. To further assess the effectiveness of our imputation model, we utilize three cutting-edge and newly introduced forecasting models, Bidirectional Long Short-Term Memory (BiLSTM), Mogrifier LSTM, and Sample Convolution and Interaction Network (SCINet), to predict EV charging load. The results indicate that SCINet outperforms the other forecasting techniques. Moreover, for SCINet, the dataset imputed by our proposed model performs second best after the real dataset, confirming the effectiveness of our imputation approach in improving forecasting accuracy for EV charging data. The complete source code is provided in the following repository: https://github.com/fffahim/ResiDualNet.git.
Keywords: electric vehicle; load forecasting; missing data imputation; residual Seq2Seq; SCINet
6. Data-driven Missing Data Imputation for Wind Farms Using Context Encoder (Cited by: 3)
Authors: Wenlong Liao, Birgitte Bak-Jensen, Jayakrishnan Radhakrishna Pillai, Dechang Yang, Yusen Wang. Journal of Modern Power Systems and Clean Energy (indexed in SCIE, EI, CSCD), 2022, Issue 4, pp. 964-976 (13 pages)
High-quality datasets are of paramount importance for the operation and planning of wind farms. However, the datasets collected by the supervisory control and data acquisition (SCADA) system may contain missing data due to various factors such as sensor failure and communication congestion. In this paper, a data-driven approach is proposed to fill the missing data of wind farms based on a context encoder (CE), which consists of an encoder, a decoder, and a discriminator. Through deep convolutional neural networks, the proposed method is able to automatically explore the complex nonlinear characteristics of the datasets that are difficult to model explicitly. The proposed method can not only make full use of the surrounding context information through the reconstruction loss, but also make the filled data look real through the adversarial loss. In addition, the correlation among multiple missing attributes is taken into account by adjusting the format of the input data. The simulation results show that CE performs better than traditional methods for the attributes of wind farms with hallmark characteristics such as large peaks, large valleys, and fast ramps. Moreover, the CE shows stronger generalization ability than traditional methods such as auto-encoder, K-means, k-nearest neighbor, back propagation neural network, cubic interpolation, and conditional generative adversarial network for different missing data scales.
Keywords: data-driven; missing data imputation; wind farm; deep learning; context encoder
7. Big Data Cleaning Based on Improved CLOF and Random Forest for Distribution Networks
Authors: Jie Liu, Yijia Cao, Yong Li, Yixiu Guo, Wei Deng. CSEE Journal of Power and Energy Systems (indexed in SCIE, EI, CSCD), 2024, Issue 6, pp. 2528-2538 (11 pages)
To improve data quality, this paper studies a big data cleaning method for distribution networks. First, the Local Outlier Factor (LOF) algorithm based on DBSCAN clustering is used to detect outliers. However, because the LOF threshold is difficult to determine, a method of dynamically calculating the threshold based on the transformer districts and time is proposed. In addition, the LOF algorithm is combined with the statistical distribution method to reduce the misjudgment rate. To address the diversity and complexity of missing-data forms in power big data, this paper improves the Random Forest imputation algorithm so that it can be applied to various forms of missing data, especially block missing data and even some completely missing horizontal or vertical data. The data in this paper are real data from 44 transformer districts of a certain 10 kV line in a distribution network. Experimental results show that the outlier detection is accurate and suitable for power big data of any shape and dimensionality. The improved Random Forest imputation algorithm is suitable for all missing forms, with higher imputation accuracy and better model stability. Comparing network loss prediction on data cleaned with this method against data from which outliers and missing values were simply removed shows that the proposed data cleaning method improves the accuracy of network loss prediction by nearly 4%. Additionally, as the proportion of bad data increases, the difference between the prediction accuracy of cleaned data and that of uncleaned data becomes more significant.
Keywords: data cleaning; DBSCAN; LOF; missing data imputation; outlier detection; Random Forest
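For readers unfamiliar with the Local Outlier Factor that this abstract builds on, the following is a minimal sketch of the plain LOF score. It does not reproduce the paper's DBSCAN-based variant or its dynamic threshold. The function name is hypothetical, and two simplifications are assumed: each point takes exactly `k` neighbors (ties at the k-th distance are cut by index order), and all points are distinct so that no reachability sum is zero.

```python
import math


def lof_scores(points, k=2):
    """Return the Local Outlier Factor of each point (about 1 = inlier, much larger = outlier)."""
    n = len(points)
    # neighbors[i]: the k nearest other points as (distance, index) pairs.
    neighbors = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

Scores near 1 mean a point is about as dense as its neighbors. How far above 1 counts as an outlier is exactly the threshold question the abstract says is hard to settle, which motivates its per-district, time-dependent threshold.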
8. Traffic volume imputation using the attention-based spatiotemporal generative adversarial imputation network
Authors: Yixin Duan, Chengcheng Wang, Chao Wang, Jinjun Tang, Qun Chen. Transportation Safety and Environment, 2024, Issue 4, pp. 54-67 (14 pages)
With the increasing development of intelligent detection devices, a vast amount of traffic flow data can be collected from intelligent transportation systems. However, these data often encounter issues such as missing and abnormal values, which can adversely affect the accuracy of future tasks like traffic flow forecasting. To address this problem, this paper proposes the Attention-based Spatiotemporal Generative Adversarial Imputation Network (ASTGAIN) model, comprising a generator and a discriminator, to conduct traffic volume imputation. The generator incorporates an information fuse module, a spatial attention mechanism, a causal inference module, and a temporal attention mechanism, enabling it to capture historical information and extract spatiotemporal relationships from the traffic flow data. The discriminator features a bidirectional gated recurrent unit, which explores the temporal correlation of the imputed data to distinguish between imputed and original values. Additionally, we have devised an imputation filling technique that fully leverages the imputed data to enhance the imputation performance. Comparison experiments with several traditional imputation models demonstrate the superior performance of the ASTGAIN model across diverse missing scenarios.
Keywords: missing data imputation; generative adversarial network; spatiotemporal traffic flow data; attention mechanism