Many business applications rely on their historical data to predict their business future. The marketing of products is one of the core processes for a business. Customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered as products. The development of education and health services depends on historical data. Furthermore, reducing online social media network problems and crimes requires a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires great time to process. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a dedicated reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities. The Naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 decision tree algorithm is the most accurate for classifying students' records to predict degree completion time.
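To illustrate the simplest of the algorithms compared above, here is a minimal OneR learner in plain Python. It is not taken from any of the surveyed papers; the toy "health" dataset and all names are our own illustration. OneR picks the single feature whose value-to-majority-class rule makes the fewest training errors.

```python
from collections import Counter

def one_r(X, y):
    """Train a OneR classifier: choose the one feature whose
    value -> majority-class rule has the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        # collect the labels seen for each value of feature f
        buckets = {}
        for row, label in zip(X, y):
            buckets.setdefault(row[f], []).append(label)
        # one rule: map each feature value to its majority class
        rule = {v: Counter(ls).most_common(1)[0][0] for v, ls in buckets.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(X, y))
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    feature, rule, _ = best
    return lambda row: rule.get(row[feature])

# toy illustrative dataset: (fever, cough) -> diagnosis
X = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
y = ["flu", "flu", "cold", "healthy"]
predict = one_r(X, y)
```

Here OneR selects the fever feature (one training error) over cough (two), which is exactly the bias/variance trade-off that makes such one-rule baselines surprisingly competitive on small tabular health datasets.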
Various kinds of data are used in new product design, and more accurate data make the design results more reliable. Even though part of the product data can be obtained directly from existing similar products, a great deal of data remains unavailable. This makes data prediction a valuable task. A method that can predict data for a product under development based on existing similar products is proposed. Fuzzy theory is used to deal with the uncertainties in the data prediction process. The proposed method can be used in life cycle design, life cycle assessment (LCA), etc. A case study on a current refrigerator is used as a demonstration example.
This study presents a machine learning-based method for predicting the fragment velocity distribution in warhead fragmentation under explosive loading conditions. The fragment resultant velocities are correlated with key design parameters, including casing dimensions and detonation positions. The paper details the finite element analysis for fragmentation, the characterization of the dynamic hardening and fracture models, the generation of comprehensive datasets, and the training of the ANN model. The results show the influence of casing dimensions on fragment velocity distributions, with the tendencies indicating increased resultant velocity with reduced thickness and increased length and diameter. The model's predictive capability is demonstrated through accurate predictions for both the training and testing datasets, showing its potential for real-time prediction of fragmentation performance.
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods with random interruption failures in the observation, based on the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), shortened in this paper as GEKF and GUKF, respectively. Then the nonlinear filtering model is established by using radial basis function neural network (RBFNN) prototypes: the network weights serve as the state equation, and the output of the RBFNN presents the observation equation. Finally, we treat the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero, without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict a ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of the GUKF-based RBFNN accord well with the real ground radioactivity time series, while the prediction results of the GEKF-based RBFNN are divergent.
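The intermittent-observation idea above can be sketched in a much simpler setting than the paper's GEKF/GUKF-RBFNN pipeline: a one-dimensional linear Kalman filter whose measurement arrives only when a Bernoulli indicator is 1, and whose update step is simply skipped on a dropout. All parameter values below are illustrative assumptions, not the paper's.

```python
import random

def kalman_with_dropouts(zs, gamma, q=0.01, r=0.25):
    """1-D random-walk Kalman filter; measurement zs[k] is used only
    when the Bernoulli indicator gamma[k] == 1, otherwise the
    prediction is carried forward (the 'missing data = skip update' idea)."""
    x, p = 0.0, 1.0
    estimates = []
    for z, g in zip(zs, gamma):
        p = p + q                      # time update (random-walk model)
        if g:                          # measurement received this round
            k = p / (p + r)            # Kalman gain
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates

random.seed(0)
truth = 1.0                                     # constant hidden level
zs = [truth + random.gauss(0.0, 0.5) for _ in range(200)]
gamma = [1 if random.random() > 0.3 else 0 for _ in range(200)]  # ~30% dropouts
est = kalman_with_dropouts(zs, gamma)
```

Even with roughly a third of the measurements missing, the filtered estimate tracks the hidden level far more closely than the raw noisy measurements, which is the behavior the GEKF/GUKF generalizations exploit in the nonlinear case.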
Data is always a crucial issue of concern, especially during its prediction and computation in the digital revolution. This paper helps provide an efficient learning mechanism for accurate predictability and for reducing redundant data communication. It also discusses the Bayesian analysis that finds the conditional probability of at least two parametric-based predictions for the data. The paper presents a method for improving the performance of Bayesian classification using a combination of the Kalman filter and K-means. The method is applied to a small dataset to establish that the proposed algorithm can reduce the time for computing the clusters from data. The proposed Bayesian learning probabilistic model is used to check statistical noise and other inaccuracies using unknown variables. This scenario is implemented using an efficient machine learning algorithm to perpetuate the Bayesian probabilistic approach. The paper also demonstrates the generative function for the Kalman-filter-based prediction model and its observations. The algorithm is implemented on the open-source Python platform and efficiently integrates all the different modules into one piece of code via Common Platform Enumeration (CPE) for Python.
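The abstract does not spell out how its K-means and Bayesian stages are coupled, so the sketch below shows only one plausible reading of the speed-up it claims: compress each class with K-means first, then fit Gaussian Naive Bayes on the few representatives instead of the full data. Every function and dataset here is our own illustration, not the paper's code.

```python
import math, random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns sorted centroids."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - cents[i]))].append(p)
        cents = [sum(g) / len(g) if g else cents[i] for i, g in enumerate(groups)]
    return sorted(cents)

def gaussian_nb_fit(data):
    """data: {class_label: [values]} -> per-class (mean, variance, prior)."""
    n = sum(len(v) for v in data.values())
    model = {}
    for c, vals in data.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
        model[c] = (mu, var, len(vals) / n)
    return model

def gaussian_nb_predict(model, x):
    def log_post(c):
        mu, var, prior = model[c]
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var))
    return max(model, key=log_post)

random.seed(1)
low = [random.gauss(0.0, 1.0) for _ in range(500)]
high = [random.gauss(5.0, 1.0) for _ in range(500)]
# K-means compresses each class to a handful of representatives, so the
# Naive Bayes fit touches 10 points instead of 1000 (the claimed speed-up)
model = gaussian_nb_fit({"low": kmeans(low, 5), "high": kmeans(high, 5)})
```

The compression understates each class's variance, so this is a sketch of the time/accuracy trade-off rather than a drop-in replacement for fitting on the full data.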
Multivariate time series forecasting is widely used in traffic planning, weather forecasting, and energy consumption. Series decomposition algorithms can help models better understand the underlying patterns of the original series to improve the forecasting accuracy of multivariate time series. However, the decomposition kernel of previous decomposition-based models is fixed, and these models have not considered the differences in frequency fluctuations between components. These problems make it difficult to analyze the intricate temporal variations of real-world time series. In this paper, we propose a series decomposition-based Mamba model, DecMamba, to obtain the intricate temporal dependencies and the dependencies among different variables of multivariate time series. A variable-level adaptive kernel combination search module is designed to interact with information on different trends and periods between variables. Two backbone structures are proposed to emphasize the differences in frequency fluctuations of the seasonal and trend components. Mamba, with its superior performance, is used instead of a Transformer in the backbone structures to capture the dependencies among different variables. A new embedding block is designed to better capture the temporal features, especially for the high-frequency seasonal component, whose semantic information is difficult to acquire. A gating mechanism is introduced to the decoder in the seasonal backbone to improve the prediction accuracy. A comparison with ten state-of-the-art models on seven real-world datasets demonstrates that DecMamba can better model the temporal dependencies and the dependencies among different variables, guaranteeing better prediction performance for multivariate time series.
The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion, optimized by the sparrow search algorithm, is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer, completing the construction of the stacked learning model and improving the model prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and correct the over- and underfitting problems of the single regression models to further improve the prediction accuracy. The mean square error (MSE) of the predicted and true values and the fit to the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression models (MSE = 0.03917). Therefore, the former has a higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predicting the slope safety factor, showcasing its advantages over traditional methods. Additionally, our proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. This model not only refines the prediction accuracy of the slope safety factor but also offers a fresh approach to handling the intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research holds importance for the modernization and digitalization of slope safety assessments.
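The two-level "base models feed a meta-model" structure described above can be sketched with far humbler learners than the paper's six (and without the sparrow search hyperparameter optimization, which is omitted here entirely): two base regressors, out-of-fold predictions as meta-features, and a least-squares blend as the meta-model. All learners and data below are our own illustrative stand-ins.

```python
def fit_linear(xs, ys):
    """Closed-form simple linear regression y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor (mean of the k closest targets)."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / k
    return predict

def fit_stacking(xs, ys):
    """Two-level stacking: base learners (inner loop) produce out-of-fold
    predictions; a least-squares meta-blend (outer loop) combines them."""
    h = len(xs) // 2
    p1, p2, tgt = [], [], []
    for (t0, t1), (e0, e1) in (((0, h), (h, len(xs))), ((h, len(xs)), (0, h))):
        b1 = fit_linear(xs[t0:t1], ys[t0:t1])
        b2 = fit_knn(xs[t0:t1], ys[t0:t1])
        for x, y in zip(xs[e0:e1], ys[e0:e1]):
            p1.append(b1(x)); p2.append(b2(x)); tgt.append(y)
    # 2x2 normal equations for y ~ w1*p1 + w2*p2, solved by Cramer's rule
    s11 = sum(a * a for a in p1); s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    s1y = sum(a * y for a, y in zip(p1, tgt))
    s2y = sum(b * y for b, y in zip(p2, tgt))
    det = s11 * s22 - s12 * s12
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    b1, b2 = fit_linear(xs, ys), fit_knn(xs, ys)   # refit bases on all data
    return lambda x: w1 * b1(x) + w2 * b2(x)

xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
model = fit_stacking(xs, ys)
```

Training the meta-model on out-of-fold predictions, rather than on the base models' in-sample fits, is the design choice that keeps the second level from simply memorizing first-level overfitting.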
This research explores the potential for the evaluation and prediction of earth pressure balance shield performance based on a gray system model. The research focuses on a shield tunnel excavated for Metro Line 2 in Dalian, China. Due to the large error between the initial geological exploration data and the real strata, construction on the project is extremely difficult. In view of the current situation of the project, a quantitative method for evaluating the tunneling efficiency was proposed using the cutterhead rotation (R), advance speed (S), total thrust (F), and torque (T). A total of 80 datasets with three input parameters and one output variable (F or T) were collected from this project, and a prediction framework based on the gray system model was established. Based on the prediction model, five prediction schemes were set up. Through error analysis, the optimal prediction scheme was obtained from the five schemes. The parametric investigation performed indicates that the relationships between F and the three input variables in the gray system model harmonize with the theoretical explanation. The case shows that shield tunneling performance and efficiency are improved by the tunneling parameter prediction model based on the gray system model.
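The classic building block of gray system prediction is the GM(1,1) model: accumulate the series, fit a first-order gray differential equation by least squares, and difference the fitted exponential back. The paper's multivariate framework is more elaborate; the snippet below is only a minimal univariate sketch with made-up data.

```python
import math

def gm11_fit(x0):
    """GM(1,1) gray model: fit a short positive series, return a
    predictor for the k-th value (k = 0 is the first observation)."""
    x1 = [sum(x0[:i + 1]) for i in range(len(x0))]              # accumulated (AGO) series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = x0[1:]
    # least squares for x0[k] = -a*z[k] + b (a simple linear regression)
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    a, b = -slope, my - slope * mz

    def predict(k):
        if k == 0:
            return x0[0]
        x1k = (x0[0] - b / a) * math.exp(-a * k) + b / a        # fitted AGO series
        x1p = (x0[0] - b / a) * math.exp(-a * (k - 1)) + b / a
        return x1k - x1p                                        # difference back

    return predict

# illustrative monotone series (e.g. cumulative thrust-like quantity, made up)
x0 = [100.0 * 1.05 ** k for k in range(6)]
g = gm11_fit(x0)
```

GM(1,1) is attractive for tunneling-parameter prediction precisely because it needs only a handful of samples, at the cost of assuming near-exponential (or near-steady) behavior of the accumulated series.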
https://www.sciencedirect.com/journal/energy-and-buildings/vol/338/suppl/C Volume 338, 1 July 2025 [OA] (1) Real long-term performance evaluation of an improved office building operation involving a data-driven model predictive control, by Peter Klanatsky, François Veynandt, Christian Heschl, et al., Article 115590. Abstract: Data-driven Model Predictive Control (DMPC) strategies, coupled with holistically optimized HVAC system control, represent a promising approach to achieving climate targets through significant reductions in building energy consumption and associated emissions. To validate this potential in a real-world environment, a comprehensive optimization study was conducted on an office building serving as a living laboratory. Through systematic analysis of historical operational data, multiple Energy Conservation Measures (ECMs) were identified and implemented. The cornerstone of these improvements was the development and deployment of a centralized adaptive DMPC system, which was operated and evaluated over a full year.
Conventional soil maps generally contain one or more soil types within a single soil polygon, but their geographic locations within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction for making use of the under- or over-sampled legacy pedon data for the disaggregation. The method consisted of three steps. First, environmental similarities between the pedon sites and each location were computed based on soil formative environmental factors. Second, according to the soil types of the pedon sites, the similarities were aggregated to derive a similarity distribution for each soil type. Third, a hardening process was performed on the maps to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area situated in Manitoba, Canada. Based on 186 independent pedon sites, the evaluation of the disaggregated map of soil subgroups showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented a better spatial pattern of soil subgroups in both detail and accuracy compared to a dominant soil subgroup map, which is commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and in soil subgroups that are very similar in taxonomy, indicating that new environmental covariates need to be developed. We concluded that the combination of legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
This paper proposes a model to analyze massive electricity data. A feature subset is determined by correlation-based feature selection and data-driven methods. The attribute season can be classified successfully through five classifiers using the selected feature subset, and the best model can then be determined. The effects on analyzing the electricity consumption of the other three attributes, including months, businesses, and meters, can be estimated using the chosen model. The data used for the project are provided by the Beijing Power Supply Bureau. We use WEKA as the machine learning tool. The models we built are promising for electricity scheduling and power theft detection.
The loess plateau covering the North Shaanxi slope and Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures were not well developed. As a result, numerous wide and gentle noses and small traps with magnitudes of less than 50 m developed on the large westward-dipping monocline. Reservoirs, including Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are characterized as thin, with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting the reservoir thickness, physical properties, and hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
A model that rapidly predicts the density components of raw coal is described. It is based on a three-grade fast float/sink test. Recent comprehensive monthly floating and sinking data are used for comparison. The predicted data are used to draw washability curves and to provide a rapid evaluation of the effect of heavy-medium-induced separation. Thirty-one production shifts' worth of fast float/sink data and the corresponding quick ash data are used to verify the model. The results show a small error, with an arithmetic average of 0.53 and an absolute average error of 1.50, which indicates that the model has high precision. The theoretical yield from the washability curves is 76.47% for the monthly comprehensive data and 81.31% using the model data, for a desired cleaned coal ash of 9%. The relative error between these two is 6.33%, which is small and indicates that the predicted data can be used to rapidly evaluate the separation effect of gravity separation equipment.
Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, so recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to the difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce the prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and the vanilla TCN.
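The difference-and-compensation idea can be seen in isolation, outside any network: difference the input to remove level shifts, model the (more stationary) differences, then invert the differencing anchored at the last observed value. TSCND's DCM operates inside the network; this standalone sketch with a trivial persistence "model" is our own illustration of the principle.

```python
def difference(seq):
    """First-order differencing: removes level/trend shifts
    between input windows (the distribution-shift mitigation)."""
    return [b - a for a, b in zip(seq, seq[1:])]

def compensate(diffs, last_value):
    """Invert the differencing: cumulative sum anchored at the last
    observed value, restoring the level the difference removed."""
    out, level = [], last_value
    for d in diffs:
        level += d
        out.append(level)
    return out

history = [10.0, 12.0, 15.0, 19.0, 24.0]      # differences: 2, 3, 4, 5
d = difference(history)
predicted_diffs = [d[-1]] * 3                 # naive persistence, 3 steps ahead
forecast = compensate(predicted_diffs, history[-1])
```

Whatever forecaster replaces the persistence step sees a sequence with the level shift removed, which is exactly why differencing helps under distribution shift.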
Time series forecasting and analysis are widely used in many fields and application scenarios. Time series historical data reflect change patterns and trends, which can serve the application and decision-making in each application scenario to a certain extent. In this paper, we select the time series prediction problem in the atmospheric environment scenario to start the application research. In terms of data support, we obtained data from nearly 3500 vehicles in some cities in China from the Runwoda Research Institute, focusing on the major pollutant emission data of non-road mobile machinery and high-emission vehicles in Beijing and Bozhou, Anhui Province, to build the dataset and conduct time series prediction analysis experiments. This paper proposes a P-gLSTNet model and uses the Autoregressive Integrated Moving Average (ARIMA) model, long short-term memory (LSTM), and Prophet to predict and compare the emissions in a future period. The experiments are validated on four public datasets and one self-collected dataset, and the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) are selected as the evaluation metrics. The experimental results show that the proposed P-gLSTNet fusion model predicts with less error, outperforms the backbone methods, and is more suitable for the prediction of time series data in this scenario.
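The three evaluation metrics named above are standard and easy to state precisely; the definitions below follow their usual conventions (the toy numbers are ours, not the paper's results).

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %); y_true must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
```

Note that MAPE weights errors on small true values more heavily than MAE or RMSE, which matters when emission levels span orders of magnitude.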
Various Wireless Sensor Network (WSN) applications require the common task of collecting data from the sensor nodes using the sink. Since the procedure of collecting data is iterative, an effective technique is necessary to obtain the data efficiently while reducing nodal energy consumption. Hence, a technique for data reduction in WSNs is presented in this paper by proposing a prediction algorithm, called the Hierarchical Fractional Bidirectional Least-Mean-Square (HFBLMS) algorithm. The novel algorithm is designed by modifying the Hierarchical Least-Mean-Square (HLMS) algorithm with the inclusion of BLMS for bidirectional data prediction and Fractional Calculus (FC) in the weight update process. Data reduction is achieved by transmitting only those data required, based on the data predicted at the sensor node and the sink. Moreover, the proposed HFBLMS algorithm reduces the energy consumption in the network through the effective prediction attained by BLMS. Two metrics, energy consumption and prediction error, are used for the evaluation of the performance of the HFBLMS prediction algorithm: it can attain energy values of 0.3587 and 0.1953 at the maximum number of rounds, and prediction errors of just 0.0213 and 0.0095, using air quality and localization datasets, respectively.
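The "transmit only when prediction fails" mechanism can be sketched with a plain normalized-LMS predictor, far simpler than HFBLMS (no hierarchy, no bidirectional pass, no fractional update): identical predictors run at the node and the sink, and the node sends a raw sample only when its own prediction misses by more than a tolerance. All parameters and the synthetic signal are our illustrative assumptions.

```python
import math

def lms_data_reduction(samples, mu=0.5, taps=3, tol=0.5, eps=1e-6):
    """Normalized-LMS predictors kept in lockstep at node and sink:
    the node transmits a raw sample only when its prediction misses
    by more than tol; otherwise the sink reuses its own prediction."""
    w = [0.0] * taps
    history = [0.0] * taps
    sent = 0
    reconstructed = []
    for x in samples:
        pred = sum(wi * hi for wi, hi in zip(w, history))
        if abs(x - pred) > tol:
            sent += 1                  # prediction too poor: transmit x
            value = x
        else:
            value = pred               # sink substitutes its prediction
        err = value - pred             # identical update on both sides
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * err * hi / norm for wi, hi in zip(w, history)]
        history = [value] + history[:-1]
        reconstructed.append(value)
    return reconstructed, sent

signal = [5.0 * math.sin(0.2 * k) for k in range(300)]
recon, sent = lms_data_reduction(signal)
```

Because both sides apply the same update to the same substituted value, the sink's reconstruction is guaranteed to stay within `tol` of every true sample while many transmissions are skipped, which is the energy-saving mechanism the abstract describes.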
The rapid expansion of comprehensive sports datasets and the successful application of data mining techniques in various domains have given rise to the emergence of sports data prediction techniques. These techniques enable the extraction of hidden knowledge that can significantly impact the sports industry, as more and more clubs are using Machine Learning (ML) and Deep Learning (DL) methods to manage athletes and training. In this research, the focus is on predicting the performance of specific basketball athletes, a problem that has garnered significant research attention. The paper was motivated by a dual interest in college and NBA basketball matches, alongside a keen observation of the evolving strategies employed by coaches in athlete management. This interest was further reinforced by firsthand observation of such evolving methods during a baseball game at Citi Field in New York. These factors collectively underpin the relevance and significance of this research endeavor, highlighting the intersection of personal interest and the evolving landscape of sports management as compelling reasons for its pursuit. In the data selection process, we acquired data from previously published essays as well as from Kaggle, a reputable online platform. Following this, we evaluated several prominent machine learning models, namely Linear Regression, KNN, Gradient Boosting, Elastic Net, and Lasso, to ascertain their effectiveness in predicting the performance of specific players. Through rigorous analysis and comparison, we concluded that Linear Regression and Gradient Boosting exhibited superior predictive capabilities compared to the other models considered. These two models demonstrated a higher degree of accuracy and reliability in forecasting player performance, thus establishing them as the most suitable choices for our predictive modeling purposes. This meticulous selection process, involving both data acquisition and model evaluation, forms the foundation of our research methodology and underscores the rigor and precision with which our conclusions are drawn.
Natural systems are typically nonlinear and complex, and it is of great interest to be able to reconstruct a system in order to understand its mechanism; such reconstruction can not only recover nonlinear behaviors but also predict future dynamics. Due to the advances of modern technology, big data are becoming increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they do not assume any model equations but can recover the dynamics purely from the measured time series data. In this review, the development of state space reconstruction techniques is introduced, and recent advances in systems prediction and causality inference using state space reconstruction are presented. In particular, the cutting-edge methods for dealing with short-term time series data are the focus. Finally, the advantages as well as the remaining problems in this field are discussed.
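The core of state space reconstruction is the Takens-style delay embedding: turn a scalar series into delay-coordinate vectors, then predict by following the nearest historical state forward. The sketch below is a deliberately minimal nearest-neighbor version (embedding dimension, delay, and test signal are our illustrative choices), not any specific method from the review.

```python
import math

def delay_embed(series, dim=3, tau=1):
    """Takens-style delay-coordinate reconstruction of the state space."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim)) for i in range(n)]

def nn_forecast(series, dim=3, tau=1):
    """Predict the next value by locating the past reconstructed state
    nearest to the current one and following it one step forward."""
    states = delay_embed(series, dim, tau)
    target = states[-1]
    best, best_d = 0, float("inf")
    for i, s in enumerate(states[:-1]):
        d = math.dist(s, target)
        if d < best_d:
            best, best_d = i, d
    # the value that followed the most similar historical state
    return series[best + (dim - 1) * tau + 1]

series = [math.sin(0.3 * k) for k in range(100)]
pred = nn_forecast(series)
```

Model-free prediction of this kind works because, under the embedding theorem, nearby delay vectors correspond to nearby states of the underlying dynamics; richer methods in the literature replace the single nearest neighbor with simplex or kernel-weighted averages.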
The accuracy of fluid property data plays an absolutely pivotal role in reservoir computational processes. Reliable data can be obtained through various experimental methods, but these methods are very expensive and time consuming. Alternative methods are numerical models. These methods use measured experimental data to develop a representative model for predicting the desired parameters. In this study, to predict the saturation pressure, oil formation volume factor, and solution gas-oil ratio, several Artificial Intelligence (AI) models were developed. A total of 582 reported data sets covering a wide range of fluid properties were used as the data bank. The accuracy and reliability of the model were examined using statistical parameters such as the correlation coefficient (R2), average absolute relative deviation (AARD), and root mean square error (RMSE). The results illustrated good accordance between the predicted data and target values. The model was also compared with previous works and developed empirical correlations, which indicated that it is more reliable than all the compared models and correlations. Finally, a relevancy factor was calculated for each input parameter to illustrate the impact of the different parameters on the predicted values. The relevancy factor showed that in these models, the solution gas-oil ratio has the greatest impact on both the saturation pressure and the oil formation volume factor. On the other hand, the saturation pressure has the greatest effect on the solution gas-oil ratio.
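The relevancy factor used above is commonly computed as the Pearson correlation coefficient between one input parameter and the model output; assuming that convention (the paper does not restate the formula here), a minimal sketch with made-up data:

```python
import math

def relevancy_factor(inputs, outputs):
    """Pearson correlation coefficient r between one input parameter
    and the model output, used as the relevancy factor."""
    n = len(inputs)
    mx, my = sum(inputs) / n, sum(outputs) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(inputs, outputs))
    sx = math.sqrt(sum((x - mx) ** 2 for x in inputs))
    sy = math.sqrt(sum((y - my) ** 2 for y in outputs))
    return cov / (sx * sy)

# toy check: an input that drives the output vs. one that does not
rs = [0.5 * k for k in range(20)]                 # made-up solution gas-oil ratio
noise = [((k * 7) % 13) - 6 for k in range(20)]   # irrelevant input
p_sat = [3.0 * r + 10.0 for r in rs]              # output that tracks Rs exactly
```

The sign of r also carries information: a positive relevancy factor indicates that increasing the input increases the predicted property, which is how such rankings are usually interpreted in these studies.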
Funding: This project is supported by the Ministry of Education, Culture, Sports, Science and Technology (MONBUSHO), Japan.
Abstract: Various kinds of data are used in new product design, and more accurate data make the design results more reliable. Even though some product data can be obtained directly from existing similar products, a great deal of data remains unavailable. This makes data prediction valuable work. A method is proposed that can predict data for a product under development based on existing similar products. Fuzzy theory is used to deal with the uncertainties in the data prediction process. The proposed method can be used in life cycle design, life cycle assessment (LCA), etc. A case study on a current refrigerator is used as a demonstration example.
Funding: Supported by the Poongsan-KAIST Future Research Center Project and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. 2023R1A2C2005661).
Abstract: This study presents a machine learning-based method for predicting the fragment velocity distribution in warhead fragmentation under explosive loading conditions. The fragment resultant velocities are correlated with key design parameters, including the casing dimensions and detonation positions. The paper details the finite element analysis for fragmentation, the characterization of the dynamic hardening and fracture models, the generation of comprehensive datasets, and the training of the ANN model. The results show the influence of casing dimensions on fragment velocity distributions, with the tendencies indicating increased resultant velocity with reduced thickness and increased length and diameter. The model's predictive capability is demonstrated through accurate predictions for both the training and testing datasets, showing its potential for real-time prediction of fragmentation performance.
Funding: Project supported by the State Key Program of the National Natural Science Foundation of China (Grant No. 60835004), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK2009727), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province of China (Grant No. 10KJB510004), and the National Natural Science Foundation of China (Grant No. 61075028).
Abstract: On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods with random interruption failures in the observation, based on the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), shortened to GEKF and GUKF in this paper, respectively. The nonlinear filtering model is then established using radial basis function neural network (RBFNN) prototypes, with the network weights as the state equation and the RBFNN output as the observation equation. Finally, we treat the filtering problem with missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero, without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict a ground radioactivity time series with missing data. Experimental results demonstrate that the predictions of the GUKF-based RBFNN accord well with the real ground radioactivity time series, while the predictions of the GEKF-based RBFNN diverge.
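The intermittent-observation idea generalized here can be illustrated on a scalar linear Kalman filter, a deliberately simplified stand-in for the paper's EKF/UKF machinery (noise variances and data below are arbitrary): when the Bernoulli indicator says the measurement was lost, the filter performs the predict step only.

```python
def kalman_step(x, P, z, arrived, q=0.01, r=0.1):
    """One predict/update cycle of a scalar Kalman filter whose
    measurement z may be missing (arrived=False), as with
    Bernoulli-modelled interruptions."""
    # predict (random-walk state model: x_k = x_{k-1} + w, w ~ N(0, q))
    P = P + q
    if not arrived:
        return x, P              # interruption: skip the update
    # update with measurement z = x + v, v ~ N(0, r)
    K = P / (P + r)              # Kalman gain
    x = x + K * (z - x)
    P = (1 - K) * P
    return x, P

x, P = 0.0, 1.0                  # initial estimate and variance
for z, ok in [(1.0, True), (1.1, False), (0.9, True)]:
    x, P = kalman_step(x, P, z, ok)
```

Note that the missing step still inflates the variance `P`, so the next arriving measurement is weighted more heavily.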
Abstract: Data is always a crucial issue of concern, especially during its prediction and computation in the digital revolution. This paper helps in providing an efficient learning mechanism for accurate predictability and reducing redundant data communication. It also discusses the Bayesian analysis that finds the conditional probability of at least two parametric-based predictions for the data. The paper presents a method for improving the performance of Bayesian classification using a combination of the Kalman filter and K-means. The method is applied to a small dataset just to establish that the proposed algorithm can reduce the time needed to compute the clusters from data. The proposed Bayesian learning probabilistic model is used to check the statistical noise and other inaccuracies using unknown variables. This scenario is implemented using an efficient machine learning algorithm to perpetuate the Bayesian probabilistic approach. It also demonstrates the generative function for the Kalman-filter-based prediction model and its observations. The algorithm is implemented on the open-source Python platform, and the different modules are integrated into one piece of code via Common Platform Enumeration (CPE) for Python.
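The K-means step can be sketched minimally. This toy 1-D version (an illustration with invented data, not the paper's implementation) shows the pre-clustering that a subsequent Bayesian stage would operate on, replacing raw rows with cluster summaries:

```python
def kmeans_1d(data, k, iters=20):
    """Tiny 1-D k-means with Lloyd iterations."""
    # crude init: evenly spaced picks from the sorted data
    centers = sorted(data)[::max(1, len(data) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in data:
            # assign each point to its nearest center
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # recompute centers as cluster means (keep old center if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2)
```

On this separable toy data the two centers converge to the cluster means, 1.0 and 9.0.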
Funding: Supported in part by the Interdisciplinary Project of Dalian University (DLUXK-2023-ZD-001).
Abstract: Multivariate time series forecasting is widely used in traffic planning, weather forecasting, and energy consumption. Series decomposition algorithms can help models better understand the underlying patterns of the original series and improve forecasting accuracy. However, the decomposition kernel of previous decomposition-based models is fixed, and these models have not considered the differences in frequency fluctuations between components. These problems make it difficult to analyze the intricate temporal variations of real-world time series. In this paper, we propose a series decomposition-based Mamba model, DecMamba, to capture the intricate temporal dependencies and the dependencies among different variables of multivariate time series. A variable-level adaptive kernel combination search module is designed to exchange information on different trends and periods between variables. Two backbone structures are proposed to emphasize the differences in the frequency fluctuations of the seasonal and trend components. Mamba, with its superior performance, is used instead of a Transformer in the backbone structures to capture the dependencies among different variables. A new embedding block is designed to better capture temporal features, especially for the high-frequency seasonal component, whose semantic information is difficult to acquire. A gating mechanism is introduced to the decoder in the seasonal backbone to improve prediction accuracy. A comparison with ten state-of-the-art models on seven real-world datasets demonstrates that DecMamba better models the temporal dependencies and the dependencies among different variables, guaranteeing better prediction performance for multivariate time series.
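The decomposition at the heart of such models can be shown with one fixed moving-average kernel, which is exactly the part DecMamba makes adaptive and variable-specific; this sketch is a generic simplification, not the paper's module:

```python
def decompose(series, kernel=3):
    """Moving-average decomposition: trend = centered moving average
    (edges padded by replication), seasonal = series minus trend."""
    pad = kernel // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    trend = [sum(padded[i:i + kernel]) / kernel for i in range(len(series))]
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal

data = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, seasonal = decompose(data)
```

The two components always recompose exactly to the input (`trend[i] + seasonal[i] == data[i]`), which is what lets a model forecast them with separate backbones and add the results.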
Funding: Supported by the Basic Research Special Plan of the Yunnan Provincial Department of Science and Technology (General Project, Grant No. 202101AT070094).
Abstract: The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer, completing the stacked learning model and improving the prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and to correct the over- and underfitting problems of the single regression models, further improving the prediction accuracy. The mean square error (MSE) between the predicted and true values and the fit to the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression models (MSE = 0.03917); the former therefore has a higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predict the slope safety factor, showcasing its advantages over traditional methods. Additionally, the proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. The model not only refines the prediction of the slope safety factor but also offers a fresh approach to handling intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research is important for the modernization and digitalization of slope safety assessments.
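Stripped of XGBoost and the sparrow search optimizer, the two-level idea can be sketched in plain Python: level-1 models each fit the data, and a level-2 step learns from their predictions. In this deliberately degenerate sketch the "meta-model" merely selects the lower-error base model; the paper's meta-model is a full XGBoost regressor trained on base predictions, and the data below are invented.

```python
def fit_mean(X, y):
    # base model 1: always predict the training mean
    m = sum(y) / len(y)
    return lambda x: m

def fit_line(X, y):
    # base model 2: 1-D ordinary least squares, y = a*x + b
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(X, y)) / \
        sum((xi - mx) ** 2 for xi in X)
    b = my - a * mx
    return lambda x: a * x + b

def stack(X, y, base_fits):
    """Train all base models, then let the level-2 step pick among
    them by training-set MSE (a stand-in for a real meta-regressor)."""
    bases = [fit(X, y) for fit in base_fits]
    def mse(f):
        return sum((f(xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)
    return min(bases, key=mse)

model = stack([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [fit_mean, fit_line])
```

Here the linear base model fits the toy data exactly, so the stack returns it and `model(4.0)` gives 8.0. A proper implementation would train the meta-model on out-of-fold base predictions to avoid leakage.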
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52108377, 52090084, and 51938008).
Abstract: This research explores the potential for evaluating and predicting earth pressure balance shield performance based on a gray system model. The research focuses on a shield tunnel excavated for Metro Line 2 in Dalian, China. Due to the large discrepancy between the initial geological exploration data and the real strata, construction is extremely difficult. In view of the project's situation, a quantitative method for evaluating tunneling efficiency was proposed using cutterhead rotation (R), advance speed (S), total thrust (F), and torque (T). A total of 80 datasets with three input parameters and one output variable (F or T) were collected from the project, and a prediction framework based on the gray system model was established. Five prediction schemes were set up on top of the prediction model, and error analysis identified the optimal one. The parametric investigation indicates that the relationships between F and the three input variables in the gray system model harmonize with the theoretical explanation. The case shows that shield tunneling performance and efficiency are improved by the tunneling parameter prediction model based on the gray system model.
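The gray model most commonly used for such small-sample series is GM(1,1); the sketch below is the generic textbook construction (not necessarily the exact multivariate variant used in the paper, and the data are invented). It accumulates the series, fits dx1/dt + a*x1 = b by least squares on the background values, and predicts via the whitened exponential solution:

```python
import math

def gm11(x0):
    """GM(1,1) gray model fitted to a short positive series x0."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]              # accumulation (AGO)
    z = [0.5 * (x1[i] + x1[i + 1]) for i in range(n - 1)]  # background values
    # least squares for a, b in x0[k] = -a*z[k-1] + b
    m = n - 1
    sz, szz = sum(z), sum(v * v for v in z)
    sy = sum(x0[1:])
    szy = sum(v * y for v, y in zip(z, x0[1:]))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    def predict(k):
        # k-th original-series value; difference the accumulated solution
        xk = lambda j: (x0[0] - b / a) * math.exp(-a * j) + b / a
        return xk(k) - xk(k - 1) if k > 0 else x0[0]
    return predict

predict = gm11([1.0, 2.0, 4.0, 8.0])
```

For this roughly geometric toy series the model recovers the growth pattern: `predict(1)` is close to the observed 2.0, and the one-step-ahead forecast `predict(4)` lands near 14, a little below the geometric continuation of 16 because of the background-value discretization.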
Abstract: https://www.sciencedirect.com/journal/energy-and-buildings/vol/338/suppl/C Volume 338, 1 July 2025 [OA] (1) Real long-term performance evaluation of an improved office building operation involving a data-driven model predictive control, by Peter Klanatsky, François Veynandt, Christian Heschl, et al., Article 115590. Abstract: Data-driven Model Predictive Control (DMPC) strategies, coupled with holistically optimized HVAC system control, represent a promising approach to achieving climate targets through significant reductions in building energy consumption and associated emissions. To validate this potential in a real-world environment, a comprehensive optimization study was conducted on an office building serving as a living laboratory. Through systematic analysis of historical operational data, multiple Energy Conservation Measures (ECMs) were identified and implemented. The cornerstone of these improvements was the development and deployment of a centralized adaptive DMPC system, which was operated and evaluated over a full year.
Funding: Supported by the National Natural Science Foundation of China (41130530, 91325301, 41431177, 41571212, 41401237), the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science, Chinese Academy of Sciences (ISSASIP1622), the Government Interest Related Program between the Canadian Space Agency and Agriculture and Agri-Food Canada (13MOA01002), and the Natural Science Research Program of Jiangsu Province (14KJA170001).
Abstract: Conventional soil maps generally contain one or more soil types within a single soil polygon, but the geographic locations of those types within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction for making use of under- or over-sampled legacy pedon data for the disaggregation. The method consists of three steps. First, environmental similarities between the pedon sites and each location are computed based on soil-formative environmental factors. Second, according to the soil types of the pedon sites, the similarities are aggregated to derive a similarity distribution for each soil type. Third, a hardening process is performed on the maps to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area in Manitoba, Canada. Based on 186 independent pedon sites, the evaluation of the disaggregated map of soil subgroups showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented a better spatial pattern of soil subgroups in both detail and accuracy compared to a dominant-soil-subgroup map, which is commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and among soil subgroups that are very similar in taxonomy, indicating that new environmental covariates need to be developed. We conclude that the combination of legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
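The third ("hardening") step can be stated in a few lines: each location is assigned the candidate soil type with the highest aggregated similarity. The soil-type names and similarity values below are invented for illustration only:

```python
def harden(similarity):
    """similarity: {soil_type: [aggregated similarity at each location]}.
    Returns the winning soil type per location."""
    n_locations = len(next(iter(similarity.values())))
    return [max(similarity, key=lambda t: similarity[t][i])
            for i in range(n_locations)]

# hypothetical per-location similarity surfaces for two candidate subgroups
soil_map = harden({
    "Subgroup A": [0.90, 0.20, 0.55],
    "Subgroup B": [0.40, 0.70, 0.50],
})
```

The continuous similarity surfaces carry the uncertainty information; hardening discards it to produce the conventional crisp map that was evaluated against the 186 pedon sites.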
Funding: Supported by the National Earthquake Major Project of China (201008007) and the Fundamental Research Funds for the Central Universities of China (216275645).
Abstract: This paper proposes a model to analyze massive electricity data. A feature subset is determined by correlation-based feature selection and data-driven methods. The attribute season can be classified successfully through five classifiers using the selected feature subset, and the best model can then be determined. The effects of the other three attributes, including months, businesses, and meters, on electricity consumption analysis can be estimated using the chosen model. The data used for the project were provided by the Beijing Power Supply Bureau. We use WEKA as the machine learning tool. The models we built are promising for electricity scheduling and power theft detection.
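A minimal stand-in for correlation-based feature selection is to rank candidate columns by the absolute Pearson correlation with the target and keep the strongest. Full CFS (as in WEKA) additionally penalizes correlation between the selected features themselves; the column names and numbers here are invented:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def top_features(columns, target, k=1):
    # rank (name, values) pairs by |r| with the target, keep the k strongest
    ranked = sorted(columns, key=lambda c: -abs(pearson(c[1], target)))
    return [name for name, _ in ranked[:k]]

cols = [("temperature", [1.0, 2.0, 3.0, 4.0]),
        ("meter_id",    [5.0, 1.0, 4.0, 2.0])]
picked = top_features(cols, target=[2.0, 4.0, 6.0, 8.0], k=1)
```

On this toy data the perfectly correlated column wins (`picked == ["temperature"]`), while the near-random one is discarded.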
Abstract: The loess plateau covering the North Shaanxi slope and the Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures were not well developed. As a result, numerous wide and gentle noses and small traps with magnitudes of less than 50 m developed on the large westward-dipping monocline. Reservoirs, including the Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are characterized as thin, with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting the reservoir thickness, physical properties, and hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
Funding: National Natural Science Foundation of China (No. 51174202) and the Doctoral Fund of the Ministry of Education of China (No. 20100095110013).
Abstract: A model that rapidly predicts the density components of raw coal is described. It is based on a three-grade fast float/sink test. Recent comprehensive monthly float/sink data are used for comparison. The predicted data are used to draw washability curves and to provide a rapid evaluation of the effect of heavy-medium separation. Thirty-one production shifts' worth of fast float/sink data and the corresponding quick-ash data are used to verify the model. The results show a small error, with an arithmetic average of 0.53 and an absolute average error of 1.50, indicating that the model has high precision. The theoretical yield from the washability curves is 76.47% for the monthly comprehensive data and 81.31% using the model data, for a desired cleaned-coal ash of 9%. The relative error between the two is 6.33%, which is small and indicates that the predicted data can be used to rapidly evaluate the separation effect of gravity separation equipment.
Funding: Supported by the National Key Research and Development Program of China (No. 2018YFB2101300), the National Natural Science Foundation of China (Grant No. 61871186), and the Dean's Fund of the Engineering Research Center of Software/Hardware Co-Design Technology and Application, Ministry of Education (East China Normal University).
Abstract: Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution concentrates the receptive fields of the outputs in the earlier part of the input sequence, so recent input information is severely lost. The other is that the distribution-shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to those operations. Based on SDC and DCM, we construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND reduces the prediction mean squared error by 7.3% and saves runtime compared with state-of-the-art models and the vanilla TCN.
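The receptive-field limitation that motivates SDC is easy to quantify: in a standard dilated causal stack, a layer with kernel size k and dilation d extends the receptive field by (k-1)*d past steps, so coverage grows only as fast as the dilation schedule. The dilation schedule below is a typical doubling example, not one taken from the paper:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal conv layers:
    the output at time t sees 1 + sum((k-1)*d) inputs ending at t."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# four layers, kernel 3, dilations doubling each layer
rf = receptive_field(3, [1, 2, 4, 8])
```

With kernel 3 and dilations 1, 2, 4, 8 the receptive field is 31 steps, and, because the convolution is causal, all of those steps lie at or before t, which is why outputs are biased toward the earlier part of the input sequence.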
Funding: Supported by the Beijing Chaoyang District Collaborative Innovation Project (No. CYXT2013), the Beijing Municipal Science and Technology Key R&D Program-Capital Blue Sky Action Cultivation Project (Z19110900910000), "Research and Demonstration of High Emission Vehicle Monitoring Equipment System Based on Sensor Integration Technology" (Z19110000911003), and the Academic Research Projects of Beijing Union University (No. ZK80202103).
Abstract: Time series forecasting and analysis are widely used in many fields and application scenarios. Time series historical data reflect change patterns and trends, which can, to a certain extent, serve the application and decision-making in each scenario. In this paper, we select a time series prediction problem in the atmospheric environment scenario as the starting point for applied research. For data support, we obtained data on nearly 3500 vehicles in some cities in China from the Runwoda Research Institute, focusing on the major pollutant emission data of non-road mobile machinery and high-emission vehicles in Beijing and Bozhou, Anhui Province, to build the dataset and conduct time series prediction experiments. This paper proposes a P-gLSTNet model and uses the Autoregressive Integrated Moving Average model (ARIMA), long short-term memory (LSTM), and Prophet to predict and compare future emissions. The experiments are validated on four public datasets and one self-collected dataset, with the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) selected as the evaluation metrics. The experimental results show that the proposed P-gLSTNet fusion model has a smaller prediction error, outperforms the backbone methods, and is more suitable for the prediction of time series data in this scenario.
Abstract: Various Wireless Sensor Network (WSN) applications share the common task of collecting data from the sensor nodes via the sink. Since the procedure of collecting data is iterative, an effective technique is necessary to obtain the data efficiently while reducing nodal energy consumption. Hence, a technique for data reduction in WSNs is presented in this paper through a prediction algorithm, called the Hierarchical Fractional Bidirectional Least-Mean-Square (HFBLMS) algorithm. The novel algorithm is designed by modifying the Hierarchical Least-Mean-Square (HLMS) algorithm to include BLMS for bidirectional data prediction and Fractional Calculus (FC) in the weight-update process. Data reduction is achieved by transmitting only those data required based on the values predicted at the sensor node and the sink. Moreover, the proposed HFBLMS algorithm reduces the energy consumption in the network through the effective prediction attained by BLMS. Two metrics, energy consumption and prediction error, are used to evaluate the performance of the HFBLMS prediction algorithm: it attains energy values of 0.3587 and 0.1953 at the maximum number of rounds, and prediction errors of just 0.0213 and 0.0095, on air quality and localization datasets, respectively.
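The core dual-prediction idea, without the hierarchical, bidirectional, or fractional refinements that HFBLMS adds, can be sketched with a plain LMS filter: the node transmits a raw sample only when its own prediction error exceeds a threshold, since the sink runs the identical filter and can reproduce everything else. The parameter values and series below are arbitrary:

```python
def lms_reduce(series, mu=0.3, taps=2, eps=0.5):
    """Run an LMS predictor at the node; return only the (time, value)
    samples whose prediction error exceeded eps and so were transmitted."""
    w = [0.0] * taps          # adaptive filter weights
    hist = [0.0] * taps       # most recent samples, newest first
    sent = []
    for t, x in enumerate(series):
        pred = sum(wi * hi for wi, hi in zip(w, hist))
        err = x - pred
        if abs(err) > eps:
            sent.append((t, x))                              # transmit raw sample
        w = [wi + mu * err * hi for wi, hi in zip(w, hist)]  # LMS weight update
        hist = [x] + hist[:-1]
    return sent

sent = lms_reduce([1.0] * 10)
```

On a constant series the filter converges after a few samples, so only the first three readings are transmitted out of ten; the energy saving scales with how predictable the sensed signal is.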
Abstract: The rapid expansion of comprehensive sports datasets and the successful application of data mining techniques in various domains have given rise to sports data prediction techniques. These techniques enable the extraction of hidden knowledge that can significantly impact the sports industry, as more and more clubs use Machine Learning (ML) and Deep Learning (DL) methods to manage athletes and training. In this research, the focus is on predicting the outcomes of specific basketball athletes, a topic that has garnered significant research attention. The paper was motivated by a dual interest in college and NBA basketball matches, alongside a keen observation of the evolving strategies employed by coaches in athlete management. This interest was further reinforced by firsthand observation of such evolving methods during a baseball game at Citi Field in New York. These factors collectively underpin the relevance and significance of this research, highlighting the intersection of personal interest and the evolving landscape of sports management. For data selection, we acquired data from previously published essays as well as from Kaggle, a reputable online platform. We then evaluated several prominent machine learning models, namely Linear Regression, KNN, Gradient Boosting, Elastic Net, and Lasso, to ascertain their effectiveness in predicting the performance of specific players. Through rigorous analysis and comparison, we concluded that Linear Regression and Gradient Boosting exhibited superior predictive capabilities compared to the other models considered. These two models demonstrated higher accuracy and reliability in forecasting player performance, establishing them as the most suitable choices for our predictive modeling. This meticulous selection process, involving both data acquisition and model evaluation, forms the foundation of our research methodology and underscores the rigor and precision with which our conclusions are drawn.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2017YFA0505500), the Japan Society for the Promotion of Science KAKENHI Program (Grant No. JP15H05707), and the National Natural Science Foundation of China (Grant Nos. 11771010, 31771476, 91530320, 91529303, 91439103, and 81471047).
Abstract: Natural systems are typically nonlinear and complex, and it is of great interest to be able to reconstruct a system in order to understand its mechanism; such reconstruction can not only recover nonlinear behaviors but also predict future dynamics. Owing to the advances of modern technology, big data have become increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they do not assume any model equations but recover the dynamics purely from the measured time series data. In this review, the development of state space reconstruction techniques is introduced, and recent advances in systems prediction and causality inference using state space reconstruction are presented. In particular, the cutting-edge methods for dealing with short-term time series data are highlighted. Finally, the advantages as well as the remaining problems in this field are discussed.
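The basic operation behind state space reconstruction is the time-delay embedding: a scalar series is lifted to vectors of delayed coordinates, from which the attractor geometry can be studied without model equations. The embedding dimension and delay below are arbitrary illustration values, not a recipe for choosing them:

```python
def delay_embed(series, dim, tau):
    """Takens-style delay embedding: point i is the tuple
    (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    span = (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim))
            for i in range(len(series) - span)]

points = delay_embed([1, 2, 3, 4, 5, 6], dim=3, tau=2)
```

Each element of `points` is one reconstructed state; prediction methods such as the simplex projection then forecast by finding each state's nearest neighbors in this reconstructed space.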
Abstract: The accuracy of fluid property data plays an absolutely pivotal role in reservoir computational processes. Reliable data can be obtained through various experimental methods, but these methods are very expensive and time-consuming. Alternative methods are numerical models, which use measured experimental data to develop a representative model for predicting the desired parameters. In this study, several Artificial Intelligence (AI) models were developed to predict the saturation pressure, oil formation volume factor, and solution gas-oil ratio. A data bank of 582 reported data sets covering a wide range of fluid properties was used. The accuracy and reliability of the models were examined through statistical parameters such as the correlation coefficient (R2), average absolute relative deviation (AARD), and root mean square error (RMSE). The results illustrated good accordance between the predicted data and the target values. The models were also compared with previous works and developed empirical correlations, which indicated that they are more reliable than all the compared models and correlations. Finally, a relevancy factor was calculated for each input parameter to illustrate the impact of the different parameters on the predicted values. The relevancy factor showed that, in these models, the solution gas-oil ratio has the greatest impact on both the saturation pressure and the oil formation volume factor; on the other hand, the saturation pressure has the greatest effect on the solution gas-oil ratio.