Background: Digital Twin (DT) has proven to be one of the most promising technologies for routine monitoring and management of complex systems with uncertainties. Methods: Our work, which is mainly concerned with heterogeneous spatial-temporal data, focuses on exploring data utilization methodology in DT. The goal of this research is to summarize the best practices that make spatial-temporal data analytically tractable in a systematic and quantifiable manner. Some methods are found to handle those data effectively via joint spatial-temporal analysis in a high-dimensional space. We provide a concise yet comprehensive tutorial on spatial-temporal analysis covering data, theories, algorithms, indicators, and applications. The advantages of our spatial-temporal analysis are discussed, including its model-free mode, solid theoretical foundation, and robustness against ubiquitous uncertainty and partial data error. Finally, we take the condition-based maintenance of a real digital substation in China as an example to verify the proposed spatial-temporal analysis mode. Results: Our proposed spatial-temporal data analysis mode successfully turned raw chromatographic data, which are valueless in low-dimensional space, into an informative high-dimensional indicator. The designed high-dimensional indicator could capture the 'insulation' correlation among the sampling data over a long time span. Hence it is robust against external noise and may support decision-making. This analysis is also applicable to other daily spatial-temporal data of the same form. Conclusions: This exploration and summary of spatial-temporal data analysis may benefit the fields of both engineering and data science.
In 2007, China surpassed the USA to become the largest carbon emitter in the world. China has promised a 60%–65% reduction in carbon emissions per unit GDP by 2030, compared to the baseline of 2005. Therefore, it is important to obtain accurate dynamic information on the spatial and temporal patterns of carbon emissions and carbon footprints to support the formulation of effective national carbon emission reduction policies. This study builds a carbon emission panel data model that simulates carbon emissions in China from 2000–2013 using nighttime lighting data and carbon emission statistics. By applying the Exploratory Spatial-Temporal Data Analysis (ESTDA) framework, this study analyzed the spatial patterns and dynamic spatial-temporal interactions of carbon footprints from 2001–2013. The improved Tapio decoupling model was adopted to investigate the levels of coupling or decoupling between the carbon emission load and economic growth in 336 prefecture-level units. The results show that, firstly, the model achieved high accuracy in simulating carbon emissions. Secondly, the total carbon footprints and carbon deficits across China increased with average annual growth rates of 4.82% and 5.72%, respectively. The overall carbon footprints and carbon deficits were larger in the North than in the South. There were highly significant spatial autocorrelation features in the carbon footprints of prefecture-level units. Thirdly, the relative lengths of the Local Indicators of Spatial Association (LISA) time paths were longer in the North than in the South, and they increased from the coastal region to the central and western regions. Lastly, the overall decoupling index was mainly of the weak decoupling type, but the number of cities with this weak decoupling continued to decrease. The unsustainable development trend of China's economic growth and carbon emission load will continue for some time.
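The Tapio decoupling elasticity referred to above can be sketched as follows. The thresholds (0.8 and 1.2) and state labels follow the commonly cited Tapio classification, not necessarily the paper's improved variant, and the numbers in the usage note are invented for illustration.

```python
def tapio_index(co2_start, co2_end, gdp_start, gdp_end):
    """Tapio decoupling elasticity: relative change in emissions
    divided by relative change in GDP over the same period."""
    d_co2 = (co2_end - co2_start) / co2_start
    d_gdp = (gdp_end - gdp_start) / gdp_start
    return d_co2 / d_gdp

def classify(elasticity, d_co2, d_gdp):
    """Simplified subset of Tapio's eight decoupling states."""
    if d_gdp > 0 and d_co2 > 0:
        if elasticity < 0.8:
            return "weak decoupling"
        if elasticity <= 1.2:
            return "expansive coupling"
        return "expansive negative decoupling"
    if d_gdp > 0 and d_co2 <= 0:
        return "strong decoupling"
    return "other"
```

For example, emissions growing 2% while GDP grows 8% gives an elasticity of 0.25, the "weak decoupling" state that dominates the study's cities.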
The prosperity of deep learning has revolutionized many machine learning tasks (such as image recognition and natural language processing). With the widespread use of autonomous sensor networks, the Internet of Things, and crowdsourcing to monitor real-world processes, the volume, diversity, and veracity of spatial-temporal data are expanding rapidly. However, traditional methods have limitations in coping with spatial-temporal dependencies: they either incorporate too much data from weakly connected locations or ignore the relationships between interrelated but geographically separated regions. In this paper, a novel deep learning model (termed RF-GWN) is proposed by combining Random Forest (RF) and Graph WaveNet (GWN). In RF-GWN, a new adaptive weight matrix is formulated by combining the Variable Importance Measure (VIM) of RF with the long-time-series feature extraction ability of GWN in order to capture potential spatial dependencies and extract long-term dependencies from the input data. Furthermore, two experiments are conducted on two real-world datasets to predict traffic flow and groundwater level. Baseline models are implemented with the Diffusion Convolutional Recurrent Neural Network (DCRNN), Spatial-Temporal GCN (ST-GCN), and GWN to verify the effectiveness of RF-GWN. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are selected as performance criteria. The results show that the proposed model can better capture spatial-temporal relationships: the prediction performance on the METR-LA dataset is slightly improved, and the metrics for the prediction task on the PEMS-BAY dataset are significantly improved. These improvements extend to the groundwater dataset, where prediction accuracy is effectively improved. Thus, the applicability and effectiveness of the proposed RF-GWN model in both traffic flow and groundwater level prediction are demonstrated.
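The three error criteria named above are standard and can be sketched in plain Python (inputs here are made up; real evaluations would run over the full test split):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors quadratically."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (%); assumes no zero targets."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```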
Lake surface water temperature (SWT) is an important indicator of lake state relative to its water chemistry and aquatic ecosystem, in addition to being an important regional climate indicator. However, few studies have examined the spatial-temporal changes of lake SWT on the Qinghai-Tibet Plateau, including at Qinghai Lake. Our objective is to study the spatial-temporal changes in the SWT of Qinghai Lake from 2001 to 2010, using Moderate-resolution Imaging Spectroradiometer (MODIS) data. On a per-pixel basis, we calculated the temporal SWT variations and long-term trends, compared the spatial patterns of annual average SWT in different years, and mapped and analyzed the seasonal cycles of the spatial patterns of SWT. The results revealed that the differences between the average daily SWT and air temperature during the temperature-decreasing phase were relatively larger than those during the temperature-increasing phase. The increasing rate of the annual average SWT during the study period was about 0.01°C/a, compared with a rate of about 0.05°C/a for the annual average air temperature. The annual average SWT from 2001 to 2010 showed similar spatial patterns, while the SWT spatial changes from January to December demonstrated an interesting seasonal reversal pattern. The high-temperature area shifted stepwise from the south to the north and then back to the south from January to December, whereas the low-temperature area traced a reversed annual cycle. The spatial-temporal patterns of SWT were shaped by the topography of the lake basin and the distribution of drainages.
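Per-pixel long-term trends like the ~0.01°C/a SWT rate above are typically ordinary-least-squares slopes of annual values against year; a minimal sketch with invented values:

```python
def linear_trend(years, values):
    """Ordinary-least-squares slope of annual values against year,
    i.e. the warming or cooling rate in units per year."""
    n = len(years)
    mean_y = sum(years) / n
    mean_v = sum(values) / n
    num = sum((y - mean_y) * (v - mean_v) for y, v in zip(years, values))
    den = sum((y - mean_y) ** 2 for y in years)
    return num / den
```

Applying this to each MODIS pixel's annual-mean series yields the per-pixel trend map described in the abstract.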
As a significant city in the Yangtze River Delta region, Hefei has experienced rapid changes in its sources of air pollution due to high-speed economic development and urban expansion. However, there has been limited research in recent years on the spatial-temporal distribution and emission of its atmospheric pollutants. To address this, this study conducted mobile observations of urban roads using a Mobile-DOAS instrument from June 2021 to May 2022. The monitoring results exhibit favourable consistency with TROPOMI satellite data and ground monitoring station data. Temporally, there were pronounced seasonal variations in air pollutants. Spatially, high concentrations of HCHO and NO2 were closely associated with traffic congestion on roadways, while heightened SO2 levels were attributed to winter heating and industrial emissions. The study also revealed that with the implementation of road policies, the average vehicle speed increased by 95.4%, while the NO concentration decreased by 54.4%. In the estimation of the urban NOx emission flux, it was observed that, in temporal terms, compared with inventory data, the emissions calculated via mobile measurements exhibited more distinct seasonal patterns, with the highest emission rate of 349 g/s in winter and the lowest of 142 g/s in summer. In spatial terms, the significant difference in emissions between the inner and outer ring roads also suggests that the city's primary NOx emission sources lie in the area between these two rings. This study offers data support for formulating the next phase of air pollution control measures in urban areas.
1 Introduction
Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
The spatial pattern of meteorological factors cannot be accurately simulated by using observations from meteorological stations (OMS) that are distributed sparsely in complex terrain. It is expected that the spatial-temporal characteristics of drought in regions with complex terrain can be better represented by meteorological data with high spatial-temporal resolution and accuracy. In this study, the Standardized Precipitation Evapotranspiration Index (SPEI), calculated with meteorological factors extracted from ITPCAS (the China Meteorological Forcing Dataset produced by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences), was applied to identify the spatial-temporal characteristics of drought in Shaanxi Province of China during the period 1979–2016. Drought areas detected by SPEI calculated with data from ITPCAS (SPEI-ITPCAS) on the seasonal scale were validated against historical drought records from the Chinese Meteorological Disaster Canon-Shaanxi, and compared with drought areas detected by SPEI calculated with data from OMS (SPEI-OMS). Drought intensity, trend, and temporal ranges for mutations of SPEI-ITPCAS were analyzed by using the cumulative drought intensity (CDI) index and the Mann-Kendall test. The results indicated that drought areas detected from SPEI-ITPCAS were closer to the historical drought records than those detected from SPEI-OMS. Severe and exceptional drought events with SPEI-ITPCAS lower than –1.0 occurred most frequently in summer, followed by spring. There was a general drying trend in spring and summer in Shaanxi Province and a significant wetting trend in autumn and winter in northern Shaanxi Province. On seasonal and annual scales, the regional and temporal ranges for mutations of SPEI-ITPCAS differed, and most mutations occurred before 1990 in most regions of Shaanxi Province. These results reflect the response of different regions of Shaanxi Province to climate change, which will help in managing regional water resources.
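The Mann-Kendall trend test applied to the SPEI series can be sketched as follows (the tie-correction term in the variance is omitted for brevity; |Z| > 1.96 indicates a significant trend at the 5% level):

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction.
    Returns (S, Z): S is the sign-pair statistic, Z the normal score."""
    n = len(x)
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1) for j in range(i + 1, n)
    )
    var = n * (n - 1) * (2 * n + 5) / 18  # variance assuming no ties
    if s > 0:
        return s, (s - 1) / math.sqrt(var)
    if s < 0:
        return s, (s + 1) / math.sqrt(var)
    return s, 0.0
```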
In the current situation of decelerating economic expansion, examining the digital economy (DE) as a novel economic model is beneficial for the local economy's sustainable and high-quality development (HQD). We analyzed panel data from the Yellow River (YR) region from 2013 to 2021 and discovered notable spatial variances in the composite index and coupling coordination of the two systems. Specifically, the downstream region exhibited the highest coupling coordination, while the upstream region had the lowest. We identified that favorable factors such as economic development, innovation, industrial upgrading, and government intervention can bolster the coupling. Our findings provide a valuable framework for promoting DE and HQD in the YR region.
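The coupling coordination referred to above is commonly computed with the two-system coupling coordination degree model; the abstract does not state the paper's weights, so equal weights are assumed here, with U1 and U2 the normalized composite scores of the DE and HQD systems:

```python
import math

def coupling_coordination(u1, u2, a=0.5, b=0.5):
    """Two-system coupling degree C = 2*sqrt(U1*U2)/(U1+U2) and
    coordination degree D = sqrt(C*T), where T = a*U1 + b*U2."""
    c = 2 * math.sqrt(u1 * u2) / (u1 + u2)
    t = a * u1 + b * u2
    return c, math.sqrt(c * t)
```

Equal subsystem scores give C = 1 (full coupling); a large gap between the two scores drags both C and D down, which is how the upstream/downstream disparity would register.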
Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden periodicity of traffic flow, the data are divided into three kinds of periods: hourly, daily, and weekly. Secondly, a graph attention residual layer is constructed to learn global spatial features across regions, while local spatial-temporal dependence is captured using a T-GCN module. Thirdly, a transformer layer is introduced to learn long-term temporal dependence, and a position embedding mechanism labels the position information of all traffic sequences so that the multi-head self-attention mechanism can recognize sequence order and allocate weights to different time nodes. Experimental results on four real-world datasets show that MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
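The position embedding step can be illustrated with the sinusoidal scheme from the original Transformer; whether MSSTGCN uses sinusoidal or learned embeddings is not stated in the abstract, so treat this as one common choice:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position embedding:
    PE[pos, 2i]   = sin(pos / 10000**(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))"""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Adding these vectors to the per-time-step traffic features gives the self-attention layers the sequence-order information the abstract describes.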
Spatial-temporal traffic prediction technology is crucial for network planning, resource allocation optimization, and user experience improvement. With the development of virtual network operators, multi-operator collaboration, and edge computing, spatial-temporal traffic data has taken on a distributed nature. Consequently, non-centralized spatial-temporal traffic prediction solutions have emerged as a recent research focus. Currently, most research adopts federated learning methods to train traffic prediction models distributed across base stations. This approach reduces the additional burden on communication systems, but it has a drawback: it cannot handle irregular traffic data. Due to unstable wireless network environments, device failures, insufficient storage resources, and similar issues, data missing inevitably occurs during the collection of traffic data, which makes distributed traffic data irregular. Yet commonly used traffic prediction models such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) typically assume that the data is complete and regular. To address the challenge of handling irregular traffic data, this paper transforms irregular traffic prediction into the problems of estimating latent variables and generating future traffic. To solve these problems, this paper introduces split learning to design a structured distributed learning framework. The framework comprises a Global-level Spatial structure mining Model (GSM) and several Node-level Generative Models (NGMs). NGMs and the GSM are, respectively, Seq2Seq models deployed on base stations and graph neural network models deployed on the cloud or a central controller. Firstly, the time embedding layer in each NGM establishes the mapping relationship between irregular traffic data and regular latent temporal feature variables. Secondly, the GSM collects statistical feature parameters of the latent temporal feature variables from the various nodes and executes graph embedding for spatial-temporal traffic data. Finally, each NGM generates future traffic based on the latent temporal and spatial feature variables. The introduction of the time attention mechanism enhances the framework's capability to handle irregular traffic data, and the graph attention network introduces spatially correlated base station traffic features into local traffic prediction, compensating for missing information in local irregular traffic data. The proposed framework effectively addresses the distributed prediction of irregular traffic data. In tests on real-world datasets, the proposed framework improves traffic prediction accuracy by 35% compared with other commonly used distributed traffic prediction methods.
With the intelligent transformation of process manufacturing, accurate and comprehensive perception information is fundamental for the application of artificial intelligence methods. In zinc smelting, the fluidized bed roaster is a key piece of large-scale equipment and plays a critical role in the manufacturing process; its internal temperature field directly determines the quality of zinc calcine and other related products. However, due to its vast spatial dimensions, limited observation methods, and the complex multiphase, multifield coupled reaction atmosphere inside it, accurately and timely perceiving its temperature field remains a significant challenge. To address these challenges, a spatial-temporal reduced-order model (STROM) is proposed, which realizes fast and accurate temperature field perception based on sparse observation data. Specifically, to address the difficulty of matching the initial physical field with the sparse observation data, an initial field construction based on data assimilation (IFCDA) method is proposed to ensure that the initial conditions of the model match the actual operating state, which provides a basis for constructing a high-precision computational fluid dynamics (CFD) model. Then, to address the high simulation cost of high-precision CFD models under full working conditions, a high-uniformity (HU) orthogonal test design (OTD) method with the centered L2 deviation is proposed to ensure high information coverage of the temperature field dataset under typical working conditions across multiple factors and levels of the component, feed, and blast parameters. Finally, to address the difficulty of real-time and accurate temperature field prediction, considering the spatial correlation between the observed temperature and the temperature field, as well as the dynamic correlation of the observed temperature in the time dimension, a spatial-temporal predictive model (STPM) is established, which realizes rapid prediction of the temperature field from sparse observation data. To verify the accuracy and validity of the proposed method, CFD model validation and reduced-order model prediction experiments are designed; the results show that the proposed method realizes high-precision and fast prediction of the roaster temperature field under different working conditions from sparse observation data. Compared with the CFD model, the prediction root-mean-square error (RMSE) of STROM is less than 0.038, and the computational efficiency is improved by a factor of 3.4184 × 10⁴. In particular, STROM also has good prediction ability for unmodeled conditions, with a prediction RMSE of less than 0.1089.
With the rapid development of society, water contamination events cause great losses when accidents happen in the water supply system. A large number of water quality sensor nodes are deployed in the water supply network to detect and warn of contamination events and prevent pollution from spreading. If all sensor nodes detect and transmit water quality data when contamination occurs, heavy communication overhead results. To reduce this overhead, the Connected Dominating Set construction algorithm, Rule K, is adopted to select a subset of sensor nodes. Moreover, to improve detection accuracy, a Spatial-Temporal Abnormal Event Detection Algorithm with Multivariate water quality data (M-STAEDA) is proposed. In M-STAEDA, first, Back Propagation neural network models are adopted to analyze the multiple water quality parameters and identify possible outliers. Then, M-STAEDA determines potential contamination events through Bayesian sequential analysis, estimating the probability of a contamination event. Third, it makes a decision based on the fusion of the multiple event probabilities. Finally, a spatial correlation model is applied to determine the spatial-temporal contamination event in the water supply network. The experimental results indicate that the proposed M-STAEDA algorithm achieves higher accuracy with the BP neural network model, improving the detection rate and reducing the false alarm rate compared with the single-variate temporal abnormal event detection algorithm (S-TAEDA).
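The Bayesian sequential analysis step can be sketched as a running posterior update over per-sample outlier flags; the two likelihoods below (0.9 and 0.1) are illustrative assumptions, not values from the paper:

```python
def bayes_update(prior, outlier, p_out_event=0.9, p_out_normal=0.1):
    """One sequential Bayesian update of P(contamination event) given a
    boolean outlier flag from the upstream outlier-analysis step.
    p_out_event / p_out_normal: assumed probabilities of an outlier flag
    under contamination vs. normal operation."""
    like_event = p_out_event if outlier else 1.0 - p_out_event
    like_normal = p_out_normal if outlier else 1.0 - p_out_normal
    evidence = like_event * prior + like_normal * (1.0 - prior)
    return like_event * prior / evidence

# Three consecutive outlier flags push a 5% prior above 95%,
# while a normal reading pulls the posterior back down.
p = 0.05
for flag in (True, True, True):
    p = bayes_update(p, flag)
```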
Based on night light data, urban area data, and economic data of the Wuhan Urban Agglomeration from 2009 to 2015, we use the spatial correlation dimension, spatial autocorrelation analysis, and the weighted standard deviation ellipse to identify the general characteristics and dynamic evolution of the urban spatial pattern and the economic disparity pattern. The results show that between 2009 and 2013, the Wuhan Urban Agglomeration expanded gradually from northwest to southeast and presented the dynamic evolution feature of developing "along the river and the road". The spatial structure is distinct, forming a "core-periphery" pattern. The development of the Wuhan Urban Agglomeration shows obvious imbalance in economic-geographic space, presenting a development tendency of "one prominent core, stronger in the west and weaker in the east". The contrast within the Wuhan Urban Agglomeration has gradually decreased. Wuhan city and its surrounding areas, as well as the cities along the Yangtze River, have stronger economic growth strength. However, the relative development rate of the Wuhan city area is still far higher than that of other cities and counties.
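Global spatial autocorrelation of the kind analyzed here is usually measured with Moran's I; a minimal sketch over a toy one-dimensional neighbourhood (positive I means similar values cluster, as in a "core-periphery" pattern):

```python
def morans_i(values, weights):
    """Global Moran's I.
    values: list of n observations; weights: n x n spatial weight matrix
    (list of lists), e.g. 1.0 for adjacent units and 0.0 otherwise."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)
```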
As pivotal green spaces, urban parks play an important role in urban residents' daily activities. They not only bring people physical health but are also likely to elicit positive sentiment in those who visit them. Recently, social media big data has provided new data sources for sentiment analysis. However, limited research has explored the connection between urban parks and individuals' sentiments. Therefore, this study first employed a pre-trained language model (BERT, Bidirectional Encoder Representations from Transformers) to calculate sentiment scores based on social media data. Second, it analyzed the relationship between urban parks and individuals' sentiment from both spatial and temporal perspectives. Finally, by utilizing a structural equation model (SEM), we identified 13 factors and analyzed their degrees of influence. The research findings are as follows: ① Individuals generally experienced positive sentiment, with high sentiment scores in the majority of urban parks. ② Urban park type influenced sentiment scores: higher scores were observed in eco-parks, comprehensive parks, and historical parks. ③ Urban park level had a low impact on sentiment scores, with distinctions observed mainly at level 3 and level 4. ④ Compared with internal factors of the parks, the external infrastructure surrounding them exerted a more significant impact on sentiment scores. For instance, the number of bus and subway stations around urban parks led to higher sentiment scores, while scenic spots and restaurants had the inverse effect. This study provides a novel method to quantify the services of various urban parks, which can serve as inspiration for similar studies in other cities and countries, enhancing their park planning and management strategies.
Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
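Components (i) and (iii) of the composite loss can be sketched in plain Python on flat vectors. The exact forms of the variance penalty and the noise-aware term are not specified in the abstract; the penalty below pulls reconstruction variance toward 1.0 purely as an illustration, and term (ii) is omitted:

```python
def masked_mse(x_true, x_recon, missing_mask):
    """Component (i): MSE over the entries flagged as missing only,
    i.e. the positions the autoencoder is asked to impute."""
    errs = [(t - r) ** 2
            for t, r, m in zip(x_true, x_recon, missing_mask) if m]
    return sum(errs) / len(errs)

def composite_loss(x_true, x_recon, missing_mask, lam_var=0.01):
    """Masked MSE plus an assumed variance penalty (component (iii))
    that discourages collapsed, low-variance reconstructions."""
    n = len(x_recon)
    mean = sum(x_recon) / n
    var = sum((r - mean) ** 2 for r in x_recon) / n
    penalty = lam_var * (1.0 - var) ** 2
    return masked_mse(x_true, x_recon, missing_mask) + penalty
```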
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized, model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% on UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
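SMOTE's core idea, interpolating between a minority-class sample and one of its k nearest minority neighbours, can be sketched in plain Python (a toy stand-in for a full implementation such as imbalanced-learn's; points and parameters here are invented):

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style:
    pick a random minority point, find its k nearest minority
    neighbours (squared Euclidean), and interpolate toward one of them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not a),
            key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)),
        )[:k]
        b = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Each synthetic point lies on a segment between two real minority samples, so the augmented class stays inside the original feature region rather than being duplicated verbatim.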
Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used with multiple stego images provides good image quality but often yields low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks whose pixels are sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared with existing triple-stego RDH approaches, advancing the field of reversible steganography.
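For intuition, classic single-bit PVO embedding on the block maximum works as below; the paper's scheme extends this to multi-bit embedding across three stego images, which is not reproduced here:

```python
def pvo_embed_max(block, bit):
    """Classic PVO embedding on the block maximum.
    Let d = largest - second largest pixel: d == 1 embeds one bit
    (max += bit), d > 1 shifts the max by 1 so extraction stays
    unambiguous, d == 0 leaves the block untouched."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    hi, second = order[-1], order[-2]
    d = block[hi] - block[second]
    out = list(block)
    if d == 1:
        out[hi] += bit
    elif d > 1:
        out[hi] += 1
    return out

def pvo_extract_max(block):
    """Inverse of pvo_embed_max: returns (recovered_block, bit_or_None)."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    hi, second = order[-1], order[-2]
    d = block[hi] - block[second]
    out = list(block)
    if d == 1:
        return out, 0        # bit 0 was embedded, pixel unchanged
    if d == 2:
        out[hi] -= 1
        return out, 1        # bit 1 was embedded, undo the +1
    if d > 2:
        out[hi] -= 1
        return out, None     # shifted block, no payload
    return out, None         # d == 0: untouched block
```

The symmetric trick on the block minimum (subtracting instead of adding) gives the "minimum side" embedding the abstract mentions.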
The increasing complexity of China's electricity market creates substantial challenges for settlement automation, data consistency, and operational scalability. Existing provincial settlement systems are fragmented, lack a unified data structure, and depend heavily on manual intervention to process high-frequency and retroactive transactions. To address these limitations, a graph-based unified settlement framework is proposed to enhance automation, flexibility, and adaptability in electricity market settlements. A flexible attribute-graph model is employed to represent heterogeneous multi-market data, enabling standardized integration, rapid querying, and seamless adaptation to evolving business requirements. An extensible operator library is designed to support configurable settlement rules, and a suite of modular tools, including dataset generation, formula configuration, billing templates, and task scheduling, facilitates end-to-end automated settlement processing. A robust refund-clearing mechanism is further incorporated, utilizing sandbox execution, data-version snapshots, dynamic lineage tracing, and real-time change-capture technologies to enable rapid and accurate recalculations under dynamic policy and data revisions. Case studies based on real-world data from regional Chinese markets validate the effectiveness of the proposed approach, demonstrating marked improvements in computational efficiency, system robustness, and automation. Moreover, enhanced settlement accuracy and high temporal granularity improve price-signal fidelity, promote cost-reflective tariffs, and incentivize energy-efficient and demand-responsive behavior among market participants. The method not only supports equitable and transparent market operations but also provides a generalizable, scalable foundation for modern electricity settlement platforms in increasingly complex and dynamic market environments.
With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted equipment. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed from various perspectives using two ensemble models and three Deep Learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score of encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, the recall in the UNSW-NB15 (encrypted) dataset improved by up to 23.0%, and in the CICIoT-2023 (encrypted) dataset by 20.26%, a similar level of improvement. Notably, in CICIoT-2023, the F1-score and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
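The abstract above does not name the eight sampling techniques it compares, but plain random oversampling is a common baseline among them; a minimal stdlib-only sketch of that baseline (function name and interface are illustrative assumptions):

```python
import random

def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class. One of the simplest data
    sampling techniques used against class imbalance."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # keep originals, then draw extra copies with replacement
        picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y
```

More elaborate techniques (e.g., SMOTE) synthesize new minority samples instead of duplicating existing ones, but the balancing goal is the same.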
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES, combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, in which the training data portion was increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, emphasizing an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
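The embedding-based similarity measures mentioned above typically reduce, at their core, to cosine similarity between an essay's vector and a reference vector; a minimal sketch (the feature pipeline around it is the paper's, not shown here):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors, e.g. a
    student essay embedding and a model-answer embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```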
Funding: supported by the National Natural Science Foundation of China (51907121, 61871265) awarded to Xing HE, and the Foundation from State Grid Shanghai Pudong Electric Power Supply Company (SGSHPD00YJJS2106751) awarded to Qian Ai and Xing HE.
Abstract: Background: Digital Twin (DT) has proven to be one of the most promising technologies for routine monitoring and management of complex systems with uncertainties. Methods: Our work, which is mainly concerned with heterogeneous spatial-temporal data, focuses on exploring data utilization methodology in DT. The goal of this research is to summarize the best practices that make spatial-temporal data analytically tractable in a systematic and quantifiable manner. Some methods are found to handle those data effectively via joint spatial-temporal analysis in a high-dimensional space. We provide a concise yet comprehensive tutorial on spatial-temporal analysis covering data, theories, algorithms, indicators, and applications. The advantages of our spatial-temporal analysis are discussed, including its model-free mode, solid theoretical foundation, and robustness against ubiquitous uncertainty and partial data errors. Finally, we take the condition-based maintenance of a real digital substation in China as an example to verify the proposed spatial-temporal analysis mode. Results: Our proposed spatial-temporal data analysis mode successfully turned raw chromatographic data, which are of little value in low-dimensional space, into an informative high-dimensional indicator. The designed high-dimensional indicator captures the 'insulation' correlation among the sampling data over a long time span; hence it is robust against external noise and may support decision-making. This analysis is also adaptable to other daily spatial-temporal data in the same form. Conclusions: This exploration and summary of spatial-temporal data analysis may benefit the fields of both engineering and data science.
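The abstract does not publish its exact high-dimensional indicator, but a common model-free choice in high-dimensional spatial-temporal analysis is a spectral statistic of the sample covariance matrix built from a window of multi-sensor data. A stdlib-only illustrative stand-in (not the paper's statistic) computing the largest covariance eigenvalue by power iteration:

```python
import random

def max_eigenvalue(samples, iters=200, seed=0):
    """Largest eigenvalue of the sample covariance matrix, via power
    iteration. `samples` is a list of equal-length observation vectors
    (rows = time steps, columns = sensors/variables)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # cov[j][k] = (1/n) * sum over time of centered_j * centered_k
    cov = [[sum(c[j] * c[k] for c in centered) / n for k in range(p)]
           for j in range(p)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        lam = max(abs(x) for x in w)
        if lam == 0:
            return 0.0
        v = [x / lam for x in w]   # max-norm normalization
    return lam
```

Tracking such a statistic over sliding windows yields a single scalar curve that is far less noise-sensitive than any individual low-dimensional measurement.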
Funding: National Natural Science Foundation of China Youth Science Foundation Project, No. 41701170; National Natural Science Foundation of China, No. 41661025 and No. 42071216; Fundamental Research Funds for the Central Universities, No. 18LZUJBWZY068.
Abstract: In 2007, China surpassed the USA to become the largest carbon emitter in the world. China has promised a 60%–65% reduction in carbon emissions per unit GDP by 2030, compared to the baseline of 2005. Therefore, it is important to obtain accurate dynamic information on the spatial and temporal patterns of carbon emissions and carbon footprints to support the formulation of effective national carbon emission reduction policies. This study builds a carbon emission panel data model that simulates carbon emissions in China from 2000–2013 using nighttime lighting data and carbon emission statistics. By applying the Exploratory Spatial-Temporal Data Analysis (ESTDA) framework, this study analyzed the spatial patterns and dynamic spatial-temporal interactions of carbon footprints from 2001–2013. The improved Tapio decoupling model was adopted to investigate the levels of coupling or decoupling between carbon emission load and economic growth in 336 prefecture-level units. The results show that, firstly, the model achieved high accuracy in simulating carbon emissions. Secondly, the total carbon footprints and carbon deficits across China increased at average annual growth rates of 4.82% and 5.72%, respectively. The overall carbon footprints and carbon deficits were larger in the North than in the South, and there were highly significant spatial autocorrelation features in the carbon footprints of prefecture-level units. Thirdly, the relative lengths of the Local Indicators of Spatial Association (LISA) time paths were longer in the North than in the South, and they increased from the coastal regions toward the central and western regions. Lastly, the overall decoupling index was mainly of the weak decoupling type, but the number of cities with this weak decoupling continued to decrease. The unsustainable development trend of China's economic growth and carbon emission load will continue for some time.
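The Tapio decoupling analysis above rests on the elasticity e = (ΔC/C)/(ΔG/G) between emission growth and GDP growth. A sketch of the standard eight-state Tapio classification (the paper uses an improved variant whose details are not given here; thresholds 0.8 and 1.2 are the conventional ones):

```python
def tapio_state(dC, C, dG, G):
    """Classify the carbon-emission/GDP relationship using the
    classic Tapio decoupling elasticity e = (dC/C) / (dG/G)."""
    rc, rg = dC / C, dG / G     # growth rates of emissions and GDP
    e = rc / rg
    if rg > 0:                  # economy growing
        if rc < 0:
            return "strong decoupling"
        if e < 0.8:
            return "weak decoupling"
        if e <= 1.2:
            return "expansive coupling"
        return "expansive negative decoupling"
    else:                       # economy shrinking
        if rc > 0:
            return "strong negative decoupling"
        if e < 0.8:
            return "weak negative decoupling"
        if e <= 1.2:
            return "recessive coupling"
        return "recessive decoupling"
```

For example, emissions growing 2% while GDP grows 10% gives e = 0.2, the "weak decoupling" state the abstract reports as dominant.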
Abstract: The prosperity of deep learning has revolutionized many machine learning tasks, such as image recognition and natural language processing. With the widespread use of autonomous sensor networks, the Internet of Things, and crowdsourcing to monitor real-world processes, the volume, diversity, and veracity of spatial-temporal data are expanding rapidly. However, traditional methods have limitations in coping with spatial-temporal dependencies: they either incorporate too much data from weakly connected locations or ignore the relationships between interrelated but geographically separated regions. In this paper, a novel deep learning model (termed RF-GWN) is proposed by combining Random Forest (RF) and Graph WaveNet (GWN). In RF-GWN, a new adaptive weight matrix is formulated by combining the Variable Importance Measure (VIM) of RF with the long-time-series feature extraction ability of GWN, in order to capture potential spatial dependencies and extract long-term dependencies from the input data. Furthermore, two experiments are conducted on two real-world datasets with the purpose of predicting traffic flow and groundwater level. Baseline models are implemented with the Diffusion Convolutional Recurrent Neural Network (DCRNN), Spatial-Temporal GCN (ST-GCN), and GWN to verify the effectiveness of RF-GWN. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are selected as performance criteria. The results show that the proposed model can better capture spatial-temporal relationships: the prediction performance on the METR-LA dataset is slightly improved, and the metrics for the prediction task on the PEMS-BAY dataset are significantly improved. These improvements extend to the groundwater dataset, where prediction accuracy is effectively improved. Thus, the applicability and effectiveness of the proposed RF-GWN model in both traffic flow and groundwater level prediction are demonstrated.
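The three performance criteria named above (RMSE, MAE, MAPE) have standard definitions, sketched here for reference:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error; assumes no zero targets."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)
```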
Funding: supported by the National Basic Research Program of China (2012CB417001) and the National Natural Science Foundation of China (41271125).
Abstract: Lake surface water temperature (SWT) is an important indicator of lake state relative to its water chemistry and aquatic ecosystem, in addition to being an important regional climate indicator. However, few studies are available on the spatial-temporal changes of lake SWT in the Qinghai-Tibet Plateau, including Qinghai Lake. Our objective is to study the spatial-temporal changes in the SWT of Qinghai Lake from 2001 to 2010, using Moderate-resolution Imaging Spectroradiometer (MODIS) data. For each pixel, we calculated the temporal SWT variations and long-term trends, compared the spatial patterns of annual average SWT in different years, and mapped and analyzed the seasonal cycles of the spatial patterns of SWT. The results revealed that the differences between the average daily SWT and air temperature during the temperature-decreasing phase were relatively larger than those during the temperature-increasing phase. The increasing rate of the annual average SWT during the study period was about 0.01℃/a, against an increasing rate of about 0.05℃/a in the annual average air temperature. The annual average SWT from 2001 to 2010 showed similar spatial patterns, while the SWT spatial changes from January to December demonstrated an interesting seasonal reversal pattern: the high-temperature area shifted stepwise from the south to the north and back to the south over the course of the year, whereas the low-temperature area traced a reversed annual cycle. The spatial-temporal patterns of SWT were shaped by the topography of the lake basin and the distribution of drainages.
Funding: supported by the National Natural Science Foundation of China (Nos. U19A2044, 42105132, 42030609, 41975037, and 42105133), the National Key Research and Development Program of China (No. 2022YFC3703502), the Plan for Anhui Major Provincial Science & Technology Project (No. 202203a07020003), and the Hefei Ecological Environment Bureau Project (No. 2020BFFFD01804).
Abstract: As a significant city in the Yangtze River Delta region, Hefei has experienced rapid changes in its air pollution sources due to high-speed economic development and urban expansion. However, there has been limited research in recent years on the spatial-temporal distribution and emission of its atmospheric pollutants. To address this, this study conducted mobile observations of urban roads using a Mobile-DOAS instrument from June 2021 to May 2022. The monitoring results show good consistency with TROPOMI satellite data and ground monitoring station data. Temporally, there were pronounced seasonal variations in air pollutants. Spatially, high concentrations of HCHO and NO2 were closely associated with traffic congestion on roadways, while heightened SO2 levels were attributed to winter heating and industrial emissions. The study also revealed that with the implementation of road policies, the average vehicle speed increased by 95.4%, while the NO concentration decreased by 54.4%. In the estimation of urban NOx emission flux, the emissions calculated via mobile measurements exhibited more distinct seasonal patterns than the inventory data, with the highest emission rate of 349 g/s in winter and the lowest of 142 g/s in summer. Spatially, the significant difference in emissions between the inner and outer ring roads suggests that the city's primary NOx emission sources lie in the area between these two rings. This study offers data support for formulating the next phase of air pollution control measures in urban areas.
Funding: supported by the Special Scientific Research Fund of the Public Welfare Profession of the Ministry of Land and Resources of the People's Republic of China (No. 201011057).
Abstract: 1 Introduction. Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
基金supported by the National Natural Science Foundation of China (41871307)the Shaanxi Coordinate Innovation Plan Project of Science and Technology (2016KTCL03-17)。
Abstract: The spatial pattern of meteorological factors cannot be accurately simulated using observations from meteorological stations (OMS) that are distributed sparsely in complex terrain. It is expected that the spatial-temporal characteristics of drought in regions with complex terrain can be better represented by meteorological data with high spatial-temporal resolution and accuracy. In this study, the Standardized Precipitation Evapotranspiration Index (SPEI), calculated with meteorological factors extracted from ITPCAS (the China Meteorological Forcing Dataset produced by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences), was applied to identify the spatial-temporal characteristics of drought in Shaanxi Province of China during the period 1979–2016. Drought areas detected by SPEI calculated with ITPCAS data (SPEI-ITPCAS) on the seasonal scale were validated against historical drought records from the Chinese Meteorological Disaster Canon-Shaanxi and compared with drought areas detected by SPEI calculated with OMS data (SPEI-OMS). Drought intensity, trend, and the temporal ranges for mutations of SPEI-ITPCAS were analyzed using the cumulative drought intensity (CDI) index and the Mann-Kendall test. The results indicated that drought areas detected from SPEI-ITPCAS were closer to the historical drought records than those detected from SPEI-OMS. Severe and exceptional drought events with SPEI-ITPCAS lower than –1.0 occurred most frequently in summer, followed by spring. There was a general drying trend in spring and summer across Shaanxi Province and a significant wetting trend in autumn and winter in northern Shaanxi Province. On seasonal and annual scales, the regional and temporal ranges for mutations of SPEI-ITPCAS differed, and most mutations occurred before 1990 in most regions of Shaanxi Province. The results reflect the response of different regions of Shaanxi Province to climate change, which will help in managing regional water resources.
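The Mann-Kendall test used above for trend detection reduces to a sign statistic S over all pairs of observations and a standard normal score Z; a minimal sketch using the no-ties variance formula (tie corrections omitted for brevity):

```python
def mann_kendall(series):
    """Mann-Kendall trend test: returns (S, Z).
    S sums the signs of all pairwise differences; Z uses the
    continuity-corrected normal approximation without tie terms."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = series[j] - series[i]
            s += (d > 0) - (d < 0)
    var = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / var ** 0.5
    elif s < 0:
        z = (s + 1) / var ** 0.5
    else:
        z = 0.0
    return s, z
```

|Z| > 1.96 corresponds to a significant monotonic trend at the 5% level, which is how "drying" and "wetting" trends of an SPEI series are usually flagged.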
Funding: supported by the National Office for Philosophy and Social Sciences (grant reference 22&ZD067).
Abstract: In the current situation of decelerating economic expansion, examining the digital economy (DE) as a novel economic model is beneficial for the local economy's sustainable and high-quality development (HQD). We analyzed panel data from the Yellow River (YR) region from 2013 to 2021 and discovered notable spatial variances in the composite index and coupling coordination of the two systems. Specifically, the downstream region exhibited the highest coupling coordination, while the upstream region had the lowest. We identified that favorable factors such as economic development, innovation, industrial upgrading, and government intervention can bolster the coupling. Our findings provide a valuable framework for promoting DE and HQD in the YR region.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 62472149, 62376089, 62202147) and the Hubei Provincial Science and Technology Plan Project (2023BCB04100).
Abstract: Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden periodicity of traffic flow, the data is divided into three kinds of periods: hourly, daily, and weekly. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions, while local spatial-temporal dependence is captured by a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences, so the multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
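One common realization of the position embedding mechanism mentioned above is the standard sinusoidal Transformer encoding (the paper's exact variant is not specified, so this is an assumption):

```python
import math

def position_embedding(seq_len, d_model):
    """Sinusoidal position encoding: even dimensions carry sine,
    odd dimensions cosine, with geometrically spaced wavelengths,
    so self-attention can recover the order of time steps."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

The resulting seq_len × d_model table is simply added to the traffic sequence embeddings before attention is applied.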
Funding: supported by the Beijing Natural Science Foundation (Certificate Number: L234025).
Abstract: Spatial-temporal traffic prediction technology is crucial for network planning, resource allocation optimization, and user experience improvement. With the development of virtual network operators, multi-operator collaborations, and edge computing, spatial-temporal traffic data has taken on a distributed nature. Consequently, non-centralized spatial-temporal traffic prediction solutions have emerged as a recent research focus. Currently, the majority of research adopts federated learning methods to train traffic prediction models distributed across base stations. This approach reduces the additional burden on communication systems, but it has a drawback: it cannot handle irregular traffic data. Due to unstable wireless network environments, device failures, insufficient storage resources, and similar causes, missing data inevitably occurs during traffic data collection, which makes distributed traffic data irregular. Yet commonly used traffic prediction models such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) typically assume that the data is complete and regular. To address the challenge of handling irregular traffic data, this paper transforms irregular traffic prediction into the problems of estimating latent variables and generating future traffic. To solve these problems, this paper introduces split learning to design a structured distributed learning framework. The framework comprises a Global-level Spatial structure mining Model (GSM) and several Node-level Generative Models (NGMs); the NGMs are Seq2Seq models deployed on the base stations, and the GSM is a graph neural network model deployed on the cloud or central controller. Firstly, the time embedding layer in each NGM establishes the mapping between irregular traffic data and regular latent temporal feature variables. Secondly, the GSM collects statistical feature parameters of the latent temporal feature variables from the various nodes and performs graph embedding for the spatial-temporal traffic data. Finally, each NGM generates future traffic based on the latent temporal and spatial feature variables. The introduction of the time attention mechanism enhances the framework's ability to handle irregular traffic data, and the graph attention network introduces spatially correlated base-station traffic features into local traffic prediction, compensating for missing information in local irregular traffic data. The proposed framework effectively addresses the distributed prediction of irregular traffic data. In tests on real-world datasets, the proposed framework improves traffic prediction accuracy by 35% compared with other commonly used distributed traffic prediction methods.
Funding: supported in part by the National Key Research and Development Program of China (2022YFB3304900), in part by the National Natural Science Foundation of China (62394340 and 62073340), and in part by the Science and Technology Innovation Program of Hunan Province (2022JJ10083).
Abstract: With the intelligent transformation of process manufacturing, accurate and comprehensive perception information is fundamental for the application of artificial intelligence methods. In zinc smelting, the fluidized bed roaster is a key piece of large-scale equipment and plays a critical role in the manufacturing process; its internal temperature field directly determines the quality of zinc calcine and other related products. However, due to its vast spatial dimensions, the limited observation methods, and the complex multiphase, multifield coupled reaction atmosphere inside it, accurately and promptly perceiving its temperature field remains a significant challenge. To address these challenges, a spatial-temporal reduced-order model (STROM) is proposed, which realizes fast and accurate temperature field perception based on sparse observation data. Specifically, to address the difficulty of matching the initial physical field with the sparse observation data, an initial field construction based on data assimilation (IFCDA) method is proposed to ensure that the initial conditions of the model match the actual operating state, which provides a basis for constructing a high-precision computational fluid dynamics (CFD) model. Then, to address the high simulation cost of high-precision CFD models under full working conditions, a high-uniformity (HU) orthogonal test design (OTD) method with the centered L2 deviation is proposed to ensure high information coverage of the temperature field dataset under typical working conditions across multiple factors and levels of the component, feed, and blast parameters. Finally, to address the difficulty of real-time and accurate temperature field prediction, considering the spatial correlation between the observed temperatures and the temperature field, as well as the dynamic correlation of the observed temperatures in the time dimension, a spatial-temporal predictive model (STPM) is established, which realizes rapid prediction of the temperature field from sparse observation data. To verify the accuracy and validity of the proposed method, CFD model validation and reduced-order model prediction experiments are designed. The results show that the proposed method realizes high-precision and fast prediction of the roaster temperature field under different working conditions from sparse observation data. Compared with the CFD model, the prediction root-mean-square error (RMSE) of STROM is less than 0.038, and the computational efficiency is improved by a factor of 3.4184 × 10^4. In particular, STROM also predicts unmodeled conditions well, with a prediction RMSE of less than 0.1089.
Abstract: With the rapid development of society, water contamination events cause great losses when accidents happen in a water supply system. A large number of water quality sensor nodes are deployed in the water supply network to detect and warn of contamination events and prevent pollution from spreading. If all sensor nodes detect and transmit water quality data when contamination occurs, the communication overhead becomes heavy. To reduce this overhead, the Connected Dominating Set construction algorithm Rule K is adopted to select a subset of sensor nodes. Moreover, to improve detection accuracy, a Spatial-Temporal Abnormal Event Detection Algorithm with Multivariate water quality data (M-STAEDA) is proposed. In M-STAEDA, first, Back Propagation neural network models are adopted to analyze the multiple water quality parameters and identify possible outliers. Then, M-STAEDA determines potential contamination events through Bayesian sequential analysis, which estimates the probability of a contamination event. Third, it makes a decision based on the fusion of the multiple event probabilities. Finally, a spatial correlation model is applied to determine the spatial-temporal contamination event in the water supply network. The experimental results indicate that, compared with single-variate temporal event detection, the proposed M-STAEDA algorithm achieves higher accuracy with the BP neural network model and improves the detection rate and the false alarm rate.
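The Bayesian sequential analysis step above can be sketched as a repeated Bayes update of the contamination probability driven by the stream of outlier flags from the neural network stage; the likelihood rates below are illustrative, not the paper's:

```python
def bayes_sequential(prior, p_out_event, p_out_normal, flags):
    """Sequentially update P(contamination) from outlier flags.

    p_out_event  : P(outlier | contamination)   -- assumed rate
    p_out_normal : P(outlier | no contamination) -- assumed rate
    flags        : iterable of booleans, one per sampling step
    """
    p = prior
    for flag in flags:
        if flag:   # an outlier was observed at this step
            num = p_out_event * p
            den = num + p_out_normal * (1 - p)
        else:      # no outlier at this step
            num = (1 - p_out_event) * p
            den = num + (1 - p_out_normal) * (1 - p)
        p = num / den
    return p
```

A run of consecutive outliers drives the posterior toward 1, at which point an event alarm can be raised; isolated outliers barely move it, which is what makes the sequential scheme robust to noise.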
Abstract: Based on night light data, urban area data, and economic data of the Wuhan Urban Agglomeration from 2009 to 2015, we use the spatial correlation dimension, spatial autocorrelation analysis, and the weighted standard deviation ellipse to identify the general and dynamic evolution characteristics of the urban spatial pattern and the economic disparity pattern. The results show that between 2009 and 2013, the Wuhan Urban Agglomeration expanded gradually from northwest to southeast and presented the dynamic evolution feature of growth "along the river and the road". The spatial structure is clear, forming a "core-periphery" pattern. The development of the Wuhan Urban Agglomeration is markedly unbalanced across economic-geographic space, presenting a development tendency of "one prominent center, stronger in the west and weaker in the east". The contrast within the Wuhan Urban Agglomeration is gradually decreasing. Wuhan city and its surrounding areas, as well as the cities along the Yangtze River, show stronger economic growth. However, the relative development rate of the Wuhan city area is still far higher than that of the other cities and counties.
Funding: R&D Program of the Beijing Municipal Education Commission (No. KM202211417015), Academic Research Projects of Beijing Union University (No. ZK10202209 and No. ZKZD202305), and the team-building subsidy of the "Xuezhi Professorship" of the College of Applied Arts and Science of Beijing Union University (No. BUUCAS-XZJSTD-2024005).
Abstract: As pivotal green spaces, urban parks play an important role in urban residents' daily activities. They can not only bring people physical health but are also likely to elicit positive sentiment in those who visit them. Recently, social media big data has provided new data sources for sentiment analysis. However, there has been limited research exploring the connection between urban parks and individuals' sentiments. Therefore, this study first employed a pre-trained language model (BERT, Bidirectional Encoder Representations from Transformers) to calculate sentiment scores based on social media data. Second, it analyzed the relationship between urban parks and individuals' sentiment from both spatial and temporal perspectives. Finally, using a structural equation model (SEM), we identified 13 factors and analyzed their degree of influence. The research findings are as follows: ① it is confirmed that individuals generally experience positive sentiment, with high sentiment scores in the majority of urban parks; ② the type of urban park influences sentiment scores, with higher scores observed in eco-parks, comprehensive parks, and historical parks; ③ the level of an urban park has little impact on sentiment scores, with distinctions observed mainly at level 3 and level 4; ④ compared with factors internal to the parks, the external infrastructure around them exerts a more significant impact on sentiment scores; for instance, the number of bus and subway stations around urban parks is associated with higher sentiment scores, while scenic spots and restaurants show the inverse result. This study provides a novel method to quantify the services of various urban parks, which can serve as inspiration for similar studies in other cities and countries, enhancing their park planning and management strategies.
Abstract: Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where missing data often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms (Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship) under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
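The three-term composite loss described above can be sketched for a single feature vector; the weights and the exact forms of the regularizer and variance penalty are illustrative assumptions, since the abstract does not specify them:

```python
def composite_loss(x_true, x_hat, mask, alpha=1.0, beta=0.1, gamma=0.01):
    """Composite imputation loss in the spirit of the abstract:
    (i) masked MSE on entries held out as missing,
    (ii) an L2 penalty on the reconstruction (noise-aware term),
    (iii) a variance penalty keeping output spread near the data's.

    mask[i] == 1 marks an entry treated as missing during training."""
    n = len(x_true)
    m = sum(mask) or 1
    masked_mse = sum(mk * (t - h) ** 2
                     for t, h, mk in zip(x_true, x_hat, mask)) / m
    l2 = sum(h * h for h in x_hat) / n            # term (ii)
    mu_h = sum(x_hat) / n
    mu_t = sum(x_true) / n
    var_h = sum((h - mu_h) ** 2 for h in x_hat) / n
    var_t = sum((t - mu_t) ** 2 for t in x_true) / n
    var_pen = (var_h - var_t) ** 2                # term (iii)
    return alpha * masked_mse + beta * l2 + gamma * var_pen
```

In a training loop this scalar would be averaged over a batch and minimized by backpropagation through the autoencoder.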
Funding: funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Critical dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
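The SMOTE step described in this abstract, synthetically augmenting underrepresented attack types, can be sketched as follows. This is a minimal NumPy illustration of the interpolation idea only, not the study's actual pipeline, and the function name and parameters are ours:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Generate synthetic minority-class samples by interpolating between
    each sample and one of its k nearest minority neighbours
    (a simplified sketch of SMOTE's core idea)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                # pick a random minority sample
        j = nn[i, rng.integers(k)]         # and one of its neighbours
        lam = rng.random()                 # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Because each synthetic point is a convex combination of two real minority samples, the augmented data stays inside the minority class's feature range; production systems would use a maintained implementation such as imbalanced-learn's `SMOTE`.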
Funding: funded by the University of Transport and Communications (UTC) under grant number T2025-CN-004.
Abstract: Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used in multi-stego images provides good image quality but often results in low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared to existing triple-stego RDH approaches, advancing the field of reversible steganography.
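The block-level mechanism behind PVO embedding can be illustrated with the classic single-bit variant: the maximum pixel is predicted by the second-largest, and the prediction error is expanded to carry a bit. This sketch shows only that reversible core, not the proposed 14-bit, triple-stego scheme; all function names are ours:

```python
def pvo_embed_max(block, bit):
    """Embed one bit into a block's maximum pixel via classic PVO
    prediction-error expansion (simplified, single-image sketch)."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]        # prediction error
    out = list(block)
    if e == 1:                             # embeddable: expand by the bit
        out[i_max] += bit
    elif e > 1:                            # shift by 1 to keep reversibility
        out[i_max] += 1
    return out                             # e == 0: block left untouched

def pvo_extract_max(block):
    """Recover the embedded bit and the original block (inverse step)."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 0:
        return None, out                   # block was not embeddable
    if e == 1:
        return 0, out                      # bit 0: value unchanged
    if e == 2:
        out[i_max] -= 1
        return 1, out                      # bit 1: undo the expansion
    out[i_max] -= 1                        # e > 2: shifted block, no bit
    return None, out
```

For example, `pvo_embed_max([10, 12, 11, 13], 1)` raises the maximum to 14; extraction detects error 2, returns bit 1, and restores the original block exactly, which is the reversibility property the abstract relies on.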
Funding: funded by the Science and Technology Project of State Grid Corporation of China (5108-202355437A-3-2-ZN).
Abstract: The increasing complexity of China's electricity market creates substantial challenges for settlement automation, data consistency, and operational scalability. Existing provincial settlement systems are fragmented, lack a unified data structure, and depend heavily on manual intervention to process high-frequency and retroactive transactions. To address these limitations, a graph-based unified settlement framework is proposed to enhance automation, flexibility, and adaptability in electricity market settlements. A flexible attribute-graph model is employed to represent heterogeneous multi-market data, enabling standardized integration, rapid querying, and seamless adaptation to evolving business requirements. An extensible operator library is designed to support configurable settlement rules, and a suite of modular tools (dataset generation, formula configuration, billing templates, and task scheduling) facilitates end-to-end automated settlement processing. A robust refund-clearing mechanism is further incorporated, utilizing sandbox execution, data-version snapshots, dynamic lineage tracing, and real-time change-capture technologies to enable rapid and accurate recalculations under dynamic policy and data revisions. Case studies based on real-world data from regional Chinese markets validate the effectiveness of the proposed approach, demonstrating marked improvements in computational efficiency, system robustness, and automation. Moreover, enhanced settlement accuracy and high temporal granularity improve price-signal fidelity, promote cost-reflective tariffs, and incentivize energy-efficient and demand-responsive behavior among market participants. The method not only supports equitable and transparent market operations but also provides a generalizable, scalable foundation for modern electricity settlement platforms in increasingly complex and dynamic market environments.
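The combination of an attribute-graph data model with an extensible operator library can be sketched as follows. This is a toy illustration of the design pattern only; the node attributes, operator names, and rule set are all invented for the example and do not come from the paper:

```python
# Market participants are graph nodes carrying attribute dicts; settlement
# rules are registered in an extensible operator library keyed by name.
OPERATORS = {}

def operator(name):
    """Decorator registering a settlement operator under a rule name."""
    def wrap(fn):
        OPERATORS[name] = fn
        return fn
    return wrap

@operator("energy_charge")
def energy_charge(node):
    # Charge for delivered energy (illustrative attribute names).
    return node["volume_mwh"] * node["price_per_mwh"]

@operator("deviation_penalty")
def deviation_penalty(node):
    # Penalty proportional to deviation from the contracted volume.
    return abs(node["volume_mwh"] - node["contracted_mwh"]) * node["penalty_rate"]

def settle(graph, rules):
    """Apply the configured operators to every participant node."""
    return {node_id: sum(OPERATORS[r](attrs) for r in rules)
            for node_id, attrs in graph.items()}
```

New settlement rules can then be added without touching `settle` itself, which is the kind of configurability the operator-library design aims at.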
Funding: supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of security monitoring technology based network behavior against encrypted cyber threats in ICT convergence environment).
Abstract: With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted devices. Given the limited visibility into payloads in this context, and especially in light of these heterogeneous devices, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed from various perspectives using two ensemble models and three Deep Learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score of encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, the recall in the UNSW-NB15 (Encrypted) dataset improved by up to 23.0%, and in the CICIoT-2023 (Encrypted) dataset by 20.26%, showing a similar level of improvement. Notably, in CICIoT-2023, the F1-score and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
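One of the simplest of the sampling techniques this study compares, random oversampling to class parity, can be sketched as follows. The abstract does not specify which eight techniques were used, so this is an illustrative stand-in rather than the study's method; the function name and signature are ours:

```python
import numpy as np

def rebalance(X, y, rng=None):
    """Random oversampling: resample every class with replacement up to
    the size of the largest class, so all classes end up at parity."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xs, ys = [], []
    for c in classes:
        idx = np.where(y == c)[0]
        take = rng.choice(idx, size=n_max, replace=True)  # duplicate minority rows
        Xs.append(X[take])
        ys.append(y[take])
    return np.concatenate(Xs), np.concatenate(ys)
```

Rebalancing like this typically raises recall on the rare attack classes, at the cost of duplicated training rows; the study's 23.0% and 20.26% recall gains on encrypted traffic reflect this trade-off across its eight techniques.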
Funding: funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2024-02-01264).
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES, combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, in which training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
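The text-based and vector/embedding-based similarity measures combined in this hybrid approach can be illustrated with two standard representatives: Jaccard overlap on tokens and cosine similarity on feature vectors. This is a generic sketch of those measure families, not the paper's exact feature set:

```python
import math

def jaccard(a, b):
    """Text-based similarity: token-set overlap between two texts
    (e.g., a student essay and a reference answer)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(u, v):
    """Vector/embedding-based similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Each measure becomes one column of the feature matrix, and a regressor such as the RF model then maps the combined similarity features to an essay score.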