Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censoring, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
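To make the composite objective above concrete, here is a minimal sketch in PyTorch of one plausible instantiation: a masked MSE over artificially masked entries, a noise-consistency term, and a variance penalty. The exact term forms and the weights lam_noise and lam_var are assumptions, not the authors' implementation.

```python
import torch

def composite_imputation_loss(x_hat, x_true, miss_mask,
                              x_noisy_hat=None, lam_noise=0.1, lam_var=0.01):
    """Hypothetical composite loss for autoencoder imputation.
    miss_mask marks entries artificially masked during training,
    so ground truth is known there."""
    # (i) masked MSE restricted to the originally missing entries
    mse_missing = ((x_hat - x_true) ** 2 * miss_mask).sum() / miss_mask.sum().clamp(min=1)
    # (ii) noise-aware term: reconstructions from a corrupted copy of the
    # input should agree with reconstructions from the clean input
    noise_term = ((x_hat - x_noisy_hat) ** 2).mean() if x_noisy_hat is not None else 0.0
    # (iii) variance penalty: keep per-feature reconstruction variance
    # close to that of the reference data
    var_term = ((x_hat.var(dim=0) - x_true.var(dim=0)) ** 2).mean()
    return mse_missing + lam_noise * noise_term + lam_var * var_term
```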
High spatiotemporal resolution infrared radiances from FY-4A/AGRI (Advanced Geostationary Radiation Imager) can provide crucial information for rapidly developing severe convective weather. This study established a symmetric observation error model that differentiates between land and sea for FY-4A/AGRI all-sky assimilation, developed an all-sky assimilation scheme for FY-4A/AGRI based on hydrometeor control variables, and investigated the impacts of all-sky FY-4A/AGRI water vapor channels at different altitudes and rapid-update assimilation at different frequencies on the assimilation and forecasting of a severe convective weather event. Results show that simultaneous assimilation of two water vapor channels can enhance precipitation forecasts compared to single-channel assimilation, which is mainly attributable to a more accurate analysis of water vapor and hydrometeor information. Experiments with different assimilation frequencies demonstrate that hourly assimilation, compared to other frequencies, incorporates the high-frequency information from AGRI while reducing the impact of spurious oscillations caused by excessively high-frequency assimilation. Hourly assimilation also reduces the inconsistency among thermal, dynamical, and water vapor conditions caused by excessively fast or slow assimilation, thus improving forecast accuracy relative to the other frequencies tested.
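The abstract does not spell out the symmetric observation error model, so the sketch below follows the common all-sky formulation (in the spirit of Geer and Bauer's widely used symmetric error model), where the assigned error grows piecewise-linearly with a symmetric cloud predictor. The parameter values are placeholders; a land/sea-differentiated model would fit one parameter set per surface type.

```python
import numpy as np

def symmetric_obs_error(c_obs, c_fg, err_clear=2.0, err_cloudy=12.0,
                        c_lo=0.05, c_hi=0.45):
    """Piecewise-linear all-sky observation error driven by the symmetric
    cloud predictor (mean of observed and first-guess cloud signals).
    All parameter values here are hypothetical placeholders."""
    c_sym = 0.5 * (np.asarray(c_obs) + np.asarray(c_fg))   # symmetric predictor
    frac = np.clip((c_sym - c_lo) / (c_hi - c_lo), 0.0, 1.0)
    return err_clear + frac * (err_cloudy - err_clear)
```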
Processes supported by process-aware information systems are subject to continuous and often subtle changes due to evolving operational, organizational, or regulatory factors. These changes, referred to as incremental concept drift, gradually alter the behavior or structure of processes, making their detection and localization a challenging task. Traditional process mining techniques frequently assume process stationarity and are limited in their ability to detect such drift, particularly from a control-flow perspective. The objective of this research is to develop an interpretable and robust framework capable of detecting and localizing incremental concept drift in event logs, with a specific emphasis on the structural evolution of control-flow semantics. We propose DriftXMiner, a control-flow-aware hybrid framework that combines statistical, machine learning, and process model analysis techniques. The approach comprises three key components: (1) a Cumulative Drift Scanner that tracks directional statistical deviations to detect early drift signals; (2) a Temporal Clustering and Drift-Aware Forest Ensemble (DAFE) that captures distributional and classification-level changes in process behavior; and (3) Petri net-based process model reconstruction, which enables precise localization of structural drift using transition deviation metrics and replay fitness scores. Experimental validation on the BPI Challenge 2017 event log demonstrates that DriftXMiner effectively identifies and localizes gradual and incremental process drift over time. The framework achieves a detection accuracy of 92.5%, a localization precision of 90.3%, and an F1-score of 0.91, outperforming competitive baselines such as CUSUM+Histograms and ADWIN+Alpha Miner. Visual analyses further confirm that identified drift points align with transitions in control-flow models and behavioral cluster structures. DriftXMiner offers a novel and interpretable solution for incremental concept drift detection and localization in dynamic, process-aware systems. By integrating statistical signal accumulation, temporal behavior profiling, and structural process mining, the framework enables fine-grained drift explanation and supports adaptive process intelligence in evolving environments. Its modular architecture supports extension to streaming data and real-time monitoring contexts.
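The Cumulative Drift Scanner is described as tracking directional statistical deviations; a one-sided CUSUM over a per-window process statistic (e.g., mean trace length or a control-flow metric) is the textbook mechanism for this. A minimal sketch, with slack and threshold as hypothetical tuning parameters:

```python
import numpy as np

def cusum_drift_scan(signal, target=None, slack=0.5, threshold=8.0):
    """One-sided CUSUM: accumulate positive deviations of a per-window
    process statistic from a reference level and flag threshold crossings."""
    signal = np.asarray(signal, dtype=float)
    target = signal.mean() if target is None else target
    s_pos, alarms = 0.0, []
    for i, x in enumerate(signal):
        s_pos = max(0.0, s_pos + (x - target - slack))
        if s_pos > threshold:
            alarms.append(i)   # candidate incremental-drift point
            s_pos = 0.0        # restart accumulation after an alarm
    return alarms
```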
The rapid growth of distributed data-centric applications and AI workloads increases demand for low-latency, high-throughput communication, necessitating frequent and flexible updates to network routing configurations. However, maintaining consistent forwarding states during these updates is challenging, particularly when rerouting multiple flows simultaneously. Existing approaches pay little attention to multi-flow updates, where improper update sequences across data plane nodes may create deadlock dependencies. Moreover, these methods typically involve excessive control-data plane interactions, incurring significant resource overhead and performance degradation. This paper presents P4LoF, an efficient loop-free update approach that enables the controller to reroute multiple flows through minimal interactions. P4LoF first utilizes a greedy algorithm to generate the shortest update dependency chain for each single-flow update. These chains are then dynamically merged into a dependency graph and resolved as a Shortest Common Super-sequence (SCS) problem to produce the update sequence for the multi-flow update. To address deadlock dependencies in multi-flow updates, P4LoF builds a deadlock-fix forwarding model that leverages the flexible packet processing capabilities of the programmable data plane. Experimental results show that P4LoF reduces control-data plane interactions by at least 32.6% with modest overhead, while effectively guaranteeing loop-free consistency.
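To illustrate the SCS step, here is the classic dynamic program for the shortest common supersequence of two update chains, each written as a string of node labels. P4LoF merges many chains through a dependency graph, but the two-chain case conveys the core idea:

```python
from functools import lru_cache

def shortest_common_supersequence(a, b):
    """Classic DP for the shortest common supersequence of two update
    chains, each given as a string of node labels."""
    @lru_cache(maxsize=None)
    def scs(i, j):
        if i == len(a):
            return b[j:]
        if j == len(b):
            return a[i:]
        if a[i] == b[j]:
            return a[i] + scs(i + 1, j + 1)   # shared node: emit once
        left = a[i] + scs(i + 1, j)
        right = b[j] + scs(i, j + 1)
        return left if len(left) <= len(right) else right
    return scs(0, 0)
```

For instance, shortest_common_supersequence("ABD", "ACD") returns "ABCD", a four-node sequence that preserves the node order of both chains.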
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class intrusion detection using KDD99 and related IDS benchmark datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Critical dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% on UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
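A minimal sketch of the balancing and stacking stages on synthetic data, using imblearn and xgboost (both extra dependencies); the HHO feature-selection step is omitted, and all hyperparameters are placeholders rather than the authors' settings:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Toy imbalanced data standing in for the preprocessed, HHO-selected features.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
# Balance classes by synthesizing minority-class samples.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# Stacked ensemble: XGBoost, SVM, and RF base learners with a simple meta-learner.
stack = StackingClassifier(
    estimators=[("xgb", XGBClassifier(eval_metric="logloss")),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_res, y_res)
```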
Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used in multi-stego images provides good image quality but often results in low embedding capability. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is also applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared to existing triple-stego RDH approaches, advancing the field of reversible steganography.
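For orientation, the sketch below implements classic single-bit PVO embedding on a block's maximum value (expand when the top prediction error equals 1, shift when it is larger); the proposed scheme extends this idea to multi-bit payloads on both the maximum and minimum sides across three stego images:

```python
import numpy as np

def pvo_embed_max(block, bit):
    """Classic single-bit PVO embedding on the block maximum (illustrative;
    not the paper's multi-bit triple-stego variant). `block` is a small
    integer-valued image block, `bit` is 0 or 1."""
    flat = block.astype(np.int64).flatten()
    order = np.argsort(flat, kind="stable")        # ascending pixel order
    i_max, i_2nd = order[-1], order[-2]
    e = flat[i_max] - flat[i_2nd]                  # top prediction error
    if e == 1:
        flat[i_max] += bit                         # expansion: carries the bit
    elif e > 1:
        flat[i_max] += 1                           # shift: carries no payload
    # e == 0 blocks are left untouched and skipped at extraction
    return flat.reshape(block.shape)
```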
With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted equipment. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed using two ensemble models and three Deep Learning (DL) models from various perspectives. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall in the UNSW-NB15 (Encrypted) dataset improved by up to 23.0%, and in the CICIoT-2023 (Encrypted) dataset by 20.26%, a similar level of improvement. Notably, in CICIoT-2023, the F1-score and Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
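The sampler comparison can be reproduced in outline with imblearn: resample the training split with several techniques and score each on held-out data. The synthetic data, the single classifier, and the parameters below are stand-ins for the paper's metadata features and five models:

```python
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for flow-metadata features of encrypted traffic.
X, y = make_classification(n_samples=4000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {"ros": RandomOverSampler(random_state=0),
            "smote": SMOTE(random_state=0),
            "adasyn": ADASYN(random_state=0),
            "rus": RandomUnderSampler(random_state=0)}
for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)  # rebalance training split only
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print(name,
          round(f1_score(y_te, clf.predict(X_te)), 3),
          round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```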
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES, combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, a data-efficient training approach was introduced, with training data portions increasing from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
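Two of the three similarity families can be illustrated in a few lines; the random vectors and token lists below are stand-ins for real essay embeddings and reference texts:

```python
import numpy as np

def cosine_sim(u, v):
    """Embedding-based similarity between an essay vector and a reference vector."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def jaccard_sim(a_tokens, b_tokens):
    """Text-based similarity: overlap of the two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / max(len(a | b), 1)

# Hypothetical feature row combining both families; a Random Forest regressor
# would be trained on many such rows against human-assigned scores.
essay_vec, ref_vec = np.random.rand(300), np.random.rand(300)
features = [cosine_sim(essay_vec, ref_vec),
            jaccard_sim("هذا نص المقال".split(), "هذا نص مرجعي".split())]
```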
Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. With the rapid advancements in machine learning methods, based on reliable existing data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise by using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge in the software industry. Afterward, we automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
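As a rough illustration of classifying task descriptions into zones of expertise, the snippet below uses an off-the-shelf zero-shot model from the transformers library; the paper instead fine-tunes a BERT-like model on project-tool data, and the zone labels here are hypothetical:

```python
from transformers import pipeline

# Hypothetical zones of expertise; the paper derives its own taxonomy.
zones = ["backend", "frontend", "databases", "devops", "machine learning"]

# Off-the-shelf zero-shot classifier as a stand-in for the fine-tuned model.
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = clf("Fix the N+1 query in the ORM layer of the billing service", zones)
print(result["labels"][0])  # highest-scoring zone for this task description
```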
Lateral movement represents the most covert and critical phase of Advanced Persistent Threats (APTs), and its detection still faces two primary challenges: sample scarcity and the “cold start” of new entities. To address these challenges, we propose an Uncertainty-Driven Graph Embedding-Enhanced Lateral Movement Detection framework (UGEA-LMD). First, the framework employs event-level incremental encoding on a continuous-time graph to capture fine-grained behavioral evolution, enabling newly appearing nodes to retain temporal contextual awareness even in the absence of historical interactions and thereby fundamentally mitigating the cold-start problem. Second, in the embedding space, we model the dependency structure among feature dimensions using a Gaussian copula to quantify the uncertainty distribution, and generate augmented samples with consistent structural and semantic properties through adaptive sampling, thus expanding the representation space of sparse samples and enhancing the model's generalization under sparse-sample conditions. Unlike static graph methods that cannot model temporal dependencies, or data augmentation techniques that depend on predefined structures, UGEA-LMD offers both superior temporal-dynamic modeling and structural generalization. Experimental results on the large-scale LANL log dataset demonstrate that, under the transductive setting, UGEA-LMD achieves an AUC of 0.9254; even when 10% of nodes or edges are withheld during training, UGEA-LMD significantly outperforms baseline methods on metrics such as recall and AUC, confirming its robustness and generalization capability in sparse-sample and cold-start scenarios.
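The Gaussian-copula augmentation step can be sketched as follows: estimate the dependency among embedding dimensions on the normal scale, sample correlated normals, and map them back through the empirical marginals. This is an illustrative version rather than the authors' adaptive sampler:

```python
import numpy as np
from scipy import stats

def copula_augment(Z, n_new, seed=0):
    """Gaussian-copula augmentation of embeddings Z (n x d): fit the
    dependency among dimensions on the normal scale, then draw new samples
    whose per-dimension marginals match the observed ones."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    U = (stats.rankdata(Z, axis=0) - 0.5) / n      # ranks -> uniforms in (0, 1)
    G = stats.norm.ppf(U)                           # uniforms -> standard normals
    corr = np.corrcoef(G, rowvar=False)             # copula correlation matrix
    draws = rng.multivariate_normal(np.zeros(d), corr, size=n_new)
    U_new = stats.norm.cdf(draws)
    # Map back through each dimension's empirical quantile function.
    return np.column_stack([np.quantile(Z[:, j], U_new[:, j]) for j in range(d)])
```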
Parkinson's disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. Using deep learning algorithms is believed to further enhance performance; nevertheless, it is challenging due to the small-scale and imbalanced nature of PD datasets. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) that automates the feature extraction process using the CNN and extends the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of biased classification towards the majority class (healthy candidates in our setting). An improved generative adversarial network (IGAN) was designed to generate additional training data to enhance the model's performance. In performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison is evaluated from five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves sensitivity and specificity by 4.05%–4.72% and 4.96%–5.86%, respectively, and the effectiveness of the CNN-DSVM algorithm, which improves sensitivity by 1.24%–57.4% and specificity by 1.04%–163% while reducing biased detection towards the majority class. Ablation experiments confirm the effectiveness of the individual components. Two future research directions are also suggested.
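The abstract does not give the customized kernel, so the sketch below only shows the plumbing: a callable kernel passed to an SVM, with balanced class weighting as a standard stand-in for the paper's majority-class bias reduction:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def scaled_rbf(X, Y, gamma=0.05):
    """A callable RBF kernel returning the Gram matrix between X and Y;
    the paper's customized kernel would replace this body."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Imbalanced toy data standing in for voice-feature vectors.
X, y = make_classification(n_samples=400, weights=[0.85, 0.15], random_state=0)
clf = SVC(kernel=scaled_rbf, class_weight="balanced").fit(X, y)
```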
Climate model prediction has been improved by enhancing model resolution as well as by the implementation of sophisticated physical parameterization and refinement of data assimilation systems [section 6.1 in Wang et al. (2025)]. In relation to seasonal forecasting and climate projection in the East Asian summer monsoon season, proper simulation of the seasonal migration of rain bands by models is a challenging and limiting factor [section 7.1 in Wang et al. (2025)].
BACKGROUND: Recent studies suggest that the gut microbiota may influence the progression of amyotrophic lateral sclerosis (ALS); however, the causal relationship between the two remains unclear. OBJECTIVE: To explore the causal relationship between the gut microbiota and ALS using Mendelian randomization. METHODS: GWAS data for the gut microbiota and ALS were obtained from the IEU Open GWAS database (an open database developed by the UK Medical Research Council Integrative Epidemiology Unit at the University of Bristol to provide genome-wide association study data for a wide range of diseases). With the gut microbiota as the exposure and ALS as the outcome, inverse-variance weighted, MR-Egger regression, weighted median, weighted mode, and simple mode methods were used to investigate the causal relationship between the two. Sensitivity analyses were used to test the reliability of the Mendelian randomization results, and reverse Mendelian randomization was performed to further verify the causal relationship. RESULTS AND CONCLUSION: (1) Forward Mendelian randomization indicated causal relationships between six gut microbial taxa and ALS: the genera Bilophila (β=0.206, OR=1.229), Lachnospira (β=0.288, OR=1.333), Marvinbryantia (β=0.196, OR=1.216), Ruminococcaceae UCG010 (β=0.254, OR=1.289), and Tyzzerella 3 (β=0.128, OR=1.136) may be potential risk factors for ALS, whereas Enterobacter (β=-0.203, OR=0.816) may be a protective factor. (2) Sensitivity analyses found no significant heterogeneity or horizontal pleiotropy (all P > 0.05), and reverse Mendelian randomization revealed no reverse causal relationship between the gut microbiota and ALS. (3) These findings not only provide potential biomarkers for ALS treatment but also offer a theoretical basis for developing new gut microbiota-based interventions, with implications for basic medical research in China.
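The inverse-variance weighted method used as the primary analysis has a closed form; below is a minimal sketch with first-order weights (ignoring uncertainty in the SNP-exposure estimates), which also shows how each reported (β, OR) pair relates via OR = exp(β):

```python
import numpy as np

def ivw_mr(beta_exp, se_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate: per-SNP ratio
    estimates beta_out/beta_exp, weighted by the inverse of the first-order
    variance of each ratio (se_exp is unused in this approximation)."""
    beta_exp, beta_out, se_out = map(np.asarray, (beta_exp, beta_out, se_out))
    ratio = beta_out / beta_exp
    weights = (beta_exp / se_out) ** 2          # 1 / var(ratio), first order
    beta_ivw = np.sum(weights * ratio) / np.sum(weights)
    se_ivw = np.sqrt(1.0 / np.sum(weights))
    return beta_ivw, se_ivw, np.exp(beta_ivw)   # causal beta, its SE, and OR
```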
Funding (FY-4A/AGRI all-sky assimilation study): supported by the National Key R&D Program of China (Grant No. 2022YFC3080500) and the National Natural Science Foundation of China (Grant Nos. U2142208, 42475158, and 42105149); the authors also thank the High-Performance Computing Center of Nanjing University of Information Science & Technology for supporting this work.
Funding (P4LoF study): supported by the National Key Research and Development Program of China under Grant 2022YFB2901501, and in part by the Science and Technology Innovation Leading Talents Subsidy Project of Central Plains under Grant 244200510038.
Funding (explainable intrusion detection study): funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding (reversible data hiding study): funded by the University of Transport and Communications (UTC) under grant number T2025-CN-004.
Funding (encrypted traffic detection study): supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of security monitoring technology based network behavior against encrypted cyber threats in ICT convergence environment).
Funding (Arabic AES study): funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Funding (developer expertise study): funded by the project "Romanian Hub for Artificial Intelligence - HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS No. 334906.
Funding (UGEA-LMD study): supported by the Zhongyuan University of Technology Discipline Backbone Teacher Support Program Project (No. GG202417) and the Key Research and Development Program of Henan under Grant 251111212000.
Funding (Parkinson's disease study): the work described in this paper was fully supported by a grant from Hong Kong Metropolitan University (RIF/2021/05).