Effective management of mining areas in the Luo River Basin, located in the eastern Qinling Mountains, is vital for the integrated protection and restoration needed to support the high-quality development of the Yellow River Basin. Using the 'cupball' model, this study analyzes the limiting factors and restoration characteristics across four mining areas and proposes a conceptual model for selecting appropriate restoration approaches. A second conceptual model is then introduced to address regional development needs, incorporating ecological conservation, safety protection, and people's wellbeing. The applicability of the integrated model selection framework is demonstrated through a case study on the south bank of the Qinglongjian River. The results indicate that: (1) The key limiting factors are similar across cases, but the degree of ecological degradation varies. (2) Mildly degraded areas are represented by a shallower and narrower 'cup', where natural recovery is the preferred approach, whereas moderately and severely degraded systems call for assisted regeneration and ecological reconstruction, respectively. (3) When the restoration models determined based on limiting factors and development needs are consistent, the model is directly applicable; if they differ, the option involving less artificial intervention is preferred. (4) Monitoring of the restored mining area on the Qinglongjian River's south bank confirms significant improvements in soil erosion control and vegetation coverage. This study provides a transferable methodology for balancing resource extraction with ecosystem conservation, offering practical insights for other ecologically vulnerable mining regions.
Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time and resource intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters, including Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time, was used to train and test the models. R², RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R² = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently identified CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, aligning with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. Interestingly, GB, XGB, and CatBoost maintained comparable accuracy with reduced input sets, while RF was moderately sensitive and KNN performed somewhat better owing to reduced dimensionality. The findings confirm that SHAP-based feature reduction enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The suggested hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES by combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine learning approach, selecting top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data efficiency training approach was introduced, where training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
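The two headline metrics in the abstract above, R² and accuracy within a 0.5-point tolerance, can be illustrated with a minimal sketch. The function names and the toy scores below are hypothetical and are not taken from the paper; the abstract does not state exactly how the tolerance was applied, so an absolute-error threshold is assumed here.

```python
def accuracy_within_tolerance(y_true, y_pred, tol=0.5):
    """Fraction of predictions whose absolute error is at most `tol` points (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical human scores and model predictions (illustration only).
human_scores = [3.0, 4.0, 2.5, 5.0]
model_scores = [3.4, 3.0, 2.5, 4.6]

tolerance_accuracy = accuracy_within_tolerance(human_scores, model_scores)
fit_quality = r_squared(human_scores, model_scores)
print(tolerance_accuracy, fit_quality)
```

Reporting both values is useful because R² summarizes overall fit, while the tolerance-based accuracy says how often a predicted score lands close enough to the human score to be usable.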
Federated Learning (FL) provides an effective framework for efficient processing in vehicular edge computing. However, the dynamic and uncertain communication environment, along with the performance variations of vehicular devices, affects the distribution and uploading processes of model parameters. In FL-assisted Internet of Vehicles (IoV) scenarios, challenges such as data heterogeneity, limited device resources, and unstable communication environments become increasingly prominent. These issues necessitate intelligent vehicle selection schemes to enhance training efficiency. Given this context, we propose a new scenario involving FL-assisted IoV systems under dynamic and uncertain communication conditions, and develop a dynamic interval multi-objective optimization algorithm to jointly optimize various factors including training experiments, system energy consumption, and bandwidth utilization to meet multi-criteria resource optimization requirements. For the problem at hand, we design a dynamic interval multi-objective optimization algorithm based on interval overlap detection. Simulation results demonstrate that our method outperforms other solutions in terms of accuracy, training cost, and server utilization. It effectively enhances training efficiency under wireless channel environments while rationally utilizing bandwidth resources, thus possessing significant scientific value and application potential in the field of IoV.
Earth's internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth's surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
Portfolio theory has been extensively studied and applied in finance. To determine the optimal portfolio weight under the global minimum variance strategy, it is necessary to estimate both the covariance matrix and its inverse. However, the high dimensionality and heavy-tailed nature of financial data pose significant challenges to this estimation. In this study, we propose a method to estimate the Gini covariance matrix by introducing a low-rank and sparse correlation structure, as an alternative to the traditional sample covariance matrix. Our approach employs a factor model to capture the low-rank structure, combined with thresholding rules to achieve the final estimation. We demonstrate the consistency of our estimators and validate our approach through simulation experiments and empirical portfolio analyses. Simulation results show that our method is highly applicable across a variety of distributional scenarios. Furthermore, empirical portfolio analysis indicates that our method can construct portfolios with superior performance.
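For readers unfamiliar with why the inverse covariance matrix matters here, the standard global minimum variance problem can be written out as below. The symbols are assumptions introduced for illustration: $w$ is the vector of portfolio weights, $\Sigma$ the covariance matrix of asset returns, and $\mathbf{1}$ a vector of ones. The abstract does not give the authors' exact formulation, so this is the textbook form only.

```latex
\begin{aligned}
  \min_{w}\; & w^{\top} \Sigma\, w
  \quad \text{subject to} \quad \mathbf{1}^{\top} w = 1, \\[4pt]
  w^{*} &= \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}} .
\end{aligned}
```

Because $w^{*}$ depends on $\Sigma^{-1}$, any estimation error in the covariance matrix is amplified on inversion, which is why a well-conditioned estimator matters in high dimensions.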
BACKGROUND Relieving pain is central to the early management of knee osteoarthritis, with a plethora of pharmacological agents licensed for this purpose. Intra-articular corticosteroid injections are a widely used option, albeit with variable efficacy. AIM To develop a machine learning (ML) model that predicts which patients will benefit from corticosteroid injections. METHODS Data from two prospective cohort studies [Osteoarthritis (OA) Initiative and Multicentre OA Study] were combined. The primary outcome was patient-reported pain score following corticosteroid injection, assessed using the Western Ontario and McMaster Universities OA pain scale, with significant change defined using minimally clinically important difference and meaningful within person change. An ML algorithm was developed, utilizing linear discriminant analysis, to predict symptomatic improvement and to examine the association between pain scores and patient factors by calculating the sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F2 score. RESULTS A total of 330 patients were included, with a mean age of 63.4 (SD: 8.3). The mean Western Ontario and McMaster Universities OA pain score was 5.2 (SD: 4.1), with only 25.5% of patients achieving significant improvement in pain following corticosteroid injection. The ML model generated an accuracy of 67.8% (95% confidence interval: 64.6%-70.9%), F1 score of 30.8%, and an area under the curve score of 0.60. CONCLUSION The model demonstrated feasibility to assist clinicians with decision-making in patient selection for corticosteroid injections. Further studies are required to improve the model prior to testing in clinical settings.
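The evaluation metrics named in the METHODS section all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name and the example counts are hypothetical and are not the study's data. The `beta` parameter covers both the F1 score (`beta=1`) and the F2 score (`beta=2`), since the abstract mentions both.

```python
def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. Assumes every
    denominator is non-zero (no guard is included in this sketch).
    """
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    npv = tn / (tn + fn)                    # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


# Hypothetical counts for illustration only.
f1_metrics = classification_metrics(tp=30, fp=20, tn=40, fn=10, beta=1.0)
f2_metrics = classification_metrics(tp=30, fp=20, tn=40, fn=10, beta=2.0)
print(f1_metrics)
print(f2_metrics["f_beta"])
```

With an outcome that only about a quarter of patients achieve, accuracy alone can look acceptable while the F-score stays low, which is why reporting sensitivity and the predictive values alongside accuracy is informative.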
In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments, improve therapeutic effect and safety, and is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables. Quantile regression is used to better characterize the global distribution of the response variable and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers. The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating the practicality of the method proposed in this paper.
Selecting proper descriptors (also known as feature selection, FS) is key in establishing a mechanical property prediction model for hot-rolled microalloyed steels using machine learning (ML) algorithms. Data-driven FS methods can reduce the redundancy of data features and improve the prediction accuracy of mechanical properties. Based on the collected data of hot-rolled microalloyed steels, association rules are used to mine the correlation information between the data. High-quality feature subsets are selected by the proposed FS method (FS method based on genetic algorithm embedding, GAMIC). Compared with common FS methods, GAMIC is shown on the dataset to select feature subsets more appropriately. Six different ML algorithms are trained and tested for mechanical property prediction. The results show that, with the extreme gradient boosting (XGBoost) algorithm, the root-mean-square errors of yield strength, tensile strength and elongation are 21.95 MPa, 20.85 MPa and 1.96%, the coefficients of determination (R²) are 0.969, 0.968 and 0.830, and the mean absolute errors are 16.84 MPa, 15.83 MPa and 1.48%, respectively, showing the best prediction performance. Finally, SHapley Additive exPlanation is used to further explore the influence of feature variables on mechanical properties. The proposed GAMIC feature selection method is universal and provides a basis for the development of high-precision mechanical property prediction models.
Coordinate transformation models often fail to account for nonlinear and spatially dependent distortions, leading to significant residual errors in geospatial applications. Here, we propose a residual-based neural correction (RBNC) strategy, in which a neural network learns to model only the systematic distortions left by an initial geometric transformation. By focusing solely on residual patterns, RBNC reduces model complexity and improves performance, particularly in scenarios with sparse or structured control point configurations. We evaluate the method using both simulated datasets (with varying distortion intensities and sampling strategies) and real-world image georeferencing tasks. Compared with direct neural network coordinate converters and classical transformation models, RBNC delivers more accurate and stable results under challenging conditions, while maintaining comparable performance in ideal cases. These findings demonstrate the effectiveness of residual modelling as a light-weight and robust alternative for improving coordinate transformation accuracy.
In this paper, a feature selection method for determining input parameters in antenna modeling is proposed. In antenna modeling, the input features of the artificial neural network (ANN) are geometric parameters. The selection criteria comprise the correlation and the sensitivity between each geometric parameter and the electromagnetic (EM) response. The maximal information coefficient (MIC), an exploratory data mining tool, is introduced to evaluate both linear and nonlinear correlations. The EM response range is utilized to evaluate the sensitivity. A wide response range under varying values of a parameter implies that the parameter is highly sensitive, and a narrow response range suggests that the parameter is insensitive. Only parameters that are both highly correlated and sensitive are selected as inputs to the ANN, so the sampling space of the model is greatly reduced. The modeling of a wideband and circularly polarized antenna is studied as an example to verify the effectiveness of the proposed method. The number of input parameters decreases from 8 to 4. The testing errors of |S_(11)| and axial ratio are reduced by 8.74% and 8.95%, respectively, compared with the ANN with no feature selection.
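The range-based sensitivity criterion described in the abstract above can be sketched as follows. This is only an illustration of the idea under stated assumptions: the parameter names, the response values, and the threshold are hypothetical, and the correlation criterion (MIC) that the paper combines with sensitivity is not implemented here.

```python
def response_range(responses):
    """Spread of the EM response observed while one parameter is swept."""
    return max(responses) - min(responses)


def select_sensitive(sweeps, threshold):
    """Return per-parameter response ranges and the parameters deemed sensitive.

    `sweeps` maps a parameter name to the responses recorded while only
    that parameter varies. `threshold` is an assumed cut-off; the paper's
    actual selection rule is not given in the abstract.
    """
    ranges = {name: response_range(values) for name, values in sweeps.items()}
    selected = [name for name, spread in ranges.items() if spread >= threshold]
    return ranges, selected


# Hypothetical |S11| values in dB from one-at-a-time parameter sweeps.
sweeps = {
    "patch_length": [-8.0, -15.0, -22.0],
    "feed_offset": [-14.0, -14.5, -15.0],
}
ranges, selected = select_sensitive(sweeps, threshold=3.0)
print(ranges, selected)
```

In this toy case only `patch_length` would be kept, since sweeping `feed_offset` barely moves the response; dropping such parameters is what shrinks the sampling space of the surrogate model.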
Feature selection (FS) is a pivotal pre-processing step in developing data-driven models, influencing reliability, performance and optimization. Although existing FS techniques can yield high-performance metrics for certain models, they do not invariably guarantee the extraction of the most critical or impactful features. Prior literature underscores the significance of equitable FS practices and has proposed diverse methodologies for the identification of appropriate features. However, the challenge of discerning the most relevant and influential features persists, particularly in the context of the exponential growth and heterogeneity of big data, a challenge that is increasingly salient in modern artificial intelligence (AI) applications. In response, this study introduces an innovative, automated statistical method termed Farea Similarity for Feature Selection (FSFS). The FSFS approach computes a similarity metric for each feature by benchmarking it against the record-wise mean, thereby finding feature dependencies and mitigating the influence of outliers that could potentially distort evaluation outcomes. Features are subsequently ranked according to their similarity scores, with the threshold established at the average similarity score. Notably, lower FSFS values indicate higher similarity and stronger data correlations, whereas higher values suggest lower similarity. The FSFS method is designed not only to yield reliable evaluation metrics but also to reduce data complexity without compromising model performance. Comparative analyses were performed against several established techniques, including Chi-squared (CS), Correlation Coefficient (CC), Genetic Algorithm (GA), Exhaustive Approach, Greedy Stepwise Approach, Gain Ratio, and Filtered Subset Eval, using a variety of datasets such as the Experimental Dataset, Breast Cancer Wisconsin (Original), KDD CUP 1999, NSL-KDD, UNSW-NB15, and Edge-IIoT. In the absence of the FSFS method, the highest classifier accuracies observed were 60.00%, 95.13%, 97.02%, 98.17%, 95.86%, and 94.62% for the respective datasets. When the FSFS technique was integrated with data normalization, encoding, balancing, and feature importance selection processes, accuracies improved to 100.00%, 97.81%, 98.63%, 98.94%, 94.27%, and 98.46%, respectively. The FSFS method, with a computational complexity of O(fn log n), demonstrates robust scalability and is well-suited for datasets of large size, ensuring efficient processing even when the number of features is substantial. By automatically eliminating outliers and redundant data, FSFS reduces computational overhead, resulting in faster training and improved model performance. Overall, the FSFS framework not only optimizes performance but also enhances the interpretability and explainability of data-driven models, thereby facilitating more trustworthy decision-making in AI applications.
With the development of More Electric Aircraft (MEA), the Permanent Magnet Synchronous Motor (PMSM) is widely used in the MEA field. The PMSM control system of MEA needs to consider system reliability, and the inverter switching frequency is one of the influencing factors. At the same time, the control accuracy of the system also needs to be considered, and the torque ripple and flux ripple are usually considered to be its important indexes. This paper proposes a three-stage series Model Predictive Torque and Flux Control system (three-stage series MPTFC) based on fast optimal voltage vector selection to reduce switching frequency and suppress torque ripple and flux ripple. Firstly, the analytical model of the PMSM is established and the multi-stage series control method is used to reduce the switching frequency. Secondly, selectable voltage vectors are extended from 8 to 26 and a fast selection method for optimal voltage vector sectors is designed based on the hysteresis comparator, which can suppress the torque ripple and flux ripple to improve the control accuracy. Thirdly, a three-stage series control is obtained by expanding the two-stage series control using the P-Q torque decomposition theory. Finally, a model predictive torque and flux control experimental platform is built, and the feasibility and effectiveness of this method are verified through comparison experiments.
In this paper, we establish and study a single-species logistic model with impulsive age-selective harvesting. First, we prove the ultimate boundedness of the solutions of the system. Then, we obtain conditions for the asymptotic stability of the trivial solution and the positive periodic solution. Finally, numerical simulations are presented to validate our results. Our results show that age-selective harvesting is more conducive to sustainable population survival than non-age-selective harvesting.
The rapid evolution of smart cities through IoT, cloud computing, and connected infrastructures has significantly enhanced sectors such as transportation, healthcare, energy, and public safety, but also increased exposure to sophisticated cyber threats. The diversity of devices, high data volumes, and real-time operational demands complicate security, requiring not just robust intrusion detection but also effective feature selection for relevance and scalability. Traditional Machine Learning (ML) based Intrusion Detection Systems (IDS) improve detection but often lack interpretability, limiting stakeholder trust and timely responses. Moreover, centralized feature selection in conventional IDS compromises data privacy and fails to accommodate the decentralized nature of smart city infrastructures. To address these limitations, this research introduces an Interpretable Federated Learning (FL) based Cyber Intrusion Detection model tailored for smart city applications. The proposed system leverages privacy-preserving feature selection, where each client node independently identifies top-ranked features using ML models integrated with SHAP-based explainability. These local feature subsets are then aggregated at a central server to construct a global model without compromising sensitive data. Furthermore, the global model is enhanced with Explainable AI (XAI) techniques such as SHAP and LIME, offering both global interpretability and instance-level transparency for cyber threat decisions. Experimental results demonstrate that the proposed global model achieves a high detection accuracy of 98.51%, with a significantly low miss rate of 1.49%, outperforming existing models while ensuring explainability, privacy, and scalability across smart city infrastructures.
Cloud data centres have evolved with an energy management problem due to their constant increase in size and complexity and their enormous energy consumption. Energy management is a critical and challenging issue in cloud data centres and an important research concern. In this paper, we propose a cuckoo search (CS)-based optimisation technique for virtual machine (VM) selection and a novel placement algorithm considering the different constraints. The energy consumption model and the simulation model have been implemented for the efficient selection of VMs. The proposed model, CSOA-VM, not only lessens violations of the service level agreement (SLA) but also minimises VM migrations. The proposed model also saves energy, and the performance analysis shows that the energy consumption obtained is 1.35 kWh, the SLA violation is 9.2 and the number of VM migrations is about 268. Thus, there is an improvement in energy consumption of about 1.8% and a 2.1% improvement (reduction) in SLA violations in comparison to existing techniques.
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, the complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while simultaneously accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Quantile regression (QR) has become an important tool to measure the dependence of a response variable's quantiles on a number of predictors for heterogeneous data, especially data with heavy tails and outliers. However, it is quite challenging to make statistical inference on distributed high-dimensional QR with missing data due to the distributed nature, sparsity and missingness of the data and the non-differentiable quantile loss function. To overcome the challenge, this paper develops a communication-efficient method to select variables and estimate parameters by utilizing a smooth function to approximate the non-differentiable quantile loss function and incorporating the idea of inverse probability weighting and the penalty function. The proposed approach has three merits. First, it is both computationally and communicationally efficient because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
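The ingredients named in the abstract above (a smoothed quantile loss, inverse probability weighting, and a sparsity penalty) can be written schematically as follows. This is a generic sketch under stated assumptions, not the authors' exact objective: $G$ is any smooth, nondecreasing function with $G(v)\to 0$ as $v\to-\infty$ and $G(v)\to 1$ as $v\to\infty$, $h>0$ is a bandwidth, $\delta_i$ indicates whether record $i$ is fully observed, $\pi_i$ is its (estimated) observation probability, and $p_\lambda$ is a generic penalty function.

```latex
\begin{aligned}
  \rho_\tau(u) &= u\,\bigl\{\tau - I(u < 0)\bigr\}
    && \text{(quantile check loss, non-differentiable at } u = 0\text{)} \\[4pt]
  \rho_{\tau,h}(u) &= u\,\bigl\{\tau - 1 + G(u/h)\bigr\}
    && \text{(one possible smooth approximation)} \\[4pt]
  \hat{\beta} &= \arg\min_{\beta}\;
    \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi_i}\,
    \rho_{\tau,h}\bigl(y_i - x_i^{\top}\beta\bigr)
    + \sum_{j=1}^{p} p_\lambda\bigl(|\beta_j|\bigr)
    && \text{(weighted, penalized objective)}
\end{aligned}
```

As $h \to 0$, $G(u/h)$ approaches $I(u > 0)$ and the smoothed loss recovers $\rho_\tau$. The smoothed loss is differentiable, so gradient and Hessian information is available to exchange between machines, which is consistent with the first- and second-order communication described in the abstract.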
With the advent of the sixth-generationwireless technology,the importance of using artificial intelligence of things(AIoT)devices is increasing to enhance efficiency.As massive volumes of data are collected and stored...With the advent of the sixth-generationwireless technology,the importance of using artificial intelligence of things(AIoT)devices is increasing to enhance efficiency.As massive volumes of data are collected and stored in these AIoT environments,each device becomes a potential attack target,leading to increased security vulnerabilities.Therefore,intrusion detection studies have been conducted to detect malicious network traffic.However,existing studies have been biased toward conducting in-depth analyses of individual packets to improve accuracy or applying flow-based statistical information to ensure real-time performance.Effectively responding to complex andmultifaceted threats in large-scale AIoT environments is challenging.This study proposes a hybrid multivariate network traffic(HyMNeT)feature-based intrusion detection system that applies a hybrid meta-heuristic feature selection approach to create a secure and efficient AIoT environment.The HyMNeT system selects critical features by applying mutual information maximization(MIM)and the maximal information coefficient(MIC)based on statistical features of the network traffic flow and raw packet features.This system employs the reference vector-guided evolutionary algorithm to search for optimal thresholds that maximizeMIMscores whileminimizingMIC scores.An evaluation of the selected multivariate network traffic feature set using four machine learning models on the BoT-IoT and ToN-IoT datasets resulted in average accuracy,precision,recall,and F1-score values of 0.9844,0.9897,0.9844,and 0.9859,respectively.This work demonstrates that HyMNeT performs detection consistently and stably across all models.展开更多
Funding: Supported by the Special Major Projects for Research and Development of Henan Province (Science and Technology Research Project) (No. 252102321104) and the Humanities and Social Sciences Youth Foundation, Ministry of Education (24YJCZH410).
Abstract: Effective management of mining areas in the Luo River Basin, located in the eastern Qinling Mountains, is vital for the integrated protection and restoration needed to support the high-quality development of the Yellow River Basin. Using the 'cup-ball' model, this study analyzes the limiting factors and restoration characteristics across four mining areas and proposes a conceptual model for selecting appropriate restoration approaches. A second conceptual model is then introduced to address regional development needs, incorporating ecological conservation, safety protection, and people's wellbeing. The applicability of the integrated model selection framework is demonstrated through a case study on the south bank of the Qinglongjian River. The results indicate that: (1) The key limiting factors are similar across cases, but the degree of ecological degradation varies. (2) Mildly degraded areas are represented by a shallower and narrower 'cup', where natural recovery is the preferred approach, whereas moderately and severely degraded systems call for assisted regeneration and ecological reconstruction, respectively. (3) When the restoration models determined from limiting factors and from development needs agree, the model is directly applicable; if they differ, the option involving less artificial intervention is preferred. (4) Monitoring of the restored mining area on the south bank of the Qinglongjian River confirms significant improvements in soil erosion control and vegetation coverage. This study provides a transferable methodology for balancing resource extraction with ecosystem conservation, offering practical insights for other ecologically vulnerable mining regions.
Abstract: Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time and resource intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters, including Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time, was used to train and test the models. R², RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R² = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently identified CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, in line with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. GB, XGB, and CatBoost maintained comparable accuracy with reduced input sets, while RF was moderately sensitive and KNN performed somewhat better owing to reduced dimensionality. The findings confirm that SHAP-guided feature reduction enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The suggested hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, a data-efficient training approach was introduced, in which the training data portion increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, indicating an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
Funding: Supported in part by the Central Guidance for Local Science and Technology Development Funds under Grant No. YDZJSX2025D049 and the Shanxi Provincial Graduate Innovation Research Program under Grant No. 2024KY652.
Abstract: Federated Learning (FL) provides an effective framework for efficient processing in vehicular edge computing. However, the dynamic and uncertain communication environment, along with the performance variations of vehicular devices, affects the distribution and uploading of model parameters. In FL-assisted Internet of Vehicles (IoV) scenarios, challenges such as data heterogeneity, limited device resources, and unstable communication environments become increasingly prominent. These issues necessitate intelligent vehicle selection schemes to enhance training efficiency. Given this context, we propose a new scenario involving FL-assisted IoV systems under dynamic and uncertain communication conditions, and develop a dynamic interval multi-objective optimization algorithm to jointly optimize various factors including training experiments, system energy consumption, and bandwidth utilization to meet multi-criteria resource optimization requirements. The algorithm is based on interval overlap detection. Simulation results demonstrate that our method outperforms other solutions in terms of accuracy, training cost, and server utilization. It effectively enhances training efficiency in wireless channel environments while making rational use of bandwidth resources, and thus has significant scientific value and application potential in the field of IoV.
Funding: Supported by the National Natural Science Foundation of China (42250101) and the Macao Foundation.
Abstract: Earth's internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth's surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when using constructed magnetic field models for scientific research and applications.
Funding: Supported by the Postdoctoral Fellowship Program of CPSF (GZC20241651), the National Natural Science Foundation of China (12501391), and the Natural Science Foundation of Anhui Province (2408085QA005).
Abstract: Portfolio theory has been extensively studied and applied in finance. To determine the optimal portfolio weights under the global minimum variance strategy, it is necessary to estimate both the covariance matrix and its inverse. However, the high dimensionality and heavy-tailed nature of financial data pose significant challenges to this estimation. In this study, we propose a method to estimate the Gini covariance matrix by introducing a low-rank and sparse correlation structure, as an alternative to the traditional sample covariance matrix. Our approach employs a factor model to capture the low-rank structure, combined with thresholding rules to achieve the final estimation. We demonstrate the consistency of our estimators and validate our approach through simulation experiments and empirical portfolio analyses. Simulation results show that our method is highly applicable across a variety of distributional scenarios. Furthermore, empirical portfolio analysis indicates that our method can construct portfolios with superior performance.
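To make the role of the inverse covariance matrix concrete, the following is a minimal sketch of the standard global minimum variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). It takes a covariance matrix as given and does not implement the Gini covariance estimator described in the abstract; the function names `solve` and `gmv_weights` and the example matrix are illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gmv_weights(cov):
    """Global minimum variance weights: Sigma^{-1} 1, normalized to sum to one."""
    z = solve(cov, [1.0] * len(cov))  # z = Sigma^{-1} 1, no explicit inverse needed
    total = sum(z)
    return [v / total for v in z]


# Hypothetical two-asset covariance matrix (not data from the study).
example_cov = [[0.04, 0.01],
               [0.01, 0.09]]
weights = gmv_weights(example_cov)
```

Because the weights depend on the covariance matrix only through Σ⁻¹1, any estimation error in the inverse feeds directly into the allocation, which is why the quality of the covariance estimator matters in high dimensions.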
Funding: Supported by the National Institute for Health and Care Research, No. NIHR302632.
Abstract: BACKGROUND: Relieving pain is central to the early management of knee osteoarthritis, with a plethora of pharmacological agents licensed for this purpose. Intra-articular corticosteroid injections are a widely used option, albeit with variable efficacy. AIM: To develop a machine learning (ML) model that predicts which patients will benefit from corticosteroid injections. METHODS: Data from two prospective cohort studies [Osteoarthritis (OA) Initiative and Multicentre OA Study] were combined. The primary outcome was the patient-reported pain score following corticosteroid injection, assessed using the Western Ontario and McMaster Universities OA pain scale, with significant change defined using the minimally clinically important difference and meaningful within-person change. An ML algorithm was developed, utilizing linear discriminant analysis, to predict symptomatic improvement and to examine the association between pain scores and patient factors by calculating the sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F2 score. RESULTS: A total of 330 patients were included, with a mean age of 63.4 (SD: 8.3). The mean Western Ontario and McMaster Universities OA pain score was 5.2 (SD: 4.1), with only 25.5% of patients achieving significant improvement in pain following corticosteroid injection. The ML model generated an accuracy of 67.8% (95% confidence interval: 64.6%-70.9%), an F1 score of 30.8%, and an area under the curve score of 0.60. CONCLUSION: The model demonstrated feasibility to assist clinicians with decision-making in patient selection for corticosteroid injections. Further studies are required to improve the model prior to testing in clinical settings.
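The evaluation metrics named in the methods can all be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn, beta=1.0):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # recall, true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    b2 = beta * beta
    # F-beta weights recall beta times as heavily as precision.
    f_beta = _ratio((1 + b2) * ppv * sensitivity, b2 * ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


# Hypothetical counts for illustration only.
f1_case = binary_metrics(tp=10, fp=20, tn=60, fn=10, beta=1.0)
f2_case = binary_metrics(tp=10, fp=20, tn=60, fn=10, beta=2.0)
```

With imbalanced outcomes, accuracy can stay high while the F-score remains low, so the two are best read together.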
Funding: Supported by the Natural Science Foundation of Fujian Province (2022J011177, 2024J01903) and the Key Project of Fujian Provincial Education Department (JZ230054).
Abstract: In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments and improve therapeutic effect and safety, and it is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables. Quantile regression is used to better characterize the global distribution of the response variable, and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers. The effectiveness of the proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating its practicality.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2021YFB3702404), the National Natural Science Foundation of China (Grant No. 52104370), the Reviving-Liaoning Excellence Plan (XLYC2203186), the Science and Technology Special Projects of Liaoning Province (Grant No. 2022JH25/10200001), the Postdoctoral Research Fund for Northeastern (Grant No. 20210203), the Independent Projects of Basic Scientific Research (ZZ2021005), and the CITIC Niobium Steel Development Award Fund (2022-M1824).
Abstract: Selecting proper descriptors (also known as feature selection, FS) is key to establishing a mechanical property prediction model for hot-rolled microalloyed steels using machine learning (ML) algorithms. Data-driven FS methods can reduce the redundancy of data features and improve the prediction accuracy of mechanical properties. Based on the collected data of hot-rolled microalloyed steels, association rules are used to mine the correlation information in the data. High-quality feature subsets are selected by the proposed FS method (an FS method based on genetic algorithm embedding, GAMIC). Compared with common FS methods, GAMIC is shown on the dataset to select more appropriate feature subsets. Six different ML algorithms are trained and tested for mechanical property prediction. The results show that, with the extreme gradient boosting (XGBoost) algorithm, the root-mean-square errors of yield strength, tensile strength, and elongation are 21.95 MPa, 20.85 MPa, and 1.96%, the coefficients of determination (R²) are 0.969, 0.968, and 0.830, and the mean absolute errors are 16.84 MPa, 15.83 MPa, and 1.48%, respectively, which is the best prediction performance among the algorithms. Finally, SHapley Additive exPlanation is used to further explore the influence of feature variables on mechanical properties. The proposed GAMIC feature selection method is general and provides a basis for developing high-precision mechanical property prediction models.
Funding: National Council for Scientific and Technological Development, Grant Nos. 421278/2023-4 and 309248/2025-6.
Abstract: Coordinate transformation models often fail to account for nonlinear and spatially dependent distortions, leading to significant residual errors in geospatial applications. Here, we propose a residual-based neural correction (RBNC) strategy, in which a neural network learns to model only the systematic distortions left by an initial geometric transformation. By focusing solely on residual patterns, RBNC reduces model complexity and improves performance, particularly in scenarios with sparse or structured control point configurations. We evaluate the method using both simulated datasets (with varying distortion intensities and sampling strategies) and real-world image georeferencing tasks. Compared with direct neural network coordinate converters and classical transformation models, RBNC delivers more accurate and stable results under challenging conditions, while maintaining comparable performance in ideal cases. These findings demonstrate the effectiveness of residual modelling as a lightweight and robust alternative for improving coordinate transformation accuracy.
Funding: National Natural Science Foundation of China (62161048) and Sichuan Science and Technology Program (2022NSFSC0547, 2022ZYD0109).
Abstract: In this paper, a feature selection method for determining input parameters in antenna modeling is proposed. In antenna modeling, the input features of the artificial neural network (ANN) are geometric parameters. The selection criteria comprise the correlation and the sensitivity between each geometric parameter and the electromagnetic (EM) response. The maximal information coefficient (MIC), an exploratory data mining tool, is introduced to evaluate both linear and nonlinear correlations. The EM response range is used to evaluate the sensitivity: a wide response range across varying values of a parameter implies that the parameter is highly sensitive, and a narrow response range suggests that it is insensitive. Only parameters that are both highly correlated and sensitive are selected as inputs of the ANN, so the sampling space of the model is greatly reduced. The modeling of a wideband circularly polarized antenna is studied as an example to verify the effectiveness of the proposed method. The number of input parameters decreases from 8 to 4. The testing errors of |S_11| and axial ratio are reduced by 8.74% and 8.95%, respectively, compared with the ANN without feature selection.
Abstract: Feature selection (FS) is a pivotal pre-processing step in developing data-driven models, influencing reliability, performance, and optimization. Although existing FS techniques can yield high performance metrics for certain models, they do not invariably guarantee the extraction of the most critical or impactful features. Prior literature underscores the significance of equitable FS practices and has proposed diverse methodologies for identifying appropriate features. However, the challenge of discerning the most relevant and influential features persists, particularly given the exponential growth and heterogeneity of big data, a challenge that is increasingly salient in modern artificial intelligence (AI) applications. In response, this study introduces an automated statistical method termed Farea Similarity for Feature Selection (FSFS). The FSFS approach computes a similarity metric for each feature by benchmarking it against the record-wise mean, thereby finding feature dependencies and mitigating the influence of outliers that could distort evaluation outcomes. Features are then ranked according to their similarity scores, with the threshold set at the average similarity score. Notably, lower FSFS values indicate higher similarity and stronger data correlations, whereas higher values suggest lower similarity. The FSFS method is designed not only to yield reliable evaluation metrics but also to reduce data complexity without compromising model performance. Comparative analyses were performed against several established techniques, including Chi-squared (CS), Correlation Coefficient (CC), Genetic Algorithm (GA), Exhaustive Approach, Greedy Stepwise Approach, Gain Ratio, and Filtered Subset Eval, using a variety of datasets: the Experimental Dataset, Breast Cancer Wisconsin (Original), KDD CUP 1999, NSL-KDD, UNSW-NB15, and Edge-IIoT. Without the FSFS method, the highest classifier accuracies observed were 60.00%, 95.13%, 97.02%, 98.17%, 95.86%, and 94.62% for the respective datasets. When the FSFS technique was integrated with data normalization, encoding, balancing, and feature importance selection processes, the accuracies were 100.00%, 97.81%, 98.63%, 98.94%, 94.27%, and 98.46%, respectively. The FSFS method, with a computational complexity of O(fn log n), scales well to large datasets, ensuring efficient processing even when the number of features is substantial. By automatically eliminating outliers and redundant data, FSFS reduces computational overhead, resulting in faster training and improved model performance. Overall, the FSFS framework not only optimizes performance but also enhances the interpretability and explainability of data-driven models, thereby facilitating more trustworthy decision-making in AI applications.
Funding: Co-supported by the National Natural Science Foundation of China (No. 52477063) and the National Key Research and Development Program of China (No. 2023YFF0719100).
Abstract: With the development of More Electric Aircraft (MEA), the Permanent Magnet Synchronous Motor (PMSM) is widely used in the MEA field. The PMSM control system of an MEA needs to consider system reliability, and the inverter switching frequency is one of the factors affecting it. At the same time, the control accuracy of the system also needs to be considered, and torque ripple and flux ripple are usually regarded as its important indexes. This paper proposes a three-stage series Model Predictive Torque and Flux Control system (three-stage series MPTFC) based on fast optimal voltage vector selection to reduce switching frequency and suppress torque ripple and flux ripple. Firstly, the analytical model of the PMSM is established and the multi-stage series control method is used to reduce the switching frequency. Secondly, the selectable voltage vectors are extended from 8 to 26, and a fast selection method for optimal voltage vector sectors is designed based on the hysteresis comparator, which suppresses the torque ripple and flux ripple to improve the control accuracy. Thirdly, a three-stage series control is obtained by expanding the two-stage series control using the P-Q torque decomposition theory. Finally, a model predictive torque and flux control experimental platform is built, and the feasibility and effectiveness of this method are verified through comparison experiments.
Funding: Supported by the National Natural Science Foundation of China (12261018) and the Universities Key Laboratory of Mathematical Modeling and Data Mining in Guizhou Province (2023013).
Abstract: In this paper, we establish and study a single-species logistic model with impulsive age-selective harvesting. First, we prove the ultimate boundedness of the solutions of the system. Then, we obtain conditions for the asymptotic stability of the trivial solution and the positive periodic solution. Finally, numerical simulations are presented to validate our results. Our results show that age-selective harvesting is more conducive to sustainable population survival than non-age-selective harvesting.
Abstract: The rapid evolution of smart cities through IoT, cloud computing, and connected infrastructures has significantly enhanced sectors such as transportation, healthcare, energy, and public safety, but has also increased exposure to sophisticated cyber threats. The diversity of devices, high data volumes, and real-time operational demands complicate security, requiring not just robust intrusion detection but also effective feature selection for relevance and scalability. Traditional Machine Learning (ML) based Intrusion Detection Systems (IDS) improve detection but often lack interpretability, limiting stakeholder trust and timely responses. Moreover, centralized feature selection in conventional IDS compromises data privacy and fails to accommodate the decentralized nature of smart city infrastructures. To address these limitations, this research introduces an Interpretable Federated Learning (FL) based Cyber Intrusion Detection model tailored for smart city applications. The proposed system leverages privacy-preserving feature selection, where each client node independently identifies top-ranked features using ML models integrated with SHAP-based explainability. These local feature subsets are then aggregated at a central server to construct a global model without compromising sensitive data. Furthermore, the global model is enhanced with Explainable AI (XAI) techniques such as SHAP and LIME, offering both global interpretability and instance-level transparency for cyber threat decisions. Experimental results demonstrate that the proposed global model achieves a detection accuracy of 98.51%, with a miss rate of 1.49%, outperforming existing models while ensuring explainability, privacy, and scalability across smart city infrastructures.
Abstract: Energy management has become a pressing issue in cloud data centres because of their constant growth in size, complexity, and energy consumption. It is a challenging and critical problem and an important concern for many researchers. In this paper, we propose a cuckoo search (CS)-based optimisation technique for virtual machine (VM) selection and a novel placement algorithm that considers the different constraints. The energy consumption model and the simulation model have been implemented for the efficient selection of VMs. The proposed model, CSOA-VM, not only reduces service level agreement (SLA) violations but also minimises VM migrations. The proposed model also saves energy: the performance analysis shows an energy consumption of 1.35 kWh, an SLA violation of 9.2, and about 268 VM migrations. Thus, there is an improvement in energy consumption of about 1.8% and a 2.1% improvement (reduction) in SLA violations in comparison to existing techniques.
Abstract: Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). The GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. Specifically, the proposed model achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (RS-2020-NR049579).
Abstract: High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, the complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information. They are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them simultaneously. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
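Mutual information underlies all three quantities in the objective above. The sketch below is a plug-in estimate for discrete variables, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))], computed from empirical counts. It illustrates the measure only and is not the regression formulation or optimizer proposed in the abstract; the function names and the toy data are assumptions.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_relevance(features, label):
    """Rank feature names by mutual information with one label, highest first."""
    scores = {name: mutual_information(col, label) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: feature "a" determines the label, feature "b" is independent of it.
toy_features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
toy_label = [0, 0, 1, 1]
ranking = rank_by_relevance(toy_features, toy_label)
```

The same function applied to a pair of features gives a redundancy score, and applied to a pair of labels gives an inter-label dependency score.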
Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFA1003701 and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University under Grant No. SMDAYB2023004.
Abstract: Quantile regression (QR) has become an important tool for measuring the dependence of a response variable's quantiles on a number of predictors for heterogeneous data, especially heavy-tailed data and data with outliers. However, it is quite challenging to make statistical inference on distributed high-dimensional QR with missing data due to the distributed nature, sparsity, and missingness of the data and the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method to select variables and estimate parameters by using a smooth function to approximate the non-differentiable quantile loss function and incorporating inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is both computationally and communicationally efficient because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
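The idea of replacing the non-differentiable quantile (check) loss with a smooth surrogate can be sketched as follows. The abstract does not say which smooth function is used; the logistic-type smoothing below, S_h(u) = τu + h·log(1 + exp(−u/h)), is one common choice and is an assumption here, as are the function names and the bandwidth parameter `h`.

```python
from math import exp, log, log1p


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0}); not differentiable at 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smooth_check_loss(u, tau, h):
    """Smooth surrogate tau*u + h*log(1 + exp(-u/h)), evaluated in a numerically stable way.

    This particular smoothing is an illustrative assumption, not the paper's choice.
    """
    x = u / h
    # Stable evaluation of log(1 + exp(-x)) for large |x|.
    softplus_neg = max(-x, 0.0) + log1p(exp(-abs(x)))
    return tau * u + h * softplus_neg


def smooth_check_grad(u, tau, h):
    """Derivative of the surrogate with respect to u: tau - 1 / (1 + exp(u/h))."""
    x = u / h
    if x >= 0:
        return tau - exp(-x) / (1.0 + exp(-x))
    return tau - 1.0 / (1.0 + exp(x))
```

The surrogate is never below the check loss and exceeds it by at most h·log 2, so shrinking `h` tightens the approximation while keeping first and second derivatives available for gradient- or Newton-type updates.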
Funding: Supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (RS-2023-00267476) and by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (No. P0028271).
Abstract: With the advent of sixth-generation wireless technology, the importance of using artificial intelligence of things (AIoT) devices to enhance efficiency is increasing. As massive volumes of data are collected and stored in these AIoT environments, each device becomes a potential attack target, leading to increased security vulnerabilities. Therefore, intrusion detection studies have been conducted to detect malicious network traffic. However, existing studies have been biased toward either conducting in-depth analyses of individual packets to improve accuracy or applying flow-based statistical information to ensure real-time performance. Effectively responding to complex and multifaceted threats in large-scale AIoT environments is challenging. This study proposes a hybrid multivariate network traffic (HyMNeT) feature-based intrusion detection system that applies a hybrid meta-heuristic feature selection approach to create a secure and efficient AIoT environment. The HyMNeT system selects critical features by applying mutual information maximization (MIM) and the maximal information coefficient (MIC) to statistical features of the network traffic flow and raw packet features. The system employs the reference vector-guided evolutionary algorithm to search for optimal thresholds that maximize MIM scores while minimizing MIC scores. An evaluation of the selected multivariate network traffic feature set using four machine learning models on the BoT-IoT and ToN-IoT datasets resulted in average accuracy, precision, recall, and F1-score values of 0.9844, 0.9897, 0.9844, and 0.9859, respectively. This work demonstrates that HyMNeT performs detection consistently and stably across all models.