Quantile regression (QR) has become an important tool for measuring the dependence of a response variable's quantiles on a number of predictors for heterogeneous data, especially data with heavy tails and outliers. However, statistical inference for distributed high-dimensional QR with missing data is quite challenging because of the distributed nature, sparsity, and missingness of the data and the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method to select variables and estimate parameters by using a smooth function to approximate the non-differentiable quantile loss function and incorporating the ideas of inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is both computationally and communicationally efficient because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
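As a rough illustration of the smoothing and weighting ideas in this abstract, the sketch below compares the check loss with one common smooth surrogate, tau*u + alpha*log(1 + exp(-u/alpha)), and weights observed cases by the inverse of their response probability. The surrogate, the bandwidth name `alpha`, and all function names are assumptions made for illustration; the abstract does not say which smooth function or penalty the paper uses, and the penalty term is omitted here.

```python
import math

def check_loss(u, tau):
    """Standard quantile check loss, non-differentiable at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def smoothed_check_loss(u, tau, alpha=0.1):
    """Smooth surrogate tau*u + alpha*log(1 + exp(-u/alpha)) (assumed form)."""
    z = -u / alpha
    # Numerically stable softplus: log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return tau * u + alpha * softplus

def ipw_smoothed_objective(residuals, observed, propensities, tau, alpha=0.1):
    """Average smoothed loss; each observed case is weighted by 1/propensity."""
    total = 0.0
    for r, d, p in zip(residuals, observed, propensities):
        if d:  # cases with a missing response contribute nothing
            total += smoothed_check_loss(r, tau, alpha) / p
    return total / len(residuals)
```

As `alpha` shrinks, the surrogate approaches the check loss; the largest gap occurs at u = 0 and equals alpha*log(2).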
Data collected in fields such as cybersecurity and biomedicine are often high-dimensional and class-imbalanced. To address the low classification accuracy for minority-class samples caused by numerous irrelevant and redundant features in high-dimensional imbalanced data, we propose a novel feature selection method named AMF-SGSK, based on adaptive multi-filter and subspace-based gaining sharing knowledge. First, a balanced dataset is obtained by random under-sampling. Second, by combining the feature importance score with the AUC score for each filter method, we propose a concept called feature hardness to judge the importance of each feature, which allows the essential features to be selected adaptively. Finally, the optimal feature subset is obtained by gaining sharing knowledge in multiple subspaces. This approach effectively achieves dimensionality reduction for high-dimensional imbalanced data. Experimental results on 30 benchmark imbalanced datasets show that AMF-SGSK performs better than eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean values of F1-score, AUC, and G-mean for AMF-SGSK are 0.950, 0.967, and 0.965, respectively, the highest among all algorithms. The mean G-mean is higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio is below 0.01 across the ten selected datasets, further demonstrating the proposed method's overall superiority over competing approaches. AMF-SGSK can adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
In this note, the authors revisit the envelope dimension reduction, which was first introduced for estimating a sufficient dimension reduction subspace without inverting the sample covariance. Motivated by the recent developments in envelope methods and algorithms, the authors refresh the envelope inverse regression as a flexible alternative to the existing inverse regression methods in dimension reduction. The authors discuss the versatility of the envelope approach and demonstrate the advantages of the envelope dimension reduction through simulation studies.
Multi-dimensional arrays are referred to as tensors. Tensor-valued predictors are commonly encountered in modern biomedical applications, such as electroencephalogram (EEG), magnetic resonance imaging (MRI), functional MRI (fMRI), diffusion-weighted MRI, and longitudinal health data. In survival analysis, it is both important and challenging to integrate clinically relevant information, such as gender, age, and disease state, along with medical imaging tensor data or longitudinal health data to predict disease outcomes. Most existing higher-order sufficient dimension reduction regressions for matrix- or array-valued data focus solely on tensor data, often neglecting established clinical covariates that are readily available and known to have predictive value. Based on the idea of Folded-Minimum Average Variance Estimation (Folded-MAVE; Xue and Yin, 2014), the authors propose a new method, Partial Dimension Folded-MAVE (PF-MAVE), to address regression mean functions with tensor-valued covariates while simultaneously incorporating clinical covariates, which are typically categorical variables. Theorems and simulation studies demonstrate the importance of incorporating these categorical clinical predictors. A survival analysis of a longitudinal study of primary biliary cirrhosis (PBC) data is included to illustrate the proposed method.
The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph Neural Networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
In this paper, we study two types of Ding injective dimensions of complexes. First, we provide some equivalent characterizations of the dimension related to the special Ding injective preenvelopes. Furthermore, we consider the relationship between the dimensions Dipd(Y) and Did(Y) of the complex Y, where Dipd(Y) denotes the dimension associated with special Ding injective preenvelopes, and Did(Y) denotes the dimension associated with DG-injective resolutions. It is demonstrated that Dipd(Y) = Did(Y) for any bounded complex Y.
In this manuscript, we consider a non-autonomous dynamical system. Using the Carathéodory structure, we define a BS dimension on an arbitrary subset and obtain a Bowen's equation that illustrates the relation of the BS dimension to the Pesin-Pitskel topological pressure given by Nazarian [24]. Moreover, we establish a variational principle and an inverse variational principle for the BS dimension of non-autonomous dynamical systems. Finally, we also obtain an analogue of Billingsley's theorem for the BS dimension of non-autonomous dynamical systems.
The proliferation of high-dimensional data and the widespread use of complex models present central challenges in contemporary statistics and data science. Dimension reduction and model checking, as two foundational pillars supporting scientific inference and data-driven decision-making, have evolved through the collective wisdom of generations of statisticians. This special issue, titled "Recent Developments in Dimension Reduction and Model Checking for regressions", not only aims to showcase cutting-edge advances in the field but also carries a distinct sense of academic homage to honor the groundbreaking and enduring contributions of Professor Lixing Zhu, a leading scholar whose work has profoundly shaped both areas.
With the increasing complexity of vehicular networks and the proliferation of connected vehicles, Federated Learning (FL) has emerged as a critical framework for decentralized model training while preserving data privacy. However, efficient client selection and adaptive weight allocation in heterogeneous and non-IID environments remain challenging. To address these issues, we propose Federated Learning with Client Selection and Adaptive Weighting (FedCW), a novel algorithm that leverages adaptive client selection and dynamic weight allocation to optimize model convergence in real-time vehicular networks. FedCW selects clients based on their Euclidean distance from the global model and dynamically adjusts aggregation weights to optimize both data diversity and model convergence. Experimental results show that FedCW significantly outperforms existing FL algorithms such as FedAvg, FedProx, and SCAFFOLD, particularly in non-IID settings, achieving faster convergence, higher accuracy, and reduced communication overhead. These findings demonstrate that FedCW provides an effective solution for enhancing the performance of FL in heterogeneous, edge-based computing environments.
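The distance-based selection and weighting described in this abstract might look roughly like the sketch below. The abstract does not state whether near or far clients are preferred or how the weights are computed, so two assumptions are made purely for illustration: the `k` clients closest to the global model are selected, and aggregation weights are inversely proportional to distance. The function names and the `eps` guard are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_and_aggregate(global_model, client_models, k, eps=1e-8):
    """Pick the k closest clients (assumed rule) and average them with
    weights inversely proportional to their distance (assumed rule)."""
    distances = [euclidean_distance(m, global_model) for m in client_models]
    chosen = sorted(range(len(client_models)), key=lambda i: distances[i])[:k]
    raw = [1.0 / (distances[i] + eps) for i in chosen]  # eps avoids division by zero
    total = sum(raw)
    weights = [w / total for w in raw]  # normalised to sum to one
    new_global = [
        sum(w * client_models[i][j] for w, i in zip(weights, chosen))
        for j in range(len(global_model))
    ]
    return chosen, weights, new_global
```

A real system would apply this to model updates exchanged each round; the sketch only shows the selection and weighted-average step on plain lists.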
Owing to their global search capabilities and gradient-free operation, metaheuristic algorithms are applied to a wide range of optimization problems. However, their computational demands become prohibitive when tackling high-dimensional optimization challenges. To address these challenges, this study introduces cooperative metaheuristics integrating dynamic dimension reduction (DR). Building upon particle swarm optimization (PSO) and differential evolution (DE), the cooperative methods C-PSO and C-DE are developed. In the proposed methods, a modified principal component analysis (PCA) is used to reduce the dimension of the design variables, thereby decreasing computational costs. The dynamic DR strategy executes the modified PCA periodically after a fixed number of iterations, so that the important dimensions are identified dynamically. Compared with a static strategy, the dynamic DR strategy identifies important dimensions more precisely, thereby accelerating convergence toward optimal solutions. Furthermore, the influence of cumulative contribution rate thresholds on optimization problems with different dimensions is investigated. The metaheuristic algorithms (PSO, DE) and cooperative metaheuristics (C-PSO, C-DE) are examined on 15 benchmark functions and two engineering design problems (speed reducer and composite pressure vessel). Comparative results demonstrate that the cooperative methods achieve significantly superior performance compared to standard methods in both solution accuracy and computational efficiency. Compared to standard metaheuristic algorithms, cooperative metaheuristics reduce computational cost by at least 40%. The cooperative metaheuristics can be used effectively to tackle both high-dimensional unconstrained and constrained optimization problems.
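The cumulative contribution rate threshold mentioned in this abstract is the usual rule for deciding how many principal components to keep. The sketch below shows only that standard rule applied to a list of eigenvalues; the paper's modified PCA is not described in the abstract, and the function name and default threshold are hypothetical.

```python
def components_for_threshold(eigenvalues, threshold=0.9):
    """Return the smallest number of leading components whose cumulative
    share of total variance reaches the given threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)
```

A higher threshold keeps more dimensions and so preserves more of the search space at a higher computational cost, which is the trade-off the abstract says is investigated.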
In this paper, the authors propose a nonlinear dimension reduction technique based on Fréchet inverse regression to achieve sufficient dimension reduction for responses in metric spaces and predictors in Riemannian manifolds. The authors rigorously establish statistical properties of the estimators, providing formal proofs of their consistency and asymptotic behaviors. The effectiveness of the proposed method is demonstrated through extensive simulations and applications to real-world datasets, which highlight its practical utility for complex data with non-Euclidean structures.
Classical linear discriminant analysis (LDA) (Fisher, 1936) implicitly assumes the classification boundary depends on only one linear combination of the predictors. This restriction can lead to poor classification in applications where the decision boundary depends on multiple linear combinations of the predictors. To overcome this challenge, the authors first project the predictors onto an envelope central space and then perform LDA based on the sufficient predictor. The performance of the proposed method in improving classification accuracy is demonstrated on both synthetic data and real applications.
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Populus species, important economic species combining rapid growth with broad ecological adaptability, play a critical role in sustainable forestry and bioenergy production. In this study, we performed whole-genome resequencing of 707 individuals from a full-sib family to develop comprehensive single nucleotide polymorphism (SNP) markers and constructed a high-density genetic linkage map of 19 linkage groups. The total genetic length of the map reached 3623.65 cM with an average marker interval of 0.34 cM. By integrating multidimensional phenotypic data, 89 quantitative trait loci (QTL) associated with growth, wood physical and chemical properties, disease resistance, and leaf morphology traits were identified, with logarithm of odds (LOD) scores ranging from 3.13 to 21.72 and phenotypic variance explained between 1.7% and 11.6%. Notably, pleiotropic analysis revealed significant colocalization hotspots on chromosomes LG1, LG5, LG6, LG8, and LG14, with epistatic interaction network analysis confirming the genetic basis of coordinated regulation across multiple traits. Functional annotation of 207 candidate genes showed that R2R3-MYB and bHLH transcription factors and pyruvate kinase-encoding genes were significantly enriched, suggesting crucial roles in lignin biosynthesis and carbon metabolic pathways. Allelic effect analysis indicated that the frequency of favorable alleles associated with target traits ranged from 0.20 to 0.55. Incorporation of QTL-derived favorable alleles as random effects into Bayesian-based genomic selection models led to an increase in prediction accuracy ranging from 1% to 21%, with Bayesian ridge regression as the best predictive model. This study provides valuable genomic resources and genetic insights for deciphering complex trait architecture and advancing molecular breeding in poplar.
The advantages of genomic selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduce the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE), and mean absolute error (MAE) were used as evaluation indicators for comparison with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that in most cases the mixed kernel function models significantly outperform GBLUP, BayesB, and the single kernel function models. For instance, for T1 in the pig dataset, the predictive accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than those of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared to all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP on the pig dataset, with only a slight increase in runtime compared to the single kernel function models. In summary, the mixed kernel function SVR models are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
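One simple way to build a mixed kernel of the kind this abstract describes is a convex combination of two base kernels. The sketch below assumes that SVR_GS combines a Gaussian and a sigmoid kernel in this way, which the abstract does not spell out; the mixing weight, `gamma`, `coef0`, and all function names are hypothetical.

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0)

def mixed_kernel(x, y, weight=0.5, gamma=0.5):
    """Convex combination of the two kernels (assumed mixing scheme)."""
    return (weight * gaussian_kernel(x, y, gamma)
            + (1.0 - weight) * sigmoid_kernel(x, y, gamma))

def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of marker vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]
```

A Gram matrix built this way can be handed to an SVR implementation that accepts precomputed kernels; the mixing weight would then be tuned alongside the other hyperparameters.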
As the core of cathode materials, sensitive metals play important roles in the optimization of acetate production from carbon dioxide (CO2) in microbial electrochemical systems (MES). In this work, iron (Fe), copper (Cu), and nickel (Ni) were evaluated as sensitive metal cathode materials for CO2 conversion in MES. The MES with the Fe-electrode, a promising electrode material, demonstrated superior CO2 reduction performance with a maximum acetate accumulation of 417.9±39.2 mg/L, which was 1.5 and 1.7 times that of the Ni-electrode and Cu-electrode groups, respectively. Furthermore, an outstanding electron recovery efficiency of 67.7% was shown in the Fe-electrode group. The electron transfer between the electrode and suspended sludge was systematically cross-evaluated by the electrochemical behavior and extracellular polymeric substances. The Fe-electrode group had the highest electron transfer rate, 0.194 s^-1 (k_app), which was 17.6 and 21.5 times higher than that of the Cu- and Ni-electrode groups, respectively. The Fe-electrode was beneficial for reducing the electrochemical impedance between the electrode and suspended sludge. Additionally, redox substances in the extracellular polymeric substances of the Fe-electrode group were increased, implying more favorable electron transport dynamics. Simultaneously, enrichment of the functional bacteria Acetoanerobium and increased key enzymes involved in the carbonyl pathway were observed in the Fe-electrode group, which also promoted CO2 conversion in MES. This study provides a perspective on evaluating promising sensitive metal electrode materials for CO2 valorization in MES and offers a reference for subsequent electrode modification.
Human-modified landscapes serve as ecological filters, determining species distributions and persistence. Energy-efficient technologies, while crucial for climate change mitigation, represent novel filters whose impacts on synanthropic biodiversity are poorly understood. We investigated how attached sunspaces, a widely adopted energy-saving technology in rural China, filter the distribution of two ecologically important aerial insectivores, the Barn Swallow (Hirundo rustica) and Red-rumped Swallow (Cecropis daurica). We surveyed 106 villages during the 2024 and 2025 breeding seasons and recorded a total of 2323 nests (612 Barn Swallow, 1711 Red-rumped Swallow). Using Generalized Linear Models, we assessed their responses to building characteristics, landscape composition, and the prevalence of sunspaces. Barn Swallow nests preferred perches at the base and single attachment faces, while Red-rumped Swallow nests favored multiple attachment faces and avoided long shelters. The proportion of buildings with sunspaces acted as a strong positive filter for Barn Swallow nest abundance (+24%) but as a significant negative filter for Red-rumped Swallow (-51%). Other landscape variables (e.g., human population density, NDVI, Human Footprint Index) were not significant. This study demonstrates that specific architectural innovations can act as powerful ecological filters, leading to divergent distributional outcomes for sympatric species reliant on anthropogenic structures. Our findings reveal a critical trade-off in sustainable development: energy efficiency gains may inadvertently reduce habitat suitability for certain species. To reconcile climate and biodiversity goals in rural landscapes, we advocate integrating species-specific habitat requirements into building design. We propose actionable modifications to sunspaces to support swallows without compromising energy savings. These principles provide a template for mitigating the distributional impacts of green infrastructure globally.
In Wireless Sensor Networks (WSNs), survivability is a crucial issue that is greatly affected by energy efficiency. Solutions that satisfy application objectives while extending network lifetime are needed to address the severe energy constraints in WSNs. This paper presents an Adaptive Enhanced Grey Wolf Optimizer (AEGWO) for energy-efficient cluster head (CH) selection that mitigates the exploration–exploitation imbalance, preserves population diversity, and avoids the premature convergence inherent in the baseline GWO. AEGWO combines adaptive control of the search pressure parameter to accelerate convergence without stagnation, a hybrid velocity-momentum update based on PSO dynamics, and an intelligent mutation operator to maintain population diversity. The search is guided by a multi-objective fitness function that aims at maximizing residual energy, distributing CHs evenly, minimizing intra-cluster distance, keeping CHs desirably close to sinks, and enhancing coverage. Simulations on a homogeneous 100-node WSN tested the proposed AEGWO under the same conditions as LEACH, GWO, IGWO, PSO, WOA, and GA. AEGWO significantly increases stability and lifetime compared to LEACH and the other tested algorithms; it achieves the best first-, half-, and last-node-dead results, higher residual energy, and smaller communication overhead. The findings show that AEGWO provides sustainable energy management and better lifetime extension, which makes it a robust, flexible clustering protocol for large-scale WSNs.
Most predictive maintenance studies have emphasized accuracy but give very little attention to interpretability or deployment readiness. This study improves on prior methods by developing a small yet robust system that can predict when turbofan engines will fail. It uses the NASA CMAPSS dataset, which has over 200,000 engine cycles from 260 engines. The process begins with systematic preprocessing, which includes imputation, outlier removal, scaling, and labelling of the remaining useful life. Dimensionality is reduced using a hybrid selection method that combines variance filtering, recursive elimination, and gradient-boosted importance scores, yielding a stable set of 10 informative sensors. To mitigate class imbalance, minority cases are oversampled, and class-weighted losses are applied during training. Benchmarking is carried out with logistic regression, gradient boosting, and a recurrent design that integrates gated recurrent units with long short-term memory networks. The Long Short-Term Memory–Gated Recurrent Unit (LSTM–GRU) hybrid achieved the strongest performance, with an F1 score of 0.92, precision of 0.93, recall of 0.91, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) of 0.97, and minority recall of 0.75. Interpretability testing using permutation importance and Shapley values indicates that sensors 13, 15, and 11 are the most important indicators of engine wear. The proposed system combines imbalance handling, feature reduction, and interpretability into a practical design suitable for real industrial settings.
Feature selection serves as a critical preprocessing step in machine learning, focusing on identifying and preserving the most relevant features to improve the efficiency and performance of classification algorithms. Particle Swarm Optimization has demonstrated significant potential in addressing feature selection challenges. However, inherent limitations of Particle Swarm Optimization, such as the delicate balance between exploration and exploitation, susceptibility to local optima, and suboptimal convergence rates, hinder its performance. To tackle these issues, this study introduces a novel Leveraged Opposition-Based Learning method within Fitness Landscape Particle Swarm Optimization, tailored for wrapper-based feature selection. The proposed approach integrates: (1) a fitness-landscape adaptive strategy to dynamically balance exploration and exploitation, (2) the lever principle within Opposition-Based Learning to improve search efficiency, and (3) a Local Selection and Re-optimization mechanism combined with random perturbation to expedite convergence and enhance the quality of the optimal feature subset. The effectiveness of the proposed method is rigorously evaluated on 24 benchmark datasets and compared against 13 advanced metaheuristic algorithms. Experimental results demonstrate that the proposed method outperforms the compared algorithms in classification accuracy on over half of the datasets, whilst also significantly reducing the number of selected features. These findings demonstrate its effectiveness and robustness in feature selection tasks.
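For readers unfamiliar with Opposition-Based Learning, the sketch below shows the standard form only: the opposite of a position x within bounds [lower, upper] is lower + upper - x, and the better of the two candidates is kept. The lever-principle variant named in this abstract is not described there and is not reproduced here; the function names and the minimisation convention are assumptions.

```python
def opposite_position(position, lower, upper):
    """Standard opposition: reflect each coordinate within its bounds."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]

def keep_better(position, lower, upper, fitness):
    """Evaluate a particle and its opposite; keep whichever has lower fitness."""
    candidate = opposite_position(position, lower, upper)
    return candidate if fitness(candidate) < fitness(position) else position
```

In a wrapper-based setting, `fitness` would typically combine classification error with the number of selected features, for example after thresholding each coordinate to decide whether a feature is kept.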
Funding (distributed quantile regression paper): supported by the National Key R&D Program of China under Grant No. 2022YFA1003701 and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, under Grant No. SMDAYB2023004.
Funding: Supported by the Fundamental Research Program of Shanxi Province (Nos. 202203021211088, 202403021212254, 202403021221109) and the Graduate Research Innovation Project in Shanxi Province (No. 2024KY616).
Abstract: Data collected in fields such as cybersecurity and biomedicine often exhibit high dimensionality and class imbalance. To address the low classification accuracy for minority-class samples that arises from numerous irrelevant and redundant features in high-dimensional imbalanced data, we proposed a novel feature selection method named AMF-SGSK, based on adaptive multi-filter and subspace-based gaining sharing knowledge. First, a balanced dataset was obtained by random under-sampling. Second, combining the feature importance score with the AUC score for each filter method, we proposed a concept called feature hardness to judge the importance of each feature, which allows the essential features to be selected adaptively. Finally, the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces. This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data. Experimental results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean values of F1-score, AUC, and G-mean for AMF-SGSK are 0.950, 0.967, and 0.965, respectively, the highest among all algorithms. The mean G-mean is higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio is below 0.01 across the ten selected datasets, further demonstrating the proposed method’s overall superiority over competing approaches. AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 12301365, the National Natural Science Foundation of China under Grant No. 2241200071, and the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2023A1515110001.
Abstract: In this note, the authors revisit envelope dimension reduction, which was first introduced for estimating a sufficient dimension reduction subspace without inverting the sample covariance. Motivated by recent developments in envelope methods and algorithms, the authors refresh envelope inverse regression as a flexible alternative to existing inverse regression methods in dimension reduction. The authors discuss the versatility of the envelope approach and demonstrate the advantages of envelope dimension reduction through simulation studies.
Abstract: Multi-dimensional arrays are referred to as tensors. Tensor-valued predictors are commonly encountered in modern biomedical applications, such as electroencephalogram (EEG), magnetic resonance imaging (MRI), functional MRI (fMRI), diffusion-weighted MRI, and longitudinal health data. In survival analysis, it is both important and challenging to integrate clinically relevant information, such as gender, age, and disease state, with medical imaging tensor data or longitudinal health data to predict disease outcomes. Most existing higher-order sufficient dimension reduction regressions for matrix- or array-valued data focus solely on tensor data, often neglecting established clinical covariates that are readily available and known to have predictive value. Based on the idea of Folded-Minimum Average Variance Estimation (Folded-MAVE: Xue and Yin, 2014), the authors propose a new method, Partial Dimension Folded-MAVE (PF-MAVE), to address regression mean functions with tensor-valued covariates while simultaneously incorporating clinical covariates, which are typically categorical variables. Theorems and simulation studies demonstrate the importance of incorporating these categorical clinical predictors. A survival analysis of data from a longitudinal study of primary biliary cirrhosis (PBC) is included to illustrate the proposed method.
Funding: Supported by Ho Chi Minh City Open University, Vietnam, under grant number E2024.02.1CD, and by Suan Sunandha Rajabhat University, Thailand.
Abstract: The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph Neural Networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12061061), the Young Talents Team Project of Gansu Province (Grant No. 2025QNTD49), the Lanshan Talent Project of Northwest Minzu University (Grant No. Xbmulsrc202412), and Longyuan Young Talents of Gansu Province.
Abstract: In this paper, we study two types of Ding injective dimensions of complexes. First, we provide some equivalent characterizations of the dimension related to special Ding injective preenvelopes. Furthermore, we consider the relationship between the dimensions Dipd(Y) and Did(Y) of a complex Y, where Dipd(Y) denotes the dimension associated with special Ding injective preenvelopes and Did(Y) denotes the dimension associated with DG-injective resolutions. It is demonstrated that Dipd(Y) = Did(Y) for any bounded complex Y.
Funding: Supported by the NSFC (12461012) and the NSF of Chongqing (CSTB2024NSCQ-MSX1246).
Abstract: In this manuscript, we consider a non-autonomous dynamical system. Using the Carathéodory structure, we define a BS dimension on an arbitrary subset and obtain a Bowen’s equation that relates the BS dimension to the Pesin-Pitskel topological pressure given by Nazarian [24]. Moreover, we establish a variational principle and an inverse variational principle for the BS dimension of non-autonomous dynamical systems. Finally, we obtain an analogue of Billingsley’s theorem for the BS dimension of non-autonomous dynamical systems.
Abstract: The proliferation of high-dimensional data and the widespread use of complex models present central challenges in contemporary statistics and data science. Dimension reduction and model checking, two foundational pillars supporting scientific inference and data-driven decision-making, have evolved through the collective wisdom of generations of statisticians. This special issue, titled "Recent Developments in Dimension Reduction and Model Checking for regressions", aims to showcase cutting-edge advances in the field and also serves as an academic homage honoring the groundbreaking and enduring contributions of Professor Lixing Zhu, a leading scholar whose work has profoundly shaped both areas.
Abstract: With the increasing complexity of vehicular networks and the proliferation of connected vehicles, Federated Learning (FL) has emerged as a critical framework for decentralized model training while preserving data privacy. However, efficient client selection and adaptive weight allocation in heterogeneous and non-IID environments remain challenging. To address these issues, we propose Federated Learning with Client Selection and Adaptive Weighting (FedCW), a novel algorithm that leverages adaptive client selection and dynamic weight allocation to optimize model convergence in real-time vehicular networks. FedCW selects clients based on their Euclidean distance from the global model and dynamically adjusts aggregation weights to optimize both data diversity and model convergence. Experimental results show that FedCW significantly outperforms existing FL algorithms such as FedAvg, FedProx, and SCAFFOLD, particularly in non-IID settings, achieving faster convergence, higher accuracy, and reduced communication overhead. These findings demonstrate that FedCW provides an effective solution for enhancing the performance of FL in heterogeneous, edge-based computing environments.
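The distance-based selection and weighted aggregation described in this abstract can be sketched roughly as follows. The abstract does not state whether near or far clients are preferred, nor the weighting formula. This sketch therefore assumes that the `k` closest clients are kept and that weights decay as `exp(-distance)`. Both choices, and all function names, are hypothetical and are not taken from the FedCW paper.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_and_aggregate(global_model, client_models, k):
    """Pick the k clients closest to the global model and average them.

    Assumptions (not from the source): closest-first selection and
    softmax-style weights proportional to exp(-distance).
    Returns (selected_indices, weights, aggregated_model).
    """
    scored = sorted(
        (euclidean_distance(global_model, m), i)
        for i, m in enumerate(client_models)
    )
    chosen = scored[:k]
    raw = [math.exp(-d) for d, _ in chosen]
    norm = sum(raw)
    weights = [w / norm for w in raw]
    indices = [i for _, i in chosen]
    dim = len(global_model)
    aggregated = [
        sum(w * client_models[i][j] for w, i in zip(weights, indices))
        for j in range(dim)
    ]
    return indices, weights, aggregated
```

A rule that instead favours distant clients (to increase data diversity) would only change the sort order and the sign inside the exponential.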
Funding: Funded by the National Natural Science Foundation of China (Nos. 12402142, 11832013, and 11572134), the Natural Science Foundation of Hubei Province (No. 2024AFB235), the Hubei Provincial Department of Education Science and Technology Research Project (No. Q20221714), and the Opening Foundation of Hubei Key Laboratory of Digital Textile Equipment (Nos. DTL2023019 and DTL2022012).
Abstract: Owing to their global search capabilities and gradient-free operation, metaheuristic algorithms are applied to a wide range of optimization problems. However, their computational demands become prohibitive for high-dimensional optimization problems. To address this, the study introduces cooperative metaheuristics that integrate dynamic dimension reduction (DR). Building upon particle swarm optimization (PSO) and differential evolution (DE), the cooperative methods C-PSO and C-DE are developed. In the proposed methods, a modified principal component analysis (PCA) is used to reduce the dimension of the design variables, thereby decreasing computational cost. The dynamic DR strategy re-runs the modified PCA after every fixed number of iterations, so that the important dimensions are identified dynamically. Compared with a static strategy, the dynamic DR strategy identifies the important dimensions more precisely, thereby accelerating convergence toward optimal solutions. Furthermore, the influence of cumulative contribution rate thresholds on optimization problems of different dimensions is investigated. The metaheuristic algorithms (PSO, DE) and the cooperative metaheuristics (C-PSO, C-DE) are examined on 15 benchmark functions and two engineering design problems (speed reducer and composite pressure vessel). Comparative results demonstrate that the cooperative methods significantly outperform the standard methods in both solution accuracy and computational efficiency. Compared with standard metaheuristic algorithms, the cooperative metaheuristics reduce computational cost by at least 40%. The cooperative metaheuristics can be used effectively to tackle both unconstrained and constrained high-dimensional optimization problems.
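The role of a cumulative contribution rate threshold can be illustrated with the standard PCA rule: keep the smallest number of components whose eigenvalues account for at least the chosen share of total variance. This is a minimal sketch of that textbook rule only. It does not reproduce the paper's modified PCA, and the function name is hypothetical.

```python
def components_for_threshold(eigenvalues, threshold):
    """Smallest number of principal components whose cumulative
    contribution rate (explained-variance share) reaches `threshold`.

    `eigenvalues` are the covariance-matrix eigenvalues in any order;
    `threshold` is a fraction in (0, 1].
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)
```

With eigenvalues `[5, 3, 1, 1]`, a threshold of 0.75 keeps two components and a threshold of 0.85 keeps three, which shows how a higher threshold retains more search dimensions at a higher computational cost.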
Abstract: In this paper, the authors propose a nonlinear dimension reduction technique based on Fréchet inverse regression to achieve sufficient dimension reduction for responses in metric spaces and predictors in Riemannian manifolds. The authors rigorously establish the statistical properties of the estimators, providing formal proofs of their consistency and asymptotic behavior. The effectiveness of the proposed method is demonstrated through extensive simulations and applications to real-world datasets, which highlight its practical utility for complex data with non-Euclidean structures.
Abstract: Classical linear discriminant analysis (LDA) (Fisher, 1936) implicitly assumes that the classification boundary depends on only one linear combination of the predictors. This restriction can lead to poor classification in applications where the decision boundary depends on multiple linear combinations of the predictors. To overcome this challenge, the authors first project the predictors onto an envelope central space and then perform LDA based on the sufficient predictor. The improvement in classification accuracy achieved by the proposed method is demonstrated on both synthetic data and real applications.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2020-NR049579).
Abstract: High-dimensional data cause difficulties in machine learning because of high time consumption and large memory requirements. In a multi-label environment in particular, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. They are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them simultaneously. To solve the proposed optimization problem efficiently under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. Experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Funding: Supported by the National Key Research and Development Plan of China (2021YFD2200202) and the Key Research and Development Project of Jiangsu Province, China (BE2021366).
Abstract: Populus species, important economic species that combine rapid growth with broad ecological adaptability, play a critical role in sustainable forestry and bioenergy production. In this study, we performed whole-genome resequencing of 707 individuals from a full-sib family to develop comprehensive single nucleotide polymorphism (SNP) markers and constructed a high-density genetic linkage map of 19 linkage groups. The total genetic length of the map reached 3623.65 cM, with an average marker interval of 0.34 cM. By integrating multidimensional phenotypic data, 89 quantitative trait loci (QTL) associated with growth, wood physical and chemical properties, disease resistance, and leaf morphology traits were identified, with logarithm of odds (LOD) scores ranging from 3.13 to 21.72 and phenotypic variance explained between 1.7% and 11.6%. Notably, pleiotropic analysis revealed significant colocalization hotspots on chromosomes LG1, LG5, LG6, LG8, and LG14, with epistatic interaction network analysis confirming a genetic basis for coordinated regulation across multiple traits. Functional annotation of 207 candidate genes showed that R2R3-MYB and bHLH transcription factors and pyruvate kinase-encoding genes were significantly enriched, suggesting crucial roles in lignin biosynthesis and carbon metabolic pathways. Allelic effect analysis indicated that the frequency of favorable alleles associated with target traits ranged from 0.20 to 0.55. Incorporating QTL-derived favorable alleles as random effects into Bayesian-based genomic selection models increased prediction accuracy by 1% to 21%, with Bayesian ridge regression as the best predictive model. This study provides valuable genomic resources and genetic insights for deciphering complex trait architecture and advancing molecular breeding in poplar.
Funding: Supported by the China Agriculture Research System of MOF and MARA, the National Natural Science Foundation of China (31872337 and 31501919), and the Agricultural Science and Technology Innovation Project, China (ASTIP-IAS02).
Abstract: The advantages of genomic selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing data and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE), and mean absolute error (MAE) were used as evaluation indicators for comparison with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that in most cases the mixed kernel function models significantly outperform GBLUP, BayesB, and the single kernel functions. For instance, for T1 in the pig dataset, the prediction accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared with all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP on the pig dataset, with only a slight increase in runtime compared with the single kernel function models. In summary, the mixed kernel function SVR models are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
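One common way to build a mixed kernel is a convex combination of two single kernels. The sketch below mixes a Gaussian and a sigmoid kernel in that way. The abstract does not give the combination rule or the hyperparameters behind SVR_GS, so the mixing weight, `gamma`, `coef0`, and all function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma, coef0):
    """Sigmoid kernel tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0)

def mixed_kernel(x, y, weight, gamma=0.5, coef0=0.0):
    """Assumed mixing rule: weight * Gaussian + (1 - weight) * sigmoid,
    with 0 <= weight <= 1."""
    return (weight * gaussian_kernel(x, y, gamma)
            + (1.0 - weight) * sigmoid_kernel(x, y, gamma, coef0))

def gram_matrix(rows, weight, gamma=0.5, coef0=0.0):
    """Symmetric kernel matrix over all pairs of genotype rows."""
    return [[mixed_kernel(a, b, weight, gamma, coef0) for b in rows]
            for a in rows]
```

A Gram matrix built this way can be passed to an SVR implementation that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`. The mixing weight is then tuned like any other hyperparameter.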
Funding: Supported by the Science and Technology Commission of Shanghai Municipality Foundation (No. 22230710500) and the Interdisciplinary Joint Research Project of Tongji University (No. 2023-3-YB-07).
Abstract: As the core of cathode materials, sensitive metals play important roles in optimizing acetate production from carbon dioxide (CO2) in microbial electrochemical systems (MES). In this work, iron (Fe), copper (Cu), and nickel (Ni) were evaluated as sensitive metal cathode materials for CO2 conversion in MES. The MES with the Fe-electrode demonstrated superior CO2 reduction performance, with a maximum acetate accumulation of 417.9±39.2 mg/L, which was 1.5 and 1.7 times higher than that in the Ni-electrode and Cu-electrode groups, respectively. Furthermore, the Fe-electrode group showed an outstanding electron recovery efficiency of 67.7%. Electron transfer between the electrode and the suspended sludge was systematically cross-evaluated through electrochemical behavior and extracellular polymeric substances. The Fe-electrode group had the highest electron transfer rate, with an apparent rate constant (k_app) of 0.194 s^-1, which was 17.6 and 21.5 times higher than that of the Cu- and Ni-electrode groups, respectively. The Fe-electrode was beneficial for reducing the electrochemical impedance between the electrode and the suspended sludge. Additionally, redox substances in the extracellular polymeric substances of the Fe-electrode group increased, implying more favorable electron transport dynamics. Simultaneously, enrichment of the functional bacterium Acetoanerobium and increases in key enzymes involved in the carbonyl pathway were observed in the Fe-electrode group, which also promoted CO2 conversion in MES. This study provides a perspective on evaluating promising sensitive metal electrode materials for CO2 valorization in MES and offers a reference for subsequent electrode modification.
Funding: Funded by the National Natural Science Foundation of China (No. 32201304) and the Fundamental Research Funds for the Central Universities (2412022QD026).
Abstract: Human-modified landscapes serve as ecological filters, determining species distributions and persistence. Energy-efficient technologies, while crucial for climate change mitigation, represent novel filters whose impacts on synanthropic biodiversity are poorly understood. We investigated how attached sunspaces, a widely adopted energy-saving technology in rural China, filter the distribution of two ecologically important aerial insectivores, the Barn Swallow (Hirundo rustica) and the Red-rumped Swallow (Cecropis daurica). We surveyed 106 villages during the 2024 and 2025 breeding seasons and recorded a total of 2323 nests (612 Barn Swallow, 1711 Red-rumped Swallow). Using Generalized Linear Models, we assessed their responses to building characteristics, landscape composition, and the prevalence of sunspaces. Barn Swallow nests preferred perches at the base and single attachment faces, while Red-rumped Swallow nests favored multiple attachment faces and avoided long shelters. The proportion of buildings with sunspaces acted as a strong positive filter for Barn Swallow nest abundance (+24%) but as a significant negative filter for Red-rumped Swallow (-51%). Other landscape variables (e.g., human population density, NDVI, Human Footprint Index) were not significant. This study demonstrates that specific architectural innovations can act as powerful ecological filters, leading to divergent distributional outcomes for sympatric species reliant on anthropogenic structures. Our findings reveal a critical trade-off in sustainable development: energy efficiency gains may inadvertently reduce habitat suitability for certain species. To reconcile climate and biodiversity goals in rural landscapes, we advocate integrating species-specific habitat requirements into building design. We propose actionable modifications to sunspaces to support swallows without compromising energy savings. These principles provide a template for mitigating the distributional impacts of green infrastructure globally.
Funding: The Open Access publication fee for this article was fully covered by Abu Dhabi University.
Abstract: In Wireless Sensor Networks (WSNs), survivability is a crucial issue that is greatly affected by energy efficiency. Solutions that satisfy application objectives while extending network life are needed to address the severe energy constraints in WSNs. This paper presents an Adaptive Enhanced Grey Wolf Optimizer (AEGWO) for energy-efficient cluster head (CH) selection that mitigates the exploration–exploitation imbalance, preserves population diversity, and avoids the premature convergence inherent in baseline GWO. AEGWO combines adaptive control of the search pressure parameter to accelerate convergence without stagnation, a hybrid velocity-momentum update based on the dynamics of PSO, and an intelligent mutation operator to maintain population diversity. The search is guided by a multi-objective fitness function that aims to maximize residual energy, distribute CHs evenly, minimize intra-cluster distance, keep CHs desirably close to sinks, and enhance coverage. Simulations on a homogeneous WSN of 100 nodes tested the proposed AEGWO under the same conditions as LEACH, GWO, IGWO, PSO, WOA, and GA. AEGWO significantly increases stability and lifetime compared with LEACH and the other tested algorithms; it achieves the best first-, half-, and last-node-dead results, along with higher residual energy and smaller communication overhead. The findings show that AEGWO provides sustainable energy management and better lifetime extension, which makes it a robust, flexible clustering protocol for large-scale WSNs.
Funding: Supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, Grant No. KFU253765.
Abstract: Most predictive maintenance studies have emphasized accuracy but paid very little attention to interpretability or deployment readiness. This study improves on prior methods by developing a small yet robust system that can predict when turbofan engines will fail. It uses the NASA CMAPSS dataset, which has over 200,000 engine cycles from 260 engines. The process begins with systematic preprocessing, which includes imputation, outlier removal, scaling, and labelling of the remaining useful life. Dimensionality is reduced using a hybrid selection method that combines variance filtering, recursive elimination, and gradient-boosted importance scores, yielding a stable set of 10 informative sensors. To mitigate class imbalance, minority cases are oversampled, and class-weighted losses are applied during training. Benchmarking is carried out with logistic regression, gradient boosting, and a recurrent design that integrates gated recurrent units with long short-term memory networks. The Long Short-Term Memory–Gated Recurrent Unit (LSTM–GRU) hybrid achieved the strongest performance, with an F1 score of 0.92, precision of 0.93, recall of 0.91, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) of 0.97, and minority recall of 0.75. Interpretability testing using permutation importance and Shapley values indicates that sensors 13, 15, and 11 are the most important indicators of engine wear. The proposed system combines imbalance handling, feature reduction, and interpretability into a practical design suitable for real industrial settings.
Funding: Supported by the National Natural Science Foundation of China (62106092), the Natural Science Foundation of Fujian Province (2024J01822, 2024J01820, 2022J01916), and the Natural Science Foundation of Zhangzhou City (ZZ2024J28).
Abstract: Feature selection serves as a critical preprocessing step in machine learning, focusing on identifying and preserving the most relevant features to improve the efficiency and performance of classification algorithms. Particle Swarm Optimization has demonstrated significant potential in addressing feature selection challenges. However, inherent limitations of Particle Swarm Optimization, such as the delicate balance between exploration and exploitation, susceptibility to local optima, and suboptimal convergence rates, hinder its performance. To tackle these issues, this study introduces a novel Leveraged Opposition-Based Learning method within Fitness Landscape Particle Swarm Optimization, tailored for wrapper-based feature selection. The proposed approach integrates: (1) a fitness-landscape adaptive strategy to dynamically balance exploration and exploitation, (2) the lever principle within Opposition-Based Learning to improve search efficiency, and (3) a Local Selection and Re-optimization mechanism combined with random perturbation to expedite convergence and enhance the quality of the optimal feature subset. The effectiveness of the proposed method is rigorously evaluated on 24 benchmark datasets and compared against 13 advanced metaheuristic algorithms. Experimental results demonstrate that the proposed method outperforms the compared algorithms in classification accuracy on over half of the datasets, while also significantly reducing the number of selected features. These findings demonstrate its effectiveness and robustness in feature selection tasks.
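For readers unfamiliar with Opposition-Based Learning, the basic idea is to also evaluate the point mirrored across the centre of the search bounds and to keep whichever candidate is better. The sketch below shows only that standard form. The lever-principle variant proposed in the paper is not specified in the abstract and is not reproduced here, and the function names are hypothetical.

```python
def opposite_position(position, lower, upper):
    """Standard opposition: mirror each coordinate within its bounds,
    x_opp = lower + upper - x."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]

def keep_better(position, lower, upper, fitness):
    """Return the candidate with lower fitness (minimisation assumed):
    either the original position or its opposite."""
    candidate = opposite_position(position, lower, upper)
    if fitness(candidate) < fitness(position):
        return candidate
    return position
```

In wrapper-based feature selection, a continuous position in `[0, 1]` per feature is typically thresholded (for example at 0.5) to decide which features enter the classifier. The opposite point then corresponds to a roughly complementary feature subset.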