Funding: This work was funded by the US Army Combat Capabilities Development Command (CCDC) Aviation & Missile Center, https://www.avmc.army.mil/ (accessed on 5 February 2024), contract number W31P4Q-18-D-0002, through the Georgia Tech Research Institute and AAMU-RISE.
Abstract: This paper investigates the impact of reducing feature-vector dimensionality on the performance of machine learning (ML) models. Dimensionality reduction and feature selection techniques can improve the computational efficiency, accuracy, robustness, transparency, and interpretability of ML models. In high-dimensional data, where features outnumber training instances, redundant or irrelevant features introduce noise, hindering model generalization and accuracy. This study explores the effects of dimensionality reduction methods on binary classifier performance using network traffic data for cybersecurity applications. The paper examines how dimensionality reduction techniques influence classifier operation and performance across diverse performance metrics for seven ML models. Four dimensionality reduction methods are evaluated: principal component analysis (PCA), singular value decomposition (SVD), univariate feature selection (UFS) using chi-square statistics, and feature selection based on mutual information (MI). The results suggest that direct feature selection can be more effective than data-projection methods in some applications: direct selection offers lower computational complexity and, in some cases, superior classifier performance. This study emphasizes that the evaluation and comparison of binary classifiers depend on the specific performance metrics chosen, each providing insight into a different aspect of ML model operation. Using open-source network traffic data, this paper demonstrates that dimensionality reduction can be a valuable tool: it reduces computational overhead, enhances model interpretability and transparency, and maintains or even improves the performance of trained classifiers. The study also shows that direct feature selection can be a more effective strategy than feature engineering in specific scenarios.
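The contrast the abstract draws between projection methods (PCA, SVD) and direct selection (chi-square UFS, MI) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the paper's network-traffic dataset, classifiers, or hyperparameters; the target dimensionality k = 8 is an arbitrary choice for the sketch.

```python
# Sketch of the four dimensionality reduction methods named in the abstract,
# applied to a synthetic stand-in for network-traffic feature vectors.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                           random_state=0)
X_nonneg = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

k = 8  # illustrative target dimensionality
reducers = {
    "PCA":        PCA(n_components=k),                  # projection
    "SVD":        TruncatedSVD(n_components=k),         # projection
    "UFS (chi2)": SelectKBest(chi2, k=k),               # direct selection
    "MI":         SelectKBest(mutual_info_classif, k=k) # direct selection
}

for name, red in reducers.items():
    # Selectors use the labels y; PCA/SVD accept and ignore them.
    X_red = red.fit_transform(X_nonneg, y)
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X_red, y, cv=5).mean()
    print(f"{name}: {X_red.shape[1]} features, CV accuracy {acc:.3f}")
```

Note the practical difference the abstract alludes to: the selectors keep k of the original (interpretable) features, while PCA/SVD produce k linear combinations of all 50.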
Funding: This work was supported by NSERC (Grant No. RGPIN/9319-2005).
Abstract: In the 1970s, Folland and Stein studied a family of subelliptic scalar operators $\mathcal{L}_\lambda$ which arise naturally in the $\bar\partial_b$-complex. They introduced weighted Sobolev spaces as the natural spaces for this complex, and then obtained sharp estimates for $\bar\partial_b$ in these spaces using integral kernels and approximate inverses. In the 1990s, Rumin introduced a differential complex for compact contact manifolds, showed that the Folland-Stein operators are central to the analysis of the corresponding Laplace operator, and derived the necessary estimates for the Laplacian from the Folland-Stein analysis. In this paper, we give a self-contained derivation of sharp estimates in the anisotropic Folland-Stein spaces for the operators studied by Rumin, using integration by parts and a modified approach to bootstrapping.
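For context, the model case of this family is standard (the notation and normalization below are assumed, not taken from the abstract, and sign/factor conventions vary across references): on the Heisenberg group $\mathbb{H}^n$, the Folland-Stein operators take the form

```latex
\mathcal{L}_\lambda \;=\; -\frac{1}{2}\sum_{j=1}^{n}\bigl(Z_j\bar{Z}_j + \bar{Z}_j Z_j\bigr) \;+\; i\lambda T,
```

where $Z_j$, $\bar{Z}_j$ are the standard left-invariant complex vector fields and $T$ spans the center. A key point of the Folland-Stein theory is that $\mathcal{L}_\lambda$ is hypoelliptic, with the associated sharp subelliptic estimates, precisely when $\lambda$ avoids the exceptional values $\pm(n+2k)$, $k = 0, 1, 2, \ldots$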