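To address inaccurate topology identification caused by multicollinearity among node voltages in distribution networks, this paper proposes a topology identification method that uses node voltage data from multiple time snapshots, identifies highly similar nodes, and uses the least absolute shrinkage and selection operator (Lasso) algorithm to screen neighboring nodes. First, the Pearson correlation coefficients between nodes are analyzed, showing that many nodes are highly correlated with nodes they are not adjacent to, and an approximately linear relationship between node voltages is derived. Then, the Pearson correlation coefficient, Euclidean distance, and dynamic time warping (DTW) are used as similarity metrics in a first-pass identification to find the nodes that exhibit multicollinearity. In the second-pass identification, the main power-source node is taken as the parent node and Lasso regression determines its child nodes; each child node then becomes a new parent node, and this loop repeats until the topology is generated. Finally, the feasibility and accuracy of the method are verified on the IEEE 33-node test system.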
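The first-pass identification described above can be sketched in plain Python, assuming each node's voltage magnitudes over the time snapshots are stored as an equal-length list. The abstract names the three similarity metrics but not how they are combined or thresholded, so the rule that a pair is flagged only when all three agree, and the thresholds `r_min`, `d_max` and `dtw_max`, are hypothetical. The Lasso-based second pass is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length voltage series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance between two voltage series (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(x, y):
    """Classic O(len(x) * len(y)) DTW distance with |a - b| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def similar_pairs(voltages, r_min=0.95, d_max=0.05, dtw_max=0.05):
    """Flag node pairs whose voltage series are highly similar on all three metrics.

    voltages: dict mapping node id -> list of per-snapshot voltage magnitudes (p.u.).
    The thresholds and the all-three-agree rule are hypothetical choices.
    """
    nodes = sorted(voltages)
    flagged = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            va, vb = voltages[a], voltages[b]
            if (pearson(va, vb) >= r_min
                    and euclidean(va, vb) <= d_max
                    and dtw(va, vb) <= dtw_max):
                flagged.append((a, b))
    return flagged

# Usage: "b" tracks "a" with a constant offset, while "c" moves differently.
volts = {
    "a": [1.00, 0.98, 0.96, 0.98],
    "b": [0.99, 0.97, 0.95, 0.97],
    "c": [0.96, 0.99, 1.00, 0.95],
}
print(similar_pairs(volts))
```

Pearson correlation ignores a constant offset between series, whereas Euclidean distance and DTW do not; requiring all three to agree is one way to avoid pairing nodes that merely share a trend.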
Screening biomolecular markers from high-dimensional biological data is a long-standing task in biomedical translational research. With its advantages in both feature shrinkage and biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm is one of the most popular methods for clinical biomarker development. In practice, however, applying LASSO to high-dimensional, low-sample-size omics data often yields an excessive number of predictive variables, leading to model overfitting. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to select efficient and stable variables with high confidence from omics data. Using a bagging strategy combined with either a parametric method or an inflection point search method, VSOLassoBag integrates and votes on the variables selected by multiple LASSO models to determine the optimal candidates. Applications of VSOLassoBag to both simulated and real-world datasets show that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, compared with multiple existing algorithms, VSOLassoBag shows comparable performance across different scenarios while selecting fewer features. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that supports multithreaded computation.
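The bagging-and-voting idea above can be illustrated with a small plain-Python sketch. This is not the VSOLassoBag R package or its API: the Lasso is solved here by cyclic coordinate descent, the penalty `lam` is fixed rather than tuned, and a fixed selection-frequency cutoff `min_freq` stands in for the parametric and inflection-point methods the package uses. All function names are hypothetical.

```python
import random

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink z toward 0 by t.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(cols):
    # Center each column and scale it to unit (population) variance.
    out = []
    for c in cols:
        m = sum(c) / len(c)
        s = (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0  # guard constant columns
        out.append([(v - m) / s for v in c])
    return out

def lasso_cd(cols, y, lam, sweeps=100):
    """Lasso via cyclic coordinate descent (a stand-in solver).

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized columns of X
    and a centered y; returns the coefficient list b.
    """
    n = len(y)
    X = standardize(cols)
    my = sum(y) / n
    r = [v - my for v in y]  # residual y - Xb for b = 0
    b = [0.0] * len(X)
    for _ in range(sweeps):
        for j, xj in enumerate(X):
            # With unit-variance columns the coordinate update is x_j^T r / n + b_j.
            rho = sum(a * e for a, e in zip(xj, r)) / n + b[j]
            new = soft_threshold(rho, lam)
            if new != b[j]:
                delta = new - b[j]
                r = [e - delta * a for e, a in zip(r, xj)]
                b[j] = new
    return b

def bagged_lasso_vote(cols, y, lam, n_bags=30, min_freq=0.5, seed=0):
    """Refit the Lasso on bootstrap resamples and keep variables chosen often.

    Returns (selected column indices, per-column selection frequency).
    The fixed min_freq cutoff is a simplifying assumption.
    """
    rng = random.Random(seed)
    n, p = len(y), len(cols)
    votes = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        b = lasso_cd([[c[i] for i in idx] for c in cols], [y[i] for i in idx], lam)
        for j, bj in enumerate(b):
            if bj != 0.0:
                votes[j] += 1
    freq = [v / n_bags for v in votes]
    return [j for j in range(p) if freq[j] >= min_freq], freq

# Usage on synthetic data: only column 0 drives y; columns 1-4 are pure noise.
rng = random.Random(1)
n = 60
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
y = [3.0 * cols[0][i] + rng.gauss(0, 0.5) for i in range(n)]
selected, freq = bagged_lasso_vote(cols, y, lam=0.3)
print(selected, freq)
```

Voting across resamples rewards variables that a single Lasso fit picks consistently, which is why such ensembles tend to return fewer, more stable features than one fit on the full sample.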
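To address the curse of dimensionality in software complexity metric attributes for early software reliability prediction, a feature selection method for software complexity metrics based on the least absolute shrinkage and selection operator (LASSO) and the least angle regression (LARS) algorithm is proposed. The method filters out complexity metrics that have little influence on early prediction results and obtains the subset of key attributes most closely related to early prediction. First, the characteristics of LASSO regression and its application to feature selection are analyzed; then the LARS algorithm is modified so that it can solve the LASSO problem, yielding the relevant subset of complexity metrics. Finally, early software reliability prediction is performed with a learning vector quantization (LVQ) neural network, and experiments are conducted using ten-fold cross-validation. Comparison with traditional feature selection methods shows that the proposed method significantly improves the accuracy of early software reliability prediction.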
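The downstream evaluation step, an LVQ classifier scored by ten-fold cross-validation, can be sketched as follows, assuming `X` already holds only the selected complexity metrics and `y` holds hypothetical class labels. The abstract does not specify the LVQ variant, prototype count, learning rate or epochs, so basic LVQ1 with one prototype per class initialized at the class mean is an assumption, and the LASSO/LARS selection step itself is not reproduced.

```python
import random

def nearest(protos, x):
    # Index of the prototype closest to x (squared Euclidean distance).
    return min(range(len(protos)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(protos[k][0], x)))

def train_lvq1(X, y, epochs=20, lr=0.1):
    """LVQ1 with one prototype per class, initialized at the class mean.

    The winning prototype moves toward a sample whose label matches and away
    from one whose label differs; the learning rate decays linearly per epoch.
    """
    protos = []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        protos.append(([sum(col) / len(pts) for col in zip(*pts)], label))
    for e in range(epochs):
        alpha = lr * (1 - e / epochs)
        for x, t in zip(X, y):
            k = nearest(protos, x)
            w, label = protos[k]
            sign = 1.0 if label == t else -1.0
            protos[k] = ([wi + sign * alpha * (xi - wi) for wi, xi in zip(w, x)], label)
    return protos

def predict(protos, x):
    return protos[nearest(protos, x)][1]

def kfold_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy of LVQ1 over k shuffled folds (ten-fold by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accs = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        protos = train_lvq1([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(protos, X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / k

# Usage on two well-separated synthetic classes (hypothetical data).
rng = random.Random(2)
X = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
     + [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(50)])
y = [0] * 50 + [1] * 50
print(kfold_accuracy(X, y))
```

Each fold retrains the classifier from scratch on the other nine folds, so the reported figure estimates accuracy on metrics data the model has not seen.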
Funding: supported by the National Key R&D Program of China (2021YFA1302100 to Q.Z.), the National Natural Science Foundation of China (82172861 to Q.Z.), the Guangdong Basic and Applied Basic Research Foundation (2021A1515011743 to Q.Z.), and the National Key Clinical Discipline (to D.Z.).