In recent years, there has been a concerted effort to improve anomaly detection techniques, particularly for high-dimensional, distributed clinical data. Analysis of patient data in clinical settings focuses on refining diagnostic accuracy, personalising treatment plans, and optimising resource allocation to enhance clinical outcomes. Nonetheless, this domain faces unique challenges, such as irregular data collection, inconsistent data quality, and patient-specific structural variations. To address these challenges, this paper proposes a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data. The strategy uses HPO-based optimal Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster patient exercise data, which facilitates efficient anomaly identification. Subsequently, a stochastic method based on the Interquartile Range (IQR) filters out unreliable data points, so that medical tools and professionals receive only the most pertinent and accurate information. The primary objective of the study is to equip healthcare professionals and researchers with a robust tool for managing extensive, high-dimensional clinical datasets, enabling effective isolation and removal of aberrant data points. Furthermore, a regression model is developed using Automated Machine Learning (AutoML) to assess the impact of the ensemble abnormal-pattern detection approach, and various statistical error estimation techniques are used to validate the hybrid approach alongside AutoML. Experimental results show that applying the hybrid model to patient rehabilitation data improves AutoML performance by an average of 0.041 in the R² score, surpassing traditional regression models.
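The second stage described above, IQR-based filtering applied after clustering, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the use of `-1` as the noise label, and the fence multiplier `k = 1.5` (the conventional Tukey choice) are all assumptions, and the cluster labels are taken as given rather than produced by the HPO-tuned DBSCAN step.

```python
from collections import defaultdict
from statistics import quantiles

NOISE = -1  # assumed label for points the clustering step marked as noise


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional Tukey multiplier, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_records(values, labels, k=1.5):
    """Return sorted indices of records that are clustered and lie inside
    the IQR fences of their own cluster."""
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != NOISE:
            by_cluster[label].append(idx)

    kept = []
    for indices in by_cluster.values():
        if len(indices) < 2:
            # quantiles() needs at least two points; keep tiny clusters as-is
            kept.extend(indices)
            continue
        low, high = iqr_bounds([values[i] for i in indices], k)
        kept.extend(i for i in indices if low <= values[i] <= high)
    return sorted(kept)
```

Computing the fences per cluster rather than over the whole dataset is a design choice of this sketch: it keeps a value that is typical for one group of patients from being rejected merely because it is unusual for another.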
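Alignment (line shape) monitoring of steel arch bridges is an important part of a bridge health monitoring system. Using three-dimensional laser scanning, this work improves the traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm by fusing it with the random sample consensus (RANSAC) algorithm, and applies the result to extract the arch-rib alignment of a steel arch bridge. 3D laser point cloud data are comprehensive and detailed, and can fully capture the shape and deformation of the bridge structure. The RANSAC-fused improved DBSCAN algorithm constrains the clustering result according to the structural characteristics of the steel arch bridge, and thereby removes scattered points as well as the point clouds of the deck, cross braces, lateral bracing, and web members. Key points are fitted from the point cloud extracted by the algorithm and compared with manually extracted results: the key-point extraction errors for the arch rib are all at the millimetre level, with a maximum error of 9.2 mm and a minimum error of 0.1 mm. The method extracts steel arch bridge alignment more accurately and effectively, achieves millimetre-level precision, greatly reduces labour and time costs, is more robust to the complex structure of steel arch bridges, and is well suited to practical production needs.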
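Data cleaning, feature selection, and predictive modelling are indispensable steps in early warning of abnormal wind turbine states based on supervisory control and data acquisition (SCADA) data. First, isolation forest (iForest) is combined with density-based spatial clustering of applications with noise (DBSCAN) to clean outliers from the SCADA data, and the random forest (RF) algorithm together with the Pearson correlation coefficient is used to select the model input parameters. Next, a categorical boosting (CatBoost) model tuned with Optuna is built to predict the gearbox oil sump temperature under normal operating conditions. A sliding-window method is then used to construct a state evaluation index, and interval estimation theory is used to determine the critical threshold for identifying abnormal oil temperature, which enables early warning of oil temperature anomalies. Finally, the method is tested on real historical fault data of abnormal oil temperature from the SCADA system of a wind turbine, which verifies its effectiveness.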
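The warning step (a sliding-window index plus a threshold) can be illustrated with a short sketch. The abstract gives neither the form of the state evaluation index nor the interval-estimation formula, so everything below is an assumption made for illustration: the index is taken to be the mean absolute prediction residual per window, and a normal-theory upper bound `mean + z * std` over healthy data stands in for the threshold.

```python
from statistics import mean, stdev


def sliding_index(residuals, window):
    """Mean absolute residual over each full window (assumed index form)."""
    abs_res = [abs(r) for r in residuals]
    return [mean(abs_res[i:i + window])
            for i in range(len(abs_res) - window + 1)]


def upper_threshold(healthy_index, z=3.0):
    """Upper bound mean + z * std of the index under healthy operation.

    Assumes roughly normal index values; z = 3.0 is an illustrative choice.
    """
    return mean(healthy_index) + z * stdev(healthy_index)


def first_alarm(index, threshold):
    """Position of the first window whose index exceeds the threshold,
    or None if no window does."""
    for pos, value in enumerate(index):
        if value > threshold:
            return pos
    return None
```

In use, `residuals` would be the difference between measured oil temperature and the temperature predicted by the normal-condition model; the threshold is fitted on a fault-free period and then applied to the monitored period.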
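Obstacle detection in road point cloud data is crucial for intelligent transportation systems and autonomous driving. The traditional Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm clusters poorly on high-dimensional data or data with regions of differing density, because its distance measure is inefficient and a suitable parameter combination is hard to determine. A road-obstacle point cloud clustering method based on an improved DBSCAN is therefore proposed. First, an isolation kernel function replaces the traditional distance measure when determining the Eps neighbourhood, which improves the adaptability and accuracy of DBSCAN across regions of different density. Second, to address the shortcomings of the Cheetah Optimizer (CO) in information sharing and iterative updating, a CO variant based on a timely updating mechanism and a compatible metric strategy (Timely Updating Mechanisms and Compatible Metric Strategies for CO Algorithms, TCCO) is proposed. Real-time update operations ensure that the best information from each iteration is shared promptly, and during the global update the elimination mechanism is optimised using non-dominated sorting and crowding distance, which balances global exploration and local exploitation and improves convergence speed and accuracy. Finally, the isolation measure is used to improve the Eps neighbourhood and TCCO is used to optimise DBSCAN clustering, so that the parameters are determined adaptively and clustering accuracy and efficiency improve. In tests on eight UCI datasets, simulation results show that the proposed TCCO-DBSCAN algorithm clearly improves F-Measure, ARI, and NMI compared with CO-DBSCAN, SSA-DBSCAN, DBSCAN, and KMC, with better clustering accuracy. Experiments on obstacle clustering of LiDAR point cloud data confirm that TCCO-DBSCAN adapts effectively to density changes in point cloud data and yields better road-obstacle clustering, supporting obstacle detection in driver assistance.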
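Of the three evaluation metrics named above, the Adjusted Rand Index (ARI) is the easiest to state compactly. The sketch below computes it from the standard pair-counting formula; it illustrates the metric only and says nothing about how the paper's experiments were run. The function name and the handling of degenerate inputs are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same items (pair-counting form)."""
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two labelings of the same length >= 2")

    # Pairs of items placed together in both, in truth only, in pred only
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = together_truth * together_pred / comb(len(truth), 2)
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        # Degenerate case (e.g. both labelings put everything in one
        # cluster); returning 1.0 here is an assumed convention.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

ARI is invariant to how clusters are numbered, so a labeling that matches the ground truth up to renaming of the clusters scores 1.0, while chance-level agreement scores close to 0.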
Funding: The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through project number IFPSAU-2021/01/17758.
Abstract: Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the minimum number of objects within epsilon). However, it cannot handle clusters of varying density because it uses a single global value for epsilon. This article proposes an adaptation of DBSCAN that discovers clusters of varied densities and reduces the required input parameters to one. The only user input in the proposed method is MinPts; epsilon is computed automatically from statistical information about the dataset. The proposed method finds the core distance of each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon equal to the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects number fewer than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the proposed method, with promising results. Practical experiments demonstrate the method's ability to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% on the datasets tested.
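The iterative procedure described in this abstract can be sketched in plain Python. This is a reading of the abstract, not the authors' code, and several details are assumptions: the core distance is taken as the distance to the MinPts-th nearest object counting the object itself, core distances in later rounds are computed among the still-unclustered objects only, Euclidean distance is used, and the loop also stops when fewer than MinPts objects remain. All function names are hypothetical.

```python
import math


def region_query(points, idx, candidates, eps):
    """Indices in `candidates` within eps of points[idx] (includes idx)."""
    return [j for j in candidates
            if math.dist(points[idx], points[j]) <= eps]


def dbscan(points, candidates, eps, min_pts):
    """Plain DBSCAN restricted to `candidates`; returns a list of clusters,
    each a list of point indices. Unassigned candidates are noise."""
    assigned, clusters = set(), []
    for seed in candidates:
        if seed in assigned:
            continue
        if len(region_query(points, seed, candidates, eps)) < min_pts:
            continue  # not a core point; may still join a cluster as border
        cluster, queue = [], [seed]
        while queue:
            j = queue.pop()
            if j in assigned:
                continue
            assigned.add(j)
            cluster.append(j)
            neighbours = region_query(points, j, candidates, eps)
            if len(neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours)
        clusters.append(cluster)
    return clusters


def core_distance(points, idx, candidates, min_pts):
    """Distance to the min_pts-th nearest candidate, counting idx itself
    (assumed convention)."""
    dists = sorted(math.dist(points[idx], points[j]) for j in candidates)
    return dists[min_pts - 1]


def adaptive_dbscan(points, min_pts, stop_fraction=0.006):
    """Repeat DBSCAN on the unclustered objects, each round with eps set to
    their average core distance. Returns one label per point (-1 = noise)."""
    labels = [-1] * len(points)
    remaining = list(range(len(points)))
    next_label = 0
    # Stop when too few objects remain: the 0.006 rule from the abstract,
    # plus an assumed guard that at least min_pts objects are left.
    while (len(remaining) >= min_pts
           and len(remaining) >= stop_fraction * len(points)):
        eps = sum(core_distance(points, i, remaining, min_pts)
                  for i in remaining) / len(remaining)
        clusters = dbscan(points, remaining, eps, min_pts)
        if not clusters:
            break  # defensive: no progress this round
        for cluster in clusters:
            for i in cluster:
                labels[i] = next_label
            next_label += 1
        remaining = [i for i in remaining if labels[i] == -1]
    return labels
```

On a toy input with one tight group (spacing 0.1) and one loose group (spacing 1.0), the first round's epsilon is small enough to capture only the tight group; the second round recomputes epsilon from the loose group alone and clusters it. The neighbour search here is quadratic, which is fine for illustration but would need a spatial index on real datasets.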