Abstract: This survey gives a broad, structured overview of machine learning for anomaly detection in streaming datasets. Its objective is to assess the effectiveness of Hoeffding Trees as a machine learning solution for detecting anomalies in streaming cyber datasets. We group the existing research on Hoeffding Trees that is feasible for this type of study into three categories: distributed Hoeffding Trees, ensembles of Hoeffding Trees, and existing techniques that use Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper; they were selected for their relation to streaming data and for the flexibility of their techniques across different domains of streaming data. We discuss how the techniques proposed within these compositions can be combined to address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem, anomaly detection.
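For readers unfamiliar with the mechanism behind Hoeffding Trees, the following is a minimal Python sketch of the standard Hoeffding-bound split test, not code from the survey itself. The function names, the `tie_threshold` parameter, and the example numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the
    true mean of a variable with the given range lies within epsilon of
    the mean observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied.
    tie_threshold is a hypothetical default chosen for illustration.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Illustrative numbers only: gains in [0, 1], 1000 examples at the leaf.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

The bound shrinks as more examples arrive, which is why a leaf can commit to a split after seeing only a finite prefix of the stream.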
Funding: This work was supported by a JSPS Grant-in-Aid for Young Scientists (Grant No. 18K12873) and Waseda University Grants for Special Research Projects ("Tokutei Kadai") (Grant No. 2019C-688).
Abstract: When addressing various financial problems, such as estimating stock portfolio risk, it is necessary to derive the distribution of a sum of dependent random variables. Deriving this distribution requires identifying the joint distribution of those random variables, but exact estimation of the joint distribution of dependent random variables is difficult. In recent years, studies have therefore examined bounds on the sum of dependent random variables under dependence uncertainty. In this study, we obtain an improved Hoeffding inequality for dependent bounded variables. We then extend this result to the case of sub-Gaussian random variables.
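For context, the classical Hoeffding inequality that such results refine can be stated as follows. This is the textbook form for independent bounded variables, not the improved bound obtained in the study; the symbols $X_i$, $a_i$, $b_i$, $S_n$, and $t$ are notation introduced here.

```latex
% Classical Hoeffding inequality (independent case).
% Let X_1, ..., X_n be independent with a_i <= X_i <= b_i almost surely,
% and let S_n = X_1 + ... + X_n. Then for every t > 0:
\[
  \Pr\bigl(S_n - \mathbb{E}[S_n] \ge t\bigr)
  \le \exp\!\left(-\frac{2t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}}\right).
\]
```

The independence assumption is exactly what fails for the dependent variables considered in the abstract, which is why a separate bound is needed there.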
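Abstract: To address the problems of concept representation and classifier selection in recurring concept drift detection, this work proposes an algorithm for classifying data streams that contain recurring concept drift: Conceptual Clustering and Prediction through Main Feature Extraction (MFCCP). MFCCP identifies recurring concepts by computing the degree of difference between the main features, and their impact factors, of different batches of samples. It maintains and promptly updates one classifier per concept, and uses the Hoeffding inequality to select the most suitable classifier for the current sample set, which improves its responsiveness to concept drift. Experiments on three datasets show that, on datasets with recurring concept drift, MFCCP clearly outperforms the four comparison algorithms in classification accuracy, responsiveness to concept drift, and accuracy of concept drift detection. MFCCP is also applicable to classifying data streams without recurring concept drift.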