The paper is devoted to optimizing the data structure in classification and clustering problems by mapping the original data onto a set of ordered feature vectors. When ordering, the elements of each feature vector receive new numbers such that their values are arranged in non-decreasing order. With the updated structure, the bulk of the computational operations is performed not on the multidimensional quantities describing objects but on one-dimensional ones, namely the values of the objects' individual features. Then, instead of a rather complex existing algorithm, the same simple algorithm is applied repeatedly. The transition from the original to the ordered data decreases the entropy of the data distribution, which allows us to reveal the properties of the data. It was shown that the classes differ in the functions of feature values for ordered object numbers. The set of these functions represents the information contained in the training sample and allows one to calculate the class of any object in the test sample from the values of its features using the simplest total probability formula. The paper also discusses using the ordered data matrix to solve problems of partitioning a set into clusters of objects that have common properties.
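The ordering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `order_features` and the toy data are hypothetical, and the sketch covers only the renumbering of objects per feature, not the class-probability calculation.

```python
def order_features(rows):
    """Sort every feature column independently in non-decreasing order.

    rows: list of objects, each a list of feature values.
    Returns (order, ordered), both indexed by feature:
      order[j][k]   = original index of the object at position k for feature j
      ordered[j][k] = value of feature j at that position
    """
    n_features = len(rows[0])
    order, ordered = [], []
    for j in range(n_features):
        column = [row[j] for row in rows]
        # sorted() is stable, so objects with equal values keep their original order
        idx = sorted(range(len(column)), key=lambda i: column[i])
        order.append(idx)
        ordered.append([column[i] for i in idx])
    return order, ordered


# Toy data matrix: three objects, two features (illustrative values only)
X = [[3.0, 10.0],
     [1.0, 30.0],
     [2.0, 20.0]]
order, ordered = order_features(X)
print(order)    # new numbering of the objects, one list per feature
print(ordered)  # feature values in non-decreasing order
```

Each feature is processed as a one-dimensional sequence, which reflects the abstract's point that the same simple operation is repeated per feature instead of working on multidimensional quantities.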
In multi-dimensional classification (MDC), the semantics of objects are characterized by multiple class spaces from different dimensions. Most MDC approaches try to explicitly model the dependencies among class spaces in the output space. In contrast, the recently proposed feature augmentation strategy, which aims at manipulating the feature space, has also been shown to be an effective solution for MDC. However, existing feature augmentation approaches focus only on designing holistic augmented features to be appended to the original features, while better generalization performance could be achieved by exploiting multiple kinds of augmented features. In this paper, we propose a selective feature augmentation strategy that focuses on synergizing multiple kinds of augmented features. Specifically, by assuming that only part of the augmented features is pertinent and useful for each dimension's model induction, we derive a classification model that fully utilizes the original features while conducting feature selection on the augmented features. To validate the effectiveness of the proposed strategy, we generate three kinds of simple augmented features based on standard kNN, weighted kNN, and maximum margin techniques, respectively. Comparative studies show that the proposed strategy achieves superior performance against both state-of-the-art MDC approaches and its degenerated versions with either kind of augmented features.
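As a rough illustration of what a kNN-based augmented feature might look like, the sketch below appends, for each class space, the number of nearest neighbours carrying each label. The abstract does not specify how the features are built, so the counting scheme, the function name `knn_augment`, and the toy data are all assumptions; the feature selection step is not shown.

```python
import math
from collections import Counter


def knn_augment(X_train, Y_train, x, k, class_values):
    """Append per-dimension neighbour label counts to the feature vector x.

    Y_train[i][d] is the label of training object i in class space d;
    class_values[d] lists the possible labels of class space d.
    (Assumed construction, not taken from the paper.)
    """
    # Indices of the k training objects closest to x (Euclidean distance)
    by_distance = sorted(range(len(X_train)),
                         key=lambda i: math.dist(X_train[i], x))
    neighbours = by_distance[:k]

    augmented = list(x)  # the original features are kept unchanged
    for d, values in enumerate(class_values):
        counts = Counter(Y_train[i][d] for i in neighbours)
        augmented.extend(counts[v] for v in values)
    return augmented


# Toy data: four objects, two features, two class spaces (illustrative only)
X_train = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
Y_train = [["a", "x"], ["a", "y"], ["b", "y"], ["b", "y"]]
class_values = [["a", "b"], ["x", "y"]]

features = knn_augment(X_train, Y_train, [0.05, 0.0], 2, class_values)
print(features)
```

Under the selective strategy described above, a model for each class dimension would keep all original features and select only a subset of such appended columns.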
Funding: supported by the National Science Foundation of China (No. 62176055) and the China University S&T Innovation Plan Guided by the Ministry of Education.