Abstract: Classification is a data mining process that learns from data to predict predetermined target classes accurately. This study discusses data classification using a fuzzy soft set method and aims to form a data classification algorithm based on that method. In this study, the fuzzy soft set was calculated based on the normalized Hamming distance. Each parameter in this method is mapped to a power set from a subset of the fuzzy set using a fuzzy approximation function. In the classification step, a generalized normalized Euclidean distance is used to determine the similarity between two fuzzy soft sets. The experiments used the University of California, Irvine (UCI) Machine Learning dataset to assess the accuracy of the proposed data classification method. The dataset samples were divided into training (75% of samples) and test (25% of samples) sets. Experiments were performed in MATLAB R2010a. The experiments showed that: (1) ordered from fastest, the methods are matching function, distance measure, similarity, and normalized Euclidean distance; (2) the proposed approach can improve accuracy and recall by up to 10.3436% and 6.9723%, respectively, compared with baseline techniques. Hence, the fuzzy soft set method is appropriate for classifying data.
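The distance-based classification step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the min-max scaling of features into membership degrees, and the use of per-class mean vectors as class prototypes are all assumptions made for illustration. Only the normalized Hamming and normalized Euclidean distances are named in the abstract.

```python
import math


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so values can act as membership degrees."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def normalized_hamming(a, b):
    """Mean absolute difference between two membership vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def normalized_euclidean(a, b):
    """Root of the mean squared difference between two membership vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def class_prototypes(rows, labels):
    """Assumed class model: the average membership vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(sample, prototypes, distance=normalized_euclidean):
    """Pick the class whose prototype has the highest similarity 1 - distance."""
    return max(prototypes, key=lambda label: 1.0 - distance(sample, prototypes[label]))
```

A sample passed to `classify` is expected to be scaled to [0, 1] already. Passing `distance=normalized_hamming` swaps the similarity measure without changing the rest of the sketch.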
Funding: Supported by the Future Resilient Systems (FRS-II) Project at the Singapore-ETH Centre (SEC), which was funded by the National Research Foundation of Singapore (NRF) under its Campus for Research Excellence and Technological Enterprise (CREATE) program.
Abstract: This paper proposes a robust faulted line-section location method based on the normalized quantile Hausdorff distance (NQHD) algorithm for detecting single-phase-to-ground faults in distribution networks. The faulted line section is determined according to the characteristic differences between the zero-sequence currents on the faulted and healthy line sections. Specifically, the zero-sequence currents at both ends of a healthy line section are highly similar to each other, while this is generally not the case on a faulted line section. The NQHD algorithm can disregard extremes or outliers while also providing a normalized scaling in different scenarios. Thus, it can be applied to calculate a robust similarity between the zero-sequence current waveforms at both ends of each line section, so that the faulted line section is identified reliably even under the interference of outliers. The results demonstrate the good performance of the proposed method in detecting single-phase-to-ground faults under different fault conditions. Comparative tests with existing methods confirm the robustness of the proposed method against the impacts of outliers and noise.
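The abstract does not give the NQHD formula, so the following Python sketch only illustrates the general idea it describes: a Hausdorff-style distance in which the maximum is replaced by a quantile (so a few outliers are ignored) and the waveforms are rescaled to a common range. The function names, the nearest-rank quantile, and the normalization by the shared peak absolute amplitude are assumptions, not the paper's definition.

```python
import math


def directed_quantile_distance(a, b, q):
    """q-quantile (nearest rank) of each point's distance to its nearest point in b."""
    nearest = sorted(min(math.dist(p, r) for r in b) for p in a)
    rank = max(math.ceil(q * len(nearest)) - 1, 0)
    return nearest[rank]


def quantile_hausdorff(a, b, q=0.9):
    """Symmetric quantile Hausdorff distance; q = 1.0 gives the classical version."""
    return max(
        directed_quantile_distance(a, b, q),
        directed_quantile_distance(b, a, q),
    )


def waveform_points(samples, scale):
    """Turn a sampled waveform (at least two samples) into (time, amplitude) points.

    Time is mapped to [0, 1] and amplitude is divided by `scale`.
    """
    n = len(samples)
    return [(i / (n - 1), v / scale) for i, v in enumerate(samples)]


def normalized_quantile_hausdorff(x, y, q=0.9):
    """Assumed normalization: divide amplitudes by the shared peak absolute value."""
    scale = max(max(abs(v) for v in x), max(abs(v) for v in y)) or 1.0
    return quantile_hausdorff(waveform_points(x, scale), waveform_points(y, scale), q)
```

Under these assumptions, a single spike in one of two otherwise identical waveforms dominates the result at `q=1.0` but is ignored at `q=0.9`, which is the outlier-rejection behavior the abstract attributes to NQHD.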
Funding: Supported by the National Natural Science Foundation of China (61133012, 61202193, 61373108), the Major Projects of the National Social Science Foundation of China (11&ZD189), the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722), and the Open Foundation of Shandong Key Laboratory of Language Resource Development and Application.
Abstract: In this paper we propose a multiple feature approach for the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) concept unique identifier (CUI). We develop a two-step method: first, acquire a list of candidate CUIs and their associated preferred names using the UMLS API; second, choose the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
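The second step, choosing the closest candidate for a mention, can be sketched in Python as follows. This is a simplified stand-in, not the paper's method: it computes only a few string features and combines them with fixed, hand-picked weights instead of the trained classifier the abstract describes. The function names, the weights, and the placeholder candidate identifiers are hypothetical, and no UMLS API call is made.

```python
from difflib import SequenceMatcher


def string_features(mention, name):
    """A few illustrative string features between a mention and a preferred name."""
    m, n = mention.lower(), name.lower()
    m_tokens, n_tokens = set(m.split()), set(n.split())
    union = m_tokens | n_tokens
    return {
        "exact": 1.0 if m == n else 0.0,
        "char_ratio": SequenceMatcher(None, m, n).ratio(),
        "token_jaccard": len(m_tokens & n_tokens) / len(union) if union else 0.0,
    }


def score(features):
    """Fixed weighted sum; the weights are arbitrary stand-ins for a trained model."""
    return (
        0.5 * features["exact"]
        + 0.3 * features["char_ratio"]
        + 0.2 * features["token_jaccard"]
    )


def normalize_mention(mention, candidates):
    """Return the identifier of the best-scoring (identifier, preferred_name) pair."""
    best = max(candidates, key=lambda c: score(string_features(mention, c[1])))
    return best[0]


# Placeholder identifiers, not real CUIs.
candidates = [
    ("ID_1", "myocardial infarction"),
    ("ID_2", "cerebral infarction"),
    ("ID_3", "heart failure"),
]
print(normalize_mention("Myocardial Infarction", candidates))
```

In a fuller version, the feature dictionary would also carry ranking, similarity, and contextual features, and `score` would be replaced by a classifier trained on annotated mentions.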
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60572084 and 60621062.
Abstract: In a question answering (QA) system, the fundamental problem is how to measure the distance between a question and an answer, and hence how to rank different answers. We demonstrate that such a distance can be precisely and mathematically defined. Not only is such a definition possible, it is provably better than any other feasible definition. Moreover, it can be conveniently and fruitfully applied to construct a QA system. We have built such a system, QUANTA. Extensive experiments are conducted to justify the new theory.