Discriminant space defining area classes is an important conceptual construct for uncertainty characterization in area-class maps. Discriminant models were promoted as they can enhance consistency in area-class mapping and replicability in error modeling. As area classes are rarely completely separable in empirically realized discriminant space, where class inseparability becomes more complicated for change categorization, we seek to quantify uncertainty in area classes (and change classes) due to measurement errors and semantic discrepancy separately and hence assess their relative margins objectively. Experiments using real datasets were carried out, and a Bayesian method was used to obtain change maps. We found that there are large differences between uncertainty statistics referring to data classes and information classes. Therefore, uncertainty characterization in change categorization should be based on discriminant modeling of measurement errors and semantic mismatch analysis, enabling quantification of uncertainty due to partially random measurement errors and systematic categorical discrepancies, respectively.
It is important to describe misclassification errors in land cover maps and to quantify their propagation through geo-processing to resultant information products, such as land cover change maps. Geostatistical simulation is widely used in error modeling, as it can generate equally probable realizations of the fields being considered, which can be summarized to facilitate error propagation analysis. To fix noninvariance in indicator simulation, discriminant space-based methods were proposed to enhance consistency in area-class mapping and replicability in uncertainty modeling: the former is achieved by imposing means, while the latter is ensured by projecting spatio-temporally correlated residuals in discriminant space to geographic space through a mapping process. This paper explores discriminant models for error propagation in land cover change detection, followed by experiments based on bi-temporal remote sensing images. It was found that misclassification error propagation is effectively characterized with discriminant covariate-based stochastic simulation, where spatio-temporal interdependence is taken into account.
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications raises two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has three operational stages: pre-processing, concept drift detection (CDD), and classification. The model addresses the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). The Glowworm Swarm Optimization (GSO) algorithm is employed to determine the oversampling rates and neighboring point values of SMOTE. In addition, the Statistical Test of Equal Proportions (STEPD), a CDD technique, is utilized. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and to compute the optimum parameters for the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the model was evaluated using high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation process confirmed the effective outcome of the MOMBD-CDD model. The proposed model attained high accuracy of 97.45% and 94.23% on the applied KDDCup99 and ECUE Spam datasets, respectively.
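The oversampling step named in the abstract above can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the paper's implementation: the function name `smote`, the toy data, and the hand-picked values of `n_new` and `k` are assumptions for illustration, whereas the abstract states that GSO selects the oversampling rate and the neighbor count.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic

# Hypothetical two-feature minority class (values chosen by hand).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=3, k=2)
```

Each synthetic point lies on the segment between two existing minority samples, so the new points stay inside the region the minority class already occupies.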
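Automatic safe lane changing is key to achieving driverless vehicles. To accurately identify the lane-changing state of moving vehicles and ensure driving safety, a vehicle lane-change recognition model based on a multi-class support vector machine (Multi-class Support Vector Machine, Multiclass SVM) was designed. Vehicle trajectory data from US Highway 101 in the NGSIM dataset were selected and classified, and the lane-changing process was divided into a car-following stage, a lane-change preparation stage, and a lane-change execution stage. Grid search combined with particle swarm optimization (Grid Search-PSO) was used to optimize and calibrate the penalty parameter C and the kernel parameter g of the SVM model. The multi-class SVM lane-change recognition model was trained and tested on the sample data, reaching a test accuracy of 97.68%. The study shows that the model identifies vehicle behavior states during lane changing well and supports research on the stages of vehicle lane changing.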
We all live on one planet and geology has no borders. Countries that reside on different continents share the same architecture beneath the surface; they were once neighbors with common foundations. Interoperable geological data are now freely available to everyone for the benefit of society, demonstrating that geoscience can address both global and regional problems. Whilst increasingly large datasets ("Big Data") provide clear opportunities (e.g., Spina, 2018).
A new classification algorithm for web mining is proposed, building on general classification algorithms for data mining, in order to implement personalized information services. A tree-building method that detects class thresholds is used to construct the decision tree according to the concept of user expectation, so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the overfitting (excessive adaptation) of C4.5 is reduced, so that classification results are not only considerably more accurate but also statistically meaningful.
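For context on the baseline mentioned above, C4.5 selects splits by gain ratio. The sketch below computes that criterion for a single categorical attribute. It illustrates only the C4.5 baseline, not the class-threshold method the abstract proposes, and the function names and toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of splitting labels by a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical data: the first attribute separates the classes perfectly,
# the second carries no information about them.
labels = ["yes", "yes", "no", "no"]
perfect = gain_ratio(["a", "a", "b", "b"], labels)
useless = gain_ratio(["a", "b", "a", "b"], labels)
```

An attribute with a higher gain ratio would be preferred at a node; a tree grown this way without pruning or thresholds can overfit, which is the weakness the abstract says the proposed method reduces.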
In daily life, we are frequently attacked by infectious organisms such as bacteria and viruses. Major Histocompatibility Complex (MHC) molecules have an essential role in T-cell activation and in initiating an adaptive immune response. Development of methods for predicting MHC-peptide binding is important in vaccine design and immunotherapy. In this study, we try to predict the binding between peptides and MHC class II. Support vector machine (SVM) and Multi-Layer Perceptron (MLP) classifiers are used. These classifiers operate on pseudo amino acid compositions of the data, which we extracted from the PseAAC server. Since the dataset used in this work is imbalanced, we apply a pre-processing step that over-samples the minority class to overcome this problem. The results show that using the concept of pseudo amino acid composition and applying the over-sampling method increases the performance of the predictor. Furthermore, the results demonstrate that combining PseAAC and SVM is a successful method for the prediction of MHC class II molecules.
Most GIS databases contain data errors. The quality of the data sources, such as traditional paper maps or more recent remote sensing data, determines spatial data quality. In the past several decades, different statistical measures have been developed to evaluate data quality for different types of data, such as nominal categorical data, ordinal categorical data and numerical data. Although these methods were originally proposed for medical research or psychological research, they have been widely used to evaluate spatial data quality. In this paper, we first review statistical methods for evaluating data quality, discuss under what conditions we should use them and how to interpret the results, followed by a brief discussion of statistical software and packages that can be used to compute these data quality measures.
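The abstract above does not name specific measures. As one illustration for nominal categorical data, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic, between reference labels and mapped labels. The choice of kappa, the function name, and the toy land cover labels are assumptions for illustration, not content from the paper.

```python
from collections import Counter

def cohens_kappa(reference, mapped):
    """Chance-corrected agreement between two nominal labelings.
    Undefined (division by zero) when expected agreement equals 1."""
    n = len(reference)
    # Proportion of samples where the two labelings agree.
    observed = sum(r == m for r, m in zip(reference, mapped)) / n
    ref_counts = Counter(reference)
    map_counts = Counter(mapped)
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(ref_counts[c] * map_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ground-truth samples versus labels read from a map.
reference = ["forest", "forest", "water", "urban", "water", "forest"]
mapped = ["forest", "water", "water", "urban", "water", "forest"]
kappa = cohens_kappa(reference, mapped)
```

Here five of six samples agree, but part of that agreement would occur by chance given the class frequencies, so kappa is lower than the raw agreement of 5/6.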
Funding: Supported by the National Natural Science Foundation of China (Nos. 41171346, 41071286), the Fundamental Research Funds for the Central Universities (No. 20102130103000005), and the National 973 Program of China (No. 2007CB714402-5).
Funding: Supported by the National Natural Science Foundation of China (Nos. 41071286 & 41171346) and the Hubei Provincial Science and Technology Department (2007ABA276).
Funding: Granted by the National Natural Science Foundation of China (Grant Nos. 41572154, 41820104004), the National Key R&D Plan (Grant No. 2017YFC0601405), and the Strategic Priority Research Program (B) of the Chinese Academy of Sciences (Grant No. XDB18000000).