Journal Articles
A total of 1,741 articles found.
1. Adaptive feature selection method for high-dimensional imbalanced data classification
Authors: WU Jianzhen, XUE Zhen, ZHANG Liangliang, YANG Xu. Journal of Measurement Science and Instrumentation, 2025, Issue 4, pp. 612-624 (13 pages).
Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance. To address the problem of low classification accuracy for minority-class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data, we proposed a novel feature selection method named AMF-SGSK, based on an adaptive multi-filter and subspace-based gaining-sharing knowledge. Firstly, a balanced dataset was obtained by random under-sampling. Secondly, combining the feature importance score with the AUC score for each filter method, we proposed a concept called feature hardness to judge the importance of features, which could adaptively select the essential features. Finally, the optimal feature subset was obtained by gaining-sharing knowledge in multiple subspaces. This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data. Experimental results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean values of F1-score, AUC, and G-mean for AMF-SGSK are 0.950, 0.967, and 0.965, respectively, the highest among all algorithms, and its mean G-mean is higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio is below 0.01 across the ten selected datasets, further demonstrating the proposed method's overall superiority over competing approaches. AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
Keywords: high-dimensional imbalanced data; adaptive feature selection; adaptive multi-filter; feature hardness; gaining-sharing knowledge based algorithm; metaheuristic algorithm
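The abstract above combines per-filter feature importance with each filter's AUC into a "feature hardness" score after random under-sampling. A minimal Python sketch of that idea follows; the choice of filters, the normalisation, and the AUC-weighted sum are illustrative assumptions, and `feature_hardness` is a hypothetical name rather than code from the paper.

```python
# Hedged sketch: weight several filter scores by each filter's AUC to rank features,
# in the spirit of the "feature hardness" idea; not the paper's implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def feature_hardness(X, y, top_k=50, random_state=0):
    rng = np.random.default_rng(random_state)
    # Random under-sampling of the majority class (label 0 assumed majority)
    idx_min = np.where(y == 1)[0]
    idx_maj = rng.choice(np.where(y == 0)[0], size=len(idx_min), replace=False)
    idx = np.concatenate([idx_min, idx_maj])
    Xb, yb = X[idx], y[idx]

    filters = {
        "mutual_info": mutual_info_classif(Xb, yb),
        "anova_f": f_classif(Xb, yb)[0],
    }
    hardness = np.zeros(X.shape[1])
    for name, scores in filters.items():
        scores = (scores - scores.min()) / (np.ptp(scores) + 1e-12)  # normalise to [0, 1]
        # Weight the filter by the AUC of a simple model trained on its top-ranked features
        top = np.argsort(scores)[::-1][:top_k]
        Xtr, Xte, ytr, yte = train_test_split(Xb[:, top], yb, test_size=0.3,
                                              random_state=random_state, stratify=yb)
        proba = LogisticRegression(max_iter=1000).fit(Xtr, ytr).predict_proba(Xte)[:, 1]
        hardness += roc_auc_score(yte, proba) * scores
    return hardness  # higher scores mark features worth keeping before the GSK search
```

Features with the highest hardness scores would then form the candidate pool handed to the subspace-based gaining-sharing knowledge search.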
2. Diversity, Complexity, and Challenges of Viral Infectious Disease Data in the Big Data Era: A Comprehensive Review (Cited by: 1)
Authors: Yun Ma, Lu-Yao Qin, Xiao Ding, Ai-Ping Wu. Chinese Medical Sciences Journal, 2025, Issue 1, pp. 29-44, I0005 (17 pages).
Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access them, and how to utilize them most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Keywords: viral infectious diseases; big data; data diversity and complexity; data standardization; artificial intelligence; data analysis
3. Similarity measurement method of high-dimensional data based on normalized net lattice subspace (Cited by: 4)
Authors: 李文法, Wang Gongming, Li Ke, Huang Su. High Technology Letters (EI, CAS), 2017, Issue 2, pp. 179-184 (6 pages).
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that the data difference in sparse and noisy dimensions occupies a large proportion of the similarity, distorting the dissimilarities between any two results. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with dimensionality and is approximately two to three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which makes it suitable for similarity analysis after dimensionality reduction.
Keywords: high-dimensional data; curse of dimensionality; similarity; normalization; subspace; NPsim
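As a rough illustration of the interval-based similarity the abstract describes (each dimension's range split into intervals, with only same-interval or adjacent-interval components contributing), here is a hedged sketch; the interval count, the per-dimension contribution, and the name `lattice_similarity` are assumptions, not the paper's NPsim definition.

```python
# Grid/lattice-based similarity sketch: components falling in the same or an
# adjacent interval of their dimension contribute; all others are ignored.
import numpy as np

def lattice_similarity(x, y, data_min, data_max, n_intervals=10):
    width = (data_max - data_min) / n_intervals
    ix = np.floor((x - data_min) / (width + 1e-12)).astype(int)
    iy = np.floor((y - data_min) / (width + 1e-12)).astype(int)
    close = np.abs(ix - iy) <= 1                          # same or adjacent interval
    per_dim = 1.0 - np.abs(x - y) / (data_max - data_min + 1e-12)
    return np.where(close, per_dim, 0.0).sum() / len(x)   # normalised into [0, 1]
```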
4. Generalized Functional Linear Models: Efficient Modeling for High-dimensional Correlated Mixture Exposures
Authors: Bingsong Zhang, Haibin Yu, Xin Peng, Haiyi Yan, Siran Li, Shutong Luo, Renhuizi Wei, Zhujiang Zhou, Yalin Kuang, Yihuan Zheng, Chulan Ou, Linhua Liu, Yuehua Hu, Jindong Ni. Biomedical and Environmental Sciences, 2025, Issue 8, pp. 961-976 (16 pages).
Objective: Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health. Analysis of these mixture exposures presents several key challenges for environmental epidemiology and risk assessment, including high dimensionality, correlated exposures, and subtle individual effects. Methods: We proposed a novel statistical approach, the generalized functional linear model (GFLM), to analyze the health effects of exposure mixtures. The GFLM treats the effect of mixture exposures as a smooth function by reordering exposures based on specific mechanisms and capturing internal correlations to provide a meaningful estimation and interpretation. Its robustness and efficiency were evaluated under various scenarios through extensive simulation studies. Results: We applied the GFLM to two datasets from the National Health and Nutrition Examination Survey (NHANES). In the first application, we examined the effects of 37 nutrients on BMI (2011–2016 cycles). The GFLM identified a significant mixture effect, with fiber and fat emerging as the nutrients with the greatest negative and positive effects on BMI, respectively. In the second application, we investigated the association between four per- and polyfluoroalkyl substances (PFAS) and gout risk (2007–2018 cycles). Unlike traditional methods, the GFLM indicated no significant association, demonstrating its robustness to multicollinearity. Conclusion: The GFLM framework is a powerful tool for mixture exposure analysis, offering improved handling of correlated exposures and interpretable results. It demonstrates robust performance across various scenarios and real-world applications, advancing our understanding of complex environmental exposures and their health impacts in environmental epidemiology and toxicology.
Keywords: mixture exposure modeling; functional data analysis; high-dimensional data; correlated exposures; environmental epidemiology
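To make the functional-linear-model idea concrete, the sketch below treats reordered exposures as discrete evaluations of a smooth curve and expands the coefficient function in a small Legendre basis; the basis choice, the ordinary least squares fit, and the name `flm_fit` are illustrative assumptions (the paper's GFLM handles generalized outcomes and other details not reproduced here).

```python
# Functional linear model sketch: exposures are reordered along a mechanism-based
# index, and the coefficient function beta(s) is expanded in a Legendre basis.
import numpy as np

def flm_fit(X, y, order_idx, n_basis=5):
    Xo = X[:, order_idx]                    # reorder exposures along the chosen index
    p = Xo.shape[1]
    s = np.linspace(0, 1, p)
    B = np.vstack([np.polynomial.legendre.Legendre.basis(k)(2 * s - 1)
                   for k in range(n_basis)]).T
    Z = Xo @ B / p                          # approximate integral of x(s) * b_k(s) ds
    Z1 = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return coef[0], B @ coef[1:]            # intercept and estimated beta(s) on the grid s
```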
5. Nonlocality Distillation and Trivial Communication Complexity for High-Dimensional Systems
Authors: 李艳, 叶向军, 陈景灵. Chinese Physics Letters (SCIE, CAS, CSCD), 2016, Issue 8, pp. 5-9 (5 pages).
A nonlocality distillation protocol for arbitrary high-dimensional systems is proposed. We study nonlocality distillation in the 2-input d-output bipartite case. Firstly, we give the one-parameter nonlocal boxes and their corresponding distillation protocol. Then, we generalize the one-parameter nonlocality distillation protocol to the two-parameter case. Furthermore, we introduce a contracting protocol demonstrating that the 2-input d-output nonlocal boxes make communication complexity trivial.
Keywords: nonlocality distillation; trivial communication complexity; high-dimensional systems
6. A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
Authors: 李文法, Wang Gongming, Ma Nan, Liu Hongzhe. High Technology Letters (EI, CAS), 2016, Issue 3, pp. 241-247 (7 pages).
Problems exist in similarity measurement and index tree construction that affect the performance of nearest neighbor search on high-dimensional data. The equidistance problem is solved by using the NPsim function to calculate similarity, and a sequential NPsim matrix is built to improve indexing performance. Building on these innovations, a nearest neighbor search algorithm for high-dimensional data based on the sequential NPsim matrix is proposed and compared with nearest neighbor search algorithms based on the KD-tree and SR-tree on the Munsell spectral dataset. Experimental results show that the similarity of the proposed algorithm is better than that of the other algorithms and its search speed is more than a thousand times faster. In addition, the slow construction of the sequential NPsim matrix can be accelerated by parallel computing.
Keywords: nearest neighbor search; high-dimensional data; similarity; indexing tree; NPsim; KD-tree; SR-tree; Munsell
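In the spirit of the sequential similarity-matrix idea above, a minimal sketch follows: pairwise similarities are precomputed and each row is sorted once, so querying the neighbours of a stored point is a simple lookup. Cosine similarity stands in for NPsim, whose definition is not reproduced in this listing, and the function names are hypothetical.

```python
# Precompute a similarity matrix and a row-sorted neighbour order, then answer
# k-nearest-neighbour queries for stored points by lookup.
import numpy as np

def build_sorted_sim(X):
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T                                # pairwise similarity (cosine stand-in)
    order = np.argsort(-S, axis=1)               # neighbours of each point, best first
    return S, order

def knn_query(order, i, k=5):
    return [j for j in order[i] if j != i][:k]   # k nearest neighbours of stored point i
```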
7. A State-Migration Particle Swarm Optimizer for Adaptive Latent Factor Analysis of High-Dimensional and Incomplete Data
Authors: Jiufang Chen, Kechen Liu, Xin Luo, Ye Yuan, Khaled Sedraoui, Yusuf Al-Turki, MengChu Zhou. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 11, pp. 2220-2235 (16 pages).
High-dimensional and incomplete (HDI) matrices are primarily generated in all kinds of big-data-related practical applications. A latent factor analysis (LFA) model is capable of conducting efficient representation learning of an HDI matrix, and its hyper-parameter adaptation can be implemented through a particle swarm optimizer (PSO) to meet scalability requirements. However, conventional PSO is limited by premature convergence, which leads to accuracy loss in the resultant LFA model. To address this thorny issue, this study merges the information of each particle's state migration into its evolution process, following the principle of a generalized momentum method, to improve its search ability, thereby building a state-migration particle swarm optimizer (SPSO), whose theoretical convergence is rigorously proved in this study. SPSO is then incorporated into an LFA model to implement efficient hyper-parameter adaptation without accuracy loss. Experiments on six HDI matrices indicate that an SPSO-incorporated LFA model outperforms state-of-the-art LFA models in terms of prediction accuracy for the missing data of an HDI matrix, with competitive computational efficiency. Hence, SPSO's use ensures efficient and reliable hyper-parameter adaptation in an LFA model, thus ensuring practicality and accurate representation learning for HDI matrices.
Keywords: data science; generalized momentum; high-dimensional and incomplete (HDI) data; hyper-parameter adaptation; latent factor analysis (LFA); particle swarm optimization (PSO)
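A hedged sketch of a PSO velocity update augmented with a momentum-style term built from each particle's previous state change ("state migration") is given below; the exact form of that term and its weight `lam` are assumptions for illustration, not the SPSO update rule from the paper.

```python
# One illustrative PSO step with an extra state-migration (generalized momentum) term.
import numpy as np

def spso_step(x, v, prev_x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, lam=0.1, rng=None):
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    migration = x - prev_x                       # particle's previous state change
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x) + lam * migration
    return x + v_new, v_new, x                   # new position, new velocity, new prev_x
```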
8. Censored Composite Conditional Quantile Screening for High-Dimensional Survival Data
Authors: LIU Wei, LI Yingqiu. 应用概率统计 (Chinese Journal of Applied Probability and Statistics) (CSCD, PKU Core), 2024, Issue 5, pp. 783-799 (17 pages).
In this paper, we introduce the censored composite conditional quantile coefficient (cCCQC) to rank the relative importance of each predictor in high-dimensional censored regression. The cCCQC takes advantage of all useful information across quantiles and can effectively detect nonlinear effects, including interactions and heterogeneity. Furthermore, the proposed screening method based on the cCCQC is robust to the existence of outliers and enjoys the sure screening property. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors, particularly when the variables are highly correlated.
Keywords: high-dimensional survival data; censored composite conditional quantile coefficient; sure screening property; rank consistency property
9. Dimensionality Reduction of High-Dimensional Highly Correlated Multivariate Grapevine Dataset
Authors: Uday Kant Jha, Peter Bajorski, Ernest Fokoue, Justine Vanden Heuvel, Jan van Aardt, Grant Anderson. Open Journal of Statistics, 2017, Issue 4, pp. 702-717 (16 pages).
Viticulturists traditionally have a keen interest in studying the relationship between the biochemistry of grapevines' leaves/petioles and their associated spectral reflectance in order to understand fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data for the reflective 330-2510 nm wavelength region (986 total spectral bands) to assess vineyard nutrient status; this constitutes a high-dimensional dataset with a covariance matrix that is ill-conditioned. The identification of the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have successfully developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to overcome this problem. This paper compares four regularized and one functional regression method: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis for wavelength variable selection. Thereafter, the predictive performance of these regularized sparse models is enhanced using stepwise regression. This comparative study of regression methods using a high-dimensional and highly correlated grapevine hyperspectral dataset revealed that Elastic Net variable selection yields the best predictive ability.
Keywords: high-dimensional data; multi-step adaptive elastic net; minimax concave penalty; sure independence screening; functional data analysis
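A minimal sketch of the winning pipeline described above, Elastic Net for wavelength selection followed by a refit on the retained bands, could look as follows; the hyper-parameter grid and the ordinary-least-squares refit are illustrative choices rather than the study's exact settings.

```python
# Elastic Net band selection followed by a refit on the selected wavelengths.
import numpy as np
from sklearn.linear_model import ElasticNetCV, LinearRegression

def select_and_refit(X, y):
    enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5, max_iter=10000).fit(X, y)
    selected = np.flatnonzero(enet.coef_)            # wavelength bands with non-zero weight
    if selected.size == 0:                           # fall back to the strongest band
        selected = np.argsort(-np.abs(enet.coef_))[:1]
    refit = LinearRegression().fit(X[:, selected], y)
    return selected, refit
```

A stepwise search over the selected bands, as mentioned in the abstract, could then prune this subset further.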
10. Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
Authors: Meiyin Wang, Wanzhou Ye. Advances in Pure Mathematics, 2024, Issue 4, pp. 214-227 (14 pages).
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under a given norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and its minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
Keywords: high-dimensional covariance matrix; missing data; sub-Gaussian noise; optimal estimation
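The sketch below shows a common simplified variant of the estimator family discussed above: a pairwise-complete sample covariance computed from data with missing entries, followed by hard thresholding of the off-diagonal entries at a level `lam`; the paper's generalized sample covariance, noise correction, and rate-optimal threshold are not reproduced here.

```python
# Pairwise-complete sample covariance with hard thresholding of off-diagonal entries.
import numpy as np

def hard_threshold_cov(X, lam):
    # X: n x p array with np.nan marking missing entries
    mask = ~np.isnan(X)
    Xc = np.where(mask, X - np.nanmean(X, axis=0), 0.0)
    counts = mask.T.astype(float) @ mask.astype(float)   # pairwise-complete counts
    S = (Xc.T @ Xc) / np.maximum(counts, 1.0)            # generalized sample covariance
    T = np.where(np.abs(S) >= lam, S, 0.0)               # hard thresholding
    np.fill_diagonal(T, np.diag(S))                      # keep the diagonal untouched
    return T
```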
11. Making Short-term High-dimensional Data Predictable
Author: CHEN Luonan. Bulletin of the Chinese Academy of Sciences, 2018, Issue 4, pp. 243-244 (2 pages).
Making accurate forecasts or predictions is a challenging task in the big data era, in particular for datasets involving high-dimensional variables but short-term time series points, which are generally available from real-world systems. To address this issue, Prof.
Keywords: RDE; making short-term high-dimensional data predictable
12. Lithological Discrimination of the Mafic-Ultramafic Complex, Huitongshan, Beishan, China: Using ASTER Data (Cited by: 11)
Authors: Lei Liu, Jun Zhou, Dong Jiang, Dafang Zhuang, Lamin R Mansaray. Journal of Earth Science (SCIE, CAS, CSCD), 2014, Issue 3, pp. 529-536 (8 pages).
The Beishan area hosts more than seventy mafic-ultramafic complexes sparsely distributed across the area and has great potential for mineral resources related to mafic-ultramafic intrusions. Many mafic-ultramafic intrusions, mostly small in size, have been omitted by previous work. This research takes Huitongshan, a major district for mafic-ultramafic occurrences in Beishan, as the study area. Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data were processed and interpreted for mapping the mafic-ultramafic complex. The ASTER data were processed by different techniques selected on the basis of image reflectance and laboratory emissivity spectra. The visible near-infrared (VNIR) and shortwave infrared (SWIR) data were transformed using band ratios and minimum noise fraction (MNF), while the thermal infrared (TIR) data were processed using the mafic index (MI) and principal components analysis (PCA). ASTER band ratios (6/8, 5/4, 2/1) in an RGB image and MNF (1, 2, 4) in an RGB image were powerful in distinguishing the subtle differences between the various rock units. PCA applied to all five bands of ASTER TIR imagery highlighted marked differences among the mafic rock units and was more effective than the MI in differentiating mafic-ultramafic rocks. Our results were consistent with information derived from local geological maps. Based on the remote sensing results and field inspection, eleven gabbroic intrusions and a pyroxenite occurrence were recognized for the first time. A new geologic map of the Huitongshan area was created by integrating the results of remote sensing, previous geological maps, and field inspection. It is concluded that the workflow of ASTER image processing, interpretation, and ground inspection has great potential for identifying mafic-ultramafic rocks and targeting relevant minerals in the sparsely vegetated arid regions of northwestern China.
Keywords: mafic-ultramafic complex; ASTER data; band ratio; minimum noise fraction; mafic index; principal component analysis
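For readers unfamiliar with the band-ratio composites mentioned above, here is a small sketch that builds the ASTER (6/8, 5/4, 2/1) ratio image mapped to R, G, B; the band ordering of the input array and the min-max stretch are assumptions for display, not part of the paper's workflow.

```python
# Build an RGB composite from ASTER band ratios 6/8, 5/4, and 2/1.
import numpy as np

def band_ratio_rgb(vnir_swir):
    # vnir_swir: (bands, rows, cols) reflectance array, ASTER bands 1-9 in order (assumed)
    b = {i + 1: vnir_swir[i].astype(float) for i in range(vnir_swir.shape[0])}
    ratios = [b[6] / (b[8] + 1e-6), b[5] / (b[4] + 1e-6), b[2] / (b[1] + 1e-6)]
    rgb = np.stack([(r - r.min()) / (r.max() - r.min() + 1e-12) for r in ratios], axis=-1)
    return rgb  # rows x cols x 3 image highlighting mafic-ultramafic units
```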
13. Randomized Latent Factor Model for High-dimensional and Sparse Matrices from Industrial Applications (Cited by: 14)
Authors: Mingsheng Shang, Xin Luo, Zhigang Liu, Jia Chen, Ye Yuan, MengChu Zhou. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, Issue 1, pp. 131-141 (11 pages).
Latent factor (LF) models are highly effective in extracting useful knowledge from high-dimensional and sparse (HiDS) matrices, which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may consume many iterations to achieve a local optimum, resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating the computational burden. It also extends a standard learning process for randomized neural networks to the context of LF analysis, so that the resulting model represents an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF achieves significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to the LF analysis of HiDS matrices, which is especially desirable for industrial applications demanding highly efficient models.
Keywords: big data; high-dimensional and sparse matrix; latent factor analysis; latent factor model; randomized learning
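A hedged sketch of the randomized-latent-factor idea follows: one factor matrix is drawn at random and kept fixed, and the other is obtained by per-column ridge least squares restricted to the observed entries of the HiDS matrix; the rank, regularisation, and the name `rlf_fit` are illustrative assumptions rather than the RLF model's exact learning rule.

```python
# Randomized latent factors: fix a random P, solve Q column-wise on observed entries only.
import numpy as np

def rlf_fit(R, mask, rank=20, lam=0.1, rng=None):
    # R: m x n rating-like matrix; mask: boolean array of observed entries
    rng = rng or np.random.default_rng(0)
    m, n = R.shape
    P = rng.standard_normal((m, rank)) / np.sqrt(rank)   # randomized, fixed row factors
    Q = np.zeros((n, rank))
    for j in range(n):                                    # closed-form ridge solve per column
        obs = mask[:, j]
        A = P[obs]
        Q[j] = np.linalg.solve(A.T @ A + lam * np.eye(rank), A.T @ R[obs, j])
    return P, Q                                           # missing entries predicted by P @ Q.T
```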
14. Lensless complex amplitude demodulation based on deep learning in holographic data storage (Cited by: 7)
Authors: Jianying Hao, Xiao Lin, Yongkun Lin, Mingyong Chen, Ruixian Chen, Guohai Situ, Hideyoshi Horimai, Xiaodi Tan. Opto-Electronic Advances (SCIE, EI, CAS, CSCD), 2023, Issue 3, pp. 42-56 (15 pages).
To increase the storage capacity in holographic data storage (HDS), the information to be stored is encoded into a complex amplitude. Fast and accurate retrieval of amplitude and phase from the reconstructed beam is necessary during data readout in HDS. In this study, we proposed a complex amplitude demodulation method based on deep learning from a single-shot diffraction intensity image and verified it with a non-interferometric lensless experiment demodulating four-level amplitude and four-level phase. By analyzing the correlation between the diffraction intensity features and the amplitude and phase encoding data pages, the inverse problem was decomposed into two backward operators, represented by two convolutional neural networks (CNNs), to demodulate amplitude and phase respectively. The experimental system is simple, stable, and robust, and it needs only a single diffraction image to realize the direct demodulation of both amplitude and phase. To our knowledge, this is the first time in HDS that multilevel complex amplitude demodulation has been achieved experimentally from one diffraction intensity image without iterations.
Keywords: holographic data storage; complex amplitude demodulation; deep learning; computational imaging
15. Source complexity of the 2016 M_W7.8 Kaikoura (New Zealand) earthquake revealed from teleseismic and InSAR data (Cited by: 4)
Authors: HaiLin Du, Xu Zhang, LiSheng Xu, WanPeng Feng, Lei Yi, Peng Li. Earth and Planetary Physics, 2018, Issue 4, pp. 310-326 (17 pages).
On November 13, 2016, an MW7.8 earthquake struck Kaikoura in the South Island of New Zealand. By means of back-projection of array recordings, ASTFs-analysis of global seismic recordings, and joint inversion of global seismic data and co-seismic InSAR data, we investigated the complexity of the earthquake source. The results show that the 2016 MW7.8 Kaikoura earthquake ruptured for about 100 s unilaterally from south to northeast (~N28°–33°E), producing a rupture area about 160 km long and about 50 km wide and releasing a scalar moment of 1.01×10²¹ N·m. In particular, the rupture area consisted of two slip asperities, one close to the initial rupture point with a maximal slip value of ~6.9 m and the other far away in the northeast with a maximal slip value of ~9.3 m. The first asperity slipped for about 65 s, and the second one started 40 s after the first had initiated; the two slipped simultaneously for about 25 s. Furthermore, the first had nearly pure thrust slip while the second had both thrust and strike slip. Interestingly, the rupture velocity was not constant, and the whole process may be divided into 5 stages in which the velocities were estimated to be 1.4 km/s, 0 km/s, 2.1 km/s, 0 km/s, and 1.1 km/s, respectively. The high-frequency sources were distributed nearly along the lower edge of the rupture area, the high-frequency radiation mainly occurred at the launching of the asperities, and it seems that no high-frequency energy was radiated when the rupture was about to stop.
Keywords: 2016 MW7.8 Kaikoura earthquake; back-projection of array recordings; ASTFs-analysis of global recordings; joint inversion of teleseismic and InSAR data; complexity of source
16. CSFW-SC: Cuckoo Search Fuzzy-Weighting Algorithm for Subspace Clustering Applying to High-Dimensional Clustering (Cited by: 1)
Authors: WANG Jindong, HE Jiajing, ZHANG Hengwei, YU Zhiyong. China Communications (SCIE, CSCD), 2015, Issue S2, pp. 55-63 (9 pages).
To address the issue that traditional clustering methods are not appropriate for high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented on the basis of existing soft subspace clustering algorithms. In the proposed algorithm, a novel objective function is first designed by considering the fuzzy-weighting within-cluster compactness and the between-cluster separation, and by loosening the constraints on the dimension weight matrix. Then gradual membership and an improved cuckoo search, a global search strategy, are introduced to optimize the objective function and search for subspace clusters, giving novel learning rules for clustering. Finally, the performance of the proposed algorithm in the clustering analysis of various low- and high-dimensional datasets is experimentally compared with that of several competitive subspace clustering algorithms. Experimental studies demonstrate that the proposed algorithm obtains better performance than most of the existing soft subspace clustering algorithms.
Keywords: high-dimensional data; clustering; soft subspace; cuckoo search; fuzzy clustering
17. A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection (Cited by: 1)
Authors: Yanlu Gong, Junhai Zhou, Quanwang Wu, MengChu Zhou, Junhao Wen. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, Issue 9, pp. 1834-1844 (11 pages).
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a length-fixed individual encoding, which is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to adaptively explore promising regions of the search space. Moreover, a dominance-based local search method is employed for further improvement. Experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
Keywords: bi-objective optimization; feature selection (FS); genetic algorithm; high-dimensional data; length-adaptive
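The two objectives such a bi-objective feature-selection search trades off can be evaluated as sketched below for a candidate subset mask: classification error (one minus cross-validated accuracy) and the number of selected features, both to be minimised; the classifier and cross-validation setting are illustrative choices, not those used for LA-NSGA.

```python
# Evaluate one candidate feature subset on the two objectives of bi-objective FS.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(mask, X, y):
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 1.0, 0                              # empty subsets are dominated anyway
    acc = cross_val_score(KNeighborsClassifier(), X[:, idx], y, cv=5).mean()
    return 1.0 - acc, int(idx.size)                # (error, n_features), both minimised
```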
18. Observation points classifier ensemble for high-dimensional imbalanced classification (Cited by: 1)
Authors: Yulin He, Xu Li, Philippe Fournier-Viger, Joshua Zhexue Huang, Mianjie Li, Salman Salloum. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, Issue 2, pp. 500-517 (18 pages).
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for dealing with HDIC problems.
Keywords: classifier ensemble; feature transformation; high-dimensional data classification; imbalanced learning; observation point mechanism
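A minimal sketch of the preprocessing step described above follows: MDS embeds the high-dimensional imbalanced data into a low-dimensional space that approximately preserves pairwise distances before classification. A plain tree classifier stands in for the observation-point ensemble, which is not sketched here; names and parameters are illustrative.

```python
# MDS dimensionality reduction followed by a stand-in imbalance-aware classifier.
from sklearn.manifold import MDS
from sklearn.tree import DecisionTreeClassifier

def mds_then_classify(X, y, n_components=3):
    Z = MDS(n_components=n_components, dissimilarity="euclidean",
            random_state=0).fit_transform(X)                 # distance-preserving embedding
    clf = DecisionTreeClassifier(class_weight="balanced", random_state=0).fit(Z, y)
    return Z, clf
```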
19. A Complexity Analysis and Entropy for Different Data Compression Algorithms on Text Files (Cited by: 1)
Authors: Mohammad Hjouj Btoush, Ziad E. Dawahdeh. Journal of Computer and Communications, 2018, Issue 1, pp. 301-315 (15 pages).
In this paper, we analyze the complexity and entropy of different data compression algorithms: LZW, Huffman, fixed-length code (FLC), and Huffman after using fixed-length code (HFLC). We test these algorithms on files of different sizes and conclude that LZW is the best across all compression scales tested, especially on large files, followed by Huffman, HFLC, and FLC, respectively. Data compression is still an important research topic and has many applications and uses. Therefore, we suggest continuing research in this field, for example by combining two techniques to obtain a better one, or by using another source mapping (Hamming), such as embedding a linear array into a hypercube, together with good techniques like Huffman coding, to reach good results.
Keywords: text files; data compression; Huffman coding; LZW; Hamming; entropy; complexity
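Such comparisons rest on the zeroth-order Shannon entropy of the source file, which lower-bounds the average code length per symbol achievable by symbol-by-symbol coders such as Huffman and FLC; a small sketch of that computation is given below.

```python
# Zeroth-order Shannon entropy of a byte sequence, in bits per symbol.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example: entropy of a small text sample
print(round(shannon_entropy(b"abracadabra"), 3))
```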