期刊文献+
共找到756篇文章
< 1 2 38 >
每页显示 20 50 100
Robust Latent Factor Analysis for Precise Representation of High-Dimensional and Sparse Data 被引量:5
1
作者 Di Wu Xin Luo 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2021年第4期796-805,共10页
High-dimensional and sparse(HiDS)matrices commonly arise in various industrial applications,e.g.,recommender systems(RSs),social networks,and wireless sensor networks.Since they contain rich information,how to accurat... High-dimensional and sparse(HiDS)matrices commonly arise in various industrial applications,e.g.,recommender systems(RSs),social networks,and wireless sensor networks.Since they contain rich information,how to accurately represent them is of great significance.A latent factor(LF)model is one of the most popular and successful ways to address this issue.Current LF models mostly adopt L2-norm-oriented Loss to represent an HiDS matrix,i.e.,they sum the errors between observed data and predicted ones with L2-norm.Yet L2-norm is sensitive to outlier data.Unfortunately,outlier data usually exist in such matrices.For example,an HiDS matrix from RSs commonly contains many outlier ratings due to some heedless/malicious users.To address this issue,this work proposes a smooth L1-norm-oriented latent factor(SL-LF)model.Its main idea is to adopt smooth L1-norm rather than L2-norm to form its Loss,making it have both strong robustness and high accuracy in predicting the missing data of an HiDS matrix.Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model not only is robust to the outlier data but also has significantly higher prediction accuracy than state-of-the-art models when they are used to predict the missing data of HiDS matrices. 展开更多
关键词 high-dimensional and sparse matrix L1-norm L2 norm latent factor model recommender system smooth L1-norm
在线阅读 下载PDF
Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data
2
作者 Haitao Zhang Wenhai Ma +1 位作者 Qilong Han Zhiqiang Ma 《国际计算机前沿大会会议论文集》 EI 2023年第1期192-206,共15页
This paper aims to address the problems of data imbalance,parame-ter adjustment complexity,and low accuracy in high-dimensional data anomaly detection.To address these issues,an autoencoder and data augmentation-based... This paper aims to address the problems of data imbalance,parame-ter adjustment complexity,and low accuracy in high-dimensional data anomaly detection.To address these issues,an autoencoder and data augmentation-based anomaly detection model for high-dimensional sparse data is proposed(SEAOD).First,the model solves the problem of imbalanced data by using the weighted SMOTE algorithm and ENN algorithm tofill in the minority class samples and generate a new dataset.Then,an attention mechanism is employed to calculate the feature similarity and determine the structure of the neural network so that the model can learn the data features.Finally,the data are dimensionally reduced based on the autoencoder,and the sparse high-dimensional data are mapped to a low-dimensional space for anomaly detection,overcoming the impact of the curse of dimensionality on detection algorithms.The experimental results show that on 15 public datasets,this model outperforms other comparison algorithms.Furthermore,it was validated on industrial air quality datasets and achieved the expected results with practicality. 展开更多
关键词 high-dimensional data augmentation attention mechanism Outlier Detection
原文传递
Generalized Functional Linear Models:Efficient Modeling for High-dimensional Correlated Mixture Exposures
3
作者 Bingsong Zhang Haibin Yu +11 位作者 Xin Peng Haiyi Yan Siran Li Shutong Luo Renhuizi Wei Zhujiang Zhou Yalin Kuang Yihuan Zheng Chulan Ou Linhua Liu Yuehua Hu Jindong Ni 《Biomedical and Environmental Sciences》 2025年第8期961-976,共16页
Objective Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health.Analysis of these mixture exposures presents several key challenges for environmental epidemio... Objective Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health.Analysis of these mixture exposures presents several key challenges for environmental epidemiology and risk assessment,including high dimensionality,correlated exposure,and subtle individual effects.Methods We proposed a novel statistical approach,the generalized functional linear model(GFLM),to analyze the health effects of exposure mixtures.GFLM treats the effect of mixture exposures as a smooth function by reordering exposures based on specific mechanisms and capturing internal correlations to provide a meaningful estimation and interpretation.The robustness and efficiency was evaluated under various scenarios through extensive simulation studies.Results We applied the GFLM to two datasets from the National Health and Nutrition Examination Survey(NHANES).In the first application,we examined the effects of 37 nutrients on BMI(2011–2016 cycles).The GFLM identified a significant mixture effect,with fiber and fat emerging as the nutrients with the greatest negative and positive effects on BMI,respectively.For the second application,we investigated the association between four pre-and perfluoroalkyl substances(PFAS)and gout risk(2007–2018 cycles).Unlike traditional methods,the GFLM indicated no significant association,demonstrating its robustness to multicollinearity.Conclusion GFLM framework is a powerful tool for mixture exposure analysis,offering improved handling of correlated exposures and interpretable results.It demonstrates robust performance across various scenarios and real-world applications,advancing our understanding of complex environmental exposures and their health impacts on environmental epidemiology and toxicology. 展开更多
关键词 Mixture exposure modeling Functional data analysis high-dimensional data Correlated exposures Environmental epidemiology
暂未订购
A State-Migration Particle Swarm Optimizer for Adaptive Latent Factor Analysis of High-Dimensional and Incomplete Data
4
作者 Jiufang Chen Kechen Liu +4 位作者 Xin Luo Ye Yuan Khaled Sedraoui Yusuf Al-Turki MengChu Zhou 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第11期2220-2235,共16页
High-dimensional and incomplete(HDI) matrices are primarily generated in all kinds of big-data-related practical applications. A latent factor analysis(LFA) model is capable of conducting efficient representation lear... High-dimensional and incomplete(HDI) matrices are primarily generated in all kinds of big-data-related practical applications. A latent factor analysis(LFA) model is capable of conducting efficient representation learning to an HDI matrix,whose hyper-parameter adaptation can be implemented through a particle swarm optimizer(PSO) to meet scalable requirements.However, conventional PSO is limited by its premature issues,which leads to the accuracy loss of a resultant LFA model. To address this thorny issue, this study merges the information of each particle's state migration into its evolution process following the principle of a generalized momentum method for improving its search ability, thereby building a state-migration particle swarm optimizer(SPSO), whose theoretical convergence is rigorously proved in this study. It is then incorporated into an LFA model for implementing efficient hyper-parameter adaptation without accuracy loss. Experiments on six HDI matrices indicate that an SPSO-incorporated LFA model outperforms state-of-the-art LFA models in terms of prediction accuracy for missing data of an HDI matrix with competitive computational efficiency.Hence, SPSO's use ensures efficient and reliable hyper-parameter adaptation in an LFA model, thus ensuring practicality and accurate representation learning for HDI matrices. 展开更多
关键词 data science generalized momentum high-dimensional and incomplete(HDI)data hyper-parameter adaptation latent factor analysis(LFA) particle swarm optimization(PSO)
在线阅读 下载PDF
Censored Composite Conditional Quantile Screening for High-Dimensional Survival Data
5
作者 LIU Wei LI Yingqiu 《应用概率统计》 CSCD 北大核心 2024年第5期783-799,共17页
In this paper,we introduce the censored composite conditional quantile coefficient(cC-CQC)to rank the relative importance of each predictor in high-dimensional censored regression.The cCCQC takes advantage of all usef... In this paper,we introduce the censored composite conditional quantile coefficient(cC-CQC)to rank the relative importance of each predictor in high-dimensional censored regression.The cCCQC takes advantage of all useful information across quantiles and can detect nonlinear effects including interactions and heterogeneity,effectively.Furthermore,the proposed screening method based on cCCQC is robust to the existence of outliers and enjoys the sure screening property.Simulation results demonstrate that the proposed method performs competitively on survival datasets of high-dimensional predictors,particularly when the variables are highly correlated. 展开更多
关键词 high-dimensional survival data censored composite conditional quantile coefficient sure screening property rank consistency property
在线阅读 下载PDF
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
6
作者 Meiyin Wang Wanzhou Ye 《Advances in Pure Mathematics》 2024年第4期214-227,共14页
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based o... The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method. 展开更多
关键词 high-dimensional Covariance Matrix Missing data Sub-Gaussian Noise Optimal Estimation
在线阅读 下载PDF
Randomized Latent Factor Model for High-dimensional and Sparse Matrices from Industrial Applications 被引量:14
7
作者 Mingsheng Shang Xin Luo +3 位作者 Zhigang Liu Jia Chen Ye Yuan MengChu Zhou 《IEEE/CAA Journal of Automatica Sinica》 EI CSCD 2019年第1期131-141,共11页
Latent factor(LF)models are highly effective in extracting useful knowledge from High-Dimensional and Sparse(HiDS)matrices which are commonly seen in various industrial applications.An LF model usually adopts iterativ... Latent factor(LF)models are highly effective in extracting useful knowledge from High-Dimensional and Sparse(HiDS)matrices which are commonly seen in various industrial applications.An LF model usually adopts iterative optimizers,which may consume many iterations to achieve a local optima,resulting in considerable time cost.Hence,determining how to accelerate the training process for LF models has become a significant issue.To address this,this work proposes a randomized latent factor(RLF)model.It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices,thereby greatly alleviating computational burden.It also extends a standard learning process for randomized neural networks in context of LF analysis to make the resulting model represent an HiDS matrix correctly.Experimental results on three HiDS matrices from industrial applications demonstrate that compared with state-of-the-art LF models,RLF is able to achieve significantly higher computational efficiency and comparable prediction accuracy for missing data.I provides an important alternative approach to LF analysis of HiDS matrices,which is especially desired for industrial applications demanding highly efficient models. 展开更多
关键词 Big data high-dimensional and sparse matrix latent factor analysis latent factor model randomized learning
在线阅读 下载PDF
Geophysical data sparse reconstruction based on L0-norm minimization 被引量:6
8
作者 陈国新 陈生昌 +1 位作者 王汉闯 张博 《Applied Geophysics》 SCIE CSCD 2013年第2期181-190,236,共11页
Missing data are a problem in geophysical surveys, and interpolation and reconstruction of missing data is part of the data processing and interpretation. Based on the sparseness of the geophysical data or the transfo... Missing data are a problem in geophysical surveys, and interpolation and reconstruction of missing data is part of the data processing and interpretation. Based on the sparseness of the geophysical data or the transform domain, we can improve the accuracy and stability of the reconstruction by transforming it to a sparse optimization problem. In this paper, we propose a mathematical model for the sparse reconstruction of data based on the LO-norm minimization. Furthermore, we discuss two types of the approximation algorithm for the LO- norm minimization according to the size and characteristics of the geophysical data: namely, the iteratively reweighted least-squares algorithm and the fast iterative hard thresholding algorithm. Theoretical and numerical analysis showed that applying the iteratively reweighted least-squares algorithm to the reconstruction of potential field data exploits its fast convergence rate, short calculation time, and high precision, whereas the fast iterative hard thresholding algorithm is more suitable for processing seismic data, moreover, its computational efficiency is better than that of the traditional iterative hard thresholding algorithm. 展开更多
关键词 Geophysical data sparse reconstruction LO-norm minimization iterativelyreweighted least squares fast iterative hard thresholding
在线阅读 下载PDF
High-dimensional Teaching Data Clustering in Sparse Subspaces Based on Information Entropy
9
作者 Huiyan Liu 《IJLAI Transactions on Science and Engineering》 2025年第2期23-28,共6页
Due to the large scale and high dimension of teaching data,the using of traditional clustering algorithms has problems such as high computational complexity and low accuracy.Therefore,this paper proposes a weighted bl... Due to the large scale and high dimension of teaching data,the using of traditional clustering algorithms has problems such as high computational complexity and low accuracy.Therefore,this paper proposes a weighted block sparse subspace clustering algorithm based on information entropy.The introduction of information entropy weight and block diagonal constraints can obtain the prior probability that two pixels belong to the same category before the simulation experiment,thereby positively intervening that the solutions solved by the model tend to be the optimal approximate solutions of the block diagonal structure.It can enable the model to obtain the performance against noise and outliers,and thereby improving the discriminative ability of the model classification.The experimental results show that the average probability Rand index of the proposed method is 0.86,which is higher than that of other algorithms.The average information change index of the proposed method is 1.55,which is lower than that of other algorithms,proving its strong robustness.On different datasets,the misclassification rates of the design method are 1.2%and 0.9%respectively,which proves that its classification accuracy is relatively high.The proposed method has high reliability in processing highdimensional teaching data.It can play an important role in the field of educational data analysis and provide strong support for intelligent teaching. 展开更多
关键词 Intelligent teaching sparse subspace clustering information entropy high-dimensional
在线阅读 下载PDF
CABOSFV algorithm for high dimensional sparse data clustering 被引量:7
10
作者 Sen Wu Xuedong Gao Management School, University of Science and Technology Beijing, Beijing 100083, China 《Journal of University of Science and Technology Beijing》 CSCD 2004年第3期283-288,共6页
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV),was proposed for the high dimensional clustering of binary sparse data. This algorithm compressesthe data effectively by using a tool 'Sp... An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV),was proposed for the high dimensional clustering of binary sparse data. This algorithm compressesthe data effectively by using a tool 'Sparse Feature Vector', thus reduces the data scaleenormously, and can get the clustering result with only one data scan. Both theoretical analysis andempirical tests showed that CABOSFV is of low computational complexity. The algorithm findsclusters in high dimensional large datasets efficiently and handles noise effectively. 展开更多
关键词 CLUSTERING data mining sparse high dimensionality
在线阅读 下载PDF
A generative deep learning framework for airfoil flow field prediction with sparse data 被引量:9
11
作者 Haizhou WU Xuejun LIU +1 位作者 Wei AN Hongqiang LYU 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2022年第1期470-484,共15页
Deep learning has been probed for the airfoil performance prediction in recent years.Compared with the expensive CFD simulations and wind tunnel experiments,deep learning models can be leveraged to somewhat mitigate s... Deep learning has been probed for the airfoil performance prediction in recent years.Compared with the expensive CFD simulations and wind tunnel experiments,deep learning models can be leveraged to somewhat mitigate such expenses with proper means.Nevertheless,effective training of the data-driven models in deep learning severely hinges on the data in diversity and quantity.In this paper,we present a novel data augmented Generative Adversarial Network(GAN),daGAN,for rapid and accurate flow filed prediction,allowing the adaption to the task with sparse data.The presented approach consists of two modules,pre-training module and fine-tuning module.The pre-training module utilizes a conditional GAN(cGAN)to preliminarily estimate the distribution of the training data.In the fine-tuning module,we propose a novel adversarial architecture with two generators one of which fulfils a promising data augmentation operation,so that the complement data is adequately incorporated to boost the generalization of the model.We use numerical simulation data to verify the generalization of daGAN on airfoils and flow conditions with sparse training data.The results show that daGAN is a promising tool for rapid and accurate evaluation of detailed flow field without the requirement for big training data. 展开更多
关键词 CFD Flow field Generative adversarial networks(GANs) sparse data Supercritical airfoil
原文传递
Fast Computation of Sparse Data Cubes with Constraints 被引量:2
12
作者 FengYu-cai ChenChang-qing FengJian-lin XiangLong-gang 《Wuhan University Journal of Natural Sciences》 EI CAS 2004年第2期167-172,共6页
For a data cube there are always constraints between dimensions or among attributes in a dimension, such as functional dependencies. We introduce the problem that when there are functional dependencies, how to use the... For a data cube there are always constraints between dimensions or among attributes in a dimension, such as functional dependencies. We introduce the problem that when there are functional dependencies, how to use them to speed up the computation of sparse data cubes. A new algorithm CFD (Computation by Functional Dependencies) is presented to satisfy this demand. CFD determines the order of dimensions by considering cardinalities of dimensions and functional dependencies between dimensions together, thus reduce the number of partitions for such dimensions. CFD also combines partitioning from bottom to up and aggregate computation from top to bottom to speed up the computation further. CFD can efficiently compute a data cube with hierarchies in a dimension from the smallest granularity to the coarsest one. Key words sparse data cube - functional dependency - dimension - partition - CFD CLC number TP 311 Foundation item: Supported by the E-Government Project of the Ministry of Science and Technology of China (2001BA110B01)Biography: Feng Yu-cai (1945-), male, Professor, research direction: database system. 展开更多
关键词 sparse data cube functional dependency DIMENSION PARTITION CFD
在线阅读 下载PDF
Physics-informed neural network-based petroleum reservoir simulation with sparse data using domain decomposition 被引量:4
13
作者 Jiang-Xia Han Liang Xue +4 位作者 Yun-Sheng Wei Ya-Dong Qi Jun-Lei Wang Yue-Tian Liu Yu-Qi Zhang 《Petroleum Science》 SCIE EI CAS CSCD 2023年第6期3450-3460,共11页
Recent advances in deep learning have expanded new possibilities for fluid flow simulation in petroleum reservoirs.However,the predominant approach in existing research is to train neural networks using high-fidelity ... Recent advances in deep learning have expanded new possibilities for fluid flow simulation in petroleum reservoirs.However,the predominant approach in existing research is to train neural networks using high-fidelity numerical simulation data.This presents a significant challenge because the sole source of authentic wellbore production data for training is sparse.In response to this challenge,this work introduces a novel architecture called physics-informed neural network based on domain decomposition(PINN-DD),aiming to effectively utilize the sparse production data of wells for reservoir simulation with large-scale systems.To harness the capabilities of physics-informed neural networks(PINNs)in handling small-scale spatial-temporal domain while addressing the challenges of large-scale systems with sparse labeled data,the computational domain is divided into two distinct sub-domains:the well-containing and the well-free sub-domain.Moreover,the two sub-domains and the interface are rigorously constrained by the governing equations,data matching,and boundary conditions.The accuracy of the proposed method is evaluated on two problems,and its performance is compared against state-of-the-art PINNs through numerical analysis as a benchmark.The results demonstrate the superiority of PINN-DD in handling large-scale reservoir simulation with limited data and show its potential to outperform conventional PINNs in such scenarios. 展开更多
关键词 Physical-informed neural networks Fluid flow simulation sparse data Domain decomposition
原文传递
Similarity measurement method of high-dimensional data based on normalized net lattice subspace 被引量:4
14
作者 李文法 Wang Gongming +1 位作者 Li Ke Huang Su 《High Technology Letters》 EI CAS 2017年第2期179-184,共6页
The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities... The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities occupies a large proportion of the similarity,leading to the dissimilarities between any results.A similarity measurement method of high-dimensional data based on normalized net lattice subspace is proposed.The data range of each dimension is divided into several intervals,and the components in different dimensions are mapped onto the corresponding interval.Only the component in the same or adjacent interval is used to calculate the similarity.To validate this method,three data types are used,and seven common similarity measurement methods are compared.The experimental result indicates that the relative difference of the method is increasing with the dimensionality and is approximately two or three orders of magnitude higher than the conventional method.In addition,the similarity range of this method in different dimensions is [0,1],which is fit for similarity analysis after dimensionality reduction. 展开更多
关键词 high-dimensional data the curse of dimensionality SIMILARITY NORMALIZATION SUBSPACE NPsim
在线阅读 下载PDF
Reconstruction method of irregular seismic data with adaptive thresholds based on different sparse transform bases 被引量:4
15
作者 Zhao Hu Yang Tun +4 位作者 Ni Yu-Dong Liu Xing-Gang Xu Yin-Po Zhang Yi-Lei Zhang Guang-Rong 《Applied Geophysics》 SCIE CSCD 2021年第3期345-360,432,共17页
Oil and gas seismic exploration have to adopt irregular seismic acquisition due to the increasingly complex exploration conditions to adapt to complex geological conditions and environments.However,the irregular seism... Oil and gas seismic exploration have to adopt irregular seismic acquisition due to the increasingly complex exploration conditions to adapt to complex geological conditions and environments.However,the irregular seismic acquisition is accompanied by the lack of acquisition data,which requires high-precision regularization.The sparse signal feature in the transform domain in compressed sensing theory is used in this paper to recover the missing signal,involving sparse transform base optimization and threshold modeling.First,this paper analyzes and compares the effects of six sparse transformation bases on the reconstruction accuracy and efficiency of irregular seismic data and establishes the quantitative relationship between sparse transformation and reconstruction accuracy and efficiency.Second,an adaptive threshold modeling method based on sparse coefficient is provided to improve the reconstruction accuracy.Test results show that the method has good adaptability to different seismic data and sparse transform bases.The f-x domain reconstruction method of effective frequency samples is studied to address the problem of low computational efficiency.The parallel computing strategy of curvelet transform combined with OpenMP is further proposed,which substantially improves the computational efficiency under the premise of ensuring the reconstruction accuracy.Finally,the actual acquisition data are used to verify the proposed method.The results indicate that the proposed method strategy can solve the regularization problem of irregular seismic data in production and improve the imaging quality of the target layer economically and efficiently. 展开更多
关键词 irregular acquisition seismic data reconstruction adaptive threshold f-x domain OpenMP parallel optimization sparse transformation
在线阅读 下载PDF
INTERPOLATION TECHNIQUE FOR SPARSE DATA BASED ON INFORMATION DIFFUSION PRINCIPLE-ELLIPSE MODEL 被引量:1
16
作者 张韧 黄志松 +1 位作者 李佳讯 刘巍 《Journal of Tropical Meteorology》 SCIE 2013年第1期59-66,共8页
Addressing the difficulties of scattered and sparse observational data in ocean science,a new interpolation technique based on information diffusion is proposed in this paper.Based on a fuzzy mapping idea,sparse data ... Addressing the difficulties of scattered and sparse observational data in ocean science,a new interpolation technique based on information diffusion is proposed in this paper.Based on a fuzzy mapping idea,sparse data samples are diffused and mapped into corresponding fuzzy sets in the form of probability in an interpolation ellipse model.To avoid the shortcoming of normal diffusion function on the asymmetric structure,a kind of asymmetric information diffusion function is developed and a corresponding algorithm-ellipse model for diffusion of asymmetric information is established.Through interpolation experiments and contrast analysis of the sea surface temperature data with ARGO data,the rationality and validity of the ellipse model are assessed. 展开更多
关键词 INFORMATION DIFFUSION INTERPOLATION algorithm sparse data ELLIPSE model
在线阅读 下载PDF
Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning 被引量:3
17
作者 Shuo Zheng Yu-Xin Zhu +3 位作者 Dian-Qing Li Zi-Jun Cao Qin-Xuan Deng Kok-Kwang Phoon 《Geoscience Frontiers》 SCIE CAS CSCD 2021年第1期425-439,共15页
Various uncertainties arising during acquisition process of geoscience data may result in anomalous data instances(i.e.,outliers)that do not conform with the expected pattern of regular data instances.With sparse mult... Various uncertainties arising during acquisition process of geoscience data may result in anomalous data instances(i.e.,outliers)that do not conform with the expected pattern of regular data instances.With sparse multivariate data obtained from geotechnical site investigation,it is impossible to identify outliers with certainty due to the distortion of statistics of geotechnical parameters caused by outliers and their associated statistical uncertainty resulted from data sparsity.This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation.The proposed approach quantifies the outlying probability of each data instance based on Mahalanobis distance and determines outliers as those data instances with outlying probabilities greater than 0.5.It tackles the distortion issue of statistics estimated from the dataset with outliers by a re-sampling technique and accounts,rationally,for the statistical uncertainty by Bayesian machine learning.Moreover,the proposed approach also suggests an exclusive method to determine outlying components of each outlier.The proposed approach is illustrated and verified using simulated and real-life dataset.It showed that the proposed approach properly identifies outliers among sparse multivariate data and their corresponding outlying components in a probabilistic manner.It can significantly reduce the masking effect(i.e.,missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty).It also found that outliers among sparse multivariate data instances affect significantly the construction of multivariate distribution of geotechnical parameters for uncertainty quantification.This emphasizes the necessity of data cleaning process(e.g.,outlier detection)for uncertainty quantification based on geoscience data. 展开更多
关键词 Outlier detection Site investigation sparse multivariate data Mahalanobis distance Resampling by half-means Bayesian machine learning
在线阅读 下载PDF
Sparse Seismic Data Reconstruction Based on a Convolutional Neural Network Algorithm 被引量:1
18
作者 HOU Xinwei TONG Siyou +3 位作者 WANG Zhongcheng XU Xiugang PENG Yin WANG Kai 《Journal of Ocean University of China》 SCIE CAS CSCD 2023年第2期410-418,共9页
At present,the acquisition of seismic data is developing toward high-precision and high-density methods.However,complex natural environments and cultural factors in many exploration areas cause difficulties in achievi... At present,the acquisition of seismic data is developing toward high-precision and high-density methods.However,complex natural environments and cultural factors in many exploration areas cause difficulties in achieving uniform and intensive acquisition,which makes complete seismic data collection impossible.Therefore,data reconstruction is required in the processing link to ensure imaging accuracy.Deep learning,as a new field in rapid development,presents clear advantages in feature extraction and modeling.In this study,the convolutional neural network deep learning algorithm is applied to seismic data reconstruction.Based on the convolutional neural network algorithm and combined with the characteristics of seismic data acquisition,two training strategies of supervised and unsupervised learning are designed to reconstruct sparse acquisition seismic records.First,a supervised learning strategy is proposed for labeled data,wherein the complete seismic data are segmented as the input of the training set and are randomly sampled before each training,thereby increasing the number of samples and the richness of features.Second,an unsupervised learning strategy based on large samples is proposed for unlabeled data,and the rolling segmentation method is used to update(pseudo)labels and training parameters in the training process.Through the reconstruction test of simulated and actual data,the deep learning algorithm based on a convolutional neural network shows better reconstruction quality and higher accuracy than compressed sensing based on Curvelet transform. 展开更多
关键词 deep learning convolutional neural network seismic data reconstruction compressed sensing sparse collection supervised learning unsupervised learning
在线阅读 下载PDF
A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
19
作者 李文法 Wang Gongming +1 位作者 Ma Nan Liu Hongzhe 《High Technology Letters》 EI CAS 2016年第3期241-247,共7页
Problems existin similarity measurement and index tree construction which affect the performance of nearest neighbor search of high-dimensional data. The equidistance problem is solved using NPsim function to calculat... Problems existin similarity measurement and index tree construction which affect the performance of nearest neighbor search of high-dimensional data. The equidistance problem is solved using NPsim function to calculate similarity. And a sequential NPsim matrix is built to improve indexing performance. To sum up the above innovations,a nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix is proposed in comparison with the nearest neighbor search algorithms based on KD-tree or SR-tree on Munsell spectral data set. Experimental results show that the proposed algorithm similarity is better than that of other algorithms and searching speed is more than thousands times of others. In addition,the slow construction speed of sequential NPsim matrix can be increased by using parallel computing. 展开更多
关键词 nearest neighbor search high-dimensional data SIMILARITY indexing tree NPsim KD-TREE SR-tree Munsell
在线阅读 下载PDF
上一页 1 2 38 下一页 到第
使用帮助 返回顶部