Journal articles
792 articles found
Test for Varying-Coefficient Models with High-Dimensional Data
1
Authors: YANG Lin, GAO Yuzhao, QU Lianqiang. Journal of Systems Science & Complexity, 2026, No. 1, pp. 203-229 (27 pages).
The authors consider hypothesis testing in varying-coefficient regression models with high-dimensional data. Using kernel smoothing techniques, the authors propose a locally concerned U-statistic method to assess the overall significance of the coefficients, and they establish that the proposed test is asymptotically normal under both the null hypothesis and local alternatives. Based on the locally concerned U-statistic, the authors further develop a globally concerned U-statistic to test whether the coefficient function is zero. A stochastic perturbation method is employed to approximate the distribution of the globally concerned test statistic. Monte Carlo simulations demonstrate the validity of the proposed test in finite samples.
Keywords: hypothesis testing; high-dimensional data; kernel smoothing; U-statistic; varying-coefficient models
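To make the idea of a locally concerned U-statistic concrete, here is a minimal Python sketch under assumptions that are ours, not the authors': the model is taken to be Y = X'β(U) + ε with E[ε | X, U] = 0, the kernel is Epanechnikov, and the statistic is the unnormalized sum T(u0) = sum over i != j of K_h(U_i - u0) K_h(U_j - u0) Y_i Y_j X_i'X_j. Under H0: β(·) ≡ 0 every off-diagonal term has mean zero, which is why the i = j terms are excluded. The names `epanechnikov` and `local_u_statistic` are hypothetical; the paper's centering, variance estimator, and globally concerned (integrated) statistic are not reproduced.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2) on |t| <= 1, zero elsewhere."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def local_u_statistic(X, Y, U, u0, h):
    """Hypothetical kernel-weighted U-statistic at index value u0.

    Sums w_i * w_j * Y_i * Y_j * <X_i, X_j> over pairs i != j,
    where w_i = K((U_i - u0) / h) / h.
    """
    w = epanechnikov((U - u0) / h) / h          # kernel weights, shape (n,)
    Z = (w * Y)[:, None] * X                    # row i is w_i * Y_i * X_i, shape (n, p)
    s = Z.sum(axis=0)
    # sum over all pairs (i, j) minus the diagonal i == j terms
    return float(s @ s - np.sum(Z * Z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 500                             # p > n: high-dimensional setting
    X = rng.standard_normal((n, p))
    U = rng.uniform(0.0, 1.0, n)
    Y = rng.standard_normal(n)                  # null: Y does not depend on X
    print(local_u_statistic(X, Y, U, u0=0.5, h=0.2))
```

Writing the pair sum as ||sum_i Z_i||^2 minus the diagonal avoids forming the n-by-n Gram matrix, so each evaluation point costs O(np).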
Enhanced sparse RCNN for transmission line bolt defect detection via text-to-image data augmentation and quality filtering
2
Authors: Chen Zhenyu, Yan Huaguang, Du Jianguang, Xue Meng, Zhao Shuai. High Technology Letters, 2026, No. 1, pp. 11-20 (10 pages).
To address inconsistent image quality and data scarcity in bolt defect detection for transmission lines, this paper proposes an improved detection framework based on the sparse region-based convolutional neural network (RCNN) that integrates image quality evaluation and text-to-image data augmentation. First, a HyperNetwork-based image quality assessment module is introduced to filter out low-quality inspection images in terms of clarity and structural integrity, yielding a high-quality training dataset. Second, a text-to-image diffusion model is used for sample augmentation: by designing text prompts that describe various bolt defect types under diverse lighting and viewing conditions, the model automatically generates realistic synthetic samples. The generated images are further filtered using a combination of quality and perceptual similarity metrics to ensure consistency with the real data distribution. Building on the sparse RCNN baseline, a dynamic label assignment mechanism and a random decision path detection head are incorporated to improve bounding box matching and prediction accuracy. Experimental results demonstrate that the proposed method significantly improves detection accuracy (mAP@0.5) over the original sparse RCNN while maintaining low computational cost, enabling more efficient and intelligent inspection of transmission line components.
Keywords: sparse region-based convolutional neural network; HyperNetwork; image quality assessment; text-to-image generation; data augmentation; bolt defect detection; transmission line inspection
Robust Latent Factor Analysis for Precise Representation of High-Dimensional and Sparse Data (cited by 5)
3
Authors: Di Wu, Xin Luo. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, No. 4, pp. 796-805 (10 pages).
High-dimensional and sparse (HiDS) matrices commonly arise in various industrial applications, e.g., recommender systems (RSs), social networks, and wireless sensor networks. Since they contain rich information, representing them accurately is of great significance. A latent factor (LF) model is one of the most popular and successful ways to address this issue. Current LF models mostly adopt an L2-norm-oriented loss to represent an HiDS matrix, i.e., they sum the errors between observed and predicted data with the L2-norm. Yet the L2-norm is sensitive to outliers, and outliers usually exist in such matrices. For example, an HiDS matrix from an RS commonly contains many outlier ratings due to heedless or malicious users. To address this issue, this work proposes a smooth L1-norm-oriented latent factor (SL-LF) model. Its main idea is to adopt the smooth L1-norm rather than the L2-norm to form its loss, giving it both strong robustness and high accuracy in predicting the missing data of an HiDS matrix. Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model is not only robust to outliers but also has significantly higher prediction accuracy than state-of-the-art models when predicting the missing data of HiDS matrices.
Keywords: high-dimensional and sparse matrix; L1-norm; L2-norm; latent factor model; recommender system; smooth L1-norm
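A minimal sketch of the SL-LF idea, assuming the common smooth L1 (Huber-type) form l(e) = 0.5e² for |e| < 1 and |e| − 0.5 otherwise, plain SGD over observed entries, and L2 regularization; the paper's exact loss scaling, regularization, and learning-rate schedule may differ. `train_sl_lf` and `smooth_l1_grad` are hypothetical names. The point it illustrates: an outlier rating contributes a gradient of magnitude at most 1, whereas under an L2 loss its gradient grows with the residual.

```python
import numpy as np

def smooth_l1_grad(e):
    """Derivative of the smooth L1 loss: 0.5*e^2 if |e| < 1, else |e| - 0.5."""
    return e if abs(e) < 1.0 else float(np.sign(e))

def train_sl_lf(entries, n_rows, n_cols, rank=5, lr=0.01, reg=0.05, epochs=50, seed=0):
    """SGD on observed (row, col, value) triples with a smooth L1 loss.

    Large residuals (e.g., outlier ratings) produce a gradient of at most 1
    in magnitude instead of growing linearly as under the L2 loss.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_rows, rank))   # row (user) factors
    Q = 0.1 * rng.standard_normal((n_cols, rank))   # column (item) factors
    for _ in range(epochs):
        for idx in rng.permutation(len(entries)):
            u, i, r = entries[idx]
            e = r - P[u] @ Q[i]                     # residual on one observed entry
            g = smooth_l1_grad(e)
            pu = P[u].copy()                        # use the old P[u] when updating Q[i]
            P[u] += lr * (g * Q[i] - reg * P[u])
            Q[i] += lr * (g * pu - reg * Q[i])
    return P, Q
```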
Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data
4
Authors: Haitao Zhang, Wenhai Ma, Qilong Han, Zhiqiang Ma. 国际计算机前沿大会会议论文集 [Proceedings of the International Computer Frontiers Conference] (EI), 2023, No. 1, pp. 192-206 (15 pages).
This paper addresses the problems of data imbalance, complex parameter tuning, and low accuracy in high-dimensional data anomaly detection. To this end, an anomaly detection model for high-dimensional sparse data based on an autoencoder and data augmentation (SEAOD) is proposed. First, the model addresses data imbalance by using the weighted SMOTE algorithm and the ENN algorithm to fill in minority class samples and generate a new dataset. Then, an attention mechanism is employed to calculate feature similarity and determine the structure of the neural network so that the model can learn the data features. Finally, the data are dimensionally reduced by the autoencoder, which maps the sparse high-dimensional data to a low-dimensional space for anomaly detection, overcoming the impact of the curse of dimensionality on detection algorithms. Experimental results on 15 public datasets show that this model outperforms the other comparison algorithms. Furthermore, it was validated on industrial air quality datasets, where it achieved the expected results, demonstrating its practicality.
Keywords: high-dimensional; data augmentation; attention mechanism; outlier detection
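For readers unfamiliar with SMOTE, the sketch below shows the plain algorithm: each synthetic minority sample lies on the segment between a minority sample and one of its k nearest minority neighbours. It is an illustration under stated assumptions only: SEAOD's weighted SMOTE weighting and the subsequent ENN cleaning step are not reproduced, and `smote` is a hypothetical helper, not a library API.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Plain SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise squared distances among minority samples
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                   # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)          # which sample each new point starts from
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[nn] - X_min[base])
```

For production use, the `imbalanced-learn` package provides maintained `SMOTE` and `EditedNearestNeighbours` implementations, as well as the combined `SMOTEENN`.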
Adaptive feature selection method for high-dimensional imbalanced data classification
5
Authors: WU Jianzhen, XUE Zhen, ZHANG Liangliang, YANG Xu. Journal of Measurement Science and Instrumentation, 2025, No. 4, pp. 612-624 (13 pages).
Data collected in fields such as cybersecurity and biomedicine are often high-dimensional and class-imbalanced. To address the low classification accuracy for minority class samples caused by numerous irrelevant and redundant features in high-dimensional imbalanced data, we proposed a novel feature selection method named AMF-SGSK, based on an adaptive multi-filter and subspace-based gaining sharing knowledge. First, a balanced dataset was obtained by random under-sampling. Second, by combining the feature importance score with the AUC score of each filter method, we proposed a concept called feature hardness to judge the importance of each feature, which allows essential features to be selected adaptively. Finally, the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces. This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data. Experimental results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean F1-score, AUC, and G-mean of AMF-SGSK are 0.950, 0.967, and 0.965, respectively, the highest among all algorithms, and its mean G-mean is higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio is below 0.01 across the ten selected datasets, further demonstrating the proposed method's overall superiority over competing approaches. AMF-SGSK can adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
Keywords: high-dimensional imbalanced data; adaptive feature selection; adaptive multi-filter; feature hardness; gaining sharing knowledge based algorithm; metaheuristic algorithm
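The first step of AMF-SGSK, random under-sampling, and one of its reported metrics, G-mean, are simple enough to sketch. Below, `random_undersample` and `g_mean` are hypothetical helpers written for illustration; G-mean is taken as the usual geometric mean of sensitivity and specificity, which we assume matches the paper's definition. The feature-hardness score and the gaining-sharing-knowledge search are not reproduced.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly drop samples from every class until all classes match the smallest class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes
    ])
    keep.sort()                                   # preserve the original sample order
    return X[keep], y[keep]

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on the positive class) and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == positive, y_true != positive
    sensitivity = np.mean(y_pred[pos] == positive)
    specificity = np.mean(y_pred[neg] != positive)
    return float(np.sqrt(sensitivity * specificity))
```

G-mean is zero whenever either class is entirely misclassified, which is why it is favored over plain accuracy on imbalanced data.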
Randomized Latent Factor Model for High-dimensional and Sparse Matrices from Industrial Applications (cited by 14)
6
Authors: Mingsheng Shang, Xin Luo, Zhigang Liu, Jia Chen, Ye Yuan, MengChu Zhou. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, No. 1, pp. 131-141 (11 pages).
Latent factor (LF) models are highly effective in extracting useful knowledge from high-dimensional and sparse (HiDS) matrices, which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may require many iterations to reach a local optimum, resulting in considerable time cost. Hence, accelerating the training process of LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating the computational burden. It also extends a standard learning process for randomized neural networks to the context of LF analysis so that the resulting model represents an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF achieves significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to LF analysis of HiDS matrices, which is especially desirable for industrial applications that demand highly efficient models.
Keywords: big data; high-dimensional and sparse matrix; latent factor analysis; latent factor model; randomized learning
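The randomized-learning principle (random parameters that are never trained, plus a closed-form solve for the rest) can be illustrated on a factorization problem. The sketch below is our hypothetical analogue, not the RLF model: it draws the row factors P at random, keeps them fixed, and fits each column factor by ridge regression on that column's observed entries, so no iterative optimizer is needed. `randomized_lf`, `rank`, and `reg` are illustrative names.

```python
import numpy as np

def randomized_lf(entries, n_rows, n_cols, rank=20, reg=0.1, seed=0):
    """Fix random row factors P, then fit column factors Q in closed form.

    For each column i, Q[i] solves the ridge problem
    min ||r_i - P_obs q||^2 + reg * ||q||^2 over the observed rows of column i.
    """
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n_rows, rank)) / np.sqrt(rank)  # random and never trained
    Q = np.zeros((n_cols, rank))
    by_col = {}
    for u, i, r in entries:                                  # group observed entries by column
        by_col.setdefault(i, []).append((u, r))
    for i, pairs in by_col.items():
        rows = np.array([u for u, _ in pairs])
        vals = np.array([r for _, r in pairs])
        A = P[rows]
        Q[i] = np.linalg.solve(A.T @ A + reg * np.eye(rank), A.T @ vals)
    return P, Q
```

With a rank at least as large as the number of observed entries per column, the ridge solve nearly interpolates the training data; in practice `rank` and `reg` would be tuned on held-out entries to avoid overfitting.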
Geophysical data sparse reconstruction based on L0-norm minimization (cited by 6)
7
Authors: 陈国新, 陈生昌, 王汉闯, 张博. Applied Geophysics (SCIE, CSCD), 2013, No. 2, pp. 181-190, 236 (11 pages).
Missing data are a problem in geophysical surveys, and the interpolation and reconstruction of missing data are part of data processing and interpretation. Based on the sparsity of geophysical data in the original or a transform domain, we can improve the accuracy and stability of the reconstruction by casting it as a sparse optimization problem. In this paper, we propose a mathematical model for the sparse reconstruction of data based on L0-norm minimization. Furthermore, according to the size and characteristics of the geophysical data, we discuss two types of approximation algorithm for L0-norm minimization: the iteratively reweighted least-squares algorithm and the fast iterative hard thresholding algorithm. Theoretical and numerical analysis showed that applying the iteratively reweighted least-squares algorithm to the reconstruction of potential field data exploits its fast convergence rate, short calculation time, and high precision, whereas the fast iterative hard thresholding algorithm is more suitable for processing seismic data; moreover, its computational efficiency is better than that of the traditional iterative hard thresholding algorithm.
Keywords: geophysical data; sparse reconstruction; L0-norm minimization; iteratively reweighted least squares; fast iterative hard thresholding
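As a reference point for the L0-norm approach, here is a minimal sketch of plain iterative hard thresholding, x_{k+1} = H_s(x_k + μA'(y − Ax_k)), where H_s keeps the s largest-magnitude entries. It assumes the sparsity level s is known and uses the step size μ = 1/‖A‖₂²; the paper's fast IHT variant (which adds acceleration) and its IRLS alternative are not shown, and `iht` and `hard_threshold` are hypothetical names.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero the rest (the L0 projection)."""
    out = np.zeros_like(x)
    if s > 0:
        keep = np.argsort(np.abs(x))[-s:]
        out[keep] = x[keep]
    return out

def iht(A, y, s, n_iter=300):
    """Plain iterative hard thresholding for min ||y - A x||^2 subject to ||x||_0 <= s."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2          # step size from the largest singular value
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on the least-squares misfit, then project onto s-sparse vectors
        x = hard_threshold(x + mu * A.T @ (y - A @ x), s)
    return x
```

In a missing-trace setting, A would be a sampling operator composed with an inverse sparse transform, and x the transform coefficients.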
CABOSFV algorithm for high dimensional sparse data clustering (cited by 7)
8
Authors: Sen Wu, Xuedong Gao (Management School, University of Science and Technology Beijing, Beijing 100083, China). Journal of University of Science and Technology Beijing (CSCD), 2004, No. 3, pp. 283-288 (6 pages).
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high-dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in high-dimensional large datasets efficiently and handles noise effectively.
Keywords: clustering; data mining; sparse; high dimensionality
A generative deep learning framework for airfoil flow field prediction with sparse data (cited by 10)
9
Authors: Haizhou WU, Xuejun LIU, Wei AN, Hongqiang LYU. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2022, No. 1, pp. 470-484 (15 pages).
Deep learning has been explored for airfoil performance prediction in recent years. Compared with expensive CFD simulations and wind tunnel experiments, deep learning models, used appropriately, can partly mitigate these costs. Nevertheless, effective training of data-driven deep learning models depends heavily on the diversity and quantity of the data. In this paper, we present a novel data-augmented Generative Adversarial Network (GAN), daGAN, for rapid and accurate flow field prediction that can adapt to tasks with sparse data. The presented approach consists of two modules: a pre-training module and a fine-tuning module. The pre-training module uses a conditional GAN (cGAN) to preliminarily estimate the distribution of the training data. In the fine-tuning module, we propose a novel adversarial architecture with two generators, one of which performs data augmentation, so that the complementary data are adequately incorporated to improve the generalization of the model. We use numerical simulation data to verify the generalization of daGAN on airfoils and flow conditions with sparse training data. The results show that daGAN is a promising tool for rapid and accurate evaluation of detailed flow fields without requiring large training datasets.
Keywords: CFD; flow field; generative adversarial networks (GANs); sparse data; supercritical airfoil
Physics-informed neural network-based petroleum reservoir simulation with sparse data using domain decomposition (cited by 6)
10
Authors: Jiang-Xia Han, Liang Xue, Yun-Sheng Wei, Ya-Dong Qi, Jun-Lei Wang, Yue-Tian Liu, Yu-Qi Zhang. Petroleum Science (SCIE, EI, CAS, CSCD), 2023, No. 6, pp. 3450-3460 (11 pages).
Recent advances in deep learning have opened new possibilities for fluid flow simulation in petroleum reservoirs. However, the predominant approach in existing research is to train neural networks using high-fidelity numerical simulation data. This presents a significant challenge because the only source of authentic training data, wellbore production data, is sparse. In response, this work introduces a novel architecture called the physics-informed neural network based on domain decomposition (PINN-DD), which aims to use the sparse production data of wells effectively for reservoir simulation of large-scale systems. To harness the ability of physics-informed neural networks (PINNs) to handle small-scale spatio-temporal domains while addressing the challenges of large-scale systems with sparse labeled data, the computational domain is divided into two distinct sub-domains: the well-containing sub-domain and the well-free sub-domain. Moreover, the two sub-domains and the interface are rigorously constrained by the governing equations, data matching, and boundary conditions. The accuracy of the proposed method is evaluated on two problems, and its performance is compared through numerical analysis against state-of-the-art PINNs as a benchmark. The results demonstrate the superiority of PINN-DD in large-scale reservoir simulation with limited data and show its potential to outperform conventional PINNs in such scenarios.
Keywords: physics-informed neural networks; fluid flow simulation; sparse data; domain decomposition
Fast Computation of Sparse Data Cubes with Constraints (cited by 2)
11
Authors: Feng Yu-cai, Chen Chang-qing, Feng Jian-lin, Xiang Long-gang. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, No. 2, pp. 167-172 (6 pages).
For a data cube there are always constraints between dimensions or among attributes within a dimension, such as functional dependencies. We introduce the problem of how to use functional dependencies, when they exist, to speed up the computation of sparse data cubes. A new algorithm, CFD (Computation by Functional Dependencies), is presented for this purpose. CFD determines the order of dimensions by considering both the cardinalities of dimensions and the functional dependencies between them, thus reducing the number of partitions for such dimensions. CFD also combines bottom-up partitioning with top-down aggregate computation to speed up the computation further. CFD can efficiently compute a data cube with hierarchies in a dimension, from the smallest granularity to the coarsest one. CLC number: TP 311. Foundation item: Supported by the E-Government Project of the Ministry of Science and Technology of China (2001BA110B01). Biography: Feng Yu-cai (1945-), male, Professor, research direction: database systems.
Keywords: sparse data cube; functional dependency; dimension; partition; CFD
Similarity measurement method of high-dimensional data based on normalized net lattice subspace (cited by 4)
12
Authors: 李文法, Wang Gongming, Li Ke, Huang Su. High Technology Letters (EI, CAS), 2017, No. 2, pp. 179-184 (6 pages).
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality in high-dimensional data. The reason is that differences in sparse and noisy dimensions account for a large proportion of the similarity, so that the similarities computed for different pairs of data points become hard to distinguish. A similarity measurement method for high-dimensional data based on normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in each dimension are mapped onto the corresponding interval. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which is suitable for similarity analysis after dimensionality reduction.
Keywords: high-dimensional data; the curse of dimensionality; similarity; normalization; subspace; NPsim
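The interval idea can be sketched as follows. This is explicitly a hypothetical formula written to match the abstract's description (equal-width intervals per dimension, only same-or-adjacent intervals counted, result in [0, 1]); it is not the published NPsim definition, and `interval_similarity` and the interval count `m` are our own choices.

```python
import numpy as np

def interval_similarity(x, y, lo, hi, m=10):
    """Hypothetical similarity in [0, 1] in the spirit of the abstract.

    Each dimension's range [lo_j, hi_j] is split into m equal intervals. Only
    dimensions where x and y fall in the same or adjacent interval contribute,
    each contributing 1 - |x_j - y_j| / (hi_j - lo_j); the rest contribute 0.
    """
    width = np.asarray(hi, float) - np.asarray(lo, float)
    bx = np.clip(((x - lo) / width * m).astype(int), 0, m - 1)   # interval index of x_j
    by = np.clip(((y - lo) / width * m).astype(int), 0, m - 1)   # interval index of y_j
    close = np.abs(bx - by) <= 1
    contrib = np.where(close, 1.0 - np.abs(x - y) / width, 0.0)
    return float(contrib.mean())
```

Dropping far-apart dimensions entirely, rather than letting them add large distance terms, is what keeps noisy dimensions from dominating the score.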
Reconstruction method of irregular seismic data with adaptive thresholds based on different sparse transform bases (cited by 4)
13
Authors: Zhao Hu, Yang Tun, Ni Yu-Dong, Liu Xing-Gang, Xu Yin-Po, Zhang Yi-Lei, Zhang Guang-Rong. Applied Geophysics (SCIE, CSCD), 2021, No. 3, pp. 345-360, 432 (17 pages).
Because exploration conditions are increasingly complex, oil and gas seismic exploration has to adopt irregular seismic acquisition to adapt to complex geological conditions and environments. However, irregular seismic acquisition is accompanied by missing acquisition data, which requires high-precision regularization. In this paper, the sparsity of signals in a transform domain, as used in compressed sensing theory, is exploited to recover the missing signal, which involves sparse transform basis optimization and threshold modeling. First, this paper analyzes and compares the effects of six sparse transform bases on the reconstruction accuracy and efficiency of irregular seismic data and establishes a quantitative relationship between the sparse transform and the reconstruction accuracy and efficiency. Second, an adaptive threshold modeling method based on the sparse coefficients is provided to improve reconstruction accuracy. Test results show that the method adapts well to different seismic data and sparse transform bases. An f-x domain reconstruction method using effective frequency samples is studied to address the problem of low computational efficiency. A parallel computing strategy combining the curvelet transform with OpenMP is further proposed, which substantially improves computational efficiency while ensuring reconstruction accuracy. Finally, actual acquisition data are used to verify the proposed method. The results indicate that the proposed strategy can solve the regularization problem of irregular seismic data in production and improve the imaging quality of the target layer economically and efficiently.
Keywords: irregular acquisition; seismic data reconstruction; adaptive threshold; f-x domain; OpenMP parallel optimization; sparse transformation
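A common way to combine a sparse transform with a shrinking threshold is projection onto convex sets (POCS). The sketch below uses the 1-D FFT as the sparse transform and an exponentially decaying threshold scaled by the largest coefficient magnitude; both choices are assumptions, not the paper's six transform bases or its adaptive-threshold model, and `pocs_reconstruct`, `p_max`, and `p_min` are hypothetical names.

```python
import numpy as np

def pocs_reconstruct(d, mask, n_iter=100, p_max=0.99, p_min=0.001):
    """POCS interpolation of missing samples with an exponentially decaying threshold.

    d    : observed signal with zeros at missing positions
    mask : 1.0 where a sample was recorded, 0.0 where it is missing
    Threshold at iteration k: tau_k = c_max * p_max * (p_min / p_max) ** (k / (n_iter - 1)),
    where c_max is the largest Fourier-coefficient magnitude of d.
    """
    c_max = np.abs(np.fft.fft(d)).max()
    x = d.copy()
    for k in range(n_iter):
        tau = c_max * p_max * (p_min / p_max) ** (k / (n_iter - 1))
        C = np.fft.fft(x)
        C[np.abs(C) < tau] = 0.0                   # hard thresholding in the transform domain
        x_hat = np.real(np.fft.ifft(C))
        x = mask * d + (1.0 - mask) * x_hat        # keep recorded samples, fill the gaps
    return x
```

Starting with a high threshold lets only the strongest events through first; lowering it gradually admits weaker coefficients once the dominant ones have been restored.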
INTERPOLATION TECHNIQUE FOR SPARSE DATA BASED ON INFORMATION DIFFUSION PRINCIPLE-ELLIPSE MODEL (cited by 1)
14
Authors: 张韧, 黄志松, 李佳讯, 刘巍. Journal of Tropical Meteorology (SCIE), 2013, No. 1, pp. 59-66 (8 pages).
To address the difficulties posed by scattered and sparse observational data in ocean science, a new interpolation technique based on information diffusion is proposed in this paper. Based on a fuzzy mapping idea, sparse data samples are diffused and mapped into corresponding fuzzy sets in the form of probabilities in an interpolation ellipse model. To overcome the shortcoming of the normal diffusion function for asymmetric structures, an asymmetric information diffusion function is developed, and a corresponding algorithm, the ellipse model for the diffusion of asymmetric information, is established. Through interpolation experiments and comparative analysis of sea surface temperature data with ARGO data, the rationality and validity of the ellipse model are assessed.
Keywords: information diffusion; interpolation algorithm; sparse data; ellipse model
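A simple way to picture elliptical information diffusion is kernel-weighted averaging with an anisotropic Gaussian: each observation spreads its value further along the ellipse's major axis than along its minor axis. The sketch assumes a symmetric Gaussian diffusion function with semi-axes `a`, `b` and rotation `theta` (all hypothetical parameters); the paper's asymmetric diffusion function and fuzzy-set mapping are not reproduced, and `ellipse_diffusion_interp` is a hypothetical name.

```python
import numpy as np

def ellipse_diffusion_interp(xy_obs, v_obs, xy_query, a=2.0, b=1.0, theta=0.0):
    """Weighted-average interpolation with an elliptical Gaussian diffusion kernel.

    Each observation diffuses its value to a query point with weight
    exp(-0.5 * ((du / a)^2 + (dv / b)^2)), where (du, dv) is the offset between
    them expressed in the ellipse's principal axes (rotated by theta).
    """
    c, s = np.cos(theta), np.sin(theta)
    d = xy_query[:, None, :] - xy_obs[None, :, :]        # offsets, shape (n_query, n_obs, 2)
    du = c * d[..., 0] + s * d[..., 1]                   # component along the major axis
    dv = -s * d[..., 0] + c * d[..., 1]                  # component along the minor axis
    w = np.exp(-0.5 * ((du / a) ** 2 + (dv / b) ** 2))   # diffusion weights, (n_query, n_obs)
    return (w @ v_obs) / w.sum(axis=1)
```

If every observation is far from a query point relative to `a` and `b`, all weights can underflow to zero and the result becomes NaN, so the semi-axes should be chosen relative to the station spacing.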
Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning (cited by 3)
15
Authors: Shuo Zheng, Yu-Xin Zhu, Dian-Qing Li, Zi-Jun Cao, Qin-Xuan Deng, Kok-Kwang Phoon. Geoscience Frontiers (SCIE, CAS, CSCD), 2021, No. 1, pp. 425-439 (15 pages).
Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and data sparsity introduces statistical uncertainty. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance and identifies as outliers those data instances whose outlying probabilities are greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers with a re-sampling technique and rationally accounts for the statistical uncertainty with Bayesian machine learning. Moreover, the proposed approach also provides a dedicated method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that it properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It was also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
Keywords: outlier detection; site investigation; sparse multivariate data; Mahalanobis distance; resampling by half-means; Bayesian machine learning
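The resampling part of the approach can be approximated without the Bayesian machinery: estimate the mean and covariance on random halves of the data, compute every instance's squared Mahalanobis distance, and report how often it exceeds a cutoff. This frequency is only a stand-in for the paper's Bayesian outlying probability; `outlying_probability` and the cutoff (for example, the 97.5% chi-square quantile with d degrees of freedom, about 9.35 for d = 3) are our assumptions.

```python
import numpy as np

def outlying_probability(X, cutoff, n_resample=200, seed=0):
    """Fraction of half-samples in which each row's squared Mahalanobis distance exceeds cutoff.

    The mean and covariance are re-estimated from random halves of the data, so a
    single outlier cannot distort them in every resample (cf. resampling by half-means).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    hits = np.zeros(n)
    for _ in range(n_resample):
        half = rng.choice(n, size=n // 2, replace=False)
        mu = X[half].mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X[half], rowvar=False))
        D = X - mu
        d2 = np.einsum("ij,jk,ik->i", D, cov_inv, D)   # squared Mahalanobis distance of every row
        hits += d2 > cutoff
    return hits / n_resample
```

Following the abstract's rule, instances whose returned value exceeds 0.5 would be flagged as outliers.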
Sparse Seismic Data Reconstruction Based on a Convolutional Neural Network Algorithm (cited by 1)
16
Authors: HOU Xinwei, TONG Siyou, WANG Zhongcheng, XU Xiugang, PENG Yin, WANG Kai. Journal of Ocean University of China (SCIE, CAS, CSCD), 2023, No. 2, pp. 410-418 (9 pages).
At present, seismic data acquisition is developing toward high-precision and high-density methods. However, complex natural environments and cultural factors in many exploration areas make uniform and dense acquisition difficult, so complete seismic data cannot be collected. Therefore, data reconstruction is required during processing to ensure imaging accuracy. Deep learning, a rapidly developing field, has clear advantages in feature extraction and modeling. In this study, a convolutional neural network deep learning algorithm is applied to seismic data reconstruction. Based on the convolutional neural network algorithm and the characteristics of seismic data acquisition, two training strategies, supervised and unsupervised learning, are designed to reconstruct sparsely acquired seismic records. First, a supervised learning strategy is proposed for labeled data, in which the complete seismic data are segmented as the input of the training set and randomly sampled before each training run, thereby increasing the number of samples and the richness of features. Second, an unsupervised learning strategy based on large samples is proposed for unlabeled data, and a rolling segmentation method is used to update (pseudo-)labels and training parameters during training. Reconstruction tests on simulated and actual data show that the convolutional-neural-network-based deep learning algorithm achieves better reconstruction quality and higher accuracy than compressed sensing based on the curvelet transform.
Keywords: deep learning; convolutional neural network; seismic data reconstruction; compressed sensing; sparse collection; supervised learning; unsupervised learning
Generalized Functional Linear Models: Efficient Modeling for High-dimensional Correlated Mixture Exposures
17
Authors: Bingsong Zhang, Haibin Yu, Xin Peng, Haiyi Yan, Siran Li, Shutong Luo, Renhuizi Wei, Zhujiang Zhou, Yalin Kuang, Yihuan Zheng, Chulan Ou, Linhua Liu, Yuehua Hu, Jindong Ni. Biomedical and Environmental Sciences, 2025, No. 8, pp. 961-976 (16 pages).
Objective: Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health. Analysis of these mixture exposures presents several key challenges for environmental epidemiology and risk assessment, including high dimensionality, correlated exposures, and subtle individual effects. Methods: We proposed a novel statistical approach, the generalized functional linear model (GFLM), to analyze the health effects of exposure mixtures. GFLM treats the effect of mixture exposures as a smooth function by reordering exposures based on specific mechanisms and capturing their internal correlations, providing meaningful estimation and interpretation. Its robustness and efficiency were evaluated under various scenarios through extensive simulation studies. Results: We applied the GFLM to two datasets from the National Health and Nutrition Examination Survey (NHANES). In the first application, we examined the effects of 37 nutrients on BMI (2011–2016 cycles). The GFLM identified a significant mixture effect, with fiber and fat emerging as the nutrients with the greatest negative and positive effects on BMI, respectively. In the second application, we investigated the association between four per- and polyfluoroalkyl substances (PFAS) and gout risk (2007–2018 cycles). Unlike traditional methods, the GFLM indicated no significant association, demonstrating its robustness to multicollinearity. Conclusion: The GFLM framework is a powerful tool for mixture exposure analysis, offering improved handling of correlated exposures and interpretable results. It demonstrates robust performance across various scenarios and real-world applications, advancing our understanding of complex environmental exposures and their health impacts in environmental epidemiology and toxicology.
Keywords: mixture exposure modeling; functional data analysis; high-dimensional data; correlated exposures; environmental epidemiology
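The core reduction behind a functional linear model can be shown in a few lines: if the ordered exposures sit at positions t_j and their effects follow a smooth curve β(t) = Σ_k c_k φ_k(t), then Σ_j β(t_j) X_ij = (XΦ)c, so p correlated exposure effects collapse to a handful of basis coefficients. The sketch assumes an identity link, a polynomial basis, and equally spaced t_j; the paper's basis, exposure ordering, and generalized (non-Gaussian) links are not reproduced, and `fit_functional_linear` is a hypothetical name.

```python
import numpy as np

def fit_functional_linear(X, y, degree=3):
    """Gaussian (identity-link) functional linear model with a polynomial coefficient basis.

    Exposures are assumed to be already ordered; exposure j sits at t_j = j / (p - 1),
    and beta(t) = sum_k c_k t^k, so the p exposure effects reduce to degree + 1 unknowns.
    """
    n, p = X.shape
    t = np.linspace(0.0, 1.0, p)
    Phi = np.vander(t, degree + 1, increasing=True)   # basis values at t_j, shape (p, degree + 1)
    Z = np.column_stack([np.ones(n), X @ Phi])        # intercept + reduced design matrix
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    alpha, c = coef[0], coef[1:]
    return alpha, Phi @ c                             # intercept and beta(t_j) for every exposure
```

For a binary outcome such as gout, the same reduced design `Z` would be passed to a logistic-regression fitter instead of least squares.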
A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
18
Authors: 李文法, Wang Gongming, Ma Nan, Liu Hongzhe. High Technology Letters (EI, CAS), 2016, No. 3, pp. 241-247 (7 pages).
Problems exist in similarity measurement and index tree construction that affect the performance of nearest neighbor search in high-dimensional data. The equidistance problem is solved by using the NPsim function to calculate similarity, and a sequential NPsim matrix is built to improve indexing performance. Combining these innovations, a nearest neighbor search algorithm for high-dimensional data based on the sequential NPsim matrix is proposed and compared with nearest neighbor search algorithms based on the KD-tree or SR-tree on the Munsell spectral dataset. Experimental results show that the similarity of the proposed algorithm is better than that of the other algorithms and that its search speed is thousands of times faster than theirs. In addition, the slow construction of the sequential NPsim matrix can be accelerated by parallel computing.
Keywords: nearest neighbor search; high-dimensional data; similarity; indexing tree; NPsim; KD-tree; SR-tree; Munsell
A State-Migration Particle Swarm Optimizer for Adaptive Latent Factor Analysis of High-Dimensional and Incomplete Data
19
Authors: Jiufang Chen, Kechen Liu, Xin Luo, Ye Yuan, Khaled Sedraoui, Yusuf Al-Turki, MengChu Zhou. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 11, pp. 2220-2235 (16 pages).
High-dimensional and incomplete (HDI) matrices are commonly generated in all kinds of big-data-related practical applications. A latent factor analysis (LFA) model can perform efficient representation learning on an HDI matrix, and its hyper-parameter adaptation can be implemented through a particle swarm optimizer (PSO) to meet scalability requirements. However, conventional PSO suffers from premature convergence, which leads to accuracy loss in the resulting LFA model. To address this issue, this study merges the information of each particle's state migration into its evolution process, following the principle of a generalized momentum method, to improve its search ability, thereby building a state-migration particle swarm optimizer (SPSO) whose theoretical convergence is rigorously proved in this study. SPSO is then incorporated into an LFA model to implement efficient hyper-parameter adaptation without accuracy loss. Experiments on six HDI matrices indicate that an SPSO-incorporated LFA model outperforms state-of-the-art LFA models in terms of prediction accuracy for the missing data of an HDI matrix, with competitive computational efficiency. Hence, SPSO ensures efficient and reliable hyper-parameter adaptation in an LFA model, thus ensuring practicality and accurate representation learning for HDI matrices.
Keywords: data science; generalized momentum; high-dimensional and incomplete (HDI) data; hyper-parameter adaptation; latent factor analysis (LFA); particle swarm optimization (PSO)
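To illustrate the general idea of adding momentum from a particle's previous state change, here is a global-best PSO sketch whose position update includes an extra term γ(x_t − x_{t−1}). This is our hypothetical reading of "state migration", not the published SPSO update rule; the parameter values (w = 0.5, c1 = c2 = 1.5, γ = 0.2) are illustrative defaults, and `momentum_pso` is a hypothetical name.

```python
import numpy as np

def momentum_pso(f, dim, n_particles=20, n_iter=200, w=0.5, c1=1.5, c2=1.5, gamma=0.2,
                 bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO whose position update adds a momentum term gamma * (x_t - x_{t-1})."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    x_prev = x.copy()
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])   # personal bests
    g = pbest[pbest_val.argmin()].copy()                       # global best
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        # hypothetical momentum: reuse the particle's last position change
        x_new = np.clip(x + v + gamma * (x - x_prev), lo, hi)
        x_prev, x = x, x_new
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())
```

In an LFA setting, `f` would train the latent factor model with the candidate hyper-parameters (for example, learning rate and regularization coefficient) and return a validation error.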