Markowitz Portfolio theory underestimates the risk associated with the return of a portfolio in the case of high-dimensional data. El Karoui mathematically proved this in [1] and suggested improved estimators for unbiased estimation of this risk under specific model assumptions. Norm-constrained portfolios have recently been studied to keep the effective dimension low. In this paper we consider three sets of high-dimensional data: the stock market prices for three countries, namely the US, the UK and India. We compare the Markowitz efficient frontier to those obtained by unbiasedness corrections and by imposing norm constraints in these real-data scenarios. We also study the out-of-sample performance of the different procedures. We find that the 2-norm constrained portfolio has the best overall performance.
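As a concrete illustration of the norm-constrained approach, the sketch below contrasts the classical minimum-variance Markowitz solution with a 2-norm constrained variant on synthetic returns. The synthetic data, the bound of 0.3, and the use of scipy's general-purpose constrained solver are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(250, 100))   # synthetic daily returns, p=100 assets
Sigma = np.cov(R, rowvar=False)                  # sample covariance (noisy when p ~ n)

def min_variance(Sigma, norm_bound=None):
    """Minimum-variance weights; optionally impose ||w||_2 <= norm_bound."""
    p = Sigma.shape[0]
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]  # fully invested
    if norm_bound is not None:
        cons.append({"type": "ineq",
                     "fun": lambda w: norm_bound - np.linalg.norm(w)})
    res = minimize(lambda w: w @ Sigma @ w, np.full(p, 1.0 / p),
                   constraints=cons)
    return res.x

w_plain = min_variance(Sigma)                 # classical Markowitz solution
w_norm = min_variance(Sigma, norm_bound=0.3)  # 2-norm cap keeps effective dimension low
print(w_plain @ Sigma @ w_plain, w_norm @ Sigma @ w_norm)
```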
To solve the high-dimensionality issue and improve accuracy in credit risk assessment, a high-dimensionality-trait-driven learning paradigm is proposed for feature extraction and classifier selection. The proposed paradigm consists of three main stages: categorization of high-dimensional data, high-dimensionality-trait-driven feature extraction, and high-dimensionality-trait-driven classifier selection. In the first stage, according to the definition of high dimensionality and the relationship between sample size and feature dimensions, the high-dimensionality traits of credit datasets are categorized into two types: 100 < feature dimensions < sample size, and feature dimensions ≥ sample size. In the second stage, some typical feature extraction methods are tested on the two categories of high dimensionality. In the final stage, four types of classifiers are applied to evaluate credit risk under the different high-dimensionality traits. For the purpose of illustration and verification, credit classification experiments are performed on two publicly available credit risk datasets, and the results show that the proposed high-dimensionality-trait-driven learning paradigm for feature extraction and classifier selection is effective in handling high-dimensional credit classification issues and improving credit classification accuracy relative to the benchmark models listed in this study.
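The first-stage categorization reduces to a simple rule on sample size and feature dimensions; a minimal sketch, with the threshold of 100 taken directly from the text:

```python
def high_dim_trait(n_samples: int, n_features: int) -> str:
    """Categorize a credit dataset by its high-dimensionality trait,
    following the two categories described above."""
    if n_features >= n_samples:
        return "type II: feature dimensions >= sample size"
    if 100 < n_features < n_samples:
        return "type I: 100 < feature dimensions < sample size"
    return "not high-dimensional under this definition"

print(high_dim_trait(n_samples=1000, n_features=300))  # type I
print(high_dim_trait(n_samples=200, n_features=500))   # type II
```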
Nonlinear transforms have significantly advanced learned image compression (LIC), particularly through residual blocks. This transform enhances nonlinear expressive ability and obtains a compact feature representation by enlarging the receptive field, which reflects how the convolution process extracts features in a high-dimensional feature space. However, its functionality is restricted to the spatial dimension and network depth, limiting further improvements in network performance due to insufficient information interaction and representation. Crucially, the potential of high-dimensional feature space in the channel dimension and the exploration of network width/resolution remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating their specific effects. Firstly, we introduce dimension-increasing and dimension-decreasing transforms in both channel and spatial dimensions to obtain a high-dimensional feature space and achieve better feature extraction. Secondly, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform to obtain a slim architecture (CSR-sm), balancing network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method achieves superior rate-distortion performance compared to existing LIC methods and traditional codecs. Specifically, our proposed method achieves a 9.38% BD-rate reduction over VVC on the Kodak dataset.
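To make the channel/spatial split concrete, here is a toy PyTorch residual transform that expands and reduces the feature space along both dimensions before fusing with a skip connection. It illustrates the idea only and is not the paper's CSR block; the layer sizes, activations, and the down/upsampling choice are assumptions.

```python
import torch
import torch.nn as nn

class ChannelSpatialResidual(nn.Module):
    """Toy residual transform: dimension-increasing then dimension-decreasing
    in the channel dimension (1x1 convs), plus down/upsampling in the spatial
    dimension, fused with a skip connection. Illustrative sketch only."""
    def __init__(self, c: int, expand: int = 2):
        super().__init__()
        self.channel = nn.Sequential(
            nn.Conv2d(c, c * expand, 1), nn.GELU(),   # expand channels
            nn.Conv2d(c * expand, c, 1))              # reduce channels
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.GELU(),       # downsample
            nn.ConvTranspose2d(c, c, 4, stride=2, padding=1))         # upsample
    def forward(self, x):
        return x + self.channel(x) + self.spatial(x)  # fuse both branches

x = torch.randn(1, 64, 32, 32)
print(ChannelSpatialResidual(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```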
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high-dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in high-dimensional large datasets efficiently and handles noise effectively.
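A minimal sketch of the idea: summarize a set of binary objects by the features they all share and the features only some of them have, then use that summary to decide cluster membership in a single scan. The dissimilarity formula below is one common formulation of the sparse-feature measure and may not match the paper's exact definition; the threshold is arbitrary.

```python
def sfv(objects):
    """Sparse Feature Vector summary of a set of binary sparse objects, each
    given as a set of feature indices: (count, features shared by all,
    features occurring in some but not all)."""
    n = len(objects)
    shared = set.intersection(*objects)
    occurring = set.union(*objects)
    return n, shared, occurring - shared

def sfd(objects):
    """Sparse feature dissimilarity of a set (one common formulation:
    |not-shared| / (n * |shared|)); smaller means a tighter cluster."""
    n, shared, not_shared = sfv(objects)
    return len(not_shared) / (n * len(shared)) if shared else float("inf")

# One-scan clustering: add each object to the first cluster whose
# dissimilarity stays below a threshold, otherwise open a new cluster.
clusters, threshold = [], 0.5
for obj in [{1, 2, 3}, {1, 2, 4}, {7, 8}, {7, 8, 9}]:
    for c in clusters:
        if sfd(c + [obj]) <= threshold:
            c.append(obj)
            break
    else:
        clusters.append([obj])
print(clusters)  # [[{1,2,3}, {1,2,4}], [{7,8}, {7,8,9}]]
```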
Clustering high-dimensional data is challenging because increasing dimensionality inflates the distances between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review of subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focuses on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters from outliers using projected micro-clusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time-fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
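The time-fading mechanism mentioned above is typically an exponential decay of micro-cluster weights; a small sketch follows. The form 2^(-λ·Δt) and the decay rate are standard choices in the stream-clustering literature, not values taken from the reviewed papers.

```python
def fade(weight: float, dt: float, lam: float = 0.25) -> float:
    """Exponential time-fading function common in stream clustering,
    f(dt) = 2**(-lam*dt): a micro-cluster's contribution decays as time
    passes without new points, so stale clusters and outliers vanish."""
    return weight * 2.0 ** (-lam * dt)

w = 10.0
for step in range(5):
    print(f"t+{step}: weight={fade(w, step):.3f}")
```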
In this paper a high-dimensional multiparty quantum secret sharing scheme is proposed using Einstein-Podolsky-Rosen pairs and local unitary operators. This scheme has the advantage of not only having higher capacity, but also saving storage space. A security analysis is also given.
Information analysis of high-dimensional data was carried out through the application of a similarity measure. High-dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, overlapped data comparison could express similarity with the conventional similarity measure, while non-overlapped data similarity analysis provided the clue to solving the similarity of high-dimensional data. The high-dimensional data analysis was designed with consideration of neighborhood information, and both conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented, and with the proposed similarity measure, high-dimensional personal data were evaluated for how similar they are to financial fraud. Calculation results show that actual fraud has a rather high similarity measure compared to the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.
Three high-dimensional spatial standardization algorithms are used for diffusion tensor image (DTI) registration, and seven kinds of methods are used to evaluate their performance. Firstly, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high-dimensional standardization algorithms for diffusion tensor images, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA-based elastic registration algorithm and a tensor-based registration algorithm, were performed. Finally, seven kinds of evaluation methods, including normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, and Euclidean distance of the deviatoric tensor and the deviatoric of tensors, were used to qualitatively compare and summarize the above standardization algorithms. Experimental results revealed that the high-dimensional tensor-based standardization algorithm performs well and can maintain the consistency of anatomical structures.
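Two of these evaluation measures admit compact definitions. The sketch below assumes the standard conventions (Frobenius-norm distance between 3x3 diffusion tensors, and the deviatoric as the trace-free part of a tensor); the paper's exact implementations may differ.

```python
import numpy as np

def tensor_distance(d1: np.ndarray, d2: np.ndarray) -> float:
    """Euclidean distance between two 3x3 diffusion tensors:
    the Frobenius norm of their difference."""
    return float(np.linalg.norm(d1 - d2, ord="fro"))

def deviatoric(d: np.ndarray) -> np.ndarray:
    """Deviatoric (anisotropic) part of a tensor: remove the isotropic
    component, i.e. D - (trace(D)/3) * I."""
    return d - (np.trace(d) / 3.0) * np.eye(3)

d1 = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # prolate tensor (white-matter-like)
d2 = np.diag([1.0e-3, 1.0e-3, 1.0e-3])   # isotropic tensor
print(tensor_distance(d1, d2))                        # full-tensor distance
print(tensor_distance(deviatoric(d1), deviatoric(d2)))  # deviatoric distance
```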
We deal with the boundedness of solutions to a class of fully parabolic quasilinear repulsion chemotaxis systems {ut = ∇·(ϕ(u)∇u) + ∇·(ψ(u)∇v), (x,t) ∈ Ω×(0,T); vt = Δv − v + u, (x,t) ∈ Ω×(0,T)}, under homogeneous Neumann boundary conditions in a smooth bounded domain Ω ⊂ R^N (N ≥ 3), where 0 < ψ(u) ≤ K(u+1)^α and K1(s+1)^m ≤ ϕ(s) ≤ K2(s+1)^m with α, K, K1, K2 > 0 and m ∈ R. It is shown that if α − m < 4/(N+2), then for any sufficiently smooth initial data, the classical solutions to the system are uniformly-in-time bounded. This extends the known result for the corresponding model with linear diffusion.
In this paper, the global controllability for a class of high-dimensional polynomial systems is investigated and a constructive algebraic criterion algorithm for their global controllability is obtained. By the criterion algorithm, global controllability can be determined in a finite number of arithmetic operations. The algorithm is imposed on the coefficients of the polynomials only, and the analysis technique is based on the Sturm Theorem in real algebraic geometry and its modern developments. Finally, some examples are given to show the application of our results.
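The Sturm-theorem machinery underlying such a criterion is mechanical: a Sturm sequence counts the distinct real roots of a polynomial in an interval purely from sign changes of the sequence's values, i.e. by arithmetic on the coefficients. A small sketch using sympy's built-in Sturm sequence; the polynomial is an arbitrary example, not one from the paper.

```python
import sympy as sp

x = sp.symbols("x")
p = sp.Poly(x**3 - 3*x + 1, x)
seq = sp.sturm(p)  # Sturm sequence: p, p', then negated polynomial remainders

def sign_changes(vals):
    """Count sign changes across a list, ignoring zeros."""
    signs = [v > 0 for v in vals if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Sturm's theorem: the number of distinct real roots in (a, b] equals the
# drop in sign changes of the sequence evaluated at a and at b.
a, b = -10, 10
va = [sp.Poly(q, x).eval(a) for q in seq]
vb = [sp.Poly(q, x).eval(b) for q in seq]
print(sign_changes(va) - sign_changes(vb))  # 3: all roots of x^3 - 3x + 1 are real
```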
Nonconvex penalties, including the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), enjoy the properties of unbiasedness, continuity and sparsity, and ridge regression can deal with the collinearity problem. Combining the strengths of nonconvex penalties and ridge regression (abbreviated as NPR), we study the oracle property of the NPR estimator in high-dimensional settings with highly correlated predictors, where the dimensionality of covariates p_n is allowed to increase exponentially with the sample size n. Simulation studies and a real-data example are presented to verify the performance of the NPR method.
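For concreteness, the sketch below codes the standard MCP definition and an NPR-style composite objective (least squares plus concave penalty plus a ridge term). The choice of MCP over SCAD, the gain values, and the exact weighting are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def mcp(t: np.ndarray, lam: float, gamma: float = 3.0) -> np.ndarray:
    """Minimax concave penalty: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam,
    then constant at gamma*lam^2/2, so large coefficients are not shrunk
    (unbiasedness), unlike the lasso."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def npr_objective(beta, X, y, lam, ridge):
    """Sketch of an NPR-type objective: least squares + concave penalty
    + a ridge term to stabilize highly correlated predictors."""
    n = len(y)
    return (np.sum((y - X @ beta) ** 2) / (2 * n)
            + mcp(beta, lam).sum()
            + 0.5 * ridge * np.sum(beta**2))
```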
Dear Editor, This letter presents a novel latent factorization model for high-dimensional and incomplete (HDI) tensors, namely the neural Tucker factorization (NeuTucF), which is a generic neural network-based latent-factorization-of-tensors model under the Tucker decomposition framework.
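As background, the Tucker decomposition approximates a tensor by a small core multiplied by a factor matrix along each mode. The numpy sketch below shows the multilinear map and how a single observed entry is predicted, which is the operation a neural variant such as NeuTucF would learn from observed entries; the dimensions are arbitrary, and the neural components themselves are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tucker model: X ≈ G x1 A x2 B x3 C, a small core tensor G multiplied
# by a factor matrix along each mode.
G = rng.normal(size=(3, 4, 2))   # core tensor
A = rng.normal(size=(30, 3))     # mode-1 factors
B = rng.normal(size=(40, 4))     # mode-2 factors
C = rng.normal(size=(20, 2))     # mode-3 factors

X = np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)   # full reconstruction
print(X.shape)  # (30, 40, 20)

# For an HDI tensor, only observed entries (i, j, k) are compared with the
# model; each prediction needs just one row of each factor matrix.
idx = (4, 7, 1)
x_hat = np.einsum("pqr,p,q,r->", G, A[idx[0]], B[idx[1]], C[idx[2]])
print(np.isclose(x_hat, X[idx]))  # True
```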
A large-scale dynamically weighted directed network (DWDN) involving numerous entities and massive dynamic interactions is an essential data source in many big-data-related applications, such as a terminal interaction pattern analysis system (TIPAS). It can be represented by a high-dimensional and incomplete (HDI) tensor whose entries are mostly unknown. Yet such an HDI tensor contains a wealth of knowledge regarding various desired patterns, like potential links in a DWDN. A latent factorization-of-tensors (LFT) model proves to be highly efficient in extracting such knowledge from an HDI tensor, which is commonly achieved via a stochastic gradient descent (SGD) solver. However, an SGD-based LFT model suffers from slow convergence that impairs its efficiency on large-scale DWDNs. To address this issue, this work proposes a proportional-integral-derivative (PID)-incorporated LFT model. It constructs an adjusted instance error based on the PID control principle, and then substitutes it into an SGD solver to improve the convergence rate. Empirical studies on two DWDNs generated by a real TIPAS show that, compared with state-of-the-art models, the proposed model achieves significant efficiency gains as well as highly competitive prediction accuracy when handling the task of missing link prediction for a given DWDN.
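The PID adjustment is a per-instance transformation of the training error before the gradient step; a minimal sketch follows. The gain values are illustrative, not those used in the paper.

```python
class PIDError:
    """Adjusted instance error following the PID control principle: combine
    the current error (P), its accumulation (I), and its change (D)."""
    def __init__(self, kp=1.0, ki=0.01, kd=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.acc, self.prev = 0.0, 0.0

    def adjust(self, err: float) -> float:
        self.acc += err  # integral term accumulates past errors
        adjusted = (self.kp * err
                    + self.ki * self.acc
                    + self.kd * (err - self.prev))  # derivative term
        self.prev = err
        return adjusted

# In an SGD-based LFT solver, the raw instance error on an observed entry
# would be replaced by pid.adjust(error) before the gradient update.
pid = PIDError()
for raw in [0.8, 0.6, 0.5, 0.45]:
    print(pid.adjust(raw))
```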
The double suppression division algorithm of the bee colony struggles to handle tasks with spatio-temporal coupling or higher-dimensional attributes, or to undertake sudden tasks. Using the idea of clustering, after tasks are clustered according to their spatio-temporal attributes, the clustered groups are linked into task sub-chains according to similarity. Then, based on the correlation between clusters, the sub-chains are connected to form a task chain. This resolves the limitation that the task chain in the bee colony algorithm can only be connected along one dimension. When a sudden task occurs, a method of inserting a small number of tasks into the original task chain and a task chain reconstruction method are designed according to the relative relationship between the number of sudden tasks and the number of remaining tasks. Through the above improvements, the algorithm can process tasks with spatio-temporal coupling as well as burst tasks. To demonstrate the efficiency and applicability of the algorithm, a task allocation model for the unmanned aerial vehicle (UAV) group is constructed, and a one-to-one correspondence between the improved bee colony double suppression division algorithm and each attribute in the UAV group is proposed to construct the task assignment. The study uses the self-adjusting characteristics of the bee colony to achieve task allocation. Simulation verification and algorithm comparison show that the algorithm has stronger planning advantages and better algorithm performance.
Feature selection is an important problem in pattern classification systems. The high-dimension Fisher criterion (HDF) is a good indicator of class separability; however, calculating the high-dimension Fisher ratio is difficult. A new feature selection method, called fisher-and-correlation (FC), is proposed. The proposed method combines the Fisher criterion and a correlation criterion based on the analysis of feature relevance and redundancy. The proposed methodology is tested in five different classification applications. The presented results confirm that FC performs as well as HDF does at much lower computational complexity.
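The sketch below illustrates a fisher-and-correlation style selection: rank features by the two-class Fisher ratio (relevance), then skip features too correlated with already selected ones (redundancy). The exact FC criterion in the paper may differ; the redundancy cutoff and the synthetic data are assumptions.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher ratio for two classes: (mu1 - mu2)^2 / (var1 + var2)."""
    X1, X2 = X[y == 0], X[y == 1]
    return (X1.mean(0) - X2.mean(0)) ** 2 / (X1.var(0) + X2.var(0) + 1e-12)

def fc_select(X, y, k, redundancy=0.9):
    """Take features in decreasing Fisher score, skipping any whose absolute
    correlation with an already selected feature exceeds the cutoff."""
    order = np.argsort(fisher_score(X, y))[::-1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    chosen = []
    for j in order:
        if all(corr[j, s] < redundancy for s in chosen):
            chosen.append(j)
        if len(chosen) == k:
            break
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X[y == 1, 0] += 2.0                               # make feature 0 discriminative
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)   # redundant copy of feature 0
print(fc_select(X, y, k=3))  # picks feature 0 but skips its near-duplicate 1
```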
Traditional machine-learning algorithms are struggling to handle the exceedingly large amount of data being generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients, and recently it has aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and application subsystems. The complete system enables a variety of real-world web-mining applications for mass data.
Machine learning-assisted prediction of polymer properties prior to synthesis has the potential to significantly accelerate the discovery and development of new polymer materials. To date, several approaches have been implemented to represent chemical structure in machine learning models, among which Mol2Vec embeddings have attracted considerable attention in the cheminformatics community since their introduction in 2018. However, for small datasets, the use of chemical structure representations typically increases the dimensionality of the input dataset, resulting in a decrease in model performance. Furthermore, the limited diversity of polymer chemical structures hinders the training of reliable embeddings, necessitating complex task-specific architecture implementations. To address these challenges, we examined the efficacy of Mol2Vec pre-trained embeddings in deriving vectorized representations of polymers. This study assesses the impact of incorporating Mol2Vec compound vectors into the input features on the efficacy of a model relying on the physical properties of 214 polymers. The results will hopefully highlight the potential for improving prediction accuracy in polymer studies by incorporating pre-trained embeddings, or promote their utilization when dealing with modestly sized polymer databases.
This paper concerns computational problems of the concave penalized linear regression model. We propose a fixed point iterative algorithm to solve the computational problem, based on the fact that the penalized estimator satisfies a fixed point equation. The convergence property of the proposed algorithm is established. Numerical studies are conducted to evaluate the finite sample performance of the proposed algorithm.
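The fixed-point view is that the estimator is a point β satisfying β = T(β + step·Xᵀ(y − Xβ)) for a penalty-specific thresholding map T, so one can simply iterate that map until it stops moving. The sketch below uses soft-thresholding as a stand-in for a concave-penalty thresholding rule; the paper's operator, step-size choice, and convergence conditions differ.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator (stand-in for a concave-penalty rule
    such as SCAD's or MCP's thresholding operator)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fixed_point_solve(X, y, lam, tol=1e-8, max_iter=10_000):
    """Iterate the fixed point map beta -> T(beta + step * X'(y - X beta))."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # keeps the iteration stable
    beta = np.zeros(p)
    for _ in range(max_iter):
        new = soft(beta + step * X.T @ (y - X @ beta), step * lam)
        if np.linalg.norm(new - beta) < tol:  # reached the fixed point
            break
        beta = new
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))               # p > n setting
beta_true = np.zeros(200); beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.normal(size=100)
print(np.nonzero(fixed_point_solve(X, y, lam=5.0))[0][:10])
```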
The current study proposes a novel technique for feature selection by inculcating robustness into the conventional signal-to-noise ratio (SNR). The proposed method utilizes robust measures of location, i.e., the median, as well as robust measures of variation, i.e., the median absolute deviation (MAD) and interquartile range (IQR), in the SNR. In this way, two independent robust signal-to-noise ratios have been proposed. The proposed method selects the most informative genes/features by combining the minimum subset of genes or features obtained via a greedy search approach with top-ranked genes selected through the robust signal-to-noise ratio (RSNR). The results obtained via the proposed method are compared with well-known gene/feature selection methods on the basis of a performance metric, i.e., classification error rate. A total of 5 gene expression datasets have been used in this study. Different subsets of informative genes are selected by the proposed and all the other methods included in the study, and their efficacy in terms of classification is investigated by using classifier models such as support vector machine (SVM), random forest (RF) and k-nearest neighbors (k-NN). The results of the analysis reveal that the proposed method (RSNR) produces lower error rates than all the other competing feature selection methods in the majority of cases. For further assessment of the method, a detailed simulation study is also conducted.
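Assuming the conventional two-class SNR |μ1 − μ2| / (σ1 + σ2) with the robust substitutes named above, the two ratios can be sketched as follows; the paper's exact definitions may differ in scaling.

```python
import numpy as np
from scipy import stats

def rsnr_mad(x1, x2):
    """Robust SNR using medians and median absolute deviations:
    |median1 - median2| / (MAD1 + MAD2)."""
    num = abs(np.median(x1) - np.median(x2))
    return num / (stats.median_abs_deviation(x1) + stats.median_abs_deviation(x2))

def rsnr_iqr(x1, x2):
    """Robust SNR using medians and interquartile ranges."""
    num = abs(np.median(x1) - np.median(x2))
    return num / (stats.iqr(x1) + stats.iqr(x2))

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, 50)   # expression of one gene, class 1
g2 = rng.normal(1.5, 1.0, 50)   # same gene, class 2
g1[0] = 40.0                    # a single outlier barely moves the robust ratios
print(rsnr_mad(g1, g2), rsnr_iqr(g1, g2))
```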
This paper reviews the adaptive sparse grid discontinuous Galerkin (aSG-DG) method for computing high-dimensional partial differential equations (PDEs) and its software implementation. The C++ software package called AdaM-DG, implementing the aSG-DG method, is available on GitHub at https://github.com/JuntaoHuang/adaptive-multiresolution-DG. The package is capable of treating a large class of high-dimensional linear and nonlinear PDEs. We review the essential components of the algorithm and the functionality of the software, including the multiwavelets used, the assembling of bilinear operators, and the fast matrix-vector product for data with hierarchical structures. We further demonstrate the performance of the package by reporting the numerical error and the CPU cost for several benchmark tests, including linear transport equations, wave equations, and Hamilton-Jacobi (HJ) equations.