7 articles found
1. Similarity measure design for high dimensional data (cited by: 3)
Authors: LEE Sang-hyuk, YAN Sun, JEONG Yoon-su, SHIN Seung-soo. Journal of Central South University (SCIE, EI, CAS), 2014, No. 9, pp. 3534-3540 (7 pages)
Information analysis of high dimensional data was carried out through similarity measure application. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and similarity measure analysis was illustrated and compared with the conventional similarity measure. As a result, the similarity of overlapped data could be expressed with the conventional similarity measure. Non-overlapped data similarity analysis provided the clue to solving the similarity of high dimensional data. High dimensional data analysis was designed with consideration of neighborhood information. Conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they were to financial fraud. Calculation results show that actual fraud has a rather high similarity measure compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.
Keywords: high dimensional data; similarity measure; difference; neighborhood information; financial fraud
2. Asymptotic Independence of the Quadratic Form and Maximum of Independent Random Variables with Applications to High-Dimensional Tests
Authors: Da Chuan CHEN, Long FENG, De Cai LIANG. Acta Mathematica Sinica, English Series (SCIE, CSCD), 2024, No. 12, pp. 3093-3126 (34 pages)
This paper establishes the asymptotic independence between the quadratic form $z^{\top}Az$ and the maximum $\max_{1\le i\le p}|z_i|$ of a sequence of independent sub-Gaussian random variables $z=(z_1,\ldots,z_p)^{\top}$. Based on this theoretical result, we find the asymptotic joint distribution for the quadratic form and maximum, which can be applied to high-dimensional testing problems. By combining the sum-type test and the max-type test, we propose Fisher's combination tests for the one-sample mean test and the two-sample mean test. Under this novel general framework, several strong assumptions in the existing literature have been relaxed. Monte Carlo simulations show that our proposed tests are strongly robust to both sparse and dense data.
Keywords: asymptotic independence; high dimensional data; large p small n; one-sample test; two-sample test
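The combination step described in the abstract of entry 2 can be sketched as follows. This is a minimal illustration of the generic Fisher method for two independent p-values, not the authors' implementation; the function name `fisher_combine` and the arguments `p_sum` and `p_max` (standing for the p-values of a sum-type and a max-type test) are hypothetical. It assumes the two p-values are independent under the null, which is what the asymptotic independence result is meant to justify.

```python
import math


def fisher_combine(p_sum: float, p_max: float) -> float:
    """Combine two independent p-values with Fisher's method (hypothetical helper)."""
    if not (0.0 < p_sum <= 1.0 and 0.0 < p_max <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    # Fisher statistic: -2 * (log p1 + log p2); chi-square with 4 degrees
    # of freedom under the null when the two p-values are independent.
    stat = -2.0 * (math.log(p_sum) + math.log(p_max))
    # Closed-form survival function of chi-square(4): exp(-t/2) * (1 + t/2).
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
```

For example, `fisher_combine(0.05, 0.05)` returns a combined p-value below 0.05, reflecting that two moderately small independent p-values together give stronger evidence than either alone.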
3. Special Issue for "AI+BT for Big Clinical Omics Data"
Genomics, Proteomics & Bioinformatics, 2025, No. 1, p. I0018 (1 page)
The journal Genomics, Proteomics & Bioinformatics (GPB) invites leading scholars to contribute high-quality manuscripts for a special issue on "AI+BT for Big Clinical Omics Data" scheduled for publication in the Autumn of 2026. This special issue seeks submissions that focus on integrating artificial intelligence (AI) and biotechnologies (BT) to substantially improve the collection, modelling, analysis, and application of large-scale clinical omics data. The goal is to address the challenges posed by the high-dimensional and dynamic nature of big clinical omics data and explore their potential to advance the diagnosis and treatment of complex diseases.
Keywords: dynamic data; diagnosis; artificial intelligence (AI); complex diseases; big clinical omics data; biotechnology; high dimensional data
4. Uncovering the Pre-Deterioration State during Disease Progression Based on Sample-Specific Causality Network Entropy (SCNE)
Authors: Jiayuan Zhong, Hui Tang, Ziyi Huang, Hua Chai, Fei Ling, Pei Chen, Rui Liu. Research, 2025, No. 1, pp. 55-67 (13 pages)
Complex diseases do not always follow gradual progressions. Instead, they may experience sudden shifts known as critical states or tipping points, where a marked qualitative change occurs. Detecting such a pivotal transition or pre-deterioration state holds paramount importance due to its association with severe disease deterioration. Nevertheless, the task of pinpointing the pre-deterioration state for complex diseases remains an obstacle, especially in scenarios involving high-dimensional data with limited samples, where conventional statistical methods frequently prove inadequate. In this study, we introduce an innovative quantitative approach termed sample-specific causality network entropy (SCNE), which infers a sample-specific causality network for each individual and effectively quantifies the dynamic alterations in causal relations among molecules, thereby capturing critical points or pre-deterioration states of complex diseases. We substantiated the accuracy and efficacy of our approach via numerical simulations and by examining various real-world datasets, including single-cell data of epithelial cell deterioration (EPCD) in colorectal cancer, influenza infection data, and three different tumor cases from The Cancer Genome Atlas (TCGA) repositories. Compared with six other existing single-sample methods, our proposed approach exhibits superior performance in identifying critical signals or pre-deterioration states. Additionally, the efficacy of the computational findings is underscored by analyzing the functionality of signaling biomarkers.
Keywords: critical states; sample-specific causality network entropy; pre-deterioration state; complex diseases; tipping points; high dimensional data; numerical simulations
5. Geo-Coordinated Parallel Coordinates (GCPC): Field trial studies of environmental data analysis
Authors: Maha El Meseery, Orland Hoeber. Visual Informatics (EI), 2018, No. 2, pp. 111-124 (14 pages)
The large number of environmental problems faced by society in recent years has driven researchers to collect and study massive amounts of data in order to understand the complex relations that exist between people and the environment in which we live. Such datasets are often high dimensional and heterogeneous in nature, with complex geospatial relations. Analysing such data can be challenging, especially when there is a need to maintain spatial awareness as the non-spatial attributes are studied. Geo-Coordinated Parallel Coordinates (GCPC) is a geovisual analytics approach designed to support exploration and analysis within complex geospatial environmental data. Parallel coordinates are tightly coupled with a geospatial representation and an investigative scatterplot, all of which can be used to show, reorganize, filter, and highlight the high dimensional, heterogeneous, and geospatial aspects of the data. Two sets of field trials were conducted with expert data analysts to validate the real-world benefits of the approach for studying environmental data. The results of these evaluations were positive, providing real-world evidence and new insights regarding the value of using GCPC to explore environmental datasets when there is a need to remain aware of the geospatial aspects of the data as the non-spatial elements are studied.
Keywords: geovisual analytics; heterogeneous data visualization; high dimensional data visualization; field trial evaluations
6. Identifying the skeptics and the undecided through visual cluster analysis of local network geometry (cited by: 1)
Authors: Shenghui Cheng, Joachim Giesen, Tianyi Huang, Philipp Lucas, Klaus Mueller. Visual Informatics (EI), 2022, No. 3, pp. 11-22 (12 pages)
By skeptics and undecided we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters. Such nodes are typically found either at the interface between clusters (the undecided) or at their boundaries (the skeptics). Identifying these nodes is relevant in marketing applications like voter targeting, because the persons represented by such nodes are often more likely to be affected by marketing campaigns than those represented by nodes deep within clusters. So far this identification task is not as well studied as other network analysis tasks like clustering, identifying central nodes, and detecting motifs. We approach this task by deriving novel geometric features from the network structure that naturally lend themselves to an interactive visual approach for identifying interface and boundary nodes.
Keywords: graph/network data; high dimensional data visualization; visualization in social and information sciences; data clustering; coordinated and multiple views
7. Subspace clustering through attribute clustering
Authors: Kun NIU, Shubo ZHANG, Junliang CHEN. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2008, No. 1, pp. 44-48 (5 pages)
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast algorithm for subspace clustering using attribute clustering is proposed to overcome these limitations. This algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation of every two non-redundant attributes, the relation matrix of non-redundant attributes is constructed based on the relation function of two-dimensional united Gini coefficients. After applying an overlapping clustering algorithm to the relation matrix, the candidate set of all interesting subspaces is obtained. Finally, all subspace clusters can be derived by clustering on the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and quality when finding subspace clusters, but is also insensitive to input parameters.
Keywords: subspace clustering; high dimensional data; attribute clustering
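The attribute-filtering step mentioned in the abstract of entry 7 might look roughly like the sketch below. The abstract does not define its Gini coefficient, so this is an assumption, not the authors' algorithm: it uses the Gini impurity 1 - sum(p_k^2) of each attribute over equal-width bins and treats attributes whose values are spread almost uniformly (high impurity) as redundant. The names `gini_impurity` and `keep_informative`, the bin count, and the threshold are all hypothetical.

```python
from collections import Counter


def gini_impurity(values, bins=4, lo=0.0, hi=1.0):
    """Gini impurity 1 - sum(p_k^2) of values over equal-width bins on [lo, hi]."""
    if not values:
        raise ValueError("values must be non-empty")
    width = (hi - lo) / bins
    # Map each value to a bin index, clamping out-of-range values to the edge bins.
    counts = Counter(
        max(0, min(int((v - lo) / width), bins - 1)) for v in values
    )
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def keep_informative(columns, threshold):
    """Return names of attributes whose impurity is below the (assumed) threshold."""
    return [name for name, vals in columns.items() if gini_impurity(vals) < threshold]
```

An attribute whose values all fall in one bin has impurity 0 and is kept, while one spread evenly over four bins has impurity 0.75 and would be dropped at a threshold of 0.7. The pairwise relation matrix and the overlapping clustering described in the abstract are not sketched here.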