Abstract: After a review of recent developments in precision medicine, population health sciences and innovative clinical trial designs, and health economics and policy, we show how innovations in health analytics can capitalize on advances in biomedicine and health economics to develop a data-driven, cost-effective 21st-century health care system. In particular, we propose a mutually beneficial public-private partnership that combines individual responsibility with community solidarity in building this health care system.
Funding: Supported by the Hong Kong Research Grant Committee (Grant Nos. HKUST6019/10P and HKUST6019/12P), the National Natural Science Foundation of China (Grant Nos. 10871146 and 11271286), and the National University of Singapore (Grant No. R-155-000-106-112).
Abstract: Let $X_1, X_2, \ldots$ be a sequence of independent random variables (r.v.s) belonging to the domain of attraction of a normal or stable law. In this paper, we study moderate deviations for the self-normalized sum $\sum_{i=1}^{n} X_i / V_{n,p}$, where $V_{n,p} = \left(\sum_{i=1}^{n} |X_i|^p\right)^{1/p}$ ($p > 1$). Applications to the self-normalized law of the iterated logarithm, Studentized increments of partial sums, the t-statistic, and weighted sums of independent and identically distributed (i.i.d.) r.v.s are considered.
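The self-normalized sum defined in this abstract can be computed directly. The following is a minimal Python sketch, not code from the paper: the function names `self_normalized_sum` and `t_statistic_from_ratio` are hypothetical, and the second function relies on the standard algebraic identity linking the Student t-statistic to the ratio in the special case $p = 2$.

```python
import math


def self_normalized_sum(xs, p=2.0):
    """Return S_n / V_{n,p}, with S_n = sum(x_i) and V_{n,p} = (sum |x_i|^p)^(1/p)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    s_n = sum(xs)
    v_np = sum(abs(x) ** p for x in xs) ** (1.0 / p)
    if v_np == 0:
        raise ValueError("V_{n,p} is zero, so the ratio is undefined")
    return s_n / v_np


def t_statistic_from_ratio(ratio, n):
    """For p = 2 only: t = r * sqrt((n - 1) / (n - r^2)), where r = S_n / V_{n,2}."""
    return ratio * math.sqrt((n - 1) / (n - ratio ** 2))


# Illustrative data (made up for this example).
sample = [3.0, -4.0]
ratio = self_normalized_sum(sample, p=2.0)  # S_n = -1, V_{n,2} = 5
```

The identity in `t_statistic_from_ratio` is why results for the self-normalized sum with $p = 2$ carry over to the t-statistic mentioned among the applications.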
Abstract: The survival analysis literature has always lagged behind the categorical data literature in developing methods to analyze clustered or multivariate data. While estimators based on
Funding: Supported by the National Major Scientific Research Instrument Development Project (No. 62027819): High-Speed Real-Time Analyzer for Laser Chip's Optical Catastrophic Damage Process; the General Object of the National Natural Science Foundation (No. 62076177): Study on the Risk Assessment Model of Heart Failure by Integrating Multi-Modal Big Data; and the Shanxi Province Key Technology and Generic Technology R&D Project (No. 2020XXX007): Energy Internet Integrated Intelligent Data Management and Decision Support Platform.
Abstract: Although deep learning methods have recently attracted considerable attention in the medical field, analyzing large-scale electronic health record data remains difficult. In particular, accurate recognition of heart failure is key to helping doctors make reasonable treatment decisions. This study uses data from the Medical Information Mart for Intensive Care database. Compared with structured data, unstructured data contain abundant patient information. However, this type of data has unsatisfactory characteristics, e.g., colloquial vocabulary and sparse content. To solve these problems, we propose the KTI-RNN model for unstructured data recognition. The proposed model overcomes sparse content and obtains good classification results. The term frequency-inverse word frequency (TF-IWF) model is used to extract the keyword set. The latent Dirichlet allocation (LDA) model is adopted to extract the topic word set. These models enable the expansion of the medical record text content. Finally, we embed the global attention mechanism and gating mechanism between the bidirectional recurrent neural network (BiRNN) model and the output layer. We call it gated-attention-BiRNN (GA-BiRNN) and use it to identify heart failure from extensive medical texts. Results show that the proposed KTI-RNN model achieves an F1 score of 85.57% and an accuracy of 85.59%.
Funding: Supported by the Natural Science Foundation of Guangdong Province (Grant No. 2016A030307019); the Higher Education Colleges and Universities Innovation Strong School Project of Guangdong Province (Grant No. 2016KTSCX153); the Science and Technology Development Fund of Macao (Grant No. 127/2016/A3); the National Natural Science Foundation of China (Grant No. 11401607); and a grant at the National University of Singapore (Grant No. R-155-000-181-114).
Abstract: In this paper, we investigate two-sample U-statistics by jackknife empirical likelihood (JEL), a versatile nonparametric approach. More precisely, we propose the method of balanced augmented jackknife empirical likelihood (BAJEL) by adding two artificial points to the original pseudo-value dataset, and we prove that the log likelihood ratio based on the expanded dataset tends to the $\chi^2$ distribution.
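The pseudo-value dataset that JEL operates on can be illustrated with a short Python sketch. This is a generic construction under stated assumptions, not the paper's implementation: it uses the usual leave-one-out jackknife over the pooled sample, the function names and the kernel are hypothetical, and it stops before the BAJEL step because the abstract does not specify the two artificial points.

```python
def two_sample_u(xs, ys, kernel):
    """Two-sample U-statistic with a degree-(1,1) kernel: the mean of kernel(x, y) over all pairs."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def jackknife_pseudo_values(xs, ys, kernel):
    """Return pooled-sample pseudo-values V_k = N*U - (N-1)*U^(-k), with N = len(xs) + len(ys)."""
    xs, ys = list(xs), list(ys)
    if len(xs) < 2 or len(ys) < 2:
        raise ValueError("each sample needs at least two observations")
    n_total = len(xs) + len(ys)
    u_full = two_sample_u(xs, ys, kernel)
    pseudo = []
    # Delete one observation at a time from the first sample.
    for k in range(len(xs)):
        u_loo = two_sample_u(xs[:k] + xs[k + 1:], ys, kernel)
        pseudo.append(n_total * u_full - (n_total - 1) * u_loo)
    # Then delete one observation at a time from the second sample.
    for k in range(len(ys)):
        u_loo = two_sample_u(xs, ys[:k] + ys[k + 1:], kernel)
        pseudo.append(n_total * u_full - (n_total - 1) * u_loo)
    return pseudo


# Illustrative data and kernel (made up): h(x, y) = 1 if x < y, else 0.
xs = [1.0, 2.0, 3.0]
ys = [2.5, 4.0]
indicator = lambda x, y: 1.0 if x < y else 0.0
u_value = two_sample_u(xs, ys, indicator)
pseudo_values = jackknife_pseudo_values(xs, ys, indicator)
```

Empirical likelihood is then applied to the pseudo-values as if they were a single sample, which is what makes the approach convenient for nonlinear statistics.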
Funding: Supported by the National University of Singapore Academic Research Grant (Grant No. R-155-000-085-112).
Abstract: For several decades, much attention has been paid to the two-sample Behrens-Fisher (BF) problem, which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures. Little work, however, has been done on the k-sample BF problem for high-dimensional data, which tests the equality of the mean vectors of several high-dimensional normal populations with unequal covariance structures. In this paper we study this challenging problem by extending the famous Scheffé's transformation method, which reduces the k-sample BF problem to a one-sample problem. The induced one-sample problem can be easily tested by the classical Hotelling's $T^2$ test when the size of the resulting sample is very large relative to its dimensionality. For high-dimensional data, however, the dimensionality of the resulting sample is often very large, and even much larger than its sample size, which makes the classical Hotelling's $T^2$ test not powerful or not even well defined. To overcome this difficulty, we propose and study an $L^2$-norm based test. The asymptotic powers of the proposed $L^2$-norm based test and Hotelling's $T^2$ test are derived and theoretically compared. Methods for implementing the $L^2$-norm based test are described. Simulation studies are conducted to compare the $L^2$-norm based test and Hotelling's $T^2$ test when the latter can be well defined, and to compare the proposed implementation methods for the $L^2$-norm based test otherwise. The methodologies are motivated and illustrated by a real data example.
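For reference, the contrast between the two tests can be written out for the induced one-sample problem of testing a zero mean vector. This is a sketch under assumed notation: $z_1, \ldots, z_n$ denotes the induced sample of dimension $q$, $\bar{z}$ its mean, and $S$ its sample covariance matrix. None of these symbols appear in the abstract, and the squared-norm statistic shown is a common form that may differ from the paper's exact definition.

```latex
\[
T^2 = n\,\bar{z}^{\top} S^{-1} \bar{z},
\qquad
T_{L^2} = n\,\lVert \bar{z} \rVert^{2} = n\,\bar{z}^{\top}\bar{z},
\]
\[
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i,
\qquad
S = \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})(z_i - \bar{z})^{\top}.
\]
```

Because $S$ has rank at most $n - 1$, it is singular whenever $q > n - 1$, so $T^2$ cannot be computed in that regime, whereas the squared-norm statistic requires no matrix inverse.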
Funding: Support provided by the National Alzheimer's Coordinating Center (NACC); supported by National University of Singapore Academic Research Funding (Grant No. R-155-000-109-112); a CBRG grant from the National Medical Research Council in Singapore; NACC (Grant No. U01AG16976); the National Institute of Health (Grant No. R01EB005829); and the National Natural Science Foundation of China (Grant No. 30728019).
Abstract: We consider the estimation of three-dimensional ROC surfaces for continuous tests given covariates. Three-way ROC analysis is important in our motivating example, where patients with Alzheimer's disease are usually classified into three categories and should receive different category-specific medical treatment. There has been no discussion of how covariates affect three-way ROC analysis. We propose a regression framework induced from the relationship between test results and covariates. We consider several practical cases and the corresponding inference procedures. Simulations are conducted to validate our methodology. The application to the motivating example clearly illustrates the age and sex effects on the accuracy of the Mini-Mental State Examination for Alzheimer's disease.
Funding: Li's work was partially supported by the National Medical Research Council in Singapore and AcRF R-155-000-174-114. NNSF (grant number 11371142).
Abstract: We provide a detailed review of the statistical analysis of diagnostic accuracy in a multi-category classification task. For qualitative response variables with more than two categories, many traditional accuracy measures such as sensitivity, specificity and area under the ROC curve are no longer applicable. In recent literature, new diagnostic accuracy measures have been introduced in medical research studies. In this paper, important statistical concepts for multi-category classification accuracy are reviewed and their utility is demonstrated with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect such analysis tools will become more familiar to practitioners and receive broader applications in biostatistics. Our program can be adapted to many classifiers, among which logistic regression may be the most popular approach. We thus base our discussion and illustration completely on logistic regression in this paper.
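As a small illustration of accuracy summaries that remain meaningful with more than two categories, the sketch below tabulates a confusion matrix and computes per-class and overall correct classification proportions. It is written in Python rather than the R code the paper offers, it is not taken from the paper, and the function names, class labels and data are all hypothetical.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; this is the confusion matrix in sparse form."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def per_class_correct_rate(true_labels, predicted_labels):
    """For each true class, the proportion of its cases assigned to that same class."""
    counts = confusion_counts(true_labels, predicted_labels)
    totals = Counter(true_labels)
    return {label: counts[(label, label)] / totals[label] for label in totals}


def overall_correct_rate(true_labels, predicted_labels):
    """Proportion of all cases whose predicted class equals the true class."""
    counts = confusion_counts(true_labels, predicted_labels)
    hits = sum(n for (t, p), n in counts.items() if t == p)
    return hits / len(true_labels)


# Made-up three-category example.
truth = ["normal", "normal", "mild", "mild", "severe", "severe"]
predicted = ["normal", "mild", "mild", "mild", "severe", "normal"]
class_rates = per_class_correct_rate(truth, predicted)
overall = overall_correct_rate(truth, predicted)
```

In practice the predicted labels would come from a fitted classifier, such as the multinomial logistic regression the abstract uses as its running example.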
Funding: Ioannis Chalkiadakis acknowledges the support of Heriot-Watt University through a James-Watt scholarship while undertaking this work.
Abstract: This paper establishes a new framework for assessing multimodal statistical causality between cryptocurrency market (cryptomarket) sentiment and cryptocurrency price processes. To achieve this, we present an efficient algorithm for multimodal statistical causality analysis based on Multiple-Output Gaussian Processes. Signals from different information sources (modalities) are jointly modelled as a Multiple-Output Gaussian Process, and then, using a novel approach to statistical causality based on Gaussian Processes (GPs), we study linear and non-linear causal effects between the different modalities. We demonstrate the effectiveness of our approach in a machine learning application by studying the relationship between cryptocurrency spot price dynamics and sentiment time-series data specific to the crypto sector, which we conjecture influences retail investor behaviour. The investor sentiment is extracted from cryptomarket news data via methods developed in the area of statistical machine learning known as Natural Language Processing (NLP). To capture sentiment, we present a novel framework for text to time-series embedding, which we then use to construct a sentiment index from publicly available news articles. We conduct a statistical analysis of our sentiment statistical index model and compare it to alternative state-of-the-art sentiment models popular in the NLP literature. In regard to the multimodal causality, the investor sentiment is our primary modality of exploration, in addition to price and a blockchain technology-related indicator (hash rate). Analysis shows that our approach is effective in modelling causal structures of variable degree of complexity between heterogeneous data sources and illustrates the impact that certain modelling choices for the different modalities can have on detecting causality. A solid understanding of these factors is necessary to gauge cryptocurrency adoption by retail investors and provide sentiment- and technology-based insights about the cryptocurrency market dynamics.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11571337), the Ministry of Education of Singapore (Grant No. ARC 14/11), and the National University of Singapore (Grant No. R-155-151-112).
Abstract: New statistics are proposed to estimate and test for structural change when the data dimension is comparable to or larger than the sample size. Consistency of the new statistic in estimating the change point position is established under the alternative hypothesis. The asymptotic distribution of the new statistic in testing for the existence of a change point is obtained under the null hypothesis. Simulation results are presented which show that the numerical performance of our method is satisfactory. The method is illustrated via an analysis of the US house price index.