We consider the fluctuation of eigenvalues in factor models and propose a new method for testing the model. Based on the characteristics of eigenvalues, variables of unknown distribution are transformed into statistics of known distribution through randomization. The test statistic checks for breaks in the structure of factor models, including changes in factor loadings and increases in the number of factors. We give the results of simulation experiments and test the factor structure of the stock return data of China's and U.S. stock markets from January 1, 2017, to December 31, 2019. Our method performs well in both simulations and real data.
Despite the wide use of factor models, the issue of determining the number of factors has not been resolved in the statistics literature. An ad hoc approach is to set the number of factors to be the number of eigenvalues of the data correlation matrix that are larger than one, and subsequent statistical analysis proceeds assuming the resulting factor number is correct. In this work, we study the relation between the number of such eigenvalues and the number of factors, and provide necessary and sufficient conditions under which the two numbers are equal. We show that the equality relies only on the properties of the loading matrix of the factor model. Guided by the newly discovered condition, we further reveal how the model error affects the estimation of the number of factors.
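The ad hoc rule mentioned in this abstract (count the eigenvalues of the correlation matrix that exceed one) can be sketched as follows. This is only an illustration of the rule itself, not of the paper's conditions; the simulated loading matrix, noise scale, and sample size are assumptions chosen so that the rule happens to recover the true number of factors.

```python
import numpy as np

def count_eigenvalues_above_one(data):
    """Count eigenvalues of the sample correlation matrix that exceed 1."""
    corr = np.corrcoef(data, rowvar=False)   # variables are in columns
    eigenvalues = np.linalg.eigvalsh(corr)   # symmetric matrix, real eigenvalues
    return int(np.sum(eigenvalues > 1.0))

# Hypothetical two-factor model: variables 0-2 load on factor 1, variables 3-5 on factor 2.
rng = np.random.default_rng(0)
n_obs, n_vars, n_factors = 2000, 6, 2
loadings = np.zeros((n_vars, n_factors))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8

factors = rng.standard_normal((n_obs, n_factors))
noise = 0.6 * rng.standard_normal((n_obs, n_vars))  # assumed error scale
data = factors @ loadings.T + noise

estimated = count_eigenvalues_above_one(data)
print("eigenvalues above one:", estimated)
```

In this assumed design each block of three variables has pairwise correlation 0.64, so the population correlation matrix has two eigenvalues near 2.28 and four near 0.36. Other loading matrices or larger errors can make the count differ from the number of factors, which is the situation the abstract studies.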
Dear Editor, This letter presents a novel latent factorization model for high-dimensional and incomplete (HDI) tensors, namely the neural Tucker factorization (NeuTucF), which is a generic neural network-based latent-factorization-of-tensors model under the Tucker decomposition framework.
We forecast realized volatilities by developing a time-varying heterogeneous autoregressive (HAR) latent factor model with dynamic model averaging (DMA) and dynamic model selection (DMS) approaches. The number of latent factors is determined using Chan and Grant's (2016) deviance information criteria. The predictors in our model include lagged daily, weekly, and monthly volatility variables, the corresponding volatility factors, and a speculation variable. In addition, the time-varying properties of the best-performing DMA(DMS)-HAR-2FX models, including size, inclusion probabilities, and coefficients, are examined. We find that the proposed DMA(DMS)-HAR-2FX model outperforms the competing models for both in-sample and out-of-sample forecasts. Furthermore, the speculation variable displays strong predictability for forecasting the realized volatility of financial futures in China.
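The lagged daily, weekly, and monthly volatility predictors named in this abstract can be illustrated with the plain, static HAR regression. The sketch below is not the time-varying DMA/DMS latent factor model of the abstract: it omits the volatility factors and the speculation variable, and the 5-day and 22-day windows and the function names `har_design` and `fit_har` are assumptions following common HAR practice.

```python
import numpy as np

def har_design(rv, weekly=5, monthly=22):
    """Build HAR regressors [1, daily, weekly mean, monthly mean] and next-day targets."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = rv[t - weekly + 1 : t + 1].mean()    # mean of the last 5 days
        month = rv[t - monthly + 1 : t + 1].mean()  # mean of the last 22 days
        rows.append([1.0, daily, week, month])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv):
    """Ordinary least squares fit of the static HAR regression."""
    X, y = har_design(rv)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # intercept, daily, weekly, monthly coefficients

# Usage on a synthetic positive series (illustrative data only).
rng = np.random.default_rng(1)
series = np.abs(rng.standard_normal(200))
print(fit_har(series))
```

A time-varying version would let these four coefficients change over time and average or select across candidate predictor sets, which is what the DMA and DMS approaches in the abstract refer to.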
Latent factor models have become a workhorse for a large number of recommender systems. While these systems are built using ratings data, which is typically assumed static, the ability to incorporate different kinds of subsequent user feedback is an important asset. For instance, the user might want to provide additional information to the system in order to improve their personal recommendations. To this end, we examine a novel scheme for efficiently learning (or refining) user parameters from such feedback. We propose a scheme where users are presented with a sequence of pairwise preference questions: "Do you prefer item A over B?" User parameters are updated based on their response, and subsequent questions are chosen adaptively after incorporating the feedback. We operate in a Bayesian framework and the choice of questions is based on an information gain criterion. We validate the scheme on the Netflix movie ratings data set and a proprietary television viewership data set. A user study and automated experiments validate our findings.
This paper provides a survey on recent developments in structural changes for high-dimensional factor models. Compared with conventional low-dimensional time series, structural changes in factor models are more complicated due to the unobservability of factors and factor loadings. The following topics are covered in this survey: the identification conditions for the structural changes in the factor loadings, different impacts of big and small breaks in factor models, tests for structural changes in the factor loadings of a specific variable, tests for structural changes in the factor loading matrix, joint tests for structural changes in the factor loadings and coefficients in factor-augmented regressions, tests for smooth changes in the factor loadings, estimation of break dates, and model selection in factor models with structural changes via the shrinkage method.
Linear factor models are familiar tools used in many fields. Several pioneering studies established foundational theoretical results for the quasi-maximum likelihood estimator in high-dimensional linear factor models. Their results are based on a critical assumption: the error variance estimators are uniformly bounded in probability. Instead of making such an assumption, we provide a rigorous proof of this result under some mild conditions.
For the class of (partially specified) internal risk factor models we establish strongly simplified supermodular ordering results in comparison to the case of general risk factor models. This allows us to derive meaningful and improved risk bounds for the joint portfolio in risk factor models with dependence information given by constrained specification sets for the copulas of the risk components and the systemic risk factor. The proof of our main comparison result is not standard. It is based on grid copula approximation of upper products of copulas and on the theory of mass transfers. An application to real market data shows considerable improvement over the standard method.
In the field of empirical asset pricing, the challenges of high dimensionality, non-linear relationships, and interaction effects have led to the increasing popularity of machine learning (ML) methods. This study investigates the performance of ML methods when predicting different measures of stock returns from various factor models and investigates the feature importance and interaction effects among firm-specific variables and macroeconomic factors in this context. Our findings reveal that neural network models exhibit consistent performance across different stock return measures when they rely solely on firm-specific characteristic variables. However, the inclusion of macroeconomic factors from the financial market, real economic activities, and investor sentiment leads to substantial improvements in the model performance. Notably, the degree of improvement varies with the specific measures of stock returns under consideration. Furthermore, our analysis indicates that, after the inclusion of macroeconomic factors, there is a dissimilarity in model performance, variable importance, and interaction effects among macroeconomic and firm-specific variables, particularly concerning abnormal returns derived from the Fama–French three- and five-factor models compared with excess returns. This divergence is primarily attributed to the extent to which these factor models remove the variance associated with the macroeconomic variables. These findings collectively offer valuable insights into the efficacy of neural network models for stock return predictions and contribute to a deeper understanding of the intricate relationship between factor models, stock returns, and macroeconomic conditions in the domain of empirical asset pricing.
A weed is a plant that thrives in areas of human disturbance, such as gardens, fields, pastures, waysides, and waste places where it is not intentionally cultivated. Dispersal affects community dynamics and vegetation response to global change. The process of seed dispersal is influenced by wind, which plays a crucial role in determining the distance and probability of seed dispersal. Existing models of seed dispersal consider wind direction but fail to incorporate wind intensity. In this paper, a novel seed dispersal model is proposed that incorporates wind intensity based on relevant references. According to various climatic conditions, including temperate, arid, and tropical regions, three specific regions were selected to establish a wind dispersal model that accurately reflects the density function distribution of dispersal distance. Additionally, dandelion growth is influenced by a multitude of factors, encompassing temperature, humidity, climate, and various environmental variables that necessitate meticulous consideration. Based on a factor analysis model that considers temperature, precipitation, solar radiation, wind, and land carrying capacity, a conclusion is presented, indicating that the growth of seeds is primarily influenced by plant attributes and climate conditions, with the former exerting a relatively stronger impact. Subsequently, the remaining two plants were chosen based on seed weight, yielding a consistent conclusion.
In this editorial, we comment on the article by Chen et al. We specifically focus on the risk factors, prognostic factors, and management of brain metastasis (BM) in breast cancer (BC). BC is the second most common cancer to have BM after lung cancer. Independent risk factors for BM in BC are HER-2 positive BC, triple-negative BC, and germline BRCA mutation. Other factors associated with BM are lung metastasis, age less than 40 years, and African and American ancestry. Even though risk factors associated with BM in BC are elucidated, there is a lack of data on predictive models for BM in BC. Few studies have been made to formulate predictive models or nomograms to address this issue, where age, grade of tumor, HER-2 receptor status, and number of metastatic sites (1 vs >1) were predictive of BM in metastatic BC. However, none have been used in clinical practice. The National Comprehensive Cancer Network recommends screening for BM in advanced BC only when the patient is symptomatic or has suspected central nervous system symptoms; routine screening for BM in BC is not recommended in the guidelines. BM decreases the quality of life and has a significant psychological impact. Further studies are required for designing validated nomograms or predictive models for BM in BC; these models can be used in the future to develop treatment approaches to prevent BM, which would improve the quality of life and overall survival.
With the advancement of modern scientific research, multimodal data is increasingly being collected from multiple sources or types. For outcomes derived from generalized linear models with high-dimensional and multimodal covariates, we develop two distinct factor-adjusted tests to assess the significance of high-dimensional modality data and specific low-dimensional linear combinations of predictors from one or more modalities, respectively. First, we propose a factor-adjusted decorrelated score test to evaluate the significance of a single modality. This approach simultaneously transforms a high-dimensional test into a fixed low-dimensional one while addressing the impact of high-dimensional nuisance parameters. Second, we construct a factor-adjusted Wald test based on partial penalized estimation to assess the significance of certain low-dimensional combinations of variables from one or more modalities. The limiting distributions of these two proposed tests are analyzed under both the null hypothesis and local alternatives to characterize the asymptotic type-I errors and powers. The finite sample performance of our proposed tests is evaluated through simulations and further demonstrated with a breast cancer dataset.
Environmental problems from heavy metals (HMs) attract global attention. Accurately identifying sources and quantitatively evaluating ecological risks are keys for HMs pollution prevention. Dongting Lake in China was investigated through integrated methods like positive matrix factorization and Nemerow integrated risk index to examine spatial distribution, contamination characteristics, pollution sources, and the contribution of each source and pollutant to the ecological risk of 14 HMs in its surface sediments. Results showed that the mean concentrations of HMs were 0.82-9.44 times greater than the corresponding background values. The spatial distribution of HMs varied significantly, with high values of As, Cd, Mn, Pb, Sn, Tl and Zn concentrated in the sediments from Xiangjiang inlet and Yangtze outlet; Co, Cr, Cu, Ni and V in the Lishui sediments; Hg and Sb in the sediments from Yuanjiang and Zishui inlets, respectively. The accumulation of HMs was affected by five sources: mercury mining and atmospheric deposition (F1) (17.99%), urban domestic sewage and industrial sewage discharge (F2) (24.44%), antimony ore mining and smelting (F3) (6.50%), non-ferrous metal mining and extended processing industrial sources (F4) (15.72%), and mixed sources mainly from natural sources and agricultural sources (F5) (35.35%). F1 and F2 were identified as priority pollution sources; Cd, Hg, Tl, Sb and As, especially Cd and Hg, posed relatively high ecological risks and were prioritized HMs for control.
High-dimensional and sparse (HiDS) matrices commonly arise in various industrial applications, e.g., recommender systems (RSs), social networks, and wireless sensor networks. Since they contain rich information, how to accurately represent them is of great significance. A latent factor (LF) model is one of the most popular and successful ways to address this issue. Current LF models mostly adopt L2-norm-oriented Loss to represent an HiDS matrix, i.e., they sum the errors between observed data and predicted ones with L2-norm. Yet L2-norm is sensitive to outlier data. Unfortunately, outlier data usually exist in such matrices. For example, an HiDS matrix from RSs commonly contains many outlier ratings due to some heedless/malicious users. To address this issue, this work proposes a smooth L1-norm-oriented latent factor (SL-LF) model. Its main idea is to adopt smooth L1-norm rather than L2-norm to form its Loss, making it have both strong robustness and high accuracy in predicting the missing data of an HiDS matrix. Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model not only is robust to the outlier data but also has significantly higher prediction accuracy than state-of-the-art models when they are used to predict the missing data of HiDS matrices.
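The contrast between an L2-norm loss and a smooth L1-norm loss on outlier errors can be sketched as below. The abstract does not give the exact loss used by the SL-LF model, so this uses the commonly used smooth L1 form (quadratic near zero, linear beyond a threshold); the threshold `delta` and the function names are assumptions.

```python
import numpy as np

def l2_loss(err):
    """Squared-error loss: grows quadratically, so outliers dominate the sum."""
    return 0.5 * np.square(err)

def smooth_l1_loss(err, delta=1.0):
    """Smooth L1 loss: quadratic for |err| <= delta, linear beyond it."""
    abs_err = np.abs(err)
    return np.where(abs_err <= delta,
                    0.5 * abs_err ** 2 / delta,
                    abs_err - 0.5 * delta)

# Prediction errors on observed entries; the last one mimics an outlier rating.
errors = np.array([0.5, -0.2, 3.0])
print("L2 loss:       ", l2_loss(errors))
print("smooth L1 loss:", smooth_l1_loss(errors))
```

For the outlier error of 3.0 the L2 loss is 4.5 while the smooth L1 loss is 2.5, and the gap widens as the error grows, which is the robustness property the abstract appeals to.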
Latent factor (LF) models are highly effective in extracting useful knowledge from High-Dimensional and Sparse (HiDS) matrices which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may consume many iterations to achieve a local optimum, resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating computational burden. It also extends a standard learning process for randomized neural networks in the context of LF analysis to make the resulting model represent an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF is able to achieve significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to LF analysis of HiDS matrices, which is especially desired for industrial applications demanding highly efficient models.
BACKGROUND Colorectal cancer is a common digestive cancer worldwide. As a comprehensive treatment for locally advanced rectal cancer (LARC), neoadjuvant therapy (NT) has been increasingly used as the standard treatment for clinical stage II/III rectal cancer. However, few patients achieve a complete pathological response, and most patients require surgical resection and adjuvant therapy. Therefore, identifying risk factors and developing accurate models to predict the prognosis of LARC patients are of great clinical significance. AIM To establish effective prognostic nomograms and risk score prediction models to predict overall survival (OS) and disease-free survival (DFS) for LARC treated with NT. METHODS Nomograms and risk factor score prediction models were based on patients who received NT at the Cancer Hospital from 2015 to 2017. The least absolute shrinkage and selection operator regression model was utilized to screen for prognostic risk factors, which were validated by the Cox regression method. Assessment of the performance of the two prediction models was conducted using receiver operating characteristic curves, and that of the two nomograms was conducted by calculating the concordance index (C-index) and calibration curves. The results were validated in a cohort of 65 patients from 2015 to 2017. RESULTS Seven features were significantly associated with OS and were included in the OS prediction nomogram and prediction model: Vascular_tumors_bolt, cancer nodules, yN, body mass index, matchmouth distance from the edge, nerve aggression and postoperative carcinoembryonic antigen. The nomogram showed good predictive value for OS, with a C-index of 0.91 (95%CI: 0.85, 0.97) and good calibration. In the validation cohort, the C-index was 0.69 (95%CI: 0.53, 0.84). The risk factor prediction model showed good predictive value. The areas under the curve (AUC) for 3- and 5-year survival were 0.811 and 0.782. The nomogram for predicting DFS included ypTNM and nerve aggression and showed good calibration and a C-index of 0.77 (95%CI: 0.69, 0.85). In the validation cohort, the C-index was 0.71 (95%CI: 0.61, 0.81). The prediction model for DFS also had good predictive value, with an AUC for 3-year survival of 0.784 and an AUC for 5-year survival of 0.754. CONCLUSION We established accurate nomograms and prediction models for predicting OS and DFS in patients with LARC after undergoing NT.
Under China's innovation-driven development strategy, venture capital has become an important driving force in urban agglomeration integration and collaborative innovation. This paper uses social network analysis to analyze spatiotemporal differences of venture capital in the Beijing-Tianjin-Hebei urban agglomeration for the period 2005–2015. A gravity model and panel data regression model are used to reveal the influencing factors on spatiotemporal differences in venture capital in the region. This study finds that there is a certain cyclical fluctuation and uneven differentiation in the venture capital network in the Beijing-Tianjin-Hebei urban agglomeration in terms of total investment, and that the three centers of venture capital (Beijing, Shijiazhuang and Tangshan) have a stimulatory effect on surrounding cities; flows of venture capital between cities display certain networking rules, but they are slow to develop and strongly centripetal; there is a strong positive correlation between levels of information infrastructure development and economic development and venture capital investment; and places with relatively underdeveloped financial environments and service industries are less able to apply the fruits of innovation and entrepreneurship and to attract funds. This study can act as a reference for the Beijing-Tianjin-Hebei urban agglomeration in building a world-class super urban agglomeration with the best innovation capabilities in China.
Several important equilibrium Si isotope fractionation factors among minerals, organic molecules and the H₄SiO₄ solution are complemented to facilitate the explanation of the distributions of Si isotopes in Earth's surface environments. The results reveal that, in comparison to aqueous H₄SiO₄, heavy Si isotopes will be significantly enriched in secondary silicate minerals. On the contrary, quadra-coordinated organosilicon complexes are enriched in light silicon isotopes relative to the solution. The extent of ²⁸Si enrichment in hyper-coordinated organosilicon complexes was found to be the largest. In addition, the large kinetic isotope effect associated with the polymerization of monosilicic acid and dimer was calculated, and the results support the previous statement that high ²⁸Si enrichment in the formation of amorphous quartz precursor contributes to the discrepancy between theoretical calculations and field observations. With the equilibrium Si isotope fractionation factors provided here, Si isotope distributions in many of Earth's surface systems can be explained. For example, the change of bulk soil δ³⁰Si can be predicted as a concave pattern with respect to the weathering degree, with the minimum value where allophane completely dissolves and the total amount of sesquioxides and poorly crystalline minerals reaches its maximum. When, under equilibrium conditions, the well-crystallized clays start to precipitate from the pore solutions, the bulk soil δ³⁰Si will increase again and reach a constant value. Similarly, the precipitation of crystalline smectite and the dissolution of poorly crystalline kaolinite may explain the δ³⁰Si variations in the ground water profile. The equilibrium Si isotope fractionations among the quadra-coordinated organosilicon complexes and the H₄SiO₄ solution may also shed light on the Si isotope distributions in the Si-accumulating plants.
Diffusion has been systematically described as the main mechanism of chloride transport in reinforced concrete (RC) structures, especially when the concrete is in a saturated state. However, diffusion alone cannot describe the actual chloride ingress in nonsaturated concrete, which is instead dominated by the interaction of diffusion and convection. With the synergetic effects of various factors taken into account, this study aimed to modify and develop an analytical convection-diffusion coupling model for chloride transport in nonsaturated concrete. The model was verified by simulation of laboratory tests and field measurement. The results of the comparison study demonstrate that the analytical model developed in this study is efficient and accurate in predicting the chloride profiles in nonsaturated concrete.
Aiming at the shortage of sufficient continuous parameters for using models to estimate farmland soil organic carbon (SOC) content, an acquisition method of factors influencing farmland SOC and an estimation method of farmland SOC content with Internet of Things (IOT) are proposed in this paper. The IOT sensing device and transmission network were established in a wheat demonstration base in Yanzhou District of Jining City, Shandong Province, China to acquire data in real time. Using real-time data and statistics data, the dynamic changes of SOC content between October 2012 and June 2015 were simulated in the experimental area with an SOC dynamic simulation model. In order to verify the estimation results, the potassium dichromate external heating method was applied for measuring the SOC content. The results show that: 1) The estimated value matches the value measured in the lab very well, so the method proposed in this paper is feasible. 2) There is a clear dynamic variation in the SOC content at 0.2 m soil depth in different growing periods of wheat. The content reached the highest level during the sowing period, and is lowest in the flowering period. 3) The SOC content at 0.2 m soil depth varies in accordance with the amount of returned straw. The larger the amount of returned straw is, the higher the SOC content.
Funding: Supported by the National Natural Science Foundation of China (12001517, 72091212), the USTC Research Funds of the Double First-Class Initiative (YD2040002005), and the Fundamental Research Funds for the Central Universities (WK2040000026, WK2040000027).
Funding: Supported by NSFC (Grant Nos. 11631003, 11690012).
Funding: Supported by the National Natural Science Foundation of China (62272078), the Chongqing Natural Science Foundation (CSTB2023NSCQ-LZX0069), and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202300210).
Funding: Supported by grants from the National Natural Science Foundation of China (72171088, 71803049, 72003205), the Ministry of Education of the People's Republic of China Humanities and Social Sciences Youth Foundation (20YJC790142), and the General Project of Social Science Planning in Guangdong Province, China (GD22CYJ12).
文摘We forecast realized volatilities by developing a time-varying heterogeneous autoregressive(HAR)latent factor model with dynamic model average(DMA)and dynamic model selection(DMS)approaches.The number of latent factors is determined using Chan and Grant's(2016)deviation information criteria.The predictors in our model include lagged daily,weekly,and monthly volatility variables,the corresponding volatility factors,and a speculation variable.In addition,the time-varying properties of the best-performing DMA(DMS)-HAR-2FX models,including size,inclusion probabilities,and coefficients,are examined.We find that the proposed DMA(DMS)-HAR-2FX model outperforms the competing models for both in-sample and out-of-sample forecasts.Furthermore,the speculation variable displays strong predictability for forecasting the realized volatility of financial futures in China.
文摘Latent factor models have become a workhorse for a large number of recommender systems. While these sys- tems are built using ratings data, which is typically assumed static, the ability to incorporate different kinds of subsequent user feedback is an important asset. For instance, the user might want to provide additional information to the system in order to improve his personal recommendations. To this end, we examine a novel scheme for efficiently learning (or refining) user parameters from such feedback. We propose a scheme where users are presented with a sequence of pair- wise preference questions: "Do you prefer item A over B?" User parameters are updated based on their response, and subsequent questions are chosen adaptively after incorporat- ing the feedback. We operate in a Bayesian framework and the choice of questions is based on an information gain cri- terion. We validate the scheme on the Netflix movie ratings data set and a proprietary television viewership data set. A user study and automated experiments validate our findings.
Abstract: This paper provides a survey on recent developments in structural changes for high dimensional factor models. Compared with conventional low-dimensional time series, structural changes in factor models are more complicated due to the unobservability of factors and factor loadings. The following topics are covered in this survey: the identification conditions for the structural changes in the factor loadings, different impacts of big and small breaks in factor models, tests for structural changes in the factor loadings of a specific variable, tests for structural changes in the factor loading matrix, joint tests for structural changes in the factor loadings and coefficients in factor-augmented regressions, tests for smooth changes in the factor loadings, estimation of break dates, and model selection in factor models with structural changes via the shrinkage method.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11631003, 11690012 and 11571068), the Fundamental Research Funds for the Central Universities (Grant No. 2412019FZ030), the Jilin Provincial Science and Technology Development Plan Funded Project (Grant No. 20180520026JH), and the National Institute of Health.
Abstract: Linear factor models are familiar tools used in many fields. Several pioneering papers established foundational theoretical results for the quasi-maximum likelihood estimator in high-dimensional linear factor models. Their results are based on a critical assumption: the error variance estimators are uniformly bounded in probability. Instead of making such an assumption, we provide a rigorous proof of this result under some mild conditions.
Abstract: For the class of (partially specified) internal risk factor models we establish strongly simplified supermodular ordering results in comparison to the case of general risk factor models. This allows us to derive meaningful and improved risk bounds for the joint portfolio in risk factor models with dependence information given by constrained specification sets for the copulas of the risk components and the systemic risk factor. The proof of our main comparison result is not standard. It is based on grid copula approximation of upper products of copulas and on the theory of mass transfers. An application to real market data shows considerable improvement over the standard method.
Abstract: In the field of empirical asset pricing, the challenges of high dimensionality, non-linear relationships, and interaction effects have led to the increasing popularity of machine learning (ML) methods. This study investigates the performance of ML methods when predicting different measures of stock returns from various factor models and investigates the feature importance and interaction effects among firm-specific variables and macroeconomic factors in this context. Our findings reveal that neural network models exhibit consistent performance across different stock return measures when they rely solely on firm-specific characteristic variables. However, the inclusion of macroeconomic factors from the financial market, real economic activities, and investor sentiment leads to substantial improvements in the model performance. Notably, the degree of improvement varies with the specific measures of stock returns under consideration. Furthermore, our analysis indicates that, after the inclusion of macroeconomic factors, there is a dissimilarity in model performance, variable importance, and interaction effects among macroeconomic and firm-specific variables, particularly concerning abnormal returns derived from the Fama–French three- and five-factor models compared with excess returns. This divergence is primarily attributed to the extent to which these factor models remove the variance associated with the macroeconomic variables. These findings collectively offer valuable insights into the efficacy of neural network models for stock return predictions and contribute to a deeper understanding of the intricate relationship between factor models, stock returns, and macroeconomic conditions in the domain of empirical asset pricing.
Abstract: A weed is a plant that thrives in areas of human disturbance, such as gardens, fields, pastures, waysides, and waste places where it is not intentionally cultivated. Dispersal affects community dynamics and vegetation response to global change. The process of seed dispersal is influenced by wind, which plays a crucial role in determining the distance and probability of seed dispersal. Existing models of seed dispersal consider wind direction but fail to incorporate wind intensity. In this paper, a novel seed dispersal model incorporating wind intensity is proposed based on relevant references. To cover different climatic conditions (temperate, arid, and tropical), three specific regions were selected to establish a wind dispersal model that accurately reflects the density function of dispersal distance. Additionally, dandelion growth is influenced by a multitude of factors, encompassing temperature, humidity, climate, and various other environmental variables that require careful consideration. Based on a factor analysis model that accounts for temperature, precipitation, solar radiation, wind, and land carrying capacity, we conclude that the growth of seeds is primarily influenced by plant attributes and climate conditions, with the former exerting a relatively stronger impact. Subsequently, two further plants were chosen based on seed weight, yielding a consistent conclusion.
Abstract: In this editorial, we comment on the article by Chen et al. We specifically focus on the risk factors, prognostic factors, and management of brain metastasis (BM) in breast cancer (BC). BC is the second most common cancer to have BM after lung cancer. Independent risk factors for BM in BC are HER-2 positive BC, triple-negative BC, and germline BRCA mutation. Other factors associated with BM are lung metastasis, age less than 40 years, and African and American ancestry. Even though risk factors associated with BM in BC are elucidated, there is a lack of data on predictive models for BM in BC. A few studies have formulated predictive models or nomograms to address this issue, in which age, grade of tumor, HER-2 receptor status, and number of metastatic sites (1 vs >1) were predictive of BM in metastatic BC. However, none have been used in clinical practice. The National Comprehensive Cancer Network recommends screening for BM in advanced BC only when the patient is symptomatic or central nervous system symptoms are suspected; routine screening for BM in BC is not recommended in the guidelines. BM decreases the quality of life and has a significant psychological impact. Further studies are required to design validated nomograms or predictive models for BM in BC; these models could be used in the future to develop treatment approaches to prevent BM, improving quality of life and overall survival.
Funding: Supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (Grant No. 12271272).
Abstract: With the advancement of modern scientific research, multimodal data is increasingly being collected from multiple sources or types. For outcomes derived from generalized linear models with high-dimensional and multimodal covariates, we develop two distinct factor-adjusted tests to assess the significance of high-dimensional modality data and specific low-dimensional linear combinations of predictors from one or more modalities, respectively. First, we propose a factor-adjusted decorrelated score test to evaluate the significance of a single modality. This approach simultaneously transforms a high-dimensional test into a fixed low-dimensional one while addressing the impact of high-dimensional nuisance parameters. Second, we construct a factor-adjusted Wald test based on partial penalized estimation to assess the significance of certain low-dimensional combinations of variables from one or more modalities. The limiting distributions of these two proposed tests are analyzed under both the null hypothesis and local alternatives to characterize the asymptotic type-I errors and powers. The finite sample performance of our proposed tests is evaluated through simulations and further demonstrated with a breast cancer dataset.
Funding: Financially supported by the Key Research and Development Program of Hunan Province, China (No. 2023SK2006), the Natural Science Foundation of Hunan Province, China (No. 2023JJ50057), the Science and Technology Plan Project of Geological Bureau of Hunan Province, China (No. HNGSTP202411), the Open Project of Key Laboratory of the Ministry of Natural Resources, China (No. BL202105), and the Natural Science Foundation of Changsha City, China (No. kq2202090).
Abstract: Environmental problems from heavy metals (HMs) attract global attention. Accurately identifying sources and quantitatively evaluating ecological risks are key to preventing HM pollution. Dongting Lake in China was investigated using integrated methods, including positive matrix factorization and the Nemerow integrated risk index, to examine the spatial distribution, contamination characteristics, pollution sources, and the contribution of each source and pollutant to the ecological risk of 14 HMs in its surface sediments. Results showed that the mean concentrations of HMs were 0.82-9.44 times greater than the corresponding background values. The spatial distribution of HMs varied significantly, with high values of As, Cd, Mn, Pb, Sn, Tl and Zn concentrated in the sediments from the Xiangjiang inlet and Yangtze outlet; Co, Cr, Cu, Ni and V in the Lishui sediments; and Hg and Sb in the sediments from the Yuanjiang and Zishui inlets, respectively. The accumulation of HMs was affected by five sources: mercury mining and atmospheric deposition (F1) (17.99%), urban domestic sewage and industrial sewage discharge (F2) (24.44%), antimony ore mining and smelting (F3) (6.50%), non-ferrous metal mining and extended processing industrial sources (F4) (15.72%), and mixed sources mainly from natural sources and agricultural sources (F5) (35.35%). F1 and F2 were identified as priority pollution sources; Cd, Hg, Tl, Sb and As, especially Cd and Hg, posed relatively high ecological risks and were prioritized HMs for control.
Funding: Supported in part by the National Natural Science Foundation of China (61702475, 61772493, 61902370, 62002337), in part by the Natural Science Foundation of Chongqing, China (cstc2019jcyj-msxmX0578, cstc2019jcyjjqX0013), in part by the Chinese Academy of Sciences "Light of West China" Program, in part by the Pioneer Hundred Talents Program of Chinese Academy of Sciences, and by the Technology Innovation and Application Development Project of Chongqing, China (cstc2019jscx-fxydX0027).
Abstract: High-dimensional and sparse (HiDS) matrices commonly arise in various industrial applications, e.g., recommender systems (RSs), social networks, and wireless sensor networks. Since they contain rich information, how to accurately represent them is of great significance. A latent factor (LF) model is one of the most popular and successful ways to address this issue. Current LF models mostly adopt an L2-norm-oriented Loss to represent an HiDS matrix, i.e., they sum the errors between observed data and predicted ones with the L2-norm. Yet the L2-norm is sensitive to outlier data, and outlier data usually exist in such matrices. For example, an HiDS matrix from RSs commonly contains many outlier ratings due to some heedless or malicious users. To address this issue, this work proposes a smooth L1-norm-oriented latent factor (SL-LF) model. Its main idea is to adopt the smooth L1-norm rather than the L2-norm to form its Loss, giving it both strong robustness and high accuracy in predicting the missing data of an HiDS matrix. Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model is not only robust to outlier data but also has significantly higher prediction accuracy than state-of-the-art models when predicting the missing data of HiDS matrices.
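To illustrate why a smooth L1 loss is less sensitive to outliers than an L2 loss, the sketch below compares the two on a few prediction errors. The abstract does not give the exact loss used by the SL-LF model, so the piecewise form and the threshold `delta` are assumptions taken from the commonly used smooth L1 definition; the function names are hypothetical.

```python
import numpy as np

def smooth_l1(err, delta=1.0):
    """Common smooth L1 loss: quadratic near zero, linear in the tails.

    `delta` is the (assumed) switch point between the two regimes.
    """
    err = np.asarray(err, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2 / delta
    linear = abs_err - 0.5 * delta
    return np.where(abs_err < delta, quadratic, linear)

def l2_loss(err):
    """Squared-error loss with the usual 1/2 factor."""
    return 0.5 * np.asarray(err, dtype=float) ** 2

# Errors between observed and predicted ratings; the last is an outlier.
errors = np.array([0.2, -0.5, 8.0])
print(smooth_l1(errors))  # outlier contributes linearly
print(l2_loss(errors))    # outlier contributes quadratically
```

Because the tail is linear, a single large error adds far less to the smooth L1 objective than to the L2 objective, which limits how strongly an outlier rating can pull the learned latent factors.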
Funding: Supported in part by the National Natural Science Foundation of China (61772493, 91646114), in part by the Chongqing research program of technology innovation and application (cstc2017rgzn-zdyfX0020), and in part by the Pioneer Hundred Talents Program of Chinese Academy of Sciences.
Abstract: Latent factor (LF) models are highly effective in extracting useful knowledge from high-dimensional and sparse (HiDS) matrices, which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may need many iterations to reach a local optimum, resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating the computational burden. It also extends a standard learning process for randomized neural networks in the context of LF analysis to make the resulting model represent an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF achieves significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to LF analysis of HiDS matrices, which is especially desirable for industrial applications demanding highly efficient models.
Abstract: BACKGROUND: Colorectal cancer is a common digestive cancer worldwide. As a comprehensive treatment for locally advanced rectal cancer (LARC), neoadjuvant therapy (NT) has been increasingly used as the standard treatment for clinical stage II/III rectal cancer. However, few patients achieve a complete pathological response, and most patients require surgical resection and adjuvant therapy. Therefore, identifying risk factors and developing accurate models to predict the prognosis of LARC patients are of great clinical significance. AIM: To establish effective prognostic nomograms and risk score prediction models to predict overall survival (OS) and disease-free survival (DFS) for LARC treated with NT. METHODS: Nomograms and risk factor score prediction models were based on patients who received NT at the Cancer Hospital from 2015 to 2017. The least absolute shrinkage and selection operator regression model was utilized to screen for prognostic risk factors, which were validated by the Cox regression method. The performance of the two prediction models was assessed using receiver operating characteristic curves, and that of the two nomograms by calculating the concordance index (C-index) and calibration curves. The results were validated in a cohort of 65 patients from 2015 to 2017. RESULTS: Seven features were significantly associated with OS and were included in the OS prediction nomogram and prediction model: Vascular_tumors_bolt, cancer nodules, yN, body mass index, matchmouth distance from the edge, nerve aggression and postoperative carcinoembryonic antigen. The nomogram showed good predictive value for OS, with a C-index of 0.91 (95%CI: 0.85, 0.97) and good calibration. In the validation cohort, the C-index was 0.69 (95%CI: 0.53, 0.84). The risk factor prediction model showed good predictive value. The areas under the curve for 3- and 5-year survival were 0.811 and 0.782. The nomogram for predicting DFS included ypTNM and nerve aggression and showed good calibration and a C-index of 0.77 (95%CI: 0.69, 0.85). In the validation cohort, the C-index was 0.71 (95%CI: 0.61, 0.81). The prediction model for DFS also had good predictive value, with an AUC for 3-year survival of 0.784 and an AUC for 5-year survival of 0.754. CONCLUSION: We established accurate nomograms and prediction models for predicting OS and DFS in patients with LARC after undergoing NT.
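The concordance index reported above can be illustrated with a small sketch of Harrell's C for right-censored survival data. This is a generic textbook formulation, not the authors' code; the function name `concordance_index` and its argument layout are hypothetical, and ties in survival time are simply skipped here.

```python
def concordance_index(times, risk_scores, events):
    """Harrell's C-index for right-censored data.

    times       : observed follow-up times
    risk_scores : model output, higher means higher predicted risk
    events      : 1 if the event was observed, 0 if censored
    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.91 and 0.69 should be read.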
Funding: Major Program of the National Natural Science Foundation of China, No. 41590842.
Abstract: Under China's innovation-driven development strategy, venture capital has become an important driving force in urban agglomeration integration and collaborative innovation. This paper uses social network analysis to analyze spatiotemporal differences of venture capital in the Beijing-Tianjin-Hebei urban agglomeration for the period 2005–2015. A gravity model and panel data regression model are used to reveal the influencing factors on spatiotemporal differences in venture capital in the region. This study finds that there is a certain cyclical fluctuation and uneven differentiation in the venture capital network in the Beijing-Tianjin-Hebei urban agglomeration in terms of total investment, and that the three centers of venture capital (Beijing, Shijiazhuang and Tangshan) have a stimulatory effect on surrounding cities; flows of venture capital between cities display certain networking rules, but they are slow to develop and strongly centripetal; there is a strong positive correlation between levels of information infrastructure development and economic development and venture capital investment; and places with relatively underdeveloped financial environments and service industries are less able to apply the fruits of innovation and entrepreneurship and to attract funds. This study can act as a reference for the Beijing-Tianjin-Hebei urban agglomeration in building a world-class super urban agglomeration with the best innovation capabilities in China.
Funding: Supported by the 973 Program (2014CB440904), the CAS/SAFEA International Partnership Program for Creative Research Teams (Intraplate Mineralization Research Team, KZZD-EW-TZ-20), and Chinese NSF projects (41173023, 41225012, 41490635, 41530210).
Abstract: Several important equilibrium Si isotope fractionation factors among minerals, organic molecules and the H₄SiO₄ solution are supplemented to facilitate the explanation of the distributions of Si isotopes in Earth's surface environments. The results reveal that, in comparison to aqueous H₄SiO₄, heavy Si isotopes will be significantly enriched in secondary silicate minerals. On the contrary, quadra-coordinated organosilicon complexes are enriched in light silicon isotopes relative to the solution. The extent of ²⁸Si enrichment in hyper-coordinated organosilicon complexes was found to be the largest. In addition, the large kinetic isotope effect associated with the polymerization of monosilicic acid and dimer was calculated, and the results support the previous statement that strong ²⁸Si enrichment in the formation of amorphous quartz precursor contributes to the discrepancy between theoretical calculations and field observations. With the equilibrium Si isotope fractionation factors provided here, Si isotope distributions in many of Earth's surface systems can be explained. For example, the change of bulk soil δ³⁰Si can be predicted as a concave pattern with respect to the weathering degree, with the minimum value where allophane completely dissolves and the total amount of sesquioxides and poorly crystalline minerals reaches its maximum. When, under equilibrium conditions, the well-crystallized clays start to precipitate from the pore solutions, the bulk soil δ³⁰Si will increase again and reach a constant value. Similarly, the precipitation of crystalline smectite and the dissolution of poorly crystalline kaolinite may explain the δ³⁰Si variations in the ground water profile. The equilibrium Si isotope fractionations among the quadra-coordinated organosilicon complexes and the H₄SiO₄ solution may also shed light on the Si isotope distributions in Si-accumulating plants.
Funding: Funded by the National Natural Science Foundation of China (Nos. 51278304, U1134209, U1434204 & 51422814), the National Basic Research Program (973 Program) of China (No. 011-CB013604), and the Technology Research and Development Program (Basic Research Project) of Shenzhen (Nos. JCYJ20120613174456685 & JCYJ20130329143859418).
Abstract: Diffusion has been systematically described as the main mechanism of chloride transport in reinforced concrete (RC) structures, especially when the concrete is in a saturated state. However, diffusion alone cannot describe the actual chloride ingress in nonsaturated concrete, which is instead dominated by the interaction of diffusion and convection. Taking the synergetic effects of various factors into account, this study aimed to modify and develop an analytical convection-diffusion coupling model for chloride transport in nonsaturated concrete. The model was verified by simulation of laboratory tests and field measurements. The results of the comparison study demonstrate that the analytical model developed in this study is efficient and accurate in predicting chloride profiles in nonsaturated concrete.
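As a point of reference for the diffusion-only description that the abstract calls insufficient, the classical error-function solution of Fick's second law for a semi-infinite medium with constant surface concentration can be sketched as follows. This is the standard textbook baseline, not the convection-diffusion coupling model developed in the paper; the function name and parameter names are hypothetical, and a constant diffusion coefficient is assumed.

```python
import math

def chloride_diffusion_profile(depth_m, time_s, diff_m2_s,
                               c_surface, c_initial=0.0):
    """Chloride content at a given depth from pure Fickian diffusion.

    C(x, t) = C0 + (Cs - C0) * (1 - erf(x / (2 * sqrt(D * t))))
    Assumes constant D and constant surface content Cs (no convection).
    """
    if time_s <= 0 or diff_m2_s <= 0:
        raise ValueError("time and diffusion coefficient must be positive")
    arg = depth_m / (2.0 * math.sqrt(diff_m2_s * time_s))
    return c_initial + (c_surface - c_initial) * (1.0 - math.erf(arg))

# Illustrative values only: D = 1e-12 m^2/s, t = 1e8 s, Cs = 0.5 (% by mass)
for depth in (0.0, 0.01, 0.02, 0.05):
    print(depth, chloride_diffusion_profile(depth, 1e8, 1e-12, 0.5))
```

In nonsaturated concrete, moisture movement carries chloride with it, so measured profiles can deviate from this monotone curve near the surface; that gap is what a coupled convection-diffusion model is meant to close.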
Funding: Under the auspices of the National High-tech R&D Program of China (No. 2013AA102301) and the National Natural Science Foundation of China (No. 71503148).
Abstract: To address the shortage of continuous parameters needed by models that estimate farmland soil organic carbon (SOC) content, this paper proposes a method for acquiring the factors influencing farmland SOC and a method for estimating farmland SOC content using the Internet of Things (IOT). An IOT sensing device and transmission network were established in a wheat demonstration base in Yanzhou District of Jining City, Shandong Province, China to acquire data in real time. Using real-time data and statistical data, the dynamic changes of SOC content between October 2012 and June 2015 were simulated in the experimental area with an SOC dynamic simulation model. To verify the estimation results, the potassium dichromate external heating method was applied to measure the SOC content. The results show that: 1) The estimated values match the values measured in the lab very well, so the proposed method is feasible. 2) There is a clear dynamic variation in the SOC content at 0.2 m soil depth in different growing periods of wheat. The content is highest during the sowing period and lowest in the flowering period. 3) The SOC content at 0.2 m soil depth varies in accordance with the amount of returned straw. The larger the amount of returned straw, the higher the SOC content.