Funding: Supported by the National Key Research and Development Program of China (2023YFC3303701-02 and 2024YFC3306701); the National Natural Science Foundation of China (T2425014 and 32270667); the Natural Science Foundation of Fujian Province of China (2023J06013); the Major Project of the National Social Science Foundation of China granted to Chuan-Chao Wang (21&ZD285); the Open Research Fund of the State Key Laboratory of Genetic Engineering at Fudan University (SKLGE-2310); and the Open Research Fund of the Forensic Genetics Key Laboratory of the Ministry of Public Security (2023FGKFKT07).
Abstract: The analysis of ancient genomes provides opportunities to explore human population history across both temporal and geographic dimensions (Haak et al., 2015; Wang et al., 2021, 2024). To enhance the accessibility and utility of ancient genomic datasets, a range of databases and advanced statistical models have been developed, including the Allen Ancient DNA Resource (AADR) (Mallick et al., 2024) and AdmixTools (Patterson et al., 2012). While upstream processes such as sequencing and raw data processing have been streamlined by resources like the AADR, the downstream analysis of these datasets, which encompasses population genetics inference and spatiotemporal interpretation, remains a significant challenge. The AADR provides a unified collection of published ancient DNA (aDNA) data, yet its file-based format and reliance on command-line tools, such as those in AdmixTools (Patterson et al., 2012), require advanced computational expertise for effective exploration and analysis. These requirements pose a barrier for researchers without that expertise, limiting the accessibility and broader application of these valuable genomic resources.
Funding: Canada Research Chair (950231363, XZ); Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (RGPIN-20203530, LX); and the Social Sciences and Humanities Research Council of Canada (SSHRC) Insight Development Grants (430-2018-00557, KX).
Abstract: Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock predictions. How raw data are processed into inputs for prediction models (e.g., data thinning and feature engineering) can substantially affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how these strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvements in predictions. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively.
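The abstract does not describe the three strategies themselves, so the following is only a minimal sketch of the generic terms it mentions: mid-price, data thinning, and the F1 score. All function names, the every-k-th-observation thinning rule, the up/down/stationary labelling scheme, and the toy quotes are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def mid_prices(quotes: Sequence[Tuple[float, float]]) -> List[float]:
    """Mid-price of each (best bid, best ask) quote: their simple average."""
    return [(bid + ask) / 2.0 for bid, ask in quotes]


def thin(series: Sequence[float], step: int) -> List[float]:
    """Assumed thinning rule: keep every `step`-th observation."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(series[::step])


def direction_labels(mids: Sequence[float], horizon: int = 1,
                     eps: float = 0.0) -> List[int]:
    """Label time t by the mid-price move over `horizon` steps.

    1 = up, -1 = down, 0 = stationary (absolute change not above eps).
    This three-class scheme is a common choice, assumed here.
    """
    labels = []
    for t in range(len(mids) - horizon):
        change = mids[t + horizon] - mids[t]
        if change > eps:
            labels.append(1)
        elif change < -eps:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def f1_score(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy quotes (hypothetical values, not real market data).
quotes = [(100.0, 100.5), (100.5, 101.0), (100.5, 101.0), (100.0, 100.5)]
mids = mid_prices(quotes)
labels = direction_labels(mids, horizon=1)
```

In a setup like this, the labels would serve as classifier targets (for example for an SVM), and the reported F1 gains would be differences between scores of this kind computed with and without a given preprocessing strategy.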