Climate model prediction has been improved by enhancing model resolution as well as the implementation of sophisticated physical parameterization and refinement of data assimilation systems[section 6.1 in Wang et al.(...Climate model prediction has been improved by enhancing model resolution as well as the implementation of sophisticated physical parameterization and refinement of data assimilation systems[section 6.1 in Wang et al.(2025)].In relation to seasonal forecasting and climate projection in the East Asian summer monsoon season,proper simulation of the seasonal migration of rain bands by models is a challenging and limiting factor[section 7.1 in Wang et al.(2025)].展开更多
Earth’s internal core and crustal magnetic fields,as measured by geomagnetic satellites like MSS-1(Macao Science Satellite-1)and Swarm,are vital for understanding core dynamics and tectonic evolution.To model these i...Earth’s internal core and crustal magnetic fields,as measured by geomagnetic satellites like MSS-1(Macao Science Satellite-1)and Swarm,are vital for understanding core dynamics and tectonic evolution.To model these internal magnetic fields accurately,data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere.However,the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood.This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models.First,we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling.Second,we briefly describe the method to co-estimate the core,crustal,and large-scale magnetospheric fields using satellite magnetic data.Finally,we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence.Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times,the resulting internal field differences at the Earth’s surface can range from tens to hundreds of nanotesla(nT).Additionally,we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites.These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.展开更多
Accurate estimation of evapotranspiration(ET),especially at the regional scale,is an extensively investigated topic in the field of water science. The ability to obtain a continuous time series of highly precise ET va...Accurate estimation of evapotranspiration(ET),especially at the regional scale,is an extensively investigated topic in the field of water science. The ability to obtain a continuous time series of highly precise ET values is necessary for improving our knowledge of fundamental hydrological processes and for addressing various problems regarding the use of water. This objective can be achieved by means of ET data assimilation based on hydrological modeling. In this paper,a comprehensive review of ET data assimilation based on hydrological modeling is provided. The difficulties and bottlenecks of using ET,being a non-state variable,to construct data assimilation relationships are elaborated upon,with a discussion and analysis of the feasibility of assimilating ET into various hydrological models. Based on this,a new easy-to-operate ET assimilation scheme that includes a water circulation physical mechanism is proposed. The scheme was developed with an improved data assimilation system that uses a distributed time-variant gain model(DTVGM),and the ET-soil humidity nonlinear time response relationship of this model. Moreover,the ET mechanism in the DTVGM was improved to perfect the ET data assimilation system. The new scheme may provide the best spatial and temporal characteristics for hydrological states,and may be referenced for accurate estimation of regional evapotranspiration.展开更多
OBJECTIVE: To help researchers selecting appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine(TCM) diagnosis and therapy.METHODS: Clinical issues based on...OBJECTIVE: To help researchers selecting appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine(TCM) diagnosis and therapy.METHODS: Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies:symptoms, symptom patterns, herbs, and efficacy.Existing problems were further generalized to determine the relevant factors of the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regards to statistical characters and informatics properties. Data models were compared simultaneously from the view of applied conditions and suitable scopes.RESULTS: The main application problems were the inconsistent data type and the small samples for the used data mining models, which caused the inappropriate results, even the mistake results. These features, i.e. advantages, disadvantages, satisfied data types, tasks of data mining, and the TCM issues, were summarized and compared.CONCLUSION: By aiming at the special features of different data mining models, the clinical doctors could select the suitable data mining models to resolve the TCM problem.展开更多
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems(ITS)in the real world.As a state-of-the-art generative model,the diffusion model has prov...Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems(ITS)in the real world.As a state-of-the-art generative model,the diffusion model has proven highly successful in image generation,speech generation,time series modelling etc.and now opens a new avenue for traffic data imputation.In this paper,we propose a conditional diffusion model,called the implicit-explicit diffusion model,for traffic data imputation.This model exploits both the implicit and explicit feature of the data simultaneously.More specifically,we design two types of feature extraction modules,one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series.This approach not only inherits the advantages of the diffusion model for estimating missing data,but also takes into account the multiscale correlation inherent in traffic data.To illustrate the performance of the model,extensive experiments are conducted on three real-world time series datasets using different missing rates.The experimental results demonstrate that the model improves imputation accuracy and generalization capability.展开更多
The parametric temporal data model captures a real world entity in a single tuple, which reduces query language complexity. Such a data model, however, is difficult to be implemented on top of conventional databases b...The parametric temporal data model captures a real world entity in a single tuple, which reduces query language complexity. Such a data model, however, is difficult to be implemented on top of conventional databases because of its unfixed attribute sizes. XML is a matured technology and can be an elegant solution for such challenge. Representing data in XML trigger a question about storage efficiency. The goal of this work is to provide a straightforward answer to such a question. To this end, we compare three different storage models for the parametric temporal data model and show that XML is not worse than any other approaches. Furthermore, XML outperforms the other storages under certain conditions. Therefore, our simulation results provide a positive indication that the myth about XML is not true in the parametric temporal data model.展开更多
An empirical likelihood approach to estimate the coefficients in linear model with interval censored responses is developed in this paper. By constructing unbiased transformation of interval censored data,an empirical...An empirical likelihood approach to estimate the coefficients in linear model with interval censored responses is developed in this paper. By constructing unbiased transformation of interval censored data,an empirical log-likelihood function with asymptotic X^2 is derived. The confidence regions for the coefficients are constructed. Some simulation results indicate that the method performs better than the normal approximation method in term of coverage accuracies.展开更多
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models ...This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.展开更多
Predicting tropical cyclone(TC)genesis is of great societal importance but scientifically challenging.It requires fineresolution coupled models that properly represent air−sea interactions in the atmospheric responses...Predicting tropical cyclone(TC)genesis is of great societal importance but scientifically challenging.It requires fineresolution coupled models that properly represent air−sea interactions in the atmospheric responses to local warm sea surface temperatures and feedbacks,with aid from coherent coupled initialization.This study uses three sets of highresolution regional coupled models(RCMs)covering the Asia−Pacific(AP)region initialized with local observations and dynamically downscaled coupled data assimilation to evaluate the predictability of TC genesis in the West Pacific.The APRCMs consist of three sets of high-resolution configurations of the Weather Research and Forecasting−Regional Ocean Model System(WRF-ROMS):27-km WRF with 9-km ROMS,and 9-km WRF with 3-km ROMS.In this study,a 9-km WRF with 9-km ROMS coupled model system is also used in a case test for the predictability of TC genesis.Since the local sea surface temperatures and wind shear conditions that favor TC formation are better resolved,the enhanced-resolution coupled model tends to improve the predictability of TC genesis,which could be further improved by improving planetary boundary layer physics,thus resolving better air−sea and air−land interactions.展开更多
A novel encryption model is proposed. It combines encryption process with compression process, and realizes compression and encryption at the same time. The model's feasibility and security are analyzed in detail. An...A novel encryption model is proposed. It combines encryption process with compression process, and realizes compression and encryption at the same time. The model's feasibility and security are analyzed in detail. And the relationship between its security and compression ratio is also analyzed.展开更多
Crowdsourced data can effectively observe environmental and urban ecosystem processes.The use of data produced by untrained people into flood forecasting models may effectively allow Early Warning Systems(EWS)to bette...Crowdsourced data can effectively observe environmental and urban ecosystem processes.The use of data produced by untrained people into flood forecasting models may effectively allow Early Warning Systems(EWS)to better perform while support decision-making to reduce the fatalities and economic losses due to inundation hazard.In this work,we develop a Data Assimilation(DA)method integrating Volunteered Geographic Information(VGI)and a 2D hydraulic model and we test its performances.The proposed framework seeks to extend the capabilities and performances of standard DA works,based on the use of traditional in situ sensors,by assimilating VGI while managing and taking into account the uncertainties related to the quality,and the location and timing of the entire set of observational data.The November 2012 flood in the Italian Tiber River basin was selected as the case study.Results show improvements of the model in terms of uncertainty with a significant persistence of the model updating after the integration of the VGI,even in the case of use of few-selected observations gathered from social media.This will encourage further research in the use of VGI for EWS considering the exponential increase of quality and quantity of smartphone and social media user worldwide.展开更多
In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. O...In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. One is the individual test and power calculation for varying dispersion through testing the randomness of cluster effects, which is extensions of Dean(1992) and Commenges et al (1994). The second test is the composite test for varying dispersion through simultaneously testing the randomness of cluster effects and the equality of random-effect means. The score test statistics are constructed and expressed in simple, easy to use, matrix formulas. The authors illustrate their test methods using the insecticide data (Giltinan, Capizzi & Malani (1988)).展开更多
Background:The universal occurrence of randomly distributed dark holes(i.e.,data pits appearing within the tree crown)in LiDAR-derived canopy height models(CHMs)negatively affects the accuracy of extracted forest inve...Background:The universal occurrence of randomly distributed dark holes(i.e.,data pits appearing within the tree crown)in LiDAR-derived canopy height models(CHMs)negatively affects the accuracy of extracted forest inventory parameters.Methods:We develop an algorithm based on cloth simulation for constructing a pit-free CHM.Results:The proposed algorithm effectively fills data pits of various sizes whilst preserving canopy details.Our pitfree CHMs derived from point clouds at different proportions of data pits are remarkably better than those constructed using other algorithms,as evidenced by the lowest average root mean square error(0.4981 m)between the reference CHMs and the constructed pit-free CHMs.Moreover,our pit-free CHMs show the best performance overall in terms of maximum tree height estimation(average bias=0.9674 m).Conclusion:The proposed algorithm can be adopted when working with different quality LiDAR data and shows high potential in forestry applications.展开更多
In this review, we highlight some recent methodological and theoretical develop- ments in estimation and testing of large panel data models with cross-sectional dependence. The paper begins with a discussion of issues...In this review, we highlight some recent methodological and theoretical develop- ments in estimation and testing of large panel data models with cross-sectional dependence. The paper begins with a discussion of issues of cross-sectional dependence, and introduces the concepts of weak and strong cross-sectional dependence. Then, the main attention is primarily paid to spatial and factor approaches for modeling cross-sectional dependence for both linear and nonlinear (nonparametric and semiparametric) panel data models. Finally, we conclude with some speculations on future research directions.展开更多
Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of th...Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of these data has not been well stored,managed and mined.With the development of cloud computing technology,it provides a rare development opportunity for logging big data private cloud.The traditional petrophysical evaluation and interpretation model has encountered great challenges in the face of new evaluation objects.The solution research of logging big data distributed storage,processing and learning functions integrated in logging big data private cloud has not been carried out yet.To establish a distributed logging big-data private cloud platform centered on a unifi ed learning model,which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via the unifi ed logging learning model integrating physical simulation and data models in a large-scale functional space,thus resolving the geo-engineering evaluation problem of geothermal fi elds.Based on the research idea of“logging big data cloud platform-unifi ed logging learning model-large function space-knowledge learning&discovery-application”,the theoretical foundation of unified learning model,cloud platform architecture,data storage and learning algorithm,arithmetic power allocation and platform monitoring,platform stability,data security,etc.have been carried on analysis.The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms.The feasibility of constructing a well logging big data cloud platform based on a unifi ed learning model of physics and data is analyzed in terms of the structure,ecology,management and security of the cloud platform.The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method,data software and results sharing,accuracy,speed and complexity.展开更多
With the development of drone technology and oblique photogrammetry technology, the acquisition of oblique photogrammetry models and basemap becomes more and more convenient and quickly. The increase in the number of ...With the development of drone technology and oblique photogrammetry technology, the acquisition of oblique photogrammetry models and basemap becomes more and more convenient and quickly. The increase in the number of basemap leads to excessively redundant basemap tiles requests in 3D GIS when loading oblique photogrammetry models, which slows down the system. Aiming at improving the speed of running system, this paper proposes a dynamic strategy for loading basemap tiles. Different from existing 3D GIS which loading oblique photogrammetry models and basemap tiles inde-pendently, this strategy dynamically loads basemap tiles depending on different height of view and the range of loaded oblique photogrammetry models. We achieve dynamic loading of basemap tiles by predetermining whether the basemap tiles will be covered by the oblique photogrammetry models. The experimental results show that this strategy can greatly reduce the num-ber of redundant requests from the client to the server while ensuring the user’s visual requirements for the oblique photogrammetric model.展开更多
In this paper, we present a set of best practices for workflow design and implementation for numerical weather prediction models and meteorological data service, which have been in operation in China Meteorological Ad...In this paper, we present a set of best practices for workflow design and implementation for numerical weather prediction models and meteorological data service, which have been in operation in China Meteorological Administration (CMA) for years and have been proven effective in reliably managing the complexities of large-scale meteorological related workflows. Based on the previous work on the platforms, we argue that a minimum set of guidelines including workflow scheme, module design, implementation standards and maintenance consideration during the whole establishment of the platform are highly recommended, serving to reduce the need for future maintenance and adjustment. A significant gain in performance can be achieved through the workflow-based projects. We believe that a good workflow system plays an important role in the weather forecast service, providing a useful tool for monitoring the whole process, fixing the errors, repairing a workflow, or redesigning an equivalent workflow pattern with new components.展开更多
In this paper, we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector ingeniously, we propose a...In this paper, we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector ingeniously, we propose an empirical likelihood based variable selection procedure, and show that it is consistent and satisfies the sparsity. The simulation studies show that the proposed variable selection method is workable.展开更多
Direct soil temperature(ST)measurement is time-consuming and costly;thus,the use of simple and cost-effective machine learning(ML)tools is helpful.In this study,ML approaches,including KStar,instance-based K-nearest l...Direct soil temperature(ST)measurement is time-consuming and costly;thus,the use of simple and cost-effective machine learning(ML)tools is helpful.In this study,ML approaches,including KStar,instance-based K-nearest learning(IBK),and locally weighted learning(LWL),coupled with resampling algorithms of bagging(BA)and dagging(DA)(BA-IBK,BA-KStar,BA-LWL,DA-IBK,DA-KStar,and DA-LWL)were developed and tested for multi-step ahead(3,6,and 9 d ahead)ST forecasting.In addition,a linear regression(LR)model was used as a benchmark to evaluate the results.A dataset was established,with daily ST time-series at 5 and 50 cm soil depths in a farmland as models’output and meteorological data as models’input,including mean(T_(mean)),minimum(Tmin),and maximum(T_(max))air temperatures,evaporation(Eva),sunshine hours(SSH),and solar radiation(SR),which were collected at Isfahan Synoptic Station(Iran)for 13 years(1992–2005).Six different input combination scenarios were selected based on Pearson’s correlation coefficients between inputs and outputs and fed into the models.We used 70%of the data to train the models,with the remaining 30%used for model evaluation via multiple visual and quantitative metrics.Our?ndings showed that T_(mean)was the most effective input variable for ST forecasting in most of the developed models,while in some cases the combinations of variables,including T_(mean)and T_(max)and T_(mean),T_(max),Tmin,Eva,and SSH proved to be the best input combinations.Among the evaluated models,BA-KStar showed greater compatibility,while in most cases,BA-IBK and-LWL provided more accurate results,depending on soil depth.For the 5 cm soil depth,BA-KStar had superior performance(i.e.,Nash-Sutcliffe efficiency(NSE)=0.90,0.87,and 0.85 for 3,6,and 9 d ahead forecasting,respectively);for the 50 cm soil depth,DA-KStar outperformed the other models(i.e.,NSE=0.88,0.89,and 0.89 for 3,6,and 9 d ahead forecasting,respectively).The results con?rmed that all hybrid models had higher prediction capabilities than the LR model.展开更多
We used simulated data to investigate both the small and large sample properties of the within-groups (WG) estimator and the first difference generalized method of moments (FD-GMM) estimator of a dynamic panel data (D...We used simulated data to investigate both the small and large sample properties of the within-groups (WG) estimator and the first difference generalized method of moments (FD-GMM) estimator of a dynamic panel data (DPD) model. The magnitude of WG and FD-GMM estimates are almost the same for square panels. WG estimator performs best for long panels such as those with time dimension as large as 50. The advantage of FD-GMM estimator however, is observed on panels that are long and wide, say with time dimension at least 25 and cross-section dimension size of at least 30. For small-sized panels, the two methods failed since their optimality was established in the context of asymptotic theory. We developed parametric bootstrap versions of WG and FD-GMM estimators. Simulation study indicates the advantages of the bootstrap methods under small sample cases on the assumption that variances of the individual effects and the disturbances are of similar magnitude. The boostrapped WG and FD-GMM estimators are optimal for small samples.展开更多
文摘Climate model prediction has been improved by enhancing model resolution as well as the implementation of sophisticated physical parameterization and refinement of data assimilation systems[section 6.1 in Wang et al.(2025)].In relation to seasonal forecasting and climate projection in the East Asian summer monsoon season,proper simulation of the seasonal migration of rain bands by models is a challenging and limiting factor[section 7.1 in Wang et al.(2025)].
基金supported by the National Natural Science Foundation of China(42250101)the Macao Foundation。
文摘Earth’s internal core and crustal magnetic fields,as measured by geomagnetic satellites like MSS-1(Macao Science Satellite-1)and Swarm,are vital for understanding core dynamics and tectonic evolution.To model these internal magnetic fields accurately,data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere.However,the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood.This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models.First,we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling.Second,we briefly describe the method to co-estimate the core,crustal,and large-scale magnetospheric fields using satellite magnetic data.Finally,we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence.Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times,the resulting internal field differences at the Earth’s surface can range from tens to hundreds of nanotesla(nT).Additionally,we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites.These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
基金National Key Basic Research Program of China(973 Program),No.2015CB452701National Natural Science Foundation of China,No.41271003+1 种基金No.41371043No.41401042
文摘Accurate estimation of evapotranspiration(ET),especially at the regional scale,is an extensively investigated topic in the field of water science. The ability to obtain a continuous time series of highly precise ET values is necessary for improving our knowledge of fundamental hydrological processes and for addressing various problems regarding the use of water. This objective can be achieved by means of ET data assimilation based on hydrological modeling. In this paper,a comprehensive review of ET data assimilation based on hydrological modeling is provided. The difficulties and bottlenecks of using ET,being a non-state variable,to construct data assimilation relationships are elaborated upon,with a discussion and analysis of the feasibility of assimilating ET into various hydrological models. Based on this,a new easy-to-operate ET assimilation scheme that includes a water circulation physical mechanism is proposed. The scheme was developed with an improved data assimilation system that uses a distributed time-variant gain model(DTVGM),and the ET-soil humidity nonlinear time response relationship of this model. Moreover,the ET mechanism in the DTVGM was improved to perfect the ET data assimilation system. The new scheme may provide the best spatial and temporal characteristics for hydrological states,and may be referenced for accurate estimation of regional evapotranspiration.
基金Supported by Research on Pattern differentiation of AIDS based on Graph Theroy of National Natural Science Foundation of China(No.81202858)Research on Intervention Evaluation of TCM Health Differentiation of National Key Technology Support Program(No.2012BAI25B02)+3 种基金Research and Development in Digital Information System of Traditional Chinese Medicine of National 863 Program of China(No.2012AA02A609)Acupuncture Efficacy of Gastrointestinal Dysfunction(No.ZZ05003)Acupuncture-point Specialty Analysis based on Image Processing Technology(No.ZZ03090)of Self-selected subject of China Academy of Chinese Medical SciencesSemantic Recognition of Tongue and Pulse based on Image Content of the Beijing Key Laboratory of Advanced Information Science and Network Technology(No.XDXX1306)
文摘OBJECTIVE: To help researchers selecting appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine(TCM) diagnosis and therapy.METHODS: Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies:symptoms, symptom patterns, herbs, and efficacy.Existing problems were further generalized to determine the relevant factors of the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regards to statistical characters and informatics properties. Data models were compared simultaneously from the view of applied conditions and suitable scopes.RESULTS: The main application problems were the inconsistent data type and the small samples for the used data mining models, which caused the inappropriate results, even the mistake results. These features, i.e. advantages, disadvantages, satisfied data types, tasks of data mining, and the TCM issues, were summarized and compared.CONCLUSION: By aiming at the special features of different data mining models, the clinical doctors could select the suitable data mining models to resolve the TCM problem.
基金partially supported by the National Natural Science Foundation of China(62271485)the SDHS Science and Technology Project(HS2023B044)
文摘Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems(ITS)in the real world.As a state-of-the-art generative model,the diffusion model has proven highly successful in image generation,speech generation,time series modelling etc.and now opens a new avenue for traffic data imputation.In this paper,we propose a conditional diffusion model,called the implicit-explicit diffusion model,for traffic data imputation.This model exploits both the implicit and explicit feature of the data simultaneously.More specifically,we design two types of feature extraction modules,one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series.This approach not only inherits the advantages of the diffusion model for estimating missing data,but also takes into account the multiscale correlation inherent in traffic data.To illustrate the performance of the model,extensive experiments are conducted on three real-world time series datasets using different missing rates.The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
基金supported by the National Research Foundation in Korea through contract N-12-NM-IR05
文摘The parametric temporal data model captures a real world entity in a single tuple, which reduces query language complexity. Such a data model, however, is difficult to be implemented on top of conventional databases because of its unfixed attribute sizes. XML is a matured technology and can be an elegant solution for such challenge. Representing data in XML trigger a question about storage efficiency. The goal of this work is to provide a straightforward answer to such a question. To this end, we compare three different storage models for the parametric temporal data model and show that XML is not worse than any other approaches. Furthermore, XML outperforms the other storages under certain conditions. Therefore, our simulation results provide a positive indication that the myth about XML is not true in the parametric temporal data model.
文摘An empirical likelihood approach to estimate the coefficients in linear model with interval censored responses is developed in this paper. By constructing unbiased transformation of interval censored data,an empirical log-likelihood function with asymptotic X^2 is derived. The confidence regions for the coefficients are constructed. Some simulation results indicate that the method performs better than the normal approximation method in term of coverage accuracies.
基金sponsored by the U.S.Department of Housing and Urban Development(Grant No.NJLTS0027-22)The opinions expressed in this study are the authors alone,and do not represent the U.S.Depart-ment of HUD’s opinions.
文摘This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
基金supported by the National Key Research&Development Program of China(Grant Nos.2017YFC1404100 and 2017YFC1404104)the National Natural Science Foundation of China(Grant Nos.41775100 and 41830964)。
文摘Predicting tropical cyclone(TC)genesis is of great societal importance but scientifically challenging.It requires fineresolution coupled models that properly represent air−sea interactions in the atmospheric responses to local warm sea surface temperatures and feedbacks,with aid from coherent coupled initialization.This study uses three sets of highresolution regional coupled models(RCMs)covering the Asia−Pacific(AP)region initialized with local observations and dynamically downscaled coupled data assimilation to evaluate the predictability of TC genesis in the West Pacific.The APRCMs consist of three sets of high-resolution configurations of the Weather Research and Forecasting−Regional Ocean Model System(WRF-ROMS):27-km WRF with 9-km ROMS,and 9-km WRF with 3-km ROMS.In this study,a 9-km WRF with 9-km ROMS coupled model system is also used in a case test for the predictability of TC genesis.Since the local sea surface temperatures and wind shear conditions that favor TC formation are better resolved,the enhanced-resolution coupled model tends to improve the predictability of TC genesis,which could be further improved by improving planetary boundary layer physics,thus resolving better air−sea and air−land interactions.
基金supported by the National Natural Science Foundation of China(60903197)the Major State Basic Research Development Program of China(2007CB310800)+1 种基金the Major Research Plan of the National Natural Science Foundation of China (90718006)the Foundation of Key Laboratory of Aerospace Information Security and Trust Computing Ministry of Education.
文摘A novel encryption model is proposed. It combines encryption process with compression process, and realizes compression and encryption at the same time. The model's feasibility and security are analyzed in detail. And the relationship between its security and compression ratio is also analyzed.
文摘Crowdsourced data can effectively observe environmental and urban ecosystem processes.The use of data produced by untrained people into flood forecasting models may effectively allow Early Warning Systems(EWS)to better perform while support decision-making to reduce the fatalities and economic losses due to inundation hazard.In this work,we develop a Data Assimilation(DA)method integrating Volunteered Geographic Information(VGI)and a 2D hydraulic model and we test its performances.The proposed framework seeks to extend the capabilities and performances of standard DA works,based on the use of traditional in situ sensors,by assimilating VGI while managing and taking into account the uncertainties related to the quality,and the location and timing of the entire set of observational data.The November 2012 flood in the Italian Tiber River basin was selected as the case study.Results show improvements of the model in terms of uncertainty with a significant persistence of the model updating after the integration of the VGI,even in the case of use of few-selected observations gathered from social media.This will encourage further research in the use of VGI for EWS considering the exponential increase of quality and quantity of smartphone and social media user worldwide.
基金The project supported by NNSFC (19631040), NSSFC (04BTJ002) and the grant for post-doctor fellows in SELF.
文摘In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. One is the individual test and power calculation for varying dispersion through testing the randomness of cluster effects, which is extensions of Dean(1992) and Commenges et al (1994). The second test is the composite test for varying dispersion through simultaneously testing the randomness of cluster effects and the equality of random-effect means. The score test statistics are constructed and expressed in simple, easy to use, matrix formulas. The authors illustrate their test methods using the insecticide data (Giltinan, Capizzi & Malani (1988)).
基金the National Natural Science Foundation of China(Grant Nos.41671414,41971380 and 41171265)the National Key Research and Development Program of China(No.2016YFB0501404).
文摘Background:The universal occurrence of randomly distributed dark holes(i.e.,data pits appearing within the tree crown)in LiDAR-derived canopy height models(CHMs)negatively affects the accuracy of extracted forest inventory parameters.Methods:We develop an algorithm based on cloth simulation for constructing a pit-free CHM.Results:The proposed algorithm effectively fills data pits of various sizes whilst preserving canopy details.Our pitfree CHMs derived from point clouds at different proportions of data pits are remarkably better than those constructed using other algorithms,as evidenced by the lowest average root mean square error(0.4981 m)between the reference CHMs and the constructed pit-free CHMs.Moreover,our pit-free CHMs show the best performance overall in terms of maximum tree height estimation(average bias=0.9674 m).Conclusion:The proposed algorithm can be adopted when working with different quality LiDAR data and shows high potential in forestry applications.
基金Supported by the National Natural Science Foundation of China(71131008(Key Project)and 71271179)
文摘In this review, we highlight some recent methodological and theoretical develop- ments in estimation and testing of large panel data models with cross-sectional dependence. The paper begins with a discussion of issues of cross-sectional dependence, and introduces the concepts of weak and strong cross-sectional dependence. Then, the main attention is primarily paid to spatial and factor approaches for modeling cross-sectional dependence for both linear and nonlinear (nonparametric and semiparametric) panel data models. Finally, we conclude with some speculations on future research directions.
基金supported By Grant (PLN2022-14) of State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation (Southwest Petroleum University)。
文摘Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of these data has not been well stored,managed and mined.With the development of cloud computing technology,it provides a rare development opportunity for logging big data private cloud.The traditional petrophysical evaluation and interpretation model has encountered great challenges in the face of new evaluation objects.The solution research of logging big data distributed storage,processing and learning functions integrated in logging big data private cloud has not been carried out yet.To establish a distributed logging big-data private cloud platform centered on a unifi ed learning model,which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via the unifi ed logging learning model integrating physical simulation and data models in a large-scale functional space,thus resolving the geo-engineering evaluation problem of geothermal fi elds.Based on the research idea of“logging big data cloud platform-unifi ed logging learning model-large function space-knowledge learning&discovery-application”,the theoretical foundation of unified learning model,cloud platform architecture,data storage and learning algorithm,arithmetic power allocation and platform monitoring,platform stability,data security,etc.have been carried on analysis.The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms.The feasibility of constructing a well logging big data cloud platform based on a unifi ed learning model of physics and data is analyzed in terms of the structure,ecology,management and security of the cloud platform.The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method,data software and results sharing,accuracy,speed and complexity.
文摘With the development of drone technology and oblique photogrammetry technology, the acquisition of oblique photogrammetry models and basemap becomes more and more convenient and quickly. The increase in the number of basemap leads to excessively redundant basemap tiles requests in 3D GIS when loading oblique photogrammetry models, which slows down the system. Aiming at improving the speed of running system, this paper proposes a dynamic strategy for loading basemap tiles. Different from existing 3D GIS which loading oblique photogrammetry models and basemap tiles inde-pendently, this strategy dynamically loads basemap tiles depending on different height of view and the range of loaded oblique photogrammetry models. We achieve dynamic loading of basemap tiles by predetermining whether the basemap tiles will be covered by the oblique photogrammetry models. The experimental results show that this strategy can greatly reduce the num-ber of redundant requests from the client to the server while ensuring the user’s visual requirements for the oblique photogrammetric model.
文摘In this paper, we present a set of best practices for workflow design and implementation for numerical weather prediction models and meteorological data service, which have been in operation in China Meteorological Administration (CMA) for years and have been proven effective in reliably managing the complexities of large-scale meteorological related workflows. Based on the previous work on the platforms, we argue that a minimum set of guidelines including workflow scheme, module design, implementation standards and maintenance consideration during the whole establishment of the platform are highly recommended, serving to reduce the need for future maintenance and adjustment. A significant gain in performance can be achieved through the workflow-based projects. We believe that a good workflow system plays an important role in the weather forecast service, providing a useful tool for monitoring the whole process, fixing the errors, repairing a workflow, or redesigning an equivalent workflow pattern with new components.
基金Supported by the National Natural Science Foundation of China(Grant Nos.1110111911126332)+2 种基金the National Social Science Foundation of China(Grant No.11CTJ004)the Natural Science Foundation of Guangxi Province(Grant No.2010GXNSFB013051)the Philosophy and Social Sciences Foundation of Guangxi Province(Grant No.11FTJ002)
文摘In this paper, we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector ingeniously, we propose an empirical likelihood based variable selection procedure, and show that it is consistent and satisfies the sparsity. The simulation studies show that the proposed variable selection method is workable.
文摘Direct soil temperature(ST)measurement is time-consuming and costly;thus,the use of simple and cost-effective machine learning(ML)tools is helpful.In this study,ML approaches,including KStar,instance-based K-nearest learning(IBK),and locally weighted learning(LWL),coupled with resampling algorithms of bagging(BA)and dagging(DA)(BA-IBK,BA-KStar,BA-LWL,DA-IBK,DA-KStar,and DA-LWL)were developed and tested for multi-step ahead(3,6,and 9 d ahead)ST forecasting.In addition,a linear regression(LR)model was used as a benchmark to evaluate the results.A dataset was established,with daily ST time-series at 5 and 50 cm soil depths in a farmland as models’output and meteorological data as models’input,including mean(T_(mean)),minimum(Tmin),and maximum(T_(max))air temperatures,evaporation(Eva),sunshine hours(SSH),and solar radiation(SR),which were collected at Isfahan Synoptic Station(Iran)for 13 years(1992–2005).Six different input combination scenarios were selected based on Pearson’s correlation coefficients between inputs and outputs and fed into the models.We used 70%of the data to train the models,with the remaining 30%used for model evaluation via multiple visual and quantitative metrics.Our?ndings showed that T_(mean)was the most effective input variable for ST forecasting in most of the developed models,while in some cases the combinations of variables,including T_(mean)and T_(max)and T_(mean),T_(max),Tmin,Eva,and SSH proved to be the best input combinations.Among the evaluated models,BA-KStar showed greater compatibility,while in most cases,BA-IBK and-LWL provided more accurate results,depending on soil depth.For the 5 cm soil depth,BA-KStar had superior performance(i.e.,Nash-Sutcliffe efficiency(NSE)=0.90,0.87,and 0.85 for 3,6,and 9 d ahead forecasting,respectively);for the 50 cm soil depth,DA-KStar outperformed the other models(i.e.,NSE=0.88,0.89,and 0.89 for 3,6,and 9 d ahead forecasting,respectively).The results con?rmed that all hybrid models had higher prediction capabilities than the LR model.
文摘We used simulated data to investigate both the small and large sample properties of the within-groups (WG) estimator and the first difference generalized method of moments (FD-GMM) estimator of a dynamic panel data (DPD) model. The magnitude of WG and FD-GMM estimates are almost the same for square panels. WG estimator performs best for long panels such as those with time dimension as large as 50. The advantage of FD-GMM estimator however, is observed on panels that are long and wide, say with time dimension at least 25 and cross-section dimension size of at least 30. For small-sized panels, the two methods failed since their optimality was established in the context of asymptotic theory. We developed parametric bootstrap versions of WG and FD-GMM estimators. Simulation study indicates the advantages of the bootstrap methods under small sample cases on the assumption that variances of the individual effects and the disturbances are of similar magnitude. The boostrapped WG and FD-GMM estimators are optimal for small samples.