The increasing frequency of extreme weather events raises the likelihood of forest wildfires. Therefore, establishing an effective fire prediction model is vital for protecting human life, property, and the environment. This study aims to build a prediction model to understand the spatial characteristics and piecewise effects of forest fire drivers. Using monthly grid data from 2006 to 2020, a modeling study analyzed fire occurrences during the September-to-April fire season in Fujian Province, China. We compared the fitting performance of the logistic regression model (LRM), the generalized additive logistic model (GALM), and the spatial generalized additive logistic model (SGALM). The results indicate that the SGALM had the best fit and the highest prediction accuracy. Meteorological factors significantly affected forest fires in Fujian Province. Areas with high fire incidence were mainly concentrated in the northwest and southeast. The SGALM improved the fit of fire prediction models by accounting for spatial effects and by flexibly fitting nonlinear relationships. This model provides piecewise interpretations of forest wildfire occurrences, which can be valuable for relevant departments and will help forest managers refine prevention measures based on temporal and spatial differences.
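The abstract reports that the SGALM fit best but does not name the comparison criterion. Assuming a likelihood-based criterion such as AIC (an assumption, not stated in the abstract), the comparison weighs the Bernoulli log-likelihood of each model's fitted fire probabilities against its effective degrees of freedom. All probabilities and degrees of freedom below are made-up placeholders, not the study's results.

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 fire occurrences y under fitted probabilities p."""
    eps = 1e-12  # guard against log(0)
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )

def aic(y, p, edf):
    """AIC = 2 * (effective degrees of freedom) - 2 * log-likelihood."""
    return 2 * edf - 2 * bernoulli_loglik(y, p)

# Hypothetical grid-month fire occurrences and fitted probabilities per model.
y = [0, 0, 1, 0, 1, 1, 0, 0]
fits = {
    "LRM": ([0.3, 0.3, 0.5, 0.4, 0.5, 0.6, 0.4, 0.3], 3.0),
    "GALM": ([0.2, 0.2, 0.7, 0.3, 0.6, 0.7, 0.3, 0.2], 5.0),
    "SGALM": ([0.1, 0.1, 0.8, 0.2, 0.8, 0.8, 0.2, 0.1], 7.0),
}
scores = {name: aic(y, p, edf) for name, (p, edf) in fits.items()}
best = min(scores, key=scores.get)  # lower AIC is better
print(scores, best)
```

With these toy numbers the extra flexibility of the smooth and spatial terms does not pay for its penalty, so the LRM has the lowest AIC; on real data the ranking depends on how much those terms raise the likelihood.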
The growth of Sakhalin fir (Abies sachalinensis) seedlings, an important forest tree species in northern Hokkaido, Japan, is significantly affected by competition from surrounding vegetation, especially evergreen dwarf bamboo. In this study, we investigated the height and root collar diameter (RCD) growth of Sakhalin fir seedlings under various degrees of cover by deciduous vegetation and evergreen dwarf bamboo. Generalized additive models were used to quantify the effects of canopy cover and forest floor cover on the relative growth rates of these two parameters. Canopy cover over Sakhalin fir seedlings had a nonlinear negative effect on both height growth in the subsequent year and RCD growth in the current year, given the general growth pattern of this species, in which height growth ceases in early summer while RCD growth continues until autumn. Height growth declined sharply once canopy cover exceeded 50%, whereas RCD growth declined rapidly between 0% and 50% canopy cover. Forest floor cover had a greater negative impact on RCD growth than on height growth. These results suggest that Sakhalin fir seedlings respond to vegetative competition by prioritizing height growth for light acquisition at the expense of diameter growth, and possibly of root growth for below-ground competition. Cover by evergreen dwarf bamboo reduced the height growth of fir seedlings significantly more than cover by deciduous vegetation. This difference is likely due to the timing of light availability: when competing with deciduous vegetation, Sakhalin fir seedlings are exposed to light after snowmelt and in early spring, before the deciduous canopy develops, and can photosynthesize more effectively, leading to greater height growth. The results highlight the importance of vegetation control that considers the type of competing vegetation for successful Sakhalin fir reforestation. Adjusting the intensity and timing of weeding based on the presence and abundance of dwarf bamboo and other competing vegetation could potentially reduce weeding costs and increase biodiversity in reforested areas.
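The abstract models relative growth rates (RGR) of height and RCD but does not give the formula. A common definition, assumed here, is the difference in log size per unit time; the seedling measurements below are hypothetical.

```python
import math

def relative_growth_rate(size_start, size_end, years=1.0):
    """Relative growth rate: (ln(size_end) - ln(size_start)) / elapsed time.

    Assumes the standard log-difference definition; the study's exact
    formula is not given in the abstract.
    """
    if size_start <= 0 or size_end <= 0:
        raise ValueError("sizes must be positive")
    return (math.log(size_end) - math.log(size_start)) / years

# Hypothetical seedling: height 40 cm -> 50 cm, RCD 8 mm -> 9 mm over one year.
rgr_height = relative_growth_rate(40.0, 50.0)
rgr_rcd = relative_growth_rate(8.0, 9.0)
print(round(rgr_height, 4), round(rgr_rcd, 4))
```

Because RGR is scale-free, height (cm) and RCD (mm) rates can be compared directly as responses in the same kind of GAM.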
This study aims to provide a predictive vegetation mapping approach based on spectral data, a DEM, and generalized additive models (GAMs). GAMs were used as a prediction tool to describe the relationship between vegetation and both environmental and spectral variables. Based on the fitted GAMs, a probability map of species occurrence was generated, and the vegetation type of each grid cell was then assigned according to the probability of species occurrence. Deviance analysis was employed to test the goodness of fit of the curves, and drop contributions were calculated to evaluate the contribution of each predictor in the fitted GAMs. The area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the resulting probability maps. The results showed that: 1) the AUC values of the fitted GAMs are very high, which shows that integrating spectral data and environmental variables in GAMs is a feasible way to map vegetation; 2) prediction accuracy varies by plant community, and densely covered communities are predicted better than sparse ones; 3) both spectral and environmental variables play an important role in mapping vegetation, although the contribution of the same predictor differs among the GAMs for different plant communities; 4) insufficient resolution of the spectral and environmental data, together with confounding effects of land use and other variables not closely related to environmental conditions, are the major causes of imprecision.
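The AUC used to assess the probability maps equals the probability that a randomly chosen presence cell receives a higher predicted probability than a randomly chosen absence cell (the Mann-Whitney form). A stdlib-only sketch on hypothetical presence/absence data:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random presence outscores a random absence.

    Equivalent to the area under the ROC curve (Mann-Whitney U / (n1 * n0));
    ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical presence/absence of one plant community and GAM-predicted probabilities.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2]
print(roc_auc(observed, predicted))  # 15 of 16 pairs ranked correctly -> 0.9375
```

The pairwise loop is O(n1 * n0); for full-resolution maps a rank-based computation is preferable, but the result is the same.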
This research develops a new mathematical modeling method that combines industrial big data with process mechanism analysis under the framework of generalized additive models (GAMs) to produce a practical model with both generalization ability and precision. Specifically, the proposed modeling method includes the following steps. First, the influencing factors are screened using mechanism knowledge and data-mining methods. Second, a unary GAM without interactions is built, which involves cleaning the data, building the sub-models, and verifying the sub-models. Subsequently, the interactions between the various factors are explored, and a binary GAM with interactions is constructed. The relationships among the sub-models are analyzed, and the integrated model is built. Finally, based on the proposed modeling method, two prediction models, one for mechanical properties and one for deformation resistance of hot-rolled strips, are established. Verification on actual industrial data demonstrates that the new models have good prediction precision: the mean absolute percentage errors (MAPE) of tensile strength, yield strength, and deformation resistance are 2.54%, 3.34%, and 6.53%, respectively. The experimental results suggest that the proposed method offers a new approach to industrial process modeling.
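The precision figures above are mean absolute percentage errors. A minimal sketch of the metric on hypothetical tensile-strength values (not the study's data):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured tensile strengths (MPa) of hot-rolled strips vs. model predictions.
measured = [400.0, 500.0, 450.0, 520.0]
model = [410.0, 490.0, 441.0, 530.4]
print(mape(measured, model))
```

Each prediction here is off by 2% to 2.5%, so the MAPE is about 2.1%, the same order as the reported tensile-strength error.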
A model of deformation resistance during hot strip rolling was established based on a generalized additive model. First, a data modeling method based on the generalized additive model was given. It covered selecting the dependent and independent variables, choosing the link function of the dependent variable and the smoothing functional form of each independent variable, estimating the link function and smooth functions, and finally modifying the model. Then, a practical modeling test was carried out on a large amount of hot rolling process data. An integrated variable was proposed to reflect the effects of different chemical components such as carbon, silicon, manganese, nickel, chromium, and niobium. The integrated chemical composition, strain, strain rate, and rolling temperature were selected as independent variables, with cubic splines as their smooth functions. The deformation resistance model was fitted in SAS, and the influence curves of the independent variables on deformation resistance were obtained with the local scoring algorithm. Some interesting phenomena were found; for example, there is a critical value of strain rate below which deformation resistance increases and above which it decreases. The results confirm that the new model has higher prediction accuracy than traditional models and is suitable for carbon steel, microalloyed steel, alloyed steel, and other steel grades.
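The smooth functions above were estimated with the local scoring algorithm. For an identity link (Gaussian response), local scoring reduces to backfitting: each smooth is repeatedly re-estimated from the partial residuals left by the intercept and the other smooths. The sketch below shows backfitting with two predictors, substituting a simple running-mean smoother for the cubic splines the study used; the data are synthetic stand-ins for effects such as strain rate and temperature.

```python
def running_mean_smoother(x, r, k=2):
    """Smooth residuals r against x by averaging up to k neighbours on each side
    in x-order (a crude stand-in for a cubic smoothing spline)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        window = order[max(0, pos - k): pos + k + 1]
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted

def backfit(X, y, n_iter=20, k=2):
    """Fit y ~ alpha + sum_j f_j(X[j]) by backfitting (identity link)."""
    n, p = len(y), len(X)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and every other smooth.
            partial = [y[i] - alpha - sum(f[m][i] for m in range(p) if m != j)
                       for i in range(n)]
            fj = running_mean_smoother(X[j], partial, k)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre each smooth for identifiability
    fitted = [alpha + sum(f[j][i] for j in range(p)) for i in range(n)]
    return alpha, f, fitted

# Synthetic data: a U-shaped effect of x1 plus a linear effect of x2.
x1 = [i / 10 for i in range(30)]
x2 = [((7 * i) % 30) / 10 for i in range(30)]
y = [(a - 1.5) ** 2 + 0.5 * b for a, b in zip(x1, x2)]
alpha, smooths, fitted = backfit([x1, x2], y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
print(rss / tss)
```

For a non-identity link, local scoring wraps this loop inside an IRLS iteration, smoothing a weighted working response instead of y.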
The Yellow River Delta contains typical littoral wetland ecosystems. To study the relationships between Tamarix chinensis and environmental variables and to predict the potential distribution of T. chinensis in the Yellow River Delta, 641 vegetation samples and 964 soil samples were collected in the area in October of 2004, 2005, 2006, and 2007. The contents of soil organic matter, total phosphorus, salt, and soluble potassium were determined, and the measurements were then interpolated into spatial raster data by Kriging. The digital elevation model, soil type map, and landform unit map of the Yellow River Delta were also collected. Generalized additive models (GAMs) were employed to build a species-environment model and then simulate the potential distribution of T. chinensis. The results indicated that the distribution of T. chinensis was mainly limited by soil salt content, total soil phosphorus content, soluble potassium content, soil type, landform unit, and elevation. The distribution probability of T. chinensis was produced with a lookup table generated by the Grasp module (based on GAMs) in ArcView GIS 3.2. The AUC (area under the curve) values of the ROC (receiver operating characteristic) curve for both validation and cross-validation were higher than 0.8, which suggests that the established model predicts the species distribution with high precision.
Fault monitoring of bioprocesses is important to ensure reactor safety and maintain high product quality. Because it is difficult to build an accurate mechanistic model of a bioprocess, fault monitoring based on rich historical or online databases is an effective alternative. Bootstrap resampling stochastically generates groups of data from the samples, improving the generalization capability of the model. In this paper, online fault monitoring using generalized additive models (GAMs) combined with the bootstrap is proposed for the glutamate fermentation process. GAMs and the bootstrap are first used to determine confidence intervals from the online and offline normal sampled data of glutamate fermentation experiments. GAMs are then used for online fault monitoring of time, dissolved oxygen, oxygen uptake rate, and carbon dioxide evolution rate. The method provides accurate online fault alarms and yields useful information for removing faults and abnormal phenomena in the fermentation.
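The confidence limits above come from bootstrap resampling of normal-operation data. A minimal sketch of a percentile bootstrap interval for the mean of one monitored variable (for example dissolved oxygen at a given fermentation time); it omits the GAM fit that the paper combines with the bootstrap, and the readings are hypothetical.

```python
import random

def bootstrap_percentile_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement n_boot times and sort the replicate statistics.
    reps = sorted(stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot))
    lo_idx = int(round((alpha / 2) * n_boot))
    hi_idx = int(round((1 - alpha / 2) * n_boot)) - 1
    return reps[lo_idx], reps[hi_idx]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical dissolved-oxygen readings (%) from normal fermentation batches.
normal_do = [42.1, 40.8, 43.5, 41.9, 42.7, 39.8, 41.2, 42.9, 40.5, 43.0]
lower, upper = bootstrap_percentile_ci(normal_do, mean)
print(round(lower, 2), round(upper, 2))
```

Fixing the seed makes the interval reproducible; in an online setting the limits would be computed offline from normal batches and then compared with the running trajectory.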
For nonparametric regression, the GAM procedure is the most versatile of several new procedures. The methodology behind it is more flexible than traditional parametric modeling tools: it relaxes the usual assumptions of parametric models and makes it possible to uncover structure in the relationship between the independent variables and a dependent variable from the exponential family that may not otherwise be obvious. In this paper, we discuss two methods of fitting a generalized additive logistic regression model, one based on the Newton-Raphson method and the other on iteratively weighted least squares, for first- and second-order Taylor series expansions. The GAM procedure with the specified set of weights, using the local scoring algorithm, was applied to real-life data sets, with a cubic spline smoother applied to the independent variables. Built on nonparametric regression and smoothing techniques, this procedure provides powerful tools for data analysis.
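For the logit link, Newton-Raphson and iteratively reweighted least squares (IRLS) coincide, because the canonical link makes the observed and expected information equal. A stdlib-only sketch of the IRLS step for a purely parametric model with one predictor follows; in local scoring the weighted least-squares solve is replaced by weighted backfitting of smoothers, which is not shown. The data are hypothetical and deliberately not perfectly separable, so the maximum-likelihood estimate exists.

```python
import math

def irls_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the 2x2 weighted normal equations X'WX b = X'Wz.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)          # IRLS weight = Var(y) under the logit model
            z = eta + (yi - mu) / w      # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = irls_logistic(x, y)
print(round(b0, 3), round(b1, 3))
```

At convergence the score equations hold: the residuals y - mu sum to zero and are orthogonal to x.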
In this study, the horizontal and vertical distribution of primary production (PP) and its monthly variations were described based on field data collected in Daya Bay from January to December 2016. The relationships between PP and environmental factors were analyzed using a generalized additive model (GAM). Significant seasonal differences were observed in the horizontal distribution of PP, while the vertical distribution showed a relatively consistent unimodal pattern. The monthly average PP (calculated as carbon) ranged from 48.03 to 390.56 mg/(m²·h), with an annual average of 182.77 mg/(m²·h). The highest PP was observed in May and the lowest in November. The overall seasonal trend in PP was spring > summer > winter > autumn, and spring PP was approximately three times autumn PP. GAM analysis revealed that temperature, bottom salinity, phytoplankton, and photosynthetically active radiation (PAR) had no significant relationship with PP, whereas longitude, depth, surface salinity, chlorophyll a (Chl a), and transparency were significantly correlated with PP. Overall, the results indicate that monsoonal changes and terrestrial and offshore water systems have crucial effects on the environmental factors associated with PP changes.
Count data that exhibit overdispersion (variance of the counts larger than their mean) are commonly analyzed using discrete distributions such as the negative binomial, the Poisson inverse Gaussian, and other models. The Poisson distribution is characterized by equal mean and variance, whereas the negative binomial and Poisson inverse Gaussian distributions have variance larger than the mean and are therefore more appropriate for modeling over-dispersed count data. As an alternative to these two models, we use the generalized Poisson distribution for group comparisons in the presence of multiple covariates. This problem is known as analysis of covariance (ANCOVA) and has been solved for continuous data. Our objectives were to develop ANCOVA using the generalized Poisson distribution and to compare its goodness of fit with that of nonparametric generalized additive models. Using real-life data, we show that the model performs quite satisfactorily compared with nonparametric generalized additive models.
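The abstract does not give the parameterization of the generalized Poisson distribution; assuming Consul's common form, P(Y = y) = θ(θ + λy)^(y−1) e^(−θ−λy) / y! for θ > 0 and 0 ≤ λ < 1, the mean is θ/(1 − λ) and the variance θ/(1 − λ)³, so 0 < λ < 1 gives overdispersion and λ = 0 recovers the Poisson. A numerical check of those moments:

```python
import math

def gen_poisson_logpmf(y, theta, lam):
    """log P(Y = y) for Consul's generalized Poisson (theta > 0, 0 <= lam < 1)."""
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))

theta, lam = 2.0, 0.3
support = range(0, 200)  # truncation; the remaining tail is negligible for these values
probs = [math.exp(gen_poisson_logpmf(y, theta, lam)) for y in support]
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(total, mean, var, var / mean)  # dispersion ratio equals 1 / (1 - lam)**2
```

Working on the log scale with `math.lgamma` avoids overflow in (θ + λy)^(y−1) and y! for large counts.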
This paper discusses variable selection for interval-censored failure time data, a general type of failure time data that commonly arises in many areas such as clinical trials and follow-up studies. Although some methods have been developed in the literature for this problem, most existing procedures apply only to specific models. In this paper, we consider data arising from a general class of partly linear additive generalized odds rate models and propose a penalized variable selection approach that maximizes a derived penalized likelihood function. In the method, Bernstein polynomials are employed to approximate both the unknown baseline hazard functions and the nonlinear covariate effect functions, and a coordinate descent algorithm is developed to implement the method. The asymptotic properties of the proposed estimators, including the oracle property, are also established. An extensive simulation study assessing the finite-sample performance of the proposed estimators indicates that they work well in practice. Finally, the proposed method is applied to a set of real data on Alzheimer's disease.
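Bernstein polynomials represent a function on [0, 1] as a sum of coefficients times the basis b_{k,n}(t) = C(n, k) t^k (1 − t)^(n−k). The sketch below builds the basis and the classical Bernstein approximation with coefficients g(k/n); the paper instead estimates the coefficients by penalized likelihood, which is not shown. A general property worth knowing is that nondecreasing coefficients give a nondecreasing polynomial, which is why this basis is convenient for cumulative hazards.

```python
from math import comb, exp

def bernstein_basis(t, n):
    """Values of the n + 1 Bernstein basis polynomials of degree n at t in [0, 1]."""
    return [comb(n, k) * t ** k * (1 - t) ** (n - k) for k in range(n + 1)]

def bernstein_approx(g, n, t):
    """Classical Bernstein approximation of g on [0, 1], evaluated at t."""
    return sum(g(k / n) * b for k, b in enumerate(bernstein_basis(t, n)))

def g(t):
    """Stand-in for an unknown smooth covariate effect (hypothetical)."""
    return exp(t)

grid = [i / 100 for i in range(101)]
errors = {n: max(abs(bernstein_approx(g, n, t) - g(t)) for t in grid) for n in (3, 10, 40)}
print(errors)  # the maximum error shrinks as the degree n grows
```

The basis values are nonnegative and sum to one at every t, so each approximation is a weighted average of the coefficients.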
This paper proposes a probabilistic precipitation forecasting model using generalized additive models (GAMs) and Bayesian model averaging (BMA). GAMs were used to fit spatial-temporal precipitation models to individual ensemble member forecasts. The distributions of precipitation occurrence and cumulative precipitation amount were represented simultaneously by a single Tweedie distribution. BMA was then used as a post-processing method to combine the individual models into a more skillful probabilistic forecasting model. The mixing weights were estimated using the expectation-maximization (EM) algorithm. Residual diagnostics were used to examine whether the fitted BMA forecasting model fully captured the spatial and temporal variations of precipitation. The proposed method was applied to daily observations in the Yishusi River basin for July 2007 using the National Centers for Environmental Prediction ensemble forecasts. Verified with scoring rules, the BMA forecasts performed better than the empirical probabilistic ensemble forecasts, particularly for extreme precipitation. Finally, possible improvements and the application of this method to downscaling climate change scenarios are discussed.
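In BMA the predictive density is a weighted mixture of the member densities, and EM alternates between computing each member's responsibility for each observation (E-step) and averaging those responsibilities into new weights (M-step). The sketch below assumes Gaussian member densities with a fixed spread, a simplification in place of the Tweedie members used in the paper (a Gaussian cannot represent the point mass at zero rainfall), and estimates only the weights. Observations and forecasts are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bma_weights_em(obs, forecasts, sigma=1.0, n_iter=200, tol=1e-10):
    """EM for BMA mixing weights with fixed Gaussian member densities.

    obs: T observations; forecasts: K lists of T member forecasts.
    """
    K, T = len(forecasts), len(obs)
    w = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: responsibility of member k for observation t, summed over t.
        resp_sum = [0.0] * K
        for t in range(T):
            dens = [w[k] * normal_pdf(obs[t], forecasts[k][t], sigma) for k in range(K)]
            total = sum(dens)
            for k in range(K):
                resp_sum[k] += dens[k] / total
        # M-step: each new weight is the average responsibility.
        new_w = [r / T for r in resp_sum]
        done = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if done:
            break
    return w

obs = [3.0, 5.0, 0.0, 8.0, 2.0]
members = [
    [3.2, 4.8, 0.1, 7.5, 2.2],  # member close to the observations
    [5.0, 2.0, 2.5, 4.0, 4.5],  # poorer member
]
weights = bma_weights_em(obs, members)
print([round(v, 3) for v in weights])
```

The weights always sum to one, and the member whose forecasts sit closer to the observations receives most of the weight.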
Climate change is one of the critical determinants affecting the life cycles and transmission of most infectious agents, including malaria, cholera, dengue fever, hand, foot, and mouth disease (HFMD), and the recent coronavirus pandemic. HFMD has been associated with a growing number of outbreaks resulting in fatal complications since the late 1990s. The outbreaks may result from a combination of rapid population growth, climate change, socioeconomic changes, and other lifestyle changes. However, the modeling of climate variability and HFMD remains unclear, particularly in the development of statistical theory. The statistical relationship between HFMD and climate factors has been widely studied using generalized linear and generalized additive models. When dealing with time-series data with clustered variables, such as HFMD cases clustered by state, the independence assumption of both modeling approaches may be violated. Thus, a generalized additive mixed model (GAMM) is used to investigate the relationship between HFMD and climate factors in Malaysia. The model is improved by adding a first-order autoregressive term and treating all Malaysian states as a random effect. This method is preferred because it allows states to be modeled as random effects and accounts for autocorrelation in the time-series data. The findings indicate that climate variables such as rainfall and wind speed affect HFMD cases in Malaysia. The risk of HFMD increased over the subsequent two weeks when rainfall was below 60 mm and decreased when rainfall exceeded 60 mm. In addition, wind speeds between 2 and 5 m/s at a two-week lag reduced the chance of HFMD. The results also show that HFMD cases rose in Malaysia during the inter-monsoon and southwest monsoon seasons but fell during the northeast monsoon. The study's outcomes can be used by public health officials and the general public to raise awareness and thus implement effective preventive measures.
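The first-order autoregressive term assumes that residuals within a state follow e_t = φ e_(t−1) + u_t, which implies corr(e_s, e_t) = φ^|s−t|. The sketch builds that correlation matrix and recovers φ from a simulated residual series by the lag-1 sample autocorrelation; φ and the series are hypothetical, and in the paper the AR(1) parameter is estimated jointly with the GAMM rather than separately.

```python
import random

def ar1_corr_matrix(n, phi):
    """Correlation matrix implied by an AR(1) process: corr(e_s, e_t) = phi**|s - t|."""
    return [[phi ** abs(s - t) for t in range(n)] for s in range(n)]

def lag1_autocorrelation(series):
    """Moment estimate of phi: the lag-1 sample autocorrelation."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in series)
    return num / den

# Simulate weekly residuals for one state with a hypothetical phi of 0.6.
rng = random.Random(1)
phi_true = 0.6
e = [0.0]
for _ in range(5000):
    e.append(phi_true * e[-1] + rng.gauss(0.0, 1.0))
phi_hat = lag1_autocorrelation(e)
print(round(phi_hat, 2))
print(ar1_corr_matrix(3, 0.5))
```

Ignoring this correlation would treat consecutive weeks as independent and typically understate the standard errors of the climate effects.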
Generalized additive partial linear models (GAPLMs) have been widely used for flexible modeling of various types of response. In practice, missing data frequently occur in studies of economics, medicine, and public health. We address the problem of identifying and estimating a GAPLM when the response variable is nonignorably missing. Three types of monotone missing-data mechanism are assumed: the logistic, probit, and complementary log-log models. In this situation, the likelihood based on the observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions, and we then construct estimators of the unknown parameters and unknown functions with a likelihood-based approach, expanding the unknown functions as linear combinations of polynomial spline functions. We establish asymptotic normality of the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
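Expanding an unknown function as a linear combination of polynomial splines turns it into a finite set of regression coefficients that a likelihood can estimate. The sketch uses a cubic spline basis in truncated-power form, 1, x, x², x³, (x − κ_j)³₊, with hypothetical knots and coefficients; the abstract does not say which spline basis the authors use (B-splines are a common, better-conditioned alternative spanning the same space).

```python
def cubic_truncated_power_basis(x, knots):
    """Basis row [1, x, x^2, x^3, (x - k1)^3_+, ..., (x - kK)^3_+]."""
    row = [1.0, x, x ** 2, x ** 3]
    row += [max(x - k, 0.0) ** 3 for k in knots]
    return row

knots = [0.25, 0.5, 0.75]                     # hypothetical interior knots
coef = [0.0, 1.0, 0.0, 0.0, 2.0, -4.0, 2.0]   # hypothetical spline coefficients

def spline(x):
    """Evaluate the spline as coefficients times basis values."""
    return sum(c * b for c, b in zip(coef, cubic_truncated_power_basis(x, knots)))

print(len(cubic_truncated_power_basis(0.6, knots)), spline(0.1), spline(0.6))
```

Left of the first knot only the global polynomial terms are active, so `spline(0.1)` equals 0.1; each knot adds one coefficient, which is how the number of knots controls flexibility.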
This paper covers predicting features of high-resolution electricity peak demand from lower-resolution data. This setup is relevant because it answers whether limited high-resolution monitoring helps estimate future high-resolution peak loads once the high-resolution data are no longer available. That question is particularly interesting for network operators considering replacing high-resolution monitoring with predictive models for economic reasons. We propose models to predict half-hourly minima and maxima of high-resolution (one-minute) electricity load data from lower-resolution (30-minute) inputs. We combine predictions of generalized additive models (GAMs) and deep artificial neural networks (DNNs), both popular in load forecasting. We extensively analyze the prediction models, including the importance of the input parameters, focusing on load, weather, and seasonal effects. The proposed method won a data competition organized by Western Power Distribution, a British distribution network operator. In addition, we provide a rigorous evaluation study that goes beyond the competition frame to analyze the models' robustness. The results show that the proposed methods outperform the competition benchmark in terms of out-of-sample root mean squared error (RMSE), both for the competition month and for the supplementary evaluation study, which covers an additional eleven months. Overall, our proposed model combination reduces the out-of-sample RMSE by 57.4% compared with the benchmark.
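The reported 57.4% is a relative reduction in out-of-sample RMSE. The sketch shows that arithmetic on made-up numbers, combining a GAM and a DNN forecast by an equal-weight average; that combination rule is an assumption for illustration, since the abstract does not describe how the predictions are combined.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def relative_reduction(model_err, benchmark_err):
    """Percentage reduction of an error metric relative to a benchmark."""
    return 100.0 * (benchmark_err - model_err) / benchmark_err

# Hypothetical half-hourly peak loads (kW) and predictions.
actual = [10.0, 12.0, 9.0, 11.0]
benchmark = [12.0, 10.0, 11.0, 9.0]
gam_pred = [10.0, 11.0, 9.0, 10.0]
dnn_pred = [11.0, 12.0, 10.0, 11.0]
combined = [(g + d) / 2 for g, d in zip(gam_pred, dnn_pred)]  # equal-weight average (assumed)

print(rmse(actual, benchmark), rmse(actual, combined),
      relative_reduction(rmse(actual, combined), rmse(actual, benchmark)))
```

These toy numbers give a 75% reduction, not the paper's figure; the point is only how the percentage is formed.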
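To address the bias caused by the long-standing reliance of albacore (Thunnus alalunga) stock assessments in the southeastern Pacific on catch data from countries such as Japan and South Korea, an abundance index was constructed using China's longline fishery data as the base data. Based on China's tuna longline fishery catch data and marine environmental parameters for the 10 years from 2013 to 2022, a generalized additive model (GAM) was used to standardize catch per unit effort (CPUE), quantifying the effects of latitude, longitude, year, month, environmental factors, and their interactions, and an ordinary least squares (OLS) regression model was used to compare the trends in standardized CPUE between the Chinese and Japanese longline fisheries. The results show that the maximum deviance explained by the GAM was 69.8%, and latitude contributed most significantly to CPUE. Abundance was highest in the region from 20°S to 30°S and 100°W to 120°W; resource density was highest in 2016 and in the months April to August. Standardized CPUE followed roughly the same trend as nominal CPUE, with marked seasonal fluctuations. Except in 2020, standardized CPUE was lower than nominal CPUE. In most years, the standardized CPUE based on Chinese fishery data followed a trend similar to that based on Japanese longline fishery data. This study provides new abundance index information for the stock assessment of albacore in the southeastern Pacific and offers a useful reference for further improving the reliability of stock assessments.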
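Nominal CPUE is catch divided by effort; for longline fisheries it is commonly expressed per 1,000 hooks, and GAM standardization typically models log(CPUE + c), with c a small constant for zero-catch sets, against year, month, location, and environmental covariates. The sketch computes yearly nominal CPUE and the log-transformed response; the records, the per-1,000-hooks unit, and the constant are illustrative assumptions, not details from the abstract. The standardized index itself (the GAM year effect) is not shown.

```python
import math
from collections import defaultdict

# Hypothetical set-level longline records: (year, albacore caught, hooks deployed).
records = [
    (2016, 120, 3000), (2016, 90, 2500),
    (2017, 60, 3000), (2017, 0, 2000),
]

def nominal_cpue_by_year(rows, per_hooks=1000):
    """Total catch / total hooks per year, scaled to catch per `per_hooks` hooks."""
    catch, hooks = defaultdict(float), defaultdict(float)
    for year, c, h in rows:
        catch[year] += c
        hooks[year] += h
    return {y: per_hooks * catch[y] / hooks[y] for y in sorted(catch)}

def log_cpue_response(rows, per_hooks=1000, offset=0.1):
    """Per-set log(CPUE + offset), a typical response variable for GAM standardization."""
    return [math.log(per_hooks * c / h + offset) for _, c, h in rows]

print(nominal_cpue_by_year(records))
print([round(v, 3) for v in log_cpue_response(records)])
```

The offset keeps zero-catch sets in the model; its value affects the standardized index, so it is usually reported alongside the results.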
Habitat suitability index (HSI) models have been widely used to analyze the relationship between species abundance and environmental factors and, ultimately, to inform the management of marine species. The response of species abundance differs for each environmental variable, and habitat requirements may change across life history stages and seasons. It is therefore necessary to determine the optimal combination of environmental variables in HSI modeling. In this study, generalized additive models (GAMs) were used to determine which environmental variables to include in the HSI models. Significant variables were retained and weighted in the HSI model according to their relative contribution (%) to the total deviance explained by a boosted regression tree (BRT). The HSI models were applied to evaluate the habitat suitability of the mantis shrimp Oratosquilla oratoria in Haizhou Bay and adjacent areas in 2011 and 2013–2017. Ontogenetic and seasonal variations in the HSI models of mantis shrimp were also examined. Among the four models (the non-optimized model, the BRT-informed HSI model, the GAM-informed HSI model, and the HSI model informed by both BRT and GAM), the model informed by both BRT and GAM performed best. Four environmental variables (bottom temperature, depth, distance offshore, and sediment type) were selected in the HSI models for four groups of mantis shrimp (spring juveniles, spring adults, fall juveniles, and fall adults). The distribution of habitat suitability showed similar patterns for juveniles and adults, but obvious seasonal variations were observed. This study suggests that optimizing the environmental variables in HSI models improves their performance, and this optimization strategy could be extended to other marine organisms to enhance understanding of the habitat suitability of target species.
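Each retained variable is weighted by its relative contribution to the deviance explained by the BRT. Assuming the HSI is a weighted arithmetic mean of per-variable suitability indices (SI in [0, 1]), which is one common form but is not confirmed by the abstract, the combination for a single grid cell looks like this; the contributions and SI values are hypothetical.

```python
def weighted_hsi(si_values, contributions):
    """Weighted arithmetic-mean HSI for one grid cell.

    si_values: suitability index (0-1) per environmental variable.
    contributions: relative contributions (%) from a BRT; renormalised to sum to 1.
    """
    if set(si_values) != set(contributions):
        raise ValueError("variables in si_values and contributions must match")
    total = sum(contributions.values())
    return sum(si_values[v] * contributions[v] / total for v in si_values)

# Hypothetical BRT relative contributions (%) for the four retained variables.
contrib = {"bottom_temp": 40.0, "depth": 30.0, "distance_offshore": 20.0, "sediment": 10.0}
# Hypothetical suitability of one grid cell for each variable.
si = {"bottom_temp": 0.9, "depth": 0.6, "distance_offshore": 0.5, "sediment": 0.2}
print(round(weighted_hsi(si, contrib), 3))
```

Renormalising the contributions keeps the HSI in [0, 1] even when only a subset of the BRT variables is retained.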
Funding (Fujian Province forest wildfire prediction study): Supported by the Fujian Provincial Science and Technology Program “University-Industry Cooperation Project” (2024Y4015) and the National Key R&D Plan of Strategic International Scientific and Technological Innovation Cooperation Project (2018YFE0207800).
Funding (Sakhalin fir seedling growth study): Supported by the Ministry of Agriculture, Forestry and Fisheries of Japan (25093 C) and JSPS KAKENHI (JP23H02262).
Funding (predictive vegetation mapping study): Under the auspices of the National Natural Science Foundation of China (No. 41001363).
Funding (GAM-based industrial process modeling study for hot-rolled strips): Project (51774219) supported by the National Natural Science Foundation of China.
Funding: Supported by the National Natural Science Foundation of China (51774219), the Science and Technology Research Program of Hubei Ministry of Education (D20161103), and the Youth Science and Technology Program of Wuhan (2016070204010099)
Abstract: A model of deformation resistance during hot strip rolling was established based on a generalized additive model. First, a data modeling method based on the generalized additive model was given. It covered the following steps: selecting the dependent and independent variables of the model; choosing the link function of the dependent variable and the smooth functional form of each independent variable; estimating the link function and the smooth functions; and, finally, modifying the model. Then, a practical modeling test was carried out on a large amount of hot rolling process data. An integrated variable was proposed to reflect the effects of different chemical components such as carbon, silicon, manganese, nickel, chromium and niobium. The integrated chemical composition, strain, strain rate and rolling temperature were selected as independent variables, with a cubic spline as the smooth function for each. The deformation resistance model was fitted in SAS software. The influence curves of the independent variables on deformation resistance were obtained by the local scoring algorithm. Some interesting phenomena were found. For example, strain rate has a critical value: deformation resistance increases with strain rate below this value and decreases above it. The results confirm that the new model has higher prediction accuracy than traditional ones and is suitable for carbon steel, microalloyed steel, alloyed steel and other steel grades.
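The local scoring algorithm mentioned above wraps a backfitting loop. With an identity link and a Gaussian response, local scoring reduces to plain backfitting: each smooth function is re-estimated in turn from the partial residuals left by all the others. The sketch below is minimal and rests on two assumptions. It uses a simple bin-mean smoother in place of the cubic splines used in the paper, and it runs on synthetic toy data. It is an illustration of backfitting, not the authors' SAS implementation.

```python
def bin_smoother(x, r, n_bins=5):
    """Piecewise-constant smoother: each partial residual is replaced by the mean of its x-bin."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = -(-len(x) // n_bins)  # ceiling division: points per bin
    fitted = [0.0] * len(x)
    for start in range(0, len(x), size):
        idx = order[start:start + size]
        m = sum(r[i] for i in idx) / len(idx)
        for i in idx:
            fitted[i] = m
    return fitted


def backfit(X, y, n_iter=50, n_bins=5):
    """Fit y ~ alpha + sum_j f_j(X[j]) by backfitting (identity link, Gaussian response)."""
    n, p = len(y), len(X)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and every other component function
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j) for i in range(n)]
            fj = bin_smoother(X[j], partial, n_bins)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre f_j so the intercept stays identifiable
    return alpha, f


# Hypothetical toy data: y = x1^2 + 2*x2, where x2 is a permutation of x1
x1 = [i / 10 for i in range(20)]
x2 = [(7 * i) % 20 / 10 for i in range(20)]
y = [a ** 2 + 2 * b for a, b in zip(x1, x2)]
alpha, f = backfit([x1, x2], y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
print(f"RSS/TSS = {rss / tss:.3f}")
```

With a non-identity link, local scoring would recompute a working response and weights at each outer iteration and backfit against those. The inner loop shown here stays the same.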
Funding: Under the auspices of the Project of National Natural Science Foundation of China (No. 41001363) and the Autonomous Project of State Key Laboratory of Resources and Environmental Information System, Geo-information Tupu Theory and Virtual Geoscience
Abstract: The Yellow River Delta contains typical littoral wetland ecosystems. This study aimed to examine the relationships between Tamarix chinensis and environmental variables and to predict the potential distribution of T. chinensis in the delta. To this end, 641 vegetation samples and 964 soil samples were collected in the area in October of 2004, 2005, 2006 and 2007. The contents of soil organic matter, total phosphorus, salt and soluble potassium were determined. The analyzed data were then interpolated into spatial raster data by Kriging interpolation. The digital elevation model, soil type map and landform unit map of the Yellow River Delta were also collected. Generalized additive models (GAMs) were used to build a species-environment model and then to simulate the potential distribution of T. chinensis. The results indicated that the distribution of T. chinensis was mainly limited by soil salt content, total soil phosphorus content, soluble potassium content, soil type, landform unit and elevation. The distribution probability of T. chinensis was produced with a lookup table generated by the Grasp Module (based on GAMs) in ArcView GIS 3.2. The AUC (area under the curve) values of the ROC (receiver operating characteristic) curve for both validation and cross-validation were higher than 0.8. This suggests that the established model predicts species distribution with high precision.
Funding: Supported by the National Natural Science Foundation of China (61273131), the 111 Project (B12018), the Innovation Project of Graduate in Jiangsu Province (CXZZ12_0741), and the Fundamental Research Funds for the Central Universities (JUDCF12034)
Abstract: Fault monitoring of a bioprocess is important to ensure reactor safety and maintain high product quality. It is difficult to build an accurate mechanistic model of a bioprocess, so fault monitoring based on rich historical or online databases is an effective alternative. Bootstrap resampling draws data sets stochastically from the original data, which improves the generalization capability of the model. In this paper, online fault monitoring using generalized additive models (GAMs) combined with the bootstrap is proposed for the glutamate fermentation process. GAMs and the bootstrap are first used to determine confidence intervals from online and off-line normal data sampled in glutamate fermentation experiments. GAMs are then used for online fault monitoring of time, dissolved oxygen, oxygen uptake rate and carbon dioxide evolution rate. The method provides accurate online fault alarms and yields useful information for removing faults and abnormal phenomena in the fermentation.
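One way to turn the bootstrap and a fitted model into a fault alarm is sketched below. Residuals (observation minus model prediction) from normal batches are resampled many times. The 2.5% and 97.5% quantiles of each resample are then averaged to obtain control limits. This construction, the quantile choice and the simulated residuals are all assumptions made for illustration; the abstract does not specify the exact interval construction the authors used.

```python
import random
import statistics


def quantile(sorted_vals, q):
    """Empirical quantile by linear interpolation on an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_limits(residuals, n_boot=500, q=0.025, seed=0):
    """Bootstrap-averaged lower/upper control limits for normal-operation residuals."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        resample = sorted(rng.choices(residuals, k=len(residuals)))  # sample with replacement
        lows.append(quantile(resample, q))
        highs.append(quantile(resample, 1 - q))
    return statistics.mean(lows), statistics.mean(highs)


def alarm(observed, predicted, limits):
    """Flag a fault when the residual (observed - model prediction) leaves the limits."""
    r = observed - predicted
    return r < limits[0] or r > limits[1]


# Hypothetical residuals from normal batches, e.g. dissolved oxygen minus a GAM prediction
rng = random.Random(1)
normal_residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]
limits = bootstrap_limits(normal_residuals)
print(limits)
print(alarm(55.0, 50.0, limits))  # residual of 5.0 units
```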
Abstract: For nonparametric regression, the GAM procedure is the most versatile of several new procedures. The approach behind it is more flexible than traditional parametric modeling tools. It relaxes the usual assumptions of parametric models and makes it possible to uncover the relationship between the independent variables and an exponential-family dependent variable when that relationship might otherwise not be obvious. In this paper, we discuss two methods of fitting the generalized additive logistic regression model. One is based on the Newton-Raphson method and the other on the iteratively weighted least squares method, each using first- and second-order Taylor series expansions. The GAM procedure, with a specified set of weights and the local scoring algorithm, was applied to real-life data sets, using a cubic spline smoother for the independent variables. Because it builds on nonparametric regression and smoothing techniques, this procedure provides powerful tools for data analysis.
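For the logistic model with its canonical logit link, the Newton-Raphson update and the iteratively reweighted least squares (IRLS) update coincide. Each step fits a weighted least squares regression of the working response z = eta + (y - mu)/w with weights w = mu(1 - mu). The sketch below is restricted to one linear predictor, the parametric step that local scoring repeats around its smoothers. It is a simplified illustration on hypothetical data, not the paper's full additive fit.

```python
import math


def irls_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by IRLS (equivalent to Newton-Raphson for the logit link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate X^T W X (2x2) and X^T W z (2x1) using the working response z
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            z = eta + (yi - mu) / w
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det  # solve the 2x2 weighted normal equations
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical, non-separable binary data (a finite MLE exists)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = irls_logistic(x, y)
print(b0, b1)
```

At convergence, the score equations sum(y - mu) = 0 and sum(x * (y - mu)) = 0 hold. This gives a convenient correctness check.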
Funding: The National Natural Science Foundation of China under contract No. 41506136; the Scientific Research Foundation of Third Institute of Oceanography, SOA under contract No. 2015005
Abstract: In this study, the horizontal and vertical distributions of primary production (PP) and its monthly variations were described based on field data collected in Daya Bay from January to December 2016. The relationships between PP and environmental factors were analyzed using a generalized additive model (GAM). Significant seasonal differences were observed in the horizontal distribution of PP, while the vertical distribution showed a relatively consistent unimodal pattern. The monthly average PP (calculated as carbon) ranged from 48.03 to 390.56 mg/(m²·h), with an annual average of 182.77 mg/(m²·h). The highest PP was observed in May and the lowest in November. The overall seasonal ranking was spring > summer > winter > autumn, and spring PP was approximately three times autumn PP. GAM analysis revealed that temperature, bottom salinity, phytoplankton and photosynthetically active radiation (PAR) had no significant relationships with PP. In contrast, longitude, depth, surface salinity, chlorophyll a (Chl a) and transparency were significantly correlated with PP. Overall, the results indicate that monsoonal changes and terrestrial and offshore water systems strongly affect the environmental factors associated with PP changes.
Abstract: Count data that exhibit overdispersion (the variance of the counts is larger than their mean) are commonly analyzed using discrete distributions such as the negative binomial, the Poisson inverse Gaussian and other models. The Poisson distribution is characterized by equal mean and variance. The negative binomial and the Poisson inverse Gaussian, by contrast, have variance larger than the mean and are therefore more appropriate for modeling overdispersed count data. As an alternative to these two models, we use the generalized Poisson distribution for group comparisons in the presence of multiple covariates. This problem is known as analysis of covariance (ANCOVA) and has been solved for continuous data. Our objectives were to develop ANCOVA using the generalized Poisson distribution and to compare its goodness of fit with that of nonparametric generalized additive models. Using real-life data, we show that the model performs quite satisfactorily compared with the nonparametric generalized additive models.
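The abstract does not state which generalized Poisson parameterization is used. The widely cited Consul form has probability mass P(Y = y) = theta (theta + lam*y)^(y-1) exp(-theta - lam*y) / y!. Its mean is theta/(1 - lam) and its variance is theta/(1 - lam)^3. A positive lam therefore produces exactly the overdispersion discussed above, and lam = 0 recovers the ordinary Poisson. The sketch below assumes that form, with 0 <= lam < 1, and checks the moments numerically.

```python
import math


def gp_pmf(y, theta, lam):
    """Consul's generalized Poisson pmf (theta > 0, 0 <= lam < 1); lam = 0 gives the Poisson."""
    if y < 0:
        return 0.0
    log_p = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
             - theta - lam * y - math.lgamma(y + 1))
    return math.exp(log_p)


theta, lam = 2.0, 0.3
probs = [gp_pmf(k, theta, lam) for k in range(200)]  # tail beyond 200 is negligible here
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
print(total, mean, var, var / mean)  # var/mean = 1/(1 - lam)^2 > 1: overdispersion
```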
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12071176, 12031016, 12171328), the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (Grant No. 2023L012), and the Beijing Natural Science Foundation (Grant No. Z210003).
Abstract: This paper discusses variable selection for interval-censored failure time data, a general type of failure time data that commonly arises in areas such as clinical trials and follow-up studies. Some methods have been developed in the literature for this problem, but most existing procedures apply only to specific models. In this paper, we consider data arising from a general class of partly linear additive generalized odds rate models. For these data, we propose a penalized variable selection approach that maximizes a derived penalized likelihood function. In the method, Bernstein polynomials are used to approximate both the unknown baseline hazard functions and the nonlinear covariate effect functions, and a coordinate descent algorithm is developed to implement the method. The asymptotic properties of the proposed estimators, including the oracle property, are also established. An extensive simulation study was conducted to assess the finite-sample performance of the proposed estimators, and it indicates that they work well in practice. Finally, the proposed method is applied to a set of real data on Alzheimer's disease.
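The Bernstein polynomial approximation mentioned above expresses an unknown function on an interval [lo, hi] as a weighted sum of basis functions. The basis is b_{k,m}(u) = C(m,k) u^k (1 - u)^(m-k), with u rescaled into [0, 1]. The basis functions sum to one, and non-decreasing coefficients give a non-decreasing function. This convenient property is illustrated here for a cumulative baseline hazard. The degree, interval and coefficients below are hypothetical choices for illustration, not values from the paper.

```python
from math import comb


def bernstein_basis(u, degree):
    """Bernstein basis values b_{k,m}(u) = C(m,k) u^k (1-u)^(m-k), k = 0..m, for u in [0, 1]."""
    return [comb(degree, k) * u ** k * (1 - u) ** (degree - k) for k in range(degree + 1)]


def bernstein_approx(coefs, t, lo=0.0, hi=1.0):
    """Evaluate sum_k coefs[k] * b_{k,m}(u) with u = (t - lo)/(hi - lo) and m = len(coefs) - 1."""
    u = (t - lo) / (hi - lo)
    return sum(c * b for c, b in zip(coefs, bernstein_basis(u, len(coefs) - 1)))


# Hypothetical non-decreasing coefficients for a cumulative baseline hazard on [0, 10]
coefs = [0.0, 0.1, 0.3, 0.6, 1.0]
grid = [0.5 * i for i in range(21)]
values = [bernstein_approx(coefs, t, 0.0, 10.0) for t in grid]
print(values[0], values[10], values[-1])
```

In a penalized likelihood fit, the coefficients would be the unknowns estimated by the coordinate descent algorithm. The basis itself stays fixed.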
Funding: Supported by the National Basic Research and Development (973) Program of China (2010CB428402) and the China Meteorological Administration Special Public Welfare Research Fund (GYHY200706001)
Abstract: A probabilistic precipitation forecasting model using generalized additive models (GAMs) and Bayesian model averaging (BMA) is proposed in this paper. GAMs were used to fit spatial-temporal precipitation models to individual ensemble member forecasts. The distributions of precipitation occurrence and cumulative precipitation amount were represented simultaneously by a single Tweedie distribution. BMA was then used as a post-processing method to combine the individual models into a more skillful probabilistic forecasting model. The mixing weights were estimated using the expectation-maximization (EM) algorithm. Residual diagnostics were used to examine whether the fitted BMA forecasting model had fully captured the spatial and temporal variations of precipitation. The proposed method was applied to daily observations in the Yishusi River basin for July 2007, using the National Centers for Environmental Prediction ensemble forecasts. The BMA forecasts were verified with scoring rules and performed better than the empirical probabilistic ensemble forecasts, particularly for extreme precipitation. Finally, possible improvements and the application of this method to the downscaling of climate change scenarios are discussed.
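The EM estimation of BMA mixing weights alternates two steps. The E-step computes each member's responsibility for each observation. The M-step sets each weight to that member's average responsibility. The paper's members are GAMs with Tweedie distributions. For brevity, this minimal sketch assumes instead that each member has a Gaussian predictive density with a fixed, common standard deviation, and it updates only the weights. The forecasts and observations are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bma_weights(forecasts, obs, sd=1.0, n_iter=200):
    """EM estimate of BMA mixing weights; forecasts[k][t] is member k's forecast for case t."""
    K, T = len(forecasts), len(obs)
    w = [1.0 / K] * K
    for _ in range(n_iter):
        totals = [0.0] * K
        for t in range(T):
            # E-step: responsibility of member k for observation t
            dens = [w[k] * normal_pdf(obs[t], forecasts[k][t], sd) for k in range(K)]
            s = sum(dens)
            for k in range(K):
                totals[k] += dens[k] / s
        # M-step: each weight becomes the member's average responsibility
        w = [tk / T for tk in totals]
    return w


def bma_pdf(y, member_forecasts, w, sd=1.0):
    """BMA predictive density: weight-averaged member densities for one forecast case."""
    return sum(wk * normal_pdf(y, fk, sd) for wk, fk in zip(w, member_forecasts))


# Hypothetical observations and two ensemble members (member_b is biased high)
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
member_a = [1.1, 1.9, 3.2, 3.9, 5.1]
member_b = [3.0, 4.1, 5.0, 6.2, 7.0]
w = bma_weights([member_a, member_b], obs)
print(w, bma_pdf(3.0, [3.2, 5.0], w))
```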
Funding: This work was supported by the Ministry of Higher Education, Malaysia under the Fundamental Research Grant Scheme FRGS/1/2020/STG06/UTM/02/3 (5F311), the Research University Grant with vote no. QJ130000.3854.19J58, and the Zamalah UTM Scholarship under Universiti Teknologi Malaysia.
Abstract: Climate change is one of the critical determinants of the life cycles and transmission of most infectious agents, including malaria, cholera, dengue fever, hand, foot, and mouth disease (HFMD), and the recent coronavirus pandemic. Since the late 1990s, HFMD has been associated with a growing number of outbreaks that result in fatal complications. The outbreaks may result from a combination of rapid population growth, climate change, socioeconomic changes and other lifestyle changes. However, how to model the link between climate variability and HFMD remains unclear, particularly in terms of statistical theory. The statistical relationship between HFMD and climate factors has been widely studied using generalized linear and additive models. For time-series data with clustered variables, such as HFMD cases clustered by state, the independence assumption of both modeling approaches may be violated. Thus, a generalized additive mixed model (GAMM) is used to investigate the relationship between HFMD and climate factors in Malaysia. The model is improved by adding a first-order autoregressive term and treating the Malaysian states as a random effect. This method is preferred because it models the states as random effects and accounts for autocorrelation in the time-series data. The findings indicate that climate variables such as rainfall and wind speed affect HFMD cases in Malaysia. The risk of HFMD increased over the subsequent two weeks when rainfall was below 60 mm and decreased when rainfall exceeded 60 mm. In addition, wind speeds between 2 and 5 m/s at a two-week lag reduced the risk of HFMD. The results also show that HFMD cases in Malaysia rose during the inter-monsoon and southwest monsoon seasons but fell during the northeast monsoon. Public health officials and the general public can use these outcomes to raise awareness and thus implement effective preventive measures.
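The first-order autoregressive term above assumes that successive residuals within a series follow e_t = phi * e_{t-1} + eps_t. The sketch below simulates such a series and estimates phi by the lag-1 autocorrelation. It then shows that the quasi-differenced residuals e_t - phi * e_{t-1} are close to uncorrelated, which is the property an AR(1) error structure relies on. It uses a single simulated series with a hypothetical phi of 0.6, not the Malaysian HFMD data or the full GAMM.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate e_t = phi * e_{t-1} + eps_t with eps_t ~ N(0, sigma^2), started at stationarity."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, sigma / (1 - phi ** 2) ** 0.5)]
    for _ in range(n - 1):
        e.append(phi * e[-1] + rng.gauss(0.0, sigma))
    return e


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


resid = simulate_ar1(0.6, 2000, seed=42)
phi_hat = lag1_autocorrelation(resid)
whitened = [resid[t] - phi_hat * resid[t - 1] for t in range(1, len(resid))]
print(round(phi_hat, 3), round(lag1_autocorrelation(whitened), 3))
```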
Abstract: Generalized additive partial linear models (GAPLM) have been widely used for flexible modeling of various types of response. In practice, missing data often occur in studies of economics, medicine and public health. We address the problem of identifying and estimating a GAPLM when the response variable is nonignorably missing. Three types of monotone missing-data mechanism are assumed: the logistic model, the probit model and the complementary log-log model. In this situation, the likelihood based on the observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions. We then construct estimators of the unknown parameters and unknown functions with a likelihood-based approach that expands each unknown function as a linear combination of polynomial spline functions. We establish asymptotic normality for the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
Abstract: This paper covers predicting high-resolution electricity peak demand features from lower-resolution data. The setup is relevant because it shows whether limited high-resolution monitoring helps to estimate future high-resolution peak loads once the high-resolution data are no longer available. That question is particularly interesting for network operators who are considering replacing high-resolution monitoring with predictive models for economic reasons. We propose models that predict half-hourly minima and maxima of high-resolution (one-minute) electricity load data from lower-resolution (30 min) inputs. We combine predictions of generalized additive models (GAM) and deep artificial neural networks (DNN), both of which are popular in load forecasting. We extensively analyze the prediction models, including the importance of the input parameters, with a focus on load, weather and seasonal effects. The proposed method won a data competition organized by Western Power Distribution, a British distribution network operator. In addition, we provide a rigorous evaluation study that goes beyond the competition frame to analyze the models' robustness. The results show that the proposed methods outperform the competition benchmark in out-of-sample root mean squared error (RMSE). This holds both for the competition month and for the supplementary evaluation study, which covers an additional eleven months. Overall, our proposed model combination reduces the out-of-sample RMSE by 57.4% compared with the benchmark.
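A relative RMSE reduction such as the 57.4% reported above is 100 * (1 - RMSE_model / RMSE_benchmark). The sketch shows the metric and that ratio. The load values and RMSE figures are hypothetical, chosen only so that the ratio reproduces a 57.4% reduction; they are not the competition data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_reduction(rmse_model, rmse_benchmark):
    """Relative RMSE reduction in percent versus a benchmark."""
    return 100.0 * (1.0 - rmse_model / rmse_benchmark)


# Hypothetical half-hourly peak loads (kW) and predictions
observed = [120.0, 135.0, 150.0]
forecast = [118.0, 139.0, 149.0]
print(rmse(observed, forecast))
print(round(rmse_reduction(4.26, 10.0), 1))  # 57.4
```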
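Abstract: Stock assessment of albacore tuna (Thunnus alalunga) in the southeastern Pacific has long relied on catch data from Japan, Korea and other countries, which introduces bias. To address this problem, an abundance index was constructed using China's longline fishery data as the base data. The analysis used catch data from China's tuna longline fishery over the 10 years from 2013 to 2022, together with marine environmental parameters. Generalized additive models (GAM) were used to standardize catch per unit effort (CPUE) and to quantify the effects of latitude, longitude, year, month, environmental factors and their interactions. An ordinary least squares (OLS) regression model was then used to compare the trends in standardized CPUE between the Chinese and Japanese longline fisheries. The results showed that the maximum deviance explained by the GAM was 69.8%, and latitude contributed most significantly to CPUE. Abundance was highest in the area 20°S–30°S, 100°W–120°W. Stock density peaked in 2016 and in the months April–August. Standardized CPUE broadly followed the trend of nominal CPUE and showed pronounced seasonal fluctuations. Except in 2020, standardized CPUE was lower than nominal CPUE. In most years, the standardized CPUE based on Chinese fishery data followed a trend similar to that of the standardized CPUE based on Japanese longline fishery data. This study provides new abundance index information for the stock assessment of southeastern Pacific albacore. It offers a useful reference for further improving the reliability of stock assessments.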
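The 69.8% figure above is the deviance explained: 1 minus the ratio of the residual deviance to the null deviance. The sketch below computes it for a Gaussian model of log CPUE, where deviance is a sum of squares, and also computes nominal CPUE. Several elements are assumptions: the "per 1,000 hooks" unit, the Gaussian log-CPUE formulation and all numbers are illustrative. The abstract does not state the authors' error distribution or effort unit.

```python
import math


def nominal_cpue(catch, hooks, per=1000):
    """Nominal CPUE: catch per `per` hooks (the effort unit is an assumption)."""
    return catch / hooks * per


def deviance_explained(y, fitted):
    """Gaussian deviance explained in percent: 100 * (1 - residual deviance / null deviance)."""
    mean_y = sum(y) / len(y)
    null_dev = sum((v - mean_y) ** 2 for v in y)
    resid_dev = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 100.0 * (1.0 - resid_dev / null_dev)


# Hypothetical sets: catch (fish) and effort (hooks), modeled on the log scale
catches = [12, 30, 8, 45]
hooks = [3000, 3000, 2500, 3500]
log_cpue = [math.log(nominal_cpue(c, h)) for c, h in zip(catches, hooks)]
gam_fitted = [1.5, 2.2, 1.3, 2.5]  # stand-in for fitted values from a CPUE standardization GAM
print(round(deviance_explained(log_cpue, gam_fitted), 1))
```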
Funding: The National Key R&D Program of China under contract No. 2017YFE0104400; the National Natural Science Foundation of China under contract No. 31772852; the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) under contract No. 2018SDKJ0501-2.
Abstract: Habitat suitability index (HSI) models have been widely used to analyze the relationship between species abundance and environmental factors and, ultimately, to inform the management of marine species. Species abundance responds differently to each environmental variable, and habitat requirements may change across life-history stages and seasons. Therefore, it is necessary to determine the optimal combination of environmental variables in HSI modeling. In this study, generalized additive models (GAMs) were used to determine which environmental variables should be included in the HSI models. Significant variables were retained and weighted in the HSI model according to their relative contribution (%) to the total deviance explained by the boosted regression tree (BRT). The HSI models were applied to evaluate the habitat suitability of the mantis shrimp Oratosquilla oratoria in Haizhou Bay and adjacent areas in 2011 and 2013–2017. Ontogenetic and seasonal variations in the HSI models of mantis shrimp were also examined. Four models were compared: the non-optimized model, the BRT-informed HSI model, the GAM-informed HSI model, and the model informed by both BRT and GAM. Of these, the model informed by both BRT and GAM performed best. Four environmental variables (bottom temperature, depth, distance offshore and sediment type) were selected in the HSI models for four groups of mantis shrimp (spring-juvenile, spring-adult, fall-juvenile and fall-adult). The distribution of habitat suitability showed similar patterns for juveniles and adults, but clear seasonal variations were observed. This study suggests that optimizing the environmental variables in HSI models improves their performance. This optimization strategy could be extended to other marine organisms to improve understanding of the habitat suitability of target species.
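The abstract says that the retained variables are weighted by their relative BRT contributions but does not give the combining formula. One common choice is a weighted arithmetic mean of the per-variable suitability indices (SI), with the weights rescaled to sum to one. The sketch assumes that form. The SI values and contribution percentages are hypothetical.

```python
def weighted_hsi(si_values, contributions):
    """Weighted arithmetic-mean HSI; weights are relative contributions rescaled to sum to 1."""
    total = sum(contributions[v] for v in si_values)
    return sum(si_values[v] * contributions[v] / total for v in si_values)


# Hypothetical suitability indices (0-1) for one grid cell and BRT relative contributions (%)
si = {"bottom_temperature": 0.8, "depth": 0.6, "distance_offshore": 0.4, "sediment_type": 1.0}
contrib = {"bottom_temperature": 40.0, "depth": 30.0, "distance_offshore": 20.0, "sediment_type": 10.0}
print(round(weighted_hsi(si, contrib), 2))  # 0.68
```

Rescaling the weights keeps the HSI on the same 0-1 scale as the individual SI values, even when only a subset of variables is retained for a given group.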