There has been a significant advancement in the application of statistical tools in plant pathology during the past four decades. These tools include multivariate analysis of disease dynamics involving principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, genetic diversity analysis, and stability analysis, which involve joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. Advanced statistical tools, such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory, take an important place in the analysis of disease dynamics. Disease forecasting methods based on simulation models have great potential for practical disease control strategies. Common mathematical tools such as the monomolecular, exponential, logistic, Gompertz and linked differential equations take an important place in growth curve analysis of disease epidemics. The construction of box and whisker plots has been suggested as a highly informative means of displaying a range of numerical data. The probable applications of recent advanced tools of linear and non-linear mixed models, such as the linear mixed model, generalized linear model, and generalized linear mixed models, have been presented. The most recent technologies such as microarray analysis, though cost effective, provide estimates of gene expression for thousands of genes simultaneously and need attention from molecular biologists. Some of these advanced tools can be well applied in different branches of rice research, including crop improvement, crop production, crop protection, social sciences, and agricultural engineering. Rice research scientists should take full advantage of these new opportunities and adopt these highly promising technologies when planning experimental designs, collecting data, and analyzing and interpreting their research data sets.
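As a concrete illustration of the growth-curve models named above, the following Python sketch fits logistic and Gompertz disease-progress curves to a small hypothetical severity series with SciPy; the observations, starting values, and parameter names are illustrative assumptions, not data from any study.

```python
# Minimal sketch (not from the paper): fitting logistic and Gompertz
# disease-progress curves to hypothetical severity data with SciPy.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, b, r):
    # logistic curve y = 1 / (1 + b * exp(-r t)), with b = (1 - y0) / y0
    return 1.0 / (1.0 + b * np.exp(-r * t))

def gompertz(t, b, r):
    # Gompertz curve y = exp(-b * exp(-r t)), with b = -ln(y0)
    return np.exp(-b * np.exp(-r * t))

t = np.array([0, 7, 14, 21, 28, 35], dtype=float)   # days after disease onset
y = np.array([0.02, 0.06, 0.18, 0.45, 0.72, 0.88])  # observed disease proportion

for name, model, p0 in [("logistic", logistic, (50.0, 0.2)),
                        ("gompertz", gompertz, (4.0, 0.1))]:
    params, _ = curve_fit(model, t, y, p0=p0)
    rss = float(np.sum((y - model(t, *params)) ** 2))
    print(f"{name}: b = {params[0]:.2f}, r = {params[1]:.3f}, RSS = {rss:.4f}")
```

Comparing the residual sums of squares (or an information criterion) of the fitted candidates is a common way to decide which curve best describes a given epidemic.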
In the new era, the impact of emerging productive forces has permeated every sector of industry. As the core production factor of these forces, data plays a pivotal role in industrial transformation and social development. Consequently, many domestic universities have introduced majors or courses related to big data. Among these, the Big Data Management and Applications major stands out for its interdisciplinary approach and emphasis on practical skills. However, as an emerging field, it has not yet accumulated a robust foundation in teaching theory and practice. Current instructional practices face issues such as unclear training objectives, inconsistent teaching methods and course content, insufficient integration of practical components, and a shortage of qualified faculty, all of which hinder both the development of the major and the overall quality of education. Taking the statistics course within the Big Data Management and Applications major as an example, this paper examines the challenges faced by statistics education in the context of emerging productive forces and proposes corresponding improvement measures. By introducing innovative teaching concepts and strategies, the teaching system for professional courses is optimized, and authentic classroom scenarios are recreated through illustrative examples. Questionnaire surveys and statistical analyses of data collected before and after the teaching reforms indicate that the curriculum changes effectively enhance instructional outcomes, promote the development of the major, and improve the quality of talent cultivation.
First, statistics on the operational lines and mileage of urban rail transit in China are compiled. The results show that, as of Dec. 31, 2025, there were 60 cities with urban rail transit in operation nationwide, with a total operational mileage of approximately 12837.8 km (excluding the electronic guideway rubber-tired system, there were 57 cities, with a total operational mileage of 12651.6 km). The metro system dominates, while low-capacity systems exhibit a multi-modal development pattern. Subsequently, the characteristics of China's urban rail transit industry development are analyzed, indicating that: (1) the industry should closely align with the theme of urban intensive development, promote quality improvement and efficiency enhancement of existing lines, and focus on the supporting role of initial passenger flow for new line construction, multi-network integration, and economic and financial sustainability; (2) significant innovative achievements have been made in safety resilience, green and low-carbon development, intelligent construction, and digital transformation. Finally, development recommendations for the "15th Five-Year Plan" period are proposed: promoting cost reduction and efficiency improvement in the rail transit industry, enhancing the operational efficiency of existing networks, continuously exploring railway services for urban commuting, strengthening external exchanges, and driving the "going global" strategy of the urban rail transit industry.
The application of frequency distribution statistics to data provides objective means to assess the nature of the data distribution and the viability of numerical models that are used to visualize and interpret data. Two commonly used tools are kernel density estimation and the reduced chi-squared statistic used in combination with a weighted mean. Due to the wide applicability of these tools, we present a Java-based computer application called KDX to facilitate the visualization of data and the utilization of these numerical tools.
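KDX itself is a Java application, but the two numerical tools it exposes are easy to sketch. The following Python snippet, with hypothetical measurements and 1-sigma uncertainties, computes a weighted mean, the associated reduced chi-squared, and a Gaussian kernel density estimate for visualization; all values and variable names are illustrative.

```python
# Illustrative sketch (not the KDX application itself): weighted mean,
# reduced chi-squared, and a Gaussian kernel density estimate.
import numpy as np
from scipy.stats import gaussian_kde

ages = np.array([100.2, 101.1, 99.8, 100.5, 102.0])  # hypothetical measurements
errs = np.array([0.4, 0.5, 0.3, 0.6, 0.5])            # 1-sigma uncertainties

w = 1.0 / errs**2
wmean = np.sum(w * ages) / np.sum(w)
wmean_err = np.sqrt(1.0 / np.sum(w))

# Reduced chi-squared ~ 1 indicates scatter consistent with the stated errors.
red_chi2 = np.sum(((ages - wmean) / errs) ** 2) / (len(ages) - 1)

kde = gaussian_kde(ages)                      # KDE for visualizing the distribution
grid = np.linspace(ages.min() - 2, ages.max() + 2, 200)
density = kde(grid)

print(f"weighted mean = {wmean:.2f} +/- {wmean_err:.2f}, reduced chi2 = {red_chi2:.2f}")
print(f"KDE peak near {grid[np.argmax(density)]:.2f}")
```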
To reduce the enormous pressure that falsified sewage monitoring data place on environmental monitoring work, the Grubbs method, box plots, the t-test and other methods are used for in-depth analysis of the data, providing a convenient and simple technological workflow for screening sewage monitoring data.
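The abstract names the screening tools but not the exact procedure, so the sketch below should be read as an assumed workflow: Grubbs' test for a single extreme value and Tukey box-plot fences applied to one hypothetical monitoring series in Python.

```python
# Assumed screening sketch (not the paper's exact procedure):
# Grubbs' test and box-plot fences on one hypothetical monitoring series.
import numpy as np
from scipy import stats

x = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 18.9])  # hypothetical COD values

# Grubbs' statistic for the most extreme value (two-sided, alpha = 0.05)
n = len(x)
G = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
t_crit = stats.t.ppf(1 - 0.05 / (2 * n), n - 2)
G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
print(f"G = {G:.2f}, critical = {G_crit:.2f}, outlier suspected: {G > G_crit}")

# Tukey box-plot fences
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print("values outside fences:", x[(x < fences[0]) | (x > fences[1])])
```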
Due to the complex nature of multi-source geological data, it is difficult to rebuild every geological structure through a single 3D modeling method. The multi-source data interpretation method put forward in this analysis is based on a database-driven pattern and focuses on the discrete and irregular features of geological data. The geological data from a variety of sources, covering a range of accuracy, resolution, quantity and quality, are classified and integrated according to their reliability and consistency for 3D modeling. A new interpolation-approximation fitting algorithm for constructing geological surfaces with the non-uniform rational B-spline (NURBS) technique is then presented. The NURBS technique can retain the balance among the requirements for accuracy, surface continuity and data storage of geological structures. Finally, four alternative 3D modeling approaches are demonstrated with reference to some examples, which are selected according to the data quantity and accuracy specification. The proposed approaches offer flexible modeling patterns for different practical engineering demands.
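The paper's interpolation-approximation NURBS algorithm is not reproduced here; as a simplified stand-in, the sketch below fits a plain smoothing B-spline surface (SciPy's bisplrep/bisplev) to hypothetical scattered horizon picks. A true NURBS surface additionally carries rational weights on its control points, which this sketch omits.

```python
# Simplified stand-in for the surface-fitting idea: a smoothing B-spline
# surface fitted to hypothetical, scattered geological horizon picks.
import numpy as np
from scipy.interpolate import bisplrep, bisplev

rng = np.random.default_rng(3)
x = rng.uniform(0, 1000, 200)                              # easting (m)
y = rng.uniform(0, 1000, 200)                              # northing (m)
z = -300 + 0.05 * x - 0.03 * y + rng.normal(0, 2, 200)     # horizon depth picks (m)

tck = bisplrep(x, y, z, s=4.0 * len(z))   # smoothing factor sized to the noise level
grid_x = np.linspace(0, 1000, 50)
grid_y = np.linspace(0, 1000, 50)
surface = bisplev(grid_x, grid_y, tck)    # surface evaluated on a regular grid
print("fitted surface shape:", surface.shape)  # (50, 50)
```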
To detect faults accurately and quickly, cusp catastrophe theory is used to interpret 3D coal seismic data in this paper. By establishing a cusp model, the seismic signal is transformed into the standard form of the cusp catastrophe, and catastrophe parameters, including time-domain catastrophe potential, time-domain catastrophe time, frequency-domain catastrophe potential and frequency-domain degree, are calculated. Catastrophe theory is applied to 3D seismic structural interpretation in coal mines. The results show that the position of an abnormality in the catastrophe parameter profile or curve is related to the location of a fault, and that cusp catastrophe theory is effective for automatically extracting geological information and improving interpretation precision in 3D seismic data.
Traffic tunnels include tunnel works for traffic and transport in the areas of railway, highway, and rail transit. With many mountains and nearly one fifth of the global population, China possesses numerous large cities and megalopolises with rapidly growing economies and huge traffic demands. As a result, a great deal of railway, highway, and rail transit facilities are required in this country. In the past, the construction of these facilities mainly involved subgrade and bridge works; in recent years...
Air quality monitoring is effective for timely understanding of the current air quality status of a region or city. Currently, the huge volume of environmental monitoring data, which has reasonable real-time performance, provides strong support for in-depth analysis of air pollution characteristics and causes. However, in the era of big data, to meet current demands for fine management of the atmospheric environment, it is important to explore the characteristics and causes of air pollution from multiple aspects for comprehensive and scientific evaluation of air quality. This study reviewed and summarized air quality evaluation methods on the basis of environmental monitoring data statistics during the 13th Five-Year Plan period, and evaluated the level of air pollution in the Beijing-Tianjin-Hebei region and its surrounding areas (i.e., the "2+26" region) during the period of the three-year action plan to fight air pollution. We suggest that air quality should be comprehensively, deeply, and scientifically evaluated from the aspects of air pollution characteristics, causes, and influences of meteorological conditions and anthropogenic emissions. It is also suggested that a three-year moving average be introduced as one of the evaluation indexes of long-term change of pollutants. Additionally, both temporal and spatial differences should be considered when removing confounding meteorological factors.
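The suggested three-year moving average index is straightforward to compute; the sketch below applies it to a short, hypothetical series of annual PM2.5 means (values invented for illustration).

```python
# Minimal sketch of a three-year moving average as a long-term evaluation index;
# the annual values are hypothetical, not observed concentrations.
import pandas as pd

annual = pd.Series(
    [78, 72, 65, 58, 55, 51],
    index=[2015, 2016, 2017, 2018, 2019, 2020],
    name="PM2.5 annual mean (ug/m3)",
)
rolling3 = annual.rolling(window=3).mean()   # three-year moving average
print(pd.DataFrame({"annual": annual, "3-yr moving avg": rolling3}))
```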
Predicting the seeing of astronomical observations can provide hints of the quality of optical imaging in the near future, and facilitate flexible scheduling of observation tasks to maximize the use of astronomical observatories. Traditional approaches to seeing prediction mostly rely on regional weather models to capture the in-dome optical turbulence patterns. Thanks to the development of data gathering and aggregation facilities at astronomical observatories in recent years, data-driven approaches are becoming increasingly feasible and attractive for predicting astronomical seeing. This paper systematically investigates data-driven approaches to seeing prediction by leveraging various big data techniques, from traditional statistical modeling and machine learning to emerging deep learning methods, on the monitoring data of the Large sky Area Multi-Object fiber Spectroscopic Telescope (LAMOST). The raw monitoring data are preprocessed to allow for big data modeling. Then we formulate the seeing prediction task under each type of modeling framework and develop seeing prediction models using representative big data techniques, including ARIMA and Prophet for statistical modeling, MLP and XGBoost for machine learning, and LSTM, GRU and Transformer for deep learning. We perform empirical studies on the developed models with a variety of feature configurations, yielding notable insights into the applicability of big data techniques to the seeing prediction task.
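As an example of the statistical-modeling branch of this toolbox, the sketch below fits an ARIMA model to a synthetic seeing series and scores a 24-step-ahead forecast; the series, ARIMA order, and horizon are illustrative choices, not the paper's configuration or LAMOST data.

```python
# Hedged sketch: an ARIMA baseline on a synthetic seeing series
# (values and model order are illustrative assumptions).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 500
# synthetic seeing (arcsec) with a slow periodic component plus noise
seeing = 1.2 + 0.3 * np.sin(np.arange(n) * 2 * np.pi / 144) + rng.normal(0, 0.05, n)

train, test = seeing[:-24], seeing[-24:]
fit = ARIMA(train, order=(2, 0, 1)).fit()
forecast = fit.forecast(steps=24)
mae = np.mean(np.abs(forecast - test))
print(f"24-step-ahead MAE: {mae:.3f} arcsec")
```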
Atmospheric chemistry models usually perform badly in forecasting wintertime air pollution because of their uncertainties. Generally, such uncertainties can be decreased effectively by techniques such as data assimilation (DA) and model output statistics (MOS). However, the relative importance and combined effects of the two techniques have not been clarified. Here, a one-month air quality forecast with the Weather Research and Forecasting-Chemistry (WRF-Chem) model was carried out in a virtually operational setup focusing on Hebei Province, China. Meanwhile, three-dimensional variational (3DVar) DA and MOS based on one-dimensional Kalman filtering were implemented separately and simultaneously to investigate their performance in improving the model forecast. Comparison with observations shows that the chemistry forecast with MOS outperforms that with 3DVar DA, which could be seen in all the species tested over the whole 72 forecast hours. Combined use of both techniques does not guarantee a better forecast than MOS only, with the improvements and degradations being small and appearing rather randomly. Results indicate that the implementation of MOS is more suitable than 3DVar DA in improving the operational forecasting ability of WRF-Chem.
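The paper's exact Kalman-filter MOS formulation is not given in the abstract; the sketch below shows one common reading of the idea, in which the systematic forecast bias is treated as a scalar hidden state and updated recursively from verified forecast errors. The noise variances and the synthetic data are assumptions.

```python
# Rough sketch of MOS via a one-dimensional Kalman filter: the systematic
# forecast bias is the hidden state, updated after each verification.
import numpy as np

def kalman_bias_correction(forecasts, observations, q=0.05, r=1.0):
    """Return bias-corrected forecasts; q and r are illustrative noise variances."""
    bias, p = 0.0, 1.0                    # initial bias estimate and its variance
    corrected = np.empty_like(forecasts)
    for i, (f, o) in enumerate(zip(forecasts, observations)):
        corrected[i] = f - bias           # apply the current bias estimate
        p += q                            # prediction step for the bias state
        k = p / (p + r)                   # Kalman gain
        bias += k * ((f - o) - bias)      # update with the newly verified error
        p *= (1.0 - k)
    return corrected

rng = np.random.default_rng(1)
obs = 60 + 20 * rng.random(72)                    # hypothetical hourly PM2.5 observations
fcst = obs + 15 + rng.normal(0, 5, 72)            # model output with a +15 systematic bias
print("raw MAE:", np.mean(np.abs(fcst - obs)).round(2),
      "corrected MAE:", np.mean(np.abs(kalman_bias_correction(fcst, obs) - obs)).round(2))
```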
A statistical study is first performed of the efficiency of the technique of statistical interpretation using the products of NWP. The result shows that the application of the technique has improved the predictability of predictors in objective forecasting of tropical cyclone motion, increased the forecasting skill of models and extended the valid period of forecasts. Some technical problems in applying the technique to motion forecasting are then discussed, suggesting that: a large sample of data and the perfect forecast method be employed in constructing objective forecast models for tropical cyclone motion; predictors be included that are built finely enough to reflect all synoptic features and physical quantity fields; and NWP products available at multiple times be used and corrected. Composite use of the outcomes from various forecasting models is one of the effective ways to improve the skill and stability of the forecast.
The statistical map is usually used to indicate the quantitative features of various socio-economic phenomena among regions on the base map of administrative divisions or on other base maps connected with statistical units. Making use of geographic information system (GIS) techniques, and supported by AutoCAD software, the author of this paper puts forward a practical method for making statistical maps and has developed software (SMT) for the production of small-scale statistical maps using the C language.
The development of adaptation measures to climate change relies on data from climate models or impact models. In order to analyze these large data sets, or an ensemble of these data sets, the use of statistical methods is required. In this paper, the methodological approach to collecting, structuring and publishing the methods which have been used or developed by former or present adaptation initiatives is described. The intention is to communicate achieved knowledge and thus support future users. A key component is the participation of users in the development process. The main elements of the approach are standardized, template-based descriptions of the methods, including the specific applications, references, and method assessment. All contributions have been quality checked, sorted, and placed in a larger context. The result is a report on statistical methods which is freely available in printed and online versions. Examples of how to use the methods are presented in this paper and are also included in the brochure.
A novel approach to detect and filter out an unhealthy dataset from a matrix of datasets is developed, tested, and proven. The technique employs a new type of self-organizing map called the Accumulative Statistical Spread Map (ASSM) to establish the destructive and negative effect a dataset will have on the rest of the matrix if it stays within that matrix. The ASSM is supported by training a neural network engine, which determines which dataset is responsible for the system's inability to learn, classify and predict. The experiments carried out proved that a neural system was not able to learn in the presence of such an unhealthy dataset that possessed some deviated characteristics, even though it was produced under the same conditions and through the same process as the rest of the datasets in the matrix; hence, it should be disqualified and either removed completely or transferred to another matrix. Such a novel approach is very useful in pattern recognition of datasets and features that do not belong to their source, and could be used as an effective tool to detect suspicious activities in many areas of secure filing, communication and data storage.
Geomechanical data are never sufficient in quantity or adequately precise and accurate for design purposes in mining and civil engineering. The objective of this paper is to show the variability of rock properties at the sampled point in the roadway's roof, and then how the statistical processing of the available geomechanical data can affect the results of numerical modelling of the roadway's stability. Four cases were applied in the numerical analysis, using the average value (the most common choice in geomechanical data analysis), the average minus standard deviation, the median, and the average value minus statistical error. The study shows that different approaches to the same geomechanical data set can change the modelling results considerably. The average minus standard deviation is the most conservative and least risky case: it gives a displacement and yielded-element zone four times broader than the average-values scenario, which is the least conservative option. The two other cases need to be studied further; the results obtained from them fall between the most favorable and most adverse values. Taking the average values corrected by statistical error for the numerical analysis seems to be the best solution. Moreover, the confidence level can be adjusted depending on the importance of the structure and the assumed risk level.
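The four input choices compared in the paper can be computed directly. In the sketch below, "statistical error" is assumed to mean the half-width of a 95% confidence interval of the mean, which is an interpretation rather than the paper's stated definition, and the rock-strength values are hypothetical.

```python
# Small sketch of the four input-parameter choices, computed for a
# hypothetical set of roof uniaxial compressive strength measurements (MPa).
import numpy as np
from scipy import stats

ucs = np.array([38.0, 42.5, 35.1, 47.8, 40.2, 33.6, 44.9, 39.4])

mean = ucs.mean()
std = ucs.std(ddof=1)
median = np.median(ucs)
# "statistical error" assumed here to be the 95% CI half-width of the mean.
sem = stats.sem(ucs)
err95 = stats.t.ppf(0.975, len(ucs) - 1) * sem

print(f"average:                 {mean:.2f}")
print(f"average - std deviation: {mean - std:.2f}")
print(f"median:                  {median:.2f}")
print(f"average - stat. error:   {mean - err95:.2f}")
```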
Statistics are more crucial than ever due to the accessibility of huge amounts of data from several domains such as finance, medicine, science, engineering, and so on. Statistical data mining (SDM) is an interdisciplinary domain that examines huge existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in the fact that the data often were not primarily gathered according to an experimental strategy but for other purposes. Thus, this paper introduces an effective statistical data mining model for intelligent rainfall prediction using slime mould optimization with deep learning (SDMIRP-SMODL). In the presented SDMIRP-SMODL model, the feature subset selection process is performed by the SMO algorithm, which in turn minimizes the computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) technique is exploited. Finally, this study involves the pelican optimization algorithm (POA) as a hyperparameter optimizer. The experimental evaluation of the SDMIRP-SMODL approach is tested utilizing a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative outcomes reported the supremacy of the SDMIRP-SMODL model compared to existing techniques.
Omics data provide an essential means for molecular biology and systems biology to capture the systematic properties of the inner activities of cells. One of the greatest challenges biological researchers have faced is finding methods for discovering biomarkers for tracking the progress of diseases such as cancer, so feature selection methods have been widely used to address the biomarker discovery problem. However, omics data usually contain a large number of features but a small number of samples, and some omics data have a large range distribution, which makes it difficult for feature selection methods to deal with omics data. In order to overcome these problems, we present a computing method called the localized statistic of abundance distribution based on a Gaussian window (LSADBGW) to test the significance of a feature. The experiments on three datasets, including gene and protein datasets, showed the accuracy and efficiency of LSADBGW for feature selection.
Extracting and parameterizing ionospheric waves globally and statistically is a longstanding problem. Based on the multichannel maximum entropy method (MMEM) used for studying ionospheric waves in previous work, we calculate the parameters of ionospheric waves by applying the MMEM to numerous temporally approximate and spatially close global-positioning-system radio occultation total electron content profile triples provided by the unique clustered-satellite flight between the years 2006 and 2007, right after the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) mission launch. The results show that the amplitude of ionospheric waves increases at the low and high latitudes (~0.15 TECU) and decreases in the mid-latitudes (~0.05 TECU). The vertical wavelength of the ionospheric waves increases in the mid-latitudes (e.g., ~50 km at altitudes of 200–250 km) and decreases at the low and high latitudes (e.g., ~35 km at altitudes of 200–250 km). The horizontal wavelength shows a similar result (e.g., ~1400 km in the mid-latitudes and ~800 km at the low and high latitudes).
We develop various statistical methods important for multidimensional genetic data analysis. Theorems justifying the application of these methods are established. We concentrate on multifactor dimensionality reduction, logic regression, random forests, and stochastic gradient boosting, along with their new modifications. We use complementary approaches to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors are examined. To perform the data analysis concerning coronary heart disease and myocardial infarction, the Lomonosov Moscow State University supercomputer "Chebyshev" was employed.
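As a toy illustration of one of the listed approaches, the sketch below trains a random forest on simulated SNP genotypes plus a single non-genetic factor with an interaction-driven synthetic signal; it is not the study's data, model configuration, or supercomputer workflow.

```python
# Toy sketch: random forest over simulated SNP genotypes and one
# non-genetic risk factor; all data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
snps = rng.integers(0, 3, size=(n, 10))           # 10 SNPs coded 0/1/2
age = rng.normal(55, 8, size=(n, 1))              # one non-genetic factor
X = np.hstack([snps, age])

# disease risk driven by an interaction of two SNPs plus age (synthetic signal)
risk = 0.4 * (snps[:, 0] * snps[:, 1]) + 0.03 * (age[:, 0] - 55)
y = (risk + rng.normal(0, 0.5, n) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean().round(3))
```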