In the new era, the impact of emerging productive forces has permeated every sector of industry. As the core production factor of these forces, data plays a pivotal role in industrial transformation and social development. Consequently, many domestic universities have introduced majors or courses related to big data. Among these, the Big Data Management and Applications major stands out for its interdisciplinary approach and emphasis on practical skills. However, as an emerging field, it has not yet accumulated a robust foundation in teaching theory and practice. Current instructional practice faces issues such as unclear training objectives, inconsistent teaching methods and course content, insufficient integration of practical components, and a shortage of qualified faculty, all of which hinder both the development of the major and the overall quality of education. Taking the statistics course within the Big Data Management and Applications major as an example, this paper examines the challenges faced by statistics education in the context of emerging productive forces and proposes corresponding improvement measures. By introducing innovative teaching concepts and strategies, the teaching system for professional courses is optimized, and authentic classroom scenarios are recreated through illustrative examples. Questionnaire surveys and statistical analyses of data collected before and after the teaching reforms indicate that the curriculum changes effectively enhance instructional outcomes, promote the development of the major, and improve the quality of talent cultivation.
To reduce the enormous pressure that false sewage monitoring data place on environmental monitoring work, the Grubbs method, box plots, the t test, and other methods are used to analyze the data in depth, yielding a convenient and simple technological process for screening sewage monitoring data.
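As a rough illustration of the kind of screening described above, the sketch below flags suspect readings with Tukey box-plot fences and computes the Grubbs statistic. The COD-style readings and fence factor are hypothetical; a real Grubbs test would compare G against a critical value from the t distribution for the given sample size and significance level.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Tukey box-plot fences: points outside [Q1 - k*IQR, Q3 + k*IQR] are suspect."""
    q = statistics.quantiles(values, n=4, method="inclusive")
    q1, q3 = q[0], q[2]
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def grubbs_statistic(values):
    """Grubbs' G = max |x - mean| / s; compare against a tabulated critical
    value chosen for the sample size and significance level."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return max(abs(x - mean) for x in values) / s

# hypothetical COD readings; 30.5 looks like a false record
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 30.5]
lo, hi = iqr_fences(readings)
suspect = [x for x in readings if x < lo or x > hi]
G = grubbs_statistic(readings)
```

For n = 6 at the 5% level the Grubbs critical value is about 1.89, so the computed G (about 2.2) would confirm the box-plot flag.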
Due to the complex nature of multi-source geological data, it is difficult to rebuild every geological structure with a single 3D modeling method. The multi-source data interpretation method put forward in this analysis is based on a database-driven pattern and focuses on the discrete and irregular features of geological data. Geological data from a variety of sources, covering a range of accuracy, resolution, quantity, and quality, are classified and integrated according to their reliability and consistency for 3D modeling. A new interpolation-approximation fitting construction algorithm for geological surfaces based on the non-uniform rational B-spline (NURBS) technique is then presented. The NURBS technique balances the requirements for accuracy, surface continuity, and data storage of geological structures. Finally, four alternative 3D modeling approaches, selected according to data quantity and accuracy specifications, are demonstrated with reference to examples. The proposed approaches offer flexible modeling patterns for different practical engineering demands.
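To make the NURBS machinery concrete, here is a minimal, self-contained sketch of curve-point evaluation via the Cox-de Boor recursion and the rational (weighted) combination. The degree-2 clamped example reduces to a Bezier arc; the control points and weights are purely illustrative, and a production geological-surface fitter would of course work with surfaces rather than curves.

```python
def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis N_{i,p}(u)."""
    if p == 0:
        # half-open spans; also accept the right end of the final span
        if knots[i] <= u < knots[i + 1] or (u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]):
            return 1.0
        return 0.0
    left = 0.0
    if knots[i + p] != knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, u, knots)
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = (knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1]) * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

def nurbs_point(u, degree, control_pts, weights, knots):
    """Rational combination: C(u) = sum(N_i w_i P_i) / sum(N_i w_i)."""
    num = [0.0] * len(control_pts[0])
    den = 0.0
    for i, (pt, w) in enumerate(zip(control_pts, weights)):
        b = bspline_basis(i, degree, u, knots) * w
        den += b
        num = [a + b * c for a, c in zip(num, pt)]
    return [a / den for a in num]

# degree-2 clamped curve on three control points (reduces to a Bezier arc)
pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
knots = [0, 0, 0, 1, 1, 1]
mid = nurbs_point(0.5, 2, pts, [1.0, 1.0, 1.0], knots)
```

With unit weights and this clamped knot vector, the curve interpolates its endpoints, and the midpoint equals the quadratic Bezier value 0.25·P0 + 0.5·P1 + 0.25·P2.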
To detect faults accurately and quickly, cusp catastrophe theory is used in this paper to interpret 3D coal seismic data. By establishing a cusp model, the seismic signal is transformed into the standard form of the cusp catastrophe, and catastrophe parameters, including time-domain catastrophe potential, time-domain catastrophe time, frequency-domain catastrophe potential, and frequency-domain catastrophe degree, are calculated. Catastrophe theory is applied to 3D seismic structural interpretation in coal mines. The results show that the position of abnormality in the catastrophe parameter profile or curve is related to the location of faults, and that cusp catastrophe theory is effective for automatically extracting geological information and improving interpretation precision in 3D seismic data.
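For reference, the standard (canonical) form of the cusp catastrophe onto which the seismic signal is mapped can be written as follows; the control parameters u and v here are generic stand-ins for the time- and frequency-domain quantities computed in the paper.

```latex
% Cusp catastrophe potential in standard form
V(x) = \tfrac{1}{4}x^{4} + \tfrac{1}{2}u\,x^{2} + v\,x
% Equilibria satisfy dV/dx = 0:
x^{3} + u\,x + v = 0
% Bifurcation (catastrophe) set, where equilibria merge:
4u^{3} + 27v^{2} = 0
```

Abnormal jumps in the fitted (u, v) near the bifurcation set are what the catastrophe-parameter profiles pick up as candidate fault locations.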
Atmospheric chemistry models usually perform badly in forecasting wintertime air pollution because of their uncertainties. Generally, such uncertainties can be decreased effectively by techniques such as data assimilation (DA) and model output statistics (MOS). However, the relative importance and combined effects of the two techniques have not been clarified. Here, a one-month air quality forecast with the Weather Research and Forecasting-Chemistry (WRF-Chem) model was carried out in a virtually operational setup focusing on Hebei Province, China. Meanwhile, three-dimensional variational (3DVar) DA and MOS based on one-dimensional Kalman filtering were implemented separately and simultaneously to investigate their performance in improving the model forecast. Comparison with observations shows that the chemistry forecast with MOS outperforms that with 3DVar DA for all the species tested over the whole 72 forecast hours. Combined use of both techniques does not guarantee a better forecast than MOS alone, with the improvements and degradations being small and appearing rather randomly. The results indicate that MOS is more suitable than 3DVar DA for improving the operational forecasting ability of WRF-Chem.
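The MOS component based on one-dimensional Kalman filtering can be sketched as a scalar filter that tracks a slowly varying forecast bias. The noise variances q and r, the initial state, and the PM2.5-style numbers below are illustrative assumptions, not the paper's configuration.

```python
def kalman_mos(forecasts, observations, q=0.01, r=1.0):
    """One-dimensional Kalman filter tracking the slowly varying forecast
    bias b; each raw forecast is corrected by subtracting the current bias
    estimate. q and r are assumed process/observation error variances."""
    b, p = 0.0, 1.0              # initial bias estimate and its variance
    corrected = []
    for f, o in zip(forecasts, observations):
        corrected.append(f - b)  # correct with the prior bias estimate
        p = p + q                # predict step
        k = p / (p + r)          # Kalman gain
        b = b + k * ((f - o) - b)  # update with the newly observed error
        p = (1 - k) * p
    return corrected

# hypothetical PM2.5 series with a persistent +20 model bias
obs = [80.0, 85.0, 90.0, 88.0, 84.0, 86.0, 83.0, 87.0]
fcst = [o + 20.0 for o in obs]
out = kalman_mos(fcst, obs)
```

The first corrected forecast still carries the full +20 bias; by the end of the short series, the filter has learned most of it and the residual error shrinks accordingly.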
Cryo-electron microscopy (cryo-EM) provides a powerful tool to resolve the structure of biological macromolecules in their natural state. One advantage of cryo-EM is that different conformational states of a protein complex can be built simultaneously, and the distribution of the different states can be measured. This pushes cryo-EM beyond merely resolving protein structures toward obtaining the thermodynamic properties of protein machines. Here, we used a deep manifold learning framework to obtain the conformational landscape of KaiC proteins, and further derived the thermodynamic properties of this central oscillator component of the circadian clock by means of statistical physics.
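The step from measured state populations to thermodynamic quantities can be illustrated by simple Boltzmann inversion. The particle counts and the temperature (kT value) below are hypothetical, not the paper's data.

```python
import math

KT = 0.593  # kcal/mol at ~298 K (assumed temperature)

def free_energy_profile(populations, kT=KT):
    """Boltzmann inversion: G_i = -kT ln p_i, shifted so the most
    populated (lowest-G) state sits at zero."""
    total = sum(populations)
    g = [-kT * math.log(n / total) for n in populations]
    g0 = min(g)
    return [x - g0 for x in g]

# hypothetical particle counts for three conformational states
counts = [60000, 30000, 10000]
G = free_energy_profile(counts)
```

A twofold population ratio between the first two states translates into a free-energy gap of kT·ln 2, which is the kind of relation that turns cryo-EM state statistics into thermodynamics.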
The knowledge of probability is fully reflected in people's daily life and production, and it offers one way for people to know the world. By using the tools of probability and mathematical statistics, people can scientifically and reasonably analyze complex problems and data, significantly improving their quality of life, and can predict the laws and trends of how things develop from existing data. Because of these advantages, probability theory and mathematical statistics have become the preferred approach to many complicated problems. At present, there is great demand for big data analysis, and likewise for better methods of big data analysis to deal with the difficult problems of actual production and life.
Air quality monitoring is effective for timely understanding of the current air quality status of a region or city. Currently, the huge volume of environmental monitoring data, which has reasonable real-time performance, provides strong support for in-depth analysis of air pollution characteristics and causes. However, in the era of big data, to meet current demands for fine management of the atmospheric environment, it is important to explore the characteristics and causes of air pollution from multiple aspects for a comprehensive and scientific evaluation of air quality. This study reviewed and summarized air quality evaluation methods based on environmental monitoring data statistics during the 13th Five-Year Plan period, and evaluated the level of air pollution in the Beijing-Tianjin-Hebei region and its surrounding areas (i.e., the "2+26" region) during the period of the three-year action plan to fight air pollution. We suggest that air quality should be comprehensively, deeply, and scientifically evaluated from the aspects of air pollution characteristics, causes, and the influences of meteorological conditions and anthropogenic emissions. It is also suggested that a three-year moving average be introduced as one of the evaluation indexes of the long-term change of pollutants. Additionally, both temporal and spatial differences should be considered when removing confounding meteorological factors.
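The suggested three-year moving average is straightforward to compute; the annual PM2.5 means below are hypothetical.

```python
def three_year_moving_average(annual_means):
    """Three-year moving averages of an annual concentration series;
    entry i averages years i, i+1, i+2 (rounded to one decimal)."""
    return [round(sum(annual_means[i:i + 3]) / 3, 1)
            for i in range(len(annual_means) - 2)]

# hypothetical annual PM2.5 means (ug/m3) for five consecutive years
pm25 = [71.0, 64.0, 60.0, 57.0, 51.0]
ma = three_year_moving_average(pm25)
```

The smoothed series damps single-year meteorological swings, which is exactly why it is proposed as a long-term evaluation index.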
The loess plateau covering the North Shaanxi slope and Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures were not well developed. As a result, numerous wide and gentle noses and small traps with magnitudes of less than 50 m developed on the large westward-dipping monocline. Reservoirs, including Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are thin with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting the reservoir thickness, physical properties, and hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
In atmospheric data assimilation systems, the forecast error covariance model is an important component. However, the parameters required by a forecast error covariance model are difficult to obtain because the truth is unknown. This study applies an error statistics estimation method to the Physical-space Statistical Analysis System (PSAS) height-wind forecast error covariance model. The method consists of two components: the first computes the error statistics using the National Meteorological Center (NMC) method, a lagged-forecast difference approach, within the framework of the PSAS height-wind forecast error covariance model; the second obtains a calibration formula to rescale the error standard deviations provided by the NMC method. The calibration is against the error statistics estimated by maximum-likelihood estimation (MLE) with rawinsonde height observed-minus-forecast residuals. A complete set of formulas for estimating the error statistics and for the calibration is applied to a one-month dataset generated by a general circulation model of the Global Modeling and Assimilation Office (GMAO), NASA. There is a clear constant relationship between the error statistics estimates of the NMC method and the MLE. The final product provides a full set of 6-hour error statistics required by the PSAS height-wind forecast error covariance model over the globe. The features of these error statistics are examined and discussed.
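The NMC method and the constant rescaling can be caricatured in a few lines: estimate an error standard deviation from lagged-forecast differences, then fit a single least-squares scale factor against MLE-derived values. All numbers below are illustrative, not GMAO data.

```python
import statistics

def nmc_error_std(f48, f24):
    """NMC method: sample standard deviation of lagged-forecast differences
    (e.g., 48-h minus 24-h forecasts valid at the same time), used as a
    proxy for the forecast error standard deviation."""
    diffs = [a - b for a, b in zip(f48, f24)]
    return statistics.stdev(diffs)

def calibration_scale(nmc_stds, mle_stds):
    """Least-squares scale factor c such that c * sigma_NMC ~ sigma_MLE,
    mirroring a constant rescaling of the NMC statistics."""
    num = sum(a * b for a, b in zip(nmc_stds, mle_stds))
    den = sum(a * a for a in nmc_stds)
    return num / den

s = nmc_error_std([10.0, 12.0, 11.0, 13.0], [9.0, 12.0, 12.0, 11.0])
scale = calibration_scale([5.0, 8.0, 12.0], [6.0, 9.6, 14.4])
```

In this toy example the MLE values are exactly 1.2 times the NMC values, so the fitted factor recovers the "clear constant relationship" the abstract describes.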
It is well known that the nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data. To handle atypical observations when the covariates of the nonparametric component are functional, robust estimates for the regression parameter and regression operator are introduced. The main purpose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed procedures fully automatic. We use the k-nearest neighbors (kNN) procedure to construct the kernel estimator of the proposed robust model. Under some regularity conditions, we state consistency results for the kNN functional estimators, which are uniform in the number of neighbors (UINN). Furthermore, a simulation study and an empirical application to real data on octane gasoline predictions are carried out to illustrate the higher predictive performance and usefulness of the kNN approach.
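Here is a minimal sketch of the two ingredients, robust (median-based) kNN prediction and data-driven selection of k by leave-one-out error, written for scalar covariates rather than functional ones. The data, including the gross outlier, are synthetic.

```python
import statistics

def knn_median_predict(x0, xs, ys, k):
    """Robust kNN regression: predict by the median response of the k
    nearest covariates (the median resists outliers where the mean fails)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return statistics.median(ys[i] for i in order[:k])

def choose_k_loo(xs, ys, candidates):
    """Data-driven k: minimize leave-one-out absolute prediction error."""
    best_k, best_err = None, float("inf")
    for k in candidates:
        err = 0.0
        for i in range(len(xs)):
            xr = xs[:i] + xs[i + 1:]
            yr = ys[:i] + ys[i + 1:]
            err += abs(knn_median_predict(xs[i], xr, yr, k) - ys[i])
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# y = 2x with one gross outlier at x = 5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 99.0, 12.0, 14.0, 16.0]
k = choose_k_loo(xs, ys, [1, 3, 5])
pred = knn_median_predict(5.5, xs, ys, k)
```

The cross-validated k avoids k = 1 (which would copy the outlier), and the median keeps the prediction near the underlying trend despite the contaminated neighbor.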
It is a matter of course that Kolmogorov's probability theory is a very useful mathematical tool for the analysis of statistics. However, this fact by no means implies that statistics is based on Kolmogorov's probability theory, since it is not guaranteed that mathematics and our world are connected. In order for mathematics to assert statements concerning our world, a certain theory (a so-called "world view") must mediate between mathematics and our world. Recently we proposed measurement theory (i.e., the theory of the quantum mechanical world view), which is characterized as the linguistic turn of quantum mechanics. In this paper, we assert that statistics is based on measurement theory. For example, we show, from the purely theoretical (i.e., measurement theoretical) point of view, that regression analysis cannot be justified without Bayes' theorem. This may imply that even the conventional division between (Fisher's) statistics and Bayesian statistics should be reconsidered.
Statistics Norway has been engaged in the development of official statistics on accidents at work for the last ten years and represents Norway in international bodies such as Eurostat working groups. Some of the work was documented and presented in 2011 at the ISI Dublin convention, and a review of developments over the subsequent four years sheds further light on the efforts made. A new data collection system has been implemented at the national level, in which data and files based on forms for reporting accidents at work are sent from the Norwegian Labour and Welfare Administration (NLW) to Statistics Norway. Nevertheless, some challenges remain: the use of different versions of the NLW forms, the scanning and extraction of data in the NLW, the implementation of a secure electronic solution for transmitting data between the NLW and Statistics Norway, the reading and interpretation of TIFF files, and the lessons to be learned from other countries. The ambition is that Statistics Norway produces methodologically sound official statistics on accidents at work within the first half of 2015 and transmits to Eurostat, within the first half of 2016, the data and files necessary and sufficient to fulfil EU regulations.
Given the rise of artificial intelligence, big data analytics has emerged as an important tool for processing and assimilating the enormous volume of data available on social media. It is of great theoretical and practical significance to explore the public opinion diffusion process and characteristics, and users' emotions, of mega sports events based on big data statistics in the social media environment. This paper takes the Jakarta Asian Games, the Russia World Cup, and the PyeongChang Winter Olympics, all held in 2018, as cases; uses text mining and social network analysis to examine the dissemination of social media users' data; presents the semantic words disseminated during sports events through high-frequency word cloud diagrams; and summarizes the general rules of public opinion dissemination. The results show that the greater the users' participation, the greater the diffusion volume, and that the diffusion process exhibits rapid growth, short duration, scattered topics, diversified content, and strong guidance with weakly sustained attention. The high-frequency words, apart from the names of the events, such as "cheer", "win the game", and "must win", show an obvious concentration of emotional words.
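High-frequency word statistics of the kind feeding a word cloud reduce to token counting. The posts and stopword list below are hypothetical stand-ins for the mined social-media text.

```python
from collections import Counter

def top_words(posts, n=3, stopwords=frozenset({"the", "a", "of"})):
    """Frequency statistics over whitespace-tokenized posts, the basis
    of a high-frequency word cloud."""
    counts = Counter(
        w for post in posts for w in post.lower().split() if w not in stopwords
    )
    return counts.most_common(n)

posts = [
    "cheer for the team",          # hypothetical social-media posts
    "must win must win",
    "cheer cheer win the game",
]
top = top_words(posts)
```

In a real pipeline the tokenizer and stopword list would be language-specific (e.g., word segmentation for Chinese posts), but the counting step is unchanged.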
Accurate tracking and statistical analysis of pedestrian flow have wide applications in public scenarios. However, conventional tracking-by-detection approaches are prone to missing individuals in densely populated or poorly lit environments. This study introduces a pedestrian detection and flow statistics method based on data fusion, which effectively tracks pedestrians across varying crowd densities. The proposed method combines object detection strategies with a crowd counting technique to determine the locations of all pedestrians. By observing the coordinates of pedestrians' foot points, the approach assesses the interaction between pedestrian movement trajectories and designated spatial areas, thereby enabling the collection of flow statistics. Experimental results indicate that the proposed method identifies 2.7 times more pedestrians than object detection methods alone and decreases false positives by 58% compared with crowd counting techniques in crowded settings. In conclusion, the proposed method shows considerable promise for accurate pedestrian detection and flow analysis.
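The flow-statistics step, intersecting foot-point trajectories with a designated counting boundary, can be sketched as follows; the trajectories and the horizontal counting line are illustrative assumptions, not the paper's geometry.

```python
def flow_counts(tracks, line_y):
    """Given per-pedestrian foot-point trajectories (lists of (x, y)),
    count crossings of the counting line y = line_y in each direction."""
    inbound = outbound = 0
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            if y0 < line_y <= y1:       # segment crosses upward
                inbound += 1
            elif y1 < line_y <= y0:     # segment crosses downward
                outbound += 1
    return inbound, outbound

tracks = [
    [(0, -2), (0, -1), (0, 1)],          # crosses upward once
    [(1, 3), (1, 1), (1, -1)],           # crosses downward once
    [(2, -1), (2, 1), (2, -1), (2, 1)],  # oscillates: up, down, up
]
inb, outb = flow_counts(tracks, line_y=0.0)
```

Counting per segment rather than per pedestrian means an oscillating track contributes several crossings; a deployment would typically debounce such jitter near the line.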
First, statistics on the operational lines and mileage of urban rail transit in China are compiled. The results show that, as of December 31, 2025, there were 60 cities with urban rail transit in operation nationwide, with a total operational mileage of approximately 12837.8 km (excluding the electronic guideway rubber-tired system, there were 57 cities, with a total operational mileage of 12651.6 km). The metro system dominates, while low-capacity systems exhibit a multi-modal development pattern. Subsequently, the development characteristics of China's urban rail transit industry are analyzed, indicating that: (1) the industry should closely align with the theme of urban intensive development, promote quality and efficiency improvements of existing lines, and focus on the supporting role of initial passenger flow for new line construction, multi-network integration, and economic and financial sustainability; and (2) significant innovations have been achieved in safety resilience, green and low-carbon development, intelligent construction, and digital transformation. Finally, development recommendations for the "15th Five-Year Plan" period are proposed: promoting cost reduction and efficiency improvement in the rail transit industry, enhancing the operational efficiency of existing networks, continuously exploring railway services for urban commuting, strengthening external exchanges, and advancing the "going global" strategy of the urban rail transit industry.
The founding conference of the Big Data Statistics Branch (BDSB) of the Chinese Association for Applied Statistics (CAAS) was held on 8 December 2018 at East China Normal University (ECNU), Shanghai, China. More than 600 experts and scholars attended the conference. Professor Zhang Riquan was elected as the chairman of the first Board of Directors of the BDSB. Fang Xiangzhong, Chairman of the CAAS, delivered a speech. Professor Wang Zhaojun and Dr Liu Zhong delivered keynote reports at the conference on the development of big data research and practice. The BDSB will be dedicated to building a high-level big data statistics exchange platform for experts and scholars in universities, governments, enterprises, and other fields, to better serve society and the country's major strategies.
Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data from complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from the well curves, then used the branch-and-bound algorithm to search for optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to set up disaggregated models of the well logging curves. The models and the actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
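C4.5/C5.0-style trees choose splits by information gain. The sketch below computes that criterion for two hypothetical discretized log attributes (gamma-ray and resistivity buckets) against lithology labels; the more informative attribute wins the split. The data are toy values, not real log curves.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information gain of splitting on a discrete feature, the criterion
    behind C4.5/C5.0-style decision trees."""
    n = len(labels)
    by_value = {}
    for f, y in zip(feature, labels):
        by_value.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# toy log-derived attributes vs. lithology labels (hypothetical)
gr_bucket = ["hi", "hi", "lo", "lo", "hi", "lo"]   # gamma-ray bucket
rt_bucket = ["hi", "lo", "hi", "lo", "hi", "lo"]   # resistivity bucket
litho     = ["shale", "shale", "sand", "sand", "shale", "sand"]
g_gr = info_gain(gr_bucket, litho)
g_rt = info_gain(rt_bucket, litho)
```

Here the gamma-ray bucket separates shale from sand perfectly (gain = 1 bit), so a tree builder would split on it first; the real C5.0 algorithm adds gain-ratio normalization, pruning, and boosting on top of this criterion.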
Branching river channels and the coexistence of valleys, ridges, hills, and slopes, the result of long-term weathering and erosion, form the unique loess topography. The Changqing Geophysical Company, working in these complex conditions, has established a suite of technologies for high-fidelity processing and fine interpretation of seismic data. This article introduces the processes involved in the data processing and interpretation and illustrates the results.
文摘Air quality monitoring is effective for timely understanding of the current air quality status of a region or city.Currently,the huge volume of environmental monitoring data,which has reasonable real-time performance,provides strong support for in-depth analysis of air pollution characteristics and causes.However,in the era of big data,to meet current demands for fine management of the atmospheric environment,it is important to explore the characteristics and causes of air pollution from multiple aspects for comprehensive and scientific evaluation of air quality.This study reviewed and summarized air quality evaluation methods on the basis of environmental monitoring data statistics during the 13th Five-Year Plan period,and evaluated the level of air pollution in the Beijing-Tianjin-Hebei region and its surrounding areas(i.e.,the“2+26”region)during the period of the three-year action plan to fight air pollution.We suggest that air quality should be comprehensively,deeply,and scientifically evaluated from the aspects of air pollution characteristics,causes,and influences of meteorological conditions and anthropogenic emissions.It is also suggested that a threeyear moving average be introduced as one of the evaluation indexes of long-term change of pollutants.Additionally,both temporal and spatial differences should be considered when removing confounding meteorological factors.
Abstract: The loess plateau covering the North Shaanxi slope and Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures were not well developed. As a result, numerous wide and gentle noses and small traps with magnitudes of less than 50 m were developed on the large westward-dipping monocline. Reservoirs, including Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are characterized as thin with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting reservoir thickness, physical properties, and hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
Abstract: In atmospheric data assimilation systems, the forecast error covariance model is an important component. However, the parameters required by a forecast error covariance model are difficult to obtain due to the absence of the truth. This study applies an error statistics estimation method to the Physical-space Statistical Analysis System (PSAS) height-wind forecast error covariance model. This method consists of two components: the first computes the error statistics by using the National Meteorological Center (NMC) method, a lagged-forecast difference approach, within the framework of the PSAS height-wind forecast error covariance model; the second obtains a calibration formula to rescale the error standard deviations provided by the NMC method. The calibration is against the error statistics estimated by a maximum-likelihood estimation (MLE) with rawinsonde height observed-minus-forecast residuals. A complete set of formulas for estimating the error statistics and for the calibration is applied to a one-month-long dataset generated by a general circulation model of the Global Modeling and Assimilation Office (GMAO), NASA. There is a clear constant relationship between the error statistics estimates of the NMC method and the MLE. The final product provides a full set of 6-hour error statistics required by the PSAS height-wind forecast error covariance model over the globe. The features of these error statistics are examined and discussed.
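The lagged-forecast difference idea behind the NMC method can be sketched compactly. This is a toy illustration on synthetic data: error standard deviations are estimated from differences between longer- and shorter-range forecasts valid at the same time, then rescaled by a calibration factor. The constant 0.8 is an invented stand-in for the paper's MLE-based calibration against observed-minus-forecast residuals:

```python
import numpy as np

# NMC (lagged-forecast difference) sketch on synthetic forecast fields.
rng = np.random.default_rng(0)
n_times, n_points = 200, 50

f24 = rng.normal(0.0, 1.0, (n_times, n_points))        # 24-h forecasts
f48 = f24 + rng.normal(0.0, 0.5, (n_times, n_points))  # 48-h forecasts, extra error

diff = f48 - f24                     # lagged-forecast differences, valid same time
nmc_std = diff.std(axis=0, ddof=1)   # NMC error std estimate at each point

calibration = 0.8                    # assumed rescaling factor (stands in for MLE calibration)
calibrated_std = calibration * nmc_std
print(calibrated_std.mean())
```

The "clear constant relationship" reported in the abstract corresponds to such a single multiplicative factor linking the NMC estimates to the MLE estimates.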
Abstract: It is well known that nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data. To address the problem of atypical observations when the covariates of the nonparametric component are functional, robust estimates for the regression parameter and regression operator are introduced. The main purpose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed procedures fully automatic. We use the k Nearest Neighbors (kNN) procedure to construct the kernel estimator of the proposed robust model. Under some regularity conditions, we state consistency results for kNN functional estimators, which are uniform in the number of neighbors (UINN). Furthermore, a simulation study and an empirical application to a real data analysis of octane gasoline predictions are carried out to illustrate the higher predictive performance and the usefulness of the kNN approach.
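The idea of a data-driven choice of the number of neighbors can be sketched with a scalar (rather than functional) covariate. This is a simplified illustration, not the paper's estimator: the local median stands in for a robust local estimate, and k is picked by leave-one-out cross-validation:

```python
import numpy as np

def knn_predict(x_train, y_train, x0, k, robust=True):
    """kNN regression at x0; the median makes the local fit outlier-resistant."""
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return np.median(y_train[idx]) if robust else y_train[idx].mean()

def select_k(x, y, candidates):
    """Pick k minimizing leave-one-out squared prediction error."""
    best_k, best_err = None, np.inf
    for k in candidates:
        err = 0.0
        for i in range(len(x)):
            mask = np.arange(len(x)) != i
            err += (y[i] - knn_predict(x[mask], y[mask], x[i], k)) ** 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 80)  # synthetic regression data
print(select_k(x, y, [3, 5, 9, 15]))
```

The cross-validation loop is what makes the procedure "fully automatic" in the sense of the abstract: no neighborhood size needs to be tuned by hand.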
Abstract: It is a matter of course that Kolmogorov's probability theory is a very useful mathematical tool for the analysis of statistics. However, this fact never means that statistics is based on Kolmogorov's probability theory, since it is not guaranteed that mathematics and our world are connected. In order for mathematics to assert statements concerning our world, a certain theory (a so-called "world view") must mediate between mathematics and our world. Recently we proposed measurement theory (i.e., the theory of the quantum mechanical world view), which is characterized as the linguistic turn of quantum mechanics. In this paper, we assert that statistics is based on measurement theory. And, for example, we show, from the purely theoretical point of view (i.e., from the measurement-theoretical point of view), that regression analysis cannot be justified without Bayes' theorem. This may imply that even the conventional division between (Fisher's) statistics and Bayesian statistics should be reconsidered.
Abstract: Statistics Norway has been engaged in the development of official statistics on accidents at work for the last ten years and represents Norway in international bodies such as Eurostat working groups. Some of this work was documented and presented in 2011 at the ISI Dublin convention, and a review of further developments over the last four years sheds even more light on the efforts made. A new data collection system has been implemented at the national level, involving data and files, based on forms for reporting accidents at work, being sent from the Norwegian Labour and Welfare Administration (NLW) to Statistics Norway. Nevertheless, there are still some challenges to be met. These include the use of different versions of the NLW forms, the scanning and extraction of data in the NLW, the implementation of a secure electronic solution for transmitting data between the NLW and Statistics Norway, the reading and interpretation of TIFF files, and the lessons to be learned from other countries. The ambition is that Statistics Norway will produce methodologically sound official statistics on accidents at work within the first half of 2015 and transmit data and files to Eurostat that are necessary and sufficient to fulfil EU regulations within the first half of 2016.
Funding: Supported by the National Natural Science Foundation of China (72302230) and the Shandong Provincial Natural Science Foundation Youth Project (ZR2023QG068).
Abstract: Given the rise of artificial intelligence, big data analytics has emerged as an important tool for processing and assimilating the enormous volume of data available on social media. It is of great theoretical and practical significance to explore the public opinion diffusion process and characteristics, and users' emotions, in mega sports events on the basis of big data statistics in the social media environment. This paper takes the Jakarta Asian Games, the Russian World Cup, and the PyeongChang Winter Olympics held in 2018 as cases, uses text mining and social network analysis methods to analyze the dissemination process of social media users' data, presents the semantic words disseminated in sports events through high-frequency word cloud diagrams, and summarizes the general rules of public opinion dissemination. The results show that the greater the user participation, the larger the diffusion volume, and that the diffusion process is characterized by fast growth, short duration, scattered topics, diversified content, strong guidance, and weak continuity of attention. The high-frequency words, apart from the names of the events, such as "cheer", "win the game", and "must win", show an obvious concentration of emotional words.
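The tabulation step behind a high-frequency word cloud is simple to sketch. This toy example uses invented posts, not the study's corpus; tokenization of real Chinese social-media text would additionally need word segmentation:

```python
from collections import Counter
import re

# Count high-frequency words across posts, the input to a word-cloud diagram.
posts = [
    "cheer for the team, must win tonight",
    "great match, cheer loudly and win the game",
    "must win, the whole city will cheer",
]

tokens = []
for post in posts:
    tokens.extend(re.findall(r"[a-z]+", post.lower()))

stopwords = {"the", "for", "and", "a", "will"}  # drop function words
freq = Counter(t for t in tokens if t not in stopwords)
print(freq.most_common(3))  # emotional words dominate, as in the abstract
```

A word cloud then simply maps each count to a font size, so the concentration of emotional words like "cheer" and "must win" becomes visually apparent.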
Funding: National Natural Science Foundation of China (No. 72174102, No. 72334003) and Major Consulting Project of the Chinese Academy of Engineering (No. 2024-XBZD-21).
Abstract: Accurate tracking and statistical analysis of pedestrian flow have wide applications in public scenarios. However, conventional tracking-by-detection approaches are prone to missing individuals in densely populated or poorly lit environments. This study introduces a pedestrian detection and flow statistics method based on data fusion, which effectively tracks pedestrians across varying crowd densities. The proposed method combines object detection strategies with a crowd counting technique to determine the locations of all pedestrians. By observing the coordinates of pedestrians' foot points, the approach assesses the interaction between pedestrians' movement trajectories and designated spatial areas, thereby enabling the collection of flow statistics. Experimental results indicate that the proposed method identifies 2.7 times more pedestrians than object detection methods alone and decreases false positives by 58% compared with crowd counting techniques in crowded settings. In conclusion, the proposed method shows considerable promise for accurate pedestrian detection and flow analysis.
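The flow-statistics step, checking foot-point trajectories against a designated area, can be sketched as follows. The zone geometry and trajectories are invented for illustration; the actual method's detection and counting fusion is not reproduced here:

```python
# Flow counting from pedestrian foot-point trajectories: a pedestrian is
# counted when the foot point crosses from outside into a designated
# rectangular counting zone between consecutive frames.

ZONE = (2.0, 5.0, 0.0, 3.0)  # hypothetical zone: x_min, x_max, y_min, y_max

def in_zone(point, zone=ZONE):
    x, y = point
    x_min, x_max, y_min, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max

def count_entries(trajectories):
    """Count trajectories that move from outside to inside the zone."""
    entries = 0
    for track in trajectories:
        for prev, cur in zip(track, track[1:]):
            if not in_zone(prev) and in_zone(cur):
                entries += 1
                break  # count each pedestrian at most once
    return entries

tracks = [
    [(0.0, 1.0), (1.5, 1.2), (2.5, 1.3)],  # enters the zone from the left
    [(6.0, 2.0), (5.5, 2.1), (5.2, 2.0)],  # stays outside
    [(3.0, 4.0), (3.0, 2.5), (3.0, 1.0)],  # enters the zone from above
]
print(count_entries(tracks))  # 2
```

Checking the transition between consecutive frames, rather than mere presence in the zone, prevents a loitering pedestrian from being counted repeatedly.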
Abstract: First, statistics on the operational lines and mileage of urban rail transit in China are presented. The results show that, as of Dec. 31, 2025, there were 60 cities with urban rail transit in operation nationwide, with a total operational mileage of approximately 12837.8 km (excluding the electronic guideway rubber-tired system, there were 57 cities, with a total operational mileage of 12651.6 km). The metro system dominates, while low-capacity systems exhibit a multi-modal development pattern. Subsequently, the characteristics of China's urban rail transit industry development are analyzed, indicating that: (1) the industry should closely align with the theme of urban intensive development, promote quality and efficiency improvements on existing lines, and focus on the supporting role of initial passenger flow for new line construction, multi-network integration, and economic and financial sustainability; (2) significant innovative achievements have been made in safety resilience, green and low-carbon development, intelligent construction, and digital transformation. Finally, development recommendations for the "15th Five-Year Plan" period are proposed: promoting cost reduction and efficiency improvement in the rail transit industry, enhancing the operational efficiency of existing networks, continuously exploring railway services for urban commuting, strengthening external exchanges, and driving the "going global" strategy of the urban rail transit industry.
Abstract: The founding conference of the Big Data Statistics Branch (BDSB) of the Chinese Association for Applied Statistics (CAAS) was held on 8 December 2018 at East China Normal University (ECNU), Shanghai, China. More than 600 experts and scholars attended the conference. Professor Zhang Riquan was elected as the chairman of the first Board of Directors of the BDSB. Fang Xiangzhong, Chairman of the CAAS, delivered a speech. Professor Wang Zhaojun and Dr Liu Zhong delivered keynote reports on the development of big data research and practice at the conference. The BDSB will be dedicated to building a high-level big data statistics exchange platform for experts and scholars in universities, governments, enterprises, and other fields, to better serve society and the country's major strategies.
Funding: Sponsored by the National Science and Technology Major Project (No. 2011ZX05023-005-006).
Abstract: Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data from complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to search for optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to build disaggregated models of the well logging curves. The modeling results and the actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
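The modeling step can be illustrated in miniature with a one-level decision tree ("stump") that thresholds a single well-log feature; this is a stand-in for the C5.0 trees the study uses, and the gamma-ray readings and labels below are synthetic:

```python
import numpy as np

# Decision stump for lithology classification from one well-log feature.
def best_stump(x, y):
    """Find the threshold on x that best separates the binary labels y."""
    best_thr, best_acc = None, 0.0
    for thr in np.unique(x):
        pred = (x > thr).astype(int)
        # try both orientations of the split
        acc = max((pred == y).mean(), ((1 - pred) == y).mean())
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

# Synthetic gamma-ray log: sandstone (label 0) reads low, shale (1) reads high.
gr = np.array([40.0, 45.0, 50.0, 55.0, 95.0, 100.0, 110.0, 120.0])
lith = np.array([0, 0, 0, 0, 1, 1, 1, 1])

thr, acc = best_stump(gr, lith)
print(thr, acc)  # a full C5.0 tree stacks many such splits over many features
```

A C5.0 tree greedily composes many such splits (using an information-gain criterion rather than raw accuracy) across the features retained by the branch-and-bound selection step.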
Abstract: Branching river channels and the coexistence of valleys, ridges, hills, and slopes resulting from long-term weathering and erosion form the unique loess topography. The Changqing Geophysical Company, working in these complex conditions, has established a suite of technologies for high-fidelity processing and fine interpretation of seismic data. This article introduces the processes involved in the data processing and interpretation and illustrates the results.