Abstract: This paper focuses on the integration of, and data transformation between, GPS and the total station. It emphasizes how to transform WGS84 Cartesian coordinates into local two-dimensional plane coordinates and orthometric height. A GPS receiver, total station, radio, notebook computer, and the corresponding software work together to form a new surveying system, the super-totalstation positioning system (SPS), and a new surveying model for terrestrial surveying. With the help of this system, the positions of detail points can be measured.
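The abstract does not say which transformation the authors use, so the following is only a minimal sketch of one common approach: rotating WGS84 Cartesian (ECEF) coordinates into a local east-north-up frame tangent to the ellipsoid at a chosen origin, and obtaining orthometric height by subtracting a geoid undulation from the ellipsoidal height. The function names, the choice of a tangent-plane frame (rather than a map projection or a fitted plane transformation), and the externally supplied undulation value are all assumptions made for illustration.

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0              # semi-major axis in metres
F = 1 / 298.257223563      # flattening
E2 = F * (2 - F)           # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert geodetic latitude/longitude (degrees) and ellipsoidal height (m) to ECEF."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_enu(point, origin, lat0_deg, lon0_deg):
    """Rotate the ECEF offset (point - origin) into east, north, up at the origin."""
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    dx, dy, dz = (p - o for p, o in zip(point, origin))
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up


def orthometric_height(ellipsoidal_height, geoid_undulation):
    """H = h - N, with N taken from a geoid model (hypothetical input here)."""
    return ellipsoidal_height - geoid_undulation
```

The east and north components serve as plane coordinates only near the origin; over larger areas a map projection or a locally fitted transformation would normally be preferred.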
Funding: This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the "Project for Research and Development with Middle Markets Enterprises and DNA (Data, Network, AI) Universities" (AI-based Safety Assessment and Management System for Concrete Structures) (Reference Number P0024559), supervised by the Korea Institute for Advancement of Technology (KIAT).
Abstract: Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult because of data imbalance, temporal dependence, and noise. Methodologies for data augmentation and for converting time-series data into images for analysis have therefore been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. Data augmentation is performed by adding Gaussian noise, with the noise level set to 0.002, to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to visualize the dynamic transitions of the data while converting the time-series data into images. This enables the identification of patterns in time-series data and helps capture the sequential dependencies of the data. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. Applying the anomaly map to the original image makes it possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time-series data are converted to images. Additionally, when the data were processed as images rather than as time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research on anomaly detection using time-series data. It also helps solve problems such as analyzing complex patterns in data in a lightweight manner.
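The two preprocessing steps named in this abstract can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation: it assumes that the "noise level" of 0.002 is the standard deviation of zero-mean Gaussian noise, that the series is discretized into quantile bins, and that four bins are used. The function names and the bin count are hypothetical.

```python
import numpy as np


def add_gaussian_noise(x, sigma=0.002, rng=None):
    """Return a copy of x with zero-mean Gaussian noise added (sigma is an assumed std)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, sigma, size=x.shape)


def markov_transition_field(x, n_bins=4):
    """Build an MTF image: entry (i, j) is the transition probability
    from the quantile bin of x[i] to the quantile bin of x[j]."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges split the values into n_bins states.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)  # integers in 0 .. n_bins - 1

    # First-order transition counts between consecutive time steps.
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1

    # Normalize each row to probabilities; rows with no transitions stay zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

    # Spread the transition probabilities over all pairs of time steps.
    return probs[states[:, None], states[None, :]]


# Usage: augment a series, then convert it to a square image.
series = np.sin(np.linspace(0, 4 * np.pi, 64))
image = markov_transition_field(add_gaussian_noise(series), n_bins=4)
```

The resulting array has one row and one column per time step, so long series are usually downsampled or block-averaged before being passed to an image model.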
Funding: This paper is financially supported by the National Important Mining Zone Database (No. 200210000004) and Prediction and Assessment of Mineral Resources and Social Service (No. 1212010331402).
Abstract: Geo-data are a foundation for the prediction and assessment of ore resources, so managing and making full use of those data, including the geography, geology, mineral deposits, aeromagnetics, gravity, geochemistry, and remote sensing databases, is very significant. We developed the national important mining zone database (NIMZDB) to manage 14 national important mining zone databases in support of a new round of ore deposit prediction. We found that attention should be paid to the following issues: ① data accuracy: integrity, logical consistency, and attribute, spatial, and temporal accuracy; ② management of both attribute and spatial data in the same system; ③ transforming data between MapGIS and ArcGIS; ④ data sharing and security; ⑤ data searches that can query both attribute and spatial data. The accuracy of input data is guaranteed, and the search, analysis, and translation of data between MapGIS and ArcGIS have been made convenient, through the development of a data checking module and a data management module based on MapGIS and ArcGIS. Using ArcSDE, we based data sharing on a client/server system, and attribute and spatial data are also managed in the same system.
Funding: Provided by the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (No. 2018SDKJ0501-2).
Abstract: Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When these techniques are applied to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, with scaling tending to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect leveled off above 70 samples in general and most quickly for EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
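The three transformations compared in this abstract could look like the sketch below when applied to a samples-by-species abundance matrix before clustering. The abstract does not define them precisely, so this assumes that "scaling" means per-column standardization to zero mean and unit sample standard deviation, and that the log-transformation is log(x + 1) so that zero counts remain defined. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def scale(x):
    """Standardize each column (species) to zero mean and unit sample standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std = np.where(std > 0, std, 1.0)  # leave constant columns unscaled
    return (x - mean) / std


def sqrt_transform(x):
    """Square-root transformation, which dampens the influence of dominant species."""
    return np.sqrt(np.asarray(x, dtype=float))


def log_transform(x):
    """log(x + 1) transformation (the +1 offset is an assumption to handle zero counts)."""
    return np.log1p(np.asarray(x, dtype=float))


# Toy abundance matrix: rows are survey samples, columns are species.
abundance = np.array([[0.0, 10.0, 400.0],
                      [4.0, 90.0, 100.0],
                      [16.0, 40.0, 900.0]])
variants = {
    "raw": abundance,
    "scaled": scale(abundance),
    "sqrt": sqrt_transform(abundance),
    "log": log_transform(abundance),
}
```

Each variant would then be passed to the same clustering method with the same number of clusters, so that differences in the resulting groups can be attributed to the transformation.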
Funding: Supported by Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Abstract: Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed to improve the data transformation process, and they resulted in better query processing time compared with that of the relational database. However, these approaches produced redundant tables in the resulting schema, which in turn consume a large amount of unnecessary storage and lead to high query processing time because the generated schema contains redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from a relational database to a column-oriented database is proposed. The proposed schema transformation technique is based on a combination of the denormalization approach, data access patterns, and a multiple-nested schema. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, and the query processing time and storage size are compared. Based on the experimental results, the proposed transformation technique showed significant improvement in terms of query processing time and storage space usage owing to the reduced number of column families in the column-oriented database.
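The denormalization and nesting idea mentioned in this abstract can be illustrated with plain Python dictionaries. This is only a sketch of the general pattern of embedding child rows inside their parent record; it is not the authors' algorithm, it ignores access patterns and multi-level nesting, and it does not talk to MySQL or MongoDB. The function name, table names, and columns are hypothetical.

```python
from collections import defaultdict


def embed_children(parents, children, parent_key, foreign_key, field):
    """Nest each child row under its parent row, producing one document per parent.

    parents, children: lists of dicts standing in for rows of two relational tables.
    parent_key: primary-key column of the parent table.
    foreign_key: column in the child table that references the parent.
    field: name of the list field that will hold the embedded child rows.
    """
    grouped = defaultdict(list)
    for row in children:
        # Drop the foreign key: it is redundant once the row is embedded.
        child = {k: v for k, v in row.items() if k != foreign_key}
        grouped[row[foreign_key]].append(child)
    return [{**parent, field: grouped.get(parent[parent_key], [])} for parent in parents]


# Two normalized "tables" and the nested documents derived from them.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]
documents = embed_children(customers, orders, "id", "customer_id", "orders")
```

Embedding trades some duplication and larger records for queries that no longer need a join, which is why the choice of what to nest is usually driven by how the data are read.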
Abstract: Ontologies are increasingly deployed as a computer-accessible representation of key semantics in various parts of the data life cycle, and ontology dynamics may therefore pose challenges to data management and re-use. Using examples from the geosciences, we analyze challenges raised by ontology dynamics, such as heavy reworking of data, semantic heterogeneity among data providers and users, and error propagation in cross-discipline data discovery and re-use. We also make recommendations to address these challenges: (1) form communities of practice on ontologies to reduce inconsistency and duplicated effort; (2) use ontologies in the procedure of data collection and make them accessible to data users; and (3) seek methods to speed up the reworking of data in a Semantic Web context.
Abstract: This paper describes a new type of transformed Landsat image (LBV images) and its application in discriminating soil gleization in the subtropical region of China. The LBV transformation was worked out by the present author for extracting useful information from original Landsat images. Using this method, three black-and-white images, the L image, B image, and V image, were computer generated from the original bands of a Landsat scene covering a large area of 34 528 km² in Hubei and Hunan provinces in south China. A color composite was then produced from these three images. These black-and-white and color images contained rich and definite geographic information. Through field work, the relationship between the colors on the composite and the land use/cover categories on the ground was established. Altogether, 37 composite colors and 70 ground feature categories can be discriminated. Finally, 17 land use/cover categories and 10 subregions suffering from soil gleization were determined, and the gleization area for the study area was estimated to be 731.3 km².
Funding: Supported by the National Grand Fundamental Research 973 Program of China (G1998030414).
Abstract: To use data on the Internet, it is necessary to extract data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm uses a comparing and correcting technique to generate the wrapper from the native characteristics of the HTT tree structure. The AGW algorithm can not only generate the wrapper automatically but also rebuild the data schema easily and reduce the computational complexity.