Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for augmenting time-series data and converting them into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address data imbalance and temporal dependence and to improve robustness to noise. Data augmentation is performed by adding Gaussian noise, with the noise level set to 0.002, to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to convert the time-series data into images while effectively visualizing their dynamic transitions. This enables the identification of patterns in time-series data and helps capture their sequential dependencies. For anomaly detection, the PatchCore model is applied, showing excellent performance, and the detected anomalous regions are represented as heat maps. Overlaying the anomaly map on the original image makes it possible to localize the regions where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time-series data are converted to images. Additionally, processing the data as images rather than as raw time series significantly reduced both the data size and the training time. The proposed method can provide an important springboard for research in anomaly detection using time-series data, and it helps keep the analysis of complex patterns in data lightweight.
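The two preprocessing steps described above are concrete enough to sketch. Below is a minimal Python illustration of Gaussian-noise augmentation (sigma = 0.002, as in the paper) and a from-scratch Markov Transition Field following its standard definition (quantile binning, first-order transition matrix, spread over time-index pairs); the function names and toy data are ours, not the paper's.

```python
import numpy as np

def augment_with_noise(x: np.ndarray, sigma: float = 0.002) -> np.ndarray:
    """Augment a 1-D series by adding zero-mean Gaussian noise (paper uses sigma=0.002)."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def markov_transition_field(x: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Standard MTF: quantile-discretize the series, estimate the Markov
    transition matrix between bins, then spread it over all time-index pairs."""
    # Assign each point to a quantile bin (0 .. n_bins-1).
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)
    # First-order Markov transition matrix W[i, j] = P(bin j follows bin i).
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize, avoid /0
    # MTF image: M[k, l] is the transition probability from bin(x_k) to bin(x_l).
    return W[np.ix_(q, q)]

# Toy usage: a noisy sine wave becomes an n x n image for PatchCore-style models.
series = np.sin(np.linspace(0, 8 * np.pi, 128))
image = markov_transition_field(augment_with_noise(series))
print(image.shape)  # (128, 128)
```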
Ontologies are increasingly deployed as a computer-accessible representation of key semantics in various parts of a data life cycle and, thus, ontology dynamics may pose challenges to data management and re-use. Using examples from the field of geosciences, we analyze challenges raised by ontology dynamics, such as heavy reworking of data, semantic heterogeneity among data providers and users, and error propagation in cross-discipline data discovery and re-use. We also make recommendations to address these challenges: (1) build communities of practice on ontologies to reduce inconsistency and duplicated efforts; (2) use ontologies in the procedure of data collection and make them accessible to data users; and (3) seek methods to speed up the reworking of data in a Semantic Web context.
This paper focuses on the integration and data transformation between GPS and totalstation. It emphasizes the way to transform WGS84 Cartesian coordinates into local two-dimensional plane coordinates and orthometric height. A GPS receiver, totalstation, radio, notebook computer and the corresponding software work together to form a new surveying system, the super-totalstation positioning system (SPS), and a new surveying model for terrestrial surveying. With the help of this system, the positions of detail points can be measured.
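As a rough illustration of the coordinate step, the sketch below converts WGS84 Cartesian (ECEF) coordinates to a local east-north-up plane about a reference station. This is the generic textbook rotation, not the SPS software's exact algorithm; the station coordinates are hypothetical, and obtaining orthometric height would additionally require a geoid model.

```python
import numpy as np

def ecef_to_enu(p_ecef, ref_ecef, ref_lat_deg, ref_lon_deg):
    """Rotate an ECEF position into the local East-North-Up frame at a
    reference station (textbook formulation; heights here are ellipsoidal)."""
    lat, lon = np.radians(ref_lat_deg), np.radians(ref_lon_deg)
    sl, cl = np.sin(lat), np.cos(lat)
    so, co = np.sin(lon), np.cos(lon)
    # Rows: unit vectors of East, North, Up expressed in ECEF coordinates.
    R = np.array([[-so,       co,      0.0],
                  [-sl * co, -sl * so, cl ],
                  [ cl * co,  cl * so, sl ]])
    return R @ (np.asarray(p_ecef) - np.asarray(ref_ecef))

# Hypothetical reference station and detail point (metres, ECEF).
ref = [-2694045.0, -4293642.0, 3857878.0]
pt  = [-2694035.0, -4293652.0, 3857890.0]
e, n, u = ecef_to_enu(pt, ref, 37.4, -122.1)
print(f"E={e:.2f} m, N={n:.2f} m, U={u:.2f} m")
```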
In order to use data information on the Internet, it is necessary to extract data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm utilizes comparing and correcting techniques to generate the wrapper, exploiting the native characteristics of the HTT tree structure. The AGW algorithm can not only generate the wrapper automatically, but also rebuild the data schema easily and reduce the complexity of the computation.
Geo-data are a foundation for the prediction and assessment of ore resources, so managing and making full use of those data, including the geography, geology, mineral deposits, aeromagnetics, gravity, geochemistry and remote sensing databases, is very significant. We developed the national important mining zone database (NIMZDB) to manage 14 national important mining zone databases to support a new round of ore deposit prediction. We found that attention should be paid to the following issues: (1) data accuracy: integrity, logical consistency, and attribute, spatial and temporal accuracy; (2) management of both attribute and spatial data in the same system; (3) transforming data between MapGIS and ArcGIS; (4) data sharing and security; and (5) data searches that can query both attribute and spatial data. Accuracy of the input data is guaranteed, and the search, analysis and translation of data between MapGIS and ArcGIS has been made convenient via the development of a data-checking module and a data-managing module based on MapGIS and ArcGIS. Using ArcSDE, we based data sharing on a client/server system, and attribute and spatial data are also managed in the same system.
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. Considering their applications to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, namely hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performances of these three methods considering different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the index of average proportion of non-overlap (APN). The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM showed relatively better performance in avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations including scaling, square-root, and log-transformation had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, wherein scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect leveled off over 70 samples in general and most quickly for EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study is helpful to ensure the credibility of the application and interpretation of clustering methods.
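The consistency idea can be made concrete with a simplified stability check: refit each clusterer on random subsamples and compare labels with the full-data solution. Note this uses the adjusted Rand index as a rough proxy rather than the APN formula itself, and the data, transformations, and settings below are illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import scale

rng = np.random.default_rng(0)
X = rng.lognormal(size=(120, 10))  # stand-in for a site x species abundance matrix

transforms = {"scaled": scale(X), "sqrt": np.sqrt(X), "log": np.log1p(X)}
models = {
    "HC": lambda k: AgglomerativeClustering(n_clusters=k),
    "KM": lambda k: KMeans(n_clusters=k, n_init=10, random_state=0),
    "EM": lambda k: GaussianMixture(n_components=k, random_state=0),
}

def stability(model_factory, Z, k, n_rep=20, frac=0.8):
    """Mean ARI between full-data labels and labels refit on random subsamples
    (a rough proxy for APN-style consistency; higher = more stable)."""
    full = model_factory(k).fit_predict(Z)
    scores = []
    for _ in range(n_rep):
        idx = rng.choice(len(Z), int(frac * len(Z)), replace=False)
        sub = model_factory(k).fit_predict(Z[idx])
        scores.append(adjusted_rand_score(full[idx], sub))
    return float(np.mean(scores))

for tname, Z in transforms.items():
    for mname, factory in models.items():
        print(tname, mname, round(stability(factory, Z, k=4), 3))
```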
Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from relational databases to NoSQL databases. A number of schema transformation techniques have been proposed to improve the data transformation process, and they result in better query processing time when compared to the relational database query processing time. However, these approaches produce redundant tables in the resulting schema that in turn consume large unnecessary storage and yield high query processing time due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from relational databases to column-oriented databases is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns and a multiple-nested schema. In order to validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, against which the query processing time and the storage size are compared. Based on the experimental results, the proposed transformation technique showed significant improvement in terms of query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
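A minimal sketch of the denormalization idea follows: rows from normalized parent/child tables are embedded into multiple-nested documents along the dominant read path. The table names and fields are invented, and the final MongoDB write (pymongo's insert_many) is left commented out as an assumed dependency.

```python
from collections import defaultdict
# from pymongo import MongoClient   # uncomment for a real MongoDB write

# Toy rows as they might come from normalized MySQL tables.
customers = [(1, "Alice"), (2, "Bob")]
orders    = [(10, 1, "2024-01-05"), (11, 1, "2024-02-11"), (12, 2, "2024-03-02")]
items     = [(10, "pen", 2), (10, "ink", 1), (11, "pad", 3), (12, "pen", 5)]

# Group children under parents once, following the read pattern
# "fetch a customer with all orders and their items".
items_by_order = defaultdict(list)
for order_id, sku, qty in items:
    items_by_order[order_id].append({"sku": sku, "qty": qty})

orders_by_customer = defaultdict(list)
for order_id, cust_id, date in orders:
    orders_by_customer[cust_id].append(
        {"order_id": order_id, "date": date, "items": items_by_order[order_id]}
    )

# One nested document per customer: no join at query time.
docs = [
    {"_id": cid, "name": name, "orders": orders_by_customer[cid]}
    for cid, name in customers
]

# MongoClient()["shop"]["customers"].insert_many(docs)
print(docs[0])
```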
To compare finite element analysis (FEA) predictions and stereovision digital image correlation (StereoDIC) strain measurements at the same spatial positions throughout a region of interest, a field comparison procedure is developed. The procedure includes (a) conversion of the finite element data into a triangular mesh, (b) selection of a common coordinate system, (c) determination of the rigid body transformation to place both measurements and FEA data in the same system and (d) interpolation of the FEA nodal information to the same spatial locations as the StereoDIC measurements using barycentric coordinates. For an aluminum Al-6061 double edge notched tensile specimen, FEA results are obtained using both the von Mises isotropic yield criterion and Hill's quadratic anisotropic yield criterion, with the unknown Hill model parameters determined using full-field specimen strain measurements for the nominally plane stress specimen. Using Hill's quadratic anisotropic yield criterion, the point-by-point comparisons of experimentally based full-field strains and stresses to finite element predictions are shown to be in excellent agreement, confirming the effectiveness of the field comparison process.
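Step (d) is standard enough to sketch: given a triangle of FEA nodes and their nodal values, the field at a StereoDIC measurement point is the barycentric-weighted average. The mesh, nodal strains, and query point below are toy values, not the paper's data.

```python
import numpy as np

def barycentric_interpolate(tri, nodal_vals, p):
    """Interpolate a nodal field at point p inside triangle tri (3x2 array)
    using barycentric coordinates (negative weights mean p is outside)."""
    a, b, c = tri
    T = np.column_stack((b - a, c - a))   # 2x2 edge matrix
    w1, w2 = np.linalg.solve(T, p - a)    # weights of vertices b and c
    w = np.array([1.0 - w1 - w2, w1, w2])
    return w @ np.asarray(nodal_vals), w

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
strains = [0.010, 0.014, 0.012]           # e.g. FEA strain at the 3 nodes
value, weights = barycentric_interpolate(tri, strains, np.array([0.25, 0.25]))
print(value, weights)   # 0.0115 with weights [0.5, 0.25, 0.25]
```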
Understanding data transformation scripts is an essential task for data analysts who write code to process data. However, this can be challenging, especially when encountering unfamiliar scripts. Comments can help users understand data transformation code, but well-written comments are not always present. Visualization methods have been proposed to help analysts understand data transformations, but they generally require a separate view, which may distract users and entail efforts for connecting visualizations and code. In this work, we explore the use of in situ program visualization to help data analysts understand data transformation scripts. We present CodeLin, a new visualization method that combines word-sized glyphs for presenting transformation semantics and a lineage graph for presenting data lineage in an in situ manner. Through a use case, code pattern demonstrations, and a preliminary user study, we demonstrate the effectiveness and usability of CodeLin. We further discuss how visualization can help users understand data transformation code.
The Multivariate AZTI's Marine Biotic Index (M-AMBI) was designed to indicate the ecological status of European coastal areas. Based upon samples collected from 2009 to 2012 in the Bohai Bay, we tested the response of variations of M-AMBI calculated using biomass (M-BAMBI), with different transformations of the raw data. The results showed that the ecological quality of most areas in the study, as indicated by M-AMBI, ranged from moderate to bad status, with worse status in the coastal areas, especially around the estuaries, harbors and outfalls, and better status in the offshore areas except the area close to oil platforms or disposal sites. Despite large variations in the nature of the input data, all variations of M-AMBI gave similar spatial and temporal distribution patterns of the ecological status within the bay and showed high correlation between them. The agreement of the new ecological status obtained from all M-AMBI variations, which were calculated according to linear regression, was almost perfect. The benthic quality, assessed using different input data, could be related to human pressures in the bay, such as water discharges, land reclamation, and dredged sediment and drill cuttings disposal sites. It seems that M-BAMBI was more effective than M-NAMBI (M-AMBI calculated using abundance data) in indicating human pressures in the bay. Finally, indices calculated with more severe transformations, such as presence/absence data, could not indicate the higher density of human pressures in the coastal areas of the northern part of our study area, but those calculated using a mild transformation (i.e., square root) did.
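The severity gradient of transformations the study compares, from mild square root through drastic presence/absence, is easy to make concrete; the matrix below is a toy stand-in for a station-by-species biomass table.

```python
import numpy as np

biomass = np.array([[12.0, 0.0, 3.5],
                    [ 0.4, 8.1, 0.0],
                    [25.0, 1.2, 0.7]])   # toy station x species table

transforms = {
    "none":     biomass,
    "sqrt":     np.sqrt(biomass),              # mild
    "log":      np.log1p(biomass),             # stronger
    "pres/abs": (biomass > 0).astype(float),   # most severe: identity only
}
for name, t in transforms.items():
    print(name, np.round(t, 2).tolist())
```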
This paper describes a new type of transformed Landsat image (LBV images) and their application in discriminating soil gleization in a subtropical region of China. The LBV transformation was worked out by the present author for extracting useful information from original Landsat images. Using this method, three black-and-white images, the L image, B image and V image, were computer-generated from the original bands of a Landsat scene, which covers a large area of 34,528 km² in Hubei and Hunan provinces in south China. A color composite was then produced from these three images. These black-and-white and color images contained rich and definite geographic information. Through field work, the relationship between the colors on the composite and the land use/cover categories on the ground was established; 37 composite colors and 70 ground feature categories could be discriminated altogether. Finally, 17 land use/cover categories and 10 subregions suffering from soil gleization were determined, and the gleization area for the study area was estimated to be 731.3 km².
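LBV images are linear combinations of the original Landsat bands. The abstract does not give the coefficients, so the weight matrix below is an explicitly hypothetical placeholder that only demonstrates the mechanics: a per-pixel linear transform of a band stack followed by an RGB composite.

```python
import numpy as np

bands = np.random.rand(4, 256, 256)   # stand-in for 4 co-registered Landsat bands

# HYPOTHETICAL coefficients -- NOT the published LBV weights.
# Each row maps a pixel's band vector to one output image (L, B, V).
coeff = np.array([[0.25, 0.25, 0.25, 0.25],    # "L": overall radiance level
                  [-0.5, -0.5,  0.5,  0.5],    # "B": visible vs infrared balance
                  [-1.0,  1.0,  1.0, -1.0]])   # "V": a change-vector-like contrast

L, B, V = np.tensordot(coeff, bands, axes=([1], [0]))   # three grayscale images

def stretch(img):
    """Linear stretch to 0-255 for display."""
    lo, hi = img.min(), img.max()
    return ((img - lo) / (hi - lo) * 255).astype(np.uint8)

rgb_composite = np.dstack([stretch(L), stretch(B), stretch(V)])
print(rgb_composite.shape)   # (256, 256, 3)
```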
The 2011 Tohoku-oki earthquake, which occurred on 11 March 2011, was a great earthquake with a seismic magnitude of Mw 9.1, preceded by an Mw 7.5 event. Focusing on this great earthquake, we applied the Hilbert-Huang transform (HHT) analysis method to the one-second-interval records at seven superconducting gravimeter (SG) stations and seven broadband seismic (BS) stations to carry out spectrum analysis and compute the energy-frequency-time distribution. Tidal effects were removed from the SG data with the TSoft software before the data series were transformed by the HHT method. Based on the HHT spectra and the marginal spectra from the records at the selected seven SG stations and seven BS stations, we found anomalous signals in terms of energy. The dominant frequencies of the anomalous signals are about 0.13 Hz in the SG records and 0.2 Hz in the seismic data, and the anomalous signals occurred one week or two to three days prior to the event. Taking into account that no typhoon event occurred in this period, we may conclude that these anomalous signals might be related to the great earthquake event.
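The HHT pipeline (empirical mode decomposition, then Hilbert spectra and a marginal spectrum) can be sketched with the community PyEMD package and SciPy, both assumed dependencies here; the synthetic signal stands in for detided SG or BS records.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD   # pip install EMD-signal (assumed dependency)

fs = 1.0                                   # 1-second sampling, as in the SG records
t = np.arange(0, 4096) / fs
sig = np.sin(2 * np.pi * 0.13 * t) + 0.3 * np.random.randn(t.size)  # toy signal

imfs = EMD().emd(sig)                      # intrinsic mode functions

# Instantaneous amplitude/frequency of each IMF via the analytic signal,
# accumulated into a marginal (Hilbert) spectrum over frequency bins.
marginal = np.zeros(200)
f_edges = np.linspace(0, 0.5, 201)         # up to the Nyquist frequency (0.5 Hz)
for imf in imfs:
    analytic = hilbert(imf)
    amp = np.abs(analytic)
    inst_f = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
    idx = np.clip(np.digitize(inst_f, f_edges) - 1, 0, 199)
    np.add.at(marginal, idx, amp[:-1] ** 2)

print("peak frequency:", f_edges[np.argmax(marginal)], "Hz")
```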
Because of cloudy and rainy weather in south China, optical remote sensing images often cannot be obtained easily. Using the regional trial results in Baoying, Jiangsu province, this paper explored the fusion model and effect of ENVISAT/SAR and HJ-1A satellite multispectral remote sensing images. Based on the ARSIS strategy, using the wavelet transform and the Interaction between the Band Structure Model (IBSM), the research performed wavelet decomposition of the ENVISAT satellite SAR and HJ-1A satellite CCD images, reconstructed the low- and high-frequency coefficients, and obtained the fused images through the inverse wavelet transform. Given that the low- and high-frequency images have different characteristics in different areas, different self-adaptive fusion rules that can enhance the integration process were adopted, with comparisons against the PCA transformation, IHS transformation and other traditional methods by subjective and corresponding quantitative evaluation. Furthermore, the research extracted the bands and NDVI values around the fusion with GPS samples, and analyzed and explained the fusion effect. The results showed that the spectral distortion of the wavelet fusion, IHS transform and PCA transform images was 0.1016, 0.3261 and 1.2772, respectively, and the entropy was 14.7015, 11.8993 and 13.2293, respectively, with wavelet fusion the highest. The wavelet method maintained good spectral capability and visual effects while improving the spatial resolution; its information interpretation effect was much better than that of the other two methods.
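The fusion rule described, keeping the multispectral low-frequency content while injecting high-frequency detail, can be sketched with PyWavelets. The images, wavelet, and level are illustrative, and the paper's actual method adds IBSM-based, locally adaptive rules on top of this baseline.

```python
import numpy as np
import pywt  # PyWavelets (assumed dependency)

def fuse(ms_band, sar, wavelet="db4", level=2):
    """Simple ARSIS-style fusion: approximation from the multispectral band,
    detail coefficients chosen by max absolute value from either image."""
    c_ms  = pywt.wavedec2(ms_band, wavelet, level=level)
    c_sar = pywt.wavedec2(sar, wavelet, level=level)
    fused = [c_ms[0]]                          # keep multispectral low-frequency
    for dms, dsar in zip(c_ms[1:], c_sar[1:]):
        fused.append(tuple(
            np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(dms, dsar)
        ))
    return pywt.waverec2(fused, wavelet)

ms  = np.random.rand(256, 256)   # toy HJ-1A CCD band
sar = np.random.rand(256, 256)   # toy ENVISAT SAR intensity, co-registered
print(fuse(ms, sar).shape)       # (256, 256)
```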
Metabolomics, as a research field and a set of techniques, studies the entire complement of small molecules in biological samples. Metabolomics is emerging as a powerful tool, generally for precision medicine. In particular, integration of the microbiome and metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated. Preprocessing/pretreating and normalizing procedures on metabolomics data are usually required before statistical analysis. In this review article, we comprehensively review the various methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values, detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical data analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
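Three of the reviewed steps, half-minimum imputation for missing values, log transformation, and Pareto scaling, are sketched below on a toy feature table. The specific choices are illustrative, since the review stresses that the right combination depends on the hypothesis, the data, and the downstream statistics.

```python
import numpy as np

X = np.array([[120.0, np.nan, 15.2],
              [ 98.5,  4.1,  np.nan],
              [150.3,  3.8,  14.1],
              [110.2,  4.5,  16.8]])   # samples x metabolites, toy intensities

# 1) Missing values: half-minimum imputation per metabolite (common when
#    values are assumed to fall below the detection limit).
col_min = np.nanmin(X, axis=0)
X = np.where(np.isnan(X), col_min / 2.0, X)

# 2) Transformation: log to tame right-skewed intensity distributions.
X = np.log2(X)

# 3) Pareto scaling: center, then divide by sqrt of the standard deviation;
#    shrinks large fold-changes less aggressively than unit-variance scaling.
X = (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

print(np.round(X, 3))
```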
Data standardization is an important part of data preprocessing, which directly affects the feasibility and distinction of the indicator system. This study proposes a data standardization framework that achieves flexible data standardization through data feature identification, cluster analysis, and weighted data transformation. The proposed method can handle locally inflated distributions with long tails. The results of this study enrich the method library of data standardization, giving researchers more targeted data differentiation capabilities when establishing indicator systems.
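A rough sketch of the framework's flow: identify each indicator's distributional features, then pick a transformation accordingly. Skewness is used here as an assumed identification criterion, and the paper's cluster-analysis and weighting steps are not reproduced.

```python
import numpy as np
from scipy.stats import skew

def standardize(col: np.ndarray) -> np.ndarray:
    """Feature-aware min-max standardization: long-tailed indicators are
    log-compressed first so the tail does not flatten the bulk of the data.
    (The skewness threshold of 2 is an illustrative choice, not the paper's.)"""
    if skew(col) > 2:
        col = np.log1p(col - col.min())
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo) if hi > lo else np.zeros_like(col)

rng = np.random.default_rng(1)
indicators = np.column_stack([
    rng.normal(50, 10, 500),       # well-behaved indicator
    rng.lognormal(0, 1.5, 500),    # locally inflated, long right tail
])
standardized = np.apply_along_axis(standardize, 0, indicators)
print(standardized.min(axis=0), standardized.max(axis=0))
```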
VOFilter is an XML-based filter developed by the Chinese Virtual Observatory project to transform tabular data files from VOTable format into OpenDocument format. VOTable is an XML format defined for the exchange of tabular data in the context of the Virtual Observatory (VO). It is the first Proposed Recommendation defined by the International Virtual Observatory Alliance, and has obtained wide support from both the VO community and many astronomy projects. OpenOffice.org is a mature, open-source office application suite with the advantage of native support for the industry-standard OpenDocument XML file format. Using the VOFilter, VOTable files can be loaded in OpenOffice.org Calc, a spreadsheet application, and then displayed and analyzed like other spreadsheet files. Here, the VOFilter acts as a connector, bridging the coming VO with current office applications. We introduce the Virtual Observatory and the technical background of the VOFilter. Its workflow, installation and usage are presented. Existing problems and limitations are also discussed, together with future development plans.
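VOFilter itself performs the conversion inside OpenOffice.org. As a rough modern stand-in for the same round trip, a VOTable can be read with astropy and written to an .ods spreadsheet via pandas (whose odf engine requires odfpy); this is assumed tooling, not the VOFilter code path.

```python
# Assumed dependencies: astropy, pandas, odfpy (pandas' "odf" engine).
from astropy.io.votable import parse_single_table
import pandas as pd

def votable_to_ods(votable_path: str, ods_path: str) -> None:
    """Load the first table of a VOTable file and save it as an
    OpenDocument spreadsheet readable by OpenOffice/LibreOffice Calc."""
    table = parse_single_table(votable_path).to_table()  # astropy Table
    df = table.to_pandas()
    df.to_excel(ods_path, engine="odf", index=False)

# votable_to_ods("catalog.vot", "catalog.ods")   # hypothetical file names
```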
Pharmacotranscriptomic profiles, which capture drug-induced changes in gene expression, offer vast potential for computational drug discovery and are widely used in modern medicine. However, current computational approaches neglect the associations within gene–gene functional networks and leave the systematic relationship between drug efficacy and the reversal effect unrevealed. Here, we developed a new genome-scale functional module (GSFM) transformation framework to quantitatively evaluate drug efficacy for in silico drug discovery. GSFM employs four biologically interpretable quantifiers, GSFM_Up, GSFM_Down, GSFM_ssGSEA, and GSFM_TF, to comprehensively evaluate the multidimensional activities of each functional module (FM) at the gene level, pathway level, and transcriptional regulatory network level. Through a data transformation strategy, GSFM effectively converts noisy and potentially unreliable gene expression data into a more dependable FM activity matrix, significantly outperforming other methods in terms of both robustness and accuracy. Besides, we found a positive correlation between RSGSFM and drug efficacy, suggesting that RSGSFM could serve as a representative measure of drug efficacy. Furthermore, we identified WYE-354, perhexiline, and NTNCB as candidate therapeutic agents for the treatment of breast invasive carcinoma, lung adenocarcinoma, and castration-resistant prostate cancer, respectively. Results from in vitro and in vivo experiments validated that all identified compounds exhibit potent anti-tumor effects, providing proof-of-concept for our computational approach.
Ⅰ. INTRODUCTION. In recent years, the Ministry of Public Security has been actively promoting a nationwide initiative known as 'evidence transformation of financial analysis', that is, the forensic transformation of financial data analysis results into legally admissible criminal evidence.