With the advent of the big data era, real-time data analysis and decision-support systems have been recognized as essential tools for enhancing enterprise competitiveness and optimizing the decision-making process. This study aims to explore the development strategies of real-time data analysis and decision-support systems, and to analyze their application status and future development trends in various industries. The article first reviews the basic concepts and importance of real-time data analysis and decision-support systems, and then discusses in detail key technical aspects such as system architecture, data collection and processing, analysis methods, and visualization techniques.
In the section 'Track decoding' of this article, one paragraph was inadvertently omitted after the text '…shows the flow diagram of the Tr2-1121 track mode.' The missing paragraph is provided below.
DNA microarray technology is an extremely effective technique for studying gene expression patterns in cells, and the main challenge currently faced by this technology is how to analyze the large amount of gene expression data generated. To address this, this paper employs a mixed-effects model to analyze gene expression data. In terms of data selection, 1176 genes from the white mouse gene expression dataset under two experimental conditions were chosen, setting up two conditions, pneumococcal infection and no infection, and constructing a mixed-effects model. After preprocessing the gene chip information, the data were imported into the model, preliminary results were calculated, and permutation tests were performed, with GSEA used to biologically validate the preliminary results. The final dataset consists of 20 groups of gene expression data from pneumococcal infection, which categorizes functionally related genes based on the similarity of their expression profiles, facilitating the study of genes with unknown functions.
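The permutation-testing step described above can be sketched as follows. This is an illustrative two-sample permutation test on synthetic expression values for a single gene; the group sizes, effect size, and random seed are assumptions, and it is not the paper's actual mixed-effects or GSEA pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(infected, control, n_perm=2000, rng=rng):
    """Two-sample permutation test on the difference of group means."""
    observed = infected.mean() - control.mean()
    pooled = np.concatenate([infected, control])
    n = len(infected)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the two groups
        diff = pooled[:n].mean() - pooled[n:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

# Synthetic expression values for one gene under the two conditions.
infected = rng.normal(2.0, 0.5, size=20)   # up-regulated after infection
control = rng.normal(0.0, 0.5, size=20)
obs, p = permutation_test(infected, control)
```

With a clear mean shift, the permuted differences almost never exceed the observed one, so the p-value sits near its minimum of 1/(n_perm+1).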
Semantic communication (SemCom) aims to achieve high-fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility, and thus developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) becomes indispensable. In that regard, instead of discarding previously transmitted information, incremental knowledge-based HARQ (IK-HARQ) is deemed a more effective mechanism that can sufficiently utilize the information semantics. However, considering the possible existence of semantic ambiguity in image transmission, a simple bit-level cyclic redundancy check (CRC) might compromise the performance of IK-HARQ. Therefore, there emerges a strong incentive to revolutionize the CRC mechanism, thus more effectively reaping the benefits of both SemCom and HARQ. In this paper, built on top of Swin Transformer-based joint source-channel coding (JSCC) and IK-HARQ, we propose a semantic image transmission framework, SC-TDA-HARQ. In particular, different from the conventional CRC, we introduce a topological data analysis (TDA)-based error detection method, which capably digs out the inner topological and geometric information of images, to capture semantic information and determine the necessity for re-transmission. Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework, especially under limited bandwidth conditions, and manifest the superiority of the TDA-based error detection method in image transmission.
Cervical cancer, a leading malignancy globally, poses a significant threat to women's health, with an estimated 604,000 new cases and 342,000 deaths reported in 2020 [1]. As cervical cancer is closely linked to human papillomavirus (HPV) infection, early detection relies on HPV screening; however, late-stage prognosis remains poor, underscoring the need for novel diagnostic and therapeutic targets [2].
There are some limitations when we apply conventional methods to analyze the massive amounts of seismic data acquired with high-density spatial sampling, since processors usually obtain the properties of raw data from common shot gathers or other datasets located at certain points or along lines. We propose a novel method in this paper to observe seismic data on time slices from spatial subsets. The composition of a spatial subset and the unique character of orthogonal or oblique subsets are described, and pre-stack subsets are shown by 3D visualization. In seismic data processing, spatial subsets can be used for the following purposes: (1) to check the trace distribution uniformity and regularity; (2) to observe the main features of ground-roll and linear noise; (3) to find abnormal traces from slices of datasets; and (4) to QC the results of pre-stack noise attenuation. The field data application shows that seismic data analysis in spatial subsets is an effective method that may lead to a better discrimination among various wavefields and help us obtain more information.
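As a minimal illustration of extracting a time slice and a decimated (orthogonal) spatial subset from a pre-stack volume with array indexing; the volume shape, axis order, and slice index are invented for the demo, and real spatial subsets would be assembled from trace headers:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy pre-stack volume: (inline, crossline, time-sample) amplitudes.
volume = rng.normal(size=(40, 60, 500))

t_index = 250
# All traces at one travel time: a horizontal time slice.
time_slice = volume[:, :, t_index]
# Every fourth trace in both spatial directions: an orthogonal spatial subset.
orthogonal_subset = volume[::4, ::4, t_index]
```

Viewing such slices side by side is what makes irregular trace coverage or abnormal traces stand out visually.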
The quantized kernel least mean square (QKLMS) algorithm is an effective nonlinear adaptive online learning algorithm with good performance in constraining the growth of network size through the use of quantization of the input space. It can serve as a powerful tool for complex computing in network services and applications. With the purpose of compressing the input to further improve learning performance, this article proposes a novel QKLMS with entropy-guided learning, called EQ-KLMS. Under the consecutive square entropy learning framework, the basic idea of the entropy-guided learning technique is to measure the uncertainty of the input vectors used for QKLMS and to delete those data with larger uncertainty, which are insignificant or likely to cause learning errors. The dataset is thereby compressed. Consequently, by using square entropy, the learning performance of the proposed EQ-KLMS is improved with high precision and low computational cost. The proposed EQ-KLMS is validated using a weather-related dataset, and the results demonstrate the desirable performance of our scheme.
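A minimal sketch of the quantization idea underlying QKLMS (not the paper's EQ-KLMS; the kernel width, step size, and quantization radius are arbitrary illustrative choices): each new input either updates the coefficient of its nearest existing centre, when it lies within the quantization radius, or is added as a new centre.

```python
import numpy as np

class QKLMS:
    """Quantized kernel LMS with a Gaussian kernel (illustrative sketch)."""
    def __init__(self, eta=0.5, sigma=1.0, eps=0.3):
        self.eta, self.sigma, self.eps = eta, sigma, eps
        self.centers, self.alphas = [], []

    def _kernel(self, u, v):
        return np.exp(-np.sum((u - v) ** 2) / (2 * self.sigma ** 2))

    def predict(self, u):
        return sum(a * self._kernel(c, u)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, u, d):
        e = d - self.predict(u)  # prediction error on the new sample
        if self.centers:
            dists = [np.linalg.norm(u - c) for c in self.centers]
            j = int(np.argmin(dists))
            if dists[j] <= self.eps:          # quantize: merge into nearest centre
                self.alphas[j] += self.eta * e
                return e
        self.centers.append(np.asarray(u, float))  # otherwise grow the network
        self.alphas.append(self.eta * e)
        return e

rng = np.random.default_rng(1)
f = QKLMS()
x = rng.uniform(-1, 1, size=(500, 1))
y = np.sin(2 * x[:, 0])                       # target nonlinearity
for u, d in zip(x, y):
    f.update(u, d)
n_centers = len(f.centers)
```

The point of the demo is the network-size constraint: 500 training samples collapse to a handful of centres spaced roughly `eps` apart.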
Space debris poses a serious threat to human space activities and needs to be measured and cataloged. As a new technology for space target surveillance, diffuse reflection laser ranging (DRLR) achieves much higher measurement accuracy than microwave radar and optoelectronic measurement. Based on the laser ranging data of space debris from the DRLR system at Shanghai Astronomical Observatory acquired in March-April 2013, the characteristics and precision of the laser ranging data are analyzed and their applications in orbit determination of space debris are discussed, which is implemented for the first time in China. The experiment indicates that the precision of the laser ranging data can reach 39 cm to 228 cm. When the data are sufficient (four arcs measured over three days), the orbital accuracy of space debris can reach 50 m.
Detection of a periodic signal hidden in noise is the goal of superconducting gravimeter (SG) data analysis. Due to spikes, gaps, datum shifts (offsets), and other disturbances, the traditional FFT method shows inherent limitations. Instead, least squares spectral analysis (LSSA) has shown itself more suitable than Fourier analysis for gappy, unequally spaced, and unequally weighted data series in a variety of applications in geodesy and geophysics. This paper reviews the principle of LSSA and gives a possible strategy for the analysis of time series obtained from the Canadian Superconducting Gravimeter Installation (CGSI), with gaps, offsets, unequal sampling decimation of the data, and unequally weighted data points.
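The core of LSSA can be sketched as fitting a cosine/sine pair at each trial frequency directly to the unevenly spaced samples by least squares, with spectral power taken as the fraction of variance explained. This is a simplified unweighted version; the CGSI data handling, weighting, and frequency grid below are invented for the demo.

```python
import numpy as np

def lssa_power(t, y, freqs):
    """Least-squares spectral power of an unevenly sampled series."""
    y = y - y.mean()
    ss_total = float(np.sum(y ** 2))
    power = []
    for f in freqs:
        # Design matrix: one cosine and one sine column at trial frequency f.
        A = np.column_stack([np.cos(2 * np.pi * f * t),
                             np.sin(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        fit = A @ coef
        power.append(float(np.sum(fit ** 2)) / ss_total)  # variance fraction
    return np.array(power)

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 100, size=400))   # gappy, unequally spaced samples
y = np.sin(2 * np.pi * 0.1 * t) + 0.3 * rng.normal(size=t.size)
freqs = np.linspace(0.01, 0.5, 200)
power = lssa_power(t, y, freqs)
f_peak = freqs[int(np.argmax(power))]
```

Unlike the FFT, nothing here requires equal spacing, so gaps and irregular sampling need no pre-interpolation.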
In this paper, variational inference is studied on manifolds with certain metrics. To solve the problem, the analysis is first developed for variational Bayes on a Lie group, and then extended to manifolds that can be approximated by Lie groups. The convergence of the proposed algorithm with respect to the manifold metric is then proved for the two iterative processes: the variational Bayesian expectation (VB-E) step and the variational Bayesian maximization (VB-M) step. Moreover, the effectiveness of different metrics for Bayesian analysis is discussed.
This paper presents the development and application of production data analysis software that can analyze and forecast the production performance and reservoir properties of shale gas wells. The theories used in the study are based on analytical and empirical approaches. The software's reliability has been confirmed through comparisons with a commercial package. Using transient data from multi-stage hydraulically fractured horizontal wells, it was confirmed that the modified hyperbolic method showed an error of approximately 4% relative to the actual estimated ultimate recovery (EUR). On the basis of the developed model, reliable productivity forecasts have been obtained by analyzing field production data from wells in Canada. The EUR was computed as 9.6 Bcf using the modified hyperbolic method; employing the Power Law Exponential method, the EUR would be 9.4 Bcf. The models developed in this study will allow new analytical and empirical theories to be integrated more readily in the future than commercial models allow.
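The modified hyperbolic method mentioned above is commonly described as an Arps hyperbolic decline that switches to exponential decline once the instantaneous decline rate falls to a limit value. A sketch of that rate model and a simple cumulative (EUR-style) integration follows; all parameter values are illustrative, not the Canadian field data.

```python
import numpy as np

def modified_hyperbolic_rate(t, qi, Di, b, Dlim):
    """Arps hyperbolic decline switching to exponential at limit decline Dlim."""
    t = np.asarray(t, float)
    # Time at which the instantaneous decline D(t) = Di/(1 + b*Di*t) reaches Dlim.
    t_sw = (Di / Dlim - 1.0) / (b * Di)
    q_sw = qi / (1.0 + b * Di * t_sw) ** (1.0 / b)   # rate at the switch point
    q_hyp = qi / (1.0 + b * Di * t) ** (1.0 / b)
    q_exp = q_sw * np.exp(-Dlim * (t - t_sw))
    return np.where(t < t_sw, q_hyp, q_exp)

# Illustrative per-day parameters, not field-calibrated values.
t = np.arange(0, 30 * 365)                            # 30 years, daily steps
q = modified_hyperbolic_rate(t, qi=10_000.0, Di=0.005, b=1.1, Dlim=0.0001)
eur = float(q.sum())    # daily rates summed: a simple cumulative-volume estimate
```

The switch to exponential decline is what keeps the long-time cumulative finite; a pure hyperbolic tail with b > 1 would overstate EUR.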
RNA sequencing (RNA-seq), based on next-generation sequencing technologies, has rapidly become a standard and popular technology for transcriptome analysis. However, serious challenges still exist in analyzing and interpreting RNA-seq data. With the development of high-throughput sequencing technology, the sequencing depth of RNA-seq data has increased explosively, and the intricate biological processes of the transcriptome are more complicated and diversified than imagined. Moreover, most organisms still have no available reference genome or have only incomplete genome annotations. Therefore, a large number of bioinformatics methods for various transcriptomics studies have been proposed to effectively address these challenges. This review comprehensively summarizes the various studies in RNA-seq data analysis and their corresponding analysis methods, including genome annotation, quality control and pre-processing of reads, read alignment, transcriptome assembly, gene and isoform expression quantification, differential expression analysis, data visualization, and other analyses.
A novel study using LC-MS (liquid chromatography tandem mass spectrometry) coupled with multivariate data analysis and bioactivity evaluation was established for discrimination of the aqueous extract and vinegar extract of Shixiao San. Batches of these two kinds of samples were subjected to analysis, and the datasets of sample codes, tR-m/z pairs, and ion intensities were processed with principal component analysis (PCA). The score plot showed a clear classification of the aqueous and vinegar groups, and the chemical markers making the greatest contributions to the differentiation were screened out on the loading plot. The chemical markers were identified by comparing their mass fragments and retention times with those of reference compounds and/or known compounds published in the literature. Based on the proposed strategy, quercetin-3-O-neohesperidoside, isorhamnetin-3-O-neohesperidoside, kaempferol-3-O-neohesperidoside, isorhamnetin-3-O-rutinoside, and isorhamnetin-3-O-(2G-a-l-rhamnosyl)-rutinoside were identified as representative markers distinguishing the vinegar extract from the aqueous extract. The anti-hyperlipidemic activities of the two processed extracts of Shixiao San were examined via serum levels of lipids, lipoprotein, and blood antioxidant enzymes in a rat hyperlipidemia model, and the vinegar extract, exerting strong lipid-lowering and antioxidative effects, was superior to the aqueous extract. Therefore, boiling with vinegar is predicted to be the most effective processing procedure for the anti-hyperlipidemic effect of Shixiao San. Furthermore, combining the changes in the metabolic profiling with the bioactivity evaluation, the five representative markers may be related to the observed anti-hyperlipidemic effect.
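The PCA score computation behind such a group separation can be sketched with a plain SVD. The sample counts, feature count, and elevated "marker" features below are synthetic stand-ins for the real tR-m/z intensity table.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA scores via SVD on the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T   # project onto the leading loadings

rng = np.random.default_rng(3)
# Synthetic ion-intensity matrix: 10 aqueous + 10 vinegar samples, 50 features.
aqueous = rng.normal(0.0, 1.0, size=(10, 50))
vinegar = rng.normal(0.0, 1.0, size=(10, 50))
vinegar[:, :5] += 4.0     # five marker features elevated in the vinegar extract
X = np.vstack([aqueous, vinegar])
scores = pca_scores(X)
```

Because the between-group shift dominates the variance, PC1 separates the two groups, which is exactly what a score plot shows; the rows of `Vt` (the loadings) are where marker features would be read off.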
A data processing method was proposed for eliminating the end restraint in triaxial tests of soil. A digital image processing method was used to calculate the local deformations and local stresses for any region on the surface of triaxial soil specimens. The principle and implementation of this digital image processing method were introduced, as well as the calculation method for local mechanical properties of soil specimens. Comparisons were made between the test results calculated from the data of the entire specimen and of local regions, and it was found that deformations were more uniform in the middle region than over the entire specimen. In order to quantify the non-uniform characteristic of deformation, non-uniformity coefficients of strain were defined and calculated. Traditional and end-lubricated triaxial tests were conducted under the same conditions to investigate the effect of using local region data for deformation calculation on eliminating the end restraint of specimens. After statistical analysis of all test results, it was concluded that for the tested soil specimens of size 39.1 mm × 80 mm, using the middle 35 mm region of traditional specimens in data processing was more effective at eliminating end restraint than end lubrication. Furthermore, the local data analysis in this paper was validated through comparisons with test results from other researchers.
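One simple way to quantify deformation non-uniformity is a coefficient-of-variation style measure over local strains; the paper's exact definition is not reproduced here, and the strain values below are invented to mimic end-restrained behaviour (low strain near the platens, higher strain mid-height).

```python
import numpy as np

def non_uniformity_coefficient(local_strains):
    """Ratio of standard deviation to mean of local strains (CoV-style measure)."""
    s = np.asarray(local_strains, float)
    return float(s.std(ddof=0) / s.mean())

# Hypothetical local axial strains (%) along a specimen, from image analysis.
entire = np.array([2.1, 3.4, 4.8, 5.2, 4.9, 3.3, 2.0])  # restrained ends strain less
middle = entire[2:5]                                      # middle region only
cv_entire = non_uniformity_coefficient(entire)
cv_middle = non_uniformity_coefficient(middle)
```

The middle-region coefficient comes out far smaller, which is the quantitative form of "deformations were more uniform in the middle region."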
Under Industry 4.0, the internet of things (IoT), especially radio frequency identification (RFID) technology, has been widely applied in manufacturing environments. This technology can bring convenience to production control and production transparency. Meanwhile, it generates increasing volumes of production data that are sometimes discrete, uncorrelated, and hard to use. Thus, an efficient analysis method is needed to utilize the invaluable data. This work provides an RFID-based production data analysis method for production control in IoT-enabled smart job shops. The physical configuration and operation logic of IoT-enabled smart job-shop production are first described. Based on that, an RFID-based production data model is built to formalize and correlate the heterogeneous production data. Then, an event-driven RFID-based production data analysis method is proposed to construct the RFID events and judge process command execution. Furthermore, a near-big-data approach is used to excavate hidden information and knowledge from the historical production data. A demonstrative case is studied to verify the feasibility of the proposed model and methods. It is expected that our work will provide a different insight into RFID-based production data analysis.
The proliferation of textual data in society is currently overwhelming; in particular, unstructured textual data is constantly generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, etc. While the amount of textual data is increasing rapidly, users' ability to summarise, understand, and make sense of such data for making better business and everyday decisions remains challenging. This paper studies how to analyse textual data, based on layered software patterns, to extract insightful user intelligence from a large collection of documents and to use such information to improve user operations and performance.
Big data analysis has penetrated all fields of society and brought about profound changes. However, there is relatively little research on big data supporting student management with college and university data. Taking student campus-card information as the research sample and scholarship evaluation as an example, the big data is analyzed using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students' daily behavior from multiple dimensions and can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism and votes of teachers and students. At the same time, students' absenteeism, physical health, and psychological status can be predicted in advance, which makes student management work more proactive, accurate, and effective.
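The clustering step can be sketched with a plain Lloyd's-algorithm K-Means on synthetic campus-card features. The feature definitions, the two behavioural groups, and the deterministic initialisation are assumptions for the demo, not the paper's Spark pipeline.

```python
import numpy as np

def kmeans(X, k, init_idx, n_iter=50):
    """Lloyd's algorithm: assign points to nearest centroid, recompute means."""
    centroids = X[init_idx].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(4)
# Hypothetical per-student features: [daily canteen spend (CNY), study-room hours].
regular = rng.normal([15.0, 6.0], [2.0, 1.0], size=(50, 2))
at_risk = rng.normal([60.0, 1.0], [5.0, 0.5], size=(50, 2))
X = np.vstack([regular, at_risk])
labels, centroids = kmeans(X, k=2, init_idx=[0, 50])  # one seed point per group
```

Cluster membership is the raw material for the downstream judgements the abstract mentions, e.g. flagging a cluster with unusual spending or attendance patterns for follow-up.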
With the rapid development of the Internet, many enterprises have launched their own network platforms. When users browse, search, and click the products on these platforms, most platforms keep records of these network behaviors; these records are often heterogeneous and are called log data. Analyzing and managing these heterogeneous log data effectively allows enterprises to grasp the behavioral characteristics of their platform users in time, realize targeted recommendations to users, increase product sales, and accelerate enterprise development. We first follow the process of big data collection, storage, analysis, and visualization to design the system. Then we adopt HDFS storage, Yarn resource management, and Nginx load balancing to build a Hadoop cluster to process the log data, and adopt MapReduce processing and the Hive data warehouse to analyze the log data and obtain the results. Finally, the results are displayed visually, and a log data analysis system is successfully constructed. Practice has shown that the system effectively realizes the collection, analysis, and visualization of log data and can accurately support product recommendation by enterprises. The system is stable and effective.
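The MapReduce-style counting such a pipeline performs can be illustrated in miniature with plain Python; the toy log lines and field layout are invented, and the real system runs on Hadoop and Hive rather than in-process.

```python
from collections import defaultdict
from itertools import groupby

# Toy log lines: "timestamp user action product".
logs = [
    "2023-01-01T10:00 u1 click p1",
    "2023-01-01T10:01 u2 click p2",
    "2023-01-01T10:02 u1 click p1",
    "2023-01-01T10:03 u3 search p1",
]

def mapper(line):
    """Emit (product, 1) for every click record."""
    ts, user, action, product = line.split()
    if action == "click":
        yield product, 1

def reducer(key, values):
    """Sum the counts for one product."""
    return key, sum(values)

# Map, then shuffle/sort by key, then reduce: the MapReduce data flow.
pairs = sorted(kv for line in logs for kv in mapper(line))
counts = dict(reducer(k, [v for _, v in group])
              for k, group in groupby(pairs, key=lambda kv: kv[0]))
```

Per-product click counts like these are the kind of aggregate a Hive query would produce as input to a recommendation step.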
A new dynamic model identification method is developed for continuous-time series analysis and forward-prediction applications. The quantum of data is defined over moving time intervals in sliding-window coordinates, compressing the size of stored data while retaining the resolution of information. Quantum vectors are introduced as the basis of a linear space for defining a Dynamic Quantum Operator (DQO) model of the system defined by its data stream. The transport of the quantum of compressed data between time-interval bins is modeled during the movement of the sliding time window. The DQO model is identified from samples of the real-time data flow over the sliding time window. A least-squares-fit identification method is used for evaluating the parameters of the quantum operator model, utilizing repeated use of the sampled data over a number of time steps. The method is tested on analyzing and forward-predicting air temperature variations from weather data, as well as methane concentration variations obtained from measurements of an operating mine. The results show efficient forward-prediction capabilities, surpassing those of neural networks and other methods on the same task.
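A minimal sketch of sliding-window least-squares identification for forward prediction: an ordinary AR model fitted with `numpy.linalg.lstsq` over the latest window, which is a simplification rather than the paper's Dynamic Quantum Operator; the window length, model order, and sinusoidal test series are arbitrary choices.

```python
import numpy as np

def ar_forecast(window, p=4):
    """Fit AR(p) coefficients on a sliding window by least squares,
    then forward-predict the next sample."""
    y = np.asarray(window, float)
    n = len(y)
    # Column j holds the lag-j regressor y[t-j] for t = p .. n-1.
    A = np.column_stack([y[p - j: n - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(A, y[p:], rcond=None)
    # Apply the identified model to the most recent p samples.
    return float(coef @ y[-1: -p - 1: -1])

t = np.arange(0, 200)
series = np.sin(0.1 * t)                  # stand-in for a temperature record
pred = ar_forecast(series[-100:], p=4)    # identify on the latest 100-sample window
true_next = np.sin(0.1 * 200)
```

Re-running the fit each time the window slides gives the same "identify on the moving window, predict one step ahead" loop the abstract describes.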
The issue of privacy protection in mobile social networks is a frontier topic in the field of social network applications. Existing research on user privacy protection in mobile social networks mainly focuses on privacy-preserving data publishing and access control. There is little research on the association of user privacy information, which makes it difficult to design personalized privacy protection strategies and also increases the complexity of user privacy settings. Therefore, this paper concentrates on the association of user privacy information using big data analysis tools, so as to provide data support for the design of personalized privacy protection strategies.
Funding (semantic image transmission study): supported in part by the National Key Research and Development Program of China under Grant 2024YFE0200600; in part by the National Natural Science Foundation of China under Grant 62071425; in part by the Zhejiang Key Research and Development Plan under Grant 2022C01093; in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LR23F010005; in part by the National Key Laboratory of Wireless Communications Foundation under Grant 2023KP01601; and in part by the Big Data and Intelligent Computing Key Lab of CQUPT under Grant BDIC-2023-B-001.
Funding (cervical cancer study): supported by a project funded by the Hebei Provincial Central Guidance Local Science and Technology Development Fund (236Z7714G).
Funding (EQ-KLMS study): supported by the National Key Technologies R&D Program of China under Grant No. 2015BAK38B01; the National Natural Science Foundation of China under Grant Nos. 61174103 and 61603032; the National Key Research and Development Program of China under Grant Nos. 2016YFB0700502, 2016YFB1001404, and 2017YFB0702300; the China Postdoctoral Science Foundation under Grant No. 2016M590048; the Fundamental Research Funds for the Central Universities under Grant No. 06500025; the University of Science and Technology Beijing - Taipei University of Technology Joint Research Program under Grant No. TW201610; and the Foundation from the Taipei University of Technology of Taiwan under Grant No. NTUT-USTB-105-4.
Funding (space debris laser ranging study): supported by the National Natural Science Foundation of China.
Funding: Funded by the Canadian National Center of Excellence GEOIDE.
Abstract: Detection of a periodic signal hidden in noise is the goal of Superconducting Gravimeter (SG) data analysis. Due to spikes, gaps, datum shifts (offsets) and other disturbances, the traditional FFT method shows inherent limitations. The least squares spectral analysis (LSSA) has instead shown itself to be more suitable than Fourier analysis for gappy, unequally spaced and unequally weighted data series in a variety of applications in geodesy and geophysics. This paper reviews the principle of LSSA and gives a possible strategy for the analysis of time series obtained from the Canadian Superconducting Gravimeter Installation (CGSI), with gaps, offsets, unequal sampling decimation of the data and unequally weighted data points.
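As a rough illustration of the LSSA principle described above (a generic sketch, not the CGSI processing chain; the signal, noise level and frequency grid are invented, and all points are weighted equally), each trial frequency gets its own least-squares fit of a sine/cosine pair, which works directly on gappy, unequally spaced samples:

```python
import numpy as np

def lssa(t, y, freqs):
    """Least squares spectral analysis for unequally spaced data:
    at each trial frequency f, fit y ~ a*cos(2*pi*f*t) + b*sin(2*pi*f*t)
    and report the fraction of variance explained by the fit."""
    y = y - y.mean()                     # remove the constant (datum) part
    power = []
    for f in freqs:
        A = np.column_stack([np.cos(2.0 * np.pi * f * t),
                             np.sin(2.0 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        fit = A @ coef
        power.append(fit @ fit / (y @ y))   # spectral value in [0, 1]
    return np.array(power)

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, 120))    # gappy, unequal sampling
y = 2.0 * np.sin(2.0 * np.pi * 0.8 * t) + 0.2 * rng.standard_normal(120)
freqs = np.linspace(0.1, 2.0, 96)
spec = lssa(t, y, freqs)
peak = freqs[np.argmax(spec)]               # recovered signal frequency
```

Unlike an FFT, no interpolation onto a regular grid is needed, which is why this approach tolerates gaps and offsets gracefully.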
Funding: This work was supported by the National Key Research and Development Program of China (No. 2016YFB0901900) and the National Natural Science Foundation of China (Nos. 61733018, 61333001, 61573344).
Abstract: In this paper, variational inference is studied on manifolds with certain metrics. To solve the problem, the analysis is first developed for variational Bayes on a Lie group and then extended to manifolds that can be approximated by Lie groups. The convergence of the proposed algorithm with respect to the manifold metric is then proved for its two iterative processes: the variational Bayesian expectation (VB-E) step and the variational Bayesian maximization (VB-M) step. Moreover, the effectiveness of different metrics for Bayesian analysis is discussed.
Funding: Supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20172510102090).
Abstract: This paper presents the development and application of production data analysis software that can analyze and forecast the production performance and reservoir properties of shale gas wells. The theories used in the study are based on analytical and empirical approaches, and the software's reliability has been confirmed through comparisons with a commercial package. Using transient data from multi-stage hydraulically fractured horizontal wells, it was confirmed that the modified hyperbolic method showed an error of approximately 4% relative to the actual estimated ultimate recovery (EUR). On the basis of the developed model, reliable productivity forecasts were obtained by analyzing field production data from wells in Canada: the EUR was computed as 9.6 Bcf using the modified hyperbolic method and 9.4 Bcf using the Power Law Exponential method. The models developed in this study will allow new analytical and empirical theories to be integrated more readily in the future than is possible with commercial models.
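A minimal sketch of the modified hyperbolic rate model referenced above, assuming the common formulation in which Arps' hyperbolic decline switches to exponential decline once the instantaneous decline rate falls to a terminal value Dlim (the parameter values below are arbitrary, not those of the Canadian wells):

```python
import numpy as np

def modified_hyperbolic_rate(t, qi, Di, b, Dlim):
    """Arps hyperbolic decline, switched to exponential decline once the
    instantaneous decline rate D(t) = Di / (1 + b*Di*t) reaches Dlim."""
    t = np.asarray(t, dtype=float)
    t_sw = (Di / Dlim - 1.0) / (b * Di)                 # switch time
    q_sw = qi / (1.0 + b * Di * t_sw) ** (1.0 / b)      # rate at switch
    hyp = qi / (1.0 + b * Di * t) ** (1.0 / b)
    exp = q_sw * np.exp(-Dlim * (t - t_sw))
    return np.where(t < t_sw, hyp, exp)

t = np.linspace(0.0, 30.0, 3001)    # forecast horizon in years
q = modified_hyperbolic_rate(t, qi=10.0, Di=1.2, b=1.1, Dlim=0.06)
# trapezoidal cumulative production over 30 years as an EUR proxy
eur = float(np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(t)))
```

Capping the late-time decline rate at Dlim is what keeps the b > 1 hyperbolic tail from overstating the EUR.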
Abstract: RNA sequencing (RNA-seq), based on next-generation sequencing technologies, has rapidly become a standard and popular technology for transcriptome analysis. However, serious challenges still exist in analyzing and interpreting RNA-seq data. With the development of high-throughput sequencing technology, the sequencing depth of RNA-seq data is increasing explosively, the biological processes underlying the transcriptome are more complicated and diversified than previously imagined, and most organisms still have no available reference genome or only incomplete genome annotations. Therefore, a large number of bioinformatics methods have been proposed to address these challenges. This review comprehensively summarizes the various studies in RNA-seq data analysis and their corresponding analysis methods, including genome annotation, quality control and pre-processing of reads, read alignment, transcriptome assembly, gene and isoform expression quantification, differential expression analysis, data visualization and other analyses.
Funding: Natural Science Foundation of China (T11036061/T0108).
Abstract: A novel study using LC-MS (liquid chromatography coupled with tandem mass spectrometry) combined with multivariate data analysis and bioactivity evaluation was established for discriminating the aqueous extract and vinegar extract of Shixiao San. Batches of these two kinds of samples were subjected to analysis, and the datasets of sample codes, tR-m/z pairs and ion intensities were processed with principal component analysis (PCA). The score plot showed a clear classification of the aqueous and vinegar groups, and the chemical markers making the greatest contributions to the differentiation were screened out on the loading plot. The identities of the chemical markers were confirmed by comparing their mass fragments and retention times with those of reference compounds and/or known compounds published in the literature. Based on the proposed strategy, quercetin-3-O-neohesperidoside, isorhamnetin-3-O-neohesperidoside, kaempferol-3-O-neohesperidoside, isorhamnetin-3-O-rutinoside and isorhamnetin-3-O-(2G-α-L-rhamnosyl)-rutinoside were identified as representative markers distinguishing the vinegar extract from the aqueous extract. The anti-hyperlipidemic activities of the two processed extracts of Shixiao San were examined via serum levels of lipids, lipoproteins and blood antioxidant enzymes in a rat hyperlipidemia model; the vinegar extract, exerting strong lipid-lowering and antioxidative effects, was superior to the aqueous extract. Therefore, boiling with vinegar was identified as the best processing procedure for the anti-hyperlipidemic effect of Shixiao San. Furthermore, combining the changes in the metabolic profiling with the bioactivity evaluation suggests that the five representative markers may be related to the observed anti-hyperlipidemic effect.
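The PCA step described above can be sketched as follows (a toy example with simulated intensities, not the actual Shixiao San dataset): samples are rows, tR-m/z features are columns, and the scores and loadings come from an SVD of the mean-centred matrix. Separation along PC1 mimics the aqueous/vinegar split, and the largest PC1 loadings point at the discriminating markers.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix:
    rows are samples (extracts), columns are tR-m/z features."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # feature contributions
    return scores, loadings

rng = np.random.default_rng(2)
# simulated ion-intensity table: 6 aqueous + 6 vinegar samples, 50 features;
# five "marker" features are shifted in the vinegar group
X = rng.normal(0.0, 1.0, size=(12, 50))
X[6:, :5] += 4.0
scores, loadings = pca_scores_loadings(X)
# PC1 separates the two groups on the score plot
gap = scores[6:, 0].mean() - scores[:6, 0].mean()
```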
Funding: Supported by the Major State Basic Research Development Program of China ("973" Program, No. 2010CB731502).
Abstract: A data processing method was proposed for eliminating end restraint in triaxial tests of soil. A digital image processing method was used to calculate the local deformations and local stresses of any region on the surface of triaxial soil specimens. The principle and implementation of this digital image processing method are introduced, as well as the calculation method for the local mechanical properties of soil specimens. Comparisons between test results calculated from data for the entire specimen and for local regions showed that deformations were more uniform in the middle region than over the entire specimen. To quantify the non-uniformity of deformation, non-uniformity coefficients of strain were defined and calculated. Traditional and end-lubricated triaxial tests were conducted under the same conditions to investigate how using local region data for deformation calculation helps eliminate the end restraint of specimens. Statistical analysis of all test results showed that, for the tested soil specimens of size 39.1 mm × 80 mm, using the middle 35 mm region of traditional specimens in data processing eliminated end restraint better than end lubrication. Furthermore, the local data analysis in this paper was validated through comparisons with test results from other researchers.
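As an illustration of the kind of local-deformation calculation described above (the marker positions are invented, and the non-uniformity coefficient is assumed here to be the coefficient of variation of the local strains; the paper's own definition may differ):

```python
import numpy as np

def local_axial_strain(y0, y1):
    """Axial strain of a local region from the vertical coordinates of
    two tracked surface markers before (y0) and after (y1) loading."""
    h0 = abs(y0[1] - y0[0])       # initial segment height
    h1 = abs(y1[1] - y1[0])       # deformed segment height
    return (h0 - h1) / h0

def non_uniformity_coefficient(strains):
    # assumed form: coefficient of variation of the local strains
    strains = np.asarray(strains, dtype=float)
    return strains.std() / strains.mean()

# hypothetical markers at 0, 20, 40, 60, 80 mm along an 80 mm specimen
y_before = np.array([0.0, 20.0, 40.0, 60.0, 80.0])
y_after = np.array([0.0, 19.0, 37.8, 56.6, 75.6])
strains = [local_axial_strain(y_before[i:i + 2], y_after[i:i + 2])
           for i in range(4)]
cv = non_uniformity_coefficient(strains)
```

Here the middle segments strain more than the end segments (0.06 vs. 0.05), the signature of end restraint the paper's local analysis is meant to bypass.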
Funding: Supported by the National Natural Science Foundation of China (71571142, 51275396).
Abstract: Under Industry 4.0, the internet of things (IoT), especially radio frequency identification (RFID) technology, has been widely applied in manufacturing environments. This technology brings convenience to production control and production transparency. Meanwhile, it generates increasing volumes of production data that are sometimes discrete, uncorrelated and hard to use, so an efficient analysis method is needed to exploit these invaluable data. This work provides an RFID-based production data analysis method for production control in IoT-enabled smart job shops. The physical configuration and operation logic of IoT-enabled smart job-shop production are first described. On this basis, an RFID-based production data model is built to formalize and correlate the heterogeneous production data. Then, an event-driven RFID-based production data analysis method is proposed to construct RFID events and judge process command execution. Furthermore, a near-big-data approach is used to excavate hidden information and knowledge from the historical production data. A demonstrative case is studied to verify the feasibility of the proposed model and methods. It is expected that our work will provide a different insight into RFID-based production data analysis.
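One simple way to picture the event construction step (a generic sketch, not the authors' event model; the 30-second dwell gap and the read format are assumptions) is to collapse raw (tag, reader, timestamp) reads into dwell events:

```python
from dataclasses import dataclass

@dataclass
class RfidRead:
    tag: str       # job / pallet identifier
    reader: str    # workstation antenna
    ts: float      # epoch seconds

def to_events(reads, gap=30.0):
    """Collapse a stream of raw reads into dwell events: consecutive
    reads of the same tag at the same reader within `gap` seconds
    belong to one event with an enter and a leave time."""
    events, open_ev = [], {}
    for r in sorted(reads, key=lambda r: r.ts):
        key = (r.tag, r.reader)
        ev = open_ev.get(key)
        if ev and r.ts - ev["leave"] <= gap:
            ev["leave"] = r.ts           # extend the current dwell
        else:
            if ev:
                events.append(ev)        # close the previous dwell
            open_ev[key] = {"tag": r.tag, "reader": r.reader,
                            "enter": r.ts, "leave": r.ts}
    events.extend(open_ev.values())
    return events

reads = [RfidRead("J1", "M1", 0.0), RfidRead("J1", "M1", 10.0),
         RfidRead("J1", "M1", 100.0), RfidRead("J1", "M2", 150.0)]
events = to_events(reads)
```

Events like these, rather than raw reads, are what a process-command check or a downstream big-data analysis would consume.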
Abstract: The proliferation of textual data in society is overwhelming; in particular, unstructured textual data are constantly generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews and so on. While the amount of textual data is increasing rapidly, users' ability to summarize, understand and make sense of such data for better business and living decisions remains challenging. This paper studies how to analyse textual data, based on layered software patterns, to extract insightful user intelligence from a large collection of documents and to use such information to improve user operations and performance.
Funding: Nanjing Key Laboratory of Intelligent Information Processing Open Fund Project (No. 19AIP05).
Abstract: Big data analysis has penetrated all fields of society and brought about profound changes. However, there is relatively little research on big data supporting student management in colleges and universities. Taking student card information as the research sample and scholarship evaluation as an example, this study analyzes the big data using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students' daily behavior from multiple dimensions and can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism and votes from teachers and students. At the same time, students' absenteeism, physical health and psychological status can be predicted in advance, which makes student management more proactive, accurate and effective.
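The K-Means step can be sketched as follows (plain Lloyd's algorithm on invented card-record features, not the Spark pipeline or the real student data):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm: assign each point to the nearest
    centroid, recompute centroids, stop when they no longer move."""
    centroids = X[:k].copy()            # deterministic init for the demo
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centroids = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j)
             else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(3)
# toy card records: [daily canteen spend (yuan), library entries per week]
low = rng.normal([8.0, 1.0], 0.5, size=(40, 2))
high = rng.normal([25.0, 6.0], 0.5, size=(40, 2))
X = np.vstack([low, high])
labels, centroids = kmeans(X, k=2)
```

Clusters of behavior profiles like these are what the scholarship evaluation or early-warning analysis would then interpret.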
Funding: Supported by the Huaihua University Science Foundation under Grant HHUY2019-24.
Abstract: With the rapid development of the Internet, many enterprises have launched network platforms. When users browse, search and click the products on these platforms, most platforms keep records of these network behaviors; these records are often heterogeneous and are called log data. Analyzing and managing these heterogeneous log data effectively allows enterprises to grasp the behavioral characteristics of their platform users in time, make targeted recommendations to users, increase product sales and accelerate their development. We first design the system following the process of big data collection, storage, analysis and visualization. We then build a Hadoop cluster to process the log data, using HDFS storage, Yarn resource management and Nginx load balancing, and analyze the log data with MapReduce processing and the Hive data warehouse. Finally, the results are displayed visually, yielding a complete log data analysis system. Practice has shown that the system effectively realizes the collection, analysis and visualization of log data and can accurately support enterprises' product recommendations. The system is stable and effective.
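The MapReduce analysis stage can be pictured with a toy in-process example (the 'user,action,product' log format is an assumption; a real deployment would run equivalent map and reduce functions on the Hadoop cluster):

```python
from collections import Counter

def map_phase(lines):
    """Map: parse each raw log line 'user,action,product' into
    ((user, product), 1) pairs for click counting."""
    for line in lines:
        user, action, product = line.strip().split(",")
        if action == "click":
            yield (user, product), 1

def reduce_phase(pairs):
    """Reduce: sum the counts per (user, product) key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

logs = ["u1,click,p9", "u1,view,p9", "u1,click,p9", "u2,click,p3"]
clicks = reduce_phase(map_phase(logs))
top = clicks.most_common(1)[0]          # most-clicked (user, product) pair
```

Per-user click counts of this kind are the raw material for the targeted product recommendations the system aims at.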
Abstract: A new dynamic model identification method is developed for continuous-time series analysis and forward prediction applications. The quantum of data is defined over moving time intervals in sliding-window coordinates, compressing the size of stored data while retaining the resolution of information. Quantum vectors are introduced as the basis of a linear space for defining a Dynamic Quantum Operator (DQO) model of the system defined by its data stream. The transport of the quantum of compressed data between the time-interval bins is modeled as the sliding time window moves. The DQO model is identified from samples of the real-time data flow over the sliding time window, using a least-squares-fit identification method that evaluates the parameters of the quantum operator model through repeated use of the sampled data over a number of time steps. The method is tested on analyzing and forward-predicting air temperature variations from weather data as well as methane concentration variations obtained from measurements in an operating mine. The results show efficient forward-prediction capabilities, surpassing those of neural networks and other methods for the same task.
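The DQO formulation itself is not spelled out in the abstract, but the sliding-window least-squares identification idea can be illustrated with a generic stand-in (a linear one-step predictor fitted over the latest window; the window length, model order and test signal are arbitrary):

```python
import numpy as np

def fit_window_operator(x, window, order):
    """Least-squares fit of a linear one-step predictor
    x[t] ~ sum_j coef[j] * x[t-1-j], using only the latest
    `window` samples of the stream (the sliding time window)."""
    rows, targets = [], []
    for t in range(len(x) - window, len(x)):
        rows.append(x[t - order:t][::-1])   # most recent lag first
        targets.append(x[t])
    A, b = np.array(rows), np.array(targets)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

def predict_next(x, coef):
    order = len(coef)
    return float(np.dot(coef, x[-order:][::-1]))

t = np.arange(300)
x = np.sin(2.0 * np.pi * t / 25.0)      # e.g. a temperature-like cycle
coef = fit_window_operator(x, window=100, order=4)
x_next = predict_next(x, coef)
true_next = np.sin(2.0 * np.pi * 300 / 25.0)
```

Because the fit only ever sees the current window, the identified operator tracks slow changes in the process, which is the practical point of the sliding-window formulation.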
Funding: We thank the anonymous reviewers and editors for their very constructive comments. Supported by the National Social Science Foundation Project of China under Grant 16BTQ085.
Abstract: The issue of privacy protection in mobile social networks is a frontier topic in the field of social network applications. Existing research on user privacy protection in mobile social networks mainly focuses on privacy-preserving data publishing and access control. There is little research on the association among items of user privacy information, which makes it difficult to design personalized privacy protection strategies and also increases the complexity of user privacy settings. Therefore, this paper concentrates on the association of user privacy information using big data analysis tools, so as to provide data support for the design of personalized privacy protection strategies.