Funding: The National Natural Science Foundation of China (No. 60673060); the Natural Science Foundation of Jiangsu Province (No. BK2005047).
Abstract: A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with unknown time delays. It uses autoregressive (AR) modeling to measure correlations between data streams and exploits the estimated frequency spectra to extract the essential features of each stream. Each stream is represented as a sum of spectral components, and correlation is measured component-wise. Each spectral component is described by four parameters: amplitude, phase, damping rate, and frequency. The ε-lag-correlation between two spectral components is then calculated, and the algorithm uses this information as the similarity measure for clustering. Based on a sliding-window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed method achieves higher speed and clustering quality than comparable methods.
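As a rough illustration of the AR side of this pipeline (not the authors' implementation), the sketch below fits an AR(p) model by Yule-Walker and reads each spectral component's damping rate and frequency off the AR poles; amplitude and phase would follow from a subsequent least-squares fit. All names are illustrative.

```python
# Sketch: decomposing a stream into damped spectral components via AR modeling.
import numpy as np

def ar_spectral_components(x, p=8):
    """Fit an AR(p) model by Yule-Walker and return per-component
    (damping_rate, frequency) pairs derived from the AR poles."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocovariance estimates r[0..p]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    # Solve the Yule-Walker equations R a = r[1:p+1]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    # Poles are roots of z^p - a1 z^(p-1) - ... - ap
    poles = np.roots(np.concatenate(([1.0], -a)))
    # Each pole z = exp(-damping + i*2*pi*freq) gives one spectral component
    damping = -np.log(np.abs(poles))
    freq = np.angle(poles) / (2 * np.pi)
    return list(zip(damping, freq))

# Example: a damped 0.1-cycles/sample sinusoid in noise
t = np.arange(500)
x = np.exp(-0.005 * t) * np.cos(2 * np.pi * 0.1 * t) + 0.1 * np.random.randn(500)
print(ar_spectral_components(x, p=4))
```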
Funding: This project is supported by the Special Foundation for Major State Basic Research of China (Project 973, No. G1998030415).
Abstract: In industrial process settings, principal component analysis (PCA) is a general method for data reconciliation. However, PCA is unsuited to nonlinear feature analysis and is therefore limited in its application to nonlinear industrial processes. Kernel PCA (KPCA) is an extension of PCA that can be used for nonlinear feature analysis. A nonlinear data reconciliation method based on KPCA is proposed. The basic idea is that the original data are first mapped to a high-dimensional feature space by a nonlinear function and PCA is performed in that feature space; nonlinear feature analysis is then carried out and the data are reconstructed using the kernel. The method is applied to a ternary distillation column. Simulation results show that it can filter the noise in measurements of a nonlinear process and that the reconciled data represent the true information of the process.
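A minimal sketch of the reconstruction step using scikit-learn's KernelPCA, on synthetic data with an RBF kernel; the paper's kernel choice and tuning are assumptions here, not its exact configuration.

```python
# Sketch: KPCA-based denoising/reconciliation of nonlinear process data.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
# Simulated nonlinear process: second variable depends quadratically on the first
x = rng.uniform(-1, 1, (300, 1))
X = np.hstack([x, x**2 + 0.05 * rng.standard_normal((300, 1))])

kpca = KernelPCA(n_components=1, kernel="rbf", gamma=5.0,
                 fit_inverse_transform=True)   # enables pre-image reconstruction
scores = kpca.fit_transform(X)                 # PCA in the kernel feature space
X_rec = kpca.inverse_transform(scores)         # reconciled (denoised) data
print("mean reconstruction shift:", np.mean(np.abs(X - X_rec)))
```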
Abstract: A method is described for identifying pulsations in time series of magnetic field data that are simultaneously present in multiple channels at one or more sensor locations. Candidate pulsations of interest are first identified in geomagnetic time series by inspection. Time series of these "training events" are represented in matrix form and transpose-multiplied to generate time-domain covariance matrices. The ranked eigenvectors of this matrix are stored as a feature of the pulsation. In the second stage of the algorithm, a sliding window (approximately the width of the training event) is moved across the vector-valued time series comprising the channels on which the training event was observed. At each window position, the data covariance matrix and associated eigenvectors are calculated. We compare the orientation of the dominant eigenvectors of the training data to those from the windowed data and flag windows where the dominant eigenvector directions are similar. This was successful in automatically identifying pulses that share polarization and appear to come from the same source process. We apply the method to a case study of continuously sampled (50 Hz) data from six observatories, each equipped with three-component induction coil magnetometers. We examine a 90-day interval of data associated with a cluster of four observatories located within 50 km of Napa, California, together with two remote reference stations, one 100 km north of the cluster and the other 350 km south. When the training data contain signals present at the remote reference observatories, we are reliably able to identify and extract global geomagnetic signals such as solar-generated noise. When the training data contain pulsations observed only in the cluster of local observatories, we identify several types of non-plane-wave signals having similar polarization.
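The two-stage logic lends itself to a compact sketch: compute the dominant covariance eigenvector of the training event, then flag sliding windows whose dominant eigenvector points in a similar direction. Window width and threshold are illustrative assumptions.

```python
# Sketch of the detector: eigenvector matching between a training event
# and sliding windows over multichannel magnetometer data.
import numpy as np

def dominant_eigvec(X):
    """X: (n_samples, n_channels). Returns the top eigenvector of X^T X."""
    C = X.T @ X
    w, V = np.linalg.eigh(C)        # eigenvalues in ascending order
    return V[:, -1]

def flag_similar_windows(train, data, threshold=0.95):
    v_train = dominant_eigvec(train)
    width = train.shape[0]
    flags = []
    for start in range(data.shape[0] - width + 1):
        v = dominant_eigvec(data[start:start + width])
        # |cos angle| between dominant eigenvectors (eigenvector sign is arbitrary)
        if abs(v_train @ v) >= threshold:
            flags.append(start)
    return flags
```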
Funding: Supported by the EU H2020 Research and Innovation Program under the Marie Sklodowska-Curie Grant Agreement (Project-DEEP, Grant number: 101109045); the National Key R&D Program of China (Grant number 2018YFB1800804); the National Natural Science Foundation of China (Nos. 61925105 and 62171257); the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute; and the Fundamental Research Funds for the Central Universities, China (No. FRF-NP-20-03).
Abstract: The increasing dependence on data highlights the need for a detailed understanding of its behavior, encompassing the challenges involved in processing and evaluating it. However, current research lacks a comprehensive structure for measuring the worth of data elements, hindering effective navigation of the changing digital environment. This paper aims to fill this research gap by introducing the innovative concept of "data components." It proposes a graph-theoretic representation model that presents a clear mathematical definition and demonstrates the superiority of data components over traditional processing methods. Additionally, the paper introduces an information measurement model that provides a way to calculate the information entropy of data components and establish their increased informational value. The paper also assesses the value of information, suggesting a pricing mechanism based on its significance. In conclusion, this paper establishes a robust framework for understanding and quantifying the value of implicit information in data, laying the groundwork for future research and practical applications.
Funding: Supported by the Program for New Century Excellent Talents in University (NCET-05-0573); Fujian Science and Technology Project (No. 2006I0018); the Science Project of the Education Department of Fujian Province (No. 2006F5022).
Abstract: Banana is one of the main economic crops in Zhangzhou, Fujian Province. Multitemporal ENVISAT ASAR data with different polarizations are used in this paper to classify banana fields. Principal component analysis (PCA) was applied to six pairs of ASAR dual-polarization data. Because of its large leaves, banana has high backscatter, so banana fields take high values and appear very bright in the 1st component, which makes their extraction much easier. Dual-polarization data provide more information, and the VV and VH backscatter of banana differs in character from other land covers. Based on the analysis of the radar signature of banana fields and other land covers, together with the 1st component, banana fields are classified using an object-oriented classifier. Compared with field survey data and ASTER data, the accuracy of banana-field extraction in the study area is 83.5%. This shows that principal component analysis provides useful information for SAR image analysis and makes the extraction of banana fields easier.
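A minimal sketch of the PCA step, assuming the co-registered backscatter images are stacked into a single array; the resulting first-component image is what the classification then operates on.

```python
# Sketch: first principal component of a multitemporal dual-pol band stack.
# `bands` is a hypothetical (n_bands, rows, cols) array of backscatter images.
import numpy as np

def first_pc_image(bands):
    n_bands, rows, cols = bands.shape
    X = bands.reshape(n_bands, -1).T            # pixels as samples, bands as features
    X = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(cov)                  # eigenvalues in ascending order
    pc1 = X @ V[:, -1]                          # scores on the largest component
    return pc1.reshape(rows, cols)              # bright pixels = high backscatter
```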
Abstract: With the increasing variety of application software in meteorological satellite ground systems, how to provision hardware resources sensibly and improve software efficiency has attracted growing attention. This paper proposes a software classification method based on software operating characteristics, which are described by run-time resource consumption. First, principal component analysis (PCA) is used to reduce the dimensionality of the software running-feature data and to interpret the software characteristic information. A modified K-means algorithm is then used to classify the meteorological data processing software. Finally, the PCA results are combined to explain the operating characteristics of each software class, which serves as a basis for optimizing the allocation of hardware resources and improving the efficiency of software operation.
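A hedged sketch of the described pipeline with scikit-learn, with standard K-means standing in for the paper's modified variant; the feature file, feature names, and cluster count are assumptions.

```python
# Sketch: PCA dimension reduction on run-time resource features, then K-means.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# rows = software instances; columns = e.g. CPU, memory, disk I/O, network
features = np.loadtxt("software_runtime_features.csv", delimiter=",")  # hypothetical file

X = StandardScaler().fit_transform(features)
pca = PCA(n_components=0.9)            # keep components explaining 90% of variance
scores = pca.fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)

# Interpret each class via the loadings of the retained components
print(pca.components_)                 # rows: PCs; columns: original features
print(np.bincount(labels))             # class sizes
```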
Funding: National Natural Science Foundation of China (No. 71501103); Natural Science Foundation of Inner Mongolia, China (No. 2015BS0705); the Program of Higher-Level Talents of Inner Mongolia University, China (No. 20700-5145131).
Abstract: A complex repairable system is composed of thousands of components, and some problems in maintenance management and decision-making require classifying a set of components into several classes on the basis of data mining. Furthermore, as industrial equipment grows more complex, it is important for managers to pay closer attention to key components and carry out lean management. The ideas of "customer segmentation" and "precision marketing" can therefore be applied to the maintenance management of multi-component systems. Following this idea of segmentation, the components of multi-component systems should be subdivided into groups based on attributes relevant to maintenance, such as maintenance cost, mean time between failures, and failure frequency. For each target group of parts, the optimal maintenance policy, health assessment, and maintenance scheduling can then be determined. The proposed analysis framework is presented, and a numerical example is given to illustrate the effectiveness of the method.
Abstract: The Internet of things (IoT) is a wireless network designed to perform specific tasks and plays a crucial role in various fields such as environmental monitoring, surveillance, and healthcare. To address the limitations imposed by inadequate resources, energy, and network scalability, this type of network relies heavily on data aggregation and clustering algorithms. Although various conventional studies have aimed to enhance the lifespan of a network through robust systems, they do not always provide optimal efficiency for real-time applications. This paper presents an approach based on state-of-the-art machine-learning methods. In this study, we employed a novel approach that combines an extended version of principal component analysis (PCA) and a reinforcement learning algorithm to achieve efficient clustering and data reduction. The primary objectives are to extend the service life of the network, reduce energy usage, and improve data aggregation efficiency. We evaluated the proposed methodology using data collected from sensors deployed in agricultural fields for crop monitoring. Our proposed approach (PQL) was compared with previous studies that utilized adaptive Q-learning (AQL) and regional energy-aware clustering (REAC). It outperformed both in terms of network longevity and energy consumption, and established a fault-tolerant network.
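As a heavily simplified, one-state (bandit-style) illustration of the reinforcement-learning ingredient only, not the paper's PQL algorithm, the sketch below learns to favor energy-rich cluster heads; the state, reward, and constants are all assumptions.

```python
# Sketch: bandit-style Q update for cluster-head selection by residual energy.
import numpy as np

n_nodes = 20
Q = np.zeros(n_nodes)                  # one action per candidate cluster head
alpha, eps = 0.1, 0.1
energy = np.ones(n_nodes)
rng = np.random.default_rng(1)

for round_ in range(200):
    # epsilon-greedy selection of a cluster head
    a = int(rng.integers(n_nodes)) if rng.random() < eps else int(np.argmax(Q))
    energy[a] -= 0.01                  # acting as head costs energy
    reward = energy[a]                 # reward favors energy-rich heads
    Q[a] += alpha * (reward - Q[a])    # incremental Q update

print("preferred head:", int(np.argmax(Q)))
```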
Abstract: Until recently, computational power was insufficient to diagonalize atmospheric datasets of order 10^8 - 10^9 elements. Eigenanalysis of tens of thousands of variables can now achieve massive data compression for spatial fields with strong correlation properties. Application of eigenanalysis to 26,394 variable dimensions, for three severe weather datasets (tornado, hail and wind), retains 9 - 11 principal components explaining 42% - 52% of the variability. Rotated principal components (RPCs) detect localized coherent data variance structures for each outbreak type and are related to standardized anomalies of the meteorological fields. Our analyses of the RPC loadings and scores show that these graphical displays can efficiently reduce and interpret large datasets. Data are analyzed 24 hours prior to severe weather as a forecasting aid. RPC loadings of sea-level pressure fields show different morphologies for each outbreak type. Analysis of low-level moisture and temperature RPCs suggests that the moisture fields for hail and wind outbreaks are more closely related than those for tornado outbreaks. Consequently, these patterns can identify precursors of severe weather and discriminate between tornadic and non-tornadic outbreaks.
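Rotation of retained components is typically done with the varimax criterion; the following is the standard numpy formulation of that rotation, offered as a generic sketch rather than the authors' code.

```python
# Sketch: varimax rotation of retained PC loadings (standard algorithm).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD step of the classic varimax iteration
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0))))
        R = u @ vt
        var_new = s.sum()
        if var_new < var * (1 + tol):   # stop when the criterion plateaus
            break
        var = var_new
    return loadings @ R                 # rotated loadings: localized structures
```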
Abstract: This work evaluates a recently developed multivariate statistical method based on the creation of pseudo or latent variables using principal component analysis (PCA). The application is the mining of gene expression data to find a small subset of the most important genes among thousands or tens of thousands of genes from a relatively small number of experimental runs. The method was previously developed and evaluated on artificially generated data and real data sets. Those evaluations tested its ability to rank genes against known truth in simulated-data studies and to identify known important genes in real-data studies. The purpose of the work described here is to identify a ranked set of genes in an experimental study and then, for a few of the most highly ranked unverified genes, experimentally verify their importance. The method was evaluated using the transcriptional response of Escherichia coli to treatment with four distinct inhibitory compounds: nitric oxide, S-nitrosoglutathione, serine hydroxamate and potassium cyanide. Our analysis identified genes previously recognized in the response to these compounds and also identified new genes. Three of these new genes, ycbR, yjhA and yahN, were found to significantly (p-values < 0.002) affect the sensitivity of E. coli to nitric oxide-mediated growth inhibition. Given that the three genes were not highly ranked in the selected ranked set (RS), these results support strong sensitivity in the method's ability to identify genes related to challenge by NO and GSNO. This ability to identify genes related to the response to an inhibitory compound is important for engineering tolerance to inhibitory metabolic products, such as biofuels, and for utilization of cheap sugar streams, such as biomass-derived sugars or hydrolysate.
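One plausible reading of ranking genes by latent variables is to weight each gene's loadings by the explained variance of the retained components; the sketch below implements that generic rule, which may differ from the authors' exact scoring.

```python
# Sketch: ranking genes by their contribution to leading latent variables.
import numpy as np
from sklearn.decomposition import PCA

def rank_genes(expression, n_components=5):
    """expression: (n_runs, n_genes). Returns gene indices, best first."""
    pca = PCA(n_components=n_components).fit(expression)
    # weight each gene's |loading| by the component's explained-variance ratio
    score = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
    return np.argsort(score)[::-1]
```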
Abstract: Based on MATRIXx, a universal real-time visual distributed simulation system is developed. The system can receive different input data from the network or from a local terminal. Application models in the simulation modules automatically obtain these data for analysis and calculation, and then produce real-time simulation control information. This paper also designs the relevant simulation components that implement the data input and output, which guarantees the real-time performance and universality of the data transmission. Results from the experimental system show that the real-time performance of the simulation is excellent.
Abstract: A geodemographic classification aims to describe the most salient characteristics of a small-area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Of particular debate are the choice and specification of input variables, with the objective of identifying inputs that add value while also aiming for model parsimony. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology whose objective is to identify candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is exemplified on variables from the UK 2011 Census, and its output is compared with the Office for National Statistics 2011 Output Area Classification (2011 OAC). Through the implementation of the proposed methodology, the quality of the cluster assignment was improved relative to the 2011 OAC, manifested by a lower total within-cluster sum-of-squares score. Across the UK, more than 70.2% of the Output Areas (OAs) occupied by the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance within Scotland and Wales.
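A hedged sketch of the evaluation idea: keep variables whose communality on the retained components is high, then compare variable sets by total within-cluster sum of squares (K-means inertia, lower is better). The selection rule, thresholds, and file name are stand-ins for the paper's full methodology.

```python
# Sketch: PCA-based variable screening plus a WCSS comparison.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def select_variables(X, var_threshold=0.5, n_components=10):
    pca = PCA(n_components=n_components).fit(X)
    # communality: variance of each standardized variable explained by retained PCs
    communality = (pca.components_**2 * pca.explained_variance_[:, None]).sum(axis=0)
    return np.where(communality >= var_threshold)[0]

def total_wcss(X, k=8):
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

# hypothetical file of census variables, rows = output areas
X = StandardScaler().fit_transform(np.loadtxt("census_variables.csv", delimiter=","))
kept = select_variables(X)
print("WCSS all vars:", total_wcss(X), "WCSS selected:", total_wcss(X[:, kept]))
```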
Abstract: In the coexisting world of 3G, 4G, 5G, and many other specialized wireless communication systems, billions of connections may exist for various types of information transmission. Unfortunately, data show that network capacity has grown much faster than network energy efficiency in recent years, which could lead to more energy consumption per transmitted bit in future networks. As basic units in mobile communication systems, microwave/RF components and modules play key roles.
Funding: Research Project of China Ship Development and Design Center.
Abstract: Screening similar historical fault-free candidate data greatly affects the effectiveness of fault detection based on principal component analysis (PCA). To find such candidate data, this study compares unweighted and weighted similarity factors (SFs), which measure the similarity of the principal component subspaces corresponding to the first k principal components of two datasets. Fault detection employs the principal component subspaces corresponding to the current measured data and to the historical fault-free data; load parameters are used to locate, within the historical fault-free database, the candidate data most similar to the current operating data. The fault detection method for air-conditioning systems is based on principal components. The results show that the weighted principal component SF improves both fault-free detection and fault detection: compared with the unweighted SF, the average fault-free detection rate of the weighted SF is 17.33% higher, and the average fault detection rate is 7.51% higher.
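The unweighted SF below follows Krzanowski's standard definition (mean squared cosine between the two k-dimensional PC subspaces); the eigenvalue weighting shown is one common formulation and may differ from the study's exact choice.

```python
# Sketch: unweighted and weighted PCA similarity factors between two datasets.
import numpy as np

def pca_subspace(X, k):
    X = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T, s[:k]**2            # loadings (p, k) and eigenvalue proxies

def similarity_factor(X1, X2, k, weighted=False):
    L, lam1 = pca_subspace(X1, k)
    M, lam2 = pca_subspace(X2, k)
    C = L.T @ M                          # cosines between the two PC bases
    if not weighted:
        return np.trace(C @ C.T) / k     # unweighted SF in [0, 1]
    W = np.outer(lam1, lam2)             # weight each PC pair by its importance
    return np.sum(W * C**2) / np.sum(W)  # weighted SF in [0, 1]
```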
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61573014).
Abstract: Independent component analysis (ICA) can reveal the essential underlying structure of data, and independent component regression (ICR) methods usually obtain better performance than other regression methods such as principal component regression. However, when existing ICR methods separate or extract independent components from prewhitened data, the backward propagation of the inevitable prewhitening errors deteriorates the final linear prediction accuracy. To overcome this weakness, we first propose using a weighted orthogonality constraint to replace the prewhitening of the data in ICA. Next, the statistical independence of the ICs and the close relationship between the ICs and the quality variables are considered at the same time. Then, by combining the merits of the improved ICR with an ensemble ICR algorithm, which solves the problem of selecting an appropriate nonquadratic function in the ICA iteration procedure, a modified independent component regression (MICR) method that works directly on the measured process data is proposed. Finally, three experiments validate the excellent performance of the modified algorithm.
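For contrast, here is plain ICR: FastICA followed by a linear regression on the extracted components. The paper's MICR replaces FastICA's prewhitening with a weighted orthogonality constraint, which is not reproduced in this sketch.

```python
# Sketch: baseline independent component regression (ICR).
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression

def icr_fit(X, y, n_components=3):
    ica = FastICA(n_components=n_components, random_state=0)
    S = ica.fit_transform(X)             # independent component scores
    reg = LinearRegression().fit(S, y)   # linear prediction from the ICs
    return ica, reg

def icr_predict(ica, reg, X_new):
    return reg.predict(ica.transform(X_new))
```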
Funding: Project of the Joint Seismological Science Foundation of China (104090).
Abstract: In this paper, principal component analysis is applied to eight seismicity parameters: earthquake frequency N (ML ≥ 3.0), b-value, η-value, A(b)-value, Mf-value, Ac-value, C-value and D-value, which reflect the characteristics of the magnitude, time and space distribution of seismicity from different perspectives. Using the principal component analysis method, a synthesis parameter W reflecting the anomalous features of the magnitude, time and space distribution can be obtained. In general there is some correlation among the eight parameters, but their variations differ across periods, and earthquake prediction based on the individual parameters performs poorly. However, the synthesis parameter W showed obvious anomalies before 13 earthquakes (MS ≥ 5.8) that occurred in North China, which indicates that W better reflects the anomalous characteristics of the magnitude, time and space distribution of seismicity. Other problems related to the conclusions drawn by the principal component analysis method are also discussed.
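A minimal sketch of how such a synthesis parameter can be formed: standardize the eight parameters over time windows and take the leading principal component score as W. The input layout and sign convention are assumptions.

```python
# Sketch: synthesis parameter W as the leading PC score of 8 seismicity parameters.
import numpy as np

def synthesis_parameter(P):
    """P: (n_time_windows, 8) matrix of N, b, eta, A(b), Mf, Ac, C, D."""
    Z = (P - P.mean(axis=0)) / P.std(axis=0)     # standardize each parameter
    cov = np.cov(Z, rowvar=False)
    w, V = np.linalg.eigh(cov)                   # eigenvalues ascending
    return Z @ V[:, -1]                          # W: first-PC score per window
```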
Abstract: In this paper, our previous work on Principal Component Analysis (PCA)-based fault detection is extended to the dynamic monitoring and detection of loss-of-mains in power systems using wide-area synchrophasor measurements. In the previous work, a static PCA model was built and verified to be capable of detecting and extracting system faulty events; however, the false alarm rate was high. To address this problem, this paper uses the well-known "time lag shift" method to include the dynamic behavior of the PCA model, based on synchronized measurements from Phasor Measurement Units (PMUs); the result is named Dynamic Principal Component Analysis (DPCA). Compared with the static PCA approach, as well as with traditional passive mechanisms of loss-of-mains detection, the proposed DPCA procedure describes how the synchrophasors are linearly auto- and cross-correlated, based on a singular value decomposition of the augmented, time-lagged synchrophasor matrix. As in the static PCA method, two statistics, T² and Q, with confidence limits are calculated to form intuitive charts with which engineers or operators can monitor the loss-of-mains situation in real time. The effectiveness of the proposed methodology is evaluated on the loss-of-mains monitoring of a real system, where the historic data are recorded from PMUs installed at several locations in the UK/Ireland power system.
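A compact sketch of the DPCA mechanics: build the time-lag-augmented data matrix, retain the leading components from training data, and compute T² and Q for new measurements. Lag order and component count are illustrative; in practice the confidence limits come from the usual χ²/F approximations rather than the raw statistics shown here.

```python
# Sketch: DPCA monitoring with lag augmentation and T^2 / Q statistics.
import numpy as np

def lag_augment(X, lags=2):
    """Stack [x_t, x_{t-1}, ..., x_{t-lags}] row-wise (the DPCA data matrix)."""
    n, m = X.shape
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])

def dpca_monitor(X_train, X_test, lags=2, n_pc=5):
    A = lag_augment(X_train, lags)
    mu, sd = A.mean(axis=0), A.std(axis=0)
    A = (A - mu) / sd
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    P = vt[:n_pc].T                               # retained loadings
    lam = s[:n_pc]**2 / (A.shape[0] - 1)          # retained eigenvalues

    B = (lag_augment(X_test, lags) - mu) / sd
    T = B @ P                                     # scores of new data
    T2 = np.sum(T**2 / lam, axis=1)               # Hotelling T^2 per sample
    resid = B - T @ P.T
    Q = np.sum(resid**2, axis=1)                  # squared prediction error
    return T2, Q
```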
Abstract: Mixed models support a wide range of applications, including hierarchical modeling and longitudinal studies. Testing a variance component in mixed models has long been a methodological challenge because the null value lies on the boundary of the parameter space. It is well documented in the literature that the traditional first-order methods, the likelihood ratio statistic, the Wald statistic and the score statistic, provide an excessively conservative approximation to the null distribution. However, the magnitude of this conservativeness has not been thoroughly explored. In this paper, we propose a likelihood-based third-order method for testing the null hypothesis of a zero or non-zero variance component in mixed models. The proposed method dramatically improves the accuracy of the tests. Extensive simulations demonstrate the accuracy of the proposed method in comparison with the standard first-order methods. The results show the conservativeness of the first-order methods and the accuracy of the proposed method in approximating p-values and confidence intervals, even when the sample size is small.
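Third-order likelihood methods of this kind are generally built on the modified signed likelihood root; as a hedged illustration of the general form only (the paper's exact construction is not reproduced):

```latex
% Signed likelihood root r and its third-order modification r*
r(\psi) = \operatorname{sign}(\hat{\psi} - \psi)\,
          \sqrt{2\bigl[\ell(\hat{\theta}) - \ell(\hat{\theta}_{\psi})\bigr]},
\qquad
r^{*}(\psi) = r(\psi) + \frac{1}{r(\psi)} \log\frac{Q(\psi)}{r(\psi)}
```

Here ℓ is the log-likelihood, θ̂_ψ the constrained maximum likelihood estimate under the hypothesized value ψ, and Q(ψ) a correction statistic built from the observed information; p-values are then approximated by Φ(r*(ψ)) with third-order accuracy.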