A Robust Control Chart for Monitoring High-dimensional Data Streams
Abstract: With the rapid development of sensors and other modern technologies, high-dimensional data streams appear frequently across industries. The complexity of high-dimensional data, however, poses many challenges for quality monitoring. For example, the normality assumption often fails in high dimensions, and the form of the distribution is usually unknown in practice; meanwhile, control charts that monitor only the mean no longer meet practical needs, and the importance of monitoring the variance has long been recognized in both academia and industry. To address this, a robust control chart for monitoring independent high-dimensional data streams is proposed. The classical score test statistic is first transformed mathematically and combined with the exponentially weighted moving average (EWMA) method to give a local statistic for each data stream; on this basis, global statistics for monitoring the high-dimensional data streams are built using top-r and related methods. The proposed method applies to both normal and non-normal data and can monitor the mean and the variance simultaneously. Numerical simulations and a real case study illustrate the effectiveness and robustness of the new method.

As technology advances, products gain more functions and their structures become progressively more complicated, so it is often necessary to monitor multiple quality characteristics simultaneously during production. With the rapid growth of data collection technology, the dimension of the data is also expanding quickly, and the number of product indicators that must be monitored grows day by day. High-dimensional data streams appear more and more frequently in various industries, especially in sensor-based manufacturing and image processing. As a new type of data they have attracted a lot of attention and are already pervasive in daily life; examples include information returned by sensors, real-time meteorological cloud images captured by satellites, and user communication records.

However, the complexity of high-dimensional data brings many new challenges to quality monitoring. Because of the large number of variables, the normality assumption is often invalid in high-dimensional cases, and the form of the distribution is usually unknown in practical applications. At the same time, a control chart that detects only mean shifts can no longer satisfy practical needs. Statistical methods for monitoring high-dimensional data streams are therefore urgently needed.

To this end, a new robust control chart for monitoring independent high-dimensional data streams is proposed. First, the local statistics for monitoring each data stream are constructed by combining the score test statistic with the exponentially weighted moving average strategy. For the $t$th observation $X_{k,t}$ of the $k$th data stream, the final local charting statistic is $R_{k,t} = (\theta_{k,t})^{T} I_0^{-1} \theta_{k,t}$, where $\theta_{k,t}$ is the EWMA-type score function vector and $I_0$ is the in-control Fisher information matrix. This type of statistic makes use of all data up to the current time point, and the control chart gives different observations different weights.

On this basis, the global monitoring statistics are constructed using the sum, the maximum, and the top-r strategy. In particular, the chart based on the top-r statistic $Z_{\text{top-}r}$ monitors better than the one based on $Z_{\max}$ and is more efficient than the one based on $Z_{\text{sum}}$, because it only needs the first $r$ (largest) local statistics. It is therefore more convenient to compute and more economical. Accordingly, we recommend $Z_{\text{top-}r}$ for detecting both mean shifts and variance drifts; the numerical simulations and a real case study demonstrate its effectiveness. The top-r statistic can be expressed as

$$Z_{\text{top-}r} = \sum_{k=1}^{r} R_{(k),t} = \sum_{k=1}^{r} \left[ (\theta_{(k),t})^{T} I_0^{-1} \theta_{(k),t} \right],$$

where $R_{(k),t}$, $1 \le k \le p$, denotes the $k$th largest local statistic. The simulation results show that, with a suitable choice of the parameter $r$, $Z_{\text{top-}r}$ is sensitive and robust in detecting process changes. The method is appropriate for data with either a normal or a non-normal distribution, and it can detect shifts in both the mean and the variance, which many control charts cannot do.

The monitoring performance of the proposed control charts is evaluated by Monte Carlo simulation, with the average run length as the performance indicator. The numerical simulations verify the effectiveness and robustness of the proposed charts. To illustrate the new method in a practical application, a case study is carried out on a real data set from a semiconductor manufacturing process, containing 1,567 samples in total, each an observation vector of 590 variables. The results show that the proposed $Z_{\text{top-}r}$ method has higher computational and detection efficiency and detects abnormal shifts in high-dimensional data streams well in practical production.

The proposed control chart has several advantages. First, it handles both normal and non-normal data. Second, it detects variance shifts as well as mean shifts. Finally, it only needs to focus on the first $r$ local statistics, so the statistics are simple in form, easy to compute, and more efficient. These advantages allow shifts in the data streams to be signaled quickly and effectively, so the new control chart can be used in actual production to monitor product quality. In this paper we assume that the data streams are independent of one another, but in actual production the relationships among data streams become more complex as the dimension increases. Future research can consider extending the proposed method to more general high-dimensional data streams.
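The local and global statistics described in the abstract can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions, not the authors' implementation: the abstract does not give the mathematical transformation applied to the score statistic, so the sketch assumes a normal working model with known in-control mean `mu0` and standard deviation `sigma0`, takes the score with respect to (mean, variance), and uses the corresponding diagonal Fisher information. The function names, the smoothing weight `lam`, and the control limit `limit` are hypothetical.

```python
import numpy as np


def local_statistics(x, lam=0.1, mu0=0.0, sigma0=1.0):
    """EWMA-smoothed score statistic R[t, k] for each stream k (assumed normal model).

    x has shape (n_obs, p): n_obs time points, p independent data streams.
    """
    x = np.asarray(x, dtype=float)
    n_obs, p = x.shape
    z = x - mu0
    # Score of N(mu, sigma^2) w.r.t. (mu, sigma^2), evaluated at the in-control values.
    score = np.stack(
        [z / sigma0**2, (z**2 - sigma0**2) / (2 * sigma0**4)], axis=-1
    )  # shape (n_obs, p, 2)
    # Diagonal of the inverse in-control Fisher information for (mu, sigma^2).
    inv_info = np.array([sigma0**2, 2 * sigma0**4])
    theta = np.zeros((p, 2))  # EWMA-type score vector, started at zero
    R = np.empty((n_obs, p))
    for t in range(n_obs):
        theta = (1 - lam) * theta + lam * score[t]
        # Quadratic form theta^T I0^{-1} theta, computed per stream.
        R[t] = (theta**2 * inv_info).sum(axis=1)
    return R


def global_statistics(R_t, r):
    """Sum, max, and top-r statistics from the p local statistics at one time point."""
    ordered = np.sort(np.asarray(R_t, dtype=float))[::-1]  # descending order
    if not 1 <= r <= ordered.size:
        raise ValueError("r must satisfy 1 <= r <= p")
    return {
        "sum": float(ordered.sum()),
        "max": float(ordered[0]),
        "top_r": float(ordered[:r].sum()),
    }


def first_alarm(x, limit, r, **kwargs):
    """1-based time of the first top-r signal above `limit`, or None if no signal."""
    R = local_statistics(x, **kwargs)
    for t, row in enumerate(R, start=1):
        if global_statistics(row, r)["top_r"] > limit:
            return t
    return None
```

Note that the top-r statistic reduces to the max statistic when r = 1 and to the sum statistic when r = p, so r interpolates between the two. In a run-length study, `limit` would typically be calibrated by simulation to reach a target in-control average run length, and `first_alarm` would then be averaged over many simulated out-of-control sequences.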
Authors: 丁冬, 姜亚蕾 (DING Dong; JIANG Yalei), School of Management, Xi'an Polytechnic University, Xi'an 710048, China
Source: 《运筹与管理》 (Operations Research and Management Science), Peking University Core Journal, 2025, No. 1, pp. 12-18 (7 pages)
Funding: Key Research Base Project of Philosophy and Social Sciences, Shaanxi Provincial Department of Education (21JZ030).
Keywords: high-dimensional data streams; robust control charts; EWMA; top-r statistics; statistical process control