Funding: Supported in part by the Major Program of the Ministry of Science and Technology of China under Grant 2019YFB2205102, and in part by the National Natural Science Foundation of China under Grants 61974164, 62074166, 61804181, 62004219, 62004220, and 62104256.
Abstract: With the continuing development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry for its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) offers programmability, low power consumption, parallelism, and low cost. However, the enormous computational load of a DCNN and the limited logic capacity of an FPGA restrict the energy efficiency of a DCNN accelerator. The traditional sequential sliding window method improves accelerator throughput through data multiplexing, but its multiplexing rate is low because it repeatedly reads the data between rows. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, so resources are wasted if each multiplication occupies a whole DSP block. A multiplier sharing strategy is therefore proposed: the multiplier of the accelerator is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, an optimized FPGA accelerator based on these two strategies is proposed. The accelerator is implemented in Verilog and deployed on a Xilinx VCU118. When the accelerator classifies the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, which is 1.73× the energy efficiency of previous DCNN FPGA accelerators. When it classifies the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, which is 1.28× to 3.14× that of other designs.
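The abstract does not describe how the shared multiplier is built. As a rough illustration only, the sketch below shows a commonly used operand-packing idea: two narrow signed values that share a multiplicand are placed in one wide word, multiplied once, and the two products are separated afterwards. The function name `packed_dual_multiply`, the shared operand `c`, and the 18-bit guard shift are assumptions made for this example and are not taken from the paper.

```python
def packed_dual_multiply(a, b, c, shift=18):
    """Compute a*c and b*c with one wide multiplication.

    Hypothetical sketch: a, b, c are signed 8-bit values (-128..127).
    The shift of 18 bits is an assumed guard width, chosen so that the
    low product b*c (at most 17 signed bits) cannot overlap a*c.
    """
    for value in (a, b, c):
        if not -128 <= value <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    # Pack both narrow operands into one wide word.
    packed = (a << shift) + b

    # One wide multiplication, as a single DSP block would perform.
    product = packed * c  # equals (a*c << shift) + b*c

    # Recover the low product as a signed 'shift'-bit field.
    low = product & ((1 << shift) - 1)
    if low >= 1 << (shift - 1):
        low -= 1 << shift

    # Removing the low field leaves a*c exactly, scaled by 2**shift.
    high = (product - low) >> shift
    return high, low


if __name__ == "__main__":
    print(packed_dual_multiply(3, -5, 7))
```

The subtraction of `low` before the shift is the sign correction: when `b*c` is negative it borrows from the upper field, and removing it first restores `a*c` exactly. Narrower 4- and 6-bit operands fit the same scheme with a smaller guard shift.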
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2014GB106000, 2014GB106002, and 2014GB106003), the National Natural Science Foundation of China (Nos. 11275234, 11375237, and 11505238), and the Scientific Research Grant of Hefei Science Center of CAS (No. 2015SRG-HSC010).
Abstract: A fast data processing method has been developed to rapidly obtain the evolution of the electron density profile from the multichannel polarimeter-interferometer system (POLARIS) on J-TEXT. Compared with the Abel inversion method, this method provides important information about the evolution of the density profile more quickly. Its calculation time is on the order of ten milliseconds per normal shot, and it can process data sampled at up to 1 MHz, which is helpful for studying density sawtooth instability and disruptions between shots. During the flat-top phase of the plasma current in typical ohmic discharges on J-TEXT, the shape factor u ranges from 4 to 5. When a disruption occurs, the density profile becomes peaked and the shape factor u typically decreases to 1.
Funding: The National Natural Science Foundation of China (60602016); the National Basic Research Program of China (2003CB314801); the Hi-Tech Research and Development Program of China (2007AA01Z428); the MOE-MS Key Laboratory of Multimedia Calculation and Communication Open Foundation (05071801); the HUAWEI Foundation (YJCB2006062WL, YJCB2007061WL).
Abstract: Energy saving and fast response in data gathering are two crucial factors in the performance of wireless sensor networks. A dynamic tree based energy equalizing routing scheme (DTEER) is proposed to gather data with low energy consumption and low time delay. DTEER introduces a dynamic multi-hop route selection scheme based on weight-value and height-value to form a dynamic tree, and a mechanism similar to token passing to elect the root of the tree. DTEER can organize all the nodes simply and rapidly with low overhead and is robust to topology changes. Simulation results show that, compared with power-efficient gathering in sensor information systems (PEGASIS) and the hybrid, energy-efficient, distributed clustering approach (HEED), DTEER consumes less energy, equalizes the energy consumption of all the nodes, reduces the data gathering delay, and extends the network lifetime.
Funding: Supported by the New Century Excellent Talents in University (NCET-09-0396), the National Science & Technology Key Projects of Numerical Control (2012ZX04014-031), the Natural Science Foundation of Hubei Province (2011CDB279), and the Foundation for Innovative Research Groups of the Natural Science Foundation of Hubei Province, China (2010CDA067).
Abstract: As castings become more complicated and the precision demanded of numerical simulation becomes higher, the data produced by casting simulation become more massive. On a general personal computer, these data may exceed the capacity of available memory, causing rendering to fail. Based on the out-of-core technique, this paper proposes a method that makes effective use of external storage and reduces memory usage dramatically, so as to solve the problem of insufficient memory for rendering massive data on general personal computers. Based on this method, a new post-processor is developed. It is capable of illustrating the filling and solidification processes of casting, as well as thermal stress. The new post-processor also provides fast interaction with simulation results. Theoretical analysis and several practical examples show that the memory usage and loading time of the post-processor do not depend on the size of the relevant files but on the proportion of cells that lie on the surface. Meanwhile, rendering and mouse-driven value queries are fast enough to satisfy the demands of real-time interaction.
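The abstract does not give the file layout or access scheme of the post-processor. As a minimal sketch of the general out-of-core idea, the example below keeps a simulation field on disk and maps it into memory so that only the requested cells are actually read. The flat little-endian float64 layout and the helper names `write_field` and `read_cells` are assumptions made for illustration.

```python
import mmap
import struct

# Assumed on-disk layout: one little-endian float64 per cell, in cell order.
VALUE = struct.Struct("<d")


def write_field(path, values):
    """Write one scalar field (e.g. temperature per cell) to disk."""
    with open(path, "wb") as handle:
        for value in values:
            handle.write(VALUE.pack(value))


def read_cells(path, cell_indices):
    """Fetch values for selected cells without loading the whole file.

    The file is memory-mapped, so the operating system pages in only
    the regions that are touched, for example the surface cells that
    are visible in a rendering.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return [
                VALUE.unpack_from(view, index * VALUE.size)[0]
                for index in cell_indices
            ]
```

With this access pattern the memory touched grows with the number of cells requested rather than with the file size, which matches the behaviour the abstract reports for surface cells.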
Funding: Supported by the National Natural Science Foundation of P. R. China (60374021), the Natural Science Foundation of Shandong Province (Y2002G05), the Youth Scientists Foundation of Shandong Province (03BS091, 05BS01007), and the Education Ministry Foundation of P. R. China (20050422036).
Abstract: This paper focuses on the fast rate fault detection filter (FDF) problem for a class of multirate sampled-data (MSD) systems. A lifting technique is used to convert such an MSD system into a linear time-invariant discrete-time system, and an unknown input observer (UIO) is used as the FDF to generate the residual. The design of the FDF is formulated as an H∞ optimization problem, and a solvability condition as well as an optimal solution are derived. The causality of the residual generator is guaranteed, so the fast rate residual can be implemented via inverse lifting. A numerical example is included to demonstrate the feasibility of the obtained results.
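The abstract does not state the system matrices or sampling rates. To illustrate only what lifting means, the sketch below takes a scalar fast-rate system x(k+1) = a·x(k) + b·u(k), y(k) = c·x(k) + d·u(k) and groups n fast samples into one slow-rate step. The scalar state, the function names, and the single-rate setting are simplifying assumptions; the paper's MSD system is not reproduced here.

```python
def lift_scalar_system(a, b, c, d, n):
    """Lift a scalar fast-rate system over blocks of n samples.

    Returns (a_l, b_l, c_l, d_l) such that, for a block of inputs u_0..u_{n-1}:
      x_next = a_l * x + sum(b_l[j] * u_j)
      y_i    = c_l[i] * x + sum(d_l[i][j] * u_j)
    d_l is lower triangular, which is what keeps the lifted system causal.
    """
    a_l = a ** n
    b_l = [a ** (n - 1 - j) * b for j in range(n)]
    c_l = [c * a ** i for i in range(n)]
    d_l = [
        [d if i == j else (c * a ** (i - j - 1) * b if i > j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return a_l, b_l, c_l, d_l


def simulate_fast(a, b, c, d, x0, inputs):
    """Step the original system one fast sample at a time."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(c * x + d * u)
        x = a * x + b * u
    return x, outputs


def simulate_lifted(lifted, n, x0, inputs):
    """Step the lifted system one block at a time (len(inputs) % n == 0)."""
    a_l, b_l, c_l, d_l = lifted
    x, outputs = x0, []
    for start in range(0, len(inputs), n):
        block = inputs[start:start + n]
        outputs.extend(
            c_l[i] * x + sum(d_l[i][j] * block[j] for j in range(n))
            for i in range(n)
        )
        x = a_l * x + sum(b_l[j] * block[j] for j in range(n))
    return x, outputs
```

Both simulations produce the same output sequence, so reading the lifted outputs back in order (inverse lifting) recovers the fast-rate signal.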