Funding: The Academic Colleges and Universities Innovation Program 2.0 (No. BP0719013).
Abstract: To address the difficulty of deploying network inference on hardware, which arises from the high computational complexity of convolutional layers and limited hardware resources, a look-up table (LUT)-based convolution architecture is built on a field-programmable gate array (FPGA) using integer multipliers and adder trees. The Winograd algorithm is used to optimize convolution and multiplication and so reduce computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). Optimized storage streams improve memory access efficiency and relieve bandwidth constraints, and the data toggle rate is reduced to lower power consumption. Experimental results show that building the basic processing units with the Winograd algorithm significantly reduces the number of multipliers and accelerates hardware deployment, while time-division multiplexing of the processing units improves resource utilization. Under the experimental conditions, compared with the traditional convolution method, the architecture optimizes computing resources by a factor of 2.25 and improves peak throughput by a factor of 19.3. The LUT-based Winograd accelerator can effectively address the deployment problem caused by limited hardware resources.
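The multiplier saving that the Winograd algorithm provides can be illustrated with its smallest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. This is a minimal sketch, not the architecture described in the abstract: the function names are hypothetical, and the abstract does not state which tile size the authors use.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform: depends only on g, so it can be precomputed once.
    # Fraction keeps the division by 2 exact for integer inputs.
    u0 = g[0]
    u1 = Fraction(g[0] + g[1] + g[2], 2)
    u2 = Fraction(g[0] - g[1] + g[2], 2)
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Element-wise products: the only 4 multiplications.
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]
```

Nesting this 1D scheme in two dimensions gives F(2x2, 3x3), which needs 16 multiplications per 2x2 output tile instead of 36, a ratio of 2.25. That ratio happens to equal the resource figure quoted in the abstract, but reading the two as the same quantity is an assumption.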
Funding: Supported by the National Natural Science Foundation of China (41175037).
Abstract: To retrieve atmospheric CO2 precisely, a retrieval method based on both near-infrared (NIR) and thermal-infrared (TIR) bands is first established. A look-up-table (LUT)-based fast line-by-line radiative transfer model (RTM) is then integrated into the retrieval procedure to accelerate radiative transfer calculations. The LUT stores gas absorption cross-sections as a function of temperature, pressure and wavenumber, which greatly reduces radiative transfer computation time compared with the direct line-by-line method. Retrievals are then simulated using the NIR band, the TIR band, and both bands together. The retrieved CO2 profiles suggest that the joint approach reconstructs the CO2 profile better than NIR or TIR alone. Joint retrieval using both bands simultaneously could better constrain the vertical distribution of CO2 throughout the troposphere.
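The idea of replacing a repeated line-by-line calculation with a table indexed by temperature, pressure and wavenumber can be sketched as follows. Everything here is an assumption for illustration: `toy_cross_section` is a placeholder with no physical meaning, the grids are arbitrary, and bilinear interpolation in temperature and pressure is one plausible choice that the abstract does not confirm.

```python
from bisect import bisect_right


def toy_cross_section(T, p, nu):
    """Placeholder for an expensive line-by-line calculation (NOT physical)."""
    return (1.0 + 0.002 * T) * (1.0 + 0.0005 * p) * (1.0 + 0.01 * nu)


def build_cross_section_lut(Ts, ps, nus):
    """Precompute the table once: lut[i][j][k] for Ts[i], ps[j], nus[k]."""
    return [[[toy_cross_section(T, p, nu) for nu in nus] for p in ps] for T in Ts]


def _bracket(grid, x):
    """Lower grid index and interpolation weight for x, clamped to the grid."""
    i = min(max(bisect_right(grid, x) - 1, 0), len(grid) - 2)
    w = (x - grid[i]) / (grid[i + 1] - grid[i])
    return i, min(max(w, 0.0), 1.0)


def lookup(lut, Ts, ps, k, T, p):
    """Bilinear interpolation in (T, p) at wavenumber index k."""
    i, a = _bracket(Ts, T)
    j, b = _bracket(ps, p)
    return ((1 - a) * (1 - b) * lut[i][j][k] + a * (1 - b) * lut[i + 1][j][k]
            + (1 - a) * b * lut[i][j + 1][k] + a * b * lut[i + 1][j + 1][k])
```

During a retrieval the table is built once and `lookup` is called for each atmospheric layer, so the cost per call is a few multiplications and additions regardless of how expensive the original calculation was.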
Abstract: This brief proposes an area- and speed-efficient implementation of a symmetric finite impulse response (FIR) digital filter using a reduced parallel look-up table (LUT) distributed arithmetic (DA) approach. The complexity of realizing an FIR filter is dominated by the multiplier structure and grows with filter order, which increases area and power and reduces operating speed. Multiplierless conventional DA and decomposed DA designs improve speed over the multiply-accumulate approach. However, both structures require B clock cycles to produce the filter output for an input width of B bits, which limits the speed of the DA structure. This limitation is addressed by using parallel LUTs, known as high-speed DA FIR, at the expense of additional hardware. With a large number of taps, the number and size of the LUTs also become large. The proposed method exploits coefficient symmetry to reduce the number of LUTs in the decomposed DA form by a factor of about 2. Applying this approach to the high-speed DA-based FIR design yields an area- and speed-efficient structure. For a 16-tap symmetric FIR filter implemented on a Xilinx Virtex-5 FPGA (device XC5VSX95T-1FF1136), the proposed design offers around 40% less area and 53.98% less slice-delay product (SDP) than the high-throughput DA-based structure. On the same FPGA device, the proposed design supports an input sampling frequency of up to 607 MHz, and offers 60.5% more speed and 67.71% less SDP than the systolic DA-based design.
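The bit-serial DA computation and the effect of coefficient symmetry can be sketched in software as below. This is a generic textbook-style illustration under stated assumptions, not the structure proposed in the brief: the function names are hypothetical, and the folding step (pre-adding the two samples that share a coefficient) is one common way to exploit symmetry.

```python
def build_da_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_dot(lut, samples, bits):
    """Multiplierless dot product: one LUT read per bit position.

    `samples` are two's-complement integers that fit in `bits` bits.
    The loop body corresponds to one clock cycle of a bit-serial design.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k   # bit b of every sample forms the address
        term = lut[addr] << b             # shift-accumulate, no multiplier
        acc += -term if b == bits - 1 else term  # sign bit has negative weight
    return acc


def symmetric_da_fir(half_coeffs, window, bits):
    """One output of a symmetric FIR filter with an even number of taps.

    Samples sharing a coefficient are added first, so the LUT is addressed
    by len(window) // 2 bits and the folded samples need bits + 1 bits.
    """
    n = len(window)
    folded = [window[k] + window[n - 1 - k] for k in range(n // 2)]
    return da_dot(build_da_lut(half_coeffs), folded, bits + 1)
```

For an 8-tap symmetric filter the folded LUT has 2^4 = 16 entries instead of 2^8 = 256, at the cost of one extra bit (one extra cycle in a bit-serial design) for the pre-added samples.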
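Abstract: To linearize power amplifiers with memory and to overcome their memory effects to some extent, this paper presents an adaptive digital predistorter. The predistorter combines a look-up table with a memory-effect compensation technique, and uses interpolation to effectively reduce the error introduced by quantizing the look-up table amplitude. Compared with a memory polynomial predistorter, it has lower computational complexity yet achieves a comparable linearization result. Based on a memory polynomial model of the power amplifier, an OFDM (Orthogonal Frequency Division Multiplexing) wideband signal is used to verify the good linearization performance of the proposed predistorter on nonlinear power amplifiers with memory.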
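How interpolation reduces the amplitude quantization error of a gain look-up table can be sketched as follows. This covers only a memoryless LUT; the memory-effect compensation and the adaptation loop described in the abstract are not shown. The gain function, table size and function names are hypothetical.

```python
def make_gain_lut(gain_fn, n_entries, max_amp):
    """Sample a predistortion gain on a uniform amplitude grid."""
    step = max_amp / (n_entries - 1)
    return [gain_fn(i * step) for i in range(n_entries)], step


def lut_gain(lut, step, amp, interpolate=True):
    """Gain for input amplitude `amp`, clamped to the table range."""
    pos = min(amp / step, len(lut) - 1)
    if not interpolate:
        # Plain quantized lookup: take the nearest table entry.
        return lut[min(int(pos + 0.5), len(lut) - 1)]
    i = min(int(pos), len(lut) - 2)
    frac = pos - i
    # Linear interpolation between the two neighbouring entries.
    return lut[i] * (1 - frac) + lut[i + 1] * frac


def predistort(lut, step, x, interpolate=True):
    """Apply the amplitude-indexed gain to one complex baseband sample."""
    return x * lut_gain(lut, step, abs(x), interpolate)
```

With a smooth gain curve, the interpolated value tracks the curve between grid points, so a small table can reach an accuracy that a nearest-entry lookup would need many more entries to match.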
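Abstract: Leaf area index (LAI), a basic parameter characterizing the growth status of different crops, is key to precision agricultural management and farmland ecosystem modeling. Crop planting in China's farmland is relatively fragmented. Owing to factors such as the spatial heterogeneity of the land surface and the nonlinearity of retrieval models, crop LAI estimated from remote sensing data at different scales differs to some extent; that is, remote sensing retrieval of farmland crop LAI commonly suffers from a scale effect. Taking the agricultural demonstration area of the Baotou comprehensive remote sensing validation site as the study area, UAV hyperspectral data were combined with the PROSPECT+SAIL model to build look-up tables (LUTs) for multiple crop types in a typical cropland area and retrieve farmland LAI, and the applicability and accuracy of the LUT for LAI retrieval of maize, potato, sunflower, melon fields and other crops were studied. Multi-scale remote sensing data were obtained by aggregating the UAV hyperspectral data. Combining Taylor expansion theory with a computational geometry model, a scale conversion model that considers both inter-class differences and intra-class heterogeneity was proposed to quantitatively describe the scale effect in LAI retrieval over heterogeneous surfaces with mixed crops. The results show that the LUT method based on classification and parameter sensitivity analysis applies well to LAI retrieval of mixed multi-type crops in the typical cropland area of Baotou, with an overall estimation accuracy of correlation coefficient R^2 = 0.82 and root mean square error RMSE = 0.43 m^2/m^2. As the retrieval scale increases, the retrieval bias caused by inter-class differences among crops is clearly larger than that caused by intra-class heterogeneity, and the proposed scale conversion model corrects the scale effect of low-resolution LAI retrieval well.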
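LUT-based inversion amounts to simulating reflectance for many candidate LAI values in advance and then picking the entry that best matches an observation. The sketch below assumes a two-band toy forward model in place of PROSPECT+SAIL; the model, its parameters and the function names are hypothetical and carry no physical accuracy.

```python
import math


def toy_canopy_reflectance(lai, soil=(0.20, 0.25), veg=(0.05, 0.45), k=0.5):
    """Placeholder forward model (NOT PROSPECT+SAIL): soil/canopy mixing."""
    gap = math.exp(-k * lai)  # fraction of soil still visible
    return [s * gap + v * (1.0 - gap) for s, v in zip(soil, veg)]


def build_reflectance_lut(lai_values):
    """Run the forward model once per candidate LAI."""
    return [(lai, toy_canopy_reflectance(lai)) for lai in lai_values]


def rmse(a, b):
    """Root mean square error between two reflectance spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def invert_lai(lut, observed):
    """Return the LAI whose simulated spectrum is closest to `observed`."""
    return min(lut, key=lambda entry: rmse(entry[1], observed))[0]
```

In a per-crop setting such as the one the abstract describes, a separate table would be built for each crop class and the classification result would select which table to search.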