Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2007CB310900, the National Natural Science Foundation of China under Grant No. 60725208, and the Fellowships of the Japan Society for the Promotion of Science for Young Scientists Program.
Abstract: We consider the problem of saving cache energy on a multi-core processor. Previous research on low-power processors has proposed various methods for reducing power dissipation, and tag reduction is one of them. This paper extends the tag reduction technique from a single-core processor to a multi-core processor and investigates its energy-saving potential for multi-core processors. We formulate the approach as an equivalent assignment problem: assign all instruction pages in physical memory to a set of cores such that tag-reduction conflicts on each core are avoided or reduced as far as possible. We then propose three algorithms that use different heuristics to solve this assignment problem. Rather than relying on a traditional processor simulator, which cannot simulate operating system functions or the full memory hierarchy, we collect experimental data from a real operating system. Experimental results show that, compared with a processor that does not use tag reduction, our proposed algorithms save, on average, up to 83.93% of total energy on an 8-core processor and up to 76.16% on a 4-core processor. They also significantly outperform the tag-reduction-based algorithm for a single-core processor.
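The page-to-core assignment problem can be illustrated with a small greedy sketch. It assumes a hypothetical conflict model in which each core's cache keeps only the low `reduced_bits` bits of a page number as its tag, so two distinct pages on the same core that share those bits conflict. The helper names (`reduced_tag`, `greedy_assign`, `count_conflicts`) are hypothetical, and the greedy rule is a generic heuristic, not one of the paper's three algorithms.

```python
from collections import defaultdict

def reduced_tag(page, reduced_bits):
    # Hypothetical model: only the low `reduced_bits` bits of the page number are stored as the tag.
    return page & ((1 << reduced_bits) - 1)

def greedy_assign(pages, num_cores, reduced_bits):
    """Assign each page to the core where its reduced tag is least used, then by lightest load."""
    seen = [defaultdict(int) for _ in range(num_cores)]  # per core: reduced tag -> page count
    load = [0] * num_cores
    assignment = {}
    for page in pages:
        tag = reduced_tag(page, reduced_bits)
        core = min(range(num_cores), key=lambda c: (seen[c][tag], load[c], c))
        assignment[page] = core
        seen[core][tag] += 1
        load[core] += 1
    return assignment

def count_conflicts(assignment, reduced_bits):
    """Count pairs of distinct pages on the same core that share a reduced tag."""
    groups = defaultdict(list)
    for page, core in assignment.items():
        groups[(core, reduced_tag(page, reduced_bits))].append(page)
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())
```

For example, `greedy_assign([0, 4, 1, 5], 2, 2)` places pages 0 and 4 (same reduced tag) on different cores, leaving no conflicts under this model; with three pages sharing one reduced tag on two cores, one conflict is unavoidable.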
Funding: Supported by the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, 61874087, 61634004) and the Shaanxi Province Key R&D Plan (Nos. 2020JM-525, 2021GY-029, 2021KW-16).
Abstract: Extending 3D High Efficiency Video Coding (3D-HEVC) with depth modeling mode 4 (DMM-4) sharply increases computational complexity, which impairs the real-time performance of video coding. To reduce the computational complexity of DMM-4, this paper proposes a simplified, hardware-friendly contour prediction algorithm. Exploiting the similarity between the texture and the depth map, the proposed algorithm codes depth blocks directly to compute edge regions, thereby reducing the number of reference blocks. Verification on the test sequences with HTM 16.1 shows that the proposed algorithm reduces coding time by 9.42% compared with the original algorithm. To avoid the time cost of serial coding in HTM, the paper also proposes a parallel design of the algorithm on a reconfigurable array processor (DPR-CODEC). The parallel design reduces storage access time and configuration time and saves storage cost. Experimental results on a Xilinx Virtex-6 FPGA show that the parallel design can process HD 1080p video at more than 30 frames per second. Compared with related work, the scheme reduces LUTs by 42.3%, registers by 85.5%, and overall hardware resources by 66.7%. The data-loading speedup of the parallel scheme reaches 3.4539, and the average serial-to-parallel speedup in encoding time across templates of different sizes reaches 2.446.
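For context, DMM-4 derives a binary contour partition for a depth block from the co-located texture block. A minimal sketch follows, assuming the commonly described rule of thresholding the texture luma samples against the mean of the block's four corner samples; it illustrates the baseline mode, not the paper's simplified algorithm. The `edge_mask` helper, which marks samples bordering the other partition, is a hypothetical illustration of an "edge region".

```python
def dmm4_contour_pattern(texture):
    """Binary partition of an N x N block: 1 where the texture sample exceeds the threshold."""
    n = len(texture)
    # Assumed threshold: mean of the four corner samples of the co-located texture block.
    threshold = (texture[0][0] + texture[0][n - 1]
                 + texture[n - 1][0] + texture[n - 1][n - 1]) >> 2
    return [[1 if s > threshold else 0 for s in row] for row in texture]

def edge_mask(pattern):
    """Hypothetical edge region: samples with a 4-neighbour in the other partition."""
    n = len(pattern)
    mask = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < n and 0 <= nx < n and pattern[ny][nx] != pattern[y][x]:
                    mask[y][x] = 1
                    break
    return mask
```

For a 4×4 texture block whose left half is 10 and right half is 200, the threshold is 105, the pattern splits the block into left and right partitions, and only the two middle columns are marked as edge samples.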
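Abstract: To optimize the High Efficiency Video Coding (HEVC) entropy coding algorithm on the BWDSP platform, this paper analyzes the complexity of entropy coding in depth on the BWDSP simulation platform and, taking into account the hardware resources available on the BWDSP, applies a combined optimization scheme at three levels: restructuring the entropy coding algorithm for transform coefficient blocks of different sizes, memory optimization, and linear assembly optimization. The paper also proposes a DMA data-transfer optimization based on ping-pong buffering and designs a multi-task-level parallel processing scheme for a single-core DSP. Experimental results show that the optimized HEVC entropy coder runs significantly faster, with an average speedup of 15×.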
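Ping-pong (double) buffering overlaps DMA transfers with computation: while the core processes one buffer, the DMA engine fills the other. The sketch below simulates that schedule sequentially in Python; `chunks`, `compute`, and the event log are hypothetical stand-ins for BWDSP DMA transfers and entropy-coding work, and on real hardware each transfer would run concurrently with the computation logged after it.

```python
def process_with_ping_pong(chunks, compute):
    """Simulate ping-pong buffering; returns (results, event_log)."""
    buffers = [None, None]  # the two on-chip buffers ("ping" and "pong")
    results, log = [], []
    if not chunks:
        return results, log
    buffers[0] = chunks[0]  # prologue: first DMA transfer into buffer 0
    log.append(("dma", 0, 0))
    for i in range(len(chunks)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(chunks):
            # Start the next transfer into the idle buffer; on hardware this overlaps the compute below.
            buffers[nxt] = chunks[i + 1]
            log.append(("dma", i + 1, nxt))
        results.append(compute(buffers[cur]))
        log.append(("compute", i, cur))
    return results, log
```

The log shows the alternation: chunk `i + 1` is always loaded into the buffer that chunk `i` is not using, so the core never waits on a transfer except for the first one.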
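Abstract: The luma sub-pixel interpolation algorithm used in High Efficiency Video Coding (HEVC) fractional-pixel motion estimation is computationally heavy and highly redundant, and it is difficult to switch flexibly between different coding blocks. To address these problems, a dynamically reconfigurable implementation of sub-pixel interpolation with a high data reuse rate is proposed. Interpolation is performed adaptively on the reference pixel blocks surrounding a coding unit (CU) according to the CU's scale and size, yielding the coding mode and motion vector of the optimal prediction unit. Experimental results show that, compared with a dedicated-hardware implementation of sub-pixel interpolation, the method supports flexible switching between coding blocks while reducing the number of reference pixel reads by 43.8% and hardware resource consumption by 18.5%.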
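HEVC luma fractional-sample interpolation uses 8-tap filters for the quarter, half, and three-quarter positions. The sketch below applies them horizontally to one row of 8-bit samples and computes all three fractional positions from each 8-sample window, so each window is read once: a simple form of data reuse. It assumes 8-bit uni-prediction with default weighting (rounding as `(acc + 32) >> 6`, then clipping), and `interpolate_row` is a hypothetical name, not the paper's design.

```python
# HEVC luma interpolation filter taps, indexed by fractional position in quarter samples.
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),    # quarter-sample
    2: (-1, 4, -11, 40, 40, -11, 4, -1),  # half-sample
    3: (0, 1, -5, 17, 58, -10, 4, -1),    # three-quarter-sample
}

def interpolate_row(row, bit_depth=8):
    """Return {frac: [samples]} for positions x + frac/4, x in [3, len(row) - 5]."""
    max_val = (1 << bit_depth) - 1
    out = {frac: [] for frac in LUMA_FILTERS}
    for x in range(3, len(row) - 4):
        window = row[x - 3:x + 5]  # read the 8 integer samples once...
        for frac, taps in LUMA_FILTERS.items():  # ...and reuse them for every fraction
            acc = sum(c * s for c, s in zip(taps, window))
            out[frac].append(min(max((acc + 32) >> 6, 0), max_val))
    return out
```

Because every tap set sums to 64, a flat row interpolates to the same flat value at every fractional position.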
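Abstract: Dedicated-hardware implementations of the High Efficiency Video Coding (HEVC) intra prediction algorithm occupy substantial resources, cannot reuse hardware, and lack flexibility. To address these problems, a reconfigurable video array processor is proposed that dynamically maps intra prediction algorithms according to the characteristics of the current video sequence. First, the characteristics of HEVC intra prediction and the feasibility of reconfiguration are analyzed, and the threshold for early termination of coding block partitioning is used as the basis for hardware reconfiguration. Second, the computed parameters drive the reconfigurable array processor to reconfigure its hardware. Finally, the intra prediction algorithm is mapped onto the reconfigured array processor. The Planar and DC prediction modes were implemented on a 4×4 reconfigurable array. Results show that, compared with a dedicated-hardware implementation, resource usage is reduced by 65%, and compared with a multi-core processor implementation, latency is reduced by 32%.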
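The Planar and DC modes mapped onto the 4×4 array can be sketched as follows, assuming the HEVC prediction equations for an N×N block with N a power of two. The sketch omits reference-sample filtering and the DC-mode boundary smoothing that HEVC applies to small luma blocks, and the function names are hypothetical.

```python
def planar(top, left, top_right, bottom_left):
    """HEVC-style Planar prediction for an N x N block (N a power of two)."""
    n = len(top)
    shift = n.bit_length()  # equals log2(N) + 1 when N is a power of two
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            h = (n - 1 - x) * left[y] + (x + 1) * top_right   # horizontal interpolation
            v = (n - 1 - y) * top[x] + (y + 1) * bottom_left  # vertical interpolation
            pred[y][x] = (h + v + n) >> shift
    return pred

def dc(top, left):
    """HEVC-style DC prediction without boundary smoothing."""
    n = len(top)
    shift = n.bit_length()
    dc_val = (sum(top) + sum(left) + n) >> shift
    return [[dc_val] * n for _ in range(n)]
```

Both modes reduce to multiply-accumulate and shift operations over the neighbouring samples, which is what makes them convenient to map onto a small processing-element array.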
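Abstract: The deblocking filter is an important component of the High Efficiency Video Coding (HEVC) standard. Deblocking filter circuits implemented in dedicated hardware struggle to keep up with evolving algorithms, and reconfigurable computing, which combines computational efficiency with programming flexibility, has become a research focus. Based on a reconfigurable video array processor (RVAP) driven by a hybrid of instruction flow and data flow, a parallel implementation of a reconfigurable HEVC deblocking filter circuit is proposed. Data-flow-graph analysis is used to maximize the parallelism of the deblocking filter algorithm and improve computational efficiency, and flexible switching between strong and weak filtering improves the utilization of computing resources. Experimental results show that the proposed method meets the requirements for flexible algorithm switching and computation speed while reducing hardware resources by 47.6%, at a clock frequency of 167 MHz.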
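The strong/weak switching described above can be sketched for a single line of luma samples across an edge, using the HEVC filter equations. This is a simplified sketch: HEVC makes the on/off and strong/weak decisions once per four-line segment using lines 0 and 3, and its weak filter may also modify p1 and q1; here each line decides for itself and the weak filter changes only p0 and q0. `beta` and `tc` are passed in directly rather than derived from the QP tables, and `filter_line` is a hypothetical name.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def filter_line(p, q, beta, tc, bit_depth=8):
    """p = [p0, p1, p2, p3], q = [q0, q1, q2, q3], ordered outward from the edge."""
    max_val = (1 << bit_depth) - 1
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    dp = abs(p2 - 2 * p1 + p0)  # second-derivative activity on each side
    dq = abs(q2 - 2 * q1 + q0)
    if dp + dq >= beta:  # simplified per-line on/off decision
        return list(p), list(q), "none"
    strong = (2 * (dp + dq) < (beta >> 2)
              and abs(p3 - p0) + abs(q0 - q3) < (beta >> 3)
              and abs(p0 - q0) < ((5 * tc + 1) >> 1))
    if strong:  # modify up to three samples per side, clipped to +/- 2*tc
        new_p = [clip3(p0 - 2 * tc, p0 + 2 * tc, (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3),
                 clip3(p1 - 2 * tc, p1 + 2 * tc, (p2 + p1 + p0 + q0 + 2) >> 2),
                 clip3(p2 - 2 * tc, p2 + 2 * tc, (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3), p3]
        new_q = [clip3(q0 - 2 * tc, q0 + 2 * tc, (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3),
                 clip3(q1 - 2 * tc, q1 + 2 * tc, (p0 + q0 + q1 + q2 + 2) >> 2),
                 clip3(q2 - 2 * tc, q2 + 2 * tc, (p0 + q0 + q1 + 3 * q2 + 2 * q3 + 4) >> 3), q3]
        return new_p, new_q, "strong"
    delta = (9 * (q0 - p0) - 3 * (q1 - p1) + 8) >> 4
    if abs(delta) >= tc * 10:  # likely a real edge, not a blocking artifact
        return list(p), list(q), "none"
    delta = clip3(-tc, tc, delta)
    return ([clip3(0, max_val, p0 + delta), p1, p2, p3],
            [clip3(0, max_val, q0 - delta), q1, q2, q3], "weak")
```

With a flat step from 100 to 110 and `beta=64`, `tc=4` selects the weak filter (p0 becomes 104, q0 becomes 106), while `tc=6` passes the strong-filter test and smooths three samples on each side.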