Helper-thread of a task can hide the memory access time of irregular data on the chip muhi-core processor (CMP). For constructing a compiler that effectively supports the helper-thread of a task in the multi-core sc...Helper-thread of a task can hide the memory access time of irregular data on the chip muhi-core processor (CMP). For constructing a compiler that effectively supports the helper-thread of a task in the multi-core scenario based on the last level shared cache, this paper studies its performance stable condi- tions. Unfortunately, there is no existing model that allows extensive investigation of the impact of stable conditions, we present the base of pre-computation that is formalized by our degraded task-pair 〈 T, T' 〉 with the helper-thread, and its stable conditions are analyzed. Finally, a novel performance model and a constructing method of pre-computation based on our positive degraded task-pair are proposed. The efficient results are shown by our experiments. If we further exploit memory level parallelism (MLP) for our task-pair, the task-pair 〈 T, T' 〉 can reach better performance.展开更多
预执行帮助线程在预取过程中需要进行动态预取调节,而传统静态枚举控制参数值的控制方法在预取执行过程中保持固定不变,从而使得该方法不能够有效地为主线程提供预取质量保证(quality of service,Qo S)。针对该问题,提出了一种基于交织...预执行帮助线程在预取过程中需要进行动态预取调节,而传统静态枚举控制参数值的控制方法在预取执行过程中保持固定不变,从而使得该方法不能够有效地为主线程提供预取质量保证(quality of service,Qo S)。针对该问题,提出了一种基于交织预取率的帮助线程预取质量参数调节方法。首先,对帮助线程的预取Qo S优化进行了建模分析;其次,在前期交织预取工作的基础上,提出了基于交织预取率的帮助线程参数值调节算法;最后,在真实的商用多核平台上对所提出帮助线程预取调节算法进行了评测和分析。实验结果是所提出的帮助线程预取调节算法使得基准测试程序的几何平均性能加速比为1. 114,而传统静态枚举方法的几何平均性能加速比为1. 135。实验结果表明,所提出的帮助线程预取质量调节算法解决了帮助线程预取过程中的参数值自动调节问题,算法无须静态枚举参数值便可以快速获得与之相近似的预取性能提升。展开更多
文摘Helper-thread of a task can hide the memory access time of irregular data on the chip muhi-core processor (CMP). For constructing a compiler that effectively supports the helper-thread of a task in the multi-core scenario based on the last level shared cache, this paper studies its performance stable condi- tions. Unfortunately, there is no existing model that allows extensive investigation of the impact of stable conditions, we present the base of pre-computation that is formalized by our degraded task-pair 〈 T, T' 〉 with the helper-thread, and its stable conditions are analyzed. Finally, a novel performance model and a constructing method of pre-computation based on our positive degraded task-pair are proposed. The efficient results are shown by our experiments. If we further exploit memory level parallelism (MLP) for our task-pair, the task-pair 〈 T, T' 〉 can reach better performance.