Funding: Supported in part by the National Key Research and Development Program of China (2024YFB4505701) and the National Natural Science Foundation of China (62090024).
Abstract: The Sequential Task Flow (STF) model guides task parallelism by dynamically analyzing data dependencies at runtime, making it well suited to dynamic and irregular parallelism. However, it introduces additional dependency tracking overhead. As task granularity becomes finer or hardware parallelism increases, the traditional Centralized TDG Building (CB) algorithm progressively becomes a performance bottleneck. The Parallel TDG Building algorithm with Helpers (PBH), which leverages hardware message-passing mechanisms, has achieved significant speedups on the SW26010 platform, but its intensive, irregular, sub-microsecond synchronizations make it difficult to scale on cache-coherent multicore platforms. This paper proposes Cache-friendly PBH (CPBH), a parallel dependency tracking algorithm optimized for cache-coherent architectures. CPBH introduces a locality-aware, lock-free batch synchronization mechanism that reduces the overhead of atomic operation contention and improves data access locality. Additionally, it employs an asynchronous execution strategy that uses dynamic reference counting to overlap dependency tracking with task graph execution. Experiments on three cache-coherent multicore platforms using 10 HPC benchmarks demonstrate that, in fine-grained scenarios, CPBH achieves an average speedup exceeding 1.4× compared to CB and over 1.2× compared to DDAST.
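To make the dependency tracking step concrete, the following is a minimal sketch of centralized, sequential TDG building of the kind the abstract calls CB. It is not the authors' implementation: the class `TaskGraph`, its methods, and the string task and data names are hypothetical, and real runtimes track dependences on memory regions rather than on names. Tasks are submitted in program order with their read and write sets, and the builder derives read-after-write, write-after-write, and write-after-read edges, plus a per-task count of unfinished predecessors.

```python
from collections import defaultdict


class TaskGraph:
    """Toy centralized dependence tracker (hypothetical, for illustration only)."""

    def __init__(self):
        self.last_writer = {}               # data name -> last task that wrote it
        self.readers = defaultdict(list)    # data name -> readers since the last write
        self.successors = defaultdict(set)  # task -> tasks that must wait for it
        self.pending = {}                   # task -> number of predecessors

    def submit(self, task, reads=(), writes=()):
        """Register a task in program order and derive its dependences."""
        preds = set()
        for d in reads:                     # read-after-write
            if d in self.last_writer:
                preds.add(self.last_writer[d])
        for d in writes:
            if d in self.last_writer:       # write-after-write
                preds.add(self.last_writer[d])
            preds.update(self.readers[d])   # write-after-read
        # Update the per-data state only after all predecessors are collected.
        for d in reads:
            self.readers[d].append(task)
        for d in writes:
            self.last_writer[d] = task
            self.readers[d] = []
        self.pending[task] = len(preds)
        for p in preds:
            self.successors[p].add(task)

    def waves(self):
        """Group tasks into waves that could run concurrently."""
        pending = dict(self.pending)
        ready = [t for t, n in pending.items() if n == 0]
        result = []
        while ready:
            result.append(sorted(ready))
            released = []
            for t in ready:                 # finishing t releases its successors
                for s in self.successors[t]:
                    pending[s] -= 1
                    if pending[s] == 0:
                        released.append(s)
            ready = released
        return result


g = TaskGraph()
g.submit("A", writes=["x"])
g.submit("B", reads=["x"], writes=["y"])
g.submit("C", reads=["x"], writes=["z"])
g.submit("D", reads=["y", "z"], writes=["x"])
print(g.waves())
```

In this sketch every call to `submit` runs on a single thread and touches shared per-data state, which is the serialization point that becomes costly as tasks get finer.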
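Abstract: Objective. Industrial defect detection is a critical part of modern industrial quality control. Multimodal industrial defect detection faces two challenges: capturing defects of varying shapes and sizes that are barely perceptible in RGB images, and reducing the interference that noise in each modality's raw feature space causes in multimodal information interaction. To address these challenges, a multimodal, multiscale defect detection method based on normalizing flows is proposed. Methods. First, a Vision Transformer and a Point Transformer extract features from blocks 1, 3, and 11 for the two modalities, RGB images and 3D point clouds, to build a feature pyramid; retaining the spatial information of low-level features aids defect localization and improves the model's robustness to defects of different shapes and sizes. Second, to simplify multimodal interaction, a point feature alignment algorithm aligns the 3D point cloud features to the plane of the RGB image, and a contrastive learning matrix is constructed to achieve unsupervised multimodal feature fusion, promoting information exchange between modalities. In addition, a proxy task is designed to extend the information bottleneck mechanism to the unsupervised setting, reducing noise interference while preserving as much of the original information as possible and yielding a more sufficient and powerful multimodal representation. Finally, a multiscale normalizing flow structure captures feature information at different scales and enables interaction among features of different scales. Results. Evaluated on the MVTec-3D AD dataset, the method reaches 93.3% Detection AUCROC (area under the curve of the receiver operating characteristic), 96.1% Segmentation AUPRO (area under the per-region overlap curve), and 98.8% Segmentation AUCROC, outperforming most existing multimodal defect detection methods. Conclusion. The method detects defects of varying shapes and sizes that are barely perceptible in RGB images well. It reduces the influence of noise in the raw feature space on the multimodal representation, generalizes to some extent across defects of different shapes and sizes, and meets the defect detection requirements of modern industry well.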
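For readers unfamiliar with the Detection AUCROC metric reported above, here is a minimal sketch of how it can be computed from per-sample anomaly scores. It uses the pairwise definition: the probability that a randomly chosen defective sample scores higher than a randomly chosen normal one, with ties counted as one half. The function name `auroc` and the toy labels and scores are assumptions for illustration, not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # defective samples
    neg = [s for y, s in zip(labels, scores) if y == 0]  # normal samples
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = defective) and anomaly scores.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is fine for a sketch; a rank-based formulation gives the same value in O(n log n) for large test sets.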
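Abstract: Sequential task flow (STF) expresses accesses to shared data as dependences between tasks. An STF runtime system achieves asynchronous parallelism through three stages: task construction, dependence analysis with task dependence graph (TDG) generation, and task scheduling. The overhead of these three stages directly affects the performance of parallel programs. AceMesh, a runtime system built around STF, currently uses only a single master core to build the graph on the SW39000 processor while multiple slave cores execute tasks. However, the SW39000 processor performs poorly on scattered memory accesses, and fine-grained tasks increase the number of scattered accesses during graph building, so graph building becomes a bottleneck more easily. To address this, an algorithm is proposed in which multiple slave cores assist the master core in building the graph. First, the parallelism available in dependence analysis and TDG generation is analyzed, and PFBH (parallelized fatTDG building algorithm with helpers), a multicore-assisted parallel graph building algorithm based on the fat task dependence graph (fatTDG), is implemented and optimized on the SW39000 processor. Second, to address contention for main memory among threads, a method for adjusting slave-core resources while graph building and execution run in parallel is proposed, together with its parameter selection. Finally, experiments are conducted on five classes of typical applications. Compared with the single-core serial graph building system, the maximum speedup in fine-grained task scenarios is 1.75×; compared with the OpenACC model on the SW39000 processor, AceMesh achieves a speedup of up to 2×.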
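The parallelism in dependence analysis mentioned above can be illustrated with a generic sketch. It is not PFBH or the fatTDG data structure, whose details the abstract does not give. The assumption here is a common one: the state needed to analyze accesses to one data item (its last writer and its readers since that write) is independent of every other data item, so accesses can be routed to helpers by data id and analyzed separately. The functions `track` and `track_sharded`, the integer data ids, and the modulo routing rule are all hypothetical, and the helpers are simulated by a plain loop rather than real cores.

```python
def track(accesses):
    """Derive dependence edges from (task, data_id, mode) accesses in program order."""
    last_writer, readers, edges = {}, {}, set()
    for task, data_id, mode in accesses:
        writer = last_writer.get(data_id)
        if writer is not None and writer != task:
            edges.add((writer, task))            # read-after-write or write-after-write
        if mode == "r":
            readers.setdefault(data_id, []).append(task)
        else:
            for reader in readers.get(data_id, []):
                if reader != task:
                    edges.add((reader, task))    # write-after-read
            last_writer[data_id] = task
            readers[data_id] = []
    return edges


def track_sharded(accesses, n_helpers):
    """Route accesses to helpers by data id, analyze each shard, merge the edges."""
    shards = [[] for _ in range(n_helpers)]
    for access in accesses:                      # the master only routes
        shards[access[1] % n_helpers].append(access)
    edges = set()
    for shard in shards:                         # each shard could run on its own helper
        edges |= track(shard)
    return edges


# Tasks 0..3 touching data items 0 (x), 1 (y), 2 (z); hypothetical workload.
accesses = [
    (0, 0, "w"),
    (1, 0, "r"), (1, 1, "w"),
    (2, 0, "r"), (2, 2, "w"),
    (3, 1, "r"), (3, 2, "r"), (3, 0, "w"),
]
print(sorted(track_sharded(accesses, 3)))
```

Because program order is preserved within each shard, the merged edge set equals the one a single centralized tracker would produce; what a real design must add is the synchronization that delivers those edges to the scheduler.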
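Abstract: To address the high transmission cost and the heavy computational load at the decoder in multi-vision-task scenarios, an adaptive scalable video coding (ASVC) transmission framework is proposed. It divides video into a semantic layer and a background layer, which carry semantic and background information respectively. In addition, an adaptive compression algorithm is proposed: a C4.5 decision tree model analyzes the network environment to decide how the video is compressed, and optical flow analysis is applied to the frame sequence so that frames with significant change are retained while an interpolation mechanism keeps the picture smooth. Simulation results show that ASVC achieves higher recognition precision under different bitrate conditions and significantly improves video quality and transmission efficiency.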
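The abstract does not say which network features the C4.5 model uses, so the following is only a sketch of the criterion C4.5 applies when choosing a split: the gain ratio, which is information gain divided by the entropy of the split itself. The attribute names `bandwidth` and `loss`, the target `action`, and the four sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """C4.5 split criterion: information gain normalized by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical observations of network state and the chosen compression action.
rows = [
    {"bandwidth": "low", "loss": "high", "action": "compress"},
    {"bandwidth": "low", "loss": "low", "action": "compress"},
    {"bandwidth": "high", "loss": "high", "action": "keep"},
    {"bandwidth": "high", "loss": "low", "action": "keep"},
]
best = max(["bandwidth", "loss"], key=lambda a: gain_ratio(rows, a, "action"))
print(best)  # bandwidth
```

In this toy data `bandwidth` determines the action completely, so it has gain ratio 1 and would become the root test, while `loss` carries no information and scores 0.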