Abstract
Non-uniform memory access (NUMA) is the mainstream memory access architecture on modern multicore, multi-socket processor platforms, and NUMA access latency has a significant impact on database query performance. Reducing the latency of cross-NUMA-node accesses during query processing is therefore one of the key issues in modern in-memory database query optimization. Because processors differ considerably in NUMA architecture and NUMA latency, NUMA optimization techniques must be combined with hardware characteristics. This study focuses on the in-memory foreign key join algorithm, which has the highest execution cost and the strongest dependence on data locality in in-memory databases, and explores different NUMA optimization methods, including NUMA-conscious and NUMA-oblivious implementations, on five representative processor platforms: ARM, Intel CLX, Intel ICX, AMD Zen2, and AMD Zen3. Different optimization schemes are applied to data storage, data partitioning, and the caching of join intermediate results, and algorithm performance is compared across the processor architectures. Experimental results show that NUMA-conscious optimization strategies require the integration of software and hardware. Radix Join is neutral in its sensitivity to NUMA latency, with NUMA optimization gains stable at around 30% on all five processor platforms. The NPO algorithm is highly sensitive to NUMA latency, with NUMA optimization gains of 38%-57% across platforms. The Vector Join algorithm is sensitive to NUMA latency, but the impact is relatively small, with NUMA optimization gains of 1%-25%; in terms of algorithm performance characteristics, Vector Join is affected more by cache efficiency than by NUMA latency. NUMA-conscious optimization techniques differ substantially on the ARM platform but differ very little on the x86 platforms, while NUMA-oblivious algorithms have lower complexity and better generality. Judging from processor hardware trends, reducing NUMA access latency can effectively narrow the performance gap between different NUMA-conscious optimization algorithms, simplify join algorithm complexity, and improve join operation performance.
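For readers unfamiliar with NUMA-conscious data placement, the following is a minimal, illustrative C sketch (not the implementation evaluated in the paper) of the basic idea behind such optimizations: each partition of the build-side key array is allocated on a specific NUMA node via libnuma, and the worker thread that processes it is pinned to the same node so that its accesses stay node-local. The data sizes, struct and function names, and the per-partition work (a checksum standing in for hash-table build and probe) are illustrative assumptions; error handling is largely omitted.

/*
 * Illustrative sketch only: NUMA-conscious partitioning with libnuma.
 * Each partition of the build-side keys is placed on one NUMA node and
 * processed by a worker pinned to that node, keeping accesses node-local.
 * Build: gcc numa_partition.c -lnuma -lpthread
 */
#include <numa.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N_KEYS (1 << 20)   /* hypothetical build-side size */

typedef struct {
    int      node;      /* NUMA node this partition lives on         */
    int64_t *keys;      /* keys copied into node-local memory        */
    size_t   n;         /* number of keys in this partition          */
    int64_t  checksum;  /* stand-in for the per-partition build work */
} partition_t;

/* Worker: pin to the partition's node, then touch only local memory. */
static void *build_local(void *arg)
{
    partition_t *p = (partition_t *)arg;
    numa_run_on_node(p->node);          /* keep execution on the owning node */
    int64_t sum = 0;
    for (size_t i = 0; i < p->n; i++)   /* placeholder for hash-table build  */
        sum += p->keys[i];
    p->checksum = sum;
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA API not available\n");
        return 1;
    }
    int nodes = numa_num_configured_nodes();

    /* Source data, as it might sit in a NUMA-oblivious layout. */
    int64_t *src = malloc(N_KEYS * sizeof(int64_t));
    for (size_t i = 0; i < N_KEYS; i++)
        src[i] = (int64_t)i;

    partition_t *parts = calloc(nodes, sizeof(partition_t));
    pthread_t   *tids  = calloc(nodes, sizeof(pthread_t));

    /* Range-partition the keys and place each chunk on its own node. */
    size_t chunk = N_KEYS / nodes;
    for (int n = 0; n < nodes; n++) {
        size_t lo = (size_t)n * chunk;
        size_t hi = (n == nodes - 1) ? N_KEYS : lo + chunk;
        parts[n].node = n;
        parts[n].n    = hi - lo;
        parts[n].keys = numa_alloc_onnode(parts[n].n * sizeof(int64_t), n);
        for (size_t i = 0; i < parts[n].n; i++)
            parts[n].keys[i] = src[lo + i];
        pthread_create(&tids[n], NULL, build_local, &parts[n]);
    }

    for (int n = 0; n < nodes; n++) {
        pthread_join(tids[n], NULL);
        printf("node %d: %zu keys, checksum %lld\n",
               n, parts[n].n, (long long)parts[n].checksum);
        numa_free(parts[n].keys, parts[n].n * sizeof(int64_t));
    }
    free(parts); free(tids); free(src);
    return 0;
}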
Authors
HAN Rui-Chen
ZHANG Yan-Song
LIU Zhuan
ZHANG Yu
JIAO Min
WANG Shan
HAN Rui-Chen; ZHANG Yan-Song; LIU Zhuan; ZHANG Yu; JIAO Min; WANG Shan (Engineering Research Center of Database and Business Intelligence, Ministry of Education, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University), Ministry of Education, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; National Survey Research Center, Renmin University of China, Beijing 100872, China; Intel China Research Center Ltd., Beijing 100190, China; National Satellite Meteorological Center, Beijing 100081, China)
Source
Journal of Software (《软件学报》)
Peking University Core Journal (北大核心)
2025, Issue 12, pp. 5821-5850 (30 pages)
Funding
National Key Research and Development Program of China (2023YFB4503600)
National Natural Science Foundation of China (U23A20299, 62172424, 62276270, 62322214)