Abstract
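With three-dimensional (3D) seismic data acquisition, oil and gas field development, and related new technologies driving rapid data updates, the efficient storage, processing, and analysis of 3D data volumes in a big data setting have become increasingly complex. Drawing on the design philosophy of GFS (Google File System) and building on its key distributed processing techniques, this paper designs the latest software architecture of PetroV (Petroleum Valuation, an integrated software system for exploration deployment decision-making) and develops a series of key software techniques for big data storage and analysis. Using the fast spatial location and multi-resolution mechanisms of the octree structure and its encoding in 3D space, we implement hierarchical structuring and block-wise storage of large 3D data volumes together with a two-level memory cache framework, which supports concurrent access and streaming display at different resolutions. The octree distributed storage framework, consisting of a data access client, a metadata service, and sub-block access services, hides the fact that the sub-volume data blocks produced by octree partitioning are stored redundantly or backed up in the background across more than one hundred computers, and provides a distributed file access interface nearly identical to that of a stand-alone file system. The distributed big data analysis framework, consisting of a task execution client, a task management service, and a series of specialized interpretation algorithm services for seismic, digital core, or well-logging data, exploits octree block storage to implement a "divide and conquer" parallel programming model, which significantly reduces the complexity of implementing parallel programs. The PetroV software architecture for big data, and the series of specialized software versions derived from it, ultimately aim to promote a deep integration of application-driven needs with new technology development, to pursue excellence continuously, and to strengthen independent innovation capability.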
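The octree encoding idea described above can be illustrated with a minimal sketch. It assumes a cubic grid of 2^depth blocks per axis and a location code of one octal digit per level; the function names (`octree_code`, `parent_code`, `block_of_sample`) and the bit ordering are hypothetical choices for illustration, not PetroV's actual encoding.

```python
def octree_code(ix, iy, iz, depth):
    """Return a location code for block (ix, iy, iz) at the given depth.

    Assumed scheme: each level contributes one octal digit (0-7) built
    from one bit of each block index, coarsest level first.
    """
    digits = []
    for level in range(depth - 1, -1, -1):
        octant = (
            (((iz >> level) & 1) << 2)
            | (((iy >> level) & 1) << 1)
            | ((ix >> level) & 1)
        )
        digits.append(str(octant))
    return "".join(digits)


def parent_code(code):
    """Code of the coarser-resolution block that contains this block."""
    return code[:-1]


def block_of_sample(x, y, z, block_size, depth):
    """Locate the finest-level block that holds sample (x, y, z)."""
    return octree_code(x // block_size, y // block_size, z // block_size, depth)
```

Under this scheme, a block's code is a prefix-extension of its parent's code, so moving to a coarser resolution is a string truncation and locating a sample needs only integer division and bit operations.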
As more 3D seismic data are acquired and oilfield development data are updated ever more rapidly, efficient mass data storage and analysis has become one of the toughest challenges for software system architecture in the big data era. Inspired by the distributed system framework of the Google File System (GFS) and MapReduce published by Google, we design the PetroV distributed system framework and develop several key techniques for mass data storage and analysis. First, 3D seismic or digital rock data cubes are split with the spatial octree encoding algorithm into a multi-scale partition structure and data blocks. Then, in contrast to SEGY sequential access, an octree data block access framework is implemented that supports parallel read and write, multi-scale stream rendering, and a two-level memory cache. After that, intuitive and heuristic distributed file storage solutions, organized into master nodes and data chunk nodes, are deployed transparently on commodity computers. Distributed files have an API similar to that of a stand-alone file, while the massive 3D data blocks are spread across commodity machines with redundant backup. Finally, thanks to the octree data blocks, a MapReduce-style parallel analysis framework is developed on a large number of task nodes.
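The block-wise "divide and conquer" analysis can be sketched as a map step applied independently to each data block, followed by a reduce step that merges the partial results. The example below is an assumption-laden illustration: it computes an RMS amplitude, uses a local thread pool as a stand-in for distributed task nodes, and represents blocks as a plain dictionary keyed by a block code. None of these names or choices come from PetroV itself.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_block(samples):
    """Map step: partial result for one block (sum of squares, sample count)."""
    return sum(v * v for v in samples), len(samples)


def reduce_partials(a, b):
    """Reduce step: merge two partial results."""
    return a[0] + b[0], a[1] + b[1]


def rms_amplitude(blocks, workers=4):
    """RMS amplitude of a volume stored as independent blocks.

    `blocks` maps a hypothetical block code to that block's samples.
    Threads stand in here for the task nodes of a distributed framework.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks.values()))
    total_sq, count = reduce(reduce_partials, partials, (0.0, 0))
    return math.sqrt(total_sq / count) if count else 0.0
```

Because each block is processed without reference to any other, the map step needs no coordination between workers; only the small partial results are combined at the end.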
Source
《石油地球物理勘探》
EI
CSCD
Peking University Core Journals (北大核心)
2017, No. 4, pp. 875-883 (9 pages)
Oil Geophysical Prospecting
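Funding
Jointly funded by the National Major Project on Shale Oil (2017ZX05049001-007)
and the Chinese Academy of Sciences Class A Strategic Priority Research Program (XDAXX010405)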