
Deep learning empowers generation and presentation of virtual-real fusion scenarios in holographic metaverse: development and prospects (invited)
Abstract: The metaverse is a pioneering and supporting technology for the transformation of the internet, and it indicates that expanding the dimensions of information and innovating immersive experiences are the future directions of internet development. Digital 3D content is the core element of the metaverse and the main medium for carrying information and delivering feedback. Generating 3D content through digital rendering and presenting it through holographic display offer significant advantages in image quality, device cost, and application flexibility, and therefore have broad prospects in the metaverse. This paper compares the performance of commonly used digital rendering techniques, introduces the role of monocular depth estimation in the 3D digitization of real scenes, and reviews the development of artificial-intelligence-based monocular depth estimation in its two categories, supervised and unsupervised. It emphasizes that overcoming the accuracy and speed bottlenecks of depth estimation is the main challenge for applying monocular depth estimation to metaverse content generation, and introduces potential solutions, including optimization of the regression estimation interval, compression of redundant feature parameters, and multi-dimensional feature correlation. The paper then introduces applications of artificial intelligence to computer-generated hologram (CGH) generation, reviews the development of data-driven and model-driven CGH generation networks, and identifies the limited reconstructable depth range of holographic display results as the main challenge for CGH generation networks in metaverse content presentation. Potential solutions are introduced, including filtering of hologram frequency components, optimization of initial computation conditions, and selection of model convergence paths. In summary, improving the quality and efficiency of 3D content generation and presentation is an inevitable requirement that the metaverse places on computational holographic 3D display.

Significance: The metaverse is a guiding and supporting technology for the internet revolution. It can enhance visual experience and interaction efficiency, offering prominent economic and social benefits. Digital 3D content is a core element of the metaverse, serving as the primary medium for visual information and interactive feedback. Thus, the generation and presentation of 3D content are critical to constructing the metaverse (Fig. 1-Fig. 2). Generating 3D content through digital rendering and presenting it through holographic display is a sound combination for building the metaverse, because it strikes a balance among visual fidelity, device cost, and deployment complexity. However, in real-world digitalization tasks, this combination often faces bottlenecks in calculation speed and presentation quality caused by the massive computational load. Fortunately, advances in neural networks provide a powerful tool for breaking through these bottlenecks.

Progress: Depth estimation, i.e., the digital 3D rendering of 2D images, can be categorized into multi-view estimation, motion estimation, and monocular estimation. Monocular depth estimation takes single-view 2D images as input, offering high deployment flexibility and low device cost. Monocular depth estimation networks can be categorized into supervised and unsupervised types (Fig. 3). Supervised networks require depth-labeled datasets as supervisory signals for parameter training, but their practical application is often limited by the difficulty of obtaining labeled datasets. Unsupervised networks rely primarily on mathematical priors to estimate depth, which significantly reduces their dependence on labeled datasets, but their performance still needs continued improvement. Currently, monocular depth estimation networks face two challenges: insufficient estimation robustness and inadequate calculation speed. To rapidly construct high-quality 3D content for the metaverse, the constraints in monocular depth estimation require further in-depth investigation. Potential research directions include optimizing estimation intervals, reducing feature redundancy in depth estimation, and strengthening the correlation between monocular and multi-view estimation (Fig. 4).

Holographic display is an ideal solution for presenting digital 3D content in the metaverse. Phase-only holograms, with their high energy efficiency and absence of twin-image artifacts, are a superior medium for dynamic 3D content. However, generating a phase-only hologram is an ill-posed problem, which limits both computational speed and accuracy. Neural networks, being well suited to ill-posed problems, provide a powerful tool for calculating phase-only holograms. Generation networks for phase-only holograms can be categorized into data-driven and model-driven types (Fig. 5). Data-driven networks require 3D targets and their corresponding phase-only holograms to update network parameters, but obtaining high-quality hologram datasets demands significant computational resources. Model-driven networks train the network with physical constraints, so that its inference capability is no longer limited by dataset quality. Currently, holographic display often suffers from limited depth ranges in optical reconstructions. To extend the depth range, it is critical to address the constraints that computational strategies impose on solving ill-posed problems. Further research directions include frequency filtering of phase-only holograms, optimization of initial calculation conditions, and selection of solution paths (Fig. 6).

Conclusion and prospect: Integrating metaverse technology with internet technology has the potential to revolutionize many fields, including education, social interaction, healthcare, and industry. Neural networks, as rapid and accurate calculation tools, provide an ideal solution for generating and presenting 3D content in the metaverse. Limited estimation robustness and calculation speed are bottlenecks for 3D content generation, and research on the constraints in monocular depth estimation is needed to break through them. The limited depth range of optical reconstructions is a major challenge for the holographic presentation of 3D content, and addressing it requires optimizing the calculation strategies used to solve ill-posed problems. Building on this research, 3D acquisition and projection systems could be constructed in the foreseeable future, injecting strong momentum into the sustainable development of virtual-real interaction in the metaverse.
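As an illustration of the supervised training described above, where a network's predicted depth map is compared against a depth label, the sketch below implements the scale-invariant logarithmic loss introduced by Eigen et al. (2014), a common choice for supervised monocular depth estimation. It is not claimed to be the loss used by any particular network reviewed in this paper; the function name `scale_invariant_log_loss`, the default weight `lam=0.5`, and the convention that a label of 0 marks a missing depth value are assumptions made for this example.

```python
import numpy as np


def scale_invariant_log_loss(pred, gt, lam=0.5, eps=1e-8):
    """Scale-invariant log loss (Eigen et al., 2014) for supervised depth training.

    pred, gt: arrays of the same shape with depths in the same unit.
    Assumption for this sketch: gt <= 0 marks a pixel without a depth label.
    lam=0 gives a plain log-depth MSE; lam=1 ignores a global scale of pred.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0                                   # supervise only labeled pixels
    if not np.any(valid):
        raise ValueError("no valid depth labels")
    # Log-depth residual; predictions are clipped to eps to keep the log finite.
    d = np.log(np.maximum(pred[valid], eps)) - np.log(gt[valid])
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)


# Hypothetical example: a 2x2 label map with one missing pixel (value 0).
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[2.0, 4.0], [8.0, 3.0]])  # twice the label wherever a label exists
print(scale_invariant_log_loss(pred, gt, lam=1.0))  # numerically zero
print(scale_invariant_log_loss(pred, gt, lam=0.5))
```

With `lam=1.0` the loss ignores a global scale factor between prediction and label, which reflects the scale ambiguity of single-view depth; smaller values of `lam` penalize scale errors partially, and `lam=0` penalizes them fully.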
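The model-driven idea described above, training with physical constraints instead of precomputed hologram labels, can be sketched as a physics-based loss: the predicted phase-only hologram is numerically propagated to the reconstruction plane and its amplitude is compared with the target. The NumPy sketch below assumes scalar diffraction modeled with the angular spectrum method, a unit-amplitude phase-only SLM, a single target plane, and illustrative parameters (520 nm wavelength, 8 µm pixel pitch, 50 mm distance) that do not come from this paper; `angular_spectrum_propagate` and `model_driven_loss` are hypothetical helper names. A real model-driven network would express the same forward model in a differentiable framework so the loss can be backpropagated into the network weights.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pitch, z):
    """Propagate a sampled complex field by distance z (angular spectrum method).

    field: 2D complex array sampled at `pitch` (m); wavelength and z in metres.
    Evanescent components (fx^2 + fy^2 > 1/wavelength^2) are discarded.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)                 # both of shape (ny, nx)
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    H = np.where(propagating, np.exp(1j * kz * z), 0.0)   # transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)


def model_driven_loss(phase, target_amp, wavelength=520e-9, pitch=8e-6, z=0.05):
    """Physics-based loss: propagate a phase-only hologram, compare amplitudes.

    Only the target amplitude is needed, not a hologram label. The
    reconstruction is rescaled by the least-squares factor s because its
    overall brightness is arbitrary (an assumption of this sketch).
    """
    slm_field = np.exp(1j * phase)               # unit-amplitude, phase-only SLM
    recon_amp = np.abs(angular_spectrum_propagate(slm_field, wavelength, pitch, z))
    s = np.sum(recon_amp * target_amp) / np.sum(recon_amp ** 2)
    return float(np.mean((s * recon_amp - target_amp) ** 2))


# Hypothetical example: score a random-phase hologram against a square target.
rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 2 * np.pi, size=(256, 256))
target = np.zeros((256, 256))
target[96:160, 96:160] = 1.0
print(model_driven_loss(phase, target))
```

Because the loss needs only the target amplitude, no hologram dataset has to be computed in advance, which is the advantage of model-driven training noted above. Multi-plane (3D) targets could be handled by summing this loss over several propagation distances `z`, again as an assumption rather than the method of any specific network.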
Authors: HE Zehao (何泽浩), GAO Yunhui (高云晖), CAO Liangcai (曹良才), ZHANG Yan (张岩) (Department of Physics, Capital Normal University, Beijing 100048, China; Department of Precision Instrument, Tsinghua University, Beijing 100084, China)
Source: Infrared and Laser Engineering (《红外与激光工程》, Peking University core journal), 2025, Issue 7, pp. 54-67 (14 pages)
Funding: National Natural Science Foundation of China (62205173, 62441613).
Keywords: metaverse; depth estimation; computer-generated holography; 3D imaging; 3D display