Abstract: The use of pretrained backbones with fine-tuning has shown success for 2D vision and natural language processing tasks, with advantages over task-specific networks. In this paper, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We designed a 3D Swin Transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on the synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets but also outperforms state-of-the-art methods on downstream tasks, with +2.3 mIoU and +2.2 mIoU on S3DIS Area 5 and 6-fold semantic segmentation, respectively, +1.8 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. A series of extensive ablation studies further validates the scalability, generality, and superior performance enabled by our approach.
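The key efficiency claim above — self-attention on sparse voxels with linear memory complexity — comes from restricting attention to local windows, so each voxel attends only to voxels sharing its window. A minimal NumPy sketch of that idea follows; it is illustrative only and not the paper's implementation (the window-bucketing and plain scaled dot-product attention here are our own simplifications):

```python
import numpy as np

def window_self_attention(coords, feats, window_size):
    """Sketch: self-attention restricted to non-overlapping 3D windows
    of sparse voxels. Cost is sum over windows of n_w^2, which stays
    near-linear in the voxel count when window occupancy is bounded.
    coords: (N, 3) integer voxel coordinates; feats: (N, C) features."""
    win_idx = coords // window_size                # window coordinate per voxel
    buckets = {}
    for i, key in enumerate(map(tuple, win_idx)):
        buckets.setdefault(key, []).append(i)      # group voxels by window
    out = np.zeros_like(feats)
    for idx in buckets.values():
        f = feats[idx]                             # (n_w, C) window features
        attn = f @ f.T / np.sqrt(f.shape[1])       # scaled dot-product scores
        attn = np.exp(attn - attn.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)    # row-wise softmax
        out[idx] = attn @ f                        # attend within the window only
    return out
```

A voxel that is alone in its window simply attends to itself, so sparsity never forces a dense N-by-N attention matrix.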
Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Model (LLM)-based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructions and generate results, while the discriminator evaluates the outputs and provides feedback for the generator agents to further reflect on and improve the generation results. Unlike previous generative models, our system exposes the intermediate steps of generation. This transparency allows each generator agent to learn from the other's successful executions, enabling a collaborative competition that enhances the quality and robustness of the system's results. The primary focus of this study is image editing, demonstrating the CCA's ability to handle intricate instructions robustly. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization, a detailed examination of agent relationships, and comprehensive experiments on image editing.
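The generator/discriminator loop described above can be sketched as plain control flow. In this hedged sketch the agents are ordinary callables standing in for the LLM-backed agents, and the feedback format is our own hypothetical choice; only the loop structure (independent generation, discriminator scoring, shared intermediate result, iterative revision) follows the abstract:

```python
def cca_round(generators, discriminator, instruction, rounds=3):
    """Sketch of the collaborative-competitive loop: each generator
    independently produces a result, the discriminator scores all
    results, and every generator receives feedback plus the current
    best result (the transparent intermediate step) for the next
    round. Agent internals are stubbed."""
    feedback = ["" for _ in generators]
    best_so_far = None
    for _ in range(rounds):
        results = [g(instruction, fb, best_so_far)
                   for g, fb in zip(generators, feedback)]
        scores = [discriminator(instruction, r) for r in results]
        best = max(range(len(results)), key=lambda i: scores[i])
        best_so_far = results[best]            # shared intermediate step
        feedback = [f"score={s:.2f}; best so far: {best_so_far}"
                    for s in scores]           # hypothetical feedback format
    return results[best], scores[best]
```

Because the winning result is visible to both generators, the losing agent can imitate the successful execution in the next round — the "collaborative" half of the competition.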
Abstract: Photon mapping is widely used for global illumination rendering because of its high computational efficiency. But its efficiency is still limited, mainly by the intensive sampling required in final gathering, a process that is critical for removing low-frequency artifacts of density estimation. In this paper, we propose a method to predict the final gathering estimation from direct density estimation, thereby achieving high-quality global illumination by photon mapping with high efficiency. We first sample the irradiance of a subset of shading points by both final gathering and direct radiance estimation. Then we use the samples as a training set to predict the final gathered irradiance of other shading points through regression. Consequently, we achieve about a three-times overall speedup compared with straightforward final gathering in global illumination computation, with the same rendering quality.
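The regression step above — fit on a small subset where both the cheap and the expensive estimates are available, then predict the expensive one everywhere — can be sketched with a least-squares fit. The linear model and scalar feature here are our own stand-ins; the abstract does not specify the regressor or feature set:

```python
import numpy as np

def predict_final_gather(direct_est, train_idx, train_fg):
    """Sketch: learn a mapping from direct (density-estimation)
    irradiance to final-gathered irradiance on a training subset of
    shading points, then apply it to all points.
    direct_est: (N,) direct estimates at every shading point.
    train_idx:  indices of points also sampled by final gathering.
    train_fg:   final-gathered irradiance at those indices."""
    # Affine model fg ~ a * direct + b, fit by least squares.
    X = np.column_stack([direct_est[train_idx], np.ones(len(train_idx))])
    coef, *_ = np.linalg.lstsq(X, train_fg, rcond=None)
    X_all = np.column_stack([direct_est, np.ones(len(direct_est))])
    return X_all @ coef
```

The speedup comes from running the expensive final gathering only at the training subset; every other shading point gets its irradiance from the cheap direct estimate pushed through the fitted model.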