Journal Articles
7 articles found
1. Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality (Cited by: 7)
Authors: Jinyu LI, Bangbang YANG, Danpeng CHEN, Nan WANG, Guofeng ZHANG, Hujun BAO. Virtual Reality & Intelligent Hardware, 2019, No. 4, pp. 386-410
Although VSLAM/VISLAM has achieved great success, it is still difficult to quantitatively evaluate the localization results of different kinds of SLAM systems from the perspective of augmented reality, due to the lack of an appropriate benchmark. In practical AR applications, a variety of challenging situations (e.g., fast motion, strong rotation, severe motion blur, dynamic interference) are easily encountered, since a home user may not move the AR device carefully and the real environment may be quite complex. In addition, for a good AR experience, the frequency of camera tracking loss should be minimized, and recovery from the failure state should be fast and accurate. Existing SLAM datasets/benchmarks generally only evaluate pose accuracy, and their camera motions are relatively simple and do not fit the common cases in mobile AR applications well. With the above motivation, we build a new visual-inertial dataset as well as a series of evaluation criteria for AR. We also review existing monocular VSLAM/VISLAM approaches with detailed analyses and comparisons. In particular, we select 8 representative monocular VSLAM/VISLAM approaches/systems and quantitatively evaluate them on our benchmark. Our dataset, sample code, and corresponding evaluation tools are available at the benchmark website http://www.zjucvg.net/eval-vislam/.
Keywords: visual-inertial SLAM; odometry; tracking; localization; mapping; augmented reality
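The pose-accuracy evaluation such benchmarks start from is typically the absolute trajectory error (ATE); a minimal pure-Python sketch, assuming the ground-truth and estimated trajectories are already time-synchronized and expressed in the same frame (real evaluations first align them; the function name and data are illustrative):

```python
import math

def absolute_trajectory_error(gt, est):
    """RMSE of positional error between ground-truth and estimated camera
    positions (lists of (x, y, z) tuples), assuming the trajectories are
    time-aligned and registered in a common frame."""
    assert len(gt) == len(est) and gt, "trajectories must match in length"
    sq = [sum((g - e) ** 2 for g, e in zip(p, q)) for p, q in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

gt  = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]
print(round(absolute_trajectory_error(gt, est), 4))  # 0.1291
```

AR-oriented criteria like those in this benchmark add further measures (e.g., tracking-loss frequency and relocalization time) on top of this basic accuracy metric.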
2. Multi-View Point-Based Registration for Native Knee Kinematics Measurement with Feature Transfer Learning
Authors: Cong Wang, Shuaining Xie, Kang Li, Chongyang Wang, Xudong Liu, Liang Zhao, Tsung-Yuan Tsai. Engineering, 2021, No. 6, pp. 881-888
Deep-learning methods provide a promising approach for measuring in-vivo knee joint motion through fast registration of two-dimensional (2D) to three-dimensional (3D) data over a broad range of capture. However, if there are insufficient data for training, the data-driven approach will fail. We propose a feature-based transfer-learning method to extract features from fluoroscopic images. With three subjects and fewer than 100 pairs of real fluoroscopic images, we achieved a mean registration success rate of up to 40%. The proposed method provides a promising solution for using a learning-based registration method when only a limited number of real fluoroscopic images are available.
Keywords: 2D–3D registration; machine learning; domain adaptation; point correspondence
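A registration success rate of the kind quoted above is usually the fraction of frames whose estimated pose falls within rotational and translational tolerances; a small sketch of that bookkeeping (the function name, thresholds, and error values are illustrative, not the paper's protocol):

```python
def registration_success_rate(rot_err_deg, trans_err_mm,
                              rot_tol=1.0, trans_tol=1.0):
    """Fraction of frames whose 2D-3D registration error is within both a
    rotational and a translational tolerance (thresholds are illustrative)."""
    ok = [r <= rot_tol and t <= trans_tol
          for r, t in zip(rot_err_deg, trans_err_mm)]
    return sum(ok) / len(ok)

# per-frame rotation (degrees) and translation (millimeters) errors
rate = registration_success_rate([0.5, 2.0, 0.8], [0.4, 0.2, 1.5])
print(round(rate, 4))  # 0.3333
```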
3. PVT v2: Improved baselines with Pyramid Vision Transformer (Cited by: 128)
Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. Computational Visual Media, 2022, No. 3, pp. 415-424
Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and delivers significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin Transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
Keywords: transformers; dense prediction; image classification; object detection; semantic segmentation
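The linear-complexity attention mentioned in the abstract works by pooling the keys/values to a fixed token count before attention; a back-of-the-envelope sketch of why that makes the cost linear in the input size (the multiply-accumulate model below is a simplification and the numbers are illustrative):

```python
def attention_macs(n_tokens, dim, kv_tokens=None):
    """Rough multiply-accumulate count of one attention layer: n_tokens
    queries attend to kv_tokens keys/values (defaults to full self-attention,
    which is quadratic). Pooling keys/values to a fixed count, as in PVT v2's
    linear attention, makes the cost grow linearly with n_tokens."""
    if kv_tokens is None:
        kv_tokens = n_tokens               # standard self-attention
    return 2 * n_tokens * kv_tokens * dim  # QK^T plus attention-weighted V

full = attention_macs(56 * 56, 64)                    # quadratic in 3136 tokens
lin  = attention_macs(56 * 56, 64, kv_tokens=7 * 7)   # fixed 7x7 pooled KV
print(full // lin)  # 64
```

Doubling the input resolution quadruples `full` but only doubles `lin`, which is the linearity claim in the abstract.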
4. An AI-Based Curling Game System for Winter Olympics (Cited by: 1)
Authors: Xuanke Shi, Quan Wang, Chao Wang, Rui Wang, Longshu Zheng, Chen Qian, Wei Tang. Research, 2023, No. 2, pp. 159-175
The real-time application of artificial intelligence (AI) technologies in sports is a long-standing challenge, owing to the large spatial extent of sports fields and the complexity and uncertainty of real-world environments, among other factors.
Keywords: sports; artificial intelligence; Winter Olympics
5. 6G-Enabled Edge AI for Metaverse: Challenges, Methods, and Future Research Directions (Cited by: 4)
Authors: Luyi Chang, Zhe Zhang, Pei Li, Shan Xi, Wei Guo, Yukang Shen, Zehui Xiong, Jiawen Kang, Dusit Niyato, Xiuquan Qiao, Yi Wu. Journal of Communications and Information Networks, 2022, No. 2, pp. 107-121
Sixth-generation (6G) enabled edge intelligence opens up a new era of the Internet of Everything and makes it possible to interconnect people, devices, and the cloud anytime, anywhere. More and more next-generation wireless-network smart-service applications are changing our way of life and improving our quality of life. As the most prominent new form of next-generation Internet application, the Metaverse strives to connect billions of users and create a shared world where the virtual and the real merge. However, limited by resources, computing power, and sensory devices, the Metaverse is still far from realizing its full vision of immersion, materialization, and interoperability. To this end, this survey aims to realize that vision through the organic integration of 6G-enabled edge artificial intelligence (AI) and the Metaverse. Specifically, we first introduce three new types of edge-Metaverse architectures that use 6G-enabled edge AI to address resource and computing constraints in the Metaverse. We then summarize the technical challenges these architectures face in the Metaverse and the existing solutions. Furthermore, we explore how edge-Metaverse architecture technology helps the Metaverse interact with and share digital data. Finally, we discuss future research directions toward realizing the true vision of the Metaverse with 6G-enabled edge AI.
Keywords: edge artificial intelligence; artificial intelligence; 6G; Metaverse; federated learning
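Of the edge-AI techniques the survey's keywords mention, federated learning is compact enough to sketch: in the classic FedAvg aggregation, the server takes a dataset-size-weighted average of the models trained locally on each edge client (pure-Python illustration; names and data are hypothetical):

```python
def fedavg(client_weights, client_sizes):
    """FedAvg server step: weighted average of per-client model parameters
    (flat lists of floats), weighted by each client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# two edge clients holding 100 and 300 local samples
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

Only the parameters travel to the server, never the raw sensor data, which is why the technique suits privacy-sensitive edge deployments like those discussed here.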
6. Formulating facial mesh tracking as a differentiable optimization problem: a backpropagation-based solution
Authors: Siran Peng, Xiangyu Zhu, Dong Yi, Chen Qian, Zhen Lei. Visual Intelligence, 2024, No. 1, pp. 247-258
Facial mesh tracking enables the production of topologically consistent 3D facial meshes from stereo video input captured by calibrated cameras. This technology is an integral part of many digital-human applications, such as personalized avatar creation, audio-driven 3D facial animation, and talking-face video generation. Currently, most facial mesh tracking methods are built on computer-graphics techniques, which involve complex procedures and often necessitate human annotation within the pipeline. As a result, these approaches are difficult to implement and hard to generalize across scenarios. We propose a backpropagation-based solution, BPMT, that formulates facial mesh tracking as a differentiable optimization problem. Our solution leverages visual clues extracted from the stereo input to estimate vertex-wise geometry and texture information. BPMT consists of two steps: automatic face analysis and mesh tracking. In the first step, a range of visual clues are automatically extracted from the input, including facial point clouds, multi-view 2D landmarks, 3D landmarks in the world coordinate system, motion fields, and image masks. The second step can be viewed as a differentiable optimization problem whose constraints comprise the stereo video input and the facial clues. The primary objective is to achieve topologically consistent 3D facial meshes across frames. The parameters to be optimized comprise the positions of free-form deformed vertices and a shared texture UV map. Furthermore, the 3D morphable model (3DMM) is introduced as a form of regularization to improve the convergence of the optimization. Leveraging mature backpropagation software, we progressively register the facial meshes to the recorded subject, generating high-quality 3D faces with consistent topologies. BPMT requires no manual labeling within the pipeline, making it suitable for producing large-scale stereo facial data. Moreover, our method exhibits a high degree of flexibility and extensibility, positioning it as a promising platform for future research in the community.
Keywords: facial mesh tracking; free-form deformation; optimization; backpropagation
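The optimization pattern the abstract describes — a data term from observed facial clues plus a 3DMM-style prior, minimized by gradient descent — can be caricatured in one dimension. The toy below hand-codes the gradients of a quadratic objective and is purely illustrative of the pattern, not of BPMT itself (all names and values are hypothetical):

```python
def fit_vertices(targets, prior, lam=0.1, lr=0.2, steps=200):
    """Minimize sum((v - t)^2) + lam * sum((v - p)^2) by gradient descent:
    a data term pulling vertices toward observed landmarks t, plus a
    3DMM-like regularizer pulling them toward a prior shape p."""
    v = list(prior)  # initialize from the prior, as with a fitted 3DMM
    for _ in range(steps):
        grad = [2 * (vi - ti) + 2 * lam * (vi - pi)
                for vi, ti, pi in zip(v, targets, prior)]
        v = [vi - lr * gi for vi, gi in zip(v, grad)]
    return v

fitted = fit_vertices(targets=[1.0, 2.0], prior=[0.0, 0.0])
print([round(x, 3) for x in fitted])  # [0.909, 1.818]
```

The converged result sits between data and prior at t / (1 + lam), showing how the regularizer trades fidelity for stability; autodiff frameworks compute the same gradients automatically for the full vertex-and-texture problem.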
7. Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
Authors: Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang. Visual Intelligence, 2024, No. 1, pp. 392-408
Multi-modal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a wide range of domains. However, their large model scale and the associated high computational cost pose significant challenges for training and deploying MLLMs on consumer-grade GPUs or edge devices, hindering their widespread application. In this work, we introduce Mini-InternVL, a series of MLLMs with 1 billion to 4 billion parameters that achieves 90% of the performance with only 5% of the parameters. This significant gain in efficiency and effectiveness makes our models more accessible and applicable in various real-world scenarios. To further promote the adoption of our models, we develop a unified adaptation framework for Mini-InternVL that enables our models to transfer to, and outperform specialized models on, downstream tasks, including autonomous driving, medical image processing, and remote sensing. We believe our models can provide valuable insights and resources to advance the development of efficient and effective MLLMs.
Keywords: lightweight multi-modal large language model; vision-language model; knowledge distillation; visual instruction tuning
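The knowledge-distillation keyword refers to training a small model on a large model's softened output distribution; a minimal sketch of the standard temperature-scaled soft-target loss (illustrative only — the abstract does not spell out the paper's training recipe):

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax; t > 1 softens the distribution."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    the usual soft-target loss in knowledge distillation."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(round(distill_kl([2.0, 0.0], [2.0, 0.0]), 6))  # 0.0 (student matches)
print(distill_kl([2.0, 0.0], [0.0, 2.0]) > 0)        # True (student diverges)
```

Softened teacher targets carry inter-class similarity information that hard labels lack, which is what lets a much smaller student retain most of the larger model's performance.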