Abstract
Transformer is a deep neural network built on the self-attention mechanism that processes data in parallel. In recent years, Transformer-based models have become an important research direction for computer vision tasks. To address the current gap in domestic survey literature on the Transformer, this paper gives an overview of its applications in computer vision. It reviews the basic principles of the Transformer, focuses on its application to seven vision tasks, including image classification, object detection, and image segmentation, and analyzes the models that achieve notable results. Finally, it summarizes the challenges the Transformer faces in computer vision and discusses future development trends.
Authors
LIU Wenting (刘文婷)
LU Xinming (卢新明)
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266500, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journals (北大核心)
2022, Issue 6, pp. 1-16 (16 pages)
Funding
National Key Research and Development Program of China (2017YFC0804406)
Key Research and Development Program of Shandong Province (2016ZDJS02A05)