Funding: Science and Technology Innovation Project (grant No. ZZKY20222304).
Abstract: To address the potential information noise introduced when GhostNet generates ghost feature maps, this paper proposes a novel lightweight neural network model called ResghostNet. The model builds the Resghost Module by combining residual connections with Adaptive-SE Blocks, which improves the quality of the generated feature maps by propagating the original input information directly and selecting important channels before the cheap operations. Specifically, ResghostNet adds residual connections to the Ghost Module to optimize information flow, and designs a weight self-attention mechanism combined with SE blocks to strengthen feature expression in the cheap operations. Experimental results on the ImageNet dataset show that, compared with GhostNet, ResghostNet achieves higher accuracy while reducing the number of parameters by 52%. Although the computational complexity increases, optimizing the GPU cache memory usage strategy makes the model's inference faster. ResghostNet improves both classification accuracy and parameter count, and shows great potential for edge computing devices.
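For readers unfamiliar with the "cheap operations" the abstract refers to, the following is a minimal sketch of how the baseline Ghost module from the original GhostNet design saves parameters: a primary convolution produces a fraction of the output channels, and depthwise filters generate the rest. It does not model the Resghost Module or its Adaptive-SE Blocks, and it does not reproduce the 52% figure above. The function names, the ratio `s`, and the depthwise kernel size `d` are illustrative assumptions, and `c_out` is assumed to be divisible by `s`.

```python
def conv_params(c_in, c_out, k):
    """Weight count of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, s=2, d=3):
    """Weight count of a baseline Ghost module (bias ignored).

    s: ratio (assumed); the primary conv yields c_out // s intrinsic maps.
    d: kernel size (assumed) of the depthwise cheap operations.
    """
    intrinsic = c_out // s
    primary = conv_params(c_in, intrinsic, k)
    # Each ghost map comes from one d x d depthwise filter on an intrinsic map.
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap


if __name__ == "__main__":
    ordinary = conv_params(64, 128, 1)
    ghost = ghost_module_params(64, 128, 1)
    print(ordinary, ghost, round(ghost / ordinary, 3))
```

With these example sizes the Ghost module needs a little over half the weights of the ordinary convolution; the exact saving depends on the chosen `s` and `d`.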
Funding: Supported by the National Natural Science Foundation of China (No. 61133005) and the National High-Tech Research and Development (863) Program of China (No. 2012AA010902).
Abstract: Face detection applications are inherently real-time. Although the Viola-Jones algorithm handles this elegantly, today's ever larger high-quality images and videos pose a new challenge to real-time performance. It is a good idea to parallelize the Viola-Jones algorithm with OpenCL to achieve high performance on both AMD and NVIDIA GPU platforms without introducing new algorithms. This paper identifies the bottleneck of this application and discusses how to optimize face detection step by step, starting from a very naive implementation. Techniques such as hiding CPU execution time, subtle use of local memory as a high-speed scratchpad and manual cache, and variable granularity were used to improve performance. These techniques yield a 4-13 times speedup, varying with image size. Furthermore, these ideas may shed some light on how to parallelize applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some general advice on optimizing OpenCL programs, to help other applications perform better on GPUs.
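As background on the workload being parallelized, the Viola-Jones algorithm evaluates rectangle features through an integral image, so that any rectangle sum costs four lookups. The sketch below is a plain sequential Python illustration of that idea only; it is not the paper's OpenCL implementation and shows none of its optimizations. The function names are illustrative assumptions.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above this row plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2))
```

Because every detection window evaluates many such rectangle sums independently, this kind of workload is a natural candidate for the GPU parallelization the abstract describes.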