Funding: National Natural Science Foundation of China, Grant/Award Number: 62176084; Natural Science Foundation of Anhui Province of China, Grant/Award Number: 1908085MF195; Natural Science Research Project of the Education Department of Anhui Province of China, Grant/Award Numbers: 2022AH051038, 2023AH050474 and 2023AH050490.
Abstract: To overcome the deficiencies of single-modal emotion recognition based on facial expression or bodily posture in natural scenes, a spatial guidance and temporal enhancement (SG-TE) network is proposed for facial-bodily emotion recognition. First, ResNet50, DNN and spatial transformer models are used to capture facial texture vectors, bodily skeleton vectors and whole-body geometric vectors, and an intraframe correlation attention guidance (S-CAG) mechanism, which guides the facial texture vector and the bodily skeleton vector by the whole-body geometric vector, is designed to exploit the latent spatial emotional correlation between face and posture. Second, an interframe significant segment enhancement (T-SSE) structure is embedded into a temporal transformer to enhance information from frames of high emotional intensity and to avoid emotional asynchrony. Finally, an adaptive weight assignment (M-AWA) strategy is constructed to realise facial-bodily fusion. The experimental results on the BabyRobot Emotion Dataset (BRED) and the Context-Aware Emotion Recognition (CAER) dataset indicate that the proposed network reaches accuracies of 81.61% and 89.39%, which are 9.61% and 9.46% higher than those of the baseline network, respectively. Compared with the state-of-the-art methods, the proposed method achieves 7.73% and 20.57% higher accuracy than single-modal methods based on facial expression or bodily posture, respectively, and 2.16% higher accuracy than dual-modal methods based on facial-bodily fusion. Therefore, the proposed method, which adaptively fuses the complementary information of face and posture, improves the quality of emotion recognition in real-world scenarios.
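To make the fusion step concrete, below is a minimal PyTorch sketch of adaptive-weight fusion of facial and bodily features in the spirit of the M-AWA strategy: each modality feature is scored, the scores are softmaxed into per-sample fusion weights, and the weighted projections are combined for classification. All dimensions, layer choices and names are illustrative assumptions, not the authors' implementation.

# Minimal sketch of adaptive-weight facial-bodily fusion (M-AWA-style).
# Dimensions and layers are illustrative assumptions.
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    def __init__(self, face_dim=512, body_dim=256, num_classes=8):
        super().__init__()
        # One scalar score per modality; softmax over the two scores
        # yields per-sample fusion weights that sum to 1.
        self.face_score = nn.Linear(face_dim, 1)
        self.body_score = nn.Linear(body_dim, 1)
        self.face_proj = nn.Linear(face_dim, 256)
        self.body_proj = nn.Linear(body_dim, 256)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, face_feat, body_feat):
        # face_feat: (B, face_dim), body_feat: (B, body_dim)
        scores = torch.cat([self.face_score(face_feat),
                            self.body_score(body_feat)], dim=1)  # (B, 2)
        w = torch.softmax(scores, dim=1)                         # (B, 2)
        fused = (w[:, 0:1] * self.face_proj(face_feat)
                 + w[:, 1:2] * self.body_proj(body_feat))        # (B, 256)
        return self.classifier(fused)                            # (B, C)

model = AdaptiveWeightFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 8])

Because the weights are computed from the features themselves, the fusion can lean on the face when the posture cue is weak in a given sample, and vice versa, which matches the abstract's description of adaptively fusing complementary information.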
Funding: Supported by the Approved Project of Jilin Undergraduate Higher Education and Teaching Reform 2020 (General Project).
Abstract: With the rapid development of network and communication techniques, teaching forms have become diversified. To enhance the education experience and improve the teaching environment, an increasing number of educational institutions have adopted virtual simulation technology. A typical teaching mechanism exploits virtual reality (VR) technology, which affords participants an immersive experience. Such a VR-based mode is widely accepted; however, the performance of this technology requires further optimization. On one hand, for VR 360° video, the current intraframe decision cannot meet rapid-response demands. On the other hand, the generated data size is considerably large, and fast computation may not be achievable on the local VR device alone. Therefore, this study proposes an improved teaching mechanism empowered by edge computing-driven VR, called VE4T, that involves two parts. First, an intraframe decision algorithm for VR 360° videos is devised to realize rapid responses. Second, an edge computing framework is proposed to offload some tasks to an edge server for computation, where a task scheduling strategy is developed to check whether a task needs to be offloaded. Finally, experiments are performed in a practical teaching scenario with several VR devices. The obtained results demonstrate that VE4T is more efficient than existing mechanisms.
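To illustrate the offloading idea, below is a minimal Python sketch of a task scheduling decision in the spirit of VE4T: a task is offloaded only when the estimated edge completion time (uplink transmission plus remote computation) beats local computation on the VR device. The rates, frequencies and names are illustrative assumptions, not values or code from the paper.

# Minimal sketch of an offload decision for a VR task.
# All parameters below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    cycles: float      # CPU cycles required to compute the task
    data_bits: float   # input size to transmit to the edge, in bits

def should_offload(task: Task,
                   local_freq: float = 1e9,    # local VR device, cycles/s
                   edge_freq: float = 8e9,     # edge server, cycles/s
                   uplink_rate: float = 50e6   # wireless uplink, bits/s
                   ) -> bool:
    # Local option: compute everything on the VR device.
    t_local = task.cycles / local_freq
    # Edge option: pay the transmission cost, then compute faster remotely.
    t_edge = task.data_bits / uplink_rate + task.cycles / edge_freq
    return t_edge < t_local

# A computation-heavy task with modest input data favours offloading:
task = Task(cycles=5e9, data_bits=20e6)
print(should_offload(task))  # True (1.025 s at the edge vs 5 s locally)

This captures the trade-off the abstract describes: offloading helps only when the edge server's computation speedup outweighs the cost of moving the task's data over the network.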