Funding: Supported by the National Natural Science Foundation of China under Grants No. 62271125 and No. 62273071; the Sichuan Science and Technology Program (No. 2022YFG0038, No. 2021YFG0018); the Xinjiang Science and Technology Program (No. 2022273061); and the Fundamental Research Funds for the Central Universities (No. ZYGX2020ZB034, No. ZYGX2021J019).
Abstract: With the adoption of cutting-edge communication technologies such as 5G/6G systems and the extensive development of devices, crowdsensing systems in the Internet of Things (IoT) are now conducting complicated video analysis tasks such as behaviour recognition. These applications have dramatically increased the diversity of IoT systems. Specifically, behaviour recognition in videos usually requires a combined analysis of spatial information about objects and information about their dynamic actions in the temporal dimension. Unlike image-based computer vision tasks, which focus on understanding spatial information, behaviour recognition may rely even more on modeling temporal information that contains both short-range and long-range motions. However, current solutions fail to jointly and comprehensively analyse short-range motions between adjacent frames and long-range temporal aggregations at large scales in videos. In this paper, we propose a novel behaviour recognition method based on the integration of multigranular (IMG) motion features, which can support the deployment of video analysis in multimedia IoT crowdsensing systems. In particular, we achieve reliable motion information modeling by integrating a channel attention-based short-term motion feature enhancement module (CSEM) and a cascaded long-term motion feature integration module (CLIM). We evaluate our model on several action recognition benchmarks, such as HMDB51, Something-Something and UCF101. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods, which confirms its effectiveness and efficiency.
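The idea of enhancing short-range motion with channel attention can be illustrated with a toy sketch. This is not the authors' CSEM, whose design the abstract does not specify; it only assumes that the mean absolute difference between two adjacent frames serves as a per-channel motion cue and that a sigmoid turns that cue into a channel weight. The function name `channel_attention_from_motion` and the tiny feature maps are hypothetical.

```python
import math

def channel_attention_from_motion(prev_feat, next_feat):
    """Reweight each channel of next_feat by how much it changed since prev_feat.

    prev_feat, next_feat: list of channels, each a flat list of floats
    (a hypothetical stand-in for a C x H x W feature map).
    """
    weights = []
    for prev_ch, next_ch in zip(prev_feat, next_feat):
        # Short-range motion cue: mean absolute difference between adjacent frames.
        motion = sum(abs(b - a) for a, b in zip(prev_ch, next_ch)) / len(next_ch)
        # Sigmoid maps the non-negative cue to a weight in [0.5, 1).
        weights.append(1.0 / (1.0 + math.exp(-motion)))
    # Scale every value in a channel by that channel's weight.
    enhanced = [[w * v for v in ch] for w, ch in zip(weights, next_feat)]
    return weights, enhanced

# Two channels, four spatial positions each: channel 0 is static, channel 1 moves.
prev_frame = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
next_frame = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
weights, enhanced = channel_attention_from_motion(prev_frame, next_frame)
print(weights)  # the moving channel receives the larger weight
```

In this toy setting the static channel keeps the minimum weight of 0.5 while the moving channel is emphasised; a real module would learn the mapping from motion cue to weight rather than fix it.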
Funding: Supported by the National Natural Science Foundation of China (No. 60772134, 60902081, 60902052), the 111 Project (No. B08038), and the Fundamental Research Funds for the Central Universities (No. 72105457).
Abstract: A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion features of each block in a picture, based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. A fuzzy C-means (FCM) clustering algorithm is then applied to these MFDs to segment moving objects. Moreover, a new parameter, called the gathering degree, is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
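The clustering step can be sketched with a plain fuzzy C-means implementation. The abstract does not define the MFD components numerically, so the two-dimensional block features below are hypothetical stand-ins, and the function `fuzzy_c_means` with its parameters (`m`, `n_iter`, `seed`) is an illustrative assumption rather than the paper's implementation. The gathering degree is not modelled here.

```python
import random

def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means: returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centres are means of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(points))]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(len(points))) / total
                for d in range(dim)
            ])
        # Memberships follow from the relative distances to all centres.
        for i, p in enumerate(points):
            dists = [
                max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for c in centers
            ]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum(
                    (dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                    for j in range(n_clusters)
                )
    return centers, u

# Hypothetical per-block features: (motion intensity, neighbour correlation).
blocks = [
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.1),   # near-static background blocks
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),   # strongly moving blocks
]
centers, memberships = fuzzy_c_means(blocks)
# Hard label per block: the cluster with the highest membership.
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
print(labels)
```

Unlike k-means, every block keeps a graded membership in each cluster, which is useful near object boundaries where a block contains both foreground and background motion.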
Funding: Supported by the National Natural Science Foundation of China (32102600); the Central Public-interest Scientific Institution Basal Research Fund, China (Y2023XK13, JBYW-AII-2024-28/40, and JBYWAII-2023-33/37/42); the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-AII); and the Wuhu Science and Technology Bureau Two Strong One Increase Project, China (2023ly12).
Abstract: Detecting keypoints in dairy cows aims to locate and track the motion trajectories of the body's joints, which plays a crucial role in behavior analysis and lameness detection. However, real farming scenarios, characterized by occlusions and large variations in object scale, may result in poor detection results. Therefore, we introduce the atrous spatial pyramid pooling (ASPP) module into the shallow layers of the ResNet101 network to improve the multi-scale feature extraction capability of the model. The ASPP module uses different dilation rates in parallel atrous convolutional layers to expand the model's receptive field, which makes recognition more robust to keypoints of different sizes and to occluded keypoints. Furthermore, seven types of motion features (tracking up, gait symmetry, step height balance, motion speed variability, head swing amplitude, head-neck slope and back curvature) are extracted simultaneously by monitoring and tracking the motion trajectories of distinct keypoints. Several of these features rely on extraction models and attributes first proposed in this study. Multiple models are trained and tested on datasets containing 2,385 frames for ablation experiments. The experiments show that, in comparison with the ResNet50, MobileNet_v2_1.0, and EfficientNet-b0 backbone networks, the training error and test error of ResNet101 are reduced by 4.04-30.12 pixels and 3.81-28.14 pixels, respectively. Therefore, ResNet101 is used as the benchmark for subsequent model improvement by adding the ASPP module. The training error and test error of the ResNet101-ASPP network are reduced by 0.27 and 0.24 pixels, respectively, compared to the benchmark network. The prediction confidence improves by 1.65-2.50% at three different dairy cow object scales. In addition, detection of keypoints under different occlusion conditions improves considerably, especially for small-scale keypoints, demonstrating the capability of the ASPP module for multi-scale feature extraction. By analyzing the distribution of the seven features across healthy, mildly lame, and severely lame dairy cows, it is shown that all of the features play an important role in distinguishing between different levels of lameness.
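How dilation enlarges the receptive field without adding weights can be shown with a small one-dimensional sketch. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions. The rates (1, 2, 4) below are chosen for illustration only; the abstract does not state which rates the ResNet101-ASPP network uses, and both function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D atrous convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

# Parallel branches with different (illustrative) dilation rates, as in ASPP.
sizes = [effective_kernel_size(3, d) for d in (1, 2, 4)]
print(sizes)  # [3, 5, 9]

# A 3-tap difference kernel with dilation 2 compares samples 4 positions apart.
out = dilated_conv1d(list(range(10)), [1, 0, -1], 2)
print(out)  # [-4, -4, -4, -4, -4, -4]
```

Each branch has the same three weights but sees a different extent of the input, which is why combining the branches helps with objects at several scales.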
Funding: This work was supported by grants from the Natural Science Foundation of Hebei Province under Grant No. F2021202021, the S&T Program of Hebei under Grant No. 22375001D, and the National Key R&D Program of China under Grant No. 2019YFB1312500.
Abstract: Human pose estimation is a basic and critical task in computer vision that involves determining the positions (or spatial coordinates) of the joints of the human body in a given image or video. It is widely used in motion analysis, medical evaluation, and behavior monitoring. In this paper, we propose a method for multi-view human pose estimation. Two image sensors were placed orthogonally to each other to capture the pose of the subject as they moved, which yielded accurate and comprehensive three-dimensional (3D) motion reconstruction that captured their multi-directional poses. Following this, we propose a method based on 3D pose estimation to assess the similarity of the motion features of patients with motor dysfunction by comparing their range of motion with that of normal subjects. We converted these differences into Fugl–Meyer assessment (FMA) scores in order to quantify them. Finally, we implemented the proposed method in the Unity framework and built a Virtual Reality platform that provides users with human–computer interaction to make the task more enjoyable for them and to ensure their active participation in the assessment process. The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.
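The range-of-motion comparison rests on joint angles computed from 3D keypoints, which can be sketched as follows. The keypoint coordinates, the function names `joint_angle_deg` and `range_of_motion`, and the reference value are all hypothetical. The abstract does not describe how differences are mapped to FMA scores, so that step is left out.

```python
import math

def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))

def range_of_motion(angles):
    """Difference between the largest and smallest observed joint angle."""
    return max(angles) - min(angles)

# Hypothetical elbow movement: shoulder and elbow fixed, wrist moving.
shoulder, elbow = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
wrist_track = [(1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]  # bent at 90 deg, then straight
angles = [joint_angle_deg(shoulder, elbow, w) for w in wrist_track]
patient_rom = range_of_motion(angles)

reference_rom = 140.0  # assumed reference value, for illustration only
rom_deficit = reference_rom - patient_rom
print(angles, patient_rom, rom_deficit)
```

Working with angles rather than raw coordinates makes the comparison independent of where the subject stands relative to the two sensors.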