Funding: Funded by the Natural Science Foundation of Chongqing Municipality, grant number CSTB2022NSCQ-MSX0503.
Abstract: Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions under diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation aligned with the silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. The fused features are processed by a spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, improving the robustness and accuracy of gait recognition in complex scenes and providing a practical solution for real-world applications.
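The abstract names two steps, turning joints into a dense SkeletonMap and learning modality weights with attention, without giving their exact form. A minimal sketch under stated assumptions (one Gaussian blob per joint, and a softmax over one scalar score per modality) could look like the plain Python below. `skeleton_map`, `adaptive_fuse`, the `sigma` value and the projection vectors `w_sil`/`w_skel` are hypothetical illustrations, not GaitMAFF's actual layers, which operate on learned feature maps.

```python
import math

def skeleton_map(joints, height, width, sigma=1.5):
    """Render (x, y) joint coordinates as a dense heatmap on the same
    height x width grid as a silhouette: one Gaussian per joint, summed,
    then clipped to [0, 1]."""
    heat = [[0.0] * width for _ in range(height)]
    for jx, jy in joints:
        for y in range(height):
            for x in range(width):
                d2 = (x - jx) ** 2 + (y - jy) ** 2
                heat[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    return [[min(v, 1.0) for v in row] for row in heat]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(sil_feat, skel_feat, w_sil, w_skel):
    """Score each modality with a hypothetical learned projection, turn the
    two scores into weights that sum to 1, and return the weighted sum of the
    (equal-length) feature vectors together with the weights."""
    scores = [sum(f * w for f, w in zip(sil_feat, w_sil)),
              sum(f * w for f, w in zip(skel_feat, w_skel))]
    alpha, beta = softmax(scores)
    fused = [alpha * s + beta * k for s, k in zip(sil_feat, skel_feat)]
    return fused, (alpha, beta)

# Usage with toy numbers: the silhouette branch scores higher, so it dominates.
smap = skeleton_map([(2, 2), (5, 5)], height=8, width=8)
fused, (alpha, beta) = adaptive_fuse([1.0, 0.0], [0.0, 1.0],
                                     w_sil=[2.0, 0.0], w_skel=[0.0, 0.5])
```

Because the weights are recomputed per input, a sample whose skeleton branch scores higher (for example under heavy clothing changes) would shift weight toward the SkeletonMap without retraining; in a real network the scores would come from pooled feature maps rather than fixed vectors.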
Funding: Supported by the National Natural Science Foundation of China (32230043 and 32371092), the Ministry of Science and Technology of China (2019YFA0707103), Das Chinesisch-Deutsche Zentrum für Wissenschaftsförderung (M-0093), and the High-performance Computing Platform of Peking University.
Abstract: Neural activity differentiating body from non-body stimuli has been identified in the occipitotemporal cortex of both humans and nonhuman primates. However, the neural mechanisms that code the similarity of different individuals' bodies of the same species to support their categorical representations remain unclear. Using electroencephalography (EEG) and magnetoencephalography (MEG), we investigated the temporal and spatial characteristics of neural processes shared by different individual body silhouettes of the same species by quantifying the repetition suppression of neural responses to human and animal (chimpanzee, dog, and bird) body silhouettes showing different postures. Our EEG results revealed significant repetition suppression of the amplitudes of early frontal/central activity at 180–220 ms (P2) and late occipitoparietal activity at 220–320 ms (P270) in response to animal (but not human) body silhouettes of the same species. Our MEG results further localized the repetition suppression effect related to animal body silhouettes to the left supramarginal gyrus and left frontal cortex at 200–440 ms after stimulus onset. Our findings suggest two neural processes involved in spontaneous categorical representations of animal body silhouettes, as a cognitive basis of human-animal interactions.
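Repetition suppression in an ERP analysis is commonly quantified as a difference in mean amplitude within a component's time window. The sketch below assumes two averaged waveforms, one for a non-repeated condition and one for a repeated condition (for instance, a silhouette preceded by one of a different versus the same species; the study's exact pairing is not stated here), and uses the P2 and P270 windows from the abstract. `mean_amplitude` and `repetition_suppression` are hypothetical helpers and the waveforms are synthetic.

```python
# Component windows (ms) as reported in the abstract.
P2_WINDOW = (180, 220)
P270_WINDOW = (220, 320)

def mean_amplitude(signal, times_ms, window):
    """Average amplitude of the samples whose time stamp lies in the window."""
    start, end = window
    values = [v for v, t in zip(signal, times_ms) if start <= t <= end]
    if not values:
        raise ValueError("no samples fall inside the window")
    return sum(values) / len(values)

def repetition_suppression(non_repeated, repeated, times_ms, window):
    """Mean amplitude for the non-repeated condition minus the repeated one.
    For a positive-going component such as P2, a positive value indicates a
    reduced (suppressed) response on repetition."""
    return (mean_amplitude(non_repeated, times_ms, window)
            - mean_amplitude(repeated, times_ms, window))

# Usage with synthetic averaged waveforms sampled at 250 Hz (4 ms steps).
times = list(range(0, 401, 4))
non_rep = [5.0 if 180 <= t <= 220 else 0.0 for t in times]
rep = [3.0 if 180 <= t <= 220 else 0.0 for t in times]
rs_p2 = repetition_suppression(non_rep, rep, times, P2_WINDOW)
```

Testing whether such differences are significant across participants, and localizing them with MEG source analysis, is outside this sketch.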
Abstract: Linear octrees offer a volume representation of 3-D objects that is quite compact and lends itself to traditional object processing operations. However, the linear octree generated from three orthogonal silhouettes by the volume intersection technique depends on the viewpoint. Recognition by matching object representations to model representations requires representations that are independent of viewpoint. To obtain viewpoint-independent representations, the three principal axes of the object are obtained by computing the eigenvectors of its moment of inertia matrix. The linear octree is then projected onto the image planes of the three principal views (along the principal axes) to obtain three normalized linear quadtrees. The object matching procedure has two phases: the first matches the normalized linear quadtrees of the unknown object to a subset of the models contained in a library, using a measure of symmetric difference; the second generates the normalized linear octrees of the object and of these selected models, and then matches the normalized linear octree of the unknown object to the model with the minimum symmetric difference.
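The two-phase matching can be sketched as follows, assuming the principal-axis normalization has already been applied. Linear quadtrees and octrees are encoded here with Morton (Z-order) locational codes, one common encoding that we assume rather than one the abstract specifies, and only finest-level cells are stored (merging complete sibling groups into larger blocks is omitted). All function names are hypothetical.

```python
def morton_code(coords, bits):
    """Interleave the bits of 2-D (quadtree) or 3-D (octree) integer
    coordinates into one locational code."""
    dims = len(coords)
    code = 0
    for i in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> i) & 1) << (dims * i + d)
    return code

def linear_tree(cells, bits):
    """Finest-level linear quadtree/octree: sorted codes of occupied cells."""
    return sorted(morton_code(c, bits) for c in cells)

def symmetric_difference(t1, t2):
    """Number of cells occupied in exactly one of the two trees."""
    return len(set(t1) ^ set(t2))

def shortlist(unknown_views, library, k):
    """Phase 1: rank models by the summed symmetric difference of their three
    normalized quadtrees against the unknown object's; keep the best k."""
    def cost(name):
        return sum(symmetric_difference(a, b)
                   for a, b in zip(unknown_views, library[name]))
    return sorted(library, key=cost)[:k]

def best_model(unknown_octree, candidate_octrees):
    """Phase 2: among shortlisted models, pick the minimum symmetric difference."""
    return min(candidate_octrees,
               key=lambda name: symmetric_difference(unknown_octree,
                                                     candidate_octrees[name]))

# Usage: a 2x2 square seen in all three views versus a 2x1 slab.
square = linear_tree({(0, 0), (1, 0), (0, 1), (1, 1)}, bits=1)
slab = linear_tree({(0, 0), (1, 0)}, bits=1)
library = {"cube": [square] * 3, "slab": [slab] * 3}
candidates = shortlist([square] * 3, library, k=1)
```

The cheap 2-D comparison prunes the library before the more expensive 3-D octree comparison, which is the point of splitting matching into two phases.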
Funding: Project supported by the NSF CAREER Award (Nos. CCF-0448339 and DMS-0528363) of the USA and the National Natural Science Foundation of China (No. 60503067).
Abstract: We study the topological changes of the silhouette in 3D space, whereas others study its projections in an image. Silhouettes play a crucial role in visualization, graphics, and vision. This work focuses on the global behavior of silhouettes, especially their topological evolution, such as splitting, merging, appearing, and disappearing. The dynamics of silhouettes are governed by the topology and curvature of the surface and by the viewpoint. In this paper, we work at a more theoretical level to give enumerative properties of the silhouette, including: the integral of the signed geodesic curvature along a silhouette equals the view cone angle; in elliptic regions, no silhouette can be contained in another; in hyperbolic regions, a silhouette that is homotopic to a point has at least 4 cusps; and critical events can only happen when the viewpoint lies on the aspect surfaces (the ruled surface of the asymptotic lines at parabolic points, with the surface itself). We also introduce a method to visualize the evolution of silhouettes, especially all the critical events where their topology changes. The results have broad applications in computer vision for recognition, and in graphics for rendering and visualization.
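The first enumerative property can be sanity-checked on the simplest surface. The sketch below assumes a sphere of radius R viewed from a point at distance d > R, and reads "view cone angle" as the sector angle obtained by unrolling the cone from the viewpoint through the silhouette into the plane; both the example and that reading are our assumptions, not the paper's derivation. The two functions compute each side independently, and both come out to 2πR/d.

```python
import math

def geodesic_curvature_integral(R, d):
    # Silhouette points p satisfy (p - v) . p = 0 for viewpoint v, which on a
    # sphere is the small circle at colatitude alpha with cos(alpha) = R / d.
    alpha = math.acos(R / d)
    r = R * math.sin(alpha)                      # radius of the silhouette circle
    kappa = 1.0 / r                              # space curvature of that circle
    kappa_n = 1.0 / R                            # normal curvature on a sphere
    kappa_g = math.sqrt(kappa**2 - kappa_n**2)   # kappa^2 = kappa_n^2 + kappa_g^2
    return kappa_g * 2 * math.pi * r             # kappa_g is constant on the circle

def view_cone_angle(R, d):
    # Unroll the cone with apex at the viewpoint through the silhouette circle:
    # sector angle = base circumference / slant length.
    alpha = math.acos(R / d)
    r = R * math.sin(alpha)
    slant = math.sqrt(d**2 - R**2)               # tangent length from the viewpoint
    return 2 * math.pi * r / slant

# Usage: R = 1, d = 2 gives pi on both sides.
lhs, rhs = geodesic_curvature_integral(1.0, 2.0), view_cone_angle(1.0, 2.0)
```

On a sphere the geodesic curvature is positive with the natural orientation, so the "signed" qualifier plays no role here; surfaces with hyperbolic regions are where the sign matters.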
Funding: Supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Recognizing human interactions in RGB videos is a critical task in computer vision, with applications in video surveillance. Existing deep learning-based architectures have achieved strong results, but they are computationally intensive, sensitive to changes in video resolution, and often fail in crowded scenes. We propose a novel hybrid system that is computationally efficient, robust to degraded video quality, and able to filter out irrelevant individuals, making it suitable for real-life use. The system leverages multi-modal handcrafted features for interaction representation and a deep learning classifier for capturing complex dependencies. Using Mask R-CNN and YOLO11-Pose, we extract grayscale silhouettes and keypoint coordinates of interacting individuals, while filtering out irrelevant individuals with a proposed algorithm. From these, we extract silhouette-based features (local ternary pattern and histogram of optical flow) and keypoint-based features (distances, angles, and velocities) that capture distinct spatial and temporal information. A Bidirectional Long Short-Term Memory network (BiLSTM) then classifies the interactions. Extensive experiments on the UT Interaction, SBU Kinect Interaction, and ISR-UOL 3D social activity datasets demonstrate that our system achieves competitive accuracy. They also validate the effectiveness of the chosen features and classifier, along with the proposed system's computational efficiency and robustness to occlusion.
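The keypoint-based features (distances, angles, and velocities) might be computed per frame roughly as below. This is a sketch under our own assumptions: 2-D keypoints as (x, y) tuples, all cross-person keypoint distances, intra-person angles for caller-chosen joint triplets, and speeds from the previous frame. `frame_features` and its arguments are hypothetical, not the paper's exact feature definitions; a sequence of such vectors is the kind of input a BiLSTM would consume.

```python
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

def frame_features(kps_a, kps_b, prev_a, prev_b, fps, angle_triplets):
    feats = []
    # Inter-person distances between every keypoint of A and every keypoint of B.
    feats += [distance(p, q) for p in kps_a for q in kps_b]
    # Intra-person joint angles for each (i, j, k) triplet, vertex at index i.
    for kps in (kps_a, kps_b):
        feats += [joint_angle(kps[j], kps[i], kps[k]) for i, j, k in angle_triplets]
    # Per-keypoint speed (pixels per second) between previous and current frame.
    for prev, cur in ((prev_a, kps_a), (prev_b, kps_b)):
        feats += [distance(p, q) * fps for p, q in zip(prev, cur)]
    return feats

# Usage: three keypoints per person; person B moves 1 px right since the last frame.
kps_a = [(0, 0), (1, 0), (0, 1)]
kps_b = [(3, 0), (4, 0), (3, 1)]
prev_b = [(2, 0), (3, 0), (2, 1)]
feats = frame_features(kps_a, kps_b, kps_a, prev_b, fps=30, angle_triplets=[(0, 1, 2)])
```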
Funding: Supported by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under grant No. IPP: 533-611-2025. The authors acknowledge DSR for technical and financial support.
Abstract: In the effort to enhance cardiovascular diagnostics, deep learning-based heart sound classification presents a promising solution. This research introduces a novel preprocessing method for downsampling: iterative k-means clustering combined with silhouette score analysis. The approach is intended to yield well-formed clusters and improve data quality for deep learning models. The process involves applying k-means clustering to the dataset, calculating the average silhouette score for each cluster, and selecting the cluster with the highest score. We evaluated this method using 10-fold cross-validation across various transfer learning models from different families and architectures. The evaluation was conducted on four datasets: a binary dataset, an augmented binary dataset, a multiclass dataset, and an augmented multiclass dataset. All datasets were derived from the HeartWave heart sounds dataset, a novel multiclass dataset introduced by our research group. To increase dataset sizes and improve model training, data augmentation was performed using heartbeat cycle segmentation. Our findings highlight the significant impact of the proposed preprocessing approach on the HeartWave datasets: across all four, model performance improved notably. In augmented multiclass classification, the MobileNetV2 model showed an average weighted F1-score improvement of 27.10%. In binary classification, ResNet50 demonstrated an average accuracy improvement of 8.70% over its baseline, reaching 92.40%. These results underscore the effectiveness of clustering with silhouette score analysis as a preprocessing step for improving model accuracy and robustness. They also emphasize the critical role of preprocessing in addressing class imbalance and advancing precision medicine in cardiovascular diagnostics.
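One round of the described selection (k-means, per-cluster average silhouette, keep the best cluster) can be sketched in plain Python. Seeding k-means with the first k points is a deterministic simplification of ours, and how the "iterative" part is applied (per class, or repeated until a target size) is not stated in the abstract, so only a single round is shown. `select_best_cluster` and the other helpers are hypothetical; the silhouette of a point is s = (b - a) / max(a, b), with a the mean distance to its own cluster and b the smallest mean distance to another cluster.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means seeded with the first k points (an assumption)."""
    centres = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels

def mean_dist(p, others):
    return sum(math.dist(p, q) for q in others) / len(others)

def silhouette_samples(points, labels):
    """Per-point silhouette; singletons score 0. Needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean_dist(p, own)
        b = min(mean_dist(p, grp) for lab, grp in clusters.items() if lab != labels[i])
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return scores

def select_best_cluster(points, k):
    """Cluster, average the silhouette per cluster, keep the best cluster's members."""
    labels = kmeans(points, k)
    per_cluster = {}
    for lab, s in zip(labels, silhouette_samples(points, labels)):
        per_cluster.setdefault(lab, []).append(s)
    best = max(per_cluster, key=lambda lab: sum(per_cluster[lab]) / len(per_cluster[lab]))
    return [p for p, lab in zip(points, labels) if lab == best]

# Usage: a tight group and a looser group of 2-D feature vectors.
points = [(0, 0), (10, 10), (0, 0.1), (12, 10), (0.1, 0), (10, 12), (13, 13)]
kept = select_best_cluster(points, 2)
```

Keeping only the most cohesive cluster shrinks an over-represented class, which is presumably how the method helps with class imbalance; libraries such as scikit-learn provide tested `KMeans` and `silhouette_samples` implementations for real data.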
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).
Abstract: To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced, and the optimal number of clusters Kopt for a data set with unknown class information is determined by calculating the silhouette coefficients of the clustered objects under different values of K. Then the distribution of the data set is obtained through hierarchical clustering, and the initial cluster centers are determined. Finally, clustering is completed by traditional k-means. Theoretical analysis shows that the improved k-means algorithm has acceptable computational complexity. Experimental results on the IRIS test data set show that the algorithm distinguishes different clusters reasonably, recognizes outliers efficiently, and produces lower entropy.
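The three steps (choose Kopt by silhouette, seed from hierarchical clustering, refine with k-means) can be sketched as below. The abstract does not say which partition the silhouette is computed on or which linkage is used; this sketch assumes the silhouette of the hierarchical partition for each candidate K and naive centroid linkage. All function names are hypothetical, and the candidate K values must lie between 2 and n - 1 for the silhouette to be defined.

```python
import math

def centroid(cluster):
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def agglomerative(points, k):
    """Naive centroid-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    def centre(c):
        return centroid([points[i] for i in c])
    while len(clusters) > k:
        _, a, b = min((math.dist(centre(clusters[a]), centre(clusters[b])), a, b)
                      for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for lab, g in groups.items() if lab != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def kmeans(points, centres, iters=100):
    """Lloyd's k-means from the given initial centres; returns labels."""
    centres = [tuple(c) for c in centres]
    k = len(centres)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        new = [centroid([p for p, lab in zip(points, labels) if lab == c])
               if c in labels else centres[c] for c in range(k)]
        if new == centres:
            break
        centres = new
    return labels

def improved_kmeans(points, k_values):
    # Step 1: pick Kopt by the mean silhouette of the hierarchical partition.
    k_opt = max(k_values, key=lambda k: mean_silhouette(points, agglomerative(points, k)))
    # Step 2: seed k-means with the centroids of the hierarchical clusters.
    h_labels = agglomerative(points, k_opt)
    seeds = [centroid([p for p, lab in zip(points, h_labels) if lab == c])
             for c in range(k_opt)]
    # Step 3: refine with ordinary k-means.
    return k_opt, kmeans(points, seeds)

# Usage: three well-separated blobs.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k_opt, labels = improved_kmeans(pts, range(2, 6))
```

Seeding from the hierarchical partition removes k-means' dependence on random initial centres, at the cost of the O(n^3) naive merging shown here, which is only practical for small data sets such as IRIS.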
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) Program (IITP-2024-RS-2022-00156326) and the IITP (Institute of Information & Communications Technology Planning & Evaluation); and by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R440), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. This research was also supported by the Deanship of Scientific Research at Najran University under the Research Group Funding program grant code (NU/RG/SERC/13/30).
Abstract: In the modern era of a growing population, it is arduous for humans to monitor every aspect of sports, the events occurring around us, and the surrounding scenarios or conditions. The recognition of different types of sports and events has therefore increasingly incorporated machine learning and artificial intelligence. This research focuses on detecting and recognizing events in sequential photos, characterized by several factors, including the size, location, and position of people's body parts in those pictures and the influence of their surroundings. The feature descriptors used here include MSER (Maximally Stable Extremal Regions), SIFT (Scale-Invariant Feature Transform), and DOF (degree of freedom) between joint points, which are applied to the skeleton points. For the same purpose, other features such as BRISK (Binary Robust Invariant Scalable Keypoints), ORB (Oriented FAST and Rotated BRIEF), and HOG (Histogram of Oriented Gradients) are applied to the full body or silhouettes. Integrating these techniques increases the discriminative power of the features extracted for event identification, improving the efficiency and reliability of the entire procedure. The extracted features are passed to early fusion and DBSCAN for feature fusion and optimization. A deep belief network is then employed for recognition. In separate experiments, the method achieves an average recognition rate of 87% on the HMDB51 video database and 89% on the YouTube database, comparing favorably with current methods in sports and event identification.
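The abstract does not say how early fusion or DBSCAN are configured. As a sketch under our own assumptions, early fusion is taken to mean L2-normalizing each descriptor and concatenating the results, and DBSCAN is used to flag outlying samples (label -1) that are then dropped before training; `early_fusion`, `dbscan` and `drop_noise` are hypothetical helpers, and `eps`/`min_pts` would need tuning on real descriptors.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def early_fusion(*descriptors):
    """Concatenate per-descriptor L2-normalized vectors into one feature vector."""
    fused = []
    for d in descriptors:
        fused.extend(l2_normalize(d))
    return fused

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns a cluster id per point, -1 for noise."""
    labels = [None] * len(points)  # None = unvisited
    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

def drop_noise(samples, labels):
    return [s for s, lab in zip(samples, labels) if lab != -1]

# Usage on toy 2-D points standing in for fused feature vectors.
pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (100, 100)]
labels = dbscan(pts, eps=0.5, min_pts=2)
```

Normalizing before concatenation keeps a long descriptor such as HOG from dominating Euclidean distances over short ones such as joint DOF values.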
Funding: Supported by the Key R&D Projects in Shaanxi Province (2022JBGS3-08).
Abstract: Rapidly and accurately assessing the geometric characteristics of coarse aggregate particles is crucial for ensuring pavement performance in highway engineering. This article introduces a system for the three-dimensional (3D) surface reconstruction of coarse aggregate particles using occlusion-free multi-view imaging. The system captures synchronized images of particles in free fall, employing a matte sphere and a nonlinear optimization approach to estimate the camera projection matrices. A pre-trained segmentation model is used to remove the image background. The Shape from Silhouettes (SfS) algorithm is then applied to generate 3D voxel data, followed by the Marching Cubes algorithm to construct the 3D surface contour. Validation against standard parts and diverse coarse aggregate particles confirms the method's high accuracy, with an average measurement precision of 0.434 mm, and a significant increase in scanning and reconstruction efficiency.
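The Shape from Silhouettes step reduces to voxel carving: a voxel survives only if its projection lands inside the foreground mask of every view. The sketch below assumes calibrated 3x4 projection matrices (which the paper estimates with a matte sphere) and represents each segmented mask as a membership test; the three affine "cameras" and disk-shaped silhouettes of a unit sphere are toy stand-ins, and `project`, `carve` and `inside_unit_disk` are hypothetical names.

```python
def project(P, X):
    """Apply a 3x4 projection matrix P to 3-D point X; return pixel (u, v)."""
    x, y, z = X
    h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
    return h[0] / h[2], h[1] / h[2]

def carve(voxel_centres, cameras):
    """Keep a voxel only if it projects inside every silhouette.
    `cameras` is a list of (P, inside) pairs, inside(u, v) -> bool."""
    return [X for X in voxel_centres
            if all(inside(*project(P, X)) for P, inside in cameras)]

def inside_unit_disk(u, v):
    return u * u + v * v <= 1.0  # silhouette of a unit sphere in each toy view

P_FRONT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # keeps (x, y)
P_SIDE = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # keeps (y, z)
P_TOP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # keeps (x, z)
cameras = [(P, inside_unit_disk) for P in (P_FRONT, P_SIDE, P_TOP)]

# Usage: carve a 9x9x9 grid over [-1, 1]^3.
grid = [i / 4 for i in range(-4, 5)]
voxels = [(x, y, z) for x in grid for y in grid for z in grid]
kept = carve(voxels, cameras)
```

The result is the visual hull, which always contains the true object but can be larger: here the voxel (0.75, 0.5, 0.5) lies outside the sphere yet survives all three views. This is one reason multiple synchronized views help accuracy, and a surface extractor such as Marching Cubes would then run on the occupancy grid.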