In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accurac...In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accuracy significantly declines when the data is occluded.To enhance the accuracy of gait emotion recognition under occlusion,this paper proposes a Multi-scale Suppression Graph ConvolutionalNetwork(MS-GCN).TheMS-GCN consists of three main components:Joint Interpolation Module(JI Moudle),Multi-scale Temporal Convolution Network(MS-TCN),and Suppression Graph Convolutional Network(SGCN).The JI Module completes the spatially occluded skeletal joints using the(K-Nearest Neighbors)KNN interpolation method.The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait,compensating for the temporal occlusion of gait information.The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features,thereby reducing the negative impact of occlusion on emotion recognition results.The proposed method is evaluated on two comprehensive datasets:Emotion-Gait,containing 4227 real gaits from sources like BML,ICT-Pollick,and ELMD,and 1000 synthetic gaits generated using STEP-Gen technology,and ELMB,consisting of 3924 gaits,with 1835 labeled with emotions such as“Happy,”“Sad,”“Angry,”and“Neutral.”On the standard datasets Emotion-Gait and ELMB,the proposed method achieved accuracies of 0.900 and 0.896,respectively,attaining performance comparable to other state-ofthe-artmethods.Furthermore,on occlusion datasets,the proposedmethod significantly mitigates the performance degradation caused by occlusion compared to other methods,the accuracy is significantly higher than that of other methods.展开更多
The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to u...The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to understand complex mobility patterns.Deep learning techniques,such as graph neural networks(GNNs),are popular for their ability to capture spatio-temporal dependencies.However,these models often become overly complex due to the large number of hyper-parameters involved.In this study,we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks(DMST-GNODE),a framework based on ordinary differential equations(ODEs)that autonomously discovers effective spatial-temporal graph neural network(STGNN)architectures for traffic prediction tasks.The comparative analysis of DMST-GNODE and baseline models indicates that DMST-GNODE model demonstrates superior performance across multiple datasets,consistently achieving the lowest Root Mean Square Error(RMSE)and Mean Absolute Error(MAE)values,alongside the highest accuracy.On the BKK(Bangkok)dataset,it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval,maintaining this trend across 40 and 60 min.Similarly,on the PeMS08 dataset,DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min,demonstrating its effectiveness over longer periods.The Los_Loop dataset results further emphasise this model’s advantage,with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min,consistently maintaining superiority across all time intervals.These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.展开更多
Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address ...Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.展开更多
Traffic flow prediction is a crucial element of intelligent transportation systems.However,accu-rate traffic flow prediction is quite challenging because of its highly nonlinear,complex,and dynam-ic characteristics.To...Traffic flow prediction is a crucial element of intelligent transportation systems.However,accu-rate traffic flow prediction is quite challenging because of its highly nonlinear,complex,and dynam-ic characteristics.To address the difficulties in simultaneously capturing local and global dynamic spatiotemporal correlations in traffic flow,as well as the high time complexity of existing models,a multi-head flow attention-based local-global dynamic hypergraph convolution(MFA-LGDHC)pre-diction model is proposed.which consists of multi-head flow attention(MHFA)mechanism,graph convolution network(GCN),and local-global dynamic hypergraph convolution(LGHC).MHFA is utilized to extract the time dependency of traffic flow and reduce the time complexity of the model.GCN is employed to catch the spatial dependency of traffic flow.LGHC utilizes down-sampling con-volution and isometric convolution to capture the local and global spatial dependencies of traffic flow.And dynamic hypergraph convolution is used to model the dynamic higher-order relationships of the traffic road network.Experimental results indicate that the MFA-LGDHC model outperforms current popular baseline models and exhibits good prediction performance.展开更多
The multi-scale modeling combined with the cohesive zone model(CZM)and the molecular dynamics(MD)method were preformed to simulate the crack propagation in NiTi shape memory alloys(SMAs).The metallographic microscope ...The multi-scale modeling combined with the cohesive zone model(CZM)and the molecular dynamics(MD)method were preformed to simulate the crack propagation in NiTi shape memory alloys(SMAs).The metallographic microscope and image processing technology were employed to achieve a quantitative grain size distribution of NiTi alloys so as to provide experimental data for molecular dynamics modeling at the atomic scale.Considering the size effect of molecular dynamics model on material properties,a reasonable modeling size was provided by taking into account three characteristic dimensions from the perspective of macro,meso,and micro scales according to the Buckinghamπtheorem.Then,the corresponding MD simulation on deformation and fracture behavior was investigated to derive a parameterized traction-separation(T-S)law,and then it was embedded into cohesive elements of finite element software.Thus,the crack propagation behavior in NiTi alloys was reproduced by the finite element method(FEM).The experimental results show that the predicted initiation fracture toughness is in good agreement with experimental data.In addition,it is found that the dynamics initiation fracture toughness increases with decreasing grain size and increasing loading velocity.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false...Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.展开更多
Pneumonia is a serious disease that can be fatal,particularly among children and the elderly.The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging.This ...Pneumonia is a serious disease that can be fatal,particularly among children and the elderly.The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging.This study proposes X-ODFCANet,which addresses the issues of low accuracy and excessive parameters in existing deep-learningbased pneumonia-classification methods.This network incorporates a feature coordination attention module and an omni-dimensional dynamic convolution(ODConv)module,leveraging the residual module for feature extraction from X-ray images.The feature coordination attention module utilizes two one-dimensional feature encoding processes to aggregate feature information from different spatial directions.Additionally,the ODConv module extracts and fuses feature information in four dimensions:the spatial dimension of the convolution kernel,input and output channel quantities,and convolution kernel quantity.The experimental results demonstrate that the proposed method can effectively improve the accuracy of pneumonia classification,which is 3.77%higher than that of ResNet18.The model parameters are 4.45M,which was reduced by approximately 2.5 times.The code is available at https://github.com/limuni/X ODFCA NET.展开更多
Water electrolyzers play a crucial role in green hydrogen production.However,their efficiency and scalability are often compromised by bubble dynamics across various scales,from nanoscale to macroscale components.This...Water electrolyzers play a crucial role in green hydrogen production.However,their efficiency and scalability are often compromised by bubble dynamics across various scales,from nanoscale to macroscale components.This review explores multi-scale modeling as a tool to visualize multi-phase flow and improve mass transport in water electrolyzers.At the nanoscale,molecular dynamics(MD)simulations reveal how electrode surface features and wettability influence nanobubble nucleation and stability.Moving to the mesoscale,models such as volume of fluid(VOF)and lattice Boltzmann method(LBM)shed light on bubble transport in porous transport layers(PTLs).These insights inform innovative designs,including gradient porosity and hydrophilic-hydrophobic patterning,aimed at minimizing gas saturation.At the macroscale,VOF simulations elucidate two-phase flow regimes within channels,showing how flow field geometry and wettability affect bubble discharging.Moreover,artificial intelligence(AI)-driven surrogate models expedite the optimization process,allowing for rapid exploration of structural parameters in channel-rib flow fields and porous flow field designs.By integrating these approaches,we can bridge theoretical insights with experimental validation,ultimately enhancing water electrolyzer performance,reducing costs,and advancing affordable,high-efficiency hydrogen production.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the...Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network(SGCCN),a novel approach that harnesses the power of spectral band graph convolutions for capturing long-range relationships,utilizes local perception of attention-enhanced multi-level convolutions for local spatial feature and employs a dynamic attention mechanism to enhance feature extraction.The SGCCN integrates spectral and spatial features through a self-attention fusion network,significantly improving classification accuracy and efficiency.The proposed method outperforms existing techniques,demonstrating its effectiveness in handling the challenges associated with HSI data.展开更多
The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image...The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.展开更多
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional a...Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.展开更多
Aiming at the difficulty of fault identification caused by manual extraction of fault features of rotating machinery,a one-dimensional multi-scale convolutional auto-encoder fault diagnosis model is proposed,based on ...Aiming at the difficulty of fault identification caused by manual extraction of fault features of rotating machinery,a one-dimensional multi-scale convolutional auto-encoder fault diagnosis model is proposed,based on the standard convolutional auto-encoder.In this model,the parallel convolutional and deconvolutional kernels of different scales are used to extract the features from the input signal and reconstruct the input signal;then the feature map extracted by multi-scale convolutional kernels is used as the input of the classifier;and finally the parameters of the whole model are fine-tuned using labeled data.Experiments on one set of simulation fault data and two sets of rolling bearing fault data are conducted to validate the proposed method.The results show that the model can achieve 99.75%,99.3%and 100%diagnostic accuracy,respectively.In addition,the diagnostic accuracy and reconstruction error of the one-dimensional multi-scale convolutional auto-encoder are compared with traditional machine learning,convolutional neural networks and a traditional convolutional auto-encoder.The final results show that the proposed model has a better recognition effect for rolling bearing fault data.展开更多
Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propos...Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propose a Multi-Scale Fully Convolutional Network(MSFCN)with a multi-scale convolutional kernel as well as a Channel Attention Block(CAB)and a Global Pooling Module(GPM)in this paper to exploit discriminative representations from two-dimensional(2D)satellite images.Meanwhile,to explore the ability of the proposed MSFCN for spatio-temporal images,we expand our MSFCN to three-dimension using three-dimensional(3D)CNN,capable of harnessing each land cover category’s time series interac-tion from the reshaped spatio-temporal remote sensing images.To verify the effectiveness of the proposed MSFCN,we conduct experiments on two spatial datasets and two spatio-temporal datasets.The proposed MSFCN achieves 60.366%on the WHDLD dataset and 75.127%on the GID dataset in terms of mIoU index while the figures for two spatio-temporal datasets are 87.753%and 77.156%.Extensive comparative experiments and abla-tion studies demonstrate the effectiveness of the proposed MSFCN.展开更多
As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symboli...As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry.In this paper,a multi-scale Convolutional Gated Recurrent Unit network(MCGRU)is proposed to address raw sensory data for tool wear prediction.At the bottom of MCGRU,six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network,which augments the adaptability to features of different time scales.These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations.At the top of the MCGRU,a fully connected layer and a regression layer are built for cutting tool wear prediction.Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network and results show that MCGRU outperforms several state-of-the-art baseline models.展开更多
BACKGROUND The accurate classification of focal liver lesions(FLLs)is essential to properly guide treatment options and predict prognosis.Dynamic contrast-enhanced computed tomography(DCE-CT)is still the cornerstone i...BACKGROUND The accurate classification of focal liver lesions(FLLs)is essential to properly guide treatment options and predict prognosis.Dynamic contrast-enhanced computed tomography(DCE-CT)is still the cornerstone in the exact classification of FLLs due to its noninvasive nature,high scanning speed,and high-density resolution.Since their recent development,convolutional neural network-based deep learning techniques has been recognized to have high potential for image recognition tasks.AIM To develop and evaluate an automated multiphase convolutional dense network(MP-CDN)to classify FLLs on multiphase CT.METHODS A total of 517 FLLs scanned on a 320-detector CT scanner using a four-phase DCECT imaging protocol(including precontrast phase,arterial phase,portal venous phase,and delayed phase)from 2012 to 2017 were retrospectively enrolled.FLLs were classified into four categories:Category A,hepatocellular carcinoma(HCC);category B,liver metastases;category C,benign non-inflammatory FLLs including hemangiomas,focal nodular hyperplasias and adenomas;and category D,hepatic abscesses.Each category was split into a training set and test set in an approximate 8:2 ratio.An MP-CDN classifier with a sequential input of the fourphase CT images was developed to automatically classify FLLs.The classification performance of the model was evaluated on the test set;the accuracy and specificity were calculated from the confusion matrix,and the area under the receiver operating characteristic curve(AUC)was calculated from the SoftMax probability outputted from the last layer of the MP-CDN.RESULTS A total of 410 FLLs were used for training and 107 FLLs were used for testing.The mean classification accuracy of the test set was 81.3%(87/107).The accuracy/specificity of distinguishing each category from the others were 0.916/0.964,0.925/0.905,0.860/0.918,and 0.925/0.963 for HCC,metastases,benign non-inflammatory FLLs,and abscesses on the test set,respectively.The AUC(95%confidence interval)for differentiating each category from the others was 0.92(0.837-0.992),0.99(0.967-1.00),0.88(0.795-0.955)and 0.96(0.914-0.996)for HCC,metastases,benign non-inflammatory FLLs,and abscesses on the test set,respectively.CONCLUSION MP-CDN accurately classified FLLs detected on four-phase CT as HCC,metastases,benign non-inflammatory FLLs and hepatic abscesses and may assist radiologists in identifying the different types of FLLs.展开更多
The calculation of settling speed of coarse particles is firstly addressed, with accelerated Stokesian dynamics without adjustable parameters, in which far field force acting on the particle instead of particle veloci...The calculation of settling speed of coarse particles is firstly addressed, with accelerated Stokesian dynamics without adjustable parameters, in which far field force acting on the particle instead of particle velocity is chosen as dependent variables to consider inter-particle hydrodynamic interactions. The sedimentation of a simple cubic array of spherical particles is simulated and compared to the results available to verify and validate the numerical code and computational scheme. The improved method keeps the same computational cost of the order O(NlogN) as usual accelerated Stokesian dynamics does. Then, more realistic random suspension sedimentation is investigated with the help of Mont Carlo method. The computational results agree well with experimental fitting. Finally, the sedimentation of finer cohesive particle, which is often observed in estuary environment, is presented as a further application in coastal engineering.展开更多
Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label c...Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label convolutional neural network( MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is firstly transformed into a multi-label problem including multiple binary attributes needed to be classified. Then,the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover,the multi-scale features are obtained by concatenating those featured maps produced from multiple pooling layers of the MSMLCNN at different scales. Extensive experiment results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods with a large margin.展开更多
Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale regi...Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale region of interest(ROI).However,some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space.Also,existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information,which is crucial in pulse extraction problems,resulting in insufficient spatiotemporal feature modelling.Methods Here,we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution(SSTC)and dimension separable attention(DSAT).First,to solve the problem of a single-scale ROI,we constructed a multi-scale feature space for initial signal separation.Second,SSTC and DSAT were designed for efficient spatiotemporal correlation modeling,which increased the information interaction between the long-span time and space dimensions;this placed more emphasis on temporal features.Results The signal-to-noise ratio(SNR)of the proposed network reached 9.58dB on the PURE dataset and 6.77dB on the UBFC-rPPG dataset,outperforming state-of-the-art algorithms.Conclusions The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals.The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.展开更多
基金supported by the National Natural Science Foundation of China(62272049,62236006,62172045)the Key Projects of Beijing Union University(ZKZD202301).
文摘In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accuracy significantly declines when the data is occluded.To enhance the accuracy of gait emotion recognition under occlusion,this paper proposes a Multi-scale Suppression Graph ConvolutionalNetwork(MS-GCN).TheMS-GCN consists of three main components:Joint Interpolation Module(JI Moudle),Multi-scale Temporal Convolution Network(MS-TCN),and Suppression Graph Convolutional Network(SGCN).The JI Module completes the spatially occluded skeletal joints using the(K-Nearest Neighbors)KNN interpolation method.The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait,compensating for the temporal occlusion of gait information.The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features,thereby reducing the negative impact of occlusion on emotion recognition results.The proposed method is evaluated on two comprehensive datasets:Emotion-Gait,containing 4227 real gaits from sources like BML,ICT-Pollick,and ELMD,and 1000 synthetic gaits generated using STEP-Gen technology,and ELMB,consisting of 3924 gaits,with 1835 labeled with emotions such as“Happy,”“Sad,”“Angry,”and“Neutral.”On the standard datasets Emotion-Gait and ELMB,the proposed method achieved accuracies of 0.900 and 0.896,respectively,attaining performance comparable to other state-ofthe-artmethods.Furthermore,on occlusion datasets,the proposedmethod significantly mitigates the performance degradation caused by occlusion compared to other methods,the accuracy is significantly higher than that of other methods.
文摘The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to understand complex mobility patterns.Deep learning techniques,such as graph neural networks(GNNs),are popular for their ability to capture spatio-temporal dependencies.However,these models often become overly complex due to the large number of hyper-parameters involved.In this study,we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks(DMST-GNODE),a framework based on ordinary differential equations(ODEs)that autonomously discovers effective spatial-temporal graph neural network(STGNN)architectures for traffic prediction tasks.The comparative analysis of DMST-GNODE and baseline models indicates that DMST-GNODE model demonstrates superior performance across multiple datasets,consistently achieving the lowest Root Mean Square Error(RMSE)and Mean Absolute Error(MAE)values,alongside the highest accuracy.On the BKK(Bangkok)dataset,it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval,maintaining this trend across 40 and 60 min.Similarly,on the PeMS08 dataset,DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min,demonstrating its effectiveness over longer periods.The Los_Loop dataset results further emphasise this model’s advantage,with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min,consistently maintaining superiority across all time intervals.These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.
基金supported by the National Natural Science Foundation of China(Grant Nos.62472149,62376089,62202147)Hubei Provincial Science and Technology Plan Project(2023BCB04100).
文摘Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
基金Supported by the Key R&D Program of Gansu Province(No.23YFGA0063)the Key Talent Project of Gansu Province(No.2024RCXM57,2024RCXM22)the Major Science and Technology Special Program of Gansu Province(No.25ZYJA037).
文摘Traffic flow prediction is a crucial element of intelligent transportation systems.However,accu-rate traffic flow prediction is quite challenging because of its highly nonlinear,complex,and dynam-ic characteristics.To address the difficulties in simultaneously capturing local and global dynamic spatiotemporal correlations in traffic flow,as well as the high time complexity of existing models,a multi-head flow attention-based local-global dynamic hypergraph convolution(MFA-LGDHC)pre-diction model is proposed.which consists of multi-head flow attention(MHFA)mechanism,graph convolution network(GCN),and local-global dynamic hypergraph convolution(LGHC).MHFA is utilized to extract the time dependency of traffic flow and reduce the time complexity of the model.GCN is employed to catch the spatial dependency of traffic flow.LGHC utilizes down-sampling con-volution and isometric convolution to capture the local and global spatial dependencies of traffic flow.And dynamic hypergraph convolution is used to model the dynamic higher-order relationships of the traffic road network.Experimental results indicate that the MFA-LGDHC model outperforms current popular baseline models and exhibits good prediction performance.
基金Funded by the National Natural Science Foundation of China Academy of Engineering Physics and Jointly Setup"NSAF"Joint Fund(No.U1430119)。
文摘The multi-scale modeling combined with the cohesive zone model(CZM)and the molecular dynamics(MD)method were preformed to simulate the crack propagation in NiTi shape memory alloys(SMAs).The metallographic microscope and image processing technology were employed to achieve a quantitative grain size distribution of NiTi alloys so as to provide experimental data for molecular dynamics modeling at the atomic scale.Considering the size effect of molecular dynamics model on material properties,a reasonable modeling size was provided by taking into account three characteristic dimensions from the perspective of macro,meso,and micro scales according to the Buckinghamπtheorem.Then,the corresponding MD simulation on deformation and fracture behavior was investigated to derive a parameterized traction-separation(T-S)law,and then it was embedded into cohesive elements of finite element software.Thus,the crack propagation behavior in NiTi alloys was reproduced by the finite element method(FEM).The experimental results show that the predicted initiation fracture toughness is in good agreement with experimental data.In addition,it is found that the dynamics initiation fracture toughness increases with decreasing grain size and increasing loading velocity.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金the Scientific Research Fund of Hunan Provincial Education Department(23A0423).
文摘Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.
基金supported in part by the Key Research and Development Program of Shaanxi Province of China,No.2024GX-YBXM-149in part by the National Natural Science Foundation of China,No.62071381.
文摘Pneumonia is a serious disease that can be fatal,particularly among children and the elderly.The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging.This study proposes X-ODFCANet,which addresses the issues of low accuracy and excessive parameters in existing deep-learningbased pneumonia-classification methods.This network incorporates a feature coordination attention module and an omni-dimensional dynamic convolution(ODConv)module,leveraging the residual module for feature extraction from X-ray images.The feature coordination attention module utilizes two one-dimensional feature encoding processes to aggregate feature information from different spatial directions.Additionally,the ODConv module extracts and fuses feature information in four dimensions:the spatial dimension of the convolution kernel,input and output channel quantities,and convolution kernel quantity.The experimental results demonstrate that the proposed method can effectively improve the accuracy of pneumonia classification,which is 3.77%higher than that of ResNet18.The model parameters are 4.45M,which was reduced by approximately 2.5 times.The code is available at https://github.com/limuni/X ODFCA NET.
基金supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region,China(Project No.15308024)a grant from Research Centre for Carbon-Strategic Catalysis,The Hong Kong Polytechnic University(CE2X).
文摘Water electrolyzers play a crucial role in green hydrogen production.However,their efficiency and scalability are often compromised by bubble dynamics across various scales,from nanoscale to macroscale components.This review explores multi-scale modeling as a tool to visualize multi-phase flow and improve mass transport in water electrolyzers.At the nanoscale,molecular dynamics(MD)simulations reveal how electrode surface features and wettability influence nanobubble nucleation and stability.Moving to the mesoscale,models such as volume of fluid(VOF)and lattice Boltzmann method(LBM)shed light on bubble transport in porous transport layers(PTLs).These insights inform innovative designs,including gradient porosity and hydrophilic-hydrophobic patterning,aimed at minimizing gas saturation.At the macroscale,VOF simulations elucidate two-phase flow regimes within channels,showing how flow field geometry and wettability affect bubble discharging.Moreover,artificial intelligence(AI)-driven surrogate models expedite the optimization process,allowing for rapid exploration of structural parameters in channel-rib flow fields and porous flow field designs.By integrating these approaches,we can bridge theoretical insights with experimental validation,ultimately enhancing water electrolyzer performance,reducing costs,and advancing affordable,high-efficiency hydrogen production.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金supported in part by the National Natural Science Foundations of China(No.61801214)the Postgraduate Research Practice Innovation Program of NUAA(No.xcxjh20231504)。
文摘Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network(SGCCN),a novel approach that harnesses the power of spectral band graph convolutions for capturing long-range relationships,utilizes local perception of attention-enhanced multi-level convolutions for local spatial feature and employs a dynamic attention mechanism to enhance feature extraction.The SGCCN integrates spectral and spatial features through a self-attention fusion network,significantly improving classification accuracy and efficiency.The proposed method outperforms existing techniques,demonstrating its effectiveness in handling the challenges associated with HSI data.
文摘The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.
基金Open Access funding provided by the National Institutes of Health(NIH)The funding for this project was provided by NCATS Intramural Fund.
文摘Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
基金The National Natural Science Foundation of China(No.51675098)
文摘Aiming at the difficulty of fault identification caused by manual extraction of fault features of rotating machinery,a one-dimensional multi-scale convolutional auto-encoder fault diagnosis model is proposed,based on the standard convolutional auto-encoder.In this model,the parallel convolutional and deconvolutional kernels of different scales are used to extract the features from the input signal and reconstruct the input signal;then the feature map extracted by multi-scale convolutional kernels is used as the input of the classifier;and finally the parameters of the whole model are fine-tuned using labeled data.Experiments on one set of simulation fault data and two sets of rolling bearing fault data are conducted to validate the proposed method.The results show that the model can achieve 99.75%,99.3%and 100%diagnostic accuracy,respectively.In addition,the diagnostic accuracy and reconstruction error of the one-dimensional multi-scale convolutional auto-encoder are compared with traditional machine learning,convolutional neural networks and a traditional convolutional auto-encoder.The final results show that the proposed model has a better recognition effect for rolling bearing fault data.
基金supported by the National Natural Science Foundation of China[grant number 41671452].
文摘Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propose a Multi-Scale Fully Convolutional Network(MSFCN)with a multi-scale convolutional kernel as well as a Channel Attention Block(CAB)and a Global Pooling Module(GPM)in this paper to exploit discriminative representations from two-dimensional(2D)satellite images.Meanwhile,to explore the ability of the proposed MSFCN for spatio-temporal images,we expand our MSFCN to three-dimension using three-dimensional(3D)CNN,capable of harnessing each land cover category’s time series interac-tion from the reshaped spatio-temporal remote sensing images.To verify the effectiveness of the proposed MSFCN,we conduct experiments on two spatial datasets and two spatio-temporal datasets.The proposed MSFCN achieves 60.366%on the WHDLD dataset and 75.127%on the GID dataset in terms of mIoU index while the figures for two spatio-temporal datasets are 87.753%and 77.156%.Extensive comparative experiments and abla-tion studies demonstrate the effectiveness of the proposed MSFCN.
基金Supported in part by Natural Science Foundation of China(Grant Nos.51835009,51705398)Shaanxi Province 2020 Natural Science Basic Research Plan(Grant No.2020JQ-042)Aeronautical Science Foundation(Grant No.2019ZB070001).
文摘As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry.In this paper,a multi-scale Convolutional Gated Recurrent Unit network(MCGRU)is proposed to address raw sensory data for tool wear prediction.At the bottom of MCGRU,six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network,which augments the adaptability to features of different time scales.These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations.At the top of the MCGRU,a fully connected layer and a regression layer are built for cutting tool wear prediction.Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network and results show that MCGRU outperforms several state-of-the-art baseline models.
基金Supported by National Natural Science Foundation of China,No.91959118Science and Technology Program of Guangzhou,China,No.201704020016+1 种基金SKY Radiology Department International Medical Research Foundation of China,No.Z-2014-07-1912-15Clinical Research Foundation of the 3rd Affiliated Hospital of Sun Yat-Sen University,No.YHJH201901.
文摘BACKGROUND The accurate classification of focal liver lesions(FLLs)is essential to properly guide treatment options and predict prognosis.Dynamic contrast-enhanced computed tomography(DCE-CT)is still the cornerstone in the exact classification of FLLs due to its noninvasive nature,high scanning speed,and high-density resolution.Since their recent development,convolutional neural network-based deep learning techniques has been recognized to have high potential for image recognition tasks.AIM To develop and evaluate an automated multiphase convolutional dense network(MP-CDN)to classify FLLs on multiphase CT.METHODS A total of 517 FLLs scanned on a 320-detector CT scanner using a four-phase DCECT imaging protocol(including precontrast phase,arterial phase,portal venous phase,and delayed phase)from 2012 to 2017 were retrospectively enrolled.FLLs were classified into four categories:Category A,hepatocellular carcinoma(HCC);category B,liver metastases;category C,benign non-inflammatory FLLs including hemangiomas,focal nodular hyperplasias and adenomas;and category D,hepatic abscesses.Each category was split into a training set and test set in an approximate 8:2 ratio.An MP-CDN classifier with a sequential input of the fourphase CT images was developed to automatically classify FLLs.The classification performance of the model was evaluated on the test set;the accuracy and specificity were calculated from the confusion matrix,and the area under the receiver operating characteristic curve(AUC)was calculated from the SoftMax probability outputted from the last layer of the MP-CDN.RESULTS A total of 410 FLLs were used for training and 107 FLLs were used for testing.The mean classification accuracy of the test set was 81.3%(87/107).The accuracy/specificity of distinguishing each category from the others were 0.916/0.964,0.925/0.905,0.860/0.918,and 0.925/0.963 for HCC,metastases,benign non-inflammatory FLLs,and abscesses on the test set,respectively.The AUC(95%confidence interval)for differentiating each category from the others was 0.92(0.837-0.992),0.99(0.967-1.00),0.88(0.795-0.955)and 0.96(0.914-0.996)for HCC,metastases,benign non-inflammatory FLLs,and abscesses on the test set,respectively.CONCLUSION MP-CDN accurately classified FLLs detected on four-phase CT as HCC,metastases,benign non-inflammatory FLLs and hepatic abscesses and may assist radiologists in identifying the different types of FLLs.
基金the National Natural Science Foundation of China (10332050 and 10572144)Knowledge Innovation Program (KJCX-SW-L08)
文摘The calculation of settling speed of coarse particles is firstly addressed, with accelerated Stokesian dynamics without adjustable parameters, in which far field force acting on the particle instead of particle velocity is chosen as dependent variables to consider inter-particle hydrodynamic interactions. The sedimentation of a simple cubic array of spherical particles is simulated and compared to the results available to verify and validate the numerical code and computational scheme. The improved method keeps the same computational cost of the order O(NlogN) as usual accelerated Stokesian dynamics does. Then, more realistic random suspension sedimentation is investigated with the help of Mont Carlo method. The computational results agree well with experimental fitting. Finally, the sedimentation of finer cohesive particle, which is often observed in estuary environment, is presented as a further application in coastal engineering.
基金Supported by the National Natural Science Foundation of China(No.61602191,61672521,61375037,61473291,61572501,61572536,61502491,61372107,61401167)the Natural Science Foundation of Fujian Province(No.2016J01308)+3 种基金the Scientific and Technology Funds of Quanzhou(No.2015Z114)the Scientific and Technology Funds of Xiamen(No.3502Z20173045)the Promotion Program for Young and Middle aged Teacher in Science and Technology Research of Huaqiao University(No.ZQN-PY418,ZQN-YX403)the Scientific Research Funds of Huaqiao University(No.16BS108)
文摘Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label convolutional neural network( MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is firstly transformed into a multi-label problem including multiple binary attributes needed to be classified. Then,the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover,the multi-scale features are obtained by concatenating those featured maps produced from multiple pooling layers of the MSMLCNN at different scales. Extensive experiment results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods with a large margin.
基金Supported by the National Natural Science Foundation of China(61903336,61976190)the Natural Science Foundation of Zhejiang Province(LY21F030015)。
文摘Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale region of interest(ROI).However,some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space.Also,existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information,which is crucial in pulse extraction problems,resulting in insufficient spatiotemporal feature modelling.Methods Here,we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution(SSTC)and dimension separable attention(DSAT).First,to solve the problem of a single-scale ROI,we constructed a multi-scale feature space for initial signal separation.Second,SSTC and DSAT were designed for efficient spatiotemporal correlation modeling,which increased the information interaction between the long-span time and space dimensions;this placed more emphasis on temporal features.Results The signal-to-noise ratio(SNR)of the proposed network reached 9.58dB on the PURE dataset and 6.77dB on the UBFC-rPPG dataset,outperforming state-of-the-art algorithms.Conclusions The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals.The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.