Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging because tumours can resemble adjacent tissues, several tumours can occur together, and tumour shapes and sizes vary. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) or the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited improvement, or even poorer segmentation performance than a pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. First, a distinctive self-attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN. Then, the memory structure filters and combines this information to exclude similar regions and detect multiple tumours. Finally, multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that the method outperforms other methods in both subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method achieves a Dice score of 82.4% vs. 76.6% and a 95% Hausdorff distance (HD95) of 10.66 mm vs. 11.54 mm on KiTS19, and a Dice score of 80.2% vs. 78.4% and an HD95 of 9.632 mm vs. 12.17 mm on LiTS.
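The Dice scores quoted above measure the overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric, assuming binary masks flattened to lists of 0/1; the function name `dice_coefficient` and the toy masks are illustrative and do not come from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as tumour in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example with 8 voxels (hypothetical values).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
score = dice_coefficient(pred, truth)
print(f"Dice = {score:.2f}")
```

In this toy case three voxels overlap and each mask contains four tumour voxels, so the score is 2 × 3 / (4 + 4).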
An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed to address three issues in 3D convolutional neural networks: spatiotemporal features extracted at a single scale, information redundancy, and insufficient extraction of frequency-domain information in channels. First, based on a 3D CNN, this paper designs a new multilevel spatiotemporal feature fusion (MSF) structure that is embedded in the network model. Through multilevel spatiotemporal feature separation, splicing, and fusion, it fuses spatial receptive fields and short-, medium-, and long-term time series information at different scales while reducing network parameters. Second, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, the proposed method is embedded into the R3D model, which replaces the 2D convolutional filters of the 2D ResNet with 3D convolutional filters, and extensive experiments are conducted on the small-to-medium-sized dataset UCF101 and the large dataset Kinetics-400. The model increased recognition accuracy on both datasets. On UCF101 in particular, it improves on R3D by up to 7.2% in recognition accuracy while using 34.2% fewer parameters. The MSF and FSAM were also migrated to another traditional 3D action recognition model, C3D, for application testing. On UCF101, recognition accuracy improved by 8.9%, which supports the generalization ability and universality of the method.
Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between frames by extracting the features of multiple frames simultaneously, they lead to an explosive growth in the number of parameters and in computational cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, calculating the optical flow adds computational cost and requires another model to learn the optical-flow features. We proposed an action recognition framework based on the two-dimensional convolutional neural network, so it was necessary to resolve the lack of temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, which was then combined with a temporal feature difference extraction module to extract the difference between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. Following compression, the accuracy remained high, at 72.19% on HMDB51 and 95.57% on UCF-101. The final model was more compact than most related works.
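To make the two ideas named in the abstract above concrete, here is a minimal sketch of a generic, single-scale temporal shift and a frame-to-frame feature difference. It is not the authors' multi-scale module: the function names, the `fold` parameter, the zero padding at the clip boundaries, and the toy features are all assumptions chosen for illustration.

```python
def temporal_shift(frames, fold):
    """Shift part of the channels along the time axis.

    frames: list of T per-frame feature vectors, each of length C.
    The first `fold` channels are taken from the next frame, the next
    `fold` channels from the previous frame, and the rest stay in place.
    Positions with no source frame are zero-padded (an assumption).
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    if 2 * fold > num_channels:
        raise ValueError("fold is too large for the channel count")
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1      # look one frame ahead
            elif c < 2 * fold:
                src = t - 1      # look one frame back
            else:
                src = t          # unshifted channel
            if 0 <= src < num_frames:
                out[t][c] = frames[src][c]
    return out


def temporal_difference(frames):
    """Element-wise difference between consecutive frame features."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]


# Toy clip: 3 frames, 3 channels (hypothetical values).
frames = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]]
shifted = temporal_shift(frames, fold=1)
diffs = temporal_difference(frames)
```

The shift itself adds no parameters, which is why this family of techniques is attractive for 2D networks: each frame's feature vector simply mixes in values from its neighbours before the next 2D convolution.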
Funding: National Natural Science Foundation of China (No. 41274129); Chuan Qing Drilling Engineering Company's Scientific Research Project "Seismic detection technology and application of complex carbonate reservoir in Sulige Majiagou Formation"; the 2018 Central Supporting Local Co-construction Fund (No. 80000-18Z0140504); and the Construction and Development of Universities in 2019-Joint Support for Geophysics (Double First-Class center, 80000-19Z0204).
Abstract: This paper describes the complete process of constructing a 3D digital core with a fully convolutional neural network. A large number of sandstone computed tomography (CT) images are used as training input for a fully convolutional neural network model. This model is used to reconstruct the three-dimensional (3D) digital core of Berea sandstone from a small number of CT images. The Hamming distance, together with the Minkowski functionals for porosity, average volume-specific surface area, average curvature, and connectivity of both the real core and the digital reconstruction, is used to evaluate the accuracy of the proposed method. The results show that the reconstruction achieved relative errors of 6.26%, 1.40%, 6.06%, and 4.91% for the four Minkowski functionals and a Hamming distance of 0.04479. This demonstrates that the proposed method can not only reconstruct the physical properties of real sandstone but can also restore the real characteristics of the pore distribution in sandstone, which offers a new way to characterize the internal microstructure of rocks.
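The evaluation described above compares a real core with its reconstruction voxel by voxel and through summary quantities. Below is a minimal sketch of three of those quantities, assuming binary volumes flattened to lists (1 = pore, 0 = grain) and a Hamming distance normalized by the voxel count; the normalization, the function names, and the toy volumes are assumptions rather than details confirmed by the paper.

```python
def porosity(volume):
    """Fraction of voxels labelled as pore (1) in a flattened binary volume."""
    return sum(volume) / len(volume)


def hamming_distance(a, b):
    """Normalized Hamming distance: share of voxels whose labels differ."""
    if len(a) != len(b):
        raise ValueError("volumes must have the same number of voxels")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def relative_error(reference, estimate):
    """Relative error of an estimated quantity against its reference value."""
    return abs(estimate - reference) / abs(reference)


# Toy volumes with 10 voxels each (hypothetical values).
real = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
recon = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

phi_real = porosity(real)
phi_recon = porosity(recon)
distance = hamming_distance(real, recon)
phi_error = relative_error(phi_real, phi_recon)
```

The Hamming distance is sensitive to where the pores sit, whereas the porosity error only compares totals, which is why the two kinds of measure complement each other.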
Abstract: Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English and organized the work into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was recognized with training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. For visual speech recognition on the Indian English dataset, the training accuracy was 77.48% and the test accuracy was 76.19% after 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
Funding: Supported by the Open Project of Key Laboratory of Computational Aerodynamics, AVIC Aerodynamics Research Institute (Grant No. YL2022XFX0409).
Abstract: In this work, a three-dimensional (3D) convolutional neural network (CNN) model based on image slices of various normal and pathological vocal folds is proposed for accurate and efficient prediction of glottal flows. The 3D CNN model is composed of a feature extraction block and a regression block. The feature extraction block learns low-dimensional features from the high-dimensional image data of the glottal shape, and the regression block flattens the output of the feature extraction block and produces the desired glottal flow data. The input image data is a condensed set of 2D image slices captured in the axial plane of the 3D vocal folds, where the glottal shapes are synthesized from the equations of normal vibration modes. The output flow data comprises the corresponding flow rate, the averaged glottal pressure, and the nodal pressure distributions over the glottal surface. The 3D CNN model is built to establish the mapping between the input image data and the output flow data. The ground-truth flow variables of each glottal shape in the training and test datasets are obtained by a high-fidelity sharp-interface immersed-boundary solver. The proposed model is trained to predict the flow variables of interest for glottal shapes in the test set. The 3D CNN model is more efficient than traditional computational fluid dynamics (CFD) models while retaining accuracy, and more powerful than previous data-driven prediction models because it provides more details of the glottal flow. The accuracy and efficiency of the trained 3D CNN model indicate that it could be promising for future clinical applications.
Abstract: Deep learning, especially through convolutional neural networks (CNNs) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Our review traces the evolution of CNNs, emphasizing the adaptation and capabilities of the U-Net 3D model in automating seismic fault delineation with unprecedented accuracy. We find: 1) The transition from basic neural networks to sophisticated CNNs has enabled remarkable advancements in image recognition, which are directly applicable to analyzing seismic data. The U-Net 3D model, with its innovative architecture, exemplifies this progress by providing a method for detailed and accurate fault detection with reduced manual interpretation bias. 2) The U-Net 3D model has demonstrated its superiority over traditional fault identification methods in several key areas: it has enhanced interpretation accuracy, increased operational efficiency, and reduced the subjectivity of manual methods. 3) Despite these achievements, challenges remain, such as the need for effective data preprocessing, the acquisition of high-quality annotated datasets, and model generalization across different geological conditions. Future research should therefore focus on developing more complex network architectures and innovative training strategies to further refine fault identification performance. Our findings confirm the transformative potential of deep learning, particularly CNNs like the U-Net 3D model, in geosciences, and advocate its broader integration to revolutionize geological exploration and seismic analysis.
Abstract: Protein Secondary Structure Prediction (PSSP) is considered one of the major challenges in bioinformatics, and many solutions have been proposed in pursuit of more accurate prediction results. The goal of this paper is to develop and implement an intelligent system that predicts the secondary structure of a protein from its primary amino acid sequence by using five Neural Network (NN) models. These models are the Feed Forward Neural Network (FNN), Learning Vector Quantization (LVQ), the Probabilistic Neural Network (PNN), the Convolutional Neural Network (CNN), and CNN Fine Tuning for PSSP. Two datasets were used to evaluate these approaches. The first contains 114 protein samples, and the second contains 1845 protein samples.
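Abstract: When convolutional neural networks are used for hyperspectral image feature extraction and classification, spatial-spectral features are extracted insufficiently, and excessive network depth leads to a large number of parameters and high computational complexity. To address these problems, a lightweight convolutional model is proposed that combines a fast three-dimensional convolutional neural network (3D-CNN) with depthwise separable convolution (DSC). The method first applies incremental principal component analysis (IPCA) to reduce the dimensionality of the input data. Next, the pixels fed to the model are partitioned into small overlapping 3D patches, a ground-truth label is formed for each patch from its center pixel, and 3D kernels are convolved over the patches to form continuous 3D feature maps that preserve the spatial-spectral features. The 3D-CNN extracts spatial and spectral features simultaneously, and depthwise separable convolution is then added to the 3D convolution to extract spatial features again. This enriches the spatial-spectral features while reducing the number of parameters, which shortens computation time, and classification accuracy also improves. The proposed model is validated on the public Indian Pines, Salinas Scene, and University of Pavia datasets and compared with other classical classification methods. Experimental results show that the method not only greatly reduces the number of learnable parameters and the model complexity but also delivers good classification performance, with overall accuracy (OA), average accuracy (AA), and the Kappa coefficient all exceeding 99%.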
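The parameter saving attributed to depthwise separable convolution can be illustrated with a short count. This is a sketch under stated assumptions: cubic kernels, no bias terms, a depthwise stage followed by a 1×1×1 pointwise stage, and hypothetical layer sizes. It shows the generic technique, not the layer configuration of the proposed model.

```python
def conv3d_params(kernel, in_channels, out_channels):
    """Weights in a standard 3D convolution with a cubic kernel (no bias)."""
    return kernel ** 3 * in_channels * out_channels


def separable_conv3d_params(kernel, in_channels, out_channels):
    """Weights in a depthwise 3D convolution plus a 1x1x1 pointwise one."""
    depthwise = kernel ** 3 * in_channels      # one cubic filter per input channel
    pointwise = in_channels * out_channels     # mixes channels at each voxel
    return depthwise + pointwise


# Hypothetical layer: 3x3x3 kernel, 32 input channels, 64 output channels.
standard = conv3d_params(3, 32, 64)
separable = separable_conv3d_params(3, 32, 64)
ratio = separable / standard
print(f"standard={standard}, separable={separable}, ratio={ratio:.3f}")
```

For this hypothetical layer the separable form needs roughly 5% of the weights of the standard convolution, and the gap widens as the number of output channels grows.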
Abstract: Optical coherence tomography (OCT), particularly Swept-Source OCT, is widely employed in medical diagnostics and industrial inspections owing to its high-resolution imaging capabilities. However, Swept-Source OCT 3D imaging often suffers from stripe artifacts caused by unstable light sources, system noise, and environmental interference, posing challenges to real-time processing of large-scale datasets. To address this issue, this study introduces a real-time reconstruction system that integrates stripe-artifact suppression and parallel computing using a graphics processing unit (GPU). This approach employs a frequency-domain filtering algorithm with adaptive anti-suppression parameters, dynamically adjusted through an image quality evaluation function and optimized using a convolutional neural network for complex frequency-domain feature learning. Additionally, a GPU-integrated 3D reconstruction framework is developed, enhancing data processing throughput and real-time performance via a dual-queue decoupling mechanism. Experimental results demonstrate significant improvements in structural similarity (0.92), peak signal-to-noise ratio (31.62 dB), and stripe suppression ratio (15.73 dB) compared with existing methods. On the RTX 4090 platform, the proposed system achieved an end-to-end delay of 94.36 milliseconds, a frame rate of 10.3 frames per second, and a throughput of 121.5 million voxels per second, effectively suppressing artifacts while preserving image details and enhancing real-time 3D reconstruction performance.
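Two of the figures above lend themselves to a short worked example. The sketch below defines peak signal-to-noise ratio in its usual form and derives the frame period and the implied voxels per frame from the reported frame rate and throughput. It assumes 8-bit intensities for the PSNR peak value and assumes that throughput and frame rate describe the same steady-state pipeline; the toy pixel values are hypothetical.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel example (hypothetical values, 8-bit range assumed).
reference = [0, 255, 128, 64]
filtered = [0, 250, 130, 60]
quality_db = psnr(reference, filtered)

# Figures reported in the abstract.
THROUGHPUT_VOXELS_PER_S = 121.5e6
FRAME_RATE_FPS = 10.3

frame_period_ms = 1000.0 / FRAME_RATE_FPS
voxels_per_frame = THROUGHPUT_VOXELS_PER_S / FRAME_RATE_FPS
print(f"PSNR = {quality_db:.2f} dB")
print(f"frame period = {frame_period_ms:.1f} ms")
print(f"implied volume size = {voxels_per_frame / 1e6:.1f} million voxels")
```

Under these assumptions, the reported 94.36 ms end-to-end delay is slightly shorter than one frame period at 10.3 frames per second.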
Funding: This paper was partially supported by a project of the Shanghai Science and Technology Committee (18510760300), the Anhui Natural Science Foundation (1908085MF178), and the Anhui Excellent Young Talents Support Program Project (gxyqZD2019069).
Abstract: In computer vision, 3D object recognition is one of the most important tasks for many real-world applications. Three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in 3D object recognition. In this paper, we propose to use the principal curvature directions of 3D objects (using a CAD model) to represent the geometric features as inputs for the 3D CNN. Our framework, namely CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions incorporate complex surface information of a 3D object, which helps our framework produce more precise and discriminative features for object recognition. Multitask learning is inspired by sharing features between two related tasks, where we consider pose classification as an auxiliary task to enable CurveNet to better generalize object label classification. Experimental results show that our proposed framework using curvature vectors performs better than voxels as an input for 3D object classification. We further improved the performance of CurveNet by combining two networks with both curvature direction and voxels of a 3D object as the inputs. A Cross-Stitch module was adopted to learn effective shared features across multiple representations. We evaluated our methods using three publicly available datasets and achieved competitive performance in the 3D object recognition task.
Funding: Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280), the Shaanxi Province Natural Science Basic Research Program (No. 2021JM-459), and the National Natural Science Foundation of China (No. 61772417).
Abstract: Because behavior recognition is based on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) with long short-term memory (LSTM). First, the residual module is extended to three dimensions so that it can extract features in the time and space domains at the same time. Second, the size of the pooling layer window is changed to preserve the integrity of the time-domain features; at the same time, a batch normalization (BN) layer and a dropout layer are added to ease network training and reduce over-fitting. Third, because the global average pooling (GAP) layer is affected by the size of the feature map and the network cannot be deepened further, a convolution layer and a max-pooling layer are added to the R3D network. Finally, because LSTM can memorize information and extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset.
Funding: Supported by the Key Research and Development Plan of the Ministry of Science and Technology (No. 2023YFF0906200), the Shaanxi Key Research and Development Plan (No. 2018ZDXM-SF-093), the Shaanxi Province Key Industrial Innovation Chain (Nos. S2022-YF-ZDCXL-ZDLGY-0093 and 2023-ZDLGY-45), Light of West China (No. XAB2022YN10), the China Postdoctoral Science Foundation (No. 2023M740760), and the Shaanxi Key Research and Development Plan (No. 2024SF-YBXM-678).
Abstract: Mural paintings hold significant historical information and possess substantial artistic and cultural value. However, murals are inevitably damaged by natural environmental factors such as wind and sunlight, as well as by human activities. For this reason, the study of damaged areas is crucial for mural restoration. These damaged regions differ significantly from undamaged areas and can be considered abnormal targets. Traditional manual visual processing lacks strong characterization capabilities and is prone to omissions and false detections. Hyperspectral imaging can reflect material properties more effectively than visual characterization methods. Thus, this study employs hyperspectral imaging to obtain mural information and proposes a mural anomaly detection algorithm based on a hyperspectral multi-scale residual attention network (HM-MRANet). The innovations of this paper include: (1) constructing mural painting hyperspectral datasets; (2) proposing a multi-scale residual spectral-spatial feature extraction module based on a 3D convolutional neural network (CNN) to better capture multi-scale information and improve performance on small-sample hyperspectral datasets; and (3) proposing the Enhanced Residual Attention Module (ERAM) to address the feature redundancy problem, enhance the network's feature discrimination ability, and further improve abnormal area detection accuracy. The experimental results show that the AUC (area under the curve), specificity, and accuracy of the proposed algorithm reach 85.42%, 88.84%, and 87.65%, respectively, on this dataset. These results represent improvements of 3.07%, 1.11%, and 2.68% over the SSRN algorithm, demonstrating the effectiveness of this method for mural anomaly detection.
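The three metrics reported above have standard definitions, which can be sketched in plain Python. This is a minimal illustration and not the authors' evaluation code: the function names and toy inputs are assumptions, and AUC is computed with the rank-based (Mann-Whitney) formulation, which may differ from how the paper computed it.

```python
def specificity(tp, fp, tn, fn):
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher anomaly score than a randomly chosen negative
    (label 0); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of pixels, so a sort-based variant would be preferable for full hyperspectral scenes; the version here favors readability.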
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFE0206900, the National Natural Science Foundation of China under Grant No. 61871440, and the CAAI-Huawei MindSpore Open Fund.
Abstract: Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging due to the possible similarity between tumours and adjacent tissues, the occurrence of multiple tumours, and variable tumour shapes and sizes. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) and the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited performance improvement, or even poorer segmentation performance than a pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. Firstly, a distinctive self-attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN. Then, the memory structure filters and combines this information to exclude similar regions and detect multiple tumours. Finally, the multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method achieves a Dice of 82.4% vs. 76.6% and a 95% Hausdorff distance (HD95) of 10.66 vs. 11.54 mm on KiTS19, as well as a Dice of 80.2% vs. 78.4% and an HD95 of 9.632 vs. 12.17 mm on LiTS.
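For readers unfamiliar with the Dice score quoted above, the following is a minimal sketch of its standard definition on binary masks. The function name and the flattened-list mask representation are assumptions made for illustration; the paper's own evaluation pipeline is not described in the abstract.

```python
def dice(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 voxel labels,
    i.e. a 3D volume flattened into one dimension.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference tumour masks coincide exactly, and 0.0 means they do not overlap at all.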
Funding: Supported by the General Program of the National Natural Science Foundation of China (62272234), the Enterprise Cooperation Project (2022h160), and the Priority Academic Program Development of Jiangsu Higher Education Institutions Project.
Abstract: An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed to address three issues in 3D convolutional neural networks: single-scale spatiotemporal feature extraction, information redundancy, and insufficient extraction of frequency-domain information in channels. Firstly, based on a 3D CNN, this paper designs a new multi-level spatiotemporal feature fusion (MSF) structure, which is embedded in the network model. Through multi-level spatiotemporal feature separation, splicing, and fusion, it fuses spatial receptive fields and short-, medium-, and long-term time series information at different scales while reducing network parameters. Secondly, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, we embed the proposed method into the R3D model, which replaces the 2D convolutional filters in the 2D ResNet with 3D convolutional filters, and conduct extensive experimental validation on the small-to-medium-sized dataset UCF101 and the large-sized dataset Kinetics-400. The results show that our model increases the recognition accuracy on both datasets. On UCF101 in particular, our model outperforms R3D with a maximum recognition accuracy improvement of 7.2% while using 34.2% fewer parameters. The MSF and FSAM are also migrated to another traditional 3D action recognition model, C3D, for application testing. The test results on UCF101 show that the recognition accuracy is improved by 8.9%, demonstrating the strong generalization ability and universality of the proposed method.
Abstract: Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between different frames by extracting the features of multiple frames simultaneously, they result in an explosive number of parameters and high calculation cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, calculating the corresponding optical flow results in additional calculation cost; further, it necessitates the use of another model to learn the features of optical flow. We proposed an action recognition framework based on the two-dimensional convolutional neural network; therefore, it was necessary to resolve the lack of temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, which was then combined with a temporal feature difference extraction module to extract the difference between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. Following compression, the accuracy remained high, at 72.19% on HMDB51 and 95.57% on UCF-101. The final model was more compact than most related works.
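The abstract does not give the details of the multi-scale temporal shift module or the temporal feature difference module. As a rough illustration of the underlying ideas only, the sketch below shows a basic single-scale temporal shift (part of the channels is moved one frame forward, part one frame backward, with zero padding) and a plain frame-to-frame feature difference. The function names, the `fold_div` parameter, and the list-of-frames layout are assumptions, not the authors' design.

```python
def temporal_shift(x, fold_div=4):
    """Single-scale temporal shift over features shaped [T][C].

    The first C // fold_div channels of frame t are taken from frame
    t - 1, the next C // fold_div channels from frame t + 1, and the
    remaining channels are left in place. Missing frames are zero-padded.
    """
    num_frames, num_channels = len(x), len(x[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:  # information flows from the past
                out[t][c] = x[t - 1][c] if t > 0 else 0.0
            elif c < 2 * fold:  # information flows from the future
                out[t][c] = x[t + 1][c] if t < num_frames - 1 else 0.0
            else:  # unshifted channels
                out[t][c] = x[t][c]
    return out


def temporal_difference(x):
    """Per-channel difference between consecutive frames, shape [T-1][C]."""
    return [
        [nxt - cur for cur, nxt in zip(x[t], x[t + 1])]
        for t in range(len(x) - 1)
    ]


# Toy input: 3 frames, 4 channels each.
features = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
shifted = temporal_shift(features)
diffs = temporal_difference(features)
```

Because the shift only moves existing values between frames, it lets a 2D network mix temporal information without adding parameters, which is the motivation the abstract gives for avoiding 3D filters and optical flow.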