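To address the insufficient extraction of spatial-spectral features, and the large parameter counts and heavy computation caused by overly deep networks, when convolutional neural networks are used for hyperspectral image feature extraction and classification, this paper proposes a lightweight convolutional model that combines a fast three-dimensional convolutional neural network (3D-CNN) with depthwise separable convolution (DSC). The method first applies incremental principal component analysis (IPCA) to reduce the dimensionality of the input data. The pixels fed to the model are then divided into small overlapping 3D patches, each labeled with the ground-truth class of its center pixel, and convolved with 3D kernels to produce contiguous 3D feature maps that preserve the spatial-spectral features. The 3D-CNN extracts spatial and spectral features jointly, after which depthwise separable convolution is added to the 3D convolutions to extract spatial features a second time; this enriches the spatial-spectral features while reducing the parameter count, which shortens computation time and also improves classification accuracy. The model is validated on the public Indian Pines, Salinas Scene, and University of Pavia datasets and compared with other classical classification methods. The experimental results show that the method substantially reduces the number of learnable parameters and lowers model complexity while also delivering good classification performance, with overall accuracy (OA), average accuracy (AA), and the Kappa coefficient all exceeding 99%.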
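The parameter saving attributed to depthwise separable convolution can be illustrated with a short calculation. This is a minimal sketch for the 2D case with biases omitted; the layer sizes (`k`, `c_in`, `c_out`) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k 2D convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, used only to show the ratio.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
print(std, dsc, round(dsc / std, 3))  # 73728 8768 0.119
```

The ratio equals 1/c_out + 1/k², so for 3 x 3 kernels and wide layers the separable form needs roughly one ninth of the weights.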
Recognizing dynamic hand gestures in real time is difficult because the system cannot know in advance when or where a gesture starts and ends in a video stream. Many researchers have worked on vision-based gesture recognition because of its wide range of applications. This paper proposes a deep learning architecture that combines a 3D Convolutional Neural Network (3D-CNN) with a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN extracts spectral and spatial features, which are then passed to the LSTM network for classification. The proposed model is a lightweight architecture with only 3.7 million trainable parameters. It was evaluated on 15 classes from the publicly available 20BN-jester dataset and trained on 2000 video clips per class, split into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on the training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared with MobileNetv2+LSTM.
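The data split described in the abstract can be sketched as follows. The abstract does not say whether the 80/20 split was made per class, so the per-class split here is an assumption, and the class and clip names are hypothetical placeholders.

```python
import random


def split_per_class(clips_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class independently and split it into train/validation.

    Assumption: the split is applied per class so that every class keeps
    the same train/validation ratio.
    """
    rng = random.Random(seed)
    train, val = [], []
    for label, clips in clips_by_class.items():
        shuffled = list(clips)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(clip, label) for clip in shuffled[:cut]]
        val += [(clip, label) for clip in shuffled[cut:]]
    return train, val


# Hypothetical identifiers: 15 classes with 2000 clips each.
clips_by_class = {
    f"class_{i}": [f"class_{i}_clip_{j}" for j in range(2000)]
    for i in range(15)
}
train, val = split_per_class(clips_by_class)
print(len(train), len(val))  # 24000 6000
```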
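Lung cancer is one of the malignant diseases that has long threatened human health. To address the complex preprocessing and heavy workload of traditional methods for lung cancer CT image classification, this paper proposes a lung CT image classification method based on a three-dimensional convolutional neural network (3D-CNN) model. The model builds on a convolutional neural network and uses a specific-order input strategy during training; experiments were carried out on the public Kaggle Data Science Bowl 2017 dataset. The experiments show that the method reaches a classification accuracy of 76%, an improvement over a random-order input strategy, and can provide a valuable reference for research on the classification of lung pathology images.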
Lung nodules vary widely in structural properties such as shape and surface texture. Their spatial properties vary as well: they can be found attached to lung walls or blood vessels within complex, non-homogeneous lung structures. Moreover, nodules are small at their early stage of development. This makes it a serious challenge to develop a computer-aided diagnosis (CAD) system with better false positive reduction. To reduce the false positives per scan and to deal with these challenges, this paper proposes a set of three diverse 3D attention-based CNN architectures (3D ACNN) whose predictions on low-dose volumetric computed tomography (CT) scans are fused to achieve more effective and reliable results. The attention mechanism is used to weight nodule-specific features more heavily and irrelevant features less. Using this attention-based mechanism in the CNN, unlike traditional methods, gave a significant gain in classification performance. Contextual dependencies are also taken into account by feeding three patches of different sizes surrounding the nodule to the ACNN architectures. The system is trained and validated on the publicly available LUNA16 dataset with 10-fold cross-validation, achieving a competition performance metric (CPM) score of 0.931. The experimental results demonstrate that a single patch or a single architecture used in a one-to-one fashion, as adopted in earlier methods, cannot achieve better performance, which indicates the necessity of fusing different multi-patch architectures. Although the proposed system is designed mainly for pulmonary nodule detection, it can easily be extended to classification tasks on other 3D medical diagnostic CT images where classification involves large variation and uncertainty.
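For readers unfamiliar with the metric, the LUNA16 CPM score is the mean sensitivity at seven predefined false-positive rates per scan (1/8, 1/4, 1/2, 1, 2, 4, and 8). A minimal sketch of that average follows; the sensitivity values are hypothetical and are not results from the paper.

```python
# False positives per scan at which LUNA16 samples the FROC curve.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def cpm(sensitivity_at_fp):
    """Average the sensitivities at the seven CPM operating points.

    `sensitivity_at_fp` maps a false-positives-per-scan rate to the
    sensitivity measured at that rate.
    """
    missing = [rate for rate in CPM_FP_RATES if rate not in sensitivity_at_fp]
    if missing:
        raise ValueError(f"missing operating points: {missing}")
    return sum(sensitivity_at_fp[rate] for rate in CPM_FP_RATES) / len(CPM_FP_RATES)


# Hypothetical sensitivities, for illustration only.
example = {0.125: 0.80, 0.25: 0.85, 0.5: 0.90, 1: 0.92, 2: 0.94, 4: 0.95, 8: 0.96}
print(round(cpm(example), 3))
```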
Car accidents cause fatalities, physical injuries, and significant economic losses. Among their leading causes is drowsiness behind the wheel, which can affect any driver. Drowsiness and sleepiness often have associated indicators that researchers can use to identify and promptly warn drowsy drivers, helping to avoid potential accidents. This paper proposes a spatiotemporal model for monitoring visual indicators of drowsiness in videos. The model integrates a 3D convolutional neural network (3D-CNN) with long short-term memory (LSTM). The 3DCNN-LSTM can analyze long sequences by applying the 3D-CNN to extract spatiotemporal features within adjacent frames. The learned features are then used as input to the LSTM component, which models high-level temporal features. In addition, we investigate how training of the proposed model is affected by the position of the batch normalization (BN) layers in the 3D-CNN units. The BN layer is examined in two placements: before the non-linear activation function and after it. The study was conducted on two publicly available drowsy-driver datasets, 3MDAD and YawDD. 3MDAD consists mainly of two synchronized datasets recorded from the frontal and side views of the drivers. We show that the position of the BN layers increases convergence speed and reduces overfitting on one dataset but not the other. The model achieves test detection accuracies of 96%, 93%, and 90% on YawDD, Side-3MDAD, and Front-3MDAD, respectively.
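The two BN placements compared in the abstract can be sketched on a plain list of pre-activation values. This is a simplified illustration, assuming ReLU as the activation and a BN without learnable scale and shift; the convolution that would precede these steps is omitted, and the input values are hypothetical.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize a batch to zero mean and unit variance (no learnable scale/shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


def bn_before_activation(values):
    """Conv output -> BN -> ReLU: the activation sees normalized inputs."""
    return relu(batch_norm(values))


def bn_after_activation(values):
    """Conv output -> ReLU -> BN: the next layer sees zero-mean inputs."""
    return batch_norm(relu(values))


# Hypothetical pre-activation values standing in for a convolution output.
pre_activations = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(bn_before_activation(pre_activations))
print(bn_after_activation(pre_activations))
```

With BN first the block output is non-negative, while with BN last it is zero-mean and can be negative, which is one reason the two placements can train differently.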
Depression has become a major health threat around the world, especially for older people, so effective detection of depression is a major public health challenge. The electroencephalogram (EEG) can be used as a biomarker for depression recognition. Motivated by studies showing that multiple smaller-scale kernels can increase nonlinear expressiveness compared with a single larger kernel, this article proposes the three-dimensional multiscale kernels convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix, in order to mine the spatiotemporal information relating the electrodes and the collected electrode matrix. On the major depressive disorder (MDD) dataset and the multi-modal open dataset for mental-disorder analysis (MODMA), the experiments show depression recognition accuracies of up to 99.86% and 98.01% in the subject-dependent experiment, and 95.80% and 82.27% in the subject-independent experiment, which are higher than those of competing methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
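The construction of a 3D EEG structure from per-electrode feature sequences can be sketched as below. The 5 x 5 grid of 10-20 system electrode names and the zero fill for empty cells are assumptions made for illustration; the abstract does not give the paper's actual layout.

```python
# Hypothetical 2D electrode layout; None marks a grid cell with no electrode.
GRID = [
    [None, "Fp1", None, "Fp2", None],
    ["F7", "F3", "Fz", "F4", "F8"],
    ["T3", "C3", "Cz", "C4", "T4"],
    ["T5", "P3", "Pz", "P4", "T6"],
    [None, "O1", None, "O2", None],
]


def to_volume(features, grid=GRID, fill=0.0):
    """Stack per-electrode 1D sequences into a (time, rows, cols) nested list.

    `features` maps an electrode name to its 1D feature sequence; all
    sequences must have the same length. Empty grid cells get `fill`.
    """
    lengths = {len(seq) for seq in features.values()}
    if len(lengths) != 1:
        raise ValueError("all feature sequences must have the same length")
    steps = lengths.pop()
    return [
        [[features[name][t] if name is not None else fill for name in row]
         for row in grid]
        for t in range(steps)
    ]


# Hypothetical features: two time steps per electrode.
names = [name for row in GRID for name in row if name is not None]
features = {name: [float(i), float(i) + 0.5] for i, name in enumerate(names)}
volume = to_volume(features)
print(len(volume), len(volume[0]), len(volume[0][0]))  # 2 5 5
```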
This article describes a novel approach for enhancing three-dimensional (3D) point cloud reconstruction for light field microscopy (LFM) using a fully convolutional neural network (CNN) based on the U-Net architecture. Because the directional view of the LFM is limited, noise and artifacts make it difficult to reconstruct the exact shape of 3D point clouds. Existing methods suffer from these problems because of the self-occlusion of the model. This manuscript proposes a deep fusion learning (DL) method that combines a 3D CNN with a U-Net-based model as a feature extractor. The sub-aperture images obtained from the light field microscope are aligned to form a light field data cube for preprocessing. Multi-stream 3D CNNs and a U-Net architecture are applied to obtain the depth feature from the directional sub-aperture LF data cube. To enhance the depth map, dual iteration-based weighted median filtering (WMF) is used to reduce surface noise and improve the accuracy of the reconstruction. The 3D point cloud is generated by combining two key elements: the enhanced depth map and the central view of the light field image. The proposed method is validated on synthesized Heidelberg Collaboratory for Image Processing (HCI) datasets and real-world LFM datasets, and the results are compared with different state-of-the-art methods. The structural similarity index (SSIM) gains for boxes, cotton, pillow, and pens are 0.9760, 0.9806, 0.9940, and 0.9907, respectively. Moreover, the discrete entropy (DE) value for LFM depth maps showed better performance than other existing methods.
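A weighted median filter on a depth map can be sketched as follows. This is a simplified illustration, not the paper's filter: the fixed center-weighted 3 x 3 kernel is an assumption (weighted median filters often derive weights from a guidance image), and "dual iteration" is read here as two passes.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("values must be non-empty")


# Hypothetical center-weighted kernel.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def wmf_pass(depth):
    """One 3 x 3 weighted-median pass; border pixels use the neighbors that exist."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            vals, wts = [], []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        vals.append(depth[yy][xx])
                        wts.append(KERNEL[dy + 1][dx + 1])
            out[y][x] = weighted_median(vals, wts)
    return out


def wmf(depth, iterations=2):
    """Apply the pass repeatedly (two passes by assumption)."""
    for _ in range(iterations):
        depth = wmf_pass(depth)
    return depth


# Flat hypothetical depth map with one noisy spike in the middle.
noisy = [[1.0] * 5 for _ in range(5)]
noisy[2][2] = 9.0
print(wmf(noisy)[2][2])  # 1.0
```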
A micro-expression lasts for a very short time and its intensity is very subtle. To address its low recognition rate, this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network (3D-CNN), which extracts two-dimensional features in the spatial domain and one-dimensional features in the time domain simultaneously. The network is designed in the deep learning framework Keras, and the discarding method and the batch normalization (BN) algorithm are combined with a three-dimensional visual geometry group block (3D-VGG-Block) to reduce the risk of overfitting while improving training speed. To address the lack of samples in the data set, two methods, image flipping and small-amplitude flipping, are used for data augmentation. The final recognition rate on the data set is as high as 69.11%. Compared with the current international average micro-expression recognition rate of about 67%, the proposed algorithm has a clear advantage in recognition rate.
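Flip-based augmentation of a video clip can be sketched as below. The abstract does not say which axis is flipped, so the horizontal mirror is an assumption; the clip is a hypothetical nested list indexed as clip[frame][row][column].

```python
def flip_clip_horizontally(clip):
    """Mirror every frame left to right, keeping the frame order unchanged.

    Applying the same flip to all frames preserves the temporal dynamics,
    which is what a 3D-CNN reads along the time axis.
    """
    return [[row[::-1] for row in frame] for frame in clip]


# Hypothetical clip: 2 frames of 2 x 3 pixel intensities.
clip = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
augmented = [clip, flip_clip_horizontally(clip)]  # original plus mirrored copy
print(augmented[1][0])  # [[3, 2, 1], [6, 5, 4]]
```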
Funding (listed with the EEG depression recognition paper): supported by the National Natural Science Foundation of China (Nos. 61862058, 61962034, and 8226070356) and in part by the Gansu Provincial Science & Technology Department (No. 20JR10RA076).
Funding (listed with the light field microscopy paper): supported by the National Research Foundation of Korea (NRF) (NRF-2018R1D1A3B07044041 and NRF-2020R1A2C1101258) and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2023-2020-0-01846); the work was conducted during the research year of Chungbuk National University in 2023.
Funding (listed with the micro-expression recognition paper): supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280), the Shaanxi Province Natural Science Basic Research Program Project (No. 2021JM-459), the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, 61634004), and the Shaanxi Province International Science and Technology Cooperation Project (No. 2018KW-006).