With the rapid development of artificial intelligence and Internet of Things technologies, video action recognition technology is widely applied in various scenarios, such as personal life and industrial production. However, while enjoying the convenience brought by this technology, it is crucial to effectively protect the privacy of users' video data. Therefore, this paper proposes a video action recognition method based on personalized federated learning and spatiotemporal features. Under the framework of federated learning, a video action recognition method leveraging spatiotemporal features is designed. For the local spatiotemporal features of the video, a new differential information extraction scheme is proposed to extract differential features with a single RGB frame as the center, and a spatiotemporal module based on local information is designed to improve the effectiveness of local feature extraction. For the global temporal features, a method of extracting action rhythm features using differential technology is proposed, and a time module based on global information is designed; different translational strides are used in the module to obtain bidirectional differential features under different action rhythms. Additionally, to address user data privacy issues, the method divides model parameters into local private parameters and public parameters based on the structure of the video action recognition model. This approach enhances model training performance and ensures the security of video data. The experimental results show that under personalized federated learning conditions, an average accuracy of 97.792% was achieved on the UCF-101 dataset under a non-independent and identically distributed (non-IID) partition. This research provides technical support for privacy protection in video action recognition.
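The bidirectional differential features described above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's actual module: frames are toy one-dimensional pixel lists, and `bidirectional_diff` is a hypothetical helper name.

```python
# Sketch: for a centre frame t and a translational stride s, the backward
# difference looks behind and the forward difference looks ahead, so a small
# stride captures a fast action rhythm and a large stride a slow one.

def bidirectional_diff(frames, t, stride):
    """Return (backward, forward) pixel differences centred on frame t."""
    backward = [a - b for a, b in zip(frames[t], frames[t - stride])]
    forward = [a - b for a, b in zip(frames[t + stride], frames[t])]
    return backward, forward

# A slowly brightening toy "video": each frame adds 1 to every pixel.
video = [[i, i + 10] for i in range(8)]

fine = bidirectional_diff(video, t=3, stride=1)    # fine rhythm
coarse = bidirectional_diff(video, t=3, stride=3)  # coarse rhythm
```

With the uniformly brightening toy video, the coarse-stride differences are simply three times the fine-stride ones; on real footage the two strides would expose motion at different tempos.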
Geoscience knowledge graph (GKG) can organize various geoscience knowledge into a machine-understandable and computable semantic network, and is an effective way to organize geoscience knowledge and provide knowledge-related services. As a result, it has gained significant attention and become a frontier in geoscience. Geoscience knowledge is derived from many disciplines and has complex spatiotemporal features and relationships of multiple scales, granularities, and dimensions. Therefore, establishing a GKG representation model conforming to the characteristics of geoscience knowledge is the basis and premise for the construction and application of GKG. However, existing knowledge graph representation models leverage fixed tuples that are limited in fully representing complex spatiotemporal features and relationships. To address this issue, this paper first systematically analyzes the categorization and the spatiotemporal features and relationships of geoscience knowledge. On this basis, an adaptive representation model for GKG is proposed by considering the complex spatiotemporal features and relationships. Under the constraint of a unified spatiotemporal ontology, this model adopts different tuples to adaptively represent different types of geoscience knowledge according to their spatiotemporal correlation. This model can efficiently represent geoscience knowledge, thereby avoiding the isolation of the spatiotemporal feature representation and improving the accuracy and efficiency of geoscience knowledge retrieval. It can further enable the alignment, transformation, computation, and reasoning of spatiotemporal information through a spatiotemporal ontology.
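The adaptive-tuple idea can be illustrated with a toy encoder: knowledge with no spatiotemporal correlation stays a plain triple, while spatially or temporally anchored knowledge is extended with extra elements. The function name and example facts below are illustrative, not from the paper.

```python
def represent(head, relation, tail, space=None, time=None):
    """Return the smallest tuple that captures the knowledge item:
    a plain triple, extended with space and/or time only when present."""
    tup = (head, relation, tail)
    if space is not None:
        tup += (space,)
    if time is not None:
        tup += (time,)
    return tup

# Non-spatiotemporal knowledge stays a triple...
plain = represent("basalt", "is_a", "igneous rock")
# ...while spatiotemporally anchored knowledge becomes a quintuple.
placed = represent("eruption", "occurred_at", "Mt. Vesuvius",
                   space="40.8N,14.4E", time="79 CE")
```

A real GKG would constrain the space and time slots with the unified spatiotemporal ontology rather than free-form strings.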
Land use and land cover change (LUCC) processes exhibit spatial correlation and temporal dependency, so accurate extraction of spatiotemporal features is important for enhancing the modeling capabilities of LUCC. Cellular automaton (CA) models, recognized as powerful tools for simulating dynamic LUCC processes, traditionally focus on time-slice driving factor data and often neglect the temporal dimension. Meanwhile, the transformer architecture, a highly acclaimed model in machine learning, has rarely been integrated into CA models for the simulation of dynamic LUCC processes. To fill this gap, we propose a novel spatiotemporal urban LUCC simulation model, namely transformer-convolutional neural network CA (TC-CA). Building on CA models that use a convolutional neural network (CNN) to extract latent spatial features, TC-CA extends this paradigm by incorporating a transformer architecture to extract spatiotemporal information from temporal driving factor data and temporal spatial features. Evaluation results with Wuxi city as the study area indicated the advantage of the proposed TC-CA over random forest-CA, conventional CNN-CA, artificial neural network-CA, and transformer-CA. Compared with the three non-transformer-based CAs, TC-CA improved the figure of merit by 2.85% to 8.14%. This study contributes a fresh spatiotemporal perspective and a transformer approach to the field of LUCC modeling.
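The figure of merit reported above is a standard LUCC validation metric. A minimal sketch of its common binary form, hits / (hits + misses + false alarms), assuming observed and simulated change maps are flattened to 0/1 lists; the full metric additionally penalizes change simulated as the wrong category.

```python
def figure_of_merit(observed_change, simulated_change):
    """Binary figure of merit over flattened 0/1 change maps."""
    hits = misses = false_alarms = 0
    for obs, sim in zip(observed_change, simulated_change):
        if obs and sim:
            hits += 1            # change observed and simulated
        elif obs and not sim:
            misses += 1          # change observed but not simulated
        elif sim and not obs:
            false_alarms += 1    # change simulated but not observed
    return hits / (hits + misses + false_alarms)

fom = figure_of_merit([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Persistence cells (0 observed, 0 simulated) do not enter the denominator, which is what makes the figure of merit stricter than overall accuracy for rare-change landscapes.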
Painlevé integrability has been tested for the (2+1)-dimensional Boussinesq equation with a disturbance term using the standard WTC (Weiss-Tabor-Carnevale) approach after introducing Kruskal's simplification. New breather solitary solutions depending on a constant equilibrium solution are obtained by using the extended homoclinic test method. Moreover, the spatiotemporal feature of the breather solitary wave is exhibited.
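For reference, one commonly studied form of the (2+1)-dimensional Boussinesq equation with a disturbance term is shown below; the abstract does not state the exact equation or disturbance term used, so the coefficients and the right-hand side are illustrative.

```latex
u_{tt} - u_{xx} - u_{yy} - \alpha\,(u^{2})_{xx} - \beta\, u_{xxxx} = f(x, y, t)
```

Here $f(x, y, t)$ denotes the disturbance term, and setting $u$ to a constant $u_0$ with $f \equiv 0$ gives the constant equilibrium solution about which breather solutions are typically constructed.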
Recently, the importance of data analysis has increased significantly due to the rapid growth of data. In particular, vehicle communication data, considered a significant challenge in Intelligent Transportation Systems (ITS), has spatiotemporal characteristics and many missing values. A high proportion of missing values leads to decreased predictive performance of models. Existing missing value imputation models ignore the topology of transportation networks: because of the structural connections of road networks, locations that appear close in spatiotemporal image data may not be close in physical distance. Additionally, the learning process of missing value imputation models requires complete data, but there are limitations in securing complete vehicle communication data. This study proposes a missing value imputation model based on an adversarial autoencoder using spatiotemporal feature extraction to address these issues. The proposed method replaces missing values by reflecting the spatiotemporal characteristics of transportation data using temporal convolution and spatial convolution. Experimental results show that the proposed model has the lowest error rate, 5.92%, demonstrating excellent predictive accuracy. This makes it possible to alleviate the data sparsity problem and improve traffic safety through superior predictive performance.
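As a point of contrast with the adversarial-autoencoder model above, the simplest temporal imputation fills each gap from its nearest observed neighbours. This naive baseline (not the paper's method; names are illustrative) shows the kind of temporal structure the model exploits.

```python
def impute_temporal(series):
    """Fill each missing entry (None) with the mean of its nearest observed
    neighbours; earlier gaps are filled first, so later gaps may reuse them."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            left = next((filled[j] for j in range(i - 1, -1, -1)
                         if filled[j] is not None), None)
            right = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            neighbours = [x for x in (left, right) if x is not None]
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

# Toy link-speed series (km/h) with missing probe readings.
speeds = [60.0, None, 64.0, None, None, 70.0]
```

A spatiotemporal model would additionally pull information from topologically adjacent road links, which is exactly what the spatial convolution in the proposed method contributes.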
Currently, numerical models based on idealized assumptions, complex algorithms, and high computational costs are unsatisfactory for ocean surface current prediction. Moreover, the complex temporal and spatial variability of ocean currents also makes prediction methods based on time series data challenging. A deep network model can automatically learn and extract complex features hidden in large amounts of complex data, so it is a promising method for high-quality prediction of ocean currents. In this paper, we propose a spatiotemporal coupled attention deep network model, STCANet, that can extract abundant temporal and spatial coupling information on the behavior characteristics of ocean currents to improve prediction accuracy. First, a Spatial Module is designed and implemented to extract the spatiotemporal coupling characteristics of ocean currents, while the spatial correlations and dependencies among adjacent sea areas are obtained through a Spatial Channel Attention Module (SCAM). Second, we use a Gated Recurrent Unit (GRU) to extract the temporal relationships of ocean currents, and design and implement a nearest-neighbor time attention module to extract the interdependences of ocean currents between adjacent times, which further improves the accuracy of ocean current prediction. Finally, a series of comparative experiments on the MediSea_Dataset and EastSea_Dataset showed that the prediction quality of our model greatly outperforms that of benchmark models such as History Average (HA), Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and CNN_GRU.
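The History Average (HA) baseline mentioned in the comparison predicts each time slot as the mean of the same slot in past periods. A minimal sketch, assuming the history is a flat list of equal-length periods (e.g. daily cycles of current speed):

```python
def history_average(history, period):
    """Predict one full future period as the slot-wise mean of past periods."""
    n_periods = len(history) // period
    return [
        sum(history[p * period + slot] for p in range(n_periods)) / n_periods
        for slot in range(period)
    ]

# Two past "days" of three time slots each.
past = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
pred = history_average(past, period=3)
```

HA captures only periodic structure, which is why models that learn spatial coupling between adjacent sea areas can outperform it substantially.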
The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation and utilization of its resources. However, existing methods for classifying mineral particles do not fully utilize these multi-modal features, thereby limiting classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and challenges in spatiotemporal feature extraction. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. Initially, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from the two types of polarized images, ensuring feature alignment. Subsequently, a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images. Ultimately, an inter-modal feature fusion module integrates the two types of modal features to enhance classification precision. Quantitative and qualitative experimental results indicate that, compared with current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics. It also demonstrates better stability than existing models.
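The shared-parameter backbone idea above, where the same weights process both polarized modalities so the resulting features stay aligned, can be sketched with a toy "backbone" (a weighted sum standing in for a 2D CNN; all names and numbers are illustrative):

```python
def backbone(image, weights):
    """Toy stand-in for a 2D backbone: one weighted sum as the 'feature'."""
    return [sum(px * w for px, w in zip(image, weights))]

shared_w = [0.5, 0.25, 0.25]                 # one set of weights, used twice
plane_feat = backbone([4, 8, 8], shared_w)   # plane-polarized input
cross_feat = backbone([2, 4, 4], shared_w)   # cross-polarized input
fused = plane_feat + cross_feat              # late fusion by concatenation
```

Because both modalities pass through identical weights, their features live in the same space; separate backbones would instead learn two incompatible feature spaces and reintroduce the misalignment problem the paper describes.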
Today, fatalities, physical injuries, and significant economic losses occur due to car accidents. Among the leading causes of car accidents is drowsiness behind the wheel, which can affect any driver. Drowsiness and sleepiness often have associated indicators that researchers can use to identify and promptly warn drowsy drivers to avoid potential accidents. This paper proposes a spatiotemporal model for monitoring visual indicators of drowsiness from videos. The model integrates a 3D convolutional neural network (3D-CNN) and long short-term memory (LSTM). The 3D-CNN-LSTM can analyze long sequences by applying the 3D-CNN to extract spatiotemporal features within adjacent frames; the learned features are then used as the input of the LSTM component for modeling high-level temporal features. In addition, we investigate how the training of the proposed model is affected by changing the position of the batch normalization (BN) layers in the 3D-CNN units. The BN layer is examined in two different placement settings: before the non-linear activation function and after the non-linear activation function. The study was conducted on two publicly available drowsy driver datasets named 3MDAD and YawDD; 3MDAD is mainly composed of two synchronized datasets recorded from the frontal and side views of the drivers. We show that the placement of the BN layers increases the convergence speed and reduces overfitting on one dataset but not the other. As a result, the model achieves a test detection accuracy of 96%, 93%, and 90% on YawDD, Side-3MDAD, and Front-3MDAD, respectively.
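The two BN placements compared above can be reproduced on a toy activation vector. This sketch uses a plain normalization without BN's learned scale and shift; the point is only that the ordering changes the distribution the next layer sees.

```python
import statistics

def batch_norm(xs, eps=1e-5):
    """Normalize a batch to zero mean, unit variance (no learned scale/shift)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]

def relu(xs):
    return [max(0.0, x) for x in xs]

acts = [-2.0, 0.0, 2.0]              # toy pre-activations from a conv layer
bn_before = relu(batch_norm(acts))   # conv -> BN -> ReLU: outputs clipped at 0
bn_after = batch_norm(relu(acts))    # conv -> ReLU -> BN: negatives reappear
```

With BN before ReLU the layer output is non-negative; with BN after ReLU the rectified values are re-centered, so negative activations flow downstream. Either can help or hurt convergence depending on the data, consistent with the mixed results the paper reports.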
Depression has become a major health threat around the world, especially for older people, so an effective detection method for depression is a great public health challenge. The electroencephalogram (EEG) can be used as a biomarker to effectively explore depression recognition. Motivated by studies showing that multiple smaller-scale kernels can increase non-linear expression compared with a single larger kernel, this article proposes the three-dimensional multiscale kernel convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix, in order to excavate the related spatiotemporal information among the electrodes of the collected electrode matrix. On the major depressive disorder (MDD) and the multi-modal open dataset for mental-disorder analysis (MODMA) datasets, the experiments show that the accuracies of depression recognition are up to 99.86% and 98.01% in the subject-dependent experiment, and 95.80% and 82.27% in the subject-independent experiment, which are higher than alternative competitive methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
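Extending one-dimensional channel features into a two-dimensional electrode matrix can be sketched as placing each channel's value at its scalp position. The 3x3 layout below is hypothetical (the paper does not specify its matrix size), with a few 10-20-system electrode names for illustration.

```python
# Hypothetical scalp layout; None marks a grid slot with no electrode.
LAYOUT = [
    ["Fp1", None, "Fp2"],
    ["C3",  "Cz", "C4"],
    ["O1",  None, "O2"],
]

def to_electrode_matrix(features):
    """Place each channel's scalar feature at its scalp position; empty
    slots and missing channels are zero-filled."""
    return [
        [features.get(name, 0.0) if name else 0.0 for name in row]
        for row in LAYOUT
    ]

matrix = to_electrode_matrix({"Fp1": 1.0, "Cz": 2.0, "O2": 3.0})
```

Stacking one such matrix per time step yields the 3D tensor on which 3D convolutional kernels can capture spatial and temporal structure jointly.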
This paper proposes a novel, efficient, and affordable approach to detecting students' engagement levels in an e-learning environment by using webcams. Our method analyzes spatiotemporal features of e-learners' micro body gestures, which are mapped to emotions and appropriate engagement states. The proposed engagement detection model uses a three-dimensional convolutional neural network to analyze both temporal and spatial information across video frames. We follow a transfer learning approach by using the C3D model that was trained on the Sports-1M dataset. The adopted C3D model was used in two different ways: as a feature extractor with linear classifiers, and as a classifier after fine-tuning the pretrained model. Our model was tested, and its performance was evaluated and compared to existing models. It proved its effectiveness and superiority over other existing methods, with an accuracy of 94%. The results of this work will contribute to the development of smart and interactive e-learning systems with adaptive responses based on users' engagement levels.
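The two transfer-learning modes described above, frozen feature extractor versus fine-tuning, differ only in which parameters are allowed to train. A toy sketch (class and attribute names are illustrative, not the paper's API):

```python
class ToyTransferModel:
    """Stand-in for a pretrained C3D backbone plus a new linear classifier."""

    def __init__(self):
        self.extractor_w = 2.0    # stands in for pretrained C3D weights
        self.classifier_w = 1.0   # new task-specific head

    def params_to_train(self, fine_tune):
        """Feature-extractor mode trains only the head; fine-tuning mode
        also updates the pretrained backbone weights."""
        names = ["classifier_w"]
        if fine_tune:
            names.insert(0, "extractor_w")
        return names

frozen = ToyTransferModel().params_to_train(fine_tune=False)
tuned = ToyTransferModel().params_to_train(fine_tune=True)
```

Freezing keeps the pretrained Sports-1M features intact and trains fast; fine-tuning adapts those features to the new gesture domain at the cost of more computation and a higher overfitting risk on small datasets.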
Timely acquisition of chicken behavioral information is crucial for assessing chicken health status and production performance. Video-based behavior recognition has emerged as a primary technique for obtaining such information due to its accuracy and robustness. Video-based models generally predict a single behavior from a single video segment of a fixed duration. However, during periods of high activity in poultry, a behavior transition may occur within a video segment, and existing models often fail to capture such transitions effectively. This limitation highlights the insufficient temporal resolution of video-based behavior recognition models. This study presents a chicken behavior recognition and localization model, CBLFormer, based on spatiotemporal feature learning. The model is designed to recognize behaviors that occur before and after transitions in video segments and to localize the corresponding time interval for each behavior. An improved transformer block, the cascade encoder-decoder network (CEDNet), a transformer-based head, and a weighted distance intersection over union (WDIoU) loss were integrated into CBLFormer to enhance the model's ability to distinguish between behavior categories and locate behavior boundaries. For training and testing, a dataset was created by collecting videos from 320 chickens across different ages and rearing densities. The results showed that CBLFormer achieved a mAP@0.5:0.95 of 98.34% on the test set, with the integration of CEDNet contributing the most to the performance improvement. The visualization results confirmed that the model effectively captured the behavioral boundaries of chickens and correctly recognized behavior categories, and the transfer learning results demonstrated that the model is applicable to chicken behavior recognition and localization tasks in real-world poultry farms. The proposed method handles cases where poultry behavior transitions occur within a video segment and improves the temporal resolution of video-based behavior recognition models.
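A score such as mAP@0.5:0.95 averages detection precision over temporal IoU thresholds between predicted and ground-truth intervals; the WDIoU loss presumably augments plain IoU with a distance-based weighting, which the abstract does not specify. Plain temporal IoU is sketched here.

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals given as (start, end), e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Predicted "pecking" interval vs ground truth: 2 s overlap, 6 s union.
iou = temporal_iou((2.0, 6.0), (4.0, 8.0))
```

A detection counts as correct at threshold 0.5 only if its IoU with a ground-truth interval reaches 0.5, so mAP@0.5:0.95 rewards increasingly tight boundary localization.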
Research on emotion recognition based on electroencephalogram (EEG) signals often ignores the related information between the brain electrode channels and the contextual emotional information existing in EEG signals, both of which may contain important characteristics related to emotional states. To remedy these defects, a spatiotemporal emotion recognition method based on a three-dimensional (3D) time-frequency domain feature matrix is proposed. Specifically, the extracted time-frequency domain EEG features are first expressed in a 3D matrix format according to the actual positions on the cerebral cortex. Then, the input 3D matrix is processed successively by a multivariate convolutional neural network (MVCNN) and long short-term memory (LSTM) to classify the emotional state. The spatiotemporal emotion recognition method is evaluated on the DEAP dataset, achieving accuracies of 87.58% and 88.50% on the arousal and valence dimensions respectively in binary classification tasks, and an accuracy of 84.58% in the four-class classification task. The experimental results show that the 3D matrix representation can represent emotional information more reasonably than a two-dimensional (2D) one. In addition, MVCNN and LSTM can utilize the spatial information of the electrode channels and the temporal context information of the EEG signal, respectively.
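The abstract does not name its exact time-frequency features, but a common choice for band-passed EEG (e.g. on DEAP) is differential entropy, which for an approximately Gaussian segment reduces to a closed form. A minimal sketch under that Gaussian assumption:

```python
import math
import statistics

def differential_entropy(segment):
    """Differential entropy of a band-passed EEG segment, assuming it is
    roughly Gaussian: DE = 0.5 * ln(2 * pi * e * variance)."""
    var = statistics.pvariance(segment)
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Toy segment with zero mean and unit population variance.
de = differential_entropy([1.0, -1.0, 1.0, -1.0])
```

Computing one such scalar per channel per frequency band, then arranging the values by electrode position, yields exactly the kind of time-frequency feature matrix the method stacks into its 3D input.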
Funding (video action recognition paper): supported by the National Natural Science Foundation of China (Grant No. 62071098) and the Sichuan Science and Technology Program (Grants 2022YFG0319, 2023YFG0301, and 2023YFG0018).
Funding (geoscience knowledge graph paper): supported by the National Natural Science Foundation of China (Grant No. 42050101), the National Key Research and Development Program of China (Grant Nos. 2022YFB3904200 and 2021YFB00903), and the International Big Science Program of Deep-time Digital Earth (DDE).
Funding (LUCC simulation paper): National Natural Science Foundation of China (Nos. 42271418 and 42171088); State Key Laboratory of Earth Surface Processes and Resource Ecology (Nos. 2022-ZD-04 and 2023-WT-02).
Funding (Boussinesq equation paper): supported by the National Natural Science Foundation of China (Grant Nos. 11061028 and 11261049), the Yunnan Natural Science Foundation (Grant Nos. 2010CD086 and 2011Y012), and the Qujing Normal University Natural Science Foundation (Grant Nos. 2009ZD002 and 2012QN016).
Funding (missing value imputation paper): supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Funding (ocean current prediction paper): supported by the National Key Research and Development Program of China (Nos. 2020YFE0201200 and 2019YFC1509100), in part by the Youth Program of the Natural Science Foundation of China (No. 41706010), and by the Fundamental Research Funds for the Central Universities (No. 202264002).
Funding (mineral particle classification paper): supported by the National Natural Science Foundation of China (Grant Nos. 62071315 and 62271336).
Funding: Supported by the National Natural Science Foundation of China (Nos. 61862058, 61962034, and 8226070356), and in part by the Gansu Provincial Science & Technology Department (No. 20JR10RA076).
Abstract: Depression has become a major health threat around the world, especially for older people, so an effective detection method for depression is a great public health challenge. The electroencephalogram (EEG) can be used as a biomarker to effectively explore depression recognition. Motivated by studies showing that multiple smaller-scale kernels can increase non-linear expressiveness compared to a single larger kernel, this article proposes the three-dimensional multiscale-kernel convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix to excavate the related spatiotemporal information among electrodes and the collected electrode matrix. On the major depressive disorder (MDD) and the multi-modal open dataset for mental-disorder analysis (MODMA) datasets, experiments show that the accuracies of depression recognition reach 99.86% and 98.01% in the subject-dependent experiment, and 95.80% and 82.27% in the subject-independent experiment, which are higher than those of alternative competitive methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
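The cited motivation, that stacking small kernels matches a larger kernel's receptive field while leaving room for an extra non-linearity, can be checked in one dimension with NumPy. The kernels below are random and purely illustrative, not the 3DMKDR weights:

```python
import numpy as np

rng = np.random.default_rng(2)
signal = rng.standard_normal(100)  # 1-D stand-in for an EEG feature row

k3_a = rng.standard_normal(3)      # two small (width-3) kernels ...
k3_b = rng.standard_normal(3)
k5 = rng.standard_normal(5)        # ... versus one large (width-5) kernel

# Cascading two width-3 'valid' convolutions shrinks the signal by 4
# samples, exactly like a single width-5 convolution: same receptive
# field, but with a ReLU inserted between the two small kernels.
stacked = np.convolve(
    np.maximum(np.convolve(signal, k3_a, mode="valid"), 0.0),
    k3_b, mode="valid")
single = np.convolve(signal, k5, mode="valid")

print(stacked.shape, single.shape)  # both (96,)
```

Each output covers the same 5-sample window of the input, so the multiscale-kernel design trades no receptive field for the added non-linear expressiveness.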
Funding: This research work was funded by the Makkah Digital Gate Initiative under Grant Number (MDP-IRI-8-2020), Emirate of Makkah Province and King Abdulaziz University, Jeddah, Saudi Arabia. https://science.makkah.kau.edu.sa/Default-101888-AR
Abstract: This paper proposes a novel, efficient, and affordable approach to detecting students' engagement levels in an e-learning environment using webcams. Our method analyzes spatiotemporal features of e-learners' micro body gestures, which are mapped to emotions and appropriate engagement states. The proposed engagement detection model uses a three-dimensional convolutional neural network to analyze both temporal and spatial information across video frames. We follow a transfer learning approach using the C3D model trained on the Sports-1M dataset. The adopted C3D model was used in two ways: as a feature extractor with linear classifiers, and as a classifier after fine-tuning the pretrained model. Our model was tested, and its performance was evaluated and compared to existing models. It proved its effectiveness and superiority over the other existing methods with an accuracy of 94%. The results of this work will contribute to the development of smart and interactive e-learning systems with adaptive responses based on users' engagement levels.
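Of the two transfer settings, the feature-extractor one is the simpler to sketch: the pretrained network is frozen and only a linear classifier is trained on its outputs. A NumPy toy version follows, with a fixed random projection standing in for C3D's features and synthetic labels; every name and number here is illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical frozen "feature extractor": a fixed random projection
# standing in for the pretrained network's penultimate-layer features.
W_frozen = rng.standard_normal((20, 5))

def extract(clips):
    return np.maximum(clips @ W_frozen, 0.0)

# Toy two-class data: "engaged" vs. "disengaged" clips (synthetic).
X = rng.standard_normal((200, 20))
y = (X[:, 0] > 0).astype(float)

feats = extract(X)  # frozen features; W_frozen is never updated

# Train only a linear (logistic) classifier on top, via gradient descent.
w, b = np.zeros(5), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
    w -= 0.5 * (feats.T @ (p - y)) / len(y)
    b -= 0.5 * (p - y).mean()

acc = (((feats @ w + b) > 0) == (y == 1)).mean()
print(f"train accuracy: {acc:.2f}")
```

The fine-tuning alternative differs only in that the backbone weights would also receive gradients, typically at a smaller learning rate.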
Funding: Supported by the Scientific Research Fund of the Zhejiang Provincial Education Department (Y202457020).
Abstract: Timely acquisition of chicken behavioral information is crucial for assessing chicken health status and production performance. Video-based behavior recognition has emerged as a primary technique for obtaining such information due to its accuracy and robustness. Video-based models generally predict a single behavior from a single video segment of fixed duration. However, during periods of high activity in poultry, behavior transitions may occur within a video segment, and existing models often fail to capture such transitions effectively. This limitation highlights the insufficient temporal resolution of video-based behavior recognition models. This study presents a chicken behavior recognition and localization model, CBLFormer, based on spatiotemporal feature learning. The model is designed to recognize behaviors that occur before and after transitions in video segments and to localize the corresponding time interval for each behavior. An improved transformer block, the cascade encoder-decoder network (CEDNet), a transformer-based head, and a weighted distance intersection over union (WDIoU) loss were integrated into CBLFormer to enhance the model's ability to distinguish between behavior categories and locate behavior boundaries. For training and testing CBLFormer, a dataset was created by collecting videos from 320 chickens across different ages and rearing densities. The results show that CBLFormer achieved a mAP@0.5:0.95 of 98.34% on the test set, with CEDNet contributing the most to the performance improvement. The visualization results confirm that the model effectively captures the behavioral boundaries of chickens and correctly recognizes behavior categories. The transfer learning results demonstrate that the model is applicable to chicken behavior recognition and localization tasks in real-world poultry farms. The proposed method handles cases where poultry behavior transitions occur within a video segment and improves the temporal resolution of video-based behavior recognition models.
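The localization side of such a model is scored against ground truth by the overlap of predicted and true time intervals. A minimal sketch of the underlying 1-D temporal IoU and its distance-IoU refinement, which penalizes centre misalignment; the weighting scheme that makes the paper's loss a *weighted* DIoU is not reproduced here:

```python
def temporal_iou(a, b):
    """Plain IoU of two time intervals a=(start, end), b=(start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_diou(a, b):
    """Distance-IoU: IoU minus a normalized centre-distance penalty."""
    iou = temporal_iou(a, b)
    centre_gap = abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)
    enclose = max(a[1], b[1]) - min(a[0], b[0])  # smallest enclosing span
    return iou - (centre_gap / enclose) ** 2 if enclose > 0 else iou

pred, gt = (1.0, 4.0), (2.0, 5.0)   # predicted vs. true behavior interval
print(temporal_iou(pred, gt))       # 2 / 4 = 0.5
print(temporal_diou(pred, gt))      # 0.5 - (1/4)**2 = 0.4375
```

Unlike plain IoU, the DIoU term still provides a gradient signal when two intervals do not overlap at all, which is useful when regressing behavior boundaries.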
Funding: Supported by the National Natural Science Foundation of China (61872126) and the Key Scientific Research Project Plan of Colleges and Universities in Henan Province (19A520004).
Abstract: Research on emotion recognition based on electroencephalogram (EEG) signals often ignores the relational information between brain electrode channels and the contextual emotional information in EEG signals, both of which may contain important characteristics related to emotional states. To address these defects, a spatiotemporal emotion recognition method based on a three-dimensional (3D) time-frequency domain feature matrix is proposed. Specifically, the extracted time-frequency domain EEG features are first expressed in a 3D matrix format according to the actual positions on the cerebral cortex. Then, the input 3D matrix is processed successively by a multivariate convolutional neural network (MVCNN) and long short-term memory (LSTM) to classify the emotional state. The spatiotemporal emotion recognition method is evaluated on the DEAP dataset and achieves accuracies of 87.58% and 88.50% on the arousal and valence dimensions, respectively, in binary classification tasks, as well as an accuracy of 84.58% in the four-class classification task. The experimental results show that the 3D matrix representation can represent emotional information more reasonably than a two-dimensional (2D) one. In addition, MVCNN and LSTM can exploit the spatial information of the electrode channels and the temporal context information of the EEG signal, respectively.
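The 3D matrix construction amounts to placing each channel's time-frequency feature vector at that electrode's position on a 2D scalp grid, stacking the feature dimension as depth. A NumPy sketch with a hypothetical 3x3 layout for five channels; real montages and the paper's exact grid differ:

```python
import numpy as np

# Hypothetical 3x3 scalp layout for 5 EEG channels (illustrative only;
# the paper maps channels to their actual cortical positions).
layout = {"Fp1": (0, 0), "Fp2": (0, 2), "Cz": (1, 1),
          "O1": (2, 0), "O2": (2, 2)}

def to_3d_matrix(features):
    """Arrange per-channel feature vectors (one per frequency band)
    into a (rows, cols, n_bands) matrix following the scalp layout;
    grid cells with no electrode stay zero."""
    n_bands = next(iter(features.values())).shape[0]
    grid = np.zeros((3, 3, n_bands))
    for ch, (r, c) in layout.items():
        grid[r, c] = features[ch]
    return grid

rng = np.random.default_rng(4)
feats = {ch: rng.standard_normal(4) for ch in layout}  # 4 freq. bands
m = to_3d_matrix(feats)
print(m.shape)  # (3, 3, 4)
```

Because neighboring grid cells now correspond to physically neighboring electrodes, a convolutional network applied to `m` can pick up inter-channel spatial relations that a flat 1D channel ordering would hide.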