A fast-charging policy is widely employed to alleviate the inconvenience caused by the extended charging time of electric vehicles. However, fast charging exacerbates battery degradation and shortens battery lifespan. In addition, there is still a lack of tailored health estimations for fast-charging batteries; most existing methods are applicable only at lower charging rates. This paper proposes a novel method for estimating the health of lithium-ion batteries, tailored for multi-stage constant current-constant voltage fast-charging policies. Initially, short charging segments are extracted by monitoring current switches, followed by deriving voltage sequences using interpolation techniques. Subsequently, a graph generation layer is used to transform the voltage sequences into graphical data. Furthermore, the integration of a graph convolution network with a long short-term memory network enables the extraction of information related to inter-node message transmission, capturing the key local and temporal features of the battery degradation process. Finally, the method is validated using aging data from 185 cells and 81 distinct fast-charging policies. A 4-minute charging duration achieves a balance between high accuracy in estimating battery state of health and low data requirements, with mean absolute and root mean square errors of 0.34% and 0.66%, respectively.
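The message-passing step of the graph convolution network described above can be sketched in a few lines. The normalized-adjacency propagation rule below is a common GCN formulation and an assumption here, since the abstract does not give the paper's exact rule:

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: H' = relu(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))  # D^-1/2
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features # aggregate neighbor messages
    return np.maximum(propagated @ weights, 0.0)            # linear transform + ReLU

# Toy graph: 4 voltage-sequence nodes in a chain, 3 input features, 2 output features
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = np.random.RandomState(0).rand(4, 3)
w = np.random.RandomState(1).rand(3, 2)
out = gcn_layer(adj, h, w)
print(out.shape)  # (4, 2)
```

In the paper's architecture, the per-node outputs of such layers would then feed an LSTM to capture degradation over time.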
Safety is one of the most important topics in the field of civil aviation. The Auxiliary Power Unit (APU) is an important aircraft component that provides electrical power and compressed air for the aircraft. Hazards in the APU are prone to cause economic losses and even casualties, so actively identifying hazards in the APU before an accident occurs is necessary. In this paper, a Hybrid Deep Neural Network (HDNN) based on a multi-time-window convolutional neural network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) architecture is proposed for active hazard identification of the APU in civil aircraft. To identify the risks caused by different types of failures, the proposed HDNN integrates three CNN-BiLSTM basic models with different time window sizes in parallel through a fully connected neural network. The CNN-BiLSTM basic model automatically extracts features representing the system state from the input data and learns the temporal information of irregular trends in the time-series data. Nine benchmark models are compared with the proposed HDNN. The comparison results show that the proposed HDNN has the highest identification accuracy and the most stable identification performance on data with imbalanced samples.
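The multi-time-window idea above — parallel branches over different window sizes, fused for a final decision — can be illustrated with a toy extractor. The mean/std features are placeholders for the CNN-BiLSTM branches, and the window sizes are illustrative assumptions:

```python
import numpy as np

def window_features(series, window_sizes):
    """Summarize the most recent window of each size; stand-in for per-window CNN-BiLSTM branches."""
    feats = []
    for w in window_sizes:
        window = series[-w:]                          # most recent w samples
        feats.extend([window.mean(), window.std()])   # toy features per branch
    return np.array(feats)                            # concatenated, as the fusion layer would receive

signal = np.sin(np.linspace(0, 10, 200))              # toy APU sensor trace
fused_input = window_features(signal, window_sizes=(16, 32, 64))
print(fused_input.shape)  # (6,)
```

The fused vector would then pass through a fully connected network to produce the hazard-identification output.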
A tremendous number of vendor invoices is generated in the corporate sector. To automate the manual data entry in payable documents, highly accurate Optical Character Recognition (OCR) is required. This paper proposes an end-to-end OCR system that performs both localization and recognition and serves as a single unit for automating payable-document processing such as cheques and cash disbursements. For text localization, the maximally stable extremal region is used, which extracts a word or digit chunk from an invoice. This chunk is then passed to a deep learning model, which performs text recognition. The deep learning model combines convolutional neural networks and long short-term memory (LSTM): the convolutional layers extract features, which are fed to the LSTM. The model integrates feature extraction, sequence modeling, and transcription into a unified network. It handles sequences of unconstrained length, independent of character segmentation or horizontal scale normalization. Furthermore, it applies to both lexicon-free and lexicon-based text recognition, and it produces a comparatively small model that can be deployed in practical applications. The overall superior performance in the experimental evaluation demonstrates the usefulness of the proposed model, which is generic and can be used in other similar recognition scenarios.
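The CNN-to-LSTM handoff described above relies on the LSTM recurrence consuming one CNN feature column per timestep. A minimal single-cell sketch, with random illustrative weights rather than anything from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM timestep; gates stacked as [input, forget, output, candidate]."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c_new = f * c + i * g          # update cell state
    h_new = o * np.tanh(c_new)     # emit hidden state
    return h_new, c_new

rng = np.random.RandomState(0)
x_dim, h_dim = 8, 4                # e.g. 8 CNN features per image column
W = rng.rand(4 * h_dim, x_dim)
U = rng.rand(4 * h_dim, h_dim)
b = np.zeros(4 * h_dim)
h = c = np.zeros(h_dim)
for t in range(5):                 # run over 5 feature columns
    h, c = lstm_step(rng.rand(x_dim), h, c, W, U, b)
print(h.shape)  # (4,)
```

In a full CRNN-style recognizer, the hidden states would feed a transcription layer that maps them to character probabilities.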
Recognition of human activity based on convolutional neural networks (CNNs) has attracted the interest of researchers in recent years due to its significant improvements in accuracy. A large number of deep learning algorithms have been proposed for activity recognition. However, as devices with limited computational resources become increasingly widespread, efficient deep learning approaches with improved utilization of computational resources are needed. This paper presents a simple and efficient 2-dimensional CNN (2-D CNN) architecture with a very small convolutional kernel for human activity recognition. The merit of the proposed CNN architecture over standard deep learning architectures is its fewer trainable parameters and lower memory requirement, which enable training on devices with low GPU memory and allow it to work well with both smaller and larger datasets. The proposed approach consists of four main stages: (1) dataset creation and data augmentation, (2) design of the 2-D CNN architecture, (3) training the proposed 2-D CNN architecture from scratch to the optimum stage, and (4) evaluation of the trained 2-D CNN architecture. To illustrate the effectiveness of the proposed architecture, extensive experiments are conducted on three publicly available datasets, namely IXMAS, YouTube, and UCF101. The results of the proposed method and its comparison with other state-of-the-art methods demonstrate its usefulness.
As a common and high-risk disease, heart disease seriously threatens people's health. At the same time, in the era of the Internet of Things (IoT), smart medical devices have strong practical significance for medical workers and patients because of their ability to assist in the diagnosis of diseases. Therefore, research on real-time diagnosis and classification algorithms for arrhythmia can help improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an Encoder-Decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on classification results, and it is trained and tested on the MIT-BIH arrhythmia database. In addition, Generative Adversarial Networks (GANs) are adopted as a data-equalization method to address the data imbalance problem. The simulation results show that, for inter-patient arrhythmia classification, the hybrid model combining the CNN and the Encoder-Decoder model has the best classification accuracy, reaching 94.05%. In particular, it performs better on supraventricular ectopic beats (class S) and fusion beats (class F).
Recent advancements have established machine learning's utility in predicting nonlinear fluid dynamics, with predictive accuracy being a central motivation for employing neural networks. However, the pattern recognition central to the network's function is equally valuable for enhancing our dynamical insight into complex fluid dynamics. In this paper, a single-layer convolutional neural network (CNN) was trained to recognize three qualitatively different subsonic buffet flows (periodic, quasi-periodic, and chaotic) over a high-incidence airfoil, and near-perfect accuracy was obtained with only a small training dataset. The convolutional kernels and corresponding feature maps, developed by the model with no temporal information provided, identified large-scale coherent structures in agreement with those known to be associated with buffet flows. Sensitivity to hyperparameters, including network architecture and convolutional kernel size, was also explored. The coherent structures identified by these models enhance our dynamical understanding of subsonic buffet over high-incidence airfoils across a wide range of Reynolds numbers.
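The basic operation behind the convolutional kernels and feature maps discussed above is a 2-D cross-correlation of a learned kernel with the flow field. A minimal sketch on a toy snapshot, with an illustrative hand-written gradient kernel standing in for a learned one:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode cross-correlation: slide the kernel over the image and take dot products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

snapshot = np.random.RandomState(0).rand(16, 16)   # toy stand-in for a flow-field sample
edge_kernel = np.array([[1., 0., -1.]] * 3)        # simple horizontal-gradient kernel
fmap = conv2d_valid(snapshot, edge_kernel)
print(fmap.shape)  # (14, 14)
```

Inspecting such feature maps — where the kernel responds strongly — is how the paper relates learned filters to coherent flow structures.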
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying from visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and organized the work into three main parts: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual feature extraction uses Dlib, and visual speech is then classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. Audio speech of Indian English was successfully recognized with training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. The training accuracy for visual speech recognition on the Indian English dataset was 77.48% and the test accuracy was 76.19% after 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
Ionospheric delay is one of the main noise sources affecting global navigation satellite systems, radio detection and ranging systems, and very-long-baseline interferometry. One of the most important and common ways to reduce this phase delay is to establish accurate nowcasting and forecasting models of the ionospheric total electron content. For forecasting models, compared to mid-to-high latitudes, the active ionosphere at low latitudes leads to extreme differences between long-term prediction models and the actual state of the ionosphere. To solve the problem of low accuracy of long-term prediction models at low latitudes, this article provides a low-latitude, long-term ionospheric prediction model based on a multi-input-multi-output long short-term memory neural network. To verify the feasibility of the model, we first predicted the vertical total electron content 24 and 48 hours in advance for each day of July 2020 and then compared the predictions corresponding to a given day across all days. Furthermore, in the model-modification part, we selected historical data from June 2020 as the validation set, determined a large offset in the results predicted to be active, and used the ratio of the mean absolute error of the detected results to that of the predicted results as a correction coefficient to modify our multi-input-multi-output long short-term memory model. The average root mean square error of the 24-hour-advance predictions of our modified model was 4.4 TECU, which was lower and better than the 5.1 TECU of the unmodified multi-input-multi-output long short-term memory model and the 5.9 TECU of the IRI-2016 model.
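One reading of the correction step above — scaling forecasts by a ratio of validation-set mean absolute errors — can be sketched as follows. The exact form of the paper's coefficient and how it is applied are assumptions here:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two series."""
    return np.mean(np.abs(a - b))

def corrected_forecast(pred, detected, truth):
    """Scale forecasts by the MAE ratio of detected to predicted results (one possible interpretation)."""
    coeff = mae(detected, truth) / mae(pred, truth)  # correction coefficient from validation data
    return pred * coeff

truth = np.array([10., 12., 14.])     # toy TEC ground truth
pred = np.array([12., 14., 16.])      # long-term model output, MAE = 2
detected = np.array([11., 13., 15.])  # detected (validation) results, MAE = 1
adjusted = corrected_forecast(pred, detected, truth)
print(adjusted)  # [6. 7. 8.]
```

In practice the coefficient would be estimated on the June 2020 validation window and then applied to the July forecasts.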
Time series prediction has always been an important problem in the field of machine learning. Within it, power load forecasting plays a crucial role in identifying the behavior of photovoltaic power plants and regulating their control strategies. Traditional power load forecasting often has poor feature-extraction performance on long time series. In this paper, a new deep learning framework, Residual Stacked Temporal Long Short-Term Memory (RST-LSTM), is proposed, which combines wavelet decomposition with a temporal convolutional memory network to solve the problem of feature extraction for long sequences. The RST-LSTM framework consists of two parts: a stacked temporal convolutional memory unit module for global and local feature extraction, and a residual combination optimization module that reduces model redundancy. Finally, this paper demonstrates through various experimental indicators that RST-LSTM achieves significant improvements in both overall and local prediction accuracy compared with several state-of-the-art baseline methods.
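The wavelet-decomposition front end can be illustrated with a one-level Haar transform, which splits a load series into a smoothed trend and local fluctuations of the kind the global and local branches would consume. Haar is an assumption; the abstract does not state the paper's wavelet basis:

```python
import numpy as np

def haar_dwt(signal):
    """One-level Haar decomposition: approximation (trend) and detail (fluctuation) coefficients."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency trend
    detail = (even - odd) / np.sqrt(2)   # high-frequency fluctuations
    return approx, detail

load = np.array([4., 6., 10., 12., 8., 6., 5., 7.])  # toy load series
a, d = haar_dwt(load)
print(a.round(3), d.round(3))
```

The two coefficient streams would then be fed to separate feature-extraction modules before residual recombination.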
Real-time prediction and precise control of sinter quality are pivotal for energy saving, cost reduction, quality improvement, and efficiency enhancement in the ironmaking process. To advance the accuracy and comprehensiveness of sinter quality prediction, an intelligent flare-monitoring system for sintering machine tails was proposed, built on hybrid neural networks integrating a convolutional neural network with long short-term memory (CNN-LSTM). The system utilizes a high-temperature thermal imager for image acquisition at the sintering machine tail and employs a zone-triggered method to accurately capture dynamic feature images under the challenging conditions of high temperature, high dust, and occlusion. The feature images are then segmented through a triple-iteration multi-thresholding approach based on the maximum between-class variance method to minimize detail loss during segmentation. Leveraging the advantages of CNN and LSTM networks in capturing spatial and temporal information, a comprehensive model for sinter quality prediction was constructed, with inputs including the proportion of the combustion layer, porosity rate, temperature distribution, and image features obtained from the convolutional neural network, and outputs comprising quality indicators such as the underburning index, uniformity index, and FeO content of the sinter. Accuracy is notably increased, achieving a 95.8% hit rate within an error margin of ±1.0. After the system was applied, the average qualified rate of FeO content increased from 87.24% to 89.99%, an improvement of 2.75 percentage points, and the average monthly solid fuel consumption was reduced from 49.75 to 46.44 kg/t, a 6.65% reduction, underscoring significant energy-saving and cost-reduction effects.
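The maximum between-class variance criterion (Otsu's method) underlying the triple-iteration multi-thresholding step can be sketched for a single threshold pass; the histogram bin count is an illustrative choice:

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Pick the threshold that maximizes between-class variance w0*w1*(mu0-mu1)^2."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0     # class means
        mu1 = (p[k:] * centers[k:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[k]
    return best_t

# Synthetic bimodal "thermal image" intensities: dark background vs bright flame
rng = np.random.RandomState(0)
pixels = np.concatenate([rng.normal(0.2, 0.05, 5000), rng.normal(0.8, 0.05, 5000)])
t = otsu_threshold(pixels)
print(round(t, 2))  # falls between the two intensity modes
```

The paper's triple-iteration variant would repeat such a pass on sub-ranges to obtain multiple thresholds with less detail loss.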
This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token-to-Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token-to-Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2%, compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token-to-Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT's ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The other transformer-based models also capture complex features better than the CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT's potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token-to-Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT's computational efficiency for deployment in real-time quality control systems.
With breakthroughs in data processing and pattern recognition through deep learning, the use of advanced algorithmic models for analyzing and interpreting soil spectral information has provided an efficient and economical method for soil quality assessment. However, traditional single-output networks exhibit limitations in the prediction process, particularly their inability to fully utilize the correlations among elements. As a result, single-output networks tend to be optimized for a single task, neglecting the interrelationships among different soil elements, which limits prediction accuracy and model generalizability. To overcome this limitation, this study implements a multi-task learning architecture with a progressive extraction network for the simultaneous prediction of multiple soil indicators, including nitrogen (N), organic carbon (OC), calcium carbonate (CaCO3), cation exchange capacity (CEC), and pH. Furthermore, while incorporating the Pearson correlation coefficient, convolutional neural networks, long short-term memory networks, and attention mechanisms are combined to extract local abstract features from the original spectra, further improving the model. This architecture is referred to as the Relevance-Sharing Progressive Layered Extraction Network. The model employs an adaptive joint-loss optimization method to update the weights of individual task losses during multi-task training.
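The adaptive joint-loss idea above — re-weighting individual task losses during training — can be sketched with a simple rate-based heuristic. This is one common scheme, not necessarily the paper's exact update rule:

```python
import numpy as np

def adaptive_joint_loss(task_losses, prev_losses, eps=1e-8):
    """Toy adaptive weighting: tasks whose loss is falling slowly get proportionally more weight."""
    rates = np.array(task_losses) / (np.array(prev_losses) + eps)  # descent rate per task
    weights = rates / rates.sum()                                  # normalize to a convex combination
    return float(np.dot(weights, task_losses)), weights

# Toy losses for N, OC, CaCO3, CEC, pH at two consecutive training steps
prev = [1.0, 0.8, 1.2, 0.9, 0.5]
curr = [0.9, 0.4, 1.1, 0.8, 0.45]
joint, w = adaptive_joint_loss(curr, prev)
print(w.round(3))
```

The joint scalar would be the quantity actually backpropagated, so slower-improving indicators receive more gradient signal.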
The combustion characteristic parameters of mining conveyor belts are a crucial index for measuring the fire performance and hazard posed by combustible materials. Accurate prediction of their values provides important guidance for preventing conveyor belt fires. The critical parameters of a flame-retardant polyvinyl chloride gum elastic conveyor belt were measured under different radiative heat fluxes, including mass loss rate, heat release rate, effective heat of combustion, and gas production rates for CO and CO2. A prediction method for the combustion characteristics of conveyor belts was proposed by combining a convolutional neural network with long short-term memory. Results indicated that the peak values of the mass loss, heat release, smoke production, and CO and CO2 production rates were positively correlated with radiative heat flux, whilst the time required to reach the peak value was negatively correlated with it. The peak of the effective heat of combustion occurred earlier. Through deep learning modelling, the mean absolute error, root mean square error, and coefficient of determination were 2.09, 3.45, and 0.993, respectively. Compared with the convolutional neural network, long short-term memory, and multilayer perceptron baselines, the mean absolute error decreased by 26.92%, 24.82%, and 25.09%; the root mean square error declined by 27.82%, 29.59%, and 29.59%; and the coefficient of determination increased by 0.005, 0.006, and 0.006, respectively. The findings provide a quantitative reference benchmark for the development of conveyor belt fires and offer new technical support for the construction of early-warning systems for conveyor belt fires in coal mines.
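The three scores reported above (mean absolute error, root mean square error, and coefficient of determination) are standard regression metrics and can be computed as:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 between measured and predicted combustion parameters."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    r2 = 1.0 - (err ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
    return mae, rmse, r2

# Toy measured vs predicted values (not the paper's data)
mae, rmse, r2 = regression_metrics([10., 20., 30., 40.], [11., 19., 33., 38.])
print(round(mae, 2), round(rmse, 2), round(r2, 3))  # 1.75 1.94 0.97
```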
Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical-flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction; the ConvNets for all groups share parameters. To learn long-term features, the outputs of all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, namely the Jester and Nvidia datasets. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
Stocks that are fundamentally connected with each other tend to move together. Considering such common trends is believed to benefit stock movement forecasting. However, these signals are not trivial to model because the connections among stocks are not physically present and must be estimated from volatile data. Motivated by this observation, we propose a framework that incorporates the interconnections of firms to forecast stock prices. To effectively utilize a large set of fundamental features, we further design a novel pipeline. First, we use a variational autoencoder (VAE) to reduce the dimension of stock fundamental information and then cluster stocks into a graph structure (fundamental clustering). Second, a hybrid model of a graph convolutional network and a long short-term memory network (GCN-LSTM) with an adjacency graph matrix (learnt from the VAE) is proposed for graph-structured stock market forecasting. Experiments on minute-level U.S. stock market data demonstrate that our model effectively captures both spatial and temporal signals and achieves a superior improvement over baseline methods. The proposed model is promising for other applications in which a possible but hidden spatial dependency can improve time-series prediction.
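The first pipeline stage — compressing fundamentals to a latent code and clustering stocks into a graph — can be sketched with a toy k-means stand-in for the VAE-based clustering. The real pipeline learns the latent space and the adjacency jointly; everything below is illustrative:

```python
import numpy as np

def cluster_adjacency(latent, n_clusters=2, iters=10, seed=0):
    """K-means over latent codes, then connect same-cluster stocks in a binary adjacency matrix."""
    rng = np.random.RandomState(seed)
    centers = latent[rng.choice(len(latent), n_clusters, replace=False)]
    for _ in range(iters):
        dists = ((latent[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)          # assign each stock to nearest center
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = latent[labels == k].mean(axis=0)
    adj = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)                     # no self-edges; GCN layers add self-loops separately
    return adj, labels

# Toy latent codes: two well-separated fundamental groups of 5 stocks each
rng = np.random.RandomState(1)
codes = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
adj, labels = cluster_adjacency(codes)
print(adj.shape)  # (10, 10)
```

The resulting adjacency matrix is what the GCN-LSTM stage would consume alongside the price time series.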
Owing to the expansion of the grid interconnection scale, the spatiotemporal distribution characteristics of the frequency response of power systems after disturbances have become increasingly important. These characteristics can provide effective support for coordinated security control. However, traditional model-based frequency-prediction methods cannot satisfactorily meet the requirements of online applications owing to their long calculation times and dependence on accurate power-system models. Therefore, this study presents a rolling frequency-prediction model based on a graph convolutional network (GCN) and a long short-term memory (LSTM) spatiotemporal network, named STGCN-LSTM. In the proposed method, measurement data from phasor measurement units after the occurrence of disturbances are used to construct the spatiotemporal input. An improved GCN embedded with topology information extracts the spatial features, while the LSTM network extracts the temporal features. The spatiotemporal-network regression model is then trained, and asynchronous frequency-sequence prediction is realized through the rolling update of measurement information. The proposed spatiotemporal-network-based prediction model achieves accurate frequency prediction by considering the spatiotemporal distribution characteristics of the frequency response. The noise immunity and robustness of the proposed method are verified on the IEEE 39-bus and IEEE 118-bus systems.
Artificial intelligence (AI) processes data-centric applications with minimal effort. However, it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications due to the separation of computation and storage. The emergence of computing-in-memory (CIM) is significant for circumventing the von Neumann bottleneck. A commercialized memory architecture, static random-access memory (SRAM), is fast and robust, consumes little power, and is compatible with state-of-the-art technology. This study investigates the research progress of SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.
Funding (fast-charging battery health estimation study): National Key Research and Development Program of China (Grant No. 2022YFE0102700); National Natural Science Foundation of China (Grant No. 52102420); research project "Safe Da Batt" (03EMF0409A), funded by the German Federal Ministry of Digital and Transport (BMDV); China Postdoctoral Science Foundation (Grant No. 2023T160085); Sichuan Science and Technology Program (Grant No. 2024NSFSC0938).
Funding (APU hazard identification study): co-supported by the National Natural Science Foundation of China (No. U1933202); the Natural Science Foundation of Civil Aviation University of China (No. U1733201); the China Scholarship Council (CSC) (No. 201906830043); and the Postgraduate Research & Practice Innovation Program of Jiangsu Province, China (Nos. KYCX18_0310 and KYCX18_0265).
Funding (invoice OCR study): the researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
Abstract: Recognition of human activity based on convolutional neural networks (CNNs) has attracted researchers in recent years due to significant improvements in accuracy. A large number of deep learning algorithms have been proposed for activity recognition. However, as technology advances on devices with limited computational resources, efficient deep learning approaches with improved resource utilization are needed. This paper presents a simple and efficient 2-dimensional CNN (2-D CNN) architecture with very small convolutional kernels for human activity recognition. Its merit over standard deep learning architectures is fewer trainable parameters and a lower memory requirement, which enables training on devices with limited GPU memory and works well with both smaller and larger datasets. The proposed approach consists of four main stages: (1) dataset creation and data augmentation, (2) design of the 2-D CNN architecture, (3) training the architecture from scratch to the optimum stage, and (4) evaluation of the trained architecture. To illustrate its effectiveness, extensive experiments are conducted on three publicly available datasets: IXMAS, YouTube, and UCF101. The results and comparison with other state-of-the-art methods demonstrate the usefulness of the proposed method.
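The parameter saving from small kernels follows directly from the standard formula for a 2-D convolution layer, (k·k·in_channels + 1)·out_channels; the channel counts below are illustrative, not the paper's architecture.

```python
def conv2d_params(in_ch, out_ch, k):
    """Trainable parameters of a 2-D conv layer: one k x k x in_ch kernel
    plus a bias per output channel."""
    return (k * k * in_ch + 1) * out_ch

# Same channel widths, different kernel sizes (illustrative):
small = conv2d_params(32, 64, 3)
large = conv2d_params(32, 64, 7)
print(small, large)  # 18496 100416
```

Shrinking the kernel from 7x7 to 3x3 cuts this layer's parameter count by more than 5x, which is the main source of the reduced memory footprint.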
Funding: Fundamental Research Funds for the Central Universities (Grant No. FRF-TP-19-006A3).
Abstract: As a common and high-risk disease, heart disease seriously threatens people's health. In the era of the Internet of Things (IoT), smart medical devices can assist medical workers and patients in disease diagnosis, so research on real-time diagnosis and classification algorithms for arrhythmia can improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an Encoder-Decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on classification results, and is trained and tested on the MIT-BIH arrhythmia database. In addition, Generative Adversarial Networks (GANs) are adopted to equalize the data and address the class imbalance problem. The simulation results show that, for inter-patient arrhythmia classification, the hybrid model combining the CNN and Encoder-Decoder achieves the best classification accuracy, reaching 94.05%. In particular, it has an advantage in classifying supraventricular ectopic beats (class S) and fusion beats (class F).
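The effect of data equalization on class counts can be illustrated with plain random oversampling; this is a deliberately simple stand-in for the paper's GAN, which generates synthetic minority-class beats rather than duplicating real ones. All beat labels and samples below are toy values.

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class matches
    the majority class count (a stand-in for GAN-generated data)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out = []
    for y, xs in by_class.items():
        xs = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out += [(x, y) for x in xs]
    return out

data = oversample(["b1", "b2", "b3", "s1"], ["N", "N", "N", "S"])
print(sum(1 for _, y in data if y == "S"))  # 3, matching the majority class
```

After equalization each class contributes equally to the loss, which is why rare classes such as S and F benefit most.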
Abstract: Recent advancements have established machine learning's utility in predicting nonlinear fluid dynamics, with predictive accuracy being a central motivation for employing neural networks. However, the pattern recognition central to how these networks function is equally valuable for enhancing our dynamical insight into complex fluid dynamics. In this paper, a single-layer convolutional neural network (CNN) was trained to recognize three qualitatively different subsonic buffet flows (periodic, quasi-periodic, and chaotic) over a high-incidence airfoil, and near-perfect accuracy was obtained with only a small training dataset. The convolutional kernels and corresponding feature maps, developed by the model with no temporal information provided, identified large-scale coherent structures in agreement with those known to be associated with buffet flows. Sensitivity to hyperparameters, including network architecture and convolutional kernel size, was also explored. The coherent structures identified by these models enhance our dynamical understanding of subsonic buffet over high-incidence airfoils across a wide range of Reynolds numbers.
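The interpretability claim rests on how a convolutional kernel produces a feature map: a "valid" 2-D cross-correlation whose response peaks where the flow field matches the kernel's pattern. A minimal sketch with a toy step field and a hypothetical edge-detecting kernel:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image and
    sum elementwise products, producing one feature-map value per position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

# A vertical-edge kernel applied to a step field (illustrative):
img = [[0, 0, 1, 1]] * 3
k = [[-1, 1]] * 3
print(conv2d_valid(img, k))  # [[0, 3, 0]]
```

Inspecting which regions of the input light up a given feature map is what lets the trained kernels be read back as coherent flow structures.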
Abstract: Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying from visual information, primarily lip movements. In this study, we created a custom dataset for Indian English and organized the work into three main parts: (1) audio recognition, (2) visual feature extraction, and (3) combined audio-visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed with a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was classified with a long short-term memory recurrent neural network. Finally, the two modalities were integrated using a deep convolutional network. Audio speech recognition on the Indian English dataset achieved training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. Visual speech recognition achieved a training accuracy of 77.48% and a test accuracy of 76.19% after 60 epochs. After integration, audiovisual speech recognition on the Indian English dataset reached training and testing accuracies of 94.67% and 91.75%, respectively.
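The gain from integration comes from combining per-class evidence across modalities. The paper fuses with a deep network; a weighted late-fusion of branch scores is a simpler stand-in that shows the mechanism, with the weight and scores being illustrative assumptions.

```python
def fuse(audio_probs, visual_probs, w_audio=0.7):
    """Weighted late fusion of per-class scores from the audio and visual
    branches; the audio weight reflects its higher standalone accuracy."""
    return [w_audio * a + (1 - w_audio) * v
            for a, v in zip(audio_probs, visual_probs)]

audio = [0.6, 0.3, 0.1]   # audio branch favors class 0
visual = [0.2, 0.7, 0.1]  # visual branch favors class 1
fused = fuse(audio, visual)
print(max(range(3), key=lambda i: fused[i]))  # 0
```

When the branches disagree, the stronger modality dominates, but the weaker one still shifts borderline decisions, which is how fused accuracy can exceed either branch alone.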
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2016YFA0302101) and the Initiative Program of the State Key Laboratory of Precision Measurement Technology and Instruments.
Abstract: Ionospheric delay is one of the main sources of noise affecting global navigation satellite systems, radio detection and ranging systems, and very-long-baseline interferometry. One of the most important and common methods to reduce this phase delay is to establish accurate nowcasting and forecasting models of ionospheric total electron content. Compared to mid-to-high latitudes, the active ionosphere at low latitudes leads to extreme differences between long-term prediction models and the actual state of the ionosphere. To address the low accuracy of long-term prediction models at low latitudes, this article presents a low-latitude, long-term ionospheric prediction model based on a multi-input-multi-output, long short-term memory (MIMO-LSTM) neural network. To verify the feasibility of the model, we first predicted the vertical total electron content 24 and 48 hours in advance for each day of July 2020 and compared the predictions corresponding to a given day across all days. Furthermore, for model correction, we selected historical data from June 2020 as the validation set, identified large offsets in results predicted during active periods, and used the ratio of the mean absolute error of the detected results to that of the predicted results as a correction coefficient to modify the MIMO-LSTM model. The average root mean square error of the 24-hour-advance predictions of the modified model was 4.4 TECU, better than the 5.1 TECU of the unmodified MIMO-LSTM model and the 5.9 TECU of the IRI-2016 model.
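The correction coefficient described above is a ratio of mean absolute errors computed on the validation month. A minimal sketch with hypothetical TEC values (how the coefficient is then applied to the forecasts is not detailed here):

```python
def mae(pred, truth):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

# Hypothetical validation-set TEC values: errors on detected (active-period)
# results versus errors on all predicted results.
detected_pred, detected_true = [30.0, 34.0], [26.0, 29.0]
overall_pred, overall_true = [20.0, 22.0, 30.0, 34.0], [19.0, 21.5, 26.0, 29.0]

coeff = mae(detected_pred, detected_true) / mae(overall_pred, overall_true)
print(round(coeff, 2))  # 1.71
```

A coefficient above 1 flags that active-period forecasts are systematically worse than average, which is the signal used to modify the model.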
Funding: Funded by NARI Group's Independent Project of China (Grant No. 524609230125) and the foundation of NARI-TECH Nanjing Control System Ltd. of China (Grant No. 0914202403120020).
Abstract: Time-series prediction has always been an important problem in machine learning. Among its applications, power load forecasting plays a crucial role in identifying the behavior of photovoltaic power plants and regulating their control strategies. Traditional power load forecasting often extracts features poorly from long time series. In this paper, a new deep learning framework, Residual Stacked Temporal Long Short-Term Memory (RST-LSTM), is proposed, which combines wavelet decomposition with a temporal convolutional memory network to solve the feature extraction problem for long sequences. The RST-LSTM framework consists of two parts: a stacked temporal convolutional memory unit module for global and local feature extraction, and a residual combination optimization module to reduce model redundancy. Experiments across various indicators demonstrate that RST-LSTM achieves significant improvements in both overall and local prediction accuracy compared with state-of-the-art baseline methods.
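The wavelet-decomposition front end splits a load curve into a smooth trend and a detail signal before the network sees it. A single level of the Haar wavelet, the simplest member of the family, can be sketched as follows; the specific wavelet the paper uses is not stated here, so Haar is an illustrative choice.

```python
import math

def haar_step(x):
    """One level of Haar wavelet decomposition: pairwise scaled averages
    (approximation / low-frequency part) and differences (detail part)."""
    approx = [(a + b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

load = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # toy load curve
approx, detail = haar_step(load)
print(len(approx), len(detail))  # 4 4
```

The transform is invertible ((approx + detail)/sqrt(2) recovers the even samples), so no information is lost: the network simply receives the trend and fluctuations as separate, easier-to-learn inputs.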
Funding: Funded by the Open Project Program of the Anhui Province Key Laboratory of Metallurgical Engineering and Resources Recycling (Anhui University of Technology) (No. SKF21-06) and the Research Fund for Young Teachers of Anhui University of Technology in 2020 (No. QZ202001).
Abstract: Real-time prediction and precise control of sinter quality are pivotal for energy saving, cost reduction, quality improvement, and efficiency enhancement in the ironmaking process. To advance the accuracy and comprehensiveness of sinter quality prediction, an intelligent flame-monitoring system for sintering machine tails was proposed, based on a hybrid neural network integrating a convolutional neural network with long short-term memory (CNN-LSTM). The system uses a high-temperature thermal imager for image acquisition at the sintering machine tail and employs a zone-triggered method to accurately capture dynamic feature images under challenging conditions of high temperature, heavy dust, and occlusion. The feature images are then segmented with a triple-iteration multi-thresholding approach based on the maximum between-class variance (Otsu) method to minimize detail loss during segmentation. Leveraging the strengths of CNN and LSTM networks in capturing spatial and temporal information, a comprehensive sinter quality prediction model was constructed, with inputs including the proportion of the combustion layer, porosity rate, temperature distribution, and image features obtained from the CNN, and outputs comprising quality indicators such as the underburning index, uniformity index, and FeO content of the sinter. Accuracy is notably increased, achieving a 95.8% hit rate within an error margin of ±1.0. After deployment, the average qualified rate of FeO content increased from 87.24% to 89.99%, an improvement of 2.75 percentage points. The average monthly solid fuel consumption fell from 49.75 to 46.44 kg/t, a 6.65% reduction, underscoring significant energy-saving and cost-reduction effects.
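The maximum between-class variance criterion underlying the segmentation step is the classic Otsu method: choose the gray-level threshold that maximizes w0·w1·(mu0 - mu1)^2 between the two resulting classes. A minimal single-threshold sketch (the paper iterates this three times with multiple thresholds) on a toy bimodal image:

```python
def otsu_threshold(pixels, levels=256):
    """Maximum between-class variance (Otsu) threshold over a gray-level
    histogram: pick the split maximizing w0 * w1 * (mu0 - mu1)^2."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(1, levels):
        w0 = sum(hist[:t])
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t)) / w0
        mu1 = sum(i * hist[i] for i in range(t, levels)) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bimodal toy image: dark background versus bright flame region
pixels = [10] * 50 + [12] * 50 + [200] * 30 + [210] * 20
t = otsu_threshold(pixels)
print(12 < t <= 200)  # True
```

Because the criterion is histogram-based, it adapts automatically as flame brightness drifts between frames, which matters under the dusty, occluded conditions described above.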
Funding: Funded by Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token-to-Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token-to-Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token-to-Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT's ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The other transformer-based models also capture complex features better than CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT's potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token-to-Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT's computational efficiency for deployment in real-time quality control systems.
Abstract: With breakthroughs in data processing and pattern recognition through deep learning, advanced algorithmic models for analyzing and interpreting soil spectral information provide an efficient and economical method for soil quality assessment. However, traditional single-output networks cannot fully utilize the correlations among soil elements: they are optimized for a single task and neglect the interrelationships among elements, which limits prediction accuracy and model generalizability. To overcome this limitation, this study implements a multi-task learning architecture with a progressive extraction network for the simultaneous prediction of multiple soil indicators, including nitrogen (N), organic carbon (OC), calcium carbonate (CaCO3), cation exchange capacity (CEC), and pH. Furthermore, incorporating the Pearson correlation coefficient, convolutional neural networks, long short-term memory networks, and attention mechanisms are combined to extract local abstract features from the original spectra, further improving the model. This architecture is referred to as the Relevance-Sharing Progressive Layered Extraction Network. The model employs an adaptive joint loss optimization method to update the weights of individual task losses during multi-task training.
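The adaptive joint loss combines the per-indicator losses with learnable weights. The paper's exact scheme is not spelled out here, so the sketch below uses homoscedastic-uncertainty weighting (scale each loss by exp(-s) and add s as a regularizer) purely as an illustrative stand-in; the loss values are toy numbers.

```python
import math

def joint_loss(task_losses, log_vars):
    """Adaptive multi-task loss: each task loss is scaled by a learned
    precision exp(-s) plus a regularizer s (homoscedastic-uncertainty
    weighting, used here as a stand-in for the paper's scheme)."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

losses = [0.8, 0.3, 1.2, 0.5, 0.4]  # N, OC, CaCO3, CEC, pH (illustrative)
log_vars = [0.0] * 5                # initial state: equal weighting
print(round(joint_loss(losses, log_vars), 2))  # 3.2
```

As training proceeds, noisy or hard tasks drive their s upward, shrinking their effective weight so that no single indicator dominates the shared representation.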
Funding: Supported by the Natural Science Foundation of Shaanxi Province (No. 2024JC-YBQN-0458), the National Natural Science Foundation of China (No. 52174204), the Innovation Capacity Improvement Project for Small and Medium-Sized Scientific and Technological Enterprises of Shandong Province (No. 2023TSGC0952), and the Luzhou Science and Technology Planning Project of China (No. 2024JYJ057).
Abstract: The combustion characteristic parameters of mining conveyor belts are a crucial index for measuring the fire performance and hazard of combustible materials, and accurately predicting them provides important guidance for preventing conveyor belt fires. The critical parameters of a flame-retardant polyvinyl chloride conveyor belt were measured under different radiative heat fluxes, including the mass loss rate, heat release rate, effective heat of combustion, and production rates of CO and CO2. A prediction method for the combustion characteristics of conveyor belts was proposed by combining a convolutional neural network with long short-term memory. Results indicated that the peak values of the mass loss, heat release, smoke production, and CO and CO2 production rates were positively correlated with radiative heat flux, while the time required to reach the peak was negatively correlated with it; the peak of the effective heat of combustion occurred earlier. Through deep learning modelling, the mean absolute error, root mean square error, and coefficient of determination were 2.09, 3.45, and 0.993, respectively. Compared with the convolutional neural network, long short-term memory, and multilayer perceptron baselines, the mean absolute error decreased by 26.92%, 24.82%, and 25.09%, the root mean square error declined by 27.82%, 29.59%, and 29.59%, and the coefficient of determination increased by 0.005, 0.006, and 0.006, respectively. The findings provide a quantitative reference benchmark for the development of conveyor belt fires and new technical support for building early-warning systems for conveyor belt fires in coal mines.
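The three indicators reported above have standard definitions, sketched below on toy prediction data (the values are not from the paper):

```python
import math

def regression_metrics(pred, truth):
    """MAE, RMSE and coefficient of determination (R^2), the three
    indicators reported for the combustion prediction model."""
    n = len(pred)
    mae = sum(abs(p - t) for p, t in zip(pred, truth)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / n)
    mean_t = sum(truth) / n
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, truth))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return mae, rmse, 1 - ss_res / ss_tot

pred = [1.0, 2.0, 3.0, 4.0]
truth = [1.1, 1.9, 3.2, 3.8]
mae, rmse, r2 = regression_metrics(pred, truth)
print(round(mae, 2))  # 0.15
```

RMSE penalizes large excursions more than MAE, so reporting both (plus R^2 for explained variance) gives a fuller picture of how the model tracks the combustion peaks.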
Abstract: Hand gestures are a natural way for human-robot interaction, and vision-based dynamic hand gesture recognition has become a hot research topic due to its many applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video input while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical-flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction; the ConvNets for all groups share parameters. To learn long-term features, the outputs of all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, the Jester and Nvidia datasets, and produced very competitive results compared with other models. Its robustness has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
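The frame-sampling step described above can be sketched directly; the group count of 8 is an illustrative assumption, not necessarily the paper's setting.

```python
import random

def sample_frames(num_frames, num_groups=8, seed=0):
    """Split frame indices into a fixed number of contiguous groups and
    randomly pick one frame per group (the short-term sampling scheme)."""
    rng = random.Random(seed)
    bounds = [round(i * num_frames / num_groups) for i in range(num_groups + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(num_groups)]

picks = sample_frames(120)
print(len(picks), all(picks[i] < picks[i + 1] for i in range(7)))  # 8 True
```

Every clip yields the same number of frames regardless of its length, so the downstream shared-parameter ConvNets and the LSTM always see a fixed-size sequence, and the random pick within each group acts as temporal data augmentation.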
Abstract: Stocks that are fundamentally connected with each other tend to move together, and considering such common trends is believed to benefit stock movement forecasting. However, these signals are not trivial to model because the connections among stocks are not physically present and must be estimated from volatile data. Motivated by this observation, we propose a framework that incorporates the interconnections of firms to forecast stock prices. To effectively utilize a large set of fundamental features, we design a novel pipeline. First, we use a variational autoencoder (VAE) to reduce the dimension of stock fundamental information and then cluster stocks into a graph structure (fundamental clustering). Second, a hybrid model of a graph convolutional network and a long short-term memory network (GCN-LSTM), with an adjacency matrix learnt from the VAE, is proposed for graph-structured stock market forecasting. Experiments on minute-level U.S. stock market data demonstrate that the model effectively captures both spatial and temporal signals and achieves a superior improvement over baseline methods. The proposed model is promising for other applications in which a possible but hidden spatial dependency can improve time-series prediction.
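The step from fundamental clusters to a graph can be sketched as follows: stocks sharing a cluster label (obtained from the VAE embedding in the paper; the labels below are toy values) are connected in the adjacency matrix consumed by the GCN-LSTM.

```python
def cluster_adjacency(labels):
    """Build a symmetric adjacency matrix connecting stocks that fall in
    the same fundamental cluster (self-loops included)."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical cluster labels from the VAE embedding:
A = cluster_adjacency([0, 0, 1, 1, 0])
print(A[0][4], A[0][2])  # 1 0
```

Because membership in the same cluster is a symmetric relation, the resulting matrix is symmetric by construction, which is what an undirected GCN expects.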
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51627811 and 51725702) and the Science and Technology Project of State Grid Corporation of Beijing (Grant No. SGBJDK00DWJS2100164).
Abstract: Owing to the expansion of grid interconnection, the spatiotemporal distribution characteristics of the frequency response of power systems after disturbances have become increasingly important, as they can effectively support coordinated security control. However, traditional model-based frequency prediction methods cannot satisfactorily meet the requirements of online applications, owing to long calculation times and the need for accurate power system models. Therefore, this study presents a rolling frequency prediction model, named STGCN-LSTM, based on a graph convolutional network (GCN) and a long short-term memory (LSTM) spatiotemporal network. In the proposed method, measurement data from phasor measurement units after a disturbance are used to construct the spatiotemporal input. An improved GCN embedded with topology information extracts the spatial features, while the LSTM network extracts the temporal features. The spatiotemporal network regression model is trained, and asynchronous frequency-sequence prediction is realized through rolling updates of measurement information. The proposed model achieves accurate frequency prediction by considering the spatiotemporal distribution characteristics of the frequency response. Its noise immunity and robustness are verified on the IEEE 39-bus and IEEE 118-bus systems.
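Embedding topology information in a GCN typically means propagating features with the symmetrically normalized adjacency D^{-1/2}(A + I)D^{-1/2} of the bus network. A minimal sketch on a hypothetical 3-bus topology (the paper's "improved" GCN may differ in details):

```python
import math

def gcn_normalize(A):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} of a bus
    adjacency matrix, the standard propagation operator in a GCN layer."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

# 3-bus toy topology: bus 0 linked to buses 1 and 2, buses 1-2 not linked
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
S = gcn_normalize(A)
print(round(S[0][1], 3))  # 0.408
```

Multiplying the PMU feature matrix by this operator mixes each bus's measurements with its electrical neighbors', which is how the spatial structure of the frequency response enters the prediction.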
Funding: Supported by the National Key Research and Development Program of China (2018YFB2202602), the State Key Program of the National Natural Science Foundation of China (No. 61934005), the National Natural Science Foundation of China (No. 62074001), and the Joint Funds of the National Natural Science Foundation of China under Grant U19A2074.
Abstract: Artificial intelligence (AI) processes data-centric applications with minimal effort, but it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications because computation and storage are separated. The emergence of computing in memory (CIM) is significant in circumventing the von Neumann bottleneck. Static random-access memory (SRAM), a commercialized memory architecture, is fast and robust, consumes little power, and is compatible with state-of-the-art technology. This study surveys the research progress of SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.