As the use of facial attributes continues to expand, research into facial age estimation is also developing. Because face images are easily affected by factors including illumination and occlusion, estimating age from faces is a challenging task. In view of the complexity of the environment and the limited computing ability of devices, this paper proposes a face age estimation algorithm based on a lightweight convolutional neural network. Building on the Soft Stagewise Regression Network (SSR-Net) and facial images, this paper employs the Center Symmetric Local Binary Pattern (CSLBP) method to obtain a feature image and then combines the face image and the feature image as the network input. Adding feature images to the convolutional neural network can improve accuracy as well as increase the robustness of the network model. The experimental results on the IMDB-WIKI and MORPH 2 datasets show that the proposed lightweight convolutional neural network method reduces model complexity and increases the accuracy of face age estimation.
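The CSLBP step mentioned in the abstract above can be illustrated with a minimal sketch. It follows the common textbook definition of center-symmetric LBP (compare the four opposite neighbour pairs around each pixel and pack the results into a 4-bit code). The function name `cslbp`, the `threshold` parameter, and the choice to drop border pixels are assumptions for illustration; the abstract does not say how the authors compute or combine the feature image.

```python
def cslbp(image, threshold=0):
    """Return the CS-LBP code (0..15) for each interior pixel of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are dropped,
    which is an illustrative choice, not something stated in the abstract.
    """
    # The 8 neighbours in circular order; entry i and entry i + 4 are opposite.
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1),
               (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = offsets[i]
                dr2, dc2 = offsets[i + 4]
                diff = image[r + dr1][c + dc1] - image[r + dr2][c + dc2]
                if diff > threshold:
                    code |= 1 << i  # set bit i when the pair differs by more than the threshold
            out[r - 1][c - 1] = code
    return out


# A left-to-right intensity ramp: only the pairs pointing "rightwards" fire.
ramp = [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]
feature = cslbp(ramp)
```

Because only four comparisons are made per pixel, the code takes 16 possible values instead of the 256 of plain LBP, which keeps the feature image compact.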
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
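To make the parameter saving of depthwise separable convolution concrete, the arithmetic can be sketched as follows. This is generic bookkeeping for a single layer without bias terms, not a reconstruction of the DDSC layer or of the 65.02% figure reported above. The function names and the example layer sizes (3×3 kernel, 64 input channels, 128 output channels, dilation 2) are assumptions chosen for illustration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel; dilation adds no weights."""
    return kernel + (kernel - 1) * (dilation - 1)


std = standard_conv_params(3, 64, 128)          # 73728
sep = depthwise_separable_params(3, 64, 128)    # 8768
span = effective_kernel(3, 2)                   # 5
print(f"standard={std}, separable={sep}, ratio={sep / std:.3f}, span={span}")
```

For this example layer the separable form needs roughly 12% of the weights, and a dilation of 2 widens the 3×3 filter to a 5×5 span at no extra parameter cost, which is the effect the abstract describes as expanding the field of view.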
Automated recognition of insect categories, a task currently performed mainly by agriculture experts, is a challenging problem that has received increasing attention in recent years. The goal of the present research is to develop an intelligent recognition system based on deep neural networks that recognizes garden insects and can be conveniently deployed on mobile terminals. State-of-the-art lightweight convolutional neural networks (such as SqueezeNet and ShuffleNet) reach the same accuracy as classical convolutional neural networks such as AlexNet with fewer parameters, thereby not only requiring less communication across servers during distributed training but also being more feasible to deploy on mobile terminals and other hardware with limited memory. In this research, we combine the rich details of the low-level network features with the rich semantic information of the high-level network features to construct feature maps with richer semantic information, which effectively improves the SqueezeNet model at a small computational cost. In addition, we developed offline insect recognition software that can be deployed on a mobile terminal to address the lack of network access and the time-delay problems in the field. Experiments demonstrate that the proposed method is promising for recognition while remaining within a limited computational budget, and that it delivers a much higher recognition accuracy of 91.64% with less training time relative to other classical convolutional neural networks. We have also verified that the improved SqueezeNet model performs 2.3% better than the original model on the open insect dataset IP102.
In viticulture, there is an increasing demand for automatic winter grapevine pruning devices, for which detection of the pruning location in vineyard images is a necessary task that can be automated with computer vision methods. In this study, a novel 2D grapevine winter pruning location detection method was proposed for automatic winter pruning with a Y-shaped cultivation system. The method can be divided into the following four steps. First, the vineyard image was segmented by thresholding the two times Red minus Green minus Blue (2R−G−B) channel and the S channel; second, the grapevine skeleton was extracted with the Improved Enhanced Parallel Thinning Algorithm (IEPTA); third, the structure of each grapevine was found by judging the angle and distance relationships between branches; fourth, bounding boxes were obtained from these grapevines, and a pre-trained MobileNetV3_small×0.75 was used to classify each bounding box and finally find the pruning location. In the detection experiment, the method achieved a precision of 98.8% and a recall of 92.3% for bud detection, an accuracy of 83.4% for pruning location detection, and a total time of 0.423 s. The results therefore indicate that the proposed 2D pruning location detection method has decent robustness as well as high precision and could guide automatic devices to winter prune efficiently.
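The first step above (thresholding the 2R−G−B channel and the S channel) can be sketched in a few lines. The abstract does not give the threshold values or say how the two channel masks are combined, so the thresholds `t_2rgb=60` and `t_s=0.3` and the logical AND below are purely hypothetical placeholders; the function names are likewise illustrative.

```python
import colorsys


def two_r_minus_g_minus_b(pixel):
    """2R - G - B for one (R, G, B) pixel with 0..255 components."""
    r, g, b = pixel
    return 2 * r - g - b


def saturation(pixel):
    """S channel of HSV for one (R, G, B) pixel, in the range 0..1."""
    r, g, b = (v / 255.0 for v in pixel)
    return colorsys.rgb_to_hsv(r, g, b)[1]


def segment(image, t_2rgb=60, t_s=0.3):
    """Boolean mask: True where both channels exceed their thresholds.

    Both thresholds and the AND combination are assumptions for illustration,
    not values taken from the paper.
    """
    return [[two_r_minus_g_minus_b(p) > t_2rgb and saturation(p) > t_s
             for p in row]
            for row in image]


# One reddish pixel and one neutral gray pixel.
tiny = [[(200, 50, 50), (100, 100, 100)]]
mask = segment(tiny)
```

Gray pixels score zero on both channels, so any positive thresholds reject them, while strongly red pixels pass; real vineyard images would need thresholds tuned on data.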
In the field of agricultural information, the identification and prediction of rice leaf disease have always been a focus of research, and deep learning (DL) technology is currently a hot research topic in the field of pattern recognition. Developing high-efficiency, high-quality, and low-cost automatic identification methods for rice diseases that can replace human inspection is an important technical means of dealing with the current situation. This paper focuses on the problem of the huge number of parameters in Convolutional Neural Network (CNN) models and proposes a recognition model that combines a multi-scale convolution module with a neural network model based on Visual Geometry Group (VGG). The accuracy and loss on the training set and the test set are used to evaluate the performance of the model. The test accuracy of this model is 97.1%, an increase of 5.87% over VGG. Furthermore, the memory requirement is 26.1M, only 1.6% of that of VGG. Experimental results show that this model performs better in terms of accuracy, recognition speed, and memory size.
To address low detection accuracy in near-coastal vessel target detection under complex conditions, a novel near-coastal vessel detection model based on an improved YOLOv7 architecture is proposed in this paper. The Coordinate Attention mechanism is used to improve the channel attention weights and enhance the network’s ability to extract small target features. In the enhanced feature extraction network, the lightweight convolution algorithm Grouped Spatial Convolution is used to replace MPConv to reduce the computational cost of the model. EIoU Loss is used to replace the bounding box regression loss function in YOLOv7 to reduce the probability of missed and false detections. The performance of the improved model was verified using an enhanced dataset obtained through rainy and foggy weather simulation. Experiments were conducted on the datasets before and after the enhancement. The improved model achieved a mean average precision (mAP) of 97.45% on the original dataset, and the number of parameters was reduced by 2%. On the enhanced dataset, the mAP of the improved model reached 88.08%. Compared with seven target detection models, including Faster R-CNN, YOLOv3, YOLOv4, YOLOv5, YOLOv7, YOLOv8-n, and YOLOv8-s, the improved model can effectively reduce the missed and false detection rates and improve target detection accuracy. The improved model not only accurately detects vessels in complex weather environments but also outperforms other methods on the original and enhanced SeaShip datasets. This finding shows that the improved model can achieve near-coastal vessel target detection in multiple environments, laying the foundation for vessel path planning and automatic obstacle avoidance.
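For readers unfamiliar with EIoU Loss, the following is a minimal sketch of the commonly published form of the loss: one minus IoU, plus a normalized center-distance term, plus separate normalized width and height difference terms. It is not taken from the paper's code. The function name `eiou_loss`, the corner-format boxes `(x1, y1, x2, y2)`, and the assumption that both boxes have positive area are illustrative choices.

```python
def eiou_loss(pred, target):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / union

    # Smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)

    # Width and height differences, each normalized by the enclosing box.
    width_term = (pw - tw) ** 2 / (cw * cw)
    height_term = (ph - th) ** 2 / (ch * ch)

    return 1.0 - iou + center_term + width_term + height_term


loss = eiou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

Identical boxes give a loss of zero, and the separate width and height terms keep a useful gradient even when the aspect ratios of the two boxes already match.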
To address the issues of slow diagnostic speed, low accuracy, and poor generalization performance in traditional rolling bearing fault diagnosis methods, we propose a rolling bearing fault diagnosis method based on Markov Transition Field (MTF) image encoding combined with a lightweight convolutional neural network that integrates a Convolutional Block Attention Module (CBAM-LCNN). Specifically, we first use the Markov Transition Field to convert the original one-dimensional vibration signals of rolling bearings into two-dimensional images. Then, we construct a lightweight convolutional neural network incorporating the convolutional attention module (CBAM-LCNN). Finally, the two-dimensional images obtained from MTF mapping are fed into the CBAM-LCNN network for image feature extraction and fault diagnosis. We validate the effectiveness of the proposed method on the bearing fault datasets from Guangdong University of Petrochemical Technology’s multi-stage centrifugal fan and Case Western Reserve University. Experimental results show that, compared to other advanced baseline methods, the proposed rolling bearing fault diagnosis method offers faster diagnostic speed and higher diagnostic accuracy. In addition, we conducted experiments on the Xi’an Jiaotong University rolling bearing dataset, achieving excellent results in bearing fault diagnosis. These results validate the strong generalization performance of the proposed method. The method presented in this paper not only effectively diagnoses faults in rolling bearings but also serves as a reference for fault diagnosis in other equipment.
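The MTF encoding step described above (turning a one-dimensional signal into a two-dimensional image) can be sketched as follows. This is the standard construction: quantize the series into quantile bins, estimate a first-order transition matrix between bins, then fill an n×n image with the transition probability between the bins of each pair of time steps. The function name, the default `n_bins=4`, and the simple edge selection are assumptions; the abstract does not state the authors' bin count or image size.

```python
from bisect import bisect_right


def markov_transition_field(series, n_bins=4):
    """Return the n x n Markov Transition Field of a 1D series as nested lists."""
    n = len(series)
    ordered = sorted(series)
    # Quantile bin edges taken from the sorted samples (a simple illustrative choice).
    edges = [ordered[n * k // n_bins] for k in range(1, n_bins)]
    bins = [bisect_right(edges, x) for x in series]

    # Count transitions between the bins of consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalize to transition probabilities; rows with no transitions stay at zero.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # Pixel (k, l) holds the probability of moving from the bin of x_k to the bin of x_l.
    return [[probs[bins[k]][bins[l]] for l in range(n)] for k in range(n)]


field = markov_transition_field([1, 2, 3, 4], n_bins=2)
```

The resulting image has one row and one column per sample, so long vibration records are usually windowed or downsampled before encoding to keep the CNN input small.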
As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial part of computer vision tasks. However, most existing methods use lightweight convolution to reduce the computational effort, resulting in lower accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder part, the TBA module is designed to extract details and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder part, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and can reach 90.3 Frames Per Second (FPS).
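The mIoU metric quoted above can be computed as sketched below: per-class IoU is the number of pixels where prediction and label agree on the class, divided by the number of pixels where either assigns it, averaged over classes. The function name `mean_iou` and the choice to skip classes absent from both prediction and label are assumptions; benchmark toolkits such as the official Cityscapes evaluation differ in details like ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat sequences of class indices."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:  # skip classes that appear in neither prediction nor label
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Averaging per class rather than per pixel means that small classes count as much as large ones, which is why mIoU is typically lower than pixel accuracy on the same predictions.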
The field of finance relies heavily on cybersecurity to safeguard its systems and clients from harmful software. Identifying malicious code within financial software is vital for protecting both the financial system and individual clients. Nevertheless, current detection models are limited in their ability to identify malicious code and its variants, while also carrying a large number of parameters. To overcome these obstacles, we introduce a lightweight model for classifying malicious code families, based on Ghost-DenseNet-SE. This model integrates the Ghost module, DenseNet, and the squeeze-and-excitation (SE) channel domain attention mechanism. It replaces the standard convolutional layer in DenseNet with the Ghost module, thereby reducing the model’s size and increasing recognition speed. Additionally, the channel domain attention mechanism assigns distinct weights to feature channels, facilitating the extraction of key characteristics of malicious code and improving detection precision. Experimental results on the Malimg dataset indicate that the model attained an accuracy of 99.14% in identifying malicious code families, surpassing AlexNet (97.8%) and the Visual Geometry Group network (VGGNet) (96.16%). The proposed model has fewer parameters, which lowers model complexity while improving classification accuracy, making it a valuable tool for categorizing malicious code.
With the rapid advancement of virtual reality, dynamic gesture recognition has become an indispensable technique for human–computer interaction in virtual environments. Recognizing dynamic gestures is challenging because of their high degree of freedom, individual differences, and changes in gesture space. To solve the problem of low recognition accuracy in existing networks, an improved dynamic gesture recognition algorithm based on the ResNeXt architecture is proposed. The algorithm employs three-dimensional convolution to capture the spatiotemporal features intrinsic to dynamic gestures. Additionally, to enhance the model’s focus and improve its accuracy in identifying dynamic gestures, a lightweight convolutional attention mechanism is introduced. This mechanism not only improves the model’s precision but also facilitates faster convergence during training. To further optimize the performance of the model, a deep attention submodule is added to the convolutional attention mechanism module to strengthen the network’s capability in temporal feature extraction. Empirical evaluations on the EgoGesture and NvGesture datasets show that the accuracy of the proposed model in dynamic gesture recognition reaches 95.03% and 86.21%, respectively. When operating in RGB mode, the accuracy reached 93.49% and 80.22%, respectively. These results underscore the effectiveness of the proposed algorithm in recognizing dynamic gestures with high accuracy, showcasing its potential for applications in advanced human–computer interaction systems.
Speeding is one of the primary contributors to rural road crashes. Self-explaining theory offers a way to reduce speeding: it suggests that well-designed facility environments (i.e., road facilities and surrounding landscapes) can automatically guide drivers to choose appropriate speeds on different road categories. This study proposes an improved lightweight convolutional neural network (LW-CNN) that includes drivers’ visual perception characteristics (i.e., depth perception and dynamic vision) to conduct a self-explaining analysis of the facility environment on 2-lane rural roads. Data for this study are gathered through naturalistic driving experiments on 2-lane rural roads across five Chinese provinces. A total of 3502 visual facility environment images, alongside their corresponding operation speeds and speed limits, are collected. The improved LW-CNN exhibits high accuracy and efficiency in predicting operation speeds from these visual facility environment images, achieving a training loss of 0.05% and a validation loss of 0.15%. The semantics of facility environments affecting operation speeds are further identified by combining this LW-CNN with the gradient-weighted class activation mapping (Grad-CAM) algorithm and a semantic segmentation network. Then, six typical 2-lane rural road categories perceived by drivers, with different operation speeds and speeding probability (SP), are summarized using k-means clustering. An objective and comprehensive analysis of each category’s semantic composition and depth features is conducted to evaluate their influence on drivers’ speeding probability and road category perception. The findings of this study can be directly used to optimize facility environments from the perspective of drivers’ visual perception to decrease speeding-related crashes.
Automated analysis of sports video summarization is challenging due to variations in cameras, replay speed, illumination conditions, editing effects, game structure, genre, etc. To address these challenges, we propose an effective video summarization framework based on shot classification and replay detection for field sports videos. Accurate shot classification is mandatory to better structure the input video for further processing, i.e., key events or replay detection. Therefore, we present a lightweight convolutional neural network based method for shot classification. Then we analyze each shot for replay detection and specifically detect the successive batch of logo transition frames that identify the replay segments from the sports videos. For this purpose, we propose local octa-pattern features to represent video frames and train the extreme learning machine for classification as replay or non-replay frames. The proposed framework is robust to variations in cameras, replay speed, shot speed, illumination conditions, game structure, sports genre, broadcasters, logo designs and placement, frame transitions, and editing effects. The performance of our framework is evaluated on a dataset containing diverse YouTube sports videos of soccer, baseball, and cricket. Experimental results demonstrate that the proposed framework can reliably be used for shot classification and replay detection to summarize field sports videos.
Purpose: The abnormal behaviors of staff at petroleum stations pose significant safety hazards. Addressing the challenges of high parameter counts, lengthy training periods, and low recognition rates in existing 3D ResNet behavior recognition models, this paper proposes GTB-ResNet, a network designed to detect abnormal behaviors in petroleum station staff.
Design/methodology/approach: Firstly, to mitigate the issues of excessive parameters and computational complexity in 3D ResNet, a lightweight residual convolution module called the Ghost residual module (GhostNet) is introduced in the feature extraction network. Ghost convolution replaces standard convolution, reducing model parameters while preserving multi-scale feature extraction capabilities. Secondly, to enhance the model’s focus on salient features amidst wide surveillance ranges and small target objects, the triplet attention mechanism module is integrated to facilitate spatial and channel information interaction. Lastly, to address the challenge of short time-series features leading to misjudgments between similar actions, a bidirectional gated recurrent network is added to the feature extraction backbone network. This ensures the extraction of key long time-series features, thereby improving feature extraction accuracy.
Findings: The experimental setup encompasses four behavior types: illegal phone answering, smoking, falling (abnormal), and touching the face (normal), comprising a total of 892 videos. Experimental results show GTB-ResNet achieving a recognition accuracy of 96.7% with a model parameter count of 4.46 M and a computational complexity of 3.898 G. This represents a 4.4% improvement over 3D ResNet, with reductions of 90.4% in parameters and 61.5% in computational complexity.
Originality/value: Specifically designed for edge devices in oil stations, the 3D ResNet network is tailored for real-time action prediction. To address the challenges posed by the large number of parameters in 3D ResNet networks and the difficulties of deployment on edge devices, a lightweight residual module based on ghost convolution is developed. Additionally, to tackle the low detection accuracy of behaviors in the noisy environment of petroleum stations, a triplet attention mechanism is introduced during feature extraction to enhance focus on salient features. Moreover, to overcome the potential for misjudgments arising from the similarity of actions, a Bi-GRU model is introduced to enhance the extraction of key long-term features.
Funding: This work was funded by the Foundation of the Liaoning Educational Committee under Grant No. 2019LNJC03.
Funding: National Natural Science Foundation of China (Grant No. 61601034); National Natural Science of China (Grant No. 31871525); Promotion and Innovation of Beijing Academy of Agriculture and Forestry Sciences.
Funding: This work was financially supported by the Basic Public Welfare Research Project of Zhejiang Province (Grant No. LGN20E050007).
Funding: Supported by the National Key Research and Development Program sub-topics [2018YFF0213606-03 (Mu Y., Hu T.L., Gong H., Li S.J. and Sun Y.H.) http://www.most.gov.cn]; the Jilin Province Science and Technology Development Plan, key research and development projects [20200402006NC (Mu Y., Hu T.L., Gong H. and Li S.J.) http://kjt.jl.gov.cn]; the Science and Technology Support Project for Key Industries in Southern Xinjiang [2018DB001 (Gong H. and Li S.J.) http://kjj.xjbt.gov.cn]; and the Key Technology R&D Project of the Changchun Science and Technology Bureau of Jilin Province [21ZGN29 (Mu Y., Bao H.P., Wang X.B.) http://kjj.changchun.gov.cn].
Funding: Supported by the National Natural Science Foundation of China (52001340), the Henan Province Science and Technology Key Research Project (242102110332), and the Henan Province Teaching Reform Project (2022SYJXLX087).
Abstract: To address the slow diagnostic speed, low accuracy, and poor generalization of traditional rolling bearing fault diagnosis methods, we propose a rolling bearing fault diagnosis method based on Markov Transition Field (MTF) image encoding combined with a lightweight convolutional neural network that integrates a Convolutional Block Attention Module (CBAM-LCNN). Specifically, we first use the Markov Transition Field to convert the original one-dimensional vibration signals of rolling bearings into two-dimensional images. Then, we construct the lightweight convolutional neural network incorporating the convolutional attention module (CBAM-LCNN). Finally, the two-dimensional images obtained from MTF mapping are fed into CBAM-LCNN for image feature extraction and fault diagnosis. We validate the effectiveness of the proposed method on the bearing fault datasets from Guangdong University of Petrochemical Technology’s multi-stage centrifugal fan and from Case Western Reserve University. Experimental results show that, compared with other advanced baseline methods, the proposed method offers faster diagnostic speed and higher diagnostic accuracy. In addition, we conducted experiments on the Xi’an Jiaotong University rolling bearing dataset and achieved excellent results in bearing fault diagnosis, which validates the strong generalization performance of the proposed method. The method presented in this paper not only effectively diagnoses faults in rolling bearings but also serves as a reference for fault diagnosis in other equipment.
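The MTF encoding step described above (one-dimensional signal to two-dimensional image) can be sketched as follows. This is a minimal illustration of the standard Markov Transition Field construction and is not taken from the paper. The function name `markov_transition_field`, the quantile binning, and the default `n_bins=4` are assumptions.

```python
def markov_transition_field(signal, n_bins=4):
    """Encode a 1-D signal as an n x n Markov Transition Field.

    Illustrative sketch: quantile binning and n_bins are assumptions.
    """
    n = len(signal)

    # Quantile bin edges taken from the sorted samples.
    ordered = sorted(signal)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

    # Bin index of each sample = number of edges it reaches or exceeds.
    bins = [sum(x >= e for e in edges) for x in signal]

    # Count first-order transitions between bins of consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalize the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])

    # Pixel (i, j) holds the transition probability from the bin of
    # sample i to the bin of sample j, so temporal order is preserved.
    return [[trans[bins[i]][bins[j]] for j in range(n)] for i in range(n)]
```

For a signal of length n the result is an n x n image with values in [0, 1]. In practice long vibration signals are usually segmented or downsampled first so that the image size suits the network input.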
Abstract: As the field of autonomous driving evolves, real-time semantic segmentation has become a crucial computer vision task. However, most existing methods use lightweight convolution to reduce computational effort, which lowers accuracy. To address this problem, we construct TBANet, a network with an encoder-decoder structure for efficient feature extraction. In the encoder, the TBA module is designed to extract details, and the ETBA module is used to learn semantic representations in a high-dimensional space. In the decoder, we design a combination of multiple upsampling methods to aggregate features with less computational overhead. We validate the efficiency of TBANet on the Cityscapes dataset. It achieves 75.1% mean Intersection over Union (mIoU) with only 2.07 million parameters and reaches 90.3 Frames Per Second (FPS).
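The mIoU metric reported above is conventionally computed from a class confusion matrix. The sketch below shows that computation. The function name `mean_iou` and the row = ground truth, column = prediction layout are assumptions, and it is not the evaluation code used in the paper.

```python
def mean_iou(conf):
    """Mean Intersection over Union from a square confusion matrix.

    conf[r][c] = number of pixels with true class r predicted as class c
    (layout assumed for illustration).
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]                                # correctly labeled pixels
        fp = sum(conf[r][c] for r in range(n)) - tp    # predicted c, but not c
        fn = sum(conf[c]) - tp                         # true c, but missed
        denom = tp + fp + fn
        if denom:                                      # skip absent classes
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([[3, 1], [1, 3]])` gives an IoU of 3/5 for each class, so the mean is 0.6.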
Funding: Funded by the National Natural Science Foundation of China (Grant No. 61905201).
Abstract: The finance sector relies heavily on cybersecurity to safeguard its systems and clients from harmful software. Identifying malicious code within financial software is vital for protecting both the financial system and individual clients. However, current detection models are limited in their ability to identify malicious code and its variants, and they carry a large number of parameters. To overcome these obstacles, we introduce a lightweight model for classifying malicious code families, built on Ghost-DenseNet-SE. This model integrates the Ghost module, DenseNet, and the squeeze-and-excitation (SE) channel-domain attention mechanism. It replaces the standard convolutional layer in DenseNet with the Ghost module, thereby reducing the model’s size and increasing recognition speed. Additionally, the channel-domain attention mechanism assigns distinct weights to feature channels, which helps extract the key characteristics of malicious code and improves detection precision. Experimental results on the Malimg dataset indicate that the model attained an accuracy of 99.14% in identifying malicious code families, surpassing AlexNet (97.8%) and the Visual Geometry Group network (VGGNet) (96.16%). The proposed model has fewer parameters, which lowers model complexity while improving classification accuracy, making it a valuable tool for categorizing malicious code.
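The SE channel attention described above, which assigns a weight to each feature channel, can be sketched in plain Python as follows. This shows the generic squeeze-and-excitation computation with biases omitted. The function name `se_block`, the nested-list tensor layout, and the weight matrices `w_reduce` and `w_expand` are hypothetical and do not come from the paper.

```python
import math


def se_block(feature_maps, w_reduce, w_expand):
    """Squeeze-and-excitation over C channels stored as nested lists.

    feature_maps: C channels, each an H x W list of lists.
    w_reduce:     (C/r) x C weights of the bottleneck layer (assumed).
    w_expand:     C x (C/r) weights of the expansion layer (assumed).
    Returns (rescaled feature maps, per-channel gates).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]

    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w_expand]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in ch]
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates
```

Each gate lies in (0, 1), so informative channels are passed through almost unchanged while less useful ones are attenuated.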
Abstract: With the rapid advancement of virtual reality, dynamic gesture recognition has become an indispensable technique for human–computer interaction in virtual environments. Recognizing dynamic gestures is challenging because of their high degree of freedom, individual differences, and changes in gesture space. To address the low recognition accuracy of existing networks, an improved dynamic gesture recognition algorithm based on the ResNeXt architecture is proposed. The algorithm employs three-dimensional convolution to capture the spatiotemporal features intrinsic to dynamic gestures. Additionally, to sharpen the model’s focus and improve its accuracy in identifying dynamic gestures, a lightweight convolutional attention mechanism is introduced. This mechanism not only improves the model’s precision but also speeds up convergence during training. To further optimize the performance of the model, a deep attention submodule is added to the convolutional attention mechanism module to strengthen the network’s capability in temporal feature extraction. Empirical evaluations on the EgoGesture and NvGesture datasets show that the accuracy of the proposed model in dynamic gesture recognition reaches 95.03% and 86.21%, respectively. When operating in RGB mode, the accuracy reaches 93.49% and 80.22%, respectively. These results underscore the effectiveness of the proposed algorithm in recognizing dynamic gestures with high accuracy and its potential for advanced human–computer interaction systems.
Funding: Supported by the National Natural Science Foundation of China (No. 52102416), the Natural Science Foundation of Shanghai (No. 22ZR1466000), and the Fundamental Research Funds for the Central Universities of China (No. 22120240159).
Abstract: Speeding is one of the primary contributors to rural road crashes. Self-explaining theory offers a way to reduce speeding: it suggests that well-designed facility environments (i.e., road facilities and surrounding landscapes) can automatically guide drivers to choose appropriate speeds on different road categories. This study proposes an improved lightweight convolutional neural network (LW-CNN) that incorporates drivers’ visual perception characteristics (i.e., depth perception and dynamic vision) to conduct a self-explaining analysis of the facility environment on 2-lane rural roads. Data for this study were gathered through naturalistic driving experiments on 2-lane rural roads across five Chinese provinces. A total of 3502 visual facility environment images, together with their corresponding operation speeds and speed limits, were collected. The improved LW-CNN exhibits high accuracy and efficiency in predicting operation speeds from these images, achieving a training loss of 0.05% and a validation loss of 0.15%. The semantics of facility environments affecting operation speeds are further identified by combining the LW-CNN with the gradient-weighted class activation mapping (Grad-CAM) algorithm and a semantic segmentation network. Then, six typical 2-lane rural road categories perceived by drivers, with different operation speeds and speeding probability (SP), are summarized using k-means clustering. An objective and comprehensive analysis of each category’s semantic composition and depth features is conducted to evaluate their influence on drivers’ speeding probability and road category perception. The findings of this study can be used directly to optimize facility environments from the perspective of drivers’ visual perception and thereby decrease speeding-related crashes.
Funding: Project supported by the Directorate of Advanced Studies, Research & Technological Development, University of Engineering and Technology Taxila (No. UET/ASRTD/RG-1002-3).
Abstract: Automated summarization of sports video is challenging due to variations in cameras, replay speed, illumination conditions, editing effects, game structure, genre, etc. To address these challenges, we propose an effective video summarization framework based on shot classification and replay detection for field sports videos. Accurate shot classification is necessary to structure the input video for further processing, i.e., key event or replay detection. Therefore, we present a method for shot classification based on a lightweight convolutional neural network. We then analyze each shot for replay detection and specifically detect the successive batches of logo transition frames that mark the replay segments in sports videos. For this purpose, we propose local octa-pattern features to represent video frames and train an extreme learning machine to classify frames as replay or non-replay. The proposed framework is robust to variations in cameras, replay speed, shot speed, illumination conditions, game structure, sports genre, broadcasters, logo designs and placement, frame transitions, and editing effects. The performance of our framework is evaluated on a dataset containing diverse YouTube sports videos of soccer, baseball, and cricket. Experimental results demonstrate that the proposed framework can reliably be used for shot classification and replay detection to summarize field sports videos.
Abstract: Purpose - The abnormal behaviors of staff at petroleum stations pose significant safety hazards. To address the high parameter counts, lengthy training periods, and low recognition rates of existing 3D ResNet behavior recognition models, this paper proposes GTB-ResNet, a network designed to detect abnormal behaviors of petroleum station staff. Design/methodology/approach - Firstly, to mitigate the excessive parameters and computational complexity of 3D ResNet, a lightweight residual convolution module called the Ghost residual module (GhostNet) is introduced in the feature extraction network. Ghost convolution replaces standard convolution, reducing model parameters while preserving multi-scale feature extraction capabilities. Secondly, to enhance the model’s focus on salient features despite wide surveillance ranges and small target objects, the triplet attention mechanism module is integrated to facilitate interaction between spatial and channel information. Lastly, to address misjudgments of similar actions caused by short time-series features, a bidirectional gated recurrent network is added to the feature extraction backbone. This ensures the extraction of key long time-series features, thereby improving feature extraction accuracy. Findings - The experimental setup covers four behavior types, namely illegal phone answering, smoking, and falling (abnormal) and touching the face (normal), with a total of 892 videos. Experimental results show that GTB-ResNet achieves a recognition accuracy of 96.7% with a model parameter count of 4.46 M and a computational complexity of 3.898 G. This represents a 4.4% improvement over 3D ResNet, with reductions of 90.4% in parameters and 61.5% in computational complexity. Originality/value - The 3D ResNet network is tailored for real-time action prediction on edge devices in oil stations. To address the large number of parameters in 3D ResNet networks and the resulting difficulties in deployment on edge devices, a lightweight residual module based on Ghost convolution is developed. Additionally, to tackle the low detection accuracy of behaviors in the noisy environment of petroleum stations, a triplet attention mechanism is introduced during feature extraction to enhance focus on salient features. Moreover, to reduce misjudgments arising from the similarity of actions, a Bi-GRU model is introduced to enhance the extraction of key long-term features.
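The parameter saving from replacing standard convolution with Ghost convolution can be illustrated with a short count. The sketch uses the usual Ghost module layout (a primary convolution producing `c_out / ratio` intrinsic maps, followed by cheap depthwise operations that generate the remaining maps) for a 2-D layer with biases ignored. The function names, the default `ratio=2`, and the cheap-kernel size `d=3` are assumptions, and the figures below are not those of GTB-ResNet, which uses 3D convolutions.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, ratio=2, d=3):
    """Weights in a Ghost module producing c_out channels.

    Assumes c_out is divisible by ratio; ratio and d are illustrative.
    """
    intrinsic = c_out // ratio                 # maps from the primary conv
    primary = c_in * intrinsic * k * k         # ordinary convolution
    cheap = intrinsic * (ratio - 1) * d * d    # depthwise "ghost" operations
    return primary + cheap


std = standard_conv_params(64, 128, 3)    # 73728 weights
ghost = ghost_module_params(64, 128, 3)   # 37440 weights
print(f"standard={std}, ghost={ghost}, saving={1 - ghost / std:.1%}")
```

With `ratio=2` the layer needs roughly half the weights of the standard convolution, and larger ratios shrink the primary convolution further at the cost of deriving more maps from cheap operations.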