Funding: This work was supported by the Key Lab of Intelligent and Green Flexographic Printing under Grant ZBKT202301.
Abstract: Current spatio-temporal action detection methods are limited in their ability to extract and comprehend spatio-temporal information. This paper introduces an end-to-end Adaptive Cross-Scale Fusion Encoder-Decoder (ACSF-ED) network that efficiently predicts actions and localizes the acting objects. In the Adaptive Cross-Scale Fusion Spatio-Temporal Encoder (ACSF ST-Encoder), the Asymptotic Cross-scale Feature-fusion Module (ACCFM) is designed to address the information degradation caused by propagating high-level semantic information, thereby extracting high-quality multi-scale features for subsequent spatio-temporal modeling. Within the Shared-Head Decoder, a shared classification and regression detection head is constructed. A multi-constraint loss function composed of one-to-one, one-to-many, and contrastive denoising losses is designed to address the insufficient constraint that traditional methods place on their predictions. This loss improves classification accuracy and brings the regressed positions closer to the ground-truth objects. The proposed model is evaluated on the popular UCF101-24 and JHMDB-21 datasets. Experimental results show that it achieves a Frame-mAP of 81.52%, surpassing existing methods.
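The abstract does not say how the one-to-one and one-to-many terms assign predictions to targets. A common reading, assumed here in the style of DETR-like detectors, is that the one-to-one term uses a bipartite matching between ground-truth boxes and decoder queries, while the one-to-many term lets each ground truth supervise several queries. The sketch below only illustrates those two assignment schemes on a toy cost matrix; the function names and the top-k rule are hypothetical, not the ACSF-ED implementation.

```python
from itertools import permutations
from typing import List, Tuple

Cost = List[List[float]]  # cost[g][q]: matching cost between ground truth g and query q


def one_to_one(cost: Cost) -> List[Tuple[int, int]]:
    """Give each ground truth exactly one distinct query, minimizing total cost.

    Brute force over permutations, so only suitable for tiny examples;
    real detectors typically use the Hungarian algorithm instead.
    """
    n_gt, n_q = len(cost), len(cost[0])
    if n_q < n_gt:
        raise ValueError("need at least as many queries as ground truths")
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n_q), n_gt):
        total = sum(cost[g][q] for g, q in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm))


def one_to_many(cost: Cost, k: int) -> List[Tuple[int, int]]:
    """Assign each ground truth its k lowest-cost queries (queries may be shared)."""
    pairs = []
    for g, row in enumerate(cost):
        ranked = sorted(range(len(row)), key=lambda q: row[q])
        pairs.extend((g, q) for q in ranked[:k])
    return pairs
```

For the cost matrix [[0.1, 0.9, 0.5], [0.2, 0.8, 0.3]], one_to_one returns [(0, 0), (1, 2)], while one_to_many(cost, k=2) returns [(0, 0), (0, 2), (1, 0), (1, 2)], so the one-to-many branch supplies denser supervision. In practice, scipy.optimize.linear_sum_assignment would replace the brute-force search.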
Funding: Funded by the National Key Research and Development Program of China (Grant No. 2020YFB1708900) and the Fundamental Research Funds for the Central Universities (Grant No. B220201044).
Abstract: Medical image segmentation has advanced rapidly with the emergence of encoder-decoder methods. In the encoder-decoder structure, the decoding phase aims not only to restore the feature map resolution but also to mitigate the feature information lost during encoding. However, this raises a problem of its own: the multiple up-sampling operations in the decoder themselves cause feature information loss. To address this, we propose CBL-Net, a novel network that removes the decoding structure to reduce feature information loss. In particular, we introduce a Parallel Pooling Module (PPM) to counteract the feature information loss caused by convolution and pooling operations during encoding. Furthermore, we incorporate a Multiplexed Dilation Convolution (MDC) module to expand the network's receptive field. Although the decoding stage is removed, the feature map resolution must still be recovered, so we introduce a Global Feature Recovery (GFR) module, which uses an attention mechanism to recover the feature map resolution while effectively reducing feature information loss. We conduct extensive experiments on three publicly available medical image segmentation datasets: DRIVE, CHASEDB, and MoNuSeg. Experimental results show that the proposed network outperforms state-of-the-art methods in medical image segmentation. In addition, by eliminating the decoding component, it is more efficient than current encoder-decoder networks.
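The abstract does not give the MDC module's dilation rates, nor whether its dilated convolutions are stacked or run as parallel branches. Assuming stride-1 3×3 convolutions with hypothetical dilation rates 1, 2 and 4, the receptive-field arithmetic behind "expanding the receptive field" can be checked directly: a k-tap kernel with dilation d spans k + (k - 1)(d - 1) pixels, and a stack of stride-1 layers has a receptive field of 1 + Σ (k_i - 1)·d_i.

```python
from typing import Iterable, List, Tuple


def effective_kernel(k: int, d: int) -> int:
    """Span of a k-tap kernel with dilation d (d - 1 gaps between taps)."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of sequential stride-1 convolutions given as (kernel, dilation)."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer widens the field by effective_kernel(k, d) - 1
    return rf


def parallel_branch_fields(k: int, dilations: Iterable[int]) -> List[int]:
    """Receptive field of each single-layer branch applied in parallel to the same input."""
    return [effective_kernel(k, d) for d in dilations]
```

Under these assumptions, stacked_receptive_field([(3, 1), (3, 2), (3, 4)]) gives 15, compared with 7 for three undilated 3×3 layers, while parallel_branch_fields(3, [1, 2, 4]) gives [3, 5, 9]: dilation widens the context without adding parameters.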
Funding: National Natural Science Foundation of China (Nos. 61673017, 61403398) and Natural Science Foundation of Shaanxi Province (Nos. 2017JM6077, 2018ZDXM-GY-039).
Abstract: Based on the characteristics of road features, an Encoder-Decoder deep semantic segmentation network is designed for road extraction from remote sensing images. First, because road targets are rich in local detail but semantically simple, an Encoder-Decoder network with shallow layers and high resolution is designed to better represent detail information. Second, because roads occupy only a small proportion of remote sensing images, the cross-entropy loss function is improved to resolve the imbalance between positive and negative samples during training. Experiments on large road extraction datasets show that the proposed method achieves a recall of 83.9%, a precision of 82.5%, and an F1-score of 82.9%, extracting road targets from remote sensing images completely and accurately. The Encoder-Decoder network designed in this paper performs well on the road extraction task and requires less manual intervention, so it has good application prospects.
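The abstract says only that the cross-entropy loss is "improved" to counter the small share of road pixels; it does not give the formula. One widely used option, assumed here and not necessarily the authors' choice, is to weight the positive (road) term by the background-to-road pixel ratio. The sketch also shows how pixel-level recall, precision and F1-score are computed from a binary mask. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def positive_weight(labels: List[int]) -> float:
    """Ratio of background to road pixels, used to up-weight the rare road class."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos if pos else 1.0


def weighted_bce(probs: List[float], labels: List[int], pos_weight: float,
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with the positive (road) term scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def precision_recall_f1(preds: List[int], labels: List[int]) -> Tuple[float, float, float]:
    """Pixel-level precision, recall and F1 for binary masks (1 = road)."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With a 1:3 road-to-background ratio, positive_weight returns 3.0, so a misclassified road pixel costs three times as much as a misclassified background pixel predicted with the same confidence.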
Abstract: The growth of multimedia content has caused a massive increase in network traffic for video streaming, which calls for solutions that maintain the user's Quality-of-Experience (QoE). 360-degree videos have taken users by storm. However, users focus on only part of a 360-degree video, known as the viewport. Despite the immense hype, 360-degree videos have an unpleasant side effect related to viewport prediction: because the user's viewport must be pre-fetched in advance, viewers can be left uncomfortable. Ideally, bandwidth consumption could be minimized if the user's motion were known in advance. Given this problem, we propose an Encoder-Decoder based Long Short-Term Memory (LSTM) model that more accurately captures the non-linear relationship between past and future viewport positions. To predict future user movement, the model takes transformed data rather than the raw input. This prediction model is then combined with a rate adaptation approach that assigns bitrates to the tiles of 360-degree video frames under a given network capacity. Our work thus aims to improve system performance by jointly optimizing QoE parameters. Experiments were carried out and compared against existing work to evaluate the proposed model, and the results show that it provides higher user QoE than its competitors.
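The abstract does not describe the rate adaptation rule itself. As a hypothetical illustration only, the sketch below assigns tile bitrates greedily: every tile starts at the lowest rung of a bitrate ladder, and the upgrade with the highest viewport-probability-weighted quality gain per extra unit of bitrate is applied until the capacity is exhausted. The log-bitrate utility, the ladder, and the function name are assumptions, not the authors' method.

```python
import math
from typing import List


def allocate_tile_bitrates(view_probs: List[float], ladder: List[float],
                           capacity: float) -> List[float]:
    """Greedy tile bitrate assignment (hypothetical sketch).

    view_probs: predicted probability that each tile falls inside the viewport
    ladder: available per-tile bitrates in ascending order (e.g. Mbps)
    capacity: total bandwidth budget for one segment
    """
    levels = [0] * len(view_probs)  # every tile starts at the lowest rung
    used = ladder[0] * len(view_probs)
    if used > capacity:
        raise ValueError("capacity cannot cover the lowest bitrate for all tiles")
    while True:
        best, best_ratio = None, 0.0
        for i, p in enumerate(view_probs):
            if levels[i] + 1 >= len(ladder):
                continue  # tile already at the top rung
            old, new = ladder[levels[i]], ladder[levels[i] + 1]
            cost = new - old
            if used + cost > capacity:
                continue  # upgrade would exceed the budget
            # Expected quality gain (log utility weighted by view probability) per unit bitrate
            ratio = p * (math.log(new) - math.log(old)) / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # no affordable upgrade improves expected quality
        used += ladder[levels[best] + 1] - ladder[levels[best]]
        levels[best] += 1
    return [ladder[level] for level in levels]
```

With view probabilities [0.6, 0.3, 0.1, 0.0], ladder [1, 2, 4, 8] and capacity 12, the tile with zero viewing probability stays at the lowest bitrate, and the remaining bandwidth flows to the tiles the predictor expects to be watched.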
Funding: Fundamental Research Funds for the Central Universities (Grant No. FRF-TP-19-006A3).
Abstract: Heart disease is common and high-risk, and it seriously threatens people's health. At the same time, in the era of the Internet of Things (IoT), smart medical devices have strong practical significance for medical workers and patients because they can assist in diagnosing diseases. Research on real-time diagnosis and classification algorithms for arrhythmia can therefore help improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an Encoder-Decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on the classification results, and it is trained and tested on the MIT-BIH Arrhythmia Database. In addition, a Generative Adversarial Network (GAN) is adopted for data balancing to address the class imbalance problem. Simulation results show that for inter-patient arrhythmia classification, the hybrid model combining the CNN and Encoder-Decoder model achieves the best classification accuracy, reaching 94.05%. In particular, it has a clear advantage in classifying supraventricular ectopic beats (class S) and fusion beats (class F).
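The abstract states that a GAN is used for data balancing but not how many beats are synthesized per class. A minimal sketch, assuming each minority class is topped up to the size of the largest class, computes the per-class quota the generator would have to fill. The beat counts in the usage example are made up for illustration and are not MIT-BIH statistics.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def synthetic_quota(labels: Iterable[Hashable],
                    target: Optional[int] = None) -> Dict[Hashable, int]:
    """Number of synthetic beats to generate per class so every class reaches `target`.

    By default the target is the size of the largest class (full balancing).
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}
```

For a toy training split with 8 normal (N), 3 class S and 1 class F beats, synthetic_quota returns {"N": 0, "S": 5, "F": 7}. Presumably only the training beats would be augmented this way, so that the inter-patient test set remains real data.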
Funding: Supported by the National Natural Science Foundation of China (No. 41906169) and the PLA Academy of Military Sciences.
Abstract: Noise reduction is essential for modern underwater acoustic detection systems. Traditional noise reduction techniques are gradually losing efficacy because the target signal is masked by biological and natural noise in the marine environment. Feature extraction that combines time-frequency spectrograms with deep learning can effectively separate noise from target signals. A fully convolutional encoder-decoder neural network (FCEDN) is proposed to address noise reduction in underwater acoustic signals. During denoising, the time-domain waveform of the underwater acoustic signal is converted into a wavelet low-frequency analysis recording spectrogram to preserve as many of the signal's characteristics as possible. The FCEDN is built to learn, at each time level, the spectrogram mapping between noise and target signals. Transposed convolutions are introduced to transform the spectrogram features of the signals back into listenable audio files. Evaluated on the ShipsEar dataset, the proposed method increases SNR and SI-SNR by 10.02 dB and 9.5 dB, respectively.
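The reported gains are differences in SNR and scale-invariant SNR (SI-SNR) between the denoised and the noisy signal, each measured against the clean reference. The sketch below uses the standard definitions: SNR compares the clean-signal energy with the residual energy, while SI-SNR first projects the zero-mean estimate onto the zero-mean reference, so rescaling the estimate does not change the score. The toy signals in the usage note are synthetic, not ShipsEar recordings.

```python
import math
from typing import Callable, List, Sequence


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """10 * log10(||s||^2 / ||s_hat - s||^2)."""
    signal = sum(c * c for c in clean)
    noise = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / noise)


def _zero_mean(x: Sequence[float]) -> List[float]:
    m = sum(x) / len(x)
    return [v - m for v in x]


def si_snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """Scale-invariant SNR: project the estimate onto the reference before measuring error."""
    s, e = _zero_mean(clean), _zero_mean(estimate)
    scale = sum(a * b for a, b in zip(e, s)) / sum(a * a for a in s)
    target = [scale * a for a in s]                # component of the estimate along s
    residual = [a - t for a, t in zip(e, target)]  # everything else counts as noise
    return 10.0 * math.log10(sum(t * t for t in target) / sum(r * r for r in residual))


def improvement_db(metric: Callable[[Sequence[float], Sequence[float]], float],
                   clean: Sequence[float], noisy: Sequence[float],
                   denoised: Sequence[float]) -> float:
    """Gain of the denoised signal over the noisy input under the given metric."""
    return metric(clean, denoised) - metric(clean, noisy)
```

If a denoiser leaves exactly one tenth of the noise amplitude in place, improvement_db(snr_db, ...) is 20 dB, because the residual energy drops by a factor of 100; doubling the denoised waveform leaves its SI-SNR unchanged.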
Abstract: Accurate pedestrian trajectory prediction is critical in self-driving systems, as it underpins the response- and decision-making of ego vehicles. In this study, we focus on predicting the future trajectories of pedestrians from a first-person perspective. Most existing first-person trajectory prediction methods copy those designed for the bird's-eye view, neglecting the differences between the two. We therefore clarify these differences and highlight the importance of action-aware trajectory prediction in the first-person view. We propose a new action-aware network based on an encoder-decoder framework, with an action prediction branch and a goal estimation branch at the end of the encoder. In the decoder, bidirectional long short-term memory (Bi-LSTM) blocks generate the final prediction of the pedestrians' future trajectories. Our method was evaluated on a public dataset and achieved competitive performance compared with other approaches. An ablation study demonstrates the effectiveness of the action prediction branch.
Funding: Ningxia Hui Autonomous Region Key Research and Development Program Project, "Research and demonstration application of key technologies for intelligent monitoring of spatial planning based on high-scoring remote sensing" (Project No. 2018YBZD1629).
Abstract: Cultivated land extraction is essential for sustainable development and agriculture. In this paper, we propose a semantic segmentation neural network based on the encoder-decoder structure that extracts cultivated land from satellite images, for use in agricultural automation solutions. The encoder consists of two parts: a modified Xception, which serves as the feature extraction network, and atrous convolution, which expands the receptive field and the context information to extract richer features. The decoder uses conventional upsampling operations to restore the original resolution. In addition, we combine the BCE and Lovász hinge losses as the loss function to optimize Intersection over Union (IoU). Experimental results show that the proposed network structure can solve the problem of cultivated land extraction in Yinchuan City.
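The loss pairs binary cross-entropy (BCE) with the Lovász hinge, a convex surrogate that optimizes the Jaccard index (IoU) directly. The sketch below is a minimal pure-Python rendering of the published binary Lovász hinge formulation on a flattened mask, plus a stable BCE on raw scores; the equal weighting of the two terms (lovasz_weight=1.0) and the function names are assumptions, since the abstract does not give them.

```python
import math
from typing import List


def bce_with_logits(logits: List[float], labels: List[int]) -> float:
    """Numerically stable mean binary cross-entropy on raw scores."""
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def lovasz_grad(gt_sorted: List[int]) -> List[float]:
    """Gradient of the Lovász extension of the Jaccard loss w.r.t. the sorted errors."""
    gts = sum(gt_sorted)
    grads, prev = [], 0.0
    cum_pos = cum_neg = 0
    for y in gt_sorted:
        cum_pos += y
        cum_neg += 1 - y
        jaccard = 1.0 - (gts - cum_pos) / (gts + cum_neg)  # union is always >= 1
        grads.append(jaccard - prev)
        prev = jaccard
    return grads


def lovasz_hinge(logits: List[float], labels: List[int]) -> float:
    """Binary Lovász hinge: hinge errors sorted in decreasing order, weighted by lovasz_grad."""
    signs = [2 * y - 1 for y in labels]
    errors = [1.0 - x * s for x, s in zip(logits, signs)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    grads = lovasz_grad([labels[i] for i in order])
    return sum(max(errors[i], 0.0) * g for i, g in zip(order, grads))


def combined_loss(logits: List[float], labels: List[int], lovasz_weight: float = 1.0) -> float:
    """BCE plus a (hypothetically weighted) Lovász hinge term."""
    return bce_with_logits(logits, labels) + lovasz_weight * lovasz_hinge(logits, labels)
```

For a two-pixel toy mask with labels [1, 0], confident correct scores [3.0, -3.0] give a Lovász term of 0, while reversed scores [-1.0, 1.0] give 2.0. The BCE term keeps per-pixel gradients well behaved, while the Lovász term ties the objective to the IoU metric the abstract targets.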
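Abstract: Building an accurate and robust prediction model depends heavily on the size, diversity, and distribution of the sample data, and the ever-growing body of published literature makes it possible to obtain large and diverse sample datasets. Taking the relative density (RD) of SLM-ed IN 718 alloy as the research object, this work addresses the missing parameters and uneven distribution in sample data mined from the literature, consisting of laser power P, scanning speed V, hatch spacing HS, powder layer thickness LT, and RD. The expectation-maximization (EM) algorithm is used to impute the missing parameters, and a Wasserstein GAN with gradient penalty (WGAN-GP) is used to generate virtual samples in the data-sparse low-RD range. RD prediction models are then built with a random forest optimized by the ivy algorithm (IVYA-RF), based respectively on the imputed literature data and on the data supplemented with virtual samples, and their prediction accuracy is quantitatively evaluated and experimentally validated. The results show that IVYA-RF model II, built on the dataset supplemented with virtual data, predicts more accurately than IVYA-RF model I, built on the imputed literature dataset. This is mainly because generating virtual data in the low-RD range makes the distribution of the modeling samples more uniform, which is precisely the value of combining data augmentation with machine learning. Validation on new experimental data yielded satisfactory prediction accuracy: for IVYA-RF model I, the R2 (coefficient of determination), RMSE (root mean square error), MAE (mean absolute error), and MRE (mean relative error) were 0.891, 1.352%, 0.915%, and 0.98%, respectively; for IVYA-RF model II, R2 increased to 0.956, while RMSE, MAE, and MRE decreased to 0.833%, 0.687%, and 0.71%, respectively, again showing that the latter is more accurate than the former. The experimental validation shows that the RD prediction models are robust and therefore have good engineering application value.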
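The four validation statistics quoted for the RD models have standard definitions. The sketch below computes them for relative-density values given in percent, so RMSE and MAE come out in percentage points and MRE as a percentage; the function name and sample values are illustrative, not the paper's data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE, MAE and MRE (%) for paired measured/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MRE": 100.0 * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n,
    }
```

For measured RD values [100.0, 90.0] and predictions [98.0, 92.0], the function returns R2 = 0.84, RMSE = 2.0, MAE = 2.0 and MRE of about 2.11%. A higher R2 together with lower RMSE, MAE and MRE, as reported for model II, indicates a closer fit.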
Funding: Supported by the Henan Province Key Research and Development Project (231111211300), the Central Government of Henan Province Guides Local Science and Technology Development Funds (Z20231811005), the Henan Province Key Research and Development Project (231111110100), the Henan Provincial Outstanding Foreign Scientist Studio (GZS2024006), and the Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan (Application and Overcoming Technical Barriers) (242103810028).
Abstract: The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. Designed according to the optimization objective, the encoder consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Because this extraction may miss some information, a compensation encoder is proposed to supplement it. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines the low-frequency, high-frequency, and supplementary information to obtain multi-scale features. Subsequently, an attention strategy and a fusion module are introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
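The learned base and detail encoders play the role that a low-pass/high-pass split plays in classical image fusion. As a hand-crafted stand-in only, not the proposed autoencoder, the sketch below splits each image into a box-filtered base layer and a residual detail layer, keeps the brighter base value so that hot infrared targets survive, and keeps the larger-magnitude detail so that visible texture survives. The filter radius and both fusion rules are assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]


def box_blur(img: Image, r: int = 1) -> Image:
    """Mean filter with a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def decompose(img: Image, r: int = 1) -> Tuple[Image, Image]:
    """Split an image into a low-frequency base layer and a high-frequency detail layer."""
    base = box_blur(img, r)
    detail = [[p - b for p, b in zip(row, brow)] for row, brow in zip(img, base)]
    return base, detail


def fuse(ir: Image, vis: Image, r: int = 1) -> Image:
    """Fuse same-sized infrared and visible images with max-base / max-abs-detail rules."""
    b_ir, d_ir = decompose(ir, r)
    b_vis, d_vis = decompose(vis, r)
    fused = []
    for y in range(len(ir)):
        row = []
        for x in range(len(ir[0])):
            base = max(b_ir[y][x], b_vis[y][x])  # keep bright (salient) infrared regions
            da, db = d_ir[y][x], d_vis[y][x]
            detail = da if abs(da) >= abs(db) else db  # keep the stronger texture
            row.append(base + detail)
        fused.append(row)
    return fused
```

Because base plus detail reconstructs each input exactly, fusing an image with itself returns the same image; a learned encoder replaces the fixed box filter and hand-set rules with features and weights trained end to end.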