Funding: Supported in part by the National Natural Science Foundation of China under Grant Numbers 61502240, 61502096, 61304205, and 61773219; in part by the Natural Science Foundation of Jiangsu Province under Grant Numbers BK20201136 and BK20191401; and in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: With the increasing deployment of surveillance cameras, vehicle re-identification (Re-ID) has attracted growing attention in the field of public security. Vehicle Re-ID is challenging because of large intra-class differences caused by the changing views of a vehicle as it travels and strong inter-class similarities caused by similar appearances. Many existing methods focus on local attributes by marking local locations. However, these methods require additional annotations, resulting in complex algorithms and excessive computation time. To cope with these challenges, this paper proposes a vehicle Re-ID model based on an optimized DenseNet121 with a joint loss. The model applies the squeeze-and-excitation (SE) block to automatically learn the importance of each feature channel and assign it a corresponding weight; features are then passed to deeper layers scaled by these weights, which reduces the transmission of redundant information during feature reuse in DenseNet121. At the same time, the proposed model leverages the complementary representations of the CNN's middle-layer features to enhance feature expression ability. Additionally, a joint loss combining focal loss and triplet loss is proposed for vehicle Re-ID to enhance the model's ability to discriminate difficult-to-separate samples by increasing their weight during training. Experimental results on the VeRi-776 dataset show that mAP and Rank-1 reach 75.5% and 94.8%, respectively. In addition, Rank-1 on the small, medium, and large subsets of the VehicleID dataset reaches 81.3%, 78.9%, and 76.5%, respectively, which surpasses most existing vehicle Re-ID methods.
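As a rough illustration of how a joint loss of this kind can emphasize difficult samples, the sketch below combines a per-sample focal loss with a margin-based triplet loss. It is not the paper's implementation: the focusing parameter `gamma`, the `margin`, the balancing `weight`, and all function names are assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs is the softmax output; gamma=2.0 is an assumed value.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def joint_loss(probs, target, anchor, positive, negative, weight=1.0):
    """Sum of the two terms; `weight` is a hypothetical balancing factor."""
    return focal_loss(probs, target) + weight * triplet_loss(
        anchor, positive, negative
    )
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of well-classified samples (it is 0.01 at `p_t = 0.9` with `gamma = 2`) while leaving hard samples nearly untouched, which is one way to read the abstract's "increasing the weight" of difficult-to-separate samples.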
Funding: Supported by the Natural Science Foundation of Shandong Province, China (No. ZR2020QF007).
Abstract: Due to the complexity of emotional expression, recognizing emotions from speech is a critical and challenging task. In most studies, some specific emotions are easily misclassified. In this paper, we propose a new framework that integrates a cascade attention mechanism and a joint loss for speech emotion recognition (SER), aiming to resolve feature confusion for emotions that are difficult to classify correctly. First, we extract the Mel-frequency cepstral coefficients (MFCCs) and compute their deltas and delta-deltas to form 3-dimensional (3D) features, thus effectively reducing the interference of external factors. Second, we employ spatiotemporal attention to selectively discover target emotion regions in the input features, where self-attention with head fusion captures the long-range dependency of temporal features. Finally, the joint loss function is employed to distinguish emotional embeddings with high similarity and enhance overall performance. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database indicate that the method improves weighted accuracy (WA) and unweighted accuracy (UA) by 2.49% and 1.13%, respectively, compared to state-of-the-art strategies.
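The 3D feature construction can be sketched as follows, assuming the MFCC matrix has already been extracted (one list of coefficients per frame). The regression-window delta formula and the window `width` are a common convention, not something the abstract confirms, and the function names are hypothetical.

```python
def deltas(frames, width=2):
    """Regression-based delta along time.

    frames: list of per-frame coefficient lists.
    d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width,
    with frame indices clamped at the sequence edges.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_3d(mfcc):
    """Return [static, delta, delta-delta]: channels x frames x coefficients."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return [mfcc, d1, d2]
```

Calling `stack_3d(mfcc)` yields three aligned channels, which can then be treated like an image with three channels by a convolutional or attention front end.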
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant No. 41875184) and the Innovation Team of "Six Talent Peaks" in Jiangsu Province (Grant No. TD-XYDXX-004).
Abstract: With the continuous development of face recognition networks, the choice of loss function plays an increasingly important role in improving accuracy. The loss function of a face recognition network needs to minimize the intra-class distance while enlarging the inter-class distance. So far, mainstream approaches to loss function optimization fall into three groups. The first adds penalty terms, such as orthogonal loss, to further constrain the original loss function. The second optimizes a loss based on an angular/cosine margin. The third comprises triplet loss and a new type of joint optimization based on HST Loss and ACT Loss. This paper thoroughly reviews various loss functions from these three groups, which perform well in practice, together with the joint optimization method.
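To make the angular/cosine-margin family concrete, here is a minimal sketch of one representative, an additive cosine margin softmax loss in the style of CosFace. The abstract does not single out this variant, and the `scale` and `margin` values as well as the function names are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_margin_loss(embedding, class_weights, target,
                       scale=30.0, margin=0.35):
    """Softmax cross-entropy on scale * (cos(theta_j) - margin * [j == target])."""
    logits = []
    for j, w in enumerate(class_weights):
        c = cosine(embedding, w)
        if j == target:
            c -= margin  # penalize only the true class
        logits.append(scale * c)
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

Subtracting the margin from the true-class cosine forces the embedding to be closer to its own class center than plain softmax would require, which shrinks intra-class distance and widens inter-class distance.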
Funding: National Natural Science Foundation of China (No. 61301101).
Abstract: A novel joint optimization strategy for the secondary user (SU) was proposed that considers both short-term and long-term video transmission over distributed cognitive radio networks (DCRNs). Since a long-term video transmission consists of a series of short-term transmissions, the optimization problem in video transmission is a composite optimization process. First, considering factors such as the primary user's (PU's) collision limitations, non-synchronization between the SU and PU, and the SU's limited buffer size, the short-term optimization problem was formulated as a mixed-integer nonlinear program (MINLP) to minimize the block probability of video packets. Second, combining the minimum packet block probability obtained in the short-term optimization with the SU's constraint on hardware complexity, a partially observable Markov decision process (POMDP) framework was proposed to learn the PU's statistical information over DCRNs. Moreover, based on the proposed framework, a joint optimization strategy was designed to obtain the minimum packet loss rate in long-term video transmission. Numerical simulation results were provided to demonstrate the validity of the strategies.
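The abstract does not describe the POMDP's state or observation model, so the sketch below shows only the generic Bayesian belief update that any POMDP relies on, applied to a hypothetical two-state (idle/busy) PU channel with imperfect sensing. The transition probabilities, the detection and false-alarm probabilities, and the function names are all assumptions for illustration.

```python
def predict(belief_idle, p_idle_to_idle, p_busy_to_idle):
    """Propagate P(channel idle) one slot through the Markov transition."""
    return (belief_idle * p_idle_to_idle
            + (1.0 - belief_idle) * p_busy_to_idle)


def correct(belief_idle, sensed_busy, p_detect, p_false_alarm):
    """Bayes update of P(channel idle) from one sensing outcome.

    p_detect      = P(sensed busy | channel busy)
    p_false_alarm = P(sensed busy | channel idle)
    """
    if sensed_busy:
        like_idle, like_busy = p_false_alarm, p_detect
    else:
        like_idle, like_busy = 1.0 - p_false_alarm, 1.0 - p_detect
    numerator = like_idle * belief_idle
    return numerator / (numerator + like_busy * (1.0 - belief_idle))
```

In each slot the SU would first call `predict` and then `correct` with its sensing result; a transmission policy can then act on the updated belief rather than on the unobservable true PU state.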