1 Introduction
Sound event detection (SED) aims to identify and locate specific sound event categories and their corresponding timestamps within continuous audio streams. To overcome the limitations posed by the scarcity of strongly labeled training data, researchers have increasingly turned to semi-supervised learning (SSL) [1], which leverages unlabeled data to augment training and improve detection performance. Among many SSL methods [2-4].
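In practice, the timestamps mentioned above are usually recovered by post-processing frame-level model outputs. The following is a minimal sketch, not taken from the source: it assumes a model that emits per-frame class probabilities of shape (frames, classes), a 0.5 threshold, a 0.1 s frame hop, and a 3-frame median filter, all illustrative choices. `decode_events` and `median_filter_1d` are hypothetical helper names.

```python
import numpy as np


def median_filter_1d(x, win):
    """Sliding median over a 1-D array; edges are padded by repetition."""
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    return np.array([np.median(padded[i:i + win]) for i in range(len(x))])


def decode_events(probs, class_names, threshold=0.5, hop_s=0.1, median_win=3):
    """Turn frame-level probabilities (T, C) into (class, onset_s, offset_s).

    threshold, hop_s and median_win are assumed values for illustration.
    """
    events = []
    for c, name in enumerate(class_names):
        # Binarize the class activity, then smooth out isolated flips.
        active = (probs[:, c] > threshold).astype(float)
        active = median_filter_1d(active, median_win) > 0.5
        # Rising edges mark onsets, falling edges mark offsets.
        edges = np.diff(np.concatenate(([0], active.astype(int), [0])))
        onsets = np.flatnonzero(edges == 1)
        offsets = np.flatnonzero(edges == -1)
        for on, off in zip(onsets, offsets):
            events.append((name, round(float(on) * hop_s, 3),
                           round(float(off) * hop_s, 3)))
    return events


probs = np.array([[0.1, 0.1], [0.8, 0.6], [0.9, 0.1], [0.7, 0.1], [0.2, 0.1],
                  [0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.2]])
print(decode_events(probs, ["dog", "cat"]))
# [('dog', 0.1, 0.4), ('cat', 0.5, 0.9)]
```

The single-frame 0.6 spike for `cat` is suppressed by the median filter; with `median_win=1` it would survive as a separate short event.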
Semi-supervised sound event detection (SSED) typically leverages large amounts of unlabeled and synthetic data to improve model generalization during training and reduce overfitting to a limited set of labeled data. However, this generalization training process is often disturbed by noisy interference introduced by pseudo-labels or by domain knowledge gaps. To alleviate such noisy interference in class distribution learning, we propose an efficient semi-supervised class distribution learning method based on dynamic prompt tuning, named prompting class distribution optimization (PADO). Specifically, when modeling real labeled data, PADO dynamically incorporates independent learnable prompt tokens to capture prior knowledge about the true class distribution. This prior knowledge then serves as prompt information that dynamically interacts with the noisy posterior class distribution information. In this way, PADO optimizes the class distribution while maintaining model generalization, significantly improving the efficiency of class distribution learning. On the SSED datasets of the DCASE 2019, 2020, and 2021 challenges, PADO achieves significant performance improvements over state-of-the-art methods. Furthermore, it is readily extendable to other benchmark models.
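The abstract does not specify PADO's architecture. As a rough illustration of the general prompt-tuning idea it builds on, the sketch below prepends a few learnable prompt tokens to frame-level embeddings and lets prompts and frames interact through a single-head self-attention layer, keeping only the frame outputs. Everything here (`prompted_self_attention`, the dimensions, and random weights standing in for a trained encoder) is a hypothetical assumption, not the authors' method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prompted_self_attention(frames, prompts, w_q, w_k, w_v):
    """Prepend prompt tokens to frame embeddings, apply one single-head
    self-attention layer, and return (frame outputs, frame-to-prompt attention).

    frames:  (T, D) frame-level embeddings from an audio encoder
    prompts: (P, D) learnable prompt tokens (hypothetical)
    """
    n_prompts = prompts.shape[0]
    tokens = np.concatenate([prompts, frames], axis=0)        # (P + T, D)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)   # (P + T, P + T)
    out = attn @ v                                            # (P + T, D)
    # Drop the prompt positions; also report how much each frame attends to prompts.
    return out[n_prompts:], attn[n_prompts:, :n_prompts]


rng = np.random.default_rng(0)
T, D, P = 50, 16, 4
frames = rng.standard_normal((T, D))
prompts = 0.02 * rng.standard_normal((P, D))
# Random matrices stand in for (frozen) pretrained projection weights.
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

out, to_prompts = prompted_self_attention(frames, prompts, w_q, w_k, w_v)
print(out.shape, to_prompts.shape)  # (50, 16) (50, 4)
```

Under prompt tuning, the projection weights (`w_q`, `w_k`, `w_v` here) would typically stay frozen and gradients would update only `prompts`, which keeps the number of trainable parameters small.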
Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued independent research fields. In recent years, in order to achieve a more complete spatial and temporal representation of the sound field, sound event localization and detection (SELD) has become a very active research topic. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. Log-Mel spectrograms and generalized cross-correlation spectra are concatenated along the channel dimension to form the input features. A neural network trained on these features performs classification and regression in parallel, yielding the sound event recognition and localization results, respectively. A channel attention mechanism is also introduced into the network to selectively enhance features that carry essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and environmental conditions and achieves higher recognition and localization accuracy than the baseline method.
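The input construction and channel attention described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's configuration: a 4-microphone array sampled at 24 kHz, 64 mel bands, GCC-PHAT (one common generalized cross-correlation weighting) truncated to 64 central lags so it aligns with the mel axis, and a squeeze-and-excitation-style attention block whose weights are random placeholders for trained ones. All function names are hypothetical.

```python
import itertools

import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal -> complex array (T, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    def hz2mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel2hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    hz = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fb[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fb


def gcc_phat(X1, X2, n_lags):
    """Frame-wise GCC-PHAT of two STFTs, keeping the n_lags central lags."""
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-8              # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, axis=-1)          # (T, n_fft), lag 0 at index 0
    half = n_lags // 2
    # Negative lags wrap to the end of the array; move them to the front.
    return np.concatenate([cc[:, -half:], cc[:, :n_lags - half]], axis=-1)


def seld_input_features(signals, sr=24000, n_fft=512, hop=256, n_mels=64):
    """signals: (M, N) multichannel audio -> (M + M*(M-1)/2, T, n_mels)."""
    specs = [stft(x, n_fft, hop) for x in signals]
    fb = mel_filterbank(sr, n_fft, n_mels)
    log_mel = [np.log(np.abs(S) ** 2 @ fb.T + 1e-8) for S in specs]
    gcc = [gcc_phat(specs[i], specs[j], n_mels)
           for i, j in itertools.combinations(range(len(specs)), 2)]
    return np.stack(log_mel + gcc, axis=0)     # join along the channel axis


def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention on (C, T, F) features."""
    z = feat.mean(axis=(1, 2))                 # squeeze: one value per channel
    s = np.maximum(w1 @ z, 0.0)                # bottleneck + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ s)))        # per-channel weights in (0, 1)
    return feat * a[:, None, None], a


rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 24000))        # 1 s of 4-channel noise
feat = seld_input_features(audio)
C = feat.shape[0]
w1 = 0.1 * rng.standard_normal((C // 2, C))    # random stand-ins for
w2 = 0.1 * rng.standard_normal((C, C // 2))    # trained attention weights
weighted, a = channel_attention(feat, w1, w2)
print(feat.shape)  # (10, 92, 64)
```

With four microphones this yields 4 log-mel maps plus 6 pairwise GCC maps, i.e. 10 input channels, and the attention weights `a` rescale each channel before the parallel classification and regression heads.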
Funding: supported by the Zhejiang Provincial Key R&D Program (Nos. 2024C01108, 2023C01030, 2023C01034), the Hangzhou Key R&D Program (Nos. 2023SZD0046, 2024SZD1A03), and the Ningbo Key R&D Program (No. 2024Z114).
Funding: supported by the National Natural Science Foundation of China (Nos. 62176106 and U1836220), the Special Scientific Research Project of School of Emergency Management of Jiangsu University (No. KY-A-01), the Project of Faculty of Agricultural Engineering of Jiangsu University (No. NGXB20240101), the Post-graduate Research & Practice Innovation Program of Jiangsu Province (Nos. KYCX22_3668 and KYCX21_3373), and the Jiangsu Key Research and Development Plan (No. BE2020036).
Funding: supported by the National Natural Science Foundation of China (61877067), the Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002, TCGZ2021C003, 6142414200511), and the Natural Science Basic Research Program of Shaanxi (2021JZ-19).