Funding: Supported by the Shandong Provincial Natural Science Foundation (project ZR2022MF330) and the National Natural Science Foundation of China (Grant No. 61701286).
Abstract: Synthetic speech detection is an essential task in voice security, aimed at identifying deceptive voice attacks generated by text-to-speech (TTS) or voice conversion (VC) systems. In this paper, we propose a synthetic speech detection model called TFTransformer, which models both local and global dependencies to enhance detection capability. Structurally, the model is divided into two main components: a front-end and a back-end. The front-end combines a SincLayer with two-dimensional (2D) convolution to extract high-level feature maps (HFM) that contain the local dependencies of the input speech signal. The back-end uses a time-frequency Transformer module to process these feature maps and further capture global dependencies. Furthermore, we propose TFTransformer-SE, which incorporates a channel attention mechanism within the 2D convolutional blocks. This enhancement aims to capture local dependencies more effectively, thereby improving the model's performance. Experiments on the ASVspoof 2021 LA dataset showed that the model achieved an equal error rate (EER) of 3.37% without data augmentation. On the ASVspoof 2019 LA dataset, the model achieved an EER of 0.84%, also without data augmentation. This demonstrates that combining local and global dependencies in the time-frequency domain can significantly improve detection accuracy.
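The equal error rate (EER) quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how such a value can be computed from detector scores; it is not the authors' evaluation code, and the function name `equal_error_rate`, the "higher score means bona fide" convention, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical toy scores, not taken from any ASVspoof dataset.
toy_bonafide = [0.9, 0.8, 0.7, 0.3]
toy_spoof = [0.1, 0.2, 0.4, 0.6]
toy_eer = equal_error_rate(toy_bonafide, toy_spoof)
```

With real score files, an interpolated estimate over the full detection error trade-off curve is usually preferred; the threshold sweep here only illustrates the definition.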
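Abstract: To address the performance degradation of the conventional multiple signal classification (MUSIC) algorithm in low signal-to-noise ratio (SNR) environments and with miniaturized microphone arrays, an improved MUSIC algorithm that combines the first principal vector method with subspace weighting is proposed. First, the first principal vector method is used to optimize the conventional MUSIC algorithm, yielding an improved spatial spectrum function that reduces the influence of noise on localization accuracy. Second, the eigenvalues are corrected using a least-squares method based on a double-exponential model, and the signal subspace and noise subspace are weighted. Simulation results show that the improved MUSIC algorithm effectively increases the accuracy with which miniaturized microphone arrays estimate the direction of arrival (DOA) of closely spaced sound sources under low-SNR conditions, providing a new solution for miniaturized sound source localization systems.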
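For reference, the conventional MUSIC estimator that this abstract sets out to improve can be written as below. This is the standard textbook form only; the improved spatial spectrum, the eigenvalue correction, and the subspace weights proposed in the paper are not given in the abstract and are not reproduced here. The symbols are assumptions chosen for illustration: x(t) is the array snapshot, R its covariance matrix, U_s and U_n the signal and noise eigenvector matrices, and a(θ) the steering vector for direction θ.

```latex
\mathbf{R} = \mathbb{E}\!\left[\mathbf{x}(t)\,\mathbf{x}^{H}(t)\right]
           = \mathbf{U}_s \boldsymbol{\Lambda}_s \mathbf{U}_s^{H}
           + \mathbf{U}_n \boldsymbol{\Lambda}_n \mathbf{U}_n^{H},
\qquad
P_{\mathrm{MUSIC}}(\theta) =
  \frac{1}{\mathbf{a}^{H}(\theta)\,\mathbf{U}_n \mathbf{U}_n^{H}\,\mathbf{a}(\theta)}
```

Peaks of P_MUSIC(θ) mark the estimated directions of arrival, because the steering vector of a true source is (ideally) orthogonal to the noise subspace. At low SNR or with a small array aperture, the estimated subspaces become less reliable and the peaks of nearby sources tend to merge, which is the degradation the abstract describes.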
Funding: Funded by the Directorate of Research and Community Service, Directorate General of Research and Development, Ministry of Higher Education, Science and Technology, in accordance with the Implementation Contract for the Operational Assistance Program for State Universities, Research Program Number: 109/C3/DT.05.00/PL/2025.
Abstract: Sudden wildfires cause significant global ecological damage. While satellite imagery has advanced early fire detection and mitigation, image-based systems face limitations including high false alarm rates, visual obstructions, and substantial computational demands, especially in complex forest terrains. To address these challenges, this study proposes a novel forest fire detection model utilizing audio classification and machine learning. We developed an audio-based pipeline using real-world environmental sound recordings. Sounds were converted into Mel-spectrograms and classified via a Convolutional Neural Network (CNN), enabling the capture of distinctive fire acoustic signatures (e.g., crackling, roaring) that are minimally impacted by visual or weather conditions. Internet of Things (IoT) sound sensors were crucial for generating complex environmental parameters to optimize feature extraction. The CNN model achieved high performance in stratified 5-fold cross-validation (92.4% ± 1.6 accuracy, 91.2% ± 1.8 F1-score) and on test data (94.93% accuracy, 93.04% F1-score), with 98.44% precision and 88.32% recall, demonstrating reliability across environmental conditions. These results indicate that the audio-based approach not only improves detection reliability but also markedly reduces computational overhead compared to traditional image-based methods. The findings suggest that acoustic sensing integrated with machine learning offers a powerful, low-cost, and efficient solution for real-time forest fire monitoring in complex, dynamic environments.
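The test-set figures above are linked by the definition of the F1-score as the harmonic mean of precision and recall. The short sketch below applies that definition to the reported precision and recall; the function name `f1_from_precision_recall` is an assumption for illustration, and the abstract does not state which averaging convention (for example per-class or macro) was used for the reported F1-score.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (binary F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.9844
reported_recall = 0.8832
f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 from reported precision/recall: {f1:.4f}")
```

The harmonic mean comes to roughly 0.931, close to the reported 93.04%; the small remaining gap may reflect rounding or the averaging convention, which the abstract leaves unspecified.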