Funding: supported by the Shandong Provincial Natural Science Foundation under project ZR2022MF330 and by the National Natural Science Foundation of China under Grant No. 61701286.
Abstract: Synthetic speech detection is an essential task in the field of voice security, aimed at identifying deceptive voice attacks generated by text-to-speech (TTS) or voice conversion (VC) systems. In this paper, we propose a synthetic speech detection model called TFTransformer, which integrates both local and global features to enhance detection capability by effectively modeling local and global dependencies. Structurally, the model is divided into two main components: a front-end and a back-end. The front-end uses a combination of a SincLayer and two-dimensional (2D) convolutions to extract high-level feature maps (HFM) that capture the local dependencies of the input speech signals. The back-end uses a time-frequency Transformer module to process these feature maps and further capture global dependencies. Furthermore, we propose TFTransformer-SE, which incorporates a channel attention mechanism within the 2D convolutional blocks; this enhancement captures local dependencies more effectively, thereby improving the model's performance. Experiments were conducted on the ASVspoof 2021 LA dataset, where the model achieved an equal error rate (EER) of 3.37% without data augmentation. We also evaluated the model on the ASVspoof 2019 LA dataset, achieving an EER of 0.84%, again without data augmentation. These results demonstrate that combining local and global dependencies in the time-frequency domain can significantly improve detection accuracy.
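As a worked illustration of the metric reported above, the sketch below computes an equal error rate from bona fide and spoof detection scores. The function name and the simple threshold sweep are our own illustration under the usual EER definition, not the ASVspoof evaluation toolkit.

```python
import numpy as np

def compute_eer(bona_scores, spoof_scores):
    """EER: operating point where false-acceptance rate equals false-rejection rate.

    Scores are assumed to be higher for bona fide speech; every observed score
    is tried as a decision threshold and the closest FAR/FRR crossing is returned.
    """
    thresholds = np.sort(np.concatenate([bona_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])  # spoof accepted
    frr = np.array([(bona_scores < t).mean() for t in thresholds])    # bona fide rejected
    idx = np.argmin(np.abs(far - frr))                                # closest crossing
    return (far[idx] + frr[idx]) / 2
```

On perfectly separable scores the EER is zero; the 3.37% and 0.84% figures above correspond to the residual overlap between the two score distributions on the respective datasets.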
Funding: funded by the Directorate of Research and Community Service, Directorate General of Research and Development, Ministry of Higher Education, Science and Technology, in accordance with the Implementation Contract for the Operational Assistance Program for State Universities, Research Program Number: 109/C3/DT.05.00/PL/2025.
Abstract: Sudden wildfires cause significant global ecological damage. While satellite imagery has advanced early fire detection and mitigation, image-based systems face limitations including high false-alarm rates, visual obstructions, and substantial computational demands, especially in complex forest terrain. To address these challenges, this study proposes a novel forest fire detection model utilizing audio classification and machine learning. We developed an audio-based pipeline using real-world environmental sound recordings. Sounds were converted into Mel-spectrograms and classified via a convolutional neural network (CNN), enabling the capture of distinctive fire acoustic signatures (e.g., crackling, roaring) that are minimally affected by visual or weather conditions. Internet of Things (IoT) sound sensors were crucial for gathering the complex environmental parameters used to optimize feature extraction. The CNN achieved high performance in stratified 5-fold cross-validation (92.4% ± 1.6% accuracy, 91.2% ± 1.8% F1-score) and on held-out test data (94.93% accuracy, 93.04% F1-score), with 98.44% precision and 88.32% recall, demonstrating reliability across environmental conditions. These results indicate that the audio-based approach not only improves detection reliability but also markedly reduces computational overhead compared with traditional image-based methods. The findings suggest that acoustic sensing integrated with machine learning offers a powerful, low-cost, and efficient solution for real-time forest fire monitoring in complex, dynamic environments.
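To illustrate the Mel-spectrogram front end described above, here is a minimal NumPy construction of a triangular mel filterbank. The parameter values (40 bands, 512-point FFT, 16 kHz sample rate) are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    # Band edges spaced uniformly on the mel scale, then mapped to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:  # rising edge of the triangle
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:  # falling edge of the triangle
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb
```

Multiplying this matrix with a framewise FFT power spectrum and taking the logarithm yields the log-mel features that the CNN classifier consumes; in practice a library such as librosa performs the same construction.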
Abstract: Besides degrading listening quality and wearing comfort, echo in digital hearing aids can easily cause howling, making the system unstable. To balance the adaptive algorithm's convergence speed against its steady-state error, this paper studies the echo cancellation model and adaptive echo cancellation algorithms for hearing aids and proposes an improved normalized least mean square (NLMS) algorithm. The method uses the error signal to automatically adjust the step-size factor, accelerating filter convergence and reducing steady-state error. Simulation results show that the improved algorithm performs better at echo cancellation: comparing the algorithms' mean square error (MSE) curves, it achieves faster convergence and lower steady-state error, with an average MSE 4.01 dB below that of the standard NLMS algorithm. The improved NLMS algorithm performs well, runs in relatively little time, and is easy to implement.
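The abstract describes an NLMS variant whose step size is adjusted by the error signal, but the exact update rule is not given there. As a minimal sketch, the code below implements the standard NLMS baseline that the improved algorithm is compared against; the filter order and fixed step size are illustrative choices, and the paper's error-driven step adaptation would replace the constant `mu`.

```python
import numpy as np

def nlms_echo_cancel(x, d, order=32, mu=0.5, eps=1e-8):
    """Standard NLMS: adapt a FIR filter w so that w @ u tracks the echo in d.

    x: far-end (reference) signal; d: microphone signal containing the echo.
    Returns the adapted weights and the residual (error) signal.
    """
    w = np.zeros(order)
    e = np.zeros(len(x))
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]    # most recent input samples, newest first
        y = w @ u                           # echo estimate
        e[n] = d[n] - y                     # residual after cancellation
        # The paper's improvement would modulate mu with the error signal here;
        # a constant mu is the plain NLMS baseline.
        w += mu * e[n] * u / (eps + u @ u)  # input-power-normalized update
    return w, e
```

With a noiseless echo path and white input, the residual decays toward zero and the leading weights converge to the echo path's impulse response; the steady-state level of the MSE curve is what the improved algorithm reduces by 4.01 dB on average.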