Funding: Funded by the Directorate of Research and Community Service, Directorate General of Research and Development, Ministry of Higher Education, Science and Technology, in accordance with the Implementation Contract for the Operational Assistance Program for State Universities, Research Program Number: 109/C3/DT.05.00/PL/2025.
Abstract: Sudden wildfires cause significant global ecological damage. While satellite imagery has advanced early fire detection and mitigation, image-based systems face limitations including high false alarm rates, visual obstructions, and substantial computational demands, especially in complex forest terrains. To address these challenges, this study proposes a novel forest fire detection model utilizing audio classification and machine learning. We developed an audio-based pipeline using real-world environmental sound recordings. Sounds were converted into Mel-spectrograms and classified via a Convolutional Neural Network (CNN), enabling the capture of distinctive fire acoustic signatures (e.g., crackling, roaring) that are minimally impacted by visual or weather conditions. Internet of Things (IoT) sound sensors were crucial for generating complex environmental parameters to optimize feature extraction. The CNN model achieved high performance in stratified 5-fold cross-validation (92.4% ± 1.6% accuracy, 91.2% ± 1.8% F1-score) and on test data (94.93% accuracy, 93.04% F1-score), with 98.44% precision and 88.32% recall, demonstrating reliability across environmental conditions. These results indicate that the audio-based approach not only improves detection reliability but also markedly reduces computational overhead compared to traditional image-based methods. The findings suggest that acoustic sensing integrated with machine learning offers a powerful, low-cost, and efficient solution for real-time forest fire monitoring in complex, dynamic environments.
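The Mel-spectrogram front end described above rests on a standard construction: map frequencies to the perceptual Mel scale and apply triangular filters to an FFT magnitude spectrum. The sketch below is a minimal, stdlib-only illustration of that filterbank, not the authors' implementation; the function names and default parameters (40 filters, 512-point FFT, 16 kHz sample rate) are assumptions for the example.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (O'Shaughnessy formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Build triangular Mel filters over an FFT magnitude spectrum.

    Returns n_filters rows, each of length n_fft // 2 + 1; multiplying a
    power spectrum by these rows yields one Mel-spectrogram frame.
    """
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    # n_filters + 2 equally spaced points on the Mel scale
    mel_points = [mel_lo + i * (mel_hi - mel_lo) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mel_points]
    banks = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):      # rising edge of the triangle
            if center > left:
                filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            if right > center:
                filt[k] = (right - k) / (right - center)
        banks.append(filt)
    return banks
```

In practice a library such as librosa performs this step (plus a log compression) before the frames are stacked into the image-like input a CNN consumes.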
Abstract: A state machine can make program design quicker, simpler, and more efficient. This paper describes in detail the model for a state machine and the idea behind its design, and presents the design process of a state machine through the example of an audio signal generator system based on LabVIEW. The result shows that the introduction of a state machine can make complex design processes clearer and the revision of programs easier.
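The state-machine pattern the paper applies in LabVIEW is language-agnostic: a transition table maps (current state, event) pairs to next states, so control flow becomes data. A minimal Python sketch of such a machine for a signal generator follows; the state names and events are hypothetical, chosen only to mirror a generator's configure/run/stop lifecycle.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    CONFIGURE = auto()
    GENERATE = auto()
    STOP = auto()

class SignalGeneratorFSM:
    """A minimal table-driven state machine for a signal generator."""

    # (current state, event) -> next state; unlisted pairs are invalid
    TRANSITIONS = {
        (State.IDLE, "start"): State.CONFIGURE,
        (State.CONFIGURE, "configured"): State.GENERATE,
        (State.GENERATE, "stop"): State.STOP,
        (State.STOP, "reset"): State.IDLE,
    }

    def __init__(self):
        self.state = State.IDLE
        self.history = [self.state]  # keeps revisions and debugging easy

    def dispatch(self, event):
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"no transition for {event!r} in {self.state}")
        self.state = nxt
        self.history.append(nxt)
        return nxt
```

Adding a new mode means adding table rows rather than rewiring nested conditionals, which is the maintainability gain the abstract credits to the pattern.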
Abstract: With the increasing popularity of solid-state lighting devices, Visible Light Communication (VLC) is globally recognized as an advanced and promising technology for short-range, high-speed, and large-capacity wireless data transmission. In this paper, we propose a prototype of a real-time audio and video broadcast system using inexpensive, commercially available light-emitting diode (LED) lamps. Experimental results show that real-time, high-quality audio and video over a maximum distance of 3 m can be achieved through proper layout of the LED sources and improved light concentration. A lighting model of the room environment is designed and simulated, indicating a close relationship between the layout of the light sources and the distribution of illuminance.
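The abstract does not give its lighting model, but indoor VLC illuminance studies conventionally use the generalized Lambertian radiation model, in which illuminance falls off with the emission angle raised to a mode number m derived from the LED's half-power semi-angle. The sketch below is a hedged illustration of that textbook model (it is an assumption that the paper's simulation follows it); I0 denotes the LED's center luminous intensity in candela, and the geometry assumes a downward-facing ceiling LED and a horizontal receiving plane.

```python
import math

def lambertian_order(semi_angle_deg):
    """Lambertian mode number m from the LED's half-power semi-angle."""
    return -math.log(2.0) / math.log(math.cos(math.radians(semi_angle_deg)))

def horizontal_illuminance(I0, led_xy, led_h, point_xy, semi_angle_deg=60.0):
    """Illuminance (lux) on the floor from one ceiling LED at height led_h.

    For a downward LED over a horizontal plane, the emission and incidence
    angles coincide, so cos(phi)^m * cos(psi) becomes cos(phi)^(m + 1).
    """
    m = lambertian_order(semi_angle_deg)
    dx = point_xy[0] - led_xy[0]
    dy = point_xy[1] - led_xy[1]
    d = math.sqrt(dx * dx + dy * dy + led_h * led_h)  # LED-to-point distance
    cos_phi = led_h / d
    return (m + 1) / (2 * math.pi) * I0 * (cos_phi ** (m + 1)) / (d * d)
```

Summing this term over every lamp in a candidate layout yields the illuminance distribution whose dependence on source placement the abstract reports.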
Funding: Supported by the National Development and Reform Commission Information Security Special Fund and the National Key Basic Research Program of China (973 Program) under Grant No. 2007CB311203.
Abstract: With the development of cloud-based data centers and multimedia technologies, cloud-based multimedia service systems have received more and more attention, and audio highlight detection plays an important role in them. In this paper, we propose a novel highlight detection method that extracts audio highlight effects for the cloud-based multimedia service system using an unsupervised approach. In the proposed method, we first extract audio features for each audio document. A spectral clustering scheme is then used to decompose the audio document into several audio effects, and the TF-IDF method is introduced to label the highlight effect. We design experiments to evaluate the performance of the proposed method, and the experimental results show that it achieves satisfying results.
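The TF-IDF weighting used for labeling treats each audio document as a bag of discrete effect tokens: an effect that is frequent within one document but rare across the collection scores highly. A minimal sketch of that scoring, assuming documents are already reduced to token lists by the clustering step (the tokenization itself is not shown in the abstract):

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score effect tokens per document.

    documents: list of token lists (one list per audio document).
    Returns one dict per document mapping token -> TF-IDF weight, using
    raw idf = log(N / df) with no smoothing.
    """
    n = len(documents)
    df = Counter()                      # document frequency of each token
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

Under this scheme an effect appearing in every document gets weight zero, while a distinctive effect (a plausible highlight candidate) dominates its document's scores.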
Abstract: Low- and audio-frequency internal friction (IF) measurements in the Y_(1)Ba_(2)Cu_(3)O_(7-x) system showed five IF peaks at 110, 200 (2430 Hz), 220, 393, and 477 K (~1 Hz), respectively. The 477 K peak rises gradually as oxygen atoms escape from the specimen during measurement in vacuum and shifts to higher temperature with increasing frequency. The 477 K peak is attributed to the diffusional relaxation of oxygen atoms associated with the one-dimensional Cu-O chains in the orthorhombic phase.
Funding: Supported by the Integration and Application Project of Meteorological Key Technology of the China Meteorological Administration (CMAGJ2012M30) and the Technology Development Projects of Tai'an Science and Technology Bureau in 2010 (201002045) and 2011.
Abstract: An audio and video network monitoring system for weather modification operations, transmitting information over 3G, ADSL, and the Internet, has been developed and applied in the weather modification operations of Tai'an City. The all-in-one 3G audio and video network machine highly integrates all front-end devices used for audio and video collection, communication, power supply, and information storage, and offers wireless video transmission, clear two-way voice intercom with the command center, waterproof and dustproof construction, simple operation, good portability, and long working hours. The system's compressed stream is transmitted at a dynamic bandwidth, with the compression rate varying from 32 kbps to 4 Mbps under different network conditions. The system has a forwarding mode: monitoring information from each front-end monitoring point is transmitted to the server of the command center over 3G/ADSL, the server encodes and decodes it again, and back-end users then call images from the server, which addresses the 3G network congestion caused by many users calling front-end video at the same time. The system has been applied in the surface weather modification operations of Tai'an City and has made a great contribution to transmitting operation orders in real time; monitoring, standardizing, and recording the operating process; and improving operating safety.
Abstract: A large part of our daily lives is spent with audio information. The colossal amount of acoustic information and the very short processing times required frequently present massive obstacles, creating a need for applications and methodologies capable of automatically analyzing such content. These technologies can be applied in automatic content analysis and emergency response systems. Breaks in manual communication usually occur in emergencies, leading to accidents and equipment damage. An audio signal serves well here by carrying a warning underground, which prompts action from an emergency management team at the surface. This paper therefore seeks to design and simulate an audio signal alerting and automatic control system using Unity Pro XL to substitute for manual communication of emergencies and manual control of equipment. Sound data were trained using a neural network technique of machine learning. The metrics used are Fast Fourier transform magnitude, zero-crossing rate, root mean square, and percentage error. Sounds were detected with an error of approximately 17%; thus, the system can detect sounds with an accuracy of 83%. With more training data, the system can detect sounds with minimal or no error. The paper therefore has critical policy implications for communication, safety, and health in underground mines.
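The three signal metrics the abstract names have standard definitions, sketched below in stdlib-only Python as an illustration rather than the paper's code. The DFT here is the naive O(n²) form for clarity; a real system would use an FFT for the magnitude spectrum.

```python
import cmath
import math

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

def root_mean_square(x):
    """RMS amplitude, a simple loudness proxy."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def dft_magnitude(x):
    """Naive DFT magnitude spectrum (an FFT replaces this in practice)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]
```

Feature vectors built from these quantities per audio frame are what the neural network classifies into alert versus non-alert sounds.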
Funding: Supported by the Beijing Natural Science Foundation (5252014) and the National Natural Science Foundation of China (62303063).
Abstract: Passive acoustic monitoring (PAM) technology is increasingly becoming one of the mainstream methods for bird monitoring. However, detecting bird audio within complex natural acoustic environments using PAM devices remains a significant challenge. To enhance the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). This method incorporates per-channel energy normalization (PCEN) to suppress noise in the input audio and utilizes mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. It achieves deep feature-level fusion of MFCC and FCM on the channel dimension through two independent multi-layer convolutional network branches, and further integrates Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules to enhance the fusion of these two deep features. Experimental results on the DCASE2018 BAD dataset show that the proposed method achieved an ACC of 91.4% and an AUC of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. The method also demonstrated detection ACC above 92% and AUC values above 0.987 on datasets from three sites in different natural scenes in Beijing. Testing on the NVIDIA Jetson Nano indicated that the method achieved an ACC of 89.48% when processing an average of 10 s of audio, with a response time of only 0.557 s, showing excellent processing efficiency. This study provides an effective method for filtering non-bird-vocalization audio in bird vocalization monitoring devices, which helps to save edge storage and information transmission costs, and has significant application value for wild bird monitoring and ecological research.
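The PCEN front end the method uses replaces static log compression with an adaptive gain: each frequency channel is divided by a smoothed version of its own recent energy, which suppresses slowly varying background noise. A minimal sketch of the standard PCEN recurrence (Wang et al.'s formulation; the parameter defaults are common choices, not values from this paper):

```python
def pcen(frames, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization over spectrogram frames.

    frames: list of frames (time-major), each a list of non-negative
    channel energies E(t, f). A first-order IIR smoother
        M(t, f) = (1 - s) * M(t-1, f) + s * E(t, f)
    provides the adaptive gain, followed by root compression:
        PCEN = (E / (eps + M)**alpha + delta)**r - delta**r
    """
    out = []
    m = list(frames[0])  # initialize the smoother with the first frame
    for frame in frames:
        m = [(1 - s) * mv + s * ev for mv, ev in zip(m, frame)]
        out.append([(ev / ((eps + mv) ** alpha) + delta) ** r - delta ** r
                    for ev, mv in zip(frame, m)])
    return out
```

Because steady-state energy largely cancels against its own smoothed estimate, transient events such as bird calls stand out relative to constant wind or traffic noise.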
Funding: Supported by the National Natural Science Foundation of China (62106214), the Hebei Natural Science Foundation (D2024203008), and the Provincial Key Laboratory Performance Subsidy Project (22567612H).
Abstract: In recent years, audio pattern recognition has emerged as a key area of research, driven by its applications in human-computer interaction, robotics, and healthcare. Traditional methods, which rely heavily on handcrafted features such as Mel filters, often suffer from information loss and limited feature representation capabilities. To address these limitations, this study proposes an innovative end-to-end audio pattern recognition framework that directly processes raw audio signals, preserving the original information and extracting effective classification features. The proposed framework utilizes a dual-branch architecture: a global refinement module that retains channel and temporal details, and a multi-scale embedding module that captures high-level semantic information. Additionally, a guided fusion module integrates complementary features from both branches, ensuring a comprehensive representation of the audio data. Specifically, the multi-scale audio context embedding module is designed to effectively extract spatiotemporal dependencies, while the global refinement module aggregates multi-scale channel and temporal cues for enhanced modeling. The guided fusion module leverages these features to achieve efficient integration of complementary information, resulting in improved classification accuracy. Experimental results demonstrate the model's superior performance on multiple datasets, including ESC-50, UrbanSound8K, RAVDESS, and CREMA-D, with classification accuracies of 93.25%, 90.91%, 92.36%, and 70.50%, respectively. These results highlight the robustness and effectiveness of the proposed framework, which significantly outperforms existing approaches. By addressing critical challenges such as information loss and limited feature representation, this work provides new insights and methodologies for advancing audio classification and multimodal interaction systems.
Funding: Funded by the National Science and Technology Council of Taiwan under grant number NSTC 113-2221-E-035-058.
Abstract: With the rapid expansion of multimedia data, protecting digital information has become increasingly critical. Reversible data hiding offers an effective solution by allowing sensitive information to be embedded in multimedia files while enabling full recovery of the original data after extraction. Audio, as a vital medium in communication, entertainment, and information sharing, demands the same level of security as images. However, embedding data in encrypted audio poses unique challenges due to the trade-offs between security, data integrity, and embedding capacity. This paper presents a novel interpolation-based reversible data hiding algorithm for encrypted audio that achieves scalable embedding capacity. By increasing sample density through interpolation, embedding opportunities are significantly enhanced while maintaining encryption throughout the process. The method further integrates multiple most significant bit (multi-MSB) prediction and Huffman coding to optimize compression and embedding efficiency. Experimental results on standard audio datasets demonstrate the proposed algorithm's ability to embed up to 12.47 bits per sample, with over 9.26 bits per sample available as pure embedding capacity, while preserving full reversibility. These results confirm the method's suitability for secure applications that demand high embedding capacity and perfect reconstruction of the original audio. This work advances reversible data hiding in encrypted audio by offering a secure, efficient, and fully reversible framework.
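The core idea of interpolation-based embedding, that inserted samples create payload room while the original samples pass through untouched, can be shown in a deliberately simplified sketch. This is not the paper's algorithm (which operates on encrypted audio and adds multi-MSB prediction with Huffman coding): here each interpolated sample carries only a single LSB of payload, and reversibility follows trivially because extraction discards the interpolated samples.

```python
def embed(samples, bits):
    """Insert a midpoint sample between each pair and hide one bit in its LSB.

    samples: list of integer audio samples; bits: list of 0/1 payload bits.
    Capacity in this toy scheme is len(samples) - 1 bits.
    """
    assert len(bits) <= len(samples) - 1, "not enough interpolation slots"
    out, bi = [samples[0]], 0
    for prev, cur in zip(samples, samples[1:]):
        interp = (prev + cur) // 2            # midpoint prediction
        if bi < len(bits):
            interp = (interp & ~1) | bits[bi]  # overwrite LSB with payload bit
            bi += 1
        out.extend([interp, cur])
    return out

def extract(stego, n_bits):
    """Recover the payload and the original samples exactly.

    Original samples sit at even indices; interpolated carriers at odd ones.
    """
    originals = stego[0::2]
    bits = [s & 1 for s in stego[1::2][:n_bits]]
    return originals, bits
```

The paper's reported capacities (over 9 bits per sample of pure payload) come from embedding many bits per interpolated sample via MSB prediction and entropy coding rather than this one-bit-per-slot illustration.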