Funding: This work was supported by the National Natural Science Foundation of China under Grant No. U0835001 and by the Fundamental Research Funds for the Central Universities under Grant No. 2011PTB-00-28.
Abstract: A hierarchical method for scene analysis in audio sensor networks is proposed. The method consists of two stages: an element detection stage and an audio scene analysis stage. In the former stage, the basic audio elements are modeled by hidden Markov models (HMMs) trained offline on sufficient samples, and basic elements are adaptively added to or removed from the target element pool according to time, place, and other environmental parameters. In the latter stage, a data fusion algorithm combines the sensory information from the same area, and a role-based method then analyzes the audio scene from the fused data. Experiments show that about 70% of audio scenes are detected correctly, demonstrating that the method achieves satisfactory results.
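As a rough illustration of the element-detection stage (not the authors' implementation), the sketch below trains one Gaussian HMM per basic audio element on MFCC features and scores an unknown segment against each model; the feature settings, state count, and hmmlearn usage are assumptions.

```python
# Minimal sketch of HMM-based audio element detection (assumed setup, not the paper's code).
# Requires: numpy, librosa, hmmlearn.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load an audio file and return frame-wise MFCC features (frames x coefficients)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_element_model(train_paths, n_states=3):
    """Train one Gaussian HMM for a basic audio element from labelled clips."""
    feats = [mfcc_features(p) for p in train_paths]
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def detect_element(segment_path, element_models):
    """Score a segment against every element HMM and return the best-scoring element name."""
    feats = mfcc_features(segment_path)
    scores = {name: m.score(feats) for name, m in element_models.items()}
    return max(scores, key=scores.get)

# Hypothetical element pool; in the paper the pool is adapted to time/place context.
# element_models = {name: train_element_model(paths) for name, paths in train_sets.items()}
# print(detect_element("segment.wav", element_models))
```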
Funding: Supported by the Integration and Application Project of Meteorological Key Technology of the China Meteorological Administration (CMAGJ2012M30) and the Technology Development Projects of the Tai'an Science and Technology Bureau in 2010 (201002045) and 2011.
Abstract: An audio and video network monitoring system for weather modification operations, transmitting information over 3G, ADSL, and the Internet, has been developed and applied in the weather modification operations of Tai'an City. The all-in-one 3G audio and video network machine integrates all front-end devices used for audio and video collection, communication, power supply, and information storage, and offers wireless video transmission, clear two-way voice intercom with the command center, waterproof and dustproof operation, simple handling, good portability, and long working hours. The compressed stream is transmitted at a dynamic bandwidth, with the compression rate varying from 32 kbps to 4 Mbps under different network conditions. The system also has a forwarding mode: monitoring information from each front-end monitoring point is transmitted to the command-center server over 3G/ADSL, the server re-encodes and decodes it, and back-end users then call the images from the server, which avoids 3G network congestion caused by many users requesting front-end video at the same time. The system has been applied in surface weather modification operations of Tai'an City and has contributed greatly to transmitting operation orders in real time, monitoring, standardizing, and recording the operating process, and improving operating safety.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61762005, 61231015, 61671335, 61702472, 61701194, 61761044, and 61471271; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2015AA016306; the Hubei Province Technological Innovation Major Project under Grant No. 2016AAA015; the Science Project of the Education Department of Jiangxi Province under Grant No. GJJ150585; and the Opening Project of the Collaborative Innovation Center for Economics Crime Investigation and Prevention Technology, Jiangxi Province, under Grant No. JXJZXTCX-025.
Abstract: Non-blind audio bandwidth extension is a standard technique in contemporary audio codecs for efficiently coding audio signals at low bitrates. In most existing methods, the high-frequency signal is generated by duplicating the corresponding low frequencies together with a few transmitted high-frequency parameters. However, the perceptual quality of the coding degrades significantly when the correlation between the high and low frequencies becomes weak. In this paper, we quantitatively analyse this correlation by computing mutual information, and the results show that the correlation also exists in the low-frequency signals of context-dependent (neighboring) frames, not only the current frame. To improve the perceptual quality of coding, we propose a novel method of coarse high-frequency spectrum generation that improves on the conventional replication method: the coarse high-frequency spectra are generated by a nonlinear mapping model based on a deep recurrent neural network. Experiments confirm that the proposed method outperforms the reference methods.
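The correlation analysis can be illustrated with a simple histogram estimate of mutual information between low-band and high-band frame energies; the band split at 4 kHz, the frame parameters, and the estimator below are assumptions, not the paper's exact procedure.

```python
# Illustrative mutual-information estimate between low- and high-band frame energies.
# Assumed band split and histogram estimator; the paper's exact setup may differ.
import numpy as np
import librosa

def band_log_energies(y, sr, n_fft=1024, hop=512, split_hz=4000):
    """Return per-frame log energies of the low band (< split_hz) and high band (>= split_hz)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    low = np.log(S[freqs < split_hz].sum(axis=0) + 1e-10)
    high = np.log(S[freqs >= split_hz].sum(axis=0) + 1e-10)
    return low, high

def mutual_information(x, y, bins=32):
    """Histogram-based estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# y, sr = librosa.load("music.wav", sr=32000)   # hypothetical test file
# low, high = band_log_energies(y, sr)
# print("I(low; high) =", mutual_information(low, high))
# Shifting `low` by k frames before the call probes the context-dependent frames as well.
```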
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2015R1D1A1A01059804); by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-R2718-16-0011) supervised by the IITP (Institute for Information & communications Technology Promotion); and by the Research Grant of Kwangwoon University in 2017.
Abstract: In this paper, we present an approach to improve the accuracy of environmental sound event detection in a wireless acoustic sensor network for home monitoring. Wireless acoustic sensor nodes capture sounds in the home and simultaneously deliver them to a sink node for sound event detection. The proposed approach is composed of three modules: signal estimation, reliable sensor channel selection, and sound event detection. During signal estimation, lost packets are recovered to improve the signal quality. Next, reliable channels are selected using a multi-channel cross-correlation coefficient, which improves the computational efficiency of distant sound event detection without sacrificing performance. Finally, the signals of the two selected channels are used for environmental sound event detection based on bidirectional gated recurrent neural networks with two-channel audio features. Experiments show that the proposed approach achieves superior performance compared to the baseline.
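A simplified view of the channel-selection step is sketched below using pairwise normalized cross-correlation peaks; the paper's multi-channel cross-correlation coefficient is related but not identical, and the signal layout here is an assumption.

```python
# Simplified channel-selection sketch using pairwise normalized cross-correlation peaks.
# Not the paper's exact multi-channel cross-correlation coefficient.
import numpy as np

def max_normalized_xcorr(a, b):
    """Peak of the normalized cross-correlation between two signals."""
    a = (a - a.mean()) / (a.std() + 1e-10)
    b = (b - b.mean()) / (b.std() + 1e-10)
    c = np.correlate(a, b, mode="full") / len(a)
    return float(np.max(np.abs(c)))

def select_two_channels(channels):
    """Pick the two channels whose signals are most correlated with the others."""
    n = len(channels)
    score = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                score[i] += max_normalized_xcorr(channels[i], channels[j])
    best = np.argsort(score)[-2:]            # indices of the two most reliable channels
    return sorted(int(k) for k in best)

# Example with synthetic data: the last channel is pure noise and should not be selected.
# rng = np.random.default_rng(0)
# src = rng.standard_normal(4000)
# chans = [src + 0.1 * rng.standard_normal(4000) for _ in range(3)]
# chans.append(rng.standard_normal(4000))
# print(select_two_channels(chans))
```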
Abstract: Over the past years, we have witnessed explosive growth in the use of multimedia applications such as audio and video streaming on mobile and static devices. Multimedia streaming applications need new approaches to multimedia transmission to meet the growing volume and quality expectations of multimedia traffic. This paper studies network coding, a promising paradigm with the potential to improve network performance for multimedia streaming applications in terms of packet delivery ratio (PDR), latency, and jitter. It examines several network coding protocols for ad hoc wireless mesh networks and compares their performance on multimedia streaming applications with optimized broadcast protocols such as BCast, Simplified Multicast Forwarding (SMF), and Partial Dominant Pruning (PDP). The results show that performance increases significantly with the Random Linear Network Coding (RLNC) scheme.
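To make the RLNC idea concrete, the toy sketch below encodes packets as random linear combinations over GF(2) and checks when a receiver has collected enough independent combinations to decode; practical systems usually work over GF(2^8), so the field size and packet layout are simplifying assumptions.

```python
# Toy Random Linear Network Coding (RLNC) sketch over GF(2).
import numpy as np

def encode(packets, rng):
    """Return one coded packet: a random GF(2) combination of the source packets, with its coefficients."""
    coeffs = rng.integers(0, 2, size=len(packets), dtype=np.uint8)
    coded = np.zeros_like(packets[0])
    for c, p in zip(coeffs, packets):
        if c:
            coded ^= p                       # addition in GF(2) is XOR
    return coeffs, coded

def decodable(coeff_rows):
    """A receiver can decode once the collected coefficient vectors have full rank over GF(2)."""
    m = np.array(coeff_rows, dtype=np.uint8) % 2
    rank, rows, cols = 0, m.shape[0], m.shape[1]
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]  # move the pivot row into place
        for r in range(rows):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]              # eliminate the column elsewhere
        rank += 1
    return rank == cols

# rng = np.random.default_rng(1)
# src = [rng.integers(0, 256, 8, dtype=np.uint8) for _ in range(4)]  # 4 source packets
# received = [encode(src, rng) for _ in range(6)]                    # 6 coded packets arrive
# print(decodable([c for c, _ in received]))
```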
Funding: Supported by the National Natural Science Foundation of China (Nos. 61571044, 11590772, and 61473041).
Abstract: With the development of multichannel audio systems, corresponding audio quality assessment techniques, especially objective prediction models, have received increasing attention. Existing methods, such as PEAQ (Perceptual Evaluation of Audio Quality) recommended by the ITU, usually give poor results when assessing multichannel audio and correlate weakly with subjective scores. In this paper, a novel two-layer model based on Multiple Linear Regression (MLR) and a Neural Network (NN) is proposed. The first layer derives two indicators of multichannel audio, the Audio Quality Score (AQS) and the Spatial Perception Score (SPS), and the second layer outputs the overall score. The results show that this model not only improves the correlation with subjective test scores by 30.7% and decreases the Root Mean Square Error (RMSE) by 44.6%, but also adds two new indicators, AQS and SPS, which help reflect multichannel audio quality more clearly.
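A minimal sketch of the two-layer idea is given below, under assumed placeholder features and training scores: two linear regressions produce AQS and SPS, and a small neural network maps them to the overall score. The feature definitions and network size are not the paper's configuration.

```python
# Sketch of a two-layer MLR + NN quality model (illustrative, not the paper's configuration).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

class TwoLayerQualityModel:
    def __init__(self):
        self.aqs_model = LinearRegression()           # layer 1: audio quality score
        self.sps_model = LinearRegression()           # layer 1: spatial perception score
        self.overall = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)

    def fit(self, timbre_feats, spatial_feats, aqs, sps, mos):
        """Fit the two first-layer regressions, then the second-layer network on their outputs."""
        self.aqs_model.fit(timbre_feats, aqs)
        self.sps_model.fit(spatial_feats, sps)
        inter = np.column_stack([self.aqs_model.predict(timbre_feats),
                                 self.sps_model.predict(spatial_feats)])
        self.overall.fit(inter, mos)                  # layer 2: (AQS, SPS) -> overall score
        return self

    def predict(self, timbre_feats, spatial_feats):
        aqs = self.aqs_model.predict(timbre_feats)
        sps = self.sps_model.predict(spatial_feats)
        return aqs, sps, self.overall.predict(np.column_stack([aqs, sps]))

# With synthetic placeholder data:
# rng = np.random.default_rng(0)
# t, s = rng.standard_normal((50, 6)), rng.standard_normal((50, 4))
# aqs, sps = t @ rng.standard_normal(6), s @ rng.standard_normal(4)
# mos = 0.6 * aqs + 0.4 * sps
# model = TwoLayerQualityModel().fit(t, s, aqs, sps, mos)
```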
Abstract: Infants produce distinctive cries when they are sick, have belly pain, feel discomfort or tiredness, want attention, or need a diaper change, among other needs. Knowledge for assessing these needs is limited, since infants relay information only through such suggestive cries. Many teenagers give birth at an early age and become the primary monitors of their own babies, yet they often lack sufficient skills to read an infant's needs, especially during the early stages of development. Artificial intelligence has shown promising predictive capability across supervised, unsupervised, and reinforcement learning models. This study therefore develops an Android app that discriminates infant cries by leveraging convolutional neural networks (CNNs) as the classifier. Audio analytics remains a relatively untapped area in the literature, as it is associated with messy and voluminous data. This study therefore relies on CNNs, deep learning models capable of handling more than one-dimensional data. To achieve this, the waveform audio was converted to images through Mel-frequency spectrograms, which were classified using a computer-vision CNN. The Librosa library was used to convert the audio to Mel spectrograms, which were then presented as pixels serving as the input for classifying audio classes such as sick, burping, tired, and hungry. The goal is to incorporate the model into an Android tool that can be used in homes and hospital facilities for round-the-clock monitoring of an infant's health and social needs.
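A minimal sketch of the described pipeline, with an assumed spectrogram size and network architecture (the study's Android deployment and exact CNN are not reproduced), is given below: the waveform is turned into a log-Mel spectrogram with Librosa and fed to a small CNN over the four cry classes.

```python
# Sketch of the cry-classification pipeline: waveform -> log-Mel spectrogram "image" -> small CNN.
# The spectrogram size and network architecture are illustrative assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn

CLASSES = ["sick", "burping", "tired", "hungry"]    # classes named in the study

def log_mel_image(path, sr=16000, n_mels=64, frames=128):
    """Load a cry recording and return a fixed-size log-Mel spectrogram tensor (1 x n_mels x frames)."""
    y, sr = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S = librosa.power_to_db(S, ref=np.max)
    S = librosa.util.fix_length(S, size=frames, axis=1)   # pad or trim to a fixed width
    return torch.tensor(S, dtype=torch.float32).unsqueeze(0)

class CryCNN(nn.Module):
    """Small CNN over the spectrogram image: two conv blocks, then a linear classifier."""
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 32, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# x = log_mel_image("cry.wav").unsqueeze(0)   # hypothetical file; batch shape (1, 1, 64, 128)
# logits = CryCNN()(x)
# print(CLASSES[int(logits.argmax(1))])
```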