Infants produce distinctive cries when they are sick, have belly pain, are uncomfortable or tired, want attention, or need a diaper change, among other needs. Because infants communicate only through such cries, caregivers have limited means of assessing what an infant needs. Many teenagers give birth at an early age and become the primary caregivers of their own babies, often without sufficient skill in recognizing the infant's urgent needs, particularly during the early stages of development. Artificial intelligence has shown promising predictive performance across supervised, unsupervised, and reinforcement learning models. This study therefore seeks to develop an Android app that discriminates between infant cries, using a convolutional neural network (CNN) as the classifier. According to the literature, audio analytics remains a largely untapped research area, which is attributed to the large and messy data it generates. CNNs are deep learning models capable of handling data with more than one dimension. To exploit this, each audio waveform was converted to an image via its Mel spectrum, and the images were classified with a computer-vision CNN model. The Librosa library was used to convert the audio to a Mel spectrum, which was then represented as pixels and served as the input for classifying the audio into classes such as sick, burping, tired, and hungry. The goal of the study was to deploy the model as an Android tool that can be used in homes and hospital facilities for round-the-clock monitoring of infants' health and social needs.
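The waveform-to-Mel-spectrum step mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the study used Librosa, whereas this example uses only the Python standard library so that the arithmetic is visible. The HTK-style mel formula, the triangular filter shape, the frame length, the sample rate, the number of bands, and all function names are assumptions chosen for illustration; the abstract does not state which settings were used. The sketch computes one column of a log-Mel image from a synthetic 1 kHz tone; stacking such columns over successive frames would give the 2D array a CNN consumes.

```python
import math

def hz_to_mel(hz):
    # HTK-style mel scale (one common convention, assumed here)
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are evenly spaced on the mel scale
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def power_spectrum(frame):
    # Naive DFT power spectrum of one real frame (bins 0..N/2);
    # slow, but dependency-free and fine for a short frame
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(re * re + im * im)
    return out

def log_mel_frame(frame, sr, n_mels):
    # One column of a log-Mel spectrogram, in decibels
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sr)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    return [10.0 * math.log10(max(e, 1e-10)) for e in energies]

# Synthetic stand-in for a cry recording: a pure 1 kHz tone
SR = 8000      # sample rate in Hz (assumed)
N_FFT = 256    # frame length in samples (assumed)
tone = [math.sin(2 * math.pi * 1000 * t / SR) for t in range(N_FFT)]

column = log_mel_frame(tone, SR, n_mels=20)
peak_band = max(range(len(column)), key=column.__getitem__)
print("strongest mel band index:", peak_band)
```

In practice a library routine would replace the naive DFT and filterbank, and the resulting decibel values would be scaled to pixel intensities before being passed to the classifier.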
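Three-dimensional (3D) multimedia technology has attracted wide attention, particularly 3D audio technology, which lags behind 3D video. Current research on 3D audio falls into two broad categories: multichannel audio techniques based on physical sound field reconstruction, and multichannel audio techniques based on perceptual sound scene reconstruction. The main representatives of physical sound field reconstruction are sound reproduction based on spherical harmonic decomposition and wave field synthesis (WFS); perceptual sound scene reconstruction mainly comprises amplitude panning (AP) and binaural reconstruction based on the head-related transfer function (HRTF). This paper introduces and compares these four classes of 3D audio technology and their typical systems, and reviews the current state and frontier techniques of three major research hotspots in 3D audio: spatial hearing mechanisms, 3D audio compression coding, and 3D audio system simplification.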