Journal Articles: 7 results found
1. Deep Scalogram Representations for Acoustic Scene Classification (cited by: 5)
Authors: Zhao Ren, Kun Qian, Zixing Zhang, Vedhas Pandit, Alice Baird, Björn Schuller. IEEE/CAA Journal of Automatica Sinica (SCIE/EI/CSCD), 2018, Issue 3, pp. 662-669.
Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The presented approach firstly transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are fed into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, the predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach using the acoustic scene classification data set of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0% from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, an improvement over the 61.0% baseline result provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms are capable of improving the classification accuracy when fused with a spectrogram-based system.
Keywords: acoustic scene classification (ASC); (bidirectional) gated recurrent neural networks ((B)GRNNs); convolutional neural networks (CNNs); deep scalogram representation; spectrogram representation
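As a rough illustration of one branch of the pipeline described in this abstract, the sketch below computes a wavelet scalogram for each audio segment, extracts fully connected features from a pre-trained CNN, and passes the segment sequence through a bidirectional GRU. PyWavelets' Morlet wavelet stands in for the bump and Morse scalograms, and the VGG16 backbone, all hyperparameters, and the 15-class output are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: CWT scalogram -> pre-trained CNN features -> bidirectional GRU.
# The Morlet wavelet is a stand-in for the paper's bump/Morse scalograms.
import numpy as np
import pywt
import torch
import torch.nn as nn
from torchvision import models

def scalogram(segment: np.ndarray, sr: int, n_scales: int = 224) -> torch.Tensor:
    """Continuous wavelet transform of one audio segment, resized for a CNN."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1.0 / sr)
    img = torch.tensor(np.abs(coeffs), dtype=torch.float32)
    img = torch.nn.functional.interpolate(
        img[None, None], size=(224, 224), mode="bilinear", align_corners=False
    )
    return img.repeat(1, 3, 1, 1)  # fake RGB for an ImageNet-pre-trained CNN

cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
cnn.classifier = cnn.classifier[:-1]   # keep the 4096-d fully connected features
cnn.eval()

gru = nn.GRU(input_size=4096, hidden_size=128, bidirectional=True, batch_first=True)
head = nn.Linear(2 * 128, 15)          # DCASE 2017 has 15 acoustic scene classes

segments = [np.random.randn(22050) for _ in range(10)]  # ten dummy 1-s segments
with torch.no_grad():
    feats = torch.stack([cnn(scalogram(s, 22050)).squeeze(0) for s in segments])
    out, _ = gru(feats[None])          # sequence of segment-level features
    probs = torch.softmax(head(out[:, -1]), dim=-1)
```

As we read the abstract, the margin sampling value fusion would then select, per test clip, among the three systems based on the difference between the two highest class posteriors.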
2. Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform
Authors: Simone Hantke, Tobias Olenyi, Christoph Hausner, Tobias Appel, Björn Schuller. International Journal of Automation and Computing (EI/CSCD), 2019, Issue 4, pp. 427-436.
In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. In addition, via this associated speech analysis web interface, VoiLA encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits, such as gender, valence, arousal, dominance, and 24 different discrete emotions, is employed. The analysis results are visualised in a way that is easily interpretable by laymen, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations, which indicate that it is fun and easy to use, and that it provides accurate and informative results.
Keywords: human computation; speech analysis; crowdsourcing; gamified data collection; survey
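For a sense of how a browser-recording-to-analysis loop like VoiLA's might look server-side, here is a minimal sketch assuming a Flask backend, librosa feature extraction, and a pre-trained scikit-learn model. The route, file names, and model are hypothetical, since the platform's actual implementation is not public.

```python
# Hypothetical VoiLA-style endpoint: accept a browser recording, extract simple
# MFCC statistics, and return layman-friendly speaker-state predictions.
import io

import joblib
import librosa
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical pre-trained multi-output model predicting (arousal, valence).
model = joblib.load("arousal_valence_svm.joblib")

@app.post("/analyse")
def analyse():
    # Decode the uploaded recording and compute MFCC mean/std functionals.
    wav_bytes = request.files["audio"].read()
    y, sr = librosa.load(io.BytesIO(wav_bytes), sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])[None, :]
    arousal, valence = model.predict(feats)[0]
    return jsonify({"arousal": str(arousal), "valence": str(valence)})

if __name__ == "__main__":
    app.run(port=5000)
```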
3. Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance (cited by: 1)
Authors: Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller. Journal of Computer Science & Technology (SCIE/EI/CSCD), 2024, Issue 4, pp. 895-911.
Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front end of the target audio application. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent application. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In our experiments, we consider four representative applications to evaluate our training paradigm: ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks involving semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios, for a wide range of computer audition tasks in everyday-life noisy environments.
Keywords: audio enhancement; computer audition; joint optimisation; multi-task learning; voice suppression
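The sample-importance idea, as we read the abstract, is to let each sample's downstream performance steer how strongly it drives the enhancement objective. The toy PyTorch sketch below turns per-sample cross-entropy losses into detached weights on a reconstruction term; the networks, weighting scheme, and loss combination are illustrative placeholders, not the paper's exact formulation.

```python
# Toy joint training: per-sample downstream loss weights the AE objective,
# so harder samples steer the enhancement module more strongly.
import torch
import torch.nn as nn

enhancer = nn.Sequential(nn.Linear(257, 257), nn.Sigmoid())   # toy masking AE
recogniser = nn.Sequential(nn.Linear(257, 10))                # toy downstream task
opt = torch.optim.Adam(
    list(enhancer.parameters()) + list(recogniser.parameters()), lr=1e-4
)
task_loss = nn.CrossEntropyLoss(reduction="none")             # keep per-sample losses

def train_step(noisy, clean, labels):
    enhanced = enhancer(noisy) * noisy                        # masked spectra
    per_sample = task_loss(recogniser(enhanced), labels)      # (batch,) losses
    # Normalised per-sample loss acts as an importance weight on reconstruction.
    weights = (per_sample / per_sample.sum()).detach()
    ae_loss = (weights * ((enhanced - clean) ** 2).mean(dim=1)).sum()
    loss = per_sample.mean() + ae_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

noisy, clean = torch.rand(8, 257), torch.rand(8, 257)
labels = torch.randint(0, 10, (8,))
print(train_step(noisy, clean, labels))
```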
4. Detecting somatisation disorder via speech: introducing the Shenzhen Somatisation Speech Corpus
Authors: Kun Qian, Ruolan Huang, Zhihao Bao, Yang Tan, Zhonghao Zhao, Mengkai Sun, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto. Intelligent Medicine (EI/CSCD), 2024, Issue 2, pp. 96-103.
Objective: Speech recognition technology is widely used as a mature technical approach in many fields. In the study of depression recognition, speech signals are commonly used due to their convenience and ease of acquisition. Although speech analysis is popular in depression recognition research, it has been little studied in somatisation disorder recognition, owing to the lack of a publicly accessible database of relevant speech and of benchmark studies. To this end, we introduce our somatisation disorder speech database and give benchmark results. Methods: By collecting speech samples of somatisation disorder patients, in cooperation with Shenzhen University General Hospital, we introduce our somatisation disorder speech database, the Shenzhen Somatisation Speech Corpus (SSSC). Moreover, we propose a benchmark for SSSC using classic acoustic features and a machine learning model. Results: To obtain a more scientific benchmark, we compare and analyse the performance of different acoustic features, i.e., the full ComParE feature set, or only Mel frequency cepstral coefficients (MFCCs), fundamental frequency (F0), and the frequency and bandwidth of the formants (F1-F3). The best result in our benchmark is a 76.0% unweighted average recall, achieved by a support vector machine with formants F1-F3. Conclusion: SSSC may bridge a research gap in somatisation disorder, providing researchers with a publicly accessible speech database. In addition, the benchmark results show the scientific validity and feasibility of computer audition for speech-based recognition of somatisation disorder.
Keywords: somatisation disorder; machine learning; healthcare; computer audition
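The benchmark recipe reduces to a familiar pattern: a support vector machine on a small acoustic feature vector, scored with unweighted average recall (UAR), i.e., macro-averaged recall. The sketch below reproduces that pattern on random placeholder features, since SSSC itself and the exact feature extraction are not bundled here.

```python
# Minimal SVM + UAR evaluation, in the spirit of the SSSC benchmark.
# Features are random placeholders for formant-based descriptors.
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # e.g., mean frequency/bandwidth of F1-F3
y = rng.integers(0, 2, size=200)     # 0 = control, 1 = somatisation disorder

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_tr, y_tr)

# UAR = macro-averaged recall, the standard metric in computational paralinguistics.
uar = recall_score(y_te, clf.predict(X_te), average="macro")
print(f"UAR: {uar:.3f}")
```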
5. Federated Abnormal Heart Sound Detection with Weak to No Labels
Authors: Wanyong Qiu, Chen Quan, Yongzi Yu, Eda Kara, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto. Cyborg and Bionic Systems, 2024, Issue 1, pp. 91-107.
Cardiovascular diseases are a prominent cause of mortality, emphasizing the need for early prevention and diagnosis. Utilizing artificial intelligence (AI) models, heart sound analysis emerges as a noninvasive and universally applicable approach for assessing cardiovascular health conditions. However, real-world medical data are dispersed across medical institutions, forming "data islands" because data sharing is limited for security reasons. To this end, federated learning (FL) has been extensively employed in the medical field, as it can effectively model across multiple institutions. Additionally, conventional supervised classification methods require fully labeled data classes; e.g., binary classification requires labeling of both positive and negative samples. Nevertheless, labeling healthcare data is time-consuming and labor-intensive, raising the possibility of mislabeled negative samples. In this study, we validate an FL framework with a naive positive-unlabeled (PU) learning strategy. The semi-supervised FL model can learn directly from a limited set of positive samples and an extensive pool of unlabeled samples. Our emphasis is on vertical FL to enhance collaboration across institutions with different medical record feature spaces. Additionally, our contribution extends to feature importance analysis, where we explore six methods and provide practical recommendations for detecting abnormal heart sounds. The study demonstrates an impressive accuracy of 84%, comparable to outcomes in supervised learning, thereby advancing the application of FL in abnormal heart sound detection.
Keywords: federated learning; semi-supervised learning; feature importance analysis; vertical federated learning; abnormal heart sound detection; artificial intelligence; heart sound analysis; cardiovascular diseases; weak labels
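One common instantiation of learning from positive and unlabeled data, offered here as a stand-in for the paper's naive PU strategy, is the non-negative PU risk estimator of Kiryo et al. (2017). In the sketch below, the class prior pi, the toy linear model, and the sigmoid surrogate loss are all assumptions, and the federated aggregation across institutions is omitted.

```python
# Non-negative PU risk: learn a binary classifier from labeled positives plus
# unlabeled data only, correcting with the positive class prior `pi`.
import torch
import torch.nn as nn

def nnpu_risk(scores_pos, scores_unl, pi=0.3):
    """Non-negative PU risk with a sigmoid surrogate loss."""
    loss = lambda s, y: torch.sigmoid(-y * s)
    risk_pos = pi * loss(scores_pos, 1.0).mean()         # positives as positives
    # Unlabeled data treated as negatives, minus the positive contamination.
    risk_neg = loss(scores_unl, -1.0).mean() - pi * loss(scores_pos, -1.0).mean()
    return risk_pos + torch.clamp(risk_neg, min=0.0)     # clamp keeps risk >= 0

model = nn.Linear(20, 1)                                 # toy heart sound classifier
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_pos = torch.randn(16, 20)                              # labeled abnormal sounds
x_unl = torch.randn(64, 20)                              # unlabeled recordings
for _ in range(100):
    risk = nnpu_risk(model(x_pos).squeeze(1), model(x_unl).squeeze(1))
    opt.zero_grad()
    risk.backward()
    opt.step()
```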
6. Learning Representations from Heart Sound: A Comparative Study on Shallow and Deep Models
Authors: Kun Qian, Zhihao Bao, Zhonghao Zhao, Tomoya Koike, Fengquan Dong, Maximilian Schmitt, Qunxi Dong, Jian Shen, Weipeng Jiang, Yajuan Jiang, Bo Dong, Zhenyu Dai, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto. Cyborg and Bionic Systems, 2024, Issue 1, pp. 687-698.
Leveraging the power of artificial intelligence to facilitate automatic analysis and monitoring of heart sounds has attracted tremendous effort in the past decade. Nevertheless, the lack of a standard open-access database made it difficult to maintain sustainable and comparable research before the first release of the PhysioNet CinC Challenge dataset. Even so, inconsistent standards for data collection, annotation, and partitioning still restrain a fair and efficient comparison between different works. Along this line, we introduced and benchmarked a first version of the Heart Sounds Shenzhen (HSS) corpus. Motivated and inspired by previous works based on HSS, we redefine the tasks and make a comprehensive investigation of shallow and deep models in this study. First, we segment the heart sound recordings into shorter recordings (10 s), which makes the setting more similar to the human auscultation case. Second, we redefine the classification tasks: besides the three class categories (normal, moderate, and mild/severe) adopted in HSS, we add a binary classification task, i.e., normal vs. abnormal. We provide detailed benchmarks based on both classic machine learning and state-of-the-art deep learning technologies, which are reproducible using open-source toolkits. Last but not least, we analyse the feature contributions behind the best benchmark performance to make the results more convincing and interpretable.
Keywords: deep learning; PhysioNet CinC Challenge dataset; heart sound classification; shallow models; deep models; machine learning
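The segmentation step described in the abstract is straightforward to sketch: cut each recording into non-overlapping 10 s chunks and train per chunk. The snippet below uses a dummy recording and sampling rate; the HSS corpus, labels, and shallow baseline are not included, and the aggregation comment reflects common practice rather than the paper's confirmed protocol.

```python
# Cut a heart sound recording into non-overlapping 10-s segments, as in the
# redefined HSS setup; the recording and sampling rate here are dummies.
import numpy as np

def segment(signal: np.ndarray, sr: int, win_s: float = 10.0) -> list[np.ndarray]:
    """Split a recording into full 10-s segments (the remainder is dropped)."""
    hop = int(win_s * sr)
    return [signal[i:i + hop] for i in range(0, len(signal) - hop + 1, hop)]

sr = 4000                                # assumed heart sound sampling rate
recording = np.random.randn(sr * 35)     # dummy 35-s auscultation
chunks = segment(recording, sr)          # -> three 10-s segments
print(len(chunks), [len(c) / sr for c in chunks])

# A shallow baseline (e.g., an SVM on per-chunk functionals) would then be
# trained per segment, with chunk-level predictions aggregated per recording,
# for instance by majority vote.
```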
7. The Voice of the Body: Why AI Should Listen to It and an Archive (cited by: 1)
Authors: Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller. Cyborg and Bionic Systems (EI/CAS), 2023, Issue 1, pp. 520-522.
The sounds generated by the body carry important information about our physical and psychological health status. In the past decades, we have witnessed a plethora of successes in the field of body sound analysis. Nevertheless, the fundamentals of this young field are still not well established. In particular, publicly accessible databases are rarely developed, which dramatically restrains sustainable research. To this end, we are launching, and continuously calling for participation from the global scientific community in, the Voice of the Body (VoB) archive. We aim to build an open-access platform that collects well-established body sound databases in a well-standardized way. Moreover, we hope to organize a series of challenges to promote the development of audio-driven methods for healthcare via the proposed VoB. We believe that VoB can help break the walls between different subjects toward an era of Medicine 4.0 enriched by audio intelligence.
Keywords: voice; sound