Offensive language detection has received important attention and plays a crucial role in promoting healthy communication on social platforms,as well as promoting the safe deployment of large language models.Training ...Offensive language detection has received important attention and plays a crucial role in promoting healthy communication on social platforms,as well as promoting the safe deployment of large language models.Training data is the basis for developing detectors;however,the available offense-related dataset in Chinese is severely limited in terms of data scale and coverage when compared to English resources.This significantly affects the accuracy of Chinese offensive language detectors in practical applications,especially when dealing with hard cases or out-of-domain samples.To alleviate the limitations posed by available datasets,we introduce AugCOLD(Augmented Chinese Offensive Language Dataset),a large-scale unsupervised dataset containing 1 million samples gathered by data crawling and model generation.Furthermore,we employ a multiteacher distillation framework to enhance detection performance with unsupervised data.That is,we build multiple teachers with publicly accessible datasets and use them to assign soft labels to AugCOLD.The soft labels serve as a bridge for knowledge to be distilled from both AugCOLD and multiteacher to the student network,i.e.,the final offensive detector.We conduct experiments on multiple public test sets and our well-designed hard tests,demonstrating that our proposal can effectively improve the generalization and robustness of the offensive language detector.展开更多
Importance:Heart sound auscultation is a routinely used physical examination in clinical practice to identify potential cardiac abnormalities. However, accurate interpretation of heart sounds requires specialized trai...Importance:Heart sound auscultation is a routinely used physical examination in clinical practice to identify potential cardiac abnormalities. However, accurate interpretation of heart sounds requires specialized training and experience, which limits its generalizability. Deep learning, a subset of machine learning, involves training artiffcial neural networks to learn from large datasets and perform complex tasks with intricate patterns. Over the past decade, deep learning has been successfully applied to heart sound analysis, achieving remarkable results and accumulating substantial heart sound data for model training. Although several reviews have summarized deep learning algorithms for heart sound analysis, there is a lack of comprehensive summaries regarding the available heart sound data and the clinical applications. Highlights:This review will compile the commonly used heart sound datasets, introduce the fundamentals and state-of-the-art techniques in heart sound analysis and deep learning, and summarize the current applications of deep learning for heart sound analysis, along with their limitations and areas for future improvement. Conclusions:The integration of deep learning into heart sound analysis represents a signiffcant advancement in clinical practice. The growing availability of heart sound datasets and the continuous development of deep learning techniques contribute to the improvement and broader clinical adoption of these models. However, ongoing research is needed to address existing challenges and reffne these technologies for broader clinical use.展开更多
基金supported by the National Science Foundation for Distinguished Young Scholars(with No.62125604)the NSFC projects(Key project with No.61936010 and regular project with No.61876096)+1 种基金supported by the Guoqiang Institute of Tsinghua University,with Grant No.2019GQG1 and 2020GQG0005sponsored by Tsinghua-Toyota Joint Research Fund.
文摘Offensive language detection has received important attention and plays a crucial role in promoting healthy communication on social platforms,as well as promoting the safe deployment of large language models.Training data is the basis for developing detectors;however,the available offense-related dataset in Chinese is severely limited in terms of data scale and coverage when compared to English resources.This significantly affects the accuracy of Chinese offensive language detectors in practical applications,especially when dealing with hard cases or out-of-domain samples.To alleviate the limitations posed by available datasets,we introduce AugCOLD(Augmented Chinese Offensive Language Dataset),a large-scale unsupervised dataset containing 1 million samples gathered by data crawling and model generation.Furthermore,we employ a multiteacher distillation framework to enhance detection performance with unsupervised data.That is,we build multiple teachers with publicly accessible datasets and use them to assign soft labels to AugCOLD.The soft labels serve as a bridge for knowledge to be distilled from both AugCOLD and multiteacher to the student network,i.e.,the final offensive detector.We conduct experiments on multiple public test sets and our well-designed hard tests,demonstrating that our proposal can effectively improve the generalization and robustness of the offensive language detector.
基金supported by the National Natural Science Foundation of China(No.62102008)the Peking University People’s Hospital Scientific Research Development Funds(RDJP2022-39)the Clinical Medicine Plus X-Young Scholars Project of Peking University,and the Fundamental Research Funds for the Central Universities(PKU2024LCXQ030).
文摘Importance:Heart sound auscultation is a routinely used physical examination in clinical practice to identify potential cardiac abnormalities. However, accurate interpretation of heart sounds requires specialized training and experience, which limits its generalizability. Deep learning, a subset of machine learning, involves training artiffcial neural networks to learn from large datasets and perform complex tasks with intricate patterns. Over the past decade, deep learning has been successfully applied to heart sound analysis, achieving remarkable results and accumulating substantial heart sound data for model training. Although several reviews have summarized deep learning algorithms for heart sound analysis, there is a lack of comprehensive summaries regarding the available heart sound data and the clinical applications. Highlights:This review will compile the commonly used heart sound datasets, introduce the fundamentals and state-of-the-art techniques in heart sound analysis and deep learning, and summarize the current applications of deep learning for heart sound analysis, along with their limitations and areas for future improvement. Conclusions:The integration of deep learning into heart sound analysis represents a signiffcant advancement in clinical practice. The growing availability of heart sound datasets and the continuous development of deep learning techniques contribute to the improvement and broader clinical adoption of these models. However, ongoing research is needed to address existing challenges and reffne these technologies for broader clinical use.