Wearable pressure sensors capable of adhering comfortably to the skin hold great promise in sound detection. However, current intelligent speech assistants based on pressure sensors can only recognize standard languages, which hampers effective communication for speakers of non-standard languages. Here, we prepare an ultralight Ti₃C₂Tₓ MXene/chitosan/polyvinylidene difluoride composite aerogel with a detection range of 6.25 Pa to 1200 kPa, rapid response/recovery time, and low hysteresis (13.69%). The wearable aerogel pressure sensor can detect speech information through throat muscle vibrations without any interference, allowing for accurate recognition of six dialects (96.2% accuracy) and seven different words (96.6% accuracy) with the assistance of convolutional neural networks. This work represents a significant step forward in silent speech recognition for human–machine interaction and physiological signal monitoring.
This research investigates intra-dialectal hierarchies within Northeastern Mandarin, focusing on the Shenyang and Jinzhou dialects, two closely related varieties in Liaoning province, China. The segmental features of these dialects are largely comparable; however, their suprasegmental characteristics, especially the intonation patterns in interrogatives, diverge considerably. This makes it possible to examine how listeners use prosodic cues for both recognition and social assessment. The study, which involved speech recordings, perception tests, and attitude surveys with ninety individuals from both local and non-local backgrounds, reveals a paradox: individuals struggle to accurately identify dialect origins through suprasegmental features, yet consistently evaluate Shenyang speech more favorably, indicating its status as the regional standard. This "misrecognition paradox" suggests that suprasegmental cues can sustain symbolic hierarchies even in the absence of accurate recognition, thus clarifying the implicit mechanisms that contribute to linguistic inequality. The results contribute to sociophonetics and sociolinguistics by demonstrating how prosodic features facilitate intra-dialectal stratification and perpetuate social hierarchies beyond overt language classification.
A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese (in other words, Chinese influenced by the speaker's native dialect) speech corpus and dialect-related knowledge are used to transform a standard Chinese (Putonghua, abbreviated as PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: one is expert knowledge and the other is a small dialectal Chinese corpus. These knowledge sources provide information at four levels: phonetic level, lexicon level, language level, and acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and to combine them with the supervised maximum likelihood linear regression (MLLR) acoustic model adaptation method. The IF mappings introduce a multi-pronunciation lexicon, which may increase lexicon confusion and hence degrade performance; to reduce the size of this lexicon, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10-18% absolute reduction in Character Error Rate (CER) when recognizing WDC, with only a 0.62% CER increase when recognizing PTH. The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages.
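The CER figures above are stated as absolute differences. As a minimal sketch of what that means, the following computes a character error rate from an edit distance and contrasts absolute and relative reduction. The function names and the toy strings and rates are illustrative assumptions, not taken from the paper, and the paper's own scoring procedure may differ in detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline_cer, adapted_cer):
    """Difference in CER, in the same units as the inputs."""
    return baseline_cer - adapted_cer


def relative_reduction(baseline_cer, adapted_cer):
    """Difference in CER as a fraction of the baseline CER."""
    return (baseline_cer - adapted_cer) / baseline_cer


if __name__ == "__main__":
    # Toy example: one substituted character out of four.
    print(cer("abcd", "abxd"))
    # Hypothetical rates, chosen only to show the two kinds of reduction.
    print(absolute_reduction(0.40, 0.25))
    print(relative_reduction(0.40, 0.25))
```

With these hypothetical rates, a drop from 40% to 25% CER is a 15-point absolute reduction but a 37.5% relative reduction, which is why the abstract specifies "absolute".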
Funding (wearable aerogel pressure sensor study): Supported by the National Natural Science Foundation of China (Nos. 62122030, 62333008, 62371205, 52103208), the National Key Research and Development Program of China (No. 2021YFB3201300), the Application and Basic Research of Jilin Province (20130102010JC), the Fundamental Research Funds for the Central Universities, and the Jilin Provincial Science and Technology Development Program (20230101072JC).
Funding (dialectal Chinese speech recognition study): This paper is based upon a study supported by the US National Science Foundation under Grant No. 0121285. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.