21,102 articles found
Educating for Civil Discourse: Fortifying the Fragile Path Between Too Little and Too Much Freedom of Speech
1
Authors: Susan T. Gardner, Daniel J. Anderson. Philosophy Study, 2025, Issue 2, pp. 61-75 (15 pages)
Although “civil discourse” amongst multiple individuals with diverse viewpoints is necessary to move toward truth, to maintain democratic buoyancy, and to get the most accurate read on how best to move forward toward our collective good, civil discourse is nonetheless under catastrophic threat by contemporary forces that include the sloppy use of the term “hate speech”; the “libelling by labeling” (aka “cancelling”) in the public square of social media; technologically powered disinformation campaigns; and the growth of “safetyism” in academia. In light of these threats, the goal must be to convince educators, particularly philosophical educators, of the need to adopt a whole new focus in education, namely one that puts a spotlight on the fact that the utilization of the freedom of speech to destroy the freedom of speech of others utterly undermines the positive value of freedom of speech. In order to motivate individuals to turn their back on the dopamine rush of shutting someone down, educators must also spend a great deal of time showcasing the merits of “civil discourse” by providing young people with extensive experience in engaging in facilitated “civil discourse” (aka Communities of Philosophical Inquiry) so that its value can be woven into a personal commitment.
Keywords: civil discourse; freedom of speech; Communities of Philosophical Inquiry; hate speech; cancelling; disinformation; safetyism
Correction to DeepCNN: Spectro-temporal feature representation for speech emotion recognition
2
CAAI Transactions on Intelligence Technology, 2025, Issue 2, p. 633 (1 page)
Saleem, N., et al.: DeepCNN: Spectro-temporal feature representation for speech emotion recognition. CAAI Trans. Intell. Technol. 8(2), 401-417 (2023). https://doi.org/10.1049/cit2.12233. The affiliation of Hafiz Tayyab Rauf should be [Independent Researcher, UK].
Keywords: independent researcher; speech emotion recognition; deep CNN; UK; spectro-temporal feature representation; Hafiz Tayyab Rauf
Gender effect on Psychoacoustical abilities and Speech perception in noise in individuals with Sickle cell anemia
3
Authors: Preeti Sahu, Animesh Barman. Journal of Otology, 2025, Issue 3, pp. 142-148 (7 pages)
Background: Sickle cell anemia (SCA), a genetic hemoglobin disorder, suggests essential inner ear compromise and poor auditory processing. In humans, auditory processing differs physiologically between males and females, which is possibly also true for SCA due to gender-specific pathophysiological changes in the disease. Objective: To investigate gender differences in psychoacoustical abilities and speech perception in noise in SCA individuals, and to compare them with a normal healthy (NH) population. Methods: 80 SCA and 80 NH normal-hearing participants aged 15-40 years were included and further grouped by gender. Auditory discrimination for frequency, intensity, and duration at 500 Hz and 4000 Hz; temporal processing (Gap Detection Threshold and Modulation Detection Threshold); and Speech Perception In Noise (SPIN) at 0 dB SNR were evaluated and compared between males and females of the SCA and NH populations. Results: SCA participants performed poorer than NH participants on all experimental measures. In the NH population, males performed poorer than females in psychoacoustical measures, whereas within the SCA population the reverse was true. Female participants performed better in the SPIN test in both populations. Conclusions: The adverse impact of SCA on the auditory system due to circulatory changes might cause poorer performance in SCA. Poorer performance by female SCA participants is possibly due to the contrary impact of a lower Hb level overlying sickle cell disease. Estrogen levels and gender preference in auditory processing might lead to better performance by females within the NH population. SPIN performance depends on attentional demands and sensorimotor processing strategies in noise beyond psychoacoustical processing, which may lead to better female performance in both populations.
Keywords: psychoacoustical abilities; sickle cell anemia (SCA); gender; speech perception in noise; genetic hemoglobin disorder
Text-and-Timbre-Based Speech Semantic Coding for Ultra-Low-Bitrate Communications
4
Authors: Yang Xiaoniu, Qian Liping, Lyu Sikai, Wang Qian, Wang Wei. China Communications, 2025, Issue 1, pp. 7-24 (18 pages)
To address the contradiction between the explosive growth of wireless data and the limited spectrum resources, semantic communication has been emerging as a promising communication paradigm. In this paper, we thus design a speech semantic coded communication system, referred to as Deep-STS (i.e., Deep-learning based Speech To Speech), for low-bandwidth speech communication. Specifically, we first deeply compress the speech data by extracting the textual information from the speech with a conformer encoder and a connectionist temporal classification decoder at the transmitter side of the Deep-STS system. To facilitate the final speech timbre recovery, we also extract the short-term timbre feature of the speech signal, only for the first 2 s, with a long short-term memory network. Then, Reed-Solomon coding and a hybrid automatic repeat request protocol are applied to improve the reliability of transmitting the extracted text and timbre feature over the wireless channel. Third, we reconstruct the speech signal with a mel spectrogram prediction network and a vocoder once the extracted text is received along with the timbre feature at the receiver of the Deep-STS system. Finally, we develop a demo system based on USRP and GNU Radio for the performance evaluation of Deep-STS. Numerical results show that the accuracy of text extraction approaches 95%, and the mel cepstral distortion between the recovered speech signal and the original one in the spectrum domain is less than 10. Furthermore, the experimental results show that the proposed Deep-STS system can reduce the total delay of speech communication by 85% on average compared to G.723 coding at the transmission rate of 5.4 kbps. More importantly, the coding rate of the proposed Deep-STS system is extremely low, only 0.2 kbps for continuous speech communication. It is worth noting that Deep-STS with its lower coding rate can support low-zero-power speech communication, unveiling a new era in ultra-efficient coded communications.
Received: Jan. 17, 2024. Revised: Jun. 12, 2024. Editor: Niu Kai.
Keywords: low coding rate; semantic communication; speech recognition; speech synthesis
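As a rough aid to reading the two rates quoted in this abstract (0.2 kbps for Deep-STS and 5.4 kbps for the G.723 baseline), the following sketch works out the payload each rate implies. It uses only the figures quoted above; the helper name `payload_bytes` and the 60-second duration are illustrative assumptions, and channel-coding overhead is ignored.

```python
# Illustrative arithmetic only; rates are the figures quoted in the abstract.
DEEP_STS_BPS = 200   # 0.2 kbps
G723_BPS = 5400      # 5.4 kbps


def payload_bytes(rate_bps, seconds):
    """Bytes needed to carry `seconds` of speech at a constant coding rate."""
    return rate_bps * seconds / 8


# Ratio between the two quoted coding rates.
ratio = G723_BPS / DEEP_STS_BPS

# Payload for one minute of continuous speech (hypothetical duration).
deep_sts_minute = payload_bytes(DEEP_STS_BPS, 60)
g723_minute = payload_bytes(G723_BPS, 60)
```

Under these figures, one minute of speech needs about 1.5 kB at 0.2 kbps and about 40.5 kB at 5.4 kbps, a 27-fold difference before any channel coding is added.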
Non-Musicians Experience Early Aging in Speech Perception in Noise Abilities Compared to Musicians
5
Authors: Kruthika S., Ajith Kumar Uppunda. Journal of Otology, 2025, Issue 2, pp. 133-140 (8 pages)
Background: Research has shown that musicians outperform non-musicians in speech perception in noise (SPiN) tasks. However, it remains unclear whether the advantages of musical training are substantial enough to slow down the decline in SPiN performance associated with aging. Objectives: Therefore, we assessed SPiN performance in a continuum of age groups comprising musicians and non-musicians. The goal was to compare how the aging process affected the SPiN performance of musicians and non-musicians. Method: A cross-sectional descriptive mixed design was used, involving 150 participants divided into 75 musicians and 75 non-musicians. Each age group (10-19, 20-29, 30-39, 40-49, and 50-59) consisted of 15 musicians and 15 non-musicians. Six Kannada sentence lists were combined with four-talker babble. At +5, 0, and -5 dB signal-to-noise ratios (SNRs), the percent correct Speech Identification Scores were calculated. Results: The repeated measures ANOVA (RM ANOVA) revealed significant main effects and interaction effects between SNR, musicianship, and age groups (p < 0.05). A small to large effect size was noted (ηp² = 0.05 to 0.17). A significant interaction effect and follow-up post hoc tests showed that SPiN abilities deteriorated more rapidly with increasing age in non-musicians compared to musicians, especially at difficult SNRs. Conclusions: Musicians had better SPiN abilities than non-musicians across all age groups. Also, age-related deterioration in SPiN abilities was faster in non-musicians compared to musicians.
Keywords: Music training; Age effects; Speech Perception in Noise; Signal to Noise Ratio; Speech Identification Scores
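The entry above presents sentences in four-talker babble at +5, 0, and -5 dB SNR. As a minimal sketch of what such a condition means numerically (not the authors' stimulus-preparation procedure), the code below scales a noise signal so that the speech-to-noise power ratio matches a target SNR. All function names and the idea of representing signals as plain lists of floats are assumptions made for illustration.

```python
import math


def mean_power(signal):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that 10*log10(P_speech / P_noise) equals snr_db."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech, sample by sample, at the requested SNR."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a speech signal and an already-scaled noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

A lower SNR means a larger noise gain: moving from +5 dB to -5 dB multiplies the noise power by ten, which is why the -5 dB condition is the hardest of the three.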
Braille Image-to-Speech Generation Based on Jointly Fine-Tuned CLIP and FastSpeech2
6
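Authors: Sun Enwei (孙恩威), Xu Chun (徐春). Computer Era (《计算机时代》), 2025, Issue 5, pp. 28-34, 39 (8 pages)
To improve reading efficiency for visually impaired people, a Braille image-to-speech conversion framework suited to Chinese-language scenarios is constructed: CLIPViT-H/14-KNN-FastSpeech2. The strategy is independent pre-training followed by joint fine-tuning. First, Chinese CLIP and the FastSpeech2 text-to-speech model are pre-trained separately on public datasets and their convergence is verified. Then, on this basis, the two are jointly fine-tuned on a Braille image dataset. Experimental results show that the model improves on PER and other metrics, which verifies that it can still synthesize high-quality speech with limited data and that the joint training strategy is effective.
Keywords: Braille image; image-to-speech conversion; CLIP; FastSpeech2; joint fine-tuning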
Enhancing Phoneme Labeling in Dysarthric Speech with Digital Twin-Driven Multi-Modal Architecture
7
Authors: Saeed Alzahrani, Nazar Hussain, Farah Mohammad. Computers, Materials & Continua, 2025, Issue 9, pp. 4825-4849 (25 pages)
Digital twin technology is revolutionizing personalized healthcare by creating dynamic virtual replicas of individual patients. This paper presents a novel multi-modal architecture leveraging digital twins to enhance precision in predictive diagnostics and treatment planning of phoneme labeling. By integrating real-time images, electronic health records, and genomic information, the system enables personalized simulations for disease progression modeling, treatment response prediction, and preventive care strategies. In dysarthric speech, which is characterized by articulation imprecision, temporal misalignments, and phoneme distortions, existing models struggle to capture these irregularities. Traditional approaches, often relying solely on audio features, fail to address the full complexity of phoneme variations, leading to increased phoneme error rates (PER) and word error rates (WER). To overcome these challenges, we propose a novel multi-modal architecture that integrates both audio and articulatory data through a combination of Temporal Convolutional Networks (TCNs), Graph Convolutional Networks (GCNs), Transformer Encoders, and a cross-modal attention mechanism. The audio branch of the model utilizes TCNs and Transformer Encoders to capture both short- and long-term dependencies in the audio signal, while the articulatory branch leverages GCNs to model spatial relationships between articulators, such as the lips, jaw, and tongue, allowing the model to detect subtle articulatory imprecisions. A cross-modal attention mechanism fuses the encoded audio and articulatory features, enabling dynamic adjustment of the model’s focus depending on input quality, which significantly improves phoneme labeling accuracy. The proposed model consistently outperforms existing methods, achieving lower Phoneme Error Rates (PER), Word Error Rates (WER), and Articulatory Feature Misclassification Rates (AFMR). Specifically, across all datasets, the model achieves an average PER of 13.43%, an average WER of 21.67%, and an average AFMR of 12.73%. By capturing both the acoustic and articulatory intricacies of speech, this comprehensive approach not only improves phoneme labeling precision but also marks substantial progress in speech recognition technology for individuals with dysarthria.
Keywords: Dysarthric speech; phoneme labelling; TCNs; GCNs; Transformers
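The entry above reports phoneme error rate (PER) and word error rate (WER) without defining them. A common convention, assumed here and not confirmed by the abstract, is the Levenshtein edit distance between the reference and hypothesis token sequences divided by the reference length. The sketch below follows that convention; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length; PER or WER depending on the tokens."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Passing phoneme lists yields a PER and passing word lists yields a WER. Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.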
Deviation of Transitivity in Ba-constructions from the Perspective of Mode and Speech Acts
8
Author: ZHANG Shuo. Journal of Literature and Art Studies, 2025, Issue 6, pp. 460-473 (14 pages)
Ba-constructions carry a high level of transitivity, but a deviation towards low transitivity happens. It is found that mode provides the axis of this transitivity deviation, especially in spoken dialogue. Under the influence of irrealis mode, parameters such as aspect, affectedness of O, individuation of O, and affirmation exhibit different degrees of transitivity deviation. Speech acts, which are closely related to mode, are the driving force of this phenomenon in discourse. The composition rules of directive, declarative, commitment, and emotive speech acts, which account for the majority of speech acts in spoken dialogue, determine that they are all irrealis. Therefore, under the axis of irrealis mode, several transitivity parameters of ba-constructions in oral dialogue deviate towards low transitivity. The phenomenon of transitivity deviation in ba-constructions verifies the transitivity hypothesis.
Keywords: ba-constructions; transitivity; speech acts; mode
The Effect of Binaural Beamforming Technology on Mandarin Speech Recognition in Babble Noise for Bimodal Hearing CI users
9
Authors: Aiting Chen, Mengdi Hong, Jianan Li, Qian Wang, Nan Li, Lumeng Han, Qian Wu, Haihong Liu, Yidi Liu, Yue Long, Fangxia Hu, Jianfen Luo, Lei Xu, Zhaomin Fan, Peng Lin, Wei Wang, Yue Wang, Yu Chen, Zhaohui Hou, Fei Ji. Journal of Otology, 2025, Issue 3, pp. 157-161 (5 pages)
Purpose: The purpose of the study was to investigate the effect of bimodal beamforming on speech recognition and comfort for cochlear implant (CI) users with the bimodal hearing solution made up by linking a hearing aid to the CI sound processor. Methods: 19 subjects participated in this study. Speech tests were conducted in quiet and in noisy environments, with the target speech presented from 0° and the noise signal from 45°. Speech recognition thresholds (SRTs) were compared among the previously used bimodal hearing configuration (baseline, any CI sound processor plus any hearing aid), the Naída Bimodal Hearing Solution with omnidirectional microphone, and the same solution with directional microphone (so-called StereoZoom) switched on. In addition, the study participants provided subjective feedback on their hearing impressions. Results: The SRT results showed no significant difference among the three hearing conditions in the quiet environment. No significant improvement was reported when using the Naída bimodal system with omnidirectional microphone in noise compared to the baseline (p=0.27). When applying StereoZoom, SRT in noise showed significant improvements compared to omnidirectional settings (p<0.05). Subjective feedback showed that 13 participants were satisfied with the Naída Bimodal Hearing Solution and wanted to continue using it after the trial. Conclusion: The Naída Bimodal Hearing Solution with the same pre-processing algorithm can provide satisfying hearing performance. Beamforming technology can further improve speech perception in noisy environments.
Keywords: bimodal; cochlear implant; speech recognition; beamforming; directional microphone
A Pragmatic Study of Dialogues in Flipped in Light of Speech Act Theory
10
Author: JI Ming-ming. Journal of Literature and Art Studies, 2025, Issue 10, pp. 769-774 (6 pages)
Flipped is a book written by American author Wendelin Van Draanen. It is a novel about young teenagers and was adapted into the famous film of the same name in 2010. The thesis employs speech acts, as pioneered by John Austin and further developed by John Searle, to investigate the influence of dialogues on characterization and plot development in Flipped. By exploring the theory of speech acts presented in dialogues between characters, the author deciphers the underlying intentions embodied in the dialogues and demonstrates the importance of the use of speech acts in dialogues in revealing the characters, driving the development of the plot and expressing the theme of the text.
Keywords: pragmatics; speech act theory; Flipped; dialogue; literary effect
E-SWAN: Efficient Sliding Window Analysis Network for Real-Time Speech Steganography Detection
11
Authors: Kening Wang, Feipeng Gao, Jie Yang, Hao Zhang. Computers, Materials & Continua, 2025, Issue 3, pp. 4797-4820 (24 pages)
With the rapid advancement of Voice over Internet Protocol (VoIP) technology, speech steganography techniques such as Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS) have emerged as significant challenges to information security. These techniques embed hidden information into speech streams, making detection increasingly difficult, particularly under conditions of low embedding rates and short speech durations. Existing steganalysis methods often struggle to balance detection accuracy and computational efficiency due to their limited ability to effectively capture both temporal and spatial features of speech signals. To address these challenges, this paper proposes an Efficient Sliding Window Analysis Network (E-SWAN), a novel deep learning model specifically designed for real-time speech steganalysis. E-SWAN integrates two core modules: the LSTM Temporal Feature Miner (LTFM) and the Convolutional Key Feature Miner (CKFM). LTFM captures long-range temporal dependencies using Long Short-Term Memory networks, while CKFM identifies local spatial variations caused by steganographic embedding through convolutional operations. These modules operate within a sliding window framework, enabling efficient extraction of temporal and spatial features. Experimental results on the Chinese CNV and PMS datasets demonstrate the superior performance of E-SWAN. Under conditions of a ten-second sample duration and an embedding rate of 10%, E-SWAN achieves a detection accuracy of 62.09% on the PMS dataset, surpassing existing methods by 4.57%, and an accuracy of 82.28% on the CNV dataset, outperforming state-of-the-art methods by 7.29%. These findings validate the robustness and efficiency of E-SWAN under low embedding rates and short durations, offering a promising solution for real-time VoIP steganalysis. This work provides significant contributions to enhancing information security in digital communications.
Keywords: steganalysis; speech; convolutional; sliding window; deep learning
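The entry above says its feature modules operate within a sliding window framework but gives no window parameters. Purely as an illustration of the general technique (not E-SWAN's actual design), the sketch below splits a sequence of speech frames into fixed-size, possibly overlapping windows that a downstream model could score one at a time. The function name, window size, and step are hypothetical.

```python
def sliding_windows(frames, size, step):
    """Return every full window of `size` frames, advancing by `step` frames.

    A trailing partial window is dropped, so a sequence shorter than `size`
    yields no windows at all.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [
        frames[start:start + size]
        for start in range(0, len(frames) - size + 1, step)
    ]
```

With `step` smaller than `size` the windows overlap, which trades extra computation for finer time resolution; with `step` equal to `size` each frame is processed exactly once.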
Curing the Cultural Disease of Toxic Masculinity in Octavia Butler’s “Speech Sounds”
12
Authors: LI Qing-yang, QI Jia-min. Journal of Literature and Art Studies, 2025, Issue 3, pp. 135-141 (7 pages)
This paper delves into African American writer Octavia Butler’s Hugo Award-winning “Speech Sounds” to explore how the author uses a fictional pandemic as a metaphor to critique toxic masculinity in 1980s American culture. By analyzing the story, it reveals how the unnamed illness functions as a social pathogen, intensifying the negative aspects of hegemonic masculinity, leading to the breakdown of communication and the prevalence of violence. Through the character of Rye, the paper also examines how black feminist resilience offers a counter-narrative to the destructive forces of toxic masculinity. The study concludes that Butler’s work not only exposes the cultural disease of toxic masculinity but also provides a vision of healing and regeneration through communal care and the cultivation of hope, highlighting the power of speculative fiction as a tool for social critique and imagining alternative futures.
Keywords: “Speech Sounds”; masculinity; pandemic; Octavia Butler
A Black-Box Speech Adversarial Attack Method Based on Enhanced Neural Predictors in Industrial IoT
13
Authors: Yun Zhang, Zhenhua Yu, Xufei Hu, Xuya Cong, Ou Ye. Computers, Materials & Continua, 2025, Issue 9, pp. 5403-5426 (24 pages)
Devices in the Industrial Internet of Things are vulnerable to voice adversarial attacks. Studying adversarial speech samples is crucial for enhancing the security of automatic speech recognition systems in Industrial Internet of Things devices. Current black-box attack methods often face challenges such as complex search processes and excessive perturbation generation. To address these issues, this paper proposes a black-box voice adversarial attack method based on enhanced neural predictors. This method searches for minimal perturbations in the perturbation space, employing an optimization process guided by a self-attention neural predictor to identify the optimal perturbation direction. This direction is then applied to the original sample to generate adversarial samples. To improve search efficiency, a pruning strategy is designed to discard samples below a threshold in the early search stages, reducing the number of searches. Additionally, a dynamic factor based on feedback from querying the automatic speech recognition system is introduced to adaptively adjust the search step size, further accelerating the search process. To validate the performance of the proposed method, experiments are conducted on the LibriSpeech dataset. Compared with the mainstream methods, the proposed method improves the signal-to-noise ratio by 0.8 dB, increases sample similarity by 0.43%, and reduces the average number of queries by 7%. Experimental results demonstrate that the proposed method offers better attack effectiveness and stealthiness.
Keywords: speech recognition; adversarial attack; self-attention; pruning strategy
Legal Protection of Human Dignity: Starting from Regulating “Hate Speech”
14
Authors: HUANG Wenting; JIANG Yu (translator). The Journal of Human Rights, 2025, Issue 1, pp. 124-148 (25 pages)
The legal protection of human dignity can be explored from the perspective of regulating “hate speech.” The practices of most countries worldwide demonstrate that human dignity serves as a fundamental value limiting the freedom of expression. Legally protected human dignity encompasses three levels of meaning: the dignity of life as an inherent aspect of human existence, the dignity of individuals as members of specific groups, and the personal dignity of individuals as unique beings. These three levels collectively emphasize the principle that human beings are ends in themselves, underscoring that individuals must not be degraded to mere means or subjected to harm. The inherent nature of human dignity necessitates its protection by both the state and societal entities. Traditionally, the safeguarding of human dignity has primarily depended on state intervention. However, with the advent of the digital age, this responsibility has increasingly extended to social entities, imposing enhanced and expanded obligations of respect on them. Consequently, the key to protecting human dignity lies in adjusting the allocation of responsibilities between the state and society in accordance with the development of the times. Under the guidance of human dignity as a constitutional value, China should focus on establishing a comprehensive protection system by improving legislation, law enforcement, and judicial practices. This includes specifying the obligations of social entities and constructing multi-level regulatory mechanisms to form an effective system of protection by the state and society.
Keywords: human dignity; legal protection; “hate speech”; state obligation; social responsibility
Addresses at the opening ceremony and speeches at the forum
15
China Standardization, 2025, Issue 6, pp. 33-48 (16 pages)
AI continues to reshape industries at a rapid pace, which reminds us of the growing importance of standardization. Standards and conformity assessment are essential to addressing the socio-technical dimensions of AI, ensuring its safe, ethical, and inclusive adoption across different sectors.
Keywords: socio-technical dimensions; opening ceremony; ethics; conformity assessment; speeches; standardization; standards; AI safety
Effect of minimum and maximum electrical stimulation levels on sound field thresholds, and speech perception in children using the Advanced Bionics Cochlear Implant
16
Authors: Muthuselvi Thangaraj, Ravikumar Arunachalam, Madhuri Gore, Ajith Kumar Uppunda. Journal of Otology, 2025, Issue 3, pp. 170-175 (6 pages)
Background: It is crucial to study the effect of changes in thresholds (T) and most comfortable levels (M) on behavioral measurements in young children using cochlear implants. This would help the clinician with the optimization and validation of programming parameters. Objective: The study attempted to describe the changes in behavioral responses with modification of T and M levels. Methods: Twenty-five participants in the age range 5 to 12 years using HR90K/HiFocus1J or HR90KAdvantage/HiFocus1J with Harmony speech processors participated in the study. A decrease in T levels, a rise in T levels, or a decrease in M levels in the everyday program was used to create experimental programs. Sound field thresholds and speech perception were measured at 50 dBHL for the three experimental programs and the everyday program. Conclusion: The results indicated that only reductions of M levels resulted in significantly (p<0.01) poorer aided thresholds and speech perception. On the other hand, variation in T levels did not produce significant changes in either sound field thresholds or speech perception. The results highlight that M levels must be correctly established in order to prevent decreased speech perception and audibility.
Keywords: T levels; M levels; sound field threshold; speech perception
The distinct speech and voice phenotypes among TCM constitution for adults: A cross-sectional study
17
Authors: ZHANG Weiqiang, SUN Xiaoru, ZHANG Menghan, TANG Dezhi, QIU Jian’ge, JIANG Binghua, WANG Yongjun, WANG Jiucun. World Journal of Integrated Traditional and Western Medicine, 2025, Issue 2, pp. 55-65 (11 pages)
Objectives: By investigating the distinct speech and voice phenotypes among TCM constitutions for adults, this study aims to provide a convenient and objective methodological reference for judging TCM constitution. Methods: Acoustic analysis and TCM constitution assessment were performed for all 620 participants using Praat software and the CCMQ, respectively. Results: For formant features, the speech duration of special constitution participants was shorter than that of neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency participants when pronouncing the vowels /a/, /i/, and /u/. Compared to Yang-deficiency participants, Qi-deficiency participants had a shorter speech duration when pronouncing /i/. For /u/, blood-stasis participants exhibited a lower F1 value than neutral participants. For vocal features, special constitution participants showed higher local jitter than neutral, dampness-heat, and Yang-deficiency participants (for /a/, /i/, and /u/). They also showed higher absolute local jitter than neutral or dampness-heat participants. Compared with neutral or Yang-deficiency participants, special constitution participants had a higher local shimmer (dB). Special constitution participants had a lower harmonicity autocorrelation than neutral, dampness-heat, or Yang-deficiency participants. Conclusions: Formant features may effectively differentiate special constitution from neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency constitutions based on vowel duration measurements (/a/, /i/, /u/). For the vowel /u/, F1 values may help distinguish blood-stasis from neutral constitution. Vocal features appear particularly useful for distinguishing special constitution from neutral, dampness-heat, or Yang-deficiency constitution, with local jitter and harmonicity autocorrelation showing significant discriminatory power.
Keywords: speech and voice phenotype; acoustic feature; TCM constitution; Chinmedphenomics
Subsequent psychiatric disorders in attention deficit and hyperactivity children receiving speech therapy
18
作者 Ruu-Fen Tzang Yu-Wen Lin +5 位作者 Kai-Liang Kao Yue-Cune Chang Hui-Chun Huang Shang-Yu Wu Shu-I Wu Robert Stewart 《World Journal of Psychiatry》 2025年第5期145-154,共10页
BACKGROUND Speech and language therapy(ST)might moderate the prognosis in children with attention deficit and hyperactivity disorder(ADHD)comorbid with speech delay.This study investigated whether ST in children with ... BACKGROUND Speech and language therapy(ST)might moderate the prognosis in children with attention deficit and hyperactivity disorder(ADHD)comorbid with speech delay.This study investigated whether ST in children with ADHD is associated with a decreased risk of subsequent psychiatric disorders.AIM To investigate whether ST in children with ADHD is associated with a decreased risk of subsequent psychiatric disorders.METHODS The population-based National Health Insurance Research Database in Taiwan was used.Hazards of subsequent psychiatric disorders were compared between those who received ST and a propensity-score matched comparison group by Cox regression analyses.RESULTS Of 11987 children with ADHD identified from the dataset,2911(24%)had received ST.The adjusted hazard ratio for any subsequent recorded psychiatric disorder was 0.72(95%confidence interval:0.63-0.82)in children who received ST compared to the matched counterparts.This protective association was only statistically significant in the subgroup that received both medication and behavioral interventions.CONCLUSION ST can moderate the effects of integrated early interventions in ADHD children with speech delay. 展开更多
Keywords: Non-Western country; Attention deficit and hyperactivity disorder; Psychiatric disorders; speech and language therapy; Adjustment disorder
Ti_(3)C_(2)T_(x) Composite Aerogels Enable Pressure Sensors for Dialect Speech Recognition Assisted by Deep Learning
19
Authors: Yanan Xiao, He Li, Tianyi Gu, Xiaoteng Jia, Shixiang Sun, Yong Liu, Bin Wang, He Tian, Peng Sun, Fangmeng Liu, Geyu Lu. Nano-Micro Letters, 2025, Issue 5, pp. 1-15 (15 pages)
Wearable pressure sensors capable of adhering comfortably to the skin hold great promise in sound detection. However, current intelligent speech assistants based on pressure sensors can only recognize standard languages, which hampers effective communication for speakers of non-standard languages. Here, we prepare an ultralight Ti_(3)C_(2)T_(x) MXene/chitosan/polyvinylidene difluoride composite aerogel with a detection range of 6.25 Pa to 1200 kPa, rapid response/recovery time, and low hysteresis (13.69%). The wearable aerogel pressure sensor can detect speech information through the throat muscle vibrations without any interference, allowing for accurate recognition of six dialects (96.2% accuracy) and seven different words (96.6% accuracy) with the assistance of convolutional neural networks. This work represents a significant step forward in silent speech recognition for human-machine interaction and physiological signal monitoring.
Keywords: Pressure sensor; Wearable sensor; Ti_(3)C_(2)T_(x) composite aerogel; Dialect speech recognition
Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
20
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3215-3245 (31 pages)
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture