“Civil discourse” amongst multiple individuals with diverse viewpoints is necessary to move toward truth, to maintain democratic buoyancy, and to get the most accurate read on how best to move forward toward our collective good. Civil discourse is nonetheless under catastrophic threat from contemporary forces that include the sloppy use of the term “hate speech”; “libelling by labeling” (aka “cancelling”) in the public square of social media; technologically powered disinformation campaigns; and the growth of “safetyism” in academia. In light of these threats, the goal must be to convince educators, particularly philosophical educators, to adopt a whole new focus in education, one that puts a spotlight on the fact that using freedom of speech to destroy the freedom of speech of others utterly undermines the positive value of freedom of speech. To motivate individuals to turn their back on the dopamine rush of shutting someone down, educators must also spend a great deal of time showcasing the merits of civil discourse by giving young people extensive experience in facilitated civil discourse (aka Communities of Philosophical Inquiry), so that its value can be woven into a personal commitment.
Background: Sickle cell anemia (SCA), a genetic hemoglobin disorder, is associated with inner ear compromise and poor auditory processing. In humans, auditory processing differs physiologically between males and females, which may also hold for SCA owing to gender-specific pathophysiological changes. Objective: To investigate gender differences in psychoacoustical abilities and speech perception in noise in individuals with SCA, and to compare them with a normal healthy (NH) population. Methods: 80 SCA and 80 NH normal-hearing participants aged 15-40 years were included and grouped by gender. Auditory discrimination for frequency, intensity, and duration at 500 Hz and 4000 Hz; temporal processing (gap detection threshold and modulation detection threshold); and Speech Perception In Noise (SPIN) at 0 dB SNR were evaluated and compared between males and females of the SCA and NH populations. Results: SCA participants performed more poorly than NH participants on all experimental measures. In the NH population, males performed more poorly than females on psychoacoustical measures, whereas within the SCA population the reverse was true. Female participants performed better on the SPIN test in both populations. Conclusions: The adverse impact of SCA on the auditory system due to circulatory changes may explain the poorer performance in SCA. The poorer performance of female SCA participants is possibly due to the compounding effect of lower hemoglobin levels on top of sickle disease. Estrogen levels and gender differences in auditory processing might underlie the better performance by females in the NH population. SPIN performance depends on attentional demands and sensorimotor processing strategies in noise beyond psychoacoustical processing, which may explain the better female performance in both populations.
To address the contradiction between the explosive growth of wireless data and the limited spectrum resources, semantic communication has been emerging as a promising communication paradigm. In this paper, we design a speech semantic coded communication system, referred to as Deep-STS (Deep-learning based Speech To Speech), for low-bandwidth speech communication. Specifically, we first deeply compress the speech data by extracting the textual information from the speech with a conformer encoder and a connectionist temporal classification decoder at the transmitter side of the Deep-STS system. To facilitate the final recovery of the speech timbre, we also extract the short-term timbre feature of the speech signal, for the starting 2 s only, with a long short-term memory network. Then, Reed-Solomon coding and a hybrid automatic repeat request protocol are applied to improve the reliability of transmitting the extracted text and timbre feature over the wireless channel. Third, we reconstruct the speech signal with a mel spectrogram prediction network and a vocoder once the extracted text is received along with the timbre feature at the receiver of the Deep-STS system. Finally, we develop a demo system based on the USRP and GNU Radio for the performance evaluation of Deep-STS. Numerical results show that the accuracy of text extraction approaches 95%, and the mel cepstral distortion between the recovered speech signal and the original one in the spectrum domain is less than 10. Furthermore, the experimental results show that the proposed Deep-STS system can reduce the total delay of speech communication by 85% on average compared to G.723 coding at a transmission rate of 5.4 kbps. More importantly, the coding rate of the proposed Deep-STS system is extremely low, only 0.2 kbps for continuous speech communication. It is worth noting that the Deep-STS with such a low coding rate can support low-zero-power speech communication, unveiling a new era in ultra-efficient coded communications. (Received: Jan. 17, 2024; Revised: Jun. 12, 2024; Editor: Niu Kai)
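The claimed 0.2 kbps figure becomes intuitive with a back-of-envelope calculation: streaming a transcript instead of a waveform needs only tens of bits per second. The speaking rate and character encoding below are illustrative assumptions, not figures from the paper.

```python
# Back-of-envelope estimate (not the authors' exact accounting) of why
# transmitting text instead of waveforms yields a sub-kbps coding rate.
# The speaking rate, chars/word, and bits/char are assumed values.

def text_coding_rate_kbps(words_per_min=150, chars_per_word=6, bits_per_char=8):
    """Bit rate needed to stream the transcript of continuous speech."""
    chars_per_sec = words_per_min * chars_per_word / 60.0
    return chars_per_sec * bits_per_char / 1000.0  # kbps

rate = text_coding_rate_kbps()
print(f"text-only rate: {rate:.2f} kbps")            # ~0.12 kbps
print(f"G.723 at 5.4 kbps is {5.4 / rate:.0f}x higher")
```

Even with generous margins for timbre features and channel coding, the text stream stays far below conventional speech-codec rates.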
Background: Research has shown that musicians outperform non-musicians in speech perception in noise (SPiN) tasks. However, it remains unclear whether the advantages of musical training are substantial enough to slow down the decline in SPiN performance associated with aging. Objectives: Therefore, we assessed SPiN performance across a continuum of age groups comprising musicians and non-musicians. The goal was to compare how the aging process affected the SPiN performance of musicians and non-musicians. Method: A cross-sectional descriptive mixed design was used, involving 150 participants divided into 75 musicians and 75 non-musicians. Each age group (10-19, 20-29, 30-39, 40-49, and 50-59) consisted of 15 musicians and 15 non-musicians. Six Kannada sentence lists were combined with four-talker babble. At +5, 0, and -5 dB signal-to-noise ratios (SNRs), the percent-correct Speech Identification Scores were calculated. Results: The repeated-measures ANOVA (RM ANOVA) revealed significant main effects and interaction effects between SNR, musicianship, and age groups (p < 0.05). Small to large effect sizes were noted (ηp² = 0.05 to 0.17). A significant interaction effect and follow-up post hoc tests showed that SPiN abilities deteriorated more rapidly with increasing age in non-musicians compared to musicians, especially at difficult SNRs. Conclusions: Musicians had better SPiN abilities than non-musicians across all age groups. Also, age-related deterioration in SPiN abilities was faster in non-musicians than in musicians.
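The +5, 0, and -5 dB conditions above imply scaling the babble relative to each sentence. A minimal sketch of such SNR mixing (not the study's actual stimulus-generation code; the signals here are synthetic) might look like:

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so speech power / noise power equals `snr_db`, then mix."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    babble = babble * np.sqrt(target_noise_power / p_babble)
    return speech + babble, babble

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for a sentence recording
babble = rng.standard_normal(16000)   # stand-in for four-talker babble
mixed, scaled = mix_at_snr(speech, babble, snr_db=-5)
achieved = 10 * np.log10(np.mean(speech**2) / np.mean(scaled**2))
print(f"achieved SNR: {achieved:.1f} dB")   # -5.0 dB by construction
```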
Digital twin technology is revolutionizing personalized healthcare by creating dynamic virtual replicas of individual patients. This paper presents a novel multi-modal architecture leveraging digital twins to enhance precision in predictive diagnostics and treatment planning for phoneme labeling. By integrating real-time images, electronic health records, and genomic information, the system enables personalized simulations for disease progression modeling, treatment response prediction, and preventive care strategies. In dysarthric speech, which is characterized by articulation imprecision, temporal misalignments, and phoneme distortions, existing models struggle to capture these irregularities. Traditional approaches, often relying solely on audio features, fail to address the full complexity of phoneme variations, leading to increased phoneme error rates (PER) and word error rates (WER). To overcome these challenges, we propose a novel multi-modal architecture that integrates both audio and articulatory data through a combination of Temporal Convolutional Networks (TCNs), Graph Convolutional Networks (GCNs), Transformer Encoders, and a cross-modal attention mechanism. The audio branch of the model uses TCNs and Transformer Encoders to capture both short- and long-term dependencies in the audio signal, while the articulatory branch leverages GCNs to model spatial relationships between articulators, such as the lips, jaw, and tongue, allowing the model to detect subtle articulatory imprecisions. A cross-modal attention mechanism fuses the encoded audio and articulatory features, enabling dynamic adjustment of the model’s focus depending on input quality, which significantly improves phoneme labeling accuracy. The proposed model consistently outperforms existing methods, achieving lower PER, WER, and Articulatory Feature Misclassification Rates (AFMR). Specifically, across all datasets, the model achieves an average PER of 13.43%, an average WER of 21.67%, and an average AFMR of 12.73%. By capturing both the acoustic and articulatory intricacies of speech, this comprehensive approach not only improves phoneme labeling precision but also marks substantial progress in speech recognition technology for individuals with dysarthria.
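The cross-modal attention fusion described above can be sketched in a few lines: audio frames form the queries, articulator nodes form the keys and values, and the attended articulatory context is added back to the audio stream. All dimensions, weight initializations, and the residual form below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_modal_attention(audio, artic, d_k=16, seed=0):
    """audio: (T, d) frame features; artic: (N, d) articulator-node features.
    Returns a (T, d) audio representation enriched with articulatory context."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((audio.shape[1], d_k)) / np.sqrt(audio.shape[1])
    Wk = rng.standard_normal((artic.shape[1], d_k)) / np.sqrt(artic.shape[1])
    scores = (audio @ Wq) @ (artic @ Wk).T / np.sqrt(d_k)   # (T, N) affinities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                 # softmax over articulators
    return audio + attn @ artic                             # residual fusion

rng = np.random.default_rng(1)
audio = rng.standard_normal((50, 32))   # 50 audio frames, 32-dim features
artic = rng.standard_normal((6, 32))    # 6 articulator nodes (lips, jaw, tongue, ...)
fused = cross_modal_attention(audio, artic)
print(fused.shape)   # (50, 32)
```

Because the attention weights depend on the audio queries, frames with degraded acoustics can lean more heavily on the articulatory branch, which is the intuition behind the "dynamic adjustment of focus" claim.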
Ba-constructions carry a high level of transitivity, but a deviation toward low transitivity occurs. It is found that mode provides the axis of this transitivity deviation, especially in spoken dialogue. Under the influence of irrealis mode, parameters such as aspect, affectedness of O, individuation of O, and affirmation exhibit different degrees of transitivity deviation. Speech acts, which are closely related to mode, are the driving force behind this phenomenon in discourse. The composition rules of the directive, declarative, commissive, and emotive speech acts, which account for the majority of speech acts in spoken dialogue, determine that they are all irrealis. Therefore, along the axis of irrealis mode, several transitivity parameters of ba-constructions in oral dialogue deviate toward low transitivity. This deviation of transitivity in ba-constructions verifies the transitivity hypothesis.
Purpose: The purpose of the study was to investigate the effect of bimodal beamforming on speech recognition and comfort for cochlear implant (CI) users with a bimodal hearing solution made up of a hearing aid linked to the CI sound processor. Methods: 19 subjects participated in this study. Speech tests were conducted in quiet and in noisy environments, with the target speech presented from 0° and the noise signal from 45°. Speech recognition thresholds (SRTs) were compared among the previously used bimodal hearing configuration (baseline: any CI sound processor plus any hearing aid), the Naída Bimodal Hearing Solution with an omnidirectional microphone, and the same solution with the directional microphone (so-called StereoZoom) switched on. In addition, the study participants provided subjective feedback on their hearing impressions. Results: The SRT results showed no significant difference among the three hearing conditions in the quiet environment. No significant improvement was found when using the Naída bimodal system with the omnidirectional microphone in noise compared to the baseline (p = 0.27). When applying StereoZoom, the SRT in noise showed significant improvements compared to the omnidirectional setting (p < 0.05). Subjective feedback showed that 13 participants were satisfied with the Naída Bimodal Hearing Solution and wanted to continue using it after the trial. Conclusion: The Naída Bimodal Hearing Solution with the same pre-processing algorithm can provide satisfying hearing performance. Beamforming technology can further improve speech perception in noisy environments.
Flipped is a novel about young teenagers by American author Wendelin Van Draanen; it was adapted into the well-known film of the same name in 2010. The thesis employs speech act theory, as pioneered by John Austin and further developed by John Searle, to investigate the influence of dialogue on characterization and plot development in Flipped. By applying speech act theory to the dialogues between characters, the author deciphers the underlying intentions embodied in the dialogues and demonstrates the importance of speech acts in revealing the characters, driving the development of the plot, and expressing the theme of the text.
With the rapid advancement of Voice over Internet Protocol (VoIP) technology, speech steganography techniques such as Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS) have emerged as significant challenges to information security. These techniques embed hidden information into speech streams, making detection increasingly difficult, particularly under conditions of low embedding rates and short speech durations. Existing steganalysis methods often struggle to balance detection accuracy and computational efficiency due to their limited ability to effectively capture both the temporal and spatial features of speech signals. To address these challenges, this paper proposes an Efficient Sliding Window Analysis Network (E-SWAN), a novel deep learning model specifically designed for real-time speech steganalysis. E-SWAN integrates two core modules: the LSTM Temporal Feature Miner (LTFM) and the Convolutional Key Feature Miner (CKFM). LTFM captures long-range temporal dependencies using Long Short-Term Memory networks, while CKFM identifies local spatial variations caused by steganographic embedding through convolutional operations. These modules operate within a sliding window framework, enabling efficient extraction of temporal and spatial features. Experimental results on the Chinese CNV and PMS datasets demonstrate the superior performance of E-SWAN. With a ten-second sample duration and an embedding rate of 10%, E-SWAN achieves a detection accuracy of 62.09% on the PMS dataset, surpassing existing methods by 4.57%, and an accuracy of 82.28% on the CNV dataset, outperforming state-of-the-art methods by 7.29%. These findings validate the robustness and efficiency of E-SWAN under low embedding rates and short durations, offering a promising solution for real-time VoIP steganalysis. This work provides significant contributions to enhancing information security in digital communications.
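The sliding-window framework that feeds both feature miners can be illustrated as follows; the window length, hop, and per-frame feature count are assumed values, not E-SWAN's actual configuration.

```python
import numpy as np

def sliding_windows(frames, win=50, hop=25):
    """Split a (T, d) per-frame feature matrix into overlapping (win, d) windows,
    so downstream temporal (LSTM-style) and local (conv-style) modules can share
    one input layout."""
    T = frames.shape[0]
    starts = range(0, T - win + 1, hop)
    return np.stack([frames[s:s + win] for s in starts])

# e.g. 500 codec frames, 3 extracted parameters per frame (illustrative)
codec_feats = np.random.default_rng(1).standard_normal((500, 3))
windows = sliding_windows(codec_feats)
print(windows.shape)   # (19, 50, 3)
```

Overlapping windows keep latency bounded for real-time analysis: a verdict can be emitted per window instead of waiting for the whole call.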
This paper delves into African American writer Octavia Butler’s Hugo Award-winning “Speech Sounds” to explore how the author uses a fictional pandemic as a metaphor to critique toxic masculinity in 1980s American culture. Analyzing the story, it reveals how the unnamed illness functions as a social pathogen, intensifying the negative aspects of hegemonic masculinity and leading to the breakdown of communication and the prevalence of violence. Through the character of Rye, the paper also examines how Black feminist resilience offers a counter-narrative to the destructive forces of toxic masculinity. The study concludes that Butler’s work not only exposes the cultural disease of toxic masculinity but also provides a vision of healing and regeneration through communal care and the cultivation of hope, highlighting the power of speculative fiction as a tool for social critique and for imagining alternative futures.
Devices in the Industrial Internet of Things are vulnerable to voice adversarial attacks. Studying adversarial speech samples is crucial for enhancing the security of automatic speech recognition systems in Industrial Internet of Things devices. Current black-box attack methods often face challenges such as complex search processes and excessive perturbation generation. To address these issues, this paper proposes a black-box voice adversarial attack method based on enhanced neural predictors. The method searches for minimal perturbations in the perturbation space, employing an optimization process guided by a self-attention neural predictor to identify the optimal perturbation direction. This direction is then applied to the original sample to generate adversarial samples. To improve search efficiency, a pruning strategy is designed to discard samples below a threshold in the early search stages, reducing the number of searches. Additionally, a dynamic factor based on feedback from querying the automatic speech recognition system is introduced to adaptively adjust the search step size, further accelerating the search process. To validate the performance of the proposed method, experiments are conducted on the LibriSpeech dataset. Compared with mainstream methods, the proposed method improves the signal-to-noise ratio by 0.8 dB, increases sample similarity by 0.43%, and reduces the average number of queries by 7%. Experimental results demonstrate that the proposed method offers better attack effectiveness and stealthiness.
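The signal-to-noise ratio used above to quantify perturbation stealthiness is straightforward to compute; this sketch shows only the metric on synthetic signals (the predictor-guided search itself is beyond the scope of an abstract).

```python
import numpy as np

def perturbation_snr_db(clean, adversarial):
    """SNR of the clean signal relative to the adversarial perturbation.
    Higher SNR means a smaller, less audible perturbation."""
    noise = adversarial - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                 # stand-in for a clean utterance
adv = clean + 0.01 * rng.standard_normal(16000)    # tiny additive perturbation
snr = perturbation_snr_db(clean, adv)
print(f"{snr:.1f} dB")
```

On this scale, the reported 0.8 dB gain means the attack's perturbation energy is noticeably lower than the baselines' at equal attack success.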
The legal protection of human dignity can be explored from the perspective of regulating “hate speech.” The practices of most countries worldwide demonstrate that human dignity serves as a fundamental value limiting the freedom of expression. Legally protected human dignity encompasses three levels of meaning: the dignity of life as an inherent aspect of human existence, the dignity of individuals as members of specific groups, and the personal dignity of individuals as unique beings. These three levels collectively emphasize the principle that human beings are ends in themselves, underscoring that individuals must not be degraded to mere means or subjected to harm. The inherent nature of human dignity necessitates its protection by both the state and societal entities. Traditionally, the safeguarding of human dignity has primarily depended on state intervention. However, with the advent of the digital age, this responsibility has increasingly extended to social entities, bringing enhanced and expanded obligations of respect. Consequently, the key to protecting human dignity lies in adjusting the allocation of responsibilities between the state and society in accordance with the development of the times. Under the guidance of human dignity as a constitutional value, China should focus on establishing a comprehensive protection system by improving legislation, law enforcement, and judicial practice. This includes specifying the obligations of social entities and constructing multi-level regulatory mechanisms to form an effective system of protection by the state and society.
AI continues to reshape industries at a rapid pace, which underscores the growing importance of standardization. Standards and conformity assessment are essential to addressing the socio-technical dimensions of AI, ensuring its safe, ethical, and inclusive adoption across different sectors.
Background: It is crucial to study the effect of changes in thresholds (T) and most comfortable levels (M) on behavioral measurements in young children using cochlear implants, as this would help the clinician with the optimization and validation of programming parameters. Objective: The study attempted to describe the changes in behavioral responses with modification of T and M levels. Methods: Twenty-five participants aged 5 to 12 years using HR90K/HiFocus1J or HR90K Advantage/HiFocus1J implants with Harmony speech processors participated in the study. Experimental programs were created from the everyday program by decreasing T levels, raising T levels, or decreasing M levels. Sound field thresholds and speech perception were measured at 50 dB HL for the three experimental programs and the everyday program. Conclusion: The results indicated that only reductions of M levels resulted in significantly (p < 0.01) poorer aided thresholds and speech perception. Variation in T levels, on the other hand, did not produce significant changes in either sound field thresholds or speech perception. The results highlight that M levels must be correctly established in order to prevent reduced speech perception and audibility.
Objectives: By investigating the distinct speech and voice phenotypes across TCM constitutions in adults, this study aims to provide a convenient and objective methodological reference for judging TCM constitution. Methods: Acoustic analysis and TCM constitution assessment were performed for all 620 participants using Praat software and the CCMQ, respectively. Results: For formant features, the speech duration of special-constitution participants was shorter than that of neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency participants when pronouncing the vowels /a/, /i/, and /u/. Compared to Yang-deficiency participants, Qi-deficiency participants had a shorter speech duration when pronouncing /i/. For /u/, blood-stasis participants exhibited a lower F1 value than neutral participants. For vocal features, special-constitution participants showed higher local jitter than neutral, dampness-heat, and Yang-deficiency participants (for /a/, /i/, and /u/), and higher absolute local jitter than neutral or dampness-heat participants. Compared with neutral or Yang-deficiency participants, special-constitution participants had a higher local shimmer (dB). Special-constitution participants had lower harmonicity autocorrelation than neutral, dampness-heat, or Yang-deficiency participants. Conclusions: Formant features may effectively differentiate the special constitution from neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency constitutions based on vowel duration measurements (/a/, /i/, /u/). For the vowel /u/, F1 values may help distinguish blood-stasis from the neutral constitution. Vocal features appear particularly useful for distinguishing the special constitution from neutral, dampness-heat, or Yang-deficiency constitutions, with local jitter and harmonicity autocorrelation showing significant discriminatory power.
BACKGROUND: Speech and language therapy (ST) might moderate the prognosis in children with attention deficit and hyperactivity disorder (ADHD) comorbid with speech delay. AIM: To investigate whether ST in children with ADHD is associated with a decreased risk of subsequent psychiatric disorders. METHODS: The population-based National Health Insurance Research Database in Taiwan was used. Hazards of subsequent psychiatric disorders were compared between those who received ST and a propensity-score-matched comparison group by Cox regression analyses. RESULTS: Of 11,987 children with ADHD identified from the dataset, 2,911 (24%) had received ST. The adjusted hazard ratio for any subsequent recorded psychiatric disorder was 0.72 (95% confidence interval: 0.63-0.82) in children who received ST compared to their matched counterparts. This protective association was statistically significant only in the subgroup that received both medication and behavioral interventions. CONCLUSION: ST can moderate the effects of integrated early interventions in children with ADHD and speech delay.
Wearable pressure sensors capable of adhering comfortably to the skin hold great promise for sound detection. However, current intelligent speech assistants based on pressure sensors can only recognize standard languages, which hampers effective communication for speakers of non-standard languages. Here, we prepare an ultralight Ti_(3)C_(2)T_(x) MXene/chitosan/polyvinylidene difluoride composite aerogel with a detection range of 6.25 Pa to 1200 kPa, rapid response/recovery time, and low hysteresis (13.69%). The wearable aerogel pressure sensor can detect speech information through throat muscle vibrations without interference, allowing accurate recognition of six dialects (96.2% accuracy) and seven different words (96.6% accuracy) with the assistance of convolutional neural networks. This work represents a significant step forward in silent speech recognition for human-machine interaction and physiological signal monitoring.
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunication and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its capability to model long-distance acoustic context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage pre-trains the speech-text multimodal encoding module to enhance inter-modal semantic alignment and long-distance acoustic context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% over the best baseline model. Case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
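The character error rate (CER) reported above is the Levenshtein edit distance between hypothesis and reference transcripts, normalized by reference length. A self-contained sketch (the ATC-style phrase below is a made-up example, not from the datasets):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / len(r)

print(cer("climb to flight level 350", "climb too flight level 350"))
# 0.04: one inserted character over 25 reference characters
```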
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62122069, 62071431, and 62201507.
Abstract: To address the contradiction between the explosive growth of wireless data and limited spectrum resources, semantic communication has been emerging as a promising communication paradigm. In this paper, we design a speech semantic coded communication system, referred to as Deep-STS (Deep-learning based Speech To Speech), for low-bandwidth speech communication. Specifically, we first deeply compress the speech data by extracting the textual information from the speech with a conformer encoder and a connectionist temporal classification decoder at the transmitter side of the Deep-STS system. To facilitate the final speech timbre recovery, we also extract the short-term timbre feature of the speech signal, for the starting 2 s duration only, with a long short-term memory network. Then, Reed-Solomon coding and a hybrid automatic repeat request protocol are applied to improve the reliability of transmitting the extracted text and timbre feature over the wireless channel. Third, we reconstruct the speech signal with a mel spectrogram prediction network and a vocoder when the extracted text is received along with the timbre feature at the receiver of the Deep-STS system. Finally, we develop a demo system based on USRP and GNU Radio for the performance evaluation of Deep-STS. Numerical results show that the accuracy of text extraction approaches 95%, and the mel cepstral distortion between the recovered speech signal and the original one in the spectrum domain is less than 10. Furthermore, the experimental results show that the proposed Deep-STS system can reduce the total delay of speech communication by 85% on average compared to G.723 coding at a transmission rate of 5.4 kbps. More importantly, the coding rate of the proposed Deep-STS system is extremely low, only 0.2 kbps for continuous speech communication. It is worth noting that Deep-STS with such a low coding rate can support low- to zero-power speech communication, unveiling a new era in ultra-efficient coded communications. (Received: Jan. 17, 2024; Revised: Jun. 12, 2024; Editor: Niu Kai)
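The mel cepstral distortion (MCD) figure reported above is a standard spectral-distance measure between mel-cepstral coefficient (MCC) frames of the original and recovered speech. A minimal sketch of the conventional per-frame formula follows (the function name is illustrative; the paper's exact MCD variant is not specified):

```python
import math

def mel_cepstral_distortion(mcc_ref, mcc_syn):
    """MCD in dB between two mel-cepstral coefficient vectors.

    Conventionally c0 (overall energy) is excluded before calling this.
    MCD = (10 / ln 10) * sqrt(2 * sum_k (c_ref[k] - c_syn[k])^2)
    """
    sq = sum((a - b) ** 2 for a, b in zip(mcc_ref, mcc_syn))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq)
```

In practice the per-frame values are averaged over time-aligned frames of the two utterances.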
Abstract: Background: Research has shown that musicians outperform non-musicians in speech perception in noise (SPiN) tasks. However, it remains unclear whether the advantages of musical training are substantial enough to slow down the decline in SPiN performance associated with aging. Objectives: Therefore, we assessed SPiN performance across a continuum of age groups comprising musicians and non-musicians. The goal was to compare how the aging process affected the SPiN performance of musicians and non-musicians. Method: A cross-sectional descriptive mixed design was used, involving 150 participants divided into 75 musicians and 75 non-musicians. Each age group (10-19, 20-29, 30-39, 40-49, and 50-59) consisted of 15 musicians and 15 non-musicians. Six Kannada sentence lists were combined with four-talker babble. At +5, 0, and -5 dB signal-to-noise ratios (SNRs), percent-correct Speech Identification Scores were calculated. Results: The repeated measures ANOVA (RM ANOVA) revealed significant main effects and interaction effects between SNR, musicianship, and age group (p < 0.05). Small to large effect sizes were noted (ηp² = 0.05 to 0.17). A significant interaction effect and follow-up post hoc tests showed that SPiN abilities deteriorated more rapidly with increasing age in non-musicians compared to musicians, especially at difficult SNRs. Conclusions: Musicians had better SPiN abilities than non-musicians across all age groups. Also, age-related deterioration in SPiN abilities was faster in non-musicians than in musicians.
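The partial eta squared (ηp²) effect sizes reported above are computed directly from the ANOVA sums of squares. A one-line sketch of the definition (the function name and the benchmark thresholds in the comment are conventional, not taken from the paper):

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared effect size: SS_effect / (SS_effect + SS_error).

    Common rule-of-thumb benchmarks: ~0.01 small, ~0.06 medium, ~0.14 large,
    so the reported range 0.05-0.17 spans small-to-large effects.
    """
    return ss_effect / (ss_effect + ss_error)
```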
Funding: Funded by the Ongoing Research Funding program (ORF-2025-867), King Saud University, Riyadh, Saudi Arabia.
Abstract: Digital twin technology is revolutionizing personalized healthcare by creating dynamic virtual replicas of individual patients. This paper presents a novel multi-modal architecture that leverages digital twins to enhance precision in predictive diagnostics and treatment planning through phoneme labeling. By integrating real-time images, electronic health records, and genomic information, the system enables personalized simulations for disease progression modeling, treatment response prediction, and preventive care strategies. Dysarthric speech is characterized by articulation imprecision, temporal misalignments, and phoneme distortions, and existing models struggle to capture these irregularities. Traditional approaches, often relying solely on audio features, fail to address the full complexity of phoneme variations, leading to increased phoneme error rates (PER) and word error rates (WER). To overcome these challenges, we propose a novel multi-modal architecture that integrates both audio and articulatory data through a combination of Temporal Convolutional Networks (TCNs), Graph Convolutional Networks (GCNs), Transformer Encoders, and a cross-modal attention mechanism. The audio branch of the model utilizes TCNs and Transformer Encoders to capture both short- and long-term dependencies in the audio signal, while the articulatory branch leverages GCNs to model spatial relationships between articulators, such as the lips, jaw, and tongue, allowing the model to detect subtle articulatory imprecisions. A cross-modal attention mechanism fuses the encoded audio and articulatory features, enabling dynamic adjustment of the model's focus depending on input quality, which significantly improves phoneme labeling accuracy. The proposed model consistently outperforms existing methods, achieving lower Phoneme Error Rates (PER), Word Error Rates (WER), and Articulatory Feature Misclassification Rates (AFMR). Specifically, across all datasets, the model achieves an average PER of 13.43%, an average WER of 21.67%, and an average AFMR of 12.73%. By capturing both the acoustic and articulatory intricacies of speech, this comprehensive approach not only improves phoneme labeling precision but also marks substantial progress in speech recognition technology for individuals with dysarthria.
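The cross-modal attention fusion described above can be illustrated with plain single-head scaled dot-product attention, where, for example, audio-branch queries attend over articulatory keys and values. This is only a schematic sketch, not the paper's implementation; all names and dimensions are hypothetical:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: e.g. audio-branch feature vectors; keys/values: e.g.
    articulatory-branch feature vectors. Returns one fused vector per query.
    """
    d = len(keys[0])  # key dimension, used for score scaling
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

In the paper's architecture this fusion would be learned end-to-end with projection matrices; the sketch shows only the attention arithmetic itself.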
Abstract: Ba-constructions carry a high level of transitivity, but deviations toward low transitivity occur. It is found that mode provides the axis of this transitivity deviation, especially in spoken dialogue. Under the influence of irrealis mode, parameters such as aspect, affectedness of O, individuation of O, and affirmation exhibit different degrees of transitivity deviation. Speech acts, which are closely related to mode, are the driving force of this phenomenon in discourse. The composition rules of the directive, declarative, commitment, and emotive speech acts, which account for the majority of speech acts in spoken dialogue, determine that they are all irrealis. Therefore, along the axis of irrealis mode, several transitivity parameters of ba-constructions in oral dialogue deviate toward low transitivity. This deviation of transitivity in ba-constructions verifies the transitivity hypothesis.
Funding: Supported by grants from the Capital's Funds for Health Improvement and Research (No. 2022-1-2023), the National Natural Science Foundation of China (No. 82371148), and the Open Project of the National Clinical Research Center for Otolaryngologic Diseases (202200010).
Abstract: Purpose: The purpose of the study was to investigate the effect of bimodal beamforming on speech recognition and comfort for cochlear implant (CI) users with the bimodal hearing solution formed by linking a hearing aid to the CI sound processor. Methods: 19 subjects participated in this study. Speech tests were conducted in quiet and in noisy environments, with the target speech presented from 0° and the noise signal from 45°. Speech recognition thresholds (SRTs) were compared among the previously used bimodal hearing configuration (baseline: any CI sound processor plus any hearing aid), the Naída Bimodal Hearing Solution with the omnidirectional microphone, and the same solution with the directional microphone (so-called StereoZoom) switched on. In addition, the study participants provided subjective feedback on their hearing impressions. Results: The SRT results showed no significant difference among the three hearing conditions in the quiet environment. No significant improvement was found when using the Naída bimodal system with the omnidirectional microphone in noise compared to the baseline (p = 0.27). When applying StereoZoom, the SRT in noise showed significant improvement compared to the omnidirectional setting (p < 0.05). Subjective feedback showed that 13 participants were satisfied with the Naída Bimodal Hearing Solution and wanted to continue using it after the trial. Conclusion: The Naída Bimodal Hearing Solution with the same pre-processing algorithm can provide satisfying hearing performance. Beamforming technology can further improve speech perception in noisy environments.
Abstract: Flipped is a book written by American author Wendelin Van Draanen. It is a novel about young teenagers and was adapted into the famous film of the same name in 2010. The thesis employs speech act theory, as pioneered by John Austin and further developed by John Searle, to investigate the influence of dialogue on characterization and plot development in Flipped. By applying speech act theory to the dialogues between characters, the author deciphers the underlying intentions embodied in the dialogues and demonstrates the importance of speech acts in dialogue in revealing the characters, driving the development of the plot, and expressing the theme of the text.
Funding: Supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LQ20F020004, and in part by the National College Student Innovation and Research Training Program under Grant 202313283002.
Abstract: With the rapid advancement of Voice over Internet Protocol (VoIP) technology, speech steganography techniques such as Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS) have emerged as significant challenges to information security. These techniques embed hidden information into speech streams, making detection increasingly difficult, particularly under conditions of low embedding rates and short speech durations. Existing steganalysis methods often struggle to balance detection accuracy and computational efficiency due to their limited ability to effectively capture both temporal and spatial features of speech signals. To address these challenges, this paper proposes the Efficient Sliding Window Analysis Network (E-SWAN), a novel deep learning model specifically designed for real-time speech steganalysis. E-SWAN integrates two core modules: the LSTM Temporal Feature Miner (LTFM) and the Convolutional Key Feature Miner (CKFM). LTFM captures long-range temporal dependencies using Long Short-Term Memory networks, while CKFM identifies local spatial variations caused by steganographic embedding through convolutional operations. These modules operate within a sliding window framework, enabling efficient extraction of temporal and spatial features. Experimental results on the Chinese CNV and PMS datasets demonstrate the superior performance of E-SWAN. With a ten-second sample duration and an embedding rate of 10%, E-SWAN achieves a detection accuracy of 62.09% on the PMS dataset, surpassing existing methods by 4.57%, and an accuracy of 82.28% on the CNV dataset, outperforming state-of-the-art methods by 7.29%. These findings validate the robustness and efficiency of E-SWAN under low embedding rates and short durations, offering a promising solution for real-time VoIP steganalysis. This work provides a significant contribution to enhancing information security in digital communications.
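The sliding window framework that feeds LTFM and CKFM can be sketched as a simple overlapped segmentation of the frame-level feature sequence (a schematic only; the function name, window size, and hop size are hypothetical, not taken from the paper):

```python
def sliding_windows(frames, win, hop):
    """Split a feature sequence into overlapping analysis windows.

    frames: list of per-frame features; win: window length in frames;
    hop: stride between consecutive window starts. Only full windows
    are returned, as is typical for fixed-size network inputs.
    """
    return [frames[i:i + win] for i in range(0, len(frames) - win + 1, hop)]
```

Each window would then be passed through the temporal (LSTM) and spatial (convolutional) branches independently, keeping per-window latency low for real-time analysis.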
Funding: Supported by the Ministry of Education Humanities and Social Science Project, "A Study of the Writing of Futures in Contemporary Science Fiction by African American Women" (Grant No. 22YJC752010).
Abstract: This paper delves into African American writer Octavia Butler's Hugo Award-winning "Speech Sounds" to explore how the author uses a fictional pandemic as a metaphor to critique toxic masculinity in 1980s American culture. The analysis reveals how the unnamed illness functions as a social pathogen, intensifying the negative aspects of hegemonic masculinity and leading to the breakdown of communication and the prevalence of violence. Through the character of Rye, the paper also examines how black feminist resilience offers a counter-narrative to the destructive forces of toxic masculinity. The study concludes that Butler's work not only exposes the cultural disease of toxic masculinity but also provides a vision of healing and regeneration through communal care and the cultivation of hope, highlighting the power of speculative fiction as a tool for social critique and for imagining alternative futures.
Funding: Supported in part by the Natural Science Foundation of China under Grants 62273272, 62303375, and 61873277; in part by the Key Research and Development Program of Shaanxi Province under Grants 2024CY2-GJHX-49 and 2024CY2-GJHX-43; in part by the Youth Innovation Team of Shaanxi Universities; and in part by the Key Scientific Research Program of the Education Department of Shaanxi Province under Grant 24JR111.
Abstract: Devices in the Industrial Internet of Things are vulnerable to voice adversarial attacks. Studying adversarial speech samples is crucial for enhancing the security of automatic speech recognition systems in Industrial Internet of Things devices. Current black-box attack methods often face challenges such as complex search processes and excessive perturbation generation. To address these issues, this paper proposes a black-box voice adversarial attack method based on enhanced neural predictors. The method searches for minimal perturbations in the perturbation space, employing an optimization process guided by a self-attention neural predictor to identify the optimal perturbation direction. This direction is then applied to the original sample to generate adversarial samples. To improve search efficiency, a pruning strategy is designed to discard samples below a threshold in the early search stages, reducing the number of searches. Additionally, a dynamic factor based on feedback from querying the automatic speech recognition system is introduced to adaptively adjust the search step size, further accelerating the search. To validate the performance of the proposed method, experiments are conducted on the LibriSpeech dataset. Compared with mainstream methods, the proposed method improves the signal-to-noise ratio by 0.8 dB, increases sample similarity by 0.43%, and reduces the average number of queries by 7%. Experimental results demonstrate that the proposed method offers better attack effectiveness and stealthiness.
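The reported 0.8 dB gain refers to the signal-to-noise ratio of the clean signal relative to the adversarial perturbation, where a higher SNR means a quieter, stealthier perturbation. A minimal sketch of the standard definition (function and variable names are illustrative, not the authors' evaluation code):

```python
import math

def perturbation_snr_db(clean, adversarial):
    """SNR in dB of the clean signal relative to the adversarial perturbation.

    clean, adversarial: equal-length sample lists; the perturbation is
    their elementwise difference. Higher values mean a less audible attack.
    """
    p_signal = sum(x * x for x in clean)
    p_perturb = sum((a - x) ** 2 for x, a in zip(clean, adversarial))
    return 10.0 * math.log10(p_signal / p_perturb)
```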
Abstract: The legal protection of human dignity can be explored from the perspective of regulating "hate speech." The practices of most countries worldwide demonstrate that human dignity serves as a fundamental value limiting the freedom of expression. Legally protected human dignity encompasses three levels of meaning: the dignity of life as an inherent aspect of human existence, the dignity of individuals as members of specific groups, and the personal dignity of individuals as unique beings. These three levels collectively emphasize the principle that human beings are ends in themselves, underscoring that individuals must not be degraded to mere means or subjected to harm. The inherent nature of human dignity necessitates its protection by both the state and societal entities. Traditionally, the safeguarding of human dignity has primarily depended on state intervention. However, with the advent of the digital age, this responsibility has increasingly extended to social entities, imposing enhanced and expanded obligations of respect. Consequently, the key to protecting human dignity lies in adjusting the allocation of responsibilities between the state and society in accordance with the development of the times. Under the guidance of human dignity as a constitutional value, China should focus on establishing a comprehensive protection system by improving legislation, law enforcement, and judicial practice. This includes specifying the obligations of social entities and constructing multi-level regulatory mechanisms to form an effective system of protection by the state and society.
Abstract: AI continues to reshape industries at a rapid pace, which underscores the growing importance of standardization. Standards and conformity assessment are essential to addressing the socio-technical dimensions of AI, ensuring its safe, ethical, and inclusive adoption across different sectors.
Abstract: Background: It is crucial to study the effect of changes in thresholds (T) and most comfortable levels (M) on behavioral measurements in young children using cochlear implants. This would help the clinician with the optimization and validation of programming parameters. Objective: The study attempted to describe the changes in behavioral responses with modification of T and M levels. Methods: Twenty-five participants in the age range of 5 to 12 years using HR90K/HiFocus1J or HR90K Advantage/HiFocus1J implants with Harmony speech processors participated in the study. Experimental programs were created from the everyday program by decreasing T levels, raising T levels, or decreasing M levels. Sound field thresholds and speech perception were measured at 50 dB HL for the three experimental programs and the everyday program. Conclusion: The results indicated that only reductions of M levels resulted in significantly (p < 0.01) poorer aided thresholds and speech perception. On the other hand, variation in T levels did not produce significant changes in either sound field thresholds or speech perception. The results highlight that M levels must be correctly established in order to prevent decreased audibility and speech perception.
Funding: Supported by the National Natural Science Foundation of China (Nos. 81730107 and 81973883), the National Science & Technology Basic Research Project (No. 2015FY111700), and the Shanghai Pudong New District New Area Project (No. PW2022A-78(WQZ)).
Abstract: Objectives: By investigating the distinct speech and voice phenotypes among TCM constitutions in adults, this study aims to provide a convenient and objective methodological reference for judging TCM constitution. Methods: Acoustic analysis and TCM constitution assessment were performed for all 620 participants using Praat software and the CCMQ, respectively. Results: For formant features, the speech duration of special-constitution participants was shorter than that of neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency participants when pronouncing the vowels /a/, /i/, and /u/. Compared with Yang-deficiency participants, Qi-deficiency participants had a shorter speech duration when pronouncing /i/. For /u/, blood-stasis participants exhibited a lower F1 value than neutral participants. For vocal features, special-constitution participants showed higher local jitter than neutral, dampness-heat, and Yang-deficiency participants (for /a/, /i/, and /u/), and higher absolute local jitter than neutral or dampness-heat participants. Compared with neutral or Yang-deficiency participants, special-constitution participants showed a higher local shimmer (dB). Special-constitution participants had a lower harmonicity autocorrelation than neutral, dampness-heat, or Yang-deficiency participants. Conclusions: Formant features may effectively differentiate the special constitution from the neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency constitutions based on vowel duration measurements (/a/, /i/, /u/). For the vowel /u/, F1 values may help distinguish the blood-stasis from the neutral constitution. Vocal features appear particularly useful for distinguishing the special constitution from the neutral, dampness-heat, or Yang-deficiency constitutions, with local jitter and harmonicity autocorrelation showing significant discriminatory power.
Abstract: BACKGROUND: Speech and language therapy (ST) might moderate the prognosis in children with attention deficit and hyperactivity disorder (ADHD) comorbid with speech delay. This study investigated whether ST in children with ADHD is associated with a decreased risk of subsequent psychiatric disorders. AIM: To investigate whether ST in children with ADHD is associated with a decreased risk of subsequent psychiatric disorders. METHODS: The population-based National Health Insurance Research Database in Taiwan was used. Hazards of subsequent psychiatric disorders were compared between those who received ST and a propensity-score matched comparison group by Cox regression analyses. RESULTS: Of 11987 children with ADHD identified from the dataset, 2911 (24%) had received ST. The adjusted hazard ratio for any subsequent recorded psychiatric disorder was 0.72 (95% confidence interval: 0.63-0.82) in children who received ST compared to the matched counterparts. This protective association was only statistically significant in the subgroup that received both medication and behavioral interventions. CONCLUSION: ST can moderate the effects of integrated early interventions in ADHD children with speech delay.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62122030, 62333008, 62371205, and 52103208), the National Key Research and Development Program of China (No. 2021YFB3201300), the Application and Basic Research Program of Jilin Province (20130102010JC), the Fundamental Research Funds for the Central Universities, and the Jilin Provincial Science and Technology Development Program (20230101072JC).
Abstract: Wearable pressure sensors capable of adhering comfortably to the skin hold great promise in sound detection. However, current intelligent speech assistants based on pressure sensors can only recognize standard languages, which hampers effective communication for speakers of non-standard languages. Here, we prepare an ultralight Ti₃C₂Tₓ MXene/chitosan/polyvinylidene difluoride composite aerogel with a detection range of 6.25 Pa to 1200 kPa, rapid response/recovery times, and low hysteresis (13.69%). The wearable aerogel pressure sensor can detect speech information through throat muscle vibrations without interference, allowing for accurate recognition of six dialects (96.2% accuracy) and seven different words (96.6% accuracy) with the assistance of convolutional neural networks. This work represents a significant step forward in silent speech recognition for human-machine interaction and physiological signal monitoring.
Funding: This research was funded by the Shenzhen Science and Technology Program (Grant No. RCBS20221008093121051), the General Higher Education Project of the Guangdong Provincial Education Department (Grant No. 2020ZDZX3085), the China Postdoctoral Science Foundation (Grant No. 2021M703371), and the Post-Doctoral Foundation Project of Shenzhen Polytechnic (Grant No. 6021330002K).
Abstract: In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its capability for modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and auditory long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
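The character error rate (CER) figures above are computed as the Levenshtein edit distance between reference and hypothesis transcripts, normalized by the reference length. A minimal sketch of the standard dynamic-programming computation (illustrative, not the authors' evaluation code):

```python
def character_error_rate(ref, hyp):
    """CER = (substitutions + insertions + deletions) / len(ref).

    Uses the classic single-row Levenshtein dynamic program, where
    d[j] holds the edit distance between ref[:i] and hyp[:j].
    """
    m, n = len(ref), len(hyp)
    d = list(range(n + 1))  # distances for the empty reference prefix
    for i in range(1, m + 1):
        prev = d[0]          # d[i-1][j-1] from the previous row
        d[0] = i             # deleting i reference characters
        for j in range(1, n + 1):
            cur = d[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev + cost)   # match or substitution
            prev = cur
    return d[n] / m
```

A CER of 6.54% thus means roughly 6.5 character edits per 100 reference characters.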