“We are together. We are friends forever. Nothing can break the bond between you and me.” On December 30, 2025, these heartfelt lyrics, sung in both Chinese and English, filled the historic Erqi Theater in central Beijing. They were performed by 54 young vocalists from the One Voice Children’s Choir of Utah, U.S., joined on stage by a dozen students from the High School Affiliated to Xi’an Jiaotong University. Ranging in age from five to 18, the choir delivered a spectacular performance that visibly captivated the audience.
With the popularization of new technologies, telephone fraud has become a major means of stealing money and personal identity information. Taking inspiration from the website authentication mechanism, we propose an end-to-end data modem scheme that transmits the caller’s digital certificates through a voice channel so that the recipient can verify the caller’s identity. Encoding useful information through voice channels is very difficult without the assistance of telecommunications providers. For example, speech activity detection may quickly classify encoded signals as non-speech signals and reject the input waveforms. To address this issue, we propose a novel modulation method based on linear frequency modulation that encodes 3 bits per symbol by varying its frequency, shape, and phase, alongside a lightweight MobileNetV3-Small-based demodulator for efficient and accurate signal decoding on resource-constrained devices. This method leverages the unique characteristics of linear frequency modulation signals, making them easier to transmit and decode in speech channels. To ensure reliable data delivery over unstable voice links, we further introduce a robust framing scheme with delimiter-based synchronization, a sample-level position-remedying algorithm, and a feedback-driven retransmission mechanism. We have validated the feasibility and performance of our system through expanded real-world evaluations, demonstrating that it outperforms existing advanced methods in terms of robustness and data transfer rate. This technology establishes the foundational infrastructure for reliable certificate delivery over voice channels, which is crucial for achieving strong caller authentication and preventing telephone fraud at its root cause.
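The 3-bits-per-symbol idea above can be illustrated with a minimal sketch. It assumes one bit selects the frequency band, one bit selects the chirp shape (up-sweep or down-sweep), and one bit selects the initial phase (0 or π). The sample rate, symbol length, band edges, and the function names `modulate_symbol` and `demodulate_symbol` are all illustrative assumptions, not values from the paper. The demodulator here is a brute-force correlation against the eight candidate chirps, swapped in as a simple stand-in for the MobileNetV3-Small-based demodulator the abstract describes.

```python
import math

SAMPLE_RATE = 8000   # Hz; assumed narrowband voice sampling rate
SYMBOL_LEN = 160     # samples per symbol (20 ms); assumed
BANDS = [(600.0, 1200.0), (1500.0, 2100.0)]  # assumed sweep bands in Hz


def modulate_symbol(bits):
    """Map 3 bits (band, shape, phase) to one linear chirp."""
    band_bit, shape_bit, phase_bit = bits
    f_lo, f_hi = BANDS[band_bit]
    # shape bit: 0 = up-chirp (low to high), 1 = down-chirp (high to low)
    f0, f1 = (f_lo, f_hi) if shape_bit == 0 else (f_hi, f_lo)
    phase0 = math.pi * phase_bit            # phase bit: 0 or pi
    duration = SYMBOL_LEN / SAMPLE_RATE
    sweep_rate = (f1 - f0) / duration       # Hz per second
    samples = []
    for n in range(SYMBOL_LEN):
        t = n / SAMPLE_RATE
        # instantaneous phase of a linear chirp: 2*pi*(f0*t + k*t^2/2)
        angle = 2 * math.pi * (f0 * t + 0.5 * sweep_rate * t * t) + phase0
        samples.append(math.cos(angle))
    return samples


def demodulate_symbol(samples):
    """Return the 3 bits whose reference chirp correlates best with the input."""
    best_bits, best_score = None, -math.inf
    for value in range(8):
        bits = ((value >> 2) & 1, (value >> 1) & 1, value & 1)
        reference = modulate_symbol(bits)
        score = sum(a * b for a, b in zip(samples, reference))
        if score > best_score:
            best_bits, best_score = bits, score
    return best_bits
```

On a clean channel, `demodulate_symbol(modulate_symbol(bits))` recovers the transmitted bits. A real voice link adds codec distortion and timing offsets, which is where the framing, synchronization, and retransmission mechanisms mentioned in the abstract would come in.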
Parkinson’s disease remains a major clinical issue in terms of early detection, especially during its prodromal stage, when symptoms are not evident or not distinct. To address this problem, we propose a new deep learning-based approach for detecting Parkinson’s disease during the prodromal stage, before overt symptoms develop. We used five publicly accessible datasets, namely UCI Parkinson’s Voice, Spiral Drawings, PaHaW, NewHandPD, and PPMI, and implemented a dual-stream CNN–BiLSTM architecture with Fisher-weighted feature merging and SHAP-based explanation. The model achieved 98.2% accuracy, an F1-score of 0.981, and an AUC of 0.991 on the UCI Voice dataset. Its performance on the remaining datasets was comparable, with an accuracy improvement of 2–7 percent over existing strong models such as CNN–RNN–MLP, ILN–GNet, and CASENet. Overall, the findings support the diagnostic promise of micro-tremor assessment and demonstrate that combining temporal and spatial features with a scatter-based segment in a multi-modal approach can be an effective and scalable platform for an early, interpretable PD screening system.
Research on adaptive deformable mirror technology for voice coil actuators (VCAs) is an important trend in the development of large ground-based telescopes. A voice coil adaptive deformable mirror contains a large number of actuators, and there are problems with structural coupling and large temperature increases in their internal coils. Additionally, the parameters of traditional proportional integral derivative (PID) control cannot be adjusted in real time to adapt to system changes. These problems can be addressed by introducing fuzzy control methods. A table lookup method is adopted to replace the real-time calculations of the regular fuzzy controller during the control process, and a prototype platform has been established to verify the effectiveness and robustness of this process. Experimental tests compare the control performance of traditional and fuzzy proportional integral derivative (Fuzzy-PID) controllers, showing that, in system step response tests, the fuzzy control system reduces rise time by 20.25%, decreases overshoot by 78.24%, and shortens settling time by 67.59%. In disturbance rejection experiments, fuzzy control achieves a 46.09% reduction in the maximum deviation, indicating stronger robustness. The Fuzzy-PID controller, based on table lookup, significantly outperforms the standard controller, showing excellent potential for enhancing the dynamic performance and disturbance rejection capability of the voice coil motor actuator system.
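The table-lookup idea can be sketched as follows: the fuzzy inference is run offline once for every pair of quantized error and error-change levels, and the controller then only indexes the resulting table at run time. This is a minimal sketch under stated assumptions. The seven quantization levels, the rule in `example_rule`, the gain values, and the class name `TableFuzzyPID` are hypothetical and are not taken from the paper.

```python
LEVELS = [-3, -2, -1, 0, 1, 2, 3]  # assumed quantized levels for error and error change


def example_rule(e_level, ec_level):
    """Hypothetical offline rule returning gain adjustments (dKp, dKi, dKd)."""
    dkp = 0.05 * abs(e_level) - 0.02 * abs(ec_level)  # more Kp for large errors
    dki = 0.01 * (3 - abs(e_level))                   # more Ki near the setpoint
    dkd = 0.02 * abs(ec_level)                        # more Kd for fast changes
    return (dkp, dki, dkd)


def build_table(rule):
    """Precompute adjustments for every (error, error-change) level pair."""
    return [[rule(e, ec) for ec in LEVELS] for e in LEVELS]


def quantize(value, full_scale):
    """Clamp value to [-full_scale, full_scale] and map it to a table index 0..6."""
    ratio = max(-1.0, min(1.0, value / full_scale))
    return round(ratio * 3) + 3


class TableFuzzyPID:
    def __init__(self, kp, ki, kd, e_scale, ec_scale, table):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e_scale, self.ec_scale = e_scale, ec_scale
        self.table = table
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        d_error = (error - self.prev_error) / dt
        # Run-time cost is one table lookup instead of full fuzzy inference.
        row = quantize(error, self.e_scale)
        col = quantize(d_error, self.ec_scale)
        dkp, dki, dkd = self.table[row][col]
        self.integral += error * dt
        self.prev_error = error
        return ((self.kp + dkp) * error
                + (self.ki + dki) * self.integral
                + (self.kd + dkd) * d_error)
```

The design trade-off is memory for computation: the table costs 49 entries here, while each control step avoids fuzzification, rule evaluation, and defuzzification.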
Expert systems (ES) are considered effective and efficient in agricultural production, as agricultural informatization becomes a main trend in agricultural development. However, ES are applied unsatisfactorily in most rural areas of China, which has considerably restricted the development of agricultural informatization. This paper proposes a voice service system for ES that is suitable for information transmission and, in particular, can help farmers in remote regions obtain knowledge from the ES. To address the drawbacks of massive knowledge data and slow deduction, the system adopts a classification method based on a decision tree. A pruning algorithm that trims off knowledge unrelated to the user during the query simplifies the structure of the decision tree and speeds up deduction before the inference engine deduces the knowledge required by the user.
In this paper, an expert system for security based on biometric human features that can be obtained without any contact with the registering sensor is presented. These features are extracted from the human voice, so the system is called the Voice Recognition System (VRS). The proposed system consists of three stages: signal pre-processing, feature extraction using the Wavelet Packet Transform (WPT), and feature matching using Artificial Neural Networks (ANNs). The feature vectors are formed in two steps: first, the speech signal is decomposed at level 7 with the Daubechies 20-tap (db20) wavelet; second, the energy corresponding to each WPT node is calculated, and these energies are collected to form a feature vector. A 128-element feature vector for each speaker was fed to the Feed Forward Back-propagation Neural Network (FFBPNN). The data used in this paper are drawn from the English Language Speech Database for Speaker Recognition (ELSDSR), which is composed of audio files for training and other files for testing. The performance of the proposed system is evaluated using the test files. Our results showed that the rate of correct recognition of the proposed system is about 100% for training files and 95.7% for one testing file for each speaker from the ELSDSR database. The proposed method performed better than the well-known Mel Frequency Cepstral Coefficient (MFCC) and the Zak transform.
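The feature-extraction step can be illustrated with a small sketch: a full wavelet packet decomposition to level 7 yields 2^7 = 128 terminal nodes, and the energy of each node gives one element of the 128-element feature vector. To stay self-contained, this sketch substitutes the two-tap Haar filter for the db20 wavelet named in the abstract, so the numeric values will differ from the paper's. The function names are illustrative, and the signal length is assumed to be a multiple of 2^level.

```python
import math


def haar_split(signal):
    """One orthonormal Haar analysis step: (approximation, detail) half-bands."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wpt_energy_features(signal, level=7):
    """Split every node at every level, then return the energy of each leaf node."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])  # both branches are kept (packet tree)
        nodes = next_nodes
    # Leaves are in natural (split) order, not sorted by frequency.
    return [sum(x * x for x in node) for node in nodes]
```

Because the Haar step is orthonormal, the leaf energies sum to the energy of the input signal, which is a convenient sanity check on any implementation of this kind.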
In this paper, the key techniques and approaches for making an integrated text and voice paging system practical are discussed. Based on the analyses, a 2,400 bps integrated experimental paging system fully compatible with the POCSAG system is presented. The theory
College classes are becoming increasingly large. A critical component in scaling class size is the collaboration and interaction among instructors, teaching assistants, and students. We develop a prototype of an intelligent voice instructor-assistant system for supporting large classes, in which Amazon Web Services, Alexa Voice Services, and self-developed services are used. It uses a scraping service to read the questions and answers from past and current course discussion boards, organizes the questions in JavaScript Object Notation format, and stores them in a database that can be accessed by Amazon Web Services Alexa skills. When a voice question from a student arrives, Alexa is used to translate the spoken sentence into text. Then, a Siamese deep long short-term memory model is introduced to calculate the similarity between the question asked and the questions in the database to find the best-matched answer. Questions with no match are sent to the instructor, and the instructor’s answer is added to the database. Experiments show that the implemented model achieves promising results that can lead to a practical system. The intelligent voice instructor-assistant system starts with a small set of questions and can grow through learning and improving as more and more questions are asked and answered.
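The match-or-escalate flow described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: a bag-of-words cosine similarity is swapped in as a simple stand-in for the Siamese long short-term memory model, the threshold of 0.6 is arbitrary, and the function names and the dictionary-based store are hypothetical rather than the system's actual interface.

```python
import math
from collections import Counter


def similarity(a, b):
    """Cosine similarity over word counts (stand-in for the Siamese LSTM score)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def answer_question(question, qa_store, threshold=0.6):
    """Return the stored answer of the best-matching question, or None if no match."""
    best = max(qa_store, key=lambda q: similarity(question, q), default=None)
    if best is not None and similarity(question, best) >= threshold:
        return qa_store[best]
    return None  # the caller forwards the question to the instructor


def add_instructor_answer(question, answer, qa_store):
    """Grow the database with a newly answered question."""
    qa_store[question] = answer
```

A short usage example, with made-up course questions:

```python
store = {"when is the homework due": "Friday at noon"}
print(answer_question("when is homework due", store))  # matched from the store
if answer_question("where is the lab", store) is None:
    add_instructor_answer("where is the lab", "Room 210", store)
```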
This paper presents an improved voice conversion method based on Gaussian mixture models (GMMs) that changes the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMMs are trained to generate the conversion function. The spectrum features of the source speech are converted by the conversion function. The time-scale of speech is changed by extracting the converted features and adding them to the spectrum. The converted voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker but is also more natural and more intelligible.
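The abstract does not state the conversion function itself. For orientation only, a commonly used form in GMM-based voice conversion (the joint-density formulation) is shown below. It is assumed here as an illustration and may differ from the paper's exact definition. The symbols are assumptions: $x$ is a source spectral feature vector, $M$ is the number of mixture components, $\alpha_i$ are mixture weights, $\mu_i^{x}$ and $\mu_i^{y}$ are the source and target means of component $i$, and $\Sigma_i^{xx}$ and $\Sigma_i^{yx}$ are the source covariance and the target-source cross-covariance.

```latex
F(x) = \sum_{i=1}^{M} p_i(x)\,\Bigl[\mu_i^{y} + \Sigma_i^{yx}\bigl(\Sigma_i^{xx}\bigr)^{-1}\bigl(x - \mu_i^{x}\bigr)\Bigr],
\qquad
p_i(x) = \frac{\alpha_i\,\mathcal{N}\bigl(x;\,\mu_i^{x},\,\Sigma_i^{xx}\bigr)}
              {\sum_{j=1}^{M} \alpha_j\,\mathcal{N}\bigl(x;\,\mu_j^{x},\,\Sigma_j^{xx}\bigr)}
```

Each component contributes a linear regression from source to target features, and the posterior weights $p_i(x)$ blend those regressions smoothly across the acoustic space.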
With the rapid development of informatization, people habitually rely on electronic devices. Thanks to the convenience and efficiency these devices provide, people can handle almost everything on their phones and PCs. The security of the electronic devices that people use in daily life has therefore become much more important. Experts in this area have tried many ways to keep data and users safe from attacks. This paper designs a Voice Based Biometric Security System, which is able to reinforce the security level of the target device. With its help, electronic devices will be more secure.
Non-Terrestrial Networks (NTN) can be used to provide emergency voice services in Sixth-Generation (6G) communication systems. However, Internet of Things (IoT) terminals have restricted bandwidth resources and weak computing power, which makes ensuring high-quality voice services over NTN challenging. Recent advancements in Artificial Intelligence (AI) techniques have been increasingly applied to enhance audio quality and reduce the bit rate. However, applying models with high computational complexity to IoT terminals is difficult. In this study, we propose a voice-services-over-NTN solution that includes a novel 6G non-terrestrial and ground network integrated framework and a lightweight Large Models (LMs)-driven codec operating at 450 bits per second. We also designed a new voice packet header and deployed an agent on the ground gateway to reduce the bandwidth overhead. The non-standard Session Initiation Protocol header was converted to the standard format while re-encapsulating Internet Protocol and User Datagram Protocol headers, replacing the conventional implementations. Additionally, an operational NTN satellite was used to evaluate the proposed ReCodec. The experimental results demonstrate that the ReCodec decreases the computational complexity by 96.61% while increasing the voice quality by 17.55% when compared with state-of-the-art mechanisms. Furthermore, the design of the packet header reduced the voice frame header to 50 bytes.
This study examined the relationship between inclusive leadership and subordinates’ upward voice, focusing on the mediating role of psychological safety and the moderating role of collectivism. Data were collected from 284 subordinates and supervisors across 11 organizations in China in three cross-lagged waves. Structural equation modeling results indicated that inclusive leadership was associated with subordinates’ upward voice via psychological safety. Moreover, collectivism strengthened the association between inclusive leadership and upward voice via psychological safety, leading to a higher upward voice. These findings highlight the importance of inclusive leadership in fostering an environment that promotes open communication and psychological safety between supervisors and subordinates, ultimately enhancing workplace health and well-being. The findings suggest that management practices should cultivate inclusive leadership behaviors to enhance psychological safety and encourage subordinates to voice their opinions for the overall success of the organization.
Although substantial research shows the effectiveness of written corrective feedback (WCF) in treating simple grammar structures, more research is still needed to refute Truscott’s claim that WCF may not work on complex grammar structures. Similarly, a previous body of research has shown that the degree of explicitness of feedback moderates the efficacy of WCF. However, most WCF studies have systematically manipulated only direct corrective feedback. The current study was therefore conducted to fill these gaps in the literature. To this end, five intact classes of Functional English were recruited and later randomly assigned to four treatment groups (DCF, DCF+ME, ICF, and ICF+ME) and one control group that received no feedback. All the groups took part in three WCF treatment sessions, during which they wrote two different pieces: a news report and a picture description. Later, only the treatment groups received the WCF. The WCF’s effectiveness was measured by writing tests and grammaticality judgment tasks (GJT). The results demonstrated that WCF helped L2 learners improve their grammatical accuracy of passive voice tenses. The study further showed that the group that received the most explicit type of WCF fared better than the ones that received the least explicit type of WCF. Important pedagogical implications for ESL/EFL teachers are discussed.
Objectives: By investigating the distinct speech and voice phenotypes among TCM constitutions in adults, this study aims to provide a convenient and objective methodological reference for judging TCM constitution. Methods: Acoustic analysis and TCM constitution assessment were performed for all 620 participants using Praat software and the CCMQ, respectively. Results: For formant features, the speech duration of special constitution participants was shorter than that of neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency participants when pronouncing the vowels /a/, /i/, and /u/. Compared with Yang-deficiency participants, Qi-deficiency participants had a shorter speech duration when pronouncing /i/. For /u/, blood-stasis participants exhibited a lower F1 value than neutral participants. For vocal features, special constitution participants showed higher local jitter than neutral, dampness-heat, and Yang-deficiency participants (for /a/, /i/, and /u/), and higher absolute local jitter than neutral or dampness-heat participants. Compared with neutral or Yang-deficiency participants, special constitution participants had a higher local shimmer (dB). Special constitution participants also had a lower harmonicity autocorrelation than neutral, dampness-heat, or Yang-deficiency participants. Conclusions: Formant features may effectively differentiate special constitution from neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency constitutions based on vowel duration measurements (/a/, /i/, /u/). For the vowel /u/, F1 values may help distinguish blood-stasis from neutral constitution. Vocal features appear particularly useful for distinguishing special constitution from neutral, dampness-heat, or Yang-deficiency constitution, with local jitter and harmonicity autocorrelation showing significant discriminatory power.
This study examined the effect of leader-employee calling congruence on employees’ voice behaviour. Participants were 173 leader-employee dyads from the Chinese service industry. They completed online surveys on calling, perceived insider status, and voice behaviour. Results from polynomial regression and response surface analysis showed that employees’ perceived insider status was weaker under low leader-low subordinate calling congruence and stronger under high leader-high subordinate calling congruence. Employees’ perceived insider status was stronger under low leader-high subordinate calling incongruence than under high leader-low subordinate calling incongruence. Perceived insider status mediated the relationship between calling congruence and voice behaviour. This study’s findings suggest pathways from calling congruence to voice behaviour, which are important for promoting employee voice behaviour and guiding organisational recruitment in the workplace.
Objective: The objective of this study was to compare the effect of a nurse’s recorded voice and a beloved family member’s recorded voice on consciousness and physical parameters in comatose patients. Materials and Methods: A randomized controlled trial with a parallel-group design was conducted among 45 comatose patients in a medicine intensive care unit. The patients were divided into two intervention groups and one control group: a nurse voice stimulus group receiving a nurse’s voice with standard care, a family member voice stimulus group receiving a beloved family member’s voice with standard care, and a control group receiving only standard care. The intervention was provided three times a day, each session lasting 5 min, for 7 days in addition to standard care. Repeated measures analysis of variance and the independent t-test were used for within-group and between-group comparisons, respectively. Results: The study found significant differences in Glasgow Coma Scale (GCS) scores within both the nurse (F = 2.78, P = 0.042) and family member (F = 10.27, P = 0.0001) voice stimulus groups over 7 days. Comparing GCS scores between the intervention groups showed significant variations before (P = 0.028), during (P = 0.047), and after (P = 0.036) the intervention on day 7. Comparing GCS scores between the family member voice stimulus group and the control group, significant changes were observed on days 5 and 7 (P = 0.043, 0.030, 0.030 and 0.014, 0.012, 0.012) before, during, and after the intervention. Conclusions: The use of beloved family members’ voices proved more effective in elevating the patients’ level of consciousness compared with both the nurse voice stimulus group and the control group.
Du Zhanyuan, Standing Committee member of the 14th CPPCC National Committee and CICG president, on how to tell engaging stories about China. As changes unseen in a century accelerate across the world, cultural exchange and mutual learning between civilizations are becoming increasingly important.
This study introduces a novel voice cloning framework driven by Mordukhovich Subdifferential Optimization (MSO) to address the complex multi-objective challenges of pathological speech synthesis in the under-resourced Lithuanian language, which has unique phonemes not present in most pre-trained models. Unlike existing voice synthesis models that often optimize for a single objective or are restricted to major languages, our approach explicitly balances four competing criteria: speech naturalness, speaker similarity, computational efficiency, and adaptability to pathological voice patterns. We evaluate four model configurations combining Lithuanian and English encoders, synthesizers, and vocoders. The hybrid model (English encoder, Lithuanian synthesizer, English vocoder), optimized via MSO, achieved the highest Mean Opinion Score (MOS) of 4.3 and demonstrated superior intelligibility and speaker fidelity. The results confirm that MSO enables effective navigation of trade-offs in multilingual pathological voice cloning, offering a scalable path toward high-quality voice restoration in clinical speech applications. This work represents the first integration of Mordukhovich optimization into pathological TTS, setting a new benchmark for speech synthesis under clinical and linguistic constraints.
Funding: Supported via funding from Prince Sattam bin Abdulaziz University, project number PSAU/2025/03/32440.
Funding: Supported by the National Key R&D Program of China (2022YFA1603001, 2021YFC2801402), the National Natural Science Foundation of China (12073053), and the Science and Technology Plan of Inner Mongolia (2021GG0245).
Funding: Supported by the Northeast Agricultural University Doctoral Development Foundation and the China Postdoctoral Science Foundation.
Funding: The authors wish to thank their colleagues and students who were involved in this study and provided valuable implementation and technical support. The research is partly supported by general funding at the IoT and Robotics Education Lab and the FURI program at Arizona State University, and is partly supported by the China Scholarship Council, by the Guangdong Science and Technology Department under Grant Numbers 2016A010101020, 2016A010101021, and 2016A010101022, and by the Guangzhou Science and Information Bureau under Grant Number 201802010033.
Abstract: College classes are becoming increasingly large. A critical component in scaling class size is the collaboration and interaction among instructors, teaching assistants, and students. We develop a prototype of an intelligent voice instructor-assistant system for supporting large classes, in which Amazon Web Services, Alexa Voice Services, and self-developed services are used. It uses a scraping service to read the questions and answers from past and current course discussion boards, organizes the questions in JavaScript Object Notation (JSON) format, and stores them in a database that can be accessed by Amazon Web Services Alexa skills. When a voice question from a student arrives, Alexa is used to translate the spoken sentence into text. Then, a Siamese deep long short-term memory model is introduced to calculate the similarity between the question asked and the questions in the database to find the best-matched answer. Questions with no match are sent to the instructor, and the instructor's answer is added to the database. Experiments show that the implemented model achieves promising results that can lead to a practical system. The intelligent voice instructor-assistant system starts with a small set of questions. It can grow through learning and improve as more and more questions are asked and answered.
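The match-or-escalate loop described above can be sketched as follows. This is not the authors' model: as a labeled assumption, a simple Jaccard token overlap stands in for the Siamese LSTM similarity, and the function names, the threshold of 0.5, and the sample questions are all hypothetical.

```python
def tokens(text):
    """Lowercase a question and split it into a set of words."""
    return set(text.lower().replace("?", " ").split())


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for the Siamese LSTM score."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def answer_question(question, qa_db, threshold=0.5, similarity=jaccard):
    """Return the answer of the best-matched stored question, or None if no match."""
    best_q, best_score = None, 0.0
    for stored_q in qa_db:
        score = similarity(question, stored_q)
        if score > best_score:
            best_q, best_score = stored_q, score
    if best_q is not None and best_score >= threshold:
        return qa_db[best_q]
    return None  # caller forwards the question to the instructor


def add_instructor_answer(qa_db, question, answer):
    """Grow the database with an answer supplied by the instructor."""
    qa_db[question] = answer


# Made-up sample data for illustration only.
qa_db = {
    "When is the midterm exam?": "The midterm is in week 8.",
    "How do I submit homework?": "Upload it to the course site.",
}

first = answer_question("when is the midterm", qa_db)
second = answer_question("Where is the lab located?", qa_db)
add_instructor_answer(qa_db, "Where is the lab located?", "Room 101.")
third = answer_question("Where is the lab located?", qa_db)
print(first, second, third)
```

Passing `similarity` as a parameter keeps the retrieval loop independent of the scoring model, so a learned similarity function could replace the token overlap without changing the escalation logic.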
Funding: Supported by the National Natural Science Foundation of China (No. 60872105), the Program for Science & Technology Innovative Research Team of Qing Lan Project in Higher Educational Institutions of Jiangsu, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: This paper presents an improved voice conversion method based on Gaussian Mixture Models (GMM) that changes the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMMs are trained to generate the conversion function. The spectrum features of the source speech are converted by the conversion function. The time-scale of speech is changed by extracting the converted features and adding them to the spectrum. The converted voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker, but is also more natural and more intelligible.
Abstract: With the rapid development of informatization, people habitually rely on electronic devices. Thanks to the convenience and efficiency these devices provide, people can handle almost everything with their phones and PCs. The security of the electronic devices that people use in daily life therefore becomes much more important. In this area, experts have tried many ways to keep data and users safe from attacks. This paper designs a Voice Based Biometric Security System, which is able to reinforce the security level of the target device. With its help, electronic devices will be more secure.
Funding: Supported in part by the Major Key Project of PCL under Grant PCL2023A07 and the Research and Development Program of China Telecom under Grant T-2025-27.
Abstract: Non-Terrestrial Networks (NTN) can be used to provide emergency voice services in Sixth-Generation (6G) communication systems. However, Internet of Things (IoT) terminals have restricted bandwidth resources and weak computing power, which makes ensuring high-quality voice services over NTN challenging. Artificial Intelligence (AI) techniques have increasingly been applied to enhance audio quality and reduce the bit rate. However, applying models with high computational complexity to IoT terminals is difficult. In this study, we propose a voice-services-over-NTN solution that includes a novel 6G non-terrestrial and ground network integrated framework and a lightweight Large Models (LMs)-driven codec operating at 450 bits per second. We also designed a new voice packet header and deployed an agent on the ground gateway to reduce the bandwidth overhead. The non-standard Session Initiation Protocol header was converted to the standard format while re-encapsulating the Internet Protocol and User Datagram Protocol headers, replacing the conventional implementations. Additionally, an operational NTN satellite was used to evaluate the proposed ReCodec. The experimental results demonstrate that ReCodec decreases the computational complexity by 96.61% while increasing the voice quality by 17.55% compared with state-of-the-art mechanisms. Furthermore, the design of the packet header reduced the voice frame header to 50 bytes.
Funding: Supported by the China Postdoctoral Science Foundation (Certificate Number: 2024M760126).
Abstract: This study examined the relationship between inclusive leadership and subordinates' upward voice, focusing on the mediating role of psychological safety and the moderating role of collectivism. Data were collected from 284 subordinates and supervisors across 11 organizations in China in three cross-lagged waves. Structural equation modeling results indicated that inclusive leadership was associated with subordinates' upward voice via psychological safety. Moreover, collectivism strengthened the association between inclusive leadership and upward voice via psychological safety, leading to a higher upward voice. These findings highlight the importance of inclusive leadership in fostering an environment that promotes open communication and psychological safety between supervisors and subordinates, ultimately enhancing workplace health and well-being. The findings suggest that management practices should cultivate inclusive leadership behaviors to enhance psychological safety and encourage subordinates to voice their opinions for the overall success of the organization.
Abstract: Although substantial research shows the effectiveness of written corrective feedback (WCF) in treating simple grammar structures, more research is still needed to refute Truscott's claim that WCF may not work on complex grammar structures. Similarly, a previous body of research has shown that the degree of explicitness of feedback moderates the efficacy of WCF. However, most WCF studies have systematically manipulated only direct corrective feedback. The current study was therefore conducted to fill these gaps in the literature. To this end, five intact classes of Functional English were recruited and later randomly assigned to four treatment groups (DCF, DCF+ME, ICF, and ICF+ME) and one control group that received no feedback. All the groups took part in three WCF treatment sessions, during which they wrote two different pieces: a news report and a picture description. Later, only the treatment groups received the WCF. The WCF's effectiveness was measured by writing tests and grammaticality judgment tasks (GJT). The results demonstrated that WCF helped L2 learners improve their grammatical accuracy of passive voice tenses. The study further showed that the group that received the most explicit type of WCF fared better than the ones that received the least explicit type of WCF. Important pedagogical implications for ESL/EFL teachers are discussed.
Funding: Supported by the National Natural Science Foundation of China (Nos. 81730107 and 81973883), the National Science & Technology Basic Research Project (No. 2015FY111700), and the Shanghai Pudong New District New Area Project (No. PW2022A-78(WQZ)).
Abstract: Objectives: By investigating the distinct speech and voice phenotypes across TCM constitutions in adults, this study aims to provide a convenient and objective methodological reference for judging TCM constitution. Methods: Acoustic analysis and TCM constitution assessment were performed for all 620 participants using Praat software and the CCMQ, respectively. Results: For formant features, the speech duration of special constitution participants was shorter than that of neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency participants when pronouncing the vowels /a/, /i/, and /u/. Compared with Yang-deficiency participants, Qi-deficiency participants had a shorter speech duration when pronouncing /i/. For /u/, blood-stasis participants exhibited a lower F1 value than neutral participants. For vocal features, special constitution participants showed higher local jitter than neutral, dampness-heat, and Yang-deficiency participants (for /a/, /i/, and /u/). They also showed higher absolute local jitter than neutral or dampness-heat participants. Compared with neutral or Yang-deficiency participants, special constitution participants had a higher local shimmer (dB). Special constitution participants had a lower harmonicity autocorrelation than neutral, dampness-heat, or Yang-deficiency participants. Conclusions: Formant features may effectively differentiate special constitution from neutral, phlegm-dampness, dampness-heat, Yin-deficiency, or Yang-deficiency constitutions based on vowel duration measurements (/a/, /i/, /u/). For the vowel /u/, F1 values may help distinguish blood-stasis from neutral constitution. Vocal features appear particularly useful for distinguishing special constitution from neutral, dampness-heat, or Yang-deficiency constitution, with local jitter and harmonicity autocorrelation showing significant discriminatory power.
Funding: Supported by the Major Research Project of Philosophy and Social Sciences in Universities of Henan Province (2025-JCZD-10) and the Natural Science Foundation of Henan Province (242300421311).
Abstract: This study examined the effect of leader-employee calling congruence on employees' voice behaviour. Participants were 173 leader-employee dyads from the Chinese service industry. They completed online surveys on calling, perceived insider status, and voice behaviour. Results from polynomial regression and response surface analysis showed that employees' perceived insider status was weaker under low leader-low subordinate calling congruence and stronger under high leader-high subordinate calling congruence. Employees' perceived insider status was stronger under low leader-high subordinate calling incongruence than under high leader-low subordinate calling incongruence. Perceived insider status played a mediating role between calling congruence and voice behaviour. This study's findings suggest pathways from calling congruence to voice behaviour, which are important for promoting employee voice behaviour and guiding organisational recruitment in the workplace.
Abstract: Objective: The objective of this study was to compare the effect of a nurse's and a beloved family member's recorded voice on consciousness and physical parameters in patients in a coma. Materials and Methods: A randomized controlled trial with a parallel group design was conducted among 45 comatose patients in the medicine intensive care unit, divided into two intervention groups and one control group: a nurse voice stimulus group, receiving a nurse's voice with standard care; a family member voice stimulus group, receiving a beloved family member's voice with standard care; and a control group receiving only standard care. The intervention was provided three times a day, each session lasting 5 min, for 7 days in addition to standard care. Repeated measures analysis of variance and independent t-tests were used to compare within and between groups, respectively. Results: The study found significant differences in Glasgow Coma Scale (GCS) scores within both the nurse (F=2.78, P=0.042) and family member (F=10.27, P=0.0001) voice stimulus groups over 7 days. Comparing GCS scores between intervention groups showed significant variations before (P=0.028), during (P=0.047), and after (P=0.036) the intervention on day 7. Comparing GCS scores between the family member voice stimulus group and the control group, significant changes were observed on days 5 and 7 (P=0.043, 0.030, 0.030, and 0.014, 0.012, 0.012) before, during, and after the intervention. Conclusions: The use of beloved family members' voices proved more effective in elevating the patients' level of consciousness compared to both the nurse voice stimulus group and the control group.
Abstract: Du Zhanyuan, Standing Committee member of the 14th CPPCC National Committee and CICG president, on how to tell engaging stories about China. As changes unseen in a century accelerate across the world, cultural exchange and mutual learning between civilizations are becoming increasingly important.
Funding: Funding from the Research Council of Lithuania (LMTLT), agreement No. S-MIP-23-46.
Abstract: This study introduces a novel voice cloning framework driven by Mordukhovich Subdifferential Optimization (MSO) to address the complex multi-objective challenges of pathological speech synthesis in the under-resourced Lithuanian language, which has unique phonemes not present in most pre-trained models. Unlike existing voice synthesis models that often optimize for a single objective or are restricted to major languages, our approach explicitly balances four competing criteria: speech naturalness, speaker similarity, computational efficiency, and adaptability to pathological voice patterns. We evaluate four model configurations combining Lithuanian and English encoders, synthesizers, and vocoders. The hybrid model (English encoder, Lithuanian synthesizer, English vocoder), optimized via MSO, achieved the highest Mean Opinion Score (MOS) of 4.3 and demonstrated superior intelligibility and speaker fidelity. The results confirm that MSO enables effective navigation of trade-offs in multilingual pathological voice cloning, offering a scalable path toward high-quality voice restoration in clinical speech applications. This work represents the first integration of Mordukhovich optimization into pathological TTS, setting a new benchmark for speech synthesis under clinical and linguistic constraints.