This study investigates the correlation between musical competence and English prosodic production, as well as the associations among their internal components. Eighty Chinese college EFL learners took a musical perception test, a musical production test, and an English prosodic production test, in that order. To examine prosodic performance more closely, and taking native English speakers as a reference, we also conducted an in-depth acoustic analysis, using Praat and ToBI, of the English prosody produced by six cases from the high and low musical competence groups. Our results indicate that students with high musical competence tend to outperform those with low musical competence in English prosodic production. However, musical perception competence seems to correlate more strongly with English prosodic competence than musical production competence does. In addition, beat alignment or tempo (related to time and stress) could be the component of musical competence most closely associated with English prosody, while melody was doubly confirmed to be unhelpful. Cases with high musical competence were also observed to produce more native-like patterns of phrase accents, boundary tones, intonational phrases, and stress placement. The results have important theoretical implications for the construct interpretation of musical competence, and they suggest that rhythmic perception training could be the most effective way to transfer musical achievement to English speech prosody.
This paper presents time-scale and pitch-scale modification of Chinese syllables based on a sinusoidal model. First, short-term speech is decomposed into a sum of sinusoidal waves of different magnitudes and phases. Then the vocal tract system and the excitation are obtained using a homomorphic technique. Finally, speech with the desired time scale and pitch scale is obtained by changing the frequency and phase of the excitation while the parameters of the vocal tract system are changed accordingly. The results show that the algorithm allows a large adjustable range of pitch and time scale and is suitable for the analysis and synthesis of Chinese speech.
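The resynthesis step described in this abstract can be illustrated with a minimal sketch. It assumes a single stationary frame whose sinusoidal parameters are already known, and it scales only the frequencies and the frame duration. The homomorphic separation of vocal tract and excitation, and the matching adjustment of the vocal tract parameters, are omitted, so this is not the paper's algorithm. The function name `synthesize` and its parameters are hypothetical.

```python
import math

def synthesize(partials, duration_s, sample_rate=8000,
               time_scale=1.0, pitch_scale=1.0):
    """Resynthesize one stationary frame as a sum of sinusoids.

    partials: list of (amplitude, frequency_hz, phase_rad) tuples.
    time_scale stretches the frame duration; pitch_scale multiplies
    every frequency. Amplitudes are left unchanged (an assumption:
    a real system would re-sample the spectral envelope).
    """
    n_samples = int(round(duration_s * time_scale * sample_rate))
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.cos(2 * math.pi * f * pitch_scale * t + p)
                       for a, f, p in partials))
    return out

# A 100 Hz partial, stretched to twice its length and raised one octave.
frame = synthesize([(1.0, 100.0, 0.0)], duration_s=0.01,
                   time_scale=2.0, pitch_scale=2.0)
```

Because duration and frequency are controlled by separate factors, the frame can be lengthened without lowering its pitch, which is the property that time-scale and pitch-scale modification relies on.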
The synthesis of emotional speech has wide applications in human-computer interaction, medicine, industry, and other fields. In this work, an emotional speech synthesis system is proposed based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad, and bored. The experimental results show that the proposed system performs well. The produced utterances convey clear emotional expression, and the subjective test reaches high classification accuracy across the different types of synthesized emotional speech.
Interactive communication is not straightforward but complicated. Prosodic features play an influential role in English communication. They can be used to signal certain pragmatic purposes in real situations so that listeners and speakers reach mutual understanding. Identifying the pragmatic functions of prosodic features will facilitate the teaching of listening and speaking. English teachers need to clarify and emphasize the relationship between prosodic features and their pragmatic functions, and to work out how to integrate the two into their teaching so that students learn to communicate effectively.
Correct prosodic boundary prediction is crucial for the quality of synthesized speech in a text-to-speech system. This article presents the prosodic hierarchy of the Uyghur language, which belongs to the Turkic language family of the Altaic language system, and verifies the reliability of the proposed Uyghur prosodic boundary annotation rules by acoustic analysis. For prediction, a two-layer shifting hierarchical approach based on decision trees is used to predict prosodic word and prosodic phrase boundaries, and the influence of different feature sets on Uyghur prosodic boundary prediction is also investigated. The experimental results clearly show the acoustic changes at, and the automatic prediction performance for, the different prosodic boundaries of Uyghur, laying a good foundation for further research.
Speech coding techniques have been studied not only to reduce complexity and bit rate but also to improve sound quality. The CELP-type vocoder, used as a standard, provides high quality even at low bit rates. In this paper, the input speech is preprocessed to reduce the bit rate, which differs from the conventional vocoder. Different kinds of parameters are used for the preprocessing and compared with one another to find the most appropriate parameter for the vocoder. The parameters are used to synthesize the speech rather than to encode or decode it, so we propose a simple algorithm that does not affect the processing time or the computation time. The preprocessing step uses speaking rate, duration, and the PSOLA technique.
In this paper, we extend our previous study of the important problem of automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose two novel classification approaches to this problem: one based on type-2 fuzzy logic systems (type-2 FLS) and the other on the discriminative sensitivity-based linear learning method (SBLLM). Prosodic features have been used in a plethora of practical speech-related applications, such as speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech. We continue to focus specifically on Arabic, as other languages have already received a lot of attention in this regard. Moreover, we aim to improve on the performance of our previously used techniques, of which the support vector machine (SVM) method performed best, by applying the two classification approaches mentioned above. The recorded continuous speech is first segmented into sentences using both energy and time duration parameters. The prosodic features are then extracted from each sentence and fed into each of the two proposed classifiers, which classify each sentence as a Question or a Non-Question. Our extensive simulation work, based on a moderately sized database, showed that the two proposed classifiers outperform SVM in all of the experiments carried out, with the type-2 FLS classifier consistently exhibiting the best performance because of its ability to handle all forms of uncertainty.
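The abstract says only that continuous speech is segmented into sentences "using both energy and time duration parameters"; it gives no thresholds or exact rule. The sketch below is therefore a hypothetical illustration of one common way to combine the two cues: frames whose short-time energy falls below a threshold count as silent, and a segment ends once the silence lasts at least a minimum number of frames. The names `frame_energies`, `segment`, `threshold`, and `min_pause_frames` are assumptions, not taken from the paper.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def segment(energies, threshold, min_pause_frames):
    """Split a frame-energy sequence into speech segments.

    Returns half-open (start_frame, end_frame) pairs. A segment closes
    only when at least min_pause_frames consecutive frames fall below
    threshold, so short dips inside a sentence do not split it.
    """
    segments = []
    start = None        # first frame of the segment being built
    silent_run = 0      # length of the current run of silent frames
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close the final segment, dropping any trailing silent frames.
        segments.append((start, len(energies) - silent_run))
    return segments

# The one-frame dip at index 3 is bridged; the longer pause splits.
print(segment([0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
              threshold=0.5, min_pause_frames=2))
# prints [(1, 5), (8, 10)]
```

The duration criterion is what keeps brief within-sentence pauses from producing spurious boundaries; in practice both parameters would need tuning on the recordings at hand.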
Language, literature, customs and traditions, music, and art are cultural items that have been transmitted from generation to generation throughout history. In this context, literature is an important source of music culture that takes inspiration from the customs and traditions of a society. In works composed from divan literature, prosodic meter is echoed in the form, usul, and general structure, and almost lives in the work. In the same way, when examples of folk literature composed by composers and performed by poets and aşıks are examined, parallels are observed between literary features and form, structure, and rhythmic features. The aim of this paper is to reveal the integral link between Melody-Usul and Meter in Ottoman Turkish Music.
This paper presents a contrastive study of the structure of prosodic words in English and Mandarin, focusing in particular on pitch. It reports a corpus study of Mandarin monologue speech, an experimental phonetic investigation of the pitch of trisyllabic prosodic words in Mandarin monologue. In addition, taking the characteristics of English prosodic words into consideration, the paper makes a contrastive analysis of prosodic words in English and Mandarin. The study finds that the pitch of trisyllabic prosodic words in Mandarin is inevitably affected by structural factors. For the left syllable, the grammatical category, the prosodic hierarchical boundary, the position of the intonational phrase where the syllable is located, the mid syllable, and the right syllable may influence its pitch contour. For the mid syllable, the grammatical category, the left syllable, the right syllable, and the position of the intonational phrase where the syllable is located may influence its pitch contour. For the right syllable, the prosodic hierarchical boundary where the syllable is located and the mid syllable may affect its pitch contour. Unlike previous findings based on read corpora, this study shows that the mid syllable has not only dissimilatory but also assimilatory effects on the pitch of its preceding syllable. The left syllable has anticipatory effects on the onset pitch of the mid syllable, and the right syllable has coarticulation effects on the offset pitch of the mid syllable.
To enhance communication between humans and robots at home in the future, speech synthesis interfaces that can generate expressive speech are indispensable. In addition, synthesizing celebrity voices is commercially important. To address these issues, this paper proposes techniques for synthesizing natural-sounding speech with a rich prosodic personality using a limited amount of data in a text-to-speech (TTS) system. As the target speaker, we chose a well-known prime minister of Japan, Shinzo Abe, who has a good prosodic personality in his speeches. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purposes, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of these techniques through objective and subjective evaluations.
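Prosodic convergence is an important topic and research hotspot at the intersection of linguistics, psychology, neuroscience, artificial intelligence, and clinical medicine. Using the CiteSpace visual bibliometric method, a comprehensive analysis of international research on prosodic convergence in the Web of Science core database from 2000 to 2024 finds the following. 1. Over these 25 years, the annual number of publications on prosodic convergence has grown continuously, rising sharply to a peak of 67 papers in 2013; the main disciplines involved are linguistics, computer science, acoustics, speech-language pathology, and psychology. 2. The hotspots of international prosodic convergence research form 10 clusters: language production, prosodic features, phonetic realization, convergence, language acquisition, brain and neural processing, human-computer interaction, hearing and vision, language synthesis, and language perception. 3. The frontier trends of prosodic convergence research lie mainly in human-computer interaction, model building, and individual differences (including cognitive level, phonetic ability, psychological state, and cultural background).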
Funding (listed with the Uyghur prosodic boundary prediction abstract): Supported by the National Natural Science Foundation of China (61065005 and 61062008).
Funding (listed with the speech coding preprocessing abstract): Supported by the Brain Korea 21 Project in 2010 and by the MKE (Ministry of Knowledge Economy, Korea) under the ITRC (Information Technology Research Center) support program (NIPA-2010-(C1090-1021-0010)).