Lip-reading technologies are rapidly progressing following the breakthrough of deep learning. They play a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading recognition model for Arabic visual speech recognition built on deep learning algorithms. The Arabic visual dataset that we collected contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model by enhancing the preprocessing phase. Firstly, we extract keyframes from our dataset. Secondly, we produce Concatenated Frame Images (CFIs) that represent each utterance sequence in one single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined different keyframe counts (10, 15, and 20) to compare two approaches in the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model is superior to other models based on CFI input.

Funding: This research was supported and funded by KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
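The pipeline this abstract describes (keyframe extraction, CFI tiling, then VGG-19 features with a batch-normalized classifier head) can be sketched as below. The paper does not publish code, so the horizontal tiling direction, the 64-pixel crop size, the frozen backbone, and the classifier head are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the CFI + VGG-19 pipeline; sizes and layout are assumed.
import numpy as np
import cv2
from tensorflow.keras.applications import VGG19
from tensorflow.keras import layers, models

NUM_KEYFRAMES = 15   # the paper compares 10, 15, and 20 keyframes
FRAME_SIZE = 64      # assumed per-frame mouth-crop size
NUM_CLASSES = 10     # e.g., the ten Arabic digits

def concatenated_frame_image(frames):
    """Tile the keyframes of one utterance into a single CFI image."""
    resized = [cv2.resize(f, (FRAME_SIZE, FRAME_SIZE)) for f in frames]
    return np.hstack(resized)   # assumed horizontal tiling

def build_model(input_shape):
    """VGG-19 feature extractor followed by a batch-normalized head
    (the paper's second, better-performing variant)."""
    base = VGG19(weights="imagenet", include_top=False,
                 input_shape=input_shape)
    base.trainable = False      # assumed frozen backbone
    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_model((FRAME_SIZE, FRAME_SIZE * NUM_KEYFRAMES, 3))
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Tiling the sequence into one image is what lets a purely 2D image backbone such as VGG-19 consume an utterance without any recurrent or 3D layers.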
The continuing advances in deep learning have paved the way for several challenging ideas. One such idea is visual lip-reading, which has recently drawn much research interest. Lip-reading, often referred to as visual speech recognition, is the ability to understand and predict spoken speech based solely on lip movements, without using sound. Due to the lack of research on visual speech recognition for the Arabic language in general, and its absence in Quranic research, this work aims to fill this gap. This paper introduces a new publicly available Arabic lip-reading dataset containing 10490 videos captured from multiple viewpoints and comprising data samples at the letter level (i.e., single letters (single alphabets) and Quranic disjoined letters) and at the word level, based on the content and context of the book Al-Qaida Al-Noorania. This research uses visual speech recognition to recognize spoken Arabic letters (Arabic alphabets), Quranic disjoined letters, and Quranic words, mainly phonetic, as they are recited in the Holy Quran according to the Quranic study aid entitled Al-Qaida Al-Noorania. This study could further validate the correctness of pronunciation and, subsequently, assist people in correctly reciting the Quran. Furthermore, a detailed description of the created dataset and its construction methodology is provided. This new dataset is used to train an effective pre-trained deep learning CNN model through transfer learning for lip-reading, achieving accuracies of 83.3%, 80.5%, and 77.5% on words, disjoined letters, and single letters, respectively, and an extended analysis of the results is provided. Finally, the experimental outcomes, different research aspects, and dataset collection consistency and challenges are discussed, concluding with several promising directions for future work.
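As a rough illustration of the transfer-learning setup the abstract mentions, the sketch below fine-tunes a pretrained image CNN on letter-level samples. The abstract does not name the backbone, the input size, or the data layout, so MobileNetV2, the 28-class head (one per Arabic letter), and the directory path are all hypothetical.

```python
# Hedged sketch of pre-trained CNN transfer learning; backbone, image size,
# class count, and dataset path are assumptions, not the paper's choices.
import tensorflow as tf

IMG_SIZE = (224, 224)
NUM_CLASSES = 28   # hypothetical: one class per single Arabic letter

train_ds = tf.keras.utils.image_dataset_from_directory(
    "arabic_lipreading/letters/train",   # hypothetical directory layout
    image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False   # first stage: train only the new classification head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```

Freezing the ImageNet-pretrained backbone and training only a small head is the standard first stage of transfer learning; the backbone's top layers can then optionally be unfrozen at a low learning rate for fine-tuning.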
The application of Information and Communication Technologies has transformed traditional Teaching and Learning over the past decade into a computer-based era. This evolution has resulted from the emergence of digital systems and has greatly impacted global education and socio-cultural development. Multimedia has been absorbed into the education sector to produce a new learning concept that combines educational and entertainment approaches. This research concerns the application of Windows Speech Recognition and the Microsoft Visual Basic 2008 Integrated Development Environment in developing a Multimedia-Assisted Courseware prototype for Primary School Mathematics content, namely single digits and addition. The Teaching and Learning techniques proposed are Explain, Instruct and Facilitate, which can be viewed, respectively, as an instructor-centered strategy, instructor-learner two-way communication, and learners' active participation. The prototype, called M-EIF, is driven solely by users' voices; hence Windows Speech Recognition must be activated prior to a test run.
Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising solution for overcoming communication barriers, particularly for individuals with temporary or permanent speech impairments. However, most Visual Speech Recognition (VSR) research has primarily focused on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL) based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgeries, whether temporarily or permanently impaired. To ensure relevance and effectiveness in real-world scenarios, a carefully curated vocabulary of twenty-five Italian words was selected, encompassing critical semantic fields such as Needs, Questions, Answers, Emergencies, Greetings, Requests, and Body Parts. These words were chosen to address both essential daily communication and urgent requests for medical assistance. Our approach combines a spatiotemporal Convolutional Neural Network (CNN) with a bidirectional Long Short-Term Memory (BiLSTM) recurrent network and a Connectionist Temporal Classification (CTC) loss function to recognize individual words without requiring predefined word boundaries. The experimental results demonstrate the system's robust performance in recognizing target words, reaching an average accuracy of 96.4% in individual word recognition, suggesting that the system is particularly well suited to offering support in constrained clinical and caregiving environments, where quick and reliable communication is critical. In conclusion, the study highlights the importance of developing language-specific, application-driven VSR solutions, particularly for non-English languages with limited linguistic resources. By bridging the gap between deep learning-based lip-reading and real-world clinical needs, this research advances assistive communication technologies, paving the way for more inclusive and medically relevant applications of VSR in rehabilitation and healthcare.
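The architecture the abstract names (a spatiotemporal CNN feeding a BiLSTM trained with CTC loss) can be outlined as follows. All layer sizes, strides, the clip length, and the mouth-crop resolution are assumptions; only the overall CNN-BiLSTM-CTC structure comes from the abstract.

```python
# Hedged sketch of a spatiotemporal CNN -> BiLSTM -> CTC word recognizer;
# every dimension below is an assumption, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W = 75, 50, 100   # assumed frames per clip and mouth-crop size
VOCAB = 25 + 1          # 25 target Italian words plus the CTC blank token

inputs = layers.Input(shape=(T, H, W, 3))
x = layers.Conv3D(32, (3, 5, 5), strides=(1, 2, 2), padding="same",
                  activation="relu")(inputs)       # spatiotemporal features
x = layers.MaxPool3D(pool_size=(1, 2, 2))(x)
x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
x = layers.MaxPool3D(pool_size=(1, 2, 2))(x)
x = layers.TimeDistributed(layers.Flatten())(x)    # -> (T, features)
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(VOCAB, activation="softmax")(x)  # per-frame labels

model = models.Model(inputs, outputs)

def ctc_loss(y_true, y_pred):
    """CTC aligns frame-level predictions to the word label, so no
    predefined word boundaries are needed (as the abstract states).
    Assumes unpadded integer labels in y_true."""
    batch = tf.shape(y_pred)[0]
    input_len = tf.fill([batch, 1], tf.shape(y_pred)[1])
    label_len = tf.fill([batch, 1], tf.shape(y_true)[1])
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred,
                                           input_len, label_len)

model.compile(optimizer="adam", loss=ctc_loss)
```

The CTC blank token is what lets the network emit "no new symbol" on most frames, so a word of any duration collapses to a single label at decoding time.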
Objective: To explore the effect of audiovisual language intelligent rehabilitation technology combined with therapeutic games on the speech function of children with delayed language development (DLD). Methods: 86 children with DLD treated at our hospital from February 2022 to January 2023 were enrolled and randomly divided into an observation group and a control group of 43 each. The control group received conventional language training, while the observation group additionally received audiovisual language intelligent rehabilitation technology combined with therapeutic games; both groups were treated for 3 months. Before and after the intervention, the Gesell Developmental Schedules (GDS), the Diagnostic Receptive and Expressive Assessment of Mandarin-Comprehensive (DREAM-C), and an oral motor scale were used to assess the two groups' developmental quotient (DQ), language development level, and oral motor function. Results: After the intervention, the observation group's Gesell scores (language behavior, adaptive behavior, personal-social behavior), DREAM-C scores (overall language, listening comprehension, verbal expression, semantics, syntax), and lip, jaw, and tongue function were all significantly higher than those of the control group (P<0.05). Conclusion: Audiovisual language intelligent rehabilitation technology combined with therapeutic games can promote intellectual development and improve speech and oral motor function in children with DLD.