Journal Articles
219 articles found
1. Investigation of Automatic Speech Recognition Systems via the Multilingual Deep Neural Network Modeling Methods for a Very Low-Resource Language, Chaha (Cited: 1)
Authors: Tessfu Geteye Fantaye, Junqing Yu, Tulu Tilahun Hailu. Journal of Signal and Information Processing, 2020, No. 1, pp. 1-21.
Automatic speech recognition (ASR) is vital for very low-resource languages, helping to mitigate the risk of extinction. Chaha is one such low-resource language; it suffers from resource insufficiency, and some of its phonological, morphological, and orthographic features challenge ASR development. Given these challenges, this study is the first endeavor to analyze the characteristics of the language, prepare a speech corpus, and develop ASR systems for it. A small 3-hour read-speech corpus was prepared and transcribed, and different basic and rounded phone unit-based speech recognizers were explored using multilingual deep neural network (DNN) modeling methods. The experimental results demonstrate that all the basic phone and rounded phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded phone unit-based multilingual models outperformed the equivalent basic phone unit-based models with relative improvements of 0.95% to 4.98%. Overall, multilingual DNN modeling methods proved highly effective for developing Chaha speech recognizers. Both basic and rounded phone acoustic units are suitable for building a Chaha ASR system; however, the rounded phone unit-based models are more accurate and faster at recognition, making rounded phone units the most suitable acoustic units for Chaha ASR systems.
Keywords: automatic speech recognition; multilingual DNN modeling methods; basic phone acoustic units; rounded phone acoustic units; Chaha
2. Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization
Authors: Soonshin Seo, Ji-Hwan Kim. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 2833-2856.
Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have gained significant attention within the ASR domain due to their adaptability to various deployment scenarios. However, these methods often face substantial trade-offs, particularly unstable accuracy when the model size is reduced. To address these challenges, this study presents two key empirical findings. First, it proposes incorporating an online distillation mechanism during on-demand pruning training, which promises more consistent accuracy levels. Second, it proposes the Mogrifier long short-term memory (LSTM) language model (LM), an advanced iteration of the conventional LSTM LM, as an effective pruning target within the ASR framework. Through rigorous experimentation on an ASR system that employs the Mogrifier LSTM LM and trains it with the proposed joint on-demand pruning and online distillation method, the study shows that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning. The proposed configuration reduces the parameter count by approximately 39% while minimizing trade-offs.
Keywords: automatic speech recognition; neural language model; Mogrifier long short-term memory; pruning; distillation; efficient deployment; optimization; joint training
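The abstract above combines on-demand pruning with online distillation. As a hedged illustration of only the simplest idea underneath (this is generic unstructured magnitude pruning, not the paper's joint method; the function name is hypothetical):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    magnitudes (unstructured magnitude pruning)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Threshold at the k-th smallest magnitude.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, zeroed = [], 0
    for w in weights:
        if abs(w) <= threshold and zeroed < k:
            pruned.append(0.0)
            zeroed += 1
        else:
            pruned.append(w)
    return pruned
```

In the paper's setting the pruning ratio is chosen on demand at deployment time; the sketch above only shows how a single ratio would be applied to a flat weight list.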
3. Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 5, pp. 1053-1089.
Speech recognition systems have become a distinctive family of human-computer interaction (HCI). Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computing experience. This paper presents a retrospective yet modern survey of speech recognition systems. The development of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted here, along with a step-by-step rundown of the fundamental stages in developing speech recognition systems and a brief discussion of modern developments and applications in this domain. The review aims to provide a starting point for those entering the vast field of speech signal processing. Since speech recognition has broad potential in industries such as telecommunication, emotion recognition, and healthcare, this review should help researchers explore further applications that society can quickly adopt in the coming years.
Keywords: speech recognition; automatic speech recognition (ASR); mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
4. Speech Recognition via CTC-CNN Model
Authors: Wen-Tsai Sung, Hao-Wei Kang, Sung-Jung Hsiao. Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3833-3858.
In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) combined with a CTC acoustic model to improve speech recognition accuracy. The study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert collected speech signals into text or corresponding speech signals, improving communication while reducing noise and hardware interference. The baseline acoustic model faces challenges such as long training time, a high error rate, and a degree of overfitting. Through continuous design and improvement of the acoustic model's parameters, a well-performing model is finally selected according to the evaluation indices, reducing the error rate to about 18% and thus improving accuracy. Comparative verification across the choice of acoustic feature parameters, the choice of modeling units, and the speaker's speech rate further confirmed the strong performance of the CTCCNN_5+BN+Residual model structure. To train and verify the CTC-CNN baseline acoustic model, the study uses the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the word error rate on the acoustic model training set is 31%, and the word error rate on the test set stabilizes at about 43%. The experiments also consider ambient environmental noise: at noise levels of 80-90 dB the accuracy is 88.18%, the worst among all levels, whereas at 40-60 dB the accuracy reaches 97.33% due to less noise pollution.
Keywords: artificial intelligence; speech recognition; speech to text; convolutional neural network; automatic speech recognition
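The CTC algorithm named above maps frame-level network outputs to label sequences by merging consecutive repeats and then removing blank symbols. A minimal sketch of that collapse rule (the integer labels and blank-index convention below are illustrative assumptions, not values from the paper):

```python
def ctc_collapse(path, blank=0):
    """Collapse a frame-level CTC path into an output label sequence:
    merge consecutive repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

Because a blank separates repeats, the path [1, 1, 0, 1] collapses to [1, 1] rather than [1]; this is what lets CTC emit genuinely repeated characters.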
5. Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3215-3245.
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its modeling of long-distance auditory context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage pre-trains the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies; the second stage fine-tunes the entire network to bridge the input-modality gap between training and inference and to boost generalization. Extensive experiments demonstrate the effectiveness of the proposed method on the ATCC and AISHELL-1 datasets: it reduces the character error rate to 6.54% and 8.73%, respectively, substantial gains of 28.76% and 23.82% over the best baseline model. Case studies indicate that the obtained semantics-aware acoustic representations help accurately recognize terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications and could contribute to intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
6. A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
Authors: Peiyuan Jiang, Weijun Pan, Jian Zhang, Teng Wang, Junxiang Huang. Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 911-940.
This study addresses the deviation in downstream tasks caused by inaccurate recognition results when applying automatic speech recognition (ASR) technology in the air traffic control (ATC) field. The paper presents a novel cascaded model architecture, Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rates in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, an attention mechanism is integrated to facilitate precise alignment between input features and output characters. The Text-To-Text Transfer Transformer (T5) language model is also introduced to handle particular pronunciations and code-mixing issues, providing more accurate and concise textual output for downstream tasks. To enhance robustness, transfer learning and data augmentation are used in the training strategy, and performance is optimized through hyperparameter tuning, such as adjusting the number of attention heads, the number of encoder layers, and the loss-function weights. The experimental results demonstrate the significant contributions of data augmentation, hyperparameter tuning, and the error-correction model to overall performance. On our ATC corpus, the proposed model achieves a character error rate (CER) of 3.44%, a 3.64% improvement over the baseline model. Its effectiveness is also validated on two publicly available datasets: on AISHELL-1 the CCAT model achieves a CER of 3.42%, a 1.23% improvement over the baseline, and on LibriSpeech it achieves a word error rate (WER) of 5.27%, a 7.67% improvement. Additionally, the paper proposes an evaluation criterion for assessing the robustness of ATC speech recognition systems; in robustness experiments based on this criterion, the proposed model improves on the baseline by 22%.
Keywords: air traffic control; automatic speech recognition; Conformer; robustness evaluation; T5 error correction model
7. Automatic Evaluation of Speech Impairment Caused by Wearing a Dental Appliance
Authors: Mariko Hattori, Yuka I. Sumita, Hisashi Taniguchi. Open Journal of Stomatology, 2013, No. 7, pp. 365-369.
In dentistry, speech evaluation is important for appropriate rehabilitation of orofacial dysfunction. The speech intelligibility test is often used to assess patients' speech, but it relies on evaluation by human listeners and has certain shortcomings, so an alternative method without a listening procedure is needed. The purpose of this study was to test the applicability of an automatic speech intelligibility test system based on computerized speech recognition. Speech of 10 normal subjects wearing a dental appliance was evaluated with an automatic speech intelligibility test system developed using speech recognition software; the results of the automatic test are referred to as speech recognition scores. The Wilcoxon signed-rank test was used to analyze differences between two conditions: with the palatal plate in place and with the palatal plate removed. Spearman correlation coefficients were used to evaluate whether the speech recognition score correlated with the result of the conventional intelligibility test. The speech recognition score decreased significantly when wearing the plate (z = -2.807, P = 0.0050), and the automatic evaluation results correlated positively with the conventional evaluation when wearing the appliance (r = 0.729, P = 0.017). The automatic speech testing system may be useful for evaluating speech intelligibility in denture wearers.
Keywords: prosthodontics; maxillofacial prosthodontics; speech; automatic speech recognition
8. Development of Application Specific Continuous Speech Recognition System in Hindi
Authors: Gaurav Gaurav, Devanesamoni Shakina Deiv, Gopal Krishna Sharma, Mahua Bhattacharya. Journal of Signal and Information Processing, 2012, No. 3, pp. 394-401.
Application-specific voice interfaces in local languages will go a long way toward bringing the benefits of technology to rural India. The goal of this work is a continuous speech recognition system in Hindi tailored to aid the teaching of geometry in primary schools; this paper presents the preliminary work done toward that end. We use Mel-frequency cepstral coefficients as speech feature parameters and hidden Markov modeling to model the acoustic features. The Hidden Markov Model Toolkit (HTK) 3.4 was used both for feature extraction and model generation, and the language-independent Julius recognizer was used for decoding. A speaker-independent system is implemented and results are presented.
Keywords: automatic speech recognition; Mel-frequency cepstral coefficients; hidden Markov modeling
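MFCC extraction, the feature parameterization used above, rests on warping frequencies onto the perceptual mel scale before filterbank analysis and the cepstral transform. A sketch of the standard HTK-style mel conversion (the constants 2595 and 700 are the common convention, not values taken from this paper):

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz onto the mel scale (HTK convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping: a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The mapping is roughly linear below 1 kHz and logarithmic above it, which is why mel filterbanks space their channels more densely at low frequencies.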
9. Phoneme Sequence Modeling in the Context of Speech Signal Recognition in the Language "Baoule"
Authors: Hyacinthe Konan, Etienne Soro, Olivier Asseu, Bi Tra Goore, Raymond Gbegbe. Engineering, 2016, No. 9, pp. 597-617.
This paper presents the recognition of spoken sentences in Baoule, a language of Côte d'Ivoire. Several formalisms allow the modeling of an automatic speech recognition system; the one we used to realize our system is based on discrete hidden Markov models (HMM). Our goal in this article is to present a system for recognizing Baoule words. We present the three classical HMM problems, develop algorithms able to solve them, and then run these algorithms on concrete examples.
Keywords: HMM; MATLAB; language model; acoustic model; automatic speech recognition
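One of the three classical HMM problems mentioned above is decoding: finding the most likely hidden-state sequence for a given observation sequence. A compact Viterbi sketch over a discrete HMM (the toy model in the usage note is the standard textbook weather example, not data from the paper):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path through a discrete HMM."""
    # V[t][s]: probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Pick the best final state, then walk the back-pointers.
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 1 - 1, -1):
        path.insert(0, back[t][path[0]])
    return path, prob
```

With two states (Rainy, Sunny) and observations walk, shop, clean under the usual textbook parameters, this returns the path Sunny, Rainy, Rainy. In practice log-probabilities are used instead of raw products to avoid underflow on long utterances.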
10. Investigation of Knowledge Transfer Approaches to Improve the Acoustic Modeling of Vietnamese ASR System (Cited: 5)
Authors: Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2019, No. 5, pp. 1187-1195.
It is well known that automatic speech recognition (ASR) is a resource-consuming task: it takes a sufficient amount of data to train a state-of-the-art deep neural network acoustic model. For low-resource languages where scripted speech is difficult to obtain, data sparsity is the main problem limiting the performance of speech recognition systems. In this paper, several knowledge transfer methods are investigated to overcome data sparsity with the help of high-resource languages. The first is a pre-training and fine-tuning (PT/FT) method, in which the parameters of the hidden layers are initialized from a well-trained neural network. Second, progressive neural networks (Prognets) are investigated; thanks to lateral connections in the network architecture, Prognets are immune to the forgetting effect and superior at knowledge transfer. Finally, bottleneck features (BNF) are extracted using cross-lingual deep neural networks and serve as enhanced features to improve ASR performance. Experiments are conducted on a low-resource Vietnamese dataset. The results show that all three methods yield significant gains over the baseline system, with the Prognets acoustic model performing best; further improvements are obtained by combining the Prognets model and bottleneck features.
Keywords: bottleneck feature (BNF); cross-lingual automatic speech recognition (ASR); progressive neural networks (Prognets); model transfer learning
11. WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages (Cited: 1)
Authors: Tripti Choudhary, Vishal Goyal, Atul Bansal. Big Data Mining and Analytics (EI, CSCD), 2023, No. 1, pp. 85-91.
Automatic speech recognition systems translate speech signals into the corresponding text representation, a translation used in applications such as voice-enabled commands, assistive devices, and bots. There is a significant lack of efficient technology for Indian languages. In this paper, a wavelet transformer for automatic speech recognition (WTASR) of Indian languages is proposed. Speech signals exhibit varying high- and low-frequency content over time due to variation in the speaker's speech, so wavelets enable the network to analyze the signal at multiple scales. The wavelet decomposition of the signal is fed into a transformer network comprising an encoder-decoder system for generating text. The model is trained on an Indian-language dataset for translating speech into the corresponding text and is compared with other state-of-the-art methods. The results show that the proposed WTASR achieves a low word error rate and can be used for effective speech recognition of Indian languages.
Keywords: transformer; wavelet; automatic speech recognition (ASR); Indian languages
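Word error rate (WER), the metric quoted throughout these listings, is the word-level edit distance between reference and hypothesis divided by the reference length. A minimal sketch with uniform cost for substitutions, insertions, and deletions:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j]: edit distance between the first i ref words and first j hyp words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(r)][len(h)] / len(r)
```

Character error rate (CER), used by the Mandarin papers above, is the same computation applied to characters instead of words.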
12. Design of a Voice-Controlled Lighting System Based on ASR and Arduino (Cited: 1)
Author: 胡芷晗. 《电声技术》, 2019, No. 5, pp. 56-57 and 63.
Through in-depth study of the Arduino single board, combined with a high-performance ASR speech recognition chip, speech recognition technology was introduced into lighting system design. The overall structure of the voice control system, the main control module, and the hardware and software for speech recognition were designed, and a voice control system based on Arduino was implemented. Final testing achieved real-time remote control of a desk lamp, raising the degree of intelligence of the system.
Keywords: speech recognition; voice control; ASR; Arduino
13. Design and Reliability Study of an ASR-Based Call Center System
Author: 郭瑞. 《环境技术》, 2010, No. 2, pp. 34-38 and 51.
Taking an IT operations and maintenance fault-reporting system as an example, this paper describes how to design an ASR (automatic speech recognition)-based call center using the Nuance Recognizer 9.0 automatic speech recognition system and the 东进 D081A analog-trunk voice-card telephony system. The paper covers the key steps of the design process and discusses the system's reliability in depth, including how to design grammar files appropriately to improve the speech recognition rate, and how to provide synchronized safeguards during operation so that the system is gradually improved.
Keywords: automatic speech recognition (ASR); call center; grammar files; synchronized safeguards
14. Real Time Speech Based Integrated Development Environment for C Program (Cited: 1)
Authors: Bharathi Bhagavathsingh, Kavitha Srinivasan, Mariappan Natrajan. Circuits and Systems, 2016, No. 3, pp. 69-82.
Automatic Speech Recognition (ASR) is the process of converting an acoustic signal captured by a microphone into written text. The motivation of this paper is to create a speech-based Integrated Development Environment (IDE) for C programs: a technique that enables visually impaired people, or programmers with arm injuries, to write C programs through voice input. The proposed system accepts a C program as voice input and produces a compiled C program as output. The user utters each line of the program; the voice input is first recognized as text, the recognized text is converted into C code using the syntactic constructs of the language, and the resulting program is fed into the IDE. IDE commands such as open, save, close, compile, and run are also given through voice input, and any errors that occur during compilation are corrected through voice input by specifying the line number. The performance of the speech recognition system is analyzed by varying the vocabulary size as well as the number of mixture components in the HMM.
Keywords: automatic speech recognition; integrated development environment; hidden Markov model; Mel-frequency cepstral coefficients
15. Speech Signal Recovery Based on Source Separation and Noise Suppression
Authors: Zhe Wang, Haijian Zhang, Guoan Bi. Journal of Computer and Communications, 2014, No. 9, pp. 112-120.
In this paper, a speech signal recovery algorithm is presented for a personalized voice-command automatic recognition system in vehicle and restaurant environments. The algorithm separates a mixed speech source from multiple speakers, detects the presence or absence of speakers by tracking the higher-magnitude portion of the speech power spectrum, and adaptively suppresses noise. An automatic speech recognition (ASR) process for the multi-speaker task is designed and implemented. Evaluation tests on the NOIZEUS speech database show that the proposed algorithm achieves impressive performance improvements.
Keywords: speech recovery; time-frequency; source separation; adaptive noise suppression; automatic speech recognition
16. End-to-End Chinese Air Traffic Control Speech Recognition Based on EfficientNetV2-RetNet (Cited: 2)
Authors: 梁海军, 常瀚文, 何一民, 赵志伟, 孔建国. 《电讯技术》 (PKU core journal), 2025, No. 2, pp. 254-260.
Applying automatic speech recognition (ASR) in the air traffic control (ATC) field promises to improve communication efficiency, reduce human error, enhance safety, and drive innovation in air traffic management systems. However, because ATC communications usually involve sensitive information, obtaining large amounts of labeled ATC speech data is difficult, which poses a major challenge for building high-accuracy ASR systems. Based on the Retentive Network (RetNet) and transfer learning, a new end-to-end ASR framework, EfficientNetV2-RetNet-CTC, is designed for ATC systems. The multi-layer convolutional structure of EfficientNetV2 helps extract more complex feature representations from speech signals. RetNet uses a multi-scale retention mechanism to learn global temporal dynamics over sequence data and handles long-range dependencies very efficiently. Connectionist temporal classification requires no forced label alignment and allows variable-length labels. In addition, transfer learning improves performance on the target task using knowledge learned on a source task, alleviating the scarcity of data resources in civil aviation and improving the model's generalization. Experimental results show that the designed model outperforms other baselines: the lowest word error rates after pre-training on the Aishell corpus are 7.6% and 8.7%, dropping to 5.6% and 6.8% on the ATC corpus.
Keywords: air traffic control; automatic speech recognition; end-to-end deep learning; transfer learning
17. Design and Implementation of a Substation Operation and Inspection System Based on AR and ASR
Authors: 梁日才, 刘文平, 罗海鑫, 王晓强. 《通信电源技术》, 2022, No. 13, pp. 99-103.
Power utilities need to carry out inspection, maintenance, overhaul, and emergency repair of substation equipment. Traditional substation operation and inspection work suffers from insufficient skill levels, poor communication, and a low degree of intelligence; the work is also complex and comprehensive, so providing field personnel with real-time expert-database support is an important direction for its development. To improve the efficiency and quality of expert consultation, help experts quickly understand the site and give accurate guidance, shorten the defect-elimination cycle, and improve defect-elimination efficiency, an interactive substation operation and inspection system was designed based on augmented reality (AR) and automatic speech recognition (ASR). It enables rapid remote expert consultation, efficiently assists in solving on-site problems, significantly improves the efficiency of substation operation and inspection work, and further safeguards the personal safety of substation workers. The system's successful deployment at a substation management office verified its practicality and effectiveness.
Keywords: substation defect elimination; augmented reality (AR); automatic speech recognition (ASR); interactive system; remote video consultation
18. A Speech Recognition Method for the Low-Resource Lingao Dialect Based on Multi-Feature Transfer Learning
Authors: 王忠, 曹春杰, 谢夏, 穆罕默德·艾哈迈德·拉扎, 陈勇青, 陈昱珏. 《通信学报》 (PKU core journal), 2025, No. 10, pp. 221-232.
To address data scarcity and high character error rates in low-resource Lingao dialect speech recognition, an end-to-end speech recognition method based on multi-feature transfer learning is proposed. Using the TeleSpeech-ASR1.0-large multi-dialect pre-trained model as the base, three complementary acoustic features (Mel-frequency cepstral coefficients, filter-bank energy coefficients, and log-Mel spectrograms) are fused. A jointly optimized Conformer-LAS-CTC architecture is built in which depthwise-separable convolutions and multi-head self-attention capture the local features and global dependencies of the speech signal, respectively, and a multi-task loss function combining CTC, intermediate-layer CTC, and attention is designed for joint training. Experiments on a 280-hour mixed corpus of Lingao dialect and Mandarin show that the proposed method reduces the character error rate to 18.89%, significantly outperforming the baseline model, effectively easing the data bottleneck faced by low-resource dialects, and offering a feasible technical path for the digital preservation of endangered languages.
Keywords: low-resource speech recognition; transfer learning; Conformer; multi-feature fusion; Lingao dialect
19. Forensic Automatic Speaker Recognition with Short-Duration Speech (Cited: 1)
Authors: 张翠玲, 刘明星. 《中国人民公安大学学报(自然科学版)》, 2025, No. 2, pp. 100-108.
To explore the value of short-duration speech in forensic speaker recognition, a forensic automatic speaker recognition system based on the likelihood-ratio framework was used to run validation tests and comparative analyses on short-duration speech under typical casework conditions. By comparing different durations, calibration-set population sizes, and numbers of recordings, the study quantitatively evaluated the system's recognition performance under short-duration conditions, the influence of these three factors, and the practical value of short-duration speech in judicial practice. The results show that the system remains accurate and reliable with short-duration speech, which not only verifies the system's validity and robustness but also demonstrates the application potential of short-duration speech in judicial practice.
Keywords: short-duration speech; forensic speaker recognition; automatic speaker recognition; likelihood ratio
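The likelihood-ratio framework above reports the ratio of the probability of the observed comparison score under the same-speaker hypothesis to its probability under the different-speaker hypothesis. A hedged sketch with simple Gaussian score models (all parameter values below are made-up illustrations, not figures from the study):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(score, same_mu, same_sigma, diff_mu, diff_sigma):
    """LR = p(score | same speaker) / p(score | different speakers)."""
    return (gaussian_pdf(score, same_mu, same_sigma)
            / gaussian_pdf(score, diff_mu, diff_sigma))
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis; real forensic systems fit these score distributions on calibration data, which is why the calibration-set size studied in the paper matters.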
20. A Survey of Data Processing and Data Augmentation in Automatic Speech Recognition for Low-Resource Languages
Authors: 杨健, 孙浏, 张丽芳. 《计算机科学》 (PKU core journal), 2025, No. 8, pp. 86-99.
Because labeled speech data are insufficient, end-to-end automatic speech recognition (ASR) technology is difficult to apply directly to low-resource languages, and low-resource ASR has become a hot topic in NLP. Current research in low-resource settings proceeds along two lines, data augmentation and model improvement. Taking the processing of training data in low-resource ASR as its main subject, this survey reviews and summarizes important recent results from the perspectives of data augmentation, sample processing, and feature engineering. It analyzes different types of data augmentation schemes, emphasizing the use of unpaired speech and text; analyzes and summarizes feature engineering for low-resource ASR in terms of feature extraction, embedding, and fusion; discusses issues such as the construction of low-resource speech corpora; and outlines promising directions for further research on data augmentation for low-resource speech recognition.
Keywords: low-resource; automatic speech recognition; data augmentation; feature representation
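A common spectrogram-level augmentation in the family this survey covers is frequency masking (popularized by SpecAugment): zeroing a band of mel channels so the model cannot over-rely on any narrow frequency band. A minimal sketch on a time-by-frequency matrix (the mask position is passed in explicitly here for illustration; real pipelines draw it at random per utterance):

```python
def apply_freq_mask(spec, f0, width):
    """Zero mel channels [f0, f0 + width) in every frame of a
    (time x freq) spectrogram given as a list of lists.

    Returns a new spectrogram; the input is left untouched."""
    return [
        [0.0 if f0 <= j < f0 + width else v for j, v in enumerate(frame)]
        for frame in spec
    ]
```

Time masking is the symmetric operation over frames, and both are typically applied on the fly during training so each epoch sees differently masked copies of the same scarce data.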