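Abstract: To investigate how X-codec affects the audio generation performance of large language models, this study uses the LibriSpeech dataset to analyze how corpus characteristics (duration and timbre) influence the performance of an X-codec-based large language model (LLM) in audio generation tasks. Measurements of the similarity objective (Sim-O) score and the user test mean opinion score (UTMOS) show that for utterances longer than 10 s (i.e., long utterances) and for male voices, both the Sim-O score and UTMOS have significantly higher arithmetic means, and significantly lower standard deviations, than the other groups in the corresponding feature category. Long utterances from male speakers are therefore the most likely to bring an X-codec-based LLM to its best performance. These findings can provide theoretical support for optimizing audio codec design.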
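The group comparison described above (mean and standard deviation of Sim-O and UTMOS per duration group and per voice group) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the `Utterance` record, the two-way "long"/"short" split, and the use of the population standard deviation are assumptions, and no significance test is performed (a test such as Welch's t-test would be needed to support "significantly").

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev

# The abstract defines long utterances as those longer than 10 s.
LONG_THRESHOLD_S = 10.0


@dataclass
class Utterance:
    duration_s: float  # utterance length in seconds
    gender: str        # speaker voice label, e.g. "male" or "female"
    sim_o: float       # Sim-O score of the generated audio
    utmos: float       # UTMOS score of the generated audio


def duration_group(u: Utterance) -> str:
    # Hypothetical two-way split; the study may use finer duration bins.
    return "long" if u.duration_s > LONG_THRESHOLD_S else "short"


def summarize(utts, key):
    """Group utterances by key(u); report count, mean and population std of both metrics."""
    groups = defaultdict(list)
    for u in utts:
        groups[key(u)].append(u)
    summary = {}
    for name, members in groups.items():
        sims = [u.sim_o for u in members]
        mos = [u.utmos for u in members]
        summary[name] = {
            "n": len(members),
            "sim_o_mean": mean(sims),
            "sim_o_std": pstdev(sims),
            "utmos_mean": mean(mos),
            "utmos_std": pstdev(mos),
        }
    return summary
```

Calling `summarize(utts, duration_group)` compares duration groups, and `summarize(utts, lambda u: u.gender)` compares voice groups, mirroring the two feature categories in the abstract.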
Funding: Project supported by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049), the Joint Fund of Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001), and the Key R&D Program of Shandong Province, China (Grant No. 2023CXGC010901).
Abstract: On the Internet, images are of great significance for information transmission, and how to ensure and improve their security has become a focus of international research. We combine DNA codecs with the quantum Arnold transform (QArT) to propose a new double-encryption algorithm for quantum color images that improves the security and robustness of image encryption. First, we exploit the biological characteristics of DNA codecs to encode and decode the pixel color information of quantum color images, achieving pixel-level diffusion. Second, we use QArT to scramble the position information of the quantum image and use the resulting image as the key matrix for quantum XOR operations. All quantum operations in this paper are reversible, so the ciphertext image can be decrypted by reversing the encryption process. We conduct encryption and decryption simulations on three color images, “Monkey”, “Flower”, and “House”. The experimental results show that the encrypted images exhibit good similarity in histogram peak values and in correlation; averaged over the three RGB channels, the number of pixels change rate (NPCR) is 99.61%, the unified average changing intensity (UACI) is 33.41%, and the information entropy is about 7.9992. In addition, simulations of noise interference in realistic scenarios verify the robustness of the proposed algorithm.
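The evaluation metrics quoted above, and the position scrambling performed by QArT, have well-known classical counterparts that can be sketched on ordinary 8-bit pixel arrays. This is a classical illustration only, not the paper's quantum circuits: NPCR and UACI compare two cipher images, entropy is measured per channel (its maximum for 256 grey levels is 8 bits), and `arnold` applies one iteration of the classical cat map (x, y) → (x + y, x + 2y) mod N to an N × N image.

```python
import math
from collections import Counter


def npcr(c1, c2):
    """Number of pixels change rate (%) between two equal-size cipher images."""
    pairs = [(a, b) for r1, r2 in zip(c1, c2) for a, b in zip(r1, r2)]
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def uaci(c1, c2, max_value=255):
    """Unified average changing intensity (%) between two cipher images."""
    pairs = [(a, b) for r1, r2 in zip(c1, c2) for a, b in zip(r1, r2)]
    return 100.0 * sum(abs(a - b) / max_value for a, b in pairs) / len(pairs)


def entropy(channel):
    """Shannon entropy (bits) of the pixel values in one channel."""
    counts = Counter(v for row in channel for v in row)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def arnold(img):
    """One classical Arnold cat map iteration on an N x N image."""
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[(x + y) % n][(x + 2 * y) % n] = img[x][y]
    return out


def inverse_arnold(img):
    """Undo one iteration: read each pixel back from where arnold() moved it."""
    n = len(img)
    return [[img[(x + y) % n][(x + 2 * y) % n] for y in range(n)] for x in range(n)]
```

Because the map matrix [[1, 1], [1, 2]] has determinant 1, it is a bijection modulo N, which is why the scrambling step is exactly reversible, matching the abstract's point that decryption simply runs the operations backwards.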
Abstract: This study designs and implements a programmable Reed-Solomon (RS) codec module for a satellite communication modem, using an FPGA as the core of the implementation and several ASICs as necessary auxiliary components. The module comprises an RS codec unit, an interleaver/deinterleaver unit, a scrambler/descrambler unit, and a frame synchronization unit. It has been implemented successfully and can be reprogrammed online to meet the requirements of IESS 308/309/310, which include many specifications for different service types and data rates. Combining an FPGA with ASICs greatly reduces circuit size, dramatically increases flexibility, and further improves stability. Furthermore, because the module is based on the software radio concept, it can easily be integrated into the complete satellite communication modem.
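The interleaver/deinterleaver unit mentioned above is commonly realized as a block interleaver: symbols are written into a rows × cols array row by row and read out column by column. The sketch below shows that generic technique under the assumption that each row holds one RS codeword; the actual interleaving depths required by IESS 308/309/310 are not given in the abstract and are not reproduced here.

```python
def interleave(symbols, rows, cols):
    """Block interleaver: write row by row into a rows x cols array, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse permutation: write column by column, read row by row."""
    if len(symbols) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = symbols[i]
            i += 1
    return out
```

With `rows` codewords of `cols` symbols each, any burst of up to `rows` consecutive channel errors lands on at most one symbol per codeword after deinterleaving, which keeps each codeword within the RS decoder's correction capability.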
Funding: Supported by the International Cooperation Research Project between Ericsson (Sweden) and BIT.
Abstract: To improve the rate flexibility of the Enhanced Voice Services (EVS) channel-aware mode for various VoIP applications, this paper proposes two new bit-rate channel-aware modes in addition to the existing 13.2 kbit/s mode. The channel-aware mode provides forward error correction by redundantly transmitting re-encoded information, which is used when the original information is lost or is discarded because it reaches the receiver too late. The bit rate of the primary frame is reduced to make room for this redundancy, and a modified quantization scheme is proposed for the core encoding to limit the resulting quality degradation. Because of the tighter bit constraint, the partial redundant coding is a simplified version of that used in the existing 13.2 kbit/s channel-aware mode. Objective PESQ evaluations show that the additional channel-aware modes improve robustness against packet loss about as well as the existing 13.2 kbit/s mode does. With multiple bit-rate modes, the communication system can select a mode dynamically to support more voice services across different bandwidths, and optimal allocation based on real-time feedback can adapt as closely as possible to a rapidly changing network environment.
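The redundancy mechanism described above can be sketched as follows: packet n carries the primary encoding of frame n plus a partial (re-encoded) copy of frame n − K, and a receiver that misses frame n falls back to the partial copy in packet n + K if that packet arrives in time. This is a simplified model, not EVS itself: strings stand in for encoded frames, the FEC offset `fec_offset` is a hypothetical parameter, and jitter-buffer timing is not modeled. The 20 ms frame length is EVS's, so 13.2 kbit/s corresponds to 264 bits per frame shared between the primary and partial copies.

```python
from dataclasses import dataclass
from typing import Optional

FRAME_MS = 20  # EVS frame duration


@dataclass
class Packet:
    seq: int
    primary: str            # stand-in for the primary encoded frame
    partial: Optional[str]  # stand-in for the partial copy of frame seq - fec_offset


def bits_per_frame(bitrate_bps: int) -> int:
    """Total bit budget of one 20 ms frame at the given bit rate."""
    return bitrate_bps * FRAME_MS // 1000


def packetize(frames, fec_offset):
    """Attach to packet n a partial copy of frame n - fec_offset (if it exists)."""
    packets = []
    for n, f in enumerate(frames):
        partial = f"partial({frames[n - fec_offset]})" if n >= fec_offset else None
        packets.append(Packet(n, f, partial))
    return packets


def recover(received, num_frames, fec_offset):
    """received maps seq -> Packet for packets that arrived in time."""
    out = []
    for n in range(num_frames):
        if n in received:
            out.append(received[n].primary)
        elif n + fec_offset in received and received[n + fec_offset].partial is not None:
            out.append(received[n + fec_offset].partial)
        else:
            out.append(None)  # no copy available: ordinary concealment would run here
    return out
```

Lowering the total bit rate, as the new modes do, shrinks `bits_per_frame`, so both the primary and partial encodings must fit in fewer bits; this is the constraint behind the modified quantization and simplified partial coding.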
Abstract: Because the radio access network and the core network carry codec streams in different formats, speech has to be encoded and decoded twice, which degrades speech quality. Codec negotiation technologies are therefore needed to unify encoding and decoding across the whole path. Transcoder Free Operation (TrFO), Tandem Free Operation (TFO), and the network-quantity deciding technology are the leading codec negotiation technologies. TrFO is a mechanism for selecting the optimal codec during call setup: it tries to establish a connection between User Equipment (UE) without a Transcoder (TC), and when it succeeds, bandwidth is used efficiently. TFO, the fallback technology for TrFO, is an in-band codec negotiation technology: with it, the user's codec stream is no longer decompressed and recompressed by the voice codec, so voice quality improves. The network-quantity deciding technology switches flexibly between G.711 and G.729 according to the number of admitted calls, which allows new calls to be admitted without excessively increasing the network load.
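Two of the decisions described above can be sketched as simple selection rules. This is an illustrative model, not the signaling defined in the 3GPP specifications: `select_trfo_codec` assumes TrFO reduces to choosing the caller's most preferred codec that the callee also supports (a `None` result means a transcoder, or TFO, would be needed), and `select_by_load` assumes the network-quantity rule is a single hypothetical call-count threshold. G.711 runs at 64 kbit/s and G.729 at 8 kbit/s, which is why switching to G.729 under load frees capacity for new calls.

```python
from typing import Optional, Sequence


def select_trfo_codec(caller: Sequence[str], callee: Sequence[str]) -> Optional[str]:
    """Return the first codec in the caller's preference order that the callee supports.
    None means no common codec, so end-to-end transcoder-free operation is impossible."""
    supported = set(callee)
    for codec in caller:
        if codec in supported:
            return codec
    return None


def select_by_load(active_calls: int, threshold: int) -> str:
    """Hypothetical load rule: G.711 (64 kbit/s) while lightly loaded, G.729 (8 kbit/s) otherwise."""
    return "G.711" if active_calls < threshold else "G.729"
```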
Funding: Supported by the National Natural Science Foundation of China under Grant No. 69773046.
Abstract: ITU-T G.729 is one of the principal speech codecs recommended by the H.323 standard. This paper describes how to implement the G.729 codec in an IP telephony gateway and examines in depth the programming techniques for the TMS320C6201 DSP and the code-optimization methods used to reduce the speech-processing delay of the G.729 codec. Using these optimization methods and programming techniques, we implemented a high-speed speech codec that processes 20 voice channels concurrently on a single TMS320C6201 chip in an IP telephony gateway. Finally, the paper analyzes the performance of the ITU-T G.729 codec on the TMS320C6201.
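A back-of-the-envelope check shows what supporting 20 channels on one chip implies. The sketch assumes a 200 MHz TMS320C6201 clock (an assumption, not stated in the abstract) and uses G.729's 10 ms frame and 8 kbit/s rate: one 10 ms frame period then offers 2,000,000 cycles, or 100,000 cycles per channel, and that budget must cover both encoding and decoding for a full-duplex channel plus any gateway overhead. It is an upper bound, not a measured figure.

```python
def cycle_budget_per_channel(clock_hz: int, frame_ms: int, channels: int) -> int:
    """Cycles available to each channel per frame when all channels share one core."""
    cycles_per_frame_period = clock_hz * frame_ms // 1000
    return cycles_per_frame_period // channels


def bits_per_frame(bitrate_bps: int, frame_ms: int) -> int:
    """Encoded bits produced per frame at a constant bit rate."""
    return bitrate_bps * frame_ms // 1000


# Assumptions: 200 MHz C6201 clock; G.729 uses 10 ms frames at 8 kbit/s.
C6201_CLOCK_HZ = 200_000_000
G729_FRAME_MS = 10
G729_BITRATE_BPS = 8_000

budget = cycle_budget_per_channel(C6201_CLOCK_HZ, G729_FRAME_MS, 20)
frame_bits = bits_per_frame(G729_BITRATE_BPS, G729_FRAME_MS)
```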
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2015AA016306), the Foundation of Outstanding Middle-aged and Young Scientific and Technological Innovation Team Program of Department of Education of Hubei Province (T201516), and the Foundation of Department of Education of Hubei Province (Q20132207).
Abstract: A novel frame error concealment scheme is proposed to improve the decoded audio quality at the receiver of a transform coded excitation (TCX) audio codec. The scheme is a gain control approach based on the stability of the linear predictive coding (LPC) filter: it predicts lost frames using the line spectral frequencies and applies different continuous attenuation factors to different kinds of lost frames. Signal-to-noise ratio (SNR) tests and multiple stimuli with hidden reference and anchor (MUSHRA) tests are conducted to evaluate the approach in the adaptive multi-rate wideband plus (AMR-WB+) audio codec. Compared with the original frame error concealment scheme, our scheme achieves better audio recovery quality in the AMR-WB+ audio codec.
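The idea of choosing the attenuation from LPC stability can be sketched as follows. This is a hedged illustration, not the paper's algorithm or the AMR-WB+ reference code: stability is estimated from the squared distance between the line spectral frequency (LSF) vectors of the last two good frames, normalized by a hypothetical `scale`, and the gain of consecutive lost frames decays geometrically with a factor interpolated between two hypothetical values (slower decay for a stable filter). An SNR helper matching the objective test in the abstract is included.

```python
import math


def lsf_distance(lsf_a, lsf_b):
    """Squared Euclidean distance between two LSF vectors."""
    return sum((a - b) ** 2 for a, b in zip(lsf_a, lsf_b))


def stability(lsf_prev, lsf_curr, scale):
    """Map the LSF change between the last two good frames to [0, 1], where 1 is stable.
    'scale' is a hypothetical normalization constant, not a value from the paper."""
    return max(0.0, min(1.0, 1.0 - lsf_distance(lsf_prev, lsf_curr) / scale))


def concealment_gains(last_gain, n_lost, stab, alpha_stable=0.9, alpha_unstable=0.6):
    """Gains for n_lost consecutive lost frames, decaying geometrically.
    The two alpha values are hypothetical; a stable LPC filter decays more slowly."""
    alpha = alpha_unstable + (alpha_stable - alpha_unstable) * stab
    return [last_gain * alpha ** k for k in range(1, n_lost + 1)]


def snr_db(reference, decoded):
    """Signal-to-noise ratio (dB) of decoded audio against the error-free reference."""
    signal = sum(s * s for s in reference)
    noise = sum((s - d) ** 2 for s, d in zip(reference, decoded))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)
```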
Abstract: In this paper, we present a method that uses video codec technology to compress ECG signals. The method exploits both intra-beat and inter-beat correlations of ECG signals to achieve a high compression ratio (CR) and a low percent root-mean-square difference (PRD). ECG signals have intra-beat and inter-beat redundancies, just as video signals have intra-frame and inter-frame correlations, so video codec technology can be applied to ECG compression. This requires some preprocessing: the ECG signal is first segmented and normalized into a sequence of beat cycles of equal length, and these beat cycles are then treated as picture frames and compressed with video codec technology. We evaluated the algorithm on records from the MIT-BIH arrhythmia database. The results show that, in addition to compressing efficiently, the algorithm offers adjustable resolution, random access, and robustness to irregular beat periods and false QRS detections.
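The preprocessing step and the two quality measures described above can be sketched as follows. The sketch assumes R-peak indices are already available from a QRS detector (not shown), cuts the signal between consecutive R peaks, and linearly resamples every beat to a fixed length so the beats can be stacked like video frames; the original beat lengths are kept because a decoder would need them to undo the normalization. Only length normalization is shown, and the video coding step itself is omitted. `prd` uses the non-mean-removed definition; some papers subtract the signal mean first. MIT-BIH records store 11-bit samples at 360 Hz, so one second of one channel is 360 × 11 = 3960 bits before compression.

```python
import math


def resample(beat, length):
    """Linearly interpolate one beat onto `length` evenly spaced samples."""
    if length == 1 or len(beat) == 1:
        return [float(beat[0])] * length
    step = (len(beat) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(beat) - 1)
        frac = pos - lo
        out.append(beat[lo] * (1.0 - frac) + beat[hi] * frac)
    return out


def beats_to_frames(signal, r_peaks, length):
    """Cut the signal at consecutive R-peak indices and resample every beat to `length`.
    Returns the fixed-length frames and the original beat lengths."""
    frames, lengths = [], []
    for start, end in zip(r_peaks, r_peaks[1:]):
        beat = signal[start:end]
        lengths.append(len(beat))
        frames.append(resample(beat, length))
    return frames, lengths


def prd(original, reconstructed):
    """Percent root-mean-square difference (non-mean-removed variant)."""
    err = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x * x for x in original)
    return 100.0 * math.sqrt(err / energy)


def compression_ratio(original_bits, compressed_bits):
    """CR = size of the original bit stream / size of the compressed bit stream."""
    return original_bits / compressed_bits
```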