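Abstract: Confusion networks have recently shown strong performance in combining the outputs of multiple machine translation systems. However, because different translation systems produce different word orders, hypothesis alignment remains an important problem in confusion network construction, and previous alignment methods have not taken semantic information into account. To improve system combination performance, this paper proposes using word sense disambiguation (WSD) to guide alignment in the confusion network. The skeleton translation is likewise selected by computing the similarity between sentences, where sentence similarity is computed with a maximum matching algorithm on a bipartite graph. To incorporate the WordNet-based WSD method into the system, the paper modifies the translation error rate (TER) algorithm. Experimental results show that the proposed method outperforms the classic TER algorithm.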
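The abstract does not say how the bipartite graph is built or how the matching is turned into a score, so the following is only a minimal sketch under stated assumptions: tokens of the two sentences form the two sides of the graph, an edge joins two tokens that a `compatible` predicate accepts (plain case-insensitive equality here, as a stand-in for a WordNet sense comparison), and similarity is the Dice-style ratio `2 * |matching| / (len(a) + len(b))`. The function names and the rule of picking the hypothesis with the highest average similarity as the skeleton are hypothetical.

```python
def max_bipartite_matching(left, right, compatible):
    """Return the size of a maximum-cardinality matching between two token
    lists, using Kuhn's augmenting-path algorithm."""
    match_of_right = [-1] * len(right)  # index of the left token matched to each right token

    def try_augment(i, seen):
        for j in range(len(right)):
            if j not in seen and compatible(left[i], right[j]):
                seen.add(j)
                # Take right[j] if it is free, or if its partner can be re-matched.
                if match_of_right[j] == -1 or try_augment(match_of_right[j], seen):
                    match_of_right[j] = i
                    return True
        return False

    return sum(try_augment(i, set()) for i in range(len(left)))


def sentence_similarity(a, b, compatible=lambda x, y: x.lower() == y.lower()):
    """Assumed scoring: matched tokens relative to total tokens, in [0, 1]."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a and not tokens_b:
        return 1.0
    matched = max_bipartite_matching(tokens_a, tokens_b, compatible)
    return 2 * matched / (len(tokens_a) + len(tokens_b))


def select_skeleton(hypotheses):
    """Assumed rule: index of the hypothesis most similar, on average, to the others."""
    def average_similarity(i):
        others = [h for j, h in enumerate(hypotheses) if j != i]
        total = sum(sentence_similarity(hypotheses[i], h) for h in others)
        return total / max(len(others), 1)

    return max(range(len(hypotheses)), key=average_similarity)
```

Because the matching ignores token positions, two hypotheses that differ only in word order receive a similarity of 1.0, which suits skeleton selection across systems with different word orders. A sense-aware `compatible` predicate could be passed in without changing the matching code.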
Abstract: This paper focuses on some common word order errors that Chinese students make. The author presents a body of contrastive material to illustrate the specific word order errors found in collected writing samples of Chinese students. The errors are then broken down into error types and analyzed in depth. At the end of the paper, the occurrence rate of each error type is calculated and the statistics are presented in a pie chart. The paper finds that interlingual interference leads to these common errors among Chinese learners of English.
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2023R1A2C1005950).
Abstract: The challenging task of handwriting style synthesis requires capturing the individuality and diversity of human handwriting. The majority of currently available methods use either a generative adversarial network (GAN) or a recurrent neural network (RNN) to generate new handwriting styles. These techniques frequently fall short of producing diverse and realistic text images, particularly for words that are not commonly used. To address this, the research proposes a novel deep learning model that consists of a style encoder and a text generator to synthesize different handwriting styles. The network generates conditional text by extracting style vectors from a series of style images. The model performs well on a range of handwriting synthesis tasks, including the production of out-of-vocabulary text. It outperforms previous approaches, showing lower values on key GAN evaluation metrics, such as Geometric Score (GS) (3.21×10^(-5)) and Fréchet Inception Distance (FID) (8.75), as well as on text recognition metrics such as Character Error Rate (CER) and Word Error Rate (WER). A thorough component analysis revealed a steady improvement in image production quality, highlighting the importance of specific handwriting styles. Applicable fields include digital forensics, creative writing, and document security.
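For readers unfamiliar with the text recognition metrics named above, CER and WER are conventionally defined as the Levenshtein edit distance between a recognizer's output and the reference text, divided by the reference length in characters or words. The sketch below follows that standard definition; the abstract does not describe the paper's own evaluation pipeline, so the function names and the choice to reject an empty reference are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Lower values are better for both metrics, and either can exceed 1.0 when the hypothesis is much longer than the reference.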
Funding: The Commonweal Technical Project of Zhejiang Province of China under Grant No. 2013C33063, the National Natural Science Foundation of China under Grant Nos. 61100183 and 61402417, the Natural Science Foundation of Zhejiang Province of China under Grant No. LQ13F020014, and the 521 Talents Project of Zhejiang Sci-Tech University.
Abstract: In opinion mining of product reviews, an important task is to provide a summary of customers' opinions based on different opinion targets. Because of differing knowledge backgrounds or linguistic habits, customers use a variety of terms to describe the same opinion target. These terms are called context-dependent synonyms. To provide a comprehensive summary, the first step is to classify these opinion target words into groups. In this article, we focus on clustering context-dependent opinion target words in Chinese product reviews. We apply three clustering methods based on distributional similarity and use four different co-occurrence matrices in our experiments. According to experimental results on a large number of reviews, our proposed heuristic k-means clustering method using the opinion target word co-occurrence matrix achieves the best clustering result with lower time complexity and less memory. In addition, its accuracy is more stable across different combinations of initial centroids. For some kinds of co-occurrence matrices, we also find that small (low-dimensional) matrices achieve higher average clustering accuracy than large (high-dimensional) matrices. Our findings provide a time-efficient and space-efficient way to cluster opinion targets with high accuracy.
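The abstract does not spell out the heuristic used in its k-means variant, so the following is only a generic sketch of clustering by distributional similarity: each opinion target word is represented by its row in a co-occurrence matrix, rows are compared by cosine similarity, and plain k-means is run from caller-supplied seed rows. The function names, the cosine measure, and the seeding scheme are assumptions for illustration, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans_cosine(vectors, seed_indices, max_iter=50):
    """Cluster the rows of a co-occurrence matrix; returns one cluster label per row.

    seed_indices picks the rows used as initial centroids (hypothetical seeding).
    """
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each word goes to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

As a toy usage example, with rows for "screen", "display", "battery", and "power" over four made-up context columns, `kmeans_cosine([[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [0, 1, 4, 5]], seed_indices=[0, 2])` groups the first two words together and the last two together. Trying several values of `seed_indices` is one way to probe the sensitivity to initial centroids that the abstract mentions.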