Abstract: Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools, and any other text-processing task that requires bilingual data. This research proposes a language-independent sentence alignment approach based on Polish-to-English experiments (Polish is not a position-sensitive language). The approach was developed on the TED (Translanguage English Database) talks corpus but can be used for any text domain or language pair. It implements various heuristics for sentence recognition, some of which use synonyms and semantic text-structure analysis as additional information. The approach was designed to minimize data loss. The solution is compared with other sentence alignment implementations, and an improvement in MT system scores is shown for text processed with the described tool.
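The abstract does not spell out its heuristics. For orientation only, the following Python sketch shows a classic length-based sentence alignment baseline in the spirit of Gale and Church (1993); it is purely length-driven, has none of the synonym or semantic heuristics the abstract mentions, and is not the authors' method. The function names `align` and `length_cost`, the cost formula, and the penalty values are illustrative assumptions.

```python
import math

# Hypothetical penalties for a simplified length-based aligner (not the authors' tool).
SKIP_PENALTY = 3.0   # cost of leaving a sentence unaligned (1-0 or 0-1 bead)
MERGE_PENALTY = 1.0  # extra cost for 2-1 and 1-2 beads


def length_cost(src_len: int, tgt_len: int) -> float:
    """Penalty for pairing segments whose character lengths differ."""
    # Normalised absolute difference: 0 when lengths match, grows with mismatch.
    return abs(src_len - tgt_len) / math.sqrt(src_len + tgt_len + 1)


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Align two sentence lists by dynamic programming over 1-1, 1-0, 0-1, 2-1, 1-2 beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Each move: (source sentences consumed, target sentences consumed, fixed penalty).
    moves = [(1, 1, 0.0), (1, 0, SKIP_PENALTY), (0, 1, SKIP_PENALTY),
             (2, 1, MERGE_PENALTY), (1, 2, MERGE_PENALTY)]
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j).
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty
                if di and dj:  # only real pairings are scored by length
                    s_len = sum(len(s) for s in src[i:ni])
                    t_len = sum(len(t) for t in tgt[j:nj])
                    step += length_cost(s_len, t_len)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Running `align` on two short Polish sentences and their English translations pairs them one-to-one, while a single English sentence that covers two short source sentences comes back as one 2-1 bead. Approaches like the one described above add lexical and semantic evidence precisely because length alone is unreliable.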
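Abstract: Text-based analysis with modern retrieval software is an important aspect of corpus-assisted foreign language teaching. Taking Eugene Raudsepp's argumentative essay "Daydream a Little" as an example, this study uses corpus-assisted keyword comparison to analyze, quantitatively and intuitively, how the essay puts forward its claims, uses its evidence, and develops its reasoning, as well as how closely these three elements are integrated. It shows that corpus-assisted quantitative analysis of argumentative texts is a new line of exploration.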
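Keyword (keyness) comparison ranks words that occur unusually often in a study text relative to a reference corpus. The abstract does not name its statistic, so the sketch below assumes the log-likelihood (G2) measure widely used in corpus linguistics; the helper names `log_likelihood` and `keywords` are hypothetical, and the inputs are assumed to be already tokenized, non-empty word lists.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness for one word.

    a, b: frequency of the word in the study and reference corpus
    c, d: total token counts of the study and reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study: list[str], reference: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Return the words most over-represented in `study`, ranked by G2."""
    fs, fr = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scores = []
    for word, a in fs.items():
        b = fr.get(word, 0)
        # Keep positive keywords only: relative frequency higher in the study text.
        if a / c > b / d:
            scores.append((word, log_likelihood(a, b, c, d)))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]
```

For an essay, the top-ranked words are candidates for the terms that carry its claims and evidence; interpreting them is still the analyst's job.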
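Abstract: [Purpose/Significance] Hypernymy (hypernym-hyponym) relations describe "is-a" relations between concepts. They are an important cornerstone of taxonomies, ontologies, and knowledge graphs, and are widely applied in natural language processing. This paper analyzes the research progress, related resources, and applications of identifying hypernymy relations from text corpora, to provide a reference for researchers in related fields. [Method/Process] Using content analysis, this paper surveys and analyzes research on hypernymy identification published in Web of Science, VIP (Weipu), and CNKI. [Result/Conclusion] Hypernymy identification has produced some results, but the problem is far from solved and requires further exploration and research. Finally, recommendations for hypernymy identification research are given in five areas: research methods, benchmarks and evaluation, domain knowledge, language, and applications.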
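One classic way to identify hypernymy directly from text is with Hearst-style lexico-syntactic patterns such as "Y such as X" and "X and other Y". The survey abstract does not prescribe any particular method; the following is a deliberately simplified sketch that handles single-word noun phrases only, and the names `hearst_pairs`, `SUCH_AS`, and `AND_OTHER` are hypothetical.

```python
import re

# A single-word noun phrase that is not itself a conjunction (simplification).
ITEM = r"(?!(?:and|or)\b)\w+"
# "animals such as cats, dogs and horses" -> hypernym + enumeration of hyponyms
SUCH_AS = re.compile(rf"(\w+),? such as ((?:{ITEM}, )*{ITEM}(?:,? (?:and|or) {ITEM})?)")
# "oaks and other trees" -> hyponym + hypernym
AND_OTHER = re.compile(r"(\w+),? (?:and|or) other (\w+)")
# Separators inside an enumeration: ", ", " and ", ", and ", " or ", ...
SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def hearst_pairs(text: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by two simplified Hearst patterns."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        for item in SEPARATOR.split(m.group(2)):
            if item:
                pairs.append((item, hypernym))
    for m in AND_OTHER.finditer(text):
        pairs.append((m.group(1), m.group(2)))
    return pairs
```

Practical systems replace the single-word placeholders with noun-phrase chunks from a tagger or parser and filter noisy pairs statistically; pattern-based extraction is only one of the families of methods such surveys cover.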
Funding and acknowledgements: The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which significantly contributed to improving the manuscript. This work was supported by the National Key Basic Research Project of China (973 Program) (2012CB316400), the National Natural Science Foundation of China (Grant Nos. 61471321, 61202400, 31300539, and 31570629), the Zhejiang Provincial Natural Science Foundation of China (LY15C140005, LY16F010004), the Science and Technology Department of Zhejiang Province Public Welfare Project (2016C31G2010057, 2015C31004), the Fundamental Research Funds for the Central Universities (172210261), and the Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology Research.
Abstract: Accurately representing the quantity and characteristics of users' interest in certain topics is an important problem for topic evolution researchers, particularly in modern online environments. Search engines can retrieve information on a specified topic from archived data, but they do not reflect changes in interest toward the topic over time in a structured way. This paper reviews, from multiple aspects, notable research over the past decade on topic evolution based on the probabilistic topic model. First, we introduce the notation, terminology, and basic topic model explored in the survey; then we summarize three categories of topic evolution based on the probabilistic topic model: the discrete-time topic evolution model, the continuous-time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model, summarize methods for evaluating model generalization performance and topic evolution, and provide comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.
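In the discrete-time setting, one common post hoc strategy is to fit a topic model to each time slice separately and then link topics across adjacent slices by comparing their word distributions. The sketch below illustrates only that linking step; it assumes Jensen-Shannon divergence as the similarity measure, a shared vocabulary across slices, a non-empty `prev`, and a hypothetical threshold of 0.5, and it does not reproduce any specific model from the survey.

```python
import math


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Terms with a_i = 0 contribute 0; b_i > 0 whenever a_i > 0 because b = m.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def link_topics(prev: list[list[float]], curr: list[list[float]],
                threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Link each topic in slice t+1 to its closest topic in slice t.

    Returns (prev_index, curr_index, divergence) for links below `threshold`.
    """
    links = []
    for j, q in enumerate(curr):
        best_i, best_d = min(((i, js_divergence(p, q)) for i, p in enumerate(prev)),
                             key=lambda pair: pair[1])
        if best_d < threshold:
            links.append((best_i, j, best_d))
    return links
```

A topic in slice t+1 whose closest match exceeds the threshold has no predecessor and can be read as newly emerging; chaining links over successive slices traces how each topic drifts.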