At all times, translators have endeavored to convey the poet's original meaning through their own understanding and modes of expression. As a result, different translators may produce different translations of the same poem: some pay more attention to form, others to meaning.
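[Purpose/Significance] To recommend similar books on the basis of their textual content, and to make book-similarity computation more efficient when the book collection is very large. [Method/Process] A graph-based method for content-based similar-book recommendation is constructed. Phrases are extracted from each book's text, TextRank values are computed over the resulting phrase network to obtain the book's keywords, book vectors are built from these keywords, and the Hierarchical Navigable Small World (HNSW) algorithm is used to obtain the similarity between a target book and candidate recommendations. [Result/Conclusion] The method achieves an average user-rated accuracy of 0.807, and its objective average accuracy is significantly higher than that of TF-IDF and TextRank text representations, giving good recommendation results. HNSW reduces the cost of the similarity search to logarithmic order, which helps make similar-book computation more efficient in big-data settings. [Innovation/Limitation] The study combines a graph structure with the HNSW algorithm to improve both the accuracy and the efficiency of book recommendation, but its dependence on the Tencent dictionary limits the generality and cross-language adaptability of the vector representation.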
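The pipeline described above (TextRank keywords, then book vectors, then a similarity search) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: single tokens stand in for the extracted phrases, a book's vector is simply its {keyword: TextRank score} map rather than the Tencent-dictionary vectors the study relies on, and an exact brute-force cosine search replaces HNSW, which in practice would come from a library such as hnswlib. The function names and toy texts are hypothetical.

```python
import math
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iters=50, top_k=5):
    """Rank tokens with weighted TextRank and return the top_k (token, score) pairs."""
    # Undirected co-occurrence graph: each token is linked to the next (window - 1) tokens.
    weights = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                weights[a][b] += 1.0
                weights[b][a] += 1.0
    nodes = list(weights)
    out_sum = {n: sum(weights[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    # Fixed number of iterations of S(v) = (1 - d) + d * sum_u w(u,v) / out(u) * S(u).
    for _ in range(iters):
        score = {
            v: (1 - damping)
            + damping * sum(weights[u][v] / out_sum[u] * score[u] for u in weights[v])
            for v in nodes
        }
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def cosine(a, b):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(target, books, top_n=2):
    """Exact nearest-neighbour search by cosine; an HNSW index would approximate this step."""
    scored = [(title, cosine(books[target], vec))
              for title, vec in books.items() if title != target]
    return sorted(scored, key=lambda kv: -kv[1])[:top_n]


# Hypothetical toy "books"; single tokens stand in for extracted phrases.
texts = {
    "A": "graph ranking keyword graph ranking".split(),
    "B": "graph ranking vector graph".split(),
    "C": "poetry translation form meaning".split(),
}
books = {title: dict(textrank_keywords(toks, top_k=10)) for title, toks in texts.items()}
print(recommend("A", books))
```

Book C shares no keywords with A, so its similarity is 0 and it ranks last, while B shares "graph" and "ranking" and ranks first. Replacing the brute-force loop with an HNSW index would change only the search step, trading exactness for roughly logarithmic query time.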
Introduction
Types of paper
Contributions falling into the following categories will be considered for publication: Review article (>10,000 words), Short review (~5,000 words), Feature article (>8,000 words), Research paper, Short communication (~3,000 words), Commentary (~1,000 words), and Viewpoint (~3,000 words). Please ensure that you select the appropriate article type from the list of options when making your submission. Authors contributing to special issues should ensure that they select the special-issue article type from this list.
Purpose: LDA-based topic identification and evolution analysis suffers from limitations such as the difficulty of choosing the number of topics and the subjectivity of time-window partitioning. This study proposes optimizations that address these limitations and thereby advance topic identification and evolution analysis methods. Method: The TF-IDF algorithm is combined with Word2Vec word vectors to compute topic vectors, which reduces the influence of common words on topic generation while giving the topic vectors a semantic representation. For topic evolution, a method is proposed that partitions time windows according to changes in topic semantic distance, and the evolution of topic intensity and topic content in the target domain is then tracked. Finally, an empirical study is conducted on research literature in the field of open-source software. Result: The proposed method effectively identifies the research topics and hot topics of the field, tracks the paths along which topics evolve over time, and presents them visually. Conclusion: Open-source software research has six key topics, of which “open-source governance” and “market competition” are the hot topics. In terms of topic content, research on open-source software is shifting from the autonomous motivation of individual, voluntary participation toward organizational participation by enterprises and governments.
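The abstract gives no formulas, so the following Python sketch only illustrates the two ideas under stated assumptions: a topic's vector in a given time period is the TF-IDF-weighted mean of its words' Word2Vec vectors, and a new time window opens whenever the cosine distance between consecutive periods exceeds a threshold. The 2-D embeddings, the period texts, the threshold of 0.3, and all function names are hypothetical; real vectors would come from a trained Word2Vec model (for example gensim's).

```python
import math
from collections import Counter


def tfidf(docs):
    """One {term: tf-idf} dict per document; tf = count / length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def weighted_vector(weights, embeddings):
    """TF-IDF-weighted mean of word vectors; terms with no vector or zero weight are skipped."""
    dim = len(next(iter(embeddings.values())))  # assumes a non-empty embedding table
    acc, total = [0.0] * dim, 0.0
    for term, w in weights.items():
        if w > 0 and term in embeddings:
            acc = [a + w * x for a, x in zip(acc, embeddings[term])]
            total += w
    return [a / total for a in acc] if total else acc


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def split_windows(vectors, threshold=0.3):
    """Group consecutive periods; open a new window when the drift exceeds `threshold`."""
    if not vectors:
        return []
    windows = [[0]]
    for i in range(1, len(vectors)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            windows.append([i])
        else:
            windows[-1].append(i)
    return windows


# Hypothetical 2-D vectors standing in for trained Word2Vec embeddings.
embeddings = {
    "developer": [1.0, 0.0], "volunteer": [0.9, 0.1],
    "firm": [0.0, 1.0], "government": [0.1, 0.9], "software": [0.5, 0.5],
}
# Hypothetical texts for one topic, one token list per time period.
periods = [
    ["software", "developer", "volunteer"],
    ["software", "volunteer", "developer", "developer"],
    ["software", "firm", "government"],
    ["software", "government", "firm", "firm"],
]
vectors = [weighted_vector(w, embeddings) for w in tfidf(periods)]
print(split_windows(vectors))
```

Because "software" occurs in every period, its IDF is log(4/4) = 0 and it contributes nothing, which is how TF-IDF weighting damps common words. The topic vectors drift sharply between the second and third periods, so the script prints [[0, 1], [2, 3]].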