Abstract
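To further promote the value of Silk Road cultural heritage, it is necessary to analyze and explore Silk Road cultural heritage data in depth. However, these data are currently multi-source and heterogeneous, spanning different sources and different modalities, which makes deep processing of multi-dimensional, massive data difficult. This paper first collects Silk Road information efficiently through a vertical search of Internet data; it then uses a support vector machine to classify text automatically, quickly and accurately; next, it applies text clustering to clean the data through de-duplication and denoising; finally, the most influential events are selected to form the Annual Report Cultural Heritage on the Silk Roads (《丝绸之路文化遗产年报》), which is released publicly worldwide. The paper offers experience and reference for the analysis and mining of Silk Road cultural heritage data.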
Silk, one of the important inventions of ancient China, carries rich cultural, technological and social connotations. The Silk Road, originally used to transport silk, opened the first large-scale trade exchange between East and West in world history. Since the Silk Road was successfully inscribed on the World Heritage List in 2014, its historical concept and practical significance have been further explored and expanded. To further promote the value of Silk Road cultural heritage and build a bridge for mutual learning between civilizations, it is necessary to analyze and explore Silk Road cultural heritage data in depth. However, the current data are multi-source and heterogeneous: they come from different sources (e.g., different countries, languages and platforms) and in different modalities (e.g., structured data in databases, and document reports, XML and other unstructured data), which makes deep processing of the multi-dimensional, massive data difficult. To achieve deep and efficient integration of Silk Road cultural heritage data, an intelligent mining method for multi-source heterogeneous data is studied. We first collect information about the Silk Road through a vertical search of Internet data. For the multi-source heterogeneous Silk Road data, the system collects coarse-grained information across the entire Internet with human-machine collaboration to ensure wide coverage. It supports massive data storage and full-text retrieval, with millisecond-level response times over millions of documents. Translation software is integrated into the capture process, again with human-machine collaboration, so that multilingual information can be collected. Then, we use a support vector machine to complete text classification automatically, quickly and accurately. Specifically, we use TF-IDF to extract the words that most effectively express the subject and content of a text, identify the full text of the information, and extract key elements. The text is represented as a sequence, the highest-weighted sentence is output as the abstract, and the support vector machine classifies the text content. Next, the data are cleaned by de-duplication and denoising using text clustering techniques. For redundancy removal, a similarity calculation method based on text clustering is proposed, which filters redundant data by setting a critical threshold. For denoising, outlier analysis is used to eliminate noisy data. Finally, influential events are selected to form the Annual Report Cultural Heritage on the Silk Roads for public release around the world. In this paper, a data acquisition system with high coverage and high efficiency is constructed, redundant and noisy data are removed by a multi-dimensional fusion data cleaning method, and automatic indexing, automatic abstracting and data classification methods are designed for multi-source heterogeneous Silk Road heritage data. It is found that using artificial intelligence data mining technology to study Silk Road cultural heritage data can effectively ensure the comprehensiveness, multi-dimensionality and efficiency of the data. The research results aim to publicize the value of Silk Road heritage, stimulate public attention to and interest in the Silk Road, and provide experience and reference for the analysis and mining of Silk Road cultural heritage data.
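As an illustration of the TF-IDF weighting and threshold-based redundancy filtering described in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not give the tokenizer, the similarity measure or the threshold value, so everything here is an assumption: the function names (`tfidf_vectors`, `top_keywords`, `deduplicate`), the smoothed IDF formula, cosine similarity as the measure, the 0.8 threshold and the sample documents are all hypothetical. It also substitutes a simple greedy pairwise comparison for the text clustering step the paper mentions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; real multilingual data needs more.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (TF times smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def top_keywords(vector, k=3):
    """The k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def deduplicate(docs, threshold=0.8):
    """Indices of documents kept after dropping near-duplicates.

    A document is dropped when its cosine similarity to an already
    kept document reaches the (hypothetical) threshold.
    """
    vectors = tfidf_vectors(docs)
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    sample = [
        "Silk Road heritage site added to the World Heritage List",
        "Silk Road heritage site added to the World Heritage List today",
        "New textile exhibition opens at the museum in Hangzhou",
    ]
    print(deduplicate(sample))  # [0, 2]: the second item is a near-duplicate
```

The threshold controls the trade-off: a lower value removes more loosely related reports, while a higher value keeps only near-verbatim copies out. A suitable value would have to be tuned on the actual corpus.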
Authors
杨寒淋
周娅鹃
赵丰
徐蓉
安薇竹
翁正秋
宁灵舰
金宇
YANG Hanlin; ZHOU Yajuan; ZHAO Feng; XU Rong; AN Weizhu; WENG Zhengqiu; NING Lingjian; JIN Yu (Department of International Exchanges, China National Silk Museum, Hangzhou 310002, China; College of Artificial Intelligence, Wenzhou Polytechnic, Wenzhou 325006, China; College of Textile Science and Engineering (International Institute of Silk), Zhejiang Sci-Tech University, Hangzhou 310018, China; Zhejiang Branch, Tongfang Knowledge Network Technology Co., Ltd. (Beijing), Hangzhou 310018, China)
Source
《丝绸》
CAS
CSCD
PKU Core Journals (北大核心)
2023, Issue 1, pp. 9-15 (7 pages)
Journal of Silk
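Funding
Zhejiang Provincial Cultural Relics Protection Science and Technology Project (2021002)
Wenzhou Polytechnic Special Research Project on Higher Vocational Education (WZYGJzd202101)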
Keywords
Silk Road (丝绸之路)
cultural heritage (文化遗产)
multi-source heterogeneity (多源异构)
vertical data search (数据垂直搜索)
support vector machine (支持向量机)
text clustering (文本聚类)
big data analysis (大数据分析)