Compressed data direct computing for Chinese dataset on DCU
Authors: Yani Liu, Feng Zhang, Zaifeng Pan, Xiaoguang Guo, Yihua Hu, Xiao Zhang, Xiaoyong Du. CCF Transactions on High Performance Computing, 2024, No. 2, pp. 206-220 (15 pages)
In the era of big data, information is growing at an explosive rate and shows a variety of characteristics. Accordingly, how to scientifically and efficiently manage and analyze massive amounts of data has become an urgent problem for technical enterprises and government departments. Among the modern techniques proposed to handle data at large scale, text analytics directly on compression (TADOC) stands out for its innovative idea of operating directly on compressed data, and it has substantial potential in various applications. Meanwhile, DCU (Deep Computing Unit), a new Chinese domestic accelerator with high acceleration performance, exhibits tremendous adaptability for porting TADOC. Therefore, this paper proposes D-TADOC, a compressed data direct computing technology for Chinese datasets on DCU, which can effectively process Chinese data without decompression and visualize the analytics results. There are three key components in D-TADOC. First, we incorporate a word segmentation tool into TADOC's data preprocessing module, enabling TADOC to analyze not only English but also Chinese texts. Second, we design the parallel processing module on the DCU architecture. Third, we develop the result visualization module, which supports user-friendly presentation of the text analytics outcomes. We conduct experiments with D-TADOC on Sugon’s cloud computing service platform using diverse public datasets and evaluate its performance. The experimental results show that D-TADOC achieves an average speedup of 40.5× compared with the TADOC baseline on the CPU, demonstrating the adaptability of DCU for TADOC tasks as well as the efficiency of D-TADOC on compressed text analytics.
Keywords: TADOC; Text analytics; DCU; Parallel computing
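
The idea of analyzing text without decompressing it can be illustrated with a minimal sketch. It assumes a grammar-based compressed form in which repeated phrases are factored into rules that reference each other, and that the Chinese text has already been segmented into words. The rule layout, the function name `word_count_on_grammar`, and the sample tokens are hypothetical; this is not D-TADOC's implementation and it does not model the DCU parallel module.

```python
from collections import Counter
from functools import lru_cache


def word_count_on_grammar(rules, root=0):
    """Count words in grammar-compressed text without expanding it.

    rules: dict mapping a rule id (int) to a list of symbols, where a str
    is a segmented word and an int refers to another rule (assumed format).
    """

    @lru_cache(maxsize=None)
    def count(rule_id):
        # Each rule body is scanned once; repeated references reuse the
        # cached result instead of re-reading the expanded text.
        total = Counter()
        for sym in rules[rule_id]:
            if isinstance(sym, int):
                total.update(count(sym))  # add the sub-rule's word counts
            else:
                total[sym] += 1  # a word that appears directly in this rule
        return total

    return count(root)


# Hypothetical compressed form of the segmented text
# "数据 压缩 是 数据 压缩" (data / compression / is / data / compression):
# rule 1 holds the repeated phrase, rule 0 is the root.
rules = {
    0: [1, "是", 1],
    1: ["数据", "压缩"],
}
counts = word_count_on_grammar(rules)
```

Because rule 1 is counted once and reused for both references, the work scales with the size of the compressed rules rather than with the length of the original text, which is the property that makes computing on compressed data attractive.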