Abstract
This paper designs an algorithm that restores single- or double-sided English text documents that have been shredded both horizontally and vertically. First, using two geometric features, the structural features of English letters and the blank spacing between text lines, we cluster the fragments that lay in the same text row of the original page. Next, we use an l1-norm (vector) difference model to splice the fragments within each cluster from left to right, so that each row is restored as a single horizontal strip. Finally, we splice all horizontal strips vertically into the complete page. In the numerical tests, we apply the algorithm to a problem from the 2013 China Undergraduate Mathematical Contest in Modeling and restore 209 single- and double-sided English fragments. The results show that the algorithm is simple to implement and clusters accurately, with a clustering accuracy above 93%.
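The abstract only outlines the three stages (row clustering, l1-norm splicing within a row, vertical splicing of strips). The Python sketch below illustrates them under simplifying assumptions that are ours, not the paper's: each fragment is a list of grayscale pixel rows (255 = white paper), the clustering step compares a plain ink/blank row profile as a stand-in for the paper's letter-structure and line-spacing features, and pieces are chained greedily by the l1 difference of their touching edges. All function names are hypothetical, and the leftmost fragment of each row (and the top strip of the page) is assumed to be known in advance, for example from its blank margin.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a fragment is a list of pixel rows,
# each pixel a grayscale value in 0..255 (255 = white paper, low = ink).
Fragment = List[List[int]]


def l1_horizontal_difference(left: Fragment, right: Fragment) -> int:
    """l1 norm of (right-most column of `left`) minus (left-most column of `right`)."""
    return sum(abs(a[-1] - b[0]) for a, b in zip(left, right))


def l1_vertical_difference(upper: Fragment, lower: Fragment) -> int:
    """l1 norm of (bottom pixel row of `upper`) minus (top pixel row of `lower`)."""
    return sum(abs(a - b) for a, b in zip(upper[-1], lower[0]))


def ink_profile(frag: Fragment, ink_threshold: int = 128) -> List[bool]:
    """Mark each pixel row as containing ink (True) or blank (False).
    Assumed stand-in for the paper's letter-structure / line-spacing features."""
    return [any(p < ink_threshold for p in row) for row in frag]


def cluster_by_profile(frags: Sequence[Fragment], tol: int = 0) -> List[List[int]]:
    """Group fragments whose ink/blank row patterns differ in at most `tol` rows."""
    clusters: List[List[int]] = []
    reps: List[List[bool]] = []  # profile of the first member of each cluster
    for i, frag in enumerate(frags):
        prof = ink_profile(frag)
        for members, rep in zip(clusters, reps):
            if sum(x != y for x, y in zip(prof, rep)) <= tol:
                members.append(i)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([i])
            reps.append(prof)
    return clusters


def greedy_chain(items: Sequence[Fragment], first: int,
                 cost: Callable[[Fragment, Fragment], int]) -> List[int]:
    """Starting from `first`, repeatedly append the unused item with the lowest cost."""
    order = [first]
    remaining = set(range(len(items))) - {first}
    while remaining:
        prev = items[order[-1]]
        best = min(remaining, key=lambda j: cost(prev, items[j]))
        order.append(best)
        remaining.remove(best)
    return order


def join_horizontally(frags: Sequence[Fragment], order: List[int]) -> Fragment:
    """Concatenate same-height fragments left to right in the given order."""
    height = len(frags[order[0]])
    return [[p for j in order for p in frags[j][r]] for r in range(height)]


# Usage on a synthetic 4x6 "image" whose pixel values vary smoothly left to right.
image = [[40 * c + r for c in range(6)] for r in range(4)]
pieces = [[row[k:k + 2] for row in image] for k in (0, 2, 4)]  # three 4x2 pieces
shuffled = [pieces[2], pieces[0], pieces[1]]
order = greedy_chain(shuffled, first=1, cost=l1_horizontal_difference)
print(order)  # [1, 2, 0]
assert join_horizontally(shuffled, order) == image
```

The same greedy routine handles the final stage by swapping in `l1_vertical_difference` and chaining the restored strips top to bottom. A greedy chain is only one simple way to use the l1 difference; it can go wrong when a blank edge matches several candidates equally well, which is one reason the clustering step matters.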
Source
Advances in Applied Mathematics (应用数学进展), 2016, No. 2, pp. 159-165 (7 pages)