Increased awareness of Tibetan cultural preservation, along with technological advancements, has led to significant efforts in academic research on Tibetan. However, the structural complexity of the Tibetan language and the scarcity of labelled handwriting data impede progress in Optical Character Recognition (OCR) and other applications. To address these challenges, this paper proposes an innovative Tibetan data augmentation technique that uses Generative Adversarial Networks (GANs) to synthesise arbitrary handwriting images in variable calligraphic styles based on inputs. Moreover, the method applies a Real-Fake Cross Inputs Strategy during training to enhance generation diversity and improve the model's ability to generate handwritten text beyond the training set and pre-defined corpus. The model was trained on three Tibetan handwriting datasets: Ume-style numerals, Uchen-style consonants, and Khyug-yig-style words. Experimental results show that the model generates realistic and recognisable Tibetan numeral and consonant handwriting, achieving Fréchet Inception Distance (FID) scores of 14.45 and 27.63, respectively. The method's effectiveness in augmenting OCR models is evidenced by a reduced OCR Word Error Rate (WER) on the augmented datasets.
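For readers unfamiliar with the WER metric cited above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcription and the OCR output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the abstract does not describe the paper's evaluation code, and Tibetan text would normally need its own segmentation step (for example on the tsheg mark) before such a comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are separated by whitespace. This is an illustrative
    helper, not the evaluation code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A lower value means the OCR output is closer to the reference, which is the sense in which the abstract reports a reduced WER after augmentation.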
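Frederick William Thomas was one of Britain's most prominent Tibetologists and an important scholar who systematically collated and published the Old Tibetan documents from Dunhuang. His two works, Tibetan Literary Texts and Documents Concerning Chinese Turkestan (《敦煌西域古藏文社会历史文献》) and Ancient Folk-Literature from North-Eastern Tibet (《东北藏古代民间文学》), represented the highest level of international research on the Old Tibetan texts from Dunhuang at the time. They have been praised as "masterworks to which he devoted a lifetime of talent for the benefit of the scholarly community", and they combine value as first-hand historical sources with pioneering research significance. As an outstanding representative of British Tibetology, he not only filled many gaps in the field through his research but also pioneered the systematic study of the Tibetan documents from Dunhuang. By integrating methods from philology, linguistics, history, and anthropology, he gave Tibetan studies a new paradigm and methodological foundation that continues to shape the discipline. Using documentary research, this paper examines Thomas's life, his scholarly output, and his main contributions to Tibetology, and explores how he built a distinctive system of Tibetan studies through the combined perspectives of texts, language, and culture, with lasting influence on later scholarship.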