期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
OCR-Assisted Masked BERT for Homoglyph Restoration towards Multiple Phishing Text Downstream Tasks
1
作者 Hanyong Lee Ye-Chan Park Jaesung Lee 《Computers, Materials & Continua》 2025年第12期4977-4993,共17页
Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing(NLP)systems,primarily due to ambiguities arising from characters that ap... Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing(NLP)systems,primarily due to ambiguities arising from characters that appear visually similar yet differ semantically.Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty in handling cases where one character maps to multiple candidates.To address these issues,we propose an Optical Character Recognition(OCR)-assisted masked Bidirectional Encoder Representations from Transformers(BERT)model specifically designed for homoglyph-perturbed text restoration.Our method integrates OCR preprocessing with a character-level BERT architecture,where OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents,significantly reducing multi-correspondence ambiguities.Subsequently,the character-level BERT leverages bidirectional contextual information to accurately resolve remaining ambiguities by predicting intended characters based on surrounding semantic cues.Extensive experiments conducted on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques,including OCR-based,dictionarybased,and traditional BERT-based approaches,achieving a word-level restoration accuracy of up to 99.59%in fine-tuned settings.Additionally,our approach exhibits robust performance in zero-shot scenarios and maintains effectiveness under low-resource conditions.Further evaluations across multiple downstream tasks,such as part-ofspeech tagging,chunking,toxic comment classification,and homoglyph detection under conditions of severe visual perturbation(up to 40%),confirm the method’s generalizability and applicability.Our proposed hybrid approach,combining OCR preprocessing with character-level contextual modeling,represents a scalable and practical solution for mitigating visually adversarial text attacks,thereby enhancing the security and reliability of NLP systems in real-world applications. 展开更多
关键词 homoglyph attack text restoration token-level correction text restoration character-level BERT OCR-assisted NLP
在线阅读 下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部