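Semantic Analysis and Modeling of Accident Causation in University Laboratories with Small Samples
1
Authors: 刘春祥, 吴欣悦, 黄萍, 余龙星, 范传刚. Source: 实验技术与管理 (PKU Core), 2026, No. 1, pp. 265-272 (8 pages)
To address the strong subjectivity and the time and labor costs of traditional accident causation analysis, this paper proposes an accident causation analysis method that performs text analysis informed by domain knowledge. Using 73 university accident reports as samples, the paper applies the Deepseek-reasoner model to structure the texts and extracts 30 key accident causation factors with the TF-IDF algorithm. A Fast Text model is trained with laboratory safety domain knowledge and used to vectorize the semantics, which effectively overcomes the semantic sparsity of small-sample accident text analysis. The causation factor vectors are then clustered with K-means and validated against the traditional 4M model. The K-means clusters are highly consistent with the 4M categories, which verifies the feasibility and effectiveness of the domain-informed Fast Text model under small-sample conditions. The study shows that the method can automatically identify, semantically interpret, and classify accident causation factors in university laboratories, overcomes sparse semantic information and insufficient domain adaptation in small-sample accident text analysis, and has the potential to extend to other fields.
Keywords: laboratory safety; accident analysis; text analysis; Deepseek model; Fast Text model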
AI-Generated Text Detection: A Comprehensive Review of Active and Passive Approaches
2
Authors: Lingyun Xiang, Nian Li, Yuling Liu, Jiayong Hu. Source: Computers, Materials & Continua, 2026, No. 3, pp. 201-229 (29 pages)
The rapid advancement of large language models (LLMs) has driven the pervasive adoption of AI-generated content (AIGC), while also raising concerns about misinformation, academic misconduct, biased or harmful content, and other risks. Detecting AI-generated text has thus become essential to safeguard the authenticity and reliability of digital information. This survey reviews recent progress in detection methods, categorizing approaches into passive and active categories based on their reliance on intrinsic textual features or embedded signals. Passive detection is further divided into surface linguistic feature-based and language model-based methods, whereas active detection encompasses watermarking-based and semantic retrieval-based approaches. This taxonomy enables systematic comparison of methodological differences in model dependency, applicability, and robustness. A key challenge for AI-generated text detection is that existing detectors are highly vulnerable to adversarial attacks, particularly paraphrasing, which substantially compromises their effectiveness. Addressing this gap highlights the need for future research on enhancing robustness and cross-domain generalization. By synthesizing current advances and limitations, this survey provides a structured reference for the field and outlines pathways toward more reliable and scalable detection solutions.
Keywords: AI-generated text detection; large language models; text classification; watermarking
A Dynamic Masking-Based Multi-Learning Framework for Sparse Classification
3
Authors: Woo Hyun Park, Dong Ryeol Shin. Source: Computers, Materials & Continua, 2026, No. 3, pp. 1365-1380 (16 pages)
With the recent increase in data volume and diversity, traditional text representation techniques are struggling to capture context, particularly in environments with sparse data. To address these challenges, this study proposes a new model, the Masked Joint Representation Model (MJRM). MJRM approximates the original hypothesis by leveraging multiple elements in a limited context. It dynamically adapts to changes in characteristics based on data distribution through three main components. First, masking-based representation learning, termed selective dynamic masking, integrates topic modeling and sentiment clustering to generate and train multiple instances across different data subsets, whose predictions are then aggregated with optimized weights. This design alleviates sparsity, suppresses noise, and preserves contextual structures. Second, regularization-based improvements are applied. Third, techniques for addressing sparse data are used to perform final inference. As a result, MJRM improves performance by up to 4% compared to existing AI techniques. In our experiments, we analyzed the contribution of each factor, demonstrating that masking, dynamic learning, and aggregating multiple instances complement each other to improve performance. This demonstrates that a masking-based multi-learning strategy is effective for context-aware sparse text classification, and can be useful even in challenging situations such as data shortage or data distribution variations. We expect that the approach can be extended to diverse fields such as sentiment analysis, spam filtering, and domain-specific document classification.
Keywords: text classification; dynamic learning; contextual features; data sparsity; masking-based representation
The Soul of Geography: Bojie Fu’s vision for science and humanity
4
Author: Michael E Meadows. Source: Geography and Sustainability, 2026, No. 1, pp. 253-254 (2 pages)
In an academic environment increasingly shaped by metrics and the imperatives of “publish or perish”, it is rare to encounter a leading scientist willing to interweave personal narrative with conceptual reflection. The Soul of Geography by Fu (2025) achieves precisely this. The book resists simple categorisation: it is neither a conventional monograph nor a memoir, but rather a hybrid text that integrates autobiography, disciplinary reflection, and scientific arguments. In doing so, Fu articulates not only the trajectory of his own career but also a vision of geography as a discipline of theoretical depth and practical relevance.
Keywords: scientific arguments; hybrid text; autobiography; geography; disciplinary reflection; publish perish; theoretical depth; metrics
Research on the Classification of Digital Cultural Texts Based on ASSC-TextRCNN Algorithm
5
Authors: Zixuan Guo, Houbin Wang, Sameer Kumar, Yuanfang Chen. Source: Computers, Materials & Continua, 2026, No. 3, pp. 2119-2145 (27 pages)
With the rapid development of digital culture, a large number of cultural texts are presented in digital and networked form. These texts are characterized by sparsity, real-time generation, and non-standard expression, which pose serious challenges to traditional classification methods. To cope with these problems, this paper proposes a new ASSC (ALBERT, SVD, Self-Attention and Cross-Entropy)-TextRCNN digital cultural text classification model. Based on the framework of TextRCNN, the ALBERT pre-trained language model is introduced to improve the depth and accuracy of semantic embedding. Combined with the dual attention mechanism, the model’s ability to capture and model potential key information in short texts is strengthened. Singular Value Decomposition (SVD) is used to replace the traditional max pooling operation, which effectively reduces the feature loss rate and retains more key semantic information. The cross-entropy loss function is used to optimize the prediction results, making the model more robust in class distribution learning. The experimental results indicate that, in the digital cultural text classification task, as compared to the baseline model, the proposed ASSC-TextRCNN method achieves an 11.85% relative improvement in accuracy and an 11.97% relative increase in the F1 score. Meanwhile, the relative error rate decreases by 53.18%. This achievement not only validates the effectiveness and advanced nature of the proposed approach but also offers a novel technical route and methodological underpinnings for the intelligent analysis and dissemination of digital cultural texts. It holds great significance for promoting the in-depth exploration and value realization of digital culture.
Keywords: text classification; natural language processing; TextRCNN model; ALBERT pre-training; singular value decomposition; cross-entropy loss function
The Continuation Task and the Model-as-Feedback Writing Task in L2 Writing Development: Timing of Model Texts
6
Author: Xiaoyan Zhang. Source: Chinese Journal of Applied Linguistics, 2026, No. 1, pp. 76-91, 160 (17 pages)
This study compares the relative efficacy of the continuation task and the model-as-feedback writing (MAFW) task in EFL writing development. Ninety intermediate-level Chinese EFL learners were randomly assigned to a continuation group, a MAFW group, and a control group, each with 30 learners. A pretest and a posttest were used to gauge L2 writing development. Results showed that the continuation task outperformed the MAFW task not only in enhancing the overall quality of L2 writing, but also in promoting the quality of three components of L2 writing, namely, content, organization, and language. The finding has important implications for L2 writing teaching and learning.
Keywords: continuation task; model-as-feedback writing task; L2 writing development; timing of model texts
Impact of texting and web surfing on driving behavior and safety in rural roads
7
Authors: Marios Sekadakis, Christos Katrakazas, Foteini Orfanou, Dimosthenis Pavlou, Maria Oikonomou, George Yannis. Source: International Journal of Transportation Science and Technology, 2023, No. 3, pp. 665-682 (18 pages)
The present study aims to investigate the impact of texting and web surfing on the driving behavior and safety of young drivers on rural roads. For this purpose, driving data were gathered through a driving simulator experiment with 37 young drivers. Additionally, a survey was conducted to collect their demographic characteristics and driving behavior preferences. During the experiment, the drivers were distracted using contemporary smartphone internet applications, i.e., Facebook Messenger, Facebook and Google Maps. Regression analysis models were developed in order to identify and investigate the effect of distraction on accident probability, speed deviation, headway distance, as well as lateral distance deviation. Additionally, random forest (RF), a machine learning classification algorithm, was deployed for real-time distraction prediction. It was revealed that distraction due to web surfing and texting leads to a statistically significant increase in accident probability, headway distance and lateral distance deviation by 32%, 27% and 6%, respectively. Moreover, the driving speed deviation was reduced by 47% during distraction. Apart from the real-time prediction, the RF revealed that headway distance, lateral distance, and traffic volume were important features. The RF outcomes were consistent with the regression analysis: drivers performing the distracting task are more defensive, driving at the edge of the road near the hard shoulder and maintaining longer headways. Overall, driving behavior and safety among young drivers were both significantly affected by the investigated internet applications.
Keywords: distraction; driving simulator; smartphone; web surfing; texting; road safety
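RNSQL: Text2SQL Generation Incorporating Reverse Normalization
8
Authors: 帖军, 范子琪, 孙翀, 郑禄, 朱柏尔. Source: 计算机应用与软件 (PKU Core), 2025, No. 9, pp. 31-37, 86 (8 pages)
Text2SQL is an important task in natural language processing research and plays a key role in intelligent question answering systems; its core task is to automatically convert a question described in natural language into an SQL query. Current research focuses on improving the matching accuracy of SQL clause tasks but neglects the syntactic correctness of the generated SQL, and SQL generation involving multi-table joins still contains many errors. This paper therefore proposes a neural-network-based Text2SQL method that restructures the database schema through reverse normalization (denormalization) and focuses on the syntactic correctness of the generated SQL; it is called the Reverse Normalization SQL network (RNSQL). Theoretical analysis and experiments on the public Spider dataset show that RNSQL effectively improves the quality of the Text2SQL task.
Keywords: reverse normalization; semantic parsing; Text2SQL; slot filling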
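Construction and Application of a "Dual Carbon" Policy Knowledge Graph Based on Natural Language Processing (Cited by: 1)
9
Authors: 吕涛, 王青山, 张紫玉, 吴昱磊, 周孜柔, 王洛. Source: 煤炭经济研究, 2025, No. 2, pp. 122-132 (11 pages)
"Dual carbon" policies are numerous, wide-ranging, and complex and diverse in content, and existing forms of presentation cannot meet the needs of knowledge retrieval and in-depth analysis. Using 2,953 "dual carbon" policy texts as the data source, the paper proposes a natural-language-processing-based method for constructing a "dual carbon" policy knowledge graph. It first builds the schema layer of the knowledge graph, defining the entities, attributes, and relations of "dual carbon" policies. It then extracts policy entities, attributes, and relations with algorithms such as Text Rank keyword extraction and LDA topic modeling to build the data layer. Finally, it stores the <entity, relation, entity> triples in a Neo4j graph database to form the "dual carbon" policy knowledge graph. The resulting knowledge graph contains 2,048 entity nodes and 32,336 relations. It supports linked queries and visualization of policy entities and relations at different granularities through the Cypher language, helps mine key semantic information and policy hotspots in "dual carbon" policies, and can provide semantic enhancement for intelligent services, improving the efficiency of "dual carbon" policy recommendation systems and the accuracy of policy question answering systems.
Keywords: "dual carbon" policy; knowledge graph; natural language processing; Neo4j; LDA; Text Rank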
Separate Source Channel Coding Is Still What You Need: An LLM-Based Rethinking (Cited by: 3)
10
Authors: REN Tianqi, LI Rongpeng, ZHAO Mingmin, CHEN Xianfu, LIU Guangyi, YANG Yang, ZHAO Zhifeng, ZHANG Honggang. Source: ZTE Communications, 2025, No. 1, pp. 30-44 (15 pages)
Along with the proliferating research interest in semantic communication (SemCom), joint source channel coding (JSCC) has dominated the attention due to the widely assumed existence in efficiently delivering information semantics. Nevertheless, this paper challenges the conventional JSCC paradigm and advocates for adopting separate source channel coding (SSCC) to enjoy a more underlying degree of freedom for optimization. We demonstrate that SSCC, after leveraging the strengths of the Large Language Model (LLM) for source coding and Error Correction Code Transformer (ECCT) complemented for channel coding, offers superior performance over JSCC. Our proposed framework also effectively highlights the compatibility challenges between SemCom approaches and digital communication systems, particularly concerning the resource costs associated with the transmission of high-precision floating point numbers. Through comprehensive evaluations, we establish that assisted by LLM-based compression and ECCT-enhanced error correction, SSCC remains a viable and effective solution for modern communication systems. In other words, separate source channel coding is still what we need.
Keywords: separate source channel coding (SSCC); joint source channel coding (JSCC); end-to-end communication system; Large Language Model (LLM); lossless text compression; Error Correction Code Transformer (ECCT)
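A Weibo Topic Clustering Analysis Method Based on Text2Vec_AE_KMeans
11
Authors: 万文桐, 黄润才. Source: 智能计算机与应用, 2025, No. 5, pp. 82-89 (8 pages)
Traditional topic clustering methods model Weibo texts with static word vectors, which cope poorly with the non-standard expressions and polysemy of Weibo text and thereby degrade clustering quality and topic description. To address this, the paper proposes a Weibo topic clustering analysis method based on Text2Vec_AE_KMeans that combines deep text feature extraction with clustering. First, a Text2Vec pre-trained model, designed on the MacBert pre-trained model and the CoSENT sentence modeling method, produces semantic representations of Weibo topic texts, remedying the shortcomings of static word vectors in text feature modeling. Next, an AutoEncoder dimensionality-reduction network with nonlinear activation functions reduces the high-dimensional nonlinear text features. Finally, the KMeans_C-TF-IDF algorithm performs clustering analysis on the Weibo texts, capturing topic distribution information at the cluster level. On a real Weibo topic dataset, the proposed method performs well on clustering evaluation metrics compared with traditional static word vector modeling, and the generated topic information is readily identifiable.
Keywords: topic clustering analysis; CoSENT; Text2Vec; autoencoder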
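Resilience of the Global Home Textile Industry: Heimtextil 2025 Reaches a Record Exhibition Scale (Cited by: 1)
12
Author: 钟梦夏. Source: 中国纺织, 2025, No. 1, pp. 96-97 (2 pages)
From January 14 to 17, Heimtextil 2025, the Frankfurt international trade fair for home and contract textiles (hereinafter "Heimtextil 2025"), was held at the Frankfurt exhibition center in Germany. The four-day fair brought together more than 3,000 exhibitors from 142 countries and regions and more than 50,000 visitors, and set new records for the number of exhibitors, the number of visitors, visitor satisfaction, and other figures.
Keywords: exhibition scale; home textile industry; Frankfurt exhibition; visitor satisfaction; TEXT; textiles; He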
From text to image: challenges in integrating vision into ChatGPT for medical image interpretation
13
Authors: Shunsuke Koga, Wei Du. Source: Neural Regeneration Research (SCIE, CAS), 2025, No. 2, pp. 487-488 (2 pages)
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Keywords: image; diagnosis; text
Detection and Recognition of Spray Code Numbers on Can Surfaces Based on OCR
14
Authors: Hailong Wang, Junchao Shi. Source: Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 1109-1128 (20 pages)
A two-stage algorithm based on deep learning for the detection and recognition of can bottom spray codes and numbers is proposed to address the problems of small character areas and fast production line speeds in can bottom spray code number recognition. In the coding number detection stage, Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the model detection effect. In terms of text recognition, using the Scene Visual Text Recognition coding number recognition network for end-to-end training can alleviate the problem of coding recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using the dataset of tank bottom spray code numbers collected on-site, and a transfer experiment was conducted using the dataset of packaging box production date. The experimental results show that the algorithm proposed in this study can effectively locate the coding of cans at different positions on the roller conveyor, and can accurately identify the coding numbers at high production line speeds. The Hmean value of the coding number detection is 97.32%, and the accuracy of the coding number recognition is 98.21%. This verifies that the algorithm proposed in this paper has high accuracy in coding number detection and recognition.
Keywords: can coding recognition; differentiable binarization network; scene visual text recognition; model pruning and quantification; transport model
A Deep Learning Framework for Arabic Cyberbullying Detection in Social Networks
15
Authors: Yahya Tashtoush, Areen Banysalim, Majdi Maabreh, Shorouq Al-Eidi, Ola Karajeh, Plamen Zahariev. Source: Computers, Materials & Continua, 2025, No. 5, pp. 3113-3134 (22 pages)
Social media has emerged as one of the most transformative developments on the internet, revolutionizing the way people communicate and interact. However, alongside its benefits, social media has also given rise to significant challenges, one of the most pressing being cyberbullying. This issue has become a major concern in modern society, particularly due to its profound negative impacts on the mental health and well-being of its victims. In the Arab world, where social media usage is exceptionally high, cyberbullying has become increasingly prevalent, necessitating urgent attention. Early detection of harmful online behavior is critical to fostering safer digital environments and mitigating the adverse effects of cyberbullying. This underscores the importance of developing advanced tools and systems to identify and address such behavior effectively. This paper investigates the development of a robust cyberbullying detection and classification system tailored for Arabic comments on YouTube. The study explores the effectiveness of various deep learning models, including Bi-LSTM (Bidirectional Long Short-Term Memory), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), and a hybrid CNN-LSTM, in classifying Arabic comments into binary classes (bullying or not) and multiclass categories. A comprehensive dataset of 20,000 Arabic YouTube comments was collected, preprocessed, and labeled to support these tasks. The results revealed that the CNN and hybrid CNN-LSTM models achieved the highest accuracy in binary classification, reaching an impressive 91.9%. For multiclass classification, the LSTM and Bi-LSTM models outperformed others, achieving an accuracy of 89.5%. These findings highlight the effectiveness of deep learning approaches in the mitigation of cyberbullying within Arabic online communities.
Keywords: Arabic text classification; Arabic text mining; cyberbullying detection; neural networks; deep learning; CNN; LSTM; YouTube; Bi-LSTM
Reflective thinking meets artificial intelligence: Synthesizing sustainability transition knowledge in left-behind mountain regions
16
Authors: Andrej Ficko, Simo Sarkki, Yasar Selman Gultekin, Antonia Egli, Juha Hiedanpää. Source: Geography and Sustainability, 2025, No. 1, pp. 159-169 (11 pages)
We demonstrate a multi-method approach towards discovering and structuring sustainability transition knowledge in marginalized mountain regions. By employing reflective thinking, artificial intelligence (AI)-powered text summarization and text mining, we synthesize experts’ narratives on sustainable development challenges and solutions in Kardüz Upland, Türkiye. We then analyze their alignment with the UN Sustainable Development Goals (SDGs) using document embedding. Investment in infrastructure, education, and resilient socio-ecological systems emerged as priority sectors to combat poor infrastructure, geographic isolation, climate change, poverty, depopulation, unemployment, low education levels, and inadequate social services. The narratives were closest in substance to SDG 1, 3, and 11. Social dimensions of sustainability were more pronounced than environmental dimensions. The presented approach supports policymakers in organizing loosely structured sustainability transition knowledge and fragmented data corpora, while also advancing AI applications for designing and planning sustainable development policies at the regional level.
Keywords: artificial intelligence; innovation; reflective thinking; scientific imagination; text mining; text summarization
Text Structured Algorithm of Lung Cancer Cases Based on Deep Learning
17
Authors: MI Linhui, YUAN Junyi, ZHOU Yankang, HOU Xumin. Source: Journal of Shanghai Jiaotong University (Science), 2025, No. 4, pp. 778-789 (12 pages)
Surgical site infections (SSIs) are the most common healthcare-related infections in patients with lung cancer. Constructing a lung cancer SSI risk prediction model requires the extraction of relevant risk factors from lung cancer case texts, which involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes a joint model, Multi-BGLC, around these two types of tasks, using bidirectional encoder representations from transformers (BERT) as the encoder and fine-tuning the decoder composed of graph convolutional neural network (GCNN) + long short-term memory (LSTM) + conditional random field (CRF) based on cancer case data. The GCNN is used for attribute discrimination, whereas the LSTM and CRF are used for attribute extraction. The experiment verified the effectiveness and accuracy of the model compared with other baseline models.
Keywords: text structuring; text classification; sequence labeling; data augmentation; lung cancer; electronic medical record
OCR-Assisted Masked BERT for Homoglyph Restoration towards Multiple Phishing Text Downstream Tasks
18
Authors: Hanyong Lee, Ye-Chan Park, Jaesung Lee. Source: Computers, Materials & Continua, 2025, No. 12, pp. 4977-4993 (17 pages)
Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing (NLP) systems, primarily due to ambiguities arising from characters that appear visually similar yet differ semantically. Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty in handling cases where one character maps to multiple candidates. To address these issues, we propose an Optical Character Recognition (OCR)-assisted masked Bidirectional Encoder Representations from Transformers (BERT) model specifically designed for homoglyph-perturbed text restoration. Our method integrates OCR preprocessing with a character-level BERT architecture, where OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents, significantly reducing multi-correspondence ambiguities. Subsequently, the character-level BERT leverages bidirectional contextual information to accurately resolve remaining ambiguities by predicting intended characters based on surrounding semantic cues. Extensive experiments conducted on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques, including OCR-based, dictionary-based, and traditional BERT-based approaches, achieving a word-level restoration accuracy of up to 99.59% in fine-tuned settings. Additionally, our approach exhibits robust performance in zero-shot scenarios and maintains effectiveness under low-resource conditions. Further evaluations across multiple downstream tasks, such as part-of-speech tagging, chunking, toxic comment classification, and homoglyph detection under conditions of severe visual perturbation (up to 40%), confirm the method’s generalizability and applicability. Our proposed hybrid approach, combining OCR preprocessing with character-level contextual modeling, represents a scalable and practical solution for mitigating visually adversarial text attacks, thereby enhancing the security and reliability of NLP systems in real-world applications.
Keywords: homoglyph attack; text restoration; token-level correction; character-level BERT; OCR-assisted NLP
Application of Legal Texts in the Migration from Analog to Digital Television in the Republic of Guinea
19
Authors: M’mahawa Bangoura, Alsény Bangoura, Mamadou Sanoussi Camara. Source: Journal of Energy and Power Engineering, 2025, No. 2, pp. 54-58 (5 pages)
The application of legal texts in the context of digital television is a process that relies on several normative instruments, ranging from international treaties, such as those of the ITU (International Telecommunications Union), to national regulations defining the obligations of audiovisual operators and the modalities of consumer support. Many countries have introduced specific laws and regulations to organize the gradual switch-off of analog broadcasting and encourage the adoption of new digital standards. Consequently, the digitization of Guinea’s broadcasting network cannot be carried out without taking into account the legal framework: allocation of resources and broadcasting players. Analog and digital broadcasting, according to regulatory texts, shows the relationships between the different communication management structures. As for digital broadcasting, we note the appearance of a new service, multiplex.
Keywords: application; texts; legal; migration; television; analog; digital; Republic; Guinea
Safeguarding a Treasure Trove: Sakya Monastery Preserves Relics and Ancient Texts
20
Author: Palden Nyima (Text/Photos). Source: China's Tibet, 2025, No. 6, pp. 36-39 (4 pages)
Since the launch of a digitization project for the protection and utilization of ancient texts in the Sakya Monastery of the Xizang Autonomous Region in 2012, significant efforts and achievements have been made in ancient text preservation.
Keywords: protection; utilization; ancient texts; digitization; Sakya Monastery; ancient text preservation; digitization project; Xizang Autonomous Region