The rapid advancement of large language models (LLMs) has driven the pervasive adoption of AI-generated content (AIGC), while also raising concerns about misinformation, academic misconduct, biased or harmful content, and other risks. Detecting AI-generated text has thus become essential to safeguard the authenticity and reliability of digital information. This survey reviews recent progress in detection methods, categorizing approaches as passive or active based on whether they rely on intrinsic textual features or on embedded signals. Passive detection is further divided into surface linguistic feature-based and language model-based methods, whereas active detection encompasses watermarking-based and semantic retrieval-based approaches. This taxonomy enables systematic comparison of methodological differences in model dependency, applicability, and robustness. A key challenge for AI-generated text detection is that existing detectors are highly vulnerable to adversarial attacks, particularly paraphrasing, which substantially compromises their effectiveness. Addressing this gap highlights the need for future research on enhancing robustness and cross-domain generalization. By synthesizing current advances and limitations, this survey provides a structured reference for the field and outlines pathways toward more reliable and scalable detection solutions.
With the recent increase in data volume and diversity, traditional text representation techniques are struggling to capture context, particularly in environments with sparse data. To address these challenges, this study proposes a new model, the Masked Joint Representation Model (MJRM). MJRM approximates the original hypothesis by leveraging multiple elements in a limited context. It dynamically adapts to changes in characteristics based on data distribution through three main components. First, masking-based representation learning, termed selective dynamic masking, integrates topic modeling and sentiment clustering to generate and train multiple instances across different data subsets, whose predictions are then aggregated with optimized weights. This design alleviates sparsity, suppresses noise, and preserves contextual structures. Second, regularization-based improvements are applied. Third, techniques for addressing sparse data are used to perform final inference. As a result, MJRM improves performance by up to 4% compared to existing AI techniques. In our experiments, we analyzed the contribution of each factor, demonstrating that masking, dynamic learning, and aggregating multiple instances complement each other to improve performance. This demonstrates that a masking-based multi-learning strategy is effective for context-aware sparse text classification and can be useful even in challenging situations such as data shortage or data distribution variations. We expect that the approach can be extended to diverse fields such as sentiment analysis, spam filtering, and domain-specific document classification.
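The aggregation step described above, combining the predictions of multiple trained instances with optimized weights, can be sketched as follows. This is a minimal illustration, not MJRM's actual implementation: the function names are ours, and the weights here stand in for whatever weight-optimization scheme the model uses.

```python
import numpy as np

def aggregate_predictions(probas, weights):
    """Weighted average of per-instance class-probability matrices.

    probas: list of (n_samples, n_classes) arrays, one per trained instance.
    weights: per-instance weights (e.g. validation scores), normalized here.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the combined output stays a distribution
    stacked = np.stack(probas)               # (n_instances, n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1)  # (n_samples, n_classes)

# Three instances disagree on one sample; the more heavily weighted one wins.
p1 = np.array([[0.9, 0.1]])
p2 = np.array([[0.2, 0.8]])
p3 = np.array([[0.3, 0.7]])
combined = aggregate_predictions([p1, p2, p3], weights=[0.5, 0.3, 0.2])
pred = combined.argmax(axis=1)  # class 0, since 0.57 > 0.43
```

Training the instances on different data subsets and then averaging in this way is what lets the ensemble smooth over sparsity and noise in any single subset.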
In an academic environment increasingly shaped by metrics and the imperatives of “publish or perish”, it is rare to encounter a leading scientist willing to interweave personal narrative with conceptual reflection. The Soul of Geography by Fu (2025) achieves precisely this. The book resists simple categorisation: it is neither a conventional monograph nor a memoir, but rather a hybrid text that integrates autobiography, disciplinary reflection, and scientific argument. In doing so, Fu articulates not only the trajectory of his own career but also a vision of geography as a discipline of theoretical depth and practical relevance.
With the rapid development of digital culture, a large number of cultural texts are presented in digital and networked forms. These texts exhibit significant characteristics such as sparsity, real-time generation, and non-standard expression, which pose serious challenges to traditional classification methods. To address these problems, this paper proposes a new ASSC (ALBERT, SVD, Self-Attention and Cross-Entropy)-TextRCNN digital cultural text classification model. Based on the TextRCNN framework, the ALBERT pre-trained language model is introduced to improve the depth and accuracy of semantic embedding. Combined with a dual attention mechanism, the model’s ability to capture and model potentially key information in short texts is strengthened. Singular Value Decomposition (SVD) is used to replace the traditional max pooling operation, which effectively reduces the feature loss rate and retains more key semantic information. The cross-entropy loss function is used to optimize the prediction results, making the model more robust in learning the class distribution. The experimental results indicate that, in the digital cultural text classification task, compared to the baseline model, the proposed ASSC-TextRCNN method achieves an 11.85% relative improvement in accuracy and an 11.97% relative increase in F1 score, while the relative error rate decreases by 53.18%. These results not only validate the effectiveness of the proposed approach but also offer a novel technical route and methodological underpinning for the intelligent analysis and dissemination of digital cultural texts, which is significant for the in-depth exploration and value realization of digital culture.
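The idea of replacing max pooling with an SVD-based pooling step can be sketched as below. This is our own minimal illustration of the general technique, under the assumption that pooling operates on a per-sequence feature matrix; the paper's exact formulation may differ.

```python
import numpy as np

def svd_pool(features, k=2):
    """Pool a (seq_len, hidden_dim) feature matrix via truncated SVD.

    Max pooling keeps only one value per dimension; here we instead keep
    the top-k right singular vectors scaled by their singular values,
    which preserves the dominant directions of the whole sequence.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # (k, hidden_dim) flattened into a fixed-size vector of length k*hidden_dim
    return (s[:k, None] * vt[:k]).reshape(-1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 8))  # e.g. 12 tokens, 8-dim features
pooled = svd_pool(feats, k=2)
print(pooled.shape)               # fixed-size output regardless of seq_len
```

Because the output length depends only on `k` and the hidden dimension, sequences of varying length still pool to a fixed-size vector, as max pooling would.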
This study compares the relative efficacy of the continuation task and the model-as-feedback writing (MAFW) task in EFL writing development. Ninety intermediate-level Chinese EFL learners were randomly assigned to a continuation group, a MAFW group, and a control group, each with 30 learners. A pretest and a posttest were used to gauge L2 writing development. Results showed that the continuation task outperformed the MAFW task not only in enhancing the overall quality of L2 writing, but also in promoting the quality of three components of L2 writing, namely content, organization, and language. The finding has important implications for L2 writing teaching and learning.
The present study aims to investigate the impact of texting and web surfing on the driving behavior and safety of young drivers on rural roads. For this purpose, driving data were gathered through a driving simulator experiment with 37 young drivers. Additionally, a survey was conducted to collect their demographic characteristics and driving behavior preferences. During the experiment, the drivers were distracted using contemporary smartphone internet applications, i.e., Facebook Messenger, Facebook, and Google Maps. Regression analysis models were developed to identify and investigate the effect of distraction on accident probability, speed deviation, headway distance, and lateral distance deviation. Additionally, random forest (RF), a machine learning classification algorithm, was deployed for real-time distraction prediction. It was revealed that distraction due to web surfing and texting leads to a statistically significant increase in accident probability, headway distance, and lateral distance deviation by 32%, 27%, and 6%, respectively. Moreover, driving speed deviation was reduced by 47% during distraction. Beyond real-time prediction, the RF revealed that headway distance, lateral distance, and traffic volume were important features. The RF outcomes were consistent with the regression analysis: during the distracting task, drivers were more defensive, driving at the edge of the road near the hard shoulder and maintaining longer headways. Overall, both driving behavior and safety among young drivers were significantly affected by the investigated internet applications.
Along with the proliferating research interest in semantic communication (SemCom), joint source channel coding (JSCC) has dominated attention, owing to its widely assumed superiority in efficiently delivering information semantics. Nevertheless, this paper challenges the conventional JSCC paradigm and advocates adopting separate source channel coding (SSCC) to gain a more fundamental degree of freedom for optimization. We demonstrate that SSCC, leveraging the strengths of a Large Language Model (LLM) for source coding complemented by the Error Correction Code Transformer (ECCT) for channel coding, offers superior performance over JSCC. Our proposed framework also effectively highlights the compatibility challenges between SemCom approaches and digital communication systems, particularly concerning the resource costs associated with transmitting high-precision floating-point numbers. Through comprehensive evaluations, we establish that, assisted by LLM-based compression and ECCT-enhanced error correction, SSCC remains a viable and effective solution for modern communication systems. In other words, separate source channel coding is still what we need.
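The separation the abstract argues for can be sketched as a pipeline of two independent stages. This is a toy stand-in, not the paper's system: `zlib` substitutes for LLM-based source coding and a byte-level repetition code substitutes for ECCT channel coding, purely to show that the two stages compose and can be optimized independently.

```python
import zlib

def source_encode(text: str) -> bytes:
    # Stand-in for LLM-based compression: any strong source coder fits here.
    return zlib.compress(text.encode("utf-8"), level=9)

def channel_encode(payload: bytes, r: int = 3) -> bytes:
    # Stand-in for ECCT: a trivial r-fold repetition code per byte.
    return bytes(b for b in payload for _ in range(r))

def channel_decode(coded: bytes, r: int = 3) -> bytes:
    # Majority vote over each group of r repeated bytes.
    out = []
    for i in range(0, len(coded), r):
        group = coded[i:i + r]
        out.append(max(set(group), key=group.count))
    return bytes(out)

def source_decode(payload: bytes) -> str:
    return zlib.decompress(payload).decode("utf-8")

msg = "separate source channel coding is still what we need"
coded = channel_encode(source_encode(msg))
corrupted = bytearray(coded)
corrupted[4] ^= 0xFF  # flip one transmitted byte; majority vote recovers it
restored = source_decode(channel_decode(bytes(corrupted)))
```

The point of the sketch is structural: because the stages are separate, the source coder can be swapped for a stronger one (an LLM-based compressor) and the channel coder for a stronger one (ECCT) without redesigning the other, which is the degree of freedom the paper exploits.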
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practice. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
A two-stage deep learning algorithm for the detection and recognition of can bottom spray codes is proposed to address the problems of small character areas and fast production line speeds in can bottom spray code number recognition. In the code number detection stage, a Differentiable Binarization Network is used as the backbone, combined with an Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance detection. For text recognition, end-to-end training with the Scene Visual Text Recognition network alleviates code recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using a dataset of can bottom spray code numbers collected on-site, and a transfer experiment was conducted using a dataset of packaging-box production dates. The experimental results show that the proposed algorithm can effectively locate the codes of cans at different positions on the roller conveyor and can accurately identify the code numbers at high production line speeds. The Hmean of code number detection is 97.32%, and the accuracy of code number recognition is 98.21%, verifying that the proposed algorithm achieves high accuracy in code number detection and recognition.
Social media has emerged as one of the most transformative developments on the internet, revolutionizing the way people communicate and interact. However, alongside its benefits, social media has also given rise to significant challenges, one of the most pressing being cyberbullying. This issue has become a major concern in modern society, particularly due to its profound negative impacts on the mental health and well-being of its victims. In the Arab world, where social media usage is exceptionally high, cyberbullying has become increasingly prevalent, necessitating urgent attention. Early detection of harmful online behavior is critical to fostering safer digital environments and mitigating the adverse effects of cyberbullying, underscoring the importance of developing advanced tools and systems to identify and address such behavior effectively. This paper investigates the development of a robust cyberbullying detection and classification system tailored for Arabic comments on YouTube. The study explores the effectiveness of various deep learning models, including Bi-LSTM (Bidirectional Long Short-Term Memory), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), and a hybrid CNN-LSTM, in classifying Arabic comments into binary classes (bullying or not) and multiclass categories. A comprehensive dataset of 20,000 Arabic YouTube comments was collected, preprocessed, and labeled to support these tasks. The results revealed that the CNN and hybrid CNN-LSTM models achieved the highest accuracy in binary classification, reaching an impressive 91.9%. For multiclass classification, the LSTM and Bi-LSTM models outperformed the others, achieving an accuracy of 89.5%. These findings highlight the effectiveness of deep learning approaches in mitigating cyberbullying within Arabic online communities.
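A preprocessing step for Arabic comments like the one described above typically normalizes orthographic variants before tokenization. The sketch below shows common Arabic normalizations (strip diacritics, remove tatweel, unify alef variants); it is illustrative of standard practice, not the paper's actual pipeline.

```python
import re

# Short-vowel and related diacritic marks (fathatan ... sukun)
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)          # strip diacritics
    text = text.replace("\u0640", "")        # remove tatweel (kashida)
    for alef in ("\u0623", "\u0625", "\u0622"):
        text = text.replace(alef, "\u0627")  # unify alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    return re.sub(r"\s+", " ", text).strip()

print(normalize_arabic("أَهلاً"))  # diacritics dropped, alef unified -> "اهلا"
```

Normalization like this shrinks the vocabulary the CNN/LSTM models must learn, which matters for a 20,000-comment dataset where many surface forms would otherwise appear only once.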
We demonstrate a multi-method approach to discovering and structuring sustainability transition knowledge in marginalized mountain regions. By employing reflective thinking, artificial intelligence (AI)-powered text summarization, and text mining, we synthesize experts’ narratives on sustainable development challenges and solutions in Kardüz Upland, Türkiye. We then analyze their alignment with the UN Sustainable Development Goals (SDGs) using document embedding. Investment in infrastructure, education, and resilient socio-ecological systems emerged as priority sectors to combat poor infrastructure, geographic isolation, climate change, poverty, depopulation, unemployment, low education levels, and inadequate social services. The narratives were closest in substance to SDGs 1, 3, and 11. Social dimensions of sustainability were more pronounced than environmental dimensions. The presented approach supports policymakers in organizing loosely structured sustainability transition knowledge and fragmented data corpora, while also advancing AI applications for designing and planning sustainable development policies at the regional level.
Surgical site infections (SSIs) are the most common healthcare-related infections in patients with lung cancer. Constructing a lung cancer SSI risk prediction model requires the extraction of relevant risk factors from lung cancer case texts, which involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes a joint model, Multi-BGLC, for these two tasks, using bidirectional encoder representations from transformers (BERT) as the encoder and fine-tuning a decoder composed of a graph convolutional neural network (GCNN), long short-term memory (LSTM), and conditional random field (CRF) on cancer case data. The GCNN is used for attribute discrimination, whereas the LSTM and CRF are used for attribute extraction. Experiments verified the effectiveness and accuracy of the model compared with other baseline models.
Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing (NLP) systems, primarily due to ambiguities arising from characters that appear visually similar yet differ semantically. Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty handling cases where one character maps to multiple candidates. To address these issues, we propose an Optical Character Recognition (OCR)-assisted masked Bidirectional Encoder Representations from Transformers (BERT) model specifically designed for homoglyph-perturbed text restoration. Our method integrates OCR preprocessing with a character-level BERT architecture: OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents, significantly reducing multi-correspondence ambiguities, and the character-level BERT then leverages bidirectional contextual information to resolve the remaining ambiguities by predicting the intended characters from surrounding semantic cues. Extensive experiments on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques, including OCR-based, dictionary-based, and traditional BERT-based approaches, achieving a word-level restoration accuracy of up to 99.59% in fine-tuned settings. Additionally, our approach exhibits robust performance in zero-shot scenarios and remains effective under low-resource conditions. Further evaluations across multiple downstream tasks, such as part-of-speech tagging, chunking, toxic comment classification, and homoglyph detection under severe visual perturbation (up to 40%), confirm the method’s generalizability and applicability. The proposed hybrid approach, combining OCR preprocessing with character-level contextual modeling, represents a scalable and practical solution for mitigating visually adversarial text attacks, thereby enhancing the security and reliability of NLP systems in real-world applications.
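The first stage described above, mapping visually confusable characters to approximate alphabetic equivalents, can be sketched with a small lookup table. The entries below are illustrative examples of common homoglyph substitutions, not the paper's actual mapping, and the digit entries show exactly the multi-candidate ambiguity (is "1" a digit or a disguised "l"?) that the contextual BERT stage is needed to resolve.

```python
# Illustrative homoglyph-to-ASCII mapping (not the paper's table).
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043E": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "0": "o",       # digit zero often substitutes the letter o
    "1": "l",       # digit one often substitutes the letter l
}

def normalize_homoglyphs(text: str) -> str:
    """Replace each known confusable character with its ASCII approximation."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

perturbed = "p\u0430yp\u0430l l\u043Egin"  # Latin text with Cyrillic lookalikes
print(normalize_homoglyphs(perturbed))     # -> "paypal login"
```

A blanket table like this deliberately over-normalizes (it would rewrite legitimate digits too), which is precisely why the method follows it with character-level contextual prediction rather than stopping at the lookup.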
The application of legal texts in the context of digital television relies on several normative instruments, ranging from international treaties, such as those of the ITU (International Telecommunication Union), to national regulations defining the obligations of audiovisual operators and the modalities of consumer support. Many countries have introduced specific laws and regulations to organize the gradual switch-off of analog broadcasting and encourage the adoption of new digital standards. Consequently, the digitization of Guinea’s broadcasting network cannot be carried out without taking into account the legal framework governing the allocation of resources and the broadcasting players. The regulatory texts on analog and digital broadcasting reveal the relationships between the different communication management structures, and with digital broadcasting a new service has appeared: the multiplex.
Since the launch of a digitization project for the protection and utilization of ancient texts in the Sakya Monastery of the Xizang Autonomous Region in 2012, significant efforts and achievements have been made in ancient text preservation.
Funding (survey on AI-generated text detection): supported in part by the Science and Technology Innovation Program of Hunan Province under Grant 2025RC3166, the National Natural Science Foundation of China under Grant 62572176, and the National Key R&D Program of China under Grant 2024YFF0618800.
Funding (Masked Joint Representation Model study): supported by SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).
Funding (ASSC-TextRCNN study): funded by the China National Innovation and Entrepreneurship Project Fund Innovation Training Program (202410451009).
Funding (separate source channel coding study): supported in part by the National Key Research and Development Program of China under Grant No. 2024YFE0200600, the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR23F010005, and the Huawei Cooperation Project under Grant No. TC20240829036.
Funding: Financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, Project No. BG-RRP-2.013-0001-C01.
Abstract: Social media has emerged as one of the most transformative developments on the internet, revolutionizing the way people communicate and interact. However, alongside its benefits, social media has also given rise to significant challenges, one of the most pressing being cyberbullying. This issue has become a major concern in modern society, particularly due to its profound negative impacts on the mental health and well-being of its victims. In the Arab world, where social media usage is exceptionally high, cyberbullying has become increasingly prevalent, necessitating urgent attention. Early detection of harmful online behavior is critical to fostering safer digital environments and mitigating the adverse effects of cyberbullying. This underscores the importance of developing advanced tools and systems to identify and address such behavior effectively. This paper investigates the development of a robust cyberbullying detection and classification system tailored for Arabic comments on YouTube. The study explores the effectiveness of various deep learning models, including Bi-LSTM (Bidirectional Long Short-Term Memory), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), and a hybrid CNN-LSTM, in classifying Arabic comments into binary classes (bullying or not) and multiclass categories. A comprehensive dataset of 20,000 Arabic YouTube comments was collected, preprocessed, and labeled to support these tasks. The results revealed that the CNN and hybrid CNN-LSTM models achieved the highest accuracy in binary classification, reaching an impressive 91.9%. For multiclass classification, the LSTM and Bi-LSTM models outperformed others, achieving an accuracy of 89.5%. These findings highlight the effectiveness of deep learning approaches in the mitigation of cyberbullying within Arabic online communities.
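The binary classification task described above (bullying or not) can be illustrated with a deliberately simple baseline. The lexicon-scoring classifier below is a hypothetical stand-in for the CNN and CNN-LSTM models in the study, sketching only the input-to-label pipeline; the word list and threshold are illustrative, not from the paper.

```python
# Illustrative sketch only: a keyword-scoring baseline for the binary
# bullying / not-bullying decision. A trained neural model would
# replace the scoring function; the lexicon here is hypothetical.

ABUSIVE_TERMS = {"idiot", "stupid", "loser"}  # toy lexicon

def tokenize(comment):
    # Real preprocessing for Arabic text would include normalization
    # and diacritic removal; whitespace split suffices for the sketch.
    return comment.lower().split()

def classify(comment, threshold=1):
    hits = sum(1 for tok in tokenize(comment) if tok in ABUSIVE_TERMS)
    return "bullying" if hits >= threshold else "not bullying"
```

A neural classifier would learn this decision boundary from the labeled 20,000-comment dataset rather than from a fixed lexicon, but the end-to-end interface (comment in, label out) is the same.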
Funding: Work conducted under COST Action CA21125, a European forum for revitalisation of marginalised mountain areas (MARGISTAR), supported by COST (European Cooperation in Science and Technology). The authors gratefully acknowledge the support received for the research from the University of Ljubljana's research program Forest, forestry and renewable forest resources (P4-0059).
Abstract: We demonstrate a multi-method approach towards discovering and structuring sustainability transition knowledge in marginalized mountain regions. By employing reflective thinking, artificial intelligence (AI)-powered text summarization, and text mining, we synthesize experts' narratives on sustainable development challenges and solutions in Kardüz Upland, Türkiye. We then analyze their alignment with the UN Sustainable Development Goals (SDGs) using document embedding. Investment in infrastructure, education, and resilient socio-ecological systems emerged as priority sectors to combat poor infrastructure, geographic isolation, climate change, poverty, depopulation, unemployment, low education levels, and inadequate social services. The narratives were closest in substance to SDGs 1, 3, and 11. Social dimensions of sustainability were more pronounced than environmental dimensions. The presented approach supports policymakers in organizing loosely structured sustainability transition knowledge and fragmented data corpora, while also advancing AI applications for designing and planning sustainable development policies at the regional level.
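The SDG-alignment step described above can be sketched as a nearest-neighbor search in embedding space. The snippet below uses bag-of-words vectors and cosine similarity as a stand-in for the paper's document embeddings, and the SDG texts are abbreviated placeholders rather than the official goal descriptions.

```python
# Minimal sketch of narrative-to-SDG alignment: embed each text as a
# term-count vector and rank SDGs by cosine similarity. Real document
# embeddings would replace the Counter-based vectors.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_sdg(narrative: str, sdg_texts: dict) -> str:
    vec = Counter(narrative.lower().split())
    return max(sdg_texts, key=lambda k: cosine(vec, Counter(sdg_texts[k].lower().split())))

SDG_TEXTS = {  # abbreviated placeholder descriptions
    "SDG 1": "end poverty in all its forms everywhere",
    "SDG 13": "take urgent action to combat climate change",
}
```

Dense embeddings (e.g. sentence encoders) would capture semantic rather than purely lexical overlap, which matters for expert narratives that rarely reuse SDG wording verbatim.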
Funding: The Special Project of the Shanghai Municipal Commission of Economy and Information Technology for Promoting High-Quality Industrial Development (No. 2024-GZL-RGZN-02011), the Shanghai City Digital Transformation Project (No. 202301002), and the Project of Shanghai Shenkang Hospital Development Center (No. SHDC22023214).
Abstract: Surgical site infections (SSIs) are the most common healthcare-related infections in patients with lung cancer. Constructing a lung cancer SSI risk prediction model requires the extraction of relevant risk factors from lung cancer case texts, which involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes a joint model, Multi-BGLC, for these two types of tasks, using bidirectional encoder representations from transformers (BERT) as the encoder and fine-tuning a decoder composed of a graph convolutional neural network (GCNN) + long short-term memory (LSTM) + conditional random field (CRF) on cancer case data. The GCNN is used for attribute discrimination, whereas the LSTM and CRF are used for attribute extraction. Experiments verified the effectiveness and accuracy of the model compared with other baseline models.
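The CRF layer used for attribute extraction is typically decoded with the Viterbi algorithm over BIO tags. The sketch below shows that decoding step in isolation; the emission scores and transition weights are hypothetical inputs, where in Multi-BGLC they would come from the BERT and LSTM layers.

```python
# Sketch of CRF decoding for attribute extraction: Viterbi search over
# BIO tags given per-token emission scores and tag-transition scores.
# Scores here are hypothetical; a trained model would supply them.

TAGS = ["O", "B", "I"]

def viterbi(emissions, transitions):
    # emissions: list of {tag: score}, one dict per token
    # transitions: {(prev_tag, cur_tag): score}, missing pairs score 0
    best = {t: (emissions[0][t], [t]) for t in TAGS}
    for step in emissions[1:]:
        layer = {}
        for cur in TAGS:
            prev, (score, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + transitions.get((kv[0], cur), 0.0),
            )
            total = score + transitions.get((prev, cur), 0.0) + step[cur]
            layer[cur] = (total, path + [cur])
        best = layer
    return max(best.values(), key=lambda sp: sp[0])[1]
```

Penalizing illegal transitions such as O→I is what lets the CRF enforce well-formed attribute spans that a token-independent classifier cannot guarantee.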
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [RS-2021-II211341, Artificial Intelligence Graduate School Program (Chung-Ang University)] and by the Chung-Ang University Graduate Research Scholarship in 2024.
Abstract: Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing (NLP) systems, primarily due to ambiguities arising from characters that appear visually similar yet differ semantically. Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty in handling cases where one character maps to multiple candidates. To address these issues, we propose an Optical Character Recognition (OCR)-assisted masked Bidirectional Encoder Representations from Transformers (BERT) model specifically designed for homoglyph-perturbed text restoration. Our method integrates OCR preprocessing with a character-level BERT architecture, where OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents, significantly reducing multi-correspondence ambiguities. Subsequently, the character-level BERT leverages bidirectional contextual information to accurately resolve remaining ambiguities by predicting intended characters based on surrounding semantic cues. Extensive experiments conducted on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques, including OCR-based, dictionary-based, and traditional BERT-based approaches, achieving a word-level restoration accuracy of up to 99.59% in fine-tuned settings. Additionally, our approach exhibits robust performance in zero-shot scenarios and maintains effectiveness under low-resource conditions. Further evaluations across multiple downstream tasks, such as part-of-speech tagging, chunking, toxic comment classification, and homoglyph detection under conditions of severe visual perturbation (up to 40%), confirm the method's generalizability and applicability. Our proposed hybrid approach, combining OCR preprocessing with character-level contextual modeling, represents a scalable and practical solution for mitigating visually adversarial text attacks, thereby enhancing the security and reliability of NLP systems in real-world applications.
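The OCR-style preprocessing stage described in this abstract amounts to mapping visually confusable characters to approximate alphabetic equivalents before contextual restoration. The snippet below sketches that stage with a small illustrative mapping table; it is a tiny subset for demonstration, not the paper's confusable set, and the remaining ambiguities would be resolved by the character-level BERT.

```python
# Sketch of homoglyph normalization: replace visually confusable
# characters with approximate Latin equivalents. The table is a small
# illustrative subset of common substitutions.

HOMOGLYPHS = {
    "0": "o", "1": "l", "3": "e", "@": "a", "$": "s",
    "\u0430": "a",  # Cyrillic а -> Latin a
    "\u0435": "e",  # Cyrillic е -> Latin e
}

def normalize(text: str) -> str:
    # One-to-one substitution; one-to-many cases (e.g. "1" as "l" or
    # "i") are exactly what the contextual model must disambiguate.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
```

Normalization alone cannot recover the intended character when several candidates are plausible, which is why the paper pairs it with bidirectional contextual prediction.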
Abstract: The application of legal texts in the context of digital television relies on several normative instruments, ranging from international treaties, such as those of the ITU (International Telecommunication Union), to national regulations defining the obligations of audiovisual operators and the modalities of consumer support. Many countries have introduced specific laws and regulations to organize the gradual switch-off of analog broadcasting and encourage the adoption of new digital standards. Consequently, the digitization of Guinea's broadcasting network cannot be carried out without taking into account the legal framework governing resource allocation and broadcasting actors. Regulatory texts on analog and digital broadcasting reveal the relationships between the different communication management structures. Digital broadcasting also introduces a new service: the multiplex.
Abstract: Since the launch of a digitization project for the protection and utilization of ancient texts in the Sakya Monastery of the Xizang Autonomous Region in 2012, significant efforts and achievements have been made in ancient text preservation.