Abstract: This study compares the relative efficacy of the continuation task and the model-as-feedback writing (MAFW) task in EFL writing development. Ninety intermediate-level Chinese EFL learners were randomly assigned to a continuation group, a MAFW group, and a control group, each with 30 learners. A pretest and a posttest were used to gauge L2 writing development. Results showed that the continuation task outperformed the MAFW task not only in enhancing the overall quality of L2 writing, but also in promoting the quality of three components of L2 writing, namely, content, organization, and language. The finding has important implications for L2 writing teaching and learning.
Abstract: Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs' current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Abstract: Aiming at the problems of incomplete characterization of text relations, poor guidance of potential representations, and low quality of model generation in the field of controllable long text generation, this paper proposes a new GSPT-CVAE model (Graph Structured Processing, Single Vector, and Potential Attention Computing Transformer-Based Conditioned Variational Autoencoder model). The model obtains a more comprehensive representation of textual relations by graph-structured processing of the input text, and at the same time obtains a single vector representation by weighted merging of the vector sequences after graph-structured processing to get an effective potential representation. In the process of potential representation guiding text generation, the model adopts a combination of traditional embedding and potential attention calculation to give full play to the guiding role of potential representation for generating text, to improve the controllability and effectiveness of text generation. The experimental results show that the model has excellent representation learning ability and can learn rich and useful textual relationship representations. The model also achieves satisfactory results in the effectiveness and controllability of text generation and can generate long texts that match the given constraints. The ROUGE-1 F1 score of this model is 0.243, the ROUGE-2 F1 score is 0.041, the ROUGE-L F1 score is 0.22, and the PPL-Word score is 34.303, which gives the GSPT-CVAE model a certain advantage over the baseline model. Meanwhile, this paper compares this model with the state-of-the-art generative models T5, GPT-4, Llama2, and so on, and the experimental results show that the GSPT-CVAE model has a certain competitiveness.
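For readers unfamiliar with the ROUGE-N F1 figures quoted above, the following is a minimal sketch of how such a score can be computed from n-gram overlap. It is an illustration only, not the evaluation code used in the paper: the whitespace tokenization, the lowercasing, and the function names `ngrams` and `rouge_n_f1` are all assumptions made here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two strings, using naive whitespace tokenization.

    Assumption: real evaluations usually apply more careful tokenization
    and sometimes stemming; this sketch does neither.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(rouge_n_f1(generated, gold, n=1))
    print(rouge_n_f1(generated, gold, n=2))
```

ROUGE-L, also reported in the abstract, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.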
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [RS-2021-II211341, Artificial Intelligence Graduate School Program (Chung-Ang University)] and by the Chung-Ang University Graduate Research Scholarship in 2024.
Abstract: Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing (NLP) systems, primarily due to ambiguities arising from characters that appear visually similar yet differ semantically. Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty in handling cases where one character maps to multiple candidates. To address these issues, we propose an Optical Character Recognition (OCR)-assisted masked Bidirectional Encoder Representations from Transformers (BERT) model specifically designed for homoglyph-perturbed text restoration. Our method integrates OCR preprocessing with a character-level BERT architecture, where OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents, significantly reducing multi-correspondence ambiguities. Subsequently, the character-level BERT leverages bidirectional contextual information to accurately resolve remaining ambiguities by predicting intended characters based on surrounding semantic cues. Extensive experiments conducted on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques, including OCR-based, dictionary-based, and traditional BERT-based approaches, achieving a word-level restoration accuracy of up to 99.59% in fine-tuned settings. Additionally, our approach exhibits robust performance in zero-shot scenarios and maintains effectiveness under low-resource conditions. Further evaluations across multiple downstream tasks, such as part-of-speech tagging, chunking, toxic comment classification, and homoglyph detection under conditions of severe visual perturbation (up to 40%), confirm the method's generalizability and applicability. Our proposed hybrid approach, combining OCR preprocessing with character-level contextual modeling, represents a scalable and practical solution for mitigating visually adversarial text attacks, thereby enhancing the security and reliability of NLP systems in real-world applications.
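The one-to-many problem the abstract describes (a single perturbed glyph mapping to several candidate letters) can be made concrete with a small sketch. This is not the authors' method: a hand-written lookup table stands in for the OCR step, and a vocabulary lookup stands in for the character-level BERT that resolves ambiguity from context. The table `HOMOGLYPHS`, the word list `VOCAB`, and the functions `candidates` and `restore` are hypothetical names introduced only for this illustration.

```python
# Hypothetical toy table: perturbed glyph -> candidate ASCII letters.
# (Assumption: a stand-in for the OCR preprocessing step in the abstract.)
HOMOGLYPHS = {
    "\u0430": ["a"],   # Cyrillic small a
    "\u043e": ["o"],   # Cyrillic small o
    "0": ["o"],        # digit zero
    "1": ["l", "i"],   # digit one: a one-to-many case
}

# Hypothetical vocabulary standing in for contextual disambiguation.
VOCAB = {"paypal", "login", "account"}


def candidates(word):
    """Enumerate every restoration of `word` allowed by the glyph table."""
    options = [""]
    for ch in word:
        subs = HOMOGLYPHS.get(ch, [ch])
        options = [prefix + s for prefix in options for s in subs]
    return options


def restore(word, vocab=VOCAB):
    """Return the first candidate found in the vocabulary, else the input."""
    for cand in candidates(word.lower()):
        if cand in vocab:
            return cand
    return word


if __name__ == "__main__":
    print(restore("p\u0430yp\u04301"))
    print(restore("1og1n"))
```

Candidate enumeration grows exponentially with the number of ambiguous characters, and a fixed word list cannot use sentence context; these are limitations of this toy stand-in, which a contextual character-level model is meant to avoid.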
Funding: The Special Project of the Shanghai Municipal Commission of Economy and Information Technology for Promoting High-Quality Industrial Development (No. 2024-GZL-RGZN-02011); the Shanghai City Digital Transformation Project (No. 202301002); the Project of Shanghai Shenkang Hospital Development Center (No. SHDC22023214).
Abstract: Surgical site infections (SSIs) are the most common healthcare-related infections in patients with lung cancer. Constructing a lung cancer SSI risk prediction model requires the extraction of relevant risk factors from lung cancer case texts, which involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes a joint model, Multi-BGLC, around these two types of tasks, using bidirectional encoder representations from transformers (BERT) as the encoder and fine-tuning the decoder composed of graph convolutional neural network (GCNN) + long short-term memory (LSTM) + conditional random field (CRF) based on cancer case data. The GCNN is used for attribute discrimination, whereas the LSTM and CRF are used for attribute extraction. The experiment verified the effectiveness and accuracy of the model compared with other baseline models.
Abstract: We analyze the suitability of existing pre-trained transformer-based language models (PLMs) for abstractive text summarization on German technical healthcare texts. The study focuses on the multilingual capabilities of these models and their ability to perform the task of abstractive text summarization in the healthcare field. The research hypothesis was that large language models could perform high-quality abstractive text summarization on German technical healthcare texts, even if the model is not specifically trained in that language. Through experiments, the research questions explore the performance of transformer language models in dealing with complex syntax constructs, the difference in performance between models trained in English and German, and the impact of translating the source text to English before conducting the summarization. We conducted an evaluation of four PLMs (GPT-3, a translation-based approach also utilizing GPT-3, a German language model, and a domain-specific bio-medical model approach). The evaluation considered the informativeness using 3 types of metrics based on Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and the quality of results, which is manually evaluated considering 5 aspects. The results show that text summarization models could be used in the German healthcare domain and that domain-independent language models achieved the best results. The study proves that text summarization models can simplify the search for pre-existing German knowledge in various domains.
Abstract: On January 14, Heimtextil kicked off the new trade fair year with over 3,000 exhibitors from 65 countries. With steady growth, the leading trade fair for home and contract textiles and textile design is strongly positioned. This makes it a reliable platform for international participants. At the opening, architect and designer Patricia Urquiola presented her installation 'among-us' at Heimtextil.
Abstract: Since the launch of a digitization project for the protection and utilization of ancient texts in the Sakya Monastery of the Xizang Autonomous Region in 2012, significant efforts and achievements have been made in ancient text preservation.
Abstract: The application of legal texts in the context of digital television is a process that relies on several normative instruments, ranging from international treaties, such as those of the ITU (International Telecommunications Union), to national regulations defining the obligations of audiovisual operators and the modalities of consumer support. Many countries have introduced specific laws and regulations to organize the gradual switch-off of analog broadcasting and encourage the adoption of new digital standards. Consequently, the digitization of Guinea's broadcasting network cannot be carried out without taking into account the legal framework: allocation of resources and broadcasting players. Analog and digital broadcasting, according to regulatory texts, shows the relationships between the different communication management structures. As for digital broadcasting, we note the appearance of a new service, the multiplex.
Funding: Partially supported by the National Natural Science Foundation of China under Grant No. 62372121; the Ministry of Education of Humanities and Social Science project under Grant No. 20YJAZH118; the National Key Research and Development Program of China under Grant No. 2020YFB1005804; the MOE Project at Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies.
Abstract: With the rapid development of web technology, Social Networks (SNs) have become one of the most popular platforms for users to exchange views and express their emotions. More and more people comment on trending topics in SNs, producing a large volume of text that contains emotions. Textual Emotion Cause Extraction (TECE) aims to automatically extract the causes of a given emotion in text, and is an important research issue in natural language processing. It differs from the earlier tasks of emotion recognition and emotion classification: rather than stopping at shallow-level emotion classification of text, it aims to trace the source of the emotion. In this paper, we provide a survey of TECE. First, we introduce the development process and classification of TECE. Then, we discuss the existing methods and key factors for TECE. Finally, we enumerate the challenges and development trends for TECE.
Abstract: From 2024 to 2025, China's agriculture has continued to develop, advance, and grow more robust, and in recent years it has been sustainably improved, reshaped, and remolded. The development of China's agriculture marks a new starting point for the sector, giving it a new orientation and new protection.