Abstract: In the rapidly evolving landscape of natural language processing (NLP) and sentiment analysis, improving the accuracy and efficiency of sentiment classification models is crucial. This paper investigates the performance of two advanced models, the large language model (LLM) LLaMA and the BERT model, in the context of airline review sentiment analysis. Through fine-tuning, domain adaptation, and the application of few-shot learning, the study addresses the subtleties of sentiment expressions in airline-related text data. Employing predictive modeling and comparative analysis, the research evaluates the effectiveness of Large Language Model Meta AI (LLaMA) and Bidirectional Encoder Representations from Transformers (BERT) in capturing sentiment intricacies. Fine-tuning, including domain adaptation, enhances the models' performance in sentiment classification tasks. Additionally, the study explores the potential of few-shot learning to improve model generalization using minimal annotated data for targeted sentiment analysis. By conducting experiments on a diverse airline review dataset, the research quantifies the impact of fine-tuning, domain adaptation, and few-shot learning on model performance, providing insights for industries aiming to predict recommendations and enhance customer satisfaction through a deeper understanding of sentiment in user-generated content (UGC). This research contributes to refining sentiment analysis models, ultimately fostering improved customer satisfaction in the airline industry.
Funding: Financially supported by the National Key R&D Program of China (No. 2022YFF0711601), the Natural Science Foundation of Hubei Province of China (No. 2022CFB640), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014), the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04), and the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 20220108).
Abstract: Geological knowledge can support knowledge discovery, knowledge inference and mineralization prediction from geological big data. Entity identification and relationship extraction from geological description text are the key steps in constructing knowledge graphs. Given the lack of publicly annotated datasets in the geology domain, this paper illustrates the construction process of geological entity datasets, defines the types of entities and interconceptual relationships by using the geological entity concept system, and completes the construction of the geological corpus. To address the shortcomings of existing language models (such as Word2vec and GloVe), which cannot handle polysemous words and have a poor ability to fuse context, we propose a geological named entity recognition and relationship extraction model built on the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model. To represent the text features effectively, we construct a BERT-bidirectional gated recurrent unit network (BiGRU)-conditional random field (CRF) architecture to extract the named entities and a BERT-BiGRU-Attention architecture to extract the entity relations. The results show that the F1-score of the BERT-BiGRU-CRF named entity recognition model is 0.91 and the F1-score of the BERT-BiGRU-Attention relationship extraction model is 0.84, which are significant performance improvements over classic language models (e.g., Word2vec and Embeddings from Language Models (ELMo)).
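As an illustration of the role a CRF layer plays on top of an encoder such as BERT-BiGRU, the following minimal sketch decodes the highest-scoring tag sequence with the Viterbi algorithm. It is not the authors' implementation: the tag set (`B-ROCK`, `I-ROCK`, `O`), the emission scores and the transition scores are all hypothetical values chosen for the example, and a trained model would learn them.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][i]   : score of tag i at position t (would come from the encoder)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])          # best score ending in each tag at position 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical tag set and scores for a three-token sentence.
tags = ["O", "B-ROCK", "I-ROCK"]
emissions = [
    [0.2, 1.0, 0.9],
    [0.1, 0.3, 1.2],
    [1.5, 0.2, 0.1],
]
transitions = [
    [0.5, 0.2, -10.0],   # from O: moving straight to I-ROCK is heavily penalized
    [0.1, -0.5, 0.8],    # from B-ROCK
    [0.3, -0.5, 0.4],    # from I-ROCK
]
decoded = viterbi_decode(emissions, transitions, tags)
print(decoded)
```

The large negative score for the `O` to `I-ROCK` transition shows how a CRF can discourage invalid label sequences that a per-token classifier might otherwise produce.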
Abstract: In this paper, we explore the multi-class classification problem of acupuncture acupoints based on the BERT model, i.e., we try to recommend the best main acupuncture point for treating a disease by classifying and predicting the main acupuncture point for that disease, and further explore its acupuncture point grouping to provide the medical practitioner with the optimal solution for treating the disease and to improve clinical decision-making ability. The Bert-Chinese-Acupoint model was constructed by retraining on the basis of the BERT model; semantic features of acupuncture points were added to the acupuncture point corpus in the fine-tuning process, and the model was compared with machine learning methods. The results show that the Bert-Chinese-Acupoint model proposed in this paper has a 3% improvement in accuracy compared to the best-performing model among the machine learning approaches.
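Abstract: Research on "chokepoint" technologies lacks a mechanism for identifying substitute technologies and parses technical elements with insufficient precision. To address these limitations, this article proposes a method for identifying substitutes for chokepoint technologies that combines prompt engineering with a BERT-LSTM model. First, ECCN items are parsed on the basis of the Commerce Control List (CCL), a patent search is carried out, and the key core patents on the main technology path are extracted with the SPC algorithm. Second, large language model prompt engineering is used to extract "problem-solution pairs", from which technical effects are parsed, and Function-Oriented Search (FOS) is applied for an initial search for related patents that may have a technology-substitution effect. Third, the BERT-LSTM model performs binary classification of the patent texts to identify the patent samples that have a technology-substitution effect, and prompt engineering is used to extract "solution-category pairs" so that substitute technical solutions are identified systematically. Finally, a two-dimensional science-industry evaluation system is established to grade the potential of the substitute technologies. Taking lithography as an example, the article illustrates the application workflow of the method and systematically identifies five substitute technologies for extreme ultraviolet (EUV) lithography together with their substitution potential.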
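Abstract: To address the insufficient capture of load features and the limited disaggregation accuracy of non-intrusive load disaggregation methods, this article proposes a multi-head self-attention non-intrusive load disaggregation method based on an improved BERT (bidirectional encoder representations from transformers) model, named frequency and temporal attention-BERT (FAT-BERT). First, time-domain data are converted to frequency-domain data by the Fourier transform, and multi-scale convolution is used to capture both the time-domain and frequency-domain features of the load signal, which strengthens the model's ability to represent diverse load signals. Second, a frequency attention mechanism is introduced into the multi-head self-attention mechanism, which improves the model's sensitivity to the frequency components of time-series data and further improves the representation of complex load patterns; local self-attention is added to the improved BERT model to reduce unnecessary global computation and speed up the model. Next, residual connections are combined with regularization techniques so that training is more stable and overfitting is better avoided. Finally, experiments on the REDD and UK-DALE datasets verify the effectiveness of the proposed method.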
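The first step of the FAT-BERT abstract, converting time-domain load data to the frequency domain with the Fourier transform, can be sketched as below. This is a minimal illustration using a naive discrete Fourier transform in plain Python, not the authors' pipeline; the eight-sample window of power readings is a hypothetical signal, and a practical system would normally use an optimized FFT routine.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the magnitude of each DFT bin for a real-valued sequence."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        # Correlate the signal with a complex sinusoid of k cycles per window.
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(acc))
    return magnitudes


# Hypothetical window of power readings (watts): a 100 W baseline plus a
# 10 W oscillation that completes 2 cycles over the 8 samples.
window = [100.0 + 10.0 * math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = dft_magnitudes(window)
print([round(m, 3) for m in spectrum])
```

In this example the baseline shows up in bin 0 and the oscillation in bin 2 (and its mirror, bin 6), which is the kind of frequency-domain structure that a downstream feature extractor could use alongside the raw time series.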
Funding: Funded by the Scientific Research Deanship at the University of Hail, Saudi Arabia, through Project Number RG-23092.
Abstract: Cyberbullying on social media poses significant psychological risks, yet most detection systems oversimplify the task by focusing on binary classification, ignoring nuanced categories such as passive-aggressive remarks or indirect slurs. To address this gap, we propose a hybrid framework combining Term Frequency-Inverse Document Frequency (TF-IDF), word-to-vector (Word2Vec), and Bidirectional Encoder Representations from Transformers (BERT) based models for multi-class cyberbullying detection. Our approach integrates TF-IDF for lexical specificity and Word2Vec for semantic relationships, fused with BERT's contextual embeddings to capture syntactic and semantic complexities. We evaluate the framework on a publicly available dataset of 47,000 annotated social media posts across five cyberbullying categories: age, ethnicity, gender, religion, and indirect aggression. Among the BERT variants tested, BERT Base Uncased achieved the highest performance, with 93% accuracy (standard deviation ±1% across 5-fold cross-validation) and an average AUC of 0.96, outperforming standalone TF-IDF (78%) and Word2Vec (82%) models. Notably, it achieved near-perfect AUC scores (0.99) for age- and ethnicity-based bullying. A comparative analysis with state-of-the-art benchmarks, including Generative Pre-trained Transformer 2 (GPT-2) and Text-to-Text Transfer Transformer (T5) models, highlights BERT's superiority in handling ambiguous language. This work advances cyberbullying detection by demonstrating how hybrid feature extraction and transformer models improve multi-class classification, offering a scalable solution for moderating nuanced harmful content.
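The hybrid feature idea in the cyberbullying abstract can be sketched as follows. The abstract does not say how the three feature types are fused, so simple concatenation is an assumption here, as is the particular TF-IDF weighting (term frequency times the log of inverse document frequency). The example posts are invented, and the Word2Vec and BERT vectors are hard-coded placeholders standing in for real model outputs.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Compute one TF-IDF vector per document over a shared sorted vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([
            (counts[tok] / len(doc)) * math.log(n_docs / df[tok])
            for tok in vocab
        ])
    return vocab, vectors


def fuse(lexical, semantic, contextual):
    """Fuse three per-post feature vectors by concatenation (an assumed strategy)."""
    return lexical + semantic + contextual


# Invented example posts.
posts = ["you are too old for this", "nice photo", "old people should log off"]
vocab, tfidf = tfidf_vectors(posts)

# Placeholder vectors; real ones would come from Word2Vec and BERT models.
word2vec_stub = [0.1, 0.2]
bert_stub = [0.3, 0.4, 0.5]

fused = fuse(tfidf[0], word2vec_stub, bert_stub)
print(len(vocab), len(fused))
```

The fused vector would then be passed to a multi-class classifier; the choice of classifier and any weighting between the three feature groups is left open in this sketch.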