Abstract: In the rapidly evolving landscape of natural language processing (NLP) and sentiment analysis, improving the accuracy and efficiency of sentiment classification models is crucial. This paper investigates the performance of two advanced models, the large language model (LLM) LLaMA and the NLP model BERT, in the context of airline review sentiment analysis. Through fine-tuning, domain adaptation, and the application of few-shot learning, the study addresses the subtleties of sentiment expressions in airline-related text data. Employing predictive modeling and comparative analysis, the research evaluates the effectiveness of Large Language Model Meta AI (LLaMA) and Bidirectional Encoder Representations from Transformers (BERT) in capturing sentiment intricacies. Fine-tuning, including domain adaptation, enhances the models' performance in sentiment classification tasks. Additionally, the study explores the potential of few-shot learning to improve model generalization using minimal annotated data for targeted sentiment analysis. By conducting experiments on a diverse airline review dataset, the research quantifies the impact of fine-tuning, domain adaptation, and few-shot learning on model performance, providing valuable insights for industries aiming to predict recommendations and enhance customer satisfaction through a deeper understanding of sentiment in user-generated content (UGC). This research contributes to refining sentiment analysis models, ultimately fostering improved customer satisfaction in the airline industry.
Funding: Funded by the Scientific Research Deanship at the University of Hail, Saudi Arabia, through Project Number RG-23092.
Abstract: Cyberbullying on social media poses significant psychological risks, yet most detection systems oversimplify the task by focusing on binary classification, ignoring nuanced categories like passive-aggressive remarks or indirect slurs. To address this gap, we propose a hybrid framework combining Term Frequency-Inverse Document Frequency (TF-IDF), word-to-vector (Word2Vec), and Bidirectional Encoder Representations from Transformers (BERT) based models for multi-class cyberbullying detection. Our approach integrates TF-IDF for lexical specificity and Word2Vec for semantic relationships, fused with BERT’s contextual embeddings to capture syntactic and semantic complexities. We evaluate the framework on a publicly available dataset of 47,000 annotated social media posts across five cyberbullying categories: age, ethnicity, gender, religion, and indirect aggression. Among the BERT variants tested, BERT Base Uncased achieved the highest performance with 93% accuracy (standard deviation ±1% across 5-fold cross-validation) and an average AUC of 0.96, outperforming standalone TF-IDF (78%) and Word2Vec (82%) models. Notably, it achieved near-perfect AUC scores (0.99) for age- and ethnicity-based bullying. A comparative analysis with state-of-the-art benchmarks, including Generative Pre-trained Transformer 2 (GPT-2) and Text-to-Text Transfer Transformer (T5) models, highlights BERT’s superiority in handling ambiguous language. This work advances cyberbullying detection by demonstrating how hybrid feature extraction and transformer models improve multi-class classification, offering a scalable solution for moderating nuanced harmful content.
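As a rough illustration of the hybrid-feature idea in the abstract above, the sketch below computes TF-IDF weights for a toy corpus and concatenates each sparse vector with a dense vector standing in for a Word2Vec or BERT embedding. The toy documents, the `dense` placeholder values, the function names, and the choice of plain concatenation are all assumptions made for illustration; the abstract does not say how the paper fuses its features.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocab, vectors) where each weight is tf * ln(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

def fuse(sparse_vec, dense_vec):
    """Fuse by simple concatenation (an assumed, simplest-possible strategy)."""
    return list(sparse_vec) + list(dense_vec)

docs = ["you are old", "you are kind", "old people rule"]
vocab, sparse = tfidf_vectors(docs)
# Placeholder embeddings, not real Word2Vec or BERT output.
dense = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
fused = [fuse(s, d) for s, d in zip(sparse, dense)]
```

Each fused vector here has one entry per vocabulary term plus the dense dimensions; a downstream classifier would consume these rows.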
Funding: Financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); and the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 20220108).
Abstract: Geological knowledge can provide support for knowledge discovery, knowledge inference and mineralization predictions of geological big data. Entity identification and relationship extraction from geological data description text are the key links for constructing knowledge graphs. Given the lack of publicly annotated datasets in the geology domain, this paper illustrates the construction process of geological entity datasets, defines the types of entities and interconceptual relationships by using the geological entity concept system, and completes the construction of the geological corpus. To address the shortcomings of existing language models (such as Word2vec and GloVe), which cannot handle polysemous words and have a poor ability to fuse contexts, we propose a geological named entity recognition and relationship extraction model combined with the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model. To effectively represent the text features, we construct a BERT-bidirectional gated recurrent unit network (BiGRU)-conditional random field (CRF)-based architecture to extract the named entities and a BERT-BiGRU-Attention-based architecture to extract the entity relations. The results show that the F1-score of the BERT-BiGRU-CRF named entity recognition model is 0.91 and the F1-score of the BERT-BiGRU-Attention relationship extraction model is 0.84, which are significant performance improvements when compared to classic language models (e.g., Word2vec and Embeddings from Language Models (ELMo)).
Abstract: Traditional named entity recognition methods need professional domain knowledge and a large amount of human participation to extract features, while Chinese named entity recognition methods based on neural network models have the problem that the character vector representation is too uniform. To solve these problems, we propose a Chinese named entity recognition method based on the BERT-BiLSTM-ATT-CRF model. First, we use the bidirectional encoder representations from transformers (BERT) pre-trained language model to obtain the semantic vector of each word according to its context. Second, the word vectors trained by BERT are input into a bidirectional long short-term memory network with an embedded attention mechanism (BiLSTM-ATT) to capture the most important semantic information in the sentence. Finally, a conditional random field (CRF) is used to learn the dependence between adjacent tags and obtain the globally optimal sentence-level tag sequence. The experimental results show that the proposed model achieves state-of-the-art performance on both the Microsoft Research Asia (MSRA) corpus and the People’s Daily corpus, with F1 values of 94.77% and 95.97%, respectively.
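The final step in the abstract above, using a CRF to pick a globally optimal tag sequence from per-token scores and tag-to-tag dependencies, is usually decoded with the Viterbi algorithm. The following is a minimal sketch of that decoding step only. The tag set, the emission scores (which in the paper's pipeline would come from the BiLSTM-ATT layer), and the transition scores are made-up toy values, not anything reported in the source.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions: list of {tag: score}, one dict per token.
    transitions: {(prev_tag, cur_tag): score}.
    """
    # best[i][t] is the best score of any path ending in tag t at token i.
    best = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i][t] is the previous tag on that best path
    for scores in emissions[1:]:
        cur, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, t)])
            cur[t] = best[-1][prev] + transitions[(prev, t)] + scores[t]
            ptr[t] = prev
        best.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

tags = ["B", "I", "O"]
# Toy transition scores: forbid "O" followed by "I", allow everything else.
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I")] = -10.0
# Toy emission scores for a three-token sentence.
emissions = [
    {"B": 0.4, "I": 0.0, "O": 0.5},
    {"B": 0.0, "I": 1.0, "O": 0.2},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
path = viterbi(emissions, transitions, tags)
```

Picking the best tag per token independently would give `O, I, O`, which contains the penalized `O` to `I` transition; the sequence-level decoder returns `B, I, O` instead, which is the point of modeling adjacent-tag dependence.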
Abstract: In this paper, we explore the multi-class classification problem of acupuncture points based on the BERT model, i.e., we try to recommend the best main acupuncture point for treating a disease by classifying and predicting the main acupuncture point for the disease, and further explore its acupuncture point grouping to provide the medical practitioner with the optimal solution for treating the disease and to improve clinical decision-making ability. The Bert-Chinese-Acupoint model was constructed by retraining the BERT model, and semantic features of acupuncture points were added to the acupuncture point corpus during fine-tuning; the model was then compared with machine learning methods. The results show that the Bert-Chinese-Acupoint model proposed in this paper has a 3% improvement in accuracy compared to the best-performing model among the machine learning approaches.
Abstract: Electrocardiogram (ECG) is a commonly used tool in the biological diagnosis of heart diseases. ECG allows the representation of the electrical signals that cause heart muscles to contract and relax. Recently, accurate deep learning methods have been developed to overcome manual diagnosis in terms of time and effort. However, most current automatic medical diagnosis methods use long ECG signals to inspect different types of heart arrhythmia. Therefore, ECG signal files tend to require large storage and may cause significant overhead when exchanged over a computer network. This raises the need for effective compression methods for ECG signals. In this work, the authors investigate using the BERT (Bidirectional Encoder Representations from Transformers) model, a bidirectional neural network that was originally designed for natural language. The authors evaluate the model with respect to its compression ratio and information preservation, and measure information preservation in terms of the accuracy of a convolutional neural network (CNN) in classifying the decompressed signal. The results show that the method can achieve up to 83% saving in storage. Also, the classification accuracy of the decompressed signals is around 92.41%. Furthermore, the method enables the user to balance the compression ratio and the required accuracy of the CNN classifiers.
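To make the storage figure in the abstract above concrete, the sketch below relates a storage saving to a compression ratio under the common definitions saving = 1 - compressed/original and ratio = original/compressed. These definitions and the byte counts are assumptions for illustration; the abstract does not state how the paper defines its compression ratio.

```python
def storage_saving(original_bytes, compressed_bytes):
    """Fraction of storage saved by compression."""
    return 1.0 - compressed_bytes / original_bytes

def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed file is."""
    return original_bytes / compressed_bytes

# Hypothetical sizes chosen so the saving matches the reported upper bound of 83%.
original, compressed = 1000, 170
saving = storage_saving(original, compressed)
ratio = compression_ratio(original, compressed)
```

Under these assumed definitions, an 83% saving corresponds to a compression ratio of 1000/170, a little under 6 to 1.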
Abstract: Objective: To improve the efficiency of patent clustering related to COVID-19 through a topic extraction algorithm and the BERT model, and to help researchers understand the patent applications for the novel coronavirus. Methods: The weights of the topic vector and the BERT model vector were adjusted by a cross-entropy loss algorithm to obtain a joint vector. Then, the k-means++ algorithm was used for patent clustering after dimension reduction. Results and Conclusion: The model was applied to patents for coronavirus drugs, and five clustering topics were generated. A comparison shows that the clustering results of this model are more centralized and that the differentiation between clusters is significant. The five generated clusters are visually analyzed to reveal the development status of patents for coronavirus drugs.
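The Methods step above, combining a topic vector with a BERT vector and then clustering with k-means++, can be sketched as follows. This is a simplified illustration under several assumptions: the joint vector is a weighted concatenation with a fixed weight `w` (the paper learns its weights with a cross-entropy loss, which is not reproduced here), dimension reduction is skipped, and the four toy vectors are invented.

```python
import random

def joint_vector(topic_vec, bert_vec, w):
    """Weighted concatenation; w is a fixed assumed weight, not a learned one."""
    return [w * x for x in topic_vec] + [(1.0 - w) * x for x in bert_vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: draw each new center with probability proportional
    to its squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def nearest(point, centers):
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, rng, iters=20):
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points], centers

# Invented toy vectors: two documents per underlying group.
topic = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bert = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
points = [joint_vector(t, b, 0.5) for t, b in zip(topic, bert)]
labels, centers = kmeans(points, 2, random.Random(0))
```

The seeding step is what distinguishes k-means++ from plain k-means: spreading the initial centers apart makes it less likely that both start inside the same group.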
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. U1836110 and U1836208) and by the Jiangsu Basic Research Programs-Natural Science Foundation under grant number BK20200039.
Abstract: Searchable encryption provides an effective way to ensure data security and privacy in cloud storage. Users can retrieve encrypted data in the cloud while protecting their own data security and privacy. However, most current content-based retrieval schemes do not capture enough of an article's semantic information and cannot fully reflect the semantics of the text. In this paper, we propose two secure and semantic retrieval schemes based on BERT (bidirectional encoder representations from transformers), named SSRB-1 and SSRB-2. By training the documents with BERT, the keyword vector is generated to contain more semantic information of the documents, which improves the accuracy of retrieval and makes the retrieval result more consistent with the user’s intention. Finally, testing on real data sets shows that both of our solutions are feasible and effective.
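The semantic-retrieval side of the abstract above can be pictured as ranking document vectors against a query vector. The sketch below does this with cosine similarity over plaintext vectors. It is only an assumed illustration: the abstract does not name the similarity measure, the encryption layer of SSRB-1 and SSRB-2 is not shown at all, and the document identifiers and vectors are invented stand-ins for BERT-derived keyword vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query_vec, doc_vecs, top_k):
    """Return the ids of the top_k documents most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

# Invented two-dimensional vectors standing in for document keyword vectors.
doc_vecs = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
top = rank([0.9, 0.1], doc_vecs, top_k=2)
```

A secure scheme would have to compute or approximate such scores without revealing the vectors to the server, which is the part the paper contributes and this sketch omits.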
Funding: Supported by the scientific research project of Nanjing Xiaozhuang University (Project name: Graduation paper “Evaluation”--Evaluation Automatic generation system based on multilevel analysis, No. 2020XSKY001); by the project (Project name: Research and development of self generating remote controller for intelligent toilet, No. 2021320108001139); and by the project (Project name: Modeling of energy industry based on artificial intelligence holographic interactive atlas system, No. 2021320108002080).
Abstract: In the evaluation of graduation theses, teachers’ evaluation criteria are inconsistent, subjective, and not completely reasonable and fair. This paper proposes using the BERT model to analyze existing graduation papers in colleges and universities and to quantitatively evaluate students’ graduation projects according to the given relevant parameters. The purpose of this method is to use standards to make comprehensive, systematic and accurate evaluations and to avoid the high repetition and similarity that arise across a large number of teachers’ comments. This can not only effectively improve the efficiency of graduation design evaluation but also improve the fairness of evaluation. Changing the review of graduation theses from a purely manual operation to machine review combined with manual operation can not only reduce manpower consumption but also make the review more objective and fair than traditional subjective review.