Funding: Funded by the 5·5 Engineering Research & Innovation Team Project of Beijing Forestry University (No. BLRC2023C02).
Abstract: Named entity recognition (NER) in the musk deer domain is the extraction of specific types of entities from unstructured text, and it is a fundamental component of knowledge graphs, Q&A systems, and text summarization systems for that domain. To address the limited annotated data, diverse entity types, and ambiguous Chinese word boundaries in musk deer NER, we present a novel NER model, CAELF-GP, which is based on cross-attention mechanism enhanced lexical features (CAELF). Specifically, we employ BERT as a character encoder and integrate external lexical information at the character representation layer. In the feature fusion module, instead of indiscriminately merging external dictionary information, we adopt a feature fusion method based on a cross-attention mechanism, which guides the model to focus on important lexical information by calculating the correlation between each character and its corresponding word sets. This module enhances the model's semantic representation ability and its entity boundary recognition. Finally, we introduce the GlobalPointer (GP) decoding module for entity type recognition, which can identify both nested and non-nested entities. Since no publicly available dataset currently exists for the musk deer domain, we built a named entity recognition dataset by collecting relevant literature and working under the guidance of domain experts. The dataset supports the training and validation of the model and provides a data foundation for subsequent related research. The model is evaluated on two public datasets and on the musk deer dataset. The results show that it outperforms the baseline models, offering a promising technical avenue for the intelligent recognition of named entities in the musk deer domain.
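The fusion step described in this abstract (each character attends over the set of lexicon words matched to it) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `cross_attention_fuse`, the tensor shapes, the scaled dot-product scoring, and the final concatenation are all assumptions chosen for illustration, and the BERT encoder and GlobalPointer decoder are left out.

```python
import numpy as np

def cross_attention_fuse(char_vecs, word_vecs, word_mask):
    """Fuse each character vector with an attention-weighted sum of its matched words.

    char_vecs: (n_chars, d) character representations (e.g. from an encoder).
    word_vecs: (n_chars, n_words, d) embeddings of the words matched to each character.
    word_mask: (n_chars, n_words) boolean, True where a matched word is present.
    """
    d = char_vecs.shape[-1]
    # Scaled dot-product score between each character (query) and its words (keys).
    scores = np.einsum("cd,cwd->cw", char_vecs, word_vecs) / np.sqrt(d)
    # Padding slots get a very low score; subtracting the row max keeps exp() stable.
    scores = np.where(word_mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores) * word_mask
    denom = weights.sum(axis=-1, keepdims=True)
    # Characters with no matched word keep all-zero attention weights.
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    # Attention-weighted lexical summary per character (words act as values).
    lexical = np.einsum("cw,cwd->cd", weights, word_vecs)
    # Hypothetical fusion choice: concatenate the character and lexical views.
    return np.concatenate([char_vecs, lexical], axis=-1), weights


# Toy input: 3 characters, up to 2 matched words each, dimension 4 (random values).
rng = np.random.default_rng(0)
chars = rng.normal(size=(3, 4))
words = rng.normal(size=(3, 2, 4))
mask = np.array([[True, True], [True, False], [False, False]])
fused, attn = cross_attention_fuse(chars, words, mask)
print(fused.shape)
```

In this sketch a character with no lexicon match simply receives a zero lexical vector, so the downstream decoder would see only its character representation.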
Funding: The National Natural Science Foundation of China (No. 61303094), the Doctoral Fund of the Ministry of Education of China (No. 20123108120027), the Program of Science and Technology Commission of Shanghai Municipality (No. 14511107100), the Shanghai Leading Academic Discipline Project (No. J50103), and the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ024).
Abstract: The mass of user-generated data on social media and social networks plays an important role in tracking users' sentiments and opinions online. A good polarity lexicon, which can effectively improve the classification results of sentiment analysis, is indispensable for analyzing users' sentiments. Inspired by social cognitive theories, we combine a basic emotion value lexicon and a social evidence lexicon to improve the traditional polarity lexicon. Using the proposed lexicon and a new syntactic analysis method, the proposed approach obtains a significant improvement in Chinese text sentiment analysis.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60673038, 60503070).
Abstract: This paper presents a method to learn semantic lexicons using a new bootstrapping method based on graph mutual reinforcement (GMR). The approach uses only unlabeled data and a few seed words to learn new words for each semantic category. Unlike other bootstrapping methods, we use GMR-based bootstrapping to rank the candidate words and patterns. Experimental results show that the GMR-based bootstrapping approach outperforms existing algorithms on both in-domain and out-of-domain data. Furthermore, they show that the result depends not only on the size of the corpus but also on its quality.
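The abstract does not state the update rule, so the following is only a generic HITS-style sketch of mutual reinforcement between words and extraction patterns, under the assumption that a word is good if it occurs with good patterns and a pattern is good if it extracts good words. The function name `mutual_reinforcement_rank`, the co-occurrence matrix, and the toy vocabulary are hypothetical.

```python
import numpy as np

def mutual_reinforcement_rank(cooc, seed_idx, n_iter=20):
    """Rank words and patterns on a bipartite co-occurrence graph.

    cooc: (n_words, n_patterns) counts of how often each word fills each pattern.
    seed_idx: indices of the seed words for one semantic category.
    """
    cooc = np.asarray(cooc, dtype=float)
    word_score = np.zeros(cooc.shape[0])
    word_score[list(seed_idx)] = 1.0  # only the seeds carry score at the start
    for _ in range(n_iter):
        # A pattern is scored by the words it co-occurs with ...
        pattern_score = cooc.T @ word_score
        pattern_score /= np.linalg.norm(pattern_score) or 1.0
        # ... and a word is scored by the patterns it appears in.
        word_score = cooc @ pattern_score
        word_score /= np.linalg.norm(word_score) or 1.0
    return word_score, pattern_score


# Toy data: rows are candidate words, columns are patterns.
words = ["apple", "banana", "car", "pear"]
patterns = ["eat X", "drive X"]
cooc = [[3, 0], [2, 0], [0, 4], [1, 0]]
word_score, pattern_score = mutual_reinforcement_rank(cooc, seed_idx=[0])
ranking = [words[i] for i in np.argsort(-word_score)]
print(ranking)
```

With "apple" as the only seed, words sharing its pattern rise in the ranking while "car", which never co-occurs with that pattern, stays at zero.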
Funding: Natural Science Foundation of Shanghai, China (No. 18ZR1401200); Special Fund for Innovation and Development of Shanghai Industrial Internet, China (No. 2019-GYHLW-01004).
Abstract: A novel method of constructing a sentiment lexicon of new words (SLNW) is proposed to realize effective Weibo sentiment analysis by integrating existing lexicons of sentiment, degree, negation, and network terms. Building on neologism discovery algorithms that use left-right entropy and mutual information (MI), the new algorithm divides N-grams to obtain strings dynamically, instead of relying on a fixed sliding window, when using a Trie as the data structure. The sentiment-oriented pointwise mutual information (SO-PMI) algorithm with Laplace smoothing is used to determine the sentiment tendency of new words found in the data set, and SLNW is formed by adding these new words to the basic sentiment lexicon. Experiments show that sentiment analysis based on SLNW performs better than the compared methods. Precision, recall, and F-measure are improved on both topic and non-topic Weibo data sets.
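As an illustration of the SO-PMI step, the sketch below scores a candidate word by summing its PMI with positive seed words and subtracting its PMI with negative seed words, using document-level co-occurrence counts. The exact smoothing used in the paper is not given in the abstract; the add-alpha form here, the function name `so_pmi`, and the toy posts and seed words are assumptions.

```python
import math

def so_pmi(word, docs, pos_seeds, neg_seeds, alpha=1.0):
    """Sentiment orientation of `word` from document-level co-occurrence.

    docs: iterable of token lists; each document counts a term at most once.
    alpha: add-alpha smoothing constant so that zero counts do not give log(0).
    """
    docs = [set(d) for d in docs]
    n = len(docs)

    def count(*terms):
        # Number of documents that contain all of the given terms.
        return sum(all(t in d for t in terms) for d in docs)

    def pmi(a, b):
        joint = (count(a, b) + alpha) / (n + alpha)
        pa = (count(a) + alpha) / (n + alpha)
        pb = (count(b) + alpha) / (n + alpha)
        return math.log2(joint / (pa * pb))

    # Positive result: closer to the positive seeds; negative: closer to the negative seeds.
    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


# Toy tokenized posts with two hypothetical new words, "yyds" and "emo".
docs = [
    ["yyds", "good"],
    ["yyds", "good", "happy"],
    ["emo", "bad"],
    ["emo", "sad", "bad"],
    ["yyds", "happy"],
]
pos_seeds, neg_seeds = ["good", "happy"], ["bad", "sad"]
score_yyds = so_pmi("yyds", docs, pos_seeds, neg_seeds)
score_emo = so_pmi("emo", docs, pos_seeds, neg_seeds)
print(score_yyds > 0, score_emo < 0)
```

A new word would then be added to the positive or negative part of the lexicon according to the sign of its score, possibly with a threshold around zero for neutral words.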
Abstract: A considerable amount of sexism has long existed in English, especially in the English lexicon. This paper discusses some manifestations of sexism in the English lexicon and analyzes some of the factors that strongly influence its existence in English. It aims to make more people aware of the importance and urgency of eliminating sexism from the language.
Abstract: Forensic linguistics, the interface between language and law, is a newly emerging interdisciplinary field in China. It belongs neither to legal science nor to pure linguistic research; rather, it is an interdisciplinary subject built on both disciplines. Its key problem is the linguistic issues that arise in the legal field. At present, forensic linguistics in China emphasizes written language over spoken language. This article gives a brief comparative analysis of the Chinese and English forensic lexicons, including the similarities between them. It also suggests that learners should view the differences between the two from a cultural perspective.
Abstract: The focus of the thesis is the construction of a multidimensional mental lexicon for a second language. The lexicon is made up of four dimensions: meaning, pronunciation, orthography, and context. It comes into being through the establishment of these four dimensions.
Abstract: Theories of the mental lexicon explain, from a psycholinguistic perspective, how words are organized and accessed in the human brain. The topic has drawn great interest in psycholinguistics and second language acquisition (SLA). This paper focuses on incidental L2 vocabulary acquisition and explores how to help learners reinforce and expand their mental lexicon network by applying various kinds of mental connections, in order to promote their acquisition of English vocabulary.
Abstract: Sentiment analysis research in the Malaysian context currently lacks an available sentiment lexicon. This paper addresses that issue in order to enhance the accuracy of sentiment analysis. In this study, a new lexicon for sentiment analysis is constructed. After a detailed review of existing approaches, a new bilingual sentiment lexicon known as MELex (Malay-English Lexicon) has been generated. Constructing MELex involves three activities: seed word selection, polarity assignment, and synonym expansion. Our approach differs from previous work in that MELex can analyze text in the two most widely used languages in Malaysia, Malay and English, with an achieved accuracy of 90%. It is evaluated through experimentation and a case study approach in which affordable housing projects in Malaysia are selected as case projects. This finding indicates that MELex is able to analyze public sentiment in the Malaysian context. The novel aspects of this paper are twofold. First, it introduces a new technique for assigning the polarity score, and second, it improves classification performance on mixed-language content.
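The three construction activities named in this abstract (seed word selection, polarity assignment, synonym expansion) can be sketched as a simple propagation over a synonym table. This is not the MELex procedure itself: the paper's polarity scoring technique is not described in the abstract, so the sketch assumes each synonym simply inherits the polarity of the word it was reached from. The functions `expand_lexicon` and `score_text`, the seed words, and the synonym table are all hypothetical.

```python
def expand_lexicon(seeds, synonyms):
    """Grow a polarity lexicon from seed words through a synonym table.

    seeds: dict mapping seed word -> polarity (+1 or -1).
    synonyms: dict mapping word -> list of synonyms, possibly in another language.
    """
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:
        word = frontier.pop()
        for syn in synonyms.get(word, ()):
            if syn not in lexicon:
                # Assumption: a synonym inherits the polarity of its source word.
                lexicon[syn] = lexicon[word]
                frontier.append(syn)
    return lexicon


def score_text(text, lexicon):
    """Sum the polarities of known words in whitespace-tokenized, lowercased text."""
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())


# Hypothetical seeds and a mixed Malay-English synonym table.
seeds = {"good": 1, "bad": -1}
synonyms = {
    "good": ["baik", "great"],
    "baik": ["bagus"],
    "bad": ["buruk"],
    "buruk": ["teruk"],
}
lexicon = expand_lexicon(seeds, synonyms)
mixed_score = score_text("Rumah ini bagus and great", lexicon)
print(mixed_score)
```

Because Malay and English entries live in one dictionary, a mixed-language sentence is scored in a single pass without first detecting the language of each token.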