Funding: Supported by the National Key Basic Research Program of China (No. 2014CB340600); partially supported by the National Natural Science Foundation of China (Grant Nos. 61332019, 61672531); partially supported by the National Social Science Foundation of China (Grant No. 14GJ003-152).
Abstract: Information content security is a branch of cyberspace security. How to effectively manage and use Weibo comment information has become a research focus in this field. The three main tasks involved are emotion sentence identification and classification, emotion tendency classification, and emotion expression extraction. Building on the latent Dirichlet allocation (LDA) model, we present a Gibbs sampling implementation for inference in our algorithm, which can be used to categorize emotion tendency automatically. To address the low recall of emotion expression extraction on Weibo, we use dependency parsing, divide the dependencies into two categories (subject and object), summarize six kinds of dependency patterns between evaluation objects and emotion words, and propose a merge algorithm for evaluation objects. The method was evaluated in a public bakeoff and ranked among the best methods in the shared sub-task of emotion expression extraction, indicating that it is not only innovative but also practical.
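Illustrative sketch (not from the paper): the abstract mentions Gibbs sampling for an LDA-based model but gives no equations. The code below is the standard collapsed Gibbs sampler for plain LDA, which is only the textbook starting point and not the authors' emotion-tendency algorithm. The function name `lda_gibbs`, the hyperparameter defaults, and the toy word ids are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Toy corpus: three short "comments" over a five-word vocabulary (made-up ids).
toy_docs = [[0, 1, 0, 2], [3, 4, 3], [0, 1, 4]]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, vocab_size=5, iters=50)
print(doc_topic)
```

The per-document topic counts in `doc_topic` could then be normalized and used as features for a tendency classifier; how the paper maps topics to emotion tendencies is not stated in the abstract.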
Funding: This research was funded in part by the National Natural Science Foundation of China (61871140, 61872100, 61572153, U1636215, 61572492, 61672020), the National Key Research and Development Plan (Grant No. 2018YFB0803504), and the Open Fund of Beijing Key Laboratory of IOT Information Security Technology (J6V0011104).
Abstract: Recently, dependency information has been used in different ways to improve neural machine translation (NMT). For example, dependency labels have been added to the hidden states of source words, or the context of a source word has been collected from the dependency tree, learned independently, and added to the NMT model as a unit in various ways. However, these works are all limited to using dependency information to enrich the hidden states of source words. Since many works in statistical machine translation (SMT) and NMT have demonstrated the validity and potential of dependency information, we believe that there are still many other ways to apply it within the NMT architecture. In this paper, we explore a new way to use dependency information to improve NMT. Based on the local attention mechanism, we present the Dependency-based Local Attention Approach (DLAA), a new attention mechanism that allows the NMT model to trace the dependency words related to the word currently being translated. Our work also indicates that dependency information can help to supervise the attention mechanism. Experimental results on the shared training datasets of the WMT 17 Chinese-to-English translation task show that our model is effective and performs particularly well on long sentences.
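Illustrative sketch (not from the paper): the abstract does not give DLAA's equations, so the code below only shows one simple way attention could be restricted to source words that are close to a chosen word in the dependency tree. The hop-based window, the function names, and the toy scores are assumptions, not the authors' formulation.

```python
import math
from collections import deque

def dependency_neighbors(heads, center, max_hops):
    """Indices within max_hops arcs of center in the undirected dependency tree.

    heads[i] is the head index of word i, or -1 for the root.
    """
    adj = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

def masked_attention(scores, allowed):
    """Softmax over scores, with zero weight for positions outside allowed."""
    peak = max(scores[i] for i in allowed)
    exps = [math.exp(s - peak) if i in allowed else 0.0 for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy source sentence: "She reads old books quickly" (head indices are made up by hand).
heads = [1, -1, 3, 1, 1]
window = dependency_neighbors(heads, center=3, max_hops=1)   # around "books"
weights = masked_attention([0.2, 1.0, 0.5, 2.0, 0.1], window)
print(sorted(window), weights)
```

Unlike a fixed positional window, the set returned by `dependency_neighbors` can include distant words, which is one plausible reason a dependency-based window might help on long sentences.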
Funding: Supported by the Science and Technology Innovation Plan of Beijing Institute of Technology (2013).
Abstract: This paper proposes a new method for constructing the Chinese sentential semantic structure. The method uses features including predicates, relations between predicates and basic arguments, relations between words, and case types to train CRF++ and dependency parser models. On the Beijing Forest Studio-Chinese Tagged Corpus (BFS-CTC) data set, the proposed method obtains a precision of 73.63% in the open test. This result shows that formalized computer processing can indeed construct the sentential semantic structure. The predicate, topic, and comment features extracted with the method can be applied directly in Chinese information processing to promote the development of Chinese semantic analysis. The method makes sentential semantic analysis based on large-scale data possible. It is a tool for expanding the corpus and has value for both theoretical research and practical application.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant No. 60970052, the Beijing Natural Science Foundation under Grant No. 4133084, the Beijing Educational Committee Science and Technology Development Plan under Grant No. KM201410028017, and the Beijing Key Disciplines of Computer Application Technology.
Abstract: Sentiment analysis of online reviews and other user-generated content is an important research problem because of its wide range of applications. In this paper, we propose a feature-based vector model and a novel weighting algorithm for sentiment analysis of Chinese product reviews. Specifically, an opinionated document is modeled by a set of feature-based vectors and corresponding weights. Unlike previous work, our model considers modifying relationships between words and contains rich descriptions of sentiment strength, which are represented by adverbs of degree and punctuation. Dependency parsing is applied to construct the feature vectors. A novel feature weighting algorithm is proposed for supervised sentiment classification based on this rich sentiment-strength information. The experimental results demonstrate the effectiveness of the proposed method compared with a state-of-the-art method that uses term-level weighting algorithms.
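Illustrative sketch (not from the paper): the abstract says sentiment strength is described by degree adverbs and punctuation found through dependency relations, but it does not give the weighting formula. The code below shows one simple reading of that idea. The toy lexicons, the multipliers, the 10% boost per exclamation mark, and the use of Universal Dependencies style labels such as `advmod` are all assumptions, and English tokens stand in for Chinese ones.

```python
# Hypothetical toy lexicons; a real system would use curated resources.
POLARITY = {"good": 1.0, "bad": -1.0}
DEGREE = {"very": 1.5, "slightly": 0.5}

def sentiment_strength(tokens, arcs, exclamations=0):
    """Sum sentiment-word scores, scaled by degree adverbs that modify them.

    arcs holds (head_index, dependent_index, relation) triples from a parser.
    """
    total = 0.0
    for idx, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        score = POLARITY[tok]
        for head, dep, rel in arcs:
            # Only adverbial modifiers attached to this sentiment word count.
            if head == idx and rel == "advmod" and tokens[dep] in DEGREE:
                score *= DEGREE[tokens[dep]]
        total += score
    # Assumed rule: each exclamation mark raises the strength by 10%.
    return total * (1.0 + 0.1 * exclamations)

tokens = ["battery", "is", "very", "good"]
arcs = [(3, 0, "nsubj"), (3, 1, "cop"), (3, 2, "advmod")]
print(sentiment_strength(tokens, arcs, exclamations=1))
```

Using the dependency arc rather than adjacency means an adverb only scales the word it actually modifies, which is the kind of modifying relationship the abstract refers to.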
Funding: Supported by the Jiangsu Province "333" Project (BRA2020418), the NSFC under Grant Number 71901215, the National University of Defense Technology Research Project (ZK20-46), the Outstanding Young Talents Program of National University of Defense Technology, and the National University of Defense Technology Youth Innovation Project.
Abstract: The joint extraction of entities and their relations from text plays a significant role in most natural language processing tasks. For entity and relation extraction in a specific domain, we propose a hybrid neural framework consisting of two parts: a span-based model and a graph-based model. Compared with BILOU tagging methods, the span-based model can handle overlapping entities, whereas the graph-based model treats relation prediction as graph classification. Our main contribution is to incorporate external lexical and syntactic knowledge of a specific domain, such as domain dictionaries and dependency structures from texts, into end-to-end neural models. We conducted extensive experiments on a Chinese military entity and relation extraction corpus. The results show that the proposed framework outperforms the baselines in both entity and relation prediction. The proposed method provides insight into the problems of jointly extracting entities and their relations.
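Illustrative sketch (not from the paper): the abstract notes that a span-based model can handle overlapping entities where BILOU tagging cannot. The reason is that a flat tag sequence assigns one label per token, while a span model scores every candidate span independently. The code below only enumerates candidate spans and checks overlap; the function names and the toy sentence are assumptions, and no classifier is included.

```python
def enumerate_spans(tokens, max_len):
    """All (start, end) spans, end exclusive, covering at most max_len tokens."""
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1)
    ]

def overlaps(a, b):
    """True if two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

tokens = ["Bank", "of", "China", "Tower"]
spans = enumerate_spans(tokens, max_len=4)

# Two nested mentions: "Bank of China" and "China". Both are candidate spans,
# so a span classifier could label each one separately.
outer, inner = (0, 3), (2, 3)
print(outer in spans, inner in spans, overlaps(outer, inner))
```

The cost of this flexibility is that the number of candidates grows with sentence length times `max_len`, which is why span-based models usually cap the span width.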
Abstract: Assembly process documents record the designers' intentions and knowledge. However, common knowledge extraction methods are not well suited to assembly process documents because of their tabular form and unstructured natural language text. In this paper, we propose an assembly semantic entity recognition and relation construction method for assembly process documents. First, the assembly process sentences are extracted from the table through region-of-concern recognition and cell division, and are stored as a key-value object file. Then, the semantic entities in each sentence are identified by a sequence tagging model based on an attention mechanism specific to the assembly operation type. Syntactic rules are designed to automatically construct the relations between entities. Finally, experiments on a self-constructed corpus show that the sequence tagging model in the proposed method performs better than mainstream named entity recognition models on assembly process design language. The effectiveness of the proposed method is also analyzed through a simulation experiment in a small-scale real scenario, in comparison with the manual method. The results show that the proposed method can help designers accumulate knowledge automatically and efficiently.
Funding: The National Natural Science Foundation of China (Grant Nos. 61602160 and 61672211).
Abstract: Syntactic and semantic parsing, a primary topic in the natural language processing community, has been investigated for decades. This article gives a brief survey of the topic. Parsing covers many tasks, which are difficult to cover fully. Here we focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing. Constituent parsing mainly targets syntactic analysis, while dependency parsing can handle both syntactic and semantic analysis. This article briefly reviews representative models of constituent parsing and dependency parsing, as well as dependency graph parsing with rich semantics. In addition, we review closely related topics such as cross-domain, cross-lingual, and joint parsing models, parser applications, and corpus development for parsing.
Funding: Supported by the National Natural Science Foundation of China (No. 61503248) and the Major Program of the Science and Technology Commission of Shanghai Municipality (No. 17JC1404102).
Abstract: Discriminative approaches have shown their effectiveness in unsupervised dependency parsing. However, because of their strong representational power, discriminative approaches tend to converge quickly to poor local optima during unsupervised training. In this paper, we tackle this problem by drawing inspiration from robust deep learning techniques. Specifically, we propose robust unsupervised discriminative dependency parsing, a framework that integrates the concepts of denoising autoencoders and conditional random field autoencoders. Within this framework, we propose two types of sentence corruption mechanisms as well as a posterior regularization method for robust training. We tested our methods on eight languages, and the results show that they lead to significant improvements over previous work.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61273320, 61331011, and 61070123, and the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011102.
Abstract: This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how EEs can better benefit syntactic parsing of Chinese via re-categorization from the syntactic perspective. Then, we propose two ways to recover EEs automatically: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, a problem that deserves more attention in the future given its importance.
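Illustrative sketch (not from the paper): the abstract reports a gain in F1-measure without stating the scoring details. The code below computes F1 as the harmonic mean of precision and recall over labeled brackets, assuming a PARSEVAL-style match on (label, start, end) triples. The function name, the set-based simplification (duplicate brackets are ignored), and the toy brackets are assumptions.

```python
def bracket_f1(gold, predicted):
    """F1 over labeled brackets given as (label, start, end) triples."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Toy trees: the parser gets three of four brackets right and mislabels one.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(bracket_f1(gold, pred))
```

On a 0 to 100 scale, a difference of 1.29 such as the one reported above corresponds to 0.0129 in the value this function returns.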