Multimodal action recognition methods have achieved great success using the pose and RGB modalities. However, skeleton sequences lack appearance depiction and RGB images suffer from irrelevant noise due to modality limitations. To address this, the authors introduce the human parsing feature map as a novel modality, since it can selectively retain effective semantic features of the body parts while filtering out most irrelevant noise. The authors propose a new dual-branch framework called the Ensemble Human Parsing and Pose Network (EPP-Net), which is the first to leverage both skeleton and human parsing modalities for action recognition. The human pose branch feeds robust skeletons into a graph convolutional network to model pose features, while the human parsing branch leverages depictive parsing feature maps to model parsing features via convolutional backbones. The two high-level features are effectively combined through a late fusion strategy for better action recognition. Extensive experiments on the NTU RGB+D and NTU RGB+D 120 benchmarks consistently verify the effectiveness of the proposed EPP-Net, which outperforms existing action recognition methods. Our code is available at https://github.com/liujf69/EPP-Net-Action.
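To make the late-fusion step concrete, here is a minimal sketch in which each branch is assumed to output per-class logits; the fusion weight alpha and the toy shapes are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def late_fuse(pose_logits, parsing_logits, alpha=0.6):
    """Weighted late fusion of the two branch outputs.

    pose_logits: (batch, classes) from the GCN pose branch;
    parsing_logits: (batch, classes) from the CNN parsing branch.
    alpha is an assumed fusion weight, not the paper's tuned value.
    """
    fused = alpha * softmax(pose_logits) + (1 - alpha) * softmax(parsing_logits)
    return fused.argmax(axis=-1)

# toy usage: 2 samples over the 60 NTU RGB+D action classes
rng = np.random.default_rng(0)
print(late_fuse(rng.normal(size=(2, 60)), rng.normal(size=(2, 60))))
```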
Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve the performance of applications. With existing Document Object Model (DOM) based parsing, performance degrades due to sequential processing and large memory requirements, calling for an efficient XML parser to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different cores and multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced through the code generation phase. The RXPF framework shows a significant performance improvement, varying from 9.54% to 32.34%, over other existing models used for parsing XML files.
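A minimal sketch of the parallel-parsing idea: assuming the file has already been divided into n well-formed fragments (the splitting itself is the nontrivial part of PXTG), each fragment is parsed into its own tree in a separate process. The fragment summary returned here is an illustrative choice.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor

def parse_fragment(fragment: str) -> int:
    """Parse one well-formed XML fragment into a subtree and return a
    picklable summary (here, its element count)."""
    root = ET.fromstring(fragment)
    return sum(1 for _ in root.iter())

def parallel_parse(fragments, workers=4):
    # One subtree per fragment, each parsed in its own process.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_fragment, fragments))

if __name__ == "__main__":
    frags = ["<a><b/><b/></a>", "<a><b><c/></b></a>"]
    print(parallel_parse(frags))  # element count of each parallel-built tree
```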
In this paper, we present a modular incremental statistical model for English full parsing. Unlike other full parsing approaches in which the analysis of the sentence is a uniform process, our model separates full parsing into shallow parsing and sentence skeleton parsing. In shallow parsing, we perform POS tagging, base NP identification, prepositional phrase attachment, and subordinate clause identification. In skeleton parsing, we use a layered, feature-oriented statistical method. Modularity has the advantage of solving different problems in parsing with corresponding mechanisms. Feature-oriented rules can express complex lingual phenomena at the key points where needed. Evaluated on the Penn Treebank corpus, we obtained 89.2% precision and 89.8% recall.
Information content security is a branch of cyberspace security. How to effectively manage and use Weibo comment information has become a research focus in the field of information content security. The three main tasks involved are emotion sentence identification and classification, emotion tendency classification, and emotion expression extraction. Building on the latent Dirichlet allocation (LDA) model, a Gibbs sampling implementation for inference of our algorithm is presented, which can be used to categorize emotion tendency automatically. To address the low recall of emotion expression extraction on Weibo, we use dependency parsing, divide the dependencies into two categories (subject and object), summarize six kinds of dependency models between evaluated objects and emotion words, and propose a merge algorithm for evaluated objects. The method was evaluated by participating in a public bakeoff and ranked among the best methods in the shared tasks' emotion expression extraction sub-task, indicating that it is not only innovative but practical.
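To illustrate the dependency-model idea, the sketch below extracts (evaluation object, emotion word) pairs from dependency triples using two simplified patterns, one subject-side and one object-side; these stand in for the six dependency models in the paper and are not the authors' exact rules.

```python
def extract_pairs(triples):
    """Extract (evaluation object, emotion word) pairs from dependency
    triples (head, relation, dependent); two simplified patterns only."""
    pairs = []
    for head, rel, dep in triples:
        if rel == "nsubj":      # subject pattern: "the screen is great"
            pairs.append((dep, head))
        elif rel == "dobj":     # object pattern: "I love the camera"
            pairs.append((dep, head))
    return pairs

triples = [("great", "nsubj", "screen"), ("love", "dobj", "camera")]
print(extract_pairs(triples))  # [('screen', 'great'), ('camera', 'love')]
```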
Natural language parsing is a task of great importance and extreme difficulty. In this paper, we present a full Chinese parsing system based on a two-stage approach. Rather than identifying all phrases with a uniform model, we utilize a divide-and-conquer strategy. We propose an effective and fast method based on a Markov model to identify base phrases. We then make the first attempt to extend one of the best English parsing models, the head-driven model, to recognize Chinese complex phrases. Our two-stage approach is superior to the uniform approach in two respects. First, it creates synergy between the Markov model and the head-driven model. Second, it reduces the complexity of full Chinese parsing and makes the parsing system space- and time-efficient. Evaluated with PARSEVAL measures on an open test set, the parsing system achieves 87.53% precision and 87.95% recall.
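A minimal sketch of Markov-model base-phrase identification: Viterbi decoding over BIO chunk tags with hand-set transition probabilities. All probabilities here are illustrative, not corpus-estimated.

```python
import numpy as np

TAGS = ["B-NP", "I-NP", "O"]
# Hand-set transition probabilities P(tag_t | tag_{t-1}); illustrative only.
trans = np.log(np.array([
    [0.3, 0.6, 0.1],     # from B-NP
    [0.3, 0.5, 0.2],     # from I-NP
    [0.6, 1e-4, 0.4],    # from O (O -> I-NP is near-impossible)
]))
start = np.log(np.array([0.7, 1e-4, 0.3]))

def viterbi(emissions):
    """emissions: (T, K) log P(word_t | tag_k). Returns the best tag path."""
    T, K = emissions.shape
    score = start + emissions[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + emissions[t][None, :]  # (K, K)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

# toy emission scores for a 4-word sentence
rng = np.random.default_rng(1)
print(viterbi(np.log(rng.uniform(0.1, 1.0, size=(4, 3)))))
```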
Head-driven statistical models for natural language parsing are the most representative lexicalized syntactic parsing models, but they only utilize semantic dependency between words and do not incorporate other semantic information such as semantic collocation and semantic category. Some improvements to this distinctive parser are presented. First, valency is an essential semantic feature of words: once the valency of a word is determined, its collocations are clear and the sentence structure can be directly derived. Thus, a syntactic parsing model combining valence structure with semantic dependency is proposed on the basis of head-driven statistical syntactic parsing models. Second, semantic role labeling (SRL) is necessary for deep natural language processing, so an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Experiments are conducted with the refined statistical parser. The results show that 87.12% precision and 85.04% recall are obtained, and the F-measure is improved by 5.68% compared with the head-driven parsing model introduced by Collins.
This paper proposes a new way to improve the performance of a dependency parser: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into a lexicalized parsing model. First, the scheme of verb subdivision is described. Second, a maximum entropy model is presented to distinguish verb subclasses. Finally, a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a positive influence on parsing performance.
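A maximum entropy model of this kind can be sketched as multinomial logistic regression over contextual features; the feature template, verb subclass labels, and toy data below are illustrative assumptions, not the paper's setup.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(prev_word, verb, next_word):
    """Contextual features for one verb occurrence; template is illustrative."""
    return {"prev": prev_word, "verb": verb, "next": next_word}

# Toy training data with two hypothetical verb subclasses.
X = [features("he", "is", "teacher"), features("they", "discuss", "plan"),
     features("she", "became", "doctor"), features("we", "debate", "issue")]
y = ["copula", "transitive", "copula", "transitive"]

# Multinomial logistic regression is the standard maximum entropy classifier.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([features("it", "is", "table")]))  # expected: ['copula']
```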
Due to the lack of long-range association and spatial location information, fine details and accurate boundaries of complex clothing images cannot always be obtained by existing deep learning-based methods. This paper presents a convolutional structure with multi-scale fusion to optimize the clothing feature extraction step, and a self-attention module to capture long-range association information. The structure enables the self-attention mechanism to participate directly in the process of information exchange through the down-scaling projection operation of the multi-scale framework. In addition, the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images. Experimental results on the colorful fashion parsing dataset (CFPD) show that the proposed network structure achieves 53.68% mean intersection over union (mIoU) and performs better on the clothing parsing task.
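The following sketch shows single-head self-attention over a flattened feature map with a learned 2-D relative position bias, in the spirit of the module described above; the shapes, the bias-table layout, and the single-head setting are illustrative simplifications.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_2d_relpos(x, H, W, Wq, Wk, Wv, rel_bias):
    """Single-head self-attention over an H*W feature map with a
    2-D relative position bias (illustrative shapes).

    x: (H*W, d) flattened feature map; Wq/Wk/Wv: (d, d) projections;
    rel_bias: ((2H-1)*(2W-1),) bias table indexed by relative offset.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                    # (HW, HW)
    ys, xs = np.divmod(np.arange(H * W), W)          # 2-D coords of tokens
    dy = ys[:, None] - ys[None, :] + (H - 1)         # shift offsets to >= 0
    dx = xs[:, None] - xs[None, :] + (W - 1)
    logits += rel_bias[dy * (2 * W - 1) + dx]        # add positional bias
    return softmax(logits) @ v

# toy usage on a 4x4 feature map with d=8
rng = np.random.default_rng(0)
H = W = 4; d = 8
out = attention_with_2d_relpos(
    rng.normal(size=(H * W, d)), H, W,
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
    rng.normal(size=((2 * H - 1) * (2 * W - 1),)) * 0.1)
print(out.shape)  # (16, 8)
```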
Clothing parsing, also known as clothing image segmentation, is the problem of assigning a clothing category label to each pixel in clothing images. To address the lack of positional and global priors in existing clothing parsing algorithms, this paper proposes an enhanced positional attention module (EPAM) to collect positional information in the vertical direction of each pixel, and an efficient global prior module (GPM) to aggregate contextual information from different sub-regions. The EPAM- and GPM-based residual network (EG-ResNet) can effectively exploit the intrinsic features of clothing images while capturing information across different scales and sub-regions. Experimental results show that the proposed EG-ResNet achieves promising performance in clothing parsing on the colorful fashion parsing dataset (CFPD) (51.12% mean Intersection over Union (mIoU) and 92.79% pixel-wise accuracy (PA)) compared with other state-of-the-art methods.
Currently, large amounts of information exist on Web sites and in various digital media, most of it in natural language. Such information is easy to browse but difficult for computers to understand. Chunk parsing and entity relation extraction are important for understanding information semantics in natural language processing. Chunk analysis is a shallow parsing method, and entity relation extraction is used to establish relationships between entities. Because full syntactic parsing of Chinese text is complex, many researchers are more interested in chunk analysis and relation extraction. The conditional random field (CRF) model is a valid probabilistic model for segmenting and labeling sequence data. This paper models the chunking and entity relation problems in Chinese text; by transforming them into labeling problems, we can use CRFs to realize chunk analysis and entity relation extraction.
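A typical way to apply CRFs here is to describe each token by a feature dictionary built from local word and POS context; the template below is a common illustrative choice (feature dicts of this form can be fed to CRF toolkits such as sklearn-crfsuite), not the paper's exact feature set.

```python
def token_features(sent, i):
    """Feature dictionary for token i of a (word, POS) sequence.

    The template is a common illustrative choice for CRF chunking.
    """
    word, pos = sent[i]
    feats = {"bias": 1.0, "word": word, "word.isdigit": word.isdigit(), "pos": pos}
    if i > 0:
        feats["-1:word"], feats["-1:pos"] = sent[i - 1]
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["+1:word"], feats["+1:pos"] = sent[i + 1]
    else:
        feats["EOS"] = True
    return feats

sent = [("浦东", "NR"), ("开发", "NN"), ("与", "CC"), ("建设", "NN")]
print(token_features(sent, 1))
```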
Video event recognition is a challenging task in high-level understanding of video sequences. At present, there are two major limitations in existing methods for event recognition. One is that no algorithms are available to recognize events which happen alternately. The other is that the temporal relationship between atomic actions is not fully utilized. Aiming at these problems, an algorithm based on an extended stochastic context-free grammar (SCFG) representation is proposed for event recognition. Events are modeled by a series of atomic actions and represented by an extended SCFG. The extended SCFG can express the hierarchical structure of the events and the temporal relationship between the atomic actions. In comparison with previous work, the main contributions of this paper are as follows: ① events (including alternating events) can be recognized by an improved stochastic parsing and shortest-path-finding algorithm; ② the algorithm can disambiguate the detection results of atomic actions using event context. Experimental results show that the proposed algorithm can recognize events accurately and that most atomic action detection errors can be corrected simultaneously.
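The stochastic parsing step can be illustrated with probabilistic CYK over a toy SCFG in Chomsky normal form; the grammar, atomic actions, and rule probabilities below are invented for illustration and are far simpler than the paper's extended SCFG.

```python
import math
from collections import defaultdict

# Toy SCFG in Chomsky normal form; probabilities are illustrative.
RULES = {
    ("EVENT", ("APPROACH", "PICKUP")): 0.6,   # EVENT -> APPROACH PICKUP
    ("EVENT", ("EVENT", "LEAVE")): 0.4,       # EVENT -> EVENT LEAVE
}
LEXICON = {"approach": "APPROACH", "pickup": "PICKUP", "leave": "LEAVE"}

def best_parse_logprob(actions):
    """Log probability of the best EVENT parse of a detected
    atomic-action sequence (probabilistic CYK)."""
    n = len(actions)
    chart = defaultdict(lambda: -math.inf)     # (i, j, symbol) -> log prob
    for i, a in enumerate(actions):
        chart[(i, i + 1, LEXICON[a])] = 0.0    # log(1.0)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (lhs, (b, c)), p in RULES.items():
                    s = chart[(i, k, b)] + chart[(k, j, c)] + math.log(p)
                    if s > chart[(i, j, lhs)]:
                        chart[(i, j, lhs)] = s
    return chart[(0, n, "EVENT")]

print(best_parse_logprob(["approach", "pickup", "leave"]))  # log(0.6 * 0.4)
```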
A fast method for phrase structure grammar analysis is proposed based on conditional random fields (CRF). The method trains several CRF classifiers for recognizing phrase nodes at different levels, and connects the recognized phrase nodes bottom-up to construct the syntactic tree. On the basis of the Beijing Forest Studio Chinese tagged corpus, two experiments are designed to select the training parameters and verify the validity of the method. The results show that the method takes 78.98 ms to train on and 4.63 ms to test on a Chinese sentence of 17.9 words on average. The method is a new way to parse phrase structure grammar for Chinese, and has good generalization ability and fast speed.
The present work proposes a solution for automating updates of the radio parameters in the ATOLL database from the OSS NetAct using parsing. This solution will be operated by the RAN (Radio Access Network) service of mobile operators, which handles the planning and optimization of network coverage. The overall objective of this study is to keep the physical data of the sites deployed in the field synchronized with the ATOLL database, which contains all the coverage data of the operators' mobile networks. We have built an application that automates updates with the following functionalities: import of radio parameters with the parsing method we have defined, visualization of the data, and its export to the ATOLL database template. Tests and validation of our application, developed for a 4G network, yielded a solution that performs updates with a constraint on the size of the data to be imported. Our solution is a reliable resource for updating the databases containing the network's radio parameters at any mobile operator, subject to a limitation on the volume of data to be imported.
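A heavily simplified sketch of the import step: parse an XML export of cell parameters and write them to a CSV template. Every tag and column name here (cell, pci, earfcn) is hypothetical; a real NetAct export and the ATOLL template use their own schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export fragment; real NetAct exports use their own schema.
DEMO = '<export><cell id="A1"><pci>101</pci><earfcn>1300</earfcn></cell></export>'

def export_cells(root, out):
    """Write each cell's radio parameters to a CSV template."""
    writer = csv.writer(out)
    writer.writerow(["cell_id", "pci", "earfcn"])   # hypothetical columns
    for cell in root.iter("cell"):                  # hypothetical tag
        writer.writerow([cell.get("id"),
                         cell.findtext("pci"),
                         cell.findtext("earfcn")])

buf = io.StringIO()
export_cells(ET.fromstring(DEMO), buf)
print(buf.getvalue())
```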
Frame-Semantic Parsing (FSP) aims to extract frame-semantic structures from text. The task usually involves three subtasks performed sequentially: Target Identification (TI), Frame Identification (FI), and Frame Semantic Role Labeling (FSRL). The three subtasks are closely related, yet most previous studies model them individually, encountering error propagation and efficiency problems. Recently, an end-to-end graph-based model was proposed to process the three subtasks jointly in one model. However, it still suffers from three problems: insufficient semantic modeling between targets and arguments, span missing, and a lack of knowledge incorporation from FrameNet. To address these problems, this paper presents an End-to-end FSP model with Table Encoder (EFSP-TE), which models FSP as two semantically dependent region classification problems and extracts frame-semantic structures from sentences in a one-step manner. Specifically, EFSP-TE incorporates lexical unit knowledge into the context encoder via saliency embedding, and develops an effective table representation learning method based on a biaffine network and multi-layer ResNet-style CNNs (convolutional neural networks), which can fully exploit word-to-word interactions and capture various levels of semantic relations between targets and arguments. In addition, it adopts two separate region-based modules to obtain potential targets and arguments, followed by two interactive classification modules to predict the frames and roles for the potential targets and arguments. Experiments on two public benchmarks show that the proposed approach achieves state-of-the-art performance in the end-to-end setting.
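The biaffine scoring at the heart of the table encoder can be sketched as follows; the shapes and the single score table are illustrative simplifications of EFSP-TE's actual design.

```python
import numpy as np

def biaffine_scores(h_t, h_a, U, W, b):
    """Biaffine word-to-word scoring.

    h_t: (n, d) target-side vectors; h_a: (n, d) argument-side vectors;
    U: (d, d) bilinear term; W: (2d,) linear term; b: scalar bias.
    score[i, j] = h_t[i]^T U h_a[j] + W . [h_t[i]; h_a[j]] + b
    """
    d = h_t.shape[1]
    bilinear = h_t @ U @ h_a.T                               # (n, n)
    linear = (h_t @ W[:d])[:, None] + (h_a @ W[d:])[None, :]
    return bilinear + linear + b

rng = np.random.default_rng(0)
n, d = 5, 16
table = biaffine_scores(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                        rng.normal(size=(d, d)), rng.normal(size=(2 * d,)), 0.0)
print(table.shape)  # (5, 5) score table over word pairs
```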
The human-centric visual analysis field thrives on rich video datasets that explore human behaviours and interactions. Yet a gap persists in datasets covering both human pose estimation and parsing challenges. In this study, a dedicated dataset named "Single Person Video-in-Person (SP-VIP)" has been developed to suit this research scenario, resolving the lack of a universal dataset supporting three major human-centric visual analysis methods. The SP-VIP dataset was derived by extracting videos from the VIP dataset, which was initially designed exclusively for parsing-related tasks and made no provision for pose estimation and human activity recognition, both crucial elements of human activity analysis. To bridge this gap, the SP-VIP dataset was meticulously curated with a specific focus on single-person activities. Videos in the new dataset are split into frames with semantic labels and joint values for each frame. To assess the tailored dataset, a novel architecture, the Single-person Parsing and Pose Network (SPPNet), was employed, using a deep ConvNet for parsing while simultaneously performing pose estimation with the stacked hourglass method. Extensive experiments on this architecture produced favourable results, with a pixel accuracy of 88.50%, a mean accuracy of 60.50%, and a mean Intersection over Union (IoU) of 49.30%, signifying enhanced performance.
Syntactic and semantic parsing has been investigated for decades and is a primary topic in the natural language processing community. This article provides a brief survey of this topic. The parsing community covers many tasks, which are difficult to cover fully; here we focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing. Constituent parsing mainly targets syntactic analysis, while dependency parsing can handle both syntactic and semantic analysis. This article briefly reviews the representative models of constituent parsing and dependency parsing, as well as dependency graph parsing with rich semantics. Besides, we also review closely related topics such as cross-domain, cross-lingual, and joint parsing models, parser applications, and corpus development for parsing.
Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given that most of the research remains fragmented, we conduct an in-depth study of the advanced work from the new perspective of learning the part relationship. From this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we identify several promising lines of research in fine-grained visual parsing for future study.
We present a novel framework, CLIP-SP, and a novel adaptive prompt method to leverage pre-trained knowledge from CLIP for scene parsing. Our approach addresses the limitations of DenseCLIP, which demonstrates that CLIP pre-trained models provide superior image segmentation compared with ImageNet pre-trained models, but struggles with rough pixel-text score maps for complex scene parsing. We argue that, as they contain all the textual information in a dataset, the pixel-text score maps, i.e., dense prompts, are inevitably mixed with noise. To overcome this challenge, we propose a two-step method. First, we extract visual and language features and perform multi-label classification to identify the most likely categories in the input image. Second, based on the top-k categories and their confidence scores, our method generates scene tokens, which can be treated as adaptive prompts for implicit modeling of scenes, and incorporates them into the visual features fed into the decoder for segmentation. Our method imposes a constraint on the prompts and suppresses the probability of irrelevant categories appearing in the scene parsing results. Our method achieves competitive performance, limited by the available visual-language pre-trained models. Our CLIP-SP performs 1.14% better (in terms of mIoU) than DenseCLIP on ADE20K, using a ResNet-50 backbone.
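A sketch of the adaptive-prompt step under stated assumptions: sigmoid multi-label scores over all dataset categories, top-k selection, and confidence-weighted text embeddings used as scene tokens. The value of k and the weighting scheme are illustrative, not the paper's exact choices.

```python
import numpy as np

def scene_tokens(image_logits, text_embeds, k=5):
    """Build adaptive prompts from the most likely scene categories.

    image_logits: (C,) multi-label logits over all dataset categories;
    text_embeds: (C, d) text features for the category names.
    Returns (k, d) confidence-weighted scene tokens and the chosen ids.
    """
    probs = 1.0 / (1.0 + np.exp(-image_logits))   # multi-label sigmoid
    top = np.argsort(probs)[-k:][::-1]            # top-k category ids
    return probs[top, None] * text_embeds[top], top

rng = np.random.default_rng(0)
tokens, cats = scene_tokens(rng.normal(size=(150,)),   # e.g. ADE20K's 150 classes
                            rng.normal(size=(150, 512)))
print(tokens.shape, cats)  # (5, 512) scene tokens and their category ids
```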
This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how EEs can better benefit syntactic parsing of Chinese via re-categorization from the syntactic perspective. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, which deserves more attention in the future given its specific importance.
Due to the need for lightweight and efficient network models, deploying semantic segmentation models on mobile robots (MRs) is a formidable task. The fundamental limitations lie in the training performance, the ability to effectively exploit the dataset, and the ability to adapt to complex environments when deploying the model. By utilizing knowledge distillation techniques, this article strives to overcome the above challenges, inheriting the advantages of both the teacher model and the student model. More precisely, the characteristics of a ResNet152-PSP-Net teacher are used to train a ResNet18-PSP-Net student. Pyramid pooling blocks are utilized to decode multi-scale feature maps, producing a complete semantic map inference. The student model not only preserves the strong segmentation performance of the teacher model but also improves the inference speed of the predictions. The proposed method exhibits a clear advantage over conventional convolutional neural network (CNN) models, as evident from the conducted experiments. Furthermore, the proposed model shows remarkable improvement in processing speed compared with lightweight models such as MobileNetV2 and EfficientNet in terms of latency and throughput. The proposed KD-SegNet model obtains an accuracy of 96.3% and an mIoU (mean Intersection over Union) of 77%, outperforming existing models by more than 15% on the same training dataset. The suggested method's average training time is only about 0.51 times that of models in the same field, while still achieving comparable segmentation performance. The semantic segmentation frames are then collected, forming the motion trajectory for the system in the environment. Overall, this architecture shows great promise for the development of knowledge-based systems for MR navigation.
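The distillation objective can be sketched with the standard soft-target loss (temperature-scaled cross-entropy against the teacher plus hard-label cross-entropy); whether KD-SegNet uses exactly this per-pixel form is an assumption, and the temperature and mixing weight below are illustrative.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target distillation loss plus hard-label cross-entropy.

    student_logits, teacher_logits: (N, C); labels: (N,) class ids.
    The temperature T and mixing weight alpha are illustrative.
    """
    p_teacher = np.exp(log_softmax(teacher_logits / T))
    # Cross-entropy against the softened teacher (KL up to a constant),
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = -(p_teacher * log_softmax(student_logits / T)).sum(-1).mean() * T * T
    ce = -log_softmax(student_logits)[np.arange(len(labels)), labels].mean()
    return alpha * kd + (1 - alpha) * ce

rng = np.random.default_rng(0)
print(distillation_loss(rng.normal(size=(8, 19)), rng.normal(size=(8, 19)),
                        rng.integers(0, 19, size=8)))
```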