Funding: This work was partially supported by the National Natural Science Foundation of China (Grant No. 61502082) and the National Key R&D Program of China (Grant No. 2018YFA0306703).
Abstract: In recent years, many text summarization models based on pretraining methods have achieved very good results. However, in these models, semantic deviations easily occur between the original input representation and the representation produced by the multi-layer encoder, which may result in inconsistencies between the generated summary and the source text. Bidirectional Encoder Representations from Transformers (BERT) improves the performance of many tasks in Natural Language Processing (NLP). Although BERT has a strong capability to encode context, it lacks fine-grained semantic representation. To solve these two problems, we proposed a semantic supervision method based on a Capsule Network. First, we extracted the fine-grained semantic representations of both the input and the encoded result of BERT using a Capsule Network. Second, we used the fine-grained semantic representation of the input to supervise that of the encoded result. We then evaluated our model on a popular Chinese social media dataset (LCSTS), and the results showed that our model achieved higher ROUGE scores (including R-1 and R-2) and outperformed baseline systems. Finally, we conducted a comparative study of model stability, and the experimental results showed that our model was more stable.
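The capsule-based supervision described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the `squash` nonlinearity is the standard capsule operation from the capsule-network literature, while `semantic_supervision_loss` is a hypothetical MSE-style signal between the capsule representation of the raw input embeddings and that of the encoder output.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule 'squash' nonlinearity: shrinks each vector's length into
    [0, 1) while preserving its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

def semantic_supervision_loss(input_caps, encoded_caps):
    """Hypothetical supervision signal: mean squared error between the
    capsule representation of the input and that of the encoded result,
    encouraging the encoder to preserve fine-grained semantics."""
    return float(np.mean((input_caps - encoded_caps) ** 2))

# Toy example: 4 capsules of dimension 8 for each representation,
# standing in for the capsule outputs of the input and the encoder.
rng = np.random.default_rng(0)
input_caps = squash(rng.normal(size=(4, 8)))
encoded_caps = squash(rng.normal(size=(4, 8)))
loss = semantic_supervision_loss(input_caps, encoded_caps)
```

In training, such a loss term would be added to the usual summarization objective, penalizing the encoder whenever its output drifts semantically from the input representation.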
Funding: This work received funding from the following sources: National Natural Science Foundation of China (U1904119), Research Programs of Henan Science and Technology Department (232102210054), Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070), Henan Province Key Research and Development Project (231111212000), Aviation Science Foundation (20230001055002), and the Henan Center for Outstanding Overseas Scientists (GZS2022011).
Abstract: The primary challenge in weakly supervised semantic segmentation is effectively leveraging weak annotations while minimizing the performance gap relative to fully supervised methods. End-to-end model designs have gained significant attention for improving training efficiency. Most current algorithms rely on Convolutional Neural Networks (CNNs) for feature extraction. Although CNNs are proficient at capturing local features, they often struggle with global context, leading to incomplete and false Class Activation Mapping (CAM). To address these limitations, this work proposes a Contextual Prototype-Based End-to-End Weakly Supervised Semantic Segmentation (CPEWS) model, which improves feature extraction by utilizing the Vision Transformer (ViT). Incorporating the ViT's intermediate feature layers to preserve semantic information, this work introduces the Intermediate Supervised Module (ISM) to supervise the final layer's output, reducing boundary ambiguity and mitigating incomplete activation. Additionally, the Contextual Prototype Module (CPM) generates class-specific prototypes, while the proposed Prototype Discrimination Loss (LPDL) and Superclass Suppression Loss (LSSL) guide the network's training, effectively addressing false activation without the need for extra supervision. The proposed CPEWS model achieves state-of-the-art performance in end-to-end weakly supervised semantic segmentation without additional supervision. On the PASCAL VOC 2012 dataset, it achieved Mean Intersection over Union (MIoU) scores of 69.8% on the validation set and 72.6% on the test set; compared with ToCo (pretrained with ImageNet-1k weights), its MIoU on the test set is 2.1% higher. In addition, it reached 41.4% MIoU on the validation set of the MS COCO 2014 dataset.
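The MIoU metric reported above is the standard evaluation measure for semantic segmentation. A minimal NumPy sketch follows; the toy label maps are invented for illustration and do not come from either dataset:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union: per-class IoU (intersection of
    predicted and ground-truth pixels divided by their union), averaged
    over classes that appear in either map."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 4x4 segmentation maps with 3 classes.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [2, 2, 2, 2],
               [2, 2, 2, 2]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [2, 2, 2, 2],
                 [2, 2, 0, 2]])
miou = mean_iou(pred, gt, num_classes=3)
```

Benchmark implementations typically accumulate a confusion matrix over the whole validation set before averaging, but the per-class intersection/union logic is the same as shown here.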