Multimodal sentence summarization(MMSS)is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image.Although existing methods have gained promising success in MM...Multimodal sentence summarization(MMSS)is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image.Although existing methods have gained promising success in MMSS,they overlook the powerful generation ability of generative pre-trained language models(GPLMs),which have shown to be effective in many text generation tasks.To fill this research gap,we propose to using GPLMs to promote the performance of MMSS.Notably,adopting GPLMs to solve MMSS inevitably faces two challenges:1)What fusion strategy should we use to inject visual information into GPLMs properly?2)How to keep the GPLM′s generation ability intact to the utmost extent when the visual feature is injected into the GPLM.To address these two challenges,we propose a vision enhanced generative pre-trained language model for MMSS,dubbed as Vision-GPLM.In Vision-GPLM,we obtain features of visual and textual modalities with two separate encoders and utilize a text decoder to produce a summary.In particular,we utilize multi-head attention to fuse the features extracted from visual and textual modalities to inject the visual feature into the GPLM.Meanwhile,we train Vision-GPLM in two stages:the vision-oriented pre-training stage and fine-tuning stage.In the vision-oriented pre-training stage,we particularly train the visual encoder by the masked language model task while the other components are frozen,aiming to obtain homogeneous representations of text and image.In the fine-tuning stage,we train all the components of Vision-GPLM by the MMSS task.Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.展开更多
文摘目的探讨CO2点阵激光对预防内眦赘皮矫正术后切口瘢痕增生的有效性。方法采用回顾性分析的方法,将162例接受内眦赘皮矫正术的患者,按照术后接受干预方法的不同分为治疗组(84例)和对照组(78例)。治疗组于拆线时(术后5~7 d),术后1、2个月分别应用CO2点阵激光对切口进行治疗,共治疗3次;对照组未行CO2点阵激光处理,仅进行常规护理。于拆线日(激光治疗前)及术后6个月应用改良曼彻斯特瘢痕量表(modified manchester scar scale,MMSS)对治疗组和对照组的瘢痕情况进行定量分析。结果治疗组及对照组均无明显术后并发症发生。拆线时(激光治疗前),两组MMSS总分差异无统计学意义(P>0.05);术后6个月(末次激光治疗后4个月),两组MMSS总分相对于治疗前均减少(P<0.05);治疗组MMSS总分相对于对照组明显减少(P<0.05);治疗组MMSS各子项目(VAS、瘢痕颜色、平整度和外形)评分均较对照组减少(P<0.05)。结论内眦赘皮矫正术后早期给予CO2点阵激光进行干预,可以有效预防内眦赘皮矫正术后切口的瘢痕增生。
文摘Multimodal sentence summarization(MMSS)is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image.Although existing methods have gained promising success in MMSS,they overlook the powerful generation ability of generative pre-trained language models(GPLMs),which have shown to be effective in many text generation tasks.To fill this research gap,we propose to using GPLMs to promote the performance of MMSS.Notably,adopting GPLMs to solve MMSS inevitably faces two challenges:1)What fusion strategy should we use to inject visual information into GPLMs properly?2)How to keep the GPLM′s generation ability intact to the utmost extent when the visual feature is injected into the GPLM.To address these two challenges,we propose a vision enhanced generative pre-trained language model for MMSS,dubbed as Vision-GPLM.In Vision-GPLM,we obtain features of visual and textual modalities with two separate encoders and utilize a text decoder to produce a summary.In particular,we utilize multi-head attention to fuse the features extracted from visual and textual modalities to inject the visual feature into the GPLM.Meanwhile,we train Vision-GPLM in two stages:the vision-oriented pre-training stage and fine-tuning stage.In the vision-oriented pre-training stage,we particularly train the visual encoder by the masked language model task while the other components are frozen,aiming to obtain homogeneous representations of text and image.In the fine-tuning stage,we train all the components of Vision-GPLM by the MMSS task.Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.