Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most existing methods address only one of these two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA methods in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.
Search engines are indispensable tools for finding information amidst massive collections of web pages and documents. A good search engine needs to retrieve information not only quickly, but also with high relevance to the users' queries. Most search engines respond to user queries quickly; however, they provide little guarantee of precision, even for highly detailed queries. In such cases, clustering documents by subject and content might improve search results. This paper presents a novel method of document clustering that uses semantic cliques. First, we extracted features from the documents. Then, we defined the associations between frequently co-occurring terms, which we call semantic cliques. Each connected component in the semantic clique represents a theme. The documents were clustered by theme, for which we designed an aggregation algorithm. We evaluated the effectiveness of the aggregation algorithm on four kinds of datasets. The results showed that the semantic-clique-based document clustering algorithm performed significantly better than traditional clustering algorithms such as Principal Direction Divisive Partitioning (PDDP), k-means, Auto-Class, and Hierarchical Clustering (HAC). We found that Semantic Clique Aggregation is a potential model for representing association rules in text and could be immensely useful for automatic document clustering.
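The theme-extraction idea in the abstract above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' aggregation algorithm: the co-occurrence threshold `min_cooccur`, the function names `cooccurrence_themes` and `assign_theme`, and the overlap-based assignment rule are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_themes(docs, min_cooccur=2):
    """Group terms into themes: connected components of a co-occurrence graph.

    docs is a list of term lists; min_cooccur is an assumed threshold.
    """
    # Count in how many documents each unordered term pair appears together.
    pair_counts = defaultdict(int)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1

    # Keep only edges between terms that co-occur often enough.
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the graph is treated as one theme.
    themes, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        themes.append(component)
    return themes


def assign_theme(doc_terms, themes):
    """Return the index of the theme sharing the most terms with a document, or None."""
    overlaps = [len(set(doc_terms) & theme) for theme in themes]
    if not overlaps or max(overlaps) == 0:
        return None
    return overlaps.index(max(overlaps))
```

In this sketch a document with no term in any theme is left unassigned (`None`); how the actual method handles such documents is not stated in the abstract.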
Semantic segmentation in street scenes is a crucial technology for autonomous driving to analyze the surrounding environment. In street scenes, issues such as high image resolution caused by large viewpoints and differences in object scales lead to a decline in real-time performance and difficulties in multi-scale feature extraction. To address this, we propose a bilateral-branch real-time semantic segmentation method based on semantic information distillation (BSDNet) for street scene images. BSDNet consists of a Feature Conversion Convolutional Block (FCB), a Semantic Information Distillation Module (SIDM), and a Deep Aggregation Atrous Convolution Pyramid Pooling (DASP). FCB reduces the semantic gap between the backbone and the semantic branch. SIDM extracts high-quality semantic information from the Transformer branch to reduce computational costs. DASP aggregates information lost in atrous convolutions, effectively capturing multi-scale objects. Extensive experiments on Cityscapes, CamVid, and ADE20K, including an accuracy of 81.7% Mean Intersection over Union (mIoU) at 70.6 Frames Per Second (FPS) on Cityscapes, demonstrate that our method achieves a better balance between accuracy and inference speed.
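For readers unfamiliar with the mIoU metric quoted in several of these abstracts, a minimal sketch follows. It assumes flat sequences of integer class labels and averages only over classes that occur in the prediction or the ground truth; the function name `mean_iou` is hypothetical. Benchmark evaluations typically accumulate intersections and unions over the whole dataset rather than per image, which this toy does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two equal-length label sequences.

    Classes absent from both pred and target are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.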
Neuronal soma segmentation plays a crucial role in neuroscience applications. However, fine structures such as boundaries, small-volume neuronal somata, and fibers are commonly present in cell images and pose a challenge for accurate segmentation. In this paper, we propose a 3D semantic segmentation network for neuronal soma segmentation to address this issue. Using an encoding-decoding structure, we introduce a Multi-Scale feature extraction and Adaptive Weighting fusion module (MSAW) after each encoding block. The MSAW module not only emphasizes the fine structures via an upsampling strategy, but also provides pixel-wise weights to measure the importance of the multi-scale features. Additionally, dynamic convolution is employed instead of normal convolution to better adapt the network to input data with different distributions. The proposed MSAW-based semantic segmentation network (MSAW-Net) was evaluated on three neuronal soma images from mouse brain and one neuronal soma image from macaque brain, demonstrating the efficiency of the proposed method. It achieved an F1 score of 91.8% on the Fezf2-2A-CreER dataset, 97.1% on the LSL-H2B-GFP dataset, 82.8% on the Thy1-EGFP-Mline dataset, and 86.9% on the macaque dataset, improving over the 3D U-Net model by 3.1%, 3.3%, 3.9%, and 2.3%, respectively.
Semantic segmentation for mixed scenes of aerial remote sensing and road traffic is one of the key technologies for visual perception of flying cars. State-of-the-art (SOTA) semantic segmentation methods have made remarkable achievements in both fine-grained segmentation and real-time performance. However, they still face great challenges when confronted with the huge differences in scale and semantic categories brought about by mixed scenes of aerial remote sensing and road traffic, and there is little related research. To address this issue, this paper proposes a semantic segmentation model specifically for mixed datasets of aerial remote sensing and road traffic scenes. First, a novel decoding-recoding multi-scale feature iterative refinement structure is proposed, which utilizes the re-integration and continuous enhancement of multi-scale information to effectively deal with the huge scale differences between cross-domain scenes, while using a fully convolutional structure to meet the lightweight and real-time requirements. Second, a well-designed cross-window attention mechanism combined with a global information integration decoding block forms an enhanced global context perception, which can effectively capture the long-range dependencies and multi-scale global context information of different scenes, thereby achieving fine-grained semantic segmentation. The proposed method is tested on a large-scale mixed dataset of aerial remote sensing and road traffic scenes. The results confirm that it can effectively deal with the large scale differences in cross-domain scenes. Its segmentation accuracy surpasses that of the SOTA methods while meeting the real-time requirements.
In recent years, with the continuous deepening of smart city construction, there have been significant changes and improvements in the field of intelligent transportation. The semantic segmentation of road scenes has important practical significance in the fields of automatic driving, transportation planning, and intelligent transportation systems. However, the current mainstream lightweight semantic segmentation models for road scenes suffer from problems such as poor segmentation of small targets and insufficiently refined segmentation edges. Therefore, this article proposes a lightweight semantic segmentation model that improves on the LiteSeg model to address these issues. The model uses the lightweight backbone network MobileNet instead of the LiteSeg backbone network to reduce the network parameters and computation, and incorporates the Coordinate Attention (CA) mechanism to help the network capture long-distance dependencies. At the same time, by combining the dependencies of spatial information and channel information, the Spatial and Channel Network (SCNet) attention mechanism is proposed to improve the feature extraction ability of the model. Finally, a multiscale transposed attention encoding (MTAE) module is proposed to obtain features of different resolutions and perform feature fusion. The proposed model is verified on the Cityscapes dataset. The experimental results show that adding the SCNet and MTAE modules increases the mean Intersection over Union (mIoU) of the original LiteSeg model by 4.69%. On this basis, the backbone network is replaced with MobileNet and the CA mechanism is added at the same time; with only a minimal increase in model parameters and computing cost, the mIoU of the original LiteSeg model is increased by 2.46%. This article also compares the proposed model with some current lightweight semantic segmentation models, and experiments show that the proposed model has the best overall performance, with especially strong results in small object segmentation. Finally, generalization testing of the proposed model is conducted on the KITTI dataset, and the experimental results show that the proposed algorithm has a certain degree of generalization.
Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 Mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. On the validation metrics, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
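As a brief illustration of the Dice Similarity Coefficient reported above, the sketch below computes DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences. The function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention rather than anything stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two equal-length binary masks (flat 0/1 sequences)."""
    # Foreground pixels shared by prediction and ground truth.
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])` has one shared foreground pixel and three foreground pixels in total, giving 2/3.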
We propose an Internet resource aggregation platform based on the semantic web. The platform includes a Web Ontology Language (OWL) ontology design toolkit (VO-Editor) and a selective inference algorithm engine, so that it supports visual ontology editing and uses novel selective reasoning for information aggregation. We introduce the VO-Editor and the principle of the selective inference algorithm. Finally, a budget travel system is used as a case study to illustrate how this platform aggregates Internet resources.
Rain streaks in an image appear in different sizes and orientations, resulting in severe blurring and visual quality degradation. Previous CNN-based algorithms have achieved encouraging deraining results, although they have certain limitations in describing rain streaks and restoring scene structures in different environments. In this paper, we propose an efficient multi-scale enhancement and aggregation network (MEAN) to solve the single-image deraining problem. Considering the importance of large receptive fields and multi-scale features, we introduce a multi-scale enhanced unit (MEU) to capture long-range dependencies and exploit features at different scales to depict rain. Simultaneously, an attentive aggregation unit (AAU) is designed to utilize the informative features in spatial and channel dimensions, thereby aggregating effective information and eliminating redundant features to retain rich scene details. To improve the deraining performance of the encoder-decoder network, we use an AAU to filter the information in the encoder network and concatenate the useful features to the decoder network, which is conducive to predicting high-quality clean images. Experimental results on synthetic datasets and real-world samples show that the proposed method achieves significantly better deraining performance than state-of-the-art approaches.
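[Objective] Video captioning aims to automatically generate natural-language sentences that accurately express the visual semantics of a video. Although encoder-decoder methods have advanced visual representation and language generation, video encoders struggle to model object-level motion and events, and decoders struggle to achieve cross-modal semantic alignment, which limits the quality of the generated text. To this end, a method that combines trajectory-based spatio-temporal perception with adaptive semantic focusing is proposed to strengthen object motion modeling and improve multimodal semantic alignment. [Methods] First, a point-trajectory-based visual feature aggregation method is proposed. Through spatio-temporal modeling, it generates trajectory features that capture both spatial appearance and temporal continuity, and fuses them with local motion features to improve object tracking and semantic coherence in scenes with motion and deformation. In addition, an unsupervised adaptive key-trajectory focusing learning method is designed. It exploits the dynamic information of dense point trajectories, adaptively selects key trajectories via attention weights, and introduces a focusing loss that guides the model to attend to key semantic regions first and suppress background interference, thereby improving cross-modal semantic association. [Results] In experiments on two public datasets, MSRVTT (Microsoft research video to text) and MSVD (Microsoft research video description corpus), the proposed method achieves CIDEr (consensus-based image description evaluation) scores of 61.2 and 130.1, respectively, significantly outperforming existing mainstream methods and verifying its effectiveness in description accuracy and semantic richness. Qualitative analysis shows that the method performs well in improving the temporal coherence and semantic expressiveness of descriptions. [Conclusions] The proposed method effectively improves the ability of video captioning models to model object semantic continuity in complex dynamic environments, and the unsupervised adaptive key-trajectory focusing learning method improves the attention mechanism's ability to associate video and text semantics.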
[Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the inadequate segmentation fineness of current tea spot segmentation models, a novel model for diagnosing the severity of tea spots, designated MDC-U-Net3+, was proposed in this research to enhance segmentation accuracy on the base framework of U-Net3+. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the backbone network of U-Net3+ to obtain feature information across multiple receptive fields of diseased spots, thereby reducing the loss of features within the encoder. Dual multi-scale attention (DMSA) was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue. This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale. Furthermore, the segmented mask image was post-processed with conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved model MDC-U-Net3+ achieved a mean pixel accuracy (mPA) of 94.92% and a mean Intersection over Union (mIoU) of 90.9%. Compared with the mPA and mIoU of U-Net3+, the MDC-U-Net3+ model showed improvements of 1.85 and 2.12 percentage points, respectively. These results illustrated a more effective segmentation performance than that achieved by other classical semantic segmentation models. [Conclusions] The methodology presented herein could provide data support for automated disease detection and precise medication, consequently reducing the losses associated with tea diseases.
Semantic refinement of stakeholders' requirements is a fundamental issue in requirements engineering. Faced with the on-demand collaboration problem among the heterogeneous, autonomous, and dynamic service resources on the Web, service requirements refinement becomes extremely important, and the key issue in service requirements refinement is semantic interoperability aggregation. A method for creating connecting ontologies driven by requirement sign ontology is proposed. Based on connecting ontologies, a method for semantic interoperability aggregation in requirements refinement is proposed. In addition, we find that the necessary condition for semantic interoperability is semantic similarity, and the sufficient condition is the coverability of the agreed mediation ontology. Based on this viewpoint, a metric framework for calculating semantic interoperability capability is proposed. This methodology can provide a semantic representation mechanism for refining users' requirements; meanwhile, since users' requirements on the Web usually originate from different domains, it can also provide semantic interoperability guidance for networked service discovery, and is an effective approach for the realization of on-demand service integration. The methodology will be beneficial in service-oriented software engineering and cloud computing.
Funding (biomedical semantic segmentation survey): Open Access funding provided by the National Institutes of Health (NIH). The funding for this project was provided by NCATS Intramural Fund.
Funding (ST-SIGMA): Basic and Advanced Research Projects of CSTC, Grant/Award Number: cstc2019jcyj-zdxmX0008; Science and Technology Research Program of Chongqing Municipal Education Commission, Grant/Award Numbers: KJQN202100634, KJZDK201900605; National Natural Science Foundation of China, Grant/Award Number: 62006065.
Funding (BSDNet): Supported in part by the National Natural Science Foundation of China [Grant number 62471075]; the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901]; and the Graduate Innovation Fund of Chongqing [gzlcx20253235].
Funding (MSAW-Net): Supported by the STI2030-Major-Projects (No. 2021ZD0200104) and the National Natural Science Foundations of China under Grant 61771437.
Funding (mixed aerial and road-scene segmentation): Supported by the National Key Research and Development of China (No. 2022YFB2503400).
Funding (LiteSeg-based lightweight model): The National Natural Science Foundation of China (No. 62063006); the Natural Science Foundation of Guangxi Province (No. 2023GXNSFAA026025); the Innovation Fund of Chinese Universities Industry-University-Research (ID: 2021RYC06005); the Research Project for Young and Middle-Aged Teachers in Guangxi Universities (ID: 2020KY15013); the Special Research Project of Hechi University (ID: 2021GCC028). Also supported by the Project of Outstanding Thousand Young Teachers' Training in Higher Education Institutions of Guangxi; Guangxi Colleges and Universities Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region.
Funding: This work was supported by the Project of Sichuan Outstanding Young Scientific and Technological Talents (19JCQN0003), the Major Project of the Education Department in Sichuan (17ZA0063 and 2017JQ0030), in part by the Natural Science Foundation for Young Scientists of CUIT (J201704), and the Sichuan Science and Technology Program (2019JDRC0077).
Abstract: Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw the endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs dilated convolution layers with different dilation rates, arranged both in parallel and in series, to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 Mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In validation, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
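The Dice Similarity Coefficient reported above can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration. It computes DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are nested lists of 0/1 values with the same shape
    (a hypothetical format chosen to keep the sketch dependency-free).
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted and 3 true foreground pixels, 2 of them shared.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

In the toy example the score is 4/6, roughly 0.667; a value such as 78.96% would correspond to the same ratio averaged over an evaluation set.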
Funding: Supported by the Foundation of Hubei Information Industry (05050).
Abstract: We propose an Internet resource aggregation platform based on the semantic web. The platform includes a Web Ontology Language (OWL) ontology design toolkit (VO-Editor) and a selective inference algorithm engine, so that it can edit ontologies visually and use novel selective reasoning for information aggregation. We introduce the VO-Editor and the principle of the selective inference algorithm. Finally, a budget travel system is used as a case study to illustrate how the platform aggregates Internet resources.
Funding: Supported by the National Natural Science Foundation of China (No. 61972227), the Natural Science Foundation of Shandong Province (No. ZR201808160102), the Shandong Provincial Natural Science Foundation Key Project (No. ZR2020KF015), the Key Research and Development Project of Shandong Province (No. 2019GSF109112), the Science and Technology Plan for Young Talents in Colleges and Universities of Shandong Province (No. 2020KJN007), the Scientific Research Studio in Colleges and Universities of Ji’nan City (No. 2021GXRC092), and the Science and Technology Research Program for Colleges and Universities in Shandong Province (No. KJ2018BZN029).
Abstract: Rain streaks in an image appear in different sizes and orientations, resulting in severe blurring and visual quality degradation. Previous CNN-based algorithms have achieved encouraging deraining results, although there are certain limitations in the description of rain streaks and the restoration of scene structures in different environments. In this paper, we propose an efficient multi-scale enhancement and aggregation network (MEAN) to solve the single-image deraining problem. Considering the importance of large receptive fields and multi-scale features, we introduce a multi-scale enhanced unit (MEU) to capture long-range dependencies and exploit features at different scales to depict rain. Simultaneously, an attentive aggregation unit (AAU) is designed to utilize the informative features in spatial and channel dimensions, thereby aggregating effective information to eliminate redundant features for rich scenario details. To improve the deraining performance of the encoder-decoder network, we utilized an AAU to filter the information in the encoder network and concatenated the useful features to the decoder network, which is conducive to predicting high-quality clean images. Experimental results on synthetic datasets and real-world samples show that the proposed method achieves a significant deraining performance compared to state-of-the-art approaches.
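Abstract: Objective: The video captioning task aims to automatically generate natural-language sentences that accurately express the visual semantic information of a video. Although encoder-decoder methods have made progress in visual representation and language generation, video encoders struggle to model object-level motion and events, and decoders struggle to achieve cross-modal semantic alignment, which limits the quality of the generated text. To address this, a method that fuses trajectory spatio-temporal awareness with adaptive semantic focusing is proposed to strengthen object motion modeling and improve multimodal semantic alignment. Methods: First, a point-trajectory-based visual feature aggregation method is proposed. It uses spatio-temporal modeling to generate trajectory features that combine spatial appearance with temporal continuity, and fuses them with local motion features to enhance the model's object tracking ability and semantic coherence in scenes involving motion and deformation. In addition, an unsupervised adaptive key-trajectory focusing learning method is designed. It exploits the dynamic information of dense point trajectories, adaptively selects key trajectories through attention weights, and introduces a focusing loss that guides the model to prioritize key semantic regions and suppress background interference, thereby improving cross-modal semantic association. Results: In experiments on two public datasets, MSRVTT (Microsoft research video to text) and MSVD (Microsoft research video description corpus), the proposed method achieves CIDEr (consensus-based image description evaluation) scores of 61.2 and 130.1, respectively, significantly outperforming existing mainstream methods and verifying its effectiveness in terms of description accuracy and semantic richness. Qualitative analysis shows that the method performs well in improving the temporal coherence and semantic expressiveness of descriptions. Conclusion: The proposed method effectively improves the ability of video captioning models to model object semantic continuity in complex dynamic environments and, through the unsupervised adaptive key-trajectory focusing learning method, improves the attention mechanism's ability to associate video and text semantics.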
Abstract: [Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the insufficiently fine segmentation of current tea spot segmentation models, a novel model for diagnosing the severity of tea spots, designated MDC-U-Net3+, was proposed in this research to enhance segmentation accuracy on the base framework of U-Net3+. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the backbone network of U-Net3+ to obtain feature information across multiple receptive fields of diseased spots, thereby reducing the loss of features within the encoder. Dual multi-scale attention (DMSA) was incorporated into the skip connection process to mitigate segmentation boundary ambiguity. This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale. Furthermore, the segmented mask image was post-processed with conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved model MDC-U-Net3+ achieved a mean pixel accuracy (mPA) of 94.92% and a mean Intersection over Union (mIoU) of 90.9%. Compared to the mPA and mIoU of U-Net3+, the MDC-U-Net3+ model showed improvements of 1.85 and 2.12 percentage points, respectively. These results illustrated a more effective segmentation performance than that achieved by other classical semantic segmentation models. [Conclusions] The methodology presented herein could provide data support for automated disease detection and precise medication, consequently reducing the losses associated with tea diseases.
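The two metrics reported above, mPA and mIoU, are commonly derived from a per-class confusion matrix. The following minimal sketch shows that standard computation under stated assumptions: it is not the authors' evaluation code, the function names and the flat label-list format are hypothetical, and classes that never occur are simply skipped when averaging.

```python
def confusion_matrix(pred, target, num_classes):
    """Build cm[t][p]: pixels with true class t that were predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean Intersection over Union: average of TP / (TP + FP + FN) per class."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                             # truly c, predicted other
        denom = tp + fp + fn
        if denom > 0:  # assumed convention: skip classes absent from both masks
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm):
    """Mean pixel accuracy: average per-class recall, TP / (TP + FN)."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:  # skip classes with no ground-truth pixels
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs)


# Toy example with three classes and six pixels (flattened label maps).
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred_labels, true_labels, num_classes=3)
miou = mean_iou(cm)            # (1/3 + 2/3 + 1/2) / 3
mpa = mean_pixel_accuracy(cm)  # (1/2 + 1 + 1/2) / 3
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, giving an mIoU of 0.5, while the mPA is 2/3. Differences such as "2.12 percentage points" are absolute differences between two such percentages, not relative changes.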
Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2007CB310801 and the National Natural Science Foundation of China under Grant Nos. 60970017 and 60903034.
Abstract: Semantic refinement of stakeholders' requirements is a fundamental issue in requirements engineering. Faced with the on-demand collaboration problem among heterogeneous, autonomous, and dynamic service resources on the Web, service requirements refinement becomes extremely important, and its key issue is semantic interoperability aggregation. A method for creating connecting ontologies driven by requirement sign ontology is proposed. Based on connecting ontologies, a method for semantic interoperability aggregation in requirements refinement is proposed. In addition, we find that the necessary condition for semantic interoperability is semantic similarity, and the sufficient condition is the coverability of the agreed mediation ontology. Based on this viewpoint, a metric framework for calculating semantic interoperability capability is proposed. This methodology can provide a semantic representation mechanism for refining users' requirements; meanwhile, since users' requirements on the Web usually originate from different domains, it can also provide semantic interoperability guidance for networked service discovery, and it is an effective approach for realizing on-demand service integration. The methodology will be beneficial in service-oriented software engineering and cloud computing.