In the field of optoelectronics, certain types of data may be difficult to annotate accurately, such as high-resolution optoelectronic imaging or imaging in certain special spectral ranges. Weakly supervised learning can provide a more reliable approach in these situations. Current popular approaches mainly adopt classification-based class activation maps (CAM) as initial pseudo labels to solve the task.
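As a rough illustration of "CAM as initial pseudo labels", the sketch below follows the common formulation in which the CAM for a class is the classifier-weighted sum of the last feature maps, passed through ReLU and min-max normalised; pixels above a threshold become foreground pseudo labels. The function names, the nested-list data layout and the 0.5 threshold are illustrative assumptions, not details from the abstract.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(features: List[Matrix], weights: List[float]) -> Matrix:
    """CAM for one class: weighted sum of the K feature maps, ReLU'd and
    min-max normalised to [0, 1]. `weights` are that class's classifier weights."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[max(0.0, sum(wk * fk[i][j] for wk, fk in zip(weights, features)))
            for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / span for v in row] for row in cam]


def pseudo_label(cam: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Binary pseudo label: 1 = foreground of this class, 0 = background.
    The threshold is a hypothetical hyperparameter."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]
```

For example, two 2x2 feature maps weighted by `[2.0, 0.5]` give a CAM of `[[1.0, 0.0], [0.0, 0.5]]`, which thresholds to `[[1, 0], [0, 1]]`. Raising the threshold trades recall for precision, which is exactly the incomplete-versus-false-activation tension that the methods below try to resolve.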
The primary challenge in weakly supervised semantic segmentation is effectively leveraging weak annotations while minimizing the performance gap compared to fully supervised methods. End-to-end model designs have gained significant attention for improving training efficiency. Most current algorithms rely on Convolutional Neural Networks (CNNs) for feature extraction. Although CNNs are proficient at capturing local features, they often struggle with global context, leading to incomplete and false Class Activation Mapping (CAM). To address these limitations, this work proposes a Contextual Prototype-Based End-to-End Weakly Supervised Semantic Segmentation (CPEWS) model, which improves feature extraction by utilizing the Vision Transformer (ViT). To exploit the semantic information preserved in the ViT's intermediate feature layers, this work introduces the Intermediate Supervised Module (ISM), which supervises the final layer's output, reducing boundary ambiguity and mitigating incomplete activation. Additionally, the Contextual Prototype Module (CPM) generates class-specific prototypes, while the proposed Prototype Discrimination Loss (L_PDL) and Superclass Suppression Loss (L_SSL) guide the network's training, effectively addressing false activation without the need for extra supervision. The proposed CPEWS model achieves state-of-the-art performance in end-to-end weakly supervised semantic segmentation without additional supervision. On the PASCAL VOC 2012 dataset, it reaches a Mean Intersection over Union (MIoU) of 69.8% on the validation set and 72.6% on the test set. Compared with ToCo (with ImageNet-1k pretrained weights), its test-set MIoU is 2.1 percentage points higher. In addition, it reaches an MIoU of 41.4% on the MS COCO 2014 validation set.
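The abstract does not give the CPM formulas, so the following is only a generic sketch of how class-specific prototypes are often built: each prototype is the CAM-weighted mean of per-pixel feature vectors, and each pixel is then assigned to the class whose prototype it is most cosine-similar to. The function names, the flattened-pixel layout and the argmax assignment are assumptions for illustration, not the CPEWS method.

```python
import math
from typing import Dict, List

Vector = List[float]


def prototype(features: List[Vector], cam: List[float]) -> Vector:
    """CAM-weighted mean of per-pixel feature vectors (pixels flattened)."""
    total = sum(cam) or 1.0  # guard against an all-zero CAM
    dim = len(features[0])
    return [sum(c * f[d] for c, f in zip(cam, features)) / total for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def prototype_labels(features: List[Vector], protos: Dict[str, Vector]) -> List[str]:
    """Assign every pixel to the class whose prototype it is most similar to."""
    return [max(protos, key=lambda c: cosine(f, protos[c])) for f in features]
```

Because a prototype pools context from the whole activated region rather than from a single pixel, comparing pixels against it is one plausible way to suppress isolated false activations.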
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which requires enormous human effort to label the images. Within this field, other research efforts use weakly supervised methods. These approaches aim to reduce annotation costs by leveraging sparsely annotated data, such as scribbles. This paper presents a novel technique called the weakly supervised network using scribble supervision and edge masks (WSSE-net). WSSE-net has a three-branch architecture in which each branch is equipped with a distinct decoder module dedicated to road extraction. One branch generates edge masks using edge detection algorithms and optimizes road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address a long-standing flaw of earlier methods, namely pseudo-labels that are not updated as the network trains, we use mixup to blend prediction results dynamically and continually update the pseudo-labels that steer network training. Our solution operates efficiently by combining edge-mask aid with dynamic pseudo-label support. Experiments are conducted on three separate road datasets, consisting primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and certain traditional fully supervised methods.
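The idea of pseudo-labels that keep up with training can be sketched as a running convex blend, in the spirit of mixup, of the previous soft pseudo-label and the current prediction, with scribbled pixels always pinned to their annotation. The fixed mixing coefficient `lam`, the function names and the flattened per-pixel lists are illustrative assumptions; the abstract does not state WSSE-net's actual schedule.

```python
from typing import List, Optional


def update_pseudo_labels(prev: List[float], pred: List[float],
                         scribble: List[Optional[int]], lam: float = 0.3) -> List[float]:
    """Blend current road probabilities into the running soft pseudo-labels.

    prev, pred: per-pixel road probabilities in [0, 1] (flattened image)
    scribble:   1/0 where the annotator drew a scribble, None elsewhere
    lam:        hypothetical mixing coefficient for the new prediction
    """
    out = []
    for p_old, p_new, s in zip(prev, pred, scribble):
        if s is not None:
            out.append(float(s))  # scribbles are trusted and never overwritten
        else:
            out.append(lam * p_new + (1.0 - lam) * p_old)
    return out


def hard_labels(soft: List[float], threshold: float = 0.5) -> List[int]:
    """Binarise soft pseudo-labels into road (1) / background (0)."""
    return [1 if p >= threshold else 0 for p in soft]
```

Calling `update_pseudo_labels` once per epoch (or per iteration) lets the supervision signal drift toward the network's improving predictions instead of staying frozen at its first, noisier estimate.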
Recently, weak supervision has received growing attention in salient object detection because of the convenience of labelling. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors, because scribble annotations provide only very limited foreground/background information. An intuitive idea is therefore to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed, based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, the k-means clustering algorithm is first applied to both the colours and the coordinates of the original annotations, and the same labels are then assigned to points whose colours are similar to the colour cluster centres and whose positions are near the coordinate cluster centres. Next, pixels with similar colours within each kernel neighbourhood are further given the same annotations. Extensive experiments on six benchmarks demonstrate that our method significantly improves performance and achieves state-of-the-art results.
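A simplified sketch of the first inference step: for each scribble label, k-means is run separately on the colours and on the coordinates of its scribbled pixels, and an unlabelled pixel inherits the label if it lies within `tau` of some colour centre and within `rho` of some coordinate centre. The thresholds, the dictionary-based image layout and the decoupled colour/coordinate test are assumptions; the kernel-neighbourhood step is not implemented here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means with random initial centres; returns k centres."""
    rng = random.Random(seed)
    centres: List[Point] = rng.sample(list(points), k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres


def expand_scribbles(pixels: Dict[Tuple[int, int], Point],
                     scribbles: Dict[Tuple[int, int], int],
                     k: int = 2, tau: float = 30.0, rho: float = 5.0) -> Dict[Tuple[int, int], int]:
    """Propagate scribble labels to unlabelled pixels.

    pixels:    (x, y) -> colour, e.g. (r, g, b)
    scribbles: (x, y) -> label, for annotated pixels only
    tau, rho:  hypothetical colour / distance thresholds
    """
    labels = dict(scribbles)
    for lab in sorted(set(scribbles.values())):
        coords = [p for p, l in scribbles.items() if l == lab]
        kk = min(k, len(coords))
        colour_centres = kmeans([pixels[p] for p in coords], kk)
        coord_centres = kmeans(coords, kk)
        for p, colour in pixels.items():
            if p in labels:
                continue  # never overwrite a scribble or an earlier assignment
            near_colour = any(math.dist(colour, c) <= tau for c in colour_centres)
            near_coord = any(math.dist(p, c) <= rho for c in coord_centres)
            if near_colour and near_coord:
                labels[p] = lab
    return labels
```

In this sketch, labels are processed in sorted order, so a pixel that satisfies both labels keeps whichever label reaches it first; a real implementation would need an explicit tie-breaking rule.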
Coronavirus disease 2019 (COVID-19) has severely disrupted both human life and the health care system. Timely diagnosis and treatment have become increasingly important; however, the distribution and size of lesions vary widely among individuals, making it challenging to diagnose the disease accurately. This study proposed a deep-learning disease diagnosis model based on weakly supervised learning and clustering visualization (W_CVNet) that fuses classification with segmentation. First, the data were preprocessed: an optimizable weakly supervised segmentation preprocessing method (O-WSSPM) was used to remove redundant data and solve the category imbalance problem. Second, a deep-learning fusion method was used for feature extraction and classification: a dual asymmetric complementary bilinear feature extraction method (D-CBM) was used to fully extract complementary features, solving the problem of insufficient feature extraction by a single deep learning network. Third, an unsupervised learning method based on Fuzzy C-Means (FCM) clustering was used to segment and visualize COVID-19 lesions, enabling physicians to accurately assess lesion distribution and disease severity. Using 5-fold cross-validation, the network achieved an average classification accuracy of 85.8%, outperforming six recent advanced classification models. W_CVNet can effectively provide physicians with automated diagnostic aid, determining whether the disease is present and, for COVID-19 patients, further predicting the lesion area.
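The FCM step can be illustrated with the textbook Fuzzy C-Means updates on scalar pixel intensities: memberships u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m−1)) and centres c_i = Σ_j u_ij^m x_j / Σ_j u_ij^m. This is a minimal sketch assuming grey-level inputs, fuzzifier m = 2 and an evenly spread deterministic initialisation; how W_CVNet prepares its features for FCM is not stated in the abstract.

```python
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], c: int = 2, m: float = 2.0,
                  iters: int = 50, eps: float = 1e-9) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-Means on scalar intensities.

    Returns (centres, u) where u[i][j] is the degree to which pixel j belongs
    to cluster i; each pixel's memberships sum to 1.
    """
    lo, hi = min(xs), max(xs)
    # Deterministic init: spread the centres evenly over the intensity range.
    centres = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    u = [[0.0] * len(xs) for _ in range(c)]
    for _ in range(iters):
        # Membership update; eps keeps distances strictly positive.
        for j, x in enumerate(xs):
            d = [abs(x - ck) + eps for ck in centres]
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        # Centre update: mean of the data weighted by u^m.
        for i in range(c):
            w = [u[i][j] ** m for j in range(len(xs))]
            centres[i] = sum(wj * x for wj, x in zip(w, xs)) / sum(w)
    return centres, u
```

Unlike hard k-means, each pixel keeps a graded membership, so the map `u[lesion_cluster]` can be rendered directly as a soft lesion visualization.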
A large variety of complaint reports reflect subjective information expressed by citizens. A key challenge of text summarization for complaint reports is ensuring the factual consistency of the generated summary. Therefore, in this paper, a simple weakly supervised framework that accounts for factual consistency is proposed to summarize city-based complaint reports without pre-labeled sentences or words. Furthermore, it considers the importance of entities in complaint reports to ensure the factual consistency of the summary. Experimental results on customer review datasets (Yelp and Amazon) and a complaint report dataset (complaint reports from Shenyang, China) show that the proposed framework outperforms state-of-the-art approaches in ROUGE scores and human evaluation. These results demonstrate the effectiveness of our approach in helping to handle complaint reports.
The problem of art forgery and infringement is becoming increasingly prominent, since diverse self-media content featuring all kinds of art pieces is released on the Internet every day. For art paintings, object detection and localization provide an efficient and effective means of art authentication and copyright protection. However, training a precise detector requires large amounts of expensive pixel-level annotations. To alleviate this, we propose a novel weakly supervised object localization (WSOL) method with background superposition erasing (BSE), which recognizes objects using inexpensive image-level labels. First, integrated adversarial erasing (IAE) applied to a vanilla convolutional neural network (CNN) drops out the most discriminative region by leveraging high-level semantic information. Second, a background suppression module (BSM) limits the activation area of the IAE to the object region through a self-guidance mechanism. Finally, in the inference phase, we use the refined importance map (RIM) of middle features to obtain class-agnostic localization results. Extensive experiments on paintings, CUB-200-2011 and ILSVRC validate the effectiveness of our BSE.
Since preparing labeled data for training point cloud semantic segmentation networks is time-consuming, weakly supervised approaches have been introduced to learn from only a small fraction of the data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to improve the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach improves scene segmentation over previous work and baselines while requiring only a few user annotations.
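One round of the active-learning/self-training loop can be sketched with two common heuristics: query the unlabelled points whose predicted class distribution has the highest entropy, and keep high-confidence predictions as pseudo-labels. Entropy-based selection and the 0.9 confidence threshold are assumptions chosen for illustration; the abstract does not say which acquisition function the authors use.

```python
import math
from typing import Dict, List, Set


def entropy(probs: List[float]) -> float:
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pred: Dict[int, List[float]], labelled: Set[int],
                          budget: int) -> List[int]:
    """Active-learning step: the `budget` unlabelled points with highest entropy."""
    candidates = [i for i in pred if i not in labelled]
    return sorted(candidates, key=lambda i: entropy(pred[i]), reverse=True)[:budget]


def confident_pseudo_labels(pred: Dict[int, List[float]], labelled: Set[int],
                            threshold: float = 0.9) -> Dict[int, int]:
    """Self-training step: argmax class for unlabelled points above the threshold."""
    return {i: max(range(len(p)), key=p.__getitem__)
            for i, p in pred.items()
            if i not in labelled and max(p) >= threshold}
```

The two steps are complementary: uncertain points go to the annotator, where a label is most informative, while confident points are labelled for free by the model itself.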
Due to the lack of bounding-box annotations, most weakly supervised target detection methods transform object detection into a classification problem over candidate regions, which leads weakly supervised detectors to locate only the salient, highly discriminative local parts of objects. We propose a weakly supervised detection method that combines attention and erasure mechanisms. It uses attention maps to find the most discriminative areas within candidate regions and then erases those areas, forcing the model to learn features from less discriminative areas. To improve the localization ability of the detector, we cascade the weakly supervised target detection network with a fully supervised target detection network and train the two jointly through multi-task learning. In validation experiments, the mean average precision (mAP) and correct localization (CorLoc) on the two datasets, VOC2007 and VOC2012, are 55.2% and 53.8%, respectively. In terms of mAP and CorLoc, this approach significantly outperforms previous approaches, opening opportunities for further research on weakly supervised target detection algorithms.
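The attention-plus-erasure idea can be sketched as follows: derive an attention map from the feature maps (here simply the channel average, which is an assumption), then zero every channel at positions whose attention is at least `gamma` times the peak. The classifier, seeing the erased features on the next pass, is pushed toward the weaker object parts. The names and the `gamma = 0.8` value are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def attention_map(features: List[Matrix]) -> Matrix:
    """Channel-averaged activation, used here as a simple attention map."""
    k, h, w = len(features), len(features[0]), len(features[0][0])
    return [[sum(f[i][j] for f in features) / k for j in range(w)] for i in range(h)]


def erase_most_discriminative(features: List[Matrix], gamma: float = 0.8) -> List[Matrix]:
    """Zero every channel at positions whose attention is >= gamma * peak attention,
    so the next forward pass must rely on less discriminative parts of the object."""
    att = attention_map(features)
    cut = gamma * max(max(row) for row in att)
    return [[[0.0 if att[i][j] >= cut else v for j, v in enumerate(row)]
             for i, row in enumerate(f)] for f in features]
```

A lower `gamma` erases more aggressively; erasing too much can wipe out the object entirely, which is one reason methods such as BSE add background suppression on top of plain erasing.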
Action recognition and localization in untrimmed videos are important for many applications and have attracted a lot of attention. Since full supervision with frame-level annotation places an overwhelming burden on manual labeling, learning with weak video-level supervision is a potential solution. In this paper, we propose a novel weakly supervised framework that simultaneously recognizes actions and locates the corresponding frames in untrimmed videos. Because abundant trimmed videos are publicly available and well segmented with semantic descriptions, the knowledge learned from trimmed videos can be fully leveraged to analyze untrimmed videos. We present an effective knowledge transfer strategy based on inter-class semantic relevance. We also use a self-attention mechanism to obtain a compact video representation, so that the influence of background frames can be effectively eliminated. A learning architecture with twin networks for trimmed and untrimmed videos is designed to facilitate transferable self-attentive representation learning. Extensive experiments on three untrimmed benchmark datasets (THUMOS14, ActivityNet1.3, and MEXaction2) clearly corroborate the efficacy of our method. Notably, the proposed weakly supervised method even achieves results comparable to some fully supervised methods.
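A compact, background-suppressing video representation of the kind described above can be sketched as attention pooling: score each frame with a scoring vector, softmax the scores over time, and take the weighted sum of frame features. The scoring vector `w` stands in for parameters that would be learned; this is a generic sketch, not the paper's twin-network architecture.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def attentive_pool(frames: List[List[float]], w: List[float]) -> Tuple[List[float], List[float]]:
    """Pool T per-frame feature vectors into one video vector.

    w is a hypothetical scoring vector (learned in practice).
    Returns (pooled_vector, per-frame attention weights).
    """
    scores = [sum(a * b for a, b in zip(f, w)) for f in frames]
    alpha = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha
```

The attention weights double as a crude temporal localization signal: frames with high `alpha` are the ones the model treats as action rather than background.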
Anticipating future actions without observing any part of them plays an important role in action prediction and is a challenging task. To obtain abundant information for action anticipation, some methods integrate multimodal contexts, including scene object labels. However, exhaustively labelling every frame in video datasets requires considerable effort. In this paper, we develop a weakly supervised method that integrates global motion and local fine-grained features from the current action video to predict the next action label without requiring specific scene context labels. Specifically, we extract diverse types of local features, including object appearance and human pose representations, with weakly supervised learning and without ground truth. Moreover, we construct a graph convolutional network to exploit the inherent relationships between humans and objects in the current scene. We evaluate the proposed model on two datasets, MPII-Cooking and EPIC-Kitchens, and demonstrate the generalizability and effectiveness of our approach for action anticipation.
Weakly supervised object localization mines pixel-level location information from image-level annotations. Traditional weakly supervised object localization approaches exploit the last convolutional feature map to locate discriminative regions with abundant semantics. Although this demonstrates the localization ability of a classification network, the process does not use shallow edge and texture features, so it cannot meet the object-integrity requirement of the localization task. Thus, we propose a novel shallow feature-driven dual-edges localization (DEL) network, in which two kinds of shallow edges are used to mine entire target object regions. Specifically, we design an edge feature mining (EFM) module that extracts shallow edge details by measuring the similarity between the original class activation map and shallow features. We use the EFM module to extract two kinds of edges, namely the edge of the shallow feature map and the edge of shallow gradients, to enhance the edge details of the target object in the last convolutional feature map. The entire process operates only at inference time, so it adds no training cost. Extensive experiments on the ILSVRC and CUB-200-2011 datasets show that the DEL method achieves consistent and substantial improvements over existing methods.
Background: Image-based automatic diagnosis of field diseases can help increase crop yields and is of great importance. However, crop lesion regions tend to be scattered and of varying sizes, and this, together with large intra-class variation and small inter-class variation, makes segmentation difficult. Methods: We propose a novel end-to-end system that requires only image-level labels as weak supervision for lesion region segmentation. First, a two-branch network is designed for joint disease classification and seed region generation. The generated seed regions are then used as input to the next segmentation stage, for which we use an encoder-decoder network. Unlike previous works that use only an encoder in the segmentation network, our system relies critically on the encoder-decoder network to segment images with small and scattered regions, which is the major challenge in image-based diagnosis of field diseases. We further propose a novel weakly supervised training strategy for the encoder-decoder semantic segmentation network that makes use of the extracted seed regions. Results: Experimental results show that our system achieves better lesion region segmentation than state-of-the-art methods. In addition to crop images, our method is also applicable to general scattered object segmentation. We demonstrate this by extending our framework to the PASCAL VOC dataset, where it achieves performance comparable to the state-of-the-art DSRG (deep seeded region growing) method. Conclusion: Our method not only outperforms state-of-the-art semantic segmentation methods by a large margin on the lesion segmentation task, but also performs well on more general tasks.
Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time-consuming. By contrast, weakly labeled web images are easy to obtain from the Internet. However, noisy labels still seriously degrade performance when images taken directly from the web are used to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically eliminates the distraction of samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to stimulate the network to focus on the significant regions of correctly labeled samples in a weakly supervised manner. Besides feature learning, regularization is applied to the classifier to minimize the distance between samples of the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well-labeled and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
Temporal localization is crucial for action video recognition. Since manual annotation of videos is expensive and time-consuming, temporal localization with weak video-level labels is challenging but indispensable. In this paper, we propose a weakly supervised temporal action localization approach for untrimmed videos. To this end, we train the model based on proxies for each action class. The proxies are used to measure the distances between action segments and the original features of different actions. We use a proxy-based metric to cluster instances of the same action together and separate actions from background. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets.
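A proxy-based metric can be illustrated with a Proxy-NCA-style formulation: each class (including background) owns a proxy vector, a segment's class probabilities are a softmax over negative squared distances to the proxies, and the loss is the negative log-probability of the target class. This is a swapped-in standard technique chosen for illustration; the paper's exact metric and loss may differ, and the proxy values below would be learned in practice.

```python
import math
from typing import Dict, List


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proxy_probs(x: List[float], proxies: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances from segment feature x to each proxy."""
    logits = {c: -sq_dist(x, p) for c, p in proxies.items()}
    m = max(logits.values())  # stabilise exp()
    exp = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def proxy_loss(x: List[float], label: str, proxies: Dict[str, List[float]]) -> float:
    """Negative log-likelihood of the target class under the proxy softmax."""
    return -math.log(proxy_probs(x, proxies)[label])
```

Minimizing this loss pulls segments toward their own class proxy and away from the others, including a background proxy, which is the clustering-and-separation behaviour the abstract describes.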
Laser speckle contrast imaging (LSCI) is a noninvasive, label-free technique that allows real-time investigation of the microcirculation of biological tissue. High-quality microvascular segmentation is critical for analyzing and evaluating vascular morphology and blood flow dynamics. However, high-quality vessel segmentation has always been challenging because of the cost and complexity of acquiring labeled data and the irregular morphology of vessels. In addition, supervised learning methods rely heavily on high-quality labels for accurate segmentation, which often requires extensive labeling effort. Here, we propose LSWDP, a novel approach for high-performance real-time vessel segmentation that uses low-quality pseudo-labels for nonmatched training, without relying on large numbers of intricate labels or paired images. Furthermore, we demonstrate that our method is more robust and more effective at mitigating performance degradation than traditional segmentation approaches on data sets of diverse styles, even when confronted with unfamiliar data. Importantly, the Dice similarity coefficient exceeded 85% in a rat experiment. Our approach could efficiently segment and evaluate blood vessels under both normal and disease conditions, which would greatly benefit future research in the life sciences and medicine.
Accurate and timely surveying of airfield pavement distress is crucial for cost-effective airport maintenance. Deep learning (DL) approaches, leveraging advances in computer science and image acquisition techniques, have become the mainstream for automated airfield pavement distress detection. However, fully supervised DL methods require a large number of manually annotated ground truth labels to achieve high accuracy. To address the scarcity of high-quality manual annotations, we propose a novel end-to-end distress detection model called class activation map informed weakly supervised distress detection (WSDD-CAM). Based on YOLOv5, WSDD-CAM consists of an efficient backbone, a classification branch, and a localization network. By using class activation map (CAM) information, our model significantly reduces the need for manual annotations, automatically generating pseudo bounding boxes with 71% overlap with the ground truth. To evaluate WSDD-CAM, we tested it on a self-made dataset and compared it with other weakly supervised and fully supervised models. The results show that our model achieves 49.2% mean average precision (mAP), outperforming other weakly supervised methods and approaching state-of-the-art fully supervised methods. Additionally, ablation experiments confirm the effectiveness of our architecture design. In conclusion, WSDD-CAM offers a promising solution for airfield pavement distress detection, reducing manual annotation time while maintaining high accuracy. This efficient and effective approach can contribute significantly to cost-effective airport maintenance management.
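A common generic way to turn a CAM into a pseudo bounding box is to threshold it at a fraction of its peak and take the tightest box around the surviving pixels; box quality is then measured by intersection over union (IoU) against ground truth, the kind of overlap figure quoted above. The sketch below assumes inclusive pixel coordinates and a hypothetical `ratio = 0.5`; it is not WSDD-CAM's YOLOv5-based pipeline.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates


def cam_to_box(cam: List[List[float]], ratio: float = 0.5) -> Optional[Box]:
    """Tightest box around pixels whose activation is >= ratio * peak (0 < ratio <= 1)."""
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return None  # nothing activated, so no pseudo box
    hits = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row) if v >= ratio * peak]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)
```

A single threshold yields one box per CAM; scattered distress such as multiple cracks would need connected-component splitting before boxing, which this sketch omits.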
Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt at video object segmentation with scribble-level supervision, which can greatly reduce the human labor needed to collect manual annotations. However, conventional network architectures and learning objectives do not work well in this scenario, because the supervision is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The second is the scribble-supervised loss, which optimizes the unlabeled pixels and dynamically corrects inaccurately segmented areas during training. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets, YouTube-VOS (video object segmentation) and DAVIS-2017 (densely annotated video segmentation). We first generate scribble annotations from the original per-pixel annotations. We then train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
Accurate prognosis prediction is essential for guiding cancer treatment and improving patient outcomes. While recent studies have demonstrated the potential of histopathological images in survival analysis, existing models are typically developed in a cancer-specific manner, lack extensive external validation, and often rely on molecular data that are not routinely available in clinical practice. To address these limitations, we present PROGPATH, a unified model that integrates histopathological image features with routinely collected clinical variables to achieve pan-cancer prognosis prediction. PROGPATH employs a weakly supervised deep learning architecture built upon a foundation model for image encoding. Morphological features are aggregated through an attention-guided multiple instance learning module and fused with clinical information via a cross-attention transformer. A router-based classification strategy further refines prediction performance. PROGPATH was trained on 7999 whole-slide images (WSIs) from 6670 patients across 15 cancer types, and extensively validated on 17 external cohorts comprising 7374 WSIs from 4441 patients, covering 12 cancer types from 8 consortia and institutions across three continents. PROGPATH achieved consistently superior performance compared with state-of-the-art multimodal prognosis prediction models. It demonstrated strong generalizability across cancer types and robustness in stratified subgroups, including early- and advanced-stage patients, treatment cohorts (radiotherapy and pharmaceutical therapy), and biomarker-defined subsets. We further provide model interpretability by identifying pathological patterns critical to PROGPATH's risk predictions, such as the degree of cell differentiation and the extent of necrosis. Together, these results highlight the potential of PROGPATH to support pan-cancer outcome prediction and inform personalized cancer management strategies.
We study the novel problem of weakly supervised instance action recognition (WSiAR) in multi-person (crowd) scenes. We aim to recognize the action of each subject in the crowd and, given the expense of large-scale training annotations, propose a weakly supervised method for this purpose. This problem is of great practical value for video surveillance and sports scene analysis. To this end, we investigated and designed a series of weak annotations for supervising WSiAR. We propose two categories of weak label settings, bag labels and sparse labels, that significantly reduce the number of labels. For the former, we propose a novel sub-block-aware multi-instance learning (MIL) loss that extracts more effective information from weak labels during training. For the latter, we propose a pseudo-label generation strategy for extending sparse labels. This enables our method to achieve results comparable to those of fully supervised methods with significantly fewer annotations. Experimental results on two benchmarks verify the soundness of the problem definition and the effectiveness of the proposed weakly supervised training method.
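With bag labels, only the set of actions present in the scene is known, not who performs them. The standard multi-instance learning baseline, sketched below, max-pools per-person action probabilities into a scene-level prediction and applies binary cross-entropy against the bag labels. This is the plain MIL formulation; the paper's sub-block-aware variant is not described in enough detail to reproduce, and the names are illustrative.

```python
import math
from typing import List


def mil_bag_loss(instance_probs: List[List[float]], bag_labels: List[int],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between bag labels and max-pooled instance scores.

    instance_probs[i][c]: predicted probability that person i performs action c
    bag_labels[c]:        1 if action c occurs anywhere in the scene, else 0
    """
    loss = 0.0
    for c, y in enumerate(bag_labels):
        # A bag is positive for class c if at least one instance is, hence max pooling.
        p = max(row[c] for row in instance_probs)
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(bag_labels)
```

Max pooling propagates gradient only through the highest-scoring person for each action, which is cheap but can be brittle in crowds; finer-grained variants such as the sub-block-aware loss above aim to extract more signal from the same weak labels.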
Funding: National Natural Science Foundation of China (U1904119); Research Programs of Henan Science and Technology Department (232102210054); Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070); Henan Province Key Research and Development Project (231111212000); Aviation Science Foundation (20230001055002); supported by the Henan Center for Outstanding Overseas Scientists (GZS2022011).
Funding: Supported by the National Natural Science Foundation of China (42001408, 61806097).
Abstract: Significant advances have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which requires enormous human effort to label images. Other research in this field uses weakly supervised methods, which aim to reduce annotation costs by leveraging sparsely annotated data such as scribbles. This paper presents a novel weakly supervised network using scribble supervision and edge masks (WSSE-net). The network has a three-branch architecture in which each branch is equipped with a distinct decoder module dedicated to road extraction. One branch generates edge masks using edge detection algorithms and optimizes road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. Earlier methods created pseudo-labels that were not updated as the network trained. To address this limitation, we use mixup to blend prediction results dynamically and continually produce new pseudo-labels that steer network training. Because it combines edge-mask assistance with dynamic pseudo-label support, our solution operates efficiently. Experiments are conducted on three road datasets consisting primarily of high-resolution remote-sensing satellite images and drone images. The results suggest that our method outperforms advanced scribble-supervised approaches and some traditional fully supervised methods.
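The dynamic pseudo-label update described above can be pictured as a mixup-style convex blend of the previous pseudo-labels with the current prediction. This is a minimal sketch under stated assumptions, not WSSE-net's actual rule: the fixed mixing weight `lam`, the 0.5 binarisation threshold, and the rule that scribbled pixels always keep their annotation are all hypothetical.

```python
from typing import List, Optional, Tuple


def update_pseudo_labels(
    prev_soft: List[float],
    pred: List[float],
    scribble: List[Optional[int]],
    lam: float = 0.7,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Blend last round's soft pseudo-labels with the current road probabilities.

    prev_soft, pred: per-pixel road probabilities in [0, 1].
    scribble: 1 (road), 0 (background) or None (unlabelled) for each pixel.
    lam: hypothetical fixed weight given to the previous pseudo-labels.
    """
    soft = []
    for p_old, p_new, s in zip(prev_soft, pred, scribble):
        if s is not None:
            soft.append(float(s))  # scribbled pixels keep their annotation
        else:
            soft.append(lam * p_old + (1.0 - lam) * p_new)
    hard = [1 if v >= threshold else 0 for v in soft]
    return soft, hard
```

Calling this once per epoch lets the pseudo-labels drift toward the network's improving predictions rather than staying frozen.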
Abstract: Recently, weak supervision has received growing attention in salient object detection because labels are convenient to obtain. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors, because scribble annotations provide only very limited foreground/background information. An intuitive idea is therefore to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, the k-means clustering algorithm is first applied to both the colours and the coordinates of the original annotations. Points whose colours are similar to the colour cluster centres and whose positions are near the coordinate cluster centres are then assigned the same labels. Next, pixels with similar colours within each kernel neighbourhood are further assigned the same annotations. Extensive experiments on six benchmarks demonstrate that our method significantly improves performance and achieves state-of-the-art results.
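The first inference step above (k-means on colours and on coordinates, then label propagation to nearby, similarly coloured pixels) can be sketched as follows. This is an illustration, not the authors' code. The deterministic initialisation (first k points), `k=2`, and the tolerances `colour_tol` and `dist_tol` are hypothetical choices. The second, kernel-neighbourhood step is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain k-means; the first k points serve as deterministic initial centres."""
    k = min(k, len(points))
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


def infer_labels(scribbled, label, candidates, k=2, colour_tol=30.0, dist_tol=20.0) -> Dict[int, int]:
    """scribbled / candidates: lists of ((r, g, b), (x, y)) pixels.

    A candidate inherits `label` if its colour is close to a colour centre AND its
    position is close to a coordinate centre of the scribbled pixels.
    """
    colour_centres = kmeans([rgb for rgb, _ in scribbled], k)
    coord_centres = kmeans([xy for _, xy in scribbled], k)
    inferred = {}
    for idx, (rgb, xy) in enumerate(candidates):
        near_colour = min(math.dist(rgb, c) for c in colour_centres) <= colour_tol
        near_position = min(math.dist(xy, c) for c in coord_centres) <= dist_tol
        if near_colour and near_position:
            inferred[idx] = label
    return inferred
```

Requiring both conditions means a pixel with the right colour but in a distant part of the image is not labelled.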
Funding: Funded by the Open Foundation of Anhui Engineering Research Center of Intelligent Perception and Elderly Care, Chuzhou University (No. 2022OPA03), the Higher Education Natural Science Foundation of Anhui Province (No. KJ2021B01), and the Innovation Team Projects of Universities in Guangdong (No. 2022KCXTD057).
Abstract: Coronavirus disease 2019 (COVID-19) has severely disrupted both human life and the health care system. Timely diagnosis and treatment have become increasingly important. However, the distribution and size of lesions vary widely among individuals, which makes accurate diagnosis challenging. This study proposes a deep-learning disease diagnosis model based on weakly supervised learning and clustering visualization (W_CVNet) that fuses classification with segmentation. First, the data are preprocessed: an optimizable weakly supervised segmentation preprocessing method (O-WSSPM) removes redundant data and addresses the category imbalance problem. Second, a deep-learning fusion method performs feature extraction and classification. Within it, a dual asymmetric complementary bilinear feature extraction method (D-CBM) fully extracts complementary features, overcoming the insufficient feature extraction of a single deep learning network. Third, an unsupervised learning method based on Fuzzy C-Means (FCM) clustering segments and visualizes COVID-19 lesions, which enables physicians to accurately assess lesion distribution and disease severity. Under 5-fold cross-validation, the network achieved an average classification accuracy of 85.8%, outperforming six recent advanced classification models. W_CVNet can provide physicians with automated diagnostic aid: it determines whether the disease is present and, for COVID-19 patients, further predicts the lesion area.
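The FCM step above assigns every pixel a soft membership in each cluster rather than a hard label. Below is a minimal sketch of standard Fuzzy C-Means on 1-D intensities, assuming fuzzifier m = 2 and evenly spaced initial centres. W_CVNet's actual inputs and settings are not given in the abstract.

```python
from typing import List, Tuple


def fuzzy_c_means(
    x: List[float], c: int = 2, m: float = 2.0, iters: int = 100, eps: float = 1e-9
) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-Means on 1-D values (e.g. pixel intensities).

    Returns the cluster centres and, for every value, its membership in each cluster.
    """
    lo, hi = min(x), max(x)
    # Spread the initial centres evenly over the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        u = []
        for xj in x:
            d = [abs(xj - ci) + eps for ci in centres]
            u.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for i in range(c)])
        # Centre update: mean of the values weighted by membership ** m.
        centres = [
            sum(u[j][i] ** m * x[j] for j in range(len(x)))
            / sum(u[j][i] ** m for j in range(len(x)))
            for i in range(c)
        ]
    return centres, u
```

A hard lesion mask can be read off by taking, for each pixel, the cluster with the highest membership. The memberships themselves are what make a graded visualisation possible.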
Funding: Supported by the National Natural Science Foundation of China (62276058, 61902057, 41774063), the Fundamental Research Funds for the Central Universities (N2217003), and the Joint Fund of Science & Technology Department of Liaoning Province and State Key Laboratory of Robotics, China (2020-KF-12-11).
Abstract: A large variety of complaint reports reflect subjective information expressed by citizens. A key challenge in summarizing complaint reports is ensuring the factual consistency of the generated summaries. This paper therefore proposes a simple, weakly supervised framework that accounts for factual consistency. It generates summaries of city-based complaint reports without pre-labeled sentences or words. Furthermore, it considers the importance of entities in complaint reports to ensure the factual consistency of the summary. Experiments were run on customer review datasets (Yelp and Amazon) and on a complaint report dataset (complaint reports from Shenyang, China). The proposed framework outperforms state-of-the-art approaches in ROUGE scores and human evaluation. These results demonstrate the effectiveness of the approach in handling complaint reports.
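The results above are reported in ROUGE scores. As a point of reference, here is a minimal sketch of ROUGE-1 (unigram overlap with clipped counts). It assumes lower-cased whitespace tokenisation. Official ROUGE toolkits add their own tokenisation and optional stemming, so their scores can differ.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram precision, recall and F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, the candidate "the road is blocked" against the reference "the road near the school is blocked" gives precision 1.0, recall 4/7 and F1 8/11.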
Funding: This work was supported in part by the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, China (No. 2022B1212010011).
Abstract: The problem of art forgery and infringement is becoming increasingly prominent, since diverse self-media content featuring all kinds of art pieces is released on the Internet every day. For art paintings, object detection and localization provide an efficient and effective means of art authentication and copyright protection. However, training a precise detector requires large amounts of expensive pixel-level annotations. To alleviate this, we propose a novel weakly supervised object localization (WSOL) method with background superposition erasing (BSE), which recognizes objects using inexpensive image-level labels. First, integrated adversarial erasing (IAE) for a vanilla convolutional neural network (CNN) drops out the most discriminative region by leveraging high-level semantic information. Second, a background suppression module (BSM) limits the activation area of the IAE to the object region through a self-guidance mechanism. Finally, in the inference phase, we use the refined importance map (RIM) of middle features to obtain class-agnostic localization results. Extensive experiments on paintings, CUB-200-2011, and ILSVRC validate the effectiveness of BSE.
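The IAE step above rests on a generic adversarial-erasing idea: zero out the most discriminative region so the network must find other evidence. The following is a minimal sketch of that generic step only. The erase `ratio` of 0.8 of the peak attention is a hypothetical choice, and BSE's integration with BSM is not modelled.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def adversarial_erase(feature: Matrix, attention: Matrix, ratio: float = 0.8) -> Tuple[Matrix, Matrix]:
    """Zero the feature cells whose attention is at least `ratio` times the peak.

    Returns the erased feature map and the binary keep-mask (0 = erased).
    """
    peak = max(max(row) for row in attention)
    thresh = ratio * peak
    mask = [[0.0 if a >= thresh else 1.0 for a in row] for row in attention]
    erased = [[f * k for f, k in zip(f_row, k_row)] for f_row, k_row in zip(feature, mask)]
    return erased, mask
```

In training, the erased map would be fed through the classifier again so that less discriminative object parts also receive gradient.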
Funding: Supported by the Guangdong Natural Science Foundation (2021B1515020085), the Shenzhen Science and Technology Program (RCYX20210609103121030), the National Natural Science Foundation of China (62322207, 61872250, U2001206, U21B2023), the Department of Education of Guangdong Province Innovation Team (2022KCXTD025), the Shenzhen Science and Technology Innovation Program (JCYJ20210324120213036), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (ShenZhen).
Abstract: Since preparing labeled data for training semantic segmentation networks on point clouds is time-consuming, weakly supervised approaches have been introduced to learn from only a small fraction of the data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the choice of which samples to annotate is as important as how those samples are used for training. We therefore introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects for annotation the points that are likely to improve the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach improves scene segmentation over previous work and baselines while requiring only a few user annotations.
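The active-learning half of the method above needs a rule for choosing which points to annotate. The abstract does not specify the criterion. The sketch below substitutes plain entropy-based uncertainty sampling, a common baseline, purely for illustration: it ranks points by the entropy of the model's predicted class distribution.

```python
import math
from typing import List


def select_for_annotation(probs: List[List[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` points with the most uncertain predictions.

    probs: per-point class probability lists from the current model.
    """
    def entropy(p: List[float]) -> float:
        return -sum(q * math.log(q) for q in p if q > 0)

    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return ranked[:budget]
```

The selected points would then be labelled by the user and added to the sparse label set that self-training propagates.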
Funding: Supported by the National Natural Science Foundation of China (No. 61871182, 61773160), the Natural Science Foundation of Hebei Province of China (No. F2021502013), the Fundamental Research Funds for the Central Universities (No. 2020MS153, 2021PT018), and the National Natural Science Foundation of China (No. 62371188).
Abstract: Because target bounding-box annotations are unavailable, most weakly supervised object detection methods transform object detection into a classification problem over candidate regions. As a result, weakly supervised detectors tend to locate only small, highly discriminative local areas of objects. We propose a weakly supervised method that combines attention and erasure mechanisms. The detector uses attention maps to find the more discriminative areas within candidate regions and then erases those areas, forcing the model to learn features from less discriminative areas. To improve the localization ability of the detector, we cascade the weakly supervised detection network with a fully supervised detection network and train the two jointly through multi-task learning. In validation experiments on two datasets, VOC2007 and VOC2012, the mean average precision (mAP) and correct localization (CorLoc) are 55.2% and 53.8%, respectively. In terms of mAP and CorLoc, this approach significantly outperforms previous approaches, which opens opportunities for further research on weakly supervised object detection algorithms.
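CorLoc, reported above, counts an image as correctly localised when the top-scoring predicted box overlaps a ground-truth box with IoU of at least 0.5. Below is a minimal single-class sketch. Boxes are assumed to be (x1, y1, x2, y2) tuples. The standard protocol computes CorLoc per class over images containing that class and then averages across classes.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r: Box) -> float:
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions: Sequence[Box], ground_truths: Sequence[List[Box]], iou_thr: float = 0.5) -> float:
    """Fraction of images whose top-scoring box hits any ground-truth box at IoU >= iou_thr."""
    hits = sum(
        1 for pred, gts in zip(predictions, ground_truths)
        if any(box_iou(pred, g) >= iou_thr for g in gts)
    )
    return hits / len(predictions)
```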
Funding: Supported by the National Natural Science Foundation of China (Nos. 61871378, U2003111, 62122013, and U2001211).
Abstract: Action recognition and localization in untrimmed videos are important for many applications and have attracted a lot of attention. Since full supervision with frame-level annotation places an overwhelming burden on manual labeling, learning with weak video-level supervision is a potential solution. In this paper, we propose a novel weakly supervised framework that simultaneously recognizes actions and locates the corresponding frames in untrimmed videos. Abundant trimmed videos are publicly available and well-segmented with semantic descriptions, so the instructive knowledge learned from them can be fully leveraged to analyze untrimmed videos. We present an effective knowledge transfer strategy based on inter-class semantic relevance. We also use a self-attention mechanism to obtain a compact video representation that effectively eliminates the influence of background frames. A learning architecture with twin networks for trimmed and untrimmed videos is designed to facilitate transferable self-attentive representation learning. Extensive experiments are conducted on three untrimmed benchmark datasets (i.e., THUMOS14, ActivityNet1.3, and MEXaction2), and the results clearly corroborate the efficacy of our method. It is especially encouraging that the proposed weakly supervised method even achieves results comparable to those of some fully supervised methods.
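The compact, background-suppressing video representation described above can be illustrated with attention-weighted temporal pooling. Each frame gets a score, the scores are softmax-normalised, and frames are averaged with those weights. This is a simplified stand-in for the paper's self-attention module. The scoring vector `w` would be learned in practice and is fixed here for illustration.

```python
import math
from typing import List, Tuple


def attention_pool(frames: List[List[float]], w: List[float]) -> Tuple[List[float], List[float]]:
    """Pool T frame features into one vector using softmax attention weights.

    frames: T feature vectors of equal length; w: scoring vector (hypothetical, fixed here).
    """
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha
```

Frames that score low (for example background frames, once `w` is trained) receive near-zero weight and barely affect the pooled vector.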
Funding: Supported partially by the National Natural Science Foundation of China (NSFC) (Grant Nos. U1911401 and U1811461), the Guangdong NSF Project (2020B1515120085, 2018B030312002), the Guangzhou Research Project (201902010037), the Research Projects of Zhejiang Lab (2019KD0AB03), and the Key-Area Research and Development Program of Guangzhou (202007030004).
Abstract: Anticipating future actions without observing any part of the future action videos plays an important role in action prediction and is also a challenging task. To obtain abundant information for action anticipation, some methods integrate multimodal contexts, including scene object labels. However, exhaustively labelling every frame in video datasets requires considerable effort. In this paper, we develop a weakly supervised method that integrates global motion and local fine-grained features from current action videos. It predicts the next action label without requiring specific scene context labels. Specifically, we use weakly supervised learning to extract diverse types of local features, including object appearance and human pose representations, without ground truth. Moreover, we construct a graph convolutional network to exploit the inherent relationships between humans and objects in the ongoing activity. We evaluate the proposed model on two datasets, MPII-Cooking and EPIC-Kitchens, and demonstrate the generalizability and effectiveness of our approach for action anticipation.
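The human-object relationships above are modelled with a graph convolutional network. For readers new to GCNs, here is a minimal sketch of one standard propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The node features, adjacency and weight matrix are illustrative assumptions, not the paper's graph construction.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step with self-loops and symmetric normalisation.

    adj: N x N adjacency (e.g. human and object nodes); feats: N x F; weight: F x F_out.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(len(feats[0]))]
           for i in range(n)]
    # Linear transform followed by ReLU: relu(agg @ weight).
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(len(weight))))
             for o in range(out_dim)] for i in range(n)]
```

With a human node connected to an object node, each node's output mixes both feature vectors, which is how context from the object reaches the human representation and vice versa.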
Funding: This work was partly supported by the National Natural Science Foundation of China (No. 62072394), the Natural Science Foundation of Hebei Province, China (No. F2021203019), and the Hebei Key Laboratory Project, China (No. 202250701010046).
Abstract: Weakly supervised object localization mines pixel-level location information from image-level annotations. Traditional weakly supervised object localization approaches use the last convolutional feature map to locate discriminative regions with rich semantics. This demonstrates the localization ability of a classification network, but the process does not use shallow edge and texture features. It therefore cannot meet the object-integrity requirement of the localization task. We thus propose a novel shallow-feature-driven dual-edges localization (DEL) network, in which two kinds of shallow edges are used to mine entire target object regions. Specifically, we design an edge feature mining (EFM) module that extracts shallow edge details by measuring the similarity between the original class activation map and shallow features. We use the EFM module to extract two kinds of edges, the edge of the shallow feature map and the edge of shallow gradients. These enhance the edge details of the target object in the last convolutional feature map. The entire process runs during the inference stage, so it adds no extra training cost. Extensive experiments on the ILSVRC and CUB-200-2011 datasets show that DEL achieves consistent and substantial performance improvements over existing methods.
Funding: This work was partially supported by the National Natural Science Foundation of China (Nos. 61725204 and 62002258) and a grant from the Science and Technology Department of Jiangsu Province, China.
Abstract: Background: Image-based automatic diagnosis of field diseases can help increase crop yields and is of great importance. However, crop lesion regions tend to be scattered and of varying sizes. Together with substantial intra-class variation and small inter-class variation, this makes segmentation difficult. Methods: We propose a novel end-to-end system that requires only image-level labels as weak supervision for lesion region segmentation. First, a two-branch network is designed for joint disease classification and seed region generation. The generated seed regions are then used as input to the next stage, segmentation, for which we use an encoder-decoder network. Previous works use an encoder-only segmentation network. In our system, the encoder-decoder network is critical for segmenting images with small and scattered regions, which is the major challenge in image-based diagnosis of field diseases. We further propose a novel weakly supervised training strategy for the encoder-decoder semantic segmentation network that makes use of the extracted seed regions. Results: Experimental results show that our system achieves better lesion region segmentation results than state-of-the-art methods. Beyond crop images, our method also applies to general scattered object segmentation. We demonstrate this by extending our framework to the PASCAL VOC dataset, where it achieves performance comparable to the state-of-the-art DSRG (deep seeded region growing) method. Conclusion: Our method not only outperforms state-of-the-art semantic segmentation methods by a large margin on the lesion segmentation task but also performs well on more general tasks.
Funding: Project supported by the Key Project of the National Natural Science Foundation of China (No. U1836220), the National Natural Science Foundation of China (No. 61672267), the Qing Lan Talent Program of Jiangsu Province, China, the Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, China, the Finnish Cultural Foundation, the Jiangsu Specially-Appointed Professor Program, China (No. 3051107219003), the Jiangsu Joint Research Project of Sino-Foreign Cooperative Education Platform, China, and the Talent Startup Project of Nanjing Institute of Technology, China (No. YKJ201982).
Abstract: Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time-consuming. By contrast, weakly labeled web images are easy to obtain from the Internet. However, noisy labels still seriously degrade performance when images taken directly from the web are used to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically suppresses the distraction of samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to make the network focus on the significant regions of correctly labeled samples in a weakly supervised manner. Beyond feature learning, a regularization is applied to the classifier to minimize the distance between samples of the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well- and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
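The classifier regularization above pulls samples toward their class centroid and pushes centroids apart. The abstract gives no formula, so this is a hypothetical center-loss-style sketch. It combines the mean squared distance of each feature to its class centroid with a squared hinge that penalises centroid pairs closer than an assumed `margin`.

```python
import math
from collections import defaultdict
from typing import List


def centroid_regulariser(features: List[List[float]], labels: List[int], margin: float = 1.0) -> float:
    """Intra-class compactness plus a hinge penalty on centroids closer than `margin`."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    centroids = {y: [sum(axis) / len(fs) for axis in zip(*fs)] for y, fs in groups.items()}
    # Mean squared distance of every sample to its own class centroid.
    intra = sum(math.dist(f, centroids[y]) ** 2 for f, y in zip(features, labels)) / len(features)
    classes = sorted(centroids)
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = math.dist(centroids[classes[i]], centroids[classes[j]])
            inter += max(0.0, margin - d) ** 2  # zero once centroids are at least `margin` apart
            pairs += 1
    return intra + (inter / pairs if pairs else 0.0)
```

Minimising this term alongside the classification loss tightens each class and separates the classes, which matches the two goals stated in the abstract.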
Funding: Supported by the National Key Research and Development Program of China (2018AAA0100104 and 2018AAA0100100), the National Natural Science Foundation of China (Grant No. 61702095), the Natural Science Foundation of Jiangsu Province (BK20211164, BK20190341, and BK20210002), and the Big Data Computing Center of Southeast University.
Abstract: Temporal localization is crucial for action video recognition. Because manual annotations of videos are expensive and time-consuming, temporal localization with weak video-level labels is challenging but indispensable. In this paper, we propose a weakly supervised temporal action localization approach for untrimmed videos. To this end, we train the model based on proxies for each action class. The proxies are used to measure the distances between action segments and the original features of different actions. We use a proxy-based metric to cluster instances of the same action together and to separate actions from background. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets.
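A proxy-based metric, as described above, compares each segment feature with one learnable proxy per class. One familiar way to turn that into a loss is a Proxy-NCA-style softmax over negative squared distances. The sketch below uses that formulation as an assumption, since the abstract does not give the paper's exact loss. The temperature `tau` and the idea of a background proxy are also illustrative.

```python
import math
from typing import List


def proxy_probs(segment: List[float], proxies: List[List[float]], tau: float = 1.0) -> List[float]:
    """Softmax over negative squared Euclidean distances to each class proxy."""
    logits = [-math.dist(segment, p) ** 2 / tau for p in proxies]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_loss(segment: List[float], proxies: List[List[float]], label: int, tau: float = 1.0) -> float:
    """Negative log-probability of the segment's class; small when it sits near its proxy."""
    return -math.log(proxy_probs(segment, proxies, tau)[label])
```

With an extra proxy reserved for background, the same loss pulls action segments toward their class proxies and away from the background proxy.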
Funding: Supported by grants from the State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory (2023XAKJ0101031), the National Natural Science Foundation of China (81971665), the Natural Science Foundation of Fujian Province (2021J011366), the Medical and Health Guidance Project of Xiamen (3502Z20214ZD1016), the Xiamen Health High-Level Talent Training Program, the Ningxia Hui Autonomous Region Key Research and Development Program (2022BEG03127), the Fundamental Research Funds for the Central Universities of China (20720210117), the Fujian Province Science and Technology Plan Guiding Project (2022Y0002), the National Natural Science Foundation of China (62005048), the Natural Science Foundation of Fujian Province (2020J01158), the Ministry of Education Industry-University Cooperative Education Project (220606053295218), and the XMU Undergraduate Innovation and Entrepreneurship Training Programs (2023X805, 2023X808, 2023Y1109).
Abstract: Laser speckle contrast imaging (LSCI) is a noninvasive, label-free technique that allows real-time investigation of the microcirculation of biological tissue. High-quality microvascular segmentation is critical for analyzing and evaluating vascular morphology and blood flow dynamics. However, high-quality vessel segmentation has always been challenging because label data are costly and complex to acquire and vascular morphology is irregular. In addition, supervised learning methods rely heavily on high-quality labels for accurate segmentation, which often necessitates extensive labeling effort. Here, we propose LSWDP, a novel approach for high-performance real-time vessel segmentation. It uses low-quality pseudo-labels for non-matched training, without relying on a large number of intricate labels or on image pairing. Furthermore, we demonstrate that on datasets of diverse styles our method is more robust than traditional segmentation approaches and more effective at mitigating performance degradation, even when confronted with unfamiliar data. Importantly, the Dice similarity coefficient exceeded 85% in a rat experiment. Our method could efficiently segment and evaluate blood vessels in both normal and disease conditions, which would greatly benefit future research in the life sciences and medicine.
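The Dice similarity coefficient quoted above measures the overlap between a predicted vessel mask and a reference mask: 2|P ∩ T| / (|P| + |T|). Here is a minimal sketch for flattened binary masks. The small `eps` is an assumed smoothing term that makes two empty masks score 1.0 instead of dividing by zero.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice coefficient of two flattened binary masks (values 0 or 1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2 * intersection + eps) / (sum(pred) + sum(target) + eps)
```

For example, a prediction [1, 1, 0, 0] against a target [1, 0, 0, 0] shares one foreground pixel out of three labelled pixels in total, giving a Dice score of about 0.667.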
Funding: Supported by the National Natural Science Foundation of China (Nos. 52008311, 51878499, and 52178433), the Science and Technology Commission of Shanghai Municipality (No. 21ZR1465700), and the Fundamental Research Funds for the Central Universities (No. 22120230196).
Abstract: Accurate and timely surveying of airfield pavement distress is crucial for cost-effective airport maintenance. Deep learning (DL) approaches, leveraging advances in computer science and image acquisition techniques, have become the mainstream for automated airfield pavement distress detection. However, fully-supervised DL methods require a large number of manually annotated ground-truth labels to achieve high accuracy. To address the challenge of limited high-quality manual annotations, we propose a novel end-to-end distress detection model called class activation map informed weakly-supervised distress detection (WSDD-CAM). Based on YOLOv5, WSDD-CAM consists of an efficient backbone, a classification branch, and a localization network. By using class activation map (CAM) information, our model significantly reduces the need for manual annotations, automatically generating pseudo bounding boxes with a 71% overlap with the ground truth. To evaluate WSDD-CAM, we tested it on a self-made dataset and compared it with other weakly-supervised and fully-supervised models. The results show that our model achieves 49.2% mean average precision (mAP), outperforming other weakly-supervised methods and even approaching state-of-the-art fully-supervised methods. Additionally, ablation experiments confirm the effectiveness of our architecture design. In conclusion, our WSDD-CAM model offers a promising solution for airfield pavement distress detection, reducing manual annotation time while maintaining high accuracy. This efficient and effective approach can significantly contribute to cost-effective airport maintenance management.
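Turning a CAM into a pseudo bounding box, as described above, is commonly done by thresholding the map and enclosing the surviving cells. The sketch below makes simplifying assumptions: a threshold at `ratio` of the peak, a single box around all surviving cells, and no connected-component filtering. These may differ from WSDD-CAM's actual procedure inside its YOLOv5-based network.

```python
from typing import List, Tuple


def cam_to_box(cam: List[List[float]], ratio: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2), with x2 and y2 exclusive, enclosing every cell
    whose activation is at least `ratio` times the map's peak."""
    peak = max(max(row) for row in cam)
    xs, ys = [], []
    for y, row in enumerate(cam):
        for x, value in enumerate(row):
            if value >= ratio * peak:
                xs.append(x)
                ys.append(y)
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```

The resulting box, scaled from CAM resolution to image resolution, could then serve as a pseudo ground truth for the detector's localization branch.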
Funding: Supported in part by the National Key R&D Program of China (2017YFB0502904) and the National Science Foundation of China (61876140).
Abstract: Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which can greatly reduce the human labor needed to collect manual annotations. However, conventional network architectures and learning objectives do not work well in this scenario, because the supervision information is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The second is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during training. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets: YouTube-VOS (video object segmentation) and DAVIS-2017 (densely annotated video segmentation). We first generate scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
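With scribble supervision, only a few pixels carry labels. The usual baseline, shown here, is a partial cross-entropy that averages the loss over scribbled pixels only and ignores the rest. This covers only the labelled-pixel part of a scribble loss. The paper's scribble-supervised loss additionally optimises unlabelled pixels, which is not reproduced. The `IGNORE = -1` marker is a hypothetical convention.

```python
import math
from typing import List

IGNORE = -1  # hypothetical marker for pixels without a scribble


def partial_cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over scribbled pixels only.

    probs: per-pixel class probability lists; labels: class index or IGNORE.
    """
    terms = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels) if y != IGNORE]
    return sum(terms) / len(terms) if terms else 0.0
```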
Funding: Supported in part by the National Cancer Institute under award numbers R01CA268287A1, U01CA269181, R01CA26820701A1, R01CA249992-01A1, R01CA202752-01A1, R01CA208236-01A1, R01CA216579-01A1, R01CA220581-01A1, R01CA257612-01A1, 1U01CA239055-01, 1U01CA248226-01, and 1U54CA254566-01; the National Heart, Lung and Blood Institute (1R01HL15127701A1, R01HL15807101A1); the National Institute of Biomedical Imaging and Bioengineering (1R43EB028736-01); VA Merit Review Award IBX004121A from the United States Department of Veterans Affairs Biomedical Laboratory Research and Development Service; the Office of the Assistant Secretary of Defense for Health Affairs, through the Breast Cancer Research Program (W81XWH-19-1-0668), the Prostate Cancer Research Program (W81XWH-20-1-0851), the Lung Cancer Research Program (W81XWH-18-1-0440, W81XWH-20-1-0595), and the Peer Reviewed Cancer Research Program (W81XWH-18-1-0404, W81XWH-21-1-0345, W81XWH-211-0160); the Kidney Precision Medicine Project (KPMP) Glue Grant; and sponsored research agreements from Bristol Myers-Squibb, Boehringer-Ingelheim, Eli-Lilly and AstraZeneca. Also supported in part by the National Natural Science Foundation of China general program (No. 61571314), the Sichuan University-Yibin City Strategic Cooperation Special Fund (No. 2020CDYB-27), and the Support Program of Sichuan Science and Technology Department (No. 2023YFS0327-LH).
Abstract: Accurate prognosis prediction is essential for guiding cancer treatment and improving patient outcomes. Recent studies have demonstrated the potential of histopathological images in survival analysis. However, existing models are typically developed in a cancer-specific manner, lack extensive external validation, and often rely on molecular data that are not routinely available in clinical practice. To address these limitations, we present PROGPATH, a unified model that integrates histopathological image features with routinely collected clinical variables to achieve pan-cancer prognosis prediction. PROGPATH employs a weakly supervised deep learning architecture built upon a foundation model for image encoding. Morphological features are aggregated through an attention-guided multiple instance learning module and fused with clinical information via a cross-attention transformer. A router-based classification strategy further refines prediction performance. PROGPATH was trained on 7,999 whole-slide images (WSIs) from 6,670 patients across 15 cancer types. It was extensively validated on 17 external cohorts totaling 7,374 WSIs from 4,441 patients, covering 12 cancer types from 8 consortia and institutions across three continents. PROGPATH achieved consistently superior performance compared with state-of-the-art multimodal prognosis prediction models. It demonstrated strong generalizability across cancer types and robustness in stratified subgroups, including early- and advanced-stage patients, treatment cohorts (radiotherapy and pharmaceutical therapy), and biomarker-defined subsets. We further provide model interpretability by identifying pathological patterns critical to PROGPATH's risk predictions, such as the degree of cell differentiation and the extent of necrosis. Together, these results highlight the potential of PROGPATH to support pan-cancer outcome prediction and inform personalized cancer management strategies.
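The fusion step above uses cross-attention: a query derived from one modality attends over features from another. The following single-head sketch illustrates the idea under stated assumptions. A clinical embedding serves as the query, and slide-level patch embeddings serve as both keys and values. Learned query/key/value projections, multiple heads, and the rest of the transformer are omitted, and none of this reproduces PROGPATH's actual architecture.

```python
import math
from typing import List, Tuple


def cross_attention(
    query: List[float], keys: List[List[float]], values: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Single-head scaled dot-product attention for one query vector.

    query: e.g. an embedded clinical vector; keys/values: e.g. patch embeddings.
    Returns the attended (fused) vector and the attention weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return fused, weights
```

Because the weights depend on the clinical query, the same slide can be summarised differently for patients with different clinical profiles.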
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62402490 and 62072334.
Abstract: We study the novel problem of weakly supervised instance action recognition (WSiAR) in multi-person (crowd) scenes. Specifically, we aim to recognize the action of each subject in the crowd. Given the expense of large-scale annotations for training, we propose a weakly supervised method. This problem is of great practical value for video surveillance and sports scene analysis. To this end, we investigate and design a series of weak annotations for supervising WSiAR. We propose two categories of weak label settings, bag labels and sparse labels, to significantly reduce the number of labels required. For bag labels, we propose a novel sub-block-aware multi-instance learning (MIL) loss that extracts more effective information from weak labels during training. For sparse labels, we propose a pseudo-label generation strategy that extends them. This enables our method to achieve results comparable to those of fully supervised methods with significantly fewer annotations. Experimental results on two benchmarks verify the soundness of the problem definition and the effectiveness of the proposed weakly supervised training method.
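The bag-label setting above builds on multi-instance learning: a bag (for example, a group of people) is labelled positive if at least one instance performs the action. The sketch below shows the textbook max-pooling MIL loss only, for one action class. The paper's sub-block-aware MIL loss is more elaborate and is not reproduced here.

```python
import math
from typing import List


def mil_bag_loss(instance_probs: List[float], bag_label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy on a bag score obtained by max-pooling instance scores.

    instance_probs: per-person probabilities of performing the action.
    bag_label: 1 if at least one person in the bag performs it, else 0.
    """
    bag_prob = max(instance_probs)  # a bag is as positive as its most confident instance
    bag_prob = min(max(bag_prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(bag_label * math.log(bag_prob) + (1 - bag_label) * math.log(1 - bag_prob))
```

Max-pooling only sends gradient to the highest-scoring person in each bag. This is one reason richer bag-level losses, such as the sub-block-aware variant, can extract more information from the same weak labels.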