Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. The Pseudo-Siamese structure-based disparity regression model simplifies left-right image comparison, improving both accuracy and efficiency. The model was evaluated on a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate a significant improvement in the network's resistance to lighting interference without a substantial increase in parameters. Moreover, the model exhibits faster convergence during training, contributing to overall performance. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Units (RSUs), ensuring timely services. Our previous work, the FLSimCo algorithm, which uses local resources for federated Self-Supervised Learning (SSL), has a limitation: vehicles often cannot complete all iteration tasks. The improved algorithm offloads partial tasks to RSUs and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. An offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves the offloading efficiency and accuracy of federated SSL.
Recent years have witnessed significant progress in deep learning for remote sensing image Super-Resolution (SR). However, in real-world applications, paired data is often unavailable, making supervised training infeasible, while unknown degradation factors constrain reconstruction performance and impair detail recovery. To this end, we propose a Degradation-Adaptive Self-supervised SR method, named DASSR, which recovers high-fidelity details from low-resolution remote sensing images without requiring supervision from high-resolution ground truth. DASSR employs a dual-path closed-loop architecture, enabling joint learning of SR reconstruction and blur kernel estimation through cycle consistency in the main branch and regularization in the auxiliary branch. Specifically, we incorporate an Edge-Preserving SR Network (EPSRN) into DASSR, whose core Hybrid Attention Enhancement Block (HAEB) captures precise structural representations to guide accurate detail reconstruction. Furthermore, a composite loss function is designed, integrating spatial reconstruction consistency, frequency-domain spectrum alignment, and kernel sparsity constraints to ensure stable and efficient self-supervised learning. Experiments on both simulated and real-world remote sensing datasets demonstrate that DASSR outperforms competitive deep learning-based SR methods, achieving approximately 9% and 15% improvements in the Average Gradient (AG) and Spatial Frequency (SF) metrics, respectively, over the best-performing competitor.
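The closed-loop cycle consistency described above presumably rests on the standard blind-SR degradation model, LR = (HR ⊛ k)↓s, in which the blur kernel k is jointly estimated. Below is a minimal NumPy sketch of that forward model; the Gaussian kernel and scale factor are illustrative assumptions, not the kernels DASSR estimates.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian blur kernel (illustrative stand-in for the estimated kernel)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(hr, kernel, scale=2):
    """Degradation model assumed in self-supervised SR: blur the HR image, then subsample."""
    pad = kernel.shape[0] // 2
    padded = np.pad(hr, pad, mode="edge")
    blurred = np.zeros_like(hr, dtype=float)
    h, w = hr.shape
    for i in range(h):                      # direct (slow but dependency-free) 2-D convolution
        for j in range(w):
            patch = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            blurred[i, j] = np.sum(patch * kernel)
    return blurred[::scale, ::scale]        # spatial subsampling by `scale`
```

In a cycle-consistent setup, the SR network's output is pushed through this operator and compared against the observed low-resolution input.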
Importance: Precisely decoding brain dysfunction from high-dimensional functional recordings is crucial for advancing our understanding of brain disorders. Self-supervised learning (SSL) models offer a transformative approach for mapping dependencies in functional neuroimaging data. By leveraging the intrinsic organization of brain signals for comprehensive feature extraction, these models enable the analysis of critical neurofunctional features within a clinically relevant framework, overcoming challenges related to data heterogeneity and the scarcity of labeled data. Highlight: This paper provides a comprehensive overview of SSL techniques applied to functional neuroimaging data, such as functional magnetic resonance imaging and electroencephalography, with a specific focus on their applications in neuropsychiatric disorders. We discuss three main categories of SSL methods: contrastive learning, generative learning, and generative-contrastive methods, outlining their basic principles and representative approaches. Critically, we highlight the potential of SSL in addressing data scarcity, multimodal integration, and dynamic network modeling for disease detection and prediction. We showcase successful applications of these techniques in understanding and classifying conditions such as Alzheimer's disease, Parkinson's disease, and epilepsy, demonstrating their potential in downstream neuropsychological applications. Conclusion: SSL models provide a scalable and effective methodology for individual-level detection and prediction in brain disorders. Despite current limitations in interpretability and data heterogeneity, the potential of SSL for future clinical applications, particularly in transdiagnostic psychosis subtyping and decoding task-based brain functional recordings, is substantial.
The authors regret that there were errors in the affiliations and the funding declaration in the original published version. Affiliations a and b of the original manuscript are "School of Information Engineering, Jiangxi Provincial Key Laboratory of Advanced Signal Processing and Intelligent Communications, Nanchang University, Nanchang 330031, China" and "School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China", respectively. The order of the two affiliations is not correct.
Objective: Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance heavily depends on large quantities of annotated images. To mitigate annotation cost, cold-start active learning (AL), comprising an initialization stage followed by subsequent learning, selects a small subset of informative data points for labeling. Recent pretrained models, produced by supervised or self-supervised learning tailored to chest radiographs, have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored. Methods: To validate the efficacy of domain-specific pretraining, we compared two foundation models, the supervised TXRV and the self-supervised REMEDIS, with their general-domain counterparts pretrained on ImageNet. Model performance was evaluated at both the initialization and subsequent learning stages on two diagnostic tasks: pediatric pneumonia and COVID-19. For initialization, we assessed their integration with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by different pretrained models. We also conducted statistical tests to compare the foundation models with their ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and investigate the influence of class balance in initialization samples on both stages. Results: First, domain-specific foundation models failed to outperform their ImageNet counterparts in six of eight experiments on informative sample selection. Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential to reduce experts' repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that class balance is more strongly associated with performance at middle budget levels than at low or high budgets. Conclusions: In this study, we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL. We also identified promising outcomes, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential for one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.
In this paper, we propose a sub-6 GHz channel-assisted hybrid beamforming (HBF) scheme for mmWave systems under both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios, without mmWave channel estimation. We resort to a self-supervised approach to eliminate the need for labels, avoiding the high cost of data collection and annotation. We first construct a dense connection network (DCnet) with three modules: a feature extraction module that extracts channel characteristics from a large amount of channel data, a feature fusion module that combines multidimensional features, and a prediction module that generates the HBF matrices. Next, we establish a lightweight network architecture, named LDnet, to reduce the number of model parameters and the computational complexity. The proposed sub-6 GHz assisted approach eliminates mmWave pilot resources compared to methods using mmWave channel information directly. Simulation results indicate that the proposed DCnet and LDnet achieve spectral efficiency superior to the traditional orthogonal matching pursuit (OMP) algorithm by 13.66% and 10.44% under LOS scenarios and by 32.35% and 27.75% under NLOS scenarios, respectively. Moreover, LDnet achieves a 98.52% reduction in the number of model parameters and a 22.93% reduction in computational complexity compared to DCnet.
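The OMP baseline referenced above is the classical greedy sparse-recovery routine: repeatedly pick the dictionary column most correlated with the residual, then re-fit on the selected support. A generic real-valued NumPy sketch follows; the paper's beamforming variant would operate on complex array-response dictionaries, which this simplification omits.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select k columns of A to approximate y."""
    residual = y.astype(float).copy()
    support = []
    coef = np.array([])
    for _ in range(k):
        # pick the column most correlated with the current residual
        idx = int(np.argmax(np.abs(A.T @ residual)))
        support.append(idx)
        # least-squares re-fit on the selected support
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)
        residual = y - As @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

With an orthonormal dictionary and a noiseless k-sparse signal, this recovers the signal exactly; in the beamforming setting the recovered sparse coefficients would select steering vectors for the analog precoder.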
State of health (SoH) estimation plays a key role in smart battery health prognostics and management. However, poor generalization, a lack of labeled data, and unused measurements during aging remain major challenges to accurate SoH estimation. To this end, this paper proposes a self-supervised learning framework to boost the performance of battery SoH estimation. Unlike traditional data-driven methods, which rely on a considerable training dataset obtained from numerous battery cells, the proposed method achieves accurate and robust estimation using limited labeled data. A filter-based data preprocessing technique, which enables the extraction of partial capacity-voltage curves under dynamic charging profiles, is applied first. Unsupervised learning is then used to learn aging characteristics from the unlabeled data through an auto-encoder-decoder. The learned network parameters are transferred to the downstream SoH estimation task and fine-tuned with very few sparsely labeled data points, which boosts the performance of the estimation framework. The proposed method has been validated under different battery chemistries, formats, operating conditions, and ambient temperatures. Estimation accuracy can be guaranteed using only three labeled data points from the initial 20% of life cycles, with overall errors of less than 1.14%, error distributions across all testing scenarios maintained below 4%, and robustness that increases with aging. Comparisons with purely supervised machine learning methods demonstrate the superiority of the proposed method. This simple and data-efficient estimation framework is promising for real-world applications under a variety of scenarios.
By automatically learning the priors embedded in images with powerful modelling capabilities, deep learning-based algorithms have recently made considerable progress in reconstructing high-resolution hyperspectral (HR-HS) images. With previously collected large amounts of external data, these methods are typically realised under full supervision of the ground-truth data. Thus, database construction for the research paradigm of merging a low-resolution HS (LR-HS) image with an HR multispectral (MS) or RGB image, commonly named HSI SR, requires simultaneously collecting corresponding training triplets: HR-MS (RGB), LR-HS and HR-HS images, which is often difficult in reality. Models learned from training datasets collected under controlled conditions may exhibit significantly degraded super-resolved performance on real images captured in diverse environments. To handle these limitations, the authors propose to leverage deep internal and self-supervised learning to solve the HSI SR problem. The authors advocate that it is possible to train a specific CNN model at test time, called deep internal learning (DIL), by online preparation of the training triplet samples from the observed LR-HS/HR-MS (or RGB) images and the down-sampled LR-HS version. However, the number of training triplets extracted solely from the transformed data of the observation itself is extremely small, particularly for HSI SR tasks with large spatial upscale factors, which would limit reconstruction performance. To solve this problem, the authors further exploit deep self-supervised learning (DSL) by treating the observations as unlabelled training samples. Specifically, the degradation modules inside the network are elaborated to realise the spatial and spectral down-sampling procedures that transform the generated HR-HS estimate into the high-resolution RGB/LR-HS approximation, and the reconstruction errors of the observations are then formulated to measure the network's modelling performance. By consolidating DIL and DSL into a unified deep framework, the authors construct a more robust HSI SR method that requires no prior training and has great potential for flexible adaptation to different settings per observation. To verify the effectiveness of the proposed approach, extensive experiments were conducted on two benchmark HS datasets, the CAVE and Harvard datasets, demonstrating a substantial performance gain of the proposed method over state-of-the-art methods.
We propose a self-supervised learning framework for finding the dominant eigenfunction-eigenvalue pairs of linear and self-adjoint operators. We represent target eigenfunctions with coordinate-based neural networks and employ Fourier positional encodings to enable the approximation of high-frequency modes. We formulate a self-supervised training objective for spectral learning and propose a novel regularization mechanism to ensure that the network finds the exact eigenfunctions instead of a space spanned by the eigenfunctions. Furthermore, we investigate the effect of weight normalization as a mechanism to alleviate the risk of recovering linearly dependent modes, allowing us to accurately recover a large number of eigenpairs. The effectiveness of our methods is demonstrated across a collection of representative benchmarks, including both local and non-local diffusion operators, as well as high-dimensional time-series data from a video sequence. Our results indicate that the present algorithm can outperform competing approaches in terms of both approximation accuracy and computational cost.
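The Fourier positional encoding mentioned above maps each input coordinate to sinusoids of geometrically spaced frequencies, which lets a coordinate MLP fit high-frequency eigenfunctions that a raw-coordinate input cannot. A minimal NumPy sketch for scalar coordinates; the band count and base frequency are illustrative choices, not the paper's settings.

```python
import numpy as np

def fourier_features(x, num_bands=4):
    """Encode scalar coordinates as [sin(2^k*pi*x), cos(2^k*pi*x)] for k = 0..num_bands-1."""
    x = np.atleast_1d(x).astype(float)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi   # geometrically spaced frequencies
    ang = x[:, None] * freqs[None, :]               # shape (N, num_bands)
    # concatenate sine and cosine channels -> shape (N, 2 * num_bands)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
```

The encoded coordinates, rather than the raw ones, are fed into the coordinate-based network that represents each eigenfunction.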
Reliable and automated three-dimensional segmentation of plant organs is essential for extracting phenotypic traits at the organ level. However, existing methods for plant organ segmentation predominantly rely on fully supervised learning, which necessitates extensive point-by-point annotated datasets and fails to overcome the challenges associated with annotating plant point cloud data. In recent years, self-supervised point cloud segmentation methods have garnered widespread attention in both industry and academia because of their potential to alleviate the difficulty of point cloud annotation. In this study, the paradigm of self-supervised learning is applied to plant phenotyping through the development of Plant-MAE, a self-supervised point cloud segmentation framework. The innovations of Plant-MAE include a kernel-based point convolution embedding module and a multi-angle feature extraction block (MAFEB) based on attention mechanisms. To validate the model, extensive experiments were conducted on multiple point cloud datasets, achieving competitive performance with average precision, recall, F1 score, and IoU values of 92.08%, 88.50%, 89.80%, and 84.03%, respectively. Plant-MAE outperforms advanced deep learning networks, including PointNet++, Point Transformer, and Point-M2AE, achieving average improvements of at least 0.53%, 1.36%, 0.88%, and 2.38% in precision, recall, F1 score, and IoU, respectively. Additionally, on the Pheno4D dataset, only half of the training data was necessary for fine-tuning to achieve performance comparable to that of Point Transformer and PointNet++. This study provides technical support for the estimation of crop phenotypic parameters, thereby advancing the development of modern smart agriculture.
In this paper, a cross-sensor generative self-supervised learning network is proposed for multi-sensor fault detection. By modeling the sensor signals in multiple dimensions to mine correlation information between channels for the pretext task, the shared features between multi-sensor data can be captured and the gap between channel data features reduced. Meanwhile, to model fault features in the downstream task, a salience module is developed that optimizes cross-sensor data features based on a small amount of labeled data, making warning feature information prominent and improving separator accuracy. Finally, experimental results on the public FEMTO-ST dataset and the private SMT shock absorber dataset (SMT-SA) show that the proposed method performs favorably against other state-of-the-art methods.
The rapid integration of Internet of Things (IoT) technologies is reshaping the global energy landscape by deploying smart meters that enable high-resolution consumption monitoring, two-way communication, and advanced metering infrastructure services. However, this digital transformation also exposes power systems to evolving threats, ranging from cyber intrusions and electricity theft to device malfunctions, and the unpredictable nature of these anomalies, coupled with the scarcity of labeled fault data, makes real-time detection exceptionally challenging. To address these difficulties, a real-time decision support framework is presented for smart meter anomaly detection that leverages rolling time windows and two self-supervised contrastive learning modules. The first module synthesizes diverse negative samples to overcome the lack of labeled anomalies, while the second captures intrinsic temporal patterns for enhanced contextual discrimination. The end-to-end framework continuously updates its model with rolling meter data to deliver timely identification of emerging abnormal behaviors in evolving grids. Extensive evaluations on eight publicly available smart meter datasets across seven diverse abnormal patterns demonstrate the effectiveness of the proposed framework, which achieves an average recall and F1 score above 0.85.
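The rolling time windows driving the framework above amount to strided segmentation of the meter reading stream, with each window becoming one training or scoring sample. A minimal NumPy sketch; the window length and stride below are illustrative, not the paper's settings.

```python
import numpy as np

def rolling_windows(series, window, stride=1):
    """Segment a 1-D meter reading stream into (possibly overlapping) fixed-length windows."""
    n = (len(series) - window) // stride + 1
    # index matrix: row i covers positions [i*stride, i*stride + window)
    idx = np.arange(window)[None, :] + stride * np.arange(n)[:, None]
    return series[idx]
```

As new readings arrive, the newest windows are appended and the model is refreshed on them, which is what keeps detection current in an evolving grid.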
Scanning Electron Microscopes (SEMs) are widely used in experimental science laboratories, often requiring cumbersome and repetitive user analysis. Automating SEM image analysis is highly desirable to address this challenge. In particle sample analysis, Machine Learning (ML) has emerged as the most effective approach for particle segmentation. However, the time-intensive process of manually annotating thousands of SEM images limits the applicability of supervised learning approaches. Self-Supervised Learning (SSL) offers a promising alternative by enabling knowledge extraction from raw, unlabeled data. This study presents a framework for evaluating SSL techniques in SEM image analysis, focusing on novel methods leveraging the ConvNeXtV2 architecture for particle detection. A dataset comprising 25,000 SEM images is curated to benchmark the proposed SSL methods. The results demonstrate that ConvNeXtV2 models, with varying parameter counts, consistently outperform other techniques in particle detection across different length scales, achieving up to a 34% reduction in relative error compared to established SSL methods. Furthermore, an ablation study explores the relationship between dataset size and SSL performance, providing actionable insights for practitioners regarding model selection and resource efficiency. This research advances the integration of SSL into autonomous analysis pipelines and supports its application in accelerating materials science discovery.
The collection and annotation of large-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems. While self-supervised learning (SSL) has emerged as a promising approach for leveraging unannotated data, current SSL methods face two critical challenges in bird species recognition: (1) long-tailed data distributions that result in poor performance on underrepresented species; and (2) domain shift caused by the data augmentation strategies designed to mitigate class imbalance. Here we present SDNet, a novel SSL-based bird recognition framework that integrates diffusion models with large language models (LLMs) to overcome these limitations. SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy, morphological attributes, and habitat information, producing detailed natural language priors that capture fine-grained visual characteristics (e.g., plumage patterns, body proportions, and distinctive markings). These textual descriptions are then used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process, ensuring that generated images preserve species-specific morphological details while maintaining photorealistic quality. Additionally, we incorporate a Swin Transformer as the feature extraction backbone, whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing fine-grained discriminative patterns (such as beak shape and feather texture) while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources. SDNet is validated on both a self-constructed dataset (Bird_BXS) and a publicly available benchmark (Birds_25), demonstrating substantial improvements over conventional SSL approaches. Our results indicate that the synergistic integration of LLMs, diffusion models, and the Swin Transformer architecture contributes significantly to recognition accuracy, particularly for rare and morphologically similar species. These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
Automated grading of dandruff severity is a clinically significant but challenging task due to the inherent ordinal nature of severity levels and the high prevalence of label noise from subjective expert annotations. Standard classification methods fail to address these dual challenges, limiting their real-world performance. In this paper, a novel three-phase training framework is proposed that learns a robust ordinal classifier directly from noisy labels. The approach synergistically combines a rank-based ordinal regression backbone with a cooperative, semi-supervised learning strategy to dynamically partition the data into clean and noisy subsets. A hybrid training objective is then employed, applying a supervised ordinal loss to the clean set. The noisy set is simultaneously trained with a dual objective that combines a semi-supervised ordinal loss with a parallel, label-agnostic contrastive loss. This design allows the model to learn from the entire noisy subset while using contrastive learning to mitigate the risk of error propagation from potentially corrupt supervision. Extensive experiments on a new, large-scale, multi-site clinical dataset validate our approach. The method achieves state-of-the-art performance with 80.71% accuracy and a 76.86% F1-score, significantly outperforming existing approaches, including a 2.26% improvement over the strongest baseline method. This work provides not only a robust solution for a practical medical imaging problem but also a generalizable framework for other tasks plagued by noisy ordinal labels.
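Rank-based ordinal regression backbones of the kind described above are commonly of the rank-consistent (CORAL-style) family: a K-grade problem becomes K-1 cumulative binary decisions "severity > k", and the predicted grade counts the thresholds passed. The abstract does not name the exact formulation, so the NumPy sketch below illustrates only that generic decoding step, with hypothetical model logits.

```python
import numpy as np

def ordinal_grade(logits):
    """Rank-consistent ordinal decoding.

    `logits` holds K-1 values, one per cumulative threshold 'severity > k';
    the predicted grade is the number of thresholds whose probability exceeds 0.5.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid per threshold
    return int((probs > 0.5).sum())
```

Compared with plain softmax classification, this decoding respects the ordering of severity levels: passing threshold k implies the sample is at least grade k.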
Recently, deep learning-based fingerprint localization has attracted significant interest due to its simplicity of implementation and effectiveness in complex multipath environments, especially for Internet of Things (IoT) devices in multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. However, the huge amount of training data to be collected has become a challenge that heavily increases the labor burden of fingerprint localization and hinders its large-scale implementation. In this paper, we propose a novel fingerprint localization system, termed SiamResNet, which can be trained on the radio map alone by contrastive self-supervised learning, without the need for any additional data. More specifically, we first model the fingerprint localization problem as a dictionary look-up task. Subsequently, a channel fingerprint capturing the multipath angle and delay of wireless propagation is introduced, which exhibits excellent uniqueness, stability, and distinguishability. Meanwhile, we propose a corresponding data augmentation strategy to ensure data diversity when generating training data from the radio map, so the cost of data collection for training can be significantly reduced. Lastly, the Siamese-architecture-based SiamResNet is applied for location estimation; it comprehensively extracts fingerprint features and accurately compares the similarity of any fingerprint to the radio map in the representation space. The performance of the proposed localization method is validated through extensive simulations with a ray-tracing channel model, which demonstrate promising localization accuracy for SiamResNet with reduced training costs.
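The dictionary look-up formulation above amounts to nearest-neighbor retrieval over the radio map in a representation space. A minimal NumPy sketch using cosine similarity on raw fingerprint vectors; SiamResNet would instead compare learned embeddings produced by the Siamese encoder, which this sketch omits.

```python
import numpy as np

def locate(fingerprint, radio_map, positions):
    """Dictionary look-up: return the position whose stored fingerprint is most
    similar (by cosine similarity) to the query fingerprint.

    radio_map : (M, D) array, one stored fingerprint per reference point
    positions : (M, P) array of the corresponding reference-point coordinates
    """
    q = fingerprint / np.linalg.norm(fingerprint)
    m = radio_map / np.linalg.norm(radio_map, axis=1, keepdims=True)
    return positions[int(np.argmax(m @ q))]  # best-matching dictionary entry
```

In the learned version, both the query and the radio-map entries would first pass through the shared encoder, so that similarity is measured where fingerprints of nearby locations cluster together.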
The accurate segmentation of deep gray matter nuclei is critical for neuropathological research, disease diagnosis, and treatment. Existing methods employ supervised training, which requires large labeled datasets that are challenging and time-consuming to obtain for medical image analysis. In addition, methods based on convolutional neural networks (CNNs) achieve only suboptimal performance due to the locality of convolutional operations. Vision Transformers (ViTs) efficiently model long-range dependencies and thus have the potential to outperform these methods in segmentation tasks. To address these issues, we propose a novel hybrid network based on self-supervised pre-training for deep gray matter nuclei segmentation. Specifically, we present a CNN-Transformer hybrid network (CTNet), whose encoder consists of a 3D CNN and a ViT to learn local spatially detailed features and global semantic information. A self-supervised learning (SSL) approach that integrates rotation prediction and masked feature reconstruction is proposed to pre-train CTNet, enabling the model to learn valuable visual representations from unlabeled data. We evaluate the effectiveness of our method on 3T and 7T human brain MRI datasets. The results demonstrate that CTNet achieves better performance than comparison models and that our pre-training strategy outperforms other advanced self-supervised methods. When the training set has only one sample, the pre-trained CTNet enhances segmentation performance, showing an 8.4% improvement in Dice similarity coefficient (DSC) compared to the randomly initialized CTNet.
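The rotation-prediction pretext task used in the pre-training above can be illustrated in 2D: rotate an unlabeled image by a random multiple of 90 degrees and ask the network to classify which multiple was applied, so the label comes for free. The paper applies the idea to 3D MRI volumes; this NumPy sketch only generates the 2D self-supervised sample and label.

```python
import numpy as np

def make_rotation_task(image, rng):
    """Rotation-prediction pretext sample: rotate by k*90 degrees; the label is k in {0,1,2,3}."""
    k = int(rng.integers(4))          # free label drawn at random
    return np.rot90(image, k), k      # (rotated input, classification target)
```

During pre-training the encoder is trained to predict k from the rotated image, which forces it to learn orientation-sensitive anatomical structure without any manual annotation.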
Research on reconstructing imperfect faces is a challenging task.In this study,we explore a data-driven approach using a pre-trained MICA(MetrIC fAce)model combined with 3D printing to address this challenge.We propos...Research on reconstructing imperfect faces is a challenging task.In this study,we explore a data-driven approach using a pre-trained MICA(MetrIC fAce)model combined with 3D printing to address this challenge.We propose a training strategy that utilizes the pre-trained MICA model and self-supervised learning techniques to improve accuracy and reduce the time needed for 3D facial structure reconstruction.Our results demonstrate high accuracy,evaluated by the geometric loss function and various statistical measures.To showcase the effectiveness of the approach,we used 3D printing to create a model that covers facial wounds.The findings indicate that our method produces a model that fits well and achieves comprehensive 3D facial reconstruction.This technique has the potential to aid doctors in treating patients with facial injuries.展开更多
Over the last decade, supervised deep learning on manually annotated big data has been progressing significantly on computer vision tasks. But, the application of deep learning in medical image analysis is limited by ...Over the last decade, supervised deep learning on manually annotated big data has been progressing significantly on computer vision tasks. But, the application of deep learning in medical image analysis is limited by the scarcity of high-quality annotated medical imaging data. An emerging solution is self-supervised learning (SSL), among which contrastive SSL is the most successful approach to rivalling or outperforming supervised learning. This review investigates several state-of-the-art contrastive SSL algorithms originally on natural images as well as their adaptations for medical images, and concludes by discussing recent advances, current limitations, and future directions in applying contrastive SSL in the medical domain.展开更多
基金Supported by Sichuan Science and Technology Program(2023YFSY0026,2023YFH0004)Supported by the Institute of Information&Communications Technology Planning&Evaluation(IITP)grant funded by the Korean government(MSIT)(No.RS-2022-00155885,Artificial Intelligence Convergence Innovation Human Resources Development(Hanyang University ERICA)).
文摘Two-dimensional endoscopic images are susceptible to interferences such as specular reflections and monotonous texture illumination,hindering accurate three-dimensional lesion reconstruction by surgical robots.This study proposes a novel end-to-end disparity estimation model to address these challenges.Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions,integrating multi-scale image information to enhance robustness against lighting interferences.This study introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison,improving accuracy and efficiency.The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot,comprising simulated silicone heart sequences and real heart video data.Experimental results demonstrate significant improvement in the network’s resistance to lighting interference without substantially increasing parameters.Moreover,the model exhibited faster convergence during training,contributing to overall performance enhancement.This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
文摘Intelligent Transportation Systems(ITS)leverage Integrated Sensing and Communications(ISAC)to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles(IoV).This integration inevitably increases computing demands,risking real-time system stability.Vehicle Edge Computing(VEC)addresses this by offloading tasks to Road Side Units(RSUs),ensuring timely services.Our previous work,the FLSimCo algorithm,which uses local resources for federated Self-Supervised Learning(SSL),has a limitation:vehicles often can’t complete all iteration tasks.Our improved algorithm offloads partial tasks to RSUs and optimizes energy consumption by adjusting transmission power,CPU frequency,and task assignment ratios,balancing local and RSU-based training.Meanwhile,setting an offloading threshold further prevents inefficiencies.Simulation results show that the enhanced algorithm reduces energy consumption and improves offloading efficiency and accuracy of federated SSL.
基金National Natural Science Foundation of China(Nos.42501465,42471504)。
文摘Recent years have witnessed significant progress in deep learning for remote sensing image Super-Resolution(SR).However,in real-world applications,paired data is often unavailable,making supervised training infeasible,while unknown degradation factors constrain reconstruction performance and impair detail recovery.To this end,we propose a Degradation-Adaptive Self-supervised SR method,named DASSR,which recovers high-fidelity details from low-resolution remote sensing images without requiring supervision from high-resolution ground truth.DASSR employs a dual-path closed-loop architecture,enabling joint learning of SR reconstruction and blur kernel estimation through cycle consistency in the main branch and regularization in the auxiliary branch.Specifically,we incorporate an Edge-Preserving SR network(EPSRN)into DASSR,whose core Hybrid Attention Enhancement Block(HAEB)captures precise structural representations to guide accurate detail reconstruction.Furthermore,a composite loss function is designed,integrating spatial reconstruction consistency,frequency-domain spectrum alignment,and kernel sparsity constraints to ensure stable and efficient self-supervised learning.Experiments on both simulated and real-world remote sensing datasets demonstrate that the proposed DASSR method outperforms competitive deep learning-based SR methods,notably achieving approximately 9%and 15%improvements in the Average Gradient(AG)and Spatial Frequency(SF)metrics,respectively,over the best-performing competitor.
基金supported by grants from the National Natural Science Foundation of P.R.China(62276081 and 62106113)Guangdong Basic and Applied Basic Research Foundation(2023A1515010792 and 2023B1515120065)Shenzhen Science and Technology Program(GXWD20231129121139001 and JCYJ20240813110522029).
文摘Importance:Precisely decoding brain dysfunction from high-dimensional functional recordings is crucial for advancing our understanding of brain dysfunction in brain disorders.Self-supervised learning(SSL)models offer a transformative approach for mapping dependencies in functional neuroimaging data.Leveraging the intrinsic organization of brain signals for comprehensive feature extraction,these models enable the analysis of critical neurofunctional features within a clinically relevant framework,overcoming challenges related to data heterogeneity and the scarcity of labeled data.Highlight:This paper provides a comprehensive overview of SSL techniques applied to functional neuroimaging data,such as functional magnetic resonance imaging and electroencephalography,with a specific focus on their applications in various neuropsychiatric disorders.We discuss 3 main categories of SSL methods:contrastive learning,generative learning,and generative-contrastive methods,outlining their basic principles and representative methods.Critically,we highlight the potential of SSL in addressing data scarcity,multimodal integration,and dynamic network modeling for disease detection and prediction.We showcase successful applications of these techniques in understanding and classifying conditions such as Alzheimer’s disease,Parkinson’s disease,and epilepsy,demonstrating their potential in downstream neuropsychological applications.Conclusion:SSL models provide a scalable and effective methodology for individual detection and prediction in brain disorders.Despite current limitations in interpretability and data heterogeneity,the potential of SSL for future clinical applications,particularly in the areas of transdiagnostic psychosis subtyping and decoding task-based brain functional recordings,is substantial.
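Among the three SSL families surveyed above,the contrastive branch is the easiest to make concrete.The sketch below is a toy,pure-Python InfoNCE-style loss;the vectors,temperature,and the choice of cosine similarity are illustrative assumptions,not taken from any reviewed method:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Toy InfoNCE loss: pull the anchor toward its positive view and
    push it away from negatives, all in cosine-similarity space."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature] + [
        cos(anchor, n) / temperature for n in negatives
    ]
    # Softmax cross-entropy with the positive pair as the "correct class".
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]

# The loss shrinks when the positive view is more similar than the negatives.
close = info_nce([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
far   = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Generative methods instead reconstruct masked or corrupted input;generative-contrastive methods combine both objectives.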
文摘The authors regret that there were errors in the affiliations and the funding declaration in the original published version.The affiliations a and b of the original manuscript are"School of Information Engineering,Jiangxi Provincial Key Laboratory of Advanced Signal Processing and Intelligent Communications,Nanchang University,Nanchang 330031,China",and"School of Internet of Things Engineering,Jiangnan University,Wuxi 214122,China",respectively.The order of these two affiliations was incorrect.
文摘Objective:Deep learning(DL)has become the prevailing method in chest radiograph analysis,yet its performance heavily depends on large quantities of annotated images.To mitigate the cost,cold-start active learning(AL),comprising an initialization followed by subsequent learning,selects a small subset of informative data points for labeling.Recent advancements in pretrained models tailored to chest radiographs,built by supervised or self-supervised learning,have shown broad applicability to diverse downstream tasks.However,their potential in cold-start AL remains unexplored.Methods:To validate the efficacy of domain-specific pretraining,we compared two foundation models:supervised TXRV and self-supervised REMEDIS with their general domain counterparts pretrained on ImageNet.Model performance was evaluated at both initialization and subsequent learning stages on two diagnostic tasks:psychiatric pneumonia and COVID-19.For initialization,we assessed their integration with three strategies:diversity,uncertainty,and hybrid sampling.For subsequent learning,we focused on uncertainty sampling powered by different pretrained models.We also conducted statistical tests to compare the foundation models with ImageNet counterparts,investigate the relationship between initialization and subsequent learning,examine the performance of one-shot initialization against the full AL process,and investigate the influence of class balance in initialization samples on initialization and subsequent learning.Results:First,domain-specific foundation models failed to outperform ImageNet counterparts in six out of eight experiments on informative sample selection.Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios.However,pretrained model-based initialization surpassed random sampling,the default approach in cold-start AL.Second,initialization performance was positively
correlated with subsequent learning performance,highlighting the importance of initialization strategies.Third,one-shot initialization performed comparably to the full AL process,demonstrating the potential to reduce experts' repeated waiting during AL iterations.Last,a U-shaped correlation was observed between the class balance of initialization samples and model performance,suggesting that the class balance is more strongly associated with performance at middle budget levels than at low or high budgets.Conclusions:In this study,we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL.We also identified promising outcomes related to cold-start AL,including initialization based on pretrained models,the positive influence of initialization on subsequent learning,the potential for one-shot initialization,and the influence of class balance on middle-budget AL.Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.
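Uncertainty sampling,the subsequent-learning strategy evaluated above,reduces to ranking the unlabeled pool by predictive entropy.A minimal sketch(the pool probabilities and budget below are made up for illustration):

```python
import math

def uncertainty_sample(pool_probs, budget):
    """Entropy-based uncertainty sampling: pick the unlabeled cases whose
    predicted class distribution is closest to uniform."""
    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:budget]

# Four pool cases with binary class probabilities; the two most
# ambiguous ones are selected for expert labeling.
pool = [[0.99, 0.01], [0.5, 0.5], [0.6, 0.4], [0.95, 0.05]]
picked = uncertainty_sample(pool, 2)
```

Diversity sampling would instead cluster the pool in representation space and pick one case per cluster.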
基金supported in part by the National Natural Science Foundation of China under Grants 62325107,62341107,62261160650,and U23A20272in part by the Beijing Natural Science Foundation under Grant L222002.
文摘In this paper,we propose a sub-6GHz channel assisted hybrid beamforming(HBF)for mmWave systems under both line-of-sight(LOS)and non-line-of-sight(NLOS)scenarios without mmWave channel estimation.Meanwhile,we resort to the self-supervised approach to eliminate the need for labels,thus avoiding the accompanying high cost of data collection and annotation.We first construct the dense connection network(DCnet)with three modules:the feature extraction module for extracting channel characteristics from a large amount of channel data,the feature fusion module for combining multidimensional features,and the prediction module for generating the HBF matrices.Next,we establish a lightweight network architecture,named LDnet,to reduce the number of model parameters and computational complexity.The proposed sub-6GHz assisted approach eliminates mmWave pilot resources compared to the method using mmWave channel information directly.The simulation results indicate that the proposed DCnet and LDnet can achieve spectral efficiency superior to the traditional orthogonal matching pursuit(OMP)algorithm by 13.66% and 10.44% under LOS scenarios and by 32.35% and 27.75% under NLOS scenarios,respectively.Moreover,the LDnet achieves a 98.52% reduction in the number of model parameters and a 22.93% reduction in computational complexity compared to DCnet.
基金funded by the “SMART BATTERY” project, granted by Villum Foundation in 2021 (project number 222860)。
文摘State of health(SoH) estimation plays a key role in smart battery health prognostics and management.However,poor generalization,lack of labeled data,and unused measurements during aging are still major challenges to accurate SoH estimation.Toward this end,this paper proposes a self-supervised learning framework to boost the performance of battery SoH estimation.Different from traditional data-driven methods which rely on a considerable training dataset obtained from numerous battery cells,the proposed method achieves accurate and robust estimations using limited labeled data.A filter-based data preprocessing technique,which enables the extraction of partial capacity-voltage curves under dynamic charging profiles,is applied at first.Unsupervised learning is then used to learn the aging characteristics from the unlabeled data through an auto-encoder-decoder.The learned network parameters are transferred to the downstream SoH estimation task and are fine-tuned with very few sparsely labeled data,which boosts the performance of the estimation framework.The proposed method has been validated under different battery chemistries,formats,operating conditions,and ambient conditions.The estimation accuracy can be guaranteed by using only three labeled samples from the initial 20% life cycles,with overall errors below 1.14%,errors in all testing scenarios remaining below 4%,and robustness that increases with aging.Comparisons with other pure supervised machine learning methods demonstrate the superiority of the proposed method.This simple and data-efficient estimation framework is promising in real-world applications under a variety of scenarios.
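The filter-based preprocessing step can be pictured as windowing plus monotonicity filtering of raw charging samples.The sketch below is an illustrative stand-in:the fixed voltage window and the monotone filter are assumptions,not the paper's exact filter:

```python
def partial_curve(voltage, capacity, v_lo, v_hi):
    """Toy partial capacity-voltage curve extraction: keep only samples
    inside a voltage window and enforce monotonicity, yielding a curve
    segment usable as an SoH feature under dynamic charging."""
    pairs = [(v, q) for v, q in zip(voltage, capacity) if v_lo <= v <= v_hi]
    pairs.sort()
    curve, last_q = [], float("-inf")
    for v, q in pairs:
        if q >= last_q:          # drop samples that break monotonicity
            curve.append((v, q))
            last_q = q
    return curve

# Dynamic charging leaves one noisy sample (3.65 V) that the filter drops.
v = [3.2, 3.5, 3.6, 3.65, 3.7, 3.9, 4.1]
q = [0.1, 0.4, 0.5, 0.45, 0.7, 0.9, 1.0]
seg = partial_curve(v, q, 3.5, 4.0)
```

The resulting segment would then feed the auto-encoder-decoder for unsupervised pretraining.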
基金Ministry of Education,Culture,Sports,Science and Technology,Grant/Award Number:20K11867。
文摘By automatically learning the priors embedded in images with powerful modelling capabilities,deep learning-based algorithms have recently made considerable progress in reconstructing the high-resolution hyperspectral(HR-HS)image.With previously collected large amounts of external data,these methods are intuitively realised under the full supervision of the ground-truth data.Thus,the database construction in merging the low-resolution(LR)HS(LR-HS)and HR multispectral(MS)or RGB image research paradigm,commonly named as HSI SR,requires collecting corresponding training triplets:HR-MS(RGB),LR-HS and HR-HS image simultaneously,and often faces difficulties in reality.The learned models with the training datasets collected simultaneously under controlled conditions may significantly degrade the HSI super-resolved performance to the real images captured under diverse environments.To handle the above-mentioned limitations,the authors propose to leverage the deep internal and self-supervised learning to solve the HSI SR problem.The authors advocate that it is possible to train a specific CNN model at test time,called deep internal learning(DIL),by on-line preparing the training triplet samples from the observed LR-HS/HR-MS(or RGB)images and the down-sampled LR-HS version.However,the number of the training triplets extracted solely from the transformed data of the observation itself is extremely few particularly for the HSI SR tasks with large spatial upscale factors,which would result in limited reconstruction performance.To solve this problem,the authors further exploit deep self-supervised learning(DSL)by considering the observations as the unlabelled training samples.Specifically,the degradation modules inside the network were elaborated to realise the spatial and spectral down-sampling procedures for transforming the generated HR-HS estimation to the high-resolution RGB/LR-HS approximation,and then the reconstruction errors of the observations were formulated for
measuring the network modelling performance.By consolidating the DIL and DSL into a unified deep framework,the authors construct a more robust HSI SR method without any prior training,which has great potential for flexible adaptation to different settings per observation.To verify the effectiveness of the proposed approach,extensive experiments have been conducted on two benchmark HS datasets,including the CAVE and Harvard datasets,and demonstrate the great performance gain of the proposed method over the state-of-the-art methods.
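The spatial and spectral degradation modules described above can be sketched as plain down-sampling operators.The average-pooling kernel and the single-channel spectral response below are illustrative assumptions,not the paper's learned modules:

```python
def spatial_down(hs_cube, s):
    """Spatial degradation: s-by-s average pooling per band (LR-HS branch)."""
    out = []
    for band in hs_cube:
        h, w = len(band), len(band[0])
        out.append([[sum(band[r + i][c + j] for i in range(s) for j in range(s)) / (s * s)
                     for c in range(0, w, s)] for r in range(0, h, s)])
    return out

def spectral_down(hs_cube, srf):
    """Spectral degradation: project bands through a spectral response
    function, mimicking the HR-RGB branch."""
    h, w = len(hs_cube[0]), len(hs_cube[0][0])
    return [[[sum(weight * hs_cube[b][r][c] for b, weight in enumerate(row))
              for c in range(w)] for r in range(h)] for row in srf]

# 3-band 2x2 toy cube; a 1-channel "RGB" formed by a weighted band average.
cube = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]]
lr = spatial_down(cube, 2)
gray = spectral_down(cube, [[0.25, 0.5, 0.25]])
```

Reconstruction errors between these degraded estimates and the actual observations supply the self-supervised loss.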
基金Project supported by the U.S.Department of Energy under the Advanced Scientific Computing Research Program(No.DE-SC0019116)the U.S.Air Force Office of Scientific Research(No.AFOSR FA9550-20-1-0060)。
文摘We propose a self-supervised learning framework for finding the dominant eigenfunction-eigenvalue pairs of linear and self-adjoint operators.We represent target eigenfunctions with coordinate-based neural networks and employ Fourier positional encodings to enable the approximation of high-frequency modes.We formulate a self-supervised training objective for spectral learning and propose a novel regularization mechanism to ensure that the network finds the exact eigenfunctions instead of a space spanned by the eigenfunctions.Furthermore,we investigate the effect of weight normalization as a mechanism to alleviate the risk of recovering linearly dependent modes,allowing us to accurately recover a large number of eigenpairs.The effectiveness of our methods is demonstrated across a collection of representative benchmarks including both local and non-local diffusion operators,as well as high-dimensional time-series data from a video sequence.Our results indicate that the present algorithm can outperform competing approaches in terms of both approximation accuracy and computational cost.
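As a classical point of reference for the dominant eigenpairs such networks recover,power iteration on a tiny self-adjoint operator exhibits the same target quantities;the 2x2 matrix below is an arbitrary example,not from the paper:

```python
def dominant_eigenpair(A, iters=100):
    """Power iteration on a small self-adjoint (symmetric) operator:
    repeatedly apply A and renormalize until the dominant eigenvector
    remains, then read off the eigenvalue via the Rayleigh quotient."""
    v = [1.0, 1.0]
    for _ in range(iters):
        w = [A[0][0] * v[0] + A[0][1] * v[1],
             A[1][0] * v[0] + A[1][1] * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    # Rayleigh quotient v^T A v with v normalized.
    lam = v[0] * (A[0][0] * v[0] + A[0][1] * v[1]) \
        + v[1] * (A[1][0] * v[0] + A[1][1] * v[1])
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric; eigenvalues 3 and 1
lam, v = dominant_eigenpair(A)
```

The neural approach replaces the explicit vector with a coordinate-based network,which scales to operators with no matrix representation.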
基金This work was supported by the National Key Research and Development Program of China(2023YFF1000100,2022YFD2002304).
文摘Reliable and automated three-dimensional segmentation of plant organs is essential for extracting phenotypic traits at the organ level.However,existing methods for plant organ segmentation predominantly rely on fully supervised learning,which still necessitates extensive point-by-point annotated datasets and fails to overcome the challenges associated with annotating plant point cloud data.In recent years,self-supervised learning-based point cloud segmentation methods have garnered widespread attention in both industry and academia because of their potential to alleviate the difficulties of point cloud data annotation to some extent.In this study,the paradigm of self-supervised learning is innovatively applied to the field of plant phenotyping through the development of Plant-MAE,a self-supervised learning-based point cloud segmentation framework.The innovations of Plant-MAE include a kernel-based point convolution embedding module and a multiangle feature extraction block(MAFEB)based on attention mechanisms.To validate the effectiveness of the model,extensive experiments were conducted on multiple point cloud datasets,achieving competitive performance,with average precision,recall,F1 score,and IoU values of 92.08%,88.50%,89.80%,and 84.03%,respectively.Plant-MAE outperforms advanced deep learning networks,including PointNet++,point transformer,and Point-M2AE,achieving average improvements of at least 0.53%,1.36%,0.88%,and 2.38%in precision,recall,F1 score,and IoU,respectively.Additionally,on the Pheno4D dataset,only half of the training data were necessary for fine-tuning to achieve performance comparable to that of the point transformer and PointNet++.This study provides technical support for the estimation of crop phenotypic parameters,thereby advancing the development of modern smart agriculture.
基金supported by the National Natural Science Foundation of China under Grant No.62173317the Key Research and Development Program of Anhui under Grant No.202104a05020064。
文摘In this paper,a cross-sensor generative self-supervised learning network is proposed for multi-sensor fault detection.By modeling the sensor signals in multiple dimensions,the pretext task mines correlation information between channels,so that the shared features between multi-sensor data can be captured and the gap between channel data features is reduced.Meanwhile,in order to model fault features in the downstream task,a salience module is developed to optimize cross-sensor data features based on a small amount of labeled data,making warning feature information prominent and improving the separator accuracy.Finally,experimental results on the public FEMTO-ST dataset and the private SMT shock absorber dataset(SMT-SA dataset)show that the proposed method performs favorably against other state-of-the-art methods.
文摘The rapid integration of Internet of Things(IoT)technologies is reshaping the global energy landscape by deploying smart meters that enable high-resolution consumption monitoring,two-way communication,and advanced metering infrastructure services.However,this digital transformation also exposes power systems to evolving threats,ranging from cyber intrusions and electricity theft to device malfunctions,and the unpredictable nature of these anomalies,coupled with the scarcity of labeled fault data,makes real-time detection exceptionally challenging.To address these difficulties,a real-time decision support framework is presented for smart meter anomaly detection that leverages rolling time windows and two self-supervised contrastive learning modules.The first module synthesizes diverse negative samples to overcome the lack of labeled anomalies,while the second captures intrinsic temporal patterns for enhanced contextual discrimination.The end-to-end framework continuously updates its model with rolling updated meter data to deliver timely identification of emerging abnormal behaviors in evolving grids.Extensive evaluations on eight publicly available smart meter datasets across seven diverse abnormal patterns demonstrate the effectiveness of the proposed full framework,achieving an average recall and F1 score of more than 0.85.
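Synthesizing negative samples from normal consumption windows can be pictured as injecting typical anomaly shapes.The spike/dropout/drift perturbations below are illustrative stand-ins for the paper's negative-sample generation module,not its actual transforms:

```python
import random

def make_negatives(window, rng):
    """Synthesize anomalous views of a normal consumption window for
    contrastive training: surge, meter outage, and slow tamper drift."""
    spike = list(window)
    i = rng.randrange(len(spike))
    spike[i] *= 5.0                                                # sudden surge
    dropout = [0.0 if rng.random() < 0.3 else x for x in window]   # meter outage
    drift = [x * (1.0 + 0.05 * t) for t, x in enumerate(window)]   # gradual drift
    return [spike, dropout, drift]

rng = random.Random(0)
normal = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]   # one rolling window of readings
negs = make_negatives(normal, rng)
```

A contrastive encoder is then trained to place the normal window far from these synthetic anomalies in representation space.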
基金funded by the German Research Foundation(DFG)under Project ID 390874152(POLiS Cluster of Excellence)N.J.S.,A.G.,G.C.,O.D.,A.J.were supported by the D2S2 program within the U.S.Department of Energy,Office of Basic Energy Sciences,Materials Sciences and Engineering Division under Contract No.DE-AC02-05-CH11231(D2S2 program,KCD2S2)A.G.acknowledges support from the Swiss National Science Foundation(SNSF,project#P500PN_222166).
文摘Scanning Electron Microscopes(SEMs)are widely used in experimental science laboratories,often requiring cumbersome and repetitive user analysis.Automating SEM image analysis processes is highly desirable to address this challenge.In particle sample analysis,Machine Learning(ML)has emerged as the most effective approach for particle segmentation.However,the time-intensive process of manually annotating thousands of SEM images limits the applicability of supervised learning approaches.Self-Supervised Learning(SSL)offers a promising alternative by enabling knowledge extraction from raw,unlabeled data.This study presents a framework for evaluating SSL techniques in SEM image analysis,focusing on novel methods leveraging the ConvNeXtV2 architecture for particle detection.A dataset comprising 25,000 SEM images is curated to benchmark these proposed SSL methods.The results demonstrate that ConvNeXtV2 models,with varying parameter counts,consistently outperform other techniques in particle detection across different length scales,achieving up to a 34% reduction in relative error compared to established SSL methods.Furthermore,an ablation study explores the relationship between dataset size and SSL performance,providing actionable insights for practitioners regarding model selection and resource efficiency.This research advances the integration of SSL into autonomous analysis pipelines and supports its application in accelerating materials science discovery.
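Masked-autoencoder pretraining of the kind used with ConvNeXtV2 hides a fraction of image patches and asks the network to reconstruct them.The toy routine below illustrates only the masking step;the grid size and mask ratio are assumptions:

```python
import random

def mask_patches(image, patch, ratio, rng):
    """Random patch masking for masked-autoencoder style SSL: zero out a
    fraction of non-overlapping patches; the hidden set becomes the
    reconstruction target."""
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    hidden = set(rng.sample(coords, int(ratio * len(coords))))
    masked = [row[:] for row in image]
    for r, c in hidden:
        for dr in range(patch):
            for dc in range(patch):
                masked[r + dr][c + dc] = 0
    return masked, hidden

rng = random.Random(0)
img = [[1] * 8 for _ in range(8)]          # toy 8x8 "SEM image"
masked, hidden = mask_patches(img, patch=4, ratio=0.5, rng=rng)
```

During pretraining,reconstruction loss is computed only on the hidden patches,so no annotation is needed.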
基金supported by the National Natural Science Foundation of China(32471964)。
文摘The collection and annotation of large-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems.While self-supervised learning(SSL)has emerged as a promising approach for leveraging unannotated data,current SSL methods face two critical challenges in bird species recognition:(1)long-tailed data distributions that result in poor performance on underrepresented species;and(2)domain shift issues caused by data augmentation strategies designed to mitigate class imbalance.Here we present SDNet,a novel SSL-based bird recognition framework that integrates diffusion models with large language models(LLMs)to overcome these limitations.SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy,morphological attributes,and habitat information,producing detailed natural language priors that capture fine-grained visual characteristics(e.g.,plumage patterns,body proportions,and distinctive markings).These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process,ensuring generated images preserve species-specific morphological details while maintaining photorealistic quality.Additionally,we incorporate a Swin Transformer as the feature extraction backbone whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing fine-grained discriminative patterns(such as beak shape and feather texture)while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources.SDNet is validated on both a self-constructed dataset(Bird_BXS)and a
publicly available benchmark(Birds_25),demonstrating substantial improvements over conventional SSL approaches.Our results indicate that the synergistic integration of LLMs,diffusion models,and the Swin Transformer architecture contributes significantly to recognition accuracy,particularly for rare and morphologically similar species.These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
文摘Automated grading of dandruff severity is a clinically significant but challenging task due to the inherent ordinal nature of severity levels and the high prevalence of label noise from subjective expert annotations.Standard classification methods fail to address these dual challenges,limiting their real-world performance.In this paper,a novel,three-phase training framework is proposed that learns a robust ordinal classifier directly from noisy labels.The approach synergistically combines a rank-based ordinal regression backbone with a cooperative,semi-supervised learning strategy to dynamically partition the data into clean and noisy subsets.A hybrid training objective is then employed,applying a supervised ordinal loss to the clean set.The noisy set is simultaneously trained using a dual objective that combines a semi-supervised ordinal loss with a parallel,label-agnostic contrastive loss.This design allows the model to learn from the entire noisy subset while using contrastive learning to mitigate the risk of error propagation from potentially corrupt supervision.Extensive experiments on a new,large-scale,multi-site clinical dataset validate our approach.The method achieves state-of-the-art performance with 80.71%accuracy and a 76.86%F1-score,significantly outperforming existing approaches,including a 2.26%improvement over the strongest baseline method.This work provides not only a robust solution for a practical medical imaging problem but also a generalizable framework for other tasks plagued by noisy ordinal labels.
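A rank-based ordinal backbone replaces the single K-way label with cumulative binary targets.The encoding/decoding sketch below follows the standard CORAL/CORN-style construction rather than the paper's exact head:

```python
def ordinal_targets(label, num_classes):
    """Rank-consistent encoding: severity level k becomes K-1 binary
    targets answering 'is the severity greater than threshold j?'."""
    return [1 if label > j else 0 for j in range(num_classes - 1)]

def decode(probs, threshold=0.5):
    """Predicted level = number of thresholds the sample exceeds."""
    return sum(1 for p in probs if p > threshold)

# 4 dandruff severity levels -> 3 cumulative binary tasks.
targets_mild   = ordinal_targets(0, 4)
targets_severe = ordinal_targets(2, 4)
predicted = decode([0.9, 0.8, 0.2])
```

Because mispredicting an adjacent level flips only one binary target,this encoding penalizes near misses less than gross ones,which is exactly the ordinal behavior plain cross-entropy lacks.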
文摘Recently,deep learning-based fingerprint localization has attracted significant interest due to its simplicity in implementation and effectiveness in complex multipath environments,especially for the Internet of Things(IoT)devices in multiple-input multiple-output(MIMO)-orthogonal frequency-division multiplexing(OFDM)systems.However,the huge amount of training data collection has become a challenge,which increases the labor burden of fingerprint localization heavily and hinders its large-scale implementation.In this paper,we propose a novel fingerprint localization system,termed as SiamResNet,which can be trained only on the radio map by contrastive self-supervised learning without the need for any other additional data.To be more specific,we first model the fingerprint localization problem as a dictionary look-up task.Subsequently,a channel fingerprint capturing the multipath angle and delay of wireless propagation is introduced,which exhibits excellent uniqueness,stability,and distinguishability.Meanwhile,we propose the corresponding data augmentation strategy to ensure data diversity when generating the training data from the radio map.Thus,the cost of data collection for training can be significantly reduced.Lastly,the Siamese architecture based SiamResNet is applied for location estimation,which can comprehensively extract the features of fingerprints and accurately compare the similarity of any fingerprint to the radio map in the representation space.The performance of the proposed localization method is validated through extensive simulations with a ray-tracing channel model,which demonstrates promising localization accuracy for our SiamResNet with reduced training costs.
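The dictionary look-up formulation amounts to a nearest-neighbor search over the radio map in a similarity space.This sketch uses plain cosine similarity on raw fingerprints as a stand-in for SiamResNet's learned representation;the positions and fingerprints are made up:

```python
import math

def locate(query, radio_map):
    """Dictionary look-up localization: return the radio-map position
    whose stored fingerprint is most similar to the query fingerprint."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    return max(radio_map, key=lambda pos: cos(query, radio_map[pos]))

# Toy radio map: (x, y) grid positions -> channel fingerprints.
radio_map = {(0, 0): [1.0, 0.1, 0.0],
             (0, 5): [0.1, 1.0, 0.2],
             (5, 5): [0.0, 0.2, 1.0]}
estimate = locate([0.9, 0.2, 0.1], radio_map)
```

The Siamese network's contribution is learning an embedding in which this comparison stays reliable despite measurement noise.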
基金supported in part by the National Natural Science Foundation of China under Grant 62071405the National Natural Science Foundation of China under Grant 12175189.
文摘The accurate segmentation of deep gray matter nuclei is critical for neuropathological research,disease diagnosis and treatment.Existing methods employ the supervised learning training approach,which requires large labeled datasets.It is challenging and time-consuming to obtain such datasets for medical image analysis.In addition,these methods based on convolutional neural networks(CNNs)only achieve suboptimal performance due to the locality of convolutional operations.Vision Transformers(ViTs)efficiently model long-range dependencies and thus have the potential to outperform these methods in segmentation tasks.To address these issues,we propose a novel hybrid network based on self-supervised pre-training for deep gray matter nuclei segmentation.Specifically,we present a CNN-Transformer hybrid network(CTNet),whose encoder consists of 3D CNN and ViT to learn local spatial-detailed features and global semantic information.A self-supervised learning(SSL)approach that integrates rotation prediction and masked feature reconstruction is proposed to pre-train the CTNet,enabling the model to learn valuable visual representations from unlabeled data.We evaluate the effectiveness of our method on 3T and 7T human brain MRI datasets.The results demonstrate that our CTNet achieves better performance than other comparison models and our pre-training strategy outperforms other advanced self-supervised methods.When the training set has only one sample,our pre-trained CTNet enhances segmentation performance,showing an 8.4%improvement in Dice similarity coefficient(DSC)compared to the randomly initialized CTNet.
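The rotation-prediction pretext task used to pre-train CTNet supervises itself:the label is simply the rotation that was applied.A 2D toy version is shown below(the 3D case rotates volume slices analogously;the grid values are arbitrary):

```python
def rotate90(grid, k):
    """Rotate a 2D slice by k*90 degrees counter-clockwise."""
    out = [row[:] for row in grid]
    for _ in range(k % 4):
        out = [list(row) for row in zip(*out)][::-1]
    return out

def rotation_pretext(grid, k):
    """Rotation-prediction pretext task: the training 'label' is the
    applied rotation index, so supervision comes free from unlabeled scans."""
    return rotate90(grid, k), k

img = [[1, 2], [3, 4]]
view, label = rotation_pretext(img, 1)
```

A classifier trained to predict `label` from `view` must learn anatomy-aware orientation features,which transfer to the downstream segmentation task.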
文摘Research on reconstructing imperfect faces is a challenging task.In this study,we explore a data-driven approach using a pre-trained MICA(MetrIC fAce)model combined with 3D printing to address this challenge.We propose a training strategy that utilizes the pre-trained MICA model and self-supervised learning techniques to improve accuracy and reduce the time needed for 3D facial structure reconstruction.Our results demonstrate high accuracy,evaluated by the geometric loss function and various statistical measures.To showcase the effectiveness of the approach,we used 3D printing to create a model that covers facial wounds.The findings indicate that our method produces a model that fits well and achieves comprehensive 3D facial reconstruction.This technique has the potential to aid doctors in treating patients with facial injuries.
文摘Over the last decade, supervised deep learning on manually annotated big data has been progressing significantly on computer vision tasks. However, the application of deep learning in medical image analysis is limited by the scarcity of high-quality annotated medical imaging data. An emerging solution is self-supervised learning (SSL), among which contrastive SSL is the most successful approach to rivalling or outperforming supervised learning. This review investigates several state-of-the-art contrastive SSL algorithms originally developed for natural images as well as their adaptations for medical images, and concludes by discussing recent advances, current limitations, and future directions in applying contrastive SSL in the medical domain.