Two-dimensional endoscopic images are susceptible to interferences such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. The study also introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate a significant improvement in the network's resistance to lighting interference without substantially increasing the number of parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments. Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
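The pyramid dilated convolutions referred to above are, in essence, parallel 3x3 branches with increasing dilation rates whose outputs are fused to capture multi-scale context. The sketch below is an illustrative PyTorch module under assumed channel counts and dilation rates, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a pyramid of dilated convolutions
# that aggregates multi-scale context before disparity regression.
import torch
import torch.nn as nn

class PyramidDilatedConv(nn.Module):
    def __init__(self, in_ch=32, out_ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=dilation keeps the spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        # Concatenate multi-scale responses, then fuse with a 1x1 convolution.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.randn(1, 32, 64, 80)        # hypothetical feature map from one view
print(PyramidDilatedConv()(feats).shape)  # torch.Size([1, 32, 64, 80])
```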
Objective: Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance heavily depends on large quantities of annotated images. To mitigate the cost, cold-start active learning (AL), comprising an initialization followed by subsequent learning, selects a small subset of informative data points for labeling. Recent advancements in pretrained models tailored to chest radiographs by supervised or self-supervised learning have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored. Methods: To validate the efficacy of domain-specific pretraining, we compared two foundation models, supervised TXRV and self-supervised REMEDIS, with their general-domain counterparts pretrained on ImageNet. Model performance was evaluated at both the initialization and subsequent learning stages on two diagnostic tasks: psychiatric pneumonia and COVID-19. For initialization, we assessed their integration with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by different pretrained models. We also conducted statistical tests to compare the foundation models with their ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and investigate the influence of class balance in initialization samples on initialization and subsequent learning. Results: First, domain-specific foundation models failed to outperform their ImageNet counterparts in six out of eight experiments on informative sample selection. Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained-model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential to reduce experts' repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that class balance is more strongly associated with performance at middle budget levels than at low or high budgets. Conclusions: In this study, we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL. We also identified promising outcomes related to cold-start AL, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential for one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.
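To make the uncertainty-sampling strategy concrete, the following minimal sketch ranks an unlabeled pool by predictive entropy and returns the indices to send for labeling; the pool size, class count, and annotation budget are hypothetical and not tied to TXRV or REMEDIS.

```python
# Minimal sketch of entropy-based uncertainty sampling over an unlabeled pool.
import numpy as np

def select_uncertain(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` samples with the highest predictive entropy."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:budget]

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 2))                       # unlabeled pool, 2 classes
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
print(select_uncertain(probs, budget=20))                 # 20 images to annotate
```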
In this paper, we propose a sub-6GHz channel assisted hybrid beamforming (HBF) scheme for mmWave systems under both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios without mmWave channel estimation. Meanwhile, we resort to a self-supervised approach to eliminate the need for labels, thus avoiding the accompanying high cost of data collection and annotation. We first construct the dense connection network (DCnet) with three modules: a feature extraction module for extracting channel characteristics from a large amount of channel data, a feature fusion module for combining multidimensional features, and a prediction module for generating the HBF matrices. Next, we establish a lightweight network architecture, named LDnet, to reduce the number of model parameters and the computational complexity. The proposed sub-6GHz assisted approach eliminates mmWave pilot resources compared to methods that use mmWave channel information directly. The simulation results indicate that the proposed DCnet and LDnet achieve spectral efficiency superior to the traditional orthogonal matching pursuit (OMP) algorithm by 13.66% and 10.44% under LOS scenarios and by 32.35% and 27.75% under NLOS scenarios, respectively. Moreover, LDnet achieves a 98.52% reduction in the number of model parameters and a 22.93% reduction in computational complexity compared to DCnet. Funding: Supported in part by the National Natural Science Foundation of China under Grants 62325107, 62341107, 62261160650, and U23A20272, and in part by the Beijing Natural Science Foundation under Grant L222002.
By automatically learning the priors embedded in images with powerful modelling capabilities, deep learning-based algorithms have recently made considerable progress in reconstructing high-resolution hyperspectral (HR-HS) images. With previously collected large amounts of external data, these methods are intuitively realised under the full supervision of the ground-truth data. Thus, the database construction in the research paradigm of merging a low-resolution HS (LR-HS) image with an HR multispectral (MS) or RGB image, commonly named HSI SR, requires simultaneously collecting corresponding training triplets of HR-MS (RGB), LR-HS and HR-HS images, and often faces difficulties in reality. Models learned on training datasets collected under controlled conditions may significantly degrade in HSI super-resolution performance on real images captured under diverse environments. To handle the above-mentioned limitations, the authors propose to leverage deep internal and self-supervised learning to solve the HSI SR problem. The authors advocate that it is possible to train a specific CNN model at test time, called deep internal learning (DIL), by online preparing the training triplet samples from the observed LR-HS/HR-MS (or RGB) images and the down-sampled LR-HS version. However, the number of training triplets extracted solely from the transformed data of the observation itself is extremely small, particularly for HSI SR tasks with large spatial upscale factors, which would result in limited reconstruction performance. To solve this problem, the authors further exploit deep self-supervised learning (DSL) by considering the observations as unlabelled training samples. Specifically, the degradation modules inside the network were elaborated to realise the spatial and spectral down-sampling procedures for transforming the generated HR-HS estimation into the high-resolution RGB/LR-HS approximation, and the reconstruction errors of the observations were then formulated for measuring the network modelling performance. By consolidating DIL and DSL into a unified deep framework, the authors construct a more robust HSI SR method that requires no prior training and has great potential for flexible adaptation to different settings per observation. To verify the effectiveness of the proposed approach, extensive experiments have been conducted on two benchmark HS datasets, the CAVE and Harvard datasets, and demonstrate the great performance gain of the proposed method over state-of-the-art methods. Funding: Ministry of Education, Culture, Sports, Science and Technology, Grant/Award Number: 20K11867.
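The deep internal learning step described above hinges on synthesizing training pairs from the observation itself by spatial and spectral down-sampling. The toy sketch below illustrates that idea; the Gaussian blur, scale factor, and spectral response matrix are assumptions rather than the authors' exact degradation modules.

```python
# Hedged sketch: synthesize internal training data from the observed LR-HS cube by
# spatially downsampling it and spectrally projecting it to an RGB-like image.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def spatial_downsample(hs_cube, scale=4, sigma=1.0):
    # hs_cube: (H, W, bands); blur each band, then decimate spatially.
    blurred = gaussian_filter(hs_cube, sigma=(sigma, sigma, 0))
    return zoom(blurred, (1.0 / scale, 1.0 / scale, 1.0), order=1)

def spectral_downsample(hs_cube, srf):
    # srf: (bands, 3) hypothetical camera spectral response; yields an RGB image.
    return hs_cube @ srf

lr_hs = np.random.rand(32, 32, 31)           # observed LR-HS image (toy data)
srf = np.random.rand(31, 3); srf /= srf.sum(0)
lr_lr_hs = spatial_downsample(lr_hs)          # further-downsampled input for DIL
lr_rgb = spectral_downsample(lr_hs, srf)      # paired RGB-like target
print(lr_lr_hs.shape, lr_rgb.shape)           # (8, 8, 31) (32, 32, 3)
```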
State of health (SoH) estimation plays a key role in smart battery health prognostics and management. However, poor generalization, a lack of labeled data, and unused measurements during aging are still major challenges to accurate SoH estimation. Toward this end, this paper proposes a self-supervised learning framework to boost the performance of battery SoH estimation. Different from traditional data-driven methods, which rely on a considerable training dataset obtained from numerous battery cells, the proposed method achieves accurate and robust estimations using limited labeled data. A filter-based data preprocessing technique, which enables the extraction of partial capacity-voltage curves under dynamic charging profiles, is applied first. Unsupervised learning is then used to learn the aging characteristics from the unlabeled data through an auto-encoder-decoder. The learned network parameters are transferred to the downstream SoH estimation task and are fine-tuned with very few sparsely labeled data, which boosts the performance of the estimation framework. The proposed method has been validated under different battery chemistries, formats, operating conditions, and ambient temperatures. The estimation accuracy can be guaranteed using only three labeled samples from the initial 20% of life cycles, with overall errors of less than 1.14%, an error distribution across all testing scenarios maintained below 4%, and robustness that increases with aging. Comparisons with purely supervised machine learning methods demonstrate the superiority of the proposed method. This simple and data-efficient estimation framework is promising for real-world applications under a variety of scenarios. Funding: Funded by the "SMART BATTERY" project, granted by the Villum Foundation in 2021 (project number 222860).
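The pretrain-then-transfer idea can be pictured as follows: an auto-encoder is first fit to unlabeled partial charging curves, and its encoder is then reused with a small SoH regression head fine-tuned on a handful of labeled samples. The PyTorch sketch below is illustrative only; the curve length, layer sizes, and label values are assumptions.

```python
# Conceptual sketch of unsupervised pretraining on unlabeled curves followed by
# fine-tuning a small SoH head on very few labeled samples.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 128))

unlabeled = torch.randn(256, 128)             # unlabeled partial capacity-voltage curves
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(5):                            # reconstruction pretraining
    loss = nn.functional.mse_loss(decoder(encoder(unlabeled)), unlabeled)
    opt.zero_grad(); loss.backward(); opt.step()

# Downstream: SoH head fine-tuned with a handful of labeled curves (values are toy).
soh_head = nn.Linear(16, 1)
labeled_x, labeled_y = torch.randn(3, 128), torch.tensor([[0.98], [0.95], [0.93]])
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(soh_head.parameters()), lr=1e-4)
for _ in range(20):
    loss = nn.functional.mse_loss(soh_head(encoder(labeled_x)), labeled_y)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
print(soh_head(encoder(labeled_x)).detach().squeeze())
```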
We propose a self-supervised learning framework for finding the dominant eigenfunction-eigenvalue pairs of linear and self-adjoint operators. We represent target eigenfunctions with coordinate-based neural networks and employ Fourier positional encodings to enable the approximation of high-frequency modes. We formulate a self-supervised training objective for spectral learning and propose a novel regularization mechanism to ensure that the network finds the exact eigenfunctions instead of a space spanned by the eigenfunctions. Furthermore, we investigate the effect of weight normalization as a mechanism to alleviate the risk of recovering linearly dependent modes, allowing us to accurately recover a large number of eigenpairs. The effectiveness of our methods is demonstrated across a collection of representative benchmarks, including both local and non-local diffusion operators, as well as high-dimensional time-series data from a video sequence. Our results indicate that the present algorithm can outperform competing approaches in terms of both approximation accuracy and computational cost. Funding: Project supported by the U.S. Department of Energy under the Advanced Scientific Computing Research Program (No. DE-SC0019116) and the U.S. Air Force Office of Scientific Research (No. AFOSR FA9550-20-1-0060).
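A minimal sketch of the Fourier positional encodings mentioned above is given below: coordinates are projected onto a ladder of frequencies and passed through sine and cosine, which lets a coordinate-based network represent high-frequency modes. The number and spacing of frequencies are assumptions.

```python
# Minimal sketch of Fourier positional encodings for coordinate inputs.
import numpy as np

def fourier_encode(x, num_freqs=8):
    """x: (N, d) coordinates -> (N, 2 * num_freqs * d) features."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi          # geometric frequency ladder
    proj = x[:, :, None] * freqs[None, None, :]          # (N, d, num_freqs)
    feats = np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
    return feats.reshape(x.shape[0], -1)

coords = np.linspace(0.0, 1.0, 5)[:, None]               # samples of a 1D domain
print(fourier_encode(coords).shape)                       # (5, 16)
```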
In this paper, a cross-sensor generative self-supervised learning network is proposed for multi-sensor fault detection. By modeling the sensor signals in multiple dimensions to mine correlation information between channels for the pretext task, the shared features between multi-sensor data can be captured and the gap between channel data features reduced. Meanwhile, in order to model fault features in the downstream task, a salience module is developed to optimize cross-sensor data features based on a small amount of labeled data, making warning feature information prominent and improving separator accuracy. Finally, experimental results on the public FEMTO-ST dataset and the private SMT shock absorber dataset (SMT-SA dataset) show that the proposed method performs favorably against other state-of-the-art methods. Funding: Supported by the National Natural Science Foundation of China under Grant No. 62173317 and the Key Research and Development Program of Anhui under Grant No. 202104a05020064.
Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Units (RSUs), ensuring timely services. Our previous work, the FLSimCo algorithm, which uses local resources for federated Self-Supervised Learning (SSL), has a limitation: vehicles often cannot complete all iteration tasks. Our improved algorithm offloads partial tasks to RSUs and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. Meanwhile, setting an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves the offloading efficiency and accuracy of federated SSL.
Seismic data denoising is a critical process usually applied at various stages of the seismic processing workflow, as our ability to mitigate noise in seismic data affects the quality of subsequent analyses. However, finding an optimal balance between preserving seismic signals and effectively reducing seismic noise presents a substantial challenge. In this study, we introduce a multi-stage deep learning model, trained in a self-supervised manner, designed specifically to suppress seismic noise while minimizing signal leakage. The model operates as a patch-based approach, extracting overlapping patches from the noisy data and converting them into 1D vectors for input. It consists of two structurally identical sub-networks, each configured differently. Inspired by the transformer architecture, each sub-network features an embedded block comprising two fully connected layers, which are used for feature extraction from the input patches. After reshaping, a multi-head attention module enhances the model's focus on significant features by assigning them higher attention weights. The key difference between the two sub-networks lies in the number of neurons within their fully connected layers. The first sub-network serves as a strong denoiser with a small number of neurons, effectively attenuating seismic noise; in contrast, the second sub-network functions as a signal-add-back model, using a larger number of neurons to retrieve some of the signal that was not preserved in the output of the first sub-network. The proposed model produces two outputs, each corresponding to one of the sub-networks, and both sub-networks are optimized simultaneously using the noisy data as the label for both outputs. Evaluations conducted on both synthetic and field data demonstrate the model's effectiveness in suppressing seismic noise with minimal signal leakage, outperforming several benchmark methods. Funding: Supported by the King Abdullah University of Science and Technology (KAUST).
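The patch-based input pipeline can be illustrated as follows: overlapping 2D patches are extracted from a noisy section and flattened into 1D vectors before entering the fully connected blocks. The patch size and stride below are assumptions.

```python
# Illustrative sketch of extracting overlapping patches and flattening them to 1D vectors.
import numpy as np

def extract_patches(section, patch=16, stride=8):
    rows, cols = section.shape
    patches = []
    for i in range(0, rows - patch + 1, stride):
        for j in range(0, cols - patch + 1, stride):
            patches.append(section[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(patches)                  # (num_patches, patch * patch)

noisy = np.random.randn(128, 96)              # toy noisy shot gather
vectors = extract_patches(noisy)
print(vectors.shape)                           # (165, 256)
```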
The rapid integration of Internet of Things (IoT) technologies is reshaping the global energy landscape by deploying smart meters that enable high-resolution consumption monitoring, two-way communication, and advanced metering infrastructure services. However, this digital transformation also exposes power systems to evolving threats, ranging from cyber intrusions and electricity theft to device malfunctions; the unpredictable nature of these anomalies, coupled with the scarcity of labeled fault data, makes real-time detection exceptionally challenging. To address these difficulties, a real-time decision support framework is presented for smart meter anomaly detection that leverages rolling time windows and two self-supervised contrastive learning modules. The first module synthesizes diverse negative samples to overcome the lack of labeled anomalies, while the second captures intrinsic temporal patterns for enhanced contextual discrimination. The end-to-end framework continuously updates its model with rolling meter data to deliver timely identification of emerging abnormal behaviors in evolving grids. Extensive evaluations on eight publicly available smart meter datasets across seven diverse abnormal patterns demonstrate the effectiveness of the proposed framework, which achieves average recall and F1 scores above 0.85.
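As an illustration of the contrastive module that captures intrinsic temporal patterns, the sketch below applies an NT-Xent-style objective to two augmented views of each rolling window; the encoder size, window length, augmentations, and temperature are assumptions, not the paper's exact design.

```python
# Hedged sketch of a contrastive (NT-Xent-style) objective over rolling meter windows.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    sim = z @ z.t() / temperature
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])     # positives
    return F.cross_entropy(sim, targets)

encoder = torch.nn.Sequential(torch.nn.Linear(48, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16))
windows = torch.randn(64, 48)                              # 64 rolling windows of 48 readings
view1 = windows + 0.05 * torch.randn_like(windows)         # jitter augmentation
view2 = windows * (1 + 0.05 * torch.randn_like(windows))   # scaling augmentation
print(nt_xent(encoder(view1), encoder(view2)).item())
```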
Reconstructing imperfect faces is a challenging task. In this study, we explore a data-driven approach using a pre-trained MICA (MetrIC fAce) model combined with 3D printing to address this challenge. We propose a training strategy that utilizes the pre-trained MICA model and self-supervised learning techniques to improve accuracy and reduce the time needed for 3D facial structure reconstruction. Our results demonstrate high accuracy, evaluated by the geometric loss function and various statistical measures. To showcase the effectiveness of the approach, we used 3D printing to create a model that covers facial wounds. The findings indicate that our method produces a model that fits well and achieves comprehensive 3D facial reconstruction. This technique has the potential to aid doctors in treating patients with facial injuries.
In scientific and industrial research, three-dimensional (3D) imaging, or depth measurement, is a critical tool that provides detailed insight into surface properties. Confocal microscopy, known for its precision in surface measurements, plays a key role in this field. However, 3D imaging based on confocal microscopy is often challenged by significant data requirements and slow measurement speeds. In this paper, we present a novel self-supervised learning algorithm called SSL Depth that overcomes these challenges. Specifically, our method exploits the feature learning capabilities of neural networks while avoiding the need for the labeled data sets typically associated with supervised learning approaches. Through practical demonstrations on a commercially available confocal microscope, we find that our method not only maintains higher quality but also significantly reduces the z-axis sampling frequency required for 3D imaging. This reduction results in a remarkable 16x increase in measurement speed, with the potential for further acceleration in the future. Our methodological advance enables highly efficient and accurate 3D surface reconstructions, thereby expanding the potential applications of confocal microscopy in various scientific and industrial fields. Funding: Supported by the Innovation Program for Quantum Science and Technology (No. 2021ZD0303200), the CAS Project for Young Scientists in Basic Research (No. YSBR-049), the National Natural Science Foundation of China (No. 62225506), and the Anhui Provincial Key Research and Development Plan (No. 2022b13020006).
Audio mixing is a crucial part of music production. For analyzing or recreating audio mixing, it is of great importance to conduct research on estimating the mixing parameters used to create mixdowns from music recordings, i.e., audio mixing inversion. However, approaches to audio mixing inversion are rarely explored. A method of estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning is presented. In this work, several commonly used audio effects, including gain, pan, equalization, reverb, and compression, are taken into consideration. The method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown by iteratively sampling and training. During the sampling step, the inference network predicts a set of mixing parameters, which is sampled and fed to an audio-processing framework to generate audio data for the training step. During the training step, the same network used in the sampling step is optimized with the sampled data generated from the sampling step. This method is able to explicitly model the mixing process in an interpretable way instead of using a black-box neural network model. A set of objective measures is used for evaluation. The experimental results show that this method performs better than current state-of-the-art methods. Funding: This work was supported by the High-grade, Precision and Advanced Discipline Construction Project of Beijing Universities, Major Projects of the National Social Science Fund of China (No. 21ZD19), and the National Culture and Tourism Technological Innovation Engineering Project of China.
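The explicit, interpretable mixing model can be illustrated with two of the listed effects, gain and pan: each raw track is scaled and panned, then summed into a stereo mixdown. The sketch below uses decibel gains and a constant-power pan law as assumptions; the paper's audio-processing framework additionally covers equalization, reverb, and compression.

```python
# Toy sketch of a forward mixing model using only gain and pan.
import numpy as np

def mix(tracks, gains_db, pans):
    """tracks: (n_tracks, n_samples); pans in [-1, 1]; returns a (2, n_samples) stereo mix."""
    gains = 10.0 ** (np.asarray(gains_db) / 20.0)
    theta = (np.asarray(pans) + 1.0) * np.pi / 4.0        # constant-power pan law
    left = (tracks * gains[:, None] * np.cos(theta)[:, None]).sum(axis=0)
    right = (tracks * gains[:, None] * np.sin(theta)[:, None]).sum(axis=0)
    return np.stack([left, right])

tracks = np.random.randn(4, 44100)                         # four raw tracks, 1 s at 44.1 kHz
mixdown = mix(tracks, gains_db=[-3.0, -6.0, 0.0, -1.5], pans=[-0.5, 0.5, 0.0, 0.2])
print(mixdown.shape)                                        # (2, 44100)
```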
Accurate aging diagnosis is crucial for the health and safety management of lithium-ion batteries in electric vehicles. Despite significant advancements achieved by data-driven methods, diagnosis accuracy remains constrained by the high costs of check-up tests and the scarcity of labeled data. This paper presents a framework utilizing self-supervised machine learning to harness the potential of unlabeled data for diagnosing battery aging in electric vehicles during field operations. We validate our method using battery degradation datasets collected over more than two years from twenty real-world electric vehicles. Our analysis comprehensively addresses cell inconsistencies, physical interpretations, and charging uncertainties in real-world applications. This is achieved through self-supervised feature extraction using random short charging sequences in the main peak of incremental capacity curves. By leveraging inexpensive unlabeled data in a self-supervised approach, our method demonstrates improvements in average root mean square error of 74.54% and 60.50% in the best and worst cases, respectively, compared to the supervised benchmark. This work underscores the potential of employing low-cost unlabeled data with self-supervised machine learning for effective battery health and safety management in real-world scenarios. Funding: Supported by the research project "SafeDaBatt" (03EMF0409A) funded by the German Federal Ministry for Digital and Transport (BMDV), the National Key Research and Development Program of China (2022YFE0102700), the Key Research and Development Program of Shaanxi Province (2023-GHYB-05, 2023-YBSF-104), and the China Scholarship Council (CSC) (202206567008).
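The incremental capacity (IC) curve underlying the feature extraction is simply dQ/dV computed from a charging segment; features are then taken from short sequences around its main peak. The sketch below shows the calculation on toy data; the voltage range, resampling grid, and synthetic Q(V) shape are assumptions.

```python
# Illustrative calculation of an incremental capacity (IC) curve, dQ/dV, on toy data.
import numpy as np

def incremental_capacity(voltage, capacity, bins=200):
    v_grid = np.linspace(voltage.min(), voltage.max(), bins)
    q_interp = np.interp(v_grid, voltage, capacity)        # resample Q on a uniform V grid
    dq_dv = np.gradient(q_interp, v_grid)
    return v_grid, dq_dv

voltage = np.linspace(3.0, 4.2, 500)                        # toy constant-current charge
capacity = 2.5 / (1 + np.exp(-12 * (voltage - 3.7)))        # sigmoid-like Q(V), in Ah
v_grid, ic = incremental_capacity(voltage, capacity)
print(v_grid[np.argmax(ic)])                                 # voltage of the main IC peak (~3.7 V)
```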
Few-shot learning has emerged as a crucial technique for coral species classification, addressing the challenge of limited labeled data in underwater environments. This study introduces an optimized few-shot learning model that enhances classification accuracy while minimizing reliance on extensive data collection. The proposed model integrates a hybrid similarity measure combining Euclidean distance and cosine similarity, effectively capturing both feature magnitude and directional relationships. This approach achieves a notable accuracy of 71.8% under a 5-way 5-shot evaluation, outperforming state-of-the-art models such as Prototypical Networks, FEAT, and ESPT by up to 10%. Notably, the model demonstrates high precision in classifying Siderastreidae (87.52%) and Fungiidae (88.95%), underscoring its effectiveness in distinguishing subtle morphological differences. To further enhance performance, we incorporate a self-supervised learning mechanism based on contrastive learning, enabling the model to extract robust representations by leveraging local structural patterns in corals. This enhancement significantly improves classification accuracy, particularly for species with high intra-class variation, leading to an overall accuracy of 76.52% under a 5-way 10-shot evaluation. Additionally, the model exploits the repetitive structures inherent in corals, introducing a local feature aggregation strategy that refines classification through spatial information integration. Beyond its technical contributions, this study presents a scalable and efficient approach for automated coral reef monitoring, reducing annotation costs while maintaining high classification accuracy. By improving few-shot learning performance in underwater environments, our model enhances monitoring accuracy by up to 15% compared to traditional methods, offering a practical solution for large-scale coral conservation efforts. Funding: Funded by the National Science and Technology Council (NSTC), Taiwan, under grant numbers NSTC 112-2634-F-019-001 and NSTC 113-2634-F-A49-007.
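The hybrid similarity measure can be written as a weighted combination of a negated Euclidean distance and a cosine similarity between a query embedding and the class prototypes. The minimal sketch below illustrates this; the weighting coefficient and embedding dimension are assumptions.

```python
# Minimal sketch of a hybrid (Euclidean + cosine) similarity for few-shot classification.
import numpy as np

def hybrid_similarity(query, prototypes, alpha=0.5):
    """query: (d,), prototypes: (n_classes, d) -> (n_classes,) similarity scores."""
    euclid = -np.linalg.norm(prototypes - query, axis=1)             # larger is closer
    cosine = (prototypes @ query) / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(query) + 1e-12
    )
    return alpha * euclid + (1 - alpha) * cosine

rng = np.random.default_rng(1)
protos = rng.normal(size=(5, 64))                 # 5-way prototypes from support embeddings
q = protos[2] + 0.1 * rng.normal(size=64)         # query near class 2
print(np.argmax(hybrid_similarity(q, protos)))    # 2
```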
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily due to the elimination of ground-truth depth dependence. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions. Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071345.
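One compact way to picture an SE(3) parameterization is a 6-vector pose (an axis-angle rotation plus a translation) converted to a 4x4 transform via the Rodrigues formula. The sketch below illustrates that kind of streamlined representation; it is not the paper's exact module.

```python
# Hedged sketch: map a 6-vector pose (axis-angle rotation + translation) to a 4x4 transform.
import numpy as np

def pose_to_matrix(xi):
    """xi: (6,) = (wx, wy, wz, tx, ty, tz) -> 4x4 homogeneous transform."""
    w, t = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    if theta < 1e-8:
        R = np.eye(3) + K                     # first-order approximation for tiny rotations
    else:
        K = K / theta
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)  # Rodrigues
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

print(pose_to_matrix(np.array([0.0, 0.0, np.pi / 2, 0.1, 0.0, 0.0])).round(3))
```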
Computed Tomography (CT) reconstruction is essential in medical imaging and other engineering fields. However, blurring of the projections during CT imaging can lead to artifacts in the reconstructed images. Projection blur combines factors such as large ray sources, scattering, and imaging system vibration. To address this problem, we propose DeblurTomo, a novel self-supervised learning-based deblurring and reconstruction algorithm that efficiently reconstructs sharp CT images from blurry input without needing external data or blur measurements. Specifically, we construct a coordinate-based implicit neural representation reconstruction network, which maps coordinates to the attenuation coefficient in the reconstructed space for more convenient ray representation. Then, we model the blur as a weighted sum of offset rays and design the Ray Correction Network (RCN) and Weight Proposal Network (WPN) to fit these rays and their weights using multi-view consistency and geometric information, thereby extending 2D deblurring to 3D space. In the training phase, we use the blurry input as the supervision signal to optimize the reconstruction network, the RCN, and the WPN simultaneously. Extensive experiments on a widely used synthetic dataset show that DeblurTomo performs superiorly on limited-angle and sparse-view reconstruction in simulated blurred scenarios. Further experiments on real datasets demonstrate the superiority of our method in practical scenarios. Funding: Supported in part by the National Natural Science Foundation of China under Grants 62472434 and 62402171, in part by the National Key Research and Development Program of China under Grant 2022YFF1203001, in part by the Science and Technology Innovation Program of Hunan Province under Grant 2022RC3061, and in part by the Sci-Tech Innovation 2030 Agenda under Grant 2023ZD0508600.
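The coordinate-based implicit representation can be pictured as an MLP that maps a 3D point to a non-negative attenuation coefficient, so that a projection value is obtained by integrating samples along a ray. The sketch below is illustrative; the network width, sampling scheme, and ray geometry are assumptions.

```python
# Illustrative sketch of a coordinate-based implicit attenuation field queried along a ray.
import torch
import torch.nn as nn

class AttenuationField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # attenuation is non-negative
        )

    def forward(self, pts):                         # pts: (N, 3) in [-1, 1]^3
        return self.net(pts).squeeze(-1)

field = AttenuationField()
origin, direction = torch.tensor([0.0, 0.0, -1.0]), torch.tensor([0.0, 0.0, 1.0])
ts = torch.linspace(0.0, 2.0, 64)[:, None]
samples = origin + ts * direction                   # points along one (un-blurred) ray
projection = field(samples).sum() * (2.0 / 64)      # discretized line integral
print(projection.item())
```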
Blended acquisition offers efficiency improvements over conventional seismic data acquisition, at the cost of introducing blending noise effects. Besides, seismic data often suffer from irregularly missing shots caused by artificial or natural effects during blended acquisition. Therefore, blending noise attenuation and missing-shot reconstruction are essential for providing high-quality seismic data for further seismic processing and interpretation. The iterative shrinkage thresholding algorithm can help obtain deblended data based on sparsity assumptions about the complete unblended data, but it characterizes seismic data linearly. Supervised learning algorithms can effectively capture the nonlinear relationship between incomplete pseudo-deblended data and complete unblended data; however, their dependence on complete unblended labels limits their practicality in field applications. Consequently, a self-supervised algorithm is presented for simultaneous deblending and interpolation of incomplete blended data, which minimizes the difference between simulated and observed incomplete pseudo-deblended data. The blind-trace U-Net (BTU-Net) used here prevents identity mapping during complete unblended data estimation. Furthermore, a multistep process, with blending noise simulation and subtraction and missing-trace reconstruction and insertion in each step, is used to improve the deblending and interpolation performance. Experiments with synthetic and field incomplete blended data demonstrate the effectiveness of the multistep self-supervised BTU-Net algorithm. Funding: Supported by the National Natural Science Foundation of China (42374134, 42304125, U20B6005), the Science and Technology Commission of Shanghai Municipality (23JC1400502), and the Fundamental Research Funds for the Central Universities.
Over the last decade, supervised deep learning on manually annotated big data has progressed significantly on computer vision tasks. However, the application of deep learning in medical image analysis is limited by the scarcity of high-quality annotated medical imaging data. An emerging solution is self-supervised learning (SSL), among which contrastive SSL is the most successful approach, rivalling or outperforming supervised learning. This review investigates several state-of-the-art contrastive SSL algorithms originally developed for natural images, as well as their adaptations for medical images, and concludes by discussing recent advances, current limitations, and future directions in applying contrastive SSL in the medical domain.