Funding: Supported by the National Natural Science Foundation of China (No. 52207229), the Key Research and Development Program of Ningxia Hui Autonomous Region of China (No. 2024BEE02003), the AEGiS Research Grant 2024, University of Wollongong (No. R6254), and the China Scholarship Council (No. 202207550010).
Abstract: Accurate prediction of the remaining useful life (RUL) is crucial for the design and management of lithium-ion batteries. Although various machine learning models offer promising predictions, one critical but often overlooked challenge is their demand for considerable run-to-failure data for training. Collection of such training data leads to prohibitive testing efforts, as the run-to-failure tests can last for years. Here, we propose a semi-supervised representation learning method to enhance prediction accuracy by learning from data without RUL labels. Our approach builds on a sophisticated deep neural network that comprises an encoder and three decoder heads to extract time-dependent representation features from short-term battery operating data, regardless of the existence of RUL labels. The approach is validated using three datasets collected from 34 batteries operating under various conditions, encompassing over 19,900 charge and discharge cycles. Our method achieves a root mean squared error (RMSE) within 25 cycles even when only 1/50 of the training dataset is labelled, representing a reduction of 48% compared to the conventional approach. We also demonstrate the method's robustness with varying amounts of labelled data and different weights assigned to the three decoder heads. The projection of the extracted features into a low-dimensional space reveals that our method effectively learns degradation features from unlabelled data. Our approach highlights the promise of utilising semi-supervised learning to reduce the data demand for reliability monitoring of energy devices.
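To make the shared-encoder, multi-head design concrete, the following PyTorch sketch is purely illustrative: the layer sizes, the roles of the two auxiliary heads, and the loss weights are assumptions, not the authors' configuration. Unlabelled cycles contribute only to the unsupervised head losses, while the RUL head is trained on the labelled subset.

```python
import torch
import torch.nn as nn

class MultiHeadRULNet(nn.Module):
    """One encoder feeding three decoder heads (illustrative sizes only)."""
    def __init__(self, n_features=8, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, latent), nn.ReLU())
        self.head_recon = nn.Linear(latent, n_features)  # reconstructs the input features
        self.head_next = nn.Linear(latent, n_features)   # e.g. predicts the next cycle's features
        self.head_rul = nn.Linear(latent, 1)             # predicts RUL (labelled samples only)

    def forward(self, x):
        z = self.encoder(x)
        return self.head_recon(z), self.head_next(z), self.head_rul(z)

def semi_supervised_loss(model, x, x_next, rul, labelled, w=(1.0, 1.0, 1.0)):
    recon, nxt, rul_hat = model(x)
    mse = nn.functional.mse_loss
    loss = w[0] * mse(recon, x) + w[1] * mse(nxt, x_next)  # unsupervised terms, all samples
    if labelled.any():                                     # supervised term, labelled subset only
        loss = loss + w[2] * mse(rul_hat[labelled], rul[labelled])
    return loss

x = torch.randn(64, 8)                  # short-term operating features for 64 cycles
x_next = torch.randn(64, 8)             # features of the following cycle (self-supervised target)
rul = torch.rand(64, 1) * 2000.0
labelled = torch.zeros(64, dtype=torch.bool)
labelled[:2] = True                     # only a small fraction of the batch carries RUL labels
loss = semi_supervised_loss(MultiHeadRULNet(), x, x_next, rul, labelled)
loss.backward()
```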
Funding: Supported by the Natural Science Foundation of China (No. 41804112, author: Chengyun Song).
Abstract: Existing semi-supervised medical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch. However, current copy-paste methods have three limitations: (1) training the model solely on copy-paste mixed images from labeled and unlabeled inputs loses a lot of labeled information; (2) low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data; (3) the segmentation performance in low-contrast and local regions is less than optimal. We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy (SADT), which enhances feature diversity and learns high-quality features to overcome these problems. To be more precise, SADT trains the Student Network by using pseudo-label-based training from Teacher Network 1 and supervised learning with labeled data, which prevents the loss of rare labeled data. We introduce a bi-directional copy-paste mask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision. For the mixed images, Deep-Shallow Spatial Contrastive Learning (DSSCL) is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve the segmentation capabilities in low-contrast and local areas. In this procedure, the features retrieved by the Student Network are subjected to a random feature perturbation technique. Extensive trials on two openly available datasets show that our proposed SADT performs much better than state-of-the-art semi-supervised medical segmentation techniques. Using only 10% of the labeled data for training, SADT achieved a Dice score of 90.10% on the ACDC (Automatic Cardiac Diagnosis Challenge) dataset.
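The bi-directional copy-paste idea can be illustrated with a minimal NumPy sketch; the rectangular mask, the mixing of the supervision signals, and the toy data below are assumptions, and the progressive high-entropy filtering and dual-teacher training are deliberately omitted.

```python
import numpy as np

def random_box_mask(h, w, frac=0.5, rng=None):
    """Binary mask with a random rectangle covering roughly frac of the image area."""
    rng = rng or np.random.default_rng()
    bh, bw = int(h * frac ** 0.5), int(w * frac ** 0.5)
    y, x = rng.integers(0, h - bh + 1), rng.integers(0, w - bw + 1)
    m = np.zeros((h, w), dtype=np.float32)
    m[y:y + bh, x:x + bw] = 1.0
    return m

def bidirectional_copy_paste(labeled_img, unlabeled_img, label, pseudo_label, mask):
    """Paste a labeled patch into the unlabeled image and vice versa; mix supervision the same way."""
    mixed_in = mask * labeled_img + (1 - mask) * unlabeled_img    # labeled pasted "inward"
    mixed_out = mask * unlabeled_img + (1 - mask) * labeled_img   # unlabeled pasted "outward"
    y_in = mask * label + (1 - mask) * pseudo_label
    y_out = mask * pseudo_label + (1 - mask) * label
    return mixed_in, mixed_out, y_in, y_out

# Toy usage on synthetic 64x64 "images"; y_u stands in for a teacher-generated pseudo-label.
rng = np.random.default_rng(0)
h = w = 64
mask = random_box_mask(h, w, rng=rng)
img_l, img_u = rng.random((h, w)), rng.random((h, w))
y_l, y_u = (img_l > 0.5).astype(float), (img_u > 0.5).astype(float)
mixed = bidirectional_copy_paste(img_l, img_u, y_l, y_u, mask)
```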
Funding: Supported by DST-FIST (Government of India) (Grant No. SR/FIST/MS-1/2017/13) and a Seed Money Project (Grant No. DoRDC/733).
Abstract: This study numerically examines the heat and mass transfer characteristics of two ternary nanofluids in converging and diverging channels. Furthermore, the study aims to assess the two ternary nanofluid combinations to determine which configuration can provide better heat and mass transfer and lower entropy production while ensuring cost efficiency. This work bridges the gap between academic research and industrial feasibility by incorporating cost analysis, entropy generation, and thermal efficiency. To compare the velocity, temperature, and concentration profiles, we examine two ternary nanofluids, i.e., TiO₂+SiO₂+Al₂O₃/H₂O and TiO₂+SiO₂+Cu/H₂O, while considering the shape of the nanoparticles. Velocity slip and Soret/Dufour effects are taken into consideration. Furthermore, regression analysis for the Nusselt and Sherwood numbers of the model is carried out. The Runge-Kutta fourth-order method with the shooting technique is employed to acquire the numerical solution of the governing system of ordinary differential equations. The flow pattern attributes of the ternary nanofluids are meticulously examined and simulated under variation of the flow-dominating parameters. Additionally, the influence of these parameters on the flow, temperature, and concentration fields is demonstrated. For variation in the Eckert and Dufour numbers, TiO₂+SiO₂+Al₂O₃/H₂O has a higher temperature than TiO₂+SiO₂+Cu/H₂O. The results obtained indicate that the ternary nanofluid TiO₂+SiO₂+Al₂O₃/H₂O has a higher heat transfer rate, lower entropy generation, greater mass transfer rate, and lower cost than the TiO₂+SiO₂+Cu/H₂O ternary nanofluid.
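The governing nanofluid equations are not reproduced here; the sketch below only illustrates the numerical machinery named in the abstract (fourth-order Runge-Kutta combined with shooting) on a toy two-point boundary-value problem, y'' = -y with y(0) = 0 and y(1) = 1, written in Python rather than the authors' own code.

```python
import numpy as np
from scipy.optimize import brentq

def rk4(f, y0, ts):
    """Classical fourth-order Runge-Kutta integration over the grid ts."""
    y = np.array(y0, dtype=float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        k1 = f(t0, y)
        k2 = f(t0 + h / 2, y + h / 2 * k1)
        k3 = f(t0 + h / 2, y + h / 2 * k2)
        k4 = f(t1, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Toy boundary-value problem y'' = -y, y(0) = 0, y(1) = 1 (not the nanofluid system).
def rhs(t, y):
    return np.array([y[1], -y[0]])

ts = np.linspace(0.0, 1.0, 101)

def boundary_residual(slope):
    """Shoot with a guessed initial slope; return the mismatch at the far boundary."""
    return rk4(rhs, [0.0, slope], ts)[0] - 1.0

slope = brentq(boundary_residual, 0.0, 5.0)   # bracket chosen by inspection
print(slope, 1.0 / np.sin(1.0))               # exact initial slope is 1/sin(1) ≈ 1.188
```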
Funding: Supported by the Innovative Human Resource Development for Local Intellectualization Program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2025-RS-2022-00156360).
Abstract: The classification of respiratory sounds is crucial in diagnosing and monitoring respiratory diseases. However, auscultation is highly subjective, making it challenging to analyze respiratory sounds accurately. Although deep learning has been increasingly applied to this task, most existing approaches have primarily relied on supervised learning. Since supervised learning requires large amounts of labeled data, recent studies have explored self-supervised and semi-supervised methods to overcome this limitation. However, these approaches have largely assumed a closed-set setting, where the classes present in the unlabeled data are considered identical to those in the labeled data. In contrast, this study explores an open-set semi-supervised learning setting, where the unlabeled data may contain additional, unknown classes. To address this challenge, a distance-based prototype network is employed to classify respiratory sounds in an open-set setting. In the first stage, the prototype network is trained using labeled and unlabeled data to derive prototype representations of the known classes. In the second stage, distances between unlabeled data and the known class prototypes are computed, and samples exceeding an adaptive threshold are identified as unknown. A new prototype is then calculated for this unknown class. In the final stage, semi-supervised learning is employed to classify labeled and unlabeled data into known and unknown classes. Compared to conventional closed-set semi-supervised learning approaches, the proposed method achieved an average classification accuracy improvement of 2%–5%. Additionally, in cases of data scarcity, utilizing unlabeled data further improved classification performance by 6%–8%. The findings of this study are expected to significantly enhance respiratory sound classification performance in practical clinical settings.
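The second stage (prototype distances plus an adaptive threshold) can be sketched with NumPy as follows; the embeddings are synthetic stand-ins for learned audio features, and the "mean plus two standard deviations" threshold rule is an assumption rather than the paper's exact criterion.

```python
import numpy as np

def class_prototypes(emb, labels):
    """Mean embedding per known class."""
    return {c: emb[labels == c].mean(axis=0) for c in np.unique(labels)}

def detect_unknown(unlab_emb, protos, threshold):
    """Distance to the nearest known prototype; beyond the threshold -> unknown."""
    P = np.stack(list(protos.values()))                          # (C, d)
    d = np.linalg.norm(unlab_emb[:, None, :] - P[None], axis=-1)
    nearest = d.min(axis=1)
    return nearest > threshold, nearest

rng = np.random.default_rng(0)
emb_l = np.vstack([rng.normal(0, 0.3, (30, 16)), rng.normal(3, 0.3, (30, 16))])
y_l = np.repeat([0, 1], 30)
emb_u = np.vstack([rng.normal(0, 0.3, (20, 16)), rng.normal(-4, 0.3, (10, 16))])

protos = class_prototypes(emb_l, y_l)
# Adaptive threshold from the labeled points' distances to their own prototype (one possible choice).
own = np.array([np.linalg.norm(e - protos[c]) for e, c in zip(emb_l, y_l)])
thr = own.mean() + 2 * own.std()
is_unknown, _ = detect_unknown(emb_u, protos, thr)
new_proto = emb_u[is_unknown].mean(axis=0)                       # prototype for the unknown class
print(is_unknown.sum(), "samples flagged as unknown")
```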
Abstract: In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of the predictors for better interpretability. However, existing methods consider convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimality conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint and the orthogonality constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in terms of accuracy for central subspace estimation and best subset selection, and sustains high performance even with correlated predictors.
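For readers unfamiliar with SIR itself, a minimal NumPy implementation of the basic (non-sparse) estimator is sketched below; the sparsity constraint and the splicing updates proposed in the paper are not reproduced, and the simulated link function is an assumption chosen only so that SIR can recover the direction.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dir=2):
    """Basic sliced inverse regression: eigenvectors of the between-slice
    covariance of the whitened predictors, mapped back to the original scale."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T      # whitening matrix
    Z = Xc @ W
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)          # slice by the order statistics of y
    M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
    _, vecs = np.linalg.eigh(M)
    return W @ vecs[:, ::-1][:, :n_dir]               # top directions span the central-subspace estimate

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 2 * X[:, 1]) ** 3 + 0.1 * rng.normal(size=500)   # depends on a single direction
B = sir_directions(X, y, n_dir=1)
print(B[:, 0] / np.linalg.norm(B[:, 0]))   # should load mainly on the first two predictors
```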
Funding: Supported by the Key Program of the Jiangsu Science Foundation (BK20243012) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306133, 62176118).
Abstract: Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce. Conventional SSL studies typically assume closed environments, where important factors (e.g., label, feature, distribution) are consistent between labeled and unlabeled data. However, more practical tasks involve open environments, where important factors between labeled and unlabeled data are inconsistent. It has been reported that exploiting inconsistent unlabeled data causes severe performance degradation, even worse than the simple supervised learning baseline. Manually verifying the quality of unlabeled data is not desirable; therefore, it is important to study robust SSL with inconsistent unlabeled data in open environments. This paper briefly introduces some advances in this line of research, focusing on techniques concerning label, feature, and data distribution inconsistency in SSL, and presents the evaluation benchmarks. Open research problems are also discussed for reference purposes.
Funding: Funded by the INTER program and co-funded by the Fond National de la Recherche, Luxembourg (FNR) and the Fund for Scientific Research-FNRS, Belgium (F.R.S-FNRS), T.0233.20 'Sustainable Residential Densification' project (SusDens, 2020–2024).
Abstract: The impact of different global and local variables in urban development processes requires systematic study to fully comprehend the underlying complexities. The interplay between such variables is crucial for modelling urban growth so that it closely reflects reality. Despite extensive research, ambiguity remains about how variations in these input variables influence urban densification. In this study, we conduct a global sensitivity analysis (SA) using a multinomial logistic regression (MNL) model to assess the model's explanatory and predictive power. We examine the influence of global variables, including spatial resolution, neighborhood size, and density classes, under different input combinations at a provincial scale to understand their impact on densification. Additionally, we perform a stepwise regression to identify the significant explanatory variables that are important for understanding densification in the Brussels Metropolitan Area (BMA). Our results indicate that a finer spatial resolution of 50 m and 100 m, a smaller neighborhood size of 5×5 and 3×3, and specific density classes, namely 3 (non-built-up, low and high built-up) and 4 (non-built-up, low, medium and high built-up), optimally explain and predict urban densification. In line with this, the stepwise regression reveals that models with a coarser resolution of 300 m lack significant variables, reflecting a lower explanatory power for densification. This approach aids in identifying optimal and significant global variables with higher explanatory power for understanding and predicting urban densification. Furthermore, these findings are reproducible in a global urban context, offering valuable insights for planners, modelers and geographers in managing future urban growth and minimizing modelling.
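A minimal illustration of fitting and scoring an MNL model for density classes is given below; the covariates, class definitions, and data are synthetic stand-ins (not the BMA inputs), and the global sensitivity analysis and stepwise selection described in the abstract are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 2000
# Hypothetical drivers of densification (stand-ins for slope, accessibility, neighbourhood density, ...).
X = rng.normal(size=(n, 4))
logits = np.stack([0.5 * X[:, 0], 1.2 * X[:, 1] - X[:, 2], 0.8 * X[:, 3]], axis=1)
density_class = logits.argmax(axis=1)   # three density classes, e.g. non-built-up / low / high built-up

X_tr, X_te, y_tr, y_te = train_test_split(X, density_class, test_size=0.3, random_state=0)
mnl = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # multinomial by default for >2 classes
print("predictive accuracy:", accuracy_score(y_te, mnl.predict(X_te)))
print("coefficients per class:\n", mnl.coef_)
```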
Abstract: Triaxial tests, a staple in rock engineering, are labor-intensive, sample-demanding, and costly, making their optimization highly advantageous. These tests are essential for characterizing rock strength and, by adopting a failure criterion, they allow the criterion parameters to be derived through regression, facilitating their integration into modeling programs. In this study, we introduce the application of an underutilized statistical technique, orthogonal regression, which is well suited for analyzing triaxial test data. Additionally, we present an innovation in this technique by minimizing the Euclidean distance while incorporating orthogonality between vectors as a constraint, for the case of orthogonal linear regression. We also consider the Modified Least Squares method. We exemplify this approach by developing the necessary equations to apply the Mohr-Coulomb, Murrell, Hoek-Brown, and Úcar criteria, and implement these equations in both spreadsheet calculations and R scripts. Finally, we demonstrate the technique's application using five datasets of varied lithologies from the specialized literature, showcasing its versatility and effectiveness.
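As a point of comparison with ordinary least squares, a basic orthogonal (total-least-squares) line fit can be written in a few lines of Python; the triaxial data below are hypothetical, and the sketch uses the standard SVD formulation rather than the authors' constrained Euclidean-distance formulation or the Modified Least Squares method.

```python
import numpy as np

def orthogonal_line_fit(x, y):
    """Total-least-squares (orthogonal regression) fit of y = a + b*x,
    minimizing perpendicular distances via the SVD of the centred data."""
    xm, ym = x.mean(), y.mean()
    A = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    nx, ny = vt[-1]                      # normal vector of the best-fit line
    b = -nx / ny                         # slope
    a = ym - b * xm                      # intercept
    return a, b

# Hypothetical triaxial results: confining stress sigma3 vs. peak axial stress sigma1 (MPa).
sigma3 = np.array([0.0, 5.0, 10.0, 20.0, 30.0])
sigma1 = np.array([62.0, 95.0, 121.0, 178.0, 231.0])
a, b = orthogonal_line_fit(sigma3, sigma1)
# For Mohr-Coulomb in principal-stress form, sigma1 = sigma_c + k*sigma3 with
# k = (1 + sin(phi)) / (1 - sin(phi)), so phi and cohesion follow from (a, b).
phi = np.degrees(np.arcsin((b - 1.0) / (b + 1.0)))
cohesion = a * (1.0 - np.sin(np.radians(phi))) / (2.0 * np.cos(np.radians(phi)))
print(f"sigma_c ≈ {a:.1f} MPa, k ≈ {b:.2f}, phi ≈ {phi:.1f} deg, c ≈ {cohesion:.1f} MPa")
```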
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 62271302) and the Shanghai Municipal Natural Science Foundation (Grant No. 20ZR1423500).
Abstract: Large amounts of labeled data are usually needed for training deep neural networks in medical image studies, particularly in medical image classification. However, in the field of semi-supervised medical image analysis, labeled data are very scarce due to patient privacy concerns. For researchers, obtaining high-quality labeled images is exceedingly challenging because it involves manual annotation and clinical understanding. In addition, skin datasets are highly suitable for medical image classification studies due to the inter-class relationships and inter-class similarities of skin lesions. In this paper, we propose a model called Coalition Sample Relation Consistency (CSRC), a consistency-based method that leverages Canonical Correlation Analysis (CCA) to capture the intrinsic relationships between samples. Considering that traditional consistency-based models focus only on the consistency of predictions, we additionally explore the similarity between features by using CCA. We enforce feature relation consistency on top of traditional models, encouraging the model to learn more meaningful information from unlabeled data. Finally, considering that cross-entropy loss is not as suitable as a supervised loss when working with imbalanced datasets (i.e., ISIC 2017 and ISIC 2018), we improve the supervised loss to achieve better classification accuracy. Our study shows that this model performs better than many semi-supervised methods.
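One way to picture a CCA-based feature relation consistency term is sketched below with scikit-learn; the two feature matrices are synthetic stand-ins for a batch passed through two branches, and the "one minus mean canonical correlation" penalty is an illustrative choice, not necessarily the CSRC formulation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Hypothetical penultimate-layer features for the same batch under two passes/augmentations.
feats_a = rng.normal(size=(64, 32))
feats_b = feats_a @ rng.normal(size=(32, 32)) * 0.5 + 0.1 * rng.normal(size=(64, 32))

cca = CCA(n_components=4).fit(feats_a, feats_b)
u, v = cca.transform(feats_a, feats_b)
# Canonical correlations between paired variates; high values mean consistent sample relations.
canon_corr = np.array([np.corrcoef(u[:, k], v[:, k])[0, 1] for k in range(u.shape[1])])
relation_consistency_loss = 1.0 - canon_corr.mean()   # a possible penalty added to the usual consistency loss
print(canon_corr.round(3), round(relation_consistency_loss, 3))
```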
Funding: Supported by the National Natural Science Foundation of China (No. 62001313) and the Key Project of the Liaoning Provincial Department of Science and Technology (No. 2021JH2/10300134, 2022JH1/10500004).
Abstract: In the realm of medical image segmentation, particularly in cardiac magnetic resonance imaging (MRI), achieving robust performance with limited annotated data is a significant challenge. Performance often degrades when faced with testing scenarios from unknown domains. To address this problem, this paper proposes a novel semi-supervised approach for cardiac magnetic resonance image segmentation, aiming to enhance predictive capabilities and domain generalization (DG). This paper establishes an MT-like model utilizing pseudo-labeling and consistency regularization from semi-supervised learning, and integrates uncertainty estimation to improve the accuracy of the pseudo-labels. Additionally, to tackle the challenge of domain generalization, a data manipulation strategy is introduced, extracting spatial and content-related information from images across different domains and enriching the dataset with a multi-domain perspective. This paper's method is meticulously evaluated on the publicly available cardiac magnetic resonance imaging dataset M&Ms, validating its effectiveness. Comparative analyses against various methods highlight the outstanding performance of this paper's approach, demonstrating its capability to segment cardiac magnetic resonance images in previously unseen domains even with limited annotated data.
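A generic mean-teacher (MT) building block with uncertainty-filtered pseudo-labels can be sketched in PyTorch as follows; the entropy threshold and toy tensor shapes are assumptions, and the paper's domain-generalization data manipulation is not shown.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher style exponential moving average of student weights into the teacher."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(alpha).add_(ps, alpha=1 - alpha)

def uncertainty_masked_pseudo_loss(student_logits, teacher_logits, ent_threshold=1.3):
    """Apply pseudo-supervision only where the teacher's predictive entropy is low."""
    with torch.no_grad():
        p = F.softmax(teacher_logits, dim=1)                       # (N, C, H, W)
        entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)            # per-pixel uncertainty
        pseudo = p.argmax(dim=1)
        mask = (entropy < ent_threshold).float()
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")  # (N, H, W)
    return (ce * mask).sum() / mask.sum().clamp(min=1.0)

# Toy shapes: batch of 2, 4 classes, 8x8 "images".
student_logits = torch.randn(2, 4, 8, 8, requires_grad=True)
teacher_logits = torch.randn(2, 4, 8, 8)
loss = uncertainty_masked_pseudo_loss(student_logits, teacher_logits)
loss.backward()
```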
Funding: Supported by the Natural Science Foundation of Shanghai (23ZR1463600), the Shanghai Pudong New Area Health Commission Research Project (PW2021A-69), and the Research Project of the Clinical Research Center of Shanghai Health Medical University (22MC2022002).
Abstract: Gastric cancer is the third leading cause of cancer-related mortality and remains a major global health issue [1]. Annually, approximately 479,000 individuals in China are diagnosed with gastric cancer, accounting for almost 45% of all new cases worldwide [2].
Abstract: Semi-supervised clustering techniques attempt to improve clustering accuracy by utilizing a limited number of labeled data points for guidance. This approach effectively integrates prior knowledge using pre-labeled data. While semi-supervised fuzzy clustering (SSFC) methods leverage limited labeled data to enhance accuracy, they remain highly susceptible to inappropriate or mislabeled prior knowledge, especially in noisy or overlapping datasets where cluster boundaries are ambiguous. To enhance the effectiveness of clustering algorithms, it is essential to leverage labeled data while ensuring the safety of the prior knowledge. Existing solutions, such as the Trusted Safe Semi-Supervised Fuzzy Clustering Method (TS3FCM), struggle with random centroid initialization, fixed neighbor-radius formulas, and handling outliers or noise at cluster overlaps. A new framework called Active Safe Semi-Supervised Fuzzy Clustering with Pairwise Constraints Based on Cluster Boundary (AS3FCPC) is proposed in this paper to deal with these problems by combining pairwise constraints and active learning. AS3FCPC uses active learning to query only the most informative data instances close to the cluster boundaries, and it uses pairwise constraints to enforce the cluster structure, which makes the method more accurate and robust. Extensive test results on diverse datasets, including challenging noisy and overlapping scenarios, demonstrate that AS3FCPC consistently achieves superior performance compared to state-of-the-art methods such as TS3FCM and other baselines, especially when the data are noisy and overlapping. This significant improvement underscores AS3FCPC's potential for reliable and accurate semi-supervised fuzzy clustering in complex, real-world applications, particularly by effectively managing mislabeled data and ambiguous cluster boundaries.
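The core mechanics, fuzzy memberships plus an active query of boundary points, can be illustrated with a plain fuzzy c-means loop and a membership-margin query rule; this sketch is generic (no safety mechanism or pairwise constraints) and all parameters and data are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns the membership matrix U and the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c)); U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=-1) + 1e-10
        p = 2.0 / (m - 1.0)
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))  # u_ik = 1 / sum_j (d_ik/d_ij)^p
    return U, centers

def boundary_query(U, n_query=5):
    """Active-learning style selection: points whose top-two memberships are closest
    (i.e. lie near a cluster boundary) are the most informative to label."""
    top2 = np.sort(U, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:n_query]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])   # two overlapping clusters
U, centers = fuzzy_c_means(X, c=2)
print("indices to ask an annotator about:", boundary_query(U, n_query=5))
```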
Abstract: Active semi-supervised fuzzy clustering integrates fuzzy clustering techniques with limited labeled data, guided by active learning, to enhance classification accuracy, particularly in complex and ambiguous datasets. Although several active semi-supervised fuzzy clustering methods have been developed previously, they typically face significant limitations, including high computational complexity, sensitivity to initial cluster centroids, and difficulties in accurately managing boundary clusters where data points often overlap among multiple clusters. This study introduces a novel Active Semi-Supervised Fuzzy Clustering algorithm specifically designed to identify, analyze, and correct misclassified boundary elements. By strategically utilizing labeled data through active learning, our method improves the robustness and precision of cluster boundary assignments. Extensive experimental evaluations conducted on three types of datasets (benchmark UCI datasets, synthetic data with controlled boundary overlap, and satellite imagery) demonstrate that our proposed approach achieves superior performance in terms of clustering accuracy and robustness compared to existing active semi-supervised fuzzy clustering methods. The results confirm the effectiveness and practicality of our method in handling real-world scenarios where precise cluster boundaries are critical.
Funding: Supported by the Innovation Program of Shanghai Industrial Synergy (No. XTCX-KJ-2023-2-12).
Abstract: Medical image segmentation is a crucial task in clinical applications. However, obtaining labeled data for medical images is often challenging. This has led to the appeal of semi-supervised learning (SSL), a technique adept at leveraging a modest amount of labeled data. Nonetheless, most prevailing SSL segmentation methods for medical images either rely on a single consistency training method or directly fine-tune SSL methods designed for natural images. In this paper, we propose an innovative semi-supervised method called multi-consistency training (MCT) for medical image segmentation. Our approach transcends the constraints of prior methodologies by considering consistency from a dual perspective: output consistency across different up-sampling methods, and output consistency of the same data within the same network under various perturbations of the intermediate features. We design distinct semi-supervised loss regression methods for these two types of consistency. To enhance the application of our MCT model, we also develop a dedicated decoder as the core of our neural network. Thorough experiments were conducted on the polyp dataset and the dental dataset and rigorously compared against other SSL methods. The experimental results demonstrate the superiority of our approach, which achieves higher segmentation accuracy. Moreover, comprehensive ablation studies and an insightful discussion substantiate the efficacy of our approach in navigating the intricacies of medical image segmentation.
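The dual consistency idea (different up-sampling routes, plus feature perturbation) can be sketched in PyTorch as below; the toy head pair, noise level, and loss form are assumptions and do not reproduce the dedicated decoder described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoUpsampleHeads(nn.Module):
    """Toy segmentation head pair sharing one feature map but using different up-sampling routes."""
    def __init__(self, in_ch=16, n_classes=2):
        super().__init__()
        self.head_interp = nn.Conv2d(in_ch, n_classes, kernel_size=1)
        self.head_deconv = nn.ConvTranspose2d(in_ch, n_classes, kernel_size=2, stride=2)

    def forward(self, feat):
        out_a = F.interpolate(self.head_interp(feat), scale_factor=2, mode="bilinear",
                              align_corners=False)
        out_b = self.head_deconv(feat)
        return out_a, out_b

def multi_consistency_loss(model, feat, noise_std=0.1):
    out_a, out_b = model(feat)
    # (1) consistency across the two up-sampling routes
    loss_up = F.mse_loss(torch.softmax(out_a, 1), torch.softmax(out_b, 1))
    # (2) consistency under perturbation of the intermediate features
    out_a2, _ = model(feat + noise_std * torch.randn_like(feat))
    loss_pert = F.mse_loss(torch.softmax(out_a2, 1), torch.softmax(out_a, 1).detach())
    return loss_up + loss_pert

feat = torch.randn(2, 16, 32, 32, requires_grad=True)   # stand-in encoder features
loss = multi_consistency_loss(TwoUpsampleHeads(), feat)
loss.backward()
```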
Funding: Supported by the National Natural Science Foundation of China (No. 92370117) and the CAS Project for Young Scientists in Basic Research (No. YSBR-090).
Abstract: In recent years, machine learning (ML) techniques have been shown to be effective in accelerating the development process of optoelectronic devices. However, as "black box" models, they have limited theoretical interpretability. In this work, we leverage the symbolic regression (SR) technique to discover the explicit symbolic relationship between the structure of an optoelectronic Fabry-Perot (FP) laser and its optical field distribution, which greatly improves model transparency compared to ML. We demonstrate that the expressions explored through SR exhibit lower errors on the test set compared to ML models, which suggests that the expressions have better fitting and generalization capabilities.
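For readers who have not used symbolic regression, the sketch below shows the general workflow on synthetic data, assuming the third-party gplearn package; the stand-in inputs and target are not the FP-laser structures or field distributions used in the paper.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor   # assumes the gplearn package is installed

rng = np.random.default_rng(0)
# Hypothetical stand-in data: two structural parameters and a scalar response,
# not the actual FP-laser simulation outputs used in the paper.
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 - 0.5 * X[:, 0] * X[:, 1] + 0.01 * rng.normal(size=300)

sr = SymbolicRegressor(population_size=500, generations=15, random_state=0)
sr.fit(X, y)
print(sr._program)                 # the explicit symbolic expression, unlike a black-box regressor
print("in-sample R^2:", sr.score(X, y))
```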
Funding: Financially supported by the Science and Technology Research and Development Project of China Railway Corporation (Grant No. N2023G079) and the National Key R&D Program of China (Grant No. 2024YFE0198500).
Abstract: Predicting blasting quality during tunnel construction holds practical significance. In this study, a new semi-supervised learning method using a convolutional variational autoencoder (CVAE) and a deep neural network (DNN) is proposed for the prediction of blasting quality grades. Tunnel blasting quality can be measured by over/under excavation. The occurrence of over/under excavation is influenced by three factors: geological conditions, blasting parameters, and tunnel geometric dimensions. The proposed method reflects the geological conditions through measurements while drilling, and utilizes the blasting parameters, tunnel geometric dimensions, and tunnel depth as input variables to predict the tunnel blasting quality grades. Furthermore, the model is optimized by considering the influence of surrounding rock mass features on the predicted positions. The results demonstrate that the proposed method outperforms other commonly used machine learning and deep learning algorithms in extracting over/under excavation feature information and achieving blasting quality prediction.
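The two-stage semi-supervised pattern (representation learning on all records, then a supervised DNN on the labelled subset) can be sketched in PyTorch; a plain autoencoder stands in for the CVAE, and every feature, dimension, and hyperparameter below is a synthetic assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in features per blasting round: MWD-derived geology indicators + blasting parameters + geometry.
X_all = torch.randn(500, 8)                               # labelled + unlabelled rounds
X_lab, y_lab = X_all[:80], torch.randint(0, 3, (80,))     # only 80 rounds carry quality grades

enc = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
dec = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))
clf = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

# Stage 1: unsupervised representation learning on all rounds (plain AE standing in for the CVAE).
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(X_all)), X_all)
    loss.backward(); opt.step()

# Stage 2: supervised DNN on the learned codes of the labelled rounds.
opt2 = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(200):
    opt2.zero_grad()
    loss = nn.functional.cross_entropy(clf(enc(X_lab).detach()), y_lab)
    loss.backward(); opt2.step()
print("training accuracy:", (clf(enc(X_lab)).argmax(1) == y_lab).float().mean().item())
```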
Funding: Supported by the Youth Scientific Research Project of the Fujian Provincial Center for Disease Control and Prevention (2022QN02) and the Fujian Provincial Health Youth Scientific Research Project (2023QNA040).
Abstract: Background: COVID-19's impact on influenza activity is of interest to inform future flu prevention and control strategies. Our study aims to examine COVID-19's effects on influenza in Fujian Province, China, using a regression discontinuity (RD) design. Methods: We utilized the influenza-like illness (ILI) percentage as an indicator of influenza activity, with data from all sentinel hospitals between Week 4, 2020, and Week 51, 2023. The data are divided into two groups: the COVID-19 epidemic period and the post-epidemic period. Statistical analysis was performed with R software using robust RD design methods to account for potential confounders, including seasonality, temperature, and influenza vaccination rates. Results: There was a discernible increase in the ILI percentage during the post-epidemic period. The robustness of the findings was confirmed with various RD design bandwidth selection methods and placebo tests, with the certwo bandwidth providing the largest estimated effect size: a 14.6-percentage-point increase in the ILI percentage (β=0.146; 95% CI: 0.096–0.196). Sensitivity analyses and adjustments for confounders consistently pointed to an increased ILI percentage during the post-epidemic period compared to the epidemic period. Conclusion: The 14.6-percentage-point increase in the ILI percentage in Fujian Province, China, after the end of the COVID-19 pandemic suggests that there may be a need to re-evaluate and possibly enhance public health measures to control influenza transmission. Further research is needed to fully understand the factors contributing to this rise and to assess the ongoing impacts of post-pandemic behavioral changes.
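The basic RD logic (a jump in the outcome at a cutoff on the running variable) can be illustrated with a simple local-linear estimate in Python; the weekly series below is synthetic, with a jump of 0.146 built in, and the sketch does not reproduce the robust bandwidth selectors (e.g. certwo) or covariate adjustments used in the study.

```python
import numpy as np

def sharp_rd_estimate(run, y, cutoff=0.0, bandwidth=30.0):
    """Local-linear sharp RD: fit a separate line on each side of the cutoff within the
    bandwidth; the effect is the jump between the two fitted values at the cutoff."""
    fitted = {}
    for side, sel in (("left", (run < cutoff) & (run >= cutoff - bandwidth)),
                      ("right", (run >= cutoff) & (run <= cutoff + bandwidth))):
        A = np.column_stack([np.ones(sel.sum()), run[sel] - cutoff])
        coef, *_ = np.linalg.lstsq(A, y[sel], rcond=None)
        fitted[side] = coef[0]                        # fitted value at the cutoff
    return fitted["right"] - fitted["left"]

# Synthetic weekly series standing in for the ILI proportion; week 0 marks the end of the epidemic period.
rng = np.random.default_rng(0)
week = np.arange(-150, 150)
ili = 0.05 + 0.0002 * week + 0.146 * (week >= 0) + rng.normal(0, 0.01, week.size)
print("estimated jump at the cutoff:", round(sharp_rd_estimate(week, ili), 3))   # ~0.146 by construction
```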
Abstract: This opinion article discusses the original research work of Yünkül et al. (the Authors) published in the Journal of Mountain Science 21(9): 3108–3122. Employing non-linear regression, fuzzy logic, and artificial neural network modeling techniques, the Authors interrogated a large database assembled from the existing research literature to assess the performance of twelve equation rules in predicting the undrained shear strength (s_u) mobilized for remolded fine-grained soils at different values of liquidity index (I_L) and water content ratio. Based on their analyses, the Authors proposed a simple and reportedly reliable correlation (i.e., Eq. 9 in their paper) for predicting s_u over the I_L range of 0.15 to 3.00. This article describes various shortcomings in the Authors' assembled database (including potentially anomalous data and coverage of an excessively wide I_L range in relation to routine geotechnical and transportation engineering applications) and in their proposed s_u = f(I_L) correlation. Contrary to the Authors' assertions, their proposed correlation is not reliable for fine-grained soils with consistencies in the general firm to stiff range (i.e., for 0.15 < I_L < 0.40), increasingly overestimating s_u for reducing I_L and eventually predicting s_u → +∞ for I_L → 0.15+ (while producing mathematically undefined s_u for I_L < 0.15), thus rendering their correlation unconservative and potentially leading to unsafe geotechnical designs. Exponential or regular-power type s_u = f(I_L) models are more suitable when developing correlations that are applicable over the full plastic range (0 < I_L < 1), thereby providing reasonably conservative s_u predictions for use in preliminary design for routine geotechnical engineering applications.
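The divergence argument is easy to see numerically; the short sketch below compares a shifted-power form (which blows up as I_L approaches 0.15 from above) with an exponential form that stays finite over the full plastic range. The coefficients are purely illustrative and are not those of Eq. 9 or of any fitted correlation.

```python
import numpy as np

I_L = np.array([0.16, 0.20, 0.40, 0.70, 1.00])

# Shifted-power form in the spirit of the criticized correlation (coefficients illustrative only):
su_shifted = 10.0 / (I_L - 0.15) ** 1.2      # diverges as I_L -> 0.15+, undefined for I_L < 0.15
# Exponential form defined over the whole plastic range (coefficients again illustrative):
su_exp = 170.0 * np.exp(-4.6 * I_L)          # finite and monotonically decreasing for all I_L >= 0

for il, a, b in zip(I_L, su_shifted, su_exp):
    print(f"I_L = {il:.2f}   shifted-power {a:9.1f} kPa   exponential {b:6.1f} kPa")
```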
Funding: Supported by the Natural Science Foundation of Fujian Province (2022J011177, 2024J01903) and the Key Project of the Fujian Provincial Education Department (JZ230054).
Abstract: In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments, improving therapeutic effect and safety, and it is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between the subgroups and the exposure variables. Quantile regression is used to better characterize the global distribution of the response variable, and sparsity penalties are imposed to achieve variable selection of the covariates and biomarkers. The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating the practicality of the method proposed in this paper.
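A flavour of the core ingredients (median regression with an L1 sparsity penalty and a subgroup-by-treatment interaction) is sketched below with scikit-learn, assuming a recent version (≥ 1.4) where QuantileRegressor defaults to the HiGHS solver. The data are synthetic, the subgroup is taken as known for illustration (the paper estimates the biomarker combination and threshold), and the longitudinal structure is ignored.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p = 300, 10
X = rng.normal(size=(n, p))                    # covariates and candidate biomarkers
treat = rng.integers(0, 2, n)                  # exposure / treatment arm
score = X[:, 0] + 0.8 * X[:, 1]                # "true" biomarker combination defining the subgroup
subgroup = (score > 0).astype(float)
# Treatment helps only within the subgroup; heavy-tailed noise motivates a median (quantile) fit.
y = 0.5 * X[:, 2] + 1.5 * treat * subgroup + rng.standard_t(df=3, size=n)

design = np.column_stack([X, treat, treat * subgroup])   # oracle interaction term, for illustration only
qr = QuantileRegressor(quantile=0.5, alpha=0.05).fit(design, y)   # alpha = L1 (sparsity) penalty
print("selected (non-zero) coefficients:", np.flatnonzero(np.abs(qr.coef_) > 1e-6))
print("estimated subgroup-by-treatment effect:", round(qr.coef_[-1], 2))
```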
Funding: Supported by the Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China (LHZY24A010002) and the MOE Project of Humanities and Social Sciences (21YJCZH235).
Abstract: High-dimensional heterogeneous data have acquired increasing attention and discussion in the past decade. In the context of heterogeneity, semiparametric regression has emerged as a popular method for modeling this type of data in statistics. In this paper, we leverage the benefits of expectile regression for computational efficiency and analytical robustness under heterogeneity, and propose a regularized partially linear additive expectile regression model with a nonconvex penalty, such as SCAD or MCP, for high-dimensional heterogeneous data. We focus on a more realistic scenario where the regression error exhibits a heavy-tailed distribution with only finite moments. This scenario challenges the classical sub-Gaussian distribution assumption and is more prevalent in practical applications. Under certain regularity conditions, we demonstrate that, with probability tending to one, the oracle estimator is one of the local minima of the induced optimization problem. Our theoretical analysis suggests that the dimensionality of the linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition on the regression error. Computationally, given the nonconvex and nonsmooth nature of the induced optimization problem, we have developed a two-step algorithm. Finally, our method's effectiveness is demonstrated through its high estimation accuracy and effective model selection, as evidenced by Monte Carlo simulation studies and a real-data application. Furthermore, by taking various expectile weights, our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable, underscoring its utility in analyzing high-dimensional heterogeneous data.
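Plain (unpenalized, purely linear) expectile regression can be written as a short iteratively reweighted least-squares loop, which also shows how varying the expectile weight probes different parts of the conditional distribution; the partially linear additive structure, the SCAD/MCP penalty, and the two-step algorithm of the paper are not reproduced, and the simulated data are an assumption.

```python
import numpy as np

def expectile_regression(X, y, tau=0.5, iters=50):
    """Linear expectile regression via iteratively reweighted least squares.
    tau = 0.5 recovers ordinary least squares; other tau values probe the tails."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    for _ in range(iters):
        resid = y - Xd @ beta
        w = np.where(resid > 0, tau, 1 - tau)           # asymmetric squared-loss weights
        WX = Xd * w[:, None]
        beta = np.linalg.solve(Xd.T @ WX, WX.T @ y)     # weighted normal equations
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 1))
# Heteroscedastic, heavy-tailed errors (finite moments only), the setting the paper targets.
y = 1.0 + 2.0 * x[:, 0] + (0.5 + 0.5 * np.abs(x[:, 0])) * rng.standard_t(df=4, size=n)

for tau in (0.1, 0.5, 0.9):
    print(tau, np.round(expectile_regression(x, y, tau=tau), 2))   # coefficients drift with tau under heterogeneity
```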