High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, the complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while simultaneously accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
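The abstract above does not give the objective function, so the following is only a minimal sketch of the building block it names: estimating mutual information between two discrete variables from co-occurrence counts. The function name `mutual_information` and the toy feature and label vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (illustrative only): one label and two candidate features.
label = [0, 0, 1, 1, 0, 1, 0, 1]
feature_a = [0, 0, 1, 1, 0, 1, 0, 1]  # copies the label exactly
feature_b = [0, 1, 0, 1, 0, 0, 1, 1]  # empirically independent of the label

relevance_a = mutual_information(feature_a, label)
relevance_b = mutual_information(feature_b, label)
print(relevance_a, relevance_b)
```

The same function can be applied to feature-feature pairs (redundancy) or label-label pairs (inter-label dependency); how those terms are weighted and combined is specific to the paper and is not reproduced here.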
The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution; 2) public transit accessibility showed a ‘core-periphery’ structure, with accessibility declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents’ living standards.
BACKGROUND: Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China’s Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among spouses of older mothers remains critically scarce, both in China and globally. AIM: To investigate PPD and its influencing factors in Chinese advanced maternal age families. METHODS: This cross-sectional study included 358 participants; it was conducted among fathers whose pregnant partners were of advanced maternal age at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected via a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were adopted to analyze the latent PPD types and the factors that influenced PPD. RESULTS: The incidence of PPD was 16.48%, and three profiles were identified: low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P<0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P<0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with monophasic disorder (P<0.05). CONCLUSION: PPD incidence is high among Chinese fathers whose partners are of advanced maternal age, and the characteristics of depression are varied. Healthcare practitioners should prioritize individuals with low levels of social support.
To better capture the characteristics of asymmetry and structural fluctuations observed in count time series, this study delves into the application of the quantile regression (QR) method for analyzing and forecasting nonlinear integer-valued time series exhibiting a piecewise phenomenon. Specifically, we focus on the parameter estimation in the first-order Self-Exciting Threshold Integer-valued Autoregressive (SETINAR(2,1)) process with symmetry, asymmetry, and contaminated innovations. We establish the asymptotic properties of the estimator under certain regularity conditions. Monte Carlo simulations demonstrate the superior performance of the QR method compared to the conditional least squares (CLS) approach. Furthermore, we validate the robustness of the proposed method through empirical quantile regression estimation and forecasting for larceny incidents and CAD drug call counts in Pittsburgh, showcasing its effectiveness across diverse levels of data heterogeneity.
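Quantile regression replaces the squared error used by CLS with the check (pinball) loss. The sketch below shows only that loss and how minimising it over a constant recovers a sample quantile; it is not the SETINAR(2,1) estimator from the abstract, and the names `pinball_loss` and `best_constant` and the count series are invented for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average check (pinball) loss at quantile level tau, with 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        r = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(y_true)


def best_constant(y, tau):
    """Observed value that minimises the pinball loss when used as a constant forecast."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Made-up, right-skewed count series (illustrative only).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 21]

median_fit = best_constant(counts, 0.5)   # minimiser at tau = 0.5 is the sample median
upper_fit = best_constant(counts, 0.75)   # a higher tau moves the fit into the upper tail
print(median_fit, upper_fit)
```

Because the loss is asymmetric, different values of tau describe different parts of a skewed distribution, which is the property the abstract relies on for asymmetric count data.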
Triaxial tests, a staple in rock engineering, are labor-intensive, sample-demanding, and costly, making their optimization highly advantageous. These tests are essential for characterizing rock strength, and by adopting a failure criterion, they allow for the derivation of criterion parameters through regression, facilitating their integration into modeling programs. In this study, we introduce the application of an underutilized statistical technique, orthogonal regression, which is well suited to analyzing triaxial test data. Additionally, we present an innovation in this technique by minimizing the Euclidean distance while incorporating orthogonality between vectors as a constraint, for the case of orthogonal linear regression. We also consider the Modified Least Squares method. We exemplify this approach by developing the necessary equations to apply the Mohr-Coulomb, Murrell, Hoek-Brown, and Úcar criteria, and implement these equations in both spreadsheet calculations and R scripts. Finally, we demonstrate the technique's application using five datasets of varied lithologies from specialized literature, showcasing its versatility and effectiveness.
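As a rough illustration of orthogonal linear regression, the sketch below uses the classic closed-form total-least-squares line, which minimises perpendicular rather than vertical distances. It is not the constrained formulation or the Modified Least Squares method described in the abstract. The stress values are invented, and the conversion to a friction angle assumes the linear Mohr-Coulomb form in principal stresses, sigma1 = sigma_c + k * sigma3 with k = (1 + sin(phi)) / (1 - sin(phi)).

```python
import math


def orthogonal_line_fit(xs, ys):
    """Slope and intercept of the line minimising squared perpendicular distances."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the orthogonal slope is not unique")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


# Hypothetical triaxial results in MPa (confining stress, peak axial stress).
sigma3 = [0.0, 5.0, 10.0, 20.0]
sigma1 = [50.0, 65.0, 80.0, 110.0]

k, sigma_c = orthogonal_line_fit(sigma3, sigma1)
# Friction angle implied by the slope under the assumed Mohr-Coulomb form.
phi_deg = math.degrees(math.asin((k - 1) / (k + 1)))
print(k, sigma_c, phi_deg)
```

Unlike ordinary least squares, this fit treats errors in both stress variables symmetrically, so swapping the axes yields the same geometric line.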
The impact of different global and local variables in urban development processes requires a systematic study to fully comprehend their underlying complexities. The interplay between such variables is crucial for modelling urban growth so that it closely reflects reality. Despite extensive research, ambiguity remains about how variations in these input variables influence urban densification. In this study, we conduct a global sensitivity analysis (SA) using a multinomial logistic regression (MNL) model to assess the model’s explanatory and predictive power. We examine the influence of global variables, including spatial resolution, neighborhood size, and density classes, under different input combinations at a provincial scale to understand their impact on densification. Additionally, we perform a stepwise regression to identify the significant explanatory variables that are important for understanding densification in the Brussels Metropolitan Area (BMA). Our results indicate that a finer spatial resolution of 50 m and 100 m, a smaller neighborhood size of 5×5 and 3×3, and specific density classes, namely 3 (non-built-up, low and high built-up) and 4 (non-built-up, low, medium and high built-up), optimally explain and predict urban densification. Consistent with this, the stepwise regression reveals that models with a coarser resolution of 300 m lack significant variables, reflecting a lower explanatory power for densification. This approach aids in identifying optimal and significant global variables with higher explanatory power for understanding and predicting urban densification. Furthermore, these findings are reproducible in a global urban context, offering valuable insights for planners, modelers and geographers in managing future urban growth and minimizing modelling.
Gastric cancer is the third leading cause of cancer-related mortality and remains a major global health issue [1]. Annually, approximately 479,000 individuals in China are diagnosed with gastric cancer, accounting for almost 45% of all new cases worldwide [2].
In recent years, machine learning (ML) techniques have been shown to be effective in accelerating the development process of optoelectronic devices. However, as "black box" models, they have limited theoretical interpretability. In this work, we leverage the symbolic regression (SR) technique to discover the explicit symbolic relationship between the structure of the optoelectronic Fabry-Perot (FP) laser and its optical field distribution, which greatly improves model transparency compared to ML. We demonstrate that the expressions found through SR exhibit lower errors on the test set than ML models, which suggests that the expressions have better fitting and generalization capabilities.
Background: The impact of COVID-19 on influenza activity is of interest for informing future flu prevention and control strategies. Our study aims to examine COVID-19’s effects on influenza in Fujian Province, China, using a regression discontinuity (RD) design. Methods: We utilized the influenza-like illness (ILI) percentage as an indicator of influenza activity, with data from all sentinel hospitals between Week 4, 2020, and Week 51, 2023. The data were divided into two groups: the COVID-19 epidemic period and the post-epidemic period. Statistical analysis was performed with R software using robust RD design methods to account for potential confounders including seasonality, temperature, and influenza vaccination rates. Results: There was a discernible increase in the ILI percentage during the post-epidemic period. The robustness of the findings was confirmed with various RD design bandwidth selection methods and placebo tests, with the certwo bandwidth providing the largest estimated effect size: a 14.6-percentage-point increase in the ILI percentage (β=0.146; 95% CI: 0.096–0.196). Sensitivity analyses and adjustments for confounders consistently pointed to an increased ILI percentage during the post-epidemic period compared to the epidemic period. Conclusion: The 14.6-percentage-point increase in the ILI percentage in Fujian Province, China, after the end of the COVID-19 pandemic suggests that there may be a need to re-evaluate and possibly enhance public health measures to control influenza transmission. Further research is needed to fully understand the factors contributing to this rise and to assess the ongoing impacts of post-pandemic behavioral changes.
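The core idea of a sharp regression discontinuity estimate can be sketched as two local linear fits, one on each side of the cutoff, compared at the cutoff itself. The code below assumes a uniform kernel and a hand-picked bandwidth; the robust inference, covariate adjustment, and data-driven bandwidth selectors mentioned in the abstract are not reproduced. The weekly series is synthetic and the names `ols_line` and `sharp_rd_jump` are hypothetical.

```python
def ols_line(xs, ys):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sharp_rd_jump(running, outcome, cutoff, bandwidth):
    """Difference between right and left local linear fits, evaluated at the cutoff."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    # The running variable is centred, so each intercept is the fitted value at the cutoff.
    _, at_cutoff_left = ols_line(*zip(*left))
    _, at_cutoff_right = ols_line(*zip(*right))
    return at_cutoff_right - at_cutoff_left


# Synthetic weekly ILI proportion: a mild trend plus a jump of 0.15 from week 20 onward.
weeks = list(range(40))
ili = [0.03 + 0.001 * w + (0.15 if w >= 20 else 0.0) for w in weeks]

jump = sharp_rd_jump(weeks, ili, cutoff=20, bandwidth=10)
print(jump)
```

In a real analysis the bandwidth choice drives the bias-variance trade-off, which is why the abstract reports results under several bandwidth selection methods.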
Understanding how dataset size affects regression models can help improve the accuracy of solar power forecasts and make the most of renewable energy systems. This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction, contributing to better forecasting methods. The study analyzes data from two solar panels, aSiMicro03036 and aSiTandem72-46, over 7, 14, 17, 21, 28, and 38 days, with each dataset comprising five independent parameters and one dependent parameter, and split 80–20 for training and testing. Results indicate that Random Forest consistently outperforms other models, achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error (MAE) of 2.0544 on the aSiTandem72-46 panel with 21 days of data. For the aSiMicro03036 panel, the best MAE of 4.2978 was reached using the k-Nearest Neighbor (k-NN) algorithm, which was set up as instance-based k-nearest neighbors (IBk) in Weka, after training on 17 days of data. Regression performance for most models (excluding IBk) stabilizes at 14 days or more. Compared to the 7-day dataset, increasing to 21 days reduced the MAE by around 20% and improved correlation coefficients by around 2.1%, highlighting the value of moderate dataset expansion. These findings suggest that datasets spanning 17 to 21 days, with 80% used for training, can significantly enhance the predictive accuracy of solar power generation models.
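The two metrics quoted in the abstract above can be computed as follows. This is a minimal sketch that assumes the reported correlation coefficient is Pearson's r between measured and predicted values; the power readings are invented and are not data from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical held-out power readings and model predictions (same units).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(actual, predicted)
r = pearson_r(actual, predicted)
print(mae, r)
```

MAE is expressed in the units of the target, while the correlation coefficient is scale-free, so the two can move differently as the training window grows.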
As the core component of inertial navigation systems, fiber optic gyroscopes (FOGs), with technical advantages such as low power consumption, long lifespan, fast startup speed, and flexible structural design, are widely used in aerospace, unmanned driving, and other fields. However, due to the temperature sensitivity of optical devices, the environmental temperature causes errors in FOGs, thereby greatly limiting their output accuracy. This work investigates machine-learning-based temperature error compensation techniques for FOGs. Specifically, it focuses on compensating for the bias errors generated in the fiber ring due to the Shupe effect. This work proposes a composite model based on k-means clustering, support vector regression, and particle swarm optimization algorithms. It also significantly reduces redundancy within the samples by adopting interval sequence samples. Moreover, metrics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance are selected to evaluate the model’s performance and compensation effectiveness. This work effectively enhances the consistency between data and models across different temperature ranges and temperature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared to existing methods that use a single machine learning model, the proposed method raises the improvement in bias stability of the compensated FOG from 57.11% to 71.98%, and raises the suppression of the rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of the FOG after compensation, providing theoretical guidance and technical references for sensor error compensation work in other fields.
This study numerically examines the heat and mass transfer characteristics of two ternary nanofluids via converging and diverging channels. Furthermore, the study aims to assess two ternary nanofluid combinations to determine which configuration can provide better heat and mass transfer and lower entropy production, while ensuring cost efficiency. This work bridges the gap between academic research and industrial feasibility by incorporating cost analysis, entropy generation, and thermal efficiency. To compare the velocity, temperature, and concentration profiles, we examine two ternary nanofluids, i.e., TiO₂+SiO₂+Al₂O₃/H₂O and TiO₂+SiO₂+Cu/H₂O, while considering the shape of the nanoparticles. The velocity slip and Soret/Dufour effects are taken into consideration. Furthermore, regression analysis for the Nusselt and Sherwood numbers of the model is carried out. The Runge-Kutta fourth-order method with the shooting technique is employed to obtain the numerical solution of the governing system of ordinary differential equations. The flow pattern attributes of the ternary nanofluids are meticulously examined and simulated with the fluctuation of flow-dominating parameters. Additionally, the influence of these parameters is demonstrated in the flow, temperature, and concentration fields. For variation in the Eckert and Dufour numbers, TiO₂+SiO₂+Al₂O₃/H₂O has a higher temperature than TiO₂+SiO₂+Cu/H₂O. The results obtained indicate that the ternary nanofluid TiO₂+SiO₂+Al₂O₃/H₂O has a higher heat transfer rate, lower entropy generation, greater mass transfer rate, and lower cost than the TiO₂+SiO₂+Cu/H₂O ternary nanofluid.
In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods consider convex relaxation to bypass the sparsity constraint, which may not lead to the best subset, and particularly tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimal conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint and the orthogonal constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in terms of accuracy for central subspace estimation and best subset selection and sustains high performance even with correlated predictors.
To deal with the huge computational cost of direct numerical simulation, the traditional response surface method (RSM), a classical regression algorithm, is used to approximate the functional relationship between the state variable and the basic variables in reliability design. The algorithm has successfully treated some problems involving implicit performance functions in reliability analysis. However, its theoretical basis of empirical risk minimization narrows its range of applications for...
When regression analysis is applied to model dam deformation, ill-conditioning of the coefficient matrix, mainly caused by multicollinearity of the variables, often prevents accurate modeling. Independent component regression (ICR) was proposed to model the dam deformation and identify the physical origins of the deformation. A simulation experiment shows that ICR can successfully resolve the ill-conditioning problem and produce a reliable deformation model. The method is then applied to model the deformation of the Wuqiangxi Dam in Hunan Province, China. The result shows that ICR can not only accurately model the deformation of the dam, but also help to identify the physical factors that affect the deformation through the extracted independent components.
In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments, improve therapeutic effect and safety, and is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables. Quantile regression is used to better characterize the global distribution of the response variable, and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers. The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating the practicality of the method proposed in this paper.
Branch size is a crucial characteristic, closely linked to both tree growth and wood quality. A review of existing branch size models reveals various approaches, but the ability to estimate branch diameter and length within the same whorl remains underexplored. In this study, a total of 77 trees were sampled from Northeast China to model the vertical distribution of branch diameter and length within each whorl along the crown. Several commonly used functions were taken as the alternative model forms, and the quantile regression method was employed and compared with the classical two-step modeling approach. The analysis incorporated stand, tree, and competition factors, with a particular focus on how these factors influence branches of varying sizes. The modified Weibull function was chosen as the optimal model, due to its excellent performance across all quantiles. Eight quantile regression curves (ranging from 0.20 to 0.85) were combined to predict branch diameter, while seven curves (ranging from 0.20 to 0.80) were used for branch length. The results showed that the quantile regression method outperformed the classical approach in model fitting and validation, likely due to its ability to estimate different rates of change across the entire branch size distribution. Larger branches in each whorl were more sensitive to changes in DBH, crown length (CL), crown ratio (CR) and dominant tree height (H_dom), while slenderness (HDR) more effectively influenced small and medium-sized branches. The effect of stand basal area (BAS) was relatively consistent across different branch sizes. The findings indicate that quantile regression is not only a more accurate method for predicting branch size but also a valuable tool for understanding how branch growth responds to stand and tree factors. The models developed in this study are ready to be integrated into tree growth and yield simulation systems, contributing to the assessment and promotion of wood quality.
To meet the need for real-time crack monitoring of infrastructural facilities, a CNN-regression model is proposed to directly estimate the crack properties from patches. RGB crack images and their corresponding masks obtained from a public dataset are cropped into patches of 256 square pixels that are classified with a pre-trained deep convolution neural network, the true positives are segmented, and crack properties are extracted using two different methods. The first method is primarily based on active contour models and level-set segmentation, and the second method consists of the domain adaptation of a mathematical morphology-based method known as FIL-FINDER. A statistical test has been performed to compare the stated methods, and a database has been prepared with the more suitable method. An advanced convolution neural network-based multi-output regression model has been proposed, which was trained with the prepared database and validated with the held-out dataset for the prediction of crack length, crack width, and width uncertainty directly from input image patches. The proposed model has been tested on crack patches collected from different locations. Huber loss has been used to ensure the robustness of the proposed model, which was selected from a set of 288 different variations of it. Additionally, an ablation study has been conducted on the top 3 models that demonstrated the influence of each network component on the prediction results. Finally, the best performing model among the top 3, HHc-X, has been proposed; it predicted crack properties that are in close agreement with the ground truths in the test data.
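The Huber loss mentioned in the abstract above is quadratic for small residuals and linear for large ones, which limits the influence of outlying targets. The sketch below shows the standard definition averaged over several regression outputs; the threshold `delta=1.0` and the example residuals are assumptions, since the abstract does not state the settings used.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss of one residual: quadratic within delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def mean_huber(y_true, y_pred, delta=1.0):
    """Average Huber loss over all outputs of a multi-output prediction."""
    losses = [huber_loss(t - p, delta) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions for (length, width, width uncertainty).
target = [120.0, 4.0, 0.5]
prediction = [117.0, 4.5, 0.5]

loss = mean_huber(target, prediction)
print(loss)
```

With a squared-error loss the first residual (3.0) would contribute 4.5 to the sum, whereas here it contributes 2.5, so a few badly labelled patches pull the fit less.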
This opinion article discusses the original research work of Yünkül et al. (the Authors) published in the Journal of Mountain Science 21(9): 3108–3122. Employing non-linear regression, fuzzy logic and artificial neural network modeling techniques, the Authors interrogated a large database assembled from the existing research literature to assess the performance of twelve equation rules in predicting the undrained shear strength (s_u) mobilized for remolded fine-grained soils at different values of liquidity index (I_L) and water content ratio. Based on their analyses, the Authors proposed a simple and reportedly reliable correlation (i.e., Eq. 9 in their paper) for predicting s_u over the I_L range of 0.15 to 3.00. This article describes various shortcomings in the Authors’ assembled database (including potentially anomalous data and covering an excessively wide I_L range in relation to routine geotechnical and transportation engineering applications) and their proposed s_u = f(I_L) correlation. Contrary to the Authors’ assertions, their proposed correlation is not reliable for fine-grained soils with consistencies in the general firm to stiff range (i.e., for 0.15 < I_L < 0.40), increasingly overestimating s_u for reducing I_L, and eventually predicting s_u → +∞ for I_L → 0.15⁺ (while producing mathematically undefined s_u for I_L < 0.15), thus rendering their correlation unconservative and potentially leading to unsafe geotechnical designs. Exponential or regular-power type s_u = f(I_L) models are more suitable when developing correlations that are applicable over the full plastic range (of 0 < I_L < 1), thereby providing reasonably conservative s_u predictions for use in the preliminary design for routine geotechnical engineering applications.
The packaging quality of coaxial laser diodes (CLDs) plays a pivotal role in determining their optical performance and long-term reliability. As the core packaging process, high-precision laser welding requires precise control of process parameters to suppress optical power loss. However, the complex nonlinear relationship between welding parameters and optical power loss renders traditional trial-and-error methods inefficient and imprecise. To address this challenge, a physics-informed (PI) and data-driven collaboration approach for welding parameter optimization is proposed. First, a thermal-fluid-solid coupling finite element method (FEM) was employed to quantify the sensitivity of welding parameters to physical characteristics, including residual stress. This analysis facilitated the identification of critical factors contributing to optical power loss. Subsequently, a Gaussian process regression (GPR) model incorporating finite element simulation prior knowledge was constructed based on the selected features. By introducing physics-informed kernel (PIK) functions, stress distribution patterns were embedded into the prediction model, achieving high-precision optical power loss prediction. Finally, a Bayesian optimization (BO) algorithm with an adaptive sampling strategy was implemented for efficient parameter space exploration. Experimental results demonstrate that the proposed method effectively establishes explicit physical correlations between welding parameters and optical power loss. The optimized welding parameters reduced optical power loss by 34.1%, providing theoretical guidance and technical support for reliable CLD packaging.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2020-NR049579).
Abstract: High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while simultaneously accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
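The abstract above relies on mutual information to score feature redundancy, feature-label relevance, and inter-label dependency. As a rough illustration of that building block only (not the authors' objective function or optimizer), the sketch below computes the empirical mutual information between two discrete variables. The function name `mutual_information` and the toy sequences are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (hypothetical): one binary feature and two candidate labels.
feature = [0, 0, 1, 1]
label_same = [0, 0, 1, 1]    # fully determined by the feature
label_indep = [0, 1, 0, 1]   # statistically independent of the feature

print(mutual_information(feature, label_same))
print(mutual_information(feature, label_indep))
```

A label that is fully determined by the feature yields the entropy of the feature (ln 2 here), while an independent label yields zero, which is why the quantity can serve as a relevance or redundancy score.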
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42271224, 41901193); Ministry of Education Humanities and Social Sciences Research Planning Fund Project of China (No. 24YJAZH190); Anhui Province Excellent Youth Research Project in Universities (No. 2022AH030019); Anhui Social Sciences Innovation Development Research Project (No. 2024CXQ503).
Abstract: The accessibility of urban public transit directly influences residents' quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a 'belt-like' distribution; 2) public transit accessibility showed a 'core-periphery' structure, with accessibility declining in a 'circumferential' pattern around the city center. Based on the 'housing price-accessibility' dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents' living standards.
Funding: Supported by High-level Professional Groups in Guangdong Province, No. GSPZYQ2020101; Guangdong Province Educational Research Planning Project, No. 2024GXJK742.
Abstract: BACKGROUND: Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China's Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among spouses of older mothers remains critically scarce, both in China and globally. AIM: To investigate PPD and its influencing factors in Chinese advanced maternal age families. METHODS: This cross-sectional study included 358 participants; it was conducted among fathers of pregnant women of advanced maternal age at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected via a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were adopted to analyze the latent PPD types and the factors that influenced PPD. RESULTS: The incidence of PPD was 16.48%, and three profiles were identified: low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P<0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P<0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with monophasic disorder (P<0.05). CONCLUSION: PPD incidence is high among Chinese fathers whose partners are of advanced maternal age, and the characteristics of depression are varied. Healthcare practitioners should prioritize individuals with low levels of social support.
Funding: Supported by the Social Science Planning Foundation of Liaoning Province (Grant No. L22ZD065); National Natural Science Foundation of China (Grant Nos. 12271231, 1247012719, 12001229).
Abstract: To better capture the characteristics of asymmetry and structural fluctuations observed in count time series, this study delves into the application of the quantile regression (QR) method for analyzing and forecasting nonlinear integer-valued time series exhibiting a piecewise phenomenon. Specifically, we focus on parameter estimation in the first-order Self-Exciting Threshold Integer-valued Autoregressive (SETINAR(2,1)) process with symmetric, asymmetric, and contaminated innovations. We establish the asymptotic properties of the estimator under certain regularity conditions. Monte Carlo simulations demonstrate the superior performance of the QR method compared to the conditional least squares (CLS) approach. Furthermore, we validate the robustness of the proposed method through empirical quantile regression estimation and forecasting for larceny incidents and CAD drug call counts in Pittsburgh, showcasing its effectiveness across diverse levels of data heterogeneity.
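Quantile regression, as used in the abstract above, replaces the squared error of least squares with the check (pinball) loss. The sketch below shows only that loss and why it resists contaminated observations; it is not the SETINAR(2,1) estimator itself. The function names and the toy sample are assumptions made for this example.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(sample, tau):
    """Return the sample value that minimises the total pinball loss."""
    return min(sample, key=lambda c: sum(pinball_loss(y - c, tau) for y in sample))


# Toy counts (hypothetical) with one contaminated observation.
counts = [1, 2, 3, 4, 100]

median_fit = best_constant_quantile(counts, 0.5)
mean_fit = sum(counts) / len(counts)
print(median_fit, mean_fit)
```

With tau = 0.5 the minimiser is the sample median, which stays at 3 despite the outlier, whereas the least-squares minimiser (the mean) is pulled to 22.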
Abstract: Triaxial tests, a staple in rock engineering, are labor-intensive, sample-demanding, and costly, making their optimization highly advantageous. These tests are essential for characterizing rock strength, and by adopting a failure criterion, they allow for the derivation of criterion parameters through regression, facilitating their integration into modeling programs. In this study, we introduce the application of an underutilized statistical technique, orthogonal regression, which is well suited for analyzing triaxial test data. Additionally, we present an innovation in this technique by minimizing the Euclidean distance while incorporating orthogonality between vectors as a constraint, for the case of orthogonal linear regression. We also consider the Modified Least Squares method. We exemplify this approach by developing the necessary equations to apply the Mohr-Coulomb, Murrell, Hoek-Brown, and Úcar criteria, and implement these equations in both spreadsheet calculations and R scripts. Finally, we demonstrate the technique's application using five datasets of varied lithologies from the specialized literature, showcasing its versatility and effectiveness.
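For the linear case mentioned in the abstract above, orthogonal regression minimises perpendicular rather than vertical distances to the fitted line. The sketch below uses the standard closed-form slope for a straight line with equal error variances on both axes. It is a generic illustration, not the constrained formulation or the R scripts described by the authors, and the stress values are hypothetical.

```python
import math


def orthogonal_line_fit(xs, ys):
    """Fit y = a + b*x by minimising the sum of squared perpendicular distances."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("zero covariance: the slope is undetermined or the line is axis-parallel")
    # Closed-form orthogonal (total least squares) slope.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical confining stress (x) and axial stress at failure (y), lying on y = 1 + 2x.
sigma3 = [0.0, 1.0, 2.0, 3.0]
sigma1 = [1.0, 3.0, 5.0, 7.0]
intercept, slope = orthogonal_line_fit(sigma3, sigma1)
print(intercept, slope)
```

When both stresses carry measurement error, this fit treats the two axes symmetrically, which ordinary least squares does not.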
Funding: Funded by the INTER program and co-funded by the Fonds National de la Recherche, Luxembourg (FNR) and the Fund for Scientific Research-FNRS, Belgium (F.R.S-FNRS), T.0233.20, 'Sustainable Residential Densification' project (SusDens, 2020–2024).
Abstract: The impact of different global and local variables on urban development processes requires systematic study to fully comprehend their underlying complexities. The interplay between such variables is crucial for modelling urban growth so that it closely reflects reality. Despite extensive research, ambiguity remains about how variations in these input variables influence urban densification. In this study, we conduct a global sensitivity analysis (SA) using a multinomial logistic regression (MNL) model to assess the model's explanatory and predictive power. We examine the influence of global variables, including spatial resolution, neighborhood size, and density classes, under different input combinations at a provincial scale to understand their impact on densification. Additionally, we perform a stepwise regression to identify the significant explanatory variables that are important for understanding densification in the Brussels Metropolitan Area (BMA). Our results indicate that finer spatial resolutions of 50 m and 100 m, smaller neighborhood sizes of 5×5 and 3×3, and specific density classes, namely 3 (non-built-up, low and high built-up) and 4 (non-built-up, low, medium and high built-up), optimally explain and predict urban densification. Consistent with this, the stepwise regression reveals that models with a coarser resolution of 300 m lack significant variables, reflecting lower explanatory power for densification. This approach aids in identifying optimal and significant global variables with higher explanatory power for understanding and predicting urban densification. Furthermore, these findings are reproducible in a global urban context, offering valuable insights for planners, modelers and geographers in managing future urban growth and minimizing modelling.
Funding: Supported by the Natural Science Foundation of Shanghai (23ZR1463600); Shanghai Pudong New Area Health Commission Research Project (PW2021A-69); Research Project of Clinical Research Center of Shanghai Health Medical University (22MC2022002).
Abstract: Gastric cancer is the third leading cause of cancer-related mortality and remains a major global health issue [1]. Annually, approximately 479,000 individuals in China are diagnosed with gastric cancer, accounting for almost 45% of all new cases worldwide [2].
Funding: Supported by the National Natural Science Foundation of China (No. 92370117); the CAS Project for Young Scientists in Basic Research (No. YSBR-090).
Abstract: In recent years, machine learning (ML) techniques have been shown to be effective in accelerating the development process of optoelectronic devices. However, as "black box" models, they have limited theoretical interpretability. In this work, we leverage the symbolic regression (SR) technique to discover the explicit symbolic relationship between the structure of the optoelectronic Fabry-Perot (FP) laser and its optical field distribution, which greatly improves model transparency compared to ML. We demonstrate that the expressions found through SR exhibit lower errors on the test set than ML models, which suggests that the expressions have better fitting and generalization capabilities.
Funding: Supported by the Youth Scientific Research Project of Fujian Provincial Center for Disease Control and Prevention (2022QN02); the Fujian Provincial Health Youth Scientific Research Project (2023QNA040).
Abstract: Background: The impact of COVID-19 on influenza activity is of interest for informing future flu prevention and control strategies. Our study aims to examine the effects of COVID-19 on influenza in Fujian Province, China, using a regression discontinuity (RD) design. Methods: We used the influenza-like illness (ILI) percentage as an indicator of influenza activity, with data from all sentinel hospitals between Week 4, 2020, and Week 51, 2023. The data were divided into two groups: the COVID-19 epidemic period and the post-epidemic period. Statistical analysis was performed with R software using robust RD design methods to account for potential confounders including seasonality, temperature, and influenza vaccination rates. Results: There was a discernible increase in the ILI percentage during the post-epidemic period. The robustness of the findings was confirmed with various RD design bandwidth selection methods and placebo tests, with the certwo bandwidth providing the largest estimated effect size: a 14.6-percentage-point increase in the ILI percentage (β=0.146; 95% CI: 0.096–0.196). Sensitivity analyses and adjustments for confounders consistently pointed to an increased ILI percentage during the post-epidemic period compared to the epidemic period. Conclusion: The 14.6-percentage-point increase in the ILI percentage in Fujian Province, China, after the end of the COVID-19 pandemic suggests that there may be a need to re-evaluate and possibly enhance public health measures to control influenza transmission. Further research is needed to fully understand the factors contributing to this rise and to assess the ongoing impacts of post-pandemic behavioral changes.
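The core idea of a sharp regression discontinuity estimate can be sketched as two local linear fits, one on each side of the cutoff, with the effect read off as the gap between their intercepts at the cutoff. The sketch below uses a uniform kernel and a fixed bandwidth; it does not reproduce the robust bias-corrected inference or data-driven bandwidth selection reported in the abstract. All function names and the data are assumptions, and the series is synthetic with a planted jump of 0.10.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_jump(running, outcome, cutoff, bandwidth):
    """Difference of the two local linear intercepts at the cutoff."""
    pairs = list(zip(running, outcome))
    # Centre the running variable so each intercept is the fitted value at the cutoff.
    left = [(r - cutoff, y) for r, y in pairs if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in pairs if cutoff <= r <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two observations on each side of the cutoff")
    left_intercept, _ = ols_line(*zip(*left))
    right_intercept, _ = ols_line(*zip(*right))
    return right_intercept - left_intercept


# Synthetic weekly series (not study data): linear trend plus a jump of 0.10 at week 0.
weeks = list(range(-10, 10))
ili_share = [0.05 + 0.001 * w + (0.10 if w >= 0 else 0.0) for w in weeks]

jump = sharp_rd_jump(weeks, ili_share, cutoff=0, bandwidth=10)
print(jump)
```

Changing `bandwidth` and re-running is the simplest analogue of the bandwidth sensitivity checks the abstract describes.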
Abstract: Understanding how dataset size affects regression models can help improve the accuracy of solar power forecasts and make the most of renewable energy systems. This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction, contributing to better forecasting methods. The study analyzes data from two solar panels, aSiMicro03036 and aSiTandem72-46, over 7, 14, 17, 21, 28, and 38 days, with each dataset comprising five independent parameters and one dependent parameter, split 80–20 for training and testing. Results indicate that Random Forest consistently outperforms other models, achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error (MAE) of 2.0544 on the aSiTandem72-46 panel with 21 days of data. For the aSiMicro03036 panel, the best MAE of 4.2978 was reached using the k-Nearest Neighbor (k-NN) algorithm, implemented as the instance-based learner IBk in Weka and trained on 17 days of data. Regression performance for most models (excluding IBk) stabilizes at 14 days or more. Compared to the 7-day dataset, increasing to 21 days reduced the MAE by around 20% and improved correlation coefficients by around 2.1%, highlighting the value of moderate dataset expansion. These findings suggest that datasets spanning 17 to 21 days, with 80% used for training, can significantly enhance the predictive accuracy of solar power generation models.
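The two evaluation metrics quoted in the abstract above, the Mean Absolute Error and the correlation coefficient, along with a relative MAE reduction, can be computed as in the sketch below. The numbers are hypothetical and chosen for easy checking; they are not the study's measurements. The function names are assumptions made for this example.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (0.2 means a 20% reduction)."""
    return (before - after) / before


# Hypothetical power readings and model outputs.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(observed, forecast)
r = pearson_r(observed, forecast)
print(mae, r, relative_reduction(2.5, mae))
```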
Funding: Supported by the National Natural Science Foundation of China (62375013).
Abstract: As the core component of inertial navigation systems, fiber optic gyroscopes (FOGs), with technical advantages such as low power consumption, long lifespan, fast startup, and flexible structural design, are widely used in aerospace, unmanned driving, and other fields. However, because optical devices are temperature sensitive, environmental temperature causes errors in FOGs, greatly limiting their output accuracy. This work investigates machine-learning-based temperature error compensation techniques for FOGs. Specifically, it focuses on compensating for the bias errors generated in the fiber ring by the Shupe effect. This work proposes a composite model based on k-means clustering, support vector regression, and particle swarm optimization algorithms. It also significantly reduces redundancy within the samples by adopting interval sequence sampling. Moreover, metrics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance are selected to evaluate the model's performance and compensation effectiveness. This work effectively enhances the consistency between data and models across different temperature ranges and temperature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared to existing methods that use a single machine learning model, the proposed method raises the improvement in bias stability of the compensated FOG from 57.11% to 71.98%, and enhances the suppression of the rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of the FOG after compensation, providing theoretical guidance and technical references for sensor error compensation work in other fields.
Funding: Supported by DST-FIST (Government of India) (Grant No. SR/FIST/MS-1/2017/13) and Seed Money Project (Grant No. DoRDC/733).
Abstract: This study numerically examines the heat and mass transfer characteristics of two ternary nanofluids via converging and diverging channels. Furthermore, the study aims to assess two ternary nanofluid combinations to determine which configuration can provide better heat and mass transfer and lower entropy production, while ensuring cost efficiency. This work bridges the gap between academic research and industrial feasibility by incorporating cost analysis, entropy generation, and thermal efficiency. To compare the velocity, temperature, and concentration profiles, we examine two ternary nanofluids, i.e., TiO2+SiO2+Al2O3/H2O and TiO2+SiO2+Cu/H2O, while considering the shape of the nanoparticles. The velocity slip and Soret/Dufour effects are taken into consideration. Furthermore, regression analysis for the Nusselt and Sherwood numbers of the model is carried out. The Runge-Kutta fourth-order method with the shooting technique is employed to obtain the numerical solution of the governing system of ordinary differential equations. The flow pattern attributes of the ternary nanofluids are examined and simulated under variation of the flow-dominating parameters. Additionally, the influence of these parameters is demonstrated in the flow, temperature, and concentration fields. For variation in the Eckert and Dufour numbers, TiO2+SiO2+Al2O3/H2O has a higher temperature than TiO2+SiO2+Cu/H2O. The results indicate that the ternary nanofluid TiO2+SiO2+Al2O3/H2O has a higher heat transfer rate, lower entropy generation, a greater mass transfer rate, and lower cost than the TiO2+SiO2+Cu/H2O ternary nanofluid.
Abstract: In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and in particular tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimality conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint and the orthogonal constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in terms of accuracy for central subspace estimation and best subset selection and sustains high performance even with correlated predictors.
Funding: National High-tech Research and Development Program (2006AA04Z405)
Abstract: To address the huge computational cost of direct numerical simulation, the traditional response surface method (RSM), a classical regression algorithm, is used to approximate the functional relationship between the state variable and the basic variables in reliability design. The algorithm has successfully handled some problems with implicit performance functions in reliability analysis. However, its theoretical basis of empirical risk minimization narrows its range of applications for...
Funding: Project (41074004) supported by the National Natural Science Foundation of China; Project (2013CB733303) supported by the National Basic Research Program of China.
Abstract: When regression analysis is applied to model dam deformation, ill-conditioning of the coefficient matrix, mainly due to multicollinearity among the variables, prevents accurate modeling. Independent component regression (ICR) was proposed to model the dam deformation and identify the physical origins of the deformation. A simulation experiment shows that ICR can successfully resolve the ill-conditioning problem and produce a reliable deformation model. The method is then applied to model the deformation of the Wuqiangxi Dam in Hunan Province, China. The result shows that ICR can not only accurately model the deformation of the dam, but also help to identify the physical factors that affect the deformation through the extracted independent components.
Funding: Supported by the Natural Science Foundation of Fujian Province (2022J011177, 2024J01903); the Key Project of Fujian Provincial Education Department (JZ230054).
Abstract: In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments and can improve therapeutic effect and safety, and it is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables. Quantile regression is used to better characterize the global distribution of the response variable, and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers. The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating the practicality of the method proposed in this paper.
Funding: Supported by the Young Scientists Fund of the National Key R&D Program of China (No. 2022YFD2201800); the Youth Science Fund Program of National Natural Science Foundation of China (No. 32301581); the Joint Funds for Regional Innovation and Development of the National Natural Science Foundation of China (No. U21A20244); the China Postdoctoral Science Foundation (No. 2024M750383); the Heilongjiang Touyan Innovation Team Program (Technology Development Team for High-Efficiency Silviculture of Forest Resources).
Abstract: Branch size is a crucial characteristic, closely linked to both tree growth and wood quality. A review of existing branch size models reveals various approaches, but the ability to estimate branch diameter and length within the same whorl remains underexplored. In this study, a total of 77 trees were sampled from Northeast China to model the vertical distribution of branch diameter and length within each whorl along the crown. Several commonly used functions were taken as alternative model forms, and the quantile regression method was employed and compared with the classical two-step modeling approach. The analysis incorporated stand, tree, and competition factors, with a particular focus on how these factors influence branches of varying sizes. The modified Weibull function was chosen as the optimal model due to its excellent performance across all quantiles. Eight quantile regression curves (ranging from 0.20 to 0.85) were combined to predict branch diameter, while seven curves (ranging from 0.20 to 0.80) were used for branch length. The results showed that the quantile regression method outperformed the classical approach in model fitting and validation, likely due to its ability to estimate different rates of change across the entire branch size distribution. Larger branches in each whorl were more sensitive to changes in DBH, crown length (CL), crown ratio (CR) and dominant tree height (H_dom), while slenderness (HDR) more strongly influenced small and medium-sized branches. The effect of stand basal area (BAS) was relatively consistent across different branch sizes. The findings indicate that quantile regression is not only a more accurate method for predicting branch size but also a valuable tool for understanding how branch growth responds to stand and tree factors. The models developed in this study are ready to be integrated into tree growth and yield simulation systems, contributing to the assessment and promotion of wood quality.
Abstract: To meet the need for real-time crack monitoring of infrastructural facilities, a CNN-regression model is proposed to directly estimate crack properties from patches. RGB crack images and their corresponding masks obtained from a public dataset are cropped into patches of 256 square pixels that are classified with a pre-trained deep convolutional neural network, the true positives are segmented, and crack properties are extracted using two different methods. The first method is primarily based on active contour models and level-set segmentation, and the second consists of the domain adaptation of a mathematical morphology-based method known as FIL-FINDER. A statistical test was performed to compare the two methods, and a database was prepared with the more suitable method. An advanced convolutional neural network-based multi-output regression model is proposed, which was trained with the prepared database and validated with the held-out dataset for the prediction of crack length, crack width, and width uncertainty directly from input image patches. The proposed model has been tested on crack patches collected from different locations. Huber loss has been used to ensure the robustness of the proposed model, which was selected from a set of 288 different variations of it. Additionally, an ablation study conducted on the top 3 models demonstrated the influence of each network component on the prediction results. Finally, the best performing model among the top 3, HHc-X, is proposed; it predicted crack properties in close agreement with the ground truths in the test data.
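The Huber loss named in the abstract above is quadratic for small residuals and linear for large ones, which limits the influence of outlying targets during training. A minimal sketch of the loss follows; the threshold `delta=1.0` and the toy residuals are assumptions, since the abstract does not state the settings used.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic within +/- delta, linear beyond it."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual * residual
    return delta * (abs_r - 0.5 * delta)


def mean_huber(targets, predictions, delta=1.0):
    """Average Huber loss over paired targets and predictions."""
    losses = [huber_loss(t - p, delta) for t, p in zip(targets, predictions)]
    return sum(losses) / len(losses)


# Toy residuals (hypothetical): a small error and a large, outlier-like error.
small = huber_loss(0.5)
large = huber_loss(3.0)
squared_large = 0.5 * 3.0 ** 2
print(small, large, squared_large)
```

For the large residual the Huber value (2.5) is well below the corresponding squared-error value (4.5), so a single badly labelled patch pulls the fit less.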
Abstract: This opinion article discusses the original research work of Yünkül et al. (the Authors) published in the Journal of Mountain Science 21(9):3108–3122. Employing non-linear regression, fuzzy logic and artificial neural network modeling techniques, the Authors interrogated a large database assembled from the existing research literature to assess the performance of twelve equation rules in predicting the undrained shear strength (s_u) mobilized for remolded fine-grained soils at different values of liquidity index (I_L) and water content ratio. Based on their analyses, the Authors proposed a simple and reportedly reliable correlation (i.e., Eq. 9 in their paper) for predicting s_u over the I_L range of 0.15 to 3.00. This article describes various shortcomings in the Authors' assembled database (including potentially anomalous data and an excessively wide I_L range in relation to routine geotechnical and transportation engineering applications) and in their proposed s_u = f(I_L) correlation. Contrary to the Authors' assertions, their proposed correlation is not reliable for fine-grained soils with consistencies in the general firm to stiff range (i.e., for 0.15 < I_L < 0.40), increasingly overestimating s_u for reducing I_L, and eventually predicting s_u → +∞ for I_L → 0.15+ (while producing mathematically undefined s_u for I_L < 0.15), thus rendering their correlation unconservative and potentially leading to unsafe geotechnical designs. Exponential or regular-power type s_u = f(I_L) models are more suitable when developing correlations that are applicable over the full plastic range (0 < I_L < 1), thereby providing reasonably conservative s_u predictions for use in the preliminary design for routine geotechnical engineering applications.
Funding: Funded by the National Key R&D Program of China, Grant No. 2024YFF0504904.
Abstract: The packaging quality of coaxial laser diodes (CLDs) plays a pivotal role in determining their optical performance and long-term reliability. As the core packaging process, high-precision laser welding requires precise control of process parameters to suppress optical power loss. However, the complex nonlinear relationship between welding parameters and optical power loss renders traditional trial-and-error methods inefficient and imprecise. To address this challenge, a physics-informed (PI) and data-driven collaboration approach for welding parameter optimization is proposed. First, a thermal-fluid-solid coupling finite element method (FEM) was employed to quantify the sensitivity of physical characteristics, including residual stress, to the welding parameters. This analysis facilitated the identification of critical factors contributing to optical power loss. Subsequently, a Gaussian process regression (GPR) model incorporating prior knowledge from finite element simulation was constructed based on the selected features. By introducing physics-informed kernel (PIK) functions, stress distribution patterns were embedded into the prediction model, achieving high-precision optical power loss prediction. Finally, a Bayesian optimization (BO) algorithm with an adaptive sampling strategy was implemented for efficient parameter space exploration. Experimental results demonstrate that the proposed method effectively establishes explicit physical correlations between welding parameters and optical power loss. The optimized welding parameters reduced optical power loss by 34.1%, providing theoretical guidance and technical support for reliable CLD packaging.