High-dimensional data causes difficulties in machine learning because of high time consumption and large memory requirements. In particular, in a multi-label environment, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex nonlinear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. All three quantities are computed using mutual information, which allows the formulation to capture nonlinear dependencies among variables. These relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them simultaneously. To solve the resulting optimization problem efficiently under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. Experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
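The abstract does not say how the mutual information terms are estimated. The following is a minimal sketch that assumes discrete (or pre-discretized) feature and label columns and uses a simple plug-in estimator. It computes the three quantity types the objective uses: feature-feature redundancy, feature-label relevance, and label-label dependency. The function names are hypothetical, and how the matrices enter the regression objective is not shown.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def mi_matrices(features, labels):
    """Hypothetical helper: features and labels are lists of columns (lists of values).

    Returns (redundancy, relevance, label_dependency): feature-feature,
    feature-label, and label-label mutual information as nested lists.
    """
    redundancy = [[mutual_information(f, g) for g in features] for f in features]
    relevance = [[mutual_information(f, lab) for lab in labels] for f in features]
    label_dep = [[mutual_information(l1, l2) for l2 in labels] for l1 in labels]
    return redundancy, relevance, label_dep
```

Plug-in estimates are biased upward for small samples with many categories, so the discretization choice matters in practice.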
This paper examines whether a parametric regression model is correctly specified for both source and target data, and whether the regression pattern in the source domain aligns with that in the target domain. This evaluation is a critical prerequisite for applying model-based transfer learning methods under covariate shift assumptions. Traditional regression model checks and two-sample regression tests are insufficient to address this issue. To overcome these limitations, the authors propose a novel adaptive-to-regression test statistic that is asymptotically distribution-free. Under the null hypothesis, the test statistic has a chi-square weak limit, which preserves the significance level and allows critical values to be determined without resampling techniques. Additionally, the authors systematically analyze the power of the test, highlighting its sensitivity to different sub-local alternatives that deviate from the null hypothesis. Numerical studies, including simulations, assess finite-sample performance, and a real-world data example is provided for illustration.
Traditional oilfields face increasing extraction challenges, primarily due to reservoir quality degradation and production decline. Volatile international crude oil prices exacerbate these challenges, as illustrated by the swing from pandemic-induced negative WTI futures prices to geopolitically driven surges exceeding USD 100 per barrel. This study addresses these complexities with an integrated methodological framework that combines advanced numerical simulation with multivariate regression analysis, applied to medium-permeability sandstone reservoirs in the Xinjiang oilfield. The methodology employs Latin Hypercube Sampling (LHS) to stratify geological parameter distributions and constructs heterogeneous reservoir models in Petrel, which are rigorously validated by history matching against production data. Production forecasting integrates numerical simulation and Decline Curve Analysis (DCA), while investment estimation uses Ordinary Least Squares (OLS) regression to relate engineering parameters to drilling and completion costs. Economic evaluation incorporates Discounted Cash Flow (DCF) modeling and breakeven analysis, establishing techno-economic boundaries through an oil price sensitivity analysis ranging from USD 40 to 90 per barrel. Visualization tools, including 3D heatmaps, delineate nonlinear interactions among engineering, geological, and investment datasets under economic constraints. Key findings show that, for the target reservoirs, as the oil price rises from USD 40 to USD 90 per barrel, the minimum economic thickness threshold decreases from approximately 5.7 m to about 2.5 m, with model prediction errors consistently below 25% across validation datasets.
This framework provides scientifically grounded decision support for optimizing capital allocation and offers actionable insights for planning the development of undeveloped hydrocarbon resources amid market uncertainty. Ultimately, it supports national energy security through technically robust and economically viable resource exploitation strategies.
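The DCF and breakeven step can be sketched as follows. The sketch assumes an exponential (Arps b = 0) decline profile, a single up-front capital cost, a constant per-barrel operating cost, and end-of-year discounting. All names and numbers are hypothetical and not taken from the study. The breakeven price is the price at which NPV is zero; it is found here by bisection because NPV increases monotonically with price.

```python
import math


def annual_production(q0, decline, years):
    """Hypothetical exponential (Arps b = 0) decline: barrels produced in each year."""
    return [q0 * math.exp(-decline * t) for t in range(years)]


def npv(price, volumes, capex, opex_per_bbl, rate):
    """Net present value in USD: capex spent at t = 0, net revenue discounted at each year end."""
    value = -capex
    for t, q in enumerate(volumes, start=1):
        value += q * (price - opex_per_bbl) / (1.0 + rate) ** t
    return value


def breakeven_price(volumes, capex, opex_per_bbl, rate, lo=0.0, hi=500.0, tol=1e-6):
    """Oil price (USD/bbl) at which NPV = 0, found by bisection; NPV rises with price."""
    if npv(hi, volumes, capex, opex_per_bbl, rate) < 0:
        raise ValueError("no breakeven price below the upper bound")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if npv(mid, volumes, capex, opex_per_bbl, rate) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `breakeven_price(annual_production(20000, 0.3, 10), 3e6, 15.0, 0.08)` gives the price at which this hypothetical well just recovers its capital at an 8% discount rate. Under these assumptions the result also equals `opex + capex / sum(q_t / (1 + r)**t)`, which is a convenient cross-check.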
Quantile regression (QR) has become an important tool for measuring how the quantiles of a response variable depend on a number of predictors in heterogeneous data, especially data with heavy tails and outliers. However, statistical inference for distributed high-dimensional QR with missing data is challenging because of the distributed nature, sparsity, and missingness of the data and the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method for variable selection and parameter estimation. The method approximates the non-differentiable quantile loss with a smooth function and incorporates inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is both computationally and communication-efficient, because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations, with no constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability (propensity score) function of the missing-data mechanism. Simulation studies and a real example illustrate the effectiveness of the proposed methodology.
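The smoothing idea can be sketched for one specific choice that the abstract does not confirm. Convolving the check loss ρ_τ(u) = u(τ − 1{u < 0}) with a Gaussian kernel of bandwidth h gives the closed form ℓ_h(u) = u(τ − Φ(−u/h)) + hφ(u/h), whose derivative τ − 1 + Φ(u/h) is continuous. The inverse-probability-weighted objective below assumes complete cases are flagged with `observed = 1` and weighted by 1/π_i for a given propensity π_i. All function names are hypothetical, and the distributed communication and penalty steps are omitted.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def check_loss(u, tau):
    """Standard quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a N(0, h^2) kernel: E[rho_tau(u + h Z)], Z ~ N(0, 1)."""
    return u * (tau - _norm_cdf(-u / h)) + h * _norm_pdf(u / h)


def smoothed_check_grad(u, tau, h):
    """Derivative of smoothed_check_loss in u; continuous, unlike the raw check loss."""
    return tau - 1.0 + _norm_cdf(u / h)


def ipw_objective(residuals, observed, propensity, tau, h):
    """Inverse-probability-weighted smoothed loss: only complete cases (observed == 1)
    contribute, each weighted by 1 / propensity; averaged over all n records."""
    total = 0.0
    for r, d, p in zip(residuals, observed, propensity):
        if d:
            total += smoothed_check_loss(r, tau, h) / p
    return total / len(residuals)
```

As h shrinks, the smoothed loss approaches the check loss, so h trades approximation bias against the curvature that gradient and Newton-type updates need.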
Urban Heat Island (UHI) effects are exacerbated by the expansion of impervious surfaces and the loss of vegetation in urban centers, leading to elevated air and surface temperatures and reduced thermal comfort. Through shading and evapotranspiration, urban trees are among the most effective Nature-based Solutions (NbS) for passive cooling. This study assesses the cooling potential of selected tree species by analyzing their morphological and physiological traits using a combination of ENVI-met microclimate simulations and multiple regression modeling. A total of 15 urban tree species were selected from the literature and analyzed for the traits on which their cooling efficacy depends; the results were then validated in an urban setting with ENVI-met simulations. Key traits, such as Leaf Area Index (LAI), canopy density, transpiration rate, tree height, rooting depth, and water availability, were analyzed. Multiple linear regression analysis was conducted to quantify the contribution of each trait to ambient temperature reduction. Results revealed that LAI (R² = 0.76, p < 0.001) and transpiration rate (R² = 0.71, p < 0.001) were the most significant predictors of daytime cooling, while canopy openness and tree height were more strongly correlated with nighttime heat dissipation. High-performing species, such as Ficus benghalensis, Azadirachta indica, and Samanea saman, achieved maximum temperature reductions of 2.5–4.2 °C, especially in compact, low-rise, and mid-rise zones. The study provides a quantitative, trait-based framework for tree selection in urban greening initiatives and offers evidence to guide landscape planning and UHI mitigation strategies through scientifically informed planting design.
In response to the challenges of inadequate predictive accuracy and limited generalization capability in data-driven modeling of the mechanical properties of cold-rolled strip steel, a predictive modeling method named RFR-WOA is developed. It is based on random forest regression (RFR) and the whale optimization algorithm (WOA). First, Pearson and Spearman correlation analysis and Gini importance ranking are applied to an actual production dataset of 37,878 samples, and 22 key variables are selected as model inputs from the 112 variables that affect mechanical properties. Next, an RFR-based predictive model for the mechanical properties of cold-rolled strip steel is constructed. Then, the RFR hyperparameters are iteratively optimized using WOA, with an objective that combines the coefficient of determination (R²) and the root mean square error, which yields better predictive performance. Finally, the RFR-WOA mechanical property prediction model is compared with models built using deep neural networks, convolutional neural networks, and other methods. Test results on 9,469 samples of actual production data show that the proposed model has better predictive accuracy and generalization capability.
The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has attracted significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses of how overall urban transit accessibility influences housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized-parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed the following. 1) Housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution. 2) Public transit accessibility showed a ‘core-periphery’ structure, declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%). 3) The impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents’ living standards.
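The explanatory-power values reported here are presumably the geographical detector's q-statistic; the abstract does not define them, so this is an assumption. The factor detector's q is q = 1 − Σ_h N_h σ_h² / (N σ²), the share of the variance in housing price explained by stratifying on one factor. A minimal sketch follows. The "optimized parameter" step, which searches over discretization methods and class counts to maximize q, is not shown, and the function name is hypothetical.

```python
from collections import defaultdict


def factor_detector_q(values, strata):
    """Geographical-detector q = 1 - sum_h N_h var_h / (N var), population variances.

    values: outcome per spatial unit (e.g. housing price); strata: class label per unit.
    """
    if not values or len(values) != len(strata):
        raise ValueError("values and strata must be non-empty and of equal length")
    mean = sum(values) / len(values)
    sst = sum((v - mean) ** 2 for v in values)  # equals N * var
    if sst == 0:
        raise ValueError("values have zero variance")
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    ssw = 0.0  # sum over strata of N_h * var_h
    for members in groups.values():
        m = sum(members) / len(members)
        ssw += sum((v - m) ** 2 for v in members)
    return 1.0 - ssw / sst
```

Passing strata built as tuples of two factors' classes, e.g. `list(zip(bus_class, subway_class))` with hypothetical class lists, gives the interaction detector's q for that pair. Comparing it with the two single-factor values is how synergistic interactions are identified.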
BACKGROUND Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China’s Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among the spouses of older mothers remains critically scarce, both in China and globally. AIM To investigate PPD and its influencing factors in Chinese families with advanced maternal age. METHODS This cross-sectional study included 358 fathers of pregnant women of advanced maternal age at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected with a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were used to identify latent PPD types and the factors influencing PPD. RESULTS The incidence of PPD was 16.48%, and three profiles were identified: low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P < 0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P < 0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with the monophasic type (P < 0.05). CONCLUSION PPD incidence is high among Chinese fathers whose partners are of advanced maternal age, and the characteristics of their depression vary. Healthcare practitioners should prioritize fathers with low levels of social support.
Background: The impact of COVID-19 on influenza activity is of interest for informing future influenza prevention and control strategies. Our study aims to examine the effects of COVID-19 on influenza in Fujian Province, China, using a regression discontinuity (RD) design. Methods: We used the influenza-like illness (ILI) percentage as an indicator of influenza activity, with data from all sentinel hospitals between Week 4, 2020, and Week 51, 2023. The data were divided into two groups: the COVID-19 epidemic period and the post-epidemic period. Statistical analysis was performed in R using robust RD design methods, accounting for potential confounders including seasonality, temperature, and influenza vaccination rates. Results: There was a discernible increase in the ILI percentage during the post-epidemic period. The robustness of the findings was confirmed with various RD bandwidth selection methods and placebo tests. The certwo bandwidth selector gave the largest estimated effect size: a 14.6-percentage-point increase in the ILI percentage (β = 0.146; 95% CI: 0.096–0.196). Sensitivity analyses and adjustments for confounders consistently pointed to an increased ILI percentage during the post-epidemic period compared with the epidemic period. Conclusion: The 14.6-percentage-point increase in the ILI percentage in Fujian Province, China, after the end of the COVID-19 pandemic suggests that public health measures to control influenza transmission may need to be re-evaluated and possibly enhanced. Further research is needed to fully understand the factors contributing to this rise and to assess the ongoing impacts of post-pandemic behavioral changes.
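The core of a sharp regression discontinuity estimate can be sketched as two local linear fits, one on each side of the cutoff, whose intercepts at the cutoff are differenced. This minimal version assumes a triangular kernel, a hand-chosen bandwidth, and a numeric running variable such as a week index. It omits the bias correction, robust confidence intervals, and data-driven bandwidth selectors (such as certwo) that robust RD software provides, as well as the covariate adjustments described above. The function names are hypothetical.

```python
def _wls_intercept(xs, ys, ws):
    """Intercept a of the weighted least-squares line y = a + b x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    if sxx == 0:
        raise ValueError("need at least two distinct x values on each side")
    return my - (sxy / sxx) * mx


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD jump at the cutoff: triangular-kernel local linear fit on each side,
    with the running variable centred at the cutoff so the intercepts are the limits."""
    left, right = ([], [], []), ([], [], [])
    for x, y in zip(running, outcome):
        d = x - cutoff
        w = 1.0 - abs(d) / bandwidth
        if w <= 0:
            continue  # outside the bandwidth
        side = right if d >= 0 else left
        side[0].append(d)
        side[1].append(y)
        side[2].append(w)
    if not left[0] or not right[0]:
        raise ValueError("no observations on one side of the cutoff")
    return _wls_intercept(*right) - _wls_intercept(*left)
```

With an ILI share expressed as a proportion, an estimate of 0.146 corresponds to the 14.6-percentage-point jump quoted above.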
This study numerically examines the heat and mass transfer characteristics of two ternary nanofluids in converging and diverging channels. It further assesses the two ternary nanofluid combinations to determine which configuration provides better heat and mass transfer and lower entropy production while remaining cost-efficient. By incorporating cost analysis, entropy generation, and thermal efficiency, this work bridges the gap between academic research and industrial feasibility. To compare velocity, temperature, and concentration profiles, we examine two ternary nanofluids, TiO₂+SiO₂+Al₂O₃/H₂O and TiO₂+SiO₂+Cu/H₂O, while accounting for nanoparticle shape. Velocity slip and Soret/Dufour effects are taken into consideration. Furthermore, a regression analysis of the model's Nusselt and Sherwood numbers is carried out. The fourth-order Runge-Kutta method with a shooting technique is employed to obtain the numerical solution of the governing system of ordinary differential equations. The flow characteristics of the ternary nanofluids are examined in detail and simulated under variations of the flow-governing parameters. The influence of these parameters on the velocity, temperature, and concentration fields is also demonstrated. For variations in the Eckert and Dufour numbers, TiO₂+SiO₂+Al₂O₃/H₂O attains a higher temperature than TiO₂+SiO₂+Cu/H₂O. The results indicate that the TiO₂+SiO₂+Al₂O₃/H₂O ternary nanofluid has a higher heat transfer rate, lower entropy generation, a greater mass transfer rate, and lower cost than the TiO₂+SiO₂+Cu/H₂O ternary nanofluid.
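The study's governing equations are not reproduced in the abstract, so the RK4-with-shooting approach is sketched here on a textbook boundary value problem instead: y'' = y with y(0) = 0 and y(1) = 1, whose exact initial slope is 1/sinh(1). The unknown initial slope is adjusted by the secant method until the integrated solution meets the far boundary condition. The function names and the test problem are illustrative assumptions.

```python
import math


def rk4_system(f, y0, t0, t1, steps):
    """Integrate the system y' = f(t, y) from t0 to t1 with classical fourth-order Runge-Kutta."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def shoot(f, ya, yb, a, b, s0, s1, steps=200, tol=1e-10, max_iter=50):
    """Shooting for a second-order BVP written as the state [y, y']: find the slope s
    such that the solution with y(a) = ya, y'(a) = s satisfies y(b) = yb (secant on s)."""
    def miss(s):
        return rk4_system(f, [ya, s], a, b, steps)[0] - yb

    m0, m1 = miss(s0), miss(s1)
    for _ in range(max_iter):
        if abs(m1) < tol or m1 == m0:
            break
        s0, s1 = s1, s1 - m1 * (s1 - s0) / (m1 - m0)
        m0, m1 = m1, miss(s1)
    return s1


if __name__ == "__main__":
    # Test problem: y'' = y, y(0) = 0, y(1) = 1; the exact y'(0) is 1 / sinh(1)
    slope = shoot(lambda t, y: [y[1], y[0]], 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
    print(slope, 1.0 / math.sinh(1.0))
```

Similarity-transformed channel-flow problems typically have several unknown initial values, so the scalar secant step is replaced by Newton's method on a vector of misses, but the structure is the same.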
Understanding how dataset size influences regression models can help improve the accuracy of solar power forecasts and make the most of renewable energy systems. This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction, contributing to better forecasting methods. The study analyzes data from two solar panels, aSiMicro03036 and aSiTandem72-46, over 7, 14, 17, 21, 28, and 38 days. Each dataset comprises five independent parameters and one dependent parameter and is split 80–20 for training and testing. Results indicate that Random Forest consistently outperforms other models, achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error (MAE) of 2.0544 on the aSiTandem72-46 panel with 21 days of data. For the aSiMicro03036 panel, the best MAE of 4.2978 was achieved by the k-Nearest Neighbor (k-NN) algorithm, configured as the instance-based k-nearest neighbors learner (IBk) in Weka and trained on 17 days of data. Regression performance for most models (excluding IBk) stabilizes at 14 days or more. Compared with the 7-day dataset, extending to 21 days reduced the MAE by around 20% and improved correlation coefficients by around 2.1%, highlighting the value of moderate dataset expansion. These findings suggest that datasets spanning 17 to 21 days, with 80% used for training, can significantly enhance the predictive accuracy of solar power generation models.
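The evaluation protocol can be sketched as follows. It assumes an ordered (chronological) 80/20 split, which the abstract does not specify; a shuffled split is the other common choice. MAE and the Pearson correlation coefficient between predictions and observations are the two metrics quoted. The function names are hypothetical, and the models themselves (Random Forest, IBk) are not reimplemented.

```python
import math


def train_test_split_ordered(rows, train_frac=0.8):
    """Keep the first train_frac of the records for training and the rest for testing."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For time-ordered sensor data, an ordered split avoids training on measurements that come after the test period, which a random split would allow.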
As the core component of inertial navigation systems, the fiber optic gyroscope (FOG) offers technical advantages such as low power consumption, long lifespan, fast startup, and flexible structural design, and it is widely used in aerospace, autonomous driving, and other fields. However, because optical devices are temperature-sensitive, ambient temperature induces errors in the FOG that greatly limit its output accuracy. This work studies machine-learning-based temperature error compensation techniques for the FOG, focusing specifically on the bias errors that the Shupe effect generates in the fiber coil. It proposes a composite model based on k-means clustering, support vector regression, and the particle swarm optimization algorithm, and it significantly reduces redundancy within the samples by adopting interval-sequence sampling. Metrics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance are used to evaluate the model's performance and compensation effectiveness. The method improves the consistency between data and models across different temperature ranges and temperature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared with existing methods that use a single machine learning model, the proposed method raises the bias stability improvement of the compensated FOG from 57.11% to 71.98%. It also raises the suppression of the rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of the compensated FOG and provides theoretical guidance and technical references for sensor error compensation in other fields.
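Allan variance, one of the evaluation metrics listed, can be computed from a stationary gyro record as sketched below. The sketch uses non-overlapping clusters (overlapping estimators are also common) and assumes a uniformly sampled rate series. On a log-log plot of Allan deviation against averaging time, bias instability is read near the curve's minimum, and rate ramp appears as a +1 slope at long averaging times. The function name is hypothetical.

```python
import math


def allan_deviation(rate, m):
    """Non-overlapping Allan deviation for clusters of m samples.

    rate: gyro output samples (e.g. deg/h) at a fixed sampling interval tau0;
    the corresponding averaging time is m * tau0.
    """
    k = len(rate) // m
    if k < 2:
        raise ValueError("need at least two clusters")
    means = [sum(rate[i * m:(i + 1) * m]) / m for i in range(k)]
    avar = sum((means[i + 1] - means[i]) ** 2 for i in range(k - 1)) / (2 * (k - 1))
    return math.sqrt(avar)
```

Evaluating this over a geometric range of m values before and after compensation gives the curves from which the bias stability and rate ramp coefficients are compared.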
In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). SIR was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed for SIR to incorporate a sparse structure in the predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint. This may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and tackle the sparsity constraint directly by establishing the optimality conditions and solving them iteratively by means of the splicing technique. Because it applies convex relaxation to neither the sparsity constraint nor the orthogonality constraint, our algorithm exhibits superior empirical performance, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in accuracy for central subspace estimation and best subset selection, and it sustains high performance even with correlated predictors.
The impact of different global and local variables on urban development processes requires systematic study to fully comprehend the complexities underlying them. The interplay between such variables is crucial for modelling urban growth in a way that closely reflects reality. Despite extensive research, it remains unclear how variations in these input variables influence urban densification. In this study, we conduct a global sensitivity analysis (SA) using a multinomial logistic regression (MNL) model to assess the model's explanatory and predictive power. We examine the influence of global variables, including spatial resolution, neighborhood size, and density classes, under different input combinations at a provincial scale to understand their impact on densification. Additionally, we perform a stepwise regression to identify the explanatory variables that are significant for understanding densification in the Brussels Metropolitan Area (BMA). Our results indicate that finer spatial resolutions of 50 m and 100 m, smaller neighborhood sizes of 5×5 and 3×3, and specific density classes optimally explain and predict urban densification. These classes are 3 classes (non-built-up, low and high built-up) and 4 classes (non-built-up, low, medium and high built-up). Consistent with this, the stepwise regression reveals that models with a coarser resolution of 300 m lack significant variables, reflecting lower explanatory power for densification. This approach aids in identifying optimal and significant global variables with higher explanatory power for understanding and predicting urban densification. Furthermore, these findings are reproducible in a global urban context, offering valuable insights for planners, modelers and geographers in managing future urban growth and minimizing modelling.
Triaxial tests, a staple in rock engineering, are labor-intensive, sample-demanding, and costly, so optimizing their use is highly advantageous. These tests are essential for characterizing rock strength. Once a failure criterion is adopted, they allow the criterion parameters to be derived through regression, which facilitates integrating those parameters into modeling programs. In this study, we introduce the application of an underutilized statistical technique, orthogonal regression, which is well suited to analyzing triaxial test data. Additionally, for the case of orthogonal linear regression, we present an innovation in this technique: minimizing the Euclidean distance while incorporating orthogonality between vectors as a constraint. We also consider the Modified Least Squares method. We illustrate this approach by developing the equations needed to apply the Mohr-Coulomb, Murrell, Hoek-Brown, and Úcar criteria and implementing them in both spreadsheet calculations and R scripts. Finally, we demonstrate the technique's application on five datasets of varied lithologies from the specialized literature, showcasing its versatility and effectiveness.
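For the Mohr-Coulomb case, the standard orthogonal (total least squares) fit of σ1 against σ3 can be sketched as follows. This is the textbook closed-form orthogonal regression, not the constrained variant the authors introduce. It assumes both stresses are in the same unit, because orthogonal regression is not scale-invariant. It converts the fitted line σ1 = σc + kσ3 to friction angle and cohesion using k = (1 + sin φ)/(1 − sin φ) and σc = 2c√k. The function names are hypothetical.

```python
import math


def orthogonal_fit(x, y):
    """Total-least-squares line y = a + b x that minimizes perpendicular distances."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    if sxy == 0:
        raise ValueError("slope undefined: x and y are uncorrelated")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


def mohr_coulomb_from_fit(intercept, slope):
    """Convert sigma1 = sigma_c + k * sigma3 into (friction angle in degrees, cohesion)."""
    sin_phi = (slope - 1) / (slope + 1)
    phi_deg = math.degrees(math.asin(sin_phi))
    cohesion = intercept / (2 * math.sqrt(slope))
    return phi_deg, cohesion
```

For example, points lying exactly on σ1 = 10 + 3σ3 (MPa) give k = 3, hence φ = 30° and c = 10/(2√3) ≈ 2.89 MPa.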
To better capture the asymmetry and structural fluctuations observed in count time series, this study investigates the quantile regression (QR) method for analyzing and forecasting nonlinear integer-valued time series that exhibit piecewise behavior. Specifically, we focus on parameter estimation in the first-order Self-Exciting Threshold Integer-valued Autoregressive (SETINAR(2,1)) process with symmetric, asymmetric, and contaminated innovations. We establish the asymptotic properties of the estimator under certain regularity conditions. Monte Carlo simulations demonstrate the superior performance of the QR method compared with the conditional least squares (CLS) approach. Furthermore, we validate the robustness of the proposed method through empirical quantile regression estimation and forecasting for larceny incidents and CAD drug call counts in Pittsburgh, showcasing its effectiveness across diverse levels of data heterogeneity.
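The data-generating process can be sketched as a simulation under several assumptions. The innovations are Poisson, which is only one of the innovation settings mentioned. Binomial thinning α∘X is implemented as the number of successes in X Bernoulli(α) trials. The regime is chosen by comparing X_{t−1} with a threshold r. The parameter and function names are hypothetical, and the QR estimator itself is not shown.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """alpha ∘ x: number of successes among x independent Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_setinar(n, alpha1, alpha2, threshold, lam, x0=0, seed=0):
    """SETINAR(2,1): X_t = alpha1 ∘ X_{t-1} + e_t if X_{t-1} <= threshold,
    otherwise alpha2 ∘ X_{t-1} + e_t, with Poisson(lam) innovations e_t."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        alpha = alpha1 if x <= threshold else alpha2
        x = binomial_thinning(alpha, x, rng) + poisson(lam, rng)
        xs.append(x)
    return xs
```

Replacing `poisson` with a skewed or contaminated count distribution reproduces the other innovation settings against which QR and CLS estimators are compared.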
Gastric cancer is the third leading cause of cancer-related mortality and remains a major global health issue [1]. Annually, approximately 479,000 individuals in China are diagnosed with gastric cancer, accounting for almost 45% of all new cases worldwide [2].
In recent years, machine learning (ML) techniques have been shown to be effective in accelerating the development of optoelectronic devices. However, as "black box" models, they offer limited theoretical interpretability. In this work, we use symbolic regression (SR) to discover explicit symbolic relationships between the structure of an optoelectronic Fabry-Perot (FP) laser and its optical field distribution, which greatly improves model transparency compared with ML. We demonstrate that the expressions found through SR exhibit lower errors on the test set than ML models, which suggests that the expressions have better fitting and generalization capabilities.
This opinion article discusses the original research of Yünkül et al. (the Authors), published in the Journal of Mountain Science 21(9): 3108–3122. Employing nonlinear regression, fuzzy logic, and artificial neural network modeling techniques, the Authors interrogated a large database assembled from the existing research literature. They used it to assess the performance of twelve equation rules in predicting the undrained shear strength (s_u) mobilized for remolded fine-grained soils at different values of liquidity index (I_L) and water content ratio. Based on their analyses, the Authors proposed a simple and reportedly reliable correlation (Eq. 9 in their paper) for predicting s_u over the I_L range of 0.15 to 3.00. This article describes various shortcomings in the Authors' assembled database and in their proposed s_u = f(I_L) correlation. These include potentially anomalous data and an I_L range that is excessively wide relative to routine geotechnical and transportation engineering applications. Contrary to the Authors' assertions, their proposed correlation is not reliable for fine-grained soils with consistencies in the general firm to stiff range (i.e., for 0.15 < I_L < 0.40). It increasingly overestimates s_u as I_L decreases, eventually predicting s_u → +∞ as I_L approaches 0.15 from above, while producing a mathematically undefined s_u for I_L < 0.15. This renders the correlation unconservative and could lead to unsafe geotechnical designs. Exponential or regular power-type s_u = f(I_L) models are more suitable for developing correlations applicable over the full plastic range (0 < I_L < 1). Such models provide reasonably conservative s_u predictions for preliminary design in routine geotechnical engineering applications.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2020-NR049579).
Abstract: High-dimensional data cause difficulties in machine learning because of long computation times and large memory requirements. In a multi-label setting in particular, computational complexity grows with the number of labels. Moreover, an optimization problem that fully accounts for all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. All three quantities are computed using mutual information, which allows the formulation to capture non-linear dependencies among variables. These relationships are key factors in multi-label feature selection. Our method expresses them within a unified formulation, which enables efficient optimization while accounting for all of them at once. To solve the resulting optimization problem under non-negativity constraints, we develop a gradient-based algorithm with fast convergence. Experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
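The abstract does not state the objective function. What follows is only a minimal sketch of the ingredients it names, assuming discrete (for example, binarized) features and labels.

The sketch works in three steps:

- It computes empirical mutual information.
- It builds a feature-feature MI matrix `R` (redundancy) and a label-label MI matrix `S` (inter-label dependency).
- It runs projected gradient descent under a non-negativity constraint on the hypothetical objective ||XW - Y||_F^2 + alpha * tr(W^T R W) + beta * tr(W S W^T). Here the least-squares term stands in for feature-label relevance.

The objective, the function names, and `alpha`/`beta` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi


def mi_matrix(A, B):
    """Pairwise MI between every column of A and every column of B."""
    return np.array([[mutual_information(A[:, i], B[:, j])
                      for j in range(B.shape[1])]
                     for i in range(A.shape[1])])


def mi_regression_select(X, Y, alpha=0.1, beta=0.1, iters=500):
    """Projected gradient descent on the HYPOTHETICAL objective
        ||X W - Y||_F^2 + alpha * tr(W^T R W) + beta * tr(W S W^T),  W >= 0,
    with R = feature-feature MI (redundancy) and S = label-label MI.
    Returns W and per-feature scores (row norms of W)."""
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    R = mi_matrix(X, X)
    S = mi_matrix(Y, Y)
    W = np.zeros((X.shape[1], Y.shape[1]))
    # Lipschitz bound on the gradient gives a safe fixed step size 1 / L.
    L = 2 * (np.linalg.norm(X.T @ X, 2)
             + alpha * np.linalg.norm(R, 2)
             + beta * np.linalg.norm(S, 2))
    for _ in range(iters):
        grad = 2 * X.T @ (X @ W - Y) + 2 * alpha * R @ W + 2 * beta * W @ S
        W = np.maximum(W - grad / L, 0.0)  # project onto the set W >= 0
    return W, np.linalg.norm(W, axis=1)
```

Every entry of `R` and `S` is non-negative. Both penalties are therefore non-negative whenever `W >= 0`, so the projected iteration decreases an objective that is bounded below. Features would then be ranked by `scores`.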
Funding: Supported by the Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (East China Normal University), Ministry of Education; the National Natural Science Foundation of China under Grant No. NSFC12131006; and the Scientific and Technological Innovation Project of China Academy of Chinese Medical Science under Grant No. CI2023C063YLL.
Abstract: This paper examines whether the parametric regression model is correctly specified for both the source and target data, and whether the regression pattern in the source domain aligns with that in the target domain. This evaluation is a critical prerequisite for applying model-based transfer learning methods under covariate-shift assumptions. Traditional regression model checks and two-sample regression tests are insufficient to address this issue. To overcome these limitations, the authors propose a novel adaptive-to-regression test statistic that is asymptotically distribution-free. Under the null hypothesis, the statistic has a chi-square weak limit. This preserves the significance level and allows critical values to be determined without resampling. Additionally, the authors systematically analyze the power of the test, highlighting its sensitivity to different sub-local alternatives that deviate from the null hypothesis. Numerical studies, including simulations, assess finite-sample performance, and a real-world data example is provided for illustration.
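With an asymptotically distribution-free statistic that has a chi-square limit, the rejection threshold is simply a fixed chi-square quantile, so no bootstrap is needed. The following is a minimal, standard-library sketch of that last step only. It assumes the statistic has already been computed and that the limiting degrees of freedom `df` are known; the abstract does not state them. The function names are hypothetical, and the statistic itself is not reproduced.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from df = 1 or df = 2, then step up two degrees of freedom at a time
    # using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_critical_value(df, level=0.05, tol=1e-10):
    """Upper `level` quantile of chi2_df, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > level:  # widen the bracket until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > level:
            lo = mid
        else:
            hi = mid
    return hi


def chi2_test(statistic, df, level=0.05):
    """Reject H0 when the statistic exceeds the chi-square critical value."""
    crit = chi2_critical_value(df, level)
    return statistic > crit, crit, chi2_sf(statistic, df)
```

The recurrence gives the exact chi-square tail for integer degrees of freedom. In practice `scipy.stats.chi2.ppf` would serve the same purpose.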
Abstract: Traditional oilfields face increasing extraction challenges, primarily due to reservoir quality degradation and production decline. Volatile international crude oil prices exacerbate these challenges further, as illustrated by Brent crude's trajectory from a pandemic-induced price collapse to geopolitically driven surges exceeding USD 100 per barrel. This study addresses these complexities with an integrated methodological framework that combines advanced numerical simulation with multivariate regression analysis, applied to medium-permeability sandstone reservoirs in the Xinjiang oilfield. The methodology employs Latin Hypercube Sampling (LHS) to stratify geological parameter distributions. It then constructs heterogeneous reservoir models using Petrel software, validated by history matching against production data. Production forecasting integrates numerical simulation and Decline Curve Analysis (DCA), while investment estimation uses Ordinary Least Squares (OLS) regression to relate engineering parameters to drilling and completion costs. Economic evaluation incorporates Discounted Cash Flow (DCF) modeling and breakeven analysis. It establishes techno-economic boundaries through an oil-price sensitivity analysis spanning USD 40 to 90 per barrel. Visualization tools, including 3D heatmaps, delineate nonlinear interactions among the engineering, geological, and investment datasets under economic constraints. The key finding for the target reservoirs is that as the oil price rises from USD 40 to USD 90 per barrel, the minimum economic thickness threshold falls from approximately 5.7 m to about 2.5 m. Model prediction errors remain consistently below 25% across the validation datasets. This framework provides scientifically grounded decision support for optimizing capital allocation. It also offers actionable insights for planning the development of undeveloped hydrocarbon resources amid market uncertainty. Ultimately, it supports national energy security through technically robust and economically viable resource exploitation strategies.
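The link between oil price and minimum economic thickness can be illustrated with a toy DCF breakeven calculation. This sketch is not the study's workflow. It assumes the following:

- an Arps exponential decline;
- an initial rate proportional to net pay thickness;
- fixed capex and per-barrel opex;
- monthly discounting.

All default values are hypothetical placeholders, not data from the Xinjiang reservoirs. The breakeven thickness is the value at which NPV crosses zero.

```python
import math


def exponential_decline(qi, di, months):
    """Arps exponential decline: q(t) = qi * exp(-di * t), t in months."""
    return [qi * math.exp(-di * t) for t in range(months)]


def well_npv(price, thickness, capex=1.5e6, opex_per_bbl=20.0,
             qi_per_m=400.0, di=0.03, months=120, annual_rate=0.10):
    """Discounted cash flow of one hypothetical well (USD).

    Assumes the initial monthly rate scales linearly with net pay thickness;
    every default value is a placeholder, not data from the study."""
    monthly_rate = (1 + annual_rate) ** (1 / 12) - 1
    value = -capex
    rates = exponential_decline(qi_per_m * thickness, di, months)
    for t, q in enumerate(rates, start=1):
        value += q * (price - opex_per_bbl) / (1 + monthly_rate) ** t
    return value


def min_economic_thickness(price, lo=0.0, hi=50.0, tol=1e-6, **kwargs):
    """Breakeven thickness (NPV = 0) found by bisection; inf if never economic."""
    if well_npv(price, hi, **kwargs) < 0:
        return math.inf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if well_npv(price, mid, **kwargs) < 0:
            lo = mid
        else:
            hi = mid
    return hi
```

With these placeholder inputs the threshold falls as the price rises, which is the qualitative pattern the study reports. The specific numbers carry no meaning for the target reservoirs.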
Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFA1003701 and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, under Grant No. SMDAYB2023004.
Abstract: Quantile regression (QR) has become an important tool for measuring how the quantiles of a response variable depend on a number of predictors in heterogeneous data, especially heavy-tailed data and data with outliers. However, statistical inference for distributed high-dimensional QR with missing data is challenging. The difficulties stem from the distributed nature, sparsity, and missingness of the data, and from the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method for variable selection and parameter estimation. It approximates the non-differentiable quantile loss with a smooth function and incorporates inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is efficient in both computation and communication, because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations, without any constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing-data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
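Two of the ideas named above can be sketched together: smoothing the check loss, and inverse probability weighting (IPW) in a distributed first-order loop.

- **Smoothing.** A standard choice, assumed here because the paper's smoother is not specified, is convolution with a Gaussian kernel of bandwidth `h`. Its derivative is `tau - Phi(-u/h)`.
- **Distributed IPW.** Each machine sends only its IPW-weighted gradient vector, and a central node averages them.

The sketch is deliberately limited. It omits the penalty, the second-order (Hessian) information the paper also communicates, and any estimation of the propensity scores, which are taken as known inputs. All names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def normal_cdf(z):
    """Standard normal CDF, vectorized over NumPy arrays."""
    return 0.5 * (1.0 + _erf(np.asarray(z, float) / math.sqrt(2.0)))


def smoothed_check_loss(u, tau, h):
    """Gaussian-kernel convolution of rho_tau(u) = u * (tau - 1{u < 0}).
    Its derivative is tau - Phi(-u / h), which the gradient below uses."""
    u = np.asarray(u, float)
    return u * (tau - normal_cdf(-u / h)) + h * np.exp(-0.5 * (u / h) ** 2) / math.sqrt(2 * math.pi)


def local_gradient(X, y, observed, propensity, beta, tau, h):
    """IPW-weighted gradient sum on one machine; only this p-vector is communicated."""
    w = np.where(observed, 1.0 / propensity, 0.0)   # delta_i / pi_i
    r = np.where(observed, y, 0.0) - X @ beta       # missing responses get weight 0
    return -(X.T @ (w * (tau - normal_cdf(-r / h))))


def distributed_smoothed_qr(machines, tau=0.5, h=0.05, iters=3000):
    """Gradient descent in which a central node sums local gradients each round.
    machines: list of (X_k, y_k, observed_k, propensity_k) tuples."""
    n = sum(len(m[1]) for m in machines)
    # Hessian bound: the kernel density is at most 1 / (h * sqrt(2 * pi)).
    gram = sum(m[0].T @ (np.where(m[2], 1.0 / m[3], 0.0)[:, None] * m[0])
               for m in machines) / n
    step = h * math.sqrt(2 * math.pi) / np.linalg.eigvalsh(gram).max()
    beta = np.zeros(machines[0][0].shape[1])
    for _ in range(iters):
        grad = sum(local_gradient(*m, beta, tau, h) for m in machines) / n
        beta = beta - step * grad
    return beta
```

The smoothed loss is convex, so a fixed step derived from the kernel's maximum density is stable. Only a p-dimensional vector per machine crosses the network in each round.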
Abstract: Urban Heat Island (UHI) effects are exacerbated by the expansion of impervious surfaces and the loss of vegetation in urban centers, leading to elevated air and surface temperatures and reduced thermal comfort. Through shading and evapotranspiration, urban trees are among the most effective Nature-based Solutions (NbS) for passive cooling. This study assesses the cooling potential of selected tree species by analyzing their morphological and physiological traits, using a combination of ENVI-met microclimate simulations and multiple regression modeling. A total of 15 urban tree species were selected from the literature and analyzed for how their cooling efficacy depends on these traits. The results were then validated in an urban setting with ENVI-met simulations. Key traits analyzed included Leaf Area Index (LAI), canopy density, transpiration rate, tree height, rooting depth, and water availability. Multiple linear regression analysis was conducted to quantify the contribution of each trait to ambient temperature reduction. Results revealed that LAI (R^(2)=0.76, p<0.001) and transpiration rate (R^(2)=0.71, p<0.001) were the most significant predictors of daytime cooling, while canopy openness and tree height were more strongly correlated with nighttime heat dissipation. High-performing species, such as Ficus benghalensis, Azadirachta indica, and Samanea saman, achieved maximum temperature reductions of 2.5–4.2 ℃, especially in compact, low-rise, and mid-rise zones. The study provides a quantitative, trait-based framework for selecting trees in urban greening initiatives. It also offers evidence to guide landscape planning and UHI mitigation strategies through scientifically informed plantation design.
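The regression step can be sketched with ordinary least squares. This is a minimal sketch, assuming each row is one species, the columns are trait values (for example, LAI and transpiration rate), and the response is the simulated temperature reduction. Those column meanings are hypothetical.

- `fit_mlr` returns the coefficients and R^2. Passing a single column reproduces the per-trait R^2 values of the kind reported above.
- `standardized_coefficients` z-scores the inputs so that traits measured in different units can be compared.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares y = b0 + X @ b; returns (coefficients, R^2)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2


def standardized_coefficients(X, y):
    """Slopes after z-scoring X and y, so trait contributions are comparable."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coef, _ = fit_mlr(Xz, yz)
    return coef[1:]
```

With 15 species and six traits, the degrees of freedom are limited. Significance tests on the coefficients (not shown) would matter as much as the point estimates.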
Funding: Supported by the National Natural Science Foundation of China (Grant 62573375); the Natural Science Foundation of Hebei Province (Grant F2024203038); the Science and Technology Research and Development Plan Project of Qinhuangdao City (Grant 202302B048); the Provincial Key Laboratory Performance Subsidy Project (Grant 22567612H); and the Shandong Provincial Natural Science Foundation Youth Project (ZR2023QF044).
Abstract: Data-driven models of the mechanical properties of cold-rolled strip steel often suffer from inadequate predictive accuracy and limited generalization capability. To address this, a predictive modeling method named RFR-WOA is developed based on random forest regression (RFR) and the whale optimization algorithm (WOA). First, Pearson and Spearman correlation analysis and Gini coefficient importance ranking are applied to an actual production dataset containing 37,878 samples. This selects 22 key variables as model inputs from the 112 variables that affect mechanical properties. Next, an RFR-based predictive model for the mechanical properties of cold-rolled strip steel is constructed. The hyperparameters of the RFR model are then iteratively optimized using WOA, with a combination of the coefficient of determination (R^(2)) and the root mean square error as the optimization objective, which improves predictive performance. Finally, the RFR-WOA prediction model is compared with models built using deep neural networks, convolutional neural networks, and other methods. Test results on 9469 samples of actual production data show that the proposed model has better predictive accuracy and generalization capability.
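The hyperparameter search step can be sketched with a basic whale optimization algorithm that minimizes a function over a box. The sketch shows the three standard update moves:

- encircling the best whale when |A| < 1;
- exploring relative to a random whale otherwise;
- a spiral bubble-net move around the best whale.

The test objective here is a hypothetical stand-in. The paper's combined R^2/RMSE objective over an RFR model is not reproduced, and all parameter defaults are assumptions.

```python
import numpy as np


def woa_minimize(f, lower, upper, n_whales=20, iters=100, b=1.0, seed=0):
    """Basic whale optimization algorithm minimizing f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, float)
    upper = np.asarray(upper, float)
    X = rng.uniform(lower, upper, size=(n_whales, lower.size))
    fitness = np.array([f(x) for x in X])
    best, best_f = X[fitness.argmin()].copy(), float(fitness.min())
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # otherwise move relative to a random whale (exploration).
                ref = best if abs(A) < 1.0 else X[rng.integers(n_whales)]
                X[i] = ref - A * np.abs(C * ref - X[i])
            else:
                # Spiral (bubble-net) update around the best whale.
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lower, upper)
            fx = f(X[i])
            if fx < best_f:
                best, best_f = X[i].copy(), float(fx)
    return best, best_f
```

In a setting like the paper's, `f` might wrap the cross-validated error of `sklearn.ensemble.RandomForestRegressor` (scored with `sklearn.model_selection.cross_val_score`). Integer hyperparameters such as `n_estimators` and `max_depth` would be obtained by rounding the whale positions. That wrapper is an assumption and is not shown.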
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42271224, 41901193); the Ministry of Education Humanities and Social Sciences Research Planning Fund Project of China (No. 24YJAZH190); the Anhui Province Excellent Youth Research Project in Universities (No. 2022AH030019); and the Anhui Social Sciences Innovation Development Research Project (No. 2024CXQ503).
Abstract: The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has attracted significant attention across disciplines such as geography, economics, and urban planning. Much existing research focuses on the impact of individual transportation facilities on housing prices. Comprehensive analyses of how overall urban transit accessibility influences housing market dynamics, however, remain scarce. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and to evaluate public transit accessibility in 2022. Using techniques such as the optimized-parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed three patterns.

1) Housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution.
2) Public transit accessibility showed a ‘core-periphery’ structure, with accessibility declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%).
3) The impact of transit accessibility on housing prices was spatially heterogeneous. Bus travel showed the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes had a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%.

These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and for improving both urban housing quality and residents’ living standards.
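The "explanatory power" values above are presumably the geographical detector's q-statistic. It is defined as q = 1 - sum_h(N_h * sigma_h^2) / (N * sigma^2), where stratum h of a discretized factor holds N_h observations with variance sigma_h^2.

The optimized-parameter variant searches over discretization settings for the one that maximizes q. The following is a simplified sketch of that idea. It tries only quantile breaks with 3 to 7 classes, whereas the real method also compares other break methods. The function names are hypothetical.

```python
import numpy as np


def geodetector_q(y, strata):
    """Factor detector q = 1 - sum_h N_h * var_h / (N * var), population variances."""
    y = np.asarray(y, float)
    strata = np.asarray(strata)
    sst = y.size * y.var()
    ssw = sum((strata == s).sum() * y[strata == s].var() for s in np.unique(strata))
    return 1.0 - ssw / sst


def best_quantile_discretization(y, x, class_counts=range(3, 8)):
    """Simplified parameter search: quantile breaks only; keep the class count with the largest q."""
    x = np.asarray(x, float)
    best = None
    for k in class_counts:
        edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1)[1:-1])  # interior breaks
        q = geodetector_q(y, np.digitize(x, edges))
        if best is None or q > best[1]:
            best = (k, q)
    return best
```

q equals 1 when the factor's strata explain all of the variance in the outcome (here, housing price) and 0 when they explain none. A value such as 0.692 therefore reads as roughly 69% of the spatial variance being associated with the stratified factor.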
Funding: Supported by High-level Professional Groups in Guangdong Province, No. GSPZYQ2020101; and the Guangdong Province Educational Research Planning Project, No. 2024GXJK742.
Abstract: BACKGROUND Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China's Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among the spouses of older mothers remains critically scarce, both in China and globally. AIM To investigate PPD and its influencing factors in Chinese families with advanced maternal age. METHODS This cross-sectional study included 358 fathers whose partners were pregnant women of advanced maternal age. Participants were recruited at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected via a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were used to identify latent PPD types and the factors influencing PPD. RESULTS The incidence of PPD was 16.48%, and three profiles were identified: low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P<0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P<0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with the monophasic type (P<0.05). CONCLUSION PPD incidence is high among Chinese fathers whose partners are of advanced maternal age, and depressive symptom profiles vary. Healthcare practitioners should prioritize individuals with low levels of social support.
Funding: Supported by the Youth Scientific Research Project of Fujian Provincial Center for Disease Control and Prevention (2022QN02) and the Fujian Provincial Health Youth Scientific Research Project (2023QNA040).
Abstract: Background: The impact of COVID-19 on influenza activity is of interest for informing future influenza prevention and control strategies. This study aimed to examine the effects of COVID-19 on influenza in Fujian Province, China, using a regression discontinuity (RD) design. Methods: We used the influenza-like illness (ILI) percentage as an indicator of influenza activity, with data from all sentinel hospitals between Week 4, 2020, and Week 51, 2023. The data were divided into two groups: the COVID-19 epidemic period and the post-epidemic period. Statistical analysis was performed in R using robust RD design methods to account for potential confounders, including seasonality, temperature, and influenza vaccination rates. Results: There was a discernible increase in the ILI percentage during the post-epidemic period. The robustness of the findings was confirmed with various RD bandwidth selection methods and placebo tests. The certwo bandwidth gave the largest estimated effect size: a 14.6-percentage-point increase in the ILI percentage (β=0.146; 95% CI: 0.096–0.196). Sensitivity analyses and adjustments for confounders consistently pointed to a higher ILI percentage during the post-epidemic period than during the epidemic period. Conclusion: The 14.6-percentage-point increase in the ILI percentage in Fujian Province, China, after the end of the COVID-19 pandemic suggests that public health measures to control influenza transmission may need to be re-evaluated and possibly enhanced. Further research is needed to fully understand the factors contributing to this rise and to assess the ongoing impacts of post-pandemic behavioral changes.
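A sharp RD estimate is the jump in the conditional mean of the outcome at the cutoff. It is typically obtained by fitting a kernel-weighted line on each side and differencing the two intercepts. The following is a minimal point-estimate sketch with a triangular kernel and a fixed, user-chosen bandwidth `h`.

The sketch assumes the running variable is time (for example, weeks) centered on the cutoff. It omits covariates, seasonality adjustment, bias correction, and robust confidence intervals. `certwo` is also the name of a coverage-error-rate bandwidth selector in the R package `rdrobust`, which the study may have used; that is an inference, not stated in the abstract.

```python
import numpy as np


def local_linear_fit_at(x, y, h):
    """Triangular-kernel weighted linear fit; returns the intercept at x = 0."""
    w = np.clip(1.0 - np.abs(x) / h, 0.0, None)
    keep = w > 0
    A = np.column_stack([np.ones(keep.sum()), x[keep]])
    sw = np.sqrt(w[keep])                    # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[keep] * sw, rcond=None)
    return coef[0]


def sharp_rd_estimate(running, outcome, cutoff, h):
    """Jump in E[outcome | running] at the cutoff (treated side: running >= cutoff)."""
    x = np.asarray(running, float) - cutoff
    y = np.asarray(outcome, float)
    right = x >= 0
    return (local_linear_fit_at(x[right], y[right], h)
            - local_linear_fit_at(x[~right], y[~right], h))
```

Re-running the estimate over a range of `h` values is the simplest form of the bandwidth-sensitivity check described above.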
Funding: Supported by DST-FIST (Government of India) (Grant No. SR/FIST/MS-1/2017/13) and Seed Money Project (Grant No. DoRDC/733).
Abstract: This study numerically examines the heat and mass transfer characteristics of two ternary nanofluids in converging and diverging channels. It also assesses the two ternary nanofluid combinations to determine which provides better heat and mass transfer and lower entropy production while remaining cost-efficient. This work bridges the gap between academic research and industrial feasibility by incorporating cost analysis, entropy generation, and thermal efficiency. To compare the velocity, temperature, and concentration profiles, we examine two ternary nanofluids, TiO_(2)+SiO_(2)+Al_(2)O_(3)/H_(2)O and TiO_(2)+SiO_(2)+Cu/H_(2)O, while accounting for nanoparticle shape. Velocity slip and Soret/Dufour effects are taken into consideration. In addition, a regression analysis of the model's Nusselt and Sherwood numbers is carried out. The fourth-order Runge-Kutta method with a shooting technique is employed to obtain the numerical solution of the governing system of ordinary differential equations. The flow characteristics of the ternary nanofluids are examined and simulated as the flow-dominating parameters vary, and the influence of these parameters on the flow, temperature, and concentration fields is demonstrated. For variations in the Eckert and Dufour numbers, TiO_(2)+SiO_(2)+Al_(2)O_(3)/H_(2)O has a higher temperature than TiO_(2)+SiO_(2)+Cu/H_(2)O. Compared with the TiO_(2)+SiO_(2)+Cu/H_(2)O ternary nanofluid, TiO_(2)+SiO_(2)+Al_(2)O_(3)/H_(2)O shows a higher heat transfer rate, lower entropy generation, a greater mass transfer rate, and a lower cost.
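The RK4-with-shooting idea can be sketched on a toy boundary-value problem: y'' = -y with y(0) = 0 and y(pi/2) = 1, whose exact missing slope is y'(0) = 1. The method converts the problem into an initial-value problem with a guessed slope. It integrates with classical RK4 and corrects the guess with secant steps until the far boundary condition is met.

The nanofluid system is higher-order and nonlinear, with several unknown initial values (typically handled by a multi-dimensional Newton iteration). This sketch shows only the one-parameter idea, and its names are hypothetical.

```python
import math


def rk4(f, x0, y0, x1, n=200):
    """Integrate the system y' = f(x, y) from x0 to x1 with classical RK4 (n steps)."""
    h = (x1 - x0) / n
    x, y = x0, list(y0)
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        x += h
    return y


def shoot(f, a, b, ya, yb, s0, s1, tol=1e-10, max_iter=50):
    """Shooting method for a second-order BVP written as a first-order system
    [y, y'] with y(a) = ya, y(b) = yb: secant iteration on the missing slope y'(a)."""
    def miss(s):
        return rk4(f, a, [ya, s], b)[0] - yb

    m0, m1 = miss(s0), miss(s1)
    for _ in range(max_iter):
        if abs(m1) < tol or m1 == m0:
            break
        s0, s1 = s1, s1 - m1 * (s1 - s0) / (m1 - m0)
        m0, m1 = m1, miss(s1)
    return s1
```

For the toy problem, `shoot(lambda x, y: [y[1], -y[0]], 0.0, math.pi / 2, 0.0, 1.0, 0.5, 2.0)` returns a slope close to 1. Because this ODE is linear, a single secant step already lands on the answer.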
Abstract: Understanding how dataset size affects regression models can help improve the accuracy of solar power forecasts and make the most of renewable energy systems. This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction, contributing to better forecasting methods. The study analyzes data from two solar panels, aSiMicro03036 and aSiTandem72-46, over 7, 14, 17, 21, 28, and 38 days. Each dataset comprises five independent parameters and one dependent parameter and is split 80–20 for training and testing. Results indicate that Random Forest consistently outperforms the other models, achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error (MAE) of 2.0544 on the aSiTandem72-46 panel with 21 days of data. For the aSiMicro03036 panel, the best MAE of 4.2978 was obtained with the k-Nearest Neighbor (k-NN) algorithm, implemented as instance-based k-nearest neighbors (IBk) in Weka and trained on 17 days of data. Regression performance for most models (excluding IBk) stabilizes at 14 days or more. Compared with the 7-day dataset, increasing to 21 days reduced the MAE by around 20% and improved correlation coefficients by around 2.1%, highlighting the value of moderate dataset expansion. These findings suggest that datasets spanning 17 to 21 days, with 80% used for training, can significantly enhance the predictive accuracy of solar power generation models.
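The evaluation protocol reduces to an 80/20 split plus two metrics. The following is a minimal standard-library sketch of that protocol, not of the Weka models themselves.

It assumes an ordered split that keeps the first 80% of time-ordered rows for training. Whether the study shuffled before splitting is not stated (Weka's percentage split randomizes by default unless order is preserved). The function names are hypothetical.

```python
import math


def chronological_split(rows, train_fraction=0.8):
    """First 80% of time-ordered rows for training, the remainder for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted power."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(va * vp)
```

An ordered split avoids leaking later days into training, which matters for time-series data. A shuffled split tends to report more optimistic errors.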
Funding: Supported by the National Natural Science Foundation of China (62375013).
Abstract: The fiber optic gyroscope (FOG) is the core component of inertial navigation systems. FOGs offer technical advantages such as low power consumption, long lifespan, fast startup, and flexible structural design, and they are widely used in aerospace, unmanned driving, and other fields. However, because optical devices are temperature-sensitive, the ambient temperature introduces errors in FOGs, which greatly limits their output accuracy. This work studies machine-learning-based temperature error compensation techniques for FOGs, focusing on the bias errors generated in the fiber ring by the Shupe effect. It proposes a composite model based on k-means clustering, support vector regression, and particle swarm optimization. It also significantly reduces redundancy within the samples by adopting interval-sequence sampling. Metrics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance are used to evaluate the model's performance and compensation effectiveness. The method effectively enhances the consistency between data and models across different temperature ranges and temperature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared with existing methods that use a single machine learning model, the proposed method raises the improvement in bias stability of the compensated FOG from 57.11% to 71.98%. It also raises the suppression of the rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of FOGs after compensation and provides theoretical guidance and technical references for sensor error compensation in other fields.
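Allan variance, one of the metrics listed above, has a short definition. Split the rate record into K clusters of m samples, average each cluster, and take half the mean squared difference between consecutive cluster averages at averaging time tau = m / fs.

The following is a minimal non-overlapping sketch, assuming a uniformly sampled rate record in °/h. The paper's exact bias-stability definition is not given. The minimum of the Allan deviation curve is a common place to read bias instability, but that convention is an assumption here.

```python
import numpy as np


def allan_deviation(rate, fs, m):
    """Non-overlapping Allan deviation of a gyro rate record.

    rate: samples (e.g. deg/h), fs: sample rate in Hz, m: samples per cluster.
    Returns (tau in seconds, Allan deviation in the units of rate)."""
    rate = np.asarray(rate, float)
    k = rate.size // m                                  # number of full clusters
    means = rate[: k * m].reshape(k, m).mean(axis=1)    # cluster averages
    avar = 0.5 * np.mean(np.diff(means) ** 2)           # sum of squared diffs / (2(K-1))
    return m / fs, float(np.sqrt(avar))


def allan_curve(rate, fs, cluster_sizes):
    """Allan deviation evaluated at several averaging times."""
    return [allan_deviation(rate, fs, m) for m in cluster_sizes]
```

On a log-log plot of the curve, different slopes correspond to different noise terms (for example, rate ramp at slope +1). That is how the rate ramp coefficient mentioned above would typically be extracted.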
Abstract: In this study, we examine sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). SIR was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed for SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods rely on convex relaxation to bypass the sparsity constraint. This may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. We instead approach sparse SIR as a nonconvex optimization problem and tackle the sparsity constraint directly: we establish the optimality conditions and solve them iteratively by means of the splicing technique. Without applying convex relaxation to either the sparsity constraint or the orthogonality constraint, our algorithm exhibits superior empirical performance, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, it surpasses existing methods in accuracy for both central subspace estimation and best subset selection, and it sustains high performance even with correlated predictors.
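For reference, classical, non-sparse SIR proceeds in four steps:

1. Whiten the predictors.
2. Slice the observations by the order of the response.
3. Form the weighted covariance of the slice means.
4. Map its leading eigenvectors back to the original scale.

The following is a minimal sketch of that baseline only. It does not implement the sparsity constraint or the splicing algorithm described above, and the function name is hypothetical.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Classical SIR: eigen-decompose the covariance of slice means of whitened X."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, p = X.shape
    cov = np.cov(X, rowvar=False)
    # Whitening matrix cov^{-1/2} via the eigen-decomposition of cov.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Weighted covariance of slice means (slices follow the order of y).
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    B = inv_sqrt @ top                      # map back to the original predictor scale
    return B / np.linalg.norm(B, axis=0)
```

The returned directions are generally dense, with small but nonzero loadings on irrelevant predictors. That is the interpretability gap the sparse formulation addresses.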
Funding: Funded by the INTER program and co-funded by the Fonds National de la Recherche, Luxembourg (FNR) and the Fund for Scientific Research-FNRS, Belgium (F.R.S.-FNRS), T.0233.20, ‘Sustainable Residential Densification’ project (SusDens, 2020–2024).
Abstract: Fully comprehending how different global and local variables shape urban development processes requires systematic study of the complexities involved. The interplay between such variables is crucial for modelling urban growth in a way that closely reflects reality. Despite extensive research, it remains unclear how variations in these input variables influence urban densification. In this study, we conduct a global sensitivity analysis (SA) using a multinomial logistic regression (MNL) model to assess the model's explanatory and predictive power. We examine the influence of global variables, including spatial resolution, neighborhood size, and density classes, under different input combinations at a provincial scale to understand their impact on densification. Additionally, we perform a stepwise regression to identify the explanatory variables that are significant for understanding densification in the Brussels Metropolitan Area (BMA). Our results indicate that the following settings optimally explain and predict urban densification:

- a finer spatial resolution of 50 m and 100 m;
- a smaller neighborhood size of 5×5 and 3×3;
- specific density classes, namely 3 (non-built-up, low and high built-up) and 4 (non-built-up, low, medium and high built-up).

Consistent with this, the stepwise regression reveals that models with a coarser resolution of 300 m lack significant variables, reflecting a lower explanatory power for densification. This approach aids in identifying optimal and significant global variables with higher explanatory power for understanding and predicting urban densification. Furthermore, these findings are reproducible in a global urban context, offering valuable insights for planners, modelers and geographers in managing future urban growth and minimizing modelling.
Abstract: Triaxial tests, a staple of rock engineering, are labor-intensive, sample-demanding, and costly, making their optimization highly advantageous. These tests are essential for characterizing rock strength. Once a failure criterion is adopted, they allow the criterion parameters to be derived through regression, which facilitates their integration into modeling programs. In this study, we introduce the application of an underutilized statistical technique, orthogonal regression, which is well suited to analyzing triaxial test data. We also present an innovation for the case of orthogonal linear regression: minimizing the Euclidean distance while incorporating orthogonality between vectors as a constraint. In addition, we consider the Modified Least Squares method. We illustrate this approach by developing the equations needed to apply the Mohr-Coulomb, Murrell, Hoek-Brown, and Úcar criteria, and we implement these equations in both spreadsheet calculations and R scripts. Finally, we demonstrate the technique's application using five datasets of varied lithologies from the specialized literature, showcasing its versatility and effectiveness.
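For the Mohr-Coulomb case, the criterion is linear in principal-stress space: sigma1 = sigma_c + k * sigma3, with k = (1 + sin phi) / (1 - sin phi) and sigma_c = 2c cos phi / (1 - sin phi). Orthogonal regression fits that line by minimizing perpendicular rather than vertical distances.

The following is a minimal sketch using standard total least squares via SVD. It is not the constrained formulation or the Modified Least Squares method described above, and the nonlinear criteria (Murrell, Hoek-Brown, Úcar) are not covered. The function names are hypothetical.

```python
import math
import numpy as np


def orthogonal_line_fit(x, y):
    """Total least squares line y = a + b * x (minimizes perpendicular distances)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xm, ym = x.mean(), y.mean()
    A = np.column_stack([x - xm, y - ym])
    # The line normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    nx, ny = vt[-1]
    b = -nx / ny
    return ym - b * xm, b


def mohr_coulomb_from_fit(intercept, slope):
    """Convert sigma1 = sigma_c + k * sigma3 into friction angle (deg) and cohesion."""
    phi = math.asin((slope - 1.0) / (slope + 1.0))
    cohesion = intercept * (1.0 - math.sin(phi)) / (2.0 * math.cos(phi))
    return math.degrees(phi), cohesion
```

Unlike ordinary least squares, the orthogonal fit treats errors in sigma3 and sigma1 symmetrically. It relies on both stresses being expressed in the same units, because perpendicular distance is not scale-invariant.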
Funding: Supported by the Social Science Planning Foundation of Liaoning Province (Grant No. L22ZD065) and the National Natural Science Foundation of China (Grant Nos. 12271231, 1247012719, 12001229).
Abstract: To better capture the asymmetry and structural fluctuations observed in count time series, this study investigates the use of the quantile regression (QR) method to analyze and forecast nonlinear integer-valued time series that exhibit piecewise behavior. Specifically, we focus on parameter estimation in the first-order Self-Exciting Threshold Integer-valued Autoregressive (SETINAR(2,1)) process with symmetric, asymmetric, and contaminated innovations. We establish the asymptotic properties of the estimator under certain regularity conditions. Monte Carlo simulations demonstrate the superior performance of the QR method compared with the conditional least squares (CLS) approach. Furthermore, we validate the robustness of the proposed method through empirical quantile regression estimation and forecasting for larceny incidents and CAD drug call counts in Pittsburgh, showcasing its effectiveness across diverse levels of data heterogeneity.
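The SETINAR(2,1) process has two regimes and order one. The previous count X_{t-1} selects the thinning parameter according to whether it lies at or below a threshold r. The next count is a binomial thinning of X_{t-1} plus a new innovation.

The following simulator is a minimal sketch that makes this structure concrete. It assumes Poisson innovations, whereas the paper also studies asymmetric and contaminated ones, and it does not implement the QR estimator. The parameter values in the test are arbitrary.

```python
import numpy as np


def simulate_setinar21(n, alpha1, alpha2, r, lam, x0=0, seed=0):
    """X_t = alpha1 o X_{t-1} + e_t if X_{t-1} <= r, else alpha2 o X_{t-1} + e_t,
    where 'o' is binomial thinning and e_t ~ Poisson(lam) (assumed)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    prev = x0
    for t in range(n):
        alpha = alpha1 if prev <= r else alpha2    # regime chosen by the threshold
        survivors = rng.binomial(prev, alpha)      # binomial thinning alpha o X_{t-1}
        prev = survivors + rng.poisson(lam)        # add the innovation
        x[t] = prev
    return x
```

Because thinning keeps every value a non-negative integer, conditional quantiles of such series are step functions. That is part of why QR for count data needs the careful asymptotic treatment the abstract mentions.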
Funding: Supported by the Natural Science Foundation of Shanghai (23ZR1463600), the Shanghai Pudong New Area Health Commission Research Project (PW2021A-69), and the Research Project of Clinical Research Center of Shanghai Health Medical University (22MC2022002).
Abstract: Gastric cancer is the third leading cause of cancer-related mortality and remains a major global health issue [1]. Annually, approximately 479,000 individuals in China are diagnosed with gastric cancer, accounting for almost 45% of all new cases worldwide [2].
Funding: Supported by the National Natural Science Foundation of China (No. 92370117) and the CAS Project for Young Scientists in Basic Research (No. YSBR-090).
Abstract: In recent years, machine learning (ML) techniques have proven effective in accelerating the development of optoelectronic devices. However, as "black box" models, they offer limited theoretical interpretability. In this work, we leverage symbolic regression (SR) to discover explicit symbolic relationships between the structure of an optoelectronic Fabry-Perot (FP) laser and its optical field distribution, which greatly improves model transparency compared with ML. We demonstrate that the expressions found through SR achieve lower errors on the test set than ML models, suggesting that the expressions have better fitting and generalization capabilities.
Abstract: This opinion article discusses the original research work of Yünkül et al. (the Authors) published in the Journal of Mountain Science 21(9): 3108–3122. Employing non-linear regression, fuzzy logic, and artificial neural network modeling techniques, the Authors interrogated a large database assembled from the existing research literature. They assessed the performance of twelve equation rules in predicting the undrained shear strength (s_(u)) mobilized for remolded fine-grained soils at different values of liquidity index (I_(L)) and water content ratio. Based on their analyses, the Authors proposed a simple and reportedly reliable correlation (i.e., Eq. 9 in their paper) for predicting s_(u) over the I_(L) range of 0.15 to 3.00. This article describes various shortcomings in the Authors’ assembled database and in their proposed s_(u)=f(I_(L)) correlation. The database shortcomings include potentially anomalous data and an I_(L) range that is excessively wide relative to routine geotechnical and transportation engineering applications. Contrary to the Authors’ assertions, their proposed correlation is not reliable for fine-grained soils with consistencies in the general firm to stiff range (i.e., for 0.15<I_(L)<0.40). It increasingly overestimates s_(u) as I_(L) decreases, eventually predicting s_(u)→+∞ for I_(L)→0.15+ (and producing a mathematically undefined s_(u) for I_(L)<0.15). This renders the correlation unconservative and potentially leads to unsafe geotechnical designs. Exponential or regular-power type s_(u)=f(I_(L)) models are more suitable for developing correlations applicable over the full plastic range (0<I_(L)<1). Such models provide reasonably conservative s_(u) predictions for use in preliminary design for routine geotechnical engineering applications.
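The exponential alternative recommended above can be illustrated with s_(u) = a·exp(-b·I_(L)). It stays finite at I_(L) = 0 and has no singularity anywhere in the plastic range, and it can be fit by ordinary least squares on ln s_(u).

The following is a minimal sketch of that model form only. The Authors' Eq. 9 is not reproduced here, and the values in the test are hypothetical.

```python
import math


def fit_exponential_su(liquidity_index, su):
    """Fit s_u = a * exp(-b * I_L) by ordinary least squares on ln(s_u)."""
    x = list(liquidity_index)
    z = [math.log(s) for s in su]
    n = len(x)
    xm, zm = sum(x) / n, sum(z) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxz = sum((xi - xm) * (zi - zm) for xi, zi in zip(x, z))
    slope = sxz / sxx
    a = math.exp(zm - slope * xm)     # back-transform the log-space intercept
    return a, -slope


def predict_su(a, b, il):
    """Predicted undrained shear strength; finite for every I_L, including 0."""
    return a * math.exp(-b * il)
```

Fitting in log space weights relative rather than absolute errors, which suits s_(u) data spanning orders of magnitude. The fitted curve also yields a bounded value, a, at I_(L) = 0.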