The authors consider the issue of hypothesis testing in varying-coefficient regression models with high-dimensional data.Utilizing kernel smoothing techniques,the authors propose a locally concerned U-statistic method...The authors consider the issue of hypothesis testing in varying-coefficient regression models with high-dimensional data.Utilizing kernel smoothing techniques,the authors propose a locally concerned U-statistic method to assess the overall significance of the coefficients.The authors establish that the proposed test is asymptotically normal under both the null hypothesis and local alternatives.Based on the locally concerned U-statistic,the authors further develop a globally concerned U-statistic to test whether the coefficient function is zero.A stochastic perturbation method is employed to approximate the distribution of the globally concerned test statistic.Monte Carlo simulations demonstrate the validity of the proposed test in finite samples.展开更多
Parkinson’s disease(PD)is a debilitating neurological disorder affecting over 10 million people worldwide.PD classification models using voice signals as input are common in the literature.It is believed that using d...Parkinson’s disease(PD)is a debilitating neurological disorder affecting over 10 million people worldwide.PD classification models using voice signals as input are common in the literature.It is believed that using deep learning algorithms further enhances performance;nevertheless,it is challenging due to the nature of small-scale and imbalanced PD datasets.This paper proposed a convolutional neural network-based deep support vector machine(CNN-DSVM)to automate the feature extraction process using CNN and extend the conventional SVM to a DSVM for better classification performance in small-scale PD datasets.A customized kernel function reduces the impact of biased classification towards the majority class(healthy candidates in our consideration).An improved generative adversarial network(IGAN)was designed to generate additional training data to enhance the model’s performance.For performance evaluation,the proposed algorithm achieves a sensitivity of 97.6%and a specificity of 97.3%.The performance comparison is evaluated from five perspectives,including comparisons with different data generation algorithms,feature extraction techniques,kernel functions,and existing works.Results reveal the effectiveness of the IGAN algorithm,which improves the sensitivity and specificity by 4.05%–4.72%and 4.96%–5.86%,respectively;and the effectiveness of the CNN-DSVM algorithm,which improves the sensitivity by 1.24%–57.4%and specificity by 1.04%–163%and reduces biased detection towards the majority class.The ablation experiments confirm the effectiveness of individual components.Two future research directions have also been suggested.展开更多
Quantile regression(QR)has become an important tool to measure dependence of response variable's quantiles on a number of predictors for heterogeneous data,especially heavy-tailed data and outliers.However,it is q...Quantile regression(QR)has become an important tool to measure dependence of response variable's quantiles on a number of predictors for heterogeneous data,especially heavy-tailed data and outliers.However,it is quite challenging to make statistical inference on distributed high-dimensional QR with missing data due to the distributed nature,sparsity and missingness of data and nondifferentiable quantile loss function.To overcome the challenge,this paper develops a communicationefficient method to select variables and estimate parameters by utilizing a smooth function to approximate the non-differentiable quantile loss function and incorporating the idea of the inverse probability weighting and the penalty function.The proposed approach has three merits.First,it is both computationally and communicationally efficient because only the first-and second-order information of the approximate objective function are communicated at each iteration.Second,the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines.Third,the proposed method simultaneously selects variables and estimates parameters within a distributed framework,ensuring robustness to the specified response probability or propensity score function of the missing data mechanism.Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.展开更多
Owing to their global search capabilities and gradient-free operation,metaheuristic algorithms are widely applied to a wide range of optimization problems.However,their computational demands become prohibitive when ta...Owing to their global search capabilities and gradient-free operation,metaheuristic algorithms are widely applied to a wide range of optimization problems.However,their computational demands become prohibitive when tackling high-dimensional optimization challenges.To effectively address these challenges,this study introduces cooperative metaheuristics integrating dynamic dimension reduction(DR).Building upon particle swarm optimization(PSO)and differential evolution(DE),the proposed cooperative methods C-PSO and C-DE are developed.In the proposed methods,the modified principal components analysis(PCA)is utilized to reduce the dimension of design variables,thereby decreasing computational costs.The dynamic DR strategy implements periodic execution of modified PCA after a fixed number of iterations,resulting in the important dimensions being dynamically identified.Compared with the static one,the dynamic DR strategy can achieve precise identification of important dimensions,thereby enabling accelerated convergence toward optimal solutions.Furthermore,the influence of cumulative contribution rate thresholds on optimization problems with different dimensions is investigated.Metaheuristic algorithms(PSO,DE)and cooperative metaheuristics(C-PSO,C-DE)are examined by 15 benchmark functions and two engineering design problems(speed reducer and composite pressure vessel).Comparative results demonstrate that the cooperative methods achieve significantly superior performance compared to standard methods in both solution accuracy and computational efficiency.Compared to standard metaheuristic algorithms,cooperative metaheuristics achieve a reduction in computational cost of at least 40%.The cooperative metaheuristics can be effectively used to tackle both high-dimensional unconstrained and constrained optimization problems.展开更多
Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features.This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three ...Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features.This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers:Basic,Conversion&Stability(efficiency and volatility across actions),and Advanced Interactions&Activity(crossbehavior synergies and intensity).Using real Taobao(Alibaba’s primary e-commerce platform)logs(57,976 records for 10,203 users;25 November–03 December 2017),we conducted a hierarchical,layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set to quantify each layer’s marginal contribution.Across logistic regression(LR),decision tree,random forest,XGBoost,and CatBoost models with stratified 5-fold cross-validation,the performance improvedmonotonically fromBasic to Conversion&Stability to Advanced features.With LR,F1 increased from 0.613(Basic)to 0.962(Advanced);boosted models achieved high discrimination(0.995 AUC Score)and an F1 score up to 0.983.Calibration and precision–recall analyses indicated strong ranking quality and acknowledged potential dataset and period biases given the short(9-day)window.By making feature contributions measurable and reproducible,the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioralmodeling.The code and processed artifacts are publicly available,and future work will extend the validation to longer,seasonal datasets and hybrid approaches that combine automated feature learning with domain-driven design.展开更多
Objective To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine(TCM)mental disorders and to validate its efficacy.Methods We proposed the Fine-Med-Mental-T&P...Objective To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine(TCM)mental disorders and to validate its efficacy.Methods We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets in TCM mental disorders.This approach integrates theoretical knowledge and practical case studies through a dual-track strategy.(i)Theoretical track:textbooks and guidelines on TCM mental disorders were manually segmented.Initial responses were generated using DeepSeek-V3,followed by refinement by the Qwen3-32B model to align the expression with human preferences.A screening algorithm was then applied to select 16000 high-quality instruction pairs.(ii)Practical track:starting from over 600 real clinical case seeds,diagnostic and therapeutic instruction pairs were generated using DeepSeek-V3 and subsequently screened through manual evaluation,resulting in 4000 high-quality practiceoriented instruction pairs.The integration of both tracks yielded the Med-Mental-Instruct-T&P dataset,comprising a total of 20000 instruction pairs.To validate the dataset’s effectiveness,three experimental evaluations(both manual and automated)were conducted:(i)comparative studies to compare the performance of models fine-tuned on different datasets;(ii)benchmarking to compare against mainstream TCM-specific large language models(LLMs);(iii)data ablation study to investigate the relationship between data volume and model performance.Results Experimental results demonstrate the superior performance of T&P-model finetuned on the Med-Mental-Instruct-T&P dataset.In the comparative study,the T&P-model significantly outperformed the baseline models trained solely on self-generated or purely human-curated baseline data.This superiority was evident in both automated metrics(ROUGEL>0.55)and expert manual evaluations(scoring above 7/10 across accuracy).In benchmark comparisons,the T&P-model also excelled against existing mainstream TCM LLMs(e.g.,HuatuoGPT and ZuoyiGPT).It showed particularly strong capabilities in handling diverse clinical presentations,including challenging disorders such as insomnia and coma,showcasing its robustness and versatility.Data ablation studies showed that T&P-model performance had an overall upward trend with minor fluctuations when training data increased from 10%to 50%;beyond 50%,performance improvement slowed significantly,with metrics plateauing and approaching a saturation point.展开更多
The decoherence of high-dimensional orbital angular momentum(OAM)entanglement in the weak scintillation regime has been investigated.In this study,we simulate atmospheric turbulence by utilizing a multiple-phase scree...The decoherence of high-dimensional orbital angular momentum(OAM)entanglement in the weak scintillation regime has been investigated.In this study,we simulate atmospheric turbulence by utilizing a multiple-phase screen imprinted with anisotropic non-Kolmogorov turbulence.The entanglement negativity and fidelity are introduced to quantify the entanglement of a high-dimensional OAM state.The numerical evaluation results indicate that entanglement negativity and fidelity last longer for a high-dimensional OAM state when the azimuthal mode has a lower value.Additionally,the evolution of higher-dimensional OAM entanglement is significantly influenced by OAM beam parameters and turbulence parameters.Compared to isotropic atmospheric turbulence,anisotropic turbulence has a lesser influence on highdimensional OAM entanglement.展开更多
Standardized datasets are foundational to healthcare informatization by enhancing data quality and unleashing the value of data elements.Using bibliometrics and content analysis,this study examines China's healthc...Standardized datasets are foundational to healthcare informatization by enhancing data quality and unleashing the value of data elements.Using bibliometrics and content analysis,this study examines China's healthcare dataset standards from 2011 to 2025.It analyzes their evolution across types,applications,institutions,and themes,highlighting key achievements including substantial growth in quantity,optimized typology,expansion into innovative application scenarios such as health decision support,and broadened institutional involvement.The study also identifies critical challenges,including imbalanced development,insufficient quality control,and a lack of essential metadata—such as authoritative data element mappings and privacy annotations—which hampers the delivery of intelligent services.To address these challenges,the study proposes a multi-faceted strategy focused on optimizing the standard system's architecture,enhancing quality and implementation,and advancing both data governance—through authoritative tracing and privacy protection—and intelligent service provision.These strategies aim to promote the application of dataset standards,thereby fostering and securing the development of new productive forces in healthcare.展开更多
When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes...When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets.The proposed method first uses a biased second-order cone programming support vectormachine(B-SOCP-SVM)to identify the support vectors(SVs)and non-support vectors(NSVs)in the imbalanced data.Then,it applies the synthetic minority over-sampling technique(SV-SMOTE)to oversample the support vectors of the minority class and uses the random under-sampling technique(NSV-RUS)multiple times to undersample the non-support vectors of the majority class.Combining the above-obtained minority class data set withmultiple majority class datasets can obtainmultiple new balanced data sets.Finally,SOCP-SVM is used to classify each data set,and the final result is obtained through the integrated algorithm.Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.展开更多
It is known that monotone recurrence relations can induce a class of twist homeomorphisms on the high-dimensional cylinder,which is an extension of the class of monotone twist maps on the annulus or two-dimensional cy...It is known that monotone recurrence relations can induce a class of twist homeomorphisms on the high-dimensional cylinder,which is an extension of the class of monotone twist maps on the annulus or two-dimensional cylinder.By constructing a bounded solution of the monotone recurrence relation,the main conclusion in this paper is acquired:The induced homeomorphism has Birkhoff orbits provided there is a compact forward-invariant set.Therefore,it generalizes Angenent's results in low-dimensional cases.展开更多
Introduction:Accurate contouring of thoracic organs at risk(OARs)is essential for minimizing complications in radiation treatment.Manual contouring of thoracic OARs is not only time-consuming but also prone to substan...Introduction:Accurate contouring of thoracic organs at risk(OARs)is essential for minimizing complications in radiation treatment.Manual contouring of thoracic OARs is not only time-consuming but also prone to substantial user variation.To enhance the efficiency and consistency,we developed a unified deep learning(DL)OAR contouring model,DeepOAR,that was trained using multiple partially labeled datasets for segmenting a comprehensive set of thoracic OARs following the Radiation Therapy Oncology Group(RTOG)-guided OAR atlas.This DL model supports the segmentation of six required and eight optional OARs guided by the NRG-RTOG 1106 trial,providing precise and reproducible OARs contouring that are ready to be used in radiotherapy practice.Materials and methods:Following the OAR contouring recommendation of the NRG-RTOG 1106 trial,we collected and curated three private datasets and two public datasets,comprising a total of 531 patients with partially annotated thoracic OARs.These partially annotated datasets were utilized to develop DeepOAR,which consisted of a shared encoder and 14 separate decoders,with each decoder dedicated to one specific OAR.For model training,we utilized all patients from the two public datasets and 75%of the patients from the private datasets.We reserved the remaining 25%of the private datasets for independent testing.A multi-user study involving 21 radiation oncologists was conducted on 40 randomly selected patients from the independent testing dataset to evaluate the clinical applicability of DeepOAR.The Dice coefficient score(DSC)and average surface distance(ASD)were computed to evaluate the quantitative delineation performance of the model.Results:DeepOAR outperformed nnUNet(the benchmark medical segmentation model)across all 14 OARs,achieving mean DSC and ASD values of 88.4%and 1.0 mm,respectively,in the independent testing set.Multi-user validation demonstrated that 89.7%of DeepOAR-generated OARs were clinically acceptable or required only minor revisions.A comparison using two randomly selected patients showed that the delineation variability of DeepOAR was significantly smaller than the inter-user variation among radiation oncologists.Human editing of DeepOAR’s predictions could further improve OAR delineation accuracy by an average of 3%increase in DSC and 40%reduction in ASD while significantly reducing the workload of radiation oncologists for contouring 14 thoracic OARs by an average of 77.0%.Conclusion:We developed DeepOAR,a DL-based unified contouring model trained using multiple partially labeled datasets,to delineate a comprehensive set of 14 thoracic OARs following the RTOG-guided OAR atlas.Both qualitative and quantitative results demonstrated the strong clinical applicability of DeepOAR for the OAR delineation process in thoracic cancer radiotherapy workflows,along with improved efficiency,comprehensiveness,and quality.展开更多
Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveill...Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveillance,biometric authentication,and human-computer interaction.This paper provides a comprehensive review of face detection techniques developed to handle occluded faces.Studies are categorized into four main approaches:feature-based,machine learning-based,deep learning-based,and hybrid methods.We analyzed state-of-the-art studies within each category,examining their methodologies,strengths,and limitations based on widely used benchmark datasets,highlighting their adaptability to partial and severe occlusions.The review also identifies key challenges,including dataset diversity,model generalization,and computational efficiency.Our findings reveal that deep learning methods dominate recent studies,benefiting from their ability to extract hierarchical features and handle complex occlusion patterns.More recently,researchers have increasingly explored Transformer-based architectures,such as Vision Transformer(ViT)and Swin Transformer,to further improve detection robustness under challenging occlusion scenarios.In addition,hybrid approaches,which aim to combine traditional andmodern techniques,are emerging as a promising direction for improving robustness.This review provides valuable insights for researchers aiming to develop more robust face detection systems and for practitioners seeking to deploy reliable solutions in real-world,occlusionprone environments.Further improvements and the proposal of broader datasets are required to developmore scalable,robust,and efficient models that can handle complex occlusions in real-world scenarios.展开更多
Objective Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health.Analysis of these mixture exposures presents several key challenges for environmental epidemio...Objective Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health.Analysis of these mixture exposures presents several key challenges for environmental epidemiology and risk assessment,including high dimensionality,correlated exposure,and subtle individual effects.Methods We proposed a novel statistical approach,the generalized functional linear model(GFLM),to analyze the health effects of exposure mixtures.GFLM treats the effect of mixture exposures as a smooth function by reordering exposures based on specific mechanisms and capturing internal correlations to provide a meaningful estimation and interpretation.The robustness and efficiency was evaluated under various scenarios through extensive simulation studies.Results We applied the GFLM to two datasets from the National Health and Nutrition Examination Survey(NHANES).In the first application,we examined the effects of 37 nutrients on BMI(2011–2016 cycles).The GFLM identified a significant mixture effect,with fiber and fat emerging as the nutrients with the greatest negative and positive effects on BMI,respectively.For the second application,we investigated the association between four pre-and perfluoroalkyl substances(PFAS)and gout risk(2007–2018 cycles).Unlike traditional methods,the GFLM indicated no significant association,demonstrating its robustness to multicollinearity.Conclusion GFLM framework is a powerful tool for mixture exposure analysis,offering improved handling of correlated exposures and interpretable results.It demonstrates robust performance across various scenarios and real-world applications,advancing our understanding of complex environmental exposures and their health impacts on environmental epidemiology and toxicology.展开更多
Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geograph...Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geographical study was conducted in Arizona State,USA,to examine monthly precipi-tation concentration rates over time.This analysis used a high-resolution 0.50×0.50 grid for monthly precip-itation data from 1961 to 2022,Provided by the Climatic Research Unit.The study aimed to analyze climatic changes affected the first and last five years of each decade,as well as the entire decade,during the specified period.GIS was used to meet the objectives of this study.Arizona experienced 51–568 mm,67–560 mm,63–622 mm,and 52–590 mm of rainfall in the sixth,seventh,eighth,and ninth decades of the second millennium,respectively.Both the first and second five year periods of each decade showed accept-able rainfall amounts despite fluctuations.However,rainfall decreased in the first and second decades of the third millennium.and in the first two years of the third decade.Rainfall amounts dropped to 42–472 mm,55–469 mm,and 74–498 mm,respectively,indicating a downward trend in precipitation.The central part of the state received the highest rainfall,while the eastern and western regions(spanning north to south)had significantly less.Over the decades of the third millennium,the average annual rainfall every five years was relatively low,showing a declining trend due to severe climate changes,generally ranging between 35 mm and 498 mm.The central regions consistently received more rainfall than the eastern and western outskirts.Arizona is currently experiencing a decrease in rainfall due to climate change,a situation that could deterio-rate further.This highlights the need to optimize the use of existing rainfall and explore alternative water sources.展开更多
Temperature-programmed desorption(TPD)is a fundamental technique in surface science and heterogeneous catalysis for characterizing adsorption behavior,and for extracting key parameters such as adsorption energy.Howeve...Temperature-programmed desorption(TPD)is a fundamental technique in surface science and heterogeneous catalysis for characterizing adsorption behavior,and for extracting key parameters such as adsorption energy.However,the majority of existing TPD data is accessible in the form of published images,which lacks structured and quantitative datasets.This constrains its utility for rigorous quantitative analysis and computational modelling.Using carbon monoxide(CO)which is a widely adopted probe molecule,a curated and standardized dataset of CO-TPD is constructed,encompassing 14 transition-metal single-crystal surfaces,including copper(Cu)and ruthenium(Ru).By systematically extracting numerical data points from published spectra and applying normalization,essential spectral features such as peak shape are fully preserved.The dataset also documents relevant experimental parameters,including heating rates,and was developed using a standardized protocol for data collection and quality control.This resource serves as both a reference library to support the deconvolution of TPD spectra from complex catalysts and an experimental benchmark for calibrating parameters in theoretical models.By providing a reliable and accessible data function,this work advances the microscopic understanding and the rational design of catalyst active centers.展开更多
Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from nume...Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data,we proposed a novel feature selection method named AMF-SGSK based on adaptive multi-filter and subspace-based gaining sharing knowledge.Firstly,the balanced dataset was obtained by random under-sampling.Secondly,combining the feature importance score with the AUC score for each filter method,we proposed a concept called feature hardness to judge the importance of feature,which could adaptively select the essential features.Finally,the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces.This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data.The experiment results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than other eight commonly used algorithms including BGWO and IG-SSO in terms of F1-score,AUC,and G-mean.The mean values of F1-score,AUC,and Gmean for AMF-SGSK are 0.950,0.967,and 0.965,respectively,achieving the highest among all algorithms.And the mean value of Gmean is higher than those of IG-PSO,ReliefF-GWO,and BGOA by 3.72%,11.12%,and 20.06%,respectively.Furthermore,the selected feature ratio is below 0.01 across the selected ten datasets,further demonstrating the proposed method’s overall superiority over competing approaches.AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data,providing scientific and technological references for practical applications.展开更多
The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficu...The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns.The article also discusses a potential path of AI development in the engineering domain.Based on the existing understanding of the principles of multilevel com-plexity,this article suggests that consistency among the logical structures of datasets,AI models,model-building software,and hardware will be an important AI development direction and is worthy of careful consideration.展开更多
Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensi...Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensive applications in law enforcement and the commercial domain,and the rapid advancement of practical technologies.Despite the significant advancements,modern recognition algorithms still struggle in real-world conditions such as varying lighting conditions,occlusion,and diverse facial postures.In such scenarios,human perception is still well above the capabilities of present technology.Using the systematic mapping study,this paper presents an in-depth review of face detection algorithms and face recognition algorithms,presenting a detailed survey of advancements made between 2015 and 2024.We analyze key methodologies,highlighting their strengths and restrictions in the application context.Additionally,we examine various datasets used for face detection/recognition datasets focusing on the task-specific applications,size,diversity,and complexity.By analyzing these algorithms and datasets,this survey works as a valuable resource for researchers,identifying the research gap in the field of face detection and recognition and outlining potential directions for future research.展开更多
基金supported by the National Social Science Foundation of China under Grant No.23&ZD126National Science Foundation of China under Grant No.12471256+1 种基金Natural Science Foundation of Shanxi Province under Grant No.202203021221219Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi under Grant No.2023L164。
文摘The authors consider the issue of hypothesis testing in varying-coefficient regression models with high-dimensional data.Utilizing kernel smoothing techniques,the authors propose a locally concerned U-statistic method to assess the overall significance of the coefficients.The authors establish that the proposed test is asymptotically normal under both the null hypothesis and local alternatives.Based on the locally concerned U-statistic,the authors further develop a globally concerned U-statistic to test whether the coefficient function is zero.A stochastic perturbation method is employed to approximate the distribution of the globally concerned test statistic.Monte Carlo simulations demonstrate the validity of the proposed test in finite samples.
基金The work described in this paper was fully supported by a grant from Hong Kong Metropolitan University(RIF/2021/05).
文摘Parkinson’s disease(PD)is a debilitating neurological disorder affecting over 10 million people worldwide.PD classification models using voice signals as input are common in the literature.It is believed that using deep learning algorithms further enhances performance;nevertheless,it is challenging due to the nature of small-scale and imbalanced PD datasets.This paper proposed a convolutional neural network-based deep support vector machine(CNN-DSVM)to automate the feature extraction process using CNN and extend the conventional SVM to a DSVM for better classification performance in small-scale PD datasets.A customized kernel function reduces the impact of biased classification towards the majority class(healthy candidates in our consideration).An improved generative adversarial network(IGAN)was designed to generate additional training data to enhance the model’s performance.For performance evaluation,the proposed algorithm achieves a sensitivity of 97.6%and a specificity of 97.3%.The performance comparison is evaluated from five perspectives,including comparisons with different data generation algorithms,feature extraction techniques,kernel functions,and existing works.Results reveal the effectiveness of the IGAN algorithm,which improves the sensitivity and specificity by 4.05%–4.72%and 4.96%–5.86%,respectively;and the effectiveness of the CNN-DSVM algorithm,which improves the sensitivity by 1.24%–57.4%and specificity by 1.04%–163%and reduces biased detection towards the majority class.The ablation experiments confirm the effectiveness of individual components.Two future research directions have also been suggested.
基金supported by the National Key R&D Program of China under Grant No.2022YFA1003701the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis,Yunnan University under Grant No.SMDAYB2023004。
文摘Quantile regression(QR)has become an important tool to measure dependence of response variable's quantiles on a number of predictors for heterogeneous data,especially heavy-tailed data and outliers.However,it is quite challenging to make statistical inference on distributed high-dimensional QR with missing data due to the distributed nature,sparsity and missingness of data and nondifferentiable quantile loss function.To overcome the challenge,this paper develops a communicationefficient method to select variables and estimate parameters by utilizing a smooth function to approximate the non-differentiable quantile loss function and incorporating the idea of the inverse probability weighting and the penalty function.The proposed approach has three merits.First,it is both computationally and communicationally efficient because only the first-and second-order information of the approximate objective function are communicated at each iteration.Second,the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines.Third,the proposed method simultaneously selects variables and estimates parameters within a distributed framework,ensuring robustness to the specified response probability or propensity score function of the missing data mechanism.Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
基金funded by National Natural Science Foundation of China(Nos.12402142,11832013 and 11572134)Natural Science Foundation of Hubei Province(No.2024AFB235)+1 种基金Hubei Provincial Department of Education Science and Technology Research Project(No.Q20221714)the Opening Foundation of Hubei Key Laboratory of Digital Textile Equipment(Nos.DTL2023019 and DTL2022012).
文摘Owing to their global search capabilities and gradient-free operation,metaheuristic algorithms are widely applied to a wide range of optimization problems.However,their computational demands become prohibitive when tackling high-dimensional optimization challenges.To effectively address these challenges,this study introduces cooperative metaheuristics integrating dynamic dimension reduction(DR).Building upon particle swarm optimization(PSO)and differential evolution(DE),the proposed cooperative methods C-PSO and C-DE are developed.In the proposed methods,the modified principal components analysis(PCA)is utilized to reduce the dimension of design variables,thereby decreasing computational costs.The dynamic DR strategy implements periodic execution of modified PCA after a fixed number of iterations,resulting in the important dimensions being dynamically identified.Compared with the static one,the dynamic DR strategy can achieve precise identification of important dimensions,thereby enabling accelerated convergence toward optimal solutions.Furthermore,the influence of cumulative contribution rate thresholds on optimization problems with different dimensions is investigated.Metaheuristic algorithms(PSO,DE)and cooperative metaheuristics(C-PSO,C-DE)are examined by 15 benchmark functions and two engineering design problems(speed reducer and composite pressure vessel).Comparative results demonstrate that the cooperative methods achieve significantly superior performance compared to standard methods in both solution accuracy and computational efficiency.Compared to standard metaheuristic algorithms,cooperative metaheuristics achieve a reduction in computational cost of at least 40%.The cooperative metaheuristics can be effectively used to tackle both high-dimensional unconstrained and constrained optimization problems.
基金supported by the research fund of Hanyang University(HY-202500000001616).
文摘Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features.This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers:Basic,Conversion&Stability(efficiency and volatility across actions),and Advanced Interactions&Activity(crossbehavior synergies and intensity).Using real Taobao(Alibaba’s primary e-commerce platform)logs(57,976 records for 10,203 users;25 November–03 December 2017),we conducted a hierarchical,layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set to quantify each layer’s marginal contribution.Across logistic regression(LR),decision tree,random forest,XGBoost,and CatBoost models with stratified 5-fold cross-validation,the performance improvedmonotonically fromBasic to Conversion&Stability to Advanced features.With LR,F1 increased from 0.613(Basic)to 0.962(Advanced);boosted models achieved high discrimination(0.995 AUC Score)and an F1 score up to 0.983.Calibration and precision–recall analyses indicated strong ranking quality and acknowledged potential dataset and period biases given the short(9-day)window.By making feature contributions measurable and reproducible,the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioralmodeling.The code and processed artifacts are publicly available,and future work will extend the validation to longer,seasonal datasets and hybrid approaches that combine automated feature learning with domain-driven design.
基金Key Scientific Research Project of the Hunan Provincial Department of Education(23A312).
文摘Objective To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine(TCM)mental disorders and to validate its efficacy.Methods We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets in TCM mental disorders.This approach integrates theoretical knowledge and practical case studies through a dual-track strategy.(i)Theoretical track:textbooks and guidelines on TCM mental disorders were manually segmented.Initial responses were generated using DeepSeek-V3,followed by refinement by the Qwen3-32B model to align the expression with human preferences.A screening algorithm was then applied to select 16000 high-quality instruction pairs.(ii)Practical track:starting from over 600 real clinical case seeds,diagnostic and therapeutic instruction pairs were generated using DeepSeek-V3 and subsequently screened through manual evaluation,resulting in 4000 high-quality practiceoriented instruction pairs.The integration of both tracks yielded the Med-Mental-Instruct-T&P dataset,comprising a total of 20000 instruction pairs.To validate the dataset’s effectiveness,three experimental evaluations(both manual and automated)were conducted:(i)comparative studies to compare the performance of models fine-tuned on different datasets;(ii)benchmarking to compare against mainstream TCM-specific large language models(LLMs);(iii)data ablation study to investigate the relationship between data volume and model performance.Results Experimental results demonstrate the superior performance of T&P-model finetuned on the Med-Mental-Instruct-T&P dataset.In the comparative study,the T&P-model significantly outperformed the baseline models trained solely on self-generated or purely human-curated baseline data.This superiority was evident in both automated metrics(ROUGEL>0.55)and expert manual evaluations(scoring above 7/10 across accuracy).In benchmark comparisons,the T&P-model also excelled against existing mainstream TCM LLMs(e.g.,HuatuoGPT and ZuoyiGPT).It showed particularly strong capabilities in handling diverse clinical presentations,including challenging disorders such as insomnia and coma,showcasing its robustness and versatility.Data ablation studies showed that T&P-model performance had an overall upward trend with minor fluctuations when training data increased from 10%to 50%;beyond 50%,performance improvement slowed significantly,with metrics plateauing and approaching a saturation point.
基金supported by the Project of the Hubei Provincial Department of Science and Technology(Grant Nos.2022CFB957,2022CFB475)the National Natural Science Foundation of China(Grant No.11847118)。
文摘The decoherence of high-dimensional orbital angular momentum(OAM)entanglement in the weak scintillation regime has been investigated.In this study,we simulate atmospheric turbulence by utilizing a multiple-phase screen imprinted with anisotropic non-Kolmogorov turbulence.The entanglement negativity and fidelity are introduced to quantify the entanglement of a high-dimensional OAM state.The numerical evaluation results indicate that entanglement negativity and fidelity last longer for a high-dimensional OAM state when the azimuthal mode has a lower value.Additionally,the evolution of higher-dimensional OAM entanglement is significantly influenced by OAM beam parameters and turbulence parameters.Compared to isotropic atmospheric turbulence,anisotropic turbulence has a lesser influence on highdimensional OAM entanglement.
文摘Standardized datasets are foundational to healthcare informatization by enhancing data quality and unleashing the value of data elements.Using bibliometrics and content analysis,this study examines China's healthcare dataset standards from 2011 to 2025.It analyzes their evolution across types,applications,institutions,and themes,highlighting key achievements including substantial growth in quantity,optimized typology,expansion into innovative application scenarios such as health decision support,and broadened institutional involvement.The study also identifies critical challenges,including imbalanced development,insufficient quality control,and a lack of essential metadata—such as authoritative data element mappings and privacy annotations—which hampers the delivery of intelligent services.To address these challenges,the study proposes a multi-faceted strategy focused on optimizing the standard system's architecture,enhancing quality and implementation,and advancing both data governance—through authoritative tracing and privacy protection—and intelligent service provision.These strategies aim to promote the application of dataset standards,thereby fostering and securing the development of new productive forces in healthcare.
基金supported by the Natural Science Basic Research Program of Shaanxi(Program No.2024JC-YBMS-026).
文摘When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets.The proposed method first uses a biased second-order cone programming support vectormachine(B-SOCP-SVM)to identify the support vectors(SVs)and non-support vectors(NSVs)in the imbalanced data.Then,it applies the synthetic minority over-sampling technique(SV-SMOTE)to oversample the support vectors of the minority class and uses the random under-sampling technique(NSV-RUS)multiple times to undersample the non-support vectors of the majority class.Combining the above-obtained minority class data set withmultiple majority class datasets can obtainmultiple new balanced data sets.Finally,SOCP-SVM is used to classify each data set,and the final result is obtained through the integrated algorithm.Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
基金Supported by the National Natural Science Foundation of China(12201446)the Natural Science Foundation of the Jiangsu Higher Education Institutions of China(22KJB110005)the Shuangchuang Program of Jiangsu Province(JSSCBS20220898)。
文摘It is known that monotone recurrence relations can induce a class of twist homeomorphisms on the high-dimensional cylinder,which is an extension of the class of monotone twist maps on the annulus or two-dimensional cylinder.By constructing a bounded solution of the monotone recurrence relation,the main conclusion in this paper is acquired:The induced homeomorphism has Birkhoff orbits provided there is a compact forward-invariant set.Therefore,it generalizes Angenent's results in low-dimensional cases.
基金Xianghua Ye,Zhejiang Provincial Science and Technology Project,2024-KY1-001-105.This funding is intended to support the training of organ auto-segmentation models.
文摘Introduction:Accurate contouring of thoracic organs at risk(OARs)is essential for minimizing complications in radiation treatment.Manual contouring of thoracic OARs is not only time-consuming but also prone to substantial user variation.To enhance the efficiency and consistency,we developed a unified deep learning(DL)OAR contouring model,DeepOAR,that was trained using multiple partially labeled datasets for segmenting a comprehensive set of thoracic OARs following the Radiation Therapy Oncology Group(RTOG)-guided OAR atlas.This DL model supports the segmentation of six required and eight optional OARs guided by the NRG-RTOG 1106 trial,providing precise and reproducible OARs contouring that are ready to be used in radiotherapy practice.Materials and methods:Following the OAR contouring recommendation of the NRG-RTOG 1106 trial,we collected and curated three private datasets and two public datasets,comprising a total of 531 patients with partially annotated thoracic OARs.These partially annotated datasets were utilized to develop DeepOAR,which consisted of a shared encoder and 14 separate decoders,with each decoder dedicated to one specific OAR.For model training,we utilized all patients from the two public datasets and 75%of the patients from the private datasets.We reserved the remaining 25%of the private datasets for independent testing.A multi-user study involving 21 radiation oncologists was conducted on 40 randomly selected patients from the independent testing dataset to evaluate the clinical applicability of DeepOAR.The Dice coefficient score(DSC)and average surface distance(ASD)were computed to evaluate the quantitative delineation performance of the model.Results:DeepOAR outperformed nnUNet(the benchmark medical segmentation model)across all 14 OARs,achieving mean DSC and ASD values of 88.4%and 1.0 mm,respectively,in the independent testing set.Multi-user validation demonstrated that 89.7%of DeepOAR-generated OARs were clinically acceptable or required only minor revisions.A comparison using two randomly selected patients showed that the delineation variability of DeepOAR was significantly smaller than the inter-user variation among radiation oncologists.Human editing of DeepOAR’s predictions could further improve OAR delineation accuracy by an average of 3%increase in DSC and 40%reduction in ASD while significantly reducing the workload of radiation oncologists for contouring 14 thoracic OARs by an average of 77.0%.Conclusion:We developed DeepOAR,a DL-based unified contouring model trained using multiple partially labeled datasets,to delineate a comprehensive set of 14 thoracic OARs following the RTOG-guided OAR atlas.Both qualitative and quantitative results demonstrated the strong clinical applicability of DeepOAR for the OAR delineation process in thoracic cancer radiotherapy workflows,along with improved efficiency,comprehensiveness,and quality.
基金funded by A’Sharqiyah University,Sultanate of Oman,under Research Project grant number(BFP/RGP/ICT/22/490).
文摘Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveillance,biometric authentication,and human-computer interaction.This paper provides a comprehensive review of face detection techniques developed to handle occluded faces.Studies are categorized into four main approaches:feature-based,machine learning-based,deep learning-based,and hybrid methods.We analyzed state-of-the-art studies within each category,examining their methodologies,strengths,and limitations based on widely used benchmark datasets,highlighting their adaptability to partial and severe occlusions.The review also identifies key challenges,including dataset diversity,model generalization,and computational efficiency.Our findings reveal that deep learning methods dominate recent studies,benefiting from their ability to extract hierarchical features and handle complex occlusion patterns.More recently,researchers have increasingly explored Transformer-based architectures,such as Vision Transformer(ViT)and Swin Transformer,to further improve detection robustness under challenging occlusion scenarios.In addition,hybrid approaches,which aim to combine traditional andmodern techniques,are emerging as a promising direction for improving robustness.This review provides valuable insights for researchers aiming to develop more robust face detection systems and for practitioners seeking to deploy reliable solutions in real-world,occlusionprone environments.Further improvements and the proposal of broader datasets are required to developmore scalable,robust,and efficient models that can handle complex occlusions in real-world scenarios.
基金supported in part by the Young Scientists Fund of the National Natural Science Foundation of China(Grant Nos.82304253)(and 82273709)the Foundation for Young Talents in Higher Education of Guangdong Province(Grant No.2022KQNCX021)the PhD Starting Project of Guangdong Medical University(Grant No.GDMUB2022054).
文摘Objective Humans are exposed to complex mixtures of environmental chemicals and other factors that can affect their health.Analysis of these mixture exposures presents several key challenges for environmental epidemiology and risk assessment,including high dimensionality,correlated exposure,and subtle individual effects.Methods We proposed a novel statistical approach,the generalized functional linear model(GFLM),to analyze the health effects of exposure mixtures.GFLM treats the effect of mixture exposures as a smooth function by reordering exposures based on specific mechanisms and capturing internal correlations to provide a meaningful estimation and interpretation.The robustness and efficiency was evaluated under various scenarios through extensive simulation studies.Results We applied the GFLM to two datasets from the National Health and Nutrition Examination Survey(NHANES).In the first application,we examined the effects of 37 nutrients on BMI(2011–2016 cycles).The GFLM identified a significant mixture effect,with fiber and fat emerging as the nutrients with the greatest negative and positive effects on BMI,respectively.For the second application,we investigated the association between four pre-and perfluoroalkyl substances(PFAS)and gout risk(2007–2018 cycles).Unlike traditional methods,the GFLM indicated no significant association,demonstrating its robustness to multicollinearity.Conclusion GFLM framework is a powerful tool for mixture exposure analysis,offering improved handling of correlated exposures and interpretable results.It demonstrates robust performance across various scenarios and real-world applications,advancing our understanding of complex environmental exposures and their health impacts on environmental epidemiology and toxicology.
文摘Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geographical study was conducted in Arizona State,USA,to examine monthly precipi-tation concentration rates over time.This analysis used a high-resolution 0.50×0.50 grid for monthly precip-itation data from 1961 to 2022,Provided by the Climatic Research Unit.The study aimed to analyze climatic changes affected the first and last five years of each decade,as well as the entire decade,during the specified period.GIS was used to meet the objectives of this study.Arizona experienced 51–568 mm,67–560 mm,63–622 mm,and 52–590 mm of rainfall in the sixth,seventh,eighth,and ninth decades of the second millennium,respectively.Both the first and second five year periods of each decade showed accept-able rainfall amounts despite fluctuations.However,rainfall decreased in the first and second decades of the third millennium.and in the first two years of the third decade.Rainfall amounts dropped to 42–472 mm,55–469 mm,and 74–498 mm,respectively,indicating a downward trend in precipitation.The central part of the state received the highest rainfall,while the eastern and western regions(spanning north to south)had significantly less.Over the decades of the third millennium,the average annual rainfall every five years was relatively low,showing a declining trend due to severe climate changes,generally ranging between 35 mm and 498 mm.The central regions consistently received more rainfall than the eastern and western outskirts.Arizona is currently experiencing a decrease in rainfall due to climate change,a situation that could deterio-rate further.This highlights the need to optimize the use of existing rainfall and explore alternative water sources.
基金Supported by the Robotic AI-Scientist Platform of Chinese Academy of SciencesNational Natural Science Foundation of China(22372185)+2 种基金Youth Talent Development Program of SKLCC(2025BWZ009)Natural Science Foundation of Shanxi Province(202203021221219)Research on the Construction of Scientific and Technological Innovation Think Tank of Shanxi Association for Science and Technology(KXKT202542)。
文摘Temperature-programmed desorption(TPD)is a fundamental technique in surface science and heterogeneous catalysis for characterizing adsorption behavior,and for extracting key parameters such as adsorption energy.However,the majority of existing TPD data is accessible in the form of published images,which lacks structured and quantitative datasets.This constrains its utility for rigorous quantitative analysis and computational modelling.Using carbon monoxide(CO)which is a widely adopted probe molecule,a curated and standardized dataset of CO-TPD is constructed,encompassing 14 transition-metal single-crystal surfaces,including copper(Cu)and ruthenium(Ru).By systematically extracting numerical data points from published spectra and applying normalization,essential spectral features such as peak shape are fully preserved.The dataset also documents relevant experimental parameters,including heating rates,and was developed using a standardized protocol for data collection and quality control.This resource serves as both a reference library to support the deconvolution of TPD spectra from complex catalysts and an experimental benchmark for calibrating parameters in theoretical models.By providing a reliable and accessible data function,this work advances the microscopic understanding and the rational design of catalyst active centers.
基金supported by Fundamental Research Program of Shanxi Province(Nos.202203021211088,202403021212254,202403021221109)Graduate Research Innovation Project in Shanxi Province(No.2024KY616).
文摘Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data,we proposed a novel feature selection method named AMF-SGSK based on adaptive multi-filter and subspace-based gaining sharing knowledge.Firstly,the balanced dataset was obtained by random under-sampling.Secondly,combining the feature importance score with the AUC score for each filter method,we proposed a concept called feature hardness to judge the importance of feature,which could adaptively select the essential features.Finally,the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces.This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data.The experiment results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than other eight commonly used algorithms including BGWO and IG-SSO in terms of F1-score,AUC,and G-mean.The mean values of F1-score,AUC,and Gmean for AMF-SGSK are 0.950,0.967,and 0.965,respectively,achieving the highest among all algorithms.And the mean value of Gmean is higher than those of IG-PSO,ReliefF-GWO,and BGOA by 3.72%,11.12%,and 20.06%,respectively.Furthermore,the selected feature ratio is below 0.01 across the selected ten datasets,further demonstrating the proposed method’s overall superiority over competing approaches.AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data,providing scientific and technological references for practical applications.
文摘The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns.The article also discusses a potential path of AI development in the engineering domain.Based on the existing understanding of the principles of multilevel com-plexity,this article suggests that consistency among the logical structures of datasets,AI models,model-building software,and hardware will be an important AI development direction and is worthy of careful consideration.
文摘Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensive applications in law enforcement and the commercial domain,and the rapid advancement of practical technologies.Despite the significant advancements,modern recognition algorithms still struggle in real-world conditions such as varying lighting conditions,occlusion,and diverse facial postures.In such scenarios,human perception is still well above the capabilities of present technology.Using the systematic mapping study,this paper presents an in-depth review of face detection algorithms and face recognition algorithms,presenting a detailed survey of advancements made between 2015 and 2024.We analyze key methodologies,highlighting their strengths and restrictions in the application context.Additionally,we examine various datasets used for face detection/recognition datasets focusing on the task-specific applications,size,diversity,and complexity.By analyzing these algorithms and datasets,this survey works as a valuable resource for researchers,identifying the research gap in the field of face detection and recognition and outlining potential directions for future research.