The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph Neural Networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
With the increasing complexity of vehicular networks and the proliferation of connected vehicles, Federated Learning (FL) has emerged as a critical framework for decentralized model training while preserving data privacy. However, efficient client selection and adaptive weight allocation in heterogeneous and non-IID environments remain challenging. To address these issues, we propose Federated Learning with Client Selection and Adaptive Weighting (FedCW), a novel algorithm that leverages adaptive client selection and dynamic weight allocation to optimize model convergence in real-time vehicular networks. FedCW selects clients based on their Euclidean distance from the global model and dynamically adjusts aggregation weights to optimize both data diversity and model convergence. Experimental results show that FedCW significantly outperforms existing FL algorithms such as FedAvg, FedProx, and SCAFFOLD, particularly in non-IID settings, achieving faster convergence, higher accuracy, and reduced communication overhead. These findings demonstrate that FedCW provides an effective solution for enhancing the performance of FL in heterogeneous, edge-based computing environments.
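The selection-plus-weighting loop described in this abstract can be sketched in a few lines. Since the abstract does not give FedCW's exact formulas, the snippet below assumes that "diverse" clients (those farthest from the global model in Euclidean distance) are retained and that aggregation weights follow local sample counts; both are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def select_clients(global_model, client_models, k):
    """Rank clients by the Euclidean distance of their local model from the
    global model and keep the k most distant ones (assumed diversity rule)."""
    dists = [np.linalg.norm(m - global_model) for m in client_models]
    order = np.argsort(dists)[::-1]          # most distant first
    return sorted(order[:k].tolist())

def aggregate(client_models, sizes):
    """Weight each selected client's model by its local sample count
    (FedAvg-style weighting, used here as a placeholder for FedCW's
    adaptive rule)."""
    w = np.asarray(sizes, dtype=float)
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, client_models))
```

A server round would then call `select_clients`, collect the chosen clients' updates, and apply `aggregate` to form the next global model.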
Existing feature selection methods for intrusion detection systems (IDS) in the Industrial Internet of Things (IIoT) often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes an IIoT intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems to which feature selection algorithms are prone under high-dimensional data, such as local optima, long detection times, and reduced accuracy. First, the initial population's diversity is increased using a Gaussian mutation mechanism. Then, a non-linear shrinking factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a variable-step Lévy flight operator and a dynamic differential evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
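Two of the components named above are easy to make concrete. The abstract does not give GSLDWOA's formulas, so the sketch below uses a cosine decay as one plausible non-linear shrinking schedule and the common wrapper-style fitness that trades classification error against subset size; both are assumptions for illustration.

```python
import math

def nonlinear_shrink(t, T, a0=2.0):
    """Non-linear shrinking factor: decay the WOA coefficient a from a0 to 0
    along a cosine curve instead of the standard linear schedule, spending
    more iterations on exploration early and exploitation late."""
    return a0 * (1 + math.cos(math.pi * t / T)) / 2

def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Common wrapper fitness for feature selection: weighted trade-off
    between classifier error and the fraction of features retained."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```

Candidate feature subsets produced by the whale/Lévy-flight search would be scored with `fs_fitness`, with lower values preferred.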
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, a data-efficiency training approach was introduced, in which training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
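The top-N correlated-feature baseline of Experiment 1 reduces to ranking features by their correlation with the target score. A minimal sketch, assuming Pearson correlation (the abstract does not name the correlation measure or the exact prediction rule):

```python
import numpy as np

def top_n_correlated(X, y, n):
    """Return the indices of the n features most correlated (in absolute
    value) with the target score y. X is (samples, features)."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(np.argsort(corrs)[::-1][:n].tolist())
```

The selected columns could then feed either the non-ML baseline or the Random Forest models of the later experiments.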
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
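Two inner-layer ingredients named in this abstract are standard and can be shown concretely: the graph Laplacian that encodes manifold structure, and the L2,1 norm whose penalty drives whole feature rows of the weight matrix to zero. These helpers are generic sketches; the similarity graphs and full objective are specific to the paper.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a similarity graph, where D is
    the diagonal degree matrix of the symmetric similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S

def l21_norm(W):
    """L2,1 norm: sum of the Euclidean norms of W's rows. Penalizing it
    encourages entire rows (features) of the weight matrix to vanish,
    which is what makes it a feature selection regularizer."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()
```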
The rapid rise of cyberattacks and the gradual failure of traditional defense systems and approaches have led to the use of artificial intelligence (AI) techniques, such as machine learning (ML) and deep learning (DL), to build more efficient and reliable intrusion detection systems (IDSs). However, the advent of larger IDS datasets has negatively impacted the performance and computational complexity of AI-based IDSs. Many researchers have used data preprocessing techniques such as feature selection and normalization to overcome these issues. While most of these researchers reported the success of such preprocessing techniques at a shallow level, very few studies have examined their effects on a wider scale. Furthermore, the performance of an IDS model depends not only on the preprocessing techniques used but also on the dataset and the ML/DL algorithm, which most existing studies place little emphasis on. Thus, this study provides an in-depth analysis of the effects of feature selection and normalization on IDS models built using three IDS datasets (NSL-KDD, UNSW-NB15, and CSE-CIC-IDS2018) and various AI algorithms. A wrapper-based approach, which tends to give superior performance, and min-max normalization were used for feature selection and normalization, respectively. Numerous IDS models were implemented using the full and feature-selected copies of the datasets, with and without normalization. The models were evaluated using popular IDS evaluation metrics, and intra- and inter-model comparisons were performed, as well as comparisons with state-of-the-art works. Random forest (RF) models performed better on the NSL-KDD and UNSW-NB15 datasets, with accuracies of 99.86% and 96.01%, respectively, whereas an artificial neural network (ANN) achieved the best accuracy of 95.43% on the CSE-CIC-IDS2018 dataset. The RF models also achieved excellent performance compared to recent works. The results show that normalization and feature selection positively affect IDS modeling. Furthermore, while feature selection benefits simpler algorithms (such as RF), normalization is more useful for complex algorithms like ANNs and deep neural networks (DNNs), and algorithms such as Naive Bayes are unsuitable for IDS modeling. The study also found that the UNSW-NB15 and CSE-CIC-IDS2018 datasets are more complex and more suitable for building and evaluating modern-day IDS than the NSL-KDD dataset. Our findings suggest that prioritizing robust algorithms like RF, alongside complex models such as ANN and DNN, can significantly enhance IDS performance. These insights provide valuable guidance for developing more effective security measures focused on high detection rates and low false alert rates.
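The min-max normalization used in this study maps each feature to [0, 1]. A minimal sketch that fits the scaler on the training split only, a standard precaution against test-set leakage that the abstract does not spell out:

```python
import numpy as np

def min_max_normalize(X_train, X_test):
    """Scale features to [0, 1] using minima and maxima computed from the
    training split only, then apply the same transform to the test split."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X_train - lo) / span, (X_test - lo) / span
```

Note that test values outside the training range fall outside [0, 1] by design; clipping them is a separate modeling choice.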
Objective: Based on network pharmacology and molecular docking technology, this study explores the potential mechanisms by which the active ingredients of the empirical formula Guiyuan Ointment prevent and treat chronic fatigue syndrome (CFS). Methods: Chemical components and targets of Guiyuan Ointment were screened from the TCMSP and BATMAN databases. CFS targets were collected from the GeneCards, OMIM, and DisGenet databases. The intersection of Guiyuan Ointment's action targets and the disease targets was taken, and Cytoscape 3.9.1 was used to create a Venn diagram of drug and disease targets. The STRING database was employed to construct a protein-protein interaction (PPI) network, and key targets were screened using the CytoNCA plugin in Cytoscape 3.8.2. The intersection targets were imported into the DAVID database, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using the Weishengxin (bioinformatics) online tool. Cytoscape 3.8.2 was used to construct the drug-active component-target and drug-component-target-pathway networks for Guiyuan Ointment. Finally, molecular docking validation was conducted using AutoDock Vina. Results: A total of 111 active components of Guiyuan Ointment were retrieved, among which quercetin, isoflavanone, poriferasta-7,22E-dien-3beta-ol, hancinol, and orchinol are key components for the prevention and treatment of CFS. There are 507 CFS-related targets, with IL-6 (interleukin-6), TNF (tumor necrosis factor), STAT3 (signal transducer and activator of transcription 3), JUN, BCL2 (B-cell lymphoma/leukemia-2), HIF1A (hypoxia-inducible factor 1α subunit), AKT1 (AKT serine/threonine kinase 1), CASP3 (caspase-3), MMP9 (matrix metallopeptidase 9), and NFKB1 (nuclear factor kappa B subunit 1) being key targets. GO enrichment analysis showed that the biological processes involved may include negative regulation of apoptosis and positive regulation of cell proliferation, protein phosphorylation, and endothelial cell proliferation; the main molecular functions include enzyme binding, receptor binding, and protease binding; and the main cellular components involve mitochondria, protein complexes, and intracellular membrane-bounded organelles. KEGG enrichment analysis yielded a total of 186 signaling pathways, mainly related to the PI3K-Akt pathway, Kaposi's sarcoma-associated herpesvirus infection, and the mitogen-activated protein kinase (MAPK) signaling pathway. The molecular docking results indicated that the core targets and main active components of Guiyuan Ointment bind stably. Conclusion: Guiyuan Ointment combines key active components such as quercetin, poriferasta-7,22E-dien-3beta-ol, orchinol, and isoflavanone with targets including IL6, TNF, STAT3, JUN, BCL2, HIF1A, AKT1, and CASP3. By regulating multiple signaling pathways such as PI3K/Akt, MAPK, and NF-κB, it modulates various biological processes in CFS patients, including the immune response, apoptosis, and metabolic disorders. Through activating anti-apoptotic pathways, inhibiting the production of pro-inflammatory factors, and regulating energy metabolism, the formula exerts synergistic multi-target, multi-pathway effects and significantly improves CFS symptoms.
Earth's internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth's surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
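A typical data selection step of the kind surveyed here filters satellite records to the night side and to geomagnetically quiet times. The sketch below is illustrative only: the Kp threshold and local-time window are placeholder values, not the criteria evaluated in this study.

```python
def select_quiet_dark(records, kp_max=2.0, night=(22, 5)):
    """Keep records from geomagnetically quiet times (Kp <= kp_max) taken on
    the night side. Each record is a dict with 'kp' and 'local_time' (hours);
    the night window wraps around midnight."""
    start, end = night

    def is_night(lt):
        return lt >= start or lt <= end

    return [r for r in records
            if r["kp"] <= kp_max and is_night(r["local_time"])]
```

Real selection pipelines also use indices such as Dst/RC and solar-wind conditions; the structure is the same, with one predicate per criterion.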
The principle of genomic selection (GS) entails estimating breeding values (BVs) by summing all the SNP polygenic effects. Visible/near-infrared spectroscopy (VIS/NIRS) wavelength and abundance values can directly reflect the concentrations of chemical substances, and the measurement of meat traits by VIS/NIRS is similar to the processing of genomic selection data, summing all 'polygenic effects' associated with spectral feature peaks. Therefore, it is meaningful to investigate the incorporation of VIS/NIRS information into GS models to establish an efficient and low-cost breeding model. In this study, we measured 6 meat quality traits in 359 Duroc×Landrace×Yorkshire pigs from Guangxi Zhuang Autonomous Region, China, and genotyped them with high-density SNP chips. According to the completeness of the information for the target population, we proposed 4 breeding strategies applied to different scenarios: Ⅰ, only spectral and genotypic data exist for the target population; Ⅱ, only spectral data exist for the target population; Ⅲ, only spectral and genotypic data, but with different prediction processes, exist for the target population; and Ⅳ, only spectral and phenotypic data exist for the target population. The 4 scenarios were used to evaluate genomic estimated breeding value (GEBV) accuracy as VIS/NIR spectral information was added. In 5-fold cross-validation, the genetic algorithm showed remarkable potential for preselection of feature wavelengths. The breeding efficiency of Strategies Ⅱ, Ⅲ, and Ⅳ was superior to that of traditional GS for most traits, and GEBV prediction accuracy improved by 32.2%, 40.8%, and 15.5% on average, respectively. Among them, the prediction accuracy of Strategy Ⅱ for fat (%) improved by as much as 50.7% compared to traditional GS. The GEBV prediction accuracy of Strategy Ⅰ was nearly identical to that of traditional GS, with a fluctuation range of less than 7%. Moreover, the breeding cost of the 4 strategies was lower than that of traditional GS methods, with Strategy Ⅳ being the lowest as it did not require genotyping. Our findings demonstrate that GS methods based on VIS/NIRS data have significant predictive potential and merit further research, providing a valuable reference for the development of effective and affordable breeding strategies.
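The additive model in the opening sentence, a breeding value as the sum of per-SNP effects, is one line of linear algebra, and the same form applies when spectral feature peaks replace SNPs. A minimal sketch (the effect sizes in the test are arbitrary illustration values, not estimates from any data):

```python
import numpy as np

def gebv(genotypes, snp_effects):
    """Genomic estimated breeding value under the additive polygenic model:
    allele counts (0/1/2 per SNP, or spectral intensities per feature peak)
    weighted by their estimated effects and summed."""
    return float(np.asarray(genotypes, dtype=float)
                 @ np.asarray(snp_effects, dtype=float))
```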
In covert communications, joint jammer selection and power optimization are important for improving performance. However, existing schemes usually assume a warden with a known location and perfect Channel State Information (CSI), which is difficult to achieve in practice. To be more practical, it is important to investigate covert communications against a warden with an uncertain location and imperfect CSI, which makes it difficult for legitimate transceivers to estimate the warden's detection probability. First, the uncertainty caused by the unknown warden location must be removed, so the Optimal Detection Position (OPTDP) of the warden is derived, which provides the best detection performance (i.e., the worst case for covert communication). Then, to further avoid the impractical assumption of perfect CSI, the covert throughput is maximized using only channel distribution information. Given this OPTDP-based worst case, the jammer selection, jamming power, transmission power, and transmission rate are jointly optimized to maximize the covert throughput (OPTDP-JP). To solve this coupled problem, a heuristic algorithm based on the Maximum Distance Ratio (H-MAXDR) is proposed to provide a sub-optimal solution. First, according to the analysis of the covert throughput, the node with the maximum distance ratio (i.e., the ratio of the distance from the jammer to the receiver to that from the jammer to the warden) is selected as the friendly jammer (MAXDR). Then, the optimal transmission and jamming powers are derived, followed by the optimal transmission rate obtained via the bisection method. Numerical and simulation results show that although the location of the warden is unknown, by assuming the OPTDP of the warden, the proposed OPTDP-JP can always satisfy the covertness constraint. In addition, with an uncertain warden and imperfect CSI, the covert throughput provided by OPTDP-JP is 80% higher than that of existing schemes when the covertness constraint is 0.9, showing the effectiveness of OPTDP-JP.
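The MAXDR rule has a direct geometric reading: prefer a jammer that is far from the legitimate receiver relative to its distance from the warden, so the jamming degrades the warden's detector more than the intended link. A sketch on 2-D coordinates; reducing the rule to pure geometry is a simplifying assumption, since the paper works with channel distribution information rather than distances alone.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def select_jammer(candidates, receiver, warden):
    """Return the index of the candidate maximizing the distance ratio
    d(jammer, receiver) / d(jammer, warden)."""
    return max(range(len(candidates)),
               key=lambda i: dist(candidates[i], receiver)
                             / dist(candidates[i], warden))
```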
Landslide susceptibility prediction (LSP) is significantly affected by the uncertainty introduced by the selection of landslide-related conditioning factors. However, most of the literature only performs comparative studies on a certain conditioning factor selection method rather than systematically studying this uncertainty issue. To that end, this study aims to systematically explore how various commonly used conditioning factor selection methods influence LSP and, on this basis, to propose a generally applicable principle for the optimal selection of conditioning factors. An'yuan County in southern China is taken as an example, considering 431 landslides and 29 types of conditioning factors. Five commonly used factor selection methods, namely correlation analysis (CA), linear regression (LR), principal component analysis (PCA), rough set (RS), and artificial neural network (ANN), are applied to select optimal factor combinations from the original 29 conditioning factors. The factor selection results are then used as inputs to four types of common machine learning models to construct 20 types of combined models, such as CA-multilayer perceptron and CA-random forest. Additionally, multifactor-based multilayer perceptron and random forest models, whose conditioning factors are selected according to the proposed principle of "accurate data, rich types, clear significance, feasible operation and avoiding duplication", are constructed for comparison. Finally, the LSP uncertainties are evaluated by accuracy, susceptibility index distribution, etc. Results show that: (1) multifactor-based models generally have higher LSP performance and lower uncertainties than factor-selection-based models; (2) the choice of machine learning model affects LSP accuracy more than the choice of factor selection method. In conclusion, the commonly used conditioning factor selection methods above are not ideal for improving LSP performance and may complicate the LSP process. In contrast, a satisfactory combination of conditioning factors can be constructed according to the proposed principle.
This article constructs statistical selection procedures for exponential populations that may differ in only the threshold parameters. The scale parameters of the populations are assumed common and known. The independent samples drawn from the populations are taken to be of the same size. The best population is defined as the one associated with the largest threshold parameter. In case more than one population shares the largest threshold, one of these is tagged at random and denoted the best. Two procedures are developed for choosing a subset of the populations having the property that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minimum values drawn from the populations, and another is based on the sample means from the populations. An "Indifference Zone" (IZ) selection procedure is also developed based on the sample minimum values. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen so as to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold is at least a prescribed amount larger than the remaining thresholds. Numerical examples are given, and the R code for all calculations is given in the Appendices.
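The minimum-based subset procedure admits a compact Gupta-style sketch: retain every population whose sample minimum comes within a constant c of the largest observed minimum. Here c is a placeholder; the actual procedure derives it from the known scale parameter, the sample size, and the prescribed probability P*, so this sketch shows only the shape of the rule.

```python
def subset_by_minima(sample_mins, c):
    """Gupta-style subset rule (sketch): keep population i iff its sample
    minimum is within c of the largest sample minimum. The constant c must
    be calibrated so the subset contains the best population with
    probability at least P*."""
    best = max(sample_mins)
    return [i for i, m in enumerate(sample_mins) if m >= best - c]
```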
Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws/diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Many extracted image features are irrelevant and increase computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method that selects the most relevant image features. This process trains the learning paradigm using similarity- and correlation-based features over different textural intensities and pixel distributions. Similarity between pixels across the various distribution patterns with high indexes is recommended for disease diagnosis. The correlation based on intensity and distribution is then analyzed to improve feature selection congruency. The most congruent pixels are sorted in descending order of selection, which identifies better regions than distribution alone. The learning paradigm is then trained using intensity- and region-based similarity to maximize the chances of selection, improving the probability of feature selection regardless of textures and medical image patterns and enhancing the performance of ML applications across different medical image processing tasks. The proposed method improves accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared to other models on the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared to the same models and dataset.
In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and particularly tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimality conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint and the orthogonality constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in accuracy for central subspace estimation and best subset selection, and sustains high performance even with correlated predictors.
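For reference, classic (unregularized) SIR, the estimator that the sparse splicing-based algorithm builds on, can be sketched in a few lines: standardize the predictors, slice the data on the ordered response, and eigen-decompose the between-slice covariance of the slice means.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Classic sliced inverse regression: whiten X, slice on sorted y,
    average the whitened predictors within each slice, and take the top
    eigenvectors of the weighted between-slice covariance of those means."""
    n, p = X.shape
    mu = X.mean(0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # whitening matrix
    Z = (X - mu) @ inv_sqrt
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for s in slices:
        m = Z[s].mean(0)
        M += len(s) / n * np.outer(m, m)                  # between-slice cov
    _, v = np.linalg.eigh(M)
    dirs = inv_sqrt @ v[:, ::-1][:, :n_dirs]              # back to X scale
    return dirs / np.linalg.norm(dirs, axis=0)
```

With y depending on a single linear combination of X, the leading estimated direction aligns with that combination up to sign.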
Mitochondria play a key role in lipid metabolism, and mitochondrial DNA (mtDNA) mutations are thus considered to affect obesity susceptibility by altering oxidative phosphorylation and mitochondrial function. In this study, we investigate mtDNA variants that may affect obesity risk in 2877 Han Chinese individuals from 3 independent populations. The association analysis of 16 basal mtDNA haplogroups with body mass index, waist circumference, and waist-to-hip ratio reveals that only haplogroup M7 is significantly negatively correlated with all three adiposity-related anthropometric traits in the overall cohort, verified by the analysis of a single population, i.e., the Zhengzhou population. Furthermore, subhaplogroup analysis suggests that M7b1a1 is the most likely haplogroup associated with a decreased obesity risk, and the variant T12811C (causing Y159H in ND5) harbored in M7b1a1 may be the most likely candidate for altering mitochondrial function. Specifically, we find that proportionally more nonsynonymous mutations accumulate in M7b1a1 carriers, indicating that M7b1a1 is either under positive selection or subject to a relaxation of selective constraints. We also find that nuclear variants, especially in DACT2 and PIEZO1, may functionally interact with M7b1a1.
Non-orthogonal multiple access(NOMA)is a promising technology for the next generation wireless communication networks.The benefits of this technology can be further enhanced through deployment in conjunction with mult...Non-orthogonal multiple access(NOMA)is a promising technology for the next generation wireless communication networks.The benefits of this technology can be further enhanced through deployment in conjunction with multiple-input multipleoutput(MIMO)systems.Antenna selection plays a critical role in MIMO–NOMA systems as it has the potential to significantly reduce the cost and complexity associated with radio frequency chains.This paper considers antenna selection for downlink MIMO–NOMA networks with multiple-antenna basestation(BS)and multiple-antenna user equipments(UEs).An iterative antenna selection scheme is developed for a two-user system,and to determine the initial power required for this selection scheme,a power estimation method is also proposed.The proposed algorithm is then extended to a general multiuser NOMA system.Numerical results demonstrate that the proposed antenna selection algorithm achieves near-optimal performance with much lower computational complexity in both two-user and multiuser scenarios.展开更多
Funding: Supported by Ho Chi Minh City Open University, Vietnam under grant number E2024.02.1CD and Suan Sunandha Rajabhat University, Thailand.
Abstract: The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph Neural Networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
Abstract: With the increasing complexity of vehicular networks and the proliferation of connected vehicles, Federated Learning (FL) has emerged as a critical framework for decentralized model training while preserving data privacy. However, efficient client selection and adaptive weight allocation in heterogeneous and non-IID environments remain challenging. To address these issues, we propose Federated Learning with Client Selection and Adaptive Weighting (FedCW), a novel algorithm that leverages adaptive client selection and dynamic weight allocation to optimize model convergence in real-time vehicular networks. FedCW selects clients based on their Euclidean distance from the global model and dynamically adjusts aggregation weights to optimize both data diversity and model convergence. Experimental results show that FedCW significantly outperforms existing FL algorithms such as FedAvg, FedProx, and SCAFFOLD, particularly in non-IID settings, achieving faster convergence, higher accuracy, and reduced communication overhead. These findings demonstrate that FedCW provides an effective solution for enhancing the performance of FL in heterogeneous, edge-based computing environments.
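The selection-plus-weighting loop described in this abstract can be sketched compactly. Note that the inverse-distance aggregation weights below are an illustrative assumption; the abstract does not specify FedCW's exact weighting function.

```python
import numpy as np

def fedcw_round(global_model, client_models, k):
    """One aggregation round in the spirit of FedCW: keep the k clients
    whose models are closest (Euclidean distance) to the global model,
    then aggregate with inverse-distance weights (assumed form)."""
    dists = np.array([np.linalg.norm(m - global_model) for m in client_models])
    chosen = np.argsort(dists)[:k]            # client selection
    w = 1.0 / (dists[chosen] + 1e-8)          # adaptive weights (assumption)
    w /= w.sum()
    return sum(wi * client_models[i] for wi, i in zip(w, chosen))

# toy example: three 2-parameter client models, one far-off straggler
g = np.zeros(2)
clients = [np.array([0.1, 0.1]), np.array([5.0, 5.0]), np.array([0.2, -0.1])]
new_g = fedcw_round(g, clients, k=2)          # straggler is excluded
```

The distance-based filter discards updates that drift far from the global model, which is what the abstract credits for robustness in non-IID settings.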
Funding: Supported by the Major Science and Technology Programs in Henan Province (No. 241100210100), the Henan Provincial Science and Technology Research Project (No. 252102211085, No. 252102211105), the Endogenous Security Cloud Network Convergence R&D Center (No. 602431011PQ1), the Special Project for Research and Development in Key Areas of Guangdong Province (No. 2021ZDZX1098), the Stabilization Support Program of the Science, Technology and Innovation Commission of Shenzhen Municipality (No. 20231128083944001), and the Key Scientific Research Projects of Henan Higher Education Institutions (No. 24A520042).
Abstract: Existing feature selection methods for intrusion detection systems (IDSs) in the Industrial Internet of Things (IIoT) often suffer from local optimality and high computational complexity. These challenges hinder traditional IDSs from effectively extracting features while maintaining detection accuracy. This paper proposes an IIoT intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems that feature selection algorithms are prone to under high-dimensional data, such as local optimality, long detection times, and reduced accuracy. First, the initial population's diversity is increased using a Gaussian mutation mechanism. Then, a non-linear shrinking factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a variable-step Lévy flight operator and a dynamic differential evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
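Two of the ingredients named in this abstract, Gaussian mutation of the initial population and a variable-step Lévy flight, admit short sketches. The constants here (mutation scale, Lévy exponent β = 1.5 via Mantegna's algorithm) are common defaults, not values taken from the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mutated_init(pop_size, dim, sigma=0.3):
    """Uniform initial population in [0, 1], perturbed with Gaussian
    noise to increase diversity (one common form of Gaussian mutation;
    sigma is an assumed default)."""
    pop = rng.random((pop_size, dim))
    pop += rng.normal(0.0, sigma, pop.shape)
    return np.clip(pop, 0.0, 1.0)

def levy_step(dim, beta=1.5):
    """One Levy-flight step via Mantegna's algorithm, giving occasional
    long jumps that help escape local optima."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

pop = gaussian_mutated_init(20, 10)   # 20 whales over 10 candidate features
step = levy_step(10)                  # heavy-tailed search perturbation
```

How GSLDWOA wires these pieces into the WOA update rule is not specified in the abstract, so the above shows only the two operators in isolation.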
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, in which the training data portion increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
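The Experiment 1 baseline, keeping the top-N features most correlated with the essay score, is straightforward to reproduce on toy data; the synthetic columns below are stand-ins for the paper's similarity measures.

```python
import numpy as np

def top_n_correlated(X, y, n):
    """Rank the columns of X by absolute Pearson correlation with y and
    return the indices of the n most correlated features (the
    Experiment-1-style baseline)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:n]

rng = np.random.default_rng(1)
y = rng.random(100)                                       # stand-in scores
X = np.column_stack([y + 0.1 * rng.standard_normal(100),  # strongly related
                     rng.random(100),                     # pure noise
                     -y + 0.1 * rng.standard_normal(100)]) # strongly related
picked = top_n_correlated(X, y, 2)                        # keeps cols 0 and 2
```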
Abstract: Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
Abstract: The rapid rise of cyberattacks and the gradual failure of traditional defense systems and approaches have led to the use of artificial intelligence (AI) techniques, such as machine learning (ML) and deep learning (DL), to build more efficient and reliable intrusion detection systems (IDSs). However, the advent of larger IDS datasets has negatively impacted the performance and computational complexity of AI-based IDSs. Many researchers have used data preprocessing techniques such as feature selection and normalization to overcome such issues. While most of these researchers reported the success of these preprocessing techniques at a shallow level, very few studies have examined their effects on a wider scale. Furthermore, the performance of an IDS model depends not only on the preprocessing techniques used but also on the dataset and the ML/DL algorithm, a point to which most existing studies give little emphasis. Thus, this study provides an in-depth analysis of the effects of feature selection and normalization on IDS models built using three IDS datasets (NSL-KDD, UNSW-NB15, and CSE–CIC–IDS2018) and various AI algorithms. A wrapper-based approach, which tends to give superior performance, and min-max normalization were used for feature selection and normalization, respectively. Numerous IDS models were implemented using the full and feature-selected copies of the datasets, with and without normalization. The models were evaluated using popular IDS evaluation metrics, and intra- and inter-model comparisons were performed between models and with state-of-the-art works. Random forest (RF) models performed better on the NSL-KDD and UNSW-NB15 datasets, with accuracies of 99.86% and 96.01%, respectively, whereas an artificial neural network (ANN) achieved the best accuracy of 95.43% on the CSE–CIC–IDS2018 dataset. The RF models also achieved excellent performance compared to recent works. The results show that normalization and feature selection positively affect IDS modeling. Furthermore, while feature selection benefits simpler algorithms (such as RF), normalization is more useful for complex algorithms like ANNs and deep neural networks (DNNs), and algorithms such as Naive Bayes are unsuitable for IDS modeling. The study also found that the UNSW-NB15 and CSE–CIC–IDS2018 datasets are more complex and more suitable for building and evaluating modern-day IDSs than the NSL-KDD dataset. Our findings suggest that prioritizing robust algorithms like RF, alongside complex models such as ANN and DNN, can significantly enhance IDS performance. These insights provide valuable guidance for managers seeking to develop more effective security measures by focusing on high detection rates and low false alert rates.
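The two preprocessing steps studied above can be illustrated generically: min-max scaling followed by a greedy forward wrapper. The nearest-centroid scorer and synthetic data below are toy stand-ins, not the paper's IDS models or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_max(X):
    """Min-max normalization: scale each column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def centroid_acc(X, y, cols):
    """Training accuracy of a nearest-centroid classifier restricted to
    `cols` (a stand-in for the wrapper's inner model)."""
    Xs = X[:, cols]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1) <
            np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

def forward_wrapper(X, y, k):
    """Greedy wrapper: repeatedly add the feature that most improves
    the inner model's accuracy."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        j_best = max(rest, key=lambda j: centroid_acc(X, y, chosen + [j]))
        chosen.append(j_best)
    return chosen

# synthetic "IDS" data: feature 0 is informative, features 1-3 are noise
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.3 * rng.standard_normal(200),
                     rng.standard_normal((200, 3)) * 10])
Xn = min_max(X)
sel = forward_wrapper(Xn, y, k=2)   # picks the informative feature first
```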
Abstract: Objective: Based on network pharmacology and molecular docking technology, this study explores the potential mechanisms of active ingredients in the empirical formula Guiyuan Ointment for the prevention and treatment of chronic fatigue syndrome (CFS). Methods: Chemical components and targets of Guiyuan Ointment were screened from the TCMSP database and BATMAN database. CFS targets were collected through the GeneCards, OMIM, and DisGenet databases. The intersection of Guiyuan Ointment's action targets and disease targets was taken. Cytoscape Version 3.9.1 was used to create a Venn diagram of drug and disease targets. The STRING database was employed to construct a protein-protein interaction (PPI) network diagram, and key targets were screened using the CytoNCA plugin in Cytoscape 3.8.2 software.
The intersection targets were imported into the David database, and gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using the microbioinformatics program. Cytoscape 3.8.2 was used to construct the Guiyuan Ointment drug-effective component-target and drug-component-target-pathway networks. Finally, molecular docking validation was conducted using AutoDock Vina software. Results: A total of 111 active components of Guiyuan Ointment were retrieved, among which quercetin, isoflavanone, poriferasta-7,22E-dien-3beta-ol, hancinol, and orchinol are key components for the prevention and treatment of CFS by Guiyuan Ointment. There are 507 CFS-related targets, with IL-6 (interleukin-6), TNF (tumor necrosis factor), STAT3 (signal transducer and activator of transcription 3), JUN, BCL2 (B-cell lymphoma/leukemia-2), HIF1A (hypoxia-inducible factor 1α subunit), AKT1 (AKT serine/threonine kinase 1), CASP3 (caspase-3), MMP9 (matrix metallopeptidase 9), and NFKB1 (nuclear factor kappa B subunit 1) being key targets for the prevention and treatment of CFS. GO enrichment analysis showed that in terms of biological processes, it may involve negative regulation of apoptosis, positive regulation of cell proliferation, positive regulation of protein phosphorylation, and positive regulation of endothelial cell proliferation, etc. The main molecular functions include enzyme binding, receptor binding, and protease binding, etc. The main cellular functions involve mitochondria, protein complexes, and cell organelles bound to the intracellular membrane, etc. KEGG enrichment analysis yielded a total of 186 signaling pathways, mainly related to the PI3K-Akt pathway, Kaposi’s sarcoma-associated herpesvirus infection, and mitogen-activated protein kinase signaling pathway, etc. The molecular docking results indicated that the core action targets and main active components in Guiyuan Ointment have stable binding activity. 
Conclusion: Guiyuan Ointment combines key active components such as quercetin, poriferasta-7,22E-dien-3beta-ol, orchinol, and isoflavanone with targets like IL6, TNF, STAT3, JUN, BCL2, HIF1A, AKT1, and CASP3. By regulating multiple signaling pathways such as PI3K/Akt, MAPK, and NF-κB, it modulates various biological processes in CFS patients, including immune response, cell apoptosis, and metabolic disorders. Through activating anti-apoptotic pathways, inhibiting the production of pro-inflammatory factors, and regulating energy metabolism, it verifies the multi-target and multi-pathway regulation of this formula, exerting synergistic effects on multiple targets and pathways, and significantly improving the symptoms of CFS.
Funding: Supported by the National Natural Science Foundation of China (42250101) and the Macao Foundation.
Abstract: Earth's internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth's surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
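A data-selection step of the kind evaluated above amounts to row filtering on activity and illumination criteria. The thresholds below (Kp at most 2, sun at least 10° below the horizon) are typical of the literature but are assumptions, not the paper's exact criteria.

```python
import numpy as np

def select_quiet_dark(records, kp_max=2.0, sun_elev_max=-10.0):
    """Keep satellite magnetic records from geomagnetically quiet times
    (Kp index <= kp_max) and dark regions (sun elevation below
    sun_elev_max degrees).  Thresholds are illustrative."""
    kp, sun_elev = records[:, 0], records[:, 1]
    mask = (kp <= kp_max) & (sun_elev <= sun_elev_max)
    return records[mask]

# columns: Kp index, sun elevation (deg), field value (nT) -- toy rows
data = np.array([[1.0, -30.0, 45100.0],
                 [4.0, -40.0, 45230.0],   # rejected: too disturbed
                 [0.7,  15.0, 45010.0],   # rejected: sunlit
                 [2.0, -12.0, 44980.0]])
quiet_dark = select_quiet_dark(data)      # keeps rows 0 and 3
```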
Funding: Supported by the National Natural Science Foundation of China (32160782 and 32060737).
Abstract: The principle of genomic selection (GS) entails estimating breeding values (BVs) by summing all the SNP polygenic effects. Visible/near-infrared spectroscopy (VIS/NIRS) wavelength and abundance values can directly reflect the concentrations of chemical substances, and the measurement of meat traits by VIS/NIRS is similar to the processing of genomic selection data, in that it sums all the 'polygenic effects' associated with spectral feature peaks. Therefore, it is meaningful to investigate the incorporation of VIS/NIRS information into GS models to establish an efficient and low-cost breeding model. In this study, we measured 6 meat quality traits in 359 Duroc×Landrace×Yorkshire pigs from Guangxi Zhuang Autonomous Region, China, and genotyped them with high-density SNP chips. According to the completeness of the information for the target population, we proposed 4 breeding strategies applied to different scenarios: Ⅰ, only spectral and genotypic data exist for the target population; Ⅱ, only spectral data exist for the target population; Ⅲ, only spectral and genotypic data, but with different prediction processes, exist for the target population; and Ⅳ, only spectral and phenotypic data exist for the target population. The 4 scenarios were used to evaluate the genomic estimated breeding value (GEBV) accuracy by increasing the VIS/NIR spectral information. In the results of the 5-fold cross-validation, the genetic algorithm showed remarkable potential for preselection of feature wavelengths. The breeding efficiency of Strategies Ⅱ, Ⅲ, and Ⅳ was superior to that of traditional GS for most traits, and the GEBV prediction accuracy was improved by 32.2%, 40.8%, and 15.5%, respectively, on average. Among them, the prediction accuracy of Strategy Ⅱ for fat (%) even improved by 50.7% compared to traditional GS. The GEBV prediction accuracy of Strategy Ⅰ was nearly identical to that of traditional GS, and the fluctuation range was less than 7%. Moreover, the breeding cost of the 4 strategies was lower than that of traditional GS methods, with Strategy Ⅳ being the lowest, as it did not require genotyping. Our findings demonstrate that GS methods based on VIS/NIRS data have significant predictive potential and are worthy of further research to provide a valuable reference for the development of effective and affordable breeding strategies.
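The GS principle the abstract starts from, a breeding value as the sum of SNP effects, reduces to a dot product of genotype codes and estimated effects; the numbers below are purely illustrative.

```python
import numpy as np

def gebv(genotypes, snp_effects):
    """Genomic estimated breeding value: sum of per-SNP allele counts
    times their estimated additive effects (the 'sum of polygenic
    effects' referred to in the abstract)."""
    return genotypes @ snp_effects

# 4 animals x 5 SNPs, coded as 0/1/2 copies of the reference allele
G = np.array([[0, 1, 2, 0, 1],
              [2, 2, 0, 1, 0],
              [1, 0, 1, 2, 2],
              [0, 0, 0, 0, 0]])
effects = np.array([0.5, -0.2, 0.1, 0.0, 0.3])  # illustrative effect sizes
bv = gebv(G, effects)
```

The spectral analogue the paper draws on replaces the genotype matrix with feature-wavelength intensities and the SNP effects with per-wavelength coefficients; the arithmetic is identical.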
Funding: Supported by the CAS Project for Young Scientists in Basic Research under Grant YSBR-035 and the Jiangsu Provincial Key Research and Development Program under Grant BE2021013-2.
Abstract: In covert communications, joint jammer selection and power optimization are important for improving performance. However, existing schemes usually assume a warden with a known location and perfect channel state information (CSI), which is difficult to achieve in practice. To be more practical, it is important to investigate covert communications against a warden with an uncertain location and imperfect CSI, which make it difficult for legitimate transceivers to estimate the warden's detection probability. First, the uncertainty caused by the unknown warden location must be removed, and the optimal detection position (OPTDP) of the warden, which provides the best detection performance (i.e., the worst case for covert communication), is derived. Then, to further avoid the impractical assumption of perfect CSI, the covert throughput is maximized using only channel distribution information. Given this OPTDP-based worst case for covert communications, the jammer selection, jamming power, transmission power, and transmission rate are jointly optimized to maximize the covert throughput (OPTDP-JP). To solve this coupled problem, a heuristic algorithm based on the maximum distance ratio (H-MAXDR) is proposed to provide a sub-optimal solution. First, according to the analysis of the covert throughput, the node with the maximum distance ratio (i.e., the ratio of the distances from the jammer to the receiver and from the jammer to the warden) is selected as the friendly jammer (MAXDR). Then, the optimal transmission and jamming powers can be derived, followed by the optimal transmission rate obtained via the bisection method. Numerical and simulation results show that although the location of the warden is unknown, by assuming the OPTDP of the warden, the proposed OPTDP-JP can always satisfy the covertness constraint. In addition, with an uncertain warden and imperfect CSI, the covert throughput provided by OPTDP-JP is 80% higher than that of existing schemes when the covertness constraint is 0.9, showing the effectiveness of OPTDP-JP.
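The MAXDR selection rule is simple enough to state directly: pick the candidate jammer maximizing the ratio of its distance to the receiver over its distance to the (assumed optimal-position) warden. The coordinates below are made up for illustration.

```python
import math

def maxdr_jammer(jammers, receiver, warden):
    """Select the jammer index with the maximum distance ratio
    d(jammer, receiver) / d(jammer, warden): far from the legitimate
    receiver, close to the warden (the MAXDR rule from the abstract)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return max(range(len(jammers)),
               key=lambda i: dist(jammers[i], receiver) / dist(jammers[i], warden))

receiver, warden = (0.0, 0.0), (10.0, 0.0)     # warden at its assumed OPTDP
jammers = [(1.0, 1.0), (9.0, 1.0), (5.0, 5.0)] # candidate positions (made up)
best = maxdr_jammer(jammers, receiver, warden) # picks the node near the warden
```

The power and rate optimization that follows this selection in H-MAXDR is problem-specific and is not sketched here.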
Funding: Funded by the Natural Science Foundation of China (Grant Nos. 42377164 and 41972280) and the Badong National Observation and Research Station of Geohazards (Grant No. BNORSG-202305).
Abstract: Landslide susceptibility prediction (LSP) is significantly affected by the uncertainty introduced by the selection of landslide-related conditioning factors. However, most of the literature only performs comparative studies on a particular conditioning factor selection method rather than systematically studying this uncertainty issue. To this end, this study aims to systematically explore how various commonly used conditioning factor selection methods influence LSP and, on this basis, to propose a principle with universal application for the optimal selection of conditioning factors. An'yuan County in southern China is taken as an example, considering 431 landslides and 29 types of conditioning factors. Five commonly used factor selection methods, namely correlation analysis (CA), linear regression (LR), principal component analysis (PCA), rough set (RS), and artificial neural network (ANN), are applied to select the optimal factor combinations from the original 29 conditioning factors. The factor selection results are then used as inputs of four types of common machine learning models to construct 20 types of combined models, such as CA-multilayer perceptron and CA-random forest. Additionally, multifactor-based multilayer perceptron and random forest models, whose conditioning factors are selected according to the proposed principle of "accurate data, rich types, clear significance, feasible operation and avoiding duplication", are constructed for comparison. Finally, the LSP uncertainties are evaluated by accuracy, susceptibility index distribution, etc. Results show that: (1) multifactor-based models generally have higher LSP performance and lower uncertainties than factor-selection-based models; (2) the influence of different machine learning models on LSP accuracy is greater than that of different factor selection methods. In conclusion, the above commonly used conditioning factor selection methods are not ideal for improving LSP performance and may complicate the LSP process. In contrast, a satisfactory combination of conditioning factors can be constructed according to the proposed principle.
Abstract: This article constructs statistical selection procedures for exponential populations that may differ only in their threshold parameters. The scale parameters of the populations are assumed common and known. The independent samples drawn from the populations are taken to be of the same size. The best population is defined as the one associated with the largest threshold parameter. In case more than one population shares the largest threshold, one of these is tagged at random and denoted the best. Two procedures are developed for choosing a subset of the populations having the property that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minimum values drawn from the populations, and another is based on the sample means from the populations. An "Indifference Zone" (IZ) selection procedure is also developed based on the sample minimum values. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen so as to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold is at least a prescribed amount larger than the remaining thresholds. Numerical examples are given, and the computer R codes for all calculations are given in the Appendices.
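The minimum-based subset procedure can be sketched as follows. The cutoff constant c below is a conservative union-bound choice exploiting the fact that the minimum of n exponential draws above a threshold is itself exponential with rate n/scale; the article's exact constant may differ.

```python
import math
import random

def select_subset(samples, scale, p_star):
    """Subset selection from sample minima: keep population i whenever
    min_i >= (max over j of min_j) - c.  The constant c is a
    conservative union-bound choice, not necessarily the article's."""
    k, n = len(samples), len(samples[0])
    mins = [min(s) for s in samples]
    c = (scale / n) * math.log((k - 1) / (2.0 * (1.0 - p_star)))
    m_max = max(mins)
    return [i for i, m in enumerate(mins) if m >= m_max - c]

random.seed(0)
k, n, scale, p_star = 4, 20, 1.0, 0.95
thresholds = [0.0, 0.0, 0.0, 2.0]          # population 3 is the "best"
samples = [[t + random.expovariate(1 / scale) for _ in range(n)]
           for t in thresholds]
subset = select_subset(samples, scale, p_star)   # retains population 3
```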
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through a large group research project under grant number RGP2/421/45. This work was also supported by funding from Prince Sattam bin Abdulaziz University (project number PSAU/2024/R/1446), by the Researchers Supporting Project Number (UM-DSR-IG-2023-07), Almaarefa University, Riyadh, Saudi Arabia, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1F1A1055408).
Abstract: Machine learning (ML) is increasingly applied for medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws/diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most of the extracted image features are irrelevant and lead to an increase in computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method to select the most relevant image features. This process trains the learning paradigm using similarity- and correlation-based features over different textural intensities and pixel distributions. The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis. Later, the correlation based on intensity and distribution is analyzed to improve the feature selection congruency. Therefore, the more congruent pixels are sorted in the descending order of the selection, which identifies better regions than the distribution. Now, the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection. Therefore, the probability of feature selection, regardless of the textures and medical image patterns, is improved. This process enhances the performance of ML applications for different medical image processing. The proposed method improves the accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared to other models for the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared to the same models and dataset.
Abstract: In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimality conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint or the orthogonality constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in terms of accuracy for central subspace estimation and best subset selection and sustains high performance even with correlated predictors.
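For reference, classical (unregularized) SIR, the estimator that the sparse variants above build on, fits in a few lines: whiten the predictors, slice on the response, and take the top eigenvectors of the weighted covariance of slice means.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, d=1):
    """Classical sliced inverse regression: eigenvectors of
    Cov(E[Z | slice]) in the whitened scale, mapped back to X scale."""
    n, p = X.shape
    mu = X.mean(0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = np.linalg.solve(L, (X - mu).T).T           # whiten: Cov(Z) = I
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):      # slice on sorted y
        m = Z[s].mean(0)
        M += (len(s) / n) * np.outer(m, m)         # weighted slice-mean cov
    _, vecs = np.linalg.eigh(M)
    B = np.linalg.solve(L.T, vecs[:, -d:])         # back to original scale
    return B / np.linalg.norm(B, axis=0)

# single-index toy model: y depends on X only through x1 - x2
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
y = (X @ np.array([1.0, -1.0, 0.0, 0.0])) ** 3 + 0.1 * rng.standard_normal(500)
B = sir_directions(X, y)   # should align with (1, -1, 0, 0) up to sign
```

The sparse variants surveyed in the abstract replace the unconstrained eigen-step with an L0-constrained problem; this sketch shows only the shared backbone.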
Funding: Supported by the National Natural Science Foundation of China (32270670, 32288101, 32271186, and 32200482), the National Basic Research Program of China (2015FY111700), and the CAMS Innovation Fund for Medical Sciences (2019-I2M-5-066).
Abstract: Mitochondria play a key role in lipid metabolism, and mitochondrial DNA (mtDNA) mutations are thus considered to affect obesity susceptibility by altering oxidative phosphorylation and mitochondrial function. In this study, we investigate mtDNA variants that may affect obesity risk in 2877 Han Chinese individuals from 3 independent populations. The association analysis of 16 basal mtDNA haplogroups with body mass index, waist circumference, and waist-to-hip ratio reveals that only haplogroup M7 is significantly negatively correlated with all three adiposity-related anthropometric traits in the overall cohort, verified by the analysis of a single population, i.e., the Zhengzhou population. Furthermore, subhaplogroup analysis suggests that M7b1a1 is the most likely haplogroup associated with a decreased obesity risk, and the variation T12811C (causing Y159H in ND5) harbored in M7b1a1 may be the most likely candidate for altering mitochondrial function. Specifically, we find that proportionally more nonsynonymous mutations accumulate in M7b1a1 carriers, indicating that M7b1a1 is either under positive selection or subject to a relaxation of selective constraints. We also find that nuclear variants, especially in DACT2 and PIEZO1, may functionally interact with M7b1a1.
Abstract: Non-orthogonal multiple access (NOMA) is a promising technology for next-generation wireless communication networks. The benefits of this technology can be further enhanced through deployment in conjunction with multiple-input multiple-output (MIMO) systems. Antenna selection plays a critical role in MIMO–NOMA systems, as it has the potential to significantly reduce the cost and complexity associated with radio frequency chains. This paper considers antenna selection for downlink MIMO–NOMA networks with a multiple-antenna base station (BS) and multiple-antenna user equipments (UEs). An iterative antenna selection scheme is developed for a two-user system, and to determine the initial power required for this selection scheme, a power estimation method is also proposed. The proposed algorithm is then extended to a general multiuser NOMA system. Numerical results demonstrate that the proposed antenna selection algorithm achieves near-optimal performance with much lower computational complexity in both two-user and multiuser scenarios.